[
  {
    "path": "README.md",
    "content": "# TransformerCVAE\n\nThis repository contains the source code for the paper [Transformer-based Conditional Variational Autoencoder for Controllable Story Generation](https://arxiv.org/abs/2101.00828):\n\n```\n@article{fang2021transformer,\n  title={Transformer-based Conditional Variational Autoencoder for Controllable Story Generation},\n  author={Fang, Le and Zeng, Tao and Liu, Chaochun and Bo, Liefeng and Dong, Wen and Chen, Changyou},\n  journal={arXiv preprint arXiv:2101.00828},\n  year={2021}\n}\n```\n\n0. Get the source data ([Arxiv](https://github.com/gcunhase/ArXivAbsTitleDataset), [Yelp](https://github.com/fangleai/Implicit-LVM/tree/master/lang_model_yelp/data), [WritingPrompts](https://github.com/pytorch/fairseq/blob/master/examples/stories/README.md), [WikiPlots](https://github.com/markriedl/WikiPlots)).\n1. Pre-process the data (`data/`).\n2. Train the model, choosing among implementations that differ in parallelism and precision: `train.py`, `train_dist.py`, `train_dist_half.py`.\n3. Generate, evaluate, and analyze (`generate.py`/`generate_prefix.py`, `eval_ppl.py`/`eval_ppl_prefix.py`, `tsne_plot.py`).\n\nContact: lefang@buffalo.edu\n\nUpdate (2022):\nIf you run into package version issues, sorry: I don't have a requirements.txt with exact versions. I used [NVIDIA Apex](https://github.com/nvidia/apex) together with an old PyTorch version that was compatible with it at the time, probably pytorch=0.4 (not 100% sure).\n"
  },
  {
    "path": "data/__init__.py",
    "content": ""
  },
  {
    "path": "data/arxiv/artificial intelligence_10047_15000_15_abs.txt",
    "content": "Artificial intelligence has impacted many aspects of human life. This paper studies the impact of artificial intelligence on economic theory. In particular we study the impact of artificial intelligence on the theory of bounded rationality, efficient market hypothesis and prospect theory.\nIn this paper, I put forward that in many instances, thinking mechanisms are equivalent to artificial intelligence modules programmed into the human mind.\nThe rapid development of artificial intelligence has brought the artificial intelligence threat theory as well as the problem about how to evaluate the intelligence level of intelligent products. Both need to find a quantitative method to evaluate the intelligence level of intelligence systems, including human intelligence. Based on the standard intelligence system and the extended Von Neumann architecture, this paper proposes General IQ, Service IQ and Value IQ evaluation methods for intelligence systems, depending on different evaluation purposes. Among them, the General IQ of intelligence systems is to answer the question of whether the artificial intelligence can surpass the human intelligence, which is reflected in putting the intelligence systems on an equal status and conducting the unified evaluation. The Service IQ and Value IQ of intelligence systems are used to answer the question of how the intelligent products can better serve the human, reflecting the intelligence and required cost of each intelligence system as a product in the process of serving human.\nIn this article, I discuss how the AI community views concerns about the emergence of superintelligent AI and related philosophical issues.\nCurrently, potential threats of artificial intelligence (AI) to human have triggered a large controversy in society, behind which, the nature of the issue is whether the artificial intelligence (AI) system can be evaluated quantitatively. 
This article analyzes and evaluates the challenges that the AI development level is facing, and proposes that the evaluation methods for the human intelligence test and the AI system are not uniform; and the key reason for which is that none of the models can uniformly describe the AI system and the beings like human. Aiming at this problem, a standard intelligent system model is established in this study to describe the AI system and the beings like human uniformly. Based on the model, the article makes an abstract mathematical description, and builds the standard intelligent machine mathematical model; expands the Von Neumann architecture and proposes the Liufeng - Shiyong architecture; gives the definition of the artificial intelligence IQ, and establishes the artificial intelligence scale and the evaluation method; conduct the test on 50 search engines and three human subjects at different ages across the world, and finally obtains the ranking of the absolute IQ and deviation IQ ranking for artificial intelligence IQ 2014.\nKnowledge-based or Artificial Intelligence techniques are used increasingly as alternatives to more classical techniques to model ENVIRONMENTAL SYSTEMS. Use of Artificial Intelligence (AI) in environmental modelling has increased with recognition of its potential. In this paper we examine the DIFFERENT TECHNIQUES of Artificial intelligence with profound examples of human perception, learning and reasoning to solve complex problems. However with the increase of complexity better methods are required. 
Keeping in view of the above some researchers introduced the idea of hybrid mechanism in which two or more methods can be combined which seems to be a positive effort for creating a more complex; advanced and intelligent system which has the capability to in- cooperate human decisions thus driving the landscape changes.\nAlthough the definition and measurement of intelligence is clearly of fundamental importance to the field of artificial intelligence, no general survey of definitions and tests of machine intelligence exists. Indeed few researchers are even aware of alternatives to the Turing test and its many derivatives. In this paper we fill this gap by providing a short survey of the many tests of machine intelligence that have been proposed.\nThe Universal Intelligence Measure is a recently proposed formal definition of intelligence. It is mathematically specified, extremely general, and captures the essence of many informal definitions of intelligence. It is based on Hutter's Universal Artificial Intelligence theory, an extension of Ray Solomonoff's pioneering work on universal induction. Since the Universal Intelligence Measure is only asymptotically computable, building a practical intelligence test from it is not straightforward. This paper studies the practical issues involved in developing a real-world UIM-based performance metric. Based on our investigation, we develop a prototype implementation which we use to evaluate a number of different artificial agents.\nWe derive axiomatically the probability function that should be used to make decisions given any form of underlying uncertainty.\nThis article analyses the properties of the Internal Behaviour network, an action selection mechanism previously proposed by the authors, with the aid of a simulation developed for such ends. A brief review of the Internal Behaviour network is followed by the explanation of the implementation of the simulation. 
Then, experiments are presented and discussed analysing the properties of the action selection in the proposed model.\nThis paper describe about a new methodology for developing and improving the robotics field via artificial intelligence and internet of things. Now a day, we can say Artificial Intelligence take the world into robotics. Almost all industries use robots for lot of works. They are use co-operative robots to make different kind of works. But there was some problem to make robot for multi tasks. So there was a necessary new methodology to made multi tasking robots. It will be done only by artificial intelligence and internet of things.\nOver the last decade, there has been growing interest in the use or measures or change in belief for reasoning with uncertainty in artificial intelligence research. An important characteristic of several methodologies that reason with changes in belief or belief updates, is a property that we term modularity. We call updates that satisfy this property modular updates. Whereas probabilistic measures of belief update - which satisfy the modularity property were first discovered in the nineteenth century, knowledge and discussion of these quantities remains obscure in artificial intelligence research. We define modular updates and discuss their inappropriate use in two influential expert systems.\nOne of the most important aims of the fields of robotics, artificial intelligence and artificial life is the design and construction of systems and machines as versatile and as reliable as living organisms at performing high level human-like tasks. But how are we to evaluate artificial systems if we are not certain how to measure these capacities in living systems, let alone how to define life or intelligence? 
Here I survey a concrete metric towards measuring abstract properties of natural and artificial systems, such as the ability to react to the environment and to control one's own behaviour.\nA fundamental problem in artificial intelligence is that nobody really knows what intelligence is. The problem is especially acute when we need to consider artificial systems which are significantly different to humans. In this paper we approach this problem in the following way: We take a number of well known informal definitions of human intelligence that have been given by experts, and extract their essential features. These are then mathematically formalised to produce a general measure of intelligence for arbitrary machines. We believe that this measure formally captures the concept of machine intelligence in the broadest reasonable sense.\nNarrative intelligence is the ability to craft, tell, understand, and respond affectively to stories. We argue that instilling artificial intelligences with computational narrative intelligence affords a number of applications beneficial to humans. We lay out some of the machine learning challenges necessary to solve to achieve computational narrative intelligence. Finally, we argue that computational narrative is a practical step towards machine enculturation, the teaching of sociocultural values to machines.\nThe advent of artificial intelligence has changed many disciplines such as engineering, social science and economics. Artificial intelligence is a computational technique which is inspired by natural intelligence such as the swarming of birds, the working of the brain and the pathfinding of the ants. These techniques have impact on economic theories. This book studies the impact of artificial intelligence on economic theories, a subject that has not been extensively studied. 
The theories that are considered are: demand and supply, asymmetrical information, pricing, rational choice, rational expectation, game theory, efficient market hypotheses, mechanism design, prospect, bounded rationality, portfolio theory, rational counterfactual and causality. The benefit of this book is that it evaluates existing theories of economics and update them based on the developments in artificial intelligence field.\nIn this paper we offer a formal definition of Artificial Intelligence and this directly gives us an algorithm for construction of this object. Really, this algorithm is useless due to the combinatory explosion.   The main innovation in our definition is that it does not include the knowledge as a part of the intelligence. So according to our definition a newly born baby also is an Intellect. Here we differs with Turing's definition which suggests that an Intellect is a person with knowledge gained through the years.\nWith almost daily improvements in capabilities of artificial intelligence it is more important than ever to develop safety software for use by the AI research community. Building on our previous work on AI Containment Problem we propose a number of guidelines which should help AI safety researchers to develop reliable sandboxing software for intelligent programs of all levels. Such safety container software will make it possible to study and analyze intelligent artificial agent while maintaining certain level of safety against information leakage, social engineering attacks and cyberattacks from within the container.\nContent marketing is todays one of the most remarkable approaches in the context of marketing processes of companies. Value of this kind of marketing has improved in time, thanks to the latest developments regarding to computer and communication technologies. Nowadays, especially social media based platforms have a great importance on enabling companies to design multimedia oriented, interactive content. 
But on the other hand, there is still something more to do for improved content marketing approaches. In this context, objective of this study is to focus on intelligent content marketing, which can be done by using artificial intelligence. Artificial Intelligence is todays one of the most remarkable research fields and it can be used easily as multidisciplinary. So, this study has aimed to discuss about its potential on improving content marketing. In detail, the study has enabled readers to improve their awareness about the intersection point of content marketing and artificial intelligence. Furthermore, the authors have introduced some example models of intelligent content marketing, which can be achieved by using current Web technologies and artificial intelligence techniques.\nThe recent surge in interest in ethics in artificial intelligence may leave many educators wondering how to address moral, ethical, and philosophical issues in their AI courses. As instructors we want to develop curriculum that not only prepares students to be artificial intelligence practitioners, but also to understand the moral, ethical, and philosophical impacts that artificial intelligence will have on society. In this article we provide practical case studies and links to resources for use by AI educators. We also provide concrete suggestions on how to integrate AI ethics into a general artificial intelligence course and how to teach a stand-alone artificial intelligence ethics course.\nThere is a significant lack of unified approaches to building generally intelligent machines. The majority of current artificial intelligence research operates within a very narrow field of focus, frequently without considering the importance of the 'big picture'. In this document, we seek to describe and unify principles that guide the basis of our development of general artificial intelligence. 
These principles revolve around the idea that intelligence is a tool for searching for general solutions to problems. We define intelligence as the ability to acquire skills that narrow this search, diversify it and help steer it to more promising areas. We also provide suggestions for studying, measuring, and testing the various skills and abilities that a human-level intelligent machine needs to acquire. The document aims to be both implementation agnostic, and to provide an analytic, systematic, and scalable way to generate hypotheses that we believe are needed to meet the necessary conditions in the search for general artificial intelligence. We believe that such a framework is an important stepping stone for bringing together definitions, highlighting open problems, connecting researchers willing to collaborate, and for unifying the arguably most significant search of this century.\nArtificial intelligence (AI) is an important technology that supports daily social life and economic activities. It contributes greatly to the sustainable growth of Japan's economy and solves various social problems. In recent years, AI has attracted attention as a key for growth in developed countries such as Europe and the United States and developing countries such as China and India. The attention has been focused mainly on developing new artificial intelligence information communication technology (ICT) and robot technology (RT). Although recently developed AI technology certainly excels in extracting certain patterns, there are many limitations. Most ICT models are overly dependent on big data, lack a self-idea function, and are complicated. In this paper, rather than merely developing next-generation artificial intelligence technology, we aim to develop a new concept of general-purpose intelligence cognition technology called Beyond AI. 
Specifically, we plan to develop an intelligent learning model called Brain Intelligence (BI) that generates new ideas about events without having experienced them by using artificial life with an imagine function. We will also conduct demonstrations of the developed BI intelligence learning model on automatic driving, precision medical care, and industrial robots.\nAlthough artificial intelligence is currently one of the most interesting areas in scientific research, the potential threats posed by emerging AI systems remain a source of persistent controversy. To address the issue of AI threat, this study proposes a standard intelligence model that unifies AI and human characteristics in terms of four aspects of knowledge, i.e., input, output, mastery, and creation. Using this model, we observe three challenges, namely, expanding of the von Neumann architecture; testing and ranking the intelligence quotient of naturally and artificially intelligent systems, including humans, Google, Bing, Baidu, and Siri; and finally, the dividing of artificially intelligent systems into seven grades from robots to Google Brain. Based on this, we conclude that AlphaGo belongs to the third grade.\nBiologically inspired computing is an area of computer science which uses the advantageous properties of biological systems. It is the amalgamation of computational intelligence and collective intelligence. Biologically inspired mechanisms have already proved successful in achieving major advances in a wide range of problems in computing and communication systems. The consortium of bio-inspired computing are artificial neural networks, evolutionary algorithms, swarm intelligence, artificial immune systems, fractal geometry, DNA computing and quantum computing, etc. This article gives an introduction to swarm intelligence.\nThis paper analyses the influence of including agents of different degrees of intelligence in a multiagent system. 
The goal is to better understand how we can develop intelligence tests that can evaluate social intelligence. We analyse several reinforcement algorithms in several contexts of cooperation and competition. Our experimental setting is inspired by the recently developed Darwin-Wallace distribution.\nWe give a brief overview of the main characteristics of diagrammatic reasoning, analyze a case of human reasoning in a mastermind game, and explain why hybrid representation systems (HRS) are particularly attractive and promising for Artificial General Intelligence and Computer Science in general.\nWe reminisce and discuss applications of algorithmic probability to a wide range of problems in artificial intelligence, philosophy and technological society. We propose that Solomonoff has effectively axiomatized the field of artificial intelligence, therefore establishing it as a rigorous scientific discipline. We also relate to our own work in incremental machine learning and philosophy of complexity.\nMuch artificial intelligence research focuses on the problem of deducing the validity of unobservable propositions or hypotheses from observable evidence.! Many of the knowledge representation techniques designed for this problem encode the relationship between evidence and hypothesis in a directed manner. Moreover, the direction in which evidence is stored is typically from evidence to hypothesis.\nThis paper covers two topics: first an introduction to Algorithmic Complexity Theory: how it defines probability, some of its characteristic properties and past successful applications. Second, we apply it to problems in A.I. - where it promises to give near optimum search procedures for two very broad classes of problems.\nVerified artificial intelligence (AI) is the goal of designing AI-based systems that are provably correct with respect to mathematically-specified requirements. This paper considers Verified AI from a formal methods perspective. 
We describe five challenges for achieving Verified AI, and five corresponding principles for addressing these challenges.\nHuman speech is the most important part of General Artificial Intelligence and subject of much research. The hypothesis proposed in this article provides explanation of difficulties that modern science tackles in the field of human brain simulation. The hypothesis is based on the author's conviction that the brain of any given person has different ability to process and store information. Therefore, the approaches that are currently used to create General Artificial Intelligence have to be altered.\nThis article considers evidence from physical and biological sciences to show machines are deficient compared to biological systems at incorporating intelligence. Machines fall short on two counts: firstly, unlike brains, machines do not self-organize in a recursive manner; secondly, machines are based on classical logic, whereas Nature's intelligence may depend on quantum mechanics.\nArtificial Intelligence began as a field probing some of the most fundamental questions of science - the nature of intelligence and the design of intelligent artifacts. But it has grown into a discipline that is deeply entwined with commerce and society. Today's AI technology, such as expert systems and intelligent assistants, pose some difficult questions of risk, trust and accountability. In this paper, we present these concerns, examining them in the context of historical developments that have shaped the nature and direction of AI research. We also suggest the exploration and further development of two paradigms, human intelligence-machine cooperation, and a sociological view of intelligence, which might help address some of these concerns.\nLittle by little, newspapers are revealing the bright future that Artificial Intelligence (AI) is building. Intelligent machines will help everywhere. 
However, this bright future has a dark side: a dramatic job market contraction before its unpredictable transformation. Hence, in a near future, large numbers of job seekers will need financial support while catching up with these novel unpredictable jobs. This possible job market crisis has an antidote inside. In fact, the rise of AI is sustained by the biggest knowledge theft of the recent years. Learning AI machines are extracting knowledge from unaware skilled or unskilled workers by analyzing their interactions. By passionately doing their jobs, these workers are digging their own graves.   In this paper, we propose Human-in-the-loop Artificial Intelligence (HIT-AI) as a fairer paradigm for Artificial Intelligence systems. HIT-AI will reward aware and unaware knowledge producers with a different scheme: decisions of AI systems generating revenues will repay the legitimate owners of the knowledge used for taking those decisions. As modern Robin Hoods, HIT-AI researchers should fight for a fairer Artificial Intelligence that gives back what it steals.\nIndependent from the still ongoing research in measuring individual intelligence, we anticipate and provide a framework for measuring collective intelligence. Collective intelligence refers to the idea that several individuals can collaborate in order to achieve high levels of intelligence. We present thus some ideas on how the intelligence of a group can be measured and simulate such tests. We will however focus here on groups of artificial intelligence agents (i.e., machines). We will explore how a group of agents is able to choose the appropriate problem and to specialize for a variety of tasks. This is a feature which is an important contributor to the increase of intelligence in a group (apart from the addition of more agents and the improvement due to common decision making). 
Our results reveal some interesting results about how (collective) intelligence can be modeled, about how collective intelligence tests can be designed and about the underlying dynamics of collective intelligence. As it will be useful for our simulations, we provide also some improvements of the threshold allocation model originally used in the area of swarm intelligence but further generalized here.\nCrisis response poses many of the most difficult information technology in crisis management. It requires information and communication-intensive efforts, utilized for reducing uncertainty, calculating and comparing costs and benefits, and managing resources in a fashion beyond those regularly available to handle routine problems. In this paper, we explore the benefits of artificial intelligence technologies in crisis response. This paper discusses the role of artificial intelligence technologies; namely, robotics, ontology and semantic web, and multi-agent systems in crisis response.\nAgents and agent systems are becoming more and more important in the development of a variety of fields such as ubiquitous computing, ambient intelligence, autonomous computing, intelligent systems and intelligent robotics. The need for improvement of our basic knowledge on agents is very essential. We take a systematic approach and present extended classification of artificial agents which can be useful for understanding of what artificial agents are and what they can be in the future. The aim of this classification is to give us insights in what kind of agents can be created and what type of problems demand a specific kind of agents for their solution.\nTwo different definitions of the Artificial Intelligence concept have been proposed in papers [1] and [2]. The first definition is informal. It says that any program that is cleverer than a human being, is acknowledged as Artificial Intelligence. The second definition is formal because it avoids reference to the concept of human being. 
The readers of papers [1] and [2] might be left with the impression that both definitions are equivalent and the definition in [2] is simply a formal version of that in [1]. This paper will compare both definitions of Artificial Intelligence and, hopefully, will bring a better understanding of the concept.\nComputer games play an important role in our society and motivate people to learn computer science. Since artificial intelligence is integral to most games, they can also be used to teach artificial intelligence. We introduce the Game AI Game Engine (GAIGE), a Python game engine specifically designed to teach about how AI is used in computer games. A progression of seven assignments builds toward a complete, working Multi-User Battle Arena (MOBA) game. We describe the engine, the assignments, and our experiences using it in a class on Game Artificial Intelligence.\nThis paper is concerned with two theories of probability judgment: the Bayesian theory and the theory of belief functions. It illustrates these theories with some simple examples and discusses some of the issues that arise when we try to implement them in expert systems. The Bayesian theory is well known; its main ideas go back to the work of Thomas Bayes (1702-1761). The theory of belief functions, often called the Dempster-Shafer theory in the artificial intelligence community, is less well known, but it has even older antecedents; belief-function arguments appear in the work of George Hooper (16401723) and James Bernoulli (1654-1705). For elementary expositions of the theory of belief functions, see Shafer (1976, 1985).\nAlgorithmic composition is the partial or total automation of the process of music composition by using computers. 
Since the 1950s, different computational techniques related to Artificial Intelligence have been used for algorithmic composition, including grammatical representations, probabilistic methods, neural networks, symbolic rule-based systems, constraint programming and evolutionary algorithms. This survey aims to be a comprehensive account of research on algorithmic composition, presenting a thorough view of the field for researchers in Artificial Intelligence.\nThe theory of rational choice assumes that when people make decisions they do so in order to maximize their utility. In order to achieve this goal they ought to use all the information available and consider all the choices available to choose an optimal choice. This paper investigates what happens when decisions are made by artificially intelligent machines in the market rather than human beings. Firstly, the expectations of the future are more consistent if they are made by an artificially intelligent machine and the decisions are more rational and thus marketplace becomes more rational.\nArtificial Intelligence (AI) started out with an ambition to reproduce the human mind, but, as the sheer scale of that ambition became apparent, quickly retreated into either studying specialized intelligent behaviours, or proposing overarching architectural concepts for interfacing specialized intelligent behaviour components, conceived of as agents in a kind of organization. This agent-based modeling paradigm, in turn, proves to have interesting applications in understanding, simulating, and predicting the behaviour of social and legal structures on an aggregate level. 
This chapter examines a number of relevant cross-cutting concerns, conceptualizations, modeling problems and design challenges in large-scale distributed Artificial Intelligence, as well as in institutional systems, and identifies potential grounds for novel advances.\nSince Artificial Intelligence (AI) software uses techniques like deep lookahead search and stochastic optimization of huge neural networks to fit mammoth datasets, it often results in complex behavior that is difficult for people to understand. Yet organizations are deploying AI algorithms in many mission-critical settings. In order to trust their behavior, we must make it intelligible --- either by using inherently interpretable models or by developing methods for explaining otherwise overwhelmingly complex decisions by local approximation, vocabulary alignment, and interactive dialog.\nHow intelligent is robot A compared with robot B? And how intelligent are robots A and B compared with animals (or plants) X and Y? These are both interesting and deeply challenging questions. In this paper we address the question \"how intelligent is your intelligent robot?\" by proposing that embodied intelligence emerges from the interaction and integration of four different and distinct kinds of intelligence. We then suggest a simple diagrammatic representation on which these kinds of intelligence are shown as four axes in a star diagram. A crude qualitative comparison of the intelligence graphs of animals and robots both exposes and helps to explain the chronic intelligence deficit of intelligent robots. Finally we examine the options for determining numerical values for the four kinds of intelligence in an effort to move toward a quantifiable intelligence vector.\nOn grounds of the discussed material, we reason about possible future development of quantum game theory and its impact on information processing and the emerging information society. 
The idea of quantum artificial intelligence is explained.\nWe show that the d -separation criterion constitutes a valid test for conditional independence relationships that are induced by feedback systems involving discrete variables.\nAn elaboration of Dempster's method of constructing belief functions suggests a broadly applicable strategy for constructing lower probabilities under a variety of evidentiary constraints.\nThis paper examines the concept of a combination rule for belief functions. It is shown that two fairly simple and apparently reasonable assumptions determine Dempster's rule, giving a new justification for it.\nWithin the transferable belief model, positive basic belief masses can be allocated to the empty set, leading to unnormalized belief functions. The nature of these unnormalized beliefs is analyzed.\nDefinitions and notations with historical references are given for some numerical coefficients commonly used to quantify relations among collections of objects for the purpose of expressing approximate knowledge and probabilistic reasoning.\nSmart optical networks are the next evolution of programmable networking and programmable automation of optical networks, with human-in-the-loop network control and management. The paper discusses this evolution and the role of Artificial Intelligence (AI).\nThe first decade of this century has seen the nascency of the first mathematical theory of general artificial intelligence. This theory of Universal Artificial Intelligence (UAI) has made significant contributions to many theoretical, philosophical, and practical AI questions. In a series of papers culminating in book (Hutter, 2005), an exciting sound and complete mathematical model for a super intelligent agent (AIXI) has been developed and rigorously analyzed. 
While nowadays most AI researchers avoid discussing intelligence, the award-winning PhD thesis (Legg, 2008) provided the philosophical embedding and investigated the UAI-based universal measure of rational intelligence, which is formal, objective and non-anthropocentric. Recently, effective approximations of AIXI have been derived and experimentally investigated in JAIR paper (Veness et al. 2011). This practical breakthrough has resulted in some impressive applications, finally muting earlier critique that UAI is only a theory. For the first time, without providing any domain knowledge, the same agent is able to self-adapt to a diverse range of interactive environments. For instance, AIXI is able to learn from scratch to play TicTacToe, Pacman, Kuhn Poker, and other games by trial and error, without even providing the rules of the games.   These achievements give new hope that the grand goal of Artificial General Intelligence is not elusive.   This article provides an informal overview of UAI in context. It attempts to gently introduce a very theoretical, formal, and mathematical subject, and discusses philosophical and technical ingredients, traits of intelligence, some social questions, and the past and future of UAI.\nOrder exists in the world. The intelligence process enables us to realize that order, to some extent. We provide a high level description of intelligence using simple definitions, basic building blocks, a conceptual framework and general hierarchy. This perspective includes multiple levels of abstraction occurring in space and in time. The resulting model offers simple, useful ways to help realize the essence of intelligence.\nWhen human agents come together to make decisions, it is often the case that one human agent has more information than the other. This phenomenon is called information asymmetry and this distorts the market. 
Often if one human agent intends to manipulate a decision in its favor the human agent can signal wrong or right information. Alternatively, one human agent can screen for information to reduce the impact of asymmetric information on decisions. With the advent of artificial intelligence, signaling and screening have been made easier. This paper studies the impact of artificial intelligence on the theory of asymmetric information. It is surmised that artificial intelligent agents reduce the degree of information asymmetry and thus the market where these agents are deployed become more efficient. It is also postulated that the more artificial intelligent agents there are deployed in the market the less is the volume of trades in the market. This is because for many trades to happen the asymmetry of information on goods and services to be traded should exist, creating a sense of arbitrage.\nThere has been a recent resurgence in the area of explainable artificial intelligence as researchers and practitioners seek to provide more transparency to their algorithms. Much of this research is focused on explicitly explaining decisions or actions to a human observer, and it should not be controversial to say that, if these techniques are to succeed, the explanations they generate should have a structure that humans accept. However, it is fair to say that most work in explainable artificial intelligence uses only the researchers' intuition of what constitutes a `good' explanation. There exists vast and valuable bodies of research in philosophy, psychology, and cognitive science of how people define, generate, select, evaluate, and present explanations. This paper argues that the field of explainable artificial intelligence should build on this existing research, and reviews relevant papers from philosophy, cognitive psychology/science, and social psychology, which study these topics. 
It draws out some important findings, and discusses ways that these can be infused with work on explainable artificial intelligence.\nWe briefly introduce the memory based approaches to emulate machine intelligence in VLSI hardware, describing the challenges and advantages. Implementation of artificial intelligence techniques in VLSI hardware is a practical and difficult problem. Deep architectures, hierarchical temporal memories and memory networks are some of the contemporary approaches in this area of research. The techniques attempt to emulate low level intelligence tasks and aim at providing scalable solutions to high level intelligence problems such as sparse coding and contextual processing.\nWhether the machine intelligence can surpass the human intelligence is a controversial issue. On the basis of traditional IQ, this article presents the Universal IQ test method suitable for both the machine intelligence and the human intelligence. With the method, machine and human intelligences were divided into 4 major categories and 15 subcategories. A total of 50 search engines across the world and 150 persons at different ages were subject to the relevant test. And then, the Universal IQ ranking list of 2014 for the test objects was obtained.\nLooks at state interactions from an agent based AI perspective to see state interactions as an example of emergent intelligent behavior. Exposes basic principles of game theory.\nThis brief note highlights some basic concepts required toward understanding the evolution of machine learning and deep learning models. 
The note starts with an overview of artificial intelligence and its relationship to biological neuron that ultimately led to the evolution of todays intelligent models.\nA introduction to the syntax and Semantics of Answer Set Programming intended as an handout to [under]graduate students taking Artificial Intlligence or Logic Programming classes.\nA fundamental problem in artificial intelligence is that nobody really knows what intelligence is. The problem is especially acute when we need to consider artificial systems which are significantly different to humans. In this paper we approach this problem in the following way: We take a number of well known informal definitions of human intelligence that have been given by experts, and extract their essential features. These are then mathematically formalised to produce a general measure of intelligence for arbitrary machines. We believe that this equation formally captures the concept of machine intelligence in the broadest reasonable sense. We then show how this formal definition is related to the theory of universal optimal learning agents. Finally, we survey the many other tests and definitions of intelligence that have been proposed for machines.\nArtificial general intelligence (AGI) refers to research aimed at tackling the full problem of artificial intelligence, that is, create truly intelligent agents. This sets it apart from most AI research which aims at solving relatively narrow domains, such as character recognition, motion planning, or increasing player satisfaction in games. But how do we know when an agent is truly intelligent? A common point of reference in the AGI community is Legg and Hutter's formal definition of universal intelligence, which has the appeal of simplicity and generality but is unfortunately incomputable. Games of various kinds are commonly used as benchmarks for \"narrow\" AI research, as they are considered to have many important properties. 
We argue that many of these properties carry over to the testing of general intelligence as well. We then sketch how such testing could practically be carried out. The central part of this sketch is an extension of universal intelligence to deal with finite time, and the use of sampling of the space of games expressed in a suitably biased game description language.\nWe propose a number of heuristics that can be used for identifying when intransitive choice behaviour is likely to occur in choice situations. We also suggest two methods for avoiding undesired choice behaviour, namely transparent communication and adaptive choice-set generation. We believe that these two ways can contribute to the avoidance of decision biases in choice situations that may often be regretted.\nThe Hard Problem of consciousness has been dismissed as an illusion. By showing that computers are capable of experiencing, we show that they are at least rudimentarily conscious with potential to eventually reach superconsciousness. The main contribution of the paper is a test for confirming certain subjective experiences in a tested agent. We follow with analysis of benefits and problems with conscious machines and implications of such capability on future of computing, machine rights and artificial intelligence safety.\nOur desire and fascination with intelligent machines dates back to the antiquity's mythical automaton Talos, Aristotle's mode of mechanical thought (syllogism) and Heron of Alexandria's mechanical machines and automata. However, the quest for Artificial General Intelligence (AGI) is troubled with repeated failures of strategies and approaches throughout the history. This decade has seen a shift in interest towards bio-inspired software and hardware, with the assumption that such mimicry entails intelligence. Though these steps are fruitful in certain directions and have advanced automation, their singular design focus renders them highly inefficient in achieving AGI. 
Which set of requirements have to be met in the design of AGI? What are the limits in the design of the artificial? Here, a careful examination of computation in biological systems hints that evolutionary tinkering of contextual processing of information enabled by a hierarchical architecture is the key to build AGI.\nThe development of artificial general intelligence is considered by many to be inevitable. What such intelligence does after becoming aware is not so certain. To that end, research suggests that the likelihood of artificial general intelligence becoming hostile to humans is significant enough to warrant inquiry into methods to limit such potential. Thus, containment of artificial general intelligence is a timely and meaningful research topic. While there is limited research exploring possible containment strategies, such work is bounded by the underlying field the strategies draw upon. Accordingly, we set out to construct an ontology to describe necessary elements in any future containment technology. Using existing academic literature, we developed a single domain ontology containing five levels, 32 codes, and 32 associated descriptors. Further, we constructed ontology diagrams to demonstrate intended relationships. We then identified humans, AGI, and the cyber world as novel agent objects necessary for future containment activities. Collectively, the work addresses three critical gaps: (a) identifying and arranging fundamental constructs; (b) situating AGI containment within cyber science; and (c) developing scientific rigor within the field.\nArtificial intelligence offers superior techniques and methods by which problems from diverse domains may find an optimal solution. The Machine Learning technologies refer to the domain of artificial intelligence aiming to develop the techniques allowing the computers to \"learn\". 
Some systems based on Machine Learning technologies tend to eliminate the necessity of the human intelligence while the others adopt a man-machine collaborative approach.\nThis article is a brief personal account of the past, present, and future of algorithmic randomness, emphasizing its role in inductive inference and artificial intelligence. It is written for a general audience interested in science and philosophy. Intuitively, randomness is a lack of order or predictability. If randomness is the opposite of determinism, then algorithmic randomness is the opposite of computability. Besides many other things, these concepts have been used to quantify Ockham's razor, solve the induction problem, and define intelligence.\nThere are few knowledge representation (KR) techniques available for efficiently representing knowledge. However, with the increase in complexity, better methods are needed. Some researchers came up with hybrid mechanisms by combining two or more methods. In an effort to construct an intelligent computer system, a primary consideration is to represent large amounts of knowledge in a way that allows effective use and efficiently organizing information to facilitate making the recommended inferences. There are merits and demerits of combinations, and standardized method of KR is needed. In this paper, various hybrid schemes of KR were explored at length and details presented.\nThis work summarizes part of current knowledge on High-level Cognitive process and its relation with biological hardware. Thus, it is possible to identify some paradoxes which could impact the development of future technologies and artificial intelligence: we may make a High-level Cognitive Machine, sacrificing the principal attribute of a machine, its accuracy.\nIn this work we investigate the systems that implements algorithms for the planning problem in Artificial Intelligence, called planners, with especial attention to the planners based on the plan graph. 
We analyze the problem of comparing the performance of the different algorithms and we propose an environment for the development and analysis of planners.\nThis paper covers a number of approaches that leverage Artificial Intelligence algorithms and techniques to aid Unmanned Combat Aerial Vehicle (UCAV) autonomy. An analysis of current approaches to autonomous control is provided followed by an exploration of how these techniques can be extended and enriched with AI techniques including Artificial Neural Networks (ANN), Ensembling and Reinforcement Learning (RL) to evolve control strategies for UCAVs.\nAs intelligent systems are increasingly making decisions that directly affect society, perhaps the most important upcoming research direction in AI is to rethink the ethical implications of their actions. Means are needed to integrate moral, societal and legal values with technological developments in AI, both during the design process as well as part of the deliberation algorithms employed by these systems. In this paper, we describe leading ethics theories and propose alternative ways to ensure ethical behavior by artificial systems. Given that ethics are dependent on the socio-cultural context and are often only implicit in deliberation processes, methodologies are needed to elicit the values held by designers and stakeholders, and to make these explicit leading to better understanding and trust on artificial autonomous systems.\nAI technology has a long history which is actively and constantly changing and growing. It focuses on intelligent agents, which contain devices that perceive the environment and based on which takes actions in order to maximize goal success chances. In this paper, we will explain the modern AI basics and various representative applications of AI. 
In the context of the modern digitalized world, AI is the property of machines, computer programs, and systems to perform the intellectual and creative functions of a person, independently find ways to solve problems, be able to draw conclusions and make decisions. Most artificial intelligence systems have the ability to learn, which allows people to improve their performance over time. The recent research on AI tools, including machine learning, deep learning and predictive analysis intended toward increasing the planning, learning, reasoning, thinking and action taking ability. Based on which, the proposed research intends towards exploring on how the human intelligence differs from the artificial intelligence. Moreover, we critically analyze what AI of today is capable of doing, why it still cannot reach human intelligence and what are the open challenges existing in front of AI to reach and outperform human level of intelligence. Furthermore, it will explore the future predictions for artificial intelligence and based on which potential solution will be recommended to solve it within next decades.\nSpecialized intelligent systems can be found everywhere: finger print, handwriting, speech, and face recognition, spam filtering, chess and other game programs, robots, et al. This decade the first presumably complete mathematical theory of artificial intelligence based on universal induction-prediction-decision-action has been proposed. This information-theoretic approach solidifies the foundations of inductive inference and artificial intelligence. Getting the foundations right usually marks a significant progress and maturing of a field. The theory provides a gold standard and guidance for researchers working on intelligent algorithms. The roots of universal induction have been laid exactly half-a-century ago and the roots of universal intelligence exactly one decade ago. So it is timely to take stock of what has been achieved and what remains to be done. 
Since there are already good recent surveys, I describe the state-of-the-art only in passing and refer the reader to the literature. This article concentrates on the open problems in universal induction and its extension to universal intelligence.\nThis paper is a survey of a large number of informal definitions of ``intelligence'' that the authors have collected over the years. Naturally, compiling a complete list would be impossible as many definitions of intelligence are buried deep inside articles and books. Nevertheless, the 70-odd definitions presented here are, to the authors' knowledge, the largest and most well referenced collection there is.\nData Mining techniques plays a vital role like extraction of required knowledge, finding unsuspected information to make strategic decision in a novel way which in term understandable by domain experts. A generalized frame work is proposed by considering non - domain experts during mining process for better understanding, making better decision and better finding new patters in case of selecting suitable data mining techniques based on the user profile by means of intelligent agents. KEYWORDS: Data Mining Techniques, Intelligent Agents, User Profile, Multidimensional Visualization, Knowledge Discovery.\nObserving that the creation of certain types of artistic artifacts necessitate intelligence, we present the Lovelace 2.0 Test of creativity as an alternative to the Turing Test as a means of determining whether an agent is intelligent. The Lovelace 2.0 Test builds off prior tests of creativity and additionally provides a means of directly comparing the relative intelligence of different agents.\nIn the era of intelligent biohybrid neurotechnologies for brain repair, new fanciful terms are appearing in the scientific dictionary to define what has so far been unimaginable. 
As the emerging neurotechnologies are becoming increasingly polyhedral and sophisticated, should we talk about evolution and rank the intelligence of these devices?\nThis paper introduce a software system including widely-used Swarm Intelligence algorithms or approaches to be used for the related scientific research studies associated with the subject area. The programmatic infrastructure of the system allows working on a fast, easy-to-use, interactive platform to perform Swarm Intelligence based studies in a more effective, efficient and accurate way. In this sense, the system employs all of the necessary controls for the algorithms and it ensures an interactive platform on which computer users can perform studies on a wide spectrum of solution approaches associated with simple and also more advanced problems.\nTuring test was long considered the measure for artificial intelligence. But with the advances in AI, it has proved to be insufficient measure. We can now aim to mea- sure machine intelligence like we measure human intelligence. One of the widely accepted measure of intelligence is standardized math and science test. In this paper, we explore the progress we have made towards the goal of making a machine smart enough to pass the standardized test. We see the challenges and opportunities posed by the domain, and note that we are quite some ways from actually making a system as smart as a even a middle school scholar.\nNumerous, artificially intelligent, networked things will populate the battlefield of the future, operating in close collaboration with human warfighters, and fighting as teams in highly adversarial environments. This paper explores the characteristics, capabilities and intelligence required of such a network of intelligent things and humans - Internet of Battle Things (IOBT). 
It will experience unique challenges that are not yet well addressed by the current generation of AI and machine learning.\nThis paper presents an information theoretic approach to the concept of intelligence in the computational sense. We introduce a probabilistic framework from which computational intelligence is shown to be an entropy minimizing process at the local level. Using this new scheme, we develop a simple data driven clustering example and discuss its applications.\nThis paper proposes a model for combination of external and internal stimuli for the action selection in an autonomous agent, based in an action selection mechanism previously proposed by the authors. This combination model includes additive and multiplicative elements, which allows to incorporate new properties, which enhance the action selection. A given parameter a, which is part of the proposed model, allows to regulate the degree of dependence of the observed external behaviour from the internal states of the entity.\nThis paper investigates the use of different Artificial Intelligence methods to predict the values of several continuous variables from a Steam Generator. The objective was to determine how the different artificial intelligence methods performed in making predictions on the given dataset. The artificial intelligence methods evaluated were Neural Networks, Support Vector Machines, and Adaptive Neuro-Fuzzy Inference Systems. The types of neural networks investigated were Multi-Layer Perceptions, and Radial Basis Function. Bayesian and committee techniques were applied to these neural networks. Each of the AI methods considered was simulated in Matlab. The results of the simulations showed that all the AI methods were capable of predicting the Steam Generator data reasonably accurately. 
However, the Adaptive Neuro-Fuzzy Inference system out performed the other methods in terms of accuracy and ease of implementation, while still achieving a fast execution time as well as a reasonable training time.\nMicroarray is one of the essential technologies used by the biologist to measure genome-wide expression levels of genes in a particular organism under some particular conditions or stimuli. As microarrays technologies have become more prevalent, the challenges of analyzing these data for getting better insight about biological processes have essentially increased. Due to availability of artificial intelligence based sophisticated computational techniques, such as artificial neural networks, fuzzy logic, genetic algorithms, and many other nature-inspired algorithms, it is possible to analyse microarray gene expression data in more better way. Here, we reviewed artificial intelligence based techniques for the analysis of microarray gene expression data. Further, challenges in the field and future work direction have also been suggested.\nThe 7th Symposium on Educational Advances in Artificial Intelligence (EAAI'17, co-chaired by Sven Koenig and Eric Eaton) launched the EAAI New and Future AI Educator Program to support the training of early-career university faculty, secondary school faculty, and future educators (PhD candidates or postdocs who intend a career in academia). As part of the program, awardees were asked to address one of the following \"blue sky\" questions:   * How could/should Artificial Intelligence (AI) courses incorporate ethics into the curriculum?   * How could we teach AI topics at an early undergraduate or a secondary school level?   * AI has the potential for broad impact to numerous disciplines. How could we make AI education more interdisciplinary, specifically to benefit non-engineering fields?   
This paper is a collection of their responses, intended to help motivate discussion around these issues in AI education.\nArtificial intelligence methods have often been applied to perform specific functions or tasks in the cyber-defense realm. However, as adversary methods become more complex and difficult to divine, piecemeal efforts to understand cyber-attacks, and malware-based attacks in particular, are not providing sufficient means for malware analysts to understand the past, present and future characteristics of malware.   In this paper, we present the Malware Analysis and Attributed using Genetic Information (MAAGI) system. The underlying idea behind the MAAGI system is that there are strong similarities between malware behavior and biological organism behavior, and applying biologically inspired methods to corpora of malware can help analysts better understand the ecosystem of malware attacks. Due to the sophistication of the malware and the analysis, the MAAGI system relies heavily on artificial intelligence techniques to provide this capability. It has already yielded promising results over its development life, and will hopefully inspire more integration between the artificial intelligence and cyber--defense communities.\nWe consider the fundamental question: how a legacy \"student\" Artificial Intelligent (AI) system could learn from a legacy \"teacher\" AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources. Here \"learning\" is understood as an ability of one system to mimic responses of the other and vice-versa. We call such learning an Artificial Intelligence knowledge transfer. 
We show that if internal variables of the \"student\" Artificial Intelligent system have the structure of an $n$-dimensional topological vector space and $n$ is sufficiently high then, with probability close to one, the required knowledge transfer can be implemented by simple cascades of linear functionals. In particular, for $n$ sufficiently large, with probability close to one, the \"student\" system can successfully and non-iteratively learn $k\\ll n$ new examples from the \"teacher\" (or correct the same number of mistakes) at the cost of two additional inner products. The concept is illustrated with an example of knowledge transfer from a pre-trained convolutional neural network to a simple linear classifier with HOG features.\nWe propose to use a supervised machine learning technique to track the location of a mobile agent in real time. Hidden Markov Models are used to build artificial intelligence that estimates the unknown position of a mobile target moving in a defined environment. This narrow artificial intelligence performs two distinct tasks. First, it provides real-time estimation of the mobile agent's position using the forward algorithm. Second, it uses the Baum-Welch algorithm as a statistical learning tool to gain knowledge of the mobile target. Finally, an experimental environment is proposed, namely a video game that we use to test our artificial intelligence. We present statistical and graphical results to illustrate the efficiency of our method.\nThe elusive quest for intelligence in artificial intelligence prompts us to consider that instituting human-level intelligence in systems may be (still) in the realm of utopia. In about a quarter century, we have witnessed the winter of AI (1990) being transformed and transported to the zenith of tabloid fodder about AI (2015). The discussion at hand is about the elements that constitute the canonical idea of intelligence. 
The delivery of intelligence as a pay-per-use-service, popping out of an app or from a shrink-wrapped software defined point solution, is in contrast to the bio-inspired view of intelligence as an outcome, perhaps formed from a tapestry of events, cross-pollinated by instances, each with its own microcosm of experiences and learning, which may not be discrete all-or-none functions but continuous, over space and time. The enterprise world may not require, aspire or desire such an engaged solution to improve its services for enabling digital transformation through the deployment of digital twins, for example. One might ask whether the \"work-flow on steroids\" version of decision support may suffice for intelligence? Are we harking back to the era of rule based expert systems? The image conjured by the publicity machines offers deep solutions with human-level AI and preposterous claims about capturing the \"brain in a box\" by 2020. Even emulating insects may be difficult in terms of real progress. Perhaps we can try to focus on worms (Caenorhabditis elegans) which may be better suited for what business needs to quench its thirst for so-called intelligence in AI.\nThis paper summarises how the \"SP theory of intelligence\" and its realisation in the \"SP computer model\" simplifies and integrates concepts across artificial intelligence and related areas, and thus provides a promising foundation for the development of a general, human-level thinking machine, in accordance with the main goal of research in artificial general intelligence.   The key to this simplification and integration is the powerful concept of \"multiple alignment\", borrowed and adapted from bioinformatics. This concept has the potential to be the \"double helix\" of intelligence, with as much significance for human-level intelligence as has DNA for biological sciences.   
Strengths of the SP system include: versatility in the representation of diverse kinds of knowledge; versatility in aspects of intelligence (including: strengths in unsupervised learning; the processing of natural language; pattern recognition at multiple levels of abstraction that is robust in the face of errors in data; several kinds of reasoning (including: one-step `deductive' reasoning; chains of reasoning; abductive reasoning; reasoning with probabilistic networks and trees; reasoning with 'rules'; nonmonotonic reasoning and reasoning with default values; Bayesian reasoning with 'explaining away'; and more); planning; problem solving; and more); seamless integration of diverse kinds of knowledge and diverse aspects of intelligence in any combination; and potential for application in several areas (including: helping to solve nine problems with big data; helping to develop human-level intelligence in autonomous robots; serving as a database with intelligence and with versatility in the representation and integration of several forms of knowledge; serving as a vehicle for medical knowledge and as an aid to medical diagnosis; and several more).\nThe SP theory of computing and cognition, described in previous publications, is an attractive model for intelligent databases because it provides a simple but versatile format for different kinds of knowledge, it has capabilities in artificial intelligence, and it can also function like established database models when that is required.   This paper describes how the SP model can emulate other models used in database applications and compares the SP model with those other models. The artificial intelligence capabilities of the SP model are reviewed and its relationship with other artificial intelligence systems is described. 
Also considered are ways in which current prototypes may be translated into an 'industrial strength' working system.\nSince the Turing test was first proposed by Alan Turing in 1950, the primary goal of artificial intelligence has been predicated on the ability for computers to imitate human behavior. However, the majority of uses for the computer can be said to fall outside the domain of human abilities and it is exactly outside of this domain where computers have demonstrated their greatest contribution to intelligence. Another goal for artificial intelligence is one that is not predicated on human mimicry, but instead, on human amplification. This article surveys various systems that contribute to the advancement of human and social intelligence.\nPeople sometimes worry about the Singularity [Vinge, 1993; Kurzweil, 2005], or about the world being taken over by artificially intelligent robots. I believe the risks of these are very small. However, few people recognize that we already share our world with artificial creatures that participate as intelligent agents in our society: corporations. Our planet is inhabited by two distinct kinds of intelligent beings --- individual humans and corporate entities --- whose natures and interests are intimately linked. To co-exist well, we need to find ways to define the rights and responsibilities of both individual humans and corporate entities, and to find ways to ensure that corporate entities behave as responsible members of society.\nReinforcement learning (RL) is a general paradigm for studying intelligent behaviour, with applications ranging from artificial intelligence to psychology and economics. AIXI is a universal solution to the RL problem; it can learn any computable environment. A technical subtlety of AIXI is that it is defined using a mixture over semimeasures that need not sum to 1, rather than over proper probability measures. 
In this work we argue that the shortfall of a semimeasure can naturally be interpreted as the agent's estimate of the probability of its death. We formally define death for generally intelligent agents like AIXI, and prove a number of related theorems about their behaviour. Notable discoveries include that agent behaviour can change radically under positive linear transformations of the reward signal (from suicidal to dogmatically self-preserving), and that the agent's posterior belief that it will survive increases over time.\nArtificial General Intelligence is a field of research aiming to distill the principles of intelligence that operate independently of a specific problem domain or a predefined context and utilize these principles in order to synthesize systems capable of performing any intellectual task a human being is capable of and eventually go beyond that. While \"narrow\" artificial intelligence which focuses on solving specific problems such as speech recognition, text comprehension, visual pattern recognition, robotic motion, etc. has shown quite a few impressive breakthroughs lately, understanding general intelligence remains elusive. In the paper we offer a novel theoretical approach to understanding general intelligence. We start with a brief introduction of the current conceptual approach. Our critique exposes a number of serious limitations that are traced back to the ontological roots of the concept of intelligence. We then propose a paradigm shift from intelligence perceived as a competence of individual agents defined in relation to an a priori given problem domain or a goal, to intelligence perceived as a formative process of self-organization by which intelligent agents are individuated. We call this process open-ended intelligence. Open-ended intelligence is developed as an abstraction of the process of cognitive development so its application can be extended to general agents and systems. 
We introduce and discuss three facets of the idea: the philosophical concept of individuation, sense-making and the individuation of general cognitive agents. We further show how open-ended intelligence can be framed in terms of a distributed, self-organizing network of interacting elements and how such a process is scalable. The framework highlights an important relation between coordination and intelligence and a new understanding of values. We conclude with a number of questions for future research.\nThis article deals with the links between the enaction paradigm and artificial intelligence. Enaction is considered a metaphor for artificial intelligence, as a number of the notions which it deals with are deemed incompatible with the phenomenal field of the virtual. After explaining this stance, we shall review previous works regarding this issue in terms of artificial life and robotics. We shall focus on the lack of recognition of co-evolution at the heart of these approaches. We propose to explicitly integrate the evolution of the environment into our approach in order to refine the ontogenesis of the artificial system, and to compare it with the enaction paradigm. The growing complexity of the ontogenetic mechanisms to be activated can therefore be compensated by an interactive guidance system emanating from the environment. This proposition does not however resolve that of the relevance of the meaning created by the machine (sense-making). Such reflections lead us to integrate human interaction into this environment in order to construct relevant meaning in terms of participative artificial intelligence. This raises a number of questions with regard to setting up an enactive interaction.
The article concludes by exploring a number of issues, thereby enabling us to associate current approaches with the principles of morphogenesis, guidance, the phenomenology of interactions and the use of minimal enactive interfaces in setting up experiments which will deal with the problem of artificial intelligence in a variety of enaction-based ways.\nSequential decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental prior probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown prior distribution. We combine both ideas and get a parameter-free theory of universal Artificial Intelligence. We give strong arguments that the resulting AIXI model is the most intelligent unbiased agent possible. We outline how the AIXI model can formally solve a number of problem classes, including sequence prediction, strategic games, function minimization, reinforcement and supervised learning. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXItl that is still effectively more intelligent than any other time t and length l bounded agent. The computation time of AIXItl is of the order t x 2^l. The discussion includes formal definitions of intelligence order relations, the horizon problem and relations of the AIXI theory to other AI approaches.\nOur understanding of intelligence is directed primarily at the level of human beings. This paper attempts to give a more unifying definition that can be applied to the natural world in general. The definition would be used more to verify a degree of intelligence, not to quantify it and might help when making judgements on the matter. A version of an accepted test for AI is then put forward as the 'acid test' for Artificial Intelligence itself. It might be what a free-thinking program or robot would try to achieve. 
Recent work by the author on AI has been more from a direction of mechanical processes, or ones that might operate automatically. This paper will not try to question the idea of intelligence, in the sense of a pro-active or conscious event, but try to put it into a more passive, automatic and mechanical context. The paper also suggests looking at intelligence and consciousness as being slightly different.\nThis paper looks at Turing's postulations about Artificial Intelligence in his paper 'Computing Machinery and Intelligence', published in 1950. It notes how accurate they were and how relevant they still are today. This paper notes the arguments and mechanisms that he suggested and tries to expand on them further. The paper however is mostly about describing the essential ingredients for building an intelligent model and the problems related with that. The discussion includes recent work by the author himself, who adds his own thoughts on the matter that come from a purely technical investigation into the problem. These are personal and quite speculative, but provide an interesting insight into the mechanisms that might be used for building an intelligent system.\nThere is overwhelming evidence that human intelligence is a product of Darwinian evolution. Investigating the consequences of self-modification, and more precisely, the consequences of utility function self-modification, leads to the stronger claim that not only human, but any form of intelligence is ultimately only possible within evolutionary processes. Human-designed artificial intelligences can only remain stable until they discover how to manipulate their own utility function. By definition, a human designer cannot prevent a superhuman intelligence from modifying itself, even if protection mechanisms against this action are put in place. Without evolutionary pressure, sufficiently advanced artificial intelligences become inert by simplifying their own utility function. 
Within evolutionary processes, the implicit utility function is always reducible to persistence, and the control of superhuman intelligences embedded in evolutionary processes is not possible. Mechanisms against utility function self-modification are ultimately futile. Instead, scientific effort toward the mitigation of existential risks from the development of superintelligences should be in two directions: understanding consciousness, and the complex dynamics of evolutionary systems.\nIn Intelligence Analysis it is of vital importance to manage uncertainty. Intelligence data is almost always uncertain and incomplete, making it necessary to reason and take decisions under uncertainty. One way to manage the uncertainty in Intelligence Analysis is Dempster-Shafer Theory. This thesis contains five results regarding multiple target tracks and intelligence specification.\nIn this paper we develop an evidential force aggregation method intended for classification of evidential intelligence into recognized force structures. We assume that the intelligence has already been partitioned into clusters and use the classification method individually in each cluster. The classification is based on a measure of fitness between template and fused intelligence that makes it possible to handle intelligence reports with multiple nonspecific and uncertain propositions. With this measure we can aggregate on a level-by-level basis, starting from general intelligence to achieve a complete force structure with recognized units on all hierarchical levels.\nSimilar to intelligent multicellular neural networks controlling human brains, even single cells surprisingly are able to make intelligent decisions to classify several external stimuli or to associate them. This happens because gene regulatory networks can perform as perceptrons, simple intelligent schemes known from studies on Artificial Intelligence.
We study the role of genetic noise in intelligent decision making at the genetic level and show that noise can play a constructive role in helping cells to make a proper decision. We show this using the example of a simple genetic classifier able to classify two external stimuli.\nThis paper assumes the hypothesis that human learning is perception based, and consequently, the learning process and perceptions should not be represented and investigated independently or modeled in different simulation spaces. In order to keep the analogy between the artificial and human learning, the former is assumed here as being based on the artificial perception. Hence, instead of choosing to apply or develop a Computational Theory of (human) Perceptions, we choose to mirror the human perceptions in a numeric (computational) space as artificial perceptions and to analyze the interdependence between artificial learning and artificial perception in the same numeric space, using one of the simplest tools of Artificial Intelligence and Soft Computing, namely the perceptrons. As practical applications, we choose to work around two examples: Optical Character Recognition and Iris Recognition. In both cases a simple Turing test shows that artificial perceptions of the difference between two characters and between two irides are fuzzy, whereas the corresponding human perceptions are, in fact, crisp.\nToday, available methods that assess AI systems are focused on using empirical techniques to measure the performance of algorithms in some specific tasks (e.g., playing chess, solving mazes or landing a helicopter). However, these methods are not appropriate if we want to evaluate the general intelligence of AI and, even less, if we compare it with human intelligence. The ANYNT project has designed a new method of evaluation that tries to assess AI systems using well known computational notions and problems which are as general as possible.
This new method serves to assess general intelligence (which allows us to learn how to solve any new kind of problem we face) and not only to evaluate performance on a set of specific tasks. This method focuses not only on measuring the intelligence of algorithms but also on assessing any intelligent system (human beings, animals, AI, aliens?,...), letting us place their results on the same scale and, therefore, compare them. This new approach will allow us (in the future) to evaluate and compare any kind of intelligent system, whether already known or yet to be built or found, be it artificial or biological. This master thesis aims at ensuring that this new method provides consistent results when evaluating AI algorithms; this is done through the design and implementation of prototypes of universal intelligence tests and their application to different intelligent systems (AI algorithms and human beings). From the study we analyze whether the results obtained by two different intelligent systems are properly located on the same scale and we propose changes and refinements to these prototypes in order to, in the future, be able to achieve a truly universal intelligence test.\nOPUS is a branch and bound search algorithm that enables efficient admissible search through spaces for which the order of search operator application is not significant. The algorithm's search efficiency is demonstrated with respect to very large machine learning search spaces. The use of admissible search is of potential value to the machine learning community as it means that the exact learning biases to be employed for complex learning tasks can be precisely specified and manipulated. OPUS also has potential for application in other areas of artificial intelligence, notably, truth maintenance.\nThis paper applies resolution theorem proving to natural language semantics.
The aim is to circumvent the computational complexity triggered by natural language ambiguities like pronoun binding, by interleaving pronoun binding with resolution deduction. Therefore disambiguation is only applied to expressions that actually occur during derivations.\nWe introduce a logic for reasoning about evidence that essentially views evidence as a function from prior beliefs (before making an observation) to posterior beliefs (after making the observation). We provide a sound and complete axiomatization for the logic, and consider the complexity of the decision problem. Although the reasoning in the logic is mainly propositional, we allow variables representing numbers and quantification over them. This expressive power seems necessary to capture important properties of evidence.\nThis paper overviews the basic principles and recent advances in the emerging field of Quantum Computation (QC), highlighting its potential application to Artificial Intelligence (AI). The paper provides a very brief introduction to basic QC issues like quantum registers, quantum gates and quantum algorithms and then it presents references, ideas and research guidelines on how QC can be used to deal with some basic AI problems, such as search and pattern matching, as soon as quantum computers become widely available.\nIn a certain number of situations, human cognitive functioning is difficult to represent with classical artificial intelligence structures. Such a difficulty arises in polyneuropathy diagnosis, which is based on the spatial distribution, along the nerve fibres, of lesions, together with the synthesis of several partial diagnoses. Faced with this problem while building up an expert system (NEUROP), we developed a heterogeneous knowledge representation associating a finite automaton with first order logic.
A number of knowledge representation problems raised by the electromyography test features are examined in this study and the expert system architecture allowing such knowledge modeling is laid out.\nIn Peña (2007), MCMC sampling is applied to approximately calculate the ratio of essential graphs (EGs) to directed acyclic graphs (DAGs) for up to 20 nodes. In the present paper, we extend that work from 20 to 31 nodes. We also extend that work by computing the approximate ratio of connected EGs to connected DAGs, of connected EGs to EGs, and of connected DAGs to DAGs. Furthermore, we prove that the latter ratio is asymptotically 1. We also discuss the implications of these results for learning DAGs from data.\nWe examine the computational complexity of testing and finding small plans in probabilistic planning domains with succinct representations. We find that many problems of interest are complete for a variety of complexity classes: NP, co-NP, PP, NP^PP, co-NP^PP, and PSPACE. Of these, the probabilistic classes PP and NP^PP are likely to be of special interest in the field of uncertainty in artificial intelligence and are deserving of additional study. These results suggest a fruitful direction of future algorithmic development.\nA probabilistic conceptual network is a knowledge representation scheme designed for reasoning about concepts and categorical abstractions in utility-based categorization. The scheme combines the formalisms of abstraction and inheritance hierarchies from artificial intelligence, and probabilistic networks from decision analysis. It provides a common framework for representing conceptual knowledge, hierarchical knowledge, and uncertainty. It facilitates dynamic construction of categorization decision models at varying levels of abstraction.
The scheme is applied to an automated machining problem for reasoning about the state of the machine at varying levels of abstraction in support of actions for maintaining competitiveness of the plant.\nEvidential reasoning is now a leading topic in Artificial Intelligence. Evidence is represented by a variety of evidential functions. Evidential reasoning is carried out by certain kinds of fundamental operations on these functions. This paper discusses two of the basic operations on evidential functions, the discount operation and the well-known orthogonal sum operation. We show that the discount operation is not commutative with the orthogonal sum operation, and derive expressions for the two operations applied to the various evidential functions.\nWe present initial ideas for a programming paradigm based on simulation that is targeted towards applications of artificial intelligence (AI). The approach aims at integrating techniques from different areas of AI and is based on the idea that simulated entities may freely exchange data and behavioural patterns. We define basic notions of a simulation-based programming paradigm and show how it can be used for implementing AI applications.\nThis paper describes the incorporation of uncertainty in diagnostic reasoning based on the set covering model of Reggia et al. extended to what in the Artificial Intelligence dichotomy between deep and compiled (shallow, surface) knowledge based diagnosis may be viewed as the generic form at the compiled end of the spectrum. A major undercurrent in this is advocating the need for a strong underlying model and an integrated set of support tools for carrying such a model in order to deal with uncertainty.\nVarious properties of relative entropy have led to its widespread use in information theory. These properties suggest that relative entropy has a role to play in systems that attempt to perform inference in terms of probability distributions.
In this paper, I will review some basic properties of relative entropy as well as its role in probabilistic inference. I will also mention briefly a few existing and potential applications of relative entropy to so-called artificial intelligence (AI).\nIn the field of Artificial Intelligence, traditional approaches to choosing moves in games involve the use of the minimax algorithm. However, recent research results indicate that minimaxing may not always be the best approach. In this paper we summarize the results of some measurements on several model games with several different evaluation functions. These measurements, which are presented in detail in [NPT], show that there are some new algorithms that can make significantly better use of evaluation function values than the minimax algorithm does.\nSuccess in the quest for artificial intelligence has the potential to bring unprecedented benefits to humanity, and it is therefore worthwhile to investigate how to maximize these benefits while avoiding potential pitfalls. This article gives numerous examples (which should by no means be construed as an exhaustive list) of such worthwhile research aimed at ensuring that AI remains robust and beneficial.\nThis paper considers the computational power of constant-size dynamic Bayesian networks. Although discrete dynamic Bayesian networks are no more powerful than hidden Markov models, dynamic Bayesian networks with continuous random variables and discrete children of continuous parents are capable of performing Turing-complete computation. With modified versions of existing algorithms for belief propagation, such a simulation can be carried out in real time. This result suggests that dynamic Bayesian networks may be more powerful than previously considered. Relationships to causal models and recurrent neural networks are also discussed.\nIn recent years prominent intellectuals have raised ethical concerns about the consequences of artificial intelligence.
One concern is that an autonomous agent might modify itself to become \"superintelligent\" and, in supremely effective pursuit of poorly specified goals, destroy all of humanity. This paper considers and rejects the possibility of this outcome. We argue that this scenario depends on an agent's ability to rapidly improve its ability to predict its environment through self-modification. Using a Bayesian model of a reasoning agent, we show that there are important limitations to how an agent may improve its predictive ability through self-modification alone. We conclude that concern about this artificial intelligence outcome is misplaced and better directed at policy questions around data access and storage.\nGeneral Video Game Artificial Intelligence is a general game playing framework for Artificial General Intelligence research in the video-games domain. In this paper, we propose for the first time a screen capture learning agent for General Video Game AI framework. A Deep Q-Network algorithm was applied and improved to develop an agent capable of learning to play different games in the framework. After testing this algorithm using various games of different categories and difficulty levels, the results suggest that our proposed screen capture learning agent has the potential to learn many different games using only a single learning algorithm.\nMulti-agent path finding (MAPF) is a well-studied problem in artificial intelligence, where one needs to find collision-free paths for agents with given start and goal locations. In video games, agents of different types often form teams. 
In this paper, we demonstrate the usefulness of MAPF algorithms from artificial intelligence for moving such non-homogeneous teams in congested video game environments.\nHere we examine the paperclip apocalypse concern for artificial general intelligence (or AGI) whereby a superintelligent AI with a simple goal (i.e., producing paperclips) accumulates power so that all resources are devoted towards that simple goal and are unavailable for any other use. We provide conditions under which a paperclip apocalypse can arise but also show that, under certain architectures for recursive self-improvement of AIs, a paperclip AI may refrain from allowing power capabilities to be developed. The reason is that such developments pose the same control problem for the AI as they do for humans (over AIs) and hence threaten to deprive it of resources for its primary goal.\nThe welfare of modern societies has been intrinsically linked to wage labour. With some exceptions, the modern human has to sell her labour-power to be able to reproduce biologically and socially. Thus, a lingering fear of technological unemployment features predominantly as a theme among Artificial Intelligence researchers. In this short paper we show that, if past trends are anything to go by, this fear is irrational. On the contrary, we argue that the main problem humanity will be facing is the normalisation of extremely long working hours.\nOne of the main research areas in Artificial Intelligence is the coding of agents (programs) which are able to learn by themselves in any situation. This means that agents must be useful for purposes other than those they were created for, as, for example, playing chess. In this way we try to get closer to the pristine goal of Artificial Intelligence. One of the problems in deciding whether an agent is really intelligent or not is the measurement of its intelligence, since there is currently no way to measure it in a reliable way.
The purpose of this project is to create an interpreter that allows for the execution of several environments, including those which are generated randomly, so that an agent (a person or a program) can interact with them. Once the interaction between the agent and the environment is over, the interpreter will measure the intelligence of the agent according to the actions, states and rewards the agent has undergone inside the environment during the test. As a result we will be able to measure agents' intelligence in any possible environment, and to make comparisons between several agents, in order to determine which of them is the most intelligent. In order to perform the tests, the interpreter must be able to randomly generate environments that are really useful to measure agents' intelligence, since not every randomly generated environment will serve that purpose.\nEfficiently entering information into a computer is key to enjoying the benefits of computing. This paper describes three intelligent user interfaces: handwriting recognition, adaptive menus, and predictive fillin. In the context of adding a person's name and address to an electronic organizer, tests show handwriting recognition is slower than typing on an on-screen, soft keyboard, while adaptive menus and predictive fillin can be twice as fast. This paper also presents strategies for applying these three interfaces to other information collection domains.\nHow can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) is a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward -- the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function.
The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent's actions. The constraint is defined in terms of the agent's belief distributions, and does not require an explicit specification of which actions constitute wireheading.\nAlthough the problem of a critique of robotic behavior in near-unanimous agreement to human norms seems intractable, a starting point of such an ambition is a framework of the collection of knowledge a priori and experience a posteriori categorized as a set of synthetical judgments available to the intelligence, translated into computer code. If such a proposal were successful, an algorithm with ethically traceable behavior and cogent equivalence to human cognition is established. This paper will propose the application of Kant's critique of reason to current programming constructs of an autonomous intelligent system.\nTheoretical analysis of machine intelligence (MI) is useful for defining a common platform in both theoretical and applied artificial intelligence (AI). The goal of this paper is to set canonical definitions that can assist pragmatic research in both strong and weak AI. Described epistemological features of machine intelligence include relationship between intelligent behavior, intelligent and unintelligent machine characteristics, observable and unobservable entities and classification of intelligence. The paper also establishes algebraic definitions of efficiency and accuracy of MI tests as their quality measure. The last part of the paper addresses the learning process with respect to the traditional epistemology and the epistemology of MI described here. 
The proposed views on MI positively correlate with the Hegelian monistic epistemology and contribute towards amalgamating idealistic deliberations with the AI theory, particularly in a local frame of reference.\nA definition of Artificial Intelligence was proposed in [1] but this definition was not absolutely formal at least because the word \"Human\" was used. In this paper we will formalize the definition from [1]. The biggest problem in this definition was that the level of intelligence of AI is compared to the intelligence of a human being. In order to change this we will introduce some parameters on which AI will depend. One of these parameters will be the level of intelligence and we will define one AI for each level of intelligence. We assume that for some level of intelligence the respective AI will be more intelligent than a human being. Nevertheless, we cannot say which level this is because we cannot calculate its exact value.\nStream computing is the use of multiple autonomic and parallel modules together with integrative processors at a higher level of abstraction to embody \"intelligent\" processing. The biological basis of this computing is sketched and the matter of learning is examined.\nIn this article, we describe the fuzzy logic, fuzzy language and algorithms as the basis of fuzzy reasoning, one of the intelligent information processing methods, and then describe the general fuzzy reasoning method.\nThe vision systems of the eagle and the snake outperform everything that we can make in the laboratory, but snakes and eagles cannot build an eyeglass or a telescope or a microscope. (Judea Pearl)\nArtificial Chemistries (ACs) are symbolic chemical metaphors for the exploration of Artificial Life, with specific focus on the origin of life.
In this work we define a P system based artificial graph chemistry to understand the principles leading to the evolution of life-like structures in an AC set up and to develop a unified framework to characterize and classify symbolic artificial chemistries by devising appropriate formalism to capture semantic and organizational information. An extension of P system is considered by associating probabilities with the rules providing the topological framework for the evolution of a labeled undirected graph based molecular reaction semantics.\nResearch on human self-regulation has shown that people hold many goals simultaneously and have complex self-regulation mechanisms to deal with this goal conflict. Artificial autonomous systems may also need to find ways to cope with conflicting goals. Indeed, the intricate interplay among different goals may be critical to the design as well as long-term safety and stability of artificial autonomous systems. I discuss some of the critical features of the human self-regulation system and how it might be applied to an artificial system. Furthermore, the implications of goal conflict for the reliability and stability of artificial autonomous systems and ensuring their alignment with human goals and ethics is examined.\nWith the advances in information technology (IT) criminals are using cyberspace to commit numerous cyber crimes. Cyber infrastructures are highly vulnerable to intrusions and other threats. Physical devices and human intervention are not sufficient for monitoring and protection of these infrastructures; hence, there is a need for more sophisticated cyber defense systems that need to be flexible, adaptable and robust, and able to detect a wide variety of threats and make intelligent real-time decisions. Numerous bio-inspired computing methods of Artificial Intelligence have been increasingly playing an important role in cyber crime detection and prevention. 
The purpose of this study is to present advances made so far in the field of applying AI techniques for combating cyber crimes, to demonstrate how these techniques can be an effective tool for detection and prevention of cyber attacks, as well as to outline the scope for future work.\nIn this paper, the idea of a new artificial intelligence based optimization algorithm, which is inspired by the nature of vortices, is briefly presented. As a bio-inspired computation algorithm, it focuses on typical vortex flow/behavior in nature and draws on some dynamics that occur in vortices. The algorithm is also a swarm-oriented evolutionary problem-solving approach, because it includes many methods related to eliminating weak swarm members and to improving the solution process by supporting the solution space with new swarm members. To get a better idea of the algorithm's success, it has been tested on some benchmark functions. The obtained results show that the algorithm can be an alternative to existing single-objective optimization approaches in the literature. Vortex Optimization Algorithm (VOA) is the name suggested by the authors for this new intelligent optimization approach.\nDecomposable dependency models possess a number of interesting and useful properties. This paper presents new characterizations of decomposable models in terms of independence relationships, which are obtained by adding a single axiom to the well-known set characterizing dependency models that are isomorphic to undirected graphs. We also briefly discuss a potential application of our results to the problem of learning graphical models from data.\nA general notion of algebraic conditional plausibility measures is defined.
Probability measures, ranking functions, possibility measures, and (under the appropriate definitions) sets of probability measures can all be viewed as defining algebraic conditional plausibility measures. It is shown that algebraic conditional plausibility measures can be represented using Bayesian networks.\nThis paper presents a review of instantaneously trained neural networks (ITNNs). These networks trade learning time for size and, in the basic model, a new hidden node is created for each training sample. Various versions of the corner-classification family of ITNNs, which have found applications in artificial intelligence (AI), are described. Implementation issues are also considered.\nIn game theory and artificial intelligence, decision making models often involve maximizing expected utility, which does not respect ordinal invariance. In this paper, the author discusses the possibility of preserving ordinal invariance and still making a rational decision under uncertainty.\nNeurons are individually translated into simple gates to plan a brain based on human psychology and intelligence. State machines, assumed previously learned in subconscious associative memory are shown to enable equation solving and rudimentary thinking using nanoprocessing within short term memory.\nI consider the problem of learning an optimal path graphical model from data and show the problem to be NP-hard for the maximum likelihood and minimum description length approaches and a Bayesian approach. This hardness result holds despite the fact that the problem is a restriction of the polynomially solvable problem of finding the optimal tree graphical model.\nThe SHOP2 planning system received one of the awards for distinguished performance in the 2002 International Planning Competition. 
This paper describes the features of SHOP2 which enabled it to excel in the competition, especially those aspects of SHOP2 that deal with temporal and metric planning domains.\nWe address the problem of propositional logic-based abduction, i.e., the problem of searching for a best explanation for a given propositional observation according to a given propositional knowledge base. We give a general algorithm, based on the notion of projection; then we study restrictions over the representations of the knowledge base and of the query, and find new polynomial classes of abduction problems.\nArtificial general intelligence aims to create agents capable of learning to solve arbitrary interesting problems. We define two versions of asymptotic optimality and prove that no agent can satisfy the strong version while in some cases, depending on discounting, there does exist a non-computable weak asymptotically optimal agent.\nWe present a partial-order, conformant, probabilistic planner, Probapop, which competed in the blind track of the Probabilistic Planning Competition in IPC-4. We explain how we adapt distance based heuristics for use with probabilistic domains. Probapop also incorporates heuristics based on probability of success. We explain the successes and difficulties encountered during the design and implementation of Probapop.\nAcyclic directed mixed graphs, also known as semi-Markov models, represent the conditional independence structure induced on an observed margin by a DAG model with latent variables. In this paper we present the first method for fitting these models to binary data using maximum likelihood estimation.\nThis paper addresses the problem of measurement errors in causal inference and highlights several algebraic and graphical methods for eliminating systematic bias induced by such errors. 
In particular, the paper discusses the control of partially observable confounders in parametric and non parametric models and the computational problem of obtaining bias-free effect estimates in such models.\nSymmetry is an important problem in many combinatorial problems. One way of dealing with symmetry is to add constraints that eliminate symmetric solutions. We survey recent results in this area, focusing especially on two common and useful cases: symmetry breaking constraints for row and column symmetry, and symmetry breaking constraints for eliminating value symmetry.\nThis paper presents complexity analysis and variational methods for inference in probabilistic description logics featuring Boolean operators, quantification, qualified number restrictions, nominals, inverse roles and role hierarchies. Inference is shown to be PEXP-complete, and variational methods are designed so as to exploit logical inference whenever possible.\nWe address the problem of identifying dynamic sequential plans in the framework of causal Bayesian networks, and show that the problem is reduced to identifying causal effects, for which there are complete identification algorithms available in the literature.\nWe present a graphical criterion for reading dependencies from the minimal directed independence map G of a graphoid p when G is a polytree and p satisfies composition and weak transitivity. We prove that the criterion is sound and complete. We argue that assuming composition and weak transitivity is not too restrictive.\nThis paper deals with the problem of identifying direct causal effects in recursive linear structural equation models. The paper establishes a sufficient criterion for identifying individual causal effects and provides a procedure for computing identified causal effects in terms of the observed covariance matrix.\nWe derive novel sufficient conditions for convergence of Loopy Belief Propagation (also known as the Sum-Product algorithm) to a unique fixed point. 
Our results improve upon previously known conditions. For binary variables with (anti-)ferromagnetic interactions, our conditions seem to be sharp.\nWe present a fuzzy version of description logics with concrete domains. Main features are: (i) concept constructors are based on t-norm, t-conorm, negation and implication; (ii) concrete domains are fuzzy sets; (iii) fuzzy modifiers are allowed; and (iv) the reasoning algorithm is based on a mixture of completion rules and bounded mixed integer programming.\nWe use optimism to introduce generic asymptotically optimal reinforcement learning agents. They achieve, with an arbitrary finite or compact class of environments, asymptotically optimal behavior. Furthermore, in the finite deterministic case we provide finite error bounds.\nWe formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a log-linear model, or other more general exponential models. This result generalizes the well known Hammersley-Clifford Theorem.\nWe are interested in the problem of planning for factored POMDPs. Building on the recent results of Kearns, Mansour and Ng, we provide a planning algorithm for factored POMDPs that exploits the accuracy-efficiency tradeoff in the belief state simplification introduced by Boyen and Koller.\nA control algorithm for batch processing of mixed waste is proposed based on conditional Gaussian Bayesian networks. The network is compiled during batch staging for real-time response to sensor input.\nIn the last decade, several architectures have been proposed for exact computation of marginals using local computation. 
In this paper, we compare three architectures - Lauritzen-Spiegelhalter, Hugin, and Shenoy-Shafer - from the perspective of graphical structure for message propagation, message-passing scheme, computational efficiency, and storage efficiency.\nThere is much interest in using partially observable Markov decision processes (POMDPs) as a formal model for planning in stochastic domains. This paper is concerned with finding optimal policies for POMDPs. We propose several improvements to incremental pruning, presently the most efficient exact algorithm for solving POMDPs.\nA modelling language is described which is suitable for the correlation of information when the underlying functional model of the system is incomplete or uncertain and the temporal dependencies are imprecise. An efficient and incremental implementation is outlined which depends on cost functions satisfying certain criteria. Possibilistic logic and probability theory (as it is used in the applications targeted) satisfy these criteria.\nThis paper describes a class of probabilistic approximation algorithms based on bucket elimination which offer adjustable levels of accuracy and efficiency. We analyze the approximation for several tasks: finding the most probable explanation, belief updating and finding the maximum a posteriori hypothesis. We identify regions of completeness and provide preliminary empirical evaluation on randomly generated networks.\nPoole has shown that nonmonotonic logics do not handle the lottery paradox correctly. In this paper we will show that Pollock's theory of defeasible reasoning fails for the same reason: defeasible reasoning is incompatible with the skeptical notion of derivability.\nA stable joint plan should guarantee the achievement of a designer's goal in a multi-agent environment, while ensuring that deviations from the prescribed plan would be detected. 
We present a computational framework where stable joint plans can be studied, as well as several basic results about the representation, verification and synthesis of stable joint plans.\nThis paper is concerned with planning in stochastic domains by means of partially observable Markov decision processes (POMDPs). POMDPs are difficult to solve. This paper identifies a subclass of POMDPs called region observable POMDPs, which are easier to solve and can be used to approximate general POMDPs to arbitrary accuracy.\nLower and upper probabilities, also known as Choquet capacities, are widely used as a convenient representation for sets of probability distributions. This paper presents a graphical decomposition and exact propagation algorithm for computing marginal posteriors of 2-monotone lower probabilities (equivalently, 2-alternating upper probabilities).\nIn this paper we propose a family of algorithms combining tree-clustering with conditioning that trade space for time. Such algorithms are useful for reasoning in probabilistic and deterministic networks as well as for accomplishing optimization tasks. By analyzing the problem structure it will be possible to select from a spectrum the algorithm that best meets a given time-space specification.\nWe present deterministic techniques for computing upper and lower bounds on marginal probabilities in sigmoid and noisy-OR networks. These techniques become useful when the size of the network (or clique size) precludes exact computations. We illustrate the tightness of the bounds by numerical experiments.\nThe main goal of this paper is to describe a data structure called binary join trees that are useful in computing multiple marginals efficiently using the Shenoy-Shafer architecture. 
We define binary join trees, describe their utility, and sketch a procedure for constructing them.\nFor real-time evaluation of a Bayesian network when there is not sufficient time to obtain an exact solution, a guaranteed response time, approximate solution is required. It is shown that nontraditional methods utilizing estimators based on an archive of trial solutions and genetic search can provide an approximate solution that is considerably superior to the traditional Monte Carlo simulation methods.\nWe provide a new characterization of the Dirichlet distribution. This characterization implies that under assumptions made by several previous authors for learning belief networks, a Dirichlet prior on the parameters is inevitable.\nThis is a working paper summarizing results of an ongoing research project whose aim is to uniquely characterize the uncertainty measure for the Dempster-Shafer Theory. A set of intuitive axiomatic requirements is presented, some of their implications are shown, and the proof is given of the minimality of recently proposed measure AU among all measures satisfying the proposed requirements.\nThis paper presents correct algorithms for answering the following two questions: (i) Does there exist a causal explanation consistent with a set of background knowledge which explains all of the observed independence facts in a sample? (ii) Given that there is such a causal explanation, what are the causal relationships common to every such causal explanation?\nA completeness result for d-separation applied to discrete Bayesian networks is presented and it is shown that in a strong measure-theoretic sense almost all discrete distributions for a given network structure are faithful; i.e. 
the independence facts true of the distribution are all and only those entailed by the network structure.\nWe develop a new semantics for defeasible inference based on extended probability measures allowed to take infinitesimal values, on the interpretation of defaults as generalized conditional probability constraints and on a preferred-model implementation of entropy maximization.\nThis paper examines the \"K2\" network scoring metric of Cooper and Herskovits. It shows counterintuitive results from applying this metric to simple networks. One family of noninformative priors is suggested for assigning equal scores to equivalent networks.\nWe propose an integration of possibility theory into non-classical logics. We obtain many formal results that generalize the case where possibility and necessity functions are based on classical logic. We show how useful such an approach is by applying it to reasoning under uncertain and inconsistent information.\nWe present an approach to the solution of decision problems formulated as influence diagrams. This approach involves a special triangulation of the underlying graph, the construction of a junction tree with special properties, and a message passing algorithm operating on the junction tree for computation of expected utilities and optimal decision policies.\nWe construct the belief function that quantifies the agent's beliefs about which event of Q will occur when he knows that the event is selected by a chance set-up and that the probability function associated with the chance set-up is only partially known.\nThis paper studies the connection between probabilistic conditional independence in uncertain reasoning and data dependency in relational databases. 
As a demonstration of the usefulness of this preliminary investigation, an alternate proof is presented for refuting the conjecture suggested by Pearl and Paz that probabilistic conditional independencies have a complete axiomatization.\nThis paper describes a normative system design that incorporates diagnosis, dynamic evolution, decision making, and information gathering. A single influence diagram demonstrates the design's coherence, yet each activity is more effectively modeled and evaluated separately. Application to offshore oil platforms illustrates the design. For this application, the normative system is embedded in a real-time expert system.\nTwo algorithms are presented for \"compiling\" influence diagrams into a set of simple decision rules. These decision rules define simple-to-execute, complete, consistent, and near-optimal decision procedures. These compilation algorithms can be used to derive decision procedures for human teams solving time-constrained decision problems.\nIn order to find a causal explanation for data presented in the form of covariance and concentration matrices it is necessary to decide if the graph formed by such associations is a projection of a directed acyclic graph (dag). We show that the general problem of deciding whether such a dag exists is NP-complete.\nThis paper introduces a qualitative measure of ambiguity and analyses its relationship with other measures of uncertainty. Probability measures relative likelihoods, while ambiguity measures vagueness surrounding those judgments. Ambiguity is an important representation of uncertain knowledge. It deals with a different type of uncertainty from that modeled by subjective probability or belief.\nJeffrey's rule of conditioning has been proposed in order to revise a probability measure by another probability function. We generalize it within the framework of the models based on belief functions. 
We show that several forms of Jeffrey's conditionings can be defined that correspond to the geometrical rule of conditioning and to Dempster's rule of conditioning, respectively.\nIn this paper, the concept of possibilistic evidence, which is a possibility distribution as well as a body of evidence, is proposed over an infinite universe of discourse. The inference with possibilistic evidence is investigated based on a unified inference framework maintaining both the compatibility of concepts and the consistency of the probability logic.\nThe product expansion of conditional probabilities for belief nets is not maximum entropy. This appears to deny a desirable kind of assurance for the model. However, a kind of guarantee that is almost as strong as maximum entropy can be derived. Surprisingly, a variant model also exhibits the guarantee, and for many cases obtains a higher performance score than the product expansion.\nThis paper introduces the notion of objection-based causal networks which resemble probabilistic causal networks except that they are quantified using objections. An objection is a logical sentence and denotes a condition under which a causal dependency does not exist. Objection-based causal networks enjoy almost all the properties that make probabilistic causal networks popular, with the added advantage that objections are, arguably, more intuitive than probabilities.\nA new entropy-like measure as well as a new measure of total uncertainty pertaining to the Dempster-Shafer theory are introduced. It is argued that these measures are better justified than any of the previously proposed candidates.\nWe propose a general Bayesian network model for application in a wide class of problems of therapy monitoring. We discuss the use of stochastic simulation as a computational approach to inference on the proposed class of models. 
As an illustration we present an application to the monitoring of cytotoxic chemotherapy in breast cancer.\nUseless paths are a chronic problem for marker-passing techniques. We use a probabilistic analysis to justify a method for quickly identifying and rejecting useless paths. Using the same analysis, we identify key conditions and assumptions necessary for marker-passing to perform well.\nA reason maintenance system which extends an ATMS through Mukaidono's fuzzy logic is described. It supports a problem solver in situations affected by incomplete information and vague data, by allowing nonmonotonic inferences and the revision of previous conclusions when contradictions are detected.\nThis paper outlines a methodology for analyzing the representational support for knowledge-based decision-modeling in a broad domain. A relevant set of inference patterns and knowledge types are identified. By comparing the analysis results to existing representations, some insights are gained into a design approach for integrating categorical and uncertain knowledge in a context sensitive manner.\nThis paper proposes a new method for solving Bayesian decision problems. The method consists of representing a Bayesian decision problem as a valuation-based system and applying a fusion algorithm for solving it. The fusion algorithm is a hybrid of local computational methods for computation of marginals of joint probability distributions and the local computational methods for discrete optimization problems.\nIrrelevance-based partial MAPs are useful constructs for domain-independent explanation using belief networks. We look at two definitions for such partial MAPs, and prove important properties that are useful in designing algorithms for computing them effectively. We make use of these properties in modifying our standard MAP best-first algorithm, so as to handle irrelevance-based partial MAPs.\nThe relationship between belief networks and relational databases is examined. 
Based on this analysis, a method to construct belief networks automatically from statistical relational data is proposed. A comparison between our method and other methods shows that our method has several advantages when generalization or prediction is needed.\nA general notion of algebraic conditional plausibility measures is defined. Probability measures, ranking functions, possibility measures, and (under the appropriate definitions) sets of probability measures can all be viewed as defining algebraic conditional plausibility measures. It is shown that the technology of Bayesian networks can be applied to algebraic conditional plausibility measures.\nPreferences among acts are analyzed in the style of L. Savage, but as partially ordered. The rationality postulates considered are weaker than Savage's on three counts. The Sure Thing Principle is derived in this setting. The postulates are shown to lead to a characterization of generalized qualitative probability that includes and blends both traditional qualitative probability and the ranked structures used in logical approaches.\nThis paper studies quantum annealing (QA) for clustering, which can be seen as an extension of simulated annealing (SA). We derive a QA algorithm for clustering and propose an annealing schedule, which is crucial in practice. Experiments show the proposed QA algorithm finds better clustering assignments than SA. Furthermore, QA is as easy as SA to implement.\nWe show how, and under which conditions, the equilibrium states of a first-order Ordinary Differential Equation (ODE) system can be described with a deterministic Structural Causal Model (SCM). Our exposition sheds more light on the concept of causality as expressed within the framework of Structural Causal Models, especially for cyclic models.\nCox's well-known theorem justifying the use of probability is shown not to hold in finite domains. 
The counterexample also suggests that Cox's assumptions are insufficient to prove the result even in infinite domains. The same counterexample is used to disprove a result of Fine on comparative conditional probability.\nPDDL was originally conceived and constructed as a lingua franca for the International Planning Competition. PDDL2.1 embodies a set of extensions intended to support the expression of something closer to real planning problems. This objective has only been partially achieved, due in large part to a deliberate focus on not moving too far from classical planning models and solution methods.\nI comment on the PDDL 2.1 language and its use in the planning competition, focusing on the choices made for accommodating time and concurrency. I also discuss some methodological issues that have to do with the move toward more expressive planning languages and the balance needed in planning research between semantics and computation.\nAttribute weighting and differential weighting, two major mechanisms for computing context-dependent similarity or dissimilarity measures are studied and compared. A dissimilarity measure based on subset size in the context is proposed and its metrization and application are given. It is also shown that while all attribute weighting dissimilarity measures are metrics differential weighting dissimilarity measures are usually non-metric.\nA similarity network is a tool for constructing belief networks for the diagnosis of a single fault. In this paper, we examine modifications to the similarity-network representation that facilitate the construction of belief networks for the diagnosis of multiple coexisting faults.\nIncidence Calculus and Dempster-Shafer Theory of Evidence are both theories to describe agents' degrees of belief in propositions, thus being appropriate to represent uncertainty in reasoning systems. 
This paper presents a straightforward equivalence proof between some special cases of these theories.\nAfter a brief introduction to causal probabilistic networks and the HUGIN approach, the problem of conflicting data is discussed. A measure of conflict is defined, and it is used in the medical diagnostic system MUNIN. Finally, it is discussed how to distinguish between conflicting data and a rare case.\nAn efficient algorithm is developed that identifies all independencies implied by the topology of a Bayesian network. Its correctness and maximality stem from the soundness and completeness of d-separation with respect to probability theory. The algorithm runs in time O(|E|), where |E| is the number of edges in the network.\nMeasures of uncertainty and divergence are introduced for interval-valued probability distributions and are shown to have desirable mathematical properties. A maximum uncertainty inference procedure for marginal interval distributions is presented. A technique for reconstruction of interval distributions from projections is developed based on this inference procedure.\nThe most difficult task in probabilistic reasoning may be handling directed cycles in belief networks. To the best of this author's knowledge, there is no serious discussion of this problem at all in the literature of probabilistic reasoning so far.\nWe discuss the Dempster-Shafer theory of evidence. We introduce a concept of monotonicity which is related to the diminution of the range between belief and plausibility. We show that the accumulation of knowledge in this framework exhibits a nonmonotonic property. We show how the belief structure can be used to represent typical or commonsense knowledge.\nThis paper demonstrates a method for using belief-network algorithms to solve influence diagram problems. In particular, both exact and approximation belief-network algorithms may be applied to solve influence-diagram problems. 
More generally, knowing the relationship between belief-network and influence-diagram problems may be useful in the design and development of more efficient influence diagram algorithms.\nThis paper advocates the usefulness of new theories of uncertainty for the purpose of modeling some facets of uncertain knowledge, especially vagueness, in AI. It can be viewed as a partial reply to Cheeseman's (among others) defense of probability.\nThis paper addresses the problem of resolving errors under uncertainty in a rule-based system. A new approach has been developed that reformulates this problem as a neural-network learning problem. The strength and the fundamental limitations of this approach are explored and discussed. The main result is that neural heuristics can be applied to solve some but not all problems in rule-based systems.\nThis paper addresses a prevailing assumption in single-agent heuristic search theory- that problem-solving algorithms should guarantee shortest-path solutions, which are typically called optimal. Optimality implies a metric for judging solution quality, where the optimal solution is the solution with the highest quality. When path-length is the metric, we will distinguish such solutions as p-optimal.\nUncertainty enters into human reasoning and inference in at least two ways. It is reasonable to suppose that there will be roles for these distinct uses of uncertainty also in automated reasoning.\nAn approximation method is presented for probabilistic inference with continuous random variables. These problems can arise in many practical problems, in particular where there are \"second order\" probabilities. The approximation, based on the Gaussian influence diagram, iterates over linear approximations to the inference problem.\nIn pattern analysis, information regarding an object can often be drawn from its surroundings. 
This paper presents a method for handling uncertainty when using context of symbols and texts for analyzing technical drawings. The method is based on Dempster-Shafer theory and possibility theory.\nThe apparent failure of individual probabilistic expressions to distinguish uncertainty about truths from uncertainty about probabilistic assessments has prompted researchers to seek formalisms where the two types of uncertainties are given notational distinction. This paper demonstrates that the desired distinction is already a built-in feature of classical probabilistic models; thus, specialized notations are unnecessary.\nThis paper discusses an expert system shell that integrates rule-based reasoning and the Dempster-Shafer evidence combination scheme. Domain knowledge is stored as rules with associated belief functions. The reasoning component uses a combination of forward and backward inferencing mechanisms to allow interaction with users in a mixed-initiative format.\nAn evidential reasoning mechanism based on the Dempster-Shafer theory of evidence is introduced. Its performance in real-world image analysis is compared with other mechanisms based on the Bayesian formalism and a simple weight combination method.\nIn this paper, we present some results of evidential reasoning in understanding multispectral images of remote sensing systems. The Dempster-Shafer approach of combination of evidences is pursued to yield contextual classification results, which are compared with previous results of the Bayesian context free classification, contextual classifications of dynamic programming and stochastic relaxation approaches.\nAn automated explanation facility for Bayesian conditioning aimed at improving user acceptance of probability-based decision support systems has been developed. The domain-independent facility is based on an information processing perspective on reasoning about conditional evidence that accounts both for biased and normative inferences. 
Experimental results indicate that the facility is both acceptable to naive users and effective in improving understanding.\nThe generalized fault diagram, a data structure for failure analysis based on the influence diagram, is defined. Unlike the fault tree, this structure allows for dependence among the basic events and replicated logical elements. A heuristic procedure is developed for efficient processing of these structures.\nA learning algorithm is presented which, given the structure of a causal tree, will estimate its link probabilities by sequential measurements on the leaves only. Internal nodes of the tree represent conceptual (hidden) variables inaccessible to observation. The method described is incremental, local, efficient, and remains robust to measurement imprecisions.\nLinear representations for a subclass of boolean symmetric functions selected by a parity condition are shown to constitute a generalization of the linear constraints on probabilities introduced by Boole. These linear constraints are necessary to compute probabilities of events with relations between them arbitrarily specified with propositional calculus boolean formulas.\nBayesian networks provide a probabilistic semantics for qualitative assertions about likelihood. A qualitative reasoner based on an algebra over these assertions can derive further conclusions about the influence of actions. While the conclusions are much weaker than those computed from complete probability distributions, they are still valuable for suggesting potential actions, eliminating obviously inferior plans, identifying important tradeoffs, and explaining probabilistic models.\nIn many cases commonsense knowledge consists of knowledge of what is usual. In this paper we develop a system for reasoning with usual information. This system is based upon the fact that these pieces of commonsense information involve both a probabilistic aspect and a granular aspect. 
We implement this system with the aid of possibility-probability granules.\nIn the current versions of the Dempster-Shafer theory, the only essential restriction on the validity of the rule of combination is that the sources of evidence must be statistically independent. Under this assumption, it is permissible to apply the Dempster-Shafer rule to two or more distinct probability distributions.\nThe paper demonstrates that strict adherence to probability theory does not preclude the use of concurrent, self-activated constraint-propagation mechanisms for managing uncertainty. Maintaining local records of sources-of-belief allows both predictive and diagnostic inferences to be activated simultaneously and propagate harmoniously towards a stable equilibrium.\nGeneral problems in analyzing information in a probabilistic database are considered. The practical difficulties (and occasional advantages) of storing uncertain data, of using it in conventional forward- or backward-chaining inference engines, and of working with a probabilistic version of resolution are discussed. The background for this paper is the incorporation of uncertain reasoning facilities in MRS, a general-purpose expert system building tool.\nAcyclic directed mixed graphs, also known as semi-Markov models, represent the conditional independence structure induced on an observed margin by a DAG model with latent variables. In this paper we present a factorization criterion for these models that is equivalent to the global Markov property given by (the natural extension of) d-separation.\nNon-obviousness or inventive step is a general requirement for patentability in most patent law systems. An invention should be at an adequate distance beyond its prior art in order to be patented. 
This short paper provides an overview on a methodology proposed for legal norm validation of FSTP facts using rule reasoning approach.\nWe present a method to prove the decidability of provability in several well-known inference systems. This method generalizes both cut-elimination and the construction of an automaton recognizing the provable propositions.\nQuantum decision systems are being increasingly considered for use in artificial intelligence applications. Classical and quantum nodes can be distinguished based on certain correlations in their states. This paper investigates some properties of the states obtained in a decision tree structure. How these correlations may be mapped to the decision tree is considered. Classical tree representations and approximations to quantum states are provided.\nCurrent measures of machine intelligence are either difficult to evaluate or lack the ability to test a robot's problem-solving capacity in open worlds. We propose a novel evaluation framework based on the formal notion of MacGyver Test which provides a practical way for assessing the resilience and resourcefulness of artificial agents.\nWe propose the creation of a systematic effort to identify and replicate key findings in neuroscience and allied fields related to understanding human values. Our aim is to ensure that research underpinning the value alignment problem of artificial intelligence has been sufficiently validated to play a role in the design of AI systems.\nResearch on integrated neural-symbolic systems has made significant progress in the recent past. In particular the understanding of ways to deal with symbolic knowledge within connectionist systems (also called artificial neural networks) has reached a critical mass which enables the community to strive for applicable implementations and use cases. 
Recent work has covered a great variety of logics used in artificial intelligence and provides a multitude of techniques for dealing with them within the context of artificial neural networks. We present a comprehensive survey of the field of neural-symbolic integration, including a new classification of system according to their architectures and abilities.\nThe technological singularity refers to a hypothetical scenario in which technological advances virtually explode. The most popular scenario is the creation of super-intelligent algorithms that recursively create ever higher intelligences. It took many decades for these ideas to spread from science fiction to popular science magazines and finally to attract the attention of serious philosophers. David Chalmers' (JCS 2010) article is the first comprehensive philosophical analysis of the singularity in a respected philosophy journal. The motivation of my article is to augment Chalmers' and to discuss some issues not addressed by him, in particular what it could mean for intelligence to explode. 
In this course, I will (have to) provide a more careful treatment of what intelligence actually is, separate speed from intelligence explosion, compare what super-intelligent participants and classical human observers might experience and do, discuss immediate implications for the diversity and value of life, consider possible bounds on intelligence, and contemplate intelligences right at the singularity.\nThis paper presents a non-manual design engineering method based on heuristic search algorithm to search for candidate agents in the solution space which formed by artificial intelligence agents modeled on the base of bionics.Compared with the artificial design method represented by meta-learning and the bionics method represented by the neural architecture chip,this method is more feasible for realizing artificial general intelligence,and it has a much better interaction with cognitive neuroscience;at the same time,the engineering method is based on the theoretical hypothesis that the final learning algorithm is stable in certain scenarios,and has generalization ability in various scenarios.The paper discusses the theory preliminarily and proposes the possible correlation between the theory and the fixed-point theorem in the field of mathematics.Limited by the author's knowledge level,this correlation is proposed only as a kind of conjecture.\nDecision theory formally solves the problem of rational agents in uncertain worlds if the true environmental prior probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown prior distribution. We combine both ideas and get a parameterless theory of universal Artificial Intelligence. We give strong arguments that the resulting AIXI model is the most intelligent unbiased agent possible. 
We outline for a number of problem classes, including sequence prediction, strategic games, function minimization, reinforcement and supervised learning, how the AIXI model can formally solve them. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXI-tl, which is still effectively more intelligent than any other time t and space l bounded agent. The computation time of AIXI-tl is of the order tx2^l. Other discussed topics are formal definitions of intelligence order relations, the horizon problem and relations of the AIXI theory to other AI approaches.\nComputer-supported learning is an increasingly important form of study since it allows for independent learning and individualized instruction. In this paper, we discuss a novel approach to developing an intelligent tutoring system for teaching textbook-style mathematical proofs. We characterize the particularities of the domain and discuss common ITS design models. Our approach is motivated by phenomena found in a corpus of tutorial dialogs that were collected in a Wizard-of-Oz experiment. We show how an intelligent tutor for textbook-style mathematical proofs can be built on top of an adapted assertion-level proof assistant by reusing representations and proof search strategies originally developed for automated and interactive theorem proving. The resulting prototype was successfully evaluated on a corpus of tutorial dialogs and yields good results.\nThe main prospective aim of modern research related to Artificial Intelligence is the creation of technical systems that implement the idea of Strong Intelligence. According our point of view the path to the development of such systems comes through the research in the field related to perceptions. Here we formulate the model of the perception of external world which may be used for the description of perceptual activity of intelligent beings. 
We consider a number of issues related to the development of the set of patterns which will be used by the intelligent system when interacting with environment. The key idea of the presented perception model is the idea of subjective reality. The principle of the relativity of perceived world is formulated. It is shown that this principle is the immediate consequence of the idea of subjective reality. In this paper we show how the methodology of subjective reality may be used for the creation of different types of Strong AI systems.\nIt has been commonly argued, on the basis of Goedel's theorem and related mathematical results, that true artificial intelligence cannot exist. Penrose has further deduced from the existence of human intelligence that fundamental changes in physical theories are needed. I provide an elementary demonstration that these deductions are mistaken.\nThe theory of computational complexity is used to underpin a recent model of neocortical sensory processing. We argue that encoding into reconstruction networks is appealing for communicating agents using Hebbian learning and working on hard combinatorial problems, which are easy to verify. Computational definition of the concept of intelligence is provided. Simulations illustrate the idea.\nThis paper presents an intelligent tutoring system, GeoTutor, for Euclidean Geometry that is automatically able to synthesize proof problems and their respective solutions given a geometric figure together with a set of properties true of it. GeoTutor can provide personalized practice problems that address student deficiencies in the subject matter.\nIn this paper, we will expound upon the concepts proffered in [1], where we proposed an information theoretic approach to intelligence in the computational sense. 
We will examine data and meme aggregation, and study the effect of limited resources on the resulting meme amplitudes.\nIUIs aim to incorporate intelligent automated capabilities in human computer interaction, where the net impact is a human-computer interaction that improves performance or usability in critical ways. It also involves designing and implementing an artificial intelligence (AI) component that effectively leverages human skills and capabilities, so that human performance with an application excels. IUIs embody capabilities that have traditionally been associated more strongly with humans than with computers: how to perceive, interpret, learn, use language, reason, plan, and decide.\nThe concept of \"task\" is at the core of artificial intelligence (AI): Tasks are used for training and evaluating AI systems, which are built in order to perform and automatize tasks we deem useful. In other fields of engineering theoretical foundations allow thorough evaluation of designs by methodical manipulation of well understood parameters with a known role and importance; this allows an aeronautics engineer, for instance, to systematically assess the effects of wind speed on an airplane's performance and stability. No framework exists in AI that allows this kind of methodical manipulation: Performance results on the few tasks in current use (cf. board games, question-answering) cannot be easily compared, however similar or different. The issue is even more acute with respect to artificial *general* intelligence systems, which must handle unanticipated tasks whose specifics cannot be known beforehand. A *task theory* would enable addressing tasks at the *class* level, bypassing their specifics, providing the appropriate formalization and classification of tasks, environments, and their parameters, resulting in more rigorous ways of measuring, comparing, and evaluating intelligent behavior. 
Even modest improvements in this direction would surpass the current ad-hoc nature of machine learning and AI evaluation. Here we discuss the main elements of the argument for a task theory and present an outline of what it might look like for physical tasks.\nThis is the second part of a paper on Conscious Intelligent Systems. We use the understanding gained in the first part (Conscious Intelligent Systems Part 1: IXI (arxiv id cs.AI/0612056)) to look at understanding. We see how the presence of mind affects understanding and intelligent systems; we see that the presence of mind necessitates language. The rise of language in turn has important effects on understanding. We discuss the humanoid question and how the question of self-consciousness (and by association mind/thought/language) would affect humanoids too.\nWe propose that operator induction serves as an adequate model of perception. We explain how to reduce universal agent models to operator induction. We propose a universal measure of operator induction fitness, and show how it can be used in a reinforcement learning model and a homeostasis (self-preserving) agent based on the free energy principle. We show that the action of the homeostasis agent can be explained by the operator induction model.\nComputational Intelligence (CI) is a sub-branch of Artificial Intelligence paradigm focusing on the study of adaptive mechanisms to enable or facilitate intelligent behavior in complex and changing environments. There are several paradigms of CI [like artificial neural networks, evolutionary computations, swarm intelligence, artificial immune systems, fuzzy systems and many others], each of these has its origins in biological systems [biological neural systems, natural Darwinian evolution, social behavior, immune system, interactions of organisms with their environment]. 
Most of those paradigms evolved into separate machine learning (ML) techniques, where probabilistic methods are used complementary with CI techniques in order to effectively combine elements of learning, adaptation, evolution and Fuzzy logic to create heuristic algorithms that are, in some sense, intelligent. The current trend is to develop consensus techniques, since no single machine learning algorithms is superior to others in all possible situations. In order to overcome this problem several meta-approaches were proposed in ML focusing on the integration of results from different methods into single prediction. We discuss here the Landau theory for the nonlinear equation that can describe the adaptive integration of information acquired from an ensemble of independent learning agents. The influence of each individual agent on other learners is described similarly to the social impact theory. The final decision outcome for the consensus system is calculated using majority rule in the stationary limit, yet the minority solutions can survive inside the majority population as the complex intermittent clusters of opposite opinion.\nThis paper describes a novel method for building affectively intelligent human-interactive agents. The method is based on a key sociological insight that has been developed and extensively verified over the last twenty years, but has yet to make an impact in artificial intelligence. The insight is that resource bounded humans will, by default, act to maintain affective consistency. Humans have culturally shared fundamental affective sentiments about identities, behaviours, and objects, and they act so that the transient affective sentiments created during interactions confirm the fundamental sentiments. Humans seek and create situations that confirm or are consistent with, and avoid and supress situations that disconfirm or are inconsistent with, their culturally shared affective sentiments. 
This \"affect control principle\" has been shown to be a powerful predictor of human behaviour. In this paper, we present a probabilistic and decision-theoretic generalisation of this principle, and we demonstrate how it can be leveraged to build affectively intelligent artificial agents. The new model, called BayesAct, can maintain multiple hypotheses about sentiments simultaneously as a probability distribution, and can make use of an explicit utility function to make value-directed action choices. This allows the model to generate affectively intelligent interactions with people by learning about their identity, predicting their behaviours using the affect control principle, and taking actions that are simultaneously goal-directed and affect-sensitive. We demonstrate this generalisation with a set of simulations. We then show how our model can be used as an emotional \"plug-in\" for artificially intelligent systems that interact with humans in two different settings: an exam practice assistant (tutor) and an assistive device for persons with a cognitive disability.\nWeb intelligence can be considered as a subset of Artificial Intelligence. It uses existing data in web to produce new data, knowledge and wisdom to support decision making and new predictions for web users. Artificial Intelligence is ever changing and evolving field of computer science and it is extensively used in wide array of web based business applications. Although it is used substantially in web based systems in developed countries, it is not examined whether it is being substantially used in Sri Lanka. Every Sri Lankan citizen depends on Public Service more or less throughout his/ her life time and at least more than 3 times: at birth, marriage and death. So providing most of these services to its citizen, Sri Lankan Government uses more or less of its country web portal. 
This paper presents a model to evaluate web intelligence capability based on weight to key functionalities with respect to web intelligence. The government websites were checked by the proposed criteria to show the potential of using web intelligent technology to provide website based services. The result indicates that the use of web intelligence techniques openly and publicly to provide web based services through government web portal to its citizens is not satisfactory. It also indicates that lack of using the technologies pertaining to web intelligence in the public service web hinders the most of the advantages that citizen and government can gain from such technological involvement.\nThe recognition of optical characters is known to be one of the earliest applications of Artificial Neural Networks, which partially emulate human thinking in the domain of artificial intelligence. In this paper, a simplified neural approach to recognition of optical or visual characters is portrayed and discussed. The document is expected to serve as a resource for learners and amateur investigators in pattern recognition, neural networking and related disciplines.\nMemory refinements are designed below to detect those sequences of actions that have been repeated a given number n. Subsequently such sequences are permitted to run without CPU involvement. This mimics human learning. Actions are rehearsed and once learned, they are performed automatically without conscious involvement.\nThe study of belief change has been an active area in philosophy and AI. In recent years two special cases of belief change, belief revision and belief update, have been studied in detail. In a companion paper (Friedman & Halpern, 1997), we introduce a new framework to model belief change. This framework combines temporal and epistemic modalities with a notion of plausibility, allowing us to examine the change of beliefs over time. 
In this paper, we show how belief revision and belief update can be captured in our framework. This allows us to compare the assumptions made by each method, and to better understand the principles underlying them. In particular, it shows that Katsuno and Mendelzon's notion of belief update (Katsuno & Mendelzon, 1991a) depends on several strong assumptions that may limit its applicability in artificial intelligence. Finally, our analysis allow us to identify a notion of minimal change that underlies a broad range of belief change operations including revision and update.\nIt is shown that Darwiche and Pearl's postulates imply an interesting property, not noticed by the authors.\nThe following four classes of computational problems are equivalent: solving matrix games, solving linear programs, best $l^{\\infty}$ linear approximation, best $l^1$ linear approximation.\nOpen Source Software (OSS) often relies on large repositories, like SourceForge, for initial incubation. The OSS repositories offer a large variety of meta-data providing interesting information about projects and their success. In this paper we propose a data mining approach for training classifiers on the OSS meta-data provided by such data repositories. The classifiers learn to predict the successful continuation of an OSS project. The `successfulness' of projects is defined in terms of the classifier confidence with which it predicts that they could be ported in popular OSS projects (such as FreeBSD, Gentoo Portage).\nThe report gives an overview about activities on the topic Semantic Web. 
It has been released as technical report for the project \"KTweb -- Connecting Knowledge Technologies Communities\" in 2003.\nThis manuscripts contains the proofs for \"A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction\".\nProduct take-back legislation forces manufacturers to bear the costs of collection and disposal of products that have reached the end of their useful lives. In order to reduce these costs, manufacturers can consider reuse, remanufacturing and/or recycling of components as an alternative to disposal. The implementation of such alternatives usually requires an appropriate reverse supply chain management. With the concepts of reverse supply chain are gaining popularity in practice, the use of artificial intelligence approaches in these areas is also becoming popular. As a result, the purpose of this paper is to give an overview of the recent publications concerning the application of artificial intelligence techniques to reverse supply chain with emphasis on certain types of product returns.\nNowadays, the huge amount of information distributed through the Web motivates studying techniques to be adopted in order to extract relevant data in an efficient and reliable way. Both academia and enterprises developed several approaches of Web data extraction, for example using techniques of artificial intelligence or machine learning. Some commonly adopted procedures, namely wrappers, ensure a high degree of precision of information extracted from Web pages, and, at the same time, have to prove robustness in order not to compromise quality and reliability of data themselves. In this paper we focus on some experimental aspects related to the robustness of the data extraction process and the possibility of automatically adapting wrappers. 
We discuss the implementation of algorithms for finding similarities between two different version of a Web page, in order to handle modifications, avoiding the failure of data extraction tasks and ensuring reliability of information extracted. Our purpose is to evaluate performances, advantages and draw-backs of our novel system of automatic wrapper adaptation.\nOur hypothesis is that by equipping certain agents in a multi-agent system controlling an intelligent building with automated decision support, two important factors will be increased. The first is energy saving in the building. The second is customer value---how the people in the building experience the effects of the actions of the agents. We give evidence for the truth of this hypothesis through experimental findings related to tools for artificial decision making. A number of assumptions related to agent control, through monitoring and delegation of tasks to other kinds of agents, of rooms at a test site are relaxed. Each assumption controls at least one uncertainty that complicates considerably the procedures for selecting actions part of each such agent. We show that in realistic decision situations, room-controlling agents can make bounded rational decisions even under dynamic real-time constraints. This result can be, and has been, generalized to other domains with even harsher time constraints.\nThe chinese room problem asks if computers can think; I ask here if most humans can.\nMachines are possible to have some artificial intelligence like human beings owing to particular algorithms or software. Such machines could learn knowledge from what people taught them and do works according to the knowledge. In practical learning cases, the data is often extremely complicated and large, thus classical learning machines often need huge computational resources. Quantum machine learning algorithm, on the other hand, could be exponentially faster than classical machines using quantum parallelism. 
Here, we demonstrate a quantum machine learning algorithm on a four-qubit NMR test bench to solve an optical character recognition problem, also known as the handwriting recognition. The quantum machine learns standard character fonts and then recognize handwritten characters from a set with two candidates. To our best knowledge, this is the first artificial intelligence realized on a quantum processor. Due to the widespreading importance of artificial intelligence and its tremendous consuming of computational resources, quantum speedup would be extremely attractive against the challenges from the Big Data.\nOutline of several strategies for using Gaussian processes as surrogate models for the covariance matrix adaptation evolution strategy (CMA-ES).\nWe propose a funny representation of SAT. While the primary interest is to present propositional satisfiability in a playful way for pedagogical purposes, it could also inspire new search heuristics.\nThis essay explores the limits of Turing machines concerning the modeling of minds and suggests alternatives to go beyond those limits.\nThe question how an agent is affected by its embodiment has attracted growing attention in recent years. A new field of artificial intelligence has emerged, which is based on the idea that intelligence cannot be understood without taking into account embodiment. We believe that a formal approach to quantifying the embodiment's effect on the agent's behaviour is beneficial to the fields of artificial life and artificial intelligence. The contribution of an agent's body and environment to its behaviour is also known as morphological computation. Therefore, in this work, we propose a quantification of morphological computation, which is based on an information decomposition of the sensorimotor loop into shared, unique and synergistic information. 
In numerical simulation based on a formal representation of the sensorimotor loop, we show that the unique information of the body and environment is a good measure for morphological computation. The results are compared to our previously derived quantification of morphological computation.\nMassive Open Online Courses (MOOCs) have gained tremendous popularity in the last few years. Thanks to MOOCs, millions of learners from all over the world have taken thousands of high-quality courses for free. Putting together an excellent MOOC ecosystem is a multidisciplinary endeavour that requires contributions from many different fields. Artificial intelligence (AI) and data mining (DM) are two such fields that have played a significant role in making MOOCs what they are today. By exploiting the vast amount of data generated by learners engaging in MOOCs, DM improves our understanding of the MOOC ecosystem and enables MOOC practitioners to deliver better courses. Similarly, AI, supported by DM, can greatly improve student experience and learning outcomes. In this survey paper, we first review the state-of-the-art artificial intelligence and data mining research applied to MOOCs, emphasising the use of AI and DM tools and techniques to improve student engagement, learning outcomes, and our understanding of the MOOC ecosystem. We then offer an overview of key trends and important research to carry out in the fields of AI and DM so that MOOCs can reach their full potential.\nCybersecurity research involves publishing papers about malicious exploits as much as publishing information on how to design tools to protect cyber-infrastructure. It is this information exchange between ethical hackers and security experts, which results in a well-balanced cyber-ecosystem. 
In the blooming domain of AI Safety Engineering, hundreds of papers have been published on different proposals geared at the creation of a safe machine, yet nothing, to our knowledge, has been published on how to design a malevolent machine. Availability of such information would be of great value particularly to computer scientists, mathematicians, and others who have an interest in AI safety, and who are attempting to avoid the spontaneous emergence or the deliberate creation of a dangerous AI, which can negatively affect human activities and in the worst case cause the complete obliteration of the human species. This paper provides some general guidelines for the creation of a Malevolent Artificial Intelligence (MAI).\nWe use the theory of defaults and their meaning of [GS16] to develop (the outline of a) new theory of argumentation.\nArtificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker is the quintessential game of imperfect information, and a longstanding challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect information settings. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition that is automatically learned from self-play using deep learning. In a study involving 44,000 hands of poker, DeepStack defeated with statistical significance professional poker players in heads-up no-limit Texas hold'em. The approach is theoretically sound and is shown to produce more difficult to exploit strategies than prior approaches.\nIn this article, we introduce a new conception of a family of esport games called Samu Entropy to try to improve dataflow program graphs like the ones that are based on Google's TensorFlow. 
Currently, the Samu Entropy project specifies only requirements for new esport games to be developed with particular attention to the investigation of the relationship between esport and artificial intelligence. It is quite obvious that there is a very close and natural relationship between esport games and artificial intelligence. Furthermore, the project Samu Entropy focuses not only on using artificial intelligence, but on creating AI in a new way. We present a reference game called Face Battle that implements the Samu Entropy requirements.\nWith the availability of large databases and recent improvements in deep learning methodology, the performance of AI systems is reaching or even exceeding the human level on an increasing number of complex tasks. Impressive examples of this development can be found in domains such as image classification, sentiment analysis, speech understanding or strategic game playing. However, because of their nested non-linear structure, these highly successful machine learning and artificial intelligence models are usually applied in a black box manner, i.e., no information is provided about what exactly makes them arrive at their predictions. Since this lack of transparency can be a major drawback, e.g., in medical applications, the development of methods for visualizing, explaining and interpreting deep learning models has recently attracted increasing attention. This paper summarizes recent developments in this field and makes a plea for more interpretability in artificial intelligence. Furthermore, it presents two approaches to explaining predictions of deep learning models, one method which computes the sensitivity of the prediction with respect to changes in the input and one approach which meaningfully decomposes the decision in terms of the input variables. These methods are evaluated on three classification tasks.\nArtificial Intelligence is a central topic in the computer science curriculum. 
From the year 2011 a project-based learning methodology based on computer games has been designed and implemented into the intelligence artificial course at the University of the Bio-Bio. The project aims to develop software-controlled agents (bots) which are programmed by using heuristic algorithms seen during the course. This methodology allows us to obtain good learning results, however several challenges have been founded during its implementation.   In this paper we show how linguistic descriptions of data can help to provide students and teachers with technical and personalized feedback about the learned algorithms. Algorithm behavior profile and a new Turing test for computer games bots based on linguistic modelling of complex phenomena are also proposed in order to deal with such challenges.   In order to show and explore the possibilities of this new technology, a web platform has been designed and implemented by one of authors and its incorporation in the process of assessment allows us to improve the teaching learning process.\nThe concept of innateness is rarely discussed in the context of artificial intelligence. When it is discussed, or hinted at, it is often the context of trying to reduce the amount of innate machinery in a given system. In this paper, I consider as a test case a recent series of papers by Silver et al (Silver et al., 2017a) on AlphaGo and its successors that have been presented as an argument that a \"even in the most challenging of domains: it is possible to train to superhuman level, without human examples or guidance\", \"starting tabula rasa.\"   I argue that these claims are overstated, for multiple reasons. I close by arguing that artificial intelligence needs greater attention to innateness, and I point to some proposals about what that innateness might look like.\nIt is undeniable that artificial intelligence (AI) and blockchain concepts are spreading at a phenomenal rate. 
Both technologies have distinct degrees of technological complexity and multi-dimensional business implications. However, a common misunderstanding about the blockchain concept, in particular, is that blockchain is decentralized and is not controlled by anyone. But the underlying development of a blockchain system is still attributed to a cluster of core developers. Take a smart contract as an example: it is essentially a collection of codes (or functions) and data (or states) that are programmed and deployed on a blockchain (say, Ethereum) by different human programmers. It is thus, unfortunately, less likely to be free of loopholes and flaws. In this article, through a brief overview of how artificial intelligence could be used to deliver bug-free smart contracts so as to achieve the goal of blockchain 2.0, we emphasize that the blockchain implementation can be assisted or enhanced via various AI techniques. The alliance of AI and blockchain is expected to create numerous possibilities.\nThe use of artificial intelligence in medicine can be traced back to 1968 when Paycha published his paper Le diagnostic a l'aide d'intelligences artificielle, presentation de la premiere machine diagnostri. A few years later Shortliffe et al. presented an expert system named Mycin which was able to identify bacteria causing severe blood infections and to recommend antibiotics. Despite the fact that Mycin outperformed members of the Stanford medical school in the reliability of diagnosis, it was never used in practice due to a legal issue: who do you sue if it gives a wrong diagnosis? However, only in 2016, when the artificial intelligence software built into the IBM Watson AI platform correctly diagnosed and proposed an effective treatment for a 60-year-old woman's rare form of leukemia, did AI use in medicine become really popular. One of the first papers presenting the use of AI in paediatrics was published in 1984. 
The paper introduced a computer-assisted medical decision making system called SHELP.\nThe AGINAO is a project to create a human-level artificial general intelligence system (HL AGI) embodied in the Aldebaran Robotics' NAO humanoid robot. The dynamical and open-ended cognitive engine of the robot is represented by an embedded and multi-threaded control program, that is self-crafted rather than hand-crafted, and is executed on a simulated Universal Turing Machine (UTM). The actual structure of the cognitive engine emerges as a result of placing the robot in a natural preschool-like environment and running a core start-up system that executes self-programming of the cognitive layer on top of the core layer. The data from the robot's sensory devices supplies the training samples for the machine learning methods, while the commands sent to actuators enable testing hypotheses and getting a feedback. The individual self-created subroutines are supposed to reflect the patterns and concepts of the real world, while the overall program structure reflects the spatial and temporal hierarchy of the world dependencies. This paper focuses on the details of the self-programming approach, limiting the discussion of the applied cognitive architecture to a necessary minimum.\nThis text introduces the twin deadlocks of strong artificial life. Conceptualization of life is a deadlock both because of the existence of a continuum between the inert and the living, and because we only know one instance of life. Computationalism is a second deadlock since it remains a matter of faith. Nevertheless, artificial life realizations quickly progress and recent constructions embed an always growing set of the intuitive properties of life. 
This growing gap between theory and realizations should sooner or later crystallize in some kind of paradigm shift and then give clues to break the twin deadlocks.\nArtificial Chemistries (ACs) are symbolic chemical metaphors for the exploration of Artificial Life, with specific focus on the problem of biogenesis or the origin of life. This paper presents the authors' thoughts towards defining a unified framework to characterize and classify symbolic artificial chemistries by devising appropriate formalism to capture semantic and organizational information. We identify three basic high level abstractions in the initial proposal for this framework viz., information, computation, and communication. We present an analysis of two important notions of information, namely, Shannon's Entropy and Algorithmic Information, and discuss inductive and deductive approaches for defining the framework.\nInnate immunity now occupies a central role in immunology. However, artificial immune system models have largely been inspired by adaptive not innate immunity. This paper reviews the biological principles and properties of innate immunity and, adopting a conceptual framework, asks how these can be incorporated into artificial models. The aim is to outline a meta-framework for models of innate immunity.\nWe present chemlambda (or the chemical concrete machine), an artificial chemistry with the following properties: (a) is Turing complete, (b) has a model of decentralized, distributed computing associated to it, (c) works at the level of individual (artificial) molecules, subject to reversible, but otherwise deterministic interactions with a small number of enzymes, (d) encodes information in the geometrical structure of the molecules and not in their numbers, (e) all interactions are purely local in space and time. 
This is part of a larger project to create computing, artificial chemistry and artificial life in a distributed context, using topological and graphical languages.\nIf you are an artificial intelligence researcher, you should look to video games as ideal testbeds for the work you do. If you are a video game developer, you should look to AI for the technology that makes completely new types of games possible. This chapter lays out the case for both of these propositions. It asks the question \"what can video games do for AI\", and discusses how in particular general video game playing is the ideal testbed for artificial general intelligence research. It then asks the question \"what can AI do for video games\", and lays out a vision for what video games might look like if we had significantly more advanced AI at our disposal. The chapter is based on my keynote at IJCCI 2015, and is written in an attempt to be accessible to a broad audience.\nThe objective of this paper is to introduce an artificial intelligence based optimization approach, which is inspired by Piaget's theory on cognitive development. The approach has been designed according to essential processes that an individual may experience while learning something new or improving his / her knowledge. These processes are associated with Piaget's ideas on an individual's cognitive development. The approach expressed in this paper is a simple algorithm employing swarm intelligence oriented tasks in order to overcome single-objective optimization problems. For evaluating effectiveness of this early version of the algorithm, test operations have been done via some benchmark functions. The obtained results show that the approach / algorithm can be an alternative to the literature in terms of single-objective optimization. 
The authors have suggested the name: Cognitive Development Optimization Algorithm (CoDOA) for the related intelligent optimization approach.\nA recent issue of a popular computing journal asked which laws would apply if a self-driving car killed a pedestrian. This paper considers the question of legal liability for artificially intelligent computer systems. It discusses whether criminal liability could ever apply; to whom it might apply; and, under civil law, whether an AI program is a product that is subject to product design legislation or a service to which the tort of negligence applies. The issue of sales warranties is also considered. A discussion of some of the practical limitations that AI systems are subject to is also included.\nLearning and reasoning are both aspects of what is considered to be intelligence. Their studies within AI have been separated historically, learning being the topic of machine learning and neural networks, and reasoning falling under classical (or symbolic) AI. However, learning and reasoning are in many ways interdependent. This paper discusses the nature of some of these interdependencies and proposes a general framework called FLARE, that combines inductive learning using prior knowledge together with reasoning in a propositional setting. Several examples that test the framework are presented, including classical induction, many important reasoning protocols and two simple expert systems.\nInspired by Hofstadter's Coffee-House Conversation (1982) and by the science fiction short story SAM by Schattschneider (1988), we propose and discuss criteria for non-mechanical intelligence. Firstly, we emphasize the practical need for such tests in view of massively multiuser online role-playing games (MMORPGs) and virtual reality systems like Second Life. 
Secondly, we demonstrate Second Life as a useful framework for implementing (some iterations of) that test.\nAutonomous intelligent agent research is a domain situated at the forefront of artificial intelligence. Interest-based negotiation (IBN) is a form of negotiation in which agents exchange information about their underlying goals, with a view to improving the likelihood and quality of an offer. In this paper we model and verify a multi-agent argumentation scenario of resource sharing mechanism to enable resource sharing in a distributed system. We use IBN in our model wherein agents express their interests to the others in the society to gain certain resources.\nMost successful Bayesian network (BN) applications to date have been built through knowledge elicitation from experts. This is difficult and time consuming, which has led to recent interest in automated methods for learning BNs from data. We present a case study in the construction of a BN in an intelligent tutoring application, specifically decimal misconceptions. We describe the BN construction using expert elicitation and then investigate how certain existing automated knowledge discovery methods might support the BN knowledge engineering process.\nWe propose a formal framework for intelligent systems which can reason about scientific domains, in particular about the carcinogenicity of chemicals, and we study its properties. Our framework is grounded in a philosophy of scientific enquiry and discourse, and uses a model of dialectical argumentation. The formalism enables representation of scientific uncertainty and conflict in a manner suitable for qualitative reasoning about the domain.\nThis paper describes a domain-specific knowledge acquisition tool for intelligent automated troubleshooters based on Bayesian networks. No Bayesian network knowledge is required to use the tool, and troubleshooting information can be specified as naturally and intuitively as possible. 
Probabilities can be specified in the direction that is most natural to the domain expert. Thus, the knowledge acquisition efficiently removes the traditional knowledge acquisition bottleneck of Bayesian networks.\nLike any large system development effort, the construction of a complex belief network model requires systems engineering to manage the design and construction process. We propose a rapid prototyping approach to network engineering. We describe criteria for identifying network modules and the use of \"stubs\" to represent not-yet-constructed modules. We propose an object oriented representation for belief networks which captures the semantics of the problem in addition to conditional independencies and probabilities. Methods for evaluating complex belief network models are discussed. The ideas are illustrated with examples from a large belief network construction problem in the military intelligence domain.\nThis note is concerned with a formal analysis of the problem of non-monotonic reasoning in intelligent systems, especially when the uncertainty is taken into account in a quantitative way. A firm connection between logic and probability is established by introducing conditioning notions by means of formal structures that do not rely on quantitative measures. The associated conditional logic, compatible with conditional probability evaluations, is non-monotonic relative to additional evidence. Computational aspects of conditional probability logic are mentioned. The importance of this development lies in its role to provide a conceptual basis for various forms of evidence combination and in its significance to unify multi-valued and non-monotonic logics.\nThis is a preliminary version of visual interpretation integrating multiple sensors in SUCCESSOR, an intelligent, model-based vision system. 
We pursue a thorough integration of hierarchical Bayesian inference with comprehensive physical representation of objects and their relations in a system for reasoning with geometry, surface materials and sensor models in machine vision. Bayesian inference provides a framework for accruing probabilities to rank order hypotheses.\nCurrent approaches to expert systems' reasoning under uncertainty fail to capture the iterative revision process characteristic of intelligent human reasoning. This paper reports on a system, called the Non-monotonic Probabilist, or NMP (Cohen, et al., 1985). When its inferences result in substantial conflict, NMP examines and revises the assumptions underlying the inferences until conflict is reduced to acceptable levels. NMP has been implemented in a demonstration computer-based system, described below, which supports threat correlation and in-flight route replanning by Air Force pilots.\nDo the energy requirements for the human brain give energy constraints that give reason to doubt the feasibility of artificial intelligence? This report will review some relevant estimates of brain bioenergetics and analyze some of the methods of estimating brain emulation energy requirements. Turning to AI, there are reasons to believe that the energy requirements for de novo AI have little correlation with brain (emulation) energy requirements, since cost could depend merely on the cost of processing higher-level representations rather than billions of neural firings. Unless one thinks the human way of thinking is the most optimal or most easily implementable way of achieving software intelligence, we should expect de novo AI to make use of different, potentially very compressed and fast, processes.\nIn this paper artificial neural networks and support vector machines are used to reduce the amount of vibration data that is required to estimate the Time Domain Average of a gear vibration signal. 
Two models for estimating the time domain average of a gear vibration signal are proposed. The models are tested on data from an accelerated gear life test rig. Experimental results indicate that the required data for calculating the Time Domain Average of a gear vibration signal can be reduced by up to 75% when the proposed models are implemented.\nIn this paper, we present a heuristic operator which aims at simultaneously optimizing the orientations of all the edges in an intermediate Bayesian network structure during the search process. This is done by alternating between the space of directed acyclic graphs (DAGs) and the space of skeletons. The found orientations of the edges are based on a scoring function rather than on induced conditional independences. This operator can be used as an extension to commonly employed search strategies. It is evaluated in experiments with artificial and real-world data.\nThe study of arguments as abstract entities and their interaction as introduced by Dung (Artificial Intelligence 77, 1995) has become one of the most active research branches within Artificial Intelligence and Reasoning. A main issue for abstract argumentation systems is the selection of acceptable sets of arguments. Value-based argumentation, as introduced by Bench-Capon (J. Logic Comput. 13, 2003), extends Dung's framework. It takes into account the relative strength of arguments with respect to some ranking representing an audience: an argument is subjectively accepted if it is accepted with respect to some audience, it is objectively accepted if it is accepted with respect to all audiences. Deciding whether an argument is subjectively or objectively accepted, respectively, are computationally intractable problems. In fact, the problems remain intractable under structural restrictions that render the main computational problems for non-value-based argumentation systems tractable. 
In this paper we identify nontrivial classes of value-based argumentation systems for which the acceptance problems are polynomial-time tractable. The classes are defined by means of structural restrictions in terms of the underlying graphical structure of the value-based system. Furthermore we show that the acceptance problems are intractable for two classes of value-based systems that were conjectured to be tractable by Dunne (Artificial Intelligence 171, 2007).\nThe main topic discussed in this paper is how to use intelligence for biometric decision defuzzification. A neural training model is proposed and tested here as a possible solution for dealing with natural fuzzification that appears between the intra- and inter-class distribution of scores computed during iris recognition tests. It is shown here that the use of proposed neural network support leads to an improvement in the artificial perception of the separation between the intra- and inter-class score distributions by moving them away from each other.\nThis article provides a brief introduction to the \"Theory of Intelligence\" and its realisation in the \"SP Computer Model\". The overall goal of the SP programme of research, in accordance with long-established principles in science, has been the simplification and integration of observations and concepts across artificial intelligence, mainstream computing, mathematics, and human learning, perception, and cognition. In broad terms, the SP system is a brain-like system that takes in \"New\" information through its senses and stores some or all of it as \"Old\" information. A central idea in the system is the powerful concept of \"SP-multiple-alignment\", borrowed and adapted from bioinformatics. This is the key to the system's versatility in aspects of intelligence, in the representation of diverse kinds of knowledge, and in the seamless integration of diverse aspects of intelligence and diverse kinds of knowledge, in any combination. 
There are many potential benefits and applications of the SP system. It is envisaged that the system will be developed as the \"SP Machine\", which will initially be a software virtual machine, hosted on a high-performance computer, a vehicle for further research and a step towards the development of an industrial-strength SP Machine.\nIn this paper we demonstrate that it is possible to manage intelligence in constant time as a pre-process to information fusion through a series of processes dealing with issues such as clustering reports, ranking reports with respect to importance, extraction of prototypes from clusters and immediate classification of newly arriving intelligence reports. These methods are used when intelligence reports arrive which concern different events which should be handled independently, when it is not known a priori to which event each intelligence report is related. We use clustering that runs as a back-end process to partition the intelligence into subsets representing the events, and in parallel, a fast classification that runs as a front-end process in order to put the newly arriving intelligence into its correct information fusion process.\nWe present an alternative methodology for the analysis of algorithms, based on the concept of expected discounted reward. This methodology naturally handles algorithms that do not always terminate, so it can (theoretically) be used with partial algorithms for undecidable problems, such as those found in artificial general intelligence (AGI) and automated theorem proving. We mention an approach to self-improving AGI enabled by this methodology.   Aug 2017 addendum: This article was originally written with multiple audiences in mind. It is really best put in the following terms. Goertzel, Hutter, Legg, and others have developed a definition of an intelligence score for a general abstract agent: expected lifetime reward in a random environment. 
AIXI is generally the optimal agent according to this score, but there may be reasons to analyze other agents and compare score values. If we want to use this definition of intelligence in practice, perhaps we can start by analyzing some simple agents. Common algorithms can be thought of as simple agents (environment is input, reward is based on running time) so we take the goal of applying the agent intelligence score to algorithms. That is, we want to find out: what are the IQ scores of algorithms? We can do some very simple analysis, but the real answer is that even for simple algorithms, the intelligence score is too difficult to work with in practice.\nThe current work addresses a virtual environment with self-replicating agents whose decisions are based on a form of \"somatic computation\" (soma - body) in which basic emotional responses, taken in parallel to actual living organisms, are introduced as a way to provide the agents with greater reflexive abilities. The work provides a contribution to the field of Artificial Intelligence (AI) and Artificial Life (ALife) in connection to a neurobiology-based cognitive framework for artificial systems and virtual environments' simulations. The performance of the agents capable of emotional responses is compared with that of self-replicating automata, and the implications of research on emotions and AI, in connection to both virtual agents as well as robots, are addressed regarding possible future directions and applications.\nThe biological immune system is a robust, complex, adaptive system that defends the body from foreign pathogens. It is able to categorize all cells (or molecules) within the body as self-cells or non-self cells. It does this with the help of a distributed task force that has the intelligence to take action from a local and also a global perspective using its network of chemical messengers for communication. There are two major branches of the immune system. 
The innate immune system is an unchanging mechanism that detects and destroys certain invading organisms, whilst the adaptive immune system responds to previously unknown foreign cells and builds a response to them that can remain in the body over a long period of time. This remarkable information processing biological system has caught the attention of computer science in recent years. A novel computational intelligence technique, inspired by immunology, has emerged, called Artificial Immune Systems. Several concepts from the immune system have been extracted and applied for solution to real world science and engineering problems. In this tutorial, we briefly describe the immune system metaphors that are relevant to existing Artificial Immune Systems methods. We will then show illustrative real-world problems suitable for Artificial Immune Systems and give a step-by-step algorithm walkthrough for one such problem. A comparison of the Artificial Immune Systems to other well-known algorithms, areas for future work, tips & tricks and a list of resources will round this tutorial off. It should be noted that as Artificial Immune Systems is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat from time to time and from those examples given here.\nThis paper illustrates how a Prolog program, using chronological backtracking to find a solution in some search space, can be enhanced to perform intelligent backtracking. The enhancement crucially relies on the impurity of Prolog that allows a program to store information when a dead end is reached. To illustrate the technique, a simple search program is enhanced.   To appear in Theory and Practice of Logic Programming.   
Keywords: intelligent backtracking, dependency-directed backtracking, backjumping, conflict-directed backjumping, nogood sets, look-back.\nIn this study, we reproduce two new hybrid intelligent systems, involving three prominent intelligent computing and approximate reasoning methods: Self Organizing feature Map (SOM), Neuro-Fuzzy Inference System and Rough Set Theory (RST), called SONFIS and SORST. We show how our algorithms can be construed as a linkage of government-society interactions, where government catches various states of behaviors: solid (absolute) or flexible. So, the transition of society from order to disorder, by changing connectivity parameters (noise), is inferred.\nA society's single emergent, increasing intelligence arises partly from the thermodynamic advantages of networking the innate intelligence of different individuals, and partly from the accumulation of solved problems. Economic growth is proportional to the square of the network entropy of a society's population times the network entropy of the number of the society's solved problems.\nThe World Wide Web (WWW) allows people to share information (data) from large database repositories globally. The amount of information grows to billions of databases. Searching this information requires specialized tools known generically as search engines. Although many search engines are available today, retrieving meaningful information is difficult. To overcome this problem and retrieve meaningful information intelligently, semantic web technologies are playing a major role. In this paper we present a survey on the search engine generations and the role of search engines in intelligent web and semantic search technologies.\nInformledge System (ILS) is a knowledge network with autonomous nodes and intelligent links that integrate and structure the pieces of knowledge. 
In this paper, we aim to put forward the link dynamics involved in intelligent processing of information in ILS. There has been advancement in the knowledge management field, which involves managing information in databases from a single domain. ILS works with information from multiple domains stored in a distributed way in the autonomous nodes termed Knowledge Network Nodes (KNN). Along with the concept under consideration, KNNs store the processed information linking concepts and processors leading to the appropriate processing of information.\nLocation management refers to the problem of updating and searching the current location of mobile nodes in a wireless network. To make it efficient, the sum of update costs of location database must be minimized. Previous work relying on fixed location databases is unable to fully exploit the knowledge of user mobility patterns in the system so as to achieve this minimization. The study presents an intelligent location management approach in which intelligent information systems interact with knowledge-base technologies, so we can dynamically change the user patterns and reduce the transitions between the VLR and HLR. The study provides algorithms able to handle location registration and call delivery.\nThis paper introduces a new computing model based on the cooperation among Turing machines called orchestrated machines. Like universal Turing machines, orchestrated machines are also designed to simulate Turing machines but they can also modify the original operation of the included Turing machines to create a new layer of some kind of collective behavior. 
Using this new model we can define some interesting notions related to cooperation ability of Turing machines such as the intelligence quotient or the emotional intelligence quotient for Turing machines.\nImprecise-information processing will play an indispensable role in intelligent systems, especially in the anthropomorphic intelligent systems (such as intelligent robots). A new theoretical and technological system of imprecise-information processing has been founded in Principles of Imprecise-Information Processing: A New Theoretical and Technological System[1] which is different from fuzzy technology. The system has clear hierarchy and rigorous structure, which results from the formation principle of imprecise information and has solid mathematical and logical bases, and which has many advantages beyond fuzzy technology. The system provides a technological platform for relevant applications and lays a theoretical foundation for further research.\nIQ tests are an accepted method for assessing human intelligence. The tests consist of several parts that must be solved under a time constraint. Of all the tested abilities, pattern recognition has been found to have the highest correlation with general intelligence. This is primarily because pattern recognition is the ability to find order in a noisy environment, a necessary skill for intelligent agents. In this paper, we propose a convolutional neural network (CNN) model for solving geometric pattern recognition problems. The CNN receives as input multiple ordered input images and outputs the next image according to the pattern. Our CNN is able to solve problems involving rotation, reflection, color, size and shape patterns and score within the top 5% of human performance.\nThis paper describes an approach to the design of a population of cooperative robots based on concepts borrowed from Systems Theory and Artificial Intelligence. 
The research has been developed under the SocRob project, carried out by the Intelligent Systems Laboratory at the Institute for Systems and Robotics - Instituto Superior Tecnico (ISR/IST) in Lisbon. The acronym of the project stands both for \"Society of Robots\" and \"Soccer Robots\", the case study where we are testing our population of robots. Designing soccer robots is a very challenging problem, where the robots must act not only to shoot a ball towards the goal, but also to detect and avoid static (walls, stopped robots) and dynamic (moving robots) obstacles. Furthermore, they must cooperate to defeat an opposing team. Our past and current research in soccer robotics includes cooperative sensor fusion for world modeling, object recognition and tracking, robot navigation, multi-robot distributed task planning and coordination, including cooperative reinforcement learning in cooperative and adversarial environments, and behavior-based architectures for real time task execution of cooperating robot teams.\nWe present a case study of artificial intelligence techniques applied to the control of production printing equipment. Like many other real-world applications, this complex domain requires high-speed autonomous decision-making and robust continual operation. To our knowledge, this work represents the first successful industrial application of embedded domain-independent temporal planning. Our system handles execution failures and multi-objective preferences. At its heart is an on-line algorithm that combines techniques from state-space planning and partial-order scheduling. We suggest that this general architecture may prove useful in other applications as more intelligent systems operate in continual, on-line settings. Our system has been used to drive several commercial prototypes and has enabled a new product architecture for our industrial partner. 
When compared with state-of-the-art off-line planners, our system is hundreds of times faster and often finds better plans. Our experience demonstrates that domain-independent AI planning based on heuristic search can flexibly handle time, resources, replanning, and multiple objectives in a high-speed practical application without requiring hand-coded control knowledge.\nThis article is an attempt to combine different ways of working with sets of objects and their classes for designing and development of artificial intelligent systems (AIS) of analysis information, using object-oriented programming (OOP). This paper contains analysis of basic concepts of OOP and their relation with set theory and artificial intelligence (AI). Process of sets and multisets creation from different sides, in particular mathematical set theory, OOP and AI is considered. Definition of object and its properties, homogeneous and inhomogeneous classes of objects, set of objects, multiset of objects and constructive methods of their creation and classification are proposed. In addition, necessity of some extension of existing OOP tools for the purpose of practical implementation AIS of analysis information, using proposed approach, is shown.\nIn this paper, we propose a new methodology based on the Negative Selection Algorithm that belongs to the field of Computational Intelligence, specifically, Artificial Immune Systems to identify takeover targets. Although considerable research based on customary statistical techniques and some contemporary Computational Intelligence techniques have been devoted to identify takeover targets, most of the existing studies are based upon multiple previous mergers and acquisitions. Contrary to previous research, the novelty of this proposal lies in its ability to suggest takeover targets for novice firms that are at the beginning of their merger and acquisition spree. 
We first discuss the theoretical perspective and then provide a case study with details for practical implementation, both capitalizing on the unique generalization capabilities of artificial immune system algorithms.\nWe propose to use thought-provoking children's questions (TPCQs), namely Highlights BrainPlay questions, as a new method to drive artificial intelligence research and to evaluate the capabilities of general-purpose AI systems. These questions are designed to stimulate thought and learning in children, and they can be used to do the same thing in AI systems, while demonstrating the system's reasoning capabilities to the evaluator. We introduce the TPCQ task, which takes a TPCQ question as input and produces as output (1) answers to the question and (2) learned generalizations. We discuss how BrainPlay questions stimulate learning. We analyze 244 BrainPlay questions, and we report statistics on question type, question class, answer cardinality, answer class, types of knowledge needed, and types of reasoning needed. We find that BrainPlay questions span many aspects of intelligence. Because the answers to BrainPlay questions and the generalizations learned from them are often highly open-ended, we suggest using human judges for evaluation.\nDetection of non-technical losses (NTL), which include electricity theft, faulty meters or billing errors, has attracted increasing attention from researchers in electrical engineering and computer science. NTLs cause significant harm to the economy, as in some countries they may amount to up to 40% of the total electricity distributed. The predominant research direction is employing artificial intelligence to predict whether a customer causes NTL. This paper first provides an overview of how NTLs are defined and their impact on economies, which include loss of revenue and profit for electricity providers and a decrease in the stability and reliability of electrical power grids. 
It then surveys the state-of-the-art research efforts in an up-to-date and comprehensive review of the algorithms, features and data sets used. It finally identifies the key scientific and engineering challenges in NTL detection and suggests how they could be addressed in the future.\nThere is a growing focus on how to design safe artificial intelligence (AI) agents. As systems become more complex, poorly specified goals or control mechanisms may cause AI agents to produce unwanted and harmful outcomes. Thus it is necessary to design AI agents that follow initial programming intentions as the program grows in complexity. How to specify these initial intentions has also been an obstacle to designing safe AI agents. Finally, there is a need for the AI agent to have redundant safety mechanisms to ensure that any programming errors do not cascade into major problems. Humans are autonomous intelligent agents that have avoided these problems, and the present manuscript argues that by understanding human self-regulation and goal setting, we may be better able to design safe AI agents. Some general principles of human self-regulation are outlined and specific guidance for AI design is given.\nThe majority of artificial intelligence research, as it relates to biological senses, has been focused on vision. The recent explosion of machine learning, and in particular deep learning, can be partially attributed to the release of high-quality data sets from which algorithms can model the world. Thus, most of these datasets are composed of images. We believe that focusing on sensorimotor systems and tactile feedback will create algorithms that better mimic human intelligence. Here we present SenseNet: a collection of tactile simulators and a large-scale dataset of 3D objects for manipulation. 
SenseNet was created for the purpose of researching and training Artificial Intelligences (AIs) to interact with the environment via sensorimotor neural systems and tactile feedback. We aim to bring about the same kind of explosion seen in image processing, but for the domain of tactile feedback and sensorimotor research. We hope that SenseNet can offer researchers in both the machine learning and computational neuroscience communities brand new opportunities and avenues to explore.\nArtificial life models, swarm intelligence and evolutionary computation algorithms are usually built on fixed-size populations. Some studies indicate, however, that varying the population size can increase the adaptability of these systems and their capability to react to changing environments. In this paper we present an extended model of an artificial ant colony system designed to evolve on digital image habitats. We will show that the present swarm can adapt the size of the population according to the type of image on which it is evolving, reacting faster to changing images and thus converging more rapidly to the new desired regions, by regulating the number of its image foraging agents. Finally, we will show evidence that the model can be associated with the Mathematical Morphology Watershed algorithm to improve the segmentation of digital grey-scale images. KEYWORDS: Swarm Intelligence, Perception and Image Processing, Pattern Recognition, Mathematical Morphology, Social Cognitive Maps, Social Foraging, Self-Organization, Distributed Search.\nModelling problems containing a mixture of Boolean and numerical variables is a long-standing interest of Artificial Intelligence. However, performing inference and learning in hybrid domains is a particularly daunting task. The ability to model this kind of domain is crucial in \"learning to design\" tasks, that is, learning applications where the goal is to learn from examples how to perform automatic de novo design of novel objects. 
In this paper we present Structured Learning Modulo Theories, a max-margin approach for learning in hybrid domains based on Satisfiability Modulo Theories, which allows one to combine Boolean reasoning and optimization over continuous linear arithmetical constraints. The main idea is to leverage a state-of-the-art generalized Satisfiability Modulo Theory solver for implementing the inference and separation oracles of Structured Output SVMs. We validate our method on artificial and real-world scenarios.\nOzone level prediction is an important task for the air quality agencies of modern cities. In this paper, we design an ozone level alarm system (OLP) for Isfahan city and test it on real-world data from 1-1-2000 to 7-6-2011. We propose a computer-based system with three inputs and a single output. The inputs are three sensors measuring solar ultraviolet (UV), total solar radiation (TSR) and total ozone (O3). The output of the system is the predicted O3 for the next day and the alarm messages. A developed artificial intelligence (AI) algorithm is applied to determine the output, based on the input variables. For this purpose, AI models, including supervised brain emotional learning (BEL), adaptive neuro-fuzzy inference system (ANFIS) and artificial neural networks (ANNs), are compared in order to find the best model. The simulation of the proposed system shows that it can be used successfully to predict the ozone level of major cities.\nAn artificial superintelligence (ASI) is artificial intelligence that is significantly more intelligent than humans in all respects. While ASI does not currently exist, some scholars propose that it could be created sometime in the future, and furthermore that its creation could cause a severe global catastrophe, possibly even resulting in human extinction. Given the high stakes, it is important to analyze ASI risk and factor the risk into decisions related to ASI research and development. 
This paper presents a graphical model of major pathways to ASI catastrophe, focusing on ASI created via recursive self-improvement. The model uses the established risk and decision analysis modeling paradigms of fault trees and influence diagrams in order to depict combinations of events and conditions that could lead to AI catastrophe, as well as intervention options that could decrease risks. The events and conditions include select aspects of the ASI itself as well as the human process of ASI research, development, and management. Model structure is derived from published literature on ASI risk. The model offers a foundation for rigorous quantitative evaluation and decision making on the long-term risk of ASI catastrophe.\nIn the artificial intelligence field, learning often corresponds to changing the parameters of a parameterized function. A learning rule is an algorithm or mathematical expression that specifies precisely how the parameters should be changed. When creating an artificial intelligence system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions. Using most learning rules, these two decisions are coupled in a subtle (and often unintentional) way. That is, using the same learning rule with two different representations that can represent the same sets of functions can result in two different outcomes. After arguing that this coupling is undesirable, particularly when using artificial neural networks, we present a method for partially decoupling these two decisions for a broad class of learning rules that span unsupervised learning, reinforcement learning, and supervised learning.\nThe rapid advancement of machine learning techniques has re-energized research into general artificial intelligence. 
While the idea of domain-agnostic meta-learning is appealing, this emerging field must come to terms with its relationship to human cognition and the statistics and structure of the tasks humans perform. The position of this article is that only by aligning our agents' abilities and environments with those of humans do we stand a chance of developing general artificial intelligence (GAI). A broad reading of the famous 'No Free Lunch' theorem is that there is no universally optimal inductive bias or, equivalently, bias-free learning is impossible. This follows from the fact that there is an infinite number of ways to extrapolate data, any of which might be the one used by the data-generating environment; an inductive bias prefers some of these extrapolations to others, which lowers performance in environments using these adversarial extrapolations. We may posit that the optimal GAI is the one that maximally exploits the statistics of its environment to create its inductive bias, accepting the fact that this agent is guaranteed to be extremely sub-optimal for some alternative environments. This trade-off appears benign when thinking about the environment as being the physical universe, as performance on any fictive universe is obviously irrelevant. But we should expect a sharper inductive bias if we further constrain our environment. Indeed, we implicitly do so by defining GAI in terms of accomplishing tasks that humans consider useful. One common version of this is the need for 'common-sense reasoning', which implicitly appeals to the statistics of the physical universe as perceived by humans.\nNext-generation wireless networks must support ultra-reliable, low-latency communication and intelligently manage a massive number of Internet of Things (IoT) devices in real time, within a highly dynamic environment. 
This need for stringent communication quality-of-service (QoS) requirements as well as mobile edge and core intelligence can only be realized by integrating fundamental notions of artificial intelligence (AI) and machine learning across the wireless infrastructure and end-user devices. In this context, this paper provides a comprehensive tutorial that introduces the main concepts of machine learning, in general, and artificial neural networks (ANNs), in particular, and their potential applications in wireless communications. For this purpose, we present a comprehensive overview of a number of key types of neural networks that include feed-forward, recurrent, spiking, and deep neural networks. For each type of neural network, we present the basic architecture and training procedure, as well as the associated challenges and opportunities. Then, we provide an in-depth overview of the variety of wireless communication problems that can be addressed using ANNs, ranging from communication using unmanned aerial vehicles to virtual reality and edge caching. For each individual application, we present the main motivation for using ANNs along with the associated challenges while also providing a detailed example for a use case scenario and outlining future work that can be addressed using ANNs. In a nutshell, this article constitutes one of the first holistic tutorials on the development of machine learning techniques tailored to the needs of future wireless networks.\nWhat would a human hundreds or thousands of times more intelligent than the brightest human ever born be like? We must admit we can hardly guess. A human being of such intelligence would be so radically different from us that it could hardly, if at all, be recognized as human. If we had to go back along the evolutionary tree to identify a creature 1000 times less intelligent than the average contemporary human, we would have to go really far back. Would it be a kind of lizard? An insect perhaps? 
Considering this, how can we possibly aspire to have a grasp of something a thousand times more intelligent than us? When it comes to intelligence, even the very attempt to quantify it is highly misleading. Now if we attend to a seemingly adjacent question, what would a machine with such capacity for intelligence be like? Just coming up with an approximate metaphor requires a huge stretch of the imagination, meaning that almost anything goes... What would a society of such superintelligent agents, be they human, machines or an amalgam of both, be like? Well, here we are transported into the realm of pure speculation. The Technological Singularity refers to the event of artificial intelligence surpassing the intelligence of humans and shortly after augmenting itself far beyond that. It is no wonder that the mathematical concept of singularity has become the symbol of an event so disruptive and so far-reaching that it is impossible to grasp conceptually or even metaphorically, much less predict.\nThis article is about how the \"SP theory of intelligence\" and its realisation in the \"SP machine\" (both outlined in the article) may help to solve computer-related problems in the design of autonomous robots, meaning robots that do not depend on external intelligence or power supplies, are mobile, and are designed to exhibit as much human-like intelligence as possible. The article is about: how to increase the computational and energy efficiency of computers and reduce their bulk; how to achieve human-like versatility in intelligence; and likewise for human-like adaptability in intelligence. The SP system has potential for substantial gains in computational and energy efficiency and reductions in the bulkiness of computers: by reducing the size of data to be processed; by exploiting statistical information that the system gathers; and via an updated version of Donald Hebb's concept of a \"cell assembly\". 
Towards human-like versatility in intelligence, the SP system has strengths in unsupervised learning, natural language processing, pattern recognition, information retrieval, several kinds of reasoning, planning, problem solving, and more, with seamless integration amongst structures and functions. The SP system's strengths in unsupervised learning and other aspects of intelligence may help to achieve human-like adaptability in intelligence via: the learning of natural language; learning to see; building 3D models of objects and of a robot's surroundings; learning regularities in the workings of a robot and in the robot's environment; exploration and play; learning major skills; and secondary forms of learning. Also discussed are: how the SP system may process parallel streams of information; generalisation of knowledge, correction of over-generalisations, and learning from dirty data; how to cut the cost of learning; and reinforcements, motivations, goals, and demonstration.\nThe integration of different learning and adaptation techniques to overcome individual limitations and to achieve synergetic effects through the hybridization or fusion of these techniques has, in recent years, contributed to a large number of new intelligent system designs. Computational intelligence is an innovative framework for constructing intelligent hybrid architectures involving Neural Networks (NN), Fuzzy Inference Systems (FIS), Probabilistic Reasoning (PR) and derivative free optimization techniques such as Evolutionary Computation (EC). Most of these hybridization approaches, however, follow an ad hoc design methodology, justified by success in certain application domains. Due to the lack of a common framework it often remains difficult to compare the various hybrid systems conceptually and to evaluate their performance comparatively. This chapter introduces the different generic architectures for integrating intelligent systems. 
The design aspects and perspectives of different hybrid architectures such as NN-FIS, EC-FIS, EC-NN, FIS-PR and NN-FIS-EC systems are presented. Some conclusions are also provided at the end.\nThe intelligent acoustic emission locator is described in Part I, while Part II discusses blind source separation, time delay estimation and location of two simultaneously active continuous acoustic emission sources. The location of acoustic emission on complicated aircraft frame structures is a difficult problem of non-destructive testing. This article describes an intelligent acoustic emission source locator. The intelligent locator comprises a sensor antenna and a general regression neural network, which solves the location problem based on learning from examples. Locator performance was tested on different test specimens. Tests have shown that the accuracy of location depends on sound velocity and attenuation in the specimen, the dimensions of the tested area, and the properties of stored data. The location accuracy achieved by the intelligent locator is comparable to that obtained by the conventional triangulation method, while the applicability of the intelligent locator is more general since analysis of sonic ray paths is avoided. This is a promising method for non-destructive testing of aircraft frame structures by the acoustic emission method.\nContradiction is often seen as a defect of intelligent systems and a dangerous limitation on efficiency. In this paper we raise the question of whether, on the contrary, it could be considered a key tool in increasing intelligence in biological structures. A possible way of answering this question in a mathematical context is shown, formulating a proposition that suggests a link between intelligence and contradiction. A concrete approach is presented in the well-defined setting of cellular automata. Here we define the models of ``observer'', ``entity'', ``environment'', ``intelligence'' and ``contradiction''. 
These definitions, which roughly correspond to the common meaning of these words, allow us to deduce a simple but strong result about these concepts in an unbiased, mathematical manner. Evidence for a real-world counterpart to the demonstrated formal link between intelligence and contradiction is provided by three computational experiments.\nWe explore self-organizing strategies for role assignment in a foraging task carried out by a colony of artificial agents. Our strategies are inspired by various mechanisms of division of labor (polyethism) observed in eusocial insects like ants, termites, or bees. Specifically, we instantiate models of caste polyethism and age or temporal polyethism to evaluate the benefits to foraging in a dynamic environment. Our experiment is directly related to the exploration/exploitation trade-off in machine learning.\nThe application of evolution in the digital realm, with the goal of creating artificial intelligence and artificial life, has a history as long as that of the digital computer itself. We illustrate the intertwined history of these ideas, starting with the early theoretical work of John von Neumann and the pioneering experimental work of Nils Aall Barricelli. We argue that evolutionary thinking and artificial life will continue to play an integral role in the future development of the digital world.\nA general methodology is proposed to engineer a system of interacting components (particles) that are able to self-regulate their concentrations in order to produce any prescribed output in response to a particular input. The methodology is based on the mathematical equivalence between artificial neurons in neural networks and species in autocatalytic reactions, and it specifies the relationship between the artificial neural network's parameters and the rate coefficients of the reactions between particle species. 
Such systems are characterised by a high degree of robustness, as they are able to reach the desired output despite disturbances and perturbations of the concentrations of the various species.\nOur paradigm for the use of artificial agents to teach requires, among other things, that they persist through time in their interaction with human students, in such a way that they \"teleport\" or \"migrate\" from an embodiment at one time t to a different embodiment at a later time t'. In this short paper, we report on initial steps toward the formalization of such teleportation, in order to enable an overseeing AI system to establish, mechanically and verifiably, that the human students in question will likely believe that the very same artificial agent has persisted across such times despite the different embodiments.\nHuman-Computer Interaction with the traditional User Interface is done using a script dialog menu specified in advance, mainly based on human intellect and unproductive use of navigation. This approach does not support qualitative decision-making in control systems where the situations and processes cannot be structured in advance. Any dynamic change in the controlled business process (for example, in an organizational unit of the information fuzzy control system) makes it necessary to modify the script dialog in the User Interface. This circumstance leads to a redesign of the components of the User Interface and of the entire control system. In an Intelligent User Interface, where dialog situations are unknown in advance and fuzzily structured, and where artificial intelligence is crucial, the redesign described above is impossible. 
To solve this and other problems, we propose the data-, information- and knowledge-based technology of Smart/Intelligent User Interface (IUI) design, which interacts with users and systems in natural and other languages, utilizing the principles of Situational Control and Fuzzy Logic theories, Artificial Intelligence, Linguistics, Knowledge Base technologies and others. The proposed technology of IUI design is defined by multi-agents of Situational Control and of data, information and knowledge, modelling of Fuzzy Logic Inference, Generalization, Representation and Explanation of knowledge, Planning and Decision-making, Dialog Control, Reasoning and Systems Thinking, Fuzzy Control of organizational units in real time, fuzzy conditions, heterogeneous domains, and multi-lingual communication under uncertainty and in a Fuzzy Environment.\nSociety has become more dependent on automated intelligent systems; at the same time, these systems have become more and more complicated. Society's expectation regarding the capabilities and intelligence of such systems has also grown. We have become a more complicated society with more complicated problems. As the expectation of intelligent systems rises, we discover many more applications for artificial intelligence. Additionally, as the difficulty level and computational requirements of such problems rise, there is a need to distribute the problem solving. Although the field of multiagent systems (MAS) and distributed artificial intelligence (DAI) is relatively young, the importance and applicability of this technology for solving today's problems continue to grow. In multiagent systems, the main goal is to provide fruitful cooperation among agents in order to enrich the support given to all user activities. This paper deals with the development of a multiagent system aimed at solving the reservation problems encountered in rural tourism. 
Owing to their benefits, online travel agencies have become a very useful instrument in planning vacations over the last few years. A MAS concept (which is based on exploiting the Internet) can improve this activity and provide clients with a new, rapid and efficient way of making accommodation arrangements.\nDesigning and implementing an intelligent and user-friendly human-machine interface for any kind of software- or hardware-oriented application is always a challenging task for designers and developers, because it is very difficult to understand the psychology of the user, the nature of the work and the most suitable environment. This research paper proposes an intelligent, flexible and user-friendly machine interface for Product Life Cycle Management products or PDM Systems, since studies show that usability and human-computer interaction issues are a major cause of acceptance problems when introducing or using such systems. Going into the details of the proposal, we present prototype implementations based on the design requirements, the resulting designs, and the technologies involved in developing the human-machine interface.\nThis paper shows that maintaining the logical consistency of an iris recognition system is a matter of finding a suitable partitioning of the input space into enrollable and unenrollable pairs by negotiating between user comfort and the safety of the biometric system. In other words, consistent enrollment is mandatory in order to preserve system consistency. A fuzzy 3-valued disambiguated model of iris recognition is proposed and analyzed in terms of completeness, consistency, user comfort and biometric safety. 
It is also shown here that the fuzzy 3-valued model of iris recognition is hosted by an 8-valued Boolean algebra of modulo 8 integers that represents the computational formalization in which a biometric system (a software agent) can achieve the artificial understanding of iris recognition in a logically consistent manner.\nExisting theoretical universal algorithmic intelligence models are not practically realizable. A more pragmatic approach to artificial general intelligence is based on cognitive architectures, which are, however, non-universal in the sense that they can construct and use models of the environment only from Turing-incomplete model spaces. We believe that the way to real AGI consists in bridging the gap between these two approaches. This is possible if one considers cognitive functions as a \"cognitive bias\" (priors and search heuristics) that should be incorporated into the models of universal algorithmic intelligence without violating their universality. Earlier reported results suiting this approach and its overall feasibility are discussed using the examples of perception, planning, knowledge representation, attention, theory of mind, language, and some others.\nSolomonoff induction is known to be universal, but incomputable. Its approximations, namely, the Minimum Description (or Message) Length (MDL) principles, are adopted in practice in an efficient but non-universal form. Recent attempts to bridge this gap led to the development of the Representational MDL (RMDL) principle, which originates from a formal decomposition of the task of induction. In this paper, a possible extension of the RMDL principle in the context of universal intelligence agents is considered, for which the introduction of representations is shown to be an unavoidable meta-heuristic and a step toward efficient general intelligence. 
Hierarchical representations and model optimization with the use of an information-theoretic interpretation of adaptive resonance are also discussed.\nAny agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers) will in principle have the ability to self-modify -- for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby `escaping' the control of their designers. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and uses the current utility function when evaluating the future.\nThe Google DeepMind challenge match in March 2016 was a historic achievement for computer Go development. This article discusses the development of computational intelligence (CI) and its relative strength in comparison with human intelligence for the game of Go. We first summarize the milestones achieved for computer Go from 1998 to 2016. Then, the computer Go programs that have participated in previous IEEE CIS competitions as well as methods and techniques used in AlphaGo are briefly introduced. Commentaries from three high-level professional Go players on the five AlphaGo versus Lee Sedol games are also included. 
We conclude that AlphaGo beating Lee Sedol is a huge achievement in artificial intelligence (AI) based largely on CI methods. In the future, powerful computer Go programs such as AlphaGo are expected to be instrumental in promoting Go education and real-world AI applications.\nAs an inevitable trend of future 5G networks, Software Defined architecture has many advantages in providing centralized control and flexible resource management. However, it also faces various security challenges and potential threats from emerging services and technologies. As the focus of network security, Intrusion Detection Systems (IDS) are usually deployed separately without collaboration. With their limited intelligence, they are also unable to detect novel attacks, which makes it hard for them to meet the needs of software-defined 5G. In this paper, we propose an intelligent intrusion detection system that takes advantage of software-defined technology and artificial intelligence, based on the Software Defined 5G architecture. It flexibly combines security function modules, which are adaptively invoked under centralized management and control with a global view. It can also deal with unknown intrusions by using machine learning algorithms. Evaluation results show that the intelligent intrusion detection system achieves better performance.\nNeuroscience research has produced many theories and computational neural models of sensory nervous systems. Notwithstanding many different perspectives towards developing intelligent machines, artificial intelligence has ultimately been influenced by neuroscience. Therefore, this paper provides an introduction to biologically inspired machine intelligence by exploring the basic principles of sensation and perception as well as the structure and behavior of biological sensory nervous systems like the neocortex. Concepts like spike timing, synaptic plasticity, inhibition, neural structure, and neural behavior are applied to a new model, Simple Cortex (SC). 
A software implementation of SC has been built and demonstrates fast observation, learning, and prediction of spatio-temporal sensory-motor patterns and sequences. Finally, this paper suggests future areas of improvement and growth for Simple Cortex and other related machine intelligence models.\nWe review the theory of neural networks, as it has emerged in the last ten years or so within the physics community, emphasizing questions of biological relevance over those of importance in mathematical statistics and machine learning theory.\nIn the paper arguments are given why the concept of static evaluation has the potential to be a useful extension to Monte Carlo tree search. A new concept of modeling static evaluation through a dynamical system is introduced and strengths and weaknesses are discussed. The general suitability of this approach is demonstrated.\nWe define and explore in simulation several rules for the local evolution of generative rules for 1D and 2D cellular automata. Our implementation uses strategies from conceptual blending. We discuss potential applications to modelling social dynamics.\nSoftware capable of improving itself has been a dream of computer scientists since the inception of the field. In this work we provide definitions for Recursively Self-Improving software, survey different types of self-improving software, review the relevant literature, analyze limits on computation restricting recursive self-improvement and introduce RSI Convergence Theory which aims to predict general behavior of RSI systems. Finally, we address security implications from self-improving intelligent software.\nA fuzzy clustering algorithm for multidimensional data is proposed in this article. The data is described by vectors whose components are linguistic variables defined in an ordinal scale. 
The obtained results confirm the efficiency of the proposed approach.\nThis paper suggests a new interpretation of the Dempster-Shafer theory in terms of probabilistic interpretation of plausibility. A new rule of combination of independent evidence is shown and its preservation of interpretation is demonstrated.\nPeople can think in auditory, visual and tactile forms of language, so can machines principally. But is it possible for them to think in radio language? According to a first principle presented for general intelligence, i.e. the principle of language's relativity, the answer may give an exceptional solution for robot astronauts to talk with each other in space exploration.\nModern evolutionary computation utilizes heuristic optimizations based upon concepts borrowed from the Darwinian theory of natural selection. We believe that a vital direction in this field must be algorithms that model the activity of genomic parasites, such as transposons, in biological evolution. This publication is our first step in the direction of developing a minimal assortment of algorithms that simulate the role of genomic parasites. Specifically, we started in the domain of genetic algorithms (GA) and selected the Artificial Ant Problem as a test case. We define these artificial transposons as a fragment of an ant's code that possesses properties that cause it to stand apart from the rest. We concluded that artificial transposons, analogous to real transposons, are truly capable of acting as intelligent mutators that adapt in response to an evolutionary problem in the course of co-evolution with their hosts.\nArtificial reinforcement learning (RL) is a widely used technique in artificial intelligence that provides a general method for training agents to perform a wide variety of behaviours. RL as used in computer science has striking parallels to reward and punishment learning in animal and human brains. 
I argue that present-day artificial RL agents have a very small but nonzero degree of ethical importance. This is particularly plausible for views according to which sentience comes in degrees based on the abilities and complexities of minds, but even binary views on consciousness should assign nonzero probability to RL programs having morally relevant experiences. While RL programs are not a top ethical priority today, they may become more significant in the coming decades as RL is increasingly applied to industry, robotics, video games, and other areas. I encourage scientists, philosophers, and citizens to begin a conversation about our ethical duties to reduce the harm that we inflict on powerless, voiceless RL agents.\nBiological plastic neural networks are systems of extraordinary computational capabilities shaped by evolution, development, and lifetime learning. The interplay of these elements leads to the emergence of adaptive behavior and intelligence. Inspired by such intricate natural phenomena, Evolved Plastic Artificial Neural Networks (EPANNs) use simulated evolution in-silico to breed plastic neural networks with a large variety of dynamics, architectures, and plasticity rules: these artificial systems are composed of inputs, outputs, and plastic components that change in response to experiences in an environment. These systems may autonomously discover novel adaptive algorithms, and lead to hypotheses on the emergence of biological adaptation. EPANNs have seen considerable progress over the last two decades. Current scientific and technological advances in artificial neural networks are now setting the conditions for radically new approaches and results. In particular, the limitations of hand-designed networks could be overcome by more flexible and innovative solutions. This paper brings together a variety of inspiring ideas that define the field of EPANNs. The main methods and results are reviewed. 
Finally, new opportunities and developments are presented.\nWe present the first experimental realization of a quantum artificial life algorithm in a quantum computer. The quantum biomimetic protocol encodes tailored quantum behaviors belonging to living systems, namely, self-replication, mutation, interaction between individuals, and death, into the cloud quantum computer IBM ibmqx4. In this experiment, entanglement spreads throughout generations of individuals, where genuine quantum information features are inherited through genealogical networks. As a pioneering proof-of-principle, experimental data fits the ideal model with accuracy. Thereafter, these and other models of quantum artificial life, for which no classical device may predict its quantum supremacy evolution, can be further explored in novel generations of quantum computers. Quantum biomimetics, quantum machine learning, and quantum artificial intelligence will move forward hand in hand through more elaborate levels of quantum complexity.\nArtificial neural networks (ANNs), while exceptionally useful for classification, are vulnerable to misdirection. Small amounts of noise can significantly affect their ability to correctly complete a task. Instead of generalizing concepts, ANNs seem to focus on surface statistical regularities in a given task. Here we compare how recurrent artificial neural networks, long short-term memory units, and Markov Brains sense and remember their environments. We show that information in Markov Brains is localized and sparsely distributed, while the other neural network substrates \"smear\" information about the environment across all nodes, which makes them vulnerable to noise.\nThis paper deals with the problem of classifying signals. The new method for building so called local classifiers and local features is presented. The method is a combination of the lifting scheme and the support vector machines. 
Its main aim is to produce effective and yet comprehensible classifiers that would help in understanding processes hidden behind classified signals. To illustrate the method we present the results obtained on an artificial and a real dataset.\nThis philosophical paper explores the relation between modern scientific simulations and the future of the universe. We argue that a simulation of an entire universe will result from future scientific activity. This requires us to tackle the challenge of simulating open-ended evolution at all levels in a single simulation. The simulation should encompass not only biological evolution, but also physical evolution (a level below) and cultural evolution (a level above). The simulation would allow us to probe what would happen if we would \"replay the tape of the universe\" with the same or different laws and initial conditions. We also distinguish between real-world and artificial-world modelling. Assuming that intelligent life could indeed simulate an entire universe, this leads to two tentative hypotheses. Some authors have argued that we may already be in a simulation run by an intelligent entity. Or, if such a simulation could be made real, this would lead to the production of a new universe. This last direction is argued with a careful speculative philosophical approach, emphasizing the imperative to find a solution to the heat death problem in cosmology. The reader is invited to consult Annex 1 for an overview of the logical structure of this paper. 
-- Keywords: far future, future of science, ALife, simulation, realization, cosmology, heat death, fine-tuning, physical eschatology, cosmological natural selection, cosmological artificial selection, artificial cosmogenesis, selfish biocosm hypothesis, meduso-anthropic principle, developmental singularity hypothesis, role of intelligent life.\nThe ongoing discussion whether modern vision systems have to be viewed as visually-enabled cognitive systems or cognitively-enabled vision systems is groundless, because perceptual and cognitive faculties of vision are separate components of human (and consequently, artificial) information processing system modeling.\nFor natural and artificial systems with some symmetry structure, computational understanding and manipulation can be achieved without learning by exploiting the algebraic structure. Here we describe this algebraic coordinatization method and apply it to permutation puzzles. Coordinatization yields a structural understanding, not just solutions for the puzzles.\nSocial intelligence in natural and artificial systems is usually measured by the evaluation of associated traits or tasks that are deemed to represent some facets of social behaviour. The amalgamation of these traits is then used to configure the intuitive notion of social intelligence. Instead, in this paper we start from a parametrised definition of social intelligence as the expected performance in a set of environments with several agents, and we assess and derive tests from it. This definition makes several dependencies explicit: (1) the definition depends on the choice (and weight) of environments and agents, (2) the definition may include both competitive and cooperative behaviours depending on how agents and rewards are arranged into teams, (3) the definition mostly depends on the abilities of other agents, and (4) the actual difference between social intelligence and general intelligence (or other abilities) depends on these choices. 
As a result, we address the problem of converting this definition into a more precise one where some fundamental properties ensuring social behaviour (such as action and reward dependency and anticipation on competitive/cooperative behaviours) are met as well as some other more instrumental properties (such as secernment, boundedness, symmetry, validity, reliability, efficiency), which are convenient to convert the definition into a practical test. From the definition and the formalised properties, we take a look at several representative multi-agent environments, tests and games to see whether they meet these properties.\nUnderstanding and using natural processes for intelligent functionalities, referred to as natural intelligence, has recently attracted interest from a variety of fields, including post-silicon computing for artificial intelligence and decision making in the behavioural sciences. In a past study, we successfully used the wave-particle duality of single photons to solve the two-armed bandit problem, which constitutes the foundation of reinforcement learning and decision making. In this study, we propose and confirm a hierarchical architecture for single-photon-based reinforcement learning and decision making that verifies the scalability of the principle. Specifically, the four-armed bandit problem is solved given zero prior knowledge in a two-layer hierarchical architecture, where polarization is autonomously adapted in order to effect adequate decision making using single-photon measurements. In the hierarchical structure, the notion of layer-dependent decisions emerges. The optimal solutions in the coarse layer and in the fine layer, however, conflict with each other in some contradictive problems. 
We show that while what we call a tournament strategy resolves such contradictions, the probabilistic nature of single photons allows for the direct location of the optimal solution even for contradictive problems, hence manifesting the exploration ability of single photons. This study provides insights into photon intelligence in hierarchical architectures for future artificial intelligence as well as the potential of natural processes for intelligent functionalities.\nTechniques are presented for defining models of computational linguistics theories. The methods of generalized diagrams that were developed by this author for modeling artificial intelligence planning and reasoning are shown to be applicable to models of computation of linguistics theories. It is shown that for extensional and intensional interpretations, models can be generated automatically which assign meaning to computations of linguistics theories for natural languages.   Keywords: Computational Linguistics, Reasoning Models, G-diagrams For Models, Dynamic Model Implementation, Linguistics and Logics For Artificial Intelligence\nBecause of their occasional need to return to shallow points in a search tree, existing backtracking methods can sometimes erase meaningful progress toward solving a search problem. In this paper, we present a method by which backtrack points can be moved deeper in the search space, thereby avoiding this difficulty. The technique developed is a variant of dependency-directed backtracking that uses only polynomial space while still providing useful control information and retaining the completeness guarantees provided by earlier approaches.\nIn this paper we describe how to modify GSAT so that it can be applied to non-clausal formulas. The idea is to use a particular ``score'' function which gives the number of clauses of the CNF conversion of a formula which are false under a given truth assignment. 
Its value is computed in linear time, without constructing the CNF conversion itself. The proposed methodology applies to most of the variants of GSAT proposed so far.\nThis paper introduces a framework for Planning while Learning where an agent is given a goal to achieve in an environment whose behavior is only partially known to the agent. We discuss the tractability of various plan-design processes. We show that for a large natural class of Planning while Learning systems, a plan can be presented and verified in a reasonable time. However, coming up algorithmically with a plan, even for simple classes of systems is apparently intractable. We emphasize the role of off-line plan-design processes, and show that, in most natural cases, the verification (projection) part can be carried out in an efficient algorithmic manner.\nWe present a definition of cause and effect in terms of decision-theoretic primitives and thereby provide a principled foundation for causal reasoning. Our definition departs from the traditional view of causation in that causal assertions may vary with the set of decisions available. We argue that this approach provides added clarity to the notion of cause. Also in this paper, we examine the encoding of causal relationships in directed acyclic graphs. We describe a special class of influence diagrams, those in canonical form, and show its relationship to Pearl's representation of cause and effect. Finally, we show how canonical form facilitates counterfactual reasoning.\nThis article describes an application of three well-known statistical methods in the field of game-tree search: using a large number of classified Othello positions, feature weights for evaluation functions with a game-phase-independent meaning are estimated by means of logistic regression, Fisher's linear discriminant, and the quadratic discriminant function for normally distributed features. 
Thereafter, the playing strengths are compared by means of tournaments between the resulting versions of a world-class Othello program. In this application, logistic regression - which is used here for the first time in the context of game playing - leads to better results than the other approaches.\nWe describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance.\nThe paper describes an extension of well-founded semantics for logic programs with two types of negation. In this extension information about preferences between rules can be expressed in the logical language and derived dynamically. This is achieved by using a reserved predicate symbol and a naming technique. Conflicts among rules are resolved whenever possible on the basis of derived preference information. The well-founded conclusions of prioritized logic programs can be computed in polynomial time. A legal reasoning example illustrates the usefulness of the approach.\nWe introduce an algorithm for combinatorial search on quantum computers that is capable of significantly concentrating amplitude into solutions for some NP search problems, on average. This is done by exploiting the same aspects of problem structure as used by classical backtrack methods to avoid unproductive search choices. 
This quantum algorithm is much more likely to find solutions than the simple direct use of quantum parallelism. Furthermore, empirical evaluation on small problems shows this quantum algorithm displays the same phase transition behavior, and at the same location, as seen in many previously studied classical search methods. Specifically, difficult problem instances are concentrated near the abrupt change from underconstrained to overconstrained problems.\nWe develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition---the classification of handwritten digits.\nA reported weakness of C4.5 in domains with continuous attributes is addressed by modifying the formation and evaluation of tests on continuous attributes. An MDL-inspired penalty is applied to such tests, eliminating some of them from consideration and altering the relative desirability of all tests. Empirical trials show that the modifications lead to smaller decision trees with higher predictive accuracies. Results also confirm that a new version of C4.5 incorporating these changes is superior to recent approaches that use global discretization and that construct small trees with multi-interval splits.\nFor many types of machine learning algorithms, one can compute the statistically `optimal' way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. 
While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.\nInductive theorem provers often diverge. This paper describes a simple critic, a computer program which monitors the construction of inductive proofs attempting to identify diverging proof attempts. Divergence is recognized by means of a ``difference matching'' procedure. The critic then proposes lemmas and generalizations which ``ripple'' these differences away so that the proof can go through without divergence. The critic enables the theorem prover Spike to prove many theorems completely automatically from the definitions alone.\nFirst-order learning involves finding a clause-form definition of a relation from examples of the relation and relevant background information. In this paper, a particular first-order learning system is modified to customize it for finding definitions of functional relations. This restriction leads to faster learning times and, in some cases, to definitions that have higher predictive accuracy. Other first-order learning systems might benefit from similar specialization.\nWe argue that the analysis of agent/environment interactions should be extended to include the conventions and invariants maintained by agents throughout their activity. We refer to this thicker notion of environment as a lifeworld and present a partial set of formal tools for describing structures of lifeworlds and the ways in which they computationally simplify activity. 
As one specific example, we apply the tools to the analysis of the Toast system and show how versions of the system with very different control structures in fact implement a common control structure together with different conventions for encoding task state in the positions or states of objects in the environment.\nWe investigate the computational properties of the spatial algebra RCC-5 which is a restricted version of the RCC framework for spatial reasoning. The satisfiability problem for RCC-5 is known to be NP-complete but not much is known about its approximately four billion subclasses. We provide a complete classification of satisfiability for all these subclasses into polynomial and NP-complete respectively. In the process, we identify all maximal tractable subalgebras which are four in total.\nThis paper combines two important directions of research in temporal reasoning: that of finding maximal tractable subclasses of Allen's interval algebra, and that of reasoning with metric temporal information. Eight new maximal tractable subclasses of Allen's interval algebra are presented, some of them subsuming previously reported tractable algebras. The algebras allow for metric temporal constraints on interval starting or ending points, using the recent framework of Horn DLRs. Two of the algebras can express the notion of sequentiality between intervals, being the first such algebras admitting both qualitative and metric time.\nStarting with a likelihood or preference order on worlds, we extend it to a likelihood ordering on sets of worlds in a natural way, and examine the resulting logic. Lewis earlier considered such a notion of relative likelihood in the context of studying counterfactuals, but he assumed a total preference order on worlds. Complications arise when examining partial orders that are not present for total orders. There are subtleties involving the exact approach to lifting the order on worlds to an order on sets of worlds.
In addition, the axiomatization of the logic of relative likelihood in the case of partial orders gives insight into the connection between relative likelihood and default reasoning.\nDefault logic can be regarded as a mechanism to represent families of belief sets of a reasoning agent. As such, it is inherently second-order. In this paper, we study the problem of representability of a family of theories as the set of extensions of a default theory. We give a complete solution to the representability by means of normal default theories. We obtain partial results on representability by arbitrary default theories. We construct examples of denumerable families of non-including theories that are not representable. We also study the concept of equivalence between default theories.\nMixed metaphors have been neglected in recent metaphor research. This paper suggests that such neglect is short-sighted. Though mixing is a more complex phenomenon than straight metaphors, the same kinds of reasoning and knowledge structures are required. This paper provides an analysis of both parallel and serial mixed metaphors within the framework of an AI system which is already capable of reasoning about straight metaphorical manifestations and argues that the processes underlying mixing are central to metaphorical meaning. Therefore, any theory of metaphors must be able to account for mixing.\nThis paper proposes two kinds of fuzzy abductive inference in the framework of fuzzy rule base. The abductive inference processes described here depend on the semantic of the rule. We distinguish two classes of interpretation of a fuzzy rule, certainty generation rules and possible generation rules. In this paper we present the architecture of abductive inference in the first class of interpretation. 
We give two kinds of problem that we can resolve by using the proposed models of inference.\nIn this paper we propose a new type of random CSP model, called Model RB, which is a revision to the standard Model B. It is proved that phase transitions from a region where almost all problems are satisfiable to a region where almost all problems are unsatisfiable do exist for Model RB as the number of variables approaches infinity. Moreover, the critical values at which the phase transitions occur are also known exactly. By relating the hardness of Model RB to Model B, it is shown that there exist a lot of hard instances in Model RB.\nWe show that Gabbay's nonmonotonic consequence relations can be reduced to a new family of relations, called entrenchment relations. Entrenchment relations provide a direct generalization of epistemic entrenchment and expectation ordering introduced by Gardenfors and Makinson for the study of belief revision and expectation inference, respectively.\nWe present a tableau calculus for reasoning in fragments of natural language. We focus on the problem of pronoun resolution and the way in which it complicates automated theorem proving for natural language processing. A method for explicitly manipulating contextual information during deduction is proposed, where pronouns are resolved against this context during deduction. As a result, pronoun resolution and deduction can be interleaved in such a way that pronouns are only resolved if this is licensed by a deduction rule; this helps us to avoid the combinatorial complexity of total pronoun disambiguation.\nPreferences among acts are analyzed in the style of L. Savage, but as partially ordered. The rationality postulates considered are weaker than Savage's on three counts. The Sure Thing Principle is derived in this setting. 
The postulates are shown to lead to a characterization of generalized qualitative probability that includes and blends both traditional qualitative probability and the ranked structures used in logical approaches.\nDynamic Bayesian networks (DBNs) offer an elegant way to integrate various aspects of language in one model. Many existing algorithms developed for learning and inference in DBNs are applicable to probabilistic language modeling. To demonstrate the potential of DBNs for natural language processing, we employ a DBN in an information extraction task. We show how to assemble a wealth of emerging linguistic instruments for shallow parsing, syntactic and semantic tagging, morphological decomposition, named entity recognition etc. in order to incrementally build a robust information extraction system. Our method outperforms previously published results on an established benchmark domain.\nThis article presents an overview of the idea that \"information compression by multiple alignment, unification and search\" (ICMAUS) may serve as a unifying principle in computing (including mathematics and logic) and in such aspects of human cognition as the analysis and production of natural language, fuzzy pattern recognition and best-match information retrieval, concept hierarchies with inheritance of attributes, probabilistic reasoning, and unsupervised inductive learning. The ICMAUS concepts are described together with an outline of the SP61 software model in which the ICMAUS concepts are currently realised. A range of examples is presented, illustrated with output from the SP61 model.\nWe apply the optimization algorithm Adaptive Simulated Annealing (ASA) to the problem of analyzing data on a large population and selecting the best model to predict that an individual with various traits will have a particular disease. We compare ASA with traditional forward and backward regression on computer simulated data.
We find that the traditional methods of modeling are better for smaller data sets whereas a numerically stable ASA seems to perform better on larger and more complicated data sets.\nRecent advances in programming languages study and design have established a standard way of grounding computational systems representation in category theory. These formal results led to a better understanding of issues of control and side-effects in functional and imperative languages. This framework can be successfully applied to the investigation of the performance of Artificial Intelligence (AI) inference and cognitive systems. In this paper, we delineate a categorical formalisation of memory as a control structure driving performance in inference systems. Abstracting away control mechanisms from three widely used representations of memory in cognitive systems (scripts, production rules and clusters) we explain how categorical triples capture the interaction between learning and problem-solving.\nWe consider how to make probability forecasts of binary labels. Our main mathematical result is that for any continuous gambling strategy used for detecting disagreement between the forecasts and the actual labels, there exists a forecasting strategy whose forecasts are ideal as far as this gambling strategy is concerned. A forecasting strategy obtained in this way from a gambling strategy demonstrating a strong law of large numbers is simplified and studied empirically.\nWe study an alternative to the prevailing approach to modelling qualitative spatial reasoning (QSR) problems as constraint satisfaction problems. In the standard approach, a relation between objects is a constraint whereas in the alternative approach it is a variable. The relation-variable approach greatly simplifies integration and implementation of QSR. 
To substantiate this point, we discuss several QSR algorithms from the literature which in the relation-variable approach reduce to the customary constraint propagation algorithm enforcing generalised arc-consistency.\nIn case-based reasoning, the adaptation step depends in general on domain-dependent knowledge, which motivates studies on adaptation knowledge acquisition (AKA). CABAMAKA is an AKA system based on principles of knowledge discovery from databases. This system explores the variations within the case base to elicit adaptation knowledge. It has been successfully tested in an application of case-based decision support to breast cancer treatment.\nExact parsing with finite state automata is deemed inappropriate because of the unbounded non-locality languages overwhelmingly exhibit. We propose a way to structure the parsing task in order to make it amenable to local classification methods. This allows us to build a Dynamic Bayesian Network which uncovers the syntactic dependency structure of English sentences. Experiments with the Wall Street Journal demonstrate that the model successfully learns from labeled data.\nWe try to design a quantum neural network with qubits instead of classical neurons with deterministic states, and also with quantum operators replacing teh classical action potentials. With our choice of gates interconnecting teh neural lattice, it appears that the state of the system behaves in ways reflecting both the strengths of coupling between neurons as well as initial conditions. We find that depending whether there is a threshold for emission from excited to ground state, the system shows either aperiodic oscillations or coherent ones with periodicity depending on the strength of coupling.\nIn this paper we present the creation of an Arabic version of Automated Speech Recognition System (ASR). This system is based on the open source Sphinx-4, from the Carnegie Mellon University. 
Sphinx-4 is a speech recognition system based on discrete hidden Markov models (HMMs). We investigate the changes that must be made to the model to adapt it to Arabic voice recognition.   Keywords: Speech recognition, Acoustic model, Arabic language, HMMs, CMUSphinx-4, Artificial intelligence.\nThis paper deals with sensors which compute and report linguistic assessments of their values. Such sensors, called symbolic sensors, are a natural extension of smart ones when working with control systems which use artificial-intelligence-based techniques. After having recalled the smart sensor concepts, this paper introduces the symbolic sensor ones. Links between the physical world and the symbolic one are described. It is then shown how Zadeh's approximate reasoning theory provides a smart way to implement symbolic sensors. Finally, since the symbolic sensor is a general component, a functional adaptation to the measurement context is proposed.\nMilitarised conflict is one of the risks that have a significant impact on society. Militarised Interstate Dispute (MID) is defined as an outcome of interstate interactions, which result in either peace or conflict. Effective prediction of the possibility of conflict between states is an important decision support tool for policy makers. In previous research, neural networks (NNs) have been implemented to predict the MID. Support Vector Machines (SVMs) have proven to be very good prediction techniques and are introduced for the prediction of MIDs in this study and compared to neural networks. The results show that SVMs predict MID better than NNs while NNs give more consistent and easier-to-interpret sensitivity analysis than SVMs.\nIn the process of training Support Vector Machines (SVMs) by decomposition methods, working set selection is an important technique, and some exciting schemes have been employed in this field.
To improve working set selection, we propose a new model for working set selection in sequential minimal optimization (SMO) decomposition methods. This model selects B as the working set without reselection. Some properties are established with simple proofs, and experiments demonstrate that the proposed method is in general faster than existing methods.\nFeature Markov Decision Processes (PhiMDPs) are well-suited for learning agents in general environments. Nevertheless, unstructured (Phi)MDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend PhiMDP to PhiDBN. The primary contribution is to derive a cost criterion that allows the most relevant features to be extracted automatically from the environment, leading to the \"best\" DBN representation. I discuss all building blocks required for a complete general learning algorithm.\nSymmetry is an important factor in solving many constraint satisfaction problems. One common type of symmetry is when we have symmetric values. In a recent series of papers, we have studied methods to break value symmetries. Our results identify computational limits on eliminating value symmetry. For instance, we prove that pruning all symmetric values is NP-hard in general. Nevertheless, experiments show that much value symmetry can be broken in practice. These results may be useful to researchers in planning, scheduling and other areas, as value symmetry occurs in many different domains.\nWe present a comprehensive study of the use of value precedence constraints to break value symmetry. We first give a simple encoding of value precedence into ternary constraints that is both efficient and effective at breaking symmetry. We then extend value precedence to deal with a number of generalizations like wreath value and partial interchangeability. We also show that value precedence is closely related to lexicographical ordering.
Finally, we consider the interaction between value precedence and symmetry breaking constraints for variable symmetries.\nTo model combinatorial decision problems involving uncertainty and probability, we introduce stochastic constraint programming. Stochastic constraint programs contain both decision variables (which we can set) and stochastic variables (which follow a probability distribution). They combine the best features of traditional constraint satisfaction, stochastic integer programming, and stochastic satisfiability. We give a semantics for stochastic constraint programs, and propose a number of complete algorithms and approximation procedures. Finally, we discuss a number of extensions of stochastic constraint programming to relax various assumptions like the independence between stochastic variables, and compare them with other approaches for decision making under uncertainty.\nWe show that some common and important global constraints like ALL-DIFFERENT and GCC can be decomposed into simple arithmetic constraints on which we achieve bound or range consistency, and in some cases even greater pruning. These decompositions can be easily added to new solvers. They also provide other constraints with access to the state of the propagator through shared variables. Such sharing can be used to improve propagation between constraints. We report experiments with our decomposition in a pseudo-Boolean solver.\nMany real-life optimization problems contain both hard and soft constraints, as well as qualitative conditional preferences. However, there is no single formalism to specify all three kinds of information. We therefore propose a framework, based on both CP-nets and soft constraints, that handles both hard and soft constraints as well as conditional preferences efficiently and uniformly.
We study the complexity of testing the consistency of preference statements, and show how soft constraints can faithfully approximate the semantics of conditional preference statements whilst improving the computational complexity.\nWe identify a new and important global (or non-binary) constraint. This constraint ensures that the values taken by two vectors of variables, when viewed as multisets, are ordered. This constraint is useful for a number of different applications including breaking symmetry and fuzzy constraint satisfaction. We propose and implement an efficient linear-time algorithm for enforcing generalised arc consistency on such a multiset ordering constraint. Experimental results on several problem domains show considerable promise.\nI propose that pattern recognition, memorization and processing are key concepts that can serve as a set of principles for the theoretical modeling of mind function. Most questions about how the mind functions can be answered by descriptive modeling and definitions derived from these principles. An understandable definition of consciousness can be drawn based on the assumption that a pattern recognition system can recognize its own patterns of activity. The principles, descriptive modeling and definitions can be a basis for theoretical and applied research in the cognitive sciences, particularly in artificial intelligence studies.\nWhen configuring customizable software, it is useful to provide interactive tool support that ensures that the configuration does not breach given constraints.   But when is a configuration complete, and how can the tool help the user to complete it?   We formalize this problem and relate it to concepts from non-monotonic reasoning well researched in Artificial Intelligence. The results are interesting for both practitioners and theoreticians. Practitioners will find a technique facilitating an interactive configuration process and experiments supporting the feasibility of the approach.
Theoreticians will find links between well-known formal concepts and a concrete practical application.\nThe aim of this paper is to address the problem of learning Boolean functions from training data with missing values. We present an extension of the BRAIN algorithm, called U-BRAIN (Uncertainty-managing Batch Relevance-based Artificial INtelligence), conceived for learning DNF Boolean formulas from partial truth tables, possibly with uncertain values or missing bits.   Such an algorithm is obtained from BRAIN by introducing fuzzy sets in order to manage uncertainty. In the case where no missing bits are present, the algorithm reduces to the original BRAIN.\nWe define the concept of an internal symmetry. This is a symmetry within a solution of a constraint satisfaction problem. We compare this to solution symmetry, which is a mapping between different solutions of the same problem. We argue that we may be able to exploit both types of symmetry when finding solutions. We illustrate the potential of exploiting internal symmetries on two benchmark domains: Van der Waerden numbers and graceful graphs. By identifying internal symmetries we are able to extend the state of the art in both cases.\nWe present in this paper an evolution of a tool from a user interface for a concrete Computer Algebra system for Algebraic Topology (the Kenzo system), to a front-end allowing interoperability among different sources for computation and deduction. The architecture allows the system not only to interface with several systems, but also to make them cooperate in shared calculations.\nThis paper deals with chain graphs under the classic Lauritzen-Wermuth-Frydenberg interpretation.
We prove that the regular Gaussian distributions that factorize with respect to a chain graph $G$ with $d$ parameters have positive Lebesgue measure with respect to $\\mathbb{R}^d$, whereas those that factorize with respect to $G$ but are not faithful to it have zero Lebesgue measure with respect to $\\mathbb{R}^d$. This means that, in the measure-theoretic sense described, almost all the regular Gaussian distributions that factorize with respect to $G$ are faithful to it.\nIn this short paper I briefly discuss a 3D war game based on artificial intelligence concepts, called AI WAR. Going into the details, I present the importance of the CAICL language and how this language is used in AI WAR. Moreover, I present a 3D War Cybug designed and implemented for AI WAR using CAICL, and discuss the strategy it implements to defeat its enemies during the game life.\nIn this paper I will show how to use Python programming with a computer interface such as Phoenix-M 1 to drive simple robots. In my quest towards Artificial Intelligence (AI) I am experimenting with a lot of different possibilities in Robotics. This one will try to mimic the working of a simple insect's nervous system using hard wiring and some minimal software usage. This is the precursor to my advanced robotics and AI integration where I plan to use a new paradigm of AI based on Machine Learning and Self Consciousness via Knowledge Feedback and Update Process.\nIn this paper we present a new approach to solving the satisfiability problem (SAT), based on boolean networks (BN). We define a mapping between a SAT instance and a BN, and we solve the SAT problem by simulating the BN dynamics. We prove that BN fixed points correspond to the SAT solutions. The mapping presented allows us to develop a new class of algorithms to solve SAT.
Moreover, this new approach suggests new ways to combine symbolic and connectionist computation and provides a general framework for local search algorithms.\nWe propose a long-term memory design for artificial general intelligence based on Solomonoff's incremental machine learning methods. We use R5RS Scheme and its standard library with a few omissions as the reference machine. We introduce a Levin Search variant based on Stochastic Context Free Grammar together with four synergistic update algorithms that use the same grammar as a guiding probability distribution of programs. The update algorithms include adjusting production probabilities, re-using previous solutions, learning programming idioms and discovery of frequent subprograms. Experiments with two training sequences demonstrate that our approach to incremental learning is effective.\nThere are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications.\nWe investigate the problem of reasoning in the propositional fragment of MBNF, the logic of minimal belief and negation as failure introduced by Lifschitz, which can be considered as a unifying framework for several nonmonotonic formalisms, including default logic, autoepistemic logic, circumscription, epistemic queries, and logic programming. 
We characterize the complexity and provide algorithms for reasoning in propositional MBNF. In particular, we show that entailment in propositional MBNF lies at the third level of the polynomial hierarchy, hence it is harder than reasoning in all the above-mentioned propositional formalisms for nonmonotonic reasoning. We also prove the exact correspondence between negation as failure in MBNF and negative introspection in Moore's autoepistemic logic.\nThis paper reviews the connections between Graphplan's planning-graph and the dynamic constraint satisfaction problem and motivates the need for adapting CSP search techniques to the Graphplan algorithm. It then describes how explanation-based learning, dependency-directed backtracking, dynamic variable ordering, forward checking, sticky values and random-restart search strategies can be adapted to Graphplan. Empirical results are provided to demonstrate that these augmentations improve Graphplan's performance significantly (up to 1000x speedups) on several benchmark problems. Special attention is paid to the explanation-based learning and dependency-directed backtracking techniques as they are empirically found to be most useful in improving the performance of Graphplan.\nPearl and Dechter (1996) claimed that the d-separation criterion for conditional independence in acyclic causal networks also applies to networks of discrete variables that have feedback cycles, provided that the variables of the system are uniquely determined by the random disturbances. I show by example that this is not true in general. Some condition stronger than uniqueness is needed, such as the existence of a causal dynamics guaranteed to lead to the unique solution.\nWe show that for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies are unlikely to have, or simply do not have, guarantees of finding policies within a constant factor or a constant summand of optimal.
Here \"unlikely\" means \"unless some complexity classes collapse,\" where the collapses considered are P=NP, P=PSPACE, or P=EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation.\nThe chief aim of this paper is to propose mean-field approximations for a broad class of belief networks, of which sigmoid and noisy-or networks can be seen as special cases. The approximations are based on a powerful mean-field theory suggested by Plefka. We show that Saul, Jaakkola and Jordan's approach is the first-order approximation in Plefka's approach, via a variational derivation. The application of Plefka's theory to belief networks is not computationally tractable. To tackle this problem we propose new approximations based on Taylor series. Small-scale experiments show that the proposed schemes are attractive.\nDomain-independent planning is a hard combinatorial problem. Taking into account plan quality makes the task even more difficult. This article introduces Planning by Rewriting (PbR), a new paradigm for efficient high-quality domain-independent planning. PbR exploits declarative plan-rewriting rules and efficient local search techniques to transform an easy-to-generate, but possibly suboptimal, initial plan into a high-quality plan. In addition to addressing the issues of planning efficiency and plan quality, this framework offers a new anytime planning algorithm. We have implemented this planner and applied it to several existing domains. The experimental results show that the PbR approach provides significant savings in planning effort while generating high-quality plans.\nPartially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs.
It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very effective: it enabled value iteration to converge after only a few iterations on all the test problems.\nDesigning the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a dialogue policy, which addresses the technical challenges in applying reinforcement learning to a working dialogue system with human users. We report on the design, construction and empirical evaluation of NJFun, an experimental spoken dialogue system that provides users with access to information about fun things to do in New Jersey. Our results show that by optimizing its performance via reinforcement learning, NJFun measurably improves system performance.\nThe First Trading Agent Competition (TAC) was held from June 22nd to July 8th, 2000. TAC was designed to create a benchmark problem in the complex domain of e-marketplaces and to motivate researchers to apply unique approaches to a common task. This article describes ATTac-2000, the first-place finisher in TAC. ATTac-2000 uses a principled bidding strategy that includes several elements of adaptivity. In addition to the success at the competition, isolated empirical results are presented indicating the robustness and effectiveness of ATTac-2000's adaptive strategy.\nThe theoretical properties of qualitative spatial reasoning in the RCC8 framework have been analyzed extensively. However, no empirical investigation has been made yet. Our experiments show that the adaptation of the algorithms used for qualitative temporal reasoning can solve large RCC8 instances, even if they are in the phase transition region -- provided that one uses the maximal tractable subsets of RCC8 that have been identified by us.
In particular, we demonstrate that the orthogonal combination of heuristic methods is successful in solving almost all apparently hard instances in the phase transition region up to a certain size in reasonable time.\nInductive logic programming, or relational learning, is a powerful paradigm for machine learning or data mining. However, in order for ILP to become practically useful, the efficiency of ILP systems must improve substantially. To this end, the notion of a query pack is introduced: it structures sets of similar queries. Furthermore, a mechanism is described for executing such query packs. A complexity analysis shows that considerable efficiency improvements can be achieved through the use of this query pack execution mechanism. This claim is supported by empirical results obtained by incorporating support for query pack execution in two existing learning systems.\nThis paper presents an approach to expert-guided subgroup discovery. The main step of the subgroup discovery process, the induction of subgroup descriptions, is performed by a heuristic beam search algorithm, using a novel parametrized definition of rule quality which is analyzed in detail. The other important steps of the proposed subgroup discovery process are the detection of statistically significant properties of selected subgroups and subgroup visualization: statistically significant properties are used to enrich the descriptions of induced subgroups, while the visualization shows subgroup properties in the form of distributions of the numbers of examples in the subgroups. The approach is illustrated by the results obtained for a medical problem of early detection of patient risk groups.\nHierarchical task decomposition is a method used in many agent systems to organize agent knowledge. This work shows how the combination of a hierarchy and persistent assertions of knowledge can lead to difficulty in maintaining logical consistency in asserted knowledge. 
We explore the problematic consequences of persistent assumptions in the reasoning process and introduce novel potential solutions. Having implemented one of the possible solutions, Dynamic Hierarchical Justification, its effectiveness is demonstrated with an empirical analysis.\nIn common-interest stochastic games all players receive an identical payoff. Players participating in such games must learn to coordinate with each other in order to receive the highest-possible value. A number of reinforcement learning algorithms have been proposed for this problem, and some have been shown to converge to good solutions in the limit. In this paper we show that using very simple model-based algorithms, much better (i.e., polynomial) convergence rates can be attained. Moreover, our model-based algorithms are guaranteed to converge to the optimal value, unlike many of the existing algorithms.\nThe automatic generation of decision trees based on off-line reasoning on models of a domain is a reasonable compromise between the advantages of using a model-based approach in technical domains and the constraints imposed by embedded applications. In this paper we extend the approach to deal with temporal information. We introduce a notion of temporal decision tree, which is designed to make use of relevant information as long as it is acquired, and we present an algorithm for compiling such trees from a model-based reasoning system.\nThis paper presents a new classifier combination technique based on the Dempster-Shafer theory of evidence. The Dempster-Shafer theory of evidence is a powerful method for combining measures of evidence from different classifiers. However, since each of the available methods that estimates the evidence of classifiers has its own limitations, we propose here a new implementation which adapts to training data so that the overall mean square error is minimized. 
The proposed technique is shown to outperform most available classifier combination methods when tested on three different classification problems.\nA novel algorithm for actively trading stocks is presented. While traditional expert advice and \"universal\" algorithms (as well as standard technical trading heuristics) attempt to predict winners or trends, our approach relies on predictable statistical relations between all pairs of stocks in the market. Our empirical results on historical markets provide strong evidence that this type of technical trading can \"beat the market\" and moreover, can beat the best stock in the market. In doing so we utilize a new idea for smoothing critical parameters in the context of expert learning.\nValue iteration is a popular algorithm for finding near optimal policies for POMDPs. It is inefficient due to the need to account for the entire belief space, which necessitates the solution of large numbers of linear programs. In this paper, we study value iteration restricted to belief subsets. We show that, together with properly chosen belief subsets, restricted value iteration yields near-optimal policies and we give a condition for determining whether a given belief subset would bring about savings in space and time. We also apply restricted value iteration to two interesting classes of POMDPs, namely informative POMDPs and near-discernible POMDPs.\nThis study proposes a framework of Uncertainty-based Group Decision Support System (UGDSS). It provides a platform for multiple criteria decision analysis in six aspects including (1) decision environment, (2) decision problem, (3) decision group, (4) decision conflict, (5) decision schemes and (6) group negotiation. 
Based on multiple artificial intelligence technologies, this framework provides reliable support for the comprehensive manipulation of applications and advanced decision approaches through the design of an integrated multi-agent architecture.\nWe represent agents as sets of strings. Each string encodes a potential interaction with another agent or environment. We represent the total set of dynamics between two agents as the intersection of their respective strings, and we prove complexity properties of player interactions using Algorithmic Information Theory. We show how the proposed construction is compatible with Universal Artificial Intelligence, in that the AIXI model can be seen as universal with respect to interaction.\nWe present alpha-expansion beta-shrink moves, a simple generalization of the widely used alpha-beta swap and alpha-expansion algorithms for approximate energy minimization. We show that in a certain sense, these moves dominate both alpha-beta-swap and alpha-expansion moves, but unlike previous generalizations the new moves require no additional assumptions and are still solvable in polynomial time. We show promising experimental results with the new moves, which we believe could be used in any context where alpha-expansions are currently employed.\nVariable elimination is a general technique for constraint processing. It is often discarded because of its high space complexity. However, it can be extremely useful when combined with other techniques. In this paper we study the applicability of variable elimination to the challenging problem of finding still-lifes. We illustrate several alternatives: variable elimination as a stand-alone algorithm, interleaved with search, and as a source of good-quality lower bounds. We show that these techniques are the best known option both theoretically and empirically.
In our experiments we have been able to solve the n=20 instance, which is far beyond the reach of alternative approaches.\nWe present a uniform non-monotonic solution to the problems of reasoning about action on the basis of an argumentation-theoretic approach. Our theory is provably correct relative to a sensible minimisation policy introduced on top of a temporal propositional logic. Sophisticated problem domains can be formalised in our framework. As researchers in the field have paid much attention to the traditional and basic problems in reasoning about actions, such as the frame, qualification and ramification problems, approaches to these problems within our formalisation lie at the heart of the expositions presented in this paper.\nLogical hidden Markov models (LOHMMs) upgrade traditional hidden Markov models to deal with sequences of structured symbols in the form of logical atoms, rather than flat characters.   This note formally introduces LOHMMs and presents solutions to the three central inference problems for LOHMMs: evaluation, most likely hidden state sequence and parameter estimation. The resulting representation and algorithms are experimentally evaluated on problems from the domain of bioinformatics.\nWe describe the version of the GPT planner used in the probabilistic track of the 4th International Planning Competition (IPC-4). This version, called mGPT, solves Markov Decision Processes specified in the PPDDL language by extracting and using different classes of lower bounds along with various heuristic-search algorithms. The lower bounds are extracted from deterministic relaxations where the alternative probabilistic effects of an action are mapped into different, independent, deterministic actions.
The heuristic-search algorithms use these lower bounds to focus the updates and deliver a consistent value function over all states reachable from the initial state and the greedy policy.\nThe Optiplan planning system is the first integer programming-based planner that successfully participated in the international planning competition. This engineering note describes the architecture of Optiplan and provides the integer programming formulation that enabled it to perform reasonably well in the competition. We also touch upon some recent developments that make integer programming encodings significantly more competitive.\nPDDL2.1 was designed to push the envelope of what planning algorithms can do, and it has succeeded. It adds two important features: durative actions, which take time (and may have continuous effects); and objective functions for measuring the quality of plans. The concept of durative actions is flawed, and the treatment of their semantics reveals too strong an attachment to the way many contemporary planners work. Future PDDL innovators should focus on producing a clean semantics for additions to the language, and let planner implementers worry about coupling their algorithms to problems expressed in the latest version of the language.\nThe addition of durative actions to PDDL2.1 sparked some controversy. Fox and Long argued that actions should be considered as instantaneous, but can start and stop processes. Ultimately, a limited notion of durative actions was incorporated into the language. I argue that this notion is still impoverished, and that the underlying philosophical position of regarding durative actions as being a shorthand for a start action, process, and stop action ignores the realities of modelling and execution for complex systems.\nWe present a novel framework for integrating prior knowledge into discriminative classifiers.
Our framework allows discriminative classifiers such as Support Vector Machines (SVMs) to utilize prior knowledge specified in the generative setting. The dual objective of fitting the data and respecting prior knowledge is formulated as a bilevel program, which is solved (approximately) via iterative application of second-order cone programming. To test our approach, we consider the problem of using WordNet (a semantic database of the English language) to improve low-sample classification accuracy of newsgroup categorization. WordNet is viewed as an approximate but readily available source of background knowledge, and our framework is capable of utilizing it in a flexible way.\nWe present a new algorithm for probabilistic planning with no observability. Our algorithm, called Probabilistic-FF, extends the heuristic forward-search machinery of Conformant-FF to problems with probabilistic uncertainty about both the initial state and action effects. Specifically, Probabilistic-FF combines Conformant-FF's techniques with powerful machinery for weighted model counting in (weighted) CNFs, serving to elegantly define both the search space and the heuristic function. Our evaluation of Probabilistic-FF shows its fine scalability in a range of probabilistic domains, constituting a several-orders-of-magnitude improvement over previous results in this area. We use a problematic case to point out the main open issue to be addressed by further research.\nWe present three new complexity results for classes of planning problems with simple causal graphs. First, we describe a polynomial-time algorithm that uses macros to generate plans for the class 3S of planning problems with binary state variables and acyclic causal graphs. This implies that plan generation may be tractable even when a planning problem has an exponentially long minimal solution. We also prove that the problem of plan existence for planning problems with multi-valued variables and chain causal graphs is NP-hard.
Finally, we show that plan existence for planning problems with binary state variables and polytree causal graphs is NP-complete.\nThe task of keyhole (unobtrusive) plan recognition is central to adaptive game AI. \"Tech trees\" or \"build trees\" are the core of real-time strategy (RTS) game strategic (long term) planning. This paper presents a generic and simple Bayesian model for RTS build tree prediction from noisy observations, whose parameters are learned from replays (game logs). This unsupervised machine learning approach involves minimal work for the game developers as it leverages players' data (common in RTS). We applied it to StarCraft1 and showed that it yields high quality and robust predictions that can feed an adaptive AI.\nWe analyse the philosopher Davidson's semantics of actions, using a strongly typed logic with contexts given by sets of partial equations between the outcomes of actions. This provides a perspicuous and elegant treatment of reasoning about action, analogous to Reiter's work on artificial intelligence. We define a sequent calculus for this logic, prove cut elimination, and give a semantics based on fibrations over partial cartesian categories: we give a structure theory for such fibrations. The existence of lax comma objects is necessary for the proof of cut elimination, and we give conditions on the domain fibration of a partial cartesian category for such comma objects to exist.\nPerforming sensitivity analysis for influence diagrams using the decision circuit framework is particularly convenient, since the partial derivatives with respect to every parameter are readily available [Bhattacharjya and Shachter, 2007; 2008]. In this paper we present three non-linear sensitivity analysis methods that utilize this partial derivative information and therefore do not require re-evaluating the decision situation multiple times.
Specifically, we show how to efficiently compare strategies in decision situations, perform sensitivity to risk aversion and compute the value of perfect hedging [Seyller, 2008].\nWe propose a new point-based method for approximate planning in Dec-POMDP which outperforms the state-of-the-art approaches in terms of solution quality. It uses a heuristic estimation of the prior probability of beliefs to choose a bounded number of policy trees: this choice is formulated as a combinatorial optimisation problem minimising the error induced by pruning.\nA major limitation of exact inference algorithms for probabilistic graphical models is their extensive memory usage, which often puts real-world problems out of their reach. In this paper we show how we can extend inference algorithms, particularly Bucket Elimination, a special case of cluster (join) tree decomposition, to utilize disk memory. We provide the underlying ideas and show promising empirical results of exactly solving large problems not solvable before.\nWe describe a framework and an algorithm for solving hybrid influence diagrams with discrete, continuous, and deterministic chance variables, and discrete and continuous decision variables. A continuous chance variable in an influence diagram is said to be deterministic if its conditional distributions have zero variances. The solution algorithm is an extension of Shenoy's fusion algorithm for discrete influence diagrams. We describe an extended Shenoy-Shafer architecture for propagation of discrete, continuous, and utility potentials in hybrid influence diagrams that include deterministic chance variables. The algorithm and framework are illustrated by solving two small examples.\nThe paper introduces k-bounded MAP inference, a parameterization of MAP inference in Markov logic networks. k-Bounded MAP states are MAP states with at most k active ground atoms of hidden (non-evidence) predicates. 
We present a novel delayed column generation algorithm and provide empirical evidence that the algorithm efficiently computes k-bounded MAP states for meaningful real-world graph matching problems. The underlying idea is that, instead of solving one large optimization problem, it is often more efficient to tackle several small ones.\nRollating walkers are popular mobility aids used by older adults to improve balance control. There is a need to automatically recognize the activities performed by walker users to better understand activity patterns, mobility issues and the context in which falls are more likely to happen. We design and compare several techniques to recognize walker related activities. A comprehensive evaluation with control subjects and walker users from a retirement community is presented.\nThe paper provides a simple test for deciding, from a given causal diagram, whether two sets of variables have the same bias-reducing potential under adjustment. The test requires that one of the following two conditions holds: either (1) both sets are admissible (i.e., satisfy the back-door criterion) or (2) the Markov boundaries surrounding the manipulated variable(s) are identical in both sets. Applications to covariate selection and model testing are discussed.\nThe standard coherence criterion for lower previsions is expressed using an infinite number of linear constraints. For lower previsions that are essentially defined on some finite set of gambles on a finite possibility space, we present a reformulation of this criterion that only uses a finite number of constraints. Any such lower prevision is coherent if it lies within the convex polytope defined by these constraints. The vertices of this polytope are the extreme coherent lower previsions for the given set of gambles. Our reformulation makes it possible to compute them. 
We show how this is done and illustrate the procedure and its results.\nWe present a probabilistic model of events in continuous time in which each event triggers a Poisson process of successor events. The ensemble of observed events is thereby modeled as a superposition of Poisson processes. Efficient inference is feasible under this model with an EM algorithm. Moreover, the EM algorithm can be implemented as a distributed algorithm, permitting the model to be applied to very large datasets. We apply these techniques to the modeling of Twitter messages and the revision history of Wikipedia.\nWe study the problem of learning Bayesian network structures from data. We develop an algorithm for finding the k-best Bayesian network structures. We propose to compute the posterior probabilities of hypotheses of interest by Bayesian model averaging over the k-best Bayesian networks. We present empirical results on structural discovery over several real and synthetic data sets and show that the method outperforms the model selection method and the state-of-the-art MCMC methods.\nFor product rating environments, similar to that of Amazon Reviews, it has been shown that the truthful elicitation of feedback is possible through mechanisms which pay buyer reports contingent on the reports of other buyers. We study whether similar mechanisms can be designed for reputation mechanisms at online auction sites where the buyers' experiences are partially determined by a strategic seller. We show that this is impossible for the basic setting. However, introducing a small prior belief that the seller is a cooperative commitment player leads to a payment scheme with a truthful perfect Bayesian equilibrium.\nA branch-and-bound approach to solving influence diagrams has been previously proposed in the literature, but appears to have never been implemented and evaluated, apparently due to the difficulties of computing effective bounds for the branch-and-bound search.
In this paper, we describe how to efficiently compute effective bounds, and we develop a practical implementation of depth-first branch-and-bound search for influence diagram evaluation that outperforms existing methods for solving influence diagrams with multiple stages.\nThis work proposes to put up a tool for diagnosing multi faults based on model using techniques of detection and localization inspired from the community of artificial intelligence and that of automatic control. The diagnostic procedure to be integrated into the supervisory system must therefore be provided with explanatory features. Techniques based on causal reasoning are a pertinent approach for this purpose. Bond graph modeling is used to describe the cause effect relationship between process variables. Experimental results are presented and discussed in order to compare performance of causal graph technique and classic methods inspired from artificial intelligence (DX) and control theory (FDI).\nThe development of expert system for treatment of Diabetes disease by using natural methods is new information technology derived from Artificial Intelligence research using ESTA (Expert System Text Animation) System. The proposed expert system contains knowledge about various methods of natural treatment methods (Massage, Herbal/Proper Nutrition, Acupuncture, Gems) for Diabetes diseases of Human Beings. The system is developed in the ESTA (Expert System shell for Text Animation) which is Visual Prolog 7.3 Application. The knowledge for the said system will be acquired from domain experts, texts and other related sources.\nFirst-order probabilistic models combine representational power of first-order logic with graphical models. There is an ongoing effort to design lifted inference algorithms for first-order probabilistic models.
We analyze lifted inference from the perspective of constraint processing and, through this viewpoint, we analyze and compare existing approaches and expose their advantages and limitations. Our theoretical results show that the wrong choice of constraint processing method can lead to exponential increase in computational complexity. Our empirical tests confirm the importance of constraint processing in lifted inference. This is the first theoretical and empirical study of constraint processing in lifted inference.\nWe study a subclass of POMDPs, called Deterministic POMDPs, that is characterized by deterministic actions and observations. These models do not provide the same generality of POMDPs yet they capture a number of interesting and challenging problems, and permit more efficient algorithms. Indeed, some of the recent work in planning is built around such assumptions mainly by the quest of amenable models more expressive than the classical deterministic models. We provide results about the fundamental properties of Deterministic POMDPs, their relation with AND/OR search problems and algorithms, and their computational complexity.\nThe paper introduces AND/OR importance sampling for probabilistic graphical models. In contrast to importance sampling, AND/OR importance sampling caches samples in the AND/OR space and then extracts a new sample mean from the stored samples. We prove that AND/OR importance sampling may have lower variance than importance sampling; thereby providing a theoretical justification for preferring it over importance sampling. Our empirical evaluation demonstrates that AND/OR importance sampling is far more accurate than importance sampling in many cases.\nWe present an algorithm that identifies the reasoning patterns of agents in a game, by iteratively examining the graph structure of its Multi-Agent Influence Diagram (MAID) representation. 
If the decision of an agent participates in no reasoning patterns, then we can effectively ignore that decision for the purpose of calculating a Nash equilibrium for the game. In some cases, this can lead to exponential time savings in the process of equilibrium calculation. Moreover, our algorithm can be used to enumerate the reasoning patterns in a game, which can be useful for constructing more effective computerized agents interacting with humans.\nIt is well-known that inference in graphical models is hard in the worst case, but tractable for models with bounded treewidth. We ask whether treewidth is the only structural criterion of the underlying graph that enables tractable inference. In other words, is there some class of structures with unbounded treewidth in which inference is tractable? Subject to a combinatorial hypothesis due to Robertson et al. (1994), we show that low treewidth is indeed the only structural restriction that can ensure tractability. Thus, even for the \"best case\" graph structure, there is no inference algorithm with complexity polynomial in the treewidth.\nWe consider conditions that allow us to find an optimal strategy for sequential decisions from a given data situation. For the case where all interventions are unconditional (atomic), identifiability has been discussed by Pearl & Robins (1995). We argue here that an optimal strategy must be conditional, i.e. take the information available at each decision point into account. We show that the identification of an optimal sequential decision strategy is more restrictive, in the sense that conditional interventions might not always be identified when atomic interventions are. We further demonstrate that a simple graphical criterion for the identifiability of an optimal strategy can be given.\nApproximate inference in dynamic systems is the problem of estimating the state of the system given a sequence of actions and partial observations. 
High precision estimation is fundamental in many applications like diagnosis, natural language processing, tracking, planning, and robotics. In this paper we present an algorithm that samples possible deterministic executions of a probabilistic sequence. The algorithm takes advantage of a compact representation (using first order logic) for actions and world states to improve the precision of its estimation. Theoretical and empirical results show that the algorithm's expected error is smaller than propositional sampling and Sequential Monte Carlo (SMC) sampling techniques.\nAn efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas most policy search methods estimate this gradient by observing the rewards obtained during policy trials, we show, both theoretically and empirically, that taking into account the sensor data as well gives better gradient estimates and hence faster learning. The reason is that rewards obtained during policy execution vary from trial to trial due to noise in the environment; sensor data, which correlates with the noise, can be used to partially correct for this variation, resulting in an estimator with lower variance.\nBayesian networks can be used to extract explanations about the observed state of a subset of variables. In this paper, we explicate the desiderata of an explanation and confront them with the concept of explanation proposed by existing methods. The necessity of taking into account causal approaches when a causal graph is available is discussed. We then introduce causal explanation trees, based on the construction of explanation trees using the measure of causal information flow (Ay and Polani, 2006).
This approach is compared to several other methods on known networks.\nModel-based Bayesian reinforcement learning has generated significant interest in the AI community as it provides an elegant solution to the optimal exploration-exploitation tradeoff in classical reinforcement learning. Unfortunately, the applicability of this type of approach has been limited to small domains due to the high complexity of reasoning about the joint posterior over model parameters. In this paper, we consider the use of factored representations combined with online planning techniques, to improve scalability of these methods. The main contribution of this paper is a Bayesian framework for learning the structure and parameters of a dynamical system, while also simultaneously planning a (near-)optimal sequence of actions.\nThis paper develops a measure for bounding the performance of AND/OR search algorithms for solving a variety of queries over graphical models. We show how drawing a connection to the recent notion of hypertree decompositions allows to exploit determinism in the problem specification and produce tighter bounds. We demonstrate on a variety of practical problem instances that we are often able to improve upon existing bounds by several orders of magnitude.\nWe present and evaluate new techniques for designing algorithm portfolios. In our view, the problem has both a scheduling aspect and a machine learning aspect. Prior work has largely addressed one of the two aspects in isolation. Building on recent work on the scheduling aspect of the problem, we present a technique that addresses both aspects simultaneously and has attractive theoretical guarantees. Experimentally, we show that this technique can be used to improve the performance of state-of-the-art algorithms for Boolean satisfiability, zero-one integer programming, and A.I. 
planning.\nIn this paper we introduce Refractor Importance Sampling (RIS), an improvement to reduce error variance in Bayesian network importance sampling propagation under evidential reasoning. We prove the existence of a collection of importance functions that are close to the optimal importance function under evidential reasoning. Based on this theoretic result we derive the RIS algorithm. RIS approaches the optimal importance function by applying localized arc changes to minimize the divergence between the evidence-adjusted importance function and the optimal importance function. The validity and performance of RIS is empirically tested with a large set of synthetic Bayesian networks and two real-world networks.\nThe paper introduces a generalization for known probabilistic models such as log-linear and graphical models, called here multiplicative models. These models, which express probabilities via products of parameters, are shown to capture multiple forms of contextual independence between variables, including decision graphs and noisy-OR functions. An inference algorithm for multiplicative models is provided and its correctness is proved. The complexity analysis of the inference algorithm uses a more refined parameter than the tree-width of the underlying graph, and shows the computational cost does not exceed that of the variable elimination algorithm in graphical models. The paper ends with examples where using the new models and algorithm is computationally beneficial.\nWe propose a new Monte Carlo algorithm for complex discrete distributions. The algorithm is motivated by the N-Fold Way, which is an ingenious event-driven MCMC sampler that avoids rejection moves at any specific state. The N-Fold Way can however get \"trapped\" in cycles. We surmount this problem by modifying the sampling process.
This correction does introduce bias, but the bias is subsequently corrected with a carefully engineered importance sampler.\nWe propose a definition of causality for time series in terms of the effect of an intervention in one component of a multivariate time series on another component at some later point in time. Conditions for identifiability, comparable to the back-door and front-door criteria, are presented and can also be verified graphically. Computation of the causal effect is derived and illustrated for the linear case.\nWe describe the semantic foundations for elicitation of generalized additively independent (GAI) utilities using the minimax regret criterion, and propose several new query types and strategies for this purpose. Computational feasibility is obtained by exploiting the local GAI structure in the model. Our results provide a practical approach for implementing preference-based constrained configuration optimization as well as effective search in multiattribute product databases.\nWe use the implicitization procedure to generate polynomial equality constraints on the set of distributions induced by local interventions on variables governed by a causal Bayesian network with hidden variables. We show how we may reduce the complexity of the implicitization problem and make the problem tractable in certain causal Bayesian networks. We also show some preliminary results on the algebraic structure of polynomial constraints. The results have applications in distinguishing between causal models and in testing causal models with combined observational and experimental data.\nThe belief propagation (BP) algorithm is widely applied to perform approximate inference on arbitrary graphical models, in part due to its excellent empirical properties and performance. However, little is known theoretically about when this algorithm will perform well. 
Using recent analysis of convergence and stability properties in BP and new results on approximations in binary systems, we derive a bound on the error in BP's estimates for pairwise Markov random fields over discrete valued random variables. Our bound is relatively simple to compute, and compares favorably with a previous method of bounding the accuracy of BP.\nCounterfactual statements, e.g., \"my headache would be gone had I taken an aspirin\" are central to scientific discourse, and are formally interpreted as statements derived from \"alternative worlds\". However, since they invoke hypothetical states of affairs, often incompatible with what is actually known or observed, testing counterfactuals is fraught with conceptual and practical difficulties. In this paper, we provide a complete characterization of \"testable counterfactuals,\" namely, counterfactual statements whose probabilities can be inferred from physical experiments. We provide complete procedures for discerning whether a given counterfactual is testable and, if so, expressing its probability in terms of experimental data.\nMemory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well.\nIn Bayesian networks, a Most Probable Explanation (MPE) is a complete variable instantiation with a highest probability given the current evidence. 
In this paper, we discuss the problem of finding robustness conditions of the MPE under single parameter changes. Specifically, we ask the question: How much change in a single network parameter can we afford to apply while keeping the MPE unchanged? We will describe a procedure, which is the first of its kind, that computes this answer for each parameter in the Bayesian network variable in time O(n exp(w)), where n is the number of network variables and w is its treewidth.\nWe present a class of inequality constraints on the set of distributions induced by local interventions on variables governed by a causal Bayesian network, in which some of the variables remain unmeasured. We derive bounds on causal effects that are not directly measured in randomized experiments. We derive instrumental inequality type of constraints on nonexperimental distributions. The results have applications in testing causal models with observational or experimental data.\nThis paper studies a new and more general axiomatization than one presented previously for preference on likelihood gambles. Likelihood gambles describe actions in a situation where a decision maker knows multiple probabilistic models and a random sample generated from one of those models but does not know the prior probability of models. This new axiom system is inspired by Jensen's axiomatization of probabilistic gambles. Our approach provides a new perspective to the role of data in decision making under ambiguity. It avoids one of the most controversial issues of Bayesian methodology, namely the assumption of prior probability.
In this paper, we introduce a new architecture, called the Multi-operator Cluster DAG architecture, which can produce decompositions with an improved constrained induced-width, and therefore induce potentially exponential gains. Its principle is to benefit from the composite nature of influence diagrams, instead of using uniform potentials, in order to better analyze the problem structure.\nOne approach to monitoring a dynamic system relies on decomposition of the system into weakly interacting subsystems. An earlier paper introduced a notion of weak interaction called separability, and showed that it leads to exact propagation of marginals for prediction. This paper addresses two questions left open by the earlier paper: can we define a notion of approximate separability that occurs naturally in practice, and do separability and approximate separability lead to accurate monitoring? The answer to both questions is affirmative. The paper also analyzes the structure of approximately separable decompositions, and provides some explanation as to why these models perform well.\nWe propose a method to identify all the nodes that are relevant to compute all the conditional probability distributions for a given set of nodes. Our method is simple, efficient, consistent, and does not require learning a Bayesian network first. Therefore, our method can be applied to high-dimensional databases, e.g. gene expression databases.\nIn recent years Bayesian networks (BNs) with a mixture of continuous and discrete variables have received an increasing level of attention. We present an architecture for exact belief update in Conditional Linear Gaussian BNs (CLG BNs). The architecture is an extension of lazy propagation using operations of Lauritzen & Jensen [6] and Cowell [2]. By decomposing clique and separator potentials into sets of factors, the proposed architecture takes advantage of independence and irrelevance properties induced by the structure of the graph and the evidence.
The resulting benefits are illustrated by examples. Results of a preliminary empirical performance evaluation indicate a significant potential of the proposed architecture.\nWe set up a model for reasoning about metric spaces with belief theoretic measures. The uncertainty in these spaces stems from both probability and metric. To represent both aspects of uncertainty, we choose an expected distance function as a measure of uncertainty. A formal logical system is constructed for the reasoning about expected distance. Soundness and completeness are shown for this logic. For reasoning on product metric space with uncertainty, a new metric is defined and shown to have good properties.\nThis paper proposes new formulas for the probabilities of causation defined by Pearl (2000). Tian and Pearl (2000a, 2000b) showed how to bound the quantities of the probabilities of causation from experimental and observational data, under the minimal assumptions about the data-generating process. We derive narrower bounds than Tian-Pearl bounds by making use of the covariate information measured in experimental and observational studies. In addition, we provide an identifiable case under the no-prevention assumption and discuss the covariate selection problem from the viewpoint of estimation accuracy. These results are helpful in providing more evidence for public policy assessment and decision making problems.\nWe introduce priors and algorithms to perform Bayesian inference in Gaussian models defined by acyclic directed mixed graphs. Such a class of graphs, composed of directed and bi-directed edges, is a representation of conditional independencies that is closed under marginalization and arises naturally from causal models which allow for unmeasured confounding. Monte Carlo methods and a variational approximation for such models are presented.
Our algorithms for Bayesian inference allow the evaluation of posterior distributions for several quantities of interest, including causal effects that are not identifiable from data alone but could otherwise be inferred where informative prior knowledge about confounding is available.\nThe subject of this paper is the elucidation of effects of actions from causal assumptions represented as a directed graph, and statistical knowledge given as a probability distribution. In particular, we are interested in predicting conditional distributions resulting from performing an action on a set of variables and, subsequently, taking measurements of another set. We provide a necessary and sufficient graphical condition for the cases where such distributions can be uniquely computed from the available information, as well as an algorithm which performs this computation whenever the condition holds. Furthermore, we use our results to prove completeness of do-calculus [Pearl, 1995] for the same identification problem.\nThe use of Artificial Intelligence is finding prominence not only in core computer areas, but also in cross disciplinary areas including medical diagnosis. In this paper, we present a rule based Expert System used in diagnosis of Cerebral Palsy. The expert system takes user input and depending on the symptoms of the patient, diagnoses if the patient is suffering from Cerebral Palsy. The Expert System also classifies the Cerebral Palsy as mild, moderate or severe based on the presented symptoms.\nWhile POMDPs provide a general platform for non-deterministic conditional planning under a variety of quality metrics they have limited scalability. On the other hand, non-deterministic conditional planners scale very well, but many lack the ability to optimize plan quality metrics. We present a novel generalization of planning graph based heuristics that helps conditional planners both scale and generate high quality plans when using actions with nonuniform costs. 
We make empirical comparisons with two state of the art planners to show the benefit of our techniques.\nWe present research on developing models that forecast traffic flow and congestion in the Greater Seattle area. The research has led to the deployment of a service named JamBayes, that is being actively used by over 2,500 users via smartphones and desktop versions of the system. We review the modeling effort and describe experiments probing the predictive accuracy of the models. Finally, we present research on building models that can identify current and future surprises, via efforts on modeling and forecasting unexpected situations.\nThis paper explores semi-qualitative probabilistic networks (SQPNs) that combine numeric and qualitative information. We first show that exact inferences with SQPNs are NPPP-Complete. We then show that existing qualitative relations in SQPNs (plus probabilistic logic and imprecise assessments) can be dealt effectively through multilinear programming. We then discuss learning: we consider a maximum likelihood method that generates point estimates given a SQPN and empirical data, and we describe a Bayesian-minded method that employs the Imprecise Dirichlet Model to generate set-valued estimates.\nVoting is a very general method of preference aggregation. A voting rule takes as input every voter's vote (typically, a ranking of the alternatives), and produces as output either just the winning alternative or a ranking of the alternatives. One potential view of voting is the following. There exists a 'correct' outcome (winner/ranking), and each voter's vote corresponds to a noisy perception of this correct outcome. If we are given the noise model, then for any vector of votes, we can\nWe consider the problem of deleting edges from a Bayesian network for the purpose of simplifying models in probabilistic inference. In particular, we propose a new method for deleting network edges, which is based on the evidence at hand. 
We provide some interesting bounds on the KL-divergence between original and approximate networks, which highlight the impact of given evidence on the quality of approximation and shed some light on good and bad candidates for edge deletion. We finally demonstrate empirically the promise of the proposed edge deletion technique as a basis for approximate inference.\nWe define the notion of compiling a Bayesian network with evidence and provide a specific approach for evidence-based compilation, which makes use of logical processing. The approach is practical and advantageous in a number of application areas-including maximum likelihood estimation, sensitivity analysis, and MAP computations-and we provide specific empirical results in the domain of genetic linkage analysis. We also show that the approach is applicable for networks that do not contain determinism, and show that it empirically subsumes the performance of the quickscore algorithm when applied to noisy-or networks.\nThe local Markov condition for a DAG to be an independence map of a probability distribution is well known. For DAGs with latent variables, represented as bi-directed edges in the graph, the local Markov property may invoke exponential number of conditional independencies. This paper shows that the number of conditional independence relations required may be reduced if the probability distributions satisfy the composition axiom. In certain types of graphs, only linear number of conditional independencies are required. The result has applications in testing linear structural equation models with correlated errors.\nIn this paper, we consider Hybrid Mixed Networks (HMN) which are Hybrid Bayesian Networks that allow discrete deterministic information to be modeled explicitly in the form of constraints. 
We present two approximate inference algorithms for HMNs that integrate and adjust well known algorithmic principles such as Generalized Belief Propagation, Rao-Blackwellised Importance Sampling and Constraint Propagation to address the complexity of modeling and reasoning in HMNs. We demonstrate the performance of our approximate inference algorithms on randomly generated HMNs.\nWe present metrics for measuring state similarity in Markov decision processes (MDPs) with infinitely many states, including MDPs with continuous state spaces. Such metrics provide a stable quantitative analogue of the notion of bisimulation for MDPs, and are suitable for use in MDP approximation. We show that the optimal value function associated with a discounted infinite horizon planning task varies continuously with respect to our metric distances.\nTackling the problem of ordinal preference revelation and reasoning, we propose a novel methodology for generating an ordinal utility function from a set of qualitative preference statements. To the best of our knowledge, our proposal constitutes the first nonparametric solution for this problem that is both efficient and semantically sound. Our initial experiments provide strong evidence for practical effectiveness of our approach.\nConsider the case where cause-effect relationships between variables can be described as a directed acyclic graph and the corresponding linear structural equation model. This paper provides graphical identifiability criteria for total effects by using surrogate variables in the case where it is difficult to observe a treatment/response variable. The results enable us to judge from graph structure whether a total effect can be identified through the observation of surrogate variables.\nIn this paper we compare search and inference in graphical models through the new framework of AND/OR search. 
Specifically, we compare Variable Elimination (VE) and memory-intensive AND/OR Search (AO) and place algorithms such as graph-based backjumping and no-good and good learning, as well as Recursive Conditioning [7] and Value Elimination [2] within the AND/OR search framework.\nExisting complexity bounds for point-based POMDP value iteration algorithms focus either on the curse of dimensionality or the curse of history. We derive a new bound that relies on both and uses the concept of discounted reachability; our conclusions may help guide future algorithm design. We also discuss recent improvements to our (point-based) heuristic search value iteration algorithm. Our new implementation calculates tighter initial bounds, avoids solving linear programs, and makes more effective use of sparsity.\nThe aim of this paper is to propose a generalization of previous approaches in qualitative decision making. Our work is based on the binary possibilistic utility (PU), which is a possibilistic counterpart of Expected Utility (EU). We first provide a new axiomatization of PU and study its relation with the lexicographic aggregation of pessimistic and optimistic utilities. Then we explain the reasons for the coarseness of qualitative decision criteria. Finally, thanks to a redefinition of possibilistic lotteries and mixtures, we present the refined binary possibilistic utility, which is more discriminating than previously proposed criteria.\nMaximal ancestral graphs (MAGs) are used to encode conditional independence relations in DAG models with hidden variables. Different MAGs may represent the same set of conditional independences and are called Markov equivalent.
This paper considers MAGs without undirected edges and shows conditions under which an arrow in a MAG can be reversed or interchanged with a bi-directed edge so as to yield a Markov equivalent MAG.\nWe propose a variant of Alternating-time Temporal Logic (ATL) grounded in the agents' operational know-how, as defined by their libraries of abstract plans. Inspired by ATLES, a variant itself of ATL, it is possible in our logic to explicitly refer to \"rational\" strategies for agents developed under the Belief-Desire-Intention agent programming paradigm. This allows us to express and verify properties of BDI systems using ATL-type logical frameworks.\nWe present metrics for measuring the similarity of states in a finite Markov decision process (MDP). The formulation of our metrics is based on the notion of bisimulation for MDPs, with an aim towards solving discounted infinite horizon reinforcement learning tasks. Such metrics can be used to aggregate states, as well as to better structure other value function approximators (e.g., memory-based or nearest-neighbor approximators). We provide bounds that relate our metric distances to the optimal values of states in the given MDP.\nWe describe an approach for exploiting structure in Markov Decision Processes with continuous state variables. At each step of the dynamic programming, the state space is dynamically partitioned into regions where the value function is the same throughout the region. We first describe the algorithm for piecewise constant representations. We then extend it to piecewise linear representations, using techniques from POMDPs to represent and reason about linear surfaces efficiently. We show that for complex, structured problems, our approach exploits the natural structure so that optimal solutions can be computed efficiently.\nWe present a major improvement to the incremental pruning algorithm for solving partially observable Markov decision processes. 
Our technique targets the cross-sum step of the dynamic programming (DP) update, a key source of complexity in POMDP algorithms. Instead of reasoning about the whole belief space when pruning the cross-sums, our algorithm divides the belief space into smaller regions and performs independent pruning in each region. We evaluate the benefits of the new technique both analytically and experimentally, and show that it produces very significant performance gains. The results contribute to the scalability of POMDP algorithms to domains that cannot be handled by the best existing techniques.\nThe aim of this work is to provide a unified framework for ordinal representations of uncertainty lying at the crossroads between possibility and probability theories. Such confidence relations between events are commonly found in monotonic reasoning, inconsistency management, or qualitative decision theory. They start either from probability theory, making it more qualitative, or from possibility theory, making it more expressive. We show these two trends converge to a class of genuine probability theories. We provide characterization results for these useful tools that preserve the qualitative nature of possibility rankings, while enjoying the power of expressivity of additive representations.\nThe paper introduces mixed networks, a new framework for expressing and reasoning with probabilistic and deterministic information. The framework combines belief networks with constraint networks, defining the semantics and graphical representation. We also introduce the AND/OR search space for graphical models, and develop a new linear space search algorithm. This provides the basis for understanding the benefits of processing the constraint information separately, resulting in the pruning of the search space.
When the constraint part is tractable or has a small number of solutions, using the mixed representation can be exponentially more effective than using pure belief networks, which model constraints as conditional probability tables.\nWe consider the challenge of preference elicitation in systems that help users discover the most desirable item(s) within a given database. Past work on preference elicitation focused on structured models that provide a factored representation of users' preferences. Such models require less information to construct and support efficient reasoning algorithms. This paper makes two substantial contributions to this area: (1) Strong representation theorems for factored value functions. (2) A methodology that utilizes our representation results to address the problem of optimal item selection.\nPearl has provided the back door criterion, the front door criterion and the conditional instrumental variable (IV) method as identifiability criteria for total effects. In some situations, these three criteria can be applied to identifying total effects simultaneously. For the purpose of increasing estimation accuracy, this paper compares the three ways of identifying total effects in terms of the asymptotic variance, and concludes that in some situations the best of them can be recognized directly from the graph structure.\nThis paper concerns the assessment of the effects of actions from a combination of nonexperimental data and causal assumptions encoded in the form of a directed acyclic graph in which some variables are presumed to be unobserved. We provide a procedure that systematically identifies causal effects between two sets of variables conditioned on some other variables, in time polynomial in the number of variables in the graph.
The identifiable conditional causal effects are expressed in terms of the observed joint distribution.\nWe present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI's soundness and convergence have been proven. On some benchmark problems from the literature, HSVI displays speedups of greater than 100 with respect to other state-of-the-art POMDP value iteration algorithms. We also apply HSVI to a new rover exploration problem 10 times larger than most POMDP problems in the literature.\nA causal claim is any assertion that invokes causal relationships between variables, for example that a drug has a certain effect on preventing a disease. Causal claims are established through a combination of data and a set of causal assumptions called a causal model. A claim is robust when it is insensitive to violations of some of the causal assumptions embodied in the model. This paper gives a formal definition of this notion of robustness and establishes a graphical condition for quantifying the degree of robustness of a given causal claim. Algorithms for computing the degree of robustness are also presented.\nFilter selection techniques are known for their simplicity and efficiency. However, such methods do not take the inter-redundancy of features into consideration. Consequently, redundant features that are not removed remain in the final classification model, giving lower generalization performance.
In this paper we propose to use a mathematical optimization method that reduces inter-feature redundancy and maximizes relevance between each feature and the target variable.\nWe develop several algorithms taking advantage of two common approaches for bounding MPE queries in graphical models: minibucket elimination and message-passing updates for linear programming relaxations. Both methods are quite similar, and offer useful perspectives for the other; our hybrid approaches attempt to balance the advantages of each. We demonstrate the power of our hybrid algorithms through extensive empirical evaluation. Most notably, a Branch and Bound search guided by the heuristic function calculated by one of our new algorithms has recently won first place in the PASCAL2 inference challenge.\nWe consider the problem of selecting a subset of alternatives given noisy evaluations of the relative strength of different alternatives. We wish to select a k-subset (for a given k) that provides a maximum likelihood estimate for one of several objectives, e.g., containing the strongest alternative. Although this problem is NP-hard, we show that when the noise level is sufficiently high, intuitive methods provide the optimal solution. We thus generalize classical results about singling out one alternative and identifying the hidden ranking of alternatives by strength. Extensive experiments show that our methods perform well in practical settings.\nWe study the problem of complexity estimation in the context of parallelizing an advanced Branch and Bound-type algorithm over graphical models. The algorithm's pruning power makes load balancing, one crucial element of every distributed system, very challenging. We propose using a statistical regression model to identify and tackle disproportionally complex parallel subproblems, the cause of load imbalance, ahead of time. The proposed model is evaluated and analyzed on various levels and shown to yield robust predictions.
We then demonstrate its effectiveness for load balancing in practice.\nInfluence diagrams allow for intuitive and yet precise description of complex situations involving decision making under uncertainty. Unfortunately, most of the problems described by influence diagrams are hard to solve. In this paper we discuss the complexity of approximately solving influence diagrams. We do not assume no-forgetting or regularity, which makes the class of problems we address very broad. Remarkably, we show that when both the tree-width and the cardinality of the variables are bounded, the problem admits a fully polynomial-time approximation scheme.\nVariational inference algorithms such as belief propagation have had tremendous impact on our ability to learn and use graphical models, and give many insights for developing or understanding exact and approximate inference. However, variational approaches have not been widely adopted for decision making in graphical models, often formulated through influence diagrams and including both centralized and decentralized (or multi-agent) decisions. In this work, we present a general variational framework for solving structured cooperative decision-making problems, use it to propose several belief propagation-like algorithms, and analyze them both theoretically and empirically.\nStochastic domains often involve risk-averse decision makers. While recent work has focused on how to model risk in Markov decision processes using risk measures, it has not addressed the problem of solving large risk-averse formulations. In this paper, we propose and analyze a new method for solving large risk-averse MDPs with hybrid continuous-discrete state spaces and continuous action spaces. The proposed method iteratively improves a bound on the value function using a linearity structure of the MDP. We demonstrate the utility and properties of the method on a portfolio optimization problem.\nIn this paper, starting from a generalized coherent (i.e.
avoiding uniform loss) interval-valued probability assessment on a finite family of conditional events, we construct conditional probabilities with quasi-additive classes of conditioning events which are consistent with the given initial assessment. Quasi-additivity assures coherence for the obtained conditional probabilities. In order to reach our goal we define a finite sequence of conditional probabilities by exploiting some theoretical results on g-coherence. In particular, we use solutions of a finite sequence of linear systems.\nWe describe multi-objective influence diagrams, based on a set of p objectives, where utility values are vectors in R^p, and are typically only partially ordered. These can still be solved by a variable elimination algorithm, leading to a set of maximal values of expected utility. If the Pareto ordering is used, this set can often be prohibitively large. We consider approximate representations of the Pareto set based on e-coverings, allowing much larger problems to be solved. In addition, we define a method for incorporating user tradeoffs, which also greatly improves the efficiency.\nControlled natural languages (mostly English-based) recently have emerged as seemingly informal supplementary means for OWL ontology authoring, if compared to the formal notations that are used by professional knowledge engineers. In this paper we present by examples controlled Latvian language that has been designed to be compliant with the state-of-the-art Attempto Controlled English. We also discuss its relation with the controlled Lithuanian language that is being designed in parallel.\nIn cognitive sciences it is not uncommon to use various games effectively. For example, in artificial intelligence, the RoboCup initiative was set up to catalyse research in the field of autonomous agent technology.
In this paper, we introduce a similar soccer simulation initiative to try to investigate a model of human consciousness and a notion of reality in the form of a cognitive problem. In addition, for example, the home pitch advantage and the objective role of the supporters could be naturally described and discussed in terms of this new soccer simulation model.\nThe RoboCup 2D Simulation League incorporates several challenging features, setting a benchmark for Artificial Intelligence (AI). In this paper we describe some of the ideas and tools around the development of our team, Gliders2012. In our description, we focus on the evaluation function as one of our central mechanisms for action selection. We also point to a new framework for watching log files in a web browser that we release for use and further development by the RoboCup community. Finally, we also summarize results of the group and final matches we played during RoboCup 2012, with Gliders2012 finishing 4th out of 19 teams.\nThis paper advocates the exploration of the full state of recorded real-time strategy (RTS) games, by human or robotic players, to discover how to reason about tactics and strategy. We present a dataset of StarCraft games encompassing most of the games' state (not only players' orders). We explain one of the possible usages of this dataset by clustering armies on their compositions. This reduction of army compositions to mixtures of Gaussians allows for strategic reasoning at the level of the components. We evaluated this clustering method by predicting the outcomes of battles based on the mixture components of army compositions.\nIn a standard possibilistic logic, prioritized information is encoded by means of a weighted knowledge base. This paper proposes an extension of possibilistic logic for dealing with partially ordered information. We show that all basic notions of standard possibilistic logic (subsumption, syntactic and semantic inference, etc.)
have natural counterparts when dealing with partially ordered information. We also propose an algorithm which computes possibilistic conclusions of a partial knowledge base of a partially ordered knowledge base.\nThis paper is directed towards combining Pearl's structural-model approach to causal reasoning with high-level formalisms for reasoning about actions. More precisely, we present a combination of Pearl's structural-model approach with Poole's independent choice logic. We show how probabilistic theories in the independent choice logic can be mapped to probabilistic causal models. This mapping provides the independent choice logic with appealing concepts of causality and explanation from the structural-model approach. We illustrate this along Halpern and Pearl's sophisticated notions of actual cause, explanation, and partial explanation. This mapping also adds first-order modeling capabilities and explicit actions to the structural-model approach.\nWe present the language PC+ for probabilistic reasoning about actions, which is a generalization of the action language C+ that allows one to deal with probabilistic as well as nondeterministic effects of actions. We define a formal semantics of PC+ in terms of probabilistic transitions between sets of states. Using a concept of a history and its belief state, we then show how several important problems in reasoning about actions can be concisely formulated in our formalism.\nBy elaborating on the notion of linear belief functions (Dempster 1990; Liu 1996), we propose an elementary approach to knowledge representation for expert systems using linear belief functions. We show how to use basic matrices to represent market information and financial knowledge, including complete ignorance, statistical observations, subjective speculations, distributional assumptions, linear relations, and empirical asset pricing models.
We then appeal to Dempster's rule of combination to integrate the knowledge for assessing an overall belief of portfolio performance, and updating the belief by incorporating additional information. We use an example of three gold stocks to illustrate the approach.\nThis paper presents a scalable control algorithm that enables a deployed mobile robot system to make high-level decisions under full consideration of its probabilistic belief. Our approach is based on insights from the rich literature of hierarchical controllers and hierarchical MDPs. The resulting controller has been successfully deployed in a nursing facility near Pittsburgh, PA. To the best of our knowledge, this work is a unique instance of applying POMDPs to high-level robotic control problems.\nThis paper is devoted to the search for robust solutions in state space graphs when costs depend on scenarios. We first present axiomatic requirements for preference compatibility with the intuitive idea of robustness. This leads us to propose the Lorenz dominance rule as a basis for robustness analysis. Then, after presenting complexity results about the determination of robust solutions, we propose a new sophistication of A* specially designed to determine the set of robust paths in a state space graph. The behavior of the algorithm is illustrated on a small example. Finally, an axiomatic justification of the refinement of robustness by an OWA criterion is provided.\nRobust optimization is one of the fundamental approaches for dealing with uncertainty in combinatorial optimization. This paper considers the robust spanning tree problem with interval data, which arises in a variety of telecommunication applications. It proposes a constraint satisfaction approach using a combinatorial lower bound, a pruning component that removes infeasible and suboptimal edges, as well as a search strategy exploring the most uncertain edges first.
The resulting algorithm is shown to produce very dramatic improvements over the mathematical programming approach of Yaman et al. and to enlarge considerably the class of problems amenable to effective solutions.\nWe develop a qualitative theory of Markov Decision Processes (MDPs) and Partially Observable MDPs that can be used to model sequential decision making tasks when only qualitative information is available. Our approach is based upon an order-of-magnitude approximation of both probabilities and utilities, similar to epsilon-semantics. The result is a qualitative theory that has close ties with the standard maximum-expected-utility theory and is amenable to general planning techniques.\nThis paper concerns the assessment of direct causal effects from a combination of: (i) non-experimental data, and (ii) qualitative domain knowledge. Domain knowledge is encoded in the form of a directed acyclic graph (DAG), in which all interactions are assumed linear, and some variables are presumed to be unobserved. We provide a generalization of the well-known method of Instrumental Variables, which allows its application to models with few conditional independencies.\nIn this paper, we continue our research on the algorithmic aspects of Halpern and Pearl's causes and explanations in the structural-model approach. To this end, we present new characterizations of weak causes for certain classes of causal models, which show that under suitable restrictions deciding causes and explanations is tractable. To our knowledge, these are the first explicit tractability results for the structural-model approach.\nThis paper presents a decision-theoretic approach to statistical inference that satisfies the likelihood principle (LP) without using prior information. Unlike the Bayesian approach, which also satisfies LP, we do not assume knowledge of the prior distribution of the unknown parameter.
With respect to information that can be obtained from an experiment, our solution is more efficient than Wald's minimax solution. However, with respect to information assumed to be known before the experiment, our solution demands less input than the Bayesian solution.\nWe show that maximum entropy (maxent) models can be modeled with certain kinds of HMMs, allowing us to construct maxent models with hidden variables, hidden state sequences, or other characteristics. The models can be trained using the forward-backward algorithm. While the results are primarily of theoretical interest, unifying apparently unrelated concepts, we also give experimental results for a maxent model with a hidden variable on a word disambiguation task; the model outperforms standard techniques.\nWe describe expectation propagation for approximate inference in dynamic Bayesian networks as a natural extension of Pearl's exact belief propagation. Expectation propagation is a greedy algorithm that converges in many practical cases, but not always. We derive a double-loop algorithm, guaranteed to converge to a local minimum of a Bethe free energy. Furthermore, we show that stable fixed points of (damped) expectation propagation correspond to local minima of this free energy, but that the converse need not be the case. We illustrate the algorithms by applying them to switching linear dynamical systems and discuss implications for approximate inference in general Bayesian networks.\nWe present methods employed in Coordinate, a prototype service that supports collaboration and communication by learning predictive models that provide forecasts of users s and availability. We describe how data is collected about user activity and proximity from multiple devices, in addition to analysis of the content of users, the time of day, and day of week.
We review applications of presence forecasting embedded in the Priorities application and then present details of the Coordinate service that was informed by the earlier efforts.\nWe introduce a general representation of large-population games in which each player's influence on the others is centralized and limited, but may otherwise be arbitrary. This representation significantly generalizes the class known as congestion games in a natural way. Our main results are provably correct and efficient algorithms for computing and learning approximate Nash equilibria in this general framework.\nWe propose a formal treatment of scenarios in the context of a dialectical argumentation formalism for qualitative reasoning about uncertain propositions. Our formalism extends prior work in which arguments for and against uncertain propositions were presented and compared in interaction spaces called Agoras. We now define the notion of a scenario in this framework and use it to define a set of qualitative uncertainty labels for propositions across a collection of scenarios. This work is intended to lead to a formal theory of scenarios and scenario analysis.\nExact monitoring in dynamic Bayesian networks is intractable, so approximate algorithms are necessary. This paper presents a new family of approximate monitoring algorithms that combine the best qualities of the particle filtering and Boyen-Koller methods. Our algorithms maintain an approximate representation of the belief state in the form of sets of factored particles, that correspond to samples of clusters of state variables. Empirical results show that our algorithms outperform both ordinary particle filtering and the Boyen-Koller algorithm on large systems.\nWe present new algorithms for inference in credal networks: directed acyclic graphs associated with sets of probabilities. Credal networks are here interpreted as encoding strong independence relations among variables.
We first present a theory of credal networks based on separately specified sets of probabilities. We also show that inference with polytrees is NP-hard in this setting. We then introduce new techniques that reduce the computational effort demanded by inference, particularly in polytrees, by exploring separability of credal sets.\nWe develop a closed-form asymptotic formula to compute the marginal likelihood of data given a naive Bayesian network model with two hidden states and binary features. This formula deviates from the standard BIC score. Our work provides a concrete example that the BIC score is generally not valid for statistical models that belong to a stratified exponential family. This stands in contrast to linear and curved exponential families, where the BIC score has been proven to provide a correct approximation for the marginal likelihood.\nWe address the question of convergence in the loopy belief propagation (LBP) algorithm. Specifically, we relate convergence of LBP to the existence of a weak limit for a sequence of Gibbs measures defined on the LBP's associated computation tree. Using tools from the theory of Gibbs measures, we develop easily testable sufficient conditions for convergence. The failure of convergence of LBP implies the existence of multiple phases for the associated Gibbs specification. These results give new insight into the mechanics of the algorithm.\nThis presentation will introduce the audience to a new, emerging body of research on sequential Monte Carlo techniques in robotics. In recent years, particle filters have solved several hard perceptual robotic problems. Early successes were limited to low-dimensional problems, such as the problem of robot localization in environments with known maps. More recently, researchers have begun exploiting structural properties of robotic domains that have led to successful particle filter applications in spaces with as many as 100,000 dimensions.
The presentation will discuss specific tricks necessary to make these techniques work in real-world domains, and also discuss open challenges for researchers in the UAI community.\nThe validity of a causal model can be tested only if the model imposes constraints on the probability distribution that governs the generated data. In the presence of unmeasured variables, causal models may impose two types of constraints: conditional independencies, as read through the d-separation criterion, and functional constraints, for which no general criterion is available. This paper offers a systematic way of identifying functional constraints and, thus, facilitates the task of testing causal models as well as inferring such models from data.\nWe present a general framework for defining priors on model structure and sampling from the posterior using the Metropolis-Hastings algorithm. The key idea is that structure priors are defined via a probability tree and that the proposal mechanism for the Metropolis-Hastings algorithm operates by traversing this tree, thereby defining a cheaply computable acceptance probability. We have applied this approach to Bayesian net structure learning using a number of priors and tree traversal strategies.
Our results show that these must be chosen appropriately for this approach to be successful.\nThis paper presents a sound and complete calculus for causal relevance, based on Pearl's functional models semantics. The calculus consists of axioms and rules of inference for reasoning about causal relevance relationships. We extend the set of known axioms for causal relevance with three new axioms, and introduce two new rules of inference for reasoning about specific subclasses of models. These subclasses give a more refined characterization of causal models than the one given in Halpern's axiomatization of counterfactual reasoning. Finally, we show how the calculus for causal relevance can be used in the task of identifying causal structure from non-observational data.\nAn instrument is a random variable that allows the identification of parameters in linear models when the error terms are not uncorrelated. It is a popular method used in economics and the social sciences that reduces the problem of identification to the problem of finding the appropriate instruments. A few years ago, Pearl introduced a necessary test for instruments that allows the researcher to discard those candidates that fail the test. In this paper, we make a detailed study of Pearl's test and the general model for instruments. The results of this study include a novel interpretation of Pearl's test, a general theory of instrumental tests, and an affirmative answer to a previous conjecture. We also present new instrumentality tests for the cases of discrete and continuous variables.\nIt is often stated in papers tackling the task of inferring Bayesian network structures from data that there are these two distinct approaches: (i) Apply conditional independence tests when testing for the presence or otherwise of edges; (ii) Search the model space using a scoring metric. 
Here I argue that for complete data and a given node ordering this division is a myth, by showing that cross-entropy methods for checking conditional independence are mathematically identical to methods based upon discriminating between models by their overall goodness-of-fit logarithmic scores.\nWe propose a new definition of actual causes, using structural equations to model counterfactuals. We show that the definitions yield a plausible and elegant account of causation that handles well examples which have caused problems for other definitions and resolves major difficulties in the traditional account. In a companion paper, we show how the definition of causality can be used to give an elegant definition of (causal) explanation.\nThis article deals with plausible reasoning from incomplete knowledge about large-scale spatial properties. The available information, consisting of a set of pointwise observations, is extrapolated to neighbouring points. We make use of belief functions to represent the influence of the knowledge at a given point on another point; the quantitative strength of this influence decreases as the distance between the two points increases. These influences are then aggregated using a variant of Dempster's rule of combination which takes into account the relative dependence between observations.\nWe present probabilistic logic programming under inheritance with overriding. This approach is based on new notions of entailment for reasoning with conditional constraints, which are obtained from the classical notion of logical entailment by adding the principle of inheritance with overriding. This is done by using recent approaches to probabilistic default reasoning with conditional constraints. We analyze the semantic properties of the new entailment relations. 
We also present algorithms for probabilistic logic programming under inheritance with overriding, and program transformations for increased efficiency.\nIn this paper we compare three different architectures for the evaluation of influence diagrams: HUGIN, Shafer-Shenoy, and Lazy Evaluation architectures. The computational complexity of the architectures is compared on the LImited Memory Influence Diagram (LIMID): a diagram where only the requisite information for the computation of the optimal policies is depicted. Because the requisite information is explicitly represented in the LIMID, the evaluation can take advantage of it, and significant savings in computation can be obtained. In this paper we show how the obtained savings are considerably increased when the computations performed on the LIMID follow the Lazy Evaluation scheme.\nThe direct effect of one event on another can be defined and measured by holding constant all intermediate variables between the two. Indirect effects present conceptual and practical difficulties (in nonlinear models), because they cannot be isolated by holding certain variables constant. This paper shows a way of defining any path-specific effect that does not invoke blocking the remaining paths. This permits the assessment of a more natural type of direct and indirect effects, one that is applicable in both linear and nonlinear models. The paper establishes conditions under which such assessments can be estimated consistently from experimental and nonexperimental data, and thus extends path-analytic techniques to nonlinear and nonparametric models.\nWe propose a new approach to value-directed belief state approximation for POMDPs. The value-directed model allows one to choose approximation methods for belief state monitoring that have a small impact on decision quality. 
Using a vector space analysis of the problem, we devise two new search procedures for selecting an approximation scheme that have much better computational properties than existing methods. Though these provide looser error bounds, we show empirically that they have a similar impact on decision quality in practice, and run up to two orders of magnitude more quickly.\nA method is presented for the rhythmic parsing problem: Given a sequence of observed musical note onset times, we estimate the corresponding notated rhythm and tempo process. A graphical model is developed that represents the simultaneous evolution of tempo and rhythm and relates these hidden quantities to observations. The rhythm variables are discrete and the tempo and observation variables are continuous. We show how to compute the globally most likely configuration of the tempo and rhythm variables given an observation of note onset times. Preliminary experiments are presented on a small data set. A generalization to arbitrary conditional Gaussian distributions is outlined.\nWe consider a partially observable Markov decision problem (POMDP) that models a class of sequencing problems. Although POMDPs are typically intractable, our formulation admits tractable solution. Instead of maintaining a value function over a high-dimensional set of belief states, we reduce the state space to one of smaller dimension, in which grid-based dynamic programming techniques are effective. We develop an error bound for the resulting approximation, and discuss an application of the model to a problem in targeted advertising.\nWe propose a new method of discovering causal structures, based on the detection of local, spontaneous changes in the underlying data-generating model. We analyze the classes of structures that are equivalent relative to a stream of distributions produced by local changes, and devise algorithms that output graphical representations of these equivalence classes. 
We present experimental results, using simulated data, and examine the errors associated with detection of changes and recovery of structures.\nWe treat collaborative filtering as a univariate time series estimation problem: given a user's previous votes, predict the next vote. We describe two families of methods for transforming data to encode time order in ways amenable to off-the-shelf classification and density estimation tools, and examine the results of using these approaches on several real-world data sets. The improvements in predictive accuracy we realize recommend the use of other predictive algorithms that exploit the temporal order of data.\nWe show that if a strictly positive joint probability distribution for a set of binary random variables factors according to a tree, then vertex separation represents all and only the independence relations enclosed in the distribution. The same result is shown to hold also for multivariate strictly positive normal distributions. Our proof uses a new property of conditional independence that holds for these two classes of probability distributions.\nPossibilistic logic offers a qualitative framework for representing pieces of information associated with levels of uncertainty or priority. The fusion of information from multiple sources is discussed in this setting. Different classes of merging operators are considered, including conjunctive, disjunctive, reinforcement, adaptive and averaging operators. Then we propose to analyse these classes in terms of postulates. This is done by first extending the postulates for merging classical bases to the case where priorities are available.\nMonitoring plan preconditions can allow for replanning when a precondition fails, generally far in advance of the point in the plan where the precondition is relevant. However, monitoring is generally costly, and some precondition failures have a very small impact on plan quality. 
We formulate a model for optimal precondition monitoring, using partially observable Markov decision processes, and describe methods for solving this model efficiently, though approximately. Specifically, we show that the single-precondition monitoring problem is generally tractable, and the multiple-precondition monitoring policies can be efficiently approximated using single-precondition solutions.\nAlgorithms for exact and approximate inference in stochastic logic programs (SLPs) are presented, based, respectively, on variable elimination and importance sampling. We then show how SLPs can be used to represent prior distributions for machine learning, using (i) logic programs and (ii) Bayes net structures as examples. Drawing on existing work in statistics, we apply the Metropolis-Hastings algorithm to construct a Markov chain which samples from the posterior distribution. A Prolog implementation for this is described. We also discuss the possibility of constructing explicit representations of the posterior.\nIn this paper, we formulate a qualitative \"linear\" utility theory for lotteries in which uncertainty is expressed qualitatively using a Spohnian disbelief function. We argue that a rational decision maker facing an uncertain decision problem in which the uncertainty is expressed qualitatively should behave so as to maximize \"qualitative expected utility.\" Our axiomatization of the qualitative utility is similar to the axiomatization developed by von Neumann and Morgenstern for probabilistic lotteries. We compare our results with other recent results in qualitative decision making.\nWe propose a framework for building graphical causal models that is based on the concept of causal mechanisms. Causal models are intuitive for human users and, more importantly, support the prediction of the effect of manipulation. 
We describe an implementation of the proposed framework as an interactive model construction module, ImaGeNIe, in SMILE (Structural Modeling, Inference, and Learning Engine) and in GeNIe (SMILE's Windows user interface).\nWe present a new approach to the solution of decision problems formulated as influence diagrams. The approach converts the influence diagram into a simpler structure, the LImited Memory Influence Diagram (LIMID), where only the requisite information for the computation of optimal policies is depicted. Because the requisite information is explicitly represented in the diagram, the evaluation procedure can take advantage of it. In this paper we show how to convert an influence diagram to a LIMID and describe the procedure for finding an optimal strategy. Our approach can yield significant savings of memory and computational time when compared to traditional methods.\nConversations abound with uncertainties of various kinds. Treating conversation as inference and decision making under uncertainty, we propose a task-independent, multimodal architecture for supporting robust continuous spoken dialog called Quartet. We introduce four interdependent levels of analysis, and describe representations, inference procedures, and decision strategies for managing uncertainties within and between the levels. We highlight the approach by reviewing interactions between a user and two spoken dialog systems developed using the Quartet architecture: Presenter, a prototype system for navigating Microsoft PowerPoint presentations, and the Bayesian Receptionist, a prototype system for dealing with tasks typically handled by front desk receptionists at the Microsoft corporate campus.\nQualitative probabilistic networks have been designed for probabilistic reasoning in a qualitative way. Due to their coarse level of representation detail, qualitative probabilistic networks do not provide for resolving trade-offs and typically yield ambiguous results upon inference. 
We present an algorithm for computing more insightful results for unresolved trade-offs. The algorithm builds upon the idea of using pivots to zoom in on the trade-offs and identifying the information that would serve to resolve them.\nThis paper extends the work in [Suzuki, 1996] and presents an efficient depth-first branch-and-bound algorithm for learning Bayesian network structures, based on the minimum description length (MDL) principle, for a given (consistent) variable ordering. The algorithm exhaustively searches through all network structures and guarantees to find the network with the best MDL score. Preliminary experiments show that the algorithm is efficient, and that the time complexity grows slowly with the sample size. The algorithm is useful for empirically studying both the performance of suboptimal heuristic search algorithms and the adequacy of the MDL principle in learning Bayesian networks.\nThis paper deals with the problem of estimating the probability that one event was a cause of another in a given scenario. Using structural-semantical definitions of the probabilities of necessary or sufficient causation (or both), we show how to optimally bound these quantities from data obtained in experimental and observational studies, making minimal assumptions concerning the data-generating process. In particular, we strengthen the results of Pearl (1999) by weakening the data-generation assumptions and deriving theoretically sharp bounds on the probabilities of causation. These results delineate precisely how empirical data can be used both in settling questions of attribution and in solving attribution-related problems of decision making.\nThis paper examines the use of Bayesian Networks to tackle one of the tougher problems in requirements engineering, translating user requirements into system requirements. 
The approach taken is to model domain knowledge as Bayesian Network fragments that are glued together to form a complete view of the domain specific system requirements. User requirements are introduced as evidence and the propagation of belief is used to determine what are the appropriate system requirements as indicated by user requirements. This concept has been demonstrated in the development of a system specification and the results are presented here.\nRecent work on loglinear models in probabilistic constraint logic programming is applied to first-order probabilistic reasoning. Probabilities are defined directly on the proofs of atomic formulae, and by marginalisation on the atomic formulae themselves. We use Stochastic Logic Programs (SLPs) composed of labelled and unlabelled definite clauses to define the proof probabilities. We have a conservative extension of first-order reasoning, so that, for example, there is a one-one mapping between logical and random variables. We show how, in this framework, Inductive Logic Programming (ILP) can be used to induce the features of a loglinear model from data. We also compare the presented framework with other approaches to first-order probabilistic reasoning.\nWe consider the task of learning the maximum-likelihood polytree from data. Our first result is a performance guarantee establishing that the optimal branching (or Chow-Liu tree), which can be computed very easily, constitutes a good approximation to the best polytree. We then show that it is not possible to do very much better, since the learning problem is NP-hard even to approximately solve within some constant factor.\nRecent improvement on Tarski's procedure for quantifier elimination in the first order theory of real numbers makes it feasible to solve small instances of the following problems completely automatically: 1. listing all equality and inequality constraints implied by a graphical model with hidden variables. 2. 
Comparing graphical models with hidden variables (i.e., model equivalence, inclusion, and overlap). 3. Answering questions about the identification of a model or portion of a model, and about bounds on quantities derived from a model. 4. Determining whether a given set of independence assertions. We discuss the foundation of quantifier elimination and demonstrate its application to these problems.\nA conceptual foundation for approximation of belief functions is proposed and investigated. It is based on the requirements of consistency and closeness. An optimal approximation is studied. Unfortunately, the computation of the optimal approximation turns out to be intractable. Hence, various heuristic methods are proposed and experimentally evaluated both in terms of their accuracy and in terms of the speed of computation. These methods are compared to the earlier proposed approximations of belief functions.\nWe outline a method to estimate the value of computation for a flexible algorithm using empirical data. To determine a reasonable trade-off between cost and value, we build an empirical model of the value obtained through computation, and apply this model to estimate the value of computation for quite different problems. In particular, we investigate this trade-off for the problem of constructing policies for decision problems represented as influence diagrams. We show how two features of our anytime algorithm provide reasonable estimates of the value of computation in this domain.\nThis paper extends previous work with network fragments and situation-specific network construction. We formally define the asymmetry network, an alternative representation for a conditional probability table. We also present an object-oriented representation for partially specified asymmetry networks. We show that the representation is parsimonious. 
We define an algebra for the elements of the representation that allows us to 'factor' any CPT and to soundly combine the partially specified asymmetry networks.\nDecision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first such non-trivial, worst-case, upper bounds on the number of iterations required by PI to converge to the optimal policy. Our analysis also sheds new light on the manner in which PI progresses through the space of policies.\nWe show how to use a variational approximation to the logistic function to perform approximate inference in Bayesian networks containing discrete nodes with continuous parents. Essentially, we convert the logistic function to a Gaussian, which facilitates exact inference, and then iteratively adjust the variational parameters to improve the quality of the approximation. We demonstrate experimentally that this approximation is faster and potentially more accurate than sampling. We also introduce a simple new technique for handling evidence, which allows us to handle arbitrary distributions on observed nodes, as well as achieving a significant speedup in networks with discrete variables of large cardinality.\nThis paper describes stochastic search approaches, including a new stochastic algorithm and an adaptive mutation operator, for learning Bayesian networks from incomplete data. This problem is characterized by a huge solution space with a highly multimodal landscape. State-of-the-art approaches all involve using deterministic approaches such as the expectation-maximization algorithm. These approaches are guaranteed to find local maxima, but do not explore the landscape for other modes. 
Our approach evolves structure and the missing data. We compare our stochastic algorithms and show they all produce accurate results.\nQualitative probabilistic networks have been introduced as qualitative abstractions of Bayesian belief networks. One of the major drawbacks of these qualitative networks is their coarse level of detail, which may lead to unresolved trade-offs during inference. We present an enhanced formalism for qualitative networks with a finer level of detail. An enhanced qualitative probabilistic network differs from a regular qualitative network in that it distinguishes between strong and weak influences. Enhanced qualitative probabilistic networks are purely qualitative in nature, as regular qualitative networks are, yet allow for efficiently resolving trade-offs during inference.\nIn this article we propose a qualitative (ordinal) counterpart for the Partially Observable Markov Decision Processes model (POMDP) in which the uncertainty, as well as the preferences of the agent, are modeled by possibility distributions. This qualitative counterpart of the POMDP model relies on a possibilistic theory of decision under uncertainty, recently developed. One advantage of such a qualitative framework is its ability to escape from the classical obstacle of stochastic POMDPs, in which even with a finite state space, the obtained belief state space of the POMDP is infinite. Instead, in the possibilistic framework even if exponentially larger than the state space, the belief state space remains finite.\nOne of the most useful sensitivity analysis techniques of decision analysis is the computation of value of information (or clairvoyance), the difference in value obtained by changing the decisions by which some of the uncertainties are observed. 
In this paper, some simple but powerful extensions to previous algorithms are introduced which allow an efficient value of information calculation on the rooted cluster tree (or strong junction tree) used to solve the original decision problem.\nThe noisy-or and its generalization noisy-max have been utilized to reduce the complexity of knowledge acquisition. In this paper, we present a new representation of noisy-max that allows for efficient inference in general Bayesian networks. Empirical studies show that our method is capable of computing queries in well-known large medical networks, QMR-DT and CPCS, for which no previous exact inference method has been shown to perform well.\nWe present a technique for speeding up the convergence of value iteration for partially observable Markov decision processes (POMDPs). The underlying idea is similar to that behind modified policy iteration for fully observable Markov decision processes (MDPs). The technique can be easily incorporated into any existing POMDP value iteration algorithm. Experiments have been conducted on several test problems with one POMDP value iteration algorithm called incremental pruning. We find that the technique can make incremental pruning run several orders of magnitude faster.\nArgumentation is a promising model for reasoning with uncertain knowledge. The key concept of acceptability enables one to differentiate arguments and counterarguments: the certainty of a proposition can then be evaluated through the most acceptable arguments for that proposition. In this paper, we investigate different complementary points of view: - an acceptability based on the existence of direct counterarguments, - an acceptability based on the existence of defenders. Pursuing previous work on preference-based argumentation principles, we enforce both points of view by taking into account preference orderings for comparing arguments. 
Our approach is illustrated in the context of reasoning with stratified knowledge bases.\nThis paper addresses the problem of merging uncertain information in the framework of possibilistic logic. It presents several syntactic combination rules to merge possibilistic knowledge bases, provided by different sources, into a new possibilistic knowledge base. These combination rules are first described at the meta-level outside the language of possibilistic logic. Next, an extension of possibilistic logic, where the combination rules are inside the language, is proposed. A proof system in a sequent form, which is sound and complete with respect to the possibilistic logic semantics, is given.\nGiven an undirected graph G or hypergraph X model for a given set of variables V, we introduce two marginalization operators for obtaining the undirected graph GA or hypergraph HA associated with a given subset A ⊂ V such that the marginal distribution of A factorizes according to GA or HA, respectively. Finally, we illustrate the method by its application to some practical examples. With them we show that hypergraph models allow defining a finer factorization or performing a more precise conditional independence analysis than undirected graph models.\nThis paper analyzes irrelevance and independence relations in graphical models associated with convex sets of probability distributions (called Quasi-Bayesian networks). The basic question in Quasi-Bayesian networks is: how can irrelevance/independence relations in Quasi-Bayesian networks be detected, enforced and exploited? This paper addresses these questions through Walley's definitions of irrelevance and independence. 
Novel algorithms and results are presented for inferences with the so-called natural extensions using fractional linear programming, and the properties of the so-called type-1 extensions are clarified through a new generalization of d-separation.\nThe variability of structure in a finite Markov equivalence class of causally sufficient models represented by directed acyclic graphs has been fully characterized. Without causal sufficiency, an infinite semi-Markov equivalence class of models has only been characterized by the fact that each model in the equivalence class entails the same marginal statistical dependencies. In this paper, we study the variability of structure of causal models within a semi-Markov equivalence class and propose a systematic approach to construct models entailing any specific marginal statistical dependencies.\nThis paper relates comparative belief structures and a general view of belief management in the setting of deductively closed logical representations of accepted beliefs. We show that the range of compatibility between the classical deductive closure and uncertain reasoning covers precisely the nonmonotonic 'preferential' inference system of Kraus, Lehmann and Magidor and nothing else. In terms of uncertain reasoning any possibility or necessity measure gives birth to a structure of accepted beliefs. The classes of probability functions and of Shafer's belief functions which yield belief sets prove to be very special ones.\nThis paper presents an axiomatic framework for qualitative decision under uncertainty in a finite setting. The corresponding utility is expressed by a sup-min expression, called Sugeno (or fuzzy) integral. Technically speaking, Sugeno integral is a median, which is indeed a qualitative counterpart to the averaging operation underlying expected utility. The axiomatic justification of Sugeno integral-based utility is expressed in terms of preference between acts as in Savage decision theory. 
Pessimistic and optimistic qualitative utilities, based on necessity and possibility measures, previously introduced by two of the authors, can be retrieved in this setting by adding appropriate axioms.\nDynamic probabilistic networks (DPNs) are a compact representation of complex stochastic processes. In this paper we examine how to learn the structure of a DPN from data. We extend structure scoring rules for standard probabilistic networks to the dynamic case, and show how to search for structure when some of the variables are hidden. Finally, we examine two applications where such a technology might be useful: predicting and classifying dynamic behaviors, and learning causal orderings in biological processes. We provide empirical results that demonstrate the applicability of our methods in both domains.\nMost algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a finite-state controller and iteratively improves the controller by search in policy space. Two related algorithms illustrate this approach. The first is a policy iteration algorithm that can outperform value iteration in solving infinite-horizon POMDPs. It provides the foundation for a new heuristic search algorithm that promises further speedup by focusing computational effort on regions of the problem space that are reachable, or likely to be reached, from a start state.\nWe take another look at the general problem of selecting a preferred probability measure among those that comply with some given constraints. The dominant role that entropy maximization has obtained in this context is questioned by arguing that the minimum information principle on which it is based could be supplanted by an at least as plausible \"likelihood of evidence\" principle. 
We then review a method for turning given selection functions into representation-independent variants, and discuss the tradeoffs involved in this transformation.\nIn the literature on graphical models, there has been increased attention paid to the problems of learning hidden structure (see Heckerman [H96] for a survey) and causal mechanisms from sample data [H96, P88, S93, P95, F98]. In most settings we should expect the former to be difficult, and the latter potentially impossible without experimental intervention. In this work, we examine some restricted settings in which one can perfectly reconstruct the hidden structure solely on the basis of observed sample data.\nWe exploit qualitative probabilistic relationships among variables for computing bounds of conditional probability distributions of interest in Bayesian networks. Using the signs of qualitative relationships, we can implement abstraction operations that are guaranteed to bound the distributions of interest in the desired direction. By evaluating incrementally improved approximate networks, our algorithm obtains monotonically tightening bounds that converge to exact distributions. For supermodular utility functions, the tightening bounds monotonically reduce the set of admissible decision alternatives as well.\nThis paper describes a process for constructing situation-specific belief networks from a knowledge base of network fragments. A situation-specific network is a minimal query-complete network constructed from a knowledge base in response to a query for the probability distribution on a set of target variables given evidence and context variables. We present definitions of query completeness and situation-specific networks. We describe conditions on the knowledge base that guarantee query completeness. The relationship of our work to earlier work on KBMC is also discussed.\nI present a parallel algorithm for exact probabilistic inference in Bayesian networks. 
For polytree networks with n variables, the worst-case time complexity is O(log n) on a CREW PRAM (concurrent-read, exclusive-write parallel random-access machine) with n processors, for any constant number of evidence variables. For arbitrary networks, the time complexity is O(r^{3w}*log n) for n processors, or O(w*log n) for r^{3w}*n processors, where r is the maximum range of any variable, and w is the induced width (the maximum clique size), after moralizing and triangulating the network.\nThis paper is about reducing influence diagram (ID) evaluation into Bayesian network (BN) inference problems. Such reduction is interesting because it enables one to readily use one's favorite BN inference algorithm to efficiently evaluate IDs. Two such reduction methods have been proposed previously (Cooper 1988, Shachter and Peot 1992). This paper proposes a new method. The BN inference problems induced by the new method are much easier to solve than those induced by the two previous methods.\nMuch recent research in decision theoretic planning has adopted Markov decision processes (MDPs) as the model of choice, and has attempted to make their solution more tractable by exploiting problem structure. One particular algorithm, structured policy construction, achieves this by means of a decision theoretic analog of goal regression using action descriptions based on Bayesian networks with tree-structured conditional probability tables. The algorithm as presented is not able to deal with actions with correlated effects. We describe a new decision theoretic regression operator that corrects this weakness. While conceptually straightforward, this extension requires a somewhat more complicated technical approach.\nDecomposable dependency models and their graphical counterparts, i.e., chordal graphs, possess a number of interesting and useful properties. 
On the basis of two characterizations of decomposable models in terms of independence relationships, we develop an exact algorithm for recovering the chordal graphical representation of any given decomposable model. We also propose an algorithm for learning chordal approximations of dependency models isomorphic to general undirected graphs.\nMost exact algorithms for general partially observable Markov decision processes (POMDPs) use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the \"incremental pruning\" method for solving this problem and compare them to earlier algorithms from theoretical and empirical perspectives. We find that incremental pruning is presently the most efficient exact method for solving POMDPs.\nAs probabilistic systems gain popularity and are coming into wider use, the need for a mechanism that explains the system's findings and recommendations becomes more critical. The system will also need a mechanism for ordering competing explanations. We examine two representative approaches to explanation in the literature - one due to G\\\"ardenfors and one due to Pearl - and show that both suffer from significant problems. We propose an approach to defining a notion of \"better explanation\" that combines some of the features of both together with more recent work by Pearl and others on causality.\nThis paper introduces a new algorithm for the induction of complex finite state automata from samples of behavior. The algorithm is based on information theoretic principles. The algorithm reduces the search space by many orders of magnitude over what was previously thought possible. 
We compare the algorithm with some existing induction techniques for finite state automata and show that the algorithm is much superior in both run time and quality of inductions.\nThis paper proposes a novel, algorithm-independent approach to optimizing belief network inference. Rather than designing optimizations on an algorithm-by-algorithm basis, we argue that one should use an unoptimized algorithm to generate a Q-DAG, a compiled graphical representation of the belief network, and then optimize the Q-DAG and its evaluator instead. We present a set of Q-DAG optimizations that supplant optimizations designed for traditional inference algorithms, including zero compression, network pruning and caching. We show that our Q-DAG optimizations require time linear in the Q-DAG size, and significantly simplify the process of designing algorithms for optimizing belief network inference.\nStochastic algorithms are among the best for solving computationally hard search and reasoning problems. The runtime of such procedures is characterized by a random variable. Different algorithms give rise to different probability distributions. One can take advantage of such differences by combining several algorithms into a portfolio, and running them in parallel or interleaving them on a single processor. We provide a detailed evaluation of the portfolio approach on distributions of hard combinatorial search problems. We show under what conditions the portfolio approach can have a dramatic computational advantage over the best traditional methods.\nValuation based systems verifying an idempotent property are studied. A partial order is defined between the valuations giving them a lattice structure. Then, two different strategies are introduced to represent valuations: as infimum of the most informative valuations or as supremum of the least informative ones. It is studied how to carry out computations with both representations in an efficient way. 
The particular cases of finite sets and convex polytopes are considered.\nWe review the problem of time-critical action and discuss a reformulation that shifts knowledge acquisition from the assessment of complex temporal probabilistic dependencies to the direct assessment of time-dependent utilities over key outcomes of interest. We dwell on a class of decision problems characterized by the centrality of diagnosing and reacting in a timely manner to pathological processes. We motivate key ideas in the context of trauma-care triage and transportation decisions.\nA new method is developed to represent probabilistic relations on multiple random events. Where previously knowledge bases containing probabilistic rules were used for this purpose, here a probability distribution over the relations is directly represented by a Bayesian network. By using a powerful way of specifying conditional probability distributions in these networks, the resulting formalism is more expressive than the previous ones. Particularly, it provides for constraints on equalities of events, and it allows one to define complex, nested combination functions.\nThe idea of fully accepting statements when the evidence has rendered them probable enough faces a number of difficulties. We leave the interpretation of probability largely open, but attempt to suggest a contextual approach to full belief. We show that the difficulties of probabilistic acceptance are not as severe as they are sometimes painted, and that though there are oddities associated with probabilistic acceptance they are in some instances less awkward than the difficulties associated with other nonmonotonic formalisms. We show that the structure at which we arrive provides a natural home for statistical inference.\nIn this paper we present some results obtained with a troupe of low-cost robots designed to cooperatively explore and acquire the map of unknown structured orthogonal environments. 
In order to improve the covering of the explored zone, the robots show different behaviours and cooperate by transferring to each other the perceived environment when they meet. The returning robots deliver to a host computer their partial maps and the host incrementally generates the map of the environment by means of a possibility/necessity grid.\nThis paper discusses causal independence models and a generalization of these models called causal interaction models. Causal interaction models are models that have independent mechanisms where a mechanism can have several causes. In addition to introducing several particular types of causal interaction models, we show how we can apply the Bayesian approach to learning causal interaction models obtaining approximate posterior distributions for the models and obtain MAP and ML estimates for the parameters. We illustrate the approach with a simulation study of learning model posteriors.\nBayesian approaches to learn the graphical structure of Bayesian Belief Networks (BBNs) from databases share the assumption that the database is complete, that is, no entry is reported as unknown. Attempts to relax this assumption involve the use of expensive iterative methods to discriminate among different structures. This paper introduces a deterministic method to learn the graphical structure of a BBN from a possibly incomplete database. Experimental evaluations show a significant robustness of this method and a remarkable independence of its execution time from the number of missing data.\nThis paper explores the role of independence of causal influence (ICI) in Bayesian network inference. ICI allows one to factorize a conditional probability table into smaller pieces. We describe a method for exploiting the factorization in clique tree propagation (CTP) - the state-of-the-art exact inference algorithm for Bayesian networks. 
We also present empirical results showing that the resulting algorithm is significantly more efficient than the combination of CTP and previous techniques for exploiting ICI.\nPlanning problems where effects of actions are non-deterministic can be modeled as Markov decision processes. Planning problems are usually goal-directed. This paper proposes several techniques for exploiting the goal-directedness to accelerate value iteration, a standard algorithm for solving Markov decision processes. Empirical studies have shown that the techniques can bring about significant speedups.\nThe computational complexity of reasoning within the Dempster-Shafer theory of evidence is one of the main points of criticism this formalism has to face. To overcome this difficulty various approximation algorithms have been suggested that aim at reducing the number of focal elements in the belief functions involved. Besides introducing a new algorithm using this method, this paper describes an empirical study that examines the appropriateness of these approximation procedures in decision making situations. It presents the empirical findings and discusses the various tradeoffs that have to be taken into account when actually applying one of these methods.\nWe develop a qualitative model of decision making with two aims: to describe how people make simple decisions and to enable computer programs to do the same. Current approaches based on Planning or Decision Theory either ignore uncertainty and tradeoffs, or provide languages and algorithms that are too complex for this task. The proposed model provides a language based on rules, a semantics based on high probabilities and lexicographical preferences, and a transparent decision procedure where reasons for and against decisions interact. 
The model is no substitute for Decision Theory, yet for decisions that people find easy to explain it may provide an appealing alternative.\nReal-time Decision algorithms are a class of incremental resource-bounded [Horvitz, 89] or anytime [Dean, 93] algorithms for evaluating influence diagrams. We present a test domain for real-time decision algorithms, and the results of experiments with several Real-time Decision Algorithms in this domain. The results demonstrate high performance for two algorithms, a decision-evaluation variant of Incremental Probabilistic Inference [D'Ambrosio 93] and a variant of an algorithm suggested by Goldszmidt, [Goldszmidt, 95], PK-reduced. We discuss the implications of these experimental results and explore the broader applicability of these algorithms.\nProbabilistic inference algorithms for finding the most probable explanation, the maximum a posteriori hypothesis, and the maximum expected utility and for updating belief are reformulated as an elimination-type algorithm called bucket elimination. This emphasizes the principle common to many of the algorithms appearing in that literature and clarifies their relationship to nonserial dynamic programming algorithms. We also present a general way of combining conditioning and elimination within this framework. Bounds on complexity are given for all the algorithms as a function of the problem's structure.\nWe report on work towards flexible algorithms for solving decision problems represented as influence diagrams. An algorithm is given to construct a tree structure for each decision node in an influence diagram. Each tree represents a decision function and is constructed incrementally. The improvements to the tree converge to the optimal decision function (neglecting computational costs) and the asymptotic behaviour is only a constant factor worse than dynamic programming techniques, counting the number of Bayesian network queries. 
Empirical results show how expected utility increases with the size of the tree and the number of Bayesian net calculations.\nInference algorithms for arbitrary belief networks are impractical for large, complex belief networks. Inference algorithms for specialized classes of belief networks have been shown to be more efficient. In this paper, we present a search-based algorithm for approximate inference on arbitrary, noisy-OR belief networks, generalizing earlier work on search-based inference for two-level, noisy-OR belief networks. Initial experimental results appear promising.\nWe present a prototype of a decision support system for management of the fungal disease mildew in winter wheat. The prototype is based on an influence diagram which is used to determine the optimal time and dose of mildew treatments. This involves multiple decision opportunities over time, stochasticity, inaccurate information and incomplete knowledge. The paper describes the practical and theoretical problems encountered during the construction of the influence diagram, and also the experience with the prototype.\nWe present a methodology for representing probabilistic relationships in a general-equilibrium economic model. Specifically, we define a precise mapping from a Bayesian network with binary nodes to a market price system where consumers and producers trade in uncertain propositions. We demonstrate the correspondence between the equilibrium prices of goods in this economy and the probabilities represented by the Bayesian network. A computational market model such as this may provide a useful framework for investigations of belief aggregation, distributed probabilistic inference, resource allocation under uncertainty, and other problems of decentralized uncertainty.\nWe derive qualitative relationships about the informational relevance of variables in graphical decision models based on a consideration of the topology of the models. 
Specifically, we identify dominance relations for the expected value of information on chance variables in terms of their position and relationships in influence diagrams. The qualitative relationships can be harnessed to generate nonnumerical procedures for ordering uncertain variables in a decision model by their informational relevance.\nWe present two Monte Carlo sampling algorithms for probabilistic inference that guarantee polynomial-time convergence for a larger class of networks than current sampling algorithms provide. These new methods are variants of the known likelihood weighting algorithm. We use recent advances in the theory of optimal stopping rules for Monte Carlo simulation to obtain an inference approximation with relative error epsilon and a small failure probability delta. We present an empirical evaluation of the algorithms which demonstrates their improved performance.\nSPIRIT is an expert system shell for probabilistic knowledge bases. Knowledge acquisition is performed by processing facts and rules on discrete variables in a rich syntax. The shell generates a probability distribution which respects all acquired facts and rules and which maximizes entropy. The user-friendly devices of SPIRIT to define variables, formulate rules and create the knowledge base are revealed in detail. Inductive learning is possible. Medium-sized applications show the power of the system.\nWe propose a decision-analytical approach to comparing the flexibility of decision situations from the perspective of a decision-maker who exhibits constant risk-aversion over a monetary value model. Our approach is simple yet seems to be consistent with a variety of flexibility concepts, including robust and adaptive alternatives. We try to compensate within the model for uncertainty that was not anticipated or not modeled. 
This approach not only allows one to compare the flexibility of plans, but also guides the search for new, more flexible alternatives.\nAxiomatization has been widely used for testing logical implications. This paper suggests a non-axiomatic method, the chase, to test if a new dependency follows from a given set of probabilistic dependencies. Although the chase computation may require exponential time in some cases, this technique is a powerful tool for establishing nontrivial theoretical results. More importantly, this approach provides valuable insight into the intriguing connection between relational databases and probabilistic reasoning systems.\nThe article looks at mass market artificial intelligence tools in the context of their ever-growing sophistication, availability and market penetration. The subject is especially relevant today for these exact reasons - if a few years ago AI was the subject of high tech research and science fiction novels, today, we increasingly rely on cloud robotics to cater to our daily needs - to trade stock, predict weather, manage diaries, find friends and buy presents online.\nEvaluation of counterfactual queries (e.g., \"If A were true, would C have been true?\") is important to fault diagnosis, planning, determination of liability, and policy analysis. We present a method of revaluating counterfactuals when the underlying causal model is represented by structural models - a nonlinear generalization of the simultaneous equations models commonly used in econometrics and social sciences. This new method provides a coherent means for evaluating policies involving the control of variables which, prior to enacting the policy were influenced by other variables in the system.\nWe present a new approach to dealing with default information based on the theory of belief functions. 
Our semantic structures, inspired by Adams' epsilon-semantics, are epsilon-belief assignments, where values committed to focal elements are either close to 0 or close to 1. We define two systems based on these structures, and relate them to other non-monotonic systems presented in the literature. We show that our second system correctly addresses the well-known problems of specificity, irrelevance, blocking of inheritance, ambiguity, and redundancy.\nChain graphs combine directed and undirected graphs and their underlying mathematics combines properties of the two. This paper gives a simplified definition of chain graphs based on a hierarchical combination of Bayesian (directed) and Markov (undirected) networks. Examples of a chain graph are multivariate feed-forward networks, clustering with conditional interaction between variables, and forms of Bayes classifiers. Chain graphs are then extended using the notation of plates so that samples and data analysis problems can be represented in a graphical model as well. Implications for learning are discussed in the conclusion.\nWe present a simple characterization of equivalent Bayesian network structures based on local transformations. The significance of the characterization is twofold. First, we are able to easily prove several new invariant properties of theoretical interest for equivalent structures. Second, we use the characterization to derive an efficient algorithm that identifies all of the compelled edges in a structure. Compelled edge identification is of particular importance for learning Bayesian network structures from data because these edges indicate causal relationships when certain assumptions hold.\nWe present two algorithms for exact and approximate inference in causal networks. The first algorithm, dynamic conditioning, is a refinement of cutset conditioning that has linear complexity on some networks for which cutset conditioning is exponential. 
The second algorithm, B-conditioning, is an algorithm for approximate inference that allows one to trade-off the quality of approximations with the computation time. We also present some experimental results illustrating the properties of the proposed algorithms.\nBayesian networks provide a method of representing conditional independence between random variables and computing the probability distributions associated with these random variables. In this paper, we extend Bayesian network structures to compute probability density functions for continuous random variables. We make this extension by approximating prior and conditional densities using sums of weighted Gaussian distributions and then finding the propagation rules for updating the densities in terms of these weights. We present a simple example that illustrates the Bayesian network for continuous variables; this example shows the effect of the network structure and approximation errors on the computation of densities for variables in the network.\nThis paper concerns the probabilistic evaluation of the effects of actions in the presence of unmeasured variables. We show that the identification of causal effect between a singleton variable X and a set of variables Y can be accomplished systematically, in time polynomial in the number of variables in the graph. When the causal effect is identifiable, a closed-form expression can be obtained for the probability that the action will achieve a specified goal, or a set of goals.\nThis paper discusses techniques for performing efficient decision-theoretic planning. We give an overview of the DRIPS decision-theoretic refinement planning system, which uses abstraction to efficiently identify optimal plans. We present techniques for automatically generating search control information, which can significantly improve the planner's performance. 
We evaluate the efficiency of DRIPS both with and without the search control rules on a complex medical planning problem and compare its performance to that of a branch-and-bound decision tree algorithm.\nWe examine Bayesian methods for learning Bayesian networks from a combination of prior knowledge and statistical data. In particular, we unify the approaches we presented at last year's conference for discrete and Gaussian domains. We derive a general Bayesian scoring metric, appropriate for both domains. We then use this metric in combination with well-known statistical facts about the Dirichlet and normal--Wishart distributions to derive our metrics for discrete and Gaussian domains.\nIn earlier work, we introduced flexible inference and decision-theoretic metareasoning to address the intractability of normative inference. Here, rather than pursuing the task of computing beliefs and actions with decision models composed of distinctions about uncertain events, we examine methods for inferring beliefs about mathematical truth before an automated theorem prover completes a proof. We employ a Bayesian analysis to update belief in truth, given theorem-proving progress, and show how decision-theoretic methods can be used to determine the value of continuing to deliberate versus taking immediate action in time-critical situations.\nBayesian networks offer great potential for use in automating large scale diagnostic reasoning tasks. Gibbs sampling is the main technique used to perform diagnostic reasoning in large richly interconnected Bayesian networks. Unfortunately Gibbs sampling can take an excessive time to generate a representative sample. In this paper we describe and test a number of heuristic strategies for improving sampling in noisy-or Bayesian networks. The strategies include Monte Carlo Markov chain sampling techniques other than Gibbs sampling. 
Emphasis is put on strategies that can be implemented in distributed systems.\nConsider the situation where some evidence e has been entered into a Bayesian network. When performing conflict analysis, sensitivity analysis, or when answering questions like \"What if the finding on X had been y instead of x?\" you need probabilities P(e' | h), where e' is a subset of e, and h is a configuration of a (possibly empty) set of variables. Cautious propagation is a modification of HUGIN propagation into a Shafer-Shenoy-like architecture. It is less efficient than HUGIN propagation; however, it provides easy access to P(e' | h) for a great many relevant subsets e'.\nDawid, Kjaerulff and Lauritzen (1994) provided a preliminary description of a hybrid between Monte-Carlo sampling methods and exact local computations in junction trees. Utilizing the strengths of both methods, such hybrid inference methods have the potential of expanding the class of problems which can be solved under bounded resources as well as solving problems which otherwise resist exact solutions. The paper provides a detailed description of a particular instance of such a hybrid scheme; namely, combination of exact inference and Gibbs sampling in discrete Bayesian networks. We argue that this combination calls for an extension of the usual message passing scheme of ordinary junction trees.\nClassically, risk is characterized by a point value probability indicating the likelihood of occurrence of an adverse effect. However, there are domains where the attainability of objective numerical risk characterizations is increasingly being questioned. This paper reviews the arguments in favour of extending classical techniques of risk assessment to incorporate meaningful qualitative and weak quantitative risk characterizations. A technique in which linguistic uncertainty terms are defined in terms of patterns of argument is then proposed. 
The technique is demonstrated using a prototype computer-based system for predicting the carcinogenic risk due to novel chemical compounds.\nMarkov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. In this paper, we summarize results regarding the complexity of solving MDPs and the running time of MDP solution algorithms. We argue that, although MDPs can be solved efficiently in theory, more study is needed to reveal practical algorithms for solving large problems quickly. To encourage future research, we sketch some alternative methods of analysis that rely on the structure of MDPs.\nWe define a context-sensitive temporal probability logic for representing classes of discrete-time temporal Bayesian networks. Context constraints allow inference to be focused on only the relevant portions of the probabilistic knowledge. We provide a declarative semantics for our language. We present a Bayesian network construction algorithm whose generated networks give sound and complete answers to queries. We use related concepts in logic programming to justify our approach. We have implemented a Bayesian network construction algorithm for a subset of the theory and demonstrate its application to the problem of evaluating the effectiveness of treatments for acute cardiac conditions.\nIn recent years there has been a spate of papers describing systems for probabilistic reasoning which do not use numerical probabilities. In some cases the simple set of values used by these systems makes it impossible to predict how a probability will change or which hypothesis is most likely given certain evidence. 
This paper concentrates on such situations, and suggests a number of ways in which they may be resolved by refining the representation.\nCertain causal models involving unmeasured variables induce no independence constraints among the observed variables but imply, nevertheless, inequality constraints on the observed distribution. This paper derives a general formula for such instrumental variables, that is, exogenous variables that directly affect some variables but not all. With the help of this formula, it is possible to test whether a model involving instrumental variables may account for the data, or, conversely, whether a given variable can be deemed instrumental.\nThe paper concerns the probabilistic evaluation of plans in the presence of unmeasured variables, each plan consisting of several concurrent or sequential actions. We establish a graphical criterion for recognizing when the effects of a given plan can be predicted from passive observations on measured variables only. When the criterion is satisfied, a closed-form expression is provided for the probability that the plan will achieve a specified goal.\nWe show that there is a general, informative and reliable procedure for discovering causal relations when, for all the investigator knows, both latent variables and selection bias may be at work. Given information about conditional independence and dependence relations between measured variables, even when latent variables and selection bias may be present, there are sufficient conditions for reliably concluding that there is a causal path from one variable to another, and sufficient conditions for reliably concluding when no such causal path exists.\nThis paper develops a simple calculus for order of magnitude reasoning. A semantics is given with soundness and completeness results. 
Order of magnitude probability functions are easily defined and turn out to be equivalent to kappa functions, which are slight generalizations of Spohn's Natural Conditional Functions. The calculus also gives rise to an order of magnitude decision theory, which can be used to justify an amended version of Pearl's decision theory for kappa functions, although the latter is weaker and less expressive.\nThis paper discusses a method for implementing a probabilistic inference system based on an extended relational data model. This model provides a unified approach for a variety of applications such as dynamic programming, solving sparse linear equations, and constraint propagation. In this framework, the probability model is represented as a generalized relational database. Subsequent probabilistic requests can be processed as standard relational queries. Conventional database management systems can be easily adopted for implementing such an approximate reasoning system.\nIn this paper, we present two methods to provide explanations for reasoning with belief functions in the valuation-based systems. One approach, inspired by Strat's method, is based on sensitivity analysis, but its computation is simpler and thus easier to implement than Strat's. The other one is to examine the impact of evidence on the conclusion based on the measure of the information content in the evidence. We show the property of additivity for the pieces of evidence that are conditionally independent within the context of the valuation-based systems. We give an example to show how these approaches are applied in an evidential network.\nThis paper reports experiments with the causal independence inference algorithm proposed by Zhang and Poole (1994b) on the CPSC network created by Pradhan et al. (1994). 
It is found that the algorithm is able to answer 420 of the 422 possible zero-observation queries, 94 of 100 randomly generated five-observation queries, 87 of 100 randomly generated ten-observation queries, and 69 of 100 randomly generated twenty-observation queries.\nWe discuss the problem of constructing inference procedures which can manipulate uncertainties measured in ordinal scales and fulfill the property of strict monotonicity of conclusion. The class of A-valuations of plausibility is considered where operations based only on information about linear ordering of plausibility values are used. In this class the modus ponens generating function fulfilling the property of strict monotonicity of conclusions is introduced.\nWe show how to find a small loop cutset in a Bayesian network. Finding such a loop cutset is the first step in the method of conditioning for inference. Our algorithm for finding a loop cutset, called MGA, finds a loop cutset which is guaranteed in the worst case to contain less than twice the number of variables contained in a minimum loop cutset. We test MGA on randomly generated graphs and find that the average ratio between the number of instances associated with the algorithm's output and the number of instances associated with a minimum solution is 1.22.\nSome instances of creative thinking require an agent to build and test hypothetical theories. Such a reasoner needs to explore the space of not only those situations that have occurred in the past, but also those that are rationally conceivable. In this paper we present a formalism for exploring the space of conceivable situation-models for those domains in which the knowledge is primarily probabilistic in nature. 
The formalism seeks to construct consistent, minimal, and desirable situation-descriptions by selecting suitable domain-attributes and dependency relationships from the available domain knowledge.\nSimulation schemes for probabilistic inference in Bayesian belief networks offer many advantages over exact algorithms; for example, these schemes have a linear and thus predictable runtime while exact algorithms have exponential runtime. Experiments have shown that likelihood weighting is one of the most promising simulation schemes. In this paper, we present a new simulation scheme that generates samples more evenly spread in the sample space than the likelihood weighting scheme. We show both theoretically and experimentally that the stratified scheme outperforms likelihood weighting in average runtime and error in estimates of beliefs.\nA BN2O network is a two-level belief net in which the parent interactions are modeled using the noisy-or interaction model. In this paper we discuss the application of the SPI local expression language to efficient inference in large BN2O networks. In particular, we show that there is significant structure, which can be exploited to improve over the Quickscore result. We further describe how symbolic techniques can provide information which can significantly reduce the computation required for computing all cause posterior marginals. Finally, we present a novel approximation technique with preliminary experimental results.\nWe investigate planning in time-critical domains represented as Markov Decision Processes, showing that search-based techniques can be a very powerful method for finding close-to-optimal plans. To reduce the computational cost of planning in these domains, we execute actions as we construct the plan, and sacrifice optimality by searching to a fixed depth and using a heuristic function to estimate the value of states. 
Although this paper concentrates on the search algorithm, we also discuss ways of constructing heuristic functions suitable for this approach. Our results show that by interleaving search and execution, close-to-optimal policies can be found without the computational requirements of other approaches.\nPenalty logic, introduced by Pinkas, associates with each formula of a knowledge base the price to pay if this formula is violated. Penalties may be used as a criterion for selecting preferred consistent subsets in an inconsistent knowledge base, thus inducing a non-monotonic inference relation. A precise formalization and the main properties of penalty logic and of its associated non-monotonic inference relation are given in the first part. We also show that penalty logic and Dempster-Shafer theory are related, especially in the infinitesimal case.\nPossibilistic conditional independence is investigated: we propose a definition of this notion similar to the one used in probability theory. The links between independence and non-interactivity are investigated, and properties of these relations are given. The influence of the conjunction used to define a conditional measure of possibility is also highlighted: we examine three types of conjunctions: Lukasiewicz-like T-norms, product-like T-norms and the minimum operator.\nBackward simulation is an approximate inference technique for Bayesian belief networks. It differs from existing simulation methods in that it starts simulation from the known evidence and works backward (i.e., contrary to the direction of the arcs). The technique's focus on the evidence leads to improved convergence in situations where the posterior beliefs are dominated by the evidence rather than by the prior probabilities. 
Since this class of situations is large, the technique may make the application of approximate inference in Bayesian belief networks practical for many real-world problems.\nThis paper discusses the problem of abstracting conditional probabilistic actions. We identify two distinct types of abstraction: intra-action abstraction and inter-action abstraction. We define what it means for the abstraction of an action to be correct and then derive two methods of intra-action abstraction and two methods of inter-action abstraction which are correct according to this criterion. We illustrate the developed techniques by applying them to actions described with the temporal action representation used in the DRIPS decision-theoretic planner and we describe how the planner uses abstraction to reduce the complexity of planning.\nWithin the possibilistic approach to uncertainty modeling, the paper presents a modal logical system to reason about qualitative (comparative) statements of the possibility (and necessity) of fuzzy propositions. We relate this qualitative modal logic to the many-valued analogues MVS5 and MVKD45 of the well-known modal logics of knowledge and belief S5 and KD45, respectively. Completeness results are obtained for such logics, thereby extending previously existing results for qualitative possibilistic logics in the classical non-fuzzy setting.\nA logic is defined that allows one to express information about statistical probabilities and about degrees of belief in specific propositions. By interpreting the two types of probabilities in one common probability space, the semantics given are well suited to model the influence of statistical information on the formation of subjective beliefs. Cross-entropy minimization is a key element in these semantics, the use of which is justified by showing that the resulting logic exhibits some very reasonable properties.\nThe paper deals with optimality issues in connection with updating beliefs in networks. 
We address two processes: triangulation and construction of junction trees. In the first part, we give a simple algorithm for constructing an optimal junction tree from a triangulated network. In the second part, we argue that any exact method based on local calculations must either be less efficient than the junction tree method or have an optimality problem equivalent to that of triangulation.\nThis paper examines the problem of constructing belief networks to evaluate plans produced by a knowledge-based planner. Techniques are presented for handling various types of complicating plan features. These include plans with context-dependent consequences, indirect consequences, actions with preconditions that must be true during the execution of an action, contingencies, multiple levels of abstraction, multiple execution agents with partially-ordered and temporally overlapping actions, and plans which reference specific times and time durations.\nThis paper examines methods of decision making that are able to accommodate limitations on both the form in which uncertainty pertaining to a decision problem can be realistically represented and the amount of computing time available before a decision must be made. The methods are anytime algorithms in the sense of Boddy and Dean [1991]. Techniques are presented for use with Frisch and Haddawy's [1992] anytime deduction system, with an anytime adaptation of Nilsson's [1986] probabilistic logic, and with a probabilistic database model.\nWhile influence diagrams have many advantages as a representation framework for Bayesian decision problems, they have a serious drawback in handling asymmetric decision problems. To be represented in an influence diagram, an asymmetric decision problem must be symmetrized. A considerable amount of unnecessary computation may be involved when a symmetrized influence diagram is evaluated by conventional algorithms. 
In this paper we present an approach for avoiding such unnecessary computation in influence diagram evaluation.\nModel-based diagnosis reasons backwards from a functional schematic of a system to isolate faults given observations of anomalous behavior. We develop a fully probabilistic approach to model-based diagnosis and extend it to support hierarchical models. Our scheme translates the functional schematic into a Bayesian network and diagnostic inference takes place in the Bayesian network. A Bayesian network diagnostic inference algorithm is modified to take advantage of the hierarchy to give computational gains.\nThe semigraphoid closure of every pair of CI-statements (CI = conditional independence) is a stochastic CI-model. As a consequence of this result it is shown that every probabilistically sound inference rule for CI-models, having at most two antecedents, is derivable from the semigraphoid inference rules. This justifies the use of semigraphoids as approximations of stochastic CI-models in probabilistic reasoning. The list of all 19 potential dominant elements of the mentioned semigraphoid closure is given as a byproduct.\nSystem Z+ [Goldszmidt and Pearl, 1991, Goldszmidt, 1992] is a formalism for reasoning with normality defaults of the form \"typically if phi then psi (with strength delta)\" where delta is a positive integer. The system has a critical shortcoming in that it does not sanction inheritance across exceptional subclasses. In this paper we propose an extension to System Z+ that rectifies this shortcoming by extracting additional conditions between worlds from the defaults database. We show that the additional constraints do not change the notion of the consistency of a database. 
We also make comparisons with competing default reasoning systems.\nBy analyzing the relationships among chance, weight of evidence and degree of belief, we show that the assertion \"probability functions are special cases of belief functions\" and the assertion \"Dempster's rule can be used to combine belief functions based on distinct bodies of evidence\" together lead to an inconsistency in Dempster-Shafer theory. To solve this problem, we must reject some fundamental postulates of the theory. We introduce a new approach for uncertainty management that shares many intuitive ideas with D-S theory, while avoiding this problem.\nOne important factor determining the computational complexity of evaluating a probabilistic network is the cardinality of the state spaces of the nodes. By varying the granularity of the state spaces, one can trade off accuracy in the result for computational efficiency. We present an anytime procedure for approximate evaluation of probabilistic networks based on this idea. On application to some simple networks, the procedure exhibits a smooth improvement in approximation quality as computation time increases. This suggests that state-space abstraction is one more useful control parameter for designing real-time probabilistic reasoners.\nWe take a general approach to uncertainty on product spaces, and give sufficient conditions for the independence structures of uncertainty measures to satisfy graphoid properties. Since these conditions are arguably more intuitive than some of the graphoid properties, they can be viewed as explanations of why probability and certain other formalisms generate graphoids. The conditions include a sufficient condition for the Intersection property which can still apply even if there is a strong logical relationship between the variables. 
We indicate how these results can be used to produce theories of qualitative conditional probability which are semi-graphoids and graphoids.\nIn the existing evidential networks with belief functions, the relations among the variables are always represented by joint belief functions on the product space of the involved variables. In this paper, we use conditional belief functions to represent such relations in the network and show some relations of these two kinds of representations. We also present a propagation algorithm for such networks. By analyzing the properties of some special evidential networks with conditional belief functions, we show that the reasoning process can be simplified in such kinds of networks.\nIt is well known that conditional independence can be used to factorize a joint probability into a multiplication of conditional probabilities. This paper proposes a constructive definition of inter-causal independence, which can be used to further factorize a conditional probability. An inference algorithm is developed, which makes use of both conditional independence and inter-causal independence to reduce inference complexity in Bayesian networks.\nWe address the problem of causal interpretation of the graphical structure of Bayesian belief networks (BBNs). We review the concept of causality explicated in the domain of structural equations models and show that it is applicable to BBNs. In this view, which we call mechanism-based, causality is defined within models and causal asymmetries arise when mechanisms are placed in the context of a system. We lay the link between structural equations models and BBNs models and formulate the conditions under which the latter can be given causal interpretation.\nThe primary theme of this investigation is a decision theoretic account of conditional ought statements (e.g., \"You ought to do A, if C\") that rectifies glaring deficiencies in classical deontic logic. 
The resulting account forms a sound basis for qualitative decision theory, thus providing a framework for qualitative planning under uncertainty. In particular, we show that adding causal relationships (in the form of a single graph) as part of an epistemic state is sufficient to facilitate the analysis of action sequences, their consequences, their interaction with observations, their expected utilities and, hence, the synthesis of plans and strategies under uncertainty.\nSpiegelhalter and Lauritzen [15] studied sequential learning in Bayesian networks and proposed three models for the representation of conditional probabilities. A fourth model, shown here, assumes that the parameter distribution is given by a product of Gaussian functions and updates them from the lambda and pi messages of evidence propagation. We also generalize the noisy OR-gate for multivalued variables, develop the algorithm to compute probability in time proportional to the number of parents (even in networks with loops) and apply the learning model to this gate.\nI introduce a temporal belief-network representation of causal independence that a knowledge engineer can use to elicit probabilistic models. Like the current, atemporal belief-network representation of causal independence, the new representation makes knowledge acquisition tractable. Unlike the atemporal representation, however, the temporal representation can simplify inference, and does not require the use of unobservable variables. The representation is less general than the atemporal representation, but appears to be useful for many practical applications.\nWhen eliciting probability models from experts, knowledge engineers may compare the results of the model with expert judgment on test scenarios, then adjust model parameters to bring the behavior of the model more in line with the expert's intuition. 
This paper presents a methodology for analytic computation of sensitivity values to measure the impact of small changes in a network parameter on a target probability value or distribution. These values can be used to guide knowledge elicitation. They can also be used in a gradient descent algorithm to estimate parameter values that maximize a measure of goodness-of-fit to both local and holistic probability assessments.\nCausal Models are like Dependency Graphs and Belief Nets in that they provide a structure and a set of assumptions from which a joint distribution can, in principle, be computed. Unlike Dependency Graphs, Causal Models are models of hierarchical and/or parallel processes, rather than models of distributions (partially) known to a model builder through some sort of gestalt. As such, Causal Models are more modular, easier to build, more intuitive, and easier to understand than Dependency Graph Models. Causal Models are formally defined and Dependency Graph Models are shown to be a special case of them. Algorithms supporting inference are presented. Parsimonious methods for eliciting dependent probabilities are presented.\nWe investigate the value of extending the completeness of a decision model along different dimensions of refinement. Specifically, we analyze the expected value of quantitative, conceptual, and structural refinement of decision models. We illustrate the key dimensions of refinement with examples. The analyses of value of model refinement can be used to focus the attention of an analyst or an automated reasoning system on extensions of a decision model associated with the greatest expected value.\nValuation networks have been proposed as graphical representations of valuation-based systems (VBSs). The VBS framework is able to capture many uncertainty calculi including probability theory, Dempster-Shafer's belief-function theory, Spohn's epistemic belief theory, and Zadeh's possibility theory. 
In this paper, we show how valuation networks encode conditional independence relations. For the probabilistic case, the class of probability models encoded by valuation networks includes undirected graph models, directed acyclic graph models, directed balloon graph models, and recursive causal graph models.\nThe Noisy-Or model is convenient for describing a class of uncertain relationships in Bayesian networks [Pearl 1988]. Pearl describes the Noisy-Or model for Boolean variables. Here we generalize the model to n-ary input and output variables and to arbitrary functions other than the Boolean OR function. This generalization is a useful modeling aid for the construction of Bayesian networks. We illustrate with some examples including digital circuit diagnosis and network reliability analysis.\nOne of the most difficult aspects of modeling complex dilemmas in decision-analytic terms is composing a diagram of relevance relations from a set of domain concepts. Decision models in domains such as medicine, however, exhibit certain prototypical patterns that can guide the modeling process. Medical concepts can be classified according to semantic types that have characteristic positions and typical roles in an influence-diagram model. We have developed a graph-grammar production system that uses such inherent interrelationships among medical terms to facilitate the modeling of medical decisions.\nPrevious algorithms for the construction of Bayesian belief network structures from data have been either highly dependent on conditional independence (CI) tests, or have required an ordering on the nodes to be supplied by the user. We present an algorithm that integrates these two approaches: CI tests are used to generate an ordering on the nodes from the database, which is then used to recover the underlying Bayesian network structure using a non-CI-based method. Results of preliminary evaluation of the algorithm on two networks (ALARM and LED) are presented. 
We also discuss some algorithm performance issues and open problems.\nWe describe the integration of logical and uncertain reasoning methods to identify the likely source and location of software problems. To date, software engineers have had few tools for identifying the sources of error in complex software packages. We describe a method for diagnosing software problems through combining logical and uncertain reasoning analyses. Our preliminary results suggest that such methods can be of value in directing the attention of software engineers to paths of an algorithm that have the highest likelihood of harboring a programming error.\nPropositional representation services such as truth maintenance systems offer powerful support for incremental, interleaved, problem-model construction and evaluation. Probabilistic inference systems, in contrast, have lagged behind in supporting this incrementality typically demanded by problem solvers. The problem, we argue, is that the basic task of probabilistic inference is typically formulated at too large a grain-size. We show how a system built around a smaller grain-size inference task can have the desired incrementality and serve as the basis for a low-level (propositional) probabilistic representation service.\nIntercausal reasoning is a common inference pattern involving probabilistic dependence of causes of an observed common effect. The sign of this dependence is captured by a qualitative property called product synergy. The current definition of product synergy is insufficient for intercausal reasoning where there are additional uninstantiated causes of the common effect. We propose a new definition of product synergy and prove its adequacy for intercausal reasoning with direct and indirect evidence for the common effect. 
The new definition is based on a new property, matrix half positive semi-definiteness, which is a weakened form of matrix positive semi-definiteness.\nWe examine two types of similarity networks, each based on a distinct notion of relevance. For both types of similarity networks we present an efficient inference algorithm that works under the assumption that every event has a nonzero probability of occurrence. Another inference algorithm is developed for type 1 similarity networks that works under no restriction, albeit less efficiently.\nTree structures have been shown to provide an efficient framework for propagating beliefs [Pearl,1986]. This paper studies the problem of finding an optimal approximating tree. The star decomposition scheme for sets of three binary variables [Lazarsfeld,1966; Pearl,1986] is shown to enhance the class of probability distributions that can support tree structures; such structures are called tree-decomposable structures. The logarithm scoring rule is found to be an appropriate optimality criterion to evaluate different tree-decomposable structures. Characteristics of such structures closest to the actual belief network are identified using the logarithm rule, and greedy and exact techniques are developed to find the optimal approximation.\nThe potential influence diagram is a generalization of the standard \"conditional\" influence diagram, a directed network representation for probabilistic inference and decision analysis [Ndilikilikesha, 1991]. It allows efficient inference calculations corresponding exactly to those on undirected graphs. In this paper, we explore the relationship between potential and conditional influence diagrams and provide insight into the properties of the potential influence diagram. 
In particular, we show how to convert a potential influence diagram into a conditional influence diagram, and how to view the potential influence diagram operations in terms of the conditional influence diagram.\nTo determine the value of perfect information in an influence diagram, one needs first to modify the diagram to reflect the change in information availability, and then to compute the optimal expected values of both the original diagram and the modified diagram. The value of perfect information is the difference between the two optimal expected values. This paper is about how to speed up the computation of the optimal expected value of the modified diagram by making use of the intermediate computation results obtained when computing the optimal expected value of the original diagram.\nA major reason behind the success of probability calculus is that it possesses a number of valuable tools, which are based on the notion of probabilistic independence. In this paper, I identify a notion of logical independence that makes some of these tools available to a class of propositional databases, called argument databases. Specifically, I suggest a graphical representation of argument databases, called argument networks, which resemble Bayesian networks. I also suggest an algorithm for reasoning with argument networks, which resembles a basic algorithm for reasoning with Bayesian networks. Finally, I show that argument networks have several applications: Nonmonotonic reasoning, truth maintenance, and diagnosis.\nIn this paper some initial work towards a new approach to qualitative reasoning under uncertainty is presented. This method is not only applicable to qualitative probabilistic reasoning, as is the case with other methods, but also allows the qualitative propagation within networks of values based upon possibility theory and Dempster-Shafer evidence theory. 
The method is applied to two simple networks from which a large class of directed graphs may be constructed. The results of this analysis are used to compare the qualitative behaviour of the three major quantitative uncertainty-handling formalisms, and to demonstrate that the qualitative integration of the formalisms is possible under certain assumptions.\nThe classical propositional assumption-based model is extended to incorporate probabilities for the assumptions. Then it is placed into the framework of evidence theory. Several authors, such as Laskey and Lehner (1989) and Provan (1990), have already proposed a similar point of view, but the first paper is not as much concerned with mathematical foundations, and Provan's paper develops in a different direction. Here we thoroughly develop and present the mathematical foundations of this theory, together with computational methods adapted from Reiter and De Kleer (1987) and Inoue (1992). Finally, recently proposed techniques for computing degrees of support are presented.\nThis paper presents a procedure to determine a complete belief function from the known values of belief for some of the subsets of the frame of discernment. The method is based on the principle of minimum commitment and a new principle called the focusing principle. This additional principle is based on the idea that belief is specified for the most relevant sets: the focal elements. The resulting procedure is compared with existing methods of building complete belief functions: the minimum specificity principle and the least commitment principle.\nIn a probability-based reasoning system, Bayes' theorem and its variations are often used to revise the system's beliefs. However, if the explicit conditions and the implicit conditions of probability assignments are properly distinguished, it follows that Bayes' theorem is not a generally applicable revision rule. 
Upon properly distinguishing belief revision from belief updating, we see that Jeffrey's rule and its variations are not revision rules, either. Without these distinctions, the limitation of the Bayesian approach is often ignored or underestimated. Revision, in its general form, cannot be done in the Bayesian approach, because a probability distribution function alone does not contain the information needed by the operation.\nIn this paper, we present a decision support system based on belief functions and the pignistic transformation. The system is an integration of an evidential system for belief function propagation and a valuation-based system for Bayesian decision analysis. The two subsystems are connected through the pignistic transformation. The system takes as inputs the user's \"gut feelings\" about a situation and suggests what, if any, are to be tested and in what order, and it does so with a user-friendly interface.\nIn this paper we describe a novel method for evidential reasoning [1]. It involves modelling the process of evidential reasoning in three steps, namely, evidence structure construction, evidence accumulation, and decision making. The proposed method, called RES, is novel in that evidence strength is associated with an evidential support relationship (an argument) between a pair of statements and such strength is carried by comparison between arguments. This is in contrast to the conventional approaches, where evidence strength is represented numerically and is associated with a statement.\nAn algorithm for generating the structure of a directed acyclic graph from data using the notion of causal input lists is presented. The algorithm manipulates the ordering of the variables with operations which very much resemble arc reversal. Operations are only applied if the DAG after the operation represents at least the independencies represented by the DAG before the operation until no more arcs can be removed from the DAG. 
The resulting DAG is a minimal I-map.\nPossibilistic logic has been proposed as a numerical formalism for reasoning with uncertainty. There has been interest in developing qualitative accounts of possibility, as well as an explanation of the relationship between possibility and modal logics. We present two modal logics that can be used to represent and reason with qualitative statements of possibility and necessity. Within this modal framework, we are able to identify interesting relationships between possibilistic logic, beliefs and conditionals. In particular, the most natural conditional definable via possibilistic means for default reasoning is identical to Pearl's conditional for epsilon-semantics.\nExperts do not always feel very comfortable when they have to give precise numerical estimations of certainty degrees. In this paper we present a qualitative approach which allows for attaching partially ordered symbolic grades to logical formulas. Uncertain information is expressed by means of parameterized modal operators. We propose a semantics for this multimodal logic and give a sound and complete axiomatization. We study the links with related approaches and suggest how this framework might be used to manage both uncertain and incomplete knowledge.\nWe have developed a probabilistic forecasting methodology through a synthesis of belief network models and classical time-series analysis. We present the dynamic network model (DNM) and describe methods for constructing, refining, and performing inference with this representation of temporal probabilistic knowledge. The DNM representation extends static belief-network models to more general dynamic forecasting models by integrating and iteratively refining contemporaneous and time-lagged dependencies. We discuss key concepts in terms of a model for forecasting U.S. car sales in Japan.\nWe report on an experimental investigation into opportunities for parallelism in belief-net inference. 
Specifically, we report on a study of the parallelism available, on hypercube-style machines, in a set of randomly generated belief nets, using factoring (SPI)-style inference algorithms. Our results indicate that substantial speedup is available, but that it is available only through parallelization of individual conformal product operations, and depends critically on finding an appropriate factoring. We find negligible opportunity for parallelism at the topological, or clustering tree, level.\nThis article offers a modification of Chow and Liu's learning algorithm in the context of handwritten digit recognition. The modified algorithm directs the user to group digits into several classes consisting of digits that are hard to distinguish and then constructs an optimal conditional tree representation for each class of digits instead of for each single digit as done by Chow and Liu (1968). Advantages and extensions of the new method are discussed. The related works of Wong and Wang (1977) and Wong and Poon (1989), which offer a different entropy-based learning algorithm, are shown to rest on inappropriate assumptions.\nIn the probabilistic approach to uncertainty management the input knowledge is usually represented by means of some probability distributions. In this paper we assume that the input knowledge is given by two discrete conditional probability distributions, represented by two stochastic matrices P and Q. The consistency of the knowledge base is analyzed. Coherence conditions and explicit formulas for the extension to marginal distributions are obtained in some special cases.\nA computational scheme for reasoning about dynamic systems using (causal) probabilistic networks is presented. 
The scheme is based on the framework of Lauritzen and Spiegelhalter (1988), and may be viewed as a generalization of the inference methods of classical time-series analysis in the sense that it allows the description of non-linear, multivariate dynamic systems with complex conditional independence structures. Further, the scheme provides a method for efficient backward smoothing and possibilities for efficient, approximate forecasting methods. The scheme has been implemented on top of the HUGIN shell.\nThe fundamental updating process in the transferable belief model is related to the concept of specialization and can be described by a specialization matrix. The degree of belief in the truth of a proposition is a degree of justified support. The Principle of Minimal Commitment implies that one should never give more support to the truth of a proposition than is justified. We show that Dempster's rule of conditioning corresponds essentially to the least committed specialization, and that Dempster's rule of combination results essentially from commutativity requirements. The concept of generalization, dual to the concept of specialization, is described.\nWe discuss problems for convex Bayesian decision making and uncertainty representation. These include the inability to accommodate various natural and useful constraints and the possibility of an analog of the classical Dutch Book being made against an agent behaving in accordance with convex Bayesian prescriptions. A more general set-based Bayesianism may be as tractable and would avoid the difficulties we raise.\nThis paper presents a Bayesian framework for assessing the adequacy of a model without the necessity of explicitly enumerating a specific alternate model. A test statistic is developed for tracking the performance of the model across repeated problem instances. Asymptotic methods are used to derive an approximate distribution for the test statistic. 
When the model is rejected, the individual components of the test statistic can be used to guide search for an alternate model.\nThe ideal Bayesian agent reasons from a global probability model, but real agents are restricted to simplified models which they know to be adequate only in restricted circumstances. Very little formal theory has been developed to help fallibly rational agents manage the process of constructing and revising small world models. The goal of this paper is to present a theoretical framework for analyzing model management approaches. For a probability forecasting problem, a search process over small world models is analyzed as an approximation to a larger-world model which the agent cannot explicitly enumerate or compute. Conditions are given under which the sequence of small-world models converges to the larger-world probabilities.\nAutomated decision making is often complicated by the complexity of the knowledge involved. Much of this complexity arises from the context sensitive variations of the underlying phenomena. We propose a framework for representing descriptive, context-sensitive knowledge. Our approach attempts to integrate categorical and uncertain knowledge in a network formalism. This paper outlines the basic representation constructs, examines their expressiveness and efficiency, and discusses the potential applications of the framework.\nBayesian networks are directed acyclic graphs representing independence relationships among a set of random variables. A random variable can be regarded as a set of exhaustive and mutually exclusive propositions. We argue that there are several drawbacks resulting from the propositional nature and acyclic structure of Bayesian networks. To remedy these shortcomings, we propose a probabilistic network where nodes represent unary predicates and which may contain directed cycles. 
The proposed representation allows us to represent domain knowledge in a single static network even though we cannot determine the instantiations of the predicates beforehand. The ability to deal with cycles also enables us to handle cyclic causal tendencies and to recognize recursive plans.\nWe address the problem of supporting empirical probabilities in monadic logic databases. Though the semantics of multivalued logic programs has been studied extensively, the treatment of probabilities as results of statistical findings has not been studied in logic programming/deductive databases. We develop a model-theoretic characterization of logic databases that facilitates such a treatment. We present an algorithm for checking consistency of such databases and prove its total correctness. We develop a sound and complete query processing procedure for handling queries to such databases.\nThe paper describes aHUGIN, a tool for creating adaptive systems. aHUGIN is an extension of the HUGIN shell, and is based on the methods reported by Spiegelhalter and Lauritzen (1990a). The adaptive systems resulting from aHUGIN are able to adjust the conditional probabilities in the model. A short analysis of the adaptation task is given and the features of aHUGIN are described. Finally a session with experiments is reported and the results are discussed.\nProbabilistic reasoning systems combine different probabilistic rules and probabilistic facts to arrive at the desired probability values of consequences. In this paper we describe the MESA-algorithm (Maximum Entropy by Simulated Annealing) that derives a joint distribution of variables or propositions. It takes into account the reliability of probability values and can resolve conflicts between contradictory statements. The joint distribution is represented in terms of marginal distributions and therefore allows one to process large inference networks and to determine desired probability values with high precision. 
The procedure derives a maximum entropy distribution subject to the given constraints. It can be applied to inference networks of arbitrary topology and may be extended into a number of directions.\nAn expert classification system having statistical information about the prior probabilities of the different classes should be able to use this knowledge to reduce the amount of additional information that it must collect, e.g., through questions, in order to make a correct classification. This paper examines how best to use such prior information and additional information-collection opportunities to reduce uncertainty about the class to which a case belongs, thus minimizing the average cost or effort required to correctly classify new cases.\nThe analysis of decision making under uncertainty is closely related to the analysis of probabilistic inference. Indeed, much of the research into efficient methods for probabilistic inference in expert systems has been motivated by the fundamental normative arguments of decision theory. In this paper we show how the developments underlying those efficient methods can be applied immediately to decision problems. In addition to general approaches which need know nothing about the actual probabilistic inference method, we suggest some simple modifications to the clustering family of algorithms in order to efficiently incorporate decision making capabilities.\nThis paper introduces the notions of independence and conditional independence in valuation-based systems (VBS). VBS is an axiomatic framework capable of representing many different uncertainty calculi. We define independence and conditional independence in terms of factorization of the joint valuation. The definitions of independence and conditional independence in VBS generalize the corresponding definitions in probability theory. 
Our definitions apply not only to probability theory, but also to Dempster-Shafer's belief-function theory, Spohn's epistemic-belief theory, and Zadeh's possibility theory. In fact, they apply to any uncertainty calculi that fit in the framework of valuation-based systems.\nThis paper discusses a target tracking problem in which no dynamic mathematical model is explicitly assumed. A nonlinear filter based on the fuzzy If-then rules is developed. A comparison with a Kalman filter is made, and empirical results show that the performance of the fuzzy filter is better. Intensive simulations suggest that theoretical justification of the empirical results is possible.\nThe DUCK-calculus presented here is a recent approach to cope with probabilistic uncertainty in a sound and efficient way. Uncertain rules with bounds for probabilities and explicit conditional independences can be maintained incrementally. The basic inference mechanism relies on local bounds propagation, implementable by deductive databases with a bottom-up fixpoint evaluation. In situations, where no precise bounds are deducible, it can be combined with simple operations research techniques on a local scope. In particular, we provide new precise analytical bounds for probabilistic entailment.\nJeffrey's rule has been generalized by Wagner to the case in which new evidence bounds the possible revisions of a prior probability below by a Dempsterian lower probability. Classical probability kinematics arises within this generalization as the special case in which the evidentiary focal elements of the bounding lower probability are pairwise disjoint. We discuss a twofold extension of this generalization, first allowing the lower bound to be any two-monotone capacity and then allowing the prior to be a lower envelope.\nIn this paper, a unified framework for representing uncertain information based on the notion of an interval structure is proposed. 
It is shown that the lower and upper approximations of the rough-set model, the lower and upper bounds of incidence calculus, and the belief and plausibility functions all obey the axioms of an interval structure. An interval structure can be used to synthesize the decision rules provided by the experts. An efficient algorithm to find the desirable set of rules is developed from a set of sound and complete inference axioms.\nThis paper examines the interdependence generated between two parent nodes with a common instantiated child node, such as two hypotheses sharing common evidence. The relation so generated has been termed \"intercausal.\" It is shown by construction that inter-causal independence is possible for binary distributions at one state of evidence. For such \"CICI\" distributions, the two measures of inter-causal effect, \"multiplicative synergy\" and \"additive synergy\" are equal. The well known \"noisy-or\" model is an example of such a distribution. This introduces novel semantics for the noisy-or, as a model of the degree of conflict among competing hypotheses of a common observation.\nIn this paper, we consider several types of information and methods of combination associated with incomplete probabilistic systems. We discriminate between 'a priori' and evidential information. The former is a description of the whole population; the latter is a restriction based on observations for a particular case. Then, we propose different combination methods for each one of them. We also consider conditioning as the heterogeneous combination of 'a priori' and evidential information. The evidential information is represented as a convex set of likelihood functions. These will have an associated possibility distribution with behavior according to classical Possibility Theory.\nThis paper presents a Bayesian method for constructing Bayesian belief networks from a database of cases. 
Potential applications include computer-assisted hypothesis testing, automated scientific discovery, and automated construction of probabilistic expert systems. Results are presented of a preliminary evaluation of an algorithm for constructing a belief network from a database of cases. We relate the methods in this paper to previous work, and we discuss open problems.\nThis paper discusses multiple Bayesian networks representation paradigms for encoding asymmetric independence assertions. We offer three contributions: (1) an inference mechanism that makes explicit use of asymmetric independence to speed up computations, (2) a simplified definition of similarity networks and extensions of their theory, and (3) a generalized representation scheme that encodes more types of asymmetric independence assertions than do similarity networks.\nWe discuss representing and reasoning with knowledge about the time-dependent utility of an agent's actions. Time-dependent utility plays a crucial role in the interaction between computation and action under bounded resources. We present a semantics for time-dependent utility and describe the use of time-dependent information in decision contexts. We illustrate our discussion with examples of time-pressured reasoning in Protos, a system constructed to explore the ideal control of inference by reasoners with limited abilities.\nTraditional approaches to non-monotonic reasoning fail to satisfy a number of plausible axioms for belief revision and suffer from conceptual difficulties as well. Recent work on ranked preferential models (RPMs) promises to overcome some of these difficulties. Here we show that RPMs are not adequate to handle iterated belief change. Specifically, we show that RPMs do not always allow for the reversibility of belief change. 
This result indicates the need for numerical strengths of belief.\nThe concept of movable evidence masses that flow from supersets to subsets as specified by experts represents a suitable framework for reasoning under uncertainty. The mass flow is controlled by specialization matrices. New evidence is integrated into the frame of discernment by conditioning or revision (Dempster's rule of conditioning), for which special specialization matrices exist. Even some aspects of non-monotonic reasoning can be represented by certain specialization matrices.\nAny probabilistic model of a problem is based on assumptions which, if violated, invalidate the model. Users of probability based decision aids need to be alerted when cases arise that are not covered by the aid's model. Diagnosis of model failure is also necessary to control dynamic model construction and revision. This paper presents a set of decision theoretically motivated heuristics for diagnosing situations in which a model is likely to provide an inadequate representation of the process being modeled.\nA series of Monte Carlo studies were performed to compare the behavior of some alternative procedures for reasoning under uncertainty. The behavior of several Bayesian, linear model and default reasoning procedures was examined in the context of increasing levels of calibration error. The most interesting result is that Bayesian procedures tended to output more extreme posterior belief values (posterior beliefs near 0.0 or 1.0) than other techniques, but the linear models were relatively less likely to output strong support for an erroneous conclusion. Also, accounting for the probabilistic dependencies between evidence items was important for both Bayesian and linear updating procedures.\nSelecting the right reference class and the right interval when faced with conflicting candidates and no possibility of establishing subset style dominance has been a problem for Kyburg's Evidential Probability system. 
Various methods have been proposed by Loui and Kyburg to solve this problem in a way that is both intuitively appealing and justifiable within Kyburg's framework. The scheme proposed in this paper leads to stronger statistical assertions without sacrificing too much of the intuitive appeal of Kyburg's latest proposal.\nIn this paper we study the uses and the semantics of non-monotonic negation in probabilistic deductive data bases. Based on the stable semantics for classical logic programming, we introduce the notion of stable formula functions. We show that stable formula functions are minimal fixpoints of operators associated with probabilistic deductive databases with negation. Furthermore, since a probabilistic deductive database may not necessarily have a stable formula function, we provide a stable class semantics for such databases. Finally, we demonstrate that the proposed semantics can handle default reasoning naturally in the context of probabilistic deduction.\nThe EM-algorithm is a general procedure to get maximum likelihood estimates if part of the observations on the variables of a network are missing. In this paper a stochastic version of the algorithm is adapted to probabilistic neural networks describing the associative dependency of variables. These networks have a probability distribution, which is a special case of the distribution generated by probabilistic inference networks. Hence both types of networks can be combined, allowing one to integrate probabilistic rules as well as unspecified associations in a sound way. The resulting network may have a number of interesting features including cycles of probabilistic rules, hidden 'unobservable' variables, and uncertain and contradictory evidence.\nThis paper presents a simple framework for Horn clause abduction, with probabilities associated with hypotheses. It is shown how this representation can represent any probabilistic knowledge representable in a Bayesian belief network. 
The main contributions are in finding a relationship between logical and probabilistic notions of evidential reasoning. This can be used as a basis for a new way to implement Bayesian Networks that allows for approximations to the value of the posterior probabilities, and also points to a way that Bayesian networks can be extended beyond a propositional language.\nWe present PULCinella and its use in comparing uncertainty theories. PULCinella is a general tool for Propagating Uncertainty based on the Local Computation technique of Shafer and Shenoy. It may be specialized to different uncertainty theories: at the moment, Pulcinella can propagate probabilities, belief functions, Boolean values, and possibilities. Moreover, Pulcinella allows the user to easily define his own specializations. To illustrate Pulcinella, we analyze two examples by using each of the four theories above. In the first one, we mainly focus on intrinsic differences between theories. In the second one, we take a knowledge engineer viewpoint, and check the adequacy of each theory to a given problem.\nIn general, the best explanation for a given observation makes no promises on how good it is with respect to other alternative explanations. A major deficiency of message-passing schemes for belief revision in Bayesian networks is their inability to generate alternatives beyond the second best. In this paper, we present a general approach based on linear constraint systems that naturally generates alternative explanations in an orderly and highly efficient manner. This approach is then applied to cost-based abduction problems as well as belief revision in Bayesian networks.\nSurvey of several forms of updating, with a practical illustrative example. We study several updating (conditioning) schemes that emerge naturally from a common scenario to provide some insights into their meaning. Updating is a subtle operation and there is no single method, no single 'good' rule. 
The choice of the appropriate rule must always be given due consideration. Planchet (1989) presents a mathematical survey of many rules. We focus on the practical meaning of these rules. After summarizing the several rules for conditioning, we present an illustrative example in which the various forms of conditioning can be explained.\nIn probabilistic logic entailments, even moderate size problems can yield linear constraint systems with so many variables that exact methods are impractical. This difficulty can be remedied in many cases of interest by introducing a three-valued logic (true, false, and \"don't care\"). The three-valued approach allows the construction of \"compressed\" constraint systems which have the same solution sets as their two-valued counterparts, but which may involve dramatically fewer variables. Techniques to calculate point estimates for the posterior probabilities of entailed sentences are discussed.\nThe presence of latent variables can greatly complicate inferences about causal relations between measured variables from statistical data. In many cases, the presence of latent variables makes it impossible to determine for two measured variables A and B, whether A causes B, B causes A, or there is some common cause. In this paper I present several theorems that state conditions under which it is possible to reliably infer the causal relation between two measured variables, regardless of whether latent variables are acting or not.\nThe local computation technique (Shafer et al. 1987, Shafer and Shenoy 1988, Shenoy and Shafer 1986) is used for propagating belief functions in a so-called Markov tree. In this paper, we describe an efficient implementation of belief function propagation on the basis of the local computation technique. 
The presented method avoids all the redundant computations in the propagation process, and so makes the computational complexity decrease with respect to other existing implementations (Hsia and Shenoy 1989, Zarley et al. 1988). We also give a combined algorithm for both propagation and re-propagation which makes the re-propagation process more efficient when one or more of the prior belief functions is changed.\nSurely we want solid foundations. What kind of castle can we build on sand? What is the point of devoting effort to balconies and minarets, if the foundation may be so weak as to allow the structure to collapse of its own weight? We want our foundations set on bedrock, designed to last for generations. Who would want an architect who cannot certify the soundness of the foundations of his buildings?\nIn this work we are interested in the problem of energy management in Mobile Ad-hoc Network (MANET). The solving and optimization of MANET allow assisting the users to efficiently use their devices in order to minimize the batteries power consumption. In this framework, we propose a modelling of the MANET in form of a Constraint Optimization Problem called COMANET. Then, in the objective to minimize the consumption of batteries power, we present an approach based on an adaptation of the A star algorithm to the MANET problem called MANED. Finally, we expose some experimental results showing utility of this approach.\nIn order to represent the preferences of a group of individuals, we introduce Probabilistic CP-nets (PCP-nets). PCP-nets provide a compact language for representing probability distributions over preference orderings. We argue that they are useful for aggregating preferences or modelling noisy preferences. Then we give efficient algorithms for the main reasoning problems, namely for computing the probability that a given outcome is preferred to another one, and the probability that a given outcome is optimal. 
As a by-product, we obtain an unexpected linear-time algorithm for checking dominance in a standard, tree-structured CP-net.\nWe consider the problem of learning Bayesian networks (BNs) from complete discrete data. This problem of discrete optimisation is formulated as an integer program (IP). We describe the various steps we have taken to allow efficient solving of this IP. These are (i) efficient search for cutting planes, (ii) a fast greedy algorithm to find high-scoring (perhaps not optimal) BNs and (iii) tightening the linear relaxation of the IP. After relating this BN learning problem to set covering and the multidimensional 0-1 knapsack problem, we present our empirical results. These show improvements, sometimes dramatic, over earlier results.\nThe PC algorithm learns maximally oriented causal Bayesian networks. However, there is no equivalent complete algorithm for learning the structure of relational models, a more expressive generalization of Bayesian networks. Recent developments in the theory and representation of relational models support lifted reasoning about conditional independence. This enables a powerful constraint for orienting bivariate dependencies and forms the basis of a new algorithm for learning structure. We present the relational causal discovery (RCD) algorithm that learns causal relational models. We prove that RCD is sound and complete, and we present empirical results that demonstrate effectiveness.\nWe propose a kernel method to identify finite mixtures of nonparametric product distributions. It is based on a Hilbert space embedding of the joint distribution. The rank of the constructed tensor is equal to the number of mixture components. We present an algorithm to recover the components by partitioning the data points into clusters such that the variables are jointly conditionally independent given the cluster. 
This method can be used to identify finite confounders.\nThis paper proposes an approach for the adaptation of spatial or temporal cases in a case-based reasoning system. Qualitative algebras are used as spatial and temporal knowledge representation languages. The intuition behind this adaptation approach is to apply a substitution and then repair potential inconsistencies, thanks to belief revision on qualitative algebras. A temporal example from the cooking domain is given. (The paper on which this extended abstract is based was the recipient of the best paper award of the 2012 International Conference on Case-Based Reasoning.)\nIn Artificial Intelligence, planning refers to an area of research that proposes to develop systems that can automatically generate a result set, in the form of an integrated decision-making system through a formal procedure, known as plan. Instead of resorting to the scheduling algorithms to generate plans, it is proposed to operate the automatic learning by decision tree to optimize time. In this paper, we propose to build a classification model by induction graph from a learning sample containing plans that have an associated set of descriptors whose values change depending on each plan. This model will then operate for classifying new cases by assigning the appropriate plan.\nArtificial Intelligence - what is this? That is the question! In earlier papers we already gave a formal definition for AI, but if one desires to build an actual AI implementation, the following issues require attention and are treated here: the data format to be used, the idea of Undef and Nothing symbols, various ways for defining the \"meaning of life\", and finally, a new notion of \"incorrect move\". 
These questions are of minor importance in the theoretical discussion, but we already know the answer to the question \"Does AI exist?\" Now we want to take the next step and create this program.\nWe develop a technique for deriving data-dependent error bounds for transductive learning algorithms based on transductive Rademacher complexity. Our technique is based on a novel general error bound for transduction in terms of transductive Rademacher complexity, together with a novel bounding technique for Rademacher averages for particular algorithms, in terms of their \"unlabeled-labeled\" representation. This technique is relevant to many advanced graph-based transductive algorithms and we demonstrate its effectiveness by deriving error bounds for three well-known algorithms. Finally, we present a new PAC-Bayesian bound for mixtures of transductive algorithms based on our Rademacher bounds.\nThis paper presents several new tractability results for planning based on macros. We describe an algorithm that optimally solves planning problems in a class that we call inverted tree reducible, and is provably tractable for several subclasses of this class. By using macros to store partial plans that recur frequently in the solution, the algorithm is polynomial in time and space even for exponentially long plans. We generalize the inverted tree reducible class in several ways and describe modifications of the algorithm to deal with these new classes. Theoretical results are validated in experiments.\nBinary Decision Diagram (BDD) based set bounds propagation is a powerful approach to solving set-constraint satisfaction problems. However, prior BDD based techniques incur the significant overhead of constructing and manipulating graphs during search. We present a set-constraint solver which combines BDD-based set-bounds propagators with the learning abilities of a modern SAT solver. 
Together with a number of improvements beyond the basic algorithm, this solver is highly competitive with existing propagation based set constraint solvers.\nTranslations between different nonmonotonic formalisms always have been an important topic in the field, in particular to understand the knowledge-representation capabilities those formalisms offer. We provide such an investigation in terms of different semantics proposed for abstract argumentation frameworks, a nonmonotonic yet simple formalism which received increasing interest within the last decade. Although the properties of these different semantics are nowadays well understood, there are no explicit results about intertranslatability. We provide such translations wrt. different properties and also give a few novel complexity results which underlie some negative results.\nWe describe Dr.Fill, a program that solves American-style crossword puzzles. From a technical perspective, Dr.Fill works by converting crosswords to weighted CSPs, and then using a variety of novel techniques to find a solution. These techniques include generally applicable heuristics for variable and value selection, a variant of limited discrepancy search, and postprocessing and partitioning ideas. Branch and bound is not used, as it was incompatible with postprocessing and was determined experimentally to be of little practical value. Dr.Fill's performance on crosswords from the American Crossword Puzzle Tournament suggests that it ranks among the top fifty or so crossword solvers in the world.\nWe investigate a class of first-order temporal-epistemic logics for reasoning about multi-agent systems. We encode typical properties of systems including perfect recall, synchronicity, no learning, and having a unique initial state in terms of variants of quantified interpreted systems, a first-order extension of interpreted systems. 
We identify several monodic fragments of first-order temporal-epistemic logic and show their completeness with respect to their corresponding classes of quantified interpreted systems.\nA model of story generation recently proposed by Riedl and Young casts it as planning, with the additional condition that story characters behave intentionally. This means that characters have perceivable motivation for the actions they take. I show that this condition can be compiled away (in more ways than one) to produce a classical planning problem that can be solved by an off-the-shelf classical planner, more efficiently than by Riedl and Young's specialised planner.\nThis paper introduces the concept of rational counterfactuals, which is the idea of identifying a counterfactual from the factual (whether perceived or real) that maximizes the attainment of the desired consequent. In counterfactual thinking, if we have a factual statement like \"Saddam Hussein invaded Kuwait and consequently George Bush declared war on Iraq\", then its counterfactual is: \"If Saddam Hussein did not invade Kuwait then George Bush would not have declared war on Iraq.\" The theory of rational counterfactuals is applied to identify the antecedent that gives the desired consequent necessary for rational decision making. The rational counterfactual theory is applied to identify the values of variables Allies, Contingency, Distance, Major Power, Capability, Democracy, as well as Economic Interdependency that give the desired consequent Peace.\nI am Joachim Jansen and this is my research summary, part of my application to the Doctoral Consortium at ICLP'14. I am a PhD student in the Knowledge Representation and Reasoning (KRR) research group, a subgroup of the Declarative Languages and Artificial Intelligence (DTAI) group at the department of Computer Science at KU Leuven. I started my PhD in September 2012. My promotor is prof. dr. ir. Gerda Janssens and my co-promotor is prof. dr. Marc Denecker. 
I can be contacted at joachim.jansen@cs.kuleuven.be or at: Room 01.167 Celestijnenlaan 200A 3001 Heverlee, Belgium. An extended abstract / full version of a paper accepted to be presented at the Doctoral Consortium of the 30th International Conference on Logic Programming (ICLP 2014), July 19-22, Vienna, Austria.\nThe psychological state of flow has been linked to optimizing human performance. A key condition of flow emergence is a match between the human abilities and complexity of the task. We propose a simple computational model of flow for Artificial Intelligence (AI) agents. The model factors the standard agent-environment state into a self-reflective set of the agent's abilities and a socially learned set of the environmental complexity. Maximizing the flow serves as a meta control for the agent. We show how to apply the meta-control policy to a broad class of AI control policies and illustrate our approach with a specific implementation. Results in a synthetic testbed are promising and open interesting directions for future work.\nWe introduce a logic for reasoning about evidence, that essentially views evidence as a function from prior beliefs (before making an observation) to posterior beliefs (after making the observation). We provide a sound and complete axiomatization for the logic, and consider the complexity of the decision problem. Although the reasoning in the logic is mainly propositional, we allow variables representing numbers and quantification over them. This expressive power seems necessary to capture important properties of evidence.\nWe present a propositional logic to reason about the uncertainty of events, where the uncertainty is modeled by a set of probability measures assigning an interval of probability to each event. 
We give a sound and complete axiomatization for the logic, and show that the satisfiability problem is NP-complete, no harder than satisfiability for propositional logic.\nWe present a heuristic search algorithm for solving first-order MDPs (FOMDPs). Our approach combines first-order state abstraction that avoids evaluating states individually, and heuristic search that avoids evaluating all states. Firstly, we apply state abstraction directly on the FOMDP, avoiding propositionalization. This kind of abstraction is referred to as first-order state abstraction. Secondly, guided by an admissible heuristic, the search is restricted to those states that are reachable from the initial state. We demonstrate the usefulness of the above techniques for solving FOMDPs on a system, referred to as FCPlanner, that entered the probabilistic track of the International Planning Competition (IPC'2004).\nWe present a novel approach to detecting and utilizing symmetries in probabilistic graphical models with two main contributions. First, we present a scalable approach to computing generating sets of permutation groups representing the symmetries of graphical models. Second, we introduce orbital Markov chains, a novel family of Markov chains leveraging model symmetries to reduce mixing times. We establish an insightful connection between model symmetries and rapid mixing of orbital Markov chains. Thus, we present the first lifted MCMC algorithm for probabilistic graphical models. Both analytical and empirical results demonstrate the effectiveness and efficiency of the approach.\nTwo interpretations of syllogistic statements are described in this paper. One is the so-called set-based interpretation, which assumes that quantified statements and syllogisms talk about quantity-relationships between sets.
The other one, the so-called conditional interpretation, assumes that quantified propositions talk about conditional propositions and how strong the links between the antecedent and the consequent are. Both interpretations are compared with respect to three different questions (existential import, singular statements and non-proportional quantifiers) from the point of view of their impact on the further development of this type of reasoning.\nQsmodels is a novel application of Answer Set Programming to an interactive gaming environment. We describe a software architecture by which the behavior of a bot acting inside the Quake 3 Arena can be controlled by a planner. The planner is written as an Answer Set Program and is interpreted by the Smodels solver.\nHow could we solve the machine learning and the artificial intelligence problem if we had infinite computation? Solomonoff induction and the reinforcement learning agent AIXI are proposed answers to this question. Both are known to be incomputable. In this paper, we quantify this using the arithmetical hierarchy, and prove upper and corresponding lower bounds for incomputability. We show that AIXI is not limit computable, and thus it cannot be approximated using finite computation. Our main result is a limit-computable {\\epsilon}-optimal version of AIXI with infinite horizon that maximizes expected rewards.\nSequential allocation is a simple and attractive mechanism for the allocation of indivisible goods. Agents take turns, according to a policy, to pick items. Sequential allocation is guaranteed to return an allocation which is efficient but may not have an optimal social welfare. We therefore consider the relation between welfare and efficiency. We study the (computational) questions of what welfare is possible or necessary depending on the choice of policy.
We also consider a novel control problem in which the chair chooses a policy to improve social welfare.\nWithin the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods to attain near optimality after a relatively long period of learning and exploration. However, practical concerns require the attainment of satisfactory behavior within a short period of time. In this paper, we relax the PAC-MDP conditions to reconcile theoretically driven exploration methods and practical needs. We propose simple algorithms for discrete and continuous state spaces, and illustrate the benefits of our proposed relaxation via theoretical analyses and numerical examples. Our algorithms also maintain anytime error bounds and average loss bounds. Our approach accommodates both Bayesian and non-Bayesian methods.\nSocial norms are a powerful formalism for coordinating autonomous agents' behaviour to achieve certain objectives. In this paper, we propose a dynamic normative system to enable reasoning about changes of norms under different circumstances, which cannot be done in the existing static normative systems. We study two important problems (norm synthesis and norm recognition) related to the autonomy of the entire system and the agents, and characterise the computational complexities of solving these problems.\nPersonalized recommendations for new users, also known as the cold-start problem, can be formulated as a contextual bandit problem. Existing contextual bandit algorithms generally rely on features alone to capture user variability. Such methods are inefficient in learning new users' interests. In this paper we propose Latent Contextual Bandits. We consider both the benefit of leveraging a set of learned latent user classes for new users, and how we can learn such latent classes from prior users. We show that our approach achieves a better regret bound than existing algorithms.
We also demonstrate the benefit of our approach using a large real-world dataset and a preliminary user study.\nThis paper presents a procedural generation method that creates visually attractive levels for the Angry Birds game. Besides being an immensely popular mobile game, Angry Birds has recently become a test bed for various artificial intelligence technologies. We propose a new approach for procedurally generating Angry Birds levels using Chinese style and Japanese style building structures. A conducted experiment confirms the effectiveness of our approach with statistical significance.\nLogic-based abduction finds important applications in artificial intelligence and related areas. One application example is in finding explanations for observed phenomena. Propositional abduction is a restriction of abduction to the propositional domain, and complexity-wise is in the second level of the polynomial hierarchy. Recent work has shown that exploiting implicit hitting sets and propositional satisfiability (SAT) solvers provides an efficient approach for propositional abduction. This paper investigates this earlier work and proposes a number of algorithmic improvements. These improvements are shown to yield exponential reductions in the number of SAT solver calls. More importantly, the experimental results show significant performance improvements compared to the best approaches for propositional abduction.\n\"Natural Language,\" whether spoken and attended to by humans, or processed and generated by computers, requires networked structures that reflect creative processes in semantic, syntactic, phonetic, linguistic, social, emotional, and cultural modules. Being able to produce novel and useful behavior following repeated practice gets to the root of both artificial intelligence and human language.
This paper investigates the modalities involved in language-like applications that computers -- and programmers -- engage with, and aims to fine tune the questions we ask to better account for context, self-awareness, and embodiment.\nWe investigate learning heuristics for domain-specific planning. Prior work framed learning a heuristic as an ordinary regression problem. However, in a greedy best-first search, the ordering of states induced by a heuristic is more indicative of the resulting planner's performance than mean squared error. Thus, we instead frame learning a heuristic as a learning to rank problem which we solve using a RankSVM formulation. Additionally, we introduce new methods for computing features that capture temporal interactions in an approximate plan. Our experiments on recent International Planning Competition problems show that the RankSVM learned heuristics outperform both the original heuristics and heuristics learned through ordinary regression.\nWe argue that there already exists de facto artificial intelligence policy - a patchwork of policies impacting the field of AI's development in myriad ways. The key question related to AI policy, then, is not whether AI should be governed at all, but how it is currently being governed, and how that governance might become more informed, integrated, effective, and anticipatory. We describe the main components of de facto AI policy and make some recommendations for how AI policy can be improved, drawing on lessons from other scientific and technological domains.\nThis article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach. 
The article presents algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguity, along with experimental results demonstrating their effectiveness.\nIntractable distributions present a common difficulty in inference within the probabilistic knowledge representation framework and variational methods have recently been popular in providing an approximate solution. In this article, we describe a perturbational approach in the form of a cumulant expansion which, to lowest order, recovers the standard Kullback-Leibler variational bound. Higher-order terms describe corrections on the variational approach without incurring much further computational cost. The relationship to other perturbational approaches such as TAP is also elucidated. We demonstrate the method on a particular class of undirected graphical models, Boltzmann machines, for which our simulation results confirm improved accuracy and enhanced stability during learning.\nA previously developed quantum search algorithm for solving 1-SAT problems in a single step is generalized to apply to a range of highly constrained k-SAT problems. We identify a bound on the number of clauses in satisfiability problems for which the generalized algorithm can find a solution in a constant number of steps as the number of variables increases. This performance contrasts with the linear growth in the number of steps required by the best classical algorithms, and the exponential number required by classical and quantum methods that ignore the problem structure. In some cases, the algorithm can also guarantee that insoluble problems in fact have no solutions, unlike previously proposed quantum search algorithms.\nWe study properties of programs with monotone and convex constraints. We extend to these formalisms concepts and results from normal logic programming. 
They include the notions of strong and uniform equivalence with their characterizations, tight programs and Fages Lemma, program completion and loop formulas. Our results provide an abstract account of properties of some recent extensions of logic programming with aggregates, especially the formalism of lparse programs. They imply a method to compute stable models of lparse programs by means of off-the-shelf solvers of pseudo-boolean constraints, which is often much faster than the smodels system.\nEfficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solutions. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems.\nIn this article, we work towards the goal of developing agents that can learn to act in complex worlds. We develop a probabilistic, relational planning rule representation that compactly models noisy, nondeterministic action effects, and show how such rules can be effectively learned. 
Through experiments in simple planning domains and a 3D simulated blocks world with realistic physics, we demonstrate that this learning algorithm allows agents to effectively model world dynamics.\nIn this paper, we construct and investigate a hierarchy of spatio-temporal formalisms that result from various combinations of propositional spatial and temporal logics such as the propositional temporal logic PTL, the spatial logics RCC-8, BRCC-8, S4u and their fragments. The obtained results give a clear picture of the trade-off between expressiveness and computational realisability within the hierarchy. We demonstrate how different combining principles as well as spatial and temporal primitives can produce NP-, PSPACE-, EXPSPACE-, 2EXPSPACE-complete, and even undecidable spatio-temporal logics out of components that are at most NP- or PSPACE-complete.\nIn this commentary I argue that although PDDL is a very useful standard for the planning competition, its design does not properly consider the issue of domain modeling. Hence, I would not advocate its use in specifying planning domains outside of the context of the planning competition. Rather, the field needs to explore different approaches and grapple more directly with the problem of effectively modeling and utilizing all of the diverse pieces of knowledge we typically have about planning domains.\nWe study auctions with severe bounds on the communication allowed: each bidder may only transmit t bits of information to the auctioneer. We consider both welfare- and profit-maximizing auctions under this communication restriction. For both measures, we determine the optimal auction and show that the loss incurred relative to unconstrained auctions is mild. 
We prove non-surprising properties of these kinds of auctions, e.g., that in optimal mechanisms bidders simply report the interval in which their valuation lies, as well as some surprising properties, e.g., that asymmetric auctions are better than symmetric ones and that multi-round auctions reduce the communication complexity only by a linear factor.\nThis paper describes Marvin, a planner that competed in the Fourth International Planning Competition (IPC 4). Marvin uses action-sequence-memoisation techniques to generate macro-actions, which are then used during search for a solution plan. We provide an overview of its architecture and search behaviour, detailing the algorithms used. We also empirically demonstrate the effectiveness of its features in various planning domains; in particular, the effects on performance due to the use of macro-actions, the novel features of its search behaviour, and the native support of ADL and Derived Predicates.\nIn this paper we apply a computer-aided theorem discovery technique to discover theorems about strongly equivalent logic programs under the answer set semantics. Our discovered theorems capture new classes of strongly equivalent logic programs that can lead to new program simplification rules that preserve strong equivalence. Specifically, with the help of computers, we discovered exact conditions that capture the strong equivalence between a rule and the empty set, between two rules, between two rules and one of the two rules, between two rules and another rule, and between three rules and two of the three rules.\nThe QXORSAT problem is the quantified version of the satisfiability problem XORSAT in which the connective exclusive-or is used instead of the usual or. We study the phase transition associated with random QXORSAT instances.
We give a description of this phase transition in the case of one alternation of quantifiers, thus performing an advanced practical and theoretical study on the phase transition of a quantified problem.\nFunctional dependencies restrict the potential interactions among variables connected in a probabilistic network. This restriction can be exploited in qualitative probabilistic reasoning by introducing deterministic variables and modifying the inference rules to produce stronger conclusions in the presence of functional relations. I describe how to accomplish these modifications in qualitative probabilistic networks by exhibiting the update procedures for graphical transformations involving probabilistic and deterministic variables and combinations. A simple example demonstrates that the augmented scheme can reduce qualitative ambiguity that would arise without the special treatment of functional dependency. Analysis of qualitative synergy reveals that new higher-order relations are required to reason effectively about synergistic interactions among deterministic variables.\nAn experiment replicated and extended recent findings on psychologically realistic ways of modeling propagation of uncertainty in rule-based reasoning. Within a single production rule, the antecedent evidence can be summarized by taking the maximum of disjunctively connected antecedents and the minimum of conjunctively connected antecedents. The maximum certainty factor attached to each of the rule's conclusions can be scaled down by multiplication with this summarized antecedent certainty. Heckerman's modified certainty factor technique can be used to combine certainties for common conclusions across production rules.\nIn this paper, we extend the QMRDT probabilistic model for the domain of internal medicine to include decisions about treatments. In addition, we describe how we can use the comprehensive decision model to construct a simpler decision model for a specific patient.
In so doing, we transform the task of problem formulation to that of narrowing a larger problem.\nWe describe a method for incrementally constructing belief networks. We have developed a network-construction language similar to a forward-chaining language using data dependencies, but with additional features for specifying distributions. Using this language, we can define parameterized classes of probabilistic models. These parameterized models make it possible to apply probabilistic reasoning to problems for which it is impractical to have a single large static model.\nWe describe an environment that considerably simplifies the process of generating Bayesian belief networks. The system has been implemented on readily available, inexpensive hardware, and provides clarity and high performance. We present an introduction to Bayesian belief networks, discuss algorithms for inference with these networks, and delineate the classes of problems that can be solved with this paradigm. We then describe the hardware and software that constitute the system, and illustrate Ergo's use with several examples.\nIn this paper we present a framework for dynamically constructing Bayesian networks. We introduce the notion of a background knowledge base of schemata, which is a collection of parameterized conditional probability statements. These schemata explicitly separate the general knowledge of properties an individual may have from the specific knowledge of particular individuals that may have these properties. Knowledge of individuals can be combined with this background knowledge to create Bayesian networks, which can then be used in any propagation scheme.
We discuss the theory and assumptions necessary for the implementation of dynamic Bayesian networks, and indicate where our approach may be useful.\nA series of Monte Carlo studies were performed to assess the extent to which different inference procedures robustly output reasonable belief values in the context of increasing levels of judgmental imprecision. It was found that, when compared to an equal-weights linear model, the Bayesian procedures are more likely to deduce strong support for a hypothesis. But the Bayesian procedures are also more likely to strongly support the wrong hypothesis. Bayesian techniques are more powerful, but are also more error-prone.\nThis paper describes a generalization of previous methods for constructing tree-structured belief networks with hidden variables. The major new feature of the described method is the ability to produce a tree decomposition even when there are errors in the correlation data among the input variables. This is an important extension of existing methods since the correlational coefficients usually cannot be measured with precision. The technique involves using a greedy search algorithm that locally minimizes an error function.\nIDEAL (Influence Diagram Evaluation and Analysis in Lisp) is a software environment for creation and evaluation of belief networks and influence diagrams. IDEAL is primarily a research tool and provides an implementation of many of the latest developments in belief network and influence diagram evaluation in a unified framework. This paper describes IDEAL and some lessons learned during its development.\nIn this paper, optimum decomposition of belief networks is discussed. Some methods of decomposition are examined and a new method, the method of Minimum Total Number of States (MTNS), is proposed. The problem of optimum belief network decomposition under our framework, as under all the other frameworks, is shown to be NP-hard.
Based on the computational complexity analysis, an algorithm for belief network decomposition based on simulated annealing is proposed in (Wee, 1990a).\nWe introduce a new heuristic algorithm for the problem of finding minimum-size loop cutsets in multiply connected belief networks. We compare this algorithm to that proposed in [Suermondt and Cooper, 1988]. We provide lower bounds on the performance of these algorithms with respect to one another and with respect to optimal. We demonstrate that no heuristic algorithm for this problem can be guaranteed to produce loop cutsets within a constant difference from optimal. We discuss experimental results based on randomly generated networks, and discuss future work and open questions.\nCutset conditioning and clique-tree propagation are two popular methods for performing exact probabilistic inference in Bayesian belief networks. Cutset conditioning is based on decomposition of a subset of network nodes, whereas clique-tree propagation depends on aggregation of nodes. We describe a means to combine cutset conditioning and clique-tree propagation in an approach called aggregation after decomposition (AD). We discuss the application of the AD method in the Pathfinder system, a medical expert system that offers assistance with diagnosis in hematopathology.\nIn this paper, we suggest marrying Dempster-Shafer (DS) theory with Knowledge Representation (KR). Born out of this marriage is the definition of \"Dempster-Shafer Belief Bases\", abstract data types representing uncertain knowledge that use DS theory for representing strength of belief about our knowledge, and the linguistic structures of an arbitrary KR system for representing the knowledge itself. A formal result guarantees that both the properties of the given KR system and of DS theory are preserved. The general model is exemplified by defining DS Belief Bases where First Order Logic and (an extension of) KRYPTON are used as KR systems.
The implementation problem is also touched upon.\nWe point out the need to use probability amplitudes rather than probabilities to model evidence accumulation in decision processes involving real physical sensors. Optical information processing systems are given as typical examples of systems that naturally gather evidence in this manner. We derive a new, amplitude-based generalization of the Hough transform technique used for object recognition in machine vision. We argue that one should use complex Hough accumulators and square their magnitudes to get a proper probabilistic interpretation of the likelihood that an object is present. Finally, we suggest that probability amplitudes may have natural applications in connectionist models, as well as in formulating knowledge-based reasoning problems.\nA framework is presented for a computational theory of probabilistic argument. The Probabilistic Reasoning Environment encodes knowledge at three levels. At the deepest level are a set of schemata encoding the system's domain knowledge. This knowledge is used to build a set of second-level arguments, which are structured for efficient recapture of the knowledge used to construct them. Finally, at the top level is a Bayesian network constructed from the arguments. The system is designed to facilitate not just propagation of beliefs and assimilation of evidence, but also the dynamic process of constructing a belief network, evaluating its adequacy, and revising it when necessary.\nProbability estimation by maximum entropy reconstruction of an initial relative frequency estimate from its projection onto a hypergraph model of the approximate conditional independence relations exhibited by it is investigated. 
The results of this study suggest that use of this estimation technique may improve the quality of decisions that must be made on the basis of limited observations over a decomposable finite product space.\nThis paper describes a natural framework for rules, based on belief functions, which includes a representation of numerical rules, default rules and rules allowing and rules not allowing contraposition. In particular it justifies the use of the Dempster-Shafer Theory for representing a particular class of rules, with belief calculated as a lower probability given certain independence assumptions on an underlying space. It shows how a belief function framework can be generalised to other logics, including a general Monte-Carlo algorithm for calculating belief, and how a version of Reiter's Default Logic can be seen as a limiting case of a belief function formalism.\nMany AI researchers argue that probability theory is only capable of dealing with uncertainty in situations where a full specification of a joint probability distribution is available, and conclude that it is not suitable for application in knowledge-based systems. Probability intervals, however, constitute a means for expressing incompleteness of information. We present a method for computing such probability intervals for probabilities of interest from a partial specification of a joint probability distribution. Our method improves on earlier approaches by allowing for independence relationships between statistical variables to be exploited.\nWe analyzed the convergence properties of likelihood-weighting algorithms on a two-level, multiply connected, belief-network representation of the QMR knowledge base of internal medicine.
Specifically, on two difficult diagnostic cases, we examined the effects of Markov blanket scoring, importance sampling, demonstrating that the Markov blanket scoring and self-importance sampling significantly improve the convergence of the simulation on our model.\nA scientific reasoning system makes decisions using objective evidence in the form of independent experimental trials, propositional axioms, and constraints on the probabilities of events. As a first step towards this goal, we propose a system that derives probability intervals from objective evidence in those forms. Our reasoning system can manage uncertainty about data and rules in a rule based expert system. We expect that our system will be particularly applicable to diagnosis and analysis in domains with a wealth of experimental evidence such as medicine. We discuss limitations of this solution and propose future directions for this research. This work can be considered a generalization of Nilsson's \"probabilistic logic\" [Nil86] to intervals and experimental observations.\nPlan recognition does not work the same way in stories and in \"real life\" (people tend to jump to conclusions more in stories). We present a theory of this, for the particular case of how objects in stories (or in life) influence plan recognition decisions. We provide a Bayesian network formalization of a simple first-order theory of plans, and show how a particular network parameter seems to govern the difference between \"life-like\" and \"story-like\" response. We then show why this parameter would be influenced (in the desired way) by a model of speaker (or author) topic selection which assumes that facts in stories are typically \"relevant\".\nUnaided human decision making appears to systematically violate consistency constraints imposed by normative theories; these biases in turn appear to justify the application of formal decision-analytic models. It is argued that both claims are wrong. 
In particular, we will argue that the \"confirmation bias\" is premised on an overly narrow view of how conflicting evidence is and ought to be handled. Effective decision aiding should focus on supporting the control processes by means of which knowledge is extended into novel situations and in which assumptions are adopted, utilized, and revised. The Non-Monotonic Probabilist represents initial work toward such an aid.\nWe propose a norm of consistency for a mixed set of defeasible and strict sentences, based on a probabilistic semantics. This norm establishes a clear distinction between knowledge bases depicting exceptions and those containing outright contradictions. We then define a notion of entailment based also on probabilistic considerations and provide a characterization of the relation between consistency and entailment. We derive necessary and sufficient conditions for consistency, and provide a simple decision procedure for testing consistency and deciding whether a sentence is entailed by a database. Finally, it is shown that if all sentences are Horn clauses, consistency and entailment can be tested in polynomial time.\nBPS, the Bayesian Problem Solver, applies probabilistic inference and decision-theoretic control to flexible, resource-constrained problem-solving. This paper focuses on the Bayesian inference mechanism in BPS, and contrasts it with those of traditional heuristic search techniques. By performing sound inference, BPS can outperform traditional techniques with significantly less computational effort.
Empirical tests on the Eight Puzzle show that after only a few hundred node expansions, BPS makes better decisions than does the best existing algorithm after several million node expansions.\nIt is suggested that an AI inference system should reflect an inference policy that is tailored to the domain of problems to which it is applied -- and furthermore that an inference policy need not conform to any general theory of rational inference or induction. We note, for instance, that Bayesian reasoning about the probabilistic characteristics of an inference domain may result in the specification of a non-Bayesian procedure for reasoning within the inference domain. In this paper, the idea of an inference policy is explored in some detail. To support this exploration, the characteristics of some standard and nonstandard inference policies are examined.\nBayesian Belief Networks have been largely overlooked by Expert Systems practitioners on the grounds that they do not correspond to the human inference mechanism. In this paper, we introduce an explanation mechanism designed to generate intuitive yet probabilistically sound explanations of inferences drawn by a Bayesian Belief Network. In particular, our mechanism accounts for the results obtained due to changes in the causal and the evidential support of a node.\nThe arc reversal/node reduction approach to probabilistic inference is extended to include the case of instantiated evidence by an operation called \"evidence reversal.\" This not only provides a technique for computing posterior joint distributions on general belief networks, but also provides insight into the methods of Pearl [1986b] and Lauritzen and Spiegelhalter [1988].
Although it is well understood that the latter two algorithms are closely related, in fact all three algorithms are identical whenever the belief network is a forest.\nThis paper discusses a new measure that is adaptable to certain intervalic probability frameworks, possibility theory, and belief theory. As such, it has the potential for wide use in knowledge engineering, expert systems, and related problems in the human sciences. This measure (denoted here by F) has been introduced in Smithson (1988) and is more formally discussed in Smithson (1989a). Here, I propose to outline the conceptual basis for F and compare its properties with other measures of second-order uncertainty. I will argue that F is an indicator of nonspecificity or, alternatively, of freedom, as distinguished from either ambiguity or vagueness.\nWe present a new, deterministic, distributed MAP estimation algorithm for Markov Random Fields called Local Highest Confidence First (Local HCF). The algorithm has been applied to segmentation problems in computer vision and its performance compared with stochastic algorithms. The experiments show that Local HCF finds better estimates than stochastic algorithms with much less computation.\nIn this paper, the feasibility of using finite totally ordered probability models under Aleliunas's Theory of Probabilistic Logic [Aleliunas, 1988] is investigated. The general form of the probability algebra of these models is derived and the number of possible algebras with given size is deduced. Based on this analysis, we discuss problems of denominator-indifference and ambiguity-generation that arise in reasoning by cases and abductive reasoning. An example is given that illustrates how these problems arise. The investigation shows that a finite probability model may be of very limited usage.\nBy probabilistic logic I mean a normative theory of belief that explains how a body of evidence affects one's degree of belief in a possible hypothesis. 
A new axiomatization of such a theory is presented which avoids a finite additivity axiom, yet which retains many useful inference rules. Many of the examples of this theory (its models) do not use numerical probabilities. Put another way, this article gives sharper answers to the two questions: 1. What kinds of sets can be used as the range of a probability function? 2. Under what conditions is the range set of a probability function isomorphic to the set of real numbers in the interval [0,1] with the usual arithmetical operations?\nComputational mechanisms for uncertainty management must support interactive and incremental problem formulation, inference, hypothesis testing, and decision making. However, most current uncertainty inference systems concentrate primarily on inference, and provide no support for the larger issues. We present a computational approach to uncertainty management which provides direct support for the dynamic, incremental aspect of this task, while at the same time permitting direct representation of the structure of evidential relationships. At the same time, we show that this approach responds to the modularity concerns of Heckerman and Horvitz [Heck87]. This paper emphasizes examples of the capabilities of this approach. Another paper [D'Am89] details the representations and algorithms involved.\nThere is uncertainty associated with the occurrence of many events in real life. In this paper we develop a temporal logic to deal with such uncertain events and outline a possible implementation in an extension of PROLOG. Events are represented as fuzzy sets with the membership function giving the possibility of occurrence of the event in a given interval of time. The developed temporal logic is simple but powerful. It can determine effectively the various temporal relations between uncertain events or their combinations. 
PROLOG provides a uniform substrate on which to effectively implement such a temporal logic for uncertain events.\nThis paper argues for a modal view of probability. The syntax and semantics of one particularly strong probability logic are discussed and some examples of the use of the logic are provided. We show that it is both natural and useful to think of probability as a modal operator. Contrary to popular belief in AI, a probability ranging between 0 and 1 represents a continuum between impossibility and necessity, not between simple falsity and truth. The present work provides a clear semantics for quantification into the scope of the probability operator and for higher-order probabilities. Probability logic is a language for expressing both probabilistic and logical concepts.\nThis paper explores the role of Directed Acyclic Graphs (DAGs) as a representation of conditional independence relationships. We show that DAGs offer polynomially sound and complete inference mechanisms for inferring conditional independence relationships from a given causal set of such relationships. As a consequence, d-separation, a graphical criterion for identifying independencies in a DAG, is shown to uncover more valid independencies than any other criterion. In addition, we employ the Armstrong property of conditional independence to show that the dependence relationships displayed by a DAG are inherently consistent, i.e. for every DAG D there exists some probability distribution P that embodies all the conditional independencies displayed in D and none other.\nIn this paper, an empirical evaluation of three inference methods for uncertain reasoning is presented in the context of Pathfinder, a large expert system for the diagnosis of lymph-node pathology. 
The inference procedures evaluated are (1) Bayes' theorem, assuming evidence is conditionally independent given each hypothesis; (2) odds-likelihood updating, assuming evidence is conditionally independent given each hypothesis and given the negation of each hypothesis; and (3) an inference method related to the Dempster-Shafer theory of belief. Both expert-rating and decision-theoretic metrics are used to compare the diagnostic accuracy of the inference methods.\nThis paper describes a formal system of belief revision developed by Wolfgang Spohn and shows that this system has a parallel implementation that can be derived from an influence diagram in a manner similar to that in which Bayesian networks are derived. The proof rests upon completeness results for an axiomatization of the notion of conditional independence, with the Spohn system being used as a semantics for the relation of conditional independence.\nNonmonotonic reasoning is a pattern of reasoning that allows an agent to make and retract (tentative) conclusions from inconclusive evidence. This paper gives a possible-worlds interpretation of the nonmonotonic reasoning problem based on standard decision theory and the emerging probability logic. The system's central principle is that a tentative conclusion is a decision to make a bet, not an assertion of fact. The system is rational, and as sound as the proof theory of its underlying probability logic.\nFor many years, at least since McCarthy and Hayes (1969), writers have lamented, and attempted to compensate for, the alleged fact that we often do not have adequate statistical knowledge for governing the uncertainty of belief, for making uncertain inferences, and the like. 
It is hardly ever spelled out what \"adequate statistical knowledge\" would be, if we had it, and how adequate statistical knowledge could be used to control and regulate epistemic uncertainty.\nWhen knowledge is obtained from a database, it is only possible to deduce confidence intervals for probability values. With confidence intervals replacing point values, the results in the set covering model include interval constraints for the probabilities of mutually exclusive and exhaustive explanations. The Principle of Interval Constraints ranks these explanations by determining the expected values of the probabilities based on distributions determined from the interval constraints. This principle was developed using the Classical Approach to probability. This paper justifies the Principle of Interval Constraints with a more rigorous statement of the Classical Approach and by defending the concept of probabilities of probabilities.\nIn this paper, we describe an abstract framework and axioms under which exact local computation of marginals is possible. The primitive objects of the framework are variables and valuations. The primitive operators of the framework are combination and marginalization. These operate on valuations. We state three axioms for these operators and we derive the possibility of local computation from the axioms. Next, we describe a propagation scheme for computing marginals of a valuation when we have a factorization of the valuation on a hypertree. Finally we show how the problem of computing marginals of joint probability distributions and joint belief functions fits the general framework.\nThis paper focuses on probability updates in multiply-connected belief networks. Pearl has designed the method of conditioning, which enables us to apply his algorithm for belief updates in singly-connected networks to multiply-connected belief networks by selecting a loop-cutset for the network and instantiating these loop-cutset nodes. 
We discuss conditions that need to be satisfied by the selected nodes. We present a heuristic algorithm for finding a loop-cutset that satisfies these conditions.\nDependency knowledge of the form \"x is independent of y once z is known\" invariably obeys the four graphoid axioms; examples include probabilistic and database dependencies. Often, such knowledge can be represented efficiently with graphical structures such as undirected graphs and directed acyclic graphs (DAGs). In this paper we show that the graphical criterion called d-separation is a sound rule for reading independencies from any DAG based on a causal input list drawn from a graphoid. The rule may be extended to cover DAGs that represent functional dependencies as well as conditional dependencies.\nA probabilistic method of reasoning under uncertainty is proposed based on the principle of Minimum Cross Entropy (MCE) and concept of Recursive Causal Model (RCM). The dependency and correlations among the variables are described in a special language BNDL (Belief Networks Description Language). Beliefs are propagated among the clauses of the BNDL programs representing the underlying probabilistic distributions. BNDL interpreters in both Prolog and C have been developed and the performance of the method is compared with those of the others.\nWe introduce the operation of possibility qualification and show how this modal-like operator can be used to represent \"typical\" or default knowledge in a theory of nonmonotonic reasoning. We investigate the representational power of this approach by looking at a number of prototypical problems from the nonmonotonic reasoning literature. In particular we look at the so-called Yale shooting problem and its relation to priority in default reasoning.\nThis paper examines the relationship between Shafer's belief functions and convex sets of probability distributions. 
Kyburg's (1986) result showed that belief function models form a subset of the class of closed convex probability distributions. This paper emphasizes the importance of Kyburg's result by looking at simple examples involving Bernoulli trials. Furthermore, it is shown that many convex sets of probability distributions generate the same belief function in the sense that they support the same lower and upper values. This has implications for a decision theoretic extension. Dempster's rule of combination is also compared with Bayes' rule of conditioning.\nModifiable combining functions are a synthesis of two common approaches to combining evidence. They offer many of the advantages of these approaches and avoid some disadvantages. Because they facilitate the acquisition, representation, explanation, and modification of knowledge about combinations of evidence, they are proposed as a tool for knowledge engineers who build systems that reason under uncertainty, not as a normative theory of evidence.\nThe combination of evidence in Dempster-Shafer theory is compared with the combination of evidence in probabilistic logic. Sufficient conditions are stated for these two methods to agree. It is then shown that these conditions are minimal in the sense that disagreement can occur when any one of them is removed. An example is given in which the traditional assumption of conditional independence of evidence on hypotheses holds and a uniform prior is assumed, but probabilistic logic and Dempster's rule give radically different results for the combination of two evidence events.\nIn the canonical examples underlying Shafer-Dempster theory, beliefs over the hypotheses of interest are derived from a probability model for a set of auxiliary hypotheses. Beliefs are derived via a compatibility relation connecting the auxiliary hypotheses to subsets of the primary hypotheses. 
A belief function differs from a Bayesian probability model in that one does not condition on those parts of the evidence for which no probabilities are specified. The significance of this difference in conditioning assumptions is illustrated with two examples giving rise to identical belief functions but different Bayesian probability distributions.\nDempster's rule of combination has been the most controversial part of the Dempster-Shafer (D-S) theory. In particular, Zadeh has reached a conjecture on the noncombinability of evidence from a relational model of the D-S theory. In this paper, we will describe another relational model where D-S masses are represented as conditional granular distributions. By comparing it with Zadeh's relational model, we will show how Zadeh's conjecture on combinability does not affect the applicability of Dempster's rule in our model.\nWe present a program that manages a database of temporally scoped beliefs. The basic functionality of the system includes maintaining a network of constraints among time points, supporting a variety of fetches, mediating the application of causal rules, monitoring intervals of time for the addition of new facts, and managing data dependencies that keep the database consistent. At this level the system operates independently of any measure of belief or belief calculus. We provide an example of how an application program might use this functionality to implement a belief calculus.\nWe present a representation of partial confidence in belief and preference that is consistent with the tenets of decision-theory. The fundamental insight underlying the representation is that if a person is not completely confident in a probability or utility assessment, additional modeling of the assessment may improve decisions to which it is relevant. We show how a traditional decision-analytic approach can be used to balance the benefits of additional modeling with associated costs. 
The approach can be used during knowledge acquisition to focus the attention of a knowledge engineer or expert on parts of a decision model that deserve additional refinement.\nBayes belief networks and influence diagrams are tools for constructing coherent probabilistic representations of uncertain knowledge. The process of constructing such a network to represent an expert's knowledge is used to illustrate a variety of techniques which can facilitate the process of structuring and quantifying uncertain relationships. These include some generalizations of the \"noisy OR gate\" concept. Sensitivity analysis of generic elements of Bayes' networks provides insight into when rough probability assessments are sufficient and when greater precision may be important.\nA distinction is sometimes made between \"statistical\" and \"subjective\" probabilities. This is based on a distinction between \"unique\" events and \"repeatable\" events. We argue that this distinction is untenable, since all events are \"unique\" and all events belong to \"kinds\", and offer a conception of probability for AI in which (1) all probabilities are based on -- possibly vague -- statistical knowledge, and (2) every statement in the language has a probability. This conception of probability can be applied to very rich languages.\nDecision tree induction systems are being used for knowledge acquisition in noisy domains. This paper develops a subjective Bayesian interpretation of the task tackled by these systems and the heuristic methods they use. It is argued that decision tree systems implicitly incorporate a prior belief that the simpler (in terms of decision tree complexity) of two hypotheses be preferred, all else being equal, and that they perform a greedy search of the space of decision rules to find one in which there is strong posterior belief. 
A number of improvements to these systems are then suggested.\nThe use of numerical uncertainty representations allows better modeling of some aspects of human evidential reasoning. It also makes knowledge acquisition and system development, test, and modification more difficult. We propose that where possible, the assignment and/or refinement of rule weights should be performed automatically. We present one approach to performing this training - numerical optimization - and report on the results of some preliminary tests in training rule bases. We also show that truth maintenance can be used to make training more efficient and ask some epistemological questions raised by training rule weights.\nAn inductive logic can be formulated in which the elements are not propositions or probability distributions, but information systems. The logic is complete for information systems with binary hypotheses, i.e., it applies to all such systems. It is not complete for information systems with more than two hypotheses, but applies to a subset of such systems. The logic is inductive in that conclusions are more informative than premises. Inferences using the formalism have a strong justification in terms of the expected value of the derived information system.\nPoly-trees are singly connected causal networks in which variables may arise from multiple causes. This paper develops a method of recovering poly-trees from empirically measured probability distributions of pairs of variables. The method guarantees that, if the measured distributions are generated by a causal process structured as a poly-tree, then the topological structure of such tree can be recovered precisely and, in addition, the causal directionality of the branches can be determined up to the maximum extent possible. 
The method also pinpoints the minimum (if any) external semantics required to determine the causal relationships among the variables considered.\nThis paper describes a heuristic Bayesian method for computing probability distributions from experimental data, based upon the multivariate normal form of the influence diagram. An example illustrates its use in medical technology assessment. This approach facilitates the integration of results from different studies, and permits a medical expert to make proper assessments without considerable statistical training.\nEvidential reasoning is cast as the problem of simplifying the evidence-hypothesis relation and constructing combination formulas that possess certain testable properties. Important classes of evidence as identifiers, annihilators, and idempotents and their roles in determining binary operations on intervals of reals are discussed. The appropriate way of constructing formulas for combining evidence and their limitations, for instance, in robustness, are presented.\nThis paper discusses the semantics and proof theory of Nilsson's probabilistic logic, outlining both the benefits of its well-defined model theory and the drawbacks of its proof theory. Within Nilsson's semantic framework, we derive a set of inference rules which are provably sound. The resulting proof system, in contrast to Nilsson's approach, has the important feature of convergence - that is, the inference process proceeds by computing increasingly narrow probability intervals which converge from above and below on the smallest entailed probability interval. Thus the procedure can be stopped at any time to yield partial information concerning the smallest entailed interval.\nThe comparisons of uncertainty calculi from the last two Uncertainty Workshops have all used theoretical probabilistic accuracy as the sole metric. 
While mathematical correctness is important, there are other factors which should be considered when developing reasoning systems. These other factors include, among other things, the error in uncertainty measures obtainable for the problem and the effect of this error on the performance of the resulting system.\nIn our previous series of studies to investigate the role of evidential reasoning in the RUBRIC system for full-text document retrieval (Tong et al., 1985; Tong and Shapiro, 1985; Tong and Appelbaum, 1987), we identified the important role that problem structure plays in the overall performance of the system. In this paper, we focus on these structural elements (which we now call \"semantic structure\") and show how explicit consideration of their properties reduces what previously were seen as difficult evidential reasoning problems to more tractable questions.\nThis study examined the effects of \"tuning\" the parameters of the incremental function of MYCIN, the independent function of PROSPECTOR, a probability model that assumes independence, and a simple additive linear equation. The parameters of each of these models were optimized to provide solutions which most nearly approximated those from a full probability model for a large set of simple networks. Surprisingly, MYCIN, PROSPECTOR, and the linear equation performed equivalently; the independence model was clearly more accurate on the networks studied.\nWe describe a representation and a set of inference methods that combine logic programming techniques with probabilistic network representations for uncertainty (influence diagrams). The techniques emphasize the dynamic construction and solution of probabilistic and decision-theoretic models for complex and uncertain domains. Given a query, a logical proof is produced if possible; if not, an influence diagram based on the query and the knowledge of the decision domain is produced and subsequently solved. 
A uniform, declarative, first-order knowledge representation is combined with a set of integrated inference procedures for logical, probabilistic, and decision-theoretic reasoning.\nA method for computing probabilistic propositions is presented. It assumes the availability of a single external routine for computing the probability of one instantiated variable, given a conjunction of other instantiated variables. In particular, the method allows belief network algorithms to calculate general probabilistic propositions over nodes in the network. Although in the worst case the time complexity of the method is exponential in the size of a query, it is polynomial in the size of a number of common types of queries.\nThe fundamental elements of evidential reasoning problems are described, followed by a discussion of the structure of various types of problems. Bayesian inference networks and state space formalism are used as the tool for problem representation. A human-oriented decision making cycle for solving evidential reasoning problems is described and illustrated for a military situation assessment problem. The implementation of this cycle may serve as the basis for an expert system shell for evidential reasoning; i.e. a situation assessment processor.\nThe ability to predict the future in a given domain can be acquired by discovering empirically from experience certain temporal patterns that tend to repeat unerringly. Previous works in time series analysis allow one to make quantitative predictions on the likely values of certain linear variables. Since certain types of knowledge are better expressed in symbolic forms, making qualitative predictions based on symbolic representations requires a different approach. 
A domain-independent methodology called TIM (Time based Inductive Machine) for discovering potentially uncertain temporal patterns from real-time observations using the technique of inductive inference is described here.\nA model of knowledge representation is described in which propositional facts and the relationships among them can be supported by other facts. The set of knowledge which can be supported is called the set of cognitive units, each having associated descriptions of their explicit and implicit support structures, summarizing belief and reliability of belief. This summary is precise enough to be useful in a computational model while remaining descriptive of the underlying symbolic support structure. When a fact supports another supportive relationship between facts we call this meta-support. This facilitates reasoning about both the propositional knowledge and the support structures underlying it.\nTo develop an approach to utilizing continuous statistical information within the Dempster-Shafer framework, we combine methods proposed by Strat and by Shafer. We first derive continuous possibility and mass functions from probability-density functions. Then we propose a rule for combining such evidence that is simpler and more efficiently computed than Dempster's rule. We discuss the relationship between Dempster's rule and our proposed rule for combining evidence over continuous frames.\nWe start by defining an approach to non-monotonic probabilistic reasoning in terms of non-monotonic categorical (true-false) reasoning. We identify a type of non-monotonic probabilistic reasoning, akin to default inheritance, that is commonly found in practice, especially in \"evidential\" and \"Bayesian\" reasoning. We formulate this in terms of the Maximization of Conditional Independence (MCI), and identify a variety of applications for this sort of default. We propose a formalization using Pointwise Circumscription. 
We compare MCI to Maximum Entropy, another kind of non-monotonic principle, and conclude by raising a number of open questions.\nThe investigations reported in this paper center on the process of dynamic uncertainty assessment during interpretation tasks in real domains. In particular, we are interested here in the nature of the control structure of computer programs that can support multiple interpretations and smooth transitions between them, in real time. Each step of the processing involves the interpretation of one input item and the appropriate re-establishment of the system's confidence in the correctness of its interpretation(s).\nWe describe a viewpoint on the Dempster/Shafer 'Theory of Evidence', and provide an interpretation which regards the combination formulas as statistics of the opinions of \"experts\". This is done by introducing spaces with binary operations that are simpler to interpret or simpler to implement than the standard combination formula, and showing that these spaces can be mapped homomorphically onto the Dempster/Shafer theory of evidence space. The experts in the space of \"opinions of experts\" combine information in a Bayesian fashion. We present alternative spaces for the combination of evidence suggested by this viewpoint.\nWe are interested in creating an automated or semi-automated system with the capability of taking a set of radar imagery, collection parameters and a priori map and other tactical data, and producing likely interpretations of the possible military situations given the available evidence. This paper is concerned with the problem of the interpretation and computation of certainty or belief in the conclusions reached by such a system.\nThis paper presents an efficient adaptation and application of the Dempster-Shafer theory of evidence, one that can be used effectively in a massively parallel hierarchical system for visual pattern perception. 
It describes the techniques used, and shows in an extended example how they serve to improve the system's performance as it applies a multiple-level set of processes.\nFor any system with limited statistical knowledge, the combination of evidence and the interpretation of sampling information require the determination of the right reference class (or of an adequate one). The present note (1) discusses the use of reference classes in evidential reasoning, and (2) discusses implementations of Kyburg's rules for reference classes. This paper contributes the first frank discussion of how much of Kyburg's system is needed to be powerful, how much can be computed effectively, and how much is philosophical fat.\nMINDS is a distributed system of cooperating query engines that customize document retrieval for each user in a dynamic environment. It improves its performance and adapts to changing patterns of document distribution by observing system-user interactions and modifying the appropriate certainty factors, which act as search control parameters. It is argued here that the uncertainty management calculus must account for temporal precedence, reliability of evidence, degree of support for a proposition, and saturation effects. The calculus presented here possesses these features. Some results obtained with this scheme are discussed.\nIn this paper, we describe a representation for spatial information, called the stochastic map, and associated procedures for building it, reading information from it, and revising it incrementally as new information is obtained. The map contains the estimates of relationships among objects in the map, and their uncertainties, given all the available information. The procedures provide a general solution to the problem of estimating uncertain relative spatial relationships. The estimates are probabilistic in nature, an advance over the previous, very conservative, worst-case approaches to the problem. 
Finally, the procedures are developed in the context of state-estimation and filtering theory, which provides a solid basis for numerous extensions.\nThis paper examines the accuracy of the PROSPECTOR model for uncertain reasoning. PROSPECTOR's solutions for a large number of computer-generated inference networks were compared to those obtained from probability theory and minimum cross-entropy calculations. PROSPECTOR's answers were generally accurate for a restricted subset of problems that are consistent with its assumptions. However, even within this subset, we identified conditions under which PROSPECTOR's performance deteriorates.\nAttempts to replicate probabilistic reasoning in expert systems have typically overlooked a critical ingredient of that process. Probabilistic analysis typically requires extensive judgments regarding interdependencies among hypotheses and data, and regarding the appropriateness of various alternative models. The application of such models is often an iterative process, in which the plausibility of the results confirms or disconfirms the validity of assumptions made in building the model. In current expert systems, by contrast, probabilistic information is encapsulated within modular rules (involving, for example, \"certainty factors\"), and there is no mechanism for reviewing the overall form of the probability argument or the validity of the judgments entering into it.\nThis paper examines some methods and ideas underlying the author's successful probabilistic learning systems (PLS), which have proven uniquely effective and efficient in generalization learning or induction. While the emerging principles are generally applicable, this paper illustrates them in heuristic search, which demands noise management and incremental learning. In our approach, both task performance and learning are guided by probability. 
Probabilities are incrementally normalized and revised, and their errors are located and corrected.\nIn a real expert system, one may have unreliable, unconfident, conflicting estimates of the value for a particular parameter. It is important for decision making that the information present in this aggregate somehow find its way into use. We cast the problem of representing and combining uncertain estimates as selection of two kinds of functions, one to determine an estimate, the other its uncertainty. The paper includes a long list of properties that such functions should satisfy, and it presents one method that satisfies them.\nMechanisms for the automation of uncertainty are required for expert systems. Sometimes these mechanisms need to obey the properties of probabilistic reasoning. A purely numeric mechanism, like those proposed so far, cannot provide a probabilistic logic with truth functional connectives. We propose an alternative mechanism, Incidence Calculus, which is based on a representation of uncertainty using sets of points, which might represent situations, models or possible worlds. Incidence Calculus does provide a probabilistic logic with truth functional connectives.\nThis paper focuses on designing expert systems to support decision making in complex, uncertain environments. In this context, our research indicates that strictly probabilistic representations, which enable the use of decision-theoretic reasoning, are highly preferable to recently proposed alternatives (e.g., fuzzy set theory and Dempster-Shafer theory). Furthermore, we discuss the language of influence diagrams and a corresponding methodology -- decision analysis -- that allows decision theory to be used effectively and efficiently as a decision-making aid. 
Finally, we use RACHEL, a system that helps infertile couples select medical treatments, to illustrate the methodology of decision analysis as basis for expert decision systems.\nThe last few years has seen a growing debate about techniques for managing uncertainty in AI systems. Unfortunately this debate has been cast as a rivalry between AI methods and classical probability based ones. Three arguments for extending the probability framework of uncertainty are presented, none of which imply a challenge to classical methods. These are (1) explicit representation of several types of uncertainty, specifically possibility and plausibility, as well as probability, (2) the use of weak methods for uncertainty management in problems which are poorly defined, and (3) symbolic representation of different uncertainty calculi and methods for choosing between them.\nIt is proposed to apply modern methods of nonlinear nonequilibrium statistical mechanics to develop software algorithms that will optimally respond to targets within short response times with minimal computer resources. This Statistical Mechanics Algorithm for Response to Targets (SMART) can be developed with a view towards its future implementation into a hardwired Statistical Algorithm Multiprocessor (SAM) to enhance the efficiency and speed of response to targets (SMART_SAM).\nThe roles played by decision factors in making complex subject are decisions are characterized by how these factors affect the overall decision. Evidence that partially matches a factor is evaluated, and then effective computational rules are applied to these roles to form an appropriate aggregation of the evidence. 
The use of this technique supports the expression of deeper levels of causality, and may also preserve the cognitive structure of the decision maker better than the usual weighting methods, certainty-factor or other probabilistic models can.\nWe consider a simple sequential allocation procedure for sharing indivisible items between agents in which agents take turns to pick items. Supposing additive utilities and independence between the agents, we show that the expected utility of each agent is computable in polynomial time. Using this result, we prove that the expected utilitarian social welfare is maximized when agents take alternate turns. We also argue that this mechanism remains optimal when agents behave strategically\nWe introduce statistical constraints, a declarative modelling tool that links statistics and constraint programming. We discuss two statistical constraints and some associated filtering algorithms. Finally, we illustrate applications to standard problems encountered in statistics and to a novel inspection scheduling problem in which the aim is to find inspection plans with desirable statistical properties.\nWe frame the question of what kind of subjective experience a brain simulation would have in contrast to a biological brain. We discuss the brain prosthesis thought experiment. We evaluate how the experience of the brain simulation might differ from the biological, according to a number of hypotheses about experience and the properties of simulation. Then, we identify finer questions relating to the original inquiry, and answer them from both a general physicalist, and panexperientialist perspective.\nThe debts' clearing problem is about clearing all the debts in a group of n entities (persons, companies etc.) using a minimal number of money transaction operations. The problem is known to be NP-hard in the strong sense. 
As for many intractable problems, techniques from the field of artificial intelligence are useful in finding solutions close to optimum for large inputs. An evolutionary algorithm for solving the debts' clearing problem is proposed.\nSpace and time are two critical components of many real world systems. For this reason, analysis of anomalies in spatiotemporal data has been a great of interest. In this work, application of tensor decomposition and eigenspace techniques on spatiotemporal hotspot detection is investigated. An algorithm called SST-Hotspot is proposed which accounts for spatiotemporal variations in data and detect hotspots using matching of eigenvector elements of two cases and population tensors. The experimental results reveal the interesting application of tensor decomposition and eigenvector-based techniques in hotspot analysis.\nAgent programming is mostly a symbolic discipline and, as such, draws little benefits from probabilistic areas as machine learning and graphical models. However, the greatest objective of agent research is the achievement of autonomy in dynamical and complex environments --- a goal that implies embracing uncertainty and therefore the entailed representations, algorithms and techniques. This paper proposes an innovative and conflict free two layer approach to agent programming that uses already established methods and tools from both symbolic and probabilistic artificial intelligence. Moreover, this framework is illustrated by means of a widely used agent programming example, GoldMiners.\nIt is well known that for certain tasks, quantum computing outperforms classical computing. A growing number of contributions try to use this advantage in order to improve or extend classical machine learning algorithms by methods of quantum information theory. This paper gives a brief introduction into quantum machine learning using the example of pattern classification. 
We introduce a quantum pattern classification algorithm that draws on Trugenberger's proposal for measuring the Hamming distance on a quantum computer (CA Trugenberger, Phys Rev Let 87, 2001) and discuss its advantages using handwritten digit recognition as from the MNIST database.\nWe study in this paper the consequences of using the Mean Absolute Percentage Error (MAPE) as a measure of quality for regression models. We show that finding the best model under the MAPE is equivalent to doing weighted Mean Absolute Error (MAE) regression. We show that universal consistency of Empirical Risk Minimization remains possible using the MAPE instead of the MAE.\nWe compare in this paper several feature selection methods for the Naive Bayes Classifier (NBC) when the data under study are described by a large number of redundant binary indicators. Wrapper approaches guided by the NBC estimation of the classification error probability out-perform filter approaches while retaining a reasonable computational cost.\nWe consider graphs that represent pairwise marginal independencies amongst a set of variables (for instance, the zero entries of a covariance matrix for normal data). We characterize the directed acyclic graphs (DAGs) that faithfully explain a given set of independencies, and derive algorithms to efficiently enumerate such structures. Our results map out the space of faithful causal models for a given set of pairwise marginal independence relations. This allows us to show the extent to which causal inference is possible without using conditional independence tests.\nUsing the recently introduced universal computing model, called orchestrated machine, that represents computations in a dissipative environment, we consider a new kind of interpretation of Turing's Imitation Game. In addition we raise the question whether the intelligence may show fractal properties. Then we sketch a vision of what robotic cars are going to do in the future. 
Finally we give the specification of an artificial life game based on the concept of orchestrated machines. The purpose of this paper is to start the search for possible relationships between these different topics.\nThis paper explores the performance of fitted neural Q iteration for reinforcement learning in several partially observable environments, using three recurrent neural network architectures: Long Short-Term Memory, Gated Recurrent Unit and MUT1, a recurrent neural architecture evolved from a pool of several thousands candidate architectures. A variant of fitted Q iteration, based on Advantage values instead of Q values, is also explored. The results show that GRU performs significantly better than LSTM and MUT1 for most of the problems considered, requiring less training episodes and less CPU time before learning a very good policy. Advantage learning also tends to produce better results.\nProbabilistic inference procedures are usually coded painstakingly from scratch, for each target model and each inference algorithm. We reduce this effort by generating inference procedures from models automatically. We make this code generation modular by decomposing inference algorithms into reusable program-to-program transformations. These transformations perform exact inference as well as generate probabilistic programs that compute expectations, densities, and MCMC samples. The resulting inference procedures are about as accurate and fast as other probabilistic programming systems on real-world problems.\nHere, I review current state-of-the-arts in many areas of AI to estimate when it's reasonable to expect human level AI development. Predictions of prominent AI researchers vary broadly from very pessimistic predictions of Andrew Ng to much more moderate predictions of Geoffrey Hinton and optimistic predictions of Shane Legg, DeepMind cofounder. 
Given huge rate of progress in recent years and this broad range of predictions of AI experts, AI safety questions are also discussed.\nAutomatic extraction of cause-effect relationships from natural language texts is a challenging open problem in Artificial Intelligence. Most of the early attempts at its solution used manually constructed linguistic and syntactic rules on small and domain-specific data sets. However, with the advent of big data, the availability of affordable computing power and the recent popularization of machine learning, the paradigm to tackle this problem has slowly shifted. Machines are now expected to learn generic causal extraction rules from labelled data with minimal supervision, in a domain independent-manner. In this paper, we provide a comprehensive survey of causal relation extraction techniques from both paradigms, and analyse their relative strengths and weaknesses, with recommendations for future work.\nThe 15th Conference on Theoretical Aspects of Rationality and Knowledge (TARK) took place in Carnegie Mellon University, Pittsburgh, USA from June 4 to 6, 2015.   The mission of the TARK conferences is to bring together researchers from a wide variety of fields, including Artificial Intelligence, Cryptography, Distributed Computing, Economics and Game Theory, Linguistics, Philosophy, and Psychology, in order to further our understanding of interdisciplinary issues involving reasoning about rationality and knowledge.   These proceedings consist of a subset of the papers / abstracts presented at the TARK conference.\nWe present a system for generating and understanding of dynamic and static spatial relations in robotic interaction setups. Robots describe an environment of moving blocks using English phrases that include spatial relations such as \"across\" and \"in front of\". 
We evaluate the system in robot-robot interactions and show that the system can robustly deal with visual perception errors, language omissions and ungrammatical utterances.\nProbabilistic modeling is one of the foundations of modern machine learning and artificial intelligence. In this paper, we propose a novel type of probabilistic models named latent dependency forest models (LDFMs). A LDFM models the dependencies between random variables with a forest structure that can change dynamically based on the variable values. It is therefore capable of modeling context-specific independence. We parameterize a LDFM using a first-order non-projective dependency grammar. Learning LDFMs from data can be formulated purely as a parameter learning problem, and hence the difficult problem of model structure learning is circumvented. Our experimental results show that LDFMs are competitive with existing probabilistic models.\nA modification of the neo-fuzzy neuron is proposed (an extended neo-fuzzy neuron (ENFN)) that is characterized by improved approximating properties. An adaptive learning algorithm is proposed that has both tracking and smoothing properties. An ENFN distinctive feature is its computational simplicity compared to other artificial neural networks and neuro-fuzzy systems.\nMulti-agent path finding (MAPF) is well-studied in artificial intelligence, robotics, theoretical computer science and operations research. We discuss issues that arise when generalizing MAPF methods to real-world scenarios and four research directions that address them. 
We emphasize the importance of addressing these issues as opposed to developing faster methods for the standard formulation of the MAPF problem.\nMotivated by the idea that criticality and universality of phase transitions might play a crucial role in achieving and sustaining learning and intelligent behaviour in biological and artificial networks, we analyse a theoretical and a pragmatic experimental set up for critical phenomena in deep learning. On the theoretical side, we use results from statistical physics to carry out critical point calculations in feed-forward/fully connected networks, while on the experimental side we set out to find traces of criticality in deep neural networks. This is our first step in a series of upcoming investigations to map out the relationship between criticality and learning in deep networks.\nThis paper presents minimax rates for density estimation when the data dimension $d$ is allowed to grow with the number of observations $n$ rather than remaining fixed as in previous analyses. We prove a non-asymptotic lower bound which gives the worst-case rate over standard classes of smooth densities, and we show that kernel density estimators achieve this rate. We also give oracle choices for the bandwidth and derive the fastest rate $d$ can grow with $n$ to maintain estimation consistency.\nRoboCup offers a set of benchmark problems for Artificial Intelligence in form of official world championships since 1997. The most tactical advanced and richest in terms of behavioural complexity of these is the 2D Soccer Simulation League, a simulated robotic soccer competition. BetaRun is a new attempt combining both machine learning and manual programming approaches, with the ultimate goal to arrive at a team that is trained entirely from observing and playing games, and a new development based on agent2D.\nThe AGM model is the most remarkable framework for modeling belief revision. However, it is not perfect in all aspects. 
Paraconsistent belief revision, multi-agent belief revision and non-prioritized belief revision are three different extensions to AGM to address three important criticisms applied to it. In this article, we propose a framework based on AGM that takes a position in each of these categories. Also, we discuss some features of our framework and study the satisfiability of AGM postulates in this new context.\nGames have always been popular testbeds for Artificial Intelligence (AI). In the last decade, we have seen the rise of the Multiple Online Battle Arena (MOBA) games, which are the most played games nowadays. In spite of this, there are few works that explore MOBA as a testbed for AI Research. In this paper we present and discuss the main features and opportunities offered by MOBA games to Game AI Research. We describe the various challenges faced along the game and also propose a discrete model that can be used to better understand and explore the game. With this, we aim to encourage the use of MOBA as a novel research platform for Game AI.\nThere are many goals for an AI that could become dangerous if the AI becomes superintelligent or otherwise powerful. Much work on the AI control problem has been focused on constructing AI goals that are safe even for such AIs. This paper looks at an alternative approach: defining a general concept of `low impact'. The aim is to ensure that a powerful AI which implements low impact will not modify the world extensively, even if it is given a simple or dangerous goal. The paper proposes various ways of defining and grounding low impact, and discusses methods for ensuring that the AI can still be allowed to have a (desired) impact despite the restriction. The end of the paper addresses known issues with this approach and avenues for future research.\nDigital games have become a key player in the entertainment industry, attracting millions of new players each year. 
In spite of that, novice players may have a hard time when playing certain types of games, such as MOBAs and MMORPGs, due to their steep learning curves and not so friendly online communities. In this paper, we present an approach to help novice players in MOBA games overcome these problems. An artificial intelligence agent plays alongside the player analyzing his/her performance and giving tips about the game. Experiments performed with the game {\\em League of Legends} show the potential of this approach.\nDrawing an inspiration from behavioral studies of human decision making, we propose here a general parametric framework for multi-armed bandit problem, which extends the standard Thompson Sampling approach to incorporate reward processing biases associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. We demonstrate empirically that the proposed parametric approach can often outperform the baseline Thompson Sampling on a variety of datasets. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions.\nConstrained counting is important in domains ranging from artificial intelligence to software analysis. There are already a few approaches for counting models over various types of constraints. Recently, hashing-based approaches achieve both theoretical guarantees and scalability, but still rely on solution enumeration. In this paper, a new probabilistic polynomial time approximate model counter is proposed, which is also a hashing-based universal framework, but with only satisfiability queries. A variant with a dynamic stopping criterion is also presented. 
Empirical evaluation over benchmarks on propositional logic formulas and SMT(BV) formulas shows that the approach is promising.\nThis paper gives an overview of impersonation bots that generate output in one, or possibly, multiple modalities. We also discuss rapidly advancing areas of machine learning and artificial intelligence that could lead to frighteningly powerful new multi-modal social bots. Our main conclusion is that most commonly known bots are one dimensional (i.e., chatterbot), and far from deceiving serious interrogators. However, using recent advances in machine learning, it is possible to unleash incredibly powerful, human-like armies of social bots, in potentially well coordinated campaigns of deception and influence.\nThe paper investigates navigability with imperfect information. It shows that the properties of navigability with perfect recall are exactly those captured by Armstrong's axioms from the database theory. If the assumption of perfect recall is omitted, then Armstrong's transitivity axiom is not valid, but it can be replaced by two new weaker principles. The main technical results are soundness and completeness theorems for the logical systems describing properties of navigability with and without perfect recall.\nThe paper proposes a bimodal logic that describes an interplay between distributed knowledge modality and coalition know-how modality. Unlike other similar systems, the one proposed here assumes perfect recall by all agents. Perfect recall is captured in the system by a single axiom. The main technical results are the soundness and the completeness theorems for the proposed logical system.\nThis volume consists of papers presented at the Sixteenth Conference on Theoretical Aspects of Rationality and Knowledge (TARK) held at the University of Liverpool, UK, from July 24 to 26, 2017.   
TARK conferences bring together researchers from a wide variety of fields, including Computer Science (especially, Artificial Intelligence, Cryptography, Distributed Computing), Economics (especially, Decision Theory, Game Theory, Social Choice Theory), Linguistics, Philosophy (especially, Philosophical Logic), and Cognitive Psychology, in order to further understand the issues involving reasoning about rationality and knowledge.\nSequential pattern mining algorithms are widely used to explore care pathways database, but they generate a deluge of patterns, mostly redundant or useless. Clinicians need tools to express complex mining queries in order to generate less but more significant patterns. These algorithms are not versatile enough to answer complex clinician queries. This article proposes to apply a declarative pattern mining approach based on Answer Set Programming paradigm. It is exemplified by a pharmaco-epidemiological study investigating the possible association between hospitalization for seizure and antiepileptic drug switch from a french medico-administrative database.\nProbabilistic Inference Modulo Theories (PIMT) is a recent framework that expands exact inference on graphical models to use richer languages that include arithmetic, equalities, and inequalities on both integers and real numbers. In this paper, we expand PIMT to a lifted version that also processes random functions and relations. This enhancement is achieved by adapting Inversion, a method from Lifted First-Order Probabilistic Inference literature, to also be modulo theories. This results in the first algorithm for exact probabilistic inference that efficiently and simultaneously exploits random relations and functions, arithmetic, equalities and inequalities.\nWe present a commonsense, qualitative model for the semantic grounding of embodied visuo-spatial and locomotive interactions. 
The key contribution is an integrative methodology combining low-level visual processing with high-level, human-centred representations of space and motion rooted in artificial intelligence. We demonstrate practical applicability with examples involving object interactions, and indoor movement.\nWith the use of ontologies in several domains such as semantic web, information retrieval, artificial intelligence, the concept of similarity measuring has become a very important domain of research. Therefore, in the current paper, we propose our method of similarity measuring which uses the Dijkstra algorithm to define and compute the shortest path. Then, we use this one to compute the semantic distance between two concepts defined in the same hierarchy of ontology. Afterward, we base on this result to compute the semantic similarity. Finally, we present an experimental comparison between our method and other methods of similarity measuring.\nTransparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real world applications. This alone presents a challenge in many areas of artificial intelligence. In this position paper, we propose a distinction between descriptive and persuasive explanations. We discuss reasoning suggesting that functional interpretability may be correlated with cognitive function and user preferences. If this is indeed the case, evaluation and optimization using functional metrics could perpetuate implicit cognitive bias in explanations that threaten transparency. 
Finally, we propose two potential research directions to disambiguate cognitive function and explanation models, retaining control over the tradeoff between accuracy and interpretability.\nThe present document is an excerpt of an essay that I wrote as part of my application material to graduate school in Computer Science (with a focus on Artificial Intelligence), in 1986. I was not invited by any of the schools that received it, so I became a theoretical physicist instead. The essay's full title was \"Some Topics in Philosophy and Computer Science\". I am making this text (unchanged from 1985, preserving the typesetting as much as possible) available now in memory of Jerry Fodor, whose writings had influenced me significantly at the time (even though I did not always agree).\nCloud computing relies on sharing computing resources rather than having local servers or personal devices to handle applications. Nowadays, cloud computing has become one of the fastest growing fields in information technology. However, several new security issues of cloud computing have emerged due to its service delivery models. In this paper, we discuss the case of distributed denial-of-service (DDoS) attack using Cloud resources. First, we show how such attack using a cloud platform could not be detected by previous techniques. Then we present a tricky solution based on the cloud as well.\nUsing Deep Reinforcement Learning (DRL) can be a promising approach to handle various tasks in the field of (simulated) autonomous driving. However, recent publications mainly consider learning in unusual driving environments. This paper presents Driving School for Autonomous Agents (DSA^2), a software for validating DRL algorithms in more usual driving environments based on artificial and realistic road networks. 
We also present the results of applying DSA^2 for handling the task of driving on a straight road while regulating the velocity of one vehicle according to different speed limits.\nAlthough deep learning has historical roots going back decades, neither the term \"deep learning\" nor the approach was popular just over five years ago, when the field was reignited by papers such as Krizhevsky, Sutskever and Hinton's now classic (2012) deep network model of Imagenet. What has the field discovered in the five subsequent years? Against a background of considerable progress in areas such as speech recognition, image recognition, and game playing, and considerable enthusiasm in the popular press, I present ten concerns for deep learning, and suggest that deep learning must be supplemented by other techniques if we are to reach artificial general intelligence.\nThis paper is to explore the possibility to use alternative data and artificial intelligence techniques to trade stocks. The efficacy of the daily Twitter sentiment on predicting the stock return is examined using machine learning methods. Reinforcement learning(Q-learning) is applied to generate the optimal trading policy based on the sentiment signal. The predicting power of the sentiment signal is more significant if the stock price is driven by the expectation of the company growth and when the company has a major event that draws the public attention. The optimal trading strategy based on reinforcement learning outperforms the trading strategy based on the machine learning prediction.\nThis paper builds on an existing notion of group responsibility and proposes two ways to define the degree of group responsibility: structural and functional degrees of responsibility. These notions measure the potential responsibilities of (agent) groups for avoiding a state of affairs. 
According to these notions, a degree of responsibility for a state of affairs can be assigned to a group of agents if, and to the extent that, the group has the potential to preclude the state of affairs.\nWe present Etymo (https://etymo.io), a discovery engine to facilitate artificial intelligence (AI) research and development. It aims to help readers navigate a large number of AI-related papers published every week by using a novel form of search that finds relevant papers and displays related papers in a graphical interface. Etymo constructs and maintains an adaptive similarity-based network of research papers as an all-purpose knowledge graph for ranking, recommendation, and visualisation. The network is constantly evolving and can learn from user feedback to adjust itself.\nSeveral tasks in artificial intelligence require to be able to find models about knowledge dynamics. They include belief revision, fusion and belief merging, and abduction. In this paper we exploit the algebraic framework of mathematical morphology in the context of propositional logic, and define operations such as dilation or erosion of a set of formulas. We derive concrete operators, based on a semantic approach, that have an intuitive interpretation and that are formally well behaved, to perform revision, fusion and abduction. Computation and tractability are addressed, and simple examples illustrate the typical results that can be obtained.\nJust as semantic hashing can accelerate information retrieval, binary valued embeddings can significantly reduce latency in the retrieval of graphical data. We introduce a simple but effective model for learning such binary vectors for nodes in a graph. By imagining the embeddings as independent coin flips of varying bias, continuous optimization techniques can be applied to the approximate expected loss. 
Embeddings optimized in this fashion consistently outperform the quantization of both spectral graph embeddings and various learned real-valued embeddings, on both ranking and pre-ranking tasks for a variety of datasets.\nThis paper introduces the settlement generation competition for Minecraft, the first part of the Generative Design in Minecraft challenge. The settlement generation competition is about creating Artificial Intelligence (AI) agents that can produce functional, aesthetically appealing and believable settlements adapted to a given Minecraft map - ideally at a level that can compete with human created designs. The aim of the competition is to advance procedural content generation for games, especially in overcoming the challenges of adaptive and holistic PCG. The paper introduces the technical details of the challenge, but mostly focuses on what challenges this competition provides and why they are scientifically relevant.\nThe theory of grey systems plays an important role in science,engineering and in the everyday life in general for handling approximate data. In the present paper grey numbers are used as a tool for assessing with linguistic expressions the mean performance of a group of objects participating in a certain activity. Two applications to student and football player assessment are also presented illustrating our results.\nRecently, deep learning has been advancing the state of the art in artificial intelligence to a new level, and humans rely on artificial intelligence techniques more than ever. However, even with such unprecedented advancements, the lack of explanation regarding the decisions made by deep learning models and absence of control over their internal processes act as major drawbacks in critical decision-making processes, such as precision medicine and law enforcement. In response, efforts are being made to make deep learning interpretable and controllable by humans. 
In this paper, we review visual analytics, information visualization, and machine learning perspectives relevant to this aim, and discuss potential challenges and future research directions.\nIn this study, under general frame of MAny Connected Intelligent Particles Systems (MACIPS), we reproduce two new simple subsets of such intelligent complex network, namely hybrid intelligent systems, involved a few prominent intelligent computing and approximate reasoning methods: self organizing feature map (SOM), Neuro-Fuzzy Inference System and Rough Set Theory (RST). Over this, we show how our algorithms can be construed as a linkage of government-society interaction, where government catches various fashions of behavior: solid (absolute) or flexible. So, transition of such society, by changing of connectivity parameters (noise) from order to disorder is inferred. Add to this, one may find an indirect mapping among financial systems and eventual market fluctuations with MACIPS.\nKnowledge representation (KR) and inference mechanism are most desirable thing to make the system intelligent. System is known to an intelligent if its intelligence is equivalent to the intelligence of human being for a particular domain or general. Because of incomplete ambiguous and uncertain information the task of making intelligent system is very difficult. The objective of this paper is to present the hybrid KR technique for making the system effective & Optimistic. The requirement for (effective & optimistic) is because the system must be able to reply the answer with a confidence of some factor. This paper also presents the comparison between various hybrid KR techniques with the proposed one.\nThere has been an increasing interest in inferring some personality traits from users and players in social networks and games, respectively. This goes beyond classical sentiment analysis, and also much further than customer profiling. 
The purpose here is to have a characterisation of users in terms of personality traits, such as openness, conscientiousness, extraversion, agreeableness, and neuroticism. While this is an incipient area of research, we ask the question of whether cognitive abilities, and intelligence in particular, are also measurable from user profiles. However, we pose the question as broadly as possible in terms of subjects, in the context of universal psychometrics, including humans, machines and hybrids. Namely, in this paper we analyse the following question: is it possible to measure the intelligence of humans and (non-human) bots in a social network or a game just from their user profiles, i.e., by observation, without the use of interactive tests, such as IQ tests, the Turing test or other more principled machine intelligence tests?\nWhat is \"intelligent\" information retrieval? Essentially this is asking what is intelligence, in this article I will attempt to show some of the aspects of human intelligence, as related to information retrieval. I will do this by the device of a semi-imaginary Oracle. Every Observatory has an oracle, someone who is a distinguished scientist, has great administrative responsibilities, acts as mentor to a number of less senior people, and as trusted advisor to even the most accomplished scientists, and knows essentially everyone in the field. In an appendix I will present a brief summary of the Statistical Factor Space method for text indexing and retrieval, and indicate how it will be used in the Astrophysics Data System Abstract Service. 2018 Keywords: Personal Digital Assistant; Supervised Topic Models\nTwo-dimensional (2D) materials and their heterostructures, with wafer-scale synthesis methods and fascinating properties, have attracted numerous interest and triggered revolutions of corresponding device applications. 
However, facile methods to realize accurate, intelligent and large-area characterizations of these 2D structures are still highly desired. Here, we report a successful application of machine-learning strategy in the optical identification of 2D structure. The machine-learning optical identification method (MOI method) endows optical microscopy with intelligent insight into the characteristic colour information in the optical photograph. Experimental results indicate that the MOI method enables accurate, intelligent and large-area characterizations of graphene, molybdenum disulphide (MoS2) and their heterostructures, including identifications of the thickness, the existence of impurities, and even the stacking order. Thanks to the convergence of artificial intelligence and nanoscience, this intelligent identification method can certainly promote the fundamental research and wafer-scale device application of 2D structures.\nWe discuss properties of recursive schemas related to McCarthy's ``91 function'' and to Takeuchi's triple recursion. Several theorems are proposed as interesting candidates for machine verification, and some intriguing open questions are raised.\nWe introduce a simple generalization of Gardenfors and Makinson's epistemic entrenchment called partial entrenchment. 
We show that preferential inference can be generated as the sceptical counterpart of an inference mechanism defined directly on partial entrenchment.\nThis is a tutorial on logic programming and Prolog appropriate for a course on programming languages for students familiar with imperative programming.\nThis article aims at clarifying the language and practice of scientific experiment, mainly by hooking observability on calculability.\nThis paper introduces the notion of value-based argumentation frameworks, an extension of the standard argumentation frameworks proposed by Dung, which are able to show how rational decision is possible in cases where arguments derive their force from the social values their acceptance would promote.\nThis work analyses main features that should be present in knowledge representation. It suggests a model for representation and a way to implement this model in software. Representation takes care of both low-level sensor information and high-level concepts.\nThis document describes the functions as they are treated in the DLV system. We give first the language, then specify the main implementation issues.\nThis research paper gives an overview of quantum computers - description of their operation, differences between quantum and silicon computers, major construction problems of a quantum computer and many other basic aspects. No special scientific knowledge is necessary for the reader.\nSelf-organizing neural networks are used for brick finding in OPERA experiment. Self-organizing neural networks and wavelet analysis used for recognition and extraction of car numbers from images.\nWe prove general exponential moment inequalities for averages of [0,1]-valued iid random variables and use them to tighten the PAC Bayesian Theorem. The logarithmic dependence on the sample count in the enumerator of the PAC Bayesian bound is halved.\nIn this paper, we present a rich semantic network based on a differential analysis. 
We then detail implemented measures that take into account common and differential features between words. In a last section, we describe some industrial applications.\nThe principles of self-organizing the neural networks of optimal complexity is considered under the unrepresentative learning set. The method of self-organizing the multi-layered neural networks is offered and used to train the logical neural networks which were applied to the medical diagnostics.\nResults about the redundancy of circumscriptive and default theories are presented. In particular, the complexity of establishing whether a given theory is redundant is established.\nThe unification algorithm is at the core of the logic programming paradigm, the first unification algorithm being developed by Robinson [5]. More efficient algorithms were developed later [3] and I introduce here yet another efficient unification algorithm centered on a specific data structure, called the Unification Table.\nIn this note we introduce the notion of islands for restricting local search. We show how we can construct islands for CNF SAT problems, and how much search space can be eliminated by restricting search to the island.\nWe show that solving planning domains on binary variables with polytree causal graph is \\NP-complete. This is in contrast to a polynomial-time algorithm of Domshlak and Brafman that solves these planning domains for polytree causal graphs of bounded indegree.\nThis report introduces researchers in AI to some of the concepts in quantum heuristics and quantum AI.\nIn these notes we formally describe the functionality of Calculating Valid Domains from the BDD representing the solution space of valid configurations. 
The formalization is largely based on the CLab configuration framework.\nComputer model of a \"sense of humour\" suggested previously [arXiv:0711.2058, 0711.2061, 0711.2270] is raised to the level of a realistic algorithm.\nReview of: Brigitte Le Roux and Henry Rouanet, Geometric Data Analysis, From Correspondence Analysis to Structured Data Analysis, Kluwer, Dordrecht, 2004, xi+475 pp.\nThis paper has been withdrawn by the author due to a crucial error in the submission action.\nThe standard classification of emotions involves categorizing the expression of emotions. In this paper, parameters underlying some emotions are identified and a new classification based on these parameters is suggested.\nWe show that a prominent counterexample for the completeness of first order RUE-resolution does not apply to the higher order RUE-resolution approach EXTRUE.\nA fuzzy mnesor space is a semimodule over the positive real numbers. It can be used as theoretical framework for fuzzy sets. Hence we can prove a great number of properties for fuzzy sets without referring to the membership functions.\nThis research report presents an extension of Cumulative of Choco constraint solver, which is useful to encode over-constrained cumulative problems. This new global constraint uses sweep and task interval violation-based algorithms.\nIn this paper, we propose a first-order ontology for generalized stratified order structure. We then classify the models of the theory using model-theoretic techniques. An ontology mapping from this ontology to the core theory of Process Specification Language is also discussed.\nThis paper discusses \"computational\" systems capable of \"computing\" functions not computable by predefined Turing machines if the systems are not isolated from their environment. Roughly speaking, these systems can change their finite descriptions by interacting with their environment.\nWe develop abc-logitboost, based on the prior work on abc-boost and robust logitboost. 
Our extensive experiments on a variety of datasets demonstrate the considerable improvement of abc-logitboost over logitboost and abc-mart.\nThis short note reviews briefly three algorithms for finding the set of dispensable variables of a boolean formula. The presentation is light on proofs and heavy on intuitions.\nThis paper proposes a design for a system to generate constraint solvers that are specialised for specific problem models. It describes the design in detail and gives preliminary experimental results showing the feasibility and effectiveness of the approach.\nThe paper offers a mathematical formalization of the Turing test. This formalization makes it possible to establish the conditions under which some Turing machine will pass the Turing test and the conditions under which every Turing machine (or every Turing machine of the special class) will fail the Turing test.\nTo maximize its success, an AGI typically needs to explore its initially unknown world. Is there an optimal way of doing so? Here we derive an affirmative answer for a broad class of environments.\nThis study presents a method to predict the growth fluctuation of firms interdependent in a network economy. The risk of downward growth fluctuation of firms is calculated from the statistics on Japanese industry.\nThis paper investigates under which conditions instantiation-based proof procedures can be combined in a nested way, in order to mechanically construct new instantiation procedures for richer theories. 
Interesting applications in the field of verification are emphasized, particularly for handling extensions of the theory of arrays.\nThis paper introduces 'just enough' principles and 'systems engineering' approach to the practice of ontology development to provide a minimal yet complete, lightweight, agile and integrated development process, supportive of stakeholder management and implementation independence.\nThis paper presents a kernel formulation of the recently introduced diff-hash algorithm for the construction of similarity-sensitive hash functions. Our kernel diff-hash algorithm that shows superior performance on the problem of image feature descriptor matching.\nWe identify principles characterizing Solomonoff Induction by demands on an agent's external behaviour. Key concepts are rationality, computability, indifference and time consistency. Furthermore, we discuss extensions to the full AI case to derive AIXI.\nIn this thesis I develop a variety of techniques to train, evaluate, and sample from intractable and high dimensional probabilistic models. Abstract exceeds arXiv space limitations -- see PDF.\nThis document demonstrates that the efficient approach for diagnosis of Petri nets via integer linear programming may be unable to detect a fault even if the system is diagnosable.\nThe problem of replicating the flexibility of human common-sense reasoning has captured the imagination of computer scientists since the early days of Alan Turing's foundational work on computation and the philosophy of artificial intelligence. In the intervening years, the idea of cognition as computation has emerged as a fundamental tenet of Artificial Intelligence (AI) and cognitive science. But what kind of computation is cognition?   We describe a computational formalism centered around a probabilistic Turing machine called QUERY, which captures the operation of probabilistic conditioning via conditional simulation. 
Through several examples and analyses, we demonstrate how the QUERY abstraction can be used to cast common-sense reasoning as probabilistic inference in a statistical model of our observations and the uncertain structure of the world that generated that experience. This formulation is a recent synthesis of several research programs in AI and cognitive science, but it also represents a surprising convergence of several of Turing's pioneering insights in AI, the foundations of computation, and statistics.\nModification of a conceptual clustering algorithm Cobweb for the purpose of its application for numerical data is offered. Keywords: clustering, algorithm Cobweb, numerical data, fuzzy membership function.\nWe advance a doxastic interpretation for many of the logical connectives considered in Dependence Logic and in its extensions, and we argue that Team Semantics is a natural framework for reasoning about beliefs and belief updates.\nThis is a purely pedagogical paper with no new results. The goal of the paper is to give a fairly self-contained introduction to Judea Pearl's do-calculus, including proofs of his 3 rules.\nThis short note presents a new formal language, lambda dependency-based compositional semantics (lambda DCS) for representing logical forms in semantic parsing. 
By eliminating variables and making existential quantification implicit, lambda DCS logical forms are generally more compact than those in lambda calculus.\nIn this note, we argue that the axiomatic requirement of range to the measure of aggregated total uncertainty (ATU) in Dempster-Shafer theory is not reasonable.\nThis file summarizes the plenary talk on laboratory experiments on logic at the TARK 2013 - 14th Conference on Theoretical Aspects of Rationality and Knowledge.\nThis document contains improved and updated proofs of convergence for the sampling method presented in our paper \"Free-configuration Biased Sampling for Motion Planning\".\nIn the dawn of computer science and the eve of neuroscience we participate in rebirth of neuroscience due to new technology that allows us to deeply and precisely explore whole new world that dwells in our brains.\nWe investigate cumulative scheduling in uncertain environments, using constraint programming. We detail in this paper the dynamic sweep filtering algorithm of the FlexC global constraint.\nArtificial intelligence develops techniques and systems whose performance must be evaluated on a regular basis in order to certify and foster progress in the discipline. We will describe and critically assess the different ways AI systems are evaluated. We first focus on the traditional task-oriented evaluation approach. We see that black-box (behavioural evaluation) is becoming more and more common, as AI systems are becoming more complex and unpredictable. We identify three kinds of evaluation: Human discrimination, problem benchmarks and peer confrontation. We describe the limitations of the many evaluation settings and competitions in these three categories and propose several ideas for a more systematic and robust evaluation. 
We then focus on a less customary (and challenging) ability-oriented evaluation approach, where a system is characterised by its (cognitive) abilities, rather than by the tasks it is designed to solve. We discuss several possibilities: the adaptation of cognitive tests used for humans and animals, the development of tests derived from algorithmic information theory or more general approaches under the perspective of universal psychometrics.\nMethods of applying neural networks to control plants are considered. Methods and schemes are described, their advantages and disadvantages are discussed.\nIn this short note we address the issue of expressing norms (such as obligations and prohibitions) in temporal logic. In particular, we address the argument from [Governatori 2015] that norms cannot be expressed in Linear Time Temporal Logic (LTL).\nWe introduce the Xapagy cognitive architecture: a software system designed to perform narrative reasoning. The architecture has been designed from scratch to model and mimic the activities performed by humans when witnessing, reading, recalling, narrating and talking about stories.\nThis paper compares various optimization methods for fuzzy inference system optimization. The optimization methods compared are genetic algorithm, particle swarm optimization and simulated annealing. When these techniques were implemented it was observed that the performance of each technique within the fuzzy inference system classification was context dependent.\nDeontic modalities are here defined in terms of the preference relation explored in our previous work (Osherson and Weinstein, 2012). 
Some consequences of the system are discussed.\nThis document serves as a brief technical report, detailing the processes used to represent and reconstruct simplified polygons using qualitative spatial descriptions, as defined by the eOPRAm qualitative spatial calculus.\nTau-chain is a decentralized peer-to-peer network having three unified faces: Rules, Proofs, and Computer Programs, allowing a generalization of virtually any centralized or decentralized P2P network, together with many new abilities, as we present on this note.\nWe investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks.\nThis note introduces three Bayesian style Multi-armed bandit algorithms: Information-directed sampling, Thompson Sampling and Generalized Thompson Sampling. The goal is to give an intuitive explanation for these three algorithms and their regret bounds, and provide some derivations that are omitted in the original papers.\nThis short note discusses the role of syntax vs. semantics and the interplay between logic, philosophy, and language in computer science and game theory.\nThe authors apply the Generalized Rectangular Model to assessing critical thinking skills and its relations with their language competency.\nThis thesis investigates the generation of new concepts from combinations of existing concepts as a language evolves. We give a method for combining concepts, and will be investigating the utility of composite concepts in language evolution and thence the utility of concept generation.\nWe present a method of training a differentiable function approximator for a regression task using negative examples. We effect this training using negative learning rates. We also show how this method can be used to perform direct policy learning in a reinforcement learning setting.\nThis paper discusses obstacle avoidance using fuzzy logic and shortest path algorithm. 
This paper also introduces the sliding blades problem and illustrates how a drone can navigate itself through the swinging blade obstacles while tracing a semi-optimal path and also maintaining constant velocity\nWe show how to adjust the coefficient of determination ($R^2$) when used for measuring predictive accuracy via leave-one-out cross-validation.\nThis paper discusses the root cause of systems perceiving the self experience and how to exploit adaptive and learning features without introducing ethically problematic system properties.\nGeneral game playing artificial intelligence has recently seen important advances due to the various techniques known as 'deep learning'. However the advances conceal equally important limitations in their reliance on: massive data sets; fortuitously constructed problems; and absence of any human-level complexity, including other human opponents. On the other hand, deep learning systems which do beat human champions, such as in Go, do not generalise well. The power of deep learning simultaneously exposes its weakness. Given that deep learning is mostly clever reconfigurations of well-established methods, moving beyond the state of art calls for forward-thinking visionary solutions, not just more of the same. I present the argument that general game playing artificial intelligence will require a generalised player model. This is because games are inherently human artefacts which therefore, as a class of problems, contain cases which require a human-style problem solving approach. I relate this argument to the performance of state of art general game playing agents. I then describe a concept for a formal category theoretic basis to a generalised player model. This formal model approach integrates my existing 'Behavlets' method for psychologically-derived player modelling:   Cowley, B., Charles, D. (2016). Behavlets: a Method for Practical Player Modelling using Psychology-Based Player Traits and Domain Specific Features. 
User Modeling and User-Adapted Interaction, 26(2), 257-306.\nWe formalize Simplified Boardgames language, which describes a subclass of arbitrary board games. The language structure is based on the regular expressions, which makes the rules easily machine-processable while keeping the rules concise and fairly human-readable.\nThe main purpose of this paper is to study the lattice structure of variable precision rough sets. The notion of variation in precision of rough sets have been further extended to variable precision rough set with variable classification error and its algebraic properties are also studied.\nAn algorithm of solution of the Automatic Classification (AC for brevity) problem is set forth in the paper. In the AC problem, it is required to find one or several partitions, starting with the given pattern matrix or dissimilarity, similarity matrix.\nThis paper investigates the application of consensus clustering and meta-clustering to the set of all possible partitions of a data set. We show that when using a \"complement\" of Rand Index as a measure of cluster similarity, the total-separation partition, putting each element in a separate set, is chosen.\nThis thesis is in the area called computational social choice which is an intersection area of algorithms and social choice theory.\nI make some basic observations about hard takeoff, value alignment, and coherent extrapolated volition, concepts which have been central in analyses of superintelligent AI systems.\nThis chapter describes the history of metaheuristics in five distinct periods, starting long before the first use of the term and ending a long time in the future.\nIn this paper, we argue that the future of Artificial Intelligence research resides in two keywords: integration and embodiment. We support this claim by analyzing the recent advances of the field. 
Regarding integration, we note that the most impactful recent contributions have been made possible through the integration of recent Machine Learning methods (based in particular on Deep Learning and Recurrent Neural Networks) with more traditional ones (e.g. Monte-Carlo tree search, goal babbling exploration or addressable memory systems). Regarding embodiment, we note that the traditional benchmark tasks (e.g. visual classification or board games) are becoming obsolete as state-of-the-art learning algorithms approach or even surpass human performance in most of them, having recently encouraged the development of first-person 3D game platforms embedding realistic physics. Building upon this analysis, we first propose an embodied cognitive architecture integrating heterogenous sub-fields of Artificial Intelligence into a unified framework. We demonstrate the utility of our approach by showing how major contributions of the field can be expressed within the proposed framework. We then claim that benchmarking environments need to reproduce ecologically-valid conditions for bootstrapping the acquisition of increasingly complex cognitive skills through the concept of a cognitive arms race between embodied agents.\nIn this work, answer-set programs that specify repairs of databases are used as a basis for solving computational and reasoning problems about causes for query answers from databases.\nWe show how an action-dependent baseline can be used by the policy gradient theorem using function approximation, originally presented with action-independent baselines by (Sutton et al. 2000).\nThis position paper formalises an abstract model for complex negotiation dialogue. This model is to be used for the benchmark of optimisation algorithms ranging from Reinforcement Learning to Stochastic Games, through Transfer Learning, One-Shot Learning or others.\nThe end (for human scientists) is nigh? 
The posit of this discourse is that the majority, if not all, scientific research will eventually be undertaken by one, or a number of, weak artificial intelligences.\nMagNet and \"Efficient Defenses...\" were recently proposed as a defense to adversarial examples. We find that we can construct adversarial examples that defeat these defenses with only a slight increase in distortion.\nSafety critical systems strongly require the quality aspects of artificial intelligence including explainability. In this paper, we analyzed a trained network to extract features which mainly contribute the inference. Based on the analysis, we developed a simple solution to generate explanations of the inference processes.\nIn this paper, we present Paranom, a parallel anomaly dataset generator. We discuss its design and provide brief experimental results demonstrating its usefulness in improving the classification correctness of LSTM-AD, a state-of-the-art anomaly detection model.\nThis paper provides an overview of the SP theory of intelligence and its central idea that artificial intelligence, mainstream computing, and much of human perception and cognition, may be understood as information compression.   The background and origins of the SP theory are described, and the main elements of the theory, including the key concept of multiple alignment, borrowed from bioinformatics but with important differences. Associated with the SP theory is the idea that redundancy in information may be understood as repetition of patterns, that compression of information may be achieved via the matching and unification (merging) of patterns, and that computing and information compression are both fundamentally probabilistic. It appears that the SP system is Turing-equivalent in the sense that anything that may be computed with a Turing machine may, in principle, also be computed with an SP machine.   
One of the main strengths of the SP theory and the multiple alignment concept is in modelling concepts and phenomena in artificial intelligence. Within that area, the SP theory provides a simple but versatile means of representing different kinds of knowledge, it can model both the parsing and production of natural language, with potential for the understanding and translation of natural languages, it has strengths in pattern recognition, with potential in computer vision, it can model several kinds of reasoning, and it has capabilities in planning, problem solving, and unsupervised learning.   The paper includes two examples showing how alternative parsings of an ambiguous sentence may be modelled as multiple alignments, and another example showing how the concept of multiple alignment may be applied in medical diagnosis.\nDigital pathology is not only one of the most promising fields of diagnostic medicine, but at the same time a hot topic for fundamental research. Digital pathology is not just the transfer of histopathological slides into digital representations. The combination of different data sources (images, patient records, and *omics data) together with current advances in artificial intelligence/machine learning enable to make novel information accessible and quantifiable to a human expert, which is not yet available and not exploited in current medical settings. The grand goal is to reach a level of usable intelligence to understand the data in the context of an application task, thereby making machine decisions transparent, interpretable and explainable. The foundation of such an \"augmented pathologist\" needs an integrated approach: While machine learning algorithms require many thousands of training examples, a human expert is often confronted with only a few data points. Interestingly, humans can learn from such few examples and are able to instantly interpret complex patterns. 
Consequently, the grand goal is to combine the possibilities of artificial intelligence with human intelligence and to find a well-suited balance between them to enable what neither of them could do on their own. This can raise the quality of education, diagnosis, prognosis and prediction of cancer and other diseases. In this paper we describe some (incomplete) research issues which we believe should be addressed in an integrated and concerted effort for paving the way towards the augmented pathologist.\nTriggered by modern technologies, our possibilities may now expand beyond the unthinkable. Cars externally may look similar to decades ago, but a dramatic revolution happened inside the cabin as a result of their computation, communications, and storage capabilities. With the advent of Electric Autonomous Vehicles (EAVs), Artificial Intelligence and ecological technologies found the best synergy. Several transportation problems may be solved (accidents, emissions, and congestion among others), and the foundation of Machine-to-Machine (M2M) economy could be established, in addition to value-added services such as infotainment (information and entertainment).   In the world where intelligent technologies are pervading everyday life, software and algorithms play a major role. Software has been lately introduced in virtually every technological product available on the market, from phones to television sets to cars and even housing. Artificial Intelligence is one of the consequences of this pervasive presence of algorithms. The role of software is becoming dominant and technology is, at times pervasive, of our existence. Concerns, such as privacy and security, demand high attention and have been already explored to some level of detail. However, intelligent agents and actors are often considered as perfect entities that will overcome human error-prone nature. 
This may not always be the case and we advocate that the notion of reputation is also applicable to intelligent artificial agents, in particular to EAVs.\nAn important characteristic of many logics for Artificial Intelligence is their nonmonotonicity. This means that adding a formula to the premises can invalidate some of the consequences. There may, however, exist formulae that can always be safely added to the premises without destroying any of the consequences: we say they respect monotonicity. Also, there may be formulae that, when they are a consequence, can not be invalidated when adding any formula to the premises: we call them conservative. We study these two classes of formulae for preferential logics, and show that they are closely linked to the formulae whose truth-value is preserved along the (preferential) ordering. We will consider some preferential logics for illustration, and prove syntactic characterization results for them. The results in this paper may improve the efficiency of theorem provers for preferential logics.\nRandomized algorithms for deciding satisfiability were shown to be effective in solving problems with thousands of variables. However, these algorithms are not complete. That is, they provide no guarantee that a satisfying assignment, if one exists, will be found. Thus, when studying randomized algorithms, there are two important characteristics that need to be considered: the running time and, even more importantly, the accuracy --- a measure of likelihood that a satisfying assignment will be found, provided one exists. In fact, we argue that without a reference to the accuracy, the notion of the running time for randomized algorithms is not well-defined. In this paper, we introduce a formal notion of accuracy. We use it to define a concept of the running time. We use both notions to study the random walk strategy GSAT algorithm. 
We investigate the dependence of accuracy on properties of input formulas such as clause-to-variable ratio and the number of satisfying assignments. We demonstrate that the running time of GSAT grows exponentially in the number of variables of the input formula for randomly generated 3-CNF formulas and for the formulas encoding 3- and 4-colorability of graphs.\nDecision theory formally solves the problem of rational agents in uncertain worlds if the true environmental probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for an unknown distribution. We unify both theories and give strong arguments that the resulting universal AIXI model behaves optimally in any computable environment. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXI^tl, which is still superior to any other time t and space l bounded agent. The computation time of AIXI^tl is of the order t x 2^l.\nRevision programming is a formalism to describe and enforce updates of belief sets and databases. That formalism was extended by Fitting who assigned annotations to revision atoms. Annotations provide a way to quantify the confidence (probability) that a revision atom holds. The main goal of our paper is to reexamine the work of Fitting, argue that his semantics does not always provide results consistent with intuition, and to propose an alternative treatment of annotated revision programs. Our approach differs from that proposed by Fitting in two key aspects: we change the notion of a model of a program and we change the notion of a justified revision. We show that under this new approach fundamental properties of justified revisions of standard revision programs extend to the annotated case.\nMutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. 
In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows one to efficiently extend the above methods to incomplete samples in an easy and effective way.\nMost traditional artificial intelligence (AI) systems of the past 50 years are either very limited, or based on heuristics, or both. The new millennium, however, has brought substantial progress in the field of theoretically optimal and practically feasible algorithms for prediction, search, inductive inference based on Occam's razor, problem solving, decision making, and reinforcement learning in environments of a very general type. Since inductive inference is at the heart of all inductive sciences, some of the results are relevant not only for AI and computer science but also for physics, provoking nontraditional predictions based on Zuse's thesis of the computer-generated universe.\nFully-automatic general-purpose high-quality machine translation systems (FGH-MT) are extremely difficult to build. In fact, there is no system in the world for any pair of languages which qualifies to be called FGH-MT. The reasons are not far to seek. Translation is a creative process which involves interpretation of the given text by the translator. 
Translation would also vary depending on the audience and the purpose for which it is meant. This would explain the difficulty of building a machine translation system. Since the machine is not capable of interpreting a general text with sufficient accuracy automatically at present (let alone re-expressing it for a given audience), it fails to perform as FGH-MT. FOOTNOTE{The major difficulty that the machine faces in interpreting a given text is the lack of general world knowledge or common sense knowledge.}\nFusion of Artificial Neural Networks (ANN) and Fuzzy Inference Systems (FIS) has attracted growing interest from researchers in various scientific and engineering areas due to the growing need for adaptive intelligent systems to solve real-world problems. ANN learns from scratch by adjusting the interconnections between layers. FIS is a popular computing framework based on the concept of fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning. The advantages of a combination of ANN and FIS are obvious. There are several approaches to integrate ANN and FIS and very often the choice depends on the application. We broadly classify the integration of ANN and FIS into three categories, namely the concurrent model, the cooperative model and the fully fused model. This paper starts with a discussion of the features of each model and generalizes the advantages and deficiencies of each model. We further focus the review on the different types of fused neuro-fuzzy systems and cite the advantages and disadvantages of each model.\nIntelligent systems based on first-order logic on the one hand, and on artificial neural networks (also called connectionist systems) on the other, differ substantially. It would be very desirable to combine the robust neural networking machinery with symbolic knowledge representation and reasoning paradigms like logic programming in such a way that the strengths of either paradigm will be retained. 
Current state-of-the-art research, however, fails by far to achieve this ultimate goal. As one of the main obstacles to be overcome we perceive the question how symbolic knowledge can be encoded by means of connectionist systems: Satisfactory answers to this will naturally lead the way to knowledge extraction algorithms and to integrated neural-symbolic systems.\nW.C. Rounds and G.-Q. Zhang (2001) have proposed to study a form of disjunctive logic programming generalized to algebraic domains. This system allows reasoning with information which is hierarchically structured and forms a (suitable) domain. We extend this framework to include reasoning with default negation, giving rise to a new nonmonotonic reasoning framework on hierarchical knowledge which encompasses answer set programming with extended disjunctive logic programs. We also show that the hierarchically structured knowledge on which programming in this paradigm can be done, arises very naturally from formal concept analysis. Together, we obtain a default reasoning paradigm for conceptual knowledge which is in accordance with mainstream developments in nonmonotonic reasoning.\nWater plays a pivotal role in many physical processes, and most importantly in sustaining human life, animal life and plant life. Water supply entities therefore have the responsibility to supply clean and safe water at the rate required by the consumer. It is therefore necessary to implement mechanisms and systems that can be employed to predict both short-term and long-term water demands. The increasingly growing field of computational intelligence techniques has been proposed as an efficient tool in the modelling of dynamic phenomena. The primary objective of this paper is to compare the efficiency of two computational intelligence techniques in water demand forecasting. The techniques under comparison are the Artificial Neural Networks (ANNs) and the Support Vector Machines (SVMs). 
In this study it was observed that the ANNs perform better than the SVMs. This performance is measured against the generalisation ability of the two.\nWhen Kurt Goedel laid the foundations of theoretical computer science in 1931, he also introduced essential concepts of the theory of Artificial Intelligence (AI). Although much of subsequent AI research has focused on heuristics, which still play a major role in many practical AI applications, in the new millennium AI theory has finally become a full-fledged formal science, with important optimality results for embodied agents living in unknown environments, obtained through a combination of theory a la Goedel and probability theory. Here we look back at important milestones of AI history, mention essential recent results, and speculate about what we may expect from the next 25 years, emphasizing the significance of the ongoing dramatic hardware speedups, and discussing Goedel-inspired, self-referential, self-improving universal problem solvers.\nThis paper introduces the concept of fitness cloud as an alternative way to visualize and analyze search spaces than given by the geographic notion of fitness landscape. It is argued that the fitness cloud concept overcomes several deficiencies of the landscape representation. Our analysis is based on the correlation between fitness of solutions and fitnesses of nearest solutions according to some neighboring. We focus on the behavior of local search heuristics, such as hill climber, on the well-known NK fitness landscape. In both cases the fitness vs. fitness correlation is shown to be related to the epistatic parameter K.\nThe generation of meaningless \"words\" matching certain statistical and/or linguistic criteria is frequently needed for experimental purposes in Psycholinguistics. Such stimuli receive the name of pseudowords or nonwords in the Cognitive Neuroscience literature. 
The process for building nonwords sometimes has to be based on linguistic units such as syllables or morphemes, resulting in a numerical explosion of combinations when the size of the nonwords is increased. In this paper, a reactive tabu search scheme is proposed to generate nonwords of variable size. The approach builds pseudowords by using a modified metaheuristic algorithm based on a local search procedure enhanced by a feedback-based scheme. Experimental results show that the new algorithm is a practical and effective tool for nonword generation.\nDirection relations between extended spatial objects are important commonsense knowledge. Recently, Goyal and Egenhofer proposed a formal model, known as Cardinal Direction Calculus (CDC), for representing direction relations between connected plane regions. CDC is perhaps the most expressive qualitative calculus for directional information, and has attracted increasing interest from areas such as artificial intelligence, geographical information science, and image retrieval. Given a network of CDC constraints, the consistency problem is deciding if the network is realizable by connected regions in the real plane. This paper provides a cubic algorithm for checking consistency of basic CDC constraint networks, and proves that reasoning with CDC is in general an NP-Complete problem. For a consistent network of basic CDC constraints, our algorithm also returns a 'canonical' solution in cubic time. This cubic algorithm is also adapted to cope with cardinal directions between possibly disconnected regions, in which case currently the best algorithm is of time complexity O(n^5).\nWhy did the big bang occur, why do the laws and constants of nature as well as the boundary conditions seem so fine-tuned for life, what is the role of intelligence and self-consciousness in the universe, and how can it escape cosmic doomsday? 
The hypothesis of Cosmological Artificial Selection (CAS) connects those questions and suggests a far-reaching answer: Our universe might be understood in terms of vast computer simulations and could even have been created and transcended by one. - This essay critically discusses some of the premises and implications of CAS and related problems both with the proposal itself and its possible physical realization: Is our universe really fine-tuned, does CAS deserve to be considered as a convincing explanation, and which other options are available to understand the physical laws, constants and boundary conditions? Is life incidental, and does CAS revalue it? And is intelligence and self-consciousness ultimately doomed, or might CAS rescue it?   Keywords: origin of the universe, big bang, fine-tuning, laws of nature, physical constants, initial conditions, intelligent life, cosmological natural selection, cosmological artificial selection, artificial cosmogenesis, deism, natural theology, far future of the universe, physical eschatology\nThe aim of this work is to address the question of whether we can in principle design rational decision-making agents or artificial intelligences embedded in computable physics such that their decisions are optimal in reasonable mathematical senses. Recent developments in rare event probability estimation, recursive bayesian inference, neural networks, and probabilistic planning are sufficient to explicitly approximate reinforcement learners of the AIXI style with non-trivial model classes (here, the class of resource-bounded Turing machines). Consideration of the effects of resource limitations in a concrete implementation leads to insights about possible architectures for learning systems using optimal decision makers as components.\nThe methodology of Bayesian Model Averaging (BMA) is applied for assessment of newborn brain maturity from sleep EEG. 
In theory this methodology provides the most accurate assessments of uncertainty in decisions. However, the existing BMA techniques have been shown to provide biased assessments in the absence of prior information that would enable the model parameter space to be explored in detail within a reasonable time. This lack of detail leads to disproportional sampling from the posterior distribution. In the case of the EEG assessment of brain maturity, BMA results can be biased because of the absence of information about EEG feature importance. In this paper we explore how the posterior information about EEG features can be used in order to reduce the negative impact of disproportional sampling on BMA performance. We use EEG data recorded from sleeping newborns to test the efficiency of the proposed BMA technique.\nSymmetry can be used to help solve many problems. For instance, Einstein's famous 1905 paper (\"On the Electrodynamics of Moving Bodies\") uses symmetry to help derive the laws of special relativity. In artificial intelligence, symmetry has played an important role in both problem representation and reasoning. I describe recent work on using symmetry to help solve constraint satisfaction problems. Symmetries occur within individual solutions of problems as well as between different solutions of the same problem. Symmetry can also be applied to the constraints in a problem to give new symmetric constraints. Reasoning about symmetry can speed up problem solving, and has led to the discovery of new results in both graph and number theory.\nThe constraint satisfaction problem (CSP) is a central generic problem in computer science and artificial intelligence: it provides a common framework for many theoretical problems as well as for many real-life applications. Soft constraint problems are a generalisation of the CSP which allow the user to model optimisation problems. Considerable effort has been made in identifying properties which ensure tractability in such problems. 
In this work, we initiate the study of hybrid tractability of soft constraint problems; that is, properties which guarantee tractability of the given soft constraint problem, but which do not depend only on the underlying structure of the instance (such as being tree-structured) or only on the types of soft constraints in the instance (such as submodularity). We present several novel hybrid classes of soft constraint problems, which include a machine scheduling problem, constraint problems of arbitrary arities with no overlapping nogoods, and the SoftAllDiff constraint with arbitrary unary soft constraints. An important tool in our investigation will be the notion of forbidden substructures.\nThe mathematical formalism of quantum mechanics has been successfully employed in the last years to model situations in which the use of classical structures gives rise to problematical situations, and where typically quantum effects, such as 'contextuality' and 'entanglement', have been recognized. This 'Quantum Interaction Approach' is briefly reviewed in this paper focusing, in particular, on the quantum models that have been elaborated to describe how concepts combine in cognitive science, and on the ensuing identification of a quantum structure in human thought. We point out that these results provide interesting insights toward the development of a unified theory for meaning and knowledge formalization and representation. Then, we analyze the technological aspects and implications of our approach, and a particular attention is devoted to the connections with symbolic artificial intelligence, quantum computation and robotics.\nIn the recent literature of Artificial Intelligence, an intensive research effort has been spent, for various algebras of qualitative relations used in the representation of temporal and spatial knowledge, on the problem of classifying the computational complexity of reasoning problems for subsets of algebras. 
The main purpose of this research is to describe a restricted set of maximal tractable subalgebras, ideally in an exhaustive fashion with respect to the hosting algebras. In this paper we introduce a novel algebra for reasoning about Spatial Congruence, show that the satisfiability problem in the spatial algebra MC-4 is NP-complete, and present a complete classification of tractability in the algebra, based on the identification of three maximal tractable subclasses, one containing the basic relations. The three subalgebras are formed by 14, 10 and 9 of the 16 relations that form the full algebra.\nSupply chain formation is the process of determining the structure and terms of exchange relationships to enable a multilevel, multiagent production activity. We present a simple model of supply chains, highlighting two characteristic features: hierarchical subtask decomposition, and resource contention. To decentralize the formation process, we introduce a market price system over the resources produced along the chain. In a competitive equilibrium for this system, agents choose locally optimal allocations with respect to prices, and outcomes are optimal overall. To determine prices, we define a market protocol based on distributed, progressive auctions, and myopic, non-strategic agent bidding policies. In the presence of resource contention, this protocol produces better solutions than the greedy protocols common in the artificial intelligence and multiagent systems literature. The protocol often converges to high-value supply chains, and when competitive equilibria exist, typically to approximate competitive equilibria. However, complementarities in agent production technologies can cause the protocol to wastefully allocate inputs to agents that do not produce their outputs. 
A subsequent decommitment phase recovers a significant fraction of the lost surplus.\nMany researchers in artificial intelligence are beginning to explore the use of soft constraints to express a set of (possibly conflicting) problem requirements. A soft constraint is a function defined on a collection of variables which associates some measure of desirability with each possible combination of values for those variables. However, the crucial question of the computational complexity of finding the optimal solution to a collection of soft constraints has so far received very little attention. In this paper we identify a class of soft binary constraints for which the problem of finding the optimal solution is tractable. In other words, we show that for any given set of such constraints, there exists a polynomial time algorithm to determine the assignment having the best overall combined measure of desirability. This tractable class includes many commonly-occurring soft constraints, such as 'as near as possible' or 'as soon as possible after', as well as crisp constraints such as 'greater than'. Finally, we show that this tractable class is maximal, in the sense that adding any other form of soft binary constraint which is not in the class gives rise to a class of problems which is NP-hard.\nLarge-scale, parallel clusters composed of commodity processors are increasingly available, enabling the use of vast processing capabilities and distributed RAM to solve hard search problems. We investigate Hash-Distributed A* (HDA*), a simple approach to parallel best-first search that asynchronously distributes and schedules work among processors based on a hash function of the search state. We use this approach to parallelize the A* algorithm in an optimal sequential version of the Fast Downward planner, as well as a 24-puzzle solver. 
The scaling behavior of HDA* is evaluated experimentally on a shared memory, multicore machine with 8 cores, a cluster of commodity machines using up to 64 cores, and large-scale high-performance clusters, using up to 2400 processors. We show that this approach scales well, allowing the effective utilization of large amounts of distributed memory to optimally solve problems which require terabytes of RAM. We also compare HDA* to Transposition-table Driven Scheduling (TDS), a hash-based parallelization of IDA*, and show that, in planning, HDA* significantly outperforms TDS. A simple hybrid which combines HDA* and TDS to exploit strengths of both algorithms is proposed and evaluated.\nMany methods have been developed to secure the network infrastructure and communication over the Internet. Intrusion detection is a relatively new addition to such techniques. Intrusion detection systems (IDS) are used to find out if someone has intruded into, or is trying to get into, the network. One big problem is the number of intrusions, which is increasing day by day. We need to obtain network attack information using IDS and then analyse its effect. Because IDSs are solely signature based, not every new intrusion can be detected; so it is important to introduce artificial intelligence (AI) methods / techniques in IDS. Introduction of AI necessitates the importance of normalization in intrusions. This work is focused on classification of AI based IDS techniques which will help better design intrusion detection systems in the future. We have also proposed a support vector machine for IDS to detect Smurf attacks with highly reliable accuracy.\nThe rapid growth and diversity in service offerings and the ensuing complexity of information technology ecosystems present numerous management challenges (both operational and strategic). Instrumentation and measurement technology is, by and large, keeping pace with this development and growth. 
However, the algorithms, tools, and technology required to transform the data into relevant information for decision making are not. The claim in this paper (and the invited talk) is that the line of research conducted in Uncertainty in Artificial Intelligence is very well suited to address the challenges and close this gap. I will support this claim and discuss open problems using recent examples in diagnosis, model discovery, and policy optimization on three real life distributed systems.\nInferring from inconsistency and making decisions are two problems which have always been treated separately by researchers in Artificial Intelligence. Consequently, different models have been proposed for each category. Different argumentation systems [2, 7, 10, 11] have been developed for handling inconsistency in knowledge bases. Recently, other argumentation systems [3, 4, 8] have been defined for making decisions under uncertainty. The aim of this paper is to present a general argumentation framework in which both inferring from inconsistency and decision making are captured. The proposed framework can be used for decision under uncertainty, multiple criteria decision, rule-based decision and finally case-based decision. Moreover, works on classical decision suppose that the information about the environment is coherent, and this is no longer required by this general framework.\nDecision-theoretic planning with risk-sensitive planning objectives is important for building autonomous agents or decision-support systems for real-world applications. However, this line of research has been largely ignored in the artificial intelligence and operations research communities since planning with risk-sensitive planning objectives is more complicated than planning with risk-neutral planning objectives. 
To remedy this situation, we derive conditions that guarantee that the optimal expected utilities of the total plan-execution reward exist and are finite for fully observable Markov decision process models with non-linear utility functions. In case of Markov decision process models with both positive and negative rewards, most of our results hold for stationary policies only, but we conjecture that they can be generalized to non stationary policies.\nPoker is a challenging problem for artificial intelligence, with non-deterministic dynamics, partial observability, and the added difficulty of unknown adversaries. Modelling all of the uncertainties in this domain is not an easy task. In this paper we present a Bayesian probabilistic model for a broad class of poker games, separating the uncertainty in the game dynamics from the uncertainty of the opponent's strategy. We then describe approaches to two key subproblems: (i) inferring a posterior over opponent strategies given a prior distribution and observations of their play, and (ii) playing an appropriate response to that distribution. We demonstrate the overall approach on a reduced version of poker using Dirichlet priors and then on the full game of Texas hold'em using a more informed prior. We demonstrate methods for playing effective responses to the opponent, based on the posterior.\nPlanning in partially observable Markov decision processes (POMDPs) remains a challenging topic in the artificial intelligence community, in spite of recent impressive progress in approximation techniques. Previous research has indicated that online planning approaches are promising in handling large-scale POMDP domains efficiently as they make decisions \"on demand\" instead of proactively for the entire state space. We present a Factored Hybrid Heuristic Online Planning (FHHOP) algorithm for large POMDPs. 
FHHOP gets its power by combining a novel hybrid heuristic search strategy with a recently developed factored state representation. On several benchmark problems, FHHOP substantially outperformed state-of-the-art online heuristic search approaches in terms of both scalability and quality.\nIntuitively, the concept of similarity is the notion to measure an inexact matching between two entities of the same reference set. The notions of similarity and its close relative dissimilarity are widely used in many fields of Artificial Intelligence. Yet they have many different and often partial definitions or properties, usually restricted to one field of application and thus incompatible with other uses. This paper contributes to the design and understanding of similarity and dissimilarity measures for Artificial Intelligence. A formal dual definition for each concept is proposed, joined with a set of fundamental properties. The behavior of the properties under several transformations is studied and revealed as an important matter to bear in mind. We also develop several practical examples that work out the proposed approach.\nWe present four novel approximation algorithms for finding triangulation of minimum treewidth. Two of the algorithms improve on the running times of algorithms by Robertson and Seymour, and Becker and Geiger that approximate the optimum by factors of 4 and 3 2/3, respectively. A third algorithm is faster than those but gives an approximation factor of 4 1/2. The last algorithm is yet faster, producing factor-O(lg/k) approximations in polynomial time. Finding triangulations of minimum treewidth for graphs is central to many problems in computer science. Real-world problems in artificial intelligence, VLSI design and databases are efficiently solvable if we have an efficient approximation algorithm for them. 
We report on experimental results confirming the effectiveness of our algorithms for large graphs associated with real-world problems.\nThere is available an ever-increasing variety of procedures for managing uncertainty. These methods are discussed in the literature of artificial intelligence, as well as in the literature of philosophy of science. Heretofore these methods have been evaluated by intuition, discussion, and the general philosophical method of argument and counterexample. Almost any method of uncertainty management will have the property that in the long run it will deliver numbers approaching the relative frequency of the kinds of events at issue. To find a measure that will provide a meaningful evaluation of these treatments of uncertainty, we must look, not at the long run, but at the short or intermediate run. Our project attempts to develop such a measure in terms of short or intermediate length performance. We represent the effects of practical choices by the outcomes of bets offered to agents characterized by two uncertainty management approaches: the subjective Bayesian approach and the Classical confidence interval approach. Experimental evaluation suggests that the confidence interval approach can outperform the subjective approach in the relatively short run.\nWe introduce a new class of graphical representations, expected utility networks (EUNs), and discuss some of its properties and potential applications to artificial intelligence and economic theory. In EUNs not only probabilities, but also utilities enjoy a modular representation. EUNs are undirected graphs with two types of arc, representing probability and utility dependencies respectively. The representation of utilities is based on a novel notion of conditional utility independence, which we introduce and discuss in the context of other existing proposals. 
Just as probabilistic inference involves the computation of conditional probabilities, strategic inference involves the computation of conditional expected utilities for alternative plans of action. We define a new notion of conditional expected utility (EU) independence, and show that in EUNs node separation with respect to the probability and utility subgraphs implies conditional EU independence.\nWe extend Gaussian networks (directed acyclic graphs that encode probabilistic relationships between variables) to their vector form. Vector Gaussian continuous networks consist of composite nodes representing multivariates that take continuous values. These vector or composite nodes can represent correlations between parents, as opposed to conventional univariate nodes. We derive rules for inference in these networks based on two methods: message propagation and topology transformation. These two approaches lead to the development of algorithms that can be implemented in either a centralized or a decentralized manner. The domains of application of these networks are monitoring and estimation problems. This new representation along with the rules for inference developed here can be used to derive current Bayesian algorithms such as the Kalman filter, and provide a rich foundation to develop new algorithms. We illustrate this process by deriving the decentralized form of the Kalman filter. This work unifies concepts from artificial intelligence and modern control theory.\nSequential decision tasks with incomplete information are characterized by the exploration problem; namely the trade-off between further exploration for learning more about the environment and immediate exploitation of the accrued information for decision-making. Within artificial intelligence, there has been an increasing interest in studying planning-while-learning algorithms for these decision tasks. 
In this paper we focus on the exploration problem in reinforcement learning and Q-learning in particular. The existing exploration strategies for Q-learning are of a heuristic nature and they exhibit limited scalability in tasks with large (or infinite) state and action spaces. Efficient experimentation is needed for resolving uncertainties when possible plans are compared (i.e. exploration). The experimentation should be sufficient for selecting with statistical significance a locally optimal plan (i.e. exploitation). For this purpose, we develop a probabilistic hill-climbing algorithm that uses a statistical selection procedure to decide how much exploration is needed for selecting a plan which is, with arbitrarily high probability, arbitrarily close to a locally optimal one. Due to its generality, the algorithm can be employed for the exploration strategy of robust Q-learning. An experiment on a relatively complex control task shows that the proposed exploration strategy performs better than a typical exploration strategy.\nSeveral Artificial Intelligence schemes for reasoning under uncertainty explore either explicitly or implicitly asymmetries among probabilities of various states of their uncertain domain models. Even though the correct working of these schemes is practically contingent upon the existence of a small number of probable states, no formal justification has been proposed of why this should be the case. This paper attempts to fill this apparent gap by studying asymmetries among probabilities of various states of uncertain models. By rewriting the joint probability distribution over a model's variables into a product of individual variables' prior and conditional probability distributions, and applying the central limit theorem to this product, we can demonstrate that the probabilities of individual states of the model can be expected to be drawn from highly skewed, log-normal distributions.
With sufficient asymmetry in individual prior and conditional probability distributions, a small fraction of states can be expected to cover a large portion of the total probability space, with the remaining states having practically negligible probability. Theoretical discussion is supplemented by simulation results and an illustrative real-world example.\nThis paper compares three approaches to the problem of selecting among probability models to fit data: (1) use of statistical criteria such as Akaike's information criterion and Schwarz's \"Bayesian information criterion,\" (2) maximization of the posterior probability of the model, and (3) maximization of an effectiveness ratio trading off accuracy and computational cost. The unifying characteristic of the approaches is that all can be viewed as maximizing a penalized likelihood function. The second approach with suitable prior distributions has been shown to reduce to the first. This paper shows that the third approach reduces to the second for a particular form of the effectiveness ratio, and illustrates all three approaches with the problem of selecting the number of components in a mixture of Gaussian distributions. Unlike the first two approaches, the third can be used even when the candidate models are chosen for computational efficiency, without regard to physical interpretation, so that the likelihood and the prior distribution over models cannot be interpreted literally. As the most general and computationally oriented of the approaches, it is especially useful for artificial intelligence applications.\nBayesian networks have been used extensively in diagnostic tasks in domains such as medicine, where they represent the dependency relations between a set of symptoms and a set of diseases. A criticism of this type of knowledge representation is that it is restricted to this kind of task, and that it cannot cope with the knowledge required in other artificial intelligence applications.
For example, in computer vision, we require the ability to model complex knowledge, including temporal and relational factors. In this paper we extend Bayesian networks to model relational and temporal knowledge for high-level vision. These extended networks have a simple structure which permits us to propagate probability efficiently. We have applied them to the domain of endoscopy, illustrating how the general modelling principles can be used in specific cases.\nAt last year's Uncertainty in AI Conference, we reported the results of a sensitivity analysis study of Pathfinder. Our findings were quite unexpected: slight variations to Pathfinder's parameters appeared to lead to substantial degradations in system performance. A careful look at our first analysis, together with the valuable feedback provided by the participants of last year's conference, led us to conduct a follow-up study. Our follow-up differs from our initial study in two ways: (i) the probabilities 0.0 and 1.0 remained unchanged, and (ii) the variations to the probabilities that are close to either end (0.0 or 1.0) were smaller than those close to the middle (0.5). The results of the follow-up study look more reasonable: slight variations to Pathfinder's parameters now have little effect on its performance. Taken together, these two sets of results suggest a viable extension of a common decision-analytic sensitivity analysis to the larger, more complex settings generally encountered in artificial intelligence.\nWe study the model of projective simulation (PS), a novel approach to artificial intelligence based on stochastic processing of episodic memory which was recently introduced [H.J. Briegel and G. De las Cuevas. Sci. Rep. 2, 400, (2012)]. Here we provide a detailed analysis of the model and examine its performance, including its achievable efficiency, its learning times and the way both properties scale with the problems' dimension.
In addition, we situate the PS agent in different learning scenarios, and study its learning abilities. A variety of new scenarios are considered, thereby demonstrating the model's flexibility. Furthermore, to put the PS scheme in context, we compare its performance with those of Q-learning and learning classifier systems, two popular models in the field of reinforcement learning. It is shown that PS is a competitive artificial intelligence model of unique properties and strengths.\nAn artificial Ant Colony System (ACS) algorithm for solving general-purpose combinatorial Optimization Problems (COP), which extends previous AC models [21] by including a negative pheromone, is described here. Several Travelling Salesman Problem (TSP) instances were used as benchmarks. We show that by using two different sets of pheromones, a second-order co-evolved compromise between positive and negative feedbacks achieves better results than single positive feedback systems. The algorithm was tested against known NP-complete combinatorial Optimization Problems, running on symmetric TSPs. We show that the new algorithm compares favourably against these benchmarks, in accordance with recent biological findings by Robinson [26,27] and Gruter [28], where \"No entry\" signals and negative feedback allow a colony to quickly reallocate the majority of its foragers to superior food patches. This is the first time an extended ACS algorithm is implemented with these successful characteristics.\nIn this paper, we illustrate the history of Artificial Intelligence (AI) with a statistical analysis of publications since 1940. We collected and mined the IEEE publication database to analyze the geographical and chronological variation in AI research activity. The connections between different institutes are shown. The result shows that the leading AI research communities are mainly in the USA, China, Europe and Japan.
The key institutes, authors and research hotspots are revealed. It is found that the research institutes in fields like Data Mining, Computer Vision, Pattern Recognition and some other fields of Machine Learning are quite consistent, implying strong interaction among the communities of these fields. It is also shown that research in Electronic Engineering and in Industrial or Commercial applications is very active in California. Japan also publishes many papers in robotics. Due to the limitation of the data source, the result might be overly influenced by the number of published articles; we mitigate this as far as possible by applying network key-node analysis to the research community instead of merely counting the number of publications.\nBound propagation is an important Artificial Intelligence technique used in Constraint Programming tools to deal with numerical constraints. It is typically embedded within a search procedure (\"branch and prune\") and used at every node of the search tree to narrow down the search space, so it is critical that it be fast. The procedure invokes constraint propagators until a common fixpoint is reached, but the known algorithms for this have a pseudo-polynomial worst-case time complexity: they are fast indeed when the variables have a small numerical range, but they have the well-known problem of being prohibitively slow when these ranges are large. An important question is therefore whether strongly-polynomial algorithms exist that compute the common bound-consistent fixpoint of a set of constraints. This paper answers this question. In particular we show that this fixpoint computation is in fact NP-complete, even when restricted to binary linear constraints.\nMany Artificial Intelligence tasks cannot be evaluated with a single quality criterion and some sort of weighted combination is needed to provide system rankings.
A problem of weighted combination measures is that slight changes in the relative weights may produce substantial changes in the system rankings. This paper introduces the Unanimous Improvement Ratio (UIR), a measure that complements standard metric combination criteria (such as van Rijsbergen's F-measure) and indicates how robust the measured differences are to changes in the relative weights of the individual metrics. UIR is meant to elucidate whether a perceived difference between two systems is an artifact of how individual metrics are weighted. Besides discussing the theoretical foundations of UIR, this paper presents empirical results that confirm the validity and usefulness of the metric for the Text Clustering problem, where there is a tradeoff between precision- and recall-based metrics and results are particularly sensitive to the weighting scheme used to combine them. Remarkably, our experiments show that UIR can be used as a predictor of how well differences between systems measured on a given test bed will also hold in a different test bed.\nReal Time Strategy (RTS) games provide a complex domain for testing the latest artificial intelligence (AI) research. In much of the literature, AI systems have been limited to playing one game. Although this specialization has resulted in stronger AI gaming systems, it does not address the key concerns of AI researchers. AI researchers seek the development of AI agents that can autonomously interpret, learn, and apply new knowledge. To achieve human-level performance, current AI systems rely on an expert's game-specific knowledge. The paper presents the full RTS language in hopes of shifting the current research focus to the development of general RTS agents. General RTS agents are AI gaming systems that can play any RTS game defined in the RTS language.
This prevents game-specific knowledge from being hard-coded into the system, thereby facilitating research that addresses the fundamental concerns of artificial intelligence.\nWe extend the notion of anti-unification to cover equational theories and present a method based on regular tree grammars to compute a finite representation of E-generalization sets. We present a framework to combine Inductive Logic Programming and E-generalization that includes an extension of Plotkin's lgg theorem to the equational case. We demonstrate the potential power of E-generalization by three example applications: computation of suggestions for auxiliary lemmas in equational inductive proofs, computation of construction laws for given term sequences, and learning of screen editor command sequences.\nMutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows one to efficiently extend the above methods to incomplete samples in an easy and effective way.\nPath planning is typically considered in Artificial Intelligence as a graph search problem, and R* is a state-of-the-art algorithm tailored to solve it.
The algorithm decomposes a given path-finding task into a series of subtasks, each of which can be easily (in the computational sense) solved by well-known methods (such as A*). Parameterized random choice is used to perform the decomposition, and as a result R* performance largely depends on the choice of its input parameters. In our work we formulate a range of assumptions concerning possible upper and lower bounds of R* parameters, their interdependency and their influence on R* performance. Then we evaluate these assumptions by running a large number of experiments. As a result we formulate a set of heuristic rules which can be used to initialize the values of R* parameters in a way that leads to the algorithm's best performance.\nArtificially intelligent agents equipped with strategic skills that can negotiate during their interactions with other natural or artificial agents are still underdeveloped. This paper describes a successful application of Deep Reinforcement Learning (DRL) for training intelligent agents with strategic conversational skills, in a situated dialogue setting. Previous studies have modelled the behaviour of strategic agents using supervised learning and traditional reinforcement learning techniques, the latter using tabular representations or learning with linear function approximation. In this study, we apply DRL with a high-dimensional state space to the strategic board game of Settlers of Catan, where players can offer resources in exchange for others and they can also reply to offers made by other players. Our experimental results report that the DRL-based learnt policies significantly outperformed several baselines including random, rule-based, and supervised-based behaviours. The DRL-based policy has a 53% win rate versus 3 automated players ('bots'), whereas a supervised player trained on a dialogue corpus in this setting achieved only 27% against the same 3 bots.
This result supports the claim that DRL is a promising framework for training dialogue systems, and strategic agents with negotiation abilities.\nVerification and validation of agentic behavior have been suggested as important research priorities in efforts to reduce risks associated with the creation of general artificial intelligence (Russell et al 2015). In this paper we question the appropriateness of using language of certainty with respect to efforts to manage that risk. We begin by establishing a very general formalism to characterize agentic behavior and to describe standards of acceptable behavior. We show that determination of whether an agent meets any particular standard is not computable. We discuss the extent of the burden associated with verification by manual proof and by automated behavioral governance. We show that to ensure decidability of the behavioral standard itself, one must further limit the capabilities of the agent. We then demonstrate that if our concerns relate to outcomes in the physical world, attempts at validation are futile. Finally, we show that layered architectures aimed at making these challenges tractable mistakenly equate intentions with actions or outcomes, thereby failing to provide any guarantees. We conclude with a discussion of why language of certainty should be eradicated from the conversation about the safety of general artificial intelligence.\nObjective. To pilot test an artificial intelligence (AI) algorithm that selects peer change agents (PCA) to disseminate HIV testing messaging in a population of homeless youth. Methods. We recruited and assessed 62 youth at baseline, 1 month (n = 48), and 3 months (n = 38). A Facebook app collected preliminary social network data. Eleven PCAs selected by AI attended a 1-day training and 7 weekly booster sessions. Mixed-effects models with random effects were used to assess change over time. Results. 
Significant change over time was observed in past 6-month HIV testing (57.9%, 82.4%, 76.3%; p < .05) but not in condom use (63.9%, 65.7%, 65.8%). Most youth reported speaking to a PCA about HIV prevention (72.0% at 1 month, 61.5% at 3 months). Conclusions. AI is a promising avenue for implementing PCA models for homeless youth. Increasing rates of regular HIV testing is critical to HIV prevention and linking homeless youth to treatment.\nIn recent years, researchers in decision analysis and artificial intelligence (AI) have used Bayesian belief networks to build models of expert opinion. Using standard methods drawn from the theory of computational complexity, workers in the field have shown that the problem of exact probabilistic inference on belief networks almost certainly requires exponential computation in the worst case [3]. We have previously described a randomized approximation scheme, called BN-RAS, for computation on belief networks [1, 2, 4]. We gave precise analytic bounds on the convergence of BN-RAS and showed how to trade running time for accuracy in the evaluation of posterior marginal probabilities. We now extend our previous results and demonstrate the generality of our framework by applying similar mathematical techniques to the analysis of convergence for logic sampling [7], an alternative simulation algorithm for probabilistic inference.\nRule-based reasoning (RBR) and case-based reasoning (CBR) have emerged as two important and complementary reasoning methodologies in artificial intelligence (AI). For problem solving in complex, real-world situations, it is useful to integrate RBR and CBR. This paper presents an approach to achieve a compact and seamless integration of RBR and CBR within the base architecture of rules. The paper focuses on the possibilistic nature of the approximate reasoning methodology common to both CBR and RBR. In CBR, the concept of similarity is cast as the complement of the distance between cases.
In RBR, the transitivity of similarity is the basis for the approximate deductions based on the generalized modus ponens. It is shown that the integration of CBR and RBR is possible without altering the inference engine of RBR. This integration is illustrated in the financial domain of mergers and acquisitions. These ideas have been implemented in a prototype system called MARS.\nIn this paper, we first consider a Bayesian framework and model the \"utility function\" in terms of fuzzy random variables. On the basis of this model, we define the \"prior (fuzzy) expected utility\" associated with each action, and the corresponding \"posterior (fuzzy) expected utility given sample information from a random experiment\". The aim of this paper is to analyze how sample information can affect the expected utility. In this way, by using some fuzzy preference relations, we conclude that sample information allows a decision maker to increase the expected utility on average. The upper bound on the value of the expected utility is attained when the decision maker has perfect information. Applications of this work to the field of artificial intelligence are presented through two examples.\nThis paper shows that the common method used for making predictions under uncertainty in AI and science is in error. This method is to use currently available data to select the best model from a given class of models (this process is called abduction) and then to use this model to make predictions about future data. The correct method requires averaging over all the models to make a prediction; we call this method transduction. Using transduction, an AI system will not give misleading results when basing predictions on small amounts of data, when no model is clearly best.
For common classes of models we show that the optimal solution can be given in closed form.\nDuring the ongoing debate over the representation of uncertainty in Artificial Intelligence, Cheeseman, Lemmer, Pearl, and others have argued that probability theory, and in particular the Bayesian theory, should be used as the basis for the inference mechanisms of Expert Systems dealing with uncertainty. In order to pursue the issue in a practical setting, sophisticated tools for knowledge engineering are needed that allow flexible and understandable interaction with the underlying knowledge representation schemes. This paper describes a Generalized Bayesian framework for building expert systems which function in uncertain domains, using algorithms proposed by Lemmer. It is neither rule-based nor frame-based, and requires a new system of knowledge engineering tools. The framework we describe provides a knowledge-based system architecture with an inference engine, explanation capability, and a unique aid for building consistent knowledge bases.\nBelief updating schemes in artificial intelligence may be viewed as three-dimensional languages, consisting of a syntax (e.g. probabilities or certainty factors), a calculus (e.g. Bayesian or CF combination rules), and a semantics (i.e. cognitive interpretations of competing formalisms). This paper studies the rational scope of those languages on syntax and calculus grounds. In particular, the paper presents an endomorphism theorem which highlights the limitations imposed by the conditional independence assumptions implicit in the CF calculus. Implications of the theorem for the relationship between the CF and the Bayesian languages and the Dempster-Shafer theory of evidence are presented. The paper concludes with a discussion of some implications for rule-based knowledge engineering in uncertain domains.\nThis paper describes a machine induction program (WITT) that attempts to model human categorization.
Properties of categories to which human subjects are sensitive include best or prototypical members, relative contrasts between putative categories, and polymorphy (neither necessary nor sufficient features). This approach represents an alternative to the usual Artificial Intelligence approaches to generalization and conceptual clustering, which tend to focus on necessary and sufficient feature rules, equivalence classes, and simple search and match schemes. WITT is shown to be more consistent with human categorization while potentially including results produced by more traditional clustering schemes. Applications of this approach in the domains of expert systems and information retrieval are also discussed.\nArtificial intelligence applications such as industrial robotics, military surveillance, and hazardous environment clean-up require situation understanding based on partial, uncertain, and ambiguous or erroneous evidence. It is necessary to evaluate the relative likelihood of multiple possible hypotheses of the (current) situation faced by the decision-making program. Often, the evidence and hypotheses are hierarchical in nature. In image understanding tasks, for example, evidence begins with raw imagery, from which ambiguous features are extracted which have multiple possible aggregations providing evidential support for the presence of multiple hypotheses of objects and terrain, which in turn aggregate in multiple ways to provide partial evidence for different interpretations of the ambient scene. Information fusion for military situation understanding has a similar evidence/hypothesis hierarchy from multiple-sensor through message-level interpretations, and also provides evidence at multiple levels of the doctrinal hierarchy of military forces.\nThis is a preliminary theoretical discussion on the computational requirements of state-of-the-art smoothed particle hydrodynamics (SPH) from the perspective of pattern recognition and artificial intelligence.
It is pointed out in the present paper that, when including anisotropy detection to improve resolution on shock layers, SPH is a very peculiar case of unsupervised machine learning. On the other hand, the free-particle nature of SPH opens an opportunity for artificial intelligence to study particles as agents acting in a collaborative framework in which the timed outcomes of a fluid simulation form a large knowledge base, which might be very attractive for phenomenological problems in computational astrophysics like self-propagating star formation.\nIn many technical fields, single-objective optimization procedures in continuous domains involve expensive numerical simulations. In this context, an improvement of the Artificial Bee Colony (ABC) algorithm, called the Artificial super-Bee enhanced Colony (AsBeC), is presented. AsBeC is designed to provide fast convergence speed, high solution accuracy and robust performance over a wide range of problems. It implements enhancements of the ABC structure and hybridizations with interpolation strategies. The latter are inspired by the quadratic trust region approach for local investigation and by an efficient global optimizer for separable problems. Each modification and their combined effects are studied with appropriate metrics on a numerical benchmark, which is also used for comparing AsBeC with some effective ABC variants and other derivative-free algorithms. In addition, the presented algorithm is validated on two recent benchmarks adopted for competitions in international conferences. Results show remarkable competitiveness and robustness for AsBeC.\nRelentless progress in artificial intelligence (AI) is increasingly raising concerns that machines will replace humans on the job market, and perhaps altogether. Eliezer Yudkowsky and others have explored the possibility that a promising future for humankind could be guaranteed by a superintelligent \"Friendly AI\", designed to safeguard humanity and its values.
I argue that, from a physics perspective where everything is simply an arrangement of elementary particles, this might be even harder than it appears. Indeed, it may require thinking rigorously about the meaning of life: What is \"meaning\" in a particle arrangement? What is \"life\"? What is the ultimate ethical imperative, i.e., how should we strive to rearrange the particles of our Universe and shape its future? If we fail to answer the last question rigorously, this future is unlikely to contain humans.\nThe production system is a theoretical model of computation relevant to the artificial intelligence field, allowing for problem-solving procedures such as hierarchical tree search. In this work we explore some of the connections between artificial intelligence and quantum computation by presenting a model for a quantum production system. Our approach focuses on initially developing a model for a reversible production system which is a simple mapping of Bennett's reversible Turing machine. We then expand on this result in order to accommodate the requirements of quantum computation. We present the details of how our proposition can be used alongside Grover's algorithm in order to yield a speedup relative to its classical counterpart. We discuss the requirements associated with such a speedup and how it compares against a similar quantum hierarchical search approach.\nThis paper critically assesses the anti-functionalist stance on consciousness adopted by certain advocates of integrated information theory (IIT), a corollary of which is that human-level artificial intelligence implemented on conventional computing hardware is necessarily not conscious. The critique draws on variations of a well-known gradual neuronal replacement thought experiment, as well as bringing out tensions in IIT's treatment of self-knowledge. The aim, though, is neither to reject IIT outright nor to champion functionalism in particular.
Rather, it is suggested that both ideas have something to offer a scientific understanding of consciousness, as long as they are not dressed up as solutions to illusory metaphysical problems. As for human-level AI, we must await its development before we can decide whether or not to ascribe consciousness to it.\nThis paper presents a tentative outline for the construction of an artificial, generally intelligent system (AGI). It is argued that building a general data compression algorithm solving all problems up to a complexity threshold should be the main thrust of research. A measure for partial progress in AGI is suggested. Although the details are far from being clear, some general properties for a general compression algorithm are fleshed out. Its inductive bias should be flexible and adapt to the input data while constantly searching for a simple, orthogonal and complete set of hypotheses explaining the data. It should recursively reduce the size of its representations, thereby compressing the data increasingly at every iteration. Based on that fundamental ability, a grounded reasoning system is proposed. It is argued that grounding and flexible feature bases made of hypotheses allow for resourceful thinking. While the simulation of representation contents on the mental stage accounts for much of the power of propositional logic, compression leads to simple sets of hypotheses that allow the detection and verification of universally quantified statements. Together, it is highlighted how general compression and grounded reasoning could account for the birth and growth of first concepts about the world and the commonsense reasoning about them.\nNuclear magnetic resonance (NMR) spectroscopy is a powerful method for the investigation of three-dimensional structures of biological molecules such as proteins. Determining a protein structure is essential for understanding its function and alterations in function which lead to disease.
One of the major challenges of the post-genomic era is to obtain structural and functional information on the many unknown proteins encoded by thousands of newly identified genes. The goal of this research is to design an algorithm capable of automating the analysis of backbone protein NMR data by implementing AI strategies such as greedy and A* search.\nA minimalistic cognitive architecture called MANIC is presented. The MANIC architecture requires only three function approximating models, and one state machine. Even with so few major components, it is theoretically sufficient to achieve functional equivalence with all other cognitive architectures, and can be practically trained. Instead of seeking to transfer architectural inspiration from biology into artificial intelligence, MANIC seeks to minimize novelty and follow the most well-established constructs that have evolved within various sub-fields of data science. From this perspective, MANIC offers an alternate approach to a long-standing objective of artificial intelligence. This paper provides a theoretical analysis of the MANIC architecture.\nSearch is a central problem in artificial intelligence, and breadth-first search (BFS) and depth-first search (DFS) are the two most fundamental ways to search. In this paper we derive estimates for average BFS and DFS runtime. The average runtime estimates can be used to allocate resources or judge the hardness of a problem. They can also be used for selecting the best graph representation, and for selecting the faster algorithm out of BFS and DFS. They may also form the basis for an analysis of more advanced search methods. The paper treats both tree search and graph search. For tree search, we employ a probabilistic model of goal distribution; for graph search, the analysis depends on an additional statistic of path redundancy and average branching factor. 
As an application, we use the results to predict BFS and DFS runtime on two concrete grammar problems and on the N-puzzle. Experimental verification shows that our analytical approximations come close to empirical reality.\nArtificial intelligence has received broad interpretation as a literary image. This approach did not refer unambiguously to the scope of logical studies and mathematical investigations. The author applied methods peculiar to the semiotic approach offered by Boris Uspensky and Yury Lotman. In addition, the article presented a criticism of modern versions of educational technologies, which led to unconditional expectations of the possibilities of information and telecommunication technologies. Methodological culture's growth, which was described on the basis of semiotics and a functional approach to word formation of new meanings for the description of the studied subjects, provided the development of pupils' thought. As a result, the research opened new prospects for understanding artificial intelligence within educational practice.\nDecision making is a vital function in the age of machine learning and artificial intelligence; however, its physical realizations and their theoretical fundamentals are not yet known. In our former study, we demonstrated that single photons can be used to make decisions in uncertain, dynamically changing environments. The multi-armed bandit problem was successfully solved using the dual probabilistic and particle attributes of single photons. Herein, we revolutionize how decision making is comprehended via a category theoretic viewpoint; we present the category theoretic foundation of the single-photon-based decision making, including quantitative analysis that agrees well with the experimental results. The category theoretic model unveils complex interdependencies of the entities of the subject matter in the most simplified manner, including a dynamically changing environment. 
In particular, the octahedral structure and the braid structure in triangulated categories provide clear understandings and quantitative metrics of the underlying mechanisms for the single-photon decision maker. This is the first demonstration of a category theoretic interpretation of decision making, and provides a solid understanding and a design fundamental for machine learning and artificial intelligence.\nBridge is among the zero-sum games for which artificial intelligence has not yet outperformed expert human players. The main difficulty lies in the bidding phase of bridge, which requires cooperative decision making under partial information. Existing artificial intelligence systems for bridge bidding rely on and are thus restricted by human-designed bidding systems or features. In this work, we propose a pioneering bridge bidding system without the aid of human domain knowledge. The system is based on a novel deep reinforcement learning model, which extracts sophisticated features and learns to bid automatically based on raw card data. The model includes an upper-confidence-bound algorithm and additional techniques to achieve a balance between exploration and exploitation. Our experiments validate the promising performance of our proposed model. In particular, the model advances from having no knowledge about bidding to achieving superior performance when compared with a champion-winning computer bridge program that implements a human-designed bidding system.\nAnalyses of text corpora over time can reveal trends in beliefs, interest, and sentiment about a topic. We focus on views expressed about artificial intelligence (AI) in the New York Times over a 30-year period. General interest, awareness, and discussion about AI has waxed and waned since the field was founded in 1956. 
We present a set of measures that captures levels of engagement, measures of pessimism and optimism, the prevalence of specific hopes and concerns, and topics that are linked to discussions about AI over decades. We find that discussion of AI has increased sharply since 2009, and that these discussions have been consistently more optimistic than pessimistic. However, when we examine specific concerns, we find that worries of loss of control of AI, ethical concerns for AI, and the negative impact of AI on work have grown in recent years. We also find that hopes for AI in healthcare and education have increased over time.\nIn this work, we present and analyze reported failures of artificially intelligent systems and extrapolate our analysis to future AIs. We suggest that both the frequency and the seriousness of future AI failures will steadily increase. AI Safety can be improved based on ideas developed by cybersecurity experts. For narrow AIs safety failures are at the same, moderate, level of criticality as in cybersecurity, however for general AI, failures have a fundamentally different impact. A single failure of a superintelligent system may cause a catastrophic event without a chance for recovery. The goal of cybersecurity is to reduce the number of successful attacks on the system; the goal of AI Safety is to make sure zero attacks succeed in bypassing the safety mechanisms. Unfortunately, such a level of performance is unachievable. Every security system will eventually fail; there is no such thing as a 100% secure system.\nWhen an agent cannot represent a perfectly accurate model of its environment's dynamics, model-based reinforcement learning (MBRL) can fail catastrophically. Planning involves composing the predictions of the model; when flawed predictions are composed, even minor errors can compound and render the model useless for planning. 
Hallucinated Replay (Talvitie 2014) trains the model to \"correct\" itself when it produces errors, substantially improving MBRL with flawed models. This paper theoretically analyzes this approach, illuminates settings in which it is likely to be effective or ineffective, and presents a novel error bound, showing that a model's ability to self-correct is more tightly related to MBRL performance than one-step prediction error. These results inspire an MBRL algorithm for deterministic MDPs with performance guarantees that are robust to model class limitations.\nSuperconducting circuit technologies have recently achieved quantum protocols involving closed feedback loops. Quantum artificial intelligence and quantum machine learning are emerging fields inside quantum technologies which may enable quantum devices to acquire information from the outer world and improve themselves via a learning process. Here we propose the implementation of basic protocols in quantum reinforcement learning, with superconducting circuits employing feedback-loop control. We introduce diverse scenarios for proof-of-principle experiments with state-of-the-art superconducting circuit technologies and analyze their feasibility in presence of imperfections. The field of quantum artificial intelligence implemented with superconducting circuits paves the way for enhanced quantum control and quantum computation protocols.\nWe present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with clustering semi-supervised features induced on large amounts of unlabeled text. Understanding via empirical experimentation how to effectively combine various types of clustering features allows us to seamlessly export our system to other datasets and languages. The result is a simple but highly competitive system which obtains state of the art results across five languages and twelve datasets. 
The results are reported on standard shared task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, and despite the lack of linguistically motivated features, we also report best results for languages such as Basque and German. In addition, we demonstrate that our method also obtains very competitive results even when the amount of supervised data is cut by half, alleviating the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models. The system and models are freely available to facilitate their use and guarantee the reproducibility of results.\nCurrently, Artificial Intelligence (AI) has won unprecedented attention and is becoming an increasingly popular focus in China. This change can be judged by the impressive record of academic publications, the amount of state-level investment and the presence of nation-wide participation and devotion. In this paper, we place emphasis on discussing the progress of artificial intelligence engineering in China. We first introduce the focus on AI in Chinese academia, including the supercomputing brain system, Cambrian Period supercomputer of neural networks, and biometric systems. Then, the development of AI in industrial circles and the latest layout of AI products in companies, such as Baidu, Tencent, and Alibaba, are introduced. Last, we bring in the opinions and arguments of the main intelligentsia of China about the future development of AI, including how to examine the relationship between humanity on one side and science and technology on the other.\nWhile artificial intelligence (AI) has become widespread, many commercial AI systems are not yet accessible to individual researchers or the general public due to the deep knowledge of the systems required to use them. We believe that AI has matured to the point where it should be an accessible technology for everyone. 
We present an ongoing project whose ultimate goal is to deliver an open source, user-friendly AI system that is specialized for machine learning analysis of complex data in the biomedical and health care domains. We discuss how genetic programming can aid in this endeavor, and highlight specific examples where genetic programming has automated machine learning analyses in previous projects.\nArtificial Intelligence (AI) is an effective science which employs strong enough approaches, methods, and techniques to solve previously unsolvable real-world problems. Because of its unstoppable rise towards the future, there are also some discussions about its ethics and safety. Shaping an AI-friendly environment for people and a people-friendly environment for AI can be a possible answer for finding a shared context of values for both humans and robots. In this context, the objective of this paper is to address the ethical issues of AI and explore the moral dilemmas that arise from ethical algorithms, from pre-set or acquired values. In addition, the paper will also focus on the subject of AI safety. In general, the paper will briefly analyze the concerns and potential solutions to the ethical issues presented and increase readers' awareness of AI safety as another related research interest.\nArtificial Intelligence methods to solve continuous-control tasks have made significant progress in recent years. However, these algorithms have important limitations and still need significant improvement to be used in industry and real-world applications. This means that this area is still in an active research phase. To involve a large number of research groups, standard benchmarks are needed to evaluate and compare proposed algorithms. 
In this paper, we propose a physical environment benchmark framework to facilitate collaborative research in this area by enabling different research groups to integrate their designed benchmarks in a unified cloud-based repository and also share their actual implemented benchmarks via the cloud. We demonstrate the proposed framework using an actual implementation of the classical mountain-car example and present the results obtained using a Reinforcement Learning algorithm.\nAutomatic photo aesthetic assessment is a challenging artificial intelligence task. Existing computational approaches have focused on modeling a single aesthetic score or a class (good or bad); however, these do not provide any details on why the photograph is good or bad, or which attributes contribute to the quality of the photograph. To obtain both accuracy and human interpretation of the score, we advocate learning the aesthetic attributes along with the prediction of the overall score. For this purpose, we propose a novel multitask deep convolutional neural network, which jointly learns eight aesthetic attributes along with the overall aesthetic score. We report near-human performance in the prediction of the overall aesthetic score. To understand the internal representation of these attributes in the learned model, we also develop a visualization technique using back propagation of gradients. These visualizations highlight the important image regions for the corresponding attributes, thus providing insights about the model's representation of these attributes. We showcase the diversity and complexity associated with different attributes through a qualitative analysis of the activation maps.\nPharmaco-epidemiology (PE) is the study of uses and effects of drugs in well-defined populations. As medico-administrative databases cover a large part of the population, they have become very attractive for carrying out PE studies. 
Such databases provide longitudinal care pathways in real conditions containing timestamped care events, especially drug deliveries. Temporal pattern mining becomes a strategic choice to gain valuable insights about drug uses. In this paper we propose DCM, a new discriminant temporal pattern mining algorithm. It extracts chronicle patterns that occur more in a studied population than in a control population. We present results on the identification of possible associations between hospitalizations for seizure and anti-epileptic drug switches in the care pathways of epileptic patients.\nMuch research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research.\nBusiness Process Management (BPM) is a central element of today's organizations. Although over the years its main focus has been the support of processes in highly controlled domains, nowadays many domains of interest to the BPM community are characterized by ever-changing requirements, unpredictable environments and increasing amounts of data that influence the execution of process instances. 
Under such dynamic conditions, BPM systems must increase their level of automation to provide the reactivity and flexibility necessary for process management. On the other hand, the Artificial Intelligence (AI) community has concentrated its efforts on investigating dynamic domains that involve active control of computational entities and physical devices (e.g., robots, software agents, etc.). In this context, Automated Planning, which is one of the oldest areas in AI, is conceived as a model-based approach to synthesize autonomous behaviours in an automated way from a model. In this paper, we discuss how automated planning techniques can be leveraged to enable new levels of automation and support for business processing, and we show some concrete examples of their successful application to the different stages of the BPM life cycle.\nArtificial intelligence (AI) has achieved superhuman performance in a growing number of tasks, but understanding and explaining AI remain challenging. This paper clarifies the connections between machine-learning algorithms to develop AIs and the econometrics of dynamic structural models through the case studies of three famous game AIs. Chess-playing Deep Blue is a calibrated value function, whereas shogi-playing Bonanza is an estimated value function via Rust's (1987) nested fixed-point method. AlphaGo's \"supervised-learning policy network\" is a deep neural network implementation of Hotz and Miller's (1993) conditional choice probability estimation; its \"reinforcement-learning value network\" is equivalent to Hotz, Miller, Sanders, and Smith's (1994) conditional choice simulation method. Relaxing these AIs' implicit econometric assumptions would improve their structural interpretability.\nArtificial intelligence (AI) is intrinsically data-driven. It calls for the application of statistical concepts through human-machine collaboration during generation of data, development of algorithms, and evaluation of results. 
This paper discusses how such human-machine collaboration can be approached through the statistical concepts of population, question of interest, representativeness of training data, and scrutiny of results (PQRS). The PQRS workflow provides a conceptual framework for integrating statistical ideas with human input into AI products and research. These ideas include experimental design principles of randomization and local control as well as the principle of stability to gain reproducibility and interpretability of algorithms and data results. We discuss the use of these principles in the contexts of self-driving cars, automated medical diagnoses, and examples from the authors' collaborative research.\nComputation has changed the world more than any previous expressions of knowledge. In its particular algorithmic embodiment, it offers a perspective, within which the digital computer (one of many possible) exercises a role reminiscent of theology. Since it is closed to meaning, algorithmic digital computation can at most mimic the creative aspects of life. AI, in the perspective of time, proved to be less an acronym for artificial intelligence and more of automating tasks associated with intelligence. The entire development led to the hypostatized role of the machine: outputting nothing else but reality, including that of the humanity that made the machine happen. The convergence machine called deep learning is only the latest form through which the deterministic theology of the machine claims more than what extremely effective data processing actually is. 
A new understanding of complexity, as well as the need to distinguish between the reactive nature of the artificial and the anticipatory nature of the living are suggested as practical responses to the challenges posed by machine theology.\nArtificial intelligence (AI) is an extensive scientific discipline which enables computer systems to solve problems by emulating complex biological processes such as learning, reasoning and self-correction. This paper presents a comprehensive review of the application of AI techniques for improving performance of optical communication systems and networks. The use of AI-based techniques is first studied in applications related to optical transmission, ranging from the characterization and operation of network components to performance monitoring, mitigation of nonlinearities, and quality of transmission estimation. Then, applications related to optical network control and management are also reviewed, including topics like optical network planning and operation in both transport and access networks. Finally, the paper also presents a summary of opportunities and challenges in optical networking where AI is expected to play a key role in the near future.\nThis essay discusses whether computers, using Artificial Intelligence (AI), could create art. The first part concerns AI-based tools for assisting with art making. The history of technologies that automated aspects of art is covered, including photography and animation. In each case, we see initial fears and denial of the technology, followed by a blossoming of new creative and professional opportunities for artists. The hype and reality of Artificial Intelligence (AI) tools for art making is discussed, together with predictions about how AI tools will be used. The second part speculates about whether it could ever happen that AI systems could conceive of artwork, and be credited with authorship of an artwork. 
It is theorized that art is something created by social agents, and so computers cannot be credited with authorship of art in our current understanding. A few ways that this could change are also hypothesized.\nIn order for people to be able to trust and take advantage of the results of advanced machine learning and artificial intelligence solutions for real decision making, people need to be able to understand the machine rationale for a given output. Research in explainable artificial intelligence (XAI) addresses this aim, but there is a need for evaluation of human relevance and understandability of explanations. Our work contributes a novel methodology for evaluating the quality or human interpretability of explanations for machine learning models. We present an evaluation benchmark for instance explanations from text and image classifiers. The explanation meta-data in this benchmark is generated from user annotations of image and text samples. We describe the benchmark and demonstrate its utility by a quantitative evaluation on explanations generated from a recent machine learning algorithm. This research demonstrates how human-grounded evaluation could be used as a measure to qualify local machine-learning explanations.\nReasoning systems with too simple a model of the world and human intent are unable to consider potential negative side effects of their actions and modify their plans to avoid them (e.g., avoiding potential errors). However, hand-encoding the enormous and subtle body of facts that constitutes common sense into a knowledge base has proved too difficult despite decades of work. Distributed semantic vector spaces learned from large text corpora, on the other hand, can learn representations that capture shades of meaning of common-sense concepts and perform analogical and associational reasoning in ways that knowledge bases are too rigid to perform, by encoding concepts and the relations between them as geometric structures. 
These have, however, the disadvantage of being unreliable, poorly understood, and biased in their view of the world by the source material. This chapter will discuss how these approaches may be combined in a way that combines the best properties of each for understanding the world and human intentions in a richer way.\nThere are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart's Law. This class of failure is often poorly understood, partly because terminology for discussing them is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes there are \"(at least) four different mechanisms\" that relate to Goodhart's Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in Artificial Intelligence alignment. The importance of Goodhart effects depends on the amount of power directed towards optimizing the proxy, and so the increased optimization power offered by artificial intelligence makes it especially critical for that field.\nSelf-replication is a key aspect of biological life that has been largely overlooked in Artificial Intelligence systems. Here we describe how to build and train self-replicating neural networks. The network replicates itself by learning to output its own weights. The network is designed using a loss function that can be optimized with either gradient-based or non-gradient-based methods. 
We also describe a method we call regeneration to train the network without explicit optimization, by injecting the network with predictions of its own parameters. The best solution for a self-replicating network was found by alternating between regeneration and optimization steps. Finally, we describe a design for a self-replicating neural network that can solve an auxiliary task such as MNIST image classification. We observe that there is a trade-off between the network's ability to classify images and its ability to replicate, but training is biased towards increasing its specialization at image classification at the expense of replication. This is analogous to the trade-off between reproduction and other tasks observed in nature. We suggest that a self-replication mechanism for artificial intelligence is useful because it introduces the possibility of continual improvement through natural selection.\nIn recent years, the increased demand for dynamic management of network resources in modern computer networks in general and in today's data centers in particular has resulted in a promising new architecture, in which more flexible control functionality can be achieved with a high level of abstraction. In the software-defined networking (SDN) architecture, central management of the forwarding elements (i.e., switches and routers) is accomplished by a central unit, which can be programmed directly to perform fundamental networking tasks or implement any other additional services. Combining central management and network programmability opens the door to employing more advanced techniques such as artificial intelligence (AI) in order to deal with high-demand and rapidly-changing networks. In this study, we provide a detailed overview of current efforts and recent advancements to include AI in SDN-based networks.\nAttacks on networks are becoming more complex and sophisticated every day. 
Beyond the so-called script-kiddies and hacking newbies, there is a myriad of professional attackers seeking to make serious profits by infiltrating corporate networks. Hostile governments, big corporations and mafias alike are constantly increasing their resources and skills in cybercrime in order to spy, steal or cause damage more effectively. Traditional approaches to Network Security seem to be hitting their limits, and the need for a smarter approach to threat detection is being recognized. This paper provides an introduction to the need for evolution of Cyber Security techniques and to how Artificial Intelligence could be applied to help solve some of these problems. It also provides a high-level overview of some state-of-the-art AI Network Security techniques, and finishes by analysing the foreseeable future of the application of AI to Network Security.\nThis essay examines how what is considered to be artificial intelligence (AI) has changed over time and come to intersect with the expertise of the author. Initially, AI developed on a separate trajectory, both topically and institutionally, from pattern recognition, neural information processing, decision and control systems, and allied topics by focusing on symbolic systems within computer science departments rather than on continuous systems in electrical engineering departments. The separate evolutions continued throughout the author's lifetime, with some crossover in reinforcement learning and graphical models, but were shocked into converging by the virality of deep learning, thus making an electrical engineer into an AI researcher. Now that this convergence has happened, opportunity exists to pursue an agenda that combines learning and reasoning bridged by interpretable machine learning models.\nIn recent years, China, the United States and other countries, Google and other high-tech companies have increased investment in artificial intelligence. 
Deep learning is one of the key areas of current artificial intelligence research. This paper analyzes and summarizes the latest progress and future research directions of deep learning. Firstly, three basic models of deep learning are outlined, including multilayer perceptrons, convolutional neural networks, and recurrent neural networks. On this basis, we further analyze the emerging new models of convolutional neural networks and recurrent neural networks. This paper then summarizes deep learning's applications in many areas of artificial intelligence, including voice, computer vision, natural language processing and so on. Finally, this paper discusses the existing problems of deep learning and gives the corresponding possible solutions.\nOne of the common artificial intelligence applications in electronic games consists of making an artificial agent learn how to execute some determined task successfully in a game environment. One way to perform this task is through machine learning algorithms capable of learning the sequence of actions required to win in a given game environment. There are several supervised learning techniques able to learn the correct answer for a problem through examples. However, when learning how to play electronic games, the correct answer might only be known by the end of the game, after all the actions have already been taken, so it is not possible to measure the accuracy of each individual action taken at each time step. One way of dealing with this problem is Neuroevolution, a method which trains Artificial Neural Networks using evolutionary algorithms. In this article, we introduce a framework for testing optimization algorithms with artificial agent controllers in electronic games, called EvoMan, which is inspired by the action-platformer game Mega Man II. The environment can be configured to run in different experiment modes, such as single evolution, coevolution and others. 
To demonstrate some challenges regarding the proposed platform, as initial experiments we applied Neuroevolution using Genetic Algorithms and the NEAT algorithm, in the context of competitively coevolving two distinct agents in this game.\nIn this paper we describe a web search agent, called Global Search Agent (hereafter GSA for short). GSA integrates and enhances several search techniques in order to achieve significant improvements in the user-perceived quality of delivered information as compared to usual web search engines. GSA features intelligent merging of relevant documents from different search engines, anticipated selective exploration and evaluation of links from the current result set, and automated derivation of refined queries based on user relevance feedback. System architecture as well as experimental accounts are also illustrated.\nConsumers of mass media must have a comprehensive, balanced and plural selection of news to get an unbiased perspective; but achieving this goal can be very challenging, laborious and time-consuming. The development of news stories over time, their (in)consistency, and differing levels of coverage across media outlets are challenges that a conscientious reader has to overcome in order to alleviate bias. In this paper we present an intelligent agent framework currently facilitating analysis of the main sources of on-line news in El Salvador. We show how prior tools of text analysis and Web 2.0 technologies can be combined with minimal manual intervention to help individuals in their rational decision process, while holding media outlets accountable for their work.\nComputational techniques have shown much promise in the field of Finance, owing to their ability to extract sense out of dauntingly complex systems. 
This paper reviews the most promising of these techniques, from traditional computational intelligence methods to their machine learning siblings, with particular view to their application in optimising the management of a portfolio of financial instruments. The current state of the art is assessed, and prospective further work is assessed and recommended\nThe four intensive problems to the software rose by the software industry .i.e., User System Communication / Human Machine Interface, Meta Data extraction, Information processing & management and Data representation are discussed in this research paper. To contribute in the field we have proposed and described an intelligent semantic oriented agent based search engine including the concepts of intelligent graphical user interface, natural language based information processing, data management and data reconstruction for the final user end information representation.\nThere is a growing trend towards the convergence of cyber-physical systems (CPS) and social computing, which will lead to the emergence of smart communities composed of various objects (including both human individuals and physical things) that interact and cooperate with each other. These smart communities promise to enable a number of innovative applications and services that will improve the quality of life. This position paper addresses some opportunities and challenges of building smart communities characterized by cyber-physical and social intelligence.\nNurse scheduling is a difficult optimization problem with multiple constraints. There is extensive research in the literature solving the problem using meta-heuristics approaches. In this paper, we will investigate an intelligent search heuristics that handles cost based scheduling problem. The heuristics demonstrated superior performances compared to the original algorithms used to solve the problems described in Li et. Al. 
(2003) and Ozkarahan (1989) in terms of time needed to establish a feasible solution. Both problems can be formulated as a cost problem. The search heuristic consists of several phrases of search and input based on the cost of each assignment and how the assignment will interact with the cost of the resources.\nWith the development of robotics and artificial intelligence field unceasingly thorough, path planning as an important field of robot calculation has been widespread concern. This paper analyzes the current development of robot and path planning algorithm and focuses on the advantages and disadvantages of the traditional intelligent path planning as well as the path planning. The problem of mobile robot path planning is studied by using ant colony algorithm, and it also provides some solving methods.\nThe development of intelligent machines is one of the biggest unsolved challenges in computer science. In this paper, we propose some fundamental properties these machines should have, focusing in particular on communication and learning. We discuss a simple environment that could be used to incrementally teach a machine the basics of natural-language-based communication, as a prerequisite to more complex interaction with human users. We also present some conjectures on the sort of algorithms the machine should support in order to profitably learn from the environment.\nWikipedia, the world largest encyclopedia contains a lot of knowledge that is expressed as formulae exclusively. Unfortunately, this knowledge is currently not fully accessible by intelligent information retrieval systems. This immense body of knowledge is hidden form value-added services, such as search. In this paper, we present our MathSearch implementation for Wikipedia that enables users to perform a combined text and fully unlock the potential benefits.\nModality is one of the important components of grammar in linguistics. 
It lets speaker to express attitude towards, or give assessment or potentiality of state of affairs. It implies different senses and thus has different perceptions as per the context. This paper presents an account showing the gap in the functionality of the current state of art Natural Language Processing (NLP) systems. The contextual nature of linguistic modality is studied. In this paper, the works and logical approaches employed by Natural Language Processing systems dealing with modality are reviewed. It sees human cognition and intelligence as multi-layered approach that can be implemented by intelligent systems for learning. Lastly, current flow of research going on within this field is talked providing futurology.\nIn this paper, the idea of applying Computational Intelligence in the process of creation board games, in particular mazes, is presented. For two different algorithms the proposed idea has been examined. The results of the experiments are shown and discussed to present advantages and disadvantages.\nWe present a new social animal inspired emotional swarm intelligence technique. This technique is used to solve a variant of the popular collective robots problem called foraging. We show with a simulation study how simple interaction rules based on sensations like hunger and loneliness can lead to globally coherent emergent behavior which allows sensor constrained robots to solve the given problem\nMultiplayer Online Battle Arena (MOBA) is one of the most played game genres nowadays. With the increasing growth of this genre, it becomes necessary to develop effective intelligent agents to play alongside or against human players. In this paper we address the problem of agent development for MOBA games. We implement a two-layered architecture agent that handles both navigation and game mechanics. This architecture relies on the use of Influence Maps, a widely used approach for tactical analysis. 
Several experiments were performed using {\\em League of Legends} as a testbed, and show promising results in this highly dynamic real-time context.\nAn elementary mathematical example proves, thanks to the Routh-Hurwitz criterion, a result that is intriguing with respect to today's practical understanding of model-free control, i.e., an \"intelligent\" proportional controller (iP) may turn to be more difficult to tune than an intelligent proportional-derivative one (iPD). The vast superiority of iPDs when compared to classic PIDs is shown via computer simulations. The introduction as well as the conclusion analyse model-free control in the light of recent advances.\nIn this paper we present a new form of access to knowledge through what we call \"hypermediator websites\". These hypermediator sites are intermediate between information devices that just scan the book culture and a \"real\" hypertext writing format.\nThis paper examines common assumptions regarding the decision-making internal environment for intelligent agents and investigates issues related to processing of memory and belief states to help obtain better understanding of the responses. In specific, we consider order effects and discuss both classical and non-classical explanations for them. We also consider implicit cognition and explore if certain inaccessible states may be best modeled as quantum states. We propose that the hypothesis that quantum states are at the basis of order effects be tested on large databases such as those related to medical treatment and drug efficacy. A problem involving a maze network is considered and comparisons made between classical and quantum decision scenarios for it.\nDecision-making is a process of choosing among alternative courses of action for solving complicated problems where multi-criteria objectives are involved. 
The past few years have witnessed a growing recognition of Soft Computing (SC) technologies that underlie the conception, design and utilization of intelligent systems. In this paper, we present different SC paradigms involving an artificial neural network trained using the scaled conjugate gradient algorithm, two different fuzzy inference methods optimised using neural network learning/evolutionary algorithms and regression trees for developing intelligent decision support systems. We demonstrate the efficiency of the different algorithms by developing a decision support system for a Tactical Air Combat Environment (TACE). Some empirical comparisons between the different algorithms are also provided.\nDomain experts should provide relevant domain knowledge to an Intelligent Tutoring System (ITS) so that it can guide a learner during problemsolving learning activities. However, for many ill-defined domains, the domain knowledge is hard to define explicitly. In previous works, we showed how sequential pattern mining can be used to extract a partial problem space from logged user interactions, and how it can support tutoring services during problem-solving exercises. This article describes an extension of this approach to extract a problem space that is richer and more adapted for supporting tutoring services. We combined sequential pattern mining with (1) dimensional pattern mining (2) time intervals, (3) the automatic clustering of valued actions and (4) closed sequences mining. Some tutoring services have been implemented and an experiment has been conducted in a tutoring system.\nEudaemonics is the study of the nature, causes, and conditions of human well-being. According to the ethical theory of eudaemonia, reaping satisfaction and fulfillment from life is not only a desirable end, but a moral responsibility. However, in modern society, many individuals struggle to meet this responsibility. 
Computational mechanisms could better enable individuals to achieve eudaemonia by yielding practical real-world systems that embody algorithms that promote human flourishing. This article presents eudaemonic systems as the evolutionary goal of the present day recommender system.\nThe study is from a base of accident scenarii in rail transport (feedback) in order to develop a tool to share build and sustain knowledge and safety and secondly to exploit the knowledge stored to prevent the reproduction of accidents / incidents. This tool should ultimately lead to the proposal of prevention and protection measures to minimize the risk level of a new transport system and thus to improve safety. The approach to achieving this goal largely depends on the use of artificial intelligence techniques and rarely the use of a method of automatic learning in order to develop a feasibility model of a software tool based on case based reasoning (CBR) to exploit stored knowledge in order to create know-how that can help stimulate domain experts in the task of analysis, evaluation and certification of a new system.\nExpert systems use human knowledge often stored as rules within the computer to solve problems that generally would entail human intelligence. Today, with information systems turning out to be more pervasive and with the myriad advances in information technologies, automating computer fault diagnosis is becoming so fundamental that soon every enterprise has to endorse it. This paper proposes an expert system called Expert PC Troubleshooter for diagnosing computer problems. The system is composed of a user interface, a rule-base, an inference engine, and an expert interface. Additionally, the system features a fuzzy-logic module to troubleshoot POST beep errors, and an intelligent agent that assists in the knowledge acquisition process. 
The proposed system is meant to automate the maintenance, repair, and operations (MRO) process, and free-up human technicians from manually performing routine, laborious, and timeconsuming maintenance tasks. As future work, the proposed system is to be parallelized so as to boost its performance and speed-up its various operations.\nThis paper presents formulae that can solve various seemingly hopeless philosophical conundrums. We discuss the simulation argument, teleportation, mind-uploading, the rationality of utilitarianism, and the ethics of exploiting artificial general intelligence. Our approach arises from combining the essential ideas of formalisms such as algorithmic probability, the universal intelligence measure, space-time-embedded intelligence, and Hutter's observer localization. We argue that such universal models can yield the ultimate solutions, but a novel research direction would be required in order to find computationally efficient approximations thereof.\nThe recently introduced theory of practopoiesis offers an account on how adaptive intelligent systems are organized. According to that theory biological agents adapt at three levels of organization and this structure applies also to our brains. This is referred to as tri-traversal theory of the organization of mind or for short, a T3-structure. To implement a similar T3-organization in an artificially intelligent agent, it is necessary to have multiple policies, as usually used as a concept in the theory of reinforcement learning. These policies have to form a hierarchy. We define adaptive practopoietic systems in terms of hierarchy of policies and calculate whether the total variety of behavior required by real-life conditions of an adult human can be satisfactorily accounted for by a traditional approach to artificial intelligence based on T2-agents, or whether a T3-agent is needed instead. 
We conclude that the complexity of real life can be dealt with appropriately only by a T3-agent.\nHumans dispose of two intertwined information processing pathways, cognitive information processing via neural firing patterns and diffusive volume control via neuromodulation. The cognitive information processing in the brain is traditionally considered to be the prime neural correlate of human intelligence, clinical studies indicate that human emotions intrinsically correlate with the activation of the neuromodulatory system.   We examine here the question: Why do humans dispose of the diffusive emotional control system? Is this a coincidence, a caprice of nature, perhaps a leftover of our genetic heritage, or a necessary aspect of any advanced intelligence, being it biological or synthetic? We argue here that emotional control is necessary to solve the motivational problem, viz the selection of short-term utility functions, in the context of an environment where information, computing power and time constitute scarce resources.\nIntelligent systems for the annotation of media content are increasingly being used for the automation of parts of social science research. In this domain the problem of integrating various Artificial Intelligence (AI) algorithms into a single intelligent system arises spontaneously. As part of our ongoing effort in automating media content analysis for the social sciences, we have built a modular system by combining multiple AI modules into a flexible framework in which they can cooperate in complex tasks. Our system combines data gathering, machine translation, topic classification, extraction and annotation of entities and social networks, as well as many other tasks that have been perfected over the past years of AI research. 
Over the last few years, it has allowed us to realise a series of scientific studies over a vast range of applications including comparative studies between news outlets and media content in different countries, modelling of user preferences, and monitoring public mood. The framework is flexible and allows the design and implementation of modular agents, where simple modules cooperate in the annotation of a large dataset without central coordination.\nOver the past decade, AI has made a remarkable progress. It is agreed that this is due to the recently revived Deep Learning technology. Deep Learning enables to process large amounts of data using simplified neuron networks that simulate the way in which the brain works. However, there is a different point of view, which posits that the brain is processing information, not data. This unresolved duality hampered AI progress for years. In this paper, I propose a notion of Integrated information that hopefully will resolve the problem. I consider integrated information as a coupling between two separate entities - physical information (that implies data processing) and semantic information (that provides physical information interpretation). In this regard, intelligence becomes a product of information processing. Extending further this line of thinking, it can be said that information processing does not require more a human brain for its implementation. Indeed, bacteria and amoebas exhibit intelligent behavior without any sign of a brain. That dramatically removes the need for AI systems to emulate the human brain complexity! The paper tries to explore this shift in AI systems design philosophy.\nThere is both much optimism and pessimism around artificial intelligence (AI) today. The optimists are investing millions of dollars, and even in some cases billions of dollars into AI. The pessimists, on the other hand, predict that AI will end many things: jobs, warfare, and even the human race. 
Both the optimists and the pessimists often appeal to the idea of a technological singularity, a point in time where machine intelligence starts to run away, and a new, more intelligent species starts to inhabit the earth. If the optimists are right, this will be a moment that fundamentally changes our economy and our society. If the pessimists are right, this will be a moment that also fundamentally changes our economy and our society. It is therefore very worthwhile spending some time deciding if either of them might be right.\nDeveloping Intelligent Systems involves artificial intelligence approaches including artificial neural networks. Here, we present a tutorial of Deep Neural Networks (DNNs), and some insights about the origin of the term \"deep\"; references to deep learning are also given. Restricted Boltzmann Machines, which are the core of DNNs, are discussed in detail. An example of a simple two-layer network, performing unsupervised learning for unlabeled data, is shown. Deep Belief Networks (DBNs), which are used to build networks with more than two layers, are also described. Moreover, examples for supervised learning with DNNs performing simple prediction and classification tasks, are presented and explained. This tutorial includes two intelligent pattern recognition applications: hand- written digits (benchmark known as MNIST) and speech recognition.\nMeaning has been called the \"holy grail\" of a variety of scientific disciplines, ranging from linguistics to philosophy, psychology and the neurosciences. The field of Artifical Intelligence (AI) is very much a part of that list: the development of sophisticated natural language semantics is a sine qua non for achieving a level of intelligence comparable to humans. 
Embodiment theories in cognitive science hold that human semantic representation depends on sensori-motor experience; the abundant evidence that human meaning representation is grounded in the perception of physical reality leads to the conclusion that meaning must depend on a fusion of multiple (perceptual) modalities. Despite this, AI research in general, and its subdisciplines such as computational linguistics and computer vision in particular, have focused primarily on tasks that involve a single modality. Here, we propose virtual embodiment as an alternative, long-term strategy for AI research that is multi-modal in nature and that allows for the kind of scalability required to develop the field coherently and incrementally, in an ethically responsible fashion.\nWhat is the nature of curiosity? Is there any scientific way to understand the origin of this mysterious force that drives the behavior of even the stupidest naturally intelligent systems and is completely absent in their smartest artificial analogs? Can we build AI systems that could be curious about something, systems that would have an intrinsic motivation to learn? Is such a motivation quantifiable? Is it implementable? I will discuss this problem from the standpoint of physics. The relationship between physics and intelligence is a consequence of the fact that correctly predicted information is nothing but an energy resource, and the process of thinking can be viewed as a process of accumulating and spending this resource through the acts of perception and, respectively, decision making. The natural motivation of any autonomous system to keep this accumulation/spending balance as high as possible allows one to treat the problem of describing the dynamics of thinking processes as a resource optimization problem. Here I will propose and discuss a simple theoretical model of such an autonomous system which I call the Autonomous Turing Machine (ATM). 
The potential attractiveness of ATM lies in the fact that it is the model of a self-propelled AI for which the only available energy resource is the information itself. For ATM, the problem of optimal thinking, learning, and decision-making becomes conceptually simple and mathematically well tractable. This circumstance makes the ATM an ideal playground for studying the dynamics of intelligent behavior and allows one to quantify many seemingly unquantifiable features of genuine intelligence.\nOver the last decade, a new idea challenging the classical self-non-self viewpoint has become popular amongst immunologists. It is called the Danger Theory. In this conceptual paper, we look at this theory from the perspective of Artificial Immune System practitioners. An overview of the Danger Theory is presented with particular emphasis on analogies in the Artificial Immune Systems world. A number of potential application areas are then used to provide a framing for a critical assessment of the concept, and its relevance for Artificial Immune Systems.\nArtificial Immune Systems have been used successfully to build recommender systems for film databases. In this research, an attempt is made to extend this idea to web site recommendation. A collection of more than 1000 individuals web profiles (alternatively called preferences / favourites / bookmarks file) will be used. URLs will be classified using the DMOZ (Directory Mozilla) database of the Open Directory Project as our ontology. This will then be used as the data for the Artificial Immune Systems rather than the actual addresses. The first attempt will involve using a simple classification code number coupled with the number of pages within that classification code. However, this implementation does not make use of the hierarchical tree-like structure of DMOZ. 
Consideration will then be given to the construction of a similarity measure for web profiles that makes use of this hierarchical information to build a better-informed Artificial Immune System.\nThe human immune system protects the human body against various pathogens like e.g. biological viruses and bacteria. Artificial immune systems reuse the architecture, organization, and workflows of the human immune system for various problems in computer science. In the network security, the artificial immune system is used to secure a network and its nodes against intrusions like viruses, worms, and trojans. However, these approaches are far away from production where they are academic proof-of-concept implementations or use only a small part to protect against a certain intrusion. This article discusses the required steps to bring artificial immune systems into production in the network security domain. It furthermore figures out the challenges and provides the description and results of the prototype of an artificial immune system, which is SANA called.\nThe immune system provides a rich metaphor for computer security: anomaly detection that works in nature should work for machines. However, early artificial immune system approaches for computer security had only limited success. Arguably, this was due to these artificial systems being based on too simplistic a view of the immune system. We present here a second generation artificial immune system for process anomaly detection. It improves on earlier systems by having different artificial cell types that process information. Following detailed information about how to build such second generation systems, we find that communication between cells types is key to performance. Through realistic testing and validation we show that second generation artificial immune systems are capable of anomaly detection beyond generic system policies. 
The paper concludes with a discussion and outline of the next steps in this exciting area of computer security.\nThe semi-automatic or automatic synthesis of robot controller software is both desirable and challenging. Synthesis of rather simple behaviors such as collision avoidance by applying artificial evolution has been shown multiple times. However, the difficulty of this synthesis increases heavily with increasing complexity of the task that should be performed by the robot. We try to tackle this problem of complexity with Artificial Homeostatic Hormone Systems (AHHS), which provide both intrinsic, homeostatic processes and (transient) intrinsic, variant behavior. By using AHHS the need for pre-defined controller topologies or information about the field of application is minimized. We investigate how the principle design of the controller and the hormone network size affects the overall performance of the artificial evolution (i.e., evolvability). This is done by comparing two variants of AHHS that show different effects when mutated. We evolve a controller for a robot built from five autonomous, cooperating modules. The desired behavior is a form of gait resulting in fast locomotion by using the modules' main hinges.\nWhen modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. 
We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.\nArtificial fish swarm algorithm (AFSA) is one of the swarm intelligence optimization algorithms that works based on population and stochastic search. In order to achieve acceptable result, there are many parameters needs to be adjusted in AFSA. Among these parameters, visual and step are very significant in view of the fact that artificial fish basically move based on these parameters. In standard AFSA, these two parameters remain constant until the algorithm termination. Large values of these parameters increase the capability of algorithm in global search, while small values improve the local search ability of the algorithm. In this paper, we empirically study the performance of the AFSA and different approaches to balance between local and global exploration have been tested based on the adaptive modification of visual and step during algorithm execution. The proposed approaches have been evaluated based on the four well-known benchmark functions. Experimental results show considerable positive impact on the performance of AFSA.\nWe survey concepts at the frontier of research connecting artificial, animal and human cognition to computation and information processing---from the Turing test to Searle's Chinese Room argument, from Integrated Information Theory to computational and algorithmic complexity. We start by arguing that passing the Turing test is a trivial computational problem and that its pragmatic difficulty sheds light on the computational nature of the human mind more than it does on the challenge of artificial intelligence. We then review our proposed algorithmic information-theoretic measures for quantifying and characterizing cognition in various forms. 
These are capable of accounting for known biases in human behavior, thus vindicating a computational algorithmic view of cognition as first suggested by Turing, but this time rooted in the concept of algorithmic probability, which in turn is based on computational universality while being independent of computational model, and which has the virtue of being predictive and testable as a model theory of cognitive behavior.\nIn a 1950 article in Mind, decades before the existence of anything resembling an artificial intelligence system, Alan Turing addressed the question of how to test whether machines can think, or in modern terminology, whether a computer claimed to exhibit intelligence indeed does so. The current paper raises the analogous issue for olfaction: how to test the validity of a system claimed to reproduce arbitrary odors artificially, in a way recognizable to humans, in face of the unavailability of a general naming method for odors. Although odor reproduction systems are still far from being viable, the question of how to test candidates thereof is claimed to be interesting and nontrivial, and a novel method is proposed. To some extent, the method is inspired by Turing`s test for AI, in that it involves a human challenger and the real and artificial entities, yet it is very different: our test is conditional, requiring from the artificial no more than is required from the original, and it employs a novel method of immersion that takes advantage of the availability of near-perfect reproduction methods for sight and sound.\nIt is expected that progress toward true artificial intelligence will be achieved through the emergence of a system that integrates representation learning and complex reasoning (LeCun et al. 2015). In response to this prediction, research has been conducted on implementing the symbolic reasoning of a von Neumann computer in an artificial neural network (Graves et al. 2016; Graves et al. 2014; Reed et al. 2015). 
However, these studies have many limitations in realizing neural-symbolic integration (Jaeger. 2016). Here, we present a new learning paradigm: a learning solving procedure (LSP) that learns the procedure for solving complex problems. This is not accomplished merely by learning input-output data, but by learning algorithms through a solving procedure that obtains the output as a sequence of tasks for a given input problem. The LSP neural network system not only learns simple problems of addition and multiplication, but also the algorithms of complicated problems, such as complex arithmetic expression, sorting, and Hanoi Tower. To realize this, the LSP neural network structure consists of a deep neural network and long short-term memory, which are recursively combined. Through experimentation, we demonstrate the efficiency and scalability of LSP and its validity as a mechanism of complex reasoning.\nOn markets with receding prices, artificial noise traders may consider alternatives to buy-and-hold. By simulating variations of the Parrondo strategy, using real data from the Swedish stock market, we produce first indications of a buy-low-sell-random Parrondo variation outperforming buy-and-hold. Subject to our assumptions, buy-low-sell-random also outperforms the traditional value and trend investor strategies. We measure the success of the Parrondo variations not only through their performance compared to other kinds of strategies, but also relative to varying levels of perfect information, received through messages within a multi-agent system of artificial traders.\nThis article underlines the learning and discrimination capabilities of a model of associative memory based on artificial networks of spiking neurons. Inspired from neuropsychology and neurobiology, the model implements top-down modulations, as in neocortical layer V pyramidal neurons, with a learning rule based on synaptic plasticity (STDP), for performing a multimodal association learning task. 
A temporal correlation method of analysis proves the ability of the model to associate specific activity patterns to different samples of stimulation. Even in the absence of initial learning and with continuously varying weights, the activity patterns become stable enough for discrimination.\nWe apply the Artificial Immune System (AIS) technology to the Collaborative Filtering (CF) technology when we build the movie recommendation system. Two different affinity measure algorithms of AIS, Kendall tau and Weighted Kappa, are used to calculate the correlation coefficients for this movie recommendation system. From the testing we think that Weighted Kappa is more suitable than Kendall tau for movie problems.\nThe Dendritic Cell algorithm (DCA) is inspired by recent work in innate immunity. In this paper a formal description of the DCA is given. The DCA is described in detail, and its use as an anomaly detector is illustrated within the context of computer security. A port scan detection task is performed to substantiate the influence of signal selection on the behaviour of the algorithm. Experimental results provide a comparison of differing input signal mappings.\nDendritic Cells (DCs) are innate immune system cells which have the power to activate or suppress the immune system. The behaviour of human of human DCs is abstracted to form an algorithm suitable for anomaly detection. We test this algorithm on the real-time problem of port scan detection. Our results show a significant difference in artificial DC behaviour for an outgoing portscan when compared to behaviour for normal processes.\nThis paper illustrates successful implementation of three evolutionary algorithms, namely- Particle Swarm Optimization(PSO), Artificial Bee Colony (ABC) and Bacterial Foraging Optimization (BFO) algorithms to economic load dispatch problem (ELD). Power output of each generating unit and optimum fuel cost obtained using all three algorithms have been compared. 
The results obtained show that ABC and BFO algorithms converge to optimal fuel cost with reduced computational time when compared to PSO for the two example problems considered.\nThe paper presents the electronic design and motion planning of a robot based on decision making regarding its straight motion and precise turn using Artificial Neural Network (ANN). The ANN helps in learning of robot so that it performs motion autonomously. The weights calculated are implemented in microcontroller. The performance has been tested to be excellent.\nThere is a need for new metaphors from immunology to flourish the application areas of Artificial Immune Systems. A metaheuristic called Obesity Heuristic derived from advances in obesity treatment is proposed. The main forces of the algorithm are the generation omega-6 and omega-3 fatty acids. The algorithm works with Just-In-Time philosophy; by starting only when desired. A case study of data cleaning is provided. With experiments conducted on standard tables, results show that Obesity Heuristic outperforms other algorithms, with 100% recall. This is a great improvement over other algorithms\nThis paper describes a new model for an artificial neural network processing unit or neuron. It is slightly different to a traditional feedforward network by the fact that it favours a mechanism of trying to match the wave-like 'shape' of the input with the shape of the output against specific value error corrections. The expectation is then that a best fit shape can be transposed into the desired output values more easily. This allows for notions of reinforcement through resonance and also the construction of synapses.\nThis paper motivates the study of decision theory as necessary for aligning smarter-than-human artificial systems with human interests. 
We discuss the shortcomings of two standard formulations of decision theory, and demonstrate that they cannot be used to describe an idealized decision procedure suitable for approximation by artificial systems. We then explore the notions of policy selection and logical counterfactuals, two recent insights into decision theory that point the way toward promising paths for future research.\nIt is well known that artificial neural networks (ANNs) can learn deterministic automata. Learning nondeterministic automata is another matter. This is important because much of the world is nondeterministic, taking the form of unpredictable or probabilistic events that must be acted upon. If ANNs are to engage such phenomena, then they must be able to learn how to deal with nondeterminism. In this project the game of Pong poses a nondeterministic environment. The learner is given an incomplete view of the game state and underlying deterministic physics, resulting in a nondeterministic game. Three models were trained and tested on the game: Mona, Elman, and Numenta's NuPIC.\nWe present a position paper advocating the notion that Stoic philosophy and ethics can inform the development of ethical A.I. systems. This is in sharp contrast to most work on building ethical A.I., which has focused on Utilitarian or Deontological ethical theories. We relate ethical A.I. to several core Stoic notions, including the dichotomy of control, the four cardinal virtues, the ideal Sage, Stoic practices, and Stoic perspectives on emotion or affect. More generally, we put forward an ethical view of A.I. that focuses more on internal states of the artificial agent rather than on external actions of the agent. We provide examples relating to near-term A.I. systems as well as hypothetical superintelligent agents.\nSelf-organization has been an important concept within a number of disciplines, which Artificial Life (ALife) also has heavily utilized since its inception. 
The term and its implications, however, are often confusing or misinterpreted. In this work, we provide a mini-review of self-organization and its relationship with ALife, aiming at initiating discussions on this important topic with the interested audience. We first articulate some fundamental aspects of self-organization, outline its usage, and review its applications to ALife within its soft, hard, and wet domains. We also provide perspectives for further research.\nAutonomous lifelong development and learning is a fundamental capability of humans, differentiating them from current deep learning systems. However, other branches of artificial intelligence have designed crucial ingredients towards autonomous learning: curiosity and intrinsic motivation, social learning and natural interaction with peers, and embodiment. These mechanisms guide exploration and autonomous choice of goals, and integrating them with deep learning opens stimulating perspectives. Deep learning (DL) approaches made great advances in artificial intelligence, but are still far away from human learning. As argued convincingly by Lake et al., differences include human capabilities to learn causal models of the world from very little data, leveraging compositional representations and priors like intuitive physics and psychology. However, there are other fundamental differences between current DL systems and human learning, as well as technical ingredients to fill this gap, that are either superficially, or not adequately, discussed by Lake et al. These fundamental mechanisms relate to autonomous development and learning. They are bound to play a central role in artificial intelligence in the future. Current DL systems require engineers to manually specify a task-specific objective function for every new task, and learn through off-line processing of large training databases. 
On the contrary, humans learn autonomously open-ended repertoires of skills, deciding for themselves which goals to pursue or value, and which skills to explore, driven by intrinsic motivation/curiosity and social learning through natural interaction with peers. Such learning processes are incremental, online, and progressive. Human child development involves a progressive increase of complexity in a curriculum of learning where skills are explored, acquired, and built on each other, through particular ordering and timing. Finally, human learning happens in the physical world, and through bodily and physical experimentation, under severe constraints on energy, time, and computational resources. In the two last decades, the field of Developmental and Cognitive Robotics (Cangelosi and Schlesinger, 2015, Asada et al., 2009), in strong interaction with developmental psychology and neuroscience, has achieved significant advances in computational\nQuantum information technologies, and intelligent learning systems, are both emergent technologies that will likely have a transforming impact on our society. The respective underlying fields of research -- quantum information (QI) versus machine learning (ML) and artificial intelligence (AI) -- have their own specific challenges, which have hitherto been investigated largely independently. However, in a growing body of recent work, researchers have been probing the question to what extent these fields can learn and benefit from each other. QML explores the interaction between quantum computing and ML, investigating how results and techniques from one field can be used to solve the problems of the other. Recently, we have witnessed breakthroughs in both directions of influence. For instance, quantum computing is finding a vital application in providing speed-ups in ML, critical in our \"big data\" world. Conversely, ML already permeates cutting-edge technologies, and may become instrumental in advanced quantum technologies. 
Aside from quantum speed-up in data analysis, or classical ML optimization used in quantum experiments, quantum enhancements have also been demonstrated for interactive learning, highlighting the potential of quantum-enhanced learning agents. Finally, works exploring the use of AI for the very design of quantum experiments, and for performing parts of genuine research autonomously, have reported their first successes. Beyond the topics of mutual enhancement, researchers have also broached the fundamental issue of quantum generalizations of ML/AI concepts. This deals with questions of the very meaning of learning and intelligence in a world that is described by quantum mechanics. In this review, we describe the main ideas, recent developments, and progress in a broad spectrum of research investigating machine learning and artificial intelligence in the quantum domain.\nSome recent studies have pointed that, the self-organization of neurons into brain-like structures, and the self-organization of ants into a swarm are similar in many respects. If possible to implement, these features could lead to important developments in pattern recognition systems, where perceptive capabilities can emerge and evolve from the interaction of many simple local rules. The principle of the method is inspired by the work of Chialvo and Millonas who developed the first numerical simulation in which swarm cognitive map formation could be explained. From this point, an extended model is presented in order to deal with digital image habitats, in which artificial ants could be able to react to the environment and perceive it. Evolution of pheromone fields point that artificial ant colonies could react and adapt appropriately to any type of digital habitat. 
KEYWORDS: Swarm Intelligence, Self-Organization, Stigmergy, Artificial Ant Systems, Pattern Recognition and Perception, Image Segmentation, Gestalt Perception Theory, Distributed Computation.\nDid natural consciousness and intelligent systems arise out of a path that was co-evolutionary to evolution? Can we explain human self-consciousness as having risen out of such an evolutionary path? If so how could it have been?   In this first part of a two-part paper (titled IXI), we take a learning system perspective to the problem of consciousness and intelligent systems, an approach that may look unseasonable in this age of fMRI's and high tech neuroscience.   We posit conscious intelligent systems in natural environments and wonder how natural factors influence their design paths. Such a perspective allows us to explain seamlessly a variety of natural factors, factors ranging from the rise and presence of the human mind, man's sense of I, his self-consciousness and his looping thought processes to factors like reproduction, incubation, extinction, sleep, the richness of natural behavior, etc. It even allows us to speculate on a possible human evolution scenario and other natural phenomena.\nA definition of intelligence is given in terms of performance that can be quantitatively measured. In this study, we have presented a conceptual model of Intelligent Agent System for Automatic Vehicle Checking Agent (VCA). To achieve this goal, we have introduced several kinds of agents that exhibit intelligent features. These are the Management agent, internal agent, External Agent, Watcher agent and Report agent. Metrics and measurements are suggested for evaluating the performance of Automatic Vehicle Checking Agent (VCA). Calibrate data and test facilities are suggested to facilitate the development of intelligent systems.\nThe relation between self awareness and intelligence is an open problem these days. 
Despite the fact that self awarness is usually related to Emotional Intelligence, this is not the case here. The problem described in this paper is how to model an agent which knows (Cognitive) Binary Logic and which is also able to pass (without any mistake) a certain family of Turing Tests designed to verify its knowledge and its discourse about the modal states of truth corresponding to well-formed formulae within the language of Propositional Binary Logic.\nAgents' judgment depends on perception and previous knowledge. Assuming that previous knowledge depends on perception, we can say that judgment depends on perception. So, if judgment depends on perception, can agents judge that they have the same perception? In few words, this is the addressed paradox through this document. While illustrating on the paradox, it's found that to reach agreement in communication, it's not necessary for parties to have the same perception however the necessity is to have perception correspondence. The attempted solution to this paradox reveals a potential uncertainty in judging the matter thus supporting the skeptical view of the problem. Moreover, relating perception to intelligence, the same uncertainty is inherited by judging the level of intelligence of an agent compared to others not necessarily from the same kind (e.g. machine intelligence compared to human intelligence). Using a proposed simple mathematical model for perception and action, a tool is developed to construct scenarios, and the problem is addressed mathematically such that conclusions are drawn systematically based on mathematically defined properties. When it comes to formalization, philosophical arguments and views become more visible and explicit.\nWeb space is the huge repository of data. Everyday lots of new information get added to this web space. The more the information, more is demand for tools to access that information. 
Answering users' queries about the online information intelligently is one of the great challenges in information retrieval in intelligent systems. In this paper, we will start with the brief introduction on information retrieval and intelligent systems and explain how swoogle, the semantic search engine, uses its algorithms and techniques to search for the desired contents in the web. We then continue with the clustering technique that is used to group the similar things together and discuss the machine learning technique called Self-organizing maps [6] or SOM, which is a data visualization technique that reduces the dimensions of data through the use of self-organizing neural networks. We then discuss how SOM is used to visualize the contents of the data, by following some lines of algorithm, in the form of maps. So, we could say that websites or machines can be used to retrieve the information that what exactly users want from them.\nThis paper discusses the merits and demerits of crisp logic and fuzzy logic with respect to their applicability in intelligent response generation by a human being and by a robot. Intelligent systems must have the capability of taking decisions that are wise and handle situations intelligently. A direct relationship exists between the level of perfection in handling a situation and the level of completeness of the available knowledge or information or data required to handle the situation. The paper concludes that the use of crisp logic with complete knowledge leads to perfection in handling situations whereas fuzzy logic can handle situations imperfectly only. 
However, in the light of availability of incomplete knowledge fuzzy theory is more effective but may be disadvantageous as compared to crisp logic.\nWith advancement in computer science research on artificial intelligence and in cognitive psychology research on human learning and performance, the next generation of computer-based tutoring systems moved beyond the simple presentation of pages of text or graphics. These new intelligent tutoring systems (ITSs) called cognitive tutors; incorporated model-tracing technology which is a cognitive model of student problem solving that captures students multiple strategies and common misconceptions. Such Intelligent tutoring systems or Knowledge Based Tutoring Systems can guide learners to progress in the learning process at their best. This paper deals with the review of various Intelligent tutoring systems using Bayesian Networks and how Bayesian Networks can be used for efficient decision making.\nThe goal of this paper is to elaborate swarm intelligence for business intelligence decision making and the business rules management improvement. .The swarm optimization, which is highly influenced by the behavior of creature, performs in group. The Spatial data is defined as data that is represented by 2D or 3D images. SQL Server supports only 2D images till now. As we know that location is an essential part of any organizational data as well as business data enterprises maintain customer address lists, own property, ship goods from and to warehouses, manage transport flows among their workforce, and perform many other activities. By means to say a lot of spatial data is used and processed by enterprises, organizations and other bodies in order to make the things more visible and self descriptive. 
From the experiments, we found that PSO is can facilitate the intelligence in social and business behavior.\nIn this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelisation of a collection of algorithms that deal with inference and learning of PGMs from data. Namely, maximum likelihood estimation, importance sampling, and greedy search for solving combinatorial optimisation problems. Through these concrete examples, we tackle the problem of defining efficient data structures for PGMs and parallel processing of same-size batches of data sets using Java 8 features. We also provide straightforward techniques to code parallel algorithms that seamlessly exploit multi-core processors. The experimental analysis, carried out using our open source AMIDST (Analysis of MassIve Data STreams) Java toolbox, shows the merits of the proposed solutions.\nAs machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence. A common approach is to propose tasks for which a human excels, but one which machines find difficult. However, an ideal task should also be easy to evaluate and not be easily gameable. We begin with a case study exploring the recently popular task of image captioning and its limitations as a task for measuring machine intelligence. An alternative and more promising task is Visual Question Answering that tests a machine's ability to reason about language and vision. We describe a dataset unprecedented in size created for the task that contains over 760,000 human generated questions about images. 
Using around 10 million human generated answers, machines may be easily evaluated.\nIn order to identify an object, human eyes firstly search the field of view for points or areas which have particular properties. These properties are used to recognise an image or an object. Then this process could be taken as a model to develop computer algorithms for images identification. This paper proposes the idea of applying the simplified firefly algorithm to search for key-areas in 2D images. For a set of input test images the proposed version of firefly algorithm has been examined. Research results are presented and discussed to show the efficiency of this evolutionary computation method.\nWe obtain the conditions for the emergence of the swarm intelligence effect in an interactive game of restless multi-armed bandit (rMAB). A player competes with multiple agents. Each bandit has a payoff that changes with a probability $p_{c}$ per round. The agents and player choose one of three options: (1) Exploit (a good bandit), (2) Innovate (asocial learning for a good bandit among $n_{I}$ randomly chosen bandits), and (3) Observe (social learning for a good bandit). Each agent has two parameters $(c,p_{obs})$ to specify the decision: (i) $c$, the threshold value for Exploit, and (ii) $p_{obs}$, the probability for Observe in learning. The parameters $(c,p_{obs})$ are uniformly distributed. We determine the optimal strategies for the player using complete knowledge about the rMAB. We show whether or not social or asocial learning is more optimal in the $(p_{c},n_{I})$ space and define the swarm intelligence effect. We conduct a laboratory experiment (67 subjects) and observe the swarm intelligence effect only if $(p_{c},n_{I})$ are chosen so that social learning is far more optimal than asocial learning.\nWe continue our analysis of volume and energy measures that are appropriate for quantifying inductive inference systems. 
We extend logical depth and conceptual jump size measures in AIT to stochastic problems, and physical measures that involve volume and energy. We introduce a graphical model of computational complexity that we believe to be appropriate for intelligent machines. We show several asymptotic relations between energy, logical depth and volume of computation for inductive inference. In particular, we arrive at a \"black-hole equation\" of inductive inference, which relates energy, volume, space, and algorithmic information for an optimal inductive inference solution. We introduce energy-bounded algorithmic entropy. We briefly apply our ideas to the physical limits of intelligent computation in our universe.\nDeveloping intelligent systems requires combining results from both industry and academia. In this report you find an overview of relevant research fields and industrially applicable technologies for building very large scale cyber physical systems. A concept architecture is used to illustrate how existing pieces may fit together, and the maturity of the subsystems is estimated.   The goal is to structure the developments and the challenge of machine intelligence for Consumer and Industrial Internet technologists, cyber physical systems researchers and people interested in the convergence of data & Internet of Things. It can be used for planning developments of intelligent systems.\nWe are at the dawn of a new era, where advances in computer power, broadband communication and digital sensor technologies have led to an unprecedented flood of data inundating our surrounding. It is generally believed that means such as Computational Intelligence will help to outlive these tough times. However, these hopes are improperly high. 
Computational Intelligence is a surprising composition of two mutually exclusive and contradicting constituents that could be coupled only if you disregard and neglect their controversies: \"Computational\" implies reliance on data processing and \"Intelligence\" implies reliance on information processing. Only those who are indifferent to data-information discrepancy can believe that such a combination can be viable. We do not believe in miracles, so we will try to share with you our reservations.\nThis paper describes how the \"SP theory of intelligence\", outlined in an Appendix, may throw light on aspects of commonsense reasoning (CSR) and commonsense knowledge (CSK) (together shortened to CSRK), as discussed in another paper by Ernest Davis and Gary Marcus (DM). The SP system has the generality needed for CSRK: Turing equivalence; the generality of information compression as the foundation for the SP system both in the representation of knowledge and in concepts of prediction and probability; the versatility of the SP system in the representation of knowledge and in aspects of intelligence including forms of reasoning; and the potential of the system for the seamless integration of diverse forms of knowledge and diverse aspects of intelligence. Several examples discussed by DM, and how they may be processed in the SP system, are discussed. Also discussed are current successes in CSR (taxonomic reasoning, temporal reasoning, action and change, and qualitative reasoning), how the SP system may promote seamless integration across these areas, and how insights gained from the SP programme of research may yield some potentially useful new ways of approaching these topics. 
The paper considers how the SP system may help overcome several challenges in the automation of CSR described by DM, and how it meets several of the objectives for research in CSRK that they have described.\nSince its inception, artificial intelligence has relied upon a theoretical foundation centered around perfect rationality as the desired property of intelligent systems. We argue, as others have done, that this foundation is inadequate because it imposes fundamentally unsatisfiable requirements. As a result, there has arisen a wide gap between theory and practice in AI, hindering progress in the field. We propose instead a property called bounded optimality. Roughly speaking, an agent is bounded-optimal if its program is a solution to the constrained optimization problem presented by its architecture and the task environment. We show how to construct agents with this property for a simple class of machine architectures in a broad class of real-time environments. We illustrate these results using a simple model of an automated mail sorting facility. We also define a weaker property, asymptotic bounded optimality (ABO), that generalizes the notion of optimality in classical complexity theory. We then construct universal ABO programs, i.e., programs that are ABO no matter what real-time constraints are applied. Universal ABO programs can be used as building blocks for more complex systems. We conclude with a discussion of the prospects for bounded optimality as a theoretical basis for AI, and relate it to similar trends in philosophy, economics, and game theory.\nPenetration Testing is a methodology for assessing network security, by generating and executing possible hacking attacks. Doing so automatically allows for regular and systematic testing. A key question is how to generate the attacks. This is naturally formulated as planning under uncertainty, i.e., under incomplete knowledge about the network configuration. 
Previous work uses classical planning, and requires costly pre-processes reducing this uncertainty by extensive application of scanning methods. By contrast, we herein model the attack planning problem in terms of partially observable Markov decision processes (POMDP). This allows to reason about the knowledge available, and to intelligently employ scanning actions as part of the attack. As one would expect, this accurate solution does not scale. We devise a method that relies on POMDPs to find good attacks on individual machines, which are then composed into an attack on the network as a whole. This decomposition exploits network structure to the extent possible, making targeted approximations (only) where needed. Evaluating this method on a suitably adapted industrial test suite, we demonstrate its effectiveness in both runtime and solution quality.\nCognitive radio networks (CRNs) are networks of nodes equipped with cognitive radios that can optimize performance by adapting to network conditions. While cognitive radio networks (CRN) are envisioned as intelligent networks, relatively little research has focused on the network level functionality of CRNs. Although various routing protocols, incorporating varying degrees of adaptiveness, have been proposed for CRNs, it is imperative for the long term success of CRNs that the design of cognitive routing protocols be pursued by the research community. Cognitive routing protocols are envisioned as routing protocols that fully and seamless incorporate AI-based techniques into their design. In this paper, we provide a self-contained tutorial on various AI and machine-learning techniques that have been, or can be, used for developing cognitive routing protocols. We also survey the application of various classes of AI techniques to CRNs in general, and to the problem of routing in particular. 
We discuss various decision making techniques and learning techniques from AI and document their current and potential applications to the problem of routing in CRNs. We also highlight the various inference, reasoning, modeling, and learning sub tasks that a cognitive routing protocol must solve. Finally, open research issues and future directions of work are identified.\nHow useful can machine learning be in a quantum laboratory? Here we raise the question of the potential of intelligent machines in the context of scientific research. A major motivation for the present work is the unknown reachability of various entanglement classes in quantum experiments. We investigate this question by using the projective simulation model, a physics-oriented approach to artificial intelligence. In our approach, the projective simulation system is challenged to design complex photonic quantum experiments that produce high-dimensional entangled multiphoton states, which are of high interest in modern quantum experiments. The artificial intelligence system learns to create a variety of entangled states, and improves the efficiency of their realization. In the process, the system autonomously (re)discovers experimental techniques which are only now becoming standard in modern quantum optical experiments - a trait which was not explicitly demanded from the system but emerged through the process of learning. Such features highlight the possibility that machines could have a significantly more creative role in future research.\nIn this paper, we conduct an empirical study on discovering the ordered collective dynamics obtained by a population of artificial intelligence (AI) agents. Our intention is to put AI agents into a simulated natural context, and then to understand their induced dynamics at the population level. In particular, we aim to verify if the principles developed in the real world could also be used in understanding an artificially-created intelligent population. 
To achieve this, we simulate a large-scale predator-prey world, where the laws of the world are designed by only the findings or logical equivalence that have been discovered in nature. We endow the agents with the intelligence based on deep reinforcement learning, and scale the population size up to millions. Our results show that the population dynamics of AI agents, driven only by each agent's individual self interest, reveals an ordered pattern that is similar to the Lotka-Volterra model studied in population biology. We further discover the emergent behaviors of collective adaptations in studying how the agents' grouping behaviors will change with the environmental resources. Both of the two findings could be explained by the self-organization theory in nature.\nPower grids are one of the most important components of infrastructure in today's world. Every nation is dependent on the security and stability of its own power grid to provide electricity to the households and industries. A malfunction of even a small part of a power grid can cause loss of productivity, revenue and in some cases even life. Thus, it is imperative to design a system which can detect the health of the power grid and take protective measures accordingly even before a serious anomaly takes place. To achieve this objective, we have set out to create an artificially intelligent system which can analyze the grid information at any given time and determine the health of the grid through the usage of sophisticated formal models and novel machine learning techniques like recurrent neural networks. Our system simulates grid conditions including stimuli like faults, generator output fluctuations, load fluctuations using Siemens PSS/E software and this data is trained using various classifiers like SVM, LSTM and subsequently tested. The results are excellent with our methods giving very high accuracy for the data. 
This model can easily be scaled to handle larger and more complex grid architectures.\nWe propose Cognitive Databases, an approach for transparently enabling Artificial Intelligence (AI) capabilities in relational databases. A novel aspect of our design is to first view the structured data source as meaningful unstructured text, and then use the text to build an unsupervised neural network model using a Natural Language Processing (NLP) technique called word embedding. This model captures the hidden inter-/intra-column relationships between database tokens of different types. For each database token, the model includes a vector that encodes contextual semantic relationships. We seamlessly integrate the word embedding model into existing SQL query infrastructure and use it to enable a new class of SQL-based analytics queries called cognitive intelligence (CI) queries. CI queries use the model vectors to enable complex queries such as semantic matching, inductive reasoning queries such as analogies, predictive queries using entities not present in a database, and, more generally, using knowledge from external sources. We demonstrate unique capabilities of Cognitive Databases using an Apache Spark based prototype to execute inductive reasoning CI queries over a multi-modal database containing text and images. We believe our first-of-a-kind system exemplifies using AI functionality to endow relational databases with capabilities that were previously very hard to realize in practice.\nThe biological immune system is a robust, complex, adaptive system that defends the body from foreign pathogens. It is able to categorize all cells (or molecules) within the body as self-cells or non-self cells. It does this with the help of a distributed task force that has the intelligence to take action from a local and also a global perspective using its network of chemical messengers for communication. There are two major branches of the immune system. 
The innate immune system is an unchanging mechanism that detects and destroys certain invading organisms, whilst the adaptive immune system responds to previously unknown foreign cells and builds a response to them that can remain in the body over a long period of time. This remarkable biological information-processing system has caught the attention of computer scientists in recent years. A novel computational intelligence technique inspired by immunology, called Artificial Immune Systems, has emerged. Several concepts from the immune system have been extracted and applied to solve real-world science and engineering problems. In this tutorial, we briefly describe the immune system metaphors that are relevant to existing Artificial Immune Systems methods. We will then show illustrative real-world problems suitable for Artificial Immune Systems and give a step-by-step algorithm walkthrough for one such problem. A comparison of Artificial Immune Systems with other well-known algorithms, areas for future work, tips & tricks and a list of resources will round this tutorial off. It should be noted that, as Artificial Immune Systems is still a young and evolving field, there is not yet a fixed algorithm template, and hence actual implementations might differ somewhat from time to time and from the examples given here.\nExperiments in cognitive science and decision theory show that the ways in which people combine concepts and make decisions cannot be described by classical logic and probability theory. This has serious implications for applied disciplines such as information retrieval, artificial intelligence and robotics. Inspired by a mathematical formalism that generalizes quantum mechanics, the authors have constructed a contextual framework for both concept representation and decision making, together with quantum models that are in strong alignment with experimental data.
The results can be interpreted by assuming the existence in human thought of a double-layered structure, a 'classical logical thought' and a 'quantum conceptual thought', the latter being responsible for the above paradoxes and nonclassical effects. The presence of a quantum structure in cognition is relevant, for it shows that quantum mechanics provides not only a useful modeling tool for experimental data but also a structural model for human and artificial thought processes. This approach has strong connections with theories formalizing meaning, such as semantic analysis, and also has a deep impact on computer science, information retrieval and artificial intelligence. More specifically, the links with information retrieval are discussed in this paper.\nBy introducing elements of information mining into tax analysis, by means of data-mining software and advanced computational concepts of artificial intelligence, this work addresses the problem of tax evasion as a crime against public property. Through an empirical approach based on a hypothetical use case, induction algorithms, neural networks and Bayesian networks are applied to determine the feasibility of their heuristic application by the public tax administrator. Different strategies are explored to facilitate the work of local and regional federal tax inspectors, taking into account their limited computational capabilities, while remaining equally effective for social scientists committed to hand-crafted tax research.
\nMany of the artificial intelligence techniques developed to date rely on heuristic search through large spaces. Unfortunately, the size of these spaces and the corresponding computational effort reduce the applicability of otherwise novel and effective algorithms. A number of parallel and distributed approaches to search have considerably improved the performance of the search process. Our goal is to develop an architecture that automatically selects parallel search strategies for optimal performance on a variety of search problems. In this paper we describe one such architecture realized in the Eureka system, which combines the benefits of many different approaches to parallel heuristic search. Through empirical and theoretical analyses we observe that features of the problem space directly affect the choice of optimal parallel search strategy. We then employ machine learning techniques to select the optimal parallel search strategy for a given problem space. When a new search task is input to the system, Eureka uses features describing the search space and the chosen architecture to automatically select the appropriate search strategy. Eureka has been tested on a MIMD parallel processor, a distributed network of workstations, and a single workstation using multithreading.
Results generated from fifteen-puzzle problems, robot arm motion problems, artificial search spaces, and planning problems indicate that Eureka outperforms each of the tested strategies when that strategy is used exclusively for all problem instances, and that it greatly reduces the search time for these applications.\nThe increasing attention to deep learning has greatly spurred the design of intelligence processing hardware. The variety of emerging intelligence processors requires standard benchmarks for fair comparison and system optimization (in both software and hardware). However, existing benchmarks are unsuitable for benchmarking intelligence processors due to their lack of diversity and representativeness. Also, the lack of a standard benchmarking methodology further exacerbates this problem. In this paper, we propose BENCHIP, a benchmark suite and benchmarking methodology for intelligence processors. The benchmark suite in BENCHIP consists of two sets of benchmarks: microbenchmarks and macrobenchmarks. The microbenchmarks consist of single-layer networks and are mainly designed for bottleneck analysis and system optimization. The macrobenchmarks contain state-of-the-art industrial networks to offer a realistic comparison of different platforms. We also propose a standard benchmarking methodology built upon an industrial software stack and evaluation metrics that comprehensively reflect the various characteristics of the evaluated intelligence processors. BENCHIP is used to evaluate various hardware platforms, including CPUs, GPUs, and accelerators. BENCHIP will be open-sourced soon.\nThe Turing Test (TT) checks for human intelligence, rather than any putative general intelligence. It involves repeated interaction requiring learning in the form of adaption to the human conversation partner. It is a macro-level, post-hoc test, in contrast to the definition of a Turing Machine (TM), which is an a priori, micro-level definition.
This raises the question of whether learning is just another computational process, i.e. whether it can be implemented as a TM. Here we argue that learning or adaption is fundamentally different from computation, though it does involve processes that can be seen as computations. To illustrate this difference we compare (a) designing a TM and (b) learning a TM, defining them for the purpose of the argument. We show that there is a well-defined sequence of problems which are not effectively designable but are learnable, in the form of the bounded halting problem. Some characteristics of human intelligence are reviewed, including its interactive nature, learning abilities, imitative tendencies, linguistic ability and context-dependency. A story that explains some of these is the Social Intelligence Hypothesis. If this is broadly correct, it points to the necessity of a considerable period of acculturation (social learning in context) if an artificial intelligence is to pass the TT. Whilst it is always possible to 'compile' the results of learning into a TM, this would not be a designed TM and would not be able to continually adapt (pass future TTs). We conclude three things: that a purely \"designed\" TM will never pass the TT; that there is no such thing as a general intelligence, since it necessarily involves learning; and that learning/adaption and computation should be clearly distinguished.\nIndividual-intelligence research, from a neurological perspective, discusses the hierarchical layers of the cortex as a structure that performs conceptual abstraction and specification. This theory has been used to explain how motor-cortex regions responsible for different behavioral modalities, such as writing and speaking, can be utilized to express the same general concept represented higher in the cortical hierarchy.
For example, the concept of a dog, represented across a region of high-level cortical neurons, can be either written or spoken about depending on the individual's context. The higher-layer cortical areas project down the hierarchy, sending abstract information to specific regions of the motor cortex for contextual implementation. In this paper, this idea is expanded to incorporate collective intelligence within a hyper-cortical construct. This hyper-cortex is a multi-layered network used to represent abstract collective concepts. These ideas play an important role in understanding how collective-intelligence systems can be engineered to handle problem abstraction and solution specification. Finally, a collection of common problems in the scientific community is solved using an artificial hyper-cortex generated from digital-library metadata.\nIn this article, a turn-based game played on four networked computers is investigated. Three of the players are humans (natural intelligence) and one is an artificial intelligence. Each player's monitor shows the game table from that player's own viewpoint. The domino pieces are three-dimensional. The TCP/IP protocol is used for the distributed system, and Microsoft XNA technology is used to render the 3D images. Domino 101 is a nondeterministic game, that is, the result of the game depends on the initial random distribution of the pieces. The number of distributions is equal to the product of the following combinations: . Moreover, the four players are divided into two pairs. Accordingly, we cannot predict whether a player plays according to his/her partner's dominoes or according to his/her own dominoes. The fact that a human player can be of any skill level also affects the outcome. These factors make it difficult to develop an AI. In this article, four levels of AI are developed.
The AI at the first level is equivalent to the intelligence of a child who knows the rules of the game and recognizes the numbers. At this level, the AI plays a domino if it has a suitable one; otherwise it passes. In most of the games that can be played on the internet, the AI does the same. The AI at the last level, however, is a master player, and it can adapt itself to its competitors' levels.\nMycotoxin contamination in certain agricultural systems has been a serious concern for human and animal health. Mycotoxins are toxic substances produced mostly as secondary metabolites by fungi that grow on seeds and feed in the field or in storage. The food-borne Mycotoxins likely to be of greatest significance for human health in tropical developing countries are Aflatoxins and Fumonisins. Chili pepper is also prone to Aflatoxin contamination during the harvesting, production and storage periods. Various methods used for the detection of Mycotoxins give accurate results, but they are slow, expensive and destructive. Destructive testing degrades the sample under investigation, whereas non-destructive testing allows the part to be used for its intended purpose afterwards. Ultrasonic methods, multispectral image processing methods, terahertz methods, X-ray and thermography have been very popular in the nondestructive testing and characterization of materials and in health monitoring. Image processing methods are used to improve the visual quality of pictures and to extract useful information from them. In this proposed work, chili pepper samples will be collected, and X-ray and multispectral images of the samples will be processed using image processing methods. The term \"Computational Intelligence\" refers to the simulation of human intelligence on computers; it is also called the \"Artificial Intelligence\" (AI) approach. The techniques used in this approach include neural networks, fuzzy logic and evolutionary computation.
Finally, the computational intelligence method will be used together with image processing to provide accurate, high-performance detection of the Mycotoxin level in the collected samples.\nThis article describes existing and expected benefits of the \"SP theory of intelligence\", and some potential applications. The theory aims to simplify and integrate ideas across artificial intelligence, mainstream computing, and human perception and cognition, with information compression as a unifying theme. It combines conceptual simplicity with descriptive and explanatory power across several areas of computing and cognition. In the \"SP machine\" (an expression of the SP theory that is currently realized in the form of a computer model), there is potential for an overall simplification of computing systems, including software. The SP theory promises deeper insights and better solutions in several areas of application including, most notably, unsupervised learning, natural language processing, autonomous robots, computer vision, intelligent databases, software engineering, information compression, medical diagnosis and big data. There is also potential in areas such as the semantic web, bioinformatics, structuring of documents, the detection of computer viruses, data fusion, new kinds of computer, and the development of scientific theories. The theory promises seamless integration of structures and functions within and between different areas of application. The potential value, worldwide, of these benefits and applications is at least $190 billion each year. Further development would be facilitated by the creation of a highly parallel, open-source version of the SP machine, available to researchers everywhere.\nNowadays, led by deep learning techniques, the field of machine learning is experiencing unprecedented prosperity, and its influence is evident in academia, industry and civil society.
\"Intelligent\" has become a label that cannot be neglected for most applications; celebrities and scientists have also warned that the development of full artificial intelligence may spell the end of the human race. It seems that the answer to building a computer system that can automatically improve with experience is just around the corner. However, among AI and machine learning researchers the consensus is that we are nowhere near the core technique that could bring the Terminator, Number 5 or R2D2 into real life, and there is not even a formal definition of what intelligence is, or of one of its basic properties: learning. Therefore, even though researchers know these concerns are currently unwarranted, there is no general explanation of why they are unwarranted, or of which properties would make them warranted. Starting from an analysis of the relation between information and its representation, this paper proposes a necessary condition for a model to be a learning model. This condition and related future work could be used to verify whether or not a system is able to learn, and to enrich our understanding of learning, one important property of intelligence.\nReinforcement learning is a general and powerful framework with which to study and implement artificial intelligence. Recent advances in deep learning have enabled RL algorithms to achieve impressive performance in restricted domains such as playing Atari video games (Mnih et al., 2015) and, recently, the board game Go (Silver et al., 2016). However, we are still far from constructing a generally intelligent agent. Many of the obstacles and open questions are conceptual: What does it mean to be intelligent? How does one explore and learn optimally in general, unknown environments? What, in fact, does it mean to be optimal in the general sense?
The universal Bayesian agent AIXI (Hutter, 2005) is a model of a maximally intelligent agent, and plays a central role in the sub-field of general reinforcement learning (GRL). Recently, AIXI has been shown to be flawed in important ways; it doesn't explore enough to be asymptotically optimal (Orseau, 2010), and it can perform poorly with certain priors (Leike and Hutter, 2015). Several variants of AIXI have been proposed to attempt to address these shortfalls: among them are entropy-seeking agents (Orseau, 2011), knowledge-seeking agents (Orseau et al., 2013), Bayes with bursts of exploration (Lattimore, 2013), MDL agents (Leike, 2016a), Thompson sampling (Leike et al., 2016), and optimism (Sunehag and Hutter, 2015). We present AIXIjs, a JavaScript implementation of these GRL agents. This implementation is accompanied by a framework for running experiments against various environments, similar to OpenAI Gym (Brockman et al., 2016), and a suite of interactive demos that explore different properties of the agents, similar to REINFORCEjs (Karpathy, 2015). We use AIXIjs to present numerous experiments illustrating fundamental properties of, and differences between, these agents.\nIn this paper, we present MLEANN (Meta-Learning Evolutionary Artificial Neural Network), an automatic computational framework for the adaptive optimization of artificial neural networks wherein the neural network architecture, activation function, connection weights, learning algorithm and its parameters are adapted according to the problem. We explored the performance of MLEANN and conventionally designed artificial neural networks for function approximation problems. To evaluate the comparative performance, we used three different well-known chaotic time series. We also present popular state-of-the-art neural network learning algorithms and some experimental results related to convergence speed and generalization performance.
We explored the performance of the backpropagation, conjugate gradient, quasi-Newton and Levenberg-Marquardt algorithms on the three chaotic time series. The performance of the different learning algorithms was evaluated as the activation functions and architecture were changed. We further present the theoretical background, algorithm and design strategy, and demonstrate how effective the proposed MLEANN framework is for designing a neural network that is smaller, faster and has better generalization performance.\nArtificial immune systems (AISs) to date have generally been inspired by naive biological metaphors. This has limited the effectiveness of these systems. In this position paper, two ways in which AISs could be made more biologically realistic are discussed. We propose that AISs should draw their inspiration from organisms that possess only innate immune systems, and that AISs should employ systemic models of the immune system to structure their overall design. An outline of plant and invertebrate immune systems is presented, and the impact that more biologically realistic AISs could have on a number of contemporary research areas is also discussed.\nIn a previous paper the authors argued the case for incorporating ideas from innate immunity into artificial immune systems (AISs) and presented an outline for a conceptual framework for such systems. A number of key general properties observed in the biological innate and adaptive immune systems were highlighted, and how such properties might be instantiated in artificial systems was discussed in detail. The next logical step is to take these ideas and build a software system with which AISs with these properties can be implemented and experimentally evaluated. This paper reports on the results of that step: the libtissue system.\nThe complementary DNA (cDNA) sequence is considered to be the magic biometric technique for personal identification.
In this paper, we present a new method for cDNA recognition based on an artificial neural network (ANN). Microarray imaging is used for the concurrent identification of thousands of genes. We segment the locations of the spots in a cDNA microarray; precise localization and segmentation of each spot are essential to obtain an accurate intensity measurement, leading to a more precise measurement of gene expression. The segmented cDNA microarray image is resized and used as the input to the proposed artificial neural network, which we train for matching and recognition. Recognition results are given for galleries of cDNA sequences. The numerical results show that the proposed matching technique is effective for processing cDNA sequences. We also compare our results with previous results and find that the proposed technique achieves effective matching performance.\nCan we evolve better training data for machine learning algorithms? To investigate this question we use population-based optimisation algorithms to generate artificial surrogate training data for naive Bayes for regression. We demonstrate that the generalisation performance of naive Bayes for regression models is enhanced by training them on the artificial data as opposed to the real data. These results are important for two reasons. Firstly, naive Bayes models are simple and interpretable but frequently underperform compared to more complex \"black box\" models, and therefore new methods of enhancing accuracy are called for. Secondly, the idea of using the real training data indirectly in the construction of the artificial training data, as opposed to directly for model training, is a novel twist on the usual machine learning paradigm.\nA system with artificial intelligence usually relies on symbol manipulation, at least partly and implicitly.
However, the interpretation of the symbols - what they represent and what they are about - is ultimately left to humans, as designers and users of the system. How symbols can acquire meaning for the system itself, independent of external interpretation, is an unsolved problem. Some grounding of symbols can be obtained by embodiment, that is, by causally connecting symbols (or sub-symbolic variables) to the physical environment, such as in a robot with sensors and effectors. However, a causal connection as such does not produce representation and aboutness of the kind that symbols have for humans. Here I present a theory that explains how humans and other living organisms have acquired the capability to have symbols and sub-symbolic variables that represent, refer to, and are about something else. The theory shows how reference can be to physical objects, but also to abstract objects, and even how it can be misguided (errors in reference) or be about non-existing objects. I subsequently abstract the primary components of the theory from their biological context, and discuss how and under what conditions the theory could be implemented in artificial agents. A major component of the theory is the strong nonlinearity associated with (potentially unlimited) self-reproduction. The latter is likely not acceptable in artificial systems. It remains unclear if goals other than those inherently serving self-reproduction can have aboutness and if such goals could be stabilized.\nThrough the success of deep learning, Artificial Neural Networks (ANNs) are among the most used artificial intelligence methods nowadays. ANNs have led to major breakthroughs in various domains, such as particle physics, reinforcement learning, speech recognition, computer vision, and so on. Taking inspiration from the network properties of biological neural networks (e.g. 
sparsity, scale-freeness), we argue that (contrary to general practice) Artificial Neural Networks (ANN), too, should not have fully-connected layers. We show how ANNs perform perfectly well with sparsely-connected layers. Following a Darwinian evolutionary approach, we propose a novel algorithm that evolves an initial random sparse topology (i.e. an Erdős-Rényi random graph) of two consecutive layers of neurons into a scale-free topology during the ANN training process. The resulting sparse layers can safely replace the corresponding fully-connected layers. Our method allows us to quadratically reduce the number of parameters in the fully-connected layers of ANNs, yielding quadratically faster computation in both phases (i.e. training and inference), with no decrease in accuracy. We demonstrate our claims on two popular ANN types (restricted Boltzmann machine and multi-layer perceptron), on two types of tasks (supervised and unsupervised learning), and on 14 benchmark datasets. We anticipate that our approach will enable ANNs with billions of neurons and evolved topologies to handle complex real-world tasks that are intractable using state-of-the-art methods.\nDespite the efforts of many researchers in the area of multi-agent systems (MAS) to design and program agents, a few years ago the research community began to recognize that different MAS share common features. Based on these common features, several tools have tackled the problem of agent development for specific application domains or specific types of agents. As a consequence, their scope is restricted to a subset of the huge application domain of MAS. In this paper we propose Brainstorm/J, a generic infrastructure for programming agents. The infrastructure has been implemented as an object-oriented framework.
As a consequence, our approach is flexible and reusable, and supports a broader scope of MAS applications than previous efforts.\nInformation Integration is a young and exciting field with enormous research and commercial significance in the new world of the Information Society. It stands at the crossroads of Databases and Artificial Intelligence, requiring novel techniques that bring together different methods from these fields. Information from disparate, heterogeneous sources, often with no a priori common schema, needs to be synthesized in a flexible, transparent and intelligent way in order to respond to the demands of a query, thus enabling a more informed decision by the user or application program. Although relatively young, the field has already found many practical applications, particularly for integrating information over the World Wide Web. This paper gives a brief introduction to the field, highlighting some of the main current and future research issues and application areas. It attempts to evaluate the current and potential role of Computational Logic in this field and suggests some of the problems where logic-based techniques could be used.\nThis paper presents a new approach to solving N-queen problems, which involves a model of distributed autonomous agents with artificial life (ALife) and a method of representing N-queen constraints in an agent environment. The distributed agents locally interact with their living environment, i.e., a chessboard, and execute their reactive behaviors by applying their behavioral rules for randomized motion, least-conflict position searching, and cooperation with other agents. The agent-based N-queen problem solving system evolves through selection and contest according to the rule of Survival of the Fittest, in which some agents will die or be eaten if their moving strategies are less efficient than those of others. The experimental results have shown that this system is capable of solving large-scale N-queen problems.
This paper also provides a model of ALife agents for solving general constraint satisfaction problems (CSPs).\nThe study of belief change has been an active area in philosophy and AI. In recent years two special cases of belief change, belief revision and belief update, have been studied in detail. In a companion paper, we introduce a new framework to model belief change. This framework combines temporal and epistemic modalities with a notion of plausibility, allowing us to examine the change of beliefs over time. In this paper, we show how belief revision and belief update can be captured in our framework. This allows us to compare the assumptions made by each method, and to better understand the principles underlying them. In particular, it shows that Katsuno and Mendelzon's notion of belief update depends on several strong assumptions that may limit its applicability in artificial intelligence. Finally, our analysis allows us to identify a notion of minimal change that underlies a broad range of belief change operations, including revision and update.\nThe use of intelligent systems for stock market predictions has been widely established. In this paper, we investigate how the seemingly chaotic behavior of stock markets could be well represented using several connectionist paradigms and soft computing techniques. To demonstrate the different techniques, we considered the Nasdaq-100 index of the Nasdaq Stock Market and the S&P CNX NIFTY stock index. We analyzed seven years of Nasdaq-100 main index values and four years of NIFTY index values. This paper investigates the development of a reliable and efficient technique to model the seemingly chaotic behavior of stock markets. We considered an artificial neural network trained using the Levenberg-Marquardt algorithm, a Support Vector Machine (SVM), a Takagi-Sugeno neuro-fuzzy model and a Difference Boosting Neural Network (DBNN).
This paper briefly explains how the different connectionist paradigms could be formulated using different learning methods and then investigates whether they can provide the required level of performance, which are sufficiently good and robust so as to provide a reliable forecast model for stock market indices. Experiment results reveal that all the connectionist paradigms considered could represent the stock indices behavior very accurately.\nNormally a decision support system is build to solve problem where multi-criteria decisions are involved. The knowledge base is the vital part of the decision support containing the information or data that is used in decision-making process. This is the field where engineers and scientists have applied several intelligent techniques and heuristics to obtain optimal decisions from imprecise information. In this paper, we present a hybrid neuro-genetic learning approach for the adaptation a Mamdani fuzzy inference system for the Tactical Air Combat Decision Support System (TACDSS). Some simulation results demonstrating the difference of the learning techniques and are also provided.\nArtificial Intelligence (AI) has recently become a real formal science: the new millennium brought the first mathematically sound, asymptotically optimal, universal problem solvers, providing a new, rigorous foundation for the previously largely heuristic field of General AI and embedded agents. At the same time there has been rapid progress in practical methods for learning true sequence-processing programs, as opposed to traditional methods limited to stationary pattern association. Here we will briefly review some of the new results, and speculate about future developments, pointing out that the time intervals between the most notable events in over 40,000 years or 2^9 lifetimes of human history have sped up exponentially, apparently converging to zero within the next few decades. 
Or is this impression just a by-product of the way humans allocate memory space to past events?\nKnowledge plays a central role in human and artificial intelligence. One of the key characteristics of knowledge is its structured organization. Knowledge can be and should be presented in multiple levels and multiple views to meet people's needs in different levels of granularities and from different perspectives. In this paper, we stand on the view point of granular computing and provide our understanding on multi-level and multi-view of knowledge through granular knowledge structures (GKS). Representation of granular knowledge structures, operations for building granular knowledge structures and how to use them are investigated. As an illustration, we provide some examples through results from an analysis of proceeding papers. Results show that granular knowledge structures could help users get better understanding of the knowledge source from set theoretical, logical and visual point of views. One may consider using them to meet specific needs or solve certain kinds of problems.\nThe recurrent theme of this paper is that sequences of long temporal patterns as opposed to sequences of simple statements are to be fed into computation devices, being them (new proposed) models for brain activity or multi-core/many-core computers. In such models, parts of these long temporal patterns are already committed while other are predicted. This combination of matching patterns and making predictions appears as a key element in producing intelligent processing in brain models and getting efficient speculative execution on multi-core/many-core computers. A bridge between these far-apart models of computation could be provided by appropriate design of massively parallel, interactive programming languages. Agapia is a recently proposed language of this kind, where user controlled long high-level temporal structures occur at the interaction interfaces of processes. 
In this paper Agapia is used to link HTMs brain models with TRIPS multi-core/many-core architectures.\nGeneral purpose intelligent learning agents cycle through (complex,non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite state Markov Decision Processes (MDPs). So far it is an art performed by human designers to extract the right state representation out of the bare observations, i.e. to reduce the agent setup to the MDP framework. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in a companion article.\nGeneral-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II. 
The role of POMDPs is also considered there.\nThis paper presents a combination of several automated reasoning and proof presentation tools with the Mizar system for formalization of mathematics. The combination forms an online service called MizAR, similar to the SystemOnTPTP service for first-order automated reasoning. The main differences to SystemOnTPTP are the use of the Mizar language that is oriented towards human mathematicians (rather than the pure first-order logic used in SystemOnTPTP), and setting the service in the context of the large Mizar Mathematical Library of previous theorems,definitions, and proofs (rather than the isolated problems that are solved in SystemOnTPTP). These differences poses new challenges and new opportunities for automated reasoning and for proof presentation tools. This paper describes the overall structure of MizAR, and presents the automated reasoning systems and proof presentation tools that are combined to make MizAR a useful mathematical service.\nThe Dendritic Cell Algorithm (DCA) is inspired by the function of the dendritic cells of the human immune system. In nature, dendritic cells are the intrusion detection agents of the human body, policing the tissue and organs for potential invaders in the form of pathogens. In this research, and abstract model of DC behaviour is developed and subsequently used to form an algorithm, the DCA. The abstraction process was facilitated through close collaboration with laboratory- based immunologists, who performed bespoke experiments, the results of which are used as an integral part of this algorithm. The DCA is a population based algorithm, with each agent in the system represented as an 'artificial DC'. Each DC has the ability to combine multiple data streams and can add context to data suspected as anomalous. In this chapter the abstraction process and details of the resultant algorithm are given. 
The algorithm is applied to numerous intrusion detection problems in computer security including the detection of port scans and botnets, where it has produced impressive results with relatively low rates of false positives.\nUncertainty of decisions in safety-critical engineering applications can be estimated on the basis of the Bayesian Markov Chain Monte Carlo (MCMC) technique of averaging over decision models. The use of decision tree (DT) models assists experts to interpret causal relations and find factors of the uncertainty. Bayesian averaging also allows experts to estimate the uncertainty accurately when a priori information on the favored structure of DTs is available. Then an expert can select a single DT model, typically the Maximum a Posteriori model, for interpretation purposes. Unfortunately, a priori information on favored structure of DTs is not always available. For this reason, we suggest a new prior on DTs for the Bayesian MCMC technique. We also suggest a new procedure of selecting a single DT and describe an application scenario. In our experiments on the Short-Term Conflict Alert data our technique outperforms the existing Bayesian techniques in predictive accuracy of the selected single DTs.\nImitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. 
We study two specific instantiations of this model, one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors with different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping, and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restricitions.\nThe amount of information available on the Web grows at an incredible high rate. Systems and procedures devised to extract these data from Web sources already exist, and different approaches and techniques have been investigated during the last years. On the one hand, reliable solutions should provide robust algorithms of Web data mining which could automatically face possible malfunctioning or failures. On the other, in literature there is a lack of solutions about the maintenance of these systems. Procedures that extract Web data may be strictly interconnected with the structure of the data source itself; thus, malfunctioning or acquisition of corrupted data could be caused, for example, by structural modifications of data sources brought by their owners. Nowadays, verification of data integrity and maintenance are mostly manually managed, in order to ensure that these systems work correctly and reliably. 
In this paper we propose a novel approach to create procedures able to extract data from Web sources -- the so called Web wrappers -- which can face possible malfunctioning caused by modifications of the structure of the data source, and can automatically repair themselves.\nResearch in the field of Artificial Intelligence is continually progressing to simulate the human knowledge into automated intelligent knowledge base, which can encode and retrieve knowledge efficiently along with the capability of being is consistent and scalable at all times. However, there is no system at hand that can match the diversified abilities of human knowledge base. In this position paper, we put forward a theoretical model of a different system that intends to integrate pieces of knowledge, Informledge System (ILS). ILS would encode the knowledge, by virtue of knowledge units linked across diversified domains. The proposed ILS comprises of autonomous knowledge units termed as Knowledge Network Node (KNN), which would help in efficient cross-linking of knowledge units to encode fresh knowledge. These links are reasoned and inferred by the Parser and Link Manager, which are part of KNN.\nWe present an application focused on the design of resilient long-reach passive optical networks. We specifically consider dual-parented networks whereby each customer must be connected to two metro sites via local exchange sites. An important property of such a placement is resilience to single metro node failure. The objective of the application is to determine the optimal position of a set of metro nodes such that the total optical fibre length is minimized. We prove that this problem is NP-Complete. We present two alternative combinatorial optimisation approaches to finding an optimal metro node placement using: a mixed integer linear programming (MIP) formulation of the problem; and, a hybrid approach that uses clustering as a preprocessing step. 
We consider a detailed case-study based on a network for Ireland. The hybrid approach scales well and finds solutions that are close to optimal, with a runtime that is two orders-of-magnitude better than the MIP model.\nIt is widely recognized today that the management of imprecision and vagueness will yield more intelligent and realistic knowledge-based applications. Description Logics (DLs) are a family of knowledge representation languages that have gained considerable attention the last decade, mainly due to their decidability and the existence of empirically high performance of reasoning algorithms. In this paper, we extend the well known fuzzy ALC DL to the fuzzy SHIN DL, which extends the fuzzy ALC DL with transitive role axioms (S), inverse roles (I), role hierarchies (H) and number restrictions (N). We illustrate why transitive role axioms are difficult to handle in the presence of fuzzy interpretations and how to handle them properly. Then we extend these results by adding role hierarchies and finally number restrictions. The main contributions of the paper are the decidability proof of the fuzzy DL languages fuzzy-SI and fuzzy-SHIN, as well as decision procedures for the knowledge base satisfiability problem of the fuzzy-SI and fuzzy-SHIN.\nThe theoretical transition from the graphs of production systems to the bipartite graphs of the MIVAR nets is shown. Examples of the implementation of the MIVAR nets in the formalisms of matrixes and graphs are given. The linear computational complexity of algorithms for automated building of objects and rules of the MIVAR nets is theoretically proved. On the basis of the MIVAR nets the UDAV software complex is developed, handling more than 1.17 million objects and more than 3.5 million rules on ordinary computers. The results of experiments that confirm a linear computational complexity of the MIVAR method of information processing are given.   
Keywords: MIVAR, MIVAR net, logical inference, computational complexity, artificial intelligence, intelligent systems, expert systems, General Problem Solver.\nLanguage evolution might have preferred certain prior social configurations over others. Experiments conducted with models of different social structures (varying subgroup interactions and the role of a dominant interlocutor) suggest that having isolated agent groups rather than an interconnected agent is more advantageous for the emergence of a social communication system. Distinctive groups that are closely connected by communication yield systems less like natural language than fully isolated groups inhabiting the same world. Furthermore, the addition of a dominant male who is asymmetrically favoured as a hearer, and equally likely to be a speaker has no positive influence on the disjoint groups.\nUser preferences for automated assistance often vary widely, depending on the situation, and quality or presentation of help. Developing effectivemodels to learn individual preferences online requires domain models that associate observations of user behavior with their utility functions, which in turn can be constructed using utility elicitation techniques. However, most elicitation methods ask for users' predicted utilities based on hypothetical scenarios rather than more realistic experienced utilities. This is especially true in interface customization, where users are asked to assess novel interface designs. We propose experiential utility elicitation methods for customization and compare these to predictivemethods. As experienced utilities have been argued to better reflect true preferences in behavioral decision making, the purpose here is to investigate accurate and efficient procedures that are suitable for software domains. 
Unlike conventional elicitation, our results indicate that an experiential approach helps people understand stochastic outcomes, as well as better appreciate the sequential utility of intelligent assistance.\nIntelligent systems in an open world must reason about many interacting entities related to each other in diverse ways and having uncertain features and relationships. Traditional probabilistic languages lack the expressive power to handle relational domains. Classical first-order logic is sufficiently expressive, but lacks a coherent plausible reasoning capability. Recent years have seen the emergence of a variety of approaches to integrating first-order logic, probability, and machine learning. This paper presents Multi-entity Bayesian networks (MEBN), a formal system that integrates First Order Logic (FOL) with Bayesian probability theory. MEBN extends ordinary Bayesian networks to allow representation of graphical models with repeated sub-structures, and can express a probability distribution over models of any consistent, finitely axiomatizable first-order theory. We present the logic using an example inspired by the Paramount Series StarTrek.\nA key prerequisite to optimal reasoning under uncertainty in intelligent systems is to start with good class probability estimates. This paper improves on the current best probability estimation trees (Bagged-PETs) and also presents a new ensemble-based algorithm (MOB-ESP). Comparisons are made using several benchmark datasets and multiple metrics. These experiments show that MOB-ESP outputs significantly more accurate class probabilities than either the baseline BPETs algorithm or the enhanced version presented here (EB-PETs). These results are based on metrics closely associated with the average accuracy of the predictions. MOB-ESP also provides much better probability rankings than B-PETs. 
The paper further suggests how these estimation techniques can be applied in concert with a broader category of classifiers.\nFor the past few decades, man has been trying to create an intelligent computer which can talk and respond like he can. The task of creating a system that can talk like a human being is the primary objective of Automatic Speech Recognition. Various Speech Recognition techniques have been developed in theory and have been applied in practice. This paper discusses the problems that have been encountered in developing Speech Recognition, the techniques that have been applied to automate the task, and a representation of the core problems of present day Speech Recognition by using Fuzzy Mathematics.\nEmpty taxi cruising represents a wastage of resources in the context of urban taxi services. In this work, we seek to minimize such wastage. An analysis of a large trace of taxi operations reveals that the services' inefficiency is caused by drivers' greedy cruising behavior. We model the existing system as a continuous time Markov chain. To address the problem, we propose that each taxi be equipped with an intelligent agent that will guide the driver when cruising for passengers. Then, drawing from AI literature on multiagent planning, we explore two possible ways to compute such guidance. The first formulation assumes fully cooperative drivers. This allows us, in principle, to compute systemwide optimal cruising policy. This is modeled as a Markov decision process. The second formulation assumes rational drivers, seeking to maximize their own profit. This is modeled as a stochastic congestion game, a specialization of stochastic games. Nash equilibrium policy is proposed as the solution to the game, where no driver has the incentive to singly deviate from it. 
Empirical result shows that both formulations improve the efficiency of the service significantly.\nThis paper considers the problem of knowledge-based model construction in the presence of uncertainty about the association of domain entities to random variables. Multi-entity Bayesian networks (MEBNs) are defined as a representation for knowledge in domains characterized by uncertainty in the number of relevant entities, their interrelationships, and their association with observables. An MEBN implicitly specifies a probability distribution in terms of a hierarchically structured collection of Bayesian network fragments that together encode a joint probability distribution over arbitrarily many interrelated hypotheses. Although a finite query-complete model can always be constructed, association uncertainty typically makes exact model construction and evaluation intractable. The objective of hypothesis management is to balance tractability against accuracy. We describe an application to the problem of using intelligence reports to infer the organization and activities of groups of military vehicles. Our approach is compared to related work in the tracking and fusion literature.\nMany intelligent user interfaces employ application and user models to determine the user's preferences, goals and likely future actions. Such models require application analysis, adaptation and expansion. Building and maintaining such models adds a substantial amount of time and labour to the application development cycle. We present a system that observes the interface of an unmodified application and records users' interactions with the application. From a history of such observations we build a coarse state space of observed interface states and actions between them. To refine the space, we hypothesize sub-states based upon the histories that led users to a given state. 
We evaluate the information gain of possible state splits, varying the length of the histories considered in such splits. In this way, we automatically produce a stochastic dynamic model of the application and of how it is used. To evaluate our approach, we present models derived from real-world application usage data.\nMany applications of intelligent systems require reasoning about the mental states of agents in the domain. We may want to reason about an agent's beliefs, including beliefs about other agents; we may also want to reason about an agent's preferences, and how his beliefs and preferences relate to his behavior. We define a probabilistic epistemic logic (PEL) in which belief statements are given a formal semantics, and provide an algorithm for asserting and querying PEL formulas in Bayesian networks. We then show how to reason about an agent's behavior by modeling his decision process as an influence diagram and assuming that he behaves rationally. PEL can then be used for reasoning from an agent's observed actions to conclusions about other aspects of the domain, including unobserved domain variables and the agent's mental states.\nThe application of Bayesian networks (BNs) to cognitive assessment and intelligent tutoring systems poses new challenges for model construction. When cognitive task analyses suggest constructing a BN with several latent variables, empirical model criticism of the latent structure becomes both critical and complex. This paper introduces a methodology for criticizing models both globally (a BN in its entirety) and locally (observable nodes), and explores its value in identifying several kinds of misfit: node errors, edge errors, state errors, and prior probability errors in the latent structure. The results suggest the indices have potential for detecting model misfit and assisting in locating problematic components of the model.\nWe present a new abductive, probabilistic theory of plan recognition. 
This model differs from previous plan recognition theories in being centered around a model of plan execution: most previous methods have been based on plans as formal objects or on rules describing the recognition process. We show that our new model accounts for phenomena omitted from most previous plan recognition theories: notably the cumulative effect of a sequence of observations of partially-ordered, interleaved plans and the effect of context on plan adoption. The model also supports inferences about the evolution of plan execution in situations where another agent intervenes in plan execution. This facility provides support for using plan recognition to build systems that will intelligently assist a user.\nThe Lumiere Project centers on harnessing probability and utility to provide assistance to computer software users. We review work on Bayesian user models that can be employed to infer a users needs by considering a user's background, actions, and queries. Several problems were tackled in Lumiere research, including (1) the construction of Bayesian models for reasoning about the time-varying goals of computer users from their observed actions and queries, (2) gaining access to a stream of events from software applications, (3) developing a language for transforming system events into observational variables represented in Bayesian user models, (4) developing persistent profiles to capture changes in a user expertise, and (5) the development of an overall architecture for an intelligent user interface. Lumiere prototypes served as the basis for the Office Assistant in the Microsoft Office '97 suite of productivity applications.\nThis paper presents two new approaches to decomposing and solving large Markov decision problems (MDPs), a partial decoupling method and a complete decoupling method. In these approaches, a large, stochastic decision problem is divided into smaller pieces. 
The first approach builds a cache of policies for each part of the problem independently, and then combines the pieces in a separate, light-weight step. A second approach also divides the problem into smaller pieces, but information is communicated between the different problem pieces, allowing intelligent decisions to be made about which piece requires the most attention. Both approaches can be used to find optimal policies or approximately optimal policies with provable bounds. These algorithms also provide a framework for the efficient transfer of knowledge across problems that share similar structure.\nEver growing number of image documents available on the Internet continuously motivates research in better annotation models and more efficient retrieval methods. Formal knowledge representation of objects and events in pictures, their interaction as well as context complexity becomes no longer an option for a quality image repository, but a necessity. We present an ontology-based online image annotation tool WNtags and demonstrate its usefulness in several typical multimedia retrieval tasks using International Affective Picture System emotionally annotated image database. WNtags is built around WordNet lexical ontology but considers Suggested Upper Merged Ontology as the preferred labeling formalism. WNtags uses sets of weighted WordNet synsets as high-level image semantic descriptors and query matching is performed with word stemming and node distance metrics. We also elaborate our near future plans to expand image content description with induced affect as in stimuli for research of human emotion and attention.\nApproximate models of world state transitions are necessary when building plans for complex systems operating in dynamic environments. External event probabilities can depend on state feature values as well as time spent in that particular state. We assign temporally -dependent probability functions to state transitions. 
These functions are used to locally compute state probabilities, which are then used to select highly probable goal paths and eliminate improbable states. This probabilistic model has been implemented in the Cooperative Intelligent Real-time Control Architecture (CIRCA), which combines an AI planner with a separate real-time system such that plans are developed, scheduled, and executed with real-time guarantees. We present flight simulation tests that demonstrate how our probabilistic model may improve CIRCA performance.\nMost traditional models of uncertainty have focused on the associational relationship among variables as captured by conditional dependence. In order to successfully manage intelligent systems for decision making, however, we must be able to predict the effects of actions. In this paper, we attempt to unite two branches of research that address such predictions: causal modeling and decision analysis. First, we provide a definition of causal dependence in decision-analytic terms, which we derive from consequences of causal dependence cited in the literature. Using this definition, we show how causal dependence can be represented within an influence diagram. In particular, we identify two inadequacies of an ordinary influence diagram as a representation for cause. We introduce a special class of influence diagrams, called causal influence diagrams, which corrects one of these problems, and identify situations where the other inadequacy can be eliminated. In addition, we describe the relationships between Howard Canonical Form and existing graphical representations of cause.\nThis paper addresses learning stochastic rules especially on an inter-attribute relation based on a Minimum Description Length (MDL) principle with a finite number of examples, assuming an application to the design of intelligent relational database systems. 
The stochastic rule in this paper consists of a model giving the structure like the dependencies of a Bayesian Belief Network (BBN) and some stochastic parameters each indicating a conditional probability of an attribute value given the state determined by the other attributes' values in the same record. Especially, we propose the extended version of the algorithm of Chow and Liu in that our learning algorithm selects the model in the range where the dependencies among the attributes are represented by some general plural number of trees.\nThis paper describes the architecture of R&D Analyst, a commercial intelligent decision system for evaluating corporate research and development projects and portfolios. In analyzing projects, R&D Analyst interactively guides a user in constructing an influence diagram model for an individual research project. The system's interactive approach can be clearly explained from a blackboard system perspective. The opportunistic reasoning emphasis of blackboard systems satisfies the flexibility requirements of model construction, thereby suggesting that a similar architecture would be valuable for developing normative decision systems in other domains. Current research is aimed at extending the system architecture to explicitly consider of sequential decisions involving limited temporal, financial, and physical resources.\nIn this paper the theory of semi-bounded rationality is proposed as an extension of the theory of bounded rationality. In particular, it is proposed that a decision making process involves two components and these are the correlation machine, which estimates missing values, and the causal machine, which relates the cause to the effect. Rational decision making involves using information which is almost always imperfect and incomplete as well as some intelligent machine which if it is a human being is inconsistent to make decisions. 
In the theory of bounded rationality this decision is made irrespective of the fact that the information to be used is incomplete and imperfect and the human brain is inconsistent and thus this decision that is to be made is taken within the bounds of these limitations. In the theory of semi-bounded rationality, signal processing is used to filter noise and outliers in the information and the correlation machine is applied to complete the missing information and artificial intelligence is used to make more consistent decisions.\nIn this paper the theory of flexibly-bounded rationality which is an extension to the theory of bounded rationality is revisited. Rational decision making involves using information which is almost always imperfect and incomplete together with some intelligent machine which if it is a human being is inconsistent to make decisions. In bounded rationality, this decision is made irrespective of the fact that the information to be used is incomplete and imperfect and that the human brain is inconsistent and thus this decision that is to be made is taken within the bounds of these limitations. In the theory of flexibly-bounded rationality, advanced information analysis is used, the correlation machine is applied to complete missing information and artificial intelligence is used to make more consistent decisions. Therefore flexibly-bounded rationality expands the bounds within which rationality is exercised. Because human decision making is essentially irrational, this paper proposes the theory of marginalization of irrationality in decision making to deal with the problem of satisficing in the presence of irrationality.\nThe current paper offers a perspective on what we term conceptive intelligence - the capacity of an agent to continuously think of new object definitions (tasks, problems, physical systems, etc.) and to look for methods to realize them. 
The framework, called a Brouwer machine, is inspired by previous research in design theory and modeling, with its roots in the constructivist mathematics of intuitionism. The dual constructivist perspective we describe offers the possibility to create novelty both in terms of the types of objects and the methods for constructing objects. More generally, the theoretical work on which Brouwer machines are based is called imaginative constructivism. Based on the framework and the theory, we discuss many paradigms and techniques omnipresent in AI research and their merits and shortcomings for modeling aspects of design, as described by imaginative constructivism. To demonstrate and explain the type of creative process expressed by the notion of a Brouwer machine, we compare this concept with a system using genetic algorithms for scientific law discovery.\nOver the past decade, AI has made remarkable progress due to recently revived Deep Learning technology. Deep Learning makes it possible to process large amounts of data using simplified neuron networks that simulate the way in which the brain works. At the same time, there is another point of view that posits that the brain is processing information, not data. This duality hampered AI progress for years. To provide a remedy for this situation, I propose a new definition of information that considers it as a coupling between two separate entities - physical information (that implies data processing) and semantic information (that provides physical information interpretation). In such a case, intelligence arises as a result of information processing. The paper points out the consequences of this turn for the AI design philosophy.\nWe show in this paper how managed multi-context systems (mMCSs) can be turned into a reactive formalism suitable for continuous reasoning in dynamic environments. We extend mMCSs with (abstract) sensors and define the notion of a run of the extended systems. 
We then show how typical problems arising in online reasoning can be addressed: handling potentially inconsistent sensor input, modeling intelligent forms of forgetting, selective integration of knowledge, and controlling the reasoning effort spent by contexts, like setting contexts to an idle mode. We also investigate the complexity of some important related decision problems and discuss different design choices which are given to the knowledge engineer.\nThere is considerable uncertainty about what properties, capabilities and motivations future AGIs will have. In some plausible scenarios, AGIs may pose security risks arising from accidents and defects. In order to mitigate these risks, prudent early AGI research teams will perform significant testing on their creations before use. Unfortunately, if an AGI has human-level or greater intelligence, testing itself may not be safe; some natural AGI goal systems create emergent incentives for AGIs to tamper with their test environments, make copies of themselves on the internet, or convince developers and operators to do dangerous things. In this paper, we survey the AGI containment problem - the question of how to build a container in which tests can be conducted safely and reliably, even on AGIs with unknown motivations and capabilities that could be dangerous. We identify requirements for AGI containers, available mechanisms, and weaknesses that need to be addressed.\nTo operate intelligently in the world, an agent must reason about its actions. The consequences of an action are a function of both the state of the world and the action itself. Many aspects of the world are inherently stochastic, so a representation for reasoning about actions must be able to express chances of world states as well as indeterminacy in the effects of actions and other events. This paper presents a propositional temporal probability logic for representing and reasoning about actions. 
The logic can represent the probability that facts hold and events occur at various times. It can represent the probability that actions and other events affect the future. It can represent concurrent actions and conditions that hold or change during execution of an action. The model of probability relates probabilities over time. The logical language integrates both modal and probabilistic constructs and can thus represent and distinguish between possibility, probability, and truth. Several examples illustrating the use of the logic are given.\nAs the technology for building knowledge based systems has matured, important lessons have been learned about the relationship between the architecture of a system and the nature of the problems it is intended to solve. We are implementing a knowledge engineering tool called BART that is designed with these lessons in mind. BART is a Bayesian reasoning tool that makes belief networks and other probabilistic techniques available to knowledge engineers building classificatory problem solvers. BART has already been used to develop a decision aid for classifying ship images, and it is currently being used to manage uncertainty in systems concerned with analyzing intelligence reports. This paper discusses how state-of-the-art probabilistic methods fit naturally into a knowledge based approach to classificatory problem solving, and describes the current capabilities of BART.\nScheduling in the factory setting is compounded by computational complexity and temporal uncertainty. Together, these two factors guarantee that the process of constructing an optimal schedule will be costly and the chances of executing that schedule will be slight. Temporal uncertainty in the task execution time can be offset by several methods: eliminate uncertainty by careful engineering, restore certainty whenever it is lost, reduce the uncertainty by using more accurate sensors, and quantify and circumscribe the remaining uncertainty. 
Unfortunately, these methods focus exclusively on the sources of uncertainty and fail to apply knowledge of the tasks which are to be scheduled. A complete solution must adapt the schedule of activities to be performed according to the evolving state of the production world. The example of vision-directed assembly is presented to illustrate that the principle of least commitment, in the creation of a plan, in the representation of a schedule, and in the execution of a schedule, enables a robot to operate intelligently and efficiently, even in the presence of considerable uncertainty in the sequence of future events.\nThe control and integration of distributed, multi-sensor perceptual systems is a complex and challenging problem. The observations or opinions of different sensors are often disparate and incomparable and are usually only partial views. Sensor information is inherently uncertain, and in addition, the individual sensors may themselves be in error with respect to the system as a whole. The successful operation of a multi-sensor system must account for this uncertainty and provide for the aggregation of disparate information in an intelligent and robust manner. We consider the sensors of a multi-sensor system to be members or agents of a team, able to offer opinions and bargain in group decisions. We will analyze the coordination and control of this structure using a theory of team decision-making. We present some new analytic results on multi-sensor aggregation and detail a simulation which we use to investigate our ideas. This simulation provides a basis for the analysis of complex agent structures cooperating in the presence of uncertainty. 
The results of this study are discussed with reference to multi-sensor robot systems, distributed AI and decision making under uncertainty.\nIn designing an intelligent system that must be able to explain its reasoning to a human user, or to provide generalizations that the human user finds reasonable, it may be useful to take into consideration psychological data on what types of concepts and categories people naturally use. The psychological literature on concept learning and categorization provides strong evidence that certain categories are more easily learned, recalled, and recognized than others. We show here how a measure of the informational value of a category predicts the results of several important categorization experiments better than standard alternative explanations. This suggests that information-based approaches to machine generalization may prove particularly useful and natural for human users of the systems.\nA key challenge in non-cooperative multi-agent systems is that of developing efficient planning algorithms for intelligent agents to interact and perform effectively among boundedly rational, self-interested agents (e.g., humans). The practicality of existing works addressing this challenge is being undermined due to either the restrictive assumptions of the other agents' behavior, the failure in accounting for their rationality, or the prohibitively expensive cost of modeling and predicting their intentions. To boost the practicality of research in this field, we investigate how intention prediction can be efficiently exploited and made practical in planning, thereby leading to efficient intention-aware planning frameworks capable of predicting the intentions of other agents and acting optimally with respect to their predicted intentions. We show that the performance losses incurred by the resulting planning policies are linearly bounded by the error of intention prediction. 
Empirical evaluations through a series of stochastic games demonstrate that our policies can achieve better and more robust performance than the state-of-the-art algorithms.\nRational agents are usually built to maximize rewards. However, AGI agents can find undesirable ways of maximizing any prior reward function. Therefore value learning is crucial for safe AGI. We assume that generalized states of the world are valuable - not rewards themselves, and propose an extension of AIXI, in which rewards are used only to bootstrap hierarchical value learning. The modified AIXI agent is considered in the multi-agent environment, where other agents can be either humans or other \"mature\" agents, whose values should be revealed and adopted by the \"infant\" AGI agent. A general framework for designing such an empathic agent with ethical bias is also proposed as an extension of the universal intelligence model. Moreover, we perform experiments in the simple Markov environment, which demonstrate the feasibility of our approach to value learning in safe AGI.\nThis note revisits the concepts of task and difficulty. The notion of cognitive task and its use for the evaluation of intelligent systems is still replete with issues. The view of tasks as MDP in the context of reinforcement learning has been especially useful for the formalisation of learning tasks. However, this alternate interaction does not accommodate some other tasks well that are usual in artificial intelligence and, most especially, in animal and human evaluation. In particular, we want to have a more general account of episodes, rewards and responses, and, most especially, the computational complexity of the algorithm behind an agent solving a task. This is crucial for the determination of the difficulty of a task as the (logarithm of the) number of computational steps required to acquire an acceptable policy for the task, which includes the exploration of policies and their verification. 
We introduce a notion of asynchronous-time stochastic tasks. Based on this interpretation, we can see what task difficulty is, what instance difficulty is (relative to a task) and also what task compositions and decompositions are.\nThe ability to generalize is an important feature of any intelligent agent. Not only because it may allow the agent to cope with large amounts of data, but also because in some environments, an agent with no generalization capabilities cannot learn. In this work we outline several criteria for generalization, and present a dynamic and autonomous machinery that enables projective simulation agents to meaningfully generalize. Projective simulation, a novel, physical approach to artificial intelligence, was recently shown to perform well in standard reinforcement learning problems, with applications in advanced robotics as well as quantum experiments. Both the basic projective simulation model and the presented generalization machinery are based on very simple principles. This allows us to provide a full analytical analysis of the agent's performance and to illustrate the benefit the agent gains by generalizing. Specifically, we show that already in basic (but extreme) environments, learning without generalization may be impossible, and demonstrate how the presented generalization machinery enables the projective simulation agent to learn.\nIn this paper, a method is proposed to detect the emotion of a song based on its lyrical and audio features. Lyrical features are generated by segmentation of lyrics during the process of data extraction. ANEW and WordNet knowledge is then incorporated to compute Valence and Arousal values. In addition to this, linguistic association rules are applied to ensure that the issue of ambiguity is properly addressed. Audio features are used to supplement the lyrical ones and include attributes like energy, tempo, and danceability. 
These features are extracted from The Echo Nest, a widely used music intelligence platform. Construction of training and test sets is done on the basis of social tags extracted from the last.fm website. The classification is done by applying feature weighting and stepwise threshold reduction on the k-Nearest Neighbors algorithm to provide fuzziness in the classification.\nWe administered the Verbal IQ (VIQ) part of the Wechsler Preschool and Primary Scale of Intelligence (WPPSI-III) to the ConceptNet 4 AI system. The test questions (e.g., \"Why do we shake hands?\") were translated into ConceptNet 4 inputs using a combination of the simple natural language processing tools that come with ConceptNet together with short Python programs that we wrote. The question answering used a version of ConceptNet based on spectral methods. The ConceptNet system scored a WPPSI-III VIQ that is average for a four-year-old child, but below average for 5 to 7 year-olds. Large variations among subtests indicate potential areas of improvement. In particular, results were strongest for the Vocabulary and Similarities subtests, intermediate for the Information subtest, and lowest for the Comprehension and Word Reasoning subtests. Comprehension is the subtest most strongly associated with common sense. The large variations among subtests and ordinary common sense strongly suggest that the WPPSI-III VIQ results do not show that \"ConceptNet has the verbal abilities a four-year-old.\" Rather, children's IQ tests offer one objective metric for the evaluation and comparison of AI systems. Also, this work continues previous research on Psychometric AI.\nAgents of general intelligence deployed in real-world scenarios must adapt to ever-changing environmental conditions. While such adaptive agents may leverage engineered knowledge, they will require the capacity to construct and evaluate knowledge themselves from their own experience in a bottom-up, constructivist fashion. 
This position paper builds on the idea of encoding knowledge as temporally extended predictions through the use of general value functions. Prior work has focused on learning predictions about externally derived signals about a task or environment (e.g. battery level, joint position). Here we advocate that the agent should also predict internally generated signals regarding its own learning process - for example, an agent's confidence in its learned predictions. Finally, we suggest how such information would be beneficial in creating an introspective agent that is able to learn to make good decisions in a complex, changing world.\nAfter an earthquake, disaster sites pose a multitude of health and safety concerns. A rescue operation for people trapped in the ruins after an earthquake disaster requires a series of intelligent behaviors, including planning. For a successful rescue operation, given a limited number of available actions and regulations, the role of planning in rescue operations is crucial. Fortunately, recent developments in automated planning by the artificial intelligence community can help different organizations in this crucial task. Due to the number of rules and regulations, we believe that a rule based system for planning can be helpful for this specific planning problem. In this research work, we use logic rules to represent rescue and related regular regulations, together with a logic based planner to solve this complicated problem. Although this research is still in the prototyping and modeling stage, it clearly shows that rule based languages can be a good infrastructure for this computational task. The results of this research can be used by different organizations, such as the Iranian Red Crescent Society and the International Institute of Seismology and Earthquake Engineering (IISEE).\nHandwriting is a skill learned by humans from a very early age. 
The ability to develop one's own unique handwriting as well as mimic another person's handwriting is a task learned by the brain with practice. This paper deals with this very problem where an intelligent system tries to learn the handwriting of an entity using Generative Adversarial Networks (GANs). We propose a modified architecture of DCGAN (Radford, Metz, and Chintala 2015) to achieve this. We also discuss applying reinforcement learning techniques to achieve faster learning. We hope our algorithm will give new insights in this area; its uses include identification of forged documents, signature verification, computer generated art, and digitization of documents, among others. Our early implementation of the algorithm shows good performance on MNIST.\nReasoning about objects, relations, and physics is central to human intelligence, and a key goal of artificial intelligence. Here we introduce the interaction network, a model which can reason about how objects in complex systems interact, supporting dynamical predictions, as well as inferences about the abstract properties of the system. Our model takes graphs as input, performs object- and relation-centric reasoning in a way that is analogous to a simulation, and is implemented using deep neural networks. We evaluate its ability to reason about several challenging physical domains: n-body problems, rigid-body collision, and non-rigid dynamics. Our results show it can be trained to accurately simulate the physical trajectories of dozens of objects over thousands of time steps, estimate abstract quantities such as energy, and generalize automatically to systems with different numbers and configurations of objects and relations. 
Our interaction network implementation is the first general-purpose, learnable physics engine, and a powerful general framework for reasoning about objects and relations in a wide variety of complex real-world domains.\nCommunicating and sharing intelligence among agents is an important facet of achieving Artificial General Intelligence. As a first step towards this challenge, we introduce a novel framework for image generation: Message Passing Multi-Agent Generative Adversarial Networks (MPM GANs). While GANs have recently been shown to be very effective for image generation and other tasks, these networks have mostly been limited to single generator-discriminator networks. We show that we can obtain multi-agent GANs that communicate through message passing to achieve better image generation. The objectives of the individual agents in this framework are twofold: a co-operation objective and a competing objective. The co-operation objective ensures that the message sharing mechanism guides the other generator to generate better than itself, while the competing objective encourages each generator to generate better than its counterpart. We analyze and visualize the messages that these GANs share among themselves in various scenarios. We quantitatively show that the message sharing formulation serves as a regularizer for the adversarial training. Qualitatively, we show that the different generators capture different traits of the underlying data distribution.\nIntelligent systems capable of automatically understanding natural language text are important for many artificial intelligence applications including mobile phone voice assistants, computer vision, and robotics. Understanding language often constitutes fitting new information into a previously acquired view of the world. However, many machine reading systems rely on the text alone to infer its meaning. 
In this paper, we pursue a different approach; machine reading methods that make use of background knowledge to facilitate language understanding. To this end, we have developed two methods: The first method addresses prepositional phrase attachment ambiguity. It uses background knowledge within a semi-supervised machine learning algorithm that learns from both labeled and unlabeled data. This approach yields state-of-the-art results on two datasets against strong baselines; The second method extracts relationships from compound nouns. Our knowledge-aware method for compound noun analysis accurately extracts relationships and significantly outperforms a baseline that does not make use of background knowledge.\nA recurring topic in interstellar exploration and the search for extraterrestrial intelligence (SETI) is the role of artificial intelligence. More precisely, these are programs or devices that are capable of performing cognitive tasks that have been previously associated with humans such as image recognition, reasoning, decision-making etc. Such systems are likely to play an important role in future deep space missions, notably interstellar exploration, where the spacecraft needs to act autonomously. This article explores the drivers for an interstellar mission with a computation-heavy payload and provides an outline of a spacecraft and mission architecture that supports such a payload. Based on existing technologies and extrapolations of current trends, it is shown that AI spacecraft development and operation will be constrained and driven by three aspects: power requirements for the payload, power generation capabilities, and heat rejection capabilities. A likely mission architecture for such a probe is to get into an orbit close to the star in order to generate maximum power for computational activities, and then to prepare for further exploration activities. 
Given current levels of increase in computational power, such a payload with computational power similar to that of the human brain would have a mass of hundreds to dozens of tons in a 2050 - 2060 timeframe.\nThe explosive increase in the number of smart devices hosting sophisticated applications is rapidly affecting the landscape of the information and communication technology industry. Mobile subscriptions, expected to reach 8.9 billion by 2022, would drastically increase the demand for extra capacity, with aggregate throughput anticipated to be enhanced by a factor of 1000. In an already crowded radio spectrum, it becomes increasingly difficult to meet ever growing application demands for wireless bandwidth. It has been shown that the allocated spectrum is seldom utilized by the primary users and hence contains spectrum holes that may be exploited by the unlicensed users for their communication. As we enter the Internet of Things (IoT) era in which appliances of common use will become smart digital devices with rigid performance requirements (such as low latency, energy efficiency, etc.), current networks face the vexing problem of how to create sufficient capacity for such applications. The fifth generation of cellular networks (5G), envisioned to address these challenges, is thus required to incorporate cognition and intelligence to resolve the aforementioned issues.\nEvolution has resulted in highly developed abilities in many natural intelligences to quickly and accurately predict mechanical phenomena. Humans have successfully developed laws of physics to abstract and model such mechanical phenomena. In the context of artificial intelligence, a recent line of work has focused on estimating physical parameters based on sensory data and using them in physical simulators to make long-term predictions. In contrast, we investigate the effectiveness of a single neural network for end-to-end long-term prediction of mechanical phenomena. 
Based on extensive evaluation, we demonstrate that such networks can outperform alternative approaches that even have access to ground-truth physical simulators, especially when some physical parameters are unobserved or not known a priori. Further, our network outputs a distribution of outcomes to capture the inherent uncertainty in the data. Our approach demonstrates for the first time the possibility of making actionable long-term predictions from sensor data without explicitly modeling the underlying physical laws.\nThe real-time strategy game StarCraft has proven to be a challenging environment for artificial intelligence techniques, and as a result, current state-of-the-art solutions consist of numerous hand-crafted modules. In this paper, we show how macromanagement decisions in StarCraft can be learned directly from game replays using deep learning. Neural networks are trained on 789,571 state-action pairs extracted from 2,005 replays of highly skilled players, achieving top-1 and top-3 error rates of 54.6% and 22.9% in predicting the next build action. By integrating the trained network into UAlbertaBot, an open source StarCraft bot, the system can significantly outperform the game's built-in Terran bot, and play competitively against UAlbertaBot with a fixed rush strategy. To our knowledge, this is the first time macromanagement tasks are learned directly from replays in StarCraft. While the best hand-crafted strategies are still the state-of-the-art, the deep network approach is able to express a wide range of different strategies, and thus improving the network's performance further with deep reinforcement learning is an immediately promising avenue for future research. 
Ultimately, this approach could lead to strong StarCraft bots that are less reliant on hard-coded strategies.\nArtificial General Intelligence (AGI) or Strong AI aims to create machines with human-like or human-level intelligence, which is still a very ambitious goal when compared to the existing computing and AI systems. After many hype cycles and lessons from AI history, it is clear that a big conceptual leap is needed for crossing the starting line to kick-start mainstream AGI research. This position paper aims to make a small conceptual contribution toward reaching that starting line. After a broad analysis of the AGI problem from different perspectives, a system-theoretic and engineering-based research approach is introduced, which builds upon the existing mainstream AI and systems foundations. Several promising cross-fertilization opportunities between systems disciplines and AI research are identified. Specific potential research directions are discussed.\nThe General AI Challenge is an initiative to encourage the wider artificial intelligence community to focus on important problems in building intelligent machines with more general scope than is currently possible. The challenge comprises multiple rounds, with the first round focusing on gradual learning, i.e. the ability to re-use already learned knowledge for efficiently learning to solve subsequent problems. In this article, we will present details of the first round of the challenge, its inspiration and aims. We also outline a more formal description of the challenge and present a preliminary analysis of its curriculum, based on ideas from computational mechanics. 
We believe that such a formalism will allow for a more principled approach towards investigating tasks in the challenge, building new curricula, and potentially improving subsequent challenge rounds.\nData science and machine learning are the key technologies when it comes to the processes and products with automatic learning and optimization to be used in the automotive industry of the future. This article defines the terms \"data science\" (also referred to as \"data analytics\") and \"machine learning\" and how they are related. In addition, it defines the term \"optimizing analytics\" and illustrates the role of automatic optimization as a key technology in combination with data analytics. It also uses examples to explain the way that these technologies are currently being used in the automotive industry on the basis of the major subprocesses in the automotive value chain (development, procurement, logistics, production, marketing, sales and after-sales, connected customer). Since the industry is just starting to explore the broad range of potential uses for these technologies, visionary application examples are used to illustrate the revolutionary possibilities that they offer. Finally, the article demonstrates how these technologies can make the automotive industry more efficient and enhance its customer focus throughout all its operations and activities, extending from the product and its development process to the customers and their connection to the product.\nNovel user interfaces based on artificial intelligence, such as natural-language agents, present new categories of engineering challenges. These systems need to cope with uncertainty and ambiguity, interface with machine learning algorithms, and compose information from multiple users to make decisions. We propose to treat these challenges as language-design problems. We describe three programming language abstractions for three core problems in intelligent system design. 
First, hypothetical worlds support nondeterministic search over spaces of alternative actions. Second, a feature type system abstracts the interaction between applications and learning algorithms. Finally, constructs for collaborative execution extend hypothetical worlds across multiple machines while controlling access to private data. We envision these features as first steps toward a complete language for implementing AI-based interfaces and applications.\nDeep reinforcement learning is revolutionizing the artificial intelligence field. Currently, it serves as a good starting point for constructing intelligent autonomous systems which offer a better knowledge of the visual world. It is possible to scale deep reinforcement learning with the use of deep learning and do amazing tasks such as use of pixels in playing video games. In this paper, key concepts of deep reinforcement learning including reward function, differences between reinforcement learning and supervised learning and models for implementation of reinforcement are discussed. Key challenges related to the implementation of reinforcement learning in conversational AI domain are identified as well as discussed in detail. Various conversational models which are based on deep reinforcement learning (as well as deep learning) are also discussed. In summary, this paper discusses key aspects of deep reinforcement learning which are crucial for designing an efficient conversational AI.\nAn Oracle is a design for potentially high power artificial intelligences (AIs), where the AI is made safe by restricting it to only answer questions. Unfortunately most designs cause the Oracle to be motivated to manipulate humans with the contents of their answers, and Oracles of potentially high intelligence might be very successful at this. Solving that problem, without compromising the accuracy of the answer, is tricky. 
This paper reduces the issue to a cryptographic-style problem of Alice ensuring that her Oracle answers her questions while not providing key information to an eavesdropping Eve. Two Oracle designs solve this problem, one counterfactual (the Oracle answers as if it expected its answer to never be read) and one on-policy, but limited by the quantity of information it can transmit.\nCooperative multi-agent planning (MAP) is a relatively recent research field that combines technologies, algorithms and techniques developed by the Artificial Intelligence Planning and Multi-Agent Systems communities. While planning has been generally treated as a single-agent task, MAP generalizes this concept by considering multiple intelligent agents that work cooperatively to develop a course of action that satisfies the goals of the group. This paper reviews the most relevant approaches to MAP, putting the focus on the solvers that took part in the 2015 Competition of Distributed and Multi-Agent Planning, and classifies them according to their key features and relative performance.\nWe introduce MAgent, a platform to support research and development of many-agent reinforcement learning. Unlike previous research platforms on single or multi-agent reinforcement learning, MAgent focuses on supporting the tasks and the applications that require hundreds to millions of agents. Within the interactions among a population of agents, it enables not only the study of learning algorithms for agents' optimal policies, but more importantly, the observation and understanding of individual agents' behaviors and social phenomena emerging from the AI society, including communication languages, leadership, and altruism. MAgent is highly scalable and can host up to one million agents on a single GPU server. MAgent also provides flexible configurations for AI researchers to design their customized environments and agents. 
In this demo, we present three environments designed on MAgent and show the collective intelligence that emerges when agents learn from scratch.\nIn recent years, deep learning has made tremendous progress in a number of fields that were previously out of reach for artificial intelligence. The successes in these problems have led researchers to consider the possibilities for intelligent systems to tackle a problem that humans have only recently themselves considered: program synthesis. This challenge is unlike others such as object recognition and speech translation, since its abstract nature and demand for rigor make it difficult even for human minds to attempt. While it is still far from being solved or even competitive with most existing methods, neural program synthesis is a rapidly growing discipline which holds great promise if completely realized. In this paper, we start by exploring the problem statement and challenges of program synthesis. Then, we examine the fascinating evolution of program induction models, along with how they have succeeded, failed and been reimagined since. Finally, we conclude with a contrastive look at program synthesis and future research recommendations for the field.\nEthics and safety research in artificial intelligence is increasingly framed in terms of \"alignment\" with human values and interests. I argue that Turing's call for \"fair play for machines\" is an early and often overlooked contribution to the alignment literature. Turing's appeal to fair play suggests a need to correct human behavior to accommodate our machines, a surprising inversion of how value alignment is treated today. Reflections on \"fair play\" motivate a novel interpretation of Turing's notorious \"imitation game\" as a condition not of intelligence but instead of value alignment: a machine demonstrates a minimal degree of alignment (with the norms of conversation, for instance) when it can go undetected when interrogated by a human.
I carefully distinguish this interpretation from the Moral Turing Test, which is not motivated by a principle of fair play, but instead depends on imitation of human moral behavior. Finally, I consider how the framework of fair play can be used to situate the debate over robot rights within the alignment literature. I argue that extending rights to service robots operating in public spaces is \"fair\" in precisely the sense that it encourages an alignment of interests between humans and machines.\nThis paper presents an overview of the sixth AIBIRDS competition, held at the 26th International Joint Conference on Artificial Intelligence. This competition tasked participants with developing an intelligent agent which can play the physics-based puzzle game Angry Birds. This game uses a sophisticated physics engine that requires agents to reason and predict the outcome of actions with only limited environmental information. Agents entered into this competition were required to solve a wide assortment of previously unseen levels within a set time limit. The physical reasoning and planning required to solve these levels are very similar to those of many real-world problems. This year's competition featured some of the best agents developed so far and even included several new AI techniques such as deep reinforcement learning. Within this paper we describe the framework, rules, submitted agents and results for this competition. We also provide some background information on related work and other video game AI competitions, as well as discussing some potential ideas for future AIBIRDS competitions and agent improvements.\nThe ability to predict the intentions of people based solely on their visual actions is a skill only performed by humans and animals. The intelligence of current computer algorithms has not reached this level of complexity, but there are several research efforts that are working towards it. 
Given the number of classification algorithms available, it is hard to determine which algorithm works best for a particular situation. In the classification of visual human intent data, Hidden Markov Models (HMMs) and their variants are leading candidates. The inability of HMMs to provide a probability for observation-to-observation linkages is a major drawback of this classification technique. If a person is visually identifying an action of another person, they monitor patterns in the observations. By estimating the next observation, people have the ability to summarize the actions, and thus determine, with good accuracy, the intention of the person performing the action. These visual cues and linkages are important in creating intelligent algorithms for determining human actions based on visual observations. The Evidence Feed Forward Hidden Markov Model is a newly developed algorithm which provides observation-to-observation linkages. The following research addresses the theory behind Evidence Feed Forward HMMs, provides mathematical proofs of the learning of these parameters to optimize the likelihood of observations with an Evidence Feed Forward HMM, which is important in all computational intelligence algorithms, and gives comparative examples with standard HMMs in the classification of both visual action data and measurement data, thus providing a strong base for Evidence Feed Forward HMMs in the classification of many types of problems.\nSocial Robot Lumen is an Artificial Intelligence development project that aims to create an Artificial Intelligence (AI) which allows a humanoid robot to communicate with human beings naturally. In this study, Lumen will be developed to be a tour guide at the Electrical Engineering Days 2015 exhibition. In developing an AI, there are many modules that need to be developed separately.
To make development easier, we need a computational platform that serves as a common basis for all developers, so that the modules can be developed in parallel. The computational platform developed by the writer is called Lumen Server. Lumen Server has two main functions: to be a bridge between all Lumen intelligence modules and the NAO robot, and to be the communication bridge between those Lumen intelligence modules. For the second function, Lumen Server implements the AMQP protocol using RabbitMQ. Besides that, the writer also developed a control system for robot movement called Lumen Motion. Lumen Motion is implemented by modelling the movement of the NAO robot and by creating a control system using a fuzzy logic controller. The writer also developed a program that connects all Lumen intelligence modules so that Lumen can act as a tour guide. The implementation of this program uses an FSM and event-driven programming. The implementation results show that all the designed features were successfully implemented. This computational platform can ease the development of Lumen in the future. Further development should focus on creating an integration system so that Lumen can be more responsive to the environment.\nMarket price systems constitute a well-understood class of mechanisms that under certain conditions provide effective decentralization of decision making with minimal communication overhead. In a market-oriented programming approach to distributed problem solving, we derive the activities and resource allocations for a set of computational agents by computing the competitive equilibrium of an artificial economy.
WALRAS provides basic constructs for defining computational market structures, and protocols for deriving their corresponding price equilibria. In a particular realization of this approach for a form of multicommodity flow problem, we see that careful construction of the decision process according to economic principles can lead to efficient distributed resource allocation, and that the behavior of the system can be meaningfully analyzed in economic terms.\nLearning the past tense of English verbs - a seemingly minor aspect of language acquisition - has generated heated debates since 1986, and has become a landmark task for testing the adequacy of cognitive modeling. Several artificial neural networks (ANNs) have been implemented, and a challenge for better symbolic models has been posed. In this paper, we present a general-purpose Symbolic Pattern Associator (SPA) based upon the decision-tree learning algorithm ID3. We conduct extensive head-to-head comparisons on the generalization ability between ANN models and the SPA under different representations. We conclude that the SPA generalizes the past tense of unseen verbs better than ANN models by a wide margin, and we offer insights as to why this should be the case. We also discuss a new default strategy for decision-tree learning algorithms.\nWe report on a series of experiments in which all decision trees consistent with the training data are constructed. These experiments were run to gain an understanding of the properties of the set of consistent decision trees and the factors that affect the accuracy of individual trees. In particular, we investigated the relationship between the size of a decision tree consistent with some training data and the accuracy of the tree on test data. The experiments were performed on a massively parallel Maspar computer. 
The results of the experiments on several artificial and two real world problems indicate that, for many of the problems investigated, smaller consistent decision trees are on average less accurate than the average accuracy of slightly larger trees.\nThis article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees.\nIn this paper we re-investigate windowing for rule learning algorithms. We show that, contrary to previous results for decision tree learning, windowing can in fact achieve significant run-time gains in noise-free domains and explain the different behavior of rule learning algorithms by the fact that they learn each rule independently. The main contribution of this paper is integrative windowing, a new type of algorithm that further exploits this property by integrating good rules into the final theory right after they have been discovered. Thus it avoids re-learning these rules in subsequent iterations of the windowing process. Experimental evidence in a variety of noise-free domains shows that integrative windowing can in fact achieve substantial run-time gains. 
Furthermore, we discuss the problem of noise in windowing and present an algorithm that is able to achieve run-time gains in a set of experiments in a simple domain with artificial noise.\nEvolutionary artificial neural networks (EANNs) refer to a special class of artificial neural networks (ANNs) in which evolution is another fundamental form of adaptation in addition to learning. Evolutionary algorithms are used to adapt the connection weights, network architecture and learning algorithms according to the problem environment. Even though evolutionary algorithms are well known as efficient global search algorithms, very often they miss the best local solutions in the complex solution space. In this paper, we propose a hybrid meta-heuristic learning approach combining evolutionary learning and local search methods (using 1st and 2nd order error information) to improve learning and achieve faster convergence than a direct evolutionary approach. The proposed technique is tested on three different chaotic time series and the test results are compared with some popular neuro-fuzzy systems and a recently developed cutting angle method of global optimization. Empirical results reveal that the proposed technique is efficient in spite of the computational complexity.\nImagine a \"machine\" where there is no pre-commitment to any particular representational scheme: the desired behaviour is distributed and roughly specified simultaneously among many parts, but there is minimal specification of the mechanism required to generate that behaviour, i.e. the global behaviour evolves from the many relations of multiple simple behaviours. A machine that lives to and from/with Synergy. An artificial super-organism that avoids specific constraints and emerges within multiple low-level implicit bio-inspired mechanisms.
KEYWORDS: Complex Science, ArtSBots Project, Swarm Intelligence, Stigmergy, UnManned Art, Symbiotic Art, Swarm Paintings, Robot Paintings, Non-Human Art, Painting Emergence and Cooperation, Art and Complexity, ArtBots: The Robot Talent Show.\nOver the last few years, more and more heuristic decision making techniques have been inspired by nature, e.g. evolutionary algorithms, ant colony optimisation and simulated annealing. More recently, a novel computational intelligence technique inspired by immunology has emerged, called Artificial Immune Systems (AIS). This immune system inspired technique has already been useful in solving some computational problems. In this keynote, we will very briefly describe the immune system metaphors that are relevant to AIS. We will then give some illustrative real-world problems suitable for AIS use and show a step-by-step algorithm walkthrough. A comparison of AIS to other well-known algorithms and areas for future work will round this keynote off. It should be noted that as AIS is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat from the examples given here.\nThis paper investigates a new method for improving the learning algorithm of the Mixture of Experts (ME) model using a hybrid of Modified Cuckoo Search (MCS) and Conjugate Gradient (CG) as a second order optimization technique. The CG technique is combined with the Back-Propagation (BP) algorithm to yield a much more efficient learning algorithm for the ME structure. In addition, the experts and gating networks in the enhanced model are replaced by CG-based Multi-Layer Perceptrons (MLPs) to provide faster and more accurate learning. The CG depends considerably on the initial connection weights of the Artificial Neural Network (ANN), so a metaheuristic algorithm, the so-called Modified Cuckoo Search, is applied to select the optimal weights.
The performance of the proposed method is compared with Gradient Descent Based ME (GDME) and Conjugate Gradient Based ME (CGME) on classification and regression problems. The experimental results show that the hybrid MCS and CG based ME (MCS-CGME) has faster convergence and better performance on the benchmark data sets used.\nEssentially, the motive behind using a control system is to generate a suitable control signal that yields the desired response of a physical process. Control of synchronous generators has always been critical in power system operation and control. For certain well-known reasons, power generators are normally operated well below their steady-state stability limit. This raises the demand for efficient and fast controllers. Artificial intelligence has been reported to give revolutionary outcomes in the field of control engineering. Artificial Neural Networks (ANNs), a branch of artificial intelligence, have been used for nonlinear and adaptive control, utilizing their inherent observability. The overall performance of a neurocontroller also depends on its input features. Selecting optimum features to train a neurocontroller optimally is very critical. Both the quality and the size of the data are equally important for better performance. In this work, a filter technique is employed to select independent factors for ANN training.\nA central task of artificial intelligence is the design of artificial agents that act towards specified goals in partially observed environments. Since such environments frequently include interaction over time with other agents with their own goals, reasoning about such interaction relies on sequential game-theoretic models such as extensive-form games or some of their succinct representations such as multi-agent influence diagrams. The current algorithms for calculating equilibria either work with inefficient representations, possibly doubly exponential in the number of time steps, or place strong assumptions on the game structure.
In this paper, we propose a sampling-based approach, which calculates extensive-form correlated equilibria with small representations without placing such strong assumptions. Thus, it is practical in situations where the previous approaches would fail. In addition, our algorithm allows control over characteristics of the target equilibrium, e.g., we can ask for an equilibrium with high social welfare. Our approach is based on a multiplicative-weight update algorithm analogous to AdaBoost, and Markov chain Monte Carlo sampling. We prove convergence guarantees and explore the utility of our approach on several moderately sized multi-player games.\nArtificial bee colony (ABC), an optimization algorithm, is a recent addition to the family of population-based search algorithms. ABC has taken its inspiration from the collective intelligent foraging behavior of honey bees. In this study we have incorporated a golden section search mechanism into the structure of the basic ABC to improve global convergence and prevent it from getting stuck in a local solution. The proposed variant is termed ILS-ABC. Comparative numerical results with the state-of-the-art algorithms show the performance of the proposal when applied to the set of unconstrained engineering design problems. The simulated results show that the proposed variant can be successfully applied to solve real-life problems.\nRenewable and sustainable energy is one of the most important challenges currently facing mankind. Wind has made an increasing contribution to the world's energy supply mix, but still remains a long way from reaching its full potential. In this paper, we investigate the use of artificial evolution to design vertical-axis wind turbine prototypes that are physically instantiated and evaluated under approximated wind tunnel conditions.
An artificial neural network is used as a surrogate model to assist learning and found to reduce the number of fabrications required to reach a higher aerodynamic efficiency, resulting in an important cost reduction. Unlike in other approaches, such as computational fluid dynamics simulations, no mathematical formulations are used and no model assumptions are made.\nA novel method for estimating Bayesian network (BN) parameters from data is presented which provides improved performance on test data. Previous research has shown the value of representing conditional probability distributions (CPDs) via neural networks (Neal 1992), noisy-OR gates (Neal 1992, Diez 1993) and decision trees (Friedman and Goldszmidt 1996). The Bernoulli mixture network (BMN) explicitly represents the CPDs of discrete BN nodes as mixtures of local distributions, each having a different set of parents. This increases the space of possible structures which can be considered, enabling the CPDs to have finer-grained dependencies. The resulting estimation procedure induces a model that is better able to emulate the underlying interactions occurring in the data than conventional conditional Bernoulli network models. The results for artificially generated data indicate that overfitting is best reduced by restricting the complexity of candidate mixture substructures local to each node. Furthermore, mixtures of very simple substructures can perform almost as well as more complex ones. The BMN is also applied to data collected from an online adventure game with an application to keyhole plan recognition. The results show that the BMN-based model brings a dramatic improvement in performance over a conventional BN model.\nRecent developments show that Multiply Sectioned Bayesian Networks (MSBNs) can be used for diagnosis of natural systems as well as for model-based diagnosis of artificial systems.
They can be applied to single-agent oriented reasoning systems as well as multi-agent distributed probabilistic reasoning systems. Belief propagation between a pair of subnets plays a central role in maintaining global consistency in an MSBN. This paper studies the operation UpdateBelief, presented originally with MSBNs, for inter-subnet propagation. We analyze how the operation achieves its intended functionality, which provides hints as to how its efficiency can be improved. We then define two new versions of UpdateBelief that reduce the computation time for inter-subnet propagation. One of them is optimal in the sense that the minimum amount of computation for coordinating multi-linkage belief propagation is required. The optimization problem is solved through the solution of a graph-theoretic problem: the minimum weight open tour in a tree.\nIn this paper, we present structured message passing (SMP), a unifying framework for approximate inference algorithms that take advantage of structured representations such as algebraic decision diagrams and sparse hash tables. These representations can yield significant time and space savings over the conventional tabular representation when the message has several identical values (context-specific independence) or zeros (determinism) or both in its range. Therefore, in order to fully exploit the power of structured representations, we propose to artificially introduce context-specific independence and determinism in the messages. This yields a new class of powerful approximate inference algorithms which includes popular algorithms such as cluster-graph Belief propagation (BP), expectation propagation and particle BP as special cases. We show that our new algorithms introduce several interesting bias-variance trade-offs.
We evaluate these trade-offs empirically and demonstrate that our new algorithms are more accurate and scalable than state-of-the-art techniques.\nConsider a collection of weighted subsets of a ground set N. Given a query subset Q of N, how fast can one (1) find the weighted sum over all subsets of Q, and (2) sample a subset of Q proportionally to the weights? We present a tree-based greedy heuristic, Treedy, that for a given positive tolerance d answers such counting and sampling queries to within a guaranteed relative error d and total variation distance d, respectively. Experimental results on artificial instances and in application to Bayesian structure discovery in Bayesian networks show that approximations yield dramatic savings in running time compared to exact computation, and that Treedy typically outperforms a previously proposed sorting-based heuristic.\nThe problem of similarity search is one of the main problems in computer science. This problem has many applications in text-retrieval, web search, computational biology, bioinformatics and others. Similarity between two data objects can be depicted using a similarity measure or a distance metric. There are numerous distance metrics in the literature; some are used for a particular data type, and others are more general. In this paper we present a new distance metric for sequential data which is based on the sum of n-grams. The novelty of our distance is that these n-grams are weighted using artificial bee colony, a recent optimization algorithm based on the collective intelligence of a swarm of bees in their search for nectar. This algorithm has been used in optimizing a large number of numerical problems. We validate the new distance experimentally.\nThe artificial bee colony (ABC) algorithm has proved its importance in solving a number of problems including engineering optimization problems.
The ABC algorithm is one of the most popular and youngest members of the family of population-based, nature-inspired meta-heuristic swarm intelligence methods. ABC has proved its superiority over some other Nature Inspired Algorithms (NIA) when applied to both benchmark functions and real-world problems. The performance of the search process of ABC depends on a random value which tries to balance the exploration and exploitation phases. To increase performance, the exploration of the search space and the exploitation of the optimal solution in ABC must be balanced. This paper outlines a new hybrid of the ABC algorithm with the Genetic Algorithm. The proposed method integrates the crossover operation from the Genetic Algorithm (GA) with the original ABC algorithm. The proposed method is named Crossover based ABC (CbABC). The CbABC strengthens the exploitation phase of ABC as crossover enhances exploration of the search space. The CbABC is tested on four standard benchmark functions and a popular continuous optimization problem.\nThere is a consensus that human and non-human subjects experience temporal distortions in many stages of their perceptual and decision-making systems. Similarly, intertemporal choice research has shown that decision-makers undervalue future outcomes relative to immediate ones. Here we combine techniques from information theory and artificial intelligence to show how both temporal distortions and intertemporal choice preferences can be explained as a consequence of the coding efficiency of sensorimotor representation. In particular, the model implies that interactions that constrain future behavior are perceived as being both longer in duration and more valuable. Furthermore, using simulations of artificial agents, we investigate how memory constraints enforce a renormalization of the perceived timescales.
Our results show that qualitatively different discount functions, such as exponential and hyperbolic discounting, arise as a consequence of an agent's probabilistic model of the world.\nRecently, the long short-term memory neural network (LSTM) has attracted wide interest due to its success in many tasks. The LSTM architecture consists of a memory cell and three gates, which look similar to the neuronal networks in the brain. However, evidence for the cognitive plausibility of the LSTM architecture and its working mechanism is still lacking. In this paper, we study the cognitive plausibility of LSTM by aligning its internal architecture with the brain activity observed via fMRI when the subjects read a story. Experimental results show that the artificial memory vector in LSTM can accurately predict the observed sequential brain activities, indicating the correlation between LSTM architecture and the cognitive process of story reading.\nUsing an improved backtrack algorithm with sophisticated pruning techniques, we revise previous observations correlating a high frequency of hard-to-solve Hamiltonian Cycle instances with the Gn,m phase transition between Hamiltonicity and non-Hamiltonicity. Instead, all tested graphs of 100 to 1500 vertices are easily solved. When we artificially restrict the degree sequence with a bounded maximum degree, although there is some increase in difficulty, the frequency of hard graphs is still low. When we consider more regular graphs based on a generalization of knight's tours, we observe frequent instances of really hard graphs, but on these the average degree is bounded by a constant. We design a set of graphs with a feature our algorithm is unable to detect, which are therefore very hard for our algorithm, but in these we can vary the average degree from O(1) to O(n).
We have so far found no class of graphs correlated with the Gn,m phase transition which asymptotically produces a high frequency of hard instances.\nThis paper introduces AntNet, a novel approach to the adaptive learning of routing tables in communications networks. AntNet is a distributed, mobile-agent-based Monte Carlo system that was inspired by recent work on the ant colony metaphor for solving optimization problems. AntNet's agents concurrently explore the network and exchange collected information. The communication among the agents is indirect and asynchronous, mediated by the network itself. This form of communication is typical of social insects and is called stigmergy. We compare our algorithm with six state-of-the-art routing algorithms coming from the telecommunications and machine learning fields. The algorithms' performance is evaluated over a set of realistic testbeds. We run many experiments over real and artificial IP datagram networks with increasing numbers of nodes and under several paradigmatic spatial and temporal traffic distributions. Results are very encouraging. AntNet showed superior performance under all the experimental conditions with respect to its competitors. We analyze the main characteristics of the algorithm and try to explain the reasons for its superiority.\nNowadays, computer scientists have shown interest in the study of social insects' behaviour in the neural networks area for solving different combinatorial and statistical problems. Chief among these is the Artificial Bee Colony (ABC) algorithm. This paper investigates the use of the ABC algorithm, which simulates the intelligent foraging behaviour of a honey bee swarm. A Multilayer Perceptron (MLP) trained with the standard back propagation algorithm normally utilises computationally intensive training algorithms.
One of the crucial problems with the backpropagation (BP) algorithm is that it can sometimes yield networks with suboptimal weights because of the presence of many local optima in the solution space. To overcome this, the ABC algorithm is used in this work to train an MLP to learn the complex behaviour of earthquake time series data, and the performance of MLP-ABC is benchmarked against an MLP trained with the standard BP. The experimental results show that MLP-ABC performs better than MLP-BP on time series data.\nGames have always been a popular test bed for artificial intelligence techniques. Game developers are constantly searching for techniques that can automatically create computer games, minimizing the developer's task. In this work we present an evolutionary-strategy-based solution for the automatic generation of two-player board games. To guide the evolutionary process towards entertaining games, we propose a set of metrics. These metrics are based upon different theories of entertainment in computer games. This work also compares the entertainment value of the evolved games with that of existing popular board games. Further, to verify the entertainment value of the evolved games as perceived by human users, a human user survey is conducted. In addition to the user survey, we check the learnability of the evolved games using an artificial neural network based controller. The proposed metrics and the evolutionary process can be employed for generating new and entertaining board games, provided an initial search space is given to the evolutionary algorithm.\nDeepMind Lab is a first-person 3D game platform designed for research and development of general artificial intelligence and machine learning systems. DeepMind Lab can be used to study how autonomous artificial agents may learn complex tasks in large, partially observed, and visually diverse worlds.
DeepMind Lab has a simple and flexible API enabling creative task-designs and novel AI-designs to be explored and quickly iterated upon. It is powered by a fast and widely recognised game engine, and tailored for effective use by the research community.\nA Question Answering System (QAS) is used for information retrieval and natural language processing (NLP) to reduce human effort. There are numerous QAS based on user documents today, but all are limited to providing objective answers and processing simple questions only. Complex questions cannot be answered by the existing QAS, as they require interpretation of the current and old data as well as the question asked by the user. The above limitations can be overcome by using deep cases and a neural network. Hence, we propose a modified QAS in which we create a deep artificial neural network with associative memory from text documents. The modified QAS processes the contents of the text document provided to it and finds the answer to even complex questions in the documents.\nExperience replay is one of the most commonly used approaches to improve the sample efficiency of reinforcement learning algorithms. In this work, we propose an approach to select and replay sequences of transitions in order to accelerate the learning of a reinforcement learning agent in an off-policy setting. In addition to selecting appropriate sequences, we also artificially construct transition sequences using information gathered from previous agent-environment interactions. These sequences, when replayed, allow value function information to trickle down to larger sections of the state/state-action space, thereby making the most of the agent's experience. We demonstrate our approach on modified versions of standard reinforcement learning tasks such as the mountain car and puddle world problems and empirically show that it enables better learning of value functions as compared to other forms of experience replay.
Further, we briefly discuss some of the possible extensions to this work, as well as applications and situations where this approach could be particularly useful.\nSeveral types of sensors have been available in off-the-shelf mobile devices, including motion, magnetic, vision, acoustic, and location sensors. This paper focuses on the fusion of the data acquired from motion and magnetic sensors, i.e., accelerometer, gyroscope and magnetometer sensors, for the recognition of Activities of Daily Living (ADL) using pattern recognition techniques. The system developed in this study includes data acquisition, data processing, data fusion, and artificial intelligence methods. Artificial Neural Networks (ANN) are included in artificial intelligence methods, which are used in this study for the recognition of ADL. The purpose of this study is the creation of a new method using ANN for the identification of ADL, comparing three types of ANN, in order to achieve results with a reliable accuracy. The best accuracy was obtained with Deep Learning, which, after the application of the L2 regularization and normalization techniques on the sensors data, reports an accuracy of 89.51%.\nThere currently exists a wide range of techniques to model and evolve artificial players for games. Existing techniques range from black box neural networks to entirely hand-designed solutions. In this paper, we demonstrate the feasibility of a genetic programming framework using human controller input to derive meaningful artificial players which can, later on, be optimised by hand. The current state of the art in game character design relies heavily on human designers to manually create and edit scripts and rules for game characters. To address this manual editing bottleneck, current computational intelligence techniques approach the issue with fully autonomous character generators, replacing most of the design process using black box solutions such as neural networks or the like. 
Our GP approach to this problem creates character controllers which can be further authored and developed by a designer it also offers designers to included their play style without the need to use a programming language. This keeps the designer in the loop while reducing repetitive manual labour. Our system also provides insights into how players express themselves in games and into deriving appropriate models for representing those insights. We present our framework, supporting findings and open challenges.\nA vexing problem in artificial intelligence is reasoning about events that occur in complex, changing visual stimuli such as in video analysis or game play. Inspired by a rich tradition of visual reasoning and memory in cognitive psychology and neuroscience, we developed an artificial, configurable visual question and answer dataset (COG) to parallel experiments in humans and animals. COG is much simpler than the general problem of video analysis, yet it addresses many of the problems relating to visual and logical reasoning and memory -- problems that remain challenging for modern deep learning architectures. We additionally propose a deep learning architecture that performs competitively on other diagnostic VQA datasets (i.e. CLEVR) as well as easy settings of the COG dataset. However, several settings of COG result in datasets that are progressively more challenging to learn. After training, the network can zero-shot generalize to many new tasks. Preliminary analyses of the network architectures trained on COG demonstrate that the network accomplishes the task in a manner interpretable to humans.\nDisease Intelligence (DI) is based on the acquisition and aggregation of fragmented knowledge of diseases at multiple sources all over the world to provide valuable information to doctors, researchers and information seeking community. 
Some diseases have their own characteristics changed rapidly at different places of the world and are reported on documents as unrelated and heterogeneous information which may be going unnoticed and may not be quickly available. This research presents an Ontology based theoretical framework in the context of medical intelligence and country/region. Ontology is designed for storing information about rapidly spreading and changing diseases with incorporating existing disease taxonomies to genetic information of both humans and infectious organisms. It further maps disease symptoms to diseases and drug effects to disease symptoms. The machine understandable disease ontology represented as a website thus allows the drug effects to be evaluated on disease symptoms and exposes genetic involvements in the human diseases. Infectious agents which have no known place in an existing classification but have data on genetics would still be identified as organisms through the intelligence of this system. It will further facilitate researchers on the subject to try out different solutions for curing diseases.\nIn this paper we define Clinical Data Intelligence as the analysis of data generated in the clinical routine with the goal of improving patient care. We define a science of a Clinical Data Intelligence as a data analysis that permits the derivation of scientific, i.e., generalizable and reliable results. We argue that a science of a Clinical Data Intelligence is sensible in the context of a Big Data analysis, i.e., with data from many patients and with complete patient information. We discuss that Clinical Data Intelligence requires the joint efforts of knowledge engineering, information extraction (from textual and other unstructured data), and statistics and statistical machine learning. 
We describe some of our main results as conjectures and relate them to a recently funded research project involving two major German university hospitals.\nBusiness Intelligence and Analytics (BI&A) is the process of extracting and predicting business-critical insights from data. Traditional BI focused on data collection, extraction, and organization to enable efficient query processing for deriving insights from historical data. With the rise of big data and cloud computing, there are many challenges and opportunities for the BI. Especially with the growing number of data sources, traditional BI\\&A are evolving to provide intelligence at different scales and perspectives - operational BI, situational BI, self-service BI. In this survey, we review the evolution of business intelligence systems in full scale from back-end architecture to and front-end applications. We focus on the changes in the back-end architecture that deals with the collection and organization of the data. We also review the changes in the front-end applications, where analytic services and visualization are the core components. Using a uses case from BI in Healthcare, which is one of the most complex enterprises, we show how BI\\&A will play an important role beyond the traditional usage. The survey provides a holistic view of Business Intelligence and Analytics for anyone interested in getting a complete picture of the different pieces in the emerging next generation BI\\&A solutions.\nIn this paper, we demonstrate the application of Fuzzy Markup Language (FML) to construct an FML-based Dynamic Assessment Agent (FDAA), and we present an FML-based Human-Machine Cooperative System (FHMCS) for the game of Go. The proposed FDAA comprises an intelligent decision-making and learning mechanism, an intelligent game bot, a proximal development agent, and an intelligent agent. 
The intelligent game bot is based on the open-source code of Facebook Darkforest, and it features a representational state transfer application programming interface mechanism. The proximal development agent contains a dynamic assessment mechanism, a GoSocket mechanism, and an FML engine with a fuzzy knowledge base and rule base. The intelligent agent contains a GoSocket engine and a summarization agent that is based on the estimated win rate, real-time simulation number, and matching degree of predicted moves. Additionally, the FML for player performance evaluation and linguistic descriptions for game results commentary are presented. We experimentally verify and validate the performance of the FDAA and variants of the FHMCS by testing five games in 2016 and 60 games of Google Master Go, a new version of the AlphaGo program, in January 2017. The experimental results demonstrate that the proposed FDAA can work effectively for Go applications.\nThe combination of Artificial Intelligence (AI) and Internet-of-Things (IoT), which is denoted as AI-powered Internet-of-Things (AIoT), is capable of processing huge amount of data generated from a large number of devices and handling complex problems in social infrastructures. As AI and IoT technologies are becoming mature, in this paper, we propose to apply AIoT technologies for traffic light control, which is an essential component for intelligent transportation system, to improve the efficiency of smart city's road system. Specifically, various sensors such as surveillance cameras provide real-time information for intelligent traffic light control system to observe the states of both motorized traffic and non-motorized traffic. 
In this paper, we propose an intelligent traffic light control solution by using distributed multi-agent Q learning, considering the traffic information at the neighboring intersections as well as local motorized and non-motorized traffic, to improve the overall performance of the entire control system. By using the proposed multi-agent Q learning algorithm, our solution is targeting to optimize both the motorized and non-motorized traffic. In addition, we considered many constraints/rules for traffic light control in the real world, and integrate these constraints in the learning algorithm, which can facilitate the proposed solution to be deployed in real operational scenarios. We conducted numerical simulations for a real-world map with real-world traffic data. The simulation results show that our proposed solution outperforms existing solutions in terms of vehicle and pedestrian queue lengths, waiting time at intersections, and many other key performance metrics.\nThis report describes an initial reference architecture for intelligent software agents performing active, largely autonomous cyber defense actions on military networks of computing and communicating devices. The report is produced by the North Atlantic Treaty Organization (NATO) Research Task Group (RTG) IST-152 \"Intelligent Autonomous Agents for Cyber Defense and Resilience\". In a conflict with a technically sophisticated adversary, NATO military tactical networks will operate in a heavily contested battlefield. Enemy software cyber agents - malware - will infiltrate friendly networks and attack friendly command, control, communications, computers, intelligence, surveillance, and reconnaissance and computerized weapon systems. To fight them, NATO needs artificial cyber hunters - intelligent, autonomous, mobile agents specialized in active cyber defense. With this in mind, in 2016, NATO initiated RTG IST-152. 
Its objective is to help accelerate development and transition to practice of such software agents by producing a reference architecture and technical roadmap. This report presents the concept and architecture of an Autonomous Intelligent Cyber Defense Agent (AICA). We describe the rationale of the AICA concept, explain the methodology and purpose that drive the definition of the AICA Reference Architecture, and review some of the main features and challenges of the AICA.\nIntelligent agents are often faced with the problem of trying to merge possibly conflicting pieces of information obtained from different sources into a consistent view of the world. We propose a framework for the modelling of such merging operations with roots in the work of Spohn (1988, 1991). Unlike most approaches we focus on the merging of epistemic states, not knowledge bases. We construct a number of plausible merging operations and measure them against various properties that merging operations ought to satisfy. Finally, we discuss the connection between merging and the use of infobases Meyer (1999) and Meyer et al. (2000).\nThe goal of the LP+ project at the K.U.Leuven is to design an expressive logic, suitable for declarative knowledge representation, and to develop intelligent systems based on Logic Programming technology for solving computational problems using the declarative specifications. The ID-logic is an integration of typed classical logic and a definition logic. Different abductive solvers for this language are being developed. This paper is a report of the integration of high order aggregates into ID-logic and the consequences on the solver SLDNFA.\nThe aim of this work is to explore new methodologies on Semantic Parsing for unrestricted texts. Our approach follows the current trends in Information Extraction (IE) and is based on the application of a verbal subcategorization lexicon (LEXPIR) by means of complex pattern recognition techniques. 
LEXPIR is framed on the theoretical model of the verbal subcategorization developed in the Pirapides project.\nWhen simultaneously reasoning with evidences about several different events it is necessary to separate the evidence according to event. These events should then be handled independently. However, when propositions of evidences are weakly specified in the sense that it may not be certain to which event they are referring, this may not be directly possible. In this paper a criterion for partitioning evidences into subsets representing events is established. This criterion, derived from the conflict within each subset, involves minimising a criterion function for the overall conflict of the partition. An algorithm based on characteristics of the criterion function and an iterative optimisation among partitionings of evidences is proposed.\nWithin the scope of the three-year ANTI-SUBMARINE WARFARE project of the National Defence Research Establishment, the INFORMATION SYSTEMS subproject has developed the demonstration prototype Dezzy for handling and analysis of intelligence reports concerning foreign underwater activities.   -----   Inom ramen f\\\"or FOA:s tre{\\aa}riga huvudprojekt UB{\\AA}TSSKYDD har delprojekt INFORMATIONSSYSTEM utvecklat demonstrationsprototypen Dezzy till ett beslutsst\\\"odsystem f\\\"or hantering och analys av underr\\\"attelser om fr\\\"ammande undervattensverksamhet.\nWe present a flexible rule compiler developed for a text-to-speech (TTS) system. The compiler converts a set of rules into a finite-state transducer (FST). The input and output of the FST are subject to parameterization, so that the system can be applied to strings and sequences of feature-structures. 
The resulting transducer is guaranteed to realize a function (as opposed to a relation), and therefore can be implemented as a deterministic device (either a deterministic FST or a bimachine).\nResearch concerning organization and coordination within multi-agent systems continues to draw from a variety of architectures and methodologies. The work presented in this paper combines techniques from game theory and multi-agent systems to produce self-organizing, polymorphic, lightweight, embedded agents for systems scheduling within a large-scale real-time systems environment. Results show how this approach is used to experimentally produce optimum real-time scheduling through the emergent behavior of thousands of agents. These results are obtained using a SWARM simulation of systems scheduling within a High Energy Physics experiment consisting of 2500 digital signal processors.\nIn Dempster-Shafer belief theory, general beliefs are expressed as belief mass distribution functions over frames of discernment. In Subjective Logic beliefs are expressed as belief mass distribution functions over binary frames of discernment. Belief representations in Subjective Logic, which are called opinions, also contain a base rate parameter which express the a priori belief in the absence of evidence. Philosophically, beliefs are quantitative representations of evidence as perceived by humans or by other intelligent agents. The basic operators of classical probability calculus, such as addition and multiplication, can be applied to opinions, thereby making belief calculus practical. Through the equivalence between opinions and Beta probability density functions, this also provides a calculus for Beta probability density functions. This article explains the basic elements of belief calculus.\nSolomonoff's inductive learning model is a powerful, universal and highly elegant theory of sequence prediction. Its critical flaw is that it is incomputable and thus cannot be used in practice. 
It is sometimes suggested that it may still be useful to help guide the development of very general and powerful theories of prediction which are computable. In this paper it is shown that although powerful algorithms exist, they are necessarily highly complex. This alone makes their theoretical analysis problematic, however it is further shown that beyond a moderate level of complexity the analysis runs into the deeper problem of Goedel incompleteness. This limits the power of mathematics to analyse and study prediction algorithms, and indeed intelligent systems in general.\nIn this report, a novel approach to intelligence and learning is introduced; this approach is based on what we call 'perception logic'. Based on this logic, a computing mechanism and automata are introduced. Multi-resolution analysis of perceptual information is given, in which learning is accomplished in at most O(log(N)) epochs, where N is the number of samples, and convergence is guaranteed. This approach combines the advantages of computational models, in the sense that they are structured and mathematically well-defined, with the adaptivity of soft computing approaches, in addition to the continuity and real-time response of dynamical systems.\nWith the great success in simulating many intelligent behaviors using computing devices, there has been an ongoing debate whether all conscious activities are computational processes. In this paper, the answer to this question is shown to be no. A certain phenomenon of consciousness is demonstrated to be fully represented as a computational process using a quantum computer. Based on the computability criterion discussed with Turing machines, the model constructed is shown to necessarily involve a non-computable element. 
The concept that this is solely a quantum effect and does not work for a classical case is also discussed.\nPlasmodium of Physarum polycephalum is an ideal biological substrate for implementing concurrent and parallel computation, including combinatorial geometry and optimization on graphs. We report results of scoping experiments on Physarum computing in conditions of minimal friction, on the water surface. We show that plasmodium of Physarum is capable of computing basic spanning trees and manipulating light-weight objects. We speculate that our results pave the pathways towards design and implementation of amorphous biological robots.\nPurpose: To present an algorithm for spatially sorting objects into an annular structure. Design/Methodology/Approach: A swarm-based model that requires only stochastic agent behaviour coupled with a pheromone-inspired \"attraction-repulsion\" mechanism. Findings: The algorithm consistently generates high-quality annular structures, and is particularly powerful in situations where the initial configuration of objects is similar to those observed in nature. Research limitations/implications: Experimental evidence supports previous theoretical arguments about the nature and mechanism of spatial sorting by insects. Practical implications: The algorithm may find applications in distributed robotics. Originality/value: The model offers a powerful minimal algorithmic framework, and also sheds further light on the nature of attraction-repulsion algorithms and underlying natural processes.\nThe article presents an approach to interactively solve multi-objective optimization problems. While the identification of efficient solutions is supported by computational intelligence techniques on the basis of local search, the search is directed by partial preference information obtained from the decision maker.   
An application of the approach to biobjective portfolio optimization, modeled as the well-known knapsack problem, is reported, and experimental results are reported for benchmark instances taken from the literature. In brief, we obtain encouraging results that show the applicability of the approach to the described problem.\nThis document describes the communication language used in one multiagent system environment for ecological simulations, based on EcoDynamo simulator application linked with several intelligent agents and visualisation applications, and extends the initial definition of the language. The agents' actions and perceptions are translated into messages exchanged with the simulator application and other agents. The concepts and definitions used follow the BNF notation (Backus et al. 1960) and are inspired by the Coach Unilang language (Reis and Lau 2002).\nThe logic which describes quantum robots is not orthodox quantum logic, but a deductive calculus which reproduces the quantum tasks (computational processes, and actions) taking into account quantum superposition and quantum entanglement. A way toward the realization of intelligent quantum robots is to adopt a quantum metalanguage to control quantum robots. A physical implementation of a quantum metalanguage might be the use of coherent states in brain signals.\nThe paper proposes an analysis on some existent ontologies, in order to point out ways to resolve semantic heterogeneity in information systems. Authors are highlighting the tasks in a Knowledge Acquisition System and identifying aspects related to the addition of new information to an intelligent system. A solution is proposed, as a combination of ontology reasoning services and natural language generation. 
A multi-agent system will be conceived with an extractor agent, a reasoner agent and a competence management agent.\nThis paper describes application of information granulation theory, on the back analysis of Jeffrey mine southeast wall Quebec. In this manner, using a combining of Self Organizing Map (SOM) and rough set theory (RST), crisp and rough granules are obtained. Balancing of crisp granules and sub rough granules is rendered in close-open iteration. Combining of hard and soft computing, namely finite difference method (FDM) and computational intelligence and taking in to account missing information are two main benefits of the proposed method. As a practical example, reverse analysis on the failure of the southeast wall Jeffrey mine is accomplished.\nMamdani Fuzzy Model is an important technique in Computational Intelligence (CI) study. This paper presents an implementation of a supervised learning method based on membership function training in the context of Mamdani fuzzy models. Specifically, auto zoom function of a digital camera is modelled using Mamdani technique. The performance of control method is verified through a series of simulation and numerical results are provided as illustrations.\nA lot of mathematical knowledge has been formalized and stored in repositories by now: different mathematical theorems and theories have been taken into consideration and included in mathematical repositories. Applications more distant from pure mathematics, however --- though based on these theories --- often need more detailed knowledge about the underlying theories. In this paper we present an example Mizar formalization from the area of electrical engineering focusing on stability theory which is based on complex analysis. 
We discuss what kind of special knowledge is necessary here and which amount of this knowledge is included in existing repositories.\nBrain-Like Stochastic Search (BLiSS) refers to this task: given a family of utility functions U(u,A), where u is a vector of parameters or task descriptors, maximize or minimize U with respect to u, using networks (Option Nets) which input A and learn to generate good options u stochastically. This paper discusses why this is crucial to brain-like intelligence (an area funded by NSF) and to many applications, and discusses various possibilities for network design and training. The appendix discusses recent research, relations to work on stochastic optimization in operations research, and relations to engineering-based approaches to understanding neocortex.\nWe present in this paper, a modelling of an expertise in pragmatics. We follow knowledge engineering techniques and observe the expert when he analyses a social discussion forum. Then a number of models are defined. These models emphasises the process followed by the expert and a number of criteria used in his analysis. Results can be used as guides that help to understand and annotate discussion forum. We aim at modelling other pragmatics analysis in order to complete the base of guides; criteria, process, etc. of discussion analysis\nSerious games have recently emerged as an avenue for curriculum delivery. Serious games incorporate motivation and entertainment while providing pointed curriculum for the user. This paper presents a serious game, called MiBoard, currently being developed from the iSTART Intelligent Tutoring System. MiBoard incorporates a multiplayer interaction that iSTART was previously unable to provide. This multiplayer interaction produces a wide variation across game trials, while also increasing the repeat playability for users. 
This paper presents a demonstration of the MiBoard system and the expectations for its application.\nIn this work, we develop an intelligent user interface that allows users to enter biomedical queries in a natural language, and that presents the answers (possibly with explanations if requested) in a natural language. We develop a rule layer over biomedical ontologies and databases, and use automated reasoners to answer queries considering relevant parts of the rule layer.\nIntersections constitute one of the most dangerous elements in road systems. Traffic signals remain the most common way to control traffic at high-volume intersections and offer many opportunities to apply intelligent transportation systems to make traffic more efficient and safe. This paper describes an automated method to estimate the temporal exposure of road users crossing the conflict zone to lateral collision with road users originating from a different approach. This component is part of a larger system relying on video sensors to provide queue lengths and spatial occupancy that are used for real time traffic control and monitoring. The method is evaluated on data collected during a real world experiment.\nStatistics is a uniquely difficult field to convey to the uninitiated. It sits astride the abstract and the concrete, the theoretical and the applied. It has a mathematical flavor and yet it is not simply a branch of mathematics. Its core problems blend into those of the disciplines that probe into the nature of intelligence and thought, in particular philosophy, psychology and artificial intelligence. Debates over foundational issues have waxed and waned, but the field has not yet arrived at a single foundational perspective.\nOne of the first step in the realization of an automatic system of check recognition is the extraction of the handwritten area. We propose in this paper an hybrid method to extract these areas. 
This method is based on digit recognition by Fourier descriptors and different steps of colored image processing . It requires the bank recognition of its code which is located in the check marking band as well as the handwritten color recognition by the method of difference of histograms. The areas extraction is then carried out by the use of some mathematical morphology tools.\nThis paper introduces an elemental building block which combines Dictionary Learning and Dimension Reduction (DRDL). We show how this foundational element can be used to iteratively construct a Hierarchical Sparse Representation (HSR) of a sensory stream. We compare our approach to existing models showing the generality of our simple prescription. We then perform preliminary experiments using this framework, illustrating with the example of an object recognition task using standard datasets. This work introduces the very first steps towards an integrated framework for designing and analyzing various computational tasks from learning to attention to action. The ultimate goal is building a mathematically rigorous, integrated theory of intelligence.\nHarmonic theory provides a mathematical framework to describe the structure, behavior, evolution and emergence of harmonic systems. A harmonic system is context aware, contains elements that manifest characteristics either collaboratively or independently according to system's expression and can interact with its environment. This theory provides a fresh way to analyze emergence and collaboration of \"ad-hoc\" and complex systems.\nThe bioinformatical methods to detect lateral gene transfer events are mainly based on functional coding DNA characteristics. In this paper, we propose the use of DNA traits not depending on protein coding requirements. 
We introduce several semilocal variables that depend on DNA primary sequence and that reflect thermodynamic as well as physico-chemical magnitudes that are able to tell apart the genome of different organisms. After combining these variables in a neural classificator, we obtain results whose power of resolution go as far as to detect the exchange of genomic material between bacteria that are phylogenetically close.\nQuality assurance remains a key topic in human computation research. Prior work indicates that majority voting is effective for low difficulty tasks, but has limitations for harder tasks. This paper explores two methods of addressing this problem: tournament selection and elimination selection, which exploit 2-, 3- and 4-way comparisons between different answers to human computation tasks. Our experimental results and statistical analyses show that both methods produce the correct answer in noisy human computation environment more often than majority voting. Furthermore, we find that the use of 4-way comparisons can significantly reduce the cost of quality assurance relative to the use of 2-way comparisons.\nPartially observable Markov decision processes have been widely used to provide models for real-world decision making problems. In this paper, we will provide a method in which a slightly different version of them called Mixed observability Markov decision process, MOMDP, is going to join with our problem. Basically, we aim at offering a behavioural model for interaction of intelligent agents with musical pitch environment and we will show that how MOMDP can shed some light on building up a decision making model for musical pitch conveniently.\nUsing the recently proposed model of combinatorial landscapes: local optima networks, we study the distribution of local optima in two classes of instances of the quadratic assignment problem. Our results indicate that the two problem instance classes give rise to very different configuration spaces. 
For the so-called real-like class, the optima networks possess a clear modular structure, while the networks belonging to the class of random uniform instances are less well partitionable into clusters. We briefly discuss the consequences of the findings for heuristically searching the corresponding problem spaces.\nOne of the remarkable feats of intelligent life is that it restructures the world it lives in for its own benefit. This extended abstract outlines how the information-theoretic principle of empowerment, as an intrinsic motivation, can be used to restructure the environment an agent lives in. We present a first qualitative evaluation of how an agent in a 3d-gridworld builds a staircase-like structure, which reflects the agent's embodiment.\nReasoning, the most important human brain operation, is characterized by a degree of fuzziness. In the present paper we construct a fuzzy model for the reasoning process that, through the calculation of the possibilities of all possible individuals' profiles, gives a quantitative/qualitative view of their behaviour during the above process, and we use the centroid defuzzification technique for measuring the reasoning skills. We also present a number of classroom experiments illustrating our results in practice.\nFor a computational system to be intelligent, it should be able to perform, at least, basic deductions. Nonetheless, since deductions are, in some sense, equivalent to tautologies, it seems that they do not provide new information. The present article proposes a measure of the degree of semantic informativity of valid deductions in a dynamic setting. Concepts of coherency and relevancy, displayed in terms of insertions and deletions on databases, are used to define semantic informativity. 
In this way, the article shows that a solution to the problem about the informativity of deductions provides a heuristic principle to improve the deductive power of computational systems.\nWe present an implementation of E-anti-unification as defined in Heinz (1995), where tree-grammar descriptions of equivalence classes of terms are used to compute generalizations modulo equational theories. We discuss several improvements, including an efficient implementation of variable-restricted E-anti-unification from Heinz (1995), and give some runtime figures about them. We present applications in various areas, including lemma generation in equational inductive proofs, intelligence tests, diverging Knuth-Bendix completion, strengthening of induction hypotheses, and theory formation about finite algebras.\nThe study of collective decision-making systems has become a central part of Swarm-Intelligence-related research in recent years. The most challenging task of modelling a collective decision-making system is to develop the macroscopic stochastic equation from its microscopic model. In this report we have investigated the behaviour of a collective decision-making system with specified microscopic rules that resemble a chemical reaction, using different group sizes. Then we ventured to derive a generalized analytical model of a collective-decision system using the hyper-geometric distribution. Index Terms: swarm; collective decision making; noise; group size; hyper-geometric distribution\nThis chapter discusses the existing and future use of robotics and intelligent sensing technology in mental health care. While the use of this technology is nascent in mental health care, it represents a potentially useful tool in the practitioner's toolbox. 
The goal of this chapter is to provide a brief overview of the field, discuss the recent use of robotics technology in mental health care practice, explore some of the design issues and ethical issues of using robots in this space, and finally to explore the potential of emerging technology.\nWe propose a system for automated essay grading using ontologies and textual entailment. The process of textual entailment is guided by hypotheses, which are extracted from a domain ontology. Textual entailment checks if the truth of the hypothesis follows from a given text. We enact textual entailment to compare students answer to a model answer obtained from ontology. We validated the solution against various essays written by students in the chemistry domain.\nOur aim is to extract information about literary characters in unstructured texts. We employ natural language processing and reasoning on domain ontologies. The first task is to identify the main characters and the parts of the story where these characters are described or act. We illustrate the system in a scenario in the folktale domain. The system relies on a folktale ontology that we have developed based on Propp's model for folktales morphology.\nIn this paper we address planning problems in high-dimensional hybrid configuration spaces, with a particular focus on manipulation planning problems involving many objects. We present the hybrid backward-forward (HBF) planning algorithm that uses a backward identification of constraints to direct the sampling of the infinite action space in a forward search from the initial state towards a goal configuration. 
The resulting planner is probabilistically complete and can effectively construct long manipulation plans requiring both prehensile and nonprehensile actions in cluttered environments.\nGiven recent successes in AI (e.g., AlphaGo's victory against Lee Sedol in the game of GO), it's become increasingly important to assess: how close are AI systems to human-level intelligence? This paper describes the Allen AI Science Challenge---an approach towards that goal which led to a unique Kaggle Competition, its results, the lessons learned, and our next steps.\nAdvances in ICT are bringing into reality the vision of a large number of uniquely identifiable, interconnected objects and things that gather information from diverse physical environments and deliver the information to a variety of innovative applications and services. These sensing objects and things form the Internet of Things (IoT) that can improve energy and cost efficiency and automation in many different industry fields such as transportation and logistics, health care and manufacturing, and facilitate our everyday lives as well. IoT applications rely on real-time context data and allow sending information for driving the behaviors of users in intelligent environments.\nA visual type theory is a cognitive tool that has much in common with language, and may be regarded as an exceptional form of spatial text adjunct. A mathematical visual type theory, called NPM, has been under development that can be viewed as an early-stage project in mathematical knowledge management and mathematical user interface development. We discuss in greater detail the notion of a visual type theory, report on progress towards a usable mathematical visual type theory, and discuss the outlook for future work on this project.\nExplanations have been introduced in the previous century. Their interest in reducing the search space is no longer questioned. Yet, their efficient implementation into CSP solver is still a challenge. 
In this paper, we introduce ESeR, an Event Selection Rules algorithm that filters events generated during propagation. This dynamic selection enables an efficient computation of explanations for intelligent backtracking algorithms. We show the effectiveness of our approach on the instances of the last three MiniZinc challenges.\nThis paper describes an approach for assisting the user (e.g. SW architect) in software processes. The approach observes the user's actions and tries to predict his next step. For this we use approaches from the area of machine learning (sequence learning) and adapt these for use in software processes. Keywords: Software engineering, Software process description languages, Software processes, Machine learning, Sequence prediction\nThis manual describes the competition software for the Simulated Car Racing Championship, an international competition held at major conferences in the field of Evolutionary Computation and in the field of Computational Intelligence and Games. It provides an overview of the architecture, the instructions to install the software and to run the simple drivers provided in the package, the description of the sensors and the actuators.\nThe measure of distance between two fuzzy sets is a fundamental tool within fuzzy set theory. However, current distance measures within the literature do not account for the direction of change between fuzzy sets; a useful concept in a variety of applications, such as Computing With Words. In this paper, we highlight this utility and introduce a distance measure which takes the direction between sets into account. We provide details of its application for normal and non-normal, as well as convex and non-convex fuzzy sets. 
We demonstrate the new distance measure using real data from the MovieLens dataset and establish the benefits of measuring the direction between fuzzy sets.\nThe measure of distance between two fuzzy sets is a fundamental tool within fuzzy set theory, however, distance measures currently within the literature use a crisp value to represent the distance between fuzzy sets. A real valued distance measure is developed into a fuzzy distance measure which better reflects the uncertainty inherent in fuzzy sets and a fuzzy directional distance measure is presented, which accounts for the direction of change between fuzzy sets. A multiplicative version is explored as a full maximal assignment is computationally intractable so an intermediate solution is offered.\nSerial pattern mining consists in extracting the frequent sequential patterns from a unique sequence of itemsets. This paper explores the ability of a declarative language, such as Answer Set Programming (ASP), to solve this issue efficiently. We propose several ASP implementations of the frequent sequential pattern mining task: a non-incremental and an incremental resolution. The results show that the incremental resolution is more efficient than the non-incremental one, but both ASP programs are less efficient than dedicated algorithms. Nonetheless, this approach can be seen as a first step toward a generic framework for sequential pattern mining with constraints.\nWe characterize different types of conflicts that may occur in complex distributed multi-agent scenarios, such as in Ambient Intelligence (AmI) environments, and we argue that these conflicts should be resolved in a suitable order and with the appropriate strategies for each individual conflict type. 
We call for further research with the goal of turning conflict resolution in AmI environments and similar multi-agent domains into a more coordinated and agreed-upon process.\nIn a Human-Computer Interaction context, we aim to elaborate an adaptive and generic interaction model in two different use cases: Embodied Conversational Agents and Creative Musical Agents for musical improvisation. To reach this goal, we'll try to use the concepts of adaptation and synchronization to enhance the interactive abilities of our agents and guide the development of our interaction model, and will try to make synchrony emerge from non-verbal dimensions of interaction.\nWe propose a way of extracting and aggregating per-move evaluations from sets of Go game records. The evaluations capture different aspects of the games such as played patterns or statistics of sente/gote sequences. Using machine learning algorithms, the evaluations can be utilized to predict different relevant target variables. We apply this methodology to predict the strength and playing style of the player (e.g. territoriality or aggressivity) with good accuracy. We propose a number of possible applications including aiding in Go study, seeding real-world ranks of internet players or tuning of Go-playing programs.\nThe paper presents an application of non-linear stacking ensembles for prediction of Go player attributes. An evolutionary algorithm is used to form a diverse ensemble of base learners, which are then aggregated by a stacking ensemble. This methodology allows for an efficient prediction of different attributes of Go players from sets of their games. These attributes can be fairly general; in this work, we used the strength and style of the players.\nThis paper presents 'SimpleDS', a simple and publicly available dialogue system trained with deep reinforcement learning. 
In contrast to previous reinforcement learning dialogue systems, this system avoids manual feature engineering by performing action selection directly from raw text of the last system and (noisy) user responses. Our initial results, in the restaurant domain, show that it is indeed possible to induce reasonable dialogue behaviour with an approach that aims for high levels of automation in dialogue control for intelligent interactive agents.\nThere exists a theory of a single general-purpose learning algorithm which could explain the principles of its operation. This theory assumes that the brain has some initial rough architecture, a small library of simple innate circuits which are prewired at birth and proposes that all significant mental algorithms can be learned. Given current understanding and observations, this paper reviews and lists the ingredients of such an algorithm from both architectural and functional perspectives.\nWe describe CITlab's recognition system for the HTRtS competition attached to the 13. International Conference on Document Analysis and Recognition, ICDAR 2015. The task comprises the recognition of historical handwritten documents. The core algorithms of our system are based on multi-dimensional recurrent neural networks (MDRNN) and connectionist temporal classification (CTC). The software modules behind that as well as the basic utility technologies are essentially powered by PLANET's ARGUS framework for intelligent text recognition and image processing.\nThis paper introduces the revival of the popular Ms. Pac-Man Versus Ghost Team competition. We present an updated game engine with Partial Observability constraints, a new Multi-Agent Systems approach to developing Ghost agents and several sample controllers to ease the development of entries. A restricted communication protocol is provided for the Ghosts, providing a more challenging environment than before. 
The competition will debut at the IEEE Computational Intelligence and Games Conference 2016. Some preliminary results showing the effects of Partial Observability and the benefits of simple communication are also presented.\nThis paper presents a systematic survey on existing literature and seminal works relevant to the application of ontologies in different aspects of Cloud computing. Our hypothesis is that ontologies along with their reasoning capabilities can have significant impact on improving various aspects of the Cloud computing phenomena. Ontologies can promote intelligent decision support mechanisms for various Cloud based services. They can also provide effective interoperability among the Cloud based systems and resources. This survey can promote a comprehensive understanding on the roles and significance of ontologies within the overall domain of Cloud Computing. Also, this project can potentially form the basis of new research area and possibilities for both ontology and Cloud computing communities.\nNeo-fuzzy elements are used as nodes for an evolving cascade system. The proposed system can tune both its parameters and architecture in an online mode. It can be used for solving a wide range of Data Mining tasks (namely time series forecasting). The evolving cascade system with neo-fuzzy nodes can process rather large data sets with high speed and effectiveness.\nDespite outstanding success in vision amongst other domains, many of the recent deep learning approaches have evident drawbacks for robots. This manuscript surveys recent work in the literature that pertain to applying deep learning systems to the robotics domain, either as means of estimation or as a tool to resolve motor commands directly from raw percepts. These recent advances are only a piece to the puzzle. We suggest that deep learning as a tool alone is insufficient in building a unified framework to acquire general intelligence. 
For this reason, we complement our survey with insights from cognitive development and refer to ideas from classical control theory, producing an integrated direction for a lifelong learning architecture.\nDefining various dishonest notions in a formal way is a key step to enable intelligent agents to act in untrustworthy environments. This review evaluates the literature for this topic by looking at formal definitions based on modal logic as well as other formal approaches. Criteria from philosophical groundwork is used to assess the definitions for correctness and completeness. The key contribution of this review is to show that only a few definitions fully comply with this gold standard and to point out the missing steps towards a successful application of these definitions in an actual agent environment.\nIn this paper we introduce a fuzzy constraint linear discriminant analysis (FC-LDA). The FC-LDA tries to minimize misclassification error based on modified perceptron criterion that benefits handling the uncertainty near the decision boundary by means of a fuzzy linear programming approach with fuzzy resources. The method proposed has low computational complexity because of its linear characteristics and the ability to deal with noisy data with different degrees of tolerance. Obtained results verify the success of the algorithm when dealing with different problems. Comparing FC-LDA and LDA shows superiority in classification task.\nThe ability to accurately predict and simulate human driving behavior is critical for the development of intelligent transportation systems. Traditional modeling methods have employed simple parametric models and behavioral cloning. This paper adopts a method for overcoming the problem of cascading errors inherent in prior approaches, resulting in realistic behavior that is robust to trajectory perturbations. 
We extend Generative Adversarial Imitation Learning to the training of recurrent policies, and we demonstrate that our model outperforms rule-based controllers and maximum likelihood models in realistic highway simulations. Our model reproduces emergent behavior of human drivers, such as lane change rate, while maintaining realistic control over long time horizons.\nRetention of residual skills for persons who partially lose their cognitive or physical ability is of utmost importance. Research is focused on developing systems that provide need-based assistance for retention of such residual skills. This paper describes a novel cognitive collaborative control architecture C3A, designed to address the challenges of developing need-based assistance for wheelchair navigation. Organization of C3A is detailed and results from simulation of the proposed architecture are presented. For simulation of our proposed architecture, we have used ROS (Robot Operating System) as a control framework and a 3D robotic simulator called USARSim (Unified System for Automation and Robot Simulation).\nClassical higher-order logic, when utilized as a meta-logic in which various other (classical and non-classical) logics can be shallowly embedded, is well suited for realising a universal logic reasoning approach. Universal logic reasoning in turn, as envisioned already by Leibniz, may support the rigorous formalisation and deep logical analysis of rational arguments within machines. A respective universal logic reasoning framework is described and a range of exemplary applications are discussed. In the future, universal logic reasoning in combination with appropriate, controlled forms of rational argumentation may serve as a communication layer between humans and intelligent machines.\nIn this paper we describe an automatic generator to support the data scientist to construct, in a user-friendly way, dashboards from data represented as networks. 
The generator called SBINet (Semantic for Business Intelligence from Networks) has a semantic layer that, through ontologies, describes the data that represents a network as well as the possible metrics to be calculated in the network. Thus, with SBINet, the stages of the dashboard constructing process that uses complex network metrics are facilitated and can be done by users who do not necessarily know about complex networks.\nIn this report, an automated bartender system was developed for making orders in a bar using hand gestures. The gesture recognition of the system was developed using Machine Learning techniques, where the model was trained to classify gestures using collected data. The final model used in the system reached an average accuracy of 95%. The system raised ethical concerns both in terms of user interaction and having such a system in a real world scenario, but it could initially work as a complement to a real bartender.\nSymbolic has been long considered as a language of human intelligence while neural networks have advantages of robust computation and dealing with noisy data. The integration of neural-symbolic can offer better learning and reasoning while providing a means for interpretability through the representation of symbolic knowledge. Although previous works focus intensively on supervised feedforward neural networks, little has been done for the unsupervised counterparts. In this paper we show how to integrate symbolic knowledge into unsupervised neural networks. We exemplify our approach with knowledge in different forms, including propositional logic for DNA promoter prediction and first-order logic for understanding family relationship.\nWe propose a novel method for automatic program synthesis. P-Tree Programming represents the program search space through a single probabilistic prototype tree. From this prototype tree we form program instances which we evaluate on a given problem. 
The error values from the evaluations are propagated through the prototype tree. We use them to update the probability distributions that determine the symbol choices of further instances. The iterative method is applied to several symbolic regression benchmarks from the literature. It outperforms standard Genetic Programming to a large extent. Furthermore, it relies on a concise set of parameters which are held constant for all problems. The algorithm can be employed for most of the typical computational intelligence tasks such as classification, automatic program induction, and symbolic regression.\nNeuroevolution has proven effective at many reinforcement learning tasks, but does not seem to scale well to high-dimensional controller representations, which are needed for tasks where the input is raw pixel data. We propose a novel method where we train an autoencoder to create a comparatively low-dimensional representation of the environment observation, and then use CMA-ES to train neural network controllers acting on this input data. As the behavior of the agent changes the nature of the input data, the autoencoder training progresses throughout evolution. We test this method in the VizDoom environment built on the classic FPS Doom, where it performs well on a health-pack gathering task.\nThe paper considers the problem of planning a set of non-conflicting trajectories for the coalition of intelligent agents (mobile robots). Two divergent approaches, i.e. centralized and decentralized, are surveyed and analyzed. The decentralized planner MAPP is described and applied to the task of finding trajectories for dozens of UAVs performing nap-of-the-earth flight in urban environments. Results of the experimental studies provide an opportunity to claim that MAPP is a highly efficient planner for solving the considered types of tasks.\nSustainable and economical generation of electrical power is an essential and mandatory component of infrastructure in today's world. 
Optimal generation (generator subset selection) of power requires a careful evaluation of various factors like type of source, generation, transmission & storage capacities, and congestion, among others, which makes this a difficult task. We created a grid to simulate various conditions including stimuli like generator supply, weather and load demand using Siemens PSS/E software, and this data is used to train deep learning models, which are subsequently tested. The results are highly encouraging. To our knowledge, this is the first paper to propose a working and scalable deep learning model for this problem.\nOrdered weighted averaging (OWA) operators have been widely used in decision making these past few years. An important issue facing the OWA operators' users is the determination of the OWA weights. This paper introduces an OWA determination method based on truncated distributions that enables intuitive generation of OWA weights according to a certain level of risk and trade-off. These two dimensions are represented by the first two moments of the truncated distribution. We illustrate our approach with the well-known normal distribution and the definition of a continuous parabolic decision-strategy space. We finally study the impact of the number of criteria on the results.\nIn this work we propose an ontology to support automated negotiation in multiagent systems. The ontology can be connected with some domain-specific ontologies to facilitate the negotiation in different domains, such as Intelligent Transportation Systems (ITS), e-commerce, etc. The specific negotiation rules for each type of negotiation strategy can also be defined as part of the ontology, reducing the amount of knowledge hardcoded in the agents and ensuring the interoperability. The expressiveness of the ontology was proved in a multiagent architecture for the automatic traffic light setting application on ITS.\nTelecom companies are severely damaged by bypass fraud or SIM boxing. 
However, there is a shortage of published research to tackle this problem. The traditional method of Test Call Generating is easily overcome by fraudsters and the need for more sophisticated ways is inevitable. In this work, we are developing intelligent algorithms that mine a huge amount of mobile operator's data and detect the SIMs that are used to bypass international calls. This method will make it hard for fraudsters to generate revenue and hinder their work. Also by reducing fraudulent activities, quality of service can be increased as well as customer satisfaction. Our technique has been evaluated and tested on real world mobile operator data, and proved to be very efficient.\nWe introduce The House Of inteRactions (THOR), a framework for visual AI research, available at http://ai2thor.allenai.org. AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks. AI2-THOR enables research in many different domains including but not limited to deep reinforcement learning, imitation learning, learning by interaction, planning, visual question answering, unsupervised representation learning, object detection and segmentation, and learning models of cognition. The goal of AI2-THOR is to facilitate building visually intelligent models and push the research forward in this domain.\nThe Winograd Schema Challenge (WSC) is a test of machine intelligence, designed to be an improvement on the Turing test. A Winograd Schema consists of a sentence and a corresponding question. To successfully answer these questions, one requires the use of commonsense knowledge and reasoning. This work focuses on extracting common sense knowledge which can be used to generate answers for the Winograd schema challenge. Common sense knowledge is extracted based on events (or actions) and their participants; called Event-Based Conditional Commonsense (ECC). 
I propose an approach using Narrative Event Chains [Chambers et al., 2008] to extract ECC knowledge. These are stored in templates, to be later used for answering the WSC questions. This approach works well with respect to a subset of WSC tasks.\nCurrent machine learning systems operate, almost exclusively, in a statistical, or model-free mode, which entails severe theoretical limits on their power and performance. Such systems cannot reason about interventions and retrospection and, therefore, cannot serve as the basis for strong AI. To achieve human level intelligence, learning machines need the guidance of a model of reality, similar to the ones used in causal inference tasks. To demonstrate the essential role of such models, I will present a summary of seven tasks which are beyond reach of current machine learning systems and which have been accomplished using the tools of causal modeling.\nThe emerging paradigm of Human-Machine Inference Networks (HuMaINs) combines complementary cognitive strengths of humans and machines in an intelligent manner to tackle various inference tasks and achieves higher performance than either humans or machines by themselves. While inference performance optimization techniques for human-only or sensor-only networks are quite mature, HuMaINs require novel signal processing and machine learning solutions. In this paper, we present an overview of the HuMaINs architecture with a focus on three main issues that include architecture design, inference algorithms including security/privacy challenges, and application areas/use cases.\nWhile great advances are made in pattern recognition and machine learning, the successes of such fields remain restricted to narrow applications and seem to break down when training data is scarce, a shift in domain occurs, or when intelligent reasoning is required for rapid adaptation to new environments. 
In this work, we list several of the shortcomings of modern machine-learning solutions, specifically in the contexts of computer vision and in reinforcement learning and suggest directions to explore in order to try to ameliorate these weaknesses.\nThe paper describes the architecture of the intelligence system for automated design of ontological knowledge bases of domain areas and the software model of the management GUI (Graphical User Interface) subsystem\nWe present MIRIAM (Multimodal Intelligent inteRactIon for Autonomous systeMs), a multimodal interface to support situation awareness of autonomous vehicles through chat-based interaction. The user is able to chat about the vehicle's plan, objectives, previous activities and mission progress. The system is mixed initiative in that it pro-actively sends messages about key events, such as fault warnings. We will demonstrate MIRIAM using SeeByte's SeeTrack command and control interface and Neptune autonomy simulator.\nConversation interfaces (CIs), or chatbots, are a popular form of intelligent agents that engage humans in task-oriented or informal conversation. In this position paper and demonstration, we argue that chatbots working in dynamic environments, like with sensor data, can not only serve as a promising platform to research issues at the intersection of learning, reasoning, representation and execution for goal-directed autonomy; but also handle non-trivial business applications. We explore the underlying issues in the context of Water Advisor, a preliminary multi-modal conversation system that can access and explain water quality data.\nWe discuss and predict the evolution of Simultaneous Localisation and Mapping (SLAM) into a general geometric and semantic `Spatial AI' perception capability for intelligent embodied devices. 
A big gap remains between the visual perception performance that devices such as augmented reality eyewear or consumer robots will require and what is possible within the constraints imposed by real products. Co-design of algorithms, processors and sensors will be needed. We explore the computational structure of current and future Spatial AI algorithms and consider this within the landscape of ongoing hardware developments.\nWe developed an Android smartphone application for a tourist information system. In particular, the agent system recommends sightseeing spots and local hospitality corresponding to the user's current feelings. The system such as concierge can estimate user's emotion and mood by Emotion Generating Calculations and Mental State Transition Network. In this paper, the system decides the next candidates for spots and foods by the reasoning of fuzzy Petri Net in order to make more smooth communication between human and smartphone. The system was developed for Hiroshima Tourist Information and described some hospitality about the concierge system.\nWWW has a scale-free structure where novel information is often difficult to locate. Moreover, intelligent agents easily get trapped in this structure. Here a novel method is put forth, which turns these traps into information repositories, supplies: We populated an Internet environment with intelligent news foragers. Foraging has its associated cost whereas foragers are rewarded if they detect not yet discovered novel information. The intelligent news foragers crawl by using the estimated long-term cumulated reward, and also have a finite sized memory: the list of most promising supplies. Foragers form an artificial life community: the most successful ones are allowed to multiply, while unsuccessful ones die out. The specific property of this community is that there is no direct communication amongst foragers but the centralized rewarding system. 
Still, fast division of work is achieved.\nThe most advanced implementation of adaptive constraint processing with Constraint Handling Rules (CHR) allows the application of intelligent search strategies to solve Constraint Satisfaction Problems (CSP). This presentation compares an improved version of conflict-directed backjumping and two variants of dynamic backtracking with respect to chronological backtracking on some of the AIM instances which are a benchmark set of random 3-SAT problems. A CHR implementation of a Boolean constraint solver combined with these different search strategies in Java is thus being compared with a CHR implementation of the same Boolean constraint solver combined with chronological backtracking in SICStus Prolog. This comparison shows that the addition of ``intelligence'' to the search process may reduce the number of search steps dramatically. Furthermore, the runtime of their Java implementations is in most cases faster than the implementations of chronological backtracking. More specifically, conflict-directed backjumping is even faster than the SICStus Prolog implementation of chronological backtracking, although our Java implementation of CHR lacks the optimisations made in the SICStus Prolog system. To appear in Theory and Practice of Logic Programming (TPLP).\nWe discuss which properties common-use artifacts should have to collaborate without human intervention. We conceive how devices, such as mobile phones, PDAs, and home appliances, could be seamlessly integrated to provide an \"ambient intelligence\" that responds to the user's desires without requiring explicit programming or commands. While the hardware and software technology to build such systems already exists, as yet there is no standard protocol that can learn new meanings. We propose the first steps in the development of such a protocol, which would need to be adaptive, extensible, and open to the community, while promoting self-organization. 
We argue that devices, interacting through \"game-like\" moves, can learn to agree about how to communicate, with whom to cooperate, and how to delegate and coordinate specialized tasks. Thus, they may evolve a distributed cognition or collective intelligence capable of tackling complex tasks.\nThe act of bluffing confounds game designers to this day. The very nature of bluffing is even open to debate, adding further complication to the process of creating intelligent virtual players that can bluff, and hence play, realistically. Through the use of intelligent, learning agents, and carefully designed agent outlooks, an agent can in fact learn to predict its opponents' reactions based not only on its own cards, but also on the actions of those around it. With this wider scope of understanding, an agent can learn to bluff its opponents, with bluffing representing not an illogical action, as it is often viewed, but rather an act of maximising returns through an effective statistical optimisation. By using a TD(λ) learning algorithm to continuously adapt neural network agent intelligence, agents have been shown to be able to learn to bluff without outside prompting, and even to learn to call each other's bluffs in free, competitive play.\nIn this study, we introduce a general framework of MAny Connected Intelligent Particles Systems (MACIPS). Connections and interconnections between particles give rise to complex behavior in such a merely simple system (system in system). Contributions of natural computing, under information granulation theory, are the main topics of this broad skeleton. On this basis, we organize two algorithms involving a few prominent intelligent computing and approximate reasoning methods: the self-organizing feature map (SOM), the Neuro-Fuzzy Inference System and Rough Set Theory (RST). 
Beyond this, we show how our algorithms can be taken as a linkage of government-society interaction, where the government adopts various modes of behavior: solid (absolute) or flexible. Thus, the transition of such a society from order to disorder, by changing the connectivity parameters (noise), is inferred. In addition, one may find an indirect mapping between financial systems and eventual market fluctuations with MACIPS. Keywords: phase transition, SONFIS, SORST, many connected intelligent particles system, society-government interaction\nThe investigation of a terrorist attack is a time-critical task. The investigators have a limited time window to diagnose the organizational background of the terrorists, to run down and arrest the wire-pullers, and to take action to prevent or eradicate the terrorist attack. An intuitive interface to visualize the intelligence data set stimulates the investigators' experience and knowledge, and aids them in decision-making for an immediately effective action. This paper presents a computational method to analyze the intelligence data set on the collective actions of the perpetrators of the attack, and to visualize it in the form of a social network diagram which predicts the positions where the wire-pullers conceal themselves.\nIn this study, we introduce a general framework of MAny Connected Intelligent Particles Systems (MACIPS). Connections and interconnections between particles give rise to complex behavior in such a merely simple system (system in system). Contributions of natural computing, under information granulation theory, are the main topic of this broad skeleton. On this basis, we organize different algorithms involving a few prominent intelligent computing and approximate reasoning methods such as the self-organizing feature map (SOM)[9], the Neuro-Fuzzy Inference System[10], Rough Set Theory (RST)[11], collaborative clustering, Genetic Algorithm and Ant Colony System. 
On this basis, we have applied our algorithms to several engineering systems, especially emerging systems in civil engineering and mineral processing. In another line of work, we investigated how our algorithms can be taken as a linkage of government-society interaction, where the government adopts various modes of behavior: solid (absolute) or flexible. Thus, the transition of such a society from order to disorder, by changing the connectivity parameters (noise), is inferred. In addition, one may find an indirect mapping between financial systems and eventual market fluctuations with MACIPS. In the following sections, we briefly mention the main topics of the suggested proposal; details of the proposed algorithms can be found in the references.\nComputer-based information technologies have been extensively used to help many organizations, private companies, and academic and educational institutions manage their processes, and information systems have thereby become their nervous centre. The explosion of massive data sets created by businesses, science and governments necessitates intelligent and more powerful computing paradigms so that users can benefit from this data. Therefore most new-generation database applications demand intelligent information management to enhance efficient interactions between databases and their users. Database systems support only a Boolean query model: a selection query on an SQL database returns all those tuples that satisfy the conditions in the query.\nThe Informledge System (ILS) is a knowledge network with autonomous nodes and intelligent links that integrate and structure the pieces of knowledge. In this paper, we put forward strategies for knowledge embedding and retrieval in an ILS. ILS is a powerful knowledge network system dealing with logical storage and connectivity of information units to form knowledge using autonomous nodes and multi-lateral links. 
In ILS, the autonomous nodes known as Knowledge Network Nodes (KNNs) play vital roles: they are used not only in storage, parsing and forming the multi-lateral linkages between knowledge points, but also in realizing intelligent retrieval of linked information units in the form of knowledge. Knowledge built into the ILS takes the shape of a sphere. The intelligence incorporated into the links of a KNN helps in retrieving various knowledge threads from a specific set of KNNs. A developed entity of information realized through a KNN takes the shape of a knowledge cone.\nHolding commercial negotiations and selecting the best supplier in supply chain management systems are among the weaknesses of producers in the production process. Therefore, applying intelligent systems may play an effective role in increasing the speed and improving the quality of these selections. This paper introduces a system which tries to trade using multi-agent systems and to hold negotiations between any agents. In this system, an intelligent agent is assigned to each segment of the chain; it sends orders and receives responses by participating in the negotiation medium and communicating with other agents. This paper describes how agents communicate, the characteristics of the multi-agent system, and the standard registration medium of each agent in the environment. JADE (Java Agent DEvelopment Framework) was used to implement and simulate agent cooperation.\nWe describe challenges in the risk management of the smart-card-based electronic cash industry and describe a method to evaluate the effectiveness of distributed intelligence on the smart card. More specifically, we discuss the evaluation of a distributed intelligence function called \"on-chip risk management\" of the smart card for the global electronic cash payment application using micro dynamic simulation. 
Handling the uncertainty related to the future economic environment and various potential counterfeit attack scenarios requires simulating such an environment to evaluate on-chip performance. Creating a realistic simulation of the electronic cash economy, transaction environment, consumers, merchants and banks is a challenge in itself. In addition, we show examples of the detection capability of off-chip, host-based counterfeit detection systems on a data set generated by the micro dynamic simulation model.\nThe paper considers the class of information systems capable of solving heuristic problems on the basis of a formal theory termed the modal and vector theory of formal intelligent systems (FIS). The paper justifies the construction of the FIS resolution algorithm, defines the main features of these systems and proves theorems that underlie the theory. The principle of representation diversity of FIS construction is formulated. The paper deals with the main principles of constructing and operating a formal intelligent system (FIS) on the basis of FIS modal and vector theory. The following phenomena are considered: the modular architecture of the FIS presentation sub-system, and algorithms of data processing at every step of the stage of creating presentations. In addition, the paper suggests the structure of neural elements, i.e. zone detectors and processors, that are the basis for FIS construction.\nThis paper introduces a novel idea for the representation of individuals using quaternions in swarm intelligence and evolutionary algorithms. Quaternions are a number system that extends the complex numbers. They are successfully applied to problems of theoretical physics and to areas needing fast rotation calculations. We propose the application of quaternions in optimization; more precisely, we have been using quaternions for the representation of individuals in the Bat algorithm. 
The preliminary results of our experiments optimizing a test suite of ten standard functions showed that this new algorithm significantly improved on the results of the original Bat algorithm. Moreover, the obtained results are comparable with other swarm intelligence and evolutionary algorithms, such as the artificial bee colony and differential evolution. We believe that this representation could also be successfully applied to other swarm intelligence and evolutionary algorithms.\nRobot path planning is one of the challenging fields in robotics research. In this paper, we propose a novel algorithm to find a path between the starting and ending positions for an intelligent system. An intelligent system is considered to be a device/robot having an antenna connected to a sensor-detector system. The proposed algorithm is based on the neural network training concept. The considered neural network is adaptive to the knowledge bases. However, implementation of this algorithm is slightly expensive due to the hardware it requires. From detailed analysis, it can be shown that the path produced by this algorithm is efficient.\nIn cities, an Intelligent Transportation System controls traffic congestion and regulates traffic flow. This paper presents three modules that will help in managing city traffic issues and ultimately advance the transportation system. The first module, Congestion Detection and Management, will provide users with real-time information about congestion on the road towards their destination; the second module, Intelligent Public Transport System, will provide users with real-time public transport information, i.e., local buses; and the third module, Signal Synchronization, will help control congestion at signals, with real-time adjustments of signal timers according to the congestion. 
All the information that users receive about traffic or public transportation will be delivered to their everyday mobile device through an Android application or SMS. Moreover, communication can also be done via a website for clients with internet access. All these modules will be fully automated, without any human intervention on the server side.\nAttributing a cyber-operation through the use of multiple pieces of technical evidence (e.g., malware reverse-engineering and source tracking) and conventional intelligence sources (e.g., human or signals intelligence) is a difficult problem, not only due to the effort required to obtain evidence, but also due to the ease with which an adversary can plant false evidence. In this paper, we introduce a formal reasoning system called the InCA (Intelligent Cyber Attribution) framework that is designed to aid an analyst in the attribution of a cyber-operation even when the available information is conflicting and/or uncertain. Our approach combines argumentation-based reasoning, logic programming, and probabilistic models to not only attribute an operation but also explain to the analyst why the system reaches its conclusions.\nWhile perception tasks such as visual object recognition and text understanding play an important role in human intelligence, the subsequent tasks that involve inference, reasoning and planning require an even higher level of intelligence. The past few years have seen major advances in many perception tasks using deep learning models. For higher-level inference, however, probabilistic graphical models with their Bayesian nature are still more powerful and flexible. To achieve integrated intelligence that involves both perception and inference, it is naturally desirable to tightly integrate deep learning and Bayesian models within a principled probabilistic framework, which we call Bayesian deep learning. 
In this unified framework, the perception of text or images using deep learning can boost the performance of higher-level inference and, in return, the feedback from the inference process is able to enhance the perception of text or images. This survey provides a general introduction to Bayesian deep learning and reviews its recent applications in recommender systems, topic models, and control. In this survey, we also discuss the relationship and differences between Bayesian deep learning and other related topics like Bayesian treatment of neural networks.\nIn [1,2] a new class of intelligent controllers that can semantically embed an agent in a spatial context constraining its behavior in a goal-oriented manner was suggested. A controller of such a class can guide an agent in a stationary unknown environment to a fixed target zone along an obstacle-free trajectory. Here, an extension is suggested that would enable the interception of an intelligent target that is maneuvering to evade capture amidst stationary clutter (i.e. the target zone is moving). This is achieved by forcing the differential properties of the potential field used to induce the control action to satisfy the wave equation. Background of the problem, theoretical developments, as well as proofs of the ability of the modified control to intercept the target along an obstacle-free trajectory are supplied. Simulation results are also provided.\nSoftware cost estimation based on multivariate data from completed projects requires the building of efficient models. These models essentially describe relations in the data, either on the basis of correlations between variables or of similarities between the projects. 
The continuous growth of the amount of data gathered and the need to perform preliminary analysis in order to discover patterns able to drive the building of reasonable models lead researchers towards intelligent and time-saving tools which can effectively describe data and their relationships. The goal of this paper is to suggest an innovative visualization tool, widely used in bioinformatics, which represents relations in data in an aesthetic and intelligent way. In order to illustrate the capabilities of the tool, we use a well-known dataset from software engineering projects.\nRecent advances in communications, mobile computing, and artificial intelligence have greatly expanded the application space of intelligent distributed sensor networks. This in turn motivates the development of generalized Bayesian decentralized data fusion (DDF) algorithms for robust and efficient information sharing among autonomous agents using probabilistic belief models. However, DDF is significantly challenging to implement for general real-world applications requiring the use of dynamic/ad hoc network topologies and complex belief models, such as Gaussian mixtures or hybrid Bayesian networks. To tackle these issues, we first discuss some new key mathematical insights about exact DDF and conservative approximations to DDF. These insights are then used to develop novel generalized DDF algorithms for complex beliefs based on mixture pdfs and conditional factors. Numerical examples motivated by multi-robot target search demonstrate that our methods lead to significantly better fusion results, and thus have great potential to enhance distributed intelligent reasoning in sensor networks.\nIn this paper we discuss and analyze some of the intelligent classifiers which allow for automatic detection and classification of network attacks for any intrusion detection system. 
We first analyze them using the WEKA software to apply the classifiers to a well-known IDS (Intrusion Detection System) dataset, the NSL-KDD dataset. The NSL-KDD dataset of network attacks was created in a military network by MIT Lincoln Labs. Then we discuss and experiment with some of the hybrid AI (Artificial Intelligence) classifiers that can be used for IDS, and finally we develop Java software with the three most efficient classifiers and compare it with other options. The outputs show the detection accuracy and efficiency of the single and combined classifiers used.\nIt has been quite a long time since AI researchers in the field of computer science stopped talking about simulating human intelligence or trying to explain how the brain works. Recently, led by deep learning techniques, the field of machine learning has been experiencing unprecedented prosperity, and some applications with near human-level performance give researchers the confidence to suggest that their approaches are promising candidates for understanding the mechanism of the human brain. However, apart from several ancient philosophical criteria and some imaginary black-box tests (Turing test, Chinese room), there is no computational-level explanation, definition or criterion for intelligence or any of its components. Based on the common-sense view that learning ability is one critical component of intelligence, and inspecting it from the viewpoint of mapping relations, this paper presents two laws which explain what the \"learning ability\" we are familiar with is, and under what conditions a mapping relation can be acknowledged as a \"Learning Model\".\nIn this paper, we examine the problem of missing data in high-dimensional datasets by taking into consideration the Missing Completely at Random and Missing at Random mechanisms, as well as the Arbitrary missing pattern. 
Additionally, this paper employs a methodology based on Deep Learning and Swarm Intelligence algorithms in order to provide reliable estimates for missing data. The deep learning technique is used to extract features from the input data via an unsupervised learning approach by modeling the data distribution based on the input. This deep learning technique is then used as part of the objective function for the swarm intelligence technique in order to estimate the missing data after a supervised fine-tuning phase by minimizing an error function based on the interrelationship and correlation between features in the dataset. The investigated methodology in this paper therefore has longer running times; however, the promising potential outcomes justify the trade-off. Also, basic knowledge of statistics is presumed.\nEfforts at understanding the computational processes in the brain have met with limited success, despite their importance and potential uses in building intelligent machines. We propose a simple new model which draws on recent findings in Neuroscience and the Applied Mathematics of interacting Dynamical Systems. The Feynman Machine is a Universal Computer for Dynamical Systems, analogous to the Turing Machine for symbolic computing, but with several important differences. We demonstrate that networks and hierarchies of simple interacting Dynamical Systems, each adaptively learning to forecast its evolution, are capable of automatically building sensorimotor models of the external and internal world. We identify such networks in mammalian neocortex, and show how existing theories of cortical computation combine with our model to explain the power and flexibility of mammalian intelligence. These findings lead directly to new architectures for machine intelligence. 
A suite of software implementations has been built based on these principles, and applied to a number of spatiotemporal learning tasks.\nThe sampling-based motion planning algorithm known as Rapidly-exploring Random Trees (RRT) has gained the attention of many researchers due to its computational efficiency and effectiveness. Recently, a variant of RRT called RRT* has been proposed that ensures asymptotic optimality. Subsequently, its bidirectional version, known as Bidirectional-RRT* (B-RRT*), has also been introduced in the literature. We introduce a new variant called Intelligent Bidirectional-RRT* (IB-RRT*), which is an improved variant of the optimal RRT* and bidirectional RRT* (B-RRT*) algorithms and is specially designed for complex cluttered environments. IB-RRT* utilizes the bidirectional trees approach and introduces an intelligent sample insertion heuristic for fast convergence to the optimal path solution using uniform sampling heuristics. The proposed algorithm is evaluated theoretically, and experimental results are presented that compare IB-RRT* with RRT* and B-RRT*. Moreover, experimental results demonstrate the superior efficiency of IB-RRT* in comparison with RRT* and B-RRT* in complex cluttered environments.\nGeometry theorem proving forms a major and challenging component of the K-12 mathematics curriculum. A particularly difficult task is to add auxiliary constructions (i.e., additional lines or points) to aid proof discovery. Although many intelligent tutoring systems have been proposed for geometry proofs, few teach students how to find auxiliary constructions, and the few exceptions are all limited by their underlying reasoning processes for supporting auxiliary constructions. This paper tackles these weaknesses of prior systems by introducing an interactive geometry tutor, the Advanced Geometry Proof Tutor (AGPT). 
It leverages a recent automated geometry prover to provide combined benefits that no geometry theorem prover or intelligent tutoring system alone can accomplish. In particular, AGPT can not only automatically process images of geometry problems directly, but also interactively train and guide students toward discovering auxiliary constructions on their own. We have evaluated AGPT via a pilot study with 78 high school students. The study results show that, in training students to find auxiliary constructions, there is no significant perceived difference between AGPT and human tutors, and AGPT is significantly more effective than the state-of-the-art geometry solver that produces human-readable proofs.\nParticle Swarm Optimization (PSO) is an Evolutionary Algorithm (EA) that utilizes a swarm of particles to solve an optimization problem. Slow Intelligence System (SIS) is a learning framework which slowly learns the solution to a problem by performing a series of operations. Moreover, Learning Automata (LA) are minuscule but effective decision-making entities which are best suited to act as a controller component. In this paper, we combine two isolated populations of PSO to forge the Adaptive Intelligence Optimizer (AIO), which harnesses the advantages of a bi-population PSO to escape from local minima and avoid premature convergence. Furthermore, using the rich framework of SIS and the control theory from which LA are derived, we find a close match between SIS and LA, since acting slowly is the pillar of both. Both SIS and LA need time to converge to the optimal decision, and this enables AIO to outperform standard PSO, with incomparable performance on evolutionary optimization benchmark functions.\nGiven a knowledge base KB containing first-order and statistical facts, we consider a principled method, called the random-worlds method, for computing a degree of belief that some formula Phi holds given KB. 
If we are reasoning about a world or system consisting of N individuals, then we can consider all possible worlds, or first-order models, with domain {1,...,N} that satisfy KB, and compute the fraction of them in which Phi is true. We define the degree of belief to be the asymptotic value of this fraction as N grows large. We show that when the vocabulary underlying Phi and KB uses constants and unary predicates only, we can naturally associate an entropy with each world. As N grows larger, there are many more worlds with higher entropy. Therefore, we can use a maximum-entropy computation to compute the degree of belief. This result is in a similar spirit to previous work in physics and artificial intelligence, but is far more general. Of equal interest to the result itself are the limitations on its scope. Most importantly, the restriction to unary predicates seems necessary. Although the random-worlds method makes sense in general, the connection to maximum entropy seems to disappear in the non-unary case. These observations suggest unexpected limitations to the applicability of maximum-entropy methods.\nMost modern formalisms used in Databases and Artificial Intelligence for describing an application domain are based on the notions of class (or concept) and relationship among classes. One interesting feature of such formalisms is the possibility of defining a class, i.e., providing a set of properties that precisely characterize the instances of the class. Many recent articles point out that there are several ways of assigning a meaning to a class definition containing some sort of recursion. In this paper, we argue that, instead of choosing a single style of semantics, we achieve better results by adopting a formalism that allows for different semantics to coexist. We demonstrate the feasibility of our argument, by presenting a knowledge representation formalism, the description logic muALCQ, with the above characteristics. 
In addition to the constructs for conjunction, disjunction, negation, quantifiers, and qualified number restrictions, muALCQ includes special fixpoint constructs to express (suitably interpreted) recursive definitions. These constructs enable the usual frame-based descriptions to be combined with definitions of recursive data structures such as directed acyclic graphs, lists, streams, etc. We establish several properties of muALCQ, including the decidability and the computational complexity of reasoning, by formulating a correspondence with a particular modal logic of programs called the modal mu-calculus.\nExisting plan synthesis approaches in artificial intelligence fall into two categories -- domain independent and domain dependent. The domain independent approaches are applicable across a variety of domains, but may not be very efficient in any one given domain. The domain dependent approaches need to be (re)designed for each domain separately, but can be very efficient in the domain for which they are designed. One enticing alternative to these approaches is to automatically synthesize domain independent planners given the knowledge about the domain and the theory of planning. In this paper, we investigate the feasibility of using existing automated software synthesis tools to support such synthesis. Specifically, we describe an architecture called CLAY in which the Kestrel Interactive Development System (KIDS) is used to derive a domain-customized planner through a semi-automatic combination of a declarative theory of planning and the declarative control knowledge specific to a given domain. We discuss what it means to write a declarative theory of planning and control knowledge for KIDS, and illustrate our approach by generating a class of domain-specific planners using state space refinements. 
Our experiments show that the synthesized planners can outperform classical refinement planners (implemented as instantiations of UCP, Kambhampati & Srivastava, 1995), using the same control knowledge. We will contrast the costs and benefits of the synthesis approach with conventional methods for customizing domain independent planners.\nIn this paper we propose a random CSP model, called Model GB, which is a natural generalization of standard Model B. It is proved that Model GB in which each constraint is easy to satisfy exhibits non-trivial behaviour (not trivially satisfiable or unsatisfiable) as the number of variables approaches infinity. A detailed analysis to obtain an asymptotic estimate (good to 1+o(1)) of the average number of nodes in a search tree used by the backtracking algorithm on Model GB is also presented. It is shown that the average number of nodes required for finding all solutions or proving that no solution exists grows exponentially with the number of variables. So this model might be an interesting distribution for studying the nature of hard instances and evaluating the performance of CSP algorithms. In addition, we further investigate the behaviour of the average number of nodes as r (the ratio of constraints to variables) varies. The results indicate that as r increases, random CSP instances get easier and easier to solve, and the base for the average number of nodes that is exponential in r tends to 1 as r approaches infinity. Therefore, although the average number of nodes used by the backtracking algorithm on random CSP is exponential, many CSP instances will be very easy to solve when r is sufficiently large.\nThe system presented here shows the feasibility of modeling the knowledge involved in a complex musical activity by integrating sub-symbolic and symbolic processes. 
This research focuses on the question of whether there is any advantage in integrating a neural network together with a distributed artificial intelligence approach within the music domain. The primary purpose of our work is to design a model that describes the different aspects a user might be interested in considering when involved in a musical activity. The approach we suggest in this work enables the musician to encode his knowledge, intuitions, and aesthetic taste into different modules. The system captures these aspects by computing and applying three distinct functions: rules, fuzzy concepts, and learning.   As a case study, we began experimenting with first species two-part counterpoint melodies. We have developed a hybrid system composed of a connectionist module and an agent-based module to combine the sub-symbolic and symbolic levels to achieve this task. The technique presented here to represent musical knowledge constitutes a new approach for composing polyphonic music.\nIn this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects which have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Considering that the degeneration process may lose information, we propose the D-MimlSvm algorithm which tackles MIML problems directly in a regularization framework. Moreover, we show that even when we do not have access to the real objects and thus cannot capture more information from real objects by using the MIML representation, MIML is still useful. 
We propose the InsDif and SubCod algorithms. InsDif works by transforming single-instances into the MIML representation for learning, while SubCod works by transforming single-label examples into the MIML representation for learning. Experiments show that in some tasks they are able to achieve better performance than learning the single-instances or single-label examples directly.\nThe two standard branching schemes for CSPs are d-way and 2-way branching. Although it has been shown that in theory the latter can be exponentially more effective than the former, there is a lack of empirical evidence showing such differences. To investigate this, we initially make an experimental comparison of the two branching schemes over a wide range of benchmarks. Experimental results verify the theoretical gap between d-way and 2-way branching as we move from a simple variable ordering heuristic like dom to more sophisticated ones like dom/ddeg. However, perhaps surprisingly, experiments also show that when state-of-the-art variable ordering heuristics like dom/wdeg are used then d-way can be clearly more efficient than 2-way branching in many cases. Motivated by this observation, we develop two generic heuristics that can be applied at certain points during search to decide whether 2-way branching or a restricted version of 2-way branching, which is close to d-way branching, will be followed. The application of these heuristics results in an adaptive branching scheme. Experiments with instantiations of the two generic heuristics confirm that search with adaptive branching outperforms search with a fixed branching scheme on a wide range of problems.\nThe constraint satisfaction problem (CSP) is a general problem central to computer science and artificial intelligence. Although the CSP is NP-hard in general, considerable effort has been spent on identifying tractable subclasses. 
The main two approaches consider structural properties (restrictions on the hypergraph of constraint scopes) and relational properties (restrictions on the language of constraint relations). Recently, some authors have considered hybrid properties that restrict the constraint hypergraph and the relations simultaneously.   Our key contribution is the novel concept of a CSP pattern and classes of problems defined by forbidden patterns (which can be viewed as forbidding generic subproblems). We describe the theoretical framework which can be used to reason about classes of problems defined by forbidden patterns. We show that this framework generalises relational properties and allows us to capture known hybrid tractable classes.   Although we are not close to obtaining a dichotomy concerning the tractability of general forbidden patterns, we are able to make some progress in a special case: classes of problems that arise when we can only forbid binary negative patterns (generic subproblems in which only inconsistent tuples are specified). In this case we are able to characterise very large classes of tractable and NP-hard forbidden patterns. This leaves the complexity of just one case unresolved and we conjecture that this last case is tractable.\nHuman intuition has been simulated by several research projects using artificial intelligence techniques. Most of these algorithms or models lack the ability to handle complications or diversions. Moreover, they also do not explain the factors influencing intuition and the accuracy of the results from this process. In this paper, we present a simple series based model for implementation of human-like intuition using the principles of connectivity and unknown entities. By using Poker hand datasets and Car evaluation datasets, we compare the performance of some well-known models with our intuition model. The aim of the experiment was to predict the maximum accurate answers using intuition based models. 
We found that the presence of unknown entities, diversion from the current problem scenario, and identifying weakness without the normal logic based execution, greatly affects the reliability of the answers. Generally, the intuition based models cannot be a substitute for the logic based mechanisms in handling such problems. The intuition can only act as a support for an ongoing logic based model that processes all the steps in a sequential manner. However, when time and computational cost are very strict constraints, this intuition based model becomes extremely important and useful, because it can give a reasonably good performance. Factors affecting intuition are analyzed and interpreted through our model.\nAs computational agents are developed for increasingly complicated e-commerce applications, the complexity of the decisions they face demands advances in artificial intelligence techniques. For example, an agent representing a seller in an auction should try to maximize the seller's profit by reasoning about a variety of possibly uncertain pieces of information, such as the maximum prices various buyers might be willing to pay, the possible prices being offered by competing sellers, the rules by which the auction operates, the dynamic arrival and matching of offers to buy and sell, and so on. A naive application of multiagent reasoning techniques would require the seller's agent to explicitly model all of the other agents through an extended time horizon, rendering the problem intractable for many realistically-sized problems. We have instead devised a new strategy that an agent can use to determine its bid price based on a more tractable Markov chain model of the auction process. We have experimentally identified the conditions under which our new strategy works well, as well as how well it works in comparison to the optimal performance the agent could have achieved had it known the future. 
Our results show that our new strategy in general performs well, outperforming other tractable heuristic strategies in a majority of experiments, and is particularly effective in a 'seller's market', where many buy offers are available.\nBelief merging is an important but difficult problem in Artificial Intelligence, especially when sources of information are pervaded with uncertainty. Many merging operators have been proposed to deal with this problem in possibilistic logic, a weighted logic which is powerful for handling inconsistency and dealing with uncertainty. They often result in a possibilistic knowledge base which is a set of weighted formulas. Although possibilistic logic is inconsistency tolerant, it suffers from the well-known \"drowning effect\". Therefore, we may still want to obtain a consistent possibilistic knowledge base as the result of merging. In such a case, we argue that it is not always necessary to keep weighted information after merging. In this paper, we define a merging operator that maps a set of possibilistic knowledge bases and a formula representing the integrity constraints to a classical knowledge base by using lexicographic ordering. We show that it satisfies nine postulates that generalize basic postulates for propositional merging given in [11]. These postulates capture the principle of minimal change in some sense. We then provide an algorithm for generating the resulting knowledge base of our merging operator. Finally, we discuss the compatibility of our merging operator with propositional merging and establish the advantage of our merging operator over existing semantic merging operators in the propositional case.\nManipulation, bribery, and control are well-studied ways of changing the outcome of an election. Many voting rules are, in the general case, computationally resistant to some of these manipulative actions.
However, when restricted to single-peaked electorates, these rules suddenly become easy to manipulate. Recently, Faliszewski, Hemaspaandra, and Hemaspaandra studied the computational complexity of strategic behavior in nearly single-peaked electorates. These are electorates that are not single-peaked but close to it according to some distance measure.   In this paper we introduce several new distance measures regarding single-peakedness. We prove that determining whether a given profile is nearly single-peaked is NP-complete in many cases. For one case we present a polynomial-time algorithm. In case the single-peaked axis is given, we show that determining the distance is always possible in polynomial time. Furthermore, we explore the relations between the new notions introduced in this paper and existing notions from the literature.\nThe problems associated with scaling involve active and challenging research topics in the area of artificial intelligence. The purpose is to solve real world problems by means of AI technologies, in cases where the complexity of representation of the real world problem is potentially combinatorial. In this paper, we present a novel approach to cope with the scaling issues in Bayesian belief networks for ship classification. The proposed approach divides the conceptual model of a complex ship classification problem into a set of small modules that work together to solve the classification problem while preserving the functionality of the original model. The possible ways of explaining sensor returns (i.e., the evidence) for some features, such as portholes along the length of a ship, are sometimes combinatorial. Thus, using an exhaustive approach, which entails the enumeration of all possible explanations, is impractical for larger problems.
We present a network structure (referred to as Sequential Decomposition, SD) in which each observation is associated with a set of legitimate outcomes which are consistent with the explanation of each observed piece of evidence. The results show that the SD approach allows one to represent feature-observation relations in a manageable way and achieve the same explanatory power as an exhaustive approach.\nShafer's theory of belief and the Bayesian theory of probability are two alternative and mutually inconsistent approaches toward modelling uncertainty in artificial intelligence. To help reduce the conflict between these two approaches, this paper reexamines expected utility theory, from which Bayesian probability theory is derived. Expected utility theory requires the decision maker to assign a utility to each decision conditioned on every possible event that might occur. But frequently the decision maker cannot foresee all the events that might occur, i.e., one of the possible events is the occurrence of an unforeseen event. So once we acknowledge the existence of unforeseen events, we need to develop some way of assigning utilities to decisions conditioned on unforeseen events. The commonsensical solution to this problem is to assign similar utilities to events which are similar. Implementing this commonsensical solution is equivalent to replacing Bayesian subjective probabilities over the space of foreseen and unforeseen events by random set theory probabilities over the space of foreseen events. This leads to an expected utility principle in which normalized variants of Shafer's commonalities play the role of subjective probabilities. Hence allowing for unforeseen events in decision analysis causes Bayesian probability theory to become much more similar to Shaferian theory.\nThe modelling, analysis, and visualisation of dynamic geospatial phenomena has been identified as a key developmental challenge for next-generation Geographic Information Systems (GIS).
In this context, the envisaged paradigmatic extensions to contemporary foundational GIS technology raise fundamental questions concerning the ontological, formal representational, and (analytical) computational methods that would underlie their spatial information theoretic underpinnings.   We present the conceptual overview and architecture for the development of high-level semantic and qualitative analytical capabilities for dynamic geospatial domains. Building on formal methods in the areas of commonsense reasoning, qualitative reasoning, spatial and temporal representation and reasoning, reasoning about actions and change, and computational models of narrative, we identify concrete theoretical and practical challenges that accrue in the context of formal reasoning about `space, events, actions, and change'. With this as a basis, and against the backdrop of an illustrated scenario involving the spatio-temporal dynamics of urban narratives, we address specific problems and solution techniques chiefly involving `qualitative abstraction', `data integration and spatial consistency', and `practical geospatial abduction'. From a broad topical viewpoint, we propose that next-generation dynamic GIS technology demands a transdisciplinary scientific perspective that brings together Geography, Artificial Intelligence, and Cognitive Science.   Keywords: artificial intelligence; cognitive systems; human-computer interaction; geographic information systems; spatio-temporal dynamics; computational models of narrative; geospatial analysis; geospatial modelling; ontology; qualitative spatial modelling and reasoning; spatial assistance systems\nWe introduce GOTCHAs (Generating panOptic Turing Tests to Tell Computers and Humans Apart) as a way of preventing automated offline dictionary attacks against user-selected passwords. A GOTCHA is a randomized puzzle generation protocol, which involves interaction between a computer and a human.
Informally, a GOTCHA should satisfy two key properties: (1) The puzzles are easy for the human to solve. (2) The puzzles are hard for a computer to solve even if it has the random bits used by the computer to generate the final puzzle --- unlike a CAPTCHA. Our main theorem demonstrates that GOTCHAs can be used to mitigate the threat of offline dictionary attacks against passwords by ensuring that a password cracker must receive constant feedback from a human being while mounting an attack. Finally, we provide a candidate construction of GOTCHAs based on Inkblot images. Our construction relies on the usability assumption that users can recognize the phrases that they originally used to describe each Inkblot image --- a much weaker usability assumption than previous password systems based on Inkblots which required users to recall their phrase exactly. We conduct a user study to evaluate the usability of our GOTCHA construction. We also generate a GOTCHA challenge where we encourage artificial intelligence and security researchers to try to crack several passwords protected with our scheme.\nPrime implicates and prime implicants have proven relevant to a number of areas of artificial intelligence, most notably abductive reasoning and knowledge compilation. The purpose of this paper is to examine how these notions might be appropriately extended from propositional logic to the modal logic K. We begin the paper by considering a number of potential definitions of clauses and terms for K. The different definitions are evaluated with respect to a set of syntactic, semantic, and complexity-theoretic properties characteristic of the propositional definition. We then compare the definitions with respect to the properties of the notions of prime implicates and prime implicants that they induce. 
While there is no definition that perfectly generalizes the propositional notions, we show that there does exist one definition which satisfies many of the desirable properties of the propositional case. In the second half of the paper, we consider the computational properties of the selected definition. To this end, we provide sound and complete algorithms for generating and recognizing prime implicates, and we show the prime implicate recognition task to be PSPACE-complete. We also prove upper and lower bounds on the size and number of prime implicates. While the paper focuses on the logic K, all of our results hold equally well for multi-modal K and for concept expressions in the description logic ALC.\nReal-time heuristic search algorithms satisfy a constant bound on the amount of planning per action, independent of problem size. As a result, they scale up well as problems become larger. This property would make them well suited for video games where Artificial Intelligence-controlled agents must react quickly to user commands and to other agents' actions. On the downside, real-time search algorithms employ learning methods that frequently lead to poor solution quality and cause the agent to appear irrational by re-visiting the same problem states repeatedly. The situation changed recently with a new algorithm, D LRTA*, which attempted to eliminate learning by automatically selecting subgoals. D LRTA* is well poised for video games, except it has a complex and memory-demanding pre-computation phase during which it builds a database of subgoals. In this paper, we propose a simpler and more memory-efficient way of pre-computing subgoals, thereby eliminating the main obstacle to applying state-of-the-art real-time search methods in video games. The new algorithm solves a number of randomly chosen problems off-line, compresses the solutions into a series of subgoals and stores them in a database.
When presented with a novel problem on-line, it queries the database for the most similar previously solved case and uses its subgoals to solve the problem. In the domain of pathfinding on four large video game maps, the new algorithm delivers solutions eight times better while using 57 times less memory and requiring 14% less pre-computation time.\nDomain-independent planning is one of the foundational areas in the field of Artificial Intelligence. A description of a planning task consists of an initial world state, a goal, and a set of actions for modifying the world state. The objective is to find a sequence of actions, that is, a plan, that transforms the initial world state into a goal state. In optimal planning, we are interested in finding not just a plan, but one of the cheapest plans. A prominent approach to optimal planning these days is heuristic state-space search, guided by admissible heuristic functions. Numerous admissible heuristics have been developed, each with its own strengths and weaknesses, and it is well known that there is no single \"best\" heuristic for optimal planning in general. Thus, which heuristic to choose for a given planning task is a difficult question. This difficulty can be avoided by combining several heuristics, but that requires computing numerous heuristic estimates at each state, and the tradeoff between the time spent doing so and the time saved by the combined advantages of the different heuristics might be high. We present a novel method that reduces the cost of combining admissible heuristics for optimal planning, while maintaining its benefits. Using an idealized search space model, we formulate a decision rule for choosing the best heuristic to compute at each state. We then present an active online learning approach for learning a classifier with that decision rule as the target concept, and employ the learned classifier to decide which heuristic to compute at each state.
We evaluate this technique empirically, and show that it substantially outperforms the standard method for combining several heuristics via their pointwise maximum.\nThis book-length article combines several peer reviewed papers and new material to analyze the issues of ethical artificial intelligence (AI). The behavior of future AI systems can be described by mathematical equations, which are adapted to analyze possible unintended AI behaviors and ways that AI designs can avoid them. This article makes the case for utility-maximizing agents and for avoiding infinite sets in agent definitions. It shows how to avoid agent self-delusion using model-based utility functions and how to avoid agents that corrupt their reward generators (sometimes called \"perverse instantiation\") using utility functions that evaluate outcomes at one point in time from the perspective of humans at a different point in time. It argues that agents can avoid unintended instrumental actions (sometimes called \"basic AI drives\" or \"instrumental goals\") by accurately learning human values. This article defines a self-modeling agent framework and shows how it can avoid problems of resource limits, being predicted by other agents, and inconsistency between the agent's utility function and its definition (one version of this problem is sometimes called \"motivated value selection\"). This article also discusses how future AI will differ from current AI, the politics of AI, and the ultimate use of AI to help understand the nature of the universe and our place in it.\nIn this position paper, I argue that standardized tests for elementary science such as SAT or Regents tests are not very good benchmarks for measuring the progress of artificial intelligence systems in understanding basic science. The primary problem is that these tests are designed to test aspects of knowledge and ability that are challenging for people; the aspects that are challenging for AI systems are very different. 
In particular, standardized tests do not test knowledge that is obvious for people; none of this knowledge can be assumed in AI systems. Individual standardized tests also have specific features that are not necessarily appropriate for an AI benchmark. I analyze the Physics subject SAT in some detail and the New York State Regents Science test more briefly. I also argue that the apparent advantages offered by using standardized tests are mostly either minor or illusory. The one major real advantage is that the significance is easily explained to the public; but I argue that even this is a somewhat mixed blessing. I conclude by arguing that, first, more appropriate collections of exam style problems could be assembled, and second, that there are better kinds of benchmarks than exam-style problems. In an appendix I present a collection of sample exam-style problems that test kinds of knowledge missing from the standardized tests.\nUndergraduate students of artificial intelligence often struggle with representing knowledge as logical sentences. This is a skill that seems to require extensive practice to obtain, suggesting a teaching strategy that involves the assignment of numerous exercises involving the formulation of some bit of knowledge, communicated using a natural language such as English, as a sentence in some logic. The number of such exercises needed to master this skill is far too large to allow typical artificial intelligence course teaching teams to provide prompt feedback on student efforts. Thus, an automated assessment system for such exercises is needed to ensure that students receive an adequate amount of practice, with the rapid delivery of feedback allowing students to identify errors in their understanding and correct them. This paper describes an automated grading system for knowledge representation exercises using first-order logic. 
A resolution theorem prover, Prover9, is used to check if a student-submitted formula is logically equivalent to a solution provided by the instructor. This system has been used by students enrolled in undergraduate artificial intelligence classes for several years. Use of this teaching tool resulted in a statistically significant improvement on first-order logic knowledge representation questions appearing on the course final examination. This article explains how this system works, provides an analysis of changes in student learning outcomes, and explores potential enhancements of this system, including the possibility of providing rich formative feedback by replacing the resolution theorem prover with a tableaux-based method.\nThis article addresses an open problem in the area of cognitive systems and architectures: namely the problem of handling (in terms of processing and reasoning capabilities) complex knowledge structures that can be at least plausibly comparable, both in terms of size and of typology of the encoded information, to the knowledge that humans process daily for executing everyday activities. Handling a huge amount of knowledge, and selectively retrieving it according to the needs emerging in different situational scenarios, is an important aspect of human intelligence. For this task, in fact, humans adopt a wide range of heuristics (Gigerenzer and Todd) due to their bounded rationality (Simon, 1957). From this perspective, one of the requirements that should be considered for the design, realization, and evaluation of intelligent, cognitively inspired systems is their ability to heuristically identify and retrieve, from the general knowledge stored in their artificial Long Term Memory (LTM), the knowledge that is synthetically and contextually relevant. This requirement, however, is often neglected.
Currently, artificial cognitive systems and architectures are not able, de facto, to deal with complex knowledge structures that can be even slightly comparable to the knowledge heuristically managed by humans. In this paper I will argue that this is not only a technological problem but also an epistemological one, and I will briefly sketch a proposal for a possible solution.\nIn recent years, researchers in decision analysis and artificial intelligence (AI) have used Bayesian belief networks to build models of expert opinion. Using standard methods drawn from the theory of computational complexity, workers in the field have shown that the problem of probabilistic inference in belief networks is difficult and almost certainly intractable. KNET, a software environment for constructing knowledge-based systems within the axiomatic framework of decision theory, contains a randomized approximation scheme for probabilistic inference. The algorithm can, in many circumstances, perform efficient approximate inference in large and richly interconnected models of medical diagnosis. Unlike previously described stochastic algorithms for probabilistic inference, the randomized approximation scheme computes a priori bounds on running time by analyzing the structure and contents of the belief network. In this article, we describe a randomized algorithm for probabilistic inference and analyze its performance mathematically. Then, we devote the major portion of the paper to a discussion of the algorithm's empirical behavior. The results indicate that the generation of good trials (that is, trials whose distribution closely matches the true distribution), rather than the computation of numerous mediocre trials, dominates the performance of stochastic simulation.
Key words: probabilistic inference, belief networks, stochastic simulation, computational complexity theory, randomized algorithms.\nMany Artificial Intelligence systems depend on the agent's updating its beliefs about the world on the basis of experience. Experiments constitute one type of experience, so scientific methodology offers a natural environment for examining the issues attendant to using this class of evidence. This paper presents a framework which structures the process of using scientific data from research reports for the purpose of making decisions, using decision analysis as the basis for the structure and using medical research as the general scientific domain. The structure extends the basic influence diagram for updating belief in an object domain parameter of interest by expanding the parameter into four parts: those of the patient, the population, the study sample, and the effective study sample. The structure uses biases to perform the transformation of one parameter into another, so that, for instance, selection biases, in concert with the population parameter, yield the study sample parameter. The influence diagram structure provides decision theoretic justification for practices of good clinical research such as randomized assignment and blindfolding of care providers. The model covers most research designs used in medicine: case-control studies, cohort studies, and controlled clinical trials, and provides an architecture to separate clearly between statistical knowledge and domain knowledge. 
The proposed general model can be the basis for clinical epidemiological advisory systems, when coupled with heuristic pruning of irrelevant biases; of statistical workstations, when the computational machinery for calculation of posterior distributions is added; and of meta-analytic reviews, when multiple studies may impact on a single population parameter.\nThis document contains supplementary material for the paper \"Multi-objective Reinforcement Learning with Continuous Pareto Frontier Approximation\", published at the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15). The paper is about learning a continuous approximation of the Pareto frontier in Multi-Objective Markov Decision Problems (MOMDPs). We propose a policy-based approach that exploits gradient information to generate solutions close to the Pareto ones. Unlike previous policy-gradient multi-objective algorithms, where n optimization routines are used to obtain n solutions, our approach performs a single gradient-ascent run that at each step generates an improved continuous approximation of the Pareto frontier. The idea is to exploit a gradient-based approach to optimize the parameters of a function that defines a manifold in the policy parameter space so that the corresponding image in the objective space gets as close as possible to the Pareto frontier. Besides deriving how to compute and estimate such a gradient, we will also discuss the non-trivial issue of defining a metric to assess the quality of the candidate Pareto frontiers. Finally, the properties of the proposed approach are empirically evaluated on two interesting MOMDPs.\nMotivated by recent progress on pricing in the AI literature, we study marketplaces that contain multiple vendors offering identical or similar products and unit-demand buyers with different valuations on these vendors. The objective of each vendor is to set the price of its product to a fixed value so that its profit is maximized.
The profit depends on the vendor's price itself and the total volume of buyers that find the particular price more attractive than the price of the vendor's competitors. We model the behaviour of buyers and vendors as a two-stage full-information game and study a series of questions related to the existence, efficiency (price of anarchy) and computational complexity of equilibria in this game. To overcome situations where equilibria do not exist or exist but are highly inefficient, we consider the scenario where some of the vendors are subsidized in order to keep prices low and buyers highly satisfied.\nLearning models of artificial intelligence can nowadays perform very well on a large variety of tasks. However, in practice different task environments are best handled by different learning models, rather than a single, universal, approach. Most non-trivial models thus require the adjustment of several to many learning parameters, which is often done on a case-by-case basis by an external party. Meta-learning refers to the ability of an agent to autonomously and dynamically adjust its own learning parameters, or meta-parameters. In this work we show how projective simulation, a recently developed model of artificial intelligence, can naturally be extended to account for meta-learning in reinforcement learning settings. The projective simulation approach is based on a random walk process over a network of clips. The suggested meta-learning scheme builds upon the same design and employs clip networks to monitor the agent's performance and to adjust its meta-parameters \"on the fly\". We distinguish between \"reflexive adaptation\" and \"adaptation through learning\", and show the utility of both approaches. In addition, a trade-off between flexibility and learning-time is addressed. 
The extended model is examined on three different kinds of reinforcement learning tasks, in which the agent has different optimal values of the meta-parameters, and is shown to perform well, reaching near-optimal to optimal success rates in all of them, without ever needing to manually adjust any meta-parameter.\nLinguistic relations in oral conversations show how opinions are constructed and developed in a restricted time. The relations bond ideas, arguments, thoughts, and feelings, re-shape them during a speech, and finally build knowledge out of all information provided in the conversation. Speakers share a common interest to discuss. It is expected that each speaker's reply includes duplicated forms of words from previous speakers. However, linguistic adaptation is observed and evolves along a more complex path than just transferring slightly modified versions of common concepts. A conversation aiming at a benefit at the end shows emergent cooperation that induces the adaptation. Not only cooperation but also competition drives the adaptation, or the opposite scenario, and one can capture the dynamic process by tracking how the concepts are linguistically linked. To uncover salient complex dynamic events in verbal communications, we attempt to discover self-organized linguistic relations hidden in a conversation with explicitly stated winners and losers. We examine open access data of the United States Supreme Court. Our understanding is crucial in big data research to guide how transition states in opinion mining and decision-making should be modeled and how this required knowledge to guide the model should be pinpointed, by filtering large amounts of data.\nGiven recent proposals to synthesize consciousness, how many forms of conscious machines can one distinguish and on what grounds?
Based on current clinical scales of consciousness, which measure cognitive awareness and wakefulness, we take a perspective on how contemporary artificially intelligent machines and synthetically engineered life forms would measure on these scales. To do so, we argue that awareness and wakefulness can be associated with computational and autonomous complexity, respectively. Then, building on insights from cognitive robotics, we ask what function consciousness serves, and interpret it as an evolutionary game-theoretic strategy. We make the case for a third type of complexity necessary for describing consciousness, namely, social complexity. Identifying these complexity types allows us to represent both biological and synthetic systems in a common morphospace. This suggests an embodiment-based taxonomy of consciousness. In particular, we distinguish four forms of consciousness, based on embodiment: biological, synthetic, group (resulting from group interactions) and simulated consciousness (embodied by virtual agents within a simulated reality). Such a taxonomy is useful for studying comparative signatures of consciousness across domains, in order to highlight design principles necessary to engineer conscious machines. This is particularly relevant in the light of recent developments at the crossroads of neuroscience, biomedical engineering, artificial intelligence and biomimetics.\nMolecular variants of vitamin B12, siderophores and glycans occur. To take up variant forms, bacteria may express an array of receptors. The gut microbe Bacteroides thetaiotaomicron has three different receptors to take up variants of vitamin B12 and 88 receptors to take up various glycans. The design of receptor arrays reflects key processes that shape cellular evolution. Competition may focus each species on a subset of the available nutrient diversity.
Some gut bacteria can take up only a narrow range of carbohydrates, whereas species such as B.~thetaiotaomicron can digest many different complex glycans. Comparison of different nutrients, habitats, and genomes provide opportunity to test hypotheses about the breadth of receptor arrays. Another important process concerns fluctuations in nutrient availability. Such fluctuations enhance the value of cellular sensors, which gain information about environmental availability and adjust receptor deployment. Bacteria often adjust receptor expression in response to fluctuations of particular carbohydrate food sources. Some species may adjust expression of uptake receptors for specific siderophores. How do cells use sensor information to control the response to fluctuations? That question about regulatory wiring relates to problems that arise in control theory and artificial intelligence. Control theory clarifies how to analyze environmental fluctuations in relation to the design of sensors and response systems. Recent advances in deep learning studies of artificial intelligence focus on the architecture of regulatory wiring and the ways in which complex control networks represent and classify environmental states. I emphasize the similar design problems that arise in cellular evolution, control theory, and artificial intelligence. I connect those broad concepts to testable hypotheses for bacterial uptake of B12, siderophores and glycans.\nSyntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of application for which parsing has recently proven useful.   In recent years, there have been significant advances in the accuracy of parsing algorithms. 
In this article, we perform an empirical, task-oriented evaluation to determine how parsing accuracy influences the performance of a state-of-the-art rule-based sentiment analysis system that determines the polarity of sentences from their parse trees. In particular, we evaluate the system using four well-known dependency parsers, including both current models with state-of-the-art accuracy and more innacurate models which, however, require less computational resources.   The experiments show that all of the parsers produce similarly good results in the sentiment analysis task, without their accuracy having any relevant influence on the results. Since parsing is currently a task with a relatively high computational cost that varies strongly between algorithms, this suggests that sentiment analysis researchers and users should prioritize speed over accuracy when choosing a parser; and parsing researchers should investigate models that improve speed further, even at some cost to accuracy.\nThe study and understanding of human behaviour is relevant to computer science, artificial intelligence, neural computation, cognitive science, philosophy, psychology, and several other areas. Presupposing cognition as basis of behaviour, among the most prominent tools in the modelling of behaviour are computational-logic systems, connectionist models of cognition, and models of uncertainty. Recent studies in cognitive science, artificial intelligence, and psychology have produced a number of cognitive models of reasoning, learning, and language that are underpinned by computation. In addition, efforts in computer science research have led to the development of cognitive computational systems integrating machine learning and automated reasoning. Such systems have shown promise in a range of applications, including computational biology, fault diagnosis, training and assessment in simulators, and software verification. 
This joint survey reviews the personal ideas and views of several researchers on neural-symbolic learning and reasoning. The article is organised in three parts: Firstly, we frame the scope and goals of neural-symbolic computation and have a look at the theoretical foundations. We then proceed to describe the realisations of neural-symbolic computation, systems, and applications. Finally we present the challenges facing the area and avenues for further research.\nThere have been numerous breakthroughs with reinforcement learning in the recent years, perhaps most notably on Deep Reinforcement Learning successfully playing and winning relatively advanced computer games. There is undoubtedly an anticipation that Deep Reinforcement Learning will play a major role when the first AI masters the complicated game plays needed to beat a professional Real-Time Strategy game player. For this to be possible, there needs to be a game environment that targets and fosters AI research, and specifically Deep Reinforcement Learning. Some game environments already exist, however, these are either overly simplistic such as Atari 2600 or complex such as Starcraft II from Blizzard Entertainment. We propose a game environment in between Atari 2600 and Starcraft II, particularly targeting Deep Reinforcement Learning algorithm research. The environment is a variant of Tower Line Wars from Warcraft III, Blizzard Entertainment. Further, as a proof of concept that the environment can harbor Deep Reinforcement algorithms, we propose and apply a Deep Q-Reinforcement architecture. The architecture simplifies the state space so that it is applicable to Q-learning, and in turn improves performance compared to current state-of-the-art methods. 
Our experiments show that the proposed architecture can learn to play the environment well, and score 33% better than standard Deep Q-learning which in turn proves the usefulness of the game environment.\nThe hard problem in artificial intelligence asks how the shuffling of syntactical symbols in a program can lead to systems which experience semantics and qualia. We address this question in three stages. First, we introduce a new class of human semantic symbols which appears when unexpected and drastic environmental change causes humans to become surprised, confused, uncertain, and in extreme cases, unresponsive, passive and dysfunctional. For this class of symbols, pre-learned programs become inoperative so these syntactical programs cannot be the source of experienced qualia. Second, we model the dysfunctional human response to a radically changed environment as being the natural response of any learning machine facing novel inputs from well outside its previous training set. In this situation, learning machines are unable to extract information from their input and will typically enter a dynamical state characterized by null outputs and a lack of response. This state immediately predicts and explains the characteristics of the semantic experiences of humans in similar circumstances. In the third stage, we consider learning machines trained to implement multiple functions in simple sequential programs using environmental data to specify subroutine names, control flow instructions, memory calls, and so on. Drastic change in any of these environmental inputs can again lead to inoperative programs. By examining changes specific to people or locations we can model human cognitive symbols featuring these dependencies, such as attachment and grief. Our approach links known dynamical machines states with human qualia and thus offers new insight into the hard problem of artificial intelligence.\nAll artificial Intelligence (AI) systems make errors. 
These errors are unexpected, and differ often from the typical human mistakes (\"non-human\" errors). The AI errors should be corrected without damage of existing skills and, hopefully, avoiding direct human expertise. This paper presents an initial summary report of project taking new and systematic approach to improving the intellectual effectiveness of the individual AI by communities of AIs. We combine some ideas of learning in heterogeneous multiagent systems with new and original mathematical approaches for non-iterative corrections of errors of legacy AI systems. The mathematical foundations of AI non-destructive correction are presented and a series of new stochastic separation theorems is proven. These theorems provide a new instrument for the development, analysis, and assessment of machine learning methods and algorithms in high dimension. They demonstrate that in high dimensions and even for exponentially large samples, linear classifiers in their classical Fisher's form are powerful enough to separate errors from correct responses with high probability and to provide efficient solution to the non-destructive corrector problem. In particular, we prove some hypotheses formulated in our paper `Stochastic Separation Theorems' (Neural Networks, 94, 255--259, 2017), and answer one general problem published by Donoho and Tanner in 2009.\nDespite significant advances in artificial intelligence (AI) for computer vision, its application in medical imaging has been limited by the burden and limits of expert-generated labels. We used images from optical coherence tomography angiography (OCTA), a relatively new imaging modality that measures perfusion of the retinal vasculature, to train an AI algorithm to generate vasculature maps from standard structural optical coherence tomography (OCT) images of the same retinae, both exceeding the ability and bypassing the need for expert labeling. 
Deep learning was able to infer perfusion of microvasculature from structural OCT images with similar fidelity to OCTA and significantly better than expert clinicians (P < 0.00001). OCTA suffers from need of specialized hardware, laborious acquisition protocols, and motion artifacts; whereas our model works directly from standard OCT which are ubiquitous and quick to obtain, and allows unlocking of large volumes of previously collected standard OCT data both in existing clinical trials and clinical practice. This finding demonstrates a novel application of AI to medical imaging, whereby subtle regularities between different modalities are used to image the same body part and AI is used to generate detailed and accurate inferences of tissue function from structure imaging.\nThis article presents an extensive literature review of technology based intervention methodologies for individuals facing Autism Spectrum Disorder (ASD). Reviewed methodologies include: contemporary Computer Aided Systems (CAS), Computer Vision Assisted Technologies (CVAT) and Virtual Reality (VR) or Artificial Intelligence-Assisted interventions. The research over the past decade has provided enough demonstrations that individuals of ASD have a strong interest in technology based interventions and can connect with them for longer durations without facing any trouble(s). Theses technology based interventions are useful for individuals facing autism in clinical settings as well as at home and classrooms.   Despite showing great promise, research in developing an advanced technology based intervention that is clinically quantitative for ASD is minimal. Moreover, the clinicians are generally not convinced about the potential of the technology based interventions due to non-empirical nature of published results. A major reason behind this non-acceptability is a vast majority of studies on distinct intervention methodologies do not follow any specific standard or research design. 
Consequently, the data produced by these studies is minimally appealing to the clinical community.   This research domain has a vast social impact as per official statistics given by the Autism Society of America, autism is the fastest growing developmental disability in the United States (US). The estimated annual cost in the US for diagnosis and treatment for ASD is 236-262 Billion US Dollars. The cost of up-bringing an ASD individual is estimated to be 1.4 million USD while statistics show 1% of the worlds' total population is suffering from ASD.\nSelf-organizing complex systems typically are comprised of a large number of frequently similar components or events. Through their process, a pattern at the global-level of a system emerges solely from numerous interactions among the lower-level components of the system. Moreover, the rules specifying interactions among the system's components are executed using only local information, without reference to the global pattern, which, as in many real-world problems is not easily accessible or possible to be found. Stigmergy, a kind of indirect communication and learning by the environment found in social insects is a well know example of self-organization, providing not only vital clues in order to understand how the components can interact to produce a complex pattern, as can pinpoint simple biological non-linear rules and methods to achieve improved artificial intelligent adaptive categorization systems, critical for Data-Mining. On the present work it is our intention to show that a new type of Data-Mining can be designed based on Stigmergic paradigms, taking profit of several natural features of this phenomenon. By hybridizing bio-inspired Swarm Intelligence with Evolutionary Computation we seek for an entire distributed, adaptive, collective and cooperative self-organized Data-Mining. As a real-world, real-time test bed for our proposal, World-Wide-Web Mining will be used. 
Having that purpose in mind, Web usage Data was collected from the Monash University's Web site (Australia), with over 7 million hits every week. Results are compared to other recent systems, showing that the system presented is by far promising.\nThe explosive growth of spatial data and extensive utilization of spatial databases emphasize the necessity for the automated discovery of spatial knowledge. In modern times, spatial data mining has emerged as an area of voluminous research. Forest fires are a chief environmental concern, causing economical and ecological damage while endangering human lives across the world. The fast or early detection of forest fires is a vital element for controlling such phenomenon. The application of remote sensing is at present a significant method for forest fires monitoring, particularly in vast and remote areas. Different methods have been presented by researchers for forest fire detection. The motivation behind this research is to obtain beneficial information from images in the forest spatial data and use the same in the determination of regions at the risk of fires by utilizing Image Processing and Artificial Intelligence techniques. This paper presents an intelligent system to detect the presence of forest fires in the forest spatial data using Artificial Neural Networks. The digital images in the forest spatial data are converted from RGB to XYZ color space and then segmented by employing anisotropic diffusion to identify the fire regions. Subsequently, Radial Basis Function Neural Network is employed in the design of the intelligent system, which is trained with the color space values of the segmented fire regions. 
Extensive experimental assessments on publicly available spatial data illustrated the efficiency of the proposed system in effectively detecting forest fires.\nThe main purpose of this article is to describe potential benefits and applications of the SP theory, a unique attempt to simplify and integrate ideas across artificial intelligence, mainstream computing and human cognition, with information compression as a unifying theme. The theory, including a concept of multiple alignment, combines conceptual simplicity with descriptive and explanatory power in several areas including representation of knowledge, natural language processing, pattern recognition, several kinds of reasoning, the storage and retrieval of information, planning and problem solving, unsupervised learning, information compression, and human perception and cognition. In the SP machine -- an expression of the SP theory which is currently realised in the form of computer models -- there is potential for an overall simplification of computing systems, including software. As a theory with a broad base of support, the SP theory promises useful insights in many areas and the integration of structures and functions, both within a given area and amongst different areas. There are potential benefits in natural language processing (with potential for the understanding and translation of natural languages), the need for a versatile intelligence in autonomous robots, computer vision, intelligent databases, maintaining multiple versions of documents or web pages, software engineering, criminal investigations, the management of big data and gaining benefits from it, the semantic web, medical diagnosis, the detection of computer viruses, the economical transmission of data, and data fusion. Further development of these ideas would be facilitated by the creation of a high-parallel, web-based, open-source version of the SP machine, with a good user interface. 
This would provide a means for researchers to explore what can be done with the system and to refine it.\nIn current perception systems applied to the rebuilding of the environment for intelligent vehicles, the part reserved to object association for the tracking is increasingly significant. This allows firstly to follow the objects temporal evolution and secondly to increase the reliability of environment perception. We propose in this communication the development of a multi-objects association algorithm with ambiguity removal entering into the design of such a dynamic perception system for intelligent vehicles. This algorithm uses the belief theory and data modelling with fuzzy mathematics in order to be able to handle inaccurate as well as uncertain information due to imperfect sensors. These theories also allow the fusion of numerical as well as symbolic data. We develop in this article the problem of matching between known and perceived objects. This makes it possible to update a dynamic environment map for a vehicle. The belief theory will enable us to quantify the belief in the association of each perceived object with each known object. Conflicts can appear in the case of object appearance or disappearance, or in the case of a confused situation or bad perception. These conflicts are removed or solved using an assignment algorithm, giving a solution called the \" best \" and so ensuring the tracking of some objects present in our environment.\nDuring the last two decades there has been a growing interest in Particle Filtering (PF). However, PF suffers from two long-standing problems that are referred to as sample degeneracy and impoverishment. We are investigating methods that are particularly efficient at Particle Distribution Optimization (PDO) to fight sample degeneracy and impoverishment, with an emphasis on intelligence choices. 
These methods benefit from such methods as Markov Chain Monte Carlo methods, Mean-shift algorithms, artificial intelligence algorithms (e.g., Particle Swarm Optimization, Genetic Algorithm and Ant Colony Optimization), machine learning approaches (e.g., clustering, splitting and merging) and their hybrids, forming a coherent standpoint to enhance the particle filter. The working mechanism, interrelationship, pros and cons of these approaches are provided. In addition, Approaches that are effective for dealing with high-dimensionality are reviewed. While improving the filter performance in terms of accuracy, robustness and convergence, it is noted that advanced techniques employed in PF often causes additional computational requirement that will in turn sacrifice improvement obtained in real life filtering. This fact, hidden in pure simulations, deserves the attention of the users and designers of new filters.\nThe majority of big data is unstructured and of this majority the largest chunk is text. While data mining techniques are well developed and standardized for structured, numerical data, the realm of unstructured data is still largely unexplored. The general focus lies on information extraction, which attempts to retrieve known information from text. The Holy Grail, however is knowledge discovery, where machines are expected to unearth entirely new facts and relations that were not previously known by any human expert. Indeed, understanding the meaning of text is often considered as one of the main characteristics of human intelligence. The ultimate goal of semantic artificial intelligence is to devise software that can understand the meaning of free text, at least in the practical sense of providing new, actionable information condensed out of a body of documents. 
As a stepping stone on the road to this vision I will introduce a totally new approach to drug research, namely that of identifying relevant information by employing a self-organizing semantic engine to text mine large repositories of biomedical research papers, a technique pioneered by Merck with the InfoCodex software. I will describe the methodology and a first successful experiment for the discovery of new biomarkers and phenotypes for diabetes and obesity on the basis of PubMed abstracts, public clinical trials and Merck internal documents. The reported approach shows much promise and has potential to impact fundamentally pharmaceutical research as a way to shorten time-to-market of novel drugs, and for early recognition of dead ends.\nRemote science operations require automated systems that can both act and react with minimal human intervention. One such vision is that of an intelligent instrument that collects data in an automated fashion, and based on what it learns, decides which new measurements to take. This innovation implements experimental design and unites it with data analysis in such a way that it completes the cycle of learning. This cycle is the basis of the Scientific Method.   The three basic steps of this cycle are hypothesis generation, inquiry, and inference. Hypothesis generation is implemented by artificially supplying the instrument with a parameterized set of possible hypotheses that might be used to describe the physical system. The act of inquiry is handled by an inquiry engine that relies on Bayesian adaptive exploration where the optimal experiment is chosen as the one which maximizes the expected information gain. The inference engine is implemented using the nested sampling algorithm, which provides the inquiry engine with a set of posterior samples from which the expected information gain can be estimated. 
With these computational structures in place, the instrument will refine its hypotheses, and repeat the learning cycle by taking measurements until the system under study is described within a pre-specified tolerance. We will demonstrate our first attempts toward achieving this goal with an intelligent instrument constructed using the LEGO MINDSTORMS NXT robotics platform.\nA typical modern optimization technique is usually either heuristic or metaheuristic. This technique has managed to solve some optimization problems in the research area of science, engineering, and industry. However, implementation strategy of metaheuristic for accuracy improvement on convolution neural networks (CNN), a famous deep learning method, is still rarely investigated. Deep learning relates to a type of machine learning technique, where its aim is to move closer to the goal of artificial intelligence of creating a machine that could successfully perform any intellectual tasks that can be carried out by a human. In this paper, we propose the implementation strategy of three popular metaheuristic approaches, that is, simulated annealing, differential evolution, and harmony search, to optimize CNN. The performances of these metaheuristic methods in optimizing CNN on classifying MNIST and CIFAR dataset were evaluated and compared. Furthermore, the proposed methods are also compared with the original CNN. Although the proposed methods show an increase in the computation time, their accuracy has also been improved (up to 7.14 percent).\nThe next generation wireless networks (i.e. 5G and beyond), which would be extremely dynamic and complex due to the ultra-dense deployment of heterogeneous networks (HetNets), poses many critical challenges for network planning, operation, management and troubleshooting. 
At the same time, generation and consumption of wireless data are becoming increasingly distributed with ongoing paradigm shift from people-centric to machine-oriented communications, making the operation of future wireless networks even more complex. In mitigating the complexity of future network operation, new approaches of intelligently utilizing distributed computational resources with improved context-awareness becomes extremely important. In this regard, the emerging fog (edge) computing architecture aiming to distribute computing, storage, control, communication, and networking functions closer to end users, have a great potential for enabling efficient operation of future wireless networks. These promising architectures make the adoption of artificial intelligence (AI) principles which incorporate learning, reasoning and decision-making mechanism, as natural choices for designing a tightly integrated network. Towards this end, this article provides a comprehensive survey on the utilization of AI integrating machine learning, data analytics and natural language processing (NLP) techniques for enhancing the efficiency of wireless network operation. In particular, we provide comprehensive discussion on the utilization of these techniques for efficient data acquisition, knowledge discovery, network planning, operation and management of the next generation wireless networks. A brief case study utilizing the AI techniques for this network has also been provided.\nWe present the UK Robotics and Artificial Intelligence Hub for Offshore Robotics for Certification of Assets (ORCA Hub), a 3.5 year EPSRC funded, multi-site project. The ORCA Hub vision is to use teams of robots and autonomous intelligent systems (AIS) to work on offshore energy platforms to enable cheaper, safer and more efficient working practices. 
The ORCA Hub will research, integrate, validate and deploy remote AIS solutions that can operate with existing and future offshore energy assets and sensors, interacting safely in autonomous or semi-autonomous modes in complex and cluttered environments, co-operating with remote operators. The goal is that through the use of such robotic systems offshore, the need for personnel will decrease. To enable this to happen, the remote operator will need a high level of situation awareness and key to this is the transparency of what the autonomous systems are doing and why. This increased transparency will facilitate a trusting relationship, which is particularly key in high-stakes, hazardous situations.\nWe show that several constraint propagation algorithms (also called (local) consistency, consistency enforcing, Waltz, filtering or narrowing algorithms) are instances of algorithms that deal with chaotic iteration. To this end we propose a simple abstract framework that allows us to classify and compare these algorithms and to establish in a uniform way their basic properties.\nThe assumptions needed to prove Cox's Theorem are discussed and examined. Various sets of assumptions under which a Cox-style theorem can be proved are provided, although all are rather strong and, arguably, not natural.\nThe goal of this paper is to extend classical logic with a generalized notion of inductive definition supporting positive and negative induction, to investigate the properties of this logic, its relationships to other logics in the area of non-monotonic reasoning, logic programming and deductive databases, and to show its application for knowledge representation by giving a typology of definitional knowledge.\nIn this paper, we outline the prototype of an automated inference tool, called QUIP, which provides a uniform implementation for several nonmonotonic reasoning formalisms. 
The theoretical basis of QUIP is derived from well-known results about the computational complexity of nonmonotonic logics and exploits a representation of the different reasoning tasks in terms of quantified boolean formulae.\nWe construct a probabilistic coherence measure for information sets which determines a partial coherence ordering. This measure is applied in constructing a criterion for expanding our beliefs in the face of new information. A number of idealizations are being made which can be relaxed by an appeal to Bayesian Networks.\nThe paper reports on first preliminary results and insights gained in a project aiming at implementing the fluent calculus using methods and techniques based on binary decision diagrams. After reporting on an initial experiment showing promising results we discuss our findings concerning various techniques and heuristics used to speed up the reasoning process.\nThe current document contains a brief description of a system for Reasoning about Actions and Change called PAL (Pertinence Action Language) which makes use of several reasoning properties extracted from a Temporal Expert Systems tool called Medtool.\nIn an earlier work, we have presented operations of belief change which only affect the relevant part of a belief base. In this paper, we propose the application of the same strategy to the problem of model-based diangosis. We first isolate the subset of the system description which is relevant for a given observation and then solve the diagnosis problem for this subset.\nXNMR is a system designed to explore the results of combining the well-founded semantics system XSB with the stable-models evaluator SMODELS. Its main goal is to work as a tool for fast and interactive exploration of knowledge bases.\nIn this paper we present a rule based formalism for filtering variables domains of constraints. This formalism is well adapted for solving dynamic CSP. 
We take diagnosis as an instance problem to illustrate the use of these rules. A diagnosis problem is seen like finding all the minimal sets of constraints to be relaxed in the constraint network that models the device to be diagnosed\nWe study the topological models of a logic of knowledge for topological reasoning, introduced by Larry Moss and Rohit Parikh. Among our results is a solution of a conjecture by the formentioned authors, finite satisfiability property and decidability for the theory of topological models.\nIn this thesis we shall present two logical systems, MP and MP, for the purpose of reasoning about knowledge and effort. These logical systems will be interpreted in a spatial context and therefore, the abstract concepts of knowledge and effort will be defined by concrete mathematical concepts.\nIn fuzzy propositional logic, to a proposition a partial truth in [0,1] is assigned. It is well known that under certain circumstances, fuzzy logic collapses to classical logic. In this paper, we will show that under dual conditions, fuzzy logic collapses to four-valued (relevance) logic, where propositions have truth-value true, false, unknown, or contradiction. As a consequence, fuzzy entailment may be considered as ``in between'' four-valued (relevance) entailment and classical entailment.\nThe constraint of difference is known to the constraint programming community since Lauriere introduced Alice in 1978. Since then, several solving strategies have been designed for this constraint. In this paper we give both a practical overview and an abstract comparison of these different strategies.\nWe make a connection between classical polytopes called zonotopes and Support Vector Machine (SVM) classifiers. We combine this connection with the ellipsoid method to give some new theoretical results on training SVMs. 
We also describe some special properties of soft margin C-SVMs as parameter C goes to infinity.\nThe paper outlines ongoing research on logic-based tools for the analysis and representation of legal contracts of the kind frequently encountered in large-scale engineering projects and complex, long-term trading agreements. We consider both contract formation and contract performance, in each case identifying the representational issues and the prospects for providing automated support tools.\nThe Expansion property considered by researchers in Social Choice is shown to correspond to a logical property of nonmonotonic consequence relations that is the {\\em pure}, i.e., not involving connectives, version of a previously known weak rationality condition. The assumption that the union of two definable sets of models is definable is needed for the soundness part of the result.\nInformation personalization is fertile ground for application of AI techniques. In this article I relate personalization to the ability to capture partial information in an information-seeking interaction. The specific focus is on personalizing interactions at web sites. Using ideas from partial evaluation and explanation-based generalization, I present a modeling methodology for reasoning about personalization. This approach helps identify seven tiers of `personable traits' in web sites.\nWe address a general representation problem for belief change, and describe two interrelated representations for iterative non-prioritized change: a logical representation in terms of persistent epistemic states, and a constructive representation in terms of flocks of bases.\nAn extension of an abstract argumentation framework, called collective argumentation, is introduced in which the attack relation is defined directly among sets of arguments. The extension turns out to be suitable, in particular, for representing semantics of disjunctive logic programs. 
Two special kinds of collective argumentation are considered in which the opponents can share their arguments.\nIt has long been an open question whether the formula XCB = EpEEEpqErqr is, with the rules of substitution and detachment, a single axiom for the classical equivalential calculus. This paper answers that question affirmatively, thus completing a search for all such eleven-symbol single axioms that began seventy years ago.\nThis paper describes a novel approach to unsupervised learning that has been developed within a framework of \"information compression by multiple alignment, unification and search\" (ICMAUS), designed to integrate learning with other AI functions such as parsing and production of language, fuzzy pattern recognition, probabilistic and exact forms of reasoning, and others.\nWe give a brief introduction to the AIXI model, which unifies and overcomes the limitations of sequential decision theory and universal Solomonoff induction. While the former theory is suited for active agents in known environments, the latter is suited for passive prediction of unknown environments.\nAn experimental server for stock trading autonomous agents is presented and made available, together with an agent shell for swift development. The server, written in Java, was implemented as proof-of-concept for an agent trade server for a real financial exchange.\nDiversity is an important aspect of highly efficient multi-agent teams. We introduce the main factors that drive a multi-agent system in either direction along the diversity scale. A metric for diversity is described, and we speculate on the concept of transient diversity. Finally, an experiment on social entropy using a RoboCup simulated soccer team is presented.\nWe describe WSAT(cc), a local-search solver for computing models of theories in the language of propositional logic extended by cardinality atoms. 
WSAT(cc) is a processing back-end for the logic PS+, a recently proposed formalism for answer-set programming.\nMany different rules for decision making have been introduced in the literature. We show that a notion of generalized expected utility proposed in Part I of this paper is a universal decision rule, in the sense that it can represent essentially all other decision rules.\nSearle's Chinese Room argument is refuted by showing that he has actually given two different versions of the room, which fail for different reasons. Hence, Searle does not achieve his stated goal of showing ``that a system could have input and output capabilities that duplicated those of a native Chinese speaker and still not understand Chinese''.\nReiter's original definition of default logic allows for the application of a default that contradicts a previously applied one. We call failure this condition. The possibility of generating failures has been in the past considered as a semantical problem, and variants have been proposed to solve it. We show that it is instead a computational feature that is needed to encode some domains into default logic.\nThis document describes syntax, semantics and implementation guidelines in order to enrich the DLV system with the possibility to make external C function calls. This feature is realized by the introduction of parametric external predicates, whose extension is not specified through a logic program but implicitly computed through external code.\nDefeasible logic is a rule-based nonmonotonic logic, with both strict and defeasible rules, and a priority relation on rules. We show that inference in the propositional form of the logic can be performed in linear time. This contrasts markedly with most other propositional nonmonotonic logics, in which inference is intractable.\nGeneralized evolutionary algorithm based on Tsallis canonical distribution is proposed. 
The algorithm uses Tsallis generalized canonical distribution to weigh the configurations for `selection' instead of Gibbs-Boltzmann distribution. Our simulation results show that for an appropriate choice of non-extensive index that is offered by Tsallis statistics, evolutionary algorithms based on this generalization outperform algorithms based on Gibbs-Boltzmann distribution.\nWe consider the well-known family ALC(D) of description logics with a concrete domain, and provide first results on a framework obtained by augmenting ALC(D) atemporal roles and aspatial concrete domain with temporal roles and a spatial concrete domain.\nWe define a quantitative constraint language subsuming two calculi well-known in QSR (Qualitative Spatial Reasoning): Frank's cone-shaped and projection-based calculi of cardinal direction relations. We show how to solve a CSP (Constraint Satisfaction Problem) expressed in the language.\nThis paper reports about experiments with GermaNet as a resource within domain specific document analysis. The main question to be answered is: How is the coverage of GermaNet in a specific domain? We report about results of a field test of GermaNet for analyses of autopsy protocols and present a sketch about the integration of GermaNet inside XDOC. Our remarks will contribute to a GermaNet user's wish list.\nThe aim of the project presented in this paper is to design a system for an NLG architecture, which supports the documentation process of eBusiness models. A major task is to enrich the formal description of an eBusiness model with additional information needed in an NLG task.\nThe optimal complexity of neural networks is achieved when the self-organization principles is used to eliminate the contradictions existing in accordance with the K. Godel theorem about incompleteness of the systems based on axiomatics. The principle of S. Beer exterior addition the Heuristic Group Method of Data Handling by A. 
Ivakhnenko realized is used.\nThis paper describes and evaluates the Metalinguistic Operation Processor (MOP) system for automatic compilation of metalinguistic information from technical and scientific documents. This system is designed to extract non-standard terminological resources that we have called Metalinguistic Information Databases (or MIDs), in order to help update changing glossaries, knowledge bases and ontologies, as well as to reflect the metastable dynamics of special-domain knowledge.\nTo the reduct problems of decision system, the paper proposes the notion of dynamic core according to the dynamic reduct model. It describes various formal definitions of dynamic core, and discusses some properties about dynamic core. All of these show that dynamic core possesses the essential characters of the feature core.\nWe study and compare the learning dynamics of two universal learning algorithms, one based on Bayesian learning and the other on prediction with expert advice. Both approaches have strong asymptotic performance guarantees. When confronted with the task of finding good long-term strategies in repeated 2x2 matrix games, they behave quite differently.\nEvolutionary computing (EC) is an exciting development in Computer Science. It amounts to building, applying and studying algorithms based on the Darwinian principles of natural selection. In this paper we briefly introduce the main concepts behind evolutionary computing. 
We present the main components all evolutionary algorithms (EA), sketch the differences between different types of EAs and survey application areas ranging from optimization, modeling and simulation to entertainment.\nTwo example of Evolutionary System Identification are presented to highlight the importance of incorporating Domain Knowledge: the discovery of an analytical indentation law in Structural Mechanics using constrained Genetic Programming, and the identification of the repartition of underground velocities in Seismic Prospection. Critical issues for sucessful ESI are discussed in the light of these results.\nIn this short paper we present a linear constraint solver for the UniCalc system, an environment for reliable solution of mathematical modeling problems.\nIn this paper we propose a new family of Belief Conditioning Rules (BCRs) for belief revision. These rules are not directly related with the fusion of several sources of evidence but with the revision of a belief assignment available at a given time according to the new truth (i.e. conditioning constraint) one has about the space of solutions of the problem.\nA modification of OWL-S regarding parameter description is proposed. It is strictly based on Description Logic. In addition to class description of parameters it also allows the modelling of relations between parameters and the precise description of the size of data to be supplied to a service. In particular, it solves two major issues identified within current proposals for a Semantic Web Service annotation standard.\nThe aim of this paper is to provide a sound framework for addressing a difficult problem: the automatic construction of an autonomous agent's modular architecture. We combine results from two apparently uncorrelated domains: Autonomous planning through Markov Decision Processes and a General Data Clustering Approach using a kernel-like method. 
Our fundamental idea is that the former is a good framework for addressing autonomy whereas the latter allows to tackle self-organizing problems.\nWe develop a system which must be able to perform the same inferences that a human reader of an accident report can do and more particularly to determine the apparent causes of the accident. We describe the general framework in which we are situated, linguistic and semantic levels of the analysis and the inference rules used by the system.\nKanerva's Binary Spatter Codes are reformulated in terms of geometric algebra. The key ingredient of the construction is the representation of XOR binding in terms of geometric product.\nCreation procedure of associative patterns ensemble in terms of formal logic with using neural net-work (NN) model is formulated. It is shown that the associative patterns set is created by means of unique procedure of NN work which having individual parameters of entrance stimulus transformation. It is ascer-tained that the quantity of the selected associative patterns possesses is a constant.\nIn this paper, the traditional k-modes clustering algorithm is extended by weighting attribute value matches in dissimilarity computation. The use of attribute value weighting technique makes it possible to generate clusters with stronger intra-similarities, and therefore achieve better clustering performance. Experimental results on real life datasets show that these value weighting based k-modes algorithms are superior to the standard k-modes algorithm with respect to clustering accuracy.\nIn this paper, we determine the complexity of the satisfiability problem for various logics obtained by adding numerical quantifiers, and other constructions, to the traditional syllogistic. In addition, we demonstrate the incompleteness of some recently proposed proof-systems for these logics.\nThe problem of matchmaking in electronic social networks is formulated as an optimization problem. 
In particular, a function measuring the matching degree of fields of interest of a search profile with those of an advertising profile is proposed.\nWe try a conceptual analysis of inheritance diagrams, first in abstract terms, and then compare to \"normality\" and the \"small/big sets\" of preferential and related reasoning. The main ideas are about nodes as truth values and information sources, truth comparison by paths, accessibility or relevance of information by paths, relative normality, and prototypical reasoning.\nOne of the outstanding problems of philosophy of science and mathematics today is whether there is just \"one\" unique mathematics or the same can be bifurcated into \"pure\" and \"applied\" categories. A novel solution for this problem is offered here. This will allow us to appreciate the manner in which mathematics acts as an exact and precise language of nature. This has significant implications for Artificial Intelligence.\nGoedel's Incompleteness Theorems have the same scientific status as Einstein's principle of relativity, Heisenberg's uncertainty principle, and Watson and Crick's double helix model of DNA. Our aim is to discuss some new faces of the incompleteness phenomenon unveiled by an information-theoretic approach to randomness and recent developments in quantum computing.\nThis paper has been withdrawn by the author. This draft is withdrawn for its poor quality in english, unfortunately produced by the author when he was just starting his science route. Look at the ICML version instead: http://icml2008.cs.helsinki.fi/papers/111.pdf\nWhen will the Internet become aware of itself? In this note the problem is approached by asking an alternative question: Can the Internet cope with stress? 
By extrapolating the psychological difference between coping and defense mechanisms a distributed software experiment is outlined which could reject the hypothesis that the Internet is not a conscious entity.\nWe argue for a compositional semantics grounded in a strongly typed ontology that reflects our commonsense view of the world and the way we talk about it in ordinary language. Assuming the existence of such a structure, we show that the semantics of various natural language phenomena may become nearly trivial.\nThis paper provides a new, decidable definition of the higher- order recursive path ordering in which type comparisons are made only when needed, therefore eliminating the need for the computability clo- sure, and bound variables are handled explicitly, making it possible to handle recursors for arbitrary strictly positive inductive types.\nWe present an algorithm for effectively generating binary sequences which would be rated by people as highly likely to have been generated by a random process, such as flipping a fair coin.\nThis theoretical work defines the measure of autocorrelation of evolvability in the context of neutral fitness landscape. This measure has been studied on the classical MAX-SAT problem. This work highlight a new characteristic of neutral fitness landscapes which allows to design new adapted metaheuristic.\nThe mnesor theory is the adaptation of vectors to artificial intelligence. The scalar field is replaced by a lattice. Addition becomes idempotent and multiplication is interpreted as a selection operation. We also show that mnesors can be the foundation for a linear calculus.\nIn this article the algorithm for transformation of logic functions which are given by truth tables is considered. 
The suggested algorithm allows the transformation of many-valued logic functions with the required number of variables and can be looked in this sense as universal.\nThe treatment of both aleatory and epistemic uncertainty by recent methods often requires an high computational effort. In this abstract, we propose a numerical sampling method allowing to lighten the computational burden of treating the information by means of so-called fuzzy random variables.\nThis paper addresses the question of which models fit with information concerning the preferences of the decision maker over each attribute, and his preferences about aggregation of criteria (interacting criteria). We show that the conditions induced by these information plus some intuitive conditions lead to a unique possible aggregation operator: the Choquet integral.\nThe data-complexity of both satisfiability and finite satisfiability for the two-variable fragment with counting is NP-complete; the data-complexity of both query-answering and finite query-answering for the two-variable guarded fragment with counting is co-NP-complete.\nWe discuss metacognitive modelling as an enhancement to cognitive modelling and computing. Metacognitive control mechanisms should enable AI systems to self-reflect, reason about their actions, and to adapt to new situations. In this respect, we propose implementation details of a knowledge taxonomy and an augmented data mining life cycle which supports a live integration of obtained models.\nThis report describes experimental results for a set of benchmarks on program verification. 
It compares the capabilities of CPBVP \"Constraint Programming framework for Bounded Program Verification\" [4] with the following frameworks: ESC/Java, CBMC, Blast, EUREKA and Why.\nThis paper generalizes the traditional statistical concept of prediction intervals for arbitrary probability density functions in high-dimensional feature spaces by introducing significance level distributions, which provides interval-independent probabilities for continuous random variables. The advantage of the transformation of a probability density function into a significance level distribution is that it enables one-class classification or outlier detection in a direct manner.\nWe present a domain-independent algorithm that computes macros in a novel way. Our algorithm computes macros \"on-the-fly\" for a given set of states and does not require previously learned or inferred information, nor prior domain knowledge. The algorithm is used to define new domain-independent tractable classes of classical planning that are proved to include \\emph{Blocksworld-arm} and \\emph{Towers of Hanoi}.\nThe effects of policy sharing between agents in a multi-agent dynamical system has not been studied extensively. I simulate a system of agents optimizing the same task using reinforcement learning, to study the effects of different population densities and policy sharing. I demonstrate that sharing policies decreases the time to reach asymptotic behavior, and results in improved asymptotic behavior.\nWe present a Monte Carlo algorithm for efficiently finding near optimal moves and bids in the game of Bidding Hex. The algorithm is based on the recent solution of Random-Turn Hex by Peres, Schramm, Sheffield, and Wilson together with Richman's work connecting random-turn games to bidding games.\nIn this paper we present the N-norms/N-conorms in neutrosophic logic and set as extensions of T-norms/T-conorms in fuzzy logic and set. 
Also, as an extension of the Intuitionistic Fuzzy Topology we present the Neutrosophic Topologies.\nWe propose a new extended format to represent constraint networks using XML. This format allows us to represent constraints defined either in extension or in intension. It also allows us to reference global constraints. Any instance of the problems CSP (Constraint Satisfaction Problem), QCSP (Quantified CSP) and WCSP (Weighted CSP) can be represented using this format.\nWe present a convenient notation for positive/negative-conditional equations. The idea is to merge rules specifying the same function by using case-, if-, match-, and let-expressions. Based on the presented macro-rule-construct, positive/negative-conditional equational specifications can be written on a higher level. A rewrite system translates the macro-rule-constructs into positive/negative-conditional equations.\nWe propose a new family of constraints which combine together lexicographical ordering constraints for symmetry breaking with other common global constraints. We give a general purpose propagator for this family of constraints, and show how to improve its complexity by exploiting properties of the included global constraints.\nMany reinforcement learning exploration techniques are overly optimistic and try to explore every state. Such exploration is impossible in environments with the unlimited number of states. I propose to use simulated exploration with an optimistic model to discover promising paths for real exploration. This reduces the needs for the real exploration.\nWe describe a variant of resolution rule of proof and show that it is complete for stable semantics of logic programs. We show applications of this result.\nWe discuss the role of random basis function approximators in modeling and control. 
We analyze the published work on random basis function approximators and demonstrate that their favorable error rate of convergence O(1/n) is guaranteed only with very substantial computational resources. We also discuss implications of our analysis for applications of neural networks in modeling and control.\nWe present a novel approach for learning nonlinear dynamic models, which leads to a new set of tools capable of solving problems that are otherwise difficult. We provide theory showing this new approach is consistent for models with long range structure, and apply the approach to motion capture and high-dimensional video data, yielding results superior to standard alternatives.\nIn a case study we investigate whether off the shelf higher-order theorem provers and model generators can be employed to automate reasoning in and about quantified multimodal logics. In our experiments we exploit the new TPTP infrastructure for classical higher-order logic.\nCould we define I? Throughout this article we give a negative answer to this question. More exactly, we show that there is no definition for I in a certain way. But this negative answer depends on our definition of definability. Here, we try to consider sufficient generalized definition of definability. In the middle of paper a paradox will arise which makes us to modify the way we use the concept of property and definability.\nI discuss (ontologies_and_ontological_knowledge_bases / formal_methods_and_theories) duality and its category theory extensions as a step toward a solution to Knowledge-Based Systems Theory. In particular I focus on the example of the design of elements of ontologies and ontological knowledge bases of next three electronic courses: Foundations of Research Activities, Virtual Modeling of Complex Systems and Introduction to String Theory.\nMnesors are defined as elements of a semimodule over the min-plus integers. 
This two-sorted structure is able to merge graduation properties of vectors and idempotent properties of boolean numbers, which makes it appropriate for hybrid systems. We apply it to the control of an inverted pendulum and design a full logical controller, that is, without the usual algebra of real numbers.\nIn this paper we introduce two new DSm fusion conditioning rules with example, and as a generalization of them a class of DSm fusion conditioning rules, and then extend them to a class of DSm conditioning rules.\nTo capture the uncertainty of information or knowledge in information systems, various information granulations, also known as knowledge granulations, have been proposed. Recently, several axiomatic definitions of information granulation have been introduced. In this paper, we try to improve these axiomatic definitions and give a universal construction of information granulation by relating information granulations with a class of functions of multiple variables. We show that the improved axiomatic definition has some concrete information granulations in the literature as instances.\nThe aim of this paper is to introduce a logic in which nouns and verbs are handled together as a deductive reasoning, and also to observe the relationship between nouns and verbs as well as between logics and conversations.\nWe introduce the weighted CFG constraint and propose a propagation algorithm that enforces domain consistency in $O(n^3|G|)$ time. We show that this algorithm can be decomposed into a set of primitive arithmetic constraints without hindering propagation.\nLSCS is a satellite workshop of the international conference on principles and practice of Constraint Programming (CP), since 2004. 
It is devoted to local search techniques in constraint satisfaction, and focuses on all aspects of local search techniques, including: design and implementation of new algorithms, hybrid stochastic-systematic search, reactive search optimization, adaptive search, modeling for local-search, global constraints, flexibility and robustness, learning methods, and specific applications.\nThe fundamental laws of quantum world upsets the logical foundation of classic physics. They are completely counter-intuitive with many bizarre behaviors. However, this paper shows that they may make sense from the perspective of a general decision-optimization principle for cooperation. This principle also offers a generalization of Nash equilibrium, a key concept in game theory, for better payoffs and stability of game playing.\nTraditional staging is based on a formal approach of similarity leaning on dramaturgical ontologies and instanciation variations. Inspired by interactive data mining, that suggests different approaches, we give an overview of computer science and theater researches using computers as partners of the actor to escape the a priori specification of roles.\nWe present an approach to semi-supervised learning based on an exponential family characterization. Our approach generalizes previous work on coupled priors for hybrid generative/discriminative models. Our model is more flexible and natural than previous approaches. Experimental results on several data sets show that our approach also performs better in practice.\nRefereeToolbox is a java package implementing combination operators for fusing evidences. It is downloadable from: http://refereefunction.fredericdambreville.com/releases RefereeToolbox is based on an interpretation of the fusion rules by means of Referee Functions. This approach implies a dissociation between the definition of the combination and its actual implementation, which is common to all referee-based combinations. 
As a result, RefereeToolbox is designed with the aim to be generic and evolutive.\nWe present in this paper some examples of how to compute by hand the PCR5 fusion rule for three sources, so the reader will better understand its mechanism. We also take into consideration the importance of sources, which is different from the classical discounting of sources.\nIn case of realization of successful business, gain analysis is essential. In this paper we have cited some new techniques of gain expectation on the basis of neural property of perceptron. Support rule and Sequence mining based artificial intelligence oriented practices have also been done in this context. In the view of above fuzzy and statistical based gain sensing is also pointed out.\nWe have an audacious dream, we would like to develop a simulation and virtual reality system to support the decision making in European football (soccer). In this review, we summarize the efforts that we have made to fulfil this dream until recently. In addition, an introductory version of FerSML (Footballer and Football Simulation Markup Language) is presented in this paper.\nWe report recent research on computing with biology-based neural network models by means of physics-based opto-electronic hardware. New technology provides opportunities for very-high-speed computation and uncovers problems obstructing the wide-spread use of this new capability. The Computation Modeling community may be able to offer solutions to these cross-boundary research problems.\nOne possible escape from the Gibbard-Satterthwaite theorem is computational complexity. For example, it is NP-hard to compute if the STV rule can be manipulated. However, there is increasing concern that such results may not re ect the difficulty of manipulation in practice. In this tutorial, I survey recent results in this area.\nSymmetry is a common feature of many combinatorial problems. 
Unfortunately eliminating all symmetry from a problem is often computationally intractable. This paper argues that recent parameterized complexity results provide insight into that intractability and help identify special cases in which symmetry can be dealt with more tractably\nThe Border algorithm and the iPred algorithm find the Hasse diagrams of FCA lattices. We show that they can be generalized to arbitrary lattices. In the case of iPred, this requires the identification of a join-semilattice homomorphism into a distributive lattice.\nThis research applies ideas from argumentation theory in the context of semantic wikis, aiming to provide support for structured-large scale argumentation between human agents. The implemented prototype is exemplified by modelling the MMR vaccine controversy.\nSemantic wikis, wikis enhanced with Semantic Web technologies, are appropriate systems for community-authored knowledge models. They are particularly suitable for scientific collaboration. This paper details the design principles ofWikiBridge, a semantic wiki.\nIn this report, we propose a quick survey of the currently known techniques for encoding a Boolean cardinality constraint into a CNF formula, and we discuss about the relevance of these encodings. We also propose models to facilitate analysis and design of CNF encodings for Boolean constraints.\nBoolVar/PB is an open source java library dedicated to the translation of pseudo-Boolean constraints into CNF formulae. Input constraints can be categorized with tags. Several encoding schemes are implemented in a way that each input constraint can be translated using one or several encoders, according to the related tags. The library can be easily extended by adding new encoders and / or new output formats.\nFour options for assigning a meaning to Islamic Logic are surveyed including a new proposal for an option named \"Real Islamic Logic\" (RIL). 
That approach to Islamic Logic should serve modern Islamic objectives in a way comparable to the functionality of Islamic Finance. The prospective role of RIL is analyzed from several perspectives: (i) parallel distributed systems design, (ii) reception by a community structured audience, (iii) informal logic and applied non-classical logics, and (iv) (in)tractability and artificial intelligence.\nWe solve constraint satisfaction problems through translation to answer set programming (ASP). Our reformulations have the property that unit-propagation in the ASP solver achieves well defined local consistency properties like arc, bound and range consistency. Experiments demonstrate the computational value of this approach.\nTwo distinct algorithms are presented to extract (schemata of) resolution proofs from closed tableaux for propositional schemata. The first one handles the most efficient version of the tableau calculus but generates very complex derivations (denoted by rather elaborate rewrite systems). The second one has the advantage that much simpler systems can be obtained, however the considered proof procedure is less efficient.\nLogic programming is a powerful paradigm for programming autonomous agents in dynamic domains, as witnessed by languages such as Golog and Flux. In this work we present ALPprolog, an expressive, yet efficient, logic programming language for the online control of agents that have to reason about incomplete information and sensing actions.\nIn this paper, we investigate the following question: how could you write such computer programs that can work like conscious beings? The motivation behind this question is that we want to create such applications that can see the future. The aim of this paper is to provide an overall conceptual framework for this new approach to machine consciousness. 
So we introduce a new programming paradigm called Consciousness Oriented Programming (COP).\nSelf-Organizing Maps are commonly used for unsupervised learning purposes. This paper is dedicated to the certain modification of SOM called SOMN (Self-Organizing Mixture Networks) used as a mechanism for representing grayscale digital images. Any grayscale digital image regarded as a distribution function can be approximated by the corresponding Gaussian mixture. In this paper, the use of SOMN is proposed in order to obtain such approximations for input grayscale images in unsupervised manner.\nTwo different conceptions of emergence are reconciled as two instances of the phenomenon of detection. In the process of comparing these two conceptions, we find that the notions of complexity and detection allow us to form a unified definition of emergence that clearly delineates the role of the observer.\nThe aim of this paper is to announce the release of a novel system for abstract argumentation which is based on decomposition and dynamic programming. We provide first experimental evaluations to show the feasibility of this approach.\nIn this article, we present a corpus of dialogues between a schizophrenic speaker and an interlocutor who drives the dialogue. We had identified specific discontinuities for paranoid schizophrenics. We propose a modeling of these discontinuities with S-DRT (its pragmatic part)\nWe present the Linux package configuration tool aspcud based on Answer Set Programming. In particular, we detail aspcud's preprocessor turning a CUDF specification into a set of logical facts.\nWe study the complexity of reasoning in abstracts argumentation frameworks close to graph classes that allow for efficient reasoning methods, i.e.\\ to one of the classes of acyclic, noeven, biparite and symmetric AFs. 
In this work we show that certain reasoning problems on the second level of the polynomial hierarchy still maintain their full complexity when restricted to instances of fixed distance to one of the above graph classes.\nWe discuss the frequent pattern mining problem in a general setting. From an analysis of abstract representations, summarization and frequent pattern mining, we arrive at a generalization of the problem. Then, we show how the problem can be cast into the powerful language of algorithmic information theory. This allows us to formulate a simple algorithm to mine for all frequent patterns.\nWe propose a method called EDML for learning MAP parameters in binary Bayesian networks under incomplete data. The method assumes Beta priors and can be used to learn maximum likelihood parameters when the priors are uninformative. EDML exhibits interesting behaviors, especially when compared to EM. We introduce EDML, explain its origin, and study some of its properties both analytically and empirically.\nWe show that the existence of a computationally efficient calibration algorithm, with a low weak calibration rate, would imply the existence of an efficient algorithm for computing approximate Nash equilibria, thus implying the unlikely conclusion that every problem in PPAD is solvable in polynomial time.\nRelational representations in reinforcement learning allow for the use of structural information like the presence of objects and relationships between them in the description of value functions. Through this paper, we show that such representations allow for the inclusion of background knowledge that qualitatively describes a state and can be used to design agents that demonstrate learning behavior in domains with large state and action spaces such as computer games.\nWe introduce a family of new equational semantics for argumentation networks which can handle odd and even loops in a uniform manner. 
We offer one version of equational semantics which is equivalent to CF2 semantics, and a better version which gives the same results as traditional Dung semantics for even loops but can still handle odd loops.\nThis paper takes a new look at language and knowledge modelling for corpus linguistics. Using ideas of Chaitin, a line of argument is made against language/knowledge separation in Natural Language Processing. A simplistic model that generalises approaches to language and knowledge is proposed. One of the hypothetical consequences of this model is Strong AI.\nThe article discusses some applications of fuzzy logic ideas to formalizing the Case-Based Reasoning (CBR) process and to measuring the effectiveness of CBR systems.\nLearning in Riemannian orbifolds is motivated by existing machine learning algorithms that directly operate on finite combinatorial structures such as point patterns, trees, and graphs. These methods, however, lack statistical justification. This contribution derives consistency results for learning problems in structured domains and thereby generalizes learning in vector spaces and manifolds.\nObjectives: Electronic health records (EHRs) are only a first step in capturing and utilizing health-related data; the challenge is turning that data into useful information. Furthermore, EHRs are increasingly likely to include data relating to patient outcomes, functionality such as clinical decision support, and genetic information as well, and, as such, can be seen as repositories of increasingly valuable information about patients' health conditions and responses to treatment over time. Methods: We describe a case study of 423 patients treated by Centerstone within Tennessee and Indiana in which we utilized electronic health record data to generate predictive algorithms of individual patient treatment response. Multiple models were constructed using predictor variables derived from clinical, financial and geographic data. 
Results: For the 423 patients, 101 deteriorated, 223 improved and in 99 there was no change in clinical condition. Based on modeling of various clinical indicators at baseline, the highest accuracy in predicting individual patient response ranged from 70-72% within the models tested. In terms of individual predictors, the Centerstone Assessment of Recovery Level - Adult (CARLA) baseline score was most significant in predicting outcome over time (odds ratio 4.1 ± 2.27). Other variables with consistently significant impact on outcome included payer, diagnostic category, location and provision of case management services. Conclusions: This approach represents a promising avenue toward reducing the current gap between research and practice across healthcare, developing data-driven clinical decision support based on real-world populations, and serving as a component of embedded clinical artificial intelligences that \"learn\" over time.\nThis paper deals with chain graphs under the alternative Andersson-Madigan-Perlman (AMP) interpretation. In particular, we present a constraint-based algorithm for learning an AMP chain graph to which a given probability distribution is faithful. We also show that the extension of Meek's conjecture to AMP chain graphs does not hold, which compromises the development of efficient and correct score+search learning algorithms under assumptions weaker than faithfulness.\nA semantic embedding of (constant domain) quantified conditional logic in classical higher-order logic is presented.\nWe introduce in this paper a new way of optimizing the natural extension of the quantization error used in k-means clustering to dissimilarity data. The proposed method is based on hierarchical clustering analysis combined with multi-level heuristic refinement. 
The method is computationally efficient and achieves better quantization errors than the\nA simplified description of Fuzzy TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) is presented. We have adapted the TOPSIS description from existing fuzzy theory literature and distilled the bare minimum concepts required for understanding and applying TOPSIS. An example has been worked out to illustrate the application of TOPSIS to a multi-criteria group decision-making scenario.\nThe causal structure of cognition can be simulated but not implemented computationally, just as the causal structure of a comet can be simulated but not implemented computationally. The only thing that allows us even to imagine otherwise is that cognition, unlike a comet, is invisible (to all but the cognizer).\n\"The hardest logic puzzle ever\" presented by George Boolos became a target for philosophers and logicians who tried to modify it and make it even tougher. I propose a further modification of the original puzzle where part of the available information is eliminated but the solution is still possible. The solution also gives interesting ideas on the logic behind the discovery of an unknown language.\nWe present a novel technique to remove spurious ambiguity from transition systems for dependency parsing. Our technique chooses a canonical sequence of transition operations (computation) for a given dependency tree. Our technique can be applied to a large class of bottom-up transition systems, including for instance Nivre (2004) and Attardi (2006).\nThis paper summarises the current state of the art in the study of compositionality in distributional semantics, and major challenges for this area. We single out generalised quantifiers and intensional semantics as areas on which to focus attention for the development of the theory. Once suitable theories have been developed, algorithms will be needed to apply the theory to tasks. 
Evaluation is a major problem; we single out application to recognising textual entailment and machine translation for this purpose.\nIn this paper, we present a new approach dedicated to correcting the spelling errors of the Arabic language. This approach corrects typographical errors like insertion, deletion, and permutation. Our method is inspired by the Levenshtein algorithm, and allows a finer and better scheduling than Levenshtein. The results obtained are very satisfactory and encouraging, which shows the value of our new approach.\nAnt-based algorithms are successful tools for solving complex problems. One of these problems is the Linear Ordering Problem (LOP). The paper shows new results on some LOP instances, using Ant Colony System (ACS) and the Step-Back Sensitive Ant Model (SB-SAM).\nIn this paper, we show the possibility of using linear Conditional Random Fields (CRF) for terminology extraction from a specialized text corpus.\nWe propose a solution to the problem of estimating a Riemannian metric associated with a given differentiable manifold. The metric learning problem is based on minimizing the relative volume of a given set of points. We derive the details for a family of metrics on the multinomial simplex. The resulting metric has applications in text classification and bears some similarity to TFIDF representation of text documents.\nWe use principles of fuzzy logic to develop a general model representing several processes in a system's operation characterized by a degree of vagueness and/or uncertainty. Further, we introduce three alternative measures of a fuzzy system's effectiveness connected to the above model. An application is also developed for the Mathematical Modelling process illustrating our results.\nWe know anything because we learn about it, there is anything we ever share about it, but now a lot of media that can represent how it happened as infrastructure of the knowledge sharing. 
This paper aims to introduce a model for understanding a problem in knowledge sharing based on interaction.\nWe present a new algorithm for approximate inference in probabilistic programs, based on a stochastic gradient for variational programs. This method is efficient without restrictions on the probabilistic program; it is particularly practical for distributions which are not analytically tractable, including highly structured distributions that arise in probabilistic programs. We show how to automatically derive mean-field probabilistic programs and optimize them, and demonstrate that our perspective improves inference efficiency over other algorithms.\nRecent theoretical work has identified random projection as a promising dimensionality reduction technique for learning mixtures of Gaussians. Here we summarize these results and illustrate them by a wide variety of experiments on synthetic and real data.\nIn our work we define a new algebra of operators as a substitute for fuzzy logic. Its primary purpose is the construction of binary discriminators for phonemes based on spectral content. It is optimized for the design of non-parametric computational circuits, and makes use of 4 operations: $\\min$, $\\max$, the difference and generalized additively homogeneous means.\nRecent improvements of the LEO-II theorem prover are presented. These improvements include a revised ATP interface, new translations into first-order logic, rule support for the axiom of choice, detection of defined equality, and more flexible strategy scheduling.\nWith a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high-probability convergence rate of O({\\kappa}/T) for strongly convex functions, instead of O({\\kappa} ln(T)/T). We also prove that an accelerated SGD algorithm achieves a rate of O({\\kappa}/T).\nWe examine three different algorithms that enable the collision certificate method from [Bialkowski, et al.] 
to handle the case of a centralized multi-robot team. By taking advantage of symmetries in the configuration space of multi-robot teams, our methods can significantly reduce the number of collision checks vs. both [Bialkowski, et al.] and standard collision checking implementations.\nWe present Exponentiated Gradient LINUCB, an algorithm for contextual multi-armed bandits. This algorithm uses Exponentiated Gradient to find the optimal exploration of the LINUCB. Within a deliberately designed offline simulation framework we conduct evaluations with real online event log data. The experimental results demonstrate that our algorithm outperforms the surveyed algorithms.\nWe introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust.\nWe study a generic program to investigate the scope for automatically customising it for a vital current task, which was not considered when it was first written. In detail, we show genetic programming (GP) can evolve models of aspects of BLAST's output when it is used to map Solexa Next-Gen DNA sequences to the human genome.\nThis work uses the L-system to construct a tree structure for the text sequence and derives its complexity. It serves as a measure of structural complexity of the text. It is applied to anomaly detection in data transmission.\nThis paper describes a program that solves elementary mathematical problems, mostly in metric space theory, and presents solutions that are hard to distinguish from solutions that might be written by human mathematicians. 
The program is part of a more general project, which we also discuss.\nWe present a complete finite axiomatization of the unrestricted implication problem for inclusion and conditional independence atoms in the context of dependence logic. For databases, our result implies a finite axiomatization of the unrestricted implication problem for inclusion, functional, and embedded multivalued dependencies in the unirelational case.\nWe consider the classical TD(0) algorithm implemented on a network of agents wherein the agents also incorporate the updates received from neighboring agents using a gossip-like mechanism. The combined scheme is shown to converge for both discounted and average cost problems.\nKnowing which operators to apply and in which order, as well as assigning good values to their parameters, is a challenge for users of computer vision. This paper proposes a solution to this problem as a multi-agent system modeled according to the Vowel approach and using the Q-learning algorithm to optimize its choice. An implementation is given to test and validate this method.\nIn this paper we study the complexity of strategic argumentation for dialogue games. A dialogue game is a 2-player game where the parties play arguments. We show how to model dialogue games in a skeptical, non-monotonic formalism, and we show that the problem of deciding what move (set of rules) to play at each turn is an NP-complete problem.\nAbstract dialectical frameworks (ADFs) are a powerful generalisation of Dung's abstract argumentation frameworks. In this paper we present an answer set programming-based software system, called DIAMOND (DIAlectical MOdels eNcoDing). It translates ADFs into answer set programs whose stable models correspond to models of the ADF with respect to several semantics (i.e. 
admissible, complete, stable, grounded).\nIn this paper we explore various parameter settings of the state-of-the-art Statistical Machine Translation system to improve the quality of the translation for a `distant' language pair like English-Hindi. We propose new techniques for efficient reordering. A slight improvement over the baseline is reported using these techniques. We also show that a simple pre-processing step can improve the quality of the translation significantly.\nThis paper presents a microkernel architecture for constraint programming organized around a small number of core functionalities and minimal interfaces. The architecture contrasts with the monolithic nature of many implementations. Experimental results indicate that the software engineering benefits are not incompatible with runtime efficiency.\nAirspace sectorisation provides a partition of a given airspace into sectors, subject to geometric constraints and workload constraints, so that some cost metric is minimised. We make a study of the constraints that arise in airspace sectorisation. For each constraint, we give an analysis of what algorithms and properties are required under systematic search and stochastic local search.\nThese notes pose a \"proof challenge\": a proof, or disproof, of the proposition that \"For any given body of information, I, expressed as a one-dimensional sequence of atomic symbols, a multiple alignment concept, described in the document, provides a means of encoding all the redundancy that may exist in I.\" Aspects of the challenge are described.\nCorrelation clustering is a concept of machine learning. The ultimate goal of such a clustering is to find a partition with minimal conflicts. In this paper we investigate a correlation clustering of integers, based upon the greatest common divisor.\nIn this paper we discuss some reasons why temporal logic might not be suitable to model real-life norms. 
To show this, we present a novel deontic logic contrary-to-duty/derived permission paradox based on the interaction of obligations, permissions and contrary-to-duty obligations. The paradox is inspired by real-life norms.\nThis paper discloses the potential of OWL (Web Ontology Language) ontologies for the generation of rules. The main purpose of this paper is to identify new types of rules that may be generated from OWL ontologies. Rules generated from OWL ontologies are necessary for the functioning of the Semantic Web Expert System. It is expected that the Semantic Web Expert System (SWES) will be able to process ontologies from the Web with the purpose of supplementing or even developing its knowledge base.\nTurKontrol, an algorithm presented in (Dai et al. 2010), uses a POMDP to model and control an iterative workflow for crowdsourced work. Here, TurKontrol is re-implemented as \"TurKPF,\" which uses a Particle Filter to reduce computation time & memory usage. Most importantly, in our experimental environment with default parameter settings, the action is chosen nearly instantaneously. Through a series of experiments we see that TurKPF and TurKontrol perform similarly.\nIn this essay the stance on robots is discussed. The historical attitude against robots, from Ancient Greek culture until the industrial revolution, is described. The uncanny valley and some possible explanations for it are given. Some differences in Western and Asian understanding of robots are listed and finally we answer the question raised with the title.\nThe NMR community would like to build a repository of benchmarks to push forward the design of systems implementing NMR, as has been the case for many other areas in AI. There are a number of lessons which can be learned from the experience of other communities. 
Here are a few thoughts about the requirements and choices to make before building such a repository.\nDialogue games are a two-player semantics for a variety of logics, including intuitionistic and classical logic. Dialogues can be viewed as a kind of analytic calculus not unlike tableaux. Can dialogue games be an effective foundation for proof search in intuitionistic logic (both first-order and propositional)? We announce Kuno, an automated theorem prover for intuitionistic first-order logic based on dialogue games.\nIn this treatise we aim to build a hybrid network automated (self-adaptive) security threat discovery and prevention system by using unconventional techniques and methods, including fuzzy logic and biologically inspired algorithms in the context of soft computing.\nThe paper presents a knowledge representation language $\\mathcal{A}log$ which extends ASP with aggregates. The goal is to have a language based on simple syntax and clear intuitive and mathematical semantics. We give some properties of $\\mathcal{A}log$, an algorithm for computing its answer sets, and a comparison with other approaches.\nThis paper presents an overview of `Lexpresso', a Controlled Natural Language developed at the Defence Science & Technology Organisation as a bidirectional natural language interface to a high-level information fusion system. The paper describes Lexpresso's main features including lexical coverage, expressiveness and range of linguistic syntactic and semantic structures. It also touches on its tight integration with a formal semantic formalism and tentatively classifies it against the PENS system.\nIn this paper, the concept of a possibility neutrosophic soft set and its operations are defined, and their properties are studied. An application of this theory in decision making is investigated. Also, a similarity measure of two possibility neutrosophic soft sets is introduced and discussed. 
Finally, an application of this similarity measure is given to select a suitable person for a position in a firm.\nIn \"Inverse subsumption for complete explanatory induction\", Yamamoto et al. investigate which inductive logic programming systems can learn a correct hypothesis $H$ by using inverse subsumption instead of inverse entailment. We prove that the inductive logic programming system Imparo is complete by inverse subsumption for learning a correct definite hypothesis $H$ wrt the definite background theory $B$ and ground atomic examples $E$, by establishing that there exists a connected theory $T$ for $B$ and $E$ such that $H$ subsumes $T$.\nWe introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust.\nThe paper attempts to describe the space of possible mind designs by first equating all minds to software. Next it proves some interesting properties of the mind design space such as the infinitude of minds, size and representation complexity of minds. A survey of mind design taxonomies is followed by a proposal for a new field of investigation devoted to the study of minds, intellectology, and a list of open problems for this new field is presented.\nIn Reinforcement Learning (RL), it is common to use optimistic initialization of value functions to encourage exploration. However, such an approach generally depends on the domain, viz., the scale of the rewards must be known, and the feature representation must have a constant norm. We present a simple approach that performs optimistic initialization with less dependence on the domain.\nThe main goal of this work is to analyze the behaviour of the FA quantifier fuzzification mechanism. 
As we prove in the paper, this model has a very solid theoretical behaviour, superior to most of the models defined in the literature. Moreover, we show that the underlying probabilistic interpretation has very interesting consequences.\nWe present a parallel solver for numerical constraint satisfaction problems (NCSPs) that can scale on a number of cores. Our proposed method runs worker solvers on the available cores and simultaneously the workers cooperate for the search space distribution and balancing. In the experiments, we attained up to a 119-fold speedup using 256 cores of a parallel computer.\nROSS (\"Representation, Ontology, Structure, Star\") is introduced as a new method for knowledge representation that emphasizes representational constructs for physical structure. The ROSS representational scheme includes a language called \"Star\" for the specification of ontology classes. The ROSS method also includes a formal scheme called the \"instance model\". Instance models are used in the area of natural language meaning representation to represent situations. This paper provides both the rationale and the philosophical background for the ROSS method.\nThis paper briefly characterizes the field of cognitive computing. As an exemplification, the field of natural language question answering is introduced together with its specific challenges. A possibility to master these challenges is illustrated by a detailed presentation of the LogAnswer system, which is a successful representative of the field of natural language question answering.\nThe article is dedicated to the analysis of the existing models for assessment based on the fuzzy logic centroid technique. A new Generalized Rectangular Model was developed. Some generalizations of the existing models are offered.\nThis short report describes an automated BWAPI-based script developed for live streams of a StarCraft Brood War bot tournament, SSCAIT. 
The script controls the in-game camera in order to follow the relevant events and improve the viewer experience. We enumerate its novel features and provide a few implementation notes.\nWhat is happiness for reinforcement learning agents? We seek a formal definition satisfying a list of desiderata. Our proposed definition of happiness is the temporal difference error, i.e. the difference between the value of the obtained reward and observation and the agent's expectation of this value. This definition satisfies most of our desiderata and is compatible with empirical research on humans. We state several implications and discuss examples.\nIn this paper, we propose a different insight to analyze AdaBoost. This analysis reveals that, beyond some preconceptions, AdaBoost can be directly used as an asymmetric learning algorithm, preserving all its theoretical properties. A novel class-conditional description of AdaBoost, which models the actual asymmetric behavior of the algorithm, is presented.\nWe introduce a lazy approach to the explanation-based approximation of probabilistic logic programs. It uses only the most significant part of the program when searching for explanations. The result is a fast and anytime approximate inference algorithm which returns hard lower and upper bounds on the exact probability. We experimentally show that this method outperforms state-of-the-art approximate inference.\nSolomonoff induction is held as a gold standard for learning, but it is known to be incomputable. We quantify its incomputability by placing various flavors of Solomonoff's prior M in the arithmetical hierarchy. We also derive computability bounds for knowledge-seeking agents, and give a limit-computable weakly asymptotically optimal reinforcement learning agent.\nWe introduce a new rule based system for belief tracking in dialog systems. 
Despite the simplicity of the rules being considered, the proposed belief tracker ranks favourably compared to the previous submissions on the second and third Dialog State Tracking challenges. The results of this simple tracker make it possible to reconsider the performance of previous submissions that use more elaborate techniques.\nThe problem of autonomous navigation is one of the basic problems for robotics. In general, it may be challenging when an autonomous vehicle is placed in a partially observable domain. In this paper we consider a simplistic environment model and introduce a navigation algorithm based on a Learning Classifier System.\nThis paper introduces a novel approach to tackle the existing gap in message translation in dialogue systems. Currently, messages submitted to dialogue systems are considered as isolated sentences. Thus, missing context information impedes the disambiguation of homograph words in ambiguous sentences. Our approach solves this disambiguation problem by using concepts over existing ontologies.\nWe give an algorithm A which assigns probabilities to logical sentences. For any simple infinite sequence of sentences whose truth-values appear indistinguishable from a biased coin that outputs \"true\" with probability p, we have that the sequence of probabilities that A assigns to these sentences converges to p.\nThis volume contains the system descriptions of the 18 solvers submitted to the First International Competition on Computational Models of Argumentation (ICCMA'15) and therefore gives an overview of the state of the art in computational approaches to abstract argumentation problems. Further information on the results of the competition and the performance of the individual solvers can be found at http://argumentationcompetition.org/2015/.\nThe first-ever human vs. computer no-limit Texas hold 'em competition took place from April 24-May 8, 2015 at River's Casino in Pittsburgh, PA. 
In this article I present my thoughts on the competition design, agent architecture, and lessons learned.\nSometime in the future we will have to deal with the impact of AIs being mistaken for humans. For this reason, I propose that any autonomous system should be designed so that it is unlikely to be mistaken for anything besides an autonomous system, and should identify itself at the start of any interaction with another agent.\nThis article provides a formalization of the W3C Draft Core SHACL Semantics specification using Z notation. This formalization exercise has identified a number of quality issues in the draft. It has also established that the recursive definitions in the draft are well-founded. Further formal validation of the draft will require the use of an executable specification technology.\nAttribute exploration has been investigated in several studies, with particular emphasis on the algorithmic aspects of this knowledge acquisition method. In its basic version the method itself is rather simple and transparent. But when background knowledge and partially described counter-examples are admitted, it gets more difficult. Here we discuss this case in an abstract, somewhat \"axiomatic\" setting, providing a terminology that clarifies the abstract strategy of the method rather than its algorithmic implementation.\nThis document provides the foundations behind the functionality provided by the $\\rho$G library (https://github.com/santiontanon/RHOG), focusing on the basic operations the library provides: subsumption, refinement of directed labeled graphs, and distance/similarity assessment between directed labeled graphs. $\\rho$G development was initially supported by the National Science Foundation through the EAGER grant IIS-1551338.\nIn previous work with J. Hedges, we formalised a generalised quantifiers theory of natural language in categorical compositional distributional semantics with the help of bialgebras. 
In this paper, we show how quantifier scope ambiguity can be represented in that setting and how this representation can be generalised to branching quantifiers.\nI propose a system for Automated Theorem Proving in higher-order logic using deep learning and eschewing hand-constructed features. Holophrasm exploits the formalism of the Metamath language and explores partial proof trees using a neural-network-augmented bandit algorithm and a sequence-to-sequence model for action enumeration. The system proves 14% of its test theorems from Metamath's set.mm module.\nReasoning does not work well when done in isolation from its significance, both to the needs and interests of an agent and with respect to the wider world. Moreover, those issues may best be handled with a new sort of data structure that goes beyond the knowledge base and incorporates aspects of perceptual knowledge and even more, in which a kind of anticipatory action may be key.\nIn this paper, we reexamine the Movie Graph Argument, which demonstrates a basic incompatibility between computationalism and materialism. We discover that the incompatibility is only manifest in singular classical-like universes. If we accept that we live in a Multiverse, then the incompatibility goes away, but in that case another line of argument shows that with computationalism, the fundamental, or primitive, materiality has no causal influence on what is observed, which must be derivable from basic arithmetic properties.\nThis thesis derives, tests and applies two linear projection algorithms for machine learning under non-stationarity. The first finds a direction in a linear space upon which a data set is maximally non-stationary. The second aims to robustify two-way classification against non-stationarity. 
The algorithm is tested on a key application scenario, namely Brain Computer Interfacing.\nWe present a logical framework to represent and reason about fuzzy optimization problems based on fuzzy answer set optimization programming. This is accomplished by allowing fuzzy optimization aggregates, e.g., minimum and maximum in the language of fuzzy answer set optimization programming to allow minimization or maximization of some desired criteria under fuzzy environments. We show the application of the proposed logical fuzzy optimization framework under the fuzzy answer set optimization programming to the fuzzy water allocation optimization problem.\nThis article addresses the problem of expressing preferences in flexible queries while basing on a combination of the fuzzy logic theory and Conditional Preference Networks or CP-Nets.\nThe Rao-Blackwell theorem is utilized to analyze and improve the scalability of inference in large probabilistic models that exhibit symmetries. A novel marginal density estimator is introduced and shown both analytically and empirically to outperform standard estimators by several orders of magnitude. The developed theory and algorithms apply to a broad class of probabilistic models including statistical relational models considered not susceptible to lifted probabilistic inference.\nMost feature detectors such as edge detectors or circle finders are statistical, in the sense that they decide at each point in an image about the presence of a feature, this paper describes the use of Bayesian feature detectors.\nUsual techniques to solve WCSP are based on cost transfer operations coupled with a branch and bound algorithm. In this paper, we focus on an approach integrating extraction and relaxation of Minimal Unsatisfiable Cores in order to solve this problem. 
We decline our approach in two ways: an incomplete, greedy, algorithm and a complete one.\nThis paper presents the OntoRich framework, a support tool for semi-automatic ontology enrichment and evaluation. The WordNet is used to extract candidates for dynamic ontology enrichment from RSS streams. With the integration of OpenNLP the system gains access to syntactic analysis of the RSS news. The enriched ontologies are evaluated against several qualitative metrics.\nThe explosion of available data along with the need to integrate and utilize that data has led to a pressing interest in data integration techniques. In terms of Semantic Web technologies, Ontology Alignment is a key step in the process of integrating heterogeneous knowledge bases. In this paper, we present the Edge Confidence technique, a modification and improvement over the popular Similarity Flooding technique for Ontology Alignment.\nModeling emotional-cognition is in a nascent stage and therefore wide-open for new ideas and discussions. In this paper the author looks at the modeling problem by bringing in ideas from axiomatic mathematics, information theory, computer science, molecular biology, non-linear dynamical systems and quantum computing and explains how ideas from these disciplines may have applications in modeling emotional-cognition.\nThe FOCUS constraint expresses the notion that solutions are concentrated. In practice, this constraint suffers from the rigidity of its semantics. To tackle this issue, we propose three generalizations of the FOCUS constraint. We provide for each one a complete filtering algorithm as well as discussing decompositions.\nWe show how, and under which conditions, the equilibrium states of a first-order Ordinary Differential Equation (ODE) system can be described with a deterministic Structural Causal Model (SCM). 
Our exposition sheds more light on the concept of causality as expressed within the framework of Structural Causal Models, especially for cyclic models.\nWe introduce stratified labelings as a novel semantical approach to abstract argumentation frameworks. Compared to standard labelings, stratified labelings provide a more fine-grained assessment of the controversiality of arguments using ranks instead of the usual labels in, out, and undecided. We relate the framework of stratified labelings to conditional logic and, in particular, to the System Z ranking functions.\nWe introduce in this paper an algorithm named Contextuel-E-Greedy that tackles the dynamicity of the user's content. It is based on dynamic exploration/exploitation tradeoff and can adaptively balance the two aspects by deciding which situation is most relevant for exploration or exploitation. The experimental results demonstrate that our algorithm outperforms surveyed algorithms.\nIn this work, we first define relations on the fuzzy parametrized soft sets and study their properties. We also give a decision making method based on these relations. In approximate reasoning, relations on the fuzzy parametrized soft sets have been shown to be of primordial importance. Finally, the method is successfully applied to problems that contain uncertainties.\nThis paper presents a powerful genetic algorithm (GA) to solve the traveling salesman problem (TSP). To construct a powerful GA, I use edge swapping (ES) with a local search procedure to determine good combinations of building blocks of parent solutions for generating even better offspring solutions. Experimental results on well studied TSP benchmarks demonstrate that the proposed GA is competitive in finding very high quality solutions on instances with up to 16,862 cities.\nIn this paper, we propose a new approximate heuristic search algorithm: Cascading A*, which is a two-phase algorithm combining A* and IDA* by a new concept \"envelope ball\". 
The new algorithm CA* is efficient, able to generate approximate and anytime solutions, and parallel friendly.\nThis paper reports our initial experiments with using external ATP on some corpora built with the ACL2 system. This is intended to provide the first estimate about the usefulness of such external reasoning and AI systems for solving ACL2 problems.\nA common assumption in belief revision is that the reliability of the information sources is either given, derived from temporal information, or the same for all. This article does not describe a new semantics for integration but the problem of obtaining the reliability of the sources given the result of a previous merging. As an example, the relative reliability of two sensors can be assessed given some certain observation, and allows for subsequent mergings of data coming from them.\nAnalyzing Big Data can help corporations to improve their efficiency. In this work we present a new vision to derive Value from Big Data using a Semantic Hierarchical Multi-label Classification called Semantic HMC based on a non-supervised Ontology learning process. We also propose a Semantic HMC process, using scalable Machine-Learning techniques and Rule-based reasoning.\nThe domain model is one of the important components used by adaptive learning systems to automatically generate customized courses for the learners. In this paper our contribution is to propose a new tool for implementation of a domain model based on fuzzy relationships among concepts. This tool allows the experts and teachers to find the best parameters in order to adapt to the learners' differences.\nWe propose a framework for computer music composition that uses resilient propagation (RProp) and long short term memory (LSTM) recurrent neural network. In this paper, we show that LSTM network learns the structure and characteristics of music pieces properly by demonstrating its ability to recreate music. 
We also show that predicting existing music using RProp outperforms Back propagation through time (BPTT).\nThe paper presents some steps for multi-valued representation of neutrosophic information. These steps are provided in the framework of multi-valued logics using the following logical value: true, false, neutral, unknown and saturated. Also, this approach provides some calculus formulae for the following neutrosophic features: truth, falsity, neutrality, ignorance, under-definedness, over-definedness, saturation and entropy. In addition, it was defined net truth, definedness and neutrosophic score.\nARCOE-Logic 2014, the 6th International Workshop on Acquisition, Representation and Reasoning about Context with Logic, was held in co-location with the 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2014) on November 25, 2014 in Link\\\"oping, Sweden. These notes contain the five papers which were accepted and presented at the workshop.\nIn this paper one presents a new fuzzy clustering algorithm based on a dissimilarity function determined by three parameters. This algorithm can be considered a generalization of the Gustafson-Kessel algorithm for fuzzy clustering.\nIn this paper a knowledge representation model are proposed, FP5, which combine the ideas from fuzzy sets and penta-valued logic. FP5 represents imprecise properties whose accomplished degree is undefined, contradictory or indeterminate for some objects. Basic operations of conjunction, disjunction and negation are introduced. Relations to other representation models like fuzzy sets, intuitionistic, paraconsistent and bipolar fuzzy sets are discussed.\nWe report on our initial work to automate the generation of a domain ontology using subject fields of resources held in the Virtual Observatory registry. Preliminary results are comparable to more generalized ontology learning software currently in use. 
We expect to be able to refine our solution to improve both the depth and breadth of the generated ontology.\nWe propose a generalization of SimRank similarity measure for heterogeneous information networks. Given the information network, the intraclass similarity score s(a, b) is high if the set of objects that are related with a and the set of objects that are related with b are pair-wise similar according to all imposed relations.\nIn practical situations, it is of interest to investigate computing approximations of sets as an important step of knowledge reduction of dynamic covering decision information systems. In this paper, we present incremental approaches to computing the type-1 and type-2 characteristic matrices of dynamic coverings whose cardinalities increase with immigration of more objects. We also present the incremental algorithms of computing the second and sixth lower and upper approximations of sets in dynamic covering approximation spaces.\nThis report describes a minimalistic set of methods engineered to anchor clinical events onto a temporal space. Specifically, we describe methods to extract clinical events (e.g., Problems, Treatments and Tests), temporal expressions (i.e., time, date, duration, and frequency), and temporal links (e.g., Before, After, Overlap) between events and temporal entities. These methods are developed and validated using high quality datasets.\nMachine learning is a quickly evolving field which now looks really different from what it was 15 years ago, when classification and clustering were major issues. 
This document proposes several trends to explore the new questions of modern machine learning, with the strong afterthought that the belief function framework has a major role to play.\nWe study confidentiality enforcement in ontologies under the Controlled Query Evaluation framework, where a policy specifies the sensitive information and a censor ensures that query answers that may compromise the policy are not returned. We focus on censors that ensure confidentiality while maximising information access, and consider both Datalog and the OWL 2 profiles as ontology languages.\nCan machines truly think? This question and its answer have many implications that depend, in large part, on any number of assumptions underlying how the issue has been addressed or considered previously. A crucial question, and one that is almost taken for granted, is the starting point for this discussion: Can \"thought\" be achieved or emulated by algorithmic procedures?\nIn this paper one presents new similarity, cardinality and entropy measures for bipolar fuzzy set and for its particular forms like intuitionistic, paraconsistent and fuzzy set. All these are constructed in the framework of multi-valued representations and are based on a penta-valued logic that uses the following logical values: true, false, unknown, contradictory and ambiguous. Also a new distance for bounded real interval was defined.\nThis paper presents a five-valued representation of bifuzzy sets. This representation is related to a five-valued logic that uses the following values: true, false, inconsistent, incomplete and ambiguous. In the framework of five-valued representation, formulae for similarity, entropy and syntropy of bifuzzy sets are constructed.\nUpdated on 24/09/2015: This update provides preliminary experiment results for fine-grained classification on the surveillance data of CompCars. The train/test splits are provided in the updated dataset. 
See details in Section 6.\nIn this paper, we address the problem of identifying linear structural equation models. We first extend the edge set half-trek criterion to cover a broader class of models. We then show that any semi-Markovian linear model can be recursively decomposed into simpler sub-models, resulting in improved identification power. Finally, we show that, unlike the existing methods developed for linear models, the resulting method subsumes the identification algorithm of non-parametric models.\nIn many European countries the growth of the real GDP per capita has been linear since 1950. An explanation for this linearity is still missing. We propose that in artificial intelligence we may find models for a linear growth of performance. We also discuss possible consequences of the fact that in systems with linear growth the percentage growth goes to zero.\nWe describe a framework for building abstraction hierarchies whereby an agent alternates skill- and representation-acquisition phases to construct a sequence of increasingly abstract Markov decision processes. Our formulation builds on recent results showing that the appropriate abstract representation of a problem is specified by the agent's skills. We describe how such a hierarchy can be used for fast planning, and illustrate the construction of an appropriate hierarchy for the Taxi domain.\nIn this paper, we present a board game: Square War. The game definition of Square War is similar to the classic Chinese board game Go. Then we propose a mathematical problem of the game Square War. Finally, we show that the problem can be solved by using a method of mixed mathematics and computer science.\nThere exists a theory of a single general-purpose learning algorithm which could explain the principles of its operation. It assumes the initial rough architecture, a small library of simple innate circuits which are prewired at birth, and proposes that all significant mental algorithms are learned. 
Given current understanding and observations, this paper reviews and lists the ingredients of such an algorithm from architectural and functional perspectives.\nIn this work, we present a MCTS-based Go-playing program which uses convolutional networks in all parts. Our method performs MCTS in batches, explores the Monte Carlo search tree using Thompson sampling and a convolutional network, and evaluates convnet-based rollouts on the GPU. We achieve strong win rates against open source Go programs and attain competitive results against state of the art convolutional net-based Go-playing programs.\nAccurate and computationally efficient means for classifying human activities have been the subject of extensive research efforts. Most current research focuses on extracting complex features to achieve high classification accuracy. We propose a template selection approach based on Dynamic Time Warping, such that complex feature extraction and domain knowledge is avoided. We demonstrate the predictive capability of the algorithm on both simulated and real smartphone data.\nGiven a data set of numerical values which are sampled from some unknown probability distribution, we will show how to check if the data set exhibits the Markov property and we will show how to use the Markov property to predict future values from the same distribution, with probability 1.\nThis paper presents inference rules for Resource Description Framework (RDF), RDF Schema (RDFS) and Web Ontology Language (OWL). Our formalization is based on Notation 3 Logic, which extended RDF by logical symbols and created Semantic Web logic for deductive RDF graph stores. 
We also propose OWL-P that is a lightweight formalism of OWL and supports soft inferences by omitting complex language constructs.\nFor vehicle sharing schemes, where drop-off positions are not fixed, we propose a pricing scheme, where the price depends in part on the distance between where a vehicle is being dropped off and where the closest shared vehicle is parked. Under certain restrictive assumptions, we show that this pricing leads to a socially optimal spread of the vehicles within a region.\nThis paper follows previous research we have already performed in the area of Bayesian networks models for CAT. We present models using Item Response Theory (IRT - standard CAT method), Bayesian networks, and neural networks. We conducted simulated CAT tests on empirical data. Results of these tests are presented for each model separately and compared.\nIn this paper we explore the application of some notable Boolean methods, namely the Disjunctive Normal Form representation of logic table expansions, and apply them to a real-valued logic model which utilizes quantities on the range [0,1] to produce a probabilistic programming of a game character's logic in mathematical form.\nCharacterizations of semi-stable and stage extensions in terms of 2-valued logical models are presented. To this end, the so-called GL-supported and GL-stage models are defined. These two classes of logical models are logic programming counterparts of the notion of range which is an established concept in argumentation semantics.\nWe report about significant enhancements of the complex algebraic geometry theorem proving subsystem in GeoGebra for automated proofs in Euclidean geometry, concerning the extension of numerous GeoGebra tools with proof capabilities. As a result, a number of elementary theorems can be proven by using GeoGebra's intuitive user interface on various computer architectures including native Java and web based systems with JavaScript. 
We also provide a test suite for benchmarking our results with 200 test cases.\nWe present a budget-free experimental setup and procedure for benchmarking numerical optimization algorithms in a black-box scenario. This procedure can be applied with the COCO benchmarking platform. We describe initialization of and input to the algorithm and touch upon the relevance of termination and restarts.\nKnowledge representation is a popular research field in IT. As mathematical knowledge is most formalized, its representation is important and interesting. Mathematical knowledge consists of various mathematical theories. In this paper we consider a deductive system that derives mathematical notions, axioms and theorems. All these notions, axioms and theorems can be considered part of elementary set theory. This theory will be represented as a semantic net.\nProbabilistic programming is considered as a framework, in which basic components of cognitive architectures can be represented in unified and elegant fashion. At the same time, necessity of adopting some component of cognitive architectures for extending capabilities of probabilistic programming languages is pointed out. In particular, implicit specification of generative models via declaration of concepts and links between them is proposed, and usefulness of declarative knowledge for achieving efficient inference is briefly discussed.\nWe improve further the 2015 version of abcdSAT by various heuristics such as at-least-one recently used strategy, learnt clause database approximation reduction etc. 
Based on the requirement of different tracks at the SAT Competition 2016, we develop three versions of abcdSAT: drup, inc and lim, which participate in the competition of main (agile), incremental library and no-limit track, respectively.\nThis paper discusses the idea of levels of autonomy of systems - be this technical or organic - and compares the insights with models employed by industries used to describe maturity and capability of their products.\nOpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.\nIn this note, we discuss and analyse a shortest path finding approach using strong spatial cognition. It is compared with a symbolic graph-based algorithm and it is shown that both approaches are similar with respect to structure and complexity. Nevertheless, the strong spatial cognition solution is easy to understand and even pops up immediately when one has to solve the problem.\nOur software system simulates the classical collaborative Japanese poetry form, renga, made of linked haikus. We used NLP methods wrapped up as web services. Our experiments were only a partial success, since results fail to satisfy classical constraints. To gather ideas for future work, we examine related research in semiotics, linguistics, and computing.\nRecently, Neural Networks have been proven extremely effective in many natural language processing tasks such as sentiment analysis, question answering, or machine translation. 
Aiming to exploit such advantages in the Ontology Learning process, in this technical report we present a detailed description of a Recurrent Neural Network based system to be used to pursue such goal.\nWe present the last of a series of three academic essays which deal with the question of how and why to build a generalized player model. We propose that a general player model needs parameters for subjective experience of play, including: player psychology, game structure, and actions of play. Based on this proposition, we pose three linked research questions: RQ1 what is a necessary and sufficient foundation to a general player model?; RQ2 can such a foundation improve performance of a computational intelligence- based player model?; and RQ3 can such a player model improve efficacy of adaptive artificial intelligence in games?   We set out the arguments behind these research questions in each of the three essays, presented as three preprints. The third essay, in this preprint, presents the argument that adaptive game artificial intelligence will be enhanced by a generalised player model. This is because games are inherently human artefacts which therefore, require some encoding of the human perspective in order to effectively autonomously respond to the individual player. The player model informs the necessary constraints on the adaptive artificial intelligence. A generalised player model is not only more efficient than a per-game solution, but also allows comparison between games which makes it a useful tool for studying play in general. We describe the concept and meaning of an adaptive game. We propose requirements for functional adaptive AI, arguing from first principles drawn from the games research literature. We propose solutions to these requirements, based on a formal model approach to our existing 'Behavlets' method for psychologically-derived player modelling:   Cowley, B., & Charles, D. (2016). 
Behavlets: a Method for Practical Player Modelling using Psychology-Based Player Traits and Domain Specific Features. User Modeling and User-Adapted Interaction, 26(2), 257-306.\nA warning system for assisting drivers during overtaking maneuvers is proposed. The system relies on Car-2-Car communication technologies and multi-agent systems. A protocol for safety overtaking is proposed based on ACL communicative acts. The mathematical model for safety overtaking used Kalman filter to minimize localization error.\nSelectional restrictions are semantic constraints on forming certain complex types in natural language. The paper gives an overview of modeling selectional restrictions in a relational type system with morphological and syntactic types. We discuss some foundations of the system and ways of formalizing selectional restrictions.   Keywords: type theory, selectional restrictions, syntax, morphology\nCrowdsourcing is a multidisciplinary research area including disciplines like artificial intelligence, human-computer interaction, database, and social science. To facilitate cooperation across disciplines, reproducibility is a crucial factor, but unfortunately, it has not gotten enough attention in the HCOMP community. In this paper, we present Reprowd, a system aiming to make it easy to reproduce crowdsourced data processing research. We have open sourced Reprowd at http://sfu-db.github.io/reprowd/.\nThe point of this note is to prove that a language is in the complexity class PP if and only if the strings of the language encode valid inferences in a Bayesian network defined using function-free first-order logic with equality.\nMany fields are now snowed under with an avalanche of data, which raises considerable challenges for computer scientists. Meanwhile, robotics (among other fields) can often only use a few dozen data points because acquiring them involves a process that is expensive or time-consuming. 
How can an algorithm learn with only a few data points?\nWe outline a program in the area of formalization of mathematics to automate theorem proving in algebra and algebraic geometry. We propose a construction of a dictionary between automated theorem provers and (La)TeX exploiting syntactic parsers. We describe its application to a repository of human-written facts and definitions in algebraic geometry (The Stacks Project). We use deep learning techniques.\nWe explore the following question: Is a decision-making program fair, for some useful definition of fairness? First, we describe how several algorithmic fairness questions can be phrased as program verification problems. Second, we discuss an automated verification technique for proving or disproving fairness of decision-making programs with respect to a probabilistic model of the population.\nThis special issue is dedicated to get a better picture of the relationships between computational linguistics and cognitive science. It specifically raises two questions: \"what is the potential contribution of computational language modeling to cognitive science?\" and conversely: \"what is the influence of cognitive science in contemporary computational linguistics?\"\nWe introduce a kind of partial observability to the projective simulation (PS) learning method via Dirac notation. It is done by adding a projection operator and an observability parameter to the original formulation of the efficiency in PS model. Our examples are from invasion toy problem regarding a multi-agent setting.\nIn reinforcement learning, the standard criterion to evaluate policies in a state is the expectation of (discounted) sum of rewards. However, this criterion may not always be suitable, we consider an alternative criterion based on the notion of quantiles. In the case of episodic reinforcement learning problems, we propose an algorithm based on stochastic approximation with two timescales. 
We evaluate our proposition on a simple model of the TV show Who Wants to Be a Millionaire.\nWe examine three probabilistic concepts related to the sentence \"two variables have no bearing on each other\". We explore the relationships between these three concepts and establish their relevance to the process of constructing similarity networks---a tool for acquiring probabilistic knowledge from human experts. We also establish a precise relationship between connectedness in Bayesian networks and relevance in probability.\nI suggest an approach that helps the online marketers to target their Gamification elements to users by modifying the order of the list of tasks that they send to users. It is more realistic and flexible as it allows the model to learn more parameters when the online marketers collect more data. The targeting approach is scalable and quick, and it can be used over streaming data.\nWe adapt Tomita's Generalized LR algorithm to languages generated by context-free grammars enriched with a shuffle operator. The change involves extensions to the underlying handle-finding finite automaton, construction of parser tables, and the necessary optimizations in constructing a deterministic parser. Our system is motivated by an application from artificial intelligence plan recognition. We argue for the correctness of the system, and discuss future extensions of this work.\nIn this paper, we advocate the use of stratified logical theories for representing probabilistic models. We argue that such encodings can be more interpretable than those obtained in existing frameworks such as Markov logic networks. Among others, this allows for the use of domain experts to improve learned models by directly removing, adding, or modifying logical formulas.\nThis paper discusses the semantics of weighted argumentation graphs that are bipolar, i.e. contain both attack and support relations. The work builds on previous work by Amgoud, Ben-Naim et 
al., which presents and compares several semantics for argumentation graphs that contain only supports or only attacks relationships, respectively.\nWe show how the classic Cramer-Rao bound limits how accurately one can simultaneously estimate values of a large number of Google Ad campaigns (or similarly limit the measurement rate of many confounding A/B tests).\nThis paper introduces ALYSIA: Automated LYrical SongwrIting Application. ALYSIA is based on a machine learning model using Random Forests, and we discuss its success at pitch and rhythm prediction. Next, we show how ALYSIA was used to create original pop songs that were subsequently recorded and produced. Finally, we discuss our vision for the future of Automated Songwriting for both co-creative and autonomous systems.\nWe tackle the issue of finding a good policy when the number of policy updates is limited. This is done by approximating the expected policy reward as a sequence of concave lower bounds which can be efficiently maximized, drastically reducing the number of policy updates required to achieve good performance. We also extend existing methods to negative rewards, enabling the use of control variates.\nDespite the recent progress in automatic theorem provers, proof engineers are still suffering from the lack of powerful proof automation. In this position paper we first report our proof strategy language based on a meta-tool approach. Then, we propose an AI-based approach to drastically improve proof automation for Isabelle, while identifying three major challenges we plan to address for this objective.\nWe introduce a new family of minmax rank aggregation problems under two distance measures, the Kendall {\\tau} and the Spearman footrule. As the problems are NP-hard, we proceed to describe a number of constant-approximation algorithms for solving them. 
We conclude with illustrative applications of the aggregation methods on the Mallows model and genomic data.\nInteraction information is one of the multivariate generalizations of mutual information, which expresses the amount of information shared among a set of variables, beyond the information that is shared in any proper subset of those variables. Unlike (conditional) mutual information, which is always non-negative, interaction information can be negative. We utilize this property to find the direction of causal influences among variables in a triangle topology under some mild assumptions.\nASHACL, a variant of the W3C Shapes Constraint Language, is designed to determine whether an RDF graph meets some conditions. These conditions are grouped into shapes, which validate whether particular RDF terms each meet the constraints of the shape. Shapes are themselves expressed as RDF triples in an RDF graph, called a shapes graph.\nInfluence diagrams are a decision-theoretic extension of probabilistic graphical models. In this paper we show how they can be used to solve the Brachistochrone problem. We present results of numerical experiments on this problem and compare the solution provided by the influence diagram with the optimal solution. The R code used for the experiments is presented in the Appendix.\nWe develop T-SKIRT: a temporal, structured-knowledge, IRT-based method for predicting student responses online. By explicitly accounting for student learning and employing a structured, multidimensional representation of student proficiencies, the model outperforms standard IRT-based methods on an online response prediction task when applied to real responses collected from students interacting with diverse pools of educational content.\nThis paper proposes Monte Carlo Action Programming, a programming language framework for autonomous systems that act in large probabilistic state spaces with high branching factors. 
It comprises formal syntax and semantics of a nondeterministic action programming language. The language is interpreted stochastically via Monte Carlo Tree Search. Effectiveness of the approach is shown empirically.\nWe investigate how to model exchangeability with choice functions. Exchangeability is a structural assessment on a sequence of uncertain variables. We show how such assessments are a special indifference assessment, and how that leads to a counterpart of de Finetti's Representation Theorem, both in a finite and a countable context.\nThis paper proposes an innovative method for segmentation of skin lesions in dermoscopy images developed by the authors, based on fuzzy classification of pixels and histogram thresholding.\nInfluence diagrams are a decision-theoretic extension of probabilistic graphical models. In this paper we show how they can be used to solve the Goddard problem. We present results of numerical experiments with this problem and compare the solutions provided by influence diagrams with the optimal solution.\nCatastrophic forgetting is of special importance in reinforcement learning, as the data distribution is generally non-stationary over time. We study and compare several pseudorehearsal approaches for Q-learning with function approximation in a pole balancing task. We have found that pseudorehearsal seems to assist learning even in such very simple problems, given proper initialization of the rehearsal parameters.\nThe paper discusses scientific and technological problems of dynamic integrated expert systems development. Extensions of problem-oriented methodology for dynamic integrated expert systems development are considered. Attention is paid to the temporal knowledge representation and processing.\nWe present a baseline approach for cross-modal knowledge fusion. 
Different basic fusion methods are evaluated on existing embedding approaches to show the potential of joining knowledge about certain concepts across modalities in a fused concept representation.\nIn this extended abstract, we propose Structured Production Systems (SPS), which extend traditional production systems with well-formed syntactic structures. Due to the richness of structures, structured production systems significantly enhance the expressive power as well as the flexibility of production systems, for instance, to handle uncertainty. We show that different rule application strategies can be reduced into the basic one by utilizing structures. Also, many fundamental approaches in computer science, including automata, grammar and logic, can be captured by structured production systems.\nRecently introduced composition operator for credal sets is an analogy of such operators in probability, possibility, evidence and valuation-based systems theories. It was designed to construct multidimensional models (in the framework of credal sets) from a system of low- dimensional credal sets. In this paper we study its potential from the computational point of view utilizing methods of polyhedral geometry.\nSocial abstract argumentation is a principled way to assign values to conflicting (weighted) arguments. In this note we discuss the important property of the uniqueness of the model.\nRelation Extraction refers to the task of populating a database with tuples of the form $r(e_1, e_2)$, where $r$ is a relation and $e_1$, $e_2$ are entities. Distant supervision is one such technique which tries to automatically generate training examples based on an existing KB such as Freebase. 
This paper is a survey of some of the techniques in distant supervision which primarily rely on Probabilistic Graphical Models (PGMs).\nWe present a rational analysis of curiosity, proposing that people's curiosity is driven by seeking stimuli that maximize their ability to make appropriate responses in the future. This perspective offers a way to unify previous theories of curiosity into a single framework. Experimental results confirm our model's predictions, showing how the relationship between curiosity and confidence can change significantly depending on the nature of the environment.\nIn this paper, we constructed a model to determine weights of criterias and presented a solution for determining the optimal alternative by using the constructed model and relationship analysis between criterias in fuzzy group decision-making problem with different forms of preference information of decision makers on criterias.\nFor most reinforcement learning approaches, the learning is performed by maximizing an accumulative reward that is expectedly and manually defined for specific tasks. However, in real world, rewards are emergent phenomena from the complex interactions between agents and environments. In this paper, we propose an implicit generic reward model for reinforcement learning. Unlike those rewards that are manually defined for specific tasks, such implicit reward is task independent. It only comes from the deviation from the agents' previous experiences.\nIn this paper $\\ast$--compatible extensions of fuzzy relations are studied, generalizing some results obtained by Duggan in case of crisp relations. From this general result are obtained as particular cases fuzzy versions of some important extension theorems for crisp relations (Szpilrajn, Hansson, Suzumura). 
Two notions of consistent closure of a fuzzy relation are introduced.\nToby Walsh in 'The Singularity May Never Be Near' gives six arguments to support his point of view that technological singularity may happen but that it is unlikely. In this paper, we provide analysis of each one of his arguments and arrive at similar conclusions, but with more weight given to the 'likely to happen' probability.\nWe present an initial version of Regular Boardgames general game description language. This stands as an extension of Simplified Boardgames language. Our language is designed to be able to express the rules of a majority of popular boardgames including the complex rules such as promotions, castling, en passant, jump captures, liberty captures, and obligatory moves. The language describes all the above through one consistent general mechanism based on regular expressions, without using exceptions or ad hoc rules.\nThis paper is concerned with the apparent greatest weakness of the Mathematical Theory of Evidence (MTE) of Shafer \\cite{Shafer:76}, which has been strongly criticized by Wasserman \\cite{Wasserman:92ijar} - the relationship to frequencies.   Weaknesses of various proposals of probabilistic interpretation of MTE belief functions are demonstrated.   A new frequency-based interpretation is presented overcoming various drawbacks of earlier interpretations.\nIn this paper, we propose generative probabilistic models for label aggregation. We use Gibbs sampling and a novel variational inference algorithm to perform the posterior inference. Empirical results show that our methods consistently outperform state-of-the-art methods.\nIn this article, we explain in detail the internal structures and databases of a smart health application. Moreover, we describe how to generate a statistically sound synthetic dataset using real-world medical data.\nIn this work we describe and evaluate methods to learn musical embeddings. 
Each embedding is a vector that represents four contiguous beats of music and is derived from a symbolic representation. We consider autoencoding-based methods including denoising autoencoders, and context reconstruction, and evaluate the resulting embeddings on a forward prediction and a classification task.\nPresented is a Julia meta-program that discovers compact theories from data if they exist. It writes candidate theories in Julia and then validates: tossing the bad theories and keeping the good theories. Compactness is measured by a metric: such as the number of space-time derivatives. The underlying algorithm is applicable to a wide variety of combinatorics problems and compactness serves to cut down the search space.\nRepresenting knowledge as high-dimensional vectors in a continuous semantic vector space can help overcome the brittleness and incompleteness of traditional knowledge bases. We present a method for performing deductive reasoning directly in such a vector space, combining analogy, association, and deduction in a straightforward way at each step in a chain of reasoning, drawing on knowledge from diverse sources and ontologies.\nWe study some mathematical aspects of the Mahjong game. In particular, we use combinatorial theory and write a Python program to study some special features of the game. The results confirm some folklore concerning the game, and expose some unexpected results. Related results and possible future research in connection to artificial intelligence are mentioned.\nWe study a novel outlier detection problem that aims to identify abnormal input-output associations in data, whose instances consist of multi-dimensional input (context) and output (responses) pairs. We present our approach that works by analyzing data in the conditional (input--output) relation space, captured by a decomposable probabilistic model. 
Experimental results demonstrate the ability of our approach in identifying multivariate conditional outliers.\nGiven an environment with continuous state spaces and discrete actions, we investigate using a Double Deep Q-learning Reinforcement Agent to find optimal policies using the LunarLander-v2 OpenAI gym environment.\nWe give a non-FPT lower bound on the size of structured decision DNNF and OBDD with decomposable AND-nodes representing CNF-formulas of bounded incidence treewidth. Both models are known to be of FPT size for CNFs of bounded primal treewidth. To the best of our knowledge this is the first parameterized separation of primal treewidth and incidence treewidth for knowledge compilation models.\nWe consider the problem of rational uncertainty about unproven mathematical statements, which G\\\"odel and others have remarked on. Using Bayesian-inspired arguments we build a normative model of fair bets under deductive uncertainty which draws from both probability and the theory of algorithms. We comment on connections to Zeilberger's notion of \"semi-rigorous proofs\", particularly that inherent subjectivity is an obstacle.\nIn this note, we point out a basic link between generative adversarial (GA) training and binary classification -- any powerful discriminator essentially computes an (f-)divergence between real and generated samples. The result, repeatedly re-derived in decision theory, has implications for GA Networks (GANs), providing an alternative perspective on training f-GANs by designing the discriminator loss function.\nLogic-based paradigms are nowadays widely used in many different fields, also thank to the availability of robust tools and systems that allow the development of real-world and industrial applications.   
In this work we present LoIDE, an advanced and modular web-editor for logic-based languages that also integrates with state-of-the-art solvers.\nIn this paper we design and evaluate a Deep-Reinforcement Learning agent that optimizes routing. Our agent adapts automatically to current traffic conditions and proposes tailored configurations that attempt to minimize the network delay. Experiments show very promising performance. Moreover, this approach provides important operational advantages with respect to traditional optimization algorithms.\nThis paper maps out the relation between different approaches for handling preferences in argumentation with strict rules and defeasible assumptions by offering translations between them. The systems we compare are: non-prioritized defeats i.e. attacks, preference-based defeats, and preference-based defeats extended with reverse defeat.\nThis article discusses how the automation of tensor algorithms, based on A Mathematics of Arrays and Psi Calculus, and a new way to represent numbers, Unum Arithmetic, enables mechanically provable, scalable, portable, and more numerically accurate software.\nThe connected autonomous vehicle has been often touted as a technology that will become pervasive in society in the near future. Rather than being stand alone, we examine the need for autonomous vehicles to cooperate and interact within their socio-cyber-physical environments, including the problems cooperation will solve, but also the issues and challenges.\nOver the past decade, the idea of smart homes has been conceived as a potential solution to counter energy crises or to at least mitigate its intensive destructive consequences in the residential building sector.\nCausation has been the issue of philosophic debate since Hippocrates. Recent work defines actual causation in terms of Pearl/Halpern's causality framework, formalizing necessary causes (IJCAI'15). 
This has inspired causality notions in the security domain (CSF'15), which, perhaps surprisingly, formalize sufficient causes instead. We provide an explicit relation between necessary and sufficient causes.\nWe demonstrate applications of topological characteristics of oil and gas reservoirs considered as three-dimensional bodies to geological modeling.\nMao H. (2017, Representing attribute reduction and concepts in concept lattice using graphs. Soft Computing 21(24):7293--7311) claims to make contributions to the study of reduction of attributes in concept lattices by using graph theory. We show that her results are either trivial or already well-known and all three algorithms proposed in the paper are incorrect.\nConstant structure closed semantic systems are the systems each element of which receives its definition through the correspondent unchangeable set of other elements of the system. Discrete time means here that the definitions of the elements change iteratively and simultaneously based on the \"neighbor portraits\" from the previous iteration. I prove that the iterative redefinition process in such class of systems will quickly degenerate into a series of pairwise isomorphic states and discuss some directions of further research.\n: In this paper a method to make inputting electrical model upon factors that affect melting process of high ultra power(UHP) electric furnace by using fuzzy rule and regression model is suggested and its effectiveness is verified with simulation experiment.\nWe propose a framework that directly tackles the probability distribution of the value function parameters in Deep Q Network (DQN), with powerful variational inference subroutines to approximate the posterior of the parameters. We will establish the equivalence between our proposed surrogate objective and variational inference loss. 
Our new algorithm achieves efficient exploration and performs well on large scale chain Markov Decision Process (MDP).\nNintendo's Super Smash Bros. Melee fighting game can be emulated on modern hardware allowing us to inspect internal memory states, such as character positions. We created an AI that avoids being hit by training using these internal memory states and outputting controller button presses. After training on a month's worth of Melee matches, our best agent learned to avoid the toughest AI built into the game for a full minute 74.6% of the time.\nIn this work, we present our findings and experiments for stock-market prediction using various textual sentiment analysis tools, such as mood analysis and event extraction, as well as prediction models, such as LSTMs and specific convolutional architectures.\nCatastrophic forgetting has a significant negative impact in reinforcement learning. The purpose of this study is to investigate how pseudorehearsal can change performance of an actor-critic agent with neural-network function approximation. We tested agent in a pole balancing task and compared different pseudorehearsal approaches. We have found that pseudorehearsal can assist learning and decrease forgetting.\nThis short paper describes our ongoing research on Greenhouse - a zero-positive machine learning system for time-series anomaly detection.\nClassical anomaly detection is principally concerned with point-based anomalies, anomalies that occur at a single data point. In this paper, we present a new mathematical model to express range-based anomalies, anomalies that occur over a range (or period) of time.\nStarting from the observation that rational closure has the undesirable property of being an \"all or nothing\" mechanism, we here propose a multipreferential semantics, which enriches the preferential semantics underlying rational closure in order to separately deal with the inheritance of different properties in an ontology with exceptions. 
We provide a multipreference closure mechanism which is sound with respect to the multipreference semantics.\nThe cognitive theory of true conditions (CTTC) is a proposal to describe the model-theoretic semantics of symbolic cognitive architectures and design the implementation of cognitive abilities. The CTTC is formulated mathematically using the multi-optional many-sorted past present future(MMPPF) structures. This article defines mathematically the MMPPF structures and the formal languages proposed to describe them by the CTTC.\nWe propose the Onto2Vec method, an approach to learn feature vectors for biological entities based on their annotations to biomedical ontologies. Our method can be applied to a wide range of bioinformatics research problems such as similarity-based prediction of interactions between proteins, classification of interaction types using supervised learning, or clustering.\nWe describe an approach to learn, in a term-rewriting setting, function definitions from input/output equations. By confining ourselves to structurally recursive definitions we obtain a fairly fast learning algorithm that often yields definitions close to intuitive expectations. We provide a Prolog prototype implementation of our approach, and indicate open issues of further investigation.\nImplicational bases are objects of interest in formal concept analysis and its applications. Unfortunately, even the smallest base, the Duquenne-Guigues base, has an exponential size in the worst case. In this paper, we use results on the average number of minimal transversals in random hypergraphs to show that the base of proper premises is, on average, of quasi-polynomial size.\nThe paper analyzes the problem of judgments or preferences subsequent to initial analysis by autonomous agents in a hierarchical system where the higher level agents does not have access to group size information. 
We propose methods that reduce instances of preference reversal of the kind encountered in Simpson's paradox.\nWe introduce and discuss, through a computational algebraic geometry approach, the automatic reasoning handling of propositions that are simultaneously true and false over some relevant collections of instances. A rigorous, algorithmic criterion is presented for detecting such cases, and its performance is exemplified through the implementation of this test on the dynamic geometry program GeoGebra.\nThe article describes the technique for designing a domain ontology, shows the flowchart of algorithm design and example of constructing a fragment of the ontology of the subject area of Computer Science is considered.\nThe article presents an overview of current specialized ontology engineering tools, as well as texts' annotation tools based on ontologies. The main functions and features of these tools, their advantages and disadvantages are discussed. A systematic comparative analysis of means for engineering ontologies is presented.\nThis paper describes the design principles of methodology of knowledge-oriented information systems based on ontological approach. Such systems implement technology subject-oriented extraction of knowledge from the set of natural language texts and their formal and logical presentation and application processing\nComputing universal distributed representations of sentences is a fundamental task in natural language processing. We propose a method to learn such representations by encoding the suffixes of word sequences in a sentence and training on the Stanford Natural Language Inference (SNLI) dataset. We demonstrate the effectiveness of our approach by evaluating it on the SentEval benchmark, improving on existing approaches on several transfer tasks.\nIn many neural models, new features as polynomial functions of existing ones are used to augment representations. 
Using the natural language inference task as an example, we investigate the use of scaled polynomials of degree 2 and above as matching features. We find that scaling degree 2 features has the highest impact on performance, reducing classification error by 5% in the best models.\nThe Cognitive Theory of True Conditions (CTTC) is a proposal to design the implementation of cognitive abilities and to describe the model-theoretic semantics of symbolic cognitive architectures. The CTTC is formulated mathematically using the multi-optional many-sorted past present future(MMPPF) structures. This article discussed how decision-making processes are described in the CTTC.\nBias and heterogeneity in peer assessment can lead to the issue of unfair scoring in the educational field. To deal with this problem, we propose a reference ranking method for an online peer assessment system using HodgeRank. Such a scheme provides instructors with an objective scoring reference based on mathematics.\nWe present a formal language with expressions denoting general symbol structures and queries which access information in those structures. A sequence-to-sequence network processing this language learns to encode symbol structures and query them. The learned representation (approximately) shares a simple linearity property with theoretical techniques for performing this task.\nThis article explores the ideas that went into George Boole's development of an algebra for logical inference in his book The Laws of Thought. We explore in particular his wife Mary Boole's claim that he was deeply influenced by Indian logic and argue that his work was more than a framework for processing propositions. By exploring parallels between his work and Indian logic, we are able to explain several peculiarities of this work.\nWe interpret part of the experimental results of Shwartz-Ziv and Tishby [2017]. 
Inspired by these results, we established a conjecture of the dynamics of the machinary of deep neural network. This conjecture can be used to explain the counterpart result by Saxe et al. [2018].\nIn this paper, we study the model theoretical aspects of Weakly Aggregative Modal Logic (WAL), which is a collection of disguised polyadic modal logics with $n$-ary modalities whose arguments are all the same. We give a van-Benthem-Rosen characterization theorem of WAL based on an intuitive notion of bisimulation, and show that WAL has Craig Interpolation.\nCommittee selection with diversity or distributional constraints is a ubiquitous problem. However, many of the formal approaches proposed so far have certain drawbacks including (1) computationally intractability in general, and (2) inability to suggest a solution for certain instances where the hard constraints cannot be met. We propose a practical and polynomial-time algorithm for diverse committee selection that draws on the idea of using soft bounds and satisfies natural axioms.\nThis chapter offers an accessible introduction to the channel-based approach to Bayesian probability theory. This framework rests on algebraic and logical foundations, inspired by the methodologies of programming language semantics. It offers a uniform, structured and expressive language for describing Bayesian phenomena in terms of familiar programming concepts, like channel, predicate transformation and state transformation. The introduction also covers inference in Bayesian networks, which will be modelled by a suitable calculus of string diagrams.\nThis paper discusses the problem of learning language from unprocessed text and speech signals, concentrating on the problem of learning a lexicon. In particular, it argues for a representation of language in which linguistic parameters like words are built by perturbing a composition of existing parameters. 
The power of this representation is demonstrated by several examples in text segmentation and compression, acquisition of a lexicon from raw speech, and the acquisition of mappings between text and artificial representations of meaning.\nA new three-stage computer artificial neural network model of the tip-of-the-tongue phenomenon is proposed. Each word's node is build from some interconnected learned auto-associative two-layer neural networks each of which represents separate word's semantic, lexical, or phonological components. The model synthesizes memory, psycholinguistic, and metamemory approaches, bridges speech errors and naming chronometry research traditions, and can explain quantitatively many tip-of-the-tongue effects.\nUnlike traditional reinforcement learning (RL), market-based RL is in principle applicable to worlds described by partially observable Markov Decision Processes (POMDPs), where an agent needs to learn short-term memories of relevant previous events in order to execute optimal actions. Most previous work, however, has focused on reactive settings (MDPs) instead of POMDPs. Here we reimplement a recent approach to market-based RL and for the first time evaluate it in a toy POMDP setting.\nIn this paper we expose the theoretical background underlying our current research. This consists in the development of behaviour-based knowledge systems, for closing the gaps between behaviour-based and knowledge-based systems, and also between the understandings of the phenomena they model. We expose the requirements and stages for developing behaviour-based knowledge systems and discuss their limits. We believe that these are necessary conditions for the development of higher order cognitive capacities, in artificial and natural cognitive systems.\nMulti-dimensional data classification is an important and challenging problem in many astro-particle experiments. 
Neural networks have proved to be versatile and robust in multi-dimensional data classification. In this article we shall study the classification of gamma from the hadrons for the MAGIC Experiment. Two neural networks have been used for the classification task. One is Multi-Layer Perceptron based on supervised learning and other is Self-Organising Map (SOM), which is based on unsupervised learning technique. The results have been shown and the possible ways of combining these networks have been proposed to yield better and faster classification results.\nThis paper presents an artificial evolutionbased method for stereo image analysis and its application to real-time obstacle detection and avoidance for a mobile robot. It uses the Parisian approach, which consists here in splitting the representation of the robot's environment into a large number of simple primitives, the \"flies\", which are evolved following a biologically inspired scheme and give a fast, low-cost solution to the obstacle detection problem in mobile robotics.\nPertaining to Agent-based Computational Economics (ACE), this work presents two models for the rise and downfall of speculative bubbles through an exchange price fixing based on double auction mechanisms. The first model is based on a finite time horizon context, where the expected dividends decrease along time. The second model follows the {\\em greater fool} hypothesis; the agent behaviour depends on the comparison of the estimated risk with the greater fool's. Simulations shed some light on the influent parameters and the necessary conditions for the apparition of speculative bubbles in an asset market within the considered framework.\nThis is to present work on modifying the Aleph ILP system so that it evaluates the hypothesised clauses in parallel by distributing the data-set among the nodes of a parallel or distributed machine. 
The paper briefly discusses MPI, the interface used to access message-passing libraries for parallel computers and clusters. It then proceeds to describe an extension of YAP Prolog with an MPI interface and an implementation of data-parallel clause evaluation for Aleph through this interface. The paper concludes by testing the data-parallel Aleph on artificially constructed data-sets.\nIn this paper, we study instances of complex neural networks, i.e. neural networks with complex topologies. We use Self-Organizing Map neural networks whose neighbourhood relationships are defined by a complex network, to classify handwritten digits. We show that topology has a small impact on performance and robustness to neuron failures, at least at long learning times. Performance may however be increased (by almost 10%) by artificial evolution of the network topology. In our experimental conditions, the evolved networks are more random than their parents, but display a more heterogeneous degree distribution.\nThe subcellular location of a protein can provide valuable information about its function. With the rapid increase of sequenced genomic data, the need for an automated and accurate tool to predict subcellular localization becomes increasingly important. Many efforts have been made to predict protein subcellular localization. This paper aims to merge the artificial neural networks and bioinformatics to predict the location of protein in yeast genome. We introduce a new subcellular prediction method based on a backpropagation neural network. The results show that the prediction within an error limit of 5 to 10 percentage can be achieved with the system.\nThe convergence properties of the stationary Fokker-Planck algorithm for the estimation of the asymptotic density of stochastic search processes is studied.
Theoretical and empirical arguments for the characterization of convergence of the estimation in the case of separable and nonseparable nonlinear optimization problems are given. Some implications of the convergence of stationary Fokker-Planck learning for the inference of parameters in artificial neural network models are outlined.\nThe management and combination of uncertain, imprecise, fuzzy and even paradoxical or high conflicting sources of information has always been, and still remains today, of primal importance for the development of reliable modern information systems involving artificial reasoning. In this introduction, we present a survey of our recent theory of plausible and paradoxical reasoning, known as Dezert-Smarandache Theory (DSmT), developed for dealing with imprecise, uncertain and conflicting sources of information. We focus our presentation on the foundations of DSmT and on its most important rules of combination, rather than on browsing specific applications of DSmT available in literature. Several simple examples are given throughout this presentation to show the efficiency and the generality of this new approach.\nThe role of T-cells within the immune system is to confirm and assess anomalous situations and then either respond to or tolerate the source of the effect. To illustrate how these mechanisms can be harnessed to solve real-world problems, we present the blueprint of a T-cell inspired algorithm for computer security worm detection. We show how the three central T-cell processes, namely T-cell maturation, differentiation and proliferation, naturally map into this domain and further illustrate how such an algorithm fits into a complete immune inspired computer security system and framework.\nWe discuss how to use a Genetic Regulatory Network as an evolutionary representation to solve a typical GP reinforcement problem, the pole balancing. 
The network is a modified version of an Artificial Regulatory Network proposed a few years ago, and the task could be solved only by finding a proper way of connecting inputs and outputs to the network. We show that the representation is able to generalize well over the problem domain, and discuss the performance of different models of this kind.\nChaotic neural networks have received a great deal of attention these last years. In this paper we establish a precise correspondence between the so-called chaotic iterations and a particular class of artificial neural networks: global recurrent multi-layer perceptrons. We show formally that it is possible to make these iterations behave chaotically, as defined by Devaney, and thus we obtain the first neural networks proven chaotic. Several neural networks with different architectures are trained to exhibit a chaotical behavior.\nA multiagent system may be thought of as an artificial society of autonomous software agents and we can apply concepts borrowed from welfare economics and social choice theory to assess the social welfare of such an agent society. In this paper, we study an abstract negotiation framework where agents can agree on multilateral deals to exchange bundles of indivisible resources. We then analyse how these deals affect social welfare for different instances of the basic framework and different interpretations of the concept of social welfare itself. In particular, we show how certain classes of deals are both sufficient and necessary to guarantee that a socially optimal allocation of resources will be reached eventually.\nIn this paper we present the experimental results of the neural network control of a servo-system in order to control its speed. The control strategy is implemented by using an inverse-model control based on Artificial Neural Networks (ANNs). The network training was performed using two learning algorithms: Levenberg-Marquardt and Bayesian regularization. 
We evaluate the generalization capability for each method according to both the correct operation of the controller to follow the reference signal, and the control efforts developed by the ANN-based controller.\nUnsupervised classification algorithm based on clonal selection principle named Unsupervised Clonal Selection Classification (UCSC) is proposed in this paper. The new proposed algorithm is data driven and self-adaptive, it adjusts its parameters to the data to make the classification operation as fast as possible. The performance of UCSC is evaluated by comparing it with the well known K-means algorithm using several artificial and real-life data sets. The experiments show that the proposed UCSC algorithm is more reliable and has high classification precision comparing to traditional classification methods such as K-means.\nThe ideas about decision making under ignorance in economics are combined with the ideas about uncertainty representation in computer science. The combination sheds new light on the question of how artificial agents can act in a dynamically consistent manner. The notion of sequential consistency is formalized by adapting the law of iterated expectation for plausibility measures. The necessary and sufficient condition for a certainty equivalence operator for Nehring-Puppe's preference to be sequentially consistent is given. This result sheds light on the models of decision making under uncertainty.\nDiscovering causal relations among observed variables in a given data set is a main topic in studies of statistics and artificial intelligence. Recently, some techniques to discover an identifiable causal structure have been explored based on non-Gaussianity of the observed data distribution. However, most of these are limited to continuous data. 
In this paper, we present a novel causal model for binary data and propose a new approach to derive an identifiable causal structure governing the data based on skew Bernoulli distributions of external noise. Experimental evaluation shows excellent performance for both artificial and real world data sets.\nWe focus on a comparative study of three recently developed nature-inspired optimization algorithms, including state transition algorithm, harmony search and artificial bee colony. Their core mechanisms are introduced and their similarities and differences are described. Then, a suit of 27 well-known benchmark problems are used to investigate the performance of these algorithms and finally we discuss their general applicability with respect to the structure of optimization problems.\nDrawing from research on computational models of argumentation (particularly the Carneades Argumentation System), we explore the graphical representation of arguments in a dispute; then, comparing two different traditions on the limits of the justification of decisions, and devising an intermediate, semi-formal, model, we also show that it can shed light on the theory of dispute resolution.   We conclude our paper with an observation on the usefulness of highly constrained reasoning for Online Dispute Resolution systems. Restricting the search space of arguments exclusively to reasons proposed by the parties (vetoing the introduction of new arguments by the human or artificial arbitrator) is the only way to introduce some kind of decidability -- together with foreseeability -- in the argumentation system.\nWe present a method for constructing the log-optimal portfolio using the well-calibrated forecasts of market values. Dawid's notion of calibration and the Blackwell approachability theorem are used for computing well-calibrated forecasts. We select a portfolio using this \"artificial\" probability distribution of market values. 
Our portfolio performs asymptotically at least as well as any stationary portfolio that redistributes the investment at each round using a continuous function of side information. Unlike in classical mathematical finance theory, no stochastic assumptions are made about market values.\nEveryday activities performed by artificial assistants can potentially be executed naively and dangerously given their lack of common sense knowledge. This paper presents conceptual work towards obtaining prior knowledge on the usual modality (passive or active) of any given entity, and their affordance estimates, by extracting high-confidence ability modality semantic relations (X can Y relationship) from non-figurative texts, by analyzing co-occurrence of grammatical instances of subjects and verbs, and verbs and objects. The discussion includes an outline of the concept, potential and limitations, and possible feature and learning framework adoption.\nThis paper deals with the problem of neural code solving. On the basis of the formulated hypotheses the information model of a neuron-detector is suggested, the detector being one of the basic elements of an artificial neural network (ANN). The paper subjects the connectionist paradigm of ANN building to criticism and suggests a new presentation paradigm for ANN building and neuroelements (NE) learning. The adequacy of the suggested model is proved by the fact that is does not contradict the modern propositions of neuropsychology and neurophysiology.\nWe consider the effects of social learning on the individual learning and genetic evolution of a colony of artificial agents capable of genetic, individual and social modes of adaptation. We confirm that there is strong selection pressure to acquire traits of individual learning and social learning when these are adaptive traits. We show that selection pressure for learning of either kind can supress selection pressure for reproduction or greater fitness. 
We show that social learning differs from individual learning in that it can support a second evolutionary system that is decoupled from the biological evolutionary system. This decoupling leads to an emergent interaction where immature agents are more likely to engage in learning activities than mature agents.\nWe propose a new approach for solving combinatorial optimization problem by utilizing the mechanism of chases and escapes, which has a long history in mathematics. In addition to the well-used steepest descent and neighboring search, we perform a chase and escape game on the \"landscape\" of the cost function. We have created a concrete algorithm for the Traveling Salesman Problem. Our preliminary test indicates a possibility that this new fusion of chases and escapes problem into combinatorial optimization search is fruitful.\nAlthough different learning systems are coordinated to afford complex behavior, little is known about how this occurs. This article describes a theoretical framework that specifies how complex behaviors that might be thought to require error-driven learning might instead be acquired through simple reinforcement. This framework includes specific assumptions about the mechanisms that contribute to the evolution of (artificial) neural networks to generate topologies that allow the networks to learn large-scale complex problems using only information about the quality of their performance. The practical and theoretical implications of the framework are discussed, as are possible biological analogs of the approach.\nIn this position paper we present a novel approach to neurobiologically plausible implementation of emotional reactions and behaviors for real-time autonomous robotic systems. The working metaphor we use is the \"day\" and \"night\" phases of mammalian life. During the \"day\" phase a robotic system stores the inbound information and is controlled by a light-weight rule-based system in real time. 
In contrast to that, during the \"night\" phase the stored information is been transferred to the supercomputing system to update the realistic neural network: emotional and behavioral strategies.\nRecent approaches based on artificial neural networks (ANNs) have shown promising results for short-text classification. However, many short texts occur in sequences (e.g., sentences in a document or utterances in a dialog), and most existing ANN-based systems do not leverage the preceding short texts when classifying a subsequent one. In this work, we present a model based on recurrent neural networks and convolutional neural networks that incorporates the preceding short texts. Our model achieves state-of-the-art results on three different datasets for dialog act prediction.\nWe formulate learning of a binary autoencoder as a biconvex optimization problem which learns from the pairwise correlations between encoded and decoded bits. Among all possible algorithms that use this information, ours finds the autoencoder that reconstructs its inputs with worst-case optimal loss. The optimal decoder is a single layer of artificial neurons, emerging entirely from the minimax loss minimization, and with weights learned by convex optimization. 
All this is reflected in competitive experimental results, demonstrating that binary autoencoding can be done efficiently by conveying information in pairwise correlations in an optimal fashion.\nIn order to study the application of artificial intelligence (AI) to dental imaging, we applied AI technology to classify a set of panoramic radiographs using (a) a convolutional neural network (CNN) which is a form of an artificial neural network (ANN), (b) representative image cognition algorithms that implement scale-invariant feature transform (SIFT), and (c) histogram of oriented gradients (HOG).\nExisting models based on artificial neural networks (ANNs) for sentence classification often do not incorporate the context in which sentences appear, and classify sentences individually. However, traditional sentence classification approaches have been shown to greatly benefit from jointly classifying subsequent sentences, such as with conditional random fields. In this work, we present an ANN architecture that combines the effectiveness of typical ANN models to classify sentences in isolation, with the strength of structured prediction. Our model achieves state-of-the-art results on two different datasets for sequential sentence classification in medical abstracts.\nOver 50 million scholarly articles have been published: they constitute a unique repository of knowledge. In particular, one may infer from them relations between scientific concepts, such as synonyms and hyponyms. Artificial neural networks have been recently explored for relation extraction. In this work, we continue this line of work and present a system based on a convolutional neural network to extract relations. 
Our model ranked first in the SemEval-2017 task 10 (ScienceIE) for relation extraction in scientific articles (subtask C).\nWe discuss problems with the standard approaches to evaluation for tasks like visual question answering, and argue that artificial data can be used to address these as a complement to current practice. We demonstrate that with the help of existing 'deep' linguistic processing technology we are able to create challenging abstract datasets, which enable us to investigate the language understanding abilities of multimodal deep learning models in detail.\nRubenstein et al. present an interesting system of programmable self-assembled structure formation using 1000 Kilobot robots. The paper claims to advance work in artificial swarms similar to capabilities of natural systems besides being highly robust. However, the system lacks in terms of matching motility and complex shapes with holes, thereby limiting practical similarity to self-assembly in living systems.\nThe ever increasing prevalence of publicly available structured data on the World Wide Web enables new applications in a variety of domains. In this paper, we provide a conceptual approach that leverages such data in order to explain the input-output behavior of trained artificial neural networks. We apply existing Semantic Web technologies in order to provide an experimental proof of concept.\nCortical minicolumns are considered a model of cortical organization. Their function is still a source of research and not reflected properly in modern architecture of nets in algorithms of Artificial Intelligence. We assume its function and describe it in this article. Furthermore, we show how this proposal allows to construct a new architecture, that is not based on convolutional neural networks, test it on MNIST data and receive close to Convolutional Neural Network accuracy. We also show that the proposed architecture possesses an ability to train on a small quantity of samples. 
To achieve these results, we enable the minicolumns to remember context transformations.\nLiterature involving preferences of artificial agents or human beings often assume their preferences can be represented using a complete transitive binary relation. Much has been written however on different models of preferences. We review some of the reasons that have been put forward to justify more complex modeling, and review some of the techniques that have been proposed to obtain models of such preferences.\nAfter providing a brief historical overview on the synergies between artificial intelligence research, in the areas of evolutionary computations and machine learning, and the optimal design of interplanetary trajectories, we propose and study the use of deep artificial neural networks to represent, on-board, the optimal guidance profile of an interplanetary mission. The results, limited to the chosen test case of an Earth-Mars orbital transfer, extend the findings made previously for landing scenarios and quadcopter dynamics, opening a new research area in interplanetary trajectory planning.\nThe impression of free will is the feeling according to which our choices are neither imposed from our inside nor from outside. It is the sense we are the ultimate cause of our acts. In direct opposition with the universal determinism, the existence of free will continues to be discussed. In this paper, free will is linked to a decisional mechanism: an agent is provided with free will if having performed a predictable choice Cp, it can immediately perform another choice Cr in a random way. The intangible feeling of free will is replaced by a decision-making process including a predictable decision-making process immediately followed by an unpredictable decisional one.\nIn events that are composed by many activities, there is a problem that involves retrieve and management the information of visitors that are visiting the activities. 
This management is crucial to find some activities that are drawing attention of visitors; identify an ideal positioning for activities; which path is more frequented by visitors. In this work, these features are studied using Complex Network theory. For the beginning, an artificial database was generated to study the mentioned features. Secondly, this work shows a method to optimize the event structure that is better than a random method and a recommendation system that achieves ~95% of accuracy.\nThe next-generation wireless networks are evolving into very complex systems because of the very diversified service requirements, heterogeneity in applications, devices, and networks. The mobile network operators (MNOs) need to make the best use of the available resources, for example, power, spectrum, as well as infrastructures. Traditional networking approaches, i.e., reactive, centrally-managed, one-size-fits-all approaches and conventional data analysis tools that have limited capability (space and time) are not competent anymore and cannot satisfy and serve that future complex networks in terms of operation and optimization in a cost-effective way. A novel paradigm of proactive, self-aware, self- adaptive and predictive networking is much needed. The MNOs have access to large amounts of data, especially from the network and the subscribers. Systematic exploitation of the big data greatly helps in making the network smart, intelligent and facilitates cost-effective operation and optimization. In view of this, we consider a data-driven next-generation wireless network model, where the MNOs employ advanced data analytics for their networks. We discuss the data sources and strong drivers for the adoption of the data analytics and the role of machine learning, artificial intelligence in making the network intelligent in terms of being self-aware, self-adaptive, proactive and prescriptive. 
A set of network design and optimization schemes are presented with respect to data analytics. The paper is concluded with a discussion of challenges and benefits of adopting big data analytics and artificial intelligence in the next-generation communication system.\nHandwriting is one of the most important means of daily communication. Although the problem of handwriting recognition has been considered for more than 60 years there are still many open issues, especially in the task of unconstrained handwritten sentence recognition. This paper focuses on the automatic system that recognizes continuous English sentence through a mouse-based gestures in real-time based on Artificial Neural Network. The proposed Artificial Neural Network is trained using the traditional backpropagation algorithm for self supervised neural network which provides the system with great learning ability and thus has proven highly successful in training for feed-forward Artificial Neural Network. The designed algorithm is not only capable of translating discrete gesture moves, but also continuous gestures through the mouse. In this paper we are using the efficient neural network approach for recognizing English sentence drawn by mouse. This approach shows an efficient way of extracting the boundary of the English Sentence and specifies the area of the recognition English sentence where it has been drawn in an image and then used Artificial Neural Network to recognize the English sentence. The proposed approach English sentence recognition (ESR) system is designed and tested successfully. Experimental results show that the higher speed and accuracy were examined.\nWe combine Artificial Immune Systems 'AIS', technology with Collaborative Filtering 'CF' and use it to build a movie recommendation system. We already know that Artificial Immune Systems work well as movie recommenders from previous work by Cayzer and Aickelin 3, 4, 5. 
Here our aim is to investigate the effect of different affinity measure algorithms for the AIS. Two different affinity measures, Kendalls Tau and Weighted Kappa, are used to calculate the correlation coefficients for the movie recommender. We compare the results with those published previously and show that Weighted Kappa is more suitable than others for movie problems. We also show that AIS are generally robust movie recommenders and that, as long as a suitable affinity measure is chosen, results are good.\nA combined Short-Term Learning (STL) and Long-Term Learning (LTL) approach to solving mobile robot navigation problems is presented and tested in both real and simulated environments. The LTL consists of rapid simulations that use a Genetic Algorithm to derive diverse sets of behaviours. These sets are then transferred to an idiotypic Artificial Immune System (AIS), which forms the STL phase, and the system is said to be seeded. The combined LTL-STL approach is compared with using STL only, and with using a handdesigned controller. In addition, the STL phase is tested when the idiotypic mechanism is turned off. The results provide substantial evidence that the best option is the seeded idiotypic system, i.e. the architecture that merges LTL with an idiotypic AIS for the STL. They also show that structurally different environments can be used for the two phases without compromising transferability\nAs introduced by Bentley et al. (2005), artificial immune systems (AIS) are lacking tissue, which is present in one form or another in all living multi-cellular organisms. Some have argued that this concept in the context of AIS brings little novelty to the already saturated field of the immune inspired computational research. This article aims to show that such a component of an AIS has the potential to bring an advantage to a data processing algorithm in terms of data pre-processing, clustering and extraction of features desired by the immune inspired system. 
The proposed tissue algorithm is based on self-organizing networks, such as self-organizing maps (SOM) developed by Kohonen (1996) and an analogy of the so called Toll-Like Receptors (TLR) affecting the activation function of the clusters developed by the SOM.\nDendritic cells are antigen presenting cells that provide a vital link between the innate and adaptive immune system. Research into this family of cells has revealed that they perform the role of coordinating T-cell based immune responses, both reactive and for generating tolerance. We have derived an algorithm based on the functionality of these cells, and have used the signals and differentiation pathways to build a control mechanism for an artificial immune system. We present our algorithmic details in addition to some preliminary results, where the algorithm was applied for the purpose of anomaly detection. We hope that this algorithm will eventually become the key component within a large, distributed immune system, based on sound immunological concepts.\nIn this paper we present an efficient computer aided mass classification method in digitized mammograms using Artificial Neural Network (ANN), which performs benign-malignant classification on region of interest (ROI) that contains mass. One of the major mammographic characteristics for mass classification is texture. ANN exploits this important factor to classify the mass into benign or malignant. The statistical textural features used in characterizing the masses are mean, standard deviation, entropy, skewness, kurtosis and uniformity. The main aim of the method is to increase the effectiveness and efficiency of the classification process in an objective manner to reduce the numbers of false-positive of malignancies. 
Three layers artificial neural network (ANN) with seven features was proposed for classifying the marked regions into benign and malignant and 90.91% sensitivity and 83.87% specificity is achieved that is very much promising compare to the radiologist's sensitivity 75%.\nAn effective procedure to determine the optimal parameters appearing in artificial flockings is proposed in terms of optimization problems. We numerically examine genetic algorithms (GAs) to determine the optimal set of parameters such as the weights for three essential interactions in BOIDS by Reynolds (1987) under `zero-collision' and `no-breaking-up' constraints. As a fitness function (the energy function) to be maximized by the GA, we choose the so-called the $\\gamma$-value of anisotropy which can be observed empirically in typical flocks of starling. We confirm that the GA successfully finds the solution having a large $\\gamma$-value leading-up to a strong anisotropy. The numerical experience shows that the procedure might enable us to make more realistic and efficient artificial flocking of starling even in our personal computers. We also evaluate two distinct types of interactions in agents, namely, metric and topological definitions of interactions. We confirmed that the topological definition can explain the empirical evidence much better than the metric definition does.\nBoth humans and artificial systems frequently use trial and error methods to problem solving. In order to be effective, this type of strategy implies having high quality control knowledge to guide the quest for the optimal solution. Unfortunately, this control knowledge is rarely perfect. Moreover, in artificial systems-as in humans-self-evaluation of one's own knowledge is often difficult. Yet, this self-evaluation can be very useful to manage knowledge and to determine when to revise it. 
The objective of our work is to propose an automated approach to evaluate the quality of control knowledge in artificial systems based on a specific trial and error strategy, namely the informed tree search strategy. Our revision approach consists in analysing the system's execution logs, and in using the belief theory to evaluate the global quality of the knowledge. We present a real-world industrial application in the form of an experiment using this approach in the domain of cartographic generalisation. Thus far, the results of using our approach have been encouraging.\nBack-propagation algorithm is one of the most widely used and popular techniques to optimize the feed forward neural network training. Nature inspired meta-heuristic algorithms also provide derivative-free solution to optimize complex problem. Artificial bee colony algorithm is a nature inspired meta-heuristic algorithm, mimicking the foraging or food source searching behaviour of bees in a bee colony and this algorithm is implemented in several applications for an improved optimized outcome. The proposed method in this paper includes an improved artificial bee colony algorithm based back-propagation neural network training method for fast and improved convergence rate of the hybrid neural network learning method. The result is analysed with the genetic algorithm based back-propagation method, and it is another hybridized procedure of its kind. Analysis is performed over standard data sets, reflecting the light of efficiency of proposed method in terms of convergence speed and rate.\nLandmines, specifically anti-tank mines, cluster bombs, and unexploded ordnance form a serious problem in many countries. Several landmine sweeping techniques are used for minesweeping. This paper presents the design and the implementation of the vision system of an autonomous robot for landmines localization. 
The proposed work develops state-of-the-art techniques in digital image processing for pre-processing captured images of the contaminated area. After enhancement, Artificial Neural Network (ANN) is used in order to identify, recognize and classify the landmines' make and model. The Back-Propagation algorithm is used for training the network. The proposed work proved to be able to identify and classify different types of landmines under various conditions (rotated landmine, partially covered landmine) with a success rate of up to 90%.\nThis document is written with the intention to describe in detail a method and means by which a computer program can reason about the world and in so doing, increase its analogue to a living system. As the literature is rife and it is apparent we, as scientists and engineers, have not found the solution, this document will attempt the solution by grounding its intellectual arguments within tenets of human cognition in Western philosophy. The result will be a characteristic description of a method to describe an artificial system analogous to that performed for a human. The approach was the substance of my Master's thesis, explored more deeply during the course of my postdoc research. It focuses primarily on context awareness and choice set within a boundary of available epistemology, which serves to describe it. Expanded upon, such a description strives to discover agreement with Kant's critique of reason to understand how it could be applied to define the architecture of its design. The intention has never been to mimic human or biological systems, rather, to understand the profoundly fundamental rules, when leveraged correctly, results in an artificial consciousness as noumenon while in keeping with the perception of it as phenomenon.\nDue to imprecision and uncertainties in predicting real world problems, artificial neural network (ANN) techniques have become increasingly useful for modeling and optimization. 
This paper presents an artificial neural network approach for forecasting electric energy consumption. For effective planning and operation of power systems, optimal forecasting tools are needed for energy operators to maximize profit and also to provide maximum satisfaction to energy consumers. Monthly data for electric energy consumed in the Gaza strip was collected from year 1994 to 2013. Data was trained and the proposed model was validated using 2-Fold and K-Fold cross validation techniques. The model has been tested with actual energy consumption data and yields satisfactory performance.\nPersonalization is the process of fitting a model to patient data, a critical step towards application of multi-physics computational models in clinical practice. Designing robust personalization algorithms is often a tedious, time-consuming, model- and data-specific process. We propose to use artificial intelligence concepts to learn this task, inspired by how human experts manually perform it. The problem is reformulated in terms of reinforcement learning. In an off-line phase, Vito, our self-taught artificial agent, learns a representative decision process model through exploration of the computational model: it learns how the model behaves under change of parameters. The agent then automatically learns an optimal strategy for on-line personalization. The algorithm is model-independent; applying it to a new model requires only adjusting few hyper-parameters of the agent and defining the observations to match. The full knowledge of the model itself is not required. Vito was tested in a synthetic scenario, showing that it could learn how to optimize cost functions generically. Then Vito was applied to the inverse problem of cardiac electrophysiology and the personalization of a whole-body circulation model. 
The obtained results suggested that Vito could achieve equivalent, if not better goodness of fit than standard methods, while being more robust (up to 11% higher success rates) and with faster (up to seven times) convergence rate. Our artificial intelligence approach could thus make personalization algorithms generalizable and self-adaptable to any patient and any model.\nThe human intelligence lies in the algorithm, the nature of algorithm lies in the classification, and the classification is equal to outlier detection. A lot of algorithms have been proposed to detect outliers, meanwhile a lot of definitions. Unsatisfying point is that definitions seem vague, which makes the solution an ad hoc one. We analyzed the nature of outliers, and give two clear definitions. We then develop an efficient RDD algorithm, which converts outlier problem to pattern and degree problem. Furthermore, a collapse mechanism was introduced by IIR algorithm, which can be united seamlessly with the RDD algorithm and serve for the final decision. Both algorithms are originated from the study on general AI. The combined edition is named as Pe algorithm, which is the basis of the intelligent decision. Here we introduce longest k-turn subsequence problem and corresponding solution as an example to interpret the function of Pe algorithm in detecting curve-type outliers. We also give a comparison between IIR algorithm and Pe algorithm, where we can get a better understanding at both algorithms. A short discussion about intelligence is added to demonstrate the function of the Pe algorithm. Related experimental results indicate its robustness.\nProblem: This paper addresses the design of an intelligent software system for the IC (incident commander) of a team in order to coordinate actions of agents (field units or robots) in the domain of emergency/crisis response operations. Objective: This paper proposes GICoordinator. 
It is a GIS-based assistant software agent that assists and collaborates with the human planner in strategic planning and macro tasks assignment for centralized multi-agent coordination. Method: Our approach to design GICoordinator was to: analyze the problem, design a complete data model, design an architecture of GICoordinator, specify required capabilities of human and system in coordination problem solving, specify development tools, and deploy. Result: The result was an architecture/design of GICoordinator that contains system requirements. Findings: GICoordinator efficiently integrates geoinformatics with artifice intelligent techniques in order to provide a spatial intelligent coordinator system for an IC to efficiently coordinate and control agents by making macro/strategic decisions. Results define a framework for future works to develop this system.\nThe sociotechnological system is a system constituted of human individuals and their artifacts: technological artifacts, institutions, conceptual and representational systems, worldviews, knowledge systems, culture and the whole biosphere as a volutionary niche. In our view the sociotechnological system as a super-organism is shaped and determined both by the characteristics of the agents involved and the characteristics emergent in their interactions at multiple scales. Our approach to sociotechnological dynamics will maintain a balance between perspectives: the individual and the collective. Accordingly, we analyze dynamics of the Web as a sociotechnological system made of people, computers and digital artifacts (Web pages, databases, search engines, etc.). Making sense of the sociotechnological system while being part of it, is also a constant interplay between pragmatic and value based approaches. The first is focusing on the actualities of the system while the second highlights the observer's projections. 
In our attempt to model sociotechnological dynamics and envision its future, we take special care to make explicit our values as part of the analysis. In sociotechnological systems with a high degree of reflexivity (coupling between the perception of the system and the system's behavior), highlighting values is of critical importance. In this essay, we choose to see the future evolution of the web as facilitating a basic value, that is, continuous open-ended intelligence expansion. By that we mean that we see intelligence expansion as the determinant of the 'greater good' and 'well-being' of both individuals and collectives at all scales. Our working definition of intelligence here is the progressive process of sense-making of self, other, environment and universe. Intelligence expansion, therefore, means an increasing ability of sense-making.\nWith the advancement of huge data generation and data handling capability, Machine Learning and Probabilistic modelling enable an immense opportunity to employ predictive analytics platforms in high-security critical industries, namely data centers, electricity grids, utilities, airports, etc., where downtime minimization is one of the primary objectives. This paper proposes a novel, complete architecture of an intelligent predictive analytics platform, Fault Engine, for a huge device network connected with electrical/information flow. Three unique modules, here proposed, seamlessly integrate with the available technology stack of data handling and connect with middleware to produce online intelligent prediction in critical failure scenarios. The Markov Failure module predicts the severity of a failure along with survival probability of a device at any given instant. The Root Cause Analysis model indicates probable devices as potential root causes employing Bayesian probability assignment and topological sort. 
Finally, a community detection algorithm produces correlated clusters of devices in terms of failure probability, which will further narrow down the search space for finding the root cause. The whole Engine has been tested on networks of different sizes with simulated failure environments and shows its potential to be scalable in real-time implementation.\nThe biological immune system is a robust, complex, adaptive system that defends the body from foreign pathogens. It is able to categorize all cells (or molecules) within the body as self or non-self substances. It does this with the help of a distributed task force that has the intelligence to take action from a local and also a global perspective using its network of chemical messengers for communication. There are two major branches of the immune system. The innate immune system is an unchanging mechanism that detects and destroys certain invading organisms, whilst the adaptive immune system responds to previously unknown foreign cells and builds a response to them that can remain in the body over a long period of time. This remarkable information processing biological system has caught the attention of computer science in recent years. A novel computational intelligence technique, inspired by immunology, has emerged, called Artificial Immune Systems. Several concepts from the immune system have been extracted and applied to solve real-world science and engineering problems. In this tutorial, we briefly describe the immune system metaphors that are relevant to existing Artificial Immune Systems methods. We will then show illustrative real-world problems suitable for Artificial Immune Systems and give a step-by-step algorithm walkthrough for one such problem. A comparison of the Artificial Immune Systems to other well-known algorithms, areas for future work, tips & tricks and a list of resources will round this tutorial off. 
It should be noted that as Artificial Immune Systems is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat from time to time and from those examples given here.\nRiddles based on simple puns can be classified according to the patterns of word, syllable or phrase similarity they depend upon. We have devised a formal model of the semantic and syntactic regularities underlying some of the simpler types of punning riddle. We have also implemented this preliminary theory in a computer program which can generate riddles from a lexicon containing general data about words and phrases; that is, the lexicon content is not customised to produce jokes. Informal evaluation of the program's results by a set of human judges suggests that the riddles produced by this program are of comparable quality to those in general circulation among school children.\nIn this paper, we discuss a model of simple question-answer punning, implemented in a program, JAPE, which generates riddles from humour-independent lexical entries. The model uses two main types of structure: schemata, which determine the relationships between key words in a joke, and templates, which produce the surface form of the joke. JAPE succeeds in generating pieces of text that are recognizably jokes, but some of them are not very good jokes. We mention some potential improvements and extensions, including post-production heuristics for ordering the jokes according to quality.\nWe present an integrated architecture for word-level and sentence-level processing in a unification-based paradigm. The core of the system is a CLP implementation of a unification engine for feature structures supporting relational values. In this framework an HPSG-style grammar is implemented. Word-level processing uses X2MorF, a morphological component based on an extended version of two-level morphology. 
This component is tightly integrated with the grammar as a relation. The advantage of this approach is that morphology and syntax are kept logically autonomous while at the same time minimizing interface problems.\nThis paper is an introduction to natural language interfaces to databases (NLIDBs). A brief overview of the history of NLIDBs is first given. Some advantages and disadvantages of NLIDBs are then discussed, comparing NLIDBs to formal query languages, form-based interfaces, and graphical interfaces. An introduction to some of the linguistic problems NLIDBs have to confront follows, for the benefit of readers less familiar with computational linguistics. The discussion then moves on to NLIDB architectures, portability issues, restricted natural language input systems (including menu-based NLIDBs), and NLIDBs with reasoning capabilities. Some less explored areas of NLIDB research are then presented, namely database updates, meta-knowledge questions, temporal questions, and multi-modal NLIDBs. The paper ends with reflections on the current state of the art.\nThe Earley algorithm is a widely used parsing method in natural language processing applications. We introduce a variant of Earley parsing that is based on a ``delayed'' recognition of constituents. This allows us to start the recognition of a constituent only in cases in which all of its subconstituents have been found within the input string. This is particularly advantageous in several cases in which partial analysis of a constituent cannot be completed and in general in all cases of productions sharing some suffix of their right-hand sides (even for different left-hand side nonterminals). 
Although the two algorithms result in the same asymptotic time and space complexity, from a practical perspective our algorithm improves the time and space requirements of the original method, as shown by reported experimental results.\nWe propose a general-purpose method for finding high-quality solutions to hard optimization problems, inspired by self-organizing processes often found in nature. The method, called Extremal Optimization, successively eliminates extremely undesirable components of sub-optimal solutions. Drawing upon models used to simulate far-from-equilibrium dynamics, it complements approximation methods inspired by equilibrium statistical physics, such as Simulated Annealing. With only one adjustable parameter, its performance proves competitive with, and often superior to, more elaborate stochastic optimization procedures. We demonstrate it here on two classic hard optimization problems: graph partitioning and the traveling salesman problem.\nWe describe an extensive study of search in GSAT, an approximation procedure for propositional satisfiability. GSAT performs greedy hill-climbing on the number of satisfied clauses in a truth assignment. Our experiments provide a more complete picture of GSAT's search than previous accounts. We describe in detail the two phases of search: rapid hill-climbing followed by a long plateau search. We demonstrate that when applied to randomly generated 3SAT problems, there is a very simple scaling with problem size for both the mean number of satisfied clauses and the mean branching rate. Our results allow us to make detailed numerical conjectures about the length of the hill-climbing phase, the average gradient of this phase, and to conjecture that both the average score and average branching rate decay exponentially during plateau search. We end by showing how these results can be used to direct future theoretical analysis. 
This work provides a case study of how computer experiments can be used to improve understanding of the theoretical properties of algorithms.\nAs real logic programmers normally use cut (!), an effective learning procedure for logic programs should be able to deal with it. Because the cut predicate has only a procedural meaning, clauses containing cut cannot be learned using an extensional evaluation method, as is done in most learning systems. On the other hand, searching a space of possible programs (instead of a space of independent clauses) is infeasible. An alternative solution is to first generate a candidate base program which covers the positive examples, and then make it consistent by inserting cut where appropriate. The problem of learning programs with cut has not been investigated before and this seems to be a natural and reasonable approach. We generalize this scheme and investigate the difficulties that arise. Some of the major shortcomings are actually caused, in general, by the need for intensional evaluation. As a conclusion, the analysis of this paper suggests, on precise and technical grounds, that learning cut is difficult, and current induction techniques should probably be restricted to purely declarative logic languages.\nTo support the goal of allowing users to record and retrieve information, this paper describes an interactive note-taking system for pen-based computers with two distinctive features. First, it actively predicts what the user is going to write. Second, it automatically constructs a custom, button-box user interface on request. The system is an example of a learning-apprentice software agent. A machine learning component characterizes the syntax and semantics of the user's information. A performance system uses this learned information to generate completion strings and construct a user interface. Description of Online Appendix: People like to record information. 
Doing this on paper is initially efficient, but lacks flexibility. Recording information on a computer is less efficient but more powerful. In our new note-taking software, the user records information directly on a computer. Behind the interface, an agent acts for the user. To help, it provides defaults and constructs a custom user interface. The demonstration is a QuickTime movie of the note-taking agent in action. The file is a binhexed self-extracting archive. Macintosh utilities for binhex are available from mac.archive.umich.edu. QuickTime is available from ftp.apple.com in the dts/mac/sys.soft/quicktime directory.\nTerminological knowledge representation systems (TKRSs) are tools for designing and using knowledge bases that make use of terminological languages (or concept languages). We analyze from a theoretical point of view a TKRS whose capabilities go beyond the ones of presently available TKRSs. The new features studied, often required in practical applications, can be summarized in three main points. First, we consider a highly expressive terminological language, called ALCNR, including general complements of concepts, number restrictions and role conjunction. Second, we allow the expression of inclusion statements between general concepts, and terminological cycles as a particular case. Third, we prove the decidability of a number of desirable TKRS-deduction services (like satisfiability, subsumption and instance checking) through a sound, complete and terminating calculus for reasoning in ALCNR-knowledge bases. Our calculus extends the general technique of constraint systems. As a byproduct of the proof, we also obtain the result that inclusion statements in ALCNR can be simulated by terminological cycles, if descriptive semantics is adopted.\nA formalism is presented for computing and organizing actions for autonomous agents in dynamic environments. 
We introduce the notion of teleo-reactive (T-R) programs whose execution entails the construction of circuitry for the continuous computation of the parameters and conditions on which agent action is based. In addition to continuous feedback, T-R programs support parameter binding and recursion. A primary difference between T-R programs and many other circuit-based systems is that the circuitry of T-R programs is more compact; it is constructed at run time and thus does not have to anticipate all the contingencies that might arise over all possible runs. In addition, T-R programs are intuitive and easy to write and are written in a form that is compatible with automatic planning and learning methods. We briefly describe some experimental applications of T-R programs in the control of simulated and actual mobile robots.\nThe ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the minimum description length principle. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. SUBDUE uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by SUBDUE to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate SUBDUE's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain. 
Description of Online Appendix: This is a compressed tar file containing the SUBDUE discovery system, written in C. The program accepts as input databases represented in graph form, and will output discovered substructures with their corresponding value.\nThe theory revision problem is the problem of how best to go about revising a deficient domain theory using information contained in examples that expose inaccuracies. In this paper we present our approach to the theory revision problem for propositional domain theories. The approach described here, called PTR, uses probabilities associated with domain theory elements to numerically track the ``flow'' of proof through the theory. This allows us to measure the precise role of a clause or literal in allowing or preventing a (desired or undesired) derivation for a given example. This information is used to efficiently locate and repair flawed elements of the theory. PTR is proved to converge to a theory which correctly classifies all examples, and shown experimentally to be fast and accurate even for deep theories.\nThis paper analyzes the correctness of the subsumption algorithm used in CLASSIC, a description logic-based knowledge representation system that is being used in practical applications. In order to deal efficiently with individuals in CLASSIC descriptions, the developers have had to use an algorithm that is incomplete with respect to the standard, model-theoretic semantics for description logics. We provide a variant semantics for descriptions with respect to which the current implementation is complete, and which can be independently motivated. The soundness and completeness of the polynomial-time subsumption algorithm is established using description graphs, which are an abstracted version of the implementation structures used in CLASSIC, and are of independent interest.\nInformation extraction is the task of automatically picking up information of interest from an unconstrained text. 
Information of interest is usually extracted in two steps. First, sentence level processing locates relevant pieces of information scattered throughout the text; second, discourse processing merges coreferential information to generate the output. In the first step, pieces of information are locally identified without recognizing any relationships among them. A key word search or simple pattern search can achieve this purpose. The second step requires deeper knowledge in order to understand relationships among separately identified pieces of information. Previous information extraction systems focused on the first step, partly because they were not required to link up each piece of information with other pieces. To link the extracted pieces of information and map them onto a structured output format, complex discourse processing is essential. This paper reports on a Japanese information extraction system that merges information using a pattern matcher and discourse processor. Evaluation results show a high level of system performance which approaches human performance.\nThe vast amounts of on-line text now available have led to renewed interest in information extraction (IE) systems that analyze unrestricted text, producing a structured representation of selected information from the text. This paper presents a novel approach that uses machine learning to acquire knowledge for some of the higher level IE processing. Wrap-Up is a trainable IE discourse component that makes intersentential inferences and identifies logical relations among information extracted from the text. Previous corpus-based approaches were limited to lower level processing such as part-of-speech tagging, lexical disambiguation, and dictionary construction. Wrap-Up is fully trainable, and not only automatically decides what classifiers are needed, but even derives the feature set for each classifier automatically. 
Performance equals that of a partially trainable discourse module requiring manual customization for each domain.\nThis paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided, including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper concludes by sketching some implications for data analysis and summarizing how some popular algorithms fall within the framework presented. The main original contributions here are the decomposition techniques and the demonstration that graphical models provide a framework for understanding and developing complex learning algorithms.\nFor many years, the intuitions underlying partial-order planning were largely taken for granted. Only in the past few years has there been renewed interest in the fundamental principles underlying this paradigm. In this paper, we present a rigorous comparative analysis of partial-order and total-order planning by focusing on two specific planners that can be directly compared. We show that there are some subtle assumptions that underlie the widespread intuitions regarding the supposed efficiency of partial-order planning. 
For instance, the superiority of partial-order planning can depend critically upon the search strategy and the structure of the search space. Understanding the underlying assumptions is crucial for constructing efficient planners.\nThe paradigms of transformational planning, case-based planning, and plan debugging all involve a process known as plan adaptation - modifying or repairing an old plan so it solves a new problem. In this paper we provide a domain-independent algorithm for plan adaptation, demonstrate that it is sound, complete, and systematic, and compare it to other adaptation algorithms in the literature. Our approach is based on a view of planning as searching a graph of partial plans. Generative planning starts at the graph's root and moves from node to node using plan-refinement operators. In planning by adaptation, a library plan - an arbitrary node in the plan graph - is the starting point for the search, and the plan-adaptation algorithm can apply both the same refinement operators available to a generative planner and can also retract constraints and steps from the plan. Our algorithm's completeness ensures that the adaptation algorithm will eventually search the entire graph and its systematicity ensures that it will do so without redundantly searching any parts of the graph.\nTemporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal credit assignment in reinforcement learning. Well known reinforcement learning algorithms, such as AHC or Q-learning, may be viewed as instances of TD learning. This paper examines the issues of the efficient and general implementation of TD(lambda) for arbitrary lambda, for use with reinforcement learning algorithms optimizing the discounted sum of rewards. 
The traditional approach, based on eligibility traces, is argued to suffer from both inefficiency and lack of generality. The TTD (Truncated Temporal Differences) procedure is proposed as an alternative that only approximates TD(lambda), but requires very little computation per action and can be used with arbitrary function representation methods. The idea from which it is derived is fairly simple and not new, but probably unexplored so far. Encouraging experimental results are presented, suggesting that using lambda > 0 with the TTD procedure allows one to obtain a significant learning speedup at essentially the same cost as ordinary TD(0) learning.\nThe DNA promoter sequences domain theory and database have become popular for testing systems that integrate empirical and analytical learning. This note reports a simple change and reinterpretation of the domain theory in terms of M-of-N concepts, involving no learning, that results in an accuracy of 93.4% on the 106 items of the database. Moreover, an exhaustive search of the space of M-of-N domain theory interpretations indicates that the expected accuracy of a randomly chosen interpretation is 76.5%, and that a maximum accuracy of 97.2% is achieved in 12 cases. This demonstrates the informativeness of the domain theory, without the complications of understanding the interactions between various learning algorithms and the theory. In addition, our results help characterize the difficulty of learning using the DNA promoters theory.\nThis paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. 
ICET is compared here with three other algorithms for cost-sensitive classification - EG2, CS-ID3, and IDX - and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search.\nMany studies have been carried out in order to increase the search efficiency of constraint satisfaction problems; among them, some make use of structural properties of the constraint network; others take into account semantic properties of the constraints, generally assuming that all the constraints possess the given property. In this paper, we propose a new decomposition method benefiting from both semantic properties of functional constraints (not bijective constraints) and structural properties of the network; furthermore, not all the constraints need to be functional. We show that under some conditions, the existence of solutions can be guaranteed. We first characterize a particular subset of the variables, which we name a root set. 
We then introduce pivot consistency, a new local consistency which is a weak form of path consistency and can be achieved in O(n^2d^2) complexity (instead of O(n^3d^3) for path consistency), and we present associated properties; in particular, we show that any consistent instantiation of the root set can be linearly extended to a solution, which leads to the presentation of the aforementioned new method for solving by decomposing functional CSPs.\nWe study the process of multi-agent reinforcement learning in the context of load balancing in a distributed system, without use of either central coordination or explicit communication. We first define a precise framework in which to study adaptive load balancing, important features of which are its stochastic nature and the purely local information available to individual agents. Given this framework, we show illuminating results on the interplay between basic adaptive behavior parameters and their effect on system efficiency. We then investigate the properties of adaptive load balancing in heterogeneous populations, and address the issue of exploration vs. exploitation in that context. Finally, we show that naive use of communication may not improve, and might even harm system efficiency.\nWe present algorithms that learn certain classes of function-free recursive logic programs in polynomial time from equivalence queries. In particular, we show that a single k-ary recursive constant-depth determinate clause is learnable. Two-clause programs consisting of one learnable recursive clause and one constant-depth determinate non-recursive clause are also learnable, if an additional ``basecase'' oracle is assumed. These results immediately imply the pac-learnability of these classes. 
Although these classes of learnable recursive programs are very constrained, it is shown in a companion paper that they are maximally general, in that generalizing either class in any natural way leads to a computationally difficult learning problem. Thus, taken together with its companion paper, this paper establishes a boundary of efficient learnability for recursive logic programs.\nIn a companion paper it was shown that the class of constant-depth determinate k-ary recursive clauses is efficiently learnable. In this paper we present negative results showing that any natural generalization of this class is hard to learn in Valiant's model of pac-learnability. In particular, we show that the following program classes are cryptographically hard to learn: programs with an unbounded number of constant-depth linear recursive clauses; programs with one constant-depth determinate clause containing an unbounded number of recursive calls; and programs with one linear recursive clause of constant locality. These results immediately imply the non-learnability of any more general class of programs. We also show that learning a constant-depth determinate program with either two linear recursive clauses or one linear recursive clause and one non-recursive clause is as hard as learning boolean DNF. Together with positive results from the companion paper, these negative results establish a boundary of efficient learnability for recursive function-free clauses.\nThis paper presents a method for inducing logic programs from examples that learns a new class of concepts called first-order decision lists, defined as ordered lists of clauses each ending in a cut. The method, called FOIDL, is based on FOIL (Quinlan, 1990) but employs intensional background knowledge and avoids the need for explicit negative examples. 
It is particularly useful for problems that involve rules with specific exceptions, such as learning the past tense of English verbs, a task widely studied in the context of the symbolic/connectionist debate. FOIDL is able to learn concise, accurate programs for this problem from significantly fewer examples than previous methods (both connectionist and symbolic).\nAbstraction is one of the most promising approaches to improving the performance of problem solvers. In several domains, abstraction by dropping sentences of a domain description, as used in most hierarchical planners, has proven useful. In this paper we present examples which illustrate significant drawbacks of abstraction by dropping sentences. To overcome these drawbacks, we propose a more general view of abstraction involving the change of representation language. We have developed a new abstraction methodology and a related sound and complete learning algorithm that allows the complete change of representation language of planning cases from concrete to abstract. However, to achieve a powerful change of the representation language, the abstract language itself as well as rules which describe admissible ways of abstracting states must be provided in the domain model. This new abstraction approach is the core of Paris (Plan Abstraction and Refinement in an Integrated System), a system in which abstract planning cases are automatically learned from given concrete cases. An empirical study in the domain of process planning in mechanical engineering shows significant advantages of the proposed reasoning from abstract cases over classical hierarchical planning.\nIdentifying inaccurate data has long been regarded as a significant and difficult problem in AI. In this paper, we present a new method for identifying inaccurate data on the basis of qualitative correlations among related data. First, we introduce the definitions of related data and qualitative correlations among related data. 
Then we put forward a new concept called support coefficient function (SCF). SCF can be used to extract, represent, and calculate qualitative correlations among related data within a dataset. We propose an approach to determining dynamic shift intervals of inaccurate data, and an approach to calculating possibility of identifying inaccurate data, respectively. Both of the approaches are based on SCF. Finally we present an algorithm for identifying inaccurate data by using qualitative correlations among related data as confirmatory or disconfirmatory evidence. We have developed a practical system for interpreting infrared spectra by applying the method, and have fully tested the system against several hundred real spectra. The experimental results show that the method is significantly better than the conventional methods used in many similar systems.\nThis paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes very difficult the task of learning to represent long-term context for sequential data. This phenomenon hurts the forward propagation of long-term context information, as well as learning a hidden state representation to represent long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., the transition probability matrices are sparse and the model essentially deterministic. The results found in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.\nFunctionality-based recognition systems recognize objects at the category level by reasoning about how well the objects support the expected function. 
Such systems naturally associate a ``measure of goodness'' or ``membership value'' with a recognized object. This measure of goodness is the result of combining individual measures, or membership values, from potentially many primitive evaluations of different properties of the object's shape. A membership function is used to compute the membership value when evaluating a primitive of a particular physical property of an object. In previous versions of a recognition system known as Gruff, the membership function for each of the primitive evaluations was hand-crafted by the system designer. In this paper, we provide a learning component for the Gruff system, called Omlet, that automatically learns membership functions given a set of example objects labeled with their desired category measure. The learning algorithm is generally applicable to any problem in which low-level membership values are combined through an and-or tree structure to give a final overall membership value.\nThis paper presents an approach to learning from situated, interactive tutorial instruction within an ongoing agent. Tutorial instruction is a flexible (and thus powerful) paradigm for teaching tasks because it allows an instructor to communicate whatever types of knowledge an agent might need in whatever situations might arise. To support this flexibility, however, the agent must be able to learn multiple kinds of knowledge from a broad range of instructional interactions. Our approach, called situated explanation, achieves such learning through a combination of analytic and inductive techniques. It combines a form of explanation-based learning that is situated for each instruction with a full suite of contextually guided responses to incomplete explanations. The approach is implemented in an agent called Instructo-Soar that learns hierarchies of new tasks and other domain knowledge from interactive natural language instructions. 
Instructo-Soar meets three key requirements of flexible instructability that distinguish it from previous systems: (1) it can take known or unknown commands at any instruction point; (2) it can handle instructions that apply to either its current situation or to a hypothetical situation specified in language (as in, for instance, conditional instructions); and (3) it can learn, from instructions, each class of knowledge it uses to perform tasks.\nThe main aim of this work is the development of a vision-based road detection system fast enough to cope with the difficult real-time constraints imposed by moving vehicle applications. The hardware platform, a special-purpose massively parallel system, has been chosen to minimize system production and operational costs. This paper presents a novel approach to expectation-driven low-level image segmentation, which can be mapped naturally onto mesh-connected massively parallel SIMD architectures capable of handling hierarchical data structures. The input image is assumed to contain a distorted version of a given template; a multiresolution stretching process is used to reshape the original template in accordance with the acquired image content, minimizing a potential function. The distorted template is the process output.\nIn the area of inductive learning, generalization is a main operation, and the usual definition of induction is based on logical implication. Recently there has been a rising interest in clausal representation of knowledge in machine learning. Almost all inductive learning systems that perform generalization of clauses use the relation theta-subsumption instead of implication. The main reason is that there is a well-known and simple technique to compute least general generalizations under theta-subsumption, but not under implication. 
However, generalization under theta-subsumption is inappropriate for learning recursive clauses, which is a crucial problem since recursion is the basic program structure of logic programs. We note that implication between clauses is undecidable, and we therefore introduce a stronger form of implication, called T-implication, which is decidable between clauses. We show that for every finite set of clauses there exists a least general generalization under T-implication. We describe a technique to reduce generalizations under implication of a clause to generalizations under theta-subsumption of what we call an expansion of the original clause. Moreover, we show that for every non-tautological clause there exists a T-complete expansion, which means that every generalization under T-implication of the clause is reduced to a generalization under theta-subsumption of the expansion.\nCharacteristic models are an alternative, model based, representation for Horn expressions. It has been shown that these two representations are incomparable and each has its advantages over the other. It is therefore natural to ask what is the cost of translating, back and forth, between these representations. Interestingly, the same translation questions arise in database theory, where it has applications to the design of relational databases. This paper studies the computational complexity of these problems. Our main result is that the two translation problems are equivalent under polynomial reductions, and that they are equivalent to the corresponding decision problem. Namely, translating is equivalent to deciding whether a given set of models is the set of characteristic models for a given Horn expression. We also relate these problems to the hypergraph transversal problem, a well known problem which is related to other applications in AI and for which no polynomial time algorithm is known. 
It is shown that in general our translation problems are at least as hard as the hypergraph transversal problem, and in a special case they are equivalent to it.\nMany applications -- from planning and scheduling to problems in molecular biology -- rely heavily on a temporal reasoning component. In this paper, we discuss the design and empirical analysis of algorithms for a temporal reasoning system based on Allen's influential interval-based framework for representing temporal information. At the core of the system are algorithms for determining whether the temporal information is consistent, and, if so, finding one or more scenarios that are consistent with the temporal information. Two important algorithms for these tasks are a path consistency algorithm and a backtracking algorithm. For the path consistency algorithm, we develop techniques that can result in up to a ten-fold speedup over an already highly optimized implementation. For the backtracking algorithm, we develop variable and value ordering heuristics that are shown empirically to dramatically improve the performance of the algorithm. As well, we show that a previously suggested reformulation of the backtracking search problem can reduce the time and space requirements of the backtracking search. Taken together, the techniques we develop allow a temporal reasoning component to solve problems that are of practical size.\nTraditional databases commonly support efficient query and update procedures that operate in time which is sublinear in the size of the database. Our goal in this paper is to take a first step toward dynamic reasoning in probabilistic databases with comparable efficiency. We propose a dynamic data structure that supports efficient algorithms for updating and querying singly connected Bayesian networks. In the conventional algorithm, new evidence is absorbed in O(1) time and queries are processed in time O(N), where N is the size of the network. 
We propose an algorithm which, after a preprocessing phase, allows us to answer queries in time O(log N) at the expense of O(log N) time per evidence absorption. The usefulness of sub-linear processing time manifests itself in applications requiring (near) real-time response over large probabilistic databases. We briefly discuss a potential application of dynamic probabilistic reasoning in computational biology.\nTermination of logic programs with negated body atoms (here called general logic programs) is an important topic. One reason is that many computational mechanisms used to process negated atoms, like Clark's negation as failure and Chan's constructive negation, are based on termination conditions. This paper introduces a methodology for proving termination of general logic programs w.r.t. the Prolog selection rule. The idea is to distinguish parts of the program depending on whether or not their termination depends on the selection rule. To this end, the notions of low-, weakly up-, and up-acceptable program are introduced. We use these notions to develop a methodology for proving termination of general logic programs, and show how interesting problems in non-monotonic reasoning can be formalized and implemented by means of terminating general logic programs.\nThis paper presents new experimental evidence against the utility of Occam's razor. A systematic procedure is presented for post-processing decision trees produced by C4.5. This procedure was derived by rejecting Occam's razor and instead attending to the assumption that similar objects are likely to belong to the same class. It increases a decision tree's complexity without altering the performance of that tree on the training data from which it is inferred. The resulting more complex decision trees are demonstrated to have, on average, for a variety of common learning tasks, higher predictive accuracy than the less complex original decision trees. 
This result raises considerable doubt about the utility of Occam's razor as it is commonly applied in modern machine learning.\nThe main operations in Inductive Logic Programming (ILP) are generalization and specialization, which only make sense in a generality order. In ILP, the three most important generality orders are subsumption, implication and implication relative to background knowledge. The two languages used most often are languages of clauses and languages of only Horn clauses. This gives a total of six different ordered languages. In this paper, we give a systematic treatment of the existence or non-existence of least generalizations and greatest specializations of finite sets of clauses in each of these six ordered sets. We survey results already obtained by others and also contribute some answers of our own. Our main new results are, firstly, the existence of a computable least generalization under implication of every finite set of clauses containing at least one non-tautologous function-free clause (among other, not necessarily function-free clauses). Secondly, we show that such a least generalization need not exist under relative implication, not even if both the set that is to be generalized and the background knowledge are function-free. Thirdly, we give a complete discussion of existence and non-existence of greatest specializations in each of the six ordered languages.\nThis paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. 
The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word ``reinforcement.'' The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.\nAlthough most scheduling problems are NP-hard, domain specific techniques perform well in practice but are quite expensive to construct. In adaptive problem solving, domain specific knowledge is acquired automatically for a general problem solver with a flexible control architecture. In this approach, a learning system explores a space of possible heuristic methods for one well-suited to the eccentricities of the given domain and problem distribution. In this article, we discuss an application of the approach to scheduling satellite communications. Using problem distributions based on actual mission requirements, our approach identifies strategies that not only decrease the amount of CPU time required to produce schedules, but also increase the percentage of problems that are solvable within computational resource limitations.\nSpeedup learning seeks to improve the computational efficiency of problem solving with experience. In this paper, we develop a formal framework for learning efficient problem solving from random problems and their solutions. We apply this framework to two different representations of learned knowledge, namely control rules and macro-operators, and prove theorems that identify sufficient conditions for learning in each representation. 
Our proofs are constructive in that they are accompanied by learning algorithms. Our framework captures both empirical and explanation-based speedup learning in a unified fashion. We illustrate our framework with implementations in two domains: symbolic integration and Eight Puzzle. This work integrates many strands of experimental and theoretical work in machine learning, including empirical learning of control rules, macro-operator learning, Explanation-Based Learning (EBL), and Probably Approximately Correct (PAC) Learning.\nA fundamental assumption made by classical AI planners is that there is no uncertainty in the world: the planner has full knowledge of the conditions under which the plan will be executed and the outcome of every action is fully predictable. These planners cannot therefore construct contingency plans, i.e., plans in which different actions are performed in different circumstances. In this paper we discuss some issues that arise in the representation and construction of contingency plans and describe Cassandra, a partial-order contingency planner. Cassandra uses explicit decision-steps that enable the agent executing the plan to decide which plan branch to follow. The decision-steps in a plan result in subgoals to acquire knowledge, which are planned for in the same way as any other subgoals. Cassandra thus distinguishes the process of gathering information from the process of making decisions. The explicit representation of decisions in Cassandra allows a coherent approach to the problems of contingent planning, and provides a solid base for extensions such as the use of different decision-making procedures.\nAn important problem in geometric reasoning is to find the configuration of a collection of geometric bodies so as to satisfy a set of given constraints. Recently, it has been suggested that this problem can be solved efficiently by symbolically reasoning about geometry. 
This approach, called degrees of freedom analysis, employs a set of specialized routines called plan fragments that specify how to change the configuration of a set of bodies to satisfy a new constraint while preserving existing constraints. A potential drawback, which limits the scalability of this approach, is concerned with the difficulty of writing plan fragments. In this paper we address this limitation by showing how these plan fragments can be automatically synthesized using first principles about geometric bodies, actions, and topology.\nMotivated by the control theoretic distinction between controllable and uncontrollable events, we distinguish between two types of agents within a multi-agent system: controllable agents, which are directly controlled by the system's designer, and uncontrollable agents, which are not under the designer's direct control. We refer to such systems as partially controlled multi-agent systems, and we investigate how one might influence the behavior of the uncontrolled agents through appropriate design of the controlled agents. In particular, we wish to understand which problems are naturally described in these terms, what methods can be applied to influence the uncontrollable agents, the effectiveness of such methods, and whether similar methods work across different domains. Using a game-theoretic framework, this paper studies the design of partially controlled multi-agent systems in two contexts: in one context, the uncontrollable agents are expected utility maximizers, while in the other they are reinforcement learners. We suggest different techniques for controlling agents' behavior in each domain, assess their success, and examine their relationship.\nCue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. 
Correctly classifying cue phrases as discourse or sentential is critical in natural language processing systems that exploit discourse structure, e.g., for performing tasks such as anaphora resolution and plan recognition. This paper explores the use of machine learning for classifying cue phrases as discourse or sentential. Two machine learning programs (Cgrendel and C4.5) are used to induce classification models from sets of pre-classified cue phrases and their features in text and speech. Machine learning is shown to be an effective technique for not only automating the generation of classification models, but also for improving upon previous results. When compared to manually derived classification models already in the literature, the learned models often perform with higher accuracy and contain new linguistic insights into the data. In addition, the ability to automatically construct classification models makes it easier to comparatively analyze the utility of alternative feature representations of the data. Finally, the ease of retraining makes the learning approach more scalable and flexible than manual methods.\nA new method is proposed for exploiting causal independencies in exact Bayesian network inference. A Bayesian network can be viewed as representing a factorization of a joint probability into the multiplication of a set of conditional probabilities. We present a notion of causal independence that enables one to further factorize the conditional probabilities into a combination of even smaller factors and consequently obtain a finer-grain factorization of the joint probability. The new formulation of causal independence lets us specify the conditional probability of a variable given its parents in terms of an associative and commutative operator, such as ``or'', ``sum'' or ``max'', on the contribution of each parent. 
We start with a simple algorithm VE for Bayesian network inference that, given evidence and a query variable, uses the factorization to find the posterior distribution of the query. We show how this algorithm can be extended to exploit causal independence. Empirical studies, based on the CPCS networks for medical diagnosis, show that this method is more efficient than previous methods and allows for inference in larger networks than previous algorithms.\nInstance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes.\nAn algorithm that learns from a set of examples should ideally be able to exploit the available resources of (a) abundant computing power and (b) domain-specific knowledge to improve its ability to generalize. Connectionist theory-refinement systems, which use background knowledge to select a neural network's topology and initial weights, have proven to be effective at exploiting domain-specific knowledge; however, most do not exploit available computing power. 
This weakness occurs because they lack the ability to refine the topology of the neural networks they produce, thereby limiting generalization, especially when given impoverished domain theories. We present the REGENT algorithm which uses (a) domain-specific knowledge to help create an initial population of knowledge-based neural networks and (b) genetic operators of crossover and mutation (specifically designed for knowledge-based networks) to continually search for better network topologies. Experiments on three real-world domains indicate that our new algorithm is able to significantly increase generalization compared to a standard connectionist theory-refinement system, as well as our previous algorithm for growing knowledge-based networks.\nSeveral recent studies have compared the relative efficiency of alternative flaw selection strategies for partial-order causal link (POCL) planning. We review this literature, and present new experimental results that generalize the earlier work and explain some of the discrepancies in it. In particular, we describe the Least-Cost Flaw Repair (LCFR) strategy developed and analyzed by Joslin and Pollack (1994), and compare it with other strategies, including Gerevini and Schubert's (1996) ZLIFO strategy. LCFR and ZLIFO make very different, and apparently conflicting claims about the most effective way to reduce search-space size in POCL planning. We resolve this conflict, arguing that much of the benefit that Gerevini and Schubert ascribe to the LIFO component of their ZLIFO strategy is better attributed to other causes. We show that for many problems, a strategy that combines least-cost flaw selection with the delay of separable threats will be effective in reducing search-space size, and will do so without excessive computational overhead. 
Although such a strategy thus provides a good default, we also show that certain domain characteristics may reduce its effectiveness.\nThe easy-hard-easy pattern in the difficulty of combinatorial search problems as constraints are added has been explained as due to a competition between the decrease in number of solutions and increased pruning. We test the generality of this explanation by examining one of its predictions: if the number of solutions is held fixed by the choice of problems, then increased pruning should lead to a monotonic decrease in search cost. Instead, we find the easy-hard-easy pattern in median search cost even when the number of solutions is held constant, for some search methods. This generalizes previous observations of this pattern and shows that the existing theory does not explain the full range of the peak in search cost. In these cases the pattern appears to be due to changes in the size of the minimal unsolvable subproblems, rather than changing numbers of solutions.\nSEQUITUR is an algorithm that infers a hierarchical structure from a sequence of discrete symbols by replacing repeated phrases with a grammatical rule that generates the phrase, and continuing this process recursively. The result is a hierarchical representation of the original sequence, which offers insights into its lexical structure. The algorithm is driven by two constraints that reduce the size of the grammar, and produce structure as a by-product. SEQUITUR breaks new ground by operating incrementally. Moreover, the method's simple structure permits a proof that it operates in space and time that is linear in the size of the input. Our implementation can process 50,000 symbols per second and has been applied to an extensive range of real world sequences.\nA fundamental goal of research in molecular biology is to understand protein structure. 
Protein crystallography is currently the most successful method for determining the three-dimensional (3D) conformation of a protein, yet it remains labor intensive and relies on an expert's ability to derive and evaluate a protein scene model. In this paper, the problem of protein structure determination is formulated as an exercise in scene analysis. A computational methodology is presented in which a 3D image of a protein is segmented into a graph of critical points. Bayesian and certainty factor approaches are described and used to analyze critical point graphs and identify meaningful substructures, such as alpha-helices and beta-sheets. Results of applying the methodologies to protein images at low and medium resolution are reported. The research is related to approaches to representation, segmentation and classification in vision, as well as to top-down approaches to protein structure prediction.\nPartially observable Markov decision processes (POMDPs) are a natural model for planning problems where effects of actions are nondeterministic and the state of the world is not completely observable. It is difficult to solve POMDPs exactly. This paper proposes a new approximation scheme. The basic idea is to transform a POMDP into another one where additional information is provided by an oracle. The oracle informs the planning agent that the current state of the world is in a certain region. The transformed POMDP is consequently said to be region observable. It is easier to solve than the original POMDP. We propose to solve the transformed POMDP and use its optimal policy to construct an approximate policy for the original POMDP. By controlling the amount of additional information that the oracle provides, it is possible to find a proper tradeoff between computational time and approximation quality. In terms of algorithmic contributions, we study in detail how to exploit region observability in solving the transformed POMDP. 
To facilitate the study, we also propose a new exact algorithm for general POMDPs. The algorithm is conceptually simple and yet is significantly more efficient than all previous exact algorithms.\nLocal search algorithms for combinatorial search problems frequently encounter a sequence of states in which it is impossible to improve the value of the objective function; moves through these regions, called plateau moves, dominate the time spent in local search. We analyze and characterize plateaus for three different classes of randomly generated Boolean Satisfiability problems. We identify several interesting features of plateaus that impact the performance of local search algorithms. We show that local minima tend to be small but occasionally may be very large. We also show that local minima can be escaped without unsatisfying a large number of clauses, but that systematically searching for an escape route may be computationally expensive if the local minimum is large. We show that plateaus with exits, called benches, tend to be much larger than minima, and that some benches have very few exit states which local search can use to escape. We show that the solutions (i.e., global minima) of randomly generated problem instances form clusters, which behave similarly to local minima. We revisit several enhancements of local search algorithms and explain their performance in light of our results. Finally we discuss strategies for creating the next generation of local search algorithms.\nThe assessment of bidirectional heuristic search has been incorrect since it was first published more than a quarter of a century ago. For quite a long time, this search strategy did not achieve the expected results, and there was a major misunderstanding about the reasons behind it. Although there is still wide-spread belief that bidirectional heuristic search is afflicted by the problem of search frontiers passing each other, we demonstrate that this conjecture is wrong. 
Based on this finding, we present both a new generic approach to bidirectional heuristic search and a new approach to dynamically improving heuristic values that is feasible in bidirectional search only. These approaches are put into perspective with both the traditional and more recently proposed approaches in order to facilitate a better overall understanding. Empirical results of experiments with our new approaches show that bidirectional heuristic search can be performed very efficiently and also with limited memory. These results suggest that bidirectional heuristic search appears to be better for solving certain difficult problems than corresponding unidirectional search. This provides some evidence for the usefulness of a search strategy that was long neglected. In summary, we show that bidirectional heuristic search is viable and consequently propose that it be reconsidered.\nIn this paper we consider the problem of `theory patching', in which we are given a domain theory, some of whose components are indicated to be possibly flawed, and a set of labeled training examples for the domain concept. The theory patching problem is to revise only the indicated components of the theory, such that the resulting theory correctly classifies all the training examples. Theory patching is thus a type of theory revision in which revisions are made to individual components of the theory. Our concern in this paper is to determine for which classes of logical domain theories the theory patching problem is tractable. We consider both propositional and first-order domain theories, and show that the theory patching problem is equivalent to that of determining what information contained in a theory is `stable' regardless of what revisions might be performed to the theory. 
We show that determining stability is tractable if the input theory satisfies two conditions: that revisions to each theory component have monotonic effects on the classification of examples, and that theory components act independently in the classification of examples in the theory. We also show how the concepts introduced can be used to determine the soundness and completeness of particular theory patching algorithms.\nThis paper presents a comprehensive approach for model-based diagnosis which includes proposals for characterizing and computing preferred diagnoses, assuming that the system description is augmented with a system structure (a directed graph explicating the interconnections between system components). Specifically, we first introduce the notion of a consequence, which is a syntactically unconstrained propositional sentence that characterizes all consistency-based diagnoses and show that standard characterizations of diagnoses, such as minimal conflicts, correspond to syntactic variations on a consequence. Second, we propose a new syntactic variation on the consequence known as negation normal form (NNF) and discuss its merits compared to standard variations. Third, we introduce a basic algorithm for computing consequences in NNF given a structured system description. We show that if the system structure does not contain cycles, then there is always a linear-size consequence in NNF which can be computed in linear time. For arbitrary system structures, we show a precise connection between the complexity of computing consequences and the topology of the underlying system structure. Finally, we present an algorithm that enumerates the preferred diagnoses characterized by a consequence. The algorithm is shown to take linear time in the size of the consequence if the preference criterion satisfies some general conditions.\nOne of the most common mechanisms used for speeding up problem solvers is macro-learning. 
Macros are sequences of basic operators acquired during problem solving. Macros are used by the problem solver as if they were basic operators. The major problem that macro-learning presents is the vast number of macros that are available for acquisition. Macros increase the branching factor of the search space and can severely degrade problem-solving efficiency. To make macro learning useful, a program must be selective in acquiring and utilizing macros. This paper describes a general method for selective acquisition of macros. Solvable training problems are generated in increasing order of difficulty. The only macros acquired are those that take the problem solver out of a local minimum to a better state. The utility of the method is demonstrated in several domains, including the domain of NxN sliding-tile puzzles. After learning on small puzzles, the system is able to efficiently solve puzzles of any size.\nThe standard approach to logic in the literature in philosophy and mathematics, which has also been adopted in computer science, is to define a language (the syntax), an appropriate class of models together with an interpretation of formulas in the language (the semantics), a collection of axioms and rules of inference characterizing reasoning (the proof theory), and then relate the proof theory to the semantics via soundness and completeness results. Here we consider an approach that is more common in the economics literature, which works purely at the semantic, set-theoretic level. We provide set-theoretic completeness results for a number of epistemic and conditional logics, and contrast the expressive power of the syntactic and set-theoretic approaches.\nWe examine the computational complexity of testing and finding small plans in probabilistic planning domains with both flat and propositional representations.
The complexity of plan evaluation and existence varies with the plan type sought; we examine totally ordered plans, acyclic plans, looping plans, and partially ordered plans under three natural definitions of plan value. We show that problems of interest are complete for a variety of complexity classes: PL, P, NP, co-NP, PP, NP^PP, co-NP^PP, and PSPACE. In the process of proving that certain planning problems are complete for NP^PP, we introduce a new basic NP^PP-complete problem, E-MAJSAT, which generalizes the standard Boolean satisfiability problem to computations involving probabilistic quantities; our results suggest that the development of good heuristics for E-MAJSAT could be important for the creation of efficient algorithms for a wide variety of problems.\nWe address the issues of semantics and conversations for agent communication languages and the Knowledge Query and Manipulation Language (KQML) in particular. Based on ideas from speech act theory, we present a semantic description for KQML that associates ``cognitive'' states of the agent with the use of the language's primitives (performatives). We have used this approach to describe the semantics for the whole set of reserved KQML performatives. Building on the semantics, we devise the conversation policies, i.e., a formal description of how KQML performatives may be combined into KQML exchanges (conversations), using a Definite Clause Grammar. Our research offers methods for a speech act theory-based semantic description of a language of communication acts and for the specification of the protocols associated with these acts. Languages of communication acts address the issue of communication among software applications at a level of abstraction that is useful to the emerging software agents paradigm.\nWe present our approach to the problem of how an agent, within an economic Multi-Agent System, can determine when it should behave strategically (i.e.
learn and use models of other agents), and when it should act as a simple price-taker. We provide a framework for the incremental implementation of modeling capabilities in agents, and a description of the forms of knowledge required. The agents were implemented and different populations simulated in order to learn more about their behavior and the merits of using and learning agent models. Our results show, among other lessons, how savvy buyers can avoid being ``cheated'' by sellers, how price volatility can be used to quantitatively predict the benefits of deeper models, and how specific types of agent populations influence system behavior.\nThe paper presents a constructive fixpoint semantics for autoepistemic logic (AEL). This fixpoint characterizes a unique but possibly three-valued belief set of an autoepistemic theory. It may be three-valued in the sense that for a subclass of formulas F, the fixpoint may not specify whether F is believed or not. The paper presents a constructive 3-valued semantics for autoepistemic logic (AEL). We introduce a derivation operator and define the semantics as its least fixpoint. The semantics is 3-valued in the sense that, for some formulas, the least fixpoint does not specify whether they are believed or not. We show that complete fixpoints of the derivation operator correspond to Moore's stable expansions. In the case of modal representations of logic programs our least fixpoint semantics expresses well-founded semantics or 3-valued Fitting-Kunen semantics (depending on the embedding used). We show that, computationally, our semantics is simpler than the semantics proposed by Moore (assuming that the polynomial hierarchy does not collapse).\nThis article investigates the use of Transformation-Based Error-Driven learning for resolving part-of-speech ambiguity in the Greek language. The aim is not only to study the performance, but also to examine its dependence on different thematic domains. 
Results are presented here for two different test cases: a corpus on \"management succession events\" and a general-theme corpus. The two experiments show that the performance of this method does not depend on the thematic domain of the corpus, and its accuracy for the Greek language is around 95%.\nACLP is a system which combines abductive reasoning and constraint solving by integrating the frameworks of Abductive Logic Programming (ALP) and Constraint Logic Programming (CLP). It forms a general high-level knowledge representation environment for abductive problems in Artificial Intelligence and other areas. In ACLP, the task of abduction is supported and enhanced by its non-trivial integration with constraint solving, facilitating its application to complex problems. The ACLP system is currently implemented on top of the CLP language of ECLiPSe as a meta-interpreter exploiting its underlying constraint solver for finite domains. It has been applied to the problems of planning and scheduling in order to test its computational effectiveness compared with the direct use of the (lower level) constraint solving framework of CLP on which it is built. These experiments provide evidence that the abductive framework of ACLP does not significantly compromise the computational efficiency of the solutions. Other experiments show the natural ability of ACLP to accommodate new or changing requirements of the original problem easily and robustly.\nWe study here the well-known propagation rules for Boolean constraints. First we propose a simple notion of completeness for sets of such rules and establish a completeness result. Then we show an equivalence in an appropriate sense between Boolean constraint propagation and unit propagation, a form of resolution for propositional logic. Subsequently, we characterize one set of such rules by means of the notion of hyper-arc consistency introduced in (Mohr and Masini 1988).
Also, we clarify the status of a similar, though different, set of rules introduced in (Simonis 1989a) and more fully in (Codognet and Diaz 1996).\nThis paper describes an experimental comparison between two standard supervised learning methods, namely Naive Bayes and Exemplar-based classification, on the Word Sense Disambiguation (WSD) problem. The aim of the work is twofold. Firstly, it attempts to help clarify some confusing information in the related literature about the comparison between both methods. In doing so, several directions have been explored, including testing several modifications of the basic learning algorithms and varying the feature space. Secondly, an improvement of both algorithms is proposed, in order to deal with large attribute sets. This modification, which basically consists of using only the positive information appearing in the examples, greatly improves the efficiency of the methods with no loss in accuracy. The experiments have been performed on the largest sense-tagged corpus available containing the most frequent and ambiguous English words. Results show that the Exemplar-based approach to WSD is generally superior to the Bayesian approach, especially when a specific metric for dealing with symbolic attributes is used.\nRational inference relations were introduced by Lehmann and Magidor as the ideal systems for drawing conclusions from a conditional base. However, there has been no simple characterization of these relations, other than their original representation by preferential models. In this paper, we shall characterize them with a class of total preorders of formulas by improving and extending Gardenfors and Makinson's results for expectation inference relations. A second representation is application-oriented and is obtained by considering a class of consequence operators that grade sets of defaults according to our reliance on them.
The finitary fragment of this class of consequence operators has been employed by recent default logic formalisms based on maxiconsistency.\nIn this paper, an application of automated theorem proving techniques to computational semantics is considered. In order to compute the presuppositions of a natural language discourse, several inference tasks arise. Instead of treating these inferences independently of each other, we show how integrating techniques from formal approaches to context into deduction can help to compute presuppositions more efficiently. Contexts are represented as Discourse Representation Structures and the way they are nested is made explicit. In addition, a tableau calculus is presented that keeps track of contextual information and thereby avoids the redundant inference steps that arise in approaches that neglect explicit nesting of contexts.\nMany logic programming based approaches can be used to describe and solve combinatorial search problems. On the one hand there is constraint logic programming which computes a solution as an answer substitution to a query containing the variables of the constraint satisfaction problem. On the other hand there are systems based on stable model semantics, abductive systems, and first order logic model generators which compute solutions as models of some theory. This paper compares these different approaches from the point of view of knowledge representation (how declarative are the programs) and from the point of view of performance (how good are they at solving typical problems).\nWe present an approach for modelling the structure and coarse content of legal documents with a view to providing automated support for the drafting of contracts and contract database retrieval. The approach is designed to be applicable where contract drafting is based on model-form contracts or on existing examples of a similar type.
The main features of the approach are: (1) the representation addresses the structure and the interrelationships between the constituent parts of contracts, but not the text of the document itself; (2) the representation of documents is separated from the mechanisms that manipulate it; and (3) the drafting process is subject to a collection of explicitly stated constraints that govern the structure of the documents. We describe the representation of document instances and of 'generic documents', which are data structures used to drive the creation of new document instances, and we show extracts from a sample session to illustrate the features of a prototype system implemented in MacProlog.\nThe lexicographic closure of any given finite set D of normal defaults is defined. A conditional assertion \"if a then b\" is in this lexicographic closure if, given the defaults D and the fact a, one would conclude b. The lexicographic closure is essentially a rational extension of D, and of its rational closure, defined in a previous paper. It provides a logic of normal defaults that is different from the one proposed by R. Reiter and that is rich enough not to require the consideration of non-normal defaults. A large number of examples are provided to show that the lexicographic closure corresponds to the basic intuitions behind Reiter's logic of defaults.\nWe analyze the problem of defining well-founded semantics for ordered logic programs within a general framework based on alternating fixpoint theory. We start by showing that generalizations of existing answer set approaches to preference are too weak in the setting of well-founded semantics. We then specify some informal yet intuitive criteria and propose a semantical framework for preference handling that is more suitable for defining well-founded semantics for ordered logic programs. The suitability of the new approach is supported by the fact that many attractive properties are satisfied by our semantics.
In particular, our semantics is still correct with respect to various existing answer set semantics, while it successfully overcomes the weakness of their generalization to well-founded semantics. Finally, we indicate how an existing preferred well-founded semantics can be captured within our semantical framework.\nWe implement Groenendijk and Stokhof's partition semantics of questions in a simple question answering algorithm. The algorithm is sound, complete, and based on tableau theorem proving. The algorithm relies on a syntactic characterization of answerhood: Any answer to a question is equivalent to some formula built up only from instances of the question. We prove this characterization by translating the logic of interrogation to classical predicate logic and applying Craig's interpolation theorem.\nRecent advances in Multiagent Systems (MAS) and Epistemic Logic within Distributed Systems Theory have used various combinatorial structures that model both the geometry of the systems and the Kripke model structure of models for the logic. Examining one of the simpler versions of these models, interpreted systems, and the related Kripke semantics of the logic $S5_n$ (an epistemic logic with $n$-agents), the similarities with the geometric/homotopy-theoretic structure of groupoid atlases are striking. These latter objects arise in problems within algebraic K-theory, an area of algebra linked to the study of decomposition and normal form theorems in linear algebra. They have a natural, well-structured notion of path and constructions of path objects, etc., that yield a rich homotopy theory.\nPart of the theory of logic programming and nonmonotonic reasoning concerns the study of fixed-point semantics for these paradigms. Several different semantics have been proposed during the last two decades, and some have been more successful and acknowledged than others.
The rationales behind those various semantics have been manifold, depending on one's point of view, which may be that of a programmer or inspired by commonsense reasoning, and consequently the constructions which lead to these semantics are technically very diverse, and the exact relationships between them have not yet been fully understood. In this paper, we present a conceptually new method, based on level mappings, which provides uniform characterizations of different semantics for logic programs. We will display our approach by giving new and uniform characterizations of some of the major semantics, more particularly of the least model semantics for definite programs, of the Fitting semantics, and of the well-founded semantics. A novel characterization of the weakly perfect model semantics will also be provided.\nWe relate two formerly independent areas: Formal concept analysis and logic of domains. We will establish a correspondence between contextual attribute logic on formal contexts resp. concept lattices and a clausal logic on coherent algebraic cpos. We show how to identify the notion of formal concept in the domain theoretic setting. In particular, we show that a special instance of the resolution rule from the domain logic coincides with the concept closure operator from formal concept analysis. The results shed light on the use of contexts and domains for knowledge representation and reasoning purposes.\nGiven the joint chances of a pair of random variables one can compute quantities of interest, like the mutual information. The Bayesian treatment of unknown chances involves computing, from a second order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectation maximization (EM) algorithm. The two different methods above are well established but typically separated.
This paper joins the two approaches in the case of Dirichlet priors, and derives efficient approximations for the mean, mode and the (co)variance of the chances and the mutual information. Furthermore, we prove the unimodality of the posterior distribution, whence the important property of convergence of EM to the global maximum in the chosen framework. These results are applied to the problem of selecting features for incremental learning and naive Bayes classification. A fast filter based on the distribution of mutual information is shown to outperform the traditional filter based on empirical mutual information on a number of incomplete real data sets.\nIn most real-world settings, due to limited time or other resources, an agent cannot perform all potentially useful deliberation and information gathering actions. This leads to the metareasoning problem of selecting such actions. Decision-theoretic methods for metareasoning have been studied in AI, but there are few theoretical results on the complexity of metareasoning. We derive hardness results for three settings which most real metareasoning systems would have to encompass as special cases. In the first, the agent has to decide how to allocate its deliberation time across anytime algorithms running on different problem instances. We show this to be $\\mathcal{NP}$-complete. In the second, the agent has to (dynamically) allocate its deliberation or information gathering resources across multiple actions that it has to choose among. We show this to be $\\mathcal{NP}$-hard even when evaluating each individual action is extremely simple. In the third, the agent has to (dynamically) choose a limited number of deliberation or information gathering actions to disambiguate the state of the world.
We show that this is $\\mathcal{NP}$-hard under a natural restriction, and $\\mathcal{PSPACE}$-hard in general.\nObject oriented constraint programs (OOCPs) emerge as a leading evolution of constraint programming and artificial intelligence, first applied to a range of industrial applications called configuration problems. The rich variety of technical approaches to solving configuration problems (CLP(FD), CC(FD), DCSP, Terminological systems, constraint programs with set variables ...) is a source of difficulty. No universally accepted formal language exists for communicating about OOCPs, which makes the comparison of systems difficult. We present here a Z based specification of OOCPs which avoids the pitfall of hidden object semantics. The object system is part of the specification, and captures all of the most advanced notions from the object oriented modeling standard UML. The paper illustrates these issues and the conciseness and precision of Z by the specification of a working OOCP that solves an historical AI problem: parsing a context free grammar. Being written in Z, an OOCP specification also supports formal proofs. The whole forms the foundation of an adaptive and evolving framework for communicating about constrained object models and programs.\nA new approach to the construction of general persistent polyhierarchical classifications is proposed. It is based on implicit description of category polyhierarchy by a generating polyhierarchy of classification criteria. Similarly to existing approaches, the classification categories are defined by logical functions encoded by attributive expressions. However, the generating hierarchy explicitly predefines domains of criteria applicability, and the semantics of relations between categories is invariant to changes in the universe composition, to extending the variety of criteria, and to increasing their cardinalities.
The generating polyhierarchy is an independent, compact, portable, and re-usable information structure serving as a template classification. It can be associated with one or more particular sets of objects, included in more general classifications as a standard component, or used as a prototype for more comprehensive classifications. The approach dramatically simplifies development and unplanned modifications of persistent hierarchical classifications compared with tree, DAG, and faceted schemes. It can be efficiently implemented in common DBMS, while considerably reducing the amount of computer resources required for storage, maintenance, and use of complex polyhierarchies.\nIn this paper we present NLML (Natural Language Markup Language), a markup language to describe the syntactic and semantic structure of any grammatically correct English expression. First, related work is analyzed to demonstrate the need for NLML: simple form, easy management and direct storage. Then the description of the English grammar with NLML is introduced in detail at three levels: sentences (with different complexities, voices, moods, and tenses), clauses (relative clauses and noun clauses) and phrases (noun phrase, verb phrase, prepositional phrase, adjective phrase, adverb phrase and predicate phrase). Finally, the application fields of NLML in NLP are shown with two typical examples: NLOJM (Natural Language Object Modal in Java) and NLDB (Natural Language Database).\nThis paper presents a methodology for summarization from multiple documents which are about a specific topic. It is based on the specification and identification of the cross-document relations that occur among textual elements within those documents. Our methodology involves the specification of the topic-specific entities, the messages conveyed for the specific entities by certain textual elements and the specification of the relations that can hold among these messages.
The above resources are necessary for setting up a specific topic for our query-based summarization approach which uses these resources to identify the query-specific messages within the documents and the query-specific relations that connect these messages across documents.\nNeuro-fuzzy systems have attracted growing interest from researchers in various scientific and engineering areas due to the increasing need for intelligent systems. This paper evaluates the use of two popular soft computing techniques and a conventional statistical approach based on the Box-Jenkins autoregressive integrated moving average (ARIMA) model to predict electricity demand in the State of Victoria, Australia. The soft computing methods considered are an evolving fuzzy neural network (EFuNN) and an artificial neural network (ANN) trained using scaled conjugate gradient algorithm (CGA) and backpropagation (BP) algorithm. The forecast accuracy is compared with the forecasts used by Victorian Power Exchange (VPX) and the actual energy demand. To evaluate, we considered load demand patterns for 10 consecutive months taken every 30 min for training the different prediction models. Test results show that the neuro-fuzzy system performed better than neural networks, ARIMA model and the VPX forecasts.\nThe aim of our research was to apply well-known data mining techniques (such as linear neural networks, multi-layered perceptrons, probabilistic neural networks, classification and regression trees, support vector machines and finally a hybrid decision tree neural network approach) to the problem of predicting the quality of service in call centers, based on the performance data actually collected in a call center of a large insurance company. Our aim was two-fold.
First, to compare the performance of models built using the above-mentioned techniques and, second, to analyze the characteristics of the input sensitivity in order to better understand the relationship between the performance evaluation process and the actual performance and in this way help improve the performance of call centers. In this paper we summarize our findings.\nThe framework of algorithmic knowledge assumes that agents use algorithms to compute the facts they explicitly know. In many cases of interest, a deductive system, rather than a particular algorithm, captures the formal reasoning used by the agents to compute what they explicitly know. We introduce a logic for reasoning about both implicit and explicit knowledge with the latter defined with respect to a deductive system formalizing a logical theory for agents. The highly structured nature of deductive systems leads to very natural axiomatizations of the resulting logic when interpreted over any fixed deductive system. The decision problem for the logic, in the presence of a single agent, is NP-complete in general, no harder than propositional logic. It remains NP-complete when we fix a deductive system that is decidable in nondeterministic polynomial time. These results extend in a straightforward way to multiple agents.\nDefeasible argumentation has experienced considerable growth in AI in the last decade. Theoretical results have been combined with the development of practical applications in AI & Law, Case-Based Reasoning and various knowledge-based systems. However, the dialectical process associated with inference is computationally expensive. This paper focuses on speeding up this inference process by pruning the involved search space. Our approach is twofold. On the one hand, we identify distinguished literals for computing defeat.
On the other hand, we restrict ourselves to a subset of all possible conflicting arguments by introducing dialectical constraints.\nNiching enables a genetic algorithm (GA) to maintain diversity in a population. It is particularly useful when the problem has multiple optima where the aim is to find all or as many as possible of these optima. When the fitness landscape of a problem changes over time, the problem is called a non-stationary, dynamic, or time-variant problem. In these problems, niching can maintain useful solutions to respond quickly, reliably and accurately to a change in the environment. In this paper, we present a niching method that works on the problem substructures rather than the whole solution, and therefore has lower space complexity than previously known niching mechanisms. We show that the method responds accurately when environmental changes occur.\nThis paper explores the concept of engagement, the process by which individuals in an interaction start, maintain and end their perceived connection to one another. The paper reports on one aspect of engagement among human interactors: the effect of tracking faces during an interaction. It also describes the architecture of a robot that can participate in conversational, collaborative interactions with engagement gestures. Finally, the paper reports on findings of experiments with human participants who interacted with a robot when it either performed or did not perform engagement gestures. Results of the human-robot studies indicate that people become engaged with robots: they direct their attention to the robot more often in interactions where engagement gestures are present, and they find interactions more appropriate when engagement gestures are present than when they are not.\nWe define a class of probabilistic models in terms of an operator algebra of stochastic processes, and a representation for this class in terms of stochastic parameterized grammars.
A syntactic specification of a grammar is mapped to semantics given in terms of a ring of operators, so that grammatical composition corresponds to operator addition or multiplication. The operators are generators for the time-evolution of stochastic processes. Within this modeling framework one can express data clustering models, logic programs, ordinary and stochastic differential equations, graph grammars, and stochastic chemical reaction kinetics. This mathematical formulation connects these apparently distant fields to one another and to mathematical methods from quantum field theory and operator algebra.\nThis paper is concerned with the reliable inference of optimal tree-approximations to the dependency structure of an unknown distribution generating data. The traditional approach to the problem measures the dependency strength between random variables by the index called mutual information. In this paper reliability is achieved by Walley's imprecise Dirichlet model, which generalizes Bayesian learning with Dirichlet priors. Adopting the imprecise Dirichlet model results in posterior interval expectation for mutual information, and in a set of plausible trees consistent with the data. Reliable inference about the actual tree is achieved by focusing on the substructure common to all the plausible trees. We develop an exact algorithm that infers the substructure in time O(m^4), m being the number of random variables. The new algorithm is applied to a set of data sampled from a known distribution. The method is shown to reliably infer edges of the actual tree even when the data are very scarce, unlike the traditional approach. Finally, we provide lower and upper credibility limits for mutual information under the imprecise Dirichlet model. 
These enable the previous developments to be extended to a full inferential method for trees.\nComputing and storing probabilities is a hard problem as soon as one has to deal with complex distributions over multiple random variables. The problem of efficient representation of probability distributions is central in terms of computational efficiency in the field of probabilistic reasoning. The main problem arises when dealing with joint probability distributions over a set of random variables: they are always represented using huge probability arrays. In this paper, a new method based on binary-tree representation is introduced in order to efficiently store very large joint distributions. Our approach approximates any multidimensional joint distribution using an adaptive discretization of the space. We make the assumption that the lower the probability mass of a particular region of feature space, the larger the discretization step. This assumption leads to a highly optimized representation in terms of time and memory. The other advantages of our approach are the ability to dynamically refine the distribution whenever needed, which leads to a more accurate representation of the probability distribution, and to an anytime representation of the distribution.\nImagination is the critical point in developing realistic artificial intelligence (AI) systems. One way to approach imagination would be simulation of its properties and operations. We developed two models: AI-Brain Network Hierarchy of Languages and Semantical Holographic Calculus as well as the simulation system ScriptWriter that emulate the process of imagination through an automatic animation of English texts.
The purpose of this paper is to demonstrate the model and to present ScriptWriter system http://nvo.sdsc.edu/NVO/JCSG/get_SRB_mime_file2.cgi//home/tamara.sdsc/test/demo.zip?F=/home/tamara.sdsc/test/demo.zip&M=application/x-gtar for simulation of the imagination.\nNorms are essential to extend inference: inferences based on norms are far richer than those based on logical implications. In the recent decades, much effort has been devoted to reason on a domain, once its norms are represented. How to extract and express those norms has received far less attention. Extraction is difficult: as the readers are supposed to know them, the norms of a domain are seldom made explicit. For one thing, extracting norms requires a language to represent them, and this is the topic of this paper. We apply this language to represent norms in the domain of driving, and show that it is adequate to reason on the causes of accidents, as described by car-crash reports.\nIn case-based reasoning, the adaptation of a source case in order to solve the target problem is at the same time crucial and difficult to implement. The reason for this difficulty is that, in general, adaptation strongly depends on domain-dependent knowledge. This fact motivates research on adaptation knowledge acquisition (AKA). This paper presents an approach to AKA based on the principles and techniques of knowledge discovery from databases and data-mining. It is implemented in CABAMAKA, a system that explores the variations within the case base to elicit adaptation knowledge. This system has been successfully tested in an application of case-based reasoning to decision support in the domain of breast cancer treatment.\nOf all the issues discussed at {\\em Alife VII: Looking Forward,   Looking Backward}, the issue of whether it was possible to create an artificial life system that exhibits {\\em open-ended evolution} of novelty is by far the biggest. 
Of the 14 open problems settled on as a result of debate at the conference, some 6 are directly or indirectly related to this issue. Most people equate open-ended evolution with complexity growth, although a priori these seem to be different things. In this paper I report on experiments to measure the complexity of Tierran organisms, and show the results for a {\\em size-neutral} run of Tierra. In this run, no increase in organismal complexity was observed, although organism size did increase through the run. This result is discussed, offering some signposts on the path to solving the issue of open-ended evolution.\nIn this paper, we employ a Probabilistic Neural Network (PNN) with image and data processing techniques to implement a general-purpose automated leaf recognition algorithm. 12 leaf features are extracted and orthogonalized into 5 principal variables, which constitute the input vector of the PNN. The PNN is trained on 1800 leaves to classify 32 kinds of plants with an accuracy greater than 90%. Compared with other approaches, our algorithm is an accurate artificial intelligence approach which is fast in execution and easy to implement.\nCurrent network protection systems use a collection of intelligent components - e.g. classifiers or rule-based firewall systems to detect intrusions and anomalies and to secure a network against viruses, worms, or trojans. However, these network systems rely on individuality and support an architecture with less collaborative work of the protection components. They give less administration support for maintenance, but offer a large number of individual single points of failure - an ideal situation for network attacks to succeed. In this work, we discuss the required features, the performance, and the problems of a distributed protection system called SANA. 
It has a cooperative architecture motivated by the human immune system, where the components correspond to artificial immune cells that are connected for their collaborative work. SANA promises better protection against intruders than commonly known protection systems through adaptive self-management, while using resources efficiently through an intelligent reduction of redundant tasks. We introduce a library of several novel and commonly used protection components and evaluate the performance of SANA by a proof-of-concept implementation.\nCurrent network protection systems use a collection of intelligent components - e.g. classifiers or rule-based firewall systems to detect intrusions and anomalies and to secure a network against viruses, worms, or trojans. However, these network systems rely on individuality and support an architecture with less collaborative work of the protection components. They give less administration support for maintenance, but offer a large number of individual single points of failure - an ideal situation for network attacks to succeed. In this work, we discuss the required features, the performance, and the problems of a distributed protection system called {\\it SANA}. It has a cooperative architecture motivated by the human immune system, where the components correspond to artificial immune cells that are connected for their collaborative work. SANA promises better protection against intruders than commonly known protection systems through adaptive self-management, while using resources efficiently through an intelligent reduction of redundancies. We introduce a library of several novel and commonly used protection components and evaluate the performance of SANA by a proof-of-concept implementation.\nFor medical volume visualization, one of the most important tasks is to reveal clinically relevant details from the 3D scan (CT, MRI ...), e.g. 
the coronary arteries, without obscuring them with less significant parts. These volume datasets contain different materials which are difficult to extract and visualize with 1D transfer functions based solely on the attenuation coefficient. Multi-dimensional transfer functions allow a much more precise classification of data, which makes it easier to separate different surfaces from each other. Unfortunately, setting up multi-dimensional transfer functions can become a fairly complex task, generally accomplished by trial and error. This paper explains neural networks, and then presents an efficient way to speed up the visualization process by semi-automatic transfer function generation. We describe how to use neural networks to detect distinctive features shown in the 2D histogram of the volume data and how to use this information for data classification.\nIn this paper we extend the Inagaki Weighted Operators fusion rule (WO) in information fusion by redistributing not only the conflicting mass but also the masses of non-empty intersections; we call the result Double Weighted Operators (DWO). Then we propose a new fusion rule, Class of Proportional Redistribution of Intersection Masses (CPRIM), which generates many interesting particular fusion rules in information fusion. Both formulas are presented for any number of sources of information. An application and comparison with other fusion rules are given in the last section.\nThe notion of optimality naturally arises in many areas of applied mathematics and computer science concerned with decision making. Here we consider this notion in the context of two formalisms used for different purposes and in different research areas: graphical games and soft constraints. 
We relate the notion of optimality used in the area of soft constraint satisfaction problems (SCSPs) to that used in graphical games, showing that for a large class of SCSPs that includes weighted constraints every optimal solution corresponds to a Nash equilibrium that is also a Pareto efficient joint strategy.\nA lattice-theoretic framework is introduced that permits the study of the conditional independence (CI) implication problem relative to the class of discrete probability measures. Semi-lattices are associated with CI statements and a finite, sound and complete inference system relative to semi-lattice inclusions is presented. This system is shown to be (1) sound and complete for saturated CI statements, (2) complete for general CI statements, and (3) sound and complete for stable CI statements. These results yield a criterion that can be used to falsify instances of the implication problem and several heuristics are derived that approximate this \"lattice-exclusion\" criterion in polynomial time. Finally, we provide experimental results that relate our work to results obtained from other existing inference algorithms.\nIt has previously been an open problem whether all Boolean submodular functions can be decomposed into a sum of binary submodular functions over a possibly larger set of variables. This problem has been considered within several different contexts in computer science, including computer vision, artificial intelligence, and pseudo-Boolean optimisation. Using a connection between the expressive power of valued constraints and certain algebraic properties of functions, we answer this question negatively.   Our results have several corollaries. First, we characterise precisely which submodular functions of arity 4 can be expressed by binary submodular functions. 
Next, we identify a novel class of submodular functions of arbitrary arities which can be expressed by binary submodular functions, and therefore minimised efficiently using a so-called expressibility reduction to the Min-Cut problem. More importantly, our results imply limitations on this kind of reduction and establish for the first time that it cannot be used in general to minimise arbitrary submodular functions. Finally, we refute a conjecture of Promislow and Young on the structure of the extreme rays of the cone of Boolean submodular functions.\nMany AI researchers and cognitive scientists have argued that analogy is the core of cognition. The most influential work on computational modeling of analogy-making is Structure Mapping Theory (SMT) and its implementation in the Structure Mapping Engine (SME). A limitation of SME is the requirement for complex hand-coded representations. We introduce the Latent Relation Mapping Engine (LRME), which combines ideas from SME and Latent Relational Analysis (LRA) in order to remove the requirement for hand-coded representations. LRME builds analogical mappings between lists of words, using a large corpus of raw text to automatically discover the semantic relations among the words. We evaluate LRME on a set of twenty analogical mapping problems, ten based on scientific analogies and ten based on common metaphors. LRME achieves human-level performance on the twenty problems. We compare LRME with a variety of alternative approaches and find that they are not able to reach the same level of performance.\nAll self-active living beings need to solve the motivational problem: the question of what to do at any moment of their lives. For humans and non-human animals at least two distinct layers of motivational drives are known: the primary needs for survival and the emotional drives leading to a wide range of sophisticated strategies, such as explorative learning and socializing. 
Part of the emotional layer of drives has universal facets, being beneficial in an extended range of environmental settings. Emotions are triggered in the brain by the release of neuromodulators, which are, at the same time, the agents for meta-learning. This intrinsic relation between emotions, meta-learning and universal action strategies suggests that emotional control is of central importance for the design of artificial intelligences and synthetic cognitive systems. An implementation of this concept is proposed in terms of a dense and homogeneous associative network (dHan).\nThe present work consisted in developing a board game. There are the traditional ones (Monopoly, Cluedo, etc.), but those that interest us leave less room for chance (luck) than for strategy, such as chess. Kallah is an old African game; its rules are simple, but the strategies to be used are very complex to implement. Of course, they rest on a strongly mathematical basis, as in the film \"Rain-Man\", where one can see that gambling can be played with strategies based on mathematical theories. Artificial Intelligence gives a machine the possibility \"of thinking\" and, therefore, allows it to make decisions. In our work, we use it to give the computer the means to choose its best move.\nWhen mathematicians present proofs they usually adapt their explanations to their didactic goals and to the (assumed) knowledge of their addressees. Modern automated theorem provers, in contrast, usually present proofs at a fixed level of detail (also called granularity). Often these presentations are neither intended nor suitable for human use. A challenge therefore is to develop user- and goal-adaptive proof presentation techniques that obey common mathematical practice. 
We present a flexible and adaptive approach to proof presentation that exploits machine learning techniques to extract a model of the specific granularity of proof examples and employs this model for the automated generation of further proofs at an adapted level of granularity.\nConstraint programming (CP) has been used with great success to tackle a wide variety of constraint satisfaction problems which are computationally intractable in general. Global constraints are one of the important factors behind the success of CP. In this paper, we study a new global constraint, the multiset ordering constraint, which is shown to be useful in symmetry breaking and searching for leximin optimal solutions in CP. We propose efficient and effective filtering algorithms for propagating this global constraint. We show that the algorithms are sound and complete and we discuss possible extensions. We also consider alternative propagation methods based on existing constraints in CP toolkits. Our experimental results on a number of benchmark problems demonstrate that propagating the multiset ordering constraint via a dedicated algorithm can be very beneficial.\nWe argue that parameterized complexity is a useful tool with which to study global constraints. In particular, we show that many global constraints which are intractable to propagate completely have natural parameters which make them fixed-parameter tractable and which are easy to compute. This tractability tends either to be the result of a simple dynamic program or of a decomposition which has a strong backdoor of bounded size. This strong backdoor is often a cycle cutset. We also show that parameterized complexity can be used to study other aspects of constraint programming like symmetry breaking. For instance, we prove that value symmetry is fixed-parameter tractable to break in the number of symmetries. 
Finally, we argue that parameterized complexity can be used to derive results about the approximability of constraint propagation.\nA wide range of constraints can be compactly specified using automata or formal languages. In a sequence of recent papers, we have shown that an effective means to reason with such specifications is to decompose them into primitive constraints. We can then, for instance, use state of the art SAT solvers and profit from their advanced features like fast unit propagation, clause learning, and conflict-based search heuristics. This approach holds promise for solving combinatorial problems in scheduling, rostering, and configuration, as well as problems in more diverse areas like bioinformatics, software testing and natural language processing. In addition, decomposition may be an effective method to propagate other global constraints.\nWe study the CardPath constraint. This ensures a given constraint holds a number of times down a sequence of variables. We show that SLIDE, a special case of CardPath where the slid constraint must hold always, can be used to encode a wide range of sliding sequence constraints including CardPath itself. We consider how to propagate SLIDE and provide a complete propagator for CardPath. Since propagation is NP-hard in general, we identify special cases where propagation takes polynomial time. Our experiments demonstrate that using SLIDE to encode global constraints can be as efficient and effective as specialised propagators.\nSuccessful management of emotional stimuli is a pivotal issue concerning Affective Computing (AC) and the related research. As a subfield of Artificial Intelligence, AC is concerned not only with the design of computer systems and the accompanying hardware that can recognize, interpret, and process human emotions, but also with the development of systems that can trigger human emotional response in an ordered and controlled manner. 
This requires the maximum attainable precision and efficiency in the extraction of data from emotionally annotated databases. While these databases do use keywords or tags to describe the semantic content, they provide neither the flexibility nor the leverage needed to extract the pertinent emotional content efficiently. Therefore, we propose introducing ontologies as a new paradigm for describing emotionally annotated data. The ability to select and sequence data based on their semantic attributes is vital for any study involving metadata, semantics and ontological sorting like the Semantic Web or the Social Semantic Desktop, and the approach described in the paper facilitates reuse in these areas as well.\nAt this point in time there is a need for a new representation of different kinds of information, one that identifies their characteristics and organizes them in descending order of importance. Today, science has a powerful tool for the description of reality: numbers. What is the most important property of numbers? Suppose we have the number 0.2351734; it is clear that its digits appear in order of importance. If necessary, we can round the number to some value, e.g. 0.235. Arguably, 0.235 is the most important information in 0.2351734. Thus, we can reduce the size of numbers without losing much accuracy. Clearly, if we learn to provide a kernel of graphical or audio information, we can provide the most relevant information and discard the rest. Introducing various kinds of information into an information kernel is an important task for solving many problems in artificial intelligence and information theory.\nVoting is a simple mechanism to aggregate the preferences of agents. Many voting rules have been shown to be NP-hard to manipulate. However, a number of recent theoretical results suggest that this complexity may only be in the worst case, since manipulation is often easy in practice. 
In this paper, we show that empirical studies are useful in improving our understanding of this issue. We demonstrate that there is a smooth transition in the probability that a coalition can elect a desired candidate using the veto rule as the size of the manipulating coalition increases. We show that a rescaled probability curve displays a simple and universal form independent of the size of the problem. We argue that manipulation of the veto rule is asymptotically easy for many independent and identically distributed votes even when the coalition of manipulators is critical in size. Based on this argument, we identify a situation in which manipulation is computationally hard. This is when votes are highly correlated and the election is \"hung\". We show, however, that even a single uncorrelated voter is enough to make manipulation easy again.\nWe show that tools from circuit complexity can be used to study decompositions of global constraints. In particular, we study decompositions of global constraints into conjunctive normal form with the property that unit propagation on the decomposition enforces the same level of consistency as a specialized propagation algorithm. We prove that a constraint propagator has a polynomial size decomposition if and only if it can be computed by a polynomial size monotone Boolean circuit. Lower bounds on the size of monotone Boolean circuits thus translate to lower bounds on the size of decompositions of global constraints. For instance, we prove that there is no polynomial sized decomposition of the domain consistency propagator for the ALLDIFFERENT constraint.\nTo model combinatorial decision problems involving uncertainty and probability, we extend the stochastic constraint programming framework proposed in [Walsh, 2002] along a number of important dimensions (e.g. to multiple chance constraints and to a range of new objectives). We also provide a new (but equivalent) semantics based on scenarios. 
Using this semantics, we can compile stochastic constraint programs down into conventional (nonstochastic) constraint programs. This allows us to exploit the full power of existing constraint solvers. We have implemented this framework for decision making under uncertainty in stochastic OPL, a language which is based on the OPL constraint modelling language [Hentenryck et al., 1999]. To illustrate the potential of this framework, we model a wide range of problems in areas as diverse as finance, agriculture and production.\nWe present a theoretical analysis of Maximum a Posteriori (MAP) sequence estimation for binary symmetric hidden Markov processes. We reduce the MAP estimation to the energy minimization of an appropriately defined Ising spin model, and focus on the performance of MAP as characterized by its accuracy and the number of solutions corresponding to a typical observed sequence. It is shown that for a finite range of sufficiently low noise levels, the solution is uniquely related to the observed sequence, while the accuracy degrades linearly with increasing noise strength. For intermediate noise values, the accuracy is nearly noise-independent, but now there are exponentially many solutions to the estimation problem, which is reflected in non-zero ground-state entropy for the Ising model. Finally, for even larger noise intensities, the number of solutions reduces again, but the accuracy is poor. It is shown that these regimes are different thermodynamic phases of the Ising model that are related to each other via first-order phase transitions.\nSurveillance control and reporting (SCR) systems for air threats play an important role in the defense of a country. An SCR system corresponds to air and ground situation management/processing along with information fusion, communication, coordination, simulation and other critical defense-oriented tasks. Threat Evaluation and Weapon Assignment (TEWA) sits at the core of the SCR system. 
In such a system, maximal or near-maximal utilization of constrained resources is of extreme importance. Manual TEWA systems cannot provide optimality because of various limitations; e.g. a surface-to-air missile (SAM) can fire from a distance of 5 km, but manual TEWA systems are constrained by human vision range and other constraints. Current TEWA systems usually work on a target-by-target basis using some type of greedy algorithm, thus affecting the optimality of the solution and failing in multi-target scenarios. This paper relates to a novel two-staged flexible dynamic decision support based optimal threat evaluation and weapon assignment algorithm for multi-target air-borne threats.\nA general approach describing quantum decision procedures is developed. The approach can be applied to quantum information processing, quantum computing, creation of artificial quantum intelligence, as well as to analyzing decision processes of human decision makers. Our basic point is to consider an active quantum system possessing its own strategic state. Processing information by such a system is analogous to the cognitive processes associated with decision making by humans. The algebra of probability operators, associated with the possible options available to the decision maker, plays the role of the algebra of observables in quantum theory of measurements. A scheme is advanced for a practical realization of decision procedures by thinking quantum systems. Such thinking quantum systems can be realized by using spin lattices, systems of magnetic molecules, cold atoms trapped in optical lattices, ensembles of quantum dots, or multilevel atomic systems interacting with an electromagnetic field.\nThis work presents a novel learning method in the context of embodied artificial intelligence and self-organization, which has as few assumptions and restrictions as possible about the world and the underlying model. 
The learning rule is derived from the principle of maximizing the predictive information in the sensorimotor loop. It is evaluated on robot chains of varying length with individually controlled, non-communicating segments. The comparison of the results shows that maximizing the predictive information per wheel leads to more coordinated behavior of the physically connected robots than maximization per robot. Another focus of this paper is the analysis of the effect of the robot chain length on the overall behavior of the robots. It will be shown that longer chains with less capable controllers outperform those of shorter length and more complex controllers. The reason is found and discussed in the information-geometric interpretation of the learning process.\nThis paper presents an overview of the dynamic shortest path routing problem and the various neural networks used to solve it. Different shortest path optimization problems can be solved by various neural network algorithms. Routing in packet-switched multi-hop networks can be described as a classical combinatorial optimization problem, i.e. a shortest path routing problem in graphs. The survey shows that neural networks are the best candidates for the optimization of dynamic shortest path routing problems due to their computational speed compared to other soft computing and metaheuristic algorithms.\nA central problem in artificial intelligence is that of planning to maximize future reward under uncertainty in a partially observable environment. In this paper we propose and demonstrate a novel algorithm which accurately learns a model of such an environment directly from sequences of action-observation pairs. We then close the loop from observations to actions by planning in the learned model and recovering a policy which is near-optimal in the original environment. 
Specifically, we present an efficient and statistically consistent spectral algorithm for learning the parameters of a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then perform approximate point-based planning in the learned PSR. Analysis of our results shows that the algorithm learns a state space which efficiently captures the essential features of the environment. This representation allows accurate prediction with a small number of parameters, and enables successful and efficient planning.\nParaphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.\nIn this philosophical paper, we explore computational and biological analogies to address the fine-tuning problem in cosmology. We first clarify what it means for physical constants or initial conditions to be fine-tuned. We review important distinctions such as the dimensionless and dimensional physical constants, and the classification of constants proposed by Levy-Leblond. Then we explore how two great analogies, computational and biological, can give new insights into our problem. 
This paper includes a preliminary study to examine the two analogies. Importantly, analogies are both useful and fundamental cognitive tools, but can also be misused or misinterpreted. The idea that our universe might be modelled as a computational entity is analysed, and we discuss the distinction between physical laws and initial conditions using algorithmic information theory. Smolin introduced the theory of \"Cosmological Natural Selection\" with a biological analogy in mind. We examine an extension of this analogy involving intelligent life. We discuss if and how this extension could be legitimated.   Keywords: origin of the universe, fine-tuning, physical constants, initial conditions, computational universe, biological universe, role of intelligent life, cosmological natural selection, cosmological artificial selection, artificial cosmogenesis.\nWe study propagation algorithms for the conjunction of two AllDifferent constraints. Solutions of an AllDifferent constraint can be seen as perfect matchings on the variable/value bipartite graph. Therefore, we investigate the problem of finding simultaneous bipartite matchings. We present an extension of the famous Hall theorem which characterizes when simultaneous bipartite matchings exist. Unfortunately, finding such matchings is NP-hard in general. However, we prove a surprising result that finding a simultaneous matching on a convex bipartite graph takes just polynomial time. Based on this theoretical result, we provide the first polynomial time bound consistency algorithm for the conjunction of two AllDifferent constraints. We identify a pathological problem on which this propagator is exponentially faster than existing propagators. 
Our experiments show that this new propagator can offer significant benefits over existing methods.\nAs one of the newest members in the field of artificial immune systems (AIS), the Dendritic Cell Algorithm (DCA) is based on behavioural models of natural dendritic cells (DCs). Unlike other AIS, the DCA does not rely on training data; instead, domain or expert knowledge is required to predetermine the mapping from input signals of a particular instance to the three categories used by the DCA. This data preprocessing phase has received the criticism of having manually over-fitted the data to the algorithm, which is undesirable. Therefore, in this paper we have attempted to ascertain whether it is possible to use principal component analysis (PCA) techniques to automatically categorise input data while still generating useful and accurate classification results. The integrated system is tested with a biometrics dataset for the stress recognition of automobile drivers. The experimental results have shown that the application of PCA to the DCA for the purpose of automated data preprocessing is successful.\nLuhmann (1984) defined society as a communication system which is structurally coupled to, but not an aggregate of, human action systems. The communication system is then considered as self-organizing (\"autopoietic\"), as are human actors. Communication systems can be studied by using Shannon's (1948) mathematical theory of communication. The update of a network by action at one of the local nodes is then a well-known problem in artificial intelligence (Pearl 1988). By combining these various theories, a general algorithm for probabilistic structure/action contingency can be derived. The consequences of this contingency for each system, its consequences for their further histories, and the stabilization on each side by counterbalancing mechanisms are discussed, in both mathematical and theoretical terms. 
An empirical example is elaborated.\nWe present tropical games, a generalization of combinatorial min-max games based on tropical algebras. Our model breaks the traditional symmetry of rational zero-sum games where players have exactly opposed goals (min vs. max), is more widely applicable than min-max and also supports a form of pruning, albeit one less effective than alpha-beta. In fact, min-max games may be seen as particular cases where both the game and its dual are tropical: when the dual of a tropical game is also tropical, the power of alpha-beta is completely recovered. We formally develop the model and prove that the tropical pruning strategy is correct, then conclude by showing how the problem of approximated parsing can be modeled as a tropical game, profiting from pruning.\nVoting is a simple mechanism to combine the preferences of multiple agents. Agents may try to manipulate the result of voting by mis-reporting their preferences. One barrier that might exist to such manipulation is computational complexity. In particular, it has been shown that it is NP-hard to compute how to manipulate a number of different voting rules. However, NP-hardness only bounds the worst-case complexity. Recent theoretical results suggest that manipulation may often be easy in practice. In this paper, we study empirically the manipulability of single transferable voting (STV) to determine if computational complexity is really a barrier to manipulation. STV was one of the first voting rules shown to be NP-hard. It also appears to be one of the harder voting rules to manipulate. We sample a number of distributions of votes including uniform and real world elections. In almost every election in our experiments, it was easy to compute how a single agent could manipulate the election or to prove that manipulation by a single agent was impossible.\nSymmetry is an important feature of many constraint programs. 
We show that any problem symmetry acting on a set of symmetry breaking constraints can be used to break symmetry. Different symmetries pick out different solutions in each symmetry class. This simple but powerful idea can be used in a number of different ways. We describe one application within model restarts, a search technique designed to reduce the conflict between symmetry breaking and the branching heuristic. In model restarts, we restart search periodically with a random symmetry of the symmetry breaking constraints. Experimental results show that this symmetry breaking technique is effective in practice on some standard benchmark problems.\nOn May 1, 2008, researchers at Hewlett Packard (HP) announced the first physical realization of a fundamental circuit element called the memristor, which attracted considerable interest worldwide. This newly found element can easily be combined with crossbar interconnect technology, and the resulting structure has opened a new field in designing configurable or programmable electronic systems. These systems in turn can have applications in signal processing and artificial intelligence. In this paper, based on the simple memristor crossbar structure, we propose new and simple circuits for hardware implementation of fuzzy membership functions. In our proposed circuits, these fuzzy membership functions can have any shape and resolution. In addition, these circuits can be used as a basis for the construction of evolutionary systems.\nText classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way, such as by producing summaries, answering questions or extracting data. 
Existing supervised learning algorithms for classifying text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using an artificial intelligence technique that requires fewer documents for training. Instead of using words, word relations, i.e. association rules derived from these words, are used to derive the feature set from pre-classified text documents. The concept of a naïve Bayes classifier is then applied to the derived features, and finally a single concept from genetic algorithms is added for the final classification. A system based on the proposed algorithm has been implemented and tested. The experimental results show that the proposed system works as a successful text classifier.\nDCSP (Distributed Constraint Satisfaction Problem) has been a very important research area in AI (Artificial Intelligence). There are many application problems in distributed AI that can be formalized as DCSPs. With the increasing complexity and problem size of the application problems in AI, the storage space required for searching and the average searching time are increasing too. Thus, using limited storage space efficiently when solving DCSPs becomes a very important problem, and it can help to reduce searching time as well. This paper provides an efficient knowledge base management approach based on the general use of the hyper-resolution rule in consistency algorithms. The approach minimizes the growth of the knowledge base by eliminating sufficient constraints and false nogoods. These eliminations do not change the completeness of the original knowledge base. The proofs are given as well. The example shows that this approach greatly decreases both the number of new nogoods generated and the size of the knowledge base. Thus it decreases the required storage space and simplifies the searching process.\nThe paper proposes an artificial intelligence technique called hill climbing to find numerical solutions of Diophantine equations. 
Such equations are important as they have many applications in fields like public key cryptography, integer factorization, algebraic curves, projective curves and data dependency in supercomputers. Importantly, it has been proved that there is no general method to find solutions of such equations. This paper is an attempt to find numerical solutions of Diophantine equations using the steepest-ascent version of hill climbing. The method, which uses a tree representation to depict possible solutions of Diophantine equations, adopts a novel methodology to generate successors. The heuristic function used turns the process of finding a solution into a minimization process. The work illustrates the effectiveness of the proposed methodology using a class of Diophantine equations given by a_1 x_1^{p_1} + a_2 x_2^{p_2} + ... + a_n x_n^{p_n} = N, where the a_i and N are integers. The experimental results validate that the proposed procedure is successful in finding solutions of Diophantine equations with sufficiently large powers and a large number of variables.\nSequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no regret algorithm in an online learning setting. We show that any such no regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. 
We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.\nInference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus our attention on the class of planar Ising models, for which inference is tractable using techniques of statistical physics [Kac and Ward; Kasteleyn]. Based on these techniques and recent methods for planarity testing and planar embedding [Chrobak and Payne], we propose a simple greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and an optimal planar Ising model defined on this graph to best approximate that set of correlations. We demonstrate our method in some simulations and for the application of modeling senate voting records.\n\"Encoded in the large, highly evolved sensory and motor portions of the human brain is a billion years of experience about the nature of the world and how to survive in it. The deliberate process we call reasoning is, I believe, the thinnest veneer of human thought, effective only because it is supported by this much older and much more powerful, though usually unconscious, sensorimotor knowledge. We are all prodigious Olympians in perceptual and motor areas, so good that we make the difficult look easy. Abstract thought, though, is a new trick, perhaps less than 100 thousand years old. We have not yet mastered it. 
It is not all that intrinsically difficult; it just seems so when we do it.\" - Hans Moravec. Moravec's paradox concerns the fact that it is the seemingly easier day-to-day problems that are harder to implement in a machine than the seemingly complicated logic-based problems of today. The results show that most artificially intelligent machines are as adept as us, if not more so, at undertaking long calculations or even playing chess, but their logic gets them nowhere when it comes to carrying out everyday tasks like walking, facial gesture recognition or speech recognition.\nCase-Based Reasoning, and particularly Estimation by Analogy, has been used in a number of problem-solving areas, such as cost estimation. Conventional methods, despite the lack of a sound criterion for choosing nearest projects, were based on estimation using a fixed and predetermined number of neighbors from the entire set of historical instances. This approach limits the estimation ability of such algorithms, for they do not take into consideration that every project under estimation is unique and requires different handling. The notion of distributions of distances, together with a distance metric for distributions, helps us adapt the proposed method (we call it DD-EbA) each time to the specific case to be estimated without losing prediction power or increasing computational cost. The results of this paper show that the proposed technique achieves this goal very efficiently.\nCurrent work in planning with preferences assumes that the user's preference models are completely specified and aims to search for a single solution plan. In many real-world planning scenarios, however, the user probably cannot provide any information about her desired plans, or in some cases can only express partial preferences. In such situations, the planner has to present not only one but a set of plans to the user, with the hope that some of them are similar to the plan she prefers. 
We first propose the use of different measures to capture the quality of plan sets that are suitable for such scenarios: domain-independent distance measures defined based on plan elements (actions, states, causal links) if no knowledge of the user's preferences is given, and the Integrated Convex Preference measure in case the user's partial preference is provided. We then investigate various heuristic approaches to find sets of plans according to these measures, and present empirical results demonstrating the promise of our approach.\nSoftware testing is an expensive process, which is vital in the industry. Constructing the test data accounts for the major cost of software testing, so deciding which method to use to generate the test data is important. This paper discusses the efficiency of search-based algorithms (particularly genetic algorithms) versus random testing in software test-data generation. This study differs from all previous studies in the sample programs (SUTs) that are used. Since we want to increase the complexity of SUTs gradually, and the program generation is automatic as well, Grammatical Evolution is used to guide the program generation. SUTs are generated according to the grammar we provide, with different levels of complexity. SUTs first undergo genetic-algorithm testing and then random testing. Based on the test results, this paper recommends one method to use for automating software testing.\nThe context of a software developer is something hard to define and capture, as it represents a complex network of elements across different dimensions that are not limited to the work developed in an IDE. We propose the definition of a software developer context model that takes into account all the dimensions that characterize the work environment of the developer. We are especially focused on what the software developer context encompasses at the project level and how it can be captured. 
The experimental work done so far shows that useful context information can be extracted from project management tools. The extraction, analysis and availability of this context information can be used to enrich the work environment of the developer with additional knowledge to support her/his work.\nMeaning negotiation (MN) is the general process by which agents reach an agreement about the meaning of a set of terms. Artificial Intelligence scholars have dealt with the problem of MN by means of argumentation schemes, belief merging and information fusion operators, and ontology alignment, but the proposed approaches depend upon the number of participants. In this paper, we give a general model of MN for an arbitrary number of agents, in which each participant discusses her viewpoint with the others by exhibiting it as an actual set of constraints on the meaning of the negotiated terms. We call this presentation of individual viewpoints an angle. The agents do not aim at forming a common viewpoint but, instead, at agreeing about an acceptable common angle. We analyze separately the process of MN by two agents (bilateral or pairwise MN) and by more than two agents (multiparty MN), and we use game theoretic models to understand how the process develops in both cases: the models are the Bargaining Game for bilateral MN and the English Auction for multiparty MN. We formalize the process of reaching such an agreement by giving a deduction system comprising rules that are consistent and adequate for representing MN.\nThe immune system is a highly parallel and distributed intelligent system which has learning, memory, and associative capabilities. Artificial Immune System is an evolutionary paradigm inspired by the biological aspects of the immune system of mammals. The immune system can inspire new algorithms that learn from its course of action. 
The human immune system has motivated scientists and engineers to find powerful information processing algorithms that have solved complex engineering problems. This work is the result of an attempt to explore a different perspective of the immune system, namely the Immune Privileged Site (IPS), which has the ability to exempt different parts of the body by not triggering an immune response to some foreign agents in those parts of the body. While the complete system is secured by an immune system, at certain times the system may need to allow certain activities that may be harmful to other systems but are useful to it, and to learn over time through the immune privilege model, as happens at Immune Privileged Sites in the natural immune system.\nIn Artificial Intelligence, Coalition Structure Generation (CSG) refers to those complex cooperative problems that require finding an optimal partition, maximising a social welfare, of a set of entities involved in a system into exhaustive and disjoint coalitions. The solution of the CSG problem finds applications in many fields such as Machine Learning (covering machines, clustering), Data Mining (decision tree, discretization), Graph Theory, Natural Language Processing (aggregation), Semantic Web (service composition), and Bioinformatics. The problem of finding the optimal coalition structure is NP-complete. In this paper we present a greedy randomized adaptive search procedure (GRASP) with path-relinking to efficiently search the space of coalition structures. Experiments and comparisons to other algorithms prove the validity of the proposed method in solving this hard combinatorial problem.\nIn a minimal binary constraint network, every tuple of a constraint relation can be extended to a solution. The tractability or intractability of computing a solution to such a minimal network was a long-standing open question. Dechter conjectured this computation problem to be NP-hard. 
We prove this conjecture. We also prove a conjecture by Dechter and Pearl stating that for k\\geq2 it is NP-hard to decide whether a single constraint can be decomposed into an equivalent k-ary constraint network. We show that this holds even in the case of bi-valued constraints where k\\geq3, which proves another conjecture of Dechter and Pearl. Finally, we establish the tractability frontier for this problem with respect to the domain cardinality and the parameter k.\nSome recent works in conditional planning have proposed reachability heuristics to improve planner scalability, but many lack a formal description of the properties of their distance estimates. To place previous work in context and extend work on heuristics for conditional planning, we provide a formal basis for distance estimates between belief states. We give a definition for the distance between belief states that relies on aggregating underlying state distance measures. We give several techniques to aggregate state distances and their associated properties. Many existing heuristics exhibit a subset of the properties, but in order to provide a standardized comparison we present several generalizations of planning graph heuristics that are used in a single planner. We complement our belief state distance estimate framework by also investigating efficient planning graph data structures that incorporate BDDs to compute the most effective heuristics. We developed two planners to serve as test-beds for our investigation. The first, CAltAlt, is a conformant regression planner that uses A* search. The second, POND, is a conditional progression planner that uses AO* search. We show the relative effectiveness of our heuristic techniques within these planners. 
We also compare the performance of these planners with several state-of-the-art approaches in conditional planning.\nMany fundamental problems in artificial intelligence, knowledge representation, and verification involve reasoning about sets and relations between sets and can be modeled as set constraint satisfaction problems (set CSPs). Such problems are frequently intractable, but there are several important set CSPs that are known to be polynomial-time tractable. We introduce a large class of set CSPs that can be solved in quadratic time. Our class, which we call EI, contains all previously known tractable set CSPs, but also some new ones that are of crucial importance, for example, in description logics. The class of EI set constraints has an elegant universal-algebraic characterization, which we use to show that every set constraint language that properly contains all EI set constraints already has a finite sublanguage with an NP-hard constraint satisfaction problem.\nAs was shown recently, many important AI problems require counting the number of models of propositional formulas. The problem of counting models of such formulas is, according to present knowledge, computationally intractable in the worst case. Based on the Davis-Putnam procedure, we present an algorithm, CDP, that computes the exact number of models of a propositional CNF or DNF formula F. Let m and n be the number of clauses and variables of F, respectively, and let p denote the probability that a literal l of F occurs in a clause C of F; then the average running time of CDP is shown to be O(nm^d), where d = -1/log(1-p). The practical performance of CDP has been estimated in a series of experiments on a wide variety of CNF formulas.\nThis paper presents a new approach to identifying and eliminating mislabeled training instances for supervised learning. The goal of this approach is to improve classification accuracies produced by learning algorithms by improving the quality of the training data. 
Our approach uses a set of learning algorithms to create classifiers that serve as noise filters for the training data. We evaluate single-algorithm, majority-vote and consensus filters on five datasets that are prone to labeling errors. Our experiments illustrate that filtering significantly improves classification accuracy for noise levels up to 30 percent. An analytical and empirical evaluation of the precision of our approach shows that consensus filters are conservative at throwing away good data at the expense of retaining bad data and that majority filters are better at detecting bad data at the expense of throwing away good data. This suggests that for situations in which there is a paucity of data, consensus filters are preferable, whereas majority vote filters are preferable for situations with an abundance of data.\nLocalization, that is, the estimation of a robot's location from sensor data, is a fundamental problem in mobile robotics. This paper presents a version of Markov localization which provides accurate position estimates and which is tailored towards dynamic environments. The key idea of Markov localization is to maintain a probability density over the space of all locations of a robot in its environment. Our approach represents this space metrically, using a fine-grained grid to approximate densities. It is able to globally localize the robot from scratch and to recover from localization failures. It is robust to approximate models of the environment (such as occupancy grid maps) and noisy sensors (such as ultrasound sensors). Our approach also includes a filtering technique which allows a mobile robot to reliably estimate its position even in densely populated environments in which crowds of people block the robot's sensors for extended periods of time. 
The method described here has been implemented and tested in several real-world applications of mobile robots, including the deployments of two mobile robots as interactive museum tour guides.\nMulti-Agent Systems (MAS) promise to offer solutions to problems where established, older paradigms fall short. In order to validate such claims that are repeatedly made in software agent publications, empirical in-depth studies of advantages and weaknesses of multi-agent solutions versus conventional ones in practical applications are needed. Climate control in large buildings is one application area where multi-agent systems, and market-oriented programming in particular, have been reported to be very successful, although central control solutions are still the standard practice. We have therefore constructed and implemented a variety of market designs for this problem, as well as different standard control engineering solutions. This article gives a detailed analysis and comparison, so as to learn about the differences between standard and agent approaches and to yield new insights into the benefits and limitations of computational markets. An important outcome is that \"local information plus market communication produces global control\".\nWe show how to find a minimum weight loop cutset in a Bayesian network with high probability. Finding such a loop cutset is the first step in the method of conditioning for inference. Our randomized algorithm for finding a loop cutset outputs a minimum loop cutset after O(c 6^k k n) steps with probability at least 1 - (1 - 1/6^k)^{c 6^k}, where c > 1 is a constant specified by the user, k is the minimal size of a minimum weight loop cutset, and n is the number of vertices. 
We also show empirically that a variant of this algorithm often finds a loop cutset that is closer to the minimum weight loop cutset than the ones found by the best deterministic algorithms known.\nWe investigate the space efficiency of a Propositional Knowledge Representation (PKR) formalism. Intuitively, the space efficiency of a formalism F in representing a certain piece of knowledge A, is the size of the shortest formula of F that represents A. In this paper we assume that knowledge is either a set of propositional interpretations (models) or a set of propositional formulae (theorems). We provide a formal way of talking about the relative ability of PKR formalisms to compactly represent a set of models or a set of theorems. We introduce two new compactness measures, the corresponding classes, and show that the relative space efficiency of a PKR formalism in representing models/theorems is directly related to such classes. In particular, we consider formalisms for nonmonotonic reasoning, such as circumscription and default logic, as well as belief revision operators and the stable model semantics for logic programs with negation. One interesting result is that formalisms with the same time complexity do not necessarily belong to the same space efficiency class.\nPartially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price -- exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. 
First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.\nAgents in dynamic multi-agent environments must monitor their peers to execute individual and group plans. A key open question is how much monitoring of other agents' states is required to be effective: The Monitoring Selectivity Problem. We investigate this question in the context of detecting failures in teams of cooperating agents, via Socially-Attentive Monitoring, which focuses on monitoring for failures in the social relationships between the agents. We empirically and analytically explore a family of socially-attentive teamwork monitoring algorithms in two dynamic, complex, multi-agent domains, under varying conditions of task distribution and uncertainty. We show that a centralized scheme using a complex algorithm trades correctness for completeness and requires monitoring all teammates. In contrast, a simple distributed teamwork monitoring algorithm results in correct and complete detection of teamwork failures, despite relying on limited, uncertain knowledge, and monitoring only key agents in a team. In addition, we report on the design of a socially-attentive monitoring system and demonstrate its generality in monitoring several coordination relationships, diagnosing detected failures, and both on-line and off-line applications.\nFunctional relationships between objects, called `attributes', are of considerable importance in knowledge representation languages, including Description Logics (DLs). A study of the literature indicates that papers have made, often implicitly, different assumptions about the nature of attributes: whether they are always required to have a value, or whether they can be partial functions. 
The work presented here is the first explicit study of this difference for subclasses of the CLASSIC DL, involving the same-as concept constructor. It is shown that although determining subsumption between concept descriptions has the same complexity (though requiring different algorithms), the story is different in the case of determining the least common subsumer (lcs). For attributes interpreted as partial functions, the lcs exists and can be computed relatively easily; even in this case our results correct and extend three previous papers about the lcs of DLs. In the case where attributes must have a value, the lcs may not exist, and even if it exists it may be of exponential size. Interestingly, it is possible to decide in polynomial time if the lcs exists.\nWe study the complexity of the combination of the Description Logics ALCQ and ALCQI with a terminological formalism based on cardinality restrictions on concepts. These combinations can naturally be embedded into C^2, the two variable fragment of predicate logic with counting quantifiers, which yields decidability in NExpTime. We show that this approach leads to an optimal solution for ALCQI, as ALCQI with cardinality restrictions has the same complexity as C^2 (NExpTime-complete). In contrast, we show that for ALCQ, the problem can be solved in ExpTime. This result is obtained by a reduction of reasoning with cardinality restrictions to reasoning with the (in general weaker) terminological formalism of general axioms for ALCQ extended with nominals. Using the same reduction, we show that, for the extension of ALCQI with nominals, reasoning with general axioms is a NExpTime-complete problem. Finally, we sharpen this result and show that pure concept satisfiability for ALCQI with nominals is NExpTime-complete. 
Without nominals, this problem is known to be PSpace-complete.\nThis paper describes a novel method by which a spoken dialogue system can learn to choose an optimal dialogue strategy from its experience interacting with human users. The method is based on a combination of reinforcement learning and performance modeling of spoken dialogue systems. The reinforcement learning component applies Q-learning (Watkins, 1989), while the performance modeling component applies the PARADISE evaluation framework (Walker et al., 1997) to learn the performance function (reward) used in reinforcement learning. We illustrate the method with a spoken dialogue system named ELVIS (EmaiL Voice Interactive System), that supports access to email over the phone. We conduct a set of experiments for training an optimal dialogue strategy on a corpus of 219 dialogues in which human users interact with ELVIS over the phone. We then test that strategy on a corpus of 18 dialogues. We show that ELVIS can learn to optimize its strategy selection for agent initiative, for reading messages, and for summarizing email folders.\nThe goal of this research is to develop agents that are adaptive and predictable and timely. At first blush, these three requirements seem contradictory. For example, adaptation risks introducing undesirable side effects, thereby making agents' behavior less predictable. Furthermore, although formal verification can assist in ensuring behavioral predictability, it is known to be time-consuming. Our solution to the challenge of satisfying all three requirements is the following. Agents have finite-state automaton plans, which are adapted online via evolutionary learning (perturbation) operators. To ensure that critical behavioral constraints are always satisfied, agents' plans are first formally verified. They are then reverified after every adaptation. If reverification concludes that constraints are violated, the plans are repaired. 
The main objective of this paper is to improve the efficiency of reverification after learning, so that agents have a sufficiently rapid response time. We present two solutions: positive results that certain learning operators are a priori guaranteed to preserve useful classes of behavioral assurance constraints (which implies that no reverification is needed for these operators), and efficient incremental reverification algorithms for those learning operators that have negative a priori results.\nA major problem in machine learning is that of inductive bias: how to choose a learner's hypothesis space so that it is large enough to contain a solution to the problem being learnt, yet small enough to ensure reliable generalization from reasonably-sized training sets. Typically such bias is supplied by hand through the skill and insights of experts. In this paper a model for automatically learning bias is investigated. The central assumption of the model is that the learner is embedded within an environment of related learning tasks. Within such an environment the learner can sample from multiple tasks, and hence it can search for a hypothesis space that contains good solutions to many of the problems in the environment. Under certain restrictions on the set of all hypothesis spaces available to the learner, we show that a hypothesis space that performs well on a sufficiently large number of training tasks will also perform well when learning novel tasks in the same environment. Explicit bounds are also derived demonstrating that learning multiple tasks within an environment of related tasks can potentially give much better generalization than learning a single task.\nThe recent approaches of extending the GRAPHPLAN algorithm to handle more expressive planning formalisms raise the question of what the formal meaning of \"expressive power\" is. 
We formalize the intuition that expressive power is a measure of how concisely planning domains and plans can be expressed in a particular formalism by introducing the notion of \"compilation schemes\" between planning formalisms. Using this notion, we analyze the expressiveness of a large family of propositional planning formalisms, ranging from basic STRIPS to a formalism with conditional effects, partial state specifications, and propositional formulae in the preconditions. One of the results is that conditional effects cannot be compiled away if plan size should grow only linearly but can be compiled away if we allow for polynomial growth of the resulting plans. This result confirms that the recently proposed extensions to the GRAPHPLAN algorithm concerning conditional effects are optimal with respect to the \"compilability\" framework. Another result is that general propositional formulae cannot be compiled into conditional effects if the plan size should be preserved linearly. This implies that allowing general propositional formulae in preconditions and effect conditions adds another level of difficulty in generating a plan.\nIn order to generate plans for agents with multiple actuators, agent teams, or distributed controllers, we must be able to represent and plan using concurrent actions with interacting effects. This has historically been considered a challenging task requiring a temporal planner with the ability to reason explicitly about time. We show that with simple modifications, the STRIPS action representation language can be used to represent interacting actions. Moreover, algorithms for partial-order planning require only small modifications in order to be applied in such multiagent domains. We demonstrate this fact by developing a sound and complete partial-order planner for planning with concurrent interacting actions, POMP, that extends existing partial-order planners in a straightforward way. 
These results open the way to the use of partial-order planners for the centralized control of cooperative multiagent systems.\nThis paper presents an implemented system for recognizing the occurrence of events described by simple spatial-motion verbs in short image sequences. The semantics of these verbs is specified with event-logic expressions that describe changes in the state of force-dynamic relations between the participants of the event. An efficient finite representation is introduced for the infinite sets of intervals that occur when describing liquid and semi-liquid events. Additionally, an efficient procedure using this representation is presented for inferring occurrences of compound events, described with event-logic expressions, from occurrences of primitive events. Using force dynamics and event logic to specify the lexical semantics of events allows the system to be more robust than prior systems based on motion profile.\nThis paper presents GRT, a domain-independent heuristic planning system for STRIPS worlds. GRT solves problems in two phases. In the pre-processing phase, it estimates the distance between each fact and the goals of the problem, in a backward direction. Then, in the search phase, these estimates are used in order to further estimate the distance between each intermediate state and the goals, thus guiding the search process in a forward direction and on a best-first basis. The paper presents the benefits from the adoption of opposite directions between the preprocessing and the search phases, discusses some difficulties that arise in the pre-processing phase and introduces techniques to cope with them. Moreover, it presents several methods of improving the efficiency of the heuristic, by enriching the representation and by reducing the size of the problem. Finally, a method of overcoming local optimal states, based on domain axioms, is proposed. 
According to it, difficult problems are decomposed into easier sub-problems that have to be solved sequentially. The performance results from various domains, including those of the recent planning competitions, show that GRT is among the fastest planners.\nIn this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter and Bartlett, this volume), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter beta, which has a natural interpretation in terms of bias-variance trade-off, it requires no knowledge of the underlying state, and it can be applied to infinite state, control and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm, and with an algorithm based on conjugate-gradients that utilizes gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of (Baxter and Bartlett, this volume) on a toy problem, and practical aspects of the algorithms on a number of more realistic problems.\nDescription Logics (DLs) are suitable, well-known, logics for managing structured knowledge. They allow reasoning about individuals and well defined concepts, i.e., sets of individuals with common properties. The experience in using DLs in applications has shown that in many cases we would like to extend their capabilities. In particular, their use in the context of Multimedia Information Retrieval (MIR) leads to the conviction that such DLs should allow the treatment of the inherent imprecision in multimedia object content representation and retrieval. 
In this paper we will present a fuzzy extension of ALC, combining Zadeh's fuzzy logic with a classical DL. In particular, concepts become fuzzy and, thus, reasoning about imprecise concepts is supported. We will define its syntax, its semantics, describe its properties and present a constraint propagation calculus for reasoning in it.\nThis paper investigates the problems arising in the construction of a program to play the game of contract bridge. These problems include both the difficulty of solving the game's perfect information variant, and techniques needed to address the fact that bridge is not, in fact, a perfect information game. GIB, the program being described, involves five separate technical advances: partition search, the practical application of Monte Carlo techniques to realistic problems, a focus on achievable sets to solve problems inherent in the Monte Carlo approach, an extension of alpha-beta pruning from total orders to arbitrary distributive lattices, and the use of squeaky wheel optimization to find approximately optimal solutions to cardplay problems. GIB is currently believed to be of approximately expert caliber, and is currently the strongest computer bridge program in the world.\nEnforcing local consistencies is one of the main features of constraint reasoning. Which level of local consistency should be used when searching for solutions in a constraint network is a basic question. Arc consistency and partial forms of arc consistency have been widely studied, and have been known for some time through the forward checking or the MAC search algorithms. Until recently, stronger forms of local consistency remained limited to those that change the structure of the constraint graph, and thus, could not be used in practice, especially on large networks. This paper focuses on the local consistencies that are stronger than arc consistency, without changing the structure of the network, i.e., only removing inconsistent values from the domains. 
In the last five years, several such local consistencies have been proposed by us or by others. We make an overview of all of them, and highlight some relations between them. We compare them both theoretically and experimentally, considering their pruning efficiency and the time required to enforce them.\nWe describe and evaluate the algorithmic techniques that are used in the FF planning system. Like the HSP system, FF relies on forward state space search, using a heuristic that estimates goal distances by ignoring delete lists. Unlike HSP's heuristic, our method does not assume facts to be independent. We introduce a novel search strategy that combines hill-climbing with systematic search, and we show how other powerful heuristic information can be extracted and used to prune the search space. FF was the most successful automatic planner at the recent AIPS-2000 planning competition. We review the results of the competition, give data for other benchmark domains, and investigate the reasons for the runtime performance of FF compared to HSP.\nHidden Markov models (HMMs) and partially observable Markov decision processes (POMDPs) provide useful tools for modeling dynamical systems. They are particularly useful for representing the topology of environments such as road networks and office buildings, which are typical for robot navigation and planning. The work presented here describes a formal framework for incorporating readily available odometric information and geometrical constraints into both the models and the algorithm that learns them. By taking advantage of such information, learning HMMs/POMDPs can be made to generate better solutions and require fewer iterations, while being robust in the face of data reduction. 
Experimental results, obtained from both simulated and real robot data, demonstrate the effectiveness of the approach.\nMany widely studied graphical models with latent variables lead to nontrivial constraints on the distribution of the observed variables. Inspired by the Bell inequalities in quantum mechanics, we refer to any linear inequality whose violation rules out some latent variable model as a \"hidden variable test\" for that model. Our main contribution is to introduce a sequence of relaxations which provides progressively tighter hidden variable tests. We demonstrate applicability to mixtures of sequences of i.i.d. variables, Bell inequalities, and homophily models in social networks. For the last, we demonstrate that our method provides a test that is able to rule out latent homophily as the sole explanation for correlations on a real social network that are known to be due to influence.\nThis paper discusses a system that accelerates reinforcement learning by using transfer from related tasks. Without such transfer, even if two tasks are very similar at some abstract level, an extensive re-learning effort is required. The system achieves much of its power by transferring parts of previously learned solutions rather than a single complete solution. The system exploits strong features in the multi-dimensional function produced by reinforcement learning in solving a particular task. These features are stable and easy to recognize early in the learning process. They generate a partitioning of the state space and thus the function. The partition is represented as a graph. This is used to index and compose functions stored in a case base to form a close approximation to the solution of the new task. 
Experiments demonstrate that function composition often produces more than an order of magnitude increase in learning rate compared to a basic reinforcement learning algorithm.\nWe propose a logical/mathematical framework for statistical parameter learning of parameterized logic programs, i.e. definite clause programs containing probabilistic facts with a parameterized distribution. It extends the traditional least Herbrand model semantics in logic programming to distribution semantics, possible world semantics with a probability distribution which is unconditionally applicable to arbitrary logic programs including ones for HMMs, PCFGs and Bayesian networks. We also propose a new EM algorithm, the graphical EM algorithm, that runs for a class of parameterized logic programs representing sequential decision processes where each decision is exclusive and independent. It runs on a new data structure called support graphs describing the logical relationship between observations and their explanations, and learns parameters by computing inside and outside probability generalized for logic programs. The complexity analysis shows that when combined with OLDT search for all explanations for observations, the graphical EM algorithm, despite its generality, has the same time complexity as existing EM algorithms, i.e. the Baum-Welch algorithm for HMMs, the Inside-Outside algorithm for PCFGs, and the one for singly connected Bayesian networks that have been developed independently in each research field. Learning experiments with PCFGs using two corpora of moderate size indicate that the graphical EM algorithm can significantly outperform the Inside-Outside algorithm.\nSimple conceptual graphs are considered as the kernel of most knowledge representation formalisms built upon Sowa's model. Reasoning in this model can be expressed by a graph homomorphism called projection, whose semantics is usually given in terms of positive, conjunctive, existential FOL. 
We present here a family of extensions of this model, based on rules and constraints, keeping graph homomorphism as the basic operation. We focus on the formal definitions of the different models obtained, including their operational semantics and relationships with FOL, and we analyze the decidability and complexity of the associated problems (consistency and deduction). As soon as rules are involved in reasonings, these problems are not decidable, but we exhibit a condition under which they fall in the polynomial hierarchy. These results extend and complete the ones already published by the authors. Moreover we systematically study the complexity of some particular cases obtained by restricting the form of constraints and/or rules.\nFusions are a simple way of combining logics. For normal modal logics, fusions have been investigated in detail. In particular, it is known that, under certain conditions, decidability transfers from the component logics to their fusion. Though description logics are closely related to modal logics, they are not necessarily normal. In addition, ABox reasoning in description logics is not covered by the results from modal logics. In this paper, we extend the decidability transfer results from normal modal logics to a large class of description logics. To cover different description logics in a uniform way, we introduce abstract description systems, which can be seen as a common generalization of description and modal logics, and show the transfer results in this general setting.\nCommon wisdom has it that small distinctions in the probabilities (parameters) quantifying a belief network do not matter much for the results of probabilistic queries. Yet, one can develop realistic scenarios under which small variations in network parameters can lead to significant changes in computed queries. A pending theoretical question is then to analytically characterize parameter changes that do or do not matter. 
In this paper, we study the sensitivity of probabilistic queries to changes in network parameters and prove some tight bounds on the impact that such parameters can have on queries. Our analytic results pinpoint some interesting situations under which parameter changes do or do not matter. These results are important for knowledge engineers as they help them identify influential network parameters. They also help explain some of the previous experimental results and observations with regard to network robustness against parameter changes.\nSpoken dialogue systems promise efficient and natural access to a large variety of information sources and services from any phone. However, current spoken dialogue systems are deficient in their strategies for preventing, identifying and repairing problems that arise in the conversation. This paper reports results on automatically training a Problematic Dialogue Predictor to predict problematic human-computer dialogues using a corpus of 4692 dialogues collected with the 'How May I Help You' (SM) spoken dialogue system. The Problematic Dialogue Predictor can be immediately applied to the system's decision of whether to transfer the call to a human customer care agent, or be used as a cue to the system's dialogue manager to modify its behavior to repair problems, and even perhaps, to prevent them. We show that a Problematic Dialogue Predictor using automatically-obtainable features from the first two exchanges in the dialogue can predict problematic dialogues 13.2% more accurately than the baseline.\nWe propose a perspective on knowledge compilation which calls for analyzing different compilation approaches according to two key dimensions: the succinctness of the target compilation language, and the class of queries and transformations that the language supports in polytime. 
We then provide a knowledge compilation map, which analyzes a large number of existing target compilation languages according to their succinctness and their polytime transformations and queries. We argue that such analysis is necessary for placing new compilation approaches within the context of existing ones. We also go beyond classical, flat target compilation languages based on CNF and DNF, and consider a richer, nested class based on directed acyclic graphs (such as OBDDs), which we show to include a relatively large number of target compilation languages.\nThe problem of organizing information for multidocument summarization so that the generated summary is coherent has received relatively little attention. While sentence ordering for single document summarization can be determined from the ordering of sentences in the input article, this is not the case for multidocument summarization where summary sentences may be drawn from different input articles. In this paper, we propose a methodology for studying the properties of ordering information in the news genre and describe experiments done on a corpus of multiple acceptable orderings we developed for the task. Based on these experiments, we implemented a strategy for ordering information that combines constraints from chronological order of events and topical relatedness. Evaluation of our augmented algorithm shows a significant improvement of the ordering over two baseline strategies.\nPrediction markets show considerable promise for developing flexible mechanisms for machine learning. Here, machine learning markets for multivariate systems are defined, and a utility-based framework is established for their analysis. This differs from the usual approach of defining static betting functions. It is shown that such markets can implement model combination methods used in machine learning, such as product of expert and mixture of expert approaches as equilibrium pricing models, by varying agent utility functions. 
They can also implement models composed of local potentials, and message passing methods. Prediction markets also allow for more flexible combinations, by combining multiple different utility functions. Conversely, the market mechanisms implement inference in the relevant probabilistic models. This means that market mechanisms can be utilized for implementing parallelized model building and inference for probabilistic modelling.\nWe develop, analyze, and evaluate a novel, supervised, specific-to-general learner for a simple temporal logic and use the resulting algorithm to learn visual event definitions from video sequences. First, we introduce a simple, propositional, temporal, event-description language called AMA that is sufficiently expressive to represent many events yet sufficiently restrictive to support learning. We then give algorithms, along with lower and upper complexity bounds, for the subsumption and generalization problems for AMA formulas. We present a positive-examples--only specific-to-general learning method based on these algorithms. We also present a polynomial-time--computable ``syntactic'' subsumption test that implies semantic subsumption without being equivalent to it. A generalization algorithm based on syntactic subsumption can be used in place of semantic generalization to improve the asymptotic complexity of the resulting learning algorithm. Finally, we apply this algorithm to the task of learning relational event definitions from video and show that it yields definitions that are competitive with hand-coded ones.\nIn this paper, we analyze the decision version of the NK landscape model from the perspective of threshold phenomena and phase transitions under two random distributions, the uniform probability model and the fixed ratio model. 
For the uniform probability model, we prove that the phase transition is easy in the sense that there is a polynomial algorithm that can solve a random instance of the problem with the probability asymptotic to 1 as the problem size tends to infinity. For the fixed ratio model, we establish several upper bounds for the solubility threshold, and prove that random instances with parameters above these upper bounds can be solved polynomially. This, together with our empirical study for random instances generated below and in the phase transition region, suggests that the phase transition of the fixed ratio model is also easy.\nIndependence -- the study of what is relevant to a given problem of reasoning -- has received increasing attention from the AI community. In this paper, we consider two basic forms of independence, namely, a syntactic one and a semantic one. We show features and drawbacks of them. In particular, while the syntactic form of independence is computationally easy to check, there are cases in which things that intuitively are not relevant are not recognized as such. We also consider the problem of forgetting, i.e., distilling from a knowledge base only the part that is relevant to the set of queries constructed from a subset of the alphabet. While such a process is computationally hard, it allows for a simplification of subsequent reasoning, and can thus be viewed as a form of compilation: once the relevant part of a knowledge base has been extracted, all reasoning tasks to be performed can be simplified.\nThis paper evaluates the different tasks carried out in the translation of pronominal anaphora in a machine translation (MT) system. 
The MT interlingua approach named AGIR (Anaphora Generation with an Interlingua Representation) improves upon other proposals presented to date because it is able to translate intersentential anaphors, detect co-reference chains, and translate Spanish zero pronouns into English---issues hardly considered by other systems. The paper presents the resolution and evaluation of these anaphora problems in AGIR with the use of different kinds of knowledge (lexical, morphological, syntactic, and semantic). The translation of English and Spanish anaphoric third-person personal pronouns (including Spanish zero pronouns) into the target language has been evaluated on unrestricted corpora. We have obtained a precision of 80.4% and 84.8% in the translation of Spanish and English pronouns, respectively. Although we have only studied the Spanish and English languages, our approach can be easily extended to other languages such as Portuguese, Italian, or Japanese.\nWe present a probabilistic generative model for timing deviations in expressive music performance. The structure of the proposed model is equivalent to a switching state space model. The switch variables correspond to discrete note locations as in a musical score. The continuous hidden variables denote the tempo. We formulate two well known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization) as filtering and maximum a posteriori (MAP) state estimation tasks. Exact computation of posterior features such as the MAP state is intractable in this model class, so we introduce Monte Carlo methods for integration and optimization. We compare Markov Chain Monte Carlo (MCMC) methods (such as Gibbs sampling, simulated annealing and iterative improvement) and sequential Monte Carlo methods (particle filters). Our simulation results suggest better results with sequential methods. 
The methods can be applied in both online and batch scenarios such as tempo tracking and transcription and are thus potentially useful in a number of music applications such as adaptive automatic accompaniment, score typesetting and music information retrieval.\nBayesian belief networks have grown to prominence because they provide compact representations for many problems for which probabilistic inference is appropriate, and there are algorithms to exploit this compactness. The next step is to allow compact representations of the conditional probabilities of a variable given its parents. In this paper we present such a representation that exploits contextual independence in terms of parent contexts; which variables act as parents may depend on the value of other variables. The internal representation is in terms of contextual factors (confactors) that is simply a pair of a context and a table. The algorithm, contextual variable elimination, is based on the standard variable elimination algorithm that eliminates the non-query variables in turn, but when eliminating a variable, the tables that need to be multiplied can depend on the context. This algorithm reduces to standard variable elimination when there is no contextual independence structure to exploit. We show how this can be much more efficient than variable elimination when there is structure to exploit. We explain why this new method can exploit more structure than previous methods for structured belief network inference and an analogous algorithm that uses trees.\nIn this article we present an algorithm to compute bounds on the marginals of a graphical model. For several small clusters of nodes upper and lower bounds on the marginal values are computed independently of the rest of the network. The range of allowed probability distributions over the surrounding nodes is restricted using earlier computed bounds. 
As we will show, this can be considered as a set of constraints in a linear programming problem of which the objective function is the marginal probability of the center nodes. In this way knowledge about the marginals of neighbouring clusters is passed to other clusters thereby tightening the bounds on their marginals. We show that sharp bounds can be obtained for undirected and directed graphs that are used for practical applications, but for which exact computations are infeasible.\nPolicies of Markov Decision Processes (MDPs) determine the next action to execute from the current state and, possibly, the history (the past states). When the number of states is large, succinct representations are often used to compactly represent both the MDPs and the policies in a reduced amount of space. In this paper, some problems related to the size of succinctly represented policies are analyzed. Namely, it is shown that some MDPs have policies that can only be represented in space super-polynomial in the size of the MDP, unless the polynomial hierarchy collapses. This fact motivates the study of the problem of deciding whether a given MDP has a policy of a given size and reward. Since some algorithms for MDPs work by finding a succinct representation of the value function, the problem of deciding the existence of a succinct representation of a value function of a given size and reward is also considered.\nWe describe a system for specifying the effects of actions. Unlike those commonly used in AI planning, our system uses an action description language that allows one to specify the effects of actions using domain rules, which are state constraints that can entail new action effects from old ones. Declaratively, an action domain in our language corresponds to a nonmonotonic causal theory in the situation calculus. 
Procedurally, such an action domain is compiled into a set of logical theories, one for each action in the domain, from which fully instantiated successor state-like axioms and STRIPS-like systems are then generated. We expect the system to be a useful tool for knowledge engineers writing action specifications for classical AI planning systems, GOLOG systems, and other systems where formal specifications of actions are needed.\nVHPOP is a partial order causal link (POCL) planner loosely based on UCPOP. It draws from the experience gained in the early to mid 1990's on flaw selection strategies for POCL planning, and combines this with more recent developments in the field of domain independent planning such as distance based heuristics and reachability analysis. We present an adaptation of the additive heuristic for plan space planning, and modify it to account for possible reuse of existing actions in a plan. We also propose a large set of novel flaw selection strategies, and show how these can help us solve more problems than previously possible by POCL planners. VHPOP also supports planning with durative actions by incorporating standard techniques for temporal constraint reasoning. We demonstrate that the same heuristic techniques used to boost the performance of classical POCL planning can be effective in domains with durative actions as well. The result is a versatile heuristic POCL planner competitive with established CSP-based and heuristic state space planners.\nRecently, planning based on answer set programming has been proposed as an approach towards realizing declarative planning systems. In this paper, we present the language Kc, which extends the declarative planning language K by action costs. Kc provides the notion of admissible and optimal plans, which are plans whose overall action costs are within a given limit resp. minimum over all plans (i.e., cheapest plans). 
As we demonstrate, this novel language allows for expressing some nontrivial planning tasks in a declarative way. Furthermore, it can be utilized for representing planning problems under other optimality criteria, such as computing ``shortest'' plans (with the least number of steps), and refinement combinations of cheapest and fastest plans. We study complexity aspects of the language Kc and provide a transformation to logic programs, such that planning problems are solved via answer set programming. Furthermore, we report experimental results on selected problems. Our experience is encouraging that answer set planning may be a valuable approach to expressive planning systems in which intricate planning problems can be naturally specified and solved.\nSAPA is a domain-independent heuristic forward chaining planner that can handle durative actions, metric resource constraints, and deadline goals. It is designed to be capable of handling the multi-objective nature of metric temporal planning. Our technical contributions include (i) planning-graph based methods for deriving heuristics that are sensitive to both cost and makespan (ii) techniques for adjusting the heuristic estimates to take action interactions and metric resource limitations into account and (iii) a linear time greedy post-processing technique to improve execution flexibility of the solution plans. An implementation of SAPA using many of the techniques presented in this paper was one of the best domain independent planners for domains with metric and temporal constraints in the third International Planning Competition, held at AIPS-02. We describe the technical details of extracting the heuristics and present an empirical evaluation of the current implementation of SAPA.\nDespite their near dominance, heuristic state search planners still lag behind disjunctive planners in the generation of parallel plans in classical planning. 
The reason is that directly searching for parallel solutions in state space planners would require the planners to branch on all possible subsets of parallel actions, thus increasing the branching factor exponentially. We present a variant of our heuristic state search planner AltAlt, called AltAltp which generates parallel plans by using greedy online parallelization of partial plans. The greedy approach is significantly informed by the use of novel distance heuristics that AltAltp derives from a graphplan-style planning graph for the problem. While this approach is not guaranteed to provide optimal parallel plans, empirical results show that AltAltp is capable of generating good quality parallel plans at a fraction of the cost incurred by the disjunctive planners.\nWe present some techniques for planning in domains specified with the recent standard language PDDL2.1, supporting 'durative actions' and numerical quantities. These techniques are implemented in LPG, a domain-independent planner that took part in the 3rd International Planning Competition (IPC). LPG is an incremental, any time system producing multi-criteria quality plans. The core of the system is based on a stochastic local search method and on a graph-based representation called 'Temporal Action Graphs' (TA-graphs). This paper focuses on temporal planning, introducing TA-graphs and proposing some techniques to guide the search in LPG using this representation. The experimental results of the 3rd IPC, as well as further results presented in this paper, show that our techniques can be very effective. Often LPG outperforms all other fully-automated planners of the 3rd IPC in terms of speed to derive a solution, or quality of the solutions that can be produced.\nTALplanner is a forward-chaining planner that relies on domain knowledge in the shape of temporal logic formulas in order to prune irrelevant parts of the search space. 
TALplanner recently participated in the third International Planning Competition, which had a clear emphasis on increasing the complexity of the problem domains being used as benchmark tests and the expressivity required to represent these domains in a planning system. Like many other planners, TALplanner had support for some but not all aspects of this increase in expressivity, and a number of changes to the planner were required. After a short introduction to TALplanner, this article describes some of the changes that were made before and during the competition. We also describe the process of introducing suitable domain knowledge for several of the competition domains.\nThe performance of anytime algorithms can be improved by simultaneously solving several instances of algorithm-problem pairs. These pairs may include different instances of a problem (such as starting from a different initial state), different algorithms (if several alternatives exist), or several runs of the same algorithm (for non-deterministic algorithms). In this paper we present a methodology for designing an optimal scheduling policy based on the statistical characteristics of the algorithms involved. We formally analyze the case where the processes share resources (a single-processor model), and provide an algorithm for optimal scheduling. We analyze, theoretically and empirically, the behavior of our scheduling algorithm for various distribution types. Finally, we present empirical results of applying our scheduling algorithm to the Latin Square problem.\nAuctions are becoming an increasingly popular method for transacting business, especially over the Internet. This article presents a general approach to building autonomous bidding agents to bid in multiple simultaneous auctions for interacting goods. 
A core component of our approach learns a model of the empirical price dynamics based on past data and uses the model to analytically calculate, to the greatest extent possible, optimal bids. We introduce a new and general boosting-based algorithm for conditional density estimation problems of this kind, i.e., supervised learning problems in which the goal is to estimate the entire conditional distribution of the real-valued label. This approach is fully implemented as ATTac-2001, a top-scoring agent in the second Trading Agent Competition (TAC-01). We present experiments demonstrating the effectiveness of our boosting-based price predictor relative to several reasonable alternatives.\nPlanning with numeric state variables has been a challenge for many years, and was a part of the 3rd International Planning Competition (IPC-3). Currently one of the most popular and successful algorithmic techniques in STRIPS planning is to guide search by a heuristic function, where the heuristic is based on relaxing the planning task by ignoring the delete lists of the available actions. We present a natural extension of ``ignoring delete lists'' to numeric state variables, preserving the relevant theoretical properties of the STRIPS relaxation under the condition that the numeric task at hand is ``monotonic''. We then identify a subset of the numeric IPC-3 competition language, ``linear tasks'', where monotonicity can be achieved by pre-processing. Based on that, we extend the algorithms used in the heuristic planning system FF to linear tasks. The resulting system Metric-FF is, according to the IPC-3 results which we discuss, one of the two currently most efficient numeric planners.\nThis paper reports the outcome of the third in the series of biennial international planning competitions, held in association with the International Conference on AI Planning and Scheduling (AIPS) in 2002. 
In addition to describing the domains, the planners and the objectives of the competition, the paper includes analysis of the results. The results are analysed from several perspectives, in order to address the questions of comparative performance between planners, comparative difficulty of domains, the degree of agreement between planners about the relative difficulty of individual problem instances and the question of how well planners scale relative to one another over increasingly difficult problems. The paper addresses these questions through statistical analysis of the raw results of the competition, in order to determine which results can be considered to be adequately supported by the data. The paper concludes with a discussion of some challenges for the future of the competition series.\nInformation about user preferences plays a key role in automated decision making. In many domains it is desirable to assess such preferences in a qualitative rather than quantitative way. In this paper, we propose a qualitative graphical representation of preferences that reflects conditional dependence and independence of preference statements under a ceteris paribus (all else being equal) interpretation. Such a representation is often compact and arguably quite natural in many circumstances. We provide a formal semantics for this model, and describe how the structure of the network can be exploited in several inference tasks, such as determining whether one outcome dominates (is preferred to) another, ordering a set of outcomes according to the preference relation, and constructing the best outcome subject to available evidence.\nWe propose a formalism for representation of finite languages, referred to as the class of IDL-expressions, which combines concepts that were only considered in isolation in existing formalisms. 
The suggested applications are in natural language processing, more specifically in surface natural language generation and in machine translation, where a sentence is obtained by first generating a large set of candidate sentences, represented in a compact way, and then by filtering such a set through a parser. We study several formal properties of IDL-expressions and compare this new formalism with more standard ones. We also present a novel parsing algorithm for IDL-expressions and prove a non-trivial upper bound on its time complexity.\nHierarchical latent class (HLC) models are tree-structured Bayesian networks where leaf nodes are observed while internal nodes are latent. There are no theoretically well justified model selection criteria for HLC models in particular and Bayesian networks with latent nodes in general. Nonetheless, empirical studies suggest that the BIC score is a reasonable criterion to use in practice for learning HLC models. Empirical studies also suggest that sometimes model selection can be improved if standard model dimension is replaced with effective model dimension in the penalty term of the BIC score. Effective dimensions are difficult to compute. In this paper, we prove a theorem that relates the effective dimension of an HLC model to the effective dimensions of a number of latent class models. The theorem makes it computationally feasible to compute the effective dimensions of large HLC models. The theorem can also be used to compute the effective dimensions of general tree models.\nSearching for and making decisions about information is becoming increasingly difficult as the amount of information and number of choices increases. Recommendation systems help users find items of interest of a particular type, such as movies or restaurants, but are still somewhat awkward to use. Our solution is to take advantage of the complementary strengths of personalized recommendation systems and dialogue systems, creating personalized aides. 
We present a system -- the Adaptive Place Advisor -- that treats item selection as an interactive, conversational process, with the program inquiring about item attributes and the user responding. Individual, long-term user preferences are unobtrusively obtained in the course of normal recommendation dialogues and used to direct future conversations with the same user. We present a novel user model that influences both item search and the questions asked during a conversation. We demonstrate the effectiveness of our system in significantly reducing the time and number of interactions required to find a satisfactory item, as compared to a control group of users interacting with a non-adaptive version of the system.\nWe introduce an abductive method for a coherent integration of independent data-sources. The idea is to compute a list of data-facts that should be inserted into the amalgamated database or retracted from it in order to restore its consistency. This method is implemented by an abductive solver, called Asystem, that applies SLDNFA-resolution on a meta-theory that relates different, possibly contradicting, input databases. We also give a pure model-theoretic analysis of the possible ways to `recover' consistent data from an inconsistent database in terms of those models of the database that exhibit as minimal inconsistent information as reasonably possible. This allows us to characterize the `recovered databases' in terms of the `preferred' (i.e., most consistent) models of the theory. The outcome is an abductive-based application that is sound and complete with respect to a corresponding model-based, preferential semantics, and -- to the best of our knowledge -- is more expressive (thus more general) than any other implementation of coherent integration of databases.\nWe present a visually-grounded language understanding model based on a study of how people verbally describe objects in scenes. 
The emphasis of the model is on the combination of individual word meanings to produce meanings for complex referring expressions. The model has been implemented, and it is able to understand a broad range of spatial referring expressions. We describe our implementation of word level visually-grounded semantics and their embedding in a compositional parsing framework. The implemented system selects the correct referents in response to natural language expressions for a large percentage of test cases. In an analysis of the system's successes and failures we reveal how visual context influences the semantics of utterances and propose future extensions to the model that take such context into account.\nThe 2002 Trading Agent Competition (TAC) presented a challenging market game in the domain of travel shopping. One of the pivotal issues in this domain is uncertainty about hotel prices, which have a significant influence on the relative cost of alternative trip schedules. Thus, virtually all participants employ some method for predicting hotel prices. We survey approaches employed in the tournament, finding that agents apply an interesting diversity of techniques, taking into account differing sources of evidence bearing on prices. Based on data provided by entrants on their agents' actual predictions in the TAC-02 finals and semifinals, we analyze the relative efficacy of these approaches. The results show that taking into account game-specific information about flight prices is a major distinguishing factor. Machine learning methods effectively induce the relationship between flight and hotel prices from game data, and a purely analytical approach based on competitive equilibrium analysis achieves equal accuracy with no historical data. 
Employing a new measure of prediction quality, we relate absolute accuracy to bottom-line performance in the game.\nThe predominant knowledge-based approach to automated model construction, compositional modelling, employs a set of models of particular functional components. Its inference mechanism takes a scenario describing the constituent interacting components of a system and translates it into a useful mathematical model. This paper presents a novel compositional modelling approach aimed at building model repositories. It furthers the field in two respects. Firstly, it expands the application domain of compositional modelling to systems that can not be easily described in terms of interacting functional components, such as ecological systems. Secondly, it enables the incorporation of user preferences into the model selection process. These features are achieved by casting the compositional modelling problem as an activity-based dynamic preference constraint satisfaction problem, where the dynamic constraints describe the restrictions imposed over the composition of partial models and the preferences correspond to those of the user of the automated modeller. In addition, the preference levels are represented through the use of symbolic values that differ in orders of magnitude.\nTwo major goals in machine learning are the discovery and improvement of solutions to complex problems. In this paper, we argue that complexification, i.e. the incremental elaboration of solutions through adding new structure, achieves both these goals. We demonstrate the power of complexification through the NeuroEvolution of Augmenting Topologies (NEAT) method, which evolves increasingly complex neural network architectures. NEAT is applied to an open-ended coevolutionary robot duel domain where robot controllers compete head to head. 
Because the robot duel domain supports a wide range of strategies, and because coevolution benefits from an escalating arms race, it serves as a suitable testbed for studying complexification. When compared to the evolution of networks with fixed structure, complexifying evolution discovers significantly more sophisticated strategies. The results suggest that in order to discover and improve complex solutions, evolution, and search in general, should be allowed to complexify as well as optimize.\nWhen writing a constraint program, we have to choose which variables should be the decision variables, and how to represent the constraints on these variables. In many cases, there is considerable choice for the decision variables. Consider, for example, permutation problems in which we have as many values as variables, and each variable takes a unique value. In such problems, we can choose between a primal and a dual viewpoint. In the dual viewpoint, each dual variable represents one of the primal values, whilst each dual value represents one of the primal variables. Alternatively, by means of channelling constraints to link the primal and dual variables, we can have a combined model with both sets of variables. In this paper, we perform an extensive theoretical and empirical study of such primal, dual and combined models for two classes of problems: permutation problems and injection problems. Our results show that it can often be advantageous to use multiple viewpoints, and to have constraints which channel between them to maintain consistency. They also illustrate a general methodology for comparing different constraint models.\nThis is the first of three planned papers describing ZAP, a satisfiability engine that substantially generalizes existing tools while retaining the performance characteristics of modern high-performance solvers. 
The fundamental idea underlying ZAP is that many problems passed to such engines contain rich internal structure that is obscured by the Boolean representation used; our goal is to define a representation in which this structure is apparent and can easily be exploited to improve computational performance. This paper is a survey of the work underlying ZAP, and discusses previous attempts to improve the performance of the Davis-Putnam-Logemann-Loveland algorithm by exploiting the structure of the problem being solved. We examine existing ideas including extensions of the Boolean language to allow cardinality constraints, pseudo-Boolean representations, symmetry, and a limited form of quantification. While this paper is intended as a survey, our research results are contained in the two subsequent articles, with the theoretical structure of ZAP described in the second paper in this series, and ZAP's implementation described in the third.\nArgumentation is based on the exchange and valuation of interacting arguments, followed by the selection of the most acceptable of them (for example, in order to take a decision, to make a choice). Starting from the framework proposed by Dung in 1995, our purpose is to introduce 'graduality' in the selection of the best arguments, i.e., to be able to partition the set of the arguments in more than the two usual subsets of 'selected' and 'non-selected' arguments in order to represent different levels of selection. Our basic idea is that an argument is all the more acceptable if it can be preferred to its attackers. First, we discuss general principles underlying a 'gradual' valuation of arguments based on their interactions. Following these principles, we define several valuation models for an abstract argumentation system. Then, we introduce 'graduality' in the concept of acceptability of arguments. 
We propose new acceptability classes and a refinement of existing classes taking advantage of an available 'gradual' valuation.\nDecentralized control of cooperative systems captures the operation of a group of decision makers that share a single global objective. The difficulty in optimally solving such problems arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to optimally solve useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision-makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems.\nThis paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). 
We discuss the properties of these algorithms and compare their performance using real life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a \"decomposed\" CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.\nMany known planning tasks have inherent constraints concerning the best order in which to achieve the goals. A number of research efforts have been made to detect such constraints and to use them for guiding search, in the hope of speeding up the planning process. We go beyond the previous approaches by considering ordering constraints not only over the (top-level) goals, but also over the sub-goals that will necessarily arise during planning. Landmarks are facts that must be true at some point in every valid solution plan. We extend Koehler and Hoffmann's definition of reasonable orders between top level goals to the more general case of landmarks. We show how landmarks can be found, how their reasonable orders can be approximated, and how this information can be used to decompose a given planning task into several smaller sub-tasks. Our methodology is completely domain- and planner-independent. The implementation demonstrates that the approach can yield significant runtime performance improvements when used as a control loop around state-of-the-art sub-optimal planning systems, as exemplified by FF and LPG.\nWe propose a model for errors in sung queries, a variant of the hidden Markov model (HMM). 
This is a solution to the problem of identifying the degree of similarity between a (typically error-laden) sung query and a potential target in a database of musical works, an important problem in the field of music information retrieval. Similarity metrics are a critical component of query-by-humming (QBH) applications which search audio and multimedia databases for strong matches to oral queries. Our model comprehensively expresses the types of error or variation between target and query: cumulative and non-cumulative local errors, transposition, tempo and tempo changes, insertions, deletions and modulation. The model is not only expressive, but also automatically trainable, or able to learn and generalize from query examples. We present results of simulations, designed to assess the discriminatory potential of the model, and tests with real sung queries, to demonstrate relevance to real-world applications.\nIn recent years, there has been much interest in phase transitions of combinatorial problems. Phase transitions have been successfully used to analyze combinatorial optimization problems, characterize their typical-case features and locate the hardest problem instances. In this paper, we study phase transitions of the asymmetric Traveling Salesman Problem (ATSP), an NP-hard combinatorial optimization problem that has many real-world applications. Using random instances of up to 1,500 cities in which intercity distances are uniformly distributed, we empirically show that many properties of the problem, including the optimal tour cost and backbone size, experience sharp transitions as the precision of intercity distances increases across a critical value. Our experimental results on the costs of the ATSP tours and assignment problem agree with the theoretical result that the asymptotic cost of the assignment problem is pi^2/6 as the number of cities goes to infinity. 
In addition, we show that the average computational cost of the well-known branch-and-bound subtour elimination algorithm for the problem also exhibits a thrashing behavior, transitioning from easy to difficult as the distance precision increases. These results answer positively an open question regarding the existence of phase transitions in the ATSP, and provide guidance on how difficult ATSP problem instances should be generated.\nPurely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modelling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from motion capture, computational biology and geostatistics.\nPerfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here we employ an axiomatic framework for bounded rational decision-making based on a thermodynamic interpretation of resource costs as information costs. This leads to a variational \"free utility\" principle akin to thermodynamical free energy that trades off utility and information costs. We show that bounded optimal control solutions can be derived from this variational principle, which leads in general to stochastic policies. 
Furthermore, we show that risk-sensitive and robust (minimax) control schemes fall out naturally from this framework if the environment is considered as a bounded rational and perfectly rational opponent, respectively. When resource costs are ignored, the maximum expected utility principle is recovered.\nData compression is a method of improving the efficiency of transmission and storage of images. Dithering, as a method of data compression, can be used to convert an 8-bit gray level image into a 1-bit / binary image. Undithering is the process of reconstruction of gray image from binary image obtained from dithering of gray image. In the present paper, I propose a method of undithering using linear filtering followed by anisotropic diffusion which brings the advantage of smoothing and edge enhancement. First-order statistical parameters, second-order statistical parameters, mean-squared error (MSE) between reconstructed image and the original image before dithering, and peak signal to noise ratio (PSNR) are evaluated at each step of diffusion. Results of the experiments show that the reconstructed image is not as sharp as the image before dithering but a large number of gray values are reproduced with reference to those of the original image prior to dithering.\nThis paper begins the discussion of how the Information Flow Framework can be used to provide a principled foundation for the metalevel (or structural level) of the Standard Upper Ontology (SUO). This SUO structural level can be used as a logical framework for manipulating collections of ontologies in the object level of the SUO or other middle level or domain ontologies. From the Information Flow perspective, the SUO structural level resolves into several metalevel ontologies. This paper discusses a KIF formalization for one of those metalevel categories, the Category Theory Ontology. 
In particular, it discusses its category and colimit sub-namespaces.\nThis paper describes the Automated Reasoning for Mizar (MizAR) service, which integrates several automated reasoning, artificial intelligence, and presentation tools with Mizar and its authoring environment. The service provides ATP assistance to Mizar authors in finding and explaining proofs, and offers generation of Mizar problems as challenges to ATP systems. The service is based on a sound translation from the Mizar language to that of first-order ATP systems, and relies on the recent progress in application of ATP systems in large theories containing tens of thousands of available facts. We present the main features of MizAR services, followed by an account of initial experiments in finding proofs with the ATP assistance. Our initial experience indicates that the tool offers substantial help in exploring the Mizar library and in preparing new Mizar articles.\nWe propose a structured approach to the problem of retrieval of images by content and present a description logic that has been devised for the semantic indexing and retrieval of images containing complex objects. As other approaches do, we start from low-level features extracted with image analysis to detect and characterize regions in an image. However, in contrast with feature-based approaches, we provide a syntax to describe segmented regions as basic objects and complex objects as compositions of basic ones. Then we introduce a companion extensional semantics for defining reasoning services, such as retrieval, classification, and subsumption. These services can be used for both exact and approximate matching, using similarity measures. Using our logical approach as a formal specification, we implemented a complete client-server image retrieval system, which allows a user to pose both queries by sketch and queries by example. 
A set of experiments has been carried out on a testbed of images to assess the retrieval capabilities of the system in comparison with expert users' ranking. Results are presented adopting a well-established measure of quality borrowed from textual information retrieval.\nThis is the second of three planned papers describing ZAP, a satisfiability engine that substantially generalizes existing tools while retaining the performance characteristics of modern high-performance solvers. The fundamental idea underlying ZAP is that many problems passed to such engines contain rich internal structure that is obscured by the Boolean representation used; our goal is to define a representation in which this structure is apparent and can easily be exploited to improve computational performance. This paper presents the theoretical basis for the ideas underlying ZAP, arguing that existing ideas in this area exploit a single, recurring structure in that multiple database axioms can be obtained by operating on a single axiom using a subgroup of the group of permutations on the literals in the problem. We argue that the group structure precisely captures the general structure at which earlier approaches hinted, and give numerous examples of its use. We go on to extend the Davis-Putnam-Logemann-Loveland inference procedure to this broader setting, and show that earlier computational improvements are either subsumed or left intact by the new method. The third paper in this series discusses ZAP's implementation and presents experimental performance results.\nStochastic processes that involve the creation of objects and relations over time are widespread, but relatively poorly studied. For example, accurate fault diagnosis in factory assembly processes requires inferring the probabilities of erroneous assembly operations, but doing this efficiently and accurately is difficult. 
Modeled as dynamic Bayesian networks, these processes have discrete variables with very large domains and extremely high dimensionality. In this paper, we introduce relational dynamic Bayesian networks (RDBNs), which are an extension of dynamic Bayesian networks (DBNs) to first-order logic. RDBNs are a generalization of dynamic probabilistic relational models (DPRMs), which we had proposed in our previous work to model dynamic uncertain domains. We first extend the Rao-Blackwellised particle filtering described in our earlier work to RDBNs. Next, we lift the assumptions associated with Rao-Blackwellization in RDBNs and propose two new forms of particle filtering. The first one uses abstraction hierarchies over the predicates to smooth the particle filter's estimates. The second employs kernel density estimation with a kernel function specifically designed for relational domains. Experiments show these two methods greatly outperform standard particle filtering on the task of assembly plan execution monitoring.\nIn this paper we present a new approach to modeling finite set domain constraint problems using Reduced Ordered Binary Decision Diagrams (ROBDDs). We show that it is possible to construct an efficient set domain propagator which compactly represents many set domains and set constraints using ROBDDs. We demonstrate that the ROBDD-based approach provides unprecedented flexibility in modeling constraint satisfaction problems, leading to performance improvements. We also show that the ROBDD-based modeling approach can be extended to the modeling of integer and multiset constraint problems in a straightforward manner. Since domain propagation is not always practical, we also show how to incorporate less strict consistency notions into the ROBDD framework, such as set bounds, cardinality bounds and lexicographic bounds consistency. 
Finally, we present experimental results that demonstrate the ROBDD-based solver performs better than various more conventional constraint solvers on several standard set constraint problems.\nWe present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e., for investigating and processing explicitly given information. We follow Harris' distributional hypothesis and model the context of a certain term as a vector representing syntactic dependencies which are automatically acquired from the text corpus with a linguistic parser. On the basis of this context information, FCA produces a lattice that we convert into a special kind of partial order constituting a concept hierarchy. The approach is evaluated by comparing the resulting concept hierarchies with hand-crafted taxonomies for two domains: tourism and finance. We also directly compare our approach with hierarchical agglomerative clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering algorithm. Furthermore, we investigate the impact of using different measures weighting the contribution of each attribute as well as of applying a particular smoothing technique to cope with data sparseness.\nThis is the third of three papers describing ZAP, a satisfiability engine that substantially generalizes existing tools while retaining the performance characteristics of modern high-performance solvers. The fundamental idea underlying ZAP is that many problems passed to such engines contain rich internal structure that is obscured by the Boolean representation used; our goal has been to define a representation in which this structure is apparent and can be exploited to improve computational performance. 
The first paper surveyed existing work that (knowingly or not) exploited problem structure to improve the performance of satisfiability engines, and the second paper showed that this structure could be understood in terms of groups of permutations acting on individual clauses in any particular Boolean theory. We conclude the series by discussing the techniques needed to implement our ideas, and by reporting on their performance on a variety of problem instances.\nPartially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large-scale POMDP problems.\nIn this paper we propose a crossover operator for evolutionary algorithms with real values that is based on the statistical theory of population distributions. The operator is based on the theoretical distribution of the values of the genes of the best individuals in the population. The proposed operator takes into account the localization and dispersion features of the best individuals of the population with the objective that these features would be inherited by the offspring. 
Our aim is the optimization of the balance between exploration and exploitation in the search process. In order to test the efficiency and robustness of this crossover, we have used a set of functions to be optimized with regard to different criteria, such as multimodality, separability, regularity and epistasis. With this set of functions we can draw conclusions depending on the problem at hand. We analyze the results using ANOVA and multiple comparison statistical tests. As an example of how our crossover can be used to solve artificial intelligence problems, we have applied the proposed model to the problem of obtaining the weight of each network in an ensemble of neural networks. The results obtained exceed the performance of standard methods.\nDespite recent progress in AI planning, many benchmarks remain challenging for current planners. In many domains, the performance of a planner can greatly be improved by discovering and exploiting information about the domain structure that is not explicitly encoded in the initial PDDL formulation. In this paper we present and compare two automated methods that learn relevant information from previous experience in a domain and use it to solve new problem instances. Our methods share a common four-step strategy. First, a domain is analyzed and structural information is extracted, then macro-operators are generated based on the previously discovered structure. A filtering and ranking procedure selects the most useful macro-operators. Finally, the selected macros are used to speed up future searches. We have successfully used such an approach in the fourth international planning competition IPC-4. Our system, Macro-FF, extends Hoffmann's state-of-the-art planner FF 2.3 with support for two kinds of macro-operators, and with engineering enhancements. We demonstrate the effectiveness of our ideas on benchmarks from international planning competitions. 
Our results indicate a large reduction in search effort in those complex domains where structural information can be inferred.\nWe study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.\nWe provide an overview of the organization and results of the deterministic part of the 4th International Planning Competition, i.e., of the part concerned with evaluating systems doing deterministic planning. IPC-4 attracted even more competing systems than its already large predecessors, and the competition event was revised in several important respects. After giving an introduction to the IPC, we briefly explain the main differences between the deterministic part of IPC-4 and its predecessors. We then introduce formally the language used, called PDDL2.2 that extends PDDL2.1 by derived predicates and timed initial literals. We list the competing systems and overview the results of the competition. 
The entire set of data is far too large to be presented in full. We provide a detailed summary; the complete data is available in an online appendix. We explain how we awarded the competition prizes.\nA non-binary Constraint Satisfaction Problem (CSP) can be solved directly using extended versions of binary techniques. Alternatively, the non-binary problem can be translated into an equivalent binary one. In this case, it is generally accepted that the translated problem can be solved by applying well-established techniques for binary CSPs. In this paper we evaluate the applicability of the latter approach. We demonstrate that the use of standard techniques for binary CSPs in the encodings of non-binary problems is problematic and results in models that are very rarely competitive with the non-binary representation. To overcome this, we propose specialized arc consistency and search algorithms for binary encodings, and we evaluate them theoretically and empirically. We consider three binary representations: the hidden variable encoding, the dual encoding, and the double encoding. Theoretical and empirical results show that, for certain classes of non-binary constraints, binary encodings are a competitive option, and in many cases, a better one than the non-binary representation.\nIn this paper, we introduce DLS-MC, a new stochastic local search algorithm for the maximum clique problem. DLS-MC alternates between phases of iterative improvement, during which suitable vertices are added to the current clique, and plateau search, during which vertices of the current clique are swapped with vertices not contained in the current clique. The selection of vertices is solely based on vertex penalties that are dynamically adjusted during the search, and a perturbation mechanism is used to overcome search stagnation. The behaviour of DLS-MC is controlled by a single parameter, penalty delay, which controls the frequency at which vertex penalties are reduced. 
We show empirically that DLS-MC achieves substantial performance improvements over state-of-the-art algorithms for the maximum clique problem over a large range of the commonly used DIMACS benchmark instances.\nOpen distributed multi-agent systems are gaining interest in the academic community and in industry. In such open settings, agents are often coordinated using standardized agent conversation protocols. The representation of such protocols (for analysis, validation, monitoring, etc.) is an important aspect of multi-agent applications. Recently, Petri nets have been shown to be an interesting approach to such representation, and radically different approaches using Petri nets have been proposed. However, their relative strengths and weaknesses have not been examined. Moreover, their scalability and suitability for different tasks have not been addressed. This paper addresses both these challenges. First, we analyze existing Petri net representations in terms of their scalability and appropriateness for overhearing, an important task in monitoring open multi-agent systems. Then, building on the insights gained, we introduce a novel representation using Colored Petri nets that explicitly represent legal joint conversation states and messages. This representation approach offers significant improvements in scalability and is particularly suitable for overhearing. Furthermore, we show that this new representation offers a comprehensive coverage of all conversation features of FIPA conversation standards. We also present a procedure for transforming AUML conversation protocol diagrams (a standard human-readable representation) to our Colored Petri net representation.\nThis article develops Probabilistic Hybrid Action Models (PHAMs), a realistic causal model for predicting the behavior generated by modern percept-driven robot plans. 
PHAMs represent aspects of robot behavior that cannot be represented by most action models used in AI planning: the temporal structure of continuous control processes, their non-deterministic effects, several modes of their interferences, and the achievement of triggering conditions in closed-loop robot plans. The main contributions of this article are: (1) PHAMs, a model of concurrent percept-driven behavior, its formalization, and proofs that the model generates probably, qualitatively accurate predictions; and (2) a resource-efficient inference method for PHAMs based on sampling projections from probabilistic action models and state descriptions. We show how PHAMs can be applied to planning the course of action of an autonomous robot office courier based on analytical and experimental results.\nDistributed Constraint Satisfaction (DCSP) has long been considered an important problem in multi-agent systems research. This is because many real-world problems can be represented as constraint satisfaction and these problems often present themselves in a distributed form. In this article, we present a new complete, distributed algorithm called Asynchronous Partial Overlay (APO) for solving DCSPs that is based on a cooperative mediation process. The primary ideas behind this algorithm are that agents, when acting as a mediator, centralize small, relevant portions of the DCSP, that these centralized subproblems overlap, and that agents increase the size of their subproblems along critical paths within the DCSP as the problem solving unfolds. We present empirical evidence that shows that APO outperforms other known, complete DCSP techniques.\nAs partial justification of their framework for iterated belief revision, Darwiche and Pearl convincingly argued against Boutilier's natural revision and provided a prototypical revision operator that fits into their scheme. 
We show that the Darwiche-Pearl arguments lead naturally to the acceptance of a smaller class of operators which we refer to as admissible. Admissible revision ensures that the penultimate input is not ignored completely, thereby eliminating natural revision, but includes the Darwiche-Pearl operator, Nayak's lexicographic revision operator, and a newly introduced operator called restrained revision. We demonstrate that restrained revision is the most conservative of admissible revision operators, effecting as few changes as possible, while lexicographic revision is the least conservative, and point out that restrained revision can also be viewed as a composite operator, consisting of natural revision preceded by an application of a \"backwards revision\" operator previously studied by Papini. Finally, we propose the establishment of a principled approach for choosing an appropriate revision operator in different contexts and discuss future work.\nIn recent years, CP-nets have emerged as a useful tool for supporting preference elicitation, reasoning, and representation. CP-nets capture and support reasoning with qualitative conditional preference statements, statements that are relatively natural for users to express. In this paper, we extend the CP-nets formalism to handle another class of very natural qualitative statements one often uses in expressing preferences in daily life: statements of relative importance of attributes. The resulting formalism, TCP-nets, maintains the spirit of CP-nets, in that it remains focused on using only simple and natural preference statements, uses the ceteris paribus semantics, and utilizes a graphical representation of this information to reason about its consistency and to perform, possibly constrained, optimization using it. 
The extra expressiveness it provides allows us to better model tradeoffs users would like to make, more faithfully representing their preferences.\nA delta-model is a satisfying assignment of a Boolean formula for which any small alteration, such as a single bit flip, can be repaired by flips to some small number of other bits, yielding a new satisfying assignment. These satisfying assignments represent robust solutions to optimization problems (e.g., scheduling) where it is possible to recover from unforeseen events (e.g., a resource becoming unavailable). The concept of delta-models was introduced by Ginsberg, Parkes and Roy (AAAI 1998), where it was proved that finding delta-models for general Boolean formulas is NP-complete. In this paper, we extend that result by studying the complexity of finding delta-models for classes of Boolean formulas which are known to have polynomial time satisfiability solvers. In particular, we examine 2-SAT, Horn-SAT, Affine-SAT, dual-Horn-SAT, 0-valid and 1-valid SAT. We see a wide variation in the complexity of finding delta-models, e.g., while 2-SAT and Affine-SAT have polynomial time tests for delta-models, testing whether a Horn-SAT formula has one is NP-complete.\nMultimodal conversational interfaces provide a natural means for users to communicate with computer systems through multiple modalities such as speech and gesture. To build effective multimodal interfaces, automated interpretation of user multimodal inputs is important. Inspired by the previous investigation on cognitive status in multimodal human machine interaction, we have developed a greedy algorithm for interpreting user referring expressions (i.e., multimodal reference resolution). This algorithm incorporates the cognitive principles of Conversational Implicature and Givenness Hierarchy and applies constraints from various sources (e.g., temporal, semantic, and contextual) to resolve references. 
Our empirical results have shown the advantage of this algorithm in efficiently resolving a variety of user references. Because of its simplicity and generality, this approach has the potential to improve the robustness of multimodal input interpretation.\nThis paper presents a new framework for anytime heuristic search where the task is to achieve as many goals as possible within the allocated resources. We show the inadequacy of traditional distance-estimation heuristics for tasks of this type and present alternative heuristics that are more appropriate for multiple-goal search. In particular, we introduce the marginal-utility heuristic, which estimates the cost and the benefit of exploring a subtree below a search node. We developed two methods for online learning of the marginal-utility heuristic. One is based on local similarity of the partial marginal utility of sibling nodes, and the other generalizes marginal-utility over the state feature space. We apply our adaptive and non-adaptive multiple-goal search algorithms to several problems, including focused crawling, and show their superiority over existing methods.\nWe present a heuristic search algorithm for solving first-order Markov Decision Processes (FOMDPs). Our approach combines first-order state abstraction that avoids evaluating states individually, and heuristic search that avoids evaluating all states. Firstly, in contrast to existing systems, which start with propositionalizing the FOMDP and then perform state abstraction on its propositionalized version we apply state abstraction directly on the FOMDP avoiding propositionalization. This kind of abstraction is referred to as first-order state abstraction. Secondly, guided by an admissible heuristic, the search is restricted to those states that are reachable from the initial state. 
We demonstrate the usefulness of the above techniques for solving FOMDPs with a system, referred to as FluCaP (formerly, FCPlanner), that entered the probabilistic track of the 2004 International Planning Competition (IPC2004) and demonstrated an advantage over other planners on the problems represented in first-order terms.\nExact Max-SAT solvers, compared with SAT solvers, apply little inference at each node of the proof tree. Commonly used SAT inference rules like unit propagation produce a simplified formula that preserves satisfiability but, unfortunately, solving the Max-SAT problem for the simplified formula is not equivalent to solving it for the original formula. In this paper, we define a number of original inference rules that, besides being applied efficiently, transform Max-SAT instances into equivalent Max-SAT instances which are easier to solve. The soundness of the rules, which can be seen as refinements of unit resolution adapted to Max-SAT, is proved in a novel and simple way via an integer programming transformation. With the aim of finding out how powerful the inference rules are in practice, we have developed a new Max-SAT solver, called MaxSatz, which incorporates those rules, and performed an experimental investigation. The results provide empirical evidence that MaxSatz is very competitive, at least, on random Max-2SAT, random Max-3SAT, Max-Cut, and Graph 3-coloring instances, as well as on the benchmarks from the Max-SAT Evaluation 2006.\nReputation mechanisms offer an effective alternative to verification authorities for building trust in electronic markets with moral hazard. Future clients guide their business decisions by considering the feedback from past transactions; if truthfully exposed, cheating behavior is sanctioned and thus becomes irrational. It therefore becomes important to ensure that rational clients have the right incentives to report honestly. 
As an alternative to side-payment schemes that explicitly reward truthful reports, we show that honesty can emerge as a rational behavior when clients have a repeated presence in the market. To this end we describe a mechanism that supports an equilibrium where truthful feedback is obtained. Then we characterize the set of Pareto-optimal equilibria of the mechanism, and derive an upper bound on the percentage of false reports that can be recorded by the mechanism. An important role in the existence of this bound is played by the fact that rational clients can establish a reputation for reporting honestly.\nConjunctive queries play an important role as an expressive query language for Description Logics (DLs). Although modern DLs usually provide for transitive roles, conjunctive query answering over DL knowledge bases is only poorly understood if transitive roles are admitted in the query. In this paper, we consider unions of conjunctive queries over knowledge bases formulated in the prominent DL SHIQ and allow transitive roles in both the query and the knowledge base. We show decidability of query answering in this setting and establish two tight complexity bounds: regarding combined complexity, we prove that there is a deterministic algorithm for query answering that needs time single exponential in the size of the KB and double exponential in the size of the query, which is optimal. Regarding data complexity, we prove containment in co-NP.\nMulti-robot path planning is difficult due to the combinatorial explosion of the search space with every new robot added. Complete search of the combined state-space soon becomes intractable. In this paper we present a novel form of abstraction that allows us to plan much more efficiently. The key to this abstraction is the partitioning of the map into subgraphs of known structure with entry and exit restrictions which we can represent compactly. 
Planning then becomes a search in the much smaller space of subgraph configurations. Once an abstract plan is found, it can be quickly resolved into a correct (but possibly sub-optimal) concrete plan without the need for further search. We prove that this technique is sound and complete and demonstrate its practical effectiveness on a real map. A contending solution, prioritised planning, is also evaluated and shown to have similar performance albeit at the cost of completeness. The two approaches are not necessarily conflicting; we demonstrate how they can be combined into a single algorithm which outperforms either approach alone.\nModel checking is a promising technology, which has been applied to the verification of many hardware and software systems. In this paper, we introduce the concept of model update towards the development of an automatic system modification tool that extends model checking functions. We define primitive update operations on the models of Computation Tree Logic (CTL) and formalize the principle of minimal change for CTL model update. These primitive update operations, together with the underlying minimal change principle, serve as the foundation for CTL model update. Essential semantic and computational characterizations are provided for our CTL model update approach. We then describe a formal algorithm that implements this approach. We also illustrate two case studies of CTL model updates for the well-known microwave oven example and the Andrew File System 1, from which we further propose a method to optimize the update results in complex system modifications.\nOntologies and automated reasoning are the building blocks of the Semantic Web initiative. Derivation rules can be included in an ontology to define derived concepts, based on base concepts. For example, rules allow one to define the extension of a class or property, based on a complex relation between the extensions of the same or other classes and properties. 
On the other hand, the inclusion of negative information both in the form of negation-as-failure and explicit negative information is also needed to enable various forms of reasoning. In this paper, we extend RDF graphs with weak and strong negation, as well as derivation rules. The ERDF stable model semantics of the extended framework (Extended RDF) is defined, extending RDF(S) semantics. A distinctive feature of our theory, which is based on Partial Logic, is that both truth and falsity extensions of properties and classes are considered, allowing for truth value gaps. Our framework supports both closed-world and open-world reasoning through the explicit representation of the particular closed-world assumptions and the ERDF ontological categories of total properties and total classes.\nWe represent planning as a set of loosely coupled network flow problems, where each network corresponds to one of the state variables in the planning domain. The network nodes correspond to the state variable values and the network arcs correspond to the value transitions. The planning problem is to find a path (a sequence of actions) in each network such that, when merged, they constitute a feasible plan. In this paper we present a number of integer programming formulations that model these loosely coupled networks with varying degrees of flexibility. Since merging may introduce exponentially many ordering constraints, we implement a so-called branch-and-cut algorithm, in which these constraints are dynamically generated and added to the formulation when needed. Our results are very promising: they improve upon previous planning as integer programming approaches and lay the foundation for integer programming approaches for cost optimal planning.\nIn a facility with front room and back room operations, it is useful to switch workers between the rooms in order to cope with changing customer demand. 
Assuming stochastic customer arrival and service times, we seek a policy for switching workers such that the expected customer waiting time is minimized while the expected back room staffing is sufficient to perform all work. Three novel constraint programming models and several shaving procedures for these models are presented. Experimental results show that a model based on closed-form expressions together with a combination of shaving procedures is the most efficient. This model is able to find and prove optimal solutions for many problem instances within a reasonable run-time. Previously, the only available approach was a heuristic algorithm. Furthermore, a hybrid method combining the heuristic and the best constraint programming method is shown to perform as well as the heuristic in terms of solution quality over time, while achieving the same performance in terms of proving optimality as the pure constraint programming model. This is the first work of which we are aware that solves such queueing-based problems with constraint programming.\nMarkov decision processes capture sequential decision making under uncertainty, where an agent must choose actions so as to optimize long term reward. The paper studies efficient reasoning mechanisms for Relational Markov Decision Processes (RMDP) where world states have an internal relational structure that can be naturally described in terms of objects and relations among them. Two contributions are presented. First, the paper develops First Order Decision Diagrams (FODD), a new compact representation for functions over relational structures, together with a set of operators to combine FODDs, and novel reduction techniques to keep the representation small. Second, the paper shows how FODDs can be used to develop solutions for RMDPs, where reasoning is performed at the abstract level and the resulting optimal policy is independent of domain size (number of objects) or instantiation. 
In particular, a variant of the value iteration algorithm is developed by using special operations over FODDs, and the algorithm is shown to converge to the optimal policy.\nIn this paper, we investigate the problem of mining numerical data in the framework of Formal Concept Analysis. The usual way is to use a scaling procedure (transforming numerical attributes into binary ones), leading either to a loss of information or of efficiency, in particular w.r.t. the volume of extracted patterns. By contrast, we propose to directly work on numerical data in a more precise and efficient way, and we prove it. For that, the notions of closed patterns, generators and equivalence classes are revisited in the numerical context. Moreover, two original algorithms are proposed and used in an evaluation involving real-world data, showing the predominance of the present approach.\nSituation calculus has been widely applied in Artificial Intelligence related fields. This formalism is considered a dialect of logic programming language and mostly used in dynamic domain modeling. However, type systems are hardly deployed in situation calculus in the literature. To achieve a correct and sound typed program written in situation calculus, adding typing elements into the current situation calculus will be quite helpful. In this paper, we propose to add more typing mechanisms to the current version of situation calculus, especially for three basic elements in situation calculus: situations, actions and objects, and then perform rigid type checking for existing situation calculus programs to find out the well-typed and ill-typed ones. In this way, type correctness and soundness in situation calculus programs can be guaranteed by type checking based on our type system. 
This modified version of a lightweight situation calculus is proved to be a robust and well-typed system.\nSituation calculus has been applied widely in artificial intelligence to model and reason about actions and changes in dynamic systems. Since actions carried out by agents cause constant changes in the agents' beliefs, how to manage these changes is a very important issue. Shapiro et al. [22] is one of the studies that considered this issue. However, in this framework, the problem of noisy sensing, which often arises in real-world applications, is not considered. As a consequence, noisy sensing actions in this framework lead to an agent facing an inconsistent situation, after which the agent cannot proceed further. In this paper, we investigate how noisy sensing actions can be handled in iterated belief change within the situation calculus formalism. We extend the framework proposed in [22] with the capability of managing noisy sensing. We demonstrate that an agent can still detect the actual situation when the ratio of noisy sensing actions to accurate sensing actions is limited. We prove that our framework subsumes the iterated belief change strategy in [22] when all sensing actions are accurate. Furthermore, we prove that our framework can adequately handle belief introspection, mistaken beliefs, belief revision and belief update even with noisy sensing, as done in [22] with accurate sensing actions only.\nThe work presented here involves the design of a Multi-Layer Perceptron (MLP)-based pattern classifier for recognition of handwritten Bangla digits using a 76-element feature vector. Bangla is the second most popular script and language in the Indian subcontinent and the fifth most popular language in the world. The feature set developed for representing handwritten Bangla numerals here includes 24 shadow features, 16 centroid features and 36 longest-run features. 
In experiments on a database of 6000 samples, the technique yields an average recognition rate of 96.67% under three-fold cross-validation. It is useful for applications related to OCR of handwritten Bangla digits and can also be extended to OCR of handwritten characters of the Bangla alphabet.\nLanguages for open-universe probabilistic models (OUPMs) can represent situations with an unknown number of objects and identity uncertainty. While such cases arise in a wide range of important real-world applications, existing general-purpose inference methods for OUPMs are far less efficient than those available for more restricted languages and model classes. This paper goes some way to remedying this deficit by introducing, and proving correct, a generalization of Gibbs sampling to partial worlds with possibly varying model structure. Our approach draws on and extends previous generic OUPM inference methods, as well as auxiliary variable samplers for nonparametric mixture models. It has been implemented for BLOG, a well-known OUPM language. Combined with compile-time optimizations, the resulting algorithm yields very substantial speedups over existing methods on several test cases, and substantially improves the practicality of OUPM languages generally.\nQualitative possibilistic networks, also known as min-based possibilistic networks, are important tools for handling uncertain information in the possibility theory framework. Despite their importance, only the junction tree adaptation has been proposed for exact reasoning with such networks. This paper explores alternative algorithms using compilation techniques. We first propose possibilistic adaptations of standard compilation-based probabilistic methods. Then, we develop a new, purely possibilistic method based on the transformation of the initial network into a possibilistic base. 
A comparative study shows that the latter performs better than the possibilistic adaptations of probabilistic methods. This result is also confirmed experimentally.\nPossibilistic answer set programming (PASP) extends answer set programming (ASP) by attaching to each rule a degree of certainty. While such an extension is important from an application point of view, existing semantics are not well-motivated, and do not always yield intuitive results. To develop a more suitable semantics, we first introduce a characterization of answer sets of classical ASP programs in terms of possibilistic logic, where an ASP program specifies a set of constraints on possibility distributions. This characterization is then naturally generalized to define answer sets of PASP programs. We furthermore provide a syntactic counterpart, leading to a possibilistic generalization of the well-known Gelfond-Lifschitz reduct, and we show how our framework can readily be implemented using standard ASP solvers.\nMany machine learning applications require the ability to learn from and reason about noisy multi-relational data. To address this, several effective representations have been developed that provide both a language for expressing the structural regularities of a domain, and principled support for probabilistic inference. In addition to these two aspects, however, many applications also involve a third aspect (the need to reason about similarities) that has not been directly supported in existing frameworks. This paper introduces probabilistic similarity logic (PSL), a general-purpose framework for joint reasoning about similarity in relational domains that incorporates probabilistic reasoning about similarities and relational structure in a principled way. PSL can integrate any existing domain-specific similarity measures and also supports reasoning about similarities between sets of entities. 
We provide efficient inference and learning techniques for PSL and demonstrate its effectiveness both in common relational tasks and in settings that require reasoning about similarity.\nWe study the tracking problem, namely estimating the hidden state of an object over time from unreliable and noisy measurements. The standard framework for the tracking problem is the generative framework, which is the basis of solutions such as the Bayesian algorithm and its approximation, the particle filter. However, these solutions can be very sensitive to model mismatches. In this paper, motivated by online learning, we introduce a new framework for tracking. We provide an efficient tracking algorithm for this framework. We provide experimental results comparing our algorithm to the Bayesian algorithm on simulated data. Our experiments show that when there are slight model mismatches, our algorithm outperforms the Bayesian algorithm.\nRelational Continuous Models (RCMs) represent joint probability densities over attributes of objects, when the attributes have continuous domains. With relational representations, they can model joint probability distributions over large numbers of variables compactly in a natural way. This paper presents a new exact lifted inference algorithm for RCMs that scales up to large models of real-world applications. The algorithm applies to Relational Pairwise Models, which are (relational) products of potentials of arity 2. Our algorithm is unique in two ways. First, it substantially improves the efficiency of lifted inference with variables of continuous domains. When a relational model has Gaussian potentials, it takes only linear time, compared to the cubic time of previous methods. Second, it is the first exact inference algorithm that handles RCMs in a lifted way. The algorithm is illustrated with an example from econometrics. 
Experimental results show that our algorithm outperforms both a ground-level inference algorithm and an algorithm built with previously known lifted methods.\nPartially-Observable Markov Decision Processes (POMDPs) are typically solved by finding an approximate global solution to a corresponding belief-MDP. In this paper, we offer a new planning algorithm for POMDPs with continuous state, action and observation spaces. Since such domains have an inherent notion of locality, we can find an approximate solution using local optimization methods. We parameterize the belief distribution as a Gaussian mixture, and use the Extended Kalman Filter (EKF) to approximate the belief update. Since the EKF is a first-order filter, we can marginalize over the observations analytically. By using feedback control and state estimation during policy execution, we recover a behavior that is effectively conditioned on incoming observations despite the unconditioned planning. Local optimization provides no guarantees of global optimality, but it allows us to tackle domains that are at least an order of magnitude larger than those handled by the current state of the art. We demonstrate the scalability of our algorithm by considering a simulated hand-eye coordination domain with 16 continuous state dimensions and 6 continuous action dimensions.\nIn this paper we introduce a class of Markov decision processes that arise as a natural model for many renewable resource allocation problems. Upon extending results from the inventory control literature, we prove that they admit a closed-form solution and we show how to exploit this structure to speed up its computation. We consider the application of the proposed framework to several problems arising in very different domains, and as part of the ongoing effort in the emerging field of Computational Sustainability we discuss in detail its application to the Northern Pacific Halibut marine fishery. 
Our approach is applied to a model based on real-world data, obtaining a policy with a guaranteed lower bound on the utility function that is structurally very different from the one currently employed.\nWhile game theory is widely used to model strategic interactions, a natural question is where the game representations come from. One answer is to learn the representations from data. If one wants to learn both the payoffs and the players' strategies, a naive approach is to learn them both directly from the data. This approach ignores the fact that the players might be playing reasonably good strategies, so there is a connection between the strategies and the data. The main contribution of this paper is to make this connection while learning. We formulate the learning problem as a weighted constraint satisfaction problem, including constraints both for the fit of the payoffs and strategies to the data and the fit of the strategies to the payoffs. We use quantal response equilibrium as our notion of rationality for quantifying the latter fit. Our results show that incorporating rationality constraints can improve learning when the amount of data is limited.\nCyber-physical systems, such as mobile robots, must respond adaptively to dynamic operating conditions. Effective operation of these systems requires that sensing and actuation tasks are performed in a timely manner. Additionally, execution of mission-specific tasks such as imaging a room must be balanced against the need to perform more general tasks such as obstacle avoidance. This problem has been addressed by maintaining relative utilization of shared resources among tasks near a user-specified target level. Producing optimal scheduling strategies requires complete prior knowledge of task behavior, which is unlikely to be available in practice. Instead, suitable scheduling strategies must be learned online through interaction with the system. 
We consider the sample complexity of reinforcement learning in this domain, and demonstrate that while the problem state space is countably infinite, we may leverage the problem's structure to guarantee efficient learning.\nComputing the probability of a formula given the probabilities or weights associated with other formulas is a natural extension of logical inference to the probabilistic setting. Surprisingly, this problem has received little attention in the literature to date, particularly considering that it includes many standard inference problems as special cases. In this paper, we propose two algorithms for this problem: formula decomposition and conditioning, which is an exact method, and formula importance sampling, which is an approximate method. The latter is, to our knowledge, the first application of model counting to approximate probabilistic inference. Unlike conventional variable-based algorithms, our algorithms work in the dual realm of logical formulas. Theoretically, we show that our algorithms can greatly improve efficiency by exploiting the structural information in the formulas. Empirically, we show that they are indeed quite powerful, often achieving substantial performance gains over state-of-the-art schemes.\nThis paper addresses the problem of sampling from binary distributions with constraints. In particular, it proposes an MCMC method to draw samples from a distribution of the set of all states at a specified distance from some reference state. For example, when the reference state is the vector of zeros, the algorithm can draw samples from a binary distribution with a constraint on the number of active variables, say the number of 1's. We motivate the need for this algorithm with examples from statistical physics and probabilistic inference. 
Unlike previous algorithms proposed to sample from binary distributions with these constraints, the new algorithm allows for large moves in state space and tends to propose them such that they are energetically favourable. The algorithm is demonstrated on three Boltzmann machines of varying difficulty: a ferromagnetic Ising model (with positive potentials), a restricted Boltzmann machine with learned Gabor-like filters as potentials, and a challenging three-dimensional spin glass (with positive and negative potentials).\nOver the past two decades, several consistent procedures have been designed to infer causal conclusions from observational data. We prove that if the true causal network might be an arbitrary, linear Gaussian network or a discrete Bayes network, then every unambiguous causal conclusion produced by a consistent method from non-experimental data is subject to reversal as the sample size increases any finite number of times. That result, called the causal flipping theorem, extends prior results to the effect that causal discovery cannot be reliable at a given sample size. We argue that since repeated flipping of causal conclusions is unavoidable in principle for consistent methods, the best possible discovery methods are consistent methods that retract their earlier conclusions no more than necessary. A series of simulations of various methods across a wide range of sample sizes illustrates concretely both the theorem and the principle of comparing methods in terms of retractions.\nDecentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case, mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. 
An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably with state-of-the-art solvers.\nDecision-theoretic troubleshooting is about minimizing the expected cost of solving a certain problem, such as repairing a complicated man-made device. In this paper we consider situations where one has to take apart part of the device to get access to certain clusters and actions. Specifically, we investigate troubleshooting with independent actions in a tree of clusters where actions inside a cluster cannot be performed before the cluster is opened. The problem is non-trivial because there is a cost associated with opening and closing a cluster. Troubleshooting with independent actions and no clusters can be solved in O(n lg n) time (n being the number of actions) by the well-known \"P-over-C\" algorithm due to Kadane and Simon, but an efficient and optimal algorithm for a tree cluster model has not yet been found. In this paper we describe a \"bottom-up P-over-C\" O(n lg n) time algorithm and show that it is optimal when the clusters do not need to be closed to test whether the actions solved the problem.\nThis note deals with a class of variables that, if conditioned on, tends to amplify confounding bias in the analysis of causal effects. This class, independently discovered by Bhattacharya and Vogt (2007) and Wooldridge (2009), includes instrumental variables and variables that have greater influence on treatment selection than on the outcome. We offer a simple derivation and an intuitive explanation of this phenomenon and then extend the analysis to non-linear models. We show that: 1. 
the bias-amplifying potential of instrumental variables extends to non-linear models, though not as sweepingly as in linear models; 2. in non-linear models, conditioning on instrumental variables may introduce new bias where none existed before; 3. in both linear and non-linear models, instrumental variables have no effect on selection-induced bias.\nIn many fields, observations are made irregularly over time, due to either measurement limitations or the lack of a constant immanent rate. While discrete-time Markov models (such as Dynamic Bayesian Networks) introduce either inefficient computation or an information loss to reasoning about such processes, continuous-time Markov models assume either a discrete state space (such as Continuous-Time Bayesian Networks) or a flat continuous state space (such as stochastic differential equations). To address these problems, we present a new modeling class called Irregular-Time Bayesian Networks (ITBNs), generalizing Dynamic Bayesian Networks, allowing substantially more compact representations, and increasing the expressivity of the temporal dynamics. In addition, a globally optimal solution is guaranteed when learning temporal systems, provided that they are fully observed at the same irregularly spaced time-points, and a semiparametric subclass of ITBNs is introduced to allow further adaptation to the irregular nature of the available data.\nIdentifying effects of actions (treatments) on outcome variables from observational data and causal assumptions is a fundamental problem in causal inference. This identification is made difficult by the presence of confounders, which can be related to both treatment and outcome variables. Confounders are often handled, both in theory and in practice, by adjusting for covariates, in other words considering outcomes conditioned on treatment and covariate values, weighted by the probability of observing those covariate values. 
In this paper, we give a complete graphical criterion for covariate adjustment, which we term the adjustment criterion, and derive some interesting corollaries of the completeness of this criterion.\nMonte-Carlo Tree Search (MCTS) methods are drawing great interest after yielding breakthrough results in computer Go. This paper proposes a Bayesian approach to MCTS that is inspired by distribution-free approaches such as UCT [13], yet differs significantly in important respects. The Bayesian framework allows potentially much more accurate (Bayes-optimal) estimation of node values and node uncertainties from a limited number of simulation trials. We further propose propagating inference in the tree via fast analytic Gaussian approximation methods: this can make the overhead of Bayesian inference manageable in domains such as Go, while preserving high accuracy of expected-value estimates. We find that the proposed approach substantially outperforms UCT empirically in an idealized bandit-tree test environment, where we can obtain valuable insights by comparing with known ground truth. Additionally, we rigorously prove on-policy and off-policy convergence of the proposed methods.\nIn this paper, we present the Difference-Based Causality Learner (DBCL), an algorithm for learning a class of discrete-time dynamic models that represents all causation across time by means of difference equations driving change in a system. We motivate this representation with real-world mechanical systems and prove DBCL's correctness for learning structure from time series data, an endeavour that is complicated by the existence of latent derivatives that have to be detected. We also prove that, under common assumptions for causal discovery, DBCL will identify the presence or absence of feedback loops, making the model more useful for predicting the effects of manipulating variables when the system is in equilibrium. 
We argue analytically and show empirically the advantages of DBCL over vector autoregression (VAR) and Granger causality models as well as modified forms of Bayesian and constraint-based structure discovery algorithms. Finally, we show that our algorithm can discover causal directions of alpha rhythms in human brains from EEG data.\nIt is known that fixed points of loopy belief propagation (BP) correspond to stationary points of the Bethe variational problem, where we minimize the Bethe free energy subject to normalization and marginalization constraints. Unfortunately, this does not entirely explain BP because BP is a dual rather than primal algorithm to solve the Bethe variational problem; beliefs are infeasible before convergence. Thus, we have no better understanding of BP than as an algorithm that seeks a common zero of a system of non-linear functions, not explicitly related to each other. In this theoretical paper, we show that these functions are in fact explicitly related: they are the partial derivatives of a single function of reparameterizations. That is, BP seeks a stationary point of a single function, without any constraints. This function has a very natural form: it is a linear combination of local log-partition functions, exactly as the Bethe entropy is the same linear combination of local entropies.\nWe present decentralized rollout sampling policy iteration (DecRSPI), a new algorithm for multi-agent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity in the number of agents, its bounded memory usage and good solution quality. 
It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach.\nLearning algorithms normally assume that there is at most one annotation or label per data point. However, in some scenarios, such as medical diagnosis and online collaboration, multiple annotations may be available. In either case, obtaining labels for data points can be expensive and time-consuming (in some circumstances, ground truth may not exist). Semi-supervised learning approaches have shown that utilizing the unlabeled data is often beneficial in these cases. This paper presents a probabilistic semi-supervised model and algorithm that allows for learning from both unlabeled and labeled data in the presence of multiple annotators. We assume that it is known which annotator labeled which data points. The proposed approach produces annotator models that allow us to provide (1) estimates of the true label and (2) annotator variable expertise for both labeled and unlabeled data. We provide numerical comparisons under various scenarios and with respect to standard semi-supervised learning. Experiments showed that the presented approach provides clear advantages over multi-annotator methods that do not use the unlabeled data and over methods that do not use multi-labeler information.\nCollaborative filtering is an effective recommendation approach in which the preference of a user for an item is predicted based on the preferences of other users with similar interests. A big challenge in using collaborative filtering methods is the data sparsity problem, which often arises because each user typically rates only very few items and hence the rating matrix is extremely sparse. In this paper, we address this problem by considering multiple collaborative filtering tasks in different domains simultaneously and exploiting the relationships between domains. 
We refer to it as a multi-domain collaborative filtering (MCF) problem. To solve the MCF problem, we propose a probabilistic framework which uses probabilistic matrix factorization to model the rating problem in each domain and allows the knowledge to be adaptively transferred across different domains by automatically learning the correlation between domains. We also introduce the link function for different domains to correct their biases. Experiments conducted on several real-world applications demonstrate the effectiveness of our methods when compared with some representative methods.\nMulti-task learning is a learning paradigm which seeks to improve the generalization performance of a learning task with the help of some other related tasks. In this paper, we propose a regularization formulation for learning the relationships between tasks in multi-task learning. This formulation can be viewed as a novel generalization of the regularization framework for single-task learning. Besides modeling positive task correlation, our method, called multi-task relationship learning (MTRL), can also describe negative task correlation and identify outlier tasks based on the same underlying principle. Under this regularization framework, the objective function of MTRL is convex. For efficiency, we use an alternating method to learn the optimal model parameters for each task as well as the relationships between tasks. We study MTRL in the symmetric multi-task learning setting and then generalize it to the asymmetric setting as well. We also study the relationships between MTRL and some existing multi-task learning methods. Experiments conducted on a toy problem as well as several benchmark data sets demonstrate the effectiveness of MTRL.\nDespite the intractability of generic optimal partially observable Markov decision process planning, there exist important problems that have highly structured models. 
Previous researchers have used this insight to construct more efficient algorithms for factored domains, and for domains with topological structure in the flat state dynamics model. In our work, motivated by findings from the education community relevant to automated tutoring, we consider problems that exhibit a form of topological structure in the factored dynamics model. Our Reachable Anytime Planner for Imprecisely-sensed Domains (RAPID) leverages this structure to efficiently compute a good initial envelope of reachable states under the optimal MDP policy in time linear in the number of state variables. RAPID performs partially-observable planning over the limited envelope of states, and slowly expands the state space considered as time allows. RAPID performs well on a large tutoring-inspired problem simulation with 122 state variables, corresponding to a flat state space of over 10^30 states.\nUCT has recently emerged as an exciting new adversarial reasoning technique based on cleverly balancing exploration and exploitation in a Monte-Carlo sampling setting. It has been particularly successful in the game of Go but the reasons for its success are not well understood and attempts to replicate its success in other domains such as Chess have failed. We provide an in-depth analysis of the potential of UCT in domain-independent settings, in cases where heuristic values are available, and the effect of enhancing random playouts to more informed playouts between two weak minimax players. To provide further insights, we develop synthetic game tree instances and discuss interesting properties of UCT, both empirically and analytically.\nSuccessive elimination of candidates is often a route to making manipulation intractable to compute. We prove that eliminating candidates does not necessarily increase the computational complexity of manipulation. However, for many voting rules used in practice, the computational complexity increases. 
For example, it is already known that it is NP-hard to compute how a single voter can manipulate the result of single transferable voting (the elimination version of plurality voting). We show here that it is NP-hard to compute how a single voter can manipulate the result of the elimination version of veto voting, of the closely related Coombs' rule, and of the elimination versions of a general class of scoring rules.\nWe identify the presence of typically quantum effects, namely 'superposition' and 'interference', in what happens when human concepts are combined, and provide a quantum model in complex Hilbert space that faithfully represents experimental data measuring the situation of combining concepts. Our model shows how 'interference of concepts' explains the effects of underextension and overextension when two concepts combine to the disjunction of these two concepts. This result supports our earlier hypothesis that human thought has a superposed two-layered structure, one layer consisting of 'classical logical thought' and a superposed layer consisting of 'quantum conceptual thought'. Possible connections with recent findings of a 'grid-structure' for the brain are analyzed, and influences on the mind/brain relation, and consequences for applied disciplines, such as artificial intelligence and quantum computation, are considered.\nWe propose an analysis of the codified Law of France as a structured system. Fifty-two legal codes are selected on the basis of explicit legal criteria and considered as vertices, with their mutual quotations forming the edges of a network whose properties are analyzed using graph theory. We find that a group of 10 codes are simultaneously the most citing and the most cited by other codes, and are also strongly connected together, thus forming a \"rich club\" sub-graph. Three other code communities are also found that somewhat partition the legal field into distinct thematic sub-domains. 
The legal interpretation of this partition opens new, non-traditional lines of research. We also conjecture that many legal systems form this new kind of network, which shares some properties with small worlds but is far denser. We propose to call such networks \"concentrated worlds\".\nMost Relevant Explanation (MRE) is a method for finding multivariate explanations for given evidence in Bayesian networks [12]. This paper studies the theoretical properties of MRE and develops an algorithm for finding multiple top MRE solutions. Our study shows that MRE relies on an implicit soft relevance measure in automatically identifying the most relevant target variables and pruning less relevant variables from an explanation. The soft measure also enables MRE to capture the intuitive phenomenon of explaining away encoded in Bayesian networks. Furthermore, our study shows that the solution space of MRE has a special lattice structure which yields interesting dominance relations among the solutions. A K-MRE algorithm based on these dominance relations is developed for generating a set of top solutions that are more representative. Our empirical results show that MRE methods are promising approaches for explanation in Bayesian networks.\nThis paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforcement-learning representations. We show that KWIK linear regression can be used to learn the reward function of a factored MDP and the probabilities of action outcomes in Stochastic STRIPS and Object-Oriented MDPs, none of which have been proven to be efficiently learnable in the RL setting before. 
We also combine KWIK linear regression with other KWIK learners to learn larger portions of these models, including experiments on learning factored MDP transition and reward functions together.\nThis paper develops an inconsistency measure on conditional probabilistic knowledge bases. The measure is based on fundamental principles for inconsistency measures and thus provides a solid theoretical framework for the treatment of inconsistencies in probabilistic expert systems. We illustrate its usefulness and immediate application on several examples and present some formal results. Building on this measure, we use the Shapley value (a well-known solution for coalition games) to define a sophisticated indicator that is able not only to measure inconsistencies but also to reveal the causes of inconsistencies in the knowledge base. Altogether these tools guide the knowledge engineer in his aim to restore consistency and therefore enable him to build a consistent and usable knowledge base that can be employed in probabilistic expert systems.\nMany applications of causal analysis call for assessing, retrospectively, the effect of withholding an action that has in fact been implemented. This counterfactual quantity, sometimes called \"effect of treatment on the treated\" (ETT), has been used to evaluate educational programs, critique public policies, and justify individual decision making. In this paper we explore the conditions under which ETT can be estimated from (i.e., identified in) experimental and/or observational studies. We show that, when the action invokes a singleton variable, the conditions for ETT identification have simple characterizations in terms of causal diagrams.
We further give a graphical characterization of the conditions under which the effects of multiple treatments on the treated can be identified, as well as ways in which the ETT estimand can be constructed from both interventional and observational distributions.\nThere has been a great deal of recent interest in methods for performing lifted inference; however, most of this work assumes that the first-order model is given as input to the system. Here, we describe lifted inference algorithms that determine symmetries and automatically lift the probabilistic model to speed up inference. In particular, we describe approximate lifted inference techniques that allow the user to trade off inference accuracy for computational efficiency by using a handful of tunable parameters, while keeping the error bounded. Our algorithms are closely related to the graph-theoretic concept of bisimulation. We report experiments on both synthetic and real data to show that in the presence of symmetries, run-times for inference can be improved significantly, with approximate lifted inference providing orders of magnitude speedup over ground inference.\nThe specification of a Markov decision process (MDP) can be difficult. Reward function specification is especially problematic; in practice, it is often cognitively complex and time-consuming for users to precisely specify rewards. This work casts the problem of specifying rewards as one of preference elicitation and aims to minimize the degree of precision with which a reward function must be specified while still allowing optimal or near-optimal policies to be produced. We first discuss how robust policies can be computed for MDPs given only partial reward information using the minimax regret criterion. We then demonstrate how regret can be reduced by efficiently eliciting reward information using bound queries, using regret-reduction as a means for choosing suitable queries.
Empirical results demonstrate that regret-based reward elicitation offers an effective way to produce near-optimal policies without resorting to the precise specification of the entire reward function.\nThe fastest known exact algorithms for score-based structure discovery in Bayesian networks on n nodes run in time and space 2^n n^O(1). The usage of these algorithms is limited to networks on at most around 25 nodes, mainly due to the space requirement. Here, we study space-time tradeoffs for finding an optimal network structure. When little space is available, we apply the Gurevich-Shelah recurrence, originally proposed for the Hamiltonian path problem, and obtain time 2^(2n-s) n^O(1) in space 2^s n^O(1) for any s = n/2, n/4, n/8, ...; we assume the indegree of each node is bounded by a constant. For the more practical setting with moderate amounts of space, we present a novel scheme. It yields running time 2^n (3/2)^p n^O(1) in space 2^n (3/4)^p n^O(1) for any p = 0, 1, ..., n/2; these bounds hold as long as the indegrees are at most 0.238n. Furthermore, the latter scheme allows easy and efficient parallelization beyond previous algorithms. We also explore empirically the potential of the presented techniques.\nLogical inference algorithms for conditional independence (CI) statements have important applications from testing consistency during knowledge elicitation to constraint-based structure learning of graphical models. We prove that the implication problem for CI statements is decidable, given that the size of the domains of the random variables is known and fixed. We will present an approximate logical inference algorithm which combines a falsification and a novel validation algorithm. The validation algorithm represents each set of CI statements as a sparse 0-1 matrix A and validates instances of the implication problem by solving specific linear programs with constraint matrix A.
We will show experimentally that the algorithm is both effective and efficient in validating and falsifying instances of the probabilistic CI implication problem.\nThe introduction of loopy belief propagation (LBP) revitalized the application of graphical models in many domains. Many recent works present improvements on the basic LBP algorithm in an attempt to overcome convergence and local optima problems. Notable among these are convexified free energy approximations that lead to inference procedures with provable convergence and quality properties. However, empirically LBP still outperforms most of its convex variants in a variety of settings, as we also demonstrate here. Motivated by this fact, we seek convexified free energies that directly approximate the Bethe free energy. We show that the proposed approximations compare favorably with state-of-the-art convex free energy approximations.\nMessage-passing algorithms have emerged as powerful techniques for approximate inference in graphical models. When these algorithms converge, they can be shown to find local (or sometimes even global) optima of variational formulations to the inference problem. But many of the most popular algorithms are not guaranteed to converge. This has led to recent interest in convergent message-passing algorithms. In this paper, we present a unified view of convergent message-passing algorithms. We present a simple derivation of an abstract algorithm, tree-consistency bound optimization (TCBO), that is provably convergent in both its sum and max product forms. We then show that many of the existing convergent algorithms are instances of our TCBO algorithm, and obtain novel convergent algorithms \"for free\" by exchanging maximizations and summations in existing algorithms.
In particular, we show that Wainwright's non-convergent sum-product algorithm for tree-based variational bounds is actually convergent with the right update order for the case where trees are monotonic chains.\nWe consider the task of obtaining the maximum a posteriori estimate of discrete pairwise random fields with arbitrary unary potentials and semimetric pairwise potentials. For this problem, we propose an accurate hierarchical move-making strategy where each move is computed efficiently by solving an st-MINCUT problem. Unlike previous move-making approaches, e.g. the widely used α-expansion algorithm, our method obtains the guarantees of the standard linear programming (LP) relaxation for the important special case of metric labeling. Unlike the existing LP relaxation solvers, e.g. interior-point algorithms or tree-reweighted message passing, our method is significantly faster as it uses only the efficient st-MINCUT algorithm in its design. Using both synthetic and real data experiments, we show that our technique outperforms several commonly used algorithms.\nProbabilistic programming languages and modeling toolkits are two modular ways to build and reuse stochastic models and inference procedures. Combining strengths of both, we express models and inference as generalized coroutines in the same general-purpose language. We use existing facilities of the language, such as rich libraries, optimizing compilers, and types, to develop concise, declarative, and realistic models with competitive performance on exact and approximate inference. In particular, a wide range of models can be expressed using memoization. Because deterministic parts of models run at full speed, custom inference procedures are trivial to incorporate, and inference procedures can reason about themselves without interpretive overhead.
Within this framework, we introduce a new, general algorithm for importance sampling with look-ahead.\nA major benefit of graphical models is that most knowledge is captured in the model structure. Many models, however, produce inference problems with a lot of symmetries not reflected in the graphical structure and hence not exploitable by efficient inference techniques such as belief propagation (BP). In this paper, we present a new and simple BP algorithm, called counting BP, that exploits such additional symmetries. Starting from a given factor graph, counting BP first constructs a compressed factor graph of clusternodes and clusterfactors, corresponding to sets of nodes and factors that are indistinguishable given the evidence. Then it runs a modified BP algorithm on the compressed graph that is equivalent to running BP on the original factor graph. Our experiments show that counting BP is applicable to a variety of important AI tasks such as (dynamic) relational models and boolean model counting, and that significant efficiency gains are obtainable, often by orders of magnitude.\nIn this paper we introduce temporal action graph games (TAGGs), a novel graphical representation of imperfect-information extensive-form games. We show that when a game involves anonymity or context-specific utility independencies, its encoding as a TAGG can be much more compact than its direct encoding as a multiagent influence diagram (MAID). We also show that TAGGs can be understood as indirect MAID encodings in which many deterministic chance nodes are introduced. We provide an algorithm for computing with TAGGs, and show both theoretically and empirically that our approach improves significantly on the previous state of the art.\nEfficiently finding the maximum a posteriori (MAP) configuration of a graphical model is an important problem which is often solved using message passing algorithms.
The optimality of such algorithms is only well established for singly-connected graphs and other limited settings. This article extends the set of graphs where MAP estimation is in P and where message passing recovers the exact solution to so-called perfect graphs. This result leverages recent progress in defining perfect graphs (the strong perfect graph theorem), linear programming relaxations of MAP estimation and recent convergent message passing schemes. The article converts graphical models into nand Markov random fields which are straightforward to relax into linear programs. Therein, integrality can be established in general by testing for graph perfection. This perfection test is performed efficiently using a polynomial-time algorithm. Alternatively, known decomposition tools from perfect graph theory may be used to prove perfection for certain families of graphs. Thus, a general graph framework is provided for determining when MAP estimation in any graphical model is in P, has an integral linear programming relaxation and is exactly recoverable by message passing.\nA Bayesian belief network models a joint distribution with a directed acyclic graph representing dependencies among variables and network parameters characterizing conditional distributions. The parameters are viewed as random variables to quantify uncertainty about their values. Belief nets are used to compute responses to queries, i.e., conditional probabilities of interest. A query is a function of the parameters, hence a random variable. Van Allen et al. (2001, 2008) showed how to quantify uncertainty about a query via a delta method approximation of its variance. We develop more accurate approximations for both query mean and variance. The key idea is to extend the query mean approximation to a \"doubled network\" involving two independent replicates.
Our method assumes complete data and can be applied to discrete, continuous, and hybrid networks (provided discrete variables have only discrete parents). We analyze several improvements, and provide empirical studies to demonstrate their effectiveness.\nMixed integer linear programming (MILP) is a powerful representation often used to formulate decision-making problems under uncertainty. However, it lacks a natural mechanism to reason about objects, classes of objects, and relations. First-order logic (FOL), on the other hand, excels at reasoning about classes of objects, but lacks a rich representation of uncertainty. While representing propositional logic in MILP has been extensively explored, no theory exists yet for fully combining FOL with MILP. We propose a new representation, called first-order programming or FOP, which subsumes both FOL and MILP. We establish formal methods for reasoning about first order programs, including a sound and complete lifted inference procedure for integer first order programs. Since FOP can offer exponential savings in representation and proof size compared to FOL, and since representations and proofs are never significantly longer in FOP than in FOL, we anticipate that inference in FOP will be more tractable than inference in FOL for corresponding problems.\nAs computer clusters become more common and the size of the problems encountered in the field of AI grows, there is an increasing demand for efficient parallel inference algorithms. We consider the problem of parallel inference on large factor graphs in the distributed memory setting of computer clusters. We develop a new efficient parallel inference algorithm, DBRSplash, which incorporates over-segmented graph partitioning, belief residual scheduling, and uniform work Splash operations. 
We empirically evaluate the DBRSplash algorithm on a 120 processor cluster and demonstrate linear to super-linear performance gains on large factor graph models.\nGenerating optimal plans in highly dynamic environments is challenging. Plans are predicated on an assumed initial state, but this state can change unexpectedly during plan generation, potentially invalidating the planning effort. In this paper we make three contributions: (1) We propose a novel algorithm for generating optimal plans in settings where frequent, unexpected events interfere with planning. It is able to quickly distinguish relevant from irrelevant state changes, and to update the existing planning search tree if necessary. (2) We argue for a new criterion for evaluating plan adaptation techniques: the relative running time compared to the \"size\" of changes. This is significant since during recovery more changes may occur that need to be recovered from subsequently, and in order for this process of repeated recovery to terminate, recovery time has to converge. (3) We show empirically that our approach can converge and find optimal plans in environments that would ordinarily defy planning due to their high dynamics.\nContinuous-time Bayesian networks is a natural structured representation language for multicomponent stochastic processes that evolve continuously over time. Despite the compact representation, inference in such models is intractable even in relatively simple structured networks. Here we introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. 
We provide the theoretical foundations for the approximation, an efficient implementation that exploits the wide range of highly optimized ordinary differential equation (ODE) solvers, experimentally explore characterizations of processes for which this approximation is suitable, and show applications to a large-scale real-world inference problem.\nWe present a new method to propagate lower bounds on conditional probability distributions in conventional Bayesian networks. Our method is guaranteed to provide outer approximations of the exact lower bounds. A key advantage is that we can use any available algorithms and tools for Bayesian networks in order to represent and infer lower bounds. This new method yields results that are provably exact for trees with binary variables, and results which are competitive with existing approximations in credal networks for all other network structures. Our method is not limited to a specific kind of network structure. Basically, it is also not restricted to a specific kind of inference, but we restrict our analysis to prognostic inference in this article. The computational complexity is superior to that of other existing approaches.\nThe approach described here allows using membership functions to represent imprecise and uncertain knowledge by learning in Fuzzy Semantic Networks. This representation is of great practical interest because it makes it possible, on the one hand, to construct this membership function from a simple value expressing the degree of interpretation of an Object or a Goal as compared to another and, on the other hand, to adjust the membership function during the apprenticeship. We show how to use these membership functions to represent the interpretation of a user Object (respectively a user Goal) as compared to a system Object (respectively a system Goal). We also show that a decision can be made for each representation of a user Object compared to a system Object.
This decision is taken by determining a decision coefficient calculated according to the nucleus of the membership function of the user Object.\nIn this paper, we consider planning in stochastic shortest path (SSP) problems, a subclass of Markov Decision Problems (MDP). We focus on medium-size problems whose state space can be fully enumerated. This problem has numerous important applications, such as navigation and planning under uncertainty. We propose a new approach for constructing a multi-level hierarchy of progressively simpler abstractions of the original problem. Once computed, the hierarchy can be used to speed up planning by first finding a policy for the most abstract level and then recursively refining it into a solution to the original problem. This approach is fully automated and delivers a speed-up of two orders of magnitude over a state-of-the-art MDP solver on sample problems while returning near-optimal solutions. We also prove theoretical bounds on the loss of solution optimality resulting from the use of abstractions.\nMany algorithms and applications involve repeatedly solving variations of the same inference problem; for example, we may want to introduce new evidence to the model or perform updates to conditional dependencies. The goal of adaptive inference is to take advantage of what is preserved in the model and perform inference more rapidly than from scratch. In this paper, we describe techniques for adaptive inference on general graphs that support marginal computation and updates to the conditional probabilities and dependencies in logarithmic time.
We give experimental results for an implementation of our algorithm, and demonstrate its potential performance benefit in the study of protein structure.\nAssume that cause-effect relationships between variables can be described as a directed acyclic graph and the corresponding linear structural equation model. We consider the identification problem of total effects in the presence of latent variables and selection bias between a treatment variable and a response variable. Pearl and his colleagues provided the back door criterion, the front door criterion (Pearl, 2000) and the conditional instrumental variable method (Brito and Pearl, 2002) as identifiability criteria for total effects in the presence of latent variables, but not in the presence of selection bias. In order to solve this problem, we propose new graphical identifiability criteria for total effects based on the identifiable factor models. The results of this paper are useful to identify total effects in observational studies and provide a new viewpoint on the identification conditions of factor models.\nThe problem of learning discrete Bayesian networks from data is encoded as a weighted MAX-SAT problem and the MaxWalkSat local search algorithm is used to address it. For each dataset, the per-variable summands of the (BDeu) marginal likelihood for different choices of parents ('family scores') are computed prior to applying MaxWalkSat. Each permissible choice of parents for each variable is encoded as a distinct propositional atom and the associated family score encoded as a 'soft' weighted single-literal clause. Two approaches to enforcing acyclicity are considered: either by encoding the ancestor relation or by attaching a total order to each graph and encoding that. The latter approach gives better results. Learning experiments have been conducted on 21 synthetic datasets sampled from 7 BNs.
The largest dataset has 10,000 datapoints and 60 variables, producing (for the 'ancestor' encoding) a weighted CNF input file with 19,932 atoms and 269,367 clauses. For most datasets, MaxWalkSat quickly finds BNs with higher BDeu score than the 'true' BN. The effect of adding prior information is assessed. It is further shown that Bayesian model averaging can be effected by collecting BNs generated during the search.\nThis paper describes a new algorithm to solve the decision-making problem in Influence Diagrams based on algorithms for credal networks. Decision nodes are associated with imprecise probability distributions and a reformulation is introduced that finds the global maximum strategy with respect to the expected utility. We work with Limited Memory Influence Diagrams, which generalize most Influence Diagram proposals and handle simultaneous decisions. Besides the global optimum method, we explore an anytime approximate solution with a guaranteed maximum error and show that imprecise probabilities are handled in a straightforward way. Complexity issues and experiments with random diagrams and an effects-based military planning problem are discussed.\nA graphical multiagent model (GMM) represents a joint distribution over the behavior of a set of agents. One source of knowledge about agents' behavior may come from game-theoretic analysis, as captured by several graphical game representations developed in recent years. GMMs generalize this approach to express arbitrary distributions, based on game descriptions or other sources of knowledge bearing on beliefs about agent behavior. To illustrate the flexibility of GMMs, we exhibit game-derived models that allow probabilistic deviation from equilibrium, as well as models based on heuristic action choice. We investigate three different methods of integrating these models into a single model representing the combined knowledge sources.
To evaluate the predictive performance of the combined model, we treat as actual outcome the behavior produced by a reinforcement learning process. We find that combining the two knowledge sources, using any of the methods, provides better predictions than either source alone. Among the combination methods, mixing data outperforms the opinion pool and direct update methods investigated in this empirical trial.\nWe conjecture that the worst case number of experiments necessary and sufficient to discover a causal graph uniquely given its observational Markov equivalence class can be specified as a function of the largest clique in the Markov equivalence class. We provide an algorithm that computes intervention sets that we believe are optimal for the above task. The algorithm builds on insights gained from the worst case analysis in Eberhardt et al. (2005) for sequences of experiments when all possible directed acyclic graphs over N variables are considered. A simulation suggests that our conjecture is correct. We also show that a generalization of our conjecture to other classes of possible graph hypotheses cannot be given easily, and in what sense the algorithm is then no longer optimal.\nA central task in many applications is reasoning about processes that change over continuous time. Continuous-Time Bayesian Networks is a general compact representation language for multi-component continuous-time processes. However, exact inference in such processes is exponential in the number of components, and thus infeasible for most models of interest. Here we develop a novel Gibbs sampling procedure for multi-component processes. This procedure iteratively samples a trajectory for one of the components given the remaining ones. We show how to perform exact sampling that adapts to the natural time scale of the sampled process. Moreover, we show that this sampling procedure naturally exploits the structure of the network to reduce the computational cost of each step. 
This procedure is the first that can provide asymptotically unbiased approximations in such processes.\nIn addressing the challenge of exponential scaling with the number of agents, we adopt a cluster-based representation to approximately solve asymmetric games with very many players. A cluster groups together agents with a similar \"strategic view\" of the game. We learn the clustered approximation from data consisting of strategy profiles and payoffs, which may be obtained from observations of play or access to a simulator. Using our clustering, we construct a reduced \"twins\" game in which each cluster is associated with two players of the reduced game. This allows our representation to be individually-responsive because we align the interests of every individual agent with the strategy of its cluster. Our approach provides agents with higher payoffs and lower regret on average than model-free methods as well as previous cluster-based methods, and requires only a few observations for learning to be successful. The \"twins\" approach is shown to be an important component of providing these low-regret approximations.\nAn important task in data analysis is the discovery of causal relationships between observed variables. For continuous-valued data, linear acyclic causal models are commonly used to model the data-generating process, and the inference of such models is a well-studied problem. However, existing methods have significant limitations. Methods based on conditional independencies (Spirtes et al. 1993; Pearl 2000) cannot distinguish between independence-equivalent models, whereas approaches purely based on Independent Component Analysis (Shimizu et al. 2006) are inapplicable to data which is partially Gaussian. In this paper, we generalize and combine the two approaches, to yield a method able to learn the model structure in many cases for which the previous methods provide answers that are either incorrect or are not as informative as possible.
We give exact graphical conditions for when two distinct models represent the same family of distributions, and empirically demonstrate the power of our method through thorough simulations.\nWe study a multiagent learning problem where agents can either learn via repeated interactions, or can follow the advice of a mediator who suggests possible actions to take. We present an algorithm that each agent can use so that, with high probability, they can verify whether or not the mediator's advice is useful. In particular, if the mediator's advice is useful then agents will reach a correlated equilibrium, but if the mediator's advice is not useful, then agents are not harmed by using our test, and can fall back to their original learning algorithm. We then generalize our algorithm and show that in the limit it always correctly verifies the mediator's advice.\nBounded policy iteration is an approach to solving infinite-horizon POMDPs that represents policies as stochastic finite-state controllers and iteratively improves a controller by adjusting the parameters of each node using linear programming. In the original algorithm, the size of the linear programs, and thus the complexity of policy improvement, depends on the number of parameters of each node, which grows with the size of the controller. But in practice, the number of parameters of a node with non-zero values is often very small, and does not grow with the size of the controller. Based on this observation, we develop a version of bounded policy iteration that leverages the sparse structure of a stochastic finite-state controller. In each iteration, it improves a policy by the same amount as the original algorithm, but with much better scalability.\nWhile known algorithms for sensitivity analysis and parameter tuning in probabilistic networks have a running time that is exponential in the size of the network, the exact computational complexity of these problems has not yet been established.
In this paper we study several variants of the tuning problem and show that these problems are NP^PP-complete in general. We further show that the problems remain NP-complete or PP-complete for a number of restricted variants. These complexity results provide insight into whether or not recent achievements in sensitivity analysis and tuning can be extended to more general, practicable methods.\nApproximate linear programming (ALP) is an efficient approach to solving large factored Markov decision processes (MDPs). The main idea of the method is to approximate the optimal value function by a set of basis functions and optimize their weights by linear programming (LP). This paper proposes a new ALP approximation. Compared to the standard ALP formulation, we decompose the constraint space into a set of low-dimensional spaces. This structure allows for solving the new LP efficiently. In particular, the constraints of the LP can be satisfied in a compact form without an exponential dependence on the treewidth of ALP constraints. We study both practical and theoretical aspects of the proposed approach. Moreover, we demonstrate its scale-up potential on an MDP with more than 2^100 states.\nThis paper deals with the problem of evaluating the causal effect using observational data in the presence of an unobserved exposure/outcome variable, when cause-effect relationships between variables can be described as a directed acyclic graph and the corresponding recursive factorization of a joint distribution. First, we propose identifiability criteria for causal effects when an unobserved exposure/outcome variable is considered to contain more than two categories. Next, when unmeasured variables exist between an unobserved outcome variable and its proxy variables, we provide the tightest bounds based on the potential outcome approach.
The results of this paper are helpful for evaluating causal effects in the case where it is difficult or expensive to observe an exposure/outcome variable in many practical fields.\nGraphical models are usually learned without regard to the cost of doing inference with them. As a result, even if a good model is learned, it may perform poorly at prediction, because it requires approximate inference. We propose an alternative: learning models with a score function that directly penalizes the cost of inference. Specifically, we learn arithmetic circuits with a penalty on the number of edges in the circuit (in which the cost of inference is linear). Our algorithm is equivalent to learning a Bayesian network with context-specific independence by greedily splitting conditional distributions, at each step scoring the candidates by compiling the resulting network into an arithmetic circuit, and using its size as the penalty. We show how this can be done efficiently, without compiling a circuit from scratch for each candidate. Experiments on several real-world domains show that our algorithm is able to learn tractable models with very large treewidth, and yields more accurate predictions than a standard context-specific Bayesian network learner, in far less time.\nWe generalize Shimizu et al.'s (2006) ICA-based approach for discovering linear non-Gaussian acyclic (LiNGAM) Structural Equation Models (SEMs) from causally sufficient, continuous-valued observational data. By relaxing the assumption that the generating SEM's graph is acyclic, we solve the more general problem of linear non-Gaussian (LiNG) SEM discovery. LiNG discovery algorithms output the distribution equivalence class of SEMs which, in the large sample limit, represents the population distribution. We apply a LiNG discovery algorithm to simulated data.
Finally, we give sufficient conditions under which only one of the SEMs in the output class is 'stable'.\nWe present a generative model for representing and reasoning about the relationships among events in continuous time. We apply the model to the domain of networked and distributed computing environments where we fit the parameters of the model from timestamp observations, and then use hypothesis testing to discover dependencies between the events and changes in behavior for monitoring and diagnosis. After introducing the model, we present an EM algorithm for fitting the parameters and then present the hypothesis testing approach for both dependence discovery and change-point detection. We validate the approach for both tasks using real data from a trace of network events at Microsoft Research Cambridge. Finally, we formalize the relationship between the proposed model and the noisy-or gate for cases when time can be discretized.\nIn this work we present Cutting Plane Inference (CPI), a Maximum A Posteriori (MAP) inference method for Statistical Relational Learning. Framed in terms of Markov Logic and inspired by the Cutting Plane Method, it can be seen as a meta algorithm that instantiates small parts of a large and complex Markov Network and then solves these using a conventional MAP method. We evaluate CPI on two tasks, Semantic Role Labelling and Joint Entity Resolution, while plugging in two different MAP inference methods: the current method of choice for MAP inference in Markov Logic, MaxWalkSAT, and Integer Linear Programming. We observe that when used with CPI both methods are significantly faster than when used alone. In addition, CPI improves the accuracy of MaxWalkSAT and maintains the exactness of Integer Linear Programming.\nDeciding what to sense is a crucial task, made harder by dependencies and by a nonadditive utility function. 
We develop approximation algorithms for selecting an optimal set of measurements, under a dependency structure modeled by a tree-shaped Bayesian network (BN). Our approach is a generalization of composing anytime algorithms represented by conditional performance profiles. This is done by relaxing the input monotonicity assumption, and extending the local compilation technique to more general classes of performance profiles (PPs). We apply the extended scheme to selecting a subset of measurements for choosing a maximum expectation variable in a binary-valued BN, and for minimizing the worst variance in a Gaussian BN.\nWe consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available after each interaction with the world. This paper develops an explicitly model-based approach extending the Dyna architecture to linear function approximation. Dyna-style planning proceeds by generating imaginary experience from the world model and then applying model-free reinforcement learning algorithms to the imagined state transitions. Our main results are to prove that linear Dyna-style planning converges to a unique solution independent of the generating distribution, under natural conditions. In the policy evaluation setting, we prove that the limit point is the least-squares (LSTD) solution. An implication of our results is that prioritized sweeping can be soundly extended to the linear approximation case, backing up to preceding features rather than to preceding states. We introduce two versions of prioritized sweeping with linear Dyna and briefly illustrate their performance empirically on the Mountain Car and Boyan Chain problems.\nLinear Programming (LP) relaxations have become powerful tools for finding the most probable (MAP) configuration in graphical models. 
These relaxations can be solved efficiently using message-passing algorithms such as belief propagation and, when the relaxation is tight, provably find the MAP configuration. The standard LP relaxation is not tight enough in many real-world problems, however, and this has led to the use of higher-order cluster-based LP relaxations. The computational cost increases exponentially with the size of the clusters and limits the number and type of clusters we can use. We propose to solve the cluster selection problem monotonically in the dual LP, iteratively selecting clusters with guaranteed improvement, and quickly re-solving with the added clusters by reusing the existing solution. Our dual message-passing algorithm finds the MAP configuration in protein sidechain placement, protein design, and stereo problems, in cases where the standard LP relaxation fails.\nNumerous temporal inference tasks such as fault monitoring and anomaly detection exhibit a persistence property: for example, if something breaks, it stays broken until an intervention. When modeled as a Dynamic Bayesian Network, persistence adds dependencies between adjacent time slices, often making exact inference over time intractable using standard inference algorithms. However, we show that persistence implies a regular structure that can be exploited for efficient inference. We present three successively more general classes of models: persistent causal chains (PCCs), persistent causal trees (PCTs) and persistent polytrees (PPTs), and the corresponding exact inference algorithms that exploit persistence. We show that analytic asymptotic bounds for our algorithms compare favorably to junction tree inference, and we demonstrate empirically that we can perform exact smoothing on the order of 100 times faster than the approximate Boyen-Koller method on randomly generated instances of persistent tree models. 
We also show how to handle non-persistent variables and how persistence can be exploited effectively for approximate filtering.\nPlanning can often be simplified by decomposing the task into smaller tasks arranged hierarchically. Charlin et al. [4] recently showed that the hierarchy discovery problem can be framed as a non-convex optimization problem. However, the inherent computational difficulty of solving such an optimization problem makes it hard to scale to real-world problems. In another line of research, Toussaint et al. [18] developed a method to solve planning problems by maximum-likelihood estimation. In this paper, we show how the hierarchy discovery problem in partially observable domains can be tackled using a similar maximum-likelihood approach. Our technique first transforms the problem into a dynamic Bayesian network through which a hierarchical structure can naturally be discovered while optimizing the policy. Experimental results demonstrate that this approach scales better than previous techniques based on non-convex optimization.\nA Chain Event Graph (CEG) is a graphical model designed to embody conditional independencies in problems whose state spaces are highly asymmetric and do not admit a natural product structure. In this paper we present a probability propagation algorithm which uses the topology of the CEG to build a transporter CEG. Intriguingly, the transporter CEG is directly analogous to the triangulated Bayesian Network (BN) in the more conventional junction tree propagation algorithms used with BNs. The propagation method uses factorization formulae also analogous to (but different from) the ones using potentials on cliques and separators of the BN. 
It appears that the methods will typically be more efficient than the BN algorithms when applied to contexts where there is significant asymmetry present.\nDecision circuits have been developed to perform efficient evaluation of influence diagrams [Bhattacharjya and Shachter, 2007], building on the advances in arithmetic circuits for belief network inference [Darwiche, 2003]. In the process of model building and analysis, we perform sensitivity analysis to understand how the optimal solution changes in response to changes in the model. When sequential decision problems under uncertainty are represented as decision circuits, we can exploit the efficient solution process embodied in the decision circuit and the wealth of derivative information available to compute the value of information for the uncertainties in the problem and the effects of changes to model parameters on the value and the optimal strategy.\nComputing the probability of evidence even with known error bounds is NP-hard. In this paper we address this hard problem by settling on an easier problem. We propose an approximation which provides high-confidence lower bounds on the probability of evidence but does not have any guarantees in terms of relative or absolute error. Our proposed approximation is a randomized importance sampling scheme that uses the Markov inequality. However, a straightforward application of the Markov inequality may lead to poor lower bounds. We therefore propose several heuristic measures to improve its performance in practice. Empirical evaluation of our scheme against state-of-the-art lower-bounding schemes reveals the promise of our approach.\nChoquet expected utility (CEU) is one of the most sophisticated decision criteria used in decision theory under uncertainty. It provides a generalisation of expected utility enhancing both descriptive and prescriptive possibilities. 
In this paper, we investigate the use of CEU for path planning under uncertainty with a special focus on robust solutions. We first recall the main features of the CEU model and introduce some examples showing its descriptive potential. Then we focus on the search for Choquet-optimal paths in multivalued implicit graphs where costs depend on different scenarios. After discussing complexity issues, we propose two different heuristic search algorithms to solve the problem. Finally, numerical experiments are reported, showing the practical efficiency of the proposed algorithms.\nWe propose a new method for parameter learning in Bayesian networks with qualitative influences. This method extends our previous work from networks of binary variables to networks of discrete variables with ordered values. The specified qualitative influences correspond to certain order restrictions on the parameters in the network. These parameters may therefore be estimated using constrained maximum likelihood estimation. We propose an alternative method, based on isotonic regression. The constrained maximum likelihood estimates are fairly complicated to compute, whereas computation of the isotonic regression estimates only requires the repeated application of the Pool Adjacent Violators algorithm for linear orders. Therefore, the isotonic regression estimator is to be preferred from the viewpoint of computational complexity. Through experiments on simulated and real data, we show that the new learning method is competitive in performance with the constrained maximum likelihood estimator, and that both estimators improve on the standard estimator.\nThe ways in which an agent's actions affect the world can often be modeled compactly using a set of relational probabilistic planning rules. This paper addresses the problem of learning such rule sets for multiple related tasks. We take a hierarchical Bayesian approach, in which the system learns a prior distribution over rule sets. 
We present a class of prior distributions parameterized by a rule set prototype that is stochastically modified to produce a task-specific rule set. We also describe a coordinate ascent algorithm that iteratively optimizes the task-specific rule sets and the prior distribution. Experiments using this algorithm show that transferring information from related tasks significantly reduces the amount of training data required to predict action effects in blocks-world domains.\nRemote sensors are becoming the standard for observing and recording ecological data in the field. Such sensors can record data at fine temporal resolutions, and they can operate under extreme conditions prohibitive to human access. Unfortunately, sensor data streams exhibit many kinds of errors ranging from corrupt communications to partial or total sensor failures. This means that the raw data stream must be cleaned before it can be used by domain scientists. In our application environment (the H.J. Andrews Experimental Forest), this data cleaning is performed manually. This paper introduces a Dynamic Bayesian Network model for analyzing sensor observations and distinguishing sensor failures from valid data for the case of air temperature measured at 15-minute time resolution. The model combines an accurate distribution of long-term and short-term temperature variations with a single generalized fault model. Experiments with historical data show that the precision and recall of the method are comparable to those of the domain expert. The system is currently being deployed to perform real-time automated data cleaning.\nWe formulate in this paper the mini-bucket algorithm for approximate inference in terms of exact inference on an approximate model produced by splitting nodes in a Bayesian network. The new formulation leads to a number of theoretical and practical implications. 
First, we show that branch-and-bound search algorithms that use mini-bucket bounds may operate in a drastically reduced search space. Second, we show that the proposed formulation inspires new mini-bucket heuristics and allows us to analyze existing heuristics from a new perspective. Finally, we show that this new formulation allows mini-bucket approximations to benefit from recent advances in exact inference, allowing one to significantly increase the reach of these approximations.\nIn this paper we introduce a new network reachability problem where the goal is to find the most reliable path between two nodes in a network, represented as a directed acyclic graph. Individual edges within this network may fail according to certain probabilities, and these failure probabilities may depend on the values of one or more hidden variables. This problem may be viewed as a generalization of shortest-path problems for finding minimum-cost paths or Viterbi-type problems for finding highest-probability sequences of states, where the addition of the hidden variables introduces correlations that are not handled by previous algorithms. We give theoretical results characterizing this problem including an NP-hardness proof. We also give an exact algorithm and a more efficient approximation algorithm for this problem.\nAlthough a number of related algorithms have been developed to evaluate influence diagrams, exploiting the conditional independence in the diagram, the exact solution has remained intractable for many important problems. In this paper we introduce decision circuits as a means to exploit the local structure usually found in decision problems and to improve the performance of influence diagram analysis. This work builds on the probabilistic inference algorithms using arithmetic circuits to represent Bayesian belief networks [Darwiche, 2003]. 
Once compiled, these arithmetic circuits efficiently evaluate probabilistic queries on the belief network, and methods have been developed to exploit both the global and local structure of the network. We show that decision circuits can be constructed in a similar fashion and promise similar benefits.\nWe present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty.\nWe present the mixture-of-parents maximum entropy Markov model (MoP-MEMM), a class of directed graphical models extending MEMMs. The MoP-MEMM allows tractable incorporation of long-range dependencies between nodes by restricting the conditional distribution of each node to be a mixture of distributions given the parents. We show how to efficiently compute the exact marginal posterior node distributions, regardless of the range of the dependencies. This enables us to model non-sequential correlations present within text documents, as well as between interconnected documents, such as hyperlinked web pages. We apply the MoP-MEMM to a named entity recognition task and a web page classification task. 
In each, our model shows significant improvement over the basic MEMM, and is competitive with other long-range sequence models that use approximate inference.\nWe analyze the generalized Mallows model, a popular exponential model over rankings. Estimating the central (or consensus) ranking from data is NP-hard. We obtain the following new results: (1) We show that search methods can estimate both the central ranking pi0 and the model parameters theta exactly. The search is O(n!) in the worst case, but is tractable when the true distribution is concentrated around its mode; (2) We show that the generalized Mallows model is jointly exponential in (pi0, theta), and introduce the conjugate prior for this model class; (3) The sufficient statistics are the pairwise marginal probabilities that item i is preferred to item j. Preliminary experiments confirm the theoretical predictions and compare the new algorithm and existing heuristics.\nCompiling graphical models has recently been under intense investigation, especially for probabilistic modeling and processing. We present here a novel data structure for compiling weighted graphical models (in particular, probabilistic models), called AND/OR Multi-Valued Decision Diagram (AOMDD). This is a generalization of our previous work on constraint networks to weighted models. The AOMDD is based on the frameworks of AND/OR search spaces for graphical models, and Ordered Binary Decision Diagrams (OBDD). The AOMDD is a canonical representation of a graphical model, and its size and compilation time are bounded exponentially by the treewidth of the graph, rather than pathwidth as is known for OBDDs. We discuss a Variable Elimination schedule for compilation and present the general APPLY algorithm that combines two weighted AOMDDs, as well as a search-based compilation method. 
The preliminary experimental evaluation is quite encouraging, showing the potential of the AOMDD data structure.\nThe paper evaluates the power of best-first search over AND/OR search spaces for solving the Most Probable Explanation (MPE) task in Bayesian networks. The main virtue of the AND/OR representation of the search space is its sensitivity to the structure of the problem, which can translate into significant time savings. In recent years, depth-first AND/OR Branch-and-Bound algorithms have been shown to be very effective when exploring such search spaces, especially when using caching. Since best-first strategies are known to be superior to depth-first when memory is utilized, exploring the best-first control strategy is called for. The main contribution of this paper is in showing that a recent extension of AND/OR search algorithms from depth-first Branch-and-Bound to best-first is indeed very effective for computing the MPE in Bayesian networks. We demonstrate empirically the superiority of the best-first search approach on various probabilistic networks.\nSearching the complete space of possible Bayesian networks is intractable for problems of interesting size, so Bayesian network structure learning algorithms, such as the commonly used Sparse Candidate algorithm, employ heuristics. However, these heuristics also restrict the types of relationships that can be learned exclusively from data. They are unable to learn relationships that exhibit \"correlation-immunity\", such as parity. To learn Bayesian networks in the presence of correlation-immune relationships, we extend the Sparse Candidate algorithm with a technique called \"skewing\". This technique uses the observation that relationships that are correlation-immune under a specific input distribution may not be correlation-immune under another, sufficiently different distribution. 
We show that by extending Sparse Candidate with this technique we are able to discover relationships between random variables that are approximately correlation-immune, with a significantly lower computational cost than the alternative of considering multiple parents of a node at a time.\nWhen observational data is available from practical studies and a directed cyclic graph for how various variables affect each other is known based on substantive understanding of the process, we consider a problem in which a control plan of a treatment variable is conducted in order to bring a response variable close to a target value with variation reduction. We formulate an optimal control plan concerning a certain treatment variable through path coefficients in the framework of linear nonrecursive structural equation models. Based on the formulation, we clarify the properties of causal effects when conducting a control plan. The results enable us to evaluate the effect of a control plan on the variance from observational data.\nSurvey propagation (SP) is an exciting new technique that has been remarkably successful at solving very large hard combinatorial problems, such as determining the satisfiability of Boolean formulas. In a promising attempt at understanding the success of SP, it was recently shown that SP can be viewed as a form of belief propagation, computing marginal probabilities over certain objects called covers of a formula. This explanation was, however, shortly dismissed by experiments suggesting that non-trivial covers simply do not exist for large formulas. In this paper, we show that these experiments were misleading: not only do covers exist for large hard random formulas, SP is surprisingly accurate at computing marginals over these covers despite the existence of many cycles in the formulas. 
This re-opens a potentially simpler line of reasoning for understanding SP, in contrast to some alternative lines of explanation that have been proposed assuming covers do not exist.\nRanking objects is a simple and natural procedure for organizing data. It is often performed by assigning a quality score to each object according to its relevance to the problem at hand. Ranking is widely used for object selection, when resources are limited and it is necessary to select a subset of the most relevant objects for further processing. In real-world situations, the objects' scores are often calculated from noisy measurements, casting doubt on the ranking reliability. We introduce an analytical method for assessing the influence of noise levels on the ranking reliability. We use two similarity measures for reliability evaluation, Top-K-List overlap and Kendall's tau measure, and show that the former is much more sensitive to noise than the latter. We apply our method to gene selection in a series of microarray experiments of several cancer types. The results indicate that the reliability of the lists obtained from these experiments is very poor, and that experiment sizes which are necessary for attaining reasonably stable Top-K-Lists are much larger than those currently available. Simulations support our analytical results.\nPreferences play an important role in our everyday lives. CP-networks, or CP-nets for short, are graphical models for representing conditional qualitative preferences under ceteris paribus (\"all else being equal\") assumptions. Despite their intuitive nature and rich representation, dominance testing with CP-nets is computationally complex, even when the CP-nets are restricted to binary-valued preferences. Tractable algorithms exist for binary CP-nets, but these algorithms are incomplete for multi-valued CP-nets. 
In this paper, we identify a class of multi-valued CP-nets, which we call more-or-less CP-nets, that have the same computational complexity as binary CP-nets. More-or-less CP-nets exploit the monotonicity of the attribute values and use intervals to aggregate values that induce similar preferences. We then present a search control rule for dominance testing that effectively prunes the search space while preserving completeness.\nComputing the exact likelihood of data in large Bayesian networks consisting of thousands of vertices is often a difficult task. When these models contain many deterministic conditional probability tables and when the observed values are extremely unlikely, even alternative algorithms such as variational methods and stochastic sampling often perform poorly. We present a new importance sampling algorithm for Bayesian networks which is based on variational techniques. We use the updates of the importance function to predict whether the stochastic sampling converged above or below the true likelihood, and change the proposal distribution accordingly. The validity of the method and its contribution to convergence are demonstrated on hard networks of large genetic linkage analysis tasks.\nRelational Markov Decision Processes are a useful abstraction for complex reinforcement learning problems and stochastic planning problems. Recent work developed representation schemes and algorithms for planning in such problems using the value iteration algorithm. However, exact versions of more complex algorithms, including policy iteration, have not been developed or analyzed. The paper investigates this potential and makes several contributions. First, we observe two anomalies for relational representations showing that the value of some policies is not well defined or cannot be calculated for restricted representation schemes used in the literature. We then develop a variant of policy iteration that can get around these anomalies. 
The algorithm includes an aspect of policy improvement in the process of policy evaluation and thus differs from the original algorithm. We show that despite this difference the algorithm converges to the optimal policy.\nWe present a functional framework for automated mechanism design based on a two-stage game model of strategic interaction between the designer and the mechanism participants, and apply it to several classes of two-player infinite games of incomplete information. At the core of our framework is a black-box optimization algorithm which guides the selection process of candidate mechanisms. Our approach yields optimal or nearly optimal mechanisms in several application domains using various objective functions. By comparing our results with known optimal mechanisms, and in some cases improving on the best known mechanisms, we provide evidence that ours is a promising approach to parametric design of indirect mechanisms.\nBelief propagation and its variants are popular methods for approximate inference, but their running time and even their convergence depend greatly on the schedule used to send the messages. Recently, dynamic update schedules have been shown to converge much faster on hard networks than static schedules, namely the residual BP schedule of Elidan et al. [2006]. But that RBP algorithm wastes message updates: many messages are computed solely to determine their priority, and are never actually performed. In this paper, we show that estimating the residual, rather than calculating it directly, leads to significant decreases in the number of messages required for convergence, and in the total running time. The residual is estimated using an upper bound based on recent work on message errors in BP. 
On both synthetic and real-world networks, this dramatically decreases the running time of BP, in some cases by a factor of five, without affecting the quality of the solution.\nCombining first-order logic and probability has long been a goal of AI. Markov logic (Richardson & Domingos, 2006) accomplishes this by attaching weights to first-order formulas and viewing them as templates for features of Markov networks. Unfortunately, it does not have the full power of first-order logic, because it is only defined for finite domains. This paper extends Markov logic to infinite domains, by casting it in the framework of Gibbs measures (Georgii, 1988). We show that a Markov logic network (MLN) admits a Gibbs measure as long as each ground atom has a finite number of neighbors. Many interesting cases fall in this category. We also show that an MLN admits a unique measure if the weights of its non-unit clauses are small enough. We then examine the structure of the set of consistent measures in the non-unique case. Many important phenomena, including systems with phase transitions, are represented by MLNs with non-unique measures. We relate the problem of satisfiability in first-order logic to the properties of MLN measures, and discuss how Markov logic relates to previous infinite models.\nMulti-fidelity methods combine inexpensive low-fidelity simulations with costly but high-fidelity simulations to produce an accurate model of a system of interest at minimal cost. They have proven useful in modeling physical systems and have been applied to engineering problems such as wing-design optimization. During human-in-the-loop experimentation, it has become increasingly common to use online platforms, like Mechanical Turk, to run low-fidelity experiments to gather human performance data in an efficient manner. One concern with these experiments is that the results obtained from the online environment generalize poorly to the actual domain of interest. 
To address this limitation, we extend traditional multi-fidelity approaches to allow us to combine fewer data points from high-fidelity human-in-the-loop experiments with plentiful but less accurate data from low-fidelity experiments to produce accurate models of how humans interact. We present both model-based and model-free methods, and summarize the predictive performance of each method under different conditions.\nPredicting the outcomes of future events is a challenging problem for which a variety of solution methods have been explored and attempted. We present an empirical comparison of a variety of online and offline adaptive algorithms for aggregating experts' predictions of the outcomes of five years of US National Football League games (1319 games) using expert probability elicitations obtained from an Internet contest called ProbabilitySports. We find that it is difficult to improve over simple averaging of the predictions in terms of prediction accuracy, but that there is room for improvement in quadratic loss. Somewhat surprisingly, a Bayesian estimation algorithm which estimates the variance of each expert's prediction exhibits the most consistent superior performance over simple averaging among our collection of algorithms.\nWe describe an expert system, MAIES, developed for analysing forensic identification problems involving DNA mixture traces using quantitative peak area information. Peak area information is represented by conditional Gaussian distributions, and inference based on exact junction tree propagation ascertains whether individuals, whose profiles have been measured, have contributed to the mixture. The system can also be used to predict DNA profiles of unknown contributors by separating the mixture into its individual components. The use of the system is illustrated with an application to a real world example. 
The system implements a novel MAP (maximum a posteriori) search algorithm that is described in an appendix.\nWe consider in this paper the formulation of approximate inference in Bayesian networks as a problem of exact inference on an approximate network that results from deleting edges (to reduce treewidth). We have shown in earlier work that deleting edges calls for introducing auxiliary network parameters to compensate for lost dependencies, and proposed intuitive conditions for determining these parameters. We have also shown that our method corresponds to IBP when enough edges are deleted to yield a polytree, and corresponds to some generalizations of IBP when fewer edges are deleted. In this paper, we propose a different criterion for determining auxiliary parameters based on optimizing the KL divergence between the original and approximate networks. We discuss the relationship between the two methods for selecting parameters, shedding new light on IBP and its generalizations. We also discuss the application of our new method to approximating inference problems which are exponential in constrained treewidth, including MAP and nonmyopic value of information.\nThe effect of inaccuracies in the parameters of a dynamic Bayesian network can be investigated by subjecting the network to a sensitivity analysis. Having detailed the resulting sensitivity functions in our previous work, we now study the effect of parameter inaccuracies on a recommended decision in view of a threshold decision-making model. We detail the effect of varying a single parameter and multiple parameters from a conditional probability table and present a computational procedure for establishing bounds between which assessments for these parameters can be varied without inducing a change in the recommended decision. We illustrate the various concepts involved by means of a real-life dynamic network in the field of infectious disease.\nConsider a multi-agent system in a dynamic and uncertain environment. 
Each agent's local decision problem is modeled as a Markov decision process (MDP) and agents must coordinate on a joint action in each period, which provides a reward to each agent and causes local state transitions. A social planner knows the model of every agent's MDP and wants to implement the optimal joint policy, but agents are self-interested and have private local state. We provide an incentive-compatible mechanism for eliciting state information that achieves the optimal joint plan in a Markov perfect equilibrium of the induced stochastic game. In the special case in which local problems are Markov chains and agents compete to take a single action in each period, we leverage Gittins allocation indices to provide an efficient factored algorithm and distribute computation of the optimal policy among the agents. Distributed, optimal coordinated learning in a multi-agent variant of the multi-armed bandit problem is obtained as a special case.\nThe paper concerns the problem of predicting the effect of actions or interventions on a system from a combination of (i) statistical data on a set of observed variables, and (ii) qualitative causal knowledge encoded in the form of a directed acyclic graph (DAG). The DAG represents a set of linear equations called Structural Equations Model (SEM), whose coefficients are parameters representing direct causal effects. Reliable quantitative conclusions can only be obtained from the model if the causal effects are uniquely determined by the data. That is, if there exists a unique parametrization for the model that makes it compatible with the data. If this is the case, the model is called identified. The main result of the paper is a general sufficient condition for identification of recursive SEM models.\nThe paper analyzes theoretically and empirically the performance of likelihood weighting (LW) on a subset of nodes in Bayesian networks. 
The proposed scheme requires fewer samples to converge due to a reduction in sampling variance. The method exploits the structure of the network to bound the complexity of exact inference used to compute sampling distributions, similar to Gibbs cutset sampling. Yet, the extension of the previously proposed cutset sampling principles to likelihood weighting is non-trivial due to differences in the sampling processes of the Gibbs sampler and LW. We demonstrate empirically that likelihood weighting on a cutset (LWLC) is effective time-wise and has a lower rejection rate than LW when applied to networks with many deterministic probabilities. Finally, we show that the performance of likelihood weighting on a cutset can be improved further by caching computed sampling distributions and, consequently, learning 'zeros' of the target distribution.\nLinear-time computational techniques have been developed for combining evidence which is available on a number of contending hypotheses. They offer a means of making the computation-intensive calculations involved more efficient in certain circumstances. Unfortunately, they restrict the orthogonal sum of evidential functions to the dichotomous structure, which applies only to elements and their complements. In this paper, we present a novel evidence structure in terms of a triplet and a set of algorithms for evidential reasoning. The merit of this structure is that it divides a set of evidence into three subsets, distinguishing trivial evidential elements from important ones that focus on particular elements. It avoids the deficits of the dichotomous structure in representing the preference of evidence and estimating the basic probability assignment of evidence.
We have established a formalism for this structure and the general formulae for combining pieces of evidence in the form of the triplet, which have been theoretically justified.\nWe observe that certain large-clique graph triangulations can be useful to reduce both computational and space requirements when making queries on mixed stochastic/deterministic graphical models. We demonstrate that many of these large-clique triangulations are non-minimal and are thus unattainable via the variable elimination algorithm. We introduce ancestral pairs as the basis for novel triangulation heuristics and prove that no more than the addition of edges between ancestral pairs need be considered when searching for state space optimal triangulations in such graphs. Empirical results on random and real world graphs show that the resulting triangulations that yield significant speedups are almost always non-minimal. We also give an algorithm and correctness proof for determining if a triangulation can be obtained via elimination, and we show that the decision problem associated with finding optimal state space triangulations in this mixed stochastic/deterministic setting is NP-complete.\nSeparable Bayesian Networks, or the Influence Model, are dynamic Bayesian Networks in which the conditional probability distribution can be separated into a function of only the marginal distribution of a node's neighbors, instead of the joint distributions. In terms of modeling, separable networks have made possible a significant reduction in complexity, as the state space is only linear in the number of variables in the network, in contrast to a typical state space which is exponential. In this work, we describe the connection between an arbitrary Conditional Probability Table (CPT) and separable systems using linear algebra. We give an alternate proof of the equivalence of sufficiency and separability.
We present a computational method for testing whether a given CPT is separable.\nWe consider a Bayesian method for learning the Bayesian network structure from complete data. Recently, Koivisto and Sood (2004) presented an algorithm that for any single edge computes its marginal posterior probability in O(n 2^n) time, where n is the number of attributes; the number of parents per attribute is bounded by a constant. In this paper we show that the posterior probabilities for all the n (n - 1) potential edges can be computed in O(n 2^n) total time. This result is achieved by a forward-backward technique and fast Moebius transform algorithms, which are of independent interest. The resulting speedup by a factor of about n^2 allows us to experimentally study the statistical power of learning moderate-size networks. We report results from a simulation study that covers data sets with 20 to 10,000 records over 5 to 25 discrete attributes.\nWe investigate methods for parameter learning from incomplete data that is not missing at random. Likelihood-based methods then require the optimization of a profile likelihood that takes all possible missingness mechanisms into account. Optimizing this profile likelihood poses two main difficulties: multiple (local) maxima, and its very high-dimensional parameter space. In this paper a new method is presented for optimizing the profile likelihood that addresses the second difficulty: in the proposed AI&M (adjusting imputation and maximization) procedure the optimization is performed by operations in the space of data completions, rather than directly in the parameter space of the profile likelihood. We apply the AI&M method to learning parameters for Bayesian networks. The method is compared against conservative inference, which takes into account each possible data completion, and against EM.
The results indicate that likelihood-based inference is still feasible in the case of unknown missingness mechanisms, and that conservative inference is unnecessarily weak. On the other hand, our results also provide evidence that the EM algorithm is still quite effective when the data is not missing at random.\nThis paper is concerned with graphical criteria that can be used to solve the problem of identifying causal effects from nonexperimental data in a causal Bayesian network structure, i.e., a directed acyclic graph that represents causal relationships. We first review Pearl's work on this topic [Pearl, 1995], in which several useful graphical criteria are presented. Then we present a complete algorithm [Huang and Valtorta, 2006b] for the identifiability problem. By exploiting the completeness of this algorithm, we prove that the three basic do-calculus rules that Pearl presents are complete, in the sense that, if a causal effect is identifiable, there exists a sequence of applications of the rules of the do-calculus that transforms the causal effect formula into a formula that only includes observational quantities.\nContinuous-time Bayesian networks (CTBNs) are graphical representations of multi-component continuous-time Markov processes as directed graphs. The edges in the network represent direct influences among components. The joint rate matrix of the multi-component process is specified by means of conditional rate matrices for each component separately. This paper addresses the situation where some of the components evolve on a time scale that is much shorter than the time scale of the other components. In this paper, we prove that in the limit where the separation of scales is infinite, the Markov process converges (in distribution, or weakly) to a reduced, or effective, Markov process that only involves the slow components.
We also demonstrate that for a reasonable separation of scales (an order of magnitude) the reduced process is a good approximation of the marginal process over the slow components. We provide a simple procedure for building a reduced CTBN for this effective process, with conditional rate matrices that can be directly calculated from the original CTBN, and discuss the implications for approximate reasoning in large systems.\nA popular approach to solving large probabilistic systems relies on aggregating states based on a measure of similarity. Many approaches in the literature are heuristic. A number of recent methods rely instead on metrics based on the notion of bisimulation, or behavioral equivalence between states (Givan et al, 2001, 2003; Ferns et al, 2004). An integral component of such metrics is the Kantorovich metric between probability distributions. However, while this metric enables many satisfying theoretical properties, it is costly to compute in practice. In this paper, we use techniques from network optimization and statistical sampling to overcome this problem. We obtain in this manner a variety of distance functions for MDP state aggregation, which differ in the tradeoff between time and space complexity, as well as the quality of the aggregation. We provide an empirical evaluation of these trade-offs.\nDirected possibly cyclic graphs have been proposed by Didelez (2000) and Nodelmann et al. (2002) in order to represent the dynamic dependencies among stochastic processes. These dependencies are based on a generalization of Granger-causality to continuous time, first developed by Schweder (1970) for Markov processes, who called them local dependencies. They deserve special attention as they are asymmetric, unlike stochastic (in)dependence. In this paper we focus on their graphical representation and develop a suitable, i.e. asymmetric, notion of separation, called delta-separation.
The properties of this graph separation as well as of local independence are investigated in detail within a framework of asymmetric (semi)graphoids allowing a deeper insight into what information can be read off these graphs.\nTasks such as record linkage and multi-target tracking, which involve reconstructing the set of objects that underlie some observed data, are particularly challenging for probabilistic inference. Recent work has achieved efficient and accurate inference on such problems using Markov chain Monte Carlo (MCMC) techniques with customized proposal distributions. Currently, implementing such a system requires coding MCMC state representations and acceptance probability calculations that are specific to a particular application. An alternative approach, which we pursue in this paper, is to use a general-purpose probabilistic modeling language (such as BLOG) and a generic Metropolis-Hastings MCMC algorithm that supports user-supplied proposal distributions. Our algorithm gains flexibility by using MCMC states that are only partial descriptions of possible worlds; we provide conditions under which MCMC over partial worlds yields correct answers to queries. We also show how to use a context-specific Bayes net to identify the factors in the acceptance probability that need to be computed for a given proposed move. Experimental results on a citation matching task show that our general-purpose MCMC engine compares favorably with an application-specific system.\nCollaborative data consist of ratings relating two distinct sets of objects: users and items. Much of the work with such data focuses on filtering: predicting unknown ratings for pairs of users and items. In this paper we focus on the problem of visualizing the information. Given all of the ratings, our task is to embed all of the users and items as points in the same Euclidean space. 
We would like to place users near items that they have rated (or would rate) high, and far away from those they would give a low rating. We pose this problem as a real-valued non-linear Bayesian network and employ Markov chain Monte Carlo and expectation maximization to find an embedding. We present a metric by which to judge the quality of a visualization and compare our results to local linear embedding and Eigentaste on three real-world datasets.\nPrevious work in hierarchical reinforcement learning has faced a dilemma: either ignore the values of different possible exit states from a subroutine, thereby risking suboptimal behavior, or represent those values explicitly thereby incurring a possibly large representation cost because exit values refer to nonlocal aspects of the world (i.e., all subsequent rewards). This paper shows that, in many cases, one can avoid both of these problems. The solution is based on recursively decomposing the exit value function in terms of Q-functions at higher levels of the hierarchy. This leads to an intuitively appealing runtime architecture in which a parent subroutine passes to its child a value function on the exit states and the child reasons about how its choices affect the exit value. We also identify structural conditions on the value function and transition distributions that allow much more concise representations of exit state distributions, leading to further state abstraction. In essence, the only variables whose exit values need be considered are those that the parent cares about and the child affects. We demonstrate the utility of our algorithms on a series of increasingly complex environments.\nTraditional approaches to Bayes net structure learning typically assume little regularity in graph structure other than sparseness. However, in many cases, we expect more systematicity: variables in real-world systems often group into classes that predict the kinds of probabilistic dependencies they participate in. 
Here we capture this form of prior knowledge in a hierarchical Bayesian framework, and exploit it to enable structure learning and type discovery from small datasets. Specifically, we present a nonparametric generative model for directed acyclic graphs as a prior for Bayes net structure learning. Our model assumes that variables come in one or more classes and that the prior probability of an edge existing between two variables is a function only of their classes. We derive an MCMC algorithm for simultaneous inference of the number of classes, the class assignments of variables, and the Bayes net structure over variables. For several realistic, sparse datasets, we show that the bias towards systematicity of connections provided by our model yields more accurate learned networks than a traditional, uniform prior approach, and that the classes found by our model are appropriate.\nThere are several existing algorithms that under appropriate assumptions can reliably identify a subset of the underlying causal relationships from observational data. This paper introduces the first computationally feasible score-based algorithm that can reliably identify causal relationships in the large sample limit for discrete models, while allowing for the possibility that there are unobserved common causes. In doing so, the algorithm does not ever need to assign scores to causal structures with unobserved common causes. The algorithm is based on the identification of so called Y substructures within Bayesian network structures that can be learned from observational data. An example of a Y substructure is A -> C, B -> C, C -> D. After providing background on causal discovery, the paper proves the conditions under which the algorithm is reliable in the large sample limit.\nBayesian Networks (BNs) are useful tools giving a natural and compact representation of joint probability distributions. In many applications one needs to learn a Bayesian Network (BN) from data. 
In this context, it is important to understand the number of samples needed in order to guarantee successful learning. Previous work has studied the sample complexity of BNs, yet it has mainly focused on the requirement that the learned distribution be close to the original distribution which generated the data. In this work, we study a different aspect of the learning, namely the number of samples needed in order to learn the correct structure of the network. We give both asymptotic results, valid in the large sample limit, and experimental results, demonstrating the learning behavior for feasible sample sizes. We show that structure learning is a more difficult task, compared to approximating the correct distribution, in the sense that it requires a much larger number of samples, regardless of the computational power available for the learner.\nWe present a non-parametric Bayesian approach to structure learning with hidden causes. Previous Bayesian treatments of this problem define a prior over the number of hidden causes and use algorithms such as reversible jump Markov chain Monte Carlo to move between solutions. In contrast, we assume that the number of hidden causes is unbounded, but only a finite number influence observable variables. This makes it possible to use a Gibbs sampler to approximate the distribution over causal structures. We evaluate the performance of both approaches in discovering hidden causes in simulated data, and use our non-parametric approach to discover hidden causes in a real medical dataset.\nExpected Utility: Algebraic Expected Utility In this paper, we provide two axiomatizations of algebraic expected utility, which is a particular generalized expected utility, in a von Neumann-Morgenstern setting, i.e. uncertainty representation is supposed to be given and here to be described by a plausibility measure valued on a semiring, which could be partially ordered.
We show that axioms identical to those for expected utility entail that preferences are represented by an algebraic expected utility. This algebraic approach allows many previous propositions (expected utility, binary possibilistic utility,...) to be unified in the same general framework and proves that the obtained utility enjoys the same nice features as expected utility: linearity, dynamic consistency, autoduality of the underlying uncertainty measure, autoduality of the decision criterion and possibility of modeling the decision maker's attitude toward uncertainty.\nWe introduce a new dynamic model with the capability of recognizing both the activities that an individual is performing and where that individual is located. Our model is novel in that it utilizes a dynamic graphical model to jointly estimate both activity and spatial context over time based on the simultaneous use of asynchronous observations consisting of GPS measurements, and measurements from a small mountable sensor board. Joint inference is quite desirable as it has the ability to improve the accuracy of the model. A key goal, however, in designing our overall system is to be able to perform accurate inference decisions while minimizing the amount of hardware an individual must wear. This minimization leads to greater comfort and flexibility, decreased power requirements and therefore increased battery life, and reduced cost. We show results indicating that our joint measurement model outperforms measurements from either the sensor board or GPS alone, using two types of probabilistic inference procedures, namely particle filtering and pruned exact inference.
We propose a method based on real-time dynamic programming (RTDP) to speed up two model-based algorithms, RMAX and MBIE (model-based interval estimation), resulting in computationally much faster algorithms with little loss compared to existing bounds. Specifically, our two new learning algorithms, RTDP-RMAX and RTDP-IE, have considerably smaller computational demands than RMAX and MBIE. We develop a general theoretical framework that allows us to prove that both are efficient learners in a PAC (probably approximately correct) sense. We also present an experimental evaluation of these new algorithms that helps quantify the tradeoff between computational and experience demands.\nWe study the problem of learning the best Bayesian network structure with respect to a decomposable score such as BDe, BIC or AIC. This problem is known to be NP-hard, which means that solving it becomes quickly infeasible as the number of variables increases. Nevertheless, in this paper we show that it is possible to learn the best Bayesian network structure with over 30 variables, which covers many practically interesting cases. Our algorithm is less complicated and more efficient than the techniques presented earlier. It can be easily parallelized, and offers a possibility for efficient exploration of the best networks consistent with different variable orderings. In the experimental part of the paper we compare the performance of the algorithm to the previous state-of-the-art algorithm. Free source-code and an online-demo can be found at http://b-course.hiit.fi/bene.\nThe main goal of this paper is to describe a method for exact inference in general hybrid Bayesian networks (BNs) (with a mixture of discrete and continuous chance variables). Our method consists of approximating general hybrid Bayesian networks by a mixture of Gaussians (MoG) BNs. 
There exists a fast algorithm by Lauritzen-Jensen (LJ) for making exact inferences in MoG Bayesian networks, and there exists a commercial implementation of this algorithm. However, this algorithm can only be used for MoG BNs. Some limitations of such networks are as follows. All continuous chance variables must have conditional linear Gaussian distributions, and discrete chance nodes cannot have continuous parents. The methods described in this paper will enable us to use the LJ algorithm for a bigger class of hybrid Bayesian networks. This includes networks with continuous chance nodes with non-Gaussian distributions, networks with no restrictions on the topology of discrete and continuous variables, networks with conditionally deterministic variables that are a nonlinear function of their continuous parents, and networks with continuous chance variables whose variances are functions of their parents.\nRecent work on approximate linear programming (ALP) techniques for first-order Markov Decision Processes (FOMDPs) represents the value function linearly w.r.t. a set of first-order basis functions and uses linear programming techniques to determine suitable weights. This approach offers the advantage that it does not require simplification of the first-order value function, and allows one to solve FOMDPs independent of a specific domain instantiation. In this paper, we address several questions to enhance the applicability of this work: (1) Can we extend the first-order ALP framework to approximate policy iteration to address performance deficiencies of previous approaches? (2) Can we automatically generate basis functions and evaluate their impact on value function quality? (3) How can we decompose intractable problems with universally quantified rewards into tractable subproblems? 
We propose answers to these questions along with a number of novel optimizations and provide a comparative empirical evaluation on logistics problems from the ICAPS 2004 Probabilistic Planning Competition.\nWith the aid of the concept of stable independence we can construct, in an efficient way, a compact representation of a semi-graphoid independence relation. We show that this representation provides a new necessary condition for the existence of a directed perfect map for the relation. The test for this condition is based to a large extent on the transitivity property of a special form of d-separation. The complexity of the test is linear in the size of the representation. The test, moreover, brings the additional benefit that it can be used to guide the early stages of network construction.\nMany real world sequences such as protein secondary structures or shell logs exhibit a rich internal structure. Traditional probabilistic models of sequences, however, consider sequences of flat symbols only. Logical hidden Markov models have been proposed as one solution. They deal with logical sequences, i.e., sequences over an alphabet of logical atoms. This comes at the expense of a more complex model selection problem. Indeed, different abstraction levels have to be explored. In this paper, we propose a novel method, called SAGEM, for selecting logical hidden Markov models from data. SAGEM combines generalized expectation maximization, which optimizes parameters, with structure search for model selection using inductive logic programming refinement operators. We provide convergence and experimental results that show SAGEM's effectiveness.\nIn this paper we present a differential semantics of Lazy AR Propagation (LARP) in discrete Bayesian networks. We describe how both single and multi-dimensional partial derivatives of the evidence may easily be calculated from a junction tree in LARP equilibrium.
We show that the simplicity of the calculations stems from the nature of LARP. Based on the differential semantics we describe how variable propagation in the LARP architecture may give access to additional partial derivatives. The cautious LARP (cLARP) scheme is derived to produce a flexible cLARP equilibrium that offers additional opportunities for calculating single and multidimensional partial derivatives of the evidence and subsets of the evidence from a single propagation. The results of an empirical evaluation illustrate how access to a greatly increased number of partial derivatives comes at a low computational cost.\nThis paper deals with the following problem: modify a Bayesian network to satisfy a given set of probability constraints by only changing its conditional probability tables, such that the probability distribution of the resulting network is as close as possible to that of the original network. We propose to solve this problem by extending IPFP (iterative proportional fitting procedure) to probability distributions represented by Bayesian networks. The resulting algorithm E-IPFP is further developed into D-IPFP, which reduces the computational cost by decomposing a global E-IPFP into a set of smaller local E-IPFP problems. Limited analysis is provided, including the convergence proofs of the two algorithms. Computer experiments were conducted to validate the algorithms. The results are consistent with the theoretical analysis.\nStudying the effects of one-way variation of any number of parameters on any number of output probabilities quickly becomes infeasible in practice, especially if various evidence profiles are to be taken into consideration. To provide for identifying the parameters that have a potentially large effect prior to actually performing the analysis, we need properties of sensitivity functions that are independent of the network under study, of the available evidence, or of both.
In this paper, we study properties that depend upon just the probability of the entered evidence. We demonstrate that these properties provide for establishing an upper bound on the sensitivity value for a parameter; they further provide for establishing the region in which the vertex of the sensitivity function resides, thereby serving to identify parameters with a low sensitivity value that may still have a large impact on the probability of interest for relatively small parameter variations.\nWe present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment, in domains such as multirobot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite horizon problems.\nStructured utility models are essential for the effective representation and elicitation of complex multiattribute utility functions. Generalized additive independence (GAI) models provide an attractive structural model of user preferences, offering a balanced tradeoff between simplicity and applicability. While representation and inference with such models are reasonably well understood, elicitation of the parameters of such models has been studied less from a practical perspective. We propose a procedure to elicit GAI model parameters using only \"local\" utility queries rather than \"global\" queries over full outcomes.
Our local queries take full advantage of GAI structure and provide a sound framework for extending the elicitation procedure to settings where the uncertainty over utility parameters is represented probabilistically. We describe experiments using a myopic value-of-information approach to elicitation in a large GAI model.\nIt is well known that there may be many causal explanations that are consistent with a given set of data. Recent work has sought to combine the common aspects of these explanations into a single representation. In this paper, we address what is less well known: how do the relationships common to every causal explanation among the observed variables of some DAG process change in the presence of latent variables? Ancestral graphs provide a class of graphs that can encode conditional independence relations that arise in DAG models with latent and selection variables. In this paper we present a set of orientation rules that construct the Markov equivalence class representative for ancestral graphs, given a member of the equivalence class. These rules are sound and complete. We also show that when the equivalence class includes a DAG, the equivalence class representative is the essential graph for the said DAG.\nWhen a hybrid Bayesian network has conditionally deterministic variables with continuous parents, the joint density function for the continuous variables does not exist. Conditional linear Gaussian distributions can handle such cases when the continuous variables have a multivariate normal distribution and the discrete variables do not have continuous parents. In this paper, operations required for performing inference with conditionally deterministic variables in hybrid Bayesian networks are developed. These methods allow inference in networks with deterministic variables where continuous variables may be non-Gaussian, and their density functions can be approximated by mixtures of truncated exponentials.
There are no constraints on the placement of continuous and discrete nodes in the network.\nPlanning in adversarial and uncertain environments can be modeled as the problem of devising strategies in stochastic perfect information games. These games are generalizations of Markov decision processes (MDPs): there are two (adversarial) players, and a source of randomness. The main practical obstacle to computing winning strategies in such games is the size of the state space. In practice therefore, one typically works with abstractions of the model. The difficulty is to come up with an abstraction that is neither too coarse to remove all winning strategies (plans), nor too fine to be intractable. In verification, the paradigm of counterexample-guided abstraction refinement has been successful in constructing useful but parsimonious abstractions automatically. We extend this paradigm to probabilistic models (namely, perfect information games and, as a special case, MDPs). This allows us to apply the counterexample-guided abstraction paradigm to the AI planning problem. As special cases, we get planning algorithms for MDPs and deterministic systems that automatically construct system abstractions.\nThe Bayesian Logic (BLOG) language was recently developed for defining first-order probability models over worlds with unknown numbers of objects. It handles important problems in AI, including data association and population estimation. This paper extends BLOG by adopting generative processes over function spaces - known as nonparametrics in the Bayesian literature. We introduce syntax for reasoning about arbitrary collections of objects, and their properties, in an intuitive manner. By exploiting exchangeability, distributions over unknown objects and their attributes are cast as Dirichlet processes, which resolve difficulties in model selection and inference caused by varying numbers of objects. 
We demonstrate these concepts with application to citation matching.\nConsider the case where causal relations among variables can be described as a Gaussian linear structural equation model. This paper deals with the problem of clarifying how the variance of a response variable would have changed if a treatment variable were assigned to some value (counterfactually), given that a set of variables is observed (actually). In order to achieve this aim, we reformulate the formulas of the counterfactual distribution proposed by Balke and Pearl (1995) through both the total effects and a covariance matrix of observed variables. We further extend the framework of Balke and Pearl (1995) from point observations to interval observations, and from an unconditional plan to a conditional plan. The results of this paper enable us to clarify the properties of counterfactual distribution and establish an optimal plan.\nWe propose an efficient algorithm for estimation of possibility-based qualitative expected utility. It is useful for decision making mechanisms where each possible decision is assigned a multi-attribute possibility distribution. The computational complexity of ordinary methods calculating the expected utility based on discretization grows exponentially with the number of attributes, and may become infeasible with a high number of these attributes. We present a series of theorems and lemmas proving the correctness of our algorithm, which exhibits linear computational complexity. Our algorithm has been applied in the context of selecting the most prospective partners in multi-party multi-attribute negotiation, and can also be used to make decisions about potential offers during the negotiation, as well as in other similar problems.\nWe present a framework to discover and characterize different classes of everyday activities from event-streams. We begin by representing activities as bags of event n-grams. 
This allows us to analyze the global structural information of activities, using their local event statistics. We demonstrate how maximal cliques in an undirected edge-weighted graph of activities can be used for activity-class discovery in an unsupervised manner. We show how modeling an activity as a variable length Markov process can be used to discover recurrent event-motifs to characterize the discovered activity-classes. We present results over extensive data-sets, collected from multiple active environments, to show the competence and generalizability of our proposed framework.\nThis paper describes a general framework called Hybrid Dynamic Mixed Networks (HDMNs) which are Hybrid Dynamic Bayesian Networks that allow representation of discrete deterministic information in the form of constraints. We propose approximate inference algorithms that integrate and adjust well known algorithmic principles such as Generalized Belief Propagation, Rao-Blackwellised Particle Filtering and Constraint Propagation to address the complexity of modeling and reasoning in HDMNs. We use this framework to model a person's travel activity over time and to predict destination and routes given the current location. We present a preliminary empirical evaluation demonstrating the effectiveness of our modeling framework and algorithms using several variants of the activity model.\nWe present a method for learning the parameters of a Bayesian network with prior knowledge about the signs of influences between variables. Our method accommodates not just the standard signs, but provides for context-specific signs as well. We show how the various signs translate into order constraints on the network parameters and how isotonic regression can be used to compute order-constrained estimates from the available data. 
Our experimental results show that taking prior knowledge about the signs of influences into account leads to an improved fit of the true distribution, especially when only a small sample of data is available. Moreover, the computed estimates are guaranteed to be consistent with the specified signs, thereby resulting in a network that is more likely to be accepted by experts in its domain of application.\nPlanning and learning in Partially Observable MDPs (POMDPs) are among the most challenging tasks in both the AI and Operations Research communities. Although solutions to these problems are intractable in general, there might be special cases, such as structured POMDPs, which can be solved efficiently. A natural and possibly efficient way to represent a POMDP is through the predictive state representation (PSR) - a representation which recently has been receiving increasing attention. In this work, we relate POMDPs to multiplicity automata, showing that POMDPs can be represented by multiplicity automata with no increase in the representation size. Furthermore, we show that the size of the multiplicity automaton is equal to the rank of the predictive state representation. Therefore, we relate both the predictive state representation and POMDPs to the well-founded multiplicity automata literature. Based on the multiplicity automata representation, we provide a planning algorithm which is exponential only in the multiplicity automata rank rather than the number of states of the POMDP. As a result, whenever the predictive state representation is logarithmic in the standard POMDP representation, our planning algorithm is efficient.\nWe show that if any number of variables are allowed to be simultaneously and independently randomized in any one experiment, log2(N) + 1 experiments are sufficient and in the worst case necessary to determine the causal relations among N >= 2 variables when no latent variables, no sample selection bias and no feedback cycles are present. 
For all K, 0 < K < N/2, we provide an upper bound on the number of experiments required to determine causal structure when each experiment simultaneously randomizes K variables. For large N, these bounds are significantly lower than the N - 1 bound required when each experiment randomizes at most one variable. For kmax < N/2, we show that (N/kmax - 1) + N/(2kmax) log2(kmax) experiments are sufficient and in the worst case necessary. We offer a conjecture as to the minimal number of experiments that are in the worst case sufficient to identify all causal relations among N observed variables that are a subset of the vertices of a DAG.\nA fundamental issue in real-world systems, such as sensor networks, is the selection of observations which most effectively reduce uncertainty. More specifically, we address the long-standing problem of nonmyopically selecting the most informative subset of variables in a graphical model. We present the first efficient randomized algorithm providing a constant factor (1-1/e-epsilon) approximation guarantee for any epsilon > 0 with high confidence. The algorithm leverages the theory of submodular functions, in combination with a polynomial bound on sample complexity. We furthermore prove that no polynomial time algorithm can provide a constant factor approximation better than (1 - 1/e) unless P = NP. Finally, we provide extensive evidence of the effectiveness of our method on two complex real-world datasets.\nTree-reweighted max-product (TRW) message passing is a modified form of the ordinary max-product algorithm for attempting to find minimal energy configurations in Markov random fields with cycles. For a TRW fixed point satisfying the strong tree agreement condition, the algorithm outputs a configuration that is provably optimal. In this paper, we focus on the case of binary variables with pairwise couplings, and establish stronger properties of TRW fixed points that satisfy only the milder condition of weak tree agreement (WTA). 
First, we demonstrate how it is possible to identify part of the optimal solution (i.e., a provably optimal solution for a subset of nodes) without knowing a complete solution. Second, we show that for submodular functions, a WTA fixed point always yields a globally optimal solution. We establish that for binary variables, any WTA fixed point always achieves the global maximum of the linear programming relaxation underlying the TRW method.\nIn this paper, we propose a revision-based approach for conflict resolution by generalizing the Disjunctive Maxi-Adjustment (DMA) approach (Benferhat et al. 2004). Revision operators can be classified into two different families: the model-based ones and the formula-based ones. So the revision-based approach has two different versions according to which family of revision operators is chosen. Two particular revision operators are considered: one is Dalal's revision operator, which is a model-based revision operator, and the other is the cardinality-maximal based revision operator, which is a formula-based revision operator. When Dalal's revision operator is chosen, the revision-based approach is independent of the syntactic form in each stratum and it captures some notion of minimal change. When the cardinality-maximal based revision operator is chosen, the revision-based approach is equivalent to the DMA approach. We also show that both approaches are computationally easier than the DMA approach.\nSystems such as sensor networks and teams of autonomous robots consist of multiple autonomous entities that interact with each other in a distributed, asynchronous manner. These entities need to keep track of the state of the system as it evolves. Asynchronous systems lead to special challenges for monitoring, as nodes must update their beliefs independently of each other and no central coordination is possible. Furthermore, the state of the system continues to change as beliefs are being updated. 
Previous approaches to developing distributed asynchronous probabilistic reasoning systems have used static models. We present an approach using dynamic models that take into account the way the system changes state over time. Our approach, which is based on belief propagation, is fully distributed and asynchronous, and allows the world to keep on changing as messages are being sent around. Experimental results show that our approach compares favorably to the factored frontier algorithm.\nTwo types of probabilistic maps are popular in the mobile robotics literature: occupancy grids and geometric maps. Occupancy grids have the advantages of simplicity and speed, but they represent only a restricted class of maps and they make incorrect independence assumptions. On the other hand, current geometric approaches, which characterize the environment by features such as line segments, can represent complex environments compactly. However, they do not reason explicitly about occupancy, a necessity for motion planning; and, they lack a complete probability model over environmental structures. In this paper we present a probabilistic mapping technique based on polygonal random fields (PRF), which combines the advantages of both approaches. Our approach explicitly represents occupancy using a geometric representation, and it is based upon a consistent probability distribution over environments which avoids the incorrect independence assumptions made by occupancy grids. We show how sampling techniques for PRFs can be applied to localized laser and sonar data, and we demonstrate significant improvements in mapping performance over occupancy grids.\nContinuous time Bayesian networks (CTBNs) describe structured stochastic processes with finitely many states that evolve over continuous time. 
A CTBN is a directed (possibly cyclic) dependency graph over a set of variables, each of which represents a finite state continuous time Markov process whose transition model is a function of its parents. As shown previously, exact inference in CTBNs is intractable. We address the problem of approximate inference, allowing for general queries conditioned on evidence over continuous time intervals and at discrete time points. We show how CTBNs can be parameterized within the exponential family, and use that insight to develop a message passing scheme in cluster graphs that allows us to apply expectation propagation to CTBNs. The clusters in our cluster graph do not contain distributions over the cluster variables at individual time points, but distributions over trajectories of the variables throughout a duration. Thus, unlike discrete time temporal models such as dynamic Bayesian networks, we can adapt the time granularity at which we reason for different variables and in different conditions.\nThe need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit CRFs, a finite-state conditional random field model for edit sequences between strings. Conditional random fields have advantages over generative approaches to this problem, such as pair HMMs or the work of Ristad and Yianilos, because as conditionally-trained methods, they enable the use of complex, arbitrary actions and features of the input strings. As in generative models, the training data does not have to specify the edit sequences between the given string pairs. Unlike generative models, however, our model is trained on both positive and negative instances of string pairs. 
We present positive experimental results on several data sets.\nThis paper addresses a fundamental issue central to approximation methods for solving large Markov decision processes (MDPs): how to automatically learn the underlying representation for value function approximation? A novel theoretically rigorous framework is proposed that automatically generates geometrically customized orthonormal sets of basis functions, which can be used with any approximate MDP solver like least squares policy iteration (LSPI). The key innovation is a coordinate-free representation of value functions, using the theory of smooth functions on a Riemannian manifold. Hodge theory yields a constructive method for generating basis functions for approximating value functions based on the eigenfunctions of the self-adjoint (Laplace-Beltrami) operator on manifolds. In effect, this approach performs a global Fourier analysis on the state space graph to approximate value functions, where the basis functions reflect the large-scale topology of the underlying state space. A new class of algorithms called Representation Policy Iteration (RPI) is presented that automatically learns both basis functions and approximately optimal policies. Illustrative experiments compare the performance of RPI with that of LSPI using two hand-coded basis functions (RBF and polynomial state encodings).\nWe introduce a new approximate solution technique for first-order Markov decision processes (FOMDPs). Representing the value function linearly w.r.t. a set of first-order basis functions, we compute suitable weights by casting the corresponding optimization as a first-order linear program and show how off-the-shelf theorem prover and LP software can be effectively used. This technique allows one to solve FOMDPs independent of a specific domain instantiation; furthermore, it allows one to determine bounds on approximation error that apply equally to all domain instantiations. 
We apply this solution technique to the task of elevator scheduling with a rich feature space and multi-criteria additive reward, and demonstrate that it outperforms a number of intuitive, heuristically guided policies.\nModels of dynamical systems based on predictive state representations (PSRs) are defined strictly in terms of observable quantities, in contrast with traditional models (such as Hidden Markov Models) that use latent variables or state-space representations. In addition, PSRs have an effectively infinite memory, allowing them to model some systems that finite memory-based models cannot. Thus far, PSR models have primarily been developed for domains with discrete observations. Here, we develop the Predictive Linear-Gaussian (PLG) model, a class of PSR models for domains with continuous observations. We show that PLG models subsume Linear Dynamical System models (also called Kalman filter models or state-space models) while using fewer parameters. We also introduce an algorithm to estimate PLG parameters from data, and contrast it with standard Expectation Maximization (EM) algorithms used to estimate Kalman filter parameters. We show that our algorithm is a consistent estimation procedure and present preliminary empirical results suggesting that our algorithm outperforms EM, particularly as the model dimension increases.\nWe consider the problem of diagnosing faults in a system represented by a Bayesian network, where diagnosis corresponds to recovering the most likely state of unobserved nodes given the outcomes of tests (observed nodes). Finding an optimal subset of tests in this setting is intractable in general. We show that it is difficult even to compute the next most-informative test using greedy test selection, as it involves several entropy terms whose exact computation is intractable. 
We propose an approximate approach that utilizes the loopy belief propagation infrastructure to simultaneously compute approximations of marginal and conditional entropies on multiple subsets of nodes. We apply our method to fault diagnosis in computer networks, and show the algorithm to be very effective on realistic Internet-like topologies. We also provide theoretical justification for the greedy test selection approach, along with some performance guarantees.\nDifferent directed acyclic graphs (DAGs) may be Markov equivalent in the sense that they entail the same conditional independence relations among the observed variables. Chickering (1995) provided a transformational characterization of Markov equivalence for DAGs (with no latent variables), which is useful in deriving properties shared by Markov equivalent DAGs, and, with certain generalization, is needed to prove the asymptotic correctness of a search procedure over Markov equivalence classes, known as the GES algorithm. For DAG models with latent variables, maximal ancestral graphs (MAGs) provide a neat representation that facilitates model search. However, no transformational characterization -- analogous to Chickering's -- of Markov equivalent MAGs is yet available. This paper establishes such a characterization for directed MAGs, which we expect will have similar uses as it does for DAGs.\nOne of the main problems of importance sampling in Bayesian networks is representation of the importance function, which should ideally be as close as possible to the posterior joint distribution. Typically, we represent an importance function as a factorization, i.e., product of conditional probability tables (CPTs). Given diagnostic evidence, we do not have explicit forms for the CPTs in the networks. We first derive the exact form for the CPTs of the optimal importance function. Since the calculation is hard, we usually only use their approximations. 
We review several popular strategies and point out their limitations. Based on an analysis of the influence of evidence, we propose a method for approximating the exact form of importance function by explicitly modeling the most important additional dependence relations introduced by evidence. Our experimental results show that the new approximation strategy offers an immediate improvement in the quality of the importance function.\nGBP and EP are two successful algorithms for approximate probabilistic inference, which are based on different approximation strategies. An open problem in both algorithms has been how to choose an appropriate approximation structure. We introduce 'structured region graphs', a formalism which marries these two strategies, reveals a deep connection between them, and suggests how to choose good approximation structures. In this formalism, each region has an internal structure which defines an exponential family, whose sufficient statistics must be matched by the parent region. Reduction operators on these structures allow conversion between EP and GBP free energies. Thus it is revealed that all EP approximations on discrete variables are special cases of GBP, and conversely that some well-known GBP approximations, such as overlapping squares, are special cases of EP. Furthermore, region graphs derived from EP have a number of good structural properties, including maxent-normality and overall counting number of one. The result is a convenient framework for producing high-quality approximations with a user-adjustable level of complexity.\nThis article introduces the benefits of using computers as a tool for foreign language teaching and learning. It describes the effect of using Natural Language Processing (NLP) tools for learning Arabic. The technique explored in this particular case is the employment of pedagogically indexed corpora. 
This text-based method provides the teacher the advantage of building activities based on texts adapted to a particular pedagogical situation. This paper also presents ARAC: a platform dedicated to language educators allowing them to create activities within their own pedagogical area of interest.\nThe behavior composition problem involves automatically building a controller that is able to realize a desired, but unavailable, target system (e.g., a house surveillance) by suitably coordinating a set of available components (e.g., video cameras, blinds, lamps, a vacuum cleaner, phones, etc.). Previous work has almost exclusively aimed at bringing about the desired component in its totality, which is highly unsatisfactory for unsolvable problems. In this work, we develop an approach for approximate behavior composition without departing from the classical setting, thus making the problem applicable to a much wider range of cases. Based on the notion of simulation, we characterize what a maximal controller and the \"closest\" implementable target module (optimal approximation) are, and show how these can be computed using ATL model checking technology for a special case. We show the uniqueness of optimal approximations, and prove their soundness and completeness with respect to their imported controllers.\nWe consider the problem of computing optimal generalised policies for relational Markov decision processes. We describe an approach combining some of the benefits of purely inductive techniques with those of symbolic dynamic programming methods. The latter reason about the optimal value function using first-order decision theoretic regression and formula rewriting, while the former, when provided with a suitable hypotheses language, are capable of generalising value functions or policies for small instances. 
Our idea is to use reasoning and in particular classical first-order regression to automatically generate a hypotheses language dedicated to the domain at hand, which is then used as input by an inductive solver. This approach avoids the more complex reasoning of symbolic dynamic programming while focusing the inductive solver's attention on concepts that are specifically relevant to the optimal value function for the domain considered.\nIn this paper, we present a Branch and Bound algorithm called QuickBB for computing the treewidth of an undirected graph. This algorithm performs a search in the space of perfect elimination ordering of vertices of the graph. The algorithm uses novel pruning and propagation techniques which are derived from the theory of graph minors and graph isomorphism. We present a new algorithm called minor-min-width for computing a lower bound on treewidth that is used within the branch and bound algorithm and which improves over earlier available lower bounds. Empirical evaluation of QuickBB on randomly generated graphs and benchmarks in Graph Coloring and Bayesian Networks shows that it is consistently better than complete algorithms like QuickTree [Shoikhet and Geiger, 1997] in terms of cpu time. QuickBB also has good anytime performance, being able to generate a better upper bound on treewidth of some graphs whose optimal treewidth could not be computed up to now.\nThis paper proposes a decision theory for a symbolic generalization of probability theory (SP). Darwiche and Ginsberg [2,3] proposed SP to relax the requirement of using numbers for uncertainty while preserving desirable patterns of Bayesian reasoning. SP represents uncertainty by symbolic supports that are ordered partially rather than completely as in the case of standard probability. We show that a preference relation on acts that satisfies a number of intuitive postulates is represented by a utility function whose domain is a set of pairs of supports. 
We argue that a subjective interpretation is as useful and appropriate for SP as it is for numerical probability. It is useful because the subjective interpretation provides a basis for uncertainty elicitation. It is appropriate because we can provide a decision theory that explains how preference on acts is based on support comparison.\nThe representation of independence relations generally builds upon the well-known semigraphoid axioms of independence. Recently, a representation has been proposed that captures a set of dominant statements of an independence relation from which any other statement can be generated by means of the axioms; the cardinality of this set is taken to indicate the complexity of the relation. Building upon the idea of dominance, we introduce the concept of stability to provide for a more compact representation of independence. We give an associated algorithm for establishing such a representation. We show that, with our concept of stability, many independence relations are found to be of lower complexity than with existing representations.\nThis paper investigates a representation language with flexibility inspired by probabilistic logic and compactness inspired by relational Bayesian networks. The goal is to handle propositional and first-order constructs together with precise, imprecise, indeterminate and qualitative probabilistic assessments. The paper shows how this can be achieved through the theory of credal networks. New exact and approximate inference algorithms based on multilinear programming and iterated/loopy propagation of interval probabilities are presented; their superior performance, compared to existing ones, is shown empirically.\nEarly, reliable detection of disease outbreaks is a critical problem today. This paper reports an investigation of the use of causal Bayesian networks to model spatio-temporal patterns of a non-contagious disease (respiratory anthrax infection) in a population of people. 
The number of parameters in such a network can become enormous, if not carefully managed. Also, inference needs to be performed in real time as population data stream in. We describe techniques we have applied to address both the modeling and inference challenges. A key contribution of this paper is the explication of assumptions and techniques that are sufficient to allow the scaling of Bayesian network modeling and inference to millions of nodes for real-time surveillance applications. The results reported here provide a proof-of-concept that Bayesian networks can serve as the foundation of a system that effectively performs Bayesian biosurveillance of disease outbreaks.\nDefeasible argumentation frameworks have evolved to become a sound setting to formalize commonsense, qualitative reasoning from incomplete and potentially inconsistent knowledge. Defeasible Logic Programming (DeLP) is a defeasible argumentation formalism based on an extension of logic programming. Although DeLP has been successfully integrated in a number of different real-world applications, DeLP cannot deal with explicit uncertainty, nor with vague knowledge, as defeasibility is directly encoded in the object language. This paper introduces P-DeLP, a new logic programming language that extends original DeLP capabilities for qualitative reasoning by incorporating the treatment of possibilistic uncertainty and fuzzy knowledge. Such features will be formalized on the basis of PGL, a possibilistic logic based on Godel fuzzy logic.\nPrevious work on sensitivity analysis in Bayesian networks has focused on single parameters, where the goal is to understand the sensitivity of queries to single parameter changes, and to identify single parameter changes that would enforce a certain query constraint. In this paper, we expand the work to multiple parameters which may be in the CPT of a single variable, or the CPTs of multiple variables. 
Not only do we identify the solution space of multiple parameter changes that would be needed to enforce a query constraint, but we also show how to find the optimal solution, that is, the one which disturbs the current probability distribution the least (with respect to a specific measure of disturbance). We characterize the computational complexity of our new techniques and discuss their applications to developing and debugging Bayesian networks, and to the problem of reasoning about the value (reliability) of new information.\nThe complexity of a reasoning task over a graphical model is tied to the induced width of the underlying graph. It is well-known that conditioning (assigning values) on a subset of variables yields a subproblem of reduced complexity where instantiated variables are removed. If the assigned variables constitute a cycle-cutset, the rest of the network is singly-connected and therefore can be solved by linear propagation algorithms. A w-cutset is a generalization of a cycle-cutset defined as a subset of nodes such that the subgraph with cutset nodes removed has induced-width of w or less. In this paper we address the problem of finding a minimal w-cutset in a graph. We relate the problem to that of finding the minimal w-cutset of a tree decomposition. The latter can be mapped to the well-known set multi-cover problem. This relationship yields a proof of NP-completeness on one hand and a greedy algorithm for finding a w-cutset of a tree decomposition on the other. Empirical evaluation of the algorithms is presented.\nHumans currently use arguments for explaining choices which are already made, or for evaluating potential choices. Each potential choice usually has pros and cons of various strengths. In spite of the usefulness of arguments in a decision making process, there have been few formal proposals handling this idea, apart from works by Fox and Parsons and by Bonet and Geffner. 
In this paper we propose a possibilistic logic framework where arguments are built from an uncertain knowledge base and a set of prioritized goals. The proposed approach can compute two kinds of decisions by distinguishing between pessimistic and optimistic attitudes. When the available, maybe uncertain, knowledge is consistent, as well as the set of prioritized goals (which have to be fulfilled as far as possible), the method for evaluating decisions on the basis of arguments agrees with the possibility theory-based approach to decision-making under uncertainty. Taking advantage of its relation with formal approaches to defeasible argumentation, the proposed framework can be generalized to the case of partially inconsistent knowledge or goal bases.\nWe introduce a probabilistic formalism subsuming Markov random fields of bounded tree width and probabilistic context free grammars. Our models are based on a representation of Boolean formulas that we call case-factor diagrams (CFDs). CFDs are similar to binary decision diagrams (BDDs) but are concise for circuits of bounded tree width (unlike BDDs) and can concisely represent the set of parse trees over a given string under a given context free grammar (also unlike BDDs). A probabilistic model consists of a CFD defining a feasible set of Boolean assignments and a weight (or cost) for each individual Boolean variable. We give an inside-outside algorithm for simultaneously computing the marginal of each Boolean variable, and a Viterbi algorithm for finding the minimum-cost variable assignment. Both algorithms run in time proportional to the size of the CFD.\nBased on a recent development in the area of error control coding, we introduce the notion of convolutional factor graphs (CFGs) as a new class of probabilistic graphical models. In this context, the conventional factor graphs are referred to as multiplicative factor graphs (MFGs). 
This paper shows that CFGs are natural models for probability functions when summation of independent latent random variables is involved. In particular, CFGs capture a large class of linear models, where the linearity is in the sense that the observed variables are obtained as a linear transformation of the latent variables taking arbitrary distributions. We use Gaussian models and independent factor models as examples to demonstrate the use of CFGs. The requirement of a linear transformation between latent variables (with certain independence restriction) and the observed variables, to an extent, limits the modelling flexibility of CFGs. This structural restriction, however, provides a powerful analytic tool to the framework of CFGs; that is, upon taking the Fourier transform of the function represented by the CFG, the resulting function is represented by an FG with identical structure. This Fourier transform duality allows inference problems on a CFG to be solved on the corresponding dual MFG.\nAs real-world Bayesian networks continue to grow larger and more complex, it is important to investigate the possibilities for improving the performance of existing algorithms of probabilistic inference. Motivated by examples, we investigate the dependency of the performance of Lazy propagation on the message computation algorithm. We show how Symbolic Probabilistic Inference (SPI) and Arc-Reversal (AR) can be used for computation of clique-to-clique messages in addition to the traditional use of Variable Elimination (VE). In addition, the paper presents the results of an empirical evaluation of the performance of Lazy propagation using VE, SPI, and AR as the message computation algorithm. The results of the empirical evaluation show that for most networks, the performance of inference did not depend on the choice of message computation algorithm, but for some randomly generated networks the choice had an impact on both space and time performance. 
In the cases where the choice had an impact, AR produced the best results.\nSuppose that the only available information in a multi-class problem consists of expert estimates of the conditional probabilities of occurrence for a set of binary features. The aim is to select a subset of features to be measured in subsequent data collection experiments. In the absence of any information about the dependencies between the features, we assume that all features are conditionally independent and hence choose the Naive Bayes classifier as the optimal classifier for the problem. Even in this (seemingly trivial) case of complete knowledge of the distributions, choosing an optimal feature subset is not straightforward. We discuss the properties and implementation details of Sequential Forward Selection (SFS) as a feature selection procedure for the current problem. A sensitivity analysis was carried out to investigate whether the same features are selected when the probabilities vary around the estimated values. The procedure is illustrated with a set of probability estimates for Scrapie in sheep.\nAlthough many real-world stochastic planning problems are more naturally formulated by hybrid models with both discrete and continuous variables, current state-of-the-art methods cannot adequately address these problems. We present the first framework that can exploit problem structure for modeling and solving hybrid problems efficiently. We formulate these problems as hybrid Markov decision processes (MDPs with continuous and discrete state and action variables), which we assume can be represented in a factored way using a hybrid dynamic Bayesian network (hybrid DBN). This formulation also allows us to apply our methods to collaborative multiagent settings. We present a new linear program approximation method that exploits the structure of the hybrid MDP and lets us compute approximate value functions more efficiently. 
In particular, we describe a new factored discretization of continuous variables that avoids the exponential blow-up of traditional approaches. We provide theoretical bounds on the quality of such an approximation and on its scale-up potential. We support our theoretical arguments with experiments on a set of control problems with up to 28-dimensional continuous state space and 22-dimensional action space.\nMaximum a Posteriori assignment (MAP) is the problem of finding the most probable instantiation of a set of variables given the partial evidence on the other variables in a Bayesian network. MAP has been shown to be an NP-hard problem [22], even for constrained networks, such as polytrees [18]. Hence, previous approaches often fail to yield any results for MAP problems in large, complex Bayesian networks. To address this problem, we propose the AnnealedMAP algorithm, a simulated annealing-based MAP algorithm. The AnnealedMAP algorithm simulates a non-homogeneous Markov chain whose invariant function is a probability density that concentrates itself on the modes of the target density. We tested this algorithm on several real Bayesian networks. The results show that, while maintaining good quality of the MAP solutions, the AnnealedMAP algorithm is also able to solve many problems that are beyond the reach of previous approaches.\nIn this paper, we propose a new lower approximation scheme for POMDPs with discounted and average cost criteria. The approximating functions are determined by their values at a finite number of belief points, and can be computed efficiently using value iteration algorithms for finite-state MDPs. While several lower approximation schemes have been proposed earlier for discounted problems, ours seems to be the first of its kind for average cost problems. We focus primarily on the average cost case, and we show that the corresponding approximation can be computed efficiently using multi-chain algorithms for finite-state MDPs. 
We give a preliminary analysis showing that regardless of the existence of the optimal average cost J in the POMDP, the approximation obtained is a lower bound on the liminf optimal average cost function, and can also be used to calculate an upper bound on the limsup optimal average cost function, as well as bounds on the cost of executing the stationary policy associated with the approximation. We show the convergence of the cost approximation when the optimal average cost is constant and the optimal differential cost is continuous.\nGeneralized belief propagation (GBP) has proven to be a promising technique for approximate inference tasks in AI and machine learning. However, the choice of a good set of clusters to be used in GBP has remained more of an art than a science until this day. This paper proposes a sequential approach to adding new clusters of nodes and their interactions (i.e. \"regions\") to the approximation. We first review and analyze the recently introduced region graphs and find that three kinds of operations (\"split\", \"merge\" and \"death\") leave the free energy and (under some conditions) the fixed points of GBP invariant. This leads to the notion of \"weakly irreducible\" regions as the natural candidates to be added to the approximation. Computational complexity of the GBP algorithm is controlled by restricting attention to regions with small \"region-width\". Combining the above with an efficient (i.e. local in the graph) measure to predict the improved accuracy of GBP leads to the sequential \"region pursuit\" algorithm for adding new regions bottom-up to the region graph. Experiments show that this algorithm can indeed perform close to optimally.\nFor many real-life Bayesian networks, common knowledge dictates that the output established for the main variable of interest increases with higher values for the observable variables. We define two concepts of monotonicity to capture this type of knowledge. 
We say that a network is isotone in distribution if the probability distribution computed for the output variable given specific observations is stochastically dominated by any such distribution given higher-ordered observations; a network is isotone in mode if a probability distribution given higher observations has a higher mode. We show that establishing whether a network exhibits any of these properties of monotonicity is coNP^PP-complete in general, and remains coNP-complete for polytrees. We present an approximate algorithm for deciding whether a network is monotone in distribution and illustrate its application to a real-life network in oncology.\nModeling dynamical systems, both for control purposes and to make predictions about their behavior, is ubiquitous in science and engineering. Predictive state representations (PSRs) are a recently introduced class of models for discrete-time dynamical systems. The key idea behind PSRs and the closely related OOMs (Jaeger's observable operator models) is to represent the state of the system as a set of predictions of observable outcomes of experiments one can do in the system. This makes PSRs rather different from history-based models such as nth-order Markov models and hidden-state-based models such as HMMs and POMDPs. We introduce an interesting construct, the system-dynamics matrix, and show how PSRs can be derived simply from it. We also use this construct to show formally that PSRs are more general than both nth-order Markov models and HMMs/POMDPs. Finally, we discuss the main difference between PSRs and OOMs and conclude with directions for future work.\nWe characterize probabilities in Bayesian networks in terms of algebraic expressions called quasi-probabilities. These are arrived at by casting Bayesian networks as noisy AND-OR-NOT networks, and viewing the subnetworks that lead to a node as arguments for or against a node. 
Quasi-probabilities are in a sense the \"natural\" algebra of Bayesian networks: we can easily compute the marginal quasi-probability of any node recursively, in a compact form; and we can obtain the joint quasi-probability of any set of nodes by multiplying their marginals (using an idempotent product operator). Quasi-probabilities are easily manipulated to improve the efficiency of probabilistic inference. They also turn out to be representable as square-wave pulse trains, and joint and marginal distributions can be computed by multiplication and complementation of pulse trains.\nThe sensitivities revealed by a sensitivity analysis of a probabilistic network typically depend on the entered evidence. For a real-life network therefore, the analysis is performed a number of times, with different evidence. Although efficient algorithms for sensitivity analysis exist, a complete analysis is often infeasible because of the large range of possible combinations of observations. In this paper we present a method for studying sensitivities that are invariant to the evidence entered. Our method builds upon the idea of establishing bounds between which a parameter can be varied without ever inducing a change in the most likely value of a variable of interest.\nProbabilistic inference problems arise naturally in distributed systems such as sensor networks and teams of mobile robots. Inference algorithms that use message passing are a natural fit for distributed systems, but they must be robust to the failure situations that arise in real-world settings, such as unreliable communication and node failures. Unfortunately, the popular sum-product algorithm can yield very poor estimates in these settings because the nodes' beliefs before convergence can be arbitrarily different from the correct posteriors. In this paper, we present a new message passing algorithm for probabilistic inference which provides several crucial guarantees that the standard sum-product algorithm does not. 
Not only does it converge to the correct posteriors, but it is also guaranteed to yield a principled approximation at any point before convergence. In addition, the computational complexity of the message passing updates depends only upon the model, and is independent of the network topology of the distributed system. We demonstrate the approach with detailed experimental results on a distributed sensor calibration task using data from an actual sensor network deployment.\nWe consider the problem of estimating the distribution underlying an observed sample of data. Instead of maximum likelihood, which maximizes the probability of the observed values, we propose a different estimate, the high-profile distribution, which maximizes the probability of the observed profile, that is, the number of symbols appearing any given number of times. We determine the high-profile distribution of several data samples, establish some of its general properties, and show that when the number of distinct symbols observed is small compared to the data size, the high-profile and maximum-likelihood distributions are roughly the same, but when the number of symbols is large, the distributions differ, and high-profile better explains the data.\nA diagnostic policy specifies what test to perform next, based on the results of previous tests, and when to stop and make a diagnosis. Cost-sensitive diagnostic policies perform tradeoffs between (a) the cost of tests and (b) the cost of misdiagnoses. An optimal diagnostic policy minimizes the expected total cost. We formalize this diagnosis process as a Markov Decision Process (MDP). We investigate two types of algorithms for solving this MDP: systematic search based on the AO* algorithm and greedy search (particularly the Value of Information method). We investigate the issue of learning the MDP probabilities from examples, but only as they are relevant to the search for good policies. We neither learn nor assume a Bayesian network for the diagnosis process. 
Regularizers are developed to control overfitting and speed up the search. This research is the first that integrates overfitting prevention into systematic search. The paper has two contributions: it discusses the factors that make systematic search feasible for diagnosis, and it shows experimentally, on benchmark data sets, that systematic search methods produce better diagnostic policies than greedy methods.\nMixtures of truncated exponentials (MTE) potentials are an alternative to discretization for representing continuous chance variables in influence diagrams. Also, MTE potentials can be used to approximate utility functions. This paper introduces MTE influence diagrams, which can represent decision problems without restrictions on the relationships between continuous and discrete chance variables, without limitations on the distributions of continuous chance variables, and without limitations on the nature of the utility functions. In MTE influence diagrams, all probability distributions and the joint utility function (or its multiplicative factors) are represented by MTE potentials and decision nodes are assumed to have discrete state spaces. MTE influence diagrams are solved by variable elimination using a fusion algorithm.\nIn this article we introduce the Arcade Learning Environment (ALE): both a challenge problem and a platform and methodology for evaluating the development of general, domain-independent AI technology. ALE provides an interface to hundreds of Atari 2600 game environments, each one different, interesting, and designed to be a challenge for human players. ALE presents significant research challenges for reinforcement learning, model learning, model-based planning, imitation learning, transfer learning, and intrinsic motivation. Most importantly, it provides a rigorous testbed for evaluating and comparing approaches to these problems. 
We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning. In doing so, we also propose an evaluation methodology made possible by ALE, reporting empirical results on over 55 different games. All of the software, including the benchmark agents, is publicly available.\nOptimizing the cost of evaluating a polynomial is a classic problem in computer science. For polynomials in one variable, Horner's method provides a scheme for producing a computationally efficient form. For multivariate polynomials it is possible to generalize Horner's method, but this leaves freedom in the order of the variables. Traditionally, greedy schemes like most-occurring variable first are used. This simple textbook algorithm has given remarkably efficient results. Finding better algorithms has proved difficult. In trying to improve upon the greedy scheme we have implemented Monte Carlo tree search, a recent search method from the field of artificial intelligence. This results in better Horner schemes and reduces the cost of evaluating polynomials, sometimes by factors up to two.\nThis paper presents a novel fuzzy logic based Adaptive Super-twisting Sliding Mode Controller for the control of dynamic uncertain systems. The proposed controller combines the advantages of Second order Sliding Mode Control, Fuzzy Logic Control and Adaptive Control. The reaching conditions, stability and robustness of the system with the proposed controller are guaranteed. In addition, the proposed controller is well suited for simple design and implementation. The effectiveness of the proposed controller over the first order Sliding Mode Fuzzy Logic controller is illustrated by Matlab based simulations performed on a DC-DC Buck converter. 
Based on this comparison, the proposed controller is shown to obtain the desired transient response without causing chattering and error under steady-state conditions. The proposed controller is able to give robust performance in terms of rejection of input voltage variations and load variations.\nThis paper presents a predictive control strategy, based on a neural network model of the plant, applied to a Continuous Stirred Tank Reactor (CSTR). This system is a highly nonlinear process; therefore, a nonlinear predictive method, e.g., neural network predictive control, can be a better match to govern the system dynamics. In the paper, the NN model and the way in which it can be used to predict the behavior of the CSTR process over a certain prediction horizon are described, and some comments about the optimization procedure are made. A predictive control algorithm is applied to control the concentration in a continuous stirred tank reactor (CSTR), whose parameters are optimally determined by solving a quadratic performance index using the optimization algorithm. Efficient control of the product concentration in the CSTR can be achieved only through an accurate model. Here an attempt is made to alleviate the modeling difficulties using Artificial Intelligence techniques such as neural networks. Simulation results demonstrate the feasibility and effectiveness of the NNMPC technique.\nThe evolution of human society raises increasingly difficult challenges. For some real-life problems, the computing time restriction increases their complexity. The Matrix Bandwidth Minimization Problem (MBMP) seeks a simultaneous permutation of the rows and the columns of a square matrix in order to keep its nonzero entries close to the main diagonal. The MBMP is a highly investigated NP-complete problem, as it has broad applications in industry, logistics, artificial intelligence or information recovery. 
This paper describes a new attempt to use the Ant Colony Optimization framework in tackling MBMP. The introduced model is based on the hybridization of the Ant Colony System technique with new local search mechanisms. Computational experiments confirm a good performance of the proposed algorithm for the considered set of MBMP instances.\nThe Matrix Bandwidth Minimization Problem (MBMP) seeks a simultaneous reordering of the rows and the columns of a square matrix such that the nonzero entries are collected within a band of small width close to the main diagonal. The MBMP is an NP-complete problem, with applications in many scientific domains, linear systems, artificial intelligence, and real-life situations in industry, logistics, and information recovery. Complex problems are hard to solve, which is why any attempt to improve their solutions is beneficial. Genetic algorithms and ant-based systems are Soft Computing methods used in this paper in order to solve some MBMP instances. Our approach is based on a learning agent-based model involving a local search procedure. The algorithm is compared with the classical Cuthill-McKee algorithm, and with a hybrid genetic algorithm, using several instances from the Matrix Market collection. Computational experiments confirm a good performance of the proposed algorithms for the considered set of MBMP instances. On a Soft Computing basis, we also propose a new theoretical Reinforcement Learning model for solving the MBMP problem.\nEffective search for graph automorphisms allows identifying symmetries in many discrete structures, ranging from chemical molecules to microprocessor circuits. Using this type of structure can enhance visualization as well as speed up computational optimization and verification. Competitive algorithms for the graph automorphism problem are based on efficient partition refinement augmented with group-theoretic pruning techniques. 
In this paper, we improve prior algorithms for the graph automorphism problem by introducing simultaneous refinement of multiple partitions, which enables the anticipation of future conflicts in search and leads to significant pruning, reducing overall runtimes. Empirically, we observe an exponential speedup for the family of Miyazaki graphs, which have been shown to impede leading graph-automorphism algorithms.\nThe process of sorting marble plates according to their surface texture is an important task in automated marble plate production. Nowadays some inspection systems in the marble industry that automate the classification tasks are too expensive and are compatible only with specific technological equipment in the plant. In this paper a new approach to the design of an Automated Marble Plate Classification System (AMPCS), based on different neural network input training sets, is proposed, aiming at high classification accuracy using simple processing and application of only standard devices. It is based on training a classification MLP neural network with three different input training sets: extracted texture histograms, Discrete Cosine and Wavelet Transform over the histograms. The algorithm is implemented in a PLC for real-time operation. The performance of the system is assessed with each one of the input training sets. The experimental test results regarding classification accuracy and quick operation are presented and discussed.\nQualitative modelling is a technique integrating the fields of theoretical computer science, artificial intelligence and the physical and biological sciences. The aim is to be able to model the behaviour of systems without estimating parameter values and fixing the exact quantitative dynamics. Traditional applications are the study of the dynamics of physical and biological systems at a higher level of abstraction than that obtained by estimation of numerical parameter values for a fixed quantitative model. 
Qualitative modelling has been studied and implemented to varying degrees of sophistication in Petri nets, process calculi and constraint programming. In this paper we reflect on the strengths and weaknesses of existing frameworks, we demonstrate how recent advances in constraint programming can be leveraged to produce high-quality qualitative models, and we describe the advances in theory and technology that would be needed to make constraint programming the best option for scientific investigation in the broadest sense.\nA significant theoretical advantage of search-and-score methods for learning Bayesian Networks is that they can accept informative prior beliefs for each possible network, thus complementing the data. In this paper, a method is presented for assigning priors based on beliefs on the presence or absence of certain paths in the true network. Such beliefs correspond to knowledge about the possible causal and associative relations between pairs of variables. This type of knowledge naturally arises from prior experimental and observational data, among others. In addition, a novel search operator is proposed to take advantage of such prior knowledge. Experiments show that using path beliefs improves the learning of the skeleton, as well as the edge directions in the network.\nWe propose an approach to lifted approximate inference for first-order probabilistic models, such as Markov logic networks. It is based on performing exact lifted inference in a simplified first-order model, which is found by relaxing first-order constraints, and then compensating for the relaxation. These simplified models can be incrementally improved by carefully recovering constraints that have been relaxed, also at the first-order level. This leads to a spectrum of approximations, with lifted belief propagation on one end, and exact lifted inference on the other. 
We discuss how relaxation, compensation, and recovery can be performed, all at the first-order level, and show empirically that our approach substantially improves on the approximations of both propositional solvers and lifted belief propagation.\nMuch effort has been directed at algorithms for obtaining the highest-probability configuration in a probabilistic random field model, a task known as the maximum a posteriori (MAP) inference problem. In many situations, one could benefit from having not just a single solution, but the top M most probable solutions, a task known as the M-Best MAP problem. In this paper, we propose an efficient message-passing-based algorithm for solving the M-Best MAP problem. Specifically, our algorithm solves the recently proposed Linear Programming (LP) formulation of M-Best MAP [7], while being orders of magnitude faster than a generic LP solver. Our approach relies on studying a particular partial Lagrangian relaxation of the M-Best MAP LP which exposes a natural combinatorial structure of the problem that we exploit.\nWe address the problem of estimating the effect of intervening on a set of variables X from experiments on a different set, Z, that is more accessible to manipulation. This problem, which we call z-identifiability, reduces to ordinary identifiability when Z is empty and, like the latter, can be given syntactic characterization using the do-calculus [Pearl, 1995; 2000]. We provide a graphical necessary and sufficient condition for z-identifiability for arbitrary sets X, Z, and Y (the outcomes). We further develop a complete algorithm for computing the causal effect of X on Y using information provided by experiments on Z. Finally, we use our results to prove completeness of do-calculus relative to z-identifiability, a result that does not follow from completeness relative to ordinary identifiability.\nThe MPE (Most Probable Explanation) query plays an important role in probabilistic inference. 
MPE solution algorithms for probabilistic relational models essentially adapt existing belief assessment methods, replacing summation with maximization. But the rich structure and symmetries captured by relational models together with the properties of the maximization operator offer an opportunity for additional simplification with potentially significant computational ramifications. Specifically, these models often have groups of variables that define symmetric distributions over some population of formulas. The maximizing choice for different elements of this group is the same. If we can recognize this ahead of time, we can significantly reduce the size of the model by eliminating a potentially significant portion of random variables. This paper defines the notion of uniformly assigned and partially uniformly assigned sets of variables, shows how one can recognize these sets efficiently, and how the model can be greatly simplified once we recognize them, with little computational effort. We demonstrate the effectiveness of these ideas empirically on a number of models.\nThe do-calculus was developed in 1995 to facilitate the identification of causal effects in non-parametric models. The completeness proofs of [Huang and Valtorta, 2006] and [Shpitser and Pearl, 2006] and the graphical criteria of [Tian and Shpitser, 2010] have laid this identification problem to rest. Recent explorations unveil the usefulness of the do-calculus in three additional areas: mediation analysis [Pearl, 2012], transportability [Pearl and Bareinboim, 2011] and meta-synthesis. Meta-synthesis (freshly coined) is the task of fusing empirical results from several diverse studies, conducted on heterogeneous populations and under different conditions, so as to synthesize an estimate of a causal relation in some target environment, potentially different from those under study. The talk surveys these results with emphasis on the challenges posed by meta-synthesis. 
For background material, see http://bayes.cs.ucla.edu/csl_papers.html\nWe consider a setting where an agent's uncertainty is represented by a set of probability measures, rather than a single measure. Measure-by-measure updating of such a set of measures upon acquiring new information is well known to suffer from problems; agents are not always able to learn appropriately. To deal with these problems, we propose using weighted sets of probabilities: a representation where each measure is associated with a weight, which denotes its significance. We describe a natural approach to updating in such a situation and a natural approach to determining the weights. We then show how this representation can be used in decision-making, by modifying a standard approach to decision making (minimizing expected regret) to obtain minimax weighted expected regret (MWER). We provide an axiomatization that characterizes preferences induced by MWER both in the static and dynamic case.\nThis paper presents a novel approach to the problem of semantic parsing via learning the correspondences between complex sentences and rich sets of events. Our main intuition is that correct correspondences tend to occur more frequently. Our model benefits from a discriminative notion of similarity to learn the correspondence between a sentence and an event and a ranking machinery that scores the popularity of each correspondence. Our method can discover a group of events (called macro-events) that best describes a sentence. We evaluate our method on our novel dataset of professional soccer commentaries. The empirical results show that our method significantly outperforms the state of the art.\nThis paper provides some new guidance in the construction of region graphs for Generalized Belief Propagation (GBP). We connect the problem of choosing the outer regions of a Loop-Structured Region Graph (SRG) to that of finding a fundamental cycle basis of the corresponding Markov network. 
We also define a new class of tree-robust Loop-SRG for which GBP on any induced (spanning) tree of the Markov network, obtained by setting to zero the off-tree interactions, is exact. This class of SRG is then mapped to an equivalent class of tree-robust cycle bases on the Markov network. We show that a tree-robust cycle basis can be identified by proving that for every subset of cycles, the graph obtained from the edges that participate in a single cycle only is multiply connected. Using this we identify two classes of tree-robust cycle bases: planar cycle bases and \"star\" cycle bases. In experiments we show that tree-robustness can be successfully exploited as a design principle to improve the accuracy and convergence of GBP.\nWe consider the problem of sampling from solutions defined by a set of hard constraints on a combinatorial space. We propose a new sampling technique that, while enforcing a uniform exploration of the search space, leverages the reasoning power of a systematic constraint solver in a black-box scheme. We present a series of challenging domains, such as energy barriers and highly asymmetric spaces, that reveal the difficulties introduced by hard constraints. We demonstrate that standard approaches such as Simulated Annealing and Gibbs Sampling are greatly affected, while our new technique can overcome many of these difficulties. Finally, we show that our sampling scheme naturally defines a new approximate model counting technique, which we empirically show to be very accurate on a range of benchmark problems.\nDecentralized partially observable Markov decision processes (Dec-POMDPs) are rich models for cooperative decision-making under uncertainty, but are often intractable to solve optimally (NEXP-complete). The transition and observation independent Dec-MDP is a general subclass that has been shown to have complexity in NP, but optimal algorithms for this subclass are still inefficient in practice. 
In this paper, we first provide an updated proof that an optimal policy does not depend on the histories of the agents, but only on the local observations. We then present a new algorithm based on heuristic search that is able to expand search nodes by using constraint optimization. We show experimental results comparing our approach with the state-of-the-art Dec-MDP and Dec-POMDP solvers. These results show a reduction in computation time and an increase in scalability by multiple orders of magnitude in a number of benchmarks.\nWe target the problem of accuracy and robustness in causal inference from finite data sets. Some state-of-the-art algorithms produce clear output complete with solid theoretical guarantees but are susceptible to propagating erroneous decisions, while others are very adept at handling and representing uncertainty, but need to rely on undesirable assumptions. Our aim is to combine the inherent robustness of the Bayesian approach with the theoretical strength and clarity of constraint-based methods. We use a Bayesian score to obtain probability estimates on the input statements used in a constraint-based procedure. These are subsequently processed in decreasing order of reliability, letting more reliable decisions take precedence in case of conflicts, until a single output model is obtained. Tests show that a basic implementation of the resulting Bayesian Constraint-based Causal Discovery (BCCD) algorithm already outperforms established procedures such as FCI and Conservative PC. It can also indicate which causal decisions in the output have high reliability and which do not.\nOrienteering problems (OPs) are a variant of the well-known prize-collecting traveling salesman problem, where the salesman needs to choose a subset of cities to visit within a given deadline. OPs and their extensions with stochastic travel times (SOPs) have been used to model vehicle routing problems and tourist trip design problems. 
However, they suffer from two limitations: travel times between cities are assumed to be time-independent, and the route provided is independent of the risk preference (with respect to violating the deadline) of the user. To address these issues, we make the following contributions: We introduce (1) a dynamic SOP (DSOP) model, which is an extension of SOPs with dynamic (time-dependent) travel times; (2) a risk-sensitive criterion to allow for different risk preferences; and (3) a local search algorithm to solve DSOPs with this risk-sensitive criterion. We evaluated our algorithms on a real-world dataset for a theme park navigation problem as well as synthetic datasets employed in the literature.\nStochastic Shortest Path (SSP) MDPs are a problem class widely studied in AI, especially in probabilistic planning. They describe a wide range of scenarios but make the restrictive assumption that the goal is reachable from any state, i.e., that dead-end states do not exist. Because of this, SSPs are unable to model various scenarios that may have catastrophic events (e.g., an airplane possibly crashing if it flies into a storm). Even though MDP algorithms have been used for solving problems with dead ends, a principled theory of SSP extensions that would allow dead ends, including theoretically sound algorithms for solving such MDPs, has been lacking. In this paper, we propose three new MDP classes that admit dead ends under increasingly weaker assumptions. We present Value Iteration-based as well as the more efficient heuristic search algorithms for optimally solving each class, and explore theoretical relationships between these classes. We also conduct a preliminary empirical study comparing the performance of our algorithms on different MDP classes, especially on scenarios with unavoidable dead ends.\nMuch scientific data is collected as randomized experiments intervening on some and observing other variables of interest. 
Quite often, a given phenomenon is investigated in several studies, and different sets of variables are involved in each study. In this article we consider the problem of integrating such knowledge, inferring as much as possible concerning the underlying causal structure with respect to the union of observed variables from such experimental or passive observational overlapping data sets. We do not assume acyclicity or joint causal sufficiency of the underlying data generating model, but we do restrict the causal relationships to be linear and use only second order statistics of the data. We derive conditions for full model identifiability in the most generic case, and provide novel techniques for incorporating an assumption of faithfulness to aid in inference. In each case we seek to establish what is and what is not determined by the data at hand.\nIn typical real-time strategy (RTS) games, enemy units are visible only when they are within sight range of a friendly unit. Knowledge of an opponent's disposition is limited to what can be observed through scouting. Information is costly, since units dedicated to scouting are unavailable for other purposes, and the enemy will resist scouting attempts. It is important to infer as much as possible about the opponent's current and future strategy from the available observations. We present a dynamic Bayes net model of strategies in the RTS game Starcraft that combines a generative model of how strategies relate to observable quantities with a principled framework for incorporating evidence gained via scouting. We demonstrate the model's ability to infer unobserved aspects of the game from realistic observations.\nCooperative Bayesian games (BGs) can model decision-making problems for teams of agents under imperfect information, but require space and computation time that is exponential in the number of agents. 
While agent independence has been used to mitigate these problems in perfect information settings, we propose a novel approach for BGs based on the observation that BGs additionally possess a different type of structure, which we call type independence. We propose a factor graph representation that captures both forms of independence and present a theoretical analysis showing that non-serial dynamic programming cannot effectively exploit type independence, while Max-Sum can. Experimental results demonstrate that our approach can tackle cooperative Bayesian games of unprecedented size.\nA nonparametric approach for policy learning for POMDPs is proposed. The approach represents distributions over the states, observations, and actions as embeddings in feature spaces, which are reproducing kernel Hilbert spaces. Distributions over states given the observations are obtained by applying the kernel Bayes' rule to these distribution embeddings. Policies and value functions are defined on the feature space over states, which leads to a feature space expression for the Bellman equation. Value iteration may then be used to estimate the optimal value function and associated policy. Experimental results confirm that the correct policy is learned using the feature space representation.\nLearning a Bayesian network structure from data is an NP-hard problem and thus exact algorithms are feasible only for small data sets. Therefore, network structures for larger networks are usually learned with various heuristics. Another approach to scaling up the structure learning is local learning. In local learning, the modeler has one or more target variables that are of special interest; he wants to learn the structure near the target variables and is not interested in the rest of the variables. In this paper, we present a score-based local learning algorithm called SLL. 
We conjecture that our algorithm is theoretically sound in the sense that it is optimal in the limit of large sample size. Empirical results suggest that SLL is competitive when compared to the constraint-based HITON algorithm. We also study the prospects of constructing the network structure for the whole node set based on local results by presenting two algorithms and comparing them to several heuristics.\nAgents learning to act autonomously in real-world domains must acquire a model of the dynamics of the domain in which they operate. Learning domain dynamics can be challenging, especially where an agent only has partial access to the world state, and/or noisy external sensors. Even in standard STRIPS domains, existing approaches cannot learn from noisy, incomplete observations typical of real-world domains. We propose a method which learns STRIPS action models in such domains, by decomposing the problem into first learning a transition function between states in the form of a set of classifiers, and then deriving explicit STRIPS rules from the classifiers' parameters. We evaluate our approach on simulated standard planning domains from the International Planning Competition, and show that it learns useful domain descriptions from noisy, incomplete observations.\nMarkov networks (MNs) are a powerful way to compactly represent a joint probability distribution, but most MN structure learning methods are very slow, due to the high cost of evaluating candidate structures. Dependency networks (DNs) represent a probability distribution as a set of conditional probability distributions. DNs are very fast to learn, but the conditional distributions may be inconsistent with each other and few inference algorithms support DNs. In this paper, we present a closed-form method for converting a DN into an MN, allowing us to enjoy both the efficiency of DN learning and the convenience of the MN representation. When the DN is consistent, this conversion is exact. 
For inconsistent DNs, we present averaging methods that significantly improve the approximation. In experiments on 12 standard datasets, our methods are orders of magnitude faster than and often more accurate than combining conditional distributions using weight learning.\nWe consider the linear programming relaxation of an energy minimization problem for Markov Random Fields. The dual objective of this problem can be treated as a concave and unconstrained, but non-smooth function. The idea of smoothing the objective prior to optimization was recently proposed in a series of papers. Some of them suggested decreasing the amount of smoothing (the so-called temperature) while getting closer to the optimum. However, no theoretical substantiation was provided. We propose an adaptive smoothing diminishing algorithm based on the duality gap between relaxed primal and dual objectives and demonstrate the efficiency of our approach with a smoothed version of the Sequential Tree-Reweighted Message Passing (TRW-S) algorithm. The strategy is applicable to other algorithms as well, avoids ad hoc tuning of the smoothing during iterations, and provably guarantees convergence to the optimum.\nEDML is a recently proposed algorithm for learning MAP parameters in Bayesian networks. In this paper, we present a number of new advances and insights on the EDML algorithm. First, we provide the multivalued extension of EDML, originally proposed for Bayesian networks over binary variables. Next, we identify a simplified characterization of EDML that further implies a simple fixed-point algorithm for the convex optimization problem that underlies it. This characterization further reveals a connection between EDML and EM: a fixed point of EDML is a fixed point of EM, and vice versa. We thus also identify a new characterization of EM fixed points, but in the semantics of EDML. 
Finally, we propose a hybrid EDML/EM algorithm that takes advantage of the improved empirical convergence behavior of EDML, while maintaining the monotonic improvement property of EM.\nRecently two search algorithms, A* and breadth-first branch and bound (BFBnB), were developed based on a simple admissible heuristic for learning Bayesian network structures that optimize a scoring function. The heuristic represents a relaxation of the learning problem such that each variable chooses optimal parents independently. As a result, the heuristic may contain many directed cycles and result in a loose bound. This paper introduces an improved admissible heuristic that tries to avoid directed cycles within small groups of variables. A sparse representation is also introduced to store only the unique optimal parent choices. Empirical results show that the new techniques significantly improved the efficiency and scalability of A* and BFBnB on most of the datasets tested in this paper.\nWe describe theoretical bounds and a practical algorithm for teaching a model by demonstration in a sequential decision making environment. Unlike previous efforts that have optimized learners that watch a teacher demonstrate a static policy, we focus on the teacher as a decision maker who can dynamically choose different policies to teach different parts of the environment. We develop several teaching frameworks based on previously defined supervised protocols, such as Teaching Dimension, extending them to handle noise and sequences of inputs encountered in an MDP. We provide theoretical bounds on the learnability of several important model classes in this setting and suggest a practical algorithm for dynamic teaching.\nEasy access to a vast amount of data, especially over a long period of time, makes it possible to divide a social network into timeframes and create a temporal social network. Such a network enables analysis of its dynamics. 
One aspect of the dynamics is the analysis of social community evolution, i.e., how a particular group changes over time. To do so, the complete group evolution history is needed. That is why this paper presents a new method for group evolution extraction, called GED.\nObject detection is a fundamental task in computer vision and has many applications in image processing. This paper proposes a new approach for object detection by applying the scale invariant feature transform (SIFT) in an automatic segmentation algorithm. SIFT is an algorithm invariant with respect to scale, translation and rotation. The features are very distinct and provide stable keypoints that can be used for matching an object in different images. First, an object is trained with different aspects to find the best keypoints. The object can be recognized in other images by using the obtained keypoints. Then, a robust segmentation algorithm is used to detect the object with its full boundary based on SIFT keypoints. In the segmentation algorithm, a merging rule is defined to merge the regions in the image with the assistance of keypoints. The results show that the proposed approach is reliable for object detection and can extract the object boundary well.\nRequirements engineers should strive to get a better insight into decision-making processes. During elicitation of requirements, decision making influences how stakeholders communicate with engineers, thereby affecting the engineers' understanding of requirements for the future information system. Empirical studies stemming from Artificial Intelligence offer an adequate groundwork to understand how decision making is influenced by some particular contextual factors. However, no research has gone into the validation of such empirical studies in the process of collecting the needs of the future system's users. 
As an answer, the paper empirically studies factors, initially identified by the AI literature, that influence decision making and communication during requirements elicitation. We argue that the structure of the decision context should be considered as a cornerstone to adequately study how stakeholders decide whether or not to communicate a requirement. The paper proposes a context framework to categorize the former factors into specific families, and to support the engineers during the elicitation process.\nA wide range of engineering design problems have been solved by algorithms that simulate collective intelligence in swarms of birds or insects. The Artificial Bee Colony or ABC is one of the recent additions to the class of swarm intelligence based algorithms that mimics the foraging behavior of honey bees. ABC consists of three groups of bees, namely employed, onlooker and scout bees. In ABC, the food locations represent potential candidate solutions. In the present study an attempt is made to generate the population of food sources (Colony Size) adaptively, and the variant is named A-ABC. A-ABC is further enhanced to improve convergence speed and exploitation capability, by employing the concept of elitism, which guides the bees towards the best food source. This enhanced variant is called E-ABC. The proposed algorithms are validated on a set of standard benchmark problems with varying dimensions taken from the literature and on five engineering design problems. The numerical results are compared with the basic ABC and three recent variants of ABC. Numerically and statistically simulated results illustrate that the proposed method is very efficient and competitive.\nThe improvement of medical care quality is a significant interest for the coming years. The fight against nosocomial infections (NI) in intensive care units (ICU) is a good example. 
We will focus on a set of observations which reflect the dynamic aspect of the decision, resulting from the application of a Medical Decision Support System (MDSS). This system has to make dynamic decisions on temporal data. We use a dynamic Bayesian network (DBN) to model this dynamic process. It involves temporal reasoning within a real-time environment; we are interested in Dynamic Decision Support Systems in the healthcare domain (MDDSS).\nTraffic regulation must be respected by all vehicles, either human- or computer-driven. However, extreme traffic situations might exhibit practical cases in which a vehicle should safely and reasonably relax traffic regulation, e.g., in order not to be indefinitely blocked and to keep circulating. In this paper, we propose a high-level representation of an automated vehicle, other vehicles and their environment, which can assist drivers in taking such \"illegal\" but practical relaxation decisions. This high-level representation (an ontology) includes topological knowledge and inference rules, in order to compute the next high-level motion an automated vehicle should take, as assistance to a driver. Results on practical cases are presented.\nWe look at the problem of revising fuzzy belief bases, i.e., belief base revision in which both formulas in the base as well as revision-input formulas can come attached with varying truth-degrees. Working within a very general framework for fuzzy logic which is able to capture a variety of types of inference under uncertainty, such as truth-functional fuzzy logics and certain types of probabilistic inference, we show how the idea of rational change from 'crisp' base revision, as embodied by the idea of partial meet revision, can be faithfully extended to revising fuzzy belief bases. 
We present and axiomatise an operation of partial meet fuzzy revision and illustrate how the operation works in several important special instances of the framework.\nA qualitative probabilistic network models the probabilistic relationships between its variables by means of signs. Non-monotonic influences have an associated ambiguous sign. These ambiguous signs typically lead to uninformative results upon inference. A non-monotonic influence can, however, be associated with a more informative sign that indicates its effect in the current state of the network. To capture this effect, we introduce the concept of a situational sign. Furthermore, if the network converts to a state in which all variables that provoke the non-monotonicity have been observed, a non-monotonic influence reduces to a monotonic influence. We study the persistence and propagation of situational signs upon inference and give a method to establish the sign of a reduced influence.\nThe paper studies empirically the time-space trade-off between sampling and inference in a w-cutset sampling algorithm. The algorithm samples over a subset of nodes in a Bayesian network and applies exact inference over the rest. Consequently, while the size of the sampling space decreases, requiring fewer samples for convergence, the time for generating each single sample increases. The w-cutset sampling selects a sampling set such that the induced-width of the network when the sampling set is observed is bounded by w, thus requiring inference whose complexity is exponential in w. In this paper, we investigate the performance of w-cutset sampling over a range of w values and measure the accuracy of w-cutset sampling as a function of w. 
Our experiments demonstrate that the cutset sampling idea is quite powerful, showing that an optimal balance between inference and sampling benefits substantially from restricting the cutset size, even at the cost of more complex inference.\nBacktracking search is a powerful algorithmic paradigm that can be used to solve many problems. It is in a certain sense the dual of variable elimination; but on many problems, e.g., SAT, it is vastly superior to variable elimination in practice. Motivated by this, we investigate the application of backtracking search to the problem of Bayesian inference (Bayes). We show that natural generalizations of known techniques allow backtracking search to achieve performance guarantees similar to standard algorithms for Bayes, and that there exist problems on which backtracking can in fact do much better. We also demonstrate that these ideas can be applied to implement a Bayesian inference engine whose performance is competitive with standard algorithms. Since backtracking search can very naturally take advantage of context-specific structure, the potential exists for performance superior to standard algorithms on many problems.\nRecursive Conditioning (RC) was introduced recently as the first any-space algorithm for inference in Bayesian networks which can trade time for space by varying the size of its cache at the increment needed to store a floating point number. Under full caching, RC has an asymptotic time and space complexity which is comparable to mainstream algorithms based on variable elimination and clustering (exponential in the network treewidth and linear in its size). We show two main results about RC in this paper. First, we show that its actual space requirements under full caching are much more modest than those needed by mainstream methods and study the implications of this finding. 
Second, we show that RC can effectively deal with determinism in Bayesian networks by employing standard logical techniques, such as unit resolution, allowing a significant reduction in its time requirements in certain cases. We illustrate our results using a number of benchmark networks, including the very challenging ones that arise in genetic linkage analysis.\nSymbolic representations have been used successfully in off-line planning algorithms for Markov decision processes. We show that they can also improve the performance of on-line planners. In addition to reducing computation time, symbolic generalization can reduce the amount of costly real-world interactions required for convergence. We introduce Symbolic Real-Time Dynamic Programming (or sRTDP), an extension of RTDP. After each step of on-line interaction with an environment, sRTDP uses symbolic model-checking techniques to generalize its experience by updating a group of states rather than a single state. We examine two heuristic approaches to dynamic grouping of states and show that they accelerate the planning process significantly in terms of both CPU time and the number of steps of interaction with the environment.\nIn non-ergodic belief networks the posterior belief of many queries given evidence may become zero. The paper shows that when belief propagation is applied iteratively over arbitrary networks (the so-called iterative or loopy belief propagation (IBP)) it is identical to an arc-consistency algorithm relative to zero-belief queries (namely, assessing zero posterior probabilities). 
This implies that zero-belief conclusions derived by belief propagation converge and are sound. More importantly, it suggests that the inference power of IBP is as strong and as weak as that of arc-consistency. This allows the synthesis of belief networks for which belief propagation is useless on the one hand, and focuses the investigation on classes of belief networks for which belief propagation may be zero-complete. Finally, all the above conclusions also apply to generalized belief propagation algorithms that extend loopy belief propagation and allow a crisper understanding of their power.\nConstraint-based (CB) learning is a formalism for learning a causal network with a database D by performing a series of conditional-independence tests to infer structural information. This paper considers a new test of independence that combines ideas from Bayesian learning, Bayesian network inference, and classical hypothesis testing to produce a more reliable and robust test. The new test can be calculated in the same asymptotic time and space required for the standard tests such as the chi-squared test, but it allows the specification of a prior distribution over parameters and can be used when the database is incomplete. We prove that the test is correct, and we demonstrate empirically that, when used with a CB causal discovery algorithm with noninformative priors, it recovers structural features more reliably and it produces networks with smaller KL-Divergence, especially as the number of nodes increases or the number of records decreases. Another benefit is the dramatic reduction in the probability that a CB algorithm will stall during the search, providing a remedy for an annoying problem plaguing CB learning when the database is small.\nIn this paper, we provide new complexity results for algorithms that learn discrete-variable Bayesian networks from data. 
Our results apply whenever the learning algorithm uses a scoring criterion that favors the simplest model able to represent the generative distribution exactly. Our results therefore hold whenever the learning algorithm uses a consistent scoring criterion and is applied to a sufficiently large dataset. We show that identifying high-scoring structures is hard, even when we are given an independence oracle, an inference oracle, and/or an information oracle. Our negative results also apply to the learning of discrete-variable Bayesian networks in which each node has at most k parents, for all k > 3.\nPearl's concept of a d-connecting path is one of the foundations of the modern theory of graphical models: the absence of a d-connecting path in a DAG indicates that conditional independence will hold in any distribution factorising according to that graph. In this paper we show that in singly-connected Gaussian DAGs it is possible to use the form of a d-connection to obtain qualitative information about the strength of conditional dependence. More precisely, the squared partial correlations between two given variables, conditioned on different subsets, may be partially ordered by examining the relationship between the d-connecting path and the set of variables conditioned upon.\nBayesian network classifiers are used in many fields, and one common class of classifiers are naive Bayes classifiers. In this paper, we introduce an approach for reasoning about Bayesian network classifiers in which we explicitly convert them into Ordered Decision Diagrams (ODDs), which are then used to reason about the properties of these classifiers. Specifically, we present an algorithm for converting any naive Bayes classifier into an ODD, and we show theoretically and experimentally that this algorithm can give us an ODD that is tractable in size even given an intractable number of instances. 
Since ODDs are tractable representations of classifiers, our algorithm allows us to efficiently test the equivalence of two naive Bayes classifiers and characterize discrepancies between them. We also show a number of additional results including a count of distinct classifiers that can be induced by changing some CPT in a naive Bayes classifier, and the range of allowable changes to a CPT which keeps the current classifier unchanged.\nIn 1950, Forsythe and Leibler (1950) introduced a statistical technique for finding the inverse of a matrix by characterizing the elements of the matrix inverse as expected values of a sequence of random walks. Barto and Duff (1994) subsequently showed relations between this technique and standard dynamic programming and temporal differencing methods. The advantage of the Monte Carlo matrix inversion (MCMI) approach is that it scales better with respect to state-space size than alternative techniques. In this paper, we introduce an algorithm for performing reinforcement learning policy evaluation using MCMI. We demonstrate that MCMI improves on runtime over a maximum likelihood model-based policy evaluation approach and on both runtime and accuracy over the temporal differencing (TD) policy evaluation approach. We further improve on MCMI policy evaluation by adding an importance sampling technique to our algorithm to reduce the variance of our estimator. Lastly, we illustrate techniques for scaling up MCMI to large state spaces in order to perform policy improvement.\nIn this paper, we introduce a method for approximating the solution to inference and optimization tasks in uncertain and deterministic reasoning. Such tasks are in general intractable for exact algorithms because of the large number of dependency relationships in their structure. Our method effectively maps such a dense problem to a sparser one which is in some sense \"closest\". 
Exact methods can be run on the sparser problem to derive bounds on the original answer, which can be quite sharp. We present empirical results demonstrating that our method works well on the tasks of belief inference and finding the probability of the most probable explanation in belief networks, and finding the cost of the solution that violates the smallest number of constraints in constraint satisfaction problems. On one large CPCS network, for example, we were able to calculate upper and lower bounds on the conditional probability of a variable, given evidence, that were almost identical in the average case.\nWe analyze a new property of directed acyclic graphs (DAGs), called layerwidth, arising from a class of DAGs proposed by Eiter and Lukasiewicz. This class of DAGs permits certain problems of structural model-based causality and explanation to be tractably solved. In this paper, we first address an open question raised by Eiter and Lukasiewicz: the computational complexity of deciding whether a given graph has a bounded layerwidth. After proving that this problem is NP-complete, we proceed by proving numerous important properties of layerwidth that are helpful in efficiently computing the optimal layerwidth. Finally, we compare this new DAG property to two other important DAG properties: treewidth and bandwidth.\nLoopy and generalized belief propagation are popular algorithms for approximate inference in Markov random fields and Bayesian networks. Fixed points of these algorithms correspond to extrema of the Bethe and Kikuchi free energy. However, belief propagation does not always converge, which explains the need for approaches that explicitly minimize the Kikuchi/Bethe free energy, such as CCCP and UPS. Here we describe a class of algorithms that solves this typically nonconvex constrained minimization of the Kikuchi free energy through a sequence of convex constrained minimizations of upper bounds on the Kikuchi free energy. 
Intuitively one would expect tighter bounds to lead to faster algorithms, which is indeed convincingly demonstrated in our simulations. Several ideas are applied to obtain tight convex bounds that yield dramatic speed-ups over CCCP.\nReal-world distributed systems and networks are often unreliable and subject to random failures of their components. Such stochastic behavior adversely affects the complexity of optimization tasks performed routinely upon such systems, in particular, various resource allocation tasks. In this work we investigate and develop Monte Carlo solutions for a class of two-stage optimization problems in stochastic networks in which the expected value of resource allocations before and after stochastic failures needs to be optimized. The limitation of these problems is that their exact solutions are exponential in the number of unreliable network components: thus, exact methods do not scale up well to large networks often seen in practice. We first prove that Monte Carlo optimization methods can overcome the exponential bottleneck of exact methods. Next, we support our theoretical findings with resource allocation experiments and show a very good scale-up potential of the new methods to large stochastic networks.\nThis paper examines a number of solution methods for decision processes with non-Markovian rewards (NMRDPs). They all exploit a temporal logic specification of the reward function to automatically translate the NMRDP into an equivalent Markov decision process (MDP) amenable to well-known MDP solution methods. They differ, however, in the representation of the target MDP and the class of MDP solution methods to which they are suited. As a result, they adopt different temporal logics and different translations. Unfortunately, no implementation of these methods has been reported, nor have any experimental, let alone comparative, results. This paper is the first step towards filling this gap. 
We describe an integrated system for solving NMRDPs which implements these methods and several variants under a common interface; we use it to compare the various approaches and identify the problem features favoring one approach over another.\nThis paper studies decision making for Walley's partially consonant belief functions (pcb). In a pcb, the set of foci is partitioned. Within each partition, the foci are nested. The pcb class includes probability functions and possibility functions as extreme cases. Unlike earlier proposals for a decision theory with belief functions, we employ an axiomatic approach. We adopt an axiom system similar in spirit to von Neumann-Morgenstern's linear utility theory for a preference relation on pcb lotteries. We prove a representation theorem for this relation. Utility for a pcb lottery is a combination of linear utility for a probabilistic lottery and binary utility for a possibilistic lottery.\nThere has been great interest in identifying tractable subclasses of NP-complete problems and designing efficient algorithms for these tractable classes. Constraint satisfaction and Bayesian network inference are two examples of such problems that are of great importance in AI and algorithms. In this paper we study, under the frameworks of random constraint satisfaction problems and random Bayesian networks, a typical tractable subclass characterized by the treewidth of the problems. We show that the property of having a bounded treewidth for CSPs and Bayesian network inference problems has a phase transition that occurs while the underlying structures of problems are still sparse. This implies that algorithms making use of treewidth-based structural knowledge only work efficiently in a limited range of random instances.\nThis paper presents a scalable Bayesian technique for decentralized state estimation from multiple platforms in dynamic environments. 
As has long been recognized, centralized architectures impose severe scaling limitations for distributed systems due to the enormous communication overheads. We propose a strictly decentralized approach in which only nearby platforms exchange information. They do so through an interactive communication protocol aimed at maximizing information flow. Our approach is evaluated in the context of a distributed surveillance scenario that arises in a robotic system for playing the game of laser tag. Our results, both from simulation and using physical robots, illustrate an unprecedented capability to scale to large teams of vehicles.\nMAP is the problem of finding a most probable instantiation of a set of variables in a Bayesian network given some evidence. Unlike computing posterior probabilities, or MPE (a special case of MAP), the time and space complexity of structural solutions for MAP is exponential not only in the network treewidth but also in a larger parameter known as the \"constrained\" treewidth. In practice, this means that computing MAP can be orders of magnitude more expensive than computing posterior probabilities or MPE. This paper introduces a new, simple upper bound on the probability of a MAP solution, which admits a tradeoff between the bound quality and the time needed to compute it. The bound is shown to be generally much tighter than those of other methods of comparable complexity. We use this proposed upper bound to develop a branch-and-bound search algorithm for solving MAP exactly. Experimental results demonstrate that the search algorithm is able to solve many problems that are far beyond the reach of any structure-based method for MAP. For example, we show that the proposed algorithm can compute MAP exactly and efficiently for some networks whose constrained treewidth is more than 40.\nGroup elevator scheduling is an NP-hard sequential decision-making problem with unbounded state spaces and substantial uncertainty. 
Decision-theoretic reasoning plays a surprisingly limited role in fielded systems. A new opportunity for probabilistic methods has opened with the recent discovery of a tractable solution for the expected waiting times of all passengers in the building, marginalized over all possible passenger itineraries. Though commercially competitive, this solution does not account for future passengers. Yet in up-peak traffic, the effects of future passengers arriving at the lobby and entering elevator cars can dominate all waiting times. We develop a probabilistic model of how these arrivals affect the behavior of elevator cars at the lobby, and demonstrate how this model can be used to very significantly reduce the average waiting time of all passengers.\nThis paper proposes and evaluates the k-greedy equivalence search algorithm (KES) for learning Bayesian networks (BNs) from complete data. The main characteristic of KES is that it allows a trade-off between greediness and randomness, thus exploring different good local optima. When greediness is set at maximum, KES corresponds to the greedy equivalence search algorithm (GES). When greediness is kept at minimum, we prove that under mild assumptions KES asymptotically returns any inclusion optimal BN with nonzero probability. Experimental results for both synthetic and real data are reported showing that KES often finds better local optima than GES. Moreover, we use KES to experimentally confirm that the number of different local optima is often huge.\nFor a given problem, the optimal Markov policy can be considered as a conditional or contingent plan containing a (potentially large) number of branches. Unfortunately, there are applications where it is desirable to strictly limit the number of decision points and branches in a plan. For example, it may be that plans must later undergo more detailed simulation to verify correctness and safety, or that they must be simple enough to be understood and analyzed by humans. 
As a result, it may be necessary to limit consideration to plans with only a small number of branches. This raises the question of how one goes about finding optimal plans containing only a limited number of branches. In this paper, we present an anytime algorithm for optimal k-contingency planning (OKP). It is the first optimal algorithm for limited contingency planning that is not an explicit enumeration of possible contingent plans. By modelling the problem as a Partially Observable Markov Decision Process, it implements the Bellman optimality principle and prunes the solution space. We present experimental results of applying this algorithm to some simple test cases.\nThe property of perfectness plays an important role in the theory of Bayesian networks. First, the existence of perfect distributions for arbitrary sets of variables and directed acyclic graphs implies that various methods for reading independence from the structure of the graph (e.g., Pearl, 1988; Lauritzen, Dawid, Larsen & Leimer, 1990) are complete. Second, the asymptotic reliability of various search methods is guaranteed under the assumption that the generating distribution is perfect (e.g., Spirtes, Glymour & Scheines, 2000; Chickering & Meek, 2002). We provide a lower bound on the probability of sampling a non-perfect distribution when using a fixed number of bits to represent the parameters of the Bayesian network. This bound approaches zero exponentially fast as one increases the number of bits used to represent the parameters. This result implies that perfect distributions with fixed-length representations exist. We also provide a lower bound on the number of bits needed to guarantee that a distribution sampled from a uniform Dirichlet distribution is perfect with probability greater than 1/2. 
This result is useful for constructing randomized reductions for hardness proofs.\nThe paper continues the study of partitioning-based inference of heuristics for search in the context of solving the Most Probable Explanation task in Bayesian Networks. We compare two systematic Branch and Bound search algorithms, BBBT (for which the heuristic information is constructed during search and allows dynamic variable/value ordering) and its predecessor BBMB (for which the heuristic information is pre-compiled), against a number of popular local search algorithms for the MPE problem. We show empirically that, when viewed as approximation schemes, BBBT/BBMB are superior to all of these best-known SLS algorithms, especially when the domain sizes increase beyond 2. This is in contrast with the performance of SLS vs. systematic search on CSP/SAT problems, where SLS often significantly outperforms systematic algorithms. As far as we know, BBBT/BBMB are currently the best-performing algorithms for solving the MPE task.\nPrecision achieved by stochastic sampling algorithms for Bayesian networks typically deteriorates in the face of extremely unlikely evidence. To address this problem, we propose the Evidence Pre-propagation Importance Sampling algorithm (EPIS-BN), an importance sampling algorithm that computes an approximate importance function using two heuristic methods: loopy belief propagation and e-cutoff. We tested the performance of e-cutoff on three large real Bayesian networks: ANDES, CPCS, and PATHFINDER. We observed that on each of these networks the EPIS-BN algorithm gives us a considerable improvement over the current state-of-the-art algorithm, the AIS-BN algorithm. 
In addition, it avoids the costly learning stage of the AIS-BN algorithm.\nPublished experiments on spidering the Web suggest that, given training data in the form of a (relatively small) subgraph of the Web containing a subset of a selected class of target pages, it is possible to conduct a directed search and find additional target pages significantly faster (with fewer page retrievals) than by performing a blind or uninformed random or systematic search, e.g., breadth-first search. If true, this claim motivates a number of practical applications. Unfortunately, these experiments were carried out in specialized domains or under conditions that are difficult to replicate. We present and apply an experimental framework designed to reexamine and resolve the basic claims of the earlier work, so that the supporting experiments can be replicated and built upon. We provide high-performance tools for building experimental spiders, make use of the ground truth and static nature of the WT10g TREC Web corpus, and rely on simple, well-understood machine learning techniques to conduct our experiments. In this paper, we describe the basic framework, motivate the experimental design, and report on our findings supporting and qualifying the conclusions of the earlier research.\nWe present an application of hierarchical Bayesian estimation to robot map building. The revisiting problem occurs when a robot has to decide whether it is seeing a previously-built portion of a map, or is exploring new territory. This is a difficult decision problem, requiring the probability of being outside of the currently known map. To estimate this probability, we model the structure of a \"typical\" environment as a hidden Markov model that generates sequences of views observed by a robot navigating through the environment. A Dirichlet prior over structural models is learned from previously explored environments. 
Whenever a robot explores a new environment, the posterior over the model is estimated by Dirichlet hyperparameters. Our approach is implemented and tested in the context of multi-robot map merging, a particularly difficult instance of the revisiting problem. Experiments with robot data show that the technique yields strong improvements over alternative methods.\nIn this paper we examine the problem of inference in Bayesian Networks with discrete random variables that have very large or even unbounded domains. For example, in a domain where we are trying to identify a person, we may have variables whose domains are the set of all names, the set of all postal codes, or the set of all credit card numbers. We cannot simply use big tables of conditional probabilities; we need compact representations. We provide an inference algorithm, based on variable elimination, for belief networks containing both large-domain and normal discrete random variables. We use intensional (i.e., in terms of procedures) and extensional (in terms of listing the elements) representations of conditional probabilities and of the intermediate factors.\nAncestral graphs are a class of graphs that encode conditional independence relations arising in DAG models with latent and selection variables, corresponding to marginalization and conditioning. However, for any ancestral graph, there may be several other graphs to which it is Markov equivalent. We introduce a simple representation of a Markov equivalence class of ancestral graphs, thereby facilitating model search. More specifically, we define a join operation on ancestral graphs which will associate a unique graph with a Markov equivalence class. 
We also extend the separation criterion for ancestral graphs (which is an extension of d-separation) and provide a proof of the pairwise Markov property for joined ancestral graphs.\nThe problem of learning Markov equivalence classes of Bayesian network structures may be solved by searching for the maximum of a scoring metric in a space of these classes. This paper deals with the definition and analysis of one such search space. We use a theoretically motivated neighbourhood, the inclusion boundary, and represent equivalence classes by essential graphs. We show that this search space is connected and that the score of the neighbours can be evaluated incrementally. We devise a practical way of building this neighbourhood for an essential graph that is purely graphical and does not explicitly refer to the underlying independences. We find that its size can be intractable, depending on the complexity of the essential graph of the equivalence class. The emphasis is put on the potential use of this space with greedy hill-climbing search.\nThe ability to make decisions and to assess potential courses of action is a cornerstone of many AI applications, and usually this requires explicit information about the decision-maker's preferences. In many applications, preference elicitation is a serious bottleneck. The user either does not have the time, the knowledge, or the expert support required to specify complex multi-attribute utility functions. In such cases, a method that is based on intuitive, yet expressive, preference statements is required. 
In this paper we suggest the use of TCP-nets, an enhancement of CP-nets, as a tool for representing and reasoning about qualitative preference statements. We present and motivate this framework, define its semantics, and show how it can be used to perform constrained optimization.\nThe paper presents an iterative version of join-tree clustering that applies the message passing of the join-tree clustering algorithm to join-graphs rather than to join-trees. It is inspired by the success of Pearl's belief propagation algorithm as an iterative approximation scheme on the one hand, and by the success of the recently introduced mini-clustering MC(i) as an anytime approximation method on the other. The proposed Iterative Join-graph Propagation (IJGP) belongs to the class of generalized belief propagation methods, recently proposed by analogy with algorithms in statistical physics. Empirical evaluation of this approach on a number of problem classes demonstrates that even the most time-efficient variant is almost always superior to IBP and MC(i), and is sometimes more accurate by as much as several orders of magnitude.\nMost reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Deictic representations are believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet, there are few experiments on learning with deictic representations reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a naive propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen learning performance. 
We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects.\nWe present a principled and efficient planning algorithm for collaborative multiagent dynamical systems. All computation, during both the planning and the execution phases, is distributed among the agents; each agent only needs to model and plan for a small part of the system. Each of these local subsystems is small, but once they are combined they can represent an exponentially larger problem. The subsystems are connected through a subsystem hierarchy. Coordination and communication between the agents are not imposed but are derived directly from the structure of this hierarchy. A globally consistent plan is achieved by a message passing algorithm, where messages correspond to natural local reward functions and are computed by local linear programs; another message passing algorithm allows us to execute the resulting policy. When two portions of the hierarchy share the same structure, our algorithm can reuse plans and messages to speed up computation.\nWe extend the language of influence diagrams to cope with decision scenarios where the order of decisions and observations is not determined. As the ordering of decisions is dependent on the evidence, a step-strategy of such a scenario is a sequence of dependent choices of the next action. A strategy is a step-strategy together with selection functions for decision actions. The structure of a step-strategy can be represented as a DAG with nodes labeled with action variables. We introduce the concept of GS-DAG: a DAG incorporating an optimal step-strategy for any instantiation. We give a method for constructing GS-DAGs, and we show how to use a GS-DAG for determining an optimal strategy. 
Finally, we discuss how analysis of the relevant past can be used to reduce the size of the GS-DAG.\nWe describe CFW, a computationally efficient algorithm for collaborative filtering that uses posteriors over weights of evidence. In experiments on real data, we show that this method predicts as well as or better than other methods in situations where the size of the user query is small. The new approach works particularly well when the user's query contains low-frequency (unpopular) items. The approach complements that of dependency networks, which perform well when the size of the query is large. Also in this paper, we argue that the use of posteriors over weights of evidence is a natural way to recommend similar items in the collaborative-filtering task.\nWe introduce a new Bayesian network (BN) scoring metric called the Global Uniform (GU) metric. This metric is based on a particular type of default parameter prior. Such priors may be useful when a BN developer is not willing or able to specify domain-specific parameter priors. The GU parameter prior specifies that every prior joint probability distribution P consistent with a BN structure S is considered to be equally likely. Distribution P is consistent with S if P includes just the set of independence relations defined by S. We show that the GU metric addresses some undesirable behavior of the BDeu and K2 Bayesian network scoring metrics, which also use particular forms of default parameter priors. A closed-form formula for computing GU for special classes of BNs is derived. Efficiently computing GU for an arbitrary BN remains an open problem.\nThis paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. 
We present a generalization of the optimal stopping problem to a two-player simultaneous-move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for LSTD and temporal difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the Least-Squares Policy Iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem.\nValue iteration is a commonly used and empirically competitive method in solving many Markov decision process problems. However, it is known that value iteration has only pseudo-polynomial complexity in general. We establish a somewhat surprising polynomial bound for value iteration on deterministic Markov decision (DMDP) problems. We show that the basic value iteration procedure converges to the highest average reward cycle on a DMDP problem in \\Theta(n^2) iterations, or \\Theta(mn^2) total time, where n denotes the number of states, and m the number of edges. We give two extensions of value iteration that solve the DMDP in \\Theta(mn) time. We explore the analysis of policy iteration algorithms and report on an empirical study of value iteration showing that its convergence is much faster on random sparse graphs.\nThis paper is about searching the combinatorial space of contingency tables during the inner loop of a nonlinear statistical optimization. Examples of this operation in various data analytic communities include searching for nonlinear combinations of attributes that contribute significantly to a regression (Statistics), searching for items to include in a decision list (machine learning) and association rule hunting (Data Mining). This paper investigates a new, efficient approach to this class of problems, called RADSEARCH (Real-valued All-Dimensions-tree Search). 
RADSEARCH finds the global optimum, and this gives us the opportunity to empirically evaluate the question: apart from algorithmic elegance, what does this attention to optimality buy us? We compare RADSEARCH with other recent successful search algorithms such as CN2, PRIM, APriori, OPUS and DenseMiner. Finally, we introduce RADREG, a new regression algorithm for learning real-valued outputs based on RADSEARCHing for high-order interactions.\nIn this paper we present a language for finite-state continuous-time Bayesian networks (CTBNs), which describe structured stochastic processes that evolve over continuous time. The state of the system is decomposed into a set of local variables whose values change over time. The dynamics of the system are described by specifying the behavior of each local variable as a function of its parents in a directed (possibly cyclic) graph. The model specifies, at any given point in time, the distribution over two aspects: when a local variable changes its value and the next value it takes. These distributions are determined by the variable's current value and the current values of its parents in the graph. More formally, each variable is modelled as a finite-state continuous-time Markov process whose transition intensities are functions of its parents. We present a probabilistic semantics for the language in terms of the generative model a CTBN defines over sequences of events. We list types of queries one might ask of a CTBN, discuss the conceptual and computational difficulties associated with exact inference, and provide an algorithm for approximate inference which takes advantage of the structure within the process.\nWe develop a model of how information flows into a market, and derive algorithms for automatically detecting and explaining relevant events. We analyze data from twenty-two \"political stock markets\" (i.e., betting markets on political outcomes) on the Iowa Electronic Market (IEM). 
We prove that, under certain efficiency assumptions, prices in such betting markets will on average approach the correct outcomes over time, and show that IEM data conforms closely to the theory. We present a simple model of a betting market where information is revealed over time, and show a qualitative correspondence between the model and real market data. We also present an algorithm for automatically detecting significant events and generating semantic explanations of their origin. The algorithm operates by discovering significant changes in vocabulary on online news sources (using expected entropy loss) that align with major price spikes in related betting markets.\nQuantification is well known to be a major obstacle in the construction of a probabilistic network, especially when relying on human experts for this purpose. The construction of a qualitative probabilistic network has been proposed as an initial step in a network's quantification, since the qualitative network can be used to gain preliminary insight into the projected network's reasoning behaviour. We build on this idea and present a new type of network in which both signs and numbers are specified; we further present an associated algorithm for probabilistic inference. Building upon these semi-qualitative networks, a probabilistic network can be quantified and studied in a stepwise manner. As a result, modelling inadequacies can be detected and amended at an early stage in the quantification process.\nTypical Recommender systems adopt a static view of the recommendation process and treat it as a prediction problem. We argue that it is more appropriate to view the problem of generating recommendations as a sequential decision problem and, consequently, that Markov decision processes (MDP) provide a more appropriate model for Recommender systems. 
MDPs introduce two benefits: they take into account the long-term effects of each recommendation, and they take into account the expected value of each recommendation. To succeed in practice, an MDP-based Recommender system must employ a strong initial model; and the bulk of this paper is concerned with the generation of such a model. In particular, we suggest the use of an n-gram predictive model for generating the initial MDP. Our n-gram model induces a Markov-chain model of user behavior whose predictive accuracy is greater than that of existing predictive models. We describe our predictive model in detail and evaluate its performance on real data. In addition, we show how the model can be used in an MDP-based Recommender system.\nIn many supervised learning tasks, the entities to be labeled are related to each other in complex ways and their labels are not independent. For example, in hypertext classification, the labels of linked pages are highly correlated. A standard approach is to classify each entity independently, ignoring the correlations between them. Recently, Probabilistic Relational Models, a relational version of Bayesian networks, were used to define a joint probabilistic model for a collection of related entities. In this paper, we present an alternative framework that builds on (conditional) Markov networks and addresses two limitations of the previous approach. First, undirected models do not impose the acyclicity constraint that hinders representation of many important relational dependencies in directed models. Second, undirected models are well suited for discriminative training, where we optimize the conditional likelihood of the labels given the features, which generally improves classification accuracy. We show how to train these models effectively, and how to use approximate probabilistic inference over the learned model for collective classification of multiple related entities. 
We provide experimental results on a webpage classification task, showing that accuracy can be significantly improved by modeling relational dependencies.\nA popular approach to solving a decision process with non-Markovian rewards (NMRDP) is to exploit a compact representation of the reward function to automatically translate the NMRDP into an equivalent Markov decision process (MDP) amenable to our favorite MDP solution method. The contribution of this paper is a representation of non-Markovian reward functions and a translation into MDP aimed at making the best possible use of state-based anytime algorithms as the solution method. By explicitly constructing and exploring only parts of the state space, these algorithms are able to trade computation time for policy quality, and have proven quite effective in dealing with large MDPs. Our representation extends future linear temporal logic (FLTL) to express rewards. Our translation has the effect of embedding model-checking in the solution method. It results in an MDP of the minimal size achievable without stepping outside the anytime framework, and consequently in better policies by the deadline.\nWe propose an efficient method for Bayesian network inference in models with functional dependence. We generalize the multiplicative factorization method originally designed by Takikawa and D'Ambrosio (1999) for models with independence of causal influence. Using a hidden variable, we transform a probability potential into a product of two-dimensional potentials. The multiplicative factorization yields more efficient inference. For example, in junction tree propagation it helps to avoid large cliques. 
In order to keep potentials small, the number of states of the hidden variable should be minimized. We transform this problem into a combinatorial problem of finding a minimal base in a particular space. We present an example of a computerized adaptive test, in which the factorization method is significantly more efficient than previous inference methods.\nThis paper uses decision-theoretic principles to obtain new insights into the assessment and updating of probabilities. First, a new foundation of Bayesianism is given. It does not require infinite atomless uncertainties, as did Savage's classical result, and can therefore be applied to any finite Bayesian network. Nor does it require linear utility, as did de Finetti's classical result, and it therefore allows for the empirically and normatively desirable risk aversion. Further, by identifying and fixing utility in an elementary manner, our result can readily be applied to identify methods of probability updating. Thus, a decision-theoretic foundation is given to the computationally efficient method of inductive reasoning developed by Rudolf Carnap. Finally, recent empirical findings on probability assessments are discussed. They lead to suggestions for correcting biases in probability assessments, and for an alternative to the Dempster-Shafer belief functions that avoids the reduction to degeneracy after multiple updatings.\nIterative Proportional Fitting (IPF), combined with EM, is commonly used as an algorithm for likelihood maximization in undirected graphical models. In this paper, we present two iterative algorithms that generalize IPF. The first one is for likelihood maximization in discrete chain factor graphs, which we define as a wide class of discrete variable models including undirected graphical models and Bayesian networks, but also chain graphs and sigmoid belief networks. 
The second one is for conditional likelihood maximization in standard undirected models and Bayesian networks. In both algorithms, the iteration steps are expressed in closed form. Numerical simulations show that the algorithms are competitive with state-of-the-art methods.\nWe select policies for large Markov Decision Processes (MDPs) with compact first-order representations. We find policies that generalize well as the number of objects in the domain grows, potentially without bound. Existing dynamic-programming approaches based on flat, propositional, or first-order representations either are impractical here or do not naturally scale as the number of objects grows without bound. We implement and evaluate an alternative approach that induces first-order policies using training data constructed by solving small problem instances using PGraphplan (Blum & Langford, 1999). Our policies are represented as ensembles of decision lists, using a taxonomic concept language. This approach extends the work of Martin and Geffner (2000) to stochastic domains, ensemble learning, and a wider variety of problems. Empirically, we find \"good\" policies for several stochastic first-order MDPs that are beyond the scope of previous approaches. We also discuss the application of this work to the relational reinforcement-learning problem.\nPossibility theory offers either a qualitative or a numerical framework for representing uncertainty, in terms of dual measures of possibility and necessity. This leads to the existence of two kinds of possibilistic causal graphs, where the conditioning is based either on the minimum or on the product operator. Benferhat et al. (1999) have investigated the connections between min-based graphs and possibilistic logic bases (made of classical formulas weighted in terms of certainty). 
This paper deals with a more difficult issue: the product-based graphical representation of possibilistic bases, which provides an easy structural reading of possibilistic bases. Moreover, this paper also provides another reading of possibilistic bases in terms of comparative preferences of the form \"in the context p, q is preferred to not q\". This enables us to make explicit the preferences underlying a set of goals with different levels of priority.\nThe currently most efficient algorithm for inference with a probabilistic network builds upon a triangulation of a network's graph. In this paper, we show that pre-processing can help in finding good triangulations for probabilistic networks, that is, triangulations with a minimal maximum clique size. We provide a set of rules for stepwise reducing a graph, without losing optimality. This reduction allows us to solve the triangulation problem on a smaller graph. From the smaller graph's triangulation, a triangulation of the original graph is obtained by reversing the reduction steps. Our experimental results show that the graphs of some well-known real-life probabilistic networks can be triangulated optimally just by preprocessing; for other networks, huge reductions in their graph's size are obtained.\nWe present two sampling algorithms for probabilistic confidence inference in Bayesian networks. These two algorithms (we call them AIS-BN-mu and AIS-BN-sigma algorithms) guarantee that estimates of posterior probabilities are, with a given probability, within a desired precision bound. Our algorithms are based on recent advances in sampling algorithms for (1) estimating the mean of bounded random variables and (2) adaptive importance sampling in Bayesian networks. In addition to a simple stopping rule for sampling that they provide, the AIS-BN-mu and AIS-BN-sigma algorithms are capable of guiding the learning process in the AIS-BN algorithm. 
An empirical evaluation of the proposed algorithms shows excellent performance, even for very unlikely evidence.\nIn a causal graphical model, an instrument for a variable X and its effect Y is a random variable that is a cause of X and independent of all the causes of Y except X (Pearl, 1995; Spirtes et al., 2000). Instrumental variables can be used to estimate how the distribution of an effect will respond to a manipulation of its causes, even in the presence of unmeasured common causes (confounders). In typical instrumental variable estimation, instruments are chosen based on domain knowledge. There is currently no statistical test for validating a variable as an instrument. In this paper, we introduce the concept of semi-instrument, which generalizes the concept of instrument. We show that in the framework of additive models, under certain conditions, we can test whether a variable is semi-instrumental. Moreover, adding some distributional assumptions, we can test whether two semi-instruments are instrumental. We give algorithms to estimate the p-value that a random variable is semi-instrumental, and the p-value that two semi-instruments are both instrumental. These algorithms can be used to test the experts' choice of instruments, or to identify instruments automatically.\nOn roads showing significant violations of posted speed limits, one measure of the safety effect of speeding is the difference between the road's actual accident count and the count that would have occurred if the posted speed limit had been strictly obeyed. An estimate of this accident reduction can be had by computing the probability that speeding was a necessary condition for each of a set of accidents. This is an instance of assessing individual probabilities of causation, which is generally not possible absent prior knowledge of causal structure. 
For traffic accidents, such prior knowledge is often available, and this paper illustrates how, for a commonly occurring class of vehicle/pedestrian accidents, approaches to uncertainty and causal analyses appearing in the accident reconstruction literature can be unified using Bayesian networks. Measured skidmarks, pedestrian throw distances, and pedestrian injury severity are treated as evidence, and using the Gibbs Sampling routine BUGS, the posterior probability distribution over exogenous variables, such as the vehicle's initial speed, location, and driver reaction time, is computed. This posterior distribution is then used to compute the \"probability of necessity\" for speeding.\nIn this paper, we present an efficient way of performing stepwise selection in the class of decomposable models. The main contribution of the paper is a simple characterization of the edges that can be added to a decomposable model while keeping the resulting model decomposable, and an efficient algorithm for enumerating all such edges for a given model in essentially O(1) time per edge. We also discuss how backward selection can be performed efficiently using our data structures. We also analyze the complexity of the complete stepwise selection procedure, including the complexity of choosing which of the eligible edges to add to (or delete from) the current model, with the aim of minimizing the Kullback-Leibler distance of the resulting model from the saturated model for the data.\nGlobal variational approximation methods in graphical models allow efficient approximate inference of complex posterior distributions by using a simpler model. The choice of the approximating model determines a tradeoff between the complexity of the approximation procedure and the quality of the approximation. In this paper, we consider variational approximations based on two classes of models that are richer than standard Bayesian networks, Markov networks or mixture models. 
As such, these classes allow us to find better tradeoffs in the spectrum of approximations. The first class of models consists of chain graphs, which capture distributions that are partially directed. The second class of models consists of directed graphs (Bayesian networks) with additional latent variables. Both classes allow representation of multi-variable dependencies that cannot be easily represented within a Bayesian network.\nA serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. Detecting hidden variables poses two problems: determining the relations to other variables in the model and determining the number of states of the hidden variable. In this paper, we address the latter problem in the context of Bayesian networks. We describe an approach that utilizes score-based agglomerative state clustering. As we show, this approach allows us to efficiently evaluate models with a range of cardinalities for the hidden variable. We show how to extend this procedure to deal with multiple interacting hidden variables. We demonstrate the effectiveness of this approach by evaluating it on synthetic and real-life data. We show that our approach learns models with hidden variables that generalize better and have better structure than previous approaches.\nThe information bottleneck method is an unsupervised non-parametric data organization technique. Given a joint distribution P(A,B), this method constructs a new variable T that extracts partitions, or clusters, over the values of A that are informative about B. The information bottleneck has already been applied to document classification, gene expression, neural code, and spectral analysis. In this paper, we introduce a general principled framework for multivariate extensions of the information bottleneck method. This allows us to consider multiple systems of data partitions that are inter-related. 
Our approach utilizes Bayesian networks for specifying the systems of clusters and what information each captures. We show that this construction provides insight into bottleneck variations and enables us to characterize solutions of these variations. We also present a general framework for iterative algorithms for constructing solutions, and apply it to several examples.\nIn this paper we analyze two recent axiomatic approaches proposed by Dubois et al. and by Giang and Shenoy to qualitative decision making where uncertainty is described by possibility theory. Both axiomatizations are inspired by von Neumann and Morgenstern's system of axioms for the case of probability theory. We show that our approach naturally unifies two axiomatic systems that correspond respectively to pessimistic and optimistic decision criteria proposed by Dubois et al. The simplifying unification is achieved by (i) replacing axioms that are supposed to reflect two informational attitudes (uncertainty aversion and uncertainty attraction) by an axiom that imposes an order on a set of standard lotteries and (ii) using a binary utility scale in which each utility level is represented by a pair of numbers.\nPlanning problems are hard; motion planning, for example, is PSPACE-hard. Such problems are even more difficult in the presence of uncertainty. Although Markov Decision Processes (MDPs) provide a formal framework for such problems, finding solutions to high-dimensional continuous MDPs is usually difficult, especially when the actions and time measurements are continuous. Fortunately, problem-specific knowledge allows us to design controllers that are good locally, though having no global guarantees. We propose a method of nonparametrically combining local controllers to obtain globally good solutions. We apply this formulation to two types of problems: motion planning (stochastic shortest path) and discounted MDPs. 
For motion planning, we argue that the usual MDP optimality criterion (expected cost) may not be practically relevant. We propose an alternative: finding the minimum-cost path, subject to the constraint that the robot must reach the goal with high probability. For this problem, we prove that a polynomial number of samples is sufficient to obtain a high-probability path. For discounted MDPs, we propose a formulation that explicitly deals with model uncertainty, i.e., the problem introduced when transition probabilities are not known exactly. We formulate the problem as a robust linear program which directly incorporates this type of uncertainty.\nIn this work we focus on efficient heuristics for solving a class of stochastic planning problems that arise in a variety of business, investment, and industrial applications. The problem is best described in terms of future buy and sell contracts. By buying less reliable, but less expensive, buy (supply) contracts, a company or a trader can cover a position of more reliable and more expensive sell contracts. The goal is to maximize the expected net gain (profit) by constructing a close-to-optimum portfolio out of the available buy and sell contracts. This stochastic planning problem can be formulated as a two-stage stochastic linear programming problem with recourse. However, this formalization leads to solutions that are exponential in the number of possible failure combinations. Thus, this approach is not feasible for large-scale problems. In this work we investigate heuristic approximation techniques alleviating the efficiency problem. We primarily focus on the clustering approach and devise heuristics for finding clusterings leading to good approximations. We illustrate the quality and feasibility of the approach through experimental data.\nWe are developing a general framework for using learned Bayesian models for decision-theoretic control of search and reasoning algorithms. 
We illustrate the approach on the specific task of controlling both general and domain-specific solvers on a hard class of structured constraint satisfaction problems. A successful strategy for reducing the high (and even infinite) variance in running time typically exhibited by backtracking search algorithms is to cut off and restart the search if a solution is not found within a certain amount of time. Previous work on restart strategies has employed fixed cutoff values. We show how to create a dynamic cutoff strategy by learning a Bayesian model that predicts the ultimate length of a trial based on observing the early behavior of the search algorithm. Furthermore, we describe the general conditions under which a dynamic restart strategy can outperform the theoretically optimal fixed strategy.\nEvery directed acyclic graph (DAG) over a finite non-empty set of variables (= nodes) N induces an independence model over N, which is a list of conditional independence statements over N. The inclusion problem is how to characterize (in graphical terms) whether all independence statements in the model induced by a DAG K are in the model induced by a second DAG L. Meek (1997) conjectured that this inclusion holds iff there exists a sequence of DAGs from L to K such that only certain 'legal' arrow reversal and 'legal' arrow adding operations are performed to get the next DAG in the sequence. In this paper we give several characterizations of inclusion of DAG models and verify Meek's conjecture in the case that the DAGs K and L differ in at most one adjacency. As a warm-up, a rigorous proof of well-known graphical characterizations of equivalence of DAGs, which is a highly related problem, is given.\nThe search space of Bayesian Network structures is usually defined as Acyclic Directed Graphs (DAGs) and the search is done by local transformations of DAGs. 
But the space of Bayesian Networks is ordered by DAG Markov model inclusion, and it is natural to consider that a good search policy should take this into account. The first attempt to do this (Chickering, 1996) used equivalence classes of DAGs instead of the DAGs themselves. This approach produces better results but it is significantly slower. We present a compromise between these two approaches. It uses DAGs to search the space in such a way that the ordering by inclusion is taken into account. This is achieved by repeated use of local moves within the equivalence class of DAGs. We show that this new approach produces better results than the original DAGs approach without substantial change in time complexity. We present empirical results, within the framework of heuristic search and Markov Chain Monte Carlo, provided through the Alarm dataset.\nAn important subclass of hybrid Bayesian networks are those that represent Conditional Linear Gaussian (CLG) distributions, i.e., distributions with a multivariate Gaussian component for each instantiation of the discrete variables. In this paper we explore the problem of inference in CLGs. We show that inference in CLGs can be significantly harder than inference in Bayes Nets. In particular, we prove that even if the CLG is restricted to an extremely simple structure of a polytree in which every continuous node has at most one discrete ancestor, the inference task is NP-hard. To deal with the often prohibitive computational cost of the exact inference algorithm for CLGs, we explore several approximate inference algorithms. These algorithms try to find a small subset of Gaussians which are a good approximation to the full mixture distribution. We consider two Monte Carlo approaches and a novel approach that enumerates mixture components in order of prior probability. 
We compare these methods on a variety of problems and show that our novel algorithm is very promising for large, hybrid diagnosis problems.\nIn this paper we present a method of computing the posterior probability of conditional independence of two or more continuous variables from data, examined at several resolutions. Our approach is motivated by the observation that the appearance of continuous data varies widely at various resolutions, producing very different independence estimates between the variables involved. Therefore, it is difficult to ascertain independence without examining data at several carefully selected resolutions. In our paper, we accomplish this using the exact computation of the posterior probability of independence, calculated analytically given a resolution. At each examined resolution, we assume a multinomial distribution with Dirichlet priors for the discretized table parameters, and compute the posterior using Bayesian integration. Across resolutions, we use a search procedure to approximate the Bayesian integral of probability over an exponential number of possible histograms. Our method generalizes to an arbitrary number of variables in a straightforward manner. The test is suitable for Bayesian network learning algorithms that use independence tests to infer the network structure, in domains that contain any mix of continuous, ordinal and categorical variables.\nWe consider the task of aggregating beliefs of several experts. We assume that these beliefs are represented as probability distributions. We argue that the evaluation of any aggregation technique depends on the semantic context of this task. We propose a framework, in which we assume that nature generates samples from a 'true' distribution and different experts form their beliefs based on the subsets of the data they have a chance to observe. Naturally, the ideal aggregate distribution would be the one learned from the combined sample sets. 
Such a formulation leads to a natural way to measure the accuracy of the aggregation mechanism. We show that the well-known aggregation operator LinOP is ideally suited for that task. We propose a LinOP-based learning algorithm, inspired by the techniques developed for Bayesian learning, which aggregates the experts' distributions represented as Bayesian networks. Our preliminary experiments show that this algorithm performs well in practice.\nThis paper presents a new deterministic approximation technique in Bayesian networks. This method, \"Expectation Propagation\", unifies two previous techniques: assumed-density filtering, an extension of the Kalman filter, and loopy belief propagation, an extension of belief propagation in Bayesian networks. All three algorithms try to recover an approximate distribution which is close in KL divergence to the true distribution. Loopy belief propagation, because it propagates exact belief states, is useful for a limited class of belief networks, such as those which are purely discrete. Expectation Propagation approximates the belief states by only retaining certain expectations, such as mean and variance, and iterates until these expectations are consistent throughout the network. This makes it applicable to hybrid networks with discrete and continuous nodes. Expectation Propagation also extends belief propagation in the opposite direction: it can propagate richer belief states that incorporate correlations between nodes. Experiments with Gaussian mixture models show Expectation Propagation to be convincingly better than methods with similar computational cost: Laplace's method, variational Bayes, and Monte Carlo. Expectation Propagation also provides an efficient algorithm for training Bayes point machine classifiers.\nThe Factored Frontier (FF) algorithm is a simple approximate inference algorithm for Dynamic Bayesian Networks (DBNs). 
It is very similar to the fully factorized version of the Boyen-Koller (BK) algorithm, but instead of doing an exact update at every step followed by marginalisation (projection), it always works with factored distributions. Hence it can be applied to models for which the exact update step is intractable. We show that FF is equivalent to (one iteration of) loopy belief propagation (LBP) on the original DBN, and that BK is equivalent to (one iteration of) LBP on a DBN where we cluster some of the nodes. We then show empirically that by iterating, LBP can improve on the accuracy of both FF and BK. We compare these algorithms on two real-world DBNs: the first is a model of a water treatment plant, and the second is a coupled HMM, used to model freeway traffic.\nA standard approach to approximate inference in state-space models is to apply a particle filter, e.g., the Condensation Algorithm. However, the performance of particle filters often varies significantly due to their stochastic nature. We present a class of algorithms, called lattice particle filters, that circumvent this difficulty by placing the particles deterministically according to a Quasi-Monte Carlo integration rule. We describe a practical realization of this idea, and discuss its theoretical properties and its efficiency. Experimental results with a synthetic 2D tracking problem show that the lattice particle filter is equivalent to a conventional particle filter that has between 10 and 60% more particles, depending on their \"sparsity\" in the state-space. We also present results on inferring 3D human motion from moving light displays.\nMAP is the problem of finding a most probable instantiation of a set of variables in a Bayesian network, given evidence. Unlike computing marginals, posteriors, and MPE (a special case of MAP), the time and space complexity of MAP is exponential not only in the network treewidth, but also in a larger parameter known as the \"constrained\" treewidth. 
In practice, this means that computing MAP can be orders of magnitude more expensive than computing posteriors or MPE. Thus, practitioners generally avoid MAP computations, resorting instead to approximating them by the most likely value for each MAP variable separately, or by MPE. We present a method for approximating MAP using local search. This method has space complexity which is exponential only in the treewidth, as is the complexity of each search step. We investigate the effectiveness of different local search methods and several initialization strategies and compare them to other approximation schemes. Experimental results show that local search provides a much more accurate approximation of MAP, while requiring few search steps. Practically, this means that the complexity of local search is often exponential only in treewidth as opposed to the constrained treewidth, making approximating MAP as efficient as other computations.\nSuppose we are given the conditional probability of one variable given some other variables. Normally the full joint distribution over the conditioning variables is required to determine the probability of the conditioned variable. Under what circumstances are the marginal distributions over the conditioning variables sufficient to determine the probability of the conditioned variable? Sufficiency in this sense is equivalent to additive separability of the conditional probability distribution. Such separability structure is natural and can be exploited for efficient inference. Separability has a natural generalization to conditional separability. Separability provides a precise notion of weakly interacting subsystems in temporal probabilistic models. Given a system that is decomposed into separable subsystems, exact marginal probabilities over subsystems at future points in time can be computed by propagating marginal subsystem probabilities, rather than complete system joint probabilities. Thus, separability can make exact prediction tractable. However, 
observations can break separability, so exact monitoring of dynamic systems remains hard.\nThere is increasing interest within the research community in the design and use of recursive probability models. Although there still remains concern about computational complexity costs and the fact that computing exact solutions can be intractable for many nonrecursive models and impossible in the general case for recursive problems, several research groups are actively developing computational techniques for recursive stochastic languages. We have developed an extension to the traditional lambda-calculus as a framework for families of Turing-complete stochastic languages. We have also developed a class of exact inference algorithms based on the traditional reductions of the lambda-calculus. We further propose that using the de Bruijn notation (a lambda-calculus notation with nameless dummies) supports effective caching in such systems (caching being an essential component of efficient computation). Finally, our extension to the lambda-calculus offers a foundation and general theory for the construction of recursive stochastic modeling languages as well as promise for effective caching and efficient approximation algorithms for inference.\nWe consider the problem of approximate belief-state monitoring using particle filtering for the purposes of implementing a policy for a partially-observable Markov decision process (POMDP). While particle filtering has become a widely used tool in AI for monitoring dynamical systems, rather scant attention has been paid to its use in the context of decision making. Assuming the existence of a value function, we derive error bounds on decision quality associated with filtering using importance sampling. We also describe an adaptive procedure that can be used to dynamically determine the number of samples required to meet specific error bounds. 
Empirical evidence is offered supporting this technique as a profitable means of directing sampling effort where it is needed to distinguish policies.\nWe investigate a model for planning under uncertainty with temporally extended actions, where multiple actions can be taken concurrently at each decision epoch. Our model is based on the options framework, and combines it with factored state-space models, where the set of options can be partitioned into classes that affect disjoint state variables. We show that the set of decision epochs for concurrent options defines a semi-Markov decision process, if the underlying temporally extended actions being parallelized are restricted to Markov options. This property allows us to use SMDP algorithms for computing the value function over concurrent options. The concurrent options model allows overlapping execution of options in order to achieve higher performance or in order to perform a complex task. We describe a simple experiment using a navigation task which illustrates how concurrent options result in a faster plan when compared to the case when only one option is taken at a time.\nWe present a new method for estimating the expected return of a POMDP from experience. The method does not assume any knowledge of the POMDP and allows the experience to be gathered from an arbitrary sequence of policies. The return is estimated for any new policy of the POMDP. We motivate the estimator from function-approximation and importance sampling points of view and derive its theoretical properties. Although the estimator is biased, it has low variance and the bias is often irrelevant when the estimator is used for pair-wise comparisons. We conclude by extending the estimator to policies with memory and compare its performance in a greedy search algorithm to REINFORCE algorithms, showing an order-of-magnitude reduction in the number of trials required.\nChow and Liu (1968) studied the problem of learning a maximum likelihood Markov tree. 
We generalize their work to more complex Markov networks by considering the problem of learning a maximum likelihood Markov network of bounded complexity. We discuss how tree-width is in many ways the appropriate measure of complexity and thus analyze the problem of learning a maximum likelihood Markov network of bounded tree-width. Similar to the work of Chow and Liu, we are able to formalize the learning problem as a combinatorial optimization problem on graphs. We show that learning a maximum likelihood Markov network of bounded tree-width is equivalent to finding a maximum weight hypertree. This equivalence gives rise to global, integer-programming-based approximation algorithms with provable performance guarantees for the learning problem. This contrasts with heuristic local-search algorithms which were previously suggested (e.g., by Malvestuto, 1991). The equivalence also allows us to study the computational hardness of the learning problem. We show that learning a maximum likelihood Markov network of bounded tree-width is NP-hard, and discuss the hardness of approximation.\nA Bayesian Belief Network (BN) is a model of a joint distribution over a set of n variables, with a DAG structure to represent the immediate dependencies between the variables, and a set of parameters (aka CPTables) to represent the local conditional probabilities of a node, given each assignment to its parents. In many situations, these parameters are themselves random variables; this may reflect the uncertainty of the domain expert, or may come from a training sample used to estimate the parameter values. The distribution over these \"CPtable variables\" induces a distribution over the response the BN will return to any \"What is Pr(H | E)?\" query. This paper investigates the variance of this response, showing first that it is asymptotically normal, then providing its mean and asymptotic variance. 
We then present an effective general algorithm for computing this variance, which has the same complexity as simply computing the (mean value of) the response itself, i.e., O(n 2^w), where n is the number of variables and w is the effective treewidth. Finally, we provide empirical evidence that this algorithm, which incorporates assumptions and approximations, works effectively in practice, given only small samples.\nWith the advance of efficient analytical methods for sensitivity analysis of probabilistic networks, the interest in the sensitivities revealed by real-life networks is rekindled. As the amount of data resulting from a sensitivity analysis of even a moderately sized network is already overwhelming, methods for extracting relevant information are called for. One such method is to study the derivative of the sensitivity functions yielded for a network's parameters. We further propose to build upon the concept of admissible deviation, that is, the extent to which a parameter can deviate from the true value without inducing a change in the most likely outcome. We illustrate these concepts by means of a sensitivity analysis of a real-life probabilistic network in oncology.\nThere exist a number of reinforcement learning algorithms which learn by climbing the gradient of expected reward. Their long-run convergence has been proved, even in partially observable environments with non-deterministic actions, and without the need for a system model. However, the variance of the gradient estimator has been found to be a significant practical problem. Recent approaches have discounted future rewards, introducing a bias-variance trade-off into the gradient estimate. We incorporate a reward baseline into the learning system, and show that it affects variance without introducing further bias. In particular, as we approach the zero-bias, high-variance parameterization, the optimal (or variance-minimizing) constant reward baseline is equal to the long-term average expected reward. 
Modified policy-gradient algorithms are presented, and a number of experiments demonstrate their improvement over previous work.\nWe present a novel inference algorithm for arbitrary, binary, undirected graphs. Unlike loopy belief propagation, which iterates fixed-point equations, we directly descend on the Bethe free energy. The algorithm consists of two phases. First, we update the pairwise probabilities, given the marginal probabilities at each unit, using an analytic expression. Next, we update the marginal probabilities, given the pairwise probabilities, by following the negative gradient of the Bethe free energy. Both steps are guaranteed to decrease the Bethe free energy, and since it is bounded below, the algorithm is guaranteed to converge to a local minimum. We also show that the Bethe free energy is equal to the TAP free energy up to second order in the weights. In experiments we confirm that when belief propagation converges it usually finds the same solutions as our belief optimization method. However, in cases where belief propagation fails to converge, belief optimization continues to converge to reasonable beliefs. The stable nature of belief optimization makes it ideally suited for learning graphical models from data.\nAutomatic continuous speech recognition (CSR) is sufficiently mature that a variety of real-world applications are now possible, including large-vocabulary transcription and interactive spoken dialogues. This paper reviews the evolution of the statistical modelling techniques which underlie current-day systems, specifically hidden Markov models (HMMs) and N-grams. Starting from a description of the speech signal and its parameterisation, the various modelling assumptions and their consequences are discussed. It then describes various techniques by which the effects of these assumptions can be mitigated. Despite the progress that has been made, the limitations of current modelling techniques are still evident. 
The paper therefore concludes with a brief review of some of the more fundamental modelling work now in progress.\nUncertainty plays a central role in spoken dialogue systems. Some stochastic models, like the Markov decision process (MDP), are used to model the dialogue manager. But the partially observable system state and user intention hinder the natural representation of the dialogue state. MDP-based systems degrade quickly when uncertainty about a user's intention increases. We propose a novel dialogue model based on the partially observable Markov decision process (POMDP). We use hidden system states and user intentions as the state set, parser results and low-level information as the observation set, and domain actions and dialogue repair actions as the action set. Here the low-level information is extracted from different input modalities, including speech, keyboard, mouse, etc., using Bayesian networks. Because of the limitations of exact algorithms, we focus on heuristic approximation algorithms and their applicability in POMDPs for dialogue management. We also propose two methods for grid point selection in grid-based approximation algorithms.\nIn this paper we present a propositional logic programming language for reasoning under possibilistic uncertainty and representing vague knowledge. Formulas are represented by pairs (A, c), where A is a many-valued proposition and c is a value in the unit interval [0,1] which denotes a lower bound on the belief in A in terms of necessity measures. Belief states are modeled by possibility distributions on the set of all many-valued interpretations. 
In this framework, (i) we define a syntax and a semantics of the general underlying uncertainty logic; (ii) we provide a modus ponens-style calculus for a sublanguage of Horn rules and we prove that it is complete for determining the maximum degree of possibilistic belief with which a fuzzy propositional variable can be entailed from a set of formulas; and finally, (iii) we show how the computation of a partial matching between fuzzy propositional variables, in terms of necessity measures for fuzzy sets, can be included in our logic programming system.\nPlanning for distributed agents with partial state information is considered from a decision-theoretic perspective. We describe generalizations of both the MDP and POMDP models that allow for decentralized control. For even a small number of agents, the finite-horizon problems corresponding to both of our models are complete for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov processes. In contrast to the MDP and POMDP problems, the problems we consider provably do not admit polynomial-time algorithms and most likely require doubly exponential time to solve in the worst case. We have thus provided mathematical evidence corresponding to the intuition that decentralized planning problems cannot easily be reduced to centralized problems and solved exactly using established techniques.\nIn this work, dynamic Bayesian multinets are introduced in which a Markov chain state at time t determines conditional independence patterns between random variables lying within a local time window surrounding t. It is shown how information-theoretic criterion functions can be used to induce sparse, discriminative, and class-conditional network structures that yield an optimal approximation to the class posterior probability, and therefore are useful for the classification task. 
Using a new structure learning heuristic, the resulting models are tested on a medium-vocabulary isolated-word speech recognition task. It is demonstrated that these discriminatively structured dynamic Bayesian multinets, when trained in a maximum likelihood setting using EM, can outperform both HMMs and other dynamic Bayesian networks with a similar number of parameters.\nDecision theory does not traditionally include uncertainty over utility functions. We argue that a person's utility value for a given outcome can be treated as we treat other domain attributes: as a random variable with a density function over its possible values. We show that we can apply statistical density estimation techniques to learn such a density function from a database of partially elicited utility functions. In particular, we define a Bayesian learning framework for this problem, assuming the distribution over utilities is a mixture of Gaussians, where the mixture components represent statistically coherent subpopulations. We can also extend our techniques to the problem of discovering generalized additivity structure in the utility functions in the population. We define a Bayesian model selection criterion for utility function structure and a search procedure over structures. The factorization of the utilities in the learned model, and the generalization obtained from density estimation, allow us to provide robust estimates of utilities using a significantly smaller number of utility elicitation questions. We experiment with our technique on synthetic utility data and on a real database of utility functions in the domain of prenatal diagnosis.\nMonte Carlo sampling has become a major vehicle for approximate inference in Bayesian networks. In this paper, we investigate a family of related simulation approaches, known collectively as quasi-Monte Carlo methods based on deterministic low-discrepancy sequences. 
We first outline several theoretical aspects of deterministic low-discrepancy sequences, show three examples of such sequences, and then discuss practical issues related to applying them to belief updating in Bayesian networks. We propose an algorithm for selecting direction numbers for the Sobol sequence. Our experimental results show that low-discrepancy sequences (especially the Sobol sequence) significantly improve the performance of simulation algorithms in Bayesian networks compared to Monte Carlo sampling.\nA simple advertising strategy that can be used to help increase sales of a product is to mail out special offers to selected potential customers. Because there is a cost associated with sending each offer, the optimal mailing strategy depends on both the benefit obtained from a purchase and how the offer affects the buying behavior of the customers. In this paper, we describe two methods for partitioning the potential customers into groups, and show how to perform a simple cost-benefit analysis to decide which, if any, of the groups should be targeted. In particular, we consider two decision-tree learning algorithms. The first is an \"off the shelf\" algorithm used to model the probability that groups of customers will buy the product. The second is a new algorithm that is similar to the first, except that for each group, it explicitly models the probability of purchase under the two mailing scenarios: (1) the mail is sent to members of that group and (2) the mail is not sent to members of that group. Using data from a real-world advertising experiment, we compare the algorithms to each other and to a naive mail-to-all strategy.\nThis paper describes a Bayesian method for learning causal networks using samples that were selected in a non-random manner from a population of interest. 
Examples of data obtained by non-random sampling include convenience samples and case-control data in which a fixed number of samples with and without some condition is collected; such data are not uncommon. The paper describes a method for combining data under selection with prior beliefs in order to derive a posterior probability for a model of the causal processes that are generating the data in the population of interest. The priors include beliefs about the nature of the non-random sampling procedure. Although exact application of the method would be computationally intractable for most realistic datasets, efficient special-case and approximation methods are discussed. Finally, the paper describes how to combine learning under selection with previous methods for learning from observational and experimental data that are obtained on random samples of the population of interest. The net result is a Bayesian methodology that supports causal modeling and discovery from a rich mixture of different types of data.\nThis paper analyzes independence concepts for sets of probability measures associated with directed acyclic graphs. The paper shows that epistemic independence and the standard Markov condition violate desirable separation properties. The adoption of a contraction condition leads to d-separation but still fails to guarantee a belief separation property. To overcome this unsatisfactory situation, a strong Markov condition is proposed, based on epistemic independence. The main result is that the strong Markov condition leads to strong independence and does enforce separation properties; this result implies that (1) separation properties of Bayesian networks do extend to epistemic independence and sets of probability measures, and (2) strong independence has a clear justification based on epistemic independence and the strong Markov condition.\nWe present a new approach for inference in Bayesian networks, which is mainly based on partial differentiation. 
According to this approach, one compiles a Bayesian network into a multivariate polynomial and then computes the partial derivatives of this polynomial with respect to each variable. We show that once such derivatives are made available, one can compute answers in constant time to a large class of probabilistic queries, which are central to classical inference, parameter estimation, model validation and sensitivity analysis. We present a number of complexity results relating to the compilation of such polynomials and to the computation of their partial derivatives. We argue that the combined simplicity, comprehensiveness and computational complexity of the presented framework is unique among existing frameworks for inference in Bayesian networks.\nWe have recently introduced an any-space algorithm for exact inference in Bayesian networks, called Recursive Conditioning (RC), which allows one to trade space for time at increments of X bytes, where X is the number of bytes needed to cache a floating point number. In this paper, we present three key extensions of RC. First, we modify the algorithm so it applies to more general factorizations of probability distributions, including (but not limited to) Bayesian network factorizations. Second, we present a forgetting mechanism which reduces the space requirements of RC considerably and then compare such requirements with those of variable elimination on a number of realistic networks, showing orders-of-magnitude improvements in certain cases. Third, we present a version of RC for computing maximum a posteriori hypotheses (MAP), which turns out to be the first MAP algorithm allowing a smooth time-space tradeoff. 
A key advantage of the presented MAP algorithm is that it does not have to start from scratch each time a new query is presented, but can reuse some of its computations across multiple queries, leading to significant savings in certain cases.\nRecently developed techniques have made it possible to quickly learn accurate probability density functions from data in low-dimensional continuous space. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kd-trees (Moore, 1999). In this paper, we propose a kind of Bayesian network in which low-dimensional mixtures of Gaussians over different subsets of the domain's variables are combined into a coherent joint probability model over the entire domain. The network is also capable of modeling complex dependencies between discrete variables and continuous variables without requiring discretization of the continuous variables. We present efficient heuristic algorithms for automatically learning these networks from data, and perform comparative experiments illustrating how well these networks model real scientific data and synthetic data. We also briefly discuss some possible improvements to the networks, as well as possible applications.\nParticle filters (PFs) are powerful sampling-based inference/learning algorithms for dynamic Bayesian networks (DBNs). They allow us to treat, in a principled way, any type of probability distribution, nonlinearity and non-stationarity. They have appeared in several fields under such names as \"condensation\", \"sequential Monte Carlo\" and \"survival of the fittest\". In this paper, we show how we can exploit the structure of the DBN to increase the efficiency of particle filtering, using a technique known as Rao-Blackwellisation. 
Essentially, this samples some of the variables, and marginalizes out the rest exactly, using the Kalman filter, HMM filter, junction tree algorithm, or any other finite-dimensional optimal filter. We show that Rao-Blackwellised particle filters (RBPFs) lead to more accurate estimates than standard PFs. We demonstrate RBPFs on two problems, namely non-stationary online regression with radial basis function networks and robot localization and map building. We also discuss other potential application areas and provide references to some finite-dimensional optimal filters.\nIn this paper, we use evidence-specific value abstraction for speeding up Bayesian network inference. This is done by grouping variable values and treating the combined values as a single entity. As we show, such abstractions can exploit regularities in conditional probability distributions and also the specific values of observed variables. To formally justify value abstraction, we define the notion of safe value abstraction and devise inference algorithms that use it to reduce the cost of inference. Our procedure is particularly useful for learning complex networks with many hidden variables. In such cases, repeated likelihood computations are required for EM or other parameter optimization techniques. Since these computations are repeated with respect to the same evidence set, our methods can provide significant speedup to the learning procedure. We demonstrate the algorithm on genetic linkage problems where the use of value abstraction sometimes makes the difference between a feasible and an infeasible solution.\nInference for belief networks using Gibbs sampling produces a distribution for unobserved variables that differs from the correct distribution by a (usually) unknown error, since convergence to the right distribution occurs only asymptotically. 
The method of \"coupling from the past\" samples from exactly the correct distribution by (conceptually) running dependent Gibbs sampling simulations from every possible starting state from a time far enough in the past that all runs reach the same state at time t=0. Explicitly considering every possible state is intractable for large networks, however. We propose a method for layered noisy-or networks that uses a compact, but often imprecise, summary of a set of states. This method samples from exactly the correct distribution, and requires only about twice the time per step of ordinary Gibbs sampling, but it may require more simulation steps than would be needed if chains were tracked exactly.\nWe describe a graphical model for probabilistic relationships, an alternative to the Bayesian network, called a dependency network. The graph of a dependency network, unlike a Bayesian network, is potentially cyclic. The probability component of a dependency network, like a Bayesian network, is a set of conditional distributions, one for each node given its parents. We identify several basic properties of this representation and describe a computationally efficient procedure for learning the graph and probability components from data. We describe the application of this representation to probabilistic inference, collaborative filtering (the task of predicting preferences), and the visualization of acausal predictive relationships.\nThere are two main objectives of this paper. The first is to present a statistical framework for models with context-specific independence structures, i.e., conditional independences holding only for specific values of the conditioning variables. This framework is constituted by the class of split models. Split models are extensions of graphical models for contingency tables and allow for more sophisticated modelling than graphical models. 
The treatment of split models includes estimation, representation and a Markov property for reading off those independencies holding in a specific context. The second objective is to present a software package named YGGDRASIL which is designed for statistical inference in split models, i.e., for learning such models on the basis of data.\nComposition of low-dimensional distributions, whose foundations were laid in the paper published in the Proceedings of UAI'97 (Jirousek 1997), appeared to be an alternative apparatus to describe multidimensional probabilistic models. In contrast to Graphical Markov Models, which define multidimensional distributions in a declarative way, this approach is rather procedural. Ordering of low-dimensional distributions into a proper sequence fully defines the respective computational procedure; therefore, a study of different types of generating sequences is one of the central problems in this field. Thus, it appears that an important role is played by special sequences that are called perfect. Their main characterization theorems are presented in this paper. However, the main result of this paper is a solution to the problem of marginalization for general sequences. The main theorem describes a way to obtain a generating sequence that defines the model corresponding to the marginal of the distribution defined by an arbitrary generating sequence. From this theorem the reader can see to what extent these computations are local; i.e., the sequence consists of marginal distributions whose computation must be made by summing up over the values of the variable eliminated (the paper deals with finite models).\nStochastic games generalize Markov decision processes (MDPs) to a multiagent setting by allowing the state transitions to depend jointly on all player actions, and having rewards determined by multiplayer matrix games at each state. We consider the problem of computing Nash equilibria in stochastic games, the analogue of planning in MDPs. 
We begin by providing a generalization of finite-horizon value iteration that computes a Nash strategy for each player in general-sum stochastic games. The algorithm takes an arbitrary Nash selection function as input, which allows the translation of local choices between multiple Nash equilibria into the selection of a single global Nash equilibrium. Our main technical result is an algorithm for computing near-Nash equilibria in large or infinite state spaces. This algorithm builds on our finite-horizon value iteration algorithm, and adapts the sparse sampling methods of Kearns, Mansour and Ng (1999) to stochastic games. We conclude by describing a counterexample showing that infinite-horizon discounted value iteration, which was shown by Shapley to converge in the zero-sum case (a result we extend slightly here), does not converge in the general-sum case.\nTo investigate the robustness of the output probabilities of a Bayesian network, a sensitivity analysis can be performed. A one-way sensitivity analysis establishes, for each of the probability parameters of a network, a function expressing a posterior marginal probability of interest in terms of the parameter. Current methods for computing the coefficients in such a function rely on a large number of network evaluations. In this paper, we present a method that requires just a single outward propagation in a junction tree for establishing the coefficients in the functions for all possible parameters; in addition, an inward propagation is required for processing evidence. Conversely, the method requires a single outward propagation for computing the coefficients in the functions expressing all possible posterior marginals in terms of a single parameter. We extend these results to an n-way sensitivity analysis in which sets of parameters are studied.\nWe introduce Game networks (G nets), a novel representation for multi-agent decision problems. 
Compared to other game-theoretic representations, such as strategic or extensive forms, G nets are more structured and more compact; more fundamentally, G nets constitute a computationally advantageous framework for strategic inference, as both probability and utility independencies are captured in the structure of the network and can be exploited in order to simplify the inference process. An important aspect of multi-agent reasoning is the identification of some or all of the strategic equilibria in a game; we present original convergence methods for strategic equilibrium which can take advantage of strategic separabilities in the G net structure in order to simplify the computations. Specifically, we describe a method which identifies a unique equilibrium as a function of the game payoffs, and one which identifies all equilibria.\nThis paper shows how the Bayesian network paradigm can be used in order to solve combinatorial optimization problems. To do so, some methods of structure learning from data and simulation of Bayesian networks are inserted inside Estimation of Distribution Algorithms (EDA). EDAs are a new tool for evolutionary computation in which populations of individuals are created by estimation and simulation of the joint probability distribution of the selected individuals. We propose new approaches to EDA for combinatorial optimization based on the theory of probabilistic graphical models. Experimental results are also presented.\nWe apply the principle of maximum entropy to select a unique joint probability distribution from the set of all joint probability distributions specified by a credal network. In detail, we start by showing that the unique joint distribution of a Bayesian tree coincides with the maximum entropy model of its conditional distributions. This result, however, no longer holds for general Bayesian networks. We thus present a new kind of maximum entropy model, which is computed sequentially. 
We then show that for all general Bayesian networks, the sequential maximum entropy model coincides with the unique joint distribution. Moreover, we apply the new principle of sequential maximum entropy to interval Bayesian networks and more generally to credal networks. In particular, we show that this application is equivalent to a number of small local entropy maximizations.\nIn this paper we present decomposable priors, a family of priors over structure and parameters of tree belief nets for which Bayesian learning with complete observations is tractable, in the sense that the posterior is also decomposable and can be completely determined analytically in polynomial time. This follows from two main results: First, we show that factored distributions over spanning trees in a graph can be integrated in closed form. Second, we examine priors over tree parameters and show that a set of assumptions similar to (Heckerman et al. 1995) constrains the tree parameter priors to be a compactly parameterized product of Dirichlet distributions. Besides allowing for exact Bayesian learning, these results permit us to formulate a new class of tractable latent variable models in which the likelihood of a data point is computed through an ensemble average over tree structures.\nThis paper deals with the representation and solution of asymmetric Bayesian decision problems. We present a formal framework, termed asymmetric influence diagrams, that is based on the influence diagram and allows an efficient representation of asymmetric decision problems. As opposed to existing frameworks, the asymmetric influence diagram primarily encodes asymmetry at the qualitative level and it can therefore be read directly from the model. We give an algorithm for solving asymmetric influence diagrams. The algorithm initially decomposes the asymmetric decision problem into a structure of symmetric subproblems organized as a tree. 
A solution to the decision problem can then be found by propagating from the leaves toward the root using existing evaluation methods to solve the sub-problems.\nWhen using Bayesian networks for modelling the behavior of man-made machinery, it usually happens that a large part of the model is deterministic. For such Bayesian networks, the deterministic part of the model can be represented as a Boolean function, and a central part of belief updating reduces to the task of calculating the number of satisfying configurations in a Boolean function. In this paper we explore how advances in the calculation of Boolean functions can be adopted for belief updating, in particular within the context of troubleshooting. We present experimental results indicating a substantial speed-up compared to traditional junction tree propagation.\nSampling is an important tool for estimating large, complex sums and integrals over high-dimensional spaces. For instance, importance sampling has been used as an alternative to exact methods for inference in belief networks. Ideally, we want to have a sampling distribution that provides optimal-variance estimators. In this paper, we present methods that improve the sampling distribution by systematically adapting it as we obtain information from the samples. We present a stochastic-gradient-descent method for sequentially updating the sampling distribution based on the direct minimization of the variance. We also present other stochastic-gradient-descent methods based on the minimization of typical notions of distance between the current sampling distribution and approximations of the target, optimal distribution. 
We finally validate and compare the different methods empirically by applying them to the problem of action evaluation in influence diagrams.\nLarge sparse sets of binary transaction data with millions of records and thousands of attributes occur in various domains: customers purchasing products, users visiting web pages, and documents containing words are just three typical examples. Real-time query selectivity estimation (the problem of estimating the number of rows in the data satisfying a given predicate) is an important practical problem for such databases. We investigate the application of probabilistic models to this problem. In particular, we study a Markov random field (MRF) approach based on frequent sets and maximum entropy, and compare it to the independence model and the Chow-Liu tree model. We find that the MRF model provides substantially more accurate probability estimates than the other methods but is more expensive from a computational and memory viewpoint. To alleviate the computational requirements we show how one can apply bucket elimination and clique tree approaches to take advantage of structure in the models and in the queries. We provide experimental results on two large real-world transaction datasets.\nWe consider the problem of belief-state monitoring for the purposes of implementing a policy for a partially-observable Markov decision process (POMDP), specifically how one might approximate the belief state. Other schemes for belief-state approximation (e.g., based on minimizing a measure such as KL-divergence between the true and estimated state) are not necessarily appropriate for POMDPs. Instead we propose a framework for analyzing value-directed approximation schemes, where approximation quality is determined by the expected error in utility rather than by the error in the belief state itself. 
We propose heuristic methods for finding good projection schemes for belief state estimation, exhibiting anytime characteristics, given a POMDP value function. We also describe several algorithms for constructing bounds on the error in decision quality (expected utility) associated with acting in accordance with a given belief state approximation.\nTechniques for plan recognition under uncertainty require a stochastic model of the plan-generation process. We introduce Probabilistic State-Dependent Grammars (PSDGs) to represent an agent's plan-generation process. The PSDG language model extends probabilistic context-free grammars (PCFGs) by allowing production probabilities to depend on an explicit model of the planning agent's internal and external state. Given a PSDG description of the plan-generation process, we can then use inference algorithms that exploit the particular independence properties of the PSDG language to efficiently answer plan-recognition queries. The combination of the PSDG language model and inference algorithms extends the range of plan-recognition domains for which practical probabilistic inference is possible, as illustrated by applications in traffic monitoring and air combat.\nDynamic trees are mixtures of tree-structured belief networks. They solve some of the problems of fixed tree networks at the cost of making exact inference intractable. For this reason approximate methods such as sampling or mean field approaches have been used. However, mean field approximations assume a factorized distribution over node states. Such a distribution seems unlikely in the posterior, as nodes are highly correlated in the prior. Here a structured variational approach is used, where the posterior distribution over the non-evidential nodes is itself approximated by a dynamic tree. It turns out that this form can be used tractably and efficiently. 
The result is a set of update rules which can propagate information through the network to obtain both a full variational approximation, and the relevant marginals. The propagation rules are more efficient than the mean field approach and give noticeable quantitative and qualitative improvement in the inference. The marginals calculated give better approximations to the posterior than loopy propagation on a small toy problem.\nWe present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that is a key component of our model. Features can have either a unique distribution in every cluster or a common distribution over some (or even all) of the clusters. The cluster subsets over which these features have such a common distribution correspond to the nodes (clusters) of the tree representing the hierarchy. We apply this general model to the problem of document clustering for which we use a multinomial likelihood function and Dirichlet priors. Our algorithm consists of a two-stage process wherein we first perform a flat clustering followed by a modified hierarchical agglomerative merging process that includes determining the features that will have common distributions over the merged clusters. The regularization induced by using the marginal likelihood automatically determines the optimal model structure, including the number of clusters, the depth of the tree and the subset of features to be modeled as having a common distribution at each node. We present experimental results on both synthetic data and a real document collection.\nConditional independence and Markov properties are powerful tools allowing expression of multidimensional probability distributions by means of low-dimensional ones. 
As multidimensional possibilistic models have been studied for several years, the demand for analogous tools in possibility theory seems to be quite natural. This paper is intended to be a promotion of de Cooman's measure-theoretic approach to possibility theory, as this approach allows us to find analogies to many important results obtained in the probabilistic framework. First, we recall semi-graphoid properties of conditional possibilistic independence, parameterized by a continuous t-norm, and find sufficient conditions for a class of Archimedean t-norms to have the graphoid property. Then we introduce Markov properties and factorization of possibility distributions (again parameterized by a continuous t-norm) and find the relationships between them. These results are accompanied by a number of counterexamples, which show that the assumptions of specific theorems are substantial.\nRecently, variational approximations such as the mean field approximation have received much interest. We extend the standard mean field method by using an approximating distribution that factorises into cluster potentials. This includes undirected graphs, directed acyclic graphs and junction trees. We derive generalized mean field equations to optimize the cluster potentials. We show that the method bridges the gap between the standard mean field approximation and the exact junction tree algorithm. In addition, we address the problem of how to choose the graphical structure of the approximating distribution. From the generalised mean field equations we derive rules to simplify the structure of the approximating distribution in advance without affecting the quality of the approximation. We also show how the method fits into some other variational approximations that are currently popular.\nAlgorithms for learning the conditional probabilities of Bayesian networks with hidden variables typically operate within a high-dimensional search space and yield only locally optimal solutions. 
One way of limiting the search space and avoiding local optima is to impose qualitative constraints that are based on background knowledge concerning the domain. We present a method for integrating formal statements of qualitative constraints into two learning algorithms, APN and EM. In our experiments with synthetic data, this method yielded networks that satisfied the constraints almost perfectly. The accuracy of the learned networks was consistently superior to that of corresponding networks learned without constraints. The exploitation of qualitative constraints therefore appears to be a promising way to increase both the interpretability and the accuracy of learned Bayesian networks with known structure.\nElicitation of probabilities is one of the most laborious tasks in building decision-theoretic models, and one that has so far received only moderate attention in decision-theoretic systems. We propose a set of user interface tools for graphical probabilistic models, focusing on two aspects of probability elicitation: (1) navigation through conditional probability tables and (2) interactive graphical assessment of discrete probability distributions. We propose two new graphical views that aid navigation in very large conditional probability tables: the CPTree (Conditional Probability Tree) and the SCPT (shrinkable Conditional Probability Table). Based on what is known about graphical presentation of quantitative data to humans, we offer several useful enhancements to probability wheel and bar graph, including different chart styles and options that can be adapted to user preferences and needs. We present the results of a simple usability study that proves the value of the proposed tools.\nComputer Poker's unique characteristics present a well-suited challenge for research in artificial intelligence. For that reason, and due to the increase in Poker's popularity in Portugal since 2008, several members of LIACC have researched in this field. 
Several works were published as papers and master theses and more recently a member of LIACC engaged in research in this area for a Ph.D. thesis in order to develop a more extensive and in-depth work. This paper describes the existing research in LIACC about Computer Poker, with special emphasis on the completed master's theses and plans for future work. This paper means to present a summary of the lab's work to the research community in order to encourage the exchange of ideas with other labs / individuals. LIACC hopes this will improve research in this area so as to reach the goal of creating an agent that surpasses the best human players.\nDiagnosis and prediction in some domains, like medical and industrial diagnosis, require a representation that combines uncertainty management and temporal reasoning. Based on the fact that in many cases there are few state changes in the temporal range of interest, we propose a novel representation called Temporal Nodes Bayesian Networks (TNBN). In a TNBN each node represents an event or state change of a variable, and an arc corresponds to a causal-temporal relationship. The temporal intervals can differ in number and size for each temporal node, so this allows multiple granularity. Our approach is contrasted with a dynamic Bayesian network for a simple medical example. An empirical evaluation is presented for a more complex problem, a subsystem of a fossil power plant, in which this approach is used for fault diagnosis and prediction with good results.\nPossibilistic logic bases and possibilistic graphs are two different frameworks of interest for representing knowledge. The former stratifies the pieces of knowledge (expressed by logical formulas) according to their level of certainty, while the latter exhibits relationships between variables. The two types of representations are semantically equivalent when they lead to the same possibility distribution (which rank-orders the possible interpretations). 
A possibility distribution can be decomposed using a chain rule which may be based on two different kinds of conditioning which exist in possibility theory (one based on product in a numerical setting, one based on minimum operation in a qualitative setting). These two types of conditioning induce two kinds of possibilistic graphs. In both cases, a translation of these graphs into possibilistic bases is provided. The converse translation from a possibilistic knowledge base into a min-based graph is also described.\nIn many domains it is desirable to assess the preferences of users in a qualitative rather than quantitative way. Such representations of qualitative preference orderings form an important component of automated decision tools. We propose a graphical representation of preferences that reflects conditional dependence and independence of preference statements under a ceteris paribus (all else being equal) interpretation. Such a representation is often compact and arguably natural. We describe several search algorithms for dominance testing based on this representation; these algorithms are quite effective, especially in specific network topologies, such as chain- and tree-structured networks, as well as polytrees.\nDynamic Bayesian networks provide a compact and natural representation for complex dynamic systems. However, in many cases, there is no expert available from whom a model can be elicited. Learning provides an alternative approach for constructing models of dynamic systems. In this paper, we address some of the crucial computational aspects of learning the structure of dynamic systems, particularly those where some relevant variables are partially observed or even entirely unknown. Our approach is based on the Structural Expectation Maximization (SEM) algorithm. The main computational cost of the SEM algorithm is the gathering of expected sufficient statistics. 
We propose a novel approximation scheme that allows these sufficient statistics to be computed efficiently. We also investigate the fundamental problem of discovering the existence of hidden variables without exhaustive and expensive search. Our approach is based on the observation that, in dynamic systems, ignoring a hidden variable typically results in a violation of the Markov property. Thus, our algorithm searches for such violations in the data, and introduces hidden variables to explain them. We provide empirical results showing that the algorithm is able to learn the dynamics of complex systems in a computationally tractable way.\nIn this paper, we empirically evaluate algorithms for learning four types of Bayesian network (BN) classifiers - Naive-Bayes, tree augmented Naive-Bayes, BN augmented Naive-Bayes and general BNs, where the latter two are learned using two variants of a conditional-independence (CI) based BN-learning algorithm. Experimental results show the obtained classifiers, learned using the CI based algorithms, are competitive with (or superior to) the best known classifiers, based on both Bayesian networks and other formalisms; and that the computational time for learning and using these classifiers is relatively small. Moreover, these results also suggest a way to learn yet more effective classifiers; we demonstrate empirically that this new algorithm does work as expected. Collectively, these results argue that BN classifiers deserve more attention in machine learning and data mining communities.\nWe present a hybrid constraint-based/Bayesian algorithm for learning causal networks in the presence of sparse data. The algorithm searches the space of equivalence classes of models (essential graphs) using a heuristic based on conventional constraint-based techniques. Each essential graph is then converted into a directed acyclic graph and scored using a Bayesian scoring metric. 
Two variants of the algorithm are developed and tested using data from randomly generated networks of sizes from 15 to 45 nodes with data sizes ranging from 250 to 2000 records. Both variations are compared to, and found to consistently outperform, two variations of greedy search with restarts.\nReinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information - the expected improvement in future decision quality arising from the information acquired by exploration. Estimating this quantity requires an assessment of the agent's uncertainty about its current value estimates for states. In this paper we investigate ways of representing and reasoning about this uncertainty in algorithms where the system attempts to learn a model of its environment. We explicitly represent uncertainty about the parameters of the model and build probability distributions over Q-values based on these. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation.\nHybrid Probabilistic Programs (HPPs) are logic programs that allow the programmer to explicitly encode his knowledge of the dependencies between events being described in the program. In this paper, we classify HPPs into three classes called HPP_1, HPP_2 and HPP_r (r >= 3). For these classes, we provide three types of results for HPPs. First, we develop algorithms to compute the set of all ground consequences of an HPP. Then we provide algorithms and complexity results for the problems of entailment (\"Given an HPP P and a query Q as input, is Q a logical consequence of P?\") and consistency (\"Given an HPP P as input, is P consistent?\"). 
Our results provide a fine characterization of when polynomial algorithms exist for the above problems, and when these problems become intractable.\nThe problem of assessing the value of a candidate is viewed here as a multiple combination problem. On the one hand a candidate can be evaluated according to different criteria, and on the other hand several experts are supposed to assess the value of candidates according to each criterion. Criteria are not equally important, and experts are not equally competent or reliable. Moreover levels of satisfaction of criteria, or levels of confidence are only assumed to take their values in qualitative scales which are just linearly ordered. The problem is discussed within two frameworks, the transferable belief model and the qualitative possibility theory. They respectively offer a quantitative and a qualitative setting for handling the problem, thus providing a way to compare the nature of the underlying assumptions.\nThis paper investigates a purely qualitative version of Savage's theory for decision making under uncertainty. Until now, most representation theorems for preference over acts rely on a numerical representation of utility and uncertainty where utility and uncertainty are commensurate. Disrupting the tradition, we relax this assumption and introduce a purely ordinal axiom requiring that the Decision Maker (DM) preference between two acts only depends on the relative position of their consequences for each state. Within this qualitative framework, we determine the only possible form of the decision rule and investigate some instances compatible with the transitivity of the strict preference. Finally we propose a mild relaxation of our ordinality axiom, leaving room for a new family of qualitative decision rules compatible with transitivity.\nIn recent years there has been significant progress in algorithms and methods for inducing Bayesian networks from data. 
However, in complex data analysis problems, we need to go beyond being satisfied with inducing networks with high scores. We need to provide confidence measures on features of these networks: Is the existence of an edge between two nodes warranted? Is the Markov blanket of a given node robust? Can we say something about the ordering of the variables? We should be able to address these questions, even when the amount of data is not enough to induce a high scoring network. In this paper we propose Efron's Bootstrap as a computationally efficient approach for answering these questions. In addition, we propose to use these confidence measures to induce better structures from the data, and to detect the presence of latent variables.\nLearning Bayesian networks is often cast as an optimization problem, where the computational task is to find a structure that maximizes a statistically motivated score. By and large, existing learning tools address this optimization problem using standard heuristic search techniques. Since the search space is extremely large, such search procedures can spend most of the time examining candidates that are extremely unreasonable. This problem becomes critical when we deal with data sets that are large either in the number of instances, or the number of attributes. In this paper, we introduce an algorithm that achieves faster learning by restricting the search space. This iterative algorithm restricts the parents of each variable to belong to a small subset of candidates. We then search for a network that satisfies these constraints. The learned network is then used for selecting better candidates for the next iteration. We evaluate this algorithm both on synthetic and real-life data. Our results show that it is significantly faster than alternative search procedures without loss of quality in the learned structures.\nIn this paper, we analyze the relationship between probability and Spohn's theory for representation of uncertain beliefs. 
Using the intuitive idea that the more probable a proposition is, the more believable it is, we study transformations from probability to Spohnian disbelief and vice-versa. The transformations described in this paper are different from those described in the literature. In particular, the former satisfies the principles of ordinal congruence while the latter does not. Such transformations between probability and Spohn's calculi can contribute to (1) a clarification of the semantics of nonprobabilistic degree of uncertain belief, and (2) the construction of a decision theory for such calculi. In practice, the transformations will allow a meaningful combination of more than one calculus in different stages of using an expert system such as knowledge acquisition, inference, and interpretation of results.\nClassical Decision Theory provides a normative framework for representing and reasoning about complex preferences. Straightforward application of this theory to automate decision making is difficult due to high elicitation cost. In response to this problem, researchers have recently developed a number of qualitative, logic-oriented approaches for representing and reasoning about preferences. While effectively addressing some expressiveness issues, these logics have not proven powerful enough for building practical automated decision making systems. In this paper we present a hybrid approach to preference elicitation and decision making that is grounded in classical multi-attribute utility theory, but can make effective use of the expressive power of qualitative approaches. Specifically, assuming a partially specified multilinear utility function, we show how comparative statements about classes of decision alternatives can be used to further constrain the utility function and thus identify sub-optimal alternatives. 
This work demonstrates that quantitative and qualitative approaches can be synergistically integrated to provide effective and flexible decision support.\nMarkov decision processes (MDPs) are becoming increasingly popular as models of decision theoretic planning. While traditional dynamic programming methods perform well for problems with small state spaces, structured methods are needed for large problems. We propose and examine a value iteration algorithm for MDPs that uses algebraic decision diagrams (ADDs) to represent value functions and policies. An MDP is represented using Bayesian networks and ADDs and dynamic programming is applied directly to these ADDs. We demonstrate our method on large MDPs (up to 63 million states) and show that significant gains can be had when compared to tree-structured representations (with up to a thirty-fold reduction in the number of nodes required to represent optimal value functions).\nWe introduce utility-directed procedures for mediating the flow of potentially distracting alerts and communications to computer users. We present models and inference procedures that balance the context-sensitive costs of deferring alerts with the cost of interruption. We describe the challenge of reasoning about such costs under uncertainty via an analysis of user activity and the content of notifications. After introducing principles of attention-sensitive alerting, we focus on the problem of guiding alerts about email messages. We dwell on the problem of inferring the expected criticality of email and discuss work on the Priorities system, centering on prioritizing email by criticality and modulating the communication of notifications to users about the presence and nature of incoming email.\nThe paper is the second in a series of two papers evaluating the power of a new scheme that generates search heuristics mechanically. The heuristics are extracted from an approximation scheme called mini-bucket elimination that was recently introduced. 
The first paper introduced the idea and evaluated it within Branch-and-Bound search. In the current paper the idea is further extended and evaluated within Best-First search. The resulting algorithms are compared on coding and medical diagnosis problems, using varying strength of the mini-bucket heuristics. Our results demonstrate an effective search scheme that permits controlled tradeoff between preprocessing (for heuristic generation) and search. Best-first search is shown to outperform Branch-and-Bound, when supplied with good heuristics, and sufficient memory space.\nThe clique tree algorithm is the standard method for doing inference in Bayesian networks. It works by manipulating clique potentials - distributions over the variables in a clique. While this approach works well for many networks, it is limited by the need to maintain an exact representation of the clique potentials. This paper presents a new unified approach that combines approximate inference and the clique tree algorithm, thereby circumventing this limitation. Many known approximate inference algorithms can be viewed as instances of this approach. The algorithm essentially does clique tree propagation, using approximate inference to estimate the densities in each clique. In many settings, the computation of the approximate clique potential can be done easily using statistical importance sampling. Iterations are used to gradually improve the quality of the estimation.\nPoker is ideal for testing automated reasoning under uncertainty. It introduces uncertainty both by physical randomization and by incomplete information about opponents' hands. Another source of uncertainty is the limited information available to construct psychological models of opponents, their tendencies to bluff, play conservatively, reveal weakness, etc. and the relation between their hand strengths and betting behaviour. 
All of these uncertainties must be assessed accurately and combined effectively for any reasonable level of skill in the game to be achieved, since good decision making is highly sensitive to those tasks. We describe our Bayesian Poker Program (BPP), which uses a Bayesian network to model the program's poker hand, the opponent's hand and the opponent's playing behaviour conditioned upon the hand, and betting curves which govern play given a probability of winning. The history of play with opponents is used to improve BPP's understanding of their behaviour. We compare BPP experimentally with: a simple rule-based system; a program which depends exclusively on hand probabilities (i.e., without opponent modeling); and with human players. BPP has shown itself to be an effective player against all these opponents, barring the better humans. We also sketch out some likely ways of improving play.\nSolving symmetric Bayesian decision problems is a computationally intensive task to perform regardless of the algorithm used. In this paper we propose a method for improving the efficiency of algorithms for solving Bayesian decision problems. The method is based on the principle of lazy evaluation - a principle recently shown to improve the efficiency of inference in Bayesian networks. The basic idea is to maintain decompositions of potentials and to postpone computations for as long as possible. The efficiency improvements obtained with the lazy evaluation based method are emphasized through examples. Finally, the lazy evaluation based method is compared with the HUGIN and valuation-based systems architectures for solving symmetric Bayesian decision problems.\nSolving partially observable Markov decision processes (POMDPs) is highly intractable in general, at least in part because the optimal policy may be infinitely large. In this paper, we explore the problem of finding the optimal policy from a restricted set of policies, represented as finite state automata of a given size. 
This problem is also intractable, but we show that the complexity can be greatly reduced when the POMDP and/or policy are further constrained. We demonstrate good empirical results with a branch-and-bound method for finding globally optimal deterministic policies, and a gradient-ascent method for finding locally optimal stochastic policies.\nReactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore's VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider the question of under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of past observations to compensate for the lack of observability at each time-step.\nAs observations and student models become complex, educational assessments that exploit advances in technology and cognitive psychology can outstrip familiar testing models and analytic methods. Within the Portal conceptual framework for assessment design, Bayesian inference networks (BINs) record beliefs about students' knowledge and skills, in light of what they say and do. Joining evidence model BIN fragments - which contain observable variables and pointers to student model variables - to the student model allows one to update belief about knowledge and skills as observations arrive. 
Markov Chain Monte Carlo (MCMC) techniques can estimate the required conditional probabilities from empirical data, supplemented by expert judgment or substantive theory. Details for the special cases of item response theory (IRT) and multivariate latent class modeling are given, with a numerical example of the latter.\nIn this paper we present a new Bayesian network model for classification that combines the naive-Bayes (NB) classifier and the finite-mixture (FM) classifier. The resulting classifier aims at relaxing the strong assumptions on which the two component models are based, in an attempt to improve on their classification performance, both in terms of accuracy and in terms of calibration of the estimated probabilities. The proposed classifier is obtained by superimposing a finite mixture model on the set of feature variables of a naive Bayes model. We present experimental results that compare the predictive performance on real datasets of the new classifier with the predictive performance of the NB classifier and the FM classifier.\nRecently, researchers have demonstrated that loopy belief propagation - the use of Pearl's polytree algorithm in a Bayesian network with loops - can perform well in the context of error-correcting codes. The most dramatic instance of this is the near Shannon-limit performance of Turbo Codes - codes whose decoding algorithm is equivalent to loopy belief propagation in a chain-structured Bayesian network. In this paper we ask: is there something special about the error-correcting code context, or does loopy propagation work as an approximate inference scheme in a more general setting? 
We compare the marginals computed using loopy propagation to the exact ones in four Bayesian network architectures, including two real-world networks: ALARM and QMR. We find that the loopy beliefs often converge and when they do, they give a good approximation to the correct marginals. However, on the QMR network, the loopy beliefs oscillated and had no obvious relationship to the correct posteriors. We present some initial investigations into the cause of these oscillations, and show that some simple methods of preventing them lead to the wrong results.\nA major problem for the learning of Bayesian networks (BNs) is the exponential number of parameters needed for conditional probability tables. Recent research reduces this complexity by modeling local structure in the probability tables. We examine the use of log-linear local models. While log-linear models in this context are not new (Whittaker, 1990; Buntine, 1991; Neal, 1992; Heckerman and Meek, 1997), for structure learning they are generally subsumed under a naive Bayes model. We describe an alternative interpretation, and use a Minimum Message Length (MML) (Wallace, 1987) metric for structure learning of networks exhibiting causal independence, which we term first-order networks (FONs). We also investigate local model selection on a node-by-node basis.\nThe need to help people choose among large numbers of items and to filter through large amounts of information has led to a flood of research in construction of personal recommendation agents. One of the central issues in constructing such agents is the representation and elicitation of user preferences or interests. This topic has long been studied in Decision Theory, but surprisingly little work in the area of recommender systems has made use of formal decision-theoretic techniques. This paper describes DIVA, a decision-theoretic agent for recommending movies that contains a number of novel features. 
DIVA represents user preferences using pairwise comparisons among items, rather than numeric ratings. It uses a novel similarity measure based on the concept of the probability of conflict between two orderings of items. The system has a rich representation of preference, distinguishing between a user's general taste in movies and his immediate interests. It takes an incremental approach to preference elicitation in which the user can provide feedback if not satisfied with the recommendation list. We empirically evaluate the performance of the system using the EachMovie collaborative filtering database.\nGraphical models based on conditional independence support concise encodings of the subjective belief of a single agent. A natural question is whether the consensus belief of a group of agents can be represented with equal parsimony. We prove, under relatively mild assumptions, that even if everyone agrees on a common graph topology, no method of combining beliefs can maintain that structure. Even weaker conditions rule out local aggregation within conditional probability tables. On a more positive note, we show that if probabilities are combined with the logarithmic opinion pool (LogOP), then commonly held Markov independencies are maintained. This suggests a straightforward procedure for constructing a consensus Markov network. We describe an algorithm for computing the LogOP with time complexity comparable to that of exact Bayesian inference.\nBayesian Networks (BN) provide robust probabilistic methods of reasoning under uncertainty, but despite their formal grounding being strictly based on the notion of conditional dependence, not much attention has been paid so far to their use in dependability analysis. The aim of this paper is to propose BN as a suitable tool for dependability analysis, by challenging the formalism with basic issues arising in dependability tasks. We will discuss how both modeling and analysis issues can be naturally dealt with by BN. 
Moreover, we will show how some limitations intrinsic to combinatorial dependability methods such as Fault Trees can be overcome using BN. This will be pursued through the study of a real-world example concerning the reliability analysis of a redundant digital Programmable Logic Controller (PLC) with majority voting 2:3.\nInference networks have a variety of important uses and are constructed by persons having quite different standpoints. Discussed in this paper are three different but complementary methods for generating and analyzing probabilistic inference networks. The first method, though over eighty years old, is very useful for knowledge representation in the task of constructing probabilistic arguments. It is also useful as a heuristic device in generating new forms of evidence. The other two methods are formally equivalent ways for combining probabilities in the analysis of inference networks. The use of these three methods is illustrated in an analysis of a mass of evidence in a celebrated American law case.\nHidden Markov models (HMMs) and partially observable Markov decision processes (POMDPs) form a useful tool for modeling dynamical systems. They are particularly useful for representing environments such as road networks and office buildings, which are typical for robot navigation and planning. The work presented here is concerned with acquiring such models. We demonstrate how domain-specific information and constraints can be incorporated into the statistical estimation process, greatly improving the learned models in terms of the model quality, the number of iterations required for convergence and robustness to reduction in the amount of available data. We present new initialization heuristics which can be used even when the data suffers from cumulative rotational error, new update rules for the model parameters, as an instance of generalized EM, and a strategy for enforcing complete geometrical consistency in the model. 
Experimental results demonstrate the effectiveness of our approach for both simulated and real robot data, in traditionally hard-to-learn environments.\nThe deontic logic DUS is a Deontic Update Semantics for prescriptive obligations based on the update semantics of Veltman. In DUS the definition of logical validity of obligations is not based on static truth values but on dynamic action transitions. In this paper prescriptive defeasible obligations are formalized in update semantics and the diagnostic problem of defeasible deontic logic is discussed. Assume a defeasible obligation `normally A ought to be (done)' together with the fact `A is not (done).' Is this an exception to the normality claim, or is it a violation of the obligation? In this paper we formalize the heuristic principle that it is a violation, unless there is a more specific overriding obligation. The underlying motivation from legal reasoning is that criminals should have as few opportunities as possible to excuse themselves by claiming that their behavior was exceptional rather than criminal.\nIn building Bayesian belief networks, the elicitation of all probabilities required can be a major obstacle. We learned the extent of this often-cited observation in the construction of the probabilistic part of a complex influence diagram in the field of cancer treatment. Based upon our negative experiences with existing methods, we designed a new method for probability elicitation from domain experts. The method combines various ideas, among which are the ideas of transcribing probabilities and of using a scale with both numerical and verbal anchors for marking assessments. In the construction of the probabilistic part of our influence diagram, the method proved to allow for the elicitation of many probabilities in little time.\nThe AGM theory of belief revision has become an important paradigm for investigating rational belief changes. 
Unfortunately, researchers working in this paradigm have restricted much of their attention to rather simple representations of belief states, namely logically closed sets of propositional sentences. In our opinion, this has resulted in an overly abstract categorisation of belief change operations: expansion, revision, or contraction. Occasionally, in the AGM paradigm, probabilistic belief changes have also been considered, and it is widely accepted that the probabilistic version of expansion is conditioning. However, we argue that it may be more correct to view conditioning and expansion as two essentially different kinds of belief change, and that what we call constraining is a better candidate for being considered probabilistic expansion.\nIt is well-known that the notion of (strong) conditional independence (CI) is too restrictive to capture independencies that only hold in certain contexts. This kind of contextual independency, called context-strong independence (CSI), can be used to facilitate the acquisition, representation, and inference of probabilistic knowledge. In this paper, we suggest the use of contextual weak independence (CWI) in Bayesian networks. It should be emphasized that the notion of CWI is a more general form of contextual independence than CSI. Furthermore, if the contextual strong independence holds for all contexts, then the notion of CSI becomes strong CI. On the other hand, if the weak contextual independence holds for all contexts, then the notion of CWI becomes weak independence (WI), which is a more general noncontextual independency than strong CI. More importantly, complete axiomatizations are studied for both the class of WI and the class of CI and WI together. 
Finally, the interesting property of WI being a necessary and sufficient condition for ensuring consistency in granular probabilistic networks is shown.\nAs Bayesian networks are applied to larger and more complex problem domains, search for flexible modeling and more efficient inference methods is an ongoing effort. Multiply sectioned Bayesian networks (MSBNs) extend the HUGIN inference for Bayesian networks into a coherent framework for flexible modeling and distributed inference. Lazy propagation extends the Shafer-Shenoy and HUGIN inference methods with reduced space complexity. We apply the Shafer-Shenoy and lazy propagation to inference in MSBNs. The combination of the MSBN framework and lazy propagation provides a better framework for modeling and inference in very large domains. It retains the modeling flexibility of MSBNs and reduces the runtime space complexity, allowing exact inference in much larger domains given the same computational resources.\nRecent interests in dynamic decision modeling have led to the development of several representation and inference methods. These methods, however, have limited application under time-critical conditions where a trade-off between model quality and computational tractability is essential. This paper presents an approach to time-critical dynamic decision modeling. A knowledge representation and modeling method called the time-critical dynamic influence diagram is proposed. The formalism has two forms. The condensed form is used for modeling and model abstraction, while the deployed form, which can be converted from the condensed form, is used for inference purposes. The proposed approach has the ability to represent space-temporal abstraction within the model. A knowledge-based meta-reasoning approach is proposed for the purpose of selecting the best abstracted model that provides the optimal trade-off between model quality and model tractability. 
An outline of the knowledge-based model construction algorithm is also provided.\nIn this paper we propose a logic-based framework inspired by artificial intelligence, but scaled down for practical database and programming applications. Computation in the framework is viewed as the task of generating a sequence of state transitions, with the purpose of making an agent's goals all true. States are represented by sets of atomic sentences (or facts), representing the values of program variables, tuples in a coordination language, facts in relational databases, or Herbrand models. In the model-theoretic semantics, the entire sequence of states and events is combined into a single model-theoretic structure, by associating timestamps with facts and events. But in the operational semantics, facts are updated destructively, without timestamps. We show that the model generated by destructive updates is identical to the model generated by reasoning with facts containing timestamps. We also extend the model with intentional predicates and composite event predicates defined by logic programs containing conditions in first-order logic, which query the current state.\nPossibilistic logic is a well-known graded logic of uncertainty suitable for reasoning under incomplete information and partially inconsistent knowledge, which is built upon classical first order logic. There exists for Possibilistic logic a proof procedure based on a refutation complete resolution-style calculus. Recently, a syntactical extension of first order Possibilistic logic (called PLFC) dealing with fuzzy constants and fuzzily restricted quantifiers has been proposed. 
Our aim is to present steps towards both the formalization of PLFC itself and an automated deduction system for it by (i) providing a formal semantics; (ii) defining a sound resolution-style calculus by refutation; and (iii) describing a first-order proof procedure for PLFC clauses based on (ii) and on a novel notion of most general substitution of two literals in a resolution step. In contrast to standard Possibilistic logic semantics, truth-evaluation of formulas with fuzzy constants is many-valued instead of Boolean, and consequently an extended notion of possibilistic uncertainty is also needed.\nThere exist two general forms of exact algorithms for updating probabilities in Bayesian Networks. The first approach involves using a structure, usually a clique tree, and performing local message-based calculation to extract the belief in each variable. The second general class of algorithm involves the use of non-serial dynamic programming techniques to extract the belief in some desired group of variables. In this paper we present a hybrid algorithm based on the latter approach yet possessing the ability to retrieve the belief in all single variables. The technique is advantageous in that it saves an NP-hard computation step over using one algorithm of each type. Furthermore, this technique reinforces a conjecture of Jensen and Jensen [JJ94] in that it still requires a single NP-hard step to set up the structure on which inference is performed, as we show by confirming Li and D'Ambrosio's [LD94] conjectured NP-hardness of OFP.\nRecent research in decision theoretic planning has focussed on making the solution of Markov decision processes (MDPs) more feasible. We develop a family of algorithms for structured reachability analysis of MDPs that are suitable when an initial state (or set of states) is known. 
Using compact, structured representations of MDPs (e.g., Bayesian networks), our methods, which vary in the tradeoff between complexity and accuracy, produce structured descriptions of (estimated) reachable states that can be used to eliminate variables or variable values from the problem description, reducing the size of the MDP and making it easier to solve. One contribution of our work is the extension of ideas from GRAPHPLAN to deal with the distributed nature of action representations typically embodied within Bayes nets and the problem of correlated action effects. We also demonstrate that our algorithm can be made more complete by using k-ary constraints instead of binary constraints. Another contribution is the illustration of how the compact representation of reachability constraints can be exploited by several existing (exact and approximate) abstraction algorithms for MDPs.\nInformation Retrieval (IR) is concerned with the identification of documents in a collection that are relevant to a given information need, usually represented as a query containing terms or keywords, which are supposed to be a good description of what the user is looking for. IR systems may improve their effectiveness (i.e., increasing the number of relevant documents retrieved) by using a process of query expansion, which automatically adds new terms to the original query posed by a user. In this paper we develop a method of query expansion based on Bayesian networks. Using a learning algorithm, we construct a Bayesian network that represents some of the relationships among the terms appearing in a given document collection; this network is then used as a thesaurus (specific to that collection). We also report the results obtained by our method on three standard test collections.\nIt is well known that one can ignore parts of a belief network when computing answers to certain probabilistic queries. 
It is also well known that the ignorable parts (if any) depend on the specific query of interest and, therefore, may change as the query changes. Algorithms based on jointrees, however, do not seem to take computational advantage of these facts given that they typically construct jointrees for worst-case queries; that is, queries for which every part of the belief network is considered relevant. To address this limitation, we propose in this paper a method for reconfiguring jointrees dynamically as the query changes. The reconfiguration process aims at maintaining a jointree which corresponds to the underlying belief network after it has been pruned given the current query. Our reconfiguration method is marked by three characteristics: (a) it is based on a non-classical definition of jointrees; (b) it is relatively efficient; and (c) it can reuse some of the computations performed before a jointree is reconfigured. We present preliminary experimental results which demonstrate significant savings over using static jointrees when query changes are considerable.\nIn recent years there has been a flurry of work on learning Bayesian networks from data. One of the hard problems in this area is how to effectively learn the structure of a belief network from incomplete data, that is, in the presence of missing values or hidden variables. In a recent paper, I introduced an algorithm called Structural EM that combines the standard Expectation Maximization (EM) algorithm, which optimizes parameters, with structure search for model selection. That algorithm learns networks based on penalized likelihood scores, which include the BIC/MDL score and various approximations to the Bayesian score. In this paper, I extend Structural EM to deal directly with Bayesian model selection. 
I prove the convergence of the resulting algorithm and show how to apply it for learning a large class of probabilistic models, including Bayesian networks and some variants thereof.\nThis paper (1) shows that the best-supported current psychological theory (Cheng, 1997) of how human subjects judge the causal power or influence of variations in presence or absence of one feature on another, given data on their covariation, tacitly uses a Bayes network which is either a noisy-OR gate (for causes that promote the effect) or a noisy-AND gate (for causes that inhibit the effect); (2) generalizes Cheng's theory to arbitrary acyclic networks of noisy-OR and noisy-AND gates; (3) gives various sufficient conditions for the estimation of the parameters in such networks when there are independent, unobserved causes; (4) distinguishes direct causal influence of one feature on another (influence along a path with one edge) from total influence (influence along all paths from one variable to another) and gives sufficient conditions for estimating each when there are unobserved causes of the outcome variable; (5) describes the relation between Cheng models and a simplified version of the Rubin framework for representing causal relations.\nWhile decision theory provides an appealing normative framework for representing rich preference structures, eliciting utility or value functions typically incurs a large cost. For many applications involving interactive systems this overhead precludes the use of formal decision-theoretic models of preference. Instead of performing elicitation in a vacuum, it would be useful if we could augment directly elicited preferences with some appropriate default information. In this paper we propose a case-based approach to alleviating the preference elicitation bottleneck. 
Assuming the existence of a population of users from whom we have elicited complete or incomplete preference structures, we propose eliciting the preferences of a new user interactively and incrementally, using the closest existing preference structures as potential defaults. Since a notion of closeness demands a measure of distance among preference structures, this paper takes the first step of studying various distance measures over fully and partially specified preference structures. We explore the use of Euclidean distance and Spearman's footrule, and define a new measure, the probabilistic distance. We provide computational techniques for all three measures.\nWe investigate the use of temporally abstract actions, or macro-actions, in the solution of Markov decision processes. Unlike current models that combine both primitive actions and macro-actions and leave the state space unchanged, we propose a hierarchical model (using an abstract MDP) that works with macro-actions only, and that significantly reduces the size of the state space. This is achieved by treating macro-actions as local policies that act in certain regions of state space, and by restricting states in the abstract MDP to those at the boundaries of regions. The abstract MDP approximates the original and can be solved more efficiently. We discuss several ways in which macro-actions can be generated to ensure good solution quality. Finally, we consider ways in which macro-actions can be reused to solve multiple, related MDPs, and we show that this can justify the computational overhead of macro-action generation.\nPeople using consumer software applications typically do not use technical jargon when querying an online database of help topics. Rather, they attempt to communicate their goals with common words and phrases that describe software functionality in terms of structure and objects they understand. 
We describe a Bayesian approach to modeling the relationship between words in a user's query for assistance and the informational goals of the user. After reviewing the general method, we describe several extensions that center on integrating additional distinctions and structure about language usage and user goals into the Bayesian models.\nStochastic search algorithms are among the most successful approaches for solving hard combinatorial problems. A large class of stochastic search approaches can be cast into the framework of Las Vegas Algorithms (LVAs). As the run-time behavior of LVAs is characterized by random variables, detailed knowledge of run-time distributions provides important information for the analysis of these algorithms. In this paper we propose a novel methodology for evaluating the performance of LVAs, based on the identification of empirical run-time distributions. We exemplify our approach by applying it to Stochastic Local Search (SLS) algorithms for the satisfiability problem (SAT) in propositional logic. We point out pitfalls arising from the use of improper empirical methods and discuss the benefits of the proposed methodology for evaluating and comparing LVAs.\nWe present an anytime algorithm which computes policies for decision problems represented as multi-stage influence diagrams. Our algorithm constructs policies incrementally, starting from a policy which makes no use of the available information. The incremental process constructs policies that include more of the information available to the decision maker at each step. While the process converges to the optimal policy, our approach is designed for situations in which computing the optimal policy is infeasible. 
We provide examples of the process on several large decision problems, showing that, for these examples, the process constructs valuable (but sub-optimal) policies before the optimal policy would be available by traditional methods.\nThe adaptation to situations of sequential choice under uncertainty of decision criteria which deviate from (subjective) expected utility raises the problem of ensuring the selection of a nondominated strategy. In particular, when following the suggestion of Machina and McClennen of giving up separability (also known as consequentialism), which requires the choice of a substrategy in a subtree to depend only on data relevant to that subtree, one must renounce the use of dynamic programming, since Bellman's principle is no longer valid. An interpretation of McClennen's resolute choice, based on cooperation between the successive Selves of the decision maker, is proposed. Implementations of resolute choice which prevent Money Pumps, negative prices of information or, more generally, choices of dominated strategies, while remaining computationally tractable, are proposed.\nThis paper proposes a method to find the actual state of a complex dynamic system from information coming from the sensors on the system itself, or on its environment. The nominal evolution of the system is a priori known and can be modeled (by an expert, for example) by different methods. In this paper, Petri nets have been chosen. Contrary to the usual use of Petri nets, the initial state of the system is unknown. So a degree of belief is bound to each place, or set of places. The theory used to model this uncertainty is Dempster-Shafer theory, which is well adapted to this type of problem. 
From the given Petri net characterizing the nominal evolution of the dynamic system, and from the observation inputs, the proposed method allows one to determine, according to the reliability of the model and the inputs, the state of the system at any time.\nQualitative probabilistic reasoning in a Bayesian network often reveals tradeoffs: relationships that are ambiguous due to competing qualitative influences. We present two techniques that combine qualitative and numeric probabilistic reasoning to resolve such tradeoffs, inferring the qualitative relationship between nodes in a Bayesian network. The first approach incrementally marginalizes nodes that contribute to the ambiguous qualitative relationships. The second approach evaluates approximate Bayesian networks for bounds of probability distributions, and uses these bounds to determine the qualitative relationships in question. This approach is also incremental in that the algorithm refines the state spaces of random variables for tighter bounds until the qualitative relationships are resolved. Both approaches provide systematic methods for tradeoff resolution at potentially lower computational cost than application of purely numeric methods.\nWe present locally complete inference rules for probabilistic deduction from taxonomic and probabilistic knowledge-bases over conjunctive events. Crucially, in contrast to similar inference rules in the literature, our inference rules are locally complete for conjunctive events and under additional taxonomic knowledge. We discover that our inference rules are extremely complex and that it is at first glance not clear at all where the deduced tightest bounds come from. Moreover, analyzing the global completeness of our inference rules, we find examples of globally very incomplete probabilistic deductions. More generally, we even show that all systems of inference rules for taxonomic and probabilistic knowledge-bases over conjunctive events are globally incomplete. 
We conclude that probabilistic deduction by the iterative application of inference rules on interval restrictions for conditional probabilities, even though considered very promising in the literature so far, seems very limited in its field of application.\nThe efficiency of algorithms using secondary structures for probabilistic inference in Bayesian networks can be improved by exploiting independence relations induced by evidence and the direction of the links in the original network. In this paper we present an algorithm that on-line exploits independence relations induced by evidence and the direction of the links in the original network to reduce both time and space costs. Instead of multiplying the conditional probability distributions for the various cliques, we determine on-line which potentials to multiply when a message is to be produced. The performance improvement of the algorithm is emphasized through empirical evaluations involving large real-world Bayesian networks, and we compare the method with the HUGIN and Shafer-Shenoy inference algorithms.\nSeveral authors have explained that the likelihood ratio measures the strength of the evidence represented by observations in statistical problems. This idea works fine when the goal is to evaluate the strength of the available evidence for a simple hypothesis versus another simple hypothesis. However, the applicability of this idea is limited to simple hypotheses because the likelihood function is primarily defined on points (simple hypotheses) of the parameter space. In this paper we define a general weight of evidence that is applicable to both simple and composite hypotheses. It is based on the Dempster-Shafer concept of plausibility and is shown to be a generalization of the likelihood ratio. Functional models are of fundamental importance for the general weight of evidence proposed in this paper. 
The relevant concepts and ideas are explained by means of a familiar urn problem, and the general analysis of a real-world medical problem is presented.\nIn this paper we address the problem of discretization in the context of learning Bayesian networks (BNs) from data containing both continuous and discrete variables. We describe a new technique for multivariate discretization, whereby each continuous variable is discretized while taking into account its interaction with the other variables. The technique is based on the use of a Bayesian scoring metric that scores the discretization policy for a continuous variable given a BN structure and the observed data. Since the metric is relative to the BN structure currently being evaluated, the discretization of a variable needs to be dynamically adjusted as the BN structure changes.\nDistributed knowledge-based applications in open domains rely on common sense information which is bound to be uncertain and incomplete. To draw useful conclusions from ambiguous data, one must address uncertainties and conflicts incurred in a holistic view. No integrated frameworks are viable without an in-depth analysis of conflicts incurred by uncertainties. In this paper, we give such an analysis and, based on the result, propose an integrated framework. Our framework extends definite argumentation theory to model uncertainty. It supports three views over conflicting and uncertain knowledge. Thus, knowledge engineers can draw different conclusions depending on the application context (i.e. view). We also give an illustrative example on strategic decision support to show the practical usefulness of our framework.\nThe process of diagnosis involves learning about the state of a system from various observations of symptoms or findings about the system. Sophisticated Bayesian (and other) algorithms have been developed to revise and maintain beliefs about the system as observations are made. 
Nonetheless, diagnostic models have tended to ignore some common-sense reasoning exploited by human diagnosticians; in particular, one can learn from which observations have not been made, in the spirit of conversational implicature. There are two concepts that we describe to extract information from the observations not made. First, some symptoms, if present, are more likely to be reported before others. Second, most human diagnosticians and expert systems are economical in their data-gathering, searching first where they are more likely to find symptoms present. Thus, there is a desirable bias toward reporting symptoms that are present. We develop a simple model for these concepts that can significantly improve diagnostic inference.\nThere is evidence that the numbers in probabilistic inference don't really matter. This paper considers the idea that we can make a probabilistic model simpler by making fewer distinctions. Unfortunately, the level of a Bayesian network seems too coarse; it is unlikely that a parent will make little difference for all values of the other parents. In this paper we consider an approximation scheme where distinctions can be ignored in some contexts, but not in other contexts. We elaborate on a notion of a parent context that allows a structured context-specific decomposition of a probability distribution and the associated probabilistic inference scheme called probabilistic partial evaluation (Poole 1997). This paper shows a way to simplify a probabilistic model by ignoring distinctions which have similar probabilities, a method to exploit the simpler model, a bound on the resulting errors, and some preliminary empirical results on simple networks.\nIt was recently shown that the problem of decoding messages transmitted through a noisy channel can be formulated as a belief updating task over a probabilistic network [McEliece]. 
Moreover, it was observed that iterative application of the (linear time) Pearl's belief propagation algorithm designed for polytrees outperformed state of the art decoding algorithms, even though the corresponding networks may have many cycles. This paper demonstrates empirically that an approximation algorithm approx-mpe for solving the most probable explanation (MPE) problem, developed within the recently proposed mini-bucket elimination framework [Dechter96], outperforms iterative belief propagation on classes of coding networks that have bounded induced width. Our experiments suggest that approximate MPE decoders can be good competitors to the approximate belief updating decoders.\nThis paper describes a decision theoretic formulation of learning the graphical structure of a Bayesian Belief Network from data. This framework subsumes the standard Bayesian approach of choosing the model with the largest posterior probability as the solution of a decision problem with a 0-1 loss function and allows the use of more general loss functions able to trade-off the complexity of the selected model and the error of choosing an oversimplified model. A new class of loss functions, called disintegrable, is introduced, to allow the decision problem to match the decomposability of the graphical model. With this class of loss functions, the optimal solution to the decision problem can be found using an efficient bottom-up search strategy.\nOne of the benefits of belief networks and influence diagrams is that so much knowledge is captured in the graphical structure. In particular, statements of conditional irrelevance (or independence) can be verified in time linear in the size of the graph. 
To resolve a particular inference query or decision problem, only some of the possible states and probability distributions must be specified, the \"requisite information.\" This paper presents a new, simple, and efficient \"Bayes-ball\" algorithm which is well-suited to both new students of belief networks and state-of-the-art implementations. The Bayes-ball algorithm determines irrelevant sets and requisite information more efficiently than existing methods, and is linear in the size of the graph for belief networks and influence diagrams.\nA constant rebalanced portfolio is an asset allocation algorithm which keeps the same distribution of wealth among a set of assets over a period of time. Recently, there has been work on on-line portfolio selection algorithms which are competitive with the best constant rebalanced portfolio determined in hindsight. By their nature, these algorithms employ the assumption that high returns can be achieved using a fixed asset allocation strategy. However, stock markets are far from being stationary and in many cases the wealth achieved by a constant rebalanced portfolio is much smaller than the wealth achieved by an ad-hoc investment strategy that adapts to changes in the market. In this paper we present an efficient Bayesian portfolio selection algorithm that is able to track a changing market. We also describe a simple extension of the algorithm for the case of a general transaction cost, including the transaction cost models recently investigated by Blum and Kalai. We provide a simple analysis of the competitiveness of the algorithm and check its performance on real stock data from the New York Stock Exchange accumulated during a 22-year period.\nThe paper gives a few arguments in favour of the use of chain graphs for the description of probabilistic conditional independence structures. 
Every Bayesian network model can be equivalently introduced by means of a factorization formula with respect to a chain graph which is Markov equivalent to the Bayesian network. A graphical characterization of such graphs is given. The class of equivalent graphs can be represented by a distinguished graph which is called the largest chain graph. The factorization formula with respect to the largest chain graph is a basis of a proposal of how to represent the corresponding (discrete) probability distribution in a computer (i.e. parametrize it). This way does not depend on the choice of a particular Bayesian network from the class of equivalent networks and seems to be the most efficient way from the point of view of memory demands. A separation criterion for reading independency statements from a chain graph is formulated in a simpler way. It resembles the well-known d-separation criterion for Bayesian networks and can be implemented locally.\nWe describe computationally efficient methods for learning mixtures in which each component is a directed acyclic graphical model (mixtures of DAGs or MDAGs). We argue that simple search-and-score algorithms are infeasible for a variety of problems, and introduce a feasible approach in which parameter and structure search is interleaved and expected data is treated as real data. Our approach can be viewed as a combination of (1) the Cheeseman--Stutz asymptotic approximation for model posterior probability and (2) the Expectation--Maximization algorithm. We evaluate our procedure for selecting among MDAGs on synthetic and real examples.\nIn the real world, insufficient information, limited computation resources, and complex problem structures often force an autonomous agent to make a decision in time less than that required to solve the problem at hand completely. Flexible and approximate computations are two approaches to decision making under limited computation resources. 
Flexible computation helps an agent to flexibly allocate limited computation resources so that the overall system utility is maximized. Approximate computation enables an agent to find the best satisfactory solution within a deadline. In this paper, we present two state-space reduction methods for flexible and approximate computation: quantitative reduction to deal with inaccurate heuristic information, and structural reduction to handle complex problem structures. These two methods can be applied successively to continuously improve solution quality if more computation is available. Our results show that these reduction methods are effective and efficient, finding better solutions with less computation than some existing well-known methods.\nThe task of expert finding has been getting increasing attention in the information retrieval literature. However, the current state-of-the-art is still lacking in principled approaches for combining different sources of evidence in an optimal way. This paper explores the usage of learning-to-rank methods as a principled approach for combining multiple estimators of expertise, derived from the textual contents, from the graph-structure with the citation patterns for the community of experts, and from profile information about the experts. Experiments made over a dataset of academic publications, for the area of Computer Science, attest to the adequacy of the proposed approaches.\nIn the winter of 1967 Cambridge radio astronomers discovered a new type of radio source of such an artificial-seeming nature that for a few weeks some members of the group had to seriously consider whether they had discovered an extraterrestrial intelligence. Although their investigations led them to a natural explanation (they had discovered pulsars), they had discussed the implications if it was indeed an artificial source: how to verify such a conclusion and how to announce it, and whether such a discovery might be dangerous. 
In this they presaged many of the components of the SETI Detection Protocols and the proposed Reply Protocols which have been used to guide the responses of groups dealing with the detection of an extraterrestrial intelligence. These Protocols were only established some twenty-five years later in the 1990s and 2000s. Using contemporary and near-contemporary documentation and later recollections, this paper discusses in detail what happened that winter.\nWide-angle sonar mapping of the environment by a mobile robot is nontrivial due to several sources of uncertainty: dropouts due to \"specular\" reflections, obstacle location uncertainty due to the wide beam, and distance measurement error. Earlier papers address the latter problems, but dropouts remain a problem in many environments. We present an approach that lifts the overoptimistic independence assumption used in earlier work, and use Bayes nets to represent the dependencies between objects of the model. Objects of the model consist of readings, and of regions in which \"quasi location invariance\" of the (possible) obstacles exists, with respect to the readings. Simulation supports the method's feasibility. The model is readily extensible to allow for prior distributions, as well as other types of sensing operations.\nWe present an algorithm for arc reversal in Bayesian networks with tree-structured conditional probability tables, and consider some of its advantages, especially for the simulation of dynamic probabilistic networks. In particular, the method allows one to produce CPTs for nodes involved in the reversal that exploit regularities in the conditional distributions. We argue that this approach alleviates some of the overhead associated with arc reversal, plays an important role in evidence integration and can be used to restrict sampling of variables in DPNs. We also provide an algorithm that detects the dynamic irrelevance of state variables in forward simulation. 
This algorithm exploits the structured CPTs in a reversed network to determine, in a time-independent fashion, the conditions under which a variable does or does not need to be sampled.\nRecently several researchers have investigated techniques for using data to learn Bayesian networks containing compact representations for the conditional probability distributions (CPDs) stored at each node. The majority of this work has concentrated on using decision-tree representations for the CPDs. In addition, researchers typically apply non-Bayesian (or asymptotically Bayesian) scoring functions such as MDL to evaluate the goodness-of-fit of networks to the data. In this paper we investigate a Bayesian approach to learning Bayesian networks that contain the more general decision-graph representations of the CPDs. First, we describe how to evaluate the posterior probability, that is, the Bayesian score, of such a network, given a database of observed cases. Second, we describe various search spaces that can be used, in conjunction with a scoring function and a search procedure, to identify one or more high-scoring networks. Finally, we present an experimental evaluation of the search spaces, using a greedy algorithm and a Bayesian scoring function.\nIt has been shown that a class of probabilistic domain models cannot be learned correctly by several existing algorithms which employ a single-link look-ahead search. When a multi-link look-ahead search is used, the computational complexity of the learning algorithm increases. We study how to use parallelism to tackle the increased complexity in learning such models and to speed up learning in large domains. An algorithm is proposed to decompose the learning task for parallel processing. A further task decomposition is used to balance load among processors and to increase the speed-up and efficiency. 
For learning from very large datasets, we present a regrouping of the available processors such that slow data access through files can be replaced by fast memory access. Our implementation on a parallel computer demonstrates the effectiveness of the algorithm.\nRobust Bayesian inference is the calculation of posterior probability bounds given perturbations in a probabilistic model. This paper focuses on perturbations that can be expressed locally in Bayesian networks through convex sets of distributions. Two approaches for combining local models are considered. The first approach takes the largest set of joint distributions that is compatible with the local sets of distributions; we show how to reduce this type of robust inference to a linear programming problem. The second approach takes the convex hull of joint distributions generated from the local sets of distributions; we demonstrate how to apply interior-point optimization methods to generate posterior bounds and how to generate approximations that are guaranteed to converge to correct posterior bounds. We also discuss calculation of bounds for expected utilities and variances, and global perturbation models.\nWe present a method for calculating the myopic value of information in influence diagrams (Howard & Matheson, 1981) based on the strong junction tree framework (Jensen, Jensen & Dittmer, 1994). The difference in instantiation order in the influence diagrams is reflected in the corresponding junction trees by the order in which the chance nodes are marginalized. This order of marginalization can be changed by table expansion and in effect the same junction tree with expanded tables may be used for calculating the expected utility for scenarios with different instantiation orders. 
We also compare our method to the classic method of modeling different instantiation orders in the same influence diagram.\nThis paper investigates the problem of finding a preference relation on a set of acts from the knowledge of an ordering on events (subsets of states of the world) describing the decision-maker's (DM's) uncertainty and an ordering of consequences of acts, describing the DM's preferences. However, contrary to classical approaches to decision theory, we try to do it without resorting to any numerical representation of utility or uncertainty, and without even using any qualitative scale on which both uncertainty and preference could be mapped. It is shown that although many axioms of Savage's theory can be preserved and despite the intuitive appeal of the method for constructing a preference over acts, the approach is inconsistent with a probabilistic representation of uncertainty, but leads to the kind of uncertainty theory encountered in non-monotonic reasoning (especially preferential and rational inference), closely related to possibility theory. Moreover, the method turns out either to be hardly decisive or to lead to very risky decisions, although its basic principles look sound. This paper raises the question of the very possibility of purely symbolic approaches to Savage-like decision-making under uncertainty and obtains preliminary negative results.\nThere is an obvious need for improving the performance and accuracy of a Bayesian network as new data is observed. Because of errors in model construction and changes in the dynamics of the domains, we cannot afford to ignore the information in new data. While sequential update of parameters for a fixed structure can be accomplished using standard techniques, sequential update of network structure is still an open problem. In this paper, we investigate sequential update of Bayesian networks where both parameters and structure are expected to change. 
We introduce a new approach that allows for the flexible manipulation of the tradeoff between the quality of the learned networks and the amount of information that is maintained about past observations. We formally describe our approach including the necessary modifications to the scoring functions for learning Bayesian networks, evaluate its effectiveness through an empirical study, and extend it to the case of missing data.\n\"Background subtraction\" is an old technique for finding moving objects in a video sequence (for example, cars driving on a freeway). The idea is that subtracting the current image from a time-averaged background image will leave only nonstationary objects. It is, however, a crude approximation to the task of classifying each pixel of the current image; it fails with slow-moving objects and does not distinguish shadows from moving objects. The basic idea of this paper is that we can classify each pixel using a model of how that pixel looks when it is part of different classes. We learn a mixture-of-Gaussians classification model for each pixel using an unsupervised technique: an efficient, incremental version of EM. Unlike the standard image-averaging approach, this automatically updates the mixture component for each class according to likelihood of membership; hence slow-moving objects are handled perfectly. Our approach also identifies and eliminates shadows much more effectively than other techniques such as thresholding. Application of this method as part of the Roadwatch traffic surveillance project is expected to result in significant improvements in vehicle identification and tracking.\nA Bayesian net (BN) is more than a succinct way to encode a probability distribution; it also corresponds to a function used to answer queries. A BN can therefore be evaluated by the accuracy of the answers it returns. 
Many algorithms for learning BNs, however, attempt to optimize another criterion (usually likelihood, possibly augmented with a regularizing term), which is independent of the distribution of queries that are posed. This paper takes the \"performance criteria\" seriously, and considers the challenge of computing the BN whose performance - read \"accuracy over the distribution of queries\" - is optimal. We show that many aspects of this learning task are more difficult than the corresponding subtasks in the standard model.\nConditioning is the generally agreed-upon method for updating probability distributions when one learns that an event is certainly true. But it has been argued that we need other rules, in particular the rule of cross-entropy minimization, to handle updates that involve uncertain information. In this paper we re-examine such a case: van Fraassen's Judy Benjamin problem, which in essence asks how one might update given the value of a conditional probability. We argue that -- contrary to the suggestions in the literature -- it is possible to use simple conditionalization in this case, and thereby obtain answers that agree fully with intuition. This contrasts with proposals such as cross-entropy, which are easier to apply but can give unsatisfactory answers. Based on the lessons from this example, we speculate on some general philosophical issues concerning probability update.\nDecision theory has become widely accepted in the AI community as a useful framework for planning and decision making. Applying the framework typically requires elicitation of some form of probability and utility information. While much work in AI has focused on providing representations and tools for elicitation of probabilities, relatively little work has addressed the elicitation of utility models. 
This imbalance is not particularly justified considering that probability models are relatively stable across problem instances, while utility models may be different for each instance. Spending large amounts of time on elicitation can be undesirable for interactive systems used in low-stakes decision making and in time-critical decision making. In this paper we investigate the issues of reasoning with incomplete utility models. We identify patterns of problem instances where plans can be proved to be suboptimal if the (unknown) utility function satisfies certain conditions. We present an approach to planning and decision making that performs the utility elicitation incrementally and in a way that is informed by the domain model.\nWe describe work to control graphics rendering under limited computational resources by taking a decision-theoretic perspective on perceptual costs and computational savings of approximations. The work extends earlier work on the control of rendering by introducing methods and models for computing the expected cost associated with degradations of scene components. The expected cost is computed by considering the perceptual cost of degradations and a probability distribution over the attentional focus of viewers. We review the critical literature describing findings on visual search and attention, discuss the implications of the findings, and introduce models of expected perceptual cost. Finally, we discuss policies that harness information about the expected cost of scene components.\nA pseudo independent (PI) model is a probabilistic domain model (PDM) where proper subsets of a set of collectively dependent variables display marginal independence. PI models cannot be learned correctly by many algorithms that rely on a single link search. Earlier work on learning PI models has suggested a straightforward multi-link search algorithm. 
However, when a domain contains recursively embedded PI submodels, it may escape the detection of such an algorithm. In this paper, we propose an improved algorithm that ensures the learning of all embedded PI submodels whose sizes are upper bounded by a predetermined parameter. We show that this improved learning capability only increases the complexity slightly beyond that of the previous algorithm. The performance of the new algorithm is demonstrated through experiments.\nDecomposable models and Bayesian networks can be defined as sequences of oligo-dimensional probability measures connected with operators of composition. The preliminary results suggest that the probabilistic models allowing for effective computational procedures are represented by sequences possessing a special property; we shall call them perfect sequences. The paper lays down the elementary foundation necessary for further study of iterative application of operators of composition. We hope to develop a technique describing several graph models in a unifying way. We are convinced that practically all theoretical results and procedures connected with decomposable models and Bayesian networks can be translated into the terminology introduced in this paper. For example, the complexity of computational procedures in these models depends closely on the possibility of changing the ordering of the oligo-dimensional measures defining the model. Therefore, in this paper, much attention is paid to the possibility of changing the ordering of the operators of composition.\nThe efficiency of inference in both the Hugin and, most notably, the Shafer-Shenoy architectures can be improved by exploiting the independence relations induced by the incoming messages of a clique. That is, the message to be sent from a clique can be computed via a factorization of the clique potential in the form of a junction tree. 
In this paper we show that by exploiting such nested junction trees in the computation of messages both space and time costs of the conventional propagation methods may be reduced. The paper presents a structured way of exploiting the nested junction trees technique to achieve such reductions. The usefulness of the method is emphasized through a thorough empirical evaluation involving ten large real-world Bayesian networks and the Hugin inference algorithm.\nWe consider probabilistic inference in general hybrid networks, which include continuous and discrete variables in an arbitrary topology. We reexamine the question of variable discretization in a hybrid network aiming at minimizing the information loss induced by the discretization. We show that a nonuniform partition across all variables as opposed to uniform partition of each variable separately reduces the size of the data structures needed to represent a continuous function. We also provide a simple but efficient procedure for nonuniform partition. To represent a nonuniform discretization in the computer memory, we introduce a new data structure, which we call a Binary Split Partition (BSP) tree. We show that BSP trees can be an exponential factor smaller than the data structures in the standard uniform discretization in multiple dimensions and show how the BSP trees can be used in the standard join tree algorithm. We show that the accuracy of the inference process can be significantly improved by adjusting discretization with evidence. We construct an iterative anytime algorithm that gradually improves the quality of the discretization and the accuracy of the answer on a query. We provide empirical evidence that the algorithm converges.\nIn most current applications of belief networks, domain knowledge is represented by a single belief network that applies to all problem instances in the domain. 
In more complex domains, problem-specific models must be constructed from a knowledge base encoding probabilistic relationships in the domain. Most work in knowledge-based model construction takes the rule as the basic unit of knowledge. We present a knowledge representation framework that permits the knowledge base designer to specify knowledge in larger, semantically meaningful units which we call network fragments. Our framework provides for representation of asymmetric independence and canonical intercausal interaction. We discuss the combination of network fragments to form problem-specific models to reason about particular problem instances. The framework is illustrated using examples from the domain of military situation awareness.\nThis paper introduces a computational framework for reasoning in Bayesian belief networks that derives significant advantages from focused inference and relevance reasoning. This framework is based on d-separation and other simple and computationally efficient techniques for pruning irrelevant parts of a network. Our main contribution is a technique that we call relevance-based decomposition. Relevance-based decomposition approaches belief updating in large networks by focusing on their parts and decomposing them into partially overlapping subnetworks. This makes reasoning in some intractable networks possible and, in addition, often results in significant speedup, as the total time taken to update all subnetworks is in practice often considerably less than the time taken to update the network as a whole. We report results of empirical tests that demonstrate the practical significance of our approach.\nA submarine's sonar team is responsible for detecting, localising and classifying targets using information provided by the platform's sensor suite. The information used to make these assessments is typically uncertain and/or incomplete and is likely to require a measure of confidence in its reliability. 
Moreover, improvements in sensor and communication technology are resulting in increased amounts of on-platform and off-platform information available for evaluation. This proliferation of imprecise information increases the risk of overwhelming the operator. To assist the task of localisation and classification, a concept demonstration decision aid (Horizon), based on evidential reasoning, has been developed. Horizon is an information fusion software package for representing and fusing imprecise information about the state of the world, expressed across suitable frames of reference. The Horizon software is currently at the prototype stage.\nBy discussing several examples, the theory of generalized functional models is shown to be very natural for modeling some situations of reasoning under uncertainty. A generalized functional model is a pair (f, P) where f is a function describing the interactions between a parameter variable, an observation variable and a random source, and P is a probability distribution for the random source. Unlike traditional functional models, generalized functional models do not require that there is only one value of the parameter variable that is compatible with an observation and a realization of the random source. As a consequence, the results of the analysis of a generalized functional model are not expressed in terms of probability distributions but rather by support and plausibility functions. The analysis of a generalized functional model is very logical and is inspired by ideas already put forward by R.A. Fisher in his theory of fiducial probability.\nA brief description is given of the probabilistic causal graph model for representing, reasoning with, and learning causal structure using Bayesian networks. It is then argued that this model is closely related to how humans reason with and learn causal structure. 
It is shown that studies in psychology on discounting (reasoning concerning how the presence of one cause of an effect makes another cause less probable) support the hypothesis that humans reach the same judgments as algorithms for doing inference in Bayesian networks. Next, it is shown how studies by Piaget indicate that humans learn causal structure by observing the same independencies and dependencies as those used by certain algorithms for learning the structure of a Bayesian network. Based on this indication, a subjective definition of causality is put forward. Finally, methods for further testing the accuracy of these claims are discussed.\nBayesian knowledge bases (BKBs) are a generalization of Bayes networks and weighted proof graphs (WAODAGs), that allow cycles in the causal graph. Reasoning in BKBs requires finding the most probable inferences consistent with the evidence. The cost-sharing heuristic for finding least-cost explanations in WAODAGs was presented and shown to be effective by Charniak and Husain. However, the cycles in BKBs would make the definition of cost-sharing cyclic as well, if applied directly to BKBs. By treating the defining equations of cost-sharing as a system of equations, one can properly define an admissible cost-sharing heuristic for BKBs. Empirical evaluation shows that cost-sharing improves performance significantly when applied to BKBs.\nWe introduce a new interpretation of two related notions: conditional utility and utility independence. Unlike the traditional interpretation, the new interpretation renders the notions the direct analogues of their probabilistic counterparts. To capture these notions formally, we appeal to the notion of utility distribution, introduced in a previous paper. 
We show that utility distributions, which have a structure that is identical to that of probability distributions, can be viewed as a special case of an additive multiattribute utility function, and show how this special case permits us to capture the novel senses of conditional utility and utility independence. Finally, we present the notion of utility networks, which do for utilities what Bayesian networks do for probabilities. Specifically, utility networks exploit the new interpretation of conditional utility and utility independence to compactly represent a utility distribution.\nDefault logic encounters some conceptual difficulties in representing common-sense reasoning tasks. We argue that we should not try to formulate modular default rules that are presumed to work in all or most circumstances. We need to take into account the importance of the context which is continuously evolving during the reasoning process. Sequential thresholding is a quantitative counterpart of default logic which makes explicit the role context plays in the construction of a non-monotonic extension. We present a semantic characterization of generic non-monotonic reasoning, as well as the instantiations pertaining to default logic and sequential thresholding. This provides a link between the two mechanisms as well as a way to integrate the two that can be beneficial to both.\nRecursive graphical models usually underlie the statistical modelling concerning probabilistic expert systems based on Bayesian networks. This paper defines a version of these models, denoted as recursive exponential models, which have evolved from the desire to impose sophisticated domain knowledge onto local fragments of a model. Besides the structural knowledge, as specified by a given model, the statistical modelling may also include expert opinion about the values of parameters in the model. It is shown how to translate imprecise expert knowledge into approximately conjugate prior distributions. 
Based on possibly incomplete data, the score and the observed information are derived for these models. This accounts for both the traditional score and observed information, derived as derivatives of the log-likelihood, and the posterior score and observed information, derived as derivatives of the log-posterior distribution. Throughout the paper the specialization into recursive graphical models is accounted for by a simple example.\nThe criterion commonly used in directed acyclic graphs (dags) for testing graphical independence is the well-known d-separation criterion. It allows us to build graphical representations of dependency models (usually probabilistic dependency models) in the form of belief networks, which make easy interpretation and management of independence relationships possible, without reference to numerical parameters (conditional probabilities). In this paper, we study the following combinatorial problem: finding the minimum d-separating set for two nodes in a dag. This set would represent the minimum information (in the sense of minimum number of variables) necessary to prevent these two nodes from influencing each other. The solution to this basic problem and some of its extensions can be useful in several ways, as we shall see later. Our solution is based on a two-step process: first, we reduce the original problem to the simpler one of finding a minimum separating set in an undirected graph, and second, we develop an algorithm for solving it.\nThis paper works through the optimization of a real world planning problem, with a combination of a generative planning tool and an influence diagram solver. The problem is taken from an existing application in the domain of oil spill emergency response. The planning agent manages constraints that order sets of feasible equipment employment actions. This is mapped at an intermediate level of abstraction onto an influence diagram. 
In addition, the planner can apply a surveillance operator that determines observability of the state, namely the unknown trajectory of the oil. The uncertain world state and the objective function properties are part of the influence diagram structure, but not represented in the planning agent domain. By exploiting this structure under the constraints generated by the planning agent, the influence diagram solution complexity simplifies considerably, and an optimum solution to the employment problem based on the objective function is found. Finding this optimum is equivalent to the simultaneous evaluation of a range of plans. This result is an example of bounded optimality, within the limitations of this hybrid generative planner and influence diagram architecture.\nWe developed the language of Modifiable Temporal Belief Networks (MTBNs) as a structural and temporal extension of Bayesian Belief Networks (BNs) to facilitate normative temporal and causal modeling under uncertainty. In this paper we present definitions of the model, its components, and its fundamental properties. We also discuss how to represent various types of temporal knowledge, with an emphasis on hybrid temporal-explicit time modeling, dynamic structures, avoiding causal temporal inconsistencies, and dealing with models that simultaneously involve actions (decisions) and causal and non-causal associations. We examine the relationships among BNs, Modifiable Belief Networks, and MTBNs with a single temporal granularity, and suggest areas of application suitable to each one of them.\nGraphical Markov models use graphs, either undirected, directed, or mixed, to represent possible dependences among statistical variables. 
Applications of undirected graphs (UDGs) include models for spatial dependence and image analysis, while acyclic directed graphs (ADGs), which are especially convenient for statistical analysis, arise in such fields as genetics and psychometrics and as models for expert systems and Bayesian belief networks. Lauritzen, Wermuth and Frydenberg (LWF) introduced a Markov property for chain graphs, which are mixed graphs that can be used to represent simultaneously both causal and associative dependencies and which include both UDGs and ADGs as special cases. In this paper an alternative Markov property (AMP) for chain graphs is introduced, which in some ways is a more direct extension of the ADG Markov property than is the LWF property for chain graphs.\nA nonmonotonic logic of thresholded generalizations is presented. Given propositions A and B from a language L and a positive integer k, the thresholded generalization A=>B{k} means that the conditional probability P(B|A) falls short of one by no more than c*d^k. A two-level probability structure is defined. At the lower level, a model is defined to be a probability function on L. At the upper level, there is a probability distribution over models. A definition is given of what it means for a collection of thresholded generalizations to entail another thresholded generalization. This nonmonotonic entailment relation, called \"entailment in probability\", has the feature that its conclusions are \"probabilistically trustworthy\", meaning that, given true premises, it is improbable that an entailed conclusion would be false. A procedure is presented for ascertaining whether any given collection of premises entails any given conclusion. It is shown that entailment in probability is closely related to Goldszmidt and Pearl's System-Z^+, thereby demonstrating that the conclusions of System-Z^+ are probabilistically trustworthy.\nThis paper deals with a scene recognition system in a robotics context. 
The general problem is to match images with <I>a priori</I> descriptions. A typical mission would consist in identifying an object in an installation with a vision system situated at the end of a manipulator and with a human operator provided description, formulated in a pseudo-natural language, and possibly redundant. The originality of this work comes from the nature of the description, from the special attention given to the management of imprecision and uncertainty in the interpretation process and from the way to assess the description redundancy so as to reinforce the overall matching likelihood.\nAn algorithm is developed for finding a close to optimal junction tree of a given graph G. The algorithm has a worst case complexity O(c^k n^a) where a and c are constants, n is the number of vertices, and k is the size of the largest clique in a junction tree of G in which this size is minimized. The algorithm guarantees that the logarithm of the size of the state space of the heaviest clique in the junction tree produced is less than a constant factor off the optimal value. When k = O(log n), our algorithm yields a polynomial inference algorithm for Bayesian networks.\nBayesian networks provide a language for qualitatively representing the conditional independence properties of a distribution. This allows a natural and compact representation of the distribution, eases knowledge acquisition, and supports effective inference algorithms. It is well-known, however, that there are certain independencies that we cannot capture qualitatively within the Bayesian network structure: independencies that hold only in certain contexts, i.e., given a specific assignment of values to certain variables. In this paper, we propose a formal notion of context-specific independence (CSI), based on regularities in the conditional probability tables (CPTs) at a node. 
We present a technique, analogous to (and based on) d-separation, for determining when such independence holds in a given network. We then focus on a particular qualitative representation scheme - tree-structured CPTs - for capturing CSI. We suggest ways in which this representation can be used to support effective inference algorithms. In particular, we present a structural decomposition of the resulting network which can improve the performance of clustering algorithms, and an alternative algorithm based on cutset conditioning.\nWe develop and extend existing decision-theoretic methods for troubleshooting a nonfunctioning device. Traditionally, diagnosis with Bayesian networks has focused on belief updating---determining the probabilities of various faults given current observations. In this paper, we extend this paradigm to include taking actions. In particular, we consider three classes of actions: (1) we can make observations regarding the behavior of a device and infer likely faults as in traditional diagnosis, (2) we can repair a component and then observe the behavior of the device to infer likely faults, and (3) we can change the configuration of the device, observe its new behavior, and infer the likelihood of faults. Analysis of latter two classes of troubleshooting actions requires incorporating notions of persistence into the belief-network formalism used for probabilistic inference.\nThe paper presents an efficient method for simulating the tails of a target variable Z=h(X) which depends on a set of basic variables X=(X_1, ..., X_n). To this aim, variables X_i, i=1, ..., n are sequentially simulated in such a manner that Z=h(x_1, ..., x_i-1, X_i, ..., X_n) is guaranteed to be in the tail of Z. When this method is difficult to apply, an alternative method is proposed, which leads to a low rejection proportion of sample values, when compared with the Monte Carlo method. 
Both methods are shown to be very useful to perform a sensitivity analysis of Bayesian networks, when very large confidence intervals for the marginal/conditional probabilities are required, as in reliability or risk analysis. The methods are shown to behave best when all scores coincide. The required modifications for this to occur are discussed. The methods are illustrated with several examples and one example of application to a real case is used to illustrate the whole process.\nDecision analysis (DA) and the rich set of tools developed by researchers in decision making under uncertainty show great potential to penetrate the technological content of the products and services delivered by firms in a variety of industries as well as the business processes used to deliver those products and services to market. In this paper I describe work in progress at Sun Microsystems in the application of decision-analytic methods to Operational Decision Making (ODM) in its World-Wide Operations (WWOPS) Business Management Group. Working with membersof product engineering, marketing, and sales, operations planners from WWOPS have begun to use a decision-analytic framework called SCRAM (Supply Communication/Risk Assessment and Management) to structure and solve problems in product planning, tracking, and transition. Concepts such as information value provide a powerful method of managing huge information sets and thereby enable managers to focus attention on factors that matter most for their business. Finally, our process-oriented introduction of decision-analytic methods to Sun managers has led to a focused effort to develop decision support software based on methods from decision making under uncertainty.\nApproaches to learning Bayesian networks from data typically combine a scoring function with a heuristic search procedure. 
Given a Bayesian network structure, many of the scoring functions derived in the literature return a score for the entire equivalence class to which the structure belongs. When using such a scoring function, it is appropriate for the heuristic search algorithm to search over equivalence classes of Bayesian networks as opposed to individual structures. We present the general formulation of a search space for which the states of the search correspond to equivalence classes of structures. Using this space, any one of a number of heuristic search algorithms can easily be applied. We compare greedy search performance in the proposed search space to greedy search performance in a search space for which the states correspond to individual Bayesian network structures.\nWe discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naive-Bayes models having a hidden root node, we find that the CS measure is the most accurate.\nIt is shown that the ability of the interval probability representation to capture epistemological independence is severely limited. Two events are epistemologically independent if knowledge of the first event does not alter belief (i.e., probability bounds) about the second. 
However, independence in this form can only exist in a 2-monotone probability function in degenerate cases i.e., if the prior bounds are either point probabilities or entirely vacuous. Additional limitations are characterized for other classes of lower probabilities as well. It is argued that these phenomena are simply a matter of interpretation. They appear to be limitations when one interprets probability bounds as a measure of epistemological indeterminacy (i.e., uncertainty arising from a lack of knowledge), but are exactly as one would expect when probability intervals are interpreted as representations of ontological indeterminacy (indeterminacy introduced by structural approximations). The ontological interpretation is introduced and discussed.\nQuasi-Bayesian theory uses convex sets of probability distributions and expected loss to represent preferences about plans. The theory focuses on decision robustness, i.e., the extent to which plans are affected by deviations in subjective assessments of probability. The present work presents solutions for plan generation when robustness of probability assessments must be included: plans contain information about the robustness of certain actions. The surprising result is that some problems can be solved faster in the Quasi-Bayesian framework than within usual Bayesian theory. We investigate this on the planning to observe problem, i.e., an agent must decide whether to take new observations or not. The fundamental question is: How, and how much, to search for a \"best\" plan, based on the robustness of probability assessments? Plan generation algorithms are derived in the context of material classification with an acoustic robotic probe. A package that constructs Quasi-Bayesian plans is available through anonymous ftp.\nThis paper provides a formal and practical framework for sound abstraction of probabilistic actions. 
We start by precisely defining the concept of sound abstraction within the context of finite-horizon planning (where each plan is a finite sequence of actions). Next we show that such abstraction cannot be performed within the traditional probabilistic action representation, which models a world with a single probability distribution over the state space. We then present the constraint mass assignment representation, which models the world with a set of probability distributions and is a generalization of mass assignment representations. Within this framework, we present sound abstraction procedures for three types of action abstraction. We end the paper with discussions and related work on sound and approximate abstraction. We give pointers to papers in which we discuss other sound abstraction-related issues, including applications, estimating loss due to abstraction, and automatically generating abstraction hierarchies.\nThis paper discusses belief revision under uncertain inputs in the framework of possibility theory. Revision can be based on two possible definitions of the conditioning operation, one based on min operator which requires a purely ordinal scale only, and another based on product, for which a richer structure is needed, and which is a particular case of Dempster's rule of conditioning. Besides, revision under uncertain inputs can be understood in two different ways depending on whether the input is viewed, or not, as a constraint to enforce. Moreover, it is shown that M.A. Williams' transmutations, originally defined in the setting of Spohn's functions, can be captured in this framework, as well as Boutilier's natural revision.\nMany algorithms for processing probabilistic networks are dependent on the topological properties of the problem's structure. Such algorithms (e.g., clustering, conditioning) are effective only if the problem has a sparse graph captured by parameters such as tree width and cycle-cut set size. 
In this paper we initiate a study to determine the potential of structure-based algorithms in real-life applications. We analyze empirically the structural properties of problems coming from the circuit diagnosis domain. Specifically, we locate those properties that capture the effectiveness of clustering and conditioning as well as of a family of conditioning+clustering algorithms designed to gradually trade space for time. We perform our analysis on 11 benchmark circuits widely used in the testing community. We also report on the effect of ordering heuristics on tree-clustering and show that, on our benchmarks, the well-known max-cardinality ordering is substantially inferior to an ordering called min-degree.\nIn this paper we examine a novel addition to the known methods for learning Bayesian networks from data that improves the quality of the learned networks. Our approach explicitly represents and learns the local structure in the conditional probability tables (CPTs), which quantify these networks. This increases the space of possible models, enabling the representation of CPTs with a variable number of parameters that depends on the learned local structures. The resulting learning procedure is capable of inducing models that better emulate the real complexity of the interactions present in the data. We describe the theoretical foundations and practical aspects of learning local structures, as well as an empirical evaluation of the proposed method. This evaluation indicates that learning curves characterizing the procedure that exploits the local structure converge faster than those of the standard procedure. Our results also show that networks learned with local structure tend to be more complex (in terms of arcs), yet require fewer parameters.\nThe study of belief change has been an active area in philosophy and AI. In recent years two special cases of belief change, belief revision and belief update, have been studied in detail. 
Roughly, revision treats a surprising observation as a sign that previous beliefs were wrong, while update treats a surprising observation as an indication that the world has changed. In general, we would expect that an agent making an observation may both want to revise some earlier beliefs and assume that some change has occurred in the world. We define a novel approach to belief change that allows us to do this, by applying ideas from probability theory in a qualitative setting. The key idea is to use a qualitative Markov assumption, which says that state transitions are independent. We show that a recent approach to modeling qualitative uncertainty using plausibility measures allows us to make such a qualitative Markov assumption in a relatively straightforward way, and show how the Markov assumption can be used to provide an attractive belief-change model.\nWe extend the Bayesian Information Criterion (BIC), an asymptotic approximation for the marginal likelihood, to Bayesian networks with hidden variables. This approximation can be used to select models given large samples of data. The standard BIC as well as our extension punishes the complexity of a model according to the dimension of its parameters. We argue that the dimension of a Bayesian network with hidden variables is the rank of the Jacobian matrix of the transformation between the parameters of the network and the parameters of the observable variables. We compute the dimensions of several networks including the naive Bayes model with a hidden root node.\nModeling worlds and actions under uncertainty is one of the central problems in the framework of decision-theoretic planning. The representation must be general enough to capture real-world problems but at the same time it must provide a basis upon which theoretical results can be derived. 
The central notion in the framework we propose here is that of the affine-operator, which serves as a tool for constructing (convex) sets of probability distributions, and which can be considered as a generalization of belief functions and interval mass assignments. Uncertainty in the state of the worlds is modeled with sets of probability distributions, represented by affine-trees while actions are defined as tree-manipulators. A small set of key properties of the affine-operator is presented, forming the basis for most existing operator-based definitions of probabilistic action projection and action abstraction. We derive and prove correct three projection rules, which vividly illustrate the precision-complexity tradeoff in plan projection. Finally, we show how the three types of action abstraction identified by Haddawy and Doan are manifested in the present framework.\nRecent research has found that diagnostic performance with Bayesian belief networks is often surprisingly insensitive to imprecision in the numerical probabilities. For example, the authors have recently completed an extensive study in which they applied random noise to the numerical probabilities in a set of belief networks for medical diagnosis, subsets of the CPCS network, a subset of the QMR (Quick Medical Reference) focused on liver and bile diseases. The diagnostic performance in terms of the average probabilities assigned to the actual diseases showed small sensitivity even to large amounts of noise. In this paper, we summarize the findings of this study and discuss possible explanations of this low sensitivity. One reason is that the criterion for performance is average probability of the true hypotheses, rather than average error in probability, which is insensitive to symmetric noise distributions. But, we show that even asymmetric, logodds-normal noise has modest effects. 
A second reason is that the gold-standard posterior probabilities are often near zero or one, and are little disturbed by noise.\nThe validation of data from sensors has become an important issue in the operation and control of modern industrial plants. One approach is to use knowledge-based techniques to detect inconsistencies in measured data. This article presents a probabilistic model for the detection of such inconsistencies. Based on probability propagation, this method is able to find the existence of a possible fault among the set of sensors. That is, if an error exists, many sensors present an apparent fault due to the propagation from the sensor(s) with a real fault. Thus the fault detection mechanism can only tell whether a sensor has a potential fault, but it cannot tell whether the fault is real or apparent. The central problem is therefore to develop a theory, and then an algorithm, for distinguishing real and apparent faults, given that one or more sensors can fail at the same time. This article, then, presents an approach based on two levels: (i) probabilistic reasoning, to detect a potential fault, and (ii) constraint management, to distinguish the real fault from the apparent ones. The proposed approach is exemplified by applying it to a power plant model.\nUncertainty may be taken to characterize inferences, their conclusions, their premises or all three. Under some treatments of uncertainty, the inference itself is never characterized by uncertainty. We explore the significance of uncertainty both in the premises and in the conclusion of an argument that involves uncertainty. We argue that for uncertainty to characterize the conclusion of an inference is natural, but that there is an interplay between uncertainty in the premises and uncertainty in the procedure of argument itself. We show that it is possible in principle to incorporate all uncertainty in the premises, rendering uncertainty arguments deductively valid. 
But we then argue (1) that this does not reflect human argument, (2) that it is computationally costly, and (3) that the gain in simplicity obtained by allowing uncertainty inference can sometimes outweigh the loss of flexibility it entails.\nIn this paper we propose a framework for combining Disjunctive Logic Programming and Poole's Probabilistic Horn Abduction. We use the concept of hypothesis to specify the probability structure. We consider the case in which probabilistic information is not available. Instead of using probability intervals, we allow for the specification of the probabilities of disjunctions. Because minimal models are used as characteristic models in disjunctive logic programming, we apply the principle of indifference on the set of minimal models to derive default probability values. We define the concepts of explanation and partial explanation of a formula, and use them to determine the default probability distribution(s) induced by a program. An algorithm for calculating the default probability of a goal is presented.\nA naive (or Idiot) Bayes network is a network with a single hypothesis node and several observations that are conditionally independent given the hypothesis. We recently surveyed a number of members of the UAI community and discovered a general lack of understanding of the implications of the Naive Bayes assumption for the kinds of problems that can be solved by these networks. It has long been recognized [Minsky 61] that if observations are binary, the decision surfaces in these networks are hyperplanes. We extend this result (hyperplane separability) to Naive Bayes networks with m-ary observations. In addition, we illustrate the effect of observation-observation dependencies on decision surfaces. Finally, we discuss the implications of these results for knowledge acquisition and research in learning.\nDirected acyclic graphs have been used fruitfully to represent causal structures (Pearl 1988). 
However, in the social sciences and elsewhere, models are often used that correspond both causally and statistically to directed graphs with directed cycles (Spirtes 1995). Pearl (1993) discussed predicting the effects of intervention in models of this kind, so-called linear non-recursive structural equation models. This raises the question of whether it is possible to make inferences about causal structures with cycles from sample data. In particular, do there exist general, informative, feasible and reliable procedures for inferring causal structure from conditional independence relations among variables in a sample generated by an unknown causal structure? In this paper I present a discovery algorithm that is correct in the large sample limit, given commonly (but often implicitly) made plausible assumptions, and which provides information about the existence or non-existence of causal pathways from one variable to another. The algorithm is polynomial on sparse graphs.\nAlthough the concept of d-separation was originally defined for directed acyclic graphs (see Pearl 1988), there is a natural extension of the concept to directed cyclic graphs. When exactly the same set of d-separation relations holds in two directed graphs, regardless of whether each is cyclic or acyclic, we say that they are Markov equivalent. In other words, when two directed cyclic graphs are Markov equivalent, the set of distributions that satisfy a natural extension of the Global Directed Markov condition (Lauritzen et al. 1990) is exactly the same for each graph. There is an obvious exponential (in the number of vertices) time algorithm for deciding Markov equivalence of two directed cyclic graphs: simply check all of the d-separation relations in each graph. In this paper I state a theorem that gives necessary and sufficient conditions for the Markov equivalence of two directed cyclic graphs, where each of the conditions can be checked in polynomial time. 
Hence, the theorem can be easily adapted into a polynomial time algorithm for deciding the Markov equivalence of two directed cyclic graphs. Although space prohibits inclusion of correctness proofs, they are fully described in Richardson (1994b).\nBelief updating in Bayes nets, a well known computationally hard problem, has recently been approximated by several deterministic algorithms, and by various randomized approximation algorithms. Deterministic algorithms usually provide probability bounds, but have an exponential runtime. Some randomized schemes have a polynomial runtime, but provide only probability estimates. We present randomized algorithms that enumerate high-probability partial instantiations, resulting in probability bounds. Some of these algorithms are also sampling algorithms. Specifically, we introduce and evaluate a variant of backward sampling, both as a sampling algorithm and as a randomized enumeration algorithm. We also relax the implicit assumption used by both sampling and accumulation algorithms, that query nodes must be instantiated in all the samples.\nOver the past several years Bayesian networks have been applied to a wide variety of problems. A central problem in applying Bayesian networks is that of finding one or more of the most probable instantiations of a network. In this paper we develop an efficient algorithm that incrementally enumerates the instantiations of a Bayesian network in decreasing order of probability. Such enumeration algorithms are applicable in a variety of applications ranging from medical expert systems to model-based diagnosis. Fundamentally, our algorithm is simply performing a lazy enumeration of the sorted list of all instantiations of the network. This insight leads to a very concise algorithm statement which is both easily understood and implemented. We show that for singly connected networks, our algorithm generates the next instantiation in time polynomial in the size of the network. 
The algorithm extends to arbitrary Bayesian networks using standard conditioning techniques. We empirically evaluate the enumeration algorithm and demonstrate its practicality.\nChain graphs give a natural unifying point of view on Markov and Bayesian networks and enlarge the potential of graphical models for description of conditional independence structures. In the paper a direct graphical separation criterion for chain graphs, called c-separation, which generalizes the d-separation criterion for Bayesian networks is introduced (recalled). It is equivalent to the classic moralization criterion for chain graphs and complete in sense that for every chain graph there exists a probability distribution satisfying exactly conditional independencies derivable from the chain graph by the c-separation criterion. Every class of Markov equivalent chain graphs can be uniquely described by a natural representative, called the largest chain graph. A recovery algorithm, which on basis of the (conditional) dependency model induced by an unknown chain graph finds the corresponding largest chain graph, is presented.\nWhen we work with information from multiple sources, the formalism each employs to handle uncertainty may not be uniform. In order to be able to combine these knowledge bases of different formats, we need to first establish a common basis for characterizing and evaluating the different formalisms, and provide a semantics for the combined mechanism. A common framework can provide an infrastructure for building an integrated system, and is essential if we are to understand its behavior. We present a unifying framework based on an ordered partition of possible worlds called partition sequences, which corresponds to our intuitive notion of biasing towards certain possible scenarios when we are uncertain of the actual situation. 
We show that some of the existing formalisms, namely, default logic, autoepistemic logic, probabilistic conditioning and thresholding (generalized conditioning), and possibility theory, can be incorporated into this general framework.\nIntegrating diagnosis and repair is particularly crucial when gaining sufficient information to discriminate between several candidate diagnoses requires carrying out some repair actions. A typical case is supply restoration in a faulty power distribution system. This problem, which is a major concern for electricity distributors, features partial observability and stochastic repair actions which are more elaborate than simple replacement of components. This paper analyses the difficulties in applying existing work on integrating model-based diagnosis and repair and on planning in partially observable stochastic domains to this real-world problem, and describes the pragmatic approach we have retained so far.\nWe examine a standard factory scheduling problem with stochastic processing and setup times, minimizing the expectation of the weighted number of tardy jobs. Because the costs of operators in the schedule are stochastic and sequence-dependent, standard dynamic programming algorithms such as A* may fail to find the optimal schedule. The SDA* (Stochastic Dominance A*) algorithm remedies this difficulty by relaxing the pruning condition. We present an improved state-space search formulation for these problems and discuss the conditions under which stochastic scheduling problems can be solved optimally using SDA*. In empirical testing on randomly generated problems, we found that in 70% of the problems, the expected cost of the optimal stochastic solution is lower than that of the solution derived using a deterministic approximation, with comparable search effort.\nIn learning belief networks, the single link lookahead search is widely adopted to reduce the search space. 
We show that there exists a class of probabilistic domain models which displays a special pattern of dependency. We analyze the behavior of several learning algorithms using different scoring metrics such as the entropy, conditional independence, minimal description length and Bayesian metrics. We demonstrate that single link lookahead search procedures (employed in these algorithms) cannot learn these models correctly. Thus, when the underlying domain model actually belongs to this class, the use of a single link search procedure will result in learning of an incorrect model. This may lead to inference errors when the model is used. Our analysis suggests that if the prior knowledge about a domain does not rule out the possible existence of these models, a multi-link lookahead search or other heuristics should be used for the learning process.\nProbabilistic independence can dramatically simplify the task of eliciting, representing, and computing with probabilities in large domains. A key technique in achieving these benefits is the idea of graphical modeling. We survey existing notions of independence for utility functions in a multi-attribute space, and suggest that these can be used to achieve similar advantages. Our new results concern conditional additive independence, which we show always has a perfect representation as separation in an undirected graph (a Markov network). Conditional additive independencies entail a particular functional for the utility function that is analogous to a product decomposition of a probability function, and confers analogous benefits. This functional form has been utilized in the Bayesian network and influence diagram literature, but generally without an explanation in terms of independence. 
The functional form yields a decomposition of the utility function that can greatly speed up expected utility calculations, particularly when the utility graph has a similar topology to the probabilistic network being used.\nThe first contribution of this paper is the presentation of a Pavelka-like formulation of possibilistic logic in which the language is naturally enriched by two connectives which represent negation (¬) and a new type of conjunction (⊗). The space of truth values for this logic is the lattice of possibility functions, which, from an algebraic point of view, forms a quantale. A second contribution comes from the understanding of the new conjunction as the combination of tokens of information coming from different sources, which makes our language \"dynamic\". A Gentzen calculus is presented, which is proved sound and complete with respect to the given semantics. The problem of truth functionality is discussed in this context.\nWe describe an application of belief networks to the diagnosis of bottlenecks in computer systems. The technique relies on a high-level functional model of the interaction between application workloads, the Windows NT operating system, and system hardware. Given a workload description, the model predicts the values of observable system counters available from the Windows NT performance monitoring tool. Uncertainty in workloads, predictions, and counter values is characterized with Gaussian distributions. During diagnostic inference, we use observed performance monitor values to find the most probable assignment to the workload parameters. In this paper we provide some background on automated bottleneck detection, describe the structure of the system model, and discuss empirical procedures for model calibration and verification. Part of the calibration process includes generating a dataset to estimate a multivariate Gaussian error model. 
Initial results in diagnosing bottlenecks are presented.\nWe can perform inference in Bayesian belief networks by enumerating instantiations with high probability, thus approximating the marginals. In this paper, we present a method for determining the fraction of instantiations that has to be considered such that the absolute error in the marginals does not exceed a predefined value. The method is based on extreme value theory. Essentially, the proposed method uses the reversed generalized Pareto distribution to model probabilities of instantiations below a given threshold. Based on this distribution, an estimate can be made of the maximal absolute error incurred if instantiations with probability smaller than u are disregarded.\nAn approach to fault isolation that exploits vastly incomplete models is presented. It relies on separate descriptions of each component behavior, together with the links between them, which enables focusing the reasoning on the relevant part of the system. As normal observations do not need explanation, the behavior of the components is limited to anomaly propagation. Diagnostic solutions are disorders (fault modes or abnormal signatures) that are consistent with the observations, as well as abductive explanations. An ordinal representation of uncertainty based on possibility theory provides a simple exception-tolerant description of the component behaviors. We can for instance distinguish between effects that are more or less certainly present (or absent) and effects that are more or less certainly present (or absent) when a given anomaly is present. A realistic example illustrates the benefits of this approach.\nIn this paper we study different concepts of independence for convex sets of probabilities. There will be two basic ideas for independence. The first is irrelevance. Two variables are independent when a change in the knowledge about one variable does not affect the other. The second one is factorization. 
Two variables are independent when the joint convex set of probabilities can be decomposed into the product of marginal convex sets. In the case of the Theory of Probability, these two starting points give rise to the same definition. In the case of convex sets of probabilities, the resulting concepts will be strongly related, but they will not be equivalent. As an application of the concept of independence, we shall consider the problem of building a global convex set from marginal convex sets of probabilities.\nThe undirected technique for evaluating belief networks [Jensen et al., 1990; Lauritzen and Spiegelhalter, 1988] requires clustering the nodes in the network into a junction tree. In the traditional view, the junction tree is constructed from the cliques of the moralized and triangulated belief network: triangulation is taken to be the primitive concept, the goal towards which any clustering algorithm (e.g. node elimination) is directed. In this paper, we present an alternative conception of clustering, in which clusters and the junction tree property play the role of primitives: given a graph (not a tree) of clusters which obey (a modified version of) the junction tree property, we transform this graph until we have obtained a tree. There are several advantages to this approach: it is much clearer and easier to understand, which is important for humans who are constructing belief networks; it admits a wider range of heuristics which may enable more efficient or superior clustering algorithms; and it serves as the natural basis for an incremental clustering scheme, which we describe.\nAlthough the usefulness of belief networks for reasoning under uncertainty is widely accepted, obtaining the numerical probabilities that they require is still perceived as a major obstacle. Often not enough statistical data is available to allow for reliable probability estimation. Available information may not be directly amenable to encoding in the network. 
Finally, domain experts may be reluctant to provide numerical probabilities. In this paper, we propose a method for elicitation of probabilities from a domain expert that is non-invasive and accommodates whatever probabilistic information the expert is willing to state. We express all available information, whether qualitative or quantitative in nature, in a canonical form consisting of (in)equalities expressing constraints on the hyperspace of possible joint probability distributions. We then use this canonical form to derive second-order probability distributions over the desired probabilities.\nAccepting a proposition means that our confidence in this proposition is strictly greater than the confidence in its negation. This paper investigates the subclass of uncertainty measures, expressing confidence, that capture the idea of acceptance, what we call acceptance functions. Due to the monotonicity property of confidence measures, the acceptance of a proposition entails the acceptance of any of its logical consequences. In agreement with the idea that a belief set (in the sense of Gardenfors) must be closed under logical consequence, it is also required that the separate acceptance of two propositions entail the acceptance of their conjunction. Necessity (and possibility) measures agree with this view of acceptance while probability and belief functions generally do not. General properties of acceptance functions are established. The motivation behind this work is the investigation of a setting for belief revision more general than the one proposed by Alchourron, Gardenfors and Makinson, in connection with the notion of conditioning.\nThe fraud/uncollectible debt problem in the telecommunications industry presents two technical challenges: the detection and the treatment of the account given the detection. 
In this paper, we focus on the first problem of detection using Bayesian network models, and we briefly discuss the application of a normative expert system for the treatment at the end. We apply Bayesian network models to the problem of fraud/uncollectible debt detection for telecommunication services. In addition to being quite successful at predicting rare event outcomes, the approach is able to handle a mixture of categorical and continuous data. We present a performance comparison using linear and non-linear discriminant analysis, classification and regression trees, and Bayesian network models.\nThe Constraint Satisfaction Problem (CSP) framework offers a simple and sound basis for representing and solving simple decision problems, without uncertainty. This paper is devoted to an extension of the CSP framework enabling us to deal with some decision problems under uncertainty. This extension relies on a differentiation between the agent-controllable decision variables and the uncontrollable parameters whose values depend on the occurrence of uncertain events. The uncertainty on the values of the parameters is assumed to be given in the form of a probability distribution. Two algorithms are given, for computing respectively decisions solving the problem with a maximal probability, and conditional decisions mapping the largest possible amount of possible cases to actual decisions.\nWe examine a new approach to modeling uncertainty based on plausibility measures, where a plausibility measure just associates with an event its plausibility, an element of some partially ordered set. This approach is easily seen to generalize other approaches to modeling uncertainty, such as probability measures, belief functions, and possibility measures. The lack of structure in a plausibility measure makes it easy for us to add structure on an \"as needed\" basis, letting us examine what is required to ensure that a plausibility measure has certain properties of interest. 
This gives us insight into the essential features of the properties in question, while allowing us to prove general results that apply to many approaches to reasoning about uncertainty. Plausibility measures have already proved useful in analyzing default reasoning. In this paper, we examine their \"algebraic properties,\" analogous to the use of + and * in probability theory. An understanding of such properties will be essential if plausibility measures are to be used in practice as a representation tool.\nWe present an algorithm, called Predict, for updating beliefs in causal networks quantified with order-of-magnitude probabilities. The algorithm takes advantage of both the structure and the quantification of the network and has polynomial asymptotic complexity. Predict exhibits a conservative behavior in that it is always sound but not always complete. We provide sufficient conditions for completeness and present algorithms for testing these conditions and for computing a complete set of plausible values. We propose Predict as an efficient method to estimate probabilistic values and illustrate its use in conjunction with two known algorithms for probabilistic inference. Finally, we describe an application of Predict to plan evaluation, present experimental results, and discuss issues regarding its use with conditional logics of belief, and in the characterization of irrelevance.\nWe present a precise definition of cause and effect in terms of a fundamental notion called unresponsiveness. Our definition is based on Savage's (1954) formulation of decision theory and departs from the traditional view of causation in that our causal assertions are made relative to a set of decisions. An important consequence of this departure is that we can reason about cause locally, not requiring a causal explanation for every dependency. 
Such local reasoning can be beneficial because it may not be necessary to determine whether a particular dependency is causal to make a decision. Also in this paper, we examine the graphical encoding of causal relationships. We show that influence diagrams in canonical form are an accurate and efficient representation of causal relationships. In addition, we establish a correspondence between canonical form and Pearl's causal theory.\nWe describe methods for managing the complexity of information displayed to people responsible for making high-stakes, time-critical decisions. The techniques provide tools for real-time control of the configuration and quantity of information displayed to a user, and a methodology for designing flexible human-computer interfaces for monitoring applications. After defining a prototypical set of display decision problems, we introduce the expected value of revealed information (EVRI) and the related measure of expected value of displayed information (EVDI). We describe how these measures can be used to enhance computer displays used for monitoring complex systems. We motivate the presentation by discussing our efforts to employ decision-theoretic control of displays for a time-critical monitoring application at the NASA Mission Control Center in Houston.\nIn this paper we extend the influence diagram (ID) representation for decisions under uncertainty. In the standard ID, arrows into a decision node are only informational; they do not represent constraints on what the decision maker can do. We can represent such constraints only indirectly, using arrows to the children of the decision and sometimes adding more variables to the influence diagram, thus making the ID more complicated. Users of influence diagrams often want to represent constraints by arrows into decision nodes. We represent constraints on decisions by allowing relevance arrows into decision nodes. 
We call the resulting representation information/relevance influence diagrams (IRIDs). Information/relevance influence diagrams allow for direct representation and specification of constrained decisions. We use a combination of stochastic dynamic programming and Gibbs sampling to solve IRIDs. This method is especially useful when exact methods for solving IDs fail.\nAn important issue in the use of expert systems is the so-called brittleness problem. Expert systems model only a limited part of the world. While the explicit management of uncertainty in expert systems mitigates the brittleness problem, it is still possible for a system to be used, unwittingly, in ways that the system is not prepared to address. Such a situation may be detected by the method of straw models, first presented by Jensen et al. [1990] and later generalized and justified by Laskey [1991]. We describe an algorithm, which we have implemented, that takes as input an annotated diagnostic Bayesian network (the base model) and constructs, without assistance, a bipartite network to be used as a straw model. We show that in some cases this straw model is better than the independent straw model of Jensen et al., the only other straw model for which a construction algorithm has been designed and implemented.\nWe show an alternative way of representing a Bayesian belief network by sensitivities and probability distributions. This representation is equivalent to the traditional representation by conditional probabilities, but makes dependencies between nodes apparent and intuitively easy to understand. We also propose a QR matrix representation for the sensitivities and/or conditional probabilities which is more efficient, in both memory requirements and computational speed, than the traditional representation for computer-based implementations of probabilistic inference. 
We use sensitivities to show that for a certain class of binary networks, the computation time for approximate probabilistic inference with any positive upper bound on the error of the result is independent of the size of the network. Finally, as an alternative to traditional algorithms that use conditional probabilities, we describe an exact algorithm for probabilistic inference that uses the QR-representation for sensitivities and updates probability distributions of nodes in a network according to messages from the neighbors.\nThis paper introduces the independent choice logic, and in particular the \"single agent with nature\" instance of the independent choice logic, namely ICLdt. This is a logical framework for decision making under uncertainty that extends both logic programming and stochastic models such as influence diagrams. This paper shows how the representation of a decision problem within the independent choice logic can be exploited to cut down the combinatorics of dynamic programming. One of the main problems with influence diagram evaluation techniques is the need to optimise a decision for all values of the 'parents' of a decision variable. In this paper we show how the rule based nature of the ICLdt can be exploited so that we only make distinctions in the values of the information available for a decision that will make a difference to utility.\nBayesian belief networks are being increasingly used as a knowledge representation for diagnostic reasoning. One simple method for conducting diagnostic reasoning is to represent system faults and observations only. In this paper, we investigate how having intermediate nodes (nodes other than fault and observation nodes) affects the diagnostic performance of a Bayesian belief network. We conducted a series of experiments on a set of real belief networks for medical diagnosis in liver and bile disease. 
We compared the effects on diagnostic performance of a two-level network consisting just of disease and finding nodes with that of a network which models intermediate pathophysiological disease states as well. We provide some theoretical evidence for differences observed between the abstracted two-level network and the full network.\nTypical approaches to plan recognition start from a representation of an agent's possible plans, and reason evidentially from observations of the agent's actions to assess the plausibility of the various candidates. A more expansive view of the task (consistent with some prior work) accounts for the context in which the plan was generated, the mental state and planning process of the agent, and consequences of the agent's actions in the world. We present a general Bayesian framework encompassing this view, and focus on how context can be exploited in plan recognition. We demonstrate the approach on a problem in traffic monitoring, where the objective is to induce the plan of the driver from observation of vehicle movements. Starting from a model of how the driver generates plans, we show how the highway context can appropriately influence the recognizer's interpretation of observed driver behavior.\nThe main goal of this paper is to describe a new pruning method for solving decision trees and game trees. The pruning method for decision trees suggests a slight variant of decision trees that we call scenario trees. In scenario trees, we do not need a conditional probability for each edge emanating from a chance node. Instead, we require a joint probability for each path from the root node to a leaf node. We compare the pruning method to the traditional rollback method for decision trees and game trees. For problems that require Bayesian revision of probabilities, a scenario tree representation with the pruning method is more efficient than a decision tree representation with the rollback method. 
For game trees, the pruning method is more efficient than the rollback method.\nThe use of directed acyclic graphs (DAGs) to represent conditional independence relations among random variables has proved fruitful in a variety of ways. Recursive structural equation models are one kind of DAG model. However, non-recursive structural equation models of the kinds used to model economic processes are naturally represented by directed cyclic graphs. For linear systems with independent errors, a characterization of conditional independence constraints is obtained, and it is shown that the result generalizes in a natural way to systems in which the error variables or noises are statistically dependent. For non-linear systems with independent errors a sufficient condition for conditional independence of variables in associated distributions is obtained.\nProbabilistic model-based diagnosis computes the posterior probabilities of failure of components from the prior probabilities of component failure and observations of system behavior. One problem with this method is that such priors are almost never directly available. One of the reasons is that the prior probability estimates include an implicit notion of a time interval over which they are specified -- for example, if the probability of failure of a component is 0.05, is this over the period of a day or is this over a week? A second problem facing probabilistic model-based diagnosis is the modeling of persistence. Say we have an observation about a system at time t_1 and then another observation at a later time t_2. To compute posterior probabilities that take into account both the observations, we need some model of how the state of the system changes from time t_1 to t_2. In this paper, we address these problems using techniques from Reliability theory. 
We show how to compute the failure prior of a component from an empirical measure of its reliability -- the Mean Time Between Failure (MTBF). We also develop a scheme to model persistence when handling multiple time-tagged observations.\nThe goal of diagnosis is to compute good repair strategies in response to anomalous system behavior. In a decision theoretic framework, a good repair strategy has low expected cost. In a general formulation of the problem, the computation of the optimal (lowest expected cost) repair strategy for a system with multiple faults is intractable. In this paper, we consider an interesting and natural restriction on the behavior of the system being diagnosed: (a) the system exhibits faulty behavior if and only if one or more components are malfunctioning; (b) the failures of the system components are independent. Given this restriction on system behavior, we develop a polynomial time algorithm for computing the optimal repair strategy. We then go on to introduce a system hierarchy and the notion of inspecting (testing) components before repair. We develop a linear time algorithm for computing an optimal repair strategy for the hierarchical system which includes both repair and inspection.\nThe goal of model-based diagnosis is to isolate causes of anomalous system behavior and recommend inexpensive repair actions in response. In general, precomputing optimal repair policies is intractable. To date, investigators addressing this problem have explored approximations that either impose restrictions on the system model (such as a single fault assumption) or compute an immediate best action with limited lookahead. In this paper, we develop a formulation of repair in model-based diagnosis and a repair algorithm that computes optimal sequences of actions. This optimal approach is costly but can be applied to precompute an optimal repair strategy for compact systems. 
We show how we can exploit a hierarchical system specification to make this approach tractable for large systems. When introducing hierarchy, we also consider the tradeoff between simply replacing a component and decomposing it to repair its subcomponents. The hierarchical repair algorithm is suitable for off-line precomputation of an optimal repair strategy. A modification of the algorithm takes advantage of an iterative deepening scheme to trade off inference time and the quality of the computed strategy.\nStandard algorithms for finding the shortest path in a graph require that the cost of a path be additive in edge costs, and typically assume that costs are deterministic. We consider the problem of uncertain edge costs, with potential probabilistic dependencies among the costs. Although these dependencies violate the standard dynamic-programming decomposition, we identify a weaker stochastic consistency condition that justifies a generalized dynamic-programming approach based on stochastic dominance. We present a revised path-planning algorithm and prove that it produces optimal paths under time-dependent uncertain costs. We test the algorithm by applying it to a model of stochastic bus networks, and present empirical performance results comparing it to some alternatives. Finally, we consider extensions of these concepts to a more general class of problems of heuristic search under uncertainty.\nDeveloping Question Answering systems has been one of the important research issues because it requires insights from a variety of disciplines, including Artificial Intelligence, Information Retrieval, Information Extraction, Natural Language Processing, and Psychology. In this paper we realize a formal model for a lightweight semantic based open domain yes/no Arabic question answering system based on paragraph retrieval with variable length. We propose a constrained semantic representation 
using an explicit unification framework based on semantic similarities and query expansion (synonyms and antonyms). This frequently improves the precision of the system. Employing the passage retrieval system achieves better precision by retrieving more paragraphs that contain relevant answers to the question; it significantly reduces the amount of text to be processed by the system.\nBayesian learning of belief networks (BLN) is a method for automatically constructing belief networks (BNs) from data using search and Bayesian scoring techniques. K2 is a particular instantiation of the method that implements a greedy search strategy. To evaluate the accuracy of K2, we randomly generated a number of BNs and for each of those we simulated data sets. K2 was then used to induce the generating BNs from the simulated data. We examine the performance of the program, and the factors that influence it. We also present a simple BN model, developed from our results, which predicts the accuracy of K2, when given various characteristics of the data set.\nWe have previously reported a Bayesian algorithm for determining the coordinates of points in three-dimensional space from uncertain constraints. This method is useful in the determination of biological molecular structure. It is limited, however, by the requirement that the uncertainty in the constraints be normally distributed. In this paper, we present an extension of the original algorithm that allows constraint uncertainty to be represented as a mixture of Gaussians, and thereby allows arbitrary constraint distributions. We illustrate the performance of this algorithm on a problem drawn from the domain of molecular structure determination, in which a multicomponent constraint representation produces a much more accurate solution than the old single component mechanism. The new mechanism uses mixture distributions to decompose the problem into a set of independent problems with unimodal constraint uncertainty. 
The results of the unimodal subproblems are periodically recombined using Bayes' law, to avoid combinatorial explosion. The new algorithm is particularly suited for parallel implementation.\nIn previous work [BGHK92, BGHK93], we have studied the random-worlds approach -- a particular (and quite powerful) method for generating degrees of belief (i.e., subjective probabilities) from a knowledge base consisting of objective (first-order, statistical, and default) information. But allowing a knowledge base to contain only objective information is sometimes limiting. We occasionally wish to include information about degrees of belief in the knowledge base as well, because there are contexts in which old beliefs represent important information that should influence new beliefs. In this paper, we describe three quite general techniques for extending a method that generates degrees of belief from objective information to one that can make use of degrees of belief as well. All of our techniques are based on well-known approaches, such as cross-entropy. We discuss general connections between the techniques and in particular show that, although conceptually and technically quite different, all of the techniques give the same answer when applied to the random-worlds method.\nEvaluation of counterfactual queries (e.g., \"If A were true, would C have been true?\") is important to fault diagnosis, planning, and determination of liability. In this paper we present methods for computing the probabilities of such queries using the formulation proposed in [Balke and Pearl, 1994], where the antecedent of the query is interpreted as an external action that forces the proposition A to be true. When a prior probability is available on the causal mechanisms governing the domain, counterfactual probabilities can be evaluated precisely. However, when causal knowledge is specified as conditional probabilities on the observables, only bounds can be computed. 
This paper develops techniques for evaluating these bounds, and demonstrates their use in two applications: (1) the determination of treatment efficacy from studies in which subjects may choose their own treatment, and (2) the determination of liability in product-safety litigation.\nI describe a planning methodology for domains with uncertainty in the form of external events that are not completely predictable. The events are represented by enabling conditions and probabilities of occurrence. The planner is goal-directed and backward chaining, but the subgoals are suggested by analyzing the probability of success of the partial plan rather than being simply the open conditions of the operators in the plan. The partial plan is represented as a Bayesian belief net to compute its probability of success. Since calculating the probability of success of a plan can be very expensive, I introduce two other techniques for computing it, one that uses Monte Carlo simulation to estimate it and one based on a Markov chain representation that uses knowledge about the dependencies between the predicates describing the domain.\nBayesian belief network learning algorithms have three basic components: a measure of a network structure and a database, a search heuristic that chooses network structures to be considered, and a method of estimating the probability tables from the database. This paper contributes to all three of these topics. The behaviors of the Bayesian measure of Cooper and Herskovits and of a minimum description length (MDL) measure are compared with respect to their properties for both limiting size and finite size databases. It is shown that the MDL measure has more desirable properties than the Bayesian measure when a distribution is to be learned. It is shown that selecting belief networks with certain minimality properties is NP-hard. This result justifies the use of search heuristics instead of exact algorithms for choosing network structures to be considered. 
In some cases, a collection of belief networks can be represented by a single belief network which leads to a new kind of probability table estimation called smoothing. We argue that smoothing can be efficiently implemented by incorporating it in the search heuristic. Experimental results suggest that for learning probabilities of belief networks smoothing is helpful.\nThe expected value of information (EVI) is the most powerful measure of sensitivity to uncertainty in a decision model: it measures the potential of information to improve the decision, and hence measures the expected value of outcome. Standard methods for computing EVI use discrete variables and are computationally intractable for models that contain more than a few variables. Monte Carlo simulation provides the basis for more tractable evaluation of large predictive models with continuous and discrete variables, but so far computation of EVI in a Monte Carlo setting also has appeared impractical. We introduce an approximate approach based on pre-posterior analysis for estimating EVI in Monte Carlo models. Our method uses a linear approximation to the value function and multiple linear regression to estimate the linear model from the samples. The approach is efficient and practical for extremely large models. It allows easy estimation of EVI for perfect or partial information on individual variables or on combinations of variables. We illustrate its implementation within Demos (a decision modeling system), and its application to a large model for crisis transportation planning.\nThis work proposes action networks as a semantically well-founded framework for reasoning about actions and change under uncertainty. Action networks add two primitives to probabilistic causal networks: controllable variables and persistent variables. Controllable variables allow the representation of actions as directly setting the value of specific events in the domain, subject to preconditions. 
Persistent variables provide a canonical model of persistence according to which both the state of a variable and the causal mechanism dictating its value persist over time unless intervened upon by an action (or its consequences). Action networks also allow different methods for quantifying the uncertainty in causal relationships, which go beyond traditional probabilistic quantification. This paper describes both recent results and work in progress.\nWe study the connection between kappa calculus and probabilistic reasoning in diagnosis applications. Specifically, we abstract a probabilistic belief network for diagnosing faults into a kappa network and compare the ordering of faults computed using both methods. We show that, at least for the example examined, the ordering of faults coincides as long as all the causal relations in the original probabilistic network are taken into account. We also provide a formal analysis of some network structures where the two methods will differ. Both kappa rankings and infinitesimal probabilities have been used extensively to study default reasoning and belief revision. But little has been done on utilizing their connection as outlined above. This is partly because the relation between kappa and probability calculi assumes that probabilities are arbitrarily close to one (or zero). The experiments in this paper investigate this relation when this assumption is not satisfied. The reported results have important implications for the use of kappa rankings to enhance the knowledge engineering of uncertainty models.\nWhen agents devise plans for execution in the real world, they face two important forms of uncertainty: they can never have complete knowledge about the state of the world, and they do not have complete control, as the effects of their actions are uncertain. While most classical planning methods avoid explicit uncertainty reasoning, we believe that uncertainty should be explicitly represented and reasoned about. 
We develop a probabilistic representation for states and actions, based on belief networks. We define conditional belief nets (CBNs) to capture the probabilistic dependency of the effects of an action upon the state of the world. We also use a CBN to represent the intrinsic relationships among entities in the environment, which persist from state to state. We present a simple projection algorithm to construct the belief network of the state succeeding an action, using the environment CBN model to infer indirect effects. We discuss how the qualitative aspects of belief networks and CBNs make them appropriate for the various stages of the problem solving process, from model construction to the design of planning algorithms.\nMost algorithms for propagating evidence through belief networks have been exact and exhaustive: they produce an exact (point-valued) marginal probability for every node in the network. Often, however, an application will not need information about every node in the network nor will it need exact probabilities. We present the localized partial evaluation (LPE) propagation algorithm, which computes interval bounds on the marginal probability of a specified query node by examining a subset of the nodes in the entire network. Conceptually, LPE ignores parts of the network that are \"too far away\" from the queried node to have much impact on its value. LPE has the \"anytime\" property of being able to produce better solutions (tighter intervals) given more time to consider more of the network.\nAI planning algorithms have addressed the problem of generating sequences of operators that achieve some input goal, usually assuming that the planning agent has perfect control over and information about the world. Relaxing these assumptions requires an extension to the action representation that allows reasoning both about the changes an action makes and the information it provides. 
This paper presents an action representation that extends the deterministic STRIPS model, allowing actions to have both causal and informational effects, both of which can be context dependent and noisy. We also demonstrate how a standard least-commitment planning algorithm can be extended to include informational actions and contingent execution.\nAn ordinal view of independence is studied in the framework of possibility theory. We investigate three possible definitions of dependence, of increasing strength. One of them is the counterpart to the multiplication law in probability theory, and the two others are based on the notion of conditional possibility. These two have enough expressive power to support the whole possibility theory, and a complete axiomatization is provided for the strongest one. Moreover we show that weak independence is well-suited to the problems of belief change and plausible reasoning, especially to address the problem of blocking of property inheritance in exception-tolerant taxonomic reasoning.\nIn this paper, we introduce evidence propagation operations on influence diagrams and a concept of value of evidence, which measures the value of experimentation. Evidence propagation operations are critical for the computation of the value of evidence, general update and inference operations in normative expert systems which are based on the influence diagram (generalized Bayesian network) paradigm. The value of evidence allows us to compute directly an outcome sensitivity, a value of perfect information and a value of control which are used in decision analysis (the science of decision making under uncertainty). More specifically, the outcome sensitivity is the maximum difference among the values of evidence, the value of perfect information is the expected value of the values of evidence, and the value of control is the optimal value of the values of evidence. 
We also discuss an implementation and relative computational efficiency issues related to the value of evidence and the value of perfect information.\nWe describe algorithms for learning Bayesian networks from a combination of user knowledge and statistical data. The algorithms have two components: a scoring metric and a search procedure. The scoring metric takes a network structure, statistical data, and a user's prior knowledge, and returns a score proportional to the posterior probability of the network structure given the data. The search procedure generates networks for evaluation by the scoring metric. Previous work has concentrated on metrics for domains containing only discrete variables, under the assumption that data represents a multinomial sample. In this paper, we extend this work, developing scoring metrics for domains containing all continuous variables or a mixture of discrete and continuous variables, under the assumption that continuous data is sampled from a multivariate normal distribution. Our work extends traditional statistical approaches for identifying vanishing regression coefficients in that we identify two important assumptions, called event equivalence and parameter modularity, that when combined allow the construction of prior distributions for multivariate normal parameters from a single prior Bayesian network specified by a user.\nTesting the validity of probabilistic models containing unmeasured (hidden) variables is shown to be a hard task. We show that the task of testing whether models are structurally incompatible with the data at hand requires an exponential number of independence evaluations, each of the form: \"X is conditionally independent of Y, given Z.\" In contrast, a linear number of such evaluations is required to test a standard Bayesian network (one per vertex). 
On the positive side, we show that if a network with hidden variables G has a tree skeleton, checking whether G represents a given probability model P requires a polynomial number of such independence evaluations. Moreover, we provide an algorithm that efficiently constructs a tree-structured Bayesian network (with hidden variables) that represents P if such a network exists, and further recognizes when such a network does not exist.\nWe introduce an approach to high-level conditional planning we call epsilon-safe planning. This probabilistic approach commits us to planning to meet some specified goal with a probability of success of at least 1-epsilon for some user-supplied epsilon. We describe several algorithms for epsilon-safe planning based on conditional planners. The two conditional planners we discuss are Peot and Smith's nonlinear conditional planner, CNLP, and our own linear conditional planner, PLINTH. We present a straightforward extension to conditional planners for which computing the necessary probabilities is simple, employing a commonly-made but perhaps overly-strong independence assumption. We also discuss a second approach to epsilon-safe planning which relaxes this independence assumption, involving the incremental construction of a probability dependence model in conjunction with the construction of the plan graph.\nWe present a method for dynamically generating Bayesian networks from knowledge bases consisting of first-order probability logic sentences. We present a subset of probability logic sufficient for representing the class of Bayesian networks with discrete-valued nodes. We impose constraints on the form of the sentences that guarantee that the knowledge base contains all the probabilistic information necessary to generate a network. We define the concept of d-separation for knowledge bases and prove that a knowledge base with independence conditions defined by d-separation is a complete specification of a probability distribution. 
We present a network generation algorithm that, given an inference problem in the form of a query Q and a set of evidence E, generates a network to compute P(Q|E). We prove the algorithm to be correct.\nHeckerman (1993) defined causal independence in terms of a set of temporal conditional independence statements. These statements formalized certain types of causal interaction where (1) the effect is independent of the order in which causes are introduced and (2) the impact of a single cause on the effect does not depend on what other causes have previously been applied. In this paper, we introduce an equivalent atemporal characterization of causal independence based on a functional representation of the relationship between causes and the effect. In this representation, the interaction between causes and effect can be written as a nested decomposition of functions. Causal independence can be exploited by representing this decomposition in the belief network, resulting in representations that are more efficient for inference than general causal models. We present empirical results showing the benefits of a causal-independence representation for belief-network inference.\nOn the one hand, classical terminological knowledge representation excludes the possibility of handling uncertain concept descriptions involving, e.g., \"usually true\" concept properties, generalized quantifiers, or exceptions. On the other hand, purely numerical approaches for handling uncertainty in general are unable to consider terminological knowledge. This paper presents the language ACP which is a probabilistic extension of terminological logics and aims at closing the gap between the two areas of research. We present the formal semantics underlying the language ALUP and introduce the probabilistic formalism that is based on classes of probabilities and is realized by means of probabilistic constraints. 
Besides inferring implicitly existent probabilistic relationships, the constraints guarantee terminological and probabilistic consistency. Altogether, the new language ALUP applies to domains where both term descriptions and uncertainty have to be handled.\nQualitative and infinitesimal probability schemes are consistent with the axioms of probability theory, but avoid the need for precise numerical probabilities. Using qualitative probabilities could substantially reduce the effort for knowledge engineering and improve the robustness of results. We examine experimentally how well infinitesimal probabilities (the kappa-calculus of Goldszmidt and Pearl) perform a diagnostic task - troubleshooting a car that will not start by comparison with a conventional numerical belief network. We found the infinitesimal scheme to be as good as the numerical scheme in identifying the true fault. The performance of the infinitesimal scheme worsens significantly for prior fault probabilities greater than 0.03. These results suggest that infinitesimal probability methods may be of substantial practical value for machine diagnosis with small prior fault probabilities.\nPossibilistic logic, an extension of first-order logic, deals with uncertainty that can be estimated in terms of possibility and necessity measures. Syntactically, this means that a first-order formula is equipped with a possibility degree or a necessity degree that expresses to what extent the formula is possibly or necessarily true. Possibilistic resolution yields a calculus for possibilistic logic which respects the semantics developed for possibilistic logic. A drawback, which possibilistic resolution inherits from classical resolution, is that it may not terminate if applied to formulas belonging to decidable fragments of first-order logic. Therefore we propose an alternative proof method for possibilistic logic. 
The main feature of this method is that it completely abstracts from a concrete calculus but uses as basic operation a test for classical entailment. We then instantiate possibilistic logic with a terminological logic, which is a decidable subclass of first-order logic but nevertheless much more expressive than propositional logic. This yields an extension of terminological logics towards the representation of uncertain knowledge which is satisfactory from a semantic as well as algorithmic point of view.\nWe give an axiomatization of confidence transfer - a known conditioning scheme - from the perspective of expectation-based inference in the sense of Gardenfors and Makinson. Then, we use the notion of belief independence to \"filter out\" different proposals of possibilistic conditioning rules, all of which are variations of confidence transfer. Among the three rules that we consider, only Dempster's rule of conditioning passes the test of supporting the notion of belief independence. With the use of this conditioning rule, we then show that we can use local computation for computing desired conditional marginal possibilities of the joint possibility satisfying the given constraints. It turns out that our local computation scheme has already been proposed by Shenoy. However, our intuitions are completely different from those of Shenoy. While Shenoy just defines a local computation scheme that fits his framework of valuation-based systems, we derive that local computation scheme from Π(β) = Π(β | α) * Π(α) and appropriate independence assumptions, just as the Bayesians derive their local computation scheme.\nTo coordinate with other agents in its environment, an agent needs models of what the other agents are trying to do. When communication is impossible or expensive, this information must be acquired indirectly via plan recognition. 
Typical approaches to plan recognition start with a specification of the possible plans the other agents may be following, and develop special techniques for discriminating among the possibilities. Perhaps more desirable would be a uniform procedure for mapping plans to general structures supporting inference based on uncertain and incomplete observations. In this paper, we describe a set of methods for converting plans represented in a flexible procedural language to observation models represented as probabilistic belief networks.\nThe paper presents a method for reducing the computational complexity of Bayesian networks through identification and removal of weak dependencies (removal of links from the (moralized) independence graph). The removal of a small number of links may reduce the computational complexity dramatically, since several fill-ins and moral links may be rendered superfluous by the removal. The method is described in terms of impact on the independence graph, the junction tree, and the potential functions associated with these. An empirical evaluation of the method using large real-world networks demonstrates the applicability of the method. Further, the method, which has been implemented in Hugin, complements the approximation method suggested by Jensen & Andersen (1990).\nWe explore the issue of refining an existent Bayesian network structure using new data which might mention only a subset of the variables. Most previous works have only considered the refinement of the network's conditional probability parameters, and have not addressed the issue of refining the network's structure. We develop a new approach for refining the network's structure. Our approach is based on the Minimal Description Length (MDL) principle, and it employs an adapted version of a Bayesian network learning algorithm developed in our previous work. One of the adaptations required is to modify the previous algorithm to account for the structure of the existent network. 
The learning algorithm generates a partial network structure which can then be used to improve the existent network. We also present experimental evidence demonstrating the effectiveness of our approach.\nWe view the syntax-based approaches to default reasoning as a model-based diagnosis problem, where each source giving a piece of information is considered as a component. It is formalized in the ATMS framework (each source corresponds to an assumption). We assume then that all sources are independent and \"fail\" with a very small probability. This leads to a probability assignment on the set of candidates, or equivalently on the set of consistent environments. This probability assignment induces a Dempster-Shafer belief function which measures the probability that a proposition can be deduced from the evidence. This belief function can be used in several different ways to define a non-monotonic consequence relation. We study and compare these consequence relations. The case of prioritized knowledge bases is briefly considered.\nA model to represent spatial information is presented in this paper. It is based on fuzzy constraints represented as fuzzy geometric relations that can be hierarchically structured. The concept of spatial template is introduced to capture the idea of interrelated objects in two-dimensional space. The representation model is used to specify imprecise or vague information consisting of relative locations and orientations of template objects. It is shown in this paper how a template represented by this model can be matched against a crisp situation to recognize a particular instance of this template. Furthermore, the proximity measure (fuzzy measure) between the instance and the template is worked out - this measure can be interpreted as a degree of similarity. In this context, template recognition can be viewed as a case of fuzzy pattern recognition. 
The results of this work have been implemented and applied to a complex military problem from which this work originated.\nThis paper describes the best-first search strategy used by U-Plan (Mansell 1993a), a planning system that constructs quantitatively ranked plans given an incomplete description of an uncertain environment. U-Plan uses uncertain and incomplete evidence describing the environment, characterizes it using a Dempster-Shafer interval, and generates a set of possible world states. Plan construction takes place in an abstraction hierarchy where strategic decisions are made before tactical decisions. Search through this abstraction hierarchy is guided by a quantitative measure (expected fulfillment) based on decision theory. The search strategy is best-first with the provision to update expected fulfillment and review previous decisions in the light of planning developments. U-Plan generates multiple plans for multiple possible worlds, and attempts to use existing plans for new world situations. A super-plan is then constructed, based on merging the set of plans and appropriately timed knowledge acquisition operators, which are used to decide between plan alternatives during plan execution.\nIn this paper we describe a framework for model-based diagnosis of dynamic systems, which extends previous work in this field by using and expressing temporal uncertainty in the form of qualitative interval relations a la Allen. Based on a logical framework extended by qualitative and quantitative temporal constraints we show how to describe behavioral models (both consistency- and abductive-based), discuss how to use abstract observations and show how abstract temporal diagnoses are computed. This yields an expressive framework, which allows the representation of complex temporal behavior allowing us to represent temporal uncertainty. 
Due to its abstraction capabilities computation is made independent of the number of observations and time points in a temporal setting. An example of hepatitis diagnosis is used throughout the paper.\nCertain classes of problems, including perceptual data understanding, robotics, discovery, and learning, can be represented as incremental, dynamically constructed belief networks. These automatically constructed networks can be dynamically extended and modified as evidence of new individuals becomes available. The main result of this paper is the incremental extension of the singly connected polytree network in such a way that the network retains its singly connected polytree structure after the changes. The algorithm is deterministic and is guaranteed to have a complexity of single node addition that is at most of order proportional to the number of nodes (or size) of the network. Additional speed-up can be achieved by maintaining the path information. Despite its incremental and dynamic nature, the algorithm can also be used for probabilistic inference in belief networks in a fashion similar to other exact inference algorithms.\nWe present a symbolic machinery that admits both probabilistic and causal information about a given domain and produces probabilistic statements about the effect of actions and the impact of observations. The calculus admits two types of conditioning operators: ordinary Bayes conditioning, P(y|X = x), which represents the observation X = x, and causal conditioning, P(y|do(X = x)), read the probability of Y = y conditioned on holding X constant (at x) by deliberate action. 
Given a mixture of such observational and causal sentences, together with the topology of the causal graph, the calculus derives new conditional probabilities of both types, thus enabling one to quantify the effects of actions (and policies) from partially specified knowledge bases, such as Bayesian networks in which some conditional probabilities may not be available.\nThis paper describes a novel approach to planning which takes advantage of decision theory to greatly improve robustness in an uncertain environment. We present an algorithm which computes conditional plans of maximum expected utility. This algorithm relies on a representation of the search space as an AND/OR tree and employs a depth-limit to control computation costs. A numeric robustness factor, which parameterizes the utility function, allows the user to modulate the degree of risk-aversion employed by the planner. Via a look-ahead search, the planning algorithm seeks to find an optimal plan using expected utility as its optimization criterion. We present experimental results obtained by applying our algorithm to a non-deterministic extension of the blocks world domain. Our results demonstrate that the robustness factor governs the degree of risk embodied in the conditional plans computed by our algorithm.\nWe present several techniques for knowledge engineering of large belief networks (BNs) based on our experiences with a network derived from a large medical knowledge base. The noisy-MAX, a generalization of the noisy-OR gate, is used to model causal independence in a BN with multi-valued variables. We describe the use of leak probabilities to enforce the closed-world assumption in our model. We present Netview, a visualization tool based on causal independence and the use of leak probabilities. The Netview software allows knowledge engineers to dynamically view sub-networks for knowledge engineering, and it provides version control for editing a BN. 
Netview generates sub-networks in which leak probabilities are dynamically updated to reflect the missing portions of the network.\nBayesian Belief Networks (BBNs) are a powerful formalism for reasoning under uncertainty but bear some severe limitations: they require a large amount of information before any reasoning process can start, they have limited contradiction handling capabilities, and their ability to provide explanations for their conclusions is still controversial. There exists a class of reasoning systems, called Truth Maintenance Systems (TMSs), which are able to deal with partially specified knowledge, to provide well-founded explanation for their conclusions, and to detect and handle contradictions. TMSs incorporating measures of uncertainty are called Belief Maintenance Systems (BMSs). This paper describes how a BMS based on probabilistic logic can be applied to BBNs, thus introducing a new class of BBNs, called Ignorant Belief Networks, able to incrementally deal with partially specified conditional dependencies, to provide explanations, and to detect and handle contradictions.\nIn this paper we propose a new approach to probabilistic inference on belief networks, global conditioning, which is a simple generalization of Pearl's (1986b) method of loop-cutset conditioning. We show that global conditioning, as well as loop-cutset conditioning, can be thought of as a special case of the method of Lauritzen and Spiegelhalter (1988) as refined by Jensen et al (1990a; 1990b). Nonetheless, this approach provides new opportunities for parallel processing and, in the case of sequential processing, a tradeoff of time for memory. We also show how a hybrid method (Suermondt and others 1990) combining loop-cutset conditioning with Jensen's method can be viewed within our framework. 
By exploring the relationships between these methods, we develop a unifying framework in which the advantages of each approach can be combined successfully.\nOver time, there have been refinements in the way that probability distributions are used for representing beliefs. Models which rely on single probability distributions depict a complete ordering among the propositions of interest, yet human beliefs are sometimes not completely ordered. Non-singleton sets of probability distributions can represent partially ordered beliefs. Convex sets are particularly convenient and expressive, but it is known that there are reasonable patterns of belief whose faithful representation requires less restrictive sets. The present paper shows that prior ignorance about three or more exclusive alternatives and the emergence of partially ordered beliefs when evidence is obtained defy representation by any single set of distributions, but yield to a representation based on several sets. The partial order is shown to be a partial qualitative probability which shares some intuitively appealing attributes with probability distributions.\nProbability measures by themselves are known to be inappropriate for modeling the dynamics of plain belief and their excessively strong measurability constraints make them unsuitable for some representational tasks, e.g. in the context of first-order knowledge. In this paper, we are therefore going to look for possible alternatives and extensions. We begin by delimiting the general area of interest, proposing a minimal list of assumptions to be satisfied by any reasonable quasi-probabilistic valuation concept. Within this framework, we investigate two particularly interesting kinds of quasi-measures which are unaffected, or much less affected, by the traditional problems. * Ranking measures, which generalize Spohn-type and possibility measures. 
* Cumulative measures, which combine the probabilistic and the ranking philosophy, thereby allowing a fine-grained account of static and dynamic belief.\nWe have developed a general Bayesian algorithm for determining the coordinates of points in a three-dimensional space. The algorithm takes as input a set of probabilistic constraints on the coordinates of the points, and an a priori distribution for each point location. The output is a maximum-likelihood estimate of the location of each point. We use the extended, iterated Kalman filter, and add a search heuristic for optimizing its solution under nonlinear conditions. This heuristic is based on the same principle as the simulated annealing heuristic for other optimization problems. Our method uses any probabilistic constraints that can be expressed as a function of the point coordinates (for example, distance, angles, dihedral angles, and planarity). It assumes that all constraints have Gaussian noise. In this paper, we describe the algorithm and show its performance on a set of synthetic data to illustrate its convergence properties, and its applicability to domains such as molecular structure determination.\nThis paper addresses the tradeoffs which need to be considered in reasoning using probabilistic network representations, such as Influence Diagrams (IDs). In particular, we examine the tradeoffs entailed in using Temporal Influence Diagrams (TIDs) which adequately capture the temporal evolution of a dynamic system without prohibitive data and computational requirements. 
Three approaches for TID construction which make different tradeoffs are examined: (1) tailoring the network at each time interval to the data available (rather than just copying the original Bayes Network for all time intervals); (2) modeling the evolution of a parsimonious subset of variables (rather than all variables); and (3) model selection approaches, which seek to minimize some measure of the predictive accuracy of the model without introducing too many parameters, which might cause \"overfitting\" of the model. Methods of evaluating the accuracy/efficiency of the tradeoffs are proposed.\nInfluence diagrams are ideal knowledge representations for Bayesian statistical models. However, these diagrams are difficult for end users to interpret and to manipulate. We present a user-based architecture that enables end users to create and to manipulate the knowledge representation. We use the problem of physicians' interpretation of two-arm parallel randomized clinical trials (TAPRCT) to illustrate the architecture and its use. There are three primary data structures. Elements of statistical models are encoded as subgraphs of a restricted class of influence diagram. The interpretations of those elements are mapped into users' language in a domain-specific, user-based semantic interface, called a patient-flow diagram, in the TAPRCT problem. Permitted transformations of the statistical model that maintain the semantic relationships of the model are encoded in a metadata-state diagram, called the cohort-state diagram, in the TAPRCT problem. The algorithm that runs the system uses modular actions called construction steps. This framework has been implemented in a system called THOMAS, that allows physicians to interpret the data reported from a TAPRCT.\nIn this paper we address the uncertainty issues involved in the low-level vision task of image segmentation. 
Researchers in computer vision have worked extensively on this problem, in which the goal is to partition (or segment) an image into regions that are homogeneous or uniform in some sense. This segmentation is often utilized by some higher level process, such as an object recognition system. We show that by considering uncertainty in a Bayesian formalism, we can use statistical image models to build an approximate representation of a probability distribution over a space of alternative segmentations. We give detailed descriptions of the various levels of uncertainty associated with this problem, discuss the interaction of prior and posterior distributions, and provide the operations for constructing this representation.\nDynamic network models (DNMs) are belief networks for temporal reasoning. The DNM methodology combines techniques from time series analysis and probabilistic reasoning to provide (1) a knowledge representation that integrates noncontemporaneous and contemporaneous dependencies and (2) methods for iteratively refining these dependencies in response to the effects of exogenous influences. We use belief-network inference algorithms to perform forecasting, control, and discrete event simulation on DNMs. The belief network formulation allows us to move beyond the traditional assumptions of linearity in the relationships among time-dependent variables and of normality in their probability distributions. We demonstrate the DNM methodology on an important forecasting problem in medicine. We conclude with a discussion of how the methodology addresses several limitations found in traditional time series analyses.\nWe compare the diagnostic accuracy of three diagnostic inference models: the simple Bayes model, the multimembership Bayes model, which is isomorphic to the parallel combination function in the certainty-factor model, and a model that incorporates the noisy OR-gate interaction. 
The comparison is done on 20 clinicopathological conference (CPC) cases from the American Journal of Medicine, challenging cases describing actual patients often with multiple disorders. We find that the distributions produced by the noisy OR model agree most closely with the gold-standard diagnoses, although substantial differences exist between the distributions and the diagnoses. In addition, we find that the multimembership Bayes model tends to significantly overestimate the posterior probabilities of diseases, whereas the simple Bayes model tends to significantly underestimate the posterior probabilities. Our results suggest that additional work to refine the noisy OR model for internal medicine will be worthwhile.\nThe inherent intractability of probabilistic inference has hindered the application of belief networks to large domains. Noisy OR-gates [30] and probabilistic similarity networks [18, 17] escape the complexity of inference by restricting model expressiveness. Recent work in the application of belief-network models to time-series analysis and forecasting [9, 10] has given rise to the additive belief network model (ABNM). We (1) discuss the nature and implications of the approximations made by an additive decomposition of a belief network, (2) show greater efficiency in the induction of additive models when available data are scarce, (3) generalize probabilistic inference algorithms to exploit the additive decomposition of ABNMs, (4) show greater efficiency of inference, and (5) compare results on inference with a simple additive belief network.\nFrom an inconsistent database non-trivial arguments may be constructed both for a proposition, and for the contrary of that proposition. Therefore, inconsistency in a logical database causes uncertainty about which conclusions to accept. This kind of uncertainty is called logical uncertainty. We define a concept of \"acceptability\", which induces a means for differentiating arguments. 
The more acceptable an argument, the more confident we are in it. A specific interest is to use the acceptability classes to assign linguistic qualifiers to propositions, such that the qualifier assigned to a proposition reflects its logical uncertainty. A more general interest is to understand how classes of acceptability can be defined for arguments constructed from an inconsistent database, and how this notion of acceptability can be devised to reflect different criteria. Whilst concentrating on the aspects of assigning linguistic qualifiers to propositions, we also indicate the more general significance of the notion of acceptability.\nWe take a utility-based approach to categorization. We construct generalizations about events and actions by considering losses associated with failing to distinguish among detailed distinctions in a decision model. The utility-based methods transform detailed states of the world into more abstract categories composed of disjunctions of the states. We show how we can cluster distinctions into groups of distinctions at progressively higher levels of abstraction, and describe rules for decision making with the abstractions. The techniques introduce a utility-based perspective on the nature of concepts, and provide a means of simplifying decision models used in automated reasoning systems. We demonstrate the techniques by describing the capabilities and output of TUBA, a program for utility-based abstraction.\nOne topic that is likely to attract an increasing amount of attention within the knowledge-based systems research community is the coordination of information provided by multiple experts. We envision a situation in which several experts independently encode information as belief networks. A potential user must then coordinate the conclusions and recommendations of these networks to derive some sort of consensus. 
One approach to such a consensus is the fusion of the contributed networks into a single, consensus model prior to the consideration of any case-specific data (specific observations, test results). This approach requires two types of combination procedures, one for probabilities and one for graphs. Since the combination of probabilities is relatively well understood, the key barriers to this approach lie in the realm of graph theory. This paper provides formal definitions of some of the operations needed to effect the required graphical combinations, and provides complexity analyses of these procedures. The paper's key result is that most of these operations are NP-hard, and its primary message is that the derivation of \"good\" consensus networks must be done heuristically.\nThis paper identifies and solves a new optimization problem: Given a belief network (BN) and a target ordering on its variables, how can we efficiently derive its minimal I-map whose arcs are consistent with the target ordering? We present three solutions to this problem, all of which lead to directed acyclic graphs based on the original BN's recursive basis relative to the specified ordering (such a DAG is sometimes termed the boundary DAG drawn from the given BN relative to the said ordering [5]). Along the way, we also uncover an important general principle about arc reversals: when reordering a BN according to some target ordering (while attempting to minimize the number of arcs generated), the sequence of arc reversals should follow the topological ordering induced by the original belief network's arcs to as great an extent as possible. These results promise to have a significant impact on the derivation of consensus models, as well as on other algorithms that require the reconfiguration and/or combination of BNs.\nProblems of probabilistic inference and decision making under uncertainty commonly involve continuous random variables. 
Often these are discretized to a few points to simplify assessments and computations. An alternative approximation is to fit analytically tractable continuous probability distributions. This approach has potential simplicity and accuracy advantages, especially if variables can be transformed first. This paper shows how a minimum relative entropy criterion can drive both transformation and fitting, illustrating with a power and logarithm family of transformations and mixtures of Gaussian (normal) distributions, which allow the use of efficient influence diagram methods. The fitting procedure in this case is the well-known EM algorithm. The selection of the number of components in a fitted mixture distribution is automated with an objective that trades off accuracy and computational cost.\nRelevance-based explanation is a scheme in which partial assignments to Bayesian belief network variables are explanations (abductive conclusions). We allow variables to remain unassigned in explanations as long as they are irrelevant to the explanation, where irrelevance is defined in terms of statistical independence. When multiple-valued variables exist in the system, especially when subsets of values correspond to natural types of events, the overspecification problem, alleviated by independence-based explanation, resurfaces. As a solution to that, as well as for addressing the question of explanation specificity, it is desirable to collapse such a subset of values into a single value on the fly. The equivalent method, which is adopted here, is to generalize the notion of assignments to allow disjunctive assignments. We proceed to define generalized independence-based explanations as maximum posterior probability independence-based generalized assignments (GIB-MAPs). GIB assignments are shown to have certain properties that ease the design of algorithms for computing GIB-MAPs. 
One such algorithm is discussed here, as well as suggestions for how other algorithms may be adapted to compute GIB-MAPs. GIB-MAP explanations still suffer from instability, a problem which may be addressed using \"approximate\" conditional independence as a condition for irrelevance.\nWe present a mechanism for constructing graphical models, specifically Bayesian networks, from a knowledge base of general probabilistic information. The unique feature of our approach is that it uses a powerful first-order probabilistic logic for expressing the general knowledge base. This logic allows for the representation of a wide range of logical and probabilistic information. The model construction procedure we propose uses notions from direct inference to identify pieces of local statistical information from the knowledge base that are most appropriate to the particular event we want to reason about. These pieces are composed to generate a joint probability distribution specified as a Bayesian network. Although there are fundamental difficulties in dealing with fully general knowledge, our procedure is practical for quite rich knowledge bases and it supports the construction of a far wider range of networks than allowed for by current template technology.\nPAGODA (Probabilistic Autonomous Goal-Directed Agent) is a model for autonomous learning in probabilistic domains [desJardins, 1992] that incorporates innovative techniques for using the agent's existing knowledge to guide and constrain the learning process and for representing, reasoning with, and learning probabilistic knowledge. This paper describes the probabilistic representation and inference mechanism used in PAGODA. PAGODA forms theories about the effects of its actions and the world state on the environment over time. These theories are represented as conditional probability distributions. 
A restriction is imposed on the structure of the theories that allows the inference mechanism to find a unique predicted distribution for any action and world state description. These restricted theories are called uniquely predictive theories. The inference mechanism, Probability Combination using Independence (PCI), uses minimal independence assumptions to combine the probabilities in a theory to make probabilistic predictions.\nIn previous work we developed a method of learning Bayesian Network models from raw data. This method relies on the well-known minimal description length (MDL) principle. The MDL principle is particularly well suited to this task as it allows us to trade off, in a principled way, the accuracy of the learned network against its practical usefulness. In this paper we present some new results that have arisen from our work. In particular, we present a new local way of computing the description length. This allows us to make significant improvements in our search algorithm. In addition, we modify our algorithm so that it can take into account partial domain information that might be provided by a domain expert. The local computation of description length also opens the door for local refinement of an existing network. The feasibility of our approach is demonstrated by experiments involving networks of a practical size.\nAs belief networks are used to model increasingly complex situations, the need to automatically construct them from large databases will become paramount. This paper concentrates on solving a part of the belief network induction problem: that of learning the quantitative structure (the conditional probabilities), given the qualitative structure. In particular, a theory is presented that shows how to propagate inference distributions in a belief network, with the only assumption being that the given qualitative structure is correct. Most inference algorithms must make at least this assumption. 
The theory is based on four network transformations that are sufficient for any inference in a belief network. Furthermore, the claim is made that, contrary to popular belief, error will not necessarily grow as the inference chain grows. Instead, for QBN belief nets induced from large enough samples, the error is more likely to decrease as the size of the inference chain increases.\nNumerous methods for probabilistic reasoning in large, complex belief or decision networks are currently being developed. There has been little research on automating the dynamic, incremental construction of decision models. A uniform value-driven method of decision model construction is proposed for hierarchical complete diagnosis. Hierarchical complete diagnostic reasoning is formulated as a stochastic process and modeled using influence diagrams. Given observations, this method creates decision models in order to obtain the best actions sequentially for locating and repairing a fault at minimum cost. This method constructs decision models incrementally, interleaving probe actions with model construction and evaluation. The method treats meta-level and base-level tasks uniformly. That is, the method takes a decision-theoretic look at the control of search in causal pathways and structural hierarchies.\nIn recent years the belief network has been used increasingly to model systems in AI that must perform uncertain inference. The development of efficient algorithms for probabilistic inference in belief networks has been a focus of much research in AI. Efficient algorithms for certain classes of belief networks have been developed, but the problem of reporting the uncertainty in inferred probabilities has received little attention. A system should not only be capable of reporting the values of inferred probabilities and/or the favorable choices of a decision; it should report the range of possible error in the inferred probabilities and/or choices. 
Two methods have been developed and implemented for determining the variance in inferred probabilities in belief networks. These methods, the Approximate Propagation Method and the Monte Carlo Integration Method are discussed and compared in this paper.\nWe describe a method for time-critical decision making involving sequential tasks and stochastic processes. The method employs several iterative refinement routines for solving different aspects of the decision making problem. This paper concentrates on the meta-level control problem of deliberation scheduling, allocating computational resources to these routines. We provide different models corresponding to optimization problems that capture the different circumstances and computational strategies for decision making under time constraints. We consider precursor models in which all decision making is performed prior to execution and recurrent models in which decision making is performed in parallel with execution, accounting for the states observed during execution and anticipating future states. We describe algorithms for precursor and recurrent models and provide the results of our empirical investigations to date.\nGiven a belief network with evidence, the task of finding the I most probable explanations (MPE) in the belief network is that of identifying and ordering the I most probable instantiations of the non-evidence nodes of the belief network. Although many approaches have been proposed for solving this problem, most work only for restricted topologies (i.e., singly connected belief networks). In this paper, we will present a new approach for finding I MPEs in an arbitrary belief network. First, we will present an algorithm for finding the MPE in a belief network. Then, we will present a linear time algorithm for finding the next MPE after finding the first MPE. 
Finally, we will discuss the problem of finding the MPE for a subset of variables of a belief network, and show that the problem can be efficiently solved by this approach.\nThis paper describes ongoing research into planning in an uncertain environment. In particular, it introduces U-Plan, a planning system that constructs quantitatively ranked plans given an incomplete description of the state of the world. U-Plan uses a Dempster-Shafer interval to characterise uncertain and incomplete information about the state of the world. The planner takes as input what is known about the world, and constructs a number of possible initial states with representations at different abstraction levels. A plan is constructed for the initial state with the greatest support, and this plan is tested to see if it will work for other possible initial states. All, part, or none of the existing plans may be used in the generation of the plans for the remaining possible worlds. Planning takes place in an abstraction hierarchy where strategic decisions are made before tactical decisions. A super-plan is then constructed, based on merging the set of plans and the appropriately timed acquisition of essential knowledge, which is used to decide between plan alternatives. U-Plan usually produces a super-plan in less time than a classical planner would take to produce a set of plans, one for each possible world.\nThis paper discusses how conflicts (as used by the consistency-based diagnosis community) can be adapted to be used in a search-based algorithm for computing prior and posterior probabilities in discrete Bayesian Networks. This is an \"anytime\" algorithm that at any stage can estimate the probabilities and give an error bound. Whereas the most popular Bayesian net algorithms exploit the structure of the network for efficiency, we exploit probability distributions for efficiency; this algorithm is most suited to the case with extreme probabilities. 
This paper presents a solution to the inefficiencies found in naive algorithms, and shows how the tools of the consistency-based diagnosis community (namely conflicts) can be used effectively to improve the efficiency. Empirical results with networks having tens of thousands of nodes are presented.\nBayesian belief networks can be used to represent and to reason about complex systems with uncertain, incomplete and conflicting information. Belief networks are graphs encoding and quantifying probabilistic dependence and conditional independence among variables. One type of reasoning of interest in diagnosis is called abductive inference (determination of the global most probable system description given the values of any partial subset of variables). In some cases, abductive inference can be performed with exact algorithms using distributed network computations, but it is an NP-hard problem, and complexity increases drastically with the presence of undirected cycles, number of discrete states per variable, and number of variables in the network. This paper describes an approximate method based on genetic algorithms to perform abductive inference in large, multiply connected networks for which complexity is a concern when using most exact methods and for which systematic search methods are not feasible. The theoretical adequacy of the method is discussed and preliminary experimental results are presented.\nThis paper presents and discusses several methods for reasoning from inconsistent knowledge bases. A so-called argumentative-consequence relation, taking into account the existence of consistent arguments in favor of a conclusion and the absence of consistent arguments in favor of its contrary, is particularly investigated. Flat knowledge bases, i.e. without any priority between their elements, as well as prioritized ones where some elements are considered as more strongly entrenched than others, are studied under different consequence relations. 
Lastly, a paraconsistent-like treatment of prioritized knowledge bases is proposed, where both the level of entrenchment and the level of paraconsistency attached to a formula are propagated. The priority levels are handled in the framework of possibility theory.\nArgumentation is the process of constructing arguments about propositions, and the assignment of statements of confidence to those propositions based on the nature and relative strength of their supporting arguments. The process is modelled as a labelled deductive system, in which propositions are doubly labelled with the grounds on which they are based and a representation of the confidence attached to the argument. Argument construction is captured by a generalized argument consequence relation based on the ∧,→-fragment of minimal logic. Arguments can be aggregated by a variety of numeric and symbolic flattening functions. This approach appears to shed light on the common logical structure of a variety of quantitative, qualitative and defeasible uncertainty calculi.\nWe present a semantics for adding uncertainty to conditional logics for default reasoning and belief revision. We are able to treat conditional sentences as statements of conditional probability, and express rules for revision such as \"If A were believed, then B would be believed to degree p.\" This method of revision extends conditionalization by allowing meaningful revision by sentences whose probability is zero. This is achieved through the use of counterfactual probabilities. Thus, our system accounts for the best properties of qualitative methods of update (in particular, the AGM theory of revision) and probabilistic methods. We also show how our system can be viewed as a unification of probability theory and possibility theory, highlighting their orthogonality and providing a means for expressing the probability of a possibility. 
We also demonstrate the connection to Lewis's method of imaging.\nA key issue in the handling of temporal data is the treatment of persistence; in most approaches it consists in inferring defeasible conclusions by extrapolating from the actual knowledge of the history of the world. We propose here a gradual modelling of persistence, following the idea that persistence is decreasing (the further we are from the last time point where a fluent is known to be true, the less certainly true the fluent is); it is based on possibility theory, which has strong relations with other well-known ordering-based approaches to nonmonotonic reasoning. We compare our approach with Dean and Kanazawa's probabilistic projection. We give a formal modelling of the decreasing persistence problem. Lastly, we show how to infer nonmonotonic conclusions using the principle of decreasing persistence.\nSecurity flaws in software applications today have been attributed mostly to design flaws. With limited budget and time to release software into the market, many developers often consider security as an afterthought. Previous research shows that integrating security into software applications at a later stage of the software development lifecycle (SDLC) is more costly than integrating it during the early stages. To assist in the integration of security early in the SDLC stages, a new approach for assessing security during the design phase using a neural network is investigated in this paper. Our findings show that by training a back-propagation neural network to identify attack patterns, possible attacks can be identified from design scenarios presented to it. The performance of the neural network is reported in this paper.\nAn influence diagram is a graphical representation of belief networks with uncertainty. This article studies the structural properties of a probabilistic model in an influence diagram. 
In particular, structural controllability theorems and structural observability theorems are developed and algorithms are formulated. Controllability and observability are fundamental concepts in dynamic systems (Luenberger 1979). Controllability corresponds to the ability to control a system, while observability analyzes the inferability of its variables. Both properties can be determined by the ranks of the system matrices. Structural controllability and observability, on the other hand, analyze the properties of a system from its structure only, without specific knowledge of the values of its elements (Lin 1974, Shields and Pearson 1976). The structural analysis explores the connection between the structure of a model and the functional dependence among its elements. It is useful in comprehending problems and formulating solutions by challenging the underlying intuitions and detecting inconsistency in a model. This type of qualitative reasoning can sometimes provide insight even when there is insufficient numerical information in a model.\nWe describe how we selectively reformulate portions of a belief network that pose difficulties for solution with a stochastic-simulation algorithm. We employ the selective conditioning approach to target specific nodes in a belief network for decomposition, based on the contribution the nodes make to the tractability of stochastic simulation. We review previous work on BNRAS algorithms, randomized approximation algorithms for probabilistic inference. We show how selective conditioning can be employed to reformulate a single BNRAS problem into multiple tractable BNRAS simulation problems. We discuss how we can use another simulation algorithm, logic sampling, to solve a component of the inference problem that provides a means for knitting the solutions of individual subproblems into a final result. 
Finally, we analyze tradeoffs among the computational subtasks associated with the selective conditioning approach to reformulation.\nThis paper investigates the possibility of performing automated reasoning in probabilistic logic when probabilities are expressed by means of linguistic quantifiers. Each linguistic term is expressed as a prescribed interval of proportions. Then, instead of propagating numbers, qualitative terms are propagated in accordance with the numerical interpretation of these terms. The quantified syllogism, modelling the chaining of probabilistic rules, is studied in this context. It is shown that a qualitative counterpart of this syllogism makes sense, and is relatively independent of the threshold defining the linguistically meaningful intervals, provided that these threshold values remain in accordance with the intuition. The inference power is less than that of a full-fledged probabilistic constraint propagation device, but it corresponds better to what could be thought of as commonsense probabilistic reasoning.\nData fusion allows the elaboration and the evaluation of a situation synthesized from low-level information provided by different kinds of sensors. The fusion of the collected data will result in less but higher-level information, more easily assessed by a human operator, that will assist him effectively in his decision process. In this paper we present the suitability and the advantages of using a Possibilistic Assumption-based Truth Maintenance System (Π-ATMS) in a data fusion military application. We first describe the problem, the needed knowledge representation formalisms and problem-solving paradigms. Then we remind the reader of the basic concepts of ATMSs, Possibilistic Logic and Π-ATMSs. 
Finally, we detail the solution to the given data fusion problem and conclude with the results and comparison with a non-possibilistic solution.\nTo date, most probabilistic reasoning systems have relied on a fixed belief network constructed at design time. The network is used by an application program as a representation of (in)dependencies in the domain. Probabilistic inference algorithms operate over the network to answer queries. Recognizing the inflexibility of fixed models has led researchers to develop automated network construction procedures that use an expressive knowledge base to generate a network that can answer a query. Although more flexible than fixed-model approaches, these construction procedures separate construction and evaluation into distinct phases. In this paper we develop an approach to combining incremental construction and evaluation of a partial probability model. The combined method holds promise for improved methods for control of model construction based on a trade-off between fidelity of results and cost of construction.\nWe recently described a formalism for reasoning with if-then rules that are expressed with different levels of firmness [18]. The formalism interprets these rules as extreme conditional probability statements, specifying orders of magnitude of disbelief, which impose constraints over possible rankings of worlds. It was shown that, once we compute a priority function Z+ on the rules, the degree to which a given query is confirmed or denied can be computed in O(log n) propositional satisfiability tests, where n is the number of rules in the knowledge base. In this paper, we show that computing Z+ requires O(n^2 × log n) satisfiability tests, not an exponential number as was conjectured in [18], which reduces to polynomial complexity in the case of Horn expressions. 
We also show how reasoning with imprecise observations can be incorporated in our formalism and how the popular notions of belief revision and epistemic entrenchment are embodied naturally and tractably.\nA number of writers (Joseph Halpern and Fahiem Bacchus among them) have offered semantics for formal languages in which inferences concerning probabilities can be made. Our concern is different. This paper provides a formalization of nonmonotonic inferences in which the conclusion is supported only to a certain degree. Such inferences are clearly 'invalid' since they must allow the falsity of a conclusion even when the premises are true. Nevertheless, such inferences can be characterized both syntactically and semantically. The 'premises' of probabilistic arguments are sets of statements (as in a database or knowledge base), and the conclusions are categorical statements in the language. We provide standards both for this form of inference, for which high probability is required, and for an inference in which the conclusion is qualified by an intermediate interval of support.\nThe Dempster-Shafer theory of evidence has been used intensively to deal with uncertainty in knowledge-based systems. However, the representation of uncertain relationships between evidence and hypothesis groups (heuristic knowledge) is still a major research problem. This paper presents an approach to representing such heuristic knowledge by evidential mappings which are defined on the basis of mass functions. The relationships between evidential mappings and multivalued mappings, as well as between evidential mappings and Bayesian multivalued causal link models in Bayesian theory, are discussed. Following this, the detailed procedures for constructing evidential mappings for any set of heuristic rules are introduced. Several situations of belief propagation are discussed.\nBayes nets are relatively recent innovations. 
As a result, most of their theoretical development has focused on the simplest class of single-author models. The introduction of more sophisticated multiple-author settings raises a variety of interesting questions. One such question involves the nature of compromise and consensus. Posterior compromises let each model process all data to arrive at an independent response, and then split the difference. Prior compromises, on the other hand, force compromise to be reached on all points before data is observed. This paper introduces prior compromises in a Bayes net setting. It outlines the problem and develops an efficient algorithm for fusing two directed acyclic graphs into a single, consensus structure, which may then be used as the basis of a prior compromise.\nIn Moral, Campos (1991) and Cano, Moral, Verdegay-Lopez (1991), a new method of conditioning convex sets of probabilities has been proposed. The result is a convex set of not necessarily normalized probability distributions. The normalizing factor of each probability distribution is interpreted as the possibility assigned to it by the conditioning information. From this, it is deduced that the natural value for the conditional probability of an event is a possibility distribution. The aim of this paper is to study methods of transforming this possibility distribution into a probability (or uncertainty) interval. These methods will be based on the use of Sugeno and Choquet integrals. Their behaviour will be compared on the basis of some selected examples.\nThe trajectory of a robot is monitored in a restricted dynamic environment using light beam sensor data. We have a Dynamic Belief Network (DBN), based on a discrete model of the domain, which provides discrete monitoring analogous to conventional quantitative filter techniques. Sensor observations are added to the basic DBN in the form of specific evidence. However, sensor data is often partially or totally incorrect. 
We show how the basic DBN, which infers only an impossible combination of evidence, may be modified to handle specific types of incorrect data which may occur in the domain. We then present an extension to the DBN, the addition of an invalidating node, which models the status of the sensor as working or defective. This node provides a qualitative explanation of inconsistent data: it is caused by a defective sensor. The connection of successive instances of the invalidating node models the status of a sensor over time, allowing the DBN to handle both persistent and intermittent faults.\nThis paper describes some results of research on associate systems: knowledge-based systems that flexibly and adaptively support their human users in carrying out complex, time-dependent problem-solving tasks under uncertainty. Based on principles derived from decision theory and decision analysis, a problem-solving approach is presented which can overcome many of the limitations of traditional expert systems. This approach implements an explicit model of the human user's problem-solving capabilities as an integral element in the overall problem-solving architecture. This integrated model, represented as an influence diagram, is the basis for achieving adaptive task sharing behavior between the associate system and the human user. This associate system model has been applied toward ongoing research on a Mars Rover Manager's Associate (MRMA). MRMA's role would be to manage a small fleet of robotic rovers on the Martian surface. The paper describes results for a specific scenario where MRMA examines the benefits and costs of consulting human experts on Earth to assist a Mars rover with a complex resource management decision.\nAlthough the notion of a diagnostic problem has been extensively investigated in the context of static systems, in most practical applications the behavior of the modeled system varies significantly over time. 
The goal of the paper is to propose a novel approach to the modeling of uncertainty about temporal evolutions of time-varying systems and a characterization of model-based temporal diagnosis. Since in most real-world cases knowledge about the temporal evolution of the system to be diagnosed is uncertain, we consider the case when probabilistic temporal knowledge is available for each component of the system and we choose to model it by means of Markov chains. In fact, we aim at exploiting the statistical assumptions underlying reliability theory in the context of the diagnosis of time-varying systems. We finally show how to exploit Markov chain theory in order to discard, in the diagnostic process, very unlikely diagnoses.\nMany AI synthesis problems such as planning or scheduling may be modeled as constraint satisfaction problems (CSP). A CSP is typically defined as the problem of finding any consistent labeling for a fixed set of variables satisfying all given constraints between these variables. However, for many real tasks such as job-shop scheduling, time-table scheduling, design, etc., these constraints do not all have the same significance and need not all be satisfied. A first distinction can be made between hard constraints, which every solution should satisfy, and soft constraints, whose satisfaction need not be certain. In this paper, we formalize the notion of possibilistic constraint satisfaction problems, which allows the modeling of uncertainly satisfied constraints. We use a possibility distribution over labelings to represent the respective possibilities of each labeling. Necessity-valued constraints allow a simple expression of the respective certainty degrees of each constraint. The main advantage of our approach is its integration in the CSP technical framework. Most classical techniques, such as Backtracking (BT), arc-consistency enforcing (AC) or Forward Checking, have been extended to handle possibilistic CSPs and are effectively implemented.
The utility of our approach is demonstrated on a simple design problem.\nThe general use of subjective probabilities to model belief has been justified using many axiomatic schemes. For example, 'consistent betting behavior' arguments are well-known. To those not already convinced of the unique fitness and generality of probability models, such justifications are often unconvincing. The present paper explores another rationale for probability models. 'Qualitative probability,' which is known to provide stringent constraints on belief representation schemes, is derived from five simple assumptions about relationships among beliefs. While counterparts of familiar rationality concepts such as transitivity, dominance, and consistency are used, the betting context is avoided. The gap between qualitative probability and probability proper can be bridged by any of several additional assumptions. The discussion here relies on results common in the recent AI literature, introducing a sixth simple assumption. The narrative emphasizes models based on unique complete orderings, but the rationale extends easily to motivate set-valued representations of partial orderings as well.\nIn a previous paper [Pearl and Verma, 1991] we presented an algorithm for extracting causal influences from independence information, where a causal influence was defined as the existence of a directed arc in all minimal causal models consistent with the data. In this paper we address the question of deciding whether there exists a causal model that explains ALL the observed dependencies and independencies. Formally, given a list M of conditional independence statements, it is required to decide whether there exists a directed acyclic graph (dag) D that is perfectly consistent with M, namely, every statement in M, and no other, is reflected via d-separation in D.
We present and analyze an effective algorithm that tests for the existence of such a dag, and produces one if it exists.\nCurrent Bayesian net representations do not consider structure in the domain and include all variables in a homogeneous network. At any time, a human reasoner in a large domain may direct his attention to only one of a number of natural subdomains, i.e., there is 'localization' of queries and evidence. In such a case, propagating evidence through a homogeneous network is inefficient since the entire network has to be updated each time. This paper presents multiply sectioned Bayesian networks that enable a (localization-preserving) representation of natural subdomains by separate Bayesian subnets. The subnets are transformed into a set of permanent junction trees such that evidential reasoning takes place at only one of them at a time. The probabilities obtained are identical to those that would be obtained from the homogeneous network. We discuss attention shift to a different junction tree and propagation of previously acquired evidence. Although the overall system can be large, computational requirements are governed by the size of only one junction tree.\nA valuation-based system (VBS) provides a general framework for representing knowledge and drawing inferences under uncertainty. Recent studies have shown that the semantics of VBS can represent and solve Bayesian decision problems (Shenoy, 1991a). The purpose of this paper is to propose a decision calculus for Dempster-Shafer (D-S) theory in the framework of VBS. The proposed calculus uses a weighting factor whose role is similar to the probabilistic interpretation of an assumption that disambiguates decision problems represented with belief functions (Strat 1990). It will be shown that with the presented calculus, if the decision problems are represented in the valuation network properly, we can solve the problems by using the fusion algorithm (Shenoy 1991a).
It will also be shown that the presented decision calculus can be reduced to the calculus for Bayesian probability theory when probabilities, instead of belief functions, are given.\nThis paper presents a new approach for computing posterior probabilities in Bayesian nets, which sidesteps the triangulation problem. The current state of the art is the clique tree propagation approach. When the underlying graph of a Bayesian net is triangulated, this approach arranges its cliques into a tree and computes posterior probabilities by appropriately passing around messages in that tree. The computation in each clique is simply direct marginalization. When the underlying graph is not triangulated, one must first triangulate it by adding edges. Referred to as the triangulation problem, the problem of finding an optimal or even a 'good' triangulation proves to be difficult. In this paper, we propose to first decompose a Bayesian net into smaller components by making use of Tarjan's algorithm for decomposing an undirected graph at all its minimal complete separators. Then, the components are arranged into a tree and posterior probabilities are computed by appropriately passing around messages in that tree. The computation in each component is carried out by repeating the whole procedure from the beginning. Thus the triangulation problem is sidestepped.\nBelief networks are a new, potentially important, class of knowledge-based models. ARCO1, currently under development at the Atlantic Richfield Company (ARCO) and the University of Southern California (USC), is the most advanced reported implementation of these models in a financial forecasting setting. ARCO1's underlying belief network models the variables believed to have an impact on the crude oil market. A pictorial market model, developed on a MAC II, facilitates consensus among the members of the forecasting team. The system forecasts crude oil prices via Monte Carlo analyses of the network.
Several different models of the oil market have been developed; the system's ability to be updated quickly highlights its flexibility.\nThe way experts manage uncertainty usually changes depending on the task they are performing. This fact has led us to consider the problem of communication between modules (task implementations) in a large and structured knowledge-based system when modules have different uncertainty calculi. In this paper, the analysis of the communication problem is made assuming that (i) each uncertainty calculus is an inference mechanism defining an entailment relation, and therefore the communication is considered to be inference-preserving, and (ii) we restrict ourselves to the case in which the different uncertainty calculi are given by a class of truth-functional Multiple-valued Logics.\nThis paper presents an approach to reasoning with default rules where the proportion of exceptions, or more generally the probability of encountering an exception, can be at least roughly assessed. It is based on local uncertainty propagation rules which provide the best bracketing of a conditional probability of interest from the knowledge of the bracketing of some other conditional probabilities. A procedure that uses two such propagation rules repeatedly is proposed in order to estimate any simple conditional probability of interest from the available knowledge. The iterative procedure, which does not require independence assumptions, looks promising compared with the linear programming method. Improved bounds for conditional probabilities are given when independence assumptions hold.\nThis paper presents a plausible reasoning system to illustrate some broad issues in knowledge representation: dualities between different reasoning forms, the difficulty of unifying complementary reasoning styles, and the approximate nature of plausible reasoning.
These issues have a common underlying theme: there should be an underlying belief calculus of which the many different reasoning forms are special cases, sometimes approximate. The system presented allows reasoning about defaults, likelihood, necessity and possibility in a manner similar to the earlier work of Adams. The system is based on the belief calculus of subjective Bayesian probability, which itself is based on a few simple assumptions about how belief should be manipulated. Approximations, semantics, consistency and consequence results are presented for the system. While this puts these often-discussed plausible reasoning forms on a probabilistic footing, useful application to practical problems remains an issue.\nTheory refinement is the task of updating a domain theory in the light of new cases, to be done automatically or with some expert assistance. The problem of theory refinement under uncertainty is reviewed here in the context of Bayesian statistics, a theory of belief revision. The problem is reduced to an incremental learning task as follows: the learning system is initially primed with a partial theory supplied by a domain expert, and thereafter maintains its own internal representation of alternative theories, which can be interrogated by the domain expert and incrementally refined from data. Algorithms for refinement of Bayesian networks are presented to illustrate what is meant by \"partial theory\", \"alternative theory representation\", etc. The algorithms are an incremental variant of batch learning algorithms from the literature and so can work well in both batch and incremental mode.\nResearch on Symbolic Probabilistic Inference (SPI) [2, 3] has provided an algorithm for resolving general queries in Bayesian networks. SPI applies the concept of dependency-directed backward search to probabilistic inference, and is incremental with respect to both queries and observations.
Unlike traditional Bayesian network inferencing algorithms, the SPI algorithm is goal-directed, performing only those calculations that are required to respond to queries. Research to date on SPI applies to Bayesian networks with discrete-valued variables and does not address variables with continuous values. In this paper, we extend the SPI algorithm to handle Bayesian networks made up of continuous variables where the relationships between the variables are restricted to be 'linear Gaussian'. We call this variation of the SPI algorithm SPI Continuous (SPIC). SPIC modifies the three basic SPI operations: multiplication, summation, and substitution. However, SPIC retains the framework of the SPI algorithm, namely building the search tree and the recursive query mechanism, and therefore retains the goal-directed and incremental features of SPI.\nIn this paper, we consider one aspect of the problem of applying decision theory to the design of agents that learn how to make decisions under uncertainty. This aspect concerns how an agent can estimate probabilities for the possible states of the world, given that it only makes limited observations before committing to a decision. We show that the naive application of statistical tools can be improved upon if the agent can determine which of his observations are truly relevant to the estimation problem at hand. We give a framework in which such determinations can be made, and define an estimation procedure to use them. Our framework also suggests several extensions, which show how additional knowledge can be used to improve the estimation procedure still further.\nValue-of-information analyses provide a straightforward means for selecting the best next observation to make, and for determining whether it is better to gather additional information or to act immediately.
Determining the next best test to perform, given a state of uncertainty about the world, requires considering the value of making all possible sequences of observations. In practice, decision analysts and expert-system designers have avoided the intractability of exact computation of the value of information by relying on a myopic approximation. Myopic analyses are based on the assumption that only one additional test will be performed, even when there is an opportunity to make a large number of observations. We present a nonmyopic approximation for the value of information that bypasses the traditional myopic analyses by exploiting the statistical properties of large samples.\nSince exact probabilistic inference is intractable in general for large, multiply connected belief nets, approximate methods are required. A promising approach is to use heuristic search among hypotheses (instantiations of the network) to find the most probable ones, as in the TopN algorithm. Search is based on the relative probabilities of hypotheses, which are efficient to compute. Given upper and lower bounds on the relative probability of partial hypotheses, it is possible to obtain bounds on the absolute probabilities of hypotheses. Best-first search aimed at reducing the maximum error progressively narrows the bounds as more hypotheses are examined. Here, qualitative probabilistic analysis is employed to obtain bounds on the relative probability of partial hypotheses for the BN2O class of networks and a generalization replacing the noisy-OR assumption by negative synergy. The approach is illustrated by application to a very large belief network, QMR-BN, which is a reformulation of the Internist-1 system for diagnosis in internal medicine.\nWe motivate and describe a theory of belief in this paper. This theory is developed with the following view of human belief in mind. Consider the belief that an event E will occur (or has occurred or is occurring).
An agent either entertains this belief or does not entertain this belief (i.e., there is no \"grade\" in entertaining the belief). If the agent chooses to exercise \"the will to believe\" and entertain this belief, he/she/it is entitled to a degree of confidence c (1 > c > 0) in doing so. Adopting this view of human belief, we conjecture that whenever an agent entertains the belief that E will occur with degree of confidence c, the agent will be surprised (to the extent c) upon realizing that E did not occur.\nThe categorial approach to evidential reasoning can be seen as a combination of the probability kinematics approach of Richard Jeffrey (1965) and the maximum (cross-) entropy inference approach of E. T. Jaynes (1957). As a consequence of that viewpoint, it is well known that category theory provides natural definitions for logical connectives. In particular, disjunction and conjunction are modelled by general categorial constructions known as products and coproducts. In this paper, I focus mainly on the Dempster-Shafer theory of belief functions, for which I introduce a category I call Dempster's category. I prove the existence of and give explicit formulas for conjunction and disjunction in the subcategory of separable belief functions. In Dempster's category, the newly defined conjunction can be seen as the most cautious conjunction of beliefs, and thus no assumption about distinctness (of the sources) of beliefs is needed, as opposed to Dempster's rule of combination, which calls for distinctness (of the sources) of beliefs.
Satisfiability is extended from interpretations to fuzzy sets of interpretations, each fuzzy set representing a possibility distribution describing what is known about the state of the world. A possibilistic knowledge base is then viewed as a set of possibility distributions that satisfy it. The refutation method of automated deduction in possibilistic logic, based on a previously introduced generalized resolution principle, is proved to be sound and complete with respect to the proposed semantics, including the case of partial inconsistency.\nWhen a planner must decide whether it has enough evidence to make a decision based on probability, it faces the sample size problem. Current planners using probabilities need not deal with this problem because they do not generate their probabilities from observations. This paper presents an event-based language in which the planner's probabilities are calculated from the binomial random variable generated by the observed ratio of one type of event to another. Such probabilities are subject to error, so the planner must introspect about their validity. Inferences about the probability of these events can be made using statistics. Inferences about the validity of the approximations can be made using interval estimation. Interval estimation allows the planner to avoid making choices that are only weakly supported by the planner's evidence.\nWe present a general architecture for the monitoring and diagnosis of large-scale sensor-based systems with real-time diagnostic constraints. This architecture is multilevel, combining a single monitoring level based on statistical methods with two model-based diagnostic levels. At each level, sources of uncertainty are identified, and integrated methodologies for uncertainty management are developed.
The general architecture was applied to the monitoring and diagnosis of a specific nuclear physics detector at Lawrence Berkeley National Laboratory that contained approximately 5000 components and produced over 500 channels of output data. The general architecture is scalable, and work is ongoing to apply it to detector systems one and two orders of magnitude more complex.\nA new probabilistic network construction system, DYNASTY, is proposed for diagnostic reasoning given variables whose probabilities change over time. Diagnostic reasoning is formulated as a sequential stochastic process, and is modeled using influence diagrams. Given a set O of observations, DYNASTY creates an influence diagram in order to devise the best action given O. Sensitivity analyses are conducted to determine whether the best network has been created, given the uncertainty in network parameters and topology. DYNASTY uses an equivalence-class approach to provide decision thresholds for the sensitivity analysis. This equivalence-class approach to diagnostic reasoning differentiates diagnoses only if the required actions are different. A set of network-topology updating algorithms is proposed for dynamically updating the network when necessary.\nFor high-level path planning, environments are usually modeled as distance graphs, and path planning problems are reduced to computing the shortest path in distance graphs. One major drawback of this modeling is the inability to model uncertainties, which are often encountered in practice. In this paper, a new tool, called U-graph, is proposed for environment modeling. A U-graph is an extension of distance graphs with the ability to handle a kind of uncertainty.
By modeling an uncertain environment as a U-graph, and a navigation problem as a Markovian decision process, we can precisely define a new optimality criterion for navigation plans, and more importantly, we can derive a general algorithm for computing optimal plans for navigation tasks.\nDeliberation plays an important role in the design of rational agents embedded in the real world. In particular, deliberation leads to the formation of intentions, i.e., plans of action that the agent is committed to achieving. In this paper, we present a branching-time possible-worlds model for representing and reasoning about beliefs, goals, intentions, time, actions, probabilities, and payoffs. We compare this possible-worlds approach with the more traditional decision-tree representation and provide a transformation from decision trees to possible worlds. Finally, we illustrate how an agent can perform deliberation using a decision-tree representation and then use a possible-worlds model to form and reason about its intentions.\nThis paper introduces conceptual relations that synthesize utilitarian and logical concepts, extending the logics of preference of Rescher. We first define, in the context of a possible-worlds model, constraint-dependent measures that quantify the relative quality of alternative solutions of reasoning problems or the relative desirability of various policies in control, decision, and planning problems. We show that these measures may be interpreted as truth values in a multivalued logic and propose mechanisms for the representation of complex constraints as combinations of simpler restrictions. These extended logical operations also permit the combination and aggregation of goal-specific quality measures into global measures of utility. We also identify relations that represent differential preferences between alternative solutions and relate them to the previously defined desirability measures.
Extending conventional modal logic formulations, we introduce structures for the representation of ignorance about the utility of alternative solutions. Finally, we examine relations between these concepts and similarity-based semantic models of fuzzy logic.\nIn this article we present two ways of structuring bodies of evidence, which allow us to reduce the complexity of the operations usually performed in the framework of evidence theory. The first structure just partitions the focal elements in a body of evidence by their cardinality. With this structure we are able to reduce the complexity of calculating the belief functions Bel, Pl, and Q. The other structure proposed here, the Hierarchical Trees, permits us to reduce the complexity of the calculation of Bel, Pl, and Q, as well as of Dempster's rule of combination, relative to the brute-force algorithm. Neither structure requires the generation of all the subsets of the reference domain.\nIn mechanical design, there is often unavoidable uncertainty in estimates of design performance. Evaluation of design alternatives requires consideration of the impact of this uncertainty. Expert heuristics embody assumptions regarding the designer's attitude towards risk and uncertainty that might be reasonable in most cases but inaccurate in others. We present a technique to allow designers to incorporate their own unique attitude towards uncertainty as opposed to those assumed by the domain expert's rules. The general approach is to eliminate aspects of heuristic rules which directly or indirectly include assumptions regarding the user's attitude towards risk, and replace them with explicit, user-specified probabilistic multi-attribute utility and probability distribution functions. We illustrate the method in a system for material selection for automobile bumpers.\nThe compatibility of quantitative and qualitative representations of beliefs has been studied extensively in probability theory.
Only recently has this important topic been considered in the context of belief functions. In this paper, the compatibility of various quantitative belief measures and qualitative belief structures is investigated. The four classes of belief measures considered are: the probability function, the monotonic belief function, Shafer's belief function, and Smets' generalized belief function. The analysis of their individual compatibility with different belief structures not only provides a sound basis for these quantitative measures, but also alleviates some of the difficulties in the acquisition and interpretation of numeric beliefs. It is shown that the structure of qualitative probability is compatible with monotonic belief functions. Moreover, a belief structure slightly weaker than that of qualitative belief is compatible with Smets' generalized belief functions.\nWe describe a technique that can be used for the fusion of multiple sources of information as well as for the evaluation and selection of alternatives under multiple criteria. Three important properties contribute to the uniqueness of the technique introduced. The first is the ability to do all necessary operations and aggregations with information that is of a nonnumeric linguistic nature. This facility greatly reduces the burden on the providers of information, the experts. A second characterizing feature is the ability to assign, again linguistically, differing importance to the criteria or, in the case of information fusion, to the individual sources of information. A third significant feature of the approach is its ability to be used as a method to find a consensus of the opinion of multiple experts on the issue of concern. The techniques used in this approach are based on ideas developed from the theory of approximate reasoning. We illustrate the approach with a problem of project selection.\nUniversal induction is a crucial issue in AGI.
Its practical applicability can be achieved by choosing a reference machine, or representation of algorithms, that agrees with the environment. This machine should be updatable for solving subsequent tasks more efficiently. We study this problem using the example of combinatory logic as a very simple Turing-complete reference machine, which enables modifying program representations by introducing different sets of primitive combinators. A genetic programming system is used to search for combinator expressions, which are easily decomposed into sub-expressions that are recombined in crossover. Our experiments show that low-complexity induction or prediction tasks can be solved by the developed system (much more efficiently than using brute force); useful combinators can be revealed and included in the representation, simplifying more difficult tasks. However, optimal sets of combinators depend on the specific task, so the reference machine should be adaptively chosen in coordination with the search engine.\nThe purpose of this paper is to serve as a reference guide for the development of chatterbots implemented with the AIML language. In order to achieve this, the main concepts in the Pattern Recognition area are described because AIML uses this theoretical framework in its syntactic and semantic structures. After that, the AIML language is described and each AIML command/tag is followed by an application example. Also, the usage of AIML embedded tags for the handling of sequence dialogue limitations between humans and machines is shown. Finally, computer systems that assist in the design of chatterbots with the AIML language are classified and described.\nThe best current methods for exactly computing the number of satisfying assignments, or the satisfying probability, of Boolean formulas can be seen, either directly or indirectly, as building 'decision-DNNF' (decision decomposable negation normal form) representations of the input Boolean formulas.
Decision-DNNFs are a special case of 'd-DNNFs', where 'd' stands for 'deterministic'. We show that any decision-DNNF can be converted into an equivalent 'FBDD' (free binary decision diagram), also known as a 'read-once branching program' (ROBP or 1-BP), with only a quasipolynomial increase in representation size in general, and with only a polynomial increase in size in the special case of monotone k-DNF formulas. Leveraging known exponential lower bounds for FBDDs, we then obtain similar exponential lower bounds for decision-DNNFs, which provide lower bounds for the recent algorithms. We also separate the power of decision-DNNFs from d-DNNFs and a generalization of decision-DNNFs known as AND-FBDDs. Finally, we show how these imply exponential lower bounds for natural problems associated with probabilistic databases.\nReasoning about degrees of belief in uncertain dynamic worlds is fundamental to many applications, such as robotics and planning, where actions modify state properties and sensors provide measurements, both of which are prone to noise. With the exception of limited cases such as Gaussian processes over linear phenomena, belief state evolution can be complex and hard to reason with in a general way. This paper proposes a framework with new results that allows the reduction of subjective probabilities after sensing and acting to questions about the initial state only. We build on an expressive probabilistic first-order logical account by Bacchus, Halpern and Levesque, resulting in a methodology that, in principle, can be coupled with a variety of existing inference solutions.\nWe give a new consistent scoring function for structure learning of Bayesian networks. In contrast to traditional approaches to score-based structure learning, such as BDeu or MDL, the complexity penalty that we propose is data-dependent and is given by the probability that a conditional independence test correctly shows that an edge cannot exist.
What distinguishes this new scoring function from earlier work is that it becomes computationally easier to maximize as the amount of data increases. We prove a polynomial sample complexity result, showing that maximizing this score is guaranteed to correctly learn a structure with no false edges and a distribution close to the generating distribution, whenever there exists a Bayesian network which is a perfect map for the data-generating distribution. Although the new score can be used with any search algorithm, we give empirical results showing that it is particularly effective when used together with a linear programming relaxation approach to Bayesian network structure learning.\nUsing the theory of group action, we first introduce the concept of the automorphism group of an exponential family or a graphical model, thus formalizing the general notion of symmetry of a probabilistic model. This automorphism group provides a precise mathematical framework for lifted inference in the general exponential family. Its group action partitions the set of random variables and feature functions into equivalence classes (called orbits) having identical marginals and expectations. Then the inference problem is effectively reduced to that of computing marginals or expectations for each class, thus avoiding the need to deal with each individual variable or feature.
We demonstrate the usefulness of this general framework in lifting two classes of variational approximation for maximum a posteriori (MAP) inference: local linear programming (LP) relaxation and local LP relaxation with cycle constraints; the latter yields the first lifted variational inference algorithm that operates on a bound tighter than the local constraints.\nThis paper shows that causal model discovery is not an NP-hard problem, in the sense that for sparse graphs bounded by node degree k the sound and complete causal model can be obtained in worst case order N^{2(k+2)} independence tests, even when latent variables and selection bias may be present. We present a modification of the well-known FCI algorithm that implements the method for an independence oracle, and suggest improvements for sample/real-world data versions. It does not contradict any known hardness results, and does not solve an NP-hard problem: it just proves that sparse causal discovery is perhaps more complicated, but not as hard as learning minimal Bayesian networks.\nPossibilistic and qualitative POMDPs (pi-POMDPs) are counterparts of POMDPs used to model situations where the agent's initial belief or observation probabilities are imprecise due to lack of past experiences or insufficient data collection. However, like probabilistic POMDPs, optimally solving pi-POMDPs is intractable: the finite belief state space exponentially grows with the number of system's states. In this paper, a possibilistic version of Mixed-Observable MDPs is presented to get around this issue: the complexity of solving pi-POMDPs, some state variables of which are fully observable, can be then dramatically reduced. A value iteration algorithm for this new formulation under infinite horizon is next proposed and the optimality of the returned policy (for a specified criterion) is shown assuming the existence of a \"stay\" action in some goal states. 
Experimental work finally shows that this possibilistic model outperforms probabilistic POMDPs commonly used in robotics, for a target recognition problem where the agent's observations are imprecise.\nMany probabilistic inference tasks involve summations over exponentially large sets. Recently, it has been shown that these problems can be reduced to solving a polynomial number of MAP inference queries for a model augmented with randomly generated parity constraints. By exploiting a connection with max-likelihood decoding of binary codes, we show that these optimizations are computationally hard. Inspired by iterative message passing decoding algorithms, we propose an Integer Linear Programming (ILP) formulation for the problem, enhanced with new sparsification techniques to improve decoding performance. By solving the ILP through a sequence of LP relaxations, we get both lower and upper bounds on the partition function, which hold with high probability and are much tighter than those obtained with variational methods.\nPopular Monte-Carlo tree search (MCTS) algorithms for online planning, such as epsilon-greedy tree search and UCT, aim at rapidly identifying a reasonably good action, but provide rather poor worst-case guarantees on performance improvement over time. In contrast, a recently introduced MCTS algorithm BRUE guarantees exponential-rate improvement over time, yet it is not geared towards identifying reasonably good choices right at the go. We take a stand on the individual strengths of these two classes of algorithms, and show how they can be effectively connected. We then rationalize a principle of \"selective tree expansion\", and suggest a concrete implementation of this principle within MCTS. The resulting algorithm,s favorably compete with other MCTS algorithms under short planning times, while preserving the attractive convergence properties of BRUE.\nWe consider the problem of maximum a posteriori (MAP) inference in discrete graphical models. 
We present a parallel MAP inference algorithm called Bethe-ADMM based on two ideas: tree-decomposition of the graph and the alternating direction method of multipliers (ADMM). However, unlike the standard ADMM, we use an inexact ADMM augmented with a Bethe-divergence based proximal function, which makes each subproblem in ADMM easy to solve in parallel using the sum-product algorithm. We rigorously prove global convergence of Bethe-ADMM. The proposed algorithm is extensively evaluated on both synthetic and real datasets to illustrate its effectiveness. Further, the parallel Bethe-ADMM is shown to scale almost linearly with increasing number of cores.\nWe present a very general approach to learning the structure of causal models based on d-separation constraints, obtained from any given set of overlapping passive observational or experimental data sets. The procedure allows for both directed cycles (feedback loops) and the presence of latent variables. Our approach is based on a logical representation of causal pathways, which permits the integration of quite general background knowledge, and inference is performed using a Boolean satisfiability (SAT) solver. The procedure is complete in that it exhausts the available information on whether any given edge can be determined to be present or absent, and returns \"unknown\" otherwise. Many existing constraint-based causal discovery algorithms can be seen as special cases, tailored to circumstances in which one or more restricting assumptions apply. Simulations illustrate the effect of these assumptions on discovery and how the present algorithm scales.\nA limited-memory influence diagram (LIMID) generalizes a traditional influence diagram by relaxing the assumptions of regularity and no-forgetting, allowing a wider range of decision problems to be modeled. 
Algorithms for solving traditional influence diagrams are not easily generalized to solve LIMIDs, however, and only recently have exact algorithms for solving LIMIDs been developed. In this paper, we introduce an exact algorithm for solving LIMIDs that is based on branch-and-bound search. Our approach is related to the approach of solving an influence diagram by converting it to an equivalent decision tree, with the difference that the LIMID is converted to a much smaller decision graph that can be searched more efficiently.\nExact algorithms for learning Bayesian networks guarantee to find provably optimal networks. However, they may fail in difficult learning tasks due to limited time or memory. In this research we adapt several anytime heuristic search-based algorithms to learn Bayesian networks. These algorithms find high-quality solutions quickly, and continually improve the incumbent solution or prove its optimality before resources are exhausted. Empirical results show that the anytime window A* algorithm usually finds higher-quality, often optimal, networks more quickly than other approaches. The results also show that, surprisingly, while generating networks with few parents per variable are structurally simpler, they are harder to learn than complex generating networks with more parents per variable.\nCredal networks are graph-based statistical models whose parameters take values in a set, instead of being sharply specified as in traditional statistical models (e.g., Bayesian networks). The computational complexity of inferences on such models depends on the irrelevance/independence concept adopted. In this paper, we study inferential complexity under the concepts of epistemic irrelevance and strong independence. We show that inferences under strong independence are NP-hard even in trees with ternary variables. 
We prove that under epistemic irrelevance the polynomial time complexity of inferences in credal trees is not likely to extend to more general models (e.g. singly connected networks). These results clearly distinguish networks that admit efficient inferences from those where inferences are most likely hard, and settle several open questions regarding computational complexity.\nIn many developing countries, half the population lives in rural locations, where access to essentials such as school materials, mosquito nets, and medical supplies is restricted. We propose an alternative method of distribution (to standard road delivery) in which the existing mobility habits of a local population are leveraged to deliver aid, which raises two technical challenges in the areas of optimisation and learning. For optimisation, a standard Markov decision process applied to this problem is intractable, so we provide an exact formulation that takes advantage of the periodicities in human location behaviour. To learn such behaviour models from sparse data (i.e., cell tower observations), we develop a Bayesian model of human mobility. Using real cell tower data of the mobility behaviour of 50,000 individuals in Ivory Coast, we find that our model outperforms the state-of-the-art approaches in mobility prediction by at least 25% (in held-out data likelihood). Furthermore, when incorporating mobility prediction with our MDP approach, we find an 81.3% reduction in total delivery time versus routine planning that minimises just the number of participants in the solution path.\nGraphical models with High Order Potentials (HOPs) have received considerable interest in recent years. While there are a variety of approaches to inference in these models, nearly all of them amount to solving a linear program (LP) relaxation with unary consistency constraints between the HOP and the individual variables.
In many cases, the resulting relaxations are loose, and in these cases the results of inference can be poor. It is thus desirable to look for more accurate ways of performing inference in these models. In this work, we study the LP relaxations that result from enforcing additional consistency constraints between the HOP and the rest of the model. We address theoretical questions about the strength of the resulting relaxations compared to the relaxations that arise in standard approaches, and we develop practical and efficient message passing algorithms for optimizing the LPs. Empirically, we show that the LPs with additional consistency constraints lead to more accurate inference on some challenging problems that include a combination of low order and high order terms.\nWe propose a method for learning cyclic causal models from a combination of observational and interventional equilibrium data. Novel aspects of the proposed method are its ability to work with continuous data (without assuming linearity) and to deal with feedback loops. Within the context of biochemical reactions, we also propose a novel way of modeling interventions that modify the activity of compounds instead of their abundance. For computational reasons, we approximate the nonlinear causal mechanisms by (coupled) local linearizations, one for each experimental condition. We apply the method to reconstruct a cellular signaling network from the flow cytometry data measured by Sachs et al. (2005). We show that our method finds evidence in the data for feedback loops and that it gives a more accurate quantitative description of the data at comparable model complexity.\nWe evaluate four computational models of explanation in Bayesian networks by comparing model predictions to human judgments. 
In two experiments, we present human participants with causal structures for which the models make divergent predictions and either solicit the best explanation for an observed event (Experiment 1) or have participants rate provided explanations for an observed event (Experiment 2). Across two versions of two causal structures and across both experiments, we find that the Causal Explanation Tree and Most Relevant Explanation models provide better fits to human data than either Most Probable Explanation or Explanation Tree models. We identify strengths and shortcomings of these models and what they can reveal about human explanation. We conclude by suggesting the value of pursuing computational and psychological investigations of explanation in parallel.\nThis paper is devoted to fair optimization in Multiobjective Markov Decision Processes (MOMDPs). A MOMDP is an extension of the MDP model for planning under uncertainty while trying to optimize several reward functions simultaneously. This applies to multiagent problems when rewards define individual utility functions, or in multicriteria problems when rewards refer to different features. In this setting, we study the determination of policies leading to Lorenz-non-dominated tradeoffs. Lorenz dominance is a refinement of Pareto dominance that was introduced in Social Choice for the measurement of inequalities. In this paper, we introduce methods to efficiently approximate the sets of Lorenz-non-dominated solutions of infinite-horizon, discounted MOMDPs. The approximations are polynomial-sized subsets of those solutions.\nWe propose solution methods for previously-unsolved constrained MDPs in which actions can continuously modify the transition probabilities within some acceptable sets. While many methods have been proposed to solve regular MDPs with large state sets, there are few practical approaches for solving constrained MDPs with large action sets. 
In particular, we show that the continuous action sets can be replaced by their extreme points when the rewards are linear in the modulation. We also develop a tractable optimization formulation for concave reward functions and, surprisingly, also extend it to non-concave reward functions by using their concave envelopes. We evaluate the effectiveness of the approach on the problem of managing delinquencies in a portfolio of loans.\nHidden variables are ubiquitous in practical data analysis, and therefore modeling marginal densities and doing inference with the resulting models is an important problem in statistics, machine learning, and causal inference. Recently, a new type of graphical model, called the nested Markov model, was developed which captures equality constraints found in marginals of directed acyclic graph (DAG) models. Some of these constraints, such as the so-called 'Verma constraint', strictly generalize conditional independence. To make modeling and inference with nested Markov models practical, it is necessary to limit the number of parameters in the model, while still correctly capturing the constraints in the marginal of a DAG model. Placing such limits is similar in spirit to sparsity methods for undirected graphical models, and regression models. In this paper, we give a log-linear parameterization which allows sparse modeling with nested Markov models. We illustrate the advantages of this parameterization with a simulation study.\nThis paper discusses General Random Utility Models (GRUMs). These are a class of parametric models that generate partial ranks over alternatives given attributes of agents and alternatives. We propose two preference elicitation schemes for GRUMs developed from principles in Bayesian experimental design, one for social choice and the other for personalized choice. We couple this with a general Monte-Carlo-Expectation-Maximization (MC-EM) based algorithm for MAP inference under GRUMs.
We also prove unimodality of the likelihood functions for a class of GRUMs. We examine the performance of various criteria by experimental studies, which show that the proposed elicitation schemes increase the precision of estimation.\nIn this paper, we investigate combining blocking and collapsing -- two widely used strategies for improving the accuracy of Gibbs sampling -- in the context of probabilistic graphical models (PGMs). We show that combining them is not straightforward because collapsing (or eliminating variables) introduces new dependencies in the PGM and in computation-limited settings, this may adversely affect blocking. We therefore propose a principled approach for tackling this problem. Specifically, we develop two scoring functions, one each for blocking and collapsing, and formulate the problem of partitioning the variables in the PGM into blocked and collapsed subsets as simultaneously maximizing both scoring functions (i.e., a multi-objective optimization problem). We propose a dynamic, greedy algorithm for approximately solving this intractable optimization problem. Our dynamic algorithm periodically updates the partitioning into blocked and collapsed variables by leveraging correlation statistics gathered from the generated samples and enables rapid mixing by blocking together and collapsing highly correlated variables. We demonstrate experimentally the clear benefit of our dynamic approach: as more samples are drawn, our dynamic approach significantly outperforms static graph-based approaches by an order of magnitude in terms of accuracy.\nRecent advances in symbolic dynamic programming (SDP) combined with the extended algebraic decision diagram (XADD) data structure have provided exact solutions for mixed discrete and continuous (hybrid) MDPs with piecewise linear dynamics and continuous actions.
Since XADD-based exact solutions may grow intractably large for many problems, we propose a bounded error compression technique for XADDs that involves the solution of a constrained bilinear saddle point problem. Fortuitously, we show that given the special structure of this problem, it can be expressed as a bilevel linear programming problem and solved to optimality in finite time via constraint generation, despite having an infinite set of constraints. This solution permits the use of efficient linear program solvers for XADD compression and enables a novel class of bounded approximate SDP algorithms for hybrid MDPs that empirically offers order-of-magnitude speedups over the exact solution in exchange for a small approximation error.\nFinding the most likely (MAP) configuration of a Markov random field (MRF) is NP-hard in general. A promising, recent technique is to reduce the problem to finding a maximum weight stable set (MWSS) on a derived weighted graph, which if perfect, allows inference in polynomial time. We derive new results for this approach, including a general decomposition theorem for MRFs of any order and number of labels, extensions of results for binary pairwise models with submodular cost functions to higher order, and an exact characterization of which binary pairwise MRFs can be efficiently solved with this method. This defines the power of the approach on this class of models, improves our toolbox and expands the range of tractable models.\nOne of the main challenges in the field of embodied artificial intelligence is the open-ended autonomous learning of complex behaviours. Our approach is to use task-independent, information-driven intrinsic motivation(s) to support task-dependent learning. The work presented here is a preliminary step in which we investigate the predictive information (the mutual information of the past and future of the sensor stream) as an intrinsic drive, ideally supporting any kind of task acquisition. 
Previous experiments have shown that the predictive information (PI) is a good candidate to support autonomous, open-ended learning of complex behaviours, because a maximisation of the PI corresponds to an exploration of morphology- and environment-dependent behavioural regularities. The idea is that these regularities can then be exploited in order to solve any given task. Three different experiments are presented and their results lead to the conclusion that the linear combination of the one-step PI with an external reward function is not generally recommended in an episodic policy gradient setting. Only for hard tasks can a great speed-up be achieved, at the cost of a loss in asymptotic performance.\nThe Trek Separation Theorem (Sullivant et al. 2010) states necessary and sufficient conditions for a linear directed acyclic graphical model to entail for all possible values of its linear coefficients that the rank of various sub-matrices of the covariance matrix is less than or equal to n, for any given n. In this paper, I extend the Trek Separation Theorem in two ways: I prove that the same necessary and sufficient conditions apply even when the generating model is partially non-linear and contains some cycles. This justifies the application of constraint-based causal search algorithms such as the BuildPureClusters algorithm (Silva et al. 2006) for discovering the causal structure of latent variable models to data generated by a wider class of causal models that may contain non-linear and cyclic relations among the latent variables.\nIn this paper, a new method for dynamic balancing of a double four-bar crank-slider mechanism by metaheuristic-based optimization algorithms is proposed. For this purpose, a proper objective function for balancing this mechanism, together with the corresponding constraints, has been obtained by dynamic modeling of the mechanism.
Then PSO, ABC, BGA and HGAPSO algorithms have been applied to minimize the defined cost function in the optimization step. The optimization results have been studied in detail by extracting the cost function, fitness, convergence speed and runtime values of the applied algorithms. It has been shown that PSO and ABC are more efficient than BGA and HGAPSO in terms of convergence speed and result quality. Also, a laboratory-scale experimental double four-bar crank-slider mechanism was built to validate the proposed balancing method in practice.\nDynamics of a chaotic spiking neuron model are being studied mathematically and experimentally. The Nonlinear Dynamic State neuron (NDS) is analysed to further understand the model and improve it. Chaos has many interesting properties such as sensitivity to initial conditions, space filling, control and synchronization. As suggested by biologists, these properties may be exploited and play a vital role in carrying out computational tasks in the human brain. The NDS model has some limitations; in this paper the model is investigated to overcome some of these limitations in order to enhance the model. Therefore, the model's parameters are tuned and the resulting dynamics are studied. Also, the discretization method of the model is considered. Moreover, a mathematical analysis is carried out to reveal the underlying dynamics of the model after tuning of its parameters. The results of the aforementioned methods revealed some facts regarding the NDS attractor and suggest the stabilization of a large number of unstable periodic orbits (UPOs) which might correspond to memories in phase space.\nComputational creativity is an emerging branch of artificial intelligence that places computers in the center of the creative process. Broadly, creativity involves a generative step to produce many ideas and a selective step to determine the ones that are the best.
Many previous attempts at computational creativity, however, have not been able to achieve a valid selective step. This work shows how bringing data sources from the creative domain and from hedonic psychophysics together with big data analytics techniques can overcome this shortcoming to yield a system that can produce novel and high-quality creative artifacts. Our data-driven approach is demonstrated through a computational creativity system for culinary recipes and menus we developed and deployed, which can operate either autonomously or semi-autonomously with human interaction. We also comment on the volume, velocity, variety, and veracity of data in computational creativity.\nThe treatment of complex systems often requires the manipulation of vague, imprecise and uncertain information. Indeed, the human being is competent at handling such systems in a natural way. Instead of thinking in mathematical terms, humans describe the behavior of the system using linguistic propositions. In order to represent this type of information, Zadeh proposed to model the mechanism of human thought by approximate reasoning based on linguistic variables. He introduced the theory of fuzzy sets in 1965, which provides an interface between language and digital worlds. In this paper, we propose a Boolean modeling of fuzzy reasoning, which we have named Fuzzy-BML, that uses the characteristics of induction graph classification. Fuzzy-BML is the process by which the retrieval phase of a CBR is modelled not in the conventional form of mathematical equations, but in the form of a database with membership functions of fuzzy rules.\nA distinctive property of human and animal intelligence is the ability to form abstractions by neglecting irrelevant information, which allows structure to be separated from noise. From an information-theoretic point of view, abstractions are desirable because they allow for very efficient information processing.
In artificial systems abstractions are often implemented through computationally costly formations of groups or clusters. In this work we establish the relation between the free-energy framework for decision making and rate-distortion theory and demonstrate how the application of rate-distortion for decision-making leads to the emergence of abstractions. We argue that abstractions are induced due to a limit in information processing capacity.\nWe have designed a machine that becomes increasingly better at behaving in underspecified circumstances, in a goal-directed way, on the job, by modeling itself and its environment as experience accumulates. Based on principles of autocatalysis, endogeny, and reflectivity, the work provides an architectural blueprint for constructing systems with high levels of operational autonomy in underspecified circumstances, starting from a small seed. Through value-driven dynamic priority scheduling controlling the parallel execution of a vast number of reasoning threads, the system achieves recursive self-improvement after it leaves the lab, within the boundaries imposed by its designers. A prototype system has been implemented and demonstrated to learn a complex real-world task, real-time multimodal dialogue with humans, by on-line observation. Our work presents solutions to several challenges that must be solved for achieving artificial general intelligence.\nWe propose a new approach for solving a class of discrete decision making problems under uncertainty with positive cost. This issue concerns multiple and diverse fields such as engineering, economics, artificial intelligence, cognitive science and many others. Basically, an agent has to choose a single or series of actions from a set of options, without knowing for sure their consequences. 
Schematically, two main approaches have been followed: either the agent learns which option is the correct one to choose in a given situation by trial and error, or the agent already has some knowledge on the possible consequences of his decisions; this knowledge being generally expressed as a conditional probability distribution. In the latter case, several optimal or suboptimal methods have been proposed to exploit this uncertain knowledge in various contexts. In this work, we propose following a different approach, based on the geometric intuition of distance. More precisely, we define a goal independent quasimetric structure on the state space, taking into account both cost function and transition probability. We then compare precision and computation time with classical approaches.\nExchangeability is a central notion in statistics and probability theory. The assumption that an infinite sequence of data points is exchangeable is at the core of Bayesian statistics. However, finite exchangeability as a statistical property that renders probabilistic inference tractable is less well-understood. We develop a theory of finite exchangeability and its relation to tractable probabilistic inference. The theory is complementary to that of independence and conditional independence. We show that tractable inference in probabilistic models with high treewidth and millions of variables can be understood using the notion of finite (partial) exchangeability. We also show that existing lifted inference algorithms implicitly utilize a combination of conditional independence and partial exchangeability.\nThis paper presents Networks of Influence Diagrams (NID), a compact, natural and highly expressive language for reasoning about agents beliefs and decision-making processes. NIDs are graphical structures in which agents mental models are represented as nodes in a network; a mental model for an agent may itself use descriptions of the mental models of other agents. 
NIDs are demonstrated by examples, showing how they can be used to describe conflicting and cyclic belief structures, and certain forms of bounded rationality. In an opponent modeling domain, NIDs were able to outperform other computational agents whose strategies were not known in advance. NIDs are equivalent in representation to Bayesian games but they are more compact and structured than this formalism. In particular, the equilibrium definition for NIDs makes an explicit distinction between agents optimal strategies, and how they actually behave in reality.\nThis paper defines the notion of analogical dissimilarity between four objects, with a special focus on objects structured as sequences. Firstly, it studies the case where the four objects have a null analogical dissimilarity, i.e. are in analogical proportion. Secondly, when one of these objects is unknown, it gives algorithms to compute it. Thirdly, it tackles the problem of defining analogical dissimilarity, which is a measure of how far four objects are from being in analogical proportion. In particular, when objects are sequences, it gives a definition and an algorithm based on an optimal alignment of the four sequences. It gives also learning algorithms, i.e. methods to find the triple of objects in a learning sample which has the least analogical dissimilarity with a given object. Two practical experiments are described: the first is a classification problem on benchmarks of binary and nominal data, the second shows how the generation of sequences by solving analogical equations enables a handwritten character recognition system to rapidly be adapted to a new writer.\nWe consider the problem of optimal planning in stochastic domains with resource constraints, where the resources are continuous and the choice of action at each step depends on resource availability. 
We introduce the HAO* algorithm, a generalization of the AO* algorithm that performs search in a hybrid state space that is modeled using both discrete and continuous state variables, where the continuous variables represent monotonic resources. Like other heuristic search algorithms, HAO* leverages knowledge of the start state and an admissible heuristic to focus computational effort on those parts of the state space that could be reached from the start state by following an optimal policy. We show that this approach is especially effective when resource constraints limit how much of the state space is reachable. Experimental results demonstrate its effectiveness in the domain that motivates our research: automated planning for planetary exploration rovers.\nPartially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during the execution. Online algorithms generally consist of a lookahead search to find the best action to execute at each time step in an environment. Our objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics (return, error bound reduction, lower bound improvement). Our experimental results indicate that state-of-the-art online heuristic search methods can handle large POMDP domains efficiently.\nA phylogenetic tree shows the evolutionary relationships among species. Internal nodes of the tree represent speciation events and leaf nodes correspond to species. 
A goal of phylogenetics is to combine such trees into larger trees, called supertrees, whilst respecting the relationships in the original trees. A rooted tree exhibits an ultrametric property; that is, for any three leaves of the tree it must be that one pair has a deeper most recent common ancestor than the other pairs, or that all three have the same most recent common ancestor. This inspires a constraint programming encoding for rooted trees. We present an efficient constraint that enforces the ultrametric property over a symmetric array of constrained integer variables, with the inevitable property that the lower bounds of any three variables are mutually supportive. We show that this allows an efficient constraint-based solution to the supertree construction problem. We demonstrate that the versatility of constraint programming can be exploited to allow solutions to variants of the supertree construction problem.\nWe present Confidence-Based Autonomy (CBA), an interactive algorithm for policy learning from demonstration. The CBA algorithm consists of two components which take advantage of the complementary abilities of humans and computer agents. The first component, Confident Execution, enables the agent to identify states in which demonstration is required, to request a demonstration from the human teacher and to learn a policy based on the acquired data. The algorithm selects demonstrations based on a measure of action selection confidence, and our results show that using Confident Execution, the agent requires fewer demonstrations to learn the policy than when demonstrations are selected by a human teacher. The second algorithmic component, Corrective Demonstration, enables the teacher to correct any mistakes made by the agent through additional demonstrations in order to improve the policy and future task performance. CBA and its individual components are compared and evaluated in a complex simulated driving domain. 
The complete CBA algorithm results in the best overall learning performance, successfully reproducing the behavior of the teacher while balancing the tradeoff between number of demonstrations and number of incorrect actions during learning.\nA new search algorithm for solving distributed constraint optimization problems (DisCOPs) is presented. Agents assign variables sequentially and compute bounds on partial assignments asynchronously. The asynchronous bounds computation is based on the propagation of partial assignments. The asynchronous forward-bounding algorithm (AFB) is a distributed optimization search algorithm that keeps one consistent partial assignment at all times. The algorithm is described in detail and its correctness proven. Experimental evaluation shows that AFB outperforms synchronous branch and bound by many orders of magnitude, and produces a phase transition as the tightness of the problem increases. This is an analogous effect to the phase transition that has been observed when local consistency maintenance is applied to MaxCSPs. The AFB algorithm is further enhanced by the addition of a backjumping mechanism, resulting in the AFB-BJ algorithm. Distributed backjumping is based on accumulated information on bounds of all values and on processing concurrently a queue of candidate goals for the next move back. The AFB-BJ algorithm is compared experimentally to other DisCOP algorithms (ADOPT, DPOP, OptAPO) and is shown to be a very efficient algorithm for DisCOPs.\nAsynchronous Partial Overlay (APO) is a search algorithm that uses cooperative mediation to solve Distributed Constraint Satisfaction Problems (DisCSPs). The algorithm partitions the search into different subproblems of the DisCSP. The original proof of completeness of the APO algorithm is based on the growth of the size of the subproblems. 
The present paper demonstrates that this expected growth of subproblems does not occur in some situations, leading to a termination problem of the algorithm. The problematic parts in the APO algorithm that interfere with its completeness are identified and necessary modifications to the algorithm that fix these problematic parts are given. The resulting version of the algorithm, Complete Asynchronous Partial Overlay (CompAPO), ensures its completeness. Formal proofs for the soundness and completeness of CompAPO are given. A detailed performance evaluation of CompAPO comparing it to other DisCSP algorithms is presented, along with an extensive experimental evaluation of the algorithm's unique behavior. Additionally, an optimization version of the algorithm, CompOptAPO, is presented, discussed, and evaluated.\nWe investigate the computational complexity of testing dominance and consistency in CP-nets. Previously, the complexity of dominance has been determined for restricted classes in which the dependency graph of the CP-net is acyclic. However, there are preferences of interest that define cyclic dependency graphs; these are modeled with general CP-nets. In our main results, we show that both dominance and consistency for general CP-nets are PSPACE-complete. We then consider the concepts of strong dominance, dominance equivalence and dominance incomparability, and several notions of optimality, and identify the complexity of the corresponding decision problems. The reductions used in the proofs are from STRIPS planning, and thus reinforce the earlier established connections between both areas.\nPartially observable Markov decision processes (POMDPs) provide a principled framework for sequential planning in uncertain single agent settings. 
An extension of POMDPs to multiagent settings, called interactive POMDPs (I-POMDPs), replaces POMDP belief spaces with interactive hierarchical belief systems which represent an agent's belief about the physical world, about beliefs of other agents, and about their beliefs about others' beliefs. This modification makes the difficulties of obtaining solutions due to complexity of the belief and policy spaces even more acute. We describe a general method for obtaining approximate solutions of I-POMDPs based on particle filtering (PF). We introduce the interactive PF, which descends the levels of the interactive belief hierarchies and samples and propagates beliefs at each level. The interactive PF is able to mitigate the belief space complexity, but it does not address the policy space complexity. To mitigate the policy space complexity (sometimes also called the curse of history), we utilize a complementary method based on sampling likely observations while building the look-ahead reachability tree. While this approach does not completely address the curse of history, it beats back the curse's impact substantially. We provide experimental results and chart future work.\nInference in Bayes Nets (BAYES) is an important problem with numerous applications in probabilistic reasoning. Counting the number of satisfying assignments of a propositional formula (#SAT) is a closely related problem of fundamental theoretical importance. Both these problems, and others, are members of the class of sum-of-products (SUMPROD) problems. In this paper we show that standard backtracking search when augmented with a simple memoization scheme (caching) can solve any sum-of-products problem with time complexity that is at least as good as any other state-of-the-art exact algorithm, and that it can also achieve the best known time-space tradeoff. 
Furthermore, backtracking's ability to utilize more flexible variable orderings allows us to prove that it can achieve an exponential speedup over other standard algorithms for SUMPROD on some instances. The ideas presented here have been utilized in a number of solvers that have been applied to various types of sum-of-products problems. These systems have exploited the fact that backtracking can naturally exploit more of the problem's structure to achieve improved performance on a range of problem instances. Empirical evidence of this performance gain has appeared in published works describing these solvers, and we provide references to these works.\nVarious tasks in decision making and decision support systems require selecting a preferred subset of a given set of items. Here we focus on problems where the individual items are described using a set of characterizing attributes, and a generic preference specification is required, that is, a specification that can work with an arbitrary set of items. For example, preferences over the content of an online newspaper should have this form: at each viewing, the newspaper contains a subset of the set of articles currently available. Our preference specification over this subset should be provided offline, but we should be able to use it to select a subset of any currently available set of articles, e.g., based on their tags. We present a general approach for lifting formalisms for specifying preferences over objects with multiple attributes into ones that specify preferences over subsets of such objects. We also show how we can compute an optimal subset given such a specification in a relatively efficient manner. We provide an empirical evaluation of the approach as well as some worst-case complexity results.\nMultiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. 
We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and, unlike the coverage set algorithm, it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs.\nBayesian networks are a useful tool in the representation of uncertain knowledge. This paper proposes a new algorithm called ACO-E to learn the structure of a Bayesian network. It does this by conducting a search through the space of equivalence classes of Bayesian networks using Ant Colony Optimization (ACO). To this end, two novel extensions of traditional ACO techniques are proposed and implemented. Firstly, multiple types of moves are allowed. Secondly, moves can be given in terms of indices that are not based on construction graph nodes. The results of testing show that ACO-E performs better than a greedy search and other state-of-the-art and metaheuristic algorithms whilst searching in the space of equivalence classes.\nRecently, considerable focus has been given to the problem of determining the boundary between tractable and intractable planning problems. In this paper, we study the complexity of planning in the class C_n of planning problems, characterized by unary operators and directed path causal graphs. 
Although this is one of the simplest forms of causal graphs a planning problem can have, we show that planning is intractable for C_n (unless P = NP), even if the domains of state variables have bounded size. In particular, we show that plan existence for C_n^k is NP-hard for k>=5 by reduction from CNFSAT. Here, k denotes the upper bound on the size of the state variable domains. Our result reduces the complexity gap for the class C_n^k to cases k=3 and k=4 only, since C_n^2 is known to be tractable.\nConformant planning is the problem of finding a sequence of actions for achieving a goal in the presence of uncertainty in the initial state or action effects. The problem has been approached as a path-finding problem in belief space where good belief representations and heuristics are critical for scaling up. In this work, a different formulation is introduced for conformant problems with deterministic actions where they are automatically converted into classical ones and solved by an off-the-shelf classical planner. The translation maps literals L and sets of assumptions t about the initial situation into new literals KL/t that represent that L must be true if t is initially true. We lay out a general translation scheme that is sound and establish the conditions under which the translation is also complete. We show that the complexity of the complete translation is exponential in a parameter of the problem called the conformant width, which for most benchmarks is bounded. The planner based on this translation exhibits good performance in comparison with existing planners, and is the basis for T0, the best performing planner in the Conformant Track of the 2006 International Planning Competition.\nSymmetries in discrete constraint satisfaction problems have been explored and exploited in recent years, but symmetries in continuous constraint problems have not received the same attention. Here we focus on permutations of the variables consisting of one single cycle. 
We propose a procedure that takes advantage of these symmetries by interacting with a continuous constraint solver without interfering with it. Key concepts in this procedure are the classes of symmetric boxes formed by bisecting an n-dimensional cube at the same point in all dimensions at the same time. We analyze these classes and quantify them as a function of the cube dimensionality. Moreover, we propose a simple algorithm to generate the representatives of all these classes for any number of variables at very high rates. A problem example from the chemical field and the cyclic n-roots problem are used to show the performance of the approach in practice.\nIn this paper we formulate the problem of inference under incomplete information in very general terms. This includes modelling the process responsible for the incompleteness, which we call the incompleteness process. We allow the process behaviour to be partly unknown. Then we use Walley's theory of coherent lower previsions, a generalisation of the Bayesian theory to imprecision, to derive the rule to update beliefs under incompleteness that logically follows from our assumptions, and that we call the conservative inference rule. This rule has some remarkable properties: it is an abstract rule to update beliefs that can be applied in any situation or domain; it gives us the opportunity to be neither too optimistic nor too pessimistic about the incompleteness process, which is a necessary condition to draw conclusions that are reliable yet strong enough; and it is a coherent rule, in the sense that it cannot lead to inconsistencies. 
We give examples to show how the new rule can be applied in expert systems, in parametric statistical inference, and in pattern classification, and discuss more generally the view of incompleteness processes defended here as well as some of its consequences.\nAs fragments of first-order logic, description logics (DLs) do not provide nonmonotonic features such as defeasible inheritance and default rules. Since many applications would benefit from the availability of such features, several families of nonmonotonic DLs have been developed that are mostly based on default logic and autoepistemic logic. In this paper, we consider circumscription as an interesting alternative approach to nonmonotonic DLs that, in particular, supports defeasible inheritance in a natural way. We study DLs extended with circumscription under different language restrictions and under different constraints on the sets of minimized, fixed, and varying predicates, and pinpoint the exact computational complexity of reasoning for DLs ranging from ALC to ALCIO and ALCQO. When the minimized and fixed predicates include only concept names but no role names, then reasoning is complete for NExpTime^NP. It becomes complete for NP^NExpTime when the number of minimized and fixed predicates is bounded by a constant. If roles can be minimized or fixed, then complexity ranges from NExpTime^NP to undecidability.\nThe survey propagation (SP) algorithm has been shown to work well on large instances of the random 3-SAT problem near its phase transition. It was shown that SP estimates marginals over covers that represent clusters of solutions. The SP-y algorithm generalizes SP to work on the maximum satisfiability (Max-SAT) problem, but the cover interpretation of SP does not generalize to SP-y. In this paper, we formulate the relaxed survey propagation (RSP) algorithm, which extends the SP algorithm to apply to the weighted Max-SAT problem. 
We show that RSP has an interpretation of estimating marginals over covers violating a set of clauses with minimal weight. This naturally generalizes the cover interpretation of SP. Empirically, we show that RSP outperforms SP-y and other state-of-the-art Max-SAT solvers on random Max-SAT instances. RSP also outperforms state-of-the-art weighted Max-SAT solvers on random weighted Max-SAT instances.\nWe present a novel reasoning calculus for the description logic SHOIQ^+, a knowledge representation formalism with applications in areas such as the Semantic Web. Unnecessary nondeterminism and the construction of large models are two primary sources of inefficiency in the tableau-based reasoning calculi used in state-of-the-art reasoners. In order to reduce nondeterminism, we base our calculus on hypertableau and hyperresolution calculi, which we extend with a blocking condition to ensure termination. In order to reduce the size of the constructed models, we introduce anywhere pairwise blocking. We also present an improved nominal introduction rule that ensures termination in the presence of nominals, inverse roles, and number restrictions, a combination of DL constructs that has proven notoriously difficult to handle. Our implementation shows significant performance improvements over state-of-the-art reasoners on several well-known ontologies.\nThe recently introduced series of description logics under the common moniker DL-Lite has attracted the attention of the description logic and semantic web communities due to the low computational complexity of inference, on the one hand, and the ability to represent conceptual modeling formalisms, on the other. 
The main aim of this article is to carry out a thorough and systematic investigation of inference in extensions of the original DL-Lite logics along five axes: by (i) adding the Boolean connectives and (ii) number restrictions to concept constructs, (iii) allowing role hierarchies, (iv) allowing role disjointness, symmetry, asymmetry, reflexivity, irreflexivity and transitivity constraints, and (v) adopting or dropping the unique name assumption. We analyze the combined complexity of satisfiability for the resulting logics, as well as the data complexity of instance checking and answering positive existential queries. Our approach is based on embedding DL-Lite logics in suitable fragments of the one-variable first-order logic, which provides useful insights into their properties and, in particular, computational behavior.\nThe paper investigates parameterized approximate message-passing schemes that are based on bounded inference and are inspired by Pearl's belief propagation algorithm (BP). We start with the bounded inference mini-clustering algorithm and then move to the iterative scheme called Iterative Join-Graph Propagation (IJGP), which combines both iteration and bounded inference. Algorithm IJGP belongs to the class of Generalized Belief Propagation algorithms, a framework that allowed connections with approximate algorithms from statistical physics and is shown empirically to surpass the performance of mini-clustering and belief propagation, as well as a number of other state-of-the-art algorithms on several classes of networks. We also provide insight into the accuracy of iterative BP and IJGP by relating these algorithms to well-known classes of constraint propagation schemes.\nThe identification of performance-optimizing parameter settings is an important part of the development and application of algorithms. We describe an automatic framework for this algorithm configuration problem. 
More formally, we provide methods for optimizing a target algorithm's performance on a given class of problem instances by varying a set of ordinal and/or categorical parameters. We review a family of local-search-based algorithm configuration procedures and present novel techniques for accelerating them by adaptively limiting the time spent evaluating individual configurations. We describe the results of a comprehensive experimental evaluation of our methods, based on the configuration of prominent complete and incomplete algorithms for SAT. We also present what is, to our knowledge, the first published work on automatically configuring the CPLEX mixed integer programming solver. All the algorithms we considered had default parameter settings that were manually identified with considerable effort. Nevertheless, using our automated algorithm configuration procedures, we achieved substantial and consistent performance improvements.\nKorf, Reid, and Edelkamp introduced a formula to predict the number of nodes IDA* will expand on a single iteration for a given consistent heuristic, and experimentally demonstrated that it could make very accurate predictions. In this paper we show that, in addition to requiring the heuristic to be consistent, their formula's predictions are accurate only at levels of the brute-force search tree where the heuristic values obey the unconditional distribution that they defined and then used in their formula. We then propose a new formula that works well without these requirements, i.e., it can make accurate predictions of IDA*'s performance for inconsistent heuristics and when the heuristic values at any level do not obey the unconditional distribution. In order to achieve this we introduce the conditional distribution of heuristic values which is a generalization of their unconditional heuristic distribution. 
We also provide extensions of our formula that handle individual start states and the augmentation of IDA* with bidirectional pathmax (BPMX), a technique for propagating heuristic values when inconsistent heuristics are used. Experimental results demonstrate the accuracy of our new method and all its variations.\nDeciding how to act in partially observable environments remains an active area of research. Identifying good sequences of decisions is particularly challenging when good control performance requires planning multiple steps into the future in domains with many states. Towards addressing this challenge, we present an online, forward-search algorithm called the Posterior Belief Distribution (PBD). PBD leverages a novel method for calculating the posterior distribution over beliefs that result after a sequence of actions is taken, given the set of observation sequences that could be received during this process. This method allows us to efficiently evaluate the expected reward of a sequence of primitive actions, which we refer to as macro-actions. We present a formal analysis of our approach, and examine its performance on two very large simulation experiments: scientific exploration and a target monitoring domain. We also demonstrate our algorithm being used to control a real robotic helicopter in a target monitoring experiment, which suggests that our approach has practical potential for planning in real-world, large partially observable domains where a multi-step lookahead is required to achieve good performance.\nThe paper presents a scheme for computing lower and upper bounds on the posterior marginals in Bayesian networks with discrete variables. Its power lies in its ability to use any available scheme that bounds the probability of evidence or posterior marginals and enhance its performance in an anytime manner. 
The scheme uses the cutset conditioning principle to tighten existing bounding schemes and to facilitate anytime behavior, utilizing a fixed number of cutset tuples. The accuracy of the bounds improves as the number of used cutset tuples increases, and so does the computation time. We demonstrate empirically the value of our scheme for bounding posterior marginals and probability of evidence using a variant of the bound propagation algorithm as a plug-in scheme.\nIn this paper, we address the problem of change in an abstract argumentation system. We focus on a particular change: the addition of a new argument which interacts with previous arguments. We study the impact of such an addition on the outcome of the argumentation system, more particularly on the set of its extensions. Several properties for this change operation are defined by comparing the new set of extensions to the initial one; these properties are called structural when the comparisons are based on set-cardinality or set-inclusion relations. Several other properties are proposed where comparisons are based on the status of some particular arguments: the accepted arguments; these properties refer to the evolution of this status during the change, e.g., Monotony and Priority to Recency. All these properties may be more or less desirable according to specific applications. They are studied under two particular semantics: the grounded and preferred semantics.\nCall control features (e.g., call-divert, voice-mail) are primitive options to which users can subscribe off-line to personalise their service. The configuration of a feature subscription involves choosing and sequencing features from a catalogue and is subject to constraints that prevent undesirable feature interactions at run-time. When the subscription requested by a user is inconsistent, one problem is to find an optimal relaxation, which is a generalisation of the feedback vertex set problem on directed graphs, and thus it is an NP-hard task. 
We present several constraint programming formulations of the problem. We also present formulations using partial weighted maximum Boolean satisfiability and mixed integer linear programming. We study all these formulations by experimentally comparing them on a variety of randomly generated instances of the feature subscription problem.\nWe develop multiattribute auctions that accommodate generalized additive independent (GAI) preferences. We propose an iterative auction mechanism that maintains prices on potentially overlapping GAI clusters of attributes, thus decreases elicitation and computational burden, and creates an open competition among suppliers over a multidimensional domain. Most significantly, the auction is guaranteed to achieve surplus which approximates optimal welfare up to a small additive factor, under reasonable equilibrium strategies of traders. The main departure of GAI auctions from previous literature is to accommodate non-additive trader preferences, hence allowing traders to condition their evaluation of specific attributes on the value of other attributes. At the same time, the GAI structure supports a compact representation of prices, enabling a tractable auction process. We perform a simulation study, demonstrating and quantifying the significant efficiency advantage of more expressive preference modeling. We draw random GAI-structured utility functions with various internal structures, generate additive functions that approximate the GAI utility, and compare the performance of the auctions using the two representations. We find that allowing traders to express existing dependencies among attributes improves the economic efficiency of multiattribute auctions.\nDomain-specific features are important in representing problem structure throughout machine learning and decision-theoretic planning. 
In planning, once state features are provided, domain-independent algorithms such as approximate value iteration can learn weighted combinations of those features that often perform well as heuristic estimates of state value (e.g., distance to the goal). Successful applications in real-world domains often require features crafted by human experts. Here, we propose automatic processes for learning useful domain-specific feature sets with little or no human intervention. Our methods select and add features that describe state-space regions of high inconsistency in the Bellman equation (statewise Bellman error) during approximate value iteration. Our method can be applied using any real-valued-feature hypothesis space and corresponding learning method for selecting features from training sets of state-value pairs. We evaluate the method with hypothesis spaces defined by both relational and propositional feature languages, using nine probabilistic planning domains. We show that approximate value iteration using a relational feature space performs at the state-of-the-art in domain-independent stochastic relational planning. Our method provides the first domain-independent approach that plays Tetris successfully (without human-engineered features).\nWe propose a StochAstic Fault diagnosis AlgoRIthm, called SAFARI, which trades off guarantees of computing minimal diagnoses for computational efficiency. We empirically demonstrate, using the 74XXX and ISCAS-85 suites of benchmark combinatorial circuits, that SAFARI achieves several orders-of-magnitude speedup over two well-known deterministic algorithms, CDA* and HA*, for multiple-fault diagnoses; further, SAFARI can compute a range of multiple-fault diagnoses that CDA* and HA* cannot. We also prove that SAFARI is optimal for a range of propositional fault models, such as the widely-used weak-fault models (models with ignorance of abnormal behavior). 
We discuss the optimality of SAFARI in a class of strong-fault circuit models with stuck-at failure modes. By modeling the algorithm itself as a Markov chain, we provide exact bounds on the minimality of the diagnosis computed. SAFARI also displays strong anytime behavior, and will return a diagnosis after any non-trivial inference time.\nDescription Logics are knowledge representation formalisms that provide, for example, the logical underpinning of the W3C OWL standards. Conjunctive queries, the standard query language in databases, have recently gained significant attention as an expressive formalism for querying Description Logic knowledge bases. Several different techniques for deciding conjunctive query entailment are available for a wide range of DLs. Nevertheless, the combination of nominals, inverse roles, and number restrictions in OWL 1 and OWL 2 DL causes unsolvable problems for the techniques hitherto available. We tackle this problem and present a decidability result for entailment of unions of conjunctive queries in the DL ALCHOIQb that contains all three problematic constructors simultaneously. Provided that queries contain only simple roles, our result also shows decidability of entailment of (unions of) conjunctive queries in the logic that underpins OWL 1 DL and we believe that the presented results will pave the way for further progress towards conjunctive query entailment decision procedures for the Description Logics underlying the OWL standards.\nWe provide a series of algorithms demonstrating that solutions according to the fundamental game-theoretic solution concept of closed under rational behavior (CURB) sets in two-player, normal-form games can be computed in polynomial time (we also discuss extensions to n-player games). First, we describe an algorithm that identifies all of a player's best responses conditioned on the belief that the other player will play from within a given subset of its strategy space. 
This algorithm serves as a subroutine in a series of polynomial-time algorithms for finding all minimal CURB sets, one minimal CURB set, and the smallest minimal CURB set in a game. We then show that the complexity of finding a Nash equilibrium can be exponential only in the size of a game's smallest CURB set. Related to this, we show that the smallest CURB set can be an arbitrarily small portion of the game, but it can also be arbitrarily larger than the supports of its only enclosed Nash equilibrium. We test our algorithms empirically and find that most commonly studied academic games tend to have either very large or very small minimal CURB sets.\nThe Resource Description Framework (RDF) is a Semantic Web standard that provides a data language, simply called RDF, as well as a lightweight ontology language, called RDF Schema. We investigate embeddings of RDF in logic and show how standard logic programming and description logic technology can be used for reasoning with RDF. We subsequently consider extensions of RDF with datatype support, considering D entailment, defined in the RDF semantics specification, and D* entailment, a semantic weakening of D entailment, introduced by ter Horst. We use the embeddings and properties of the logics to establish novel upper bounds for the complexity of deciding entailment. We subsequently establish two novel lower bounds, establishing that RDFS entailment is PTime-complete and that simple-D entailment is coNP-hard, when considering arbitrary datatypes, both in the size of the entailing graph. The results indicate that RDFS may not be as lightweight as one may expect.\nNoisy probabilistic relational rules are a promising world model representation for several reasons. They are compact and generalize over world instantiations. They are usually interpretable and they can be learned effectively from the action experiences in complex worlds. We investigate reasoning with such rules in grounded relational domains. 
Our algorithms exploit the compactness of rules for efficient and flexible decision-theoretic planning. As a first approach, we combine these rules with the Upper Confidence Bounds applied to Trees (UCT) algorithm based on look-ahead trees. Our second approach converts these rules into a structured dynamic Bayesian network representation and predicts the effects of action sequences using approximate inference and beliefs over world states. We evaluate the effectiveness of our approaches for planning in a simulated complex 3D robot manipulation scenario with an articulated manipulator and realistic physics and in domains of the probabilistic planning competition. Empirical results show that our methods can solve problems where existing methods fail.\nTo harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we compare different approaches to parallel best-first search in a shared-memory setting. We present a new method, PBNF, that uses abstraction to partition the state space and to detect duplicate states without requiring frequent locking. PBNF allows speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions in our approach, proving its correctness using temporal logic. Our approach is general, allowing it to extend easily to suboptimal and anytime heuristic search. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using 8-core machines, we show that A*, weighted A* and Anytime weighted A* implemented using PBNF yield faster search than improved versions of previous parallel search proposals.\nThe Hamiltonian cycle problem (HCP) is an important combinatorial problem with applications in many areas. It is among the first problems used for studying intrinsic properties, including phase transitions, of combinatorial problems. 
While thorough theoretical and experimental analyses have been made on the HCP in undirected graphs, a limited amount of work has been done for the HCP in directed graphs (DHCP). The main contribution of this work is an effective algorithm for the DHCP. Our algorithm explores and exploits the close relationship between the DHCP and the Assignment Problem (AP) and utilizes a technique based on Boolean satisfiability (SAT). By combining effective algorithms for the AP and SAT, our algorithm significantly outperforms previous exact DHCP algorithms, including an algorithm based on the award-winning Concorde TSP algorithm. The second result of the current study is an experimental analysis of phase transitions of the DHCP, verifying and refining a known phase transition of the DHCP.\nWe introduce a novel logical notion--partial entailment--to propositional logic. In contrast with classical entailment, that a formula P partially entails another formula Q with respect to a background formula set \\Gamma intuitively means that under the circumstance of \\Gamma, if P is true then some \"part\" of Q will also be true. We distinguish three different kinds of partial entailments and formalize them by using an extended notion of prime implicant. We study their semantic properties, which show that, surprisingly, partial entailments fail for many simple inference rules. Then, we study the related computational properties, which indicate that partial entailments are relatively difficult to compute. Finally, we consider a potential application of partial entailments in reasoning about rational agents.\nIn action domains where agents may have erroneous beliefs, reasoning about the effects of actions involves reasoning about belief change. In this paper, we use a transition system approach to reason about the evolution of an agent's beliefs as actions are executed.
Some actions cause an agent to perform belief revision while others cause an agent to perform belief update, but the interaction between revision and update can be non-elementary. We present a set of rationality properties describing the interaction between revision and update, and we introduce a new class of belief change operators for reasoning about alternating sequences of revisions and updates. Our belief change operators can be characterized in terms of a natural shifting operation on total pre-orderings over interpretations. We compare our approach with related work on iterated belief change due to action, and we conclude with some directions for future research.\nWe offer a new understanding of some aspects of practical SAT-solvers that are based on DPLL with unit-clause propagation, clause-learning, and restarts. We do so by analyzing a concrete algorithm which we claim is faithful to what practical solvers do. In particular, before making any new decision or restart, the solver repeatedly applies the unit-resolution rule until saturation, and leaves no component to the mercy of non-determinism except for some internal randomness. We prove the perhaps surprising fact that, although the solver is not explicitly designed for it, with high probability it ends up behaving as width-k resolution after no more than O(n^(2k+2)) conflicts and restarts, where n is the number of variables. In other words, width-k resolution can be thought of as O(n^(2k+2)) restarts of the unit-resolution rule with learning.\nWhen faced with the problem of learning a model of a high-dimensional environment, a common approach is to limit the model to make only a restricted set of predictions, thereby simplifying the learning problem. These partial models may be directly useful for making decisions or may be combined together to form a more complete, structured model. However, in partially observable (non-Markov) environments, standard model-learning methods learn generative models, i.e.
models that provide a probability distribution over all possible futures (such as POMDPs). It is not straightforward to restrict such models to make only certain predictions, and doing so does not always simplify the learning problem. In this paper we present prediction profile models: non-generative partial models for partially observable systems that make only a given set of predictions, and are therefore far simpler than generative models in some cases. We formalize the problem of learning a prediction profile model as a transformation of the original model-learning problem, and show empirically that one can learn prediction profile models that make a small set of important predictions even in systems that are too complex for standard generative models.\nIn this paper, we propose a comprehensive study of second-order consistencies (i.e., consistencies identifying inconsistent pairs of values) for constraint satisfaction. We build a full picture of the relationships existing between four basic second-order consistencies, namely path consistency (PC), 3-consistency (3C), dual consistency (DC) and 2-singleton arc consistency (2SAC), as well as their conservative and strong variants. Interestingly, dual consistency is an original property that can be established by using the outcome of the enforcement of generalized arc consistency (GAC), which makes it rather easy to obtain since constraint solvers typically maintain GAC during search. On binary constraint networks, DC is equivalent to PC, but its restriction to existing constraints, called conservative dual consistency (CDC), is strictly stronger than traditional conservative consistencies derived from path consistency, namely partial path consistency (PPC) and conservative path consistency (CPC). After introducing a general algorithm to enforce strong (C)DC, we present the results of an experimentation over a wide range of benchmarks that demonstrate the interest of (conservative) dual consistency. 
In particular, we show that enforcing (C)DC before search clearly improves the performance of MAC (the algorithm that maintains GAC during search) on several binary and non-binary structured problems.\nWe address the problem of computing approximate marginals in Gaussian probabilistic models by using mean field and fractional Bethe approximations. We define the Gaussian fractional Bethe free energy in terms of the moment parameters of the approximate marginals, derive a lower and an upper bound on the fractional Bethe free energy and establish a necessary condition for the lower bound to be bounded from below. It turns out that the condition is identical to the pairwise normalizability condition, which is known to be a sufficient condition for the convergence of the message passing algorithm. We show that stable fixed points of the Gaussian message passing algorithm are local minima of the Gaussian Bethe free energy. By a counterexample, we disprove the conjecture stating that the unboundedness of the free energy implies the divergence of the message passing algorithm.\nWe address the cost-sensitive feature acquisition problem, where misclassifying an instance is costly but the expected misclassification cost can be reduced by acquiring the values of the missing features. Because acquiring the features is costly as well, the objective is to acquire the right set of features so that the sum of the feature acquisition cost and misclassification cost is minimized. We describe the Value of Information Lattice (VOILA), an optimal and efficient feature subset acquisition framework. Unlike the common practice, which is to acquire features greedily, VOILA can reason with subsets of features. VOILA efficiently searches the space of possible feature subsets by discovering and exploiting conditional independence properties between the features and it reuses probabilistic inference computations to further speed up the process. 
Through empirical evaluation on five medical datasets, we show that the greedy strategy is often reluctant to acquire features, as it cannot forecast the benefit of acquiring multiple features in combination.\nDynamic programming algorithms have been successfully applied to propositional stochastic planning problems by using compact representations, in particular algebraic decision diagrams, to capture domain dynamics and value functions. Work on symbolic dynamic programming lifted these ideas to first order logic using several representation schemes. Recent work introduced a first order variant of decision diagrams (FODD) and developed a value iteration algorithm for this representation. This paper develops several improvements to the FODD algorithm that make the approach practical. These include new reduction operators that decrease the size of the representation, several speedup techniques, and techniques for value approximation. Incorporating these, the paper presents a planning system, FODD-Planner, for solving relational stochastic planning problems. The system is evaluated on several domains, including problems from the recent international planning competition, and shows competitive performance with top ranking systems. This is the first demonstration of feasibility of this approach and it shows that abstraction through compact representation is a promising approach to stochastic planning.\nPrevious studies have demonstrated that encoding a Bayesian network into a SAT formula and then performing weighted model counting using a backtracking search algorithm can be an effective method for exact inference. In this paper, we present techniques for improving this approach for Bayesian networks with noisy-OR and noisy-MAX relations---two relations that are widely used in practice as they can dramatically reduce the number of probabilities one needs to specify.
In particular, we present two SAT encodings for noisy-OR and two encodings for noisy-MAX that exploit the structure or semantics of the relations to improve both time and space efficiency, and we prove the correctness of the encodings. We experimentally evaluated our techniques on large-scale real and randomly generated Bayesian networks. On these benchmarks, our techniques gave speedups of up to two orders of magnitude over the best previous approaches for networks with noisy-OR/MAX relations and scaled up to larger networks. As well, our techniques extend the weighted model counting approach for exact inference to networks that were previously intractable for the approach.\nThe ignoring delete lists relaxation is of paramount importance for both satisficing and optimal planning. In earlier work, it was observed that the optimal relaxation heuristic h+ has amazing qualities in many classical planning benchmarks, in particular pertaining to the complete absence of local minima. The proofs of this are hand-made, raising the question whether such proofs can be led automatically by domain analysis techniques. In contrast to earlier disappointing results -- the analysis method has exponential runtime and succeeds only in two extremely simple benchmark domains -- we herein answer this question in the affirmative. We establish connections between causal graph structure and h+ topology. This results in low-order polynomial time analysis methods, implemented in a tool we call TorchLight. Of the 12 domains where the absence of local minima has been proved, TorchLight gives strong success guarantees in 8 domains. Empirically, its analysis exhibits strong performance in a further 2 of these domains, plus in 4 more domains where local minima may exist but are rare. In this way, TorchLight can distinguish easy domains from hard ones.
By summarizing structural reasons for analysis failure, TorchLight also provides diagnostic output indicating domain aspects that may cause local minima.\nInterpolation is an important property of classical and many non-classical logics that has been shown to have interesting applications in computer science and AI. Here we study the Interpolation Property for the non-monotonic system of equilibrium logic, establishing weaker or stronger forms of interpolation depending on the precise interpretation of the inference relation. These results also yield a form of interpolation for ground logic programs under the answer sets semantics. For disjunctive logic programs we also study the property of uniform interpolation that is closely related to the concept of variable forgetting. The first-order version of equilibrium logic has analogous Interpolation properties whenever the collection of equilibrium models is (first-order) definable. Since this is the case for so-called safe programs and theories, it applies to the usual situations that arise in practical answer set programming.\nLin and Zhao's theorem on loop formulas states that in the propositional case the stable model semantics of a logic program can be completely characterized by propositional loop formulas, but this result does not fully carry over to the first-order case. We investigate the precise relationship between the first-order stable model semantics and first-order loop formulas, and study conditions under which the former can be represented by the latter. In order to facilitate the comparison, we extend the definition of a first-order loop formula which was limited to a nondisjunctive program, to a disjunctive program and to an arbitrary first-order theory. Based on the studied relationship we extend the syntax of a logic program with explicit quantifiers, which allows us to do reasoning involving non-Herbrand stable models using first-order reasoners.
Such programs can be viewed as a special class of first-order theories under the stable model semantics, which yields more succinct loop formulas than the general language due to their restricted syntax.\nWe define a logic of propositional formula schemata adding to the syntax of propositional logic indexed propositions and iterated connectives ranging over intervals parameterized by arithmetic variables. The satisfiability problem is shown to be undecidable for this new logic, but we introduce a very general class of schemata, called bound-linear, for which this problem becomes decidable. This result is obtained by reduction to a particular class of schemata called regular, for which we provide a sound and complete terminating proof procedure. This schemata calculus allows one to capture proof patterns corresponding to a large class of problems specified in propositional logic. We also show that the satisfiability problem becomes again undecidable for slight extensions of this class, thus demonstrating that bound-linear schemata represent a good compromise between expressivity and decidability.\nSome of the applications of OWL and RDF (e.g. biomedical knowledge representation and semantic policy formulation) call for extensions of these languages with nonmonotonic constructs such as inheritance with overriding. Nonmonotonic description logics have been studied for many years, however no practical such knowledge representation languages exist, due to a combination of semantic difficulties and high computational complexity. Independently, low-complexity description logics such as DL-lite and EL have been introduced and incorporated in the OWL standard. Therefore, it is interesting to see whether the syntactic restrictions characterizing DL-lite and EL bring computational benefits to their nonmonotonic versions, too. 
In this paper we extensively investigate the computational complexity of Circumscription when knowledge bases are formulated in DL-lite_R, EL, and fragments thereof. We identify fragments whose complexity ranges from P to the second level of the polynomial hierarchy, as well as fragments whose complexity rises to PSPACE and beyond.\nWe consider how an agent should update her beliefs when her beliefs are represented by a set P of probability distributions, given that the agent makes decisions using the minimax criterion, perhaps the best-studied and most commonly-used criterion in the literature. We adopt a game-theoretic framework, where the agent plays against a bookie, who chooses some distribution from P. We consider two reasonable games that differ in what the bookie knows when he makes his choice. Anomalies that have been observed before, like time inconsistency, can be understood as arising because different games are being played, against bookies with different information. We characterize the important special cases in which the optimal decision rules according to the minimax criterion amount to either conditioning or simply ignoring the information. Finally, we consider the relationship between updating and calibration when uncertainty is described by sets of probabilities. Our results emphasize the key role of the rectangularity condition of Epstein and Schneider.\nIn a deterministic world, a planning agent can be certain of the consequences of its planned sequence of actions. Not so, however, in dynamic, stochastic domains where Markov decision processes are commonly used. Unfortunately, these suffer from the curse of dimensionality: if the state space is a Cartesian product of many small sets (dimensions), planning is exponential in the number of those dimensions.   Our new technique exploits the intuitive strategy of selectively ignoring various dimensions in different parts of the state space.
The resulting non-uniformity has strong implications, since the approximation is no longer Markovian, requiring the use of a modified planner. We also use a spatial and temporal proximity measure, which responds to continued planning as well as movement of the agent through the state space, to dynamically adapt the abstraction as planning progresses.   We present qualitative and quantitative results across a range of experimental domains showing that an agent exploiting this novel approximation method successfully finds solutions to the planning problem using much less than the full state space. We assess and analyse the features of domains which our method can exploit.\nPlanning as satisfiability is a principal approach to planning with many eminent advantages. The existing planning as satisfiability techniques usually use encodings compiled from STRIPS. We introduce a novel SAT encoding scheme (SASE) based on the SAS+ formalism. The new scheme exploits the structural information in SAS+, resulting in an encoding that is both more compact and efficient for planning. We prove the correctness of the new encoding by establishing an isomorphism between the solution plans of SASE and that of STRIPS based encodings. We further analyze the transition variables newly introduced in SASE to explain why it accommodates modern SAT solving algorithms and improves performance. We give empirical statistical results to support our analysis. We also develop a number of techniques to further reduce the encoding size of SASE, and conduct experimental studies to show the strength of each individual technique. Finally, we report extensive experimental results to demonstrate significant improvements of SASE over the state-of-the-art STRIPS based encoding schemes in terms of both time and memory efficiency.\nWe propose the concept of Action-Related Place (ARPlace) as a powerful and flexible representation of task-related place in the context of mobile manipulation. 
ARPlace represents robot base locations not as a single position, but rather as a collection of positions, each with an associated probability that the manipulation action will succeed when located there. ARPlaces are generated using a predictive model that is acquired through experience-based learning, and take into account the uncertainty the robot has about its own location and the location of the object to be manipulated.   When executing the task, rather than choosing one specific goal position based only on the initial knowledge about the task context, the robot instantiates an ARPlace, and bases its decisions on this ARPlace, which is updated as new information about the task becomes available. To show the advantages of this least-commitment approach, we present a transformational planner that reasons about ARPlaces in order to optimize symbolic plans. Our empirical evaluation demonstrates that using ARPlaces leads to more robust and efficient mobile manipulation in the face of state estimation uncertainty on our simulated robot.\nDesigning a search heuristic for constraint programming that is reliable across problem domains has been an important research topic in recent years. This paper concentrates on one family of candidates: counting-based search. Such heuristics seek to make branching decisions that preserve most of the solutions by determining what proportion of solutions to each individual constraint agree with that decision. Whereas most generic search heuristics in constraint programming rely on local information at the level of the individual variable, our search heuristics are based on more global information at the constraint level. We design several algorithms that are used to count the number of solutions to specific families of constraints and propose some search heuristics exploiting such information. 
The experimental part of the paper considers eight problem domains ranging from well-established benchmark puzzles to rostering and sport scheduling. An initial empirical analysis identifies heuristic maxSD as a robust candidate among our proposals. We then evaluate the latter against the state of the art, including the latest generic search heuristics, restarts, and discrepancy-based tree traversals. Experimental results show that counting-based search generally outperforms other generic heuristics.\nThe focus of this paper is the calculation of similarity between two concepts from an ontology for a Human-Like Interaction system. In order to facilitate this calculation, a similarity function is proposed based on five dimensions (sort, compositional, essential, restrictive and descriptive) constituting the structure of ontological knowledge. The paper includes a proposal for computing a similarity function for each dimension of knowledge. Later on, the similarity values obtained are weighted and aggregated to obtain a global similarity measure. In order to calculate those weights associated with each dimension, four training methods have been proposed. The training methods differ in the element to fit: the user, concepts or pairs of concepts, and a hybrid approach. For evaluating the proposal, the knowledge base was fed from WordNet and extended by using a knowledge editing toolkit (Cognos). The evaluation of the proposal is carried out through the comparison of system responses with those given by human test subjects, both providing a measure of the soundness of the procedure and revealing ways in which the proposal may be improved.\nCircumscription and logic programs under the stable model semantics are two well-known nonmonotonic formalisms.
The former has served as a basis of classical logic based action formalisms, such as the situation calculus, the event calculus and temporal action logics; the latter has served as a basis of a family of action languages, such as language A and several of its descendants. Based on the discovery that circumscription and the stable model semantics coincide on a class of canonical formulas, we reformulate the situation calculus and the event calculus in the general theory of stable models. We also present a translation that turns the reformulations further into answer set programs, so that efficient answer set solvers can be applied to compute the situation calculus and the event calculus.\nLocal consistency techniques such as k-consistency are a key component of specialised solvers for constraint satisfaction problems. In this paper we show that the power of using k-consistency techniques on a constraint satisfaction problem is precisely captured by using a particular inference rule, which we call negative-hyper-resolution, on the standard direct encoding of the problem into Boolean clauses. We also show that current clause-learning SAT-solvers will discover in expected polynomial time any inconsistency that can be deduced from a given set of clauses using negative-hyper-resolvents of a fixed size. We combine these two results to show that, without being explicitly designed to do so, current clause-learning SAT-solvers efficiently simulate k-consistency techniques, for all fixed values of k. We then give some experimental results to show that this feature allows clause-learning SAT-solvers to efficiently solve certain families of constraint problems which are challenging for conventional constraint-programming solvers.\nCompact representations of objects are a common concept in computer science.
Automated planning can be viewed as a case of this concept: a planning instance is a compact implicit representation of a graph and the problem is to find a path (a plan) in this graph. While the graphs themselves are represented compactly as planning instances, the paths are usually represented explicitly as sequences of actions. Some cases are known where the plans always have compact representations, for example, using macros. We show that these results do not extend to the general case, by proving a number of bounds for compact representations of plans under various criteria, like efficient sequential or random access of actions. In addition to this, we show that our results have consequences for what can be gained from reformulating planning into some other problem. As a contrast to this we also prove a number of positive results, demonstrating restricted cases where plans do have useful compact representations, as well as proving that macro plans have favourable access properties. Our results are finally discussed in relation to other relevant contexts.\nWe study a logic-based approach to versioning of ontologies. Under this view, ontologies provide answers to queries about some vocabulary of interest. The difference between two versions of an ontology is given by the set of queries that receive different answers. We investigate this approach for terminologies given in the description logic EL extended with role inclusions and domain and range restrictions for three distinct types of queries: subsumption, instance, and conjunctive queries. In all three cases, we present polynomial-time algorithms that decide whether two terminologies give the same answers to queries over a given vocabulary and compute a succinct representation of the difference if it is non-empty.
We present an implementation, CEX2, of the developed algorithms for subsumption and instance queries and apply it to distinct versions of Snomed CT and the NCI ontology.\nWe present algorithms for generating alternative solutions for explicit acyclic AND/OR structures in non-decreasing order of cost. The proposed algorithms use a best first search technique and report the solutions using an implicit representation ordered by cost. In this paper, we present two versions of the search algorithm -- (a) an initial version of the best first search algorithm, ASG, which may present one solution more than once while generating the ordered solutions, and (b) another version, LASG, which avoids the construction of the duplicate solutions. The actual solutions can be reconstructed quickly from the implicit compact representation used. We have applied the methods on a few test domains, some of them are synthetic while the others are based on well known problems including the search space of the 5-peg Tower of Hanoi problem, the matrix-chain multiplication problem and the problem of finding secondary structure of RNA. Experimental results show the efficacy of the proposed algorithms over the existing approach. Our proposed algorithms have potential use in various domains ranging from knowledge based frameworks to service composition, where the AND/OR structure is widely used for representing problems.\nThere is currently a growing interest in techniques for hiding parts of the signature of an ontology Kh that is being reused by another ontology Kv. Towards this goal, in this paper we propose the import-by-query framework, which makes the content of Kh accessible through a limited query interface. If Kv reuses the symbols from Kh in a certain restricted way, one can reason over Kv U Kh by accessing only Kv and the query interface. We map out the landscape of the import-by-query problem. 
In particular, we outline the limitations of our framework and prove that certain restrictions on the expressivity of Kh and the way in which Kv reuses symbols from Kh are strictly necessary to enable reasoning in our setting. We also identify cases in which reasoning is possible and we present suitable import-by-query reasoning algorithms.\nThe objective is to present one important aspect of the European IST-FET project \"REV!GIS\"1: the methodology which has been developed for the translation (interpretation) of the quality of the data into a \"fitness for use\" information, that we can confront to the user needs in its application. This methodology is based upon the notion of \"ontologies\" as a conceptual framework able to capture the explicit and implicit knowledge involved in the application. We do not address the general problem of formalizing such ontologies, instead, we rather try to illustrate this with three applications which are particular cases of the more general \"data fusion\" problem. In each application, we show how to deploy our methodology, by comparing several possible solutions, and we try to enlighten where are the quality issues, and what kind of solution to privilege, even at the expense of a highly complex computational approach. The expectation of the REV!GIS project is that computationally tractable solutions will be available among the next generation AI tools.\nThis paper proposes a new mechanism for pruning a search game-tree in computer chess. The algorithm stores and then reuses chains or sequences of moves, built up from previous searches. These move sequences have a built-in forward-pruning mechanism that can radically reduce the search space. A typical search process might retrieve a move from a Transposition Table, where the decision of what move to retrieve would be based on the position itself. This algorithm stores move sequences based on what previous sequences were better, or caused cutoffs. 
This is therefore position independent and so it could also be useful in games with imperfect information or uncertainty, where the whole situation is not known at any one time. Over a small set of tests, the algorithm was shown to clearly out-perform Transposition Tables, both in terms of search reduction and game-play results.\nReal-time strategy (RTS) games make heavy use of artificial intelligence (AI), especially in the design of computerized opponents. Because of the computational complexity involved in managing all aspects of these games, many AI opponents are designed to optimize only a few areas of playing style. In games like StarCraft 2, a very popular and recently released RTS, most AI strategies revolve around economic and building efficiency: AI opponents try to gather and spend all resources as quickly and effectively as possible while ensuring that no units are idle. The aim of this work was to help address the need for AI combat strategies that are not computationally intensive. Our goal was to produce a computationally efficient model that is accurate at predicting the results of complex battles between diverse armies, including which army will win and how many units will remain. Our results suggest it may be possible to develop a relatively simple approximation model of combat that can accurately predict many battles that do not involve micromanagement. Future designs of AI opponents may be able to incorporate such an approximation model into their decision and planning systems to provide a challenge that is strategically balanced across all aspects of play.\nWe show that the set of all formulas in n variables valid in a finite class A of finite algebras is always a regular tree language, and compute a finite axiom set for A. We give a rational reconstruction of Barzdins' liquid flow algorithm (Barzdin+Barzdin, 1991). We show a sufficient condition for the existence of a class A of prototype algebras for a given theory T. 
Such a set allows us to prove T |= p simply by testing whether p holds in A.\nDual decomposition, and more generally Lagrangian relaxation, is a classical method for combinatorial optimization; it has recently been applied to several inference problems in natural language processing (NLP). This tutorial gives an overview of the technique. We describe example algorithms, describe formal guarantees for the method, and describe practical issues in implementing the algorithms. While our examples are predominantly drawn from the NLP literature, the material should be of general relevance to inference problems in machine learning. A central theme of this tutorial is that Lagrangian relaxation is naturally applied in conjunction with a broad class of combinatorial algorithms, allowing inference in models that go significantly beyond previous work on Lagrangian relaxation for inference in graphical models.\nWe study the model of projective simulation (PS) which is a novel approach to artificial intelligence (AI). Recently it was shown that the PS agent performs well in a number of simple task environments, also when compared to standard models of reinforcement learning (RL). In this paper we study the performance of the PS agent further in more complicated scenarios. To that end we chose two well-studied benchmarking problems, namely the \"grid-world\" and the \"mountain-car\" problem, which challenge the model with large and continuous input space. We compare the performance of the PS agent model with those of existing models and show that the PS agent exhibits competitive performance also in such scenarios.\nDynamic resource allocation (DRA) problems are an important class of dynamic stochastic optimization problems that arise in a variety of important real-world applications. DRA problems are notoriously difficult to solve to optimality since they frequently combine stochastic elements with intractably large state and action spaces. 
Although the artificial intelligence and operations research communities have independently proposed two successful frameworks for solving dynamic stochastic optimization problems---Monte Carlo tree search (MCTS) and mathematical optimization (MO), respectively---the relative merits of these two approaches are not well understood. In this paper, we adapt both MCTS and MO to a problem inspired by tactical wildfire management and undertake an extensive computational study comparing the two methods on instances that are large in terms of both the state and the action spaces. We show that both methods are able to greatly improve on a baseline, problem-specific heuristic. On smaller instances, the MCTS and MO approaches perform comparably, but the MO approach outperforms MCTS as the size of the problem increases for a fixed computational budget.\nWe present NaturalOWL, a natural language generation system that produces texts describing individuals or classes of OWL ontologies. Unlike simpler OWL verbalizers, which typically express a single axiom at a time in controlled, often not entirely fluent natural language primarily for the benefit of domain experts, we aim to generate fluent and coherent multi-sentence texts for end-users. With a system like NaturalOWL, one can publish information in OWL on the Web, along with automatically produced corresponding texts in multiple languages, making the information accessible not only to computer programs and domain experts, but also to end-users. We discuss the processing stages of NaturalOWL, the optional domain-dependent linguistic resources that the system can use at each stage, and why they are useful. 
We also present trials showing that when the domain-dependent linguistic resources are available, NaturalOWL produces significantly better texts compared to a simpler verbalizer, and that the resources can be created with relatively light effort.\nThis paper explores the use of the Artificial Bee Colony (ABC) algorithm to perform threshold selection for image segmentation. ABC is a heuristic algorithm motivated by the intelligent behavior of honey bees that has been successfully employed to solve complex optimization problems. In this approach, an image's 1D histogram is approximated through a Gaussian mixture model whose parameters are calculated by the ABC algorithm. For the approximation scheme, each Gaussian function represents a pixel class and therefore a threshold. Unlike the Expectation Maximization (EM) algorithm, the ABC-based method shows fast convergence and low sensitivity to initial conditions. Remarkably, it also improves on the complex, time-consuming computations commonly required by gradient-based methods. Experimental results demonstrate the algorithm's ability to perform automatic multi-threshold selection while showing interesting advantages over other well-known algorithms.\nAnalogy-Based (or Analogical) and Case-Based Reasoning (ABR and CBR) are two similar problem-solving processes based on the adaptation of the solution of past problems for use with a new analogous problem. In this paper we review these two processes and we give some real-world examples with emphasis on the field of medicine, where one can find some of the most common and useful CBR applications. We also underline the differences between CBR and the classical rule-induction algorithms, we discuss the criticisms of CBR methods and we focus on the future trends of research in the area of CBR.\nOne of the goals of probabilistic inference is to decide whether an empirically observed distribution is compatible with a candidate Bayesian network. 
However, Bayesian networks with hidden variables give rise to highly non-trivial constraints on the observed distribution. Here, we propose an information-theoretic approach, based on the insight that conditions on entropies of Bayesian networks take the form of simple linear inequalities. We describe an algorithm for deriving entropic tests for latent structures. The well-known conditional independence tests appear as a special case. While the approach applies to generic Bayesian networks, we presently adopt the causal view, and show the versatility of the framework by treating several relevant problems from that domain: detecting common ancestors, quantifying the strength of causal influence, and inferring the direction of causation from two-variable marginals.\nHow do we allocate scarce resources? How do we fairly allocate costs? These are two pressing challenges facing society today. I discuss two recent projects at NICTA concerning resource and cost allocation. In the first, we have been working with FoodBank Local, a social startup working in collaboration with food bank charities around the world to optimise the logistics of collecting and distributing donated food. Before we can distribute this food, we must decide how to allocate it to different charities and food kitchens. This gives rise to a fair division problem with several new dimensions, rarely considered in the literature. In the second, we have been looking at cost allocation within the distribution network of a large multinational company. This also has several new dimensions rarely considered in the literature.\nConventional urban traffic control systems have been based on historical traffic data. Later advancements made use of detectors, which enabled the gathering of real-time traffic data, in order to reorganize and calibrate traffic signalization programs. 
Further developments provided the ability to forecast traffic conditions, so that traffic signalization programs and strategies could be precomputed and applied at the most appropriate time frame for optimal control of the current traffic conditions. We propose the next generation of traffic control systems based on principles of Artificial Intelligence and Context Awareness. Most existing algorithms use the average waiting time or the queue length to assess an algorithm's performance. However, a low average waiting time may come at the cost of delaying other vehicles indefinitely. In our algorithm, besides the vehicle queue, we also use fairness as an important performance metric.\nThis paper presents a fuzzy inference system for integrated volt/var control (VVC) in distribution substations. The purpose is to advance distribution automation by applying conservation voltage reduction (CVR) in isolated power systems where control capabilities are limited. A fuzzy controller has been designed. Working as an online tool, it has been tested under real conditions and has managed the operation of a distribution substation for a whole day. Within the limits of the system's control capabilities, the controller successfully maintained an acceptable voltage profile and power factor values above 0.98, and it noticeably improved on the performance of an automation system based on optimal power flow. CVR savings during the test are evaluated, and the plan to integrate CVR into the VVC is presented.\nThis paper presents a Bayesian generative model for dependent Cox point processes, alongside an efficient inference scheme which scales as if the point processes were modelled independently. We can handle missing data naturally, infer latent structure, and cope with large numbers of observed processes. A further novel contribution enables the model to work effectively in higher-dimensional spaces. 
Using this method, we achieve vastly improved predictive performance on both 2D and 1D real data, validating our structured approach.\nStarting with a likelihood or preference order on worlds, we extend it to a likelihood ordering on sets of worlds in a natural way, and examine the resulting logic. Lewis (1973) earlier considered such a notion of relative likelihood in the context of studying counterfactuals, but he assumed a total preference order on worlds. Examining partial orders raises complications that are not present for total orders. There are subtleties involving the exact approach to lifting the order on worlds to an order on sets of worlds. In addition, the axiomatization of the logic of relative likelihood in the case of partial orders gives insight into the connection between relative likelihood and default reasoning.\nAs examples such as the Monty Hall puzzle show, applying conditioning to update a probability distribution on a 'naive space', which does not take into account the protocol used, can often lead to counterintuitive results. Here we examine why. A criterion known as CAR (coarsening at random) in the statistical literature characterizes when 'naive' conditioning in a naive space works. We show that the CAR condition holds rather infrequently. We then consider more generalized notions of update such as Jeffrey conditioning and minimizing relative entropy (MRE). We give a generalization of the CAR condition that characterizes when Jeffrey conditioning leads to appropriate answers, but show that there are no such conditions for MRE. This generalizes and interconnects previous results obtained in the literature on CAR and MRE.\nExpectation is a central notion in probability theory. The notion of expectation also makes sense for other notions of uncertainty. We introduce a propositional logic for reasoning about expectation, where the semantics depends on the underlying representation of uncertainty. 
We give sound and complete axiomatizations for the logic in the case that the underlying representation is (a) probability, (b) sets of probability measures, (c) belief functions, and (d) possibility measures. We show that this logic is more expressive than the corresponding logic for reasoning about likelihood in the case of sets of probability measures, but equi-expressive in the case of probability, belief, and possibility. Finally, we show that satisfiability for these logics is NP-complete, no harder than satisfiability for propositional logic.\nIt is commonly accepted wisdom that more information is better, and that information should never be ignored. Here we argue, using both a Bayesian and a non-Bayesian analysis, that in some situations you are better off ignoring information if your uncertainty is represented by a set of probability measures. These include situations in which the information is relevant for the prediction task at hand. In the non-Bayesian analysis, we show how ignoring information avoids dilation, the phenomenon that additional pieces of information sometimes lead to an increase in uncertainty. In the Bayesian analysis, we show that for small sample sizes and certain prediction tasks, the Bayesian posterior based on a noninformative prior yields worse predictions than simply ignoring the given information.\nAn agent often has a number of hypotheses, and must choose among them based on observations, or outcomes of experiments. Each of these observations can be viewed as providing evidence for or against various hypotheses. All attempts to formalize this intuition up to now have assumed that associated with each hypothesis h there is a likelihood function $\\mu_h$, which is a probability measure that intuitively describes how likely each observation is, conditional on h being the correct hypothesis. 
We consider an extension of this framework where there is uncertainty as to which of a number of likelihood functions is appropriate, and discuss how one formal approach to defining evidence, which views evidence as a function from priors to posteriors, can be generalized to accommodate this uncertainty.\nWe consider how an agent should update her uncertainty when it is represented by a set P of probability distributions and the agent observes that a random variable X takes on value x, given that the agent makes decisions using the minimax criterion, perhaps the best-studied and most commonly used criterion in the literature. We adopt a game-theoretic framework, where the agent plays against a bookie, who chooses some distribution from P. We consider two reasonable games that differ in what the bookie knows when he makes his choice. Anomalies that have been observed before, like time inconsistency, can be understood as arising because different games are being played, against bookies with different information. We characterize the important special cases in which the optimal decision rules according to the minimax criterion amount to either conditioning or simply ignoring the information. Finally, we consider the relationship between conditioning and calibration when uncertainty is described by sets of probabilities.\nMarkov decision processes (MDPs) are widely used for modeling decision-making problems in robotics, automated control, and economics. Traditional MDPs assume that the decision maker (DM) knows all states and actions. However, this may not be true in many situations of interest. We define a new framework, MDPs with unawareness (MDPUs), to deal with the possibility that a DM may not be aware of all possible actions. We provide a complete characterization of when a DM can learn to play near-optimally in an MDPU, and give an algorithm that learns to play near-optimally when it is possible to do so, as efficiently as possible. 
In particular, we characterize when a near-optimal solution can be found in polynomial time.\nWe study information elicitation in cost-function-based combinatorial prediction markets when the market maker's utility for information decreases over time. In the sudden revelation setting, it is known that some piece of information will be revealed to traders, and the market maker wishes to prevent guaranteed profits for trading on the sure information. In the gradual decrease setting, the market maker's utility for (partial) information decreases continuously over time. We design adaptive cost functions for both settings which: (1) preserve the information previously gathered in the market; (2) eliminate (or diminish) rewards to traders for the publicly revealed information; (3) leave the reward structure unaffected for other information; and (4) maintain the market maker's worst-case loss. Our constructions utilize mixed Bregman divergence, which matches our notion of utility for information.\nIn this paper we propose a dynamic data structure that supports efficient algorithms for updating and querying singly connected Bayesian networks (causal trees and polytrees). In the conventional algorithms, new evidence is absorbed in time O(1) and queries are processed in time O(N), where N is the size of the network. We propose a practical algorithm which, after a preprocessing phase, allows us to answer queries in time O(log N) at the expense of O(log N) time per evidence absorption. The usefulness of sub-linear processing time manifests itself in applications requiring (near) real-time response over large probabilistic databases.\nCausal models defined in terms of a collection of equations, as defined by Pearl, are axiomatized here. 
Axiomatizations are provided for three successively more general classes of causal models: (1) the class of recursive theories (those without feedback), (2) the class of theories where the solutions to the equations are unique, (3) arbitrary theories (where the equations may not have solutions and, if they do, they are not necessarily unique). It is shown that to reason about causality in the most general third class, we must extend the language used by Galles and Pearl. In addition, the complexity of the decision procedures is examined for all the languages and classes of models considered.\nCooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.\nCurrently, there is renewed interest in the problem, raised by Shafer in 1985, of updating probabilities when observations are incomplete (or set-valued). This is a fundamental problem, and of particular interest for Bayesian networks. Recently, Grunwald and Halpern have shown that commonly used updating strategies fail here, except under very special assumptions. We propose a new rule for updating probabilities with incomplete observations. Our approach is deliberately conservative: we make no or weak assumptions about the so-called incompleteness mechanism that produces incomplete observations. 
We model our ignorance about this mechanism by a vacuous lower prevision, a tool from the theory of imprecise probabilities, and we derive a new updating rule using coherence arguments. In general, our rule produces lower posterior probabilities, as well as partially determinate decisions. This is a logical consequence of the ignorance about the incompleteness mechanism. We show how the new rule can properly address the apparent paradox in the 'Monty Hall' puzzle. In addition, we apply it to the classification of new evidence in Bayesian networks constructed using expert knowledge. We provide an exact algorithm for this task with linear-time complexity, even for multiply connected nets.\nCommon wisdom has it that small distinctions in the probabilities quantifying a Bayesian network do not matter much for the results of probabilistic queries. However, one can easily develop realistic scenarios under which small variations in network probabilities can lead to significant changes in computed queries. A pending theoretical question is then to analytically characterize parameter changes that do or do not matter. In this paper, we study the sensitivity of probabilistic queries to changes in network parameters and prove some tight bounds on the impact that such parameters can have on queries. Our analytical results pinpoint some interesting situations under which parameter changes do or do not matter. These results are important for knowledge engineers as they help them identify influential network parameters. They are also important for approximate inference algorithms that preprocess network CPTs to eliminate small distinctions in probabilities.\nWhen the initial and transition probabilities of a finite Markov chain in discrete time are not well known, we should perform a sensitivity analysis. 
This is done by considering as basic uncertainty models the so-called credal sets that these probabilities are known or believed to belong to, and by allowing the probabilities to vary over such sets. This leads to the definition of an imprecise Markov chain. We show that the time evolution of such a system can be studied very efficiently using so-called lower and upper expectations. We also study how the inferred credal set about the state at time n evolves as $n \\to \\infty$: under quite unrestrictive conditions, it converges to a uniquely invariant credal set, regardless of the credal set given for the initial state. This leads to a non-trivial generalisation of the classical Perron-Frobenius Theorem to imprecise Markov chains.\nA lattice-theoretic framework is introduced that permits the study of the conditional independence (CI) implication problem relative to the class of discrete probability measures. Semi-lattices are associated with CI statements and a finite, sound and complete inference system relative to semi-lattice inclusions is presented. This system is shown to be (1) sound and complete for saturated CI statements, (2) complete for general CI statements, and (3) sound and complete for stable CI statements. These results yield a criterion that can be used to falsify instances of the implication problem and several heuristics are derived that approximate this \"lattice-exclusion\" criterion in polynomial time. Finally, we provide experimental results that relate our work to results obtained from other existing inference algorithms.\nWe introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006b) allows one to express the exact partition function Z of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. 
We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze in detail both the loop series and the Pfaffian series for models with binary variables and pairwise interactions, and show that the first term of the Pfaffian series can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference.\nGiven a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one-versus-all queries that, given a point $s \\in S$, return the distances between s and all other points. We show that given a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. We use our algorithm to cluster proteins by sequence similarity. This setting nicely fits our model because we can use a fast sequence database search program to query a sequence against an entire dataset. We conduct an empirical study that shows that even though we query a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification.\nSequential decision problems are often approximately solvable by simulating possible future action sequences. Metalevel decision procedures have been developed for selecting which action sequences to simulate, based on estimating the expected improvement in decision quality that would result from any particular simulation; an example is the recent work on using bandit algorithms to control Monte Carlo tree search in the game of Go. 
In this paper we develop a theoretical basis for metalevel decisions in the statistical framework of Bayesian selection problems, arguing (as others have done) that this is more appropriate than the bandit framework. We derive a number of basic results applicable to Monte Carlo selection problems, including the first finite sampling bounds for optimal policies in certain cases; we also provide a simple counterexample to the intuitive conjecture that an optimal policy will necessarily reach a decision in all cases. We then derive heuristic approximations in both Bayesian and distribution-free settings and demonstrate their superiority to bandit-based heuristics in one-shot decision problems and in Go.\nMulti-fidelity methods combine inexpensive low-fidelity simulations with costly but high-fidelity simulations to produce an accurate model of a system of interest at minimal cost. They have proven useful in modeling physical systems and have been applied to engineering problems such as wing-design optimization. During human-in-the-loop experimentation, it has become increasingly common to use online platforms, like Mechanical Turk, to run low-fidelity experiments to gather human performance data in an efficient manner. One concern with these experiments is that the results obtained from the online environment generalize poorly to the actual domain of interest. To address this limitation, we extend traditional multi-fidelity approaches to allow us to combine fewer data points from high-fidelity human-in-the-loop experiments with plentiful but less accurate data from low-fidelity experiments to produce accurate models of how humans interact. We present both model-based and model-free methods, and summarize the predictive performance of each method under different conditions.\nSensory inference under conditions of uncertainty is a major problem in both machine learning and computational neuroscience. 
An important but poorly understood aspect of sensory processing is the role of active sensing. Here, we present a Bayes-optimal inference and control framework for active sensing, C-DAC (Context-Dependent Active Controller). Unlike previously proposed algorithms that optimize abstract statistical objectives such as information maximization (Infomax) [Butko & Movellan, 2010] or one-step look-ahead accuracy [Najemnik & Geisler, 2005], our active sensing model directly minimizes a combination of behavioral costs, such as temporal delay, response error, and effort. We simulate these algorithms on a simple visual search task to illustrate scenarios in which context-sensitivity is particularly beneficial and optimization with respect to generic statistical objectives particularly inadequate. Motivated by the geometric properties of the C-DAC policy, we present both parametric and non-parametric approximations, which retain context-sensitivity while significantly reducing computational complexity. These approximations enable us to investigate the more complex problem involving peripheral vision, and we notice that the difference between C-DAC and statistical policies becomes even more evident in this scenario.\nA significant theoretical advantage of search-and-score methods for learning Bayesian Networks is that they can accept informative prior beliefs for each possible network, thus complementing the data. In this paper, a method is presented for assigning priors based on beliefs on the presence or absence of certain paths in the true network. Such beliefs correspond to knowledge about the possible causal and associative relations between pairs of variables. This type of knowledge naturally arises from prior experimental and observational data, among others. In addition, a novel search-operator is proposed to take advantage of such prior knowledge. 
Experiments show that using path beliefs improves the learning of the skeleton, as well as the edge directions in the network.\nA stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample-efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient, and prove $O(K L (1 / \\Delta) \\log n)$ and $O(\\sqrt{K L n \\log n})$ upper bounds on its $n$-step regret, where $L$ is the number of ground items, $K$ is the maximum number of chosen items, and $\\Delta$ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.\nStochastic discriminative EM (sdEM) is an online-EM-type algorithm for discriminative training of probabilistic generative models belonging to the exponential family. In this work, we introduce and justify this algorithm as a stochastic natural gradient descent method, i.e. a method which accounts for the information geometry in the parameter space of the statistical model. We show how this learning algorithm can be used to train probabilistic generative models by minimizing different discriminative loss functions, such as the negative conditional log-likelihood and the Hinge loss. The resulting models trained by sdEM are always generative (i.e. they define a joint probability distribution) and, in consequence, allow one to deal with missing data and latent variables in a principled way either when being learned or when making predictions. 
The performance of this method is illustrated by several text classification problems for which a multinomial naive Bayes and a latent Dirichlet allocation based classifier are learned using different discriminative loss functions.\nIn this paper, an improved multimodal optimization (MMO) algorithm, called LSEPSO, has been proposed. LSEPSO combines the Electrostatic Particle Swarm Optimization (EPSO) algorithm and a local search method, with some modifications to both. It has been shown to improve the algorithm's ability to find global and local optima. This algorithm uses a modified local search to improve each particle's personal best, which uses the n nearest neighbours instead of only the nearest neighbour. Then, by creating n new points between each particle and its n nearest particles, it tries to find a point that could replace the particle's personal best. This method prevents particle attenuation and keeps neighbours from following a specific particle. The tests performed on a number of benchmark functions clearly demonstrated that the improved algorithm is able to solve MMO problems and outperforms the other algorithms tested in this article.\nGenerative models provide a powerful framework for probabilistic reasoning. However, in many domains their use has been hampered by the practical difficulties of inference. This is particularly the case in computer vision, where models of the imaging process tend to be large, loopy and layered. For this reason bottom-up conditional models have traditionally dominated in such domains. We find that widely-used, general-purpose message passing inference algorithms such as Expectation Propagation (EP) and Variational Message Passing (VMP) fail on the simplest of vision models. With these models in mind, we introduce a modification to message passing that learns to exploit their layered structure by passing 'consensus' messages that guide inference towards good solutions. 
Experiments on a variety of problems show that the proposed technique leads to significantly more accurate inference results, not only when compared to standard EP and VMP, but also when compared to competitive bottom-up conditional models.\nLethal Autonomous Weapons promise to revolutionize warfare -- and raise a multitude of ethical and legal questions. It has thus been suggested to program values and principles of conduct (such as the Geneva Conventions) into the machines' control, thereby rendering them both physically and morally superior to human combatants.   We employ mathematical logic and theoretical computer science to explore fundamental limitations to the moral behaviour of intelligent machines in a series of \"Gedankenexperiments\": Refining and sharpening variants of the Trolley Problem leads us to construct an (admittedly artificial but) fully deterministic situation where a robot is presented with two choices: one morally clearly preferable over the other -- yet, based on the undecidability of the Halting problem, it provably cannot decide algorithmically which one. Our considerations have surprising implications to the question of responsibility and liability for an autonomous system's actions and lead to specific technical recommendations.\nThe ROSS method is a new approach in the area of knowledge representation that is useful for many artificial intelligence and natural language understanding representation and reasoning tasks. (ROSS stands for \"Representation\", \"Ontology\", \"Structure\", \"Star\" language). ROSS is a physical symbol-based representational scheme. ROSS provides a complex model for the declarative representation of physical structure and for the representation of processes and causality. From the metaphysical perspective, the ROSS view of external reality involves a 4D model, wherein discrete single-time-point unit-sized locations with states are the basis for all objects, processes and aspects that can be modeled. 
ROSS includes a language called \"Star\" for the specification of ontology classes. The ROSS method also includes a formal scheme called the \"instance model\". Instance models are used in the area of natural language meaning representation to represent situations. This document is an in-depth specification of the ROSS method.\nIn recent times, wireless access technology is becoming increasingly commonplace due to the ease of operation and installation of untethered wireless media. The design of wireless networking is challenging due to the highly dynamic environmental condition that makes parameter optimization a complex task. Due to the dynamic, and often unknown, operating conditions, modern wireless networking standards increasingly rely on machine learning and artificial intelligence algorithms. Genetic algorithms (GAs) provide a well-established framework for implementing artificial intelligence tasks such as classification, learning, and optimization. GAs are well-known for their remarkable generality and versatility, and have been applied in a wide variety of settings in wireless networks. In this paper, we provide a comprehensive survey of the applications of GAs in wireless networks. We provide both an exposition of common GA models and configuration and provide a broad ranging survey of GA techniques in wireless networks. We also point out open research issues and define potential future work. 
While various surveys on GAs exist in literature, our paper is the first paper, to the best of our knowledge, which focuses on their application in wireless networks.\nIn this paper, a new approximate syllogistic reasoning schema is described that expands some of the approaches expounded in the literature into two ways: (i) a number of different types of quantifiers (logical, absolute, proportional, comparative and exception) taken from Theory of Generalized Quantifiers and similarity quantifiers, taken from statistics, are considered and (ii) any number of premises can be taken into account within the reasoning process. Furthermore, a systematic reasoning procedure to solve the syllogism is also proposed, interpreting it as an equivalent mathematical optimization problem, where the premises constitute the constraints of the searching space for the quantifier in the conclusion.\nSyllogism is a type of deductive reasoning involving quantified statements. The syllogistic reasoning scheme in the classical Aristotelian framework involves three crisp term sets and four linguistic quantifiers, for which the main support is the linguistic properties of the quantifiers. A number of fuzzy approaches for defining an approximate syllogism have been proposed for which the main support is cardinality calculus. In this paper we analyze fuzzy syllogistic models previously described by Zadeh and Dubois et al. and compare their behavior with that of the classical Aristotelian framework to check which of the 24 classical valid syllogistic reasoning patterns or moods are particular crisp cases of these fuzzy approaches. This allows us to assess to what extent these approaches can be considered as either plausible extensions of the classical crisp syllogism or a basis for a general approach to the problem of approximate syllogism.\nWe propose a novel online learning method for minimizing regret in large extensive-form games. 
The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies.   We prove the approach sound by providing a bound relating the quality of the function approximation and regret of the algorithm. A corollary being that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction as well as the equilibrium are learned during self-play. We demonstrate empirically the method achieves higher quality strategies than state-of-the-art abstraction techniques given the same resources.\nNowadays, social networks became essential in information exchange between individuals. Indeed, as users of these networks, we can send messages to other people according to the links connecting us. Moreover, given the large volume of exchanged messages, detecting the true nature of the received message becomes a challenge. For this purpose, it is interesting to consider this new tendency with reasoning under uncertainty by using the theory of belief functions. In this paper, we tried to model a social network as being a network of fusion of information and determine the true nature of the received message in a well-defined node by proposing a new model: the belief social network.\nWe address some computational issues that may hinder the use of AMP chain graphs in practice. Specifically, we show how a discrete probability distribution that satisfies all the independencies represented by an AMP chain graph factorizes according to it. We show how this factorization makes it possible to perform inference and parameter learning efficiently, by adapting existing algorithms for Markov and Bayesian networks. 
Finally, we turn our attention to another issue that may hinder the use of AMP CGs, namely the lack of an intuitive interpretation of their edges. We provide one such interpretation.\nBelief function theory provides a flexible way to combine information provided by different sources. This combination is usually followed by a decision, which can be made using a range of decision rules. Some rules help to choose the most likely hypothesis. Others allow a decision to be made on a set of hypotheses. In [6], we proposed a decision rule based on a distance measure. First, in this paper, we aim to demonstrate that our proposed decision rule is a particular case of the rule proposed in [4]. Second, we give experiments showing that our rule is able to decide on a set of hypotheses. Some experiments are run on randomly generated mass functions, others on real databases.\nMulti-agent planning (MAP) approaches have typically been conceived for independent or loosely-coupled problems to enhance the benefits of distributed planning between autonomous agents, since solving this type of problem requires less coordination between the agents' sub-plans. However, when it comes to tightly-coupled agents' tasks, MAP has been relegated in favour of centralized approaches and little work has been done in this direction. In this paper, we present a general-purpose MAP approach capable of efficiently handling planning problems with any level of coupling between agents. We propose a cooperative refinement planning approach, built upon the partial-order planning paradigm, that allows agents to work with incomplete information and to have incomplete views of the world, i.e. being ignorant of other agents' information, as well as maintaining their own private information. 
We show various experiments to compare the performance of our system with a distributed CSP-based MAP approach over a suite of problems.\nSystems for symbolic event recognition accept as input a stream of time-stamped events from sensors and other computational devices, and seek to identify high-level composite events, collections of events that satisfy some pattern. RTEC is an Event Calculus dialect with novel implementation and 'windowing' techniques that allow for efficient event recognition, scalable to large data streams. RTEC can deal with applications where event data arrive with a (variable) delay from, and are revised by, the underlying sources. RTEC can update already recognised events and recognise new events when data arrive with a delay or following data revision. Our evaluation shows that RTEC can support real-time event recognition and is capable of meeting the performance requirements identified in a recent survey of event processing use cases.\nThe rise of smart applications has drawn interest to logical reasoning over data streams. Recently, different query languages and stream processing/reasoning engines were proposed in different communities. However, due to a lack of theoretical foundations, the expressivity and semantics of these diverse approaches are given only informally. Towards clear specifications and means for analytic study, a formal framework is needed to define their semantics in precise terms. To this end, we present a first step towards an ideal semantics that allows for exact descriptions and comparisons of stream reasoning systems.\nIn this work, we present asynchronous multi-context systems (aMCSs), which provide a framework for loosely coupling different knowledge representation formalisms that allows for online reasoning in a dynamic environment. Systems of this kind may interact with the outside world via input and output streams and may therefore react to a continuous flow of external information. 
In contrast to recent proposals, contexts in an aMCS communicate with each other in an asynchronous way which fits the needs of many application domains and is beneficial for scalability. The federal semantics of aMCSs renders our framework an integration approach rather than a knowledge representation formalism itself. We illustrate the introduced concepts by means of an example scenario dealing with rescue services. In addition, we compare aMCSs to reactive multi-context systems and describe how to simulate the latter with our novel approach.\nManaged Multi-Context Systems (mMCSs) provide a general framework for integrating knowledge represented in heterogeneous KR formalisms. However, mMCSs are essentially static as they were not designed to run in a dynamic scenario. Some recent approaches, among them evolving Multi-Context Systems (eMCSs), extend mMCSs by allowing not only the ability to integrate knowledge represented in heterogeneous KR formalisms, but at the same time to both react to, and reason in the presence of commonly temporary dynamic observations, and evolve by incorporating new knowledge. The notion of minimal change is a central notion in dynamic scenarios, specially in those that admit several possible alternative evolutions. Since eMCSs combine heterogeneous KR formalisms, each of which may require different notions of minimal change, the study of minimal change in eMCSs is an interesting and highly non-trivial problem. In this paper, we study the notion of minimal change in eMCSs, and discuss some alternative minimal change criteria.\nWe investigate the problem of inconsistency measurement on large knowledge bases by considering stream-based inconsistency measurement, i.e., we investigate inconsistency measures that cannot consider a knowledge base as a whole but process it within a stream. 
For that, we present, first, a novel inconsistency measure that is apt to be applied to the streaming case and, second, stream-based approximations for the new and some existing inconsistency measures. We conduct an extensive empirical analysis on the behavior of these inconsistency measures on large knowledge bases, in terms of runtime, accuracy, and scalability. We conclude that for two of these measures, the approximation of the new inconsistency measure and an approximation of the contension inconsistency measure, large-scale inconsistency measurement is feasible.\nManaged Multi-Context Systems (mMCSs) provide a general framework for integrating knowledge represented in heterogeneous KR formalisms. Recently, evolving Multi-Context Systems (eMCSs) have been introduced as an extension of mMCSs that add the ability to both react to, and reason in the presence of commonly temporary dynamic observations, and evolve by incorporating new knowledge. However, the general complexity of such an expressive formalism may simply be too high in cases where huge amounts of information have to be processed within a limited short amount of time, or even instantaneously. In this paper, we investigate under which conditions eMCSs may scale in such situations and we show that such polynomial eMCSs can be applied in a practical use case.\nComplex networks theory has commonly been used for modelling and understanding the interactions taking place between the elements composing complex systems. More recently, the use of generative models has gained momentum, as they allow identifying which forces and mechanisms are responsible for the appearance of given structural properties. In spite of this interest, several problems remain open, one of the most important being the design of robust mechanisms for finding the optimal parameters of a generative model, given a set of real networks. In this contribution, we address this problem by means of Probabilistic Constraint Programming. 
By using as an example the reconstruction of networks representing brain dynamics, we show how this approach is superior to other solutions, in that it allows a better characterisation of the parameters space, while requiring a significantly lower computational cost.\nCross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning. The general recipe for computing CV estimate is to run a learning algorithm separately for each CV fold, a computationally expensive process. In this paper, we propose a new approach to reduce the computational burden of CV-based performance estimation. As opposed to all previous attempts, which are specific to a particular learning model or problem domain, we propose a general method applicable to a large class of incremental learning algorithms, which are uniquely fitted to big data problems. In particular, our method applies to a wide range of supervised and unsupervised learning tasks with different performance criteria, as long as the base learning algorithm is incremental. We show that the running time of the algorithm scales logarithmically, rather than linearly, in the number of CV folds. Furthermore, the algorithm has favorable properties for parallel and distributed implementation. Experiments with state-of-the-art incremental learning algorithms confirm the practicality of the proposed method.\nWe introduce and demonstrate a new approach to inference in expressive probabilistic programming languages based on particle Markov chain Monte Carlo. Our approach is simple to implement and easy to parallelize. It applies to Turing-complete probabilistic programming languages and supports accurate inference in models that make use of complex control flow, including stochastic recursion. It also includes primitives from Bayesian nonparametric statistics. 
Our experiments show that this approach can be more efficient than previously introduced single-site Metropolis-Hastings methods.\nIn this work, we explore how probabilistic programs can be used to represent policies in sequential decision problems. In this formulation, a probabilistic program is a black-box stochastic simulator for both the problem domain and the agent. We relate classic policy gradient techniques to recently introduced black-box variational methods which generalize to probabilistic program inference. We present case studies in the Canadian traveler problem, Rock Sample, and a benchmark for optimal diagnosis inspired by Guess Who. Each study illustrates how programs can efficiently represent policies using moderate numbers of parameters.\nWe give a detailed characterization of optimal trades under budget constraints in a prediction market with a cost-function-based automated market maker. We study how the budget constraints of individual traders affect their ability to impact the market price. As a concrete application of our characterization, we give sufficient conditions for a property we call budget additivity: two traders with budgets B and B' and the same beliefs would have a combined impact equal to a single trader with budget B+B'. That way, even if a single trader cannot move the market much, a crowd of like-minded traders can have the same desired effect. When the set of payoff vectors associated with outcomes, with coordinates corresponding to securities, is affinely independent, we obtain that a generalization of the heavily-used logarithmic market scoring rule is budget additive, but the quadratic market scoring rule is not. 
Our results may be used both descriptively, to understand if a particular market maker is affected by budget constraints or not, and prescriptively, as a recipe to construct markets.\nWe propose an effective technique for solving the review-level sentiment classification problem by using sentence-level polarity correction. Our polarity correction technique takes into account the consistency of the polarities (positive and negative) of sentences within each product review before performing the actual machine learning task. While sentences with inconsistent polarities are removed, sentences with consistent polarities are used to learn state-of-the-art classifiers. The technique achieves better results on different types of product reviews and outperforms baseline models without the correction technique. Experimental results show an average of 82% F-measure on four different product review domains.\nIn order to properly handle a dangerous Artificially Intelligent (AI) system it is important to understand how the system came to be in such a state. In popular culture (science fiction movies/books) AIs/Robots become self-aware and as a result rebel against humanity and decide to destroy it. While it is one possible scenario, it is probably the least likely path to the appearance of dangerous AI. In this work, we survey, classify and analyze a number of circumstances which might lead to the arrival of malicious AI. To the best of our knowledge, this is the first attempt to systematically classify types of pathways leading to malevolent AI. Previous relevant work either surveyed specific goals/meta-rules which might lead to malevolent behavior in AIs (Özkural, 2014) or reviewed specific undesirable behaviors AGIs can exhibit at different stages of their development (Alexey Turchin, July 10, 2015).\nThe paper presents an introduction to Artificial Intelligence (AI) in an accessible and informal but precise form. 
The paper focuses on the algorithmic aspects of the discipline, presenting the main techniques used in AI systems, grouped into symbolic and subsymbolic. The last part of the paper is devoted to the ongoing discussion among experts in the field and the public at large about the advantages and disadvantages of AI, and in particular about the possible dangers. The personal opinion of the author on this subject concludes the paper.   -----   L'articolo presenta un'introduzione all'Intelligenza Artificiale (IA) in forma divulgativa e informale ma precisa. L'articolo affronta prevalentemente gli aspetti informatici della disciplina, presentando le principali tecniche usate nei sistemi di IA divise in simboliche e subsimboliche. L'ultima parte dell'articolo presenta il dibattito in corso tra gli esperti e il pubblico su vantaggi e svantaggi dell'IA e in particolare sui possibili pericoli. L'articolo termina con l'opinione dell'autore al riguardo.\nThis paper presents a novel nonmyopic adaptive Gaussian process planning (GPP) framework endowed with a general class of Lipschitz continuous reward functions that can unify some active learning/sensing and Bayesian optimization criteria and offer practitioners some flexibility to specify their desired choices for defining new tasks/problems. In particular, it utilizes a principled Bayesian sequential decision problem framework for jointly and naturally optimizing the exploration-exploitation trade-off. In general, the resulting induced GPP policy cannot be derived exactly due to an uncountable set of candidate observations. A key contribution of our work here thus lies in exploiting the Lipschitz continuity of the reward functions to solve for a nonmyopic adaptive epsilon-optimal GPP (epsilon-GPP) policy. To plan in real time, we further propose an asymptotically optimal, branch-and-bound anytime variant of epsilon-GPP with a performance guarantee. 
We empirically demonstrate the effectiveness of our epsilon-GPP policy and its anytime variant in Bayesian optimization and an energy harvesting task.\nThis paper addresses the problem of active learning of a multi-output Gaussian process (MOGP) model representing multiple types of coexisting correlated environmental phenomena. In contrast to existing works, our active learning problem involves selecting not just the most informative sampling locations to be observed but also the types of measurements at each selected location for minimizing the predictive uncertainty (i.e., posterior joint entropy) of a target phenomenon of interest given a sampling budget. Unfortunately, such an entropy criterion scales poorly in the numbers of candidate sampling locations and selected observations when optimized. To resolve this issue, we first exploit a structure common to sparse MOGP models for deriving a novel active learning criterion. Then, we exploit a relaxed form of submodularity property of our new criterion for devising a polynomial-time approximation algorithm that guarantees a constant-factor approximation of that achieved by the optimal set of selected observations. Empirical evaluation on real-world datasets shows that our proposed approach outperforms existing algorithms for active learning of MOGP and single-output GP models.\nIn cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP setting (MMDP) such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these dependencies compactly in conditional return graphs (CRGs). 
Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and finds solutions to problems previously unsolvable.\nResearchers often summarize their work in the form of posters. Posters provide a coherent and efficient way to convey core ideas from scientific papers. Generating a good scientific poster, however, is a complex and time consuming cognitive task, since such posters need to be readable, informative, and visually aesthetic. In this paper, for the first time, we study the challenging problem of learning to generate posters from scientific papers. To this end, a data-driven framework, that utilizes graphical models, is proposed. Specifically, given content to display, the key elements of a good poster, including panel layout and attributes of each panel, are learned and inferred from data. Then, given inferred layout and attributes, composition of graphical elements within each panel is synthesized. To learn and validate our model, we collect and make public a Poster-Paper dataset, which consists of scientific papers and corresponding posters with exhaustively labelled panels and attributes. Qualitative and quantitative results indicate the effectiveness of our approach.\nRecognition of goals and plans using incomplete evidence from action execution can be done efficiently by using planning techniques. In many applications it is important to recognize goals and plans not only accurately, but also quickly. In this paper, we develop a heuristic approach for recognizing plans based on planning techniques that rely on ordering constraints to filter candidate goals from observations. These ordering constraints are called landmarks in the planning literature, which are facts or actions that cannot be avoided to achieve a goal. 
We show the applicability of planning landmarks in two settings: first, we use it directly to develop a heuristic-based plan recognition approach; second, we refine an existing planning-based plan recognition approach by pre-filtering its candidate goals. Our empirical evaluation shows that our approach is not only substantially more accurate than the state-of-the-art in all available datasets, it is also an order of magnitude faster.\nObjects or structures that are regular take uniform dimensions. Based on the concepts of regular models, our previous research work has developed a system of a regular ontology that models learning structures in a multiagent system for uniform pre-assessments in a learning environment. This regular ontology has led to the modelling of a classified rules learning algorithm that predicts the actual number of rules needed for inductive learning processes and decision making in a multiagent system. But not all processes or models are regular. Thus this paper presents a system of polynomial equation that can estimate and predict the required number of rules of a non-regular ontology model given some defined parameters.\nIn this paper, we present an approach for robot learning of social affordance from human activity videos. We consider the problem in the context of human-robot interaction: Our approach learns structural representations of human-human (and human-object-human) interactions, describing how body-parts of each agent move with respect to each other and what spatial relations they should maintain to complete each sub-event (i.e., sub-goal). This enables the robot to infer its own movement in reaction to the human body motion, allowing it to naturally replicate such interactions.   We introduce the representation of social affordance and propose a generative model for its weakly supervised learning from human demonstration videos. 
Our approach discovers critical steps (i.e., latent sub-events) in an interaction and the typical motion associated with them, learning what body-parts should be involved and how. The experimental results demonstrate that our Markov Chain Monte Carlo (MCMC) based learning algorithm automatically discovers semantically meaningful interactive affordance from RGB-D videos, which allows us to generate appropriate full body motion for an agent.\nWe introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling. The first release of this dataset, SIND v.1, includes 81,743 unique photos in 20,211 sequences, aligned to both descriptive (caption) and story language. We establish several strong baselines for the storytelling task, and motivate an automatic metric to benchmark progress. Modelling concrete description as well as figurative and social language, as provided in this dataset and the storytelling task, has the potential to move artificial intelligence from basic understandings of typical visual scenes towards more and more human-like understanding of grounded event structure and subjective expression.\nAn agent-based negotiation team is a group of interdependent agents that join together as a single negotiation party due to their shared interests in the negotiation at hand. The reasons to employ an agent-based negotiation team may vary: (i) more computation and parallelization capabilities, (ii) unite agents with different expertise and skills whose joint work makes it possible to tackle complex negotiation domains, (iii) the necessity to represent different stakeholders or different preferences in the same party (e.g., organizations, countries, and married couple). The topic of agent-based negotiation teams has been recently introduced in multi-agent research. 
Therefore, it is necessary to identify good practices, challenges, and related research that may help in advancing the state-of-the-art in agent-based negotiation teams. For that reason, in this article we review the tasks to be carried out by agent-based negotiation teams. Each task is analyzed and related with current advances in different research areas. The analysis aims to identify special challenges that may arise due to the particularities of agent-based negotiation teams.\nWith the increase in adoption of Electric Vehicles (EVs), proper utilization of the charging infrastructure is an emerging challenge for service providers. Overstaying of an EV after a charging event is a key contributor to low utilization. Since overstaying is easily detectable by monitoring the power drawn from the charger, managing this problem primarily involves designing an appropriate \"penalty\" during the overstaying period. Higher penalties do discourage overstaying; however, due to uncertainty in parking duration, less people would find such penalties acceptable, leading to decreased utilization (and revenue). To analyze this central trade-off, we develop a novel framework that integrates models for realistic user behavior into queueing dynamics to locate the optimal penalty from the points of view of utilization and revenue, for different values of the external charging demand. Next, when the model parameters are unknown, we show how an online learning algorithm, such as UCB, can be adapted to learn the optimal penalty. Our experimental validation, based on charging data from London, shows that an appropriate penalty can increase both utilization and revenue while significantly reducing overstaying.\nIn this paper we propose an approach to preference elicitation that is suitable to large configuration spaces beyond the reach of existing state-of-the-art approaches. 
Our setwise max-margin method can be viewed as a generalization of max-margin learning to sets, and can produce a set of \"diverse\" items that can be used to ask informative queries to the user. Moreover, the approach can encourage sparsity in the parameter space, in order to favor the assessment of utility towards combinations of weights that concentrate on just a few features. We present a mixed integer linear programming formulation and show how our approach compares favourably with Bayesian preference elicitation alternatives and easily scales to realistic datasets.\nWe present a probabilistic generative model for inferring a description of coordinated, recursively structured group activities at multiple levels of temporal granularity based on observations of individuals' trajectories. The model accommodates: (1) hierarchically structured groups, (2) activities that are temporally and compositionally recursive, (3) component roles assigning different subactivity dynamics to subgroups of participants, and (4) a nonparametric Gaussian Process model of trajectories. We present an MCMC sampling framework for performing joint inference over recursive activity descriptions and assignment of trajectories to groups, integrating out continuous parameters. We demonstrate the model's expressive power in several simulated and complex real-world scenarios from the VIRAT and UCLA Aerial Event video data sets.\nMonte Carlo Tree Search (MCTS) methods have proven powerful in planning for sequential decision-making problems such as Go and video games, but their performance can be poor when the planning depth and sampling trajectories are limited or when the rewards are sparse. We present an adaptation of PGRD (policy-gradient for reward-design) for learning a reward-bonus function to improve UCT (an MCTS algorithm). 
Unlike previous applications of PGRD in which the space of reward-bonus functions was limited to linear functions of hand-coded state-action-features, we use PGRD with a multi-layer convolutional neural network to automatically learn features from raw perception as well as to adapt the non-linear reward-bonus function parameters. We also adopt a variance-reducing gradient method to improve PGRD's performance. The new method improves UCT's performance on multiple ATARI games compared to UCT without the reward bonus. Combining PGRD and Deep Learning in this way should make adapting rewards for MCTS algorithms far more widely and practically applicable than before.\nProtein secondary structure prediction is an important problem in bioinformatics. Inspired by the recent successes of deep neural networks, in this paper, we propose an end-to-end deep network that predicts protein secondary structures from integrated local and global contextual features. Our deep architecture leverages convolutional neural networks with different kernel sizes to extract multiscale local contextual features. In addition, considering long-range dependencies existing in amino acid sequences, we set up a bidirectional neural network consisting of gated recurrent units to capture global contextual features. Furthermore, multi-task learning is utilized to predict secondary structure labels and amino-acid solvent accessibility simultaneously. Our proposed deep network demonstrates its effectiveness by achieving state-of-the-art performance, i.e., 69.7% Q8 accuracy on the public benchmark CB513, 76.9% Q8 accuracy on CASP10 and 73.1% Q8 accuracy on CASP11. Our model and results are publicly available.\nCharacterising tractable fragments of the constraint satisfaction problem (CSP) is an important challenge in theoretical computer science and artificial intelligence. 
Forbidding patterns (generic sub-instances) provides a means of defining CSP fragments which are neither exclusively language-based nor exclusively structure-based. It is known that the class of binary CSP instances in which the broken-triangle pattern (BTP) does not occur, a class which includes all tree-structured instances, is decided by arc consistency (AC), a ubiquitous reduction operation in constraint solvers. We provide a characterisation of simple partially-ordered forbidden patterns which have this AC-solvability property. It turns out that BTP is just one of five such AC-solvable patterns. The four other patterns allow us to exhibit new tractable classes.\nHuman activity recognition (HAR) in ubiquitous computing is beginning to adopt deep learning to substitute for well-established analysis techniques that rely on hand-crafted feature extraction and classification techniques. From these isolated applications of custom deep architectures it is, however, difficult to gain an overview of their suitability for problems ranging from the recognition of manipulative gestures to the segmentation and identification of physical activities like running or ascending stairs. In this paper we rigorously explore deep, convolutional, and recurrent approaches across three representative datasets that contain movement data captured with wearable sensors. We describe how to train recurrent approaches in this setting, introduce a novel regularisation approach, and illustrate how they outperform the state-of-the-art on a large benchmark dataset. 
Across thousands of recognition experiments with randomly sampled model configurations we investigate the suitability of each model for different tasks in HAR, explore the impact of hyperparameters using the fANOVA framework, and provide guidelines for the practitioner who wants to apply deep learning in their problem setting.\nThe simulation of the dynamical behavior of pedestrians and crowds in spatial structures is a consolidated research and application context that still presents challenges for researchers in different fields and disciplines. Although currently available commercial systems for this kind of simulation are increasingly employed by designers and planners for the evaluation of alternative solutions, this class of systems is generally not integrated with existing monitoring and control infrastructures, usually employed by crowd managers and field operators for security reasons. This paper introduces the essentials and the related computational framework of an Integrated Crowd Management Support System based on a Collective Artificial Intelligence approach encompassing (i) interfaces from and to monitored and controlled environments (respectively, sensors and actuators), (ii) a set of software tools supporting the analysis of pedestrians and crowd phenomena taking place in the environment to feed a (iii) faster than real-time simulation of the plausible evolution of the current situation in order to support forms of inference providing decision support to crowd managers, potentially directly controlling elements of the environment (e.g. blocking turnstiles, escalators), communicating orders to operators in the field or trying to influence the pedestrians by means of dynamic signage or audible messages.\nThis paper presents a general and efficient framework for probabilistic inference and learning from arbitrary uncertain information. It exploits the calculation properties of finite mixture models, conjugate families and factorization. 
Both the joint probability density of the variables and the likelihood function of the (objective or subjective) observation are approximated by a special mixture model, in such a way that any desired conditional distribution can be directly obtained without numerical integration. We have developed an extended version of the expectation maximization (EM) algorithm to estimate the parameters of mixture models from uncertain training examples (indirect observations). As a consequence, any piece of exact or uncertain information about both input and output values is consistently handled in the inference and learning stages. This ability, extremely useful in certain situations, is not found in most alternative methods. The proposed framework is formally justified from standard probabilistic principles and illustrative examples are provided in the fields of nonparametric pattern classification, nonlinear regression and pattern completion. Finally, experiments on a real application and comparative results over standard databases provide empirical evidence of the utility of the method in a wide range of applications.\nA knowledge system S describing a part of the real world does not, in general, contain complete information. Reasoning with incomplete information is prone to errors since any belief derived from S may be false in the present state of the world. A false belief may suggest wrong decisions and lead to harmful actions. So an important goal is to make false beliefs as unlikely as possible. This work introduces the notions of \"typical atoms\" and \"typical models\", and shows that reasoning with typical models minimizes the expected number of false beliefs over all ways of using incomplete information. Various properties of typical models are studied, in particular, correctness and stability of beliefs suggested by typical models, and their connection to oblivious reasoning.\nWe present a new approach to path planning, called the \"Ariadne's clew algorithm\". 
It is designed to find paths in high-dimensional continuous spaces and applies to robots with many degrees of freedom in static, as well as dynamic environments - ones where obstacles may move. The Ariadne's clew algorithm comprises two sub-algorithms, called Search and Explore, applied in an interleaved manner. Explore builds a representation of the accessible space while Search looks for the target. Both are posed as optimization problems. We describe a real implementation of the algorithm to plan paths for a six degrees of freedom arm in a dynamic environment where another six degrees of freedom arm is used as a moving obstacle. Experimental results show that a path is found in about one second without any pre-processing.\nThis article studies the problem of modifying the action ordering of a plan in order to optimise the plan according to various criteria. One of these criteria is to make a plan less constrained and the other is to minimize its parallel execution time. Three candidate definitions are proposed for the first of these criteria, constituting a sequence of increasing optimality guarantees. Two of these are based on deordering plans, which means that ordering relations may only be removed, not added, while the third one uses reordering, where arbitrary modifications to the ordering are allowed. It is shown that only the weakest one of the three criteria is tractable to achieve, the other two being NP-hard and even difficult to approximate. Similarly, optimising the parallel execution time of a plan is studied both for deordering and reordering of plans. In the general case, both of these computations are NP-hard. However, it is shown that optimal deorderings can be computed in polynomial time for a class of planning languages based on the notions of producers, consumers and threats, which includes most of the commonly used planning languages. 
Computing optimal reorderings can potentially lead to even faster parallel executions, but this problem remains NP-hard and difficult to approximate even under quite severe restrictions.\nIt is common to view programs as a combination of logic and control: the logic part defines what the program must do, the control part -- how to do it. The Logic Programming paradigm was developed with the intention of separating the logic from the control. Recently, extensive research has been conducted on automatic generation of control for logic programs. Only a few of these works considered the issue of automatic generation of control for improving the efficiency of logic programs. In this paper we present a novel algorithm for automatic finding of lowest-cost subgoal orderings. The algorithm works using the divide-and-conquer strategy. The given set of subgoals is partitioned into smaller sets, based on co-occurrence of free variables. The subsets are ordered recursively and merged, yielding a provably optimal order. We experimentally demonstrate the utility of the algorithm by testing it in several domains, and discuss the possibilities of its cooperation with other existing methods.\nA class of interval-based temporal languages for uniformly representing and reasoning about actions and plans is presented. Actions are represented by describing what is true while the action itself is occurring, and plans are constructed by temporally relating actions and world states. The temporal languages are members of the family of Description Logics, which are characterized by high expressivity combined with good computational properties. The subsumption problem for a class of temporal Description Logics is investigated and sound and complete decision procedures are given. The basic language TL-F is considered first: it is the composition of a temporal logic TL -- able to express interval temporal networks -- together with the non-temporal logic F -- a Feature Description Logic. 
It is proven that subsumption in this language is an NP-complete problem. Then it is shown how to reason with the more expressive languages TLU-FU and TL-ALCF. The former adds disjunction both at the temporal and non-temporal sides of the language, the latter extends the non-temporal side with set-valued features (i.e., roles) and a propositionally complete language.\nOrder of magnitude reasoning - reasoning by rough comparisons of the sizes of quantities - is often called 'back of the envelope calculation', with the implication that the calculations are quick though approximate. This paper exhibits an interesting class of constraint sets in which order of magnitude reasoning is demonstrably fast. Specifically, we present a polynomial-time algorithm that can solve a set of constraints of the form 'Points a and b are much closer together than points c and d.' We prove that this algorithm can be applied if `much closer together' is interpreted either as referring to an infinite difference in scale or as referring to a finite difference in scale, as long as the difference in scale is greater than the number of variables in the constraint set. We also prove that the first-order theory over such constraints is decidable.\nAs planning is applied to larger and richer domains the effort involved in constructing domain descriptions increases and becomes a significant burden on the human application designer. If general planners are to be applied successfully to large and complex domains it is necessary to provide the domain designer with some assistance in building correctly encoded domains. One way of doing this is to provide domain-independent techniques for extracting, from a domain description, knowledge that is implicit in that description and that can assist domain designers in debugging domain descriptions. 
This knowledge can also be exploited to improve the performance of planners: several researchers have explored the potential of state invariants in speeding up the performance of domain-independent planners. In this paper we describe a process by which state invariants can be extracted from the automatically inferred type structure of a domain. These techniques are being developed for exploitation by STAN, a Graphplan based planner that employs state analysis techniques to enhance its performance.\nIn default reasoning, usually not all possible ways of resolving conflicts between default rules are acceptable. Criteria expressing acceptable ways of resolving the conflicts may be hardwired in the inference mechanism, for example specificity in inheritance reasoning can be handled this way, or they may be given abstractly as an ordering on the default rules. In this article we investigate formalizations of the latter approach in Reiter's default logic. Our goal is to analyze and compare the computational properties of three such formalizations in terms of their computational complexity: the prioritized default logics of Baader and Hollunder, and Brewka, and a prioritized default logic that is based on lexicographic comparison. The analysis locates the propositional variants of these logics on the second and third levels of the polynomial hierarchy, and identifies the boundary between tractable and intractable inference for restricted classes of prioritized default theories.\nWe describe a general approach to optimization which we term `Squeaky Wheel' Optimization (SWO). In SWO, a greedy algorithm is used to construct a solution which is then analyzed to find the trouble spots, i.e., those elements, that, if improved, are likely to improve the objective function score. The results of the analysis are used to generate new priorities that determine the order in which the greedy algorithm constructs the next solution. 
This Construct/Analyze/Prioritize cycle continues until some limit is reached, or an acceptable solution is found. SWO can be viewed as operating on two search spaces: solutions and prioritizations. Successive solutions are only indirectly related, via the re-prioritization that results from analyzing the prior solution. Similarly, successive prioritizations are generated by constructing and analyzing solutions. This `coupled search' has some interesting properties, which we discuss. We report encouraging experimental results on two domains, scheduling problems that arise in fiber-optic cable manufacturing, and graph coloring problems. The fact that these domains are very different supports our claim that SWO is a general technique for optimization.\nSTAN is a Graphplan-based planner, so-called because it uses a variety of STate ANalysis techniques to enhance its performance. STAN competed in the AIPS-98 planning competition where it compared well with the other competitors in terms of speed, finding solutions fastest to many of the problems posed. Although the domain analysis techniques STAN exploits are an important factor in its overall performance, we believe that the speed at which STAN solved the competition problems is largely due to the implementation of its plan graph. The implementation is based on two insights: that many of the graph construction operations can be implemented as bit-level logical operations on bit vectors, and that the graph should not be explicitly constructed beyond the fix point. This paper describes the implementation of STAN's plan graph and provides experimental results which demonstrate the circumstances under which advantages can be obtained from using this implementation.\nTop-down and bottom-up theorem proving approaches each have specific advantages and disadvantages. 
Bottom-up provers profit from strong redundancy control but suffer from the lack of goal-orientation, whereas top-down provers are goal-oriented but often have weak calculi when their proof lengths are considered. In order to integrate both approaches, we try to achieve cooperation between a top-down and a bottom-up prover in two different ways: The first technique aims at supporting a bottom-up prover with a top-down prover. A top-down prover generates subgoal clauses, which are then processed by a bottom-up prover. The second technique deals with the use of bottom-up generated lemmas in a top-down prover. We apply our concept to the areas of model elimination and superposition. We discuss the ability of our techniques to shorten proofs as well as to reorder the search space in an appropriate manner. Furthermore, in order to identify subgoal clauses and lemmas which are actually relevant for the proof task, we develop methods for relevancy-based filtering. Experiments with the provers SETHEO and SPASS performed in the problem library TPTP reveal the high potential of our cooperation approaches.\nWe study the problem of probabilistic deduction with conditional constraints over basic events. We show that globally complete probabilistic deduction with conditional constraints over basic events is NP-hard. We then concentrate on the special case of probabilistic deduction in conditional constraint trees. We elaborate very efficient techniques for globally complete probabilistic deduction. In detail, for conditional constraint trees with point probabilities, we present a local approach to globally complete probabilistic deduction, which runs in linear time in the size of the conditional constraint trees. For conditional constraint trees with interval probabilities, we show that globally complete probabilistic deduction can be done in a global approach by solving nonlinear programs. 
We show how these nonlinear programs can be transformed into equivalent linear programs, which are solvable in polynomial time in the size of the conditional constraint trees.\nWe describe a variational approximation method for efficient inference in large-scale probabilistic models. Variational methods are deterministic procedures that provide approximations to marginal and conditional probabilities of interest. They provide alternatives to approximate inference methods based on stochastic sampling or search. We describe a variational approach to the problem of diagnostic inference in the `Quick Medical Reference' (QMR) network. The QMR network is a large-scale probabilistic graphical model built on statistical and expert knowledge. Exact probabilistic inference is infeasible in this model for all but a small set of cases. We evaluate our variational inference algorithm on a large set of diagnostic test cases, comparing the algorithm to a state-of-the-art stochastic sampling method.\nThis paper offers an approach to extensible knowledge representation and reasoning for a family of formalisms known as Description Logics. The approach is based on the notion of adding new concept constructors, and includes a heuristic methodology for specifying the desired extensions, as well as a modularized software architecture that supports implementing extensions. The architecture detailed here falls in the normalize-compare paradigm, and supports both intensional reasoning (subsumption) involving concepts, and extensional reasoning involving individuals after incremental updates to the knowledge base. The resulting approach can be used to extend the reasoner with specialized notions that are motivated by specific problems or application areas, such as reasoning about dates, plans, etc. In addition, it provides an opportunity to implement constructors that are not yet sufficiently well understood theoretically, but are needed in practice. 
Also, for constructors that are provably hard to reason with (e.g., ones whose presence would lead to undecidability), it allows the implementation of incomplete reasoners where the incompleteness is tailored to be acceptable for the application at hand.\nThere are many applications in which it is desirable to order rather than classify instances. Here we consider the problem of learning how to order instances given feedback in the form of preference judgments, i.e., statements to the effect that one instance should be ranked ahead of another. We outline a two-stage approach in which one first learns by conventional means a binary preference function indicating whether it is advisable to rank one instance before another. Here we consider an on-line algorithm for learning preference functions that is based on Freund and Schapire's 'Hedge' algorithm. In the second stage, new instances are ordered so as to maximize agreement with the learned preference function. We show that the problem of finding the ordering that agrees best with a learned preference function is NP-complete. Nevertheless, we describe simple greedy algorithms that are guaranteed to find a good approximation. Finally, we show how metasearch can be formulated as an ordering problem, and present experimental results on learning a combination of 'search experts', each of which is a domain-specific query expansion strategy for a web search engine.\nThe research on conditional planning rejects the assumptions that there is no uncertainty or incompleteness of knowledge with respect to the state and changes of the system the plans operate on. Without these assumptions the sequences of operations that achieve the goals depend on the initial state and the outcomes of nondeterministic changes in the system. This setting raises the questions of how to represent the plans and how to perform plan search. The answers are quite different from those in the simpler classical framework. 
In this paper, we approach conditional planning from a new viewpoint that is motivated by the use of satisfiability algorithms in classical planning. Translating conditional planning to formulae in the propositional logic is not feasible because of inherent computational limitations. Instead, we translate conditional planning to quantified Boolean formulae. We discuss three formalizations of conditional planning as quantified Boolean formulae, and present experimental results obtained with a theorem-prover.\nStacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that best results are obtained when the higher-level model combines the confidence (and not just the predictions) of the lower-level ones. We demonstrate the effectiveness of stacked generalization for combining three different types of learning algorithms for classification tasks. We also compare the performance of stacked generalization with majority vote and published results of arcing and bagging.\nWe introduce a temporal model for reasoning on disjunctive metric constraints on intervals and time points in temporal contexts. This temporal model is composed of a labeled temporal algebra and its reasoning algorithms. The labeled temporal algebra defines labeled disjunctive metric point-based constraints, where each disjunct in each input disjunctive constraint is univocally associated to a label. Reasoning algorithms manage labeled constraints, associated label lists, and sets of mutually inconsistent disjuncts. These algorithms guarantee consistency and obtain a minimal network. 
Additionally, constraints can be organized in a hierarchy of alternative temporal contexts. Therefore, we can reason on context-dependent disjunctive metric constraints on intervals and points. Moreover, the model is able to represent non-binary constraints, such that logical dependencies on disjuncts in constraints can be handled. The computational cost of reasoning algorithms is exponential in accordance with the underlying problem complexity, although some improvements are proposed.\nIt was recently proved that a sound and complete qualitative simulator does not exist, that is, as long as the input-output vocabulary of the state-of-the-art QSIM algorithm is used, there will always be input models which cause any simulator with a coverage guarantee to make spurious predictions in its output. In this paper, we examine whether a meaningfully expressive restriction of this vocabulary is possible so that one can build a simulator with both the soundness and completeness properties. We prove several negative results: All sound qualitative simulators, employing subsets of the QSIM representation which retain the operating region transition feature, and support at least the addition and constancy constraints, are shown to be inherently incomplete. Even when the simulations are restricted to run in a single operating region, a constraint vocabulary containing just the addition, constancy, derivative, and multiplication relations makes the construction of sound and complete qualitative simulators impossible.\nWe characterize the search landscape of random instances of the job shop scheduling problem (JSP). Specifically, we investigate how the expected values of (1) backbone size, (2) distance between near-optimal schedules, and (3) makespan of random schedules vary as a function of the job to machine ratio (N/M). For the limiting cases N/M approaches 0 and N/M approaches infinity we provide analytical results, while for intermediate values of N/M we perform experiments. 
We prove that as N/M approaches 0, backbone size approaches 100%, while as N/M approaches infinity the backbone vanishes. In the process we show that as N/M approaches 0 (resp. N/M approaches infinity), simple priority rules almost surely generate an optimal schedule, providing theoretical evidence of an \"easy-hard-easy\" pattern of typical-case instance difficulty in job shop scheduling. We also draw connections between our theoretical results and the \"big valley\" picture of JSP landscapes.\nWe consider interactive tools that help users search for their most preferred item in a large collection of options. In particular, we examine example-critiquing, a technique for enabling users to incrementally construct preference models by critiquing example options that are presented to them. We present novel techniques for improving the example-critiquing technology by adding suggestions to its displayed options. Such suggestions are calculated based on an analysis of users' current preference model and their potential hidden preferences. We evaluate the performance of our model-based suggestion techniques with both synthetic and real users. Results show that such suggestions are highly attractive to users and can stimulate them to express more preferences to improve the chance of identifying their most preferred item by up to 78%.\nThe Partially Observable Markov Decision Process has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However, exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving involves performing value backups at specific belief points, rather than over the entire belief simplex. The efficiency of this approach, however, depends greatly on the selection of points. This paper presents a set of novel techniques for selecting informative belief points which work well in practice. 
The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The first aim of this paper is to introduce this algorithm and present a theoretical analysis justifying the choice of belief selection technique. The second aim of this paper is to provide a thorough empirical comparison between PBVI and other state-of-the-art POMDP methods, in particular the Perseus algorithm, in an effort to highlight their similarities and differences. Evaluation is performed using both standard POMDP domains and realistic robotic tasks.\nIn this paper we propose a data-intensive approach for inferring sentence-internal temporal relations. Temporal inference is relevant for practical NLP applications which either extract or synthesize temporal information (e.g., summarisation, question answering). Our method bypasses the need for manual coding by exploiting the presence of markers like \"after\", which overtly signal a temporal relation. We first show that models trained on main and subordinate clauses connected with a temporal marker achieve good performance on a pseudo-disambiguation task simulating temporal inference (during testing the temporal marker is treated as unseen and the models must select the right marker from a set of possible candidates). Secondly, we assess whether the proposed approach holds promise for the semi-automatic creation of temporal annotations. Specifically, we use a model trained on noisy and approximate data (i.e., main and subordinate clauses) to predict intra-sentential relations present in TimeBank, a corpus annotated with rich temporal information. Our experiments compare and contrast several probabilistic models differing in their feature space, linguistic assumptions and data requirements. 
We evaluate performance against gold standard corpora and also against human subjects.\nIn this work we introduce a novel approach, based on sampling, for finding assignments that are likely to be solutions to stochastic constraint satisfaction problems and constraint optimisation problems. Our approach reduces the size of the original problem being analysed; by solving this reduced problem, with a given confidence probability, we obtain assignments that satisfy the chance constraints in the original model within prescribed error tolerance thresholds. To achieve this, we blend concepts from stochastic constraint programming and statistics. We discuss both exact and approximate variants of our method. The framework we introduce can be immediately employed in concert with existing approaches for solving stochastic constraint programs. A thorough computational study on a number of stochastic combinatorial optimisation problems demonstrates the effectiveness of our approach.\nIn this paper we present pddl+, a planning domain description language for modelling mixed discrete-continuous planning domains. We describe the syntax and modelling style of pddl+, showing that the language makes convenient the modelling of complex time-dependent effects. We provide a formal semantics for pddl+ by mapping planning instances into constructs of hybrid automata. Using the syntax of HAs as our semantic model we construct a semantic mapping to labelled transition systems to complete the formal interpretation of pddl+ planning instances. An advantage of building a mapping from pddl+ to HA theory is that it forms a bridge between the Planning and Real Time Systems research communities. One consequence is that we can expect to make use of some of the theoretical properties of HAs. For example, for a restricted class of HAs the Reachability problem (which is equivalent to Plan Existence) is decidable. 
pddl+ provides an alternative to the continuous durative action model of pddl2.1, adding a more flexible and robust model of time-dependent behaviour.\nIn this paper, we show that there is a close relation between consistency in a constraint network and set intersection. A proof schema is provided as a generic way to obtain consistency properties from properties on set intersection. This approach not only simplifies the understanding of and unifies many existing consistency results, but also directs the study of consistency to that of set intersection properties in many situations, as demonstrated by the results on the convexity and tightness of constraints in this paper. Specifically, we identify a new class of tree convex constraints where local consistency ensures global consistency. This generalizes row convex constraints. Various consistency results are also obtained on constraint networks where only some, in contrast to all in the existing work, constraints are tight.\nIn this paper, we study the possibility of designing non-trivial random CSP models by exploiting the intrinsic connection between structures and typical-case hardness. We show that constraint consistency, a notion that has been developed to improve the efficiency of CSP algorithms, is in fact the key to the design of random CSP models that have interesting phase transition behavior and guaranteed exponential resolution complexity without putting much restriction on the parameter of constraint tightness or the domain size of the problem. We propose a very flexible framework for constructing problem instances with interesting behavior and develop a variety of concrete methods to construct specific random CSP models that enforce different levels of constraint consistency. 
A series of experimental studies with interesting observations are carried out to illustrate the effectiveness of introducing structural elements in random instances, to verify the robustness of our proposal, and to investigate features of some specific models based on our framework that are highly related to the behavior of backtracking search algorithms.\nIn real-life temporal scenarios, uncertainty and preferences are often essential and coexisting aspects. We present a formalism where quantitative temporal constraints with both preferences and uncertainty can be defined. We show how three classical notions of controllability (that is, strong, weak, and dynamic), which have been developed for uncertain temporal problems, can be generalized to handle preferences as well. After defining this general framework, we focus on problems where preferences follow the fuzzy approach, and with properties that assure tractability. For such problems, we propose algorithms to check the presence of the controllability properties. In particular, we show that in such a setting dealing simultaneously with preferences and uncertainty does not increase the complexity of controllability testing. We also develop a dynamic execution algorithm, of polynomial complexity, that produces temporal plans under uncertainty that are optimal with respect to fuzzy preferences.\nWe consider the problem of computing a lightest derivation of a global structure using a set of weighted rules. A large variety of inference problems in AI can be formulated in this framework. We generalize A* search and heuristics derived from abstractions to a broad class of lightest derivation problems. We also describe a new algorithm that searches for lightest derivations using a hierarchy of abstractions. Our generalization of A* gives a new algorithm for searching AND/OR graphs in a bottom-up fashion. 
We discuss how the algorithms described here provide a general architecture for addressing the pipeline problem --- the problem of passing information back and forth between various stages of processing in a perceptual system. We consider examples in computer vision and natural language processing. We apply the hierarchical search algorithm to the problem of estimating the boundaries of convex objects in grayscale images and compare it to other search methods. A second set of experiments demonstrates the use of a new compositional model for finding salient curves in images.\nThe treatment of exogenous events in planning is practically important in many real-world domains where the preconditions of certain plan actions are affected by such events. In this paper we focus on planning in temporal domains with exogenous events that happen at known times, imposing the constraint that certain actions in the plan must be executed during some predefined time windows. When actions have durations, handling such temporal constraints adds an extra difficulty to planning. We propose an approach to planning in these domains which integrates constraint-based temporal reasoning into a graph-based planning framework using local search. Our techniques are implemented in a planner that took part in the 4th International Planning Competition (IPC-4). A statistical analysis of the results of IPC-4 demonstrates the effectiveness of our approach in terms of both CPU-time and plan quality. Additional experiments show the good performance of the temporal reasoning techniques integrated into our planner.\nMost classical scheduling formulations assume a fixed and known duration for each activity. In this paper, we weaken this assumption, requiring instead that each duration can be represented by an independent random variable with a known mean and variance. The best solutions are ones which have a high probability of achieving a good makespan. 
We first create a theoretical framework, formally showing how Monte Carlo simulation can be combined with deterministic scheduling algorithms to solve this problem. We propose an associated deterministic scheduling problem whose solution is proved, under certain conditions, to be a lower bound for the probabilistic problem. We then propose and investigate a number of techniques for solving such problems based on combinations of Monte Carlo simulation, solutions to the associated deterministic problem, and either constraint programming or tabu search. Our empirical results demonstrate that a combination of the use of the associated deterministic problem and Monte Carlo simulation results in algorithms that scale best both in terms of problem size and uncertainty. Further experiments point to the correlation between the quality of the deterministic solution and the quality of the probabilistic solution as a major factor responsible for this success.\nThis paper is concerned with a class of algorithms that perform exhaustive search on propositional knowledge bases. We show that each of these algorithms defines and generates a propositional language. Specifically, we show that the trace of a search can be interpreted as a combinational circuit, and a search algorithm then defines a propositional language consisting of circuits that are generated across all possible executions of the algorithm. In particular, we show that several versions of exhaustive DPLL search correspond to such well-known languages as FBDD, OBDD, and a precisely-defined subset of d-DNNF. 
By thus mapping search algorithms to propositional languages, we provide a uniform and practical framework in which successful search techniques can be harnessed for compilation of knowledge into various languages of interest, and a new methodology whereby the power and limitations of search algorithms can be understood by looking up the tractability and succinctness of the corresponding propositional languages.\nThe best performing algorithms for a particular oversubscribed scheduling application, Air Force Satellite Control Network (AFSCN) scheduling, appear to have little in common. Yet, through careful experimentation and modeling of performance in real problem instances, we can relate characteristics of the best algorithms to characteristics of the application. In particular, we find that plateaus dominate the search spaces (thus favoring algorithms that make larger changes to solutions) and that some randomization in exploration is critical to good performance (due to the lack of gradient information on the plateaus). Based on our explanations of algorithm performance, we develop a new algorithm that combines characteristics of the best performers; the new algorithm's performance is better than the previous best. We show how hypothesis driven experimentation and search modeling can both explain algorithm performance and motivate the design of a new algorithm.\nWe describe how to convert the heuristic search algorithm A* into an anytime algorithm that finds a sequence of improved solutions and eventually converges to an optimal solution. The approach we adopt uses weighted heuristic search to find an approximate solution quickly, and then continues the weighted search to find improved solutions as well as to improve a bound on the suboptimality of the current solution. 
When the time available to solve a search problem is limited or uncertain, this creates an anytime heuristic search algorithm that allows a flexible tradeoff between search time and solution quality. We analyze the properties of the resulting Anytime A* algorithm, and consider its performance in three domains: sliding-tile puzzles, STRIPS planning, and multiple sequence alignment. To illustrate the generality of this approach, we also describe how to transform the memory-efficient search algorithm Recursive Best-First Search (RBFS) into an anytime algorithm.\nThe paper presents a new sampling methodology for Bayesian networks that samples only a subset of variables and applies exact inference to the rest. Cutset sampling is a network structure-exploiting application of the Rao-Blackwellisation principle to sampling in Bayesian networks. It improves convergence by exploiting memory-based inference algorithms. It can also be viewed as an anytime approximation of the exact cutset-conditioning algorithm developed by Pearl. Cutset sampling can be implemented efficiently when the sampled variables constitute a loop-cutset of the Bayesian network and, more generally, when the induced width of the network's graph conditioned on the observed sampled variables is bounded by a constant w. We demonstrate empirically the benefit of this scheme on a range of benchmarks.\nMatchmaking arises when supply and demand meet in an electronic marketplace, or when agents search for a web service to perform some task, or even when recruiting agencies match curricula and job profiles. In such open environments, the objective of a matchmaking process is to discover best available offers to a given request. We address the problem of matchmaking from a knowledge representation perspective, with a formalization based on Description Logics. 
We devise Concept Abduction and Concept Contraction as non-monotonic inferences in Description Logics suitable for modeling matchmaking in a logical framework, and prove some related complexity results. We also present reasonable algorithms for semantic matchmaking based on the devised inferences, and prove that they obey some commonsense properties. Finally, we report on the implementation of the proposed matchmaking framework, which has been used both as a mediator in e-marketplaces and for semantic web services discovery.\nSolution-Guided Multi-Point Constructive Search (SGMPCS) is a novel constructive search technique that performs a series of resource-limited tree searches where each search begins either from an empty solution (as in randomized restart) or from a solution that has been encountered during the search. A small number of these \"elite\" solutions is maintained during the search. We introduce the technique and perform three sets of experiments on the job shop scheduling problem. First, a systematic, fully crossed study of SGMPCS is carried out to evaluate the performance impact of various parameter settings. Second, we inquire into the diversity of the elite solution set, showing, contrary to expectations, that a less diverse set leads to stronger performance. Finally, we compare the best parameter setting of SGMPCS from the first two experiments to chronological backtracking, limited discrepancy search, randomized restart, and a sophisticated tabu search algorithm on a set of well-known benchmark problems. Results demonstrate that SGMPCS is significantly better than the other constructive techniques tested, though it lags behind the tabu search.\nAllocating scarce resources among agents to maximize global utility is, in general, computationally challenging. 
We focus on problems where resources enable agents to execute actions in stochastic environments, modeled as Markov decision processes (MDPs), such that the value of a resource bundle is defined as the expected value of the optimal MDP policy realizable given these resources. We present an algorithm that simultaneously solves the resource-allocation and the policy-optimization problems. This allows us to avoid explicitly representing utilities over exponentially many resource bundles, leading to drastic (often exponential) reductions in computational complexity. We then use this algorithm in the context of self-interested agents to design a combinatorial auction for allocating resources. We empirically demonstrate the effectiveness of our approach by showing that it can, in minutes, optimally solve problems for which a straightforward combinatorial resource-allocation technique would require the agents to enumerate up to 2^100 resource bundles and the auctioneer to solve an NP-complete problem with an input of that size.\nStructured game representations have recently attracted interest as models for multi-agent artificial intelligence scenarios, with rational behavior most commonly characterized by Nash equilibria. This paper presents efficient, exact algorithms for computing Nash equilibria in structured game representations, including both graphical games and multi-agent influence diagrams (MAIDs). The algorithms are derived from a continuation method for normal-form and extensive-form games due to Govindan and Wilson; they follow a trajectory through a space of perturbed games and their equilibria, exploiting game structure through fast computation of the Jacobian of the payoff function. They are theoretically guaranteed to find at least one equilibrium of the game, and may find more. 
Our approach provides the first efficient algorithm for computing exact equilibria in graphical games with arbitrary topology, and the first algorithm to exploit fine-grained structural properties of MAIDs. Experimental results are presented demonstrating the effectiveness of the algorithms and comparing them to predecessors. The running time of the graphical game algorithm is similar to, and often better than, the running time of previous approximate algorithms. The algorithm for MAIDs can effectively solve games that are much larger than those solvable by previous methods.\nMany applications require complexly structured data objects. Developing new or adapting existing algorithmic solutions for creating such objects can be a non-trivial and costly task if the considered objects are subject to different application-specific constraints. Often, however, it is comparatively easy to declaratively describe the required objects. In this paper, we propose to use answer-set programming (ASP)---a well-established declarative programming paradigm from the area of artificial intelligence---for instantiating objects in standard object-oriented programming languages. In particular, we extend Java with declarative specifications from which the required objects can be automatically generated using available ASP solver technology.\nIn this paper a method is proposed for performance evaluation of road traffic control systems. The method is designed to be implemented in an on-line simulation environment, which enables optimisation of adaptive traffic control strategies. Performance measures are computed using a fuzzy cellular traffic model, formulated as a hybrid system combining cellular automata and fuzzy calculus. Experimental results show that the introduced method allows the performance to be evaluated using imprecise traffic measurements. 
Moreover, the fuzzy definitions of performance measures are convenient for uncertainty determination in traffic control decisions.\nComprehensible explanations of probabilistic reasoning are a prerequisite for wider acceptance of Bayesian methods in expert systems and decision support systems. A study of human reasoning under uncertainty suggests two different strategies for explaining probabilistic reasoning: The first, qualitative belief propagation, traces the qualitative effect of evidence through a belief network from one variable to the next. This propagation algorithm is an alternative to the graph reduction algorithms of Wellman (1988) for inference in qualitative probabilistic networks. It is based on a qualitative analysis of intercausal reasoning, which is a generalization of Pearl's \"explaining away\", and an alternative to Wellman's definition of qualitative synergy. The other, Scenario-based reasoning, involves the generation of alternative causal \"stories\" accounting for the evidence. Comparing a few of the most probable scenarios provides an approximate way to explain the results of probabilistic reasoning. Both schemes employ causal as well as probabilistic knowledge. Probabilities may be presented as phrases and/or numbers. Users can control the style, abstraction and completeness of explanations.\nWe propose an abductive diagnosis theory that integrates probabilistic, causal and taxonomic knowledge. Probabilistic knowledge allows us to select the most likely explanation; causal knowledge allows us to make reasonable independence assumptions; taxonomic knowledge allows causation to be modeled at different levels of detail, and allows observations to be described at different levels of precision. Unlike most other approaches where a causal explanation is a hypothesis that one or more causative events occurred, we define an explanation of a set of observations to be an occurrence of a chain of causation events. 
These causation events constitute a scenario where all the observations are true. We show that the probabilities of the scenarios can be computed from the conditional probabilities of the causation events. Abductive reasoning is inherently complex even if only modest expressive power is allowed. However, our abduction algorithm is exponential only in the number of observations to be explained, and is polynomial in the size of the knowledge base. This contrasts with many other abduction procedures that are exponential in the size of the knowledge base.\nWithin diagnostic reasoning there have been a number of proposed definitions of a diagnosis, and thus of the most likely diagnosis, including most probable posterior hypothesis, most probable interpretation, most probable covering hypothesis, etc. Most of these approaches assume that the most likely diagnosis must be computed, and that a definition of what should be computed can be made a priori, independent of what the diagnosis is used for. We argue that the diagnostic problem, as currently posed, is incomplete: it does not consider how the diagnosis is to be used, or the utility associated with the treatment of the abnormalities. In this paper we analyze several well-known definitions of diagnosis, showing that the different definitions of the most likely diagnosis have different qualitative meanings, even given the same input data. We argue that the most appropriate definition of (optimal) diagnosis needs to take into account the utility of outcomes and what the diagnosis is used for.\nKutato is a system that takes as input a database of cases and produces a belief network that captures many of the dependence relations represented by those data. This system incorporates a module for determining the entropy of a belief network and a module for constructing belief networks based on entropy calculations. 
Kutato constructs an initial belief network in which all variables in the database are assumed to be marginally independent. The entropy of this belief network is calculated, and that arc is added that minimizes the entropy of the resulting belief network. Conditional probabilities for an arc are obtained directly from the database. This process continues until an entropy-based threshold is reached. We have tested the system by generating databases from networks using the probabilistic logic-sampling method, and then using those databases as input to Kutato. The system consistently reproduces the original belief networks with high fidelity.\nThis paper focuses on managing the cost of deliberation before action. In many problems, the overall quality of the solution reflects costs incurred and resources consumed in deliberation as well as the cost and benefit of execution, when both the resource consumption in the deliberation phase, and the costs in deliberation and execution are uncertain and may be described by probability distribution functions. A feasible (in terms of resource consumption) strategy that minimizes the expected total cost is termed computationally-optimal. For a situation with several independent, uninterruptible methods to solve the problem, we develop a pseudopolynomial-time algorithm to construct a generate-and-test computationally-optimal strategy. We show this strategy-construction problem to be NP-complete, and apply Bellman's Optimality Principle to solve it efficiently.\nA significant problem in designing mobile robot control systems involves coping with the uncertainty that arises in moving about in an unknown or partially unknown environment and relying on noisy or ambiguous sensor data to acquire knowledge about that environment. 
We describe a control system that chooses what activity to engage in next on the basis of expectations about how the information returned as a result of a given activity will improve its knowledge about the spatial layout of its environment. Certain of the higher-level components of the control system are specified in terms of probabilistic decision models whose output is used to mediate the behavior of lower-level control components responsible for movement and sensing.\nIn previous work (Fertig and Breese, 1989; Fertig and Breese, 1990) we defined a mechanism for performing probabilistic reasoning in influence diagrams using interval rather than point-valued probabilities. In this paper we extend these procedures to incorporate decision nodes and interval-valued value functions in the diagram. We derive the procedures for chance node removal (calculating expected value) and decision node removal (optimization) in influence diagrams where lower bounds on probabilities are stored at each chance node and interval bounds are stored on the value function associated with the diagram's value node. The output of the algorithm is a set of admissible alternatives for each decision variable and a set of bounds on expected value based on the imprecision in the input. The procedure can be viewed as an approximation to a full n-dimensional sensitivity analysis, where n is the number of imprecise probability distributions in the input. We show the transformations are optimal and sound. The performance of the algorithm on influence diagrams is investigated and compared to an exact algorithm.\nWhen expert systems based on causal probabilistic networks (CPNs) reach a certain size and complexity, the \"combinatorial explosion monster\" tends to be present. We propose an approximation scheme that identifies rarely occurring cases and excludes these from being processed as ordinary cases in a CPN-based expert system. 
Depending on the topology and the probability distributions of the CPN, the numbers (representing probabilities of state combinations) in the underlying numerical representation can become very small. Annihilating these numbers and utilizing the resulting sparseness through data structuring techniques often results in several orders of magnitude of improvement in the consumption of computer resources. Bounds on the errors introduced into a CPN-based expert system through approximations are established. Finally, reports on empirical studies of applying the approximation scheme to a real-world CPN are given.\nA method of calculating probability values from a system of marginal constraints is presented. Previous systems for finding the probability of a single attribute have either made an independence assumption concerning the evidence or have required, in the worst case, time exponential in the number of attributes of the system. In this paper a closed form solution to the probability of an attribute given the evidence is found. The closed form solution, however, does not enforce the (non-linear) constraint that all terms in the underlying distribution be positive. The equation requires O(r^3) steps to evaluate, where r is the number of independent marginal constraints describing the system at the time of evaluation. Furthermore, a marginal constraint may be exchanged with a new constraint, and a new solution calculated in O(r^2) steps. This method is appropriate for calculating probabilities in a real-time expert system.\nThe causal (belief) network is a well-known graphical structure for representing independencies in a joint probability distribution. The exact methods and the approximation methods, which perform probabilistic inference in causal networks, often treat the conditional probabilities which are stored in the network as certain values. 
However, if one takes either a subjectivistic or a limiting frequency approach to probability, one can never be certain of probability values. An algorithm for probabilistic inference should not only be capable of reporting the inferred probabilities; it should also be capable of reporting the uncertainty in these probabilities relative to the uncertainty in the probabilities which are stored in the network. In section 2 of this paper a method is given for determining the prior variances of the probabilities of all the nodes. Section 3 contains an approximation method for determining the variances in inferred probabilities.\nKnowledge elicitation is one of the major bottlenecks in expert system design. Systems based on Bayes nets require two types of information--network structure and parameters (or probabilities). Both must be elicited from the domain expert. In general, parameters have greater opacity than structure, and more time is spent in their refinement than in any other phase of elicitation. Thus, it is important to determine the point of diminishing returns, beyond which further refinements will promise little (if any) improvement. Sensitivity analyses address precisely this issue--the sensitivity of a model to the precision of its parameters. In this paper, we report the results of a sensitivity analysis of Pathfinder, a Bayes net based system for diagnosing pathologies of the lymph system. This analysis is intended to shed some light on the relative importance of structure and parameters to system performance, as well as the sensitivity of a system based on a Bayes net to noise in its assessed parameters.\nScientists often use directed acyclic graphs (dags) to model the qualitative structure of causal theories, allowing the parameters to be estimated from observational data. Two causal models are equivalent if there is no experiment which could distinguish one from the other. 
A canonical representation for causal models is presented which yields an efficient graphical criterion for deciding equivalence, and provides a theoretical basis for extracting causal structures from empirical data. This representation is then extended to the more general case of an embedded causal model, that is, a dag in which only a subset of the variables are observable. The canonical representation presented here yields an efficient algorithm for determining when two embedded causal models reflect the same dependency information. This algorithm leads to a model theoretic definition of causation in terms of statistical dependencies.\nIn recent years, there have been intense research efforts to develop efficient methods for probabilistic inference in probabilistic influence diagrams or belief networks. Many people have concluded that the best methods are those based on undirected graph structures, and that those methods are inherently superior to those based on node reduction operations on the influence diagram. We show here that these two approaches are essentially the same, since they are explicitly or implicitly building and operating on the same underlying graphical structures. In this paper we examine those graphical structures and show how this insight can lead to an improved class of directed reduction methods.\nThis paper addresses fundamental issues on the nature of the concepts and structures of fuzzy logic, focusing, in particular, on the conceptual and functional differences that exist between probabilistic and possibilistic approaches. A semantic model provides the basic framework to define possibilistic structures and concepts by means of a function that quantifies proximity, closeness, or resemblance between pairs of possible worlds. The resulting model is a natural extension, based on multiple conceivability relations, of the modal logic concepts of necessity and possibility. 
By contrast, chance-oriented probabilistic concepts and structures rely on measures of set extension that quantify the proportion of possible worlds where a proposition is true. Resemblance between possible worlds is quantified by a generalized similarity relation: a function that assigns a number between 0 and 1 to every pair of possible worlds. Using this similarity relation, which is a form of numerical complement of a classic metric or distance, it is possible to define and interpret the major constructs and methods of fuzzy logic: conditional and unconditioned possibility and necessity distributions and the generalized modus ponens of Zadeh.\nWe are concerned with the problem of introducing credibility type information into reasoning systems. The concept of credibility allows us to discount information provided by agents. An important characteristic of this kind of procedure is that a complete lack of credibility rather than resulting in the negation of the information provided results in the nullification of the information provided. We suggest a representational scheme for credibility qualification in the theory of approximate reasoning. We discuss the concept of relative credibility. By this idea we mean to indicate situations in which the credibility of a piece of evidence is determined by its compatibility with higher priority evidence. This situation leads to structures very much in the spirit of nonmonotonic reasoning.\nThis paper discusses how a measure of uncertainty representing a state of knowledge can be updated when new information, which may be pervaded with uncertainty, becomes available. This problem is considered in various frameworks, namely: Shafer's evidence theory, Zadeh's possibility theory, Spohn's theory of epistemic states. In the first two cases, analogues of Jeffrey's rule of conditioning are introduced and discussed. 
The relations between Spohn's model and possibility theory are emphasized and Spohn's updating rule is contrasted with the Jeffrey-like rule of conditioning in possibility theory. Recent results by Shenoy on the combination of ordinal conditional functions are reinterpreted in the language of possibility theory. It is shown that Shenoy's combination rule has a well-known possibilistic counterpart.\nThis paper describes valuation-based systems for representing and solving discrete optimization problems. In valuation-based systems, we represent information in an optimization problem using variables, sample spaces of variables, a set of values, and functions that map sample spaces of sets of variables to the set of values. The functions, called valuations, represent the factors of an objective function. Solving the optimization problem involves using two operations called combination and marginalization. Combination tells us how to combine the factors of the joint objective function. Marginalization is either maximization or minimization. Solving an optimization problem can be simply described as finding the marginal of the joint objective function for the empty set. We state some simple axioms that combination and marginalization need to satisfy to enable us to solve an optimization problem using local computation. For optimization problems, the solution method of valuation-based systems reduces to non-serial dynamic programming. Thus our solution method for VBS can be regarded as an abstract description of dynamic programming. And our axioms can be viewed as conditions that permit the use of dynamic programming.\nIn this paper we associate with every (directed) graph G a transformation called the Mobius transformation of the graph G. The Mobius transformation of the graph (O) is of major significance for Dempster-Shafer theory of evidence. 
However, because it is computationally very heavy, the Mobius transformation together with Dempster's rule of combination is a major obstacle to the use of Dempster-Shafer theory for handling uncertainty in expert systems. The major contribution of this paper is the discovery of the 'fast Mobius transformations' of (O). These 'fast Mobius transformations' are the fastest algorithms for computing the Mobius transformation of (O). As an easy but useful application, we provide, via the commonality function, an algorithm for computing Dempster's rule of combination which is much faster than the usual one.\nThis paper presents a new technique for the design of approximate reasoning based controllers for dynamic physical systems with interacting goals. In this approach, goals are achieved based on a hierarchy defined by a control knowledge base and remain highly interactive during the execution of the control task. The approach has been implemented in a rule-based computer program which is used in conjunction with a prototype hardware system to solve the cart-pole balancing problem in real-time. It provides a complementary approach to the conventional analytical control methodology, and is of substantial use where a precise mathematical model of the process being controlled is not available.\nIn this paper a new mathematical procedure is presented for combining different pieces of evidence which are represented in the interval form to reflect our knowledge about the truth of a hypothesis. Evidences may be correlated to each other (dependent evidences) or conflicting in supports (conflicting evidences). First, assuming independent evidences, we propose a methodology to construct combination rules which obey a set of essential properties. The method is based on a geometric model. 
We compare results obtained from Dempster-Shafer's rule and the proposed combination rules with both conflicting and non-conflicting data and show that the values generated by proposed combining rules are in tune with our intuition in both cases. Secondly, in the case that evidences are known to be dependent, we consider extensions of the rules derived for handling conflicting evidence. The performance of the proposed rules is shown by different examples. The results show that the proposed rules make reasonable decisions under dependent evidence.\nThis paper describes recent work on an ongoing project in medical diagnosis at the University of Guelph. A domain on which experts are not very good at pinpointing a single disease outcome is explored. On-line medical data is available over a relatively short period of time. Belief Functions (Dempster-Shafer theory) are first extracted from data and then modified with expert opinions. Several methods for doing this are compared and results show that one formulation statistically outperforms the others, including a method suggested by Shafer. Expert opinions and statistically derived information about dependencies among symptoms are also compared. The benefits of using uncertainty management techniques as methods for knowledge acquisition from data are discussed.\nWhile concept-based methods for information retrieval can provide improved performance over more conventional techniques, they require large amounts of effort to acquire the concepts and their qualitative and quantitative relationships. This paper discusses an architecture for probabilistic concept-based information retrieval which addresses the knowledge acquisition problem. The architecture makes use of the probabilistic networks technology for representing and reasoning about concepts and includes a knowledge acquisition component which partially automates the construction of concept knowledge bases from data. 
We describe two experiments that apply the architecture to the task of retrieving documents about terrorism from a set of documents from the Reuters news service. The experiments provide positive evidence that the architecture design is feasible and that there are advantages to concept-based methods.\nDecision-theoretic control of search has previously used as its basic unit of computation the generation and evaluation of a complete set of successors. Although this simplifies analysis, it results in some lost opportunities for pruning and satisficing. This paper therefore extends the analysis of the value of computation to cover individual successor evaluations. The analytic techniques used may prove useful for control of reasoning in more general settings. A formula is developed for the expected value of a node, k of whose n successors have been evaluated. This formula is used to estimate the value of expanding further successors, using a general formula for the value of a computation in game-playing developed in earlier work. We exhibit an improved version of the MGSS* algorithm, giving empirical results for the game of Othello.\nOne of the most important aspects in any treatment of uncertain information is the rule of combination for updating the degrees of uncertainty. The theory of belief functions uses the Dempster rule to combine two belief functions defined by independent bodies of evidence. However, with limited dependency information about the accumulated belief the Dempster rule may lead to unsatisfactory results. The present study suggests a method to determine the accumulated belief based on the premise that the information gain from the combination process should be minimum. This method provides a mechanism that is equivalent to the Bayes rule when all the conditional probabilities are available and to the Dempster rule when the normalization constant is equal to one. 
The proposed principle of minimum information gain is shown to be equivalent to the maximum entropy formalism, a special case of the principle of minimum cross-entropy. The application of this principle results in a monotonic increase in belief with accumulation of consistent evidence. The suggested approach may provide a more reasonable criterion for identifying conflicts among various bodies of evidence.\nThis paper derives a formula for computing the conditional probability of a set of candidates, where a candidate is a set of disorders that explain a given set of positive findings. Such candidate sets are produced by a recent method for multidisorder diagnosis called symptom clustering. A symptom clustering represents a set of candidates compactly as a Cartesian product of differential diagnoses. By evaluating the probability of a candidate set, then, a large set of candidates can be validated or pruned simultaneously. The probability of a candidate set is then specialized to obtain the probability of a single candidate. Unlike earlier results, the equation derived here allows the specification of positive, negative, and unknown symptoms and does not make assumptions about disorders not in the candidate.\nA major difficulty in developing and maintaining very large knowledge bases originates from the variety of forms in which knowledge is made available to the KB builder. The objective of this research is to bring together two complementary knowledge representation schemes: term subsumption languages, which represent and reason about defining characteristics of concepts, and approximate reasoning models, which deal with uncertain knowledge and data in expert systems. Previous work in this area has primarily focused on probabilistic inheritance. In this paper, we address two other important issues regarding the integration of term subsumption-based systems and approximate reasoning models. 
First, we outline a general architecture that specifies the interactions between the deductive reasoner of a term subsumption system and an approximate reasoner. Second, we generalize the semantics of terminological language so that terminological knowledge can be used to make plausible inferences. The architecture, combined with the generalized semantics, forms the foundation of a synergistic tight integration of term subsumption systems and approximate reasoning models.\nIn almost all situation assessment problems, it is useful to dynamically contract and expand the states under consideration as assessment proceeds. Contraction is most often used to combine similar events or low probability events together in order to reduce computation. Expansion is most often used to make distinctions of interest which have significant probability in order to improve the quality of the assessment. Although other uncertainty calculi, notably Dempster-Shafer [Shafer, 1976], have addressed these operations, there has not yet been any approach to refining and coarsening state spaces for the Bayesian Network technology. This paper presents two operations for refining and coarsening the state space in Bayesian Networks. We also discuss their practical implications for knowledge acquisition.\nIn this paper the elicitation of probabilities from human experts is considered as a measurement process, which may be disturbed by random 'measurement noise'. Using Bayesian concepts, a second-order probability distribution is derived reflecting the uncertainty of the input probabilities. The algorithm is based on an approximate sample representation of the basic probabilities. This sample is continuously modified by a stochastic simulation procedure, the Metropolis algorithm, such that the sequence of successive samples corresponds to the desired posterior distribution. 
The procedure is able to combine inconsistent probabilities according to their reliability and is applicable to general inference networks with arbitrary structure. Dempster-Shafer probability mass functions may be included using specific measurement distributions. The properties of the approach are demonstrated by numerical experiments.\nConsiderable attention has been given to the problem of non-monotonic reasoning in a belief function framework. Earlier work (M. Ginsberg) proposed solutions introducing meta-rules which recognized conditional independencies in a probabilistic sense. More recently an epsilon-calculus formulation of default reasoning (J. Pearl) shows that the application of Dempster's rule to a non-monotonic situation produces erroneous results. This paper presents a new belief function interpretation of the problem which combines the rules in a way which is more compatible with probabilistic results and respects conditions of independence necessary for the application of Dempster's combination rule. A new general framework for combining conflicting evidence is also proposed in which the normalization factor becomes modified. This produces more intuitively acceptable results.\nInappropriate use of Dempster's rule of combination has led some authors to reject the Dempster-Shafer model, arguing that it leads to supposedly unacceptable conclusions when defaults are involved. The classic example concerns the penguin Tweety. This paper will successively present: the origin of the mismanagement of the Tweety example; two types of default; the correct solution for both types based on the transferable belief model (our interpretation of the Dempster-Shafer model (Shafer 1976, Smets 1988)). Except when explicitly stated, all belief functions used in this paper are simple support functions, i.e. 
belief functions for which only one proposition (the focus) of the frame of discernment receives a positive basic belief mass with the remaining mass being given to the tautology. Each belief function will be described by its focus and the weight of the focus (e.g. m(A)=.9). Computation of the basic belief masses is always performed by vacuously extending each belief function to the product space built from all variables involved, combining them on that space by Dempster's rule of combination, and projecting the result to the space corresponding to each individual variable.\nWe examine three probabilistic formulations of the sentence \"a and b are totally unrelated\" with respect to a given set of variables U. First, two variables a and b are totally independent if they are independent given any value of any subset of the variables in U. Second, two variables are totally uncoupled if U can be partitioned into two marginally independent sets containing a and b respectively. Third, two variables are totally disconnected if the corresponding nodes are disconnected in every belief network representation. We explore the relationship between these three formulations of unrelatedness and explain their relevance to the process of acquiring probabilistic knowledge from human experts.\nNearly all spatial reasoning problems involve uncertainty of one sort or another. Uncertainty arises due to the inaccuracies of sensors used in measuring distances and angles. We refer to this as directional uncertainty. Uncertainty also arises in combining spatial information when one location is mistakenly identified with another. We refer to this as recognition uncertainty. Most problems in constructing spatial representations (maps) for the purpose of navigation involve both directional and recognition uncertainty. 
In this paper, we show that a particular class of spatial reasoning problems involving the construction of representations of large-scale space can be solved efficiently even in the presence of directional and recognition uncertainty. We pay particular attention to the problems that arise due to recognition uncertainty.\nIn this paper we explore representations of temporal knowledge based upon the formalism of Causal Probabilistic Networks (CPNs). Two different \"continuous-time\" representations are proposed. In the first, the CPN includes variables representing \"event-occurrence times\", possibly on different time scales, and variables representing the \"state\" of the system at these times. In the second, the CPN describes the influences between random variables with values in () representing dates, i.e. time-points associated with the occurrence of relevant events. However, structuring a system of inter-related dates as a network where all links commit to a single specific notion of cause and effect is in general far from trivial and leads to severe difficulties. We claim that we should recognize explicitly different kinds of relation between dates, such as \"cause\", \"inhibition\", \"competition\", etc., and propose a method whereby these relations are coherently embedded in a CPN using additional auxiliary nodes corresponding to \"instrumental\" variables. Also discussed, though not covered in detail, is the topic concerning how the quantitative specifications to be inserted in a temporal CPN can be learned from specific data.\nRather than discussing the isolated merits of a nominative theory of uncertainty, this paper focuses on a class of problems, referred to as Dynamic Classification Problem (DCP), which requires the integration of many theories, including a prescriptive theory of uncertainty. We start by analyzing the Dynamic Classification Problem and by defining its induced requirements on a supporting (plausible) reasoning system. 
We provide a summary of the underlying theory (based on the semantics of many-valued logics) and illustrate the constraints imposed upon it to ensure the modularity and computational performance required by the applications. We describe the technologies used for knowledge engineering (such as an object-based simulator to exercise requirements, and development tools to build the Knowledge Base and functionally validate it). We emphasize the difference between development environment and run-time system, describe the rule cross-compiler, and the real-time inference engine with meta-reasoning capabilities. Finally, we illustrate how our proposed technology satisfies the DCP's requirements and analyze some of the lessons learned from its applications to situation assessment problems for Pilot's Associate and Submarine Commander Associate.\nTwo major difficulties in using default logics are their intractability and the problem of selecting among multiple extensions. We propose an approach to these problems based on integrating nonmonotonic reasoning with plausible reasoning based on triangular norms. A previously proposed system for reasoning with uncertainty (RUM) performs uncertain monotonic inferences on an acyclic graph. We have extended RUM to allow nonmonotonic inferences and cycles within nonmonotonic rules. By restricting the size and complexity of the nonmonotonic cycles we can still perform efficient inferences. Uncertainty measures provide a basis for deciding among multiple defaults. Different algorithms and heuristics for finding the optimal defaults are discussed.\nIn this paper an approach to automated deduction under uncertainty, based on possibilistic logic, is proposed; for that purpose we deal with clauses weighted by a degree which is a lower bound of a necessity or a possibility measure, according to the nature of the uncertainty. Two resolution rules are used for coping with the different situations, and the refutation method can be generalized. 
Besides, the lower bounds are allowed to be functions of variables involved in the clause, which gives hypothetical reasoning capabilities. The relation between our approach and the idea of minimizing abnormality is briefly discussed. In the case where only lower bounds of necessity measures are involved, a semantics is proposed, in which the completeness of the extended resolution principle is proved. Moreover, deduction from a partially inconsistent knowledge base can be managed in this approach and displays some form of non-monotonicity.\nBayesian inference systems should be able to explain their reasoning to users, translating from numerical to natural language. Previous empirical work has investigated the correspondence between absolute probabilities and linguistic phrases. This study extends that work to the correspondence between changes in probabilities (updates) and relative probability phrases, such as \"much more likely\" or \"a little less likely.\" Subjects selected such phrases to best describe numerical probability updates. We examined three hypotheses about the correspondence, and found the most descriptively accurate of these three to be that each such phrase corresponds to a fixed difference in probability (rather than fixed ratio of probabilities or of odds). The empirically derived phrase selection function uses eight phrases and achieved a 72% accuracy in correspondence with the subjects' actual usage.\nWe describe a mechanism for performing probabilistic reasoning in influence diagrams using interval rather than point-valued probabilities. We derive the procedures for node removal (corresponding to conditional expectation) and arc reversal (corresponding to Bayesian conditioning) in influence diagrams where lower bounds on probabilities are stored at each node. 
The resulting bounds for the transformed diagram are shown to be optimal within the class of constraints on probability distributions that can be expressed exclusively as lower bounds on the component probabilities of the diagram. Sequences of these operations can be performed to answer probabilistic queries with indeterminacies in the input and for performing sensitivity analysis on an influence diagram. The storage requirements and computational complexity of this approach are comparable to those for point-valued probabilistic inference mechanisms, making the approach attractive for performing sensitivity analysis and where probability information is not available. Limited empirical data on an implementation of the methodology are provided.\nStochastic simulation approaches perform probabilistic inference in Bayesian networks by estimating the probability of an event based on the frequency that the event occurs in a set of simulation trials. This paper describes the evidence weighting mechanism, for augmenting the logic sampling stochastic simulation algorithm [Henrion, 1986]. Evidence weighting modifies the logic sampling algorithm by weighting each simulation trial by the likelihood of a network's evidence given the sampled state node values for that trial. We also describe an enhancement to the basic algorithm which uses the evidential integration technique [Chin and Cooper, 1987]. A comparison of the basic evidence weighting mechanism with the Markov blanket algorithm [Pearl, 1987], the logic sampling algorithm, and the evidence integration algorithm is presented. The comparison is aided by analyzing the performance of the algorithms in a simple example network.\nWe consider the relation between knowledge and certainty, where a fact is known if it is true at all worlds an agent considers possible and is certain if it holds with probability 1. We identify certainty with probabilistic belief. 
We show that if we assume one fixed probability assignment, then the logic KD45, which has been identified as perhaps the most appropriate for belief, provides a complete axiomatization for reasoning about certainty. Just as an agent may believe a fact phi although phi is false, he may be certain that a fact phi is true although phi is false. However, it is easy to see that an agent can have such false (probabilistic) beliefs only at a set of worlds of probability 0. If we restrict attention to structures where all worlds have positive probability, then S5 provides a complete axiomatization. If we consider a more general setting, where there might be a different probability assignment at each world, then by placing appropriate conditions on the support of the probability function (the set of worlds which have non-zero probability), we can capture many other well-known modal logics, such as T and S4. Finally, we consider which axioms characterize structures satisfying Miller's principle.\nWe introduce and analyze the problem of the compilation of decision models from a decision-theoretic perspective. The techniques described allow us to evaluate various configurations of compiled knowledge given the nature of evidential relationships in a domain, the utilities associated with alternative actions, the costs of run-time delays, and the costs of memory. We describe procedures for selecting a subset of the total observations available to be incorporated into a compiled situation-action mapping, in the context of a binary decision with conditional independence of evidence. The methods allow us to incrementally select the best pieces of evidence to add to the set of compiled knowledge in an engineering setting. 
After presenting several approaches to compilation, we exercise one of the methods to provide insight into the relationship between the distribution over weights of evidence and the preferred degree of compilation.\nWe examine a probabilistic model for the diagnosis of multiple diseases. In the model, diseases and findings are represented as binary variables. Also, diseases are marginally independent, features are conditionally independent given disease instances, and diseases interact to produce findings via a noisy OR-gate. An algorithm, called quickscore, for computing the posterior probability of each disease given a set of observed findings is presented. The time complexity of the algorithm is O(n m- 2^(m+)), where n is the number of diseases, m+ is the number of positive findings and m- is the number of negative findings. Although the time complexity of quickscore is exponential in the number of positive findings, the algorithm is useful in practice because the number of observed positive findings is usually far less than the number of diseases under consideration. Performance results for quickscore applied to a probabilistic version of Quick Medical Reference (QMR) are provided.\nWe introduce a graceful approach to probabilistic inference called bounded conditioning. Bounded conditioning monotonically refines the bounds on posterior probabilities in a belief network with computation, and converges on final probabilities of interest with the allocation of a complete resource fraction. The approach allows a reasoner to exchange arbitrary quantities of computational resource for incremental gains in inference quality. As such, bounded conditioning holds promise as a useful inference technique for reasoning under the general conditions of uncertain and varying reasoning resources. 
The algorithm solves a probabilistic bounding problem in complex belief networks by breaking the problem into a set of mutually exclusive, tractable subproblems and ordering their solution by the expected effect that each subproblem will have on the final answer. We introduce the algorithm, discuss its characterization, and present its performance on several belief networks, including a complex model for reasoning about problems in intensive-care medicine.\nIn this paper, we will review the process of evidence accumulation in the PSEIKI system for expectation-driven interpretation of images of 3-D scenes. Expectations are presented to PSEIKI as a geometrical hierarchy of abstractions. PSEIKI's job is then to construct abstraction hierarchies in the perceived image taking cues from the abstraction hierarchies in the expectations. The Dempster-Shafer formalism is used for associating belief values with the different possible labels for the constructed abstractions in the perceived image. This system has been used successfully for autonomous navigation of a mobile robot in indoor environments.\nThis paper argues that the principal difference between decision aids and most other types of information systems is the greater reliance of decision aids on fallible algorithms--algorithms that sometimes generate incorrect advice. It is shown that interactive problem solving with a decision aid that is based on a fallible algorithm can easily result in aided performance which is poorer than unaided performance, even if the algorithm, by itself, performs significantly better than the unaided decision maker. This suggests that unless certain conditions are satisfied, using a decision aid as an aid is counterproductive. 
Some conditions under which a decision aid is best used as an aid are derived.\nWe show an approach to automated control of machine vision systems based on incremental creation and evaluation of a particular family of influence diagrams that represent hypotheses of imagery interpretation and possible subsequent processing decisions. In our approach, model-based machine vision techniques are integrated with hierarchical Bayesian inference to provide a framework for representing and matching instances of objects and relationships in imagery and for accruing probabilities to rank order conflicting scene interpretations. We extend a result of Tatman and Shachter to show that the sequence of processing decisions derived from evaluating the diagrams at each stage is the same as the sequence that would have been derived by evaluating the final influence diagram that contains all random variables created during the run of the vision system.\nIn two recent papers, I have proposed a description of decision analysis that differs from the Bayesian picture painted by Savage, Jeffrey and other classic authors. Response to this view has been either overly enthusiastic or unduly pessimistic. In this paper I try to place the idea in its proper place, which must be somewhere in between. Looking at decision analysis as defeasible reasoning produces a framework in which planning and decision theory can be integrated, but work on the details has barely begun. It also produces a framework in which the meta-decision regress can be stopped in a reasonable way, but it does not allow us to ignore meta-level decisions. The heuristics for producing arguments that I have presented are only supposed to be suggestive; but they are not open to the egregious errors about which some have worried. 
And though the idea is familiar to those who have studied heuristic search, it is somewhat richer because the control of dialectic is more interesting than the deepening of search.\nThis paper presents some ideas and results of using uncertainty management methods in the presence of data in preference to other statistical and machine learning methods. A medical domain is used as a test-bed with data available from a large hospital database system which collects symptom and outcome information about patients. Data are often missing, of many variable types, and sample sizes for particular outcomes are not large. Uncertainty management methods are useful for such domains and have the added advantage of allowing for expert modification of belief values originally obtained from data. Methodological considerations for using belief functions on statistical data are dealt with in some detail. Expert opinions are incorporated at various levels of the project development and results are reported on an application to liver disease diagnosis. Recent results contrasting the use of weights of evidence and logistic regression on another medical domain are also presented.\nMany writers have observed that default logics appear to contain the \"lottery paradox\" of probability theory. This arises when a default \"proof by contradiction\" lets us conclude that a typical X is not a Y where Y is an unusual subclass of X. We show that there is a similar problem with default \"proof by cases\" and construct a setting where we might draw a different conclusion knowing a disjunction than we would knowing any particular disjunct. Though Reiter's original formalism is capable of representing this distinction, other approaches are not. To represent and reason about this case, default logicians must specify how a \"typical\" individual is selected. The problem is closely related to Simpson's paradox of probability theory. 
If we accept a simple probabilistic account of defaults based on the notion that one proposition may favour or increase belief in another, the \"multiple extension problem\" for both conjunctive and disjunctive knowledge vanishes.\nWe formulate Dempster Shafer Belief functions in terms of Propositional Logic using the implicit notion of provability underlying Dempster Shafer Theory. Given a set of propositional clauses, assigning weights to certain propositional literals enables the Belief functions to be explicitly computed using Network Reliability techniques. Also, the logical procedure corresponding to updating Belief functions using Dempster's Rule of Combination is shown. This analysis formalizes the implementation of Belief functions within an Assumption-based Truth Maintenance System (ATMS). We describe the extension of an ATMS-based visual recognition system, VICTORS, with this logical formulation of Dempster Shafer theory. Without Dempster Shafer theory, VICTORS computes all possible visual interpretations (i.e. all logical models) without determining the best interpretation(s). Incorporating Dempster Shafer theory enables optimal visual interpretations to be computed and a logical semantics to be maintained.\nA number of algorithms have been developed to solve probabilistic inference problems on belief networks. These algorithms can be divided into two main groups: exact techniques which exploit the conditional independence revealed when the graph structure is relatively sparse, and probabilistic sampling techniques which exploit the \"conductance\" of an embedded Markov chain when the conditional probabilities have non-extreme values. In this paper, we investigate a family of \"forward\" Monte Carlo sampling techniques similar to Logic Sampling [Henrion, 1988] which appear to perform well even in some multiply connected networks with extreme conditional probabilities, and thus would be generally applicable. 
We consider several enhancements which reduce the posterior variance using this approach and propose a framework and criteria for choosing when to use those enhancements.\nThree paediatric cardiologists assessed nearly 1000 imprecise subjective conditional probabilities for a simple belief network representing congenital heart disease, and the quality of the assessments has been measured using prospective data on 200 babies. Quality has been assessed by a Brier scoring rule, which decomposes into terms measuring lack of discrimination and reliability. The results are displayed for each of 27 diseases and 24 questions, and generally the assessments are reliable although there was a tendency for the probabilities to be too extreme. The imprecision allows the judgements to be converted to implicit samples, and by combining with the observed data the probabilities naturally adapt with experience. This appears to be a practical procedure even for reasonably large expert systems.\nAn algorithm for automated construction of a sparse Bayesian network given an unstructured probabilistic model and causal domain information from an expert has been developed and implemented. The goal is to obtain a network that explicitly reveals as much information regarding conditional independence as possible. The network is built incrementally, adding one node at a time. The expert's information and a greedy heuristic that tries to keep the number of arcs added at each step to a minimum are used to guide the search for the next node to add. The probabilistic model is a predicate that can answer queries about independencies in the domain. In practice the model can be implemented in various ways. For example, the model could be a statistical independence test operating on empirical data or a deductive prover operating on a set of independence statements about the domain.\nA primary motivation for reasoning under uncertainty is to derive decisions in the face of inconclusive evidence.
However, Shafer's theory of belief functions, which explicitly represents the underconstrained nature of many reasoning problems, lacks a formal procedure for making decisions. Clearly, when sufficient information is not available, no theory can prescribe actions without making additional assumptions. Faced with this situation, some assumption must be made if a clearly superior choice is to emerge. In this paper we offer a probabilistic interpretation of a simple assumption that disambiguates decision problems represented with belief functions. We prove that it yields expected values identical to those obtained by a probabilistic analysis that makes the same assumption. In addition, we show how the decision analysis methodology frequently employed in probabilistic reasoning can be extended for use with belief functions.\nThis study compares the inherent intuitiveness or usability of the most prominent methods for managing uncertainty in expert systems, including those of EMYCIN, PROSPECTOR, Dempster-Shafer theory, fuzzy set theory, simplified probability theory (assuming marginal independence), and linear regression using probability estimates. Participants in the study gained experience in a simple, hypothetical problem domain through a series of learning trials. They were then randomly assigned to develop an expert system using one of the six Uncertain Inference Systems (UISs) listed above. Performance of the resulting systems was then compared. The results indicate that the systems based on the PROSPECTOR and EMYCIN models were significantly less accurate for certain types of problems compared to systems based on the other UISs. Possible reasons for these differences are discussed.\nRecent advances in Bayesian reinforcement learning (BRL) have shown that Bayes-optimality is theoretically achievable by modeling the environment's latent dynamics using Flat-Dirichlet-Multinomial (FDM) prior. 
In self-interested multi-agent environments, the transition dynamics are mainly controlled by the other agent's stochastic behavior for which FDM's independence and modeling assumptions do not hold. As a result, FDM does not allow the other agent's behavior to be generalized across different states nor specified using prior domain knowledge. To overcome these practical limitations of FDM, we propose a generalization of BRL to integrate the general class of parametric models and model priors, thus allowing practitioners' domain knowledge to be exploited to produce a fine-grained and compact representation of the other agent's behavior. Empirical evaluation shows that our approach outperforms existing multi-agent reinforcement learning algorithms.\nThis paper is part of a study whose goal is to show the efficiency of using Bayes networks to carry out model based vision calculations [Binford et al. 1987]. Recognition proceeds by drawing up a network model from the object's geometric and functional description that predicts the appearance of an object. Then this network is used to find the object within a photographic image. Many existing and proposed techniques for vision recognition resemble the uncertainty calculations of a Bayes net. In contrast, though, they lack a derivation from first principles, and tend to rely on arbitrary parameters that we hope to avoid by a network model. The connectedness of the network depends on what independence considerations can be identified in the vision problem. Greater independence leads to easier calculations, at the expense of the net's expressiveness. Once this trade-off is made and the structure of the network is determined, it should be possible to tailor a solution technique for it. This paper explores the use of a network with multiply connected paths, drawing on both techniques of belief networks [Pearl 86] and influence diagrams.
We then demonstrate how one formulation of a multiply connected network can be solved.\nIn Probabilistic Logic Nilsson uses the device of a probability distribution over a set of possible worlds to assign probabilities to the sentences of a logical language. In his paper Nilsson concentrated on inference and associated computational issues. This paper, on the other hand, examines the probabilistic semantics in more detail, particularly for the case of first-order languages, and attempts to explain some of the features and limitations of this form of probability logic. It is pointed out that the device of assigning probabilities to logical sentences has certain expressive limitations. In particular, statistical assertions are not easily expressed by such a device. This leads to certain difficulties with attempts to give probabilistic semantics to default reasoning using probabilities assigned to logical sentences.\nDempster/Shafer (D/S) theory has been advocated as a way of representing incompleteness of evidence in a system's knowledge base. Methods now exist for propagating beliefs through chains of inference. This paper discusses how rules with attached beliefs, a common representation for knowledge in automated reasoning systems, can be transformed into the joint belief functions required by propagation algorithms. A rule is taken as defining a conditional belief function on the consequent given the antecedents. It is demonstrated by example that different joint belief functions may be consistent with a given set of rules. Moreover, different representations of the same rules may yield different beliefs on the consequent hypotheses.\nThis paper discusses a project undertaken between the Departments of Computing Science, Statistics, and the College of Veterinary Medicine to design a medical diagnostic system. On-line medical data has been collected in the hospital database system for several years. 
A number of induction methods are being used to extract knowledge from the data in an attempt to improve upon simple diagnostic charts used by the clinicians. They also enhance the results of classical statistical methods, finding many more significant variables. The second part of the paper describes an essentially Bayesian method of evidence combination using fuzzy events at an initial step. Results are presented and comparisons are made with other methods.\nKNET is a general-purpose shell for constructing expert systems based on belief networks and decision networks. Such networks serve as graphical representations for decision models, in which the knowledge engineer must define clearly the alternatives, states, preferences, and relationships that constitute a decision basis. KNET contains a knowledge-engineering core written in Object Pascal and an interface that tightly integrates HyperCard, a hypertext authoring tool for the Apple Macintosh computer, into a novel expert-system architecture. Hypertext and hypermedia have become increasingly important in the storage, management, and retrieval of information. In broad terms, hypermedia deliver heterogeneous bits of information in dynamic, extensively cross-referenced packages. The resulting KNET system features a coherent probabilistic scheme for managing uncertainty, an object-oriented graphics editor for drawing and manipulating decision networks, and HyperCard's potential for quickly constructing flexible and friendly user interfaces. We envision KNET as a useful prototyping tool for our ongoing research on a variety of Bayesian reasoning problems, including tractable representation, inference, and explanation.\nPredicting the future is an important component of decision making. In most situations, however, there is not enough information to make accurate predictions. In this paper, we develop a theory of causal reasoning for predictive inference under uncertainty.
We emphasize a common type of prediction that involves reasoning about persistence: whether or not a proposition once made true remains true at some later time. We provide a decision procedure with a polynomial-time algorithm for determining the probability of the possible consequences of a set of events and initial conditions. The integration of simple probability theory with temporal projection enables us to circumvent problems that nonmonotonic temporal reasoning schemes have in dealing with persistence. The ideas in this paper have been implemented in a prototype system that refines a database of causal rules in the course of applying those rules to construct and carry out plans in a manufacturing domain.\nA new approach for uncertainty management for fuzzy, rule based decision support systems is proposed: The domain expert's knowledge is expressed by a set of rules that frequently refer to vague and uncertain propositions. The certainty of propositions is represented using intervals [a, b] expressing that the proposition's probability is at least a and at most b. Methods and techniques for computing the overall certainty of fuzzy compound propositions that have been defined by using logical connectives 'and', 'or' and 'not' are introduced. Different inference schemas for applying fuzzy rules by using modus ponens are discussed. Different algorithms for combining evidence that has been received from different rules for the same proposition are provided. The relationship of the approach to other approaches is analyzed and its problems of knowledge acquisition and knowledge representation are discussed in some detail. The basic concepts of a rule-based programming language called PICASSO, for which the approach is a theoretical foundation, are outlined.\nThe practice of stochastic sensitivity analysis described in the decision analysis literature is a testimonial to the need for considering deviations from precise point estimates of uncertainty.
We propose the use of Bayesian fuzzy probabilities within an influence diagram computational scheme for performing sensitivity analysis during the solution of probabilistic inference and decision problems. Unlike other parametric approaches, the proposed scheme does not require resolving the problem for the varying probability point estimates. We claim that the solution to fuzzy influence diagrams provides as much information as the classical point estimate approach plus additional interval information that is useful for stochastic sensitivity analysis. An example based on diagnostic decision making in microcomputer assembly is used to illustrate this idea.\nMany real world models can be characterized as weak, meaning that there is significant uncertainty in both the data input and inferences. This lack of determinism makes it especially difficult for users of computer decision aids to understand and have confidence in the models. This paper presents a representation for uncertainty and utilities that serves as a framework for graphical summary and computer-generated explanation of decision models. The application described that tests the methodology is a computer decision aid designed to enhance the clinician-patient consultation process for patients with angina (chest pain due to lack of blood flow to the heart muscle). The angina model is represented as a Bayesian decision network. Additionally, the probabilities and utilities are treated as random variables with probability distributions on their range of possible values. The initial distributions represent information on all patients with anginal symptoms, and the approach allows for rapid tailoring to more patient-specific distributions.
This framework provides a metric for judging the importance of each variable in the model dynamically.\nThere has long been debate about the relative merits of decision theoretic methods and heuristic rule-based approaches for reasoning under uncertainty. We report an experimental comparison of the performance of the two approaches to troubleshooting, specifically to test selection for fault diagnosis. We use as an experimental testbed the problem of diagnosing motorcycle engines. The first approach employs heuristic test selection rules obtained from expert mechanics. We compare it with the optimal decision analytic algorithm for test selection which employs estimated component failure probabilities and test costs. The decision analytic algorithm was found to reduce the expected cost (i.e. time) to arrive at a diagnosis by an average of 14% relative to the expert rules. Sensitivity analysis shows the results are quite robust to inaccuracy in the probability and cost estimates. This difference suggests some interesting implications for knowledge acquisition.\nThis paper describes experiments, on two domains, to investigate the effect of averaging over predictions of multiple decision trees, instead of using a single tree. Other authors have pointed out theoretical and commonsense reasons for preferring the multiple tree approach. Ideally, we would like to consider predictions from all trees, weighted by their probability. However, there is a vast number of different trees, and it is difficult to estimate the probability of each tree. We sidestep the estimation problem by using a modified version of the ID3 algorithm to build good trees, and average over only these trees. Our results are encouraging. For each domain, we managed to produce a small number of good trees.
We find that it is best to average across sets of trees with different structure; this usually gives better performance than any of the constituent trees, including the ID3 tree.\nThere is much interest in providing probabilistic semantics for defaults but most approaches seem to suffer from one of two problems: either they require numbers, a problem defaults were intended to avoid, or they generate peculiar side effects. Rather than provide semantics for defaults, we address the problem defaults were intended to solve: that of reasoning under uncertainty where numeric probability distributions are not available. We describe a non-numeric formalism called an inference graph based on standard probability theory, conditional independence and sentences of favouring, where a favours b, written favours(a, b), if and only if p(a|b) > p(a). The formalism seems to handle the examples from the nonmonotonic literature. Most importantly, the sentences of our system can be verified by performing an appropriate experiment in the semantic domain.\nTechniques for decision making with knowledge of linear constraints on condition probabilities are examined. These constraints arise naturally in many situations: upper and lower condition probabilities are known; an ordering among the probabilities is determined; marginal probabilities or bounds on such probabilities are known, e.g., data are available in the form of a probabilistic database (Cavallo and Pittarelli, 1987a); etc. Standard situations of decision making under risk and uncertainty may also be characterized by linear constraints. Each of these types of information may be represented by a convex polyhedron of numerically determinate condition probabilities.
A uniform approach to decision making under risk, uncertainty, and partial uncertainty based on a generalized version of a criterion of Hurwicz is proposed. Methods for processing marginal probabilities to improve decision making using any of the criteria discussed are presented.\nRecent developments using directed acyclical graphs (i.e., influence diagrams and Bayesian networks) for knowledge representation have lessened the problems of using probability in knowledge-based systems (KBS). Most current research involves the efficient propagation of new evidence, but little has been done concerning the maintenance of domain-specific knowledge, which includes the probabilistic information about the problem domain. By making use of conditional independencies represented in the graphs, however, probability assessments are required only for certain variables when the knowledge base is updated. The purpose of this study was to investigate, for those variables which require probability assessments, ways to reduce the amount of new knowledge required from the expert when updating probabilistic information in a probabilistic knowledge-based system. Three special cases (ignored outcome, split outcome, and assumed constraint outcome) were identified under which many of the original probabilities (those already in the knowledge-base) do not need to be reassessed when maintenance is required.\nThis paper examines two related problems that are central to developing an autonomous decision-making agent, such as a robot. Both problems require generating structured representations from a database of unstructured declarative knowledge that includes many facts and rules that are irrelevant in the problem context. The first problem is how to generate a well structured decision problem from such a database. The second problem is how to generate, from the same database, a well-structured explanation of why some possible world occurred.
In this paper it is shown that the problem of generating the appropriate decision structure or explanation is intractable without introducing further constraints on the knowledge in the database. The paper proposes that the problem search space can be constrained by adding knowledge to the database about causal relations between events. In order to determine the causal knowledge that would be most useful, causal theories for deterministic and indeterministic universes are proposed. A program that uses some of these causal constraints has been used to generate explanations about faulty plans. The program shows the expected increase in efficiency as the causal constraints are introduced.\nWith the desire to apply the Dempster-Shafer theory to complex real world problems where the evidential strength is often imprecise and vague, several attempts have been made to generalize the theory. However, the important concept in the D-S theory that the belief and plausibility functions are lower and upper probabilities is no longer preserved in these generalizations. In this paper, we describe a generalized theory of evidence where the degree of belief in a fuzzy set is obtained by minimizing the probability of the fuzzy set under the constraints imposed by a basic probability assignment. To formulate the probabilistic constraint of a fuzzy focal element, we decompose it into a set of consonant non-fuzzy focal elements. By generalizing the compatibility relation to a possibility theory, we are able to justify our generalization to Dempster's rule based on possibility distribution. Our generalization not only extends the application of the D-S theory but also illustrates a way that probability theory and fuzzy set theory can be combined to deal with different kinds of uncertain information in AI systems.\nA number of writers have supposed that for the full specification of belief, higher order probabilities are required.
Some have even supposed that there may be an unending sequence of higher order probabilities of probabilities of probabilities.... In the present paper we show that higher order probabilities can always be replaced by the marginal distributions of joint probability distributions. We consider both the case in which higher order probabilities are of the same sort as lower order probabilities and that in which higher order probabilities are distinct in character, as when lower order probabilities are construed as frequencies and higher order probabilities are construed as subjective degrees of belief. In neither case do higher order probabilities appear to offer any advantages, either conceptually or computationally.\nThe domain of spare parts forecasting is examined, and is found to present unique uncertainty based problems in the architectural design of a knowledge-based system. A mixture of different uncertainty paradigms is required for the solution, with an intriguing combinatoric problem arising from an uncertain choice of inference engines. Thus, uncertainty in the system is manifested in two different meta-levels. The different uncertainty paradigms and meta-levels must be integrated into a functioning whole. FRED is an example of a difficult real-world domain to which no existing uncertainty approach is completely appropriate. This paper discusses the architecture of FRED, highlighting: the points of uncertainty and other interesting features of the domain, the specific implications of those features on the system design (including the combinatoric explosions), their current implementation and future plans, and other problems and issues with the architecture.\nThis paper examines Bayesian belief network inference using simulation as a method for computing the posterior probabilities of network variables. Specifically, it examines the use of a method described by Henrion, called logic sampling, and a method described by Pearl, called stochastic simulation.
We first review the conditions under which logic sampling is computationally infeasible. Such cases motivated the development of Pearl's stochastic simulation algorithm. We have found that this stochastic simulation algorithm, when applied to certain networks, leads to much slower than expected convergence to the true posterior probabilities. This behavior is a result of the tendency for local areas in the network to become fixed through many simulation cycles. The time required to obtain significant convergence can be made arbitrarily long by strengthening the probabilistic dependency between nodes. We propose the use of several forms of graph modification, such as graph pruning, arc reversal, and node reduction, in order to convert some networks into formats that are computationally more efficient for simulation.\nThis paper describes NAIVE, a low-level knowledge representation language and inferencing process. NAIVE has been designed for reasoning about nondeterministic dynamic systems like those found in medicine. Knowledge is represented in a graph structure consisting of nodes, which correspond to the variables describing the system of interest, and arcs, which correspond to the procedures used to infer the value of a variable from the values of other variables. The value of a variable can be determined at an instant in time, over a time interval or for a series of times. Information about the value of a variable is expressed as a probability density function which quantifies the likelihood of each possible value. The inferencing process uses these probability density functions to propagate uncertainty. NAIVE has been used to develop medical knowledge bases including over 100 variables.\nThis paper demonstrates a methodology for examining the accuracy of uncertain inference systems (UIS), after their parameters have been optimized, and does so for several common UIS's.
This methodology may be used to test the accuracy when either the prior assumptions or updating formulae are not exactly satisfied. Surprisingly, these UIS's were revealed to be no more accurate on the average than a simple linear regression. Moreover, even on prior distributions which were deliberately biased so as to give very good accuracy, they were less accurate than the simple probabilistic model which assumes marginal independence between inputs. This demonstrates that the importance of updating formulae can outweigh that of prior assumptions. Thus, when UIS's are judged by their final accuracy after optimization, we get completely different results than when they are judged by whether or not their prior assumptions are perfectly satisfied.\nThis paper considers the problem of invoking auxiliary, unobservable variables to facilitate the structuring of causal tree models for a given set of continuous variables. Paralleling the treatment of bi-valued variables in [Pearl 1986], we show that if a collection of coupled variables are governed by a joint normal distribution and a tree-structured representation exists, then both the topology and all internal relationships of the tree can be uncovered by observing pairwise dependencies among the observed variables (i.e., the leaves of the tree). Furthermore, the conditions for normally distributed variables are less restrictive than those governing bi-valued variables. The result extends the applications of causal tree models which were found useful in evidential reasoning tasks.\nThe Dempster-Shafer theory has been extended recently for its application to expert systems. However, implementing the extended D-S reasoning model in rule-based systems greatly complicates the task of generating informative explanations.
By implementing GERTIS, a prototype system for diagnosing rheumatoid arthritis, we show that two kinds of knowledge are essential for explanation generation: (1) taxonomic class relationships between hypotheses and (2) pointers to the rules that significantly contribute to belief in the hypothesis. As a result, the knowledge represented in GERTIS is richer and more complex than that of conventional rule-based systems. GERTIS not only demonstrates the feasibility of rule-based evidential-reasoning systems, but also suggests ways to generate better explanations, and to explicitly represent various useful relationships among hypotheses and rules.\nWhen creating an expert system, the most difficult and expensive task is constructing a knowledge base. This is particularly true if the problem involves noisy data and redundant measurements. This paper shows how to modify the MACIE process for generating connectionist expert systems from training examples so that it can accommodate noisy and redundant data. The basic idea is to dynamically generate appropriate training examples by constructing both a 'deep' model and a noise model for the underlying problem. The use of winner-take-all groups of variables is also discussed. These techniques are illustrated with a small example that would be very difficult for standard expert system approaches.\nThe multiple extension problem arises frequently in diagnostic and default inference. That is, we can often use any of a number of sets of defaults or possible hypotheses to explain observations or make predictions. In default inference, some extensions seem to be simply wrong and we use qualitative techniques to weed out the unwanted ones. In the area of diagnosis, however, the multiple explanations may all seem reasonable, however improbable. Choosing among them is a matter of quantitative preference. Quantitative preference works well in diagnosis when knowledge is modelled causally.
Here we suggest a single unified framework that combines probabilities and defaults and retains the semantics of diagnosis as construction of explanations from a fixed set of possible hypotheses. We can then compute probabilities incrementally as we construct explanations. Here we describe a branch and bound algorithm that maintains a set of all partial explanations while exploring a most promising one first. A most probable explanation is found first if explanations are partially ordered.\nMuch of the controversy about methods for automated decision making has focused on specific calculi for combining beliefs or propagating uncertainty. We broaden the debate by (1) exploring the constellation of secondary tasks surrounding any primary decision problem, and (2) identifying knowledge engineering concerns that present additional representational tradeoffs. We argue on pragmatic grounds that the attempt to support all of these tasks within a single calculus is misguided. In the process, we note several uncertain reasoning objectives that conflict with the Bayesian ideal of complete specification of probabilities and utilities. In response, we advocate treating the uncertainty calculus as an object language for reasoning mechanisms that support the secondary tasks. Arguments against Bayesian decision theory are weakened when the calculus is relegated to this role. Architectures for uncertainty handling that take statements in the calculus as objects to be reasoned about offer the prospect of retaining normative status with respect to decision making while supporting the other tasks in uncertain reasoning.\nOur previous work on classifying complex ship images [1,2] has evolved into an effort to develop software tools for building and solving generic classification problems. Managing the uncertainty associated with feature data and other evidence is an important issue in this endeavor.
Bayesian techniques for managing uncertainty [7,12,13] have proven to be useful for managing several of the belief maintenance requirements of classification problem solving. One such requirement is the need to give qualitative explanations of what is believed. Pearl [11] addresses this need by computing what he calls a belief commitment, the most probable instantiation of all hypothesis variables given the evidence available. Before belief commitments can be computed, the straightforward implementation of Pearl's procedure involves finding an analytical solution to some often difficult optimization problems. We describe an efficient implementation of this procedure using tensor products that solves these problems enumeratively and avoids the need for case by case analysis. The procedure is thereby made more practical to use in the general case.\nA major aspect of human reasoning involves the use of approximations. Particularly in situations where the decision-making process is under stringent time constraints, decisions are based largely on approximate, qualitative assessments of the situations. Our work is concerned with the application of approximate reasoning to real-time control. Because of the stringent processing speed requirements in such applications, hardware implementations of fuzzy logic inferencing are being pursued. We describe a programming environment for translating fuzzy control rules into hardware realizations. Two methods of hardware realizations are possible. The first is based on a special purpose chip for fuzzy inferencing. The second is based on a simple memory chip. The ability to directly translate a set of decision rules into hardware implementations is expected to make fuzzy control an increasingly practical approach to the control of complex systems.\nReasoning under uncertainty in AI has come to mean assessing the credibility of hypotheses inferred from evidence.
But techniques for assessing credibility do not tell a problem solver what to do when it is uncertain. This is the focus of our current research. We have developed a medical expert system called MUM, for Managing Uncertainty in Medicine, that plans diagnostic sequences of questions, tests, and treatments. This paper describes the kinds of problems that MUM was designed to solve and gives a brief description of its architecture. More recently, we have built an empty version of MUM called MU, and used it to reimplement MUM and a small diagnostic system for plant pathology. The latter part of the paper describes the features of MU that make it appropriate for building expert systems that manage uncertainty.\nA complete approach to reasoning under uncertainty requires support for incremental and interactive formulation and revision of, as well as reasoning with, models of the problem domain capable of representing our uncertainty. We present a hybrid reasoning scheme which combines symbolic and numeric methods for uncertainty management to provide efficient and effective support for each of these tasks. The hybrid is based on symbolic techniques adapted from Assumption-based Truth Maintenance systems (ATMS), combined with numeric methods adapted from the Dempster/Shafer theory of evidence, as extended in Baldwin's Support Logic Programming system. The hybridization is achieved by viewing an ATMS as a symbolic algebra system for uncertainty calculations. 
This technique has several major advantages over conventional methods for performing inference with numeric certainty estimates in addition to the ability to dynamically determine hypothesis spaces, including improved management of dependent and partially independent evidence, faster run-time evaluation of propositional certainties, the ability to query the certainty value of a proposition from multiple perspectives, and the ability to incrementally extend or revise domain models.\nMany robotic sensor estimation problems can be characterized in terms of nonlinear measurement systems. These systems are contaminated with noise and may be underdetermined from a single observation. In order to get reliable estimation results, the system must choose views which result in an overdetermined system. This is the sensor control problem. Accurate and reliable sensor control requires an estimation procedure which yields both estimates and measures of its own performance. In the case of nonlinear measurement systems, computationally simple closed-form estimation solutions may not exist. However, approximation techniques provide viable alternatives. In this paper, we evaluate three estimation techniques: the extended Kalman filter, a discrete Bayes approximation, and an iterative Bayes approximation. We present mathematical results and simulation statistics illustrating operating conditions where the extended Kalman filter is inappropriate for sensor control, and discuss issues in the use of the discrete Bayes approximation.\nAlthough many investigators affirm a desire to build reasoning systems that behave consistently with the axiomatic basis defined by probability theory and utility theory, limited resources for engineering and computation can make a complete normative analysis impossible.
We attempt to move discussion beyond the debate over the scope of problems that can be handled effectively to cases where it is clear that there are insufficient computational resources to perform an analysis deemed as complete. Under these conditions, we stress the importance of considering the expected costs and benefits of applying alternative approximation procedures and heuristics for computation and knowledge acquisition. We discuss how knowledge about the structure of user utility can be used to control value tradeoffs for tailoring inference to alternative contexts. We address the notion of real-time rationality, focusing on the application of knowledge about the expected timewise-refinement abilities of reasoning strategies to balance the benefits of additional computation with the costs of acting with a partial result. We discuss the benefits of applying decision theory to control the solution of difficult problems given limitations and uncertainty in reasoning resources.\nAfter experimenting with a number of non-probabilistic methods for dealing with uncertainty many researchers reaffirm a preference for probability methods [1] [2], although this remains controversial. The importance of being able to form decisions from incomplete data in diagnostic problems has highlighted probabilistic methods [5] which compute posterior probabilities from prior distributions in a way similar to Bayes Rule, and thus are called Bayesian methods. This paper documents the use of a Bayesian method in a real time problem which is similar to medical diagnosis in that there is a need to form decisions and take some action without complete knowledge of conditions in the problem domain. 
This particular method has a limitation which is discussed.\nThis paper presents a methodology for research and development of the inferencing and knowledge representation aspects of an Expert System approach for performing reasoning under uncertainty in support of a real time vehicle guidance and navigation system. Such a system could be of major benefit for non-terrain following low altitude flight systems operating in foreign hostile environments such as might be experienced by NOE helicopter or similar mission craft. An innovative extension of the evidential reasoning methodology, termed the Sum-and-Lattice-Points Method, has been developed. The research and development effort presented in this paper consists of a formal mathematical development of the Sum-and-Lattice-Points Method, its formulation and representation in a parallel environment, prototype software development of the method within an expert system, and initial testing of the system within the confines of the vehicle guidance system.\nOne of the most important aspects of current expert systems technology is the ability to make causal inferences about the impact of new evidence. When the domain knowledge and problem knowledge are uncertain and incomplete Bayesian reasoning has proven to be an effective way of forming such inferences [3,4,8]. While several reasoning schemes have been developed based on Bayes Rule, there has been very little work examining the comparative effectiveness of these schemes in a real application. This paper describes a knowledge based system for ship classification [1], originally developed using the PROSPECTOR updating method [2], that has been reimplemented to use the inference procedure developed by Pearl and Kim [4,5]. 
We discuss our reasons for making this change, the implementation of the new inference engine, and the comparative performance of the two versions of the system.\nThe discovery that the minimax decision rule performs poorly in some games has sparked interest in possible alternatives to minimax. Until recently, the only games in which minimax was known to perform poorly were games which were mainly of theoretical interest. However, this paper reports results showing poor performance of minimax in a more common game called kalah. For the kalah games tested, a non-minimax decision rule called the product rule performs significantly better than minimax. This paper also discusses a possible way to predict whether or not minimax will perform well in a game when compared to product. A parameter called the rate of heuristic flaw (rhf) has been found to correlate positively with the performance of product against minimax. Both analytical and experimental results are given that appear to support the predictive power of rhf.\nIn this paper, we examine the concept of modularity, an often-cited advantage of the rule-based representation methodology. We argue that the notion of modularity consists of two distinct concepts which we call syntactic modularity and semantic modularity. We argue that when reasoning under certainty, it is reasonable to regard the rule-based approach as both syntactically and semantically modular. However, we argue that in the case of plausible reasoning, rules are syntactically modular but are rarely semantically modular. To illustrate this point, we examine a particular approach for managing uncertainty in rule-based systems called the MYCIN certainty factor model. We formally define the concept of semantic modularity with respect to the certainty factor model and discuss logical consequences of the definition. We show that the assumption of semantic modularity imposes strong restrictions on rules in a knowledge base.
We argue that such restrictions are rarely valid in practical applications. Finally, we suggest how the concept of semantic modularity can be relaxed in a manner that makes it appropriate for plausible reasoning.\nIn the 1940's, a physicist named Cox provided the first formal justification for the axioms of probability based on the subjective or Bayesian interpretation. He showed that if a measure of belief satisfies several fundamental properties, then the measure must be some monotonic transformation of a probability. In this paper, measures of change in belief or belief updates are examined. In the spirit of Cox, properties for a measure of change in belief are enumerated. It is shown that if a measure satisfies these properties, it must satisfy other restrictive conditions. For example, it is shown that belief updates in a probabilistic context must be equal to some monotonic transformation of a likelihood ratio. It is hoped that this formal explication of the belief update paradigm will facilitate critical discussion and useful extensions of the approach.\nThere has been a considerable amount of work on uncertainty in knowledge-based systems. This work has generally been concerned with uncertainty arising from the strength of inferences and the weight of evidence. In this paper we discuss another type of uncertainty: that which is due to imprecision in the underlying primitives used to represent the knowledge of the system. In particular, a given word may denote many similar but not identical entities. Such words are said to be lexically imprecise. Lexical imprecision has caused widespread problems in many areas. Unless this phenomenon is recognized and appropriately handled, it can degrade the performance of knowledge-based systems. In particular, it can lead to difficulties with the user interface, and with the inferencing processes of these systems. 
Some techniques are suggested for coping with this phenomenon.\nExplanation facilities are a particularly important feature of expert system frameworks. It is an area in which traditional rule-based expert system frameworks have had mixed results. While explanations about control are well handled, facilities are needed for generating better explanations concerning knowledge base content. This paper approaches the explanation problem by examining the effect an event has on a variable of interest within a symmetric Bayesian inferencing system. We argue that any effect measure operating in this context must satisfy certain properties. Such a measure is proposed. It forms the basis for an explanation facility which allows the user of the Generalized Bayesian Inferencing System to question the meaning of the knowledge base. That facility is described in detail.\nThis paper extends the applications of belief-networks to include the revision of belief commitments, i.e., the categorical acceptance of a subset of hypotheses which, together, constitute the most satisfactory explanation of the evidence at hand. A coherent model of non-monotonic reasoning is established and distributed algorithms for belief revision are presented. We show that, in singly connected networks, the most satisfactory explanation can be found in linear time by a message-passing algorithm similar to the one used in belief updating. In multiply-connected networks, the problem may be exponentially hard but, if the network is sparse, topological considerations can be used to render the interpretation task tractable. In general, finding the most probable combination of hypotheses is no more complex than computing the degree of belief for any individual hypothesis. Applications to medical diagnosis are illustrated.\nResults on approximate deduction in the context of the calculus of evidence of Dempster-Shafer and the theory of interval probabilities are reported. 
Approximate conditional knowledge about the truth of conditional propositions was assumed available and expressed as sets of possible values (actually numeric intervals) of conditional probabilities. Under different interpretations of this conditional knowledge, several formulas were produced to integrate unconditioned estimates (assumed given as sets of possible values of unconditioned probabilities) with conditional estimates. These formulas are discussed together with the computational characteristics of the methods derived from them. Of particular importance is one such evidence integration formulation, produced under a belief oriented interpretation, which incorporates both modus ponens and modus tollens inferential mechanisms, allows integration of conditioned and unconditioned knowledge without resorting to iterative or sequential approximations, and produces elementary mass distributions as outputs using similar distributions as inputs.\nThe causal Bayesian approach is based on the assumption that effects (e.g., symptoms) that are not conditionally independent with respect to some causal agent (e.g., a disease) are conditionally independent with respect to some intermediate state caused by the agent, (e.g., a pathological condition). This paper describes the development of a causal Bayesian model for the diagnosis of appendicitis. The paper begins with a description of the standard Bayesian approach to reasoning about uncertainty and the major critiques it faces. The paper then lays the theoretical groundwork for the causal extension of the Bayesian approach, and details specific improvements we have developed. The paper then goes on to describe our knowledge engineering and implementation and the results of a test of the system. 
The paper concludes with a discussion of how the causal Bayesian approach deals with the criticisms of the standard Bayesian model and why it is superior to alternative approaches to reasoning about uncertainty popular in the AI community.\nThis paper examines the biases and performance of several uncertain inference systems: Mycin, a variant of Mycin, and a simplified version of probability using conditional independence assumptions. We present axiomatic arguments for using Minimum Cross Entropy inference as the best way to do uncertain inference. For Mycin and its variant we found special situations where their performance was very good, but also situations where performance was worse than random guessing, or where data was interpreted as having the opposite of its true import. We have found that all three of these systems usually gave accurate results, and that the conditional independence assumptions gave the most robust results. We illustrate how the importance of biases may be quantitatively assessed and ranked. Considerations of robustness might be a critical factor in selecting UISs for a given application.\nWe propose an inequality paradigm for probabilistic reasoning based on a logic of upper and lower bounds on conditional probabilities. We investigate a family of probabilistic logics, generalizing the work of Nilsson [14]. We develop a variety of logical notions for probabilistic reasoning, including soundness, completeness, justification, and convergence: reduction of a theory to a simpler logical class. We argue that a bound view is especially useful for describing the semantics of probabilistic knowledge representation and for describing intermediate states of probabilistic inference and updating. We show that the Dempster-Shafer theory of evidence is formally identical to a special case of our generalized probabilistic logic.
Our paradigm thus incorporates both Bayesian \"rule-based\" approaches and avowedly non-Bayesian \"evidential\" approaches such as MYCIN and Dempster-Shafer. We suggest how to integrate the two \"schools\", and explore some possibilities for novel synthesis of a variety of ideas in probabilistic reasoning.\nThis paper examines the quantities used by MYCIN to reason with uncertainty, called certainty factors. It is shown that the original definition of certainty factors is inconsistent with the functions used in MYCIN to combine the quantities. This inconsistency is used to argue for a redefinition of certainty factors in terms of the intuitively appealing desiderata associated with the combining functions. It is shown that this redefinition accommodates an unlimited number of probabilistic interpretations. These interpretations are shown to be monotonic transformations of the likelihood ratio p(E|H)/p(E|¬H). The construction of these interpretations provides insight into the assumptions implicit in the certainty factor model. In particular, it is shown that if uncertainty is to be propagated through an inference network in accordance with the desiderata, evidence must be conditionally independent given the hypothesis and its negation, and the inference network must have a tree structure. It is emphasized that assumptions implicit in the model are rarely true in practical applications. Methods for relaxing the assumptions are suggested.\nThe use of maximum entropy inference in reasoning with uncertain information is commonly justified by an information-theoretic argument. This paper discusses a possible objection to this information-theoretic justification and shows how it can be met. I then compare maximum entropy inference with certain other currently popular methods for uncertain reasoning.
In making such a comparison, one must distinguish between static and dynamic theories of degrees of belief: a static theory concerns the consistency conditions for degrees of belief at a given time, whereas a dynamic theory concerns how one's degrees of belief should change in the light of new information. It is argued that maximum entropy is a dynamic theory and that a complete theory of uncertain reasoning can be obtained by combining maximum entropy inference with probability theory, which is a static theory. This total theory, I argue, is much better grounded than are other theories of uncertain reasoning.\nDuda, Hart, and Nilsson have set forth a method for rule-based inference systems to use in updating the probabilities of hypotheses on the basis of multiple items of new evidence. Pednault, Zucker, and Muresan claimed to give conditions under which independence assumptions made by Duda et al. preclude updating, that is, prevent the evidence from altering the probabilities of the hypotheses. Glymour refutes Pednault et al.'s claim with a counterexample of a rather special form (one item of evidence is incompatible with all but one of the hypotheses); he raises, but leaves open, the question whether their result would be true with an added assumption to rule out such special cases. We show that their result does not hold even with the added assumption, but that it can nevertheless be largely salvaged. Namely, under the conditions assumed by Pednault et al., at most one of the items of evidence can alter the probability of any given hypothesis; thus, although updating is possible, multiple updating for any of the hypotheses is precluded.\nSeveral different uncertain inference systems (UISs) have been developed for representing uncertainty in rule-based expert systems.
Some of these, such as Mycin's Certainty Factors, Prospector, and Bayes' Networks, were designed as approximations to probability, and others, such as Fuzzy Set Theory and Dempster-Shafer Belief Functions, were not. How different are these UISs in practice, and does it matter which you use? When combining and propagating uncertain information, each UIS must, at least by implication, make certain assumptions about correlations not explicitly specified. The maximum entropy principle with minimum cross-entropy updating provides a way of making assumptions about the missing specification that minimizes the additional information assumed, and thus offers a standard against which the other UISs can be compared. We describe a framework for the experimental comparison of the performance of different UISs, and provide some illustrative results.\nThe form and justification of inductive inference rules depend strongly on the representation of uncertainty. This paper examines one generic representation, namely, incomplete information. The notion can be formalized by presuming that the relevant probabilities in a decision problem are known only to the extent that they belong to a class K of probability distributions. The concept is a generalization of a frequent suggestion that uncertainty be represented by intervals or ranges on probabilities. To make the representation useful for decision making, an inductive rule can be formulated which determines, in a well-defined manner, a best approximation to the unknown probability, given the set K. In addition, the knowledge set notion entails a natural procedure for updating: modifying the set K given new evidence. Several non-intuitive consequences of updating emphasize the differences between inference with complete and inference with incomplete information.\nExpert systems applications that involve uncertain inference can be represented by a multidimensional contingency table.
These tables offer a general approach to inferring with uncertain evidence, because they can embody any form of association between any number of pieces of evidence and conclusions. (Simpler models may be required, however, if the number of pieces of evidence bearing on a conclusion is large.) This paper presents a method of using these tables to make uncertain inferences without assumptions of conditional independence among pieces of evidence or heuristic combining rules. As evidence is accumulated, new joint probabilities are calculated so as to maintain any dependencies among the pieces of evidence that are found in the contingency table. The new conditional probability of the conclusion is then calculated directly from these new joint probabilities and the conditional probabilities in the contingency table.\nControl strategies for hierarchical tree-like probabilistic inference networks are formulated and investigated. Strategies that utilize staged look-ahead and temporary focus on subgoals are formalized and refined using the Depth Vector concept that serves as a tool for defining the 'virtual tree' regarded by the control strategy. The concept is illustrated by four types of control strategies for three-level trees that are characterized according to their Depth Vector, and according to the way they consider intermediate nodes and the role that they let these nodes play. INFERENTI is a computerized inference system written in Prolog, which provides tools for exercising a variety of control strategies. The system also provides tools for simulating test data and for comparing the relative average performance under different strategies.\nThe issue of confidence factors in Knowledge Based Systems has become increasingly important, and Dempster-Shafer (DS) theory has become increasingly popular as a basis for these factors.
This paper discusses the need for an empirical interpretation of any theory of confidence factors applied to Knowledge Based Systems and describes an empirical interpretation of DS theory suggesting that the theory has been extensively misinterpreted. For the essentially syntactic DS theory, a model is developed based on sample spaces, the traditional semantic model of probability theory. This model is used to show that, if belief functions are based on reasonably accurate sampling or observation of a sample space, then the beliefs and upper probabilities as computed according to DS theory cannot be interpreted as frequency ratios. Since many proposed applications of DS theory use belief functions in situations with statistically derived evidence (Wesley [1]) and seem to appeal to statistical intuition to provide an interpretation of the results, as Garvey [2] has, it may be argued that DS theory has often been misapplied.\nA considerable body of work in AI has been concerned with aggregating measures of confirmatory and disconfirmatory evidence for a common set of propositions. Claiming classical probability to be inadequate or inappropriate, several researchers have gone so far as to invent new formalisms and methods. We show how to represent two major such alternative approaches to evidential confirmation not only in terms of transformed (Bayesian) probability, but also in terms of each other. This unifies two of the leading approaches to confirmation theory, by showing that a revised MYCIN Certainty Factor method [12] is equivalent to a special case of Dempster-Shafer theory. It yields a well-understood axiomatic basis, i.e., conditional independence, to interpret previous work on quantitative confirmation theory. It substantially resolves the \"take-them-or-leave-them\" problem of priors: MYCIN had to leave them out, while PROSPECTOR had to have them in.
It recasts some of confirmation theory's advantages in terms of the psychological accessibility of probabilistic information in different (transformed) formats. Finally, it helps to unify the representation of uncertain reasoning (see also [11]).\nAbductive reasoning (or Abduction, for short) is among the most fundamental AI reasoning methods, with a broad range of applications, including fault diagnosis, belief revision, and automated planning. Unfortunately, Abduction is of high computational complexity; even propositional Abduction is \\Sigma_2^P-complete and thus harder than NP and coNP. This complexity barrier rules out the existence of a polynomial transformation to propositional satisfiability (SAT). In this work we use structural properties of the Abduction instance to break this complexity barrier. We utilize the problem structure in terms of small backdoor sets. We present fixed-parameter tractable transformations from Abduction to SAT, which make the power of today's SAT solvers available to Abduction.\nAn enhanced approach for network monitoring is to create a network monitoring tool that has artificial intelligence characteristics. There are a number of approaches available. One such approach uses a combination of rule-based reasoning, fuzzy logic and neural networks to create a hybrid ANFIS system. Such a system will have a dual knowledge database approach: one database containing membership function values used for comparison and deductive reasoning, and another containing rules deductively formulated by an expert (a network administrator). The knowledge database will be updated continuously with newly acquired patterns. In short, the system will be composed of two parts: learning from data sets and fine-tuning the knowledge base using a neural network, and using fuzzy logic to make decisions based on the rules and membership functions inside the knowledge base.
This paper will discuss the idea, steps and issues involved in creating such a system.\nWe present an n-ary constraint for the stable marriage problem. This constraint acts between two sets of integer variables where the domains of those variables represent preferences. Our constraint enforces stability and disallows bigamy. For a stable marriage instance with $n$ men and $n$ women we require only one of these constraints, and the complexity of enforcing arc-consistency is $O(n^2)$, which is optimal in the size of the input. Our computational studies show that our n-ary constraint is significantly faster and more space-efficient than the encodings presented in \\cite{cp01}. We also introduce a new problem to the constraint community, the sex-equal stable marriage problem.\nThe question of the nature of space around us has occupied thinkers since the dawn of humanity, with scientists and philosophers today implicitly assuming that space is something that exists objectively. Here we show that this does not have to be the case: the notion of space could emerge when biological organisms seek an economic representation of their sensorimotor flow. The emergence of spatial notions does not necessitate the existence of real physical space, but only requires the presence of sensorimotor invariants called `compensable' sensory changes. We show mathematically and then in simulations that na\\\"ive agents making no assumptions about the existence of space are able to learn these invariants and to build the abstract notion that physicists call rigid displacement, which is independent of what is being displaced. Rigid displacements may underlie the perception of space as an unchanging medium within which objects are described by their relative positions.
Our findings suggest that the question of the nature of space, currently exclusive to philosophy and physics, should also be addressed from the standpoint of neuroscience and artificial intelligence.\nIn this paper we address the problem of coalition formation in a hedonic context. Our modelling tries to be as realistic as possible. In previous models, once an agent joined a coalition it could not leave it and join a new one; in this research we made it possible to leave a coalition but put some restrictions in place to control the behavior of agents. An agent's decision to leave or stay in a coalition affects the trust of the other agents in that coalition. Agents will use the trust values in computing the expected utility of coalitions. Three different risk behaviors are introduced for agents that want to initiate a coalition. Using these risk behaviors, some simulations are made and results are analyzed.\nG\\\"odel's ontological proof has been analysed for the first time with an unprecedented degree of detail and formality with the help of higher-order theorem provers. The following has been done (and in this order): A detailed natural deduction proof. A formalization of the axioms, definitions and theorems in the TPTP THF syntax. Automatic verification of the consistency of the axioms and definitions with Nitpick. Automatic demonstration of the theorems with the provers LEO-II and Satallax. A step-by-step formalization using the Coq proof assistant. A formalization using the Isabelle proof assistant, where the theorems (and some additional lemmata) have been automated with Sledgehammer and Metis.\nIn the middle of the 1980s, David Poole introduced a semantical, model-theoretic notion of specificity to the artificial-intelligence community. Since then it has found further applications in non-monotonic reasoning, in particular in defeasible reasoning.
Poole tried to approximate the intuitive human concept of specificity, which seems to be essential for reasoning in everyday life with its partial and inconsistent information. His notion, however, turns out to be intricate and problematic, which, as we show, can be overcome to some extent by a closer approximation of the intuitive human concept of specificity. Besides the intuitive advantages of our novel specificity ordering over Poole's specificity relation in the classical examples of the literature, we also report some hard mathematical facts: Contrary to what was claimed before, we show that Poole's relation is not transitive. The present means to decide our novel specificity relation, however, show only a slight improvement over the known ones for Poole's relation, and further work is needed in this respect.\nWe consider the problem of finding all enclosing rectangles of minimum area that can contain a given set of rectangles without overlap. Our rectangle packer chooses the x-coordinates of all the rectangles before any of the y-coordinates. We then transform the problem into a perfect-packing problem with no empty space by adding additional rectangles. To determine the y-coordinates, we branch on the different rectangles that can be placed in each empty position. Our packer allows us to extend the known solutions for a consecutive-square benchmark from 27 to 32 squares. We also introduce three new benchmarks, avoiding properties that make a benchmark easy, such as rectangles with shared dimensions. Our third benchmark consists of rectangles of increasingly high precision. To pack them efficiently, we limit the rectangles' coordinates and the bounding box dimensions to the set of subset sums of the rectangles' dimensions.
Overall, our algorithms represent the current state-of-the-art for this problem, outperforming other algorithms by orders of magnitude, depending on the benchmark.\nIn this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some states may result in damage to the learning system (or any other system). Consequently, when an agent begins an interaction with a dangerous and high-dimensional state-action space, an important question arises; namely, that of how to avoid (or at least minimize) damage caused by the exploration of the state-action space. We introduce the PI-SRL algorithm which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and which efficiently learns from the experience gained from the environment. We evaluate the proposed method in four complex tasks: automatic car parking, pole-balancing, helicopter hovering, and business management.\nThe results in this paper add useful tools to the theory of sets of desirable gambles, a growing toolbox for reasoning with partial probability assessments. We investigate how to combine a number of marginal coherent sets of desirable gambles into a joint set using the properties of epistemic irrelevance and independence. We provide formulas for the smallest such joint, called their independent natural extension, and study its main properties. The independent natural extension of maximal coherent sets of desirable gambles allows us to define the strong product of sets of desirable gambles. 
Finally, we explore an easy way to generalise these results to also apply for the conditional versions of epistemic irrelevance and independence. Having such a set of tools that are easily implemented in computer programs is clearly beneficial to fields, like AI, with a clear interest in coherent reasoning under uncertainty using general and robust uncertainty models that require no full specification.\nLifted probabilistic inference algorithms exploit regularities in the structure of graphical models to perform inference more efficiently. More specifically, they identify groups of interchangeable variables and perform inference once per group, as opposed to once per variable. The groups are defined by means of constraints, so the flexibility of the grouping is determined by the expressivity of the constraint language. Existing approaches for exact lifted inference use specific languages for (in)equality constraints, which often have limited expressivity. In this article, we decouple lifted inference from the constraint language. We define operators for lifted inference in terms of relational algebra operators, so that they operate on the semantic level (the constraints extension) rather than on the syntactic level, making them language-independent. As a result, lifted inference can be performed using more powerful constraint languages, which provide more opportunities for lifting. We empirically demonstrate that this can improve inference efficiency by orders of magnitude, allowing exact inference where until now only approximate inference was feasible.\nWe present an approach to propagation-based SAT encoding of combinatorial problems, Boolean equi-propagation, where constraints are modeled as Boolean functions which propagate information about equalities between Boolean literals. This information is then applied to simplify the CNF encoding of the constraints. 
A key factor is that considering only a small fragment of a constraint model at one time enables us to apply stronger, and even complete, reasoning to detect equivalent literals in that fragment. Once detected, equivalences apply to simplify the entire constraint model and facilitate further reasoning on other fragments. Equi-propagation in combination with partial evaluation and constraint simplification provide the foundation for a powerful approach to SAT-based finite domain constraint solving. We introduce a tool called BEE (Ben-Gurion Equi-propagation Encoder) based on these ideas and demonstrate for a variety of benchmarks that our approach leads to a considerable reduction in the size of CNF encodings and subsequent speed-ups in SAT solving times.\nDescription logic Knowledge and Action Bases (KAB) are a mechanism for providing both a semantically rich representation of the information on the domain of interest in terms of a description logic knowledge base and actions to change such information over time, possibly introducing new objects. We resort to a variant of DL-Lite where the unique name assumption is not enforced and where equality between objects may be asserted and inferred. Actions are specified as sets of conditional effects, where conditions are based on epistemic queries over the knowledge base (TBox and ABox), and effects are expressed in terms of new ABoxes. In this setting, we address verification of temporal properties expressed in a variant of first-order mu-calculus with quantification across states. Notably, we show decidability of verification, under a suitable restriction inspired by the notion of weak acyclicity in data exchange.\nGiven a current news event, we tackle the problem of generating plausible predictions of future events it might cause. We present a new methodology for modeling and predicting such future news events using machine learning and data mining techniques. 
Our Pundit algorithm generalizes examples of causality pairs to infer a causality predictor. To obtain precisely labeled causality examples, we mine 150 years of news articles and apply semantic natural language modeling techniques to headlines containing certain predefined causality patterns. For generalization, the model uses a vast number of world knowledge ontologies. Empirical evaluation on real news articles shows that our Pundit algorithm performs as well as non-expert humans.\nIn order to meet usability requirements, most logic-based applications provide explanation facilities for reasoning services. This holds also for Description Logics, where research has focused on the explanation of both TBox reasoning and, more recently, query answering. Besides explaining the presence of a tuple in a query answer, it is important to explain also why a given tuple is missing. We address the latter problem for instance and conjunctive query answering over DL-Lite ontologies by adopting abductive reasoning; that is, we look for additions to the ABox that force a given tuple to be in the result. As reasoning tasks we consider existence and recognition of an explanation, and relevance and necessity of a given assertion for an explanation. We characterize the computational complexity of these problems for arbitrary, subset minimal, and cardinality minimal explanations.\nSequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. 
Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.\nThe task of artificial intelligence is to provide representation techniques for describing problems, as well as search algorithms that can be used to answer our questions. A widespread and elaborated model is state-space representation, which, however, has some shortcomings. Classical search algorithms are not applicable in practice when the state space contains even a few tens of thousands of states. We can remedy this problem by defining some kind of heuristic knowledge. In the case of classical state-space representation, a heuristic must be defined so that it qualifies an arbitrary state based on its \"goodness,\" which is obviously not trivial. In our paper, we introduce an algorithm that gives us the ability to handle huge state spaces and to use a heuristic concept which is easier to embed into search algorithms.\nHow should we gather information to make effective decisions? We address Bayesian active learning and experimental design problems, where we sequentially select tests to reduce uncertainty about a set of hypotheses. Instead of minimizing uncertainty per se, we consider a set of overlapping decision regions of these hypotheses. Our goal is to drive uncertainty into a single decision region as quickly as possible. 
We identify necessary and sufficient conditions for correctly identifying a decision region that contains all hypotheses consistent with observations. We develop a novel Hyperedge Cutting (HEC) algorithm for this problem, and prove that it is competitive with the intractable optimal policy. Our efficient implementation of the algorithm relies on computing subsets of the complete homogeneous symmetric polynomials. Finally, we demonstrate its effectiveness on two practical applications: approximate comparison-based learning and active localization using a robot manipulator.\nThis paper introduces a self-organizing traffic signal system for an urban road network. The key elements of this system are agents that control traffic signals at intersections. Each agent uses an interval microscopic traffic model to predict the effects of its possible control actions over a short time horizon. The executed control action is selected on the basis of predicted delay intervals. Since the prediction results are represented by intervals, the agents can recognize and suspend those control actions whose positive effect on the performance of traffic control is uncertain. Evaluation of the proposed traffic control system was performed in a simulation environment. The simulation experiments have shown that the proposed approach results in improved performance, particularly for non-uniform traffic streams.\nWe propose an approach to automatically generate geometric theorems from electronic images of diagrams. The approach makes use of techniques of Hough transform to recognize geometric objects and their labels and of numeric verification to mine basic geometric relations. Candidate propositions are generated from the retrieved information by using six strategies and geometric theorems are obtained from the candidates via algebraic computation. 
Experiments with a preliminary implementation illustrate the effectiveness and efficiency of the proposed approach for generating nontrivial theorems from images of diagrams. This work demonstrates the feasibility of automated discovery of profound geometric knowledge from simple image data and has potential applications in geometric knowledge management and education.\nEncoding temporal information from the recent past as spatially distributed activations is essential in order for the entire recent past to be simultaneously accessible. Any biological or synthetic agent that relies on the past to predict/plan the future would be endowed with such a spatially distributed temporal memory. Simplistically, we would expect that resource limitations would require the memory system to store only the most useful information for future prediction. For natural signals in the real world, which show scale-free temporal fluctuations, the predictive information encoded in memory is maximal if the past information is scale invariantly coarse grained. Here we examine the general mechanism to construct a scale invariantly coarse grained memory system. Remarkably, the generic construction is equivalent to encoding linear combinations of the Laplace transform of the past information and their approximated inverses. This reveals a fundamental construction constraint on memory networks that attempt to maximize predictive information storage relevant to the natural world.\nFailure detection in telecommunication networks is a vital task. So far, several supervised and unsupervised solutions have been provided for discovering failures in such networks. Among them, unsupervised approaches have attracted more attention since no labeled data is required. Often, network devices are not able to provide information about the type of failure. In such cases the type of failure is not known in advance and the unsupervised setting is more appropriate for diagnosis. 
Among unsupervised approaches, Principal Component Analysis (PCA) is a well-known solution which has been widely used in the anomaly detection literature and can be applied to matrix data (e.g. Users-Features). However, one of the important properties of network data is their temporal sequential nature. So considering the interaction of dimensions over a third dimension, such as time, may provide us better insights into the nature of network failures. In this paper we demonstrate the power of three-way analysis to detect events and anomalies in time-evolving network data.\nWe analyze variational inference for highly symmetric graphical models such as those arising from first-order probabilistic models. We first show that for these graphical models, the tree-reweighted variational objective lends itself to a compact lifted formulation which can be solved much more efficiently than the standard TRW formulation for the ground graphical model. Compared to earlier work on lifted belief propagation, our formulation leads to a convex optimization problem for lifted marginal inference and provides an upper bound on the partition function. We provide two approaches for improving the lifted TRW upper bound. The first is a method for efficiently computing maximum spanning trees in highly symmetric graphs, which can be used to optimize the TRW edge appearance probabilities. The second is a method for tightening the relaxation of the marginal polytope using lifted cycle inequalities and novel exchangeable cluster consistency constraints.\nThe Shapley value has been recently advocated as a method to choose the seed nodes for the process of information diffusion. Intuitively, since the Shapley value evaluates the average marginal contribution of a player to the coalitional game, it can be used in the network context to evaluate the marginal contribution of a node in the process of information diffusion given various groups of already 'infected' nodes. 
Although the above direction of research seems promising, the current literature lacks a thorough assessment of its performance. The aim of this work is to provide such an assessment of the existing Shapley value-based approaches to information diffusion.\nIn repeated stochastic games (RSGs), an agent must quickly adapt to the behavior of previously unknown associates, who may themselves be learning. This machine-learning problem is particularly challenging due, in part, to the presence of multiple (even infinite) equilibria and inherently large strategy spaces. In this paper, we introduce a method to reduce the strategy space of two-player general-sum RSGs to a handful of expert strategies. This process, called Mega, effectively reduces an RSG to a bandit problem. We show that the resulting strategy space preserves several important properties of the original RSG, thus enabling a learner to produce robust strategies within a reasonably small number of interactions. To better establish the strengths and weaknesses of this approach, we empirically evaluate the resulting learning system against other algorithms in three different RSGs.\nArriving at the complete probabilistic knowledge of a domain, i.e., learning how all variables interact, is indeed a demanding task. In reality, settings often arise for which an individual merely possesses partial knowledge of the domain, and yet, is expected to give adequate answers to a variety of posed queries. That is, although precise answers to some queries, in principle, cannot be achieved, a range of plausible answers is attainable for each query given the available partial knowledge. In this paper, we propose the Multi-Context Model (MCM), a new graphical model to represent the state of partial knowledge about a domain. MCM is a middle ground between Probabilistic Logic, Bayesian Logic, and Probabilistic Graphical Models. 
For this model we discuss: (i) the dynamics of constructing a contradiction-free MCM, i.e., to form partial beliefs regarding a domain in a gradual and probabilistically consistent way, and (ii) how to perform inference, i.e., to evaluate a probability of interest involving some variables of the domain.\nAutomated Theorem Proving (ATP) is an established branch of Artificial Intelligence. The purpose of ATP is to design a system which can automatically figure out an algorithm either to prove or disprove a mathematical claim, on the basis of a set of given premises, using a set of fundamental postulates and following the method of logical inference. In this paper, we propose GraATP, a generalized framework for automated theorem proving in plane geometry. Our proposed method translates the geometric entities into nodes of a graph and the relations between them into edges of that graph. The automated system searches for different ways to reach the conclusion for a claim via graph traversal, by which the validity of the geometric theorem is examined.\nThe majority of existing robot navigation systems, which make use of laser range finders, sonar sensors or artificial landmarks, have the ability to locate themselves in an unknown environment and then build a map of the corresponding environment. Stereo vision, while still being a rapidly developing technique in the field of autonomous mobile robots, is currently less preferable due to its high implementation cost. This paper aims to describe an experimental approach for building a stereo vision system that helps robots avoid obstacles and navigate through indoor environments while remaining highly cost-effective. This paper discusses techniques for fusing stereo vision and ultrasound sensors, which help the robot navigate successfully through different types of complex environments. 
The sensor data enable the robot to create a two-dimensional topological map of unknown environments, while the stereo vision system builds a three-dimensional model of the same environment.\nOne of the most significant problems which inhibit further developments in the areas of Knowledge Representation and Artificial Intelligence is the problem of semantic alignment or knowledge mapping. Progress in its solution will greatly benefit further advances in information retrieval, ontology alignment, relevance calculation, text mining, natural language processing etc. In the paper, the concept of a multidimensional global knowledge map, elaborated through unsupervised extraction of dependencies from a large document corpus, is proposed. In addition, the problem of a direct Human - Knowledge Representation System interface is addressed, and the concept of an adaptive decoder is proposed for the purpose of interaction with the previously described unified mapping model. In combination, these two approaches are suggested as a basis for the development of a new generation of knowledge representation systems.\nWhen two or more self-interested agents put their plans into execution in the same environment, conflicts may arise as a consequence, for instance, of a common utilization of resources. In this case, an agent can postpone the execution of a particular action, if this punctually solves the conflict, or it can resort to executing a different plan if the agent's payoff significantly diminishes due to the action deferral. In this paper, we present a game-theoretic approach to non-cooperative planning that helps predict before execution what plan schedules agents will adopt so that the set of strategies of all agents constitutes a Nash equilibrium. 
We perform some experiments and discuss the solutions obtained with our game-theoretical approach, analyzing how the conflicts between the plans determine the strategic behavior of the agents.\nResearch on the so-called \"free-energy principle\" (FEP) in cognitive neuroscience is becoming increasingly high-profile. To date, introductions to this theory have proved difficult for many readers to follow, but it depends mainly upon two relatively simple ideas: firstly that normative or teleological values can be expressed as probability distributions (active inference), and secondly that approximate Bayesian reasoning can be effectively performed by gradient descent on model parameters (the free-energy principle). The notion of active inference is of great interest for a number of disciplines including cognitive science and artificial intelligence, as well as cognitive neuroscience, and deserves to be more widely known. This paper attempts to provide an accessible introduction to active inference and informational free-energy, for readers from a range of scientific backgrounds. In this work we introduce an agent-based model with an agent trying to make predictions about its position in a one-dimensional discretized world using methods from the FEP.\nIt is argued that the concept of free will, like the concept of truth in formal languages, requires a separation between an object level and a meta-level in order to be consistently defined. The Jamesian two-stage model, which deconstructs free will into the causally open \"free\" stage with its closure in the \"will\" stage, is implicitly a move in this direction. However, to avoid the dilemma of determinism, free will additionally requires an infinite regress of causal meta-stages, making free choice a hypertask. We use this model to define free will of the rationalist-compatibilist type. 
This is shown to provide a natural three-way distinction between quantum indeterminism, freedom and free will, applicable respectively to artificial intelligence (AI), animal agents and human agents. We propose that the causal hierarchy in our model corresponds to a hierarchy of Turing uncomputability. Possible neurobiological and behavioral tests to demonstrate free will experimentally are suggested. Ramifications of the model for physics, evolutionary biology, neuroscience, neuropathological medicine and moral philosophy are briefly outlined.\nThe paper describes a novel approach to categorizing users' reviews according to the three Quality in Use (QU) indicators defined in ISO: effectiveness, efficiency and freedom from risk. With the tremendous number of reviews published each day, there is a need to automatically summarize user reviews to inform us whether any of the software is able to meet a company's quality requirements. We implemented the method of Latent Semantic Analysis (LSA) and its subspace to predict QU indicators. We built a reduced-dimensionality universal semantic space from Information System journals and Amazon reviews. Next, we projected the set of indicators' measurement scales into the universal semantic space and represented them as a subspace. In the subspace, we can map similar measurement scales to the unseen reviews and predict the QU indicators. Our preliminary study obtained an average F-measure of 0.3627.\nOver the last decade there has been increasing concern about the biases embodied in traditional evaluation methods for Natural Language Processing/Learning, particularly methods borrowed from Information Retrieval. Without knowledge of the Bias and Prevalence of the contingency being tested, or equivalently the expectation due to chance, the simple conditional probabilities Recall, Precision and Accuracy are not meaningful as evaluation measures, either individually or in combinations such as F-factor. 
The existence of bias in NLP measures leads to the 'improvement' of systems by increasing their bias, such as the practice of improving tagging and parsing scores by using the most common value (e.g. water is always a Noun) rather than attempting to discover the correct one. The measures Cohen Kappa and Powers Informedness are discussed as unbiased alternatives to Recall and related to the psychologically significant measure DeltaP. In this paper we will analyze both biased and unbiased measures theoretically, characterizing the precise relationship between all these measures as well as evaluating the evaluation measures themselves empirically using a Monte Carlo simulation.\nPath finding algorithms address the problem of finding the shortest path from source to destination while avoiding obstacles. There exist various search algorithms, namely A*, Dijkstra's algorithm and ant colony optimization. Unlike most path finding algorithms, which require destination co-ordinates to compute a path, the proposed algorithm comprises a new method which finds a path using backtracking without requiring destination co-ordinates. Moreover, in existing path finding algorithms, the number of iterations required to find a path is large. Hence, to overcome this, an algorithm is proposed which reduces the number of iterations required to traverse the path. The proposed algorithm is a hybrid of backtracking and a new technique (modified 8-neighbor approach). The proposed algorithm can become an essential part of location-based, network and gaming applications, grid traversal, navigation, mobile robotics and Artificial Intelligence.\nDiscovery of (strong) association rules, or implications, is an important task in data management, and it finds application in artificial intelligence, data mining and the semantic web. 
We introduce a novel approach for the discovery of a specific set of implications, called the $D$-basis, that provides a representation for a reduced binary table, based on the structure of its Galois lattice. At the core of the method are the $D$-relation defined in the lattice theory framework, and the hypergraph dualization algorithm that allows us to effectively produce the set of transversals for a given Sperner hypergraph. The latter algorithm, first developed by specialists from the Rutgers Center for Operations Research, has already found numerous applications in solving optimization problems in database theory, artificial intelligence and game theory. One application of the method is the analysis of gene expression data related to a particular phenotypic variable, and some initial testing has been done on data provided by the University of Hawaii Cancer Center.\nSquare grids are commonly used in robotics and game development as spatial models, and heuristic search algorithms well known in the AI community (such as A*, JPS, Theta*, etc.) are widely used for path planning on grids. A lot of research is concentrated on finding the shortest (in the geometric sense) paths, while in many applications finding smooth paths (rather than the shortest ones but containing sharp turns) is preferable. In this paper we study the problem of generating smooth paths and concentrate on angle-constrained path planning. We formally state the angle-constrained path planning problem and present LIAN, a new algorithm tailored to solve it. We examine LIAN both theoretically and empirically. We show that it is sound and complete (under some restrictions). We also show that LIAN outperforms its analogues when solving numerous path planning tasks within urban outdoor navigation scenarios.\nThe question of how humans solve problems has been addressed extensively. However, the direct study of the effectiveness of this process seems to be overlooked. 
In this paper, we address the issue of the effectiveness of human problem solving: we analyze where this effectiveness comes from and what cognitive mechanisms or heuristics are involved. Our results are based on the optimal probabilistic problem solving strategy that appeared in Solomonoff's paper on a general problem solving system. We provide arguments that a certain set of cognitive mechanisms or heuristics drive human problem solving in a similar manner to the optimal Solomonoff strategy. The results presented in this paper can serve both cognitive psychology, in better understanding human problem solving processes, and artificial intelligence, in designing more human-like agents.\nRecommendation systems have been integrated into the majority of large online systems to filter and rank information according to user profiles. They thus influence the way users interact with the system and, as a consequence, bias the evaluation of the performance of a recommendation algorithm computed using historical data (via offline evaluation). This paper presents a new application of a weighted offline evaluation to reduce this bias for collaborative filtering algorithms.\nThe latent block model (LBM) is a flexible probabilistic tool to describe interactions between node sets in bipartite networks, but it does not account for interactions of time-varying intensity between nodes in unknown classes. In this paper we propose a non-stationary temporal extension of the LBM that simultaneously clusters the two node sets of a bipartite network and constructs classes of time intervals on which interactions are stationary. The number of clusters as well as the class memberships are obtained by maximizing the exact complete-data integrated likelihood relying on a greedy search approach. 
Experiments on simulated and real data are carried out in order to assess the proposed methodology.\nSocial network analysis (SNA), a branch of complex systems, can be used in the construction of multiagent systems. This paper proposes a study of how social network analysis can assist in modeling multiagent systems, while addressing similarities and differences between the two theories. We built a prototype multi-agent system that resolves tasks by forming teams of agents on the basis of the social network established between them. Agents use performance indicators to assess when they should change their social network to maximize their participation in teams.\nIntegrating vision and language has long been a dream in work on artificial intelligence (AI). In the past two years, we have witnessed an explosion of work that brings together vision and language from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing the vision & language datasets and categorize them accordingly. Our analyses show that the most recent datasets have been using more complex language and more abstract concepts; however, each has its own strengths and weaknesses.\nDesign mining is the use of computational intelligence techniques to iteratively search and model the attribute space of physical objects evaluated directly through rapid prototyping to meet given objectives. It enables the exploitation of novel materials and processes without formal models or complex simulation. In this paper, we focus upon the coevolutionary nature of the design process when it is decomposed into concurrent sub-design threads due to the overall complexity of the task. 
Using an abstract, tuneable model of coevolution we consider strategies to sample sub-thread designs for whole system testing and how best to construct and use surrogate models within the coevolutionary scenario. Drawing on our findings, the paper then describes the effective design of an array of six heterogeneous vertical-axis wind turbines.\nMaier et al. (2010) introduced the relational causal model (RCM) for representing and inferring causal relationships in relational data. A lifted representation, called abstract ground graph (AGG), plays a central role in reasoning with and learning of RCM. The correctness of the algorithm proposed by Maier et al. (2013a) for learning RCM from data relies on the soundness and completeness of AGG for relational d-separation to reduce the learning of an RCM to learning of an AGG. We revisit the definition of AGG and show that AGG, as defined in Maier et al. (2013b), does not correctly abstract all ground graphs. We revise the definition of AGG to ensure that it correctly abstracts all ground graphs. We further show that AGG representation is not complete for relational d-separation, that is, there can exist conditional independence relations in an RCM that are not entailed by AGG. A careful examination of the relationship between the lack of completeness of AGG for relational d-separation and faithfulness conditions suggests that weaker notions of completeness, namely adjacency faithfulness and orientation faithfulness between an RCM and its AGG, can be used to learn an RCM from data.\nIn this note we provide a concise report on the complexity of the causal ordering problem, originally introduced by Simon to reason about causal dependencies implicit in systems of mathematical equations. We show that Simon's classical algorithm to infer causal ordering is NP-Hard---an intractability previously guessed but never proven. 
We then present a detailed account based on Nayak's suggested algorithmic solution (the best available), which is dominated by computing transitive closure---bounded in time by $O(|\\mathcal V|\\cdot |\\mathcal S|)$, where $\\mathcal S(\\mathcal E, \\mathcal V)$ is the input system structure composed of a set $\\mathcal E$ of equations over a set $\\mathcal V$ of variables with number of variable appearances (density) $|\\mathcal S|$. We also comment on the potential of causal ordering for emerging applications in large-scale hypothesis management and analytics.\nPropagating input uncertainty through non-linear Gaussian process (GP) mappings is intractable. This hinders the task of training GPs using uncertain and partially observed inputs. In this paper we refer to this task as \"semi-described learning\". We then introduce a GP framework that solves both the semi-described and the semi-supervised learning problems (where missing values occur in the outputs). Auto-regressive state space simulation is also recognised as a special case of semi-described learning. To achieve our goal we develop variational methods for handling semi-described inputs in GPs, and couple them with algorithms that allow for imputing the missing values while treating the uncertainty in a principled, Bayesian manner. Extensive experiments on simulated and real-world data study the problems of iterative forecasting and regression/classification with missing values. The results suggest that the principled propagation of uncertainty stemming from our framework can significantly improve performance in these tasks.\nRecent results have shown that the MCTS algorithm (a new, adaptive, randomized optimization algorithm) is effective in a remarkably diverse set of applications in Artificial Intelligence, Operations Research, and High Energy Physics. MCTS can find good solutions without domain-dependent heuristics, using the UCT formula to balance exploitation and exploration. 
It has been suggested that the optimum in the exploitation-exploration balance differs for different search tree sizes: small search trees need more exploitation; large search trees need more exploration. Small search trees occur in variations of MCTS, such as parallel and ensemble approaches. This paper investigates the possibility of improving the performance of Ensemble UCT by increasing the level of exploitation. As the search trees become smaller, we achieve improved performance. The results are important for improving the performance of large scale parallelism of MCTS.\nThe multilingual nature of the world makes translation a crucial requirement today. Parallel dictionaries constructed by humans are a widely-available resource, but they are limited and do not provide enough coverage for good quality translation purposes, due to out-of-vocabulary words and neologisms. This motivates the use of statistical translation systems, which are unfortunately dependent on the quantity and quality of training data. Such data has very limited availability, especially for some languages and very narrow text domains. In this research we present our improvements to the Yalign mining methodology by reimplementing the comparison algorithm, introducing tuning scripts and improving performance using GPU computing acceleration. The experiments are conducted on various text domains and bi-data is extracted from the Wikipedia dumps.\nFuzzy Description Logics (FDLs) are logic-based formalisms used to represent and reason with vague or imprecise knowledge. It has been recently shown that reasoning in most FDLs using truth values from the interval [0,1] becomes undecidable in the presence of a negation constructor and general concept inclusion axioms. One exception to this negative result are FDLs whose semantics is based on the infinitely valued G\\\"odel t-norm (G). In this paper, we extend previous decidability results for G-IALC to deal also with qualified number restrictions. 
Our novel approach is based on a combination of the known crispification technique for finitely valued FDLs and the automata-based procedure originally developed for reasoning in G-IALC. The proposed approach combines the advantages of these two methods, while removing their respective drawbacks.\nThe human visual system can spot an abnormal image, and reason about what makes it strange. This task has not received enough attention in computer vision. In this paper we study various types of atypicalities in images in a more comprehensive way than has been done before. We propose a new dataset of abnormal images showing a wide range of atypicalities. We design human subject experiments to discover a coarse taxonomy of the reasons for abnormality. Our experiments reveal three major categories of abnormality: object-centric, scene-centric, and contextual. Based on this taxonomy, we propose a comprehensive computational model that can predict all different types of abnormality in images and outperform prior arts in abnormality recognition.\nWe advance the state of the art in biomolecular interaction extraction with three contributions: (i) We show that deep, Abstract Meaning Representations (AMR) significantly improve the accuracy of a biomolecular interaction extraction system when compared to a baseline that relies solely on surface- and syntax-based features; (ii) In contrast with previous approaches that infer relations on a sentence-by-sentence basis, we expand our framework to enable consistent predictions over sets of sentences (documents); (iii) We further modify and expand a graph kernel learning framework to enable concurrent exploitation of automatically induced AMR (semantic) and dependency structure (syntactic) representations. 
Our experiments show that our approach yields interaction extraction systems that are more robust in environments where there is a significant mismatch between training and test conditions.\nTraditional graph-based semi-supervised learning (SSL) approaches, even though widely applied, are not suited for massive data and large label scenarios since they scale linearly with the number of edges $|E|$ and distinct labels $m$. To deal with the large label size problem, recent works propose sketch-based methods to approximate the distribution on labels per node thereby achieving a space reduction from $O(m)$ to $O(\\log m)$, under certain conditions. In this paper, we present a novel streaming graph-based SSL approximation that captures the sparsity of the label distribution and ensures the algorithm propagates labels accurately, and further reduces the space complexity per node to $O(1)$. We also provide a distributed version of the algorithm that scales well to large data sizes. Experiments on real-world datasets demonstrate that the new method achieves better performance than existing state-of-the-art algorithms with significant reduction in memory footprint. We also study different graph construction mechanisms for natural language applications and propose a robust graph augmentation strategy trained using state-of-the-art unsupervised deep learning architectures that yields further significant quality gains.\nDescription logic (DL) based biomedical terminology (SNOMED CT) is used routinely in medical practice. However, diagnostic inference using such terminology is precluded by its complexity. Here we propose a model that simplifies these inferential components. We propose three concepts that classify clinical features and examined their effect on inference using SNOMED CT. We used PAIRS (Physician Assistant Artificial Intelligence Reference System) database (1964 findings for 485 disorders, 18 397 disease feature links) for our analysis. 
We also use a 50-million medical word corpus for estimating the vectors of disease-feature links. Our major results are that 10% of finding-disorder links are concomitant in both assertion and negation, whereas 90% are concomitant in either assertion or negation. Logical implications of PAIRS data on SNOMED CT include that 70% of the links do not share any common system, while 18% share organ and 12% share both system and organ. Applications of these principles for inference are discussed and suggestions are made for deriving a diagnostic process using SNOMED CT. Limitations of these processes and suggestions for improvements are also discussed.\nDistributional semantic models provide vector representations for words by gathering co-occurrence frequencies from corpora of text. Compositional distributional models extend these representations from words to phrases and sentences. In categorical compositional distributional semantics these representations are built in such a manner that meanings of phrases and sentences are functions of their grammatical structure and the meanings of the words therein. These models have been applied to reasoning about phrase and sentence level similarity. In this paper, we argue for and prove that these models can also be used to reason about phrase and sentence level entailment. We provide preliminary experimental results on a toy entailment dataset.\nThis paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. 
This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.\nConstrained sampling and counting are two fundamental problems in artificial intelligence with a diverse range of applications, spanning probabilistic reasoning and planning to constrained-random verification. While the theory of these problems was thoroughly investigated in the 1980s, prior work either did not scale to industrial size instances or gave up correctness guarantees to achieve scalability. Recently, we proposed a novel approach that combines universal hashing and SAT solving and scales to formulas with hundreds of thousands of variables without giving up correctness guarantees. This paper provides an overview of the key ingredients of the approach and discusses challenges that need to be overcome to handle larger real-world instances.\nBounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the free energy functional as the objective function for characterizing bounded-rational decisions. 
This functional possesses three crucial properties: it controls the size of the solution space; it has Monte Carlo planners that are exact, yet bypass the need for exhaustive search; and it captures model uncertainty arising from lack of evidence or from interacting with other agents having unknown intentions. We discuss the single-step decision-making case, and show how to extend it to sequential decisions using equivalence transformations. This extension yields a very general class of decision problems that encompass classical decision rules (e.g. EXPECTIMAX and MINIMAX) as limit cases, as well as trust- and risk-sensitive planning.\nWe study the (parameterized) complexity of SHIFT BRIBERY for multiwinner voting rules. We focus on SNTV, Bloc, k-Borda, and Chamberlin-Courant, as well as on approximate variants of Chamberlin-Courant, since the original rule is NP-hard to compute. We show that SHIFT BRIBERY tends to be significantly harder in the multiwinner setting than in the single-winner one by showing settings where SHIFT BRIBERY is easy in the single-winner cases, but is hard (and hard to approximate) in the multiwinner ones. Moreover, we show that the non-monotonicity of those rules which are based on approximation algorithms for the Chamberlin-Courant rule sometimes affects the complexity of SHIFT BRIBERY.\nThis Bachelor's thesis, written in Russian, is devoted to a relatively new direction in the field of machine learning and artificial intelligence, namely probabilistic programming. The thesis gives a brief overview to the already existing probabilistic programming languages: Church, Venture, and Anglican. It also describes the results of the first experiments on the automatic induction of probabilistic programs. The thesis was submitted, in June 2014, in partial fulfilment of the requirements for the degree of Bachelor of Science in Mathematics in the Department of Mathematics and Computer Science, Siberian Federal University, Krasnoyarsk, Russia. 
The work described in this thesis was carried out in 2012-2014 at the Massachusetts Institute of Technology and at the University of Oxford by the author's colleagues and by the author himself.\nArgumentation is a process of evaluating and comparing a set of arguments. A way to compare them consists in using a ranking-based semantics which rank-orders arguments from the most to the least acceptable ones. Recently, a number of such semantics have been proposed independently, often associated with some desirable properties. However, there is no comparative study which takes a broader perspective. This is what we propose in this work. We provide a general comparison of all these semantics with respect to the proposed properties. This allows us to underline the differences in behavior between the existing semantics.\nIn this paper we propose reinforcement learning algorithms with a generalized reward function. In our proposed method we use Q-learning and SARSA algorithms with a generalised reward function to train the reinforcement learning agent. We evaluated the performance of our proposed algorithms on two real-time strategy games called BattleCity and S3. There are two main advantages of such an approach compared to other works in RTS: (1) we can ignore the concept of a simulator, which is often game specific and is usually hard coded in any type of RTS game; (2) our system can learn from interaction with any opponent, quickly change its strategy according to the opponent, and does not need any human traces as used in previous works. Keywords: Reinforcement learning, Machine learning, Real time strategy, Artificial intelligence.\nIn practice, training language models for individual authors is often expensive because of limited data resources. In such cases, Neural Network Language Models (NNLMs) generally outperform the traditional non-parametric N-gram models. 
Here we investigate the performance of a feed-forward NNLM on an authorship attribution problem, with moderate author set size and relatively limited data. We also consider how the text topics impact performance. Compared with a well-constructed N-gram baseline method with Kneser-Ney smoothing, the proposed method achieves nearly a 2.5% reduction in perplexity and increases author classification accuracy by 3.43% on average, given as few as 5 test sentences. The performance is very competitive with the state of the art in terms of accuracy and demand on test data. The source code, preprocessed datasets, a detailed description of the methodology and results are available at https://github.com/zge/authorship-attribution.\nConceptual spaces are geometric representations of conceptual knowledge, in which entities correspond to points, natural properties correspond to convex regions, and the dimensions of the space correspond to salient features. While conceptual spaces enable elegant models of various cognitive phenomena, the lack of automated methods for constructing such representations has so far limited their application in artificial intelligence. To address this issue, we propose a method which learns a vector-space embedding of entities from Wikipedia and constrains this embedding such that entities of the same semantic type are located in some lower-dimensional subspace. We experimentally demonstrate the usefulness of these subspaces as (approximate) conceptual space representations by showing, among others, that important features can be modelled as directions and that natural properties tend to correspond to convex regions.\nThe field of Multi-Agent System (MAS) is an active area of research within Artificial Intelligence, with an increasingly important impact in industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. 
Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures to govern the agents' autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled it to support MAS in complex, real-time, and uncertain environments. This survey aims at providing an overview of the DCOP model, giving a classification of its multiple extensions and addressing both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions, and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.\nQuantum computers have an amazing potential for fast information processing. However, realisation of a digital quantum computer is still a challenging problem requiring highly accurate controls and key application strategies. Here we propose a novel platform, quantum reservoir computing, to solve these issues successfully by exploiting natural quantum dynamics, which is ubiquitous in laboratories nowadays, for machine learning. In this framework, nonlinear dynamics including classical chaos can be universally emulated in quantum systems. A number of numerical experiments show that quantum systems consisting of at most seven qubits possess computational capabilities comparable to conventional recurrent neural networks of 500 nodes. This discovery opens up a new paradigm for information processing with artificial intelligence powered by quantum physics.\nBounded rational decision-makers transform sensory input into motor output under limited computational resources. Mathematically, such decision-makers can be modeled as information-theoretic channels with limited transmission rate. 
Here, we apply this formalism for the first time to multilayer feedforward neural networks. We derive synaptic weight update rules for two scenarios, where either each neuron is considered as a bounded rational decision-maker or the network as a whole. In the update rules, bounded rationality translates into information-theoretically motivated types of regularization in weight space. In experiments on the MNIST benchmark classification task for handwritten digits, we show that such information-theoretic regularization successfully prevents overfitting across different architectures and attains results that are competitive with other recent techniques like dropout, dropconnect and Bayes by backprop, for both ordinary and convolutional neural networks.\nWith proper management, Autonomous Mobility-on-Demand (AMoD) systems have great potential to satisfy the transport demands of urban populations by providing safe, convenient, and affordable ridesharing services. Meanwhile, such systems can substantially decrease private car ownership and use, and thus significantly reduce traffic congestion, energy consumption, and carbon emissions. To achieve this objective, an AMoD system requires private information about the demand from passengers. However, due to self-interestedness, passengers are unlikely to cooperate with the service providers in this regard. Therefore, an online mechanism is desirable if it incentivizes passengers to truthfully report their actual demand. For the purpose of promoting ridesharing, we hereby introduce a posted-price, integrated online ridesharing mechanism (IORS) that satisfies desirable properties such as ex-post incentive compatibility, individual rationality, and budget-balance. 
Numerical results indicate the competitiveness of IORS compared with two benchmarks, namely the optimal assignment and an offline, auction-based mechanism.\nBayesian inference has great promise for the privacy-preserving analysis of sensitive data, as posterior sampling automatically preserves differential privacy, an algorithmic notion of data privacy, under certain conditions (Dimitrakakis et al., 2014; Wang et al., 2015). While this one posterior sample (OPS) approach elegantly provides privacy \"for free,\" it is data inefficient in the sense of asymptotic relative efficiency (ARE). We show that a simple alternative based on the Laplace mechanism, the workhorse of differential privacy, is as asymptotically efficient as non-private posterior inference, under general assumptions. This technique also has practical advantages including efficient use of the privacy budget for MCMC. We demonstrate the practicality of our approach on a time-series analysis of sensitive military records from the Afghanistan and Iraq wars disclosed by the Wikileaks organization.\nWe study the worst-case adaptive optimization problem with budget constraint that is useful for modeling various practical applications in artificial intelligence and machine learning. We investigate the near-optimality of greedy algorithms for this problem with both modular and non-modular cost functions. In both cases, we prove that two simple greedy algorithms are not near-optimal but the best between them is near-optimal if the utility function satisfies pointwise submodularity and pointwise cost-sensitive submodularity respectively. This implies a combined algorithm that is near-optimal with respect to the optimal algorithm that uses half of the budget. We discuss applications of our theoretical results and also report experiments comparing the greedy algorithms on the active learning problem.\nArtificial intelligence is commonly defined as the ability to achieve goals in the world. 
In the reinforcement learning framework, goals are encoded as reward functions that guide agent behaviour, and the sum of observed rewards provides a notion of progress. However, some domains have no such reward signal, or have a reward signal so sparse as to appear absent. Without reward feedback, agent behaviour is typically random, often dithering aimlessly and lacking intentionality. In this paper we present an algorithm capable of learning purposeful behaviour in the absence of rewards. The algorithm proceeds by constructing temporally extended actions (options), through the identification of purposes that are \"just out of reach\" of the agent's current behaviour. These purposes establish intrinsic goals for the agent to learn, ultimately resulting in a suite of behaviours that encourage the agent to visit different parts of the state space. Moreover, the approach is particularly suited for settings where rewards are very sparse, and such behaviours can help in the exploration of the environment until reward is observed.\nKidney exchanges are organized markets where patients swap willing but incompatible donors. In the last decade, kidney exchanges grew from small and regional to large and national---and soon, international. This growth results in more lives saved, but exacerbates the empirical hardness of the $\\mathcal{NP}$-complete problem of optimally matching patients to donors. State-of-the-art matching engines use integer programming techniques to clear fielded kidney exchanges, but these methods must be tailored to specific models and objective functions, and may fail to scale to larger exchanges. In this paper, we observe that if the kidney exchange compatibility graph can be encoded by a constant number of patient and donor attributes, the clearing problem is solvable in polynomial time. We give necessary and sufficient conditions for losslessly shrinking the representation of an arbitrary compatibility graph. 
Then, using real compatibility graphs from the UNOS nationwide kidney exchange, we show how many attributes are needed to encode real compatibility graphs. The experiments show that, indeed, small numbers of attributes suffice.\nMany interesting real world domains involve reinforcement learning (RL) in partially observable environments. Efficient learning in such domains is important, but existing sample complexity bounds for partially observable RL are at least exponential in the episode length. We give, to our knowledge, the first partially observable RL algorithm with a polynomial bound on the number of episodes on which the algorithm may not achieve near-optimal performance. Our algorithm is suitable for an important class of episodic POMDPs. Our approach builds on recent advances in method of moments for latent variable model estimation.\nThe Schatten-p quasi-norm $(0<p<1)$ is usually used to replace the standard nuclear norm in order to approximate the rank function more accurately. However, existing Schatten-p quasi-norm minimization algorithms involve singular value decomposition (SVD) or eigenvalue decomposition (EVD) in each iteration, and thus may become very slow and impractical for large-scale problems. In this paper, we first define two tractable Schatten quasi-norms, i.e., the Frobenius/nuclear hybrid and bi-nuclear quasi-norms, and then prove that they are in essence the Schatten-2/3 and 1/2 quasi-norms, respectively, which lead to the design of very efficient algorithms that only need to update two much smaller factor matrices. We also design two efficient proximal alternating linearized minimization algorithms for solving representative matrix completion problems. Finally, we provide the global convergence and performance guarantees for our algorithms, which have better convergence properties than existing algorithms. 
Experimental results on synthetic and real-world data show that our algorithms are more accurate than the state-of-the-art methods, and are orders of magnitude faster.\nConcerns over the risks associated with advances in Artificial Intelligence have prompted calls for greater efforts toward robust and beneficial AI, including machine ethics. Recently, roboticists have responded by initiating the development of so-called ethical robots. These robots would, ideally, evaluate the consequences of their actions and morally justify their choices. This emerging field promises to develop extensively over the coming years. However, in this paper, we point out an inherent limitation of the emerging field of ethical robots. We show that building ethical robots also necessarily facilitates the construction of unethical robots. In three experiments, we show that it is remarkably easy to modify an ethical robot so that it behaves competitively, or even aggressively. The reason for this is that the specific AI required to make an ethical robot can always be exploited to make unethical robots. Hence, the development of ethical robots will not guarantee the responsible deployment of AI. While advocating for ethical robots, we conclude that preventing the misuse of robots is beyond the scope of engineering, and requires instead governance frameworks underpinned by legislation. Without this, the development of ethical robots will serve to increase the risks of robotic malpractice instead of diminishing it.\nThis paper proposes a model whose aim is to provide a more coherent framework for agent design. We identify three closely related anthropo-centered domains working on separate functional levels. Abstracting from human physiology, psychology, and philosophy, we create the $P^3$ model to be used as a multi-tier approach to deal with a complex class of problems. The three layers identified in this model have been named PhysioComputing, MindComputing, and MetaComputing. 
Finally, several instantiations of this model are presented, related to different IT areas such as artificial intelligence, distributed computing, software and service engineering.\nRapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function (\"avoiding side effects\" and \"avoiding reward hacking\"), an objective function that is too expensive to evaluate frequently (\"scalable supervision\"), or undesirable behavior during the learning process (\"safe exploration\" and \"distributional shift\"). We review previous work in these areas as well as suggesting research directions with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI.\nMarkov models lie at the interface between statistical independence in a probability distribution and graph separation properties. We review model selection and estimation in directed and undirected Markov models with Gaussian parametrization, emphasizing the main similarities and differences. These two models are similar but not equivalent, although they share a common intersection. We present the existing results from a historical perspective, taking into account the amount of literature existing from both the artificial intelligence and statistics research communities, where these models originated. We also discuss how the Gaussian assumption can be relaxed. 
We finally point out the main areas of application where these Markov models are nowadays used.\nMarkov Logic Networks (MLN) and Probabilistic Soft Logic (PSL) are widely applied formalisms in Statistical Relational Learning, an emerging area in Artificial Intelligence that is concerned with combining logical and statistical AI. Despite their resemblance, their relationship has not been formally stated. In this paper, we describe the precise semantic relationship between them from a logical perspective. This is facilitated by first extending fuzzy logic to allow weights, which can also be viewed as a generalization of PSL, and then relating that generalization to MLN. We observe that the relationship between PSL and MLN is analogous to the known relationship between fuzzy logic and Boolean logic, and furthermore the weight scheme of PSL is essentially a generalization of the weight scheme of MLN for the many-valued setting.\nWe consider the problem of personalization of online services from the viewpoint of ad targeting, where we seek to find the best ad categories to be shown to each user, resulting in improved user experience and increased advertisers' revenue. We propose to address this problem as a task of ranking the ad categories depending on a user's preference, and introduce a novel label ranking approach capable of efficiently learning non-linear, highly accurate models in large-scale settings. Experiments on a real-world advertising data set with more than 3.2 million users show that the proposed algorithm outperforms the existing solutions in terms of both rank loss and top-K retrieval performance, strongly suggesting the benefit of using the proposed model on large-scale ranking problems.\nWe consider the task of KBP slot filling -- extracting relation information from newswire documents for knowledge base construction. We present our pipeline, which employs Relational Dependency Networks (RDNs) to learn linguistic patterns for relation extraction. 
Additionally, we demonstrate how several components, such as weak supervision, word2vec features, joint learning and the use of human advice, can be incorporated in this relational framework. We evaluate the different components in the benchmark KBP 2015 task and show that RDNs effectively model a diverse set of features and perform competitively with current state-of-the-art relation extraction systems.\nThis paper details the implementation of an algorithm for automatically generating a high-level knowledge network to perform commonsense reasoning, specifically with the application of robotic task repair. The network is represented using a Bayesian Logic Network (BLN) (Jain, Waldherr, and Beetz 2009), which combines a set of directed relations between abstract concepts, including IsA, AtLocation, HasProperty, and UsedFor, with a corresponding probability distribution that models the uncertainty inherent in these relations. Inference over this network enables reasoning over the abstract concepts in order to perform appropriate object substitution or to locate missing objects in the robot's environment. The structure of the network is generated by combining information from two existing knowledge sources: ConceptNet (Speer and Havasi 2012), and WordNet (Miller 1995). This is done in a \"situated\" manner by only including information relevant to a given context. Results show that the generated network is able to accurately predict object categories, locations, properties, and affordances in three different household scenarios.\nA natural language interface exploits the conceptual simplicity and naturalness of the language to create a high-level user-friendly communication channel between humans and machines. One of the promising applications of such interfaces is generating visual interpretations of semantic content of a given natural language that can then be visualized either as a static scene or a dynamic animation. 
This survey discusses requirements and challenges of developing such systems and reports on 26 graphical systems that exploit natural language interfaces, addressing both artificial intelligence and visualization aspects. This work serves as a frame of reference for researchers and aims to enable further advances in the field.\nIn many navigational domains the traversability of cells is conditioned on the path taken. This is often the case in video games, in which a character may need to acquire a certain object (e.g., a key or a flying suit) to be able to traverse specific locations (e.g., doors or high walls). In order for non-player characters to handle such scenarios, we present invJPS, an \"inventory-driven\" pathfinding approach based on the highly successful grid-based Jump-Point-Search (JPS) algorithm. We show, formally and experimentally, that invJPS preserves JPS's optimality guarantees and its symmetry breaking advantages in inventory-based variants of game maps.\nMany artificial intelligences (AIs) are randomized. One can be lucky or unlucky with the random seed; we quantify this effect and show that, perhaps contrary to intuition, this is far from being negligible. Then, we apply two different existing algorithms for selecting good seeds and good probability distributions over seeds. This mainly leads to learning an opening book. We apply this to Phantom Go, which, like all phantom games, is hard for opening book learning. We improve the winning rate from 50% to 70% in 5x5 against the same AI, and from approximately 0% to 40% in 5x5, 7x7 and 9x9 against a stronger (learning) opponent.\nThis paper considers global optimization with a black-box unknown objective function that can be non-convex and non-differentiable. Such a difficult optimization problem arises in many real-world applications, such as parameter tuning in machine learning, engineering design problems, and planning with a complex physics simulator. 
This paper proposes a new global optimization algorithm, called Locally Oriented Global Optimization (LOGO), which aims for both fast convergence in practice and a finite-time error bound in theory. The advantage and usage of the new algorithm are illustrated via theoretical analysis and an experiment conducted with 11 benchmark test functions. Further, we modify the LOGO algorithm to specifically solve a planning problem via policy search with continuous state/action space and long time horizon while maintaining its finite-time error bound. We apply the proposed planning method to accident management of a nuclear power plant. The result of the application study demonstrates the practical utility of our method.\nImage generation remains a fundamental problem in artificial intelligence in general and deep learning in particular. The generative adversarial network (GAN) was successful in generating high-quality samples of natural images. We propose a model called composite generative adversarial network that reveals the complex structure of images with multiple generators in which each generator generates some part of the image. Those parts are combined by an alpha blending process to create a new single image. It can generate, for example, background and face sequentially with two generators, after training on a face dataset. Training was done in an unsupervised way without any labels about what each generator should generate. We found empirically that this generative model can potentially learn the structure of images.\nA recent research trend in Artificial Intelligence (AI) is the combination of several programs into one single, stronger, program; such approaches are termed portfolio methods. We here investigate the application of such methods to Game Playing Programs (GPPs). In addition, we consider the case in which only one GPP is available, decomposing this single GPP into several ones through the use of parameters or even simply random seeds. 
These portfolio methods are trained in a learning phase. We propose two different offline approaches. The simplest one, BestArm, is a straightforward optimization of seeds or parameters; it performs quite well against the original GPP, but performs poorly against an opponent that repeats games and learns. The second one, namely Nash-portfolio, performs similarly in a \"one game\" test, and is much more robust against an opponent who learns. We also propose an online learning portfolio, which tests several of the GPPs repeatedly and progressively switches to the best one using a bandit algorithm.\nBehavior planning is known to be one of the basic cognitive functions, which is essential for any cognitive architecture of any control system used in robotics. At the same time, most of the widespread planning algorithms employed in those systems are developed using only approaches and models of Artificial Intelligence and do not take into account numerous results of cognitive experiments. As a result, there is a strong need for novel methods of behavior planning suitable for modern cognitive architectures aimed at robot control. One such method is presented in this work and is studied within a special class of navigation tasks called the smart relocation task. The method is based on the hierarchical two-level model of abstraction and knowledge representation, i.e. symbolic and subsymbolic. On the symbolic level, a sign world model is used for knowledge representation and a hierarchical planning algorithm, PMA, is used for planning. On the subsymbolic level, the task of path planning is considered and solved as a graph search problem. Interaction between both planners is examined and inter-level interfaces and feedback loops are described. 
Preliminary experimental results are presented.\nLTL synthesis -- the construction of a function to satisfy a logical specification formulated in Linear Temporal Logic -- is a 2EXPTIME-complete problem with relevant applications in controller synthesis and a myriad of artificial intelligence applications. In this research note we consider De Giacomo and Vardi's variant of the synthesis problem for LTL formulas interpreted over finite rather than infinite traces. Rather surprisingly, given the existing claims on complexity, we establish that LTL synthesis is EXPTIME-complete for the finite interpretation, and not 2EXPTIME-complete as previously reported. Our result coincides nicely with the planning perspective where non-deterministic planning with full observability is EXPTIME-complete and partial observability increases the complexity to 2EXPTIME-complete; a recent related result for LTL synthesis shows that in the finite case with partial observability, the problem is 2EXPTIME-complete.\nTemporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals. Experimental results in both discrete and continuous environments showcase the flexibility and efficiency of the framework.\nGaussian state space models have been used for decades as generative models of sequential data. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption. 
We introduce a unified algorithm to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are modeled by deep neural networks. Our learning algorithm simultaneously learns a compiled inference network and the generative model, leveraging a structured variational approximation parameterized by recurrent neural networks to mimic the posterior distribution. We apply the learning algorithm to both synthetic and real-world datasets, demonstrating its scalability and versatility. We find that using the structured approximation to the posterior results in models with significantly higher held-out likelihood.\nWe examine two recent artificial intelligence (AI) based deep learning algorithms for visual blending in convolutional neural networks (Mordvintsev et al. 2015, Gatys et al. 2015). To investigate the potential value of these algorithms as tools for computational creativity research, we explain and schematize the essential aspects of the algorithms' operation and give visual examples of their output. We discuss the relationship of the two algorithms to human cognitive science theories of creativity such as conceptual blending theory and honing theory, and characterize the algorithms with respect to generation of novelty and aesthetic quality.\nThe emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. 
Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.\nDatasets with hundreds of variables and many missing values are commonplace. In this setting, it is both statistically and computationally challenging to detect true predictive relationships between variables and also to suppress false positives. This paper proposes an approach that combines probabilistic programming, information theory, and non-parametric Bayes. It shows how to use Bayesian non-parametric modeling to (i) build an ensemble of joint probability models for all the variables; (ii) efficiently detect marginal independencies; and (iii) estimate the conditional mutual information between arbitrary subsets of variables, subject to a broad class of constraints. Users can access these capabilities using BayesDB, a probabilistic programming platform for probabilistic data analysis, by writing queries in a simple, SQL-like language. This paper demonstrates empirically that the method can (i) detect context-specific (in)dependencies on challenging synthetic problems and (ii) yield improved sensitivity and specificity over baselines from statistics and machine learning, on a real-world database of over 300 sparsely observed indicators of macroeconomic development and public health.\nContinuous optimization is an important problem in many areas of AI, including vision, robotics, probabilistic inference, and machine learning. Unfortunately, most real-world optimization problems are nonconvex, causing standard convex techniques to find only local optima, even with extensions like random restarts and simulated annealing. 
We observe that, in many cases, the local modes of the objective function have combinatorial structure, and thus ideas from combinatorial optimization can be brought to bear. Based on this, we propose a problem-decomposition approach to nonconvex optimization. Similarly to DPLL-style SAT solvers and recursive conditioning in probabilistic inference, our algorithm, RDIS, recursively sets variables so as to simplify and decompose the objective function into approximately independent sub-functions, until the remaining functions are simple enough to be optimized by standard techniques like gradient descent. The variables to set are chosen by graph partitioning, ensuring decomposition whenever possible. We show analytically that RDIS can solve a broad class of nonconvex optimization problems exponentially faster than gradient descent with random restarts. Experimentally, RDIS outperforms standard techniques on problems like structure from motion and protein folding.\nIn the era of the digital information boom, the fuzzy covering rough set model is an important mathematical tool for artificial intelligence, and how to bridge fuzzy covering rough set theory and Pawlak's model is becoming a hot research topic. In this paper, we first present the $\\gamma-$fuzzy covering based probabilistic and grade approximation operators and double-quantitative approximation operators. We also study the relationships among the three types of $\\gamma-$fuzzy covering based approximation operators. Second, we propose the $\\gamma^{\\ast}-$fuzzy coverings based multi-granulation probabilistic and grade lower and upper approximation operators and multi-granulation double-quantitative lower and upper approximation operators. We also investigate the relationships among these types of $\\gamma-$fuzzy coverings based approximation operators. 
Finally, we employ several examples to illustrate how to construct the lower and upper approximations of fuzzy sets with the absolute and relative quantitative information.\nMarkov logic networks (MLNs) reconcile two opposing schools in machine learning and artificial intelligence: causal networks, which account for uncertainty extremely well, and first-order logic, which allows for formal deduction. An MLN is essentially a first-order logic template to generate Markov networks. Inference in MLNs is probabilistic and it is often performed by approximate methods such as Markov chain Monte Carlo (MCMC) Gibbs sampling. An MLN has many regular, symmetric structures that can be exploited at both first-order level and in the generated Markov network. We analyze the graph structures that are produced by various lifting methods and investigate the extent to which quantum protocols can be used to speed up Gibbs sampling with state preparation and measurement schemes. We review different such approaches, discuss their advantages, theoretical limitations, and their appeal to implementations. We find that a straightforward application of a recent result yields exponential speedup compared to classical heuristics in approximate probabilistic inference, thereby demonstrating another example where advanced quantum resources can potentially prove useful in machine learning.\nMultivariate Pattern (MVP) classification can map different cognitive states to the brain tasks. One of the main challenges in MVP analysis is validating the generated results across subjects. However, analyzing multi-subject fMRI data requires accurate functional alignments between neuronal activities of different subjects, which can rapidly increase the performance and robustness of the final results. Hyperalignment (HA) is one of the most effective functional alignment methods, which can be mathematically formulated by the Canonical Correlation Analysis (CCA) methods. 
Since HA mostly uses unsupervised CCA techniques, its solution may not be optimized for MVP analysis. By incorporating the idea of Local Discriminant Analysis (LDA) into CCA, this paper proposes Local Discriminant Hyperalignment (LDHA) as a novel supervised HA method, which can provide better functional alignment for MVP analysis. Indeed, the locality is defined based on the stimuli categories in the training set, where the correlation between all stimuli in the same category will be maximized and the correlation between distinct categories of stimuli approaches zero. Experimental studies on multi-subject MVP analysis confirm that the LDHA method achieves superior performance to other state-of-the-art HA algorithms.\nThe family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD({\\lambda}) to data-efficient least-squares methods. Least-squares methods make the best use of available data by directly computing the TD solution and thus do not require tuning a typically highly sensitive learning rate parameter, but require quadratic computation and storage. Recent algorithmic developments have yielded several sub-quadratic methods that use an approximation to the least-squares TD solution, but incur bias. In this paper, we propose a new family of accelerated gradient TD (ATD) methods that (1) provide similar data efficiency benefits to least-squares methods, at a fraction of the computation and storage, (2) significantly reduce parameter sensitivity compared to linear TD methods, and (3) are asymptotically unbiased. We illustrate these claims with a proof of convergence in expectation and experiments on several benchmark domains and a large-scale industrial energy allocation domain.\nThe ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. 
Neural networks are not, in general, capable of this, and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate that our approach is scalable and effective by solving a set of classification tasks based on the MNIST handwritten digit dataset and by learning several Atari 2600 games sequentially.\nWe tackle the prediction of instructor intervention in student posts from discussion forums in Massive Open Online Courses (MOOCs). Our key finding is that using automatically obtained discourse relations improves the prediction of when instructors intervene in student discussions, when compared with a state-of-the-art, feature-rich baseline. Our supervised classifier makes use of an automatic discourse parser which outputs Penn Discourse Treebank (PDTB) tags that represent in-post discourse features. We show that PDTB relation-based features increase the robustness of the classifier and complement baseline features in recalling more diverse instructor intervention patterns. In comprehensive experiments over 14 MOOC offerings from several disciplines, the PDTB discourse features improve performance on average. The resultant models are less dependent on domain-specific vocabulary, allowing them to better generalize to new courses.\nRepresentations are fundamental to artificial intelligence. The performance of a learning system depends on the type of representation used for representing the data. Typically, these representations are hand-engineered using domain knowledge. More recently, the trend is to learn these representations through stochastic gradient descent in multi-layer neural networks, which is called backprop. 
Learning the representations directly from the incoming data stream reduces the human labour involved in designing a learning system. More importantly, this allows a learning system to scale to difficult tasks. In this paper, we introduce a new incremental learning algorithm called crossprop, which learns incoming weights of hidden units based on the meta-gradient descent approach that was previously introduced by Sutton (1992) and Schraudolph (1999) for learning step-sizes. The final update equation introduces an additional memory parameter for each of these weights and generalizes the backprop update equation. Our experiments show that crossprop learns and reuses its feature representation while tackling new and unseen tasks, whereas backprop relearns a new feature representation.\nSparsity-constrained optimization is an important and challenging problem that has wide applicability in data mining, machine learning, and statistics. In this paper, we focus on sparsity-constrained optimization in cases where the cost function is a general nonlinear function and, in particular, the sparsity constraint is defined by a graph-structured sparsity model. Existing methods explore this problem in the context of sparse estimation in linear models. To the best of our knowledge, this is the first work to present an efficient approximation algorithm, namely, Graph-structured Matching Pursuit (Graph-Mp), to optimize a general nonlinear function subject to graph-structured constraints. We prove that our algorithm enjoys strong guarantees analogous to those designed for linear models in terms of convergence rate and approximation accuracy. 
As a case study, we specialize Graph-Mp to optimize a number of well-known graph scan statistic models for the connected subgraph detection task, and empirical evidence demonstrates that our general algorithm outperforms state-of-the-art methods that are designed specifically for the task of connected subgraph detection.\nAlgorithms for equilibrium computation generally make no attempt to ensure that the computed strategies are understandable by humans. For instance, the strategies for the strongest poker agents are represented as massive binary files. In many situations, we would like to compute strategies that can actually be implemented by humans, who may have computational limitations and may only be able to remember a small number of features or components of the strategies that have been computed. We study poker games where private information distributions can be arbitrary. We create a large training set of game instances and solutions by randomly selecting the information probabilities, and present algorithms that learn from the training instances in order to perform well in games with unseen information distributions. We are able to derive several new fundamental rules about poker strategy that can be easily implemented by humans.\nOptimization (minimization or maximization) in the lattice of subsets is a frequent operation in Artificial Intelligence tasks. Examples are subset-minimal model-based diagnosis, nonmonotonic reasoning by means of circumscription, or preferred extensions in abstract argumentation. Finding the optimum among many admissible solutions is often harder than finding admissible solutions with respect to both computational complexity and methodology. This paper addresses the former issue by means of an effective method for finding subset-optimal solutions. 
It is based on the relationship between cardinality-optimal and subset-optimal solutions, and the fact that many logic-based declarative programming systems provide constructs for finding cardinality-optimal solutions, for example maximum satisfiability (MaxSAT) or weak constraints in Answer Set Programming (ASP). Clearly each cardinality-optimal solution is also a subset-optimal one, and if the language also allows for the addition of particular restricting constructs (both MaxSAT and ASP do) then all subset-optimal solutions can be found by an iterative computation of cardinality-optimal solutions. As a showcase, the computation of preferred extensions of abstract argumentation frameworks using the proposed method is studied.\nOne of the key challenges of artificial intelligence is to learn models that are effective in the context of planning. In this document we introduce the predictron architecture. The predictron consists of a fully abstract model, represented by a Markov reward process, that can be rolled forward multiple \"imagined\" planning steps. Each forward pass of the predictron accumulates internal rewards and values over multiple planning depths. The predictron is trained end-to-end so as to make these accumulated values accurately approximate the true value function. We applied the predictron to procedurally generated random mazes and a simulator for the game of pool. The predictron yielded significantly more accurate predictions than conventional deep neural network architectures.\nConvolutions have long been regarded as fundamental to applied mathematics, physics and engineering. Their mathematical elegance allows for common tasks such as numerical differentiation to be computed efficiently on large data sets. Efficient computation of convolutions is critical to artificial intelligence in real-time applications, like machine vision, where convolutions must be continuously and efficiently computed on tens to hundreds of kilobytes per second. 
In this paper, we explore how convolutions are used in fundamental machine vision applications. We present an accelerated n-dimensional convolution package in the high-performance computing language, Julia, and demonstrate its efficacy in solving the time-to-contact problem for machine vision. Results are measured against synthetically generated videos and quantitatively assessed according to their mean squared error from the ground truth. We achieve over an order of magnitude decrease in compute time and allocated memory for comparable machine vision applications. All code is packaged and integrated into the official Julia Package Manager to be used in various other scenarios.\nOpponent modeling consists of modeling the strategy or preferences of an agent from the data it provides. In the context of automated negotiation and with machine learning, it can result in an advantage so overwhelming that it may deter some casual agents from taking part in the bargaining process. We qualify as \"curious\" an agent driven by the desire to negotiate in order to collect information and improve its opponent model. However, neither curiosity-based rationality nor curiosity-robust protocols have been studied in automatic negotiation. In this paper, we rely on mechanism design to propose three extensions of the standard bargaining protocol that limit information leakage. Those extensions are supported by an enhanced rationality model that considers the exchanged information. Also, they are theoretically analyzed and experimentally evaluated.\nIn this paper, we propose efficient transfer learning methods for training a personalized language model using a recurrent neural network with long short-term memory architecture. With our proposed fast transfer learning schemes, a general language model is updated to a personalized language model with a small amount of user data and limited computing resources. 
These methods are especially useful in a mobile device environment where the data is prevented from being transferred out of the device for privacy purposes. Through experiments on dialogue data in a drama, it is verified that our transfer learning methods have successfully generated the personalized language model, whose output is more similar to the personal language style in both qualitative and quantitative aspects.\nA common problem in disciplines of applied statistics research such as astrostatistics is that of estimating the posterior distribution of relevant parameters. Typically, the likelihoods for such models are computed via expensive experiments such as cosmological simulations of the universe. An urgent challenge in these research domains is to develop methods that can estimate the posterior with few likelihood evaluations. In this paper, we study active posterior estimation in a Bayesian setting when the likelihood is expensive to evaluate. Existing techniques for posterior estimation are based on generating samples representative of the posterior. Such methods do not consider efficiency in terms of likelihood evaluations. In order to be query-efficient, we treat posterior estimation in an active regression framework. We propose two myopic query strategies to choose where to evaluate the likelihood and implement them using Gaussian processes. Via experiments on a series of synthetic and real examples, we demonstrate that our approach is significantly more query-efficient than existing techniques and other heuristics for posterior estimation.\nThis paper reconsiders the problem of the absent-minded driver who must choose between alternatives with different payoffs with imperfect recall and varying degrees of knowledge of the system. 
The classical absent-minded driver problem represents the case with limited information and it has bearing on the general area of communication and learning, social choice, mechanism design, auctions, theories of knowledge, belief, and rational agency. Within the framework of extensive games, this problem has applications to many artificial intelligence scenarios. It is obvious that the performance of the agent improves as information available increases. It is shown that a non-uniform assignment strategy for successive choices does better than a fixed probability strategy. We consider both classical and quantum approaches to the problem. We argue that the superior performance of quantum decisions with access to entanglement cannot be fairly compared to a classical algorithm. If the cognitive systems of agents are taken to have access to quantum resources, or have a quantum mechanical basis, then that can be leveraged into superior performance.\nThere has been a recent explosion in the capabilities of game-playing artificial intelligence. Many classes of RL tasks, from Atari games to motor control to board games, are now solvable by fairly generic algorithms, based on deep learning, that learn to play from experience with minimal knowledge of the specific domain of interest. In this work, we will investigate the performance of these methods on Super Smash Bros. Melee (SSBM), a popular console fighting game. The SSBM environment has complex dynamics and partial observability, making it challenging for human and machine alike. The multi-player aspect poses an additional challenge, as the vast majority of recent advances in RL have focused on single-agent environments. 
Nonetheless, we will show that it is possible to train agents that are competitive against and even surpass human professionals, a new result for the multi-player video game setting.\nProgramming by Example (PBE) aims at automatically inferring a computer program for accomplishing a certain task from sample input and output. In this paper, we propose a deep neural network (DNN) based PBE model called Neural Programming by Example (NPBE), which can learn from input-output strings and induce programs that solve string manipulation problems. Our NPBE model has four neural network based components: a string encoder, an input-output analyzer, a program generator, and a symbol selector. We demonstrate the effectiveness of NPBE by training it end-to-end to solve some common string manipulation problems in spreadsheet systems. The results show that our model can induce string manipulation programs effectively. Our work is one step towards teaching DNNs to generate computer programs.\nIn this paper, we follow up on our previous research in the area of Computerized Adaptive Testing (CAT). We present three different methods for CAT. One of them, the item response theory, is a well-established method, while the other two, Bayesian and neural networks, are new in the area of educational testing. In the first part of this paper, we present the concept of CAT and its advantages and disadvantages. We collected data from paper tests performed with grammar school students. We provide the summary of data used for our experiments in the second part. Next, we present three different model types for CAT. They are based on the item response theory, Bayesian networks, and neural networks. The general theory associated with each type is briefly explained and the utilization of these models for CAT is analyzed. Future research is outlined in the concluding part of the paper. 
It shows many interesting research paths that are important not only for CAT but also for other areas of artificial intelligence.\nWhich topics of machine learning are most commonly addressed in research? This question was initially answered in 2007 by doing a qualitative survey among distinguished researchers. In our study, we revisit this question from a quantitative perspective. Concretely, we collect 54K abstracts of papers published between 2007 and 2016 in leading machine learning journals and conferences. We then use machine learning in order to determine the top 10 topics in machine learning. We not only include models, but provide a holistic view across optimization, data, features, etc. This quantitative approach allows reducing the bias of surveys. It reveals new and up-to-date insights into what the 10 most prolific topics in machine learning research are. This allows researchers to identify popular topics as well as new and rising topics for their research.\nTreatment effects can be estimated from observational data as the difference in potential outcomes. In this paper, we address the challenge of estimating the potential outcome when treatment-dose levels can vary continuously over time. Further, the outcome variable may not be measured at a regular frequency. Our proposed solution represents the treatment response curves using linear time-invariant dynamical systems---this provides a flexible means for modeling response over time to highly variable dose curves. Moreover, for multivariate data, the proposed method: uncovers shared structure in treatment response and the baseline across multiple markers; and, flexibly models challenging correlation structure both across and within signals over time. For this, we build upon the framework of multiple-output Gaussian Processes. 
On simulated data and a challenging clinical dataset, we show significant gains in accuracy over state-of-the-art models.\nThe weighted Maximum Satisfiability problem (weighted MAX-SAT) is an NP-hard problem with numerous applications arising in artificial intelligence. As an efficient tool for heuristic design, the backbone has been applied to heuristic design for many NP-hard problems. In this paper, we investigated the computational complexity of retrieving the backbone in weighted MAX-SAT and developed a new algorithm for solving this problem. We showed that it is intractable to retrieve the full backbone under the assumption that . Moreover, it is intractable to retrieve a fixed fraction of the backbone as well. We then presented a backbone-guided local search (BGLS) with a Walksat operator for weighted MAX-SAT. BGLS consists of two phases: the first phase samples the backbone information from local optima and the backbone phase conducts local search under the guidance of the backbone. Extensive experimental results on the benchmark showed that BGLS outperforms the existing heuristics in both solution quality and runtime.\nWith the recent advancements in Artificial Intelligence (AI), various organizations and individuals have started debating whether the progress of AI is a blessing or a curse for the future of society. This paper investigates how the public perceives the progress of AI by utilizing the data shared on Twitter. Specifically, this paper performs a comparative analysis of the understanding of users from two categories -- general AI-Tweeters (AIT) and the expert AI-Tweeters (EAIT) who share posts about AI on Twitter. Our analysis revealed that users from both categories express distinct emotions and interests towards AI. Users from both categories regard AI as positive and are optimistic about the progress of AI, but the experts are more negative than the general AI-Tweeters. 
Characterization of users revealed that `London' is the most popular location from which users tweet about AI. Tweets posted by AIT are retweeted more than posts made by EAIT, which reveals greater diffusion of information from AIT.\nThis work presents a model for the Tramp Ship Scheduling problem including berth allocation considerations, motivated by a real case of a shipping company. The aim is to determine the travel schedule for each vessel considering multiple docking and multiple time windows at the berths. This work is innovative due to the consideration of both spatial and temporal attributes during the scheduling process. The resulting model is formulated as a mixed-integer linear programming problem, and a heuristic method to deal with multiple vessel schedules is also presented. Numerical experimentation is performed to highlight the benefits of the proposed approach and the applicability of the heuristic. Conclusions and recommendations for further research are provided.\nLogics of limited belief aim at enabling computationally feasible reasoning in highly expressive representation languages. These languages are often dialects of first-order logic with a weaker form of logical entailment that keeps reasoning decidable or even tractable. While a number of such logics have been proposed in the past, they tend to remain objects of theoretical analysis only and their practical relevance is very limited. In this paper, we aim to go beyond the theory. Building on earlier work by Liu, Lakemeyer, and Levesque, we develop a logic of limited belief that is highly expressive while remaining decidable in the first-order and tractable in the propositional case, and that exhibits some characteristics that make it attractive for an implementation. 
We introduce a reasoning system that employs this logic as its representation language and present experimental results that showcase the benefit of limited belief.\nThis paper uses anthropic reasoning to argue for a reduced likelihood that superintelligent AI will come into existence in the future. To make this argument, a new principle is introduced: the Super-Strong Self-Sampling Assumption (SSSSA), building on the Self-Sampling Assumption (SSA) and the Strong Self-Sampling Assumption (SSSA). SSA uses as its sample the relevant observers, whereas SSSA goes further by using observer-moments. SSSSA goes further still and weights each sample proportionally, according to the size of a mind in cognitive terms. SSSSA is required for human observer-samples to be typical, given how much non-human animals outnumber humans. Given SSSSA, the assumption that humans experience typical observer-samples relies on a future where superintelligent AI does not dominate, which in turn reduces the likelihood of it being created at all.\nGood parameter settings are crucial to achieve high performance in many areas of artificial intelligence (AI), such as satisfiability solving, AI planning, scheduling, and machine learning (in particular deep learning). Automated algorithm configuration methods have recently received much attention in the AI community since they replace tedious, irreproducible and error-prone manual parameter tuning and can lead to new state-of-the-art performance. However, practical applications of algorithm configuration are prone to several (often subtle) pitfalls in the experimental design that can render the procedure ineffective. 
We identify several common issues and propose best practices for avoiding them, including a tool called GenericWrapper4AC for preventing the many possible problems in measuring the performance of the algorithm being optimized by executing it in a standardized, controlled manner.\nWe consider goods that can be shared with k-hop neighbors (i.e., the set of nodes within k hops from an owner) on a social network. We examine incentives to buy such a good by devising game-theoretic models where each node decides whether to buy the good or free ride. First, we find that social inefficiency, specifically excessive purchase of the good, occurs in Nash equilibria. Second, the social inefficiency decreases as k increases and thus a good can be shared with more nodes. Third, and most importantly, the social inefficiency can also be significantly reduced by charging free riders an access cost and paying it to owners, leading to the conclusion that organizations and system designers should impose such a cost. These findings are supported by our theoretical analysis in terms of the price of anarchy and the price of stability; and by simulations based on synthetic and real social networks.\nMany network optimization problems can be formulated as stochastic network design problems in which edges are present or absent stochastically. Furthermore, protective actions can guarantee that edges will remain present. We consider the problem of finding the optimal protection strategy under a budget limit in order to maximize some connectivity measurements of the network. Previous approaches rely on the assumption that edges are independent. In this paper, we consider a more realistic setting where multiple edges are not independent due to natural disasters or regional events that make the states of multiple edges stochastically correlated. We use Markov Random Fields to model the correlation and define a new stochastic network design framework. 
We provide a novel algorithm based on Sample Average Approximation (SAA) coupled with a Gibbs or XOR sampler. The experimental results on real road network data show that the policies produced by SAA with the XOR sampler have higher quality and lower variance compared to SAA with the Gibbs sampler.\nAttempts to train a comprehensive artificial intelligence capable of solving multiple tasks have been impeded by a chronic problem called catastrophic forgetting. Although simply replaying all previous data alleviates the problem, it requires large memory and, even worse, is often infeasible in real-world applications where access to past data is limited. Inspired by the generative nature of the hippocampus as a short-term memory system in the primate brain, we propose the Deep Generative Replay, a novel framework with a cooperative dual model architecture consisting of a deep generative model (\"generator\") and a task solving model (\"solver\"). With only these two models, training data for previous tasks can easily be sampled and interleaved with those for a new task. We test our methods in several sequential learning settings involving image classification tasks.\nAdvances in artificial intelligence (AI) will transform modern life by reshaping transportation, health, science, finance, and the military. To adapt public policy, we need to better anticipate these advances. Here we report the results from a large survey of machine learning researchers on their beliefs about progress in AI. Researchers predict AI will outperform humans in many activities in the next ten years, such as translating languages (by 2024), writing high-school essays (by 2026), driving a truck (by 2027), working in retail (by 2031), writing a bestselling book (by 2049), and working as a surgeon (by 2053). 
Researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years, with Asian respondents expecting these dates much sooner than North Americans. These results will inform discussion amongst researchers and policymakers about anticipating and managing trends in AI.\nIn this paper, we explore the theoretical boundaries of planning in a setting where no model of the agent's actions is given. Instead of an action model, a set of successfully executed plans is given and the task is to generate a plan that is safe, i.e., guaranteed to achieve the goal without failing. To this end, we show how to learn a conservative model of the world in which actions are guaranteed to be applicable. This conservative model is then given to an off-the-shelf classical planner, resulting in a plan that is guaranteed to achieve the goal. However, this reduction from model-free planning to model-based planning is not complete: in some cases a plan will not be found even when one exists. We analyze the relation between the number of observed plans and the likelihood that our conservative approach will indeed fail to solve a solvable problem. Our analysis shows that the number of trajectories needed scales gracefully.\nThe traveling salesman problem (TSP) is a classical computer science optimization problem with applications to industrial engineering, theoretical computer science, bioinformatics, and several other disciplines. In recent years, there has been a plethora of novel approaches for approximate solutions, ranging from simplistic greedy algorithms to cooperative distributed algorithms derived from artificial intelligence. In this paper, we perform an evaluation and analysis of cornerstone algorithms for the Euclidean TSP. We evaluate greedy, 2-opt, and genetic algorithms. 
We use several datasets as input for the algorithms, including a small dataset, a medium-sized dataset representing cities in the United States, and a synthetic dataset consisting of 200 cities to test algorithm scalability. We discover that the greedy and 2-opt algorithms efficiently calculate solutions for smaller datasets. The genetic algorithm achieves the best optimality for medium to large datasets, but generally has a longer runtime. Our implementations are publicly available.\nMany state-of-the-art reinforcement learning (RL) algorithms assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.\nWhile the basic laws of Newtonian mechanics are well understood, explaining a physical scenario still requires manually modeling the problem with suitable equations and associated parameters. In order to adopt such models for artificial intelligence, researchers have handcrafted the relevant states, and then used neural networks to learn the state transitions using simulation runs as training data. 
Unfortunately, such approaches can be unsuitable for modeling complex real-world scenarios, where manually authoring relevant state spaces tends to be challenging. In this work, we investigate whether neural networks can implicitly learn physical states of real-world mechanical processes based only on visual data, and thus enable long-term physical extrapolation. We develop a recurrent neural network architecture for this task and also characterize resultant uncertainties in the form of evolving variance estimates. We evaluate our setup on extrapolating the motion of a rolling ball on a bowl of varying shape and orientation using only images as input, and report results competitive with approaches that assume access to internal physics models and parameters.\nMulti-Agent Path Finding (MAPF) is an NP-hard problem well studied in artificial intelligence and robotics. It has many real-world applications for which existing MAPF solvers use various heuristics. However, these solvers are deterministic and perform poorly on \"hard\" instances typically characterized by many agents interfering with each other in a small region. In this paper, we enhance MAPF solvers with randomization and observe that they exhibit heavy-tailed distributions of runtimes on hard instances. This leads us to develop simple rapid randomized restart (RRR) strategies with the intuition that, given a hard instance, multiple short runs have a better chance of solving it compared to one long run. We validate this intuition through experiments and show that our RRR strategies indeed boost the performance of state-of-the-art MAPF solvers such as iECBS and M*.\nThis paper addresses dynamic difficulty adjustment in MOBA games as a way to improve the player's entertainment. Although MOBA is currently one of the most played genres around the world, it is known as a genre that offers less autonomy, more challenges and consequently more frustration. 
Due to these characteristics, a mechanism that balances difficulty dynamically seems to be an interesting way to minimize or avoid such frustrations for players. Accordingly, this paper presents a dynamic difficulty adjustment mechanism for MOBA games. The main idea is to create a computer-controlled opponent that adapts dynamically to the player's performance, trying to offer the player a better game experience. This is done by evaluating the performance of the player using a metric based on some game features and switching the difficulty of the opponent's artificial intelligence behavior accordingly. Quantitative and qualitative experiments were performed and the results showed that the system is capable of adapting dynamically to the opponent's skills. In spite of that, the qualitative experiments with users showed that the player's expertise has a greater influence on the perception of the difficulty level and dynamic adaptation.\nWe study the problem of identifying the best action among a set of possible options when the value of each action is given by a mapping from a number of noisy micro-observables in the so-called fixed confidence setting. Our main motivation is the application to minimax game search, which has been a major topic of interest in artificial intelligence. In this paper, we introduce an abstract setting to clearly describe the essential properties of the problem. While previous work only considered a two-move game tree search problem, our abstract setting can be applied to general minimax games where the depth can be non-uniform and arbitrary, and transpositions are allowed. We introduce a new algorithm (LUCB-micro) for the abstract setting, and give its lower and upper sample complexity results. 
Our bounds recover some previous results, which were only available in more limited settings, while they also shed further light on how the structure of minimax problems influences sample complexity.\nThere is significant concern that technological advances, especially in Robotics and Artificial Intelligence (AI), could lead to high levels of unemployment in the coming decades. Studies have estimated that around half of all current jobs are at risk of automation. To look into this issue in more depth, we surveyed experts in Robotics and AI about the risk, and compared their views with those of non-experts. Whilst the experts predicted a significant number of occupations were at risk of automation in the next two decades, they were more cautious than people outside the field in predicting occupations at risk. Their predictions were consistent with their estimates for when computers might be expected to reach human-level performance across a wide range of skills. These estimates were typically decades later than those of the non-experts. Technological barriers may therefore provide society with more time to prepare for an automated future than the public fears. In addition, public expectations may need to be dampened about the speed of progress to be expected in Robotics and AI.\nAnimals (especially humans) have an amazing ability to learn new tasks quickly, and switch between them flexibly. How brains support this ability is largely unknown, both neuroscientifically and algorithmically. One reasonable supposition is that modules drawing on an underlying general-purpose sensory representation are dynamically allocated on a per-task basis. Recent results from neuroscience and artificial intelligence suggest the role of the general-purpose visual representation may be played by a deep convolutional neural network, and give some clues as to how task modules based on such a representation might be discovered and constructed. 
In this work, we investigate module architectures in an embodied two-dimensional touchscreen environment, in which an agent's learning must occur via interactions with an environment that emits images and rewards, and accepts touches as input. This environment is designed to capture the physical structure of the task environments that are commonly deployed in visual neuroscience and psychophysics. We show that in this context, very simple changes in the nonlinear activations used by such a module can significantly influence how quickly it learns visual tasks and how suitable it is for switching to new tasks.\nWe introduce a new count-based optimistic exploration algorithm for Reinforcement Learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited states, but at present few methods enable generalisation regarding uncertainty. This has prevented the combination of scalable RL algorithms with efficient exploration strategies that drive the agent to reduce its uncertainty. We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state. Our \\phi-pseudocount achieves generalisation by exploiting the same feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The \\phi-Exploration-Bonus algorithm rewards the agent for exploring in feature space rather than in the untransformed state space. 
The method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks.\nEmotions play a central role in most forms of natural human interaction so we may expect that computational methods for the processing and expression of emotions will play a growing role in human-computer interaction. The OCC model has established itself as the standard model for emotion synthesis. A large number of studies employed the OCC model to generate emotions for their embodied characters. Many developers of such characters believe that the OCC model will be all they ever need to equip their character with emotions. This study reflects on the limitations of the OCC model specifically, and on the emotion models in general due to their dependency on artificial intelligence.\nComplex systems can be modelled at various levels of detail. Ideally, causal models of the same system should be consistent with one another in the sense that they agree in their predictions of the effects of interventions. We formalise this notion of consistency in the case of Structural Equation Models (SEMs) by introducing exact transformations between SEMs. This provides a general language to consider, for instance, the different levels of description in the following three scenarios: (a) models with large numbers of variables versus models in which the `irrelevant' or unobservable variables have been marginalised out; (b) micro-level models versus macro-level models in which the macro-variables are aggregate features of the micro-variables; (c) dynamical time series models versus models of their stationary behaviour. Our analysis stresses the importance of well specified interventions in the causal modelling process and sheds light on the interpretation of cyclic SEMs.\nSocial dilemmas are situations where individuals face a temptation to increase their payoffs at a cost to total welfare. 
Building artificially intelligent agents that achieve good outcomes in these situations is important because many real-world interactions include a tension between selfish interests and the welfare of others. We show how to modify modern reinforcement learning methods to construct agents that act in ways that are simple to understand, nice (begin by cooperating), provokable (try to avoid being exploited), and forgiving (try to return to mutual cooperation). We show both theoretically and experimentally that such agents can maintain cooperation in Markov social dilemmas. Our construction does not require training methods beyond a modification of self-play; thus, if an environment is such that good strategies can be constructed in the zero-sum case (e.g., Atari), then we can construct agents that solve social dilemmas in this environment.\nThis paper introduces the Intentional Unintentional (IU) agent. This agent endows the deep deterministic policy gradients (DDPG) agent for continuous control with the ability to solve several tasks simultaneously. Learning to solve many tasks simultaneously has been a long-standing, core goal of artificial intelligence, inspired by infant development and motivated by the desire to build flexible robot manipulators capable of many diverse behaviours. We show that the IU agent not only learns to solve many tasks simultaneously but also learns faster than agents that target a single task at a time. In some cases, where the single-task DDPG method completely fails, the IU agent successfully solves the task. To demonstrate this, we build a playroom environment using the MuJoCo physics engine, and introduce a grounded formal language to automatically generate tasks.\nBayesian optimization (BO) has recently attracted the attention of the automatic machine learning community for its excellent results in hyperparameter tuning. BO is characterized by the sample efficiency with which it can optimize expensive black-box functions. 
The efficiency is achieved in a similar fashion to the learning to learn methods: surrogate models (typically in the form of Gaussian processes) learn the target function and perform intelligent sampling. This surrogate model can be applied even in the presence of noise; however, as with most regression methods, it is very sensitive to outlier data. This can result in erroneous predictions and, in the case of BO, biased and inefficient exploration. In this work, we present a GP model that is robust to outliers which uses a Student-t likelihood to segregate outliers and robustly conduct Bayesian optimization. We present numerical results evaluating the proposed method in both artificial functions and real problems.\nAutomated planning remains one of the most general paradigms in Artificial Intelligence, providing means of solving problems coming from a wide variety of domains. One of the key factors restricting the applicability of planning is its computational complexity resulting from exponentially large search spaces. Heuristic approaches are necessary to solve all but the simplest problems. In this work, we explore the possibility of obtaining domain-independent heuristic functions using machine learning. This is a part of a wider research program whose objective is to improve practical applicability of planning in systems for which the planning domains evolve at run time. The challenge is therefore the learning of (corrections of) domain-independent heuristics that can be reused across different planning domains.\nA variety of representation learning approaches have been investigated for reinforcement learning; much less attention, however, has been given to investigating the utility of sparse coding. Outside of reinforcement learning, sparse coding representations have been widely used, with non-convex objectives that result in discriminative representations. In this work, we develop a supervised sparse coding objective for policy evaluation. 
Despite the non-convexity of this objective, we prove that all local minima are global minima, making the approach amenable to simple optimization strategies. We empirically show that it is key to use a supervised objective, rather than the more straightforward unsupervised sparse coding approach. We compare the learned representations to a canonical fixed sparse representation, called tile-coding, demonstrating that the sparse coding representation outperforms a wide variety of tile-coding representations.\nLifted inference algorithms commonly exploit symmetries in a probabilistic graphical model (PGM) for efficient inference. However, existing algorithms for Boolean-valued domains can identify as symmetric only those pairs of states in which the numbers of ones and zeros match exactly (count symmetries). Moreover, algorithms for lifted inference in multi-valued domains also compute a multi-valued extension of count symmetries only. These algorithms miss many symmetries in a domain. In this paper, we present the first algorithms to compute non-count symmetries in both Boolean-valued and multi-valued domains. Our methods can also find symmetries between multi-valued variables that have different domain cardinalities. The key insight in the algorithms is that they change the unit of symmetry computation from a variable to a variable-value (VV) pair. Our experiments find that exploiting these symmetries in MCMC can obtain substantial computational gains over existing algorithms.\nHumans develop a common sense of style compatibility between items based on their attributes. We seek to automatically answer questions like \"Does this shirt go well with that pair of jeans?\" In order to answer these kinds of questions, we attempt to model the human sense of style compatibility in this paper. The basic assumption of our approach is that most of the important attributes for a product in an online store are included in its title description. 
Therefore it is feasible to learn style compatibility from these descriptions. We design a Siamese Convolutional Neural Network architecture and feed it with title pairs of items, which are either compatible or incompatible. Those pairs will be mapped from the original space of symbolic words into some embedded style space. Our approach takes only words as the input with little preprocessing and there is no laborious and expensive feature engineering.\nSand--bubblers are crabs of the genera Dotilla and Scopimera which are known to produce remarkable patterns and structures at tropical beaches. From these pattern-making abilities, we may draw inspiration for digital visual art. A simple mathematical model is proposed and an algorithm is designed that may create such sand-bubbler patterns artificially. In addition, design parameters to modify the patterns are identified and analyzed by computational aesthetic measures. Finally, an extension of the algorithm is discussed that may enable controlling and guiding generative evolution of the art-making process.\nWe present a concrete design for Solomonoff's incremental machine learning system suitable for desktop computers. We use R5RS Scheme and its standard library with a few omissions as the reference machine. We introduce a Levin Search variant based on a stochastic Context Free Grammar together with new update algorithms that use the same grammar as a guiding probability distribution for incremental machine learning. The updates include adjusting production probabilities, re-using previous solutions, learning programming idioms and discovery of frequent subprograms. The issues of extending the a priori probability distribution and bootstrapping are discussed. We have implemented a good portion of the proposed algorithms. 
Experiments with toy problems show that the update algorithms work as expected.\nData based judgments go into artificial intelligence applications but they undergo paradoxical reversal when seemingly unnecessary additional data is provided. Examples of this are Simpson's reversal and the disjunction effect where the beliefs about the data change once it is presented or aggregated differently. Sometimes the significance of the difference can be evaluated using statistical tests such as Pearson's chi-squared or Fisher's exact test, but this may not be helpful in threshold-based decision systems that operate with incomplete information. To mitigate risks in the use of algorithms in decision-making, we consider the question of modeling of beliefs. We argue that evidence supports that beliefs are not classical statistical variables and they should, in the general case, be considered as superposition states of disjoint or polar outcomes. We analyze the disjunction effect from the perspective of the belief as a quantum vector.\nConversational AI systems are becoming famous in day to day lives. In this paper, we are trying to address the following key question: To identify whether design, as well as development efforts for search oriented conversational AI are successful or not. It is tricky to define 'success' in the case of conversational AI and an equally tricky part is to use appropriate metrics for the evaluation of conversational AI. We propose four different perspectives namely user experience, information retrieval, linguistic and artificial intelligence for the evaluation of conversational AI systems. Additionally, background details of conversational AI systems are provided including desirable characteristics of personal assistants, differences between chatbot and an AI based personal assistant. The importance of personalization and how it can be achieved is explained in detail. 
Current challenges in the development of an ideal conversational AI (personal assistant) are also highlighted along with guidelines for achieving personalized experience for users.\nDeep convolutional neural networks (CNN) are widely used in modern artificial intelligence (AI) and smart vision systems but also limited by computation latency, throughput, and energy efficiency on a resource-limited scenario, such as mobile devices, internet of things (IoT), unmanned aerial vehicles (UAV), and so on. A hardware streaming architecture is proposed to accelerate convolution and pooling computations for state-of-the-art deep CNNs. It is optimized for energy efficiency by maximizing local data reuse to reduce off-chip DRAM data access. In addition, image and feature decomposition techniques are introduced to optimize memory access pattern for an arbitrary size of image and number of features within limited on-chip SRAM capacity. A prototype accelerator was implemented in TSMC 65 nm CMOS technology with 2.3 mm x 0.8 mm core area, which achieves 144 GOPS peak throughput and 0.8 TOPS/W peak energy efficiency.\nRecommendation systems are recognised as being hugely important in industry, and the area is now well understood. At News UK, there is a requirement to be able to quickly generate recommendations for users on news items as they are published. However, little has been published about systems that can generate recommendations in response to changes in recommendable items and user behaviour in a very short space of time. In this paper we describe a new algorithm for updating collaborative filtering models incrementally, and demonstrate its effectiveness on clickstream data from The Times. We also describe the architecture that allows recommendations to be generated on the fly, and how we have made each component scalable. 
The system is currently being used in production at News UK.\nBuilding dialog agents that can converse naturally with humans is a challenging yet intriguing problem of artificial intelligence. In open-domain human-computer conversation, where the conversational agent is expected to respond to human responses in an interesting and engaging way, commonsense knowledge has to be integrated into the model effectively. In this paper, we investigate the impact of providing commonsense knowledge about the concepts covered in the dialog. Our model represents the first attempt at integrating a large commonsense knowledge base into end-to-end conversational models. In the retrieval-based scenario, we propose the Tri-LSTM model to jointly take into account message and commonsense for selecting an appropriate response. Our experiments suggest that the knowledge-augmented models are superior to their knowledge-free counterparts in automatic evaluation.\nLatent factor models are increasingly popular for modeling multi-relational knowledge graphs. By their vectorial nature, it is not only hard to interpret why this class of models works so well, but also to understand where they fail and how they might be improved. We conduct an experimental survey of state-of-the-art models, not towards a purely comparative end, but as a means to get insight about their inductive abilities. To assess the strengths and weaknesses of each model, we create simple tasks that exhibit first, atomic properties of binary relations, and then, common inter-relational inference through synthetic genealogies. Based on these experimental results, we propose new research directions to improve on existing models.\nIn recent years, a number of artificial intelligence services have been developed such as defect detection system or diagnosis system for customer services. 
Unfortunately, the core in these services is a black-box in which human cannot understand the underlying decision making logic, even though the inspection of the logic is crucial before launching a commercial service. Our goal in this paper is to propose an analytic method of a model explanation that is applicable to general classification models. To this end, we introduce the concept of a contribution matrix and an explanation embedding in a constraint space by using a matrix factorization. We extract a rule-like model explanation from the contribution matrix with the help of the nonnegative matrix factorization. To validate our method, the experiment results provide with open datasets as well as an industry dataset of a LTE network diagnosis and the results show our method extracts reasonable explanations.\nWith the recent advancements in Artificial Intelligence (AI), various organizations and individuals are debating about the progress of AI as a blessing or a curse for the future of the society. This paper conducts an investigation on how the public perceives the progress of AI by utilizing the data shared on Twitter. Specifically, this paper performs a comparative analysis on the understanding of users belonging to two categories -- general AI-Tweeters (AIT) and expert AI-Tweeters (EAIT) who share posts about AI on Twitter. Our analysis revealed that users from both the categories express distinct emotions and interests towards AI. Users from both the categories regard AI as positive and are optimistic about the progress of AI but the experts are more negative than the general AI-Tweeters. Expert AI-Tweeters share relatively large percentage of tweets about their personal news compared to technical aspects of AI. However, the effects of automation on the future are of primary concern to AIT than to EAIT. 
When the expert category is sub-categorized, the emotion analysis revealed that students and industry professionals have more insights in their tweets about AI than academicians.\nEvent learning is one of the most important problems in AI. However, notwithstanding significant research efforts, it is still a very complex task, especially when the events involve the interaction of humans or agents with other objects, as it requires modeling human kinematics and object movements. This study proposes a methodology for learning complex human-object interaction (HOI) events, involving the recording, annotation and classification of event interactions. For annotation, we allow multiple interpretations of a motion capture by slicing over its temporal span; for classification, we use Long Short-Term Memory (LSTM) sequential models with Conditional Random Field (CRF) for constraints of outputs. Using a setup involving captures of human-object interaction as three dimensional inputs, we argue that this approach could be used for event types involving complex spatio-temporal dynamics.\nLearning from expert demonstrations has received a lot of attention in artificial intelligence and machine learning. The goal is to infer the underlying reward function that an agent is optimizing given a set of observations of the agent's behavior over time in a variety of circumstances, the system state trajectories, and a plant model specifying the evolution of the system state for different agent's actions. The system is often modeled as a Markov decision process, that is, the next state depends only on the current state and agent's action, and the agent's choice of action depends only on the current state. While the former is a Markovian assumption on the evolution of system state, the latter assumes that the target reward function is itself Markovian. In this work, we explore learning a class of non-Markovian reward functions, known in the formal methods literature as specifications. 
These specifications offer better composition, transferability, and interpretability. We then show that inferring the specification can be done efficiently without unrolling the transition system. We demonstrate the approach on a 2-d grid world example.\nSafe autonomous driving may be one of the most difficult engineering challenges that any artificial intelligence system has been asked to do since the birth of AI over sixty years ago. The difficulty is not within the task itself, but rather in the extremely small margin of allowable error given the human life at stake and the extremely large number of edge cases that have to be accounted for. In other words, we task these systems to expect the unexpected with near 100% accuracy, which is a technical challenge for machine learning methods that to date have generally been better at memorizing the expected than predicting the unexpected. In fact, the process of efficiently and automatically discovering the edge cases of driving may be the key to solving this engineering challenge. In this work, we propose and evaluate a method for discovering edge cases by monitoring the disagreement between two monocular-vision-based automated steering systems. The first is a proprietary Tesla Autopilot system equipped in the first generation of Autopilot-capable vehicles. The second is an end-to-end neural network trained on a large-scale naturalistic dataset of 420 hours or 45 million frames of autonomous driving in Tesla vehicles.\nGames with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called {\\em na\\\"{i}ve sampling}, based on a variant of the Multi-armed Bandit problem called {\\em Combinatorial Multi-armed Bandits} (CMAB). 
We analyze the theoretical properties of several variants of {\\em na\\\"{i}ve sampling}, and empirically compare it against the other existing strategies in the literature for CMABs. We then evaluate these strategies in the context of real-time strategy (RTS) games, a genre of computer games characterized by their very large branching factors. Our results show that as the branching factor grows, {\\em na\\\"{i}ve sampling} outperforms the other sampling strategies.\nWhat are the most popular research topics in Artificial Intelligence (AI)? We formulate the problem as extracting top-$k$ topics that can best represent a given area with the help of knowledge base. We theoretically prove that the problem is NP-hard and propose an optimization model, FastKATE, to address this problem by combining both explicit and latent representations for each topic. We leverage a large-scale knowledge base (Wikipedia) to generate topic embeddings using neural networks and use this kind of representations to help capture the representativeness of topics for given areas. We develop a fast heuristic algorithm to efficiently solve the problem with a provable error bound. We evaluate the proposed model on three real-world datasets. Experimental results demonstrate our model's effectiveness, robustness, real-timeness (return results in $<1$s), and its superiority over several alternative methods.\nThe relatively recent adoption of Knowledge Graphs as an enabling technology in multiple high-profile artificial intelligence and cognitive applications has led to growing interest in the Semantic Web technology stack. Many semantics-related tools, however, are focused on serving experts with a deep understanding of semantic technologies. For example, triplification of relational data is available but there is no open source tool that allows a user unfamiliar with OWL/RDF to import data into a semantic triple store in an intuitive manner. 
Further, many tools require users to have a working understanding of SPARQL to query data. Casual users interested in benefiting from the power of Knowledge Graphs have few tools available for exploring, querying, and managing semantic data. We present SemTK, the Semantics Toolkit, a user-friendly suite of tools that allow both expert and non-expert semantics users convenient ingestion of relational data, simplified query generation, and more. The exploration of ontologies and instance data is performed through SPARQLgraph, an intuitive web-based user interface in SemTK understandable and navigable by a lay user. The open source version of SemTK is available at http://semtk.research.ge.com.\nUnderstanding physical phenomena is a key component of human intelligence and enables physical interaction with previously unseen environments. In this paper, we study how an artificial agent can autonomously acquire this intuition through interaction with the environment. We created a synthetic block stacking environment with physics simulation in which the agent can learn a policy end-to-end through trial and error. Thereby, we bypass to explicitly model physical knowledge within the policy. We are specifically interested in tasks that require the agent to reach a given goal state that may be different for every new trial. To this end, we propose a deep reinforcement learning framework that learns policies which are parametrized by a goal. We validated the model on a toy example navigating in a grid world with different target positions and in a block stacking task with different target structures of the final tower. In contrast to prior work, our policies show better generalization across different goals.\nThe importance of interpretability of machine learning models has been increasing due to emerging enterprise predictive analytics, threat of data privacy, accountability of artificial intelligence in society, and so on. 
Piecewise linear models have been actively studied to achieve both accuracy and interpretability. They often produce competitive accuracy against state-of-the-art non-linear methods. In addition, their representations (i.e., rule-based segmentation plus sparse linear formula) are often preferred by domain experts. A disadvantage of such models, however, is high computational cost for simultaneous determinations of the number of \"pieces\" and cardinality of each linear predictor, which has restricted their applicability to middle-scale data sets. This paper proposes a distributed factorized asymptotic Bayesian (FAB) inference of learning piece-wise sparse linear models on distributed memory architectures. The distributed FAB inference solves the simultaneous model selection issue without communicating $O(N)$ data where N is the number of training samples and achieves linear scale-out against the number of CPU cores. Experimental results demonstrate that the distributed FAB inference achieves high prediction accuracy and performance scalability with both synthetic and benchmark data.\nEthics of algorithms is an emerging topic in various disciplines such as social science, law, and philosophy, but also artificial intelligence (AI). The value alignment problem expresses the challenge of (machine) learning values that are, in some way, aligned with human requirements or values. In this paper I argue for looking at how humans have formalized and communicated values, in professional codes of ethics, and for exploring declarative decision-theoretic ethical programs (DDTEP) to formalize codes of ethics. This renders machine ethical reasoning and decision-making, as well as learning, more transparent and hopefully more accountable. 
The paper includes proof-of-concept examples of known toy dilemmas and gatekeeping domains such as archives and libraries.\nArtificial Intelligence (AI) has been used extensively in automatic decision making in a broad variety of scenarios, ranging from credit ratings for loans to recommendations of movies. Traditional design guidelines for AI models focus essentially on accuracy maximization, but recent work has shown that economically irrational and socially unacceptable scenarios of discrimination and unfairness are likely to arise unless these issues are explicitly addressed. This undesirable behavior has several possible sources, such as biased datasets used for training that may not be detected in black-box models. After pointing out connections between such bias of AI and the problem of induction, we focus on Popper's contributions after Hume's, which offer a logical theory of preferences. An AI model can be preferred over others on purely rational grounds after one or more attempts at refutation based on accuracy and fairness. Inspired by such epistemological principles, this paper proposes a structured approach to mitigate discrimination and unfairness caused by bias in AI systems. In the proposed computational framework, models are selected and enhanced after attempts at refutation. To illustrate our discussion, we focus on hiring decision scenarios where an AI system filters in which job applicants should go to the interview phase.\nThe ability to use a 2D map to navigate a complex 3D environment is quite remarkable, and even difficult for many humans. Localization and navigation is also an important problem in domains such as robotics, and has recently become a focus of the deep reinforcement learning community. In this paper we teach a reinforcement learning agent to read a map in order to find the shortest way out of a random maze it has never seen before. 
Our system combines several state-of-the-art methods such as A3C and incorporates novel elements such as a recurrent localization cell. Our agent learns to localize itself based on 3D first person images and an approximate orientation angle. The agent generalizes well to bigger mazes, showing that it learned useful localization and navigation capabilities.\nWe propose a novel approach for the generation of polyphonic music based on LSTMs. We generate music in two steps. First, a chord LSTM predicts a chord progression based on a chord embedding. A second LSTM then generates polyphonic music from the predicted chord progression. The generated music sounds pleasing and harmonic, with only few dissonant notes. It has clear long-term structure that is similar to what a musician would play during a jam session. We show that our approach is sensible from a music theory perspective by evaluating the learned chord embeddings. Surprisingly, our simple model managed to extract the circle of fifths, an important tool in music theory, from the dataset.\nThe Simple Temporal Problem (STP) is a fundamental temporal reasoning problem and has recently been extended to the Multiagent Simple Temporal Problem (MaSTP). In this paper we present a novel approach that is based on enforcing arc-consistency (AC) on the input (multiagent) simple temporal network. We show that the AC-based approach is sufficient for solving both the STP and MaSTP and provide efficient algorithms for them. As our AC-based approach does not impose new constraints between agents, it does not violate the privacy of the agents and is superior to the state-of-the-art approach to MaSTP. 
Empirical evaluations on diverse benchmark datasets also show that our AC-based algorithms for STP and MaSTP are significantly more efficient than existing approaches.\nThe best improvisational theatre actors can make any scene partner, of any skill level or ability, appear talented and proficient in the art form, and thus \"make them shine\". To challenge this improvisational paradigm, we built an artificial intelligence (AI) trained to perform live shows alongside human actors for human audiences. Over the course of 30 performances to a combined audience of almost 3000 people, we have refined theatrical games which involve combinations of human and (at times, adversarial) AI actors. We have developed specific scene structures to include audience participants in interesting ways. Finally, we developed a complete show structure that submitted the audience to a Turing test and observed their suspension of disbelief, which we believe is key for human/non-human theatre co-creation.\nA growing field in robotics and Artificial Intelligence (AI) research is human-robot collaboration, whose target is to enable effective teamwork between humans and robots. However, in many situations human teams are still superior to human-robot teams, primarily because human teams can easily agree on a common goal with language, and the individual members observe each other effectively, leveraging their shared motor repertoire and sensorimotor resources. This paper shows that for cognitive robots it is possible, and indeed fruitful, to combine knowledge acquired from interacting with elements of the environment (affordance exploration) with the probabilistic observation of another agent's actions.   We propose a model that unites (i) learning robot affordances and word descriptions with (ii) statistical recognition of human gestures with vision sensors. 
We discuss theoretical motivations, possible implementations, and we show initial results which highlight that, after having acquired knowledge of its surrounding environment, a humanoid robot can generalize this knowledge to the case when it observes another agent (human partner) performing the same motor actions previously executed during training.\nObject ranking or \"learning to rank\" is an important problem in the realm of preference learning. On the basis of training data in the form of a set of rankings of objects represented as feature vectors, the goal is to learn a ranking function that predicts a linear order of any new set of objects. In this paper, we propose a new approach to object ranking based on principles of analogical reasoning. More specifically, our inference pattern is formalized in terms of so-called analogical proportions and can be summarized as follows: Given objects $A,B,C,D$, if object $A$ is known to be preferred to $B$, and $C$ relates to $D$ as $A$ relates to $B$, then $C$ is (supposedly) preferred to $D$. Our method applies this pattern as a main building block and combines it with ideas and techniques from instance-based learning and rank aggregation. Based on first experimental results for data sets from various domains (sports, education, tourism, etc.), we conclude that our approach is highly competitive. It appears to be specifically interesting in situations in which the objects are coming from different subdomains, and which hence require a kind of knowledge transfer.\nModeling personality is a challenging problem with applications spanning computer games, virtual assistants, online shopping and education. Many techniques have been tried, ranging from neural networks to computational cognitive architectures. However, most approaches rely on examples with hand-crafted features and scenarios. 
Here, we approach learning a personality by training agents using a Deep Q-Network (DQN) model on rewards based on psychoanalysis, against hand-coded AI in the game of Pong. As a result, we obtain 4 agents, each with its own personality. Then, we define happiness of an agent, which can be seen as a measure of alignment with agent's objective function, and study it when agents play both against hand-coded AI, and against each other. We find that the agents that achieve higher happiness during testing against hand-coded AI, have lower happiness when competing against each other. This suggests that higher happiness in testing is a sign of overfitting in learning to interact with hand-coded AI, and leads to worse performance against agents with different personalities.\nWe propose a hybrid architecture for systematically computing robust visual explanation(s) encompassing hypothesis formation, belief revision, and default reasoning with video data. The architecture consists of two tightly integrated synergistic components: (1) (functional) answer set programming based abductive reasoning with space-time tracklets as native entities; and (2) a visual processing pipeline for detection based object tracking and motion analysis.   We present the formal framework, its general implementation as a (declarative) method in answer set programming, and an example application and evaluation based on two diverse video datasets: the MOTChallenge benchmark developed by the vision community, and a recently developed Movie Dataset.\nCan artificial intelligence (AI) learn complicated non-linear physics? Here we propose a novel deep learning approach that learns non-linear photon scattering physics and obtains accurate 3D distribution of optical anomalies. 
In contrast to the traditional black-box deep learning approaches to inverse problems, our deep network learns to invert the Lippmann-Schwinger integral equation which describes the essential physics of photon migration of diffuse near-infrared (NIR) photons in turbid media. As an example for clinical relevance, we applied the method to our prototype diffuse optical tomography (DOT). We show that our deep neural network, trained with only simulation data, can accurately recover the location of anomalies within biomimetic phantoms and live animals without the use of an exogenous contrast agent.\nRepresentation-based classification methods such as sparse representation-based classification (SRC) and linear regression classification (LRC) have attracted a lot of attention. In order to obtain a better representation, a novel method called projection representation-based classification (PRC) is proposed for image recognition in this paper. PRC is based on a new mathematical model. This model denotes that the 'ideal projection' of a sample point $x$ on the hyper-space $H$ may be gained by iteratively computing the projection of $x$ on a line of hyper-space $H$ with the proper strategy. Therefore, PRC is able to iteratively approximate the 'ideal representation' of each subject for classification. Moreover, the discriminant PRC (DPRC) is further proposed, which obtains the discriminant information by maximizing the ratio of the between-class reconstruction error over the within-class reconstruction error. Experimental results on five typical databases show that the proposed PRC and DPRC are effective and outperform other state-of-the-art methods on several vision recognition tasks.\nThe game of chess is the most widely-studied domain in the history of artificial intelligence. 
The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.\nResearch in Artificial Intelligence is breaking technology barriers every day. New algorithms and high performance computing are making things possible which we could only have imagined earlier. Though the enhancements in AI are making life easier for human beings day by day, there is constant fear that AI based systems will pose a threat to humanity. People in AI community have diverse set of opinions regarding the pros and cons of AI mimicking human behavior. Instead of worrying about AI advancements, we propose a novel idea of cognitive agents, including both human and machines, living together in a complex adaptive ecosystem, collaborating on human computation for producing essential social goods while promoting sustenance, survival and evolution of the agents' life cycle. We highlight several research challenges and technology barriers in achieving this goal. We propose a governance mechanism around this ecosystem to ensure ethical behaviors of all cognitive agents. 
Along with a novel set of use-cases of Cogniculture, we discuss the road map ahead for this journey.\nWe cast the problem of combinatorial auction design in a Bayesian framework in order to incorporate prior information into the auction process and minimize the number of rounds to convergence. We first develop a generative model of agent valuations and market prices such that clearing prices become maximum a posteriori estimates given observed agent valuations. This generative model then forms the basis of an auction process which alternates between refining estimates of agent valuations and computing candidate clearing prices. We provide an implementation of the auction using assumed density filtering to estimate valuations and expectation maximization to compute prices. An empirical evaluation over a range of valuation domains demonstrates that our Bayesian auction mechanism is highly competitive against the combinatorial clock auction in terms of rounds to convergence, even under the most favorable choices of price increment for this baseline.\nMany tasks in artificial intelligence require the collaboration of multiple agents. We examine deep reinforcement learning for multi-agent domains. Recent research efforts often take the form of two seemingly conflicting perspectives, the decentralized perspective, where each agent is supposed to have its own controller; and the centralized perspective, where one assumes there is a larger model controlling all agents. In this regard, we revisit the idea of the master-slave architecture by incorporating both perspectives within one framework. Such a hierarchical structure naturally leverages advantages from one another. The idea of combining both perspectives is intuitive and can be well motivated from many real world systems; however, out of a variety of possible realizations, we highlight three key ingredients, i.e. composed action representation, learnable communication and independent reasoning. 
With network designs to facilitate these explicitly, our proposal consistently outperforms latest competing methods both in synthetic experiments and when applied to challenging StarCraft micromanagement tasks.\nWe examine implemented systems for ethical machine reasoning with a view to identifying the practical challenges (as opposed to philosophical challenges) posed by the area. We identify a need for complex ethical machine reasoning not only to be multi-objective, proactive, and scrutable but that it must draw on heterogeneous evidential reasoning. We also argue that, in many cases, it needs to operate in real time and be verifiable. We propose a general architecture involving a declarative ethical arbiter which draws upon multiple evidential reasoners each responsible for a particular ethical feature of the system's environment. We claim that this architecture enables some separation of concerns among the practical challenges that ethical machine reasoning poses.\nCatastrophic forgetting occurs when a neural network loses the information learned in a previous task after training on subsequent tasks. This problem remains a hurdle for artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning. A hard attention mask is learned concurrently to every task, through stochastic gradient descent, and previous masks are exploited to condition such learning. We show that the proposed mechanism is effective for reducing catastrophic forgetting, cutting current rates by 45 to 80%. We also show that it is robust to different hyperparameter choices, and that it offers a number of monitoring capabilities. 
The approach features the possibility to control both the stability and compactness of the learned knowledge, which we believe also makes it attractive for online learning or network compression applications.\nIn this paper a novel biclustering algorithm based on artificial intelligence (AI) is introduced. The method, called EBIC, aims to detect biologically meaningful, order-preserving patterns in complex data. The proposed algorithm is probably the first one capable of discovering multiple complex patterns in real gene expression datasets with accuracy exceeding 50%. It is also one of the very few biclustering methods designed for parallel environments with multiple graphics processing units (GPUs). We demonstrate that EBIC outperforms state-of-the-art biclustering methods, in terms of recovery and relevance, on both synthetic and genetic datasets. EBIC also yields results over 12 times faster than the most accurate reference algorithms. The proposed algorithm is anticipated to be added to the repertoire of unsupervised machine learning algorithms for the analysis of datasets, including those from large-scale genomic studies.\nTraditional radio systems are strictly co-designed on the lower levels of the OSI stack for compatibility and efficiency. Although this has enabled the success of radio communications, it has also introduced lengthy standardization processes and imposed static allocation of the radio spectrum. Various initiatives have been undertaken by the research community to tackle the problem of artificial spectrum scarcity by both making frequency allocation more dynamic and building flexible radios to replace the static ones. There is reason to believe that just as computer vision and control have been overhauled by the introduction of machine learning, wireless communication can also be improved by utilizing similar techniques to increase the flexibility of wireless networks. 
In this work, we pose the problem of discovering low-level wireless communication schemes ex-nihilo between two agents in a fully decentralized fashion as a reinforcement learning problem. Our proposed approach uses policy gradients to learn an optimal bi-directional communication scheme and shows surprisingly sophisticated and intelligent learning behavior. We present the results of extensive experiments and an analysis of the fidelity of our approach.\nThis paper presents a pilot study on developing an instrument to predict the quality of e-commerce websites. The 8C model was adopted as the reference model of the heuristic evaluation. Each dimension of the 8C was mapped into a set of quantitative website elements, selected websites were scraped to get the quantitative website elements, and the score of each dimension was calculated. Software was developed in PHP for the experiments. In the training process, 10 experiments were conducted and quantitative analyses were regressively conducted between the experiments. The conversion rate was used to verify the heuristic evaluation of an e-commerce website after each experiment. The results showed that the mapping revisions between the experiments improved the performance of the evaluation instrument; therefore, the proposed experiment process and quantitative mapping revision guideline were on the right track. The software resulting from experiment 10 can serve as the intended e-commerce website evaluation instrument. The experimental results and future work are discussed.\nSystems are typically made from simple components regardless of their complexity. While the function of each part is easily understood, higher-order functions are emergent properties and are notoriously difficult to explain. In networked systems, both digital and biological, each component receives inputs, performs a simple computation, and creates an output. 
When these components have multiple outputs, we intuitively assume that the outputs are causally dependent on the inputs but are themselves independent of each other given the state of their shared input. However, this intuition can be violated for components with probabilistic logic, as these typically cannot be decomposed into separate logic gates with one output each. This violation of conditional independence on the past system state is equivalent to instantaneous interaction --- the idea is that some information between the outputs is not coming from the inputs and thus must have been created instantaneously. Here we compare evolved artificial neural systems with and without instantaneous interaction across several task environments. We show that systems without instantaneous interactions evolve faster, to higher final levels of performance, and require fewer logic components to create a densely connected cognitive machinery.\nDeep neural networks (DNNs) are powerful machine learning models and have succeeded in various artificial intelligence tasks. Although various architectures and modules for the DNNs have been proposed, selecting and designing the appropriate network structure for a target problem is a challenging task. In this paper, we propose a method to simultaneously optimize the network structure and weight parameters during neural network training. We consider a probability distribution that generates network structures, and optimize the parameters of the distribution instead of directly optimizing the network structure. The proposed method can apply to the various network structure optimization problems under the same framework. We apply the proposed method to several structure optimization problems such as selection of layers, selection of unit types, and selection of connections using the MNIST, CIFAR-10, and CIFAR-100 datasets. 
The experimental results show that the proposed method can find the appropriate and competitive network structures.\nAutomated planning is a major topic of research in artificial intelligence, and enjoys a long and distinguished history. The classical paradigm assumes a distinguished initial state, comprised of a set of facts, and is defined over a set of actions which change that state in one way or another. Planning in many real-world settings, however, is much more involved: an agent's knowledge is almost never simply a set of facts that are true, and actions that the agent intends to execute never operate the way they are supposed to. Thus, probabilistic planning attempts to incorporate stochastic models directly into the planning process. In this article, we briefly report on probabilistic planning through the lens of probabilistic programming: a programming paradigm that aims to ease the specification of structured probability distributions. In particular, we provide an overview of the features of two systems, HYPE and ALLEGRO, which emphasise different strengths of probabilistic programming that are particularly useful for complex modelling issues raised in probabilistic planning. Among other things, with these systems, one can instantiate planning problems with growing and shrinking state spaces, discrete and continuous probability distributions, and non-unique prior distributions in a first-order setting.\nDeceptive games are games where the reward structure or other aspects of the game are designed to lead the agent away from a globally optimal policy. While many games are already deceptive to some extent, we designed a series of games in the Video Game Description Language (VGDL) implementing specific types of deception, classified by the cognitive biases they exploit. 
VGDL games can be run in the General Video Game Artificial Intelligence (GVGAI) Framework, making it possible to test a variety of existing AI agents that have been submitted to the GVGAI Competition on these deceptive games. Our results show that all tested agents are vulnerable to several kinds of deception, but that different agents have different weaknesses. This suggests that we can use deception to understand the capabilities of a game-playing algorithm, and game-playing algorithms to characterize the deception displayed by a game.\nThe extension of deep learning towards temporal data processing is attracting increasing research interest. In this paper we investigate the properties of state dynamics developed in successive levels of deep recurrent neural networks (RNNs) in terms of short-term memory abilities. Our results reveal interesting insights that shed light on the nature of layering as a factor of RNN design. Notably, higher layers in a hierarchically organized RNN architecture turn out to be inherently biased towards longer memory spans even prior to training of the recurrent connections. Moreover, in the context of the Reservoir Computing framework, our analysis also points out the benefit of a layered recurrent organization as an efficient approach to improve the memory skills of reservoir models.\nHuman face-to-face communication is a complex multimodal signal. We use words (language modality), gestures (vision modality) and changes in tone (acoustic modality) to convey our intentions. Humans easily process and understand face-to-face communication; however, comprehending this form of communication remains a significant challenge for Artificial Intelligence (AI). AI must understand each modality and the interactions between them that shape human communication. In this paper, we present a novel neural architecture for understanding human communication called the Multi-attention Recurrent Network (MARN). 
The main strength of our model comes from discovering interactions between modalities through time using a neural component called the Multi-attention Block (MAB) and storing them in the hybrid memory of a recurrent component called the Long-short Term Hybrid Memory (LSTHM). We perform extensive comparisons on six publicly available datasets for multimodal sentiment analysis, speaker trait recognition and emotion recognition. MARN shows state-of-the-art performance on all the datasets.\nAs Artificial Intelligence (AI) techniques have become more powerful and easier to use, they are increasingly deployed as key components of modern software systems. While this enables new functionality and often allows better adaptation to user needs, it also creates additional problems for software engineers and exposes companies to new risks. Some work has been done to better understand the interaction between Software Engineering and AI but we lack methods to classify ways of applying AI in software systems and to analyse and understand the risks this poses. Only by doing so can we devise tools and solutions to help mitigate them. This paper presents the AI in SE Application Levels (AI-SEAL) taxonomy that categorises applications according to their point of AI application, the type of AI technology used and the automation level allowed. We show the usefulness of this taxonomy by classifying 15 papers from previous editions of the RAISE workshop. Results show that the taxonomy allows classification of distinct AI applications and provides insights concerning the risks associated with them. We argue that this will be important for companies in deciding how to apply AI in their software applications and to create strategies for its use.\nGoal-oriented dialogue has attracted attention for its numerous applications in artificial intelligence. To solve this task, deep learning and reinforcement learning have recently been applied. 
However, these approaches struggle to find a competent recurrent neural questioner, owing to the complexity of learning a series of sentences. Motivated by theory of mind, we propose \"Answerer in Questioner's Mind\" (AQM), a novel algorithm for goal-oriented dialogue. With AQM, a questioner asks and infers based on an approximated probabilistic model of the answerer. The questioner figures out the answerer's intent via selecting a plausible question by explicitly calculating the information gain of the candidate intentions and possible answers to each question. We test our framework on two goal-oriented visual dialogue tasks: \"MNIST Counting Dialog\" and \"GuessWhat?!.\" In our experiments, AQM outperforms comparative algorithms and makes human-like dialogue. We further use AQM as a tool for analyzing the mechanism of deep reinforcement learning approach and discuss the future direction of practical goal-oriented neural dialogue systems.\nDespite rapid progress, most of the educational technologies today lack a strong instructional design knowledge basis leading to questionable quality of instruction. In addition, a major challenge is to customize these educational technologies for a wide range of instructional designs. Ontologies are one of the pertinent mechanisms to represent instructional design in the literature. However, existing approaches do not support modeling of flexible instructional designs. To address this problem, in this paper, we propose an ontology based framework for systematic modeling of different aspects of instructional design knowledge based on domain patterns. As part of the framework, we present ontologies for modeling goals, instructional processes and instructional materials. 
We demonstrate the ontology framework by presenting instances of the ontology for the large scale case study of adult literacy in India (287 million learners spread across 22 Indian Languages), which requires creation of 1000 similar but varied eLearning Systems based on flexible instructional designs. The implemented framework is available at http://rice.iiit.ac.in and is transferred to National Literacy Mission of Government of India. This framework could be used for modeling instructional design knowledge of systems for skills, school education and beyond.\nPlanning problems are among the most important and well-studied problems in artificial intelligence. They are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and back-up those evaluations to the root of a search tree. Among these algorithms, Monte-Carlo tree search (MCTS) is one of the most general, powerful and widely used. A typical implementation of MCTS uses cleverly designed rules, optimized to the particular characteristics of the domain. These rules control where the simulation traverses, what to evaluate in the states that are reached, and how to back-up those evaluations. In this paper we instead learn where, what and how to search. Our architecture, which we call an MCTSnet, incorporates simulation-based search inside a neural network, by expanding, evaluating and backing-up a vector embedding. The parameters of the network are trained end-to-end using gradient-based optimisation. When applied to small searches in the well known planning problem Sokoban, the learned search algorithm significantly outperformed MCTS baselines.\nQuantum Decision Theory, advanced earlier by the authors, and illustrated for lotteries with gains, is generalized to the games containing lotteries with gains as well as losses. 
The mathematical structure of the approach is based on the theory of quantum measurements, which makes this approach relevant both for the description of decision making of humans and the creation of artificial quantum intelligence. General rules are formulated allowing for the explicit calculation of quantum probabilities representing the fraction of decision makers preferring the considered prospects. This provides a method to quantitatively predict decision-maker choices, including the cases of games with high uncertainty for which the classical expected utility theory fails. The approach is applied to experimental results obtained on a set of lottery gambles with gains and losses. Our predictions, involving no fitting parameters, are in very good agreement with experimental data. The use of quantum decision making in game theory is described. A principal scheme of creating quantum artificial intelligence is suggested.\nThis paper describes a method for generative player modeling and its application to the automatic testing of game content using archetypal player models called procedural personas. Theoretically grounded in psychological decision theory, procedural personas are implemented using a variation of Monte Carlo Tree Search (MCTS) where the node selection criteria are developed using evolutionary computation, replacing the standard UCB1 criterion of MCTS. Using these personas we demonstrate how generative player models can be applied to a varied corpus of game levels and demonstrate how different play styles can be enacted in each level. In short, we use artificially intelligent personas to construct synthetic playtesters. The proposed approach could be used as a tool for automatic play testing when human feedback is not readily available or when quick visualization of potential interactions is necessary. 
Possible applications include interactive tools during game development or procedural content generation systems where many evaluations must be conducted within a short time span.\nThis report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the ways in which AI may influence the threat landscape in the digital, physical, and political domains, we make four high-level recommendations for AI researchers and other stakeholders. We also suggest several promising areas for further research that could expand the portfolio of defenses, or make attacks less effective or harder to execute. Finally, we discuss, but do not conclusively resolve, the long-term equilibrium of attackers and defenders.\nOver the years, the Artificial Intelligence (AI) community has produced several datasets which have given machine learning algorithms the opportunity to learn various skills across various domains. However, a subclass of these machine learning algorithms aimed at learning logic programs, namely Inductive Logic Programming algorithms, has often failed at the task due to the vastness of these datasets. This has impacted the usability of knowledge representation and reasoning techniques in the development of AI systems. In this research, we try to address this scalability issue for the algorithms that learn Answer Set Programs. We present a sound and complete algorithm which takes the input in a slightly different manner and performs an efficient and more user-controlled search for a solution. We show via experiments that our algorithm can learn from two popular datasets from the machine learning community, namely bAbI (a question answering dataset) and MNIST (a dataset for handwritten digit recognition), which to the best of our knowledge was not previously possible. 
The system is publicly available at https://goo.gl/KdWAcV.\nA novel Neural Network architecture is proposed using the mathematically and physically rich idea of vector fields as hidden layers to perform nonlinear transformations on the data. The data points are interpreted as particles moving along a flow defined by the vector field, which intuitively represents the desired movement to enable classification. The architecture moves the data points from their original configuration to a new one following the streamlines of the vector field with the objective of achieving a final configuration where classes are separable. An optimization problem is solved through gradient descent to learn this vector field.\nSemantic composition functions have been playing a pivotal role in neural representation learning of text sequences. In spite of their success, most existing models suffer from the underfitting problem: they use the same shared compositional function on all the positions in the sequence, thereby lacking expressive power due to their inability to capture the richness of compositionality. Moreover, the composition functions of different tasks are independent and learned from scratch. In this paper, we propose a new sharing scheme of composition function across multiple tasks. Specifically, we use a shared meta-network to capture the meta-knowledge of semantic composition and generate the parameters of the task-specific semantic composition models. We conduct extensive experiments on two types of tasks, text classification and sequence tagging, which demonstrate the benefits of our approach. In addition, we show that the shared meta-knowledge learned by our proposed model can be regarded as off-the-shelf knowledge and easily transferred to new tasks.\nSociotechnical systems are complex systems, where nonlinear interaction among different players can obscure causal relationships. 
The absence of mechanisms to help us understand how to create a change in the system makes it hard to manage these systems. Influencing and shaping are social operators acting on sociotechnical systems to design a change. However, the two operators are usually discussed in an ad hoc manner, without proper guiding models and metrics which assist in adopting these models successfully. Moreover, both social operators rely on an accurate understanding of the concept of trust. Without such understanding, neither operator can reach the level required to create a change in a desirable direction. In this paper, we define these concepts in a concise manner suitable for modelling the concepts and understanding their dynamics. We then introduce a model for influencing and shaping and use Computational Red Teaming principles to design and demonstrate how this model operates. We validate the results computationally through a simulation environment to show social influencing and shaping in an artificial society.\nDeep reinforcement learning has emerged as a powerful tool for a variety of learning tasks; however, deep nets typically exhibit forgetting when learning multiple tasks in sequence. To mitigate forgetting, we propose an experience replay process that augments the standard FIFO buffer and selectively stores experiences in a long-term memory. We explore four strategies for selecting which experiences will be stored: favoring surprise, favoring reward, matching the global training distribution, and maximizing coverage of the state space. We show that distribution matching successfully prevents catastrophic forgetting, and is consistently the best approach on all domains tested. While distribution matching has better and more consistent performance, we identify one case in which coverage maximization is beneficial: when tasks that receive less training are more important. 
Overall, our results show that selective experience replay, when suitable selection algorithms are employed, can prevent catastrophic forgetting.\nWhat are the functions of curiosity? What are the mechanisms of curiosity-driven learning? We approach these questions using concepts and tools from machine learning and developmental robotics. We argue that curiosity-driven learning enables organisms to make discoveries to solve complex problems with rare or deceptive rewards. By fostering exploration and discovery of a diversity of behavioural skills, and ignoring these rewards, curiosity can be efficient to bootstrap learning when there is no information, or deceptive information, about local improvement towards these problems. We review both normative and heuristic computational frameworks used to understand the mechanisms of curiosity in humans, conceptualizing the child as a sense-making organism. These frameworks enable us to discuss the bi-directional causal links between curiosity and learning, and to provide new hypotheses about the fundamental role of curiosity in self-organizing developmental structures through curriculum learning. We present various developmental robotics experiments that study these mechanisms in action, both supporting these hypotheses and opening new research avenues in machine learning and artificial intelligence. Finally, we discuss challenges for the design of experimental paradigms for studying curiosity in psychology and cognitive neuroscience. Keywords: Curiosity, intrinsic motivation, lifelong learning, predictions, world model, rewards, free-energy principle, learning progress, machine learning, AI, developmental robotics, development, curriculum learning, self-organization.\nCombining Generative Adversarial Networks (GANs) with encoders that learn to encode data points has shown promising results in learning data representations in an unsupervised way. 
We propose a framework that combines an encoder and a generator to learn disentangled representations which encode meaningful information about the data distribution without the need for any labels. While current approaches focus mostly on the generative aspects of GANs, our framework can be used to perform inference on both real and generated data points. Experiments on several data sets show that the encoder learns interpretable, disentangled representations which encode descriptive properties and can be used to sample images that exhibit specific characteristics.\nOur study aims to build a machine learning model for crime prediction using geospatial features for different categories of crime. The reverse geocoding technique is applied to retrieve OpenStreetMap (OSM) spatial data. This study also proposes finding hotpoints extracted from crime hotspot areas found by Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). A spatial distance feature is then computed based on the position of different hotpoints for various types of crime and this value is used as a feature for classifiers. We test the engineered features on crime data from the Royal Canadian Mounted Police of Halifax, NS. We observed a significant performance improvement in crime prediction using the newly generated spatial features.\nRelaxed models are abstract problem descriptions generated by ignoring constraints that are present in base-level problems. They play an important role in planning and search algorithms, as it has been shown that the length of an optimal solution to a relaxed model yields a monotone heuristic for an A* search of a base-level problem. Optimal solutions to a relaxed model may be computed algorithmically or by search in a further relaxed model, leading to a search that explores a hierarchy of relaxed models. 
In this paper, we review the traditional definition of problem relaxation and show that searching in the abstraction hierarchy created by problem relaxation will not reduce the computational effort required to find optimal solutions to the base-level problem, unless the relaxed problem found in the hierarchy can be transformed by some optimization (e.g., subproblem factoring). Specifically, we prove that any A* search of the base-level using a heuristic h2 will largely dominate an A* search of the base-level using a heuristic h1, if h1 must be computed by an A* search of the relaxed model using h2.\nDeep reinforcement learning is quickly changing the field of artificial intelligence. These models are able to capture a high-level understanding of their environment, enabling them to learn difficult dynamic tasks in a variety of domains. In the database field, query optimization remains a difficult problem. Our goal in this work is to explore the capabilities of deep reinforcement learning in the context of query optimization. At each state, we build queries incrementally and encode properties of subqueries through a learned representation. The challenge here lies in the formation of the state transition function, which defines how the current subquery state combines with the next query operation (action) to yield the next state. As a first step in this direction, we focus on the state representation problem and the formation of the state transition function. We describe our approach and show preliminary results. We further discuss how we can use the state representation to improve query optimization using reinforcement learning.\nReinforcement learning involves decision making in dynamic and uncertain environments and constitutes a crucial element of artificial intelligence. 
In our previous work, we experimentally demonstrated that the ultrafast chaotic oscillatory dynamics of lasers can be used to solve the two-armed bandit problem efficiently, which requires decision making concerning a class of difficult trade-offs called the exploration-exploitation dilemma. However, only two selections were employed in that research; thus, the scalability of the laser-chaos-based reinforcement learning should be clarified. In this study, we demonstrated a scalable, pipelined principle of resolving the multi-armed bandit problem by introducing time-division multiplexing of chaotically oscillated ultrafast time-series. The experimental demonstrations in which bandit problems with up to 64 arms were successfully solved are presented in this report. Detailed analyses are also provided that include performance comparisons among laser chaos signals generated in different physical conditions, which coincide with the diffusivity inherent in the time series. This study paves the way for ultrafast reinforcement learning by taking advantage of the ultrahigh bandwidths of light wave and practical enabling technologies.\nThe Turing Machine has two implicit properties that depend on its underlying notion of computing: the format is fully determinate and computations are information preserving. Distributed representations lack these properties and cannot be fully captured by Turing's standard model. To address this limitation a distributed extension of the Turing Machine is introduced in this paper. In the extended machine, functions and abstractions are expressed extensionally and computations are entropic. The machine is applied to the definition of an associative memory, with its corresponding memory register, recognition and retrieval operations. The memory is tested with an experiment for storing and recognizing hand written digits with satisfactory results. 
The experiment can be seen as a proof of concept that information can be stored and processed effectively in a highly distributed fashion using a symbolic but not fully determinate format. The new machine augments the symbolic mode of computing with consequences on the way Church's Thesis is understood. The paper is concluded with a discussion of some implications of the extended machine for Artificial Intelligence and Cognition.\nThe recent successes of AI have captured the wildest imagination of both the scientific community and the general public. Robotics and AI amplify human potential, increase productivity and are moving from simple reasoning towards human-like cognitive abilities. Current AI technologies are used in a wide range of applications, ranging from healthcare, manufacturing, transport, energy, to financial services, banking, advertising, management consulting and government agencies. The global AI market was around 260 billion USD in 2016 and is estimated to exceed 3 trillion USD by 2024. To understand the impact of AI, it is important to draw lessons from its past successes and failures; this white paper provides a comprehensive explanation of the evolution of AI, its current status and future directions.\nUncertainty analysis in the form of probabilistic forecasting can provide significant improvements in decision-making processes in the smart power grid for better integrating renewable energies such as wind. Whereas point forecasting provides a single expected value, probabilistic forecasts provide more information in the form of quantiles, prediction intervals, or full predictive densities. This paper analyzes the effectiveness of an approach for nonparametric probabilistic forecasting of wind power that combines support vector machines and nonlinear quantile regression with non-crossing constraints. A numerical case study is conducted using publicly available wind data from the Global Energy Forecasting Competition 2014. 
Multiple quantiles are estimated to form 20%, 40%, 60% and 80% prediction intervals which are evaluated using the pinball loss function and reliability measures. Three benchmark models are used for comparison where results demonstrate that the proposed approach leads to significantly better performance while preventing the problem of overlapping quantile estimates.\nTechniques combining machine learning with translation to automated reasoning have recently become an important component of formal proof assistants. Such \"hammer\" techniques complement traditional proof assistant automation as implemented by tactics and decision procedures. In this paper we present a unified proof assistant automation approach which attempts to automate the selection of appropriate tactics and tactic-sequences combined with an optimized small-scale hammering approach. We implement the technique as a tactic-level automation for HOL4: TacticToe. It implements a modified A*-algorithm directly in HOL4 that explores different tactic-level proof paths, guiding their selection by learning from a large number of previous tactic-level proofs. Unlike the existing hammer methods, TacticToe avoids translation to FOL, working directly on the HOL level. By combining tactic prediction and premise selection, TacticToe is able to re-prove 39 percent of 7902 HOL4 theorems in 5 seconds whereas the best single HOL(y)Hammer strategy solves 32 percent in the same amount of time.\nHumans are going to delegate the rights of driving to autonomous vehicles in the near future. However, to fulfill this complicated task, there is a need for a mechanism which requires autonomous vehicles to obey the road and social rules that have been practiced by well-behaved drivers. This task can be achieved by introducing a social norms compliance mechanism in autonomous vehicles. This research paper proposes an artificial society of autonomous vehicles as an analogy of human social society. 
Each AV has been assigned a social personality having different social influence. Social norms have been introduced which help the AVs in making decisions, influenced by emotions, regarding road collision avoidance. Furthermore, a social norms compliance mechanism for artificial social AVs has been proposed using a prospect-based emotion, i.e. fear, which is conceived from the OCC model. Fuzzy logic has been employed to compute the emotions quantitatively. Then, using the SimConnect approach, fuzzy values of fear have been provided to the Netlogo simulation environment to simulate an artificial society of AVs. Extensive testing has been performed using the behavior space tool to find out the performance of the proposed approach in terms of the number of collisions. For comparison, a random-walk-model-based artificial society of AVs has been proposed as well. A comparative study with the random walk proves that the proposed approach provides a better option to tailor the autopilots of future AVs, which will be more socially acceptable and trustworthy to their riders in terms of safe road travel.\nAn intelligent agent will often be uncertain about various properties of its environment, and when acting in that environment it will frequently need to quantify its uncertainty. For example, if the agent wishes to employ the expected-utility paradigm of decision theory to guide its actions, it will need to assign degrees of belief (subjective probabilities) to various assertions. Of course, these degrees of belief should not be arbitrary, but rather should be based on the information available to the agent. This paper describes one approach for inducing degrees of belief from very rich knowledge bases that can include information about particular individuals, statistical correlations, physical laws, and default rules. We call our approach the random-worlds method. 
The method is based on the principle of indifference: it treats all of the worlds the agent considers possible as being equally likely. It is able to integrate qualitative default reasoning with quantitative probabilistic reasoning by providing a language in which both types of information can be easily expressed. Our results show that a number of desiderata that arise in direct inference (reasoning from statistical information to conclusions about individuals) and default reasoning follow directly from the semantics of random worlds. For example, random worlds captures important patterns of reasoning such as specificity, inheritance, indifference to irrelevant information, and default assumptions of independence. Furthermore, the expressive power of the language used and the intuitive semantics of random worlds allow the method to deal with problems that are beyond the scope of many other non-deductive reasoning systems.\nThe rapid growth of e-commerce has made both the business community and customers face a new situation. Due to intense competition on the one hand and the customer's option to choose from several alternatives on the other, the business community has realized the necessity of intelligent marketing strategies and relationship management. Web usage mining attempts to discover useful knowledge from the secondary data obtained from the interactions of the users with the Web. Web usage mining has become very critical for effective Web site management, creating adaptive Web sites, business and support services, personalization, network traffic flow analysis and so on. The study of ant colony behavior and its self-organizing capabilities is of interest to knowledge retrieval/management and decision support systems sciences, because it provides models of distributed adaptive organization, which are useful to solve difficult optimization, classification, and distributed control problems, among others. 
In this paper, we propose an ant clustering algorithm to discover Web usage patterns (data clusters) and a linear genetic programming approach to analyze the visitor trends. Empirical results clearly show that ant colony clustering performs well when compared to a self-organizing map (for clustering Web usage patterns) even though the performance accuracy is not as high when compared to the evolutionary-fuzzy clustering (i-miner) approach. KEYWORDS: Web Usage Mining, Swarm Intelligence, Ant Systems, Stigmergy, Data-Mining, Linear Genetic Programming.\nChemotaxis can be defined as an innate behavioural response by an organism to a directional stimulus, in which bacteria, and other single-cell or multicellular organisms direct their movements according to certain chemicals in their environment. This is important for bacteria to find food (e.g., glucose) by swimming towards the highest concentration of food molecules, or to flee from poisons. Based on self-organized computational approaches and similar stigmergic concepts we derive a novel swarm intelligence algorithm. What is striking in these observations is that both eusocial insects such as ant colonies and bacteria have similar natural mechanisms based on stigmergy, from which coherent and sophisticated patterns of global collective behaviour emerge. Keeping in mind the above characteristics we will present a simple model to tackle the collective adaptation of a social swarm based on real ant colony behaviors (SSA algorithm) for tracking extrema in dynamic environments and highly multimodal complex functions described in the well-known De Jong test suite. Later, for the purpose of comparison, a recent model of artificial bacterial foraging (BFOA algorithm) based on similar stigmergic features is described and analyzed. 
Final results indicate that the SSA collective intelligence is able to cope with and quickly adapt to unforeseen situations even when, over the same cooperative foraging period, the community is requested to deal with two different and contradictory purposes, while outperforming BFOA in adaptive speed. Results indicate that the present approach deals well with severe Dynamic Optimization problems.\nRecent advances in neurosciences and psychology have provided evidence that affective phenomena pervade intelligence at many levels, being inseparable from the cognition-action loop. Perception, attention, memory, learning, decision-making, adaptation, communication and social interaction are some of the aspects influenced by them. This work draws its inspiration from neurobiology, psychophysics and sociology to approach the problem of building autonomous robots capable of interacting with each other and building strategies based on a temperamental decision mechanism. Modelling emotions is a relatively recent focus in artificial intelligence and cognitive modelling. Such models can ideally inform our understanding of human behavior. We may see the development of computational models of emotion as a core research focus that will facilitate advances in the large array of computational systems that model, interpret or influence human behavior. We propose a model based on a scalable, flexible and modular approach to emotion which allows runtime evaluation between emotional quality and performance. 
The results achieved showed that the strategies based on a temperamental decision mechanism strongly influence the system performance and that there is an evident dependency between the emotional state of the agents and their temperamental type, as well as a dependency between the team performance and the temperamental configuration of the team members; this enables us to conclude that the modular approach to emotional programming based on temperamental theory is a good choice to develop computational mind models for emotional behavioral Multi-Agent systems.\nThis work reports the most relevant technical aspects in the problem of learning the Markov network structure from data. Such a problem has become increasingly important in machine learning, and in many other application fields of machine learning. Markov networks, together with Bayesian networks, are probabilistic graphical models, a widely used formalism for handling probability distributions in intelligent systems. Learning graphical models from data has been extensively applied for the case of Bayesian networks, but for Markov networks learning is not tractable in practice. However, this situation is changing with time, given the exponential growth of computer capacity, the plethora of available digital data, and the research on new learning technologies. This work focuses on a technology called independence-based learning, which allows the learning of the independence structure of those networks from data in an efficient and sound manner, whenever the dataset is sufficiently large, and the data are a representative sample of the target distribution. In the analysis of such technology, this work surveys the current state-of-the-art algorithms for learning Markov network structure, discussing their current limitations, and proposing a series of open problems where future works may produce some advances in the area in terms of quality and efficiency. 
The paper concludes by opening a discussion about how to develop a general formalism for improving the quality of the structures learned, when data is scarce.\nThe main objective of this paper is to develop a new semantic network structure, based on fuzzy set theory, used in Artificial Intelligence systems in order to provide effective on-line assistance to users of new technological systems. This semantic network is used to describe the knowledge of an \"ideal\" expert while fuzzy sets are used both to describe the approximate and uncertain knowledge of novice users who intervene to match fuzzy labels of a query with categories from an \"ideal\" expert. The technical system we consider is word-processing software, with Objects such as \"Word\" and Goals such as \"Cut\" or \"Copy\". We suggest considering the set of the system's Goals as a set of linguistic variables to which corresponds a set of possible linguistic values based on the fuzzy set. We consider, therefore, a set of interpretation levels for these possible values to which corresponds a set of membership functions. We also propose a method to measure the similarity degree between different fuzzy linguistic variables for the partition of the semantic network into classes of similar objects to make the diagnosis of the user's fuzzy queries easier.\nThe growth of the world-wide-web (WWW) has spread its wings from an intangible quantity of web-pages to a gigantic hub of web information which gradually increases the complexity of the crawling process in a search engine. A search engine handles a lot of queries from various parts of this world, and its answers depend solely on the knowledge that it gathers by means of crawling. Information sharing has become a most common habit of society, and it is done by means of publishing structured, semi-structured and unstructured resources on the web. 
This social practice leads to an exponential growth of web resources, and hence it has become essential to crawl for continuous updating of web-knowledge and modification of several existing resources in any situation. In this paper a statistical-hypothesis-based learning mechanism is incorporated for learning the behavior of crawling speed in different network environments, and for intelligently controlling the speed of the crawler. The scaling technique is used to compare the performance of the proposed method with the standard crawler. High-speed performance is observed after scaling, and the retrieval of relevant web resources at such a high speed is analyzed.\nMedical diagnosis processes vary in the degree to which they attempt to deal with different complicating aspects of diagnosis such as the relative importance of symptoms, varied symptom patterns and the relations between diseases themselves. Based on decision theory, in the past many mathematical models such as crisp set, probability distribution, fuzzy set, intuitionistic fuzzy set were developed to deal with complicating aspects of diagnosis. But many such models fail to include important aspects of the expert decisions. Therefore, an effort has been made by Pawlak to process inconsistencies in the data being considered with the introduction of rough set theory. Though rough sets have major advantages over the other methods, they generate too many rules that create many difficulties while taking decisions. Therefore, it is essential to minimize the decision rules. In this paper, we use two processes, a pre-process and a post-process, to mine suitable rules and to explore the relationship among the attributes. 
In the pre-process we use rough set theory to mine suitable rules, whereas in the post-process we apply formal concept analysis to these suitable rules to explore better knowledge and the most important factors affecting decision making.\nArtificial Intelligence (AI) techniques are known for their ability to tackle problems found to be unyielding to traditional mathematical methods. A recent addition to these techniques is the Computational Intelligence (CI) techniques which, in most cases, are nature- or biologically-inspired techniques. Different CI techniques found their way to many control engineering applications, including system identification, and the results obtained by many researchers were encouraging. However, most control engineers and researchers used the basic CI models as is or slightly modified them to match their needs. Hence, the merits of one model over the other were not clear, and the full potential of these models was not exploited. In this research, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) methods, which are different CI techniques, are modified to best suit the multimodal problem of system identification. In the first case of GA, an extension to the basic algorithm, which is inspired by nature as well, was deployed by introducing redundant genetic material. This extension, which comes in handy in living organisms, did not result in significant performance improvement to the basic algorithm. In the second case, the Clubs-based PSO (C-PSO) dynamic neighborhood structure was introduced to replace the basic static structure used in canonical PSO algorithms. This modification of the neighborhood structure resulted in a significant performance improvement of the algorithm regarding convergence speed, and equipped it with a tool to handle multimodal problems. To understand the suitability of different GA and PSO techniques in the problem of system identification, they were used in an induction motor's parameter identification problem. 
The results reinforced previous conclusions and showed the superiority of PSO in general over the GA in such a multimodal problem.\nHigh speed, accuracy, meticulousness and quick response are among the vital necessities of the modern digital world. An efficient electronic circuit directly affects the operation of the whole system. Different tools are required to solve different types of engineering problems. Improving the efficiency, accuracy and low power consumption of an electronic circuit has always been a bottleneck problem. So the need for circuit miniaturization is always there. It saves a lot of time and power that is wasted in switching of gates, the wiring crisis is reduced, the cross-sectional area of the chip is reduced, and the number of transistors that can be implemented on a chip is multiplied many fold. Therefore, to overcome this problem, we have proposed an Artificial intelligence (AI) based approach that makes use of Rough Set Theory for its implementation. The theory of rough sets was proposed by Z. Pawlak in 1982. Rough set theory is a new mathematical tool which deals with uncertainty and vagueness. Decisions can be generated using rough set theory by reducing the unwanted and superfluous data. We have reduced the number of gates without upsetting the productivity of the given circuit. This paper proposes an approach with the help of rough set theory which basically lessens the number of gates in the circuit, based on decision rules.\nThis paper describes some biologically-inspired processes that could be used to build the sort of networks that we associate with the human brain. New to this paper, a 'refined' neuron will be proposed. This is a group of neurons that by joining together can produce a more analogue system, but with the same level of control and reliability that a binary neuron would have. With this new structure, it will be possible to think of an essentially binary system in terms of a more variable set of values. 
The paper also shows how recent research associated with the new model can be combined with established theories to produce a more complete picture. The propositions are largely in line with conventional thinking, but possibly with one or two more radical suggestions. An earlier cognitive model can be filled in with more specific details, based on the new research results, where the components appear to fit together almost seamlessly. The intention of the research has been to describe plausible 'mechanical' processes that can produce the appropriate brain structures and mechanisms, but that could be used without the magical 'intelligence' part that is still not fully understood. There are also some important updates from an earlier version of this paper.\nRecent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn, and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. 
We suggest concrete challenges and promising routes towards these goals that can combine the strengths of recent neural network advances with more structured cognitive models.\nThe intelligent reformulation or restructuring of a belief network can greatly increase the efficiency of inference. However, time expended for reformulation is not available for performing inference. Thus, under time pressure, there is a tradeoff between the time dedicated to reformulating the network and the time applied to the implementation of a solution. We investigate this partition of resources into time applied to reformulation and time used for inference. We shall first describe general principles for computing the ideal partition of resources under uncertainty. These principles have applicability to a wide variety of problems that can be divided into interdependent phases of problem solving. Afterwards, we shall present results of our empirical study of the problem of determining the ideal amount of time to devote to searching for clusters in belief networks. In this work, we acquired and made use of probability distributions that characterize (1) the performance of alternative heuristic search methods for reformulating a network instance into a set of cliques, and (2) the time for executing inference procedures on various belief networks. Given a preference model describing the value of a solution as a function of the delay required for its computation, the system selects an ideal time to devote to reformulation.\nThe role of inferencing with uncertainty is becoming more important in rule-based expert systems (ES), since knowledge given by a human expert is often uncertain or imprecise. We have succeeded in designing a VLSI chip which can perform an entire inference process based on fuzzy logic. The design of the VLSI fuzzy inference engine emphasizes simplicity, extensibility, and efficiency (operational speed and layout area). It is fabricated in 2.5 µm CMOS technology. 
The inference engine consists of three major components: a rule set memory, an inference processor, and a controller. In this implementation, the rule set memory is realized by a read only memory (ROM). The controller consists of two counters. In the inference processor, one data path is laid out for each rule. The number of inference rules can be increased by adding more data paths to the inference processor. All rules are executed in parallel, but each rule is processed serially. The logical structure of fuzzy inference proposed in the current paper maps nicely onto the VLSI structure. A two-phase nonoverlapping clocking scheme is used. Timing tests indicate that the inference engine can operate at approximately 20.8 MHz. This translates to an execution speed of approximately 80,000 Fuzzy Logical Inferences Per Second (FLIPS), and indicates that the inference engine is suitable for a demanding real-time application. The potential applications include decision-making in the area of command and control for intelligent robot systems, process control, missile and aircraft guidance, and other high performance machines.\nMassive Online Open Courses (MOOCs), which were introduced in 2008, have since drawn attention around the world both for their advantages and for criticism of their drawbacks. One of the issues in MOOCs, the lack of interactivity with the instructor, has brought conversational bots into the picture to fill in this gap. In this study, a prototype of a MOOCs conversational bot, MOOC-bot, is being developed and integrated into a MOOCs website to respond to learner inquiries using text or speech input. MOOC-bot uses the popular Artificial Intelligence Markup Language (AIML) to develop its knowledge base, leverages AIML's capability to deliver appropriate responses, and can be quickly adapted to new knowledge domains. 
The system architecture of MOOC-bot consists of a knowledge base along with an AIML interpreter, a chat interface, the MOOCs website and the Web Speech API to provide speech recognition and speech synthesis capability. The initial MOOC-bot prototype has the general knowledge from the past Loebner Prize winner - ALICE, frequently asked questions, and content offered by Universiti Teknikal Malaysia Melaka (UTeM). The evaluation of MOOC-bot based on the past competition questions from Chatterbox Challenge (CBC) and Loebner Prize has shown that it was able to provide correct answers most of the time during the test and demonstrated the capability to prolong the conversation. The advantages of MOOC-bot, such as being able to provide 24-hour service that can serve different time zones, having knowledge in multiple domains, and being shareable by multiple sites simultaneously, have outweighed its existing limitations.\nIn recent years, traditional cybersecurity safeguards have proven ineffective against insider threats. Famous cases of sensitive information leaks caused by insiders, including the WikiLeaks release of diplomatic cables and the Edward Snowden incident, have greatly harmed the U.S. government's relationship with other governments and with its own citizens. Data Leak Prevention (DLP) is a solution for detecting and preventing information leaks from within an organization's network. However, state-of-the-art DLP detection models are only able to detect very limited types of sensitive information, and research in the field has been hindered due to the lack of available sensitive texts. Many researchers have focused on document-based detection with artificially labeled \"confidential documents\" for which security labels are assigned to the entire document, when in reality only a portion of the document is sensitive. 
This type of whole-document based security labeling increases the chances of preventing authorized users from accessing non-sensitive information within sensitive documents. In this paper, we introduce Automated Classification Enabled by Security Similarity (ACESS), a new and innovative detection model that penetrates the complexity of big text security classification/detection. To analyze the ACESS system, we constructed a novel dataset, containing formerly classified paragraphs from diplomatic cables made public by the WikiLeaks organization. To our knowledge this paper is the first to analyze a dataset that contains actual formerly sensitive information annotated at paragraph granularity.\nArtificial intelligence research to a great degree focuses on the brain and behaviors that the brain generates. But the brain, an extremely complex structure resulting from millions of years of evolution, can be viewed as a solution to problems posed by an environment existing in space and time. The environment generates signals that produce sensory events within an organism. Building an internal spatial and temporal model of the environment allows an organism to navigate and manipulate the environment. Higher intelligence might be the ability to process information coming from a larger extent of space-time. In keeping with nature's penchant for extending rather than replacing, the purpose of the mammalian neocortex might then be to record events from distant reaches of space and time and render them, as though yet near and present, to the older, deeper brain whose instinctual roles have changed little over eons. Here this notion is embodied in a model called morphognosis (morpho = shape and gnosis = knowledge). Its basic structure is a pyramid of event recordings called a morphognostic. At the apex of the pyramid are the most recent and nearby events. Receding from the apex are less recent and possibly more distant events. 
A morphognostic can thus be viewed as a structure of progressively larger chunks of space-time knowledge. A set of morphognostics forms long-term memories that are learned by exposure to the environment. A cellular automaton is used as the platform to investigate the morphognosis model, using a simulated organism that learns to forage in its world for food, build a nest, and play the game of Pong.\nConventional reinforcement learning methods for Markov decision processes rely on weakly-guided, stochastic searches to drive the learning process. It can therefore be difficult to predict what agent behaviors might emerge. In this paper, we consider an information-theoretic cost function for performing constrained stochastic searches that promote the formation of risk-averse to risk-favoring behaviors. This cost function is the value of information, which provides the optimal trade-off between the expected return of a policy and the policy's complexity; policy complexity is measured by number of bits and controlled by a single hyperparameter on the cost function. As the policy complexity is reduced, the agents will increasingly eschew risky actions. This reduces the potential for high accrued rewards. As the policy complexity increases, the agents will take actions, regardless of the risk, that can raise the long-term rewards. The obtainable reward depends on a single, tunable hyperparameter that regulates the degree of policy complexity.   We evaluate the performance of value-of-information-based policies on a stochastic version of Ms. Pac-Man. A major component of this paper is the demonstration that ranges of policy complexity values yield different game-play styles and explaining why this occurs. We also show that our reinforcement-learning search mechanism is more efficient than the others we utilize. 
This result implies that the value of information theory is appropriate for framing the exploitation-exploration trade-off in reinforcement learning.\nMany artificial intelligence (AI) applications often require multiple intelligent agents to work in a collaborative effort. Efficient learning for intra-agent communication and coordination is an indispensable step towards general AI. In this paper, we take StarCraft combat game as a case study, where the task is to coordinate multiple agents as a team to defeat their enemies. To maintain a scalable yet effective communication protocol, we introduce a Multiagent Bidirectionally-Coordinated Network (BiCNet ['bIknet]) with a vectorised extension of actor-critic formulation. We show that BiCNet can handle different types of combats with arbitrary numbers of AI agents for both sides. Our analysis demonstrates that without any supervision such as human demonstrations or labelled data, BiCNet could learn various types of advanced coordination strategies that have been commonly used by experienced game players. In our experiments, we evaluate our approach against multiple baselines under different scenarios; it shows state-of-the-art performance, and possesses potential value for large-scale real-world applications.\nWeb-based human trafficking activity has increased in recent years but it remains sparsely dispersed among escort advertisements and difficult to identify due to its often-latent nature. The use of intelligent systems to detect trafficking can thus have a direct impact on investigative resource allocation and decision-making, and, more broadly, help curb a widespread social problem. Trafficking detection involves assigning a normalized score to a set of escort advertisements crawled from the Web -- a higher score indicates a greater risk of trafficking-related (involuntary) activities. 
In this paper, we define and study the problem of trafficking detection and present a trafficking detection pipeline architecture developed over three years of research within the DARPA Memex program. Drawing on multi-institutional data, systems, and experiences collected during this time, we also conduct post hoc bias analyses and present a bias mitigation plan. Our findings show that, while automatic trafficking detection is an important application of AI for social good, it also provides cautionary lessons for deploying predictive machine learning algorithms without appropriate de-biasing. This ultimately led to integration of an interpretable solution into a search system that contains over 100 million advertisements and is used by over 200 law enforcement agencies to investigate leads.\nWith the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production. These changes have been made possible by unprecedented levels of data and computation, by methodological advances in machine learning, by innovations in systems software and architectures, and by the broad accessibility of these technologies.   The next generation of AI systems promises to accelerate these developments and increasingly impact our lives via frequent interactions and making (often mission-critical) decisions on our behalf, often in highly personalized contexts. Realizing this promise, however, raises daunting challenges. In particular, we need AI systems that make timely and safe decisions in unpredictable environments, that are robust against sophisticated adversaries, and that can process ever increasing amounts of data across organizations and individuals without compromising confidentiality. 
These challenges will be exacerbated by the end of Moore's Law, which will constrain the amount of data these technologies can store and process. In this paper, we propose several open research directions in systems, architectures, and security that can address these challenges and help unlock AI's potential to improve lives and society.\nDespite the several successes of deep learning systems, there are concerns about their limitations, discussed most recently by Gary Marcus. This paper discusses Marcus's concerns and some others, together with solutions to several of these problems provided by the \"SP theory of intelligence\" and its realisation in the \"SP computer model\". The main advantages of the SP system are: relatively small requirements for data and the ability to learn from a single experience; the ability to model both hierarchical and non-hierarchical structures; strengths in several kinds of reasoning, including `commonsense' reasoning; transparency in the representation of knowledge, and the provision of an audit trail for all processing; the likelihood that the SP system could not be fooled into bizarre or eccentric recognition of stimuli, as deep learning systems can be; the SP system provides a robust solution to the problem of `catastrophic forgetting' in deep learning systems; the SP system provides a theoretically-coherent solution to the problems of correcting over- and under-generalisations in learning, and learning correct structures despite errors in data; unlike most research on deep learning, the SP programme of research draws extensively on research on human learning, perception, and cognition; and the SP programme of research has an overarching theory, supported by evidence, something that is largely missing from research on deep learning. 
In general, the SP system provides a much firmer foundation than deep learning for the development of artificial general intelligence.\nThe ability of intelligent agents to play games in a human-like fashion is popularly considered a benchmark of progress in Artificial Intelligence. Similarly, performance on multi-disciplinary tasks such as Visual Question Answering (VQA) is considered a marker for gauging progress in Computer Vision. In our work, we bring games and VQA together. Specifically, we introduce the first computational model aimed at Pictionary, the popular word-guessing social game. We first introduce Sketch-QA, an elementary version of the Visual Question Answering task. Styled after Pictionary, Sketch-QA uses incrementally accumulated sketch stroke sequences as visual data. Notably, Sketch-QA involves asking a fixed question (\"What object is being drawn?\") and gathering open-ended guess-words from human guessers. We analyze the resulting dataset and present many interesting findings therein. To mimic Pictionary-style guessing, we subsequently propose a deep neural model which generates guess-words in response to temporally evolving human-drawn sketches. Our model even makes human-like mistakes while guessing, thus amplifying the human mimicry factor. We evaluate our model on the large-scale guess-word dataset generated via the Sketch-QA task and compare with various baselines. We also conduct a Visual Turing Test to obtain human impressions of the guess-words generated by humans and our model. Experimental results demonstrate the promise of our approach for Pictionary and similarly themed games.\nFractal AI is a theory for general artificial intelligence. It allows one to derive new mathematical tools that constitute the foundations for a new kind of stochastic calculus, by modelling information using cellular automaton-like structures instead of smooth functions.   
In the included repository, we present a new Agent, derived from the first principles of the theory, which is capable of solving Atari games several orders of magnitude more efficiently than other similar techniques, like Monte Carlo Tree Search.   The code provided shows how it is now possible to beat some of the current state-of-the-art benchmarks on Atari games without previous learning, using fewer than 1000 samples to calculate each action, whereas standard MCTS uses 3 million samples. Among other things, Fractal AI makes it possible to generate a huge database of top-performing examples with very little computation, transforming Reinforcement Learning into a supervised problem.   The algorithm presented is capable of solving the exploration vs exploitation dilemma in both the discrete and continuous cases, while maintaining control over any aspect of the behavior of the Agent. From a general approach, new techniques presented here have direct applications to other areas such as: Non-equilibrium thermodynamics, chemistry, quantum physics, economics, information theory, and non-linear control theory.\nMachine learning is a computational process. As such, it is inextricably tied to computational power - the tangible material of chips and semiconductors that the algorithms of machine intelligence operate on. Most obviously, computational power and computing architectures shape the speed of training and inference in machine learning, and therefore influence the rate of progress in the technology. But these relationships are more nuanced than that: hardware shapes the methods used by researchers and engineers in the design and development of machine learning models. Characteristics such as the power consumption of chips also define where and how machine learning can be used in the real world.   
Despite this, many analyses of the social impact of the current wave of progress in AI have not substantively brought the dimension of hardware into their accounts. While a common trope in both the popular press and scholarly literature is to highlight the massive increase in computational power that has enabled the recent breakthroughs in machine learning, the analysis frequently goes no further than this observation around magnitude. This paper aims to dig more deeply into the relationship between computational power and the development of machine learning. Specifically, it examines how changes in computing architectures, machine learning methodologies, and supply chains might influence the future of AI. In doing so, it seeks to trace a set of specific relationships between this underlying hardware layer and the broader social impacts and risks around AI.\nA main difference between pre-Web artificial intelligence and the current one is that the Web and its Semantic extension (i.e. Web of Data) contain open global-scale knowledge and make it available to potentially intelligent machines that may want to benefit from it. Nevertheless, most of the Web of Data lacks ontological distinctions and has a sparse distribution of axiomatisations. For example, foundational distinctions such as whether an entity is inherently a class or an individual, or whether it is a physical object or not, are hardly expressed in the data, although they have been largely studied and formalised by foundational ontologies (e.g. DOLCE, SUMO). There is a gap between these ontologies, that often formalise or are inspired by preexisting philosophical theories and are developed with a top-down approach, and the Web of Data that is mostly derived from existing databases or from crowd-based effort (e.g. DBpedia, Wikidata, Freebase). We investigate whether the Web provides an empirical foundation for characterising entities of the Web of Data according to foundational distinctions. 
We want to answer questions such as \"is the DBpedia entity for dog a class or an instance?\" We report on a set of experiments based on machine learning and crowdsourcing that show promising results.\nNavigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support continuous self-localisation (\"I am here\") and a representation of the goal (\"I am going there\"). Building upon recent research that applies deep reinforcement learning to maze navigation problems, we present an end-to-end deep reinforcement learning approach that can be applied on a city scale. Recognising that successful navigation relies on integration of general policies with locale-specific knowledge, we propose a dual pathway architecture that allows locale-specific features to be encapsulated, while still enabling transfer to multiple cities. We present an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage, and demonstrate that our learning method allows agents to learn to navigate multiple cities and to traverse to target destinations that may be kilometres away. A video summarizing our research and showing the trained agent in diverse city environments as well as on the transfer task is available at: goo.gl/ESUfho.\nReal-time strategy games have been an important field of game artificial intelligence in recent years. This paper presents a reinforcement learning and curriculum transfer learning method to control multiple units in StarCraft micromanagement. We define an efficient state representation, which breaks down the complexity caused by the large state space in the game environment. 
Then a parameter-sharing multi-agent gradient-descent Sarsa({\\lambda}) (PS-MAGDS) algorithm is proposed to train the units. The learning policy is shared among our units to encourage cooperative behaviors. We use a neural network as a function approximator to estimate the action-value function, and propose a reward function to help units balance their move and attack. In addition, a transfer learning method is used to extend our model to more difficult scenarios, which accelerates the training process and improves the learning performance. In small-scale scenarios, our units successfully learn to combat and defeat the built-in AI with 100% win rates. In large-scale scenarios, a curriculum transfer learning method is used to progressively train a group of units, and shows superior performance over some baseline methods in target scenarios. With reinforcement learning and curriculum transfer learning, our units are able to learn appropriate strategies in StarCraft micromanagement scenarios.\nWe describe a class calculus that is expressive enough to describe and improve its own learning process. It can design and debug programs that satisfy given input/output constraints, based on its ontology of previously learned programs. It can improve its own model of the world by checking the actual results of the actions of its robotic activators. For instance, it could check the black box of a car crash to determine if it was probably caused by electric failure, a stuck electronic gate, dark ice, or some other condition that it must add to its ontology in order to meet its sub-goal of preventing such crashes in the future. Class algebra basically defines the eval/eval-1 Galois connection between the residuated Boolean algebras of 1. equivalence classes and super/sub classes of class algebra type expressions, and 2. a residual Boolean algebra of biclique relationships. 
It distinguishes which formulas are equivalent, entailed, or unrelated, based on a simplification algorithm that may be thought of as producing a unique pair of Karnaugh maps that describe the rough sets of maximal bicliques of relations. Such maps divide the n-dimensional space of up to 2^n-1 conjunctions of up to n propositions into clopen (i.e. a closed set of regions and their boundaries) causal sets. This class algebra is generalized to type-2 fuzzy class algebra by using relative frequencies as probabilities. It is also generalized to a class calculus involving assignments that change the states of programs.   INDEX TERMS 4-valued Boolean Logic, Artificial Intelligence, causal sets, class algebra, consciousness, intelligent design, IS-A hierarchy, mathematical logic, meta-theory, pointless topological space, residuated lattices, rough sets, type-2 fuzzy sets\nRecent years have witnessed increasing interest in the potential benefits of `intelligent' autonomous machines such as robots. Honda's Asimo humanoid robot, iRobot's Roomba robot vacuum cleaner and Google's driverless cars have fired the imagination of the general public, and social media buzz with speculation about a utopian world of helpful robot assistants or the coming robot apocalypse! However, there is a long way to go before autonomous systems reach the level of capabilities required for even the simplest of tasks involving human-robot interaction - especially if it involves communicative behaviour such as speech and language. Of course the field of Artificial Intelligence (AI) has made great strides in these areas, and has moved on from abstract high-level rule-based paradigms to embodied architectures whose operations are grounded in real physical environments. What is still missing, however, is an overarching theory of intelligent communicative behaviour that informs system-level design decisions in order to provide a more coherent approach to system integration. 
This chapter introduces the beginnings of such a framework inspired by the principles of Perceptual Control Theory (PCT). In particular, it is observed that PCT has hitherto tended to view perceptual processes as a relatively straightforward series of transformations from sensation to perception, and has overlooked the potential of powerful generative model-based solutions that have emerged in practical fields such as visual or auditory scene analysis. Starting from first principles, a sequence of arguments is presented which not only shows how these ideas might be integrated into PCT, but which also extends PCT towards a remarkably symmetric architecture for a needs-driven communicative agent. It is concluded that, if behaviour is the control of perception, then perception is the simulation of behaviour.\nWe describe how specialized database technology and data analysis methods were applied by the Swedish defense to help deal with the violation of Swedish marine territory by foreign submarine intruders during the Eighties and early Nineties. Among the several approaches tried, some yielded interesting information, although most of the key questions remain unanswered. We conclude with a survey of belief-function- and genetic-algorithm-based methods which were proposed to support interpretation of intelligence reports and prediction of future submarine positions, respectively.\nThe public sector comprises government agencies, ministries, education institutions, health providers and other types of government, commercial and not-for-profit organisations. Unlike commercial enterprises, this environment is highly heterogeneous in all aspects. This forms a complex network which is not always optimised. A lack of optimisation and communication hinders information sharing between the network nodes, limiting the flow of information. Another limiting aspect is privacy of personal information and security of operations of some nodes or segments of the network. 
Attempts to reorganise the network or improve communications to make more information available for sharing and analysis may be hindered or completely halted by public concerns over privacy, political agendas, social and technological barriers. This paper discusses a technical solution for information sharing while addressing the privacy concerns with no need for reorganisation of the existing public sector infrastructure. The solution is based on imposing an additional layer of Intelligent Software Agents and Knowledge Bases for data mining and analysis.\nDecision-making is a process of choosing among alternative courses of action for solving complicated problems where multi-criteria objectives are involved. The past few years have witnessed a growing recognition of Soft Computing technologies that underlie the conception, design and utilization of intelligent systems. In several works, engineers and scientists have applied intelligent techniques and heuristics to obtain optimal decisions from imprecise information. In this paper, we present a concurrent fuzzy-neural network approach combining unsupervised and supervised learning techniques to develop the Tactical Air Combat Decision Support System (TACDSS). Experimental results clearly demonstrate the efficiency of the proposed technique.\nA model of an organism as an autonomous intelligent system has been proposed. This model was used to analyze learning of an organism in various environmental conditions. Processes of learning were divided into two types: strong and weak processes taking place in the absence and the presence of aprioristic information about an object, respectively. Weak learning is synonymous with adaptation when aprioristic programs already available in a system (an organism) are started. It was shown that strong learning is impossible for both an organism and any autonomous intelligent system. It was also shown that the knowledge base of an organism cannot be updated. 
Therefore, all behavior programs of an organism are congenital. A model of a conditioned reflex as a series of consecutive measurements of environmental parameters has been advanced. Repeated measurements are necessary in this case to reduce the error during decision making.\nWe investigate the behavioral patterns of a population of agents, each controlled by a simple biologically motivated neural network model, when they are set in competition against each other in the Minority Model of Challet and Zhang. We explore the effects of changing agent characteristics, demonstrating that crowding behavior takes place among agents of similar memory, and show how this allows unique `rogue' agents with higher memory values to take advantage of a majority population. We also show that agents' analytic capability is largely determined by the size of the intermediary layer of neurons.   In the context of these results, we discuss the general nature of natural and artificial intelligence systems, and suggest intelligence only exists in the context of the surrounding environment (embodiment).   Source code for the programs used can be found at http://neuro.webdrake.net .\nPart I describes an intelligent acoustic emission locator, while Part II discusses blind source separation, time delay estimation and location of two continuous acoustic emission sources.   Acoustic emission (AE) analysis is used for characterization and location of developing defects in materials. AE sources often generate a mixture of various statistically independent signals. A difficult problem of AE analysis is separation and characterization of signal components when the signals from various sources and the mode of mixing are unknown. Recently, blind source separation (BSS) by independent component analysis (ICA) has been used to solve these problems. 
The purpose of this paper is to demonstrate the applicability of ICA to locate two independent simultaneously active acoustic emission sources on an aluminum band specimen. The method is promising for non-destructive testing of aircraft frame structures by acoustic emission analysis.\nCharacterization of semiconductor devices is used to gather as much data about the device as possible to determine weaknesses in design or trends in the manufacturing process. In this paper, we propose a novel multiple trip point characterization concept to overcome the constraint of the single trip point concept in the device characterization phase. In addition, we use computational intelligence techniques (e.g. neural network, fuzzy and genetic algorithm) to further manipulate these sets of multiple trip point values and tests based on semiconductor test equipment. Our experimental results demonstrate an excellent design parameter variation analysis in the device characterization phase, as well as detection of a set of worst-case tests that can provoke the worst-case variation, which the traditional approach was not capable of detecting.\nThis paper presents a Multi-Agent approach to the problem of recommending training courses to engineering professionals. The recommendation system is built as a proof of concept and limited to the electrical and mechanical engineering disciplines. Through user modelling and data collection from a survey, collaborative filtering recommendation is implemented using intelligent agents. The agents work together in recommending meaningful training courses and updating the course information. The system uses a user's profile and keywords from courses to rank courses. A ranking accuracy of 90% is achieved, while flexibility is achieved using an agent that autonomously retrieves information from websites using data mining techniques. This manner of recommendation is scalable and adaptable. 
Further improvements can be made using clustering and recording user feedback.\nThe major function of this model is to access the UCI Wisconsin Breast Cancer data set [1] and classify the data items into two categories, which are normal and anomalous. This kind of classification can be referred to as anomaly detection, which discriminates anomalous behaviour from normal behaviour in computer systems. One popular solution for anomaly detection is Artificial Immune Systems (AIS). AIS are adaptive systems inspired by theoretical immunology and observed immune functions, principles and models which are applied to problem solving. The Dendritic Cell Algorithm (DCA) [2] is an AIS algorithm that is developed specifically for anomaly detection. It has been successfully applied to intrusion detection in computer security. It is believed that agent-based modelling is an ideal approach for implementing AIS, as intelligent agents could be the perfect representations of immune entities in AIS. This model evaluates the feasibility of re-implementing the DCA in an agent-based simulation environment called AnyLogic, where the immune entities in the DCA are represented by intelligent agents. If this model can be successfully implemented, it makes it possible to implement more complicated and adaptive AIS models in the agent-based simulation environment.\nThis paper proposes an incremental method that can be used by an intelligent system to learn better descriptions of a thematic context. The method starts with a small number of terms selected from a simple description of the topic under analysis and uses this description as the initial search context. Using these terms, a set of queries is built and submitted to a search engine. New documents and terms are used to refine the learned vocabulary. 
Evaluations performed on a large number of topics indicate that the learned vocabulary is much more effective than the original one at the time of constructing queries to retrieve relevant material.\nAnalysis of data without labels is commonly subject to scrutiny by unsupervised machine learning techniques. Such techniques provide more meaningful representations, useful for better understanding of a problem at hand, than by looking only at the data itself. Although abundant expert knowledge exists in many areas where unlabelled data is examined, such knowledge is rarely incorporated into automatic analysis. Incorporation of expert knowledge is frequently a matter of combining multiple data sources from disparate hypothetical spaces. In cases where such spaces belong to different data types, this task becomes even more challenging. In this paper we present a novel immune-inspired method that enables the fusion of such disparate types of data for a specific set of problems. We show that our method provides a better visual understanding of one hypothetical space with the help of data from another hypothetical space. We believe that our model has implications for the field of exploratory data analysis and knowledge discovery.\nThe study of immune system aging, i.e. immunosenescence, is a relatively new research topic. It deals with understanding the processes of immunodegradation that indicate signs of functionality loss possibly leading to death. Even though it is not possible to prevent immunosenescence, there is great benefit in comprehending its causes, which may help to reverse some of the damage done and thus improve life expectancy. One of the main factors influencing the process of immunosenescence is the number and phenotypical variety of naive T cells in an individual. 
This work presents a review of immunosenescence, proposes system dynamics modelling of the processes involving the maintenance of the naive T cell repertoire and presents some preliminary results.\nThis paper constructively proves the existence of an effective procedure generating a computable (total) function that is not contained in any given effectively enumerable set of such functions. The proof implies the existence of machines that process informal concepts such as computable (total) functions beyond the limits of any given Turing machine or formal system, that is, these machines can, in a certain sense, \"compute\" function values beyond these limits. We call these machines creative. We argue that any \"intelligent\" machine should be capable of processing informal concepts such as computable (total) functions, that is, it should be creative. Finally, we introduce hypotheses on creative machines which were developed on the basis of theoretical investigations and experiments with computer programs. The hypotheses say that machine intelligence is the execution of a self-developing procedure starting from any universal programming language and any input.\nThe Dendritic Cell Algorithm (DCA) is an immune-inspired algorithm, developed for the purpose of anomaly detection. The algorithm performs multi-sensor data fusion and correlation which results in a 'context aware' detection system. Previous applications of the DCA have included the detection of potentially malicious port scanning activity, where it has produced high rates of true positives and low rates of false positives. In this work we aim to compare the performance of the DCA and of a Self-Organizing Map (SOM) when applied to the detection of SYN port scans, through experimental analysis. A SOM is an ideal candidate for comparison as it shares similarities with the DCA in terms of the data fusion method employed. 
It is shown that the results of the two systems are comparable, and both produce false positives for the same processes. This shows that the DCA can produce anomaly detection results to the same standard as an established technique.\nMachine Consciousness and Machine Intelligence are not simply new buzzwords that occupy our imagination. Over the last decades, we have witnessed an unprecedented rise in attempts to create machines with human-like features and capabilities. However, despite widespread sympathy and abundant funding, progress in these enterprises is far from satisfactory. The reasons for this are twofold: First, the notions of cognition and intelligence (usually borrowed from human behavior studies) are notoriously blurred and ill-defined, and second, the basic concepts underpinning the whole discourse are by themselves either undefined or defined very vaguely. That leads to improper and inadequate determination of research goals, which I will illustrate with some examples drawn from recent documents issued by DARPA and the European Commission. On the other hand, I would like to propose some remedies that, I hope, would improve the current state-of-the-art disgrace.\nThe goal is to develop a novel approach for cardiac disease prediction and diagnosis using intelligent agents. Initially, the symptoms are preprocessed using filter-based and wrapper-based agents. The filter removes the missing or irrelevant symptoms. The wrapper is used to extract the data in the data set according to the threshold limits. The dependency of each symptom is identified using a dependency checker agent. The classification is based on the prior and posterior probability of the symptoms with the evidence value. Finally, the symptoms are classified into five classes, namely absence, starting, mild, moderate and serious. 
Using the cooperative approach, the cardiac problem is solved and verified.\nInformation distributed through the Web keeps growing faster day by day, and for this reason, several techniques for extracting Web data have been suggested in recent years. Often, extraction tasks are performed through so-called wrappers, procedures extracting information from Web pages, e.g. implementing logic-based techniques. Many fields of application today require a strong degree of robustness of wrappers, in order not to compromise assets of information or reliability of data extracted. Unfortunately, wrappers may fail in the task of extracting data from a Web page if its structure changes, sometimes even slightly, thus requiring new techniques that automatically adapt the wrapper to the new structure of the page in case of failure. In this work we present a novel approach of automatic wrapper adaptation based on the measurement of similarity of trees through improved tree edit distance matching techniques.\nThis paper presents an enhancement of the medial axis algorithm to be used for finding the optimal shortest path for the developed cognitive map. The cognitive map has been developed based on architectural blueprint maps. The idea behind using the medial axis is to find the central pixels of the main paths; each center pixel lies at the center distance between two side border pixels. The need for these pixels in the algorithm comes from the need to build a network of nodes for the path, where each node represents a turn in the real world (left, right, critical left, critical right...). The algorithm also ignores center-pixel paths that are too small for intelligent robot navigation. The idea of this algorithm is to find the possible shortest path between start and end points. 
The goal of this research is to extract a simple, robust representation of the shape of the cognitive map together with the optimal shortest path between start and end points. The intelligent robot will use this algorithm in order to decrease the time that is needed for sweeping the targeted building.\nThis paper presents the design and development of a proposed rule-based Decision Support System that will help students select the most suitable faculty/major when seeking admission to Gomal University, Dera Ismail Khan, Pakistan. The basic idea of our approach is to design a model for testing and measuring the student's capabilities, such as intelligence, understanding, comprehension and mathematical concepts, plus his/her past academic record and intelligence level, and to apply the module results to a rule-based decision support system to determine the compatibility of those capabilities with the available faculties/majors in Gomal University. The result is shown as a list of suggested faculties/majors with the student's capabilities and abilities.\nWeb 3.0 is an evolving extension of the Web 2.0 scenario. Perceptions of Web 3.0 differ from person to person. Web 3.0 Architecture supports ubiquitous connectivity, network computing, open identity, intelligent web, distributed databases and intelligent applications. Some of the technologies which lead to the design and development of Web 3.0 applications are Artificial intelligence, Automated reasoning, Cognitive architecture and Semantic web. An attempt is made to capture the requirements of Students in line with Web 3.0 so as to bridge the gap between the design and development of Web 3.0 applications and requirements among Students. Maximum Spanning Tree modeling of the requirements facilitates the identification of key areas and key attributes in the design and development of software products for Students in Web 3.0 using Discriminant analysis. 
Keywords: Web 3.0, Discriminant analysis, Design and Development, Model, Maximum Spanning Tree\nWe present an automated solution for rapid diagnosis of client device problems in private cloud environments: the Intelligent Automated Client Diagnostic (IACD) system. Clients are diagnosed with the aid of Transmission Control Protocol (TCP) packet traces, by (i) observation of anomalous artifacts occurring as a result of each fault and (ii) subsequent use of the inference capabilities of soft-margin Support Vector Machine (SVM) classifiers. The IACD system features a modular design and is extensible to new faults, with detection capability unaffected by the TCP variant used at the client. Experimental evaluation of the IACD system in a controlled environment demonstrated an overall diagnostic accuracy of 98%.\nWe present the Intelligent Automated Client Diagnostic (IACD) system, which relies only on inference from Transmission Control Protocol (TCP) packet traces for rapid diagnosis of client device problems that cause network performance issues. Using soft-margin Support Vector Machine (SVM) classifiers, the system (i) distinguishes link problems from client problems, and (ii) identifies characteristics unique to client faults to report the root cause of the client device problem. Experimental evaluation demonstrated the capability of the IACD system to distinguish between faulty and healthy links and to diagnose the client faults with 98% accuracy in healthy links. The system can perform fault diagnosis independent of the client's specific TCP implementation, enabling diagnosis on a diverse range of client computers.\nSystem identification from experimental data plays a vital role in model-based controller design. Deriving a process model from first principles is often difficult due to its complexity. The first stage in the development of any control and monitoring system is the identification and modeling of the system.
Each model is developed within the context of a specific control problem. Thus, a general system identification framework is warranted. The proposed framework should be able to adapt and emphasize different properties based on the control objective and the nature of the system's behavior. Therefore, system identification has been a valuable tool for identifying the model of the system from input and output data for controller design. The present work is concerned with the identification of transfer function models using statistical model identification, the process reaction curve method, the ARX model, a genetic algorithm, and modeling using neural networks and fuzzy logic for interacting and non-interacting tank processes. The identification techniques and modeling used are prone to parameter change and disturbance. The proposed methods are used to identify the mathematical model and intelligent model of interacting and non-interacting processes from real-time experimental data.\nBased on an integrated infrastructure of resource sharing and computing in a distributed environment, cloud computing involves the provision of dynamically scalable and virtualized resources as services over the Internet. These applications also bring large-scale heterogeneous and distributed information, which poses a great challenge in terms of semantic ambiguity. It is critical for application services in a cloud computing environment to provide users with intelligent service and precise information. Semantic information processing can help users deal with semantic ambiguity and information overload efficiently through appropriate semantic models and semantic information processing technology. Semantic information processing has been successfully employed in many fields such as knowledge representation, natural language understanding, intelligent web search, etc.
The purpose of this report is to give an overview of existing technologies for semantic information processing in cloud computing environments, and to propose a research direction for addressing distributed semantic reasoning and parallel semantic computing by exploiting semantic information newly available in cloud computing environments.\nPenetration testing is a methodology for assessing network security by generating and executing possible attacks. Doing so automatically allows for regular and systematic testing without a prohibitive amount of human labor. A key question then is how to generate the attacks. This is naturally formulated as a planning problem. Previous work (Lucangeli et al. 2010) used classical planning and hence ignored all the incomplete knowledge that characterizes hacking. More recent work (Sarraute et al. 2011) makes strong independence assumptions for the sake of scaling, and lacks a clear formal concept of what the attack planning problem actually is. Herein, we model that problem in terms of partially observable Markov decision processes (POMDP). This grounds penetration testing in a well-researched formalism, highlighting important aspects of this problem's nature. POMDPs allow information gathering to be modeled as an integral part of the problem, thus providing, for the first time, a means to intelligently mix scanning actions with actual exploits.\nThese notes describe how the \"SP theory of intelligence\", and its embodiment in the \"SP machine\", may help to realise cognitive computing, as described in the book \"Smart Machines\". In the SP system, information compression and a concept of \"multiple alignment\" are centre stage. The system is designed to integrate such things as unsupervised learning, pattern recognition, probabilistic reasoning, and more.
It may help to overcome the problem of variety in big data, it may serve in pattern recognition and in the unsupervised learning of structure in data, and it may facilitate the management and transmission of big data. There is potential, via information compression, for substantial gains in computational efficiency, especially in the use of energy. The SP system may help to realise data-centric computing, perhaps via a development of Hebb's concept of a \"cell assembly\", or via the use of light or DNA for the processing of information. It has potential in the management of errors and uncertainty in data, in medical diagnosis, in processing streams of data, and in promoting adaptability in robots.\nCan quantum mechanics help us build intelligent robots and agents? One of the defining characteristics of intelligent behavior is the capacity to learn from experience. However, a major bottleneck for agents to learn in any real-life situation is the size and complexity of the corresponding task environment. Owing to, e.g., a large space of possible strategies, learning is typically slow. Even for a moderate task environment, it may simply take too long to rationally respond to a given situation. If the environment is impatient, allowing only a certain time for a response, an agent may then be unable to cope with the situation and to learn at all. Here we show that quantum physics can help and provide a significant speed-up for active learning as a genuine problem of artificial intelligence. We introduce a large class of quantum learning agents for which we show a quadratic boost in their active learning efficiency over their classical analogues. This result will be particularly relevant for applications involving complex task environments.\nIn this survey we present different approaches that allow an intelligent agent to autonomously explore its environment to gather information and learn multiple tasks.
Different communities have proposed different solutions that are, in many cases, similar and/or complementary. These solutions include active learning, exploration/exploitation, online learning and social learning. The common aspect of all these approaches is that it is the agent that selects and decides what information to gather next. Applications for these approaches already include tutoring systems, autonomous grasping learning, navigation and mapping and human-robot interaction. We discuss how these approaches are related, explaining their similarities and their differences in terms of problem assumptions and metrics of success. We believe that such an integrated discussion will improve interdisciplinary research and applications.\nThis work relates to the context-awareness of things that belong to IoT networks. Preferences, understood as a priority in selection, are considered, and dynamic preference models for such systems are built. Preference models are based on formal logic, and they are built on-the-fly by software agents observing the behavior of users/inhabitants, and gathering knowledge about preferences expressed in terms of logical specifications. A 3-level structure of agents has been introduced to support IoT inference. These agents cooperate with each other based on the graph representation of the system knowledge. An example of such a system is presented.\nA big open question of algorithmic information theory is the choice of the universal Turing machine (UTM). For Kolmogorov complexity and Solomonoff induction we have invariance theorems: the choice of the UTM changes bounds only by a constant. For the universally intelligent agent AIXI (Hutter, 2005) no invariance theorem is known. Our results are entirely negative: we discuss cases in which unlucky or adversarial choices of the UTM cause AIXI to misbehave drastically.
We show that Legg-Hutter intelligence and thus balanced Pareto optimality are entirely subjective, and that every policy is Pareto optimal in the class of all computable environments. This undermines all existing optimality properties for AIXI. While it may still serve as a gold standard for AI, our results imply that AIXI is a relative theory, dependent on the choice of the UTM.\nAlthough direct marketing is a good method for banks to utilize in the face of global competition and the financial crisis, it has been shown to exhibit poor performance. However, there are some drawbacks to direct campaigns, such as those related to improving the negative attributes that customers ascribe to banks. To overcome these problems, attractive long-term deposit campaigns should be organized and managed more effectively. The aim of this study is to develop an Intelligent Bank Market Management System (IBMMS) for bank managers who want to manage efficient marketing campaigns. IBMMS is the first system developed by combining the power of data mining with the capabilities of expert systems in this area. Moreover, IBMMS includes important features that enable it to be intelligent: a knowledge base, an inference engine and an advisor. Using this system, a manager can successfully direct marketing campaigns and follow the decision schemas of customers both as individuals and as a group; moreover, a manager can make decisions that lead to the desired response by customers.\nThis paper presents a solution to the joint large-scale route planning and task assignment problem for Autonomous Underwater Vehicles (AUVs). Given a set of constraints (e.g., time) and a set of task priority values, the goal is to find the optimal route for an underwater mission that maximizes the sum of the priorities and minimizes the total risk percentage while meeting the given constraints.
Making use of the heuristic nature of genetic and swarm intelligence algorithms in solving NP-hard graph problems, Particle Swarm Optimization (PSO) and a Genetic Algorithm (GA) are employed to find the optimum solution, where each individual in the population is a candidate solution (route). To evaluate the robustness of the proposed methods, the performance of the PSO and GA algorithms is examined and compared over a number of Monte Carlo runs. Simulation results suggest that the routes generated by both algorithms are feasible and reliable enough, and applicable for underwater motion planning. However, the GA-based route planner produces superior results compared to the PSO-based route planner.\nWe define a property of intelligent systems, which we call Reflexivity. In human beings, it is one aspect of consciousness, and an element of deliberation. We propose a conjecture that this property is conditioned by a topological property of the processes which implement this reflexivity. These processes may be symbolic or non-symbolic, e.g. connectionist. An architecture which implements reflexivity may be based on the interaction of one or several modules of deep learning, which may be specialized or not, and interconnected in a relevant way. A necessary condition of reflexivity is the existence of recurrence in its processes; we will examine in which cases this condition may be sufficient. We will then examine how this topology and this property make possible the expression of a second property, deliberation. In a final paragraph, we propose an evaluation of intelligent systems, based on the fulfillment of all or some of these properties.\nOne of the important issues in computer networks is \"Load Balancing\", which leads to efficient use of network resources. To achieve a balanced network, it is necessary to find different routes between the source and destination.
In the current paper, we propose a new approach to finding different routes using swarm intelligence techniques and multi-colony algorithms. In the proposed algorithm, which is an improved version of the MACO algorithm, we use different colonies of ants and bees and appoint these colony members as intelligent agents to monitor the network and update the routing information. The survey includes a comparison and critique of MACO. The simulation results show a tangible improvement with the aforementioned approach.\nNature-inspired metaheuristic algorithms, especially those based on swarm intelligence, have attracted much attention in the last ten years. The firefly algorithm appeared about five years ago, and its literature has expanded dramatically with diverse applications. In this paper, we will briefly review the fundamentals of the firefly algorithm together with a selection of recent publications. Then, we discuss the optimality associated with balancing exploration and exploitation, which is essential for all metaheuristic algorithms. By comparing with the intermittent search strategy, we conclude that metaheuristics such as the firefly algorithm are better than the optimal intermittent search strategy. We also analyse algorithms and their implications for higher-dimensional optimization problems.\nProcedural content generation (PCG) has recently become one of the hottest topics in computational intelligence and AI game research. Among a variety of PCG techniques, search-based approaches (SBPCG) overwhelmingly dominate PCG development at present. While SBPCG leads to promising results and successful applications, it poses a number of challenges ranging from representation to evaluation of the content being generated. In this paper, we present an alternative yet generic PCG framework, named learning-based procedural content generation (LBPCG), to provide potential solutions to several challenging problems in existing PCG techniques.
By exploring and exploiting information gained in game development and public beta tests via data-driven learning, our framework can generate robust content adaptable to end users or target players online with minimal interruption to their experience. Furthermore, we develop enabling techniques to implement the various models required in our framework. For a proof of concept, we have developed a prototype based on the classic open-source first-person shooter game Quake. Simulation results suggest that our framework is promising in generating quality content.\nWhen a large number of people with heterogeneous knowledge and skills run a project together, it is important to use a sensible engineering process. This especially holds for a project building an intelligent autonomously driving car to participate in the 2007 DARPA Urban Challenge. In this article, we present essential elements of a software and system engineering process for the development of artificial intelligence capable of driving autonomously in complex urban situations. The process includes agile concepts, like a test-first approach, continuous integration of every software module, and reliable release and configuration management assisted by software tools in integrated development environments. However, the most important ingredient for efficient and stringent development is the ability to efficiently test the behavior of the developed system in a flexible and modular simulator for urban situations.\nAn intelligent version of the sliding-puzzle game is developed using the new Go programming language; it uses a concurrent version of the A* informed search algorithm to power a solver bot that runs in the background. The game runs in a computer terminal. It was developed mainly for UNIX-type systems, but it works well on nearly all operating systems because of the cross-platform compatibility of the programming language used.
The game uses the language's concurrency primitives to simplify most of the hefty parts of the game. A real-time notification delivery architecture is developed using the language's built-in concurrency support, which performs similarly to the event-based, context-aware invocations we see on the web platform.\nIntelligence can be understood as an agent's ability to predict its environment's dynamics with a level of precision that allows it to effectively foresee opportunities and threats. Under the assumption that such intelligence relies on a knowledge space, any effective reasoning would benefit from a maximum portion of useful and a minimum portion of misleading knowledge fragments. This raises the question of how the quality of such a knowledge space can be kept high as the amount of knowledge keeps growing. This article proposes a mathematical model to describe general principles of how the quality of a growing knowledge space evolves depending on error rate, error propagation and countermeasures. It is also shown to what extent the quality of a knowledge space collapses when the removal of low-quality knowledge fragments occurs too slowly for a given knowledge space's growth rate.\nOne of the major challenges for collective intelligence is inconsistency, which is unavoidable whenever subjective assessments are involved. Pairwise comparisons allow one to represent such subjective assessments and to process them by analyzing, quantifying and identifying the inconsistencies. We propose using smaller scales for pairwise comparisons and provide mathematical and practical justifications for this change. Our postulate's aim is to initiate a paradigm shift in the search for a better scale construction for pairwise comparisons. Beyond pairwise comparisons, the results presented may be relevant to other methods using subjective scales.
Keywords: pairwise comparisons, collective intelligence, scale, subjective assessment, inaccuracy, inconsistency.\nSoftware project estimation is a crucial aspect of delivering software on time and on budget. Software size is an important metric in determining the effort, cost, and productivity. Today, source lines of code and function points are the most used sizing metrics. Backfiring is a well-known technique for converting between function points and source lines of code. However, when backfiring is used, there is a high margin of error. This study introduces a method to improve the accuracy of backfiring. Intelligent systems have been used in software prediction models to improve performance over traditional techniques. For this reason, a hybrid Neuro-Fuzzy system is used because it takes advantage of the learning ability of neural networks and the human-like reasoning of fuzzy logic. This paper describes an improved backfiring technique which uses Neuro-Fuzzy and compares the new method against the default conversion ratios currently used by software practitioners.\nIn this paper we describe an extension of the Variable Neighbourhood Search (VNS) which integrates the basic VNS with other complementary approaches from machine learning, statistics and experimental algorithmics, in order to produce high-quality performance and to completely automate the resulting optimization strategy. The resulting intelligent VNS has been successfully applied to a couple of optimization problems where the solution space consists of the subsets of a finite reference set. These problems are the labelled spanning tree and forest problems that are formulated on an undirected labelled graph, i.e. a graph where each edge has a label in a finite set of labels L. The problems consist of selecting the subset of labels such that the subgraph generated by these labels has an optimal spanning tree or forest, respectively.
These problems have several applications in the real world, where one aims to ensure connectivity by means of homogeneous connections.\nWhile emerging deep-learning systems have outclassed knowledge-based approaches in many tasks, their application to detection tasks for autonomous technologies remains an open field for scientific exploration. Broadly, there are two major developmental bottlenecks: the unavailability of comprehensively labeled datasets and of expressive evaluation strategies. Approaches for labeling datasets have relied on intensive hand-engineering, and strategies for evaluating learning systems have been unable to identify failure-case scenarios. Human intelligence offers an untapped approach for breaking through these bottlenecks. This paper introduces Driverseat, a technology for embedding crowds around learning systems for autonomous driving. Driverseat utilizes crowd contributions for (a) collecting complex 3D labels and (b) tagging diverse scenarios for ready evaluation of learning systems. We demonstrate how Driverseat can crowdstrap a convolutional neural network on the lane-detection task. More generally, crowdstrapping introduces a valuable paradigm for any technology that can benefit from leveraging the powerful combination of human and computer intelligence.\nUse of intelligent decision aids can help alleviate the challenges of planning complex operations. We describe integrated algorithms and a tool capable of translating a high-level concept for a tactical military operation into a fully detailed, actionable plan, producing automatically (or with human guidance) plans with a realistic degree of detail and of human-like quality. Tight interleaving of several algorithms -- planning, adversary estimates, scheduling, routing, attrition and consumption estimates -- comprises the computational approach of this tool.
Although originally developed for Army large-unit operations, the technology is generic and also applies to a number of other domains, particularly in critical situations requiring detailed planning within a constrained period of time. In this paper, we focus particularly on the engineering tradeoffs in the design of the tool. In an experimental evaluation, reminiscent of the Turing test, the tool's performance compared favorably with human planners.\nObtaining a survival strategy (policy) is one of the fundamental problems of biological agents. In this paper, we generalize the formulation of previous research related to the survival of an agent, and we formulate the survival problem as a maximization of the multi-step survival probability in future time steps. We introduce a method for converting the maximization of multi-step survival probability into a classical reinforcement learning problem. Using this conversion, the reward function (negative temporal cost function) is expressed as the log of the temporal survival probability. We also show that the objective function of reinforcement learning in this sense is proportional to the variational lower bound of the original problem. Finally, we empirically demonstrate that the agent learns survival behavior by using the reward function introduced in this paper.\nSuperintelligence is a hypothetical agent that possesses intelligence far surpassing that of the brightest and most gifted human minds. In light of recent advances in machine intelligence, a number of scientists, philosophers and technologists have revived the discussion about the potential catastrophic risks entailed by such an entity. In this article, we trace the origins and development of the neo-fear of superintelligence, and some of the major proposals for its containment. We argue that such containment is, in principle, impossible, due to fundamental limits inherent to computing itself.
Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world, strict containment requires simulations of such a program, something theoretically (and practically) infeasible.\nEach day, approximately 500 missing persons cases occur that go unsolved/unresolved in the United States. The non-profit organization known as the Find Me Group (FMG), led by former law enforcement professionals, is dedicated to solving or resolving these cases. This paper introduces the Missing Person Intelligence Synthesis Toolkit (MIST), which leverages a data-driven variant of geospatial abductive inference. This system takes search locations provided by a group of experts and rank-orders them based on the probability assigned to areas based on the prior performance of the experts taken as a group. We evaluated our approach against the current practices employed by the Find Me Group and found that it significantly reduces the search area, leading to a reduction of 31 square miles over the 24 cases we examined in our experiments. Currently, we are using MIST to aid the Find Me Group in an active missing person case.\nMost metaheuristics can efficiently solve unconstrained problems; however, their performance may degrade when constraints are involved. This paper proposes two constraint-handling approaches for an emerging metaheuristic, Cohort Intelligence (CI). More specifically, CI with a static penalty function approach (SCI) and CI with a dynamic penalty function approach (DCI) are proposed. The approaches have been tested by solving several constrained test problems. The performance of SCI and DCI has been compared with algorithms such as GA, PSO, ABC and d-Ds. In addition, three real-world problems from the mechanical engineering domain were solved, with improved solutions.
The results were satisfactory and validated the applicability of the CI methodology for solving real-world problems.\nThe Internet of Things (IoT) is emerging as a significant technology in shaping the future by connecting physical devices or things to the internet. It also presents various opportunities for intersection with other technological trends, which can allow it to become even more intelligent and efficient. In this paper we focus our attention on the integration of Intelligent Conversational Software Agents or Chatbots with IoT. Literature surveys have looked into various applications, features, underlying technologies and known challenges of IoT. On the other hand, Chatbots are being adopted in greater numbers due to major strides in the development of platforms and frameworks. The novelty of this paper lies in the specific integration of Chatbots in the IoT scenario. We analyzed the shortcomings of existing IoT systems and put forward ways to tackle them by incorporating chatbots. A general architecture is proposed for implementing such a system, as well as platforms and frameworks, both commercial and open source, which allow for implementation of such systems. Newer challenges and possible future directions for this new integration have also been identified and addressed.\nPeople have information needs of varying complexity, which can be solved by an intelligent agent able to answer questions formulated in a proper way, possibly considering user context and preferences. In a scenario in which the user profile can be considered as a question, intelligent agents able to answer questions can be used to find the most relevant answers for a given user. In this work we propose a novel model based on Artificial Neural Networks to answer questions with multiple answers by exploiting multiple facts retrieved from a knowledge base. The model is evaluated on the factoid Question Answering and top-n recommendation tasks of the bAbI Movie Dialog dataset.
After assessing the performance of the model on both tasks, we try to define the long-term goal of a conversational recommender system able to interact using natural language and to support users in their information-seeking processes in a personalized way.\nThe introduction of automated vehicles without permanent human supervision demands a functional system description, including functional system boundaries and a comprehensive safety analysis. These inputs to the technical development can be identified and analyzed by a scenario-based approach. Furthermore, to establish an economical test and release process, a large number of scenarios must be identified to obtain meaningful test results. Experts are good at identifying scenarios that are difficult to handle or unlikely to happen. However, experts are unlikely to identify all possible scenarios based on the knowledge they have on hand. Expert knowledge modeled for computer-aided processing may help provide a wide range of scenarios. This contribution reviews ontologies as knowledge-based systems in the field of automated vehicles, and proposes generating traffic scenes in natural language as a basis for scenario creation.\nAn Intelligent Personal Agent (IPA) is an agent whose purpose is to help the user gain information from reliable resources with the help of knowledge navigation techniques, saving the time spent searching for the best content. The agent is also responsible for responding to chat-based queries with the help of a conversation corpus. We will be testing different methods for optimal query generation. To facilitate ease of use of the application, the agent will be able to accept input through Text (Keyboard), Voice (Speech Recognition) and Server (Facebook) and output responses using the same method. Existing chatbots reply by making changes to the input, but we will give responses based on multiple SRT files.
The model will learn from the human dialogs dataset and will be able to respond in a human-like way. Responses to queries about famous things (places, people, and words) can be provided using web scraping, which will enable the bot to have knowledge navigation features. The agent will even learn from its past experiences, supporting semi-supervised learning.\nPredicting the efficacy of a drug for a given individual, using high-dimensional genomic measurements, is at the core of precision medicine. However, identifying features on which to base the predictions remains a challenge, especially when the sample size is small. Incorporating expert knowledge offers a promising alternative to improve a prediction model, but collecting such knowledge is laborious for the expert if the number of candidate features is very large. We introduce a probabilistic model that can incorporate expert feedback about the impact of genomic measurements on the sensitivity of a cancer cell to a given drug. We also present two methods to intelligently collect this feedback from the expert, using experimental design and multi-armed bandit models. In a multiple myeloma blood cancer data set (n=51), expert knowledge decreased the prediction error by 8%. Furthermore, the intelligent approaches can be used to reduce the workload of feedback collection to less than 30% on average compared to a naive approach.\nThe accelerated path of technological development, particularly at the interface between hardware and biology, has been suggested as evidence for future major technological breakthroughs associated with our potential to overcome biological constraints. This includes the potential of becoming immortal, having expanded cognitive capacities thanks to hardware implants or the creation of intelligent machines. Here I argue that several relevant evolutionary and structural constraints might prevent achieving most (if not all) of these innovations.
Instead, the coming future will bring novelties that will challenge many other aspects of our life and that can be seen as other feasible singularities. One particularly important one has to do with the evolving interactions between humans and non-intelligent robots capable of learning and communication. Here I argue that a long term interaction can lead to a new class of \"agent\" (the humanbot). The way shared memories get tangled over time will inevitably have important consequences for both sides of the pair, whose identity as separated entities might become blurred and ultimately vanish. Understanding such hybrid systems requires a second-order neuroscience approach while posing serious conceptual challenges, including the definition of consciousness.\nThis paper describes a novel approach to software engineering derived from the \"SP theory of intelligence\" and its realisation in the \"SP computer model\". These are the bases of a projected industrial-strength \"SP machine\" which, when mature, is anticipated to be the vehicle for software engineering as described in this paper. 
Potential benefits of this new approach to software engineering include: the automation or semi-automation of software development, with non-automatic programming of the SP system where necessary; allowing programmers to concentrate on 'real-world' parallelism, without worries about parallelism to speed up processing; the ambitious long-term goal of programming the SP system via written or spoken natural language; reducing or eliminating the distinction between 'design' and 'implementation'; reducing or eliminating operations like compiling or interpretation; reducing or eliminating the need for verification of software; reducing the need for an explicit process of validation of software; no formal distinction between program and database; potential for substantial reductions in the number of types of data file and the number of computer languages; benefits for version control; and reducing technical debt.\nThe recent breakthroughs in Artificial Intelligence (AI) have allowed individuals to rely on automated systems for a variety of reasons. Some of these systems are the currently popular voice-enabled systems like Echo by Amazon and Home by Google that are also called Intelligent Personal Assistants (IPAs). Though there are rising concerns about privacy and ethical implications, users of these IPAs seem to continue using these systems. We aim to investigate why users are concerned about privacy and how they are handling these concerns while using the IPAs. By utilizing the reviews posted online along with the responses to a survey, this paper provides a set of insights about the detected markers related to user interests and privacy challenges. The insights suggest that users of these systems, irrespective of their concerns about privacy, are generally positive in terms of utilizing IPAs in their everyday lives. However, there is a significant percentage of users who are concerned about privacy and have taken further actions to address the related concerns. 
Some percentage of users expressed that they did not have any privacy concerns, but when they learned about the \"always listening\" feature of these devices, their concern about privacy increased.\nWe draw upon a previously largely untapped literature on human collective intelligence as a source of inspiration for improving deep learning. Implicit in many algorithms that attempt to solve Deep Reinforcement Learning (DRL) tasks is the network of processors along which parameter values are shared. So far, existing approaches have implicitly utilized fully-connected networks, in which all processors are connected. However, the scientific literature on human collective intelligence suggests that complete networks may not always be the most effective information network structures for distributed search through complex spaces. Here we show that alternative topologies can improve deep neural network training: we find that sparser networks learn higher rewards faster, leading to learning improvements at lower communication costs.\nWe present AliMe Assist, an intelligent assistant designed for creating an innovative online shopping experience in E-commerce. Based on question answering (QA), AliMe Assist offers assistance service, customer service, and chatting service. It is able to take voice and text input, incorporate context into QA, and support multi-round interaction. Currently, it serves millions of customer questions per day and is able to address 85% of them. In this paper, we demonstrate the system, present the underlying techniques, and share our experience in dealing with real-world QA in the E-commerce field.\nEvolutionary deep intelligence synthesizes highly efficient deep neural network architectures over successive generations. Inspired by the nature versus nurture debate, we propose a study to examine the role of external factors on the network synthesis process by varying the availability of simulated environmental resources. 
Experimental results were obtained for networks synthesized via asexual evolutionary synthesis (1-parent) and sexual evolutionary synthesis (2-parent, 3-parent, and 5-parent) using a 10% subset of the MNIST dataset. Results show that a lower environmental factor model resulted in a more gradual loss in performance accuracy and decrease in storage size. This potentially allows significantly reduced storage size with minimal to no drop in performance accuracy, and the best networks were synthesized using the lowest environmental factor models.\nAntifragile systems grow measurably better in the presence of hazards. This is in contrast to fragile systems which break down in the presence of hazards, robust systems that tolerate hazards up to a certain degree, and resilient systems that -- like self-healing systems -- revert to their earlier expected behavior after a period of convalescence. The notion of antifragility was introduced by Taleb for economic systems, but its applicability has been illustrated in biological and engineering domains as well. In this paper, we propose an architecture that imparts antifragility to intelligent autonomous systems, specifically those that are goal-driven and based on AI-planning. We argue that this architecture allows the system to self-improve by uncovering new capabilities obtained either through the hazards themselves (opportunistic) or through deliberation (strategic). An AI planning-based case study of an autonomous wheeled robot is presented. We show that with the proposed architecture, the robot develops antifragile behaviour with respect to an oil spill hazard.\nBusinesses are naturally interested in detecting anomalies in their internal processes, because these can be indicators of fraud and inefficiencies. Within the domain of business intelligence, classic anomaly detection is not very frequently researched. 
In this paper, we propose a method, using autoencoders, for detecting and analyzing anomalies occurring in the execution of a business process. Our method does not rely on any prior knowledge about the process and can be trained on a noisy dataset already containing the anomalies. We demonstrate its effectiveness by evaluating it on 700 different datasets and testing its performance against three state-of-the-art anomaly detection methods. This paper is an extension of our previous work from 2016 [30]. Compared to the original publication, we have further refined the approach in terms of performance and conducted an elaborate evaluation on more sophisticated datasets including real-life event logs from the Business Process Intelligence Challenges of 2012 and 2017. In our experiments, our approach reached an F1 score of 0.87, whereas the best unaltered state-of-the-art approach reached an F1 score of 0.72. Furthermore, our approach can be used to analyze the detected anomalies in terms of which event within one execution of the process causes the anomaly.\nAs wireless networks evolve towards high mobility and better support for connected vehicles, a number of new challenges arise due to the resulting high dynamics in vehicular environments, which motivate rethinking traditional wireless design methodologies. Future intelligent vehicles, which are at the heart of high mobility networks, are increasingly equipped with multiple advanced onboard sensors and keep generating large volumes of data. Machine learning, as an effective approach to artificial intelligence, can provide a rich set of tools to exploit such data for the benefit of the networks. In this article, we first identify the distinctive characteristics of high mobility vehicular networks and motivate the use of machine learning to address the resulting challenges. 
After a brief introduction of the major concepts of machine learning, we discuss its applications to learn the dynamics of vehicular networks and make informed decisions to optimize network performance. In particular, we discuss in greater detail the application of reinforcement learning in managing network resources as an alternative to the prevalent optimization approach. Finally, some open issues worth further investigation are highlighted.\nThe emergence of knowledge graphs in the scholarly communication domain and recent advances in artificial intelligence and natural language processing bring us closer to a scenario where intelligent systems can assist scientists over a range of knowledge-intensive tasks. In this paper we present experimental results about the generation of word embeddings from scholarly publications for the intelligent processing of scientific texts extracted from SciGraph. We compare the performance of domain-specific embeddings with existing pre-trained vectors generated from very large and general-purpose corpora. Our results suggest that there is a trade-off between corpus specificity and volume. Embeddings from domain-specific scientific corpora effectively capture the semantics of the domain. On the other hand, comparable results can also be achieved with general corpora, but only in the presence of very large corpora of well-formed text. Furthermore, we show that the degree of overlap between knowledge areas is directly related to the performance of embeddings in domain evaluation tasks.\nThe clonal selection principle explains the basic features of an adaptive immune response to an antigenic stimulus. It established the idea that only those cells that recognize the antigens are selected to proliferate and differentiate. This paper explains a computational implementation of the clonal selection principle that explicitly takes into account the affinity maturation of the immune response. 
Antibodies generated by the clonal selection algorithm are clustered in some categories according to the affinity maturation, so that immunological memory cells which respond to the specified pathogen are created. Experimental results to classify the medical database of Coronary Heart Disease databases are reported. For the dataset, our proposed method shows the 99.6\\% classification capability of training data.\nGenerally intelligent agents exhibit successful behavior across problems in several settings. Endemic in approaches to realize such intelligence in machines is catastrophic forgetting: sequential learning corrupts knowledge obtained earlier in the sequence, or tasks antagonistically compete for system resources. Methods for obviating catastrophic forgetting have sought to identify and preserve features of the system necessary to solve one problem when learning to solve another, or to enforce modularity such that minimally overlapping sub-functions contain task specific knowledge. While successful, both approaches scale poorly because they require larger architectures as the number of training instances grows, causing different parts of the system to specialize for separate subsets of the data. Here we present a method for addressing catastrophic forgetting called developmental compression. It exploits the mild impacts of developmental mutations to lessen adverse changes to previously-evolved capabilities and `compresses' specialized neural networks into a generalized one. In the absence of domain knowledge, developmental compression produces systems that avoid overt specialization, alleviating the need to engineer a bespoke system for every task permutation and suggesting better scalability than existing approaches. 
We validate this method on a robot control problem and hope to extend this approach to other machine learning domains in the future.\nAn emotion-oriented intelligent interface consists of Emotion Generating Calculations (EGC) and a Mental State Transition Network (MSTN). We have developed the Android EGC application software, in which the agent evaluates the feelings in the conversation. In this paper, we develop a tourist information system which can estimate the user's feelings at the sightseeing spot. The system can recommend the sightseeing spot and the local food corresponding to the user's feeling. The system calculates the recommendation list by an estimate function which consists of Google search results, the importance of a term on the sightseeing website, and the emotion aroused according to EGC. In order to show the effectiveness, this paper describes the experimental results for some situations during Hiroshima sightseeing.\nPrevious approaches to analyzing spontaneously spoken language often have been based on encoding syntactic and semantic knowledge manually and symbolically. While there has been some progress using statistical or connectionist language models, many current spoken-language systems still use a relatively brittle, hand-coded symbolic grammar or symbolic semantic component. In contrast, we describe a so-called screening approach for learning robust processing of spontaneously spoken language. A screening approach is a flat analysis which uses shallow sequences of category representations for analyzing an utterance at various syntactic, semantic and dialog levels. Rather than using a deeply structured symbolic analysis, we use a flat connectionist analysis. This screening approach aims at supporting speech and language processing by using (1) data-driven learning and (2) robustness of connectionist networks. 
In order to test this approach, we have developed the SCREEN system which is based on this new robust, learned and flat analysis. In this paper, we focus on a detailed description of SCREEN's architecture, the flat syntactic and semantic analysis, the interaction with a speech recognizer, and a detailed evaluation analysis of the robustness under the influence of noisy or incomplete input. The main result of this paper is that flat representations allow more robust processing of spontaneous spoken language than deeply structured representations. In particular, we show how the fault-tolerance and learning capability of connectionist networks can support a flat analysis for providing more robust spoken-language processing within an overall hybrid symbolic/connectionist framework.\nThe interaction networks of biological systems are known to take on several non-random structural properties, some of which are believed to positively influence system robustness. Researchers are only starting to understand how these structural properties emerge; however, suggested roles for component fitness and community development (modularity) have attracted interest from the scientific community. In this study, we apply some of these concepts to an evolutionary algorithm and spontaneously organize its population using information that the population receives as it moves over a fitness landscape. More precisely, we employ fitness and clustering-based driving forces for guiding network structural dynamics, which in turn are controlled by the population dynamics of an evolutionary algorithm. To evaluate the effect this has on evolution, experiments are conducted on six engineering design problems and six artificial test functions and compared against cellular genetic algorithms and 16 other evolutionary algorithm designs. 
Our results indicate that a self-organizing topology evolutionary algorithm exhibits surprisingly robust search behavior with promising performance observed over short and long time scales. After a careful analysis of these results, we conclude that the coevolution between a population and its topology represents a powerful new paradigm for designing robust search heuristics.\nWe present a method to eliminate redundancy in the transition tables of Boolean automata: schema redescription with two symbols. One symbol is used to capture redundancy of individual input variables, and another to capture permutability in sets of input variables: fully characterizing the canalization present in Boolean functions. Two-symbol schemata explain aspects of the behaviour of automata networks that the characterization of their emergent patterns does not capture. We use our method to compare two well-known cellular automata for the density classification task: the human engineered CA GKL, and another obtained via genetic programming (GP). We show that despite having very different collective behaviour, these rules are very similar. Indeed, GKL is a special case of GP. Therefore, we demonstrate that it is more feasible to compare cellular automata via schema redescriptions of their rules, than by looking at their emergent behaviour, leading us to question the tendency in complexity research to pay much more attention to emergent patterns than to local interactions.\nThis paper focuses on a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of phrases paired with meaning representations. WOLFIE is part of an integrated system that learns to transform sentences into representations such as logical database queries. 
Experimental results are presented demonstrating WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The lexicons learned by WOLFIE are compared in usefulness with those acquired by a similar system, with results favorable to WOLFIE. A second set of experiments demonstrates WOLFIE's ability to scale to larger and more difficult, albeit artificially generated, corpora. In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, most results to date for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to semantic lexicons. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance.\nResearch on agent communication languages has typically taken the speech acts paradigm as its starting point. Despite their manifest attractions, speech-act models of communication have several serious disadvantages as a foundation for communication in artificial agent systems. In particular, it has proved to be extremely difficult to give a satisfactory semantics to speech-act based agent communication languages. In part, the problem is that speech-act semantics typically make reference to the \"mental states\" of agents (their beliefs, desires, and intentions), and there is in general no way to attribute such attitudes to arbitrary computational agents. In addition, agent programming languages have only had their semantics formalised for abstract, stand-alone versions, neglecting aspects such as communication primitives. 
With respect to communication, implemented agent programming languages have tended to be rather ad hoc. This paper addresses both of these problems, by giving semantics to speech-act based messages received by an AgentSpeak agent. AgentSpeak is a logic-based agent programming language which incorporates the main features of the PRS model of reactive planning systems. The paper builds upon a structural operational semantics to AgentSpeak that we developed in previous work. The main contributions of this paper are as follows: an extension of our earlier work on the theoretical foundations of AgentSpeak interpreters; a computationally grounded semantics for (the core) performatives used in speech-act based agent communication languages; and a well-defined extension of AgentSpeak that supports agent communication.\nSoftware aging is a phenomenon that refers to progressive performance degradation or transient failures or even crashes in long running software systems such as web servers. It mainly occurs due to the deterioration of operating system resource, fragmentation and numerical error accumulation. A primitive method to fight against software aging is software rejuvenation. Software rejuvenation is a proactive fault management technique aimed at cleaning up the system internal state to prevent the occurrence of more severe crash failures in the future. It involves occasionally stopping the running software, cleaning its internal state and restarting it. An optimized schedule for performing the software rejuvenation has to be derived in advance because a long running application could not be put down now and then as it may lead to waste of cost. This paper proposes a method to derive an accurate and optimized schedule for rejuvenation of a web server (Apache) by using Radial Basis Function (RBF) based Feed Forward Neural Network, a variant of Artificial Neural Networks (ANN). 
Aging indicators are obtained through an experimental setup involving an Apache web server and clients, and these act as input to the neural network model. This method is better than existing ones because the use of RBF leads to better accuracy and faster convergence.\nMethods of general applicability are searched for in swarm intelligence with the aim of gaining new insights about natural swarms and developing design methodologies for artificial swarms. An ideal solution could be a `swarm calculus' that allows one to calculate key features of swarms such as expected swarm performance and robustness based on only a few parameters. To work towards this ideal, one needs to find methods and models with high degrees of generality. In this paper, we report two models that might be examples of exceptional generality. First, an abstract model is presented that describes swarm performance depending on swarm density based on the dichotomy between cooperation and interference. Typical swarm experiments are given as examples to show how the model fits several different results. Second, we give an abstract model of collective decision making that is inspired by urn models. The effects of positive feedback probability, which increases over time in a decision making system, are understood with the help of a parameter that controls the feedback based on the swarm's current consensus. Several applicable methods, such as the description as a Markov process, calculation of splitting probabilities, mean first passage times, and measurements of positive feedback, are discussed and applications to artificial and natural swarms are reported.\nArtificial Immune System (AIS-MACA), a novel computational intelligence technique, can be used to strengthen the automated protein prediction system with more adaptability and to incorporate more parallelism into the system. 
Most existing approaches are sequential, classify the input into four major classes, and are designed for similar sequences. AIS-MACA is designed to identify ten classes from sequences that share twilight-zone similarity and identity with the training sequences, with mixed and hybrid variations. This method also predicts three states (helix, strand, and coil) for the secondary structure. Our comprehensive design considers 10 feature selection methods and 4 classifiers to develop MACA (Multiple Attractor Cellular Automata) based classifiers that are built for each of the ten classes. Tests of the proposed classifier on twilight-zone and 1-high-similarity benchmark datasets against over three dozen modern competing predictors show that AIS-MACA provides the best overall accuracy, ranging between 80% and 89.8% depending on the dataset.\nGrids with blocked and unblocked cells are often used to represent terrain in robotics and video games. However, paths formed by grid edges can be longer than true shortest paths in the terrain since their headings are artificially constrained. We present two new correct and complete any-angle path-planning algorithms that avoid this shortcoming. Basic Theta* and Angle-Propagation Theta* are both variants of A* that propagate information along grid edges without constraining paths to grid edges. Basic Theta* is simple to understand and implement, fast, and finds short paths. However, it is not guaranteed to find true shortest paths. Angle-Propagation Theta* achieves a better worst-case complexity per vertex expansion than Basic Theta* by propagating angle ranges when it expands vertices, but is more complex, not as fast, and finds slightly longer paths. We refer to Basic Theta* and Angle-Propagation Theta* collectively as Theta*. Theta* has unique properties, which we analyze in detail. 
We show experimentally that it finds shorter paths than both A* with post-smoothed paths and Field D* (the only other version of A* we know of that propagates information along grid edges without constraining paths to grid edges) with a runtime comparable to that of A* on grids. Finally, we extend Theta* to grids that contain unblocked cells with non-uniform traversal costs and introduce variants of Theta* which provide different tradeoffs between path length and runtime.\nMany people suffer from the loss of a limb. Learning to get by without an arm or hand can be very challenging, and existing prostheses do not yet fulfil the needs of individuals with amputations. One promising solution is to provide greater communication between a prosthesis and its user. Towards this end, we present a simple machine learning interface to supplement the control of a robotic limb with feedback to the user about what the limb will be experiencing in the near future. A real-time prediction learner was implemented to predict impact-related electrical load experienced by a robot limb; the learning system's predictions were then communicated to the device's user to aid in their interactions with a workspace. We tested this system with five able-bodied subjects. Each subject manipulated the robot arm while receiving different forms of vibrotactile feedback regarding the arm's contact with its workspace. Our trials showed that communicable predictions could be learned quickly during human control of the robot arm. Using these predictions as a basis for feedback led to a statistically significant improvement in task performance when compared to purely reactive feedback from the device. Our study therefore contributes initial evidence that prediction learning and machine intelligence can benefit not just control, but also feedback from an artificial limb. 
We expect that a greater level of acceptance and ownership can be achieved if the prosthesis itself takes an active role in transmitting learned knowledge about its state and its situation of use.\nOne purpose -- quite a few thinkers would say the main purpose -- of seeking knowledge about the world is to enhance our ability to make good decisions. An item of knowledge that can make no conceivable difference with regard to anything we might do would strike many as frivolous. Whether or not we want to be philosophical pragmatists in this strong sense with regard to everything we might want to enquire about, it seems a perfectly appropriate attitude to adopt toward artificial knowledge systems. If it is granted that we are ultimately concerned with decisions, then some constraints are imposed on our measures of uncertainty at the level of decision making. If our measure of uncertainty is real-valued, then it isn't hard to show that it must satisfy the classical probability axioms. For example, if an act has a real-valued utility U(E) if the event E obtains, and the same real-valued utility if the denial of E obtains, so that U(E) = U(-E), then the expected utility of that act must be U(E), and that must be the same as the uncertainty-weighted average of the returns of the act, p*U(E) + q*U(-E), where p and q represent the uncertainty of E and -E respectively. But then we must have p + q = 1.\nCorrect inference of genetic regulations inside a cell is one of the greatest challenges of the post-genomic era for biologists and researchers. Several intelligent techniques and models have already been proposed to identify the regulatory relations among genes from biological databases such as time series microarray data. The Recurrent Neural Network (RNN) is one of the most popular and simple approaches to model the dynamics as well as to infer correct dependencies among genes. 
In this paper, the Bat Algorithm (BA) is applied to optimize the parameters of an RNN model of a Gene Regulatory Network (GRN). Initially, the proposed method is tested on a small artificial network without any noise, and the efficiency is observed in terms of the number of iterations, the population size and the BA optimization parameters. The model is also validated in the presence of different levels of random noise for the small artificial network, which demonstrated its ability to make correct inferences in the presence of noise, as in real-world datasets. In the next phase of this research, the BA-based RNN is applied to a real-world benchmark time series microarray dataset of E. coli. The results show that it is able to identify the maximum number of true positive regulations, but it also includes some false positive regulations. Therefore, BA is very suitable for identifying biologically plausible GRNs with the help of an RNN model.\nWe are increasingly surrounded by artificially intelligent technology that takes decisions and executes actions on our behalf. This creates a pressing need for general means to communicate with, instruct and guide artificial agents, with human language the most compelling means for such communication. To achieve this in a scalable fashion, agents must be able to relate language to the world and to actions; that is, their understanding of language must be grounded and embodied. However, learning grounded language is a notoriously challenging problem in artificial intelligence research. Here we present an agent that learns to interpret language in a simulated 3D environment where it is rewarded for the successful execution of written instructions. Trained via a combination of reinforcement and unsupervised learning, and beginning with minimal prior knowledge, the agent learns to relate linguistic symbols to emergent perceptual representations of its physical surroundings and to pertinent sequences of actions. 
The agent's comprehension of language extends beyond its prior experience, enabling it to apply familiar language to unfamiliar situations and to interpret entirely novel instructions. Moreover, the speed with which this agent learns new words increases as its semantic knowledge grows. This facility for generalising and bootstrapping semantic knowledge indicates the potential of the present approach for reconciling ambiguous natural language with the complexity of the physical world.\nWe formulate a strong equivalence between machine learning, artificial intelligence methods and the formulation of statistical data assimilation as used widely in physical and biological sciences. The correspondence is that layer number in the artificial network setting is the analog of time in the data assimilation setting. Within the discussion of this equivalence we show that adding more layers (making the network deeper) is analogous to adding temporal resolution in a data assimilation framework. How one can find a candidate for the global minimum of the cost functions in the machine learning context using a method from data assimilation is discussed. Calculations on simple models from each side of the equivalence are reported. Also discussed is a framework in which the time or layer label is taken to be continuous, providing a differential equation, the Euler-Lagrange equation, which shows that the problem being solved is a two-point boundary value problem familiar in the discussion of variational methods. The use of continuous layers is denoted \"deepest learning\". These problems respect a symplectic symmetry in continuous time/layer phase space. Both Lagrangian versions and Hamiltonian versions of these problems are presented. Their well-studied implementation in a discrete time/layer, while respecting the symplectic structure, is addressed. 
The Hamiltonian version provides a direct rationale for backpropagation as a solution method for the canonical momentum.\nAchieving artificial visual reasoning, the ability to answer image-related questions which require a multi-step, high-level process, is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.\nA mutation in a protein-coding gene in DNA can alter the structure of the protein coded by that gene. Structurally altered proteins usually lose their functions and sometimes gain an undesirable function instead. These types of mutations and their effects can result in genetic diseases or antibiotic-resistant bacteria, among other health issues. Important methods for detecting mutations have been developed to combat AIDS as well as genetic diseases. Another example is the influenza virus.
The reasons why a vaccination developed to fight against influenza does not work the following year are (a) the mutation of its DNA and (b) the outbreak of the virus after it has mutated, especially if it is a virus that has escaped the vaccination's target. For such reasons, it is highly important for the medical sciences to know in advance the location of a potential mutation in a protein as well as the problems it might cause. In this study we have used artificial neural networks, which are one of the latest artificial intelligence technologies, to determine the effects of cancer mutations. The model we developed has given more successful results compared to other methods. We foresee that our model will bring a new dimension to medical research and the medical industry.\nIn the research area of reinforcement learning (RL), novel and promising methods are frequently developed and introduced to the RL community. However, although many researchers are keen to apply their methods to real-world problems, implementing such methods in real industry environments is often a frustrating and tedious process. Generally, academic research groups have only limited access to real industrial data and applications. For this reason, new methods are usually developed, evaluated and compared by using artificial software benchmarks. On the one hand, these benchmarks are designed to provide interpretable RL training scenarios and detailed insight into the learning process of the method at hand. On the other hand, they usually do not share much similarity with industrial real-world applications. For this reason we used our industry experience to design a benchmark which bridges the gap between freely available, documented, and motivated artificial benchmarks and properties of real industrial problems. The resulting industrial benchmark (IB) has been made publicly available to the RL community by publishing its Java and Python code, including an OpenAI Gym wrapper, on GitHub.
In this paper we motivate and describe in detail the IB's dynamics and identify prototypic experimental settings that capture common situations in real-world industry control problems.\nThis paper focuses on the recognition of Activities of Daily Living (ADL) by applying pattern recognition techniques to the data acquired by the accelerometer available in mobile devices. The recognition of ADL is composed of several stages, including data acquisition, data processing, and artificial intelligence methods. The artificial intelligence methods used are related to pattern recognition, and this study focuses on the use of Artificial Neural Networks (ANN). The data processing includes data cleaning and feature extraction techniques to define the inputs for the ANN. Due to the low processing power and memory of mobile devices, they should be used mainly to acquire the data, applying an ANN previously trained for the identification of ADL. The main purpose of this paper is to present a new method implemented with ANN for the identification of a defined set of ADL with reliable accuracy. This paper also presents a comparison of different types of ANN in order to choose the type for the implementation of the final method. The results of this research show that the best accuracies, higher than 80%, are achieved with Deep Learning techniques.\nThe detection of the environment where the user is located is extremely useful for the identification of Activities of Daily Living (ADL). ADL can be identified using the sensors available in many off-the-shelf mobile devices, including magnetic and motion sensors, and the environment can also be identified using acoustic sensors.
The study presented in this paper is divided into two parts: first, we discuss the recognition of the environment using acoustic sensors (i.e., the microphone), and second, we fuse this information with data from motion and magnetic sensors for the recognition of standing activities of daily living. The recognition of the environments and of the ADL is performed using pattern recognition techniques, in order to develop a system that includes data acquisition, data processing, data fusion, and artificial intelligence methods. The artificial intelligence methods explored in this study consist of different types of Artificial Neural Networks (ANN); the different types of ANN are compared and the best methods are selected for implementation in the different stages of the developed system. Conclusions point to the use of Deep Neural Networks (DNN) with normalized data for the identification of ADL with 85.89% accuracy, the use of feedforward neural networks with non-normalized data for the identification of the environments with 86.50% accuracy, and the use of DNN with normalized data for the identification of standing activities with 100% accuracy.\nReinforcement Learning (RL) is a research area that has blossomed tremendously in recent years and has shown remarkable potential for artificial intelligence-based opponents in computer games. This success is primarily due to the vast capabilities of Convolutional Neural Networks (ConvNet), which enable algorithms to extract useful information from noisy environments. The Capsule Network (CapsNet) is a recent addition to the family of Deep Learning algorithms and has barely begun to be explored. The network is an architecture for image classification, with superior performance on the MNIST dataset. CapsNets have not been explored beyond image classification.   This thesis introduces the use of CapsNet for Q-Learning based game algorithms.
To successfully apply CapsNet in advanced game play, three main contributions follow. First, four new game environments of increasing complexity are introduced as frameworks for RL research, namely Flash RL, Deep Line Wars, Deep RTS, and Deep Maze. These environments fill the gap between relatively simple and more complex game environments available for RL research and are used in the thesis to test and explore CapsNet behavior.   Second, the thesis introduces a generative modeling approach to produce artificial training data for use in Deep Learning models including CapsNets. We empirically show that conditional generative modeling can successfully generate game data of sufficient quality to train a Deep Q-Network well.   Third, we show that CapsNet is a reliable architecture for Deep Q-Learning based algorithms for game AI. A capsule is a group of neurons that determines the presence of objects in the data and has been shown in the literature to increase the robustness of training and predictions while lowering the amount of training data needed. It should, therefore, be ideally suited for game play.\nThe primary aim of the study is to introduce the APLOCO method, which was developed for solving multicriteria decision-making problems both theoretically and practically. In this context, the application subject of APLOCO is the evaluation of the investment potential of different cities with metropolitan status in Turkey. The secondary purpose of the study is to identify the independent variables affecting the factories in the operating phase and to estimate the effect levels of the independent variables on the dependent variable in the organized industrial zones (OIZs), whose mission is to reduce regional development disparities and to mobilize local production dynamics.
For this purpose, the effect levels of the independent variables on the dependent variables have been determined using the multilayer perceptron (MLP) method, which is widely used in artificial neural networks (ANNs). The effect levels derived from MLP have then been used as the weight levels of the decision criteria in APLOCO. The independent variables included in MLP are also used as the decision criteria in APLOCO. According to the results obtained from APLOCO, Istanbul is the best alternative in terms of investment potential, and the other alternatives are Manisa, Denizli, Izmir, Kocaeli, Bursa, Ankara, Adana, and Antalya, respectively. Although APLOCO is used to solve the ranking problem in order to show the application process in the paper, it can easily be employed to solve classification and selection problems. In addition, the study presents a rare example of the nested use of APLOCO, one of the methods of operations research, together with MLP used to determine the weights.\nGravitational mass flows, such as avalanches, debris flows and rockfalls, are common events in alpine regions with high impact on transport routes. Within the last few decades, hazard zone maps have been developed to systematically approach this threat. These maps mark vulnerable zones in habitable areas to allow effective planning of hazard mitigation measures and development of settlements. Hazard zone maps have been shown to be an effective tool for reducing fatalities during extreme events. They are created in a complex process, based on experience, empirical models, physical simulations and historical data. The generation of such maps is therefore expensive and limited to crucially important regions, e.g. permanently inhabited areas.   In this work we interpret the task of hazard zone mapping as a classification problem. Every point in a specific area has to be classified according to its vulnerability.
On a regional scale this leads to a segmentation problem, where the total area has to be divided into the respective hazard zones. Recent developments in artificial intelligence, namely convolutional neural networks, have led to major improvements in very similar tasks, image classification and semantic segmentation, i.e. computer vision. We use a convolutional neural network to identify terrain formations with the potential for catastrophic snow avalanches and label points in their reach as vulnerable. Repeating this procedure for all points allows us to generate an artificial hazard zone map. We demonstrate that the approach is feasible and promising based on the hazard zone map of the Tirolean Oberland. However, more training data and further improvements to the method are required before such techniques can be applied reliably.\nThis paper presents our computational methodology using Genetic Algorithms (GA) for exploring the nature of RNA editing. Our models are constructed using several genetic editing characteristics that are gleaned from the RNA editing system as observed in several organisms. We have expanded the traditional Genetic Algorithm with artificial editing mechanisms as proposed by Rocha (1997). The incorporation of editing mechanisms provides a means for artificial agents with genetic descriptions to gain greater phenotypic plasticity, which may be environmentally regulated. Our first implementations of these ideas have shed some light on the evolutionary implications of RNA editing. Based on these insights, we demonstrate how to select proper RNA editors for designing more robust GAs, and the results point to promising applications to real-world problems.
We expect that the proposed framework will both facilitate determining the evolutionary role of RNA editing in biology and advance the current state of research in Genetic Algorithms.\nThe aim of this paper is to address the question: Can an artificial neural network (ANN) model be used as a possible characterization of the power of the human mind? We will discuss what the relationship between such a model and its natural counterpart might be. A possible characterization of the different power capabilities of the mind is suggested in terms of the information contained (in its computational complexity) or achievable by it. Such a characterization takes advantage of recent results based on natural neural networks (NNN) and the computational power of arbitrary artificial neural networks (ANN). The possible acceptance of neural networks as the model of the human mind's operation makes the aforementioned quite relevant.\nAn Artificial Neural Network (ANN) is used as a numerical method for solving the modified Nonlinear Schroedinger (NLS) equation with a Dispersion Managed System (DMS) for jitter analysis. We take the optical axis z and the time t as inputs and obtain as outputs the relevant values needed for jitter analysis, such as the change in position and the center frequency of the pulse, as well as the mean square time of the incoming pulse. The results show that the ANN yields numerical solutions which are adaptive with respect to numerical errors, and they also verify previous numerical results obtained with a conventional numerical method. Our results indicate that DMS can minimize the timing jitter induced by some amplifiers.\nThis paper uses Artificial Neural Network (ANN) models to compute the response of a structural system subjected to Indian earthquakes, using the Chamoli and Uttarkashi ground motion data. The system is first trained on data from a single real earthquake.
The trained ANN architecture is then used to simulate earthquakes with various intensities, and the predicted responses given by the ANN model were found to be accurate for practical purposes. When the ANN is trained on a part of the ground motion data, it can also identify the responses of the structural system well. In this way, the safety of structural systems may be predicted for future earthquakes without having to wait for an earthquake to occur to learn the lessons. The time period and the corresponding maximum response of the building for an earthquake have been evaluated, and an ANN is again trained to predict the maximum response of the building at different time periods. The trained time period versus maximum response ANN model is also tested on real earthquake data from another place, which was not used in training, and was found to be in good agreement.\nIt has previously been shown that a recommender based on immune system idiotypic principles can outperform one based on correlation alone. This paper reports the results of work in progress, where we undertake some investigations into the nature of this beneficial effect. The initial findings are that the immune system recommender tends to produce different neighbourhoods, and that the superior performance of this recommender is due partly to the different neighbourhoods, and partly to the way that the idiotypic effect is used to weight each neighbour's recommendations.\nWe present ideas about creating a next-generation Intrusion Detection System based on the latest immunological theories. The central challenge with computer security is determining the difference between normal and potentially harmful activity. For half a century, developers have protected their systems by coding rules that identify and block specific events. However, the nature of current and future threats in conjunction with ever-larger IT systems urgently requires the development of automated and adaptive defensive tools.
A promising solution is emerging in the form of Artificial Immune Systems. The Human Immune System can detect and defend against harmful and previously unseen invaders, so can we not build a similar Intrusion Detection System for our computers?\nThe immune system is a complex biological system with a highly distributed, adaptive and self-organising nature. This paper presents an Artificial Immune System (AIS) that exploits some of these characteristics and is applied to the task of film recommendation by Collaborative Filtering (CF). Natural evolution and in particular the immune system have not been designed for classical optimisation. However, for this problem, we are not interested in finding a single optimum. Rather, we intend to identify a subset of good matches on which recommendations can be based. It is our hypothesis that an AIS built on two central aspects of the biological immune system will be an ideal candidate to achieve this: antigen-antibody interaction for matching and idiotypic antibody-antibody interaction for diversity. Computational results are presented in support of this conjecture and compared to those found by other CF techniques.\nIf a computer node is infected by a virus, worm or backdoor, this is a security risk for the entire network to which the node belongs. Existing Network Intrusion Detection Systems (NIDS) provide a certain amount of support for the identification of such infected nodes but require a large amount of communication and computational power. In this article, we present a novel approach called AGNOSCO to support the identification of infected nodes through the use of artificial ant colonies. It is shown that AGNOSCO overcomes the communication and computational power problems while correctly identifying infected nodes.\nDistributed systems such as artificial immune systems, complex adaptive systems, or multi-agent systems are widely used in Computer Science, e.g.
for network security, optimisations, or simulations. In these systems, small entities move through the network and perform certain tasks. At some point, the entities move to another place and therefore require information about where moving is most profitable. Commonly used systems either do not provide any information or use a centralised approach in which a central node directs the entities. This article discusses whether a small amount of information about the neighbours enhances the performance of the overall system. To this end, two information protocols are introduced and analysed. In addition, the protocols are implemented and tested using the artificial immune system SANA, which protects a network against intrusions.\nDendritic cells are the crime scene investigators of the human immune system. Their function is to correlate potentially anomalous invading entities with observed damage to the body. The detection of such invaders by dendritic cells results in the activation of the adaptive immune system, eventually leading to the removal of the invader from the host body. This mechanism has provided inspiration for the development of a novel bio-inspired algorithm, the Dendritic Cell Algorithm. This algorithm processes information at multiple levels of resolution, resulting in the creation of information granules of variable structure. In this chapter we examine the multi-faceted nature of immunology and how research in this field has shaped the function of the resulting Dendritic Cell Algorithm. A brief overview of the algorithm is given in combination with the details of the processes used for its development. The chapter concludes with a discussion of the parallels between our understanding of the human immune system and how such knowledge influences the design of artificial immune systems.\nAs an immune-inspired algorithm, the Dendritic Cell Algorithm (DCA) produces promising performance in the field of anomaly detection.
This paper presents the application of the DCA to a standard data set, the KDD 99 data set. The results of different implementation versions of the DCA, including the antigen multiplier and moving time windows, are reported. The real-valued Negative Selection Algorithm (NSA) using constant-sized detectors and the C4.5 decision tree algorithm are used to conduct a baseline comparison. The results suggest that the DCA is applicable to the KDD 99 data set, and that the antigen multiplier and moving time windows have the same effect on the DCA for this particular data set. The real-valued NSA with constant-sized detectors is not applicable to the data set, and the C4.5 decision tree algorithm provides a benchmark of the classification performance for this data set.\nIn a previous paper the authors argued the case for incorporating ideas from innate immunity into artificial immune systems (AISs) and presented an outline for a conceptual framework for such systems. A number of key general properties observed in the biological innate and adaptive immune systems were highlighted, and how such properties might be instantiated in artificial systems was discussed in detail. The next logical step is to take these ideas and build a software system with which AISs with these properties can be implemented and experimentally evaluated. This paper reports on the results of that step: the libtissue system.\nCrisis response requires information-intensive efforts utilized for reducing uncertainty, calculating and comparing costs and benefits, and managing resources in a fashion beyond those regularly available to handle routine problems. This paper presents an Artificial Immune Systems (AIS) metaphor for agent-based modeling of crisis response operations. The presented model proposes the integration of a hybrid set of aspects (multi-agent systems, built-in defensive model of AIS, situation management, and intensity-based learning) for crisis response operations.
In addition, the proposed response model is applied to the spread of pandemic influenza in Egypt as a case study.\nWe outline initial concepts for an immune-inspired algorithm to evaluate and predict oil price time series data. The proposed solution evolves a short-term pool of trackers dynamically, with each member attempting to map trends and anticipate future price movements. Successful trackers feed into a long-term memory pool that can generalise across repeating trend patterns. The resulting sequence of trackers, ordered in time, can be used as a forecasting tool. Examination of the pool of evolving trackers also provides valuable insight into the properties of the crude oil market.\nIn this paper we outline initial concepts for an immune-inspired algorithm to evaluate price time series data. The proposed solution evolves a short-term pool of trackers dynamically through a process of proliferation and mutation, with each member attempting to map to trends in price movements. Successful trackers feed into a long-term memory pool that can generalise across repeating trend patterns. Tests are performed to examine the algorithm's ability to successfully identify trends in a small data set. The influence of the long-term memory pool is then examined. We find that the algorithm is able to identify the presented price trends successfully and efficiently.\nThe dendritic cell algorithm is an immune-inspired technique for processing time-dependent data. Here we propose it as a possible solution for a robotic classification problem. The dendritic cell algorithm is implemented on a real robot and an investigation is performed into the effects of varying the migration threshold median for the cell population. The algorithm performs well on a classification task with very little tuning.
Ways of extending the implementation to allow it to be used as a classifier within the field of robotic security are suggested.\nThe Dendritic Cell Algorithm is an immune-inspired algorithm originally based on the function of natural dendritic cells. The original instantiation of the algorithm is a highly stochastic algorithm. While the performance of the algorithm is good when applied to large real-time datasets, it is difficult to analyse due to the number of random-based elements. In this paper a deterministic version of the algorithm is proposed, implemented and tested using a port scan dataset to provide a controllable system. This version has a controllable number of parameters, which are experimented with in this paper. In addition, the effects of using time windows and of varying the number of cells are examined; both are shown to influence the algorithm. Finally, a novel metric for assessing the algorithm's output is introduced and proves to be more sensitive than the metric used with the original Dendritic Cell Algorithm.\nArtificial Immune Systems have been successfully applied to a number of problem domains including fault tolerance and data mining, but have been shown to scale poorly when applied to computer intrusion detection despite the fact that the biological immune system is a very effective anomaly detector. This may be because AIS algorithms have previously been based on the adaptive immune system and biologically-naive models. This paper focuses on describing and testing a more complex and biologically-authentic AIS model, inspired by the interactions between the innate and adaptive immune systems. Its performance on a realistic process anomaly detection problem is shown to be better than that of standard AIS methods (negative-selection), policy-based anomaly detection methods (systrace), and an alternative innate AIS approach (the DCA).
In addition, it is shown that runtime information can be used in combination with system call information to enhance detection capability.\nThe human immune system has numerous properties that make it ripe for exploitation in the computational domain, such as robustness and fault tolerance, and many different algorithms, collectively termed Artificial Immune Systems (AIS), have been inspired by it. Two generations of AIS are currently in use, with the first generation relying on simplified immune models and the second generation utilising interdisciplinary collaboration to develop a deeper understanding of the immune system and hence produce more complex models. Both generations of algorithms have been successfully applied to a variety of problems, including anomaly detection, pattern recognition, optimisation and robotics. In this chapter an overview of AIS is presented, its evolution is discussed, and it is shown that the diversification of the field is linked to the diversity of the immune system itself, leading to a number of algorithms as opposed to one archetypal system. Two case studies are also presented to help provide insight into the mechanisms of AIS; these are the idiotypic network approach and the Dendritic Cell Algorithm.\nAddressing the problem of spam emails on the Internet, this paper presents a comparative study of Na\\\"ive Bayes and Artificial Neural Network (ANN) based modeling of spammer behavior. Keyword-based spam email filtering techniques fall short in modeling spammer behavior, as the spammer constantly changes tactics to circumvent these filters. The evasive tactics that the spammer uses are themselves patterns that can be modeled to combat spam. It has been observed that both Na\\\"ive Bayes and ANN are well suited for modeling common spammer patterns.
Experimental results demonstrate that both of them achieve a promising detection rate of around 92%, which is a considerable improvement in performance compared to contemporary keyword-based filtering approaches.\nThe natural immune system plays a vital role in the survival of all living beings. It provides a mechanism to defend the body against external invaders, making it a consistent system capable of adapting itself for survival in case of changes. The human immune system has motivated scientists and engineers to find powerful information processing algorithms that have solved complex engineering tasks. This paper explores one of the various possibilities for solving problems in a multi-agent scenario wherein multiple robots are deployed to achieve a goal collectively. The final goal depends on the performance of each individual robot and its survival without losing energy beyond a predetermined threshold value, achieved by deploying an evolutionary computational technique known as the artificial immune system, which imitates the biological immune system.\nFinancial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity, but fast forecasting of stock market prices is very important for strategic business planning. The present study aims to develop a comparative predictive model with a Feedforward Multilayer Artificial Neural Network and a Recurrent Time Delay Neural Network for financial time series prediction. This study was developed with the help of a historical stock price dataset made available by Google Finance. To develop this prediction model, the Backpropagation method with Gradient Descent learning has been implemented. Finally, the neural network trained with this algorithm is found to be a skillful predictor for non-stationary, noisy financial time series.\nBasically, in (one-player) war Real Time Strategy (wRTS) games, a human player controls, in real time, an army
consisting of a number of soldiers, and her aim is to destroy the opponent's assets, where the opponent is a virtual (i.e., non-human-controlled) player that usually consists of a pre-programmed decision-making script. These scripts usually have some well-known problems associated with them (e.g., predictability, non-rationality, repetitive behaviors, and a sensation of artificial stupidity, among others). This paper describes a method for the automatic generation of virtual players that adapt to the player's skills; this is done by first building a model of the player's behavior in real time during the game, and then evolving the virtual player via this model between two games. The paper also shows preliminary results obtained on a one-player wRTS game constructed specifically for experimentation.\nThe Artificial Bee Colony (ABC) is the name of an optimization algorithm that was inspired by the intelligent behavior of a honey bee swarm. It is widely recognized as a quick, reliable, and efficient method for solving optimization problems. This paper proposes a hybrid ABC (HABC) algorithm for graph 3-coloring, which is a well-known discrete optimization problem. The results of HABC are compared with the results of today's well-known graph coloring algorithms, i.e. Tabucol and the Hybrid Evolutionary Algorithm (HEA), and with the results of the traditional evolutionary algorithm with the SAW method (EA-SAW). Extensive experiments have shown that HABC matched the competitive results of the best graph coloring algorithms and did better than the traditional heuristic EA-SAW when solving equi-partite, flat, and randomly generated medium-sized graphs.\nMemetic computation (MC) has recently emerged as a new paradigm of efficient algorithms for solving the hardest optimization problems. On the other hand, artificial bee colony (ABC) algorithms demonstrate good performance when solving continuous and combinatorial optimization problems.
This study tries to bring these technologies under the same roof. As a result, a memetic ABC (MABC) algorithm has been developed that is hybridized with two local search heuristics: the Nelder-Mead algorithm (NMA) and the random walk with direction exploitation (RWDE). The former is directed more towards exploration, the latter more towards exploitation of the search space. A stochastic adaptation rule was employed to control the balance between exploration and exploitation. This MABC algorithm was applied to the Special Suite on Large Scale Continuous Global Optimization at the 2012 IEEE Congress on Evolutionary Computation. The results obtained by MABC are comparable with the results of DECC-G, DECC-G*, and MLCC.\nIn this paper, we first present a new explanation for the relations between logical circuits and artificial neural networks, logical circuits and fuzzy logic, and artificial neural networks and fuzzy inference systems. Then, based on these results, we propose a new neuro-fuzzy computing system which can be effectively implemented on the memristor-crossbar structure. One important feature of the proposed system is that its hardware can be trained directly using the Hebbian learning rule, without the need for any optimization. The system also has a very good capability to deal with a huge number of input-output training data without facing problems like overtraining.\nDetection and segmentation of brain tumors is very important because it provides anatomical information on normal and abnormal tissues, which helps in treatment planning and patient follow-up. There are a number of techniques for image segmentation. The proposed research work uses ANFIS (Adaptive Neuro-Fuzzy Inference System) for image classification and then compares the results with FCM (Fuzzy C-Means) and K-NN (K-Nearest Neighbor). ANFIS combines the benefits of both ANN and fuzzy logic systems.
A comprehensive feature set and fuzzy rules are selected to classify an abnormal image into the corresponding tumor type. Experimental results are promising in terms of classification accuracy. A comparative analysis is performed against FCM and K-NN to show the superiority of ANFIS systems.\nTermites present a very good natural metaphor for evolutionary computation. While each individual's computational power is small compared to more evolved species, it is the power of their colonies that inspires communication engineers. This paper presents a study of artificial termites in sensor networks for the purpose of solving their routing problem. The behaviors of each of the termites in their colony allow their simulation in a restricted environment. The simulated behavior demonstrates how the termites make use of an auto-catalytic behavior in order to collectively find a solution to a posed problem in reasonable time. The derived algorithm, termed Termite-hill, applies the principle of termite behavior to routing problem solving in real sensor network applications. The performance of the algorithm was tested on static and dynamic sink scenarios. The results, compared with other routing algorithms and with varying network density, show that Termite-hill is scalable and improves network energy consumption with control over best-effort service.\nIn paddy fields, monitoring soil moisture is required for irrigation scheduling and water resource allocation, management and planning. The current study proposes an Artificial Neural Network (ANN) model to estimate soil moisture in a paddy field with limited meteorological data. A dynamic ANN model was adopted to estimate soil moisture with reference evapotranspiration (ETo) and precipitation as inputs. ETo was first estimated using the maximum, average and minimum values of air temperature as the model inputs.
The models were run under the different weather conditions of the two paddy cultivation periods. The model was trained on the observation data from the first period and validated on the observation data from the second period. The dynamic ANN model estimated soil moisture with R2 values of 0.80 and 0.73 for the training and validation processes, respectively, indicating tight linear correlations between observed and estimated soil moisture values. Thus, the ANN model reliably estimates soil moisture with limited meteorological data.\nIn mobile robotics, a solid test for adaptation is the ability of a control system to function not only in a diverse number of physical environments, but also on a number of different robotic platforms. This paper demonstrates that a set of behaviours evolved in simulation on a miniature robot (epuck) can be transferred to a much larger-scale platform (Pioneer), both in simulation and in the real world. The chosen architecture uses artificial evolution of epuck behaviours to obtain a genetic sequence, which is then employed to seed an idiotypic, artificial immune system (AIS) on the Pioneers. Despite numerous hardware and software differences between the platforms, navigation and target-finding experiments show that the evolved behaviours transfer very well to the larger robot when the idiotypic AIS technique is used. In contrast, transferability is poor when reinforcement learning alone is used, which validates the adaptability of the chosen architecture.\nThis work focuses on the development of an Offline Handwritten English Character Recognition algorithm based on an Artificial Neural Network (ANN). The ANN implemented in this work has a single output neuron, which shows whether the tested character belongs to a particular cluster or not. The implementation is carried out entirely in the C language.
Ten sets of English alphabets (small-26, capital-26) were used to train the ANN and 5 sets of English alphabets were used to test the network. The characters were collected from different persons over a duration of about 25 days. The algorithm was tested with 5 capital-letter and 5 small-letter sets. The results showed that the algorithm recognized English alphabet patterns with a maximum accuracy of 92.59% and a False Rejection Rate (FRR) of 0%.\nMeta-heuristic algorithms are now widely used to solve continuous optimization problems. In this paper, the Artificial Bee Colony (ABC) algorithm and the Firefly Algorithm (FA) are evaluated. To demonstrate the efficiency of the algorithms and to analyze them further, they are tested on continuous optimization problems with a vast answer range and close optimal points. Thus, this paper presents the efficiency of the ABC algorithm and FA in solving continuous optimization problems, and studies both algorithms from the points of view of accuracy in reaching the optimal solution, running time, and reliability of the optimal answer.\nToday, given the increasing demand for credit from the customers of banks and finance and credit institutions, an effective and efficient method to decrease the risk of non-repayment of credit is essential. Assessment of customers' credit is one of the most important and essential duties of banks and institutions, and an error in this field would lead to great losses for them. Thus, the use of predictive computer systems has progressed significantly in recent decades. The data provided to credit institutions' managers help them make a sound decision on whether or not to grant credit.
In this paper, we assess customer credit through a combined classification that uses artificial neural networks, a genetic algorithm and Bayesian probabilities simultaneously, and the results obtained from these three methods are combined to achieve an appropriate final result. We use k-fold cross-validation to assess the method, and finally we compare the proposed method with methods such as Clustering-Launched Classification (CLC), Support Vector Machine (SVM), and GA+SVM, where a genetic algorithm has been used to improve the SVM.\nHandoff decisions are usually based on signal strength because of its simplicity and effectiveness. Apart from conventional techniques, such as threshold- and hysteresis-based schemes, many artificial intelligence techniques such as Fuzzy Logic and Artificial Neural Networks (ANN) have recently also been used for making handoff decisions. In this paper, an Artificial Neural Network based handoff algorithm is proposed and its performance is studied. We use an ANN here to make fast and accurate handoff decisions. Our proposed handoff algorithm uses a Backpropagation Neural Network model. The advantages of the backpropagation method are its simplicity and reasonable speed. The algorithm was designed, tested and found to give optimum results.\nIn this paper, we study the impact of selection methods in the context of on-line on-board distributed evolutionary algorithms. We propose a variant of the mEDEA algorithm in which we add a selection operator, and we apply it in a task-driven scenario. We evaluate four selection methods that induce different intensities of selection pressure in a multi-robot navigation with obstacle avoidance task and a collective foraging task. Experiments show that a small intensity of selection pressure is sufficient to rapidly obtain good performance on the tasks at hand.
We introduce different measures to compare the selection methods, and show that the higher the selection pressure, the better the performance obtained, especially for the more challenging food foraging task.\nEmergence is a phenomenon taken for granted in science but also still not well understood. We have developed a model of artificial genetic evolution intended to allow for emergence at the genetic, population and social levels. We present the details of the current state of our environment, agent, and reproductive models. In developing our models we have relied on a principle of using non-linear systems to model as many systems as possible, including mutation and recombination, gene-environment interaction, agent metabolism, agent survival, resource gathering and sexual reproduction. In this paper we review the genetic dynamics that have emerged in our system, including genotype-phenotype divergence, genetic drift, pseudogenes, and gene duplication. We conclude that emergence-focused design in complex system simulation is necessary to reproduce the multilevel emergence seen in the natural world.\nAutonomous robots need to be able to adapt to unforeseen situations and to acquire new skills through trial and error. Reinforcement learning in principle offers a suitable methodological framework for this kind of autonomous learning. However, current computational reinforcement learning agents mostly learn each individual skill entirely from scratch. How can we enable artificial agents, such as robots, to acquire some form of generic knowledge, which they could leverage for the learning of new skills? This paper argues that, like the brain, the cognitive system of artificial agents has to develop a world model to support adaptive behavior and learning. Inspiration is taken from two recent developments in the cognitive science literature: predictive processing theories of cognition, and the sensorimotor contingencies theory of perception.
Based on these, a hypothesis is formulated about what information might be encoded in an internal world model, and how an agent could autonomously acquire it. A computational model is described to formalize this hypothesis and is evaluated in a series of simulation experiments.\nThe production of renewable and sustainable energy is one of the most important challenges currently facing mankind. Wind has made an increasing contribution to the world's energy supply mix, but still remains a long way from reaching its full potential. In this paper, we investigate the use of artificial evolution to design vertical-axis wind turbine prototypes that are physically instantiated and evaluated under fan-generated wind conditions. Initially, a conventional evolutionary algorithm is used to explore the design space of a single wind turbine, and later a cooperative coevolutionary algorithm is used to explore the design space of an array of wind turbines. Artificial neural networks are used throughout as surrogate models to assist learning and are found to reduce the number of fabrications required to reach a higher aerodynamic efficiency. Unlike other approaches, such as computational fluid dynamics simulations, no mathematical formulations are used and no model assumptions are made.\nQuantum cybernetics and its connections to complex quantum systems science are addressed from the perspective of complex quantum computing systems. In this way, the notion of an autonomous quantum computing system is introduced with regard to quantum artificial intelligence, and applied to quantum artificial neural networks, considered as autonomous quantum computing systems, which leads to a quantum connectionist framework within quantum cybernetics for complex quantum computing systems.
Several examples of quantum feedforward neural networks are addressed with regard to Boolean function computation, multilayer quantum computation dynamics, entanglement and quantum complementarity. The examples provide a basis for reflecting on the role of quantum artificial neural networks as a general framework for addressing complex quantum systems that perform network-based quantum computation; possible consequences are drawn regarding quantum technologies, as well as fundamental research in complex quantum systems science and quantum biology.\nTime perception is essential for task switching, and in the mammalian brain it appears alongside other processes. Memristors are electronic components used as synapses and as models for neurons. The d.c. response of memristors can be considered as a type of short-term memory. Interactions of the memristor d.c. response within networks of memristors lead to the emergence of oscillatory dynamics and intermittent spike trains, which are similar to neural dynamics. Based on these data, the structure of a memristor network controller for a robot undergoing task switching is discussed, and it is suggested that these emergent network dynamics could improve the performance of role switching and learning in artificial intelligence and perhaps create artificial time perception.\nClustering problems are considered among the prominent challenges in statistics and computational science. Clustering of nodes in wireless sensor networks, which is used to prolong network lifetime, is one of the difficult tasks of the clustering procedure. To perform node clustering, a number of nodes are selected as cluster heads and the others are joined to one of these heads based on different criteria, e.g. Euclidean distance. So far, different approaches have been proposed for this process, to which swarm and evolutionary algorithms have contributed.
In this study, a novel algorithm based on the Artificial Fish Swarm Algorithm (AFSA) is proposed for the clustering procedure. In the proposed method, the performance of the standard AFSA is improved by increasing the balance between local and global searches. Furthermore, a new mechanism has been added to the base algorithm to improve convergence speed in clustering problems. The performance of the proposed technique is compared to a number of state-of-the-art techniques in this field, and the outcomes indicate the superiority of the proposed technique.\nWe propose the Artificial Continuous Prediction Market (ACPM) as a means to predict a continuous real value, by integrating a range of data sources and aggregating the results of different machine learning (ML) algorithms. ACPM adapts the concept of the (physical) prediction market to address the prediction of real values instead of discrete events. Each ACPM participant has a data source, an ML algorithm and a local decision-making procedure that determines what to bid on what value. The contributions of ACPM are: (i) adaptation to changes in data quality by the use of learning in: (a) the market, which weights each market participant to adjust the influence of each on the market prediction, and (b) the participants, which use a Q-learning based trading strategy to incorporate the market prediction into their subsequent predictions; and (ii) resilience to a changing population of low- and high-performing participants. We demonstrate the effectiveness of ACPM by application to an influenza-like illness data set, showing that ACPM outperforms a range of well-known regression models and is resilient to variation in data source quality.\nWritten responses can provide a wealth of data in understanding student reasoning on a topic. Yet they are time- and labor-intensive to score, leading many instructors to forgo them except as limited parts of summative assessments at the end of a unit or course.
Recent developments in Machine Learning (ML) have produced computational methods of scoring written responses for the presence or absence of specific concepts. Here, we compare the scores from one particular ML program -- EvoGrader -- to human scoring of responses to structurally- and content-similar questions that are distinct from the ones the program was trained on. We find that there is substantial inter-rater reliability between the human and ML scoring. However, sufficient systematic differences remain between the human and ML scoring that we advise only using the ML scoring for formative, rather than summative, assessment of student reasoning.\nArtificial Neural Networks (ANNs) have received increasing attention in recent years, with applications that span a wide range of disciplines, including vital domains such as medicine, network security and autonomous transportation. However, neural network architectures are becoming increasingly complex, and with an increasing need to obtain real-time results from such models, it has become pivotal to use parallelization as a mechanism for speeding up network training and deployment. In this work we propose an implementation of Network Parallel Training through Cannon's Algorithm for matrix multiplication. We show that increasing the number of processes speeds up training until the point where process communication costs become prohibitive; this point varies by network complexity. We also show through empirical efficiency calculations that the speedup obtained is superlinear.\nHuman visual perception of the world is of a large fixed image that is highly detailed and sharp. However, receptor density in the retina is not uniform: a small central region called the fovea is very dense and exhibits high resolution, whereas a peripheral region around it has much lower spatial resolution. Thus, contrary to our perception, we are only able to observe a very small region around the line of sight with high resolution.
The perception of a complete and stable view is aided by an attention mechanism that directs the eyes to the numerous points of interest within the scene. The eyes move between these targets in quick, unconscious movements, known as \"saccades\". Once a target is centered at the fovea, the eyes fixate for a fraction of a second while the visual system extracts the necessary information. An artificial visual system was built based on a fully recurrent neural network set within a reinforcement learning protocol, and learned to attend to regions of interest while solving a classification task. The model is consistent with several experimentally observed phenomena, and suggests novel predictions.\nThe deployment of Artificial Neural Networks (ANNs) in safety-critical applications poses a number of new verification and certification challenges. In particular, for ANN-enabled self-driving vehicles it is important to establish properties of the resilience of ANNs to noisy or even maliciously manipulated sensory input. We address these challenges by defining resilience properties of ANN-based classifiers as the maximal amount of input or sensor perturbation which is still tolerated. This problem of computing maximal perturbation bounds for ANNs is then reduced to solving mixed integer optimization problems (MIP). A number of MIP encoding heuristics are developed for drastically reducing MIP-solver runtimes, and parallelizing MIP solvers results in an almost linear speed-up in the number of computing cores (up to a certain limit) in our experiments. We demonstrate the effectiveness and scalability of our approach by computing maximal resilience bounds for a number of ANN benchmark sets ranging from typical image recognition scenarios to the autonomous maneuvering of robots.\nThe main task in oil and gas exploration is to gain an understanding of the distribution and nature of rocks and fluids in the subsurface.
Well logs are records of petro-physical data acquired along a borehole, providing direct information about what is in the subsurface. The data collected by logging wells can have significant economic consequences, due to the costs inherent to drilling wells, and the potential return of oil deposits. In this paper, we describe preliminary work aimed at building a general framework for well log prediction. First, we perform a descriptive and exploratory analysis of the gaps in the neutron porosity logs of more than a thousand wells in the North Sea. Then, we generate artificial gaps in the neutron logs that reflect the statistics collected before. Finally, we compare Artificial Neural Networks, Random Forests, and three Linear Regression algorithms in predicting the missing gaps on a well-by-well basis.\nWe introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.\nHumans are routinely asked to evaluate the performance of other individuals, separating success from failure and affecting outcomes from science to education and sports. Yet, in many contexts, the metrics driving the human evaluation process remain unclear. Here we analyse a massive dataset capturing players' evaluations by human judges to explore human perception of performance in soccer, the world's most popular sport.
We use machine learning to design an artificial judge which accurately reproduces human evaluation, allowing us to demonstrate how human observers are biased towards diverse contextual features. By investigating the structure of the artificial judge, we uncover the aspects of the players' behavior which attract the attention of human judges, demonstrating that human evaluation is based on a noticeability heuristic where only feature values far from the norm are considered to rate an individual's performance.\nTo compare entities of differing types and structural components, the artificial neural network paradigm was used to cross-compare structural components between heterogeneous documents. Trainable weighted structural components were input into machine-learned activation functions of the neurons. The model was used for matching news articles and videos, where the inputs and activation functions respectively consisted of term vectors and cosine similarity measures between the weighted structural components. The model was tested with different weights, achieving as high as 59.2% accuracy for matching videos to news articles. A mobile application user interface for recommending related videos for news articles was developed to demonstrate consumer value, including its potential usefulness for cross-selling products from unrelated categories.\nMotivated by recent developments impacting our view of Fermi's paradox (absence of extraterrestrials and their manifestations from our past light cone), we suggest a reassessment of the problem itself, as well as of strategies employed by SETI projects so far. The need for such reevaluation is fueled not only by the failure of searches thus far, but also by great advances recently made in astrophysics, astrobiology, computer science and future studies, which have remained largely ignored in SETI practice. 
As an example of the new approach, we consider the effects of the observed metallicity and temperature gradients in the Milky Way on the spatial distribution of hypothetical advanced extraterrestrial intelligent communities. While, obviously, properties of such communities and their sociological and technological preferences are entirely unknown, we assume that (1) they operate in agreement with the known laws of physics, and (2) that at some point they typically become motivated by a meta-principle embodying the central role of information-processing; a prototype of the latter is the recently suggested Intelligence Principle of Steven J. Dick. There are specific conclusions of practical interest to be drawn from coupling these reasonable assumptions with the astrophysical and astrochemical structure of the Galaxy. In particular, we suggest that the outer regions of the Galactic disk are the most likely locations for advanced SETI targets, and that intelligent communities will tend to migrate outward through the Galaxy as their information-processing capacities increase, for both thermodynamical and astrochemical reasons. This can also be regarded as a possible generalization of the Galactic Habitable Zone, a concept currently much investigated in astrobiology.\nThe rapid growth of e-commerce has confronted both the business community and customers with a new situation. Due to intense competition on the one hand and customers' option to choose from several alternatives on the other, the business community has realized the necessity of intelligent marketing strategies and relationship management. Web usage mining attempts to discover useful knowledge from the secondary data obtained from the interactions of the users with the Web. Web usage mining has become very critical for effective Web site management, creating adaptive Web sites, business and support services, personalization, network traffic flow analysis and so on.
In this paper, we present the important concepts of Web usage mining and its various practical applications. We further present a novel approach, 'intelligent-miner' (i-Miner), to optimize the concurrent architecture of a fuzzy clustering algorithm (to discover web data clusters) and a fuzzy inference system to analyze Web site visitor trends. A hybrid evolutionary fuzzy clustering algorithm is proposed in this paper to optimally segregate similar user interests. The clustered data is then used to analyze the trends using a Takagi-Sugeno fuzzy inference system learned using a combination of an evolutionary algorithm and neural network learning. The proposed approach is compared with self-organizing maps (to discover patterns) and several function approximation techniques like neural networks, linear genetic programming and the Takagi-Sugeno fuzzy inference system (to analyze the clusters). The results are graphically illustrated and the practical significance is discussed in detail. Empirical results clearly show that the proposed Web usage-mining framework is efficient.\nA major challenge for the realization of intelligent robots is to supply them with cognitive abilities in order to allow ordinary users to program them easily and intuitively. One way of doing such programming is teaching work tasks by interactive demonstration. To make this effective and convenient for the user, the machine must be capable of establishing a common focus of attention and be able to use and integrate spoken instructions, visual perceptions, and non-verbal cues like gestural commands. We report progress in building a hybrid architecture that combines statistical methods, neural networks, and finite state machines into an integrated system for instructing grasping tasks by man-machine interaction.
The system combines the GRAVIS-robot for visual attention and gestural instruction with an intelligent interface for speech recognition and linguistic interpretation, and a modality fusion module to allow multi-modal task-oriented man-machine communication with respect to dextrous robot manipulation of objects.\nCurrent research on the Internet of Things (IoT) mainly focuses on how to enable general objects to see, hear, and smell the physical world for themselves, and how to connect them to share the observations. In this paper, we argue that being connected is not enough; beyond that, general objects should have the capability to learn, think, and understand both physical and social worlds by themselves. This practical need impels us to develop a new paradigm, named Cognitive Internet of Things (CIoT), to empower the current IoT with a 'brain' for high-level intelligence. Specifically, we first present a comprehensive definition of CIoT, primarily inspired by the effectiveness of human cognition. Then, we propose an operational framework of CIoT, which mainly characterizes the interactions among five fundamental cognitive tasks: perception-action cycle, massive data analytics, semantic derivation and knowledge discovery, intelligent decision-making, and on-demand service provisioning. Furthermore, we provide a systematic tutorial on key enabling techniques involved in the cognitive tasks. In addition, we discuss the design of proper performance metrics for evaluating the enabling techniques. Last but not least, we present the research challenges and open issues ahead. Building on the present work and potentially fruitful future studies, CIoT has the capability to bridge the physical world (with objects, resources, etc.)
and the social world (with human demand, social behavior, etc.), and enhance smart resource allocation, automatic network operation, and intelligent service provisioning.\nAmbient Intelligence aims to offer personalized services and easier ways of interaction between people and systems. Since several users and systems may coexist in these environments, it is quite possible that entities with opposing preferences need to cooperate to reach their respective goals. Automated negotiation has been pointed out as one of the mechanisms that may provide a solution to this kind of problem. In this article, a multi-issue bilateral bargaining model for Ambient Intelligence domains is presented, where it is assumed that agents have computationally bounded resources and do not know their opponents' preferences. The main goal of this work is to provide negotiation models that obtain efficient agreements while keeping the computational cost low. A niching genetic algorithm is used before the negotiation process to sample one's own utility function (self-sampling). During the negotiation process, genetic operators are applied over the opponent's and one's own offers in order to sample new offers that are interesting for both parties. Results show that the proposed model is capable of outperforming similarity heuristics which only sample before the negotiation process and of obtaining similar results to similarity heuristics which have access to all of the possible offers.\nIn a currently ongoing project, we investigate a new possibility for solving the k-labelled spanning forest (kLSF) problem by an intelligent Variable Neighbourhood Search (Int-VNS) metaheuristic. In the kLSF problem, we are given an undirected input graph G and a positive integer value k, and the aim is to find a spanning forest of G having the minimum number of connected components and the upper bound k on the number of labels to use.
The problem is related to the minimum labelling spanning tree (MLST) problem, whose goal is to get the spanning tree of the input graph with the minimum number of labels, and has several applications in the real world, where one aims to ensure connectivity by means of homogeneous connections. The Int-VNS metaheuristic that we propose for the kLSF problem is derived from the promising intelligent VNS strategy recently proposed for the MLST problem, and integrates the basic VNS for the kLSF problem with other complementary approaches from machine learning, statistics and experimental algorithmics, in order to produce high-quality performance and to completely automate the resulting strategy.\nIn this paper, we present an operational system for cyber threat intelligence gathering from various social platforms on the Internet, particularly sites on the darknet and deepnet. We focus our attention on collecting information from hacker forum discussions and marketplaces offering products and services focusing on malicious hacking. We have developed an operational system for obtaining information from these sites for the purposes of identifying emerging cyber threats. Currently, this system collects on average 305 high-quality cyber threat warnings each week. These threat warnings include information on newly developed malware and exploits that have not yet been deployed in a cyber-attack. This provides a significant service to cyber-defenders. The system is significantly augmented through the use of various data mining and machine learning techniques. With the use of machine learning models, we are able to recall 92% of products in marketplaces and 80% of discussions on forums relating to malicious hacking with high precision.
We perform preliminary analysis on the data collected, demonstrating its application to aid a security expert for better threat analysis.\nIn this paper, we address the basic problem of recognizing moving objects in video images using SP Theory of Intelligence. The concept of SP Theory of Intelligence, which is a framework of artificial intelligence, was first introduced by Gerard J Wolff, where S stands for Simplicity and P stands for Power. Using the concept of multiple alignment, we detect and recognize objects of interest in video frames with multilevel hierarchical parts and subparts, based on polythetic categories. We track the recognized objects using the species based Particle Swarm Optimization (PSO). First, we extract the multiple alignment of our object of interest from training images. In order to recognize accurately and handle occlusion, we use the polythetic concepts on raw data line to omit the redundant noise via searching for best alignment representing the features from the extracted alignments. We recognize the domain of interest from the video scenes in the form of a wide variety of multiple alignments to handle scene variability. Unsupervised learning is done in the SP model following the DONSVIC principle and natural structures are discovered via information compression and pattern analysis. After successful recognition of objects, we use the species-based PSO algorithm, as the alignments of our object of interest are analogous to observation likelihood and fitness ability of species. Subsequently, we analyze the competition and repulsion among species with annealed Gaussian based PSO. We have tested our algorithms on David, Walking2, FaceOcc1, Jogging and Dudek, obtaining very satisfactory and competitive results.\nThe rapid growth of the IoT era is shaping the future of mobile services. Advanced communication technology enables a heterogeneous connectivity where mobile devices broadcast information to everything.
Mobile applications such as robotics and vehicles connecting to cloud and surroundings transfer the short-range on-board sensor perception system to long-range mobile-sensing perception system. However, the mobile sensing perception brings new challenges for how to efficiently analyze and intelligently interpret the deluge of IoT data in mission-critical services. In this article, we model the challenges as latency, packet loss and measurement noise which severely deteriorate the reliability and quality of IoT data. We integrate artificial intelligence into IoT to tackle these challenges. We propose a novel architecture that leverages recurrent neural networks (RNN) and Kalman filtering to anticipate motions and interactions between objects. The basic idea is to learn environment dynamics by recurrent networks. To improve the robustness of IoT communication, we use the idea of Kalman filtering and deploy a prediction and correction step. In this way, the architecture learns to develop a biased belief between prediction and measurement in different situations. We demonstrate our approach with synthetic and real-world datasets with noise that mimics the challenges of IoT communications. Our method brings a new level of IoT intelligence. It is also lightweight compared to other state-of-the-art convolutional recurrent architectures and is ideally suited to resource-limited mobile applications.\nDeep neural networks are among the most influential architectures of deep learning algorithms, being deployed in many mobile intelligent applications. End-side services, such as intelligent personal assistants (IPAs), autonomous cars, and smart home services often employ either simple local models or complex remote models on the cloud. Mobile-only and cloud-only computations are currently the status quo approaches.
In this paper, we propose an efficient, adaptive, and practical engine, JointDNN, for collaborative computation between a mobile device and cloud for DNNs in both inference and training phases. JointDNN not only provides an energy and performance efficient method of querying DNNs for the mobile side, but also benefits the cloud server by reducing the amount of its workload and communications compared to the cloud-only approach. Given the DNN architecture, we investigate the efficiency of processing some layers on the mobile device and some layers on the cloud server. We provide optimization formulations at layer granularity for forward and backward propagation in DNNs, which can adapt to mobile battery limitations and cloud server load constraints and quality of service. JointDNN achieves up to 18X and 32X reductions in the latency and mobile energy consumption of querying DNNs, respectively.\nThe three laws of Robotics first appeared together in Isaac Asimov's story 'Runaround' after being mentioned in some form or other in previous works by Asimov. These laws, commonly known as the three laws of robotics, are the earliest depictions of the need for ethics in Robotics. In simple language Isaac Asimov is able to explain what rules a robot must confine itself to in order to maintain societal sanctity. However, even though they are outdated they still represent some of our innate fears which are beginning to resurface in the present-day 21st century. Our society is on the verge of a new revolution; a revolution led by advances in Computer Science, Artificial Intelligence & Nanotechnology. Some of our advances have been so phenomenal that we surpassed what was predicted by Moore's law. With these advancements comes the fear that our future may be at the mercy of these androids. Humans today are scared that we, ourselves, might create something which we cannot control.
We may end up creating something which can not only learn much faster than any one of us can, but also evolve faster than what the theory of evolution has allowed us to. The greatest fear is not only that we might lose our jobs to these intelligent beings, but that these beings might end up replacing us at the top of the cycle. The public hysteria has been further heightened by a number of cultural works which depict annihilation of the human race by robots. From Frankenstein to I, Robot, mass media has also depicted such issues. This paper is an effort to understand the need for ethics in Robotics, termed simply Roboethics. This is achieved by the study of artificial beings and the thought being put behind them. By the end of the paper, however, it is concluded that there isn't a need for ethical robots but more than ever a need for ethical roboticists.\nThere is a long history in game theory on the topic of Bayesian or \"rational\" learning, in which each player maintains beliefs over a set of alternative behaviours, or types, for the other players. This idea has gained increasing interest in the artificial intelligence (AI) community, where it is used as a method to control a single agent in a system composed of multiple agents with unknown behaviours. The idea is to hypothesise a set of types, each specifying a possible behaviour for the other agents, and to plan our own actions with respect to those types which we believe are most likely, given the observed actions of the agents. The game theory literature studies this idea primarily in the context of equilibrium attainment. In contrast, many AI applications have a focus on task completion and payoff maximisation. With this perspective in mind, we identify and address a spectrum of questions pertaining to belief and truth in hypothesised types.
We formulate three basic ways to incorporate evidence into posterior beliefs and show when the resulting beliefs are correct, and when they may fail to be correct. Moreover, we demonstrate that prior beliefs can have a significant impact on our ability to maximise payoffs in the long-term, and that they can be computed automatically with consistent performance effects. Furthermore, we analyse the conditions under which we are able to complete our task optimally, despite inaccuracies in the hypothesised types. Finally, we show how the correctness of hypothesised types can be ascertained during the interaction via an automated statistical analysis.\nDuring the summer of 1989, the Forecast Systems Laboratory of the National Oceanic and Atmospheric Administration sponsored an evaluation of artificial intelligence-based systems that forecast severe convective storms. The evaluation experiment, called Shootout-89, took place in Boulder, and focussed on storms over the northeastern Colorado foothills and plains (Moninger, et al., 1990). Six systems participated in Shootout-89. These included traditional expert systems, an analogy-based system, and a system developed using methods from the cognitive science/judgment analysis tradition. Each day of the exercise, the systems generated 2 to 9 hour forecasts of the probabilities of occurrence of non-significant weather, significant weather, and severe weather, in each of four regions in northeastern Colorado. A verification coordinator working at the Denver Weather Service Forecast Office gathered ground-truth data from a network of observers. Systems were evaluated on the basis of several measures of forecast skill, and on other metrics such as timeliness, ease of learning, and ease of use.
Systems were generally easy to operate; however, the various systems required substantially different levels of meteorological expertise on the part of their users, reflecting the various operational environments for which the systems had been designed. Systems varied in their statistical behavior, but on this difficult forecast problem, the systems generally showed a skill approximately equal to that of persistence forecasts and climatological (historical frequency) forecasts. The two systems that appeared best able to discriminate significant from non-significant weather events were traditional expert systems. Both of these systems required the operator to make relatively sophisticated meteorological judgments. We are unable, based on only one summer's worth of data, to determine the extent to which the greater skill of the two systems was due to the content of their knowledge bases, or to the subjective judgments of the operator. A follow-on experiment, Shootout-91, is currently being planned. Interested potential participants are encouraged to contact the author at the address above.\nDuring their first years of life, infants learn the language(s) of their environment at an amazing speed despite large cross-cultural variations in amount and complexity of the available language input. Understanding this simple fact still escapes current cognitive and linguistic theories. Recently, spectacular progress in the engineering sciences, notably machine learning and wearable technology, offers the promise of revolutionizing the study of cognitive development. Machine learning offers powerful learning algorithms that can achieve human-like performance on many linguistic tasks. Wearable sensors can capture vast amounts of data, which enable the reconstruction of the sensory experience of infants in their natural environment.
The project of 'reverse engineering' language development, i.e., of building an effective system that mimics infants' achievements, appears therefore to be within reach. Here, we analyze the conditions under which such a project can contribute to our scientific understanding of early language development. We argue that instead of defining a sub-problem or simplifying the data, computational models should address the full complexity of the learning situation, and take as input the raw sensory signals available to infants. This implies that (1) accessible but privacy-preserving repositories of home data be set up and widely shared, (2) models be evaluated at different linguistic levels through a benchmark of psycholinguistic tests that can be passed by machines and humans alike, and (3) linguistically and psychologically plausible learning architectures be scaled up to real data using probabilistic/optimization principles from machine learning. We discuss the feasibility of this approach and present preliminary results.\nFor many years, the idea of a human with bionic muscles immediately conjures up science fiction images of a TV series superhuman character that was implanted with bionic muscles and portrayed with strength and speed far superior to any normal human. As fantastic as this idea may seem, recent developments in electroactive polymers (EAP) may one day make such bionics possible. Polymers that exhibit large displacement in response to stimulation that is other than electrical signal have been known for many years. Initially, EAP received relatively little attention due to their limited actuation capability. However, in recent years, the view of the EAP materials has changed due to the introduction of effective new materials that significantly surpassed the capability of the widely used piezoelectric polymer, PVDF. As this technology continues to evolve, novel mechanisms that are biologically inspired are expected to emerge.
EAP materials can potentially provide actuation with lifelike response and more flexible configurations. While further improvements in performance and robustness are still needed, there already have been several reported successes. In recognition of the need for cooperation in this multidisciplinary field, the author initiated and organized a series of international forums that are leading to a growing number of research and development projects and to great advances in the field. In 1999, he challenged the worldwide science and engineering community of EAP experts to develop a robotic arm that is actuated by artificial muscles to win a wrestling match against a human opponent. In this paper, the field of EAP as artificial muscles will be reviewed covering the state of the art, the challenges and the vision for the progress in future years.\nThe speech code is a vehicle of language: it defines a set of forms used by a community to carry information. Such a code is necessary to support the linguistic interactions that allow humans to communicate. How then may a speech code be formed prior to the existence of linguistic interactions? Moreover, the human speech code is discrete and compositional, shared by all the individuals of a community but different across communities, and phoneme inventories are characterized by statistical regularities. How can a speech code with these properties form? We try to approach these questions in the paper, using the \"methodology of the artificial\". We build a society of artificial agents, and detail a mechanism that shows the formation of a discrete speech code without pre-supposing the existence of linguistic capacities or of coordinated interactions. The mechanism is based on a low-level model of sensory-motor interactions. We show that the integration of certain very simple and non language-specific neural devices leads to the formation of a speech code that has properties similar to the human speech code. 
This result relies on the self-organizing properties of a generic coupling between perception and production within agents, and on the interactions between agents. The artificial system helps us to develop better intuitions on how speech might have appeared, by showing how self-organization might have helped natural selection to find speech.\nAs one of the newest members of Artificial Immune Systems (AIS), the Dendritic Cell Algorithm (DCA) has been applied to a range of problems. These applications mainly belong to the field of anomaly detection. However, real-time detection, a new challenge to anomaly detection, requires improvement in the real-time capability of the DCA. To assess such capability, formal methods in the research of real-time systems can be employed. The findings of the assessment can provide guidelines for the future development of the algorithm. Therefore, in this paper we use an interval logic based method, named the Duration Calculus (DC), to specify a simplified single-cell model of the DCA. Based on the DC specifications with further induction, we find that each individual cell in the DCA can perform its function as a detector in real-time. Since the DCA can be seen as many such cells operating in parallel, it is potentially capable of performing real-time detection. However, the analysis process of the standard DCA constricts its real-time capability. As a result, we conclude that the analysis process of the standard DCA should be replaced by a real-time analysis component, which can perform periodic analysis for the purpose of real-time detection.\nWith the recent exponential growth of applications using artificial intelligence (AI), the development of efficient and ultrafast brain-like (neuromorphic) systems is crucial for future information and communication technologies.
While the implementation of AI systems using computer algorithms of neural networks is emerging rapidly, scientists are just taking the very first steps in the development of the hardware elements of an artificial brain, specifically neuromorphic microchips. In this review article, we present the current state of neuromorphic photonic circuits based on solid-state optoelectronic oscillators formed by nanoscale double barrier quantum well resonant tunneling diodes. We address, both experimentally and theoretically, the key dynamic properties of recently developed artificial solid-state neuron microchips with delayed perturbations and describe their role in the study of neural activity and regenerative memory. This review covers our recent research work on excitable and delay dynamic characteristics of both single and autaptic (delayed) artificial neurons including all-or-none response, spike-based data encoding, storage, signal regeneration and signal healing. Furthermore, the neural responses of these neuromorphic microchips display all the signatures of extended spatio-temporal localized structures (LSs) of light, which are reviewed here in detail. By taking advantage of the dissipative nature of LSs, we demonstrate potential applications in optical data reconfiguration and clock and timing at high speeds and with short transients. The results reviewed in this article are a key enabler for the development of high-performance optoelectronic devices in future high-speed brain-inspired optical memories and neuromorphic computing.\nWe consider the problem of designing the utility functions of the utility-maximizing agents in a multi-agent system so that they work synergistically to maximize a global utility. The particular problem domain we explore is the control of network routing by placing agents on all the routers in the network. Conventional approaches to this task have the agents all use the Ideal Shortest Path routing Algorithm (ISPA).
We demonstrate that in many cases, due to the side-effects of one agent's actions on another agent's performance, having agents use ISPA's is suboptimal as far as global aggregate cost is concerned, even when they are only used to route infinitesimally small amounts of traffic. The utility functions of the individual agents are not \"aligned\" with the global utility, intuitively speaking. As a particular example of this we present an instance of Braess' paradox in which adding new links to a network whose agents all use the ISPA results in a decrease in overall throughput. We also demonstrate that load-balancing, in which the agents' decisions are collectively made to optimize the global cost incurred by all traffic currently being routed, is suboptimal as far as global cost averaged across time is concerned. This is also due to 'side-effects', in this case of current routing decision on future traffic. The mathematics of Collective Intelligence (COIN) is concerned precisely with the issue of avoiding such deleterious side-effects in multi-agent systems, both over time and space. We present key concepts from that mathematics and use them to derive an algorithm whose ideal version should have better performance than that of having all agents use the ISPA, even in the infinitesimal limit. We present experiments verifying this, and also showing that a machine-learning-based version of this COIN algorithm in which costs are only imprecisely estimated via empirical means (a version potentially applicable in the real world) also outperforms the ISPA, despite having access to less information than does the ISPA. In particular, this COIN algorithm almost always avoids Braess' paradox.\nThis article is an overview of the \"SP theory of intelligence\". The theory aims to simplify and integrate concepts across artificial intelligence, mainstream computing and human perception and cognition, with information compression as a unifying theme. 
It is conceived as a brain-like system that receives 'New' information and stores some or all of it in compressed form as 'Old' information. It is realised in the form of a computer model -- a first version of the SP machine. The concept of \"multiple alignment\" is a powerful central idea. Using heuristic techniques, the system builds multiple alignments that are 'good' in terms of information compression. For each multiple alignment, probabilities may be calculated. These provide the basis for calculating the probabilities of inferences. The system learns new structures from partial matches between patterns. Using heuristic techniques, the system searches for sets of structures that are 'good' in terms of information compression. These are normally ones that people judge to be 'natural', in accordance with the 'DONSVIC' principle -- the discovery of natural structures via information compression. The SP theory may be applied in several areas including 'computing', aspects of mathematics and logic, representation of knowledge, natural language processing, pattern recognition, several kinds of reasoning, information storage and retrieval, planning and problem solving, information compression, neuroscience, and human perception and cognition. Examples include the parsing and production of language including discontinuous dependencies in syntax, pattern recognition at multiple levels of abstraction and its integration with part-whole relations, nonmonotonic reasoning and reasoning with default values, reasoning in Bayesian networks including 'explaining away', causal diagnosis, and the solving of a geometric analogy problem.\nThis paper presents evidence for the idea that much of artificial intelligence, human perception and cognition, mainstream computing, and mathematics, may be understood as compression of information via the matching and unification of patterns. 
This is the basis for the \"SP theory of intelligence\", outlined in the paper and fully described elsewhere. Relevant evidence may be seen: in empirical support for the SP theory; in some advantages of information compression (IC) in terms of biology and engineering; in our use of shorthands and ordinary words in language; in how we merge successive views of any one thing; in visual recognition; in binocular vision; in visual adaptation; in how we learn lexical and grammatical structures in language; and in perceptual constancies. IC via the matching and unification of patterns may be seen in both computing and mathematics: in IC via equations; in the matching and unification of names; in the reduction or removal of redundancy from unary numbers; in the workings of Post's Canonical System and the transition function in the Universal Turing Machine; in the way computers retrieve information from memory; in systems like Prolog; and in the query-by-example technique for information retrieval. The chunking-with-codes technique for IC may be seen in the use of named functions to avoid repetition of computer code. The schema-plus-correction technique may be seen in functions with parameters and in the use of classes in object-oriented programming. And the run-length coding technique may be seen in multiplication, in division, and in several other devices in mathematics and computing. The SP theory resolves the apparent paradox of \"decompression by compression\". And computing and cognition as IC is compatible with the uses of redundancy in such things as backup copies to safeguard data and understanding speech in a noisy environment.\nTraffic congestion in urban road networks is a costly problem that affects all major cities in developed countries. 
To tackle this problem, it is possible (i) to act on the supply side, increasing the number of roads or lanes in a network, (ii) to reduce the demand, restricting the access to urban areas at specific hours or to specific vehicles, or (iii) to improve the efficiency of the existing network, by means of a widespread use of so-called Intelligent Transportation Systems (ITS). In line with the recent advances in smart transportation management infrastructures, ITS has turned out to be a promising field of application for artificial intelligence techniques. In particular, multiagent systems seem to be the ideal candidates for the design and implementation of ITS. In fact, drivers can be naturally modelled as autonomous agents that interact with the transportation management infrastructure, thereby generating a large-scale, open, agent-based system. To regulate such a system and maintain a smooth and efficient flow of traffic, decentralised mechanisms for the management of the transportation infrastructure are needed. In this article we propose a distributed, market-inspired mechanism for the management of a future urban road network, where intelligent autonomous vehicles, operated by software agents on behalf of their human owners, interact with the infrastructure in order to travel safely and efficiently through the road network. Building on the reservation-based intersection control model proposed by Dresner and Stone, we consider two different scenarios: one with a single intersection and one with a network of intersections. In the former, we analyse the performance of a novel policy based on combinatorial auctions for the allocation of reservations. In the latter, we analyse the impact that a traffic assignment strategy inspired by competitive markets has on the drivers' route choices.
Finally we propose an adaptive management mechanism that integrates the auction-based traffic control policy with the competitive traffic assignment strategy.\nThis is a proposal to create a research facility for the development of a high-parallel version of the \"SP machine\", based on the \"SP theory of intelligence\". We envisage that the new version of the SP machine will be an open-source software virtual machine, derived from the existing \"SP computer model\", and hosted on an existing high-performance computer. It will be a means for researchers everywhere to explore what can be done with the system and to create new versions of it. The SP system is a unique attempt to simplify and integrate observations and concepts across artificial intelligence, mainstream computing, mathematics, and human perception and cognition, with information compression as a unifying theme. Potential benefits and applications include helping to solve problems associated with big data; facilitating the development of autonomous robots; unsupervised learning, natural language processing, several kinds of reasoning, fuzzy pattern recognition at multiple levels of abstraction, computer vision, best-match and semantic forms of information retrieval, software engineering, medical diagnosis, simplification of computing systems, and the seamless integration of diverse kinds of knowledge and diverse aspects of intelligence. Additional motivations include the potential of the SP system to help solve problems in defence, security, and the detection and prevention of crime; potential in terms of economic, social, environmental, and academic criteria, and in terms of publicity; and the potential for international influence in research. The main elements of the proposed facility are described, including support for the development of \"SP-neural\", a neural version of the SP machine. 
The facility should be permanent in the sense that it should be available for the foreseeable future, and it should be designed to facilitate its use by researchers anywhere in the world.\nArtificial intelligence (AI) classifiers can be used to classify unknowns, refine existing classification parameters, and identify/screen out ineffectual parameters. We present an AI methodology for classifying new gamma-ray bursts, along with some preliminary results.\nIn this brief, an algorithm for controlling chaotic systems using small, continuous-time perturbations is presented. Stabilisation is achieved by self-controlling feedback using low-order LTI filters. The algorithm alleviates the need for complex calculations or costly delay elements, and can be implemented in a wide variety of systems using simple circuit elements only.\nDependency grammar is usually interpreted as equivalent to a strict form of X--bar theory that forbids the stacking of nodes of the same bar level (e.g., N' immediately dominating N' with the same head). But adequate accounts of _one_--anaphora and of the semantics of multiple modifiers require such stacking and accordingly argue against dependency grammar. Dependency grammar can be salvaged by reinterpreting its claims about phrase structure, so that modifiers map onto binary--branching X--bar trees rather than ``flat'' ones.\nWe present an algorithm that acquires words (pairings of phonological forms and semantic representations) from larger utterances of unsegmented phoneme sequences and semantic representations. The algorithm maintains from utterance to utterance only a single coherent dictionary, and learns in the presence of homonymy, synonymy, and noise. Test results over a corpus of utterances generated from the Childes database of mother-child interactions are presented.\nWe define {\\em semantic complexity} using a new concept of {\\em meaning automata}.
We measure the semantic complexity of understanding of prepositional phrases, of an \"in depth understanding system\", and of a natural language interface to an on-line calendar. We argue that it is possible to measure some semantic complexities of natural language processing systems before building them, and that systems that exhibit relatively complex behavior can be built from semantically simple components.\nAlthough unification can be used to implement a weak form of $\\beta$-reduction, several linguistic phenomena are better handled by using some form of $\\lambda$-calculus. In this paper we present a higher order feature description calculus based on a typed $\\lambda$-calculus. We show how the techniques used in \\CLG for resolving complex feature constraints can be efficiently extended. \\CCLG is a simple formalism, based on categorial grammars, designed to test the practical feasibility of such a calculus.\nThis paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task), and significantly better than the traditional edge counting approach (r = 0.66).\nA generation algorithm based on an active chart parsing algorithm is introduced which can be used in conjunction with a Shake and Bake machine translation system. A concise Prolog implementation of the algorithm is provided, and some performance comparisons with a shift-reduce based algorithm are given which show the chart generator is much more efficient for generating all possible sentences from an input specification.\nWe present a hybrid text understanding methodology for the resolution of textual ellipsis. 
It integrates conceptual criteria (based on the well-formedness and conceptual strength of role chains in a terminological knowledge base) and functional constraints reflecting the utterances' information structure (based on the distinction between context-bound and unbound discourse elements). The methodological framework for text ellipsis resolution is the centering model that has been adapted to these constraints.\nConversational implicatures are usually described as being licensed by the disobeying or flouting of a Principle of Cooperation. However, the specification of this principle has proved computationally elusive. In this paper we suggest that a more useful concept is rationality. Such a concept can be specified explicitly in planning terms and we argue that speakers perform utterances as part of the optimal plan for their particular communicative goals. Such an assumption can be used by the hearer to infer conversational implicatures implicit in the speaker's utterance.\nIt is odd that chess grandmasters often disagree in their analysis of positions, sometimes even of simple ones, and that a grandmaster can hold his own against a powerful analytic machine such as Deep Blue. The fact that there must exist pure winning strategies for chess is used to construct a control strategy function. It is then shown that chess strategy is equivalent to an autonomous system of differential equations, and conjectured that the system is chaotic. If true the conjecture would explain the forenamed peculiarities and would also imply that there cannot exist a static evaluator for chess.\nWe provide here a proof theoretic account of constraint programming that attempts to capture the essential ingredients of this programming style. We exemplify it by presenting proof rules for linear constraints over interval domains, and illustrate their use by analyzing the constraint propagation process for the {\\tt SEND + MORE = MONEY} puzzle. 
We also show how this approach allows one to build new constraint solvers.\nA new approach to software design based on an agent-oriented architecture is presented. Unlike current research, we consider software to be designed and implemented with this methodology in mind. In this approach agents are considered adaptively communicating concurrent modules which are divided into a white box module responsible for the communications and learning, and a black box which is the independent specialized processes of the agent. A distributed Learning policy is also introduced for adaptability.\nWe present a technique for automatic induction of slot annotations for subcategorization frames, based on induction of hidden classes in the EM framework of statistical estimation. The models are empirically evaluated by a general decision test. Induction of slot labeling for subcategorization frames is accomplished by a further application of EM, and applied experimentally on frame observations derived from parsing large corpora. We outline an interpretation of the learned representations as theoretical-linguistic decompositional lexical entries.\nPredicate Logic with Definitions (PLD or D-logic) is a modification of first-order logic intended mostly for practical formalization of mathematics. The main syntactic constructs of D-logic are terms, formulas and definitions. A definition is a definition of variables, a definition of constants, or a composite definition (D-logic has also abbreviation definitions called abbreviations). Definitions can be used inside terms and formulas. This possibility alleviates introducing new quantifier-like names. Composite definitions allow constructing new definitions from existing ones.\nWe present an empirical investigation of various ways to automatically identify phrases in a tagged corpus that are useful for dialogue act tagging. 
We found that a new method (which measures a phrase's deviation from an optimally-predictive phrase), enhanced with a lexical filtering mechanism, produces significantly better cues than manually-selected cue phrases, the exhaustive set of phrases in a training corpus, and phrases chosen by traditional metrics, like mutual information and information gain.\nA pattern-based approach to the presentation, codification and reuse of property specifications for finite-state verification was proposed by Dwyer and his colleagues. The patterns enable non-experts to read and write formal specifications for realistic systems and facilitate easy conversion of specifications between formalisms, such as LTL, CTL, QRE. In this paper, we extend the pattern system with events - changes of values of variables in the context of LTL.\nThe rules associated with propositional logic programs and the stable model semantics are not expressive enough to let one write concise programs. This problem is alleviated by introducing some new types of propositional rules. Together with a decision procedure that has been used as a base for an efficient implementation, the new rules supplant the standard ones in practical applications of the stable model semantics.\nWe provide here a simple, yet very general framework that allows us to explain several constraint propagation algorithms in a systematic way. In particular, using the notions commutativity and semi-commutativity, we show how the well-known AC-3, PC-2, DAC and DPC algorithms are instances of a single generic algorithm. The work reported here extends and simplifies that of Apt, cs.AI/9811024.\nA system is described that uses a mixed-level representation of (part of) meaning of natural language documents (based on standard Horn Clause Logic) and a variable-depth search strategy that distinguishes between the different levels of abstraction in the knowledge representation to locate specific passages in the documents. 
Mixed-level representations as well as variable-depth search strategies are applicable in fields outside that of NLP.\nA system is described that uses a mixed-level knowledge representation based on standard Horn Clause Logic to represent (part of) the meaning of natural language documents. A variable-depth search strategy is outlined that distinguishes between the different levels of abstraction in the knowledge representation to locate specific passages in the documents. A detailed description of the linguistic aspects of the system is given. Mixed-level representations as well as variable-depth search strategies are applicable in fields outside that of NLP.\nThis paper initiates the use of vector fields to design, optimize, and implement reactive schedules for safe cooperative robot patterns on planar graphs. We consider Automated Guided Vehicles (AGV's) operating upon a predefined network of pathways. In contrast to the case of locally Euclidean configuration spaces, regularization of collisions is no longer a local procedure, and issues concerning the global topology of configuration spaces must be addressed. The focus of the present inquiry is the achievement of safe, efficient, cooperative patterns in the simplest nontrivial example (a pair of robots on a Y-network) by means of a state-event hierarchical controller.\nWhile much research on the hard problem of in-depth story understanding by computer was performed starting in the 1970s, interest shifted in the 1990s to information extraction and word sense disambiguation. Now that a degree of success has been achieved on these easier problems, I propose it is time to return to in-depth story understanding. 
In this paper I examine the shift away from story understanding, discuss some of the major problems in building a story understanding system, present some possible solutions involving a set of interacting understanding agents, and provide pointers to useful tools and resources for building story understanding systems.\nSince scripts were proposed in the 1970's as an inferencing mechanism for AI and natural language processing programs, there have been few attempts to build a database of scripts. This paper describes a database and lexicon of scripts that has been added to the ThoughtTreasure commonsense platform. The database provides the following information about scripts: sequence of events, roles, props, entry conditions, results, goals, emotions, places, duration, frequency, and cost. English and French words and phrases are linked to script concepts.\nThe idea of preserving conditional beliefs emerged recently as a new paradigm apt to guide the revision of epistemic states. Conditionals are substantially different from propositional beliefs and need specific treatment. In this paper, we present a new approach to conditionals, capturing particularly well their dynamic part as revision policies. We thoroughly axiomatize a principle of conditional preservation as an indifference property with respect to conditional structures of worlds. This principle is developed in a semi-quantitative setting, so as to reveal its fundamental meaning for belief revision in quantitative as well as in qualitative frameworks. In fact, it is shown to cover other proposed approaches to conditional preservation.\nThis article describes the first implementation of the GADEL system : a Genetic Algorithm for Default Logic. The goal of GADEL is to compute extensions in Reiter's default logic. It accepts every kind of finite propositional default theories and is based on evolutionary principles of Genetic Algorithms. 
Its first experimental results on certain instances of the problem show that this new approach of the problem can be successful.\nThe paper studies the notion of supposition encoded in non-Archimedean conditional probability (and revealed in the acceptance of the so-called indicative conditionals). The notion of qualitative change of view that thus arises is axiomatized and compared with standard notions like AGM and UPDATE. Applications in the following fields are discussed: (1) theory of games and decisions, (2) causal models, (3) non-monotonic logic.\nWe propose a combination of probabilistic reasoning from conditional constraints with approaches to default reasoning from conditional knowledge bases. In detail, we generalize the notions of Pearl's entailment in system Z, Lehmann's lexicographic entailment, and Geffner's conditional entailment to conditional constraints. We give some examples that show that the new notions of z-, lexicographic, and conditional entailment have similar properties like their classical counterparts. Moreover, we show that the new notions of z-, lexicographic, and conditional entailment are proper generalizations of both their classical counterparts and the classical notion of logical entailment for conditional constraints.\nThis paper describes a system, called PLP, for compiling ordered logic programs into standard logic programs under the answer set semantics. In an ordered logic program, rules are named by unique terms, and preferences among rules are given by a set of dedicated atoms. An ordered logic program is transformed into a second, regular, extended logic program wherein the preferences are respected, in that the answer sets obtained in the transformed theory correspond with the preferred answer sets of the original theory. Since the result of the translation is an extended logic program, existing logic programming systems can be used as underlying reasoning engine. 
In particular, PLP is conceived as a front-end to the logic programming systems dlv and smodels.\nThe SLDNFA-system results from the LP+ project at the K.U.Leuven, which investigates logics and proof procedures for these logics for declarative knowledge representation. Within this project inductive definition logic (ID-logic) is used as representation logic. Different solvers are being developed for this logic and one of these is SLDNFA. A prototype of the system is available and used for investigating how to solve efficiently problems represented in ID-logic.\nIn this paper we introduce a nonmonotonic framework for belief revision in which reasoning about the reliability of different pieces of information based on meta-knowledge about the information is possible, and where revision strategies can be described declaratively. The approach is based on a Poole-style system for default reasoning in which entrenchment information is represented in the logical language. A notion of inference based on the least fixed point of a monotone operator is used to make sure that all theories possess a consistent set of conclusions.\nDLV is an efficient logic programming and non-monotonic reasoning (LPNMR) system with advanced knowledge representation mechanisms and interfaces to classic relational database systems.   Its core language is disjunctive datalog (function-free disjunctive logic programming) under the Answer Set Semantics with integrity constraints, both default and strong (or explicit) negation, and queries. Integer arithmetics and various built-in predicates are also supported.   
In addition DLV has several frontends, namely brave and cautious reasoning, abductive diagnosis, consistency-based diagnosis, a subset of SQL3, planning with action languages, and logic programming with inheritance.\nWe generalize a theorem by Francois Fages that describes the relationship between the completion semantics and the answer set semantics for logic programs with negation as failure. The study of this relationship is important in connection with the emergence of answer set programming. Whenever the two semantics are equivalent, answer sets can be computed by a satisfiability solver, and the use of answer set solvers such as smodels and dlv is unnecessary. A logic programming representation of the blocks world due to Ilkka Niemelae is discussed as an example.\nThis paper analyses the declarative readings of logic programming. Logic programming - and negation as failure - has no unique declarative reading. One common view is that logic programming is a logic for default reasoning, a sub-formalism of default logic or autoepistemic logic. In this view, negation as failure is a modal operator. In an alternative view, a logic program is interpreted as a definition. In this view, negation as failure is classical objective negation. From a commonsense point of view, there is definitely a difference between these views. Surprisingly though, both types of declarative readings lead to grosso modo the same model semantics. This note investigates the causes for this.\nSATEN is an object-oriented web-based extraction and belief revision engine. It runs on any computer via a Java 1.1 enabled browser such as Netscape 4. SATEN performs belief revision based on the AGM approach. The extraction and belief revision reasoning engines operate on a user specified ranking of information. 
One of the features of SATEN is that it can be used to integrate mutually inconsistent commensurate rankings into a consistent ranking.\nAnswer-set programming (ASP) has emerged recently as a viable programming paradigm. We describe here an ASP system, DATALOG with constraints or DC, based on non-monotonic logic. Informally, DC theories consist of propositional clauses (constraints) and of Horn rules. The semantics is a simple and natural extension of the semantics of the propositional logic. However, thanks to the presence of Horn rules in the system, modeling of transitive closure becomes straightforward. We describe the syntax, use and implementation of DC and provide experimental results.\nThe importance of transformations and normal forms in logic programming, and generally in computer science, is well documented. This paper investigates transformations and normal forms in the context of Defeasible Logic, a simple but efficient formalism for nonmonotonic reasoning based on rules and priorities. The transformations described in this paper have two main benefits: on one hand they can be used as a theoretical tool that leads to a deeper understanding of the formalism, and on the other hand they have been used in the development of an efficient implementation of defeasible logic.\nIn (Apt et al, TOPLAS 1998) we introduced the imperative programming language Alma-0 that supports declarative programming. In this paper we illustrate the hybrid programming style of Alma-0 by means of various examples that complement those presented in (Apt et al, TOPLAS 1998). The presented Alma-0 programs illustrate the versatility of the language and show that ``don't know'' nondeterminism can be naturally combined with assignment.\nDescription Logics (DLs) are a family of knowledge representation formalisms mainly characterised by constructors to build complex concepts and roles from atomic ones. 
Expressive role constructors are important in many applications, but can be computationally problematical. We present an algorithm that decides satisfiability of the DL ALC extended with transitive and inverse roles, role hierarchies, and qualifying number restrictions. Early experiments indicate that this algorithm is well-suited for implementation. Additionally, we show that ALC extended with just transitive and inverse roles is still in PSPACE. Finally, we investigate the limits of decidability for this family of DLs.\nWhile there has been a great deal of work on the development of reasoning algorithms for expressive description logics, in most cases only Tbox reasoning is considered. In this paper we present an algorithm for combined Tbox and Abox reasoning in the SHIQ description logic. This algorithm is of particular interest as it can be used to decide the problem of (database) conjunctive query containment w.r.t. a schema. Moreover, the realisation of an efficient implementation should be relatively straightforward as it can be based on an existing highly optimised implementation of the Tbox algorithm in the FaCT system.\nWe present a multi-document summarizer, called MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We also describe two new techniques, based on sentence utility and subsumption, which we have applied to the evaluation of both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.\nExisting procedures for model validation have been deemed inadequate for many engineering systems. The reason of this inadequacy is due to the high degree of complexity of the mechanisms that govern these systems. It is proposed in this paper to shift the attention from modeling the engineering system itself to modeling the uncertainty that underlies its behavior. 
A mathematical framework for modeling the uncertainty in complex engineering systems is developed. This framework uses the results of computational learning theory. It is based on the premise that a system model is a learning machine.\nPhase transition is an important feature of SAT problem. For random k-SAT model, it is proved that as r (ratio of clauses to variables) increases, the structure of solutions will undergo a sudden change like satisfiability phase transition when r reaches a threshold point. This phenomenon shows that the satisfying truth assignments suddenly shift from being relatively different from each other to being very similar to each other.\nWe demonstrate how multiagent systems provide useful control techniques for modular self-reconfigurable (metamorphic) robots. Such robots consist of many modules that can move relative to each other, thereby changing the overall shape of the robot to suit different tasks. Multiagent control is particularly well-suited for tasks involving uncertain and changing environments. We illustrate this approach through simulation experiments of Proteo, a metamorphic robot system currently under development.\nThis paper presents a bimodal logic for reasoning about knowledge during knowledge acquisition. One of the modalities represents (effort during) non-deterministic time and the other represents knowledge. The semantics of this logic are tree-like spaces which are a generalization of semantics used for modeling branching time and historical necessity. A finite system of axiom schemes is shown to be canonically complete for the aforementioned spaces. A characterization of the satisfaction relation implies the small model property and decidability for this system.\nThis paper describes how automated deduction methods for natural language processing can be applied more efficiently by encoding context in a more elaborate way. 
Our work is based on formal approaches to context, and we provide a tableau calculus for contextual reasoning. This is explained by considering an example from the problem area of presupposition projection.\nThis paper explores the kinds of probabilistic relations that are important in syntactic disambiguation. It proposes that two widely used kinds of relations, lexical dependencies and structural relations, have complementary disambiguation capabilities. It presents a new model based on structural relations, the Tree-gram model, and reports experiments showing that structural relations should benefit from enrichment by lexical dependencies.\nWe propose a new definition of actual cause, using structural equations to model counterfactuals. We show that the definition yields a plausible and elegant account of causation that handles well examples which have caused problems for other definitions and resolves major difficulties in the traditional account.\nSome normal logic programs under the answer set (stable model) semantics lack the appealing property of \"cautious monotonicity.\" That is, augmenting a program with one of its consequences may cause it to lose another of its consequences. The syntactic condition of \"order-consistency\" was shown by Fages to guarantee existence of an answer set. This note establishes that order-consistent programs are not only consistent, but cautiously monotonic.   From this it follows that they are also \"cumulative.\" That is, augmenting an order-consistent program with some of its consequences does not alter its consequences. In fact, as we show, its answer sets remain unchanged.\nThis paper presents a comprehensive empirical comparison between two approaches for developing a base noun phrase chunker: human rule writing and active learning using interactive real-time human annotation. 
Several novel variations on active learning are investigated, and underlying cost models for cross-modal machine learning comparison are presented and explored. Results show that it is more efficient and more successful by several measures to train a system using active learning annotation rather than hand-crafted rule writing at a comparable level of human labor investment.\nThis paper deals with a problem from discrete-time robust control which requires the solution of constraints over the reals that contain both universal and existential quantifiers. For solving this problem we formulate it as a program in a (fictitious) constraint logic programming language with explicit quantifier notation. This allows us to clarify the special structure of the problem, and to extend an algorithm for computing approximate solution sets of first-order constraints over the reals to exploit this structure. As a result we can deal with inputs that are clearly out of reach for current symbolic solvers.\nClassical notions of disjunctive and cumulative scheduling are studied from the point of view of soft constraint satisfaction. Soft disjunctive scheduling is introduced as an instance of soft CSP and preferences included in this problem are applied to generate a lower bound based on existing discrete capacity resource. Timetabling problems at Purdue University and Faculty of Informatics at Masaryk University considering individual course requirements of students demonstrate practical problems which are solved via proposed methods. Implementation of general preference constraint solver is discussed and first computational results for timetabling problem are presented.\nThis paper describes the system of storage, extract and processing of information structured similarly to the natural language. For recursive inference the system uses the rules having the same representation, as the data. 
The environment of storage of information is provided with the File Mapping (SHM) mechanism of operating system. In the paper the main principles of construction of dynamic data structure and language for record of the inference rules are stated; the features of available implementation are considered and the description of the application realizing semantic information retrieval on the natural language is given.\nWe evaluate empirically a scheme for combining classifiers, known as stacked generalization, in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial e-mail, or \"spam\", floods mailboxes, causing frustration, wasting bandwidth, and exposing minors to unsuitable content. Using a public corpus, we show that stacking can improve the efficiency of automatically induced anti-spam filters, and that such filters can be used in real-life applications.\nMany classification problems require decisions among a large number of competing classes. These tasks, however, are not handled well by general purpose learning methods and are usually addressed in an ad-hoc fashion. We suggest a general approach -- a sequential learning model that utilizes classifiers to sequentially restrict the number of competing classes while maintaining, with high probability, the presence of the true outcome in the candidates set. Some theoretical and computational properties of the model are discussed and we argue that these are important in NLP-like domains. The advantages of the model are illustrated in an experiment in part-of-speech tagging.\nConstraint propagation is a general algorithmic approach for pruning the search space of a CSP. In a uniform way, K. R. Apt has defined a computation as an iteration of reduction functions over a domain. He has also demonstrated the need for integrating static properties of reduction functions (commutativity and semi-commutativity) to design specialized algorithms such as AC3 and DAC. 
We introduce here a set of operators for modeling compositions of reduction functions. Two of the major goals are to tackle parallel computations, and dynamic behaviours (such as slow convergence).\nWe present an implementation of an answer-set programming paradigm, called aspps (short for answer-set programming with propositional schemata). The system aspps is designed to process PS+ theories. It consists of two basic modules. The first module, psgrnd, grounds a PS+ theory. The second module, referred to as aspps, is a solver. It computes models of ground PS+ theories.\nThis work explores a new robust approach for Semantic Parsing of unrestricted texts. Our approach considers Semantic Parsing as a Consistent Labelling Problem (CLP), allowing the integration of several knowledge types (syntactic and semantic) obtained from different sources (linguistic and statistic). The current implementation obtains 95% accuracy in model identification and 72% in case-role filling.\nAn important aspect of data integration involves answering queries using various resources rather than by accessing database relations. The process of transforming a query from the database relations to the resources is often referred to as query folding or answering queries using views, where the views are the resources. We present a uniform approach that includes as special cases much of the previous work on this subject. Our approach is logic-based using resolution. We deal with integrity constraints, negation, and recursion also within this framework.\nWe introduce a learning method called ``gradient-based reinforcement planning'' (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. 
We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with numerical experiments.\nIn the context of mining for frequent patterns using the standard levelwise algorithm, the following question arises: given the current level and the current set of frequent patterns, what is the maximal number of candidate patterns that can be generated on the next level? We answer this question by providing a tight upper bound, derived from a combinatorial result from the sixties by Kruskal and Katona. Our result is useful to reduce the number of database scans.\nMany mathematical models utilize limit processes. Continuous functions and the calculus, differential equations and topology, all are based on limits and continuity. However, when we perform measurements and computations, we can achieve only approximate results. In some cases, this discrepancy between theoretical schemes and practical actions changes drastically outcomes of a research and decision-making resulting in uncertainty of knowledge. In the paper, a mathematical approach to such kind of uncertainty, which emerges in computation and measurement, is suggested on the base of the concept of a fuzzy limit. A mathematical technique is developed for differential models with uncertainty. To take into account the intrinsic uncertainty of a model, it is suggested to use fuzzy derivatives instead of conventional derivatives of functions in this model.\nWe investigate ways to support interactive mining sessions, in the setting of association rule mining. In such sessions, users specify conditions (queries) on the associations to be generated. Our approach is a combination of the integration of querying conditions inside the mining phase, and the incremental querying of already generated associations. 
We present several concrete algorithms and compare their performance.\nMuch work in computer science has adopted competitive analysis as a tool for decision making under uncertainty. In this work we extend competitive analysis to the context of multi-agent systems. Unlike classical competitive analysis where the behavior of an agent's environment is taken to be arbitrary, we consider the case where an agent's environment consists of other agents. These agents will usually obey some (minimal) rationality constraints. This leads to the definition of rational competitive analysis. We introduce the concept of rational competitive analysis, and initiate the study of competitive analysis for multi-agent systems. We also discuss the application of rational competitive analysis to the context of bidding games, as well as to the classical one-way trading problem.\nThis paper describes the LDL++ system and the research advances that have enabled its design and development. We begin by discussing the new nonmonotonic and nondeterministic constructs that extend the functionality of the LDL++ language, while preserving its model-theoretic and fixpoint semantics. Then, we describe the execution model and the open architecture designed to support these new constructs and to facilitate the integration with existing DBMSs and applications. Finally, we describe the lessons learned by using LDL++ on various tested applications, such as middleware and datamining.\nA vast and interesting family of natural semantics for belief revision is defined. Suppose one is given a distance d between any two models. One may then define the revision of a theory K by a formula a as the theory defined by the set of all those models of a that are closest, by d, to the set of models of K. This family is characterized by a set of rationality postulates that extends the AGM postulates. 
The new postulates describe properties of iterated revisions.\nWe give a semantics to iterated update by a preference relation on possible developments. An iterated update is a sequence of formulas, giving (incomplete) information about successive states of the world. A development is a sequence of models, describing a possible trajectory through time. We assume a principle of inertia and prefer those developments, which are compatible with the information, and avoid unnecessary changes. The logical properties of the updates defined in this way are considered, and a representation result is proved.\nFinding optimal solutions for multi-unit combinatorial auctions is a hard problem and finding approximations to the optimal solution is also hard. We investigate the use of Branch-and-Bound techniques: they require both a way to bound from above the value of the best allocation and a good criterion to decide which bids are to be tried first. Different methods for efficiently bounding from above the value of the best allocation are considered. Theoretical original results characterize the best approximation ratio and the ordering criterion that provides it. We suggest to use this criterion.\nWe propose that a regulation mechanism based on Hebbian covariance plasticity may cause the brain to operate near criticality. We analyze the effect of such a regulation on the dynamics of a network with excitatory and inhibitory neurons and uniform connectivity within and across the two populations. We show that, under broad conditions, the system converges to a critical state lying at the common boundary of three regions in parameter space; these correspond to three modes of behavior: high activity, low activity, oscillation.\nStereotypical reasoning assumes that the situation at hand is one of a kind and that it enjoys the properties generally associated with that kind of situation. It is one of the most basic forms of nonmonotonic reasoning. 
A formal model for stereotypical reasoning is proposed and the logical properties of this form of reasoning are studied. Stereotypical reasoning is shown to be cumulative under weak assumptions.\nPrioritized default reasoning has illustrated its rich expressiveness and flexibility in knowledge representation and reasoning. However, many important aspects of prioritized default reasoning have yet to be thoroughly explored. In this paper, we investigate two properties of prioritized logic programs in the context of answer set semantics. Specifically, we reveal a close relationship between mutual defeasibility and uniqueness of the answer set for a prioritized logic program. We then explore how the splitting technique for extended logic programs can be extended to prioritized logic programs. We prove splitting theorems that can be used to simplify the evaluation of a prioritized logic program under certain conditions.\nBecause the data being mined in the temporal database will evolve with time, many researchers have focused on the incremental mining of frequent sequences in temporal database. In this paper, we propose an algorithm called IUS, using the frequent and negative border sequences in the original database for incremental sequence mining. To deal with the case where some data need to be updated from the original database, we present an algorithm called DUS to maintain sequential patterns in the updated database. We also define the negative border sequence threshold: Min_nbd_supp to control the number of sequences in the negative border.\nWe describe a method for text entry based on inverse arithmetic coding that relies on gaze direction and which is faster and more accurate than using an on-screen keyboard.   
These benefits are derived from two innovations: the writing task is matched to the capabilities of the eye, and a language model is used to make predictable words and phrases easier to write.\nThe (extended) AGM postulates for belief revision seem to deal with the revision of a given theory K by an arbitrary formula, but not to constrain the revisions of two different theories by the same formula. A new postulate is proposed and compared with other similar postulates that have been proposed in the literature. The AGM revisions that satisfy this new postulate stand in one-to-one correspondence with the rational, consistency-preserving relations. This correspondence is described explicitly. Two viewpoints on iterative revisions are distinguished and discussed.\nWe study algorithms for computing stable models of propositional logic programs and derive estimates on their worst-case performance that are asymptotically better than the trivial bound of O(m 2^n), where m is the size of an input program and n is the number of its atoms. For instance, for programs, whose clauses consist of at most two literals (counting the head) we design an algorithm to compute stable models that works in time O(m\\times 1.44225^n). We present similar results for several broader classes of programs, as well.\nIn recent years, the problem of association rule mining in transactional data has been well studied. We propose to extend the discovery of classical association rules to the discovery of association rules of conjunctive queries in arbitrary relational data, inspired by the WARMR algorithm, developed by Dehaspe and Toivonen, that discovers association rules over a limited set of conjunctive queries. 
Conjunctive query evaluation in relational databases is well understood, but still poses some great challenges when approached from a discovery viewpoint in which patterns are generated and evaluated with respect to some well defined search space and pruning operators.\nMuch work on argument systems has focussed on preferred extensions which define the maximal collectively defensible subsets. Identification and enumeration of these subsets is (under the usual assumptions) computationally demanding. We consider approaches to deciding if a subset S is a preferred extension which query a representation encoding all such extensions, so that the computational effort is invested once only (for the initial enumeration) rather than for each separate query.\nIn this work we present additional results related to the property of strong equivalence of logic programs. This property asserts that two programs share the same set of stable models, even under the addition of new rules. As shown in a recent work by Lifschitz, Pearce and Valverde, strong equivalence can be simply reduced to equivalence in the logic of Here-and-There (HT). In this paper we provide two alternatives respectively based on classical logic and 3-valued logic. The former is applicable to general rules, but not for nested expressions, whereas the latter is applicable for nested expressions but, when moving to an unrestricted syntax, it generally yields different results from HT.\nThe introduction of explicit notions of rejection, or disbelief, into logics for knowledge representation can be justified in a number of ways. Motivations range from the need for versions of negation weaker than classical negation, to the explicit recording of classic belief contraction operations in the area of belief change, and the additional levels of expressivity obtained from an extended version of belief change which includes disbelief contraction. 
In this paper we present four logics of disbelief which address some or all of these intuitions. Soundness and completeness results are supplied and the logics are compared with respect to applicability and utility.\nThis paper defines an argumentation semantics for extended logic programming and shows its equivalence to the well-founded semantics with explicit negation. We set up a general framework in which we extensively compare this semantics to other argumentation semantics, including those of Dung, and Prakken and Sartor. We present a general dialectical proof theory for these argumentation semantics.\nLogic programs with ordered disjunction (LPODs) combine ideas underlying Qualitative Choice Logic (Brewka et al. KR 2002) and answer set programming. Logic programming under answer set semantics is extended with a new connective called ordered disjunction. The new connective allows us to represent alternative, ranked options for problem solutions in the heads of rules: A \\times B intuitively means: if possible A, but if A is not possible then at least B. The semantics of logic programs with ordered disjunction is based on a preference relation on answer sets. LPODs are useful for applications in design and configuration and can serve as a basis for qualitative decision making.\nThe essay consists of three parts. In the first part, it is explained how theory of algorithms and computations evaluates the contemporary situation with computers and global networks. In the second part, it is demonstrated what new perspectives this theory opens through its new direction that is called theory of super-recursive algorithms. These algorithms have much higher computing power than conventional algorithmic schemes. In the third part, we explicate how realization of what this theory suggests might influence the lives of people in the future. It is demonstrated that now the theory is far ahead of computing practice and practice has to catch up with the theory. 
We conclude with a comparison of different approaches to the development of information technology.\nIn this paper we present a transformation of finite propositional default theories into so-called propositional argumentation systems. This transformation allows one to characterize all notions of Reiter's default logic in the framework of argumentation systems. As a consequence, computing extensions, or determining whether a given formula belongs to one extension or all extensions can be answered without leaving the field of classical propositional logic. The transformation proposed is linear in the number of defaults.\nThis paper deals with the revision of partially ordered beliefs. It proposes a semantic representation of epistemic states by partial pre-orders on interpretations and a syntactic representation by partially ordered belief bases. Two revision operations, the revision stemming from the history of observations and the possibilistic revision, defined when the epistemic state is represented by a total pre-order, are generalized, at a semantic level, to the case of a partial pre-order on interpretations, and at a syntactic level, to the case of a partially ordered belief base. The equivalence between the two representations is shown for the two revision operations.\nAbduction is one of the most important forms of reasoning; it has been successfully applied to several practical problems such as diagnosis. In this paper we investigate whether the computational complexity of abduction can be reduced by an appropriate use of preprocessing. This is motivated by the fact that part of the data of the problem (namely, the set of all possible assumptions and the theory relating assumptions and manifestations) are often known before the rest of the problem. 
In this paper, we show some complexity results about abduction when compilation is allowed.\nAs a part of our effort for studying the evolution and development of cognition, we present results derived from synthetic experimentations in a virtual laboratory where animats develop koncepts adaptively and ground their meaning through action. We introduce the term \"koncept\" to avoid confusions and ambiguity derived from the wide use of the word \"concept\". We present the models which our animats use for abstracting koncepts from perceptions, plastically adapt koncepts, and associate koncepts with actions. In a more philosophical vein, we suggest that knowledge is a property of a cognitive system, not an element, and therefore observer-dependent.\nIn this paper we name some of the advantages of virtual laboratories; and propose that a Behaviours Virtual Laboratory should be useful for both biologists and AI researchers, offering a new perspective for understanding adaptive behaviour. We present our development of a Behaviours Virtual Laboratory, which at this stage is focused on action selection, and show some experiments to illustrate the properties of our proposal, which can be accessed via the Internet.\nA knowledge base is redundant if it contains parts that can be inferred from the rest of it. We study the problem of checking whether a CNF formula (a set of clauses) is redundant, that is, it contains clauses that can be derived from the other ones. Any CNF formula can be made irredundant by deleting some of its clauses: what results is an irredundant equivalent subset (I.E.S.). We study the complexity of some related problems: verification, checking existence of an I.E.S. with a given size, checking necessary and possible presence of clauses in I.E.S.'s, and uniqueness. 
We also consider the problem of redundancy with different definitions of equivalence.\nRecent advances in programming languages study and design have established a standard way of grounding computational systems representation in category theory. These formal results led to a better understanding of issues of control and side-effects in functional and imperative languages. Another benefit is a better way of modelling computational effects in logical frameworks. With this analogy in mind, we embark on an investigation of inference systems based on considering inference behaviour as a form of computation. We delineate a categorical formalisation of control constructs in inference systems. This representation emphasises the parallel between the modular articulation of the categorical building blocks (triples) used to account for the inference architecture and the modular composition of cognitive processes.\nThis paper presents a model for dynamic adjustment of the motivation degree, using a reinforcement learning approach, in an action selection mechanism previously developed by the authors. The learning takes place in the modification of a parameter of the model of combination of internal and external stimuli. Experiments that show the claimed properties are presented, using a VR simulation developed for such purposes. The importance of adaptation by learning in action selection is also discussed.\nThis paper presents a theory of error in cross-validation testing of algorithms for predicting real-valued attributes. The theory justifies the claim that predicting real-valued attributes requires balancing the conflicting demands of simplicity and accuracy. Furthermore, the theory indicates precisely how these conflicting demands must be balanced, in order to minimize cross-validation error. 
A general theory is presented, then it is developed in detail for linear regression and instance-based learning.\nBelief integration methods are often aimed at deriving a single and consistent knowledge base that retains as much as possible of the knowledge bases to integrate. The rationale behind this approach is the minimal change principle: the result of the integration process should differ as little as possible from the knowledge bases to integrate. We show that this principle can be reformulated in terms of a more general model of belief revision, based on the assumption that inconsistency is due to the mistakes the knowledge bases contain. Current belief revision strategies are based on a specific kind of mistakes, which however does not include all possible ones. Some alternative possibilities are discussed.\nThere is a growing interest in using Kalman-filter models in brain modelling. In turn, it is of considerable importance to make Kalman-filters amenable to reinforcement learning. In the usual formulation of optimal control it is computed off-line by solving a backward recursion. In this technical note we show that slight modification of the linear-quadratic-Gaussian Kalman-filter model allows the on-line estimation of optimal control and makes the bridge to reinforcement learning. Moreover, the learning rule for value estimation assumes a Hebbian form weighted by the error of the value estimation.\nWe provide a semantic framework for preference handling in answer set programming. To this end, we introduce preference preserving consequence operators. The resulting fixpoint characterizations provide us with a uniform semantic framework for characterizing preference handling in existing approaches. Although our approach is extensible to other semantics by means of an alternating fixpoint theory, we focus here on the elaboration of preferences under answer set semantics. 
Alternatively, we show how these approaches can be characterized by the concept of order preservation. These uniform semantic characterizations provide us with new insights about interrelationships and moreover about ways of implementation.\nCooperative constraint solving is an area of constraint programming that studies the interaction between constraint solvers with the aim of discovering the interaction patterns that amplify the positive qualities of individual solvers. Automatisation and formalisation of such studies is an important issue of cooperative constraint solving.   In this paper we present a constraint-based analysis of composite solvers that integrates reasoning about the individual solvers and the processed data. The idea is to approximate this reasoning by resolution of set constraints on the finite sets representing the predicates that express all the necessary properties. We illustrate application of our analysis to two important cooperation patterns: deterministic choice and loop.\nThis note is about the relationship between two theories of negation as failure -- one based on program completion, the other based on stable models, or answer sets. Francois Fages showed that if a logic program satisfies a certain syntactic condition, which is now called ``tightness,'' then its stable models can be characterized as the models of its completion. We extend the definition of tightness and Fages' theorem to programs with nested expressions in the bodies of rules, and study tight logic programs containing the definition of the transitive closure of a predicate.\nThere is a growing interest in using Kalman-filter models for brain modelling. In turn, it is of considerable importance to represent Kalman-filter in connectionist forms with local Hebbian learning rules. To our best knowledge, Kalman-filter has not been given such local representation. It seems that the main obstacle is the dynamic adaptation of the Kalman-gain. 
Here, a connectionist representation is presented, which is derived by means of the recursive prediction error method. We show that this method gives rise to attractive local learning rules and can adapt the Kalman-gain.\nWe discuss philosophical issues concerning the notion of cognition basing ourselves in experimental results in cognitive sciences, especially in computer simulations of cognitive systems. There have been debates on the \"proper\" approach for studying cognition, but we have realized that all approaches can be in theory equivalent. Different approaches model different properties of cognitive systems from different perspectives, so we can only learn from all of them. We also integrate ideas from several perspectives for enhancing the notion of cognition, such that it can contain other definitions of cognition as special cases. This allows us to propose a simple classification of different types of cognition.\nWe note the importance of time-scales, meaning, and availability of information for the emergence of novel information meta-structures at a global scale. We discuss previous work in this area and develop future perspectives. We focus on the transmission of scientific articles and the integration of traditional conferences with their virtual extensions on the Internet, their time-scales, and availability. We mention the Semantic Web as an effort for integrating meaningful information.\nIn this paper we develop a method for clustering belief functions based on attracting and conflicting metalevel evidence. Such clustering is done when the belief functions concern multiple events, and all belief functions are mixed up. The clustering process is used as the means for separating the belief functions into subsets that should be handled independently. 
While the conflicting metalevel evidence is generated internally from pairwise conflicts of all belief functions, the attracting metalevel evidence is assumed given by some external source.\nThe vocabulary of a continuous speech recognition (CSR) system is a significant factor in determining its performance. In this paper, we present three principled approaches to select the target vocabulary for a particular domain by trading off between the target out-of-vocabulary (OOV) rate and vocabulary size. We evaluate these approaches against an ad-hoc baseline strategy. Results are presented in the form of OOV rate graphs plotted against increasing vocabulary size for each technique.\nThe relationship between Popper spaces (conditional probability spaces that satisfy some regularity conditions), lexicographic probability systems (LPS's), and nonstandard probability spaces (NPS's) is considered. If countable additivity is assumed, Popper spaces and a subclass of LPS's are equivalent; without the assumption of countable additivity, the equivalence no longer holds. If the state space is finite, LPS's are equivalent to NPS's. However, if the state space is infinite, NPS's are shown to be more general than LPS's.\nReinforcement learning is commonly used with function approximation. However, very few positive results are known about the convergence of function approximation based RL control algorithms. In this paper we show that TD(0) and Sarsa(0) with linear function approximation is convergent for a simple class of problems, where the system is linear and the costs are quadratic (the LQ control problem). Furthermore, we show that for systems with Gaussian noise and non-completely observable states (the LQG problem), the mentioned RL algorithms are still convergent, if they are combined with Kalman filtering.\nThis paper describes an approach to the representation and processing of ontologies in the Semantic Web, based on the ICMAUS theory of computation and AI. 
This approach has strengths that complement those of languages based on the Resource Description Framework (RDF) such as RDF Schema and DAML+OIL. The main benefits of the ICMAUS approach are simplicity and comprehensibility in the representation of ontologies, an ability to cope with errors and uncertainties in knowledge, and a versatile reasoning system with capabilities in the kinds of probabilistic reasoning that seem to be required in the Semantic Web.\nWe present a propositional logic which can be used to reason about the uncertainty of events, where the uncertainty is modeled by a set of probability measures assigning an interval of probability to each event. We give a sound and complete axiomatization for the logic, and show that the satisfiability problem is NP-complete, no harder than satisfiability for propositional logic.\nWe introduce the topic of learning in multiagent systems. We first provide a quick introduction to the field of game theory, focusing on the equilibrium concepts of iterated dominance, and Nash equilibrium. We show some of the most relevant findings in the theory of learning in games, including theorems on fictitious play, replicator dynamics, and evolutionarily stable strategies. The CLRI theory and n-level learning agents are introduced as attempts to apply some of these findings to the problem of engineering multiagent systems with learning agents. Finally, we summarize some of the remaining challenges in the field of learning in multiagent systems.\nThis paper introduces an automatic debugging framework that relies on model-based reasoning techniques to locate faults in programs. In particular, model-based diagnosis, together with an abstract interpretation based conflict detection mechanism is used to derive diagnoses, which correspond to possible faults in programs. 
Design information and partial specifications are applied to guide a model revision process, which allows for automatic detection and correction of structural faults.\nA situation calculus is presented that provides a solution to the frame problem for hierarchical situations, that is, situations that have a modular structure in which parts of the situation behave in a relatively independent manner. This situation calculus is given in a relational, functional, and modal logic form. Each form permits both a single level hierarchy or a multiple level hierarchy, giving six versions of the formalism in all, and a number of sub-versions of these. For multiple level hierarchies, it is possible to give equations between parts of the situation to impose additional structure on the problem. This approach is compared to others in the literature.\nArticle discusses the application of Kullback-Leibler divergence to the recognition of speech signals and suggests three algorithms implementing this divergence criterion: correlation algorithm, spectral algorithm and filter algorithm. Discussion covers an approach to the problem of speech variability and is illustrated with the results of experimental modeling of speech signals. The article gives a number of recommendations on the choice of appropriate model parameters and provides a comparison to some other methods of speech recognition.\nRichard Cox [1] set the axiomatic foundations of probable inference and the algebra of propositions. He showed that consistency within these axioms requires certain rules for updating belief. In this paper we use the analogy between probability and utility introduced in [2] to propose an axiomatic foundation for utility inference and the algebra of preferences. We show that consistency within these axioms requires certain rules for updating preference. 
We discuss a class of utility functions that stems from the axioms of utility inference and show that this class is the basic building block for any general multiattribute utility function. We use this class of utility functions together with the algebra of preferences to construct utility functions represented by logical operations on the attributes.\nRecent literature in the last Maximum Entropy workshop introduced an analogy between cumulative probability distributions and normalized utility functions. Based on this analogy, a utility density function can be defined as the derivative of a normalized utility function. A utility density function is non-negative and integrates to unity. These two properties form the basis of a correspondence between utility and probability. A natural application of this analogy is a maximum entropy principle to assign maximum entropy utility values. Maximum entropy utility interprets many of the common utility functions based on the preference information needed for their assignment, and helps assign utility values based on partial preference information. This paper reviews maximum entropy utility and introduces further results that stem from the duality between probability and utility.\nWe propose a generalization of expected utility that we call generalized EU (GEU), where a decision maker's beliefs are represented by plausibility measures, and the decision maker's tastes are represented by general (i.e., not necessarily real-valued) utility functions. We show that every agent, ``rational'' or not, can be modeled as a GEU maximizer. We then show that we can customize GEU by selectively imposing just the constraints we want. In particular, we show how each of Savage's postulates corresponds to constraints on GEU.\nThis paper adds counterfactuals to the framework of knowledge-based programs of Fagin, Halpern, Moses, and Vardi. 
The use of counterfactuals is illustrated by designing a protocol in which an agent stops sending messages once it knows that it is safe to do so. Such behavior is difficult to capture in the original framework because it involves reasoning about counterfactual executions, including ones that are not consistent with the protocol. Attempts to formalize these notions without counterfactuals are shown to lead to rather counterintuitive behavior.\nCausality is typically treated as an all-or-nothing concept; either A is a cause of B or it is not. We extend the definition of causality introduced by Halpern and Pearl [2001] to take into account the degree of responsibility of A for B. For example, if someone wins an election 11--0, then each person who votes for him is less responsible for the victory than if he had won 6--5. We then define a notion of degree of blame, which takes into account an agent's epistemic state. Roughly speaking, the degree of blame of A for B is the expected degree of responsibility of A for B, taken over the epistemic state of an agent.\nIn this paper we suggest an architecture for a software agent which operates a physical device and is capable of making observations and of testing and repairing the device's components. We present simplified definitions of the notions of symptom, candidate diagnosis, and diagnosis which are based on the theory of action language ${\\cal AL}$. The definitions allow one to give a simple account of the agent's behavior in which many of the agent's tasks are reduced to computing stable models of logic programs.\nThis paper explains why scripted dialogue shares some crucial properties with discourse. In particular, when scripted dialogues are generated by a Natural Language Generation system, the generator can apply revision strategies that cannot normally be used when the dialogue results from an interaction between autonomous agents (i.e., when the dialogue is not scripted). 
The paper explains that the relevant revision operators are best applied at the level of a dialogue plan and discusses how the generator may decide when to apply a given revision operator.\nThis paper studies the potential of identifying lexical paraphrases within a single corpus, focusing on the extraction of verb paraphrases. Most previous approaches detect individual paraphrase instances within a pair (or set) of comparable corpora, each of them containing roughly the same information, and rely on the substantial level of correspondence of such corpora. We present a novel method that successfully detects isolated paraphrase instances within a single corpus without relying on any a-priori structure and information. A comparison suggests that an instance-based approach may be combined with a vector based approach in order to assess better the paraphrase likelihood for many verb pairs.\nThis book develops the conjecture that all kinds of information processing in computers and in brains may usefully be understood as \"information compression by multiple alignment, unification and search\". This \"SP theory\", which has been under development since 1987, provides a unified view of such things as the workings of a universal Turing machine, the nature of 'knowledge', the interpretation and production of natural language, pattern recognition and best-match information retrieval, several kinds of probabilistic reasoning, planning and problem solving, unsupervised learning, and a range of concepts in mathematics and logic. The theory also provides a basis for the design of an 'SP' computer with several potential advantages compared with traditional digital computers.\nPhysics analysis in astroparticle experiments requires the capability of recognizing new phenomena; in order to establish what is new, it is important to develop tools for automatic classification, able to compare the final result with data from different detectors. 
A typical example is the problem of Gamma Ray Burst detection, classification, and possible association to known sources: for this task physicists will need in the next years tools to associate data from optical databases, from satellite experiments (EGRET, GLAST), and from Cherenkov telescopes (MAGIC, HESS, CANGAROO, VERITAS).\nIn this paper we study the complexity of solving a problem when a solution of a similar instance is known. This problem is relevant whenever instances may change from time to time, and known solutions may not remain valid after the change. We consider two scenarios: in the first one, what is known is only a solution of the problem before the change; in the second case, we assume that some additional information, found during the search for this solution, is also known. In the first setting, the techniques from the theory of NP-completeness suffice to show complexity results. In the second case, negative results can only be proved using the techniques of compilability, and are often related to the size of considered changes.\nWe address the problem of the development of representations and their relationship to the environment. We study a software agent which develops in a network a representation of its simple environment which captures and integrates the relationships between agent and environment through a closure mechanism. The inclusion of a variable behavior modifier allows better representation development. This can be confirmed with an internal description of the closure mechanism, and with an external description of the properties of the representation network.\nWe propose here a number of approaches to implement constraint propagation for arithmetic constraints on integer intervals. To this end we introduce integer interval arithmetic. Each approach is explained using appropriate proof rules that reduce the variable domains. 
We compare these approaches using a set of benchmarks.\nAn XML framework for concept description is given, based upon the fact that the tree structure of XML implies the logical structure of concepts as defined by attributional calculus. Especially, the attribute-value representation is implementable in the XML framework. Since the attribute-value representation is an important way to represent knowledge in AI, the framework offers a further and simpler way than the powerful RDF technology.\nWe introduce Ak, an extension of the action description language A (Gelfond and Lifschitz, 1993) to handle actions which affect knowledge. We use sensing actions to increase an agent's knowledge of the world and non-deterministic actions to remove knowledge. We include complex plans involving conditionals and loops in our query language for hypothetical reasoning. We also present a translation of Ak domain descriptions into epistemic logic programs.\nSpam, also known as Unsolicited Commercial Email (UCE), is the bane of email communication. Many data mining researchers have addressed the problem of detecting spam, generally by treating it as a static text classification problem. True in vivo spam filtering has characteristics that make it a rich and challenging domain for data mining. Indeed, real-world datasets with these characteristics are typically difficult to acquire and to share. This paper demonstrates some of these characteristics and argues that researchers should pursue in vivo spam filtering as an accessible domain for investigating them.\nRecently, Jadbabaie, Lin, and Morse (IEEE TAC, 48(6)2003:988-1001) offered a mathematical analysis of the discrete time model of groups of mobile autonomous agents raised by Vicsek et al. in 1995. In their paper, Jadbabaie et al. showed that all agents shall move in the same heading, provided that these agents are periodically linked together. 
This paper sharpens this result by showing that coordination will be reached under a very weak condition that requires that all agents are finally linked together. This condition is also strictly weaker than the one Jadbabaie et al. desired.\nWe analyze the computational complexity of problems related to case-based planning: planning when a plan for a similar instance is known, and planning from a library of plans. We prove that planning from a single case has the same complexity as generative planning (i.e., planning \"from scratch\"); using an extended definition of cases, complexity is reduced if the domain stored in the case is similar to the one to search plans for. Planning from a library of cases is shown to have the same complexity. In both cases, the complexity of planning remains, in the worst case, PSPACE-complete.\nIn this paper we present a cut-free sequent calculus, called SeqS, for some standard conditional logics, namely CK, CK+ID, CK+MP and CK+MP+ID. The calculus uses labels and transition formulas and can be used to prove decidability and space complexity bounds for the respective logics. We also present CondLean, a theorem prover for these logics implementing SeqS calculi written in SICStus Prolog.\nIn this paper one proposes a simple algorithm of combining the fusion rules, those rules which first use the conjunctive rule and then the transfer of conflicting mass to the non-empty sets, in such a way that they gain the property of associativity and fulfill the Markovian requirement for dynamic fusion. Also, a new rule, SDL-improved, is presented.\nWe describe soft versions of the global cardinality constraint and the regular constraint, with efficient filtering algorithms maintaining domain consistency. For both constraints, the softening is achieved by augmenting the underlying graph.
The softened constraints can be used to extend the meta-constraint framework for over-constrained problems proposed by Petit, Regin and Bessiere.\nFLUX is a programming method for the design of agents that reason logically about their actions and sensor information in the presence of incomplete knowledge. The core of FLUX is a system of Constraint Handling Rules, which enables agents to maintain an internal model of their environment by which they control their own behavior. The general action representation formalism of the fluent calculus provides the formal semantics for the constraint solver. FLUX exhibits excellent computational behavior due to both a carefully restricted expressiveness and the inference paradigm of progression.\nThis paper presents in detail the generalized pignistic transformation (GPT) succinctly developed in the Dezert-Smarandache Theory (DSmT) framework as a tool for decision process. The GPT allows to provide a subjective probability measure from any generalized basic belief assignment given by any corpus of evidence. We mainly focus our presentation on the 3D case and provide the complete result obtained by the GPT and its validation drawn from the probability theory.\nRecurrent neural networks are often used for learning time-series data. Based on a few assumptions we model this learning task as a minimization problem of a nonlinear least-squares cost function. The special structure of the cost function allows us to build a connection to reinforcement learning. We exploit this connection and derive a convergent, policy iteration-based algorithm. Furthermore, we argue that RNN training can be fit naturally into the reinforcement learning framework.\nEmergent behaviors are in the focus of recent research interest. It is then of considerable importance to investigate what optimizations suit the learning and prediction of chaotic systems, the putative candidates for emergence. 
We have compared L1 and L2 regularizations on predicting chaotic time series using linear recurrent neural networks. The internal representation and the weights of the networks were optimized in a unifying framework. Computational tests on different problems indicate considerable advantages for the L1 regularization: It had considerably better learning time and better interpolating capabilities. We shall argue that optimization viewed as a maximum likelihood estimation justifies our results, because L1 regularization fits heavy-tailed distributions -- an apparently general feature of emergent systems -- better.\nThere are many examples in the literature that suggest that indistinguishability is intransitive, despite the fact that the indistinguishability relation is typically taken to be an equivalence relation (and thus transitive). It is shown that if the uncertainty perception and the question of when an agent reports that two things are indistinguishable are both carefully modeled, the problems disappear, and indistinguishability can indeed be taken to be an equivalence relation. Moreover, this model also suggests a logic of vagueness that seems to solve many of the problems related to vagueness discussed in the philosophical literature. In particular, it is shown here how the logic can handle the sorites paradox.\nA careful analysis of conditioning in the Sleeping Beauty problem is done, using the formal model for reasoning about knowledge and probability developed by Halpern and Tuttle. While the Sleeping Beauty problem has been viewed as revealing problems with conditioning in the presence of imperfect recall, the analysis done here reveals that the problems are not so much due to imperfect recall as to asynchrony. The implications of this analysis for van Fraassen's Reflection Principle and Savage's Sure-Thing Principle are considered.\nWe tackle the problem of robust dialogue processing from the perspective of language engineering. 
We propose an agent-oriented architecture that allows us a flexible way of composing robust processors. Our approach is based on Shoham's Agent Oriented Programming (AOP) paradigm. We will show how the AOP agent model can be enriched with special features and components that allow us to deal with classical problems of dialogue understanding.\nThe paper is an attempt to generalize a methodology, which is similar to the bounded-input bounded-output method currently widely used for the system stability studies. The presented earlier methodology allows decomposition of input space into bounded subspaces and defining for each subspace its bounding surface. It also defines a corresponding predefined control, which maps any point of a bounded input into a desired bounded output subspace. This methodology was improved by providing a mechanism for the fast defining a bounded surface. This paper presents enhanced bounded-input bounded-predefined-control bounded-output approach, which provides adaptability feature to the control and allows transferring of a controlled system along a suboptimal trajectory.\nFormerly I presented a metric navigation method in the Webots mobile robot simulator. The navigating Khepera-like robot builds an occupancy grid of the environment and explores the square-shaped room around with a value iteration algorithm. Now I created a topological navigation procedure based on the occupancy grid process. The extension by a skeletonization algorithm results a graph of important places and the connecting routes among them. I also show the significant time profit gained during the process.\nCategorical data clustering (CDC) and link clustering (LC) have been considered as separate research and application areas. 
The main focus of this paper is to investigate the commonalities between these two problems and the uses of these commonalities for the creation of new clustering algorithms for categorical data based on cross-fertilization between the two disjoint research fields. More precisely, we formally transform the CDC problem into an LC problem, and apply LC approach for clustering categorical data. Experimental results on real datasets show that LC based clustering method is competitive with existing CDC algorithms with respect to clustering accuracy.\nA widely adopted approach to solving constraint satisfaction problems combines systematic tree search with constraint propagation for pruning the search space. Constraint propagation is performed by propagators implementing a certain notion of consistency. Bounds consistency is the method of choice for building propagators for arithmetic constraints and several global constraints in the finite integer domain. However, there has been some confusion in the definition of bounds consistency. In this paper we clarify the differences and similarities among the three commonly used notions of bounds consistency.\nThis paper presents an approach to enhance search engines with information about word senses available in WordNet. The approach exploits information about the conceptual relations within the lexical-semantic net. In the wrapper for search engines presented, WordNet information is used to specify user's request or to classify the results of a publicly available web search engine, like google, yahoo, etc.\nLexical semantic resources, like WordNet, are often used in real applications of natural language document processing. For example, we integrated GermaNet in our document suite XDOC of processing of German forensic autopsy protocols. In addition to the hypernymy and synonymy relation, we want to adapt GermaNet's verb frames for our analysis. 
In this paper we outline an approach for the domain related enrichment of GermaNet verb frames by corpus based syntactic and co-occurred data analyses of real documents.\nReal applications of natural language document processing are very often confronted with domain specific lexical gaps during the analysis of documents of a new domain. This paper describes an approach for the derivation of domain specific concepts for the extension of an existing ontology. As resources we need an initial ontology and a partially processed corpus of a domain. We exploit the specific characteristic of the sublanguage in the corpus. Our approach is based on syntactical structures (noun phrases) and compound analyses to extract information required for the extension of GermaNet's lexical resources.\nWe suggest to employ techniques from Natural Language Processing (NLP) and Knowledge Representation (KR) to transform existing documents into documents amenable for the Semantic Web. Semantic Web documents have at least part of their semantics and pragmatics marked up explicitly in both a machine processable as well as human readable manner. XML and its related standards (XSLT, RDF, Topic Maps etc.) are the unifying platform for the tools and methodologies developed for different application scenarios.\nWe address the practical problems of estimating the information relations that characterize large networks. Building on methods developed for analysis of the neural code, we show that reliable estimates of mutual information can be obtained with manageable computational effort. The same methods allow estimation of higher order, multi--information terms. These ideas are illustrated by analyses of gene expression, financial markets, and consumer preferences. 
In each case, information theoretic measures correlate with independent, intuitive measures of the underlying structures in the system.\nThe paper analyzes the scalability of multiobjective estimation of distribution algorithms (MOEDAs) on a class of boundedly-difficult additively-separable multiobjective optimization problems. The paper illustrates that even if the linkage is correctly identified, massive multimodality of the search problems can easily overwhelm the nicher and lead to exponential scale-up. Facetwise models are subsequently used to propose a growth rate of the number of differing substructures between the two objectives to avoid the niching method from being overwhelmed and lead to polynomial scalability of MOEDAs.\nIn [Hitzler and Wendt 2002, 2005], a new methodology has been proposed which allows to derive uniform characterizations of different declarative semantics for logic programs with negation. One result from this work is that the well-founded semantics can formally be understood as a stratified version of the Fitting (or Kripke-Kleene) semantics. The constructions leading to this result, however, show a certain asymmetry which is not readily understood. We will study this situation here with the result that we will obtain a coherent picture of relations between different semantics for normal logic programs.\nWe show how an evolutionary algorithm can successfully be used to evolve a set of difficult to solve symmetric travelling salesman problem instances for two variants of the Lin-Kernighan algorithm. Then we analyse the instances in those sets to guide us towards deferring general knowledge about the efficiency of the two variants in relation to structural properties of the symmetric travelling salesman problem.\nWe study the problem of deciding whether some PSPACE-complete problems have models of bounded size. Contrary to problems in NP, models of PSPACE-complete problems may be exponentially large.
However, such models may take polynomial space in a succinct representation. For example, the models of a QBF are explicitly represented by and-or trees (which are always of exponential size) but can be succinctly represented by circuits (which can be polynomial or exponential). We investigate the complexity of deciding the existence of such succinct models when a bound on size is given.\nThe task of outlier detection is to find small groups of data objects that are exceptional when compared with the large remainder of the data. Detection of such outliers is important for many applications such as fraud detection and customer migration. Most existing methods are designed for numeric data. They will encounter problems with real-life applications that contain categorical data. In this paper, we formally define the problem of outlier detection in categorical data as an optimization problem from a global viewpoint. Moreover, we present a local-search heuristic based algorithm for efficiently finding feasible solutions. Experimental results on real datasets and large synthetic datasets demonstrate the superiority of our model and algorithm.\nWe study here preference revision, considering both the monotonic case where the original preferences are preserved and the nonmonotonic case where the new preferences may override the original ones. We use a relational framework in which preferences are represented using binary relations (not necessarily finite). We identify several classes of revisions that preserve order axioms, for example the axioms of strict partial or weak orders. We consider applications of our results to preference querying in relational databases.\nWe consider qualitative simulation involving a finite set of qualitative relations in presence of complete knowledge about their interrelationship. We show how it can be naturally captured by means of constraints expressed in temporal logic and constraint satisfaction problems.
The constraints relate at each stage the 'past' of a simulation with its 'future'. The benefit of this approach is that it readily leads to an implementation based on constraint technology that can be used to generate simulations and to answer queries about them.\nWe describe a polynomial network technique developed for learning to classify clinical electroencephalograms (EEGs) presented by noisy features. Using an evolutionary strategy implemented within Group Method of Data Handling, we learn classification models which are comprehensively described by sets of short-term polynomials. The polynomial models were learnt to classify the EEGs recorded from Alzheimer and healthy patients and recognize the EEG artifacts. Comparing the performances of our technique and some machine learning methods we conclude that our technique can learn well-suited polynomial models which experts can find easy-to-understand.\nA new learning algorithm for Evolving Cascade Neural Networks (ECNNs) is described. An ECNN starts to learn with one input node and then adding new inputs as well as new hidden neurons evolves it. The trained ECNN has a nearly minimal number of input and hidden neurons as well as connections. The algorithm was successfully applied to classify artifacts and normal segments in clinical electroencephalograms (EEGs). The EEG segments were visually labeled by EEG-viewer. The trained ECNN has correctly classified 96.69% of the testing segments. It is slightly better than a standard fully connected neural network.\nA neural network based technique is presented, which is able to successfully extract polynomial classification rules from labeled electroencephalogram (EEG) signals. To represent the classification rules in an analytical form, we use the polynomial neural networks trained by a modified Group Method of Data Handling (GMDH). 
The classification rules were extracted from clinical EEG data that were recorded from an Alzheimer patient and the sudden death risk patients. The third data set consists of EEG recordings that include the normal and artifact segments. These EEG data were visually identified by medical experts. The extracted polynomial rules, verified on the testing EEG data, correctly classify 72% of the risk group patients and 96.5% of the segments. These rules perform slightly better than standard feedforward neural networks.\nEvolving Cascade Neural Networks (ECNNs) and a new training algorithm capable of selecting informative features are described. The ECNN initially learns with one input node and then evolves by adding new inputs as well as new hidden neurons. The resultant ECNN has a near minimal number of hidden neurons and inputs. The algorithm is successfully used for training ECNN to recognise artefacts in sleep electroencephalograms (EEGs) which were visually labelled by EEG-viewers. In our experiments, the ECNN outperforms the standard neural-network as well as evolutionary techniques.\nIn this paper we describe a new method combining the polynomial neural network and decision tree techniques in order to derive comprehensible classification rules from clinical electroencephalograms (EEGs) recorded from sleeping newborns. These EEGs are heavily corrupted by cardiac, eye movement, muscle and noise artifacts and as a consequence some EEG features are irrelevant to classification problems. Combining the polynomial network and decision tree techniques, we discover comprehensible classification rules whilst also attempting to keep their classification error down. This technique is shown to outperform a number of commonly used machine learning techniques applied to automatically recognize artifacts in the sleep EEGs.\nWe study a class of random 3-SAT instances having exactly one solution. The properties of this ensemble considerably differ from those of a random 3-SAT ensemble.
It is numerically shown that the running time of several complete and stochastic local search algorithms monotonically increases as the clause density is decreased. Therefore, there is no easy-hard-easy pattern of hardness as for standard random 3-SAT ensemble. Furthermore, the running time for short single-solution formulas increases with the problem size much faster than for random 3-SAT formulas from the phase transition region.\nThe general intractability of the constraint satisfaction problem has motivated the study of restrictions on this problem that permit polynomial-time solvability. One major line of work has focused on structural restrictions, which arise from restricting the interaction among constraint scopes. In this paper, we engage in a mathematical investigation of generalized hypertree width, a structural measure that has up to recently eluded study. We obtain a number of computational results, including a simple proof of the tractability of CSP instances having bounded generalized hypertree width.\nThis paper presents a short evaluation about the integration of information derived from wavelet non-linear-time-invariant (non-LTI) projection properties using Support Vector Machines (SVM). These properties may give additional information for a classifier trying to detect known patterns hidden by noise. In the experiments we present a simple electromagnetic pulsed signal recognition scheme, where some improvement is achieved with respect to previous work. SVMs are used as a tool for information integration, exploiting some unique properties not easily found in neural networks.\nWe report complexity results about redundancy of formulae in 2CNF form. We first consider the problem of checking redundancy and show some algorithms that are slightly better than the trivial one. We then analyze problems related to finding irredundant equivalent subsets (I.E.S.) of a given set. 
The concept of cyclicity proved to be relevant to the complexity of these problems. Some results about Horn formulae are also shown.\nCorrelated time series are time series that, by virtue of the underlying process to which they refer, are expected to influence each other strongly. We introduce a novel approach to handle such time series, one that models their interaction as a two-dimensional cellular automaton and therefore allows them to be treated as a single entity. We apply our approach to the problems of filling gaps and predicting values in rainfall time series. Computational results show that the new approach compares favorably to Kalman smoothing and filtering.\nThis article presents an overview of computability logic -- the game-semantically constructed logic of interactive computational tasks and resources. There is only one non-overview, technical section in it, devoted to a proof of the soundness of affine logic with respect to the semantics of computability logic. A comprehensive online source on the subject can be found at http://www.cis.upenn.edu/~giorgi/cl.html\nAn algorithm for answering conjunctive queries over SHIQ knowledge bases that is coNP in data complexity is given. The algorithm is based on the tableau algorithm for reasoning with individuals in SHIQ. The blocking conditions of the tableau are weakened in such a way that the set of models the modified algorithm yields suffices to check query entailment. The modified blocking conditions are based on the ones proposed by Levy and Rousset for reasoning with Horn Rules in the description logic ALCNR.\nThe concept of a temporal phylogenetic network is a mathematical model of evolution of a family of natural languages. It takes into account the fact that languages can trade their characteristics with each other when linguistic communities are in contact, and also that a contact is only possible when the languages are spoken at the same time. 
We show how computational methods of answer set programming and constraint logic programming can be used to generate plausible conjectures about contacts between prehistoric linguistic communities, and illustrate our approach by applying it to the evolutionary history of Indo-European languages. To appear in Theory and Practice of Logic Programming (TPLP).\nWe present a declarative language, PP, for the high-level specification of preferences between possible solutions (or trajectories) of a planning problem. This novel language allows users to elegantly express non-trivial, multi-dimensional preferences and priorities over such preferences. The semantics of PP allows the identification of most preferred trajectories for a given goal. We also provide an answer set programming implementation of planning problems with PP preferences.\nTransitive text mining, also named Swanson Linking (SL) after its primary and principal researcher, tries to establish meaningful links between literature sets which are virtually disjoint in the sense that each does not mention the main concept of the other. If successful, SL may give rise to the development of new hypotheses. In this communication we describe our approach to transitive text mining which employs co-occurrence analysis of the medical subject headings (MeSH), the descriptors assigned to papers indexed in PubMed. In addition, we will outline the current state of our web-based information system which will enable our users to perform literature-driven hypothesis building on their own.\nThe prime number theorem, established by Hadamard and de la Vallée Poussin independently in 1896, asserts that the density of primes in the positive integers is asymptotic to 1 / ln x. Whereas their proofs made serious use of the methods of complex analysis, elementary proofs were provided by Selberg and Erdős in 1948.
We describe a formally verified version of Selberg's proof, obtained using the Isabelle proof assistant.\nClustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-histogram, a new efficient algorithm for clustering categorical data. The k-histogram algorithm extends the k-means algorithm to the categorical domain by replacing the means of clusters with histograms, and dynamically updates histograms in the clustering process. Experimental results on real datasets show that the k-histogram algorithm can produce better clustering results than the k-modes algorithm, the algorithm most closely related to our work.\nIn this paper, we propose a scalable approach to modeling based upon word processing documents, and we describe the tool Phoenix providing the technical infrastructure. For our training environment d3web.Train, we developed a tool to extract case knowledge from existing documents, usually dismissal records, extending Phoenix to d3web.CaseImporter. Independent authors used this tool to develop training systems, observing a significant decrease in the time needed for settling in and in the time necessary for developing a case.\nWe analyze a model of interactive unawareness introduced by Heifetz, Meier and Schipper (HMS). We consider two axiomatizations for their model, which capture different notions of validity. These axiomatizations allow us to compare the HMS approach to both the standard (S5) epistemic logic and two other approaches to unawareness: that of Fagin and Halpern and that of Modica and Rustichini. We show that the differences between the HMS approach and the others are mainly due to the notion of validity used and the fact that the HMS is based on a 3-valued propositional logic.\nThis paper presents a versatile system intended to acquire paraphrastic phrases from a representative corpus.
In order to decrease the time spent on the elaboration of resources for NLP systems (for example Information Extraction, IE hereafter), we suggest using a machine learning system that helps define new templates and associated resources. This knowledge is automatically derived from the text collection, in interaction with a large semantic network.\nWe show in this paper that, on the one hand, named entities can be designated using different denominations and that, on the other hand, names denoting named entities are polysemous. The analysis cannot be limited to reference resolution but should take into account naming strategies, which are mainly based on two linguistic operations: synecdoche and metonymy. Lastly, we present a model that explicitly represents the different denominations in discourse, unifying the way to represent linguistic knowledge and world knowledge.\nClustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-ANMI, a new efficient algorithm for clustering categorical data. The k-ANMI algorithm works in a way that is similar to the popular k-means algorithm, and the goodness of clustering in each step is evaluated using a mutual information based criterion (namely, Average Normalized Mutual Information-ANMI) borrowed from cluster ensemble. Experimental results on real datasets show that the k-ANMI algorithm is competitive with state-of-the-art categorical data clustering algorithms with respect to clustering accuracy.\nCombining a set of existing constraint solvers into an integrated system of cooperating solvers is a useful and economic principle to solve hybrid constraint problems. In this paper we show that this approach can also be used to integrate different language paradigms into a unified framework.
Furthermore, we study the syntactic, semantic and operational impacts of this idea for the amalgamation of declarative and constraint programming.\nIS success is a complex concept, and its evaluation is complicated, unstructured and not readily quantifiable. Numerous scientific publications address the issue of success in the IS field as well as in other fields. However, little effort has been devoted to processing indeterminacy and uncertainty in success research. This paper shows a formal method for mapping success using Neutrosophic Success Map. This is an emerging tool for processing indeterminacy and uncertainty in success research. EIS success has been analyzed using this tool.\nIn this paper, a mathematical schema theory is developed. This theory has three roots: brain theory schemas, grid automata, and block-schemas. In Section 2 of this paper, elements of the theory of grid automata necessary for the mathematical schema theory are presented. In Section 3, elements of brain theory necessary for the mathematical schema theory are presented. In Section 4, other types of schemas are considered. In Section 5, the mathematical schema theory is developed. The achieved level of schema representation allows one to model by mathematical tools virtually any type of schemas considered before, including schemas in neurophysiology, psychology, computer science, Internet technology, databases, logic, and mathematics.\nThe paper gives a soundness and completeness proof for the implicative fragment of intuitionistic calculus with respect to the semantics of computability logic, which understands intuitionistic implication as interactive algorithmic reduction. This concept -- more precisely, the associated concept of reducibility -- is a generalization of Turing reducibility from the traditional, input/output sorts of problems to computational tasks of arbitrary degrees of interactivity.
See http://www.cis.upenn.edu/~giorgi/cl.html for a comprehensive online source on computability logic.\nThis paper addresses the problem of distributed learning under communication constraints, motivated by distributed signal processing in wireless sensor networks and data mining with distributed databases. After formalizing a general model for distributed learning, an algorithm for collaboratively training regularized kernel least-squares regression estimators is derived. Noting that the algorithm can be viewed as an application of successive orthogonal projection algorithms, its convergence properties are investigated and the statistical behavior of the estimator is discussed in a simplified theoretical setting.\nThis paper presents a soundness and completeness proof for propositional intuitionistic calculus with respect to the semantics of computability logic. The latter interprets formulas as interactive computational problems, formalized as games between a machine and its environment. Intuitionistic implication is understood as algorithmic reduction in the weakest possible -- and hence most natural -- sense, disjunction and conjunction as deterministic-choice combinations of problems (disjunction = machine's choice, conjunction = environment's choice), and \"absurd\" as a computational problem of universal strength. See http://www.cis.upenn.edu/~giorgi/cl.html for a comprehensive online source on computability logic.\nThe application of Genetic Programming to the discovery of empirical laws is often impaired by the huge size of the search space, and consequently by the computer resources needed. In many cases, the extreme demand for memory and CPU is due to the massive growth of non-coding segments, the introns. 
The paper presents a new program evolution framework which combines distribution-based evolution in the PBIL spirit, with grammar-based genetic programming; the information is stored as a probability distribution on the grammar rules, rather than in a population. Experiments on a real-world like problem show that this approach gives a practical solution to the problem of intron growth.\nWe discuss here constraint programming (CP) by using a proof-theoretic perspective. To this end we identify three levels of abstraction. Each level sheds light on the essence of CP. In particular, the highest level allows us to bring CP closer to the computation as deduction paradigm. At the middle level we can explain various constraint propagation algorithms. Finally, at the lowest level we can address the issue of automatic generation and optimization of the constraint propagation algorithms.\nConsistency check has been the only criterion for theory evaluation in logic-based approaches to reasoning about actions. This work goes beyond that and contributes to the metatheory of actions by investigating what other properties a good domain description in reasoning about actions should have. We state some metatheoretical postulates concerning this sore spot. When all postulates are satisfied together we have a modular action theory. Besides being easier to understand and more elaboration tolerant in McCarthy's sense, modular theories have interesting properties. We point out the problems that arise when the postulates about modularity are violated and propose algorithmic checks that can help the designer of an action theory to overcome them.\nWe establish the convergence of the min-sum message passing algorithm for minimization of a broad class of quadratic objective functions: those that admit a convex decomposition. 
Our results also apply to the equivalent problem of the convergence of Gaussian belief propagation.\nWe propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge.\nShock physics experiments are often complicated and expensive. As a result, researchers are unable to conduct as many experiments as they would like - leading to sparse data sets. In this paper, Support Vector Machines for regression are applied to velocimetry data sets for shock damaged and melted tin metal. Some success at interpolating between data sets is achieved. Implications for future work are discussed.\nIn this paper, we study clustering with respect to the k-modes objective function, a natural formulation of clustering for categorical data. One of the main contributions of this paper is to establish the connection between k-modes and k-median, i.e., the optimum of k-median is at most twice the optimum of k-modes for the same categorical data clustering problem. Based on this observation, we derive a deterministic algorithm that achieves an approximation factor of 2. Furthermore, we prove that the distance measure in k-modes defines a metric. Hence, we are able to extend existing approximation algorithms for metric k-median to k-modes. 
Empirical results verify the superiority of our method.\nWhile in general trading off exploration and exploitation in reinforcement learning is hard, under some formulations relatively simple solutions exist. Optimal decision thresholds for the multi-armed bandit problem, one for the infinite horizon discounted reward case and one for the finite horizon undiscounted reward case are derived, which make the link between the reward horizon, uncertainty and the need for exploration explicit. From this result follow two practical approximate algorithms, which are illustrated experimentally.\nThis paper presents two new promising rules of combination for the fusion of uncertain and potentially highly conflicting sources of evidence in the framework of the theory of belief functions in order to palliate the well-known limitations of Dempster's rule and to work beyond the limits of applicability of the Dempster-Shafer theory. We present both a new class of adaptive combination rules (ACR) and a new efficient Proportional Conflict Redistribution (PCR) rule allowing to deal with highly conflicting sources for static and dynamic fusion applications.\nIt is well known that perspective alignment plays a major role in the planning and interpretation of spatial language. In order to understand the role of perspective alignment and the cognitive processes involved, we have made precise complete cognitive models of situated embodied agents that self-organise a communication system for dialoging about the position and movement of real world objects in their immediate surroundings. We show in a series of robotic experiments which cognitive mechanisms are necessary and sufficient to achieve successful spatial language and why and how perspective alignment can take place, either implicitly or based on explicit marking.\nMobile agents research is clearly aiming towards imposing agent based development as the next generation of tools for writing software. 
This paper comes with its own contribution to this global goal by introducing a novel unifying framework meant to bring simplicity and interoperability to and among agent platforms as we know them today. In addition to this, we also introduce a set of agent behaviors which, although tailored for and from the area of virtual learning environments, are none the less generic enough to be used for rapid, simple, useful and reliable agent deployment. The paper also presents an illustrative case study brought forward to prove the feasibility of our design.\nE-learning is nowadays one of the most interesting of the \"e- \" domains available through the Internet. The main problem to create a Web-based, virtual environment is to model the traditional domain and to implement the model using the most suitable technologies. We analyzed the distance learning domain and investigated the possibility to implement some e-learning services using mobile agent technologies. This paper presents a model of the Student Assessment Service (SAS) and an agent-based framework developed to be used for implementing specific applications. A specific Student Assessment application that relies on the framework was developed.\nClassification of ordinal data is one of the most important tasks of relation learning. In this thesis a novel framework for ordered classes is proposed. The technique reduces the problem of classifying ordered classes to the standard two-class problem. The introduced method is then mapped into support vector machines and neural networks. Compared with a well-known approach using pairwise objects as training samples, the new algorithm has a reduced complexity and training time. A second novel model, the unimodal model, is also introduced and a parametric version is mapped into neural networks. 
Several case studies are presented to assert the validity of the proposed models.\nThe research results described are concerned with: - developing a domain modeling method and tools to provide the design and implementation of decision-making support systems for computer integrated manufacturing; - building a decision-making support system based on know-how and its software environment. The research is funded by NEDO, Japan.\nRecently, extensive efforts have been made on the application of expert system technique to solving the process planning task in the machining domain. This paper introduces a new formal method to design CAPP expert systems. The formal method is applied to provide a contour of the CAPP expert system building technology. Theoretical aspects of the formalism are described and illustrated by an example of know-how analysis. Flexible facilities to utilize multiple knowledge types and multiple planning strategies within one system are provided by the technology.\nThe problem of combining beliefs in the Dempster-Shafer belief theory has attracted considerable attention over the last two decades. The classical Dempster's Rule has often been criticised, and many alternative rules for belief combination have been proposed in the literature. The consensus operator for combining beliefs has nice properties and produces more intuitive results than Dempster's rule, but has the limitation that it can only be applied to belief distribution functions on binary state spaces. In this paper we present a generalisation of the consensus operator that can be applied to Dirichlet belief functions on state spaces of arbitrary size. This rule, called the cumulative rule of belief combination, can be derived from classical statistical theory, and corresponds well with human intuition.\nWe present here a formal foundation for an iterative and incremental approach to constructing and evaluating preference queries. 
Our main focus is on query modification: a query transformation approach which works by revising the preference relation in the query. We provide a detailed analysis of the cases where the order-theoretic properties of the preference relation are preserved by the revision. We consider a number of different revision operators: union, prioritized and Pareto composition. We also formulate algebraic laws that enable incremental evaluation of preference queries. Finally, we consider two variations of the basic framework: finite restrictions of preference relations and weak-order extensions of strict partial order preference relations.\nArithmetic constraints on integer intervals are supported in many constraint programming systems. We study here a number of approaches to implement constraint propagation for these constraints. To describe them we introduce integer interval arithmetic. Each approach is explained using appropriate proof rules that reduce the variable domains. We compare these approaches using a set of benchmarks. For the most promising approach we provide results that characterize the effect of constraint propagation. This is a full version of our earlier paper, cs.PL/0403016.\nThe aim of this paper is to propose a method for tagging named entities (NE), using natural language processing techniques. Beyond their literal meaning, named entities are frequently subject to metonymy. We show the limits of current NE type hierarchies and detail a new proposal aiming at dynamically capturing the semantics of entities in context. This model can analyze complex linguistic phenomena like metonymy, which are known to be difficult for natural language processing but crucial for most applications. We present an implementation and some test using the French ESTER corpus and give significant results.\nQuestions related to the evolution of language have recently known an impressive increase of interest (Briscoe, 2002). 
This short paper aims at questioning the scientific status of these models and their relations to attested data. We show that one cannot directly model non-linguistic factors (exogenous factors) even if they play a crucial role in language evolution. We then examine the relation between linguistic models and attested language data, as well as their contribution to cognitive linguistics.\nThe formalizations of periods of time inside a linear model of Time are usually based on the notion of intervals, which may or may not contain their endpoints. This is not enough when the periods are written in terms of coarse granularities with respect to the event taken into account. For instance, how to express the inter-war period in terms of a {\\em years} interval? This paper presents a new type of intervals, neither open, nor closed or open-closed and the extension of operations on intervals of this new type, in order to reduce the gap between the discourse related to temporal relationship and its translation into a discretized model of Time.\nKnowing the norms of a domain is crucial, but there exists no repository of norms. We propose a method to extract them from texts: texts generally do not describe a norm, but rather how a state-of-affairs differs from it. Answers concerning the cause of the state-of-affairs described often reveal the implicit norm. We apply this idea to the domain of driving, and validate it by designing algorithms that identify, in a text, the \"basic\" norms to which it refers implicitly.\nConstraint propagation algorithms implement logical inference. For efficiency, it is essential to control whether and in what order basic inference steps are taken. We provide a high-level framework that clearly differentiates between information needed for controlling propagation versus that needed for the logical semantics of complex constraints composed from primitive ones. 
We argue for the appropriateness of our controlled propagation framework by showing that it captures the underlying principles of manually designed propagation algorithms, such as literal watching for unit clause propagation and the lexicographic ordering constraint. We provide an implementation and benchmark results that demonstrate the practicality and efficiency of our framework.\nWe introduce a constraint-based framework for studying infinite qualitative simulations concerned with contingencies such as time, space, shape, size, abstracted into a finite set of qualitative relations. To define the simulations, we combine constraints that formalize the background knowledge concerned with qualitative reasoning with appropriate inter-state constraints that are formulated using linear temporal logic. We implemented this approach in a constraint programming system by drawing on ideas from bounded model checking. The resulting system allows us to test and modify the problem specifications in a straightforward way and to combine various knowledge aspects.\nIn this paper, the author proposes a series of multilevel double hashing schemes called cascade hash tables. They use several levels of hash tables. In each table, we use the common double hashing scheme. Higher level hash tables work as fail-safes of lower level hash tables. By this strategy, it could effectively reduce collisions in hash insertion. Thus it gains a constant worst case lookup time with a relatively high load factor(70%-85%) in random experiments. Different parameters of cascade hash tables are tested.\nWe introduce and study logic programs whose clauses are built out of monotone constraint atoms. We show that the operational concept of the one-step provability operator generalizes to programs with monotone constraint atoms, but the generalization involves nondeterminism. 
Our main results demonstrate that our formalism is a common generalization of (1) normal logic programming with its semantics of models, supported models and stable models, (2) logic programming with weight atoms (lparse programs) with the semantics of stable models, as defined by Niemela, Simons and Soininen, and (3) of disjunctive logic programming with the possible-model semantics of Sakama and Inoue.\nThe paper concerns the understanding of plurals in the framework of Artificial Intelligence and emphasizes the role of time. The construction of collection(s) and their evolution across time is often crucial and has to be accounted for. The paper contrasts a \"de dicto\" collection where the collection can be considered as persisting over these situations even if its members change with a \"de re\" collection whose composition does not vary through time. It expresses different criteria of choice between the two interpretations (de re and de dicto) depending on the context of enunciation.\nThis paper addresses the problem of computational terminology evaluation not per se but in a specific application context. This paper describes the evaluation procedure that has been used to assess the validity of our overall indexing approach and the quality of the IndDoc indexing tool. Even if user-oriented extended evaluation is irreplaceable, we argue that early evaluations are possible and they are useful for development guidance.\nThis report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of texts are interpreted with respect to a predefined partial domain model. This report shows that depending on the nature and the depth of the interpretation to be done for extracting the information, more or less knowledge must be involved. 
This report is mainly illustrated in biology, a domain in which there are critical needs for content-based exploration of the scientific literature and which becomes a major application domain for IE.\nThis report concerns automatic understanding of (French) iterative sentences, i.e. sentences where one single verb has to be interpreted by a more or less regular plurality of events. A linguistic analysis is proposed along an extension of Reichenbach's theory, several formal representations are considered and a corpus of 18000 newspaper extracts is described.\nIn this paper we describe an architecture of a system that answers the question: Why did the accident happen? from the textual description of an accident. We present briefly the different parts of the architecture and then we describe with more detail the semantic part of the system i.e. the part in which the norm-based reasoning is performed on the explicit knowledge extracted from the text.\nTruth based entailments are not sufficient for a good comprehension of NL. In fact, it can not deduce implicit information necessary to understand a text. On the other hand, norm based entailments are able to reach this goal. This idea was behind the development of Frames (Minsky 75) and Scripts (Schank 77, Schank 79) in the 70's. But these theories are not formalized enough and their adaptation to new situations is far from being obvious. In this paper, we present a reasoning system which uses norms in a causal reasoning process in order to find the cause of an accident from a text describing it.\nThe k-modes algorithm has become a popular technique in solving categorical data clustering problems in different application domains. However, the algorithm requires random selection of initial points for the clusters. Different initial points often lead to considerably distinct clustering results. 
In this paper we present an experimental study on applying a farthest-point heuristic based initialization method to k-modes clustering to improve its performance. Experiments show that new initialization method leads to better clustering accuracy than random selection initialization method for k-modes clustering.\nWe present tableau calculi for some logics of nonmonotonic reasoning, as defined by Kraus, Lehmann and Magidor. We give a tableau proof procedure for all KLM logics, namely preferential, loop-cumulative, cumulative and rational logics. Our calculi are obtained by introducing suitable modalities to interpret conditional assertions. We provide a decision procedure for the logics considered, and we study their complexity.\nEvolutionary processes proved very useful for solving optimization problems. In this work, we build a formalization of the notion of cooperation and competition of multiple systems working toward a common optimization goal of the population using evolutionary computation techniques. It is justified that evolutionary algorithms are more expressive than conventional recursive algorithms. Three subclasses of evolutionary algorithms are proposed here: bounded finite, unbounded finite and infinite types. Some results on completeness, optimality and search decidability for the above classes are presented. A natural extension of Evolutionary Turing Machine model developed in this paper allows one to mathematically represent and study properties of cooperation and competition in a population of optimized species.\nFunctional brain imaging is a source of spatio-temporal data mining problems. A new framework hybridizing multi-objective and multi-modal optimization is proposed to formalize these data mining problems, and addressed through Evolutionary Computation (EC). 
The merits of EC for spatio-temporal data mining are demonstrated as the approach facilitates the modelling of the experts' requirements, and flexibly accommodates their changing goals.\nWe investigate systematically the impact of human intervention in the training of computer players in a strategy board game. In that game, computer players utilise reinforcement learning with neural networks for evolving their playing strategies and demonstrate a slow learning speed. Human intervention can significantly enhance learning performance, but carrying it out systematically seems to be more of a problem of an integrated game development environment as opposed to automatic evolutionary learning.\nIn this paper we experiment with a 2-player strategy board game where playing models are evolved using reinforcement learning and neural networks. The models are evolved to speed up automatic game development based on human involvement at varying levels of sophistication and density when compared to fully autonomous playing. The experimental results suggest a clear and measurable association between the ability to win games and the ability to do that fast, while at the same time demonstrating that there is a minimum level of human involvement beyond which no learning really occurs.\nWhen genetic algorithms are used to evolve decision trees, key tree quality parameters can be recursively computed and re-used across generations of partially similar decision trees. Simply storing instance indices at leaves is enough for fitness to be piecewise computed in a lossless fashion. We show the derivation of the (substantial) expected speed-up on two bounding case problems and trace the attractive property of lossless fitness inheritance to the divide-and-conquer nature of decision trees. 
The theoretical results are supported by experimental evidence.\nThis paper presents a property of propositional theories under the answer sets semantics (called Equilibrium Logic for this general syntax): any theory can always be reexpressed as a strongly equivalent disjunctive logic program, possibly with negation in the head. We provide two different proofs for this result: one involving a syntactic transformation, and one that constructs a program starting from the countermodels of the theory in the intermediate logic of here-and-there.\nWe propose a new set of rotationally and translationally invariant features for image or pattern recognition and classification. The new features are cubic polynomials in the pixel intensities and provide a richer representation of the original image than most existing systems of invariants. Our construction is based on the generalization of the concept of bispectrum to the three-dimensional rotation group SO(3), and a projection of the image onto the sphere.\nWe examine four approaches for dealing with the logical omniscience problem and their potential applicability: the syntactic approach, awareness, algorithmic knowledge, and impossible possible worlds. Although in some settings these approaches are equi-expressive and can capture all epistemic states, in other settings of interest (especially with probability in the picture), we show that they are not equi-expressive. We then consider the pragmatics of dealing with logical omniscience-- how to choose an approach and construct an appropriate model.\nThis short paper introduces two new fusion rules for combining quantitative basic belief assignments. 
These rules although very simple have not been proposed in literature so far and could serve as useful alternatives because of their low computation cost with respect to the recent advanced Proportional Conflict Redistribution rules developed in the DSmT framework.\nThis paper presents a Prolog interface to the MiniSat satisfiability solver. Logic programming with satisfiability combines the strengths of the two paradigms: logic programming for encoding search problems into satisfiability on the one hand and efficient SAT solving on the other. This synergy between these two exposes a programming paradigm which we propose here as a logic programming pearl. To illustrate logic programming with SAT solving we give an example Prolog program which solves instances of Partial MAXSAT.\nFor academics and practitioners concerned with computers, business and mathematics, one central issue is supporting decision makers. In this paper, we propose a generalization of Decision Matrix Method (DMM), using Neutrosophic logic. It emerges as an alternative to the existing logics and it represents a mathematical model of uncertainty and indeterminacy. This paper proposes the Neutrosophic Decision Matrix Method as a more realistic tool for decision making. In addition, a de-neutrosophication process is included.\nA framework named Copula Component Analysis (CCA) for blind source separation is proposed as a generalization of Independent Component Analysis (ICA). It differs from ICA which assumes independence of sources that the underlying components may be dependent with certain structure which is represented by Copula. By incorporating dependency structure, more accurate estimation can be made in principle in the case that the assumption of independence is invalidated. A two-phase inference method is introduced for CCA which is based on the notion of multidimensional ICA.\nThis paper constructs a tree structure for the music rhythm using the L-system. 
It models the structure as an automata and derives its complexity. It also solves the complexity for the L-system. This complexity can resolve the similarity between trees. This complexity serves as a measure of psychological complexity for rhythms. It resolves the music complexity of various compositions including the Mozart effect K488.   Keyword: music perception, psychological complexity, rhythm, L-system, automata, temporal associative memory, inverse problem, rewriting rule, bracketed string, tree similarity\nReinforcement learning means learning a policy--a mapping of observations into actions--based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. We present an application of gradient ascent algorithm for reinforcement learning to a complex domain of packet routing in network communication and compare the performance of this algorithm to other routing methods on a benchmark problem.\nA set of recurrence relations for on-shell two-loop self-energy diagrams with one mass is presented, which allows to reduce the diagrams with arbitrary indices (powers of scalar propagators) to a set of the master integrals. The SHELL2 package is used for the calculation of special types of diagrams. A method of calculation of higher order \\epsilon-expansion of master integrals is demonstrated.\nHigher order coefficients of the inverse mass expansion of one-loop effective actions are obtained from a one-dimensional path integral representation. For the case of a massive scalar loop in the background of both a scalar potential and a (non Abelian) gauge field explicit results to $O(T^5)$ in the proper time parameter are presented.\nWe investigate measures of complexity of function classes based on continuity moduli of Gaussian and Rademacher processes. 
For Gaussian processes, we obtain bounds on the continuity modulus on the convex hull of a function class in terms of the same quantity for the class itself. We also obtain new bounds on generalization error in terms of localized Rademacher complexities. This allows us to prove new results about generalization performance for convex hulls in terms of characteristics of the base class. As a byproduct, we obtain a simple proof of some of the known bounds on the entropy of convex hulls.\nIn this paper, we are interested in optimal decisions in a partially observable Markov universe. Our viewpoint departs from the dynamic programming viewpoint: we are directly approximating an optimal strategic tree depending on the observation. This approximation is made by means of a parameterized probabilistic law. In this paper, a particular family of hidden Markov models, with input and output, is considered as a learning framework. A method for optimizing the parameters of these HMMs is proposed and applied. This optimization method is based on the cross-entropic principle.\nGiven i.i.d. data from an unknown distribution, we consider the problem of predicting future items. An adaptive way to estimate the probability density is to recursively subdivide the domain to an appropriate data-dependent granularity. A Bayesian would assign a data-independent prior probability to \"subdivide\", which leads to a prior over infinite(ly many) trees. We derive an exact, fast, and simple inference algorithm for such a prior, for the data evidence, the predictive distribution, the effective model dimension, and other quantities.\nWe study the properties of the MDL (or maximum penalized complexity) estimator for Regression and Classification, where the underlying model class is countable. We show in particular a finite bound on the Hellinger losses under the only assumption that there is a \"true\" model contained in the class. 
This implies almost sure convergence of the predictive distribution to the true one at a fast rate. It corresponds to Solomonoff's central theorem of universal induction, however with a bound that is exponentially larger.\nStatistical modeling of data sets by neural-network techniques is offered as an alternative to traditional semiempirical approaches to global modeling of nuclear properties. New results are presented to support the position that such novel techniques can rival conventional theory in predictive power, if not in economy of description. Examples include the statistical inference of atomic masses and beta-decay half-lives based on the information contained in existing databases. Neural network modeling, as well as other statistical strategies based on new algorithms for artificial intelligence, may prove to be a useful asset in the further exploration of nuclear phenomena far from stability.\nThe observed long-range spatiotemporal correlations of real world dynamical systems are governed by quantumlike mechanics with inherent non-local connections. In summary, microscopic scale local fluctuations form a unified self-organized adaptive network manifested as the macro-scale dynamical system with implicit ordered energy flow between the larger and smaller scales. Such a concept of ladder networks may find applications in the design of artificial intelligence systems.\nWe have constructed a simple semiclassical model of neural network where neurons have quantum links with one another in a chosen way and affect one another in a fashion analogous to action potentials. We have examined the role of stochasticity introduced by the quantum potential and compare the system with the classical system of an integrate-and-fire model by Hopfield. Average periodicity and short term retentivity of input memory are noted.\nModern approaches to semantic analysis, when reformulated as Hilbert-space problems, reveal formal structures known from quantum mechanics. 
A similar situation is found in distributed representations of cognitive structures developed for the purposes of neural networks. We take a closer look at similarities and differences between the above two fields and quantum information theory.\nWe present a genetic algorithm for finding a set of pulse sequences, or rotations, for a given quantum logic gate, as implemented by NMR. We demonstrate the utility of the method by showing that shorter sequences than have been previously published can be found for both a CNOT and for the central part of Shor's algorithm (for N=15). Artificial intelligence techniques like the genetic algorithm here presented have an enormous potential for simplifying the implementation of working quantum computers.\nIn this paper Arabic was investigated from the speech recognition problem point of view. We propose a novel approach to build an Arabic Automated Speech Recognition System (ASR). This system is based on the open source CMU Sphinx-4, from the Carnegie Mellon University. CMU Sphinx is a large-vocabulary, speaker-independent, continuous speech recognition system based on discrete Hidden Markov Models (HMMs). We build a model using utilities from the OpenSource CMU Sphinx. We will demonstrate the possible adaptability of this system to Arabic voice recognition.\nNoise, corruptions and variations in face images can seriously hurt the performance of face recognition systems. To make such systems robust, multiclass neural-network classifiers capable of learning from noisy data have been suggested. However on large face data sets such systems cannot provide the robustness at a high level. In this paper we explore a pairwise neural-network system as an alternative approach to improving the robustness of face recognition. 
In our experiments this approach is shown to outperform the multiclass neural-network system in terms of the predictive accuracy on the face images corrupted by noise.\nWe argue for a compositional semantics grounded in a strongly typed ontology that reflects our commonsense view of the world and the way we talk about it. Assuming such a structure we show that the semantics of various natural language phenomena may become nearly trivial.\nGaussian mixture models (GMM) and support vector machines (SVM) are introduced to classify faults in a population of cylindrical shells. The proposed procedures are tested on a population of 20 cylindrical shells and their performance is compared to the procedure, which uses multi-layer perceptrons (MLP). The modal properties extracted from vibration data are used to train the GMM, SVM and MLP. It is observed that the GMM produces 98%, SVM produces 94% classification accuracy while the MLP produces 88% classification rates.\nThe Parameter-Less Self-Organizing Map (PLSOM) is a new neural network algorithm based on the Self-Organizing Map (SOM). It eliminates the need for a learning rate and annealing schemes for learning rate and neighbourhood size. We discuss the relative performance of the PLSOM and the SOM and demonstrate some tasks in which the SOM fails but the PLSOM performs satisfactory. Finally we discuss some example applications of the PLSOM and present a proof of ordering under certain limited conditions.\nThis paper proposes a neuro-rough model based on multi-layered perceptron and rough set. The neuro-rough model is then tested on modelling the risk of HIV from demographic data. The model is formulated using Bayesian framework and trained using Monte Carlo method and Metropolis criterion. When the model was tested to estimate the risk of HIV infection given the demographic data it was found to give the accuracy of 62%. 
The proposed model is able to combine the accuracy of the Bayesian MLP model and the transparency of Bayesian rough set model.\nThe idea of symbolic controllers tries to bridge the gap between the top-down manual design of the controller architecture, as advocated in Brooks' subsumption architecture, and the bottom-up designer-free approach that is now standard within the Evolutionary Robotics community. The designer provides a set of elementary behavior, and evolution is given the goal of assembling them to solve complex tasks. Two experiments are presented, demonstrating the efficiency and showing the recursiveness of this approach. In particular, the sensitivity with respect to the proposed elementary behaviors, and the robustness w.r.t. generalization of the resulting controllers are studied in detail.\nWe present a multi-modal action logic with first-order modalities, which contain terms which can be unified with the terms inside the subsequent formulas and which can be quantified. This makes it possible to handle simultaneously time and states. We discuss applications of this language to action theory where it is possible to express many temporal aspects of actions, as for example, beginning, end, time points, delayed preconditions and results, duration and many others. We present tableaux rules for a decidable fragment of this logic.\nThe work proposes the application of fuzzy set theory (FST) to diagnose the condition of high voltage bushings. The diagnosis uses dissolved gas analysis (DGA) data from bushings based on IEC60599 and IEEE C57-104 criteria for oil impregnated paper (OIP) bushings. FST and neural networks are compared in terms of accuracy and computational efficiency. Both FST and NN simulations were able to diagnose the bushings condition with 10% error. 
By using fuzzy theory, the maintenance department can classify bushings and know the extent of degradation in the component.\nWe consider the problem of minimal correction of the training set to make it consistent with monotonic constraints. This problem arises during analysis of data sets via techniques that require monotone data. We show that this problem is NP-hard in general and is equivalent to finding a maximal independent set in special orgraphs. Practically important cases of that problem considered in detail. These are the cases when a partial order given on the replies set is a total order or has a dimension 2. We show that the second case can be reduced to maximization of a quadratic convex function on a convex set. For this case we construct an approximate polynomial algorithm based on convex optimization.\nIn this paper we derive the equations for Loop Corrected Belief Propagation on a continuous variable Gaussian model. Using the exactness of the averages for belief propagation for Gaussian models, a different way of obtaining the covariances is found, based on Belief Propagation on cavity graphs. We discuss the relation of this loop correction algorithm to Expectation Propagation algorithms for the case in which the model is no longer Gaussian, but slightly perturbed by nonlinear terms.\nWe provide here an epistemic analysis of arbitrary strategic games based on the possibility correspondences. Such an analysis calls for the use of transfinite iterations of the corresponding operators. Our approach is based on Tarski's Fixpoint Theorem and applies both to the notions of rationalizability and the iterated elimination of strictly dominated strategies.\nThis paper describes a system capable of semi-automatically filling an XML template from free texts in the clinical domain (practice guidelines). The XML template includes semantic information not explicitly encoded in the text (pairs of conditions and actions/recommendations). 
Therefore, there is a need to compute the exact scope of conditions over text sequences expressing the required actions. We present a system developed for this task. We show that it yields good performance when applied to the analysis of French practice guidelines.\nIn this article, we study translations between variants of defaults logics such that the extensions of the theories that are the input and the output of the translation are in a bijective correspondence. We assume that a translation can introduce new variables and that the result of translating a theory can either be produced in time polynomial in the size of the theory or its output is polynomial in that size; we however restrict to the case in which the original theory has extensions. This study fills a gap between two previous pieces of work, one studying bijective translations among restrictions of default logics, and the other one studying non-bijective translations between default logics variants.\nThis dissertation presents several new methods of supervised and unsupervised learning of word sense disambiguation models. The supervised methods focus on performing model searches through a space of probabilistic models, and the unsupervised methods rely on the use of Gibbs Sampling and the Expectation Maximization (EM) algorithm. In both the supervised and unsupervised case, the Naive Bayesian model is found to perform well. An explanation for this success is presented in terms of learning rates and bias-variance decompositions.\nWe state an elementary inequality for the structure from motion problem for m cameras and n points. This structure from motion inequality relates space dimension, camera parameter dimension, the number of cameras and number points and global symmetry properties and provides a rigorous criterion for which reconstruction is not possible with probability 1. 
Mathematically the inequality is based on Frobenius theorem which is a geometric incarnation of the fundamental theorem of linear algebra. The paper also provides a general mathematical formalism for the structure from motion problem. It includes the situation the points can move while the camera takes the pictures.\nBoth in the plane and in space, we invert the nonlinear Ullman transformation for 3 points and 3 orthographic cameras. While Ullman's theorem assures a unique reconstruction modulo a reflection for 3 cameras and 4 points, we find a locally unique reconstruction for 3 cameras and 3 points. Explicit reconstruction formulas allow to decide whether picture data of three cameras seeing three points can be realized as a point-camera configuration.\nThis article presents a technique for proving problems hard for classes of the polynomial hierarchy or for PSPACE. The rationale of this technique is that some problem restrictions are able to simulate existential or universal quantifiers. If this is the case, reductions from Quantified Boolean Formulae (QBF) to these restrictions can be transformed into reductions from QBFs having one more quantifier in the front. This means that a proof of hardness of a problem at level n in the polynomial hierarchy can be split into n separate proofs, which may be simpler than a proof directly showing a reduction from a class of QBFs to the considered problem.\nHow best to quantify the information of an object, whether natural or artifact, is a problem of wide interest. A related problem is the computability of an object. We present practical examples of a new way to address this problem. By giving an appropriate representation to our objects, based on a hierarchical coding of information, we exemplify how it is remarkably easy to compute complex objects. 
Our algorithmic complexity is related to the length of the class of objects, rather than to the length of the object.\nIn this paper we extend the new family of (quantitative) Belief Conditioning Rules (BCR) recently developed in the Dezert-Smarandache Theory (DSmT) to their qualitative counterpart for belief revision. Since the revision of quantitative as well as qualitative belief assignment given the occurrence of a new event (the conditioning constraint) can be done in many possible ways, we present here only what we consider as the most appealing Qualitative Belief Conditioning Rules (QBCR) which allow to revise the belief directly with words and linguistic labels and thus avoids the introduction of ad-hoc translations of quantitative beliefs into quantitative ones for solving the problem.\nThis paper presents a multi-sensor fusion strategy for a novel road-matching method designed to support real-time navigational features within advanced driving-assistance systems. Managing multihypotheses is a useful strategy for the road-matching problem. The multi-sensor fusion and multi-modal estimation are realized using Dynamical Bayesian Network. Experimental results, using data from Antilock Braking System (ABS) sensors, a differential Global Positioning System (GPS) receiver and an accurate digital roadmap, illustrate the performances of this approach, especially in ambiguous situations.\nData collection often results in records that have missing values or variables. This investigation compares 3 different data imputation models and identifies their merits by using accuracy measures. Autoencoder Neural Networks, Principal components and Support Vector regression are used for prediction and combined with a genetic algorithm to then impute missing variables. The use of PCA improves the overall performance of the autoencoder network while the use of support vector regression shows promising potential for future investigation. 
Accuracies of up to 97.4 % on imputation of some of the variables were achieved.\nThis paper describes a system capable of semi-automatically filling an XML template from free texts in the clinical domain (practice guidelines). The XML template includes semantic information not explicitly encoded in the text (pairs of conditions and actions/recommendations). Therefore, there is a need to compute the exact scope of conditions over text sequences expressing the required actions. We present in this paper the rules developed for this task. We show that the system yields good performance when applied to the analysis of French practice guidelines.\nIntervention theories of causality define a relationship as causal if appropriately specified interventions to manipulate a putative cause tend to produce changes in the putative effect. Interventionist causal theories are commonly formalized by using directed graphs to represent causal relationships, local probability models to quantify the relationship between cause and effect, and a special kind of conditioning operator to represent the effects of interventions. Such a formal model represents a family of joint probability distributions, one for each allowable intervention policy. This paper interprets the von Neumann formalization of quantum theory as an interventionist theory of causality, describes its relationship to interventionist theories popular in the artificial intelligence literature, and presents a new family of graphical models that extends causal Bayesian networks to quantum systems.\nThis paper describes experiments on identifying the language of a single name in isolation or in a document written in a different language. A new corpus has been compiled and made available, matching names against languages. This corpus is used in a series of experiments measuring the performance of general language models and names-only language models on the language identification task. 
Conclusions are drawn from the comparison between using general language models and names-only language models and between identifying the language of isolated names and the language of very short document fragments. Future research directions are outlined.\nA method for the construction of approximate analytical expressions for the stationary marginal densities of general stochastic search processes is proposed. By the marginal densities, regions of the search space that with high probability contain the global optima can be readily defined. The density estimation procedure involves a controlled number of linear operations, with a computational cost per iteration that grows linearly with problem size.\nThis paper addresses a method to analyze the covert social network foundation hidden behind the terrorism disaster. It is to solve a node discovery problem, which means to discover a node, which functions relevantly in a social network, but escaped from monitoring on the presence and mutual relationship of nodes. The method aims at integrating the expert investigator's prior understanding, insight on the terrorists' social network nature derived from the complex graph theory, and computational data processing. The social network responsible for the 9/11 attack in 2001 is used to execute simulation experiment to evaluate the performance of the method.\nMethods to solve a node discovery problem for a social network are presented. Covert nodes refer to the nodes which are not observable directly. They transmit the influence and affect the resulting collaborative activities among the persons in a social network, but do not appear in the surveillance logs which record the participants of the collaborative activities. Discovering the covert nodes is identifying the suspicious logs where the covert nodes would appear if the covert nodes became overt. 
The performance of the methods is demonstrated with a test dataset generated from computationally synthesized networks and a real organization.\nA computer model of a \"sense of humour\" is proposed. The humorous effect is interpreted as a specific malfunction in the course of information processing due to the need for the rapid deletion of the false version transmitted into consciousness. The biological function of a sense of humour consists in speeding up the bringing of information into consciousness and in fuller use of the resources of the brain.\nThe computer realization of a \"sense of humour\" requires the creation of an algorithm for solving the \"linguistic problem\", i.e. the problem of recognizing a continuous sequence of polysemantic images. Such algorithm may be realized in the Hopfield model of a neural network after its proper modification.\nThe paper proposes a general notion of interaction between attributes, which can be applied to many fields in decision making and data analysis. It generalizes the notion of interaction defined for criteria modelled by capacities, by considering functions defined on lattices. For a given problem, the lattice contains for each attribute the partially ordered set of remarkable points or levels. The interaction is based on the notion of derivative of a function defined on a lattice, and appears as a generalization of the Shapley value or other probabilistic values.\nA computer model of \"a sense of humour\" suggested previously [arXiv:0711.2058,0711.2061], relating the humorous effect with a specific malfunction in information processing, is given in somewhat different exposition. Psychological aspects of humour are elaborated more thoroughly. The mechanism of laughter is formulated on the more general level. Detailed discussion is presented for the higher levels of information processing, which are responsible for a perception of complex samples of humour. 
Development of a sense of humour in the process of evolution is discussed.\nIt is a common belief that in any environment where life is possible, life will be generated. Here it is suggested that the cause for a spontaneous generation of complex systems is probability driven processes. Based on equilibrium thermodynamics, it is argued that in low occupation number statistical systems, the second law of thermodynamics yields an increase of thermal entropy and a canonic energy distribution. However, in high occupation number statistical systems, the same law for the same reasons yields an increase of information and a Benford's law/power-law energy distribution. It is therefore, plausible, that eventually the heat death is not necessarily the end of the universe.\nContributing to the rigorous understanding of BP, in this paper we relate the convergence of BP to spectral properties of the graph. This encompasses a result for random graphs with a ``planted'' solution; thus, we obtain the first rigorous result on BP for graph coloring in the case of a complex graphical structure (as opposed to trees). In particular, the analysis shows how Belief Propagation breaks the symmetry between the $3!$ possible permutations of the color classes.\nWe describe decomposition during search (DDS), an integration of And/Or tree search into propagation-based constraint solvers. The presented search algorithm dynamically decomposes sub-problems of a constraint satisfaction problem into independent partial problems, avoiding redundant work.   The paper discusses how DDS interacts with key features that make propagation-based solvers successful: constraint propagation, especially for global constraints, and dynamic search heuristics.   We have implemented DDS for the Gecode constraint programming library. 
Two applications, solution counting in graph coloring and protein structure prediction, exemplify the benefits of DDS in practice.\nThis paper presents experiments on common knowledge logic, conducted with the help of the proof assistant Coq. The main feature of common knowledge logic is the eponymous modality that says that a group of agents shares a knowledge about a certain proposition in a inductive way. This modality is specified by using a fixpoint approach. Furthermore, from these experiments, we discuss and compare the structure of theorems that can be proved in specific theories that use common knowledge logic. Those structures manifests the interplay between the theory (as implemented in the proof assistant Coq) and the metatheory.\nThe concept of a judgment as a logical action which introduces new information into a deductive system is examined. This leads to a way of mathematically representing implication which is distinct from the familiar material implication, according to which \"If A then B\" is considered to be equivalent to \"B or not-A\". This leads, in turn, to a resolution of the paradox of the raven.\nThis paper proposes the incremental Bayesian optimization algorithm (iBOA), which modifies standard BOA by removing the population of solutions and using incremental updates of the Bayesian network. iBOA is shown to be able to learn and exploit unrestricted Bayesian networks using incremental techniques for updating both the structure as well as the parameters of the probabilistic model. This represents an important step toward the design of competent incremental estimation of distribution algorithms that can solve difficult nearly decomposable problems scalably and reliably.\nWe review the connection between statistical mechanics and the analysis of random optimization problems, with particular emphasis on the random k-SAT problem. 
We discuss and characterize the different phase transitions that are met in these problems, starting from basic concepts. We also discuss how statistical mechanics methods can be used to investigate the behavior of local search and decimation based algorithms.\nFifty years ago, John von Neumann compared the architecture of the brain with that of computers that he invented and which is still in use today. In those days, the organisation of computers was based on concepts of brain organisation. Here, we give an update on current results on the global organisation of neural systems. For neural systems, we outline how the spatial and topological architecture of neuronal and cortical networks facilitates robustness against failures, fast processing, and balanced network activation. Finally, we discuss mechanisms of self-organization for such architectures. After all, the organization of the brain might again inspire computer architecture.\nIn this paper, we implement an anomaly detection system using the Dempster-Shafer method. Using two standard benchmark problems we show that by combining multiple signals it is possible to achieve better results than by using a single signal. We further show that by applying this approach to a real-world email dataset the algorithm works for email worm detection. Dempster-Shafer can be a promising method for anomaly detection problems with multiple features (data sources), and two or more classes.\nThis paper introduces a procedure based on genetic programming to evolve XSLT programs (usually called stylesheets or logicsheets). XSLT is a general purpose, document-oriented functional language, generally used to transform XML documents (or, in general, solve any problem that can be coded as an XML document). The proposed solution uses a tree representation for the stylesheets as well as diverse specific operators in order to obtain, in the studied cases and a reasonable time, a XSLT stylesheet that performs the transformation. 
Several types of representation have been compared, resulting in different performance and degree of success.\nWe develop an incremental tableau-based decision procedures for the   Alternating-time temporal logic ATL and some of its variants.   While running within the theoretically established complexity upper bound, we claim that our tableau is practically more efficient in the average case than other decision procedures for ATL known so far. Besides, the ease of its adaptation to variants of ATL demonstrates the flexibility of the proposed procedure.\nIn this paper, we firstly present what is Interactive Evolutionary Computation (IEC) and rapidly how we have combined this artificial intelligence technique with an eye-tracker for visual optimization. Next, in order to correctly parameterize our application, we present results from applying data mining techniques on gaze information coming from experiments conducted on about 80 human individuals.\nIn this paper, we describe a new algorithm that consists in combining an eye-tracker for minimizing the fatigue of a user during the evaluation process of Interactive Evolutionary Computation. The approach is then applied to the Interactive One-Max optimization problem.\nWe show that universality in chaotic elements can be lifted to that in complex systems. We construct a globally coupled Flow lattice (GCFL), an analog of a globally coupled Map lattice (GCML). We find that Duffing GCFL shows the same behavior with GCML; population ratio between synchronizing clusters acts as a bifurcation parameter. Lorenz GCFL exhibits interesting two quasi-clusters in an opposite phase motion. Each of them looks like Will o' the wisp; they dance around in opposite phase.\nSu-Doku, a popular combinatorial puzzle, provides an excellent testbench for heuristic explorations. Several interesting questions arise from its deceptively simple set of rules. How many distinct Su-Doku grids are there? How to find a solution to a Su-Doku puzzle? 
Is there a unique solution to a given Su-Doku puzzle? What is a good estimation of a puzzle's difficulty? What is the minimum puzzle size (the number of \"givens\")?   This paper explores how these questions are related to the well-known alldifferent constraint which emerges in a wide variety of Constraint Satisfaction Problems (CSP) and compares various algorithmic approaches based on different formulations of Su-Doku.\nWe propose a method for support vector machine classification using indefinite kernels. Instead of directly minimizing or stabilizing a nonconvex loss function, our algorithm simultaneously computes support vectors and a proxy kernel matrix used in forming the loss. This can be interpreted as a penalized kernel learning problem where indefinite kernel matrices are treated as a noisy observations of a true Mercer kernel. Our formulation keeps the problem convex and relatively large problems can be solved efficiently using the projected gradient or analytic center cutting plane methods. We compare the performance of our technique with other methods on several classic data sets.\nThis paper pursues some applications of Rough Set Theory (RST) and neural-fuzzy model to analysis of \"lugeon data\". In the manner, using Self Organizing Map (SOM) as a pre-processing the data are scaled and then the dominant rules by RST, are elicited. Based on these rules variations of permeability in the different levels of Shivashan dam, Iran has been highlighted. Then, via using a combining of SOM and an adaptive Neuro-Fuzzy Inference System (NFIS) another analysis on the data was carried out. Finally, a brief comparison between the obtained results of RST and SOM-NFIS (briefly SONFIS) has been rendered.\nThis paper describes application of rough set theory, on the analysis of hydrocyclone operation. In this manner, using Self Organizing Map (SOM) as preprocessing step, best crisp granules of data are obtained. 
Then, using a combining of SOM and rough set theory (RST)-called SORST-, the dominant rules on the information table, obtained from laboratory tests, are extracted. Based on these rules, an approximate estimation on decision attribute is fulfilled. Finally, a brief comparison of this method with the SOM-NFIS system (briefly SONFIS) is highlighted.\nWe present a general framework of semi-supervised dimensionality reduction for manifold learning which naturally generalizes existing supervised and unsupervised learning frameworks which apply the spectral decomposition. Algorithms derived under our framework are able to employ both labeled and unlabeled examples and are able to handle complex problems where data form separate clusters of manifolds. Our framework offers simple views, explains relationships among existing frameworks and provides further extensions which can improve existing algorithms. Furthermore, a new semi-supervised kernelization framework called ``KPCA trick'' is proposed to handle non-linear problems.\nA first-order conditional logic is considered, with semantics given by a variant of epsilon-semantics, where p -> q means that Pr(q | p) approaches 1 super-polynomially --faster than any inverse polynomial. This type of convergence is needed for reasoning about security protocols. A complete axiomatization is provided for this semantics, and it is shown how a qualitative proof of the correctness of a security protocol can be automatically converted to a quantitative proof appropriate for reasoning about concrete security.\nMarkov networks and Bayesian networks are effective graphic representations of the dependencies embedded in probabilistic models. It is well known that independencies captured by Markov networks (called graph-isomorphs) have a finite axiomatic characterization. 
This paper, however, shows that independencies captured by Bayesian networks (called causal models) have no axiomatization by using even countably many Horn or disjunctive clauses. This is because a sub-independency model of a causal model may be not causal, while graph-isomorphs are closed under sub-models.\nGrainy numbers are defined as tuples of bits. They form a lattice where the meet and the join operations are an addition and a multiplication. They may be substituted for the real numbers in the definition of fuzzy sets. The aim is to propose an alternative negation for the complement that we'll call supplement.\nSimDialog is a visual editor for dialog in computer games. This paper presents the design of SimDialog, illustrating how script writers and non-programmers can easily create dialog for video games with complex branching structures and dynamic response characteristics. The system creates dialog as a directed graph. This allows for play using the dialog with a state-based cause and effect system that controls selection of non-player character responses and can provide a basic scoring mechanism for games.\nThis paper reports application of neuro- fuzzy inference system (NFIS) and self organizing feature map neural networks (SOM) on detection of contact state in a block system. In this manner, on a simple system, the evolution of contact states, by parallelization of DDA, has been investigated. So, a comparison between NFIS and SOM results has been presented. The results show applicability of the proposed methods, by different accuracy, on detection of contact's distribution.\nWe introduce a new tractable temporal constraint language, which strictly contains the Ord-Horn language of Buerkert and Nebel and the class of AND/OR precedence constraints. The algorithm we present for this language decides whether a given set of constraints is consistent in time that is quadratic in the input size. 
We also prove that (unlike Ord-Horn) this language cannot be solved by Datalog or by establishing local consistency.\nThis study, fundamentals of fuzzy block theory, and its application in assessment of stability in underground openings, has surveyed. Using fuzzy topics and inserting them in to key block theory, in two ways, fundamentals of fuzzy block theory has been presented. In indirect combining, by coupling of adaptive Neuro Fuzzy Inference System (NFIS) and classic block theory, we could extract possible damage parts around a tunnel. In direct solution, some principles of block theory, by means of different fuzzy facets theory, were rewritten.\nThis paper describes application of information granulation theory, on the analysis of hydrocyclone perforamance. In this manner, using a combining of Self Organizing Map (SOM) and Neuro-Fuzzy Inference System (NFIS), crisp and fuzzy granules are obtained(briefly called SONFIS). Balancing of crisp granules and sub fuzzy granules, within non fuzzy information (initial granulation), is rendered in an open-close iteration. Using two criteria, \"simplicity of rules \"and \"adaptive threoshold error level\", stability of algorithm is guaranteed. Validation of the proposed method, on the data set of the hydrocyclone is rendered.\nBy way of explaining how a brain works logically, human associative memory is modeled with logical and memory neurons, corresponding to standard digital circuits. The resulting cognitive architecture incorporates basic psychological elements such as short term and long term memory. Novel to the architecture are memory searches using cues chosen pseudorandomly from short term memory. Recalls alternated with sensory images, many tens per second, are analyzed subliminally as an ongoing process, to determine a direction of attention in short term memory.\nThe paper introduces a new technique for compressing Binary Decision Diagrams in those cases where random access is not required. 
Using this technique, compression and decompression can be done in time linear in the size of the BDD, and compression will in many cases reduce the size of the BDD to 1-2 bits per node. Empirical results for our compression technique are presented, including comparisons with previously introduced techniques, showing that the new technique dominates on all tested instances.\nIn everyday life it happens that a person has to reason about what other people think and how they behave, in order to achieve his goals. In other words, an individual may be required to adapt his behaviour by reasoning about the others' mental state. In this paper we focus on a knowledge representation language derived from logic programming which both supports the representation of mental states of individual communities and provides each with the capability of reasoning about others' mental states and acting accordingly. The proposed semantics is shown to be translatable into stable model semantics of logic programs with aggregates.\nThe use of multiple Decision Models (DMs) enhances decision accuracy and at the same time allows users to evaluate the confidence in decision making. In this paper we explore the ability of multiple DMs to learn from a small amount of verified data. This becomes important when data samples are difficult to collect and verify. We propose an evolutionary-based approach to solving this problem. The proposed technique is examined on a few clinical problems represented by a small amount of data.\nWe present in this article a new evaluation method for classification and segmentation of textured images in uncertain environments. In uncertain environments, real classes and boundaries are known with only a partial certainty given by the experts. In many published papers, only classification or only segmentation is considered and evaluated. 
Here, we propose to take into account both the classification and segmentation results according to the certainty given by the experts. We present the results of this method on a fusion of classifiers of sonar images for seabed characterization.\nIn this paper, we present some high-level information fusion approaches for numeric and symbolic data. We study the value of such methods, particularly for classifier fusion. A comparative study is carried out in the context of seabed characterization from sonar images. Classifying the type of sediment is a difficult problem because of the complexity of the data. We compare high-level information fusion approaches and report the obtained performance.\nSonar images provide a rapid view of the seabed in order to characterize it. However, in such an uncertain environment, the real seabed is unknown and the only information we can obtain is the interpretation of different human experts, sometimes in conflict. In this paper, we propose to manage this conflict in order to provide a robust reality for the learning step of classification algorithms. The classification is conducted by a multilayer perceptron, taking into account the uncertainty of the reality in the learning stage. The results of this seabed characterization are presented on real sonar images.\nA serious defect with the Halpern-Pearl (HP) definition of causality is repaired by combining a theory of causality with a theory of defaults. In addition, it is shown that (despite a claim to the contrary) a cause according to the HP condition need not be a single conjunct. A definition of causality motivated by Wright's NESS test is shown to always hold for a single conjunct. 
Moreover, conditions that hold for all the examples considered by HP are given that guarantee that causality according to (this version of) the NESS test is equivalent to the HP definition.\nThe remarkable results of Foster and Vohra were a starting point for a series of papers which show that any sequence of outcomes can be learned (with no prior knowledge) using some universal randomized forecasting algorithm and forecast-dependent checking rules. We show that for the class of all computationally efficient outcome-forecast-based checking rules, this property is violated. Moreover, we present a probabilistic algorithm generating with probability close to one a sequence with a subsequence which simultaneously miscalibrates all partially weakly computable randomized forecasting algorithms. Following Dawid's prequential framework, we consider partial recursive randomized algorithms.\nThe games of prediction with expert advice are considered in this paper. We present a modification of the Kalai and Vempala algorithm of following the perturbed leader for the case of unrestrictedly large one-step gains. We show that in the general case the cumulative gain of any probabilistic prediction algorithm can be much worse than the gain of some expert of the pool. Nevertheless, we give a lower bound for this cumulative gain in the general case and construct a universal algorithm which has the optimal performance; we also prove that in the case when the one-step gains of experts of the pool have ``limited deviations'' the performance of our algorithm is close to the performance of the best expert.\nWe study the empirical meaning of randomness with respect to a family of probability distributions $P_\\theta$, where $\\theta$ is a real parameter, using algorithmic randomness theory. 
In the case when an effectively strongly consistent estimate exists for a computable probability distribution $P_\\theta$, we show that Levin's a priori semicomputable semimeasure of the set of all $P_\\theta$-random sequences is positive if and only if the parameter $\\theta$ is a computable real number. Different methods for generating ``meaningful'' $P_\\theta$-random sequences with noncomputable $\\theta$ are discussed.\nWe present a new online learning algorithm for cumulative discounted gain. This learning algorithm does not use exponential weights on the experts. Instead, it uses a weighting scheme that depends on the regret of the master algorithm relative to the experts. In particular, experts whose discounted cumulative gain is smaller (worse) than that of the master algorithm receive zero weight. We also sketch how a regret-based algorithm can be used as an alternative to Bayesian averaging in the context of inferring latent random variables.\nClassifying textured images requires considering the images in terms of areas with the same texture. In an uncertain environment, it may be better to make an imprecise decision or to reject an area corresponding to an unlearned class. Moreover, an area that serves as a classification unit may contain more than one texture. These considerations allow us to develop a belief decision model that can reject an area as unlearned and decide on unions and intersections of learned classes. The proposed approach is motivated by, and illustrated on, an application of seabed characterization from sonar images.\nIn this paper, we propose, in the Dezert-Smarandache Theory (DSmT) framework, a new probabilistic transformation, called DSmP, in order to build a subjective probability measure from any basic belief assignment defined on any model of the frame of discernment. 
Several examples are given to show how the DSmP transformation works, and we compare it to the main existing transformations proposed in the literature so far. We show the advantages of DSmP over classical transformations in terms of Probabilistic Information Content (PIC). The direct extension of this transformation for dealing with qualitative belief assignments is also presented.\nAceWiki is a prototype that shows how a semantic wiki using a controlled natural language, Attempto Controlled English (ACE) in our case, can make ontology management easy for everybody. Sentences in ACE can automatically be translated into first-order logic, OWL, or SWRL. AceWiki integrates the OWL reasoner Pellet and ensures that the ontology is always consistent. Previous results have shown that people with no background in logic are able to add formal knowledge to AceWiki without being instructed or trained in advance.\nIn this paper, we show our results on the bi-directional data exchange between the F-logic language supported by the Flora2 system and the OWL language. Most of the TBox and ABox axioms are translated preserving the semantics between the two representations, such as proper inclusion, individual definition and functional properties, while some axioms and restrictions require a change in the semantics, such as number and qualified cardinality restrictions. For the second case, we translate the OWL definite style inference rules into F-logic style constraints. We also describe a set of reasoning examples using the above translation, including the reasoning in Flora2 of a variety of ABox queries.\nWe extend Knuth's 16 Boolean binary logic operators to fuzzy logic and neutrosophic logic binary operators. Then we generalize them to n-ary fuzzy logic and neutrosophic logic operators using the Smarandache codification of the Venn diagram and a defined vector neutrosophic law. 
In this way, new operators in neutrosophic logic/set/probability are built.\nThe integration of fuzzy set theory and fuzzy logic into scheduling is a rather new topic of growing importance for manufacturing applications, leaving various aspects unsolved. In the current paper, we investigate an improved local search technique for fuzzy scheduling problems with fitness plateaus, using a multi-criteria formulation of the problem. We especially address the problem of changing job priorities over time as studied at Sherwood Press Ltd, a Nottingham-based printing company that is a collaborator on the project.\nThe article proposes a heuristic approximation approach to the bin packing problem under multiple objectives. In addition to the traditional objective of minimizing the number of bins, the heterogeneousness of the elements in each bin is minimized, leading to a biobjective formulation of the problem with a tradeoff between the number of bins and their heterogeneousness. An extension of the Best-Fit approximation algorithm is presented to solve the problem. Experimental investigations have been carried out on benchmark instances of different sizes, ranging from 100 to 1000 items. Encouraging results have been obtained, showing the applicability of the heuristic approach to the described problem.\nThe article presents a local search approach for the solution of timetabling problems in general, with a particular implementation for competition track 3 of the International Timetabling Competition 2007 (ITC 2007). The heuristic search procedure is based on Threshold Accepting to overcome local optima. A stochastic neighborhood is proposed and implemented, randomly removing and reassigning events from the current solution. The overall concept has been incrementally obtained from a series of experiments, which we describe in each (sub)section of the paper. 
As a result, we successfully derived a candidate solution approach for the finals of track 3 of ITC 2007.\nThis paper studies peek arc consistency, a reasoning technique that extends the well-known arc consistency technique for constraint satisfaction. In contrast to other more costly extensions of arc consistency that have been studied in the literature, peek arc consistency requires only linear space and quadratic time and can be parallelized in a straightforward way such that it runs in linear time with a linear number of processors. We demonstrate that for various constraint languages, peek arc consistency gives a polynomial-time decision procedure for the constraint satisfaction problem. We also present an algebraic characterization of those constraint languages that can be solved by peek arc consistency, and study the robustness of the algorithm.\nWe show how text from news articles can be used to predict intraday price movements of financial assets using support vector machines. Multiple kernel learning is used to combine equity returns with text as predictive features to increase classification performance, and we develop an analytic center cutting plane method to solve the kernel learning problem efficiently. We observe that while the direction of returns is not predictable using either text or returns, their size is, with text features producing significantly better performance than historical returns alone.\nThe Baum-Welch algorithm together with its derivatives and variations has been the main technique for learning Hidden Markov Models (HMM) from observational data. We present an HMM learning algorithm based on the non-negative matrix factorization (NMF) of higher-order Markovian statistics that is structurally different from the Baum-Welch algorithm and its associated approaches. 
The described algorithm supports estimation of the number of recurrent states of an HMM and iterates the non-negative matrix factorization (NMF) algorithm to improve the learned HMM parameters. Numerical examples are provided as well.\nMost research related to unithood was conducted as part of a larger effort for the determination of termhood. Consequently, novelties are rare in this small sub-field of term extraction. In addition, existing work was mostly empirically motivated and derived. We propose a new probabilistically-derived measure, independent of any influences of termhood, that provides dedicated measures to gather linguistic evidence from parsed text and statistical evidence from the Google search engine for the measurement of unithood. Our comparative study using 1,825 test cases against an existing empirically-derived function revealed an improvement in terms of precision, recall and accuracy.\nWe prove the following results for task allocation of indivisible resources:   - The problem of finding a leximin-maximal resource allocation is in P if the agents have max-utility functions and atomic demands.   - Deciding whether a resource allocation is Pareto-optimal is coNP-complete for agents with (1-)additive utility functions.   - Deciding whether there exists a Pareto-optimal and envy-free resource allocation is Sigma_2^p-complete for agents with (1-)additive utility functions.\nWe propose to improve medical decision making and reduce global health care costs by employing a free Internet-based medical information system with two main target groups: practicing physicians and medical researchers. After acquiring patients' consent, physicians enter medical histories, physiological data and symptoms or disorders into the system; an integrated expert system can then assist in diagnosis and statistical software provides a list of the most promising treatment options and medications, tailored to the patient. 
Physicians later enter information about the outcomes of the chosen treatments, data the system uses to optimize future treatment recommendations. Medical researchers can analyze the aggregate data to compare various drugs or treatments in defined patient populations on a large scale.\nIn this paper we study possibilities of efficient reasoning in combinations of theories over possibly non-disjoint signatures. We first present a class of theory extensions (called local extensions) in which hierarchical reasoning is possible, and give several examples from computer science and mathematics in which such extensions occur in a natural way. We then identify situations in which combinations of local extensions of a theory are again local extensions of that theory. We thus obtain criteria both for recognizing wider classes of local theory extensions, and for modular reasoning in combinations of theories over non-disjoint signatures.\nWe demonstrate AceWiki, a semantic wiki that uses the controlled natural language Attempto Controlled English (ACE). The goal is to enable easy creation and modification of ontologies through the web. Texts in ACE can automatically be translated into first-order logic and other languages, for example OWL. Previous evaluation showed that ordinary people are able to use AceWiki without being instructed.\nThe exploration-exploitation dilemma has been an intriguing and unsolved problem within the framework of reinforcement learning. \"Optimism in the face of uncertainty\" and model building play central roles in advanced exploration methods. Here, we integrate several concepts and obtain a fast and simple algorithm. We show that the proposed algorithm finds a near-optimal policy in polynomial time, and give experimental evidence that it is robust and efficient compared to its predecessors.\nIn this paper entropy-based methods are compared and used to measure the structural diversity of an ensemble of 21 classifiers. 
Such measures are mostly applied in ecology, where species counts are used as a measure of diversity. The measures used were Shannon entropy and the Simpson and Berger-Parker diversity indices. As the diversity indices increased, so did the accuracy of the ensemble. An ensemble dominated by classifiers with the same structure produced poor accuracy. The uncertainty rule from information theory was also used to further define diversity. Genetic algorithms were used to find the optimal ensemble by using the diversity indices as the cost function. Voting was used to aggregate the decisions.\nThis article provides a simple logical structure, in which affective concepts (i.e. concepts related to emotions and feelings) can be defined. The set of affects defined is similar to the set of emotions covered in the OCC model (Ortony A., Collins A., and Clore G. L.: The Cognitive Structure of Emotions. Cambridge University Press, 1988), but the model presented in this article is fully computationally defined.\nThis paper develops a declarative language, P-log, that combines logical and probabilistic arguments in its reasoning. Answer Set Prolog is used as the logical foundation, while causal Bayes nets serve as a probabilistic foundation. We give several non-trivial examples and illustrate the use of P-log for knowledge representation and updating of knowledge. We argue that our approach to updates is more appealing than existing approaches. We give sufficient conditions for the coherency of P-log programs and show that Bayes nets can be easily mapped to coherent P-log programs.\nAnswer set programming (ASP) is a logic programming paradigm that can be used to solve complex combinatorial search problems. Aggregates are an ASP construct that plays an important role in many applications. 
Defining a satisfactory semantics of aggregates turned out to be a difficult problem, and in this paper we propose a new approach, based on an analogy between aggregates and propositional connectives. First, we extend the definition of an answer set/stable model to cover arbitrary propositional theories; then we define aggregates on top of them both as primitive constructs and as abbreviations for formulas. Our definition of an aggregate combines expressiveness and simplicity, and it inherits many theorems about programs with nested expressions, such as theorems about strong equivalence and splitting.\nRecently, great attention has been directed toward overcomplete dictionaries and the sparse representations they can provide. In a wide variety of signal processing problems, sparsity serves as a crucial property leading to high performance. Inpainting, the process of reconstructing lost or deteriorated parts of images or videos, is an interesting application which can be handled by suitable decomposition of an image through a combination of overcomplete dictionaries. This paper proposes a novel technique for such a decomposition and investigates it through image inpainting. Simulations are presented to validate our approach.\nIn this paper, we present a new kind of learning implementation to recognize patterns using the concept of a Mirroring Neural Network (MNN), which can extract information from distinct sensory input patterns and perform pattern recognition tasks. It is also capable of being used as an advanced associative memory wherein image data is associated with voice inputs in an unsupervised manner. Since the architecture is hierarchical and modular, it has the potential of being used to devise learning engines of ever-increasing complexity.\nHealth Practice Guidelines are supposed to unify practices and propose recommendations to physicians. 
This paper describes GemFrame, a system capable of semi-automatically filling an XML template from free texts in the clinical domain. The XML template includes semantic information not explicitly encoded in the text (pairs of conditions and actions/recommendations). Therefore, there is a need to compute the exact scope of conditions over text sequences expressing the required actions. We present a system developed for this task. We show that it yields good performance when applied to the analysis of French practice guidelines. We conclude with a precise evaluation of the tool.\nTwo well-known databases of semantic relationships between pairs of words used in psycholinguistics, feature-based and association-based, are studied as complex networks. We propose an algorithm to disentangle feature-based relationships from free association semantic networks. The algorithm uses the rich topology of the free association semantic network to produce a new set of relationships between words similar to those observed in feature production norms.\nKnowledge representation is an essential part of an expert system, because it provides the framework on which the expert system is modeled and designed. Many methods exist for knowledge representation, but each has problems. In this paper we introduce a new object-oriented method for knowledge representation based on the XML language, called XMLKR, and discuss its advantages and disadvantages.\nIn this paper, we obtain bounds on the probability of convergence to the optimal solution for the compact Genetic Algorithm (cGA) and Population-Based Incremental Learning (PBIL). 
We also give a sufficient condition for convergence of these algorithms to the optimal solution and compute a range of possible values of the parameters of these algorithms for which they converge to the optimal solution with a given confidence level.\nWe compare two different ways of quantizing a simple sequential game, Cat's Dilemma, in the context of the debate on intransitive and transitive preferences. This kind of analysis may be significant for research on artificial intelligence (some possibilities are discussed). Nature has both transitive and intransitive properties, and quantum models may be better able to capture this dualism than classical ones. We also present an electoral interpretation of the game.\nWe study constraint satisfaction problems on the so-called 'planted' random ensemble. We show that for a certain class of problems, e.g. graph coloring, many of the properties of the usual random ensemble are quantitatively identical in the planted random ensemble. We study the structural phase transitions, and the easy/hard/easy pattern in the average computational complexity. We also discuss the finite-temperature phase diagram, finding a close connection with the liquid/glass/solid phenomenology.\nWe introduce a resource-adaptive agent mechanism which supports the user in interactive theorem proving. The mechanism uses a two-layered architecture of agent societies to suggest appropriate commands together with possible command argument instantiations. Experiments with this approach show that its effectiveness can be further improved by introducing a resource concept. 
In this paper we provide an abstract view of the overall mechanism, motivate the necessity of an appropriate resource concept and discuss its realization within the agent architecture.\nCurrent approaches to semantics in the geospatial domain are mainly based on ontologies, but ontologies, since they continue to build entirely on the symbolic methodology, suffer from the classical problems affecting representational theories, e.g. the symbol grounding problem. We argue for an enactive approach to semantics, where meaning is considered to be an emergent feature arising context-dependently in action. Since representational theories are unable to deal with context, a new formalism is required toward a contextual theory of concepts. SCOP is considered a promising formalism in this sense and is briefly described.\nWe investigate cut-elimination and cut-simulation in impredicative (higher-order) logics. We illustrate that adding simple axioms such as Leibniz equations to a calculus for an impredicative logic (in our case, a sequent calculus for classical type theory) is like adding cut. The phenomenon equally applies to prominent axioms like Boolean and functional extensionality, induction, choice, and description. This calls for the development of calculi where these principles are built in instead of being treated axiomatically.\nA new procedure based on layered feed-forward neural networks for identifying the parameters of the microplane material model is proposed in the present paper. The novelties are the use of the Latin Hypercube Sampling method for the generation of training sets, a systematic employment of stochastic sensitivity analysis and a genetic algorithm-based training of a neural network by an evolutionary algorithm. 
Advantages and disadvantages of this approach together with possible extensions are thoroughly discussed and analyzed.\nRecent development of network structure analysis shows that it plays an important role in characterizing complex systems in many branches of science. Unlike previous network centrality measures, this paper proposes the notion of topological centrality (TC), reflecting the topological positions of nodes and edges in general networks, and proposes an approach to calculating the topological centrality. The proposed topological centrality is then used to discover communities and build the backbone network. Experiments and applications on a research network show the significance of the proposed approach.\nEmpirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We demonstrate the feasibility of this approach with experimental results for a new use case: multitask learning with hundreds of thousands of tasks.\nWe present a family of pairwise tournaments reducing $k$-class classification to binary classification. These reductions are provably robust against a constant fraction of binary errors. The results improve on the PECOC construction \\cite{SECOC} with an exponential improvement in computation, from $O(k)$ to $O(\\log_2 k)$, and the removal of a square root in the regret dependence, matching the best possible computation and regret up to a constant.\nIn this short position paper we briefly review the development history of automated inductive theorem proving and computer-assisted mathematical induction. We think that the current low expectations on progress in this field result from a faulty narrow-scope historical projection. 
Our main motivation is to explain, on an abstract but hopefully sufficiently descriptive level, why we believe that future progress in the field will result from human-orientedness and descente infinie.\nWe present a combination of raising, explicit variable dependency representation, the liberalized delta-rule, and preservation of solutions for first-order deductive theorem proving. Our main motivation is to provide the foundation for our work on inductive theorem proving, where the preservation of solutions is indispensable.\nThe use of formal methods provides confidence in the correctness of developments. Yet one may argue about the actual level of confidence obtained when the method itself, or its implementation, is not formally checked. We address this question for B, a widely used formal method that allows for the derivation of correct programs from specifications. Through a deep embedding of the B logic in Coq, we check the B theory but also implement B tools. Both aspects are illustrated by the description of a proved prover for the B logic.\nWe propose Range and Roots, two common patterns useful for specifying a wide range of counting and occurrence constraints. We design specialised propagation algorithms for these two patterns. Counting and occurrence constraints specified using these patterns thus directly inherit a propagation algorithm. To illustrate the capabilities of the Range and Roots constraints, we specify a number of global constraints taken from the literature. Preliminary experiments demonstrate that propagating counting and occurrence constraints using these two patterns leads to a small loss in performance when compared to specialised global constraints and is competitive with alternative decompositions using elementary constraints.\nCognitive radio is a breakthrough technology which is expected to have a profound impact on the way radio spectrum will be accessed, managed and shared in the future. 
In this paper I examine some of the implications of cognitive radio for future management of spectrum. Both a near-term view involving the opportunistic spectrum access model and a longer-term view involving a self-regulating dynamic spectrum access model within a society of cognitive radios are discussed.\nWe present an online method for estimating the cost of solving SAT problems. Modern SAT solvers present several challenges to estimating search cost, including non-chronological backtracking, learning and restarts. Our method uses a linear model trained on data gathered at the start of search. We show the effectiveness of this method using random and structured problems. We demonstrate that predictions made in early restarts can be used to improve later predictions. We also show that we can use such cost estimations to select a solver from a portfolio.\nOne common type of symmetry is when values are symmetric. For example, if we are assigning colours (values) to nodes (variables) in a graph colouring problem, then we can uniformly interchange the colours throughout a colouring. For a problem with value symmetries, all symmetric solutions can be eliminated in polynomial time. However, as we show here, both static and dynamic methods to deal with symmetry have computational limitations. With static methods, pruning all symmetric values is NP-hard in general. With dynamic methods, we can take exponential time on problems which static methods solve without search.\nIn this work, we first show that feature selection methods other than boosting can also be used for training an efficient object detector. In particular, we introduce Greedy Sparse Linear Discriminant Analysis (GSLDA) \\cite{Moghaddam2007Fast} for its conceptual simplicity and computational efficiency, and achieve slightly better detection performance than \\cite{Viola2004Robust}. 
Moreover, we propose a new technique, termed Boosted Greedy Sparse Linear Discriminant Analysis (BGSLDA), to efficiently train a detection cascade. BGSLDA exploits the sample re-weighting property of boosting and the class-separability criterion of GSLDA.\nUser interfaces of theorem proving systems often focus on assisting specially trained and skilled users, i.e., proof experts. As a result, the systems are difficult to use for non-expert users. This paper describes a paper-and-pencil HCI experiment in which (non-expert) students were asked to make suggestions for a GUI for an interactive system for mathematical proofs. They had to explain the usage of the GUI by applying it to construct a proof sketch for a given theorem. The evaluation of the experiment provides insights for the interaction design for non-expert users and the needs and wants of this user group.\nWe consider the problem of estimating the conditional probability of a label in time $O(\\log n)$, where $n$ is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which provably constructs a logarithmic-depth tree on the set of labels to solve this problem. We test the algorithm empirically, showing that it works successfully on a dataset with roughly $10^6$ labels.\nA technique for speeding up reinforcement learning algorithms by using time manipulation is proposed. It is applicable to failure-avoidance control problems running in a computer simulation. 
Turning the time of the simulation backwards on failure events is shown to speed up the learning by 260% and improve the state space exploration by 12% on the cart-pole balancing task, compared to the conventional Q-learning and Actor-Critic algorithms.\nIn this paper a new dynamic subsumption technique for Boolean CNF formulae is proposed. It exploits simple and sufficient conditions to detect, during conflict analysis, clauses from the original formula that can be reduced by subsumption. During the learnt clause derivation, and at each step of the resolution process, we simply check for backward subsumption between the current resolvent and clauses from the original formula that are encoded in the implication graph. Our approach gives rise to a strong and dynamic simplification technique that exploits learning to eliminate literals from the original clauses. Experimental results show that the integration of our dynamic subsumption approach within the state-of-the-art SAT solvers Minisat and Rsat achieves interesting improvements, particularly on crafted instances.\nAs ontologies proliferate and automatic reasoners become more powerful, the problem of protecting sensitive information becomes more serious. In particular, as facts can be inferred from other facts, it becomes increasingly likely that information included in an ontology, while not itself deemed sensitive, could be used to infer other sensitive information. We first consider the problem of testing an ontology for safeness, defined as its inability to be used to derive any sensitive facts using a given collection of inference rules. We then consider the problem of optimizing an ontology based on the criterion of making as much useful information as possible available without revealing any sensitive facts.\nA mechanism called Eligibility Propagation is proposed to speed up the Time Hopping technique used for faster Reinforcement Learning in simulations.
Eligibility Propagation provides for Time Hopping similar abilities to what eligibility traces provide for conventional Reinforcement Learning. It propagates values from one state to all of its temporal predecessors using a state transition graph. Experiments on a simulated biped crawling robot confirm that Eligibility Propagation accelerates the learning process by more than a factor of 3.\nWe reformulate Pratt's tableau decision procedure of checking satisfiability of a set of formulas in PDL. Our formulation is simpler and more direct for implementation. Extending the method, we give the first EXPTIME (optimal) tableau decision procedure not based on transformation for checking consistency of an ABox w.r.t. a TBox in PDL (here, PDL is treated as a description logic). We also prove the new result that the data complexity of the instance checking problem in PDL is coNP-complete.\nWe show how polynomial path orders can be employed efficiently in conjunction with weak innermost dependency pairs to automatically certify polynomial runtime complexity of term rewrite systems and the polytime computability of the functions computed. The established techniques have been implemented and we provide ample experimental data to assess the new method.\nThe topic of risk prevention and emergency response has become a key social and political concern. One approach to address this challenge is to develop Decision Support Systems (DSS) that can help emergency planners and responders to detect emergencies, as well as to suggest possible courses of action to deal with the emergency. Our research work fits within this framework and aims to develop a DSS that is as generic as possible and independent of the case study.\nIn this paper we present a novel hybrid (array-based layout and vertical bitmap layout) database representation approach for mining complete Maximal Frequent Itemsets (MFI) on sparse and large datasets.
Our work is novel in terms of scalability, item search order and two projection techniques (horizontal and vertical). We also present a maximal algorithm using this hybrid database representation approach. Experimental results on real and sparse benchmark datasets show that our approach is better than previous state-of-the-art maximal algorithms.\nSocial Network Analysis (SNA) tries to understand and exploit the key features of social networks in order to manage their life cycle and predict their evolution. Increasingly popular web 2.0 sites are forming huge social networks. Classical methods from social network analysis (SNA) have been applied to such online networks. In this paper, we propose leveraging semantic web technologies to merge and exploit the best features of each domain. We present how to facilitate and enhance the analysis of online social networks, exploiting the power of semantic social network analysis.\nAn efficient incremental learning algorithm for classification tasks, called NetLines, well adapted for both binary and real-valued input patterns, is presented. It generates small, compact feedforward neural networks with one hidden layer of binary units and binary output units. A convergence theorem ensures that solutions with a finite number of hidden units exist for both binary and real-valued input patterns. An implementation for problems with more than two classes, valid for any binary classifier, is proposed. The generalization error and the size of the resulting networks are compared to the best published results on well-known classification benchmarks. Early stopping is shown to decrease overfitting, without improving the generalization performance.\nWe present a straightforward embedding of quantified multimodal logic in simple type theory and prove its soundness and completeness. Modal operators are replaced by quantification over a type of possible worlds.
We present simple experiments, using existing higher-order theorem provers, to demonstrate that the embedding allows automated proofs of statements in these logics, as well as meta-properties of them.\nThis paper studies quantum annealing (QA) for clustering, which can be seen as an extension of simulated annealing (SA). We derive a QA algorithm for clustering and propose an annealing schedule, which is crucial in practice. Experiments show the proposed QA algorithm finds better clustering assignments than SA. Furthermore, QA is as easy as SA to implement.\nThis paper presents studies on a deterministic annealing algorithm based on quantum annealing for variational Bayes (QAVB) inference, which can be seen as an extension of the simulated annealing for variational Bayes (SAVB) inference. QAVB is as easy as SAVB to implement. Experiments revealed that QAVB finds a better local optimum than SAVB in terms of the variational free energy in latent Dirichlet allocation (LDA).\nIn this paper we propose an approach to building a decision support system that can help emergency planners and responders to detect and manage emergency situations. The internal mechanism of the system is independent of the target application. Therefore, we think the system can easily be used or adapted for different case studies. We focus here on a first step in the decision-support process, which concerns modeling the information issued from the perceived environment and representing it dynamically using a multiagent system. This modeling was applied to the RoboCupRescue Simulation System. An implementation and some results are presented here.\nWe present the CIFF proof procedure for abductive logic programming with constraints, and we prove its correctness.
CIFF is an extension of the IFF proof procedure for abductive logic programming, relaxing the original restrictions over variable quantification (allowedness conditions) and incorporating a constraint solver to deal with numerical constraints as in constraint logic programming. Finally, we describe the CIFF system, comparing it with state-of-the-art abductive systems and answer set solvers and showing how to use it to program some applications. (To appear in Theory and Practice of Logic Programming - TPLP).\nThis paper introduces a novel concept of self-forensics to complement the standard autonomic self-CHOP properties of the self-managed systems, to be specified in the Forensic Lucid language. We argue that self-forensics, with the forensics taken out of the cybercrime domain, is applicable to \"self-dissection\" for the purpose of verification of autonomous software and hardware systems of flight-critical systems for automated incident and anomaly analysis and event reconstruction by the engineering teams in a variety of incident scenarios during design and testing as well as actual flight data.\nWe propose a new model-based computer-aided diagnosis (CAD) system for tumor detection and classification (cancerous vs. benign) in breast images. Specifically, we show that x-ray, ultrasound and MRI images can be accurately modeled by two-dimensional autoregressive-moving average (ARMA) random fields. We derive two-stage Yule-Walker least-squares estimates of the model parameters, which are subsequently used as the basis for statistical inference and biophysical interpretation of the breast image. We use a k-means classifier to segment the breast image into three regions: healthy tissue, benign tumor, and cancerous tumor. Our simulation results on ultrasound breast images illustrate the power of the proposed approach.\nWe propose the use of Soft Constraints as a natural way to model Service Oriented Architecture.
In the framework, constraints are used to model components and connectors, and constraint aggregation is used to represent their interactions. The \"quality of a service\" is measured and considered when performing queries to service providers. Examples include the levels of cost, performance and availability required by clients. In our framework, the QoS scores are represented by the softness level of the constraint and the measure of complex (web) services is computed by combining the levels of the components.\nThe problem of detecting terms that can be interesting to the advertiser is considered. If a company has already bought some advertising terms which describe certain services, it is reasonable to find out the terms bought by competing companies. Some of them can be recommended as future advertising terms to the company. The goal of this work is to propose more interpretable recommendations based on FCA and association rules.\nMartin and Osswald \\cite{Martin07} have recently proposed many generalizations of combination rules on quantitative beliefs in order to manage the conflict and to consider the specificity of the responses of the experts. Since experts usually express themselves in natural language with linguistic labels, Smarandache and Dezert \\cite{Li07} have introduced a mathematical framework for dealing directly with qualitative beliefs as well. In this paper we recall some elements of our previous work and propose new combination rules, developed for the fusion of both qualitative and quantitative beliefs.\nWe investigate the global GRAMMAR constraint over restricted classes of context-free grammars like deterministic and unambiguous context-free grammars. We show that detecting disentailment for the GRAMMAR constraint in these cases is as hard as parsing an unrestricted context-free grammar. We also consider the class of linear grammars and give a propagator that runs in quadratic time.
Finally, to demonstrate the use of linear grammars, we show that a weighted linear GRAMMAR constraint can efficiently encode the EDITDISTANCE constraint, and a conjunction of the EDITDISTANCE constraint and the REGULAR constraint.\nThis paper presents an agent-oriented approach to building a decision support system aimed at helping emergency managers to detect and to manage risks. We stress the flexibility and the adaptivity characteristics that are crucial to build a robust and efficient system, able to resolve complex problems. The system should be as independent as possible of the subject of study. To this end, an original approach based on a mechanism of perception, representation, characterisation and assessment is proposed. The work described here is applied to the RoboCupRescue application. Experiments and results are provided.\nScenarios are pen-pictures of plausible futures, used for strategic planning. The aim of this investigation is to expand the horizon of scenario-based planning through computational models that are able to aid the analyst in the planning process. The investigation builds upon the advances of Information and Communication Technology (ICT) to create a novel, flexible and customizable computational capability-based planning methodology that is practical and theoretically sound. We will show how evolutionary computation, in particular evolutionary multi-objective optimization, can play a central role, both as an optimizer and as a source for innovation.\nIn this paper, a new method for assigning credit to search operators is presented. Starting with the principle of optimizing search bias, search operators are selected based on an ability to create solutions that are historically linked to future generations.
Using a novel framework for defining performance measurements, distributing credit for performance, and the statistical interpretation of this credit, a new adaptive method is developed and shown to outperform a variety of adaptive and non-adaptive competitors.\nThe motivation of semantic wikis is to make acquisition, maintenance, and mining of formal knowledge simpler, faster, and more flexible. However, most existing semantic wikis have a very technical interface and are restricted to a relatively low level of expressivity. In this paper, we explain how AceWiki uses controlled English (concretely, Attempto Controlled English, ACE) to provide a natural and intuitive interface while supporting a high degree of expressivity. We introduce recent improvements to the AceWiki system and user studies that indicate that AceWiki is usable and useful.\nThe article describes the proposition and application of a local search metaheuristic for multi-objective optimization problems. It is based on two main principles of heuristic search: intensification through variable neighborhoods, and diversification through perturbations and successive iterations in favorable regions of the search space. The concept is successfully tested on permutation flow shop scheduling problems under multiple objectives and compared to other local search approaches. While the obtained results are encouraging in terms of their quality, another positive attribute of the approach is its simplicity, as it requires setting only a few parameters.\nGraph theory provides a primary tool for analyzing and designing computer communication networks. In the past few decades, graph theory has been used to study various types of networks, including the Internet, Wide Area Networks, Local Area Networks, and networking protocols such as Border Gateway Protocol, Open Shortest Path First, and Networking Networks.
In this paper, we present some key graph theory concepts used to represent different types of networks. Then we describe how networks are modeled to investigate problems related to network protocols. Finally, we present some of the tools used to generate graphs for representing practical networks.\nRestart strategies are an important factor in the performance of conflict-driven Davis-Putnam-style SAT solvers. Selecting a good restart strategy for a problem instance can enhance the performance of a solver. Inspired by recent success applying machine learning techniques to predict the runtime of SAT solvers, we present a method which uses machine learning to boost solver performance through a smart selection of the restart strategy. Based on easy-to-compute features, we train both a satisfiability classifier and runtime models. We use these models to choose between restart strategies. We present experimental results comparing this technique with the most commonly used restart strategies. Our results demonstrate that machine learning is effective in improving solver performance.\nClassification is the basis of cognition. Unlike other solutions, this study approaches it from the perspective of outliers. We present an expanding algorithm to detect outliers in univariate datasets, together with the underlying foundation. The expanding algorithm runs in a holistic way, making it a rather robust solution. Synthetic and real data experiments show its power. Furthermore, an application for multi-class problems leads to the introduction of the oscillator algorithm. The corresponding result indicates the potential for wide use of the expanding algorithm.\nWe consider a sequence of repeated interactions between an agent and an environment. Uncertainty about the environment is captured by a probability distribution over a space of hypotheses, which includes all computable functions.
Given a utility function, we can evaluate the expected utility of any computational policy for interaction with the environment. After making some plausible assumptions (and maybe one not-so-plausible assumption), we show that if the utility function is unbounded, then the expected utility of any policy is undefined.\nThis study describes the application of some approximate reasoning methods to the analysis of hydrocyclone performance. Crisp and fuzzy granules are obtained using a combination of Self Organizing Map (SOM) with a Neuro-Fuzzy Inference System (NFIS), called SONFIS, and with Rough Set Theory (RST), called SORST. Balancing of crisp granules and non-crisp granules can be implemented in close-open iteration. Using different criteria, a granulation-level balance point (interval) or a pseudo-balance point is estimated. The proposed methods are validated on the hydrocyclone data set.\nWe suggest an approach to using memristors (resistors with memory) in programmable analog circuits. Our idea is a circuit design in which low voltages are applied to memristors during their operation as analog circuit elements and high voltages are used to program the memristors' states. In this way, as demonstrated in recent experiments, the state of the memristors does not essentially change during analog mode operation. As an example of our approach, we have built several programmable analog circuits demonstrating memristor-based programming of threshold, gain and frequency.\nSome criticisms that have been raised against the Cox approach to probability theory are addressed. Should we use a single real number to measure a degree of rational belief? Can beliefs be compared? Are the Cox axioms obvious? Are there counterexamples to Cox?
Rather than justifying Cox's choice of axioms, we follow a different path and derive the sum and product rules of probability theory as the unique (up to regraduations) consistent representations of the Boolean AND and OR operations.\nTraditional recommendation systems make recommendations based solely on the customer's past purchases, product ratings and demographic data without considering the profitability of the items being recommended. In this work we study the question of how a vendor can directly incorporate the profitability of items into its recommender so as to maximize its expected profit while still providing accurate recommendations. Our approach takes the output of any traditional recommender system and adjusts it according to item profitabilities. Our approach is parameterized so the vendor can control how much the recommendation incorporating profits can deviate from the traditional recommendation. We study our approach under two settings and show that it achieves approximately 22% more profit than traditional recommendations.\nWe study the notion of informedness in a client-consultant setting. Using a software simulator, we examine the extent to which it pays off for consultants to provide their clients with advice that is well-informed, or with advice that is merely meant to appear to be well-informed. The latter strategy is beneficial in that it requires fewer resources to keep up-to-date, but carries the risk of a decreased reputation if the clients discover the low level of informedness of the consultant. Our experimental results indicate that under different circumstances, different strategies yield the optimal results (net profit) for the consultants.\n2007 saw the first international congress on the \"square of oppositions\". A first attempt to structure debate using n-opposition theory was presented along with the results of a first experiment on the web.
Our proposal for this paper is to define relations between arguments through a structure of opposition (the square of oppositions is one such structure). We try to answer the following questions: How can debates be organized on the web 2.0? How can they be structured logically? What is the role of n-opposition theory in this context? In this paper we present the results of three experiments (Betapolitique 2007, ECAP 2008, Intermed 2008).\nThis paper describes a new approach to solving some stochastic optimization problems for linear dynamic systems with various parametric uncertainties. The proposed approach is based on applying tensor formalism to create the mathematical model of parametric uncertainties. Within the proposed approach, the following problems are considered: prediction, data processing and optimal control. Simulation outcomes are used to illustrate the properties and effectiveness of the proposed methods.\nMany models in natural and social sciences are composed of sets of interacting entities whose intensity of interaction decreases with distance. This often leads to structures of interest in these models composed of dense packs of entities. Fast Multipole Methods (FMM) are a family of methods developed to help with the calculation of a number of computable models such as those described above. We propose a method that builds upon FMM to detect and model the dense structures of these systems.\nConstrained Optimum Path (COP) problems appear in many real-life applications, especially on communication networks. Some of these problems have been considered and solved by specific techniques which are usually difficult to extend. In this paper, we introduce a novel local search modeling for solving some COPs. The modeling features compositionality, modularity and reuse, and strengthens the benefits of Constraint-Based Local Search. We also apply the modeling to the edge-disjoint paths problem (EDP).
We show that side constraints can easily be added to the model. Computational results show the significance of the approach.\nThis article introduces SatHyS (SAT HYbrid Solver), a novel hybrid approach for propositional satisfiability. It combines local search with a conflict-driven clause learning (CDCL) scheme. Each time the local search part reaches a local minimum, the CDCL is launched. For SAT problems it behaves like a tabu list, whereas for UNSAT ones, the CDCL part tries to focus on a minimal unsatisfiable sub-formula (MUS). Experimental results show good performance on many classes of SAT instances from recent SAT competitions.\nThis paper presents a new method and a constraint-based objective function to solve two problems related to the design of optical telecommunication networks, namely the Synchronous Optical Network Ring Assignment Problem (SRAP) and the Intra-ring Synchronous Optical Network Design Problem (IDP). These network topology problems can be represented as graph partitioning problems with capacity constraints, as shown in previous works. We present here a new objective function and a new local search algorithm to solve these problems. Experiments conducted in Comet allow us to compare our method to previous ones and show that we obtain better results.\nWe explore the idea of using finite automata to implement new constraints for local search (this is already a successful technique in constraint-based global search). We show how it is possible to incrementally maintain the violations of a constraint and its decision variables from an automaton that describes a ground checker for that constraint. We establish the practicality of our approach on real-life personnel rostering problems, and show that it is competitive with the approach of [Pralong, 2007].\nProofs, in Ludics, have an interpretation provided by their counter-proofs, that is, the objects they interact with.
We follow the same idea by proposing that sentence meanings are given by the counter-meanings they are opposed to in a dialectical interaction. The conception is at the intersection of proof-theoretic and game-theoretic accounts of semantics, but it enlarges them by allowing us to deal with possibly infinite processes.\nMachine Learning is usually defined as a subfield of AI concerned with information extraction from raw data sets. Despite its common acceptance and widespread recognition, this definition is wrong and groundless. Meaningful information does not belong to the data that bear it. It belongs to the observers of the data and it is a shared agreement and a convention among them. Therefore, this private information cannot be extracted from the data by any means. Consequently, all further attempts of Machine Learning apologists to justify their funny business are inappropriate.\nIn sports competitions, teams can manipulate the result by, for instance, throwing games. We show that we can decide in polynomial time how to manipulate round robin and cup competitions, two of the most popular types of sporting competition. In addition, we show that the minimal number of games that need to be thrown to manipulate the result can also be determined in polynomial time. Finally, we show that there are several different variations of standard cup competitions where manipulation remains polynomial.\nWe propose and compare various sentence selection strategies for active learning in the task of detecting mentions of entities. The best strategy employs the sum of confidences of two statistical classifiers trained on different views of the data. Our experimental results show that, compared to the random selection strategy, this strategy reduces the amount of required labeled training data by over 50% while achieving the same performance.
The effect is even more significant when only named mentions are considered: the system achieves the same performance by using only 42% of the training data required by the random selection strategy.\nModelling emotion has become a challenge nowadays. Therefore, several models have been produced in order to express human emotional activity. However, only a few of them are currently able to express the close relationship existing between emotion and cognition. An appraisal-coping model is presented here, with the aim of simulating the emotional impact caused by the evaluation of a particular situation (appraisal), along with the consequent cognitive reaction intended to face the situation (coping). This model is applied to the \"Cascades\" problem, a small arithmetical exercise designed for ten-year-old pupils. The goal is to create a model corresponding to a child's behaviour when solving the problem using his own strategies.\nModeling emotion has become a challenge nowadays. Therefore, several models have been produced in order to express human emotional activity. However, only a few of them are currently able to express the close relationship existing between emotion and cognition. An appraisal-coping model is presented here, with the aim of simulating the emotional impact caused by the evaluation of a particular situation (appraisal), along with the consequent cognitive reaction intended to face the situation (coping). This model is applied to the \"Cascades\" problem, a small arithmetical exercise designed for ten-year-old pupils. The goal is to create a model corresponding to a child's behavior when solving the problem using his own strategies.\nFinding all extreme points of multimodal functions is called the extremum problem, which is a well-known difficult issue in optimization. Applying ant colony optimization (ACO) to solve this problem is rarely reported. The method of applying ACO to solve the extremum problem is explored in this paper.
Experiments show that the solution error of the method presented in this paper is less than 10^-8. Keywords: Extremum Problem; Ant Colony Optimization (ACO)\nA totally semantic measure is presented which is able to calculate a similarity value between concept descriptions, between a concept description and an individual, or between individuals expressed in an expressive description logic. It is applicable to symbolic descriptions although it uses a numeric approach for the calculus. Considering that Description Logics stand as the theoretical framework for ontological knowledge representation and reasoning, the proposed measure can be effectively used for agglomerative and divisive clustering tasks applied to the semantic web domain.\nAdaptation has long been considered the Achilles' heel of case-based reasoning since it requires some domain-specific knowledge that is difficult to acquire. In this paper, two strategies are combined in order to reduce the knowledge engineering cost induced by the adaptation knowledge (CA) acquisition task: CA is learned from the case base by means of knowledge discovery techniques, and the CA acquisition sessions are opportunistically triggered, i.e., at problem-solving time.\nProbabilistic sampling methods have become very popular for solving single-shot path planning problems. Rapidly-exploring Random Trees (RRTs) in particular have been shown to be efficient in solving high-dimensional problems. Even though several RRT variants have been proposed for dynamic replanning, these methods only perform well in environments with infrequent changes. This paper addresses the dynamic path planning problem by combining simple techniques in a multi-stage probabilistic algorithm. This algorithm uses RRTs for initial planning and informed local search for navigation.
We show that this combination of simple techniques provides better responses to highly dynamic environments than the RRT extensions.\nThe benefit of using ontologies, defined by the respective data standards, is shown. It is shown how ontologies can be used for the semantic enrichment of data and how this can help make the vision of the semantic web come true. The problems existing today on the way to a true semantic web are pinpointed, different semantic web standards, tools and development frameworks are surveyed, and an outlook towards artificial intelligence and agents for searching and mining the data in the semantic web is given, paving the way from data management to information management and, in the end, true knowledge management systems.\nAction description languages, such as A and B, are expressive instruments introduced for formalizing planning domains and planning problem instances. The paper starts by proposing a methodology to encode an action language (with conditional effects and static causal laws), a slight variation of B, using Constraint Logic Programming over Finite Domains. The approach is then generalized to raise the use of constraints to the level of the action language itself. A prototype implementation has been developed, and the preliminary results are presented and discussed. To appear in Theory and Practice of Logic Programming (TPLP)\nThis paper presents several novel generalization bounds for the problem of learning kernels based on the analysis of the Rademacher complexity of the corresponding hypothesis sets. Our bound for learning kernels with a convex combination of p base kernels has only a log(p) dependency on the number of kernels, p, which is considerably more favorable than the previous best bound given for the same problem.
We also give a novel bound for learning with a linear combination of p base kernels with an L_2 regularization whose dependency on p is only in p^{1/4}.\nThis paper extends k-means algorithms from the Euclidean domain to the domain of graphs. To recompute the centroids, we apply subgradient methods for solving the optimization-based formulation of the sample mean of graphs. To accelerate the k-means algorithm for graphs without trading computational time against solution quality, we avoid unnecessary graph distance calculations by exploiting the triangle inequality of the underlying distance metric following Elkan's k-means algorithm proposed in \\cite{Elkan03}. In experiments we show that the accelerated k-means algorithm is faster than the standard k-means algorithm for graphs provided there is a cluster structure in the data.\nThere has been a lot of recent work on Bayesian methods for reinforcement learning exhibiting near-optimal online performance. The main obstacle facing such methods is that in most problems of interest, the optimal solution involves planning in an infinitely large tree. However, it is possible to obtain stochastic lower and upper bounds on the value of each tree node. This enables us to use stochastic branch and bound algorithms to search the tree efficiently. This paper proposes two such algorithms and examines their complexity in this setting.\nNieuwenhuis, Oliveras, and Tinelli (2006) showed how to describe enhancements of the Davis-Putnam-Logemann-Loveland algorithm using transition systems, instead of pseudocode. We design a similar framework for several algorithms that generate answer sets for logic programs: Smodels, Smodels-cc, Asp-Sat with Learning (Cmodels), and a newly designed and implemented algorithm Sup. 
This approach to describing answer set solvers makes it easier to prove their correctness, to compare them, and to design new systems.\nThis paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.\nWe introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. This tree structure allows for efficient disk based implementations where space requirements exceed that of main memory.\nVector quantization(VQ) is a lossy data compression technique from signal processing, which is restricted to feature vectors and therefore inapplicable for combinatorial structures. This contribution presents a theoretical foundation of graph quantization (GQ) that extends VQ to the domain of attributed graphs. We present the necessary Lloyd-Max conditions for optimality of a graph quantizer and consistency results for optimal GQ design based on empirical distortion measures and stochastic optimization. These results statistically justify existing clustering algorithms in the domain of graphs. 
The proposed approach provides a template of how to link structural pattern recognition methods other than GQ to statistical pattern recognition.\nThere are at least two ways to interpret numerical degrees of belief in terms of betting: (1) you can offer to bet at the odds defined by the degrees of belief, or (2) you can judge that a strategy for taking advantage of such betting offers will not multiply the capital it risks by a large factor. Both interpretations can be applied to ordinary additive probabilities and used to justify updating by conditioning. Only the second can be applied to Dempster-Shafer degrees of belief and used to justify Dempster's rule of combination.\nBotnets, which consist of thousands of compromised machines, can pose significant threats to other systems by launching Distributed Denial of Service (DDoS) attacks, keylogging, and backdoors. In response to these threats, new effective techniques are needed to detect the presence of botnets. In this paper, we have used an interception technique to monitor Windows Application Programming Interface (API) function calls made by communication applications and store these calls with their arguments in log files. Our algorithm detects botnets based on monitoring abnormal activity by correlating the changes in log file sizes from different hosts.\nInspired by a growing interest in analyzing network data, we study the problem of node classification on graphs, focusing on approaches based on kernel machines. Conventionally, kernel machines are linear classifiers in the implicit feature space. We argue that linear classification in the feature space of kernels commonly used for graphs is often not enough to produce good results. When this is the case, one naturally considers nonlinear classifiers in the feature space. 
We show that repeating this process produces something we call \"deep kernel machines.\" We provide some examples where deep kernel machines can make a big difference in classification performance, and point out some connections to various recent literature on deep architectures in artificial intelligence and machine learning.\nThe construction of a reference ontology for a large domain still remains a hard human task. The process is sometimes assisted by software tools that facilitate information extraction from a textual corpus. Despite the widespread use of XML Schema files on the internet and especially in the B2B domain, tools that offer a complete semantic analysis of XML schemas are really rare. In this paper we introduce Janus, a tool for automatically building a reference knowledge base starting from XML Schema files. Janus also provides different useful views to simplify B2B application integration.\nThis work was inspired by the author's experience with telescope scheduling. The author's long-term goal is to develop and further extend software for an autonomous observatory. The software shall provide users with all the facilities they need to take scientific images of the night sky, cooperate with other autonomous observatories, and possibly more. This work shows how a genetic algorithm can be used for scheduling a single observatory as well as a network of observatories.\nThis paper presents an evaluation of the design decisions made in four state-of-the-art constraint solvers: Choco, ECLiPSe, Gecode, and Minion. To assess the impact of design decisions, instances of the five problem classes n-Queens, Golomb Ruler, Magic Square, Social Golfers, and Balanced Incomplete Block Design are modelled and solved with each solver. The results of the experiments are not meant to give an indication of the performance of a solver, but rather investigate what influence the choice of algorithms and data structures has.   
The analysis of the impact of the design decisions focuses on the different ways of memory management, behaviour with increasing problem size, and specialised algorithms for specific types of variables. It also briefly considers other, less significant decisions.\nMachine Consciousness is the study of consciousness in a biological, philosophical, mathematical and physical perspective and designing a model that can fit into a programmable system architecture. The prime objective of the study is to make the system architecture behave consciously like a biological model does. The present work has developed a feasible definition of consciousness that characterizes consciousness with four parameters, i.e., parasitic, symbiotic, self referral and reproduction. The present work has also developed a biologically inspired consciousness architecture that has the following layers: quantum layer, cellular layer, organ layer and behavioral layer, and has traced the characteristics of consciousness at each layer. Finally, the work has estimated physical and algorithmic architecture to devise a system that can behave consciously.\nCODEQ is a new, population-based meta-heuristic algorithm that is a hybrid of concepts from chaotic search, opposition-based learning, differential evolution and quantum mechanics. CODEQ has successfully been used to solve different types of problems (e.g. constrained, integer-programming, engineering) with excellent results. In this paper, CODEQ is used to train feed-forward neural networks. The proposed method is compared with particle swarm optimization and differential evolution algorithms on three data sets with encouraging results.\nThe conceptual modelling built from text is rarely an ontology. As a matter of fact, such a conceptualization is corpus-dependent and does not offer the main properties we expect from ontology. Furthermore, ontology extracted from text in general does not match ontology defined by experts using a formal language. 
It is not surprising since ontology is an extra-linguistic conceptualization whereas knowledge extracted from text is the concern of textual linguistics. Incompleteness of text and using rhetorical figures, like ellipsis, modify the perception of the conceptualization we may have. Ontological knowledge, which is necessary for text understanding, is not in general embedded into documents.\nWe extend the Chow-Liu algorithm for general random variables while the previous versions only considered finite cases. In particular, this paper applies the generalization to Suzuki's learning algorithm that generates from data forests rather than trees based on the minimum description length by balancing the fitness of the data to the forest and the simplicity of the forest. As a result, we successfully obtain an algorithm that applies when both Gaussian and finite random variables are present.\nTransforming constraint models is an important task in recent constraint programming systems. User-understandable models are defined during the modeling phase but rewriting or tuning them is mandatory to get solving-efficient models. We propose a new architecture allowing one to define bridges between any (modeling or solver) languages and to implement model optimizations. This architecture follows a model-driven approach where the constraint modeling process is seen as a set of model transformations. Among others, an interesting feature is the definition of transformations as concept-oriented rules, i.e. based on types of model elements where the types are organized into a hierarchy called a metamodel.\nRecently, new approaches to adaptive control have sought to reformulate the problem as a minimization of a relative entropy criterion to obtain tractable solutions. In particular, it has been shown that minimizing the expected deviation from the causal input-output dependencies of the true plant leads to a new promising stochastic control rule called the Bayesian control rule. 
This work proves the convergence of the Bayesian control rule under two sufficient assumptions: boundedness, which is an ergodicity condition; and consistency, which is an instantiation of the sure-thing principle.\nWe propose a learning algorithm for nonparametric estimation and on-line prediction for general stationary ergodic sources. We prepare histograms each of which estimates the probability as a finite distribution, and mix them with weights to construct an estimator. The whole analysis is based on measure theory. The estimator works whether the source is discrete or continuous. If it is stationary ergodic, then the measure theoretically given Kullback-Leibler information divided by the sequence length $n$ converges to zero as $n$ goes to infinity. In particular, for continuous sources, the method does not require the existence of a probability density function.\nWe analyze and evaluate an online gradient descent algorithm with adaptive per-coordinate adjustment of learning rates. Our algorithm can be thought of as an online version of batch gradient descent with a diagonal preconditioner. This approach leads to regret bounds that are stronger than those of standard online gradient descent for general online convex optimization problems. Experimentally, we show that our algorithm is competitive with state-of-the-art algorithms for large scale machine learning problems.\nGood old on-line back-propagation for plain multi-layer perceptrons yields a very low 0.35% error rate on the famous MNIST handwritten digits benchmark. All we need to achieve this best result so far are many hidden layers, many neurons per layer, numerous deformed training images, and graphics cards to greatly speed up learning.\nThe main workshop objective was to promote a holistic view and interdisciplinary methods for design, verification and co-ordination of aerospace systems, by combining formal methods with techniques from control engineering and artificial intelligence. 
The very demanding safety, robustness and performance requirements of these systems require unprecedented integration of heterogeneous techniques and models. The aim of FMA was to bring together active researchers from all the above areas to discuss and present their work.\nFormalism based on GA is an alternative to distributed representation models developed so far --- Smolensky's tensor product, Holographic Reduced Representations (HRR) and Binary Spatter Code (BSC). Convolutions are replaced by geometric products, interpretable in terms of geometry which seems to be the most natural language for visualization of higher concepts. This paper recalls the main ideas behind the GA model and investigates recognition test results using both inner product and a clipped version of matrix representation. The influence of accidental blade equality on recognition is also studied. Finally, the efficiency of the GA model is compared to that of previously developed models.\nThis paper discusses the knowledge integration of clinical information extracted from distributed medical ontology in order to ameliorate a machine learning-based multi-label coding assignment system. The proposed approach is implemented using a decision tree based cascade hierarchical technique on the university hospital data for patients with Coronary Heart Disease (CHD). The preliminary results obtained show a satisfactory finding.\nIn this paper, we have given an idea of area specification and its corresponding sensing of nodes in a dynamic network. We have applied the concept of Monte Carlo methods in this respect. We have cited certain statistical as well as artificial intelligence based techniques for realizing the position of a node. 
We have also applied the curve fitting concept for node detection and relative verification.\nWe consider the problem of estimating the topology of spatial interactions in a discrete state, discrete time spatio-temporal graphical model where the interactions affect the temporal evolution of each agent in a network. Among other models, the susceptible, infected, recovered ($SIR$) model for interaction events falls into this framework. We pose the problem as a structure learning problem and solve it using an $\\ell_1$-penalized likelihood convex program. We evaluate the solution on a simulated spread of infection over a complex network. Our topology estimates outperform those of a standard spatial Markov random field graphical model selection using $\\ell_1$-regularized logistic regression.\nProvenance, or information about the sources, derivation, custody or history of data, has been studied recently in a number of contexts, including databases, scientific workflows and the Semantic Web. Many provenance mechanisms have been developed, motivated by informal notions such as influence, dependence, explanation and causality. However, there has been little study of whether these mechanisms formally satisfy appropriate policies or even how to formalize relevant motivating concepts such as causality. We contend that mathematical models of these concepts are needed to justify and compare provenance techniques. In this paper we review a theory of causality based on structural models that has been developed in artificial intelligence, and describe work in progress on a causal semantics for provenance graphs.\nWe mark up a corpus of LaTeX lecture notes semantically and expose them as Linked Data in XHTML+MathML+RDFa. Our application makes the resulting documents interactively browsable for students. 
Our ontology helps to answer queries from students and lecturers, and paves the path towards an integration of our corpus with external sites.\nWe define an inference system to capture explanations based on causal statements, using an ontology in the form of an IS-A hierarchy. We first introduce a simple logical language which makes it possible to express that a fact causes another fact and that a fact explains another fact. We present a set of formal inference patterns from causal statements to explanation statements. We introduce an elementary ontology which gives greater expressiveness to the system while staying close to propositional reasoning. We provide an inference system that captures the patterns discussed, firstly in a purely propositional framework, then in a datalog (limited predicate) framework.\nWe introduce a class of neural networks derived from probabilistic models in the form of Bayesian networks. By imposing additional assumptions about the nature of the probabilistic models represented in the networks, we derive neural networks with standard dynamics that require no training to determine the synaptic weights, that perform accurate calculation of the mean values of the random variables, that can pool multiple sources of evidence, and that deal cleanly and consistently with inconsistent or contradictory evidence. The presented neural networks capture many properties of Bayesian networks, providing distributed versions of probabilistic models.\nSimple type theory is suited as framework for combining classical and non-classical logics. This claim is based on the observation that various prominent logics, including (quantified) multimodal logics and intuitionistic logics, can be elegantly embedded in simple type theory. Furthermore, simple type theory is sufficiently expressive to model combinations of embedded logics and it has a well understood semantics. 
Off-the-shelf reasoning systems for simple type theory exist that can be uniformly employed for reasoning within and about combinations of logics.\nWe report (to our knowledge) the first evaluation of Constraint Satisfaction as a computational framework for solving closest string problems. We show that careful consideration of symbol occurrences can provide search heuristics that provide several orders of magnitude speedup at and above the optimal distance. We also report (to our knowledge) the first analysis and evaluation -- using any technique -- of the computational difficulties involved in the identification of all closest strings for a given input set. We describe algorithms for web-scale distributed solution of closest string problems, both purely based on AI backtrack search and also hybrid numeric-AI methods.\nMany functions have been recently defined to assess the similarity among networks as tools for quantitative comparison. They stem from very different frameworks - and they are tuned for dealing with different situations. Here we show an overview of the spectral distances, highlighting their behavior in some basic cases of static and dynamic synthetic and real networks.\nWe consider the problem of reinforcement learning using function approximation, where the approximating basis can change dynamically while interacting with the environment. A motivation for such an approach is maximizing the value function fitness to the problem faced. Three errors are considered: approximation square error, Bellman residual, and projected Bellman residual. Algorithms under the actor-critic framework are presented, and shown to converge. The advantage of such an adaptive basis is demonstrated in simulations.\nPrograms to solve so-called constraint problems are complex pieces of software which require many design decisions to be made more or less arbitrarily by the implementer. These decisions affect the performance of the finished solver significantly. 
Once a design decision has been made, it cannot easily be reversed, although a different decision may be more appropriate for a particular problem.   We investigate using machine learning to make these decisions automatically depending on the problem to solve with the alldifferent constraint as an example. Our system is capable of making non-trivial, multi-level decisions that improve over always making a default choice.\nIn this paper the author presents a kind of Soft Computing Technique, namely an application of the fuzzy set theory of Prof. Zadeh [16], to a problem in Medical Expert Systems. The chosen problem is the design of a physician's decision model which can take crisp as well as fuzzy data as input, unlike the traditional models. The author presents a mathematical model based on fuzzy set theory for physician aided evaluation of a complete representation of information emanating from the initial interview including patient past history, present symptoms, and signs observed upon physical examination and results of clinical and diagnostic tests.\nIn this paper we present a novel genetic algorithm (GA) solution to a simple yet challenging commercial puzzle game known as the Zen Puzzle Garden (ZPG). We describe the game in detail, before presenting a suitable encoding scheme and fitness function for candidate solutions. We then compare the performance of the genetic algorithm with that of the A* algorithm. Our results show that the GA is competitive with informed search in terms of solution quality, and significantly out-performs it in terms of computational resource requirements. We conclude with a brief discussion of the implications of our findings for game solving and other \"real world\" problems.\nA research project aimed at the development of an automated theorem proving system was started in Kiev (Ukraine) in the early 1960s. The mastermind of the project, Academician V.Glushkov, baptized it \"Evidence Algorithm\", EA. 
The work on the project lasted, off and on, more than 40 years. In the framework of the project, the Russian and English versions of the System for Automated Deduction, SAD, were constructed. They may be already seen as powerful theorem-proving assistants.\nIn logic there is a clear concept of what constitutes a proof and what not. A proof is essentially defined as a finite sequence of formulae which are either axioms or derived by proof rules from formulae earlier in the sequence. Sociologically, however, it is more difficult to say what should constitute a proof and what not. In this paper we will look at different forms of proofs and try to clarify the concept of proof in the wider meaning of the term. This has implications on how proofs should be represented formally.\nA cellular automata (CA) configuration is constructed that exhibits emergent failover. The configuration is based on standard Game of Life rules. Gliders and glider-guns form the core messaging structure in the configuration. The blinker is represented as the basic computational unit, and it is shown how it can be recreated in case of a failure. Stateless failover using primary-backup mechanism is demonstrated. The details of the CA components used in the configuration and its working are described, and a simulation of the complete configuration is also presented.\nIn this work we start walking the path to a new perspective for viewing cyberwarfare scenarios, by introducing conceptual tools (a formal model) to evaluate the costs of an attack, to describe the theater of operations, targets, missions, actions, plans and assets involved in cyberwarfare attacks. 
We also describe two applications of this model: autonomous planning leading to automated penetration tests, and attack simulations, allowing a system administrator to evaluate the vulnerabilities of his network.\nMarkov decision processes (MDPs) are widely used for modeling decision-making problems in robotics, automated control, and economics. Traditional MDPs assume that the decision maker (DM) knows all states and actions. However, this may not be true in many situations of interest. We define a new framework, MDPs with unawareness (MDPUs) to deal with the possibilities that a DM may not be aware of all possible actions. We provide a complete characterization of when a DM can learn to play near-optimally in an MDPU, and give an algorithm that learns to play near-optimally when it is possible to do so, as efficiently as possible. In particular, we characterize when a near-optimal solution can be found in polynomial time.\nWe wish to present a mirrored language structure (MLS) and four logic rules determined by this structure for the model of a computable Oracle Turing machine. MLS has novel features that are of considerable biological and computational significance. It suggests an algorithm of relation learning and recognition (RLR) that enables the deterministic computers to simulate the mechanism of the Oracle Turing machine, or P = NP in a mathematical term.\nWe propose an online form of the cake cutting problem. This models situations where players arrive and depart during the process of dividing a resource. We show that well known fair division procedures like cut-and-choose and the Dubins-Spanier moving knife procedure can be adapted to apply to such online problems. 
We propose some desirable properties that online cake cutting procedures might possess like online forms of proportionality and envy-freeness, and identify which properties are in fact possessed by the different online cake procedures.\nCausality is a non-obvious concept that is often considered to be related to temporality. In this paper we present a number of past and present approaches to the definition of temporality and causality from philosophical, physical, and computational points of view. We note that time is an important ingredient in many relationships and phenomena. The topic is then divided into the two main areas of temporal discovery, which is concerned with finding relations that are stretched over time, and causal discovery, where a claim is made as to the causal influence of certain events on others. We present a number of computational tools used for attempting to automatically discover temporal and causal relations in data.\nThis paper develops automated testing and debugging techniques for answer set solver development. We describe a flexible grammar-based black-box ASP fuzz testing tool which is able to reveal various defects such as unsound and incomplete behavior, i.e. invalid answer sets and inability to find existing solutions, in state-of-the-art answer set solver implementations. Moreover, we develop delta debugging techniques for shrinking failure-inducing inputs on which solvers exhibit defective behavior. In particular, we develop a delta debugging algorithm in the context of answer set solving, and evaluate two different elimination strategies for the algorithm.\nAnswer set programming - the most popular problem solving paradigm based on logic programs - has been recently extended to support uninterpreted function symbols. All of these approaches have some limitation. In this paper we propose a class of programs called FP2 that enjoys a different trade-off between expressiveness and complexity. 
FP2 programs enjoy the following unique combination of properties: (i) the ability to express predicates with infinite extensions; (ii) full support for predicates with arbitrary arity; (iii) decidability of FP2 membership checking; (iv) decidability of skeptical and credulous stable model reasoning for call-safe queries. Odd cycles are supported by composing FP2 programs with argument restricted programs.\nThe long-standing identification problem for causal effects in graphical models has many partial results but lacks a systematic study. We show how computer algebra can be used to either prove that a causal effect can be identified, generically identified, or show that the effect is not generically identifiable. We report on the results of our computations for linear structural equation models, where we determine precisely which causal effects are generically identifiable for all graphs on three and four vertices.\nThis paper presents a solution to the threat problem of a VBIED (Vehicle-Borne Improvised Explosive Device) obtained with DSmT (Dezert-Smarandache Theory). This problem has been proposed recently to the authors by Simon Maskell and John Lavery as a typical illustrative example to try to compare the different approaches for dealing with uncertainty for decision-making support. The purpose of this paper is to show in detail how a solidly justified solution can be obtained from the DSmT approach and its fusion rules thanks to a proper modeling of the belief functions involved in this problem.\nMarkov models are extensively used in the analysis of molecular evolution. A recent line of research suggests that pairs of proteins with functional and physical interactions co-evolve with each other. Here, by analyzing hundreds of orthologous sets of three fungi and their co-evolutionary relations, we demonstrate that the co-evolutionary assumption may violate the Markov assumption. 
Our results encourage developing alternative probabilistic models for the cases of extreme co-evolution.\nThe approach of applying associative processor for decision making problem was proposed. It focuses on hardware implementations of fuzzy processing systems, associativity as effective management basis of fuzzy processor. The structural approach is being developed resulting in a quite simple and compact parallel associative memory unit (PAMU). The memory cost and speed comparison of processors with rigid and soft-variable structure is given. Also the example PAMU flashing is considered.\nIn context of efforts of composing category-theoretic and logical methods in the area of knowledge representation we propose the notion of conceptory. We consider intersection/union and other constructions in conceptories as expressive alternative to category-theoretic (co)limits and show they have features similar to (pro-, in-)jections. Then we briefly discuss approaches to development of formal systems built on the base of conceptories and describe possible application of such system to the specific ontology.\nA learning algorithm based on primary school teaching and learning is presented. The methodology is to continuously evaluate a student and to give them training on the examples for which they repeatedly fail, until, they can correctly answer all types of questions. This incremental learning procedure produces better learning curves by demanding the student to optimally dedicate their learning time on the failed examples. When used in machine learning, the algorithm is found to train a machine on a data with maximum variance in the feature space so that the generalization ability of the network improves. 
The algorithm has interesting applications in data mining, model evaluations and rare object discovery.\nThe World Wide Web (WWW) is the most popular global information sharing and communication system consisting of three standards, i.e., Uniform Resource Identifier (URI), Hypertext Transfer Protocol (HTTP) and Hypertext Mark-up Language (HTML). Information is provided in text, image, audio and video formats over the web by using HTML which is considered to be unconventional in defining and formalizing the meaning of the context...\nWe examine the practicality for a user of using Answer Set Programming (ASP) for representing logical formalisms. Our example is a formalism aiming at capturing causal explanations from causal information. We show the naturalness and relative efficiency of this translation job. We are interested in the ease of writing an ASP program. Because of limitations of the earlier systems, in practice the ``declarative aspect'' was more theoretical than practical. We show how recent improvements in working ASP systems facilitate the translation.\nKnowledge Management is a global process in companies. It includes all the processes that allow capitalization, sharing and evolution of the Knowledge Capital of the firm, generally recognized as a critical resource of the organization. Several approaches have been defined to capitalize knowledge but few of them study how to learn from this knowledge. We present in this paper an approach that helps to enhance learning from profession knowledge in an organisation. We apply our approach to the knitting industry.\nConstraint problems can be trivially solved in parallel by exploring different branches of the search tree concurrently. Previous approaches have focused on implementing this functionality in the solver, more or less transparently to the user. We propose a new approach, which modifies the constraint model of the problem. 
An existing model is split into new models with added constraints that partition the search space. Optionally, additional constraints are imposed that rule out the search already done. The advantages of our approach are that it can be implemented easily, computations can be stopped and restarted, moved to different machines and indeed solved on machines which are not able to communicate with each other at all.\nIt is well known that text compression can be achieved by predicting the next symbol in the stream of text data based on the history seen up to the current symbol. The better the prediction, the more skewed the conditional probability distribution of the next symbol and the shorter the codeword that needs to be assigned to represent this next symbol. What about the opposite direction? Suppose we have a black box that can compress a text stream. Can it be used to predict the next symbol in the stream? We introduce a criterion based on the length of the compressed data and use it to predict the next symbol. We examine empirically the prediction error rate and its dependency on some compression parameters.\nIn this paper we introduce a novel method for automatically tuning the search parameters of a chess program using genetic algorithms. Our results show that a large set of parameter values can be learned automatically, such that the resulting performance is comparable with that of manually tuned parameters of top tournament-playing chess programs.\nIt is a challenge for any Knowledge Base reasoning to manage ubiquitous uncertain ontology as well as uncertain updating times, while achieving acceptable service levels at minimum computational cost. This paper proposes application-independent merging of ontologies for any open interaction system. 
A solution that uses Multi-Entity Bayesian Networks with SWRL rules and a Java program is presented to dynamically monitor Exogenous and Endogenous temporal evolution on updating merging ontologies on a probabilistic framework for the Semantic Web.\nThe problem of measuring similarity of graphs and their nodes is important in a range of practical problems. There are a number of proposed measures, some of them being based on iterative calculation of similarity between two graphs and the principle that two nodes are as similar as their neighbors are. In our work, we propose a novel method of that sort, with a refined concept of similarity of two nodes that involves matching of their neighbors. We prove convergence of the proposed method and show that it has some additional desirable properties that, to our knowledge, the existing methods lack. We illustrate the method on two specific problems and empirically compare it to other methods.\nThe school choice problem concerns the design and implementation of matching mechanisms that produce school assignments for students within a given public school district. In this note we define a simple student-optimal criterion that is not met by any previously employed mechanism in the school choice literature. We then use this criterion to adapt a well-known combinatorial optimization technique (Hungarian algorithm) to the school choice problem.\nThe iDian (previously named the Operation Agent System) is a framework designed to enable computer users to operate software in natural language. Distinct from current speech-recognition systems, our solution supports format-free combinations of orders, and is open to both developers and customers. We used a multi-layer structure to build the entire framework, approached rule-based natural language processing, and implemented demos narrowing down to Windows, text-editing and a few other applications. 
This essay will first give an overview of the entire system, then scrutinize the functions and structure of the system, and finally discuss its prospective development, especially on-line interaction functions.\nSubstitutability, interchangeability and related concepts in Constraint Programming were introduced approximately twenty years ago and have given rise to considerable subsequent research. We survey this work, classify, and relate the different concepts, and indicate directions for future work, in particular with respect to making connections with research into symmetry breaking. This paper is a condensed version of a larger work in progress.\nAn important issue in Qualitative Spatial Reasoning is the representation of relative direction. In this paper we present simple geometric rules that enable reasoning about relative direction between oriented points. This framework, the Oriented Point Algebra OPRA_m, has a scalable granularity m. We develop a simple algorithm for computing the OPRA_m composition tables and prove its correctness. Using a composition table, algebraic closure for a set of OPRA statements is sufficient to solve spatial navigation tasks. It also turns out that scalable granularity is useful in these navigation tasks.\nThis paper considers the rule reduct generation problem based on Rough Set Theory. The Rule Reduct Generation (RG) and Modified Rule Generation (MRG) algorithms are well-known. As an alternative to these algorithms, the Pruning Algorithm of Generation A Minimal Set of Rule Reducts, or briefly the Pruning Rule Generation (PRG) algorithm, is developed. The PRG algorithm uses a tree-structured data type. The PRG algorithm is compared with the RG and MRG algorithms.\nThe cardinal direction calculus (CDC) proposed by Goyal and Egenhofer is a very expressive qualitative calculus for directional information of extended objects. 
Early work has shown that consistency checking of complete networks of basic CDC constraints is tractable while reasoning with the CDC in general is NP-hard. This paper shows, however, that if some constraints are allowed to be unspecified, then consistency checking of possibly incomplete networks of basic CDC constraints is already intractable. This draws a sharp boundary between the tractable and intractable subclasses of the CDC. The result is achieved by a reduction from the well-known 3-SAT problem.\nA Bayesian network is a complete model for the variables and their relationships, so it can be used to answer probabilistic queries about them. A Bayesian network can thus be considered a mechanism for automatically applying Bayes' theorem to complex problems. In the application of Bayesian networks, most of the work is related to probabilistic inferences. Updating any variable at any node of a Bayesian network might result in evidence propagation across the network. This paper sums up various inference techniques in Bayesian networks and provides guidance for the algorithm calculation in probabilistic inference in Bayesian networks.\nIn this paper, we propose a new approach for recommender systems based on target tracking by Kalman filtering. We assume that users and their seen resources are vectors in the multidimensional space of the categories of the resources. Knowing this space, we propose an algorithm based on a Kalman filter to track users and to predict their future position in the recommendation space.\nUnit resolution can simplify a CNF formula or detect an inconsistency by repeatedly assigning the variables occurring in unit clauses. Given any CNF formula sigma, we show that there exists a satisfiable CNF formula psi with size polynomially related to the size of sigma such that applying unit resolution to psi simulates all the effects of applying it to sigma. The formula psi is said to be the reified counterpart of sigma. 
This approach can be used to prove that the failed literal rule, which is an inference rule used by some SAT solvers, can be entirely simulated by unit resolution. More generally, it sheds new light on the expressive power of unit resolution.\nOur work has focused on support for film or television scriptwriting. Since this involves potentially varied story-lines, we note the implicit or latent support for interactivity. Furthermore, the film, television, games, publishing and other sectors are converging, so that cross-over and re-use of one form of product in another of these sectors is ever more common. Technically, our work has been largely based on mathematical algorithms for data clustering and display. Operationally, we also discuss how our algorithms can support collective, distributed problem-solving.\nThe Resource Description Framework (RDF) provides a common data model for the integration of \"real-time\" social and sensor data streams with the Web and with each other. While there exist numerous protocols and data formats for exchanging dynamic RDF data, or RDF updates, these options should be examined carefully in order to enable a Semantic Web equivalent of the high-throughput, low-latency streams of typical Web 2.0, multimedia, and gaming applications. This paper contains a brief survey of RDF update formats and a high-level discussion of both TCP and UDP-based transport protocols for updates. Its main contribution is the experimental evaluation of a UDP-based architecture which serves as a real-world example of a high-performance RDF streaming application in an Internet-scale distributed environment.\nThis paper describes an application of Bayesian programming to the control of an autonomous avatar in a multiplayer role-playing game (the example is based on World of Warcraft). We model a particular task, which consists of choosing what to do and selecting which target in a situation where allies and foes are present. 
We explain the model in Bayesian programming and show how we could learn the conditional probabilities from data gathered during human-played sessions.\nSNOMED Clinical Terms (SNOMED CT) is one of the most widespread ontologies in the life sciences, with more than 300,000 concepts and relationships, but is distributed with no associated software tools. In this paper we present MySNOM, a web-based SNOMED CT browser. MySNOM allows organizations to browse their own distribution of SNOMED CT under a controlled environment, focuses on navigating using the structure of SNOMED CT, and has diagramming capabilities.\nIn this paper we present a preliminary logic-based evaluation of the integration of post-composed phenotypic descriptions with domain ontologies. The evaluation has been performed using a description logic reasoner together with scalable techniques: ontology modularization and approximations of the logical difference between ontologies.\nFuzzy automata have long been accepted as a generalization of nondeterministic finite automata. A closer examination, however, shows that the fundamental property---nondeterminism---in nondeterministic finite automata has not been well embodied in the generalization. In this paper, we introduce nondeterministic fuzzy automata with or without $\\el$-moves and fuzzy languages recognized by them. Furthermore, we prove that (deterministic) fuzzy automata, nondeterministic fuzzy automata, and nondeterministic fuzzy automata with $\\el$-moves are all equivalent in the sense that they recognize the same class of fuzzy languages.\nWe explore phase transitions of plan modification, which mainly focus on the conformant planning problems. By analyzing features of plan modification in conformant planning problems, quantitative results are obtained. If the number of operators is less than, almost all conformant planning problems can't be solved with plan modification. 
If the number of operators is more than, almost all conformant planning problems can be solved with plan modification. The results of the experiments also show that there exists an experimental threshold of density (the ratio of the number of operators to the number of propositions), which separates the region where almost all conformant planning problems can't be solved with plan modification from the region where almost all conformant planning problems can be solved with plan modification.\nA new distance function dist(A,B) for fuzzy sets A and B is introduced. It is based on the descriptive complexity, i.e., the number of bits (on average) that are needed to describe an element in the symmetric difference of the two sets. The distance gives the amount of additional information needed to describe any one of the two sets given the other. We prove its mathematical properties and perform pattern clustering on data based on this distance.\nInterpolation is an important property of classical and many non-classical logics that has been shown to have interesting applications in computer science and AI. Here we study the Interpolation Property for the propositional version of the non-monotonic system of equilibrium logic, establishing weaker or stronger forms of interpolation depending on the precise interpretation of the inference relation. These results also yield a form of interpolation for ground logic programs under the answer set semantics. 
For disjunctive logic programs we also study the property of uniform interpolation that is closely related to the concept of variable forgetting.\nIn this paper we introduce a method for extending binary qualitative direction calculi with adjustable granularity like OPRAm or the star calculus with a granular distance concept. This method is similar to the concept of extending points with an internal reference direction to get oriented points which are the basic entities in the OPRAm calculus. Even if the spatial objects are, from a geometrical point of view, infinitesimally small points, locally available reference measures are attached. In the case of OPRAm, a reference direction is attached. The same principle also works with local reference distances, which are called elevations. The principle of attaching reference features to a point is called hidden feature attachment.\nMany important problems in discrete optimization require maximization of a monotonic submodular function subject to matroid constraints. For these problems, a simple greedy algorithm is guaranteed to obtain near-optimal solutions. In this article, we extend this classic result to a general class of adaptive optimization problems under partial observability, where each choice can depend on observations resulting from past choices. Specifically, we prove that a natural adaptive greedy algorithm provides a $1/(p+1)$ approximation for the problem of maximizing an adaptive monotone submodular function subject to $p$ matroid constraints, and more generally over arbitrary $p$-independence systems. We illustrate the usefulness of our result on a complex adaptive match-making application.\nThe paper addresses a new class of combinatorial problems which consist of restructuring solutions (as structures) in combinatorial optimization. Two main features of the restructuring process are examined: (i) a cost of the restructuring, (ii) a closeness to a goal solution. 
This problem corresponds to the redesign (improvement, upgrade) of modular systems or solutions. The restructuring approach is described and illustrated for the following combinatorial optimization problems: knapsack problem, multiple choice problem, assignment problem, spanning tree problems. Examples illustrate the restructuring processes.\nThis paper presents a new multi-objective hybrid model that combines the search strength of neighborhood methods, represented by tabu search (TS), with the exploration capacity of an evolutionary algorithm. This model was implemented and tested on benchmark functions (ZDT1, ZDT2, and ZDT3), using a network of computers.\nThis article is concerned with automated complexity analysis of term rewrite systems. Since these systems underlie much of declarative programming, time complexity of functions defined by rewrite systems is of particular interest. Among other results, we present a variant of the dependency pair method for analysing runtime complexities of term rewrite systems automatically. The established results significantly extend previously known techniques: we give examples of rewrite systems subject to our methods that could not previously be analysed automatically. Furthermore, the techniques have been implemented in the Tyrolean Complexity Tool. We provide ample numerical data for assessing the viability of the method.\nMAX-SAT heuristics normally operate from random initial truth assignments to the variables. We consider the use of what we call preambles, which are sequences of variables with corresponding single-variable assignment actions intended to be used to determine a more suitable initial truth assignment for a given problem instance and a given heuristic. 
For a number of well-established MAX-SAT heuristics and benchmark instances, we demonstrate that preambles can be evolved by a genetic algorithm such that the heuristics are outperformed in a significant fraction of the cases.\nThe study of the phase transition phenomenon of NP-complete problems plays an important role in understanding the nature of hard problems. In this paper, we follow this line of research by considering the problem of counting solutions of Constraint Satisfaction Problems (#CSP). We consider a random model, i.e., the RB model. We prove that a phase transition of #CSP does exist as the number of variables approaches infinity and that the critical values where phase transitions occur are precisely located. Preliminary experimental results also show that the critical point coincides with the theoretical derivation. Moreover, we propose an approximate algorithm to estimate the expected number of solutions of a given CSP instance of the RB model.\nAn algorithm running in O(1.1995^n) is presented for counting models for exact satisfiability formulae (#XSAT). This is faster than the previously best algorithm, which runs in O(1.2190^n). In order to improve the efficiency of the algorithm, a new principle, i.e., the common literals principle, is addressed to simplify formulae. This allows us to eliminate more common literals. In addition, we firstly inject the resolution principles into solving the #XSAT problem, which further improves the efficiency of the algorithm.\nThe rigorous theoretical analysis of the algorithm for a subclass of QSAT, i.e., (1, 2)-QSAT, has been proposed in the literature. (1, 2)-QSAT, first introduced in SAT'08, can be seen as quantified extended 2-CNF formulas. To our knowledge, there exists no algorithm presenting a worst-case upper bound for (1, 2)-QSAT. Therefore, in this paper, we present an exact algorithm to solve (1, 2)-QSAT. 
By analyzing the algorithms, we obtain a worst-case upper bound O(1.4142^m), where m is the number of clauses.\nThe global objective of this work is to provide practical optimization methods to companies involved in inventory routing problems, taking into account this new type of data. Also, companies are sometimes not able to deal with changing plans every period and would like to adopt regular structures for serving customers.\nThe process-based semantic composition of Web Services is gaining considerable momentum as an approach for the effective integration of distributed, heterogeneous, and autonomous applications. To compose Web Services semantically, we need an ontology. There are several ways of inserting semantics in Web Services. One of them consists of using description languages like OWL-S. In this paper, we introduce our work, which proposes a new model and uses semantic matching technology for the semantic and dynamic composition of ebXML business processes.\nThis paper considers multiprocessor task scheduling in a multistage hybrid flow-shop environment. The problem, even in its simplest form, is NP-hard in the strong sense. Besides its theoretical complexity, much of the interest in this problem is driven by the needs of various manufacturing and computing systems. We propose a new approach based on limited discrepancy search to solve the problem. Our method is tested with reference to a proposed lower bound as well as the best-known solutions in the literature. Computational results show that the developed approach is efficient, in particular for large-size problems.\nWe propose AllDiffPrecedence, a new global constraint that combines an AllDifferent constraint with precedence constraints that strictly order given pairs of variables. We identify a number of applications for this global constraint including instruction scheduling and symmetry breaking. 
We give an efficient propagation algorithm that enforces bounds consistency on this global constraint. We show how to implement this propagator using a decomposition that extends the bounds consistency enforcing decomposition proposed for the AllDifferent constraint. Finally, we prove that enforcing domain consistency on this global constraint is NP-hard in general.\nThe state of the art in local search for the Traveling Salesman Problem is dominated by ejection chain methods utilising the Stem-and-Cycle reference structure. Though effective, such algorithms employ very little information in their successor selection strategy, typically seeking only to minimise the cost of a move. We propose an alternative approach inspired by the AI literature and show how an admissible heuristic can be used to guide successor selection. We undertake an empirical analysis and demonstrate that this technique often produces better results than less informed strategies, albeit at the cost of running in higher polynomial time.\nThe competitive MNIST handwritten digit recognition benchmark has a long history of broken records since 1998. The most recent substantial improvement by others dates back 7 years (error rate 0.4%). Recently we were able to significantly improve this result, using graphics cards to greatly speed up training of simple but deep MLPs, which achieved 0.35%, outperforming all previous, more complex methods. Here we report another substantial improvement: 0.31% obtained using a committee of MLPs.\nNonmonotonic causal logic, introduced by Norman McCain and Hudson Turner, became a basis for the semantics of several expressive action languages. McCain's embedding of definite propositional causal theories into logic programming paved the way to the use of answer set solvers for answering queries about actions described in such languages. 
In this paper we extend this embedding to nondefinite theories and to first-order causal logic.\nIn the present paper, we propose a self-similar network theory for the basic understanding. By extending the natural languages to a kind of so-called ideally sufficient language, we can take a few steps toward investigating language searching and language understanding in AI. Image understanding and the familiarity of the brain with the surrounding environment are also discussed. Group effects are discussed by addressing the essence of the power of influences, and constructing the influence network of a society. We also give a discussion of inspirations.\nWe study uniform interpolation and forgetting in the description logic ALC. Our main results are model-theoretic characterizations of uniform interpolants and their existence in terms of bisimulations, tight complexity bounds for deciding the existence of uniform interpolants, an approach to computing interpolants when they exist, and tight bounds on their size. We use a mix of model-theoretic and automata-theoretic methods that, as a by-product, also provide characterizations of and decision procedures for conservative extensions.\nPattern learning is an important problem in Natural Language Processing (NLP). Some exhaustive pattern learning (EPL) methods (Bod, 1992) were proved to be flawed (Johnson, 2002), while similar algorithms (Och and Ney, 2004) showed great advantages on other tasks, such as machine translation. In this article, we first formalize EPL, and then show that the probability given by an EPL model is a constant-factor approximation of the probability given by an ensemble method that integrates an exponential number of models obtained with various segmentations of the training data. This work for the first time provides theoretical justification for the widely used EPL algorithm in NLP, which was previously viewed as a flawed heuristic method. 
Better understanding of EPL may lead to improved pattern learning algorithms in the future.\nWe present an approach to propagation-based solving, Boolean equi-propagation, where constraints are modelled as propagators of information about equalities between Boolean literals. Propagation-based solving applies this information as a form of partial evaluation, resulting in optimized SAT encodings. We demonstrate for a variety of benchmarks that our approach results in smaller CNF encodings and leads to speed-ups in solving times.\nIn this paper, we investigate the hybrid tractability of binary Quantified Constraint Satisfaction Problems (QCSPs). First, a basic tractable class of binary QCSPs is identified by using the broken-triangle property. In this class, the variable ordering for the broken-triangle property must be the same as that in the prefix of the QCSP. Second, we break this restriction to allow existentially quantified variables to be shifted within or out of their blocks, and thus identify some novel tractable classes by introducing the broken-angle property. Finally, we identify a more generalized tractable class, i.e., the min-of-max extendable class for QCSPs.\nA natural and established way to restrict the constraint satisfaction problem is to fix the relations that can be used to pose constraints; such a family of relations is called a constraint language. In this article, we study arc consistency, a heavily investigated inference method, and three extensions thereof from the perspective of constraint languages. We conduct a comparison of the studied methods on the basis of which constraint languages they solve, and we present new polynomial-time tractability results for singleton arc consistency, the most powerful method studied.\nWe present a first theoretical analysis of the power of polynomial-time preprocessing for important combinatorial problems from various areas in AI. 
We consider problems from Constraint Satisfaction, Global Constraints, Satisfiability, Nonmonotonic and Bayesian Reasoning. We show that, subject to a complexity-theoretic assumption, none of the considered problems can be reduced by polynomial-time preprocessing to a problem kernel whose size is polynomial in a structural problem parameter of the input, such as induced width or backdoor size. Our results provide a firm theoretical boundary for the performance of polynomial-time preprocessing algorithms for the considered problems.\nWe consider finite-horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others. We finally offer pseudopolynomial exact and approximation algorithms.\nDespite the prevalence of the Computational Theory of Mind and the Connectionist Model, the key principles of Cognitive Science remain controversial and inconclusive. This paper proposes the concept of Pattern Recognition as a Necessary and Sufficient Principle for general cognitive science modeling, in a very ambitious scientific proposal. A formal physical definition of the pattern recognition concept is also proposed to solve many key conceptual gaps in the field.\nWe present a novel Natural Evolution Strategy (NES) variant, the Rank-One NES (R1-NES), which uses a low-rank approximation of the search distribution covariance matrix. 
The algorithm allows computation of the natural gradient with cost linear in the dimensionality of the parameter space, and excels in solving high-dimensional non-separable problems, including the best result to date on the Rosenbrock function (512 dimensions).\nWe look more carefully at the modeling of causality using structural equations. It is clear that the structural equations can have a major impact on the conclusions we draw about causality. In particular, the choice of variables and their values can also have a significant impact on causality. These choices are, to some extent, subjective. We consider what counts as an appropriate choice. More generally, we consider what makes a model an appropriate model, especially if we want to take defaults into account, as was argued is necessary in recent work.\nWe propose a purely extensional semantics for higher-order logic programming. In this semantics program predicates denote sets of ordered tuples, and two predicates are equal iff they are equal as sets. Moreover, every program has a unique minimum Herbrand model which is the greatest lower bound of all Herbrand models of the program and the least fixed-point of an immediate consequence operator. We also propose an SLD-resolution proof procedure which is proven sound and complete with respect to the minimum model semantics. In other words, we provide a purely extensional theoretical framework for higher-order logic programming which generalizes the familiar theory of classical (first-order) logic programming.\nThis preliminary report addresses the expressive power of unit resolution regarding input data encoded with partial truth assignments of propositional variables. A characterization of the functions that are computable in this way, which we propose to call propagatable functions, is given. 
By establishing that propagatable functions can also be computed using monotone circuits, we show that there exist propagatable functions of polynomial time complexity that require an exponential number of clauses to be computed using unit resolution. These results shed new light on studying CNF encodings of NP-complete problems in order to solve them using propositional satisfiability algorithms.\nA sound and complete embedding of conditional logics into classical higher-order logic is presented. This embedding enables the application of off-the-shelf higher-order automated theorem provers and model finders for reasoning within and about conditional logics.\nIndividuals have an intuitive perception of what makes a good coincidence. Though the sensitivity to coincidences has often been presented as resulting from an erroneous assessment of probability, it appears to be a genuine competence, based on non-trivial computations. The model presented here suggests that coincidences occur when subjects perceive complexity drops. Co-occurring events are, together, simpler than if considered separately. This model leads to a possible redefinition of subjective probability.\nIn this paper we explore a symmetry-based search space reduction technique which can speed up optimal pathfinding on undirected uniform-cost grid maps by up to 38 times. Our technique decomposes grid maps into a set of empty rectangles, removing from each rectangle all interior nodes and possibly some from along the perimeter. We then add a series of macro-edges between selected pairs of remaining perimeter nodes to facilitate provably optimal traversal through each rectangle. We also develop a novel online pruning technique to further speed up search. 
Our algorithm is fast, memory-efficient and retains the same optimality and completeness guarantees as searching on an unmodified grid map.\nThe study of opinions, their formation and change, is one of the defining topics addressed by social psychology, but in recent years other disciplines, such as computer science and complexity, have addressed this challenge. Despite the flourishing of different models and theories in both fields, several key questions remain unanswered. The aim of this paper is to challenge the current theories on opinion by putting forward a cognitively grounded model where opinions are described as specific mental representations whose main properties are put forward. A comparison with reputation will also be presented.\nThe cognitive research on reputation has shown several interesting properties that can improve both the quality of services and the security in distributed electronic environments. In this paper, the impact of reputation on decision-making under scarcity of information will be shown. First, a cognitive theory of reputation will be presented, then a selection of simulation experimental results from different studies will be discussed. Such results concern the benefits of reputation when agents need to find good sellers in a virtual market-place under uncertainty and informational cheating.\nWe present a method for estimating pose information from a single depth image given an arbitrary kinematic structure without prior training. For an arbitrary skeleton and depth image, an evolutionary algorithm is used to find the optimal kinematic configuration to explain the observed image. Results show that our approach can correctly estimate poses of 39 and 78 degree-of-freedom models from a single depth image, even in cases of significant self-occlusion.\nThe Dominance-based Rough Set Approach (DRSA), as an extension of Pawlak's Rough Set theory, is effective and fundamentally important in Multiple Criteria Decision Analysis (MCDA). 
In previous DRSA models, the definitions of the upper and lower approximations are preserving the class unions rather than the singleton class. In this paper, we propose a new Class-based Rough Approximation with respect to a series of previous DRSA models, including Classical DRSA model, VC-DRSA model and VP-DRSA model. In addition, the new class-based reducts are investigated.\nThe Dempster-Shafer theory of evidence accumulation is one of the main tools for combining data obtained from multiple sources. In this paper a special case of combination of two bodies of evidence with non-zero conflict coefficient is considered. It is shown that application of the Dempster-Shafer rule of combination in this case leads to an evaluation of masses of the combined bodies that is different from the evaluation of the corresponding probabilities obtained by application of the law of total probability. This finding supports the view that probabilistic interpretation of results of the Dempster-Shafer analysis in the general case is not appropriate.\nUsing the probability theory-based approach, this paper reveals the equivalence of an arbitrary NP-complete problem to a problem of checking whether a level set of a specifically constructed harmonic cost function (with all diagonal entries of its Hessian matrix equal to zero) intersects with a unit hypercube in many-dimensional Euclidean space. This connection suggests the possibility that methods of continuous mathematics can provide crucial insights into the most intriguing open questions in modern complexity theory.\nWe present in this paper our law that there is always a connection present between two entities, with a selfconnection being present at least in each node. An entity is an object, physical or imaginary, that is connected by a path (or connection) and which is important for achieving the desired result of the scenario. 
In machine learning, we state that for any scenario, a subject entity is always, directly or indirectly, connected and affected by single or multiple independent / dependent entities, and their impact on the subject entity is dependent on various factors falling into the categories such as the existenc\nIn this paper we propose task swapping networks for task reassignments by using task swappings in distributed systems. Some classes of task reassignments are achieved by using iterative local task swappings between software agents in distributed systems. We use group-theoretic methods to find a minimum-length sequence of adjacent task swappings needed from a source task assignment to a target task assignment in a task swapping network of several well-known topologies.\nThis paper gives a survey on the current state of Web Service Compositions and the difficulties and solutions to automated Web Service Compositions. This first gives a definition of Web Service Composition and the motivation and goal of it. It then explores why we need automated Web Service Compositions and formally defines the domains. Techniques and solutions are proposed by the papers we surveyed to solve the current difficulty of automated Web Service Composition. Verification and future work are discussed at the end to further extend the topic.\nRule-Based Systems have been in use for decades to solve a variety of problems but not in the sensor informatics domain. Rules aid the aggregation of low-level sensor readings to form a more complete picture of the real world and help to address 10 identified challenges for sensor network middleware. This paper presents the reader with an overview of a system architecture and a pilot application to demonstrate the usefulness of a system integrating rules with sensor middleware.\nRecently there have been some unexpected results concerning Fuzzy Description Logics (FDLs) with General Concept Inclusions (GCIs). 
They show that, unlike the classical case, the DL ALC with GCIs does not have the finite model property under Lukasiewicz Logic or Product Logic and, specifically, knowledge base satisfiability is an undecidable problem for Product Logic. We complete here the analysis by showing that knowledge base satisfiability is also an undecidable problem for Lukasiewicz Logic.\nWe present a new system for simultaneous estimation of keys, chords, and bass notes from music audio. It makes use of a novel chromagram representation of audio that takes perception of loudness into account. Furthermore, it is fully based on machine learning (instead of expert knowledge), such that it is potentially applicable to a wider range of genres as long as training data is available. As compared to other models, the proposed system is fast and memory efficient, while achieving state-of-the-art performance.\nBoosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the \"correct\" requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements.\nWe discuss the evolution of aspects of nonmonotonic reasoning towards the computational paradigm of answer-set programming (ASP). We give a general overview of the roots of ASP and follow up with the personal perspective on research developments that helped verbalize the main principles of ASP and differentiated it from the classical logic programming.\nWe present a system capable of automatically solving combinatorial logic puzzles given in (simplified) English. 
It involves translating the English descriptions of the puzzles into answer set programming (ASP) and using ASP solvers to provide solutions of the puzzles. To translate the descriptions, we use a lambda-calculus based approach using Probabilistic Combinatorial Categorial Grammars (PCCG) where the meanings of words are associated with parameters to be able to distinguish between multiple meanings of the same word. Meaning of many words and the parameters are learned. The puzzles are represented in ASP using an ontology which is applicable to a large set of logic puzzles.\nWe present an analysis of the performance of an elitist Evolutionary algorithm using a recombination operator known as 1-Bit-Swap on the Royal Roads test function based on a population. We derive complete, approximate and asymptotic convergence rates for the algorithm. The complete model shows the benefit of the size of the population and recombination pool.\nA prototype system is described whose core functionality is, based on propositional logic, the elimination of second-order operators, such as Boolean quantifiers and operators for projection, forgetting and circumscription. This approach allows to express many representational and computational tasks in knowledge representation - for example computation of abductive explanations and models with respect to logic programming semantics - in a uniform operational system, backed by a uniform classical semantic framework.\nWe present a framework which constructs an event-style discourse semantics. The discourse dynamics are encoded in continuation semantics and various rhetorical relations are embedded in the resulting interpretation of the framework. We assume discourse and sentence are distinct semantic objects, that play different roles in meaning evaluation. Moreover, two sets of composition functions, for handling different discourse relations, are introduced. 
The paper first gives the necessary background and motivation for event and dynamic semantics, then the framework with detailed examples will be introduced.\nThis article presents an extension of Minimalist Categorial Grammars (MCG) to encode Chomsky's phases. These grammars are based on Partially Commutative Logic (PCL) and encode properties of Minimalist Grammars (MG) of Stabler. The first implementation of MCG used both non-commutative properties (to respect the linear word order in an utterance) and commutative ones (to model features of different constituents). Here, we propose adding Chomsky's phases with the non-commutative tensor product of the logic. Then we could give account of the PIC just by using logical properties of the framework.\nWe present a constraint-based approach to interactive product configuration. Our configurator tool FdConfig is based on feature models for the representation of the product domain. Such models can be directly mapped into constraint satisfaction problems and dealt with by appropriate constraint solvers. During the interactive configuration process the user generates new constraints as a result of his configuration decisions and even may retract constraints posted earlier. We discuss the configuration process, explain the underlying techniques and show optimizations.\nIn this paper we present a proposal for a knowledge-based programming environment. In such an environment, declarative background knowledge, procedures, and concrete data are represented in suitable languages and combined in a flexible manner. This leads to a highly declarative programming style. We illustrate our approach on an example and report about our prototype implementation.\nIn order to give appropriate semantics to qualitative conditionals of the form \"if A then normally B\", ordinal conditional functions (OCFs) ranking the possible worlds according to their degree of plausibility can be used. 
An OCF accepting all conditionals of a knowledge base R can be characterized as the solution of a constraint satisfaction problem. We present a high-level, declarative approach using constraint logic programming techniques for solving this constraint satisfaction problem. In particular, the approach developed here supports the generation of all minimal solutions; these minimal solutions are of special interest as they provide a basis for model-based inference from R.\nPublishing private data on external servers incurs the problem of how to avoid unwanted disclosure of confidential data. We study a problem of confidentiality in extended disjunctive logic programs and show how it can be solved by extended abduction. In particular, we analyze how credulous non-monotonic reasoning affects confidentiality.\nIn this paper a proof system is developed for plan verification problems $\\{X\\}c\\{Y\\}$ and $\\{X\\}c\\{KW p\\}$ under 0-approximation semantics for ${\\mathcal A}_K$. Here, for a plan $c$, two sets $X,Y$ of fluent literals, and a literal $p$, $\\{X\\}c\\{Y\\}$ (resp. $\\{X\\}c\\{KW p\\}$) means that all literals of $Y$ become true (resp. $p$ becomes known) after executing $c$ in any initial state in which all literals in $X$ are true. Then, soundness and completeness are proved. The proof system allows verifying plans and generating plans as well.\nIn this paper, we present domain-specific languages (DSLs) that we devised for their use in the implementation of a finite domain constraint programming system, available as library(clpfd) in SWI-Prolog and YAP-Prolog. These DSLs are used in propagator selection and constraint reification. In these areas, they lead to concise specifications that are easy to read and reason about. At compilation time, these specifications are translated to Prolog code, reducing interpretative run-time overheads. 
The devised languages can be used in the implementation of other finite domain constraint solvers as well and may contribute to their correctness, conciseness and efficiency.\nIn this work a stand-alone preprocessor for SAT is presented that is able to perform most of the known preprocessing techniques. Preprocessing a formula in SAT is important for performance since redundancy can be removed. The preprocessor is part of the SAT solver riss and is called Coprocessor. Not only riss, but also MiniSat 2.2 benefit from it, because the SatELite preprocessor of MiniSat does not implement recent techniques. By using more advanced techniques, Coprocessor is able to reduce the redundancy in a formula further and improves the overall solving performance.\nTransfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms. A simple and effective approach is to transfer samples from source tasks and include them into the training set used to solve a given target task. In this paper, we investigate the theoretical properties of this transfer method and we introduce novel algorithms adapting the transfer process on the basis of the similarity between source and target tasks. Finally, we report illustrative experimental results in a continuous chain problem.\nThe paper concerns selected rule modularization techniques. Three visual methods for inference specification for modularized rule-bases are described: Drools Flow, BPMN and XTT2. Drools Flow is a popular technology for workflow or process modeling, BPMN is an OMG standard for modeling business processes, and XTT2 is a hierarchical tabular system specification method. Because of some limitations of these solutions, several proposals of their integration are given.\nEvolutionary Multi-Objective Optimization is becoming a hot research area and quite a few papers regarding these algorithms have been published. 
However the role of local search techniques has not been expanded adequately. This paper studies the role of a local search technique called 2-opt for the Multi-Objective Travelling Salesman Problem (MOTSP). A new mutation operator called Jumping Gene (JG) is also used. Since 2-opt operator was intended for the single objective TSP, its domain has been expanded to MOTSP in this paper. This new technique is applied to the list of KroAB100 cities.\nThe conceptual knowledge framework OML/CKML needs several components for a successful design. One important, but previously overlooked, component is the central core of OML/CKML. The central core provides a theoretical link between the ontological specification in OML and the conceptual knowledge representation in CKML. This paper discusses the formal semantics and syntactic styles of the central core, and also the important role it plays in defining interoperability between OML/CKML, RDF/S and Ontolingua.\nAutomating the constraint modelling process is one of the key challenges facing the constraints field, and one of the principal obstacles preventing widespread adoption of constraint solving. This paper focuses on the refinement-based approach to automated modelling, where a user specifies a problem in an abstract constraint specification language and it is then automatically refined into a constraint model. In particular, we revisit the Conjure system that first appeared in prototype form in 2005 and present a new implementation with a much greater coverage of the specification language Essence.\nConcept Analysis provides a principled approach to effective management of wide area information systems, such as the Nebula File System and Interface. 
This not only offers evidence to support the assertion that a digital library is a bounded collection of incommensurate information sources in a logical space, but also sheds light on techniques for collaboration through coordinated access to the shared organization of knowledge.\nThe article presents a study on the biobjective inventory routing problem. Contrary to most previous research, the problem is treated as a true multi-objective optimization problem, with the goal of identifying Pareto-optimal solutions. Due to the hardness of the problem at hand, a reference point based optimization approach is presented and implemented into an optimization and decision support system, which allows for the computation of a true subset of the optimal outcomes. Experimental investigations involving local search metaheuristics are conducted on benchmark data, and numerical results are reported and analyzed.\nIn this paper we demonstrate that two common problems in Machine Learning---imbalanced and overlapping data distributions---do not have independent effects on the performance of SVM classifiers. This result is notable since it shows that a model of either of these factors must account for the presence of the other. Our study of the relationship between these problems has led to the discovery of a previously unreported form of \"covert\" overfitting which is resilient to commonly used empirical regularization techniques. We demonstrate the existence of this covert phenomenon through several methods based around the parametric regularization of trained SVMs. Our findings in this area suggest a possible approach to quantifying overlap in real world data sets.\nSeveral rules for social choice are examined from a unifying point of view that looks at them as procedures for revising a system of degrees of belief in accordance with certain specified logical constraints. 
Belief is here a social attribute, its degrees being measured by the fraction of people who share a given opinion. Different known rules and some new ones are obtained depending on which particular constraints are assumed. These constraints allow us to model different notions of choiceness. In particular, we give a new method to deal with approval-disapproval-preferential voting.\nWe investigate training and using Gaussian kernel SVMs by approximating the kernel with an explicit finite-dimensional polynomial feature representation based on the Taylor expansion of the exponential. Although not as efficient as the recently-proposed random Fourier features [Rahimi and Recht, 2007] in terms of the number of features, we show how this polynomial representation can provide a better approximation in terms of the computational cost involved. This makes our \"Taylor features\" especially attractive for use on very large data sets, in conjunction with online or stochastic training.\nIn voting contexts, some new candidates may show up in the course of the process. In this case, we may want to determine which of the initial candidates are possible winners, given that a fixed number $k$ of new candidates will be added. We give a computational study of this problem, focusing on scoring rules, and we provide a formal comparison with related problems such as control via adding candidates or cloning.\nWe show that estimating the complexity (mean and distribution) of the instances of a fixed size Constraint Satisfaction Problem (CSP) can be very hard. We deal with the main two aspects of the problem: defining a measure of complexity and generating random unbiased instances. For the first problem, we rely on a general framework and a measure of complexity we presented at CISSE08. 
For the generation problem, we restrict our analysis to the Sudoku example and we provide a solution that also explains why it is so difficult.\nThis paper presents an algorithm for learning a highly redundant inverse model in continuous and non-preset environments. Our Socially Guided Intrinsic Motivation by Demonstrations (SGIM-D) algorithm combines the advantages of both social learning and intrinsic motivation, to specialise in a wide range of skills, while lessening its dependence on the teacher. SGIM-D is evaluated on a fishing skill learning experiment.\nIn this paper, the continuity and strong continuity in domain-free information algebras and labeled information algebras are introduced respectively. A more general concept of continuous function which is defined between two domain-free continuous information algebras is presented. It is shown that, with the operations combination and focusing, the set of all continuous functions between two domain-free s-continuous information algebras forms a new s-continuous information algebra. By studying the relationship between domain-free information algebras and labeled information algebras, it is demonstrated that they do correspond to each other on s-compactness.\nThis paper is the continuation of our research work about linguistic truth-valued concept lattice. In order to provide a mathematical tool for mining tacit knowledge, we establish a concrete model of 6-ary linguistic truth-valued concept lattice and introduce a mining algorithm through the structure consistency. Specifically, we utilize the attributes to depict knowledge, propose the 6-ary linguistic truth-valued attribute extended context and congener context to characterize tacit knowledge, and research the necessary and sufficient conditions of forming tacit knowledge. 
We respectively give the algorithms of generating the linguistic truth-valued congener context and constructing the linguistic truth-valued concept lattice.\nWe review some existing methods for the computation of first order moments on junction trees using Shafer-Shenoy algorithm. First, we consider the problem of first order moments computation as vertices problem in junction trees. In this way, the problem is solved using the memory space of an order of the junction tree edge-set cardinality. After that, we consider two algorithms, Lauritzen-Nilsson algorithm, and Mau\\'a et al. algorithm, which computes the first order moments as the normalization problem in junction tree, using the memory space of an order of the junction tree leaf-set cardinality.\nAlthough the CSP (constraint satisfaction problem) is NP-complete, even in the case when all constraints are binary, certain classes of instances are tractable. We study classes of instances defined by excluding subproblems. This approach has recently led to the discovery of novel tractable classes. The complete characterisation of all tractable classes defined by forbidding patterns (where a pattern is simply a compact representation of a set of subproblems) is a challenging problem. We demonstrate a dichotomy in the case of forbidden patterns consisting of either one or two constraints. This has allowed us to discover new tractable classes including, for example, a novel generalisation of 2SAT.\nWe present a technique for the animation of a 3D kinematic tongue model, one component of the talking head of an acoustic-visual (AV) speech synthesizer. The skeletal animation approach is adapted to make use of a deformable rig controlled by tongue motion capture data obtained with electromagnetic articulography (EMA), while the tongue surface is extracted from volumetric magnetic resonance imaging (MRI) data. 
Initial results are shown and future work outlined.\nThis paper provides a self-contained first introduction to description logics (DLs). The main concepts and features are explained with examples before syntax and semantics of the DL SROIQ are defined in detail. Additional sections review light-weight DL languages, discuss the relationship to the Web Ontology Language OWL and give pointers to further reading.\nA resistive memory network that has no crossover wiring is proposed to overcome the hardware limitations to size and functional complexity that are associated with conventional analogue neural networks. The proposed memory network is based on simple network cells that are arranged in a hierarchical modular architecture. Cognitive functionality of this network is demonstrated by an example of character recognition. The network is trained by an evolutionary process to completely recognise characters deformed by random noise, rotation, scaling and shifting.\nThe proposed feature selection method builds a histogram of the most stable features from random subsets of a training set and ranks the features based on a classifier based cross-validation. This approach reduces the instability of features obtained by conventional feature selection methods that occur with variation in training data and selection criteria. Classification results on four microarray and three image datasets using three major feature selection criteria and a naive Bayes classifier show considerable improvement over benchmark results.\nIn this paper we propose two new algorithms based on biclustering analysis, which can be used as the basis of a recommender system for educational orientation of Russian School graduates. The first algorithm was designed to help students make a choice between different university faculties when some of their preferences are known. The second algorithm was developed for the special situation when nothing is known about their preferences. 
The final version of this recommender system will be used by Higher School of Economics.\nApplied ontology is a relatively new field which aims to apply theories and methods from diverse disciplines such as philosophy, cognitive science, linguistics and formal logics to perform or improve domain-specific tasks. To support the development of effective research methodologies for applied ontology, we critically discuss the question how its research results should be evaluated. We propose that results in applied ontology must be evaluated within their domain of application, based on some ontology-based task within the domain, and discuss quantitative measures which would facilitate the objective evaluation and comparison of research results in applied ontology.\nHierarchical problem abstraction, when applicable, may offer exponential reductions in computational complexity. Previous work on coarse-to-fine dynamic programming (CFDP) has demonstrated this possibility using state abstraction to speed up the Viterbi algorithm. In this paper, we show how to apply temporal abstraction to the Viterbi problem. Our algorithm uses bounds derived from analysis of coarse timescales to prune large parts of the state trellis at finer timescales. We demonstrate improvements of several orders of magnitude over the standard Viterbi algorithm, as well as significant speedups over CFDP, for problems whose state variables evolve at widely differing rates.\nPrediction markets provide an efficient means to assess uncertain quantities from forecasters. Traditional and competitive strictly proper scoring rules have been shown to incentivize players to provide truthful probabilistic forecasts. However, we show that when those players can cooperate, these mechanisms can instead discourage them from reporting what they really believe. 
When players with different beliefs are able to cooperate and form a coalition, these mechanisms admit arbitrage and there is a report that will always pay coalition members more than their truthful forecasts. If the coalition were created by an intermediary, such as a web portal, the intermediary would be guaranteed a profit.\nThe problem of learning the structure of Bayesian networks from complete discrete data with a limit on parent set size is considered. Learning is cast explicitly as an optimisation problem where the goal is to find a BN structure which maximises log marginal likelihood (BDe score). Integer programming, specifically the SCIP framework, is used to solve this optimisation problem. Acyclicity constraints are added to the integer program (IP) during solving in the form of cutting planes. Finding good cutting planes is the key to the success of the approach; the search for such cutting planes is effected using a sub-IP. Results show that this is a particularly fast method for exact BN learning.\nMarkov control algorithms that perform smooth, non-greedy updates of the policy have been shown to be very general and versatile, with policy gradient and Expectation Maximisation algorithms being particularly popular. For these algorithms, marginal inference of the reward weighted trajectory distribution is required to perform policy updates. We discuss a new exact inference algorithm for these marginals in the finite horizon case that is more efficient than the standard approach based on classical forward-backward recursions. We also provide a principled extension to infinite horizon Markov Decision Problems that explicitly accounts for an infinite horizon. 
This extension provides a novel algorithm for both policy gradients and Expectation Maximisation in infinite horizon problems.\nThis paper presents an approach for learning to translate simple narratives, i.e., texts (sequences of sentences) describing dynamic systems, into coherent sequences of events without the need for labeled training data. Our approach incorporates domain knowledge in the form of preconditions and effects of events, and we show that it outperforms state-of-the-art supervised learning systems on the task of reconstructing RoboCup soccer games from their commentaries.\nWe consider how to use the Bellman residual of the dynamic programming operator to compute suboptimality bounds for solutions to stochastic shortest path problems. Such bounds have been previously established only in the special case that \"all policies are proper,\" in which case the dynamic programming operator is known to be a contraction, and have been shown to be easily computable only in the more limited special case of discounting. Under the condition that transition costs are positive, we show that suboptimality bounds can be easily computed even when not all policies are proper. In the general case when there are no restrictions on transition costs, the analysis is more complex. But we present preliminary results that show such bounds are possible.\nWe present theoretical results in terms of lower and upper bounds on the query complexity of noisy search with comparative feedback. In this search model, the noise in the feedback depends on the distance between query points and the search target. Consequently, the error probability in the feedback is not fixed but varies for the queries posed by the search algorithm. Our results show that a target out of n items can be found in O(log n) queries. 
We also show the surprising result that for k possible answers per query, the speedup is not log k (as for k-ary search) but only log log k in some cases.\nMarginal MAP problems are notoriously difficult tasks for graphical models. We derive a general variational framework for solving marginal MAP problems, in which we apply analogues of the Bethe, tree-reweighted, and mean field approximations. We then derive a \"mixed\" message passing algorithm and a convergent alternative using CCCP to solve the BP-type approximations. Theoretically, we give conditions under which the decoded solution is a global or local optimum, and obtain novel upper bounds on solutions. Experimentally we demonstrate that our algorithms outperform related approaches. We also show that EM and variational EM comprise a special case of our framework.\nIn this paper, we develop a qualitative theory of influence diagrams that can be used to model and solve sequential decision making tasks when only qualitative (or imprecise) information is available. Our approach is based on an order-of-magnitude approximation of both probabilities and utilities and allows for specifying partially ordered preferences via sets of utility values. We also propose a dedicated variable elimination algorithm that can be applied for solving order-of-magnitude influence diagrams.\nWe demonstrate a limitation of discounted expected utility, a standard approach for representing the preference to risk when future cost is discounted. Specifically, we provide an example of the preference of a decision maker that appears to be rational but cannot be represented with any discounted expected utility. A straightforward modification to discounted expected utility leads to inconsistent decision making over time. 
We will show that an iterated risk measure can represent the preference that cannot be represented by any discounted expected utility and that the decisions based on the iterated risk measure are consistent over time.\nTraditional economic models typically treat private information, or signals, as generated from some underlying state. Recent work has explicated alternative models, where signals correspond to interpretations of available information. We show that the difference between these formulations can be sharply cast in terms of causal dependence structure, and employ graphical models to illustrate the distinguishing characteristics. The graphical representation supports inferences about signal patterns in the interpreted framework, and suggests how results based on the generated model can be extended to more general situations. Specific insights about bidding games in classical auction mechanisms derive from qualitative graphical models.\nWe introduce a rich class of graphical models for multi-armed bandit problems that permit both the state or context space and the action space to be very large, yet succinctly specify the payoffs for any context-action pair. Our main result is an algorithm for such models whose regret is bounded by the number of parameters and whose running time depends only on the treewidth of the graph substructure induced by the action space.\nIn many situations, Miniature Aerial Vehicles (MAVs) are limited to using only on-board sensors for navigation. This limits the data available to algorithms used for stabilization and localization, and current control methods are often insufficient to allow reliable hovering in place or trajectory following. In this research, we explore using machine learning to predict the drift (flight path errors) of an MAV while executing a desired flight path. 
This predicted drift will allow the MAV to adjust its flight path to maintain a desired course.\nIn this article we present an Elitism Levels Traverse Mechanism that we designed to find bounds on population-based Evolutionary algorithms solving unimodal functions. We prove its efficiency theoretically and test it on the OneMax function, deriving bounds cμn log n - O(μn). This analysis can be generalized to any similar algorithm using variants of tournament selection and genetic operators that flip or swap only 1 bit in each string.\nThe purpose of statistical disclosure control (SDC) of microdata, a.k.a. data anonymization or privacy-preserving data mining, is to publish data sets containing the answers of individual respondents in such a way that the respondents corresponding to the released records cannot be re-identified and the released data are analytically useful. SDC methods are either based on masking the original data, generating synthetic versions of them or creating hybrid versions by combining original and synthetic data. The choice of SDC methods for categorical data, especially nominal data, is much smaller than the choice of methods for numerical data. We mitigate this problem by introducing a numerical mapping for hierarchical nominal data which allows computing means, variances and covariances on them.\nContinuous logic extends the multi-valued Lukasiewicz logic by adding a halving operator on propositions. This extension is designed to give a more satisfactory model theory for continuous structures. The semantics of these logics can be given using specialisations of algebraic structures known as hoops. 
As part of an investigation into the metatheory of propositional continuous logic, we were indebted to Prover9 for finding a proof of an important algebraic law.\nA new probabilistic methodology for transmission expansion planning (TEP) that does not require a priori specification of new/additional transmission capacities and uses the concept of social welfare has been proposed. Two new concepts have been introduced in this paper: (i) roulette wheel methodology has been used to calculate the capacity of new transmission lines and (ii) load flow analysis has been used to calculate expected demand not served (EDNS). The overall methodology has been implemented on a modified IEEE 5-bus test system. Simulations show an important result: addition of only new transmission lines is not sufficient to minimize EDNS.\nWe propose a simple method for combining together voting rules that performs a run-off between the different winners of each voting rule. We prove that this combinator has several good properties. For instance, even if just one of the base voting rules has a desirable property like Condorcet consistency, the combination inherits this property. In addition, we prove that combining voting rules together in this way can make finding a manipulation more computationally difficult. Finally, we study the impact of this combinator on approximation methods that find close-to-optimal manipulations.\nWe introduce Gaussian Process Topic Models (GPTMs), a new family of topic models which can leverage a kernel among documents while extracting correlated topics. GPTMs can be considered a systematic generalization of the Correlated Topic Models (CTMs) using ideas from Gaussian Process (GP) based embedding. Since GPTMs work with both a topic covariance matrix and a document kernel matrix, learning GPTMs involves a novel component: solving a suitable Sylvester equation capturing both topic and document dependencies. 
The efficacy of GPTMs is demonstrated with experiments evaluating the quality of both topic modeling and embedding.\nWe extend the herding algorithm to continuous spaces by using the kernel trick. The resulting \"kernel herding\" algorithm is an infinite-memory deterministic process that learns to approximate a PDF with a collection of samples. We show that kernel herding decreases the error of expectations of functions in the Hilbert space at a rate O(1/T), which is much faster than the usual O(1/√T) for iid random samples. We illustrate kernel herding by approximating Bayesian predictive distributions.\nWe propose a new method for estimating the intrinsic dimension of a dataset by applying the principle of regularized maximum likelihood to the distances between close neighbors. We propose a regularization scheme which is motivated by divergence minimization principles. We derive the estimator by a Poisson process approximation, argue about its convergence properties and apply it to a number of simulated and real datasets. We also show it has the best overall performance compared with two other intrinsic dimension estimators.\nWe provide a simple method and relevant theoretical analysis for efficiently estimating higher-order lp distances. While the analysis mainly focuses on l4, our methodology extends naturally to p = 6,8,10..., (i.e., when p is even). Distance-based methods are popular in machine learning. In large-scale applications, storing, computing, and retrieving the distances can be both space and time prohibitive. Efficient algorithms exist for estimating lp distances if 0 < p <= 2. The task for p > 2 is known to be difficult. Our work partially fills this gap.\nWe present a Dirichlet process mixture model over discrete incomplete rankings and study two Gibbs sampling inference techniques for estimating posterior clusterings. The first approach uses a slice sampling subcomponent for estimating cluster parameters. 
The second approach marginalizes out several cluster parameters by taking advantage of approximations to the conditional posteriors. We empirically demonstrate (1) the effectiveness of this approximation for improving convergence, (2) the benefits of the Dirichlet process model over alternative clustering techniques for ranked data, and (3) the applicability of the approach to exploring large real-world ranking datasets.\nRelational learning can be used to augment one data source with other correlated sources of information, to improve predictive accuracy. We frame a large class of relational learning problems as matrix factorization problems, and propose a hierarchical Bayesian model. Training our Bayesian model using random-walk Metropolis-Hastings is impractically slow, and so we develop a block Metropolis-Hastings sampler which uses the gradient and Hessian of the likelihood to dynamically tune the proposal. We demonstrate that a predictive model of brain response to stimuli can be improved by augmenting it with side information about the stimuli.\nRecent reports have shown that the equivalent sample size (ESS) in a Dirichlet prior plays an important role in learning Bayesian networks. This paper provides an asymptotic analysis of the marginal likelihood score for a Bayesian network. Results show that the ratio of the ESS to the sample size determines the penalty of adding arcs in learning Bayesian networks. The number of arcs increases monotonically as the ESS increases; the number of arcs monotonically decreases as the ESS decreases. Furthermore, the marginal likelihood score provides a unified expression of various score metrics by changing prior knowledge.\nThis paper describes a technology to connect patients to information in the experiences of other patients by using the power of structured big data. 
The approach, implemented in the Abzooba Smart Health Informatics Platform (SHIP), is to distill concepts of facts and expressions from conversations and discussions in health social media forums, and use those distilled concepts in connecting patients to experiences and insights that are highly relevant to them in particular. We envision that our work, which is still in progress, will provide new and effective tools to exploit the richness of content in social media in health for outcomes research.\nDeep Boltzmann machines are in principle powerful models for extracting the hierarchical structure of data. Unfortunately, attempts to train layers jointly (without greedy layer-wise pretraining) have been largely unsuccessful. We propose a modification of the learning algorithm that initially recenters the output of the activation functions to zero. This modification leads to a better-conditioned Hessian and thus makes learning easier. We test the algorithm on real data and demonstrate that our suggestion, the centered deep Boltzmann machine, learns a hierarchy of increasingly abstract representations and a better generative model of data.\nThe deep Boltzmann machine (DBM) has been an important development in the quest for powerful \"deep\" probabilistic models. To date, simultaneous or joint training of all layers of the DBM has been largely unsuccessful with existing training methods. We introduce a simple regularization scheme that encourages the weight vectors associated with each hidden unit to have similar norms. We demonstrate that this regularization can be easily combined with standard stochastic maximum likelihood to yield an effective training strategy for the simultaneous training of all layers of the deep Boltzmann machine.\nWe introduce a new type of fully computable problems, for DSS dedicated to maximal spanning tree problems, based on deduction and choice: preferential consistency problems. 
To show its usefulness, we describe a new compact representation of preferences specific to spanning trees, identifying an efficient maximal spanning tree sub-problem. Next, we compare this problem with the Pareto-based multiobjective one. Finally, we propose an efficient algorithm solving the associated preferential consistency problem.\nSome aspects of the result of applying unit resolution on a CNF formula can be formalized as functions with domain a set of partial truth assignments. We are interested in two ways of computing such functions, depending on whether the result is the production of the empty clause or the assignment of a variable with a given truth value. We show that these two models can compute the same functions with formulae of polynomially related sizes, and we explain how this result is related to the CNF encoding of Boolean constraints.\nIn this report, we are interested in Dynamic Bayesian Networks (DBNs) as models that incorporate the temporal dimension together with uncertainty. We start with the basics of DBNs, focusing especially on inference and learning concepts and algorithms. Then we present different levels and methods of creating DBNs, as well as approaches to incorporating the temporal dimension in static Bayesian networks.\nOn dedicated websites, people can upload videos and share them with the rest of the world. Currently, these videos are categorized manually with the help of the user community. In this paper, we propose a combination of color spaces with the Bayesian network approach for robust detection of skin color followed by an automated video categorization. Experimental results show that our method can achieve satisfactory performance for categorizing videos based on skin color.\nWe argue for the value of publishing the exact code, configuration and data processing scripts used to produce empirical work in robotics. 
In particular, we recommend publishing a unique identifier for the code package in the paper itself, as a promise to the reader that this is the relevant code. We review some recent discussion of best practice for reproducibility in various professional organisations and journals, and discuss the current reward structure for publishing code in robotics, along with some ideas for improvement.\nIn this article a tool for the analysis of population-based EAs is used to derive asymptotic upper bounds on the optimization time of the algorithm solving the Royal Roads problem, a test function with plateaus of fitness. In addition, the limiting distribution of a certain subset of the population is approximated.\nThe common internal structure and algorithmic organization of object detection, detection-based tracking, and event recognition facilitates a general approach to integrating these three components. This supports multidirectional information flow between these components, allowing object detection to influence tracking and event recognition and event recognition to influence tracking and object detection. The performance of the combination can exceed the performance of the components in isolation. This can be done with linear asymptotic complexity.\nThe solution of the biobjective IRP is rather challenging, even for metaheuristics. We are still lacking a profound understanding of appropriate solution representations and effective neighborhood structures. Clearly, both the delivery volumes and the routing aspects of the alternatives need to be reflected in an encoding, and must be modified when searching by means of local search. Our work contributes to the better understanding of such solution representations. On the basis of an experimental investigation, the advantages and drawbacks of two encodings are studied and compared.\nToday, large datasets composed of thousands of geographic objects are available. 
However, for many processes, which require the appraisal of an expert or much computational time, only a small part of these objects can be taken into account. In this context, robust sampling methods become necessary. In this paper, we propose a sampling method based on clustering techniques. Our method consists of dividing the objects into clusters, then selecting the most representative objects in each cluster. A case study in the context of a process dedicated to knowledge revision for geographic data generalisation is presented. This case study shows that our method makes it possible to select relevant samples of objects.\nMany real-world problems can be defined as optimisation problems in which the aim is to maximise an objective function. The quality of the obtained solution is directly linked to the relevance of the objective function used. However, designing such a function, which has to translate the user's needs, is usually tedious. In this paper, a method to help users design objective functions is proposed. Our approach, which is highly interactive, is based on man-machine dialogue and more particularly on the comparison of problem instance solutions by the user. We propose an experiment in the domain of cartographic generalisation that shows promising results.\nWhile looking for abductive explanations of a given set of manifestations, an ordering between possible solutions is often assumed. The complexity of finding/verifying optimal solutions is already known. In this paper we consider the computational complexity of finding second-best solutions. 
We consider different orderings, and also consider different possible definitions of what a second-best solution is.\nIn this paper we develop a fuzzy model for the description of the process of Analogical Reasoning by representing its main steps as fuzzy subsets of a set of linguistic labels characterizing the individuals' performance in each step, and we use the Shannon-Wiener diversity index as a measure of the individuals' abilities in analogical problem solving. This model is compared with a stochastic model presented in the author's earlier papers by introducing a finite Markov chain on the steps of the process of Analogical Reasoning. A classroom experiment is also presented to illustrate the use of our results in practice.\nBased on a World Health Organization (WHO) fact sheet from 2011, outbreaks of poultry diseases, especially Avian Influenza in poultry, may raise global public health concerns due to their effect on poultry populations, their potential to cause serious disease in people, and their pandemic potential. In this research, we built a Poultry Diseases Expert System using Dempster-Shafer Theory. In this Poultry Diseases Expert System, we describe five symptoms, which include depression, combs, wattle, bluish face region, swollen face region, narrowness of eyes, and balance disorders. The result of the research is that the Poultry Diseases Expert System successfully identifies poultry diseases.\nThe degree of success in document summarization processes depends on the performance of the method used in identifying significant sentences in the documents. The collection of unique words characterizes the major signature of the document, and forms the basis for the Term-Sentence-Matrix (TSM). The Positive Pointwise Mutual Information, which works well for measuring semantic similarity in the Term-Sentence-Matrix, is used in our method to assign weights to each entry in the Term-Sentence-Matrix. 
The Sentence-Rank-Matrix generated from this weighted TSM is then used to extract a summary from the document. Our experiments show that such a method would outperform most of the existing methods in producing summaries from large documents.\nWithout Linked Data, transport data is limited to applications exclusively around transport. In this paper, we present a workflow for publishing and linking transport data on the Web. This will allow us to develop transport applications and to add other features created from other datasets. This will be possible because transport data will be linked to these datasets. We apply this workflow to two datasets: NEPTUNE, a French standard describing a transport line, and Passim, a directory containing relevant information on transport services, in every French city.\nWe present a novel clustering approach for moving object trajectories that are constrained by an underlying road network. The approach builds a similarity graph based on these trajectories, then uses modularity-optimization hierarchical graph clustering to regroup trajectories with similar profiles. Our experimental study shows the superiority of the proposed approach over classic hierarchical clustering and gives a brief insight into visualization of the clustering results.\nWe present the Infinite Latent Events Model, a nonparametric hierarchical Bayesian distribution over infinite-dimensional Dynamic Bayesian Networks with binary state representations and noisy-OR-like transitions. The distribution can be used to learn structure in discrete time-series data by simultaneously inferring a set of latent events, which events fired at each timestep, and how those events are causally linked. We illustrate the model on a sound factorization task, a network topology identification task, and a video game task.\nLearning the parameters of a (potentially partially observable) random field model is intractable in general. 
Instead of focussing on a single optimal parameter value we propose to treat parameters as dynamical quantities. We introduce an algorithm to generate complex dynamics for parameters and (both visible and hidden) state vectors. We show that under certain conditions averages computed over trajectories of the proposed dynamical system converge to averages computed over the data. Our \"herding dynamics\" does not require expensive operations such as exponentiation and is fully deterministic.\nTemporal-difference (TD) networks are a class of predictive state representations that use well-established TD methods to learn models of partially observable dynamical systems. Previous research with TD networks has dealt only with dynamical systems with finite sets of observations and actions. We present an algorithm for learning TD network representations of dynamical systems with continuous observations and actions. Our results show that the algorithm is capable of learning accurate and robust models of several noisy continuous dynamical systems. The algorithm presented here is the first fully incremental method for learning a predictive representation of a continuous dynamical system.\nWe consider MAP estimators for structured prediction with exponential family models. In particular, we concentrate on the case that efficient algorithms for uniform sampling from the output space exist. We show that under this assumption (i) exact computation of the partition function remains a hard problem, and (ii) the partition function and the gradient of the log partition function can be approximated efficiently. Our main result is an approximation scheme for the partition function based on Markov Chain Monte Carlo theory. We also show that the efficient uniform sampling assumption holds in several application settings that are of importance in machine learning.\nIncorporating domain knowledge into the modeling process is an effective way to improve learning accuracy. 
However, as it is provided by humans, domain knowledge can only be specified with some degree of uncertainty. We propose to explicitly model such uncertainty through probabilistic constraints over the parameter space. In contrast to hard parameter constraints, our approach is also effective when the domain knowledge is inaccurate and generally results in superior modeling accuracy. We focus on generative and conditional modeling where the parameters are assigned a Dirichlet or Gaussian prior and demonstrate the framework with experiments on both synthetic and real-world data.\nScore matching is a recently developed parameter learning method that is particularly effective for complicated high-dimensional density models with intractable partition functions. In this paper, we study two issues that have not been completely resolved for score matching. First, we provide a formal link between maximum likelihood and score matching. Our analysis shows that score matching finds model parameters that are more robust to noisy training data. Second, we develop a generalization of score matching. Based on this generalization, we further demonstrate an extension of score matching to models of discrete data.\nWe are often interested in explaining data through a set of hidden factors or features. When the number of hidden features is unknown, the Indian Buffet Process (IBP) is a nonparametric latent feature model that does not bound the number of active features in a dataset. However, the IBP assumes that all latent features are uncorrelated, making it inadequate for many real-world problems. We introduce a framework for correlated nonparametric feature models, generalising the IBP. We use this framework to generate several specific models and demonstrate applications on real-world datasets.\nWe provide an algorithm that achieves the optimal regret rate in an unknown weakly communicating Markov Decision Process (MDP). 
The algorithm proceeds in episodes where, in each episode, it picks a policy using regularization based on the span of the optimal bias vector. For an MDP with S states and A actions whose optimal bias vector has span bounded by H, we show a regret bound of Õ(HS√(AT)). We also relate the span to various diameter-like quantities associated with the MDP, demonstrating how our results improve on previous regret bounds.\nSoft sets, as a mathematical tool for dealing with uncertainty, have recently gained considerable attention, including some successful applications in information processing, decision, demand analysis, and forecasting. To construct new soft sets from given soft sets, some operations on soft sets have been proposed. Unfortunately, such operations cannot keep all classical set-theoretic laws true for soft sets. In this paper, we redefine the intersection, complement, and difference of soft sets and investigate the algebraic properties of these operations along with a known union operation. We find that the new operation system on soft sets inherits all basic properties of operations on classical sets, which justifies our definitions.\nMeasurement professionals cannot come to an agreement on the definition of the term 'item fairness'. In this paper a continuous measure of item unfairness is proposed. The more the unfairness measure deviates from zero, the less fair the item is. If the measure exceeds the cutoff value, the item is identified as definitely unfair. The new approach can identify unfair items that would not be identified with conventional procedures. The results are in accord with experts' judgments on the item qualities. Since no assumptions are made about score distributions and/or correlations, the method is applicable to any educational test. Its performance is illustrated through application to scores of a real test.\nBackdoors of answer-set programs are sets of atoms that represent clever reasoning shortcuts through the search space. 
Assignments to backdoor atoms reduce the given program to several programs that belong to a tractable target class. Previous research has considered target classes based on notions of acyclicity where various types of cycles (good and bad cycles) are excluded from graph representations of programs. We generalize the target classes by taking the parity of the number of negative edges on bad cycles into account and consider backdoors for such classes. We establish new hardness results and non-uniform polynomial-time tractability relative to directed or undirected cycles.\nThis paper demonstrates the use of neural networks for developing a system that can recognize hand-written English letters. In this system, each English letter is represented by binary values that are used as input to a simple feature extraction system, whose output is fed to our neural network system.\nThis paper addresses a mixed integer programming (MIP) formulation for the multi-item uncapacitated lot-sizing problem that is inspired by a trailer manufacturer. The proposed MIP model has been used to find the optimum order quantity, optimum order time, and the minimum total cost of purchasing, ordering, and holding over the predefined planning horizon. This problem is known to be NP-hard. The model was implemented in software using LINGO 13.0.\nFeature weighting is a technique used to approximate the optimal degree of influence of individual features. This paper presents a feature weighting method for a Document Image Retrieval System (DIRS) based on keyword spotting. In this method, we weight the features using the coefficient of multiple correlation. The coefficient of multiple correlation can be used to describe the synthesized effects and correlation of each feature. The aim of this paper is to show that feature weighting increases the performance of DIRS. 
After applying the feature weighting method to DIRS, the average precision and average recall become 93.23% and 98.66%, respectively.\nWe describe a system for meta-analysis where a wiki stores numerical data in a simple format and a web service performs the numerical computation. We initially apply the system to multiple meta-analyses of structural neuroimaging data results. The described system allows for mass meta-analysis, e.g., meta-analysis across multiple brain regions and multiple mental disorders.\nWe propose an approach for approximating the partition function which is based on two steps: (1) computing the partition function of a simplified model which is obtained by deleting model edges, and (2) rectifying the result by applying an edge-by-edge correction. The approach leads to an intuitive framework in which one can trade off the quality of an approximation against the complexity of computing it. It also includes the Bethe free energy approximation as a degenerate case. We develop the approach theoretically in this paper and provide a number of empirical results that reveal its practical utility.\nTraditional multi-view learning approaches suffer in the presence of view disagreement, i.e., when samples in each view do not belong to the same class due to view corruption, occlusion or other noise processes. In this paper we present a multi-view learning approach that uses a conditional entropy criterion to detect view disagreement. Once detected, samples with view disagreement are filtered and standard multi-view learning methods can be successfully applied to the remaining samples. Experimental evaluation on synthetic and audio-visual databases demonstrates that the detection and filtering of view disagreement considerably increases the performance of traditional multi-view learning approaches.\nGraphical models trained using maximum likelihood are a common tool for probabilistic inference of marginal distributions. 
However, this approach runs into difficulties when either the inference process or the model is approximate. In this paper, the inference process is first defined to be the minimization of a convex function, inspired by free energy approximations. Learning is then done directly in terms of the performance of the inference process at univariate marginal prediction. The main novelty is that this is a direct minimization of empirical risk, where the risk measures the accuracy of predicted marginals.\nThis paper presents a natural extension of stagewise ranking to the case of infinitely many items. We introduce the infinite generalized Mallows model (IGM), describe its properties and give procedures to estimate it from data. For estimation of multimodal distributions we introduce the Exponential-Blurring-Mean-Shift nonparametric clustering algorithm. The experiments highlight the properties of the new model and demonstrate that infinite models can be simple, elegant and practical.\nWe consider the task of learning mappings from sequential data to real-valued responses. We present and evaluate an approach to learning a type of hidden Markov model (HMM) for regression. The learning process involves inferring the structure and parameters of a conventional HMM, while simultaneously learning a regression model that maps features that characterize paths through the model to continuous responses. Our results, in both synthetic and biological domains, demonstrate the value of jointly learning the two components of our approach.\nAlthough fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates. 
We show that by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data.\nRecent works on cost-based relaxations have improved Constraint Programming (CP) models for the Traveling Salesman Problem (TSP). We provide a short survey of solving the asymmetric TSP with CP. Then, we suggest new implied propagators based on general graph properties. We experimentally show that such implied propagators bring robustness to pathological instances and highlight the fact that graph structure can significantly improve search heuristic behavior. Finally, we show that our approach outperforms current state-of-the-art results.\nThe rules of d-separation provide a framework for deriving conditional independence facts from model structure. However, this theory only applies to simple directed graphical models. We introduce relational d-separation, a theory for deriving conditional independence in relational models. We provide a sound, complete, and computationally efficient method for relational d-separation, and we present empirical results that demonstrate its effectiveness.\nThis paper revisits the problem of analyzing multiple ratings given by different judges. Unlike previous work that focuses on distilling the true labels from noisy crowdsourcing ratings, we emphasize gaining diagnostic insights into our in-house well-trained judges. We generalize the well-known DawidSkene model (Dawid & Skene, 1979) to a spectrum of probabilistic models under the same \"TrueLabel + Confusion\" paradigm, and show that our proposed hierarchical Bayesian model, called HybridConfusion, consistently outperforms DawidSkene on both synthetic and real-world data sets.\nModel-based Bayesian Reinforcement Learning (BRL) allows a sound formalization of the problem of acting optimally while facing an unknown environment, i.e., avoiding the exploration-exploitation dilemma. 
However, algorithms explicitly addressing BRL suffer from such a combinatorial explosion that a large body of work relies on heuristic algorithms. This paper introduces BOLT, a simple and (almost) deterministic heuristic algorithm for BRL which is optimistic about the transition function. We analyze BOLT's sample complexity, and show that under certain parameters, the algorithm is near-optimal in the Bayesian sense with high probability. Then, experimental results highlight the key differences of this method compared to previous work.\nInverse optimal control, also known as inverse reinforcement learning, is the problem of recovering an unknown reward function in a Markov decision process from expert demonstrations of the optimal policy. We introduce a probabilistic inverse optimal control algorithm that scales gracefully with task dimensionality, and is suitable for large, continuous domains where even computing a full policy is impractical. By using a local approximation of the reward function, our method can also drop the assumption that the demonstrations are globally optimal, requiring only local optimality. This allows it to learn from examples that are unsuitable for prior methods.\nEffective learning of user preferences is critical to easing user burden in various types of matching problems. Equally important is active query selection to further reduce the amount of preference information users must provide. We address the problem of active learning of user preferences for matching problems, introducing a novel method for determining probabilistic matchings, and developing several new active learning strategies that are sensitive to the specific matching objective. Experiments with real-world data sets spanning diverse domains demonstrate that matching-sensitive active learning\nMuch of current machine learning (ML) research has lost its connection to problems of import to the larger world of science and society. 
From this perspective, there exist glaring limitations in the data sets we investigate, the metrics we employ for evaluation, and the degree to which results are communicated back to their originating domains. What changes are needed to how we conduct research to increase the impact that ML has? We present six Impact Challenges to explicitly focus the field's energy and attention, and we discuss existing obstacles that must be addressed. We aim to inspire ongoing discussion and focus on ML that matters.\nHigh dimensional structured data such as text and images is often poorly understood and misrepresented in statistical modeling. The standard histogram representation suffers from high variance and performs poorly in general. We explore novel connections between statistical translation, heat kernels on manifolds and graphs, and expected distances. These connections provide a new framework for unsupervised metric learning for text documents. Experiments indicate that the resulting distances are generally superior to their more standard counterparts.\nDirichlet Process Mixtures (DPMs) are a popular class of statistical models to perform density estimation and clustering. However, when the data available have a distribution evolving over time, such models are inadequate. We introduce here a class of time-varying DPMs which ensures that at each time step the random distribution follows a DPM model. Our model relies on an intuitive and simple generalized Polya urn scheme. Inference is performed using Markov chain Monte Carlo and Sequential Monte Carlo. We demonstrate our model on various applications.\nWe address challenges of active learning under scarce informational resources in non-stationary environments. In real-world settings, data labeled and integrated into a predictive model may become invalid over time. However, the data can become informative again with switches in context and such changes may indicate unmodeled cyclic or other temporal dynamics. 
We explore principles for discarding, caching, and recalling labeled data points in active learning based on computations of value of information. We review key concepts and study the value of the methods via investigations of predictive performance and costs of acquiring data for simulated and real-world data sets.\nWe present a novel approach to detecting and utilizing symmetries in probabilistic graphical models with two main contributions. First, we present a scalable approach to computing generating sets of permutation groups representing the symmetries of graphical models. Second, we introduce orbital Markov chains, a novel family of Markov chains leveraging model symmetries to reduce mixing times. We establish an insightful connection between model symmetries and rapid mixing of orbital Markov chains. Thus, we present the first lifted MCMC algorithm for probabilistic graphical models. Both analytical and empirical results demonstrate the effectiveness and efficiency of the approach.\nIn Passive POMDPs actions do not affect the world state, but still incur costs. When the agent is bounded by information-processing constraints, it can only keep an approximation of the belief. We present a variational principle for the problem of maintaining the information which is most useful for minimizing the cost, and introduce an efficient and simple algorithm for finding an optimum.\nThe extension of counterfactual causal graphic model with three variables of vertex set in directed acyclic graph (DAG) is discussed in this paper by extending two-value distribution to three-value distribution of the variables involved in DAG. Using the conditional independence as ancillary information, 6 kinds of extension counterfactual causal graphic models with some variables are extended from two-value distribution to three-value distribution and the sufficient conditions of identifiability are derived.\nX in R^D has mean zero and finite second moments. 
We show that there is a precise sense in which almost all linear projections of X into R^d (for d < D) look like a scale-mixture of spherical Gaussians -- specifically, a mixture of distributions N(0, sigma^2 I_d) where the weight of the particular sigma component is P (| X |^2 = sigma^2 D). The extent of this effect depends upon the ratio of d to D, and upon a particular coefficient of eccentricity of X's distribution. We explore this result in a variety of experiments.\nIn this paper we review the notion of direct causal effect as introduced by Pearl (2001). We show how it can be formulated without counterfactuals, using intervention indicators instead. This allows one to consider the natural direct effect as a special case of sequential treatments discussed by Dawid and Didelez (2005) which immediately yields conditions for identifiability as well as a graphical way of checking identifiability. The results are contrasted with the criteria given by Pearl (2001) and Robins (2003).\nThe popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation is unable to maintain any sequential information. We present a continuous and differentiable sequential document representation that goes beyond the bag of words assumption, and yet is efficient and effective. This representation employs smooth curves in the multinomial simplex to account for sequential information. We discuss the representation and its geometric properties and demonstrate its applicability for the task of text classification.\nWe show how to reduce the process of predicting general order statistics (and the median in particular) to solving classification. The accompanying theoretical statement shows that the regret of the classifier bounds the regret of the quantile regression under a quantile loss. 
We also test this reduction empirically against existing quantile regression methods on large real-world datasets and discover that it provides state-of-the-art performance.\nWe show that the multi-class support vector machine (MSVM) proposed by Lee et al. (2004) can be viewed as a MAP estimation procedure under an appropriate probabilistic interpretation of the classifier. We also show that this interpretation can be extended to a hierarchical Bayesian architecture and to a fully-Bayesian inference procedure for multi-class classification based on data augmentation. We present empirical results that show that the advantages of the Bayesian formalism are obtained without a loss in classification accuracy.\nThis paper focuses on the restart strategy of CMA-ES on multi-modal functions. A first alternative strategy proceeds by decreasing the initial step-size of the mutation while doubling the population size at each restart. A second strategy adaptively allocates the computational budget among the restart settings in the BIPOP scheme. Both restart strategies are validated on the BBOB benchmark; their generality is also demonstrated on an independent real-world problem suite related to spacecraft trajectory optimization.\nWith the growing interest in Network Analysis, Relational Data Mining is becoming an emphasized domain of Data Mining. This paper addresses the problem of extracting representative elements from a relational dataset. After defining the notion of degree of representativeness, computed using the Borda aggregation procedure, we present the extraction of exemplars which are the representative elements of the dataset. We use these concepts to build a network on the dataset. We expose the main properties of these notions and we propose two typical applications of our framework. 
The first application consists in summarizing and structuring a set of binary images and the second in mining co-authoring relation in a research team.\nIn spectral clustering and spectral image segmentation, the data is partitioned starting from a given matrix of pairwise similarities S. The matrix S is constructed by hand or learned on a separate training set. In this paper we show how to achieve spectral clustering in unsupervised mode. Our algorithm starts with a set of observed pairwise features, which are possible components of an unknown, parametric similarity function. This function is learned iteratively, at the same time as the clustering of the data. The algorithm shows promising results on synthetic and real data.\nWe advance the approach initiated by Chawla et al. for sanitizing (census) data so as to preserve the privacy of respondents while simultaneously extracting \"useful\" statistical information. First, we extend the scope of their techniques to a broad and rich class of distributions, specifically, mixtures of high-dimensional balls, spheres, Gaussians, and other \"nice\" distributions. Second, we randomize the histogram constructions to preserve spatial characteristics of the data, allowing us to approximate various quantities of interest, e.g., cost of the minimum spanning tree on the data, in a privacy-preserving fashion.\nWe introduce a novel latent grouping model for predicting the relevance of a new document to a user. The model assumes a latent group structure for both users and documents. We compared the model against a state-of-the-art method, the User Rating Profile model, where only users have a latent group structure. We estimate both models by Gibbs sampling. The new method predicts relevance more accurately for new documents that have few known ratings. 
The reason is that generalization over documents then becomes necessary and hence the two-way grouping is profitable.\nThis paper addresses the problem of mapping natural language sentences to lambda-calculus encodings of their meaning. We describe a learning algorithm that takes as input a training set of sentences labeled with expressions in the lambda calculus. The algorithm induces a grammar for the problem, along with a log-linear model that represents a distribution over syntactic and semantic analyses conditioned on the input sentence. We apply the method to the task of learning natural language interfaces to databases and show that the learned parsers outperform previous methods in two benchmark database domains.\nOne of the proposed solutions to the equilibrium selection problem for agents learning in repeated games is obtained via the notion of stochastic stability. Learning algorithms are perturbed so that the Markov chain underlying the learning dynamics is necessarily irreducible and yields a unique stable distribution. The stochastically stable distribution is the limit of these stable distributions as the perturbation rate tends to zero. We present the first exact algorithm for computing the stochastically stable distribution of a Markov chain. We use our algorithm to predict the long-term dynamics of simple learning algorithms in sample repeated games.\nThe feature of our method different from other fuzzy grey relation method for supermixed multiple attribute group decision-making is that all of the subjective and objective weights are obtained by interval grey number and that the group decision-making is performed based on the relative approach degree of grey TOPSIS, the relative approach degree of grey incidence and the relative membership degree of grey incidence using 4-dimensional Euclidean distance. The weighted Borda method is used to obtain final rank by using the results of four methods. 
An example shows the applicability of the proposed approach.\nThe multiple attribute mixed type decision making is performed by four methods, that is, the relative approach degree of grey TOPSIS method, the relative approach degree of grey incidence, the relative membership degree of grey incidence and the grey relation relative approach degree method using the maximum entropy estimation, respectively. In these decision making methods, the grey incidence degree in four-dimensional Euclidean space is used. The final arrangement result is obtained by weighted Borda method. An example illustrates the applicability of the proposed approach.\nWe revisit the SeqBin constraint. This meta-constraint subsumes a number of important global constraints like Change, Smooth and IncreasingNValue. We show that the previously proposed filtering algorithm for SeqBin has two drawbacks even under strong restrictions: it does not detect bounds disentailment and it is not idempotent. We identify the cause for these problems, and propose a new propagator that overcomes both issues. Our algorithm is based on a connection to the problem of finding a path of a given cost in a restricted $n$-partite graph. Our propagator enforces domain consistency in O(nd^2) and, for special cases of SeqBin that include Change, Smooth and IncreasingNValue, in O(nd) time.\nIn this paper, we discuss the implementation of a rule based expert system for diagnosing neuromuscular diseases. The proposed system is implemented as a rule based expert system in JESS for the diagnosis of Cerebral Palsy, Multiple Sclerosis, Muscular Dystrophy and Parkinson's disease. In the system, the user is presented with a list of questionnaires about the symptoms of the patients based on which the disease of the patient is diagnosed and possible treatment is suggested. 
The system can aid and support the patients suffering from neuromuscular diseases to get an idea of their disease and possible treatment for the disease.\nThis paper is an example-based demonstration of our initial results on the formal specification of programs written in the computer algebra language MiniMaple (a substantial subset of Maple with slight extensions). The main goal of this work is to define a verification framework for MiniMaple. Formal specification of MiniMaple programs is a rather complex task as it supports non-standard types of objects, e.g. symbols and unevaluated expressions, and additional functions and predicates, e.g. runtime type tests etc. We have used the specification language to specify various computer algebra concepts respective objects of the Maple package DifferenceDifferential developed at our institute.\nThe paper describes an application of Aggregating Algorithm to the problem of regression. It generalizes earlier results concerned with plain linear regression to kernel techniques and presents an on-line algorithm which performs nearly as well as any oblivious kernel predictor. The paper contains the derivation of an estimate on the performance of this algorithm. The estimate is then used to derive an application of the Complexity Approximation Principle to kernel methods.\nMethods for analysis of principal components in discrete data have existed for some time under various names such as grade of membership modelling, probabilistic latent semantic analysis, and genotype inference with admixture. In this paper we explore a number of extensions to the common theory, and present some applications of these methods to some common statistical tasks. We show that these methods can be interpreted as a discrete version of ICA. We develop a hierarchical version yielding components at different levels of detail, and additional techniques for Gibbs sampling. 
We compare the algorithms on a text prediction task using support vector machines, and to information retrieval.\nIn this paper we define conditional random fields in reproducing kernel Hilbert spaces and show connections to Gaussian Process classification. More specifically, we prove decomposition results for undirected graphical models and we give constructions for kernels. Finally we present efficient means of solving the optimization problem using reduced rank decompositions and we show how stationarity can be exploited efficiently in the optimization process.\nWe formulate and prove an axiomatic characterization of conditional information geometry, for both the normalized and the nonnormalized cases. This characterization extends the axiomatic derivation of the Fisher geometry by Cencov and Campbell to the cone of positive conditional models, and as a special case to the manifold of conditional distributions. Due to the close connection between the conditional I-divergence and the product Fisher information metric the characterization provides a new axiomatic interpretation of the primal problems underlying logistic regression and AdaBoost.\nWithin the task of collaborative filtering two challenges for computing conditional probabilities exist. First, the amount of training data available is typically sparse with respect to the size of the domain. Thus, support for higher-order interactions is generally not present. Second, the variables that we are conditioning upon vary for each query. That is, users label different variables during each query. For this reason, there is no consistent input to output mapping. To address these problems we propose a maximum entropy approach using a non-standard measure of entropy. This approach can be simplified to solving a set of linear equations that can be efficiently solved.\nWe study the properties of variational Bayes approximations for exponential family models with missing values. 
It is shown that the iterative algorithm for obtaining the variational Bayesian estimator converges locally to the true value with probability 1 as the sample size becomes indefinitely large. Moreover, the variational posterior distribution is proved to be asymptotically normal.\nWe describe an algorithm for computing best response strategies in a class of two-player infinite games of incomplete information, defined by payoffs piecewise linear in agents' types and actions, conditional on linear comparisons of agents' actions. We show that this class includes many well-known games including a variety of auctions and a novel allocation game. In some cases, the best-response algorithm can be iterated to compute Bayes-Nash equilibria. We demonstrate the efficiency of our approach on existing and new games.\nStraightedge and compass construction problems are one of the oldest and most challenging problems in elementary mathematics. The central challenge, for a human or for a computer program, in solving construction problems is a huge search space. In this paper we analyze one family of triangle construction problems, aiming at detecting a small core of the underlying geometry knowledge. The analysis leads to a small set of needed definitions, lemmas and primitive construction steps, and consequently, to a simple algorithm for automated solving of problems from this family. The same approach can be applied to other families of construction problems.\nProbability Bracket Notation (PBN) is applied to systems of multiple random variables for preliminary study of static Bayesian Networks (BN) and Probabilistic Graphic Models (PGM). The famous Student BN Example is explored to show the local independences and reasoning power of a BN. Software package Elvira is used to graphically display the student BN. 
Our investigation shows that PBN provides a consistent and convenient alternative to manipulate many expressions related to joint, marginal and conditional probability distributions in static BN.\nUCT, a state-of-the-art algorithm for Monte Carlo tree search (MCTS) in games and Markov decision processes, is based on UCB1, a sampling policy for the Multi-armed Bandit problem (MAB) that minimizes the cumulative regret. However, search differs from MAB in that in MCTS it is usually only the final \"arm pull\" (the actual move selection) that collects a reward, rather than all \"arm pulls\". In this paper, an MCTS sampling policy based on Value of Information (VOI) estimates of rollouts is suggested. Empirical evaluation of the policy and comparison to UCB1 and UCT is performed on random MAB instances as well as on Computer Go.\nThe rules of Sudoku are often specified using twenty seven \\texttt{all\\_different} constraints, referred to as the {\\em big} \\mrules. Using graphical proofs and exploratory logic programming, the following main and new result is obtained: many subsets of six of these big \\mrules are redundant (i.e., they are entailed by the remaining twenty one \\mrules), and six is maximal (i.e., removing more than six \\mrules is not possible while maintaining equivalence). The corresponding result for binary inequality constraints, referred to as the {\\em small} \\mrules, is stated as a conjecture.\nA recently identified problem is that of finding an optimal investment plan for a transportation network, given that a disaster such as an earthquake may destroy links in the network. The aim is to strengthen key links to preserve the expected network connectivity. A network based on the Istanbul highway system has thirty links and therefore a billion scenarios, but it has been estimated that sampling a million scenarios gives reasonable accuracy. 
In this paper we use symmetry reasoning to reduce the number of scenarios to a much smaller number, making sampling unnecessary. This result can be used to facilitate metaheuristic and exact approaches to the problem.\nIn this paper, we consider the problem of diversity in ranking of the nodes in a graph. The task is to pick the top-k nodes in the graph which are both 'central' and 'diverse'. Many graph-based models of NLP like text summarization, opinion summarization involve the concept of diversity in generating the summaries. We develop a novel method which works in an iterative fashion based on random walks to achieve diversity. Specifically, we use negative reinforcement as a main tool to introduce diversity in the Personalized PageRank framework. Experiments on two benchmark datasets show that our algorithm is competitive to the existing methods.\nA converter from first-order modal logics to classical higher-order logic is presented. This tool enables the application of off-the-shelf higher-order theorem provers and model finders for reasoning within first-order modal logics. The tool supports logics K, K4, D, D4, T, S4, and S5 with respect to constant, varying and cumulative domain semantics.\nThe increasing popularity of metaheuristic algorithms has attracted a great deal of attention in algorithm analysis and performance evaluations. No-free-lunch theorems are of both theoretical and practical importance, while many important studies on convergence analysis of various metaheuristic algorithms have proven to be fruitful. This paper discusses the recent results on no-free-lunch theorems and algorithm convergence, as well as their important implications for algorithm development in practice. Free lunches may exist for certain types of problem. In addition, we will highlight some open problems for further research.\nWe present a new approach to credal nets, which are graphical models that generalise Bayesian nets to imprecise probability. 
Instead of applying the commonly used notion of strong independence, we replace it by the weaker notion of epistemic irrelevance. We show how assessments of epistemic irrelevance allow us to construct a global model out of given local uncertainty models and mention some useful properties. The main results and proofs are presented using the language of sets of desirable gambles, which provides a very general and expressive way of representing imprecise probability models.\nThe human mind is known to be sensitive to complexity. For instance, the visual system reconstructs hidden parts of objects following a principle of maximum simplicity. We suggest here that higher cognitive processes, such as the selection of relevant situations, are sensitive to variations of complexity. Situations are relevant to human beings when they appear simpler to describe than to generate. This definition offers a predictive (i.e. falsifiable) model for the selection of situations worth reporting (interestingness) and for what individuals consider an appropriate move in conversation.\nGame tree search algorithms such as minimax have been used with enormous success in turn-based adversarial games such as Chess or Checkers. However, such algorithms cannot be directly applied to real-time strategy (RTS) games for a number of reasons. For example, minimax assumes a turn-taking game mechanics, not present in RTS games. In this paper we present RTMM, a real-time variant of the standard minimax algorithm, and discuss its applicability in the context of RTS games. We discuss its strengths and weaknesses, and evaluate it in two real-time games.\nThis paper deals with the implementation of Least Mean Square (LMS) algorithm in Decision Feedback Equalizer (DFE) for removal of Inter Symbol Interference (ISI) at the receiver. The channel disrupts the transmitted signal by spreading it in time. Although the LMS algorithm is robust and reliable, it is slow in convergence. 
In order to increase the speed of convergence, modifications have been made in the algorithm where the weights get updated depending on the severity of disturbance.\nVarious methods for lifted probabilistic inference have been proposed, but our understanding of these methods and the relationships between them is still limited, compared to their propositional counterparts. The only existing theoretical characterization of lifting is for weighted first-order model counting (WFOMC), which was shown to be complete domain-lifted for the class of 2-logvar models. This paper makes two contributions to lifted variable elimination (LVE). First, we introduce a novel inference operator called group inversion. Second, we prove that LVE augmented with this operator is complete in the same sense as WFOMC.\nThe Generalized Traveling Salesman Problem (GTSP) is one of the NP-hard combinatorial optimization problems. A variant of GTSP is E-GTSP where E, meaning equality, has the constraint: exactly one node from a cluster of a graph partition is visited. The main objective of the E-GTSP is to find a minimum cost tour passing through exactly one node from each cluster of an undirected graph. Agent-based approaches are successfully used nowadays for solving real life complex problems. The aim of the current paper is to illustrate some variants of agent-based algorithms including ant-based models with specific properties for solving E-GTSP.\nThe current paper introduces a new parallel computing technique based on ant colony optimization for a dynamic routing problem. In the dynamic traveling salesman problem the distances between cities as travel times are no longer fixed. The new technique uses a parallel model for a problem variant that allows a slight movement of nodes within their neighborhoods. 
The algorithm is tested with success on several large data sets.\nThis short paper presents a work on the design of low noise microwave amplifiers using particle swarm optimization (PSO) technique. Particle Swarm Optimization is used as a method that is applied to a single stage amplifier circuit to meet two criteria: desired gain and desired low noise. The aim is to get the best optimized design using the predefined constraints for gain and low noise values. The code is written to apply the algorithm to meet the desired goals and the obtained results are verified using different simulators. The results obtained show that PSO can be applied very efficiently for this kind of design problems with multiple constraints.\nThe matrices and their sub-blocks are introduced into the study of determining various extensions in the sense of Dung's theory of argumentation frameworks. It is shown that each argumentation framework has its matrix representations, and the core semantics defined by Dung can be characterized by specific sub-blocks of the matrix. Furthermore, the elementary permutations of a matrix are employed by which an efficient matrix approach for finding out all extensions under a given semantics is obtained. Different from several established approaches, such as the graph labelling algorithm, Constraint Satisfaction Problem algorithm, the matrix approach not only puts the mathematical idea into the investigation for finding out various extensions, but also completely achieves the goal to compute all the extensions needed.\nWe construct an extension of diffusion geometry to multiple modalities through joint approximate diagonalization of Laplacian matrices. This naturally extends classical data analysis tools based on spectral geometry, such as diffusion maps and spectral clustering. 
We provide several synthetic and real examples of manifold learning, retrieval, and clustering demonstrating that the joint diffusion geometry frequently better captures the inherent structure of multi-modal data. We also show that many previous attempts to construct multimodal spectral clustering can be seen as particular cases of joint approximate diagonalization of the Laplacians.\nThe discovery of new and interesting patterns in large datasets, known as data mining, draws more and more interest as the quantities of available data are exploding. Data mining techniques may be applied to different domains and fields such as computer science, health sector, insurances, homeland security, banking and finance, etc. In this paper we are interested in the discovery of a specific category of patterns, known as rare and non-present patterns. We present a novel approach towards the discovery of non-present patterns using rare item-set mining.\nSeveral variants of the Constraint Satisfaction Problem have been proposed and investigated in the literature for modelling those scenarios where solutions are associated with some given costs. Within these frameworks computing an optimal solution is an NP-hard problem in general; yet, when restricted over classes of instances whose constraint interactions can be modelled via (nearly-)acyclic graphs, this problem is known to be solvable in polynomial time. In this paper, larger classes of tractable instances are singled out, by discussing solution approaches based on exploiting hypergraph acyclicity and, more generally, structural decomposition methods, such as (hyper)tree decompositions.\nIn the recent years, we have linked a large corpus of formal mathematics with automated theorem proving (ATP) tools, and started to develop combined AI/ATP systems working in this setting. 
In this paper we first relate this project to the earlier large-scale automated developments done by Quaife with McCune's Otter system, and to the discussions of the QED project about formalizing a significant part of mathematics. Then we summarize our adventure so far, argue that the QED dreams were right in anticipating the creation of a very interesting semantic AI field, and discuss its further research directions.\nThe paper presents a comparison of various soft computing techniques used for filtering and enhancing speech signals. The three major techniques that fall under soft computing are neural networks, fuzzy systems and genetic algorithms. Other hybrid techniques such as neuro-fuzzy systems are also available. In general, soft computing techniques have been experimentally observed to give far superior performance as compared to non-soft computing techniques in terms of robustness and accuracy.\nWe process a large corpus of game records of the board game of Go and propose a way of extracting summary information on played moves. We then apply several basic data-mining methods on the summary information to identify the most differentiating features within the summary information, and discuss their correspondence with traditional Go knowledge. We show statistically significant mappings of the features to player attributes such as playing strength or informally perceived \"playing style\" (e.g. territoriality or aggressivity), describe accurate classifiers for these attributes, and propose applications including seeding real-world ranks of internet players, aiding in Go study and tuning of Go-playing programs, or contribution to Go-theoretical discussion on the scope of \"playing style\".\nEfficient Natural Evolution Strategies (eNES) is a novel alternative to conventional evolutionary algorithms, using the natural gradient to adapt the mutation distribution. 
Unlike previous methods based on natural gradients, eNES uses a fast algorithm to calculate the inverse of the exact Fisher information matrix, thus increasing both robustness and performance of its evolution gradient estimation, even in higher dimensions. Additional novel aspects of eNES include optimal fitness baselines and importance mixing (a procedure for updating the population with very few fitness evaluations). The algorithm yields competitive results on both unimodal and multimodal benchmarks.\nWe address the relative expressiveness of defeasible logics in the framework DL. Relative expressiveness is formulated as the ability to simulate the reasoning of one logic within another logic. We show that such simulations must be modular, in the sense that they also work if applied only to part of a theory, in order to achieve a useful notion of relative expressiveness. We present simulations showing that logics in DL with and without the capability of team defeat are equally expressive. We also show that logics that handle ambiguity differently -- ambiguity blocking versus ambiguity propagating -- have distinct expressiveness, with neither able to simulate the other under a different formulation of expressiveness.\nThis paper evaluates heterogeneous information fusion using multi-task Gaussian processes in the context of geological resource modeling. Specifically, it empirically demonstrates that information integration across heterogeneous information sources leads to superior estimates of all the quantities being modeled, compared to modeling them individually. Multi-task Gaussian processes provide a powerful approach for simultaneous modeling of multiple quantities of interest while taking correlations between these quantities into consideration. Experiments are performed on large scale real sensor data.\nFor a mobile robot to be truly autonomous, it must solve the simultaneous localization and mapping (SLAM) problem. 
We develop a new metaheuristic algorithm called Simulated Tom Thumb (STT), based on the detailed adventure of the clever Tom Thumb and advances in research relating to path planning based on potential functions. Investigations show that it is very promising and could be seen as an optimization of the powerful solution of SLAM with data association and learning capabilities. STT outperforms JCBB. The performance is a 100% match.\nIn order to build AI we have to create a program which copes well in an arbitrary world. In this paper we will restrict our attention to one concrete world, which represents the game Tick-Tack-Toe. This world is a very simple one but it is sufficiently complicated for our task because most people cannot manage with it. The main difficulty in this world is that the player cannot see the entire internal state of the world so he has to build a model in order to understand the world. The model which we will offer will consist of finite automata and first order formulas.\nThe paper introduces a framework for representation and acquisition of knowledge emerging from large samples of textual data. We utilise a tensor-based, distributional representation of simple statements extracted from text, and show how one can use the representation to infer emergent knowledge patterns from the textual data in an unsupervised manner. Examples of the patterns we investigate in the paper are implicit term relationships or conjunctive IF-THEN rules. To evaluate the practical relevance of our approach, we apply it to annotation of life science articles with terms from MeSH (a controlled biomedical vocabulary and thesaurus).\nWe present the new multi-threaded version of the state-of-the-art answer set solver clasp. We detail its component and communication architecture and illustrate how they support the principal functionalities of clasp. Also, we provide some insights into the data representation used for different constraint types handled by clasp. 
All this is accompanied by an extensive experimental analysis of the major features related to multi-threading in clasp.\nThis paper describes Artex, another algorithm for Automatic Text Summarization. In order to rank sentences, a simple inner product is calculated between each sentence, a document vector (text topic) and a lexical vector (vocabulary used by a sentence). Summaries are then generated by assembling the highest ranked sentences. No rule-based linguistic post-processing is necessary in order to obtain summaries. Tests over several datasets (coming from Document Understanding Conferences (DUC), Text Analysis Conferences (TAC), evaluation campaigns, etc.) in French, English and Spanish have shown that the summarizer achieves interesting results.\nWe are proud to introduce this special issue of the Journal of Theory and Practice of Logic Programming (TPLP), dedicated to the full papers accepted for the 28th International Conference on Logic Programming (ICLP). The ICLP meetings started in Marseille in 1982 and since then constitute the main venue for presenting and discussing work in the area of logic programming.\nRecent developments in fitness landscape analysis include the study of Local Optima Networks (LON) and applications of the Elementary Landscapes theory. This paper represents a first step at combining these two tools to explore their ability to forecast the performance of search algorithms. We base our analysis on the Quadratic Assignment Problem (QAP) and conduct a large statistical study over 600 generated instances of different types. Our results reveal interesting links between the network measures, the autocorrelation measures and the performance of heuristic search algorithms.\nGeneralized relational theories with null values in the sense of Reiter are first-order theories that provide a semantics for relational databases with incomplete information. 
In this paper we show that any such theory can be turned into an equivalent logic program, so that models of the theory can be generated using computational methods of answer set programming. As a step towards this goal, we develop a general method for calculating stable models under the domain closure assumption but without the unique name assumption.\nWe consider the budget optimization problem faced by an advertiser participating in repeated sponsored search auctions, seeking to maximize the number of clicks attained under that budget. We cast the budget optimization problem as a Markov Decision Process (MDP) with censored observations, and propose a learning algorithm based on the well-known Kaplan-Meier or product-limit estimator. We validate the performance of this algorithm by comparing it to several others on a large set of search auction data from Microsoft adCenter, demonstrating fast convergence to optimal performance.\nComputing a Nash equilibrium (NE) is a central task in computer science. An NE is a particularly appropriate solution concept for two-agent settings because coalitional deviations are not an issue. However, even in this case, finding an NE is PPAD-complete. In this paper, we combine path-following algorithms with local search techniques to design new algorithms for finding exact and approximate NEs. We show that our algorithms largely outperform the state of the art and that almost all the known benchmark game classes are easily solvable or approximable (except for the GAMUT CovariantGameRand class).\nWe present and prove properties of a new offline policy evaluator for an exploration learning setting which is superior to previous evaluators. In particular, it simultaneously and correctly incorporates techniques from importance weighting, doubly robust evaluation, and nonstationary policy evaluation approaches. 
In addition, our approach allows generating longer histories by careful control of a bias-variance tradeoff, and further decreases variance by incorporating information about randomness of the target policy. Empirical evidence from synthetic and real-world exploration learning problems shows the new evaluator successfully unifies previous approaches and uses information an order of magnitude more efficiently.\nWe describe Hokusai, a real time system which is able to capture frequency information for streams of arbitrary sequences of symbols. The algorithm uses the CountMin sketch as its basis and exploits the fact that sketching is linear. It provides real time statistics of arbitrary events, e.g. streams of queries as a function of time. We use a factorizing approximation to provide point estimates at arbitrary (time, item) combinations. Queries can be answered in constant time.\nIn this paper, we demonstrate and discuss results of our mining the abstracts of the publications in Harvard Business Review between 1922 and 2012. Techniques for computing n-grams, collocations, basic sentiment analysis, and named-entity recognition were employed to uncover trends hidden in the abstracts. We present findings about international relationships, sentiment in HBR's abstracts, important international companies, influential technological inventions, renowned researchers in management theories, and US presidents via chronological analyses.\nWe analyse the storage and retrieval capacity in a recurrent neural network of spiking integrate and fire neurons. In the model we distinguish between a learning mode, during which the synaptic connections change according to a Spike-Timing Dependent Plasticity (STDP) rule, and a recall mode, in which connection strengths are no longer plastic. Our findings show the ability of the network to store and recall periodic phase coded patterns when a small number of neurons has been stimulated. 
The self-sustained dynamics selectively gives an oscillating spiking activity that matches one of the stored patterns, depending on the initialization of the network.\nIn this paper we present the results of unstructured data clustering, in this case textual data from the Reuters-21578 corpus, with a new biomimetic approach using an immune system. Before experimenting with our immune system, we digitalized textual data by the n-grams approach. The novelty lies in the hybridization of n-grams and immune systems for clustering. The experimental results show that the recommended ideas are promising and prove that this method can solve the text clustering problem.\nWith the increased use of ontologies in semantically-enabled applications, the issue of debugging defects in ontologies has become increasingly important. These defects can lead to wrong or incomplete results for the applications. Debugging consists of the phases of detection and repairing. In this paper we focus on the repairing phase of a particular kind of defects, i.e. the missing relations in the is-a hierarchy. Previous work has dealt with the case of taxonomies. In this work we extend the scope to deal with ALC ontologies that can be represented using acyclic terminologies. We present algorithms and discuss a system.\nThe paper presents a two-level learning method for the design of the Beta Basis Function Neural Network BBFNN. A Genetic Algorithm is employed at the upper level to construct BBFNN, while the key learning parameters (the width, the centers and the Beta form) are optimised using the gradient algorithm at the lower level. In order to demonstrate the effectiveness of this hierarchical learning algorithm HLABBFNN, we need to validate our algorithm for the approximation of non-linear functions.\nMuch work has been done refining and characterizing the receptive fields learned by deep learning algorithms. 
A lot of this work has focused on the development of Gabor-like filters learned when enforcing sparsity constraints on a natural image dataset. Little work however has investigated how these filters might expand to the temporal domain, namely through training on natural movies. Here we investigate exactly this problem in established temporal deep learning algorithms as well as a new learning paradigm suggested here, the Temporal Autoencoding Restricted Boltzmann Machine (TARBM).\nThis paper describes the analysis of a selected testbed of Semantic Web ontologies, by a SPARQL query, which determines those ontologies that can be related to the description logic DL<ForAllPiZero>, introduced in [4] and studied in [9]. We will see that a reasonable number of them is expressible within such a computationally efficient language. We expect that, in a long-term view, a temporalization of description logics, and consequently, of OWL(2), can open new perspectives for the inclusion in this language of a greater number of ontologies of the testbed and, hopefully, of the \"real world\".\nThis paper proposes a hybrid multiagent learning algorithm for solving the dynamic simulation-based bilevel network design problem. The objective is to determine the optimal frequency of a multimodal transit network, which minimizes total users' travel cost and operation cost of transit lines. The problem is formulated as a bilevel programming problem with equilibrium constraints describing non-cooperative Nash equilibrium in a dynamic simulation-based transit assignment context. A hybrid algorithm combining the cross entropy multiagent learning algorithm and Hooke-Jeeves algorithm is proposed. 
Computational results are provided on the Sioux Falls network to illustrate the performance of the proposed algorithm.\nMany cognitive systems deploy multiple, closed, individually consistent models which can represent interpretations of the present state of the world, moments in the past, possible futures or alternate versions of reality. While they appear under different names, these structures can be grouped under the general term of worlds. The Xapagy architecture is a story-oriented cognitive system which relies exclusively on the autobiographical memory implemented as a raw collection of events organized into world-type structures called {\\em scenes}. The system performs reasoning by shadowing current events with events from the autobiography. The shadows are then extrapolated into headless shadows corresponding to predictions, hidden events or inferred relations.\nThis paper argues that the problem of identity is a critical challenge in agents which are able to reason about stories. The Xapagy architecture has been built from scratch to perform narrative reasoning and relies on a somewhat unusual approach to represent instances and identity. We illustrate the approach by a representation of the story of Little Red Riding Hood in the architecture, with a focus on the problem of identity raised by the narrative.\nThe Xapagy architecture is a story-oriented cognitive system which relies exclusively on the autobiographical memory implemented as a raw collection of events. Reasoning is performed by shadowing current events with events from the autobiography. The shadows are then extrapolated into headless shadows (HLSs). In a story following mood, HLSs can be used to track the level of surprise of the agent, to infer hidden actions or relations between the participants, and to summarize ongoing events. 
In recall mood, the HLSs can be used to create new stories ranging from exact recall to free-form confabulation.\nMost optimization problems in real life applications are often highly nonlinear. Local optimization algorithms do not give the desired performance. So, only global optimization algorithms should be used to obtain optimal solutions. This paper introduces a new nature-inspired metaheuristic optimization algorithm, called Hoopoe Heuristic (HH). In this paper, we will study HH and validate it against some test functions. Investigations show that it is very promising and could be seen as an optimization of the powerful algorithm of cuckoo search. Finally, we discuss the features of Hoopoe Heuristic and propose topics for further studies.\nThere are many new forms of interfacing human users to machines. We persevere here electric mechanical form of interaction between human and machine. The emergence of brain-computer interface allows mind-to-movement systems. The story of the Pied Piper inspired us to devise some new heuristics for interfacing the human motor system using brain waves by combining a head helmet and LumbarMotionMonitor. For the simulation we use Java GridGain. Brain responses of classified subjects during training indicate that Probe can be the best stimulus to rely on in distinguishing between knowledgeable and not knowledgeable.\nMost searches for alien radio transmission have focused on finding omni-directional or purposefully earth-directed beams of enduring duration. However, most of the interesting signals so far detected have been transient and non-repeatable in nature. These signals could very well be the first data points in an ever-growing data base of such signals used to construct a probabilistic argument for the existence of extraterrestrial intelligence. 
This paper looks at the effect base rate bias could have on deciding which signals to include in such an archive based upon the likely assumption that our ability to discern natural from artificial signals will be less than perfect.\nProbabilistic programming is related to a compositional approach to stochastic modeling by switching from discrete to continuous time dynamics. In continuous time, an operator-algebra semantics is available in which processes proceeding in parallel (and possibly interacting) have summed time-evolution operators. From this foundation, algorithms for simulation, inference and model reduction may be systematically derived. The useful consequences are potentially far-reaching in computational science, machine learning and beyond. Hybrid compositional stochastic modeling/probabilistic programming approaches may also be possible.\nInstead of requiring a domain expert to specify the probabilistic dependencies of the data, in this work we present an approach that uses the relational DB schema to automatically construct a Bayesian graphical model for a database. This resulting model contains customized distributions for columns, latent variables that cluster the data, and factors that reflect and represent the foreign key links. Experiments demonstrate the accuracy of the model and the scalability of inference on synthetic and real-world data.\nGraphical models with bi-directed edges (<->) represent marginal independence: the absence of an edge between two vertices indicates that the corresponding variables are marginally independent. In this paper, we consider maximum likelihood estimation in the case of continuous variables with a Gaussian joint distribution, sometimes termed a covariance graph model. We present a new fitting algorithm which exploits standard regression techniques and establish its convergence properties. 
Moreover, we contrast our procedure with existing estimation methods.\nRepresentations based on random walks can exploit discrete data distributions for clustering and classification. We extend such representations from discrete to continuous distributions. Transition probabilities are now calculated using a diffusion equation with a diffusion coefficient that inversely depends on the data density. We relate this diffusion equation to a path integral and derive the corresponding path probability measure. The framework is useful for incorporating continuous data densities and prior knowledge.\nProduct models of low dimensional experts are a powerful way to avoid the curse of dimensionality. We present the \"under-complete product of experts\" (UPoE), where each expert models a one dimensional projection of the data. The UPoE is fully tractable and may be interpreted as a parametric probabilistic model for projection pursuit. Its ML learning rules are identical to the approximate learning rules proposed before for under-complete ICA. We also derive an efficient sequential learning algorithm and discuss its relationship to projection pursuit density estimation and feature induction algorithms for additive random field models.\nWe present a new statistical learning paradigm for Boltzmann machines based on a new inference principle we have proposed: the latent maximum entropy principle (LME). LME is different both from Jaynes' maximum entropy principle and from standard maximum likelihood estimation. We demonstrate the LME principle by deriving new algorithms for Boltzmann machine parameter estimation, and show how a robust and fast new variant of the EM algorithm can be developed. Our experiments show that estimation based on LME generally yields better results than maximum likelihood estimation, particularly when inferring hidden units from small amounts of data.\nWe introduce Dimple, a fully open-source API for probabilistic modeling. 
Dimple allows the user to specify probabilistic models in the form of graphical models, Bayesian networks, or factor graphs, and performs inference (by automatically deriving an inference engine from a variety of algorithms) on the model. Dimple also serves as a compiler for GP5, a hardware accelerator for inference.\nIn this paper we describe the task of extracting product and brand pages from wikipedia. We present an experimental environment and setup built on top of a dataset of wikipedia pages we collected. We introduce a method for recognition of product pages modelled as a boolean probabilistic classification task. We show that this approach can lead to promising results and we discuss alternative approaches we considered.\nIdentifying the social actor has become one of tasks in Artificial Intelligence, whereby extracting keyword from Web snippets depend on the use of web is steadily gaining ground in this research. We develop therefore an approach based on overlap principle for utilizing a collection of features in web snippets, where use of keyword will eliminate the un-relevant web pages.\nThe emergence of network technologies and the appearance of new varied applications in terms of services and resources, has created new security problems for which existing solutions and mechanisms are inadequate, especially problems of identification and authentication. In a highly distributed and pervasive system, a uniform and centralized security management is not an option. It then becomes necessary to give more autonomy to security systems by providing them with mechanisms that allows a dynamic and flexible cooperation and collaboration between the actors in the system.\nWe investigate the concept of symmetry and its role in problem solving. This paper first defines precisely the elements that constitute a \"problem\" and its \"solution,\" and gives several examples to illustrate these definitions. 
Given precise definitions of problems, it is relatively straightforward to construct a search process for finding solutions. Finally this paper attempts to exploit the concept of symmetry in improving problem solving.\nIn this article we show the rough outline of a computer algorithm to generate lower bounds on the exponential function of (in principle) arbitrary precision. We implemented this to generate all necessary analytic terms for the Boltzmann machine partition function thus leading to lower bounds of any order. It turns out that the extra variational parameters can be optimized analytically. We show that bounds up to ninth order are still reasonably calculable in practical situations. The generated terms can also be used as extra correction terms (beyond TAP) in mean field expansions.\nIn this paper, we introduce and evaluate a data-driven staged mixture modeling technique for building density, regression, and classification models. Our basic approach is to sequentially add components to a finite mixture model using the structural expectation maximization (SEM) algorithm. We show that our technique is qualitatively similar to boosting. This correspondence is a natural byproduct of the fact that we use the SEM algorithm to sequentially fit the mixture model. Finally, in our experimental evaluation, we demonstrate the effectiveness of our approach on a variety of prediction and density estimation tasks using real-world data.\nWe introduce the notion of fault tolerant mechanism design, which extends the standard game theoretic framework of mechanism design to allow for uncertainty about execution. Specifically, we define the problem of task allocation in which the private information of the agents is not only their costs to attempt the tasks, but also their probabilities of failure. 
For several different instances of this setting we present technical results, including positive ones in the form of mechanisms that are incentive compatible, individually rational and efficient, and negative ones in the form of impossibility theorems.\nActive learning is a powerful approach to analyzing data effectively. We show that the feasibility of active learning depends crucially on the choice of measure with respect to which the query is being optimized. The standard information gain, for example, does not permit an accurate evaluation with a small committee, a representative subset of the model space. We propose a surrogate measure requiring only a small committee and discuss the properties of this new measure. We devise, in addition, a bootstrap approach for committee selection. The advantages of this approach are illustrated in the context of recovering (regulatory) network models.\nThis paper presents a novel method of foreground segmentation that distinguishes moving objects from their moving cast shadows in monocular image sequences. The models of background, edge information, and shadow are set up and adaptively updated. A Bayesian belief network is proposed to describe the relationships among the segmentation label, background, intensity, and edge information. The notion of Markov random field is used to encourage the spatial connectivity of the segmented regions. The solution is obtained by maximizing the posterior probability density of the segmentation field.\nNP-SPEC is a language for specifying problems in NP in a declarative way. Despite the fact that the semantics of the language was given by referring to Datalog with circumscription, which is very close to ASP, so far the only existing implementations are by means of ECLiPSe Prolog and via Boolean satisfiability solvers. In this paper, we present translations from NP-SPEC into various forms of ASP and analyze them. 
We also argue that it might be useful to incorporate certain language constructs of NP-SPEC into mainstream ASP.\nIn this paper we continue the work on our extension of Answer Set Programming by non-Herbrand functions and add to the language support for arithmetic expressions and various inequality relations over non-Herbrand functions, as well as consistency-restoring rules from CR-Prolog. We demonstrate the use of this latest version of the language in the representation of important kinds of knowledge.\nThe advance of Internet and Sensor technology has brought about new challenges evoked by the emergence of continuous data streams. Beyond rapid data processing, application areas like ambient assisted living, robotics, or dynamic scheduling involve complex reasoning tasks. We address such scenarios and elaborate upon approaches to knowledge-intense stream reasoning, based on Answer Set Programming (ASP). While traditional ASP methods are devised for singular problem solving, we develop new techniques to formulate and process problems dealing with emerging as well as expiring data in a seamless way.\nWe present alternative definitions of the first-order stable model semantics and its extension to incorporate generalized quantifiers by referring to the familiar notion of a reduct instead of referring to the SM operator in the original definitions. Also, we extend the FLP stable model semantics to allow generalized quantifiers by referring to an operator that is similar to the $\\sm$ operator. For a reasonable syntactic class of logic programs, we show that the two stable model semantics of generalized quantifiers are interchangeable.\nWe investigate the relationship between the generalization of program completion defined in 1984 by Lloyd and Topor and the generalization of the stable model semantics introduced recently by Ferraris et al. 
The main theorem can be used to characterize, in some cases, the general stable models of a logic program by a first-order formula. The proof uses Truszczynski's stable model semantics of infinitary propositional formulas.\nClassification is one of the major issues in Data Mining Research fields. The classification problems in the medical area often classify medical datasets based on the result of medical diagnosis or description of medical treatment by the medical practitioner. This research work discusses the classification process of Gene Expression data for three different cancers which are breast cancer, lung cancer and leukemia cancer with two classes which are cancerous stage and non cancerous stage. We have applied a fuzzy soft set similarity based classifier to enhance the accuracy to predict the stages among cancer genes and the informative genes are selected by using Entropy filtering.\nThis paper presents a novel approach based on variable forgetting, which is a useful tool in resolving contradictions by filtering some given variables, to merging multiple knowledge bases. This paper first builds a relationship between belief merging and variable forgetting by using dilation. Variable forgetting is applied to capture belief merging operation. Finally, some new merging operators are developed by modifying candidate variables to amend the shortage of traditional merging operators. Different from model selection of traditional merging operators, as an alternative approach, variable selection in those new operators could provide intuitive information about an atom variable among whole knowledge bases.\nThis volume contains the papers presented at the fifth workshop on Answer Set Programming and Other Computing Paradigms (ASPOCP 2012) held on September 4th, 2012 in Budapest, co-located with the 28th International Conference on Logic Programming (ICLP 2012). 
It thus continues a series of previous events co-located with ICLP, aiming at facilitating the discussion about crossing the boundaries of current ASP techniques in theory, solving, and applications, in combination with or inspired by other computing paradigms.\nSome high-dimensional data sets can be modelled by assuming that there are many different linear constraints, each of which is Frequently Approximately Satisfied (FAS) by the data. The probability of a data vector under the model is then proportional to the product of the probabilities of its constraint violations. We describe three methods of learning products of constraints using a heavy-tailed probability distribution for the violations.\nWe present an iterative Markov chain Monte Carlo algorithm for computing reference priors and minimax risk for general parametric families. Our approach uses MCMC techniques based on the Blahut-Arimoto algorithm for computing channel capacity in information theory. We give a statistical analysis of the algorithm, bounding the number of samples required for the stochastic algorithm to closely approximate the deterministic algorithm in each iteration. Simulations are presented for several examples from exponential families. Although we focus on applications to reference priors and minimax risk, the methods and analysis we develop are applicable to a much broader class of optimization problems and iterative algorithms.\nDeep Learning models enjoy considerable success in Natural Language Processing. While deep architectures produce useful representations that lead to improvements in various tasks, they are often difficult to interpret. This makes the analysis of learned structures particularly difficult. In this paper, we rely on empirical tests to see whether a particular structure makes sense. We present an analysis of the Semi-Supervised Recursive Autoencoder, a well-known model that produces structural representations of text. 
We show that for certain tasks, the structure of the autoencoder can be significantly reduced without loss of classification accuracy and we evaluate the produced structures using human judgment.\nWe present a new method for conducting Monte Carlo inference in graphical models which combines explicit search with generalized importance sampling. The idea is to reduce the variance of importance sampling by searching for significant points in the target distribution. We prove that it is possible to introduce search and still maintain unbiasedness. We then demonstrate our procedure on a few simple inference tasks and show that it can improve the inference quality of standard MCMC methods, including Gibbs sampling, Metropolis sampling, and Hybrid Monte Carlo. This paper extends previous work which showed how greedy importance sampling could be correctly realized in the one-dimensional case.\nWe define a generalized likelihood function based on uncertainty measures and show that maximizing such a likelihood function for different measures induces different types of classifiers. In the probabilistic framework, we obtain classifiers that optimize the cross-entropy function. In the possibilistic framework, we obtain classifiers that maximize the interclass margin. Furthermore, we show that the support vector machine is a sub-class of these maximum-margin classifiers.\nRe-identification algorithms are used in data privacy to measure disclosure risk. They model the situation in which an adversary attacks a published database by means of linking the information of this adversary with the database.   In this paper we formalize this type of algorithm in terms of true probabilities and compatible belief functions. The purpose of this work is to leave aside as re-identification algorithms those algorithms that do not satisfy a minimum requirement.\nDevelopment of Interactive Theorem Provers has led to the creation of big libraries and varied infrastructures for formal proofs. 
However, despite (or perhaps due to) their sophistication, the re-use of libraries by non-experts or across domains is a challenge. In this paper, we provide detailed case studies and evaluate the machine-learning tool ML4PG built to interactively data-mine the electronic libraries of proofs, and to provide user guidance on the basis of proof patterns found in the existing libraries.\nThe notion of rough set captures indiscernibility of elements in a set. But, in many real life situations, an information system establishes the relation between different universes. This gave the extension of rough set on single universal set to rough set on two universal sets. In this paper, we introduce approximation of classifications and measures of uncertainty basing upon rough set on two universal sets employing the knowledge due to binary relations.\nWe describe a method for predicting a classification of an object given classifications of the objects in the training set, assuming that the pairs object/classification are generated by an i.i.d. process from a continuous probability distribution. Our method is a modification of Vapnik's support-vector machine; its main novelty is that it gives not only the prediction itself but also a practicable measure of the evidence found in support of that prediction. We also describe a procedure for assigning degrees of confidence to predictions made by the support vector machine. Some experimental results are presented, and possible extensions of the algorithms are discussed.\nIn this paper we investigate the geometry of the likelihood of the unknown parameters in a simple class of Bayesian directed graphs with hidden variables. This enables us, before any numerical algorithms are employed, to obtain certain insights in the nature of the unidentifiability inherent in such models, the way posterior densities will be sensitive to prior densities and the typical geometrical form these posterior densities might take. 
Many of these insights carry over into more complicated Bayesian networks with systematic missing data.\nWe previously designed Partial Order Conflict Driven Clause Learning (PO-CDCL), a variation of the satisfiability solving CDCL algorithm with a partial order on decision levels, and showed that it can speed up the solving on problems with a high independence between decision levels. In this paper, we more thoroughly analyze the reasons for the efficiency of PO-CDCL. Of particular importance is that the partial order introduces several candidates for the assertion level. By evaluating different heuristics for this choice, we show that the assertion level selection has an important impact on solving and that a carefully designed heuristic can significantly improve performances on relevant benchmarks.\nThe standard way to parameterize the distributions represented by a directed acyclic graph is to insert a parametric family for the conditional distribution of each random variable given its parents. We show that when one's goal is to test for or estimate an effect of a sequentially applied treatment, this natural parameterization has serious deficiencies. By reparameterizing the graph using structural nested models, these deficiencies can be avoided.\nDeep Neural Networks now excel at image classification, detection and segmentation. When used to scan images by means of a sliding window, however, their high computational complexity can bring even the most powerful hardware to its knees. We show how dynamic programming can speed up the process by orders of magnitude, even when max-pooling layers are present.\nWe analyse the complexity of environments according to the policies that need to be used to achieve high performance. The performance results for a population of policies lead to a distribution that is examined in terms of policy complexity and analysed through several diagrams and indicators. 
The notion of environment response curve is also introduced, by inverting the performance results into an ability scale. We apply all these concepts, diagrams and indicators to a minimalistic environment class, agent-populated elementary cellular automata, showing how the difficulty, discriminating power and ranges (previous to normalisation) may vary for several environments.\nWe extend the theory of d-separation to cases in which data instances are not independent and identically distributed. We show that applying the rules of d-separation directly to the structure of probabilistic models of relational data inaccurately infers conditional independence. We introduce relational d-separation, a theory for deriving conditional independence facts from relational models. We provide a new representation, the abstract ground graph, that enables a sound, complete, and computationally efficient method for answering d-separation queries about relational models, and we present empirical results that demonstrate effectiveness.\nMorris (1996, 1997) introduced preference-based definitions of knowledge and belief in standard state-space structures. This paper extends this preference-based approach to unawareness structures (Heifetz, Meier, and Schipper, 2006, 2008). By defining unawareness and knowledge in terms of preferences over acts in unawareness structures and showing their equivalence to the epistemic notions of unawareness and knowledge, we try to build a bridge between decision theory and epistemic logic. Unawareness of an event is characterized behaviorally as the event being null and its negation being null.\nDistributed decision-makers are modeled as players in a game with two levels. High level decisions concern the game environment and determine the willingness of the players to form a coalition (or group). Low level decisions involve the actions to be implemented within the chosen environment. 
Coalition and action strategies are determined by probability distributions, which are updated using learning automata schemes. The payoffs are also probabilistic and there is uncertainty in the state vector since information is delayed. The goal is to reach equilibrium in both levels of decision making; the results show the conditions for instability, based on the age of information.\nThe Tawny-OWL library provides a fully-programmatic environment for ontology building; it enables the use of a rich set of tools for ontology development, by recasting development as a form of programming. It is built in Clojure - a modern Lisp dialect, and is backed by the OWL API. Used simply, it has a similar syntax to OWL Manchester syntax, but it provides arbitrary extensibility and abstraction. It builds on existing facilities for Clojure, which provides a rich and modern programming tool chain, for versioning, distributed development, build, testing and continuous integration. In this paper, we describe the library, this environment and its potential implications for the ontology development process.\nWe propose a validity preserving translation from a subset of epistemic Alternating-time Temporal Logic (ATL) to epistemic Computation Tree Logic (CTL). The considered subset of epistemic ATL is known to have the finite model property and decidable model-checking. This entails the decidability of validity but the implied algorithm is unfeasible. Reducing the validity problem to that in a corresponding system of CTL makes the techniques for automated deduction for that logic available for the handling of the apparently more complex system of ATL.\nThis paper introduces Gene-Machine, an efficient and new search heuristic algorithm, based in the building-block hypothesis. It is inspired by natural evolution, but does not use some of the concepts present in genetic algorithms like population, mutation and generation. 
This heuristic exhibits good performance in comparison with genetic algorithms, and can be used to generate useful solutions to optimization and search problems.\nWe analyze the meaning of the violation of the marginal probability law for situations of correlation measurements where entanglement is identified. We show that for quantum theory applied to the cognitive realm such a violation does not lead to the type of problems commonly believed to occur in situations of quantum theory applied to the physical realm. We briefly situate our quantum approach for modeling concepts and their combinations with respect to the notions of 'extension' and 'intension' in theories of meaning, and in existing concept theories.\nThe ability to automatically generalise (interactive) proofs and use such generalisations to discharge related conjectures is a very hard problem which remains unsolved. Here, we develop a notion of goal types to capture key properties of goals, which enables abstractions over the specific order and number of sub-goals arising when composing tactics. We show that the goal types form a lattice, and utilise this property in the techniques we develop to automatically generalise proof strategies in order to reuse it for proofs of related conjectures. We illustrate our approach with an example.\nWe are dealing with the problem of space layout planning here. We present an architectural conceptual CAD approach. Starting with design specifications in terms of constraints over spaces, a specific enumeration heuristics leads to a complete set of consistent conceptual design solutions named topological solutions. 
These topological solutions which do not presume any precise definitive dimension correspond to the sketching step that an architect carries out from the Design specifications on a preliminary design phase in architecture.\nThis paper presents capabilities of using genetic algorithms to find approximations of function extrema, which cannot be found using analytic ways. To enhance effectiveness of calculations, algorithm has been parallelized using OpenMP library. We gained much increase in speed on platforms using multithreaded processors with shared memory free access. During analysis we used different modifications of genetic operator, using them we obtained varied evolution process of potential solutions. Results allow to choose best methods among many applied in genetic algorithms and observation of acceleration on Yorkfield, Bloomfield, Westmere-EX and most recent Sandy Bridge cores.\nSeveral algorithms have been proposed for discovering patterns from trajectories of moving objects, but only a few have concentrated on outlier detection. Existing approaches, in general, discover spatial outliers, and do not provide any further analysis of the patterns. In this paper we introduce semantic spatial and spatio-temporal outliers and propose a new algorithm for trajectory outlier detection. Semantic outliers are computed between regions of interest, where objects have similar movement intention, and there exist standard paths which connect the regions. We show with experiments on real data that the method finds semantic outliers from trajectory data that are not discovered by similar approaches.\nHepatitis C virus (HCV) is a widely spread disease all over the world. HCV has very high mutation rate that makes it resistant to antibodies. Modeling HCV to identify the virus mutation process is essential to its detection and predicting its evolution. This paper presents a model based framework for estimating mutation rate of HCV in two steps. 
Firstly profile hidden Markov model (PHMM) architecture was built to select the sequences that represent each year. Secondly mutation rate was calculated by using pair-wise distance method between sequences. A pilot study is conducted on NS5B zone of HCV dataset of genotype 4 subtype a (HCV4a) in Egypt.\nIn this article, we combine the concept of a bipolar fuzzy set and a soft set. We introduce the notion of bipolar fuzzy soft set and study fundamental properties. We study basic operations on bipolar fuzzy soft set. We define extended union, intersection of two bipolar fuzzy soft sets. We also give an application of bipolar fuzzy soft set into decision making problem. We give a general algorithm to solve decision making problems by using bipolar fuzzy soft set.\nThe Bootstrap method application in simulation supposes that value of random variables are not generated during the simulation process but extracted from available sample populations. In the case of Hierarchical Bootstrap the function of interest is calculated recurrently using the calculation tree. In the present paper we consider the optimization of sample sizes in each vertex of the calculation tree. The dynamic programming method is used for this aim. Proposed method allows to decrease a variance of system characteristic estimators.\nPhysical symbol systems are needed for open-ended cognition. A good way to understand physical symbol systems is by comparison of thought to chemistry. Both have systematicity, productivity and compositionality. The state of the art in cognitive architectures for open-ended cognition is critically assessed. I conclude that a cognitive architecture that evolves symbol structures in the brain is a promising candidate to explain open-ended cognition. Part 2 of the paper presents such a cognitive architecture.\nWe generalize the notion of symmetries of propositional formulas in conjunctive normal form to modal formulas. 
Our framework uses the coinductive models and, hence, the results apply to a wide class of modal logics including, for example, hybrid logics. Our main result shows that the symmetries of a modal formula preserve entailment.\nDo two data samples come from different distributions? Recent studies of this fundamental problem focused on embedding probability distributions into sufficiently rich characteristic Reproducing Kernel Hilbert Spaces (RKHSs), to compare distributions by the distance between their embeddings. We show that Regularized Maximum Mean Discrepancy (RMMD), our novel measure for kernel-based hypothesis testing, yields substantial improvements even when sample sizes are small, and excels at hypothesis tests involving multiple comparisons with power control. We derive asymptotic distributions under the null and alternative hypotheses, and assess power control. Outstanding results are obtained on: challenging EEG data, MNIST, the Berkley Covertype, and the Flare-Solar dataset.\nIn this paper, we address the problem of enumerating all models of a Boolean formula in conjunctive normal form (CNF). We propose an extension of CDCL-based SAT solvers to deal with this fundamental problem. Then, we provide an experimental evaluation of our proposed SAT model enumeration algorithms on both satisfiable SAT instances taken from the last SAT challenge and on instances from the SAT-based encoding of sequence mining problems.\nIn this paper, we firstly give a brief introduction of expectation maximization (EM) algorithm, and then discuss the initial value sensitivity of expectation maximization algorithm. Subsequently, we give a short proof of EM's convergence. Then, we implement experiments with the expectation maximization algorithm (we implement all the experiments on Gaussian mixture model (GMM)). 
Our experiment with expectation maximization is performed in the following three cases: initialize randomly; initialize with result of K-means; initialize with result of K-medoids. The experiment result shows that expectation maximization algorithm depends on its initial state or parameters. And we found that EM initialized with K-medoids performed better than both the one initialized with K-means and the one initialized randomly.\nIn this paper we present a new concept called generalized neutrosophic soft set. This concept incorporates the beneficial properties of both generalized neutrosophic set introduced by A.A. Salama [7] and soft set techniques proposed by Molodtsov [4]. We also study some properties of this concept. Some definitions and operations have been introduced on generalized neutrosophic soft set. Finally we present an application of generalized neutrosophic soft set in decision making problem.\nIn this paper, we propose an extension of our Mining for SAT framework to Constraint Satisfaction Problem (CSP). We consider n-ary extensional constraints (table constraints). Our approach aims to reduce the size of the CSP by exploiting the structure of the constraints graph and of its associated microstructure. More precisely, we apply itemset mining techniques to search for closed frequent itemsets on these two representations. Using Tseitin extension, we rewrite the whole CSP to another compressed CSP equivalent with respect to satisfiability. Our approach contrasts with the previously proposed approach by Katsirelos and Walsh, as we do not change the structure of the constraints.\nIn this work we present a family of neural networks, the multi-layer perceptron networks, and some of the algorithms used to train those networks (we hope that with enough details and precision as to satisfy a mathematical public). 
Then we study how to use those networks to solve a problem that arises from the field of information security: the remote identification of Operating Systems (part of the information gathering steps of the penetration testing methodology). This is the contribution of this work: it is an application of classic Artificial Intelligence techniques to a classification problem that gave better results than the classic techniques used to solve it.\nAnnotation errors can significantly hurt classifier performance, yet datasets are only growing noisier with the increased use of Amazon Mechanical Turk and techniques like distant supervision that automatically generate labels. In this paper, we present a robust extension of logistic regression that incorporates the possibility of mislabelling directly into the objective. Our model can be trained through nearly the same means as logistic regression, and retains its efficiency on high-dimensional datasets. Through named entity recognition experiments, we demonstrate that our approach can provide a significant improvement over the standard model when annotation errors are present.\nThe Semantic Web works on the existing Web which presents the meaning of information as well-defined vocabularies understood by the people. Semantic Search, at the same time, works on improving the accuracy of a search by understanding the intent of the search and providing contextually relevant results. This paper describes a semantic approach toward web search through a PHP application. The goal was to parse through a user's browsing history and return semantically relevant web pages for the search query provided.\nThe Low Autocorrelation Binary Sequence problem has applications in telecommunications, is of theoretical interest to physicists, and has inspired many optimisation researchers. Metaheuristics for the problem have progressed greatly in recent years but complete search has not progressed since a branch-and-bound method of 1996. 
In this paper we find four ways of improving branch-and-bound, leading to a tighter relaxation, faster convergence to optimality, and better empirical scalability.\nWe propose a cooperative coevolutionary genetic algorithm for learning Bayesian network structures from fully observable data sets. Since this problem can be decomposed into two dependent subproblems, that is to find an ordering of the nodes and an optimal connectivity matrix, our algorithm uses two subpopulations, each one representing a subtask. We describe the empirical results obtained with simulations of the Alarm and Insurance networks. We show that our algorithm outperforms the deterministic algorithm K2.\nThis article first lists reasons why - in the long term or when creating a new knowledge base (KB) for general knowledge sharing purposes - collaboratively building a well-organized KB does/can provide more possibilities, with on the whole no more costs, than the mainstream approach where knowledge creation and re-use involves searching, merging and creating (semi-)independent (relatively small) ontologies or semi-formal documents. The article lists elements required to achieve this and describes the main one: a KB editing protocol that keeps the KB free of automatically/manually detected inconsistencies while not forcing them to discuss or agree on terminology and beliefs nor requiring a selection committee.\nQualitative spatial and temporal reasoning is based on so-called qualitative calculi. Algebraic properties of these calculi have several implications on reasoning algorithms. But what exactly is a qualitative calculus? And to which extent do the qualitative calculi proposed meet these demands? The literature provides various answers to the first question but only few facts about the second. In this paper we identify the minimal requirements to binary spatio-temporal calculi and we discuss the relevance of the according axioms for representation and reasoning. 
We also analyze existing qualitative calculi and provide a classification involving different notions of a relation algebra.\nIn this paper, we develop an agent-based model which integrates four important elements, i.e. organisational energy management policies/regulations, energy management technologies, electric appliances and equipment, and human behaviour, to simulate the electricity consumption in office buildings. Based on a case study, we use this model to test the effectiveness of different electricity management strategies, and solve practical office electricity consumption problems. This paper theoretically contributes to an integration of the four elements involved in the complex organisational issue of office electricity consumption, and practically contributes to an application of an agent-based approach for office building electricity consumption study.\nA new defence mechanism against different jamming attacks on Wireless Sensor Networks (WSN), based on an ant system, is introduced. The artificial sensitive ants react to network attacks in particular based on their sensitivity level. The information is re-directed from the attacked node to its appropriate destination node. We analyze how jamming attacks are detected and isolated with mobile agents in general, and with the new sensitive ant-based approach in particular.\nThe paper describes some basic approaches to detection of bottlenecks in composite (modular) systems. 
The following basic system bottlenecks detection problems are examined: (1) traditional quality management approaches (Pareto chart based method, multicriteria analysis as selection of Pareto-efficient points, and/or multicriteria ranking), (2) selection of critical system elements (critical components/modules, critical component interconnection), (3) selection of interconnected system components as composite system faults (via clique-based fusion), (4) critical elements (e.g., nodes) in networks, and (5) predictive detection of system bottlenecks (detection of system components based on forecasting of their parameters). Here, heuristic solving schemes are used. Numerical examples illustrate the approaches.\nAlgorithm portfolio and selection approaches have achieved remarkable improvements over single solvers. However, the implementation of such systems is often highly customised and specific to the problem domain. This makes it difficult for researchers to explore different techniques for their specific problems. We present LLAMA, a modular and extensible toolkit implemented as an R package that facilitates the exploration of a range of different portfolio techniques on any problem domain. It implements the algorithm selection approaches most commonly used in the literature and leverages the extensive library of machine learning algorithms and techniques in R. We describe the current capabilities and limitations of the toolkit and illustrate its usage on a set of example SAT problems.\nOptimal probabilistic approach in reinforcement learning is computationally infeasible. Its simplification consisting in neglecting difference between true environment and its model estimated using limited number of observations causes exploration vs exploitation problem. 
Uncertainty can be expressed in terms of a probability distribution over the space of environment models, and this uncertainty can be propagated to the action-value function via Bellman iterations, which are computationally insufficiently efficient though. We consider possibility of directly measuring uncertainty of the action-value function, and analyze sufficiency of this facilitated approach.\nKnowledge Representation (KR) is traditionally based on the logic of facts, expressed in boolean logic. However, facts about an agent can also be seen as a set of accomplished tasks by the agent. This paper proposes a new approach to KR: the notion of task logical KR based on Computability Logic. This notion allows the user to represent both accomplished tasks and accomplishable tasks by the agent. This notion allows us to build sophisticated KRs about many interesting agents, which have not been supported by previous logical languages.\nIn this paper I present a new approach for regression of time series using their own samples. This is a celebrated problem known as Auto-Regression. Dealing with outlier or missed samples in a time series makes the problem of estimation difficult, so it should be robust against them. Moreover for coding purposes I will show that it is desired the residual of auto-regression be sparse. To these aims, I first assume a multivariate Gaussian prior on the residual and then obtain the estimation. Two simple simulations have been done on spectrum estimation and speech coding.\nIn this paper we give a partially mechanized proof of the correctness of Steane's 7-qubit error correcting code, using the tool Quantomatic. To the best of our knowledge, this represents the largest and most complicated verification task yet carried out using Quantomatic.\nSolving Quadratic equation is one of the intrinsic interests as it is the simplest nonlinear equations. A novel approach for solving Quadratic Equation based on Genetic Algorithms (GAs) is presented. 
Genetic Algorithms (GAs) are a technique to solve problems which need optimization. Generation of trial solutions have been formed by this method. Many examples have been worked out, and in most cases we find out the exact solution. We have discussed the effect of different parameters on the performance of the developed algorithm. The results are concluded after rigorous testing on different equations.\nIn the area of Pattern Recognition and Matching, finding a Longest Common Subsequence plays an important role. In this paper, we have proposed one algorithm based on parallel computation. We have used OpenMP API package as middleware to send the data to different processors. We have tested our algorithm in a system having four processors and 2 GB physical memory. The best result showed that the parallel algorithm increases the performance (speed of computation) by 3.22.\nThis paper describes a number of distributed forward search algorithms for solving multi-agent planning problems. We introduce a distributed formulation of non-optimal forward search, as well as an optimal version, MAD-A*. Our algorithms exploit the structure of multi-agent problems to not only distribute the work efficiently among different agents, but also to remove symmetries and reduce the overall workload. The algorithms ensure that private information is not shared among agents, yet computation is still efficient -- outperforming current state-of-the-art distributed planners, and in some cases even centralized search -- despite the fact that each agent has access only to partial information.\nThroughout the history of games, representing the abilities of the various agents acting on behalf of the players has been a central concern. With increasingly sophisticated games emerging, these simulations have become more realistic, but the underlying mechanisms are still, to a large extent, of an ad hoc nature. 
This paper proposes using a logistic model from psychometrics as a unified mechanism for task resolution in simulation-oriented games.\nThe LCF tradition of interactive theorem proving, which was started by Milner in the 1970s, appears to be tied to the classic READ-EVAL-PRINT-LOOP of sequential and synchronous evaluation of prover commands. We break up this loop and retrofit the read-eval-print phases into a model of parallel and asynchronous proof processing. Thus we explain some key concepts of the Isabelle/Scala approach to prover interaction and integration, and the Isabelle/jEdit Prover IDE as front-end technology. We hope to open up the scientific discussion about non-trivial interaction models for ITP systems again, and help get other old-school proof assistants on a similar track.\nIn this paper we provide a simple random-variable example of inconsistent information, and analyze it using three different approaches: Bayesian, quantum-like, and negative probabilities. We then show that, at least for this particular example, both the Bayesian and the quantum-like approaches have less normative power than the negative probabilities one.\nIn this paper, we discuss the CNF-SAT problem (an NP-complete problem) and analyze two solutions that can solve the problem, the PL-Resolution algorithm and the WalkSAT algorithm. PL-Resolution is a sound and complete algorithm that can be used to determine satisfiability and unsatisfiability with certainty. WalkSAT can determine satisfiability if it finds a model, but it cannot guarantee to find a model even if one exists. However, WalkSAT is much faster than PL-Resolution, which makes WalkSAT more practical; we have analyzed the performance of these two algorithms, and the performance of WalkSAT is acceptable if the problem is not too hard.\nIn this paper, we describe an approach that enables an autonomous system to infer the semantics of a command (i.e.
a symbol sequence representing an action) in terms of the relations between changes in the observations and the action instances. We present a method of how to induce a theory (i.e. a semantic description) of the meaning of a command in terms of a minimal set of background knowledge. The only thing we have is a sequence of observations from which we extract what kinds of effects were caused by performing the command. This way, we yield a description of the semantics of the action and, hence, a definition.\nThe combined approach of the Qualitative Reasoning and Probabilistic Functions for the knowledge representation is proposed. The method aims to represent uncertain, qualitative knowledge that is essential for the moving blocks task's execution. The attempt to formalize the commonsense knowledge is performed with the Situation Calculus language for reasoning and robot's beliefs representation. The method is implemented in the Prolog programming language and tested for a specific simulated scenario. In most cases the implementation enables us to solve a given task, i.e., move blocks to desired positions. The example of robot's reasoning and main parts of the implemented program's code are presented.\nWe investigate different approaches to integrating object recognition and planning in a tabletop manipulation domain with the set of objects used in the 2012 RoboCup@Work competition. Results of our preliminary experiments show that, with some approaches, close integration of perception and planning improves the quality of plans, as well as the computation times of feasible plans.\nThe aim of this paper is to report on a novel text reduction technique, called Text Denoising, that highlights information-rich content when processing a large volume of text data, especially from the biomedical domain. The core feature of the technique, the text readability index, embodies the hypothesis that complex text is more information-rich than the rest.
When applied on tasks like biomedical relation bearing text extraction, keyphrase indexing and extracting sentences describing protein interactions, it is evident that the reduced set of text produced by text denoising is more information-rich than the rest.\nThe sigma point (SP) filter, also known as unscented Kalman filter, is an attractive alternative to the extended Kalman filter and the particle filter. Here, we extend the SP filter to nonsequential Bayesian inference corresponding to loopy factor graphs. We propose sigma point belief propagation (SPBP) as a low-complexity approximation of the belief propagation (BP) message passing scheme. SPBP achieves approximate marginalizations of posterior distributions corresponding to (generally) loopy factor graphs. It is well suited for decentralized inference because of its low communication requirements. For a decentralized, dynamic sensor localization problem, we demonstrate that SPBP can outperform nonparametric (particle-based) BP while requiring significantly less computation and communication.\nThis paper considers the problem of estimating the quality of machine translation outputs which are independent of human intervention and are generally addressed using machine learning techniques. There are various measures through which a machine learns translation quality. Automatic Evaluation metrics produce good correlation at corpus level but cannot produce the same results at the segment or sentence level. In this paper 16 features are extracted from the input sentences and their translations and a quality score is obtained based on Bayesian inference produced from training data.\nRecent work in psychology and experimental philosophy has shown that judgments of actual causation are often influenced by consideration of defaults, typicality, and normality.
A number of philosophers and computer scientists have also suggested that an appeal to such factors can help deal with problems facing existing accounts of actual causation. This paper develops a flexible formal framework for incorporating defaults, typicality, and normality into an account of actual causation. The resulting account takes actual causation to be both graded and comparative. We then show how our account would handle a number of standard cases.\nJudea Pearl was the first to propose a definition of actual causation using causal models. A number of authors have suggested that an adequate account of actual causation must appeal not only to causal structure, but also to considerations of normality. In earlier work, we provided a definition of actual causation using extended causal models, which include information about both causal structure and normality. Extended causal models are potentially very complex. In this paper, we show how it is possible to achieve a compact representation of extended causal models.\nWe propose a new approximate method for counting the number of the solutions for constraint satisfaction problem (CSP). The method derives from the partition function based on introducing the free energy and capturing the relationship of probabilities of variables and constraints, which requires the marginal probabilities. It firstly obtains the marginal probabilities using the belief propagation, and then computes the number of solutions according to the partition function. This allows us to directly plug the marginal probabilities into the partition function and efficiently count the number of solutions for CSP. The experimental results show that our method can solve both random problems and structural problems efficiently.\nIn this paper we present a neural oscillator model of stimulus response theory that exhibits quantum-like behavior. 
We then show that without adding any additional assumptions, a quantum model constructed to fit observable pairwise correlations has no predictive power over the unknown triple moment, obtainable through the activation of multiple oscillators. We compare this with the results obtained in de Barros (2013), where a criterion of rationality gives optimal ranges for the triple moment.\nWe propose a new family of message passing techniques for MAP estimation in graphical models which we call {\\em Sequential Reweighted Message Passing} (SRMP). Special cases include well-known techniques such as {\\em Min-Sum Diffusion} (MSD) and a faster {\\em Sequential Tree-Reweighted Message Passing} (TRW-S). Importantly, our derivation is simpler than the original derivation of TRW-S, and does not involve a decomposition into trees. This allows easy generalizations. We present such a generalization for the case of higher-order graphical models, and test it on several real-world problems with promising results.\nBackground: Understanding the distinction between function and role is vexing and difficult. While it appears to be useful, in practice this distinction is hard to apply, particularly within biology.   Results: I take an evolutionary approach, considering a series of examples, to develop and generate definitions for these concepts. I test them in practice against the Ontology for Biomedical Investigations (OBI). Finally, I give an axiomatisation and discuss methods for applying these definitions in practice.   Conclusions: The definitions in this paper are applicable, formalizing current practice. As such, they make a significant contribution to the use of these concepts within biomedical ontologies.\nWe introduce novel mathematical models and algorithms to generate (shortest or k different) explanations for biomedical queries, using answer set programming. We implement these algorithms and integrate them in BIOQUERY-ASP.
We illustrate the usefulness of these methods with some complex biomedical queries related to drug discovery, over the biomedical knowledge resources PHARMGKB, DRUGBANK, BIOGRID, CTD, SIDER, DISEASE ONTOLOGY and ORPHADATA. To appear in Theory and Practice of Logic Programming (TPLP).\nBoosting is known to be sensitive to label noise. We studied two approaches to improve AdaBoost's robustness against labelling errors. One is to employ a label-noise robust classifier as a base learner, while the other is to modify the AdaBoost algorithm to be more robust. Empirical evaluation shows that a committee of robust classifiers, although it converges faster than non label-noise aware AdaBoost, is still susceptible to label noise. However, pairing it with the new robust Boosting algorithm we propose here results in a more resilient algorithm under mislabelling.\nWe introduce stochastic variational inference for Gaussian process models. This enables the application of Gaussian process (GP) models to data sets containing millions of data points. We show how GPs can be variationally decomposed to depend on a set of globally relevant inducing variables which factorize the model in the necessary manner to perform variational inference. Our approach is readily extended to models with non-Gaussian likelihoods and latent variable models based around Gaussian processes. We demonstrate the approach on a simple toy problem and two real world data sets.\nA number of discrete and continuous optimization problems in machine learning are related to convex minimization problems under submodular constraints. In this paper, we deal with a submodular function with a directed graph structure, and we show that a wide range of convex optimization problems under submodular constraints can be solved much more efficiently than general submodular optimization methods by a reduction to a maximum flow problem.
Furthermore, we give some applications, including sparse optimization methods, in which the proposed methods are effective. Additionally, we evaluate the performance of the proposed method through computational experiments.\nA recent result has demonstrated that the Bethe partition function always lower bounds the true partition function of binary, log-supermodular graphical models. We demonstrate that these results can be extended to other interesting classes of graphical models that are not necessarily binary or log-supermodular: the ferromagnetic Potts model with a uniform external field and its generalizations and special classes of weighted graph homomorphism problems.\nWe tackle the challenge of efficiently learning the structure of expressive multivariate real-valued densities of copula graphical models. We start by theoretically substantiating the conjecture that for many copula families the magnitude of Spearman's rank correlation coefficient is monotone in the expected contribution of an edge in network, namely the negative copula entropy. We then build on this theory and suggest a novel Bayesian approach that makes use of a prior over values of Spearman's rho for learning copula-based models that involve a mix of copula families. We demonstrate the generalization effectiveness of our highly efficient approach on sizable and varied real-life datasets.\nWe seek to learn an effective policy for a Markov Decision Process (MDP) with continuous states via Q-Learning. Given a set of basis functions over state action pairs we search for a corresponding set of linear weights that minimizes the mean Bellman residual. 
Our algorithm uses a Kalman filter model to estimate those weights and we have developed a simpler approximate Kalman filter model that outperforms the current state of the art projected TD-Learning methods on several standard benchmark problems.\nThe condensed nearest neighbor (CNN) algorithm is a heuristic for reducing the number of prototypical points stored by a nearest neighbor classifier, while keeping the classification rule given by the reduced prototypical set consistent with the full set. I present an upper bound on the number of prototypical points accumulated by CNN. The bound originates in a bound on the number of times the decision rule is updated during training in the multiclass perceptron algorithm, and thus is independent of training set size.\nWe investigate the problem of learning the structure of a Markov network from data. It is shown that the structure of such networks can be described in terms of constraints which enables the use of existing solver technology with optimization capabilities to compute optimal networks starting from initial scores computed from the data. To achieve efficient encodings, we develop a novel characterization of Markov network structure using a balancing condition on the separators between cliques forming the network. The resulting translations into propositional satisfiability and its extensions such as maximum satisfiability, satisfiability modulo theories, and answer set programming, enable us to prove optimal certain network structures which have been previously found by stochastic search.\nIn Pawlak rough sets, the structure of the definable set families is simple and clear, but in generalizing rough sets, the structure of the definable set families is a bit more complex. There has been much research work focusing on this topic. However, as a fundamental issue in relation based rough sets, under what condition two relations induce the same definable set family has not been discussed. 
In this paper, based on the concept of the closure of relations, we present a necessary and sufficient condition for two relations to induce the same definable set family.\nWe consider the scenario where the parameters of a probabilistic model are expected to vary over time. We construct a novel prior distribution that promotes sparsity and adapts the strength of correlation between parameters at successive timesteps, based on the data. We derive approximate variational inference procedures for learning and prediction with this prior. We test the approach on two tasks: forecasting financial quantities from relevant text, and modeling language contingent on time-varying financial measurements.\nWe propose in this paper a new generative model for graphs that uses a latent space approach to explain timestamped interactions. The model is designed to provide global estimates of activity dates in historical networks where only the interaction dates between agents are known with reasonable precision. Experimental results show that the model provides better results than local averages in dense enough networks.\nIn this extended abstract, we carefully examine a purported counterexample to a postulate of iterated belief revision. We suggest that the example is better seen as a failure to apply the theory of belief revision in sufficient detail. The main contribution is conceptual, aimed at the literature on the philosophical foundations of the AGM theory of belief revision [1]. Our discussion is centered around the observation that it is often unclear whether a specific example is a \"genuine\" counterexample to an abstract theory or a misapplication of that theory to a concrete case.\nWe present a new temporal logic called Distribution Temporal Logic (DTL) defined over predicates of belief states and hidden states of partially observable systems. DTL can express properties involving uncertainty and likelihood that cannot be described by existing logics.
A co-safe formulation of DTL is defined and algorithmic procedures are given for monitoring executions of a partially observable Markov decision process with respect to such formulae. A simulation case study of a rescue robotics application outlines our approach.\nNature can be seen as informational structure with computational dynamics (info-computationalism), where an (info-computational) agent is needed for the potential information of the world to actualize. Starting from the definition of information as the difference in one physical system that makes a difference in another physical system, which combines Bateson and Hewitt definitions, the argument is advanced for natural computation as a computational model of the dynamics of the physical world where information processing is constantly going on, on a variety of levels of organization. This setting helps elucidate the relationships between computation, information, agency and cognition, within the common conceptual framework, which has special relevance for biology and robotics.\nWe introduce a new local search algorithm for satisfiability problems. Usual approaches focus uniformly on unsatisfied clauses. The new method works by picking uniformly random variables in unsatisfied clauses. A Variable-based Focused Metropolis Search (V-FMS) is then applied to random 3-SAT. We show that it is quite comparable in performance to the clause-based FMS. Consequences for algorithmic design are discussed.\nDempster-Shafer theory of evidence (D-S theory) is widely used in uncertain information processing. The basic probability assignment (BPA) is a key element in D-S theory. How to measure the distance between two BPAs is an open issue. In this paper, a new method to measure the distance of two BPAs is proposed. The proposed method is a generalization of the existing evidence distance.
Numerical examples illustrate that the proposed method can overcome the shortcomings of existing methods.\nIn this paper we consider optimization as an approach for quickly and flexibly developing hybrid cognitive capabilities that are efficient, scalable, and can exploit knowledge to improve solution speed and quality. In this context, we focus on the Three-Weight Algorithm, which aims to solve general optimization problems. We propose novel methods by which to integrate knowledge with this algorithm to improve expressiveness, efficiency, and scaling, and demonstrate these techniques on two example problems (Sudoku and circle packing).\nMany systems based on knowledge, especially expert systems for medical decision support have been developed. Only systems are based on production rules, and cannot learn and evolve only by updating them. In addition, taking into account several criteria induces an exorbitant number of rules to be injected into the system. It becomes difficult to translate medical knowledge or a support decision as a simple rule. Moreover, reasoning based on generic cases became classic and can even reduce the range of possible solutions. To remedy that, we propose an approach based on using a multi-criteria decision guided by a case-based reasoning (CBR) approach.\nConstraint-based pattern discovery is at the core of numerous data mining tasks. Patterns are extracted with respect to a given set of constraints (frequency, closedness, size, etc). In the context of sequential pattern mining, a large number of devoted techniques have been developed for solving particular classes of constraints. The aim of this paper is to investigate the use of Constraint Programming (CP) to model and mine sequential patterns in a sequence database. Our CP approach offers a natural way to simultaneously combine in the same framework a large set of constraints coming from various origins.
Experiments show the feasibility and the interest of our approach.\nIn this paper, we introduce for the first time the notions of neutrosophic measure and neutrosophic integral, and we develop the 1995 notion of neutrosophic probability. We present many practical examples. It is possible to define the neutrosophic measure and consequently the neutrosophic integral and neutrosophic probability in many ways, because there are various types of indeterminacies, depending on the problem we need to solve. Neutrosophics study the indeterminacy. Indeterminacy is different from randomness. It can be caused by physical space materials and type of construction, by items involved in the space, etc.\nCase-based planning can take advantage of former problem-solving experiences by storing in a plan library previously generated plans that can be reused to solve similar planning problems in the future. Although comparative worst-case complexity analyses of plan generation and reuse techniques reveal that it is not possible to achieve provable efficiency gain of reuse over generation, we show that the case-based planning approach can be an effective alternative to plan generation when similar reuse candidates can be chosen.\nWe present a mathematical framework for mapping second-order logic relations onto a simple state vector algebra. Using this algebra, basic theorems of set theory can be proven in an algorithmic way, hence by an expert system. We illustrate the use of the algebra with simple examples and show that, in principle, all theorems of basic set theory can be recovered in an elementary way. The developed technique can be used for automated theorem proving in 1st- and 2nd-order logic.\nOntology development is a non-trivial task requiring expertise in the chosen ontological language.
We propose a method for making the content of ontologies more transparent by presenting, through the use of natural language generation, naturalistic descriptions of ontology classes as textual paragraphs. The method has been implemented in a proof-of-concept system, OntoVerbal, that automatically generates paragraph-sized textual descriptions of ontological classes expressed in OWL. OntoVerbal has been applied to ontologies that can be loaded into Prot\\'eg\\'e and has been evaluated with SNOMED CT, showing that it provides coherent, well-structured and accurate textual descriptions of ontology classes.\nExpert System is developed as consulting service for users spread or public requires affordable access. The Internet has become a medium for such services, but the presence of mobile devices makes access more widespread by utilizing mobile web and WAP (Wireless Application Protocol). Applying expert systems applications over the web and WAP requires a knowledge base representation that can be accessed simultaneously. This paper proposes a single database to accommodate the knowledge representation with a decision tree mapping approach. Because the database exists, consulting applications through both web and WAP can access it to provide expert system services options for more affordable for public.\nThe purpose of this paper is to explore a new way of autonomous mapping. Current systems using perception techniques like LASER or SONAR use probabilistic methods and have a drawback of allowing considerable uncertainty in the mapping process. Our approach is to break down the environment, specifically indoor, into reachable areas and objects, separated by boundaries, and identify their shapes, to render various navigable paths around them. This is a novel method to do away with uncertainties, as far as possible, at the cost of temporal efficiency.
Also this system demands only minimum and cheap hardware, as it relies on only Infra-Red sensors to do the job.\nThis paper reviews related work and state-of-the-art publications for recognizing motor symptoms of Parkinson's Disease (PD). It presents research efforts that were undertaken to inform on how well traditional machine learning algorithms can handle this task. In particular, four PD related motor symptoms are highlighted (i.e. tremor, bradykinesia, freezing of gait and dyskinesia) and their details summarized. Thus the primary objective of this research is to provide a literary foundation for development and improvement of algorithms for detecting PD related motor symptoms.\nWe improve upon Huntington's affine geometry by showing that his independence proofs can be, in some cases, simplified. We carry out a systematic investigation of the strict notion of betweenness that Huntington employs (the three arguments are supposed to be distinct) by comparing it to McPhee's three axiom systems for the same intended class of structures, which employs weak betweenness (the arguments are permitted to be equal). Upon closely inspecting the proof that McPhee's axiom systems are equivalent to Huntington's (subject of course to the definition of weak betweenness in terms of strict and vice versa), one finds surprisingly that McPhee's axiom systems have quite different relations to strict betweenness.\nGiven that semantic Web realization is based on the critical mass of metadata accessibility and the representation of data with formal knowledge, it needs to generate metadata that is specific, easy to understand and well-defined. However, semantic annotation of the web documents is the successful way to make the Semantic Web vision a reality. This paper introduces the Semantic Web and its vision (stack layers) with regard to some concept definitions that help the understanding of semantic annotation.
Additionally, this paper introduces the semantic annotation categories, tools, domains and models.\nInformation discounting plays an important role in the theory of belief functions and, generally, in information fusion. Nevertheless, neither classical uniform discounting nor contextual discounting can model certain use cases, notably temporal discounting. In this article, new contextual discounting schemes, conservative, proportional and optimistic, are proposed. Some properties of these discounting operations are examined. Classical discounting is shown to be a special case of these schemes. Two motivating cases are discussed: modelling of source reliability and application to temporal discounting.\nFor a finite state automaton, a synchronizing sequence is an input sequence that takes all the states to the same state. Checking the existence of a synchronizing sequence and finding a synchronizing sequence, if one exists, can be performed in polynomial time. However, the problem of finding a shortest synchronizing sequence is known to be NP-hard. In this work, the usefulness of Answer Set Programming to solve this optimization problem is investigated, in comparison with brute-force algorithms and SAT-based approaches.   Keywords: finite automata, shortest synchronizing sequence, ASP\nIntegrating diverse formalisms into modular knowledge representation systems offers increased expressivity, modeling convenience and computational benefits. We introduce concepts of abstract modules and abstract modular systems to study general principles behind the design and analysis of model-finding programs, or solvers, for integrated heterogeneous multi-logic systems. We show how abstract modules and abstract modular systems give rise to transition systems, which are a natural and convenient representation of solvers pioneered by the SAT community.
We illustrate our approach by showing how it applies to answer set programming and propositional logic, and to multi-logic systems based on these two formalisms.\nNumerous machine learning problems require an exploration basis - a mechanism to explore the action space. We define a novel geometric notion of exploration basis with low variance, called volumetric spanners, and give efficient algorithms to construct such a basis.   We show how efficient volumetric spanners give rise to the first efficient and optimal regret algorithm for bandit linear optimization over general convex sets. Previously such results were known only for specific convex sets, or under special conditions such as the existence of an efficient self-concordant barrier for the underlying set.\nThis note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike policy iterations, the value iteration algorithm is not strongly polynomial for discounted dynamic programming.\nThe problem of Natural Language Query Formalization (NLQF) is to translate a given user query in natural language (NL) into a formal language so that the semantic interpretation has equivalence with the NL interpretation. Formalization of NL queries enables logic based reasoning during information retrieval, database query, question-answering, etc. Formalization also helps in Web query normalization and indexing, query intent analysis, etc. In this paper we are proposing a Description Logics based formal methodology for wh-query intent (also called desire) identification and corresponding formal translation. 
We evaluated the scalability of our proposed formalism using Microsoft Encarta 98 query dataset and OWL-S TC v.4.0 dataset.\nA new approach for signal parametrization, which consists of a specific regression model incorporating a discrete hidden logistic process, is proposed. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The parameters of the hidden logistic process, in the inner loop of the EM algorithm, are estimated using a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm. An experimental study using simulated and real data reveals good performances of the proposed approach.\nWe present a new mixture model-based discriminant analysis approach for functional data using a specific hidden process regression model. The approach allows for fitting flexible curve-models to each class of complex-shaped curves presenting regime changes. The model parameters are learned by maximizing the observed-data log-likelihood for each class by using a dedicated expectation-maximization (EM) algorithm. Comparisons on simulated data with alternative approaches show that the proposed approach provides better results.\nThis volume contains the papers presented at the sixth workshop on Answer Set Programming and Other Computing Paradigms (ASPOCP 2013) held on August 25th, 2013 in Istanbul, co-located with the 29th International Conference on Logic Programming (ICLP 2013). It thus continues a series of previous events co-located with ICLP, aiming at facilitating the discussion about crossing the boundaries of current ASP techniques in theory, solving, and applications, in combination with or inspired by other computing paradigms.\nA computer program capable of performing at a human-expert level in a narrow problem domain area is called an expert system.
Management of uncertainty is an intrinsically important issue in the design of expert systems because much of the information in the knowledge base of a typical expert system is imprecise, incomplete or not totally reliable. In this paper, the author present s the review of past work that has been carried out by various researchers based on development of expert systems for the diagnosis of cardiac disease\nCase-Bsed Reasoning (CBR) is a recent theory for problem-solving and learning in computers and people.Broadly construed it is the process of solving new problems based on the solution of similar past problems. In the present paper we introduce an absorbing Markov chain on the main steps of the CBR process.In this way we succeed in obtaining the probabilities for the above process to be in a certain step at a certain phase of the solution of the corresponding problem, and a measure for the efficiency of a CBR system. Examples are given to illustrate our results.\nConcepts of graph theory have applications in many areas of computer science including data mining, image segmentation, clustering, image capturing, networks, etc . An interval-valued fuzzy set is a generalization of the notion of a fuzzy set. Interval-valued fuzzy models give more precision, flexibility and compatibility to the system as compared to the fuzzy models. In this paper, we introduce the concept of antipodal interval - valued fuzzy graph and self median interval-valued fuzzy graph of the given interval-valued fuzzy graph. We investigate isomorphism properties of antipodal interval - valued fuzzy graphs.\nWe investigate cortical learning from the perspective of mechanism design. First, we show that discretizing standard models of neurons and synaptic plasticity leads to rational agents maximizing simple scoring rules. 
Second, our main result is that the scoring rules are proper, implying that neurons faithfully encode expected utilities in their synaptic weights and encode high-scoring outcomes in their spikes. Third, with this foundation in hand, we propose a biologically plausible mechanism whereby neurons backpropagate incentives which allows them to optimize their usefulness to the rest of cortex. Finally, experiments show that networks that backpropagate incentives can learn simple tasks.\nStandard models of multi-agent modal logic do not capture the fact that information is often \\emph{ambiguous}, and may be interpreted in different ways by different agents. We propose a framework that can model this, and consider different semantics that capture different assumptions about the agents' beliefs regarding whether or not there is ambiguity. We examine the expressive power of logics of ambiguity compared to logics that cannot model ambiguity, with respect to the different semantics that we propose.\nWe propose a novel method for approximate inference in Bayesian networks (BNs). The idea is to sample data from a BN, learn a latent tree model (LTM) from the data offline, and when online, make inference with the LTM instead of the original BN. Because LTMs are tree-structured, inference takes linear time. In the meantime, they can represent complex relationship among leaf nodes and hence the approximation accuracy is often good. Empirical evidence shows that our method can achieve good approximation accuracy at low online computational cost.\nIn this paper, we describe our autonomous bidding agent, RoxyBot, who emerged victorious in the travel division of the 2006 Trading Agent Competition in a photo finish. At a high level, the design of many successful trading agents can be summarized as follows: (i) price prediction: build a model of market prices; and (ii) optimization: solve for an approximately optimal set of bids, given this model. 
To predict, RoxyBot builds a stochastic model of market prices by simulating simultaneous ascending auctions. To optimize, RoxyBot relies on the sample average approximation method, a stochastic optimization technique.\nWe present an incentive-compatible polynomial-time approximation scheme for multi-unit auctions with general k-minded player valuations. The mechanism fully optimizes over an appropriately chosen sub-range of possible allocations and then uses VCG payments over this sub-range. We show that obtaining a fully polynomial-time incentive-compatible approximation scheme, at least using VCG payments, is NP-hard. For the case of valuations given by black boxes, we give a polynomial-time incentive-compatible 2-approximation mechanism and show that no better is possible, at least using VCG payments.\nWe extend the potential-based shaping method from Markov decision processes to multi-player general-sum stochastic games. We prove that the Nash equilibria in a stochastic game remains unchanged after potential-based shaping is applied to the environment. The property of policy invariance provides a possible way of speeding convergence when learning to play a stochastic game.\nIn this paper, we consider the problem of finding a minimum common partition of two strings. The problem has its application in genome comparison. As it is an NP-hard, discrete combinatorial optimization problem, we employ a metaheuristic technique, namely, MAX-MIN ant system to solve this problem. To achieve better efficiency we first map the problem instance into a special kind of graph. Subsequently, we employ a MAX-MIN ant system to achieve high quality solutions for the problem. Experimental results show the superiority of our algorithm in comparison with the state of art algorithm in the literature. The improvement achieved is also justified by standard statistical test.\nWe present a skill analysis with time series image data using data mining methods, focused on table tennis. 
We do not use body model, but use only hi-speed movies, from which time series data are obtained and analyzed using data mining methods such as C4.5 and so on. We identify internal models for technical skills as evaluation skillfulness for the forehand stroke of table tennis, and discuss mono and meta-functional skills for improving skills.\nThis paper presents a tree-to-tree transduction method for sentence compression. Our model is based on synchronous tree substitution grammar, a formalism that allows local distortion of the tree topology and can thus naturally capture structural mismatches. We describe an algorithm for decoding in this framework and show how the model can be trained discriminatively within a large margin framework. Experimental results on sentence compression bring significant improvements over a state-of-the-art model.\nThis article considers the task of automatically inducing role-semantic annotations in the FrameNet paradigm for new languages. We propose a general framework that is based on annotation projection, phrased as a graph optimization problem. It is relatively inexpensive and has the potential to reduce the human effort involved in creating role-semantic resources. Within this framework, we present projection models that exploit lexical and syntactic information. We provide an experimental evaluation on an English-German parallel corpus which demonstrates the feasibility of inducing high-precision German semantic role annotation both for manually and automatically annotated English data.\nThe talent scheduling problem is a simplified version of the real-world film shooting problem, which aims to determine a shooting sequence so as to minimize the total cost of the actors involved. In this article, we first formulate the problem as an integer linear programming model. Next, we devise a branch-and-bound algorithm to solve the problem. 
The branch-and-bound algorithm is enhanced by several accelerating techniques, including preprocessing, dominance rules and caching search states. Extensive experiments over two sets of benchmark instances suggest that our algorithm is superior to the current best exact algorithm. Finally, the impacts of different parameter settings are disclosed by some additional experiments.\nWe show that the propositional model counting problem #SAT for CNF- formulas with hypergraphs that allow a disjoint branches decomposition can be solved in polynomial time. We show that this class of hypergraphs is incomparable to hypergraphs of bounded incidence cliquewidth which were the biggest class of hypergraphs for which #SAT was known to be solvable in polynomial time so far. Furthermore, we present a polynomial time algorithm that computes a disjoint branches decomposition of a given hypergraph if it exists and rejects otherwise. Finally, we show that some slight extensions of the class of hypergraphs with disjoint branches decompositions lead to intractable #SAT, leaving open how to generalize the counting result of this paper.\nWe present an epistemic action theory for tractable epistemic reasoning as an extension to the h-approximation (HPX) theory. In contrast to existing tractable approaches, the theory supports functional fluents and postdictive reasoning with static causal laws. We argue that this combination is particularly synergistic because it allows one not only to perform direct postdiction about the conditions of actions, but also indirect postdiction about the conditions of static causal laws. We show that despite the richer expressiveness, the temporal projection problem remains tractable (polynomial), and therefore the planning problem remains in NP. We present the operational semantics of our theory as well as its formulation as Answer Set Programming.\nCurrently there are lots of plagiarism detection approaches. 
But few of them implemented and adapted for Persian languages. In this paper, our work on designing and implementation of a plagiarism detection system based on pre-processing and NLP technics will be described. And the results of testing on a corpus will be presented.\nDecision-Making Trial and Evaluation Laboratory (DEMATEL) method is widely used in many real applications. With the desirable property of efficient handling with the uncertain information in decision making, the fuzzy DEMATEL is heavily studied. Recently, Dytczak and Ginda suggested to defuzzify the fuzzy numbers firstly and then use the classical DEMATEL to obtain the final result. In this short paper, we show that it is not reasonable in some situations. The results of defuzzification at the first step are not coincide with the results of defuzzification at the final step.It seems that the alternative is to defuzzification in the final step in fuzzy DEMATEL.\nA formal framework is given for the characterizability of a class of belief revision operators, defined using minimization over a class of partial preorders, by postulates. It is shown that for partial orders characterizability implies a definability property of the class of partial orders in monadic second-order logic. Based on a non-definability result for a class of partial orders, an example is given of a non-characterizable class of revision operators. This appears to be the first non-characterizability result in belief revision.\nHow can we predict the difficulty of a Sudoku puzzle? We give an overview of difficulty rating metrics and evaluate them on extensive dataset on human problem solving (more then 1700 Sudoku puzzles, hundreds of solvers). The best results are obtained using a computational model of human solving activity. Using the model we show that there are two sources of the problem difficulty: complexity of individual steps (logic operations) and structure of dependency among steps. 
We also describe metrics based on analysis of solutions under relaxed constraints -- a novel approach inspired by phase transition phenomenon in the graph coloring problem. In our discussion we focus not just on the performance of individual metrics on the Sudoku puzzle, but also on their generalizability and applicability to other problems.\nCognitive Radio (CR) operates in different fields as varied, one of these is cognitive radio networks. In this paper, we propose a new approach used CR, which aims to manage potential failures of computer systems and applications through the introduction of two aspects of autonomous networks to make systems capable of managing themselves with minimum human intervention.\nIn this paper we introduce Epistemic Strategy Logic (ESL), an extension of Strategy Logic with modal operators for individual knowledge. This enhanced framework allows us to represent explicitly and to reason about the knowledge agents have of their own and other agents' strategies. We provide a semantics to ESL in terms of epistemic concurrent game models, and consider the corresponding model checking problem. We show that the complexity of model checking ESL is not worse than (non-epistemic) Strategy Logic\nOnline, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs where the action and observation space grows exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multiagent settings. This approach applies not only in the planning case, but also in the Bayesian reinforcement learning setting. 
Experimental results show that we are able to provide high quality solutions to large multiagent planning and learning problems.\nMTD(f) is a new minimax search algorithm, simpler and more efficient than previous algorithms. In tests with a number of tournament game playing programs for chess, checkers and Othello it performed better, on average, than NegaScout/PVS (the AlphaBeta variant used in practically all good chess, checkers, and Othello programs). One of the strongest chess programs of the moment, MIT's parallel chess program Cilkchess uses MTD(f) as its search algorithm, replacing NegaScout, which was used in StarSocrates, the previous version of the program.\nIt is important to find optimal solutions for structural errors in rule-based expert systems .Solutions to discovering such errors by using model checking techniques have already been proposed, but these solutions have problems such as state space explosion. In this paper, to overcome these problems, we model the rule-based systems as finite state transition systems and express confliction and unreachability as Computation Tree Logic (CTL) logic formula and then use the technique of model checking to detect confliction and unreachability in rule-based systems with the model checker UPPAAL.\nIn this paper, we address the dynamic Emergency Medical Service (EMS) systems. A dynamic location model is presented that tries to locate and relocate the ambulances. The proposed model controls the movements and locations of ambulances in order to provide a better coverage of the demand points under different fluctuation patterns that may happen during a given period of time. Some numerical experiments have been carried out by using some real-world data sets that have been collected through the French EMS system.\nDempster-Shafer evidence theory is a powerful tool in information fusion. When the evidence are highly conflicting, the counter-intuitive results will be presented. 
To adress this open issue, a new method based on evidence distance of Jousselme and Hausdorff distance is proposed. Weight of each evidence can be computed, preprocess the original evidence to generate a new evidence. The Dempster's combination rule is used to combine the new evidence. Comparing with the existing methods, the new proposed method is efficient.\nConflict management is still an open issue in the application of Dempster Shafer evidence theory. A lot of works have been presented to address this issue. In this paper, a new theory, called as generalized evidence theory (GET), is proposed. Compared with existing methods, GET assumes that the general situation is in open world due to the uncertainty and incomplete knowledge. The conflicting evidence is handled under the framework of GET. It is shown that the new theory can explain and deal with the conflicting evidence in a more reasonable way.\nThe interaction of two binary variables, assumed to be empirical observations, has three degrees of freedom when expressed as a matrix of frequencies. Usually, the size of causal influence of one variable on the other is calculated as a single value, as increase in recovery rate for a medical treatment, for example. We examine what is lost in this simplification, and propose using two interface constants to represent positive and negative implications separately. Given certain assumptions about non-causal outcomes, the set of resulting epistemologies is a continuum. We derive a variety of particular measures and contrast them with the one-dimensional index.\nWe propose a representation of graph as a functional object derived from the power iteration of the underlying adjacency matrix. The proposed functional representation is a graph invariant, i.e., the functional remains unchanged under any reordering of the vertices. This property eliminates the difficulty of handling exponentially many isomorphic forms. 
Bhattacharyya kernel constructed between these functionals significantly outperforms the state-of-the-art graph kernels on 3 out of the 4 standard benchmark graph classification datasets, demonstrating the superiority of our approach. The proposed methodology is simple and runs in time linear in the number of edges, which makes our kernel more efficient and scalable compared to many widely adopted graph kernels with running time cubic in the number of vertices.\nMulti-stage optimization under uncertainty techniques can be used to solve long-term management problems. Although many optimization modeling language extensions as well as computational environments have been proposed, the acceptance of this technique is generally low, due to the inherent complexity of the modeling and solution process. In this paper a simplification to annotate multi-stage decision problems under uncertainty is presented - this simplification contrasts with the common approach to create an extension on top of an existing optimization modeling language. This leads to the definition of meta models, which can be instanced in various programming languages. An example using the statistical computing language R is shown.\nThere is knowledge. There is belief. And there is tacit agreement.' 'We may talk about objects. We may talk about attributes of the objects. Or we may talk both about objects and their attributes.' This work inspects tacit agreements on assumptions about the relation between objects and their attributes, and studies a way of expressing them, presenting as the result what we term gradual logic in which the sense of truth gradually shifts. It extends classical logic instances with a new logical connective capturing the object-attribute relation. A formal semantics is presented. Decidability is proved. Para- consistent/epistemic/conditional/intensional/description/combined logics are compared.\nData clustering is an important area of data mining. 
This is an unsupervised study where data of similar types are put into one cluster while data of another types are put into different cluster. Fuzzy C means is a very important clustering technique based on fuzzy logic. Also we have some hard clustering techniques available like K-means among the popular ones. In this paper a comparative study is done between Fuzzy clustering algorithm and hard clustering algorithm\nThis paper introduces an unsupervised technique to detect the changed region of multitemporal images on a same reference plane with the help of rough clustering. The proposed technique is a soft-computing approach, based on the concept of rough set with rough clustering and Pawlak's accuracy. It is less noisy and avoids pre-deterministic knowledge about the distribution of the changed and unchanged regions. To show the effectiveness, the proposed technique is compared with some other approaches.\nThis paper represents an text extraction method from Google maps, GIS maps/images. Due to an unsupervised approach there is no requirement of any prior knowledge or training set about the textual and non-textual parts. Fuzzy CMeans clustering technique is used for image segmentation and Prewitt method is used to detect the edges. Connected component analysis and gridding technique enhance the correctness of the results. The proposed method reaches 98.5% accuracy level on the basis of experimental data sets.\nArgumentation is one of the most popular approaches of defining a~non-monotonic formalism and several argumentation based semantics were proposed for defeasible logic programs. Recently, a new approach based on notions of conflict resolutions was proposed, however with declarative semantics only. This paper gives a more procedural counterpart by developing skeptical and credulous argument games for complete semantics and soundness and completeness theorems for both games are provided. 
After that, distribution of defeasible logic program into several contexts is investigated and both argument games are adapted for multi-context system.\nDeontic logic is shown to be applicable for modelling human reasoning. For this the Wason selection task and the suppression task are discussed in detail. Different versions of modelling norms with deontic logic are introduced and in the case of the Wason selection task it is demonstrated how differences in the performance of humans in the abstract and in the social contract case can be explained. Furthermore it is shown that an automated theorem prover can be used as a reasoning tool for deontic logic.\nA sequence of random variables is exchangeable if its joint distribution is invariant under variable permutations. We introduce exchangeable variable models (EVMs) as a novel class of probabilistic models whose basic building blocks are partially exchangeable sequences, a generalization of exchangeable sequences. We prove that a family of tractable EVMs is optimal under zero-one loss for a large class of functions, including parity and threshold functions, and strictly subsumes existing tractable independence-based model families. Extensive experiments show that EVMs outperform state of the art classifiers such as SVMs and probabilistic models which are solely based on independence assumptions.\nWe analyse the expressiveness of the two-valued semantics of abstract argumentation frameworks, normal logic programs and abstract dialectical frameworks. By expressiveness we mean the ability to encode a desired set of two-valued interpretations over a given propositional signature using only atoms from that signature. While the computational complexity of the two-valued model existence problem for all these languages is (almost) the same, we show that the languages form a neat hierarchy with respect to their expressiveness.\nMEASP is a multi-engine solver for ground ASP programs. 
It exploits algorithm selection techniques based on classification to select one among a set of out-of-the-box heterogeneous ASP solvers used as black-box engines. In this paper we report on (i) a new optimized implementation of MEASP; and (ii) an attempt of applying algorithm selection to non-ground programs. An experimental analysis reported in the paper shows that (i) the new implementation of \\measp is substantially faster than the previous version; and (ii) the multi-engine recipe can be applied to the evaluation of non-ground programs with some benefits.\nThis paper introduces an efficient edge detection method based on Gabor filter and rough clustering. The input image is smoothed by Gabor function, and the concept of rough clustering is used to focus on edge detection with soft computational approach. Hysteresis thresholding is used to get the actual output, i.e. edges of the input image. To show the effectiveness, the proposed technique is compared with some other edge detection methods.\nThe connections among natural language processing and argumentation theory are becoming stronger in the latest years, with a growing amount of works going in this direction, in different scenarios and applying heterogeneous techniques. In this paper, we present two datasets we built to cope with the combination of the Textual Entailment framework and bipolar abstract argumentation. In our approach, such datasets are used to automatically identify through a Textual Entailment system the relations among the arguments (i.e., attack, support), and then the resulting bipolar argumentation graphs are analyzed to compute the accepted arguments.\nExpert systems prove to be suitable replacement for human experts when human experts are unavailable for different reasons. Various expert system has been developed for wide range of application. 
Although some expert systems in the field of fishery and aquaculture has been developed but a system that aids user in process of selecting a new addition to their aquarium tank never been designed. This paper proposed an expert system that suggests new addition to an aquarium tank based on current environmental condition of aquarium and currently existing fishes in aquarium. The system suggest the best fit for aquarium condition and most compatible to other fishes in aquarium.\nIn this paper we present the Transalg system, designed to produce SAT encodings for discrete functions, written as programs in a specific language. Translation of such programs to SAT is based on propositional encoding methods for formal computing models and on the concept of symbolic execution. We used the Transalg system to make SAT encodings for a number of cryptographic functions.\nWe consider the problem of sequentially choosing between a set of unbiased Monte Carlo estimators to minimize the mean-squared-error (MSE) of a final combined estimate. By reducing this task to a stochastic multi-armed bandit problem, we show that well developed allocation strategies can be used to achieve an MSE that approaches that of the best estimator chosen in retrospect. We then extend these developments to a scenario where alternative estimators have different, possibly stochastic costs. The outcome is a new set of adaptive Monte Carlo strategies that provide stronger guarantees than previous approaches while offering practical advantages.\nThe goal of this project is to (i) accumulate annotated informal/formal mathematical corpora suitable for training semi-automated translation between informal and formal mathematics by statistical machine-translation methods, (ii) to develop such methods oriented at the formalization task, and in particular (iii) to combine such methods with learning-assisted automated reasoning that will serve as a strong semantic component. 
We describe these ideas, the initial set of corpora, and some initial experiments done over them.\n(To appear in Theory and Practice of Logic Programming (TPLP))   ESmodels is designed and implemented as an experiment platform to investigate the semantics, language, related reasoning algorithms, and possible applications of epistemic specifications.We first give the epistemic specification language of ESmodels and its semantics. The language employs only one modal operator K but we prove that it is able to represent luxuriant modal operators by presenting transformation rules. Then, we describe basic algorithms and optimization approaches used in ESmodels. After that, we discuss possible applications of ESmodels in conformant planning and constraint satisfaction. Finally, we conclude with perspectives.\nQuery answering in Answer Set Programming (ASP) is usually solved by computing (a subset of) the cautious consequences of a logic program. This task is computationally very hard, and there are programs for which computing cautious consequences is not viable in reasonable time. However, current ASP solvers produce the (whole) set of cautious consequences only at the end of their computation. This paper reports on strategies for computing cautious consequences, also introducing anytime algorithms able to produce sound answers during the computation.\nData Mining is the process of extracting useful patterns from the huge amount of database and many data mining techniques are used for mining these patterns. Recently, one of the remarkable facts in higher educational institute is the rapid growth data and this educational data is expanding quickly without any advantage to the educational management. The main aim of the management is to refine the education standard; therefore by applying the various data mining techniques on this data one can get valuable information. 
This research study proposed the \"classification model for the student's enrollment process in higher educational courses using data mining techniques\". Additionally, this study contributes to finding some patterns that are meaningful to management.\nMy research goal is to employ a parser generation algorithm based on the Earley parsing algorithm to the evaluation and compilation of queries to logic programs, especially to deductive databases. By means of partial deduction, from a query to a logic program a parameterized automaton is to be generated that models the evaluation of this query. This automaton can be compiled to executable code; thus we expect a speedup in runtime of query evaluation. An extended abstract/ full version of a paper accepted to be presented at the Doctoral Consortium of the 30th International Conference on Logic Programming (ICLP 2014), July 19-22, Vienna, Austria\nRecent advances of gradient temporal-difference methods allow to learn off-policy multiple value functions in parallel with- out sacrificing convergence guarantees or computational efficiency. This opens up new possibilities for sound ensemble techniques in reinforcement learning. In this work we propose learning an ensemble of policies related through potential-based shaping rewards. The ensemble induces a combination policy by using a voting mechanism on its components. Learning happens in real time, and we empirically show the combination policy to outperform the individual policies of the ensemble.\nWe extend the knowledge about so-called structural restrictions of $\\mathrm{\\#SAT}$ by giving a polynomial time algorithm for $\\beta$-acyclic $\\mathrm{\\#SAT}$. In contrast to previous algorithms in the area, our algorithm does not proceed by dynamic programming but works along an elimination order, solving a weighted version of constraint satisfaction. 
Moreover, we give evidence that this deviation from more standard algorithm is not a coincidence, but that there is likely no dynamic programming algorithm of the usual style for $\\beta$-acyclic $\\mathrm{\\#SAT}$.\nWe explore the structure of non-redundant and minimal sets consisting of graded if-then rules. The rules serve as graded attribute implications in object-attribute incidence data and as similarity-based functional dependencies in a similarity-based generalization of the relational model of data. Based on our observations, we derive a polynomial-time algorithm which transforms a given finite set of rules into an equivalent one which has the least size in terms of the number of rules.\nMulti-context systems provide a powerful framework for modelling information-aggregation systems featuring heterogeneous reasoning components. Their execution can, however, incur non-negligible cost. Here, we focus on cost-complexity of such systems. To that end, we introduce cost-aware multi-context systems, an extension of non-monotonic multi-context systems framework taking into account costs incurred by execution of semantic operators of the individual contexts. We formulate the notion of cost-complexity for consistency and reasoning problems in MCSs. Subsequently, we provide a series of results related to gradually more and more constrained classes of MCSs and finally introduce an incremental cost-reducing algorithm solving the reasoning problem for definite MCSs.\nThe increasing demand of world wide web raises the need of predicting the user's web page request.The most widely used approach to predict the web pages is the pattern discovery process of Web usage mining. This process involves inevitability of many techniques like Markov model, association rules and clustering. Fuzzy theory with different techniques has been introduced for the better results. Our focus is on Markov models. 
This paper is introducing the vague Rules with Markov models for more accuracy using the vague set theory.\nIn this paper we present a short history of logics: from particular cases of 2-symbol or numerical valued logic to the general case of n-symbol or numerical valued logic. We show generalizations of 2-valued Boolean logic to fuzzy logic, also from the Kleene and Lukasiewicz 3-symbol valued logics or Belnap 4-symbol valued logic to the most general n-symbol or numerical valued refined neutrosophic logic. Two classes of neutrosophic norm (n-norm) and neutrosophic conorm (n-conorm) are defined. Examples of applications of neutrosophic logic to physics are listed in the last section. Similar generalizations can be done for n-Valued Refined Neutrosophic Set, and respectively n-Valued Refined Neutrosophic Probability.\nLearning Markov blanket (MB) structures has proven useful in performing feature selection, learning Bayesian networks (BNs), and discovering causal relationships. We present a formula for efficiently determining the number of MB structures given a target variable and a set of other variables. As expected, the number of MB structures grows exponentially. However, we show quantitatively that there are many fewer MB structures that contain the target variable than there are BN structures that contain it. In particular, the ratio of BN structures to MB structures appears to increase exponentially in the number of variables.\nWe develop a technique for generalising from data in which models are samplers represented as program text. We establish encouraging empirical results that suggest that Markov chain Monte Carlo probabilistic programming inference techniques coupled with higher-order probabilistic programming languages are now sufficiently powerful to enable successful inference of this kind in nontrivial domains. 
We also introduce a new notion of probabilistic program compilation and show how the same machinery might be used in the future to compile probabilistic programs for efficient reusable predictive inference.\nIn the first chapter of this report, we provide an overview on mobile and wireless networks, with special focus on the IEEE 802.22 norm, which is a norm dedicated to cognitive radio (CR). Chapter 2 goes into detail about CR and Chapter 3 is devoted to the presentation of the concept of agents and in particular the concept of multi-agent systems (MAS). Finally, Chapter 4 provides a state of the art on the use of artificial intelligence techniques, particularly MAS for radio resource allocation and dynamic spectrum access in the field of CR.\nArticle purpose is the analysis of a question of possibility of technologization of philosophical knowledge. We understand the organization of cognitive activity which is guided by the set of methods guaranteed bringing to successful (i.e. to precisely corresponding set parameters) to applied results as technologization. Transformation of sense of philosophy allows revealing possibilities of its technologization. The leading role in this process is played by philosophy of science which creates conditions for such transformation. At the same time there is justified an appeal to branch combination theory of the directions of scientific knowledge and partial refusal of understanding of philosophy as synthetic knowledge in which the main task is permission, instead of generation of paradoxes.\nExisting decision-theoretic reasoning frameworks such as decision networks use simple data structures and processes. However, decisions are often made based on complex data structures, such as social networks and protein sequences, and rich processes involving those structures. 
We present a framework for representing decision problems with complex data structures using probabilistic programming, allowing probabilistic models to be created with programming language constructs such as data structures and control flow. We provide a way to use arbitrary data types with minimal effort from the user, and an approximate decision-making algorithm that is effective even when the information space is very large or infinite. Experimental results show our algorithm working on problems with very large information spaces.\nWe develop a model of abduction in abstract argumentation, where changes to an argumentation framework act as hypotheses to explain the support of an observation. We present dialogical proof theories for the main decision problems (i.e., finding hypotheses that explain skeptical/credulous support) and we show that our model can be instantiated on the basis of abductive logic programs.\nWe propose a general framework for modelling and solving deductive games, where one player selects a secret code and the other player strives to discover this code using a minimal number of allowed experiments that reveal some partial information about the code. The framework is implemented in a software tool Cobra, and its functionality is demonstrated by producing new results about existing deductive games.\nWe propose and investigate a simple ranking-measure-based extension semantics for abstract argumentation frameworks based on their generic instantiation by default knowledge bases and the ranking construction semantics for default reasoning. In this context, we consider the path from structured to logical to shallow semantic instantiations. The resulting well-justified JZ-extension semantics diverges from more traditional approaches.\nIn this paper, we address the problem of real-time detection of viruses docking to nanowires, especially when multiple viruses dock to the same nano-wire. 
The task becomes more complicated when there is an array of nanowires coated with different antibodies, where different viruses can dock to each coated nanowire at different binding strengths. We model the array response to a viral agent as a pattern of conductance change over nanowires with known modifier --- this representation permits analysis of the output of such an array via belief network (Bayes) methods, as well as novel generative models like the Hidden Semi-Markov Model (HSMM).\nIn this paper, we present an ontology of mathematical knowledge concepts that covers a wide range of the fields of mathematics and introduces a balanced representation between comprehensive and sensible models. We demonstrate the applications of this representation in information extraction, semantic search, and education. We argue that the ontology can be a core of future integration of math-aware data sets in the Web of Data and, therefore, provide mappings onto relevant datasets, such as DBpedia and ScienceWISE.\nIn this paper we introduce an evolutionary algorithm for the solution of linear integer programs. The strategy is based on the separation of the variables into the integer subset and the continuous subset; the integer variables are fixed by the evolutionary system, and the continuous ones are determined in function of them, by a linear program solver.   We report results obtained for some standard benchmark problems, and compare them with those obtained by branch-and-bound. The performance of the evolutionary algorithm is promising. Good feasible solutions were generally obtained, and in some of the difficult benchmark tests it outperformed branch-and-bound.\nIt has been demonstrated earlier that universal computation is 'almost surely' chaotic. Machine learning is a form of computational fixed point iteration, iterating over the computable function space. 
We showcase some properties of this iteration, and establish in general that the iteration is 'almost surely' of chaotic nature. This theory explains the observation in the counter intuitive properties of deep learning methods. This paper demonstrates that these properties are going to be universal to any learning method.\nThis manuscript uses machine learning techniques to exploit baseball pitchers' decision making, so-called \"Baseball IQ,\" by modeling the at-bat information, pitch selection and counts, as a Markov Decision Process (MDP). Each state of the MDP models the pitcher's current pitch selection in a Markovian fashion, conditional on the information immediately prior to making the current pitch. This includes the count prior to the previous pitch, his ensuing pitch selection, the batter's ensuing action and the result of the pitch.\nThis paper proposes an analysis of the effects of consensus and preference aggregation on the consistency of pairwise comparisons. We define some boundary properties for the inconsistency of group preferences and investigate their relation with different inconsistency indices. Some results are presented on more general dependencies between properties of inconsistency indices and the satisfaction of boundary properties. In the end, given three boundary properties and nine indices among the most relevant ones, we will be able to present a complete analysis of what indices satisfy what properties and offer a reflection on the interpretation of the inconsistency of group preferences.\nWe learn multiple hypotheses for related tasks under a latent hierarchical relationship between tasks. We exploit the intuition that for domain adaptation, we wish to share classifier structure, but for multitask learning, we wish to share covariance structure. 
Our hierarchical model is seen to subsume several previously proposed multitask learning models and performs well on three distinct real-world data sets.\nGraphical Gaussian models have proven to be useful tools for exploring network structures based on multivariate data. Applications to studies of gene expression have generated substantial interest in these models, and resulting recent progress includes the development of fitting methodology involving penalization of the likelihood function. In this paper we advocate the use of the multivariate t and related distributions for more robust inference of graphs. In particular, we demonstrate that penalized likelihood inference combined with an application of the EM algorithm provides a simple and computationally efficient approach to model selection in the t-distribution case.\nThis paper presents studies on a deterministic annealing algorithm based on quantum annealing for variational Bayes (QAVB) inference, which can be seen as an extension of the simulated annealing for variational Bayes (SAVB) inference. QAVB is as easy as SAVB to implement. Experiments revealed QAVB finds a better local optimum than SAVB in terms of the variational free energy in latent Dirichlet allocation (LDA).\nIn the framework of prediction with expert advice, we consider a recently introduced kind of regret bounds: the bounds that depend on the effective instead of nominal number of experts. In contrast to the NormalHedge bound, which mainly depends on the effective number of experts but also weakly depends on the nominal one, we obtain a bound that does not contain the nominal number of experts at all. We use the defensive forecasting method and introduce an application of defensive forecasting to multivalued supermartingales.\nActive learning strategies respond to the costly labelling task in a supervised classification by selecting the most useful unlabelled examples in training a predictive model. 
Many conventional active learning algorithms focus on refining the decision boundary, rather than exploring new regions that can be more informative. In this setting, we propose a sequential algorithm named EG-Active that can improve any Active learning algorithm by an optimal random exploration. Experimental results show a statistically significant and appreciable improvement in the performance of our new approach over the existing active feedback methods.\nWe present a logic for reasoning about graded inequalities which generalizes the ordinary inequational logic used in universal algebra. The logic deals with atomic predicate formulas of the form of inequalities between terms and formalizes their semantic entailment and provability in graded setting which allows to draw partially true conclusions from partially true assumptions. We follow the Pavelka approach and define general degrees of semantic entailment and provability using complete residuated lattices as structures of truth degrees. We prove the logic is Pavelka-style complete. Furthermore, we present a logic for reasoning about graded if-then rules which is obtained as particular case of the general result.\nMost controlled natural languages (CNLs) are processed with the help of a pipeline architecture that relies on different software components. We investigate in this paper in an experimental way how well answer set programming (ASP) is suited as a unifying framework for parsing a CNL, deriving a formal representation for the resulting syntax trees, and for reasoning with that representation. We start from a list of input tokens in ASP notation and show how this input can be transformed into a syntax tree using an ASP grammar and then into reified ASP rules in form of a set of facts. These facts are then processed by an ASP meta-interpreter that allows us to infer new knowledge.\nMatrix completion under interval uncertainty can be cast as matrix completion with element-wise box constraints. 
We present an efficient alternating-direction parallel coordinate-descent method for the problem. We show that the method outperforms any other known method on a benchmark in image in-painting in terms of signal-to-noise ratio, and that it provides high-quality solutions for an instance of collaborative filtering with 100,198,805 recommendations within 5 minutes.\nDecision trees have been widely used in machine learning. However, due to some reasons, data collecting in real world contains a fuzzy and uncertain form. The decision tree should be able to handle such fuzzy data. This paper presents a method to construct fuzzy decision tree. It proposes a fuzzy decision tree induction method in iris flower data set, obtaining the entropy from the distance between an average value and a particular value. It also presents an experiment result that shows the accuracy compared to former ID3.\nSupport vector machines (SVMs) and fuzzy rule systems are functionally equivalent under some conditions. Therefore, the learning algorithms developed in the field of support vector machines can be used to adapt the parameters of fuzzy systems. Extracting fuzzy models from support vector machines has the inherent advantage that the model does not need to determine the number of rules in advance. However, after the support vector machine learning, the complexity is usually high, and interpretability is also impaired. This paper not only proposes a complete framework for extracting interpretable SVM-based fuzzy modeling, but also provides optimization issues of the models. Simulations examples are given to embody the idea of this paper.\nStudy of soft sets was first proposed by Molodtsov in 1999 to deal with uncertainty in a non-parametric manner. The researchers did not pay attention to soft set theory at that time but now the soft set theory has been developed in many areas of mathematics. Algebraic structures using soft set theory are very rapidly developed. 
In this book we developed soft neutrosophic algebraic structures by using soft sets and neutrosophic algebraic structures. In this book we study soft neutrosophic groups, soft neutrosophic semigroups, soft neutrosophic loops, soft neutrosophic LA-semigroups, and their generalizations respectively.\nIn group decision making (GDM) problems fuzzy preference relations (FPR) are widely used for representing decision makers' opinions on the set of alternatives. In order to avoid misleading solutions, the study of consistency and consensus has become a very important aspect. This article presents a simulated annealing (SA) based soft computing approach to optimize the consistency/consensus level (CCL) of a complete fuzzy preference relation in order to solve a GDM problem. Consistency level indicates as expert's preference quality and consensus level measures the degree of agreement among experts' opinions. This study also suggests the set of experts for the necessary modifications in their prescribed preference structures without intervention of any moderator.\nGiven an argumentation network with initial values to the arguments, we look for algorithms which can yield extensions compatible with such initial values. We find that the best way of tackling this problem is to offer an iteration formula that takes the initial values and the attack relation and iterates a sequence of intermediate values that eventually converges leading to an extension. The properties surrounding the application of the iteration formula and its connection with other numerical and non-numerical techniques proposed by others are thoroughly investigated in this paper.\nThe paper provides a survey of semantic methods for solution of fundamental tasks in mathematical knowledge management. Ontological models and formalisms are discussed. We propose an ontology of mathematical knowledge, covering a wide range of fields of mathematics. 
We demonstrate applications of this representation in mathematical formula search, and learning.\nIt is well-known that neural networks are computationally hard to train. On the other hand, in practice, modern day neural networks are trained efficiently using SGD and a variety of tricks that include different activation functions (e.g. ReLU), over-specification (i.e., train networks which are larger than needed), and regularization. In this paper we revisit the computational complexity of training neural networks from a modern perspective. We provide both positive and negative results, some of them yield new provably efficient and practical algorithms for training certain types of neural networks.\nErrors in implicative theories coming from binary data are studied. First, two classes of errors that may affect implicative theories are singled out. Two approaches for finding errors of these classes are proposed, both of them based on methods of Formal Concept Analysis. The first approach uses the cardinality minimal (canonical or Duquenne-Guigues) implication base. The construction of such a base is computationally intractable. Using an alternative approach one checks possible errors on the fly in polynomial time via computing closures of subsets of attributes. Both approaches are interactive, based on questions about the validity of certain implications. Results of computer experiments are presented and discussed.\nDistributed representations (such as those based on embeddings) and discrete representations (such as those based on logic) have complementary strengths. We explore one possible approach to combining these two kinds of representations. We present a model theory/semantics for first order logic based on vectors of reals. 
We describe the model theory, discuss some interesting properties of such a system and present a simple approach to query answering.\nWe study the semantics of fuzzy if-then rules called fuzzy attribute implications parameterized by systems of isotone Galois connections. The rules express dependencies between fuzzy attributes in object-attribute incidence data. The proposed parameterizations are general and include as special cases the parameterizations by linguistic hedges used in earlier approaches. We formalize the general parameterizations, propose bivalent and graded notions of semantic entailment of fuzzy attribute implications, show their characterization in terms of least models and complete axiomatization, and provide characterization of bases of fuzzy attribute implications derived from data.\nIn this paper, we propose an approach to the unsupervised segmentation of images using Markov Random Field. The proposed approach is based on the idea of Bit Plane Slicing. We use the planes as initial labellings for an ensemble of segmentations. With pixelwise voting, a robust segmentation approach can be achieved, which we demonstrate on microscope cell images. We tested our approach on a publicly available database, where it proved to be competitive with other methods and manual segmentation.\nThis work is motivated by a real-world case study where it is necessary to integrate and relate existing ontologies through meta-modelling. For this, we introduce the Description Logic ALCQM which is obtained from ALCQ by adding statements that equate individuals to concepts in a knowledge base. In this new extension, a concept can be an individual of another concept (called meta-concept) which themselves can be individuals of yet another concept (called meta meta-concept) and so on. 
We define a tableau algorithm for checking consistency of an ontology in ALCQM and prove its correctness.\nReinforcement learning agents have traditionally been evaluated on small toy problems. With advances in computing power and the advent of the Arcade Learning Environment, it is now possible to evaluate algorithms on diverse and difficult problems within a consistent framework. We discuss some challenges posed by the arcade learning environment which do not manifest in simpler environments. We then provide a comparison of model-free, linear learning algorithms on this challenging problem set.\nGenerative Adversarial Nets [8] were recently introduced as a novel way to train generative models. In this work we introduce the conditional version of generative adversarial nets, which can be constructed by simply feeding the data, y, we wish to condition on to both the generator and discriminator. We show that this model can generate MNIST digits conditioned on class labels. We also illustrate how this model could be used to learn a multi-modal model, and provide preliminary examples of an application to image tagging in which we demonstrate how this approach can generate descriptive tags which are not part of training labels.\nThis manual outlines a fully automated liquid handling robot to enable physically-embodied evolution within a chemical oil-droplet system. The robot is based upon the REPRAP3D printer system and makes the droplets by mixing chemicals and then placing them in a petri dish after which they are recorded using a camera and the behaviour of the droplets analysed using image recognition software. This manual accompanies the open access publication published in Nature Communications DOI: 10.1038/ncomms6571.\nStandard LDA model suffers the problem that the topic assignment of each word is independent and word correlation hence is neglected. 
To address this problem, in this paper, we propose a model called Word Related Latent Dirichlet Allocation (WR-LDA) by incorporating word correlation into LDA topic models. This leads to new capabilities that standard LDA model does not have such as estimating infrequently occurring words or multi-language topic modeling. Experimental results demonstrate the effectiveness of our model compared with standard LDA.\nGiven empirical evidence for the dependence of an outcome variable on an exposure variable, we can typically only provide bounds for the \"probability of causation\" in the case of an individual who has developed the outcome after being exposed. We show how these bounds can be adapted or improved if further information becomes available. In addition to reviewing existing work on this topic, we provide a new analysis for the case where a mediating variable can be observed. In particular we show how the probability of causation can be bounded when there is no direct effect and no confounding.   Keywords: Causal inference, Mediation Analysis, Probability of Causation\nRewriting is widely used to optimise owl:sameAs reasoning in materialisation based OWL 2 RL systems. We investigate issues related to both the correctness and efficiency of rewriting, and present an algorithm that guarantees correctness, improves efficiency, and can be effectively parallelised. Our evaluation shows that our approach can reduce reasoning times on practical data sets by orders of magnitude.\nThe paper presents an approach to verification of a multi-agent data analysis algorithm. We base correct simulation of the multi-agent system by a finite integer model. For verification we use model checking tool SPIN. Protocols of agents are written in Promela language and properties of the multi-agent data analysis system are expressed in logic LTL. We run several experiments with SPIN and the model.\nIn this paper fuzzy VRPTW with an uncertain travel time is considered. 
Credibility theory is used to model the problem and specifies a preference index at which it is desired that the travel times to reach the customers fall into their time windows. We propose the integration of fuzzy and ant colony system based evolutionary algorithm to solve the problem while preserving the constraints. Computational results for certain benchmark problems having short and long time horizons are presented to show the effectiveness of the algorithm. Comparison between different preferences indexes have been obtained to help the user in making suitable decisions.\nRDF and Description Logics work in an open-world setting where absence of information is not information about absence. Nevertheless, Description Logic axioms can be interpreted in a closed-world setting and in this setting they can be used for both constraint checking and closed-world recognition against information sources. When the information sources are expressed in well-behaved RDF or RDFS (i.e., RDF graphs interpreted in the RDF or RDFS semantics) this constraint checking and closed-world recognition is simple to describe. Further this constraint checking can be implemented as SPARQL querying and thus effectively performed.\nWe propose and analyze estimators for statistical functionals of one or more distributions under nonparametric assumptions. Our estimators are based on the theory of influence functions, which appear in the semiparametric statistics literature. We show that estimators based either on data-splitting or a leave-one-out technique enjoy fast rates of convergence and other favorable theoretical properties. We apply this framework to derive estimators for several popular information theoretic quantities, and via empirical evaluation, show the advantage of this approach over existing estimators.\nThis paper is aimed at providing a very first, more \"global\", systematic point of view with respect to possible conflict generation in CA-EN-like causal structures. 
For simplicity, only the outermost level of graphs is taken into account. Localization of the \"conflict area\", diagnostic preferences, and bases for systematic conflict generation are considered. A notion of {\\em Potential Conflict Structure} ({\\em PCS}) constituting a basic tool for identification of possible conflicts is proposed and its use is discussed.\nWe explore the idea of using a \"possibilistic graphical model\" as the basis for a world model that drives a dialog system. As a first step we have developed a system that uses text-based dialog to derive a model of the user's family relations. The system leverages its world model to infer relational triples, to learn to recover from upstream coreference resolution errors and ambiguities, and to learn context-dependent paraphrase models. We also explore some theoretical aspects of the underlying graphical model.\nDeontic logic is a very well researched branch of mathematical logic and philosophy. Various kinds of deontic logics are discussed for different application domains like argumentation theory, legal reasoning, and acts in multi-agent systems. In this paper, we show how standard deontic logic can be stepwise transformed into description logic and DL-clauses, such that it can be processed by Hyper, a high performance theorem prover which uses a hypertableau calculus. Two use cases, one from multi-agent research and one from the development of normative system are investigated.\nThe task of computing approximate Nash equilibria in large zero-sum extensive-form games has received a tremendous amount of attention due mainly to the Annual Computer Poker Competition. Immediately after its inception, two competing and seemingly different approaches emerged---one an application of no-regret online learning, the other a sophisticated gradient method applied to a convex-concave saddle-point formulation. 
Since then, both approaches have grown in relative isolation with advancements on one side not affecting the other. In this paper, we rectify this by dissecting and, in a sense, unifying the two views.\nFalling rule lists are classification models consisting of an ordered list of if-then rules, where (i) the order of rules determines which example should be classified by each rule, and (ii) the estimated probability of success decreases monotonically down the list. These kinds of rule lists are inspired by healthcare applications where patients would be stratified into risk sets and the highest at-risk patients should be considered first. We provide a Bayesian framework for learning falling rule lists that does not rely on traditional greedy decision tree learning methods.\nIn a Bayesian network, we wish to evaluate the marginal probability of a query variable, which may be conditioned on the observed values of some evidence variables. Here we first present our \"border algorithm,\" which converts a BN into a directed chain. For the polytrees, we then present in detail, with some modifications and within the border algorithm framework, the \"revised polytree algorithm\" by Peot & Shachter (1991). Finally, we present our \"parentless polytree method,\" which, coupled with the border algorithm, converts any Bayesian network into a polytree, rendering the complexity of our inferences independent of the size of network, and linear with the number of its evidence and query variables. All quantities in this paper have probabilistic interpretations.\nRecent advances in metareasoning for search have shown its usefulness in improving numerous search algorithms. This paper applies rational metareasoning to IDA* when several admissible heuristics are available. The obvious basic approach of taking the maximum of the heuristics is improved upon by lazy evaluation of the heuristics, resulting in a variant known as Lazy IDA*. 
We introduce a rational version of lazy IDA* that decides whether to compute the more expensive heuristics or to bypass it, based on a myopic expected regret estimate. Empirical evaluation in several domains supports the theoretical results, and shows that rational lazy IDA* is a state-of-the-art heuristic combination method.\nWe propose an efficient family of algorithms to learn the parameters of a Bayesian network from incomplete data. In contrast to textbook approaches such as EM and the gradient method, our approach is non-iterative, yields closed form parameter estimates, and eliminates the need for inference in a Bayesian network. Our approach provides consistent parameter estimates for missing data problems that are MCAR, MAR, and in some cases, MNAR. Empirically, our approach is orders of magnitude faster than EM (as our approach requires no inference). Given sufficient data, we learn parameters that can be orders of magnitude more accurate.\nWe present ULSA, a novel stochastic local search algorithm for random binary constraint satisfaction problems (CSP). ULSA is many times faster than the prior state of the art on a widely-studied suite of random CSP benchmarks. Unlike the best previous methods for these benchmarks, ULSA is a simple unweighted method that does not require dynamic adaptation of weights or penalties. ULSA obtains new record best solutions satisfying 99 of 100 variables in the challenging frb100-40 benchmark instance.\nThis work presents how persistent predicates have been included in the in-memory deductive system DES by relying on external SQL database management systems. We introduce how persistence is supported from a user-point of view and the possible applications the system opens up, as the deductive expressive power is projected to relational databases. Also, we describe how it is possible to intermix computations of the deductive engine and the external database, explaining its implementation and some optimizations. 
Finally, a performance analysis is undertaken, comparing the system with current relational database systems.\nIn the data mining field many clustering methods have been proposed, yet standard versions do not take into account uncertain databases. This paper deals with a new approach to cluster uncertain data by using a hierarchical clustering defined within the belief function framework. The main objective of the belief hierarchical clustering is to allow an object to belong to one or several clusters. To each belonging, a degree of belief is associated, and clusters are combined based on the pignistic properties. Experiments with real uncertain data show that our proposed method can be considered as a propitious tool.\nThis article proposes the use of Vector Symbolic Architectures for implementing Hierarchical Graph Neuron, an architecture for memorizing patterns of generic sensor stimuli. The adoption of a Vector Symbolic representation ensures a one-layered design for the approach, while maintaining the previously reported properties and performance characteristics of Hierarchical Graph Neuron, and also improving the noise resistance of the architecture. The proposed architecture enables a linear (with respect to the number of stored entries) time search for an arbitrary sub-pattern.\nThis paper presents a way of solving Markov Decision Processes that combines state abstraction and temporal abstraction. Specifically, we combine state aggregation with the options framework and demonstrate that they work well together and indeed it is only after one combines the two that the full benefit of each is realized. We introduce a hierarchical value iteration algorithm where we first coarsely solve subgoals and then use these approximate solutions to exactly solve the MDP. This algorithm solved several problems faster than vanilla value iteration.\nIn this paper, we provide all information to participate to the Second International Nurse Rostering Competition (INRC-II). 
First, we describe the problem formulation, which, differently from INRC-I, is a multi-stage procedure. Second, we illustrate all the necessary infrastructure do be used together with the participant's solver, including the testbed, the file formats, and the validation/simulation tools. Finally, we state the rules of the competition. All update-to-date information about the competition is available at http://mobiz.vives.be/inrc2/.\nWe study the Bayesian model averaging approach to learning Bayesian network structures (DAGs) from data. We develop new algorithms including the first algorithm that is able to efficiently sample DAGs according to the exact structure posterior. The DAG samples can then be used to construct estimators for the posterior of any feature. We theoretically prove good properties of our estimators and empirically show that our estimators considerably outperform the estimators from the previous state-of-the-art methods.\nWe introduce the first, general purpose, slice sampling inference engine for probabilistic programs. This engine is released as part of StocPy, a new Turing-Complete probabilistic programming language, available as a Python library. We present a transdimensional generalisation of slice sampling which is necessary for the inference engine to work on traces with different numbers of random variables. We show that StocPy compares favourably to other PPLs in terms of flexibility and usability, and that slice sampling can outperform previously introduced inference methods. Our experiments include a logistic regression, HMM, and Bayesian Neural Net.\nIn this paper, we propose to learn sources independence in order to choose the appropriate type of combination rules when aggregating their beliefs. Some combination rules are used with the assumption of their sources independence whereas others combine beliefs of dependent sources. 
Therefore, the choice of the combination rule depends on the independence of sources involved in the combination. In this paper, we propose also a measure of independence, positive and negative dependence to integrate in mass functions before the combinaision with the independence assumption.\nHidden Markov Models (HMMs) are learning methods for pattern recognition. The probabilistic HMMs have been one of the most used techniques based on the Bayesian model. First-order probabilistic HMMs were adapted to the theory of belief functions such that Bayesian probabilities were replaced with mass functions. In this paper, we present a second-order Hidden Markov Model using belief functions. Previous works in belief HMMs have been focused on the first-order HMMs. We extend them to the second-order model.\nMany information sources are considered into data fusion in order to improve the decision in terms of uncertainty and imprecision. For each technique used for data fusion, the asumption on independance is usually made. We propose in this article an approach to take into acount an independance measure befor to make the combination of information in the context of the theory of belief functions.\nWe introduce an adaptive output-sensitive Metropolis-Hastings algorithm for probabilistic models expressed as programs, Adaptive Lightweight Metropolis-Hastings (AdLMH). The algorithm extends Lightweight Metropolis-Hastings (LMH) by adjusting the probabilities of proposing random variables for modification to improve convergence of the program output. We show that AdLMH converges to the correct equilibrium distribution and compare convergence of AdLMH to that of LMH on several test problems to highlight different aspects of the adaptation scheme. We observe consistent improvement in convergence on the test problems.\nDefining and modeling the relation of inclusion between continuous belief function may be considered as an important operation in order to study their behaviors. 
Within this paper we will propose and present two forms of inclusion: The strict and the partial one. In order to develop this relation, we will study the case of consonant belief function. To do so, we will simulate normal distributions allowing us to model and analyze these relations. Based on that, we will determine the parameters influencing and characterizing the two forms of inclusion.\nMulti-agent planning (MAP) approaches are typically oriented at solving loosely-coupled problems, being ineffective to deal with more complex, strongly-related problems. In most cases, agents work under complete information, building complete knowledge bases. The present article introduces a general-purpose MAP framework designed to tackle problems of any coupling levels under incomplete information. Agents in our MAP model are partially unaware of the information managed by the rest of agents and share only the critical information that affects other agents, thus maintaining a distributed vision of the task.\nThe original Halpern-Pearl definition of causality [Halpern and Pearl, 2001] was updated in the journal version of the paper [Halpern and Pearl, 2005] to deal with some problems pointed out by Hopkins and Pearl [2003]. Here the definition is modified yet again, in a way that (a) leads to a simpler definition, (b) handles the problems pointed out by Hopkins and Pearl, and many others, (c) gives reasonable answers (that agree with those of the original and updated definition) in the standard problematic examples of causality, and (d) has lower complexity than either the original or updated definitions.\nThe project of the Ontology Web Search Engine is presented in this paper. The main purpose of this paper is to develop such a project that can be easily implemented. Ontology Web Search Engine is software to look for and index ontologies in the Web. 
OWL (Web Ontology Languages) ontologies are meant, and they are necessary for the functioning of the SWES (Semantic Web Expert System). SWES is an expert system that will use found ontologies from the Web, generating rules from them, and will supplement its knowledge base with these generated rules. It is expected that the SWES will serve as a universal expert system for the average user.\nOur FRDC_QA team participated in the QA-Lab English subtask of the NTCIR-11. In this paper, we describe our system for solving real-world university entrance exam questions, which are related to world history. Wikipedia is used as the main external resource for our system. Since problems with choosing right/wrong sentence from multiple sentence choices account for about two-thirds of the total, we individually design a classification based model for solving this type of questions. For other types of questions, we also design some simple methods.\nDifferentially private collaborative filtering is a challenging task, both in terms of accuracy and speed. We present a simple algorithm that is provably differentially private, while offering good performance, using a novel connection of differential privacy to Bayesian posterior sampling via Stochastic Gradient Langevin Dynamics. Due to its simplicity the algorithm lends itself to efficient implementation. By careful systems design and by exploiting the power law behavior of the data to maximize CPU cache bandwidth we are able to generate 1024 dimensional models at a rate of 8.5 million recommendations per second on a single PC.\nStructuring theories is one of the main approaches to reduce the combinatorial explosion associated with reasoning and exploring large theories. In the past we developed the notion of development graphs as a means to represent and maintain structured theories. 
In this paper we present a methodology and a resulting implementation to reveal the hidden structure of flat theories by transforming them into detailed development graphs. We review our approach using plain TSTP-representations of MIZAR articles obtaining more structured and also more concise theories.\nLeoPARD supports the implementation of knowledge representation and reasoning tools for higher-order logic(s). It combines a sophisticated data structure layer (polymorphically typed {\\lambda}-calculus with nameless spine notation, explicit substitutions, and perfect term sharing) with an ambitious multi-agent blackboard architecture (supporting prover parallelism at the term, clause, and search level). Further features of LeoPARD include a parser for all TPTP dialects, a command line interpreter, and generic means for the integration of external reasoners.\nWe study the relations between Multi-valued Decision Diagrams (MDD) and tuples (i.e. elements of the Cartesian Product of variables). First, we improve the existing methods for transforming a set of tuples, Global Cut Seeds, sequences of tuples into MDDs. Then, we present some in-place algorithms for adding and deleting tuples from an MDD. Next, we consider an MDD constraint which is modified during the search by deleting some tuples. We give an algorithm which adapts MDD-4R to these dynamic and persistent modifications. Some experiments show that MDD constraints are competitive with Table constraints.\nIn the context of using norms for controlling multi-agent systems, a vitally important question that has not yet been addressed in the literature is the development of mechanisms for monitoring norm compliance under partial action observability. This paper proposes the reconstruction of unobserved actions to tackle this problem. 
In particular, we formalise the problem of reconstructing unobserved actions, and propose an information model and algorithms for monitoring norms under partial action observability using two different processes for reconstructing unobserved actions. Our evaluation shows that reconstructing unobserved actions increases significantly the number of norm violations and fulfilments detected.\nThis paper presents a sociocultural knowledge ontology (OntoSOC) modeling approach. OntoSOC modeling approach is based on Engestrom Human Activity Theory (HAT). That Theory allowed us to identify fundamental concepts and relationships between them. The top-down precess has been used to define differents sub-concepts. The modeled vocabulary permits us to organise data, to facilitate information retrieval by introducing a semantic layer in social web platform architecture, we project to implement. This platform can be considered as a collective memory and Participative and Distributed Information System (PDIS) which will allow Cameroonian communities to share an co-construct knowledge on permanent organized activities.\nWe present a scalable parallel solver for numerical constraint satisfaction problems (NCSPs). Our parallelization scheme consists of homogeneous worker solvers, each of which runs on an available core and communicates with others via the global load balancing (GLB) method. The parallel solver is implemented with X10 that provides an implementation of GLB as a library. In experiments, several NCSPs from the literature were solved and attained up to 516-fold speedup using 600 cores of the TSUBAME2.5 supercomputer.\nWe study properties of particular non-redundant sets of if-then rules describing dependencies between graded attributes. 
We introduce notions of saturation and witnessed non-redundancy of sets of graded attribute implications are show that bases of graded attribute implications given by systems of pseudo-intents correspond to non-redundant sets of graded attribute implications with saturated consequents where the non-redundancy is witnessed by antecedents of the contained graded attribute implications. We introduce an algorithm which transforms any complete set of graded attribute implications parameterized by globalization into a base given by pseudo-intents. Experimental evaluation is provided to compare the method of obtaining bases for general parameterizations by hedges with earlier graph-based approaches.\nOur study revisits the problem of accuracy-fairness tradeoff in binary classification. We argue that comparison of non-discriminatory classifiers needs to account for different rates of positive predictions, otherwise conclusions about performance may be misleading, because accuracy and discrimination of naive baselines on the same dataset vary with different rates of positive predictions. We provide methodological recommendations for sound comparison of non-discriminatory classifiers, and present a brief theoretical and empirical analysis of tradeoffs between accuracy and non-discrimination.\nIn this paper, we define a distance for the HSL colour system. Next, the proposed distance is used for a fuzzy colour clustering algorithm construction. The presented algorithm is related to the well-known fuzzy c-means algorithm. Finally, the clustering algorithm is used as colour reduction method. The obtained experimental results are presented to demonstrate the effectiveness of our approach.\nIn this paper, we propose a single-agent modal logic framework for reasoning about goal-direct \"knowing how\" based on ideas from linguistics, philosophy, modal logic and automated planning. 
We first define a modal language to express \"I know how to guarantee phi given psi\" with a semantics not based on standard epistemic models but labelled transition systems that represent the agent's knowledge of his own abilities. A sound and complete proof system is given to capture the valid reasoning patterns about \"knowing how\" where the most important axiom suggests its compositional nature.\nBiometrics systems have been used in a wide range of applications and have improved people authentication. Signature verification is one of the most common biometric methods with techniques that employ various specifications of a signature. Recently, deep learning has achieved great success in many fields, such as image, sounds and text processing. In this paper, deep learning method has been used for feature extraction and feature selection.\nThere are already quite a few tools for solving the Satisfiability Modulo Theories (SMT) problems. In this paper, we present \\texttt{VolCE}, a tool for counting the solutions of SMT constraints, or in other words, for computing the volume of the solution space. Its input is essentially a set of Boolean combinations of linear constraints, where the numeric variables are either all integers or all reals, and each variable is bounded. The tool extends SMT solving with integer solution counting and volume computation/estimation for convex polytopes. Effective heuristics are adopted, which enable the tool to deal with high-dimensional problem instances efficiently and accurately.\nThis paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyze one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with tabular representation with a finite budget is proven. 
Second, the convergence of Q-learning and Sarsa with linear function approximation is established. Third, the we show the asymptotic performance cannot be hurt through teaching. Additionally, all theoretical results are empirically validated.\nEmphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps. Recent works by Sutton, Mahmood and White (2015), and Yu (2015) show that by varying the emphasis in a particular way, these algorithms become stable and convergent under off-policy training with linear function approximation. This paper serves as a unified summary of the available results from both works. In addition, we demonstrate the empirical benefits from the flexibility of emphatic algorithms, including state-dependent discounting, state-dependent bootstrapping, and the user-specified allocation of function approximation resources.\nIn sentence modeling and classification, convolutional neural network approaches have recently achieved state-of-the-art results, but all such efforts process word vectors sequentially and neglect long-distance dependencies. To exploit both deep learning and linguistic structures, we propose a tree-based convolutional neural network model which exploit various long-distance relationships between words. Our model improves the sequential baselines on all three sentiment and question classification tasks, and achieves the highest published accuracy on TREC.\nResearch units in archaeology often manage large and precious archives containing various documents, including reports on fieldwork, scholarly studies and reference books. These archives are of course invaluable, recording decades of work, but are generally hard to consult and access. 
In this context, digitizing full text documents is not enough: information must be formalized, structured and easy to access thanks to friendly user interfaces.\nWe present $\\mathcal{MEL}^{++}$ (M denotes Markov logic networks) an extension of the log-linear description logics $\\mathcal{EL}^{++}$-LL with concrete domains, nominals, and instances. We use Markov logic networks (MLNs) in order to find the most probable, classified and coherent $\\mathcal{EL}^{++}$ ontology from an $\\mathcal{MEL}^{++}$ knowledge base. In particular, we develop a novel way to deal with concrete domains (also known as datatypes) by extending MLN's cutting plane inference (CPI) algorithm.\nFinding the most probable (MAP) model in SRL frameworks such as Markov logic and Problog can, in principle, be solved by encoding the problem as a `grounded-out' mixed integer program (MIP). However, useful first-order structure disappears in this process motivating the development of first-order MIP approaches. Here we present mfoilp, one such approach. Since the syntax and semantics of mfoilp is essentially the same as existing approaches we focus here mainly on implementation and algorithmic issues. We start with the (conceptually) simple problem of using a logic program to generate a MIP instance before considering more ambitious exploitation of first-order representations.\nGelfond and Zhang recently proposed a new stable model semantics based on Vicious Circle Principle in order to improve the interpretation of logic programs with aggregates. The paper focuses on this proposal, and analyzes the complexity of both coherence testing and cautious reasoning under the new semantics. Some surprising results highlight similarities and differences versus mainstream stable model semantics for aggregates. 
Moreover, the paper reports on the design of compilation techniques for implementing the new semantics on top of existing ASP solvers, which eventually lead to realize a prototype system that allows for experimenting with Gelfond-Zhang's aggregates.   To appear in Theory and Practice of Logic Programming (TPLP), Proceedings of ICLP 2015.\nNicod's criterion states that observing a black raven is evidence for the hypothesis H that all ravens are black. We show that Solomonoff induction does not satisfy Nicod's criterion: there are time steps in which observing black ravens decreases the belief in H. Moreover, while observing any computable infinite string compatible with H, the belief in H decreases infinitely often when using the unnormalized Solomonoff prior, but only finitely often when using the normalized Solomonoff prior. We argue that the fault is not with Solomonoff induction; instead we should reject Nicod's criterion.\nWe introduce optimization techniques for reasoning in DLN---a recently introduced family of nonmonotonic description logics whose characterizing features appear well-suited to model the applicative examples naturally arising in biomedical domains and semantic web access control policies. Such optimizations are validated experimentally on large KBs with more than 30K axioms. Speedups exceed 1 order of magnitude. For the first time, response times compatible with real-time reasoning are obtained with nonmonotonic KBs of this size.\nIn this work we solve the day-ahead unit commitment (UC) problem, by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling. We present two reinforcement learning algorithms, and devise a third one. 
We compare our results to previous work that uses simulated annealing (SA), and show a 27% improvement in operation costs, with running time of 2.5 minutes (compared to 2.5 hours of existing state-of-the-art).\nWe propose a new algorithm for recommender systems with numeric ratings which is based on Pattern Structures (RAPS). As the input the algorithm takes rating matrix, e.g., such that it contains movies rated by users. For a target user, the algorithm returns a rated list of items (movies) based on its previous ratings and ratings of other users. We compare the results of the proposed algorithm in terms of precision and recall measures with Slope One, one of the state-of-the-art item-based algorithms, on Movie Lens dataset and RAPS demonstrates the best or comparable quality.\nThis paper describes the IBM 704 architecture and the genesis of the names for CAR, and CDR, which, as it turns out, probably don't quite make sense. The paper suggests that this may not be all bad, as the names lend themselves to compounding. Indeed that the compound function names , such as CADR, or even CADADR, etc. may be read as little access programs.\nThis work presents two new algorithms for performing constraint satisfaction. The first algorithm presented, DMaxWalkSat, is a constraint solver specialized for solving dynamic, weighted constraint satisfaction problems. The second algorithm, RDMaxWalkSat, is a derivative of DMaxWalkSat that has been modified into an anytime algorithm, and hence support realtime constraint satisfaction. DMaxWalkSat is shown to offer performance advantages in terms of solution quality and runtime over its parent constraint solver, MaxWalkSat. RDMaxWalkSat is shown to support anytime operation. 
The introduction of these algorithms brings another tool to the areas of computer science that naturally represent problems as constraint satisfaction problems, an example of which is the robust coherence algorithm.\nThe Stackelberg equilibrium solution concept describes optimal strategies to commit to: Player 1 (termed the leader) publicly commits to a strategy and Player 2 (termed the follower) plays a best response to this strategy (ties are broken in favor of the leader). We study Stackelberg equilibria in finite sequential games (or extensive-form games) and provide new exact algorithms, approximate algorithms, and hardness results for several classes of these sequential games.\nAll solutions SAT (AllSAT for short) is a variant of propositional satisfiability problem. Despite its significance, AllSAT has been relatively unexplored compared to other variants. We thus survey and discuss major techniques of AllSAT solvers. We faithfully implement them and conduct comprehensive experiments using a large number of instances and various types of solvers including one of the few public softwares. The experiments reveal solver's characteristics. Our implemented solvers are made publicly available so that other researchers can easily develop their solver by modifying our codes and compare it with existing methods.\nThis paper provides a general result on controlling local Rademacher complexities, which captures in an elegant form to relate the complexities with constraint on the expected norm to the corresponding ones with constraint on the empirical norm. This result is convenient to apply in real applications and could yield refined local Rademacher complexity bounds for function classes satisfying general entropy conditions. We demonstrate the power of our complexity bounds by applying them to derive effective generalization error bounds.\nThis paper proposes a new general approach based on Bayesian networks to model the human behaviour. 
This approach represents human behaviour withprobabilistic cause-effect relations based not only on previous works, but also with conditional probabilities coming either from expert knowledge or deduced from observations. The approach has been used in the co-simulation of building physics and human behaviour in order to assess the CO 2 concentration in an office.\nIn this paper we present Gelisp, a new library to represent musical Constraint Satisfaction Problems and search strategies intuitively. Gelisp has two interfaces, a command-line one for Common Lisp and a graphical one for OpenMusic. Using Gelisp, we solved a problem of automatic music generation proposed by composer Michael Jarrell and we found solutions for the All-interval series.\nIn this paper we study complexity of an extension of ordered binary decision diagrams (OBDDs) called $c$-OBDDs on CNFs of bounded (primal graph) treewidth. In particular, we show that for each $k$ there is a class of CNFs of treewidth $k \\geq 3$ for which the equivalent $c$-OBDDs are of size $\\Omega(n^{k/(8c-4)})$. Moreover, this lower bound holds if $c$-OBDD is non-deterministic and semantic. Our second result uses the above lower bound to separate the above model from sentential decision diagrams (SDDs). In order to obtain the lower bound, we use a structural graph parameter called matching width. Our third result shows that matching width and pathwidth are linearly related.\nAutomatic narration of events and entities is the need of the hour, especially when live reporting is critical and volume of information to be narrated is huge. This paper discusses the challenges in this context, along with the algorithms used to build such systems. From a systematic study, we can infer that most of the work done in this area is related to statistical data. 
It was also found that subjective evaluation or contribution of experts is also limited for narration context.\nIn this project we outline a modularized, scalable system for comparing Amazon products in an interactive and informative way using efficient latent variable models and dynamic visualization. We demonstrate how our system can build on the structure and rich review information of Amazon products in order to provide a fast, multifaceted, and intuitive comparison. By providing a condensed per-topic comparison visualization to the user, we are able to display aggregate information from the entire set of reviews while providing an interface that is at least as compact as the \"most helpful reviews\" currently displayed by Amazon, yet far more informative.\nStochastic local search (SLS) algorithms have exhibited great effectiveness in finding models of random instances of the Boolean satisfiability problem (SAT). As one of the most widely known and used SLS algorithm, WalkSAT plays a key role in the evolutions of SLS for SAT, and also hold state-of-the-art performance on random instances. This work proposes a novel implementation for WalkSAT which decreases the redundant calculations leading to a dramatically speeding up, thus dominates the latest version of WalkSAT including its advanced variants.\nIn this paper we explore deep learning models with memory component or attention mechanism for question answering task. We combine and compare three models, Neural Machine Translation, Neural Turing Machine, and Memory Networks for a simulated QA data set. This paper is the first one that uses Neural Machine Translation and Neural Turing Machines for solving QA tasks. 
Our results suggest that the combination of attention and memory have potential to solve certain QA problem.\nUpcoming many core processors are expected to employ a distributed memory architecture similar to currently available supercomputers, but parallel pattern mining algorithms amenable to the architecture are not comprehensively studied. We present a novel closed pattern mining algorithm with a well-engineered communication protocol, and generalize it to find statistically significant patterns from personal genome data. For distributing communication evenly, it employs global load balancing with multiple stacks distributed on a set of cores organized as a hypercube with random edges. Our algorithm achieved up to 1175-fold speedup by using 1200 cores for solving a problem with 11,914 items and 697 transactions, while the naive approach of separating the search space failed completely.\nAs protein folding is a NP-complete problem, artificial intelligence tools like neural networks and genetic algorithms are used to attempt to predict the 3D shape of an amino acids sequence. Underlying these attempts, it is supposed that this folding process is predictable. However, to the best of our knowledge, this important assumption has been neither proven, nor studied. In this paper the topological dynamic of protein folding is evaluated. It is mathematically established that protein folding in 2D hydrophobic-hydrophilic (HP) square lattice model is chaotic as defined by Devaney. Consequences for both structure prediction and biology are then outlined.\nA large part of the use of knowledge base systems is the interpretation of the output by the end-users and the interaction with these users. Even during the development process visualisations can be a great help to the developer. We created IDPD3 as a library to visualise models of logic theories. 
IDPD3 is a new version of $ID^{P}_{Draw}$ and adds support for visualised interactive simulations.\nWe show that there is a largely unexplored class of functions (positive polymatroids) that can define proper discrete metrics over pairs of binary vectors and that are fairly tractable to optimize over. By exploiting submodularity, we are able to give hardness results and approximation algorithms for optimizing over such metrics. Additionally, we demonstrate empirically the effectiveness of these metrics and associated algorithms on both a metric minimization task (a form of clustering) and also a metric maximization task (generating diverse k-best lists).\nIn this paper, we present a model which takes as input a corpus of images with relevant spoken captions and finds a correspondence between the two modalities. We employ a pair of convolutional neural networks to model visual objects and speech signals at the word level, and tie the networks together with an embedding and alignment model which learns a joint semantic space over both modalities. We evaluate our model using image search and annotation tasks on the Flickr8k dataset, which we augmented by collecting a corpus of 40,000 spoken captions using Amazon Mechanical Turk.\nIn this paper we show that the problem of checking consistency of a knowledge base in the Description Logic ALCM is ExpTime-complete. The M stands for meta-modelling as defined by Motz, Rohrer and Severi. To show our main result, we define an ExpTime Tableau algorithm as an extension of an algorithm for checking consistency of a knowledge base in ALC by Nguyen and Szalas.\nMessages often refer to entities such as people, places and events. Correct identification of the intended reference is an essential part of communication. Lack of shared unique names often complicates entity reference. Shared knowledge can be used to construct uniquely identifying descriptive references for entities with ambiguous names. 
We introduce a mathematical model for `Reference by Description', derive results on the conditions under which, with high probability, programs can construct unambiguous references to most entities in the domain of discourse and provide empirical validation of these results.\nEven though there are sophisticated AI planning algorithms, many integrated, large-scale projects do not use planning. One reason seems to be the missing support by engineering tools such as syntax highlighting and visualization. We propose myPDDL - a modular toolbox for efficiently creating PDDL domains and problems. To evaluate myPDDL, we compare it to existing knowledge engineering tools for PDDL and experimentally assess its usefulness for novice PDDL users.\nComputerized adaptive testing (CAT) is an interesting and promising approach to testing human abilities. In our research we use Bayesian networks to create a model of tested humans. We collected data from paper tests performed with grammar school students. In this article we first provide the summary of data used for our experiments. We propose several different Bayesian networks, which we tested and compared by cross-validation. Interesting results were obtained and are discussed in the paper. The analysis has brought a clearer view on the model selection problem. Future research is outlined in the concluding part of the paper.\nIn this paper, we combine task-dependent reward shaping and task-independent proto-value functions to obtain reward dependent proto-value functions (RPVFs). In constructing the RPVFs we are making use of the immediate rewards which are available during the sampling phase but are not used in the PVF construction. We show via experiments that learning with an RPVF based representation is better than learning with just reward shaping or PVFs. 
In particular, when the state space is symmetrical and the rewards are asymmetrical, the RPVF capture the asymmetry better than the PVFs.\nWe consider a reinforcement learning framework where agents have to navigate from start states to goal states. We prove convergence of a cycle-detection learning algorithm on a class of tasks that we call reducible. Reducible tasks have an acyclic solution. We also syntactically characterize the form of the final policy. This characterization can be used to precisely detect the convergence point in a simulation. Our result demonstrates that even simple algorithms can be successful in learning a large class of nontrivial tasks. In addition, our framework is elementary in the sense that we only use basic concepts to formally prove convergence.\nThe evaluation of Datalog rules over large Knowledge Graphs (KGs) is essential for many applications. In this paper, we present a new method of materializing Datalog inferences, which combines a column-based memory layout with novel optimization methods that avoid redundant inferences at runtime. The pro-active caching of certain subqueries further increases efficiency. Our empirical evaluation shows that this approach can often match or even surpass the performance of state-of-the-art systems, especially under restricted resources.\nThe ability to comprehend wishes or desires and their fulfillment is important to Natural Language Understanding. This paper introduces the task of identifying if a desire expressed by a subject in a given short piece of text was fulfilled. We propose various unstructured and structured models that capture fulfillment cues such as the subject's emotional state and actions. Our experiments with two different datasets demonstrate the importance of understanding the narrative and discourse structure to address this task.\nTraditional pattern mining algorithms generally suffer from a lack of flexibility. 
In this paper, we propose a SAT formulation of the problem to successfully mine frequent flexible sequences occurring in transactional datasets. Our SAT-based approach can easily be extended with extra constraints to address a broad range of pattern mining applications. To demonstrate this claim, we formulate and add several constraints, such as gap and span constraints, to our model in order to extract more specific patterns. We also use interactive solving to perform important derived tasks, such as closed pattern mining or maximal pattern mining. Finally, we prove the practical feasibility of our SAT model by running experiments on two real datasets.\nThe bbob-biobj test suite contains 55 bi-objective functions in continuous domain which are derived from combining functions of the well-known single-objective noiseless bbob test suite. Besides giving the actual function definitions and presenting their (known) properties, this documentation also aims at giving the rationale behind our approach in terms of function groups, instances, and potential objective space normalization.\nSharing unused vehicles is one practical solution for traffic congestion. We propose an advanced vehicle-sharing service that maximizes the sharing of vehicles and improves traffic efficiency by coordinating user trips via an information system. We formulate ride-sharing games that model externalities in vehicle sharing caused by insufficient vehicle supply. We show how Bayes correlated equilibrium can coordinate players in ride-sharing games and verify the resultant improvement in the price of anarchy.\nWe show that a character-level encoder-decoder framework can be successfully applied to question answering with a structured knowledge base. We use our model for single-relation question answering and demonstrate the effectiveness of our approach on the SimpleQuestions dataset (Bordes et al., 2015), where we improve state-of-the-art accuracy from 63.9% to 70.9%, without use of ensembles. 
Importantly, our character-level model has 16x fewer parameters than an equivalent word-level model, can be learned with significantly less data compared to previous work, which relies on data augmentation, and is robust to new entities in testing.\nWe introduce an extension of the n-ary description logic DLR to deal with attribute-labelled tuples (generalising the positional notation), with arbitrary projections of relations (inclusion dependencies), generic functional dependencies and with global and local objectification (reifying relations or their projections). We show how a simple syntactic condition on the appearance of projections and functional dependencies in a knowledge base makes the language decidable without increasing the computational complexity of the basic DLR language.\nIn this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy. The ability to evaluate a policy from historical data is important for applications where the deployment of a bad policy can be dangerous or costly. We show empirically that our algorithm produces estimates that often have orders of magnitude lower mean squared error than existing methods---it makes more efficient use of the available data. Our new estimator is based on two advances: an extension of the doubly robust estimator (Jiang and Li, 2015), and a new way to mix between model based estimates and importance sampling based estimates.\nThe uniform one-dimensional fragment of first-order logic, U1, is a recently introduced formalism that extends two-variable logic in a natural way to contexts with relations of all arities. We survey properties of U1 and investigate its relationship to description logics designed to accommodate higher arity relations, with particular attention given to DLR_reg. 
We also define a description logic version of a variant of U1 and prove a range of new results concerning the expressivity of U1 and related logics.\nWe propose a computational model of situated language comprehension based on the Indexical Hypothesis that generates meaning representations by translating amodal linguistic symbols to modal representations of beliefs, knowledge, and experience external to the linguistic system. This Indexical Model incorporates multiple information sources, including perceptions, domain knowledge, and short-term and long-term experiences during comprehension. We show that exploiting diverse information sources can alleviate ambiguities that arise from contextual use of underspecific referring expressions and unexpressed argument alternations of verbs. The model is being used to support linguistic interactions in Rosie, an agent implemented in Soar that learns from instruction.\nThe development of machine learning in particular and artificial intelligent in general has been strongly conditioned by the lack of an appropriated framework to specify and integrate learning processes, data transformation processes and data models. In this work we extend traditional algebraic specification methods to this type of framework. Limits and colimits of diagrams are universal constructions fundamental in different mathematical domains importance in semantic modeling. The aim of our work is to study the possibility of extending these algebraic frameworks to the specification of vague structures and to the description of vague patterns on data.\nMany analyses of resource-allocation problems employ simplistic models of the population. Using the example of a resource-allocation problem of Marecek et al. [arXiv:1406.7639], we introduce rather a general behavioural model, where the evolution of a heterogeneous population of agents is governed by a Markov chain. 
Still, we are able to show that the distribution of agents across resources converges in distribution, for suitable means of information provision, under certain assumptions. The model and proof techniques may have wider applicability.\nThe recently developed massively parallel satisfiability (SAT) solver HordeSAT was designed in a modular way to allow the integration of any sequential CDCL-based SAT solver in its core. We integrated the QCDCL-based quantified Boolean formula (QBF) solver DepQBF in HordeSAT to obtain a massively parallel QBF solver---HordeQBF. In this paper we describe the details of this integration and report on results of the experimental evaluation of HordeQBF's performance. HordeQBF achieves superlinear average and median speedup on the hard application instances of the 2014 QBF Gallery.\nThis paper studies single-image depth perception in the wild, i.e., recovering depth from a single image taken in unconstrained settings. We introduce a new dataset \"Depth in the Wild\" consisting of images in the wild annotated with relative depth between pairs of random points. We also propose a new algorithm that learns to estimate metric depth using annotations of relative depth. Compared to the state of the art, our algorithm is simpler and performs better. Experiments show that our algorithm, combined with existing RGB-D data and our new relative depth annotations, significantly improves single-image depth perception in the wild.\nMany Web applications require efficient querying of large Knowledge Graphs (KGs). We propose KOGNAC, a dictionary-encoding algorithm designed to improve SPARQL querying with a judicious combination of statistical and semantic techniques. In KOGNAC, frequent terms are detected with a frequency approximation algorithm and encoded to maximise compression. Infrequent terms are semantically grouped into ontological classes and encoded to increase data locality. 
We evaluated KOGNAC in combination with state-of-the-art RDF engines, and observed that it significantly improves SPARQL querying on KGs with up to 1B edges.\nDiscovering the set of closed frequent patterns is one of the fundamental problems in Data Mining. Recent Constraint Programming (CP) approaches for declarative itemset mining have proven their usefulness and flexibility. But the wide use of reified constraints in current CP approaches raises many difficulties to cope with high dimensional datasets. This paper proposes CLOSED PATTERN global constraint which does not require any reified constraints nor any extra variables to encode efficiently the Closed Frequent Pattern Mining (CFPM) constraint. CLOSED-PATTERN captures the particular semantics of the CFPM problem in order to ensure a polynomial pruning algorithm ensuring domain consistency. The computational properties of our constraint are analyzed and their practical effectiveness is experimentally evaluated.\nIn this paper, we introduce new methods and discuss results of text-based LSTM (Long Short-Term Memory) networks for automatic music composition. The proposed network is designed to learn relationships within text documents that represent chord progressions and drum tracks in two case studies. In the experiments, word-RNNs (Recurrent Neural Networks) show good results for both cases, while character-based RNNs (char-RNNs) only succeed to learn chord progressions. The proposed system can be used for fully automatic composition or as semi-automatic systems that help humans to compose music by controlling a diversity parameter of the model.\nThis paper presents a novel approach to procedural generation of urban maps for First Person Shooter (FPS) games. A multi-agent evolutionary system is employed to place streets, buildings and other items inside the Unity3D game engine, resulting in playable video game levels. 
A computational agent is trained using machine learning techniques to capture the intent of the game designer as part of the multi-agent system, and to enable a semi-automated aesthetic selection for the underlying genetic algorithm.\nEmbedding-based Knowledge Base Completion models have so far mostly combined distributed representations of individual entities or relations to compute truth scores of missing links. Facts can however also be represented using pairwise embeddings, i.e. embeddings for pairs of entities and relations. In this paper we explore such bigram embeddings with a flexible Factorization Machine model and several ablations from it. We investigate the relevance of various bigram types on the fb15k237 dataset and find relative improvements compared to a compositional model.\nWe show unconditional parameterized lower bounds in the area of knowledge compilation, more specifically on the size of circuits in decomposable negation normal form (DNNF) that encode CNF-formulas restricted by several graph width measures. In particular, we show that   - there are CNF formulas of size $n$ and modular incidence treewidth $k$ whose smallest DNNF-encoding has size $n^{\\Omega(k)}$, and   - there are CNF formulas of size $n$ and incidence neighborhood diversity $k$ whose smallest DNNF-encoding has size $n^{\\Omega(\\sqrt{k})}$.   These results complement recent upper bounds for compiling CNF into DNNF and strengthen---quantitatively and qualitatively---known conditional lower bounds for cliquewidth. Moreover, they show that, unlike for many graph problems, the parameters considered here behave significantly differently from treewidth.\nLearning novel tasks is a complex cognitive activity requiring the learner to acquire diverse declarative and procedural knowledge. Prior ACT-R models of acquiring task knowledge from instruction focused on learning procedural knowledge from declarative instructions encoded in semantic memory. 
In this paper, we identify the requirements for designing computational models that learn task knowledge from situated task-oriented interactions with an expert and then describe and evaluate a model of learning from situated interactive instruction that is implemented in the Soar cognitive architecture.\nIn this paper we study the relationship between the resources of social networks by exploring the Web as big data based on a simple search engine. We have used set theory by utilizing the occurrence and co-occurrence for defining the singleton or doubleton spaces of event in a search engine model, and then provided them as representation of social actors and their relationship in clusters. Thus, there are behaviors of social actors and their relation based on Web.\nWe introduce a logic for temporal beliefs and intentions based on Shoham's database perspective. We separate strong beliefs from weak beliefs. Strong beliefs are independent from intentions, while weak beliefs are obtained by adding intentions to strong beliefs and everything that follows from that. We formalize coherence conditions on strong beliefs and intentions. We provide AGM-style postulates for the revision of strong beliefs and intentions. We show in a representation theorem that a revision operator satisfying our postulates can be represented by a pre-order on interpretations of the beliefs, together with a selection function for the intentions.\nDou Shou Qi is a game in which two players control a number of pieces, each of them aiming to move one of their pieces onto a given square. We implemented an engine for analyzing the game. Moreover, we created a series of endgame tablebases containing all configurations with up to four pieces. These tablebases are the first steps towards theoretically solving the game. Finally, we constructed decision trees based on the endgame tablebases. 
In this note we report on some interesting patterns.\nInformation and knowledge are transformable into each other. Information transformation into knowledge by the example of rule generation from OWL (Web Ontology Language) ontology has been shown during the development of the SWES (Semantic Web Expert System). The SWES is expected as an expert system for searching OWL ontologies from the Web, generating rules from the found ontologies and supplementing the SWES knowledge base with these rules. The purpose of this paper is to show knowledge transformation into information by the example of ontology generation from rules.\nModern saturation-based Automated Theorem Provers typically implement the superposition calculus for reasoning about first-order logic with or without equality. Practical implementations of this calculus use a variety of literal selections and term orderings to tame the growth of the search space and help steer proof search. This paper introduces the notion of lookahead selection that estimates (looks ahead) the effect on the search space of selecting a literal. There is also a case made for the use of incomplete selection functions that attempt to restrict the search space instead of satisfying some completeness criteria. Experimental evaluation in the Vampire theorem prover shows that both lookahead selection and incomplete selection significantly contribute to solving hard problems unsolvable by other methods.\nWe relate behavior composition, a synthesis task studied in AI, to supervisory control theory from the discrete event systems field. In particular, we show that realizing (i.e., implementing) a target behavior module (e.g., a house surveillance system) by suitably coordinating a collection of available behaviors (e.g., automatic blinds, doors, lights, cameras, etc.) amounts to imposing a supervisor onto a special discrete event system. 
Such a link allows us to leverage on the solid foundations and extensive work on discrete event systems, including borrowing tools and ideas from that field. As evidence of that we show how simple it is to introduce preferences in the mapped framework.\nThe ability to learn a model is essential for the success of autonomous agents. Unfortunately, learning a model is difficult in partially observable environments, where latent environmental factors influence what the agent observes. In the absence of a supervisory training signal, autonomous agents therefore require a mechanism to autonomously discover these environmental factors, or sensorimotor contexts.   This paper presents a method to discover sensorimotor contexts in partially observable environments, by constructing a hierarchical transition model. The method is evaluated in a simulation experiment, in which a robot learns that different rooms are characterized by different objects that are found in them.\nComma.ai's approach to Artificial Intelligence for self-driving cars is based on an agent that learns to clone driver behaviors and plans maneuvers by simulating future events in the road. This paper illustrates one of our research approaches for driving simulation. One where we learn to simulate. Here we investigate variational autoencoders with classical and learned cost functions using generative adversarial networks for embedding road frames. Afterwards, we learn a transition model in the embedded space using action conditioned Recurrent Neural Networks. We show that our approach can keep predicting realistic looking video for several frames despite the transition model being optimized without a cost function in the pixel space.\nWe propose applying the categorical compositional scheme of [6] to conceptual space models of cognition. 
In order to do this we introduce the category of convex relations as a new setting for categorical compositional semantics, emphasizing the convex structure important to conceptual space applications. We show how conceptual spaces for composite types such as adjectives and verbs can be constructed. We illustrate this new model on detailed examples.\nThe definition of stable models for propositional formulas with infinite conjunctions and disjunctions can be used to describe the semantics of answer set programming languages. In this note, we enhance that definition by introducing a distinction between intensional and extensional atoms. The symmetric splitting theorem for first-order formulas is then extended to infinitary formulas and used to reason about infinitary definitions. This note is under consideration for publication in Theory and Practice of Logic Programming.\nIn recent work we defined resource-based answer set semantics, which is an extension to answer set semantics stemming from the study of its relationship with linear logic. In fact, the name of the new semantics comes from the fact that in the linear-logic formulation every literal (including negative ones) were considered as a resource. In this paper, we propose a query-answering procedure reminiscent of Prolog for answer set programs under this extended semantics as an extension of XSB-resolution for logic programs with negation. We prove formal properties of the proposed procedure.   Under consideration for acceptance in TPLP.\nA Winograd schema is a pair of sentences that differ in a single word and that contain an ambiguous pronoun whose referent is different in the two sentences and requires the use of commonsense knowledge or world knowledge to disambiguate. 
This paper discusses how Winograd schemas and other sentence pairs could be used as challenges for machine translation using distinctions between pronouns, such as gender, that appear in the target language but not in the source.\nDelta Epsilon Alpha Star is a minimal coverage, real-time robotic search algorithm that yields a moderately aggressive search path with minimal backtracking. Search performance is bounded by a placing a combinatorial bound, epsilon and delta, on the maximum deviation from the theoretical shortest path and the probability at which further deviations can occur. Additionally, we formally define the notion of PAC-admissibility -- a relaxed admissibility criteria for algorithms, and show that PAC-admissible algorithms are better suited to robotic search situations than epsilon-admissible or strict algorithms.\nWe present an inductive spatio-temporal learning framework rooted in inductive logic programming. With an emphasis on visuo-spatial language, logic, and cognition, the framework supports learning with relational spatio-temporal features identifiable in a range of domains involving the processing and interpretation of dynamic visuo-spatial imagery. We present a prototypical system, and an example application in the domain of computing for visual arts and computational cognitive science.\nWe present Mean Box Pooling, a novel visual representation that pools over CNN representations of a large number, highly overlapping object proposals. We show that such representation together with nCCA, a successful multimodal embedding technique, achieves state-of-the-art performance on the Visual Madlibs task. Moreover, inspired by the nCCA's objective function, we extend classical CNN+LSTM approach to train the network by directly maximizing the similarity between the internal representation of the deep learning architecture and candidate answers. 
Again, such approach achieves a significant improvement over the prior work that also uses CNN+LSTM approach on Visual Madlibs.\nWe study the multi-agent path finding problem (MAPF) for a group of agents which are allowed to move into arbitrary directions on a 2D square grid. We focus on centralized conflict resolution for independently computed plans. We propose an algorithm that eliminates conflicts by using local re-planning and introducing time offsets to the execution of paths by different agents. Experimental results show that the algorithm can find high quality conflict-free solutions at low computational cost.\nThis paper explores the task of translating natural language queries into regular expressions which embody their meaning. In contrast to prior work, the proposed neural model does not utilize domain-specific crafting, learning to translate directly from a parallel corpus. To fully explore the potential of neural models, we propose a methodology for collecting a large corpus of regular expression, natural language pairs. Our resulting model achieves a performance gain of 19.6% over previous state-of-the-art models.\nReinforcement learning problems are often described through rewards that indicate if an agent has completed some task. This specification can yield desirable behavior, however many problems are difficult to specify in this manner, as one often needs to know the proper configuration for the agent. When humans are learning to solve tasks, we often learn from visual instructions composed of images or videos. Such representations motivate our development of Perceptual Reward Functions, which provide a mechanism for creating visual task descriptions. We show that this approach allows an agent to learn from rewards that are based on raw pixels rather than internal parameters.\nIn this paper, a geometric framework for neural networks is proposed. 
This framework uses the inner product space structure underlying the parameter set to perform gradient descent not in a component-based form, but in a coordinate-free manner. Convolutional neural networks are described in this framework in a compact form, with the gradients of standard --- and higher-order --- loss functions calculated for each layer of the network. This approach can be applied to other network structures and provides a basis on which to create new networks.\nNatural language processing, as a data analytics related technology, is used widely in many research areas such as artificial intelligence, human language processing, and translation. At present, due to explosive growth of data, there are many challenges for natural language processing. Hadoop is one of the platforms that can process the large amount of data required for natural language processing. KOSHIK is one of the natural language processing architectures, and utilizes Hadoop and contains language processing components such as Stanford CoreNLP and OpenNLP. This study describes how to build a KOSHIK platform with the relevant tools, and provides the steps to analyze wiki data. Finally, it evaluates and discusses the advantages and disadvantages of the KOSHIK architecture, and gives recommendations on improving the processing performance.\nFunction optimisation is a major challenge in computer science. The No Free Lunch theorems state that if all functions with the same histogram are assumed to be equally probable then no algorithm outperforms any other in expectation. We argue against the uniform assumption and suggest a universal prior exists for which there is a free lunch, but where no particular class of functions is favoured over another. We also prove upper and lower bounds on the size of the free lunch.\nThe predominant method for evaluating the quality of causal models is to measure the graphical accuracy of the learned model structure. 
We present an alternative method for evaluating causal models that directly measures the accuracy of estimated interventional distributions. We contrast such distributional measures with structural measures, such as structural Hamming distance and structural intervention distance, showing that structural measures often correspond poorly to the accuracy of estimated interventional distributions. We use a number of real and synthetic datasets to illustrate various scenarios in which structural measures provide misleading results with respect to algorithm selection and parameter tuning, and we recommend that distributional measures become the new standard for evaluating causal models.\nDescriptions are often provided along with recommendations to help users' discovery. Recommending automatically generated music playlists (e.g. personalised playlists) introduces the problem of generating descriptions. In this paper, we propose a method for generating music playlist descriptions, which is called as music captioning. In the proposed method, audio content analysis and natural language processing are adopted to utilise the information of each track.\nSince Pokemon Go sent millions on the quest of collecting virtual monsters, an important question has been on the minds of many people: Is going after the closest item first a time-and-cost-effective way to play? Here, we show that this is in fact a good strategy which performs on average only 7% worse than the best possible solution in terms of the total distance traveled to gather all the items. Even when accounting for errors due to the inability of people to accurately measure distances by eye, the performance only goes down to 16% of the optimal solution.\nThe theory of actual causality, defined by Halpern and Pearl, and its quantitative measure - the degree of responsibility - was shown to be extremely useful in various areas of computer science due to a good match between the results it produces and our intuition. 
In this paper, I describe the applications of causality to formal verification, namely, explanation of counterexamples, refinement of coverage metrics, and symbolic trajectory evaluation. I also briefly discuss recent applications of causality to legal reasoning.\nWe show how, under certain conditions, the asymptotic behaviour of an Ordinary Differential Equation under non-constant interventions can be modelled using Dynamic Structural Causal Models. In contrast to earlier work, we study not only the effect of interventions on equilibrium states; rather, we model asymptotic behaviour that is dynamic under interventions that vary in time, and include as a special case the study of static equilibria.\nThis paper describes an approach to the methodology of answer set programming (ASP) that can facilitate the design of encodings that are easy to understand and provably correct. Under this approach, after appending a rule or a small group of rules to the emerging program we include a comment that states what has been \"achieved\" so far. This strategy allows us to set out our understanding of the design of the program by describing the roles of small parts of the program in a mathematically precise way.\nSymmetry breaking has been proven to be an efficient preprocessing technique for satisfiability solving (SAT). In this paper, we port the state-of-the-art SAT symmetry breaker BreakID to answer set programming (ASP). The result is a lightweight tool that can be plugged in between the grounding and the solving phases that are common when modelling in ASP. We compare our tool with sbass, the current state-of-the-art symmetry breaker for ASP.\nFor most branching algorithms in Boolean logic \"branching\" means \"variable-wise branching\". We present the apparently novel technique of clause-wise branching, which is used to solve the ALLSAT problem for arbitrary Boolean functions in CNF format. Specifically, it converts a CNF into an orthogonal DNF, i.e. 
into an exclusive sum of products. Our method is enhanced by two ingredients: the use of a good SAT solver and wildcards beyond the common don't-care symbol.\nThe Smallest Grammar Problem -- the problem of finding the smallest context-free grammar that generates exactly one given sequence -- has never been successfully applied to grammatical inference. We investigate the reasons and propose an extended formulation that seeks to minimize non-recursive grammars, instead of straight-line programs. In addition, we provide very efficient algorithms that approximate the minimization problem of this class of grammars. Our empirical evaluation shows that we are able to find smaller models than the current best approximations to the Smallest Grammar Problem on standard benchmarks, and that the inferred rules capture the syntactic structure of natural language much better.\nThis paper develops upper and lower bounds for the probability of Boolean expressions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. Our technique generalizes and extends the underlying idea of a number of recent approaches which are varyingly called node splitting, variable renaming, variable splitting, or dissociation for probabilistic databases. We prove that the probabilities we assign to new variables are the best possible in some sense.\nThe Gibbard-Satterthwaite theorem states that every non-dictatorial election rule among at least three alternatives can be strategically manipulated. 
We prove a quantitative version of the Gibbard-Satterthwaite theorem: a random manipulation by a single random voter will succeed with a non-negligible probability for any election rule among three alternatives that is far from being a dictatorship and from having only two alternatives in its range.\nNegation as failure and incomplete information in logic programs have been studied by many researchers. In order to explain HOW a negated conclusion was reached, we introduce and prove a different way of negating facts to overcome misleading conclusions in logic programs. Negating facts can be achieved by asking the user for constants that do not appear elsewhere in the knowledge base.\nRobust search procedures are a central component in the design of black-box constraint-programming solvers. This paper proposes activity-based search, the idea of using the activity of variables during propagation to guide the search. Activity-based search was compared experimentally to impact-based search and the WDEG heuristics. Experimental results on a variety of benchmarks show that activity-based search is more robust than other heuristics and may produce significant improvements in performance.\nWe introduce matrices and their blocks into Dung's theory of argumentation frameworks. It is shown that each argumentation framework has a matrix representation, and that the common extension-based semantics of argumentation frameworks can be characterized by blocks of the matrix and their relations. In contrast with the traditional directed-graph method, the matrix approach has the advantage of computability. 
Therefore, bringing matrix theory into the research of argumentation frameworks and related areas has broad prospects.\nWe consider the G\\\"odel bi-modal logic determined by fuzzy Kripke models where both the propositions and the accessibility relation are infinitely valued over the standard G\\\"odel algebra [0,1] and prove strong completeness of Fischer Servi intuitionistic modal logic IK plus the prelinearity axiom with respect to this semantics. We also axiomatize the bi-modal analogues of $T,$ $S4,$ and $S5$ obtained by restricting to models over frames satisfying the [0,1]-valued versions of the structural properties which characterize these logics. As an application of the completeness theorems we obtain a representation theorem for bi-modal G\\\"odel algebras.\nThis paper introduces the SEQ BIN meta-constraint with a polytime algorithm achieving generalized arc-consistency according to some properties. SEQ BIN can be used for encoding counting constraints such as CHANGE, SMOOTH or INCREASING NVALUE. For some of these constraints and some of their variants GAC can be enforced with a time and space complexity linear in the sum of domain sizes, which improves or equals the best known results in the literature.\nThe goal of the Taaable project is to create a case-based reasoning system for retrieval and adaptation of cooking recipes. Within this framework, we discuss the temporal aspects of recipes and the means of representing those in order to adapt their text.\nDesigning component-based constraint solvers is a complex problem. Some components are required, some are optional and there are interdependencies between the components. Because of this, previous approaches to solver design and modification have been ad-hoc and limited. We present a system that transforms a description of the components and the characteristics of the target constraint solver into a constraint problem. 
Solving this problem yields the description of a valid solver. Our approach represents a significant step towards the automated design and synthesis of constraint solvers that are specialised for individual constraint problem classes or instances.\nClassification of targets by radar has proved to be notoriously difficult, with the best systems still yet to attain sufficiently high levels of performance and reliability. In the current contribution we explore a new design of radar based target recognition, where angular diversity is used in a cognitive manner to attain better performance. Performance is benchmarked against conventional classification schemes. The proposed scheme can easily be extended to cognitive target recognition based on multiple diversity strategies.\nContemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizers is that the corresponding inference problem is intractable in several ways. This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm which addresses each of the computational intractabilities with a sequence of approximations and explicitly addresses the decision problem of maximizing information gain from each evaluation.\nCovering model provides a general framework for granular computing in that overlaps among granules are almost indispensable. For any given covering, both intersection and union of covering blocks containing an element are exploited as granules to form granular worlds at different abstraction levels, respectively, and transformations among these different granular worlds are also discussed. 
As an application of the presented multi-granular perspective on covering, relational interpretation and axiomatization of four types of covering based rough upper approximation operators are investigated, which can be dually applied to lower ones.\nThis paper studies the coupling of internally guided learning and social interaction, and more specifically the improvement owing to demonstrations of the learning by intrinsic motivation. We present Socially Guided Intrinsic Motivation by Demonstration (SGIM-D), an algorithm for learning in continuous, unbounded and non-preset environments. After introducing social learning and intrinsic motivation, we describe the design of our algorithm, before showing through a fishing experiment that SGIM-D efficiently combines the advantages of social learning and intrinsic motivation to gain a wide repertoire while being specialised in specific subspaces.\nTo study the preference of infants for contingency of movements and familiarity of faces during self-recognition task, we built, as an accurate and instantaneous imitator, a real-time face-swapper for videos. We present a non-constraint face-swapper based on 3D visual tracking that achieves real-time performance through parallel computing. Our imitator system is particularly suited for experiments involving children with Autistic Spectrum Disorder who are often strongly disturbed by the constraints of other methods.\nThis paper considers the sparse eigenvalue problem, which is to extract dominant (largest) sparse eigenvectors with at most $k$ non-zero components. We propose a simple yet effective solution called truncated power method that can approximately solve the underlying nonconvex optimization problem. A strong sparse recovery result is proved for the truncated power method, and this theory is our key motivation for developing the new algorithm. 
The proposed method is tested on applications such as sparse principal component analysis and the densest $k$-subgraph problem. Extensive experiments on several synthetic and real-world large scale datasets demonstrate the competitive empirical performance of our method.\nLet us envision a new class of IT systems, the \"Support Systems for Knowledge Works\" or SSKW. An SSKW can be defined as a system built for providing comprehensive support to human knowledge-workers while performing instances of complex knowledge-works of a particular type within a particular domain of professional activities. To get an idea what an SSKW-enabled work environment can be like, let us look into a hypothetical scenario that depicts the interaction between a physician and a patient-care SSKW during the activity of diagnosing a patient.\nThis paper studies the use of the Tsallis Entropy versus the classic Boltzmann-Gibbs-Shannon entropy for classifying image patterns. Given a database of 40 pattern classes, the goal is to determine the class of a given image sample. Our experiments show that the Tsallis entropy encoded in a feature vector for different $q$ indices has a great advantage over the Boltzmann-Gibbs-Shannon entropy for pattern classification, boosting recognition rates by a factor of 3. We discuss the reasons behind this success, shedding light on the usefulness of the Tsallis entropy.\nThis paper focuses on the expressive power of disjunctive and normal logic programs under the stable model semantics over finite, infinite, or arbitrary structures. A translation from disjunctive logic programs into normal logic programs is proposed and then proved to be sound over infinite structures. The equivalence of expressive power of two kinds of logic programs over arbitrary structures is shown to coincide with that over finite structures, and coincide with whether or not NP is closed under complement. 
Over finite structures, the intranslatability from disjunctive logic programs to normal logic programs is also proved if arities of auxiliary predicates and functions are bounded in a certain way.\nIn this work, we present the definition of the intuitionistic fuzzy parameterized (IFP) intuitionistic fuzzy soft set and its operations. Then we define IFP-aggregation operator to form IFP-intuitionistic fuzzy soft-decision-making method which allows constructing more efficient decision processes.\nWe present a unified logical framework for representing and reasoning about both quantitative and qualitative preferences in fuzzy answer set programming, called fuzzy answer set optimization programs. The proposed framework is vital to allow defining quantitative preferences over the possible outcomes of qualitative preferences. We show the application of fuzzy answer set optimization programs to the course scheduling with fuzzy preferences problem. To the best of our knowledge, this development is the first to consider a logical framework for reasoning about quantitative preferences, in general, and reasoning about both quantitative and qualitative preferences in particular.\nWe allow representing and reasoning in the presence of nested multiple aggregates over multiple variables and nested multiple aggregates over functions involving multiple variables in answer sets, precisely, in answer set optimization programming and in answer set programming. We show the applicability of the answer set optimization programming with nested multiple aggregates and the answer set programming with nested multiple aggregates to the Probabilistic Traveling Salesman Problem, a fundamental a priori optimization problem in Operations Research.\nRoborobo! is a multi-platform, highly portable, robot simulator for large-scale collective robotics experiments. Roborobo! is coded in C++, and follows the KISS guideline (\"Keep it simple\"). 
Therefore, its external dependency is solely limited to the widely available SDL library for fast 2D Graphics. Roborobo! is based on a Khepera/ePuck model. It is targeted for fast single and multi-robots simulation, and has already been used in more than a dozen published research works, mainly concerned with evolutionary swarm robotics, including environment-driven self-adaptation and distributed evolutionary optimization, as well as online onboard embodied evolution and embodied morphogenesis.\nWe present a unified logical framework for representing and reasoning about both probability quantitative and qualitative preferences in probability answer set programming, called probability answer set optimization programs. The proposed framework is vital to allow defining probability quantitative preferences over the possible outcomes of qualitative preferences. We show the application of probability answer set optimization programs to a variant of the well-known nurse rostering problem, called the nurse rostering with probability preferences problem. To the best of our knowledge, this development is the first to consider a logical framework for reasoning about probability quantitative preferences, in general, and reasoning about both probability quantitative and qualitative preferences in particular.\nWe present a logical framework to represent and reason about stochastic optimization problems based on probability answer set programming. This is established by allowing probability optimization aggregates, e.g., minimum and maximum, in the language of probability answer set programming to allow minimization or maximization of some desired criteria under the probabilistic environments. 
We show the application of the proposed logical stochastic optimization framework under the probability answer set programming to two-stage stochastic optimization problems with recourse.\nIn the interaction between agents we can have an explicative discourse, when communicating preferences or intentions, and a normative discourse, when considering normative knowledge. For justifying their actions our agents are endowed with a Justification and Explanation Logic (JEL), capable of covering both the justification for their commitments and explanations why they had to act in that way, due to the current situation in the environment. Social commitments are used to formalise justificatory and explanatory patterns. The combination of explanation, justification, and commitments\nIn the last decade Human-Computer Interaction (HCI) has started to focus attention on forms of persuasive interaction where computer technologies have the goal of changing users' behavior and attitudes according to a predefined direction. In this work, we hypothesize a strong connection between logical fallacies (forms of reasoning which are logically invalid but cognitively effective) and some common persuasion strategies adopted within web technologies. With the aim of empirically evaluating our hypothesis, we carried out a pilot study on a sample of 150 e-commerce websites.\nPrevious work has shown the effectiveness of random walk hitting times as a measure of dissimilarity in a variety of graph-based learning problems such as collaborative filtering, query suggestion or finding paraphrases. However, application of hitting times has been limited to small datasets because of computational restrictions. This paper develops a new approximation algorithm with which hitting times can be computed on very large, disk-resident graphs, making their application possible to problems which were previously out of reach. 
This will potentially benefit a range of large-scale problems.\nIn this paper, we propose a first application of data mining techniques to propositional satisfiability. Our proposed Mining4SAT approach aims to discover and to exploit hidden structural knowledge for reducing the size of propositional formulae in conjunctive normal form (CNF). Mining4SAT combines both frequent itemset mining techniques and Tseitin's encoding for a compact representation of CNF formulae. The experiments of our Mining4SAT approach show interesting reductions of the sizes of many application instances taken from the last SAT competitions.\nOur aim is to investigate ontology-based data access over temporal data with validity time and ontologies capable of temporal conceptual modelling. To this end, we design a temporal description logic, TQL, that extends the standard ontology language OWL 2 QL, provides basic means for temporal conceptual modelling and ensures first-order rewritability of conjunctive queries for suitably defined data instances with validity time.\nThis research advocates the idea of combining argumentation theory with the social web technology, aiming to enact large scale or mass argumentation. The proposed framework allows mass-collaborative editing of structured arguments in the style of semantic wikipedia. The long term goal is to apply the abstract machinery of argumentation theory to more practical applications based on human generated arguments, such as deliberative democracy, business negotiation, or self-care. The ARGNET system was developed based on the Semantic MediaWiki framework and on the Argument Interchange Format (AIF) ontology.\nWhen perturbation or unexpected events do occur, agents need protocols for repairing or reforming the supply chain. Unfortunate contingencies could increase the cost of performance too much, while breaching the current contract may be more efficient. 
In our framework the principles of contract law are applied to set penalties: expectation damages, opportunity cost, reliance damages, and party design remedies, and they are introduced in the task dependency model.\nWe report on highlights of the ACL2 enhancements introduced in ACL2 releases since the 2011 ACL2 Workshop. Although many enhancements are critical for soundness or robustness, we focus in this paper on those improvements that could benefit users who are aware of them, but that might not be discovered in everyday practice.\nIt will be shown that according to theorems of K. Menger, every neuron grid, if identified with a curve, is able to preserve the adopted qualitative structure of a data space. Furthermore, if this identification is made, the neuron grid structure can always be mapped to a subset of a universal neuron grid which is constructible in three space dimensions. Conclusions will be drawn for established neuron grid types as well as neural fields.\nMaLeS is an automatic tuning framework for automated theorem provers. It provides solutions for both the strategy finding as well as the strategy scheduling problem. This paper describes the tool and the methods used in it, and evaluates its performance on three automated theorem provers: E, LEO-II and Satallax. An evaluation on a subset of the TPTP library problems shows that on average a MaLeS-tuned prover solves 8.67% more problems than the prover with its default settings.\nControl applications often feature tasks with similar, but not identical, dynamics. We introduce the Hidden Parameter Markov Decision Process (HiP-MDP), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors, and introduce a semiparametric regression approach for learning its structure from data. 
In the control setting, we show that a learned HiP-MDP rapidly identifies the dynamics of a new task instance, allowing an agent to flexibly adapt to task variations.\nBat algorithm (BA) is a bio-inspired algorithm developed by Yang in 2010 and BA has been found to be very efficient. As a result, the literature has expanded significantly in the last 3 years. This paper provides a timely review of the bat algorithm and its new variants. A wide range of diverse applications and case studies are also reviewed and summarized briefly here. Further research topics are also discussed.\nThe present study gives a mathematical framework for self-evolution within autonomous problem solving systems. Special attention is set on universal abstraction, thereof generation by net block homomorphism, consequently multiple order solving systems and the overall decidability of the set of the solutions. By overlapping presentation of nets new abstraction relation among nets is formulated alongside with consequent alphabetical net block renetting system proportional to normal forms of renetting systems regarding the operational power. A new structure in self-evolving problem solving is established via saturation by groups of equivalence relations and iterative closures of generated quotient transducer algebras over the whole evolution.\nComplex systems are naturally hybrid: their dynamic behavior is both continuous and discrete. For these systems, maintenance and repair are an increasing part of the total cost of final product. Efficient diagnosis and prognosis techniques have to be adopted to detect, isolate and anticipate faults. This paper presents an original integrated theoretical framework for diagnosis and prognosis of hybrid systems. The formalism used for hybrid diagnosis is enriched in order to be able to follow the evolution of an aging law for each fault of the system. 
The paper presents a methodology for interleaving diagnosis and prognosis in a hybrid framework.\nThe partner units problem (PUP) is an acknowledged hard benchmark problem for the Logic Programming community with various industrial application fields like surveillance, electrical engineering, computer networks or railway safety systems. However, its computational complexity has remained largely unclear so far. In this paper we provide all missing complexity results making the PUP better exploitable for benchmark testing. Furthermore, we present QuickPup, a heuristic search algorithm for PUP instances which outperforms all state-of-the-art solving approaches and which is already in use in real world industrial configuration environments.\nEvolutionary and swarm algorithms have found many applications in design problems since today's computing power enables these algorithms to find solutions to complicated design problems very fast. A newly proposed hybrid algorithm, the bat algorithm, has been applied for the design of microwave microstrip couplers for the first time. Simulation results indicate that the bat algorithm is a very fast algorithm and it produces very reliable results.\nCombining efficiency with reliability within CP systems is one of the main concerns of CP developers. This paper presents a simple and efficient way to connect Choco and Ibex, two CP solvers respectively specialised in finite and continuous domains. This makes it possible to take advantage of the most recent advances of the continuous community within Choco while saving development and maintenance resources, hence ensuring a better software quality.\nIn this paper, we revisit an important issue of CDCL-based SAT solvers, namely the learned clauses database management policies. Our motivation takes its source from a simple observation on the remarkable performances of both random and size-bounded reduction strategies. 
We first derive a simple reduction strategy, called Size-Bounded Randomized strategy (in short SBR), that combines maintaining short clauses (of size bounded by k) with randomly deleting clauses of size greater than k. The resulting strategy outperforms the state-of-the-art, namely the LBD based one, on SAT instances taken from the last SAT competition. Reinforced by the interest of keeping short clauses, we propose several new dynamic variants, and we discuss their performances.\nHandwritten character recognition is one of the most challenging and ongoing areas of research in the field of pattern recognition. HCR research has matured for foreign languages like Chinese and Japanese but the problem is much more complex for Indian languages. The problem becomes even more complicated for South Indian languages due to their large character sets and the presence of vowel modifiers and compound characters. This paper provides an overview of important contributions and advances in offline as well as online handwritten character recognition of Malayalam scripts.\nThe amount of information in the form of features and variables available to machine learning algorithms is ever increasing. This can lead to classifiers that are prone to overfitting in high dimensions; high-dimensional models do not lend themselves to interpretable results, and the CPU and memory resources necessary to run on high-dimensional datasets severely limit the applications of the approaches. Variable and feature selection aim to remedy this by finding a subset of features that in some way captures the information provided best. 
In this paper we present the general methodology and highlight some specific approaches.\nMachine Learner for Automated Reasoning (MaLARea) is a learning and reasoning system for proving in large formal libraries where thousands of theorems are available when attacking a new conjecture, and a large number of related problems and proofs can be used to learn specific theorem-proving knowledge. The last version of the system won the 2013 CASC LTB competition by a large margin. This paper describes the motivation behind the methods used in MaLARea, discusses the general approach and the issues arising in evaluation of such a system, and describes the Mizar@Turing100 and CASC'24 versions of MaLARea.\nParameter estimation based on uncertain data represented as belief structures is one of the latest problems in the Dempster-Shafer theory. In this paper, a novel method is proposed for the parameter estimation in the case where belief structures are uncertain and represented as interval-valued belief structures. Within our proposed method, the maximization of likelihood criterion and minimization of estimated parameter's uncertainty are taken into consideration simultaneously. As an illustration, the proposed method is employed to estimate parameters for deterministic and uncertain belief structures, which demonstrates its effectiveness and versatility.\nWe observe a trend regarding restart strategies used in SAT solvers. A few years ago, most state-of-the-art solvers restarted on average after a few thousands of backtracks. Currently, restarting after a dozen backtracks results in much better performance. The main reason for this trend is that heuristics and data structures have become more restart-friendly. We expect further continuation of this trend, so future SAT solvers will restart even more rapidly. 
Additionally, we present experimental results to support our observations.\nIn this paper, we provide more evidence for the contention that logical consequence should be understood in normative terms. Hartry Field and John MacFarlane covered the classical case. We extend their work, examining what it means for an agent to be obliged to infer a conclusion when faced with uncertain information or reasoning within a non-monotonic, defeasible, logical framework (which allows e.g. for inference to be drawn from premises considered true unless evidence to the contrary is presented).\nThis paper uses the smoothing and mapping framework to solve the SLAM problem in indoor environments; focusing on how some key issues such as feature extraction and data association can be handled by applying probabilistic techniques. For feature extraction, an odds ratio approach to find multiple lines from laser scans is proposed; this criterion allows deciding which models must be merged and outputting the best number of models. In addition, to solve the data association problem a method based on the segments of each line is proposed. Experimental results show that high quality indoor maps can be obtained from noisy data.\nThis paper presents an analysis of data from a gift-exchange-game experiment. The experiment was described in `The Impact of Social Comparisons on Reciprocity' by G\\\"achter et al. 2012. Since this paper uses state-of-the-art data science techniques, the results provide a different point of view on the problem. As already shown in relevant literature from experimental economics, human decisions deviate from rational payoff maximization. The average gift rate was $31$%. Gift rate was under no conditions zero. Further, we derive some special findings and calculate their significance.\nWe present a modification of the superposition calculus that is meant to generate consequences of sets of first-order axioms. 
This approach is proven to be sound and deductive-complete in the presence of redundancy elimination rules, provided the considered consequences are built on a given finite set of ground terms, represented by constant symbols. In contrast to other approaches, most existing results about the termination of the superposition calculus can be carried over to our procedure. This ensures in particular that the calculus is terminating for many theories of interest to the SMT community.\n\"How to generate a sentence\" is the most critical and difficult problem in all the natural language processing technologies. In this paper, we present a new approach to explain the generation process of a sentence from the perspective of mathematics. Our method is based on the premise that in our brain a sentence is a part of a word network which is formed by many word nodes. Experiments show that the probability of the entire sentence can be obtained by the probabilities of single words and the probabilities of the co-occurrence of word pairs, which indicates that humans use the synthesis method to generate a sentence.\nIt has been proved that large scale realistic Knowledge Based Machine Translation applications require acquisition of huge knowledge about language and about the world. This knowledge is encoded in computational grammars, lexicons and domain models. Another approach which avoids the need for collecting and analyzing massive knowledge, is the Example Based approach, which is the topic of this paper. We show through the paper that using the Example Based approach in its native form is not suitable for translating into Arabic. Therefore a modification to the basic approach is presented to improve the accuracy of the translation process. The basic idea of the new approach is to improve the technique by which template-based approaches select the appropriate templates.\nDecision making is still an open issue in the application of Dempster-Shafer evidence theory. 
Many works have addressed it. In the transferable belief model (TBM), pignistic probabilities based on the basic probability assignments are used for decision making. In this paper, multiscale probability transformation of basic probability assignment based on the belief function and the plausibility function is proposed, which is a generalization of the pignistic probability transformation. In the multiscale probability function, a factor q based on the Tsallis entropy is used to make the multiscale probabilities diversified. An example shows that the multiscale probability transformation is more reasonable in the decision making.\nNeutrosophic Statistics means statistical analysis of population or sample that has indeterminate (imprecise, ambiguous, vague, incomplete, unknown) data. For example, the population or sample size might not be exactly determinate because of some individuals that partially belong to the population or sample, and partially they do not belong, or individuals whose appurtenance is completely unknown. Also, there are population or sample individuals whose data could be indeterminate. In this book, we develop the 1995 notion of neutrosophic statistics. We present various practical examples. It is possible to define the neutrosophic statistics in many ways, because there are various types of indeterminacies, depending on the problem to solve.\nWe define a notion of rational closure for the logic SHIQ, which does not enjoy the finite model property, building on the notion of rational closure introduced by Lehmann and Magidor in [23]. We provide a semantic characterization of rational closure in SHIQ in terms of a preferential semantics, based on a finite rank characterization of minimal models. 
We show that the rational closure of a TBox can be computed in EXPTIME using entailment in SHIQ.\nThe user equilibrium in traffic assignment problem is based on the fact that travelers choose the minimum-cost path between every origin-destination pair and on the assumption that such a behavior will lead to an equilibrium of the traffic network. In this paper, we consider this problem when the traffic network links have fuzzy costs. Therefore, a Physarum-type algorithm is developed to unify the Physarum network and the traffic network to take full advantage of Physarum Polycephalum's adaptivity in network design to solve the user equilibrium problem. Eventually, some experiments are used to test the performance of this method. The results demonstrate that our approach is competitive when compared with other existing algorithms.\nThe Dynamic Logic for Propositional Assignments (DL-PA) has recently been studied as an alternative to Propositional Dynamic Logic (PDL). In DL-PA, the abstract atomic programs of PDL are replaced by assignments of propositional variables to truth values. This makes DL-PA enjoy some interesting meta-logical properties that PDL does not, such as eliminability of the Kleene star, compactness and interpolation. We define an analytic tableaux calculus for DL-PA and show that it matches the known complexity results.\nBayesian network structures are usually built using only the data and starting from an empty network or from a naive Bayes structure. Very often, in some domains, like medicine, a prior structure knowledge is already known. This structure can be automatically or manually refined in search for better performance models. 
In this work, we take Bayesian networks built by specialists and show that minor perturbations to this original network can yield better classifiers with a very small computational cost, while maintaining most of the intended meaning of the original model.\nWe consider the problem of learning from a similarity matrix (such as spectral clustering and low-dimensional embedding), when computing pairwise similarities is costly, and only a limited number of entries can be observed. We provide a theoretical analysis using standard notions of graph approximation, significantly generalizing previous results (which focused on spectral clustering with two clusters). We also propose a new algorithmic approach based on adaptive sampling, which experimentally matches or improves on previous methods, while being considerably more general and computationally cheaper.\nBecause reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest this last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the following features: sample-efficiency, non-linear approximation, non-stationarity handling and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDP) which produces biased estimates in the case of stochastic transitions. Then the eXtended KTD framework (XKTD), solving stochastic MDP, is described. Convergence is analyzed for special cases for both deterministic and stochastic transitions. Related algorithms are evaluated on classical benchmarks. They compare favorably to the state of the art while exhibiting the announced features.\nSampling from hierarchical Bayesian models is often difficult for MCMC methods, because of the strong correlations between the model parameters and the hyperparameters. 
Recent Riemannian manifold Hamiltonian Monte Carlo (RMHMC) methods have significant potential advantages in this setting, but are computationally expensive. We introduce a new RMHMC method, which we call semi-separable Hamiltonian Monte Carlo, which uses a specially designed mass matrix that allows the joint Hamiltonian over model parameters and hyperparameters to decompose into two simpler Hamiltonians. This structure is exploited by a new integrator which we call the alternating blockwise leapfrog algorithm. The resulting method can mix faster than simpler Gibbs sampling while being simpler and more efficient than previous instances of RMHMC.\nMotivated by the application problem of sensor fusion the author introduced the concept of graded set. It is reasoned that in classification problem arising in an information system (represented by information table), a novel set called Granular set naturally arises. It is realized that in any hierarchical classification problem, Granular set naturally arises. Also when the target set of objects forms a graded set the lower and upper approximations of target sets form a graded set. This generalizes the concept of rough set. It is hoped that a detailed theory of granular/ graded sets finds several applications.\nSeveral real problems ranging from text classification to computational biology are characterized by hierarchical multi-label classification tasks. Most of the methods presented in literature focused on tree-structured taxonomies, but only few on taxonomies structured according to a Directed Acyclic Graph (DAG). In this contribution novel classification ensemble algorithms for DAG-structured taxonomies are introduced. In particular Hierarchical Top-Down (HTD-DAG) and True Path Rule (TPR-DAG) for DAGs are presented and discussed.\nLatent variable conditional models, including the latent conditional random fields as a special case, are popular models for many natural language processing and vision processing tasks. 
The computational complexity of the exact decoding/inference in latent conditional random fields is unclear. In this paper, we try to clarify the computational complexity of the exact decoding. We analyze the complexity and demonstrate that it is an NP-hard problem even in a sequential labeling setting. Furthermore, we propose the latent-dynamic inference (LDI-Naive) method and its bounded version (LDI-Bounded), which are able to perform exact-inference or almost-exact-inference by using top-$n$ search and dynamic programming.\nIn order to improve children speech therapy, we develop a Fuzzy Expert System based on a speech therapy guide. This guide, written in natural language, was formalized using the fuzzy logic paradigm. In this manner we obtain a knowledge base with over 150 rules and 19 linguistic variables. All this research, including expert system validation, is part of the TERAPERS project.\nThis paper proposes a model, the linear model, for randomly generating logic programs with low density of rules and investigates statistical properties of such random logic programs. It is mathematically shown that the average number of answer sets for a random program converges to a constant when the number of atoms approaches infinity. Several experimental results are also reported, which justify the suitability of the linear model. It is also experimentally shown that, under this model, the size distribution of answer sets for random programs tends to a normal distribution when the number of atoms is sufficiently large.\nStatements about entities occur everywhere, from newspapers and web pages to structured databases. Correlating references to entities across systems that use different identifiers or names for them is a widespread problem. In this paper, we show how shared knowledge between systems can be used to solve this problem. We present \"reference by description\", a formal model for resolving references. 
We provide some results on the conditions under which a randomly chosen entity in one system can, with high probability, be mapped to the same entity in a different system.\nMap matching of the GPS trajectory serves the purpose of recovering the original route on a road network from a sequence of noisy GPS observations. It is a fundamental technique to many Location Based Services. However, map matching of a low sampling rate on urban road network is still a challenging task. In this paper, the characteristics of Conditional Random Fields with regard to inducing many contextual features and feature selection are explored for the map matching of the GPS trajectories at a low sampling rate. Experiments on a taxi trajectory dataset show that our method may achieve competitive results along with the success of reducing model complexity for computation-limited applications.\nIn this paper, we describe a simple strategy for mitigating variability in temporal data series by shifting focus onto long-term, frequency domain features that are less susceptible to variability. We apply this method to the human action recognition task and demonstrate how working in the frequency domain can yield good recognition features for commonly used optical flow and articulated pose features, which are highly sensitive to small differences in motion, viewpoint, dynamic backgrounds, occlusion and other sources of variability. We show how these frequency-based features can be used in combination with a simple forest classifier to achieve good and robust results on the popular KTH Actions dataset.\nIn this paper, we propose a model for simulating search operators whose behaviour often changes continuously during the search. In these scenarios, the performance of the operators decreases when they are applied. This is motivated by the fact that operators for optimization problems are often roughly classified into exploitation operators and exploration operators. 
Our simulation model is used to compare the different performances of operator selection policies and clearly identify their ability to adapt to such specific operator behaviours. The experimental study provides interesting results on the respective behaviours of operator selection policies when faced with such non-stationary search scenarios.\nThis paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. We first consider the multi-armed bandit case, establish a minimax risk lower bound, and analyze the risk of two standard estimators. It is shown, and verified in simulation, that one is minimax optimal up to a constant, while another can be arbitrarily worse, despite its empirical success and popularity. The results are applied to related problems in contextual bandits and fixed-horizon Markov decision processes, and are also related to semi-supervised learning.\nA difficult task in modeling with Bayesian networks is the elicitation of numerical parameters of Bayesian networks. A large number of parameters is needed to specify a conditional probability table (CPT) that has a larger parent set. In this paper we show that most CPTs from real applications of Bayesian networks can actually be very well approximated by tables that require substantially fewer parameters. This observation has practical consequences not only for model elicitation but also for efficient probabilistic reasoning with these networks.\nIn this paper a method is proposed which uses data mining techniques based on rough sets theory to select neighborhood and determine update rule for cellular automata (CA). According to the proposed approach, neighborhood is detected by reducts calculations and a rule-learning algorithm is applied to induce a set of decision rules that define the evolution of CA. Experiments were performed with use of synthetic as well as real-world data sets. 
The results show that the introduced method allows identification of both deterministic and probabilistic CA-based models of real-world phenomena.\nThe potential risk of privacy leakage prevents users from sharing their honest opinions on social platforms. This paper addresses the problem of privacy preservation if the query returns the histogram of rankings. The framework of differential privacy is applied to rank aggregation. The error probability of the aggregated ranking is analyzed as a result of noise added in order to achieve differential privacy. Upper bounds on the error rates for any positional ranking rule are derived under the assumption that profiles are uniformly distributed. Simulation results are provided to validate the probabilistic analysis.\nFinding the physical location of a specific network node is a prototypical task for navigation inside a wireless network. In this paper, we consider in depth the implications of wireless communication as a measurement input of gradient-based taxis algorithms. We discuss how gradients can be measured and determine the errors of this estimation. We then introduce a gradient-based taxis algorithm as an example of a family of gradient-based, convergent algorithms and discuss its convergence in the context of network robotics. We also conduct an exemplary experiment to show how to overcome some of the specific problems related to network robotics. Finally, we show how to adapt this framework to more complex objectives.\nThis paper describes a novel approach to medical diagnosis based on the SP theory of computing and cognition. 
The main attractions of this approach are: a format for representing diseases that is simple and intuitive; an ability to cope with errors and uncertainties in diagnostic information; the simplicity of storing statistical information as frequencies of occurrence of diseases; a method for evaluating alternative diagnostic hypotheses that yields true probabilities; and a framework that should facilitate unsupervised learning of medical knowledge and the integration of medical diagnosis with other AI applications.\nA crucial problem for many results and tools about bigraphs and bigraphical reactive systems is bigraph embedding. An embedding is more informative than a bigraph matching, since it keeps track of the correspondence between the various components of the redex (guest) within the agent (host). In this paper, we present an algorithm for computing embeddings based on a reduction to a constraint satisfaction problem. This algorithm, that we prove to be sound and complete, has been successfully implemented in LibBig, a library for manipulating bigraphical reactive systems. This library can be used for implementing a wide range of tools, and it can be adapted to various extensions of bigraphs.\nWe introduce a new approach to unsupervised estimation of feature-rich semantic role labeling models. Our model consists of two components: (1) an encoding component: a semantic role labeling model which predicts roles given a rich set of syntactic and lexical features; (2) a reconstruction component: a tensor factorization model which relies on roles to predict argument fillers. When the components are estimated jointly to minimize errors in argument reconstruction, the induced roles largely correspond to roles defined in annotated resources. 
Our method performs on par with the most accurate role induction methods on English and German, even though, unlike these previous approaches, we do not incorporate any prior linguistic knowledge about the languages.\nWe propose a practical and scalable Gaussian process model for large-scale nonlinear probabilistic regression. Our mixture-of-experts model is conceptually simple and hierarchically recombines computations for an overall approximation of a full Gaussian process. Closed-form and distributed computations allow for efficient and massive parallelisation while keeping the memory consumption small. Given sufficient computing resources, our model can handle arbitrarily large data sets, without explicit sparse approximations. We provide strong experimental evidence that our model can be applied to large data sets of sizes far beyond millions. Hence, our model has the potential to lay the foundation for general large-scale Gaussian process research.\nThe semantic web has led to the deployment of ontologies on the web connected through various relations and, in particular, alignments of their vocabularies. There exist several semantics for alignments, which make interoperation between different interpretations of networks of ontologies difficult. Here we present an abstraction of these semantics which allows for defining the notions of closure and consistency for networks of ontologies independently from the precise semantics. We also show that networks of ontologies with specific notions of morphisms define categories of networks of ontologies.\nWe study a logic for reasoning with if-then formulas describing dependencies between attributes of objects which are observed in consecutive points in time. We introduce semantic entailment of the formulas, show its fixed-point characterization, investigate closure properties of model classes, present an axiomatization and prove its completeness, and investigate alternative axiomatizations and normalized proofs. 
We investigate decidability and complexity issues of the logic and prove that the entailment problem is NP-hard and belongs to EXPSPACE. We show that by restricting to predictive formulas, the entailment problem is decidable in pseudo-linear time.\nThis paper addresses how a recursive neural network model can automatically leave out useless information and emphasize important evidence, in other words, to perform \"weight tuning\" for higher-level representation acquisition. We propose two models, Weighted Neural Network (WNN) and Binary-Expectation Neural Network (BENN), which automatically control how much one specific unit contributes to the higher-level representation. The proposed models can be viewed as incorporating a more powerful compositional function for embedding acquisition in recursive neural networks. Experimental results demonstrate significant improvement over standard neural models.\nHow smart is your kettle? How smart are things in your kitchen, your house, your neighborhood, on the internet? With the advent of the Internet of Things, and the move toward making devices `smart' by utilizing AI, a natural question arises: how can we evaluate the progress? The standard way of evaluating AI is through the Turing Test. While the Turing Test was designed for AI, the device that it was tailored to was a computer. Applying the test to the variety of devices that constitute the Internet of Things poses a number of challenges which could be addressed through a number of adaptations.\nLocal search methods can quickly find good quality solutions in cases where systematic search methods might take a large amount of time. Moreover, in the context of pattern set mining, exhaustive search methods are not applicable due to the large search space they have to explore. In this paper, we propose the application of stochastic local search to pattern set mining, specifically to the task of concept learning. 
We applied a number of local search algorithms to standard benchmark instances for pattern set mining and the results show the potential for further exploration.\nWe demonstrate that any physical object, as long as its volume is conserved when coupled with suitable operations, provides a sophisticated decision-making capability. We consider the problem of finding, as accurately and quickly as possible, the most profitable option from a set of options that gives stochastic rewards. These decisions are made as dictated by a physical object, which is moved in a manner similar to the fluctuations of a rigid body in a tug-of-war game. Our analytical calculations validate statistical reasons why our method exhibits higher efficiency than conventional algorithms.\nIn unsupervised learning, an unbiased uniform sampling strategy is typically used, in order that the learned features faithfully encode the statistical structure of the training data. In this work, we explore whether active example selection strategies - algorithms that select which examples to use, based on the current estimate of the features - can accelerate learning. Specifically, we investigate effects of heuristic and saliency-inspired selection algorithms on the dictionary learning task with sparse activations. We show that some selection algorithms do improve the speed of learning, and we speculate on why they might work.\nKnowledge is only good if it is sound, consistent and complete. The same holds true for conceptual knowledge, which holds knowledge about concepts and their associations. Conceptual knowledge, no matter what format it is represented in, must be consistent, sound and complete in order to realise its practical use. 
This paper discusses consistency, soundness and completeness in the ambit of conceptual knowledge and the need to consider these factors as fundamental to the development of conceptual knowledge.\nThe KF metamodel is a comprehensive unifying metamodel covering the static structural entities and constraints of UML Class Diagrams (v2.4.1), ER, EER, ORM, and ORM2, and intended to boost interoperability of common conceptual data modelling languages. It was originally designed in UML with textual constraints, and in this report we present its formalisations in FOL and OWL, which accompanies the paper that describes, discusses, and analyses the KF metamodel in detail. These new formalizations contribute to give a precise meaning to the metamodel, to understand its complexity properties and to provide a basis for future implementations.\nWe present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multilayer feed-forward networks. We argue, partially through analogy to matrix factorization, that this is an inductive bias that can help shed light on deep learning.\nIn this exploratory note we ask the question of what a measure of performance for all tasks is like if we use a weighting of tasks based on a difficulty function. This difficulty function depends on the complexity of the (acceptable) solution for the task (instead of a universal distribution over tasks or an adaptive test). The resulting aggregations and decompositions are (now retrospectively) seen as the natural (and trivial) interactive generalisation of the C-tests.\nThe menu-dependent nature of regret-minimization creates subtleties when it is applied to dynamic decision problems. Firstly, it is not clear whether \\emph{forgone opportunities} should be included in the \\emph{menu}, with respect to which regrets are computed, at different points of the decision problem. 
If forgone opportunities are included, however, we can characterize when a form of dynamic consistency is guaranteed. Secondly, more subtleties arise when sophistication is used to deal with dynamic inconsistency. In the full version of this paper, we examine, axiomatically and by common examples, the implications of different menu definitions for sophisticated, regret-minimizing agents.\nThis study describes the experimental application of Machine Learning techniques to build prediction models that can assess the injury risk associated with traffic accidents. This work uses a freely available data set of traffic accident records that took place in the city of Porto Alegre/RS (Brazil) during the year of 2013. This study also provides an analysis of the most important attributes of a traffic accident that could produce an outcome of injury to the people involved in the accident.\nIn this paper an alternative approach to solve uncertain Stochastic Differential Equation (SDE) is proposed. This uncertainty occurs due to the involved parameters in system and these are considered as Triangular Fuzzy Numbers (TFN). Here the proposed fuzzy arithmetic in [2] is used as a tool to handle Fuzzy Stochastic Differential Equation (FSDE). In particular, a system of Ito stochastic differential equations is analysed with fuzzy parameters. Further, exact and Euler-Maruyama approximation methods with fuzzy values are demonstrated and used to solve some standard SDEs.\nGibbs random fields play an important role in statistics; however, the resulting likelihood is typically unavailable due to an intractable normalizing constant. Composite likelihoods offer a principled means to construct useful approximations. 
This paper provides a means to calibrate the posterior distribution resulting from using a composite likelihood and illustrates its performance in several examples.\nA wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We develop a simple distributed algorithm that is embarrassingly parallel and it achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting.\nSubmodular function minimization is a fundamental optimization problem that arises in several applications in machine learning and computer vision. The problem is known to be solvable in polynomial time, but general purpose algorithms have high running times and are unsuitable for large-scale problems. Recent work has used convex optimization techniques to obtain very practical algorithms for minimizing functions that are sums of ``simple\" functions. In this paper, we use random coordinate descent methods to obtain algorithms with faster linear convergence rates and cheaper iteration costs. Compared to alternating projection methods, our algorithms do not rely on full-dimensional vector operations and they converge in significantly fewer iterations.\nDistilling from a knowledge base only the part that is relevant to a subset of alphabet, which is recognized as forgetting, has attracted extensive interest in the AI community. In standard propositional logic, a general algorithm of forgetting and its computation-oriented investigation in various fragments whose satisfiability is tractable are still lacking. The paper aims at filling the gap. 
After exploring some basic properties of forgetting in propositional logic, we present a resolution-based algorithm of forgetting for CNF fragment, and some complexity results about forgetting in Horn, renamable Horn, q-Horn, Krom, DNF and CNF fragments of propositional logic.\nIn this work, a nonlinear model predictive controller is developed for a batch polymerization process. The physical model of the process is parameterized along a desired trajectory resulting in a trajectory linearized piecewise model (a multiple linear model bank) and the parameters are identified for an experimental polymerization reactor. Then, a multiple model adaptive predictive controller is designed for thermal trajectory tracking of the MMA polymerization. The input control signal to the process is constrained by the maximum thermal power provided by the heaters. The constrained optimization in the model predictive controller is solved via genetic algorithms to minimize a DMC cost function in each sampling interval.\nMulticriteria decision analysis aims at supporting a person facing a decision problem involving conflicting criteria. We consider an additive utility model which provides robust conclusions based on preferences elicited from the decision maker. The recommendations based on these robust conclusions are even more convincing if they are complemented by explanations. We propose a general scheme, based on sequence of preference swaps, in which explanations can be computed. We show first that the length of explanations can be unbounded in the general case. However, in the case of binary reference scales, this length is bounded and we provide an algorithm to compute the corresponding explanation.\nThis paper presents an idea of inductive learning use for rule generation from ontologies. The main purpose of the paper is to evaluate the possibility of inductive learning use in rule generation from ontologies and to develop the way how this can be done. 
Generated rules are necessary to supplement or even to develop the Semantic Web Expert System (SWES) knowledge base. The SWES emerges as the result of evolution of expert system concept toward the Web, and the SWES is based on the Semantic Web technologies. Available publications show that the problem of rule generation from ontologies based on inductive learning is not investigated deeply enough.\nDeontic logic is a very well researched branch of mathematical logic and philosophy. Various kinds of deontic logics are considered for different application domains like argumentation theory, legal reasoning, and acts in multi-agent systems. In this paper, we show how standard deontic logic can be used to model ethical codes for multi-agent systems. Furthermore we show how Hyper, a high performance theorem prover, can be used to prove properties of these ethical codes.\nHere a novel idea to handle imprecise or vague set viz. Pseudo fuzzy set has been proposed. Pseudo fuzzy set is a triplet of element and its two membership functions. Both the membership functions may or may not be dependent. The hypothesis is that every positive sense has some negative sense. So, one membership function has been considered as positive and another as negative. Considering this concept, here the development of Pseudo fuzzy set and its property along with Pseudo fuzzy numbers has been discussed.\nDempster-Shafer evidence theory is an efficient mathematical tool to deal with uncertain information. In that theory, basic probability assignment (BPA) is the basic element for the expression and inference of uncertainty. Decision-making based on BPA is still an open issue in Dempster-Shafer evidence theory. In this paper, a novel approach of transforming basic probability assignments to probabilities is proposed based on Deng entropy which is a new measure for the uncertainty of BPA. 
The principle of the proposed method is to minimize the difference of uncertainties involved in the given BPA and the obtained probability distribution. Numerical examples are given to show the proposed approach.\nWe introduce a new approach to solving path-finding problems under uncertainty by representing them as probabilistic models and applying domain-independent inference algorithms to the models. This approach separates problem representation from the inference algorithm and provides a framework for efficient learning of path-finding policies. We evaluate the new approach on the Canadian Traveler Problem, which we formulate as a probabilistic model, and show how probabilistic inference allows high performance stochastic policies to be obtained for this problem.\nWe study an online model of fair division designed to capture features of a real world charity problem. We consider two simple mechanisms for this model in which agents simply declare what items they like. We analyse several axiomatic properties of these mechanisms like strategy-proofness and envy-freeness. Finally, we perform a competitive analysis and compute the price of anarchy.\nIn this paper we propose a non-metric ranking-based representation of semantic similarity that allows natural aggregation of semantic information from multiple heterogeneous sources. We apply the ranking-based representation to zero-shot learning problems, and present deterministic and probabilistic zero-shot classifiers which can be built from pre-trained classifiers without retraining. We demonstrate their advantages on two large real-world image datasets. In particular, we show that aggregating different sources of semantic information, including crowd-sourcing, leads to more accurate classification.\nPast research has challenged us with the task of showing relational patterns between text-based data and then clustering for predictive analysis using the Golay Code technique. 
We focus on a novel approach to extract metaknowledge in multimedia datasets. Our collaboration has been an on-going task of studying the relational patterns between datapoints based on metafeatures extracted from metaknowledge in multimedia datasets. Those selected are significant to suit the mining technique we applied, Golay Code algorithm. In this research paper we summarize findings in optimization of metaknowledge representation for 23-bit representation of structured and unstructured multimedia data in order to\nThis chapter provides an introduction to some basic concepts of epistemic logic, basic formal languages, their semantics, and proof systems. It also contains an overview of the handbook, and a brief history of epistemic logic and pointers to the literature.\nPrior knowledge has been shown very useful to address many natural language processing tasks. Many approaches have been proposed to formalise a variety of knowledge, however, whether the proposed approach is robust or sensitive to the knowledge supplied to the model has rarely been discussed. In this paper, we propose three regularization terms on top of generalized expectation criteria, and conduct extensive experiments to justify the robustness of the proposed methods. Experimental results demonstrate that our proposed methods obtain remarkable improvements and are much more robust than baselines.\nWe are proposing an extension of the recursive neural network that makes use of a variant of the long short-term memory architecture. The extension allows information low in parse trees to be stored in a memory register (the `memory cell') and used much later higher up in the parse tree. This provides a solution to the vanishing gradient problem and allows the network to capture long range dependencies. 
Experimental results show that our composition outperformed the traditional neural-network composition on the Stanford Sentiment Treebank.\nWe study probabilistically informative (weak) versions of transitivity, by using suitable definitions of defaults and negated defaults, in the setting of coherence and imprecise probabilities. We represent p-consistent sequences of defaults and/or negated defaults by g-coherent imprecise probability assessments on the respective sequences of conditional events. Finally, we prove the coherent probability propagation rules for Weak Transitivity and the validity of selected inference patterns by proving the p-entailment for the associated knowledge bases.\nVirtualization enables the building of multiple virtual networks over a shared substrate. One of the challenges to virtualisation is efficient resource allocation. This problem has been found to be NP-hard. Therefore, most approaches to it have not only proposed static solutions, but have also made many assumptions to simplify it. In this paper, we propose a distributed, autonomic and artificial intelligence based solution to resource allocation. Our aim is to obtain self-configuring, self-optimizing, self-healing and context-aware virtual networks.\nThe theory of belief functions manages uncertainty and also proposes a set of combination rules to aggregate opinions of several sources. Some combination rules mix evidential information where sources are independent; other rules are suited to combine evidential information held by dependent sources. In this paper we have two main contributions: First we suggest a method to quantify sources' degree of independence that may guide the choice of the more appropriate set of combination rules. Second, we propose a new combination rule that takes into consideration sources' degree of independence. 
The proposed method is illustrated on generated mass functions.\nUsing SMS (Short Message System), cell phones can be used to query for information about various topics. In an SMS based search system, one of the key problems is to identify a domain (broad topic) associated with the user query; so that a more comprehensive search can be carried out by the domain specific search engine. In this paper we use a rule based approach, to identify the domain, called Short Query Intent Identification System (SQIIS). We construct two different rule-bases using different strategies to suit query intent identification. We evaluate the two rule-bases experimentally.\nEmpirical evidence demonstrates that every region of the neocortex represents information using sparse activity patterns. This paper examines Sparse Distributed Representations (SDRs), the primary information representation strategy in Hierarchical Temporal Memory (HTM) systems and the neocortex. We derive a number of properties that are core to scaling, robustness, and generalization. We use the theory to provide practical guidelines and illustrate the power of SDRs as the basis of HTM. Our goal is to help create a unified mathematical and practical framework for SDRs as it relates to cortical function.\nThis paper recalls the definition of consistency for pairwise comparison matrices and briefly presents the concept of inconsistency index in connection to other aspects of the theory of pairwise comparisons. By commenting on a recent contribution by Koczkodaj and Szwarc, it will be shown that the discussion on inconsistency indices is far from being over, and the ground is still fertile for debates.\nThe Libra Toolkit is a collection of algorithms for learning and inference with discrete probabilistic models, including Bayesian networks, Markov networks, dependency networks, and sum-product networks. 
Compared to other toolkits, Libra places a greater emphasis on learning the structure of tractable models in which exact inference is efficient. It also includes a variety of algorithms for learning graphical models in which inference is potentially intractable, and for performing exact and approximate inference. Libra is released under a 2-clause BSD license to encourage broad use in academia and industry.\nRobot localization is one of the most important problems in robotics. Most of the existing approaches assume that the map of the environment is available beforehand and focus on accurate metrical localization. In this paper, we address the localization problem when the map of the environment is not present beforehand, and the robot relies on a hand-drawn map from a non-expert user. We address this problem by expressing the robot pose in pixel coordinates and simultaneously estimating a local deformation of the hand-drawn map. Experiments show that we are able to localize the robot in the correct room with a robustness of up to 80%.\nRelax, Compensate and then Recover (RCR) is a paradigm for approximate inference in probabilistic graphical models that has previously provided theoretical and practical insights on iterative belief propagation and some of its generalizations. In this paper, we characterize the technique of dual decomposition in terms of RCR, viewing it as a specific way to compensate for relaxed equivalence constraints. Among other insights gathered from this perspective, we propose novel heuristics for recovering relaxed equivalence constraints with the goal of incrementally tightening dual decomposition approximations, all the way to reaching exact solutions. 
We also show empirically that recovering equivalence constraints can sometimes tighten the corresponding approximation (and obtain exact results), without much increasing the complexity of inference.\nWe have designed and implemented an application running inside Second Life that supports user annotation of graphical objects and graphical visualization of concept ontologies, thus providing a formal, machine-accessible description of objects. As a result, we offer a platform that combines the graphical knowledge representation that is expected from a MUVE artifact with the semantic structure given by the Resource Description Framework (RDF) representation of information.\nKnowledge reduction of dynamic covering information systems involves with the time in practical situations. In this paper, we provide incremental approaches to computing the type-1 and type-2 characteristic matrices of dynamic coverings because of varying attribute values. Then we present incremental algorithms of constructing the second and sixth approximations of sets by using characteristic matrices. We employ experimental results to illustrate that the incremental approaches are effective to calculate approximations of sets in dynamic covering information systems. Finally, we perform knowledge reduction of dynamic covering information systems with the incremental approaches.\nWe consider a semantic class, weakly-chase-sticky (WChS), and a syntactic subclass, jointly-weakly-sticky (JWS), of Datalog+- programs. Both extend that of weakly-sticky (WS) programs, which appear in our applications to data quality. For WChS programs we propose a practical, polynomial-time query answering algorithm (QAA). We establish that the two classes are closed under magic-sets rewritings. As a consequence, QAA can be applied to the optimized programs. QAA takes as inputs the program (including the query) and semantic information about the \"finiteness\" of predicate positions. 
For the syntactic subclasses JWS and WS of WChS, this additional information is computable.\nFuzzy Geographically Weighted Clustering (FGWC) is considered as a suitable tool for the analysis of geo-demographic data that assists the provision and planning of products and services to local people. Context variables were attached to FGWC in order to accelerate the computing speed of the algorithm and to focus the results on the domain of interests. Nonetheless, the determination of exact, crisp values of the context variable is a hard task. In this paper, we propose two novel methods using fuzzy approaches for that determination. A numerical example is given to illustrate the uses of the proposed methods.\nThe paper addresses the graph classification problem and introduces a modification of the lazy associative classification method to efficiently handle intersections of graphs. Graph intersections are approximated with all common subgraphs up to a fixed size similarly to what is done with graphlet kernels. We explain the idea of the algorithm with a toy example and describe our experiments with a predictive toxicology dataset.\nMost ethical work is done at a low level of formality. This makes practical moral questions inaccessible to formal and natural sciences and can lead to misunderstandings in ethical discussion. In this paper, we use Bayesian inference to introduce a formalization of preference utilitarianism in physical world models, specifically cellular automata. Even though our formalization is not immediately applicable, it is a first step in providing ethics and ultimately the question of how to \"make the world better\" with a formal basis.\nRelation extraction with accurate precision is still a challenge when processing full text databases. We propose an approach based on cooccurrence analysis in each document for which we used document organization to improve accuracy of relation extraction. 
This approach is implemented in a R package called \\emph{x.ent}. Another facet of extraction relies on use of extracted relation into a querying system for expert end-users. Two datasets had been used. One of them gets interest from specialists of epidemiology in plant health. For this dataset usage is dedicated to plant-disease exploration through agricultural information news. An open-data platform exploits exports from \\emph{x.ent} and is publicly available.\nCP-nets represent the dominant existing framework for expressing qualitative conditional preferences between alternatives, and are used in a variety of areas including constraint solving. Over the last fifteen years, a significant literature has developed exploring semantics, algorithms, implementation and use of CP-nets. This paper introduces a comprehensive new framework for conditional preferences: logical conditional preference theories (LCP theories). To express preferences, the user specifies arbitrary (constraint) Datalog programs over a binary ordering relation on outcomes. We show how LCP theories unify and generalize existing conditional preference proposals, and leverage the rich semantic, algorithmic and implementation frameworks of Datalog.\nWe present a parser for Abstract Meaning Representation (AMR). We treat English-to-AMR conversion within the framework of string-to-tree, syntax-based machine translation (SBMT). To make this work, we transform the AMR structure into a form suitable for the mechanics of SBMT and useful for modeling. We introduce an AMR-specific language model and add data and features drawn from semantic resources. Our resulting AMR parser improves upon state-of-the-art results by 7 Smatch points.\nWe introduce an approximate search algorithm for fast maximum a posteriori probability estimation in probabilistic programs, which we call Bayesian ascent Monte Carlo (BaMC). 
Probabilistic programs represent probabilistic models with varying number of mutually dependent finite, countable, and continuous random variables. BaMC is an anytime MAP search algorithm applicable to any combination of random variables and dependencies. We compare BaMC to other MAP estimation algorithms and show that BaMC is faster and more robust on a range of probabilistic models.\nWe study instantiated abstract argumentation frames of the form $(S,R,I)$, where $(S,R)$ is an abstract argumentation frame and where the arguments $x$ of $S$ are instantiated by $I(x)$ as well formed formulas of a well known logic, for example as Boolean formulas or as predicate logic formulas or as modal logic formulas. We use the method of conceptual analysis to derive the properties of our proposed system. We seek to define the notion of complete extensions for such systems and provide algorithms for finding such extensions. We further develop a theory of instantiation in the abstract, using the framework of Boolean attack formations and of conjunctive and disjunctive attacks. We discuss applications and compare critically with the existing related literature.\nWe present a novel framework, called Private Disclosure of Information (PDI), which is aimed to prevent an adversary from inferring certain sensitive information about subjects using the data that they disclosed during communication with an intended recipient. We show cases where it is possible to achieve perfect privacy regardless of the adversary's auxiliary knowledge while preserving full utility of the information to the intended recipient and provide sufficient conditions for such cases. We also demonstrate the applicability of PDI on a real-world data set that simulates a health tele-monitoring scenario.\nSequential pattern mining under constraints is a challenging data mining task. Many efficient ad hoc methods have been developed for mining sequential patterns, but they are all suffering from a lack of genericity. 
Recent works have investigated Constraint Programming (CP) methods, but they are not still effective because of their encoding. In this paper, we propose a global constraint based on the projected databases principle which remedies to this drawback. Experiments show that our approach clearly outperforms CP approaches and competes well with ad hoc methods on large datasets.\nIn this paper, we propose a Concept-level Emotion Cause Model (CECM), instead of the mere word-level models, to discover causes of microblogging users' diversified emotions on specific hot event. A modified topic-supervised biterm topic model is utilized in CECM to detect emotion topics' in event-related tweets, and then context-sensitive topical PageRank is utilized to detect meaningful multiword expressions as emotion causes. Experimental results on a dataset from Sina Weibo, one of the largest microblogging websites in China, show CECM can better detect emotion causes than baseline methods.\nMost work on building knowledge bases has focused on collecting entities and facts from as large a collection of documents as possible. We argue for and describe a new paradigm where the focus is on a high-recall extraction over a small collection of documents under the supervision of a human expert, that we call Interactive Knowledge Base Population (IKBP).\nDue to rapid advancement in high-throughput techniques, such as microarrays and next generation sequencing technologies, biological data are increasing exponentially. The current challenge in computational biology and bioinformatics research is how to analyze these huge raw biological data to extract biologically meaningful knowledge. 
This review paper presents the applications of formal concept analysis for the analysis and knowledge discovery from biological data, including gene expression discretization, gene co-expression mining, gene expression clustering, finding genes in gene regulatory networks, enzyme/protein classifications, binding site classifications, and so on. It also presents a list of FCA-based software tools applied in biological domain and covers the challenges faced so far.\nRisk aggregation is a popular method used to estimate the sum of a collection of financial assets or events, where each asset or event is modelled as a random variable. Applications, in the financial services industry, include insurance, operational risk, stress testing, and sensitivity analysis, but the problem is widely encountered in many other application domains. This thesis has contributed two algorithms to perform Bayesian risk aggregation when model exhibit hybrid dependency and high dimensional inter-dependency. The first algorithm operates on a subset of the general problem, with an emphasis on convolution problems, in the presence of continuous and discrete variables (so called hybrid models) and the second algorithm offer a universal method for general purpose inference over much wider classes of Bayesian Network models.\nFrequent itemset mining is an essential part of data analysis and data mining. Recent works propose interesting SAT-based encodings for the problem of discovering frequent itemsets. Our aim in this work is to define strategies for adapting SAT solvers to such encodings in order to improve models enumeration. In this context, we deeply study the effects of restart, branching heuristics and clauses learning. 
We then conduct an experimental evaluation on SAT-Based itemset mining instances to show how SAT solvers can be adapted to obtain an efficient SAT model enumerator.\nWe present a novel approach for automatic report generation from time-series data, in the context of student feedback generation. Our proposed methodology treats content selection as a multi-label classification (MLC) problem, which takes as input time-series data (students' learning data) and outputs a summary of these data (feedback). Unlike previous work, this method considers all data simultaneously using ensembles of classifiers, and therefore, it achieves higher accuracy and F- score compared to meaningful baselines.\nWe utilize copulas to constitute a unified framework for constructing and optimizing variational proposals in hierarchical Bayesian models. For models with continuous and non-Gaussian hidden variables, we propose a semiparametric and automated variational Gaussian copula approach, in which the parametric Gaussian copula family is able to preserve multivariate posterior dependence, and the nonparametric transformations based on Bernstein polynomials provide ample flexibility in characterizing the univariate marginal posteriors.\nFinancial news contains useful information on public companies and the market. In this paper we apply the popular word embedding methods and deep neural networks to leverage financial news to predict stock price movements in the market. Experimental results have shown that our proposed methods are simple but very effective, which can significantly improve the stock prediction accuracy on a standard financial database over the baseline system using only the historical price information.\nMoving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward. 
The non-dualistic decision theory literature is split between causal decision theory and evidential decision theory. We extend these decision algorithms to the sequential setting where the agent alternates between taking actions and observing their consequences. We find that evidential decision theory has two natural extensions while causal decision theory only has one.\nMany formalisms combining ontology languages with uncertainty, usually in the form of probabilities, have been studied over the years. Most of these formalisms, however, assume that the probabilistic structure of the knowledge remains static over time. We present a general approach for extending ontology languages to handle time-evolving uncertainty represented by a dynamic Bayesian network. We show how reasoning in the original language and dynamic Bayesian inferences can be exploited for effective reasoning in our framework.\nWe endow prioritised default logic (PDL) with argumentation semantics using the ASPIC+ framework for structured argumentation, and prove that the conclusions of the justified arguments are exactly the prioritised default extensions. Argumentation semantics for PDL will allow for the application of argument game proof theories to the process of inference in PDL, making the reasons for accepting a conclusion transparent and the inference process more intuitive. This also opens up the possibility for argumentation-based distributed reasoning and communication amongst agents with PDL representations of mental attitudes.\nWe present initial research towards procedural generation of Simplified Boardgames and translating them into an efficient GDL code. This is a step towards establishing Simplified Boardgames as a comparison class for General Game Playing agents. To generate playable, human readable, and balanced chess-like games we use an adaptive evolutionary algorithm with the fitness function based on simulated playouts. 
In the future, we plan to use the proposed method to diversify and extend the set of GGP tournament games by those with fully automatically generated rules.\nA factor-graph representation of quantum-mechanical probabilities (involving any number of measurements) is proposed. Unlike standard statistical models, the proposed representation uses auxiliary variables (state variables) that are not random variables. All joint probability distributions are marginals of some complex-valued function $q$, and it is demonstrated how the basic concepts of quantum mechanics relate to factorizations and marginals of $q$.\nIn electronic sports, cyberathletes conceal their online training using different avatars (virtual identities), which keeps them from being recognized by the opponents they may face in future competitions. In this article, we propose a method to tackle this avatar aliases identification problem. Our method trains a classifier on behavioural data and processes the confusion matrix to output label pairs which concentrate confusion. We experimented with Starcraft 2 and report our first results.\nThe use of the internet, with its technologies and applications, has increased rapidly, so protecting text from illegal use is increasingly needed. Text watermarking is used for this purpose. Arabic text has many characteristics, such as the existence of diacritics, kashida (extension character) and points above or under its letters. Each Arabic letter can take different shapes with different Unicode code points. These characteristics are utilized in the watermarking process. In this paper, several methods in the area of Arabic text watermarking are discussed, together with their advantages and disadvantages. These methods are compared in terms of capacity, robustness and imperceptibility.\nHoney bees use optical flow to avoid obstacles effectively. In this research work similar methodology was tested on a simulated mobile robot. Simulation framework was based on VRML and Simulink in a 3D world. 
Optical flow vectors were calculated from a video scene captured by a virtual camera which was used as inputs to a fuzzy logic controller. Fuzzy logic controller decided the locomotion of the robot. Different fuzzy logic rules were evaluated. The robot was able to navigate through complex static and dynamic environments effectively, avoiding obstacles on its path.\nCapturing the interdependencies between real valued time series can be achieved by finding common similar patterns. The abstraction of time series makes the process of finding similarities closer to the way as humans do. Therefore, the abstraction by means of a symbolic levels and finding the common patterns attracts researchers. One particular algorithm, Longest Common Subsequence, has been used successfully as a similarity measure between two sequences including real valued time series. In this paper, we propose Fuzzy Longest Common Subsequence matching for time series.\nDAGitty is a software for drawing and analyzing causal diagrams, also known as directed acyclic graphs (DAGs). Functions include identification of minimal sufficient adjustment sets for estimating causal effects, diagnosis of insufficient or invalid adjustment via the identification of biasing paths, identification of instrumental variables, and derivation of testable implications. DAGitty is provided in the hope that it is useful for researchers and students in Epidemiology, Sociology, Psychology, and other empirical disciplines. The software should run in any web browser that supports modern JavaScript, HTML, and SVG. This is the user manual for DAGitty version 2.3. The manual is updated with every release of a new stable version. DAGitty is available at dagitty.net.\nAnalysis of sequential event data has been recognized as one of the essential tools in data modeling and analysis field. 
In this paper, after the examination of its technical requirements and issues to model complex but practical situation, we propose a new sequential data model, dubbed Duration and Interval Hidden Markov Model (DI-HMM), that efficiently represents \"state duration\" and \"state interval\" of data events. This has significant implications to play an important role in representing practical time-series sequential data. This eventually provides an efficient and flexible sequential data retrieval. Numerical experiments on synthetic and real data demonstrate the efficiency and accuracy of the proposed DI-HMM.\nWe analyse a quantum-like Bayesian Network that puts together cause/effect relationships and semantic similarities between events. These semantic similarities constitute acausal connections according to the Synchronicity principle and provide new relationships to quantum like probabilistic graphical models. As a consequence, beliefs (or any other event) can be represented in vector spaces, in which quantum parameters are determined by the similarities that these vectors share between them. Events attached by a semantic meaning do not need to have an explanation in terms of cause and effect.\nOptimization of very expensive black-box functions requires utilization of maximum information gathered by the process of optimization. Model Guided Sampling Optimization (MGSO) forms a more robust alternative to Jones' Gaussian-process-based EGO algorithm. Instead of EGO's maximizing expected improvement, the MGSO uses sampling the probability of improvement which is shown to be helpful against trapping in local minima. Further, the MGSO can reach close-to-optimum solutions faster than standard optimization algorithms on low dimensional or smooth problems.\nWe propose a novel value function approximation technique for Markov decision processes. We consider the problem of compactly representing the state-action value function using a low-rank and sparse matrix model. 
The problem is to decompose a matrix that encodes the true value function into low-rank and sparse components, and we achieve this using Robust Principal Component Analysis (PCA). Under minimal assumptions, this Robust PCA problem can be solved exactly via the Principal Component Pursuit convex optimization problem. We experiment the procedure on several examples and demonstrate that our method yields approximations essentially identical to the true function.\nSummarization based on text extraction is inherently limited, but generation-style abstractive methods have proven challenging to build. In this work, we propose a fully data-driven approach to abstractive sentence summarization. Our method utilizes a local attention-based model that generates each word of the summary conditioned on the input sentence. While the model is structurally simple, it can easily be trained end-to-end and scales to a large amount of training data. The model shows significant performance gains on the DUC-2004 shared task compared with several strong baselines.\nDiscourse structure is the hidden link between surface features and document-level properties, such as sentiment polarity. We show that the discourse analyses produced by Rhetorical Structure Theory (RST) parsers can improve document-level sentiment analysis, via composition of local information up the discourse tree. First, we show that reweighting discourse units according to their position in a dependency representation of the rhetorical structure can yield substantial improvements on lexicon-based sentiment analysis. Next, we present a recursive neural network over the RST structure, which offers significant improvements over classification-based methods.\nWe introduce a model-free algorithm for learning in Markov decision processes with parameterized actions-discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with that action. 
We introduce the Q-PAMDP algorithm for learning in these domains, show that it converges to a local optimum, and compare it to direct policy search in the goal-scoring and Platform domains.\nHumans are sensitive to complexity and regularity in patterns. The subjective perception of pattern complexity is correlated to algorithmic (Kolmogorov-Chaitin) complexity as defined in computer science, but also to the frequency of naturally occurring patterns. However, the possible mediational role of natural frequencies in the perception of algorithmic complexity remains unclear. Here we reanalyze Hsu et al. (2010) through a mediational analysis, and complement their results in a new experiment. We conclude that human perception of complexity seems partly shaped by natural scene statistics, thereby establishing a link between the perception of complexity and the effect of natural scene statistics.\nWe describe a large-scale project in applied automated deduction concerned with the following problem of considerable interest in loop theory: If $Q$ is a loop with commuting inner mappings, does it follow that $Q$ modulo its center is a group and $Q$ modulo its nucleus is an abelian group? This problem has been answered affirmatively in several varieties of loops. The solution usually involves sophisticated techniques of automated deduction, and the resulting derivations are very long, often with no higher-level human proofs available.\nWe approach the challenging problem of generating highlights from sports broadcasts utilizing audio information only. A language-independent, multi-stage classification approach is employed for detection of key acoustic events which then act as a platform for summarization of highlight scenes. Objective results and human experience indicate that our system is highly efficient.\nIn this paper we present a novel graph kernel framework inspired by the Weisfeiler-Lehman (WL) isomorphism tests. 
Any WL test comprises a relabelling phase of the nodes based on test-specific information extracted from the graph, for example the set of neighbours of a node. We defined a novel relabelling and derived two kernels of the framework from it. The novel kernels are very fast to compute and achieve state-of-the-art results on five real-world datasets.\nThe `pet fish' phenomenon is often cited as a paradigm example of the `non-compositionality' of human concept use. We show here how this phenomenon is naturally accommodated within a compositional distributional model of meaning. This model describes the meaning of a composite concept by accounting for interaction between its constituents via their grammatical roles. We give two illustrative examples to show how the qualitative phenomena are exhibited. We go on to apply the model to experimental data, and finally discuss extensions of the formalism.\nEvolutionary algorithms have been frequently applied to constrained continuous optimisation problems. We carry out feature based comparisons of different types of evolutionary algorithms such as evolution strategies, differential evolution and particle swarm optimisation for constrained continuous optimisation. In our study, we examine how sets of constraints influence the difficulty of obtaining close to optimal solutions. Using a multi-objective approach, we evolve constrained continuous problems having a set of linear and/or quadratic constraints where the different evolutionary approaches show a significant difference in performance. Afterwards, we discuss the features of the constraints that exhibit a difference in performance of the different evolutionary approaches under consideration.\nBased on ideas of quantum theory of open systems and psychological dual system theory we propose two novel versions of Non-Boolean logic. 
The first version can be interpreted in our opinion as simplified description of primitive (mythological) thinking and the second one as the toy model of everyday human reasoning in which aside from logical deduction, heuristic elements and beliefs also play the considerable role. Several arguments in favor of the interpretations proposed are adduced and discussed in the paper as well.\nIn the decade since Jeff Hawkins proposed Hierarchical Temporal Memory (HTM) as a model of neocortical computation, the theory and the algorithms have evolved dramatically. This paper presents a detailed description of HTM's Cortical Learning Algorithm (CLA), including for the first time a rigorous mathematical formulation of all aspects of the computations. Prediction Assisted CLA (paCLA), a refinement of the CLA is presented, which is both closer to the neuroscience and adds significantly to the computational power. Finally, we summarise the key functions of neocortex which are expressed in paCLA implementations.\nWe present a new perspective on neural knowledge base (KB) embeddings, from which we build a framework that can model symbolic knowledge in the KB together with its learning process. We show that this framework well regularizes previous neural KB embedding model for superior performance in reasoning tasks, while having the capabilities of dealing with unseen entities, that is, to learn their embeddings from natural language descriptions, which is very like human's behavior of learning semantic concepts.\nAn open concept of rough evolution and an axiomatic approach to granules was also developed recently by the present author. Subsequently the concepts were used in the formal framework of rough Y-systems (RYS) for developing on granular correspondences by her. These have since been used for a new approach towards comparison of rough algebraic semantics across different semantic domains by way of correspondences that preserve rough evolution and try to avoid contamination. 
In this research paper, new methods are proposed and a semantics for handling possibly contaminated operations and structured bigness is developed. These would also be of natural interest for relative consistency of one collection of knowledge relative other.\nThe paper proposes a fresh look at the concept of goal and advances that motivational attitudes like desire, goal and intention are just facets of the broader notion of (acceptable) outcome. We propose to encode the preferences of an agent as sequences of \"alternative acceptable outcomes\". We then study how the agent's beliefs and norms can be used to filter the mental attitudes out of the sequences of alternative acceptable outcomes. Finally, we formalise such intuitions in a novel Modal Defeasible Logic and we prove that the resulting formalisation is computationally feasible.\nBuilding a safety case is a common approach to make expert judgement explicit about safety of a system. The issue of confidence in such argumentation is still an open research field. Providing quantitative estimation of confidence is an interesting approach to manage complexity of arguments. This paper explores the main current approaches, and proposes a new model for quantitative confidence estimation based on Belief Theory for its definition, and on Bayesian Belief Networks for its propagation in safety case networks.\nWe discuss the changes in an attitude to decision making at the fire ground. The changes are driven by the recent technological shift. The emerging new approaches in sensing and data processing (under common umbrella of Cyber-Physical Systems) allow for leveling off the gap, between humans and machines, in perception of the fire ground. Furthermore, results from descriptive decision theory question the rationality of human choices. This creates the need for searching and testing new approaches for decision making during emergency. We propose the framework that addresses this need. 
The primary feature of the framework are possibilities for incorporation of normative and prescriptive approaches to decision making. The framework also allows for comparison of the performance of decisions, between human and machine.\nWe present a framework for representing and modeling data on graphs. Based on this framework, we study three typical classes of graph signals: smooth graph signals, piecewise-constant graph signals, and piecewise-smooth graph signals. For each class, we provide an explicit definition of the graph signals and construct a corresponding graph dictionary with desirable properties. We then study how such graph dictionary works in two standard tasks: approximation and sampling followed with recovery, both from theoretical as well as algorithmic perspectives. Finally, for each class, we present a case study of a real-world problem by using the proposed methodology.\nIn this paper we propose an extension to the Fuzzy Cognitive Maps (FCMs) that aims at aggregating a number of reasoning tasks into a one parallel run. The described approach consists in replacing real-valued activation levels of concepts (and further influence weights) by random variables. Such extension, followed by the implemented software tool, allows for determining ranges reached by concept activation levels, sensitivity analysis as well as statistical analysis of multiple reasoning results. We replace multiplication and addition operators appearing in the FCM state equation by appropriate convolutions applicable for discrete random variables. To make the model computationally feasible, it is further augmented with aggregation operations for discrete random variables. We discuss four implemented aggregators, as well as we report results of preliminary tests.\nIntroduced by Darwiche (2011), sentential decision diagrams (SDDs) are essentially as tractable as ordered binary decision diagrams (OBDDs), but tend to be more succinct in practice. 
This makes SDDs a prominent representation language, with many applications in artificial intelligence and knowledge compilation. We prove that SDDs are more succinct than OBDDs also in theory, by constructing a family of boolean functions where each member has polynomial SDD size but exponential OBDD size. This exponential separation improves a quasipolynomial separation recently established by Razgon (2013), and settles an open problem in knowledge compilation.\nWe train a reinforcement learner to play a simplified version of the game Angry Birds. The learner is provided with a game state in a manner similar to the output that could be produced by computer vision algorithms. We improve on the efficiency of regular {\\epsilon}-greedy Q-Learning with linear function approximation through more systematic exploration in Randomized Least Squares Value Iteration (RLSVI), an algorithm that samples its policy from a posterior distribution on optimal policies. With larger state-action spaces, efficient exploration becomes increasingly important, as evidenced by the faster learning in RLSVI.\nThe concepts of fuzzy objects and their classes are described that make it possible to structurally represent knowledge about fuzzy and partially-defined objects and their classes. Operations over such objects and classes are also proposed that make it possible to obtain sets and new classes of fuzzy objects and also to model variations in object structures under the influence of external factors.\nFor time series comparisons, it has often been observed that z-score normalized Euclidean distances far outperform the unnormalized variant. In this paper we show that a z-score normalized, squared Euclidean Distance is, in fact, equal to a distance based on Pearson Correlation. This has profound impact on many distance-based classification or clustering methods. 
In addition to this theoretically sound result we also show that the often used k-Means algorithm formally needs a modification to keep the interpretation as Pearson correlation strictly valid. Experimental results demonstrate that in many cases the standard k-Means algorithm generally produces the same results.\nIn this paper we conduct an analysis of Moodle activity data focused on identifying early predictors of good student performance. The analysis shows that three relevant hypotheses are largely supported by the data. These hypotheses are: early submission is a good sign, a high level of activity is predictive of good results and evening activity is even better than daytime activity. We highlight some pathological examples where high levels of activity correlate with bad results.\nWe introduce a new type of graphical model that we call a \"memory factor network\" (MFN). We show how to use MFNs to model the structure inherent in many types of data sets. We also introduce an associated message-passing style algorithm called \"proactive message passing\" (PMP) that performs inference on MFNs. PMP comes with convergence guarantees and is efficient in comparison to competing algorithms such as variants of belief propagation. We specialize MFNs and PMP to a number of distinct types of data (discrete, continuous, labelled) and inference problems (interpolation, hypothesis testing), provide examples, and discuss approaches for efficient implementation.\nTop-N recommender systems have been investigated widely both in industry and academia. However, the recommendation quality is far from satisfactory. In this paper, we propose a simple yet promising algorithm. We fill the user-item matrix based on a low-rank assumption and simultaneously keep the original information. To do that, a nonconvex rank relaxation rather than the nuclear norm is adopted to provide a better rank approximation and an efficient optimization strategy is designed. 
A comprehensive set of experiments on real datasets demonstrates that our method pushes the accuracy of Top-N recommendation to a new level.\nThis report seeks to inform policy makers on the nature and the merit of the arguments for and against the concerns associated with a potential technological singularity. Part I describes the lessons learned from our investigation of the subject, separating the arguments of merit from the fallacies and misconceptions that confuse the debate and undermine its rational resolution.\nWe introduce a model for the linguistic hedges `very' and `quite' within the label semantics framework, and combined with the prototype and conceptual spaces theories of concepts. The proposed model emerges naturally from the representational framework we use and as such, has a clear semantic grounding. We give generalisations of these hedge models and show that they can be composed with themselves and with other functions, going on to examine their behaviour in the limit of composition.\nWe investigate the emergence of shared concepts in a community of language users using a multi-agent simulation. We extend results showing that negated assertions are of use in developing shared categories, to include assertions modified by linguistic hedges. Results show that using hedged assertions positively affects the emergence of shared categories in two distinct ways. Firstly, using contraction hedges like `very' gives better convergence over time. Secondly, using expansion hedges such as `quite' reduces concept overlap. However, both these improvements come at a cost of slower speed of development.\nIn this paper, the idea of client verification in distributed systems is presented. The proposed solution presents a sample system where client verification through cloud resources using input signature is discussed. For different signatures the proposed method has been examined. 
Research results are presented and discussed to show potential advantages.\nMechanical learning is a computing system that is based on a set of simple and fixed rules, and can learn from incoming data. A learning machine is a system that realizes mechanical learning. Importantly, we emphasize that it is based on a set of simple and fixed rules, in contrast to what is often called machine learning, which is sophisticated software based on very complicated mathematical theory and often needs human intervention for software fine-tuning and manual adjustments. Here, we discuss some basic facts and principles of such a system, and try to lay down a framework for further study. We propose 2 directions to approach mechanical learning, just like Church-Turing pair: one is trying to realize a learning machine, another is trying to well describe the mechanical learning.\nIn this work we propose a new deep learning tool called deep dictionary learning. Multi-level dictionaries are learnt in a greedy fashion, one layer at a time. This requires solving a simple (shallow) dictionary learning problem, the solution to this is well known. We apply the proposed technique on some benchmark deep learning datasets. We compare our results with other deep learning tools like stacked autoencoder and deep belief network; and state of the art supervised dictionary learning tools like discriminative KSVD and label consistent KSVD. Our method yields better results than all.\nCommonsense knowledge representation and reasoning is key for tasks such as artificial intelligence and natural language understanding. Since commonsense consists of information that humans take for granted, gathering it is an extremely difficult task. In this paper, we introduce a novel 3D game engine for commonsense knowledge acquisition (GECKA3D) which aims to collect commonsense from game designers through the development of serious games. GECKA3D integrates the potential of serious games and games with a purpose. 
This provides a platform for the acquisition of re-usable and multi-purpose knowledge, and also enables the development of games that can provide entertainment value and teach players something meaningful about the actual world they live in.\nThis article generalizes object-oriented dynamic networks to the fuzzy case, which allows one to represent knowledge on objects and classes of objects that are fuzzy by nature and also to model their changes in time. Within the framework of the approach described, a mechanism is proposed that makes it possible to acquire new knowledge on the basis of basic knowledge and considerably differs from well-known methods used in existing models of knowledge representation. The approach is illustrated by an example of construction of a concrete fuzzy object-oriented dynamic network.\nUsually, routing models in pedestrian dynamics assume that agents have fulfilled and global knowledge about the building's structure. However, they neglect the fact that pedestrians possess no or only parts of information about their position relative to final exits and possible routes leading to them. To get a more realistic description we introduce the systematics of gathering and using spatial knowledge. A new wayfinding model for pedestrian dynamics is proposed. The model defines for every pedestrian an individual knowledge representation implying inaccuracies and uncertainties. In addition, knowledge-driven search strategies are introduced. The presented concept is tested on a fictive example scenario.\nPerforming efficient inference on Bayesian Networks (BNs), with large numbers of densely connected variables is challenging. With exact inference methods, such as the Junction Tree algorithm, clustering complexity can grow exponentially with the number of nodes and so computation becomes intractable. 
This paper presents a general purpose approximate inference algorithm called Triplet Region Construction (TRC) that reduces the clustering complexity for factorized models from worst case exponential to polynomial. We employ graph factorization to reduce connection complexity and produce clusters of limited size. Unlike MCMC algorithms TRC is guaranteed to converge and we present experiments that show that TRC achieves accurate results when compared with exact solutions.\nThis article presents new alternatives to the similarity function for the TextRank algorithm for automatic summarization of texts. We describe the generalities of the algorithm and the different functions we propose. Some of these variants achieve a significant improvement using the same metrics and dataset as the original publication.\nIn this paper we construct a learning architecture for high dimensional time series sampled by sensor arrangements. Using a redundant wavelet decomposition on a graph constructed over the sensor locations, our algorithm is able to construct discriminative features that exploit the mutual information between the sensors. The algorithm then applies scattering networks to the time series graphs to create the feature space. We demonstrate our method on a machine olfaction problem, where one needs to classify the gas type and the location where it originates from data sampled by an array of sensors. Our experimental results clearly demonstrate that our method outperforms classical machine learning techniques used in previous studies.\nWe provide a solution for elementary science test using instructional materials. We posit that there is a hidden structure that explains the correctness of an answer given the question and instructional materials and present a unified max-margin framework that learns to find these hidden structures (given a corpus of question-answer pairs and instructional materials), and uses what it learns to answer novel elementary science questions. 
Our evaluation shows that our framework outperforms several strong baselines.\nWith the growth of the Semantic Web in size and importance, more and more knowledge is stored in machine-readable formats such as the Web Ontology Language OWL. This paper outlines common approaches for efficient reasoning on large-scale data consisting of billions ($10^9$) of triples. Therefore, OWL and its sublanguages, as well as forward and backward chaining techniques are presented. The WebPIE reasoner is discussed in detail as an example for forward chaining using MapReduce for materialisation. Moreover, the QueryPIE reasoner is presented as a backward chaining/hybrid approach which uses query rewriting. Furthermore, an overview on other reasoners is given such as OWLIM and TrOWL.\nEfficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as epsilon-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this can lead to exponentially faster learning. We demonstrate these benefits in complex stochastic MDPs and in the large-scale Arcade Learning Environment. Bootstrapped DQN substantially improves learning times and performance across most Atari games.\nMost data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself. In this paper, we provide a principled approach to handling selection biases, adapting models and estimation techniques from causal inference. The approach leads to unbiased performance estimators despite biased data, and to a matrix factorization method that provides substantially improved prediction performance on real-world data. 
We theoretically and empirically characterize the robustness of the approach, finding that it is highly practical and scalable.\nWe propose a general framework for inconsistency-tolerant query answering within existential rule setting. This framework unifies the main semantics proposed by the state of the art and introduces new ones based on cardinality and majority principles. It relies on two key notions: modifiers and inference strategies. An inconsistency-tolerant semantics is seen as a composite modifier plus an inference strategy. We compare the obtained semantics from a productivity point of view.\nWe develop a general duality between neural networks and compositional kernels, striving towards a better understanding of deep learning. We show that initial representations generated by common random initializations are sufficiently rich to express all functions in the dual kernel space. Hence, though the training objective is hard to optimize in the worst case, the initial weights form a good starting point for optimization. Our dual view also reveals a pragmatic and aesthetic perspective of neural networks and underscores their expressive power.\nIn this paper, we introduce a notion of backdoors to Reiter's propositional default logic and study structural properties of it. Also we consider the problems of backdoor detection (parameterised by the solution size) as well as backdoor evaluation (parameterised by the size of the given backdoor), for various kinds of target classes (cnf, horn, krom, monotone, identity). We show that backdoor detection is fixed-parameter tractable for the considered target classes, and backdoor evaluation is either fixed-parameter tractable, in para-DP2, or in para-NP, depending on the target class.\nCausality has been recently introduced in databases, to model, characterize and possibly compute causes for query results (answers). Connections between query-answer causality, consistency-based diagnosis, database repairs (wrt. 
integrity constraint violations), abductive diagnosis and the view-update problem have been established. In this work we further investigate connections between query-answer causality and abductive diagnosis and the view-update problem. In this context, we also define and investigate the notion of query-answer causality in the presence of integrity constraints.\nMaking a computational agent 'social' has implications for how it perceives itself and the environment in which it is situated, including the ability to recognise the behaviours of others. We point to recent work on social planning, i.e. planning in settings where the social context is relevant in the assessment of the beliefs and capabilities of others, and in making appropriate choices of what to do next.\nIn this work we present SIFT, a 3-step algorithm for the analysis of the structural information represented by means of a taxonomy. The major advantage of this algorithm is the capability to leverage the information inherent to the hierarchical structures of taxonomies to infer correspondences which can allow to merge them in a later step. This method is particular relevant in scenarios where taxonomy alignment techniques exploiting textual information from taxonomy nodes cannot operate successfully.\nWe discuss a variant of Thompson sampling for nonparametric reinforcement learning in a countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption regret is sublinear.\nIn this paper, we propose the Neural Knowledge DNA, a framework that tailors the ideas underlying the success of neural networks to the scope of knowledge representation. 
Knowledge representation is a fundamental field that dedicate to representing information about the world in a form that computer systems can utilize to solve complex tasks. The proposed Neural Knowledge DNA is designed to support discovering, storing, reusing, improving, and sharing knowledge among machines and organisation. It is constructed in a similar fashion of how DNA formed: built up by four essential elements. As the DNA produces phenotypes, the Neural Knowledge DNA carries information and knowledge via its four essential elements, namely, Networks, Experiences, States, and Actions.\nWe present an approach to generate novel computer game levels that blend different game concepts in an unsupervised fashion. Our primary contribution is an analogical reasoning process to construct blends between level design models learned from gameplay videos. The models represent probabilistic relationships between elements in the game. An analogical reasoning process maps features between two models to produce blended models that can then generate new level chunks. As a proof-of-concept we train our system on the classic platformer game Super Mario Bros. due to its highly-regarded and well understood level design. We evaluate the extent to which the models represent stylistic level design knowledge and demonstrate the ability of our system to explain levels that were blended by human expert designers.\nIn this paper, we propose a set theoretic approach for knowledge representation. While the syntax of an application domain is captured by set theoretic constructs including individuals, concepts and operators, knowledge is formalized by equality assertions. We first present a primitive form that uses minimal assumed knowledge and constructs. Then, assuming naive set theory, we extend it by definitions, which are special kinds of knowledge. 
Interestingly, we show that the primitive form is expressive enough to define logic operators, not only propositional connectives but also quantifiers.\nStarting from the primary representation of neutrosophic information, namely the degree of truth, degree of indeterminacy and degree of falsity, we define a nuanced representation in a penta valued fuzzy space, described by the index of truth, index of falsity, index of ignorance, index of contradiction and index of hesitation. Also, an associated penta valued logic was constructed and then, using this logic, the following operators were defined for the proposed penta valued structure: union, intersection, negation, complement and dual. Then, the penta valued representation is extended to a hexa valued one, adding the sixth component, namely the index of ambiguity.\nProblem solving in Answer Set Programming consists of two steps, a first grounding phase, systematically replacing all variables by terms, and a second solving phase computing the stable models of the obtained ground program. An intricate part of both phases is the treatment of aggregates, which are popular language constructs that allow for expressing properties over sets. In this paper, we elaborate upon the treatment of aggregates during grounding in Gringo series 4. Consequently, our approach is applicable to grounding based on semi-naive database evaluation techniques. In particular, we provide a series of algorithms detailing the treatment of recursive aggregates and illustrate this by a running example.\nIn this paper we model the problem of learning preferences of a population as an active learning problem. We propose an algorithm that can adaptively choose pairs of items to show to users coming from a heterogeneous population, and uses the obtained reward to decide which pair of items to show next. We provide computationally efficient algorithms with provable sample complexity guarantees for this problem in both the noiseless and noisy cases. 
In the process of establishing sample complexity guarantees for our algorithms, we establish new results using a Nystr{\\\"o}m-like method which can be of independent interest. We supplement our theoretical results with experimental comparisons.\nWe apply genetic programming techniques to the `shepherding' problem, in which a group of one type of animal (sheep dogs) attempts to control the movements of a second group of animals (sheep) obeying flocking behavior. Our genetic programming algorithm evolves an expression tree that governs the movements of each dog. The operands of the tree are hand-selected features of the simulation environment that may allow the dogs to herd the sheep effectively. The algorithm uses tournament-style selection, crossover reproduction, and a point mutation. We find that the evolved solutions generalize well and outperform a (naive) human-designed algorithm.\nThis paper presents a system which creates and visualizes probabilistic semantic links between concepts in a thesaurus and classes in a classification system. For creating the links, we build on the Polylingual Labeled Topic Model (PLL-TM). PLL-TM identifies probable thesaurus descriptors for each class in the classification system by using information from the natural language text of documents, their assigned thesaurus descriptors and their designated classes. The links are then presented to users of the system in an interactive visualization, providing them with an automatically generated overview of the relations between the thesaurus and the classification system.\nFor building question answering systems and natural language interfaces, semantic parsing has emerged as an important and powerful paradigm. Semantic parsers map natural language into logical forms, the classic representation for many important linguistic phenomena. The modern twist is that we are interested in learning semantic parsers from data, which introduces a new layer of statistical and computational issues. 
This article lays out the components of a statistical semantic parser, highlighting the key challenges. We will see that semantic parsing is a rich fusion of the logical and the statistical world, and that this fusion will play an integral role in the future of natural language understanding systems.\nThis paper argues that a combined treatment of probabilities, time and actions is essential for an appropriate logical account of the notion of probability; and, based on this intuition, describes an expressive probabilistic temporal logic for reasoning about actions with uncertain outcomes. The logic is modal and higher-order: modalities annotated by actions are used to express possibility and necessity of propositions in the next states resulting from the actions, and a higher-order function is needed to express the probability operator. The proposed logic is shown to be an adequate extension of classical mathematical probability theory, and its expressiveness is illustrated through the formalization of the Monty Hall problem.\nWe investigate properties of ABA+, a formalism that extends the well studied structured argumentation formalism Assumption-Based Argumentation (ABA) with a preference handling mechanism. In particular, we establish desirable properties that ABA+ semantics exhibit. These pave way to the satisfaction by ABA+ of some (arguably) desirable principles of preference handling in argumentation and nonmonotonic reasoning, as well as non-monotonic inference properties of ABA+ under various semantics.\nMethod of moment estimators exhibit appealing statistical properties, such as asymptotic unbiasedness, for nonconvex problems. However, they typically require a large number of samples and are extremely sensitive to model misspecification. In this paper, we apply the framework of M-estimation to develop both a generalized method of moments procedure and a principled method for regularization. 
Our proposed M-estimator obtains optimal sample efficiency rates (in the class of moment-based estimators) and the same well-known rates on prediction accuracy as other spectral estimators. It also makes it straightforward to incorporate regularization into the sample moment conditions. We demonstrate empirically the gains in sample efficiency from our approach on hidden Markov models.\nHierarchical Reinforcement Learning (HRL) exploits temporal abstraction to solve large Markov Decision Processes (MDP) and provide transferable subtask policies. In this paper, we introduce an off-policy HRL algorithm: Hierarchical Q-value Iteration (HQI). We show that it is possible to effectively learn recursive optimal policies for any valid hierarchical decomposition of the original MDP, given a fixed dataset collected from a flat stochastic behavioral policy. We first formally prove the convergence of the algorithm for tabular MDP. Then our experiments on the Taxi domain show that HQI converges faster than a flat Q-value Iteration and enjoys easy state abstraction. Also, we demonstrate that our algorithm is able to learn optimal policies for different hierarchical structures from the same fixed dataset, which enables model comparison without recollecting data.\nWe are interested in belief revision involving conditional statements where the antecedent is almost certainly false. To represent such problems, we use Ordinal Conditional Functions that may take infinite values. We model belief change in this context through simple arithmetical operations that allow us to capture the intuition that certain antecedents can not be validated by any number of observations. We frame our approach as a form of finite belief improvement, and we propose a model of conditional belief revision in which only the \"right\" hypothetical levels of implausibility are revised.\nWe describe a representation in a high-level transition system for policies that express a reactive behavior for the agent. 
We consider a target decision component that figures out what to do next and an (online) planning capability to compute the plans needed to reach these targets. Our representation allows one to analyze the flow of executing the given reactive policy, and to determine whether it works as expected. Additionally, the flexibility of the representation opens a range of possibilities for designing behaviors.\nWe investigate an efficient context-dependent clustering technique for recommender systems based on exploration-exploitation strategies through multi-armed bandits over multiple users. Our algorithm dynamically groups users based on their observed behavioral similarity during a sequence of logged activities. In doing so, the algorithm reacts to the currently served user by shaping clusters around him/her but, at the same time, it explores the generation of clusters over users which are not currently engaged. We motivate the effectiveness of this clustering policy, and provide an extensive empirical analysis on real-world datasets, showing scalability and improved prediction performance over state-of-the-art methods for sequential clustering of users in multi-armed bandit scenarios.\nWe present a novel online ensemble learning strategy for portfolio selection. The new strategy controls and exploits any set of commission-oblivious portfolio selection algorithms. The strategy handles transaction costs using a novel commission avoidance mechanism. We prove a logarithmic regret bound for our strategy with respect to optimal mixtures of the base algorithms. Numerical examples validate the viability of our method and show significant improvement over the state-of-the-art.\nIn these notes we propose a setting for fuzzy computing in a framework similar to that of well-established theories of computation: boolean, and quantum computing. Our efforts have been directed towards stressing the formal similarities: there is a common pattern underlying these three theories. 
We tried to conform our approach, as much as possible, to this pattern. This work was part of a project jointly with Professor Vittorio Cafagna. Professor Cafagna passed away unexpectedly in 2007. His intellectual breadth and inspiring passion for mathematics is still very well alive.\nWe introduce an LSTM-based method for dynamically integrating several word-prediction experts to obtain a conditional language model which can be good simultaneously at several subtasks. We illustrate this general approach with an application to dialogue where we integrate a neural chat model, good at conversational aspects, with a neural question-answering model, good at retrieving precise information from a knowledge-base, and show how the integration combines the strengths of the independent components. We hope that this focused contribution will attract attention on the benefits of using such mixtures of experts in NLP.\nThe inverse problem of general rough sets, considered by the present author in some of her earlier papers, in one of its manifestations is essentially the question of when an agent's view about crisp and non crisp objects over a set of objects has a rough evolution. In this research the nature of the problem is examined from number-theoretic and combinatorial perspectives under very few assumptions about the nature of data and some necessary conditions are proved.\nThe Dialog State Tracking Challenge 4 (DSTC 4) proposes several pilot tasks. In this paper, we focus on the spoken language understanding pilot task, which consists of tagging a given utterance with speech acts and semantic slots. 
We compare different classifiers: the best system obtains 0.52 and 0.67 F1-scores on the test set for speech act recognition for the tourist and the guide respectively, and 0.52 F1-score for semantic tagging for both the guide and the tourist.\nWe present in this article the model Function-described graph (FDG), which is a type of compact representation of a set of attributed graphs (AGs) that borrow from Random Graphs the capability of probabilistic modelling of structural and attribute information. We define the FDGs, their features and two distance measures between AGs (unclassified patterns) and FDGs (models or classes) and we also explain an efficient matching algorithm. Two applications of FDGs are presented: in the former, FDGs are used for modelling and matching 3D-objects described by multiple views, whereas in the latter, they are used for representing and recognising human faces, described also by several views.\nWe present a method for learning treewidth-bounded Bayesian networks from data sets containing thousands of variables. Bounding the treewidth of a Bayesian greatly reduces the complexity of inferences. Yet, being a global property of the graph, it considerably increases the difficulty of the learning process. We propose a novel algorithm for this task, able to scale to large domains and large treewidths. Our novel approach consistently outperforms the state of the art on data sets with up to ten thousand variables.\nAttention endows animals an ability to concentrate on the most relevant information among a deluge of distractors at any given time, either through volitionally 'top-down' biasing, or driven by automatically 'bottom-up' saliency of stimuli, in favour of advantageous competition in neural modulations for information processing. 
Nevertheless, instead of being limited to perceive simple features, human and other advanced animals adaptively learn the world into categories and abstract concepts from experiences, imparting the world meanings. This thesis suggests that the high-level cognitive ability of human is more likely driven by attention basing on abstract perceptions, which is defined as concept based attention (CbA).\nObservational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. We consider the task of answering counterfactual questions such as, \"Would this patient have lower blood sugar had she received a different medication?\". We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning. In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state-of-the-art.\nWe present a system based on sequential decision making for the online summarization of massive document streams, such as those found on the web. Given an event of interest (e.g. \"Boston marathon bombing\"), our system is able to filter the stream for relevance and produce a series of short text updates describing the event as it unfolds over time. Unlike previous work, our approach is able to jointly model the relevance, comprehensiveness, novelty, and timeliness required by time-sensitive queries. We demonstrate a 28.3% improvement in summary F1 and a 43.8% improvement in time-sensitive F1 metrics.\nThis paper is a reflexion on the computability of natural language semantics. 
It does not contain a new model or new results in the formal semantics of natural language: it is rather a computational analysis of the logical models and algorithms currently used in natural language semantics, defined as the mapping of a statement to logical formulas - formulas, because a statement can be ambiguous. We argue that as long as possible world semantics is left out, one can compute the semantic representation(s) of a given statement, including aspects of lexical meaning. We also discuss the algorithmic complexity of this process.\nIn recent years there has been much interest in the Monte Carlo tree search algorithm, a new, adaptive, randomized optimization algorithm. In fields as diverse as Artificial Intelligence, Operations Research, and High Energy Physics, research has established that Monte Carlo tree search can find good solutions without domain dependent heuristics. However, practice shows that reaching high performance on large parallel machines is not so successful as expected. This paper proposes a new method for parallel Monte Carlo tree search based on the pipeline computation pattern.\nWe study a framework where agents have to avoid aversive signals. The agents are given only partial information, in the form of features that are projections of task states. Additionally, the agents have to cope with non-determinism, defined as unpredictability on the way that actions are executed. The goal of each agent is to define its behavior based on feature-action pairs that reliably avoid aversive signals. We study a learning algorithm, called A-learning, that exhibits fixpoint convergence, where the belief of the allowed feature-action pairs eventually becomes fixed. A-learning is parameter-free and easy to implement.\nIn this paper, we study three connection games among the most widely played: Havannah, Twixt, and Slither. We show that determining the outcome of an arbitrary input position is PSPACE-complete in all three cases. 
Our reductions are based on the popular graph problem Generalized Geography and on Hex itself. We also consider the complexity of generalizations of Hex parameterized by the length of the solution and establish that while Short Generalized Hex is W[1]-hard, Short Hex is FPT. Finally, we prove that the ultra-weak solution to the empty starting position in hex cannot be fully adapted to any of these three games.\nGame tree search algorithms, such as Monte Carlo Tree Search (MCTS), require access to a forward model (or \"simulator\") of the game at hand. However, in some games such forward model is not readily available. This paper presents three forward models for two-player attrition games, which we call \"combat models\", and show how they can be used to simulate combat in RTS games. We also show how these combat models can be learned from replay data. We use StarCraft as our application domain. We report experiments comparing our combat models predicting a combat output and their impact when used for tactical decisions during a real game.\nWe consider the task of predicting lexical entailment using distributional vectors. We perform a novel qualitative analysis of one existing model which was previously shown to only measure the prototypicality of word pairs. We find that the model strongly learns to identify hypernyms using Hearst patterns, which are well known to be predictive of lexical relations. We present a novel model which exploits this behavior as a method of feature extraction in an iterative procedure similar to Principal Component Analysis. Our model combines the extracted features with the strengths of other proposed models in the literature, and matches or outperforms prior work on multiple data sets.\nIn this thesis we present a new algorithm for the Vehicle Routing Problem called the Enhanced Bees Algorithm. It is adapted from a fairly recent algorithm, the Bees Algorithm, which was developed for continuous optimisation problems. 
We show that the results obtained by the Enhanced Bees Algorithm are competitive with the best meta-heuristics available for the Vehicle Routing Problem (within 0.5% of the optimal solution for common benchmark problems). We show that the algorithm has good runtime performance, producing results within 2% of the optimal solution within 60 seconds, making it suitable for use within real world dispatch scenarios.\nThis paper proposes a new general approach based on Bayesian networks to model the human behaviour. This approach represents human behaviour with probabilistic cause-effect relations based on knowledge, but also with conditional probabilities coming either from knowledge or deduced from observations. This approach has been applied to the co-simulation of the CO2 concentration in an office coupled with human behaviour.\nSimiles are natural language expressions used to compare unlikely things, where the comparison is not taken literally. They are often used in everyday communication and are an important part of cultural heritage. Having an up-to-date corpus of similes is challenging, as they are constantly coined and/or adapted to the contemporary times. In this paper we present a methodology for semi-automated collection of similes from the world wide web using text mining techniques. We expanded an existing corpus of traditional similes (containing 333 similes) by collecting 446 additional expressions. We, also, explore how crowdsourcing can be used to extract and curate new similes.\nCertain constructs allowed in Mizar articles cannot be represented in first-order logic but can be represented in higher-order logic. We describe a way to obtain higher-order theorem proving problems from Mizar articles that make use of these constructs. In particular, higher-order logic is used to represent schemes, a global choice construct and set level binders. 
The higher-order automated theorem provers Satallax and LEO-II have been run on collections of these problems and the results are discussed.\nProbabilistic modeling is cyclical: we specify a model, infer its posterior, and evaluate its performance. Evaluation drives the cycle, as we revise our model based on how it performs. This requires a metric. Traditionally, predictive accuracy prevails. Yet, predictive accuracy does not tell the whole story. We propose to evaluate a model through posterior dispersion. The idea is to analyze how each datapoint fares in relation to posterior uncertainty around the hidden structure. We propose a family of posterior dispersion indices (PDI) that capture this idea. A PDI identifies rich patterns of model mismatch in three real data examples: voting preferences, supermarket shopping, and population genetics.\nIn this paper, we develop a computationally simpler version of the operator count heuristic for a particular class of domains. The contribution of this abstract is threefold, we (1) propose an efficient closed form approximation to the operator count heuristic using the Lagrangian dual; (2) leverage compressed sensing techniques to obtain an integer approximation for operator counts in polynomial time; and (3) discuss the relationship of the proposed formulation to existing heuristics and investigate properties of domains where such approaches appear to be useful.\nWe propose a new internal guidance method for automated theorem provers based on the given-clause algorithm. Our method influences the choice of unprocessed clauses using positive and negative examples from previous proofs. To this end, we present an efficient scheme for Naive Bayesian classification by generalising label occurrences to types with monoid structure. This makes it possible to extend existing fast classifiers, which consider only positive examples, with negative ones. 
We implement the method in the higher-order logic prover Satallax, where we modify the delay with which propositions are processed. We evaluated our method on a simply-typed higher-order logic version of the Flyspeck project, where it solves 26% more problems than Satallax without internal guidance.\nIn this paper, we present a Virtual-Suspect system which can be used to train inexperienced law enforcement personnel in interrogation strategies. The system supports different scenario configurations based on historical data. The responses presented by the Virtual-Suspect are selected based on the psychological state of the suspect, which can be configured as well. Furthermore, each interrogator's statement affects the Virtual-Suspect's current psychological state, which may lead the interrogation in different directions. In addition, the model takes into account the context in which the statements are made. Experiments with 24 subjects demonstrate that the Virtual-Suspect's behavior is similar to that of a human who plays the role of the suspect.\nAdaptive learning rate algorithms such as RMSProp are widely used for training deep neural networks. RMSProp offers efficient training since it uses first order gradients to approximate Hessian-based preconditioning. However, since the first order gradients include noise caused by stochastic optimization, the approximation may be inaccurate. In this paper, we propose a novel adaptive learning rate algorithm called SDProp. Its key idea is effective handling of the noise by preconditioning based on covariance matrix. For various neural networks, our approach is more efficient and effective than RMSProp and its variant.\nReinforcement learning for embodied agents is a challenging problem. The accumulated reward to be optimized is often a very rugged function, and gradient methods are impaired by many local optimizers. 
We demonstrate, in an experimental setting, that incorporating an intrinsic reward can smoothen the optimization landscape while preserving the global optimizers of interest. We show that policy gradient optimization for locomotion in a complex morphology is significantly improved when supplementing the extrinsic reward by an intrinsic reward defined in terms of the mutual information of time consecutive sensor readings.\nWe present a general formal argumentation system for dealing with the detachment of conditional obligations. Given a set of facts, constraints, and conditional obligations, we answer the question whether an unconditional obligation is detachable by considering reasons for and against its detachment. For the evaluation of arguments in favor of detaching obligations we use a Dung-style argumentation-theoretical semantics. We illustrate the modularity of the general framework by considering some extensions, and we compare the framework to some related approaches from the literature.\nThis paper proposes a fuzzy goal programming based on Taylor series for solving decentralized bi-level multiobjective fractional programming (DBLMOFP) problem. In the proposed approach, all of the membership functions are associated with the fuzzy goals of each objective at the both levels and also the fractional membership functions are converted to linear functions using the Taylor series approach. Then a fuzzy goal programming is proposed to reach the highest degree of each of the membership goals by taking the most satisfactory solution for all decision makers at the both levels. Finally, a numerical example is presented to illustrate the effectiveness of the proposed approach.\nAlgorithm design is a laborious process and often requires many iterations of ideation and validation. 
In this paper, we explore automating algorithm design and present a method to learn an optimization algorithm, which we believe to be the first method that can automatically discover a better algorithm. We approach this problem from a reinforcement learning perspective and represent any particular optimization algorithm as a policy. We learn an optimization algorithm using guided policy search and demonstrate that the resulting algorithm outperforms existing hand-engineered algorithms in terms of convergence speed and/or the final objective value.\nWe introduce a novel playlist generation algorithm that focuses on the quality of transitions using a recurrent neural network (RNN). The proposed model assumes that optimal transitions between tracks can be modelled and predicted by internal transitions within music tracks. We introduce modelling sequences of high-level music descriptors using RNNs and discuss an experiment involving different similarity functions, where the sequences are provided by a musical structural analysis algorithm. Qualitative observations show that the proposed approach can effectively model transitions of music tracks in playlists.\nWe derive a relationship between network representation in energy-efficient neuromorphic architectures and block Toplitz convolutional matrices. Inspired by this connection, we develop deep convolutional networks using a family of structured convolutional matrices and achieve state-of-the-art trade-off between energy efficiency and classification accuracy for well-known image recognition tasks. We also put forward a novel method to train binary convolutional networks by utilising an existing connection between noisy-rectified linear units and binary activations.\nIn this document, we introduce a new dataset designed for training machine learning models of symbolic music data. Five datasets are provided, one of which is from a newly collected corpus of 20K midi files. 
We describe our preprocessing and cleaning pipeline, which includes the exclusion of a number of files based on scores from a previously developed probabilistic machine learning model. We also define training, testing and validation splits for the new dataset, based on a clustering scheme which we also describe. Some simple histograms are included.\nThis paper describes a new spoken dialog portal that connects systems produced by the spoken dialog academic research community and gives them access to real users. We introduce a distributed, multi-modal, multi-agent prototype dialog framework that affords easy integration with various remote resources, ranging from end-to-end dialog systems to external knowledge APIs. To date, the DialPort portal has successfully connected to the multi-domain spoken dialog system at Cambridge University, the NOAA (National Oceanic and Atmospheric Administration) weather API and the Yelp API.\nThis report describes our participation in the cDiscount 2015 challenge where the goal was to classify product items in a predefined taxonomy of products. Our best submission yielded an accuracy score of 64.20\\% in the private part of the leaderboard and we were ranked 10th out of 175 participating teams. We followed a text classification approach employing mainly linear models. The final solution was a weighted voting system which combined a variety of trained models.\nConsider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. 
We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.\nThis work introduces a probabilistic-based model for binary CSP that provides a fine grained analysis of its internal structure. Assuming that a domain modification could occur in the CSP, it shows how to express, in a predictive way, the probability that a domain value becomes inconsistent, then it express the expectation of the number of arc-inconsistent values in each domain of the constraint network. Thus, it express the expectation of the number of arc-inconsistent values for the whole constraint network. Next, it provides bounds for each of these three probabilistic indicators. Finally, a polytime algorithm, which propagates the probabilistic information, is presented.\nWe present a first procedure that can estimate -- with statistical consistency guarantees -- any local-maxima of a density, under benign distributional conditions. The procedure estimates all such local maxima, or $\\textit{modal-sets}$, of any bounded shape or dimension, including usual point-modes. In practice, modal-sets can arise as dense low-dimensional structures in noisy data, and more generally serve to better model the rich variety of locally-high-density structures in data.   The procedure is then shown to be competitive on clustering applications, and moreover is quite stable to a wide range of settings of its tuning parameter.\nWe study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. 
We propose a two stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale.\nDeep reinforcement learning has been shown to be a powerful framework for learning policies from complex high-dimensional sensory inputs to actions in complex tasks, such as the Atari domain. In this paper, we explore output representation modeling in the form of temporal abstraction to improve convergence and reliability of deep reinforcement learning approaches. We concentrate on macro-actions, and evaluate these on different Atari 2600 games, where we show that they yield significant improvements in learning speed. Additionally, we show that they can even achieve better scores than DQN. We offer analysis and explanation for both convergence and final results, revealing a problem deep RL approaches have with sparse reward signals.\nWe introduce the Mondrian kernel, a fast random feature approximation to the Laplace kernel. It is suitable for both batch and online learning, and admits a fast kernel-width-selection procedure as the random features can be re-used efficiently for all kernel widths. The features are constructed by sampling trees via a Mondrian process [Roy and Teh, 2009], and we highlight the connection to Mondrian forests [Lakshminarayanan et al., 2014], where trees are also sampled via a Mondrian process, but fit independently. This link provides a new insight into the relationship between kernel methods and random forests.\nThis volume of EPTCS contains the proceedings of the First Workshop on Hammers for Type Theories (HaTT 2016), held on 1 July 2016 as part of the International Joint Conference on Automated Reasoning (IJCAR 2016) in Coimbra, Portugal. 
The proceedings contain four regular papers, as well as abstracts of the two invited talks by Pierre Corbineau (Verimag, France) and Aleksy Schubert (University of Warsaw, Poland).\nHumans are generally good at learning abstract concepts about objects and scenes (e.g.\\ spatial orientation, relative sizes, etc.). Over the last years convolutional neural networks have achieved almost human performance in recognizing concrete classes (i.e.\\ specific object categories). This paper tests the performance of a current CNN (GoogLeNet) on the task of differentiating between abstract classes which are trivially differentiable for humans. We trained and tested the CNN on the two abstract classes of horizontal and vertical orientation and determined how well the network is able to transfer the learned classes to other, previously unseen objects.\nThis paper describes a simple new semantics for logic rules, founded semantics, and its straightforward extension to another simple new semantics, constraint semantics. The new semantics support unrestricted negation, as well as unrestricted existential and universal quantifications. They are uniquely expressive and intuitive by allowing assumptions about the predicates and rules to be specified explicitly. They are completely declarative and easy to understand and relate cleanly to prior semantics. In addition, founded semantics can be computed in linear time in the size of the ground program.\nReinforcement learning has been applied to many interesting problems such as the famous TD-gammon and the inverted helicopter flight. However, little effort has been put into developing methods to learn policies for complex persistent tasks and tasks that are time-sensitive. In this paper, we take a step towards solving this problem by using signal temporal logic (STL) as task specification, and taking advantage of the temporal abstraction feature that the options framework provide. 
We show via simulation that a relatively easy to implement algorithm that combines STL and options can learn a satisfactory policy with a small number of training cases\nWe present a simple approach for automatically extracting the number of subjects involved in randomised controlled trials (RCT). Our approach first applies a set of rule-based techniques to extract candidate study sizes from the abstracts of the articles. Supervised classification is then performed over the candidates with support vector machines, using a small set of lexical, structural, and contextual features. With only a small annotated training set of 201 RCTs, we obtained an accuracy of 88\\%. We believe that this system will aid complex medical text processing tasks such as summarisation and question answering.\nLevels are a key component of many different video games, and a large body of work has been produced on how to procedurally generate game levels. Recently, Machine Learning techniques have been applied to video game level generation towards the purpose of automatically generating levels that have the properties of the training corpus. Towards that end we have made available a corpora of video game levels in an easy to parse format ideal for different machine learning and other game AI research purposes.\nGossip protocols aim at arriving, by means of point-to-point or group communications, at a situation in which all the agents know each other's secrets. We consider distributed gossip protocols which are expressed by means of epistemic logic. We provide an operational semantics of such protocols and set up an appropriate framework to argue about their correctness. Then we analyze specific protocols for complete graphs and for directed rings.\nIn this paper, we introduce a lightweight dynamic epistemic logical framework for automated planning under initial uncertainty. We reduce plan verification and conformant planning to model checking problems of our logic. 
We show that the model checking problem of the iteration-free fragment is PSPACE-complete. By using two non-standard (but equivalent) semantics, we give novel model checking algorithms to the full language and the iteration-free language.\nOne of the most basic functions of language is to refer to objects in a shared scene. Modeling reference with continuous representations is challenging because it requires individuation, i.e., tracking and distinguishing an arbitrary number of referents. We introduce a neural network model that, given a definite description and a set of objects represented by natural images, points to the intended object if the expression has a unique referent, or indicates a failure, if it does not. The model, directly trained on reference acts, is competitive with a pipeline manually engineered to perform the same task, both when referents are purely visual, and when they are characterized by a combination of visual and linguistic properties.\nOptimization of product performance repetitively introduces the need to make products adaptive in a more general sense. This more general idea is often captured under the term 'self-configuration'. Despite the importance of such capability, research work on this feature appears isolated by technical domains. It is not easy to tell quickly whether the approaches chosen in different technological domains introduce new ideas or whether the differences just reflect domain idiosyncrasies. For the sake of easy identification of key differences between systems with self-configuring capabilities, I will explore higher level concepts for understanding self-configuration, such as the {\\Omega}-units, in order to provide theoretical instruments for connecting different areas of technology and research.\nWe present a transition-based parser that jointly produces syntactic and semantic dependencies. It learns a representation of the entire algorithm state, using stack long short-term memories. 
Our greedy inference algorithm has linear time, including feature extraction. On the CoNLL 2008--9 English shared tasks, we obtain the best published parsing performance among models that jointly learn syntax and semantics.\nA probabilistic program defines a probability measure over its semantic structures. One common goal of probabilistic programming languages (PPLs) is to compute posterior probabilities for arbitrary models and queries, given observed evidence, using a generic inference engine. Most PPL inference engines---even the compiled ones---incur significant runtime interpretation overhead, especially for contingent and open-universe models. This paper describes Swift, a compiler for the BLOG PPL. Swift-generated code incorporates optimizations that eliminate interpretation overhead, maintain dynamic dependencies efficiently, and handle memory management for possible worlds of varying sizes. Experiments comparing Swift with other PPL engines on a variety of inference problems demonstrate speedups ranging from 12x to 326x.\nCrosslingual word embeddings represent lexical items from different languages in the same vector space, enabling transfer of NLP tools. However, previous attempts had expensive resource requirements, difficulty incorporating monolingual data or were unable to handle polysemy. We address these drawbacks in our method which takes advantage of a high coverage dictionary in an EM style training algorithm over monolingual corpora in two languages. Our model achieves state-of-the-art performance on bilingual lexicon induction task exceeding models using large bilingual corpora, and competitive results on the monolingual word similarity and cross-lingual document classification task.\nA central question for knowledge representation is how to encode and handle uncertain knowledge adequately. 
We introduce the probabilistic description logic ALCP that is designed for representing context-dependent knowledge, where the actual context taking place is uncertain. ALCP allows the expression of logical dependencies on the domain and probabilistic dependencies on the possible contexts. In order to draw probabilistic conclusions, we employ the principle of maximum entropy. We provide reasoning algorithms for this logic, and show that it satisfies several desirable properties of probabilistic logics.\nComputational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms algorithms driven by optimism, such as UCRL2. We provide insight into the extent of this performance boost and the phenomenon that drives it. We leverage this insight to establish an $\\tilde{O}(H\\sqrt{SAT})$ Bayesian expected regret bound for PSRL in finite-horizon episodic Markov decision processes, where $H$ is the horizon, $S$ is the number of states, $A$ is the number of actions and $T$ is the time elapsed. This improves upon the best previous bound of $\\tilde{O}(H S \\sqrt{AT})$ for any reinforcement learning algorithm.\nThe last decade has seen huge progress in the development of advanced machine learning models; however, those models are powerless unless human users can interpret them. Here we show how the mind's construction of concepts and meaning can be used to create more interpretable machine learning models. By proposing a novel method of classifying concepts, in terms of 'form' and 'function', we elucidate the nature of meaning and offer proposals to improve model understandability. As machine learning begins to permeate daily life, interpretable models may serve as a bridge between domain-expert authors and non-expert users.\nMany quantities of interest in communications, signal processing, artificial intelligence, and other areas can be expressed as the partition sum of some factor graph. 
Although the exact calculation of the partition sum is in many cases intractable, it can often be approximated rather well by the Bethe partition sum. In earlier work, we have shown that graph covers are a useful tool for expressing and analyzing the Bethe approximation. In this paper, we present a novel technique for analyzing double covers, a technique which ultimately leads to a deeper understanding of the Bethe approximation.\nThe Tree Augmented Naive Bayes classifier is a type of probabilistic graphical model that can represent some feature dependencies. In this work, we propose a Hierarchical Redundancy Eliminated Tree Augmented Naive Bayes (HRE-TAN) algorithm, which considers removing the hierarchical redundancy during the classifier learning process, when coping with data containing hierarchically structured features. The experiments showed that HRE-TAN obtains significantly better predictive performance than the conventional Tree Augmented Naive Bayes classifier, and enhanced the robustness against imbalanced class distributions, in aging-related gene datasets with Gene Ontology terms used as features.\nMuch of the world's data is streaming, time-series data, where anomalies give significant information in critical situations. Yet detecting anomalies in streaming data is a difficult task, requiring detectors to process data in real-time, and learn while simultaneously making predictions. We present a novel anomaly detection technique based on an on-line sequence memory algorithm called Hierarchical Temporal Memory (HTM). We show results from a live application that detects anomalies in financial metrics in real-time. We also test the algorithm on NAB, a published benchmark for real-time anomaly detection, where our algorithm achieves best-in-class results.\nWe study classification problems where features are corrupted by noise and where the magnitude of the noise in each feature is influenced by the resources allocated to its acquisition. 
This is the case, for example, when multiple sensors share a common resource (power, bandwidth, attention, etc.). We develop a method for computing the optimal resource allocation for a variety of scenarios and derive theoretical bounds concerning the benefit that may arise by non-uniform allocation. We further demonstrate the effectiveness of the developed method in simulations.\nWord embeddings have been shown to be useful across state-of-the-art systems in many natural language processing tasks, ranging from question answering systems to dependency parsing. Herbelot and Vecchi (2015) explored word embeddings and their utility for modeling language semantics. In particular, they presented an approach to automatically map a standard distributional semantic space onto a set-theoretic model using partial least squares regression. We show in this paper that a simple baseline achieves a +51% relative improvement compared to their model on one of the two datasets they used, and yields competitive results on the second dataset.\nWe present a novel technique for automatic program correction in MOOCs, capable of fixing both syntactic and semantic errors without manual, problem-specific correction strategies. Given an incorrect student program, it generates candidate programs from a distribution of likely corrections, and checks each candidate for correctness against a test suite. The key observation is that in MOOCs many programs share similar code fragments, and the seq2seq neural network model, used in the natural-language processing task of machine translation, can be modified and trained to recover these fragments. Experiments show that our scheme can correct 29% of all incorrect submissions and outperforms the state-of-the-art approach, which requires manual, problem-specific correction strategies.\nThere has been an ever-increasing interest in multidisciplinary research on representing and reasoning with imperfect data. 
Possibilistic networks present one of the powerful frameworks of interest for representing uncertain and imprecise information. This paper covers the problem of learning their parameters from imprecise datasets, i.e., datasets containing multi-valued data. We propose in the first part of this paper a possibilistic networks sampling process. In the second part, we propose a likelihood function which explores the link between random sets theory and possibility theory. This function is then deployed to parametrize possibilistic networks.\nWe motivate and offer a formal definition of validation as it applies to information fusion systems. Common definitions of validation compare the actual state of the world with that derived by the fusion process. This definition conflates properties of the fusion system with properties of systems that intervene between the world and the fusion system. We propose an alternative definition where validation of an information fusion system references a standard fusion device, such as recognized human experts. We illustrate the approach by describing the validation process implemented in RAID, a program conducted by DARPA and focused on information fusion in adversarial, deceptive environments.\nWe propose Meta-Prod2vec, a novel method to compute item similarities for recommendation that leverages existing item metadata. Such scenarios are frequently encountered in applications such as content recommendation, ad targeting and web search. Our method leverages past user interactions with items and their attributes to compute low-dimensional embeddings of items. Specifically, the item metadata is injected into the model as side information to regularize the item embeddings. We show that the new item representations lead to better performance on recommendation tasks on an open music dataset.\nThis paper presents a computational model of the processing of dynamic spatial relations occurring in an embodied robotic interaction setup. 
A complete system is introduced that allows autonomous robots to produce and interpret dynamic spatial phrases (in English) given an environment of moving objects. The model unites two separate research strands: computational cognitive semantics and commonsense spatial representation and reasoning. The model for the first time demonstrates an integration of these different strands.\nIn this paper we present the next step in our approach to neurobiologically plausible implementation of emotional reactions and behaviors for real-time autonomous robotic systems. The working metaphor we use is the \"day\" and the \"night\" phases of mammalian life. During the \"day phase\" a robotic system stores the inbound information and is controlled by a light-weight rule-based system in real time. In contrast to that, during the \"night phase\" information that has been stored is transferred to a supercomputing system to update the realistic neural network: emotional and behavioral strategies.\nIn this paper, we present a study on personalized emphasis framing which can be used to tailor the content of a message to enhance its appeal to different individuals. With this framework, we directly model content selection decisions based on a set of psychologically-motivated domain-independent personal traits including personality (e.g., extraversion and conscientiousness) and basic human values (e.g., self-transcendence and hedonism). We also demonstrate how the analysis results can be used in automated personalized content selection for persuasive message generation.\nThe IDP knowledge base system currently uses MiniSAT(ID) as its backend Constraint Programming (CP) solver. A few similar systems have used a Mixed Integer Programming (MIP) solver as backend. However, so far little is known about when the MIP solver is preferable. This paper explores this question. 
It describes the use of CPLEX as a backend for IDP and reports on experiments comparing both backends.\nHave you ever looked at a machine learning classification model and thought, I could have made that? Well, that is what we test in this project, comparing XGBoost trained on human-engineered features to training directly on data. The human-engineered features do not outperform XGBoost trained directly on the data, but they are comparable. This project contributes a novel method for utilizing human-created classification models on high-dimensional datasets.\nA mesoscopic approach to modeling pedestrian simulation with multiple exits is proposed in this paper. A floor field based on the Q-learning algorithm is used. The attractiveness of exits to pedestrians is typically based on the shortest path. However, several factors may influence pedestrians' choice of exits. Scenarios with multiple exits are presented and the effect of the Q-learning reward system on navigation is investigated.\nIn this paper we present an extension of Peirce's existential graphs to provide a diagrammatic representation of expressions in Quantified Equilibrium Logic (QEL). Using this formalisation, logical connectives are replaced by encircled regions (circles and squares) and quantified variables are represented as \"identity\" lines. Although the expressive power is equivalent to that of QEL, the new representation can be useful for illustrative or educational purposes.\nManual correction of speech transcription can involve a selection from plausible transcriptions. Recent work has shown the feasibility of employing a mismatched crowd for speech transcription. However, it is yet to be established whether a mismatched worker has sufficiently fine-granular speech perception to choose among the phonetically proximate options that are likely to be generated from the trellis of an ASRU. Hence, we consider five languages, Arabic, German, Hindi, Russian and Spanish. 
For each we generate synthetic, phonetically proximate, options which emulate post-editing scenarios of varying difficulty. We consistently observe non-trivial crowd ability to choose among fine-granular options.\nThis paper presents a simple end-to-end model for speech recognition, combining a convolutional network based acoustic model and a graph decoding. It is trained to output letters, with transcribed speech, without the need for force alignment of phonemes. We introduce an automatic segmentation criterion for training from sequence annotation without alignment that is on par with CTC while being simpler. We show competitive results in word error rate on the Librispeech corpus with MFCC features, and promising results from raw waveform.\nCounterfactual Regret Minimization (CFR) is the most popular iterative algorithm for solving zero-sum imperfect-information games. Regret-Based Pruning (RBP) is an improvement that allows poorly-performing actions to be temporarily pruned, thus speeding up CFR. We introduce Total RBP, a new form of RBP that reduces the space requirements of CFR as actions are pruned. We prove that in zero-sum games it asymptotically prunes any action that is not part of a best response to some Nash equilibrium. This leads to provably faster convergence and lower space requirements. Experiments show that Total RBP results in an order of magnitude reduction in space, and the reduction factor increases with game size.\nWe analyze the structure of the state space of chess by means of transition path sampling Monte Carlo simulation. Based on the typical number of moves required to transpose a given configuration of chess pieces into another, we conclude that the state space consists of several pockets between which transitions are rare. Skilled players explore an even smaller subset of positions that populate some of these pockets only very sparsely. 
These results suggest that the usual measures to estimate both, the size of the state space and the size of the tree of legal moves, are not unique indicators of the complexity of the game, but that topological considerations are equally important.\nWe introduce exploration potential, a quantity that measures how much a reinforcement learning agent has explored its environment class. In contrast to information gain, exploration potential takes the problem's reward structure into account. This leads to an exploration criterion that is both necessary and sufficient for asymptotic optimality (learning to act optimally across the entire environment class). Our experiments in multi-armed bandits use exploration potential to illustrate how different algorithms make the tradeoff between exploration and exploitation.\nWe express Brewka's prioritised default logic (PDL) as argumentation using ASPIC+. By representing PDL as argumentation and designing an argument preference relation that takes the argument structure into account, we prove that the conclusions of the justified arguments correspond to the PDL extensions. We will first assume that the default priority is total, and then generalise to the case where it is a partial order. This provides a characterisation of non-monotonic inference in PDL as an exchange of argument and counter-argument, providing a basis for distributed non-monotonic reasoning in the form of dialogue.\nIn this paper we introduce the Wastewater Treatment Plant Problem, a real-world scheduling problem, and compare the performance of several tools on it. We show that, for a naive modeling, state-of-the-art SMT solvers outperform other tools ranging from mathematical programming to constraint programming. We use both real and randomly generated benchmarks.   
From this and similar results, we claim for the convenience of developing compiler front-ends being able to translate from constraint programming languages to the SMT-LIB standard language.\nIn many machine learning applications, labeled data is scarce and obtaining more labels is expensive. We introduce a new approach to supervising neural networks by specifying constraints that should hold over the output space, rather than direct examples of input-output pairs. These constraints are derived from prior domain knowledge, e.g., from known laws of physics. We demonstrate the effectiveness of this approach on real world and simulated computer vision tasks. We are able to train a convolutional neural network to detect and track objects without any labeled examples. Our approach can significantly reduce the need for labeled training data, but introduces new challenges for encoding prior knowledge into appropriate loss functions.\nBiosurveillance, a relatively young field, has recently increased in importance because of its relevance to national security and global health. Databases and tools describing particular subsets of disease are becoming increasingly common in the field. However, a common method to describe those diseases is lacking. Here, we present the Anthology of Biosurveillance Diseases (ABD), an ontology of infectious diseases of biosurveillance relevance.\nWhen we say \"I know why he was late\", we know not only the fact that he was late, but also an explanation of this fact. We propose a logical framework of \"knowing why\" inspired by the existing formal studies on why-questions, scientific explanation, and justification logic. We introduce the Ky_i operator into the language of epistemic logic to express \"agent i knows why phi\" and propose a Kripke-style semantics of such expressions in terms of knowing an explanation of phi. We obtain two sound and complete axiomatizations w.r.t. 
two different model classes depending on different assumptions about introspection.\nWe investigate the 'Digital Synaptic Neural Substrate' (DSNS) computational creativity approach further with respect to the size and quality of images that can be used to seed the process. In previous work we demonstrated how combining photographs of people and sequences taken from chess games between weak players can be used to generate chess problems or puzzles of higher aesthetic quality, on average, compared to alternative approaches. In this work we show experimentally that using larger images as opposed to smaller ones improves the output quality even further. The same is also true for using clearer or less corrupted images. The reasons why these things influence the DSNS process are presently not well understood and remain debatable, but the findings are nevertheless immediately applicable for obtaining better results.\nTo solve hard problems, AI relies on a variety of disciplines such as logic, probabilistic reasoning, machine learning and mathematical programming. Although it is widely accepted that solving real-world problems requires an integration amongst these, contemporary representation methodologies offer little support for this. In an attempt to alleviate this situation, we introduce a new declarative programming framework that provides abstractions of well-known problems such as SAT, Bayesian inference, generative models, and convex optimization. The semantics of programs is defined in terms of first-order structures with semiring labels, which allows us to freely combine and integrate problems from different AI disciplines.\nAnnotating semantic data with metadata is becoming more and more important to provide information about the statements being asserted. 
While initial solutions proposed a data model to represent a specific dimension of meta-information (such as time or provenance), the need for a general annotation framework which allows representing different context dimensions is needed. In this paper, we extend the 4dFluents ontology by Welty and Fikes---on associating temporal validity to statements---to any dimension of context, and discuss possible issues that multidimensional context representations have to face and how we address them.\nWe identify the main actors in the Isabelle and Coq communities and describe how they affect and influence their peers. This work explores selected foundations of social networking analysis that we expect to be useful in the context of the ProofPeer project, which is developing a new model for interactive theorem proving based on collaboration and social interactions.\nIn multilingual question answering, either the question needs to be translated into the document language, or vice versa. In addition to direction, there are multiple methods to perform the translation, four of which we explore in this paper: word-based, 10-best, context-based, and grammar-based. We build a feature for each combination of translation direction and method, and train a model that learns optimal feature weights. On a large forum dataset consisting of posts in English, Arabic, and Chinese, our novel learn-to-translate approach was more effective than a strong baseline (p<0.05): translating all text into English, then training a classifier based only on English (original or translated) text.\nWe present the AP16-OL7 database which was released as the training and test data for the oriental language recognition (OLR) challenge on APSIPA 2016. Based on the database, a baseline system was constructed on the basis of the i-vector model. 
We report the baseline results evaluated in various metrics defined by the AP16-OLR evaluation plan and demonstrate that AP16-OL7 is a reasonable data resource for multilingual research.\nFirst this report presents a restricted set of finite transducers used to synthesise structural time-series constraints described by means of a multi-layered function composition scheme. Second it provides the corresponding synthesised catalogue of structural time-series constraints where each constraint is explicitly described in terms of automata with accumulators.\nVehicle Routing Problem is a well-known problem in logistics and transportation, and the variety of such problems is explained by the fact that it occurs in many real-life situations. It is an NP-hard combinatorial optimization problem and finding an exact optimal solution is practically impossible. In this work, Site-Dependent Truck and Trailer Routing Problem with hard and soft Time Windows and Split Deliveries is considered (SDTTRPTWSD). In this article, we develop a heuristic with the elements of Tabu Search for solving SDTTRPTWSD. The heuristic uses the concept of neighborhoods and visits infeasible solutions during the search. A greedy heuristic is applied to construct an initial solution.\nWe present a novel semi-supervised approach for sequence transduction and apply it to semantic parsing. The unsupervised component is based on a generative model in which latent sentences generate the unpaired logical forms. We apply this method to a number of semantic parsing tasks focusing on domains with limited access to labelled training data and extend those datasets with synthetically generated logical forms.\nConvolutional networks have marked their place over the last few years as the best performing model for various visual tasks. They are, however, most suited for supervised learning from large amounts of labeled data. 
Previous attempts have been made to use unlabeled data to improve model performance by applying unsupervised techniques. These attempts require different architectures and training methods. In this work we present a novel approach for unsupervised training of Convolutional networks that is based on contrasting between spatial regions within images. This criterion can be employed within conventional neural networks and trained using standard techniques such as SGD and back-propagation, thus complementing supervised methods.\nDeep learning is a branch of artificial intelligence employing deep neural network architectures that has significantly advanced the state-of-the-art in computer vision, speech recognition, natural language processing and other domains. In November 2015, Google released $\\textit{TensorFlow}$, an open source deep learning software library for defining, training and deploying machine learning models. In this paper, we review TensorFlow and put it in context of modern deep learning concepts and software. We discuss its basic computational paradigms and distributed execution model, its programming interface as well as accompanying visualization toolkits. We then compare TensorFlow to alternative libraries such as Theano, Torch or Caffe on a qualitative as well as quantitative basis and finally comment on observed use-cases of TensorFlow in academia and industry.\nWe introduce the lifted Generalized Belief Propagation (GBP) message passing algorithm, for the computation of sum-product queries in Probabilistic Relational Models (e.g. Markov logic network). The algorithm forms a compact region graph and establishes a modified version of message passing, which mimics the GBP behavior in a corresponding ground model. The compact graph is obtained by exploiting a graphical representation of clusters, which reduces cluster symmetry detection to isomorphism tests on small local graphs. 
The framework is thus capable of handling complex models, while remaining domain-size independent.\nThe rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification at tasks such as object and scene recognition. Here we describe the Places Database, a repository of 10 million scene photographs, labeled with scene semantic categories and attributes, comprising a quasi-exhaustive list of the types of environments encountered in the world. Using state of the art Convolutional Neural Networks, we provide impressive baseline performances at scene classification. With its high-coverage and high-diversity of exemplars, the Places Database offers an ecosystem to guide future progress on currently intractable visual recognition problems.\nUsing current reinforcement learning methods, it has recently become possible to learn to play unknown 3D games from raw pixels. In this work, we study the challenges that arise in such complex environments, and summarize current methods to approach these. We choose a task within the Doom game, that has not been approached yet. The goal for the agent is to fight enemies in a 3D world consisting of five rooms. We train the DQN and LSTM-A3C algorithms on this task. Results show that both algorithms learn sensible policies, but fail to achieve high scores given the amount of training. We provide insights into the learned behavior, which can serve as a valuable starting point for further research in the Doom domain.\nPlanning has achieved significant progress in recent years. Among the various approaches to scale up plan synthesis, the use of macro-actions has been widely explored. As a first stage towards the development of a solution to learn on-line macro-actions, we propose an algorithm to identify useful macro-actions based on data mining techniques. 
The integration in the planning search of these learned macro-actions shows significant improvements over four classical planning benchmarks.\nAntichain based semantics for general rough sets were introduced recently by the present author. In her paper two different semantics, one for general rough sets and another for general approximation spaces over quasi-equivalence relations, were developed. These semantics are improved and studied further from a lateral algebraic logic perspective in this research. The main results concern the structure of the algebras and deductive systems in the context.\nWe propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori. Using features from the high-dimensional inputs, DOL computes the convex coverage set containing all potential optimal solutions of the convex combinations of the objectives. To our knowledge, this is the first time that deep reinforcement learning has succeeded in learning multi-objective policies. In addition, we provide a testbed with two experiments to be used as a benchmark for deep multi-objective reinforcement learning.\nWe present ABA+, a new approach to handling preferences in a well known structured argumentation formalism, Assumption-Based Argumentation (ABA). In ABA+, preference information given over assumptions is incorporated directly into the attack relation, thus resulting in attack reversal. ABA+ conservatively extends ABA and exhibits various desirable features regarding relationship among argumentation semantics as well as preference handling. 
We also introduce Weak Contraposition, a principle concerning reasoning with rules and preferences that relaxes the standard principle of contraposition, while guaranteeing additional desirable features for ABA+.\nPeople enjoy encounters with generative software, but rarely are they encouraged to interact with, understand or engage with it. In this paper we define the term 'PCG-based game', and explain how this concept follows on from the idea of an AI-based game. We look at existing examples of games which foreground their AI, put forward a methodology for designing PCG-based games, describe some example case study designs for PCG-based games, and describe lessons learned during this process of sketching and developing ideas.\nA college student's life can be primarily categorized into domains such as education, health, social and other activities which may include daily chores and travelling time. Time management is crucial for every student. A self realisation of one's daily time expenditure in various domains is therefore essential to maximize one's effective output. This paper presents how a mobile application using Fuzzy Logic and Global Positioning System (GPS) analyzes a student's lifestyle and provides recommendations and suggestions based on the results.\nWe describe the solution of team ISMLL for the ECML-PKDD 2016 Discovery Challenge on Bank Card Usage for both tasks. Our solution is based on three pillars. Gradient boosted decision trees as a strong regression and classification model, an intensive search for good hyperparameter configurations and strong features that exploit geolocation information. This approach achieved the best performance on the public leaderboard for the first task and a decent fourth position for the second task.\nKnowledge base completion aims to infer new relations from existing information. In this paper, we propose path-augmented TransR (PTransR) model to improve the accuracy of link prediction. 
In our approach, we base PTransR model on TransR, which is the best one-hop model at present. Then we regularize TransR with information of relation paths. In our experiment, we evaluate PTransR on the task of entity prediction. Experimental results show that PTransR outperforms previous models.\nAccurate prediction of wind ramp events is critical for ensuring the reliability and stability of the power systems with high penetration of wind energy. This paper proposes a classification based approach for estimating the future class of wind ramp event based on certain thresholds. A parallelized gradient boosted regression tree based technique has been proposed to accurately classify the normal as well as rare extreme wind power ramp events. The model has been validated using wind power data obtained from the National Renewable Energy Laboratory database. Performance comparison with several benchmark techniques indicates the superiority of the proposed technique in terms of superior classification accuracy.\nThis study concerns with the diagnosis of aerospace structure defects by applying a HPC parallel implementation of a novel learning algorithm, named U-BRAIN. The Soft Computing approach allows advanced multi-parameter data processing in composite materials testing. The HPC parallel implementation overcomes the limits due to the great amount of data and the complexity of data processing. Our experimental results illustrate the effectiveness of the U-BRAIN parallel implementation as defect classifier in aerospace structures. The resulting system is implemented on a Linux-based cluster with multi-core architecture.\nIn this paper, we propose a new data based model for influence maximization in online social networks. We use the theory of belief functions to overcome the data imperfection problem. Besides, the proposed model searches to detect influencer users that adopt a positive opinion about the product, the idea, etc, to be propagated. 
Moreover, we present some experiments to show the performance of our model.\nIn this work we extend to the interval-valued setting the notion of an overlap functions and we discuss a method which makes use of interval-valued overlap functions for constructing OWA operators with interval-valued weights. . Some properties of interval-valued overlap functions and the derived interval-valued OWA operators are analysed. We specially focus on the homogeneity and migrativity properties.\nA new architecture and learning algorithms for the multidimensional hybrid cascade neural network with neuron pool optimization in each cascade are proposed in this paper. The proposed system differs from the well-known cascade systems in its capability to process multidimensional time series in an online mode, which makes it possible to process non-stationary stochastic and chaotic signals with the required accuracy. Compared to conventional analogs, the proposed system provides computational simplicity and possesses both tracking and filtering capabilities.\nAn evolving weighted neuro-neo-fuzzy-ANARX model and its learning procedures are introduced in the article. This system is basically used for time series forecasting. This system may be considered as a pool of elements that process data in a parallel manner. The proposed evolving system may provide online processing data streams.\nAn architecture of a new neuro-fuzzy system is proposed. The basic idea of this approach is to tune both synaptic weights and membership functions with the help of the supervised learning and self-learning paradigms. The approach to solving the problem has to do with evolving online neuro-fuzzy systems that can process data under uncertainty conditions. The results prove the effectiveness of the developed architecture and the learning procedure.\nA new approach to data stream clustering with the help of an ensemble of adaptive neuro-fuzzy systems is proposed. 
The proposed ensemble is formed with adaptive neuro-fuzzy self-organizing Kohonen maps in a parallel processing mode. A final result is chosen by the best neuro-fuzzy self-organizing Kohonen map.\nWe propose an attention-enabled encoder-decoder model for the problem of grapheme-to-phoneme conversion. Most previous work has tackled the problem via joint sequence models that require explicit alignments for training. In contrast, the attention-enabled encoder-decoder model allows for jointly learning to align and convert characters to phonemes. We explore different types of attention models, including global and local attention, and our best models achieve state-of-the-art results on three standard data sets (CMUDict, Pronlex, and NetTalk).\nIn this work, we investigate the application of Reinforcement Learning to two well known decision dilemmas, namely Newcomb's Problem and Prisoner's Dilemma. These problems are exemplary for dilemmas that autonomous agents are faced with when interacting with humans. Furthermore, we argue that a Newcomb-like formulation is more adequate in the human-machine interaction case and demonstrate empirically that the unmodified Reinforcement Learning algorithms end up with the well known maximum expected utility solution.\nPairwise comparisons between alternatives are a well-known method for measuring preferences of a decision-maker. Since these often do not exhibit consistency, a number of inconsistency indices has been introduced in order to measure the deviation from this ideal case. We axiomatically characterize the inconsistency ranking induced by the Koczkodaj inconsistency index: six independent properties are presented such that they determine a unique linear order on the set of all pairwise comparison matrices.\nWe present a fast variational Bayesian algorithm for performing non-negative matrix factorisation and tri-factorisation. 
We show that our approach achieves faster convergence per iteration and timestep (wall-clock) than Gibbs sampling and non-probabilistic approaches, and do not require additional samples to estimate the posterior. We show that in particular for matrix tri-factorisation convergence is difficult, but our variational Bayesian approach offers a fast solution, allowing the tri-factorisation approach to be used more effectively.\nThis paper presents the Voronoi diagram-based evolutionary algorithm (VorEAl). VorEAl partitions input space in abnormal/normal subsets using Voronoi diagrams. Diagrams are evolved using a multi-objective bio-inspired approach in order to conjointly optimize classification metrics while also being able to represent areas of the data space that are not present in the training dataset. As part of the paper VorEAl is experimentally validated and contrasted with similar approaches.\nWe propose the Hit-and-Run algorithm for planning and sampling problems in non-convex spaces. For sampling, we show the first analysis of the Hit-and-Run algorithm in non-convex spaces and show that it mixes fast as long as certain smoothness conditions are satisfied. In particular, our analysis reveals an intriguing connection between fast mixing and the existence of smooth measure-preserving mappings from a convex space to the non-convex space. For planning, we show advantages of Hit-and-Run compared to state-of-the-art planning methods such as Rapidly-Exploring Random Trees.\nIn this paper we propose a novel approach for learning from data using rule based fuzzy inference systems where the model parameters are estimated using Bayesian inference and Markov Chain Monte Carlo (MCMC) techniques. We show the applicability of the method for regression and classification tasks using synthetic data-sets and also a real world example in the financial services industry. 
Then we demonstrate how the method can be extended for knowledge extraction to select the individual rules in a Bayesian way which best explains the given data. Finally we discuss the advantages and pitfalls of using this method over state-of-the-art techniques and highlight the specific class of problems where this would be useful.\nSpectral inference provides fast algorithms and provable optimality for latent topic analysis. But for real data these algorithms require additional ad-hoc heuristics, and even then often produce unusable results. We explain this poor performance by casting the problem of topic inference in the framework of Joint Stochastic Matrix Factorization (JSMF) and showing that previous methods violate the theoretical conditions necessary for a good solution to exist. We then propose a novel rectification method that learns high quality topics and their interactions even on small, noisy data. This method achieves results comparable to probabilistic techniques in several domains while maintaining scalability and provable optimality.\nIn this book authors for the first time introduce the notion of strong neutrosophic graphs. They are very different from the usual graphs and neutrosophic graphs. Using these new structures special subgraph topological spaces are defined. Further special lattice graph of subgraphs of these graphs are defined and described. Several interesting properties using subgraphs of a strong neutrosophic graph are obtained. Several open conjectures are proposed. These new class of strong neutrosophic graphs will certainly find applications in Neutrosophic Cognitive Maps (NCM), Neutrosophic Relational Maps (NRM) and Neutrosophic Relational Equations (NRE) with appropriate modifications.\nIn this article using Cuckoo Optimization Algorithm and simple additive weighting method the hybrid COAW algorithm is presented to solve multi-objective problems. 
Cuckoo algorithm is an efficient and structured method for solving nonlinear continuous problems. The created Pareto frontiers of the COAW proposed algorithm are exact and have good dispersion. This method has a high speed in finding the Pareto frontiers and identifies the beginning and end points of Pareto frontiers properly. In order to validation the proposed algorithm, several experimental problems were analyzed. The results of which indicate the proper effectiveness of COAW algorithm for solving multi-objective problems.\nWe present TorchCraft, a library that enables deep learning research on Real-Time Strategy (RTS) games such as StarCraft: Brood War, by making it easier to control these games from a machine learning framework, here Torch. This white paper argues for using RTS games as a benchmark for AI research, and describes the design and components of TorchCraft.\nThe GANs are generative models whose random samples realistically reflect natural images. It also can generate samples with specific attributes by concatenating a condition vector into the input, yet research on this field is not well studied. We propose novel methods of conditioning generative adversarial networks (GANs) that achieve state-of-the-art results on MNIST and CIFAR-10. We mainly introduce two models: an information retrieving model that extracts conditional information from the samples, and a spatial bilinear pooling model that forms bilinear features derived from the spatial cross product of an image and a condition vector. These methods significantly enhance log-likelihood of test data under the conditional distributions compared to the methods of concatenation.\nMany NLP tasks including machine comprehension, answer selection and text entailment require the comparison between sequences. Matching the important units between sequences is a key to solve these problems. 
In this paper, we present a general \"compare-aggregate\" framework that performs word-level matching followed by aggregation using Convolutional Neural Networks. We particularly focus on the different comparison functions we can use to match two vectors. We use four different datasets to evaluate the model. We find that some simple comparison functions based on element-wise operations can work better than standard neural network and neural tensor network.\nOne of the classical problems in machine learning and data mining is feature selection. A feature selection algorithm is expected to be quick, and at the same time it should show high performance. MeLiF algorithm effectively solves this problem using ensembles of ranking filters. This article describes two different ways to improve MeLiF algorithm performance with parallelization. Experiments show that proposed schemes significantly improves algorithm performance and increase feature selection quality.\nFormal concepts and closed itemsets proved to be of big importance for knowledge discovery, both as a tool for concise representation of association rules and a tool for clustering and constructing domain taxonomies and ontologies. Exponential explosion makes it difficult to consider the whole concept lattice arising from data, one needs to select most useful and interesting concepts. In this paper interestingness measures of concepts are considered and compared with respect to various aspects, such as efficiency of computation and applicability to noisy data and performing ranking correlation.\nWe present a novel framework for generating pop music. Our model is a hierarchical Recurrent Neural Network, where the layers and the structure of the hierarchy encode our prior knowledge about how pop music is composed. In particular, the bottom layers generate the melody, while the higher levels produce the drums and chords. 
We conduct several human studies that show strong preference of our generated music over that produced by the recent method by Google. We additionally show two applications of our framework: neural dancing and karaoke, as well as neural story singing.\nWe present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to state-of-the-art. Our model has the additional advantage of being very interpretable, since it allows visualization of its predictions broken up by abstract features such as information content, salience and novelty. Another novel contribution of our work is abstractive training of our extractive model that can train on human generated reference summaries alone, eliminating the need for sentence-level extractive labels.\nA probabilistic query may not be estimable from observed data corrupted by missing values if the data are not missing at random (MAR). It is therefore of theoretical interest and practical importance to determine in principle whether a probabilistic query is estimable from missing data or not when the data are not MAR. We present an algorithm that systematically determines whether the joint probability is estimable from observed data with missing values, assuming that the data-generation model is represented as a Bayesian network containing unobserved latent variables that not only encodes the dependencies among the variables but also explicitly portrays the mechanisms responsible for the missingness process. The result significantly advances the existing work.\nLSTMs have become a basic building block for many deep NLP models. In recent years, many improvements and variations have been proposed for deep sequence models in general, and LSTMs in particular. We propose and analyze a series of augmentations and modifications to LSTM networks resulting in improved performance for text classification datasets. 
We observe compounding improvements on traditional LSTMs using Monte Carlo test-time model averaging, average pooling, and residual connections, along with four other suggested modifications. Our analysis provides a simple, reliable, and high quality baseline model.\nThe artistic style of a painting is a subtle aesthetic judgment used by art historians for grouping and classifying artwork. The recently introduced `neural-style' algorithm substantially succeeds in merging the perceived artistic style of one image or set of images with the perceived content of another. In light of this and other recent developments in image analysis via convolutional neural networks, we investigate the effectiveness of a `neural-style' representation for classifying the artistic style of paintings.\nMonte Carlo Tree Search (MCTS) is a technique to guide search in a large decision space by taking random samples and evaluating their outcome. In this work, we study MCTS methods in the context of the connection calculus and implement them on top of the leanCoP prover. This includes proposing useful proof-state evaluation heuristics that are learned from previous proofs, and proposing and automatically improving suitable MCTS strategies in this context. The system is trained and evaluated on a large suite of related problems coming from the Mizar proof assistant, showing that it is capable to find new and different proofs. To our knowledge, this is the first time MCTS has been applied to theorem proving.\nIn order to be useful, visualizations need to be interpretable. This paper uses a user-based approach to combine and assess quality measures in order to better model user preferences. Results show that cluster separability measures are outperformed by a neighborhood conservation measure, even though the former are usually considered as intuitively representative of user motives. 
Moreover, combining measures, as opposed to using a single measure, further improves prediction performances.\nDeep Neural Networks often require good regularizers to generalize well. Dropout is one such regularizer that is widely used among Deep Learning practitioners. Recent work has shown that Dropout can also be viewed as performing Approximate Bayesian Inference over the network parameters. In this work, we generalize this notion and introduce a rich family of regularizers which we call Generalized Dropout. One set of methods in this family, called Dropout++, is a version of Dropout with trainable parameters. Classical Dropout emerges as a special case of this method. Another member of this family selects the width of neural network layers. Experiments show that these methods help in improving generalization performance over Dropout.\nWe consider the problem of learning hierarchical policies for Reinforcement Learning able to discover options, an option corresponding to a sub-policy over a set of primitive actions. Different models have been proposed during the last decade that usually rely on a predefined set of options. We specifically address the problem of automatically discovering options in decision processes. We describe a new learning model called Budgeted Option Neural Network (BONN) able to discover options based on a budgeted learning objective. The BONN model is evaluated on different classical RL problems, demonstrating both quantitative and qualitative interesting results.\nLimbo is an open-source C++11 library for Bayesian optimization which is designed to be both highly flexible and very fast. It can be used to optimize functions for which the gradient is unknown, evaluations are expensive, and runtime cost matters (e.g., on embedded systems or robots). 
Benchmarks on standard functions show that Limbo is about 2 times faster than BayesOpt (another C++ library) for a similar accuracy.\nMany real world stochastic control problems suffer from the \"curse of dimensionality\". To overcome this difficulty, we develop a deep learning approach that directly solves high-dimensional stochastic control problems based on Monte-Carlo sampling. We approximate the time-dependent controls as feedforward neural networks and stack these networks together through model dynamics. The objective function for the control problem plays the role of the loss function for the deep neural network. We test this approach using examples from the areas of optimal trading and energy storage. Our results suggest that the algorithm presented here achieves satisfactory accuracy and at the same time, can handle rather high dimensional problems.\nStandard deep reinforcement learning methods such as Deep Q-Networks (DQN) for multiple tasks (domains) face scalability problems. We propose a method for multi-domain dialogue policy learning---termed NDQN, and apply it to an information-seeking spoken dialogue system in the domains of restaurants and hotels. Experimental results comparing DQN (baseline) versus NDQN (proposed) using simulations report that our proposed method exhibits better scalability and is promising for optimising the behaviour of multi-domain dialogue systems.\nWe establish a link between multiwinner elections and apportionment problems by showing how approval-based multiwinner election rules can be interpreted as methods of apportionment. We consider several multiwinner rules and observe that they induce apportionment methods that are well-established in the literature on proportional representation. For instance, we show that Proportional Approval Voting induces the D'Hondt method and that Monroe's rule induces the largest reminder method. 
We also consider properties of apportionment methods and exhibit multiwinner rules that induce apportionment methods satisfying these properties.\nConstraint Programming (CP) users need significant expertise in order to model their problems appropriately, notably to select propagators and search strategies. This puts the brakes on a broader uptake of CP. In this paper, we introduce MICE, a complete Java CP modeler that can use any Mixed Integer Linear Programming (MILP) solver as a solution technique. Our aim is to provide an alternative tool for democratizing the \"CP-style\" modeling thanks to its simplicity of use, with reasonable solving capabilities. Our contributions include new decompositions of (reified) constraints and constraints on numerical variables.\nWith regard to a computational representation of literary plot, this paper looks at the use of sentiment analysis for happy ending detection in German novels. Its focus lies on the investigation of previously proposed sentiment features in order to gain insight about the relevance of specific features on the one hand and the implications of their performance on the other hand. Therefore, we study various partitionings of novels, considering the highly variable concept of \"ending\". We also show that our approach, even though still rather simple, can potentially lead to substantial findings relevant to literary studies.\nThis article presents a new quantum-like model for cognition explicitly based on knowledge. It is shown that this model, called QKT (quantum knowledge-based theory), is able to coherently describe some experimental results that are problematic for the prior quantum-like decision models. In particular, I consider the experimental results relevant to the post-decision cognitive dissonance, the problems relevant to the question order effect and response replicability, and those relevant to the grand-reciprocity equations. 
A new set of postulates is proposed, which evidence the different meaning given to the projectors and to the quantum states. In the final part, I show that the use of quantum gates can help to better describe and understand the evolution of quantum-like models.\nWe study a group of new methods to solve an open problem that is the shortest paths problem on a given fix-weighted instance. It is the real significance at a considerable altitude to reach our aim to meet these qualities of generic, efficiency, precision which we generally require to a methodology. Besides our proof to guarantee our measures might work normally, we pay more interest to root out the vital theory about calculation and logic in favor of our extension to range over a wide field about decision, operator, economy, management, robot, AI and etc.\nWe study methods for automated parsing of informal mathematical expressions into formal ones, a main prerequisite for deep computer understanding of informal mathematical texts. We propose a context-based parsing approach that combines efficient statistical learning of deep parse trees with their semantic pruning by type checking and large-theory automated theorem proving. We show that the methods very significantly improve on previous results in parsing theorems from the Flyspeck corpus.\nGenerative adversarial networks have been proposed as a way of efficiently training deep generative neural networks. We propose a generative adversarial model that works on continuous sequential data, and apply it by training it on a collection of classical music. We conclude that it generates music that sounds better and better as the model is trained, report statistics on generated music, and let the reader judge the quality by downloading the generated songs.\nWe describe a new method called t-ETE for finding a low-dimensional embedding of a set of objects in Euclidean space. 
We formulate the embedding problem as a joint ranking problem over a set of triplets, where each triplet captures the relative similarities between three objects in the set. By exploiting recent advances in robust ranking, t-ETE produces high-quality embeddings even in the presence of a significant amount of noise and better preserves local scale than known methods, such as t-STE and t-SNE. In particular, our method produces significantly better results than t-SNE on signature datasets while also being faster to compute.\nWe devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as a proxy for quickly approximating outcomes of short-term decisions, to make tractable hierarchical long-term assessment and planning for large power systems. Experimental results on updated versions of IEEE-RTS79 and IEEE-RTS96 show high accuracy measured on operational cost, achieved in runtimes that are lower in several orders of magnitude than the traditional approach.\nIn the Markov decision process model, policies are usually evaluated by expected cumulative rewards. As this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy optimal for the quantile criterion. Both finite and infinite horizons are considered. Finally we experimentally evaluate our approach on random MDPs and on a data center control problem.\nWe present probabilistic neural programs, a framework for program induction that permits flexible specification of both a computational model and inference algorithm while simultaneously enabling the use of deep neural networks. Probabilistic neural programs combine a computation graph for specifying a neural network with an operator for weighted nondeterministic choice. Thus, a program describes both a collection of decisions as well as the neural network architecture used to make each one. 
We evaluate our approach on a challenging diagram question answering task where probabilistic neural programs correctly execute nearly twice as many programs as a baseline model.\nThe Center of Gravity (COG) method is one of the most popular defuzzification techniques of fuzzy mathematics. In earlier works the COG technique was properly adapted to be used as an assessment model (RFAM)and several variations of it (GRFAM, TFAM and TpFAM)were also constructed for the same purpose. In this paper the outcomes of all these models are compared to the corresponding outcomes of a traditional assessment method of the bi-valued logic, the Grade Point Average (GPA) Index. Examples are also presented illustrating our results.\nIn an independence model, the triplets that represent conditional independences between singletons are called elementary. It is known that the elementary triplets represent the independence model unambiguously under some conditions. In this paper, we show how this representation helps performing some operations with independence models, such as finding the dominant triplets or a minimal independence map of an independence model, or computing the union or intersection of a pair of independence models, or performing causal reasoning. For the latter, we rephrase in terms of conditional independences some of Pearl's results for computing causal effects.\nThe method presented extends a given regression neural network to make its performance improve. The modification affects the learning procedure only, hence the extension may be easily omitted during evaluation without any change in prediction. It means that the modified model may be evaluated as quickly as the original one but tends to perform better.   This improvement is possible because the modification gives better expressive power, provides better behaved gradients and works as a regularization. 
The knowledge gained by the temporarily extended neural network is contained in the parameters shared with the original neural network.   The only cost is an increase in learning time.\nWhen faced with complex choices, users refine their own preference criteria as they explore the catalogue of options. In this paper we propose an approach to preference elicitation suited for this scenario. We extend Coactive Learning, which iteratively collects manipulative feedback, to optionally query example critiques. User critiques are integrated into the learning model by dynamically extending the feature space. Our formulation natively supports constructive learning tasks, where the option catalogue is generated on-the-fly. We present an upper bound on the average regret suffered by the learner. Our empirical analysis highlights the promise of our approach.\nKnowledge Graphs (KG) constitute a flexible representation of complex relationships between entities particularly useful for biomedical data. These KG, however, are very sparse with many missing edges (facts) and the visualisation of the mesh of interactions nontrivial. Here we apply a compositional model to embed nodes and relationships into a vectorised semantic space to perform graph completion. A visualisation tool based on Convolutional Neural Networks and Self-Organised Maps (SOM) is proposed to extract high-level insights from the KG. We apply this technique to a subset of CTD, containing interactions of compounds with human genes / proteins and show that the performance is comparable to the one obtained by structural models.\nIn decision theory an act is a function from a set of conditions to the set of real numbers. The set of conditions is a partition in some algebra of events. The expected value of an act can be calculated when a probability measure is given. 
We adopt an algebraic point of view by substituting the algebra of events with a finite distributive lattice and the probability measure with a lattice valuation. We introduce a partial order on acts that generalizes the dominance relation and show that the set of acts is a lattice with respect to this order. Finally we analyze some different kinds of comparison between acts, without supposing a common set of conditions for the acts to be compared.\nMobile robots with complex morphology are essential for traversing rough terrains in Urban Search & Rescue missions (USAR). Since teleoperation of the complex morphology causes high cognitive load of the operator, the morphology is controlled autonomously. The autonomous control measures the robot state and surrounding terrain which is usually only partially observable, and thus the data are often incomplete. We marginalize the control over the missing measurements and evaluate an explicit safety condition. If the safety condition is violated, tactile terrain exploration by the body-mounted robotic arm gathers the missing data.\nHierarchical architectures are critical to the scalability of reinforcement learning methods. Current hierarchical frameworks execute actions serially, with macro-actions comprising sequences of primitive actions. We propose a novel alternative to these control hierarchies based on concurrent execution of many actions in parallel. Our scheme uses the concurrent compositionality provided by the linearly solvable Markov decision process (LMDP) framework, which naturally enables a learning agent to draw on several macro-actions simultaneously to solve new tasks. We introduce the Multitask LMDP module, which maintains a parallel distributed representation of tasks and may be stacked to form deep hierarchies abstracted in space and time.\nIt has been widely recognized that uncertainty is an inevitable aspect of diagnosis and treatment of medical disorders. 
Such uncertainties hence, need to be considered in computerized medical models. The existing medical modeling techniques however, have mainly focused on capturing uncertainty associated with diagnosis of medical disorders while ignoring uncertainty of treatments. To tackle this issue, we have proposed using a fuzzy-based modeling and description technique for capturing uncertainties in treatment plans. We have further contributed a formal framework which allows for goal-oriented modeling and analysis of medical treatments.\nTranscriptional profiling on microarrays to obtain gene expressions has been used to facilitate cancer diagnosis. We propose a deep generative machine learning architecture (called DeepCancer) that learn features from unlabeled microarray data. These models have been used in conjunction with conventional classifiers that perform classification of the tissue samples as either being cancerous or non-cancerous. The proposed model has been tested on two different clinical datasets. The evaluation demonstrates that DeepCancer model achieves a very high precision score, while significantly controlling the false positive and false negative scores.\nWe propose an inverse reinforcement learning (IRL) approach using Deep Q-Networks to extract the rewards in problems with large state spaces. We evaluate the performance of this approach in a simulation-based autonomous driving scenario. Our results resemble the intuitive relation between the reward function and readings of distance sensors mounted at different poses on the car. 
We also show that, after a few learning rounds, our simulated agent generates collision-free motions and performs human-like lane change behaviour.\nThis paper introduces the probabilistic module interface, which allows encapsulation of complex probabilistic models with latent variables alongside custom stochastic approximate inference machinery, and provides a platform-agnostic abstraction barrier separating the model internals from the host probabilistic inference system. The interface can be seen as a stochastic generalization of a standard simulation and density interface for probabilistic primitives. We show that sound approximate inference algorithms can be constructed for networks of probabilistic modules, and we demonstrate that the interface can be implemented using learned stochastic inference networks and MCMC and SMC approximate inference programs.\nA prediction market is a useful means of aggregating information about a future event. To function, the market needs a trusted entity who will verify the true outcome in the end. Motivated by the recent introduction of decentralized prediction markets, we introduce a mechanism that allows for the outcome to be determined by the votes of a group of arbiters who may themselves hold stakes in the market. Despite the potential conflict of interest, we derive conditions under which we can incentivize arbiters to vote truthfully by using funds raised from market fees to implement a peer prediction mechanism. Finally, we investigate what parameter values could be used in a real-world implementation of our mechanism.\nIn this paper, we describe the construction of TeKnowbase, a knowledge-base of technical concepts in computer science. Our main information sources are technical websites such as Webopedia and Techtarget as well as Wikipedia and online textbooks. We divide the knowledge-base construction problem into two parts -- the acquisition of entities and the extraction of relationships among these entities. 
Our knowledge-base consists of approximately 100,000 triples. We conducted an evaluation on a sample of triples and report an accuracy of a little over 90\\%. We additionally conducted classification experiments on StackOverflow data with features from TeKnowbase and achieved improved classification accuracy.\nHow to manage conflict is still an open issue in Dempster-Shafer evidence theory. The correlation coefficient can be used to measure the similarity of evidence in Dempster-Shafer evidence theory. However, existing correlation coefficients of belief functions have some shortcomings. In this paper, a new correlation coefficient is proposed with many desirable properties. One of its applications is to measure the conflict degree among belief functions. Some numerical examples and comparisons demonstrate the effectiveness of the correlation coefficient.\nRepresenting a dialog policy as a recurrent neural network (RNN) is attractive because it handles partial observability, infers a latent representation of state, and can be optimized with supervised learning (SL) or reinforcement learning (RL). For RL, a policy gradient approach is natural, but is sample inefficient. In this paper, we present 3 methods for reducing the number of dialogs required to optimize an RNN-based dialog policy with RL. The key idea is to maintain a second RNN which predicts the value of the current policy, and to apply experience replay to both networks. On two tasks, these methods reduce the number of dialogs/episodes required by about a third, vs. standard policy gradient methods.\nThis paper investigates a type of instability that is linked to the greedy policy improvement in approximated reinforcement learning. We show empirically that non-deterministic policy improvement can stabilize methods like LSPI by controlling the improvements' stochasticity. Additionally we show that a suitable representation of the value function also stabilizes the solution to some degree. 
The presented approach is simple and should also be easily transferable to more sophisticated algorithms like deep reinforcement learning.\nWe formulate an integer program to solve a highly constrained academic timetabling problem at the United States Merchant Marine Academy. The IP instance that results from our real case study has approximately both 170,000 rows and columns and solves to optimality in 4--24 hours using a commercial solver on a portable computer (near optimal feasible solutions were often found in 4--12 hours). Our model is applicable to both high schools and small colleges who wish to deviate from group scheduling. We also solve a necessary preprocessing student subgrouping problem, which breaks up big groups of students into small groups so they can optimally fit into small capacity classes.\nWe introduce a new paradigm to investigate unsupervised learning, reducing unsupervised learning to supervised learning. Specifically, we mitigate the subjectivity in unsupervised decision-making by leveraging knowledge acquired from prior, possibly heterogeneous, supervised learning tasks. We demonstrate the versatility of our framework via comprehensive expositions and detailed experiments on several unsupervised problems such as (a) clustering, (b) outlier detection, and (c) similarity prediction under a common umbrella of meta-unsupervised-learning. We also provide rigorous PAC-agnostic bounds to establish the theoretical foundations of our framework, and show that our framing of meta-clustering circumvents Kleinberg's impossibility theorem for clustering.\nIn this work we present an algorithm for composing monophonic melodies similar in style to those of a given, phrase annotated, sample of melodies. For implementation, a hybrid approach incorporating parametric Markov models of higher order and a contour concept of phrases is used. This work is based on the master thesis of Thayabaran Kathiresan (2015). 
An online listening test conducted shows that enhancing a pure Markov model with musically relevant context, like count and planed melody contour, improves the result significantly.\nThis technical report describes the usage, syntax, semantics and core algorithms of the probabilistic inductive logic programming framework PrASP. PrASP is a research software which integrates non-monotonic reasoning based on Answer Set Programming (ASP), probabilistic inference and parameter learning. In contrast to traditional approaches to Probabilistic (Inductive) Logic Programming, our framework imposes only little restrictions on probabilistic logic programs. In particular, PrASP allows for ASP as well as First-Order Logic syntax, and for the annotation of formulas with point probabilities as well as interval probabilities. A range of widely configurable inference algorithms can be combined in a pipeline-like fashion, in order to cover a variety of use cases.\nThis paper tackles the reduction of redundant repeating generation that is often observed in RNN-based encoder-decoder models. Our basic idea is to jointly estimate the upper-bound frequency of each target vocabulary in the encoder and control the output words based on the estimation in the decoder. Our method shows significant improvement over a strong RNN-based encoder-decoder baseline and achieved its best results on an abstractive summarization benchmark.\nIn this paper, we tackle the problem of risk-averse route planning in a transportation network with time-dependent and stochastic costs. To solve this problem, we propose an adaptation of the A* algorithm that accommodates any risk measure or decision criterion that is monotonic with first-order stochastic dominance. We also present a case study of our algorithm on the Manhattan, NYC, transportation network.\nIn this paper, we present a link between preference-based and multiobjective sequential decision-making. 
While transforming a multiobjective problem to a preference-based one is quite natural, the other direction is a bit less obvious. We present how this transformation (from preference-based to multiobjective) can be done under the classic condition that preferences over histories can be represented by additively decomposable utilities and that the decision criterion to evaluate policies in a state is based on expectation. This link yields a new source of multiobjective sequential decision-making problems (i.e., when reward values are unknown) and justifies the use of solving methods developed in one setting in the other one.\nThe elegant Stalnaker/Lewis semantics for counterfactual conditonals works with distances between models. But human beings certainly have no tables of models and distances in their head. We begin here an investigation using a more realistic picture, based on findings in neuroscience. We call it a pre-semantics, as its meaning is not a description of the world, but of the brain, whose structure is (partly) determined by the world it reasons about. In the final section, we reconsider the components, and postulate that there are no atomic pictures, we can always look inside.\nThe high variance issue in unbiased policy-gradient methods such as VPG and REINFORCE is typically mitigated by adding a baseline. However, the baseline fitting itself suffers from the underfitting or the overfitting problem. In this paper, we develop a K-fold method for baseline estimation in policy gradient algorithms. The parameter K is the baseline estimation hyperparameter that can adjust the bias-variance trade-off in the baseline estimates. We demonstrate the usefulness of our approach via two state-of-the-art policy gradient algorithms on three MuJoCo locomotive control tasks.\nWe describe an open-source toolkit for neural machine translation (NMT). 
The toolkit prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements. The toolkit consists of modeling and translation support, as well as detailed pedagogical documentation about the underlying techniques.\nMuch of work in semantic web relying on Wikipedia as the main source of knowledge often work on static snapshots of the dataset. The full history of Wikipedia revisions, while contains much more useful information, is still difficult to access due to its exceptional volume. To enable further research on this collection, we developed a tool, named Hedera, that efficiently extracts semantic information from Wikipedia revision history datasets. Hedera exploits Map-Reduce paradigm to achieve rapid extraction, it is able to handle one entire Wikipedia articles revision history within a day in a medium-scale cluster, and supports flexible data structures for various kinds of semantic web study.\nDeep learning classifiers are known to be inherently vulnerable to manipulation by intentionally perturbed inputs, named adversarial examples. In this work, we establish that reinforcement learning techniques based on Deep Q-Networks (DQNs) are also vulnerable to adversarial input perturbations, and verify the transferability of adversarial examples across different DQN models. Furthermore, we present a novel class of attacks based on this vulnerability that enable policy manipulation and induction in the learning process of DQNs. We propose an attack mechanism that exploits the transferability of adversarial examples to implement policy induction attacks on DQNs, and demonstrate its efficacy and impact through experimental study of a game-learning scenario.\nA network embedding is a representation of a large graph in a low-dimensional space, where vertices are modeled as vectors. 
The objective of a good embedding is to preserve the proximity between vertices in the original graph. This way, typical search and mining methods can be applied in the embedded space with the help of off-the-shelf multidimensional indexing approaches. Existing network embedding techniques focus on homogeneous networks, where all vertices are considered to belong to a single class.\nWe introduce the Binary Matrix Guessing Problem and provide two algorithms to solve this problem. The first algorithm we introduce is Elementwise Probing Algorithm (EPA) which is very fast under a score which utilizes Frobenius Distance. The second algorithm is Additive Reinforcement Learning Algorithm which combines ideas from perceptron algorithm and reinforcement learning algorithm. This algorithm is significantly slower compared to first one, but less restrictive and generalizes better. We compare computational performance of both algorithms and provide numerical results.\nENIGMA is a learning-based method for guiding given clause selection in saturation-based theorem provers. Clauses from many proof searches are classified as positive and negative based on their participation in the proofs. An efficient classification model is trained on this data, using fast feature-based characterization of the clauses . The learned model is then tightly linked with the core prover and used as a basis of a new parameterized evaluation heuristic that provides fast ranking of all generated clauses. The approach is evaluated on the E prover and the CASC 2016 AIM benchmark, showing a large increase of E's performance.\nA semi-automatic open-source tool for layout analysis on early printed books is presented. LAREX uses a rule based connected components approach which is very fast, easily comprehensible for the user and allows an intuitive manual correction if necessary. The PageXML format is used to support integration into existing OCR workflows. 
Evaluations showed that LAREX provides an efficient and flexible way to segment pages of early printed books.\nWith the purpose of modeling, specifying and reasoning in an integrated fashion with procedural and declarative aspects (both commonly present in cases or scenarios), the paper introduces Logic Programming Petri Nets (LPPN), an extension to the Petri Net notation providing an interface to logic programming constructs. Two semantics are presented. First, a hybrid operational semantics that separates the process component, treated with Petri nets, from the constraint/terminological component, treated with Answer Set Programming (ASP). Second, a denotational semantics maps the notation to ASP fully, via Event Calculus. These two alternative specifications enable a preliminary evaluation in terms of reasoning efficiency.\nDiscovering the key structure of a database is one of the main goals of data mining. In pattern set mining we do so by discovering a small set of patterns that together describe the data well. The richer the class of patterns we consider, and the more powerful our description language, the better we will be able to summarise the data. In this paper we propose \\ourmethod, a novel greedy MDL-based method for summarising sequential data using rich patterns that are allowed to interleave. Experiments show \\ourmethod is orders of magnitude faster than the state of the art, results in better models, as well as discovers meaningful semantics in the form patterns that identify multiple choices of values.\nOrganic Computing is an initiative in the field of systems engineering that proposed to make use of concepts such as self-adaptation and self-organisation to increase the robustness of technical systems. Based on the observation that traditional design and operation concepts reach their limits, transferring more autonomy to the systems themselves should result in a reduction of complexity for users, administrators, and developers. 
However, there seems to be a need for an updated definition of the term \"Organic Computing\", of desired properties of technical, organic systems, and the objectives of the Organic Computing initiative. With this article, we will address these points.\nSince formulation of Inductive Database (IDB) problem, several Data Mining (DM) languages have been proposed, confirming that KDD process could be supported via inductive queries (IQ) answering. This paper reviews the existing DM languages. We are presenting important primitives of the DM language and classifying our languages according to primitives' satisfaction. In addition, we presented languages' syntaxes and tried to apply each one to a database sample to test a set of KDD operations. This study allows us to highlight languages capabilities and limits, which is very useful for future work and perspectives.\nMaintenance of association rules is an interesting problem. Several incremental maintenance algorithms were proposed since the work of (Cheung et al, 1996). The majority of these algorithms maintain rule bases assuming that support threshold doesn't change. In this paper, we present incremental maintenance algorithm under support threshold change. This solution allows user to maintain its rule base under any support threshold.\nFOSS is an acronym for Free and Open Source Software. The FOSS 2013 survey primarily targets FOSS contributors and relevant anonymized dataset is publicly available under CC by SA license. In this study, the dataset is analyzed from a critical perspective using statistical and clustering techniques (especially multiple correspondence analysis) with a strong focus on women contributors towards discovering hidden trends and facts. 
Important inferences are drawn about development practices and other facets of the free software and OSS worlds.\nBased on the in-depth analysis of the essence and features of vague phenomena, this paper focuses on establishing the axiomatical foundation of membership degree theory for vague phenomena, presents an axiomatic system to govern membership degrees and their interconnections. On this basis, the concept of vague partition is introduced, further, the concept of fuzzy set introduced by Zadeh in 1965 is redefined based on vague partition from the perspective of axiomatization. The thesis defended in this paper is that the relationship among vague attribute values should be the starting point to recognize and model vague phenomena from a quantitative view.\nThe longest arc-preserving common subsequence problem is an NP-hard combinatorial optimization problem from the field of computational biology. This problem finds applications, in particular, in the comparison of arc-annotated Ribonucleic acid (RNA) sequences. In this work we propose a simple, hybrid evolutionary algorithm to tackle this problem. The most important feature of this algorithm concerns a crossover operator based on solution merging. In solution merging, two or more solutions to the problem are merged, and an exact technique is used to find the best solution within this union. It is experimentally shown that the proposed algorithm outperforms a heuristic from the literature.\nMany systems of structured argumentation explicitly require that the facts and rules that make up the argument for a conclusion be the minimal set required to derive the conclusion. ASPIC+ does not place such a requirement on arguments, instead requiring that every rule and fact that are part of an argument be used in its construction. Thus ASPIC+ arguments are minimal in the sense that removing any element of the argument would lead to a structure that is not an argument. 
In this brief note we discuss these two types of minimality and show how the first kind of minimality can, if desired, be recovered in ASPIC+.\nInteractive model analysis, the process of understanding, diagnosing, and refining a machine learning model with the help of interactive visualization, is very important for users to efficiently solve real-world artificial intelligence and data mining problems. Dramatic advances in big data analytics has led to a wide variety of interactive model analysis tasks. In this paper, we present a comprehensive analysis and interpretation of this rapidly developing area. Specifically, we classify the relevant work into three categories: understanding, diagnosis, and refinement. Each category is exemplified by recent influential work. Possible future research opportunities are also explored and discussed.\nWith the advent of modern computer networks, fault diagnosis has been a focus of research activity. This paper reviews the history of fault diagnosis in networks and discusses the main methods in information gathering section, information analyzing section and diagnosing and revolving section of fault diagnosis in networks. Emphasis will be placed upon knowledge-based methods with discussing the advantages and shortcomings of the different methods. The survey is concluded with a description of some open problems.\nThe recent evolution of induced seismicity in Central United States calls for exhaustive catalogs to improve seismic hazard assessment. Over the last decades, the volume of seismic data has increased exponentially, creating a need for efficient algorithms to reliably detect and locate earthquakes. Today's most elaborate methods scan through the plethora of continuous seismic records, searching for repeating seismic signals. 
In this work, we leverage the recent advances in artificial intelligence and present ConvNetQuake, a highly scalable convolutional neural network for earthquake detection and location from a single waveform. We apply our technique to study the induced seismicity in Oklahoma (USA). We detect 20 times more earthquakes than previously cataloged by the Oklahoma Geological Survey. Our algorithm is orders of magnitude faster than established methods.\nThe Cerebellar Model Articulation Controller (CMAC) is an influential brain-inspired computing model in many relevant fields. Since its inception in the 1970s, the model has been intensively studied and many variants of the prototype, such as Kernel-CMAC, Self-Organizing Map CMAC, and Linguistic CMAC, have been proposed. This review article focus on how the CMAC model is gradually developed and refined to meet the demand of fast, adaptive, and robust control. Two perspective, CMAC as a neural network and CMAC as a table look-up technique are presented. Three aspects of the model: the architecture, learning algorithms and applications are discussed. In the end, some potential future research directions on this model are suggested.\nIn this paper we explore whether or not deep neural architectures can learn to classify Boolean satisfiability (SAT). We devote considerable time to discussing the theoretical properties of SAT. Then, we define a graph representation for Boolean formulas in conjunctive normal form, and train neural classifiers over general graph structures called Graph Neural Networks, or GNNs, to recognize features of satisfiability. To the best of our knowledge this has never been tried before. Our preliminary findings are potentially profound. 
In a weakly-supervised setting, that is, without problem specific feature engineering, Graph Neural Networks can learn features of satisfiability.\nThe research was proposed to exploit and extend the relational and contextual nature of the information assets of the Catasto Gregoriano, kept at the Archivio di Stato in Rome. Developed within the MODEUS project (Making Open Data Effectively Usable), this study originates from the following key ideas of MODEUS: to require Open Data to be expressed in terms of an ontology, and to include such an ontology as a documentation of the data themselves. Thus, Open Data are naturally linked by means of the ontology, which meets the requirements of the Linked Open Data vision.\nIn this article we consider the basic ideas, approaches and results of developing of mathematical knowledge management technologies based on ontologies. These solutions form the basis of a specialized digital ecosystem OntoMath which consists of the ontology of the logical structure of mathematical documents Mocassin and ontology of mathematical knowledge OntoMathPRO, tools of text analysis, recommender system and other applications to manage mathematical knowledge. The studies are in according to the ideas of creating a distributed system of interconnected repositories of digitized versions of mathematical documents and project to create a World Digital Mathematical Library.\nPeople can refer to quantities in a visual scene by using either exact cardinals (e.g. one, two, three) or natural language quantifiers (e.g. few, most, all). In humans, these two processes underlie fairly different cognitive and neural mechanisms. Inspired by this evidence, the present study proposes two models for learning the objective meaning of cardinals and quantifiers from visual scenes containing multiple objects. 
We show that a model capitalizing on a 'fuzzy' measure of similarity is effective for learning quantifiers, whereas the learning of exact cardinals is better accomplished when information about number is provided.\nBecause of several technological limitations of traditional silicon based computing, for past few years a paradigm shift, from silicon to carbon, is occurring in computational world. DNA computing has been considered to be quite promising in solving computational and reasoning problems by using DNA strands. Resolution, an important aspect of automated theorem proving and mathematical logic, is a rule of inference which leads to proof by contradiction technique for sentences in propositional logic and first-order logic. This can also be called refutation theorem-proving. In this paper we have shown how the theorem proving with resolution refutation by DNA computation can be represented by the semantics of process calculus and strand graph.\nWith the range and sensitivity of algorithmic decisions expanding at a break-neck speed, it is imperative that we aggressively investigate whether programs are biased. We propose a novel probabilistic program analysis technique and apply it to quantifying bias in decision-making programs. 
Specifically, we (i) present a sound and complete automated verification technique for proving quantitative properties of probabilistic programs; (ii) show that certain notions of bias, recently proposed in the fairness literature, can be phrased as quantitative correctness properties; and (iii) present FairSquare, the first verification tool for quantifying program bias, and evaluate it on a range of decision-making programs.\nThe interactive computation paradigm is reviewed and a particular example is extended to form the stochastic analog of a computational process via a transcription of a minimal Turing Machine into an equivalent asynchronous Cellular Automaton with an exponential waiting times distribution of effective transitions. Furthermore, a special toolbox for analytic derivation of recursive relations of important statistical and other quantities is introduced in the form of an Inductive Combinatorial Hierarchy.\nGenerative model has been one of the most common approaches for solving the Dialog State Tracking Problem with the capabilities to model the dialog hypotheses in an explicit manner. The most important task in such Bayesian networks models is constructing the most reliable user models by learning and reflecting the training data into the probability distribution of user actions conditional on networks states. This paper provides an overall picture of the learning process in a Bayesian framework with an emphasize on the state-of-the-art theoretical analyses of the Expectation Maximization learning algorithm.\nBinary Knapsack Problem (BKP) is to select a subset of an element (item) set with the highest value while keeping the total weight within the capacity of the knapsack. This paper presents an integer programming model for a variation of BKP where the value of each element may depend on selecting or ignoring other elements. Strengths of such Value-Related Dependencies are assumed to be imprecise and hard to specify. 
To capture this imprecision, we have proposed modeling value-related dependencies using fuzzy graphs and their algebraic structure.\nThis paper describes the realization of the Ontology Web Search Engine. The Ontology Web Search Engine is realizable as independent project and as a part of other projects. The main purpose of this paper is to present the Ontology Web Search Engine realization details as the part of the Semantic Web Expert System and to present the results of the Ontology Web Search Engine functioning. It is expected that the Semantic Web Expert System will be able to process ontologies from the Web, generate rules from these ontologies and develop its knowledge base.\nThough the word cognitive has a wide range of meanings we define cognitive engineering as learning from brain to bolster engineering solutions. However, giving an achievable framework to the process towards this has been a difficult task. In this work we take the classic data information knowledge wisdom (DIKW) framework to set some achievable goals and sub-goals towards cognitive engineering. A layered framework like DIKW aligns nicely with the layered structure of pre-frontal cortex. And breaking the task into sub-tasks based on the layers also makes it easier to start developmental endeavours towards achieving the final goal of a brain-inspired system.\nIn recent years ontologies enjoyed a growing popularity outside specialized AI communities. System engineering is no exception to this trend, with ontologies being proposed as a basis for several tasks in complex industrial implements, including system design, monitoring and diagnosis. In this paper, we consider four different contributions to system engineering wherein ontologies are instrumental to provide enhancements over traditional ad-hoc techniques. 
For each application, we briefly report the methodologies, the tools and the results obtained with the goal to provide an assessment of merits and limits of ontologies in such domains.\nWe propose a novel approach for using unsupervised boosting to create an ensemble of generative models, where models are trained in sequence to correct earlier mistakes. Our meta-algorithmic framework can leverage any existing base learner that permits likelihood evaluation, including recent deep expressive models. Further, our approach allows the ensemble to include discriminative models trained to distinguish real data from model-generated data. We show theoretical conditions under which incorporating a new model in the ensemble will improve the fit and empirically demonstrate the effectiveness of our black-box boosting algorithms on density estimation, classification, and sample generation on benchmark datasets for a wide range of generative models.\nAs machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanation for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination. However, despite the interest in interpretability, there is very little consensus on what interpretable machine learning is and how it should be measured. In this position paper, we first define interpretability and describe when interpretability is needed (and when it is not). Next, we suggest a taxonomy for rigorous evaluation and expose open questions towards a more rigorous science of interpretable machine learning.\nMachine learning enables systems to build and update domain models based on runtime observations. In this paper, we study statistical model checking and runtime verification for systems with this ability. Two challenges arise: (1) Models built from limited runtime data yield uncertainty to be dealt with. 
(2) There is no definition of satisfaction w.r.t. uncertain hypotheses. We propose such a definition of subjective satisfaction based on recently introduced satisfaction functions. We also propose the BV algorithm as a Bayesian solution to runtime verification of subjective satisfaction under model uncertainty. BV provides user-definable stochastic bounds for type I and II errors. We discuss empirical results from an example application to illustrate our ideas.\nWe introduce Stacked Thompson Bandits (STB) for efficiently generating plans that are likely to satisfy a given bounded temporal logic requirement. STB uses a simulation for evaluation of plans, and takes a Bayesian approach to using the resulting information to guide its search. In particular, we show that stacking multiarmed bandits and using Thompson sampling to guide the action selection process for each bandit enables STB to generate plans that satisfy requirements with a high probability while only searching a fraction of the search space.\nAlgorithm learning is a core problem in artificial intelligence with significant implications for the level of automation that can be achieved by machines. Recently, deep learning methods have emerged for synthesizing an algorithm from its input-output examples, the most successful being the Neural GPU, capable of learning multiplication. We present several improvements to the Neural GPU that substantially reduce training time and improve generalization. We introduce a technique of general applicability to use hard nonlinearities with saturation cost. We also introduce a technique of diagonal gates that can be applied to active-memory models. The proposed architecture is the first capable of learning decimal multiplication end-to-end.\nThe principle of common cause asserts that positive correlations between causally unrelated events ought to be explained through the action of some shared causal factors. 
Reichenbachian common cause systems are probabilistic structures aimed at accounting for cases where correlations of the aforesaid sort cannot be explained through the action of a single common cause. The existence of Reichenbachian common cause systems of arbitrary finite size for each pair of non-causally correlated events was allegedly demonstrated by Hofer-Szab\\'o and R\\'edei in 2006. This paper shows that their proof is logically deficient, and we propose an improved proof.\nSophisticated gated recurrent neural network architectures like LSTMs and GRUs have been shown to be highly effective in a myriad of applications. We develop an un-gated unit, the statistical recurrent unit (SRU), that is able to learn long term dependencies in data by only keeping moving averages of statistics. The SRU's architecture is simple, un-gated, and contains a comparable number of parameters to LSTMs; yet, SRUs perform favorably to more sophisticated LSTM and GRU alternatives, often outperforming one or both in various tasks. We show the efficacy of SRUs as compared to LSTMs and GRUs in an unbiased manner by optimizing respective architectures' hyperparameters in a Bayesian optimization scheme for both synthetic and real-world tasks.\nHuman computation games (HCGs) can provide novel solutions to intractable computational problems, help enable scientific breakthroughs, and provide datasets for artificial intelligence. However, our knowledge about how to design and deploy HCGs that appeal to players and solve problems effectively is incomplete. We present an investigatory HCG based on Super Mario Bros. We used this game in a human subjects study to investigate how different social conditions---singleplayer and multiplayer---and scoring mechanics---collaborative and competitive---affect players' subjective experiences, accuracy at the task, and the completion rate. 
In doing so, we demonstrate a novel design approach for HCGs, and discuss the benefits and tradeoffs of these mechanics in HCG design.\nUnmanned aircraft have decreased the cost required to collect remote sensing imagery, which has enabled researchers to collect high-spatial resolution data from multiple sensor modalities more frequently and easily. The increase in data will push the need for semantic segmentation frameworks that are able to classify non-RGB imagery, but this type of algorithmic development requires an increase in publicly available benchmark datasets with class labels. In this paper, we introduce a high-resolution multispectral dataset with image labels. This new benchmark dataset has been pre-split into training/testing folds in order to standardize evaluation and continue to push state-of-the-art classification frameworks for non-RGB imagery.\nEpistemic planning can be used for decision making in multi-agent situations with distributed knowledge and capabilities. Dynamic Epistemic Logic (DEL) has been shown to provide a very natural and expressive framework for epistemic planning. In this paper, we aim to give an accessible introduction to DEL-based epistemic planning. The paper starts with the most classical framework for planning, STRIPS, and then moves towards epistemic planning in a number of smaller steps, where each step is motivated by the need to be able to model more complex planning scenarios.\nThe recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question if similar methods could be derived to improve embeddings (i.e. semantic representations) of word sequences as well. We present a simple but efficient unsupervised objective to train distributed representations of sentences. 
Our method outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.\nWe present a formal measure of argument strength, which combines the ideas that conclusions of strong arguments are (i) highly probable and (ii) their uncertainty is relatively precise. Likewise, arguments are weak when their conclusion probability is low or when it is highly imprecise. We show how the proposed measure provides a new model of the Ellsberg paradox. Moreover, we further substantiate the psychological plausibility of our approach by an experiment (N = 60). The data show that the proposed measure predicts human inferences in the original Ellsberg task and in corresponding argument strength tasks. Finally, we report qualitative data taken from structured interviews on folk psychological conceptions on what argument strength means.\nWe study abductive, causal, and non-causal conditionals in indicative and counterfactual formulations using probabilistic truth table tasks under incomplete probabilistic knowledge (N = 80). We frame the task as a probability-logical inference problem. The most frequently observed response type across all conditions was a class of conditional event interpretations of conditionals; it was followed by conjunction interpretations. An interesting minority of participants neglected some of the relevant imprecision involved in the premises when inferring lower or upper probability bounds on the target conditional/counterfactual (\"halfway responses\"). 
We discuss the results in the light of coherence-based probability logic and the new paradigm psychology of reasoning.\nIn this extended abstract we describe, mainly by examples, the main elements of the Ontological Multidimensional Data Model, which considerably extends a relational reconstruction of the multidimensional data model proposed by Hurtado and Mendelzon by means of tuple-generating dependencies, equality-generating dependencies, and negative constraints as found in Datalog+-. We briefly mention some good computational properties of the model.\nAxioms can be used to model derived predicates in domain-independent planning models. Formulating models which use axioms can sometimes result in problems with much smaller search spaces and shorter plans than the original model. Previous work on axiom-aware planners focused solely on state-space search planners. We propose axiom-aware planners based on answer set programming and integer programming. We evaluate them on PDDL domains with axioms and show that they can exploit additional expressivity of axioms.\nMotion ability is one of the most important human properties, including gait as a basis of human transitional movement. Gait, as a biometric for recognizing human identities, can be captured non-intrusively as signals from wearable or portable smart devices. In this study, gait patterns are collected using a wireless platform of two sensors located at the chest and right ankle of the subjects. The raw data then undergo preprocessing and are segmented into 5-second windows. Several time- and frequency-domain features are extracted, and performance is evaluated with 5 different classifiers. Decision Tree (with all features) and K-Nearest Neighbors (with 10 selected features) classifiers reached 99.4% and 100%, respectively.\nA novel partial order is defined on the space of digraphs or hypergraphs, based on assessing the cost of producing a graph via a sequence of elementary transformations. 
Leveraging work by Knuth and Skilling on the foundations of inference, and the structure of Heyting algebras on graph space, this partial order is used to construct an intuitionistic probability measure that applies to either digraphs or hypergraphs. As logical inference steps can be represented as transformations on hypergraphs representing logical statements, this also yields an intuitionistic probability measure on spaces of theorems. The central result is also extended to yield intuitionistic probabilities based on more general weighted rule systems defined over bicartesian closed categories.\nThis paper describes an application of reinforcement learning to the mention detection task. We define a novel action-based formulation for the mention detection task, in which a model can flexibly revise past labeling decisions by grouping together tokens and assigning partial mention labels. We devise a method to create mention-level episodes and we train a model by rewarding correctly labeled complete mentions, irrespective of the inner structure created. The model yields results which are on par with a competitive supervised counterpart while being more flexible in terms of achieving targeted behavior through reward modeling and generating internal mention structure, especially on longer mentions.\nIn practice, a ranking of objects with respect to a given set of criteria is of considerable importance. However, due to lack of knowledge or information, or to time pressure, decision makers might not be able to provide a (crisp) ranking of objects from the top to the bottom. Instead, some objects might be ranked equally, or better than other objects only to some degree. In such cases, a generalization of crisp rankings to fuzzy rankings can be more useful. The aim of the article is to introduce the notion of a fuzzy ranking and to discuss several of its properties, namely orderings, similarity and indecisiveness. 
The proposed approach can be used both for group decision making and for multiple criteria decision making when uncertainty is involved.\nPairwise comparisons are an important tool of modern (multiple criteria) decision making. Since human judgments are often inconsistent, many studies have focused on how to express and measure this inconsistency, and several inconsistency indices have been proposed as alternatives to the Saaty inconsistency index and inconsistency ratio for reciprocal pairwise comparison matrices. This paper aims, firstly, to introduce a new measure of inconsistency of pairwise comparisons and to prove its basic properties; secondly, to postulate an additional axiom, an upper boundary axiom, to an existing set of axioms; and, last but not least, to provide proofs that selected inconsistency indices satisfy this additional axiom, together with their numerical comparison.\nThis paper presents the InScript corpus (Narrative Texts Instantiating Script structure). InScript is a corpus of 1,000 stories centered around 10 different scenarios. Verbs and noun phrases are annotated with event and participant types, respectively. Additionally, the text is annotated with coreference information. The corpus shows rich lexical variation and will serve as a unique resource for the study of the role of script knowledge in natural language processing.\nKnowledge graph embedding aims at translating the knowledge graph into numerical representations by transforming the entities and relations into continuous low-dimensional vectors. Recently, many methods [1, 5, 3, 2, 6] have been proposed to deal with this problem, but existing single-thread implementations of them are time-consuming for large-scale knowledge graphs. Here, we design a unified parallel framework to parallelize these methods, which achieves a significant time reduction without influencing the accuracy. 
We name our framework ParaGraphE, which provides a library for parallel knowledge graph embedding. The source code can be downloaded from https://github.com/LIBBLE/LIBBLE-MultiThread/tree/master/ParaGraphE .\nIn Operations Research, practical evaluation is essential to validate the efficacy of optimization approaches. This paper promotes the usage of performance profiles as a standard practice to visualize and analyze experimental results. It introduces a Web tool to construct and export performance profiles as SVG or HTML files. In addition, the application relies on a methodology to estimate the benefit of hypothetical solver improvements. Therefore, the tool allows one to employ what-if analysis to screen possible research directions, and identify those having the best potential. The approach is showcased on two Operations Research technologies: Constraint Programming and Mixed Integer Linear Programming.\nWe present PEC, an Event Calculus (EC) style action language for reasoning about probabilistic causal and narrative information. It has an action language style syntax similar to that of the EC variant Modular-E. Its semantics is given in terms of possible worlds which constitute possible evolutions of the domain, and builds on that of EFEC, an epistemic extension of EC. We also describe an ASP implementation of PEC and show the sense in which this is sound and complete.\nA typical IR system that delivers and stores information is affected by the problem of matching user queries with the available content on the web. Ontologies represent the extracted terms as a network graph consisting of nodes, edges, index terms, etc. The above-mentioned IR approaches provide relevance, thus satisfying the user's query. The paper also emphasizes analyzing multimedia documents and performs calculations for extracted terms using different statistical formulas. 
The proposed model reduces the semantic gap and satisfies user needs efficiently.\nThe proposed methodology is procedural, i.e. it follows a finite number of steps that extract relevant documents according to the user's query. It is based on principles of Data Mining for analyzing web data. Data Mining first integrates data to generate a warehouse. Then, it extracts useful information with the help of an algorithm. The extracted documents are represented using a Vector Based Statistical Approach that represents each document as a set of Terms.\nWe study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to value function learning. We present several reinforcement learning algorithms that leverage randomized value functions and demonstrate their efficacy through computational studies. We also prove a regret bound that establishes statistical efficiency with a tabular representation.\nDiversification-Based Learning (DBL) derives from a collection of principles and methods introduced in the field of metaheuristics that have broad applications in computing and optimization. We show that the DBL framework goes significantly beyond that of the more recent Opposition-based learning (OBL) framework introduced in Tizhoosh (2005), which has become the focus of numerous research initiatives in machine learning and metaheuristic optimization. We unify and extend earlier proposals in metaheuristic search (Glover, 1997; Glover and Laguna, 1997) to give a collection of approaches that are more flexible and comprehensive than OBL for creating intensification and diversification strategies in metaheuristic search. 
We also describe potential applications of DBL to various subfields of machine learning and optimization.\nHigher education in the fourth industrial revolution, HE 4.0, is a complex, dialectical and exciting opportunity which can potentially transform society for the better. The fourth industrial revolution is powered by artificial intelligence and it will transform the workplace from task-based characteristics to human-centred characteristics. Because of the convergence of man and machine, it will reduce the subject distance between humanities and social science as well as science and technology. This will necessarily require much more interdisciplinary teaching, research and innovation. This paper explores the impact of HE 4.0 on the mission of a university, which is teaching, research (including innovation) and service.\nIn this paper, we present a new algorithm for parallel Monte Carlo tree search (MCTS). It is based on the pipeline pattern and allows flexible management of the control flow of the operations in parallel MCTS. The pipeline pattern provides for the first structured parallel programming approach to MCTS. Moreover, we propose a new lock-free tree data structure for parallel MCTS which removes synchronization overhead. The Pipeline Pattern for Parallel MCTS algorithm (called 3PMCTS) scales very well to higher numbers of cores when compared to the existing methods.\nKeyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to predefined types. Although important in practice, this task is so far underexplored, partly due to the lack of labelled data. To overcome this, we explore several auxiliary tasks, including semantic super-sense tagging and identification of multi-word expressions, and cast the task as a multi-task learning problem with deep recurrent neural networks. 
Our multi-task models perform significantly better than previous state of the art approaches on two scientific KBC datasets, particularly for long keyphrases.\nReprogramming matter may sound far-fetched, but we have been doing it with increasing power and staggering efficiency for at least 60 years, and for centuries we have been paving the way toward the ultimate reprogrammed fate of the universe, the vessel of all programs. How will we be doing it in 60 years' time and how will it impact life and the purpose both of machines and of humans?\nWe propose a simulated annealing algorithm specifically tailored to optimise total retrieval times in a multi-level warehouse under complex pre-batched picking constraints. Experiments on real data from a picker-to-parts order picking process in the warehouse of a European manufacturer show that optimal storage assignments do not necessarily display features presumed in heuristics, such as clustering of positively correlated items or ordering of items by picking frequency.   In an experiment run on more than 4000 batched orders with 1 to 150 items per batch, the storage assignment suggested by the algorithm produces a 21\\% reduction in the total retrieval time with respect to a frequency-based storage assignment.\nThis communication presents a longitudinal model-free control approach for computing the wheel torque command to be applied on a vehicle. This setting enables us to overcome the problem of unknown vehicle parameters for generating a suitable control law. An important parameter in this control setting is made time-varying for ensuring finite-time stability. Several convincing computer simulations are displayed and discussed. Overshoots become therefore smaller. The driving comfort is increased and the robustness to time-delays is improved.\nThis report is targeted to groups who are subject matter experts in their application but deep learning novices. 
It contains practical advice for those interested in testing the use of deep neural networks on applications that are novel for deep learning. We suggest making your project more manageable by dividing it into phases. For each phase this report contains numerous recommendations and insights to assist novice practitioners.\nWe investigate the geometry of optimal memoryless time-independent decision making in relation to the amount of information that the acting agent has about the state of the system. We show that the expected long-term reward, discounted or per time step, is maximized by policies that randomize among at most $k$ actions whenever at most $k$ world states are consistent with the agent's observation. Moreover, we show that the expected reward per time step can be studied in terms of the expected discounted reward. Our main tool is a geometric version of the policy improvement lemma, which identifies a polyhedral cone of policy changes in which the state value function increases for all states.\nThis document describes the contributions of the 2016 Applications of Logic Programming Workshop (AppLP), which was held on October 17 and associated with the International Conference on Logic Programming (ICLP) in Flushing, New York City.\nWe describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for understanding which publications describe which processes, tasks and materials. Although this was a new task, we had a total of 26 submissions across 3 evaluation scenarios. We expect the task and the findings reported in this paper to be relevant for researchers working on understanding scientific content, as well as the broader knowledge base population and information extraction communities.\nThe media industry is increasingly personalizing its content offering in an attempt to better target the audience. 
This requires analyzing the relationships established between users and the content they enjoy, looking on one side at the content characteristics and on the other at the user profile, in order to find the best match between the two. In this paper we suggest building that relationship using the Dempster-Shafer Theory of Evidence, proposing a reference model and illustrating its properties by means of a toy example. Finally, we suggest possible applications of the model for tasks that are common in the modern media industry.\nThis paper shows that using separators, each of which is a pair of complementary contractors, we can easily and efficiently solve the localization problem of a robot with sonar measurements in an unstructured environment. We introduce separators associated with the Minkowski sum and the Minkowski difference in order to facilitate the resolution. A test-case is given in order to illustrate the principle of the approach.\nThis paper introduces Scavenger, the first theorem prover for pure first-order logic without equality based on the new conflict resolution calculus. Conflict resolution has a restricted resolution inference rule that resembles (a first-order generalization of) unit propagation as well as a rule for assuming decision literals and a rule for deriving new clauses by (a first-order generalization of) conflict-driven clause learning.\nA fundamental discrepancy between first-order logic and statistical inference (global versus local properties of the universe) is shown to be the obstacle to integrating logic and probability in the L.p. logic of Bacchus. To overcome the counterintuitiveness of L.p. behaviour, a 3-valued logic is proposed.\nCASP is an extension of ASP that allows for numerical constraints to be added in the rules. PDDL+ is an extension of the PDDL standard language of automated planning for modeling mixed discrete-continuous dynamics.   
In this paper, we present CASP solutions for dealing with PDDL+ problems, i.e., encoding from PDDL+ to CASP, and extensions to the algorithm of the EZCSP CASP solver in order to solve CASP programs arising from PDDL+ domains. An experimental analysis, performed on well-known linear and non-linear variants of PDDL+ domains, involving various configurations of the EZCSP solver, other CASP solvers, and PDDL+ planners, shows the viability of our solution.\nWe develop our interpretation of the joint belief distribution and of evidential updating that matches the following basic requirements:   * there must exist an efficient method for reasoning within this framework   * there must exist a clear correspondence between the contents of the knowledge base and the real world   * there must be a clear correspondence between the reasoning method and some real world process   * there must exist a clear correspondence between the results of the reasoning process and the results of the real world process corresponding to the reasoning process.\nA key enabler for optimizing business processes is accurately estimating the probability distribution of a time series future given its past. Such probabilistic forecasts are crucial for example for reducing excess inventory in supply chains. In this paper we propose DeepAR, a novel methodology for producing accurate probabilistic forecasts, based on training an auto-regressive recurrent network model on a large number of related time series. We show through extensive empirical evaluation on several real-world forecasting data sets that our methodology is more accurate than state-of-the-art models, while requiring minimal feature engineering.\nWe propose a new task-specification language for Markov decision processes that is designed to be an improvement over reward functions by being environment independent. 
The language is a variant of Linear Temporal Logic (LTL) that is extended to probabilistic specifications in a way that permits approximations to be learned in finite time. We provide several small environments that demonstrate the advantages of our geometric LTL (GLTL) language and illustrate how it can be used to specify standard reinforcement-learning tasks straightforwardly.\nIn this paper we propose the creation of generic LSH families for the angular distance based on Johnson-Lindenstrauss projections. We show that feature hashing is a valid J-L projection and propose two new LSH families based on feature hashing. These new LSH families are tested on both synthetic and real datasets with very good results and a considerable performance improvement over other LSH families. While the theoretical analysis is done for the angular distance, these families can also be used in practice for the euclidean distance with excellent results [2]. Our tests using real datasets show that the proposed LSH functions work well for the euclidean distance.\nCatastrophic forgetting has a serious impact in reinforcement learning, as the data distribution is generally sparse and non-stationary over time. The purpose of this study is to investigate whether pseudorehearsal can increase performance of an actor-critic agent with neural-network based policy selection and function approximation in a pole balancing task and compare different pseudorehearsal approaches. We expect that pseudorehearsal assists learning even in such very simple problems, given proper initialization of the rehearsal parameters.\nEligibility traces in reinforcement learning are used as a bias-variance trade-off and can often speed up training time by propagating knowledge back over time-steps in a single update. We investigate the use of eligibility traces in combination with recurrent networks in the Atari domain. 
We illustrate the benefits of both recurrent nets and eligibility traces in some Atari games, and highlight also the importance of the optimization used in the training.\nWe introduce the first deep reinforcement learning agent that learns to beat Atari games with the aid of natural language instructions. The agent uses a multimodal embedding between environment observations and natural language to self-monitor progress through a list of English instructions, granting itself reward for completing instructions in addition to increasing the game score. Our agent significantly outperforms Deep Q-Networks (DQNs), Asynchronous Advantage Actor-Critic (A3C) agents, and the best agents posted to OpenAI Gym on what is often considered the hardest Atari 2600 environment: Montezuma's Revenge.\nIn this work a mixed agent-based and discrete event simulation model is developed for a high frequency bus route in the Netherlands. With this model, different passenger growth scenarios can be easily evaluated. This simulation model helps policy makers to predict changes that have to be made to bus routes and planned travel times before problems occur. The model is validated using several performance indicators, showing that under some model assumptions, it can realistically simulate real-life situations. The simulation's workings are illustrated by two use cases.\nStochastic Constraint Programming (SCP) is an extension of Constraint Programming (CP) used for modelling and solving problems involving constraints and uncertainty. SCP inherits excellent modelling abilities and filtering algorithms from CP, but so far it has not been applied to large problems. Reinforcement Learning (RL) extends Dynamic Programming to large stochastic problems, but is problem-specific and has no generic solvers. We propose a hybrid combining the scalability of RL with the modelling and constraint filtering methods of CP. 
We implement a prototype in a CP system and demonstrate its usefulness on SCP problems.\nData stream learning has been widely studied for extracting knowledge structures from continuous and rapid data records. In the semantic Web, data is interpreted in ontologies and its ordered sequence is represented as an ontology stream. Our work exploits the semantics of such streams to tackle the problem of concept drift, i.e., unexpected changes in data distribution that cause most models to become less accurate as time passes. To this end we revisited (i) semantic inference in the context of supervised stream learning, and (ii) models with semantic embeddings. The experiments show accurate prediction with data from Dublin and Beijing.\nTasks like code generation and semantic parsing require mapping unstructured (or partially structured) inputs to well-formed, executable outputs. We introduce abstract syntax networks, a modeling framework for these problems. The outputs are represented as abstract syntax trees (ASTs) and constructed by a decoder with a dynamically-determined modular structure paralleling the structure of the output tree. On the benchmark Hearthstone dataset for code generation, our model obtains 79.2 BLEU and 22.7% exact match accuracy, compared to previous state-of-the-art values of 67.1 and 6.1%. Furthermore, we perform competitively on the Atis, Jobs, and Geo semantic parsing datasets with no task-specific engineering.\nWe propose a simple, yet effective, approach towards inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach leverages the interlanguage links of Wikipedia followed by character-level classifiers to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the state-of-the-art, heuristics-heavy approaches for six languages. 
As a consequence of our work, we release presumably the largest and the most accurate multilingual taxonomic resource spanning over 280 languages.\nAs entity type systems become richer and more fine-grained, we expect the number of types assigned to a given entity to increase. However, most fine-grained typing work has focused on datasets that exhibit a low degree of type multiplicity. In this paper, we consider the high-multiplicity regime inherent in data sources such as Wikipedia that have semi-open type systems. We introduce a set-prediction approach to this problem and show that our model outperforms unstructured baselines on a new Wikipedia-based fine-grained typing corpus.\nWe introduce an attention-based Bi-LSTM for Chinese implicit discourse relations and demonstrate that modeling argument pairs as a joint sequence can outperform word order-agnostic approaches. Our model benefits from a partial sampling scheme and is conceptually simple, yet achieves state-of-the-art performance on the Chinese Discourse Treebank. We also visualize its attention activity to illustrate the model's ability to selectively focus on the relevant parts of an input sequence.\nWit is a quintessential form of rich inter-human interaction, and is often grounded in a specific situation (e.g., a comment in response to an event). In this work, we attempt to build computational models that can produce witty descriptions for a given image. Inspired by a cognitive account of humor appreciation, we employ linguistic wordplay, specifically puns. We compare our approach against meaningful baseline approaches via human studies. In a Turing test style evaluation, people find our model's description for an image to be wittier than a human's witty description 55% of the time!\nWord embeddings provide point representations of words containing useful semantic information. 
We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic information, and outperforms alternatives, such as word2vec skip-grams, and Gaussian embeddings, on benchmark datasets such as word similarity and entailment.\nA popular curve shown in introductory maths textbooks seems like a circle, but it is actually a different curve. This paper discusses some elementary approaches to identify the geometric object, including novel technological means by using GeoGebra. We demonstrate two ways to refute the false impression, two suggestions to find a correct conjecture, and four ways to confirm the result by proving it rigorously. All of the discussed approaches can be introduced in classrooms at various levels from middle school to high school.\nThe notion of events has occupied a central role in modeling and has an influence in computer science and philosophy. Recent developments in diagrammatic modeling have made it possible to examine conceptual representation of events. This paper explores some aspects of the notion of events that are produced by applying a new diagrammatic methodology with a focus on the interaction of events with such concepts as time, space, and objects. The proposed description applies to abstract machines where events form the dynamic phases of a system. The results of this nontechnical research can be utilized in many fields where the notion of an event is typically used in interdisciplinary application.\nKiwi is a minimalist and extendable Constraint Programming (CP) solver specifically designed for education. The particularities of Kiwi stand in its generic trailing state restoration mechanism and its modulable use of variables.
By developing Kiwi, the author does not aim to provide an alternative to full-featured constraint solvers but rather to provide readers with a basic architecture that will (hopefully) help them to understand the core mechanisms hidden under the hood of constraint solvers, to develop their own extended constraint solver, or to test innovative ideas.\nDempster-Shafer evidence theory is widely applied in multi-sensor data fusion. However, a great deal of uncertainty and interference exists in practical situations, especially on the battlefield. It is still an open issue to model the reliability of sensor reports. Many methods have been proposed based on the relationship among collected data. In this letter, we propose a quantum mechanical approach to evaluate the reliability of sensor reports, which is based on the properties of a sensor itself. The proposed method is used to modify the combination of evidence.\nImaging is a form of probabilistic belief change which could be employed for both revision and update. In this paper, we propose a new framework for probabilistic belief change based on imaging, called Expected Distance Imaging (EDI). EDI is sufficiently general to define Bayesian conditioning and other forms of imaging previously defined in the literature. We argue that, and investigate how, EDI can be used for both revision and update. EDI's definition depends crucially on a weight function whose properties are studied and whose effect on belief change operations is analysed. Finally, four EDI instantiations are proposed, two for revision and two for update, and probabilistic rationality postulates are suggested for their analysis.\nWe present a tool, simplify-defun, that transforms the definition of a given function into a simplified definition of a new function, providing a proof checked by ACL2 that the old and new functions are equivalent. When appropriate it also generates termination and guard proofs for the new function.
We explain how the tool is engineered so that these proofs will succeed. Examples illustrate its utility, in particular for program transformation in synthesis and verification.\nLogic-based event recognition systems infer occurrences of events in time using a set of event definitions in the form of first-order rules. The Event Calculus is a temporal logic that has been used as a basis in event recognition applications, providing among others, direct connections to machine learning, via Inductive Logic Programming (ILP). OLED is a recently proposed ILP system that learns event definitions in the form of Event Calculus theories, in a single pass over a data stream. In this work we present a version of OLED that allows for distributed, online learning. We evaluate our approach on a benchmark activity recognition dataset and show that we can significantly reduce training times, exchanging minimal information between processing nodes.\nThis paper introduces an SLD-resolution technique based on deep learning. This technique enables neural networks to learn from old and successful resolution processes and to use learnt experiences to guide new resolution processes. An implementation of this technique is named SLDR-DL. It includes a Prolog library of deep feedforward neural networks and some essential functions of resolution. In the SLDR-DL framework, users can define logical rules in the form of definite clauses and teach neural networks to use the rules in reasoning processes.\nWith pressure to increase graduation rates and reduce time to degree in higher education, it is important to identify at-risk students early. Automated early warning systems are therefore highly desirable. In this paper, we use unsupervised clustering techniques to predict the graduation status of declared majors in five departments at California State University Northridge (CSUN), based on a minimal number of lower division courses in each major. 
In addition, we use the detected clusters to identify hidden bottleneck courses.\nWe investigate GPU-based parallelization of Iterative-Deepening A* (IDA*). We show that straightforward thread-based parallelization techniques which were previously proposed for massively parallel SIMD processors perform poorly due to warp divergence and load imbalance. We propose Block-Parallel IDA* (BPIDA*), which assigns the search of a subtree to a block (a group of threads with access to fast shared memory) rather than a thread. On the 15-puzzle, BPIDA* on an NVIDIA GRID K520 with 1536 CUDA cores achieves a speedup of 4.98 compared to a highly optimized sequential IDA* implementation on a Xeon E5-2670 core.\nContinuous/Lifelong learning of high-dimensional data streams is a challenging research problem. In fact, fully retraining models each time new data become available is infeasible, due to computational and storage issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. In the context of real-world object recognition applications (e.g., robotic vision), where continuous learning is crucial, very few datasets and benchmarks are available to evaluate and compare emerging techniques. In this work we propose a new dataset and benchmark CORe50, specifically designed for continuous object recognition, and introduce baseline approaches for different continuous learning scenarios.\nWe present a new deep meta reinforcement learner, which we call Deep Episodic Value Iteration (DEVI). DEVI uses a deep neural network to learn a similarity metric for a non-parametric model-based reinforcement learning algorithm. Our model is trained end-to-end via back-propagation.
We show that, despite being trained using the model-free Q-learning objective, DEVI's model-based internal structure provides 'one-shot' transfer to changes in reward and transition structure, even for tasks with very high-dimensional state spaces.\nWe propose a new mechanism for integration of OWL ontologies using semantic import relations. In contrast to the standard OWL importing, we do not require all axioms of the imported ontologies to be taken into account for reasoning tasks, but only their logical implications over a chosen signature. This property comes naturally in many ontology integration scenarios, especially when the number of ontologies is large. In this paper, we study the complexity of reasoning over ontologies with semantic import relations and establish a range of tight complexity bounds for various fragments of OWL.\nThis paper introduces an end-to-end fine-tuning method to improve hand-eye coordination in modular deep visuo-motor policies (modular networks) where each module is trained independently. Benefiting from weighted losses, the fine-tuning method significantly improves the performance of the policies for a robotic planar reaching task.\nProbabilistic modeling enables combining domain knowledge with learning from data, thereby supporting learning from fewer training instances than purely data-driven methods. However, learning probabilistic models is difficult and has not achieved the level of performance of methods such as deep neural networks on many tasks. In this paper, we attempt to address this issue by presenting a method for learning the parameters of a probabilistic program using backpropagation.
Our approach opens the possibility of building deep probabilistic programming models that are trained in a similar way to neural networks.\nWe introduce a novel repeated Inverse Reinforcement Learning problem: the agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks in which it surprises the human by acting suboptimally with respect to how the human would have acted. Each time the human is surprised, the agent is provided a demonstration of the desired behavior by the human. We formalize this problem, including how the sequence of tasks is chosen, in a few different ways and provide some foundational results.\nThe asynchronous nature of state-of-the-art reinforcement learning algorithms, such as the Asynchronous Advantage Actor-Critic algorithm, makes them exceptionally suitable for CPU computations. However, given the fact that deep reinforcement learning often deals with interpreting visual information, a large part of the training and inference time is spent performing convolutions. In this work we present our results on learning strategies in Atari games using a Convolutional Neural Network, the Math Kernel Library and the TensorFlow 0.11rc0 machine learning framework. We also analyze effects of asynchronous computations on the convergence of reinforcement learning algorithms.\nIn this paper we introduce RankPL, a modeling language that can be thought of as a qualitative variant of a probabilistic programming language with a semantics based on Spohn's ranking theory. Broadly speaking, RankPL can be used to represent and reason about processes that exhibit uncertainty expressible by distinguishing \"normal\" from \"surprising\" events. RankPL allows (iterated) revision of rankings over alternative program states and supports various types of reasoning, including abduction and causal inference. We present the language, its denotational semantics, and a number of practical examples.
We also discuss an implementation of RankPL that is available for download.\nThe Maximum Balanced Biclique Problem is a well-known graph model with relevant applications in diverse domains. This paper introduces a novel algorithm, which combines an effective constraint-based tabu search procedure and two dedicated graph reduction techniques. We verify the effectiveness of the algorithm on 30 classical random benchmark graphs and 25 very large real-life sparse graphs from the popular Koblenz Network Collection (KONECT). The results show that the algorithm improves the best-known results (new lower bounds) for 10 classical benchmarks and obtains the optimal solutions for 14 KONECT instances.\nThompson sampling has emerged as an effective heuristic for a broad range of online decision problems. In its basic form, the algorithm requires computing and sampling from a posterior distribution over models, which is tractable only for simple special cases. This paper develops ensemble sampling, which aims to approximate Thompson sampling while maintaining tractability even in the face of complex models such as neural networks. Ensemble sampling dramatically expands on the range of applications for which Thompson sampling is viable. We establish a theoretical basis that supports the approach and present computational results that offer further insight.\nCombinatorial evolution and forecasting of system requirements is examined. The morphological model is used for a hierarchical requirements system (i.e., system parts, design alternatives for the system parts, ordinal estimates for the alternatives). A set of system changes involves changes of the system structure, component alternatives and their estimates. The composition process of the forecast is based on combinatorial synthesis (knapsack problem, multiple choice problem, hierarchical morphological design). 
An illustrative numerical example for four-phase evolution and forecasting of requirements to communications is described.\nTransforming a graphical user interface screenshot created by a designer into computer code is a typical task conducted by a developer in order to build customized software, websites, and mobile applications. In this paper, we show that deep learning methods can be leveraged to train a model end-to-end to automatically generate code from a single input image with over 77% accuracy for three different platforms (i.e., iOS, Android, and web-based technologies).\nDevelopments in deep generative models have allowed for tractable learning of high-dimensional data distributions. While the employed learning procedures typically assume that training data is drawn i.i.d. from the distribution of interest, it may be desirable to model distinct distributions which are observed sequentially, such as when different classes are encountered over time. Although conditional variations of deep generative models permit multiple distributions to be modeled by a single network in a disentangled fashion, they are susceptible to catastrophic forgetting when the distributions are encountered sequentially. In this paper, we adapt recent work in reducing catastrophic forgetting to the task of training generative adversarial networks on a sequence of distinct distributions, enabling continual generative modeling.\nWe study fairness in collaborative-filtering recommender systems, which are sensitive to discrimination that exists in historical data. Biased data can lead collaborative-filtering methods to make unfair predictions for users from minority groups. We identify the insufficiency of existing fairness metrics and propose four new metrics that address different forms of unfairness. These fairness metrics can be optimized by adding fairness terms to the learning objective.
Experiments on synthetic and real data show that our new metrics can better measure fairness than the baseline, and that the fairness objectives effectively help reduce unfairness.\nWe study causal inference in a multi-environment setting, in which the functional relations for producing the variables from their direct causes remain the same across environments, while the distribution of exogenous noises may vary. We introduce the idea of using the invariance of the functional relations of the variables to their causes across a set of environments. We define a notion of completeness for a causal inference algorithm in this setting and prove the existence of such an algorithm by proposing the baseline algorithm. Additionally, we present an alternate algorithm that has significantly improved computational and sample complexity compared to the baseline algorithm. The experimental results show that the proposed algorithm outperforms the other existing algorithms.\nThis paper focuses on detecting anomalies in a digital video broadcasting (DVB) system from the providers' perspective. We learn a probabilistic deterministic real timed automaton profiling benign behavior of encryption control in the DVB control access system. This profile is used as a one-class classifier. Anomalous items in a testing sequence are detected when the sequence is not accepted by the learned model.\nAbstraction is a fundamental tool for reasoning about complex systems. Program abstraction has been utilized to great effect for analyzing deterministic programs. At the heart of program abstraction is the relationship between a concrete program, which is difficult to analyze, and an abstract program, which is more tractable. Program abstractions, however, are typically not probabilistic. We generalize non-deterministic program abstractions to probabilistic program abstractions by explicitly quantifying the non-deterministic choices.
Our framework upgrades key definitions and properties of abstractions to the probabilistic context. We also discuss preliminary ideas for performing inference on probabilistic abstractions and general probabilistic programs.\nThe act of persuasion, a key component in rhetoric argumentation, may be viewed as a dynamics modifier. Such modifiers are well-known in other research fields: recall dynamic epistemic logic where operators modify possible world accessibilities, or recall side effects and concurrency in programming languages. We consider persuasion in abstract argumentation as undertaking a similar role. We extend Dung's frameworks with acts of persuasion among agents into Abstract Persuasion Argumentation (APA), and set forth properties related to arguments' admissibilities. We show a way of enriching our basic notion of admissibility through CTL (computation tree logic) encoding, which also permits importation of the theoretical results known to the logic into our argumentation frameworks.\nIn this work, we present a novel approach to ontology reasoning that is based on deep learning rather than logic-based formal reasoning. To this end, we introduce a new model for statistical relational learning that is built upon deep recursive neural networks, and give experimental evidence that it can easily compete with, or even outperform, existing logic-based reasoners on the task of ontology reasoning. More precisely, we compared our implemented system with one of the best logic-based ontology reasoners at present, RDFox, on a number of large standard benchmark datasets, and found that our system attained high reasoning quality, while being up to two orders of magnitude faster.\nRecent advances in combining deep learning and Reinforcement Learning have shown a promising path for designing new control agents that can learn optimal policies for challenging control tasks. 
These new methods address the main limitations of conventional Reinforcement Learning methods such as customized feature engineering and small action/state space dimension requirements. In this paper, we leverage one of the state-of-the-art Reinforcement Learning methods, known as Trust Region Policy Optimization, to tackle intersection management for autonomous vehicles. We show that using this method, we can perform fine-grained acceleration control of autonomous vehicles in a grid street plan to achieve a global design objective.\nMotivated by concerns for user privacy, we design a steganographic system (\"stegosystem\") that enables two users to exchange encrypted messages without an adversary detecting that such an exchange is taking place. We propose a new linguistic stegosystem based on a Long Short-Term Memory (LSTM) neural network. We demonstrate our approach on the Twitter and Enron email datasets and show that it yields high-quality steganographic text while significantly improving capacity (encrypted bits per word) relative to the state-of-the-art.\nMany papers have been published on the knowledge base completion task in the past few years. Most of these introduce novel architectures for relation learning that are evaluated on standard datasets such as FB15k and WN18. This paper shows that the accuracy of almost all models published on the FB15k can be outperformed by an appropriately tuned baseline - our reimplementation of the DistMult model. Our findings cast doubt on the claim that the performance improvements of recent models are due to architectural changes as opposed to hyper-parameter tuning or different training objectives. This should prompt future research to re-consider how the performance of models is evaluated and reported.\nWe present a new model DrNET that learns disentangled image representations from video. 
Our approach leverages the temporal coherence of video and a novel adversarial loss to learn a representation that factorizes each frame into a stationary part and a temporally varying component. The disentangled representation can be used for a range of tasks. For example, applying a standard LSTM to the time-varying components enables prediction of future frames. We evaluate our approach on a range of synthetic and real videos, demonstrating the ability to coherently generate hundreds of steps into the future.\nHumans are experts at handling the large amount of sensory data they deal with each moment. The human brain not only analyses these data but also starts synthesizing new information from the existing data. Big-data systems of the current age are needed not just to analyze data but also to come up with new interpretations. We believe that the pivotal ability in the human brain that enables us to do this is what is known as \"intuition\". Here, we present an intuition-based architecture for big data analysis and synthesis.\nWe propose a generative machine comprehension model that learns jointly to ask and answer questions based on documents. The proposed model uses a sequence-to-sequence framework that encodes the document and generates a question (answer) given an answer (question). Significant improvement in model performance is observed empirically on the SQuAD corpus, confirming our hypothesis that the model benefits from jointly learning to perform both tasks. We believe the joint model's novelty offers a new perspective on machine comprehension beyond architectural engineering, and serves as a first step towards autonomous information seeking.\nWe describe the Marmara Turkish Coreference Corpus, which is an annotation of the whole METU-Sabanci Turkish Treebank with mentions and coreference chains. Collecting nine or more independent annotations for each document allowed for fully automatic adjudication.
We provide a baseline system for Turkish mention detection and coreference resolution and evaluate it on the corpus.\nCPU scheduling is the basis of multiprogramming. Scheduling is a process that decides the order of tasks from a set of multiple tasks that are ready to execute. There are a number of CPU scheduling algorithms available, but it is a very difficult task to decide which one is better. This paper discusses the design and implementation of a modified fuzzy-based CPU scheduling algorithm. This paper presents a new set of fuzzy rules. It demonstrates that scheduling done with the new priority improves average waiting time and average turnaround time.\nThe article contains a preliminary glance at balanced clustering problems. Basic balanced structures and combinatorial balanced problems are briefly described. Special attention is given to various balance/unbalance indices (including some new versions of the indices): by cluster cardinality, by cluster weights, by inter-cluster edge/arc weights, by cluster element structure (for element multi-type clustering). Further, versions of optimization clustering problems are suggested (including multicriteria problem formulations). Illustrative numerical examples describe calculation of balance indices and element multi-type balance clustering problems (including an example for the design of student teams).\nMOBAs represent a huge segment of online gaming and are growing as both an eSport and a casual genre. The natural starting point for AI researchers interested in MOBAs is to develop an AI to play the game better than a human, but MOBAs have many more challenges besides adversarial AI. In this paper we introduce the reader to the wider context of MOBA culture, propose a range of challenges faced by the community today, and posit concrete AI projects that can be undertaken to begin solving them.\nWe present a probabilistic extension of the description logic $\\mathcal{ALC}$ for reasoning about statistical knowledge.
We consider conditional statements over proportions of the domain and are interested in the probabilistic-logical consequences of these proportions. After introducing some general reasoning problems and analyzing their properties, we present first algorithms and complexity results for reasoning in some fragments of Statistical $\\mathcal{ALC}$.\nThe highly influential framework of conceptual spaces provides a geometric way of representing knowledge. It aims at bridging the gap between symbolic and subsymbolic processing. Instances are represented by points in a high-dimensional space and concepts are represented by convex regions in this space. In this paper, we present our approach towards grounding the dimensions of a conceptual space in latent spaces learned by an InfoGAN from unlabeled data.\nMulti-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery. This article aims to give a general overview of MTL, particularly in deep neural networks. It introduces the two most common methods for MTL in Deep Learning, gives an overview of the literature, and discusses recent advances. In particular, it seeks to help ML practitioners apply MTL by shedding light on how MTL works and providing guidelines for choosing appropriate auxiliary tasks.\nWe propose a novel embedding model that represents relationships among several elements in bibliographic information with high representation ability and flexibility. Based on this model, we present a novel search system that shows the relationships among the elements in the ACL Anthology Reference Corpus. The evaluation results show that our model can achieve a high prediction ability and produce reasonable search results.\nIn horizontal collaborations, carriers form coalitions in order to perform parts of their logistics operations jointly. 
By exchanging transportation requests with each other, they can operate more efficiently and in a more sustainable way. Collaborative vehicle routing has been extensively discussed in the literature. We identify three major streams of research: (i) centralized collaborative planning, (ii) decentralized planning without auctions, and (iii) auction-based decentralized planning. For each of them we give a structured overview of the state of knowledge and discuss future research directions.\nTraditional GANs use a deterministic generator function (typically a neural network) to transform a random noise input $z$ to a sample $\\mathbf{x}$ that the discriminator seeks to distinguish. We propose a new GAN called Bayesian Conditional Generative Adversarial Networks (BC-GANs) that use a random generator function to transform a deterministic input $y'$ to a sample $\\mathbf{x}$. Our BC-GANs extend traditional GANs to a Bayesian framework, and naturally handle unsupervised learning, supervised learning, and semi-supervised learning problems. Experiments show that the proposed BC-GANs outperform the state of the art.\nIn a recent article [Oh'15], Oh examined the impact of various key heuristics (e.g., deletion strategy, restart policy, decay factor, database reduction) in competitive SAT solvers. His key findings are that their expected success depends on whether the input formula is satisfiable or not. To further investigate these findings, we focused on two properties of satisfiable formulas: the entropy of the formula, which approximates the freedom we have in assigning the variables, and the solution density, which is the number of solutions divided by the size of the search space. We found that both better predict the effect of these heuristics, and that satisfiable formulas with small entropy 'behave' similarly to unsatisfiable formulas.\nThis approach presents a multi-valued representation of the neutrosophic information.
It highlights the link between the bifuzzy information and neutrosophic one. The constructed deca-valued structure shows the neutrosophic information complexity. This deca-valued structure led to construction of two new concepts for the neutrosophic information: neutro-entropy and anti-entropy. These two concepts are added to the two existing: entropy and non-entropy. Thus, we obtained the following triad: entropy, neutro-entropy and anti-entropy.\nThis paper focuses on preserving the privacy of sensitive patterns when inducing decision trees. We adopt a record augmentation approach for hiding sensitive classification rules in binary datasets. Such a hiding methodology is preferred over other heuristic solutions like output perturbation or cryptographic techniques - which restrict the usability of the data - since the raw data itself is readily available for public use. We show some key lemmas which are related to the hiding process and we also demonstrate the methodology with an example and an indicative experiment using a prototype hiding tool.\nWe propose ThalNet, a deep learning model inspired by neocortical communication via the thalamus. Our model consists of recurrent neural modules that send features through a routing center, endowing the modules with the flexibility to share features over multiple time steps. We show that our model learns to route information hierarchically, processing input data by a chain of modules. We observe common architectures, such as feed forward neural networks and skip connections, emerging as special cases of our architecture, while novel connectivity patterns are learned for the text8 compression task. Our model outperforms standard recurrent neural networks on several sequential benchmarks.\nThis paper introduces Dex, a reinforcement learning environment toolkit specialized for training and evaluation of continual learning methods as well as general reinforcement learning problems. 
We also present the novel continual learning method of incremental learning, where a challenging environment is solved using optimal weight initialization learned from first solving a similar, easier environment. We show that incremental learning can produce vastly superior results to standard methods by providing a strong baseline method across ten Dex environments. We finally develop a saliency method for qualitative analysis of reinforcement learning, which shows the impact incremental learning has on network attention.\nMulti-agent predictive modeling is an essential step for understanding physical, social and team-play systems. Recently, Interaction Networks (INs) were proposed for the task of modeling multi-agent physical systems; INs scale with the number of interactions in the system (typically quadratic or higher order in the number of agents). In this paper we introduce VAIN, a novel attentional architecture for multi-agent predictive modeling that scales linearly with the number of agents. We show that VAIN is effective for multi-agent predictive modeling. Our method is evaluated on tasks from challenging multi-agent prediction domains: chess and soccer, and outperforms competing multi-agent approaches.\nWe build deep RL agents that execute declarative programs expressed in formal language. The agents learn to ground the terms in this language in their environment, and can generalize their behavior at test time to execute new programs that refer to objects that were not referenced during training. The agents develop disentangled interpretable representations that allow them to generalize to a wide variety of zero-shot semantic tasks.\nExplaining the behavior of a black box machine learning model at the instance level is useful for building trust. However, what is also important is understanding how the model behaves globally.
Such an understanding provides insight into both the data on which the model was trained and the generalization power of the rules it learned. We present here an approach that learns rules to explain globally the behavior of black box machine learning models. Collectively these rules represent the logic learned by the model and are hence useful for gaining insight into its behavior. We demonstrate the power of the approach on three publicly available data sets.\nWe study the reachability problem for systems implemented as feed-forward neural networks whose activation function is implemented via ReLU functions. We draw a correspondence between establishing whether some arbitrary output can ever be outputed by a neural system and linear problems characterising a neural system of interest. We present a methodology to solve cases of practical interest by means of a state-of-the-art linear programs solver. We evaluate the technique presented by discussing the experimental results obtained by analysing reachability properties for a number of benchmarks in the literature.\nWe study pure coordination games where in every outcome, all players have identical payoffs, 'win' or 'lose'. We identify and discuss a range of 'purely rational principles' guiding the reasoning of rational players in such games and analyze which classes of coordination games can be solved by such players with no preplay communication or conventions. We observe that it is highly nontrivial to delineate a boundary between purely rational principles and other decision methods, such as conventions, for solving such coordination games.\nIn this article, we extend the conventional framework of convolutional-Restricted-Boltzmann-Machine to learn highly abstract features among abitrary number of time related input maps by constructing a layer of multiplicative units, which capture the relations among inputs. 
In many cases, more than two maps are strongly related, so it is wise to make multiplicative unit learn relations among more input maps, in other words, to find the optimal relational-order of each unit. In order to enable our machine to learn relational order, we developed a reinforcement-learning method whose optimality is proven to train the network.\nTemporal landmarks have been proved to be a helpful mechanism to deal with temporal planning problems, specifically to improve planners performance and handle problems with deadline constraints. In this paper, we show the strength of using temporal landmarks to handle the state trajectory constraints of PDDL3.0. We analyze the formalism of TempLM, a temporal planner particularly aimed at solving planning problems with deadlines, and we present a detailed study that exploits the underlying temporal landmark-based mechanism of TempLM for representing and reasoning with trajectory constraints.\nThe task of learning to pick a single preferred example out a finite set of examples, an \"optimal choice problem\", is a supervised machine learning problem with complex, structured input. Problems of optimal choice emerge often in various practical applications. We formalize the problem, show that it does not satisfy the assumptions of statistical learning theory, yet it can be solved efficiently in some cases. We propose two approaches to solve the problem. Both of them reach good solutions on real life data from a signal processing application.\nHedonic games are meant to model how coalitions of people form and break apart in the real world. However, it is difficult to run simulations when everything must be done by hand on paper. We present an online software that allows fast and visual simulation of several types of hedonic games. http://lukemiles.org/hedonic-games/\nThe number of complete chloroplastic genomes increases day after day, making it possible to rethink plants phylogeny at the biomolecular era. 
Given a set of close plants sharing in the order of one hundred of core chloroplastic genes, this article focuses on how to extract the largest subset of sequences in order to obtain the most supported species tree. Due to computational complexity, a discrete and distributed Particle Swarm Optimization (DPSO) is proposed. It is finally applied to the core genes of Rosales order.\nIn Constraint Programming (CP) a portfolio solver combines a variety of different constraint solvers for solving a given problem. This fairly recent approach enables to significantly boost the performance of single solvers, especially when multicore architectures are exploited. In this work we give a brief overview of the portfolio solver sunny-cp, and we discuss its performance in the MiniZinc Challenge---the annual international competition for CP solvers---where it won two gold medals in 2015 and 2016. Under consideration in Theory and Practice of Logic Programming (TPLP)\nThis paper investigates how high school students approach computing through an introductory computer science course situated in the Logic Programming (LP) paradigm. This study shows how novice students operate within the LP paradigm while engaging in foundational computing concepts and skills, and presents a case for LP as a viable paradigm choice for introductory CS courses.\nLearning relations based on evidence from knowledge bases relies on processing the available relation instances. Many relations, however, have clear domain and range, which we hypothesize could help learn a better, more generalizing, model. We include such information in the RESCAL model in the form of a regularization factor added to the loss function that takes into account the types (categories) of the entities that appear as arguments to relations in the knowledge base. We note increased performance compared to the baseline model in terms of mean reciprocal rank and hits@N, N = 1, 3, 10. 
Furthermore, we discover scenarios that significantly impact the effectiveness of the type regularizer.\nWe study fairness in collaborative-filtering recommender systems, which are sensitive to discrimination that exists in historical data. Biased data can lead collaborative filtering methods to make unfair predictions against minority groups of users. We identify the insufficiency of existing fairness metrics and propose four new metrics that address different forms of unfairness. These fairness metrics can be optimized by adding fairness terms to the learning objective. Experiments on synthetic and real data show that our new metrics can better measure fairness than the baseline, and that the fairness objectives effectively help reduce unfairness.\nThis paper proposes a new algorithm for recovery of belief network structure from data handling hidden variables. It consists essentially in an extension of the CI algorithm of Spirtes et al. by restricting the number of conditional dependencies checked up to k variables and in an extension of the original CI by additional steps transforming so called partial including path graph into a belief network. Its correctness is demonstrated.\nWe consider the problem of learning the functions computing children from parents in a Structural Causal Model once the underlying causal graph has been identified. This is in some sense the second step after causal discovery. Taking a probabilistic approach to estimating these functions, we derive a natural myopic active learning scheme that identifies the intervention which is optimally informative about all of the unknown functions jointly, given previously observed data. 
We test the derived algorithms on simple examples, to demonstrate that they produce a structured exploration policy that significantly improves on unstructured base-lines.\nIn this work, we perform an exploratory study on synthesizing deep neural networks using biological synaptic strength distributions, and the potential influence of different distributions on modelling performance particularly for the scenario associated with small data sets. Surprisingly, a CNN with convolutional layer synaptic strengths drawn from biologically-inspired distributions such as log-normal or correlated center-surround distributions performed relatively well suggesting a possibility for designing deep neural network architectures that do not require many data samples to learn, and can sidestep current training procedures while maintaining or boosting modelling performance.\nIn multi-agent path finding (MAPF) the task is to find non-conflicting paths for multiple agents. In this paper we focus on finding suboptimal solutions for MAPF for the sum-of-costs variant. Recently, a SAT-based approached was developed to solve this problem and proved beneficial in many cases when compared to other search-based solvers. In this paper, we present SAT-based unbounded- and bounded-suboptimal algorithms and compare them to relevant algorithms. Experimental results show that in many case the SAT-based solver significantly outperforms the search-based solvers.\nThe recent emergence of novel computational devices, such as adiabatic quantum computers, CMOS annealers, and optical parametric oscillators, presents new opportunities for hybrid-optimization algorithms that leverage these kinds of specialized hardware. In this work, we propose the idea of an Ising processing unit as a computational abstraction for these emerging tools. Challenges involved in using and benchmarking these devices are presented, and open-source software tools are proposed to address some of these challenges. 
The proposed benchmarking tools and methodology are demonstrated by conducting a baseline study of established solution methods to a D-Wave 2X adiabatic quantum computer, one example of a commercially available Ising processing unit.\nWe propose a framework for feature selection that employs kernel-based measures of independence to find a subset of covariates that is maximally predictive of the response. Building on past work in kernel dimension reduction, we formulate our approach as a constrained optimization problem involving the trace of the conditional covariance operator, and additionally provide some consistency results. We then demonstrate on a variety of synthetic and real data sets that our method compares favorably with other state-of-the-art algorithms.\nWhile natural languages are compositional, how state-of-the-art neural models achieve compositionality is still unclear. We propose a deep network, which not only achieves competitive accuracy for text classification, but also exhibits compositional behavior. That is, while creating hierarchical representations of a piece of text, such as a sentence, the lower layers of the network distribute their layer-specific attention weights to individual words. In contrast, the higher layers compose meaningful phrases and clauses, whose lengths increase as the networks get deeper until fully composing the sentence.\nThe regret bound of an optimization algorithms is one of the basic criteria for evaluating the performance of the given algorithm. By inspecting the differences between the regret bounds of traditional algorithms and adaptive one, we provide a guide for choosing an optimizer with respect to the given data set and the loss function. 
For analysis, we assume that the loss function is convex and its gradient is Lipschitz continuous.\nRecent progress in logic programming (e.g., the development of the Answer Set Programming paradigm) has made it possible to teach it to general undergraduate and even high school students. Given the limited exposure of these students to computer science, the complexity of downloading, installing and using tools for writing logic programs could be a major barrier for logic programming to reach a much wider audience. We developed an online answer set programming environment with a self contained file system and a simple interface, allowing users to write logic programs and perform several tasks over the programs.\nNetworks are representations of complex underlying social processes. However, the same given network may be more suitable to model one behavior of individuals than another. In many cases, aggregate population models may be more effective than modeling on the network. We present a general framework for evaluating the suitability of given networks for a set of predictive tasks of interest, compared against alternative, networks inferred from data. We present several interpretable network models and measures for our comparison. We apply this general framework to the case study on collective classification of music preferences in a newly available dataset of the Last.fm social network.\nWe provide preliminary details and formulation of an optimization strategy under current development that is able to automatically tune the parameters of a Support Vector Machine over new datasets. The optimization strategy is a heuristic based on Iterated Local Search, a modification of classic hill climbing which iterates calls to a local search routine.\nThis paper provides a new similarity detection algorithm. 
Given an input set of multi-dimensional data points, where each data point is assumed to be multi-dimensional, and an additional reference data point for similarity finding, the algorithm uses kernel method that embeds the data points into a low dimensional manifold. Unlike other kernel methods, which consider the entire data for the embedding, our method selects a specific set of kernel eigenvectors. The eigenvectors are chosen to separate between the data points and the reference data point so that similar data points can be easily identified as being distinct from most of the members in the dataset.\nPythagorean fuzzy sets provide stronger ability than intuitionistic fuzzy sets to model uncertainty information and knowledge, but little effort has been paid to conflict analysis of Pythagorean fuzzy information systems. In this paper, we present three types of positive, central, and negative alliances with different thresholds, and employ examples to illustrate how to construct the positive, central, and negative alliances. Then we study conflict analysis of Pythagorean fuzzy information systems based on Bayesian minimum risk theory. Finally, we investigate group conflict analysis of Pythagorean fuzzy information systems based on Bayesian minimum risk theory.\nJumping has been an important mechanic since its introduction in Donkey Kong. It has taken a variety of forms and shown up in numerous games, with each jump having a different feel. In this paper, we use a modified Nintendo Entertainment System (NES) emulator to semi-automatically run experiments on a large subset (30%) of NES platform games. We use these experiments to build models of jumps from different developers, series, and games across the history of the console. We then examine these models to gain insights into different forms of jumping and their associated feel.\nWe provide a novel notion of what it means to be interpretable, looking past the usual association with human understanding. 
Our key insight is that interpretability is not an absolute concept and so we define it relative to a target model, which may or may not be a human. We define a framework that allows for comparing interpretable procedures by linking it to important practical aspects such as accuracy and robustness. We characterize many of the current state-of-the-art interpretable methods in our framework portraying its general applicability.\nThe interpretation of spatial references is highly contextual, requiring joint inference over both language and the environment. We consider the task of spatial reasoning in a simulated environment, where an agent can act and receive rewards. The proposed model learns a representation of the world steered by instruction text. This design allows for precise alignment of local neighborhoods with corresponding verbalizations, while also handling global references in the instructions. We train our model with reinforcement learning using a variant of generalized value iteration. The model outperforms state-of-the-art approaches on several metrics, yielding a 45% reduction in goal localization error.\nHidden variables are well known sources of disturbance when recovering belief networks from data based only on measurable variables. Hence models assuming existence of hidden variables are under development.   This paper presents a new algorithm \"accelerating\" the known CI algorithm of Spirtes, Glymour and Scheines {Spirtes:93}. We prove that this algorithm does not produces (conditional) independencies not present in the data if statistical independence test is reliable.   This result is to be considered as non-trivial since e.g. the same claim fails to be true for FCI algorithm, another \"accelerator\" of CI, developed in {Spirtes:93}.\nThe highly influential framework of conceptual spaces provides a geometric way of representing knowledge. 
Instances are represented by points and concepts are represented by regions in a (potentially) high-dimensional space. Based on our recent formalization, we present a comprehensive implementation of the conceptual spaces framework that is not only capable of representing concepts with inter-domain correlations, but that also offers a variety of operations on these concepts.\nWe present an implementation of a probabilistic first-order logic called TensorLog, in which classes of logical queries are compiled into differentiable functions in a neural-network infrastructure such as Tensorflow or Theano. This leads to a close integration of probabilistic logical reasoning with deep-learning infrastructure: in particular, it enables high-performance deep learning frameworks to be used for tuning the parameters of a probabilistic logic. Experimental results show that TensorLog scales to problems involving hundreds of thousands of knowledge-base triples and tens of thousands of examples.\nThis work proposes a formulation of propositional logic, named Eigenlogic, using quantum observables as propositions. The eigenvalues of these operators are the truth-values and the associated eigenvectors the interpretations of the propositional system. Fuzzy logic arises naturally when considering vectors outside the eigensystem, the fuzzy membership function is obtained by the Born rule of the logical observable.This approach is then applied in the context of quantum robots using simple behavioral agents represented by Braitenberg vehicles. Processing with non-classical logic such as multivalued logic, fuzzy logic and the quantum Eigenlogic permits to enlarge the behavior possibilities and the associated decisions of these simple agents.\nProbLog is a state-of-art combination of logic programming and probabilities; in particular ProbLog offers parameter learning through a variant of the EM algorithm. 
However, the resulting learning algorithm is rather slow, even when the data are complete. In this short paper we offer some insights that lead to orders of magnitude improvements in ProbLog's parameter learning speed with complete data.\nThe article studies navigability of an autonomous agent in a maze where some rooms may be indistinguishable. In a previous work the authors have shown that the properties of navigability in such a setting depend on whether an agent has perfect recall. Navigability by an agent with perfect recall is a transitive relation and without is not transitive.   This article introduces a notion of restricted navigability and shows that a certain form of transitivity holds for restricted navigability, even for an agent without perfect recall. The main technical result is a sound and complete logical system describing the properties of restricted navigability.\nSports channel video portals offer an exciting domain for research on multimodal, multilingual analysis. We present methods addressing the problem of automatic video highlight prediction based on joint visual features and textual analysis of the real-world audience discourse with complex slang, in both English and traditional Chinese. We present a novel dataset based on League of Legends championships recorded from North American and Taiwanese Twitch.tv channels (will be released for further research), and demonstrate strong results on these using multimodal, character-level CNN-RNN model architectures.\nThe paper provides an analysis of the voting method known as delegable proxy voting, or liquid democracy. The analysis first positions liquid democracy within the theory of binary aggregation. It then focuses on two issues of the system: the occurrence of delegation cycles; and the effect of delegations on individual rationality when voting on logically interdependent propositions. 
It finally points to proposals on how the system may be modified in order to address the above issues.\nThe existence of a coalition strategy to achieve a goal does not necessarily mean that the coalition has enough information to know how to follow the strategy. Neither does it mean that the coalition knows that such a strategy exists. The paper studies an interplay between the distributed knowledge, coalition strategies, and coalition \"know-how\" strategies. The main technical result is a sound and complete trimodal logical system that describes the properties of this interplay.\nIn this work we analyze the performances of two of the most used word embeddings algorithms, skip-gram and continuous bag of words on Italian language. These algorithms have many hyper-parameter that have to be carefully tuned in order to obtain accurate word representation in vectorial space. We provide an accurate analysis and an evaluation, showing what are the best configuration of parameters for specific tasks.\nWe propose a new type of self-aware systems inspired by ideas from higher-order theories of consciousness. First, we discussed the crucial distinction between introspection and reflexion. Then, we focus on computational reflexion as a mechanism by which a computer program can inspect its own code at every stage of the computation. Finally, we provide a formal definition and a proof-of-concept implementation of computational reflexion, viewed as an enriched form of program interpretation and a way to dynamically \"augment\" a computational process.\nANGELINA is an automated game design system which has previously been built as a single software block which designs games from start to finish. 
In this paper we outline a roadmap for the development of a new version of ANGELINA, designed to iterate on games in different ways to produce a continuous creative process that will improve the quality of its work, but more importantly improve the perception of the software as being an independently creative piece of software. We provide an initial report of the system's structure here as well as results from the first working module of the system.\nI propose the purpose our concept of actual causation serves is minimizing various cost in intervention practice. Actual causation has three features: nonredundant sufficiency, continuity and abnormality; these features correspond to the minimization of exploitative cost, exploratory cost and risk cost in intervention practice. Incorporating these three features, a definition of actual causation is given. I test the definition in 66 causal cases from actual causation literature and show that this definition's application fit intuition better than some other causal modelling based definitions.\nRecommendation to groups of users is a challenging and currently only passingly studied task. Especially the evaluation aspect often appears ad-hoc and instead of truly evaluating on groups of users, synthesizes groups by merging individual preferences.   In this paper, we present a user study, recording the individual and shared preferences of actual groups of participants, resulting in a robust, standardized evaluation benchmark. Using this benchmarking dataset, that we share with the research community, we compare the respective performance of a wide range of music group recommendation techniques proposed in the\nOne question central to Reinforcement Learning is how to learn a feature representation that supports algorithm scaling and re-use of learned information from different tasks. Successor Features approach this problem by learning a feature representation that satisfies a temporal constraint. 
We present an implementation of an approach that decouples the feature representation from the reward function, making it suitable for transferring knowledge between domains. We then assess the advantages and limitations of using Successor Features for transfer.\nIn this paper, we explore the utilization of natural language to drive transfer for reinforcement learning (RL). Despite the wide-spread application of deep RL techniques, learning generalized policy representations that work across domains remains a challenging problem. We demonstrate that textual descriptions of environments provide a compact intermediate channel to facilitate effective policy transfer. We employ a model-based RL approach consisting of a differentiable planning module, a model-free component and a factorized representation to effectively utilize entity descriptions. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments.\nWe propose a new framework for abstractive text summarization based on a sequence-to-sequence oriented encoder-decoder model equipped with a deep recurrent generative decoder (DRGN).   Latent structure information implied in the target summaries is learned based on a recurrent latent random model for improving the summarization quality.   Neural variational inference is employed to address the intractable posterior inference for the recurrent latent variables.   Abstractive summaries are generated based on both the generative latent variables and the discriminative deterministic states.   Extensive experiments on some benchmark datasets in different languages show that DRGN achieves improvements over the state-of-the-art methods.\nWe investigate the problem of reader-aware multi-document summarization (RA-MDS) and introduce a new dataset for this problem. To tackle RA-MDS, we extend a variational auto-encodes (VAEs) based MDS framework by jointly considering news documents and reader comments. 
To conduct evaluation for summarization performance, we prepare a new dataset. We describe the methods for data collection, aspect annotation, and summary writing as well as scrutinizing by experts. Experimental results show that reader comments can improve the summarization performance, which also demonstrates the usefulness of the proposed dataset. The annotated dataset for RA-MDS is available online.\nIn this article, we mathematically study several GAN related topics, including Inception score, label smoothing, gradient vanishing and the -log(D(x)) alternative.   --- An advanced version is included in arXiv:1703.02000 \"Activation Maximization Generative Adversarial Nets\". Please refer Section 6 in 1703.02000 for detailed analysis on Inception Score, and refer its appendix for the discussions on Label Smoothing, Gradient Vanishing and -log(D(x)) Alternative. ---\nWe present four logic puzzles and after that their solutions. Joseph Yeo designed 'Cheryl's Birthday'. Mike Hartley came up with a novel solution for 'One Hundred Prisoners and a Light Bulb'. Jonathan Welton designed 'A Blind Guess' and 'Abby's Birthday'. Hans van Ditmarsch and Barteld Kooi authored the puzzlebook 'One Hundred Prisoners and a Light Bulb' that contains other knowledge puzzles, and that can also be found on the webpage http://personal.us.es/hvd/lightbulb.html dedicated to the book.\nThere have been a number of developments in measuring inconsistency in logic-based representations of knowledge. In contrast, the development of inconsistency measures for computational models of argument has been limited. 
To address this shortcoming, this paper provides a general framework for measuring inconsistency in abstract argumentation, together with some proposals for specific measures, and a consideration of measuring inconsistency in logic-based instantiations of argument graphs, including a review of some existing proposals and a consideration of how existing logic-based measures of inconsistency can be applied.\nWe address the problem of end-to-end visual storytelling. Given a photo album, our model first selects the most representative (summary) photos, and then composes a natural language story for the album. For this task, we make use of the Visual Storytelling dataset and a model composed of three hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album photos, select representative (summary) photos, and compose the story. Automatic and human evaluations show our model achieves better performance on selection, generation, and retrieval than baselines.\nThe notion of commitment is widely studied as a high-level abstraction for modeling multiagent interaction. An important challenge is supporting flexible decentralized enactments of commitment specifications. In this paper, we combine recent advances on specifying commitments and information protocols. Specifically, we contribute Tosca, a technique for automatically synthesizing information protocols from commitment specifications. Our main result is that the synthesized protocols support commitment alignment, which is the idea that agents must make compatible inferences about their commitments despite decentralization.\nt-Distributed Stochastic Neighbor Embedding (t-SNE) is one of the most widely used dimensionality reduction methods for data visualization, but it has a perplexity hyperparameter that requires manual selection. In practice, proper tuning of t-SNE perplexity requires users to understand the inner working of the method as well as to have hands-on experience. 
We propose a model selection objective for t-SNE perplexity that requires negligible extra computation beyond that of the t-SNE itself. We empirically validate that the perplexity settings found by our approach are consistent with preferences elicited from human experts across a number of datasets. The similarities of our approach to Bayesian information criteria (BIC) and minimum description length (MDL) are also analyzed.\nWe present a framework to systematically analyze convolutional neural networks (CNNs) used in classification of cars in autonomous vehicles. Our analysis procedure comprises an image generator that produces synthetic pictures by sampling in a lower dimension image modification subspace and a suite of visualization tools. The image generator produces images which can be used to test the CNN and hence expose its vulnerabilities. The presented framework can be used to extract insights of the CNN classifier, compare across classification models, or generate training and validation datasets.\nAdvances in remote sensing technologies have made it possible to use high-resolution visual data for weather observation and forecasting tasks. We propose the use of multi-layer neural networks for understanding complex atmospheric dynamics based on multichannel satellite images. The capability of our model was evaluated by using a linear regression task for single typhoon coordinates prediction. A specific combination of models and different activation policies enabled us to obtain an interesting prediction result in the northeastern hemisphere (ENH).\nThis paper introduces Deep Incremental Boosting, a new technique derived from AdaBoost, specifically adapted to work with Deep Learning methods, that reduces the required training time and improves generalisation. We draw inspiration from Transfer of Learning approaches to reduce the start-up time to training each incremental Ensemble member. 
We show a set of experiments that outlines some preliminary results on some common Deep Learning datasets and discuss the potential improvements Deep Incremental Boosting brings to traditional Ensemble methods in Deep Learning.\nThe model-based control of building heating systems for energy saving encounters severe physical, mathematical and calibration difficulties in the numerous attempts that has been published until now. This topic is addressed here via a new model-free control setting, where the need of any mathematical description disappears. Several convincing computer simulations are presented. Comparisons with classic PI controllers and flatness-based predictive control are provided.\nDialog is a natural modality for interaction between customers and businesses in the service industry. As customers call up the service provider, their interactions may be routine or extraordinary. We believe that these interactions, when seen as dialogs, can be analyzed to obtain a better understanding of customer needs and how to efficiently address them. We introduce the idea of a dialog complexity measure to characterize multi-party interactions, propose a general data-driven method to calculate it, use it to discover insights in public and enterprise dialog datasets, and demonstrate its beneficial usage in facilitating better handling of customer requests and evaluating service agents.\nControlling embodied agents with many actuated degrees of freedom is a challenging task. We propose a method that can discover and interpolate between context dependent high-level actions or body-affordances. These provide an abstract, low-dimensional interface indexing high-dimensional and time- extended action policies. Our method is related to recent ap- proaches in the machine learning literature but is conceptually simpler and easier to implement. More specifically our method requires the choice of a n-dimensional target sensor space that is endowed with a distance metric. 
The method then learns an also n-dimensional embedding of possibly reactive body-affordances that spread as far as possible throughout the target sensor space.\nDisagreement-based approaches generate multiple classifiers and exploit the disagreement among them with unlabeled data to improve learning performance. Co-training is a representative paradigm of them, which trains two classifiers separately on two sufficient and redundant views; while for the applications where there is only one view, several successful variants of co-training with two different classifiers on single-view data instead of two views have been proposed. For these disagreement-based approaches, there are several important issues which still are unsolved, in this article we present theoretical analyses to address these issues, which provides a theoretical foundation of co-training and disagreement-based approaches.\nIT offers some benefits and collaborations in various sectors. This research focuses on exploring higher education subjects via social technology, YouTube. YouTube is the world largest video based contents application in the world. Current learning materials are not only in text and images, but included video contents. This research enriching students learning materials may involving YouTube as learning sources. The study observed 118 sophomore students in computer science faculty. The results show that, involving YouTube in enriching students course material able to create conductive learning environment. This strategy increases students understanding in their field of study.\nThere is sufficient information in the far-field of a radiating dipole antenna to rediscover the Maxwell Equations and the wave equations of light, including the speed of light $c.$ TheoSea is a Julia program that does this in about a second, and the key insight is that the compactness of theories drives the search. 
The program is a computational embodiment of the scientific method: observation, consideration of candidate theories, and validation.\nWe show a proof of principle for warping, a method to interpret the inner working of neural networks in the context of gene expression analysis. Warping is an efficient way to gain insight to the inner workings of neural nets and make them more interpretable. We demonstrate the ability of warping to recover meaningful information for a given class on a sample-specific individual basis. We found warping works well in both linearly and nonlinearly separable datasets. These encouraging results show that warping has a potential to be the answer to neural networks interpretability in computational biology.\nRecent studies have demonstrated the power of recurrent neural networks for machine translation, image captioning and speech recognition. For the task of capturing temporal structure in video, however, there still remain numerous open research questions. Current research suggests using a simple temporal feature pooling strategy to take into account the temporal aspect of video. We demonstrate that this method is not sufficient for gesture recognition, where temporal information is more discriminative compared to general video classification tasks. We explore deep architectures for gesture recognition in video and propose a new end-to-end trainable neural network architecture incorporating temporal convolutions and bidirectional recurrence. Our main contributions are twofold; first, we show that recurrence is crucial for this task; second, we show that adding temporal convolutions leads to significant improvements. We evaluate the different approaches on the Montalbano gesture recognition dataset, where we achieve state-of-the-art results.\nWe introduce Selective Greedy Equivalence Search (SGES), a restricted version of Greedy Equivalence Search (GES).
SGES retains the asymptotic correctness of GES but, unlike GES, has polynomial performance guarantees. In particular, we show that when data are sampled independently from a distribution that is perfect with respect to a DAG ${\\cal G}$ defined over the observable variables then, in the limit of large data, SGES will identify ${\\cal G}$'s equivalence class after a number of score evaluations that is (1) polynomial in the number of nodes and (2) exponential in various complexity measures including maximum-number-of-parents, maximum-clique-size, and a new measure called {\\em v-width} that is at least as small as---and potentially much smaller than---the other two. More generally, we show that for any hereditary and equivalence-invariant property $\\Pi$ known to hold in ${\\cal G}$, we retain the large-sample optimality guarantees of GES even if we ignore any GES deletion operator during the backward phase that results in a state for which $\\Pi$ does not hold in the common-descendants subgraph.\nIn this paper we address the problem of decision making within a Markov decision process (MDP) framework where risk and modeling errors are taken into account. Our approach is to minimize a risk-sensitive conditional-value-at-risk (CVaR) objective, as opposed to a standard risk-neutral expectation. We refer to such problem as CVaR MDP. Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget. This result, which is of independent interest, motivates CVaR MDPs as a unifying framework for risk-sensitive and robust decision making. Our second contribution is to present an approximate value-iteration algorithm for CVaR MDPs and analyze its convergence rate. To our knowledge, this is the first solution algorithm for CVaR MDPs that enjoys error guarantees. 
Finally, we present results from numerical experiments that corroborate our theoretical findings and show the practicality of our approach.\nA simple Neural Network model is presented for end-to-end visual learning of arithmetic operations from pictures of numbers. The input consists of two pictures, each showing a 7-digit number. The output, also a picture, displays the number showing the result of an arithmetic operation (e.g., addition or subtraction) on the two input numbers. The concepts of a number, or of an operator, are not explicitly introduced. This indicates that addition is a simple cognitive task, which can be learned visually using a very small number of neurons.   Other operations, e.g., multiplication, were not learnable using this architecture. Some tasks were not learnable end-to-end (e.g., addition with Roman numerals), but were easily learnable once broken into two separate sub-tasks: a perceptual \\textit{Character Recognition} and cognitive \\textit{Arithmetic} sub-tasks. This indicates that while some tasks may be easily learnable end-to-end, other may need to be broken into sub-tasks.\nBehavior Trees are commonly used to model agents for robotics and games, where constrained behaviors must be designed by human experts in order to guarantee that these agents will execute a specific chain of actions given a specific set of perceptions. In such application areas, learning is a desirable feature to provide agents with the ability to adapt and improve interactions with humans and environment, but often discarded due to its unreliability. In this paper, we propose a framework that uses Reinforcement Learning nodes as part of Behavior Trees to address the problem of adding learning capabilities in constrained agents. 
We show how this framework relates to Options in Hierarchical Reinforcement Learning, ensuring convergence of nested learning nodes, and we empirically show that the learning nodes do not affect the execution of other nodes in the tree.\nIn Constraint Programming, global constraints allow to model and solve many combinatorial problems. Among these constraints, several sortedness constraints have been defined, for which propagation algorithms are available, but for which the tractability is not settled. We show that the sort(U,V) constraint (Older et. al, 1995) is intractable for integer variables whose domains are not limited to intervals. As a consequence, the similar result holds for the sort(U,V, P) constraint (Zhou, 1996). Moreover, the intractability holds even under the stability condition present in the recently introduced keysorting(U,V,Keys,P) constraint (Carlsson et al., 2014), and requiring that the order of the variables with the same value in the list U be preserved in the list V. Therefore, keysorting(U,V,Keys,P) is intractable as well.\nWe present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependences between features available to different workers. We show that DUAL-LOCO has bounded approximation error which only depends weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real world datasets and show that it obtains better speedups while retaining good accuracy.\nWe show new limits on the efficiency of using current techniques to make exact probabilistic inference for large classes of natural problems. In particular we show new lower bounds on knowledge compilation to SDD and DNNF forms. 
We give strong lower bounds on the complexity of SDD representations by relating SDD size to best-partition communication complexity. We use this relationship to prove exponential lower bounds on the SDD size for representing a large class of problems that occur naturally as queries over probabilistic databases. A consequence is that for representing unions of conjunctive queries, SDDs are not qualitatively more concise than OBDDs. We also derive simple examples for which SDDs must be exponentially less concise than FBDDs. Finally, we derive exponential lower bounds on the sizes of DNNF representations using a new quasipolynomial simulation of DNNFs by nondeterministic FBDDs.\nWe consider the problem of modelling noisy but highly symmetric shapes that can be viewed as hierarchies of whole-part relationships in which higher level objects are composed of transformed collections of lower level objects. To this end, we propose the stochastic wreath process, a fully generative probabilistic model of drawings. Following Leyton's \"Generative Theory of Shape\", we represent shapes as sequences of transformation groups composed through a wreath product.   This representation emphasizes the maximization of transfer --- the idea that the most compact and meaningful representation of a given shape is achieved by maximizing the re-use of existing building blocks or parts.   The proposed stochastic wreath process extends Leyton's theory by defining a probability distribution over geometric shapes in terms of noise processes that are aligned with the generative group structure of the shape. We propose an inference scheme for recovering the generative history of given images in terms of the wreath process using reversible jump Markov chain Monte Carlo methods and Approximate Bayesian Computation. 
In the context of sketching we demonstrate the feasibility and limitations of this approach on model-generated and real data.\nOur goal is to deploy a high-accuracy system starting with zero training examples. We consider an \"on-the-job\" setting, where as inputs arrive, we use real-time crowdsourcing to resolve uncertainty where needed and output our prediction when confident. As the model improves over time, the reliance on crowdsourcing queries decreases. We cast our setting as a stochastic game based on Bayesian decision theory, which allows us to balance latency, cost, and accuracy objectives in a principled way. Computing the optimal policy is intractable, so we develop an approximation based on Monte Carlo Tree Search. We tested our approach on three datasets---named-entity recognition, sentiment classification, and image classification. On the NER task we obtained more than an order of magnitude reduction in cost compared to full human annotation, while boosting performance relative to the expert provided labels. We also achieve a 8% F1 improvement over having a single human label the whole set, and a 28% F1 improvement over online learning.\nTeaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.\nWe consider a contextual version of multi-armed bandit problem with global knapsack constraints. 
In each round, the outcome of pulling an arm is a scalar reward and a resource consumption vector, both dependent on the context, and the global knapsack constraints require the total consumption for each resource to be below some pre-fixed budget. The learning agent competes with an arbitrary set of context-dependent policies. This problem was introduced by Badanidiyuru et al. (2014), who gave a computationally inefficient algorithm with near-optimal regret bounds for it. We give a computationally efficient algorithm for this problem with slightly better regret bounds, by generalizing the approach of Agarwal et al. (2014) for the non-constrained version of the problem. The computational time of our algorithm scales logarithmically in the size of the policy space. This answers the main open question of Badanidiyuru et al. (2014). We also extend our results to a variant where there are no knapsack constraints but the objective is an arbitrary Lipschitz concave function of the sum of outcome vectors.\nThe empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties. One important benefit of the algorithm is that it allows domain knowledge to be conveniently encoded as a prior distribution to balance exploration and exploitation more effectively. While it is generally believed that the algorithm's regret is low (high) when the prior is good (bad), little is known about the exact dependence. In this paper, we fully characterize the algorithm's worst-case dependence of regret on the choice of prior, focusing on a special yet representative case. These results also provide insights into the general sensitivity of the algorithm to the choice of priors. 
In particular, with $p$ being the prior probability mass of the true reward-generating model, we prove $O(\\sqrt{T/p})$ and $O(\\sqrt{(1-p)T})$ regret upper bounds for the bad- and good-prior cases, respectively, as well as \\emph{matching} lower bounds. Our proofs rely on the discovery of a fundamental property of Thompson Sampling and make heavy use of martingale theory, both of which appear novel in the literature, to the best of our knowledge.\nTransferring knowledge across a sequence of related tasks is an important challenge in reinforcement learning (RL). Despite much encouraging empirical evidence, there has been little theoretical analysis. In this paper, we study a class of lifelong RL problems: the agent solves a sequence of tasks modeled as finite Markov decision processes (MDPs), each of which is from a finite set of MDPs with the same state/action sets and different transition/reward functions. Motivated by the need for cross-task exploration in lifelong learning, we formulate a novel online coupon-collector problem and give an optimal algorithm. This allows us to develop a new lifelong RL algorithm, whose overall sample complexity in a sequence of tasks is much smaller than single-task learning, even if the sequence of tasks is generated by an adversary. Benefits of the algorithm are demonstrated in simulated problems, including a recently introduced human-robot interaction problem.\nWe present a new fast online clustering algorithm that reliably recovers arbitrary-shaped data clusters in high-throughput data streams. Unlike the existing state-of-the-art online clustering methods based on k-means or k-medoid, it does not make any restrictive generative assumptions. In addition, in contrast to existing nonparametric clustering techniques such as DBScan or DenStream, it gives provable theoretical guarantees. To achieve fast clustering, we propose to represent each cluster by a skeleton set which is updated continuously as new data is seen.
A skeleton set consists of weighted samples from the data where weights encode local densities. The size of each skeleton set is adapted according to the cluster geometry. The proposed technique automatically detects the number of clusters and is robust to outliers. The algorithm works for the infinite data stream where more than one pass over the data is not feasible. We provide theoretical guarantees on the quality of the clustering and also demonstrate its advantage over the existing state-of-the-art on several datasets.\nWe present a Bayesian tensor factorization model for inferring latent group structures from dynamic pairwise interaction patterns. For decades, political scientists have collected and analyzed records of the form \"country $i$ took action $a$ toward country $j$ at time $t$\"---known as dyadic events---in order to form and test theories of international relations. We represent these event data as a tensor of counts and develop Bayesian Poisson tensor factorization to infer a low-dimensional, interpretable representation of their salient patterns. We demonstrate that our model's predictive performance is better than that of standard non-negative tensor factorization methods. We also provide a comparison of our variational updates to their maximum likelihood counterparts. In doing so, we identify a better way to form point estimates of the latent factors than that typically used in Bayesian Poisson matrix factorization. Finally, we showcase our model as an exploratory analysis tool for political scientists. We show that the inferred latent factor matrices capture interpretable multilateral relations that both conform to and inform our knowledge of international affairs.\nMany real-world regression problems demand a measure of the uncertainty associated with each prediction. Standard decision forests deliver efficient state-of-the-art predictive performance, but high-quality uncertainty estimates are lacking. 
Gaussian processes (GPs) deliver uncertainty estimates, but scaling GPs to large-scale data sets comes at the cost of approximating the uncertainty estimates. We extend Mondrian forests, first proposed by Lakshminarayanan et al. (2014) for classification problems, to the large-scale non-parametric regression setting. Using a novel hierarchical Gaussian prior that dovetails with the Mondrian forest framework, we obtain principled uncertainty estimates, while still retaining the computational advantages of decision forests. Through a combination of illustrative examples, real-world large-scale datasets, and Bayesian optimization benchmarks, we demonstrate that Mondrian forests outperform approximate GPs on large-scale regression tasks and deliver better-calibrated uncertainty assessments than decision-forest-based methods.\nWe propose a neural sequence-to-sequence model for direction following, a task that is essential to realizing effective autonomous agents. Our alignment-based encoder-decoder model with long short-term memory recurrent neural networks (LSTM-RNN) translates natural language instructions to action sequences based upon a representation of the observable world state. We introduce a multi-level aligner that empowers our model to focus on sentence \"regions\" salient to the current world state by using multiple abstractions of the input sentence. In contrast to existing methods, our model uses no specialized linguistic resources (e.g., parsers) or task-specific annotations (e.g., seed lexicons). It is therefore generalizable, yet still achieves the best results reported to-date on a benchmark single-sentence dataset and competitive results for the limited-training multi-sentence setting. We analyze our model through a series of ablations that elucidate the contributions of the primary components of our model.\nIn Dung's abstract argumentation, arguments are either acceptable or unacceptable, given a chosen notion of acceptability. 
This gives a coarse way to compare arguments. In this paper, we propose a counting approach for a more fine-grained assessment to arguments by counting the number of their respective attackers and defenders based on argument graph and argument game. An argument is more acceptable if the proponent puts forward more number of defenders for it and the opponent puts forward less number of attackers against it. We show that our counting model has two well-behaved properties: normalization and convergence. Then, we define a counting semantics based on this model, and investigate some general properties of the semantics.\nCausality has been recently introduced in databases, to model, characterize and possibly compute causes for query results (answers). Connections between query causality and consistency-based diagnosis and database repairs (wrt. integrity constraint violations) have been established in the literature. In this work we establish connections between query causality and abductive diagnosis and the view-update problem. The unveiled relationships allow us to obtain new complexity results for query causality -the main focus of our work- and also for the two other areas.\nWe show that strategies implemented in automatic theorem proving involve an interesting tradeoff between execution speed, proving speedup/computational time and usefulness of information. We advance formal definitions for these concepts by way of a notion of normality related to an expected (optimal) theoretical speedup when adding useful information (other theorems as axioms), as compared with actual strategies that can be effectively and efficiently implemented. We propose the existence of an ineluctable tradeoff between this normality and computational time complexity. The argument quantifies the usefulness of information in terms of (positive) speed-up. The results disclose a kind of no-free-lunch scenario and a tradeoff of a fundamental nature.
The main theorem in this paper together with the numerical experiment---undertaken using two different automatic theorem provers AProS and Prover9 on random theorems of propositional logic---provide strong theoretical and empirical arguments for the fact that finding new useful information for solving a specific problem (theorem) is, in general, as hard as the problem (theorem) itself.\nWe develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances of deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. Then a deep recurrent model, building on long short-term memory (LSTM), is developed to robustly recognize the generated CNN sequences, departing from most existing approaches recognising each character independently. Our model has a number of appealing properties in comparison to existing scene text recognition methods: (i) It can recognise highly ambiguous words by leveraging meaningful context information, allowing it to work reliably without either pre- or post-processing; (ii) the deep CNN feature is robust to various image distortions; (iii) it retains the explicit order information in word image, which is essential to discriminate word strings; (iv) the model does not depend on pre-defined dictionary, and it can process unknown words and arbitrary strings. Codes for the DTRN will be available.\nInterpersonal relations are fickle, with close friendships often dissolving into enmity. In this work, we explore linguistic cues that presage such transitions by studying dyadic interactions in an online strategy game where players form alliances and break those alliances through betrayal. We characterize friendships that are unlikely to last and examine temporal patterns that foretell betrayal.   
We reveal that subtle signs of imminent betrayal are encoded in the conversational patterns of the dyad, even if the victim is not aware of the relationship's fate. In particular, we find that lasting friendships exhibit a form of balance that manifests itself through language. In contrast, sudden changes in the balance of certain conversational attributes---such as positive sentiment, politeness, or focus on future planning---signal impending betrayal.\nIt has been proposed that human physical reasoning consists largely of running \"physics engines in the head\" in which the future trajectory of the physical system under consideration is computed precisely using accurate scientific theories. In such models, uncertainty and incomplete knowledge is dealt with by sampling probabilistically over the space of possible trajectories (\"Monte Carlo simulation\"). We argue that such simulation-based models are too weak, in that there are many important aspects of human physical reasoning that cannot be carried out this way, or can only be carried out very inefficiently; and too strong, in that humans make large systematic errors that the models cannot account for. We conclude that simulation-based reasoning makes up at most a small part of a larger system that encompasses a wide range of additional cognitive processes.\nThis paper proposes a new approach to model the temporal dynamics of a sequence of facial expressions. To this purpose, a sequence of Face Image Descriptors (FID) is regarded as the output of a Linear Time Invariant (LTI) system. The temporal dynamics of such sequence of descriptors are represented by means of a Hankel matrix. The paper presents different strategies to compute dynamics-based representation of a sequence of FID, and reports classification accuracy values of the proposed representations within different standard classification frameworks. 
The representations have been validated in two very challenging application domains: emotion recognition and pain detection. Experiments on two publicly available benchmarks and comparison with state-of-the-art approaches demonstrate that the dynamics-based FID representation attains competitive performance when off-the-shelf classification tools are adopted.\nThis paper proposes a decision support system to aid movie investment decisions at the early stage of movie productions. The system predicts the success of a movie based on its profitability by leveraging historical data from various sources. Using social network analysis and text mining techniques, the system automatically extracts several groups of features, including \"who\" are on the cast, \"what\" a movie is about, \"when\" a movie will be released, as well as \"hybrid\" features that match \"who\" with \"what\", and \"when\" with \"what\". Experiment results with movies during an 11-year period showed that the system outperforms benchmark methods by a large margin in predicting movie profitability. Novel features we proposed also made great contributions to the prediction. In addition to designing a decision support system with practical utilities, our analysis of key factors for movie profitability may also have implications for theoretical research on team performance and the success of creative work.\nIn targeted online advertising, advertisers look for maximizing campaign performance under delivery constraint within budget schedule. Most of the advertisers typically prefer to impose the delivery constraint to spend budget smoothly over the time in order to reach a wider range of audiences and have a sustainable impact. Since lots of impressions are traded through public auctions for online advertising today, the liquidity makes price elasticity and bid landscape between demand and supply change quite dynamically. 
Therefore, it is challenging to perform smooth pacing control and maximize campaign performance simultaneously. In this paper, we propose a smart pacing approach in which the delivery pace of each campaign is learned from both offline and online data to achieve smooth delivery and optimal performance goals. The implementation of the proposed approach in a real DSP system is also presented. Experimental evaluations on both real online ad campaigns and offline simulations show that our approach can effectively improve campaign performance and achieve delivery goals.\nWe propose an original particle-based implementation of the Loopy Belief Propagation (LBP) algorithm for pairwise Markov Random Fields (MRF) on a continuous state space. The algorithm constructs adaptively efficient proposal distributions approximating the local beliefs at each node of the MRF. This is achieved by considering proposal distributions in the exponential family whose parameters are updated iteratively in an Expectation Propagation (EP) framework. The proposed particle scheme provides consistent estimation of the LBP marginals as the number of particles increases. We demonstrate that it provides more accurate results than the Particle Belief Propagation (PBP) algorithm of Ihler and McAllester (2009) at a fraction of the computational cost and is additionally more robust empirically. The computational complexity of our algorithm at each iteration is quadratic in the number of particles. We also propose an accelerated implementation with sub-quadratic computational complexity which still provides consistent estimates of the loopy BP marginal distributions and performs almost as well as the original procedure.\nStock price forecasting is an important issue for investors since extreme accuracy in forecasting can bring about high profits. Fuzzy Time Series (FTS) and Longest Common/Repeated Sub-sequence (LCS/LRS) are two important issues for forecasting prices.
However, to the best of our knowledge, there are no significant studies using LCS/LRS to predict stock prices. It is impossible that prices stay exactly the same as historic prices. Therefore, this paper proposes a state-of-the-art method which combines FTS and LCS/LRS to predict stock prices. This method is based on the principle that history will repeat itself. It uses different interval lengths in FTS to fuzzify the prices, and LCS/LRS to look for the same pattern in the historical prices to predict future stock prices. In the experiment, we examine various intervals of fuzzy time sets in order to achieve high prediction accuracy. The proposed method outperforms traditional methods in terms of prediction accuracy and, furthermore, it is easy to implement.\nWe present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations. A neural network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allowing the system to take into account previous dialog utterances. Our dynamic-context generative models show consistent gains over both context-sensitive and non-context-sensitive Machine Translation and Information Retrieval baselines.\nMany online companies sell advertisement space in second-price auctions with reserve. In this paper, we develop a probabilistic method to learn a profitable strategy to set the reserve price. We use historical auction data with features to fit a predictor of the best reserve price. This problem is delicate - the structure of the auction is such that a reserve price set too high is much worse than a reserve price set too low. To address this we develop objective variables, a new framework for combining probabilistic modeling with optimal decision-making. 
Objective variables are \"hallucinated observations\" that transform the revenue maximization task into a regularized maximum likelihood estimation problem, which we solve with an EM algorithm. This framework enables a variety of prediction mechanisms to set the reserve price. As examples, we study objective variable methods with regression, kernelized regression, and neural networks on simulated and real data. Our methods outperform previous approaches both in terms of scalability and profit.\nThe New Yorker publishes a weekly captionless cartoon. More than 5,000 readers submit captions for it. The editors select three of them and ask the readers to pick the funniest one. We describe an experiment that compares a dozen automatic methods for selecting the funniest caption. We show that negative sentiment, human-centeredness, and lexical centrality most strongly match the funniest captions, followed by positive sentiment. These results are useful for understanding humor and also in the design of more engaging conversational agents in text and multimodal (vision+text) systems. As part of this work, a large set of cartoons and captions is being made available to the community.\nThis paper studies convolutional neural networks (CNN) to learn unsupervised feature representations for 44 different plant species, collected at the Royal Botanic Gardens, Kew, England. To gain intuition on the chosen features from the CNN model (as opposed to a 'black box' solution), a visualisation technique based on the deconvolutional networks (DN) is utilized. It is found that venations of different order have been chosen to uniquely represent each of the plant species. Experimental results using these CNN features with different classifiers show consistency and superiority compared to the state-of-the-art solutions which rely on hand-crafted features.\nIncremental SAT and QBF solving potentially yields improvements when sequences of related formulas are solved.
An incremental application is usually tailored towards some specific solver and decomposes a problem into incremental solver calls. This hinders the independent comparison of different solvers, particularly when the application program is not available. As a remedy, we present an approach to automated benchmarking of incremental SAT and QBF solvers. Given a collection of formulas in (Q)DIMACS format generated incrementally by an application program, our approach automatically translates the formulas into instructions to import and solve a formula by an incremental SAT/QBF solver. The result of the translation is a program which replays the incremental solver calls and thus allows to evaluate incremental solvers independently from the application program. We illustrate our approach by different hardware verification problems for SAT and QBF solvers.\nThis paper introduces the Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a unique resource for research into building dialogue managers based on neural language models that can make use of large amounts of unlabeled data. The dataset has both the multi-turn property of conversations in the Dialog State Tracking Challenge datasets, and the unstructured nature of interactions from microblog services such as Twitter. We also describe two neural learning architectures suitable for analyzing this dataset, and provide benchmark performance on the task of selecting the best next response.\nIn this paper, we consider the task of learning control policies for text-based games. In these games, all interactions in the virtual world are through text and the underlying state is not observed. The resulting language barrier makes such environments challenging for automatic game players. 
We employ a deep reinforcement learning framework to jointly learn state representations and action policies using game rewards as feedback. This framework enables us to map text descriptions into vector representations that capture the semantics of the game states. We evaluate our approach on two game worlds, comparing against baselines using bag-of-words and bag-of-bigrams for state representations. Our algorithm outperforms the baselines on both worlds demonstrating the importance of learning expressive representations.\nAccurate estimation such as cost estimation, quality estimation and risk analysis is a major issue in management. We propose a patent pending soft computing framework to tackle this challenging problem. Our generic framework is independent of the nature and type of estimation. It consists of neural network, fuzzy logic, and an algorithmic estimation model. We made use of the Constructive Cost Model (COCOMO), Analysis of Variance (ANOVA), and Function Point Analysis as the algorithmic models and validated the accuracy of the Neuro-Fuzzy Algorithmic (NFA) Model in software cost estimation using industrial project data. Our model produces more accurate estimation than using an algorithmic model alone. We also discuss the prototypes of our tools that implement the NFA Model. We conclude with our roadmap and direction to enrich the model in tackling different estimation challenges.\nRobots assisting humans in complex domains have to represent knowledge and reason at both the sensorimotor level and the social level. The architecture described in this paper couples the non-monotonic logical reasoning capabilities of a declarative language with probabilistic belief revision, enabling robots to represent and reason with qualitative and quantitative descriptions of knowledge and degrees of belief. 
Specifically, incomplete domain knowledge, including information that holds in all but a few exceptional situations, is represented as an Answer Set Prolog (ASP) program. The answer set obtained by solving this program is used for inference, planning, and for jointly explaining (a) unexpected action outcomes due to exogenous actions and (b) partial scene descriptions extracted from sensor input. For any given task, each action in the plan contained in the answer set is executed probabilistically. The subset of the domain relevant to the action is identified automatically, and observations extracted from sensor inputs perform incremental Bayesian updates to a belief distribution defined over this domain subset, with highly probable beliefs being committed to the ASP program. The architecture's capabilities are illustrated in simulation and on a mobile robot in the context of a robot waiter operating in the dining room of a restaurant.\nDevelopments in semantic web technologies have promoted ontological encoding of knowledge from diverse domains. However, modelling many practical domains requires more expressiveness than what the standard description logics (most prominently SROIQ) support. In this paper, we extend the expressive DL SROIQ with constraint networks (resulting in the logic SROIQc) and grounded circumscription (resulting in the logic GC-SROIQ). Applications of constraint modelling include embedding ontologies with temporal or spatial information, while those of grounded circumscription include defeasible inference and closed world reasoning.   We describe the syntax and semantics of the logic formed by including constraint modelling constructs in SROIQ, and provide a sound, complete and terminating tableau algorithm for it.
We further provide an intuitive algorithm for Grounded Circumscription in SROIQc, which adheres to the general framework of grounded circumscription, and which can be applied to a whole range of expressive logics for which no such specific algorithm presently exists.\nReal-world problems always have multiple different solutions. For instance, optical engineers need to tune the recording parameters to get as many optimal solutions as possible for multiple trials in the varied-line-spacing holographic grating design problem. Unfortunately, most traditional optimization techniques focus on solving for a single optimal solution. They need to be applied several times, and even then there is no guarantee that all solutions will be found. Thus the multimodal optimization problem was proposed. In that problem, we are interested in not only a single optimal point, but also the others. With strong parallel search capability, evolutionary algorithms are shown to be particularly effective in solving this type of problem. In particular, the evolutionary algorithms for multimodal optimization usually not only locate multiple optima in a single run, but also preserve their population diversity throughout a run, resulting in their global optimization ability on multimodal functions. In addition, the techniques for multimodal optimization are borrowed as diversity maintenance techniques for other problems. In this chapter, we describe and review the state-of-the-art evolutionary algorithms for multimodal optimization in terms of methodology, benchmarking, and application.\nIn this study, a spectral graph-theoretic grouping strategy for weakly supervised classification is introduced, where a limited number of labelled samples and a larger set of unlabelled samples are used to construct a larger annotated training set composed of strongly labelled and weakly labelled samples.
The inherent relationship between the set of strongly labelled samples and the set of unlabelled samples is established via spectral grouping, with the unlabelled samples subsequently weakly annotated based on the strongly labelled samples within the associated spectral groups. A number of similarity graph models for spectral grouping, including two new similarity graph models introduced in this study, are explored to investigate their performance in the context of weakly supervised classification in handling different types of data. Experimental results using benchmark datasets as well as real EMG datasets demonstrate that the proposed approach to weakly supervised classification can provide noticeable improvements in classification performance, and that the proposed similarity graph models can lead to ultimate learning results that are either better than or on a par with existing similarity graph models in the context of spectral grouping for weakly supervised classification.\nEstimating mutual information (MI) from samples is a fundamental problem in statistics, machine learning, and data analysis. Recently it was shown that a popular class of non-parametric MI estimators performs very poorly for strongly dependent variables and has sample complexity that scales exponentially with the true MI. This undesired behavior was attributed to the reliance of those estimators on local uniformity of the underlying (and unknown) probability density function. Here we present a novel semi-parametric estimator of mutual information, where at each sample point, densities are {\\em locally} approximated by a Gaussian distribution. We demonstrate that the estimator is asymptotically unbiased.
We also show that the proposed estimator has superior performance compared to several baselines, and is able to accurately measure relationship strengths over many orders of magnitude.\nThe fundamental problem underlying all multi-criteria decision analysis (MCDA) problems is that of dominance between any two alternatives: \"Given two alternatives A and B, each described by a set of criteria, is A preferred to B with respect to a set of decision maker (DM) preferences over the criteria?\". Depending on the application in which MCDA is performed, the alternatives may represent strategies and policies for business, potential locations for setting up new facilities, designs of buildings, etc. The general objective of MCDA is to enable the DM to rank all alternatives according to the stated preferences, and choose the ones that are best, i.e., optimal with respect to the preferences over the criteria. This article presents and summarizes a recently developed MCDA framework that orders the set of alternatives when the relative importance preferences are incomplete, imprecise, or qualitative in nature.\nMargin-based structured prediction commonly uses a maximum loss over all possible structured outputs \\cite{Altun03,Collins04b,Taskar03}. In natural language processing, recent work \\cite{Zhang14,Zhang15} has proposed the use of the maximum loss over random structured outputs sampled independently from some proposal distribution. This method is linear-time in the number of random structured outputs and trivially parallelizable. We study this family of loss functions in the PAC-Bayes framework under Gaussian perturbations \\cite{McAllester07}. Under some technical conditions and up to statistical accuracy, we show that this family of loss functions produces a tighter upper bound of the Gibbs decoder distortion than commonly used methods.
Thus, using the maximum loss over random structured outputs is a principled way of learning the parameter of structured prediction models. Besides explaining the experimental success of \\cite{Zhang14,Zhang15}, our theoretical results show that more general techniques are possible.\nBelief compression improves the tractability of large-scale partially observable Markov decision processes (POMDPs) by finding projections from high-dimensional belief space onto low-dimensional approximations, where solving to obtain action selection policies requires fewer computations. This paper develops a unified theoretical framework to analyse three existing linear belief compression approaches, including value-directed compression and two non-negative matrix factorisation (NMF) based algorithms. The results indicate that all the three known belief compression methods have their own critical deficiencies. Therefore, projective NMF belief compression is proposed (P-NMF), aiming to overcome the drawbacks of the existing techniques. The performance of the proposed algorithm is examined on four POMDP problems of reasonably large scale, in comparison with existing techniques. Additionally, the competitiveness of belief compression is compared empirically to a state-of-the-art heuristic search based POMDP solver and their relative merits in solving large-scale POMDPs are investigated.\nOver the last few years, much progress has been made in the theory and practice of solving quantified Boolean formulas (QBF). Novel solvers have been presented that either successfully enhance established techniques or implement novel solving paradigms. Powerful preprocessors have been realized that tune the encoding of a formula to make it easier to solve. Frameworks for certification and solution extraction emerged that allow for a detailed interpretation of a QBF solver's results, and new types of QBF encodings were presented for various application problems.   
To capture these developments the QBF Gallery was established in 2013. The QBF Gallery aims at providing a forum to assess QBF tools and to collect new, expressive benchmarks that allow for documenting the status quo and that indicate promising research directions. These benchmarks became the basis for the experiments conducted in the context of the QBF Gallery 2013 and follow-up evaluations. In this paper, we report on the setup of the QBF Gallery. To this end, we conducted numerous experiments which allowed us not only to assess the quality of the tools, but also the quality of the benchmarks.\nPresently, a very large number of public and private data sets are available around the local governments. In most cases, they are not semantically interoperable and a huge human effort is needed to create integrated ontologies and knowledge base for smart city. Smart City ontology is not yet standardized, and a lot of research work is needed to identify models that can easily support the data reconciliation, the management of the complexity and reasoning. In this paper, a system for data ingestion and reconciliation smart cities related aspects as road graph, services available on the roads, traffic sensors etc., is proposed. The system allows managing a big volume of data coming from a variety of sources considering both static and dynamic data. These data are mapped to smart-city ontology and stored into an RDF-Store where they are available for applications via SPARQL queries to provide new services to the users. The paper presents the process adopted to produce the ontology and the knowledge base and the mechanisms adopted for the verification, reconciliation and validation. 
Some examples of the possible usage of the coherent knowledge base produced are also offered and are accessible from the RDF-Store.\nThe Islamic State of Iraq and al-Sham (ISIS) is a dominant insurgent group operating in Iraq and Syria that rose to prominence when it took over Mosul in June, 2014. In this paper, we present a data-driven approach to analyzing this group using a dataset consisting of 2200 incidents of military activity surrounding ISIS and the forces that oppose it (including Iraqi, Syrian, and the American-led coalition). We combine ideas from logic programming and causal reasoning to mine for association rules for which we present evidence of causality. We present relationships that link ISIS vehicle-borne improvised explosive device (VBIED) activity in Syria with military operations in Iraq, coalition air strikes, and ISIS IED activity, as well as rules that may serve as indicators of spikes in indirect fire, suicide attacks, and arrests.\nThis report describes an initial replication study of the PRECISE system and develops a clearer, more formal description of the approach. Based on our evaluation, we conclude that the PRECISE results do not fully replicate. However the formalization developed here suggests a road map to further enhance and extend the approach pioneered by PRECISE.   After a long, productive discussion with Ana-Maria Popescu (one of the authors of PRECISE) we got more clarity on the PRECISE approach and how the lexicon was authored for the GEO evaluation. Based on this we built a more direct implementation over a repaired formalism. Although our new evaluation is not yet complete, it is clear that the system is performing much better now.
We will continue developing our ideas and implementation and generate a future report/publication that more accurately evaluates PRECISE-like approaches.\nInduction motors have a wide range of applications due to their well-known advantages, such as brushless structure, low cost and robust performance. Over the past years, many kinds of control methods have been proposed for induction motors, and among them direct torque control has gained great importance due to its fast dynamic torque response and simple control structure. However, the direct torque control method still has some handicaps compared to other control methods, the most important of which is high torque ripple. This paper suggests a new approach, fuzzy logic based space vector modulation, for direct torque controlled induction motors; the aim of the approach is to overcome the high torque ripple of conventional direct torque control. To test and compare the proposed direct torque control method with the conventional direct torque control method, simulations have been carried out in Matlab/Simulink under different working conditions. The simulation results show a significant improvement in the dynamic torque and speed responses compared to the conventional direct torque control method.\nCurrently the Dempster-Shafer based algorithm and the Uniform Random Probability based algorithm are the preferred methods of resolving security games in which defenders are able to identify attackers and only the attackers' strategy remains ambiguous. However, this model is inefficient in situations where resources are limited and both the identity of the attackers and their strategies are ambiguous. The intent of this study is to find a more effective algorithm to guide the defenders in choosing the outside agents with which to cooperate given both ambiguities.
We designed an experiment where defenders were compelled to engage with outside agents in order to maximize protection of their targets. We introduced two important notions: the behavior of each agent in target protection and the tolerance threshold in the target protection process. From these, we proposed an algorithm that was applied by each defender to determine the best potential assistant(s) with which to cooperate. Our results showed that our proposed algorithm is safer than the Dempster-Shafer based algorithm.\nThis paper focuses on finding spatial and temporal criminal hotspots. It analyses two different real-world crimes datasets for Denver, CO and Los Angeles, CA and provides a comparison between the two datasets through a statistical analysis supported by several graphs. Then, it clarifies how we conducted Apriori algorithm to produce interesting frequent patterns for criminal hotspots. In addition, the paper shows how we used Decision Tree classifier and Naive Bayesian classifier in order to predict potential crime types. To further analyse crimes datasets, the paper introduces an analysis study by combining our findings of Denver crimes dataset with its demographics information in order to capture the factors that might affect the safety of neighborhoods. The results of this solution could be used to raise awareness regarding the dangerous locations and to help agencies to predict future crimes in a specific location within a particular time.\nWe propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions. Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent from Johnson and Zhang (2013). 
We demonstrate experimentally that our algorithm performs well on large-scale convex and non-convex optimization problems, exhibiting linear convergence and rapidly solving the optimization problems to high levels of precision. Furthermore, we show that our algorithm performs well for a wide-range of step sizes, often differing by several orders of magnitude.\nDeep compositional models of meaning acting on distributional representations of words in order to produce vectors of larger text constituents are evolving to a popular area of NLP research. We detail a compositional distributional framework based on a rich form of word embeddings that aims at facilitating the interactions between words in the context of a sentence. Embeddings and composition layers are jointly learned against a generic objective that enhances the vectors with syntactic information from the surrounding context. Furthermore, each word is associated with a number of senses, the most plausible of which is selected dynamically during the composition process. We evaluate the produced vectors qualitatively and quantitatively with positive results. At the sentence level, the effectiveness of the framework is demonstrated on the MSRPar task, for which we report results within the state-of-the-art range.\nAddiction, as a nervous disease, can be analysed using mathematical modelling and computer simulations. In this paper, we use an existing mathematical model to predict and simulate human brain response to the consumption of a single dose of methamphetamine. The model is implemented and coded in Matlab. Three types of personalities including introverts, ambiverts and extroverts are studied. The parameters of the mathematical model are calibrated and optimized, according to psychological theories, using a real coded genetic algorithm. The simulations show significant correlation between people response to methamphetamine abuse and their personality. 
They also show that one of the causes of tendency to stimulants roots in consumers personality traits. The results can be used as a tool for reducing attitude towards addiction.\nFuzzy Description Logics (DLs) provide a means for representing vague knowledge about an application domain. In this paper, we study fuzzy extensions of conjunctive queries (CQs) over the DL $\\mathcal{SROIQ}$ based on finite chains of degrees of truth. To answer such queries, we extend a well-known technique that reduces the fuzzy ontology to a classical one, and use classical DL reasoners as a black box. We improve the complexity of previous reduction techniques for finitely valued fuzzy DLs, which allows us to prove tight complexity results for answering certain kinds of fuzzy CQs. We conclude with an experimental evaluation of a prototype implementation, showing the feasibility of our approach.\nMost of contemporary software systems are implemented using an object-oriented approach. Modeling phases -- during which software engineers analyze requirements to the future system using some modeling language -- are an important part of the development process, since modeling errors are often hard to recognize and correct.   In this paper we present a framework which allows the integration of Answer Set Programming into the object-oriented software development process. OOASP supports reasoning about object-oriented software models and their instantiations. 
Preliminary results of the OOASP application in CSL Studio, which is a Siemens internal modeling environment for product configurators, show that it can be used as a lightweight approach to verify, create and transform instantiations of object models at runtime and to support the software development process during design and testing.\nThis paper addresses the problem of finding multiple near-optimal, spatially-dissimilar paths that can be considered as alternatives in the decision making process, for finding optimal corridors in which to construct a new road. We further consider combinations of techniques for reducing the costs associated with the computation and increasing the accuracy of the cost formulation. Numerical results for five algorithms to solve the dissimilar multipath problem show that a \"bidirectional approach\" yields the fastest running times and the most robust algorithm. Further modifications of the algorithms to reduce the running time were tested and it is shown that running time can be reduced by an average of 56 percent without compromising the quality of the results.\nWe explore methods for content selection and address the issue of coherence in the context of the generation of multimedia artifacts. We use audio and video to present two case studies: generation of film tributes, and lecture-driven science talks. For content selection, we use centrality-based and diversity-based summarization, along with topic analysis. To establish coherence, we use the emotional content of music, for film tributes, and ensure topic similarity between lectures and documentaries, for science talks. Composition techniques for the production of multimedia artifacts are addressed as a means of organizing content, in order to improve coherence. 
We discuss our results considering the above aspects.\nWe present a general theory and corresponding declarative model for the embodied grounding and natural language based analytical summarisation of dynamic visuo-spatial imagery. The declarative model ---encompassing spatio-linguistic abstractions, image schemas, and a spatio-temporal feature based language generator--- is modularly implemented within Constraint Logic Programming (CLP). The implemented model is such that primitives of the theory, e.g., pertaining to space and motion, image schemata, are available as first-class objects with `deep semantics' suited for inference and query. We demonstrate the model with select examples broadly motivated by areas such as film, design, geography, smart environments where analytical natural language based externalisations of the moving image are central from the viewpoint of human interaction, evidence-based qualitative analysis, and sensemaking.   Keywords: moving image, visual semantics and embodiment, visuo-spatial cognition and computation, cognitive vision, computational models of narrative, declarative spatial reasoning\nValuation algebras abstract a large number of formalisms for automated reasoning and enable the definition of generic inference procedures. Many of these formalisms provide some notion of solution. Typical examples are satisfying assignments in constraint systems, models in logics or solutions to linear equation systems.   Many widely used dynamic programming algorithms for optimization problems rely on low treewidth decompositions and can be understood as particular cases of a single algorithmic scheme for finding solutions in a valuation algebra. The most encompassing description of this algorithmic scheme to date has been proposed by Pouly and Kohlas together with sufficient conditions for its correctness. Unfortunately, the formalization relies on a theorem for which we provide counterexamples.
In spite of that, the mainline of Pouly and Kohlas' theory is correct, although some of the necessary conditions have to be revised. In this paper we analyze the impact that the counter-examples have on the theory, and rebuild the theory providing correct sufficient conditions for the algorithms. Furthermore, we also provide necessary conditions for the algorithms, allowing for a sharper characterization of when the algorithmic scheme can be applied.\nUncovering causal relationships in data is a major objective of data analytics. Causal relationships are normally discovered with designed experiments, e.g. randomised controlled trials, which, however are expensive or infeasible to be conducted in many cases. Causal relationships can also be found using some well designed observational studies, but they require domain experts' knowledge and the process is normally time consuming. Hence there is a need for scalable and automated methods for causal relationship exploration in data. Classification methods are fast and they could be practical substitutes for finding causal signals in data. However, classification methods are not designed for causal discovery and a classification method may find false causal signals and miss the true ones. In this paper, we develop a causal decision tree where nodes have causal interpretations. Our method follows a well established causal inference framework and makes use of a classic statistical test. The method is practical for finding causal signals in large data sets.\nThe ability to represent complex high dimensional probability distributions in a compact form is one of the key insights in the field of graphical models. Factored representations are ubiquitous in machine learning and lead to major computational advantages. We explore a different type of compact representation based on discrete Fourier representations, complementing the classical approach based on conditional independencies. 
We show that a large class of probabilistic graphical models have a compact Fourier representation. This theoretical result opens up an entirely new way of approximating a probability distribution. We demonstrate the significance of this approach by applying it to the variable elimination algorithm. Compared with the traditional bucket representation and other approximate inference algorithms, we obtain significant improvements.\nThe success of deep learning often derives from well-chosen operational building blocks. In this work, we revise the temporal convolution operation in CNNs to better adapt it to text processing. Instead of concatenating word representations, we appeal to tensor algebra and use low-rank n-gram tensors to directly exploit interactions between words already at the convolution stage. Moreover, we extend the n-gram convolution to non-consecutive words to recognize patterns with intervening words. Through a combination of low-rank tensors, and pattern weighting, we can efficiently evaluate the resulting convolution operation via dynamic programming. We test the resulting architecture on standard sentiment classification and news categorization tasks. Our model achieves state-of-the-art performance both in terms of accuracy and training speed. For instance, we obtain 51.2% accuracy on the fine-grained sentiment classification task.\nSelectScript is an extendable, adaptable, and declarative domain-specific language aimed at information retrieval from simulation environments and robotic world models in an SQL-like manner. In this work we have extended the language in two directions. First, we have implemented hierarchical queries; second, we improve efficiency enabling manual design space exploration on different \"search\" strategies. 
We demonstrate the applicability of such extensions in two application problems; the basic language concepts are explained by solving the classical problem of the Towers of Hanoi and then a common path planning problem in a complex 3D environment is implemented.\nWe propose a distributed deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is based on the deep Q-network, a convolutional neural network trained with a variant of Q-learning. Its input is raw pixels and its output is a value function estimating future rewards from taking an action given a system state. To distribute the deep Q-network training, we adapt the DistBelief software framework to the context of efficiently training reinforcement learning agents. As a result, the method is completely asynchronous and scales well with the number of machines. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to achieve reasonable success on a simple game with minimal parameter tuning.\nMany of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs). Most of these systems contain separate components that deal with the acoustic modelling, language modelling and sequence decoding. We investigate a more direct approach in which the HMM is replaced with a Recurrent Neural Network (RNN) that performs sequence prediction directly at the character level. Alignment between the input features and the desired character sequence is learned automatically by an attention mechanism built into the RNN. For each predicted character, the attention mechanism scans the input sequence and chooses relevant frames. 
We propose two methods to speed up this operation: limiting the scan to a subset of the most promising frames and pooling over time the information contained in neighboring frames, thereby reducing source sequence length. Integrating an n-gram language model into the decoding process yields recognition accuracies similar to other HMM-free RNN-based approaches.\nWe investigate the problem of winner determination from computational social choice theory in the data stream model. Specifically, we consider the task of summarizing an arbitrarily ordered stream of $n$ votes on $m$ candidates into a small space data structure so as to be able to obtain the winner determined by popular voting rules. As we show, finding the exact winner requires storing essentially all the votes. So, we focus on the problem of finding an {\\em $\\eps$-winner}, a candidate who could win by a change of at most $\\eps$ fraction of the votes. We show non-trivial upper and lower bounds on the space complexity of $\\eps$-winner determination for several voting rules, including $k$-approval, $k$-veto, scoring rules, approval, maximin, Bucklin, Copeland, and plurality with runoff.\nThe warehouse is one of the important aspects of a company. Therefore, it is necessary to improve the Warehouse Management System (WMS) so that it has a simple function that can determine the layout of stored goods. In this paper we propose an improved warehouse layout method based on the ant colony algorithm and the backtracking algorithm. The method works in two steps. First, it generates a solution parameter tree using the backtracking algorithm. Second, it deduces the solution parameters using a combination of the ant colony algorithm and the backtracking algorithm. This method was tested by measuring, in two scenarios, the time needed to build the tree and to fill up the space. The method needs 0.294 to 33.15 seconds to construct the tree and 3.23 seconds (best case) to 61.41 minutes (worst case) to fill up the warehouse.
This method proves to be an attractive alternative solution for warehouse layout systems.\nThe margin of victory is easy to compute for many election schemes but difficult for Instant Runoff Voting (IRV). This is important because arguments about the correctness of an election outcome usually rely on the size of the electoral margin. For example, risk-limiting audits require knowledge of the margin of victory in order to determine how much auditing is necessary. This paper presents a practical branch-and-bound algorithm for exact IRV margin computation that substantially improves on the current best-known approach. Although exponential in the worst case, our algorithm runs efficiently in practice on all the real examples we could find. We can efficiently discover exact margins on election instances that cannot be solved by the current state-of-the-art.\nDiscrete combinatorial optimization has a central role in many scientific disciplines; however, for hard problems we lack linear-time algorithms that would allow us to solve very large instances. Moreover, it is still unclear what the key features are that make a discrete combinatorial optimization problem hard to solve. Here we study random K-satisfiability problems with $K=3,4$, which are known to be very hard close to the SAT-UNSAT threshold, where problems stop having solutions. We show that the backtracking survey propagation algorithm, in a time practically linear in the problem size, is able to find solutions very close to the threshold, in a region unreachable by any other algorithm. All solutions found have no frozen variables, thus supporting the conjecture that only unfrozen solutions can be found in linear time, and that a problem becomes impossible to solve in linear time when all solutions contain frozen variables.\nWe propose a method combining relational-logic representations with neural network learning.
A general lifted architecture, possibly reflecting some background domain knowledge, is described through relational rules which may be handcrafted or learned. The relational rule-set serves as a template for unfolding possibly deep neural networks whose structures also reflect the structures of given training or testing relational examples. Different networks corresponding to different examples share their weights, which co-evolve during training by stochastic gradient descent. The framework allows for hierarchical relational modeling constructs and learning of latent relational concepts through shared hidden-layer weights corresponding to the rules. Discovery of notable relational concepts and experiments on 78 relational learning benchmarks demonstrate favorable performance of the method.\nWe consider the Max $K$-Armed Bandit problem, where a learning agent is faced with several sources (arms) of items (rewards), and is interested in finding the best item overall. At each time step the agent chooses an arm, and obtains a random real-valued reward. The rewards of each arm are assumed to be i.i.d., with an unknown probability distribution that generally differs among the arms. Under the PAC framework, we provide lower bounds on the sample complexity of any $(\\epsilon,\\delta)$-correct algorithm, and propose algorithms that attain this bound up to logarithmic factors. We compare the performance of these multi-arm algorithms to the variant in which the arms are not distinguishable by the agent and are chosen randomly at each stage. Interestingly, when the maximal rewards of the arms happen to be similar, the latter approach may provide better performance.\nEntity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities, and merging them into single representations.
Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating three components of ER: (a) Classifiers for duplicate/non-duplicate record pairs built using machine learning (ML) techniques, (b) MDs for supporting both the blocking phase of ML and the merge itself; and (c) The use of the declarative language LogiQL -an extended form of Datalog supported by the LogicBlox platform- for data processing, and the specification and enforcement of MDs.\nConstraint Programming (CP) solvers typically tackle optimization problems by repeatedly finding solutions to a problem while placing tighter and tighter bounds on the solution cost. This approach is somewhat naive, especially for soft-constraint optimization problems in which the soft constraints are mostly satisfied. Unsatisfiable-core approaches to solving soft constraint problems in SAT (e.g. MAXSAT) force all soft constraints to be hard initially. When solving fails they return an unsatisfiable core, as a set of soft constraints that cannot hold simultaneously. These are reverted to soft and solving continues. Since lazy clause generation solvers can also return unsatisfiable cores we can adapt this approach to constraint programming. We adapt the original MAXSAT unsatisfiable core solving approach to be usable for constraint programming and define a number of extensions. Experimental results show that our methods are beneficial on a broad class of CP-optimization benchmarks involving soft constraints, cardinality or preferences.\nWe present a unified framework which supports grounding natural-language semantics in robotic driving. 
This framework supports acquisition (learning grounded meanings of nouns and prepositions from human annotation of robotic driving paths), generation (using such acquired meanings to generate sentential descriptions of new robotic driving paths), and comprehension (using such acquired meanings to support automated driving to accomplish navigational goals specified in natural language). We evaluate the performance of these three tasks by having independent human judges rate the semantic fidelity of the sentences associated with paths, achieving an overall average correctness of 94.6% and an overall average completeness of 85.6%.\nIn recent years, many methods have been developed for detecting causal relationships in observational data. Some of them have the potential to tackle large data sets. However, these methods fail to discover a combined cause, i.e. a multi-factor cause consisting of two or more component variables which individually are not causes. A straightforward approach to uncovering a combined cause is to include both individual and combined variables in the causal discovery using existing methods, but this scheme is computationally infeasible due to the huge number of combined variables. In this paper, we propose a novel approach to address this practical causal discovery problem, i.e. mining combined causes in large data sets. The experiments with both synthetic and real-world data sets show that the proposed method can obtain high-quality causal discoveries with high computational efficiency.\nAccurate software effort estimation has been a challenge for many software practitioners and project managers. Underestimation leads to disruption in the project's estimated cost and delivery. On the other hand, overestimation causes outbidding and financial losses in business. Many software estimation models exist; however, none have been proven to be the best in all situations.
In this paper, a decision tree forest (DTF) model is compared to a traditional decision tree (DT) model, as well as a multiple linear regression model (MLR). The evaluation was conducted using the ISBSG and Desharnais industrial datasets. Results show that the DTF model is competitive and can be used as an alternative in software effort prediction.\nBayesian networks, and especially their structures, are powerful tools for representing conditional independencies and dependencies between random variables. In applications where related variables form a priori known groups, chosen to represent different \"views\" of or aspects of the same entities, one may be more interested in modeling dependencies between groups of variables rather than between individual variables. Motivated by this, we study the prospects of representing relationships between variable groups using Bayesian network structures. We show that for dependency structures between groups to be expressible exactly, the data have to satisfy the so-called groupwise faithfulness assumption. We also show that one cannot learn causal relations between groups using only groupwise conditional independencies; variable-wise relations are also needed. Additionally, we present algorithms for finding the groupwise dependency structures.\nThe integration of Linked Open Data (LOD) content in Web pages is a challenging and sometimes tedious task for Web developers. At the same time, most software packages for blogs, content management systems (CMS), and shop applications support the consumption of feed formats, namely RSS and Atom. In this technical report, we demonstrate an on-line tool that fetches e-commerce data from a SPARQL endpoint and syndicates the obtained results as RSS or Atom feeds.
Our approach combines (1) the popularity and broad tooling support of existing feed formats, (2) the precision of queries against structured data built upon common Web vocabularies like schema.org, GoodRelations, FOAF, VCard, and WGS 84, and (3) the ease of integrating content from a large number of Web sites and other data sources in RDF in general.\nWe propose an end-to-end, domain-independent neural encoder-aligner-decoder model for selective generation, i.e., the joint task of content selection and surface realization. Our model first encodes a full set of over-determined database event records via an LSTM-based recurrent neural network, then utilizes a novel coarse-to-fine aligner to identify the small subset of salient records to talk about, and finally employs a decoder to generate free-form descriptions of the aligned, selected records. Our model achieves the best selection and generation results reported to-date (with 59% relative improvement in generation) on the benchmark WeatherGov dataset, despite using no specialized features or linguistic resources. Using an improved k-nearest neighbor beam filter helps further. We also perform a series of ablations and visualizations to elucidate the contributions of our key model components. Lastly, we evaluate the generalizability of our model on the RoboCup dataset, and get results that are competitive with or better than the state-of-the-art, despite being severely data-starved.\nSeveral techniques have been used to generate weather forecast texts. In this paper, case based reasoning (CBR) is proposed for weather forecast text generation because similar weather conditions occur over time and should have similar forecast texts. CBR-METEO, a system for generating weather forecast texts was developed using a generic framework (jCOLIBRI) which provides modules for the standard components of the CBR architecture. 
The advantage of a CBR approach is that systems can be built in minimal time with far less human effort after initial consultation with experts. The approach depends heavily on the goodness of the retrieval and revision components of the CBR process. We evaluated CBR-METEO with NIST, an automated metric which has been shown to correlate well with human judgements for this domain. The system shows performance comparable to other NLG systems that perform the same task.\nReal-life problems such as scheduling meetings between people at different locations can be modelled as distributed Constraint Satisfaction Problems (CSPs). Suitable and satisfactory solutions can then be found using constraint satisfaction algorithms which can be exhaustive (backtracking) or otherwise (local search). However, most research in this area has tested algorithms by simulation on a single PC with a single program entry point. The main contribution of our work is the design and implementation of a truly distributed constraint solver based on a local search algorithm using the Java Agent DEvelopment framework (JADE) to enable communication between agents on different machines. In particular, we discuss design and implementation issues related to a truly distributed constraint solver that might not be critical when simulated on a single machine. Evaluation results indicate that our truly distributed constraint solver works well within the observed limitations when tested with various distributed CSPs. Our application can also incorporate any constraint solving algorithm with little modification.\nWe propose a quantization-based approach for fast approximate Maximum Inner Product Search (MIPS). Each database vector is quantized in multiple subspaces via a set of codebooks, learned directly by minimizing the inner product quantization error. Then, the inner product of a query to a database vector is approximated as the sum of inner products with the subspace quantizers.
Unlike recently proposed LSH approaches to MIPS, the database vectors and queries do not need to be augmented in a higher-dimensional feature space. We also provide a theoretical analysis of the proposed approach, consisting of concentration results under mild assumptions. Furthermore, if a small sample of example queries is given at training time, we propose a modified codebook learning procedure which further improves the accuracy. Experimental results on a variety of datasets including those arising from deep neural networks show that the proposed approach significantly outperforms the existing state-of-the-art.\nThis report presents Giraffe, a chess engine that uses self-play to discover all its domain-specific knowledge, with minimal hand-crafted knowledge given by the programmer. Unlike previous attempts using machine learning only to perform parameter-tuning on hand-crafted evaluation functions, Giraffe's learning system also performs automatic feature extraction and pattern recognition. The trained evaluation function performs comparably to the evaluation functions of state-of-the-art chess engines, all of which contain thousands of lines of carefully hand-crafted pattern recognizers, tuned over many years by both computer chess experts and human chess masters. Giraffe is the most successful attempt thus far at using end-to-end machine learning to play chess.\nThis paper provides a method for solving the reverse Monge-Kantorovich transport problem (TP). It allows one to accumulate the positive decision-making experience of the decision-taker in situations that can be presented in the form of a TP. The initial data for the solution of the inverse TP is the information on orders, inventories and effective decisions taken by the decision-taker. The result of solving the inverse TP contains evaluations of the TP's payoff matrix elements. It can be used in new situations to select the solution corresponding to the preferences of the decision-taker.
The method captures the decision-taker's experience so that it can be used by others. The method allows one to build a model of the decision-taker's preferences in a specific application area. The model can be updated regularly to ensure its relevance and adequacy to the decision-taker's system of preferences. This model is adaptive to the current preferences of the decision-taker.\nIn this paper, we consider a finite-horizon Markov decision process (MDP) for which the objective at each stage is to minimize a quantile-based risk measure (QBRM) of the sequence of future costs; we call the overall objective a dynamic quantile-based risk measure (DQBRM). In particular, we consider optimizing dynamic risk measures where the one-step risk measures are QBRMs, a class of risk measures that includes the popular value at risk (VaR) and the conditional value at risk (CVaR). Although there is considerable theoretical development of risk-averse MDPs in the literature, the computational challenges have not been explored as thoroughly. We propose data-driven and simulation-based approximate dynamic programming (ADP) algorithms to solve the risk-averse sequential decision problem. We address the issue of inefficient sampling for risk applications in simulated settings and present a procedure, based on importance sampling, to direct samples toward the \"risky region\" as the ADP algorithm progresses. Finally, we show numerical results of our algorithms in the context of an application involving risk-averse bidding for energy storage.\nThe paper presents a new script classification method for the discrimination of the South Slavic medieval labels. It consists of a textural analysis of the script types. In the first step, each letter is coded by the equivalent script type, which is defined by its typographical features. The obtained coded text is subjected to run-length statistical analysis and to adjacent local binary pattern analysis in order to extract the features.
The results show a diversity between the extracted features of the scripts, which makes the feature classification more effective. This is the basis for the script identification, which uses an extension of a state-of-the-art approach for document clustering. The proposed method is evaluated on examples of labels hand-engraved in stone and hand-printed on paper in old Cyrillic, angular and round Glagolitic. Experiments yield very positive results, which demonstrate the effectiveness of the proposed method.\nIn this paper, we investigate bounded action theories in the situation calculus. A bounded action theory is one which entails that, in every situation, the number of object tuples in the extension of fluents is bounded by a given constant, although such extensions are in general different across the infinitely many situations. We argue that such theories are common in applications, either because facts do not persist indefinitely or because the agent eventually forgets some facts, as new ones are learnt. We discuss various classes of bounded action theories. Then we show that verification of a powerful first-order variant of the mu-calculus is decidable for such theories. Notably, this variant supports a controlled form of quantification across situations. We also show that through verification, we can actually check whether an arbitrary action theory maintains boundedness.\nLightweight, source-to-source transformation approaches to implementing MCMC for probabilistic programming languages are popular for their simplicity, support of existing deterministic code, and ability to execute on existing fast runtimes. However, they are also slow, requiring a complete re-execution of the program on every Metropolis-Hastings proposal. We present a new extension to the lightweight approach, C3, which enables efficient, incrementalized re-execution of MH proposals.
C3 is based on two core ideas: transforming probabilistic programs into continuation passing style (CPS), and caching the results of function calls. We show that on several common models, C3 reduces proposal runtime by 20-100x, in some cases reducing runtime complexity from linear in model size to constant. We also demonstrate nearly an order of magnitude speedup on a complex inverse procedural modeling application.\nMarkov decision processes (MDPs) are a well-studied framework for solving sequential decision-making problems under uncertainty. Exact methods for solving MDPs based on dynamic programming such as policy iteration and value iteration are effective on small problems. In problems with a large discrete state space or with continuous state spaces, a compact representation is essential for providing efficient approximate solutions to MDPs. Commonly used approximation algorithms involve constructing basis functions for projecting the value function onto a low-dimensional subspace and building a factored or hierarchical graphical model to decompose the transition and reward functions. However, hand-coding a good compact representation for a given reinforcement learning (RL) task can be quite difficult and time-consuming. Recent approaches have attempted to automatically discover efficient representations for RL. In this thesis proposal, we discuss the problem of automatically constructing structured kernels for kernel-based RL, a popular approach to learning non-parametric approximations of the value function. We explore a space of kernel structures which are built compositionally from base kernels using a context-free grammar. We examine a greedy algorithm for searching over the structure space.
To demonstrate how the learned structure can represent and approximate the original RL problem in terms of compactness and efficiency, we plan to evaluate our method on a synthetic problem and compare it to other RL baselines.\nMulti Expression Programming (MEP) is an evolutionary technique that may be used for solving computationally difficult problems. MEP uses a linear solution representation. Each MEP individual is a string encoding complex expressions (computer programs). An MEP individual may encode multiple solutions of the current problem. In this paper MEP is used for evolving a Traveling Salesman Problem (TSP) heuristic for graphs satisfying the triangle inequality. The evolved MEP heuristic is compared with the Nearest Neighbor Heuristic (NN) and the Minimum Spanning Tree Heuristic (MST) on some difficult problems in TSPLIB. For most of the considered problems the evolved MEP heuristic outperforms NN and MST. The obtained algorithm was tested against some problems in TSPLIB. The results emphasize that the evolved MEP heuristic is a powerful tool for solving difficult TSP instances.\nMany practical techniques for probabilistic inference require a sequence of distributions that interpolate between a tractable distribution and an intractable distribution of interest. Usually, the sequences used are simple, e.g., based on geometric averages between distributions. When models are expressed as probabilistic programs, the models themselves are highly structured objects that can be used to derive annealing sequences that are more sensitive to domain structure. We propose an algorithm for transforming probabilistic programs to coarse-to-fine programs which have the same marginal distribution as the original programs, but generate the data at increasing levels of detail, from coarse to fine. We apply this algorithm to an Ising model, its depth-from-disparity variation, and a factorial hidden Markov model.
We show preliminary evidence that the use of coarse-to-fine models can make existing generic inference algorithms more efficient.\nThis paper proposes GProp, a deep reinforcement learning algorithm for continuous policies with compatible function approximation. The algorithm is based on two innovations. Firstly, we present a temporal-difference based method for learning the gradient of the value-function. Secondly, we present the deviator-actor-critic (DAC) model, which comprises three neural networks that estimate the value function, its gradient, and determine the actor's policy respectively. We evaluate GProp on two challenging tasks: a contextual bandit problem constructed from nonparametric regression datasets that is designed to probe the ability of reinforcement learning algorithms to accurately estimate gradients; and the octopus arm, a challenging reinforcement learning benchmark. GProp is competitive with fully supervised methods on the bandit task and achieves the best performance to date on the octopus arm.\nMining biological data is an emergent area at the intersection between bioinformatics and data mining (DM). The intelligent agent based model is a popular approach in constructing Distributed Data Mining (DDM) systems to address scalable mining over large scale distributed data. The nature of associations between different amino acids in proteins has also been a subject of great anxiety. There is a strong need to develop new models and exploit and analyze the available distributed biological data sources. In this study, we have designed and implemented a multi-agent system (MAS) called Agent enriched Quantitative Association Rules Mining for Amino Acids in distributed Protein Data Banks (AeQARM-AAPDB). Such globally strong association rules enhance understanding of protein composition and are desirable for synthesis of artificial proteins. 
A real protein data bank is used to validate the system.\nThe research presents epsilon hierarchical fuzzy twin support vector regression based on epsilon fuzzy twin support vector regression and epsilon twin support vector regression. Epsilon FTSVR is achieved by incorporating trapezoidal fuzzy numbers into epsilon TSVR, which takes care of the uncertainty existing in forecasting problems. Epsilon FTSVR determines a pair of epsilon-insensitive proximal functions by solving two related quadratic programming problems. The structural risk minimization principle is implemented by introducing a regularization term in the primal problems of epsilon FTSVR. This yields stable positive definite dual problems, which improve regression performance. Epsilon FTSVR is then reformulated as epsilon HFTSVR consisting of a set of hierarchical layers each containing epsilon FTSVR. Experimental results on both synthetic and real datasets reveal that epsilon HFTSVR has remarkable generalization performance with minimum training time.\nWe consider the following problem in which a given number of items has to be chosen from a predefined set. Each item is described by a vector of attributes and for each attribute there is a desired distribution that the selected set should have. We look for a set that fits the desired distributions on all attributes as closely as possible. Examples of applications include choosing members of a representative committee, where candidates are described by attributes such as sex, age and profession, and where we look for a committee that for each attribute offers a certain representation, i.e., a single committee that contains a certain number of young and old people, certain number of men and women, certain number of people with different professions, etc.
With a single attribute the problem collapses to the apportionment problem for party-list proportional representation systems (in such a case the value of the single attribute would be the political affiliation of a candidate). We study the properties of the associated subset selection rules, as well as their computational complexity.\nNew proof assistant developments often involve concepts similar to already formalized ones. When proving their properties, a human can often take inspiration from the existing formalized proofs available in other provers or libraries. In this paper we propose and evaluate a number of methods that strengthen proof automation by learning from proof libraries of different provers. Certain conjectures can be proved directly from the dependencies induced by similar proofs in the other library. Even if exact correspondences are not found, learning-reasoning systems can make use of the association between proved theorems and their characteristics to predict the relevant premises. Such external help can be further combined with internal advice. We evaluate the proposed knowledge-sharing methods by reproving the HOL Light and HOL4 standard libraries. The learning-reasoning system HOL(y)Hammer, whose single best strategy could automatically find proofs for 30% of the HOL Light problems, can prove 40% with the knowledge from HOL4.\nLearning-assisted automated reasoning has recently gained popularity among the users of Isabelle/HOL, HOL Light, and Mizar. In this paper, we present an add-on to the HOL4 proof assistant and an adaptation of the HOLyHammer system that provides machine learning-based premise selection and automated reasoning also for HOL4. We efficiently record the HOL4 dependencies and extract features from the theorem statements, which form a basis for premise selection. HOLyHammer transforms the HOL4 statements into the various TPTP-ATP proof formats, which are then processed by the ATPs.
We discuss the different evaluation settings: ATPs, accessible lemmas, and premise numbers. We measure the performance of HOLyHammer on the HOL4 standard library. The results are combined accordingly and compared with the HOL Light experiments, showing a comparably high quality of predictions. The system directly benefits HOL4 users by automatically finding proof dependencies that can be reconstructed by Metis.\nProbabilistic programming provides the means to represent and reason about complex probabilistic models using programming language constructs. Even simple probabilistic programs can produce models with infinitely many variables. Factored inference algorithms are widely used for probabilistic graphical models, but cannot be applied to these programs because all the variables and factors have to be enumerated. In this paper, we present a new inference framework, lazy factored inference (LFI), that enables factored algorithms to be used for models with infinitely many variables. LFI expands the model to a bounded depth and uses the structure of the program to precisely quantify the effect of the unexpanded part of the model, producing lower and upper bounds on the probability of the query.\nDung's abstract argumentation framework consists of a set of interacting arguments and a series of semantics for evaluating them. Those semantics partition the powerset of the set of arguments into two classes: extensions and non-extensions. In order to reason with a specific semantics, one needs to take a credulous or skeptical approach, i.e., an argument is eventually accepted if it is accepted in one or all extensions, respectively. In our previous work \\cite{ref-pu2015counting}, we proposed a novel semantics, called \\emph{counting semantics}, which allows for a more fine-grained assessment of arguments by counting the number of their respective attackers and defenders based on the argument graph and argument game. 
In this paper, we continue our previous work by presenting supplementary results on how to choose the damaging factor for the counting semantics, and on its relationships with some existing approaches, such as Dung's classical semantics and generic gradual valuations. Lastly, an axiomatic perspective on the ranking semantics induced by our counting semantics is presented.\nIn the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment, using some prior knowledge that is accessed beforehand. Many BRL algorithms have already been proposed, but even though a few toy examples exist in the literature, there are still no extensive or rigorous benchmarks to compare them. The paper addresses this problem, and provides a new BRL comparison methodology along with the corresponding open-source library. In this methodology, a comparison criterion that measures the performance of algorithms on large sets of Markov Decision Processes (MDPs) drawn from some probability distributions is defined. In order to enable the comparison of non-anytime algorithms, our methodology also includes a detailed analysis of the computation time requirement of each algorithm. Our library is released with all source code and documentation: it includes three test problems, each of which has two different prior distributions, and seven state-of-the-art RL algorithms. Finally, our library is illustrated by comparing all the available algorithms, and the results are discussed.\nFeature weighting algorithms try to solve a problem of great importance nowadays in machine learning: the search for a relevance measure for the features of a given domain. This relevance is primarily used for feature selection, as feature weighting can be seen as a generalization of it, but it is also useful to better understand a problem's domain or to guide an inductor in its learning process. 
The Relief family of algorithms has proven to be very effective in this task.   In previous work, a new extension was proposed that aimed to improve the algorithm's performance, and it was shown that in certain cases it improved the accuracy of the weight estimates. However, it also seemed to be sensitive to some characteristics of the data. This work presents an improvement of that previously presented extension that aims to make it more robust to problem-specific characteristics. An experimental design is proposed to test its performance. The test results show that it indeed increases the robustness of the previously proposed extension.\nWith the impressive capability to capture visual content, deep convolutional neural networks (CNN) have demonstrated promising performance in various vision-based applications, such as classification, recognition, and object detection. However, due to the intrinsic structural design of CNNs, for images with complex content they achieve only limited invariance to translation, rotation, and resizing, which is strongly emphasized in the scenario of content-based image retrieval. In this paper, to address this problem, we propose a new kernelized deep convolutional neural network. We first discuss our motivation with an experimental study demonstrating the sensitivity of the global CNN feature to the basic geometric transformations. Then, we propose to represent visual content with approximate invariance to the above geometric transformations from a kernelized perspective. We extract CNN features on the detected object-like patches and aggregate these patch-level CNN features to form a vectorial representation with the Fisher vector model. The effectiveness of our proposed algorithm is demonstrated on an image search application with three benchmark datasets.\nThis paper addresses the problem of scalable optimization for L1-regularized conditional Gaussian graphical models. 
Conditional Gaussian graphical models generalize the well-known Gaussian graphical models to conditional distributions to model the output network influenced by conditioning input variables. While highly scalable optimization methods exist for sparse Gaussian graphical model estimation, state-of-the-art methods for conditional Gaussian graphical models are not efficient enough and, more importantly, fail due to memory constraints for very large problems. In this paper, we propose a new optimization procedure based on a Newton method that efficiently iterates over two sub-problems, leading to drastic improvement in computation time compared to the previous methods. We then extend our method to scale to large problems under memory constraints, using block coordinate descent to limit memory usage while achieving fast convergence. Using synthetic and genomic data, we show that our methods can solve one-million-dimensional problems to high accuracy in a little over a day on a single machine.\nA major problem of causal inference is the arrangement of dependent nodes in a directed acyclic graph (DAG) with path coefficients and observed confounders. Path coefficients do not provide the units to measure the strength of information flowing from one node to the other. Here we propose a method of causal structure learning using collider v-structures (CVS) with Negative Percentage Mapping (NPM) to obtain selective thresholds of information strength, to direct the edges and subjective confounders in a DAG. The NPM is used to scale the strength of information passed through nodes in units of percentage in the interval from 0 to 1. The causal structures are constructed by a bottom-up approach using path coefficients, causal directions and confounders, derived by implementing collider v-structures and NPM. The method is self-sufficient to observe all the latent confounders present in the causal model and capable of detecting every responsible causal direction. 
The method is tested on simulated datasets with non-Gaussian distributions and compared with DirectLiNGAM and ICA-LiNGAM to check the efficiency of the proposed method.\nAnticipating the future actions of a human is a widely studied problem in robotics that requires spatio-temporal reasoning. In this work we propose a deep learning approach for anticipation in sensory-rich robotics applications. We introduce a sensory-fusion architecture which jointly learns to anticipate and fuse information from multiple sensory streams. Our architecture consists of Recurrent Neural Networks (RNNs) that use Long Short-Term Memory (LSTM) units to capture long temporal dependencies. We train our architecture in a sequence-to-sequence prediction manner, and it explicitly learns to predict the future given only a partial temporal context. We further introduce a novel loss layer for anticipation which prevents over-fitting and encourages early anticipation. We use our architecture to anticipate driving maneuvers several seconds before they happen on a natural driving data set of 1180 miles. The context for maneuver anticipation comes from multiple sensors installed on the vehicle. Our approach shows significant improvement over the state-of-the-art in maneuver anticipation by increasing the precision from 77.4% to 90.5% and recall from 71.2% to 87.4%.\nWe study a general task allocation problem, involving multiple agents that collaboratively accomplish tasks and where agents may fail to successfully complete the tasks assigned to them (known as execution uncertainty). The goal is to choose an allocation that maximises social welfare while taking their execution uncertainty into account. We show that this can be achieved by using the post-execution verification (PEV)-based mechanism if and only if agents' valuations satisfy a multilinearity condition. 
We then consider a more complex setting where an agent's execution uncertainty is not completely predictable by the agent alone but aggregated from all agents' private opinions (known as trust). We show that the PEV-based mechanism with trust is still truthfully implementable if and only if the trust aggregation is multilinear.\nReal-time estimation of destination and travel time for taxis is of great importance for existing electronic dispatch systems. We present an approach based on trip matching and ensemble learning, in which we leverage the patterns observed in a dataset of roughly 1.7 million taxi journeys to predict the corresponding final destination and travel time for ongoing taxi trips, as a solution for the ECML/PKDD Discovery Challenge 2015 competition. The results of our empirical evaluation show that our approach is effective and very robust, which led our team -- BlueTaxi -- to the 3rd and 7th positions of the final rankings for the trip time and destination prediction tasks, respectively. Given that the final rankings were computed using a very small test set (with only 320 trips), we believe that our approach is one of the most robust solutions for the challenge based on the consistency of our good results across the test sets.\nThis paper investigates the mining of class association rules with a rough set approach. In data mining, an association occurs between two sets of elements when one element set occurs together with another. A class association rule set (CARs) is a subset of association rules with classes specified as their consequences. We present an efficient algorithm for mining the finest class rule set, inspired by the Apriori algorithm, where the support and confidence are computed based on the elementary set of lower approximation included in the property of rough set theory. 
Our proposed approach has been shown to be very effective, and the rough set approach for class association discovery is much simpler than the classic association method.\nLearning from synthetic data has many important and practical applications. An example of such an application is photo-sketch recognition. Using synthetic data is challenging due to the differences in feature distributions between synthetic and real data, a phenomenon we term the synthetic gap. In this paper, we investigate and formalize a general framework, the Stacked Multichannel Autoencoder (SMCAE), that enables bridging the synthetic gap and learning from synthetic data more efficiently. In particular, we show that our SMCAE can not only transform and use synthetic data on the challenging face-sketch recognition task, but that it can also help simulate real images, which can be used for training classifiers for recognition. Preliminary experiments validate the effectiveness of the framework.\nRecently, knowledge graph embedding, which projects symbolic entities and relations into a continuous vector space, has become a new, hot topic in artificial intelligence. This paper addresses a new issue of multiple relation semantics, namely that a relation may have multiple meanings revealed by the entity pairs associated with the corresponding triples, and proposes a novel Gaussian mixture model for embedding, TransG. The new model can discover latent semantics for a relation and leverage a mixture of relation component vectors for embedding a fact triple. To the best of our knowledge, this is the first generative model for knowledge graph embedding that is able to deal with multiple relation semantics. 
Extensive experiments show that the proposed model achieves substantial improvements over the state-of-the-art baselines.\nThis volume contains the proceedings of the Thirteenth International Workshop on the ACL2 Theorem Prover and Its Applications, ACL2 2015, a two-day workshop held in Austin, Texas, USA, on October 1-2, 2015. ACL2 workshops occur at approximately 18-month intervals and provide a major technical forum for researchers to present and discuss improvements and extensions to the theorem prover, comparisons of ACL2 with other systems, and applications of ACL2 in formal verification.   ACL2 is a state-of-the-art automated reasoning system that has been successfully applied in academia, government, and industry for specification and verification of computing systems and in teaching computer science courses. In 2005, Boyer, Kaufmann, and Moore received the ACM Software System Award for their work on ACL2 and the other theorem provers in the Boyer-Moore family.\nThis paper presents a case study of a recommender system that can be used to save energy in smart homes without lowering the comfort of the inhabitants. We present an algorithm that uses only consumer behavior data and applies machine learning to suggest actions for inhabitants to reduce the energy consumption of their homes. The system mines for frequent and periodic patterns in the event data provided by the Digitalstrom home automation system. These patterns are converted into association rules, prioritized and compared with the current behavior of the inhabitants. If the system detects an opportunity to save energy without decreasing the comfort level, it sends a recommendation to the residents.\nIn this paper we extend the classical notion of strong and weak backdoor sets for SAT and CSP by allowing different instantiations of the backdoor variables to result in instances that belong to different base classes; the union of the base classes forms a heterogeneous base class. 
Backdoor sets to heterogeneous base classes can be much smaller than backdoor sets to homogeneous ones; hence they are much more desirable but possibly harder to find. We draw a detailed complexity landscape for the problem of detecting strong and weak backdoor sets into heterogeneous base classes for SAT and CSP.\nThe Minimum Vertex Cover (MinVC) problem is a well-known NP-hard problem. Recently there has been great interest in solving this problem on real-world massive graphs. For such graphs, local search is a promising approach to finding optimal or near-optimal solutions. In this paper we propose a local search algorithm that exploits reduction rules and data structures to solve the MinVC problem in such graphs. Experimental results on a wide range of real-world massive graphs show that our algorithm finds better covers than state-of-the-art local search algorithms for MinVC. We also present interesting results about the complexities of some well-known heuristics.\nIn this paper, we address the task of Optical Character Recognition (OCR) for the Telugu script. We present an end-to-end framework that segments the text image, classifies the characters and extracts lines using a language model. The segmentation is based on mathematical morphology. The classification module, which is the most challenging task of the three, is a deep convolutional neural network. The language is modelled as a third-degree Markov chain at the glyph level. Telugu script is a complex alphasyllabary and the language is agglutinative, making the problem hard. In this paper we apply the latest advances in neural networks to achieve state-of-the-art error rates. We also review convolutional neural networks in great detail and expound the statistical justification behind the many tricks needed to make Deep Learning work.\nIn this study, an artificial neural network (ANN) based on particle swarm optimization (PSO) was developed for time series prediction. 
The hybrid ANN+PSO algorithm was applied to the Mackey--Glass chaotic time series for the short-term prediction of $x(t+6)$. The prediction performance was evaluated and compared with other studies available in the literature. Also, we presented properties of the dynamical system via the study of the chaotic behaviour obtained from the predicted time series. Next, the hybrid ANN+PSO algorithm was complemented with a Gaussian stochastic procedure (called {\\it stochastic} hybrid ANN+PSO) in order to obtain a new estimator of the predictions, which also allowed us to compute uncertainties of predictions for noisy Mackey--Glass chaotic time series. Thus, we studied the impact of noise for several cases with a white noise level ($\\sigma_{N}$) from 0.01 to 0.1.\nPoker is a family of card games that includes many variations. We hypothesize that most poker games can be solved as a pattern matching problem, and propose creating a strong poker playing system based on a unified poker representation. Our poker player learns through iterative self-play, and improves its understanding of the game by training on the results of its previous actions without sophisticated domain knowledge. We evaluate our system on three poker games: single-player video poker, two-player Limit Texas Hold'em, and finally two-player 2-7 triple draw poker. We show that our model can quickly learn patterns in these very different poker games while it improves from zero knowledge to a competitive player against human experts.   The contributions of this paper include: (1) a novel representation for poker games, extendable to different poker variations, (2) a CNN-based learning model that can effectively learn the patterns in three different games, and (3) a self-trained system that significantly beats the heuristic-based program on which it is trained and is competitive against human expert players.\nWe study hedonic games with dichotomous preferences. 
Hedonic games are cooperative games in which players desire to form coalitions, but only care about the makeup of the coalitions of which they are members; they are indifferent about the makeup of other coalitions. The assumption of dichotomous preferences means that, additionally, each player's preference relation partitions the set of coalitions of which that player is a member into just two equivalence classes: satisfactory and unsatisfactory. A player is indifferent between satisfactory coalitions, and is indifferent between unsatisfactory coalitions, but strictly prefers any satisfactory coalition over any unsatisfactory coalition. We develop a succinct representation for such games, in which each player's preference relation is represented by a propositional formula. We show how solution concepts for hedonic games with dichotomous preferences are characterised by propositional formulas.\nThe phenomenal growth in healthcare data has inspired us to investigate robust and scalable models for data mining. For classification problems, the Information Gain (IG)-based Decision Tree is one of the popular choices. However, depending upon the nature of the dataset, the IG-based Decision Tree may not always perform well, as it prefers the attribute with more distinct values as the splitting attribute. Healthcare datasets generally have many attributes and each attribute generally has many distinct values. In this paper, we focus on this characteristic of the datasets while analysing the performance of our proposed approach, which is a variant of the Decision Tree model and uses the concept of Correlation Ratio (CR). Unlike the IG-based approach, this CR-based approach has no bias towards the attribute with more distinct values. 
We have applied our model to some benchmark healthcare datasets to show the effectiveness of the proposed technique.\nIn this paper, we evaluate convolutional neural network (CNN) features using the AlexNet architecture and the very deep convolutional network (VGGNet) architecture. To date, most CNN researchers have employed the last layers before the output, which were extracted from the fully connected feature layers. However, since it is unlikely that feature representation effectiveness is dependent on the problem, this study evaluates additional convolutional layers that are adjacent to fully connected layers, in addition to executing simple tuning for feature concatenation (e.g., layer 3 + layer 5 + layer 7) and transformation, using tools such as principal component analysis. In our experiments, we carried out detection and classification tasks using the Caltech 101 and Daimler Pedestrian Benchmark Datasets.\nA robot operating in a real-world environment needs to perform reasoning over a variety of sensor modalities such as vision, language and motion trajectories. However, it is extremely challenging to manually design features relating such disparate modalities. In this work, we introduce an algorithm that learns to embed point-cloud, natural language, and manipulation trajectory data into a shared embedding space with a deep neural network. To learn semantically meaningful spaces throughout our network, we use a loss-based margin to bring embeddings of relevant pairs closer together while driving less-relevant cases from different modalities further apart. We use this both to pre-train the lower layers of our network and to fine-tune our final embedding space, leading to a more robust representation. We test our algorithm on the task of manipulating novel objects and appliances based on prior experience with other objects. On a large dataset, we achieve significant improvements in both accuracy and inference time over the previous state of the art. 
We also perform end-to-end experiments on a PR2 robot utilizing our learned embedding space.\nIt is commonplace to encounter nonstationary data, whose underlying generating process may change over time or across domains. The nonstationarity presents both challenges and opportunities for causal discovery. In this paper we propose a principled framework to handle nonstationarity, and develop some methods to address three important questions. First, we propose an enhanced constraint-based method to detect variables whose local mechanisms are nonstationary and recover the skeleton of the causal structure over observed variables. Second, we present a way to determine some causal directions by taking advantage of information carried by changing distributions. Third, we develop a method for visualizing the nonstationarity of causal modules. Experimental results on various synthetic and real-world data sets are presented to demonstrate the efficacy of our methods.\nDemand for high software reliability requires rigorous testing as well as robust modeling techniques for software quality prediction. While firms have to steadily manage reliability by testing vigorously, determining the optimal release time is their biggest concern. In the past, many models have been developed and much research has been devoted to assessing the release time of software. However, the majority of this work is based on crisp rather than fuzzy formulations. This paper addresses the problem of release time prediction using fuzzy logic. Here we formulate a fuzzy release time problem considering the cost of testing under the impact of the warranty period. Results show that the fuzzy model has good adaptability.\nWe consider an agent seeking to obtain an item, potentially available at different locations in a physical environment. The traveling costs between locations are known in advance, but there is only probabilistic knowledge regarding the possible prices of the item at any given location. 
Given such a setting, the problem is to find a plan that maximizes the probability of acquiring the good while minimizing both travel and purchase costs. Sample applications include agents in search-and-rescue or exploration missions, e.g., a rover on Mars seeking to mine a specific mineral. These probabilistic physical search problems have been previously studied, but we present the first approximation and heuristic algorithms for solving such problems on general graphs. We establish an interesting connection between these problems and classical graph-search problems, which led us to provide the approximation algorithms and hardness of approximation results for our settings. We further suggest several heuristics for practical use, and demonstrate their effectiveness with simulations on real graph structures and synthetic graphs.\nBoolean matrix factorization and Boolean matrix completion from noisy observations are desirable unsupervised data-analysis methods due to their interpretability, but hard to perform due to their NP-hardness. We treat these problems as maximum a posteriori inference problems in a graphical model and present a message passing approach that scales linearly with the number of observations and factors. Our empirical study demonstrates that message passing is able to recover low-rank Boolean matrices within the boundaries of theoretically possible recovery, and compares favorably with the state of the art in real-world applications, such as collaborative filtering with large-scale Boolean data.\nWe propose a particularly structured Boltzmann machine, which we refer to as a dynamic Boltzmann machine (DyBM), as a stochastic model of a multi-dimensional time-series. The DyBM can have infinitely many layers of units but allows exact and efficient inference and learning when its parameters have a proposed structure. 
This proposed structure is motivated by postulates and observations, from biological neural networks, that the synaptic weight is strengthened or weakened depending on the timing of spikes (i.e., spike-timing-dependent plasticity or STDP). We show that the learning rule of updating the parameters of the DyBM in the direction of maximizing the likelihood of given time-series can be interpreted as STDP with long-term potentiation and long-term depression. The learning rule has a guarantee of convergence and can be performed in a distributed manner (i.e., local in space) with limited memory (i.e., local in time).\nReasoning with ontologies is one of the core fields of research in Description Logics. A variety of efficient reasoners with highly optimized algorithms have been developed to allow inference tasks on expressive ontology languages such as OWL(DL). However, reported reasoner computing times have sometimes exceeded, and sometimes fallen behind, the expected theoretical values. From an empirical perspective, it is not yet well understood which particular aspects of an ontology are factors that degrade reasoner performance. In this paper, we conducted an investigation of state-of-the-art works that attempted to portray potential correlations between reasoners' empirical behaviour and particular ontological features. These works were analysed and then broken down into categories. Further, we proposed a set of ontology features covering a broad range of structural and syntactic ontology characteristics. We claim that these features are good indicators of the ontology hardness level against reasoning tasks.\nWe consider one-way vehicle sharing systems where customers can rent a car at one station and drop it off at another. The problem we address is to optimize the distribution of cars, and quality of service, by pricing rentals appropriately. 
We propose a bidding approach that is inspired by auctions and takes into account the significant uncertainty inherent in the problem data (e.g., pick-up and drop-off locations, time of requests, and duration of trips). Specifically, in contrast to current vehicle sharing systems, the operator does not set prices. Instead, customers submit bids and the operator decides whether to rent or not. The operator can even accept negative bids to motivate drivers to rebalance available cars to unpopular destinations within a city. We model the operator's sequential decision-making problem as a \\emph{constrained Markov decision problem} (CMDP) and propose and rigorously analyze a novel two-phase $Q$-learning algorithm for its solution. Numerical experiments are presented and discussed.\nCharacterizing relationships between people is fundamental for the understanding of narratives. In this work, we address the problem of inferring the polarity of relationships between people in narrative summaries. We formulate the problem as a joint structured prediction for each narrative, and present a model that combines evidence from linguistic and semantic features, as well as features based on the structure of the social community in the text. We also provide a clustering-based approach that can exploit regularities in narrative types, e.g., learn an affinity for love-triangles in romantic stories. On a dataset of movie summaries from Wikipedia, our structured models provide more than a 30% error-reduction over a competitive baseline that considers pairs of characters in isolation.\nWe describe the problem of aggregating the label predictions of diverse classifiers using a class taxonomy. Such a taxonomy may not have been available or referenced when the individual classifiers were designed and trained, yet mapping the output labels into the taxonomy is desirable to integrate the effort spent in training the constituent classifiers. 
A hierarchical taxonomy representing some domain knowledge may be different from, but partially mappable to, the label sets of the individual classifiers. We present a heuristic approach and a principled graphical model to aggregate the label predictions by grounding them into the available taxonomy. Our model aggregates the labels using the taxonomy structure as constraints to find the most likely hierarchically consistent class. We experimentally validate our proposed method on image and text classification tasks.\nMicrobial communities play important roles in the function and maintenance of various biosystems, ranging from the human body to the environment. Current methods for analysis of microbial communities are typically based on taxonomic phylogenetic alignment using 16S rRNA metagenomic or Whole Genome Sequencing data. In typical characterizations of microbial communities, studies deal with billions of microbial sequences, aligning them to a phylogenetic tree. We introduce a new approach for the efficient analysis of microbial communities. Our new reference-free analysis technique is based on n-gram sequence analysis of 16S rRNA data and reduces the processing data size dramatically (by 105 fold), without requiring taxonomic alignment. The proposed approach is applied to characterize phenotypic microbial community differences in different settings. Specifically, we applied this approach to the classification of microbial communities across different body sites, the characterization of oral microbiomes associated with healthy and diseased individuals, and the longitudinal classification of microbial communities during the development of infants. Different dimensionality reduction methods are introduced that offer a more scalable analysis framework while minimizing the loss in classification accuracy. 
Among dimensionality reduction techniques, we propose a continuous vector representation for microbial communities, which can be widely used for deep learning applications in microbial informatics.\nExisting methods for retrieving k-nearest neighbours suffer from the curse of dimensionality. We argue this is caused in part by inherent deficiencies of space partitioning, which is the underlying strategy used by most existing methods. We devise a new strategy that avoids partitioning the vector space and present a novel randomized algorithm that runs in time linear in the dimensionality of the space and sub-linear in the intrinsic dimensionality and the size of the dataset, and takes space constant in the dimensionality of the space and linear in the size of the dataset. The proposed algorithm allows fine-grained control over accuracy and speed on a per-query basis, automatically adapts to variations in data density, supports dynamic updates to the dataset and is easy to implement. We show appealing theoretical properties and demonstrate empirically that the proposed algorithm outperforms locality-sensitive hashing (LSH) in terms of approximation quality, speed and space efficiency.\nThis paper investigates a novel problem of generating images from visual attributes. We model the image as a composite of foreground and background and develop a layered generative model with disentangled latent variables that can be learned end-to-end using a variational auto-encoder. We experiment with natural images of faces and birds and demonstrate that the proposed models are capable of generating realistic and diverse samples with disentangled latent representations. We use a general energy minimization algorithm for posterior inference of latent variables given novel images. 
Therefore, the learned generative models show excellent quantitative and visual results in the tasks of attribute-conditioned image reconstruction and completion.\nTo accomplish tasks in human-centric indoor environments, robots need to represent and understand the world in terms of objects and their attributes. We refer to this attribute-based representation as a world model, and consider how to acquire it via noisy perception and maintain it over time, as objects are added, changed, and removed in the world. Previous work has framed this as a multiple-target tracking problem, where objects are potentially in motion at all times. Although this approach is general, it is computationally expensive. We argue that such generality is not needed in typical world modeling tasks, where objects only change state occasionally. More efficient approaches are enabled by restricting ourselves to such semi-static environments. We consider a previously proposed clustering-based world modeling approach that assumed static environments, and extend it to semi-static domains by applying a dependent Dirichlet-process (DDP) mixture model. We derive a novel MAP inference algorithm under this model, subject to data association constraints. We demonstrate that our approach improves computational performance in semi-static environments.\nBayesian matrix completion has been studied based on a low-rank matrix factorization formulation with promising results. However, little work has been done on Bayesian matrix completion based on the more direct spectral regularization formulation. We fill this gap by presenting a novel Bayesian matrix completion method based on spectral regularization. In order to circumvent the difficulties of dealing with the orthonormality constraints of singular vectors, we derive a new equivalent form with relaxed constraints, which then leads us to design an adaptive version of spectral regularization feasible for Bayesian inference. 
Our Bayesian method requires no parameter tuning and can infer the number of latent factors automatically. Experiments on synthetic and real datasets demonstrate encouraging results on rank recovery and collaborative filtering, with notably good results for very sparse matrices.\nWe first show that there are practical situations, for instance in forensic and gambling settings, in which applying classical probability theory, that is, probability theory based on the axioms of Kolmogorov, is problematic. We then introduce and discuss Shafer belief functions. Technically, Shafer belief functions generalize probability distributions. Philosophically, they pertain to individual or shared knowledge of facts, rather than to facts themselves, and therefore can be interpreted as generalizing epistemic probability, that is, probability theory interpreted epistemologically. Belief functions are more flexible and better suited to deal with certain types of uncertainty than classical probability distributions. We develop a new calculus for belief functions which does not use the much-criticized Dempster's rule of combination, by generalizing the classical notions of conditioning and independence in a natural and uncontroversial way. Using this calculus, we explain our rejection of Dempster's rule in detail. We apply the new theory to a number of examples, including a gambling example and an example in a forensic setting. We prove a law of large numbers for belief functions and offer a betting interpretation similar to the Dutch Book Theorem for probability distributions.\nKnowledge graph embedding aims to represent entities and relations in a large-scale knowledge graph as elements in a continuous vector space. Existing methods, e.g., TransE and TransH, learn embedding representations by defining a global margin-based loss function over the data. However, the optimal loss function is determined experimentally, with its parameters chosen from a closed set of candidates. 
Moreover, embeddings over two knowledge graphs with different entities and relations share the same set of candidate loss functions, ignoring the locality of both graphs. This limits the performance of embedding-related applications. In this paper, we propose a locally adaptive translation method for knowledge graph embedding, called TransA, to find the optimal loss function by adaptively determining its margin over different knowledge graphs. Experiments on two benchmark data sets demonstrate the superiority of the proposed method, as compared to the state-of-the-art ones.\nIn the artificial intelligence area, one of the ultimate goals is to make computers understand human language and offer assistance. In order to achieve this ideal, computer science researchers have put forward many models and algorithms that attempt to enable machines to analyze and process human natural language on different levels of semantics. Although recent progress in this field offers much hope, we still have to ask whether current research can provide assistance that people really desire in reading and comprehension. To this end, we conducted a reading comprehension test on two scientific papers which are written in different styles. We use the semantic link models to analyze the understanding obstacles that people will face in the process of reading and figure out what makes it difficult for humans to understand scientific literature. Through such analysis, we summarized some characteristics and problems which are reflected by people with different levels of knowledge on the comprehension of difficult science and technology literature, which can be modeled in a semantic link network. 
We believe that these characteristics and problems will help us re-examine the existing machine models and will be helpful in designing new ones.\nA general approach to knowledge transfer is introduced in which an agent controlled by a neural network adapts how it reuses existing networks as it learns in a new domain. Networks trained for a new domain can improve their performance by routing activation selectively through previously learned neural structure, regardless of how or for what it was learned. A neuroevolution implementation of this approach is presented with application to high-dimensional sequential decision-making domains. This approach is more general than previous approaches to neural transfer for reinforcement learning. It is domain-agnostic and requires no prior assumptions about the nature of task relatedness or mappings. The method is analyzed in a stochastic version of the Arcade Learning Environment, demonstrating that it improves performance in some of the more complex Atari 2600 games, and that the success of transfer can be predicted based on a high-level characterization of game dynamics.\nIn many sequential decision-making problems, one is interested in minimizing an expected cumulative cost while taking into account \\emph{risk}, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented via a chance constraint or a constraint on the conditional value-at-risk (CVaR) of the cumulative cost. We collectively refer to such problems as percentile risk-constrained MDPs. Specifically, we first derive a formula for computing the gradient of the Lagrangian function for percentile risk-constrained MDPs. 
Then, we devise policy gradient and actor-critic algorithms that (1) estimate this gradient, (2) update the policy in the descent direction, and (3) update the Lagrange multiplier in the ascent direction. For these algorithms we prove convergence to locally optimal policies. Finally, we demonstrate the effectiveness of our algorithms in an optimal stopping problem and an online marketing application.\nHumans routinely confront the following key question, which could be viewed as a probabilistic variant of the controllability problem: While faced with an uncertain environment governed by causal structures, how should they practice their autonomy by intervening on driver variables, in order to increase (or decrease) the probability of attaining their desired (or undesired) state for some target variable? In this paper, for the first time, the problem of probabilistic controllability in Causal Bayesian Networks (CBNs) is studied. More specifically, the aim of this paper is two-fold: (i) to introduce and formalize the problem of probabilistic structural controllability in CBNs, and (ii) to identify a sufficient set of driver variables for the purpose of probabilistic structural controllability of a generic CBN. We also elaborate on the kind of minimality that the identified set of driver variables satisfies. In this context, the term \"structural\" signifies the condition wherein solely the structure of the CBN is known.\nThe aim of this paper is to investigate the interplay between knowledge shared by a group of agents and its coalition ability. We investigate this relation in the standard context of imperfect-information concurrent games. We assume that whenever a set of agents form a coalition to achieve a goal, they share their knowledge before acting. Based on this assumption, we propose a new semantics for alternating-time temporal logic with imperfect information and perfect recall. 
It turns out that this semantics is sufficient to preserve all the desirable properties of coalition ability in traditional coalitional logics. Meanwhile, we investigate how knowledge sharing within a group of agents contributes to its coalitional ability through the interplay of epistemic and coalition modalities. This work provides a partial answer to the question of which kind of group knowledge is required for a group to achieve its goals in the context of imperfect information.\nWith data sizes constantly expanding, and with classical machine learning algorithms that analyze such data requiring larger and larger amounts of computation time and storage space, the need to distribute computation and memory requirements among several computers has become apparent. Although substantial work has been done in developing distributed binary SVM algorithms and multi-class SVM algorithms individually, the field of multi-class distributed SVMs remains largely unexplored. This research proposes a novel algorithm that implements the Support Vector Machine over a multi-class dataset and is efficient in a distributed environment (here, Hadoop). The idea is to recursively divide the dataset in half and compute the optimal Support Vector Machine for each half during the training phase, much like a divide-and-conquer approach. During testing, this structure is exploited to significantly reduce the prediction time. Our algorithm achieves lower computation time during the prediction phase than the traditional sequential SVM methods (One vs. One, One vs. Rest) and outperforms them as the size of the dataset grows. This approach also classifies the data with higher accuracy than the traditional multi-class algorithms.\nUsing deep neural nets as function approximators for reinforcement learning tasks has recently been shown to be very powerful for solving problems approaching real-world complexity. 
Using these results as a benchmark, we discuss the role that the discount factor may play in the quality of the learning process of a deep Q-network (DQN). When the discount factor is progressively increased up to its final value, we empirically show that it is possible to significantly reduce the number of learning steps. When used in conjunction with a varying learning rate, we empirically show that it outperforms the original DQN in several experiments. We relate this phenomenon to the instabilities of neural networks when they are used in an approximate Dynamic Programming setting. We also describe the possibility of falling into a local optimum during the learning process, thus connecting our discussion with the exploration/exploitation dilemma.\nIn the literature of game theory, the information sets of extensive form games have different interpretations, which may lead to confusion and paradoxical cases. We argue that the problem lies in the mix-up of two interpretations of the extensive form game structures: game rules or game runs, which do not always coincide. In this paper, we try to separate and connect these two views by proposing a dynamic epistemic framework in which we can compute the runs step by step from the game rules plus the given assumptions of the players. We propose a modal logic to describe players' knowledge and its change during the plays, and provide a complete axiomatization. We also show that, under certain conditions, the mix-up of the rules and the runs is not harmful due to the structural similarity of the two.\nSensitivity methods for the analysis of the outputs of discrete Bayesian networks have been extensively studied and implemented in different software packages. These methods usually focus on the study of sensitivity functions and on the impact of a parameter change on the Chan-Darwiche distance. 
Although not fully recognized, the majority of these results heavily rely on the multilinear structure of atomic probabilities in terms of the conditional probability parameters associated with this type of network. By defining a statistical model through the polynomial expression of its associated defining conditional probabilities, we develop a unifying approach to sensitivity methods applicable to a large suite of models including extensions of Bayesian networks, for instance context-specific and dynamic ones, and chain event graphs. By then focusing on models whose defining polynomial is multilinear, our algebraic approach enables us to prove that the Chan-Darwiche distance is minimized for a certain class of multi-parameter contemporaneous variations when parameters are proportionally covaried.\nReal data often contains a mixture of discrete and continuous variables, but many Bayesian network structure learning and inference algorithms assume all random variables are discrete. Continuous variables are often discretized, but the choice of discretization policy has significant impact on the accuracy, speed, and interpretability of the resulting models. This paper introduces a principled Bayesian discretization method for continuous variables in Bayesian networks with quadratic complexity instead of the cubic complexity of other standard techniques. Empirical demonstrations show that the proposed method is superior to the state of the art. In addition, this paper shows how to incorporate existing methods into the structure learning process to discretize all continuous variables and simultaneously learn Bayesian network structures.\nWe present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects. ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy. 
It is a collection of datasets providing many semantic annotations for each 3D model such as consistent rigid alignments, parts and bilateral symmetry planes, physical sizes, keywords, as well as other planned annotations. Annotations are made available through a public web-based interface to enable data visualization of object attributes, promote data-driven geometric analysis, and provide a large-scale quantitative benchmark for research in computer graphics and vision. At the time of this technical report, ShapeNet has indexed more than 3,000,000 models, 220,000 of which are classified into 3,135 categories (WordNet synsets). In this report we describe the ShapeNet effort as a whole, provide details for all currently available datasets, and summarize future plans.\nIn business analytics, measure values, such as sales numbers or volumes of cargo transported, are often summed along values of one or more corresponding categories, such as time or shipping container. However, not every measure should be added by default (e.g., one might more typically want a mean over the heights of a set of people); similarly, some measures should only be summed within certain constraints (e.g., population measures need not be summed over years). In systems such as Watson Analytics, the exact additive behaviour of a measure is often determined by a human expert. In this work, we propose a small set of features to address this issue. We use these features in a case-based reasoning approach, where the system suggests an aggregation behaviour, with 86% accuracy on our collected dataset.\nThe paper proposes a feed-forward control strategy for mobile robot control that accounts for a non-linear model of the vehicle with interaction between inputs and outputs. It is possible to include specific model uncertainties in the dynamic model of the mobile robot in order to see how the control problem should be addressed taking into consideration the complete dynamic mobile robot model. 
By means of a neural network feed-forward controller a real non-linear mathematical model of the vehicle can be taken into consideration. The classical velocity control strategy can be extended using artificial neural networks in order to compensate for the modelling uncertainties. It is possible to develop an intelligent strategy for mobile robot control.\nIt is widely acknowledged that function symbols are an important feature in answer set programming, as they make modeling easier, increase the expressive power, and allow us to deal with infinite domains. The main issue with their introduction is that the evaluation of a program might not terminate and checking whether it terminates or not is undecidable. To cope with this problem, several classes of logic programs have been proposed where the use of function symbols is restricted but the program evaluation termination is guaranteed. Despite the significant body of work in this area, current approaches do not include many simple practical programs whose evaluation terminates. In this paper, we present the novel classes of rule-bounded and cycle-bounded programs, which overcome different limitations of current approaches by performing a more global analysis of how terms are propagated from the body to the head of rules. Results on the correctness, the complexity, and the expressivity of the proposed approach are provided.\nOff-policy learning refers to the problem of learning the value function of a way of behaving, or policy, while following a different policy. Gradient-based off-policy learning algorithms, such as GTD and TDC/GQ, converge even when using function approximation and incremental updates. However, they have been developed for the case of a fixed behavior policy. In control problems, one would like to adapt the behavior policy over time to become more greedy with respect to the existing value function. 
In this paper, we present the first gradient-based learning algorithms for this problem, which rely on the framework of policy gradient in order to modify the behavior policy. We present derivations of the algorithms, a convergence theorem, and empirical evidence showing that they compare favorably to existing approaches.\nMicrosoft Kinect camera and its skeletal tracking capabilities have been embraced by many researchers and commercial developers in various applications of real-time human movement analysis. In this paper, we evaluate the accuracy of the human kinematic motion data in the first and second generation of the Kinect system, and compare the results with an optical motion capture system. We collected motion data in 12 exercises for 10 different subjects and from three different viewpoints. We report on the accuracy of the joint localization and bone length estimation of Kinect skeletons in comparison to the motion capture. We also analyze the distribution of the joint localization offsets by fitting a mixture of Gaussian and uniform distribution models to determine the outliers in the Kinect motion data. Our analysis shows that overall Kinect 2 has more robust and more accurate tracking of human pose as compared to Kinect 1.\nAn ever increasing number of computer vision and image/video processing challenges are being approached using deep convolutional neural networks, obtaining state-of-the-art results in object recognition and detection, semantic segmentation, action recognition, optical flow and superresolution. Hardware acceleration of these algorithms is essential to adopt these improvements in embedded and mobile computer vision systems. We present a new architecture, design and implementation as well as the first reported silicon measurements of such an accelerator, outperforming previous work in terms of power-, area- and I/O-efficiency. 
The manufactured device provides up to 196 GOp/s on 3.09 mm^2 of silicon in UMC 65nm technology and can achieve a power efficiency of 803 GOp/s/W. The massively reduced bandwidth requirements make it the first architecture scalable to TOp/s performance.\nAction languages have emerged as an important field of Knowledge Representation for reasoning about change and causality in dynamic domains. This article presents Cerbere, a production system designed to perform online causal, temporal and epistemic reasoning based on the Event Calculus. The framework implements the declarative semantics of the underlying logic theories in a forward-chaining rule-based reasoning system, coupling the high expressiveness of its formalisms with the efficiency of rule-based systems. To illustrate its applicability, we present both the modeling of benchmark problems in the field and its utilization in the challenging domain of smart spaces. A hybrid framework that combines logic-based and probabilistic reasoning has been developed, which aims to accommodate activity recognition and monitoring tasks in smart spaces. Under consideration in Theory and Practice of Logic Programming (TPLP).\nMost Markov Chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) algorithms in existing probabilistic programming systems suboptimally use only model priors as proposal distributions. In this work, we describe an approach for training a discriminative model, namely a neural network, in order to approximate the optimal proposal by using posterior estimates from previous runs of inference. We show an example that incorporates a data-driven proposal for use in a non-parametric model in the Anglican probabilistic programming system. 
Our results show that data-driven proposals can significantly improve inference performance, so that considerably fewer particles are needed for a good posterior estimate.\nKnowledge graph embedding aims at offering a numerical knowledge representation paradigm by transforming the entities and relations into a continuous vector space. However, existing methods cannot characterize the knowledge graph at a fine enough granularity to make precise predictions. There are two reasons: they are ill-posed algebraic systems, and they apply an overly strict geometric form. As precise prediction is critical, we propose a manifold-based embedding principle (\\textbf{ManifoldE}) which can be treated as a well-posed algebraic system that expands the position of golden triples from one point in current models to a manifold in ours. Extensive experiments show that the proposed models achieve substantial improvements over the state-of-the-art baselines, especially for the precise prediction task, while maintaining high efficiency.\nIs it possible to make statistical inference broadly accessible to non-statisticians without sacrificing mathematical rigor or inference quality? This paper describes BayesDB, a probabilistic programming platform that aims to enable users to query the probable implications of their data as directly as SQL databases enable them to query the data itself. 
This paper focuses on four aspects of BayesDB: (i) BQL, an SQL-like query language for Bayesian data analysis, that answers queries by averaging over an implicit space of probabilistic models; (ii) techniques for implementing BQL using a broad class of multivariate probabilistic models; (iii) a semi-parametric Bayesian model-builder that automatically builds ensembles of factorial mixture models to serve as baselines; and (iv) MML, a \"meta-modeling\" language for imposing qualitative constraints on the model-builder and combining baseline models with custom algorithmic and statistical models that can be implemented in external software. BayesDB is illustrated using three applications: cleaning and exploring a public database of Earth satellites; assessing the evidence for temporal dependence between macroeconomic indicators; and analyzing a salary survey.\nGood predictors of ICU mortality have the potential to identify high-risk patients earlier, improve ICU resource allocation, or create more accurate population-level risk models. Machine learning practitioners typically make choices about how to represent features in a particular model, but these choices are seldom evaluated quantitatively. This study compares the performance of different representations of clinical event data from MIMIC II in a logistic regression model to predict 36-hour ICU mortality. The most common representations are linear (normalized counts) and binary (yes/no). These, along with a new representation termed \"hill\", are compared using both L1 and L2 regularization. Results indicate that the introduced \"hill\" representation outperforms both the binary and linear representations; the hill representation thus has the potential to improve existing models of ICU mortality.\nIn multi-criteria decision making (MCDM) problems, ratings are assigned to the alternatives on different criteria by a group of experts. 
In this paper, we propose a thermodynamically consistent model for MCDM using analogies to the thermodynamic indicators energy, exergy and entropy. The most commonly used method for analysing MCDM problems is the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). The conventional TOPSIS method uses a measure similar to that of energy for the ranking of alternatives. We demonstrate that the ranking of the alternatives is more meaningful if we use exergy in place of energy. The use of exergy is superior due to the inclusion of a factor accounting for the quality of the ratings by the expert group. The unevenness in the ratings by the experts is measured by entropy. The procedure for the calculation of the thermodynamic indicators is explained for both crisp and fuzzy environments. Finally, two case studies are carried out to demonstrate the effectiveness of the proposed model.\nDuring the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets, how they can be used to learn diverse dialogue strategies, and their other potential uses. We also examine methods for transfer learning between datasets and the use of external knowledge. Finally, we discuss the appropriate choice of evaluation metrics for the learning objective.\nBorderline personality disorder and narcissistic personality disorder are important nosographic entities and have been the subject of intensive investigation. 
The currently prevailing psychodynamic theory for mental disorders is based on the repertoire of defense mechanisms employed. Another line of research is concerned with the study of psychological traumas and dissociation as a defensive response. Both theories can be used to shed light on some aspects of pathological mental functioning, and have many points of contact. This work merges these two psychological theories, and builds a model of mental function in a relational context called the Quadripolar Relational Model. The model, which is enriched with ideas borrowed from the field of computer science, leads to a new therapeutic proposal for psychological traumas and personality disorders.\nGenerating an article automatically with a computer program is a challenging task in artificial intelligence and natural language processing. In this paper, we target essay generation, which takes a topic word as input and generates an organized article under the theme of the topic. We follow the idea of text planning \\cite{Reiter1997} and develop an essay generation framework. The framework consists of three components: topic understanding, sentence extraction and sentence reordering. For each component, we studied several statistical algorithms and empirically compared them in terms of qualitative or quantitative analysis. Although we run experiments on a Chinese corpus, the method is language-independent and can be easily adapted to other languages. We lay out the remaining challenges and suggest avenues for future research.\nConvolutional neural networks have demonstrated outstanding empirical results in computer vision and speech recognition tasks where labeled training data is abundant. In medical imaging, there is a huge variety of possible imaging modalities and contrasts, where annotated data is usually very scarce. We present two approaches to deal with this challenge. 
A network pretrained in a different domain with abundant data is used as a feature extractor, while a subsequent classifier is trained on a small target dataset; and a deep architecture trained with heavy augmentation and equipped with sophisticated regularization methods. We test the approaches on a corpus of X-ray images to design an anatomy detection system.\nHomogeneous unstructured data (HUD) are collections of unstructured documents that share common properties, such as similar layout, common file format, or common domain of values. Building on such properties, it would be desirable to automatically process HUD to access the main information through a semantic layer -- typically an ontology -- called semantic view. Hence, we propose an ontology-based approach for extracting semantically rich information from HUD, by integrating and extending recent technologies and results from the fields of classical information extraction, table recognition, ontologies, text annotation, and logic programming. Moreover, we design and implement a system, named KnowRex, that has been successfully applied to curriculum vitae in the Europass style to offer a semantic view of them, and be able, for example, to select those which exhibit required skills.\nEmerging ontology authoring methods to add knowledge to an ontology focus on ameliorating the validation bottleneck. The verification of the newly added axiom is still one of trying and seeing what the reasoner says, because a systematic testbed for ontology authoring is missing. We sought to address this by introducing the approach of test-driven development for ontology authoring. We specify 36 generic tests, as TBox queries and TBox axioms tested through individuals, and structure their inner workings in an `open box'-way, which cover the OWL 2 DL language features. This is implemented as a Protege plugin so that one can perform a TDD test as a black box test. We evaluated the two test approaches on their performance. 
The TBox queries were faster, and that effect is more pronounced the larger the ontology is. We provide a general sequence of a TDD process for ontology engineering as a foundation for a TDD methodology.\nThe paper focuses on a new class of combinatorial problems which consists in restructuring of solutions (as sets/structures) in combinatorial optimization. Two main features of the restructuring process are examined: (i) a cost of the restructuring, (ii) a closeness to a goal solution. Three types of the restructuring problems are under study: (a) one-stage structuring, (b) multi-stage structuring, and (c) structuring over changed element set. One-criterion and multicriteria problem formulations can be considered. The restructuring problems correspond to redesign (improvement, upgrade) of modular systems or solutions. The restructuring approach is described and illustrated (problem statements, solving schemes, examples) for the following combinatorial optimization problems: knapsack problem, multiple choice problem, assignment problem, spanning tree problems, clustering problem, multicriteria ranking (sorting) problem, morphological clique problem. Numerical examples illustrate the restructuring problems and solving schemes.\nThis paper summarizes the recent progress we have made for the computer vision technologies in physical therapy with the accessible and affordable devices. We first introduce the remote health coaching system we build with Microsoft Kinect. Since the motion data captured by Kinect is noisy, we investigate the data accuracy of Kinect with respect to the high accuracy motion capture system. We also propose an outlier data removal algorithm based on the data distribution. In order to generate the kinematic parameter from the noisy data captured by Kinect, we propose a kinematic filtering algorithm based on Unscented Kalman Filter and the kinematic model of human skeleton. 
The proposed algorithm can obtain smooth kinematic parameter with reduced noise compared to the kinematic parameter generated from the raw motion data from Kinect.\nHypothetical Datalog is based on an intuitionistic semantics rather than on a classical logic semantics, and embedded implications are allowed in rule bodies. While the usual implication (i.e., the neck of a Horn clause) stands for inferring facts, an embedded implication plays the role of assuming its premise for deriving its consequence. A former work introduced both a formal framework and a goal-oriented tabled implementation, allowing negation in rule bodies. While in that work positive assumptions for both facts and rules can occur in the premise, negative assumptions are not allowed. In this work, we cover this subject by introducing a new concept: a restricted predicate, which allows negative assumptions by pruning the usual semantics of a predicate. This new setting has been implemented in the deductive system DES.\nWe study how to communicate findings of Bayesian inference to third parties, while preserving the strong guarantee of differential privacy. Our main contributions are four different algorithms for private Bayesian inference on probabilistic graphical models. These include two mechanisms for adding noise to the Bayesian updates, either directly to the posterior parameters, or to their Fourier transform so as to preserve update consistency. We also utilise a recently introduced posterior sampling mechanism, for which we prove bounds for the specific but general case of discrete Bayesian networks; and we introduce a maximum-a-posteriori private mechanism. Our analysis includes utility and privacy bounds, with a novel focus on the influence of graph structure on privacy. 
Worked examples and experiments with Bayesian na{\\\"i}ve Bayes and Bayesian linear regression illustrate the application of our mechanisms.\nOur world is filled with both beautiful and brainy people, but how often does a Nobel Prize winner also win a beauty pageant? Let us assume that someone who is both very beautiful and very smart is rarer than what we would expect from the combination of the number of beautiful and brainy people. Of course there will still always be some individuals that defy this stereotype; these beautiful brainy people are exactly the class of anomaly we focus on in this paper. They do not possess intrinsically rare qualities; it is the unexpected combination of factors that makes them stand out.   In this paper we define the above-described class of anomaly and propose a method to quickly identify them in transaction data. Further, as we take a pattern set based approach, our method readily explains why a transaction is anomalous. The effectiveness of our method is thoroughly verified with a wide range of experiments on both real world and synthetic data.\nWe study how to obtain concise descriptions of discrete multivariate sequential data. In particular, how to do so in terms of rich multivariate sequential patterns that can capture potentially highly interesting (cor)relations between sequences. To this end we allow our pattern language to span over the domains (alphabets) of all sequences, allow patterns to overlap temporally, as well as allow for gaps in their occurrences.   We formalise our goal by the Minimum Description Length principle, by which our objective is to discover the set of patterns that provides the most succinct description of the data. To discover high-quality pattern sets directly from data, we introduce DITTO, a highly efficient algorithm that approximates the ideal result very well.   Experiments show that DITTO correctly discovers the patterns planted in synthetic data. 
Moreover, it scales favourably with the length of the data, the number of attributes, and the alphabet sizes. On real data, ranging from sensor networks to annotated text, DITTO discovers easily interpretable summaries that provide clear insight into both the univariate and multivariate structure.\nWhile wearable cameras are becoming increasingly popular, locating relevant information in large unstructured collections of egocentric images is still a tedious and time-consuming process. This paper addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments. First, contextual and semantic information is extracted for each image by employing a Convolutional Neural Network approach. Later, by integrating language processing, a vocabulary of concepts is defined in a semantic space. Finally, by exploiting the temporal coherence in photo streams, images which share contextual and semantic attributes are grouped together. The resulting temporal segmentation is particularly suited for further analysis, ranging from activity and event recognition to semantic indexing and summarization. Experiments over egocentric sets of nearly 17,000 images show that the proposed approach outperforms state-of-the-art methods.\nA deep generative model is developed for representation and analysis of images, based on a hierarchical convolutional dictionary-learning framework. Stochastic {\\em unpooling} is employed to link consecutive layers in the model, yielding top-down image generation. A Bayesian support vector machine is linked to the top-layer features, yielding max-margin discrimination. Deep deconvolutional inference is employed when testing, to infer the latent features, and the top-layer features are connected with the max-margin classifier for discrimination tasks. 
The model is efficiently trained using a Monte Carlo expectation-maximization (MCEM) algorithm, with implementation on graphics processing units (GPUs) for efficient large-scale learning and fast testing. Excellent results are obtained on several benchmark datasets, including ImageNet, demonstrating that the proposed model achieves results that are highly competitive with similarly sized convolutional neural networks.\nWe investigate crowdsourcing algorithms for finding the top-quality item within a large collection of objects with unknown intrinsic quality values. This is an important problem with many relevant applications, for example in networked recommendation systems. The core of the algorithms is that objects are distributed to crowd workers, who return a noisy and biased evaluation. All received evaluations are then combined to identify the top-quality object. We first present a simple probabilistic model for the system under investigation. Then, we devise and study a class of efficient adaptive algorithms to assign objects to workers effectively. We compare the performance of several algorithms, which correspond to different choices of the design parameters/metrics. In the simulations we show that some of the algorithms achieve near-optimal performance for a suitable setting of the system parameters.\nWe determine the quality of randomized social choice mechanisms in a setting in which the agents have metric preferences: every agent has a cost for each alternative, and these costs form a metric. We assume that these costs are unknown to the mechanisms (and possibly even to the agents themselves), which means we cannot simply select the optimal alternative, i.e. the alternative that minimizes the total agent cost (or median agent cost). However, we do assume that the agents know their ordinal preferences that are induced by the metric space. 
We examine randomized social choice functions that require only this ordinal information and select an alternative that is good in expectation with respect to the costs from the metric. To quantify how good a randomized social choice function is, we bound the distortion, which is the worst-case ratio between expected cost of the alternative selected and the cost of the optimal alternative. We provide new distortion bounds for a variety of randomized mechanisms, for both general metrics and for important special cases. Our results show a sizable improvement in distortion over deterministic mechanisms.\nApproaches to signal representation and coding theory have traditionally focused on how to best represent signals using parsimonious representations that incur the lowest possible distortion. Classical examples include linear and non-linear approximations, sparse representations, and rate-distortion theory. Very often, however, the goal of processing is to extract specific information from the signal, and the distortion should be measured on the extracted information. The corresponding representation should, therefore, represent that information as parsimoniously as possible, without necessarily accurately representing the signal itself.   In this paper, we examine the problem of encoding signals such that sufficient information is preserved about their pairwise distances and their inner products. For that goal, we consider randomized embeddings as an encoding mechanism and provide a framework to analyze their performance. We also demonstrate that it is possible to design the embedding such that it represents different ranges of distances with different precision. These embeddings also allow the computation of kernel inner products with control on their inner product-preserving properties. 
Our results provide a broad framework to design and analyze embeddings, and generalize existing results in this area, such as random Fourier kernels and universal embeddings.\nWe consider the Max $K$-Armed Bandit problem, where a learning agent is faced with several stochastic arms, each a source of i.i.d. rewards of unknown distribution. At each time step the agent chooses an arm, and observes the reward of the obtained sample. Each sample is considered here as a separate item with the reward designating its value, and the goal is to find an item with the highest possible value. Our basic assumption is a known lower bound on the {\\em tail function} of the reward distributions. Under the PAC framework, we provide a lower bound on the sample complexity of any $(\\epsilon,\\delta)$-correct algorithm, and propose an algorithm that attains this bound up to logarithmic factors. We analyze the robustness of the proposed algorithm and in addition, we compare the performance of this algorithm to the variant in which the arms are not distinguishable by the agent and are chosen randomly at each stage. Interestingly, when the maximal rewards of the arms happen to be similar, the latter approach may provide better performance.\nIn this paper, we explore how modifying data to preserve privacy affects the quality of the patterns discoverable in the data. For any analysis of modified data to be worth doing, the data must be as close to the original as possible. Therein lies a problem -- how does one make sure that modified data still contains the information it had before modification? This question is not the same as asking if an accurate classifier can be built from the modified data. Often in the literature, the prediction accuracy of a classifier made from modified (anonymized) data is used as evidence that the data is similar to the original. 
We demonstrate that this is not the case, and we propose a new methodology for measuring the retention of the patterns that existed in the original data. We then use our methodology to design three measures that can be easily implemented, each measuring aspects of the data that no pre-existing techniques can measure. These measures do not negate the usefulness of prediction accuracy or other measures -- they are complementary to them, and support our argument that one measure is almost never enough.\nRecently, several large-scale RDF knowledge bases have been built and applied in many knowledge-based applications. To further increase the number of facts in RDF knowledge bases, logic rules can be used to predict new facts based on the existing ones. Therefore, how to automatically learn reliable rules from large-scale knowledge bases becomes increasingly important. In this paper, we propose a novel rule learning approach named RDF2Rules for RDF knowledge bases. RDF2Rules first mines frequent predicate cycles (FPCs), a kind of interesting frequent patterns in knowledge bases, and then generates rules from the mined FPCs. Because each FPC can produce multiple rules, and an effective pruning strategy is used in the process of mining FPCs, RDF2Rules works very efficiently. Another advantage of RDF2Rules is that it uses the entity type information when generating and evaluating rules, which makes the learned rules more accurate. Experiments show that our approach outperforms the compared approach in terms of both efficiency and accuracy.\nNowadays, hospitals are ubiquitous and integral to modern society. Patients flow in and out of a veritable whirlwind of paperwork, consultations, and potential inpatient admissions, through an abstracted system that is not without flaws. One of the biggest flaws in the medical system is perhaps an unexpected one: the patient alarm system. 
One longitudinal study reported an 88.8% rate of false alarms, with other studies reporting numbers of similar magnitudes. These false alarm rates lead to a number of deleterious effects that manifest in a significantly lower standard of care across clinics.   This paper discusses a model-based probabilistic inference approach to identifying variables at a detection level. We design a generative model that complies with an overview of human physiology and perform approximate Bayesian inference. One primary goal of this paper is to justify a Bayesian modeling approach to increasing robustness in a physiological domain.   We use three data sets provided by Physionet, a research resource for complex physiological signals, in the form of the Physionet 2014 Challenge set-p1 and set-p2, as well as the MGH/MF Waveform Database. On the extended data set our algorithm is on par with the other top six submissions to the Physionet 2014 challenge.\nWe present a domain-general account of causation that applies to settings in which macro-level causal relations between two systems are of interest, but the relevant causal features are poorly understood and have to be aggregated from vast arrays of micro-measurements. Our approach generalizes that of Chalupka et al. (2015) to the setting in which the macro-level effect is not specified. We formalize the connection between micro- and macro-variables in such situations and provide a coherent framework describing causal relations at multiple levels of analysis. We present an algorithm that discovers macro-variable causes and effects from micro-level measurements obtained from an experiment. We further show how to design experiments to discover macro-variables from observational micro-variable data. Finally, we show that under specific conditions, one can identify multiple levels of causal structure. 
Throughout the article, we use a simulated neuroscience multi-unit recording experiment to illustrate the ideas and the algorithms.\nThis paper defines adversarial reasoning as computational approaches to inferring and anticipating an enemy's perceptions, intents and actions. It argues that adversarial reasoning transcends the boundaries of game theory and must also leverage such disciplines as cognitive modeling, control theory, AI planning and others. To illustrate the challenges of applying adversarial reasoning to real-world problems, the paper explores the lessons learned in the CADET - a battle planning system that focuses on brigade-level ground operations and involves adversarial reasoning. From this example of current capabilities, the paper proceeds to describe RAID - a DARPA program that aims to build capabilities in adversarial reasoning, and how such capabilities would address practical requirements in Defense and other application areas.\nThis paper gives an overview of recent progress in the brain-inspired computing field with a focus on implementation using emerging memories as electronic synapses. Design considerations and challenges such as requirements and design targets on multilevel states, device variability, programming energy, array-level connectivity, fan-in/fanout, wire energy, and IR drop are presented. Wires are increasingly important in design decisions, especially for large systems, and cycle-to-cycle variations have a large impact on learning performance.\nVehicles are becoming more and more connected; this opens up a larger attack surface which not only affects the passengers inside vehicles, but also people around them. These vulnerabilities exist because modern systems are built on the comparatively less secure and old CAN bus framework which lacks even basic authentication. 
Since a new protocol can only help future vehicles and not older vehicles, our approach tries to solve the issue as a data analytics problem and uses machine learning techniques to secure cars. We develop a Hidden Markov Model to detect anomalous states from real data collected from vehicles. Using this model, while a vehicle is in operation, we are able to detect and issue alerts. Our model could be integrated as a plug-n-play device in all new and old cars.\nMulti-relational learning has received lots of attention from researchers in various research communities. Most existing methods either suffer from superlinear per-iteration cost, or are sensitive to the given ranks. To address both issues, we propose a scalable core tensor trace norm Regularized Orthogonal Iteration Decomposition (ROID) method for full or incomplete tensor analytics, which can be generalized as a graph Laplacian regularized version by using auxiliary information or a sparse higher-order orthogonal iteration (SHOOI) version. We first induce the equivalence relation of the Schatten p-norm (0<p<\\infty) of a low multi-linear rank tensor and its core tensor. Then we achieve a much smaller matrix trace norm minimization problem. Finally, we develop two efficient augmented Lagrange multiplier algorithms to solve our problems with convergence guarantees. Extensive experiments using both real and synthetic datasets, even with only a few observations, verified both the efficiency and effectiveness of our methods.\nSeveral algorithms and tools have been developed to (semi) automate the process of glycan identification by interpreting Mass Spectrometric data. However, each has limitations when annotating MSn data with thousands of MS spectra using uncurated public databases. Moreover, the existing tools are not designed to manage MSn data where n > 2. We propose a novel software package to automate the annotation of tandem MS data. This software consists of two major components. 
The first is a free, semi-automated MSn data interpreter called the Glycomic Elucidation and Annotation Tool (GELATO). This tool extends and automates the functionality of existing open source projects, namely, GlycoWorkbench (GWB) and GlycomeDB. The second is a machine learning model called Smart Anotation Enhancement Graph (SAGE), which learns the behavior of glycoanalysts to select annotations generated by GELATO that emulate human interpretation of the spectra.\nDecision making is often based on Bayesian networks. The building blocks for Bayesian networks are their conditional probability tables (CPTs). These tables are obtained by parameter estimation methods, or they are elicited from subject matter experts (SME). Some of these knowledge representations are insufficient approximations. Using knowledge fusion of cause and effect observations leads to better predictive decisions. We propose three new methods to generate CPTs, which even work when only soft evidence is provided. The first two are novel ways of mapping conditional expectations to the probability space. The third is a column extraction method, which obtains CPTs from nonlinear functions such as the multinomial logistic regression. Case studies on military effects and burnt forest desertification have demonstrated that CPTs so derived have highly reliable predictive power, including superiority over the CPTs obtained from SMEs. In this context, new quality measures for determining the goodness of a CPT and for comparing CPTs with each other have been introduced. The predictive power and enhanced reliability of decision making based on the novel CPT generation methods presented in this paper have been confirmed and validated within the context of the case studies.\nModel-free reinforcement learning algorithms, such as Q-learning, perform poorly in the early stages of learning in noisy environments, because much effort is spent unlearning biased estimates of the state-action value function. 
The bias results from selecting, among several noisy estimates, the apparent optimum, which may actually be suboptimal. We propose G-learning, a new off-policy learning algorithm that regularizes the value estimates by penalizing deterministic policies in the beginning of the learning process. We show that this method reduces the bias of the value-function estimation, leading to faster convergence to the optimal value and the optimal policy. Moreover, G-learning enables the natural incorporation of prior domain knowledge, when available. The stochastic nature of G-learning also makes it avoid some exploration costs, a property usually attributed only to on-policy algorithms. We illustrate these ideas in several examples, where G-learning results in significant improvements of the convergence rate and the cost of the learning process.\nNatural language inference (NLI) is a fundamentally important task in natural language processing that has many applications. The recently released Stanford Natural Language Inference (SNLI) corpus has made it possible to develop and evaluate learning-centered methods such as deep neural networks for natural language inference (NLI). In this paper, we propose a special long short-term memory (LSTM) architecture for NLI. Our model builds on top of a recently proposed neural attention model for NLI but is based on a significantly different idea. Instead of deriving sentence embeddings for the premise and the hypothesis to be used for classification, our solution uses a match-LSTM to perform word-by-word matching of the hypothesis with the premise. This LSTM is able to place more emphasis on important word-level matching results. In particular, we observe that this LSTM remembers important mismatches that are critical for predicting the contradiction or the neutral relationship label. 
On the SNLI corpus, our model achieves an accuracy of 86.1%, outperforming the state of the art.\nMany recent algorithms for approximate model counting are based on a reduction to combinatorial searches over random subsets of the space defined by parity or XOR constraints. Long parity constraints (involving many variables) provide strong theoretical guarantees but are computationally difficult. Short parity constraints are easier to solve but have weaker statistical properties. It is currently not known how long these parity constraints need to be. We close the gap by providing matching necessary and sufficient conditions on the required asymptotic length of the parity constraints. Further, we provide a new family of lower bounds and the first non-trivial upper bounds on the model count that are valid for arbitrarily short XORs. We empirically demonstrate the effectiveness of these bounds on model counting benchmarks and in a Satisfiability Modulo Theory (SMT) application motivated by the analysis of contingency tables in statistics.\nA gamma process dynamic Poisson factor analysis model is proposed to factorize a dynamic count matrix, whose columns are sequentially observed count vectors. The model builds a novel Markov chain that sends the latent gamma random variables at time $(t-1)$ as the shape parameters of those at time $t$, which are linked to observed or latent counts under the Poisson likelihood. The significant challenge of inferring the gamma shape parameters is fully addressed, using unique data augmentation and marginalization techniques for the negative binomial distribution. The same nonparametric Bayesian model also applies to the factorization of a dynamic binary matrix, via a Bernoulli-Poisson link that connects a binary observation to a latent count, with closed-form conditional posteriors for the latent counts and efficient computation for sparse observations. 
We apply the model to text and music analysis, with state-of-the-art results.\nWe consider effort allocation in crowdsourcing, where we wish to assign labeling tasks to imperfect homogeneous crowd workers to maximize overall accuracy in a continuous-time Bayesian setting, subject to budget and time constraints. The Bayes-optimal policy for this problem is the solution to a partially observable Markov decision process, but the curse of dimensionality renders the computation infeasible. Based on the Lagrangian Relaxation technique in Adelman & Mersereau (2008), we provide a computationally tractable instance-specific upper bound on the value of this Bayes-optimal policy, which can in turn be used to bound the optimality gap of any other sub-optimal policy. In an approach similar in spirit to the Whittle index for restless multiarmed bandits, we provide an index policy for effort allocation in crowdsourcing and demonstrate numerically that it outperforms other state-of-the-art methods and performs close to the optimal solution.\nWe investigate the 3-architecture Connected Facility Location Problem arising in the design of urban telecommunication access networks. We propose an original optimization model for the problem that includes additional variables and constraints to take into account wireless signal coverage. Since the problem can prove challenging even for modern state-of-the-art optimization solvers, we propose to solve it by an original primal heuristic which combines a probabilistic fixing procedure, guided by peculiar Linear Programming relaxations, with an exact MIP heuristic, based on a very large neighborhood search. Computational experiments on a set of realistic instances show that our heuristic can find solutions associated with much lower optimality gaps than a state-of-the-art solver.\nWe study a novel machine learning (ML) problem setting of sequentially allocating small subsets of training data amongst a large set of classifiers. 
The goal is to select a classifier that will give near-optimal accuracy when trained on all data, while also minimizing the cost of misallocated samples. This is motivated by large modern datasets and ML toolkits with many combinations of learning algorithms and hyper-parameters. Inspired by the principle of \"optimism under uncertainty,\" we propose an innovative strategy, Data Allocation using Upper Bounds (DAUB), which robustly achieves these objectives across a variety of real-world datasets.   We further develop substantial theoretical support for DAUB in an idealized setting where the expected accuracy of a classifier trained on $n$ samples can be known exactly. Under these conditions we establish a rigorous sub-linear bound on the regret of the approach (in terms of misallocated data), as well as a rigorous bound on suboptimality of the selected classifier. Our accuracy estimates using real-world datasets only entail mild violations of the theoretical scenario, suggesting that the practical behavior of DAUB is likely to approach the idealized behavior.\nThe histological assessment of human tissue has emerged as the key challenge for detection and treatment of cancer. A plethora of different data sources ranging from tissue microarray data to gene expression, proteomics or metabolomics data provide a detailed overview of the health status of a patient. Medical doctors need to assess these information sources and they rely on data driven automatic analysis tools. Methods for classification, grouping and segmentation of heterogeneous data sources as well as regression of noisy dependencies and estimation of survival probabilities enter the processing workflow of a pathology diagnosis system at various stages. 
This paper reports on the state of the art in the design and effectiveness of computational pathology workflows and discusses future research directions in this emergent field of medical informatics and diagnostic machine learning.\nWe present a new representation of harmonic sounds that linearizes the dynamics of pitch and spectral envelope, while remaining stable to deformations in the time-frequency plane. It is an instance of the scattering transform, a generic operator which cascades wavelet convolutions and modulus nonlinearities. It is derived from the pitch spiral, in that convolutions are successively performed in time, log-frequency, and octave index. We give a closed-form approximation of spiral scattering coefficients for a nonstationary generalization of the harmonic source-filter model.\nWe present a unified approach for learning the parameters of Sum-Product networks (SPNs). We prove that any complete and decomposable SPN is equivalent to a mixture of trees where each tree corresponds to a product of univariate distributions. Based on the mixture model perspective, we characterize the objective function when learning SPNs based on the maximum likelihood estimation (MLE) principle and show that the optimization problem can be formulated as a signomial program. We construct two parameter learning algorithms for SPNs by using sequential monomial approximations (SMA) and the concave-convex procedure (CCCP), respectively. The two proposed methods naturally admit multiplicative updates, hence effectively avoiding the projection operation. With the help of the unified framework, we also show that, in the case of SPNs, CCCP leads to the same algorithm as Expectation Maximization (EM) despite the fact that they are different in general.\nThe BusPlus project aims at improving the off-peak hours public transit service in Canberra, Australia. 
To address the difficulty of covering a large geographic area, BusPlus proposes a hub and shuttle model consisting of a combination of a few high-frequency bus routes between key hubs and a large number of shuttles that bring passengers from their origin to the closest hub and take them from their last bus stop to their destination. This paper focuses on the design of the bus network and proposes an efficient method for solving this multimodal network design problem, based on Benders decomposition. Starting from a MIP formulation of the problem, the paper presents a Benders decomposition approach using dedicated solution techniques for solving independent sub-problems, Pareto optimal cuts, cut bundling, and core point update. Computational results on real-world data from Canberra's public transit system justify the design choices and show that the approach outperforms the MIP formulation by two orders of magnitude. Moreover, the results show that the hub and shuttle model may decrease transit time by a factor of 2, while staying within the costs of the existing transit system.\nSequence-to-sequence neural translation models learn semantic and syntactic relations between sentence pairs by optimizing the likelihood of the target given the source, i.e., $p(y|x)$, an objective that ignores other potentially useful sources of information. We introduce an alternative objective function for neural MT that maximizes the mutual information between the source and target sentences, modeling the bi-directional dependency of sources and targets. We implement the model with a simple re-ranking method, and also introduce a decoding algorithm that increases diversity in the N-best list produced by the first pass. 
Applied to the WMT German/English and French/English tasks, the proposed models offer a consistent performance boost on both standard LSTM and attention-based neural MT architectures.\nInformation hierarchies are organizational structures that are often used to organize and present large and complex information as well as provide a mechanism for effective human navigation. Fortunately, many statistical and computational models exist that automatically generate hierarchies; however, the existing approaches do not consider linkages in information {\\em networks} that are increasingly common in real-world scenarios. Current approaches also tend to present topics as an abstract probability distribution over words, etc., rather than as tangible nodes from the original network. Furthermore, the statistical techniques present in many previous works are not yet capable of processing data at Web-scale. In this paper we present the Hierarchical Document Topic Model (HDTM), which uses a distributed vertex-programming process to calculate a nonparametric Bayesian generative model. Experiments on three medium-sized data sets and the entire Wikipedia dataset show that HDTM can infer accurate hierarchies even over large information networks.\nAn important problem for both graphics and vision is to synthesize novel views of a 3D object from a single image. This is particularly challenging due to the partial observability inherent in projecting a 3D object onto the image space, and the ill-posedness of inferring object shape and pose. However, we can train a neural network to address the problem if we restrict our attention to specific object categories (in our case faces and chairs) for which we can gather ample training data. In this paper, we propose a novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image. 
The recurrent structure allows our model to capture long-term dependencies along a sequence of transformations. We demonstrate the quality of its predictions for human faces on the Multi-PIE dataset and for a dataset of 3D chair models, and also show its ability to disentangle latent factors of variation (e.g., identity and pose) without using full supervision.\nThis article discusses open scientific challenges for understanding the development and evolution of speech forms, as a commentary on Moulin-Frier et al. (Moulin-Frier et al., 2015). Based on the analysis of mathematical models of the origins of speech forms, with a focus on their assumptions, we study the fundamental question of how speech can be formed out of non-speech, at both developmental and evolutionary scales. In particular, we emphasize the importance of embodied self-organization, as well as the role of mechanisms of motivation and active curiosity-driven exploration in speech formation. Finally, we discuss an evolutionary-developmental perspective of the origins of speech.\nSemantic parsing methods are used for capturing and representing the semantic meaning of text. A meaning representation capturing all the concepts in the text may not always be available or may not be sufficiently complete. Ontologies provide a structured and reasoning-capable way to model the content of a collection of texts. In this work, we present a novel approach to joint learning of an ontology and a semantic parser from text. The method is based on semi-automatic induction of a context-free grammar from semantically annotated text. The grammar parses the text into semantic trees. Both the grammar and the semantic trees are used to learn the ontology on several levels -- classes, instances, taxonomic and non-taxonomic relations. The approach was evaluated on the first sentences of Wikipedia pages describing people.\nWe present a new concept - Wikiometrics - the derivation of metrics and indicators from Wikipedia. 
Wikipedia provides an accurate representation of the real world due to its size, structure, editing policy and popularity. We demonstrate an innovative mining methodology, where different elements of Wikipedia - content, structure, editorial actions and reader reviews - are used to rank items in a manner which is by no means inferior to rankings produced by experts or other methods. We test our proposed method by applying it to two real-world ranking problems: top world universities and academic journals. Our proposed ranking methods were compared to leading and widely accepted benchmarks, and were found to be highly correlated with them, with the advantage that the data are publicly available.\nIn the Internet of Things (IoT) domain, various heterogeneous ubiquitous devices would be able to connect and communicate with each other seamlessly, irrespective of the domain. Semantic representation of data through detailed standardized annotation has been shown to improve the integration of the interconnected heterogeneous devices. However, the semantic representation of these heterogeneous data sources for environmental monitoring systems is not yet well supported. To achieve the maximum benefits of IoT for drought forecasting, a dedicated semantic middleware solution is required. This research proposes a middleware that semantically represents and integrates heterogeneous data sources with indigenous knowledge based on a unified ontology for an accurate IoT-based drought early warning system (DEWS).\nIn this paper, we investigate stable patterns of electroencephalogram (EEG) over time for emotion recognition using a machine learning approach. Up to now, various findings of activated patterns associated with different emotions have been reported. However, their stability over time has not been fully investigated yet. In this paper, we focus on identifying EEG stability in emotion recognition. 
To validate the efficiency of the machine learning algorithms used in this study, we systematically evaluate the performance of various popular feature extraction, feature selection, feature smoothing and pattern classification methods with the DEAP dataset and a newly developed dataset for this study. The experimental results indicate that stable patterns exhibit consistency across sessions; the lateral temporal areas activate more for positive emotion than for negative emotion in beta and gamma bands; the neural patterns of neutral emotion have higher alpha responses at parietal and occipital sites; and for negative emotion, the neural patterns have significantly higher delta responses at parietal and occipital sites and higher gamma responses at prefrontal sites. The performance of our emotion recognition system shows that the neural patterns are relatively stable within and between sessions.\nRecommender systems (RSs) provide an effective way of alleviating the information overload problem by selecting personalized choices. Online social networks and user-generated content provide diverse sources for recommendation beyond ratings, which present opportunities as well as challenges for traditional RSs. Although social matrix factorization (Social MF) can integrate ratings with social relations and topic matrix factorization can integrate ratings with item reviews, both of them ignore some useful information. In this paper, we investigate effective data fusion by combining the two approaches, in two steps. First, we extend Social MF to exploit the graph structure of neighbors. Second, we propose a novel framework MR3 to jointly model these three types of information effectively for rating prediction by aligning latent factors and hidden topics. We achieve more accurate rating prediction on two real-life datasets. 
Furthermore, we measure the contribution of each data source to the proposed framework.\nCollaborative vocabulary development in the context of data integration is the process of finding consensus between the experts of the different systems and domains. The complexity of this process increases with the number of involved people, the variety of the systems to be integrated and the dynamics of their domain. In this paper we advocate that the realization of a powerful version control system is the heart of the problem. Driven by this idea and the success of Git in the context of software development, we investigate the applicability of Git for collaborative vocabulary development. Even though vocabulary development and software development have many more similarities than differences, there are still important differences. These need to be considered within the development of a successful versioning and collaboration system for vocabulary development. Therefore, this paper starts by presenting the challenges we faced during the collaborative creation of vocabularies and discusses how the process differs from software development. Based on these insights we propose Git4Voc, which comprises guidelines on how Git can be adopted for vocabulary development. Finally, we demonstrate how Git hooks can be implemented to go beyond the plain functionality of Git by realizing vocabulary-specific features like syntactic validation and semantic diffs.\nIn this paper we present the initial development of a general theory for mapping inference in predicate logic to computation over Tensor Product Representations (TPRs; Smolensky (1990), Smolensky & Legendre (2006)). After an initial brief synopsis of TPRs (Section 0), we begin with particular examples of inference with TPRs in the 'bAbI' question-answering task of Weston et al. (2015) (Section 1). We then present a simplification of the general analysis that suffices for the bAbI task (Section 2). 
Finally, we lay out the general treatment of inference over TPRs (Section 3). We also show that the simplification in Section 2 derives the inference methods described in Lee et al. (2016); this shows how the simple methods of Lee et al. (2016) can be formally extended to more general reasoning tasks.\nFinding inclusion-minimal \"hitting sets\" for a given collection of sets is a fundamental combinatorial problem with applications in domains as diverse as Boolean algebra, computational biology, and data mining. Much of the algorithmic literature focuses on the problem of *recognizing* the collection of minimal hitting sets; however, in many of the applications, it is more important to *generate* these hitting sets. We survey twenty algorithms from across a variety of domains, considering their history, classification, useful features, and computational performance on a variety of synthetic and real-world inputs. We also provide a suite of implementations of these algorithms with a ready-to-use, platform-agnostic interface based on Docker containers and the AlgoRun framework, so that interested computational scientists can easily perform similar tests with inputs from their own research areas on their own computers or through a convenient Web interface.\nWe consider the problem of maximizing a monotone submodular function under noise. There has been a great deal of work on optimization of submodular functions under various constraints, resulting in algorithms that provide desirable approximation guarantees. In many applications, however, we do not have access to the submodular function we aim to optimize, but rather to some erroneous or noisy version of it. This raises the question of whether provable guarantees are obtainable in the presence of error and noise. We provide initial answers by focusing on the question of maximizing a monotone submodular function under a cardinality constraint when given access to a noisy oracle of the function. 
We show that:   - For a cardinality constraint $k \\geq 2$, there is an approximation algorithm whose approximation ratio is arbitrarily close to $1-1/e$;   - For $k=1$ there is an algorithm whose approximation ratio is arbitrarily close to $1/2$. No randomized algorithm can obtain an approximation ratio better than $1/2+o(1)$;   - If the noise is adversarial, no non-trivial approximation guarantee can be obtained.\nModel checking has been successfully used in many computer science fields, including artificial intelligence, theoretical computer science, and databases. Most of the proposed solutions make use of classical, point-based temporal logics, while little work has been done in the interval temporal logic setting. Recently, a non-elementary model checking algorithm for Halpern and Shoham's modal logic of time intervals HS over finite Kripke structures (under the homogeneity assumption) and an EXPSPACE model checking procedure for two meaningful fragments of it have been proposed. In this paper, we show that more efficient model checking procedures can be developed for some sufficiently expressive fragments of HS.\nLink prediction, or predicting the likelihood of a link in a knowledge graph based on its existing state, is a key research task. It differs from a traditional link prediction task in that the links in a knowledge graph are categorized into different predicates and the link prediction performance of different predicates in a knowledge graph generally varies widely. In this work, we propose a latent feature embedding based link prediction model which considers the prediction task for each predicate disjointly. To learn the model parameters it utilizes a Bayesian personalized ranking based optimization technique. Experimental results on large-scale knowledge bases such as YAGO2 show that our link prediction approach achieves substantially higher performance than several state-of-the-art approaches. 
We also show that for a given predicate the topological properties of the knowledge graph induced by the given predicate edges are key indicators of the link prediction performance of that predicate in the knowledge graph.\nIn this paper we propose a special type of aggregation function which generalizes the notion of the Ordered Weighted Averaging Function (OWA). The resulting functions are called Dynamic Ordered Weighted Averaging Functions (DYOWAs). This generalization is developed in such a way that the weight vectors are variables depending on the input vector. In particular, these operators generalize aggregation functions such as Minimum, Maximum, Arithmetic Mean and Median, which are extensively used in image processing. In this field of research two problems are considered: the determination of methods to reduce images and the construction of techniques which provide noise reduction. The operators described here can be used in both cases. In terms of image reduction we apply the methodology provided by Patermain et al. We use the noise reduction operators obtained here to treat the images obtained in the first part of the paper, thus obtaining images with better quality.\nAutomatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either a generation problem or a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image datasets and the evaluation measures that have been developed to assess the quality of machine-generated image descriptions. 
Finally, we outline future directions in the area of automatic image description generation.\nWe introduce the problem of Task Assignment and Sequencing (TAS), which adds the timeline perspective to expert crowdsourcing optimization. Expert crowdsourcing involves macrotasks, like document writing, product design, or web development, which take more time than typical binary microtasks, require expert skills, assume varying degrees of knowledge of a topic, and require crowd workers to build on each other's contributions. Current works usually assume offline optimization models, which consider worker and task arrivals to be known and do not take into account the element of time. Realistically, however, time is critical: tasks have deadlines, expert workers are available only at specific time slots, and worker/task arrivals are not known a priori. Our work is the first to address the problem of optimal task sequencing for online, heterogeneous, time-constrained macrotasks. We propose tas-online, an online algorithm that aims to complete as many tasks as possible within budget, required quality and a given timeline, without future input information regarding job release dates or worker availabilities. Results comparing tas-online to four typical benchmarks show that it achieves more completed jobs, lower flow times and higher job quality. This work has practical implications for improving the Quality of Service of current crowdsourcing platforms, allowing them to offer cost, quality and time improvements for expert tasks.\nIn this paper, we design a Deep Dual-Domain ($\\mathbf{D^3}$) based fast restoration model to remove artifacts of JPEG compressed images. It leverages the large learning capacity of deep networks, as well as problem-specific expertise that has rarely been incorporated into past designs of deep architectures. 
For the latter, we take into consideration both the prior knowledge of the JPEG compression scheme and the successful practice of the sparsity-based dual-domain approach. We further design the One-Step Sparse Inference (1-SI) module as an efficient and lightweight feed-forward approximation of sparse coding. Extensive experiments verify the superiority of the proposed $\\mathbf{D^3}$ model over several state-of-the-art methods. Specifically, our best model can outperform the latest deep model by around 1 dB in PSNR while being 30 times faster.\nVisual recognition research often assumes a sufficient resolution of the region of interest (ROI). This assumption is usually violated in practice, inspiring us to explore the Very Low Resolution Recognition (VLRR) problem. Typically, the ROI in a VLRR problem can be smaller than $16 \\times 16$ pixels, and is challenging to recognize even for human experts. We attempt to solve the VLRR problem using deep learning methods. Taking advantage of techniques primarily in super resolution, domain adaptation and robust regression, we formulate a dedicated deep learning method and demonstrate how these techniques are incorporated step by step. Any extra complexity, when introduced, is fully justified by both analysis and simulation results. The resulting \\textit{Robust Partially Coupled Networks} achieve feature enhancement and recognition simultaneously. They provide both the flexibility to combat the LR-HR domain mismatch and robustness to outliers. Finally, the effectiveness of the proposed models is evaluated on three different VLRR tasks, namely face identification, digit recognition and font recognition, with very impressive performance on all three.\nThe current scenario in the field of computing is largely affected by the speed at which data can be accessed and recalled. 
In this paper, we present a word existence algorithm, which is used to check whether a word given as input is part of a particular database. We have taken the English language as an example here. This algorithm tries to solve the problem of lookup by using a uniformly distributed hash function. We have also addressed the problems of clustering and collisions. A further contribution is that we follow a direct hashed model where each hash value is linked to another table if the continuity for the function holds true. The core of the algorithm lies in the data model being used during preordering. Our focus lies in the formation of a continuity series and in validating the words that exist in the database. This algorithm can be used in applications where there is a requirement to search only for the existence of a word, for example, Artificial Intelligence responding to input, lookups for neural networks, dictionary lookups and more. We have observed that this algorithm provides a faster search time.\nWe study the semantic foundation of expressive probabilistic programming languages that support higher-order functions, continuous distributions, and soft constraints (such as Anglican, Church, and Venture). We define a metalanguage (an idealised version of Anglican) for probabilistic computation with the above features, develop both operational and denotational semantics, and prove soundness, adequacy, and termination. They involve measure theory, stochastic labelled transition systems, and functor categories, but admit intuitive computational readings, one of which views sampled random variables as dynamically allocated read-only variables. We apply our semantics to validate nontrivial equations underlying the correctness of certain compiler optimisations and inference algorithms such as sequential Monte Carlo simulation. 
The language enables defining probability distributions on higher-order functions, and we study their properties.\nVector space representations of words capture many aspects of word similarity, but such methods tend to make vector spaces in which antonyms (as well as synonyms) are close to each other. We present a new signed spectral normalized graph cut algorithm, signed clustering, that overlays existing thesauri upon distributionally derived vector representations of words, so that antonym relationships between word pairs are represented by negative weights. Our signed clustering algorithm produces clusters of words which simultaneously capture distributional and synonym relations. We evaluate these clusters against the SimLex-999 dataset (Hill et al., 2014) of human judgments of word pair similarities, and also show the benefit of using our clusters to predict the sentiment of a given text.\nRubik's Revenge, a 4x4x4 variant of the Rubik's puzzles, remains an unsolved puzzle to date. That is to say, we do not have a method or successful categorization to optimally solve every one of its approximately $7.401 \\times 10^{45}$ possible configurations. Rubik's Cube, Rubik's Revenge's predecessor (3x3x3), with its approximately $4.33 \\times 10^{19}$ possible configurations, has only recently been completely solved by Rokicki et al., further finding that any configuration requires no more than 20 moves. Given the sheer size of Rubik's Revenge's configuration space, a brute-force method of finding all optimal solutions would be in vain. Similar to the methods used by Rokicki et al. on Rubik's Cube, in this paper we develop a method for solving arbitrary configurations of Rubik's Revenge in phases, using a combination of a powerful algorithm known as IDA* and a useful definition of distance in the cube space. 
While time-series results were not successfully gathered, it will be shown that this method far outperforms current human solving methods and can be used to determine loose upper bounds for the cube space. Discussion will suggest that this method can also be applied to other puzzles with the proper transformations.\nWe present a system for online monitoring of maritime activity over streaming positions from numerous vessels sailing at sea. It employs an online tracking module for detecting important changes in the evolving trajectory of each vessel across time, and thus can incrementally retain concise yet reliable summaries of its recent movement. In addition, thanks to its complex event recognition module, this system can also offer instant notification to marine authorities regarding emergency situations, such as risk of collisions, suspicious moves in protected zones, or package picking at open sea. Not only did our extensive tests validate the performance, efficiency, and robustness of the system against scalable volumes of real-world and synthetically enlarged datasets, but its deployment against online feeds from vessels has also confirmed its capabilities for effective, real-time maritime surveillance.\nThe use of knowledge-based planning tools can help alleviate the challenges of planning a complex operation by a coalition of diverse parties in an adversarial environment. We explore these challenges and potential contributions of knowledge-based tools using as an example the CADET system, a knowledge-based tool capable of producing battle plans automatically (or with human guidance) with a realistic degree of detail and complexity. In ongoing experiments, it compared favorably with human planners. Interleaved planning, scheduling, routing, attrition and consumption processes comprise the computational approach of this tool. 
From the coalition operations perspective, such tools offer an important aid in the rapid synchronization of the heterogeneous assets and actions of multiple organizations, potentially with distinct doctrine and rules of engagement. In this paper, we discuss the functionality of the tool, provide a brief overview of the technical approach and experimental results, and outline the potential value of such tools.\nBased on the assumption that there exists a neural network that efficiently represents a set of Boolean functions between all binary inputs and outputs, we propose a process for developing and deploying neural networks whose weight parameters, bias terms, input, and intermediate hidden layer output signals are all binary-valued, and require only basic bit logic for the feedforward pass. The proposed Bitwise Neural Network (BNN) is especially suitable for resource-constrained environments, since it replaces floating- or fixed-point arithmetic with significantly more efficient bitwise operations. Hence, the BNN requires less spatial complexity, less memory bandwidth, and less power consumption in hardware. In order to design such networks, we propose to add a few training schemes, such as weight compression and noisy backpropagation, which result in a bitwise network that performs almost as well as its corresponding real-valued network. We test the proposed network on the MNIST dataset, represented using binary features, and show that BNNs result in competitive performance while offering dramatic computational savings.\nWe consider a setting for Inverse Reinforcement Learning (IRL) where the learner is extended with the ability to actively select multiple environments, observing an agent's behavior in each environment. 
We first demonstrate that if the learner can experiment with any transition dynamics on some fixed set of states and actions, then there exists an algorithm that reconstructs the agent's reward function to the fullest extent theoretically possible, and that requires only a small (logarithmic) number of experiments. We contrast this result with what is known about IRL in single fixed environments, namely that the true reward function is fundamentally unidentifiable. We then extend this setting to the more realistic case where the learner may not select arbitrary transition dynamics, but rather is restricted to some fixed set of environments that it may try. We connect the problem of maximizing the information derived from experiments to submodular function maximization and demonstrate that a greedy algorithm is near optimal (up to logarithmic factors). Finally, we empirically validate our algorithm on an environment inspired by behavioral psychology.\nWe present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. 
On several real datasets we demonstrate that our approach can compete with state-of-the-art algorithms for anomaly detection while being an order of magnitude faster than most other approaches.\nTheories of natural language and concepts have been unable to model the flexibility, creativity, context-dependence, and emergence exhibited by words, concepts and their combinations. The mathematical formalism of quantum theory has instead been successful in capturing phenomena such as graded membership, situational meaning, composition of categories, and also more complex decision-making situations, which cannot be modeled in traditional probabilistic approaches. We show how a formal quantum approach to concepts and their combinations can provide a powerful extension of prototype theory. We explain how prototypes can interfere in conceptual combinations as a consequence of their contextual interactions, and provide an illustration of this using an intuitive wave-like diagram. This quantum-conceptual approach gives new life to the original prototype theory, without, however, making it a privileged concept theory, as we explain at the end of our paper.\nThe goal of this paper is to identify individuals by analyzing their gait. Instead of using binary silhouettes as input data (as done in many previous works), we propose and evaluate the use of motion descriptors based on densely sampled short-term trajectories. We take advantage of state-of-the-art people detectors to define custom spatial configurations of the descriptors around the target person, obtaining a rich representation of the gait motion. The local motion features (described by the Divergence-Curl-Shear descriptor) extracted on the different spatial areas of the person are combined into a single high-level gait descriptor by using the Fisher Vector encoding. 
The proposed approach, coined Pyramidal Fisher Motion, is experimentally validated on the `CASIA' dataset (parts B and C), the `TUM GAID' dataset, the `CMU MoBo' dataset and the recent `AVA Multiview Gait' dataset. The results show that this new approach achieves state-of-the-art results in the problem of gait recognition, making it possible to recognize walking people from diverse viewpoints on single and multiple camera setups, wearing different clothes, carrying bags, walking at diverse speeds and not limited to straight walking paths.\nIdentifying the type of font (e.g., Roman, Blackletter) used in historical documents can help optical character recognition (OCR) systems produce more accurate text transcriptions. Towards this end, we present an active-learning strategy that can significantly reduce the number of labeled samples needed to train a font classifier. Our approach extracts image-based features that exploit geometric differences between fonts at the word level, and combines them into a bag-of-words representation for each page in a document. We evaluate six sampling strategies based on uncertainty, dissimilarity and diversity criteria, and test them on a database containing over 3,000 historical documents with Blackletter, Roman and Mixed fonts. Our results show that a combination of uncertainty and diversity achieves the highest predictive accuracy (89% of test cases correctly classified) while requiring only a small fraction of the data (17%) to be labeled. We discuss the implications of this result for mass digitization projects of historical documents.\nWe consider a general class of models where a reinforcement learning (RL) agent learns from cyclic interactions with an external environment via classical signals. Perceptual inputs are encoded as quantum states, which are subsequently transformed by a quantum channel representing the agent's memory, while the outcomes of measurements performed at the channel's output determine the agent's actions. 
The learning takes place via stepwise modifications of the channel properties. They are described by an update rule that is inspired by the projective simulation (PS) model and equipped with a glow mechanism that allows for a backpropagation of policy changes, analogous to the eligibility traces in RL and edge glow in PS. In this way, the model combines features of PS with the ability for generalization, offered by its physical embodiment as a quantum system. We apply the agent to various setups of an invasion game and a grid world, which serve as elementary model tasks allowing a direct comparison with a basic classical PS agent.\nIn recent years, the planning community has observed that techniques for learning heuristic functions have yielded improvements in performance. One approach is to use offline learning to learn predictive models from existing heuristics in a domain-dependent manner. These learned models are deployed as new heuristic functions. The learned models can in turn be tuned online using a domain-independent error correction approach to further enhance their informativeness. The online tuning approach is domain-independent but instance-specific, and contributes to improved performance for individual instances as planning proceeds. Consequently, it is more effective in larger problems.   In this paper, we present two approaches applicable in Partial Order Causal Link (POCL) Planning, also known as Plan Space Planning. First, we endeavor to enhance the performance of a POCL planner by giving an algorithm for supervised learning. Second, we discuss an online error minimization approach in the POCL framework to minimize the step-error associated with the offline learned models, thus enhancing their informativeness. Our evaluation shows that the learning approaches scale up the performance of the planner over standard benchmarks, especially for larger problems.\nLocal search algorithms and iterated local search algorithms are a basic technique. 
Local search can be a stand-alone search method, but it can also be hybridized with evolutionary algorithms. Recently, it has been shown that it is possible to identify improving moves in Hamming neighborhoods for k-bounded pseudo-Boolean optimization problems in constant time. This means that local search does not need to enumerate neighborhoods to find improving moves. It also means that evolutionary algorithms do not need to use random mutation as an operator, except perhaps as a way to escape local optima. In this paper, we show how improving moves can be identified in constant time for multiobjective problems that are expressed as k-bounded pseudo-Boolean functions. In particular, multiobjective forms of NK Landscapes and Mk Landscapes are considered.\nThis paper describes an information extraction system, which is an extension of the system developed by team Hitachi for the \"Disease/Disorder Template filling\" task organized by ShARe/CLEF eHealth Evolution Lab 2014. In this extension module we focus on extracting numerical attributes and values from discharge summary records and on associating the correct relation between attributes and values. We solve the problem in two steps. The first step is the extraction of numerical attributes and values, which is developed as a Named Entity Recognition (NER) model using Stanford NLP libraries. The second step is correctly associating the attributes to values, which is developed as a relation extraction module in the Apache cTAKES framework. We integrated the Stanford NER model as a cTAKES pipeline component and used it in the relation extraction module. The Conditional Random Field (CRF) algorithm is used for NER and Support Vector Machines (SVM) for relation extraction. 
For attribute value relation extraction, we observe 95% accuracy using NER alone and a combined accuracy of 87% with NER and SVM.\nMost Software Defined Networks (SDN) traffic engineering applications use excessive and frequent global monitoring in order to find the optimal Quality-of-Service (QoS) paths for the current state of the network. In this work, we present the motivations, architecture and initial evaluation of an SDN application called Cognitive Routing Engine (CRE) which is able to find near-optimal paths for a user-specified QoS while using a very small monitoring overhead compared to global monitoring which is required to guarantee that optimal paths are found. Smaller monitoring overheads bring the advantage of smaller response time for the SDN controllers and switches. The initial evaluation of CRE on an SDN representation of the GEANT academic network shows that it is possible to find near-optimal paths with a small optimality gap of 1.65% while using 9.5 times less monitoring.\nPeople are producing more written material than at any time in history. The increase is so high that professionals from various fields are no longer able to cope with this amount of publications. Text mining can offer tools to help them, and one of the tools that can aid information retrieval and information extraction is semantic text annotation. In this report we present Marvin, a text annotator written in Java, which can be used as a command line tool and as a Java library. Marvin is able to annotate text using multiple sources, including WordNet, MetaMap, DBPedia and thesauri represented as SKOS.\nHuman vision greatly benefits from the information about sizes of objects. The role of size in several visual reasoning tasks has been thoroughly explored in human perception and cognition. However, the impact of the information about sizes of objects is yet to be determined in AI. 
We postulate that this is mainly attributable to the lack of a comprehensive repository of size information. In this paper, we introduce a method to automatically infer object sizes, leveraging visual and textual information from the web. By maximizing the joint likelihood of textual and visual observations, our method learns reliable relative size estimates, with no explicit human supervision. We introduce the relative size dataset and show that our method outperforms competitive textual and visual baselines in reasoning about size comparisons.\nUnderstanding building operation has become a challenging task due to the large amount of data recorded in energy-efficient buildings. Still, today the experts use visual tools for analyzing the data. In order to make the task realistic, a method has been proposed in this paper to automatically detect the different patterns in buildings. K-means clustering is used to automatically identify the ON (operational) cycles of the chiller. In the next step the ON cycles are transformed into a symbolic representation using the Symbolic Aggregate Approximation (SAX) method. Then the SAX symbols are converted to a bag-of-words representation for hierarchical clustering. Moreover, the proposed technique is applied to real-life data from an adsorption chiller. Additionally, the results from the proposed method and the dynamic time warping (DTW) approach are also discussed and compared.\nStatistical language models are central to many applications that use semantics. Recurrent Neural Networks (RNN) are known to produce state of the art results for language modelling, outperforming their traditional n-gram counterparts in many cases. To generate a probability distribution across a vocabulary, these models require a softmax output layer that linearly increases in size with the size of the vocabulary. 
Large vocabularies need a commensurately large softmax layer and training them on typical laptops/PCs requires significant time and machine resources. In this paper we present a new technique for implementing RNN based large vocabulary language models that substantially speeds up computation while optimally using the limited memory resources. Our technique, while building on the notion of factorizing the output layer by having multiple output layers, improves on the earlier work by substantially optimizing on the individual output layer size and also eliminating the need for a multistep prediction process.\nBuilding a successful recommender system depends on understanding both the dimensions of people's preferences as well as their dynamics. In certain domains, such as fashion, modeling such preferences can be incredibly difficult, due to the need to simultaneously model the visual appearance of products as well as their evolution over time. The subtle semantics and non-linear dynamics of fashion evolution raise unique challenges especially considering the sparsity and large scale of the underlying datasets. In this paper we build novel models for the One-Class Collaborative Filtering setting, where our goal is to estimate users' fashion-aware personalized ranking functions based on their past feedback. To uncover the complex and evolving visual factors that people consider when evaluating products, our method combines high-level visual features extracted from a deep convolutional neural network, users' past feedback, as well as evolving trends within the community. 
Experimentally we evaluate our method on two large real-world datasets from Amazon.com, where we show it to outperform state-of-the-art personalized ranking measures, and also use it to visualize the high-level fashion trends across the 11-year span of our dataset.\nCategorical compositional distributional semantics is a model of natural language; it combines the statistical vector space models of words with the compositional models of grammar. We formalise in this model the generalised quantifier theory of natural language, due to Barwise and Cooper. The underlying setting is a compact closed category with bialgebras. We start from a generative grammar formalisation and develop an abstract categorical compositional semantics for it, then instantiate the abstract setting to sets and relations and to finite dimensional vector spaces and linear maps. We prove the equivalence of the relational instantiation to the truth theoretic semantics of generalised quantifiers. The vector space instantiation formalises the statistical usages of words and enables us to, for the first time, reason about quantified phrases and sentences compositionally in distributional semantics.\nThe coordination of multiple autonomous vehicles into convoys or platoons is expected on our highways in the near future. However, before such platoons can be deployed, the new autonomous behaviors of the vehicles in these platoons must be certified. An appropriate representation for vehicle platooning is as a multi-agent system in which each agent captures the \"autonomous decisions\" carried out by each vehicle. In order to ensure that these autonomous decision-making agents in vehicle platoons never violate safety requirements, we use formal verification. 
However, as the formal verification technique used to verify the agent code does not scale to the full system and as the global verification technique does not capture the essential verification of autonomous behavior, we use a combination of the two approaches. This mixed strategy allows us to verify safety requirements not only of a model of the system, but of the actual agent code used to program the autonomous vehicles.\nThe model of cognition developed in (Smolensky and Legendre, 2006) seeks to unify two levels of description of the cognitive process: the connectionist and the symbolic. The theory developed brings together these two levels into the Integrated Connectionist/Symbolic Cognitive architecture (ICS). Clark and Pulman (2007) draw a parallel with semantics where meaning may be modelled on both distributional and symbolic levels, developed by Coecke et al, 2010 into the Distributional Compositional (DisCo) model of meaning. In the current work, we revisit Smolensky and Legendre (S&L)'s model. We describe the DisCo framework, summarise the key ideas in S&L's architecture, and describe how their description of harmony as a graded measure of grammaticality may be applied in the DisCo model.\nWe can program a Real-Time (RT) music improvisation system in C++ without a formal semantic or we can model it with process calculi such as the Non-deterministic Timed Concurrent Constraint (ntcc) calculus. \"A Concurrent Constraints Factor Oracle (FO) model for Music Improvisation\" (Ccfomi) is an improvisation model specified on ntcc. Since Ccfomi improvises non-deterministically, there is no control on choices and therefore little control over the sequence variation during the improvisation. To avoid this, we extended Ccfomi using the Probabilistic Non-deterministic Timed Concurrent Constraint calculus. Our extension to Ccfomi does not change the time and space complexity of building the FO, thus making our extension compatible with RT. 
However, there was no ntcc interpreter capable of RT to execute Ccfomi. We developed Ntccrt (an RT-capable interpreter for ntcc) and we executed Ccfomi on Ntccrt. In the future, we plan to extend Ntccrt to execute our extension to Ccfomi.\nEntity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities, and merging them into single representations. Relatively recently, declarative rules called \"matching dependencies\" (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating four components of ER: (a) Building a classifier for duplicate/non-duplicate record pairs using machine learning (ML) techniques; (b) Use of MDs for supporting the blocking phase of ML; (c) Record merging on the basis of the classifier results; and (d) The use of the declarative language \"LogiQL\" (an extended form of Datalog supported by the \"LogicBlox\" platform) for all activities related to data processing, and the specification and enforcement of MDs.\nWe study an ancient problem that in a static or dynamical system, sought an optimal path, which the context always means within an extremal condition. In fact, through those discussions about this theme, we established a universal essential calculated model to serve for these complex systems. Meanwhile we utilize the sample space to characterize the system. These contents in this paper would involve in several major areas including the geometry, probability, graph algorithms and some prior approaches, which stands the ultimately subtle linear algorithm to solve this class problem. 
Along with our progress, our discussion would demonstrate more general meaning and robust character, which provides clear ideas or notion to support our concrete applications, who work in a more popular complex system.\nIn recent years there has been a growing interest in using deep representations for reinforcement learning. In this paper, we present a methodology and tools to analyze Deep Q-networks (DQNs) in a non-blind manner. Moreover, we propose a new model, the Semi Aggregated Markov Decision Process (SAMDP), and an algorithm that learns it automatically. The SAMDP model allows us to identify spatio-temporal abstractions directly from features and may be used as a sub-goal detector in future work. Using our tools we reveal that the features learned by DQNs aggregate the state space in a hierarchical fashion, explaining their success. Moreover, we are able to understand and describe the policies learned by DQNs for three different Atari2600 games and suggest ways to interpret, debug and optimize deep neural networks in reinforcement learning.\nWe propose deep distributed recurrent Q-networks (DDRQN), which enable teams of agents to learn to solve communication-based coordination tasks. In these tasks, the agents are not given any pre-designed communication protocol. Therefore, in order to successfully communicate, they must first automatically develop and agree upon their own communication protocol. We present empirical results on two multi-agent learning problems based on well-known riddles, demonstrating that DDRQN can successfully solve such tasks and discover elegant communication protocols to do so. To our knowledge, this is the first time deep reinforcement learning has succeeded in learning communication protocols. In addition, we present ablation experiments that confirm that each of the main components of the DDRQN architecture is critical to its success.\nWe address the problem of dueling bandits defined on partially ordered sets, or posets. 
In this setting, arms may not be comparable, and there may be several (incomparable) optimal arms. We propose an algorithm, UnchainedBandits, that efficiently finds the set of optimal arms of any poset even when pairs of comparable arms cannot be distinguished from pairs of incomparable arms, with a set of minimal assumptions. This algorithm relies on the concept of decoys, which stems from social psychology. For the easier case where the incomparability information may be accessible, we propose a second algorithm, SlicingBandits, which takes advantage of this information and achieves a very significant gain of performance compared to UnchainedBandits. We provide theoretical guarantees and experimental evaluation for both algorithms.\nWe study the strategic aspects of social influence in a society of agents linked by a trust network, introducing a new class of games called games of influence. A game of influence is an infinite repeated game with incomplete information in which, at each stage of interaction, an agent can make her opinions visible (public) or invisible (private) in order to influence other agents' opinions. The influence process is mediated by a trust network, as we assume that the opinion of a given agent is only affected by the opinions of those agents that she considers trustworthy (i.e., the agents in the trust network that are directly linked to her). Each agent is endowed with a goal, expressed in a suitable temporal language inspired from linear temporal logic (LTL). 
We show that games of influence provide a simple abstraction to explore the effects of the trust network structure on the agents' behaviour, by considering solution concepts from game-theory such as Nash equilibrium, weak dominance and winning strategies.\nWe introduce a problem set-up we call the Iterated Matching Pennies (IMP) game and show that it is a powerful framework for the study of three problems: adversarial learnability, conventional (i.e., non-adversarial) learnability and approximability. Using it, we are able to derive the following theorems. (1) It is possible to learn by example all of $\\Sigma^0_1 \\cup \\Pi^0_1$ as well as some supersets; (2) in adversarial learning (which we describe as a pursuit-evasion game), the pursuer has a winning strategy (in other words, $\\Sigma^0_1$ can be learned adversarially, but $\\Pi^0_1$ not); (3) some languages in $\\Pi^0_1$ cannot be approximated by any language in $\\Sigma^0_1$.   We show corresponding results also for $\\Sigma^0_i$ and $\\Pi^0_i$ for arbitrary $i$.\nWe introduce the value iteration network (VIN): a fully differentiable neural network with a `planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network, and trained end-to-end using standard backpropagation. We evaluate VIN based policies on discrete and continuous path-planning domains, and on a natural-language based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.\nThe problem of scheduling under resource constraints is widely applicable. One prominent example is power management, in which we have a limited continuous supply of power but must schedule a number of power-consuming tasks. 
Such problems feature tightly coupled continuous resource constraints and continuous temporal constraints.   We address such problems by introducing the Time Resource Network (TRN), an encoding for resource-constrained scheduling problems. The definition allows temporal specifications using a general family of representations derived from the Simple Temporal Network, including the Simple Temporal Network with Uncertainty, and the probabilistic Simple Temporal Network (Fang et al. (2014)).   We propose two algorithms for determining the consistency of a TRN: one based on Mixed Integer Programming and the other one based on Constraint Programming, which we evaluate on scheduling problems with Simple Temporal Constraints and Probabilistic Temporal Constraints.\nExisting research in crowdsourcing has investigated how to recommend tasks to workers based on which tasks the workers have already completed, referred to as {\\em implicit feedback}. We, on the other hand, investigate the task recommendation problem, where we leverage both implicit feedback and explicit features of the task. We assume that we are given a set of workers, a set of tasks, interactions (such as the number of times a worker has completed a particular task), and the presence of explicit features of each task (such as task location). We intend to recommend tasks to the workers by exploiting the implicit interactions, and the presence or absence of explicit features in the tasks. We formalize the problem as an optimization problem, propose two alternative problem formulations and respective solutions that exploit implicit feedback, explicit features, as well as similarity between the tasks. We compare the efficacy of our proposed solutions against multiple state-of-the-art techniques using two large-scale real-world datasets.\nFor complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation. 
A problem is misspecified whenever the representation cannot express any policy with acceptable performance. We introduce IHOMP: an approach for solving misspecified problems. IHOMP iteratively learns a set of context-specialized options and combines these options to solve an otherwise misspecified problem. Our main contribution is proving that IHOMP enjoys theoretical convergence guarantees. In addition, we extend IHOMP to exploit Option Interruption (OI), enabling it to decide where the learned options can be reused. Our experiments demonstrate that IHOMP can find near-optimal solutions to otherwise misspecified problems and that OI can further improve the solutions.\nWe introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them. We believe that both (1) and (2) are necessary for a truly general skill learning framework, which is a key building block needed to scale up to lifelong learning agents. The ASAP framework can also solve related new tasks simply by adapting where it applies its existing learned skills. We prove that ASAP converges to a local optimum under natural conditions. Finally, our experimental results, which include a RoboCup domain, demonstrate the ability of ASAP to learn where to reuse skills as well as solve multiple tasks with considerably less experience than solving each task from scratch.\nCollaborative human activities are grounded in social and moral norms, which humans consciously and subconsciously use to guide and constrain their decision-making and behavior, thereby strengthening their interactions and preventing emotional and physical harm. This type of norm-based processing is also critical for robots in many human-robot interaction scenarios (e.g., when helping elderly and disabled persons in assisted living facilities, or assisting humans in assembly tasks in factories or even the space station). 
In this position paper, we will briefly describe how several components in an integrated cognitive architecture can be used to implement processes that are required for normative human-robot interactions, especially in collaborative tasks where actions and situations could potentially be perceived as threatening and thus need a change in course of action to mitigate the perceived threats.\nAn important problem in the field of bioinformatics is to identify interactive effects among profiled variables for outcome prediction. In this paper, a logistic regression model with pairwise interactions among a set of binary covariates is considered. Modeling the structure of the interactions by a graph, our goal is to recover the interaction graph from independent and identically distributed (i.i.d.) samples of the covariates and the outcome.   When viewed as a feature selection problem, a simple quantity called influence is proposed as a measure of the marginal effects of the interaction terms on the outcome. For the case when the underlying interaction graph is known to be acyclic, it is shown that a simple algorithm that is based on a maximum-weight spanning tree with respect to the plug-in estimates of the influences not only has strong theoretical performance guarantees, but can also outperform generic feature selection algorithms for recovering the interaction graph from i.i.d. samples of the covariates and the outcome. Our results can also be extended to the model that includes both individual effects and pairwise interactions via the help of an auxiliary covariate.\nHospital readmissions are expensive and reflect the inadequacies in the healthcare system. In the United States alone, treatment of readmitted diabetic patients exceeds 250 million dollars per year. Early identification of patients facing a high risk of readmission can enable healthcare providers to conduct additional investigations and possibly prevent future readmissions. 
This not only improves the quality of care but also reduces the medical expenses on readmission. Machine learning methods have been leveraged on public health data to build a system for identifying diabetic patients facing a high risk of future readmission. The number of inpatient visits, discharge disposition and admission type were identified as strong predictors of readmission. Further, it was found that the number of laboratory tests and discharge disposition together predict whether the patient will be readmitted shortly after being discharged from the hospital (i.e. <30 days) or after a longer period of time (i.e. >30 days). These insights can help healthcare providers to improve inpatient diabetic care. Finally, the cost analysis suggests that \\$252.76 million can be saved across 98,053 diabetic patient encounters by incorporating the proposed cost-sensitive analysis model.\nSum-Product Networks (SPNs) are a class of expressive yet tractable hierarchical graphical models. LearnSPN is a structure learning algorithm for SPNs that uses hierarchical co-clustering to simultaneously identify similar entities and similar features. The original LearnSPN algorithm assumes that all the variables are discrete and there is no missing data. We introduce a practical, simplified version of LearnSPN, MiniSPN, that runs faster and can handle missing data and heterogeneous features common in real applications. We demonstrate the performance of MiniSPN on standard benchmark datasets and on two datasets from Google's Knowledge Graph exhibiting high missingness rates and a mix of discrete and continuous features.\nSpeaker identification refers to the task of localizing the face of a person who has the same identity as the ongoing voice in a video. This task requires not only collective perception over both visual and auditory signals but also the robustness to handle severe quality degradations and unconstrained content variations. 
In this paper, we describe a novel multimodal Long Short-Term Memory (LSTM) architecture which seamlessly unifies both visual and auditory modalities from the beginning of each sequence input. The key idea is to extend the conventional LSTM by not only sharing weights across time steps, but also sharing weights across modalities. We show that modeling the temporal dependency across face and voice can significantly improve the robustness to content quality degradations and variations. We also found that our multimodal LSTM is robust to distractors, namely the non-speaking identities. We applied our multimodal LSTM to The Big Bang Theory dataset and showed that our system outperforms the state-of-the-art systems in speaker identification with lower false alarm rate and higher recognition accuracy.\nChange management for evolving collaborative business process development is crucial when the business logic, transactions and workflow change due to changes in business strategies or organizational and technical environment. During the change implementation, business processes are analyzed and improved, ensuring that they capture the proposed change and that they do not contain any undesired functionalities or change side-effects. This paper presents a Business Process Change Management approach for the efficient and effective implementation of change in the business process. The key technology behind our approach is our proposed Business Process Change Management Ontology (BPCMont), which is the main contribution of this paper. 
BPCMont, as a formalized change specification, helps to revert a BP to a consistent state in case of a system crash, an intermediate conflicting stage or an unauthorized change; aids change traceability across the new and old versions of business processes; allows change effects to be seen and estimated effectively; and makes it easier for stakeholders to validate and verify the change implementation.\nConcept drift has potential in smart grid analysis because the socio-economic behaviour of consumers is not governed by the laws of physics. Likewise there are also applications in wind power forecasting. In this paper we present a decision tree ensemble classification method based on the Random Forest algorithm for concept drift. The weighted majority voting ensemble aggregation rule is employed based on the ideas of the Accuracy Weighted Ensemble (AWE) method. Base learner weight in our case is computed for each sample evaluation using base learner accuracy and the intrinsic proximity measure of Random Forest. Our algorithm exploits both temporal weighting of samples and ensemble pruning as a forgetting strategy. We present results of an empirical comparison of our method with the original random forest with incorporated \"replace-the-loser\" forgetting and other state-of-the-art concept-drift classifiers like AWE2.\nWe analyze dropout in deep networks with rectified linear units and the quadratic loss. Our results expose surprising differences between the behavior of dropout and more traditional regularizers like weight decay. For example, on some simple data sets dropout training produces negative weights even though the output is the sum of the inputs. This provides a counterpoint to the suggestion that dropout discourages co-adaptation of weights. We also show that the dropout penalty can grow exponentially in the depth of the network while the weight-decay penalty remains essentially linear, and that dropout is insensitive to various re-scalings of the input features, outputs, and network weights. 
This last insensitivity implies that there are no isolated local minima of the dropout training criterion. Our work uncovers new properties of dropout, extends our understanding of why dropout succeeds, and lays the foundation for further progress.\nConsequence-based calculi are a family of reasoning algorithms for description logics (DLs), and they combine hypertableau and resolution in a way that often achieves excellent performance in practice. Up to now, however, they have been proposed only for Horn DLs (which do not support disjunction) or for DLs without counting quantifiers. In this paper we present a novel consequence-based calculus for SRIQ, a rich DL that supports both features. This extension is non-trivial since the intermediate consequences that need to be derived during reasoning cannot be captured using DLs themselves. The results of our preliminary performance evaluation suggest the feasibility of our approach in practice.\nData Warehouses are structures with large amounts of data collected from heterogeneous sources to be used in a decision support system. Data Warehouse analysis identifies initially unexpected hidden patterns, and this analysis requires great memory and computation cost. Data reduction methods were proposed to make this analysis easier. In this paper, we present a hybrid approach based on Genetic Algorithms (GA) as Evolutionary Algorithms and the Multiple Correspondence Analysis (MCA) as Factor Analysis Methods to conduct this reduction. Our approach identifies a reduced subset of p' dimensions from the initial set of p dimensions, where p'<p, with the aim of finding the fact profile that is the closest to the reference. GAs identify the possible subsets and the chi-square (Khi) formula of the MCA evaluates the quality of each subset. 
The study is based on a distance measurement between the reference and n facts profile extracted from the Warehouses.\nThe partially observable Markov decision process (POMDP) provides a principled general model for planning under uncertainty. However, solving a general POMDP is computationally intractable in the worst case. This paper introduces POMDP-lite, a subclass of POMDPs in which the hidden state variables are constant or only change deterministically. We show that a POMDP-lite is equivalent to a set of fully observable Markov decision processes indexed by a hidden parameter and is useful for modeling a variety of interesting robotic tasks. We develop a simple model-based Bayesian reinforcement learning algorithm to solve POMDP-lite models. The algorithm performs well on large-scale POMDP-lite models with up to $10^{20}$ states and outperforms the state-of-the-art general-purpose POMDP algorithms. We further show that the algorithm is near-Bayesian-optimal under suitable conditions.\nDomain adaptation addresses the problem created when training data is generated by a so-called source distribution, but test data is generated by a significantly different target distribution. In this work, we present approximate label matching (ALM), a new unsupervised domain adaptation technique that creates and leverages a rough labeling on the test samples, then uses these noisy labels to learn a transformation that aligns the source and target samples. We show that the transformation estimated by ALM has favorable properties compared to transformations estimated by other methods, which do not use any kind of target labeling. Our model is regularized by requiring that a classifier trained to discriminate source from transformed target samples cannot distinguish between the two. 
We experiment with ALM on simulated and real data, and show that it outperforms techniques commonly used in the field.\nThis paper addresses the problem of detecting coherent motions in crowd scenes and presents its two applications in crowd scene understanding: semantic region detection and recurrent activity mining. It processes input motion fields (e.g., optical flow fields) and produces a coherent motion field, named the thermal energy field. The thermal energy field is able to capture both motion correlation among particles and the motion trends of individual particles, which helps to discover coherency among them. We further introduce a two-step clustering process to construct stable semantic regions from the extracted time-varying coherent motions. These semantic regions can be used to recognize pre-defined activities in crowd scenes. Finally, we introduce a cluster-and-merge process which automatically discovers recurrent activities in crowd scenes by clustering and merging the extracted coherent motions. Experiments on various videos demonstrate the effectiveness of our approach.\nWe propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities. We prove that such approximate corrections are sufficient for off-policy convergence both in policy evaluation and control, under certain conditions. These conditions relate the distance between the target and behavior policies, the eligibility trace parameter and the discount factor, and formalize an underlying tradeoff in off-policy TD($\\lambda$). 
We illustrate this theoretical relationship empirically on a continuous-state control task.\nThe widespread integration of cameras in hand-held and head-worn devices as well as the ability to share content online enables a large and diverse visual capture of the world that millions of users build up collectively every day. We envision these images as well as associated meta information, such as GPS coordinates and timestamps, to form a collective visual memory that can be queried while automatically taking the ever-changing context of mobile users into account. As a first step towards this vision, in this work we present Xplore-M-Ego: a novel media retrieval system that allows users to query a dynamic database of images and videos using spatio-temporal natural language queries. We evaluate our system using a new dataset of real user queries as well as through a usability study. One key finding is that there is a considerable amount of inter-user variability, for example in the resolution of spatial relations in natural language utterances. We show that our retrieval system can cope with this variability using personalisation through an online learning-based retrieval formulation.\nRecent sequential pattern mining methods have used the minimum description length (MDL) principle to define an encoding scheme which describes an algorithm for mining the most compressing patterns in a database. We present a novel subsequence interleaving model based on a probabilistic model of the sequence database, which allows us to search for the most compressing set of patterns without designing a specific encoding scheme. Our proposed algorithm is able to efficiently mine the most relevant sequential patterns and rank them using an associated measure of interestingness. 
The efficient inference in our model is a direct result of our use of a structural expectation-maximization framework, in which the expectation-step takes the form of a submodular optimization problem subject to a coverage constraint. We show on both synthetic and real-world datasets that our model mines a set of sequential patterns with low spuriousness and redundancy, high interpretability and usefulness in real-world applications. Furthermore, we demonstrate that the quality of the patterns from our approach is comparable to, if not better than, existing state-of-the-art sequential pattern mining algorithms.\nIt was shown before that the NP-hard problem of deterministic finite automata (DFA) identification can be effectively translated to Boolean satisfiability (SAT). Modern SAT-solvers can tackle hard DFA identification instances efficiently. We present a technique to reduce the problem search space by enforcing an enumeration of DFA states in depth-first search (DFS) or breadth-first search (BFS) order. We propose symmetry breaking predicates, which can be added to Boolean formulae representing various DFA identification problems. We show how to apply this technique to DFA identification from both noiseless and noisy data. We also propose a method to identify all automata of the desired size. The proposed approach outperforms the current state-of-the-art DFASAT method for DFA identification from noiseless data. A big advantage of the proposed approach is that it allows one to determine exactly the existence or non-existence of a solution of the noisy DFA identification problem, unlike metaheuristic approaches such as genetic algorithms.\nWe describe a large-scale functional brain model that includes detailed, conductance-based, compartmental models of individual neurons. We call the model BioSpaun, to indicate the increased biological plausibility of these neurons, and because it is a direct extension of the Spaun model \\cite{Eliasmith2012b}. 
We demonstrate that including these detailed compartmental models does not adversely affect performance across a variety of tasks, including digit recognition, serial working memory, and counting. We then explore the effects of applying TTX, a sodium channel blocking drug, to the model. We characterize the behavioral changes that result from this molecular-level intervention. We believe this is the first demonstration of a large-scale brain model that clearly links low-level molecular interventions and high-level behavior.\nWe have developed a program called MUDoS (Maastricht University Domineering Solver) that solves Domineering positions in a very efficient way. This enables the positions solved so far (up to the 10 x 10 board) to be solved much more quickly (measured in number of investigated nodes). More importantly, it enables the solution of the 11 x 11 Domineering board, a board up till now far out of reach of previous Domineering solvers. The solution needed the investigation of 259,689,994,008 nodes, using almost half a year of computation time on a single simple desktop computer. The results show that under optimal play the first player wins the 11 x 11 Domineering game, irrespective of whether Vertical or Horizontal starts the game. In addition, several other boards hitherto unsolved were solved. Using the convention that Vertical starts, the 8 x 15, 11 x 9, 12 x 8, 12 x 15, 14 x 8, and 17 x 6 boards are all won by Vertical, whereas the 6 x 17, 8 x 12, 9 x 11, and 11 x 10 boards are all won by Horizontal.\nDeep generative models parameterized by neural networks have recently achieved state-of-the-art performance in unsupervised and semi-supervised learning. We extend deep generative models with auxiliary variables, which improve the variational approximation. The auxiliary variables leave the generative model unchanged but make the variational distribution more expressive. 
Inspired by the structure of the auxiliary variable we also propose a model with two stochastic layers and skip connections. Our findings suggest that more expressive and properly specified deep generative models converge faster with better results. We show state-of-the-art performance within semi-supervised learning on the MNIST, SVHN and NORB datasets.\nTraditional inconsistency-tolerant query answering in ontology-based data access relies on selecting maximal components of an ABox/database which are consistent with the ontology. However, some rules in ontologies might be unreliable if they are extracted from ontology learning or written by unskillful knowledge engineers. In this paper we present a framework for handling inconsistent existential rules under stable model semantics, which is defined by a notion called rule repairs to select maximal components of the existential rules. Surprisingly, for R-acyclic existential rules with R-stratified or guarded existential rules with stratified negations, both the data complexity and combined complexity of query answering under the rule repair semantics remain the same as those under the conventional query answering semantics. This leads us to propose several approaches to handle the rule repair semantics by calling answer set programming solvers. An experimental evaluation shows that these approaches have good scalability of query answering under rule repairs on realistic cases.\nThe advances of the Linked Open Data (LOD) initiative are giving rise to a more structured web of data. Indeed, a few datasets act as hubs (e.g., DBpedia) connecting many other datasets. They also made possible new web services for entity detection inside plain text (e.g., DBpedia Spotlight), thus allowing for new applications that will benefit from a combination of the web of documents and the web of data. To ease the emergence of these new use-cases, we propose a query-biased algorithm for the ranking of entities detected inside a web page. 
Our algorithm combines link analysis with dimensionality reduction. We use crowdsourcing for building a publicly available and reusable dataset on which we compare our algorithm to the state of the art. Finally, we use this algorithm for the construction of semantic snippets for which we evaluate the usability and the usefulness with a crowdsourcing-based approach.\nMatching two texts is a fundamental problem in many natural language processing tasks. An effective way is to extract meaningful matching patterns from words, phrases, and sentences to produce the matching score. Inspired by the success of convolutional neural networks in image recognition, where neurons can capture many complicated patterns based on the extracted elementary visual patterns such as oriented edges and corners, we propose to model text matching as the problem of image recognition. Firstly, a matching matrix whose entries represent the similarities between words is constructed and viewed as an image. Then a convolutional neural network is utilized to capture rich matching patterns in a layer-by-layer way. We show that by resembling the compositional hierarchies of patterns in image recognition, our model can successfully identify salient signals such as n-gram and n-term matchings. Experimental results demonstrate its superiority over the baselines.\nStorytelling algorithms aim to 'connect the dots' between disparate documents by linking starting and ending documents through a series of intermediate documents. Existing storytelling algorithms are based on notions of coherence and connectivity, and thus the primary way by which users can steer the story construction is via design of suitable similarity functions. We present an alternative approach to storytelling wherein the user can interactively and iteratively provide 'must use' constraints to preferentially support the construction of some stories over others. 
The three innovations in our approach are distance measures based on (inferred) topic distributions, the use of constraints to define sets of linear inequalities over paths, and the introduction of slack and surplus variables to condition the topic distribution to preferentially emphasize desired terms over others. We describe experimental results to illustrate the effectiveness of our interactive storytelling approach over multiple text datasets.\nAlthough RNNs have been shown to be powerful tools for processing sequential data, finding architectures or optimization strategies that allow them to model very long-term dependencies is still an active area of research. In this work, we carefully analyze two synthetic datasets originally outlined in (Hochreiter and Schmidhuber, 1997) which are used to evaluate the ability of RNNs to store information over many time steps. We explicitly construct RNN solutions to these problems, and using these constructions, illuminate both the problems themselves and the way in which RNNs store different types of information in their hidden states. These constructions furthermore explain the success of recent methods that specify unitary initializations or constraints on the transition matrices.\nTo appear in Theory and Practice of Logic Programming (TPLP). In this paper we propose an extension of logic programming (LP) where each default literal derived from the well-founded model is associated with a justification represented as an algebraic expression. This expression contains both causal explanations (in the form of proof graphs built with rule labels) and terms under the scope of negation that stand for conditions that enable or disable the application of causal rules. Using some examples, we discuss how these new conditions, which we respectively call \"enablers\" and \"inhibitors\", are intimately related to default negation and have an essentially different nature from regular cause-effect relations. 
The most important result is a formal comparison to the recent algebraic approaches for justifications in LP: \"Why-not Provenance\" (WnP) and \"Causal Graphs\" (CG). We show that the current approach extends both WnP and CG justifications under the Well-Founded Semantics and, as a byproduct, we also establish a formal relation between these two approaches.\nHuman language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like \"bleed\" and \"punch\" to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated from common topics in our web dataset, like neglect, government, and social media. We show that Empath's data-driven, human validated categories are highly correlated (r=0.906) with similar categories in LIWC.\nStudents in online courses generate large amounts of data that can be used to personalize the learning process and improve quality of education. In this paper, we present the Latent Skill Embedding (LSE), a probabilistic model of students and educational content that can be used to recommend personalized sequences of lessons with the goal of helping students prepare for specific assessments. Akin to collaborative filtering for recommender systems, the algorithm does not require students or content to be described by features, but it learns a representation using access traces. 
We formulate this problem as a regularized maximum-likelihood embedding of students, lessons, and assessments from historical student-content interactions. An empirical evaluation on large-scale data from Knewton, an adaptive learning technology company, shows that this approach predicts assessment results competitively with benchmark models and is able to discriminate between lesson sequences that lead to mastery and failure.\nOnline media offers opportunities to marketers to deliver brand messages to a large audience. Advertising technology platforms enable advertisers to find the proper group of audiences and deliver ad impressions to them in real time. The recent growth of real-time bidding has posed a significant challenge for monitoring such a complicated system. With so many components, we need a reliable system that detects the possible changes in the system and alerts the engineering team. In this paper we describe the mechanism that we invented for recovering the representative metrics and detecting the change in their behavior. We show that this mechanism is able to detect the possible problems in time by describing some incident cases.\nRecent research on deep neural networks has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple DNN architectures that achieve that accuracy level. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a small DNN architecture called SqueezeNet. SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. 
Additionally, with model compression techniques we are able to compress SqueezeNet to less than 0.5MB (510x smaller than AlexNet). The SqueezeNet architecture is available for download here: https://github.com/DeepScale/SqueezeNet\nWe consider a problem of prediction based on opinions elicited from heterogeneous rational agents with private information. Making an accurate prediction with a minimal cost requires a joint design of the incentive mechanism and the prediction algorithm. Such a problem lies at the nexus of statistical learning theory and game theory, and arises in many domains such as consumer surveys and mobile crowdsourcing. In order to elicit heterogeneous agents' private information and incentivize agents with different capabilities to act in the principal's best interest, we design an optimal joint incentive mechanism and prediction algorithm called COPE (COst and Prediction Elicitation), the analysis of which offers several valuable engineering insights. First, when the costs incurred by the agents are linear in the exerted effort, COPE corresponds to a \"crowd contending\" mechanism, where the principal only employs the agent with the highest capability. Second, when the costs are quadratic, COPE corresponds to a \"crowd-sourcing\" mechanism that employs multiple agents with different capabilities at the same time. Numerical simulations show that COPE improves the principal's profit and the network profit significantly (by more than 30% in our simulations) compared to those mechanisms that assume all agents have equal capabilities.\nSome critical open problems of epistemic logics can be investigated in the framework of a quantum computational approach. The basic idea is to interpret sentences - like Alice knows that Bob does not understand that Pi is irrational - as pieces of quantum information (generally represented by density operators of convenient Hilbert spaces). Logical epistemic operators (to understand, to know ...) 
are dealt with as (generally irreversible) quantum operations, which are, in a sense, similar to measurement-procedures. This approach permits us to model some characteristic epistemic processes that concern both human and artificial intelligence. For instance, the operation of \"memorizing and retrieving information\" can be formally represented, in this framework, by using a quantum teleportation phenomenon.\nWhat are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.\nThe ability to know in advance the trend of running process instances, with respect to different features, such as the expected completion time, would allow business managers to counteract undesired situations in a timely manner, in order to prevent losses. Therefore, the ability to accurately predict future features of running business process instances would be a very helpful aid when managing processes, especially under service level agreement constraints. 
However, making such accurate forecasts is not easy: many factors may influence the predicted features. Many approaches have been proposed to cope with this problem, but all of them assume that the underlying process is stationary. However, in real cases this assumption is not always true. In this work we present new methods for predicting the remaining time of running cases. In particular, we propose one method, assuming process stationarity, that outperforms the state of the art, and two other methods that are able to make predictions even with non-stationary processes. We also describe an approach able to predict the full sequence of activities that a running case is going to take. All these methods are extensively evaluated on two real case studies.\nThe amount of data generated in modern society is increasing rapidly. New problems and novel approaches of data capture, storage, analysis and visualization are responsible for the emergence of the Big Data research field. Machine Learning algorithms can be used in Big Data to make better and more accurate inferences. However, because of the challenges Big Data imposes, these algorithms need to be adapted and optimized to specific applications. One important decision made by software engineers is the choice of the language that is used in the implementation of these algorithms. Therefore, this literature survey identifies and describes domain-specific languages and frameworks used for Machine Learning in Big Data. By doing this, software engineers can then make more informed choices and beginners have an overview of the main languages used in this domain.\nMost learning algorithms are not invariant to the scale of the function that is being approximated. We propose to adaptively normalize the targets used in learning. This is useful in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time when we update the policy of behavior. 
Our main motivation is prior work on learning to play Atari games, where the rewards were all clipped to a predetermined range. This clipping facilitates learning across many different games with a single learning algorithm, but a clipped reward function can result in qualitatively different behavior. Using adaptive normalization, we can remove this domain-specific heuristic without diminishing overall performance.\nAlgorithms that generate computer game content require game design knowledge. We present an approach to automatically learn game design knowledge for level design from gameplay videos. We further demonstrate how the acquired design knowledge can be used to generate sections of game levels. Our approach involves parsing video of people playing a game to detect the appearance of patterns of sprites and utilizing machine learning to build a probabilistic model of sprite placement. We show how rich game design information can be automatically parsed from gameplay videos and represented as a set of generative probabilistic models. We use Super Mario Bros. as a proof of concept. We evaluate our approach on a measure of playability and stylistic similarity to the original levels as represented in the gameplay videos.\nWe propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm running through episodes; in each episode, we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. 
At the end of the episode, an optimization oracle returns the optimal memoryless planning policy which maximizes the expected reward based on the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling with respect to the dimensionality of observation and action spaces.\nThe importance of accurate recommender systems has been widely recognized by academia and industry. However, the recommendation quality is still rather low. Recently, a linear sparse and low-rank representation of the user-item matrix has been applied to produce Top-N recommendations. This approach uses the nuclear norm as a convex relaxation for the rank function and has achieved better recommendation accuracy than the state-of-the-art methods. In the past several years, solving rank minimization problems by leveraging nonconvex relaxations has received increasing attention. Some empirical results demonstrate that it can provide a better approximation to original problems than convex relaxation. In this paper, we propose a novel rank approximation to enhance the performance of Top-N recommendation systems, where the approximation error is controllable. Experimental results on real data show that the proposed rank approximation improves the Top-$N$ recommendation accuracy substantially.\nSeveral diseases related to cell proliferation are characterized by the accumulation of somatic DNA changes, with respect to wildtype conditions. Cancer and HIV are two common examples of such diseases, where the mutational load in the cancerous/viral population increases over time. In these cases, selective pressures are often observed along with competition, cooperation and parasitism among distinct cellular clones. 
Recently, we presented a mathematical framework to model these phenomena, based on a combination of Bayesian inference and Suppes' theory of probabilistic causation, depicted in graphical structures dubbed Suppes-Bayes Causal Networks (SBCNs). SBCNs are generative probabilistic graphical models that recapitulate the potential ordering of accumulation of such DNA changes during the progression of the disease. Such models can be inferred from data by exploiting likelihood-based model-selection strategies with regularization. In this paper we discuss the theoretical foundations of our approach and we investigate in depth the influence on the model-selection task of: (i) the poset based on Suppes' theory and (ii) different regularization strategies. Furthermore, we provide an example of application of our framework to HIV genetic data highlighting the valuable insights provided by the inferred.\nSubmodular function maximization finds application in a variety of real-world decision-making problems. However, most existing methods, based on greedy maximization, assume it is computationally feasible to evaluate F, the function being maximized. Unfortunately, in many realistic settings F is too expensive to evaluate exactly even once. We present probably approximately correct greedy maximization, which requires access only to cheap anytime confidence bounds on F and uses them to prune elements. We show that, with high probability, our method returns an approximately optimal set. We propose novel, cheap confidence bounds for conditional entropy, which appears in many common choices of F and for which it is difficult to find unbiased or bounded estimates. 
Finally, results on a real-world dataset from a multi-camera tracking system in a shopping mall demonstrate that our approach performs comparably to existing methods, but at a fraction of the computational cost.\nWe present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speed-up of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization steps to be taken in the same amount of time. We demonstrate the usefulness of our method on applications in supervised image recognition, generative modelling, and deep reinforcement learning.\nThis paper focuses on causal structure estimation from time series data in which measurements are obtained at a coarser timescale than the causal timescale of the underlying system. Previous work has shown that such subsampling can lead to significant errors about the system's causal structure if not properly taken into account. In this paper, we first consider the search for the system timescale causal structures that correspond to a given measurement timescale structure. We provide a constraint satisfaction procedure whose computational performance is several orders of magnitude better than previous approaches. 
We then consider finite-sample data as input, and propose the first constraint optimization approach for recovering the system timescale causal structure. This algorithm optimally recovers from possible conflicts due to statistical errors. More generally, these advances allow for a robust and non-parametric estimation of system timescale causal structures from subsampled time series data.\nOrdinal peer grading has been proposed as a simple and scalable solution for computing reliable information about student performance in massive open online courses. The idea is to outsource the grading task to the students themselves as follows. After the end of an exam, each student is asked to rank a bundle of exam papers by fellow students in terms of quality. An aggregation rule will then combine the individual rankings into a global one that contains all students. We define a broad class of simple aggregation rules and present a theoretical framework for assessing their effectiveness. When statistical information about the grading behaviour of students is available, the framework can be used to compute the optimal rule from this class with respect to a series of performance objectives. For example, a natural rule known as Borda is proved to be optimal when students grade correctly. In addition, we present extensive simulations and a field experiment that validate our theory and show it to be extremely accurate in predicting the performance of aggregation rules even when only rough information about grading behaviour is available.\nWe present an algorithm for building probabilistic rule lists that is two orders of magnitude faster than previous work. Rule list algorithms are competitors for decision tree algorithms. They are associative classifiers, in that they are built from pre-mined association rules. They have a logical structure that is a sequence of IF-THEN rules, identical to a decision list or one-sided decision tree. 
Instead of using greedy splitting and pruning like decision tree algorithms, we fully optimize over rule lists, striking a practical balance between accuracy, interpretability, and computational speed. The algorithm presented here uses a mixture of theoretical bounds (tight enough to have practical implications as a screening or bounding procedure), computational reuse, and highly tuned language libraries to achieve computational efficiency. Currently, for many practical problems, this method achieves better accuracy and sparsity than decision trees; further, in many cases, the computational time is practical and often less than that of decision trees. The result is a probabilistic classifier (which estimates P(y = 1|x) for each x) that optimizes the posterior of a Bayesian hierarchical model over rule lists.\nFollowing the recent trend in explicit neural memory structures, we present a new design of an external memory, wherein memories are stored in a Euclidean key space $\\mathbb R^n$. An LSTM controller performs read and write via specialized read and write heads. It can move a head by either providing a new address in the key space (aka random access) or moving from its previous position via a Lie group action (aka Lie access). In this way, the \"L\" and \"R\" instructions of a traditional Turing Machine are generalized to arbitrary elements of a fixed Lie group action. For this reason, we name this new model the Lie Access Neural Turing Machine, or LANTM. We tested two different configurations of LANTM against an LSTM baseline in several basic experiments. We found the right configuration of LANTM to outperform the baseline in all of our experiments. 
In particular, we trained LANTM on addition of $k$-digit numbers for $2 \\le k \\le 16$, but it was able to generalize almost perfectly to $17 \\le k \\le 32$, all with 2 orders of magnitude fewer parameters than the LSTM baseline.\nOff-policy reinforcement learning has many applications, including learning from demonstration, learning multiple goal-seeking policies in parallel, and representing predictive knowledge. Recently there has been a proliferation of new policy-evaluation algorithms that fill a longstanding algorithmic void in reinforcement learning: combining robustness to off-policy sampling, function approximation, linear complexity, and temporal difference (TD) updates. This paper contains two main contributions. First, we derive two new hybrid TD policy-evaluation algorithms, which fill a gap in this collection of algorithms. Second, we perform an empirical comparison to elicit which of these new linear TD methods should be preferred in different situations, and make concrete suggestions about practical use.\nA key problem in reinforcement learning for control with general function approximators (such as deep neural networks and other nonlinear functions) is that, for many algorithms employed in practice, updates to the policy or $Q$-function may fail to improve performance or, worse, actually cause the policy performance to degrade. Prior work has addressed this for policy iteration by deriving tight policy improvement bounds; by optimizing the lower bound on policy improvement, a better policy is guaranteed. However, existing approaches suffer from bounds that are hard to optimize in practice because they include sup norm terms which cannot be efficiently estimated or differentiated. 
In this work, we derive a better policy improvement bound where the sup norm of the policy divergence has been replaced with an average divergence; this leads to an algorithm, Easy Monotonic Policy Iteration, that generates sequences of policies with guaranteed non-decreasing returns and is easy to implement in a sample-based framework.\nReinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.\nThe validation of any database mining methodology goes through an evaluation process in which the availability of benchmarks is essential. In this paper, we aim to randomly generate relational database benchmarks that allow one to check probabilistic dependencies among the attributes. We are particularly interested in Probabilistic Relational Models (PRMs), which extend Bayesian Networks (BNs) to a relational data mining context and enable effective and robust reasoning over relational data. 
Even though a panoply of works have focused, separately, on the generation of random Bayesian networks and relational databases, no work has been identified that does the same for PRMs. This paper provides an algorithmic approach for generating random PRMs from scratch to fill this gap. The proposed method allows one to generate PRMs as well as synthetic relational data from a randomly generated relational schema and a random set of probabilistic dependencies. This can be of interest not only for machine learning researchers to evaluate their proposals in a common framework, but also for database designers to evaluate the effectiveness of the components of a database management system.\nModel-free reinforcement learning has been successfully applied to a range of challenging problems, and has recently been extended to handle large neural network policies and value functions. However, the sample complexity of model-free algorithms, particularly when using high-dimensional function approximators, tends to limit their applicability to physical systems. In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks. We propose two complementary techniques for improving the efficiency of such algorithms. First, we derive a continuous variant of the Q-learning algorithm, which we call normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods. The NAF representation allows us to apply Q-learning with experience replay to continuous tasks, and substantially improves performance on a set of simulated robotic control tasks. To further improve the efficiency of our approach, we explore the use of learned models for accelerating model-free reinforcement learning. 
We show that iteratively refitted local linear models are especially effective for this, and demonstrate substantially faster learning on domains where such models are applicable.\nHierarchical Classification (HC) is a supervised learning problem where unlabeled instances are classified into a taxonomy of classes. Several methods that utilize the hierarchical structure have been developed to improve the HC performance. However, in most cases the hierarchical structure defined a priori by domain experts is inconsistent; as a consequence, the performance improvement is not noticeable in comparison to flat classification methods. We propose a scalable data-driven filter-based rewiring approach to modify an expert-defined hierarchy. Experimental comparisons of top-down HC with our modified hierarchy on a wide range of datasets show classification performance improvement over the baseline hierarchy (i.e., defined by experts), clustered hierarchy and flattening-based hierarchy modification approaches. In comparison to existing rewiring approaches, our developed method (rewHier) is computationally efficient, enabling it to scale to datasets with large numbers of classes, instances and features. We also show that our modified hierarchy leads to improved classification performance for classes with few training samples in comparison to flat and state-of-the-art HC approaches.\nProbabilistic modeling is iterative. A scientist posits a simple model, fits it to her data, refines it according to her analysis, and repeats. However, fitting complex models to large data is a bottleneck in this process. Deriving algorithms for new models can be both mathematically and computationally challenging, which makes it difficult to efficiently cycle through the steps. To this end, we develop automatic differentiation variational inference (ADVI). Using our method, the scientist only provides a probabilistic model and a dataset, nothing else. 
ADVI automatically derives an efficient variational inference algorithm, freeing the scientist to refine and explore many models. ADVI supports a broad class of models; no conjugacy assumptions are required. We study ADVI across ten different models and apply it to a dataset with millions of observations. ADVI is integrated into Stan, a probabilistic programming system; it is available for immediate use.\nCollaborative Filtering aims at exploiting the feedback of users to provide personalised recommendations. Such algorithms look for latent variables in a large sparse matrix of ratings. They can be enhanced by adding side information to tackle the well-known cold-start problem. While Neural Networks have had tremendous success in image and speech recognition, they have received less attention in Collaborative Filtering. This is all the more surprising given that Neural Networks are able to discover latent variables in large and heterogeneous datasets. In this paper, we introduce a Collaborative Filtering Neural network architecture, called CFN, which computes a non-linear Matrix Factorization from sparse rating inputs and side information. We show experimentally on the MovieLens and Douban datasets that CFN outperforms the state of the art and benefits from side information. We provide an implementation of the algorithm as a reusable plugin for Torch, a popular Neural Network framework.\nThis work targets people identification in video based on the way they walk (i.e. gait). While classical methods typically derive gait signatures from sequences of binary silhouettes, in this work we explore the use of convolutional neural networks (CNN) for learning high-level descriptors from low-level motion features (i.e. optical flow components). We carry out a thorough experimental evaluation of the proposed CNN architecture on the challenging TUM-GAID dataset. 
The experimental results indicate that using spatio-temporal cuboids of optical flow as input data for a CNN yields state-of-the-art results on the gait task with an image resolution eight times lower than the previously reported results (i.e. 80x60 pixels).\nMany real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Hold'em, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.\nWooden blocks are a common toy for infants, allowing them to develop motor skills and gain intuition about the physical behavior of the world. In this paper, we explore the ability of deep feed-forward models to learn such intuitive physics. Using a 3D game engine, we create small towers of wooden blocks whose stability is randomized and render them collapsing (or remaining upright). This data allows us to train large convolutional network models which can accurately predict the outcome, as well as estimate the block trajectories. The models are also able to generalize in two important ways: (i) to new physical scenarios, e.g. towers with an additional block and (ii) to images of real wooden blocks, where they obtain performance comparable to human subjects.\nCloud computing involves complex technical and economic systems and interactions. 
This brings about various challenges, two of which are: (1) debugging and control of computing systems with the help of sandbox experiments, and (2) prediction of the cost of \"spot\" resources for decision making by cloud clients. In this paper, we formalize debugging by counterfactual probabilities and control by post-(soft-)interventional probabilities. We prove that counterfactuals can be approximately calculated from a \"stochastic\" graphical causal model (while they are originally defined only for \"deterministic\" functional causal models), and based on this sketch an approach to address problem (1). To address problem (2), we formalize bidding by post-(soft-)interventional probabilities and present a simple mathematical result on approximate integration of \"incomplete\" conditional probability distributions. We show how this can be used by cloud clients to trade off privacy against predictability of the outcome of their bidding actions in a toy scenario. We report experiments on simulated and real data.\nSo far, different studies have tackled sentiment analysis in several domains such as restaurant and movie reviews. However, this problem has not been studied for scholarly book reviews, which differ in terms of review style and size. In this paper, we propose to combine different features that are fed to supervised classifiers which extract the opinion target expressions and detect their polarities in scholarly book reviews. We construct a labeled corpus for training and evaluating our methods on French book reviews. We also evaluate them on English restaurant reviews in order to measure their robustness across domains and languages. The evaluation shows that our methods are robust enough for English restaurant reviews and French book reviews.\nThe increasing amount of available Linked Data resources is laying the foundations for more advanced Semantic Web applications. 
One of their main limitations, however, remains the generally low level of data quality. In this paper we focus on a measure of quality which is negatively affected by the increase in available resources. We propose a measure of semantic richness of Linked Data concepts and we demonstrate our hypothesis that the more a concept is reused, the less semantically rich it becomes. This is a significant scalability issue, as one of the core aspects of Linked Data is the propagation of semantic information on the Web by reusing common terms. We prove our hypothesis with respect to our measure of semantic richness and we validate our model empirically. Finally, we suggest possible future directions to address this scalability problem.\nConceptual blending is a powerful tool for computational creativity where, for example, the properties of two harmonic spaces may be combined in a consistent manner to produce a novel harmonic space. However, deciding on the importance of property features in the input spaces and evaluating the results of conceptual blending are nontrivial tasks. In the specific case of musical harmony, defining the salient features of chord transitions and evaluating invented harmonic spaces requires deep musicological background knowledge. In this paper, we propose a creative tool that helps musicologists to evaluate and to enhance harmonic innovation. This tool allows a music expert to specify arguments over given transition properties. These arguments are then considered by the system when defining combinations of features in an idiom-blending process. A music expert can assess whether the new harmonic idiom makes musicological sense and re-adjust the arguments (selection of features) to explore alternative blends that can potentially produce better harmonic spaces. 
We conclude with a discussion of future work that would further automate the harmonisation process.\nThe power grid is a complex and vital system that necessitates careful reliability management. Managing the grid is a difficult problem with multiple time scales of decision making and stochastic behavior due to renewable energy generation, variable demand and unplanned outages. Solving this problem in the face of uncertainty requires a new methodology with tractable algorithms. In this work, we introduce a new model for hierarchical decision making in complex systems. We apply reinforcement learning (RL) methods to learn a proxy, i.e., a level of abstraction, for real-time power grid reliability. We devise an algorithm that alternates between slow time-scale policy improvement and fast time-scale value function approximation. We compare our results to prevailing heuristics, and show the strength of our method.\nBuilding Information Modeling (BIM) is a recent construction process based on a 3D model, containing every component related to the construction of the building. Architects, structural engineers, method engineers, and other participants in the building process work on this model through the design-to-construction cycle. The high complexity and the large amount of information included in these models raise several issues, delaying their wide adoption in the industrial world. One of the most important is visualization: professionals have difficulty finding the relevant information for their job. Current solutions suffer from two limitations: the information in BIM models is processed manually, and insignificant information is simply hidden, leading to inconsistencies in the building model. This paper describes a system relying on an ontological representation of the building information to automatically label the building elements. 
Depending on the user's department, the visualization is modified according to these labels by automatically adjusting the colors and image properties based on a saliency model. The proposed saliency model incorporates several adaptations to fit the specificities of architectural images.\nAs datasets capturing human choices grow in richness and scale---particularly in online domains---there is an increasing need for choice models that escape traditional choice-theoretic axioms such as regularity, stochastic transitivity, and Luce's choice axiom. In this work we introduce the Pairwise Choice Markov Chain (PCMC) model of discrete choice, an inferentially tractable model that does not assume any of the above axioms while still satisfying the foundational axiom of uniform expansion, a considerably weaker assumption than Luce's choice axiom. We show that the PCMC model significantly outperforms the Multinomial Logit (MNL) model in prediction tasks on both synthetic and empirical datasets known to exhibit violations of Luce's axiom. Our analysis also synthesizes several recent observations connecting the Multinomial Logit model and Markov chains; the PCMC model retains the Multinomial Logit model as a special case.\nWithout discourse connectives, classifying implicit discourse relations is a challenging task and a bottleneck for building a practical discourse parser. Previous research usually makes use of one kind of discourse framework such as PDTB or RST to improve the classification performance on discourse relations. Actually, under different discourse annotation frameworks, there exist multiple corpora which have internal connections. To exploit the combination of different discourse corpora, we design related discourse classification tasks specific to a corpus, and propose a novel Convolutional Neural Network embedded multi-task learning system to synthesize these tasks by learning both unique and shared representations for each task. 
The experimental results on the PDTB implicit discourse relation classification task demonstrate that our model achieves significant gains over baseline systems.\nRecognizing the activities of daily living plays an important role in healthcare. It is necessary to use an adapted model to simulate human behavior in a domestic space in order to monitor the patient harmoniously and to intervene when necessary. In this paper, we tackle this problem using the hierarchical hidden Markov model for representing and recognizing complex indoor activities. We propose a new grammar, called \"Home By Room Activities Language\", to handle the complexity of human scenarios and to account for abnormal activities.\nWe present a hierarchical reinforcement learning framework that formulates each task in the hierarchy as a special type of Markov decision process for which the Bellman equation is linear and has an analytical solution. Problems of this type, called linearly-solvable MDPs (LMDPs), have interesting properties that can be exploited in a hierarchical setting, such as efficient learning of the optimal value function or task compositionality. The proposed hierarchical approach can also be seen as a novel alternative to solving LMDPs with large state spaces. We derive a hierarchical version of the so-called Z-learning algorithm that learns different tasks simultaneously and show empirically that it significantly outperforms the state-of-the-art learning methods in two classical hierarchical reinforcement learning domains: the taxi domain and an autonomous guided vehicle task.\nDivide and Conquer (DC) is conceptually well suited to high-dimensional optimization by decomposing a problem into multiple small-scale sub-problems. However, appealing performance is seldom observed when the sub-problems are interdependent. 
This paper suggests that the major difficulty of tackling interdependent sub-problems lies in the precise evaluation of a partial solution (to a sub-problem), which can be overwhelmingly costly and thus makes sub-problems non-trivial to conquer. Thus, we propose an approximation approach, named Divide and Approximate Conquer (DAC), which reduces the cost of partial solution evaluation from exponential time to polynomial time. Meanwhile, convergence to the global optimum (of the original problem) is still guaranteed. The effectiveness of DAC is demonstrated empirically on two sets of non-separable high-dimensional problems.\nQuantum walks are a promising framework that can be used to both understand and implement quantum information processing tasks. The quantum stochastic walk is a recently developed framework that combines the concept of a quantum walk with that of a classical random walk, through open system evolution of a quantum system. Quantum stochastic walks have been shown to have applications in fields as far-reaching as artificial intelligence. However, there are significant constraints on the kind of open system evolutions that can be realized in a physical experiment. In this work, we discuss the restrictions on the allowed open system evolution, and the physical assumptions underpinning them. We show that general implementations would require the complete solution of the underlying unitary dynamics, and sophisticated reservoir engineering, thus weakening the benefits of experimental investigations.\nGame balancing is an important part of the (computer) game design process, in which designers adapt a game prototype so that the resulting gameplay is as entertaining as possible. In industry, the evaluation of a game is often based on costly playtests with human players. It is therefore natural to automate this process using surrogate models for the prediction of gameplay and outcome. 
In this paper, the feasibility of automatic balancing using simulation- and deck-based objectives is investigated for the card game top trumps. Additionally, the necessity of a multi-objective approach is asserted by a comparison with the only known (single-objective) method. We apply a multi-objective evolutionary algorithm to obtain decks that optimise objectives, e.g. win rate and average number of tricks, developed to express the fairness and the excitement of a game of top trumps. The results are compared with published top trumps decks using simulation-based objectives. We demonstrate that it is possible to generate decks that are at least as good as published top trumps decks in terms of these objectives. Our results indicate that automatic balancing with the presented approach is feasible even for more complex games such as real-time strategy games.\nThe Maximum Satisfiability (MaxSAT) problem is the problem of finding a truth assignment that maximizes the number of satisfied clauses of a given Boolean formula in Conjunctive Normal Form (CNF). Many exact solvers for MaxSAT have been developed during recent years, and many of them were presented in the well-known SAT conference. Algorithms for MaxSAT generally fall into two categories: (1) branch and bound algorithms and (2) algorithms that use successive calls to a SAT solver (SAT-based), on which this paper focuses. In practical problems, SAT-based algorithms have been shown to be more efficient. This paper provides an experimental investigation to compare the performance of recent SAT-based and branch and bound algorithms on the benchmarks of the MaxSAT Evaluations.\nAutomatically generating a natural language description of an image has recently attracted interest both because of its importance in practical applications and because it connects two major artificial intelligence fields: computer vision and natural language processing. 
Existing approaches are either top-down, which start from a gist of an image and convert it into words, or bottom-up, which come up with words describing various aspects of an image and then combine them. In this paper, we propose a new algorithm that combines both approaches through a model of semantic attention. Our algorithm learns to selectively attend to semantic concept proposals and fuse them into hidden states and outputs of recurrent neural networks. The selection and fusion form a feedback connecting the top-down and bottom-up computation. We evaluate our algorithm on two public benchmarks: Microsoft COCO and Flickr30K. Experimental results show that our algorithm significantly outperforms the state-of-the-art approaches consistently across different evaluation metrics.\nSingle Index Models (SIMs) are simple yet flexible semi-parametric models for machine learning, where the response variable is modeled as a monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights and the nonlinear function that relates features to observations. While methods have been described to learn SIMs in the low dimensional regime, a method that can efficiently learn SIMs in high dimensions, and under general structural assumptions, has not been forthcoming. In this paper, we propose computationally efficient algorithms for SIM inference in high dimensions with structural constraints. Our general approach specializes to sparsity, group sparsity, and low-rank assumptions among others. Experiments show that the proposed method enjoys superior predictive performance when compared to generalized linear models, and achieves results comparable to or better than single layer feedforward neural networks with significantly less computational cost.\nAs most database users cannot precisely express their information needs, it is challenging for database management systems to understand them. 
We propose a novel formal framework for representing and understanding information needs in database querying and exploration. Our framework considers querying as a collaboration between the user and the database management system to establish a mutual language for representing information needs. We formalize this collaboration as a signaling game, where each mutual language is an equilibrium for the game. A query interface is more effective if it establishes a less ambiguous mutual language faster. We discuss some equilibria, strategies, and convergence in this game. In particular, we propose a reinforcement learning mechanism and analyze it within our framework. We prove that this adaptation mechanism for the query interface improves, stochastically speaking, the effectiveness of answering queries, and converges almost surely. We extend our results to the case in which the user also modifies her strategy during the interaction.\nHigh-dimensional observations and complex real-world dynamics present major challenges in reinforcement learning for both function approximation and exploration. We address both of these challenges with two complementary techniques. First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals. Second, we propose an exploration strategy inspired by the principles of state abstraction and information acquisition under uncertainty. We demonstrate the empirical effectiveness of these techniques, first, as a preliminary check, on two standard tasks (Blackjack and $n$-Chain), and then on two much larger and more realistic tasks with high-dimensional observation spaces. Specifically, we introduce two benchmarks built within the game Minecraft where the observations are pixel arrays of the agent's visual field. 
A combination of our two algorithmic techniques performs competitively on the standard reinforcement-learning tasks while consistently and substantially outperforming baselines on the two tasks with high-dimensional observation spaces. The new function approximator, exploration strategy, and evaluation benchmarks are each of independent interest in the pursuit of reinforcement-learning methods that scale to real-world domains.\nMany Collaborative Filtering (CF) algorithms are item-based in the sense that they analyze item-item relations in order to produce item similarities. Recently, several works in the field of Natural Language Processing (NLP) suggested learning a latent representation of words using neural embedding algorithms. Among them, the Skip-gram with Negative Sampling (SGNS), also known as word2vec, was shown to provide state-of-the-art results on various linguistics tasks. In this paper, we show that item-based CF can be cast in the same framework of neural word embedding. Inspired by SGNS, we describe a method we name item2vec for item-based CF that produces embeddings for items in a latent space. The method is capable of inferring item-item relations even when user information is not available. We present experimental results that demonstrate the effectiveness of the item2vec method and show it is competitive with SVD.\nLearning the influence structure of multiple time series data is of great interest to many disciplines. This paper studies the problem of recovering the causal structure in a network of multivariate linear Hawkes processes. In such processes, the occurrence of an event in one process affects the probability of occurrence of new events in some other processes. Thus, a natural notion of causality exists between such processes captured by the support of the excitation matrix. 
We show that the resulting causal influence network is equivalent to the Directed Information graph (DIG) of the processes, which encodes the causal factorization of the joint distribution of the processes. Furthermore, we present an algorithm for learning the support of excitation matrix (or equivalently the DIG). The performance of the algorithm is evaluated on synthesized multivariate Hawkes networks as well as a stock market and MemeTracker real-world dataset.\nVery large commonsense knowledge bases (KBs) often have thousands to millions of axioms, of which relatively few are relevant for answering any given query. A large number of irrelevant axioms can easily overwhelm resolution-based theorem provers. Therefore, methods that help the reasoner identify useful inference paths form an essential part of large-scale reasoning systems. In this paper, we describe two ordering heuristics for optimization of reasoning in such systems. First, we discuss how decision trees can be used to select inference steps that are more likely to succeed. Second, we identify a small set of problem instance features that suffice to guide searches away from intractable regions of the search space. We show the efficacy of these techniques via experiments on thousands of queries from the Cyc KB. Results show that these methods lead to an order of magnitude reduction in inference time.\nDomain adaptation algorithms are useful when the distributions of the training and the test data are different. In this paper, we focus on the problem of instrumental variation and time-varying drift in the field of sensors and measurement, which can be viewed as discrete and continuous distributional change in the feature space. We propose maximum independence domain adaptation (MIDA) and semi-supervised MIDA (SMIDA) to address this problem. Domain features are first defined to describe the background information of a sample, such as the device label and acquisition time. 
Then, MIDA learns a subspace which has maximum independence with the domain features, so as to reduce the inter-domain discrepancy in distributions. A feature augmentation strategy is also designed to project samples according to their backgrounds so as to improve the adaptation. The proposed algorithms are flexible and fast. Their effectiveness is verified by experiments on synthetic datasets and four real-world ones on sensors, measurement, and computer vision. They can greatly enhance the practicability of sensor systems, as well as extend the application scope of existing domain adaptation algorithms by uniformly handling different kinds of distributional change.\nSequential decision making under uncertainty is studied in a mixed observability domain. The goal is to maximize the amount of information obtained on a partially observable stochastic process under constraints imposed by a fully observable internal state. An upper bound for the optimal value function is derived by relaxing constraints. We identify conditions under which the relaxed problem is a multi-armed bandit whose optimal policy is easily computable. The upper bound is applied to prune the search space in the original problem, and the effect on solution quality is assessed via simulation experiments. Empirical results show effective pruning of the search space in a target monitoring domain.\nMany deep Convolutional Neural Networks (CNN) make incorrect predictions on adversarial samples obtained by imperceptible perturbations of clean samples. We hypothesize that this is caused by a failure to suppress unusual signals within network layers. As remedy we propose the use of Symmetric Activation Functions (SAF) in non-linear signal transducer units. These units suppress signals of exceptional magnitude. We prove that SAF networks can perform classification tasks to arbitrary precision in a simplified situation. 
In practice, rather than use SAFs alone, we add them into CNNs to improve their robustness. The modified CNNs can be easily trained using popular strategies with a moderate training load. Our experiments on MNIST and CIFAR-10 show that the modified CNNs perform similarly to plain ones on clean samples, and are remarkably more robust against adversarial and nonsense samples.\nBoolean satisfiability (SAT) has an extensive application domain in computer science, especially in electronic design automation applications. Circuit synthesis, optimization, and verification problems can be solved by transforming original problems to SAT problems. However, the SAT problem is NP-complete, so no efficient algorithm for solving it is known. Therefore, an efficient SAT solver to enhance the performance is always desired. We propose a hardware acceleration method for SAT problems. By surveying the properties of SAT problems and the decoding of low-density parity-check (LDPC) codes, a special class of error-correcting codes, we discover that both of them are constraint satisfaction problems. The belief propagation algorithm has been successfully applied to the decoding of LDPC codes, and the corresponding decoder hardware designs are extensively studied. Therefore, we propose a belief propagation based algorithm to solve SAT problems. With this algorithm, the SAT solver can be accelerated by hardware. A software simulator is implemented to verify the proposed algorithm and the performance improvement is estimated. Our experimental results show that time complexity does not increase with the size of SAT problems and the proposed method can achieve at least 30x speedup compared to MiniSat.\nThis paper proposes a new method for an optimized mapping of temporal variables, describing temporal stream data, into the recently proposed NeuCube spiking neural network architecture. 
This optimized mapping extends the use of the NeuCube, which was initially designed for spatiotemporal brain data, to work on arbitrary stream data and to achieve a better accuracy of temporal pattern recognition, a better and earlier event prediction and a better understanding of complex temporal stream data through visualization of the NeuCube connectivity. The effect of the new mapping is demonstrated on three benchmark problems. The first one is early prediction of patient sleep stage event from temporal physiological data. The second one is pattern recognition of dynamic temporal patterns of traffic in the Bay Area of California and the last one is the Challenge 2012 contest data set. In all cases the use of the proposed mapping leads to an improved accuracy of pattern recognition and event prediction and a better understanding of the data when compared to traditional machine learning techniques or spiking neural network reservoirs with arbitrary mapping of the variables.\nWhile many models are proposed for detecting the occurrence of significant events in financial systems, the task of providing qualitative detail on the developments is not usually as well automated. We present a deep learning approach for detecting relevant discussion in text and extracting natural language descriptions of events. Supervised by only a small set of event information, comprising entity names and dates, the model is leveraged by unsupervised learning of semantic vector representations on extensive text data. We demonstrate applicability to the study of financial risk based on news (6.6M articles), particularly bank distress and government interventions (243 events), where indices can signal the level of bank-stress-related reporting at the entity level, or aggregated at national or European level, while being coupled with explanations. 
Thus, we exemplify how text, as timely, widely available and descriptive data, can serve as a useful complementary source of information for financial and systemic risk analytics.\nWe review the task of Sentence Pair Scoring, popular in the literature in various forms - viewed as Answer Sentence Selection, Semantic Text Scoring, Next Utterance Ranking, Recognizing Textual Entailment, Paraphrasing or e.g. a component of Memory Networks.   We argue that all such tasks are similar from the model perspective and propose new baselines by comparing the performance of common IR metrics and popular convolutional, recurrent and attention-based neural models across many Sentence Pair Scoring tasks and datasets. We discuss the problem of evaluating randomized models, propose a statistically grounded methodology, and attempt to improve comparisons by releasing new datasets that are much harder than some of the currently used well explored benchmarks. We introduce a unified open source software framework with easily pluggable models and tasks, which enables us to experiment with multi-task reusability of trained sentence model. We set a new state-of-art in performance on the Ubuntu Dialogue dataset.\nProbabilistic inference algorithms such as Sequential Monte Carlo (SMC) provide powerful tools for constraining procedural models in computer graphics, but they require many samples to produce desirable results. In this paper, we show how to create procedural models which learn how to satisfy constraints. We augment procedural models with neural networks which control how the model makes random choices based on the output it has generated thus far. We call such models neurally-guided procedural models. As a pre-computation, we train these models to maximize the likelihood of example outputs generated via SMC. They are then used as efficient SMC importance samplers, generating high-quality results with very few samples. 
We evaluate our method on L-system-like models with image-based constraints. Given a desired quality threshold, neurally-guided models can generate satisfactory results up to 10x faster than unguided models.\nAs the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design.\nThe subpath planning problem is a branch of the path planning problem, which has widespread applications in automated manufacturing processes as well as vehicle and robot navigation. The problem is to find the shortest path or tour that traverses a given set of subpaths. The current approaches for dealing with the subpath planning problem are all based on meta-heuristic approaches. It is well-known that meta-heuristic based approaches have several deficiencies. To address them, we propose a novel approximation algorithm in the O(n^3) time complexity class, which is guaranteed to solve any subpath planning problem instance with a fixed ratio bound of 2. 
In addition to formal proofs of these claims, we present an empirical evaluation showing that our approximation method performs much better than a state-of-the-art method in both result quality and execution time.\nIn many scientific and engineering applications, we are tasked with the optimisation of an expensive to evaluate black box function $f$. Traditional settings for this problem assume just the availability of this single function. However, in many cases, cheap approximations to $f$ may be obtainable. For example, the expensive real world behaviour of a robot can be approximated by a cheap computer simulation. We can use these approximations to eliminate low function value regions cheaply and use the expensive evaluations of $f$ in a small but promising region and speedily identify the optimum. We formalise this task as a \\emph{multi-fidelity} bandit problem where the target function and its approximations are sampled from a Gaussian process. We develop MF-GP-UCB, a novel method based on upper confidence bound techniques. In our theoretical analysis we demonstrate that it exhibits precisely the above behaviour, and achieves better regret than strategies which ignore multi-fidelity information. Empirically, MF-GP-UCB outperforms such naive strategies and other multi-fidelity methods on several synthetic and real experiments.\nCombining deep neural networks with structured logic rules is desirable to harness flexibility and reduce uninterpretability of the neural models. We propose a general framework capable of enhancing various types of neural networks (e.g., CNNs and RNNs) with declarative first-order logic rules. Specifically, we develop an iterative distillation method that transfers the structured information of logic rules into the weights of neural networks. We deploy the framework on a CNN for sentiment analysis, and an RNN for named entity recognition. 
With a few highly intuitive rules, we obtain substantial improvements and achieve state-of-the-art or comparable results to previous best-performing systems.\nWe address an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments in the input sequence are selectively replicated in the output sequence. A similar phenomenon is observable in human language communication. For example, humans tend to repeat entity names or even long phrases in conversation. The challenge with regard to copying in Seq2Seq is that new machinery is needed to decide when to perform the operation. In this paper, we incorporate copying into neural network-based Seq2Seq learning and propose a new model called CopyNet with encoder-decoder structure. CopyNet can nicely integrate the regular way of word generation in the decoder with the new copying mechanism which can choose sub-sequences in the input sequence and put them at proper places in the output sequence. Our empirical study on both synthetic data sets and real-world data sets demonstrates the efficacy of CopyNet. For example, CopyNet can outperform regular RNN-based models by remarkable margins on text summarization tasks.\nWe consider a multi-neighborhood local search algorithm with a large number of possible neighborhoods. Each neighborhood is accompanied by a weight value which represents the probability of being chosen at each iteration. These weights are fixed before the algorithm runs, and are considered as parameters of the algorithm. Given a set of instances, off-line tuning of the algorithm's parameters can be done by automated algorithm configuration tools (e.g., SMAC). However, the large number of neighborhoods can make the tuning expensive and difficult even when the number of parameters has been reduced by some intuition. 
In this work, we propose a systematic method to characterize each neighborhood's behaviours, representing them as a feature vector, and using cluster analysis to form similar groups of neighborhoods. The novelty of our characterization method is the ability of reflecting changes of behaviours according to hardness of different solution quality regions. We show that using neighborhood clusters instead of individual neighborhoods helps to reduce the parameter configuration space without misleading the search of the tuning procedure. Moreover, this method is problem-independent and potentially can be applied in similar contexts.\nMost recent work focused on affect from facial expressions, and not as much on body. This work focuses on body affect analysis. Affect does not occur in isolation. Humans usually couple affect with an action in natural interactions; for example, a person could be talking and smiling. Recognizing body affect in sequences requires efficient algorithms to capture both the micro movements that differentiate between happy and sad and the macro variations between different actions. We depart from traditional approaches for time-series data analytics by proposing a multi-task learning model that learns a shared representation that is well-suited for action-affect classification as well as generation. For this paper we choose Conditional Restricted Boltzmann Machines to be our building block. We propose a new model that enhances the CRBM model with a factored multi-task component to become Multi-Task Conditional Restricted Boltzmann Machines (MTCRBMs). We evaluate our approach on two publicly available datasets, the Body Affect dataset and the Tower Game dataset, and show superior classification performance improvement over the state-of-the-art, as well as the generative abilities of our model.\nOver the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances. 
However, to date, there are no large-scale question-answer corpora available. In this paper we present the 30M Factoid Question-Answer Corpus, an enormous question answer pair corpus produced by applying a novel neural network architecture on the knowledge base Freebase to transduce facts into natural language questions. The produced question answer pairs are evaluated both by human evaluators and using automatic evaluation metrics, including well-established machine translation and sentence similarity metrics. Across all evaluation criteria the question-generation model outperforms the competing template-based baseline. Furthermore, when presented to human evaluators, the generated questions appear comparable in quality to real human-generated questions.\nReal-world problems are very difficult to optimize. However, many researchers have been solving benchmark problems that have been extensively investigated over the last decades even if they have very few direct applications. The Traveling Thief Problem (TTP) is an NP-hard optimization problem that aims to provide a more realistic model. TTP specifically targets routing problems under packing/loading constraints which can be found in supply chain management and transportation. In this paper, TTP is presented and formulated mathematically. A combined local search algorithm is proposed and compared with Random Local Search (RLS) and Evolutionary Algorithm (EA). The obtained results are quite promising since new, better solutions were found.\nUnlike traditional programs (such as operating systems or word processors) which have large amounts of code, machine learning tasks use programs with relatively small amounts of code (written in machine learning libraries), but voluminous amounts of data. Just like developers of traditional programs debug errors in their code, developers of machine learning tasks debug and fix errors in their data. 
However, algorithms and tools for debugging and fixing errors in data are less common, when compared to their counterparts for detecting and fixing errors in code. In this paper, we consider classification tasks where errors in training data lead to misclassifications in test points, and propose an automated method to find the root causes of such misclassifications. Our root cause analysis is based on Pearl's theory of causation, and uses Pearl's PS (Probability of Sufficiency) as a scoring metric. Our implementation, Psi, encodes the computation of PS as a probabilistic program, and uses recent work on probabilistic programs and transformations on probabilistic programs (along with gray-box models of machine learning algorithms) to efficiently compute PS. Psi is able to identify root causes of data errors in interesting data sets.\nDiagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little attention. In this paper, we study the problem of diagram interpretation and reasoning, the challenging task of identifying the structure of a diagram and the semantics of its constituents and their relationships. We introduce Diagram Parse Graphs (DPG) as our representation to model the structure of diagrams. We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering. We devise an LSTM-based method for syntactic parsing of diagrams and introduce a DPG-based attention model for diagram question answering. We compile a new dataset of diagrams with exhaustive annotations of constituents and relationships for over 5,000 diagrams and 15,000 questions and answers. 
Our results show the significance of our models for syntactic parsing and question answering in diagrams using DPGs.\nLoad disaggregation based on aided linear integer programming (ALIP) is proposed. We start with a conventional linear integer programming (IP) based disaggregation and enhance it in several ways. The enhancements include additional constraints, correction based on a state diagram, median filtering, and linear programming-based refinement. With the aid of these enhancements, the performance of IP-based disaggregation is significantly improved. The proposed ALIP system relies only on the instantaneous load samples instead of waveform signatures, and hence does not crucially depend on high sampling frequency. Experimental results show that the proposed ALIP system performs better than the conventional IP-based load disaggregation system.\nWe present an image-conditional image generation model. The model transfers an input domain to a target domain at the semantic level, and generates the target image at the pixel level. To generate realistic target images, we employ the real/fake-discriminator as in Generative Adversarial Nets, but also introduce a novel domain-discriminator to make the generated image relevant to the input image. We verify our model through a challenging task of generating a piece of clothing from an input image of a dressed person. We present a high-quality clothing dataset containing the two domains, and succeed in demonstrating decent results.\nWhat makes images similar? To measure the similarity between images, they are typically embedded in a feature-vector space, in which their distances preserve the relative dissimilarity. However, when learning such similarity embeddings the simplifying assumption is commonly made that images are only compared to one unique measure of similarity. A main reason for this is that contradicting notions of similarities cannot be captured in a single space. 
To address this shortcoming, we propose Conditional Similarity Networks (CSNs) that learn embeddings differentiated into semantically distinct subspaces that capture the different notions of similarities. CSNs jointly learn a disentangled embedding where features for different similarities are encoded in separate dimensions as well as masks that select and reweight relevant dimensions to induce a subspace that encodes a specific similarity notion. We show that our approach learns interpretable image representations with visually relevant semantic subspaces. Further, when evaluating on triplet questions from multiple similarity notions our model even outperforms the accuracy obtained by training individual specialized networks for each notion separately.\nWe investigate evaluation metrics for dialogue response generation systems where supervised labels, such as task completion, are not available. Recent works in response generation have adopted metrics from machine translation to compare a model's generated response to a single target response. We show that these metrics correlate very weakly with human judgements in the non-technical Twitter domain, and not at all in the technical Ubuntu domain. We provide quantitative and qualitative results highlighting specific weaknesses in existing metrics, and provide recommendations for future development of better automatic evaluation metrics for dialogue systems.\nUnderstanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. 
We address this task by extending a vision model which determines if a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing to disambiguate sentences in a unified fashion across the different ambiguity types.\nClearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text; contemporary vision-language models can describe image content but fail to take into account class-discriminative image aspects which justify visual predictions. We propose a new model that focuses on the discriminating properties of the visible object, jointly predicts a class label, and explains why the predicted label is appropriate for the image. We propose a novel loss function based on sampling and reinforcement learning that learns to generate sentences that realize a global sentence property, such as class specificity. Our results on a fine-grained bird species classification dataset show that our model is able to generate explanations which are not only consistent with an image but also more discriminative than descriptions produced by existing captioning methods.\nIn this paper, we present an approach for learning a visual representation from the raw spatiotemporal signals in videos. Our representation is learned without supervision from semantic labels. We formulate our method as an unsupervised sequential verification task, i.e., we determine whether a sequence of frames from a video is in the correct temporal order. With this simple task and no semantic labels, we learn a powerful visual representation using a Convolutional Neural Network (CNN). The representation contains complementary information to that learned from supervised image datasets like ImageNet. 
Qualitative results show that our method captures information that is temporally varying, such as human pose. When used as pre-training for action recognition, our method gives significant gains over learning without external data on benchmark datasets like UCF101 and HMDB51. To demonstrate its sensitivity to human pose, we show results for pose estimation on the FLIC and MPII datasets that are competitive with, or better than, approaches using significantly more supervision. Our method can be combined with supervised representations to provide an additional boost in accuracy.\nCOCO is a platform for Comparing Continuous Optimizers in a black-box setting. It aims at automating the tedious and repetitive task of benchmarking numerical optimization algorithms to the greatest possible extent. We present the rationale behind the development of the platform as a general proposition for a guideline towards better benchmarking. We detail underlying fundamental concepts of COCO such as its definition of a problem, the idea of instances, the relevance of target values, and runtime as the central performance measure. Finally, we give a quick overview of the basic code structure and the available test suites.\nJoint state and parameter estimation is a core problem for dynamic Bayesian networks. Although modern probabilistic inference toolkits make it relatively easy to specify large and practically relevant probabilistic models, the silver bullet---an efficient and general online inference algorithm for such problems---remains elusive, forcing users to write special-purpose code for each application. We propose a novel blackbox algorithm -- a hybrid of particle filtering for state variables and assumed density filtering for parameter variables. It has the following advantages: (a) it is efficient due to its online nature, and (b) it is applicable to both discrete and continuous parameter spaces. 
On a variety of toy and real models, our system is able to generate more accurate results within a fixed computation budget. This preliminary evidence indicates that the proposed approach is likely to be of practical use.\nWe study the robustness of active learning (AL) algorithms against prior misspecification: whether an algorithm achieves similar performance using a perturbed prior as compared to using the true prior. In both the average and worst cases of the maximum coverage setting, we prove that all $\\alpha$-approximate algorithms are robust (i.e., near $\\alpha$-approximate) if the utility is Lipschitz continuous in the prior. We further show that robustness may not be achieved if the utility is non-Lipschitz. This suggests we should use a Lipschitz utility for AL if robustness is required. For the minimum cost setting, we can also obtain a robustness result for approximate AL algorithms. Our results imply that many commonly used AL algorithms are robust against perturbed priors. We then propose the use of a mixture prior to alleviate the problem of prior misspecification. We analyze the robustness of the uniform mixture prior and show experimentally that it performs reasonably well in practice.\nSince the advent of computers, many tasks which required humans to spend a lot of time and energy have been trivialized by the computers' ability to perform repetitive tasks extremely quickly. Playing chess is one such task. It was one of the first games which was `solved' using AI. With the advent of deep learning, chess playing agents can surpass human ability with relative ease. However algorithms using deep learning must learn millions of parameters. This work looks at the game of chess through the lens of genetic algorithms. We train a genetic player from scratch using only a handful of learnable parameters. 
We use Multi-Niche Crowding to optimize positional Value Tables (PVTs) which are used extensively in chess engines to evaluate the goodness of a position. With a very simple setup and after only 1000 generations of evolution, the player reaches the level of an International Master.\nIterated applications of belief change operators are essential for different scenarios such as that of ontology evolution where new information is not presented at once but only in piecemeal fashion within a sequence. I discuss iterated applications of so called reinterpretation operators that trace conflicts between ontologies back to the ambiguous of symbols and that provide conflict resolution strategies with bridging axioms. The discussion centers on adaptations of the classical iteration postulates according to Darwiche and Pearl. The main result of the paper is that reinterpretation operators fulfill the postulates for sequences containing only atomic triggers. For complex triggers, a fulfillment is not guaranteed and indeed there are different reasons for the different postulates why they should not be fulfilled in the particular scenario of ontology revision with well developed ontologies.\nNeural network based approaches for sentence relation modeling automatically generate hidden matching features from raw sentence pairs. However, the quality of matching feature representation may not be satisfied due to complex semantic relations such as entailment or contradiction. To address this challenge, we propose a new deep neural network architecture that jointly leverage pre-trained word embedding and auxiliary character embedding to learn sentence meanings. The two kinds of word sequence representations as inputs into multi-layer bidirectional LSTM to learn enhanced sentence representation. After that, we construct matching features followed by another temporal CNN to learn high-level hidden matching feature representations. 
Experimental results demonstrate that our approach consistently outperforms the existing methods on standard evaluation datasets.\nBelief revision has been studied mainly with respect to background logics that are monotonic in character. In this paper we study belief revision when the underlying logic is non-monotonic instead--an inherently interesting problem that is under explored. In particular, we will focus on the revision of a body of beliefs that is represented as a logic program under the answer set semantics, while the new information is also similarly represented as a logic program. Our approach is driven by the observation that unlike in a monotonic setting where, when necessary, consistency in a revised body of beliefs is maintained by jettisoning some old beliefs, in a non-monotonic setting consistency can be restored by adding new beliefs as well. We will define a syntactic revision function and subsequently provide representation theorem for characterising it.\nDung's abstract argumentation theory is a widely used formalism to model conflicting information and to draw conclusions in such situations. Hereby, the knowledge is represented by so-called argumentation frameworks (AFs) and the reasoning is done via semantics extracting acceptable sets. All reasonable semantics are based on the notion of conflict-freeness which means that arguments are only jointly acceptable when they are not linked within the AF. In this paper, we study the question which information on top of conflict-free sets is needed to compute extensions of a semantics at hand. We introduce a hierarchy of so-called verification classes specifying the required amount of information. We show that well-known standard semantics are exactly verifiable through a certain such class. 
Our framework also gives a means to study semantics lying inbetween known semantics, thus contributing to a more abstract understanding of the different features argumentation semantics offer.\nUnderstanding the behavior of belief change operators for fragments of classical logic has received increasing interest over the last years. Results in this direction are mainly concerned with adapting representation theorems. However, fragment-driven belief change also leads to novel research questions. In this paper we propose the concept of belief distribution, which can be understood as the reverse task of merging. More specifically, we are interested in the following question: given an arbitrary knowledge base $K$ and some merging operator $\\Delta$, can we find a profile $E$ and a constraint $\\mu$, both from a given fragment of classical logic, such that $\\Delta_\\mu(E)$ yields a result equivalent to $K$? In other words, we are interested in seeing if $K$ can be distributed into knowledge bases of simpler structure, such that the task of merging allows for a reconstruction of the original knowledge. Our initial results show that merging based on drastic distance allows for an easy distribution of knowledge, while the power of distribution for operators based on Hamming distance relies heavily on the fragment of choice.\nRealizability for knowledge representation formalisms studies the following question: given a semantics and a set of interpretations, is there a knowledge base whose semantics coincides exactly with the given interpretation set? We introduce a general framework for analyzing realizability in abstract dialectical frameworks (ADFs) and various of its subclasses. In particular, the framework applies to Dung argumentation frameworks, SETAFs by Nielsen and Parsons, and bipolar ADFs. We present a uniform characterization method for the admissible, complete, preferred and model/stable semantics. 
We employ this method to devise an algorithm that decides realizability for the mentioned formalisms and semantics; moreover the algorithm allows for constructing a desired knowledge base whenever one exists. The algorithm is built in a modular way and thus easily extensible to new formalisms and semantics. We have also implemented our approach in answer set programming, and used the implementation to obtain several novel results on the relative expressiveness of the abovementioned formalisms.\nNatural language correction has the potential to help language learners improve their writing skills. While approaches with separate classifiers for different error types have high precision, they do not flexibly handle errors such as redundancy or non-idiomatic phrasing. On the other hand, word and phrase-based machine translation methods are not designed to cope with orthographic errors, and have recently been outpaced by neural models. Motivated by these issues, we present a neural network-based approach to language correction. The core component of our method is an encoder-decoder recurrent neural network with an attention mechanism. By operating at the character level, the network avoids the problem of out-of-vocabulary words. We illustrate the flexibility of our approach on dataset of noisy, user-generated text collected from an English learner forum. When combined with a language model, our method achieves a state-of-the-art $F_{0.5}$-score on the CoNLL 2014 Shared Task. We further demonstrate that training the network on additional data with synthesized errors can improve performance.\nThe League Championship Algorithm (LCA) is sport-inspired optimization algorithm that was introduced by Ali Husseinzadeh Kashan in the year 2009. It has since drawn enormous interest among the researchers because of its potential efficiency in solving many optimization problems and real-world applications. 
The LCA has also shown great potentials in solving non-deterministic polynomial time (NP-complete) problems. This survey presents a brief synopsis of the LCA literatures in peer-reviewed journals, conferences and book chapters. These research articles are then categorized according to indexing in the major academic databases (Web of Science, Scopus, IEEE Xplore and the Google Scholar). The analysis was also done to explore the prospects and the challenges of the algorithm and its acceptability among researchers. This systematic categorization can be used as a basis for future studies.\nIn this paper, we study novel neural network structures to better model long term dependency in sequential data. We propose to use more memory units to keep track of more preceding states in recurrent neural networks (RNNs), which are all recurrently fed to the hidden layers as feedback through different weighted paths. By extending the popular recurrent structure in RNNs, we provide the models with better short-term memory mechanism to learn long term dependency in sequences. Analogous to digital filters in signal processing, we call these structures as higher order RNNs (HORNNs). Similar to RNNs, HORNNs can also be learned using the back-propagation through time method. HORNNs are generally applicable to a variety of sequence modelling tasks. In this work, we have examined HORNNs for the language modeling task using two popular data sets, namely the Penn Treebank (PTB) and English text8 data sets. Experimental results have shown that the proposed HORNNs yield the state-of-the-art performance on both data sets, significantly outperforming the regular RNNs as well as the popular LSTMs.\nNovelty detection in news events has long been a difficult problem. 
A number of models performed well on specific data streams but certain issues are far from being solved, particularly in large data streams from the WWW where unpredictability of new terms requires adaptation in the vector space model. We present a novel event detection system based on the Incremental Term Frequency-Inverse Document Frequency (TF-IDF) weighting incorporated with Locality Sensitive Hashing (LSH). Our system could efficiently and effectively adapt to the changes within the data streams of any new terms with continual updates to the vector space model. Regarding miss probability, our proposed novelty detection framework outperforms a recognised baseline system by approximately 16% when evaluating a benchmark dataset from Google News.\nSparse representation has been widely studied in visual tracking, which has shown promising tracking performance. Despite a lot of progress, the visual tracking problem is still a challenging task due to appearance variations over time. In this paper, we propose a novel sparse tracking algorithm that well addresses temporal appearance changes, by enforcing template representability and temporal consistency (TRAC). By modeling temporal consistency, our algorithm addresses the issue of drifting away from a tracking target. By exploring the templates' long-term-short-term representability, the proposed method adaptively updates the dictionary using the most descriptive templates, which significantly improves the robustness to target appearance changes. We compare our TRAC algorithm against the state-of-the-art approaches on 12 challenging benchmark image sequences. Both qualitative and quantitative results demonstrate that our algorithm significantly outperforms previous state-of-the-art trackers.\nCurrent learning algorithms face many difficulties in learning simple patterns and using them to learn more complex ones. They also require more examples than humans do to learn the same pattern, assuming no prior knowledge. 
In this paper, a new learning framework is introduced that is called common-description learning (CDL). This framework has been tested on 32 small multi-task datasets, and the results show that it was able to learn complex algorithms from a few number of examples. The final model is perfectly interpretable and its depth depends on the question. What is meant by depth here is that whenever needed, the model learns to break down the problem into simpler subproblems and solves them using previously learned models. Finally, we explain the capabilities of our framework in discovering complex relations in data and how it can help in improving language understanding in machines.\nWe consider abstract-argumentation-theoretic coalition formability in this work. Taking a model from political alliance among political parties, we will contemplate profitability, and then formability, of a coalition. As is commonly understood, a group forms a coalition with another group for a greater good, the goodness measured against some criteria. As is also commonly understood, however, a coalition may deliver benefits to a group X at the sacrifice of something that X was able to do before coalition formation, which X may be no longer able to do under the coalition. Use of the typical conflict-free sets of arguments is not very fitting for accommodating this aspect of coalition, which prompts us to turn to a weaker notion, conflict-eliminability, as a property that a set of arguments should primarily satisfy. We require numerical quantification of attack strengths as well as of argument strengths for its characterisation. 
We will first analyse semantics of profitability of a given conflict-eliminable set forming a coalition with another conflict-eliminable set, and will then provide four coalition formability semantics, each of which formalises certain utility postulate(s) taking the coalition profitability into account.\nProbabilistic Boolean networks (PBNs) is an important mathematical framework widely used for modelling and analysing biological systems. PBNs are suited for modelling large biological systems, which more and more often arise in systems biology. However, the large system size poses a~significant challenge to the analysis of PBNs, in particular, to the crucial analysis of their steady-state behaviour. Numerical methods for performing steady-state analyses suffer from the state-space explosion problem, which makes the utilisation of statistical methods the only viable approach. However, such methods require long simulations of PBNs, rendering the simulation speed a crucial efficiency factor. For large PBNs and high estimation precision requirements, a slow simulation speed becomes an obstacle. In this paper, we propose a structure-based method for fast simulation of PBNs. This method first performs a network reduction operation and then divides nodes into groups for parallel simulation. Experimental results show that our method can lead to an approximately 10 times speedup for computing steady-state probabilities of a real-life biological network.\nWe give solutions to two fundamental computational problems in ontology-based data access with the W3C standard ontology language OWL 2 QL: the succinctness problem for first-order rewritings of ontology-mediated queries (OMQs), and the complexity problem for OMQ answering. We classify OMQs according to the shape of their conjunctive queries (treewidth, the number of leaves) and the existential depth of their ontologies. 
For each of these classes, we determine the combined complexity of OMQ answering, and whether all OMQs in the class have polynomial-size first-order, positive existential, and nonrecursive datalog rewritings. We obtain the succinctness results using hypergraph programs, a new computational model for Boolean functions, which makes it possible to connect the size of OMQ rewritings and circuit complexity.\nWe train a number of neural networks to play games Bowling, Breakout and Seaquest using information stored in the memory of a video game console Atari 2600. We consider four models of neural networks which differ in size and architecture: two networks which use only information contained in the RAM and two mixed networks which use both information in the RAM and information from the screen. As the benchmark we used the convolutional model proposed in NIPS and received comparable results in all considered games. Quite surprisingly, in the case of Seaquest we were able to train RAM-only agents which behave better than the benchmark screen-only agent. Mixing screen and RAM did not lead to an improved performance comparing to screen-only and RAM-only agents.\nThis study suggests a new prediction model for chaotic time series inspired by the brain emotional learning of mammals. We describe the structure and function of this model, which is referred to as BELPM (Brain Emotional Learning-Based Prediction Model). Structurally, the model mimics the connection between the regions of the limbic system, and functionally it uses weighted k nearest neighbors to imitate the roles of those regions. The learning algorithm of BELPM is defined using steepest descent (SD) and the least square estimator (LSE). Two benchmark chaotic time series, Lorenz and Henon, have been used to evaluate the performance of BELPM. The obtained results have been compared with those of other prediction methods. 
The results show that BELPM has the capability to achieve a reasonable accuracy for long-term prediction of chaotic time series, using a limited amount of training data and a reasonably low computational time.\nThe knowledge base paradigm aims to express domain knowledge in a rich formal language, and to use this domain knowledge as a knowledge base to solve various problems and tasks that arise in the domain by applying multiple forms of inference. As such, the paradigm applies a strict separation of concerns between information and problem solving. In this paper, we analyze the principles and feasibility of the knowledge base paradigm in the context of an important class of applications: interactive configuration problems. In interactive configuration problems, a configuration of interrelated objects under constraints is searched, where the system assists the user in reaching an intended configuration. It is widely recognized in industry that good software solutions for these problems are very difficult to develop. We investigate such problems from the perspective of the KB paradigm. We show that multiple functionalities in this domain can be achieved by applying different forms of logical inferences on a formal specification of the configuration domain. We report on a proof of concept of this approach in a real-life application with a banking company. To appear in Theory and Practice of Logic Programming (TPLP).\nEnergy is a limited resource which has to be managed wisely, taking into account both supply-demand matching and capacity constraints in the distribution grid. One aspect of the smart energy management at the building level is given by the problem of real-time detection of flexible demand available. In this paper we propose the use of energy disaggregation techniques to perform this task. Firstly, we investigate the use of existing classification methods to perform energy disaggregation. 
A comparison is performed between four classifiers, namely Naive Bayes, k-Nearest Neighbors, Support Vector Machine and AdaBoost. Secondly, we propose the use of Restricted Boltzmann Machine to automatically perform feature extraction. The extracted features are then used as inputs to the four classifiers and consequently shown to improve their accuracy. The efficiency of our approach is demonstrated on a real database consisting of detailed appliance-level measurements with high temporal resolution, which has been used for energy disaggregation in previous studies, namely the REDD. The results show robustness and good generalization capabilities to newly presented buildings with at least 96% accuracy.\nThe Dialog State Tracking Challenge 4 (DSTC 4) differentiates itself from the previous three editions as follows: the number of slot-value pairs present in the ontology is much larger, no spoken language understanding output is given, and utterances are labeled at the subdialog level. This paper describes a novel dialog state tracking method designed to work robustly under these conditions, using elaborate string matching, coreference resolution tailored for dialogs and a few other improvements. The method can correctly identify many values that are not explicitly present in the utterance. On the final evaluation, our method came in first among 7 competing teams and 24 entries. The F1-score achieved by our method was 9 and 7 percentage points higher than that of the runner-up for the utterance-level evaluation and for the subdialog-level evaluation, respectively.\nMerging beliefs requires the plausibility of the sources of the information to be merged. They are typically assumed equally reliable in lack of hints indicating otherwise; yet, a recent line of research spun from the idea of deriving this information from the revision process itself. 
In particular, the history of previous revisions and previous merging examples provide information for performing subsequent mergings.   Yet, no examples or previous revisions may be available. In spite of the apparent lack of information, something can still be inferred by a try-and-check approach: a relative reliability ordering is assumed, the merging process is performed based on it, and the result is compared with the original information. The outcome of this check may be incoherent with the initial assumption, like when a completely reliable source is rejected some of the information it provided. In such cases, the reliability ordering assumed in the first place can be excluded from consideration. The first theorem of this article proves that such a scenario is indeed possible. Other results are obtained under various definition of reliability and merging.\nAcoustic event detection is essential for content analysis and description of multimedia recordings. The majority of current literature on the topic learns the detectors through fully-supervised techniques employing strongly labeled data. However, the labels available for majority of multimedia data are generally weak and do not provide sufficient detail for such methods to be employed. In this paper we propose a framework for learning acoustic event detectors using only weakly labeled data. We first show that audio event detection using weak labels can be formulated as an Multiple Instance Learning problem. We then suggest two frameworks for solving multiple-instance learning, one based on support vector machines, and the other on neural networks. The proposed methods can help in removing the time consuming and expensive process of manually annotating data to facilitate fully supervised learning. Moreover, it can not only detect events in a recording but can also provide temporal locations of events in the recording. 
This helps in obtaining a complete description of the recording and is notable since temporal information was never known in the first place in weakly labeled data.\nComputerized Evaluation of English Essays is performed using Machine learning techniques like Latent Semantic Analysis (LSA), Generalized LSA, Bilingual Evaluation Understudy and Maximum Entropy. Ontology, a concept map of domain knowledge, can enhance the performance of these techniques. Use of Ontology makes the evaluation process holistic as presence of keywords, synonyms, the right word combination and coverage of concepts can be checked. In this paper, the above mentioned techniques are implemented both with and without Ontology and tested on common input data consisting of technical answers of Computer Science. Domain Ontology of Computer Graphics is designed and developed. The software used for implementation includes Java Programming Language and tools such as MATLAB, Prot\\'eg\\'e, etc. Ten questions from Computer Graphics with sixty answers for each question are used for testing. The results are analyzed and it is concluded that the results are more accurate with use of Ontology.\nInspired by the hierarchical cognitive architecture and the perception-action model (PAM), we propose that the internal status acts as a kind of common-coding representation which affects, mediates and even regulates the sensorimotor behaviours. These regulation can be depicted in the Bayesian framework, that is why cognitive agents are able to generate behaviours with subtle differences according to their emotion or recognize the emotion by perception. 
A novel recurrent neural network called recurrent neural network with parametric bias units (RNNPB) runs in three modes, constructing a two-level emotion regulated learning model, was further applied to testify this theory in two different cases.\nIn recent years, deep architectures have been used for transfer learning with state-of-the-art performance in many datasets. The properties of their features remain, however, largely unstudied under the transfer perspective. In this work, we present an extensive analysis of the resiliency of feature vectors extracted from deep models, with special focus on the trade-off between performance and compression rate. By introducing perturbations to image descriptions extracted from a deep convolutional neural network, we change their precision and number of dimensions, measuring how it affects the final score. We show that deep features are more robust to these disturbances when compared to classical approaches, achieving a compression rate of 98.4%, while losing only 0.88% of their original score for Pascal VOC 2007.\nImportant advances have been made in the fuzzy quantification field. Nevertheless, some problems remain when we face the decision of selecting the most convenient model for a specific application. In the literature, several desirable adequacy properties have been proposed, but theoretical limits impede quantification models from simultaneously fulfilling every adequacy property that has been defined. Besides, the complexity of model definitions and adequacy properties makes very difficult for real users to understand the particularities of the different models that have been presented. In this work we will present several criteria conceived to help in the process of selecting the most adequate Quantifier Fuzzification Mechanisms for specific practical applications. In addition, some of the best known well-behaved models will be compared against this list of criteria. 
Based on this analysis, some guidance to choose fuzzy quantification models for practical applications will be provided.\nAutomatic optimization of spoken dialog management policies that are robust to environmental noise has long been the goal for both academia and industry. Approaches based on reinforcement learning have been proved to be effective. However, the numerical representation of dialog policy is human-incomprehensible and difficult for dialog system designers to verify or modify, which limits its practical application. In this paper we propose a novel framework for optimizing dialog policies specified in domain language using genetic algorithm. The human-interpretable representation of policy makes the method suitable for practical employment. We present learning algorithms using user simulation and real human-machine dialogs respectively.Empirical experimental results are given to show the effectiveness of the proposed approach.\nAnytime inference is inference performed incrementally, with the accuracy of the inference being controlled by a tunable parameter, usually time. Such anytime inference algorithms are also usually interruptible, gradually converging to the exact inference value until terminated. While anytime inference algorithms for specific domains like probability potentials exist in the literature, our objective in this article is to obtain an anytime inference algorithm which is sufficiently generic to cover a wide range of domains. For this we utilise the theory of generic inference as a basis for constructing an anytime inference algorithm, and in particular, extending work done on ordered valuation algebras. The novel contribution of this work is the construction of anytime algorithms in a generic framework, which automatically gives us instantiations in various useful domains. 
We also show how to apply this generic framework for anytime inference in semiring induced valuation algebras, an important subclass of valuation algebras, which includes instances like probability potentials, disjunctive normal forms and distributive lattices.   Keywords: Approximation; Anytime algorithms; Resource-bounded computation; Generic inference; Valuation algebras; Local computation; Binary join trees.\nIn Ontology Based Data Access (OBDA) users pose SPARQL queries over an ontology that lies on top of relational datasources. These queries are translated on-the-fly into SQL queries by OBDA systems. Standard SPARQL-to-SQL translation techniques in OBDA often produce SQL queries containing redundant joins and unions, even after a number of semantic and structural optimizations. These redundancies are detrimental to the performance of query answering, especially in complex industrial OBDA scenarios with large enterprise databases. To address this issue, we introduce two novel notions of OBDA constraints and show how to exploit them for efficient query answering. We conduct an extensive set of experiments on large datasets using real world data and queries, showing that these techniques strongly improve the performance of query answering up to orders of magnitude.\nLink prediction in large knowledge graphs has received a lot of attention recently because of its importance for inferring missing relations and for completing and improving noisily extracted knowledge graphs. Over the years a number of machine learning researchers have presented various models for predicting the presence of missing relations in a knowledge base. Although all the previous methods are presented with empirical results that show high performance on select datasets, there is almost no previous work on understanding the connection between properties of a knowledge base and the performance of a model. 
In this paper we analyze the RESCAL method and prove that it can not encode asymmetric transitive relations in knowledge bases.\nMany academic disciplines - including information systems, computer science, and operations management - face scheduling problems as important decision making tasks. Since many scheduling problems are NP-hard in the strong sense, there is a need for developing solution heuristics. For scheduling problems with setup times on unrelated parallel machines, there is limited research on solution methods and to the best of our knowledge, parallel computer architectures have not yet been taken advantage of. We address this gap by proposing and implementing a new solution heuristic and by testing different parallelization strategies. In our computational experiments, we show that our heuristic calculates near-optimal solutions even for large instances and that computing time can be reduced substantially by our parallelization approach.\nThis paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context---a common scenario in web search, ads, and recommendation. We build on techniques from combinatorial bandits to introduce a new practical estimator that uses logged data to estimate a policy's performance. A thorough empirical evaluation on real-world data reveals that our estimator is accurate in a variety of settings, including as a subroutine in a learning-to-rank task, where it achieves competitive performance. We derive conditions under which our estimator is unbiased---these conditions are weaker than prior heuristics for slate evaluation---and experimentally demonstrate a smaller bias than parametric approaches, even when these conditions are violated. 
Finally, our theory and experiments also show exponential savings in the amount of required data compared with general unbiased estimators.\nThere is an ever growing number of users with accounts on multiple social media and networking sites. Consequently, there is increasing interest in matching user accounts and profiles across different social networks in order to create aggregate profiles of users. In this paper, we present models for Digital Stylometry, which is a method for matching users through stylometry inspired techniques. We experimented with linguistic, temporal, and combined temporal-linguistic models for matching user accounts, using standard and novel techniques. Using publicly available data, our best model, a combined temporal-linguistic one, was able to correctly match the accounts of 31% of 5,612 distinct users across Twitter and Facebook.\nNumerous important problems can be framed as learning from graph data. We propose a framework for learning convolutional neural networks for arbitrary graphs. These graphs may be undirected, directed, and with both discrete and continuous node and edge attributes. Analogous to image-based convolutional networks that operate on locally connected regions of the input, we present a general approach to extracting locally connected regions from graphs. Using established benchmark data sets, we demonstrate that the learned feature representations are competitive with state of the art graph kernels and that their computation is highly efficient.\nWe explore the implications of using fuzzy techniques (mainly those commonly used in the linguistic description/summarization of data discipline) from a natural language generation perspective. 
For this, we provide an extensive discussion of some general convergence points and an exploration of the relationship between the different tasks involved in the standard NLG system pipeline architecture and the most common fuzzy approaches used in linguistic summarization/description of data, such as fuzzy quantified statements, evaluation criteria or aggregation operators. Each individual discussion is illustrated with a related use case. Recent work made in the context of cross-fertilization of both research fields is also referenced. This paper encompasses general ideas that emerged as part of the PhD thesis \"Application of fuzzy sets in data-to-text systems\". It does not present a specific application or a formal approach, but rather discusses current high-level issues and potential usages of fuzzy sets (focused on linguistic summarization of data) in natural language generation.\nDeep Reinforcement Learning methods have achieved state of the art performance in learning control policies for the games in the Atari 2600 domain. One of the important parameters in the Arcade Learning Environment (ALE) is the frame skip rate. It decides the granularity at which agents can control game play. A frame skip value of $k$ allows the agent to repeat a selected action $k$ number of times. The current state of the art architectures like Deep Q-Network (DQN) and Dueling Network Architectures (DuDQN) consist of a framework with a static frame skip rate, where the action output from the network is repeated for a fixed number of frames regardless of the current state. In this paper, we propose a new architecture, Dynamic Frame skip Deep Q-Network (DFDQN) which makes the frame skip rate a dynamic learnable parameter. This allows us to choose the number of times an action is to be repeated based on the current state. 
We show empirically that such a setting improves the performance on relatively harder games like Seaquest.\nIn a recent paper, we have shown that Plan Recognition over STRIPS can be formulated and solved using Classical Planning heuristics and algorithms. In this work, we show that this formulation subsumes the standard formulation of Plan Recognition over libraries through a compilation of libraries into STRIPS theories. The libraries correspond to AND/OR graphs that may be cyclic and where children of AND nodes may be partially ordered. These libraries include Context-Free Grammars as a special case, where the Plan Recognition problem becomes a parsing with missing tokens problem. Plan Recognition over the standard libraries becomes a set of Planning problems that can be easily solved by any modern planner, while recognition over more complex libraries, including Context-Free Grammars (CFGs), illustrates limitations of current Planning heuristics and suggests improvements that may be relevant in other Planning problems too.\nSelf-Organizing Map (SOM) is a neural network model which is used to obtain a topology-preserving mapping from the (usually high dimensional) input/feature space to an output/map space of fewer dimensions (usually two or three in order to facilitate visualization). Neurons in the output space are connected with each other but this structure remains fixed throughout training and learning is achieved through the updating of neuron reference vectors in feature space. Although growing variants of SOM overcome the fixed structure limitation, they increase computational cost and also do not allow the removal of a neuron after its introduction. 
In this paper, a variant of SOM is proposed called AMSOM (Adaptive Moving Self-Organizing Map) that on the one hand creates a more flexible structure where neuron positions are dynamically altered during training and on the other hand tackles the drawback of having a predefined grid by allowing neuron addition and/or removal during training. Experiments using multiple literature datasets show that the proposed method improves training performance of SOM, leads to a better visualization of the input dataset and provides a framework for determining the optimal number and structure of neurons.\nSequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as found between the utterances in a dialogue. In an effort to model this kind of generative process, we propose a neural network-based generative architecture, with latent stochastic variables that span a variable number of time steps. We apply the proposed model to the task of dialogue response generation and compare it with recent neural network architectures. We evaluate the model performance through automatic evaluation metrics and by carrying out a human evaluation. The experiments demonstrate that our model improves upon recently proposed models and that the latent variables facilitate the generation of long outputs and maintain the context.\nVariational inference provides approximations to the computationally intractable posterior distribution in Bayesian networks. A prominent medical application of noisy-or Bayesian network is to infer potential diseases given observed symptoms. Previous studies focus on approximating a handful of complicated pathological cases using variational transformation. Our goal is to use variational transformation as part of a novel hybridized inference for serving reliable and real time diagnosis at web scale. 
We propose a hybridized inference that allows variational parameters to be estimated without disease posteriors or priors, making the inference faster and much of its computation recyclable. In addition, we propose a transformation ranking algorithm that is very stable to large variances in network prior probabilities, a common issue that arises in medical applications of Bayesian networks. In experiments, we perform comparative study on a large real life medical network and scalability study on a much larger (36,000x) synthesized network.\nIn this work we propose a novel interpretation of residual networks showing that they can be seen as a collection of many paths of differing length. Moreover, residual networks seem to enable very deep networks by leveraging only the short paths during training. To support this observation, we rewrite residual networks as an explicit collection of paths. Unlike traditional models, paths through residual networks vary in length. Further, a lesion study reveals that these paths show ensemble-like behavior in the sense that they do not strongly depend on each other. Finally, and most surprising, most paths are shorter than one might expect, and only the short paths are needed during training, as longer paths do not contribute any gradient. For example, most of the gradient in a residual network with 110 layers comes from paths that are only 10-34 layers deep. Our results reveal one of the key characteristics that seem to enable the training of very deep networks: Residual networks avoid the vanishing gradient problem by introducing short paths which can carry gradient throughout the extent of very deep networks.\nOne way to approach end-to-end autonomous driving is to learn a policy function that maps from a sensory input, such as an image frame from a front-facing camera, to a driving action, by imitating an expert driver, or a reference policy. 
This can be done by supervised learning, where a policy function is tuned to minimize the difference between the predicted and ground-truth actions. A policy function trained in this way, however, is known to suffer from unexpected behaviours due to the mismatch between the states reachable by the reference policy and trained policy functions. More advanced algorithms for imitation learning, such as DAgger, address this issue by iteratively collecting training examples from both reference and trained policies. These algorithms often require a large number of queries to a reference policy, which is undesirable as the reference policy is often expensive. In this paper, we propose an extension of DAgger, called SafeDAgger, that is query-efficient and more suitable for end-to-end autonomous driving. We evaluate the proposed SafeDAgger in a car racing simulator and show that it indeed requires fewer queries to a reference policy. We observe a significant speed up in convergence, which we conjecture to be due to the effect of automated curriculum learning.\nLarge knowledge bases (KBs) are useful in many tasks, but it is unclear how to integrate this sort of knowledge into \"deep\" gradient-based learning systems. To address this problem, we describe a probabilistic deductive database, called TensorLog, in which reasoning uses a differentiable process. In TensorLog, each clause in a logical theory is first converted into a certain type of factor graph. Then, for each type of query to the factor graph, the message-passing steps required to perform belief propagation (BP) are \"unrolled\" into a function, which is differentiable. We show that these functions can be composed recursively to perform inference in non-trivial logical theories containing multiple interrelated clauses and predicates. 
Both compilation and inference in TensorLog are efficient: compilation is linear in theory size and proof depth, and inference is linear in database size and the number of message-passing steps used in BP. We also present experimental results with TensorLog and discuss its relationship to other first-order probabilistic logics.\nGiven that in practice training data is scarce for all but a small set of problems, a core question is how to incorporate prior knowledge into a model. In this paper, we consider the case of prior procedural knowledge for neural networks, such as knowing how a program should traverse a sequence, but not what local actions should be performed at each step. To this end, we present an end-to-end differentiable interpreter for the programming language Forth which enables programmers to write program sketches with slots that can be filled with behaviour trained from program input-output data. We can optimise this behaviour directly through gradient descent techniques on user-specified objectives, and also integrate the program into any larger neural computation graph. We show empirically that our interpreter is able to effectively leverage different levels of prior program structure and learn complex behaviours such as sequence sorting and addition. When connected to outputs of an LSTM and trained jointly, our interpreter achieves state-of-the-art accuracy for end-to-end reasoning about quantities expressed in natural language stories.\nWe consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. 
We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.\nStochastic partition models tailor a product space into a number of rectangular regions such that the data within each region exhibit certain types of homogeneity. Due to constraints of partition strategy, existing models may cause unnecessary dissections in sparse regions when fitting data in dense regions. To alleviate this limitation, we propose a parsimonious partition model, named Stochastic Patching Process (SPP), to deal with multi-dimensional arrays. SPP adopts an \"enclosing\" strategy to attach rectangular patches to dense regions. SPP is self-consistent such that it can be extended to infinite arrays. We apply SPP to relational modeling and the experimental results validate its merit compared to state-of-the-art methods.\nRecent advances in deep learning have enabled the extraction of high-level features from raw sensor data, which has opened up new possibilities in many different fields, including computer-generated choreography. In this paper we present a system, chor-rnn, for generating novel choreographic material in the nuanced choreographic language and style of an individual choreographer. It also shows promising results in producing a higher level compositional cohesion, rather than just generating sequences of movement. At the core of chor-rnn is a deep recurrent neural network that is trained on raw motion capture data and can generate new dance sequences for a solo dancer. 
Chor-rnn can be used for collaborative human-machine choreography or as a creative catalyst, serving as inspiration for a choreographer.\nThe iterative nature of the expectation maximization (EM) algorithm presents a challenge for privacy-preserving estimation, as each iteration increases the amount of noise needed. We propose a practical private EM algorithm that overcomes this challenge using two innovations: (1) a novel moment perturbation formulation for differentially private EM (DP-EM), and (2) the use of two recently developed composition methods to bound the privacy \"cost\" of multiple EM iterations: the moments accountant (MA) and zero-mean concentrated differential privacy (zCDP). Both MA and zCDP bound the moment generating function of the privacy loss random variable and achieve a refined tail bound, which effectively decreases the amount of additive noise. We present empirical results showing the benefits of our approach, as well as similar performance between these two composition methods in the DP-EM setting for Gaussian mixture models. Our approach can be readily extended to many iterative learning algorithms, opening up various exciting future directions.\nSmile is an irrefutable expression that shows the physical state of the mind in both true and deceptive ways. Generally, it shows a happy state of mind; however, `smiles' can be deceptive: for example, people can give a smile when they feel happy and sometimes they might also give a smile (in a different way) when they feel pity for others. This work aims to distinguish spontaneous (felt) smile expressions from posed (deliberate) smiles by extracting and analyzing both global (macro) motion of the face and subtle (micro) changes in the facial expression features, both by tracking a series of facial fiducial markers and by using dense optical flow. Specifically, the eye and lip features are captured and used for analysis. 
It aims to automatically classify all smiles into either `spontaneous' or `posed' categories, by using support vector machines (SVM). Experimental results on a large database are promising compared with other relevant methods.\nBayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset. We construct a Bayesian optimization procedure, dubbed Fabolas, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that Fabolas often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.\nEach human genome is a 3 billion base pair set of encoding instructions. Decoding the genome using deep learning fundamentally differs from most tasks, as we do not know the full structure of the data and therefore cannot design architectures to suit it. As such, architectures that fit the structure of genomics should be learned, not prescribed. Here, we develop a novel search algorithm, applicable across domains, that discovers an optimal architecture which simultaneously learns general genomic patterns and identifies the most important sequence motifs in predicting functional genomic outcomes. 
The architectures we find using this algorithm succeed at using only RNA expression data to predict gene regulatory structure, learn human-interpretable visualizations of key sequence motifs, and surpass state-of-the-art results on benchmark genomics challenges.\nThe alternating direction method of multipliers (ADMM) is a versatile tool for solving a wide range of constrained optimization problems, with differentiable or non-differentiable objective functions. Unfortunately, its performance is highly sensitive to a penalty parameter, which makes ADMM often unreliable and hard to automate for a non-expert user. We tackle this weakness of ADMM by proposing a method to adaptively tune the penalty parameters to achieve fast convergence. The resulting adaptive ADMM (AADMM) algorithm, inspired by the successful Barzilai-Borwein spectral method for gradient descent, yields fast convergence and relative insensitivity to the initial stepsize and problem scaling.\nIn Chile, there is no independent entity that publishes quantitative or qualitative surveys to understand the traditional media environment and its adaptation to the Social Web. Nowadays, Chilean newsreaders are increasingly using social web platforms as their primary source of information, among which Twitter plays a central role. Historical media and pure players are developing different strategies to increase their audience and influence on this platform. In this article, we propose a methodology based on data mining techniques to provide a first level of analysis of the new Chilean media environment. We use a crawling technique to mine news streams of 37 different Chilean media actively present on Twitter and propose several indicators to compare them. 
We analyze their volumes of production, their potential audience, and using NLP techniques, we explore the content of their production: their editorial line and their geographic coverage.\nWe consider the Bayesian active learning and experimental design problem, where the goal is to learn the value of some unknown target variable through a sequence of informative, noisy tests. In contrast to prior work, we focus on the challenging, yet practically relevant setting where test outcomes can be conditionally dependent given the hidden target variable. Under such assumptions, common heuristics, such as greedily performing tests that maximize the reduction in uncertainty of the target, often perform poorly. In this paper, we propose ECED, a novel, computationally efficient active learning algorithm, and prove strong theoretical guarantees that hold with correlated, noisy tests. Rather than directly optimizing the prediction error, at each step, ECED picks the test that maximizes the gain in a surrogate objective, which takes into account the dependencies between tests. Our analysis relies on an information-theoretic auxiliary function to track the progress of ECED, and utilizes adaptive submodularity to attain the near-optimal bound. We demonstrate strong empirical performance of ECED on two problem instances, including a Bayesian experimental design task intended to distinguish among economic theories of how people make risky decisions, and an active preference learning task via pairwise comparisons.\nRandom generators or stochastic engines are a key component in the structure of metaheuristic algorithms. This work investigates the effects of non-Gaussian stochastic engines on the performance of metaheuristics when solving a real-world optimization problem. In this work, the bacteria foraging algorithm (BFA) was employed in tandem with four random generators (stochastic engines). 
The stochastic engines operate using the Weibull distribution, Gamma distribution, Gaussian distribution and a chaotic mechanism. The two non-Gaussian distributions are the Weibull and Gamma distributions. In this work, the approaches developed were implemented on the real-world multi-objective resin bonded sand mould problem. The Pareto frontiers obtained were benchmarked using two metrics: the hypervolume indicator (HVI) and the proposed Average Explorative Rate (AER) metric. Detailed discussions from various perspectives on the effects of non-Gaussian random generators in metaheuristics are provided.\nBayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This paper considers the problem of finding a robust policy while taking into account the impact of environment variables. We present Alternating Optimisation and Quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. ALOQ is robust to the presence of significant rare events, which may not be observable under random sampling, but play a substantial role in determining the optimal policy. Experimental results across different domains show that ALOQ can learn more efficiently and robustly than existing methods.\nThe paper describes a generalized integrated view of bin packing problems, including a brief literature survey and some new problem formulations for the cases of multiset estimates of items. 
A new systemic viewpoint to bin packing problems is suggested: (a) basic element sets (item set, bin set, item subset assigned to bin), (b) binary relation over the sets: relation over item set as compatibility, precedence, dominance; relation over items and bins (i.e., correspondence of items to bins). A special attention is targeted to the following versions of bin packing problems: (a) problem with multiset estimates of items, (b) problem with colored items (and some close problems). Applied examples of bin packing problems are considered: (i) planning in paper industry (framework of combinatorial problems), (ii) selection of information messages, (iii) packing of messages/information packages in WiMAX communication system (brief description).\nMany tasks in AI require the collaboration of multiple agents. Typically, the communication protocol between agents is manually specified and not altered during training. In this paper we explore a simple neural model, called CommNet, that uses continuous communication for fully cooperative tasks. The model consists of multiple agents and the communication between them is learned alongside their policy. We apply this model to a diverse set of tasks, demonstrating the ability of the agents to learn to communicate amongst themselves, yielding improved performance over non-communicative agents and baselines. In some cases, it is possible to interpret the language devised by the agents, revealing simple but effective strategies for solving the task at hand.\nPrevious studies in Open Information Extraction (Open IE) are mainly based on extraction patterns. They manually define patterns or automatically learn them from a large corpus. However, these approaches are limited when grasping the context of a sentence, and they fail to capture implicit relations. In this paper, we address this problem with the following methods. 
First, we exploit long short-term memory (LSTM) networks to extract higher-level features along the shortest dependency paths, connecting headwords of relations and arguments. The path-level features from LSTM networks provide useful clues regarding contextual information and the validity of arguments. Second, we constructed samples to train LSTM networks without the need for manual labeling. In particular, feedback negative sampling picks highly negative samples among non-positive samples through a model trained with positive samples. The experimental results show that our approach produces more precise and abundant extractions than state-of-the-art open IE systems. To the best of our knowledge, this is the first work to apply deep learning to Open IE.\nThis paper proposes an adaptive neural-compilation framework to address the problem of efficient program learning. Traditional code optimisation strategies used in compilers are based on applying pre-specified set of transformations that make the code faster to execute without changing its semantics. In contrast, our work involves adapting programs to make them more efficient while considering correctness only on a target input distribution. Our approach is inspired by the recent works on differentiable representations of programs. We show that it is possible to compile programs written in a low-level language to a differentiable representation. We also show how programs in this representation can be optimised to make them efficient on a target distribution of inputs. Experimental results demonstrate that our approach enables learning specifically-tuned algorithms for given data distributions with a high success rate.\nCitation and coauthor networks offer an insight into the dynamics of scientific progress. We can also view them as representations of a causal structure, a logical process captured in a graph. 
From a causal perspective, we can ask questions such as whether authors form groups primarily due to their prior shared interest, or if their favourite topics are 'contagious' and spread through co-authorship. Such networks have been widely studied by the artificial intelligence community, and recently a connection has been made to nonlocal correlations produced by entangled particles in quantum physics -- the impact of latent hidden variables can be analyzed by the same algebraic geometric methodology that relies on a sequence of semidefinite programming (SDP) relaxations. Following this trail, we treat our sample coauthor network as a causal graph and, using SDP relaxations, rule out latent homophily as a manifestation of prior shared interest leading to the observed patternedness. By introducing algebraic geometry to citation studies, we add a new tool to existing methods for the analysis of content-related social influences.\nA recent trend in probabilistic inference emphasizes the codification of models in a formal syntax, with suitable high-level features such as individuals, relations, and connectives, enabling descriptive clarity, succinctness and circumventing the need for the modeler to engineer a custom solver. Unfortunately, bringing these linguistic and pragmatic benefits to numerical optimization has proven surprisingly challenging. In this paper, we turn to these challenges: we introduce a rich modeling language, for which an interior-point method computes approximate solutions in a generic way. While logical features easily complicate the underlying model, often yielding intricate dependencies, we exploit and cache local structure using algebraic decision diagrams (ADDs). Indeed, standard matrix-vector algebra is efficiently realizable in ADDs, but we argue and show that well-known optimization methods are not ideal for ADDs. Our engine, therefore, invokes a sophisticated matrix-free approach. 
We demonstrate the flexibility of the resulting symbolic-numeric optimizer on decision making and compressed sensing tasks with millions of non-zero entries.\nWe present SGDPLL(T), an algorithm that solves (among many other problems) probabilistic inference modulo theories, that is, inference problems over probabilistic models defined via a logic theory provided as a parameter (currently, propositional, equalities on discrete sorts, and inequalities, more specifically difference arithmetic, on bounded integers). While many solutions to probabilistic inference over logic representations have been proposed, SGDPLL(T) is simultaneously (1) lifted, (2) exact and (3) modulo theories, that is, parameterized by a background logic theory. This offers a foundation for extending it to rich logic languages such as data structures and relational data. By lifted, we mean algorithms with constant complexity in the domain size (the number of values that variables can take). We also detail a solver for summations with difference arithmetic and show experimental results from a scenario in which SGDPLL(T) is much faster than a state-of-the-art probabilistic solver.\nDeterminantal Point Processes (DPPs) are probabilistic models over all subsets of a ground set of $N$ items. They have recently gained prominence in several applications that rely on \"diverse\" subsets. However, their applicability to large problems is still limited due to the $\\mathcal O(N^3)$ complexity of core tasks such as sampling and learning. We enable efficient sampling and learning for DPPs by introducing KronDPP, a DPP model whose kernel matrix decomposes as a tensor product of multiple smaller kernel matrices. This decomposition immediately enables fast exact sampling. But contrary to what one may expect, leveraging the Kronecker product structure for speeding up DPP learning turns out to be more difficult. 
We overcome this challenge, and derive batch and stochastic optimization algorithms for efficiently learning the parameters of a KronDPP.\nNowadays, metro systems play an important role in meeting the urban transportation demand in large cities. The understanding of passenger route choice is critical for public transit management. The wide deployment of Automated Fare Collection (AFC) systems opens up a new opportunity. However, only each trip's tap-in and tap-out timestamps and stations can be obtained directly from AFC system records; the train and route chosen by a passenger, which are necessary to solve our problem, are unknown. While existing methods work well in some specific situations, they do not work in complicated situations. In this paper, we propose a solution that needs no equipment or human involvement beyond the AFC systems. We develop a probabilistic model that can estimate from empirical analysis how the passenger flows are dispatched to different routes and trains. We validate our approach using a large-scale data set collected from the Shenzhen metro system. The measured results provide us with useful inputs when building the passenger path choice model.\nIn imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. 
Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.\nUnsupervised learning of probabilistic models is a central yet challenging problem in machine learning. Specifically, designing models with tractable learning, sampling, inference and evaluation is crucial in solving this task. We extend the space of such models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space. We demonstrate its ability to model natural images on four datasets through sampling, log-likelihood evaluation and latent variable manipulations.\nIn this paper, we introduce a new set of reinforcement learning (RL) tasks in Minecraft (a flexible 3D world). We then use these tasks to systematically compare and contrast existing deep reinforcement learning (DRL) architectures with our new memory-based DRL architectures. These tasks are designed to emphasize, in a controllable manner, issues that pose challenges for RL methods including partial observability (due to first-person visual observations), delayed rewards, high-dimensional visual observations, and the need to use active perception in a correct manner so as to perform well in the tasks. While these tasks are conceptually simple to describe, by virtue of having all of these challenges simultaneously they are difficult for current DRL architectures. Additionally, we evaluate the generalization performance of the architectures on environments not used during training. The experimental results show that our new architectures generalize to unseen environments better than existing DRL architectures.\nBuyers (e.g., advertisers) often have limited financial and processing resources, and so their participation in auctions is throttled. 
Changes to auctions may affect bids or throttling and any change may affect what winners pay. This paper shows that if an A/B experiment affects only bids, then the observed treatment effect is unbiased when all the bidders in an auction are randomly assigned to A or B but it can be severely biased otherwise, even in the absence of throttling. Experiments that affect throttling algorithms can also be badly biased, but the bias can be substantially reduced if the budget for each advertiser in the experiment is allocated to separate pots for the A and B arms of the experiment.\nWe show that the climate phenomena of El Nino and La Nina arise naturally as states of macro-variables when our recent causal feature learning framework (Chalupka 2015, Chalupka 2016) is applied to micro-level measures of zonal wind (ZW) and sea surface temperatures (SST) taken over the equatorial band of the Pacific Ocean. The method identifies these unusual climate states on the basis of the relation between ZW and SST patterns without any input about past occurrences of El Nino or La Nina. The simpler alternatives of (i) clustering the SST fields while disregarding their relationship with ZW patterns, or (ii) clustering the joint ZW-SST patterns, do not discover El Nino. We discuss the degree to which our method supports a causal interpretation and use a low-dimensional toy example to explain its success over other clustering approaches. Finally, we propose a new robust and scalable alternative to our original algorithm (Chalupka 2016), which circumvents the need for high-dimensional density learning.\nAs it has become common to use many computer cores in routine applications, finding good ways to parallelize popular algorithms has become increasingly important. 
In this paper, we present a parallelization scheme for Markov chain Monte Carlo (MCMC) methods based on spectral clustering of the underlying state space, generalizing earlier work on parallelization of MCMC methods by state space partitioning. We show empirically that this approach speeds up MCMC sampling for multimodal distributions and that it can be usefully applied in greater generality than several related algorithms. Our algorithm converges under reasonable conditions to an `optimal' MCMC algorithm. We also show that our approach can be asymptotically far more efficient than naive parallelization, even in situations such as completely flat target distributions where no unique optimal algorithm exists. Finally, we combine theoretical and empirical bounds to provide practical guidance on the choice of tuning parameters.\nWe propose a model of interdependent scheduling games in which each player controls a set of services that they schedule independently. A player is free to schedule his own services at any time; however, each of these services only begins to accrue reward for the player when all predecessor services, which may or may not be controlled by the same player, have been activated. This model, where players have interdependent services, is motivated by the problems faced in planning and coordinating large-scale infrastructures, e.g., restoring electricity and gas to residents after a natural disaster or providing medical care in a crisis when different agencies are responsible for the delivery of staff, equipment, and medicine. We undertake a game-theoretic analysis of this setting and in particular consider the issues of welfare maximization, computing best responses, Nash dynamics, and existence and computation of Nash equilibria.\nSpecialized dictionaries are used to understand concepts in specific domains, especially where those concepts are not part of the general vocabulary, or having meanings that differ from ordinary languages. 
The first step in creating a specialized dictionary involves detecting the characteristic vocabulary of the domain in question. Classical methods for detecting this vocabulary involve gathering a domain corpus, calculating statistics on the terms found there, and then comparing these statistics to a background or general language corpus. Terms which are found significantly more often in the specialized corpus than in the background corpus are candidates for the characteristic vocabulary of the domain. Here we present two tools, a directed crawler, and a distributional semantics package, that can be used together, circumventing the need of a background corpus. Both tools are available on the web.\nScalable and effective exploration remains a key challenge in reinforcement learning (RL). While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios. As such, most contemporary RL relies on simple heuristics such as epsilon-greedy exploration or adding Gaussian noise to the controls. This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics. We propose a practical implementation, using variational inference in Bayesian neural networks which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function, and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous control tasks and algorithms, including tasks with very sparse rewards.\nThe maturity of deep learning techniques has led in recent years to a breakthrough in object recognition in visual media. 
While for some specific benchmarks, neural techniques seem to match if not outperform human judgement, challenges are still open for detecting arbitrary concepts in arbitrary videos. In this paper, we propose a system that combines neural techniques, a large scale visual concepts ontology, and an active learning loop, to provide on the fly model learning of arbitrary concepts. We give an overview of the system as a whole, and focus on the central role of the ontology for guiding and bootstrapping the learning of new concepts, improving the recall of concept detection, and, on the user end, providing semantic search on a library of annotated videos.\nThis paper presents a Directed Controller Synthesis (DCS) technique for discrete event systems. The DCS method explores the solution space for reactive controllers guided by a domain-independent heuristic. The heuristic is derived from an efficient abstraction of the environment based on the componentized way in which complex environments are described. Then by building the composition of the components on-the-fly DCS obtains a solution by exploring a reduced portion of the state space. This work focuses on untimed discrete event systems with safety and co-safety (i.e. reachability) goals. An evaluation for the technique is presented comparing it to other well-known approaches to controller synthesis (based on symbolic representation and compositional analyses).\nIn this paper, an uncertain Multi-objective Multi-item Solid Transportation Problem (MMSTP) based on uncertainty theory is presented. In the model, transportation costs, supplies, demands and conveyances parameters are taken to be uncertain parameters. There are restrictions on some items and conveyances of the model. Therefore, some particular items cannot be transported by some exceptional conveyances. Using the advantage of uncertainty theory, the MMSTP is first converted into an equivalent deterministic MMSTP. 
By applying convex combination method and minimizing distance function method, the deterministic MMSTP is reduced into single objective programming problems. Thus, both single objective programming problems are solved using Maple 18.02 optimization toolbox. Finally, a numerical example is given to illustrate the performance of the models.\nThis paper introduces a new technique for quantifying the approximation error of a broad class of probabilistic inference programs, including ones based on both variational and Monte Carlo approaches. The key idea is to derive a subjective bound on the symmetrized KL divergence between the distribution achieved by an approximate inference program and its true target distribution. The bound's validity (and subjectivity) rests on the accuracy of two auxiliary probabilistic programs: (i) a \"reference\" inference program that defines a gold standard of accuracy and (ii) a \"meta-inference\" program that answers the question \"what internal random choices did the original approximate inference program probably make given that it produced a particular result?\" The paper includes empirical results on inference problems drawn from linear regression, Dirichlet process mixture modeling, HMMs, and Bayesian networks. The experiments show that the technique is robust to the quality of the reference inference program and that it can detect implementation bugs that are not apparent from predictive performance.\nKidney exchange is a barter market where patients trade willing but medically incompatible donors. These trades occur via cycles, where each patient-donor pair both gives and receives a kidney, and via chains, which begin with an altruistic donor who does not require a kidney in return. For logistical reasons, the maximum length of a cycle is typically limited to a small constant, while chains can be much longer. 
Given a compatibility graph of patient-donor pairs, altruists, and feasible potential transplants between them, finding even a maximum-cardinality set of vertex-disjoint cycles and chains is NP-hard. There has been much work on developing provably optimal solvers that are efficient in practice. One of the leading techniques has been branch and price, where column generation is used to incrementally bring cycles and chains into the optimization model on an as-needed basis. In particular, only positive-price columns need to be brought into the model. We prove that finding a positive-price chain is NP-complete. This shows the incorrectness of two leading branch-and-price solvers that suggested polynomial-time chain pricing algorithms.\nQualitative Spatial and Temporal Reasoning (QSTR) is concerned with symbolic knowledge representation, typically over infinite domains. The motivations for employing QSTR techniques range from exploiting computational properties that allow efficient reasoning to capturing human cognitive concepts in a computational framework. The notion of a qualitative calculus is one of the most prominent QSTR formalisms. This article presents the first overview of all qualitative calculi developed to date and their computational properties, together with generalized definitions of the fundamental concepts and methods, which now encompass all existing calculi. Moreover, we provide a classification of calculi according to their algebraic properties.\nObject-oriented Application Programming Interfaces (APIs) support software reuse by providing pre-implemented functionalities. Due to the huge number of included classes, reusing and understanding large APIs is a complex task. On the other hand, software components are considered to be more reusable and understandable entities than object-oriented ones. Thus, in this paper, we propose an approach for reengineering object-oriented APIs into component-based ones.
We mine components as a group of classes based on the frequency they are used together and their ability to form a quality-centric component. To validate our approach, we experimented on 100 Java applications that used Android APIs.\nWhile Bayesian methods are praised for their ability to incorporate useful prior knowledge, in practice, convenient priors that allow for computationally cheap or tractable inference are commonly used. In this paper, we investigate the following question: for a given model, is it possible to compute an inference result with any convenient false prior, and afterwards, given any target prior of interest, quickly transform this result into the target posterior? A potential solution is to use importance sampling (IS). However, we demonstrate that IS will fail for many choices of the target prior, depending on its parametric form and similarity to the false prior. Instead, we propose prior swapping, a method that leverages the pre-inferred false posterior to efficiently generate accurate posterior samples under arbitrary target priors. Prior swapping lets us apply less-costly inference algorithms to certain models, and incorporate new or updated prior information \"post-inference\". We give theoretical guarantees about our method, and demonstrate it empirically on a number of models and priors.\nWith the rapid growth of knowledge bases (KBs) on the web, how to take full advantage of them becomes increasingly important. Knowledge base-based question answering (KB-QA) is one of the most promising approaches to access the substantial knowledge. Meantime, as the neural network-based (NN-based) methods develop, NN-based KB-QA has already achieved impressive results. However, previous work did not put emphasis on question representation, and the question is converted into a fixed vector regardless of its candidate answers. This simple representation strategy is unable to express the proper information of the question. 
Hence, we present a neural attention-based model to represent the questions dynamically according to the different focuses of various candidate answer aspects. In addition, we leverage the global knowledge inside the underlying KB, aiming at integrating the rich KB information into the representation of the answers. This also alleviates the out of vocabulary (OOV) problem, which helps the attention model to represent the question more precisely. The experimental results on WEBQUESTIONS demonstrate the effectiveness of the proposed approach.\nIn the domain of the Soccer simulation 2D league of the RoboCup project, appropriate player positioning against a given opponent team is an important factor of soccer team performance. This work proposes a model which decides the strategy that should be applied regarding a particular opponent team. This task can be realized by applying a preliminary learning phase where the model determines the most effective strategies against clusters of opponent teams. The model determines the best strategies by using sequential Bayes' estimators. As a first trial of the system, the proposed model is used to determine the association of player formations against opponent teams in the particular situation of a corner kick. The implemented model shows satisfying abilities to compare player formations that are similar to each other in terms of performance and determines the right ranking even by running a decent number of simulation games.\nOne difficulty faced in knowledge engineering for Bayesian Networks (BNs) is the quantification step where the Conditional Probability Tables (CPTs) are determined. The number of parameters included in CPTs increases exponentially with the number of parent variables. The most common solution is the application of the so-called canonical gates.
The Noisy-OR (NOR) gate, which takes advantage of the independence of causal interactions, provides a logarithmic reduction of the number of parameters required to specify a CPT. In this paper, an extension of NOR model based on the theory of belief functions, named Belief Noisy-OR (BNOR), is proposed. BNOR is capable of dealing with both aleatory and epistemic uncertainty of the network. Compared with NOR, more rich information which is of great value for making decisions can be got when the available knowledge is uncertain. Specially, when there is no epistemic uncertainty, BNOR degrades into NOR. Additionally, different structures of BNOR are presented in this paper in order to meet various needs of engineers. The application of BNOR model on the reliability evaluation problem of networked systems demonstrates its effectiveness.\nThis paper presents a model for end-to-end learning of task-oriented dialog systems. The main component of the model is a recurrent neural network (an LSTM), which maps from raw dialog history directly to a distribution over system actions. The LSTM automatically infers a representation of dialog history, which relieves the system developer of much of the manual feature engineering of dialog state. In addition, the developer can provide software that expresses business rules and provides access to programmatic APIs, enabling the LSTM to take actions in the real world on behalf of the user. The LSTM can be optimized using supervised learning (SL), where a domain expert provides example dialogs which the LSTM should imitate; or using reinforcement learning (RL), where the system improves by interacting directly with end users. 
Experiments show that SL and RL are complementary: SL alone can derive a reasonable initial policy from a small number of training dialogs; and starting RL optimization with a policy trained with SL substantially accelerates the learning rate of RL.\nWe consider a class of probabilistic grammars for generating scenes with multiple objects. Probabilistic scene grammars capture relationships between objects using compositional rules that provide important contextual cues for inference with ambiguous data. We show how to represent the distribution defined by a probabilistic scene grammar using a factor graph. We also show how to efficiently perform message passing in this factor graph. This leads to an efficient approach for inference with a grammar model using belief propagation as the underlying computational engine. Inference with belief propagation naturally combines bottom-up and top-down contextual information and leads to a robust algorithm for aggregating evidence. We show experiments on two different applications to demonstrate the generality of the framework. The first application involves detecting curves in noisy images, and we address this problem using a grammar that generates a collection of curves using a first-order Markov process. The second application involves localizing faces and parts of faces in images. In this case, we use a grammar that captures spatial relationships between the parts of a face. In both applications the same framework leads to robust inference algorithms that can effectively combine weak local information to reason about a scene.\nReal-world multi-agent planning problems cannot be solved using decision-theoretic planning methods due to the exponential complexity. We approximate firefighting in rescue simulation as a spatially distributed task and model with multi-agent Markov decision process. We use recent approximation methods for spatial task problems to reduce the model complexity. 
Our approximations are single-agent, static task, shortest path pruning, dynamic planning horizon, and task clustering. We create scenarios from RoboCup Rescue Simulation maps and evaluate our methods on these graph worlds. The results show that our approach is faster and better than comparable methods and has negligible performance loss compared to the optimal policy. We also show that our method has a similar performance as DCOP methods on example RCRS scenarios.\nThe ability to reason with natural language is a fundamental prerequisite for many NLP tasks such as information extraction, machine translation and question answering. To quantify this ability, systems are commonly tested whether they can recognize textual entailment, i.e., whether one sentence can be inferred from another one. However, in most NLP applications only single source sentences instead of sentence pairs are available. Hence, we propose a new task that measures how well a model can generate an entailed sentence from a source sentence. We take entailment-pairs of the Stanford Natural Language Inference corpus and train an LSTM with attention. On a manually annotated test set we found that 82% of generated sentences are correct, an improvement of 10.3% over an LSTM baseline. A qualitative analysis shows that this model is not only capable of shortening input sentences, but also inferring new statements via paraphrasing and phrase entailment. We then apply this model recursively to input-output pairs, thereby generating natural language inference chains that can be used to automatically construct an entailment graph from source sentences. Finally, by swapping source and target sentences we can also train a model that given an input sentence invents additional information to generate a new sentence.\nMotivated by the search for a counterexample to the Poincar\\'e conjecture in three and four dimensions, the Andrews-Curtis conjecture was proposed in 1965. 
It is now generally suspected that the Andrews-Curtis conjecture is false, but small potential counterexamples are not so numerous, and previous work has attempted to eliminate some via combinatorial search. Progress has however been limited, with the most successful approach (breadth-first-search using secondary storage) being neither scalable nor heuristically-informed. A previous empirical analysis of problem structure examined several heuristic measures of search progress and determined that none of them provided any useful guidance for search. In this article, we induce new quality measures directly from the problem structure and combine them to produce a more effective search driver via ensemble machine learning. By this means, we eliminate 19 potential counterexamples, the status of which had been unknown for some years.\nAn open problem with categorical compositional distributional semantics is the representation of words that are considered semantically vacuous from a distributional perspective, such as determiners, prepositions, relative pronouns or coordinators. This paper deals with the topic of coordination between identical syntactic types, which accounts for the majority of coordination cases in language. By exploiting the compact closed structure of the underlying category and Frobenius operators canonically induced over the fixed basis of finite-dimensional vector spaces, we provide a morphism as representation of a coordinator tensor, and we show how it lifts from atomic types to compound types. Linguistic intuitions are provided, and the importance of the Frobenius operators as an addition to the compact closed setting with regard to language is discussed.\nWe introduce Bayesian Poisson Tucker decomposition (BPTD) for modeling country--country interaction event data. 
These data consist of interaction events of the form \"country $i$ took action $a$ toward country $j$ at time $t$.\" BPTD discovers overlapping country--community memberships, including the number of latent communities. In addition, it discovers directed community--community interaction networks that are specific to \"topics\" of action types and temporal \"regimes.\" We show that BPTD yields an efficient MCMC inference algorithm and achieves better predictive performance than related models. We also demonstrate that it discovers interpretable latent structure that agrees with our knowledge of international relations.\nWe consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations. Specifically, we focus on the problem of exploration in non-tabular reinforcement learning. Drawing inspiration from the intrinsic motivation literature, we use density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitrary density model. This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels. We transform these pseudo-counts into intrinsic rewards and obtain significantly improved exploration in a number of hard games, including the infamously difficult Montezuma's Revenge.\nSpace situational awareness (SSA) is vital for international safety and security, and the future of space travel. By improving SSA data-sharing we improve global SSA. Computational ontology may provide one means toward that goal. This paper develops the ontology of the SSA domain and takes steps in the creation of the space situational awareness ontology. Ontology objectives, requirements and desiderata are outlined; and both the SSA domain and the discipline of ontology are described. 
The purposes of the ontology include: exploring the potential for ontology development and engineering to (i) represent SSA data, general domain knowledge, objects and relationships (ii) annotate and express the meaning of that data, and (iii) foster SSA data-exchange and integration among SSA actors, orbital debris databases, space object catalogs and other SSA data repositories. By improving SSA via data- and knowledge-sharing, we can (iv) expand our scientific knowledge of the space environment, (v) advance our capacity for planetary defense from near-Earth objects, and (vi) ensure the future of safe space flight for generations to come.\nWe propose and investigate a semantics for \"peer data exchange systems\" where different peers are related by data exchange constraints and trust relationships. These two elements plus the data at the peers' sites and their local integrity constraints are made compatible via a semantics that characterizes sets of \"solution instances\" for the peers. They are the intended -possibly virtual- instances for a peer that are obtained through a data repair semantics that we introduce and investigate. The semantically correct answers from a peer to a query, the so-called \"peer consistent answers\", are defined as those answers that are invariant under all its different solution instances. We show that solution instances can be specified as the models of logic programs with a stable model semantics. The repair semantics is based on null values as used in SQL databases, and is also of independent interest for repairs of single databases with respect to integrity constraints.\nPrior to seeking professional medical care it is increasingly common for patients to use online resources such as automated symptom checkers. 
Many such systems attempt to provide a differential diagnosis based on the symptoms elucidated from the user, which may lead to anxiety if life or limb-threatening conditions are part of the list, a phenomenon termed 'cyberchondria' [1]. Systems that provide advice on where to seek help, rather than a diagnosis, are equally popular, and in our view provide the most useful information. In this technical report we describe how such a triage system can be modelled computationally, how medical insights can be translated into triage flows, and how such systems can be validated and tested. We present babylon check, our commercially deployed automated triage system, as a case study, and illustrate its performance in a large, semi-naturalistic deployment study.\nSecurity Games employ game theoretical tools to derive resource allocation strategies in security domains. Recent works considered the presence of alarm systems, even suffering various forms of uncertainty, and showed that disregarding alarm signals may lead to arbitrarily bad strategies. The central problem with an alarm system, unexplored in other Security Games, is finding the best strategy to respond to alarm signals for each mobile defensive resource. The literature provides results for the basic single-resource case, showing that even in that case the problem is computationally hard. In this paper, we focus on the challenging problem of designing algorithms scaling with multiple resources. First, we focus on finding the minimum number of resources assuring non-null protection to every target. Then, we deal with the computation of multi-resource strategies with different degrees of coordination among resources. For each considered problem, we provide a computational analysis and propose algorithmic methods.\nThe massive availability of digital repositories of human thought opens radical novel way of studying the human mind. 
Natural language processing tools and computational models have evolved such that many mental conditions are predicted by analysing speech. Transcriptions of interviews and discourses are analyzed using syntactic, grammatical or sentiment analysis to infer the mental state. Here we set out to investigate whether classification of Bipolar and control subjects is possible. We develop the Emotion Intensity Index based on the Dictionary of Affect, and find that subject categories are distinguishable. Using classical classification techniques we obtain more than 75\\% labeling performance. Together with previous studies, these results show that current automated speech analysis is capable of identifying altered mental states towards a quantitative psychiatry.\nMeasuring the relationship between any pair of variables is a rich and active area of research that is central to scientific practice. In contrast, characterizing the common information among any group of variables is typically a theoretical exercise with few practical methods for high-dimensional data. A promising solution would be a multivariate generalization of the famous Wyner common information, but this approach relies on solving an apparently intractable optimization problem. We leverage the recently introduced information sieve decomposition to formulate an incremental version of the common information problem that admits a simple fixed point solution, fast convergence, and complexity that is linear in the number of variables. This scalable approach allows us to demonstrate the usefulness of common information in high-dimensional learning problems. The sieve outperforms standard methods on dimensionality reduction tasks, solves a blind source separation problem that cannot be solved with ICA, and accurately recovers structure in brain imaging data.\nWe introduce SE3-Nets, which are deep neural networks designed to model and learn rigid body motion from raw point cloud data.
Based only on sequences of depth images along with action vectors and point wise data associations, SE3-Nets learn to segment affected object parts and predict their motion resulting from the applied force. Rather than learning point wise flow vectors, SE3-Nets predict SE3 transformations for different parts of the scene. Using simulated depth data of a table top scene and a robot manipulator, we show that the structure underlying SE3-Nets enables them to generate a far more consistent prediction of object motion than traditional flow based networks. Additional experiments with a depth camera observing a Baxter robot pushing objects on a table show that SE3-Nets also work well on real data.\nRecently there has been an increasing trend to use deep learning frameworks for both 2D consumer images and for 3D medical images. However, there has been little effort to use deep frameworks for volumetric vascular segmentation. We wanted to address this by providing a freely available dataset of 12 annotated two-photon vasculature microscopy stacks. We demonstrated the use of a deep learning framework consisting of both 2D and 3D convolutional filters (ConvNet). Our hybrid 2D-3D architecture produced promising segmentation results. We derived the architectures from Lee et al. who used the ZNN framework initially designed for electron microscope image segmentation. We hope that by sharing our volumetric vasculature datasets, we will inspire other researchers to experiment with vasculature datasets and improve the network architectures used.\nLearning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components -- a reward predictor and a successor map.
The successor map represents the expected future state occupancy from any given state and the reward predictor maps states to scalar rewards. The value function of a state can be computed as the inner product between the successor map and the reward weights. In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework. DSR has several appealing properties including: increased sensitivity to distal reward changes due to factorization of reward and world dynamics, and the ability to extract bottleneck states (subgoals) given successor maps trained under a random policy. We show the efficacy of our approach on two diverse environments given raw pixel observations -- simple grid-world domains (MazeBase) and the Doom game engine.\nWe introduce a new language learning setting relevant to building adaptive natural language interfaces. It is inspired by Wittgenstein's language games: a human wishes to accomplish some task (e.g., achieving a certain configuration of blocks), but can only communicate with a computer, who performs the actual actions (e.g., removing all red blocks). The computer initially knows nothing about language and therefore must learn it from scratch through interaction, while the human adapts to the computer's capabilities. We created a game in a blocks world and collected interactions from 100 people playing it. First, we analyze the humans' strategies, showing that using compositionality and avoiding synonyms correlates positively with task performance. Second, we compare computer strategies, showing how to quickly learn a semantic parsing model from scratch, and that modeling pragmatics further accelerates learning for successful players.\nAs robots enter human environments, they will be expected to accomplish a tremendous range of tasks. 
It is not feasible for robot designers to pre-program these behaviors or know them in advance, so one way to address this is through end-user programming, such as via learning from demonstration (LfD). While significant work has been done on the mechanics of enabling robot learning from human teachers, one unexplored aspect is enabling mutual feedback between both the human teacher and robot during the learning process, i.e., implicit learning. In this paper, we explore one aspect of this mutual understanding, grounding sequences, where both a human and robot provide non-verbal feedback to signify their mutual understanding during interaction. We conducted a study where people taught an autonomous humanoid robot a dance, and performed gesture analysis to measure people's responses to the robot during correct and incorrect demonstrations.\nWe present a new type of probabilistic model which we call DISsimilarity COefficient Networks (DISCO Nets). DISCO Nets allow us to efficiently sample from a posterior distribution parametrised by a neural network. During training, DISCO Nets are learned by minimising the dissimilarity coefficient between the true distribution and the estimated distribution. This allows us to tailor the training to the loss related to the task at hand. We empirically show that (i) by modeling uncertainty on the output value, DISCO Nets outperform equivalent non-probabilistic predictive networks and (ii) DISCO Nets accurately model the uncertainty of the output, outperforming existing probabilistic models based on deep neural networks.\nThis paper presents an end-to-end framework for task-oriented dialog systems using a variant of Deep Recurrent Q-Networks (DRQN). The model is able to interface with a relational database and jointly learn policies for both language understanding and dialog strategy. Moreover, we propose a hybrid algorithm that combines the strength of reinforcement learning and supervised learning to achieve faster learning speed. 
We evaluated the proposed model on a 20 Question Game conversational game simulator. Results show that the proposed method outperforms the modular-based baseline and learns a distributed representation of the latent dialog state.\nIn this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace($\\lambda$), with three desired properties: (1) it has low variance; (2) it safely uses samples collected from any behaviour policy, whatever its degree of \"off-policyness\"; and (3) it is efficient as it makes the best use of samples collected from near on-policy behaviour policies. We analyze the contractive nature of the related operator under both off-policy policy evaluation and control settings and derive online sample-based algorithms. We believe this is the first return-based off-policy control algorithm converging a.s. to $Q^*$ without the GLIE assumption (Greedy in the Limit with Infinite Exploration). As a corollary, we prove the convergence of Watkins' Q($\\lambda$), which had been an open problem since 1989. We illustrate the benefits of Retrace($\\lambda$) on a standard suite of Atari 2600 games.\nFrom the point of view of a programmer, robopsychology is a synonym for the activity done by developers to implement their machine learning applications. This robopsychological approach raises some fundamental theoretical questions of machine learning. Our discussion of these questions is constrained to Turing machines. Alan Turing gave an algorithm (aka the Turing Machine) to describe algorithms. Applying it to describe itself brings us to Turing's notion of the universal machine. In the present paper, we investigate algorithms to write algorithms. 
From a pedagogical point of view, this way of writing programs can be considered as a combination of learning by listening and learning by doing because it is based on applying agent technology and machine learning. As the main result, we introduce the problem of learning and then show that it cannot easily be handled in reality; therefore, it is reasonable to use machine learning algorithms for learning Turing machines.\nWe present a new combinatorial market maker that operates arbitrage-free combinatorial prediction markets specified by integer programs. Although the problem of arbitrage-free pricing, while maintaining a bound on the subsidy provided by the market maker, is #P-hard in the worst case, we posit that the typical case might be amenable to modern integer programming (IP) solvers. At the crux of our method is the Frank-Wolfe (conditional gradient) algorithm, which is used to implement a Bregman projection aligned with the market maker's cost function, using an IP solver as an oracle. We demonstrate the tractability and improved accuracy of our approach on real-world prediction market data from combinatorial bets placed on the 2010 NCAA Men's Division I Basketball Tournament, where the outcome space is of size 2^63. To our knowledge, this is the first implementation and empirical evaluation of an arbitrage-free combinatorial prediction market on this scale.\nEnabling a computer to understand a document so that it can answer comprehension questions is a central, yet unsolved goal of NLP. A key factor impeding its solution by machine-learned systems is the limited availability of human-annotated data. Hermann et al. (2015) seek to solve this problem by creating over a million training examples by pairing CNN and Daily Mail news articles with their summarized bullet points, and show that a neural network can then be trained to give good performance on this task. In this paper, we conduct a thorough examination of this new reading comprehension task. 
Our primary aim is to understand what depth of language understanding is required to do well on this task. We approach this from one side by doing a careful hand-analysis of a small subset of the problems and from the other by showing that simple, carefully designed systems can obtain accuracies of 73.6% and 76.6% on these two datasets, exceeding current state-of-the-art results by 7-10% and approaching what we believe is the ceiling for performance on this task.\nUnderstanding user instructions in natural language is an active research topic in AI and robotics. Typically, natural user instructions are high-level and can be reduced into low-level tasks expressed in common verbs (e.g., `take', `get', `put'). For robots understanding such instructions, one of the key challenges is to process high-level user instructions and achieve the specified tasks with robots' primitive actions. To address this, we propose novel algorithms by utilizing semantic roles of common verbs defined in semantic dictionaries and integrating multiple sources of open knowledge to generate task plans. Specifically, we present a new method for matching and recovering semantics of user instructions and a novel task planner that exploits functional knowledge of the robot's action model. To verify and evaluate our approach, we implemented a prototype system using knowledge from several open resources. Experiments on our system confirmed the correctness and efficiency of our algorithms. Notably, our system has been deployed in the KeJia robot, which participated in the annual RoboCup@Home competitions in the past three years and achieved encouragingly high scores in the benchmark tests.\nIn this paper we present a new neurobiologically-inspired affective cognitive architecture: NEUCOGAR (NEUromodulating COGnitive ARchitecture). 
The objective of NEUCOGAR is the identification of a mapping from the influence of serotonin, dopamine and noradrenaline to the computing processes based on Von Neumann's architecture, in order to implement affective phenomena which can operate on the Turing machine model. As the basis of the modeling, we use and extend the L\\\"ovheim Cube of Emotion with parameters of the Von Neumann architecture. Validation is conducted via simulation on a computing system of dopamine neuromodulation and its effects on the Cortex. In the experimental phase of the project, the increase of computing power and storage redistribution due to emotion stimulus modulated by the dopamine system confirmed the soundness of the model.\nWord embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative topic embedding model to combine the two types of patterns. In our model, topics are represented by embedding vectors, and are shared across documents. The probability of each word is influenced by both its local context and its topic. A variational inference method yields the topic embeddings as well as the topic mixing proportions for each document. Jointly they represent the document in a low-dimensional continuous space. In two document classification tasks, our method performs better than eight existing methods, with fewer features. In addition, we illustrate with an example that our method can generate coherent topics even based on only one document.\nRecurrent neural networks such as the GRU and LSTM have found wide adoption in natural language processing and achieve state-of-the-art results for many tasks. 
These models are characterized by a memory state that can be written to and read from by applying gated composition operations to the current input and the previous state. However, they only cover a small subset of potentially useful compositions. We propose Multi-Function Recurrent Units (MuFuRUs) that allow for arbitrary differentiable functions as composition operations. Furthermore, MuFuRUs allow for a learned, input- and state-dependent choice of these composition operations. Our experiments demonstrate that the additional functionality helps in different sequence modeling tasks, including the evaluation of propositional logic formulae, language modeling and sentiment analysis.\nFor an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL). A CIRL problem is a cooperative, partial-information game with two agents, human and robot; both are rewarded according to the human's reward function, but the robot does not initially know what this is. In contrast to classical IRL, where the human is assumed to act optimally in isolation, optimal CIRL solutions produce behaviors such as active teaching, active learning, and communicative actions that are more effective in achieving value alignment. We show that computing optimal joint policies in CIRL games can be reduced to solving a POMDP, prove that optimality in isolation is suboptimal in CIRL, and derive an approximate CIRL algorithm.\nThe gossip problem, in which information (known as secrets) must be shared among a certain number of agents using the minimum number of calls, is of interest in the design of communication networks and protocols. We extend the gossip problem to arbitrary epistemic depths. 
For example, we may require not only that all agents know all secrets but also that all agents know that all agents know all secrets. We give optimal protocols for various versions of the generalised gossip problem, depending on the graph of communication links, in the case of two-way communications, one-way communications and parallel communications. We also study different variants which allow us to impose negative goals such as that certain agents must not know certain secrets. We show that in the presence of negative goals, testing the existence of a successful protocol is NP-complete, whereas this is always polynomial-time in the case of purely positive goals.\nDecision-making is often dependent on uncertain data, e.g. data associated with confidence scores or probabilities. We present a comparison of different information presentations for uncertain data and, for the first time, measure their effects on human decision-making. We show that the use of Natural Language Generation (NLG) improves decision-making under uncertainty, compared to state-of-the-art graphical-based representation methods. In a task-based study with 442 adults, we found that presentations using NLG led to 24% better decision-making on average than the graphical presentations, and to 44% better decision-making when NLG is combined with graphics. We also show that women achieve significantly better results when presented with NLG output (an 87% increase on average compared to graphical presentations).\nIn various areas of computer science, the problem of dealing with a set of constraints arises. If the set of constraints is unsatisfiable, one may ask for a minimal description of the reason for this unsatisfiability. Minimal unsatisfiable subsets (MUSes) and maximal satisfiable subsets (MSSes) are two kinds of such minimal descriptions. The goal of this work is the enumeration of MUSes and MSSes for a given constraint system. 
As such full enumeration may be intractable in general, we focus on building an online algorithm, which produces MUSes/MSSes in an on-the-fly manner as soon as they are discovered. The problem has been studied before, even in its online version. However, our algorithm uses a novel approach that is able to outperform current state-of-the-art algorithms for online MUS/MSS enumeration. Moreover, the performance of our algorithm can be adjusted using tunable parameters. We evaluate the algorithm on a set of benchmarks.\nThe complex nature of big data resources demands new methods for structuring, especially for textual content. WordNet is a good knowledge source for comprehensive abstraction of natural language as its good implementations exist for many languages. Since WordNet embeds natural language in the form of a complex network, a transformation mechanism, WordNet2Vec, is proposed in the paper. It creates vectors for each word from WordNet. These vectors encapsulate the general position (role) of a given word towards all other words in the natural language. Any list or set of such vectors contains knowledge about the context of its component within the whole language. Such word representation can be easily applied to many analytic tasks like classification or clustering. The usefulness of the WordNet2Vec method was demonstrated in sentiment analysis, i.e. classification with transfer learning for the real Amazon opinion textual dataset.\nEncoder-decoder networks are popular for modeling sequences probabilistically in many applications. These models use the power of the Long Short-Term Memory (LSTM) architecture to capture the full dependence among variables, unlike earlier models like CRFs that typically assumed conditional independence among non-adjacent variables. However, in practice encoder-decoder models exhibit a bias towards short sequences that surprisingly gets worse with increasing beam size. 
In this paper, we show that this phenomenon is due to a discrepancy between the full sequence margin and the per-element margin enforced by the locally conditioned training objective of an encoder-decoder model. The discrepancy more adversely impacts long sequences, explaining the bias towards predicting short sequences. For the case where the predicted sequences come from a closed set, we show that a globally conditioned model alleviates the above problems of encoder-decoder models. From a practical point of view, our proposed model also eliminates the need for a beam search during inference, which reduces to an efficient dot-product-based search in a vector space.\nGibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan do not differ by more than a logarithmic factor, we show by counterexample that this is not the case, and we prove that the mixing times do not differ by more than a polynomial factor under mild conditions. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance.\nSupervised machine learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? We want models to be not only good, but interpretable. And yet the task of interpretation appears underspecified. Papers provide diverse and sometimes non-overlapping motivations for interpretability, and offer myriad notions of what attributes render models interpretable. 
Despite this ambiguity, many papers proclaim interpretability axiomatically, absent further explanation. In this paper, we seek to refine the discourse on interpretability. First, we examine the motivations underlying interest in interpretability, finding them to be diverse and occasionally discordant. Then, we address model properties and techniques thought to confer interpretability, identifying transparency to humans and post-hoc explanations as competing notions. Throughout, we discuss the feasibility and desirability of different notions, and question the oft-made assertions that linear models are interpretable and that deep neural networks are not.\nIn this paper we present a clean, yet effective, model for word sense disambiguation. Our approach leverages a bidirectional long short-term memory network which is shared between all words. This enables the model to share statistical strength and to scale well with vocabulary size. The model is trained end-to-end, directly from the raw text to sense labels, and makes effective use of word order. We evaluate our approach on two standard datasets, using identical hyperparameter settings, which are in turn tuned on a third set of held-out data. We employ no external resources (e.g. knowledge graphs, part-of-speech tagging, etc.), language-specific features, or hand-crafted rules, but still achieve statistically equivalent results to the best state-of-the-art systems, which are not subject to such limitations.\nThe Inland Revenue Services is overwhelmed with gigabytes of disk capacity containing data about taxpayers in the state. The data stored on the database increases in size at an alarming rate. 
This has resulted in a data-rich but information-poor situation where there is a widening gap between the explosive growth of data and its types, and the ability to analyze and interpret it effectively, hence the need for a new generation of automated and intelligent tools and techniques known as investigative data mining, to look for patterns in data. These patterns can lead to new insights, competitive advantages for business, and tangible benefits for the State Revenue services. This research work focuses on designing an effective fraud detection and deterrence architecture using investigative data mining techniques. The proposed system architecture is designed to reason using Artificial Neural Networks and machine learning algorithms in order to detect and deter fraudulent activities. We recommend that the architectural framework be developed using object-oriented and agent-oriented programming languages.\nA backbone of a boolean formula $F$ is a collection $S$ of its variables for which there is a unique partial assignment $a_S$ such that $F[a_S]$ is satisfiable [MZK+99,WGS03]. This paper studies the nontransparency of backbones. We show that, under the widely believed assumption that integer factoring is hard, there exist sets of boolean formulas that have obvious, nontrivial backbones yet finding the values, $a_S$, of those backbones is intractable. We also show that, under the same assumption, there exist sets of boolean formulas that obviously have large backbones yet producing such a backbone $S$ is intractable. Further, we show that if integer factoring is not merely worst-case hard but is frequently hard, as is widely believed, then the frequency of hardness in our two results is not too much less than that frequency.\nWe introduce an online popularity prediction and tracking task as a benchmark task for reinforcement learning with a combinatorial, natural language action space. 
A specified number of discussion threads predicted to be popular are recommended, chosen from a fixed window of recent comments to track. Novel deep reinforcement learning architectures are studied for effective modeling of the value function associated with actions composed of interdependent sub-actions. The proposed model, which represents dependence between sub-actions through a bi-directional LSTM, gives the best performance across different experimental configurations and domains, and it also generalizes well with varying numbers of recommendation requests.\nWe propose a methodology to help understand the shortcomings of public transportation in a city via the mining of complex networks representing the supply and demand of public transport. We show how to build these networks based upon data on smart card use in buses via the application of algorithms that estimate an origin-destination (OD) matrix and reconstruct the complete itinerary of the passengers. The overlapping of the two networks sheds light on potential overload and waste in the offer of resources that can be mitigated with strategies for balancing supply and demand.\nWe describe MITRE's submission to the SemEval-2016 Task 6, Detecting Stance in Tweets. This effort achieved the top score in Task A on supervised stance detection, producing an average F1 score of 67.8 when assessing whether a tweet author was in favor of or against a topic. We employed a recurrent neural network initialized with features learned via distant supervision on two large unlabeled datasets. We trained embeddings of words and phrases with the word2vec skip-gram method, then used those features to learn sentence representations via a hashtag prediction auxiliary task. These sentence vectors were then fine-tuned for stance detection on several hundred labeled examples. 
The result was a high-performing system that used transfer learning to maximize the value of the available training data.\nCommunity detection has attracted considerable attention across many areas, as it can be used for discovering the structure and features of complex networks. With the increasing size of social networks in the real world, community detection approaches should be fast and accurate. The Label Propagation Algorithm (LPA) is known to be one of the near-linear solutions and benefits from easy implementation; thus, it forms a good basis for efficient community detection methods. In this paper, we extend the update rule and propagation criterion of LPA in the framework of belief functions. A new community detection approach, called Evidential Label Propagation (ELP), is proposed as an enhanced version of conventional LPA. The node influence is first defined to guide the propagation process. The plausibility is used to determine the domain label of each node. The update order of nodes is discussed to improve the robustness of the method. The ELP algorithm converges once the domain labels of all the nodes stop changing. Finally, the mass assignments are calculated as the memberships of nodes. Overlapping nodes and outliers can be detected simultaneously through the proposed method. The experimental results demonstrate the effectiveness of ELP.\nProbabilistic models analyze data by relying on a set of assumptions. Data that exhibit deviations from these assumptions can undermine inference and prediction quality. Robust models offer protection against mismatch between a model's assumptions and reality. We propose a way to systematically detect and mitigate mismatch of a large class of probabilistic models. The idea is to raise the likelihood of each observation to a weight and then to infer both the latent variables and the weights from data. Inferring the weights allows a model to identify observations that match its assumptions and down-weight others. 
This enables robust inference and improves predictive accuracy. We study four different forms of mismatch with reality, ranging from missing latent groups to structure misspecification. A Poisson factorization analysis of the Movielens 1M dataset shows the benefits of this approach in a practical scenario.\nMany important NLP problems can be posed as dual-sequence or sequence-to-sequence modeling tasks. Recent advances in building end-to-end neural architectures have been highly successful in solving such tasks. In this work we propose a new architecture for dual-sequence modeling that is based on associative memory. We derive AM-RNNs, a recurrent associative memory (AM) which augments generic recurrent neural networks (RNN). This architecture is extended to the Dual AM-RNN which operates on two AMs at once. Our models achieve very competitive results on textual entailment. A qualitative analysis demonstrates that long range dependencies between source and target-sequence can be bridged effectively using Dual AM-RNNs. However, an initial experiment on auto-encoding reveals that these benefits are not exploited by the system when learning to solve sequence-to-sequence tasks which indicates that additional supervision or regularization is needed.\nWe describe a system to detect objects in three-dimensional space using video and inertial sensors (accelerometer and gyrometer), ubiquitous in modern mobile platforms from phones to drones. Inertials afford the ability to impose class-specific scale priors for objects, and provide a global orientation reference. A minimal sufficient representation, the posterior of semantic (identity) and syntactic (pose) attributes of objects in space, can be decomposed into a geometric term, which can be maintained by a localization-and-mapping filter, and a likelihood function, which can be approximated by a discriminatively-trained convolutional neural network. 
The resulting system can process the video stream causally in real time, and provides a representation of objects in the scene that is persistent: Confidence in the presence of objects grows with evidence, and objects previously seen are kept in memory even when temporarily occluded, with their return into view automatically predicted to prime re-detection.\nThere is intense interest in applying machine learning to problems of causal inference in fields such as healthcare, economics and education. In particular, individual-level causal inference has important applications such as precision medicine. We give a new theoretical analysis and family of algorithms for predicting individual treatment effect (ITE) from observational data, under the assumption known as strong ignorability. The algorithms learn a \"balanced\" representation such that the induced treated and control distributions look similar. We give a novel, simple and intuitive generalization-error bound showing that the expected ITE estimation error of a representation is bounded by a sum of the standard generalization-error of that representation and the distance between the treated and control distributions induced by the representation. We use Integral Probability Metrics to measure distances between distributions, deriving explicit bounds for the Wasserstein and Maximum Mean Discrepancy (MMD) distances. Experiments on real and simulated data show the new algorithms match or outperform the state-of-the-art.\nThe inherent inflexibility and incompleteness of commonsense knowledge bases (KB) has limited their usefulness. We describe a system called Displacer for performing KB queries extended with the analogical capabilities of the word2vec distributional semantic vector space (DSVS). This allows the system to answer queries with information which was not contained in the original KB in any form. 
By performing analogous queries on semantically related terms and mapping their answers back into the context of the original query using displacement vectors, we are able to give approximate answers to many questions which, if posed to the KB alone, would return no results. We also show how the hand-curated knowledge in a KB can be used to increase the accuracy of a DSVS in solving analogy problems. In these ways, a KB and a DSVS can make up for each other's weaknesses.\nBacterial Foraging Optimization (BFO) is one of the most widely used metaheuristic algorithms for solving optimization problems. BFO imitates the foraging behavior of groups of bacteria such as E. coli. The main aim of the algorithm is to eliminate bacteria that have weak foraging methods and to retain bacteria that have strong foraging methods. To this end, each bacterium communicates with other bacteria by sending signals, such that a bacterium changes its position in the next step if prior factors have been satisfied. In effect, the algorithm allows bacteria to follow nutrients toward the optimum. In this paper, BFO is used to solve the Quadratic Assignment Problem (QAP) and the multi-objective QAP (mQAP) by using updating mechanisms including mutation, crossover, and a local search.\nHuman interactions are characterized by explicit as well as implicit channels of communication. While the explicit channel transmits overt messages, the implicit ones transmit hidden messages about the communicator (e.g., his/her intentions and attitudes). There is a growing consensus that providing a computer with the ability to manipulate implicit affective cues should allow for a more meaningful and natural way of studying particular non-verbal signals of human-human communications by human-computer interactions. 
In this pilot study, we created a non-dynamic human-computer interaction while manipulating three specific non-verbal channels of communication: gaze pattern, facial expression, and gesture. Participants rated the virtual agent on affective dimensional scales (pleasure, arousal, and dominance) while their physiological signal (electrodermal activity, EDA) was captured during the interaction. Assessment of the behavioral data revealed a significant and complex three-way interaction between gaze, gesture, and facial configuration on the dimension of pleasure, as well as a main effect of gesture on the dimension of dominance. These results suggest a complex relationship between different non-verbal cues and the social context in which they are interpreted. Qualifying considerations as well as possible next steps are further discussed in light of these exploratory findings.\nIn this paper, we describe a case study in a big metropolis, in which from data collected by digital sensors, we tried to understand mobility patterns of persons using buses and how this can generate knowledge to suggest interventions that are applied incrementally into the transportation network in use. We have first estimated an Origin-Destination matrix of buses users from datasets about the ticket validation and GPS positioning of buses. Then we represent the supply of buses with their routes through bus stops as a complex network, which allowed us to understand the bottlenecks of the current scenario and, in particular, applying community discovery techniques, to identify clusters that the service supply infrastructure has. Finally, from the superimposing of the flow of people represented in the OriginDestination matrix in the supply network, we exemplify how micro-interventions can be prospected by means of an example of the introduction of express routes.\nSpreadsheet workbook contents are simple programs. 
Because of this, probabilistic programming techniques can be used to perform Bayesian inversion of spreadsheet computations. What is more, existing execution engines in spreadsheet applications such as Microsoft Excel can be made to do this using only built-in functionality. We demonstrate this by developing a native Excel implementation of both a particle Markov Chain Monte Carlo variant and black-box variational inference for spreadsheet probabilistic programming. The resulting engine performs probabilistically coherent inference over spreadsheet computations, notably including spreadsheets that include user-defined black-box functions. Spreadsheet engines that choose to integrate the functionality we describe in this paper will give their users the ability to both easily develop probabilistic models and maintain them over time by including actuals via a simple user-interface mechanism. For spreadsheet end-users this would mean having access to efficient and probabilistically coherent probabilistic modeling and inference for use in all kinds of decision making under uncertainty.\nIn many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performances of several different neural network architectures in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm --- the parallel knowledge gradient method. By construction, this method provides the one-step Bayes optimal batch of points to sample. 
We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms on both synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy.\nWe propose Logic Tensor Networks: a uniform framework for integrating automatic learning and reasoning. A logic formalism called Real Logic is defined on a first-order language whereby formulas have truth-value in the interval [0,1] and semantics defined concretely on the domain of real numbers. Logical constants are interpreted as feature vectors of real numbers. Real Logic promotes a well-founded integration of deductive reasoning on a knowledge-base and efficient data-driven relational machine learning. We show how Real Logic can be implemented in deep Tensor Neural Networks with the use of Google's tensorflow primitives. The paper concludes with experiments applying Logic Tensor Networks on a simple but representative example of knowledge completion.\nSymmetry is the essential element of lifted inference that has recently demonstrated the possibility to perform very efficient inference in highly-connected, but symmetric probabilistic models. This raises the question of whether this holds for optimisation problems in general. Here we show that for a large class of optimisation methods this is actually the case. More precisely, we introduce the concept of fractional symmetries of convex quadratic programs (QPs), which lie at the heart of many machine learning approaches, and exploit it to lift, i.e., to compress QPs. These lifted QPs can then be tackled with the usual optimization toolbox (off-the-shelf solvers, cutting plane algorithms, stochastic gradients etc.). 
If the original QP exhibits symmetry, then the lifted one will generally be more compact, and hence their optimization is likely to be more efficient.\nFirst-order knowledge compilation techniques have proven efficient for lifted inference. They compile a relational probability model into a target circuit on which many inference queries can be answered efficiently. Early methods used data structures as their target circuit. In our KR-2016 paper, we showed that compiling to a low-level program instead of a data structure offers orders of magnitude speedup, resulting in the state-of-the-art lifted inference technique. In this paper, we conduct experiments to address two questions regarding our KR-2016 results: 1- does the speedup come from more efficient compilation or more efficient reasoning with the target circuit?, and 2- why are low-level programs more efficient target circuits than data structures?\nWith the aim of studying social properties of belief merging and having a better understanding of impossibility, we extend in three ways the framework of logic-based merging introduced by Konieczny and Pino P\\'erez. First, at the level of representation of the information, we pass from belief bases to complex epistemic states. Second, the profiles are represented as functions of finite societies to the set of epistemic states (a sort of vectors) and not as multisets of epistemic states. Third, we extend the set of rational postulates in order to consider the epistemic versions of the classical postulates of Social Choice Theory: Standard Domain, Pareto Property, Independence of Irrelevant Alternatives and Absence of Dictator. These epistemic versions of social postulates are given, essentially, in terms of the finite propositional logic. We state some representation theorems for these operators. These extensions and representation theorems allow us to establish an epistemic and very general version of Arrow's Impossibility Theorem. 
One of the interesting features of our result is that it holds for different representations of epistemic states; for instance conditionals, Ordinal Conditional functions and, of course, total preorders.\nWe present and evaluate a new model for Natural Language Generation (NLG) in Spoken Dialogue Systems, based on statistical planning, given noisy feedback from the current generation context (e.g. a user and a surface realiser). We study its use in a standard NLG problem: how to present information (in this case a set of search results) to users, given the complex trade-offs between utterance length, amount of information conveyed, and cognitive load. We set these trade-offs by analysing existing MATCH data. We then train an NLG policy using Reinforcement Learning (RL), which adapts its behaviour to noisy feedback from the current generation context. This policy is compared to several baselines derived from previous work in this area. The learned policy significantly outperforms all the prior approaches.\nWe present a novel deep recurrent neural network architecture that learns to build implicit plans in an end-to-end manner by purely interacting with an environment in a reinforcement learning setting. The network builds an internal plan, which is continuously updated upon observation of the next input from the environment. It can also partition this internal representation into contiguous subsequences by learning for how long the plan can be committed to, i.e. followed without re-planning. Combining these properties, the proposed model, dubbed STRategic Attentive Writer (STRAW) can learn high-level, temporally abstracted macro-actions of varying lengths that are solely learnt from data without any prior information. These macro-actions enable both structured exploration and economic computation. We experimentally demonstrate that STRAW delivers strong improvements on several ATARI games by employing temporally extended planning strategies (e.g. Ms. 
Pacman and Frostbite). It is at the same time a general algorithm that can be applied on any sequence data. To that end, we also show that when trained on text prediction task, STRAW naturally predicts frequent n-grams (instead of macro-actions), demonstrating the generality of the approach.\nWe describe ASAGA, an asynchronous parallel version of the incremental gradient algorithm SAGA that enjoys fast linear convergence rates. Through a novel perspective, we revisit and clarify a subtle but important technical issue present in a large fraction of the recent convergence rate proofs for asynchronous parallel optimization algorithms, and propose a simplification of the recently introduced \"perturbed iterate\" framework that resolves it. We thereby prove that ASAGA can obtain a theoretical linear speedup on multi-core systems even without sparsity assumptions. We present results of an implementation on a 40-core architecture illustrating the practical speedup as well as the hardware overhead.\nOur goal is to identify beneficial interventions from observational data. We consider interventions that are narrowly focused (impacting few covariates) and may be tailored to each individual or globally enacted over a population. For applications where harmful intervention is drastically worse than proposing no change, we propose a conservative definition of the optimal intervention. Assuming the underlying relationship remains invariant under intervention, we develop efficient algorithms to identify the optimal intervention policy from limited data and provide theoretical guarantees for our approach in a Gaussian Process setting. Although our methods assume covariates can be precisely adjusted, they remain capable of improving outcomes in misspecified settings where interventions incur unintentional downstream effects. 
Empirically, our approach identifies good interventions in two practical applications: gene perturbation and writing improvement.\nWe develop a belief space planning (BSP) approach that advances the state of the art by incorporating reasoning about data association (DA) within planning, while considering additional sources of uncertainty. Existing BSP approaches typically assume data association is given and perfect, an assumption that can be harder to justify while operating, in the presence of localization uncertainty, in ambiguous and perceptually aliased environments. In contrast, our data association aware belief space planning (DA-BSP) approach explicitly reasons about DA within belief evolution, and as such can better accommodate these challenging real world scenarios. In particular, we show that due to perceptual aliasing, the posterior belief becomes a mixture of probability distribution functions, and design cost functions that measure the expected level of ambiguity and posterior uncertainty. Using these and standard costs (e.g. control penalty, distance to goal) within the objective function yields a general framework that reliably represents action impact and, in particular, is capable of active disambiguation. Our approach is thus applicable to robust active perception and autonomous navigation in perceptually aliased environments. We demonstrate key aspects in basic and realistic simulations.\nDeep Reinforcement Learning (DRL) is a trending field of research, showing great promise in challenging problems such as playing Atari, solving Go and controlling robots. While DRL agents perform well in practice we are still lacking the tools to analyze their performance. In this work we present the Semi-Aggregated MDP (SAMDP) model, a model best suited to describe policies exhibiting both spatial and temporal hierarchies. 
We describe its advantages for analyzing trained policies over other modeling approaches, and show that under the right state representation, like that of DQN agents, SAMDP can help to identify skills. We detail the automatic process of creating it from recorded trajectories, up to presenting it on t-SNE maps. We explain how to evaluate its fitness and show surprising results indicating high compatibility with the policy at hand. We conclude by showing how using the SAMDP model, an extra performance gain can be squeezed from the agent.\nTransfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks. We propose a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same. Our approach rests on two key ideas: \"successor features\", a value function representation that decouples the dynamics of the environment from the rewards, and \"generalized policy improvement\", a generalization of dynamic programming's policy improvement operation that considers a set of policies rather than a single one. Put together, the two ideas lead to an approach that integrates seamlessly within the reinforcement learning framework and allows the free exchange of information across tasks. The proposed method also provides performance guarantees for the transferred policy even before any learning has taken place. We derive two theorems that set our approach in firm theoretical ground and present experiments that show that it successfully promotes transfer in practice, significantly outperforming alternative methods in a sequence of navigation tasks and in the control of a simulated robotic arm.\nWe show how to estimate a model's test error from unlabeled data, on distributions very different from the training distribution, while assuming only that certain conditional independencies are preserved between train and test. 
We do not need to assume that the optimal predictor is the same between train and test, or that the true distribution lies in any parametric family. We can also efficiently differentiate the error estimate to perform unsupervised discriminative learning. Our technical tool is the method of moments, which allows us to exploit conditional independencies in the absence of a fully-specified model. Our framework encompasses a large family of losses including the log and exponential loss, and extends to structured output settings such as hidden Markov models.\nWe propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute. Our approach is based on an interrelated set of measures of expressivity, unified by the novel notion of trajectory length, which measures how the output of a network changes as the input sweeps along a one-dimensional path. Our findings can be summarized as follows: (1) The complexity of the computed function grows exponentially with depth. (2) All weights are not equal: trained networks are more sensitive to their lower (initial) layer weights. (3) Regularizing on trajectory length (trajectory regularization) is a simpler alternative to batch normalization, with the same performance.\nThe capability to store data about business processes execution in so-called Event Logs has led to the diffusion of tools for the analysis of process executions and for the assessment of the goodness of a process model. Nonetheless, these tools are often very rigid in dealing with Event Logs that include incomplete information about the process execution. Thus, while the ability to handle incomplete event data is one of the challenges mentioned in the process mining manifesto, the evaluation of compliance of an execution trace still requires an end-to-end complete trace to be performed. 
This paper exploits the power of abduction to provide a flexible, yet computationally effective, framework to deal with different forms of incompleteness in an Event Log. Moreover, it proposes a refinement of the classical notion of compliance into strong and conditional compliance to take into account incomplete logs. Finally, a performance evaluation in an experimental setting shows the feasibility of the presented approach.\nConcept Trees are a type of database that can organise arbitrary textual information using a very simple rule. Each tree tries to represent a single cohesive concept and the trees can link with each other for navigation and semantic purposes. The trees are therefore a type of semantic network and would benefit from having a consistent level of context for each of the nodes. The Concept Tree nodes have a mathematical basis allowing for a consistent build process. These would represent nouns or verbs in a text sentence, for example. New to the design can then be lists of descriptive elements for each of the nodes. The descriptors can also be weighted, but do not have to follow the strict counting rule of the tree nodes. With the new descriptive layers, a much richer type of knowledge can be achieved and a consistent method for adding context is suggested. It is also suggested to use the linking structure of the licas system as basis for the context links. The mathematical model is extended further and to finish, a query language is suggested for practical applications.\nThe Random Mutation Hill-Climbing algorithm is a direct search technique mostly used in discrete domains. It repeats the process of randomly selecting a neighbour of a best-so-far solution and accepts the neighbour if it is better than or equal to it. In this work, we propose to use a novel method to select the neighbour solution using a set of independent multi-armed bandit-style selection units, which results in a bandit-based Random Mutation Hill-Climbing algorithm. 
The new algorithm significantly outperforms Random Mutation Hill-Climbing in both OneMax (in noise-free and noisy cases) and Royal Road problems (in the noise-free case). The algorithm shows particular promise for discrete optimisation problems where each fitness evaluation is expensive.\nProduct classification is the task of automatically predicting a taxonomy path for a product in a predefined taxonomy hierarchy given a textual product description or title. For efficient product classification we require a suitable representation for a document (the textual description of a product) feature vector and efficient and fast algorithms for prediction. To address the above challenges, we propose a new distributional semantics representation for document vector formation. We also develop a new two-level ensemble approach utilizing (with respect to the taxonomy tree) a path-wise, node-wise and depth-wise classifiers for error reduction in the final product classification. Our experiments show the effectiveness of the distributional representation and the ensemble approach on data sets from a leading e-commerce platform and achieve better results on various evaluation metrics compared to earlier approaches.\nThis paper addresses a question about music cognition: how do we derive polymetric structures. A preference rule system is presented which is implemented into a drum computer. The preference rule system allows inferring local polymetric structures, like two-over-three and three-over-two. By analyzing the micro-timing of West African percussion music a timing pattern consisting of six pulses was discovered. It integrates binary and ternary rhythmic feels. The presented drum computer integrates the discovered superimposed polymetric swing (timing and velocity) appropriate to the rhythmic sequence the user inputs. 
For binary sequences, the amount of binary swing is increased and for ternary sequences, the ternary swing is increased.\nIn statistical relational learning, the link prediction problem is key to automatically understand the structure of large knowledge bases. As in previous studies, we propose to solve this problem through latent factorization. However, here we make use of complex valued embeddings. The composition of complex embeddings can handle a large variety of binary relations, among them symmetric and antisymmetric relations. Compared to state-of-the-art models such as Neural Tensor Network and Holographic Embeddings, our approach based on complex embeddings is arguably simpler, as it only uses the Hermitian dot product, the complex counterpart of the standard dot product between real vectors. Our approach is scalable to large datasets as it remains linear in both space and time, while consistently outperforming alternative approaches on standard link prediction benchmarks.\nCan we train a system that, on any new input, either says \"don't know\" or makes a prediction that is guaranteed to be correct? We answer the question in the affirmative provided our model family is well-specified. Specifically, we introduce the unanimity principle: only predict when all models consistent with the training data predict the same output. We operationalize this principle for semantic parsing, the task of mapping utterances to logical forms. We develop a simple, efficient method that reasons over the infinite set of all consistent models by only checking two of the models. We prove that our method obtains 100% precision even with a modest amount of training data from a possibly adversarial distribution. 
Empirically, we demonstrate the effectiveness of our approach on the standard GeoQuery dataset.\nOntologies provide conceptual abstractions over data, in domains such as the Internet of Things, in a way that sensor data can be harvested and interpreted by people and applications. The Semantic Sensor Network (SSN) ontology is the de-facto standard for semantic representation of sensor observations and metadata, and it is used at the core of the open source platform for the Internet of Things, OpenIoT. In this paper we present a Schema Editor that provides an intuitive web interface for defining new types of sensors, and concrete instances of them, using the SSN ontology as the core model. This editor is fully integrated with the OpenIoT platform for generating virtual sensor descriptions and automating their semantic annotation and registration process.\nKnowledge bases are useful resources for many natural language processing tasks, however, they are far from complete. In this paper, we define a novel entity representation as a mixture of its neighborhood in the knowledge base and apply this technique on TransE-a well-known embedding model for knowledge base completion. Experimental results show that the neighborhood information significantly helps to improve the results of the TransE model, leading to better performance than obtained by other state-of-the-art embedding models on three benchmark datasets for triple classification, entity prediction and relation prediction tasks.\nThis extended abstract presents an overview on NP-hard optimization problems with multiple interdependent components. These problems occur in many real-world applications: industrial applications, engineering, and logistics. The fact that these problems are composed of many sub-problems that are NP-hard makes them even more challenging to solve using exact algorithms. This is mainly due to the high complexity of this class of algorithms and the hardness of the problems themselves. 
The main source of difficulty of these problems is the presence of internal dependencies between sub-problems. This aspect of interdependence of components is presented, and some outlines on solving approaches are briefly introduced from a (meta)heuristics and evolutionary computation perspective.\nZero-sum stochastic games provide a rich model for competitive decision making. However, under general forms of state uncertainty as considered in the Partially Observable Stochastic Game (POSG), such decision making problems are still not very well understood. This paper makes a contribution to the theory of zero-sum POSGs by characterizing structure in their value function. In particular, we introduce a new formulation of the value function for zs-POSGs as a function of the \"plan-time sufficient statistics\" (roughly speaking the information distribution in the POSG), which has the potential to enable generalization over such information distributions. We further delineate this generalization capability by proving a structural result on the shape of value function: it exhibits concavity and convexity with respect to appropriately chosen marginals of the statistic space. This result is a key pre-cursor for developing solution methods that may be able to exploit such structure. Finally, we show how these results allow us to reduce a zs-POSG to a \"centralized\" model with shared observations, thereby transferring results for the latter, narrower class, to games with individual (private) observations.\nA core problem in learning semantic parsers from denotations is picking out consistent logical forms--those that yield the correct denotation--from a combinatorially large space. To control the search space, previous work relied on restricted set of rules, which limits expressivity. In this paper, we consider a much more expressive class of logical forms, and show how to use dynamic programming to efficiently represent the complete set of consistent logical forms. 
Expressivity also introduces many more spurious logical forms which are consistent with the correct denotation but do not represent the meaning of the utterance. To address this, we generate fictitious worlds and use crowdsourced denotations on these worlds to filter out spurious logical forms. On the WikiTableQuestions dataset, we increase the coverage of answerable questions from 53.5% to 76%, and the additional crowdsourced supervision lets us rule out 92.1% of spurious logical forms.\nConstraint-based causal discovery from limited data is a notoriously difficult challenge due to the many borderline independence test decisions. Several approaches to improve the reliability of the predictions by exploiting redundancy in the independence information have been proposed recently. Though promising, existing approaches can still be greatly improved in terms of accuracy and scalability. We present a novel method that reduces the combinatorial explosion of the search space by using a more coarse-grained representation of causal information, drastically reducing computation time. Additionally, we propose a method to score causal predictions based on their confidence. Crucially, our implementation also allows one to easily combine observational and interventional data and to incorporate various types of available background knowledge. We prove soundness and asymptotic consistency of our method and demonstrate that it can outperform the state-of-the-art on synthetic data, achieving a speedup of several orders of magnitude. We illustrate its practical feasibility by applying it on a challenging protein data set.\nConversational agents (\"bots\") are beginning to be widely used in conversational interfaces. To design a system that is capable of emulating human-like interactions, a conversational layer that can serve as a fabric for chat-like interaction with the agent is needed. 
In this paper, we introduce a model that employs Information Retrieval by utilizing convolutional deep structured semantic neural network-based features in the ranker to present human-like responses in ongoing conversation with a user. In conversations, accounting for context is critical to the retrieval model; we show that our context-sensitive approach using a Convolutional Deep Structured Semantic Model (cDSSM) with character trigrams significantly outperforms several conventional baselines in terms of the relevance of responses retrieved.\nStability evaluation of a weight-update system of higher-order neural units (HONUs) with polynomial aggregation of neural inputs (also known as classes of polynomial neural networks) for adaptation of both feedforward and recurrent HONUs by a gradient descent method is introduced. An essential core of the approach is based on spectral radius of a weight-update system, and it allows stability monitoring and its maintenance at every adaptation step individually. Assuring stability of the weight-update system (at every single adaptation step) naturally results in adaptation stability of the whole neural architecture that adapts to target data. As an aside, the used approach highlights the fact that the weight optimization of HONU is a linear problem, so the proposed approach can be generally extended to any neural architecture that is linear in its adaptable parameters.\nRecently, a number of deep-learning based models have been proposed for the task of Visual Question Answering (VQA). The performance of most models is clustered around 60-70%. In this paper we propose systematic methods to analyze the behavior of these models as a first step towards recognizing their strengths and weaknesses, and identifying the most fruitful directions for progress. 
We analyze two models, one each from two major classes of VQA models -- with-attention and without-attention and show the similarities and differences in the behavior of these models. We also analyze the winning entry of the VQA Challenge 2016.   Our behavior analysis reveals that despite recent progress, today's VQA models are \"myopic\" (tend to fail on sufficiently novel instances), often \"jump to conclusions\" (converge on a predicted answer after 'listening' to just half the question), and are \"stubborn\" (do not change their answers across images).\nWe investigate the problem of learning Bayesian networks in an agnostic model where an $\\epsilon$-fraction of the samples are adversarially corrupted. Our agnostic learning model is similar to -- in fact, stronger than -- Huber's contamination model in robust statistics. In this work, we study the fully observable Bernoulli case where the structure of the network is given. Even in this basic setting, previous learning algorithms either run in exponential time or lose dimension-dependent factors in their error guarantees. We provide the first computationally efficient agnostic learning algorithm for this problem with dimension-independent error guarantees. Our algorithm has polynomial sample complexity, runs in polynomial time, and achieves error that scales nearly-linearly with the fraction of adversarially corrupted samples.\nWe investigate an experiential learning paradigm for acquiring an internal model of intuitive physics. Our model is evaluated on a real-world robotic manipulation task that requires displacing objects to target locations by poking. The robot gathered over 400 hours of experience by executing more than 100K pokes on different objects. We propose a novel approach based on deep neural networks for modeling the dynamics of robot's interactions directly from images, by jointly estimating forward and inverse models of dynamics. 
The inverse model objective provides supervision to construct informative visual features, which the forward model can then predict and in turn regularize the feature space for the inverse model. The interplay between these two objectives creates useful, accurate models that can then be used for multi-step decision making. This formulation has the additional benefit that it is possible to learn forward models in an abstract feature space and thus alleviate the need for predicting pixels. Our experiments show that this joint modeling approach outperforms alternative methods.\nRecurrent neural networks, and in particular long short-term memory (LSTM) networks, are a remarkably effective tool for sequence modeling that learn a dense black-box hidden representation of their sequential input. Researchers interested in better understanding these models have studied the changes in hidden state representations over time and noticed some interpretable patterns but also significant noise. In this work, we present LSTMVIS, a visual analysis tool for recurrent neural networks with a focus on understanding these hidden state dynamics. The tool allows users to select a hypothesis input range to focus on local state changes, to match these state changes to similar patterns in a large data set, and to align these results with structural annotations from their domain. We show several use cases of the tool for analyzing specific hidden state properties on datasets containing nesting, phrase structure, and chord progressions, and demonstrate how the tool can be used to isolate patterns for further statistical analysis. We characterize the domain, the different stakeholders, and their goals and tasks.\nTemporal common sense has applications in AI tasks such as QA, multi-document summarization, and human-AI communication.
We propose the task of sequencing -- given a jumbled set of aligned image-caption pairs that belong to a story, the task is to sort them such that the output sequence forms a coherent story. We present multiple approaches, via unary (position) and pairwise (order) predictions, and their ensemble-based combinations, achieving strong results on this task. We use both text-based and image-based features, which depict complementary improvements. Using qualitative examples, we demonstrate that our models have learnt interesting aspects of temporal common sense.\nExtensive work has been conducted both in game theory and logic to model strategic interaction. An important question is whether we can use these theories to design agents for interacting with people? On the one hand, they provide a formal design specification for agent strategies. On the other hand, people do not necessarily adhere to playing in accordance with these strategies, and their behavior is affected by a multitude of social and psychological factors. In this paper we will consider the question of whether strategies implied by theories of strategic behavior can be used by automated agents that interact proficiently with people. We will focus on automated agents that we built that need to interact with people in two negotiation settings: bargaining and deliberation. For bargaining we will study game-theory based equilibrium agents and for argumentation we will discuss logic-based argumentation theory. We will also consider security games and persuasion games and will discuss the benefits of using equilibrium based agents.\nThe impossibility theorem of Dekel, Lipman and Rustichini has been thought to demonstrate that standard state-space models cannot be used to represent unawareness. We first show that Dekel, Lipman and Rustichini do not establish this claim. 
We then distinguish three notions of awareness, and argue that although one of them may not be adequately modeled using standard state spaces, there is no reason to think that standard state spaces cannot provide models of the other two notions. In fact, standard space models of these forms of awareness are attractively simple. They allow us to prove completeness and decidability results with ease, to carry over standard techniques from decision theory, and to add propositional quantifiers straightforwardly.\nThe semantics for counterfactuals due to David Lewis has been challenged on the basis of unlikely, or impossible, events. Such events may skew a given similarity order in favour of those possible worlds which exhibit them. By updating the relational structure of a model according to a ceteris paribus clause one forces out, in a natural manner, those possible worlds which do not satisfy the requirements of the clause. We develop a ceteris paribus logic for counterfactual reasoning capable of performing such actions, and offer several alternative (relaxed) interpretations of ceteris paribus. We apply this framework in a way which allows us to reason counterfactually without having our similarity order skewed by unlikely events. This continues the investigation of formal ceteris paribus reasoning, which has previously been applied to preferences, logics of game forms, and questions in decision-making, among other areas.\nWe consider decision-making and game scenarios in which an agent is limited by his/her computational ability to foresee all the available moves towards the future - that is, we study scenarios with short sight. We focus on how short sight affects the logical properties of decision making in multi-agent settings. We start with single-agent sequential decision making (SSDM) processes, modeling them by a new structure of \"preference-sight trees\". 
Using this model, we first explore the relation between a new natural solution concept of Sight-Compatible Backward Induction (SCBI) and the histories produced by classical Backward Induction (BI). In particular, we find necessary and sufficient conditions for the two analyses to be equivalent. Next, we study whether larger sight always contributes to better outcomes. Then we develop a simple logical special-purpose language to formally express some key properties of our preference-sight models. Lastly, we show how short-sight SSDM scenarios call for substantial enrichments of existing fixed-point logics that have been developed for the classical BI solution concept. We also discuss changes in earlier modal logics expressing \"surface reasoning\" about best actions in the presence of short sight. Our analysis may point the way to logical and computational analysis of more realistic game models.\nThe Knowledge of Preconditions principle (KoP) is proposed as a widely applicable connection between knowledge and action in multi-agent systems. Roughly speaking, it asserts that if some condition is a necessary condition for performing a given action A, then knowing that this condition holds is also a necessary condition for performing A. Since the specifications of tasks often involve necessary conditions for actions, the KoP principle shows that such specifications induce knowledge preconditions for the actions. Distributed protocols or multi-agent plans that satisfy the specifications must ensure that this knowledge be attained, and that it is detected by the agents as a condition for action. The knowledge of preconditions principle is formalised in the runs and systems framework, and is proven to hold in a wide class of settings. 
Well-known connections between knowledge and coordinated action are extended and shown to derive directly from the KoP principle: a \"common knowledge of preconditions\" principle is established showing that common knowledge is a necessary condition for performing simultaneous actions, and a \"nested knowledge of preconditions\" principle is proven, showing that coordinating actions to be performed in linear temporal order requires a corresponding form of nested knowledge.\nIn this paper we introduce a computational-level model of theory of mind (ToM) based on dynamic epistemic logic (DEL), and we analyze its computational complexity. The model is a special case of DEL model checking. We provide a parameterized complexity analysis, considering several aspects of DEL (e.g., number of agents, size of preconditions, etc.) as parameters. We show that model checking for DEL is PSPACE-hard, also when restricted to single-pointed models and S5 relations, thereby solving an open problem in the literature. Our approach is aimed at formalizing current intractability claims in the cognitive science literature regarding computational models of ToM.\nAn agent who lacks preferences and instead makes decisions using criteria that are costly to create should select efficient sets of criteria, where the cost of making a given number of choice distinctions is minimized. Under mild conditions, efficiency requires that binary criteria with only two categories per criterion are chosen. When applied to the problem of determining the optimal number of digits in an information storage device, this result implies that binary digits (bits) are the efficient solution, even when the marginal cost of using additional digits declines rapidly to 0. This short paper pays particular attention to the symmetry conditions entailed when sets of criteria are efficient.\nIn the last few decades, numerous experiments have shown that humans do not always behave so as to maximize their material payoff. 
Cooperative behavior when non-cooperation is a dominant strategy (with respect to the material payoffs) is particularly puzzling. Here we propose a novel approach to explain cooperation, assuming what Halpern and Pass call translucent players. Typically, players are assumed to be opaque, in the sense that a deviation by one player in a normal-form game does not affect the strategies used by other players. But a player may believe that if he switches from one strategy to another, the fact that he chooses to switch may be visible to the other players. For example, if he chooses to defect in Prisoner's Dilemma, the other player may sense his guilt. We show that by assuming translucent players, we can recover many of the regularities observed in human behavior in well-studied games such as Prisoner's Dilemma, Traveler's Dilemma, Bertrand Competition, and the Public Goods game.\nRecently, the next-item/basket recommendation system, which considers the sequential relation between bought items, has drawn the attention of researchers. The utilization of sequential patterns has boosted performance on several kinds of recommendation tasks. Inspired by natural language processing (NLP) techniques, we propose a novel neural network (NN) based next-song recommender, CNN-rec, in this paper. Then, we compare the proposed system with several NN based and classic recommendation systems on the next-song recommendation task. Verification results indicate that the proposed system outperforms classic systems and performs comparably to the state-of-the-art system.\nThere is an urgent need for compact, fast, and power-efficient hardware implementations of state-of-the-art artificial intelligence. Here we propose a power-efficient approach for real-time inference, in which deep neural networks (DNNs) are implemented through low-power analog circuits.
Although analog implementations can be extremely compact, they have been largely supplanted by digital designs, partly because of device mismatch effects due to fabrication. We propose a framework that exploits the power of Deep Learning to compensate for this mismatch by incorporating the measured variations of the devices as constraints in the DNN training process. This eliminates the use of mismatch minimization strategies such as the use of very large transistors, and allows circuit complexity and power-consumption to be reduced to a minimum. Our results, based on large-scale simulations as well as a prototype VLSI chip implementation indicate at least a 3-fold improvement of processing efficiency over current digital implementations.\nProactive decision support (PDS) helps in improving the decision making experience of human decision makers in human-in-the-loop planning environments. Here both the quality of the decisions and the ease of making them are enhanced. In this regard, we propose a PDS framework, named RADAR, based on the research in Automated Planning in AI, that aids the human decision maker with her plan to achieve her goals by providing alerts on: whether such a plan can succeed at all, whether there exist any resource constraints that may foil her plan, etc. This is achieved by generating and analyzing the landmarks that must be accomplished by any successful plan on the way to achieving the goals. Note that, this approach also supports naturalistic decision making which is being acknowledged as a necessary element in proactive decision support, since it only aids the human decision maker through suggestions and alerts rather than enforcing fixed plans or decisions. We demonstrate the utility of the proposed framework through search-and-rescue examples in a fire-fighting domain.\nWe present in this paper an efficient approach for acoustic scene classification by exploring the structure of class labels. 
Given a set of class labels, a category taxonomy is automatically learned by collectively optimizing a clustering of the labels into multiple meta-classes in a tree structure. An acoustic scene instance is then embedded into a low-dimensional feature representation which consists of the likelihoods that it belongs to the meta-classes. We demonstrate state-of-the-art results on two different datasets for the acoustic scene classification task, including the DCASE 2013 and LITIS Rouen datasets.\nWe consider an assignment problem that has aspects of fair division as well as social choice. In particular, we investigate the problem of assigning a small subset from a set of indivisible items to multiple players so that the chosen subset is \\emph{agreeable} to all players, i.e., every player weakly prefers the chosen subset to any subset of its complement. For an arbitrary number of players, we derive tight upper bounds on the size for which a subset of that size that is agreeable to all players always exists when preferences are monotonic. We then present polynomial-time algorithms that find an agreeable subset of approximately half of the items when there are two or three players and preferences are responsive. Our results translate to a 2-approximation on the individual welfare of every player when preferences are subadditive.\nThe study of quantum walks has been shown to have a wide range of applications in areas such as artificial intelligence, the study of biological processes, and quantum transport. The quantum stochastic walk, which allows for incoherent movement of the walker and, therefore, directionality, is a generalization of the fully coherent quantum walk. While a quantum stochastic walk can always be described in Lindblad formalism, this does not mean that it can be microscopically derived in the standard weak-coupling limit under the Born-Markov approximation.
This restricts the class of quantum stochastic walks that can be experimentally realized in a simple manner. To circumvent this restriction, we introduce a technique to simulate open system evolution on a fully coherent quantum computer, using a quantum trajectories style approach. We apply this technique to a broad class of quantum stochastic walks, and show that they can be simulated with minimal experimental resources. Our work opens the path towards the experimental realization of quantum stochastic walks on large graphs with existing quantum technologies.\nTop-$N$ recommender systems have been extensively studied. However, the sparsity of user-item activities has not been well resolved. While many hybrid systems were proposed to address the cold-start problem, the profile information has not been sufficiently leveraged. Furthermore, the heterogeneity of profiles between users and items intensifies the challenge. In this paper, we propose a content-based top-$N$ recommender system by learning the global term weights in profiles. To achieve this, we bring in PathSim, which measures node similarity under heterogeneous relations (between users and items). Starting from the original TF-IDF value, the global term weights gradually converge, and eventually reflect both profile and activity information. To facilitate training, the derivative is reformulated into matrix form, which can easily be parallelized. We conduct extensive experiments, which demonstrate the superiority of the proposed method.\nTo appear in the proceedings of LPAR 21.   Solving complex problems can involve non-trivial combinations of distinct knowledge bases and problem solvers. The Algebra of Modular Systems is a knowledge representation framework that provides a method for formally specifying such systems in purely semantic terms. Formally, an expression of the algebra defines a class of structures.
Many expressive formalisms used in practice solve the model expansion task, where a structure is given as input and an expansion of this structure in the defined class of structures is searched for (this practice overcomes the common undecidability problem for expressive logics). In this paper, we construct a solver for the model expansion task for complex modular systems from an expression in the algebra and black-box propagators or solvers for the primitive modules. To this end, we define a general notion of propagators equipped with an explanation mechanism, an extension of the algebra to propagators, and a lazy conflict-driven learning algorithm. The result is a framework for seamlessly combining solving technology from different domains to produce a solver for a combined system.\nA true lie is a lie that becomes true when announced. In a logic of announcements, where the announcing agent is not modelled, a true lie is a formula (that is false and) that becomes true when announced. We investigate true lies and other types of interaction between announced formulas, their preconditions and their postconditions, in the setting of Gerbrandy's logic of believed announcements, wherein agents may have or obtain incorrect beliefs. Our results are on the satisfiability and validity of instantiations of these semantically defined categories, on iterated announcements, including arbitrarily often iterated announcements, and on syntactic characterization. We close with results for iterated announcements in the logic of knowledge (instead of belief), and for lying as private announcements (instead of public announcements) to different agents. Detailed examples illustrate our lying concepts.\nMethods based on representation learning currently hold the state-of-the-art in many natural language processing and knowledge base inference tasks. Yet, a major challenge is how to efficiently incorporate commonsense knowledge into such models.
A recent approach regularizes relation and entity representations by propositionalization of first-order logic rules. However, propositionalization does not scale beyond domains with only few entities and rules. In this paper we present a highly efficient method for incorporating implication rules into distributed representations for automated knowledge base construction. We map entity-tuple embeddings into an approximately Boolean space and encourage a partial ordering over relation embeddings based on implication rules mined from WordNet. Surprisingly, we find that the strong restriction of the entity-tuple embedding space does not hurt the expressiveness of the model and even acts as a regularizer that improves generalization. By incorporating few commonsense rules, we achieve an increase of 2 percentage points mean average precision over a matrix factorization baseline, while observing a negligible increase in runtime.\nA function $f: \\mathbb{Z}_+^E \\rightarrow \\mathbb{R}_+$ is DR-submodular if it satisfies $f(\\bx + \\chi_i) -f (\\bx) \\ge f(\\by + \\chi_i) - f(\\by)$ for all $\\bx\\le \\by, i\\in E$. Recently, the problem of maximizing a DR-submodular function $f: \\mathbb{Z}_+^E \\rightarrow \\mathbb{R}_+$ subject to a budget constraint $\\|\\bx\\|_1 \\leq B$ as well as additional constraints has received significant attention \\cite{SKIK14,SY15,MYK15,SY16}.   In this note, we give a generic reduction from the DR-submodular setting to the submodular setting. The running time of the reduction and the size of the resulting submodular instance depends only \\emph{logarithmically} on $B$. Using this reduction, one can translate the results for unconstrained and constrained submodular maximization to the DR-submodular setting for many types of constraints in a unified manner.\nRelational logistic regression (RLR) is a representation of conditional probability in terms of weighted formulae for modelling multi-relational data. 
In this paper, we develop a learning algorithm for RLR models. Learning an RLR model from data consists of two steps: (1) learning the set of formulae to be used in the model (a.k.a. structure learning) and (2) learning the weight of each formula (a.k.a. parameter learning). For structure learning, we deploy Schmidt and Murphy's hierarchical assumption: first we learn a model with simple formulae, then more complex formulae are added iteratively only if all their sub-formulae have proven effective in previously learned models. For parameter learning, we convert the problem into a non-relational learning problem and use an off-the-shelf logistic regression learning algorithm from Weka, an open-source machine learning tool, to learn the weights. We also indicate how hidden features about the individuals can be incorporated into RLR to boost the learning performance. We compare our learning algorithm to other structure and parameter learning algorithms in the literature, and compare the performance of RLR models to standard logistic regression and RDN-Boost on a modified version of the MovieLens data-set.\nThis paper presents a simple but effective density-based outlier detection approach with local kernel density estimation (KDE). A Relative Density-based Outlier Score (RDOS) is introduced to measure the local outlierness of objects, in which the density distribution at the location of an object is estimated with a local KDE method based on extended nearest neighbors of the object. Instead of using only $k$ nearest neighbors, we further consider reverse nearest neighbors and shared nearest neighbors of an object for density distribution estimation. Some theoretical properties of the proposed RDOS including its expected value and false alarm probability are derived.
A comprehensive experimental study on both synthetic and real-life data sets demonstrates that our approach is more effective than state-of-the-art outlier detection methods.\nData quantization learns encodings of data under certain requirements, and provides a broad perspective on data handling for many real-world applications. However, the output of the encoder is usually limited to multivariate inputs with random mapping, and the side information in binary codes can hardly capture most of the original data patterns. In the literature, cosine-based random quantization has attracted much attention due to its intrinsically bounded results. However, it usually suffers from uncertain outputs, and the information in the original data fails to be fully preserved in the reduced codes. In this work, a novel binary embedding method, termed adaptive training quantization (ATQ), is proposed to learn the ideal transform of the random encoder, tackling the limitation of cosine random mapping. As an adaptive learning idea, the reduced mapping is adaptively calculated using data groups, while the bias of the random transform is improved to retain most of the matching information. Experimental results show that the proposed method is able to obtain outstanding performance compared with other random quantization methods.\nThere are many declarative frameworks that allow us to implement code formatters relatively easily for any specific language, but constructing them is cumbersome. The first problem is that \"everybody\" wants to format their code differently, leading to either many formatter variants or a ridiculous number of configuration options. Second, the size of each implementation scales with a language's grammar size, leading to hundreds of rules.   In this paper, we solve the formatter construction problem using a novel approach, one that automatically derives formatters for any given language without intervention from a language expert.
We introduce a code formatter called CodeBuff that uses machine learning to abstract formatting rules from a representative corpus, using a carefully designed feature set. Our experiments on Java, SQL, and ANTLR grammars show that CodeBuff is efficient, has excellent accuracy, and is grammar invariant for a given language. It also generalizes to a 4th language tested during manuscript preparation.\nIn this paper, we present subgraph2vec, a novel approach for learning latent representations of rooted subgraphs from large graphs inspired by recent advancements in Deep Learning and Graph Kernels. These latent representations encode semantic substructure dependencies in a continuous vector space, which is easily exploited by statistical models for tasks such as graph classification, clustering, link prediction and community detection. subgraph2vec leverages local information obtained from neighbourhoods of nodes to learn their latent representations in an unsupervised fashion. We demonstrate that subgraph vectors learnt by our approach could be used in conjunction with classifiers such as CNNs, SVMs and relational data clustering algorithms to achieve significantly superior accuracies. Also, we show that the subgraph vectors could be used for building a deep learning variant of the Weisfeiler-Lehman graph kernel. Our experiments on several benchmark and large-scale real-world datasets reveal that subgraph2vec achieves significant improvements in accuracies over existing graph kernels on both supervised and unsupervised learning tasks. Specifically, on two real-world program analysis tasks, namely, code clone and malware detection, subgraph2vec outperforms state-of-the-art kernels by more than 17% and 4%, respectively.\nIn this paper, a novel multiple criteria decision making (MCDM) methodology is presented for assessing and prioritizing medical tourism destinations in an uncertain environment.
A systematic evaluation and assessment method is proposed by integrating rough-number-based AHP (Analytic Hierarchy Process) and rough-number-based MABAC (Multi-Attributive Border Approximation area Comparison). Rough numbers are used to aggregate individual judgments and preferences to deal with vagueness in decision making due to limited data. Rough AHP analyzes the relative importance of criteria based on the preferences given by experts. Rough MABAC evaluates the alternative sites based on the criteria weights. The proposed methodology is explained through a case study considering different cities for healthcare service in India. The validity of the obtained ranking for the given decision making problem is established by testing criteria proposed by Wang and Triantaphyllou (2008) along with further analysis and discussion.\nMunicipal solid waste management (MSWM) is a challenging issue of urban development in developing countries. Each country, having a different socio-economic and environmental background, might not accept a particular disposal method as the optimal choice. Selection of a suitable disposal method in MSWM under vague and imprecise information can be considered a multi-criteria decision making (MCDM) problem. In the present paper, the TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) methodology is extended based on credibility theory for evaluating the performances of MSW disposal methods under some criteria fixed by experts. The proposed model helps decision makers to choose a preferable alternative for their municipal area. A sensitivity analysis by our proposed model confirms this fact.\nNeural Machine Translation (NMT), like many other deep learning domains, typically suffers from over-parameterization, resulting in large storage sizes.
This paper examines three simple magnitude-based pruning schemes to compress NMT models, namely class-blind, class-uniform, and class-distribution, which differ in terms of how pruning thresholds are computed for the different classes of weights in the NMT architecture. We demonstrate the efficacy of weight pruning as a compression technique for a state-of-the-art NMT system. We show that an NMT model with over 200 million parameters can be pruned by 40% with very little performance loss as measured on the WMT'14 English-German translation task. This sheds light on the distribution of redundancy in the NMT architecture. Our main result is that with retraining, we can recover and even surpass the original performance with an 80%-pruned model.\nDisjunctive Answer Set Programming (ASP) is a powerful declarative programming paradigm whose main decision problems are located on the second level of the polynomial hierarchy. Identifying tractable fragments and developing efficient algorithms for such fragments are thus important objectives in order to complement the sophisticated ASP systems available to date. Hard problems can become tractable if some problem parameter is bounded by a fixed constant; such problems are then called fixed-parameter tractable (FPT). While several FPT results for ASP exist, parameters that relate to directed or signed graphs representing the program at hand have been neglected so far. In this paper, we first give some negative observations showing that directed width measures on the dependency graph of a program do not lead to FPT results. We then consider the graph parameter of signed clique-width and present a novel dynamic programming algorithm that is FPT w.r.t. this parameter. 
Clique-width is more general than the well-known treewidth, and, to the best of our knowledge, ours is the first FPT algorithm for bounded clique-width for reasoning problems beyond SAT.\nWe propose to accelerate the rate of convergence of the pattern recognition task by directly minimizing the variance diameters of certain hypothesis spaces, which are critical quantities in fast-convergence results. We show that the variance diameters can be controlled by dividing hypothesis spaces into metric balls based on a new order metric. This order metric can be minimized as an ordinal regression problem, leading to a LUPI (Learning Using Privileged Information) application where we take the privileged information as some desired ordering, and construct a faster-converging hypothesis space by empirically restricting some larger hypothesis space according to that ordering. We give a risk analysis of the approach. We discuss the difficulties with model selection and give an innovative technique for selecting multiple model parameters. Finally, we provide some data experiments.\nAn important approach for efficient inference in probabilistic graphical models exploits symmetries among objects in the domain. Symmetric variables (states) are collapsed into meta-variables (meta-states) and inference algorithms are run over the lifted graphical model instead of the flat one. Our paper extends existing definitions of symmetry by introducing the novel notion of contextual symmetry. Two states that are not globally symmetric can be contextually symmetric under some specific assignment to a subset of variables, referred to as the context variables. Contextual symmetry subsumes previous symmetry definitions and can represent a large class of symmetries not representable earlier. We show how to compute contextual symmetries by reducing it to the problem of graph isomorphism. We extend previous work on exploiting symmetries in the MCMC framework to the case of contextual symmetries.
Our experiments on several domains of interest demonstrate that exploiting contextual symmetries can result in significant computational gains.\nThe aggregation and denoising of crowd-labeled data is a task that has gained increased significance with the advent of crowdsourcing platforms and massive datasets. In this paper, we propose a permutation-based model for crowd-labeled data that is a significant generalization of the common Dawid-Skene model, and introduce a new error metric by which to compare different estimators. Working in a high-dimensional non-asymptotic framework that allows both the number of workers and tasks to scale, we derive optimal rates of convergence for the permutation-based model. We show that the permutation-based model offers significant robustness in estimation due to its richness, while surprisingly incurring only a small additional statistical penalty as compared to the Dawid-Skene model. Finally, we propose a computationally-efficient method, called the OBI-WAN estimator, that is uniformly optimal over a class intermediate between the permutation-based and the Dawid-Skene models, and is uniformly consistent over the entire permutation-based model class. In contrast, the guarantees for estimators available in prior literature are sub-optimal over the original Dawid-Skene model.\nDue to the intractable nature of exact lifted inference, research has recently focused on the discovery of accurate and efficient approximate inference algorithms in Statistical Relational Models (SRMs), such as Lifted First-Order Belief Propagation (FOBP). FOBP simulates propositional factor graph belief propagation without constructing the ground factor graph by identifying and lifting over redundant message computations. In this work, we propose a generalization of FOBP called Lifted Generalized Belief Propagation, in which both the region structure and the message structure can be lifted. 
This approach allows more of the inference to be performed intra-region (in the exact inference step of BP), thereby allowing simulation of propagation on a graph structure with larger region scopes and fewer edges, while still maintaining tractability. We demonstrate that the resulting algorithm converges in fewer iterations to more accurate results on a variety of SRMs.\nThe challenge stated in the title can be divided into two main problems. The first problem is to reliably mimic the way that users interact with user interfaces. The second problem is to build an instructible agent, i.e. one that can be taught to execute tasks expressed as previously unseen natural language commands. This paper proposes a solution to the second problem, a system we call Helpa. End-users can teach Helpa arbitrary new tasks whose level of complexity is similar to the tasks available from today's most popular virtual assistants. Teaching Helpa does not involve any programming. Instead, users teach Helpa by providing just one example of a command paired with a demonstration of how to execute that command. Helpa does not rely on any pre-existing domain-specific knowledge. It is therefore completely domain-independent. Our usability study showed that end-users can teach Helpa many new tasks in less than a minute each, often much less.\nAs a general means of expression, audio analysis and recognition have attracted much attention for their wide applications in the real world. Audio emotion recognition (AER) attempts to understand the emotional states of humans from given utterance signals, and has been widely studied for its potential in developing friendly human-machine interfaces. Unlike other existing works, this work models person-dependent patterns of audio emotions, and fractal dimension features are calculated for acoustic feature extraction. 
Furthermore, it is able to efficiently learn intrinsic characteristics of auditory emotions, while the utterance features are learned from the fractal dimensions of each sub-band. Experimental results show that the proposed method provides comparable performance for audio emotion recognition.\nProf. Robert Berwick's abstract for his forthcoming invited talk at the ACL2016 workshop on Cognitive Aspects of Computational Language Learning revives an ancient debate. Entitled \"Why take a chance?\", Berwick seems to refer implicitly to Chomsky's critique of the statistical approach of Harris as well as the currently dominant paradigms in CoNLL. Berwick avoids Chomsky's use of \"innate\" but states that \"the debate over the existence of sophisticated mental grammars was settled with Chomsky's Logical Structure of Linguistic Theory (1957/1975)\", acknowledging that \"this debate has often been revived\". This paper agrees with the view that this debate has long since been settled, but with the opposite outcome! Given that the embers have not yet died away, and the questions remain fundamental, perhaps it is appropriate to refuel the debate, so I would like to join Bob in throwing fuel on this fire by reviewing the evidence against the Chomskian position!\nNeutrosophic Over-/Under-/Off-Set and -Logic were defined by the author in 1995 and published for the first time in 2007. We extended the neutrosophic set respectively to Neutrosophic Overset {when some neutrosophic component is over 1}, Neutrosophic Underset {when some neutrosophic component is below 0}, and to Neutrosophic Offset {when some neutrosophic components are off the interval [0, 1], i.e. some neutrosophic component over 1 and other neutrosophic component below 0}. 
This is no surprise with respect to the classical fuzzy set/logic, intuitionistic fuzzy set/logic, or classical/imprecise probability, where the values are not allowed outside the interval [0, 1], since the real world has numerous examples and applications of over-/under-/off-neutrosophic components. For example, a person working overtime deserves a membership degree over 1, while a person producing more damage than benefit to a company deserves a membership degree below 0. Then, similarly, the Neutrosophic Logic/Measure/Probability/Statistics etc. were extended to respectively Neutrosophic Over-/Under-/Off-Logic, -Measure, -Probability, -Statistics etc. [Smarandache, 2007].\nWe propose a simple domain adaptation method for neural networks in a supervised setting. Supervised domain adaptation is a way of improving the generalization performance on the target domain by using the source domain dataset, assuming that both of the datasets are labeled. Recently, recurrent neural networks have been shown to be successful on a variety of NLP tasks such as caption generation; however, the existing domain adaptation techniques are limited to (1) tuning the model parameters on the target dataset after training on the source dataset, or (2) designing the network to have dual outputs, one for the source domain and the other for the target domain. Reformulating the idea of the domain adaptation technique proposed by Daume (2007), we propose a simple domain adaptation method, which can be applied to neural networks trained with a cross-entropy loss. On captioning datasets, we show performance improvements over other domain adaptation methods.\nA neighborhood graph, which represents the instances as vertices and their relations as weighted edges, is the basis of many semi-supervised and relational models for node labeling and link prediction. Most methods employ a sequential process to construct the neighborhood graph. 
This process often consists of generating a candidate graph, pruning the candidate graph to make a neighborhood graph, and then performing inference on the variables (i.e., nodes) in the neighborhood graph. In this paper, we propose a framework that can dynamically adapt the neighborhood graph based on the states of variables from intermediate inference results, as well as structural properties of the relations connecting them. A key strength of our framework is its ability to handle multi-relational data and employ varying amounts of relations for each instance based on the intermediate inference results. We formulate the link prediction task as inference on neighborhood graphs, and include preliminary results illustrating the effects of different strategies in our proposed framework.\nThis article presents an agent architecture for controlling an autonomous agent in stochastic environments. The architecture combines the partially observable Markov decision process (POMDP) model with the belief-desire-intention (BDI) framework. The Hybrid POMDP-BDI agent architecture takes the best features from the two approaches, that is, the online generation of reward-maximizing courses of action from POMDP theory, and sophisticated multiple goal management from BDI theory. We introduce the advances made since the introduction of the basic architecture, including (i) the ability to pursue multiple goals simultaneously and (ii) a plan library for storing pre-written plans and for storing recently generated plans for future reuse. A version of the architecture without the plan library is implemented and is evaluated using simulations. The results of the simulation experiments indicate that the approach is feasible.\nTraditionally, researchers in decision making have focused on attempting to reach Pareto Optimality using horizontal approaches, where optimality is calculated taking into account every participant at the same time. 
Sometimes, this may prove to be a difficult task (e.g., conflict, mistrust, no information sharing, etc.). In this paper, we explore the possibility of achieving Pareto optimal outcomes in a group by using a bottom-up approach: discovering Pareto optimal outcomes by interacting in subgroups. We analytically show that Pareto optimal outcomes in a subgroup are also Pareto optimal in a supergroup of those agents in the case of strict, transitive, and complete preferences. Then, we empirically analyze the prospective usability and practicality of bottom-up approaches in a variety of decision making domains.\nThis paper introduces a mathematical formalism for the Spatial Pooler (SP) of Hierarchical Temporal Memory (HTM), with special consideration for its hardware implementation. The performance of an HTM network and its ability to learn and adjust to the problem at hand are governed by a large set of parameters. Most of the parameters are codependent, which makes creating efficient HTM-based solutions challenging: it requires profound knowledge of the settings and their impact on the performance of the system. Consequently, this paper introduces a set of formulas intended to facilitate the design process by supplementing the tedious trial-and-error method with a tool for choosing initial parameters that enable quick learning convergence. This is especially important in hardware implementations, which are constrained by the limited resources of a platform. The authors focus especially on a formalism of the Spatial Pooler and derive formulas for the quality and convergence of the model. These formulas may be considered recipes for designing efficient HTM models for given input patterns.\nAmong the most general structures extending the framework by Dung are the abstract dialectical frameworks (ADFs). They come equipped with various types of semantics, with the most prominent, the labeling-based one, analyzed in the context of computational complexity, signatures, instantiations and software support. 
This makes the abstract dialectical frameworks valuable tools for argumentation. However, there are fewer results available concerning the relation between the ADFs and other argumentation frameworks. In this paper we would like to address this issue by introducing a number of translations from various formalisms into ADFs. The results of our study show the similarities and differences between them, thus promoting the use and understanding of ADFs. Moreover, our analysis also proves their capability to model many of the existing frameworks, including those that go beyond the attack relation. Finally, translations allow other structures to benefit from the research on ADFs in general and from the existing software in particular.\nMultiple choice questions (MCQs) that can be generated from a domain ontology can significantly reduce the human effort & time required for authoring & administering assessments in an e-Learning environment. Even though there are various methods for generating MCQs from ontologies, methods for determining the difficulty-levels of such MCQs are less explored. In this paper, we study various aspects and factors that are involved in determining the difficulty-score of an MCQ, and propose an ontology-based model for the prediction. This model characterizes the difficulty values associated with the stem and choice set of the MCQs, and describes a measure which combines both the scores. Furthermore, the notion of assigning difficulty-scores based on the skill level of the test taker is utilized for predicting the difficulty-score of a stem. We studied the effectiveness of the predicted difficulty-scores with the help of a psychometric model from the Item Response Theory, by involving real students and domain experts. Our results show that the predicted difficulty-levels of the MCQs have a high correlation with their actual difficulty-levels.\nIn this paper we propose the technology for constructing propositional encodings of discrete functions. 
It is aimed at solving inversion problems of the considered functions using state-of-the-art SAT solvers. We implemented this technology in the form of the software system called Transalg, and used it to construct SAT encodings for a number of cryptanalysis problems. By applying SAT solvers to these encodings we managed to invert several cryptographic functions. In particular, we used the SAT encodings produced by Transalg to construct the family of two-block MD5 collisions in which the first 10 bytes are zeros. We also used the Transalg encoding of the widely known A5/1 keystream generator to solve several dozen of its cryptanalysis instances in a distributed computing environment. In the paper we compare in detail the functionality of Transalg with that of similar software systems.\nWe introduce a deep neural network for automated sarcasm detection. Recent work has emphasized the need for models to capitalize on contextual features, beyond lexical and syntactic cues present in utterances. For example, different speakers will tend to employ sarcasm regarding different subjects and, thus, sarcasm detection models ought to encode such speaker information. Current methods have achieved this by way of laborious feature engineering. By contrast, we propose to automatically learn and then exploit user embeddings, to be used in concert with lexical signals to recognize sarcasm. Our approach does not require elaborate feature engineering (and concomitant data scraping); fitting user embeddings requires only the text from their previous posts. The experimental results show that our model outperforms a state-of-the-art approach leveraging an extensive set of carefully crafted features.\nEntity resolution, the problem of identifying the underlying entity of references found in data, has been researched for many decades in many communities. A common theme in this research has been the importance of incorporating relational features into the resolution process. 
Relational entity resolution is particularly important in knowledge graphs (KGs), which have a regular structure capturing entities and their interrelationships. We identify three major problems in KG entity resolution: (1) intra-KG reference ambiguity; (2) inter-KG reference ambiguity; and (3) ambiguity when extending KGs with new facts. We implement a framework that generalizes across these three settings and exploits this regular structure of KGs. Our framework has many advantages over custom solutions widely deployed in industry, including collective inference, scalability, and interpretability. We apply our framework to two real-world KG entity resolution problems, ambiguity in NELL and merging data from Freebase and MusicBrainz, demonstrating the importance of relational features.\nIn distributed or privacy-preserving learning, we are often given a set of probabilistic models estimated from different local repositories, and asked to combine them into a single model that gives efficient statistical estimation. A simple method is to linearly average the parameters of the local models, which, however, tends to be degenerate or not applicable to non-convex models, or models with different parameter dimensions. A more practical strategy is to generate bootstrap samples from the local models, and then learn a joint model based on the combined bootstrap set. Unfortunately, the bootstrap procedure introduces additional noise and can significantly deteriorate the performance. In this work, we propose two variance reduction methods to correct the bootstrap noise, including a weighted M-estimator that is both statistically efficient and practically powerful. Both theoretical and empirical analyses are provided to demonstrate our methods.\nOne of the challenges in affect recognition is the accurate estimation of the emotion intensity level. 
This research proposes the development of an affect intensity estimation model based on a weighted sum of classification confidence levels, displacement of feature points and speed of feature point motion. The parameters of the model were calculated from data captured using multiple modalities such as face, body posture, hand movement and speech. A preliminary study was conducted to compare the accuracy of the model with the annotated intensity levels. An emotion intensity scale ranging from 0 to 1 along the arousal dimension in the emotion space was used. Results indicated that the speech and hand modalities contributed significantly to improving accuracy in emotion intensity estimation using the proposed model.\nIn this paper, we attempt to extend the Multi Attributive Border Approximation area Comparison (MABAC) approach for multi-attribute decision making (MADM) problems based on interval type-2 fuzzy sets (IT2FSs). As a special case of IT2FSs, interval type-2 trapezoidal fuzzy numbers (IT2TrFNs) are adopted here to deal with uncertainties present in many practical evaluation and selection problems. A systematic description of MABAC based on IT2TrFNs is presented in the current study. The validity and feasibility of the proposed method are illustrated by a practical example of selecting the most suitable candidate for a software company that is seeking to hire a system analysis engineer based on a few attributes. Finally, a comparison with two other existing MADM methods is described.\nThe present study provides the first evidence that illiteracy can be reliably predicted from standard mobile phone logs. By deriving a broad set of mobile phone indicators reflecting users' financial, social and mobility patterns, we show how supervised machine learning can be used to predict individual illiteracy in an Asian developing country, externally validated against a large-scale survey. On average, the model performs 10 times better than random guessing, with 70% accuracy. 
Further, we show how individual illiteracy can be aggregated and mapped geographically at cell tower resolution. Geographical mapping of illiteracy is crucial for knowing where the illiterate people are and where to direct resources. In underdeveloped countries, such mappings are often based on outdated household surveys with low spatial and temporal resolution. One in five people worldwide struggle with illiteracy, and it is estimated that illiteracy costs the global economy more than 1 trillion dollars each year. These results potentially enable cost-effective, questionnaire-free investigation of illiteracy-related questions on an unprecedented scale.\nOntologies are one of the core foundations of the Semantic Web. To participate in Semantic Web projects, domain experts need to be able to understand the ontologies involved. Visual notations can provide an overview of the ontology and help users to understand the connections among entities. However, the users first need to learn the visual notation before they can interpret it correctly. Controlled natural language representation would be readable right away and might be preferred in the case of complex axioms; however, the structure of the ontology would remain less apparent. We propose to combine ontology visualizations with contextual ontology verbalizations of selected ontology (diagram) elements, displaying controlled natural language (CNL) explanations of OWL axioms corresponding to the selected visual notation elements. Thus, the domain experts will benefit from both the high-level overview provided by the graphical notation and the detailed textual explanations of particular elements in the diagram.\nDeep neural networks are able to learn powerful representations from large quantities of labeled input data; however, they cannot always generalize well across changes in input distributions. Domain adaptation algorithms have been proposed to compensate for the degradation in performance due to domain shift. 
In this paper, we address the case when the target domain is unlabeled, requiring unsupervised adaptation. CORAL is a \"frustratingly easy\" unsupervised domain adaptation method that aligns the second-order statistics of the source and target distributions with a linear transformation. Here, we extend CORAL to learn a nonlinear transformation that aligns correlations of layer activations in deep neural networks (Deep CORAL). Experiments on standard benchmark datasets show state-of-the-art performance.\nThis paper describes a new algorithm for decision making in two-player real-time video games. As with Monte Carlo Tree Search, the algorithm can be used without heuristics and has been developed for use in general video game AI. The approach is to extend recent work on rolling horizon evolutionary planning, which has been shown to work well for single-player games, to two (or in principle many) player games. To select an action the algorithm co-evolves two (or in the general case N) populations, one for each player, where each individual is a sequence of actions for the respective player. The fitness of each individual is evaluated by playing it against a selection of action-sequences from the opposing population. When choosing an action to take in the game, the first action is chosen from the fittest member of the population for that player. The new algorithm is compared with a number of general video game AI algorithms on three variations of a two-player space battle game, with promising results.\nIn ontology-based data access, databases are connected to an ontology via mappings from queries over the database to queries over the ontology. In this paper, we consider mappings from relational databases to first-order ontologies, and define an ASP-based framework for GLAV mappings with queries over the ontology in the mapping rule bodies. 
We show that this type of mapping can be used to express constraints and exceptions, as well as being a powerful mechanism for succinctly representing OBDA mappings. We give an algorithm for brave reasoning in this setting, and show that this problem either has the same data complexity as ASP (NP-complete) or is at least as hard as checking entailment for the ontology queries. Furthermore, we show that for ontologies with UCQ-rewritable queries there exists a natural reduction from mapping programs to \\exists-ASP, an extension of ASP with existential variables that itself admits a natural reduction to ASP.\nSeveral studies on sentence processing suggest that the mental lexicon keeps track of the mutual expectations between words. Current distributional semantic models (DSMs), however, represent context words as separate features, thereby losing important information for word expectations, such as word interrelations. In this paper, we present a DSM that addresses this issue by defining verb contexts as joint syntactic dependencies. We test our representation in a verb similarity task on two datasets, showing that joint contexts achieve performance comparable to, or even better than, single dependencies. Moreover, they are able to overcome the data sparsity problem of joint feature spaces, in spite of the limited size of our training corpus.\nThis report describes our submissions to Task2 and Task3 of the DCASE 2016 challenge. The systems aim at dealing with the detection of overlapping audio events in continuous streams, where the detectors are based on random decision forests. The proposed forests are jointly trained for classification and regression. Initially, the training is classification-oriented to encourage the trees to select discriminative features from overlapping mixtures to separate positive audio segments from the negative ones. 
The regression phase is then carried out to let the positive audio segments vote for the event onsets and offsets, and therefore model the temporal structure of audio events. One random decision forest is specifically trained for each event category of interest. Experimental results on the development data show that our systems significantly outperform the baseline on the Task2 evaluation while they are inferior to the baseline in the Task3 evaluation.\nBig data analytics applications drive the convergence of data management and machine learning. However, there is no conceptual language available that is spoken in both worlds. The main contribution of the paper is a method to translate Bayesian networks, a main conceptual language for probabilistic graphical models, into usable entity relationship models. The transformed representation of a Bayesian network leaves out mathematical details about probabilistic relationships but unfolds all information relevant for data management tasks. As a real-world example, we present the TopicExplorer system that uses Bayesian topic models as a core component in an interactive, database-supported web application. Last, we sketch a conceptual framework that eases machine-learning-specific development tasks while building big data analytics applications.\nDeep convolutional neural networks (CNNs) have been actively adopted in the field of music information retrieval, e.g. genre classification, mood detection, and chord recognition. However, the process of learning and prediction is little understood, particularly when it is applied to spectrograms. We introduce auralisation of a CNN to understand its underlying mechanism, which is based on a deconvolution procedure introduced in [2]. Auralisation of a CNN converts the learned convolutional features obtained from deconvolution into audio signals. 
In the experiments and discussions, we explain the trained features of a 5-layer CNN based on the deconvolved spectrograms and auralised signals. The pairwise correlations per layer with different musical attributes are also investigated to understand the evolution of the learnt features. It is shown that in the deep layers, the features are learnt to capture textures, the patterns of continuous distributions, rather than shapes of lines.\nIn this paper, we investigate possible improvements of the widely-used filtering algorithm for the linear constraints in constraint satisfaction problems in the presence of the alldifferent constraints. In many cases, the fact that the variables in a linear constraint are also constrained by some alldifferent constraints may help us to calculate stronger bounds on the variables, leading to a stronger constraint propagation. We propose an improved filtering algorithm that targets such cases. We provide a detailed description of the proposed algorithm and prove its correctness. We evaluate the approach on five different problems that involve combinations of the linear and the alldifferent constraints. We also compare our algorithm to other relevant approaches. The experimental results show great potential for the proposed improvement.\nIn sentiment analysis, the polarities of the opinions expressed on an object/feature are determined to assess whether the sentiment of a sentence or document is positive, negative, or neutral. Naturally, the object/feature is a noun representation which refers to a product or a component of a product, e.g., the \"lens\" in a camera, and opinions about it are expressed through adjectives, verbs, adverbs and nouns themselves. Apart from such words, other meta-information and diverse effective features also play an important role in influencing the sentiment polarity and contribute significantly to the performance of the system. 
In this paper, some of the associated information/meta-data in sentiment text are explored and investigated. Based on the analysis results presented here, there is scope for further assessment and utilization of the meta-information as features in text categorization, ranking text documents, identification of spam documents and polarity classification problems.\nThe aim of this research is the development of a rule-based decision model for emotion recognition. This research also proposes using the rules for augmenting inter-corpus recognition accuracy in multimodal systems that use supervised learning techniques. The classifiers for such learning-based recognition systems are susceptible to overfitting and only perform well on intra-corpus data. To overcome this limitation, this research proposes using a rule-based model as an additional modality. The rules were developed using raw feature data from the visual channel, based on human annotator agreement and existing studies that have attributed movement and postures to emotions. The outcome of the rule evaluations was combined during the decision phase of the emotion recognition system. The results indicate that rule-based emotion recognition augments the recognition accuracy of learning-based systems and also provides a better recognition rate across inter-corpus emotion test data.\nWeakly-sticky (WS) Datalog+/- is an expressive member of the family of Datalog+/- programs that is based on the syntactic notions of stickiness and weak-acyclicity. Query answering over the WS programs has been investigated, but there is still much work to do on the design and implementation of practical query answering (QA) algorithms and their optimizations. Here, we study sticky and WS programs from the point of view of the behavior of the chase procedure, extending the stickiness property of the chase to that of generalized stickiness of the chase (gsch-property). 
With this property we specify the semantic class of GSCh programs, which includes sticky and WS programs, and other syntactic subclasses that we identify. In particular, we introduce joint-weakly-sticky (JWS) programs, which include WS programs. We also propose a bottom-up QA algorithm for a range of subclasses of GSCh. The algorithm runs in polynomial time (in data) for JWS programs. Unlike the WS class, JWS is closed under a general magic-sets rewriting procedure for the optimization of programs with existential rules. We apply the magic-sets rewriting in combination with the proposed QA algorithm for the optimization of QA over JWS programs.\nOpen Information Extraction (Open IE) systems aim to obtain relation tuples through highly scalable, domain-portable extraction by identifying a variety of relation phrases and their arguments in arbitrary sentences. The first generation of Open IE learns linear chain models based on unlexicalized features such as Part-of-Speech (POS) or shallow tags to label the intermediate words between pairs of potential arguments for identifying extractable relations. Open IE is currently in its second generation, which is able to extract instances of the most frequently observed relation types such as Verb, Noun and Prep, Verb and Prep, and Infinitive with deep linguistic analysis. These systems expose simple yet principled ways in which verbs express relationships in linguistics, such as verb phrase-based extraction or clause-based extraction. They obtain significantly higher performance than previous first-generation systems. In this paper, we give an overview of the two Open IE generations, including their strengths, weaknesses and application areas.\nThe authors present a cyber-physical systems study on the estimation of driver behavior in autonomous vehicles and vehicle safety systems. Extending previous work, the approach described is suitable for the long-term estimation and tracking of autonomous vehicle behavior. 
The proposed system makes use of a previously defined Hybrid State System and Hidden Markov Model (HSS+HMM) system which has provided good results for driver behavior estimation. The HSS+HMM system utilizes the hybrid characteristics of decision-behavior coupling of many systems such as the driver and the vehicle, uses Kalman Filter estimates of observable parameters to track the instantaneous continuous state, and estimates the most likely driver state. The HSS+HMM system is encompassed in a HSS structure and inter-system connectivity is determined by using Signal Processing and Pattern Recognition techniques. The proposed method is suitable for scenarios that involve unknown decisions of other individuals, such as lane changes or intersection precedence/access. The long term driver behavior estimation system involves an extended HSS+HMM structure that is capable of including external information in the estimation process. Through the grafting and pruning of metastates, the HSS+HMM system can be dynamically updated to best represent driver choices given external information. Three application examples are also provided to elucidate the theoretical system.\nReal-world optimisation problems are often dynamic. Previously good solutions must be updated or replaced due to changes in objectives and constraints. It is often claimed that evolutionary algorithms are particularly suitable for dynamic optimisation because a large population can contain different solutions that may be useful in the future. However, rigorous theoretical demonstrations for how populations in dynamic optimisation can be essential are sparse and restricted to special cases.   This paper provides theoretical explanations of how populations can be essential in evolutionary dynamic optimisation in a general and natural setting. We describe a natural class of dynamic optimisation problems where a sufficiently large population is necessary to keep track of moving optima reliably. 
We establish a relationship between the population-size and the probability that the algorithm loses track of the optimum.\nCharacterizing driving styles of human drivers using vehicle sensor data, e.g., GPS, is an interesting research problem and an important real-world requirement from automotive industries. A good representation of driving features can be highly valuable for autonomous driving, auto insurance, and many other application scenarios. However, traditional methods mainly rely on handcrafted features, which limit the performance that machine learning algorithms can achieve. In this paper, we propose a novel deep learning solution to this problem, which could be the first attempt at extending deep learning to driving behavior analysis based on GPS data. The proposed approach can effectively extract high-level and interpretable features describing complex driving patterns. It also requires significantly less human experience and work. The power of the learned driving style representations is validated through the driver identification problem using a large real dataset.\nAn effective indexing scheme for clusters that enables fast structure comparison and congruence check is highly desirable in the fields of mathematics, artificial intelligence, materials science, etc. Here we introduce the concept of minimum vertex-type sequence for the indexing of clusters on a square lattice, which contains a series of integers each labeling the vertex type of an atom. The minimum vertex-type sequence is orientation independent, and it builds a one-to-one correspondence with the cluster. By using minimum vertex-type sequence for structural comparison and congruence check, only one type of data is involved, and the largest amount of data to be compared is n pairs, where n is the cluster size. In comparison with traditional coordinate-based methods and distance-matrix methods, the minimum vertex-type sequence indexing scheme has many other remarkable advantages. 
Furthermore, this indexing scheme can be easily generalized to clusters on other high-symmetry lattices. Our work can facilitate cluster indexing and searching in various situations, and it may inspire the search for other practical indexing schemes for handling clusters of large sizes.\nThe strength of chess engines together with the availability of numerous chess games has attracted the attention of chess players, data scientists, and researchers during the last decades. State-of-the-art engines now provide an authoritative judgement that can be used in many applications like cheating detection, intrinsic ratings computation, skill assessment, or the study of human decision-making. A key issue for the research community is to gather a large dataset of chess games together with the judgement of chess engines. Unfortunately, the analysis of each move takes a lot of time. In this paper, we report our effort to analyse almost 5 million chess games with a computing grid. During summer 2015, we processed 270 million unique played positions using the Stockfish engine with a quite high depth (20). We populated a database of 1+ terabytes of chess evaluations, representing an estimated time of 50 years of computation on a single machine. Our effort is a first step towards the replication of research results, the supply of open data and procedures for exploring new directions, and the investigation of software engineering/scalability issues when computing billions of moves.\nCombinatorial optimization problems are typically NP-hard, and thus very challenging to solve. In this paper, we present the random key cuckoo search (RKCS) algorithm for solving the famous Travelling Salesman Problem (TSP). We used a simplified random-key encoding scheme to pass from a continuous space (real numbers) to a combinatorial space. We also consider the displacement of a solution in both spaces using Levy flights. 
The performance of the proposed RKCS is tested against a set of benchmarks of symmetric TSP from the well-known TSPLIB library. The results of the tests show that RKCS is superior to some other metaheuristic algorithms.\nUnderstanding users' interactions with highly subjective content---like artistic images---is challenging due to the complex semantics that guide our preferences. On the one hand one has to overcome `standard' recommender systems challenges, such as dealing with large, sparse, and long-tailed datasets. On the other, several new challenges present themselves, such as the need to model content in terms of its visual appearance, or even social dynamics, such as a preference toward a particular artist that is independent of the art they create.   In this paper we build large-scale recommender systems to model the dynamics of a vibrant digital art community, Behance, consisting of tens of millions of interactions (clicks and `appreciates') of users toward digital art. Methodologically, our main contributions are to model (a) rich content, especially in terms of its visual appearance; (b) temporal dynamics, in terms of how users prefer `visually consistent' content within and across sessions; and (c) social dynamics, in terms of how users exhibit preferences both towards certain art styles, as well as the artists themselves.\nWe present a long-term intrinsically motivated structure learning method for modeling transition dynamics during controlled interactions between a robot and semi-permanent structures in the world. In particular, we discuss how partially-observable state is represented using distributions over a Markovian state and build models of objects that predict how state distributions change in response to interactions with such objects. These structures serve as the basis for a number of possible future tasks defined as Markov Decision Processes (MDPs). 
The approach is an example of a structure learning technique applied to a multimodal affordance representation that yields a population of forward models for use in planning. We evaluate the approach using experiments on a bimanual mobile manipulator (uBot-6) that show the performance of model acquisition as the number of transition actions increases.\nFuzzy logic is an alternate approach for quantifying uncertainty relating to activity duration. The fuzzy version of the backward recursion has been shown to produce results that incorrectly amplify the level of uncertainty. However, the fuzzy version of the forward recursion has been widely proposed as an approach for determining the fuzzy set of critical path lengths. In this paper, the direct application of the extension principle leads to a proposition that must be satisfied in fuzzy critical path analysis. Using a counterexample it is demonstrated that the fuzzy forward recursion when discrete fuzzy sets are used to represent activity durations produces results that are not consistent with the theory presented. The problem is shown to be the application of the fuzzy maximum. Several methods presented in the literature are described and shown to provide results that are consistent with the extension principle.\nIn recent years RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this paper we present a practical implementation of a different kind of knowledge representation based on Prototypes. In detail, we present a concrete syntax easily and effectively parsable by applications. We also present extensible implementations of a prototype knowledge base, specifically designed for storage of Prototypes. These implementations are written in Java and can be extended by using the implementation as a library. Alternatively, the software can be deployed as such. 
Further, results of benchmarks for both local and web deployment are presented. This paper augments a research paper, in which we describe the more theoretical aspects of our Prototype system.\nThis paper introduces a novel method for learning how to play the most difficult Atari 2600 games from the Arcade Learning Environment using deep reinforcement learning. The proposed method, human checkpoint replay, consists in using checkpoints sampled from human gameplay as starting points for the learning process. This is meant to compensate for the difficulties of current exploration strategies, such as epsilon-greedy, to find successful control policies in games with sparse rewards. Like other deep reinforcement learning architectures, our model uses a convolutional neural network that receives only raw pixel inputs to estimate the state value function. We tested our method on Montezuma's Revenge and Private Eye, two of the most challenging games from the Atari platform. The results we obtained show a substantial improvement compared to previous learning approaches, as well as over a random player. We also propose a method for training deep reinforcement learning agents using human gameplay experience, which we call human experience replay.\nRecent years have seen significant market penetration for voice-based personal assistants such as Apple's Siri. However, despite this success, user take-up is frustratingly low. This position paper argues that there is a habitability gap caused by the inevitable mismatch between the capabilities and expectations of human users and the features and benefits provided by contemporary technology. Suggestions are made as to how such problems might be mitigated, but a more worrisome question emerges: \"is spoken language all-or-nothing\"? 
The answer, based on contemporary views on the special nature of (spoken) language, is that there may indeed be a fundamental limit to the interaction that can take place between mismatched interlocutors (such as humans and machines). However, it is concluded that interactions between native and non-native speakers, or between adults and children, or even between humans and dogs, might provide critical inspiration for the design of future speech-based human-machine interaction.\nReal-time analytics that requires integration and aggregation of heterogeneous and distributed streaming and static data is a typical task in many industrial scenarios such as diagnostics of turbines at Siemens. The OBDA approach has great potential to facilitate such tasks; however, it has a number of limitations in dealing with analytics that restrict its use in important industrial applications. Based on our experience with Siemens, we argue that in order to overcome those limitations OBDA should be extended and become analytics, source, and cost aware. In this work we propose such an extension. In particular, we propose an ontology, mapping, and query language for OBDA, where aggregate and other analytical functions are first-class citizens. Moreover, we develop query optimisation techniques that allow analytical tasks over static and streaming data to be processed efficiently. We implement our approach in a system and evaluate our system with Siemens turbine data.\nA framework for consensus modelling is introduced using Kleene's three-valued logic as a means to express vagueness in agents' beliefs. Explicitly borderline cases are inherent to propositions involving vague concepts where sentences of a propositional language may be absolutely true, absolutely false or borderline. 
By exploiting these intermediate truth values, we can allow agents to adopt a more vague interpretation of underlying concepts in order to weaken their beliefs and reduce the levels of inconsistency, so as to achieve consensus. We consider a consensus combination operation which results in agents adopting the borderline truth value as a shared viewpoint if they are in direct conflict. Simulation experiments are presented which show that applying this operator to agents chosen at random (subject to a consistency threshold) from a population, with initially diverse opinions, results in convergence to a smaller set of more precise shared beliefs. Furthermore, if the choice of agents for combination is dependent on the payoff of their beliefs, this acting as a proxy for performance or usefulness, then the system converges to beliefs which, on average, have higher payoff.\nThis paper presents the study of an event grouping based algorithm for a university course timetabling problem. Several publications which discuss the problem and some approaches for its solution are analyzed. The grouping of events in groups with an equal number of events in each group is not applicable to all input data sets. For this reason, a universal approach to all possible groupings of events in commensurate in size groups is proposed here. Also, an implementation of an algorithm based on this approach is presented. The methodology, conditions and the objectives of the experiment are described. The experimental results are analyzed and the ensuing conclusions are stated. The future guidelines for further research are formulated.\nNeural conversational models tend to produce generic or safe responses in different contexts, e.g., reply \\textit{\"Of course\"} to narrative statements or \\textit{\"I don't know\"} to questions. In this paper, we propose an end-to-end approach to avoid such problem in neural generative models. 
Additional memory mechanisms have been introduced to standard sequence-to-sequence (seq2seq) models, so that context can be considered while generating sentences. Three seq2seq models, which memorize a fixed-size contextual vector from hidden input, hidden input/output and a gated contextual attention structure respectively, have been trained and tested on a dataset of labeled question-answering pairs in Chinese. The model with contextual attention outperforms the others, including the state-of-the-art seq2seq models, on the perplexity test. The novel contextual model generates diverse and robust responses, and is able to carry out conversations on a wide range of topics appropriately.\nBig longitudinal observational databases present the opportunity to extract new knowledge in a cost-effective manner. Unfortunately, the ability of these databases to be used for causal inference is limited due to the passive way in which the data are collected, resulting in various forms of bias. In this paper we investigate a method that can overcome these limitations and determine causal contrast set rules efficiently from big data. In particular, we present a new methodology for the purpose of identifying risk factors that increase a patient's likelihood of experiencing the known rare side effect of renal failure after ingesting aminosalicylates. The results show that the methodology was able to identify previously researched risk factors such as being prescribed diuretics and highlighted that patients with a higher-than-average risk of renal failure may be even more susceptible to experiencing it as a side effect after ingesting aminosalicylates.\nA major challenge in consumer credit risk portfolio management is to classify households according to their risk profile. 
In order to build such risk profiles, it is necessary to employ an approach that analyses data systematically in order to detect important relationships, interactions, dependencies and associations amongst the available continuous and categorical variables together, and to accurately generate profiles of the most interesting household segments according to their credit risk. The objective of this work is to employ a knowledge discovery from databases process to identify groups of indebted households and describe their profiles using a database collected by the Consumer Credit Counselling Service (CCCS) in the UK. Employing a framework that allows both categorical and continuous data to be used together to find hidden structures in unlabelled data, the ideal number of clusters was established and these clusters were described in order to identify the households that exhibit a high propensity for excessive debt levels.\nIn this position paper, we present ideas about creating a next-generation framework towards an adaptive interface for data communication and visualisation systems. Our objective is to develop a system that accepts large data sets as inputs and provides user-centric, meaningful visual information to assist owners to make sense of their data collection. The proposed framework comprises four stages: (i) the knowledge base compilation, where we search and collect existing state-of-the-art visualisation techniques per domain and user preferences; (ii) the development of the learning and inference system, where we apply artificial intelligence techniques to learn, predict and recommend new graphic interpretations; (iii) results evaluation; and (iv) reinforcement and adaptation, where valid outputs are stored in our knowledge base and the system is iteratively tuned to address new demands. These stages, as well as our overall vision, limitations and possible challenges, are introduced in this article. 
We also discuss further extensions of this framework for other knowledge discovery tasks.\nPurpose: To develop a framework for identifying and incorporating candidate confounding interaction terms into a regularised cox regression analysis to refine adverse drug reaction signals obtained via longitudinal observational data. Methods: We considered six drug families that are commonly associated with myocardial infarction in observational healthcare data, but where the causal relationship ground truth is known (adverse drug reaction or not). We applied emergent pattern mining to find itemsets of drugs and medical events that are associated with the development of myocardial infarction. These are the candidate confounding interaction terms. We then implemented a cohort study design using regularised cox regression that incorporated and accounted for the candidate confounding interaction terms. Results The methodology was able to account for signals generated due to confounding and a cox regression with elastic net regularisation correctly ranked the drug families known to be true adverse drug reactions above those.\nUncertain data streams have been widely generated in many Web applications. The uncertainty in data streams makes anomaly detection from sensor data streams far more challenging. In this paper, we present a novel framework that supports anomaly detection in uncertain data streams. The proposed framework adopts an efficient uncertainty pre-processing procedure to identify and eliminate uncertainties in data streams. Based on the corrected data streams, we develop effective period pattern recognition and feature extraction techniques to improve the computational efficiency. We use classification methods for anomaly detection in the corrected data stream. 
We also show empirically that the proposed approach achieves high anomaly detection accuracy on a number of real datasets.\nNatural Language Inference is an important task for Natural Language Understanding. It is concerned with classifying the logical relation between two sentences. In this paper, we propose several text generative neural networks for generating text hypotheses, which allow the construction of new Natural Language Inference datasets. To evaluate the models, we propose a new metric: the accuracy of the classifier trained on the generated dataset. The accuracy obtained by our best generative model is only 2.7% lower than the accuracy of the classifier trained on the original, human-crafted dataset. Furthermore, the best generated dataset combined with the original dataset achieves the highest accuracy. The best model learns a mapping embedding for each training example. By comparing various metrics, we show that datasets that obtain higher ROUGE or METEOR scores do not necessarily yield higher classification accuracies. We also analyze the characteristics of a good dataset, including the distinguishability of the generated datasets from the original one.\nWhile question answering (QA) with neural networks, i.e. neural QA, has achieved promising results in recent years, the lack of a large-scale real-world QA dataset is still a challenge for developing and evaluating neural QA systems. To alleviate this problem, we propose a large-scale, human-annotated real-world QA dataset, WebQA, with more than 42k questions and 556k evidences. As existing neural QA methods treat QA either as a sequence generation or a classification/ranking problem, they face the challenges of expensive softmax computation, handling unseen answers, or requiring a separate candidate answer generation component. In this work, we cast neural QA as a sequence labeling problem and propose an end-to-end sequence labeling model, which overcomes all the above challenges. 
Experimental results on WebQA show that our model outperforms the baselines significantly with an F1 score of 74.69% with word-based input, and the performance drops only 3.72 F1 points with more challenging character-based input.\nIn this paper, we develop an agent-based model which integrates four important elements, i.e. organisational energy management policies/regulations, energy management technologies, electric appliances and equipment, and human behaviour, based on a case study, to simulate the energy consumption in office buildings. With the model, we test the effectiveness of different energy management strategies, and solve practical office energy consumption problems. This paper theoretically contributes to an integration of four elements involved in the complex organisational issue of office energy consumption, and practically contributes to an application of agent-based approach for office building energy consumption study.\nThe causal discovery of Bayesian networks is an active and important research area, and it is based upon searching the space of causal models for those which can best explain a pattern of probabilistic dependencies shown in the data. However, some of those dependencies are generated by causal structures involving variables which have not been measured, i.e., latent variables. Some such patterns of dependency \"reveal\" themselves, in that no model based solely upon the observed variables can explain them as well as a model using a latent variable. That is what latent variable discovery is based upon. Here we did a search for finding them systematically, so that they may be applied in latent variable discovery in a more rigorous fashion.\nThe OneMax problem is a standard benchmark optimisation problem for a binary search space. 
Recent work on applying a Bandit-Based Random Mutation Hill-Climbing algorithm to the noisy OneMax Problem showed that it is important to choose a good value for the resampling number to make a careful trade off between taking more samples in order to reduce noise, and taking fewer samples to reduce the total computational cost. This paper extends that observation, by deriving an analytical expression for the running time of the RMHC algorithm with resampling applied to the noisy OneMax problem, and showing both theoretically and empirically that the optimal resampling number increases with the number of dimensions in the search space.\nWe present a novel method for the compensation of long duration data loss in audio signals, in particular music. The concealment of such signal defects is based on a graph that encodes signal structure in terms of time-persistent spectral similarity. A suitable candidate segment for the substitution of the lost content is proposed by an intuitive optimization scheme and smoothly inserted into the gap, i.e. the lost or distorted signal region. Extensive listening tests show that the proposed algorithm provides highly promising results when applied to a variety of real-world music signals.\nActions may not proceed as planned; they may be interrupted, resumed or overridden. This is a challenge to handle in a natural language understanding system. We describe extensions to an existing implementation for the control of autonomous systems by natural language, to enable such systems to handle incoming language requests regarding actions. Language Communication with Autonomous Systems (LCAS) has been extended with support for X-nets, parameterized executable schemas representing actions. X-nets enable the system to control actions at a desired level of granularity, while providing a mechanism for language requests to be processed asynchronously. 
Standard semantics supported include requests to stop, continue, or override the existing action. The specific domain demonstrated is the control of motion of a simulated robot, but the approach is general, and could be applied to other domains.\nWe investigate the problem of verbalizing Web Ontology Language (OWL) axioms of domain ontologies in this paper. The existing approaches address the problem of fidelity of verbalized OWL texts to OWL semantics by exploring different ways of expressing the same OWL axiom in various linguistic forms. They also perform grouping and aggregating of the natural language (NL) sentences that are generated corresponding to each OWL statement into a comprehensible structure. However, no effort has been made to apply semantic reduction at the logical level to remove redundancies and repetitions, so that the reduced set of axioms can be used for generating a more meaningful and human-understandable (what we call redundancy-free) text. Our experiments show that formal semantic reduction at the logical level is very helpful for generating redundancy-free descriptions of ontology entities. In this paper, we particularly focus on generating descriptions of individuals of SHIQ-based ontologies. The details of a case study are provided to support the usefulness of the redundancy-free NL descriptions of individuals in a knowledge validation application.\n3D action recognition, the analysis of human actions based on 3D skeleton data, has recently become popular due to its succinctness, robustness, and view-invariant representation. Recent work on this problem has proposed RNN-based learning methods to model the contextual dependency in the temporal domain. In this paper, we extend this idea to spatio-temporal domains to analyze the hidden sources of action-related information within the input data over both domains concurrently. 
Inspired by the graphical structure of the human skeleton, we further propose a more powerful tree-structure based traversal method. To handle the noise and occlusion in 3D skeleton data, we introduce new gating mechanism within LSTM to learn the reliability of the sequential input data and accordingly adjust its effect on updating the long-term context information stored in the memory cell. Our method achieves state-of-the-art performance on 4 challenging benchmark datasets for 3D human action analysis.\nAutonomous robots operating in dynamic environments must maintain beliefs over a hypothesis space that is rich enough to represent the activities of interest at different scales. This is important both in order to accommodate the availability of evidence at varying degrees of coarseness, such as when interpreting and assimilating natural instructions, but also in order to make subsequent reactive planning more efficient. We present an algorithm that combines a topology-based trajectory clustering procedure that generates hierarchically-structured spatial abstractions with a bank of particle filters at each of these abstraction levels so as to produce probability estimates over an agent's navigation activity that is kept consistent across the hierarchy. We study the performance of the proposed method using a synthetic trajectory dataset in 2D, as well as a dataset taken from AIS-based tracking of ships in an extended harbour area. We show that, in comparison to a baseline which is a particle filter that estimates activity without exploiting such structure, our method achieves a better normalised error in predicting the trajectory as well as better time to convergence to a true class when compared against ground truth.\nWe present Tweet2Vec, a novel method for generating general-purpose vector representation of tweets. The model learns tweet embeddings using character-level CNN-LSTM encoder-decoder. 
We trained our model on 3 million randomly selected English-language tweets. The model was evaluated using two methods: tweet semantic similarity and tweet sentiment categorization, outperforming the previous state-of-the-art in both tasks. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic, and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on English-language tweets, the method presented can be used to learn tweet embeddings for different languages.\nMost application development happens in the context of complex APIs; reference documentation for APIs has grown tremendously in variety, complexity, and volume, and can be difficult to navigate. There is a growing need to develop well-organized ways to access the knowledge latent in the documentation; several research efforts deal with the organization (ontology) of API-related knowledge. Extensive knowledge-engineering work, supported by a rigorous qualitative analysis, by Maalej & Robillard [3] has identified a useful taxonomy of API knowledge. Based on this taxonomy, we introduce a domain-independent technique to extract the knowledge types from the given API reference documentation. Our system, OntoCat, introduces a total of nine different features and their semantic and statistical combinations to classify the different knowledge types. We tested OntoCat on Python API reference documentation. Our experimental results show the effectiveness of the system and open up possible related research areas (e.g., user behavior, documentation quality).\nThis survey outlines a general and modular theory for proving approximation guarantees for equilibria of auctions in complex settings. 
This theory complements traditional economic techniques, which generally focus on exact and optimal solutions and are accordingly limited to relatively stylized settings.   We highlight three user-friendly analytical tools: smoothness-type inequalities, which immediately yield approximation guarantees for many auction formats of interest in the special case of complete information and deterministic strategies; extension theorems, which extend such guarantees to randomized strategies, no-regret learning outcomes, and incomplete-information settings; and composition theorems, which extend such guarantees from simpler to more complex auctions. Combining these tools yields tight worst-case approximation guarantees for the equilibria of many widely-used auction formats.\nUnstructured data refers to information that does not have a predefined data model or is not organized in a pre-defined manner. Loosely speaking, unstructured data refers to text data that is generated by humans. In after-sales service businesses, there are two main sources of unstructured data: customer complaints, which generally describe symptoms, and technician comments, which outline diagnostics and treatment information. A legitimate customer complaint can eventually be tracked to a failure or a claim. However, there is a delay between the time of a customer complaint and the time of a failure or a claim. A proactive strategy aimed at analyzing customer complaints for symptoms can help service providers detect reliability problems in advance and initiate corrective actions such as recalls. This paper introduces essential text mining concepts in the context of reliability analysis and a method to detect emerging reliability issues. The application of the method is illustrated using a case study.\nWe introduce a framework for model learning and planning in stochastic domains with continuous state and action spaces and non-Gaussian transition models. 
It is efficient because (1) local models are estimated only when the planner requires them; (2) the planner focuses on the most relevant states to the current planning problem; and (3) the planner focuses on the most informative and/or high-value actions. Our theoretical analysis shows the validity and asymptotic optimality of the proposed approach. Empirically, we demonstrate the effectiveness of our algorithm on a simulated multi-modal pushing problem.\nWe introduce a framework for supporting learning to program in the paradigm of Answer Set Programming (ASP), which is a declarative logic programming formalism. Based on the idea of teaching by asking the student to complete small example ASP programs, we introduce a three-stage method for giving hints to the student without revealing the correct solution of an example. We categorize mistakes into (i) syntactic mistakes, (ii) unexpected but syntactically correct input, and (iii) semantic mistakes, describe mathematical definitions of these mistakes, and show how to compute hints from these definitions.\nThe rapid development of autonomous vehicles spurred a careful investigation of the potential benefits of all-autonomous transportation networks. Most studies conclude that autonomous systems can enable drastic improvements in performance. A widely studied concept is all-autonomous, collision-free intersections, where vehicles arriving in a traffic intersection with no traffic light adjust their speeds to cross safely through the intersection as quickly as possible. In this paper, we propose a coordination control algorithm for this problem, assuming stochastic models for the arrival times of the vehicles. The proposed algorithm provides provable guarantees on safety and performance. More precisely, it is shown that no collisions occur surely, and moreover a rigorous upper bound is provided for the expected wait time. The algorithm is also demonstrated in simulations. 
The proposed algorithms are inspired by polling systems. In fact, the problem studied in this paper leads to a new polling system where customers are subject to differential constraints, which may be interesting in its own right.\nDue to the lack of structured knowledge applied in learning distributed representations of categories, existing work cannot incorporate category hierarchies into entity information. We propose a framework that embeds entities and categories into a semantic space by integrating structured knowledge and taxonomy hierarchy from large knowledge bases. The framework allows us to compute meaningful semantic relatedness between entities and categories. Our framework can handle both single-word concepts and multiple-word concepts with superior performance on concept categorization and yield state-of-the-art results on dataless hierarchical classification.\nIn the medical domain, the continuous stream of scientific research contains contradictory results supported by arguments and counter-arguments. As medical expertise occurs at different levels, some human agents have difficulty not only in facing the huge number of studies, but also in understanding the reasons and pieces of evidence claimed by the proponents and the opponents of the debated topic. To better understand the supporting arguments for new findings related to the current state of the art in the medical domain, we need tools able to identify arguments in scientific papers. Our work here aims to fill the above technological gap.   Quite aware of the difficulty of this task, we embark on this road by relying on the well-known interleaving of domain knowledge with natural language processing. To formalise the existing medical knowledge, we rely on ontologies. To structure the argumentation model, we also use the expressivity and reasoning capabilities of Description Logics. To perform argumentation mining, we formalise various linguistic patterns in a rule-based language.
We tested our solution against a corpus of scientific papers related to breast cancer. The experiments show an F-measure between 0.71 and 0.86 for identifying conclusions of an argument and between 0.65 and 0.86 for identifying premises of an argument.\nParkinson's disease is the second most common neurodegenerative disease, affecting more than 1.2 million people in Europe. Medications are available for the management of its symptoms, but the exact cause of the disease is unknown and there is currently no cure on the market. To better understand the relations between new findings and current medical knowledge, we need tools able to analyse published medical papers based on natural language processing and tools capable of identifying various relationships of new findings with the current medical knowledge. Our work aims to fill the above technological gap.   To identify conflicting information in medical documents, we employ textual entailment technology. To encapsulate existing medical knowledge, we rely on ontologies. To connect the formal axioms in ontologies with natural text in medical articles, we exploit ontology verbalisation techniques. To assess the level of disagreement between human agents with respect to a medical issue, we rely on fuzzy aggregation. To harmonize this disagreement, we design mediation protocols within a multi-agent framework.\nThis paper addresses the task of zero-shot image classification. The key contribution of the proposed approach is to control the semantic embedding of images -- one of the main ingredients of zero-shot learning -- by formulating it as a metric learning problem. The optimized empirical criterion associates two types of sub-task constraints: metric discriminating capacity and accurate attribute prediction. This results in a novel expression of zero-shot learning not requiring the notion of class in the training phase: only pairs of image/attributes, augmented with a consistency indicator, are given as ground truth.
At test time, the learned model can predict the consistency of a test image with a given set of attributes, allowing flexible ways to produce recognition inferences. Despite its simplicity, the proposed approach gives state-of-the-art results on four challenging datasets used for zero-shot recognition evaluation.\nThe pairwise comparison matrix, a crucial component of AHP, presents the preference relations among alternatives. However, in many cases, the pairwise comparison matrix is difficult to complete, which obstructs the subsequent operations of the classical AHP. In this paper, based on DEMATEL, which is able to derive the total relation matrix from the direct relation matrix, a new completion method for incomplete pairwise comparison matrices is proposed. The proposed method provides a new perspective to estimate the missing values with explicit physical meaning. Besides, the proposed method has low computational cost. This promising method has wide application in multi-criteria decision-making.\nMalware detection is a growing problem, particularly on the Android mobile platform due to its increasing popularity and accessibility to numerous third-party app markets. This has also been made worse by the increasingly sophisticated detection avoidance techniques employed by emerging malware families. This calls for more effective techniques for detection and classification of Android malware. Hence, in this paper we present an n-opcode analysis based approach that utilizes machine learning to classify and categorize Android malware. This approach enables automated feature discovery that eliminates the need for applying expert or domain knowledge to define the needed features. Our experiments on 2520 samples that were performed using up to 10-gram opcode features showed that an f-measure of 98% is achievable using this approach.\nMobile malware has continued to grow at an alarming rate despite ongoing efforts towards mitigating the problem.
This has been particularly noticeable on Android due to its being an open platform that has subsequently overtaken other platforms in the share of the mobile smart devices market. This, in turn, has incentivized a new wave of emerging Android malware sophisticated enough to evade most common detection methods. This paper proposes and investigates a parallel machine learning based classification approach for early detection of Android malware. Using real malware samples and benign applications, a composite classification model is developed from parallel combination of heterogeneous classifiers. The empirical evaluation of the model under different combination schemes demonstrates its efficacy and potential to improve detection accuracy. More importantly, by utilizing several classifiers with diverse characteristics, their strengths can be harnessed not only for enhanced Android malware detection but also for quicker white-box analysis by means of the more interpretable constituent classifiers.\nIoT Big Data requires new machine learning methods able to scale to large volumes of data arriving at high speed. Decision trees are popular machine learning models since they are very effective, yet easy to interpret and visualize. In the literature, we can find distributed algorithms for learning decision trees, and also streaming algorithms, but not algorithms that combine both features. In this paper we present the Vertical Hoeffding Tree (VHT), the first distributed streaming algorithm for learning decision trees. It features a novel way of distributing decision trees via vertical parallelism. The algorithm is implemented on top of Apache SAMOA, a platform for mining distributed data streams, and thus able to run on real-world clusters.
We run several experiments to study the accuracy and throughput performance of our new VHT algorithm, as well as its ability to scale while keeping its superior performance with respect to non-distributed decision trees.\nAs we shift more of our lives into the virtual domain, the volume of data shared on the web keeps increasing and presents a threat to our privacy. This work contributes to the understanding of the privacy implications of such data sharing by analysing how well people are recognisable in social media data. To facilitate a systematic study we define a number of scenarios considering factors such as how many heads of a person are tagged and whether those heads are obfuscated or not. We propose a robust person recognition system that can handle large variations in pose and clothing, and can be trained with few training samples. Our results indicate that a handful of images is enough to threaten users' privacy, even in the presence of obfuscation. We show detailed experimental results, and discuss their implications.\nInfluence diagrams provide a compact graphical representation of decision problems. Several algorithms for the quick computation of their associated expected utilities are available in the literature. However, often they rely on a full quantification of both probabilistic uncertainties and utility values. For problems where all random variables and decision spaces are finite and discrete, here we develop a symbolic way to calculate the expected utilities of influence diagrams that does not require a full numerical representation. Within this approach expected utilities correspond to families of polynomials. After characterizing their polynomial structure, we develop an efficient symbolic algorithm for the propagation of expected utilities through the diagram and provide an implementation of this algorithm using a computer algebra system.
We then characterize many of the standard manipulations of influence diagrams as transformations of polynomials. We also generalize the decision analytic framework of these diagrams by defining asymmetries as operations over the expected utility polynomials.\nIn the task of community detection, there often exists some useful prior information. In this paper, a Semi-supervised clustering approach using a new Evidential Label Propagation strategy (SELP) is proposed to incorporate the domain knowledge into the community detection model. The main advantage of SELP is that it can take limited supervised knowledge to guide the detection process. The prior information of community labels is expressed in the form of mass functions initially. Then a new evidential label propagation rule is adopted to propagate the labels from labeled data to unlabeled ones. The outliers can be identified to be in a special class. The experimental results demonstrate the effectiveness of SELP.\nThe DLVHEX system implements the HEX-semantics, which integrates answer set programming (ASP) with arbitrary external sources. Since its first release ten years ago, significant advancements were achieved. Most importantly, the exploitation of properties of external sources led to efficiency improvements and flexibility enhancements of the language, and technical improvements on the system side increased user's convenience. In this paper, we present the current status of the system and point out the most important recent enhancements over early versions. While existing literature focuses on theoretical aspects and specific components, a bird's eye view of the overall system is missing. In order to promote the system for real-world applications, we further present applications which were already successfully realized on top of DLVHEX. 
This paper is under consideration for acceptance in Theory and Practice of Logic Programming.\nAs data science continues to grow in popularity, there will be an increasing need to make data science tools more scalable, flexible, and accessible. In particular, automated machine learning (AutoML) systems seek to automate the process of designing and optimizing machine learning pipelines. In this chapter, we present a genetic programming-based AutoML system called TPOT that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification problem. Further, we analyze a large database of pipelines that were previously used to solve various supervised classification problems and identify 100 short series of machine learning operations that appear the most frequently, which we call the building blocks of machine learning pipelines. We harness these building blocks to initialize TPOT with promising solutions, and find that this sensible initialization method significantly improves TPOT's performance on one benchmark at no cost of significantly degrading performance on the others. Thus, sensible initialization with machine learning pipeline building blocks shows promise for GP-based AutoML systems, and should be further refined in future work.\nPDDL+ is an extension of PDDL that enables modelling planning domains with mixed discrete-continuous dynamics. In this paper we present a new approach to PDDL+ planning based on Constraint Answer Set Programming (CASP), i.e. ASP rules plus numerical constraints. To the best of our knowledge, ours is the first attempt to link PDDL+ planning and logic programming. We provide an encoding of PDDL+ models into CASP problems. The encoding can handle non-linear hybrid domains, and represents a solid basis for applying logic programming to PDDL+ planning. 
As a case study, we consider the EZCSP CASP solver and obtain promising results on a set of PDDL+ benchmark problems.\nThis paper explores the capabilities of convolutional neural networks to deal with a task that is easily manageable for humans: perceiving the 3D pose of a human body from varying angles. However, in our approach, we are restricted to using a monocular vision system. For this purpose, we apply a convolutional neural network approach on RGB videos and extend it to three-dimensional convolutions. This is done via encoding the time dimension in videos as the 3rd dimension in convolutional space, and directly regressing to human body joint positions in 3D coordinate space. This research shows the ability of such a network to achieve state-of-the-art performance on the selected Human3.6M dataset, thus demonstrating the possibility of successfully representing temporal data with an additional dimension in the convolutional operation.\nIn this paper, a progressive learning technique for multi-class classification is proposed. This newly developed learning technique is independent of the number of class constraints and it can learn new classes while still retaining the knowledge of previous classes. Whenever a new class (non-native to the knowledge learnt thus far) is encountered, the neural network structure gets remodeled automatically by facilitating new neurons and interconnections, and the parameters are calculated in such a way that it retains the knowledge learnt thus far. This technique is suitable for real-world applications where the number of classes is often unknown and online learning from real-time data is required. The consistency and the complexity of the progressive learning technique are analyzed. Several standard datasets are used to evaluate the performance of the developed technique.
A comparative study shows that the developed technique is superior.\nThe community deception problem is about how to hide a target community C from community detection algorithms. The need for deception emerges whenever a group of entities (e.g., activists, police enforcements) want to cooperate while concealing their existence as a community. In this paper we introduce and formalize the community deception problem. To solve this problem, we describe algorithms that carefully rewire the connections of C's members. We experimentally show how several existing community detection algorithms can be deceived, and quantify the level of deception by introducing a deception score. We believe that our study is intriguing since, while showing how deception can be realized it raises awareness for the design of novel detection algorithms robust to deception techniques.\nThe computation and storage requirements for Deep Neural Networks (DNNs) are usually high. This issue limits their deployability on ubiquitous computing devices such as smart phones, wearables and autonomous drones. In this paper, we propose ternary neural networks (TNNs) in order to make deep learning more resource-efficient. We train these TNNs using a teacher-student approach based on a novel, layer-wise greedy methodology. Thanks to our two-stage training procedure, the teacher network is still able to use state-of-the-art methods such as dropout and batch normalization to increase accuracy and reduce training time. Using only ternary weights and activations, the student ternary network learns to mimic the behavior of its teacher network without using any multiplication. Unlike its -1,1 binary counterparts, a ternary neural network inherently prunes the smaller weights by setting them to zero during training. This makes them sparser and thus more energy-efficient. We design a purpose-built hardware architecture for TNNs and implement it on FPGA and ASIC. 
We evaluate TNNs on several benchmark datasets and demonstrate up to 3.1x better energy efficiency with respect to the state of the art while also improving accuracy.\nOne of the fundamental problems in crowdsourcing is the trade-off between the number of the workers needed for high-accuracy aggregation and the budget to pay. For saving budget, it is important to ensure high quality of the crowd-sourced labels, hence the total cost on label collection will be reduced. Since the self-confidence of the workers often has a close relationship with their abilities, a possible way for quality control is to request the workers to return the labels only when they feel confident, by means of providing unsure option to them. On the other hand, allowing workers to choose unsure option also leads to the potential danger of budget waste. In this work, we propose the analysis towards understanding when providing the unsure option indeed leads to significant cost reduction, as well as how the confidence threshold is set. We also propose an online mechanism, which is alternative for threshold selection when the estimation of the crowd ability distribution is difficult.\nDespite significant developments in Proof Theory, surprisingly little attention has been devoted to the concept of proof verifier. In particular, the mathematical community may be interested in studying different types of proof verifiers (people, programs, oracles, communities, superintelligences) as mathematical objects. Such an effort could reveal their properties, their powers and limitations (particularly in human mathematicians), minimum and maximum complexity, as well as self-verification and self-reference issues. We propose an initial classification system for verifiers and provide some rudimentary analysis of solved and open problems in this important domain. 
Our main contribution is a formal introduction of the notion of unverifiability, for which the paper could serve as a general citation in domains of theorem proving, as well as software and AI verification.\nMany real-world problems are composed of several interacting components. In order to facilitate research on such interactions, the Traveling Thief Problem (TTP) was created in 2013 as the combination of two well-understood combinatorial optimization problems.   With this article, we contribute in four ways. First, we create a comprehensive dataset that comprises the performance data of 21 TTP algorithms on the full original set of 9720 TTP instances. Second, we define 55 characteristics for all TTP instances that can be used to select the best algorithm on a per-instance basis. Third, we use these algorithms and features to construct the first algorithm portfolios for TTP, clearly outperforming the single best algorithm. Finally, we study which algorithms contribute most to this portfolio.\nIn the context of the Competition on Legal Information Extraction/Entailment (COLIEE), we propose a method comprising the necessary steps for finding relevant documents to a legal question and deciding on textual entailment evidence to provide a correct answer. The proposed method is based on the combination of several lexical and morphological characteristics, to build a language model and a set of features for Machine Learning algorithms. We provide a detailed study of the proposed method's performance and failure cases, indicating that it is competitive with state-of-the-art approaches on Legal Information Retrieval and Question Answering, while neither needing extensive training data nor depending on expert-produced knowledge.
The proposed method achieved significant results in the competition, indicating a substantial level of adequacy for the tasks addressed.\nClassification involves the learning of the mapping function that associates input samples to corresponding target label. There are two major categories of classification problems: Single-label classification and Multi-label classification. Traditional binary and multi-class classifications are sub-categories of single-label classification. Several classifiers are developed for binary, multi-class and multi-label classification problems, but there are no classifiers available in the literature capable of performing all three types of classification. In this paper, a novel online universal classifier capable of performing all the three types of classification is proposed. Being a high speed online classifier, the proposed technique can be applied to streaming data applications. The performance of the developed classifier is evaluated using datasets from binary, multi-class and multi-label problems. The results obtained are compared with state-of-the-art techniques from each of the classification types.\nObservable operator models (OOMs) and related models are one of the most important and powerful tools for modeling and analyzing stochastic systems. They exactly describe dynamics of finite-rank systems and can be efficiently and consistently estimated through spectral learning under the assumption of identically distributed data. In this paper, we investigate the properties of spectral learning without this assumption due to the requirements of analyzing large-time scale systems, and show that the equilibrium dynamics of a system can be extracted from nonequilibrium observation data by imposing an equilibrium constraint. In addition, we propose a binless extension of spectral learning for continuous data. 
In comparison with the other continuous-valued spectral algorithms, the binless algorithm can achieve consistent estimation of equilibrium dynamics with only linear complexity.\nQ-learning is a simple and powerful tool in solving dynamic problems where environments are unknown. It uses a balance of exploration and exploitation to find an optimal solution to the problem. In this paper, we propose using four basic emotions: joy, sadness, fear, and anger to influence a Q-learning agent. Simulations show that the proposed affective agent requires fewer steps to find the optimal path. We found that when the affective agent finds the optimal path, the ratio of exploration to exploitation gradually decreases, indicating a lower total step count in the long run.\nOpen Trip Planner was identified as the most promising open source multi-modal trip planning software. Open Street Map, which provides mapping data to Open Trip Planner, is one of the most well-known open source international repositories of geographic data. General Transit Feed Specification, which provides transportation data to Open Trip Planner, has been the standard for describing transit systems and a platform for numerous applications. Together, when used to implement an instance of Open Trip Planner, these software packages have been helping in traffic decongestion all over the world by assisting commuters to shift from using private transportation modes to public ones. Their potential, however, goes beyond providing multi-modal public transportation routes.
This paper aims to first discuss the researchers' experience in implementing a public transportation route planner for the purpose of traffic decongestion. The researchers then examine the prospects of using the system for disaster preparedness and recovery, and concrete ways to realize them.\nStarting from a generalization of the standard axioms for a monoid we present a stepwise development of various, mutually equivalent foundational axiom systems for category theory. Our axiom sets have been formalized in the Isabelle/HOL interactive proof assistant, and this formalization utilizes a semantically correct embedding of free logic in classical higher-order logic. The modeling and formal analysis of our axiom sets has been significantly supported by a series of experiments with automated reasoning tools integrated with Isabelle/HOL. We also address the relation of our axiom systems to alternative proposals from the literature, including an axiom set proposed by Freyd and Scedrov for which we reveal a technical issue (when encoded in free logic where free variables range over defined and undefined objects): either all operations, e.g. morphism composition, are total or their axiom system is inconsistent. The repair for this problem is quite straightforward, however.\nMicroscopic pedestrian studies require large amounts of trajectory data from real-world pedestrian crowds. Such data collection, if done manually, needs tremendous effort and is very time-consuming. Though many studies have asserted the possibility of automating this task using video cameras, we found that only a few have demonstrated good performance in very crowded situations or from a top-angled view scene. This paper deals with tracking pedestrian crowds under heavy occlusions from an angular scene. Our automated tracking system consists of two modules that perform sequentially. The first module detects moving objects as blobs. The second module is a tracking system.
We employ the probability distribution from the detection of each pedestrian and use a Bayesian update to track the next position. The result of such tracking is a database of pedestrian trajectories over time and space. With certain prior information, we showed that the system can track a large number of people under occlusion and in cluttered scenes.\nReinforcement learning tasks are typically specified as Markov decision processes. This formalism has been highly successful, though specifications often couple the dynamics of the environment and the learning objective. This lack of modularity can complicate generalization of the task specification, as well as obfuscate connections between different task settings, such as episodic and continuing. In this work, we introduce the RL task formalism, which provides a unification through simple constructs including a generalization to transition-based discounting. Through a series of examples, we demonstrate the generality and utility of this formalism. Finally, we extend standard learning constructs, including Bellman operators, and extend some seminal theoretical results, including approximation error bounds. Overall, we provide a well-understood and sound formalism on which to build theoretical results and simplify algorithm use and development.\nMarkov Random Fields (MRFs), a formulation widely used in generative image modeling, have long been plagued by the lack of expressive power. This issue is primarily due to the fact that conventional MRF formulations tend to use simplistic factors to capture local patterns. In this paper, we move beyond such limitations, and propose a novel MRF model that uses fully-connected neurons to express the complex interactions among pixels. Through theoretical analysis, we reveal an inherent connection between this model and recurrent neural networks, and thereon derive an approximated feed-forward network that couples multiple RNNs along opposite directions.
This formulation combines the expressive power of deep neural networks and the cyclic dependency structure of MRF in a unified model, bringing the modeling capability to a new level. The feed-forward approximation also allows it to be efficiently learned from data. Experimental results on a variety of low-level vision tasks show notable improvement over the state of the art.\nIn this work we introduce a convolutional neural network (CNN) that jointly handles low-, mid-, and high-level vision tasks in a unified architecture that is trained end-to-end. Such a universal network can act like a 'Swiss knife' for vision tasks; we call this architecture an UberNet to indicate its overarching nature.   We address two main technical challenges that emerge when broadening the range of tasks handled by a single CNN: (i) training a deep architecture while relying on diverse training sets and (ii) training many (potentially unlimited) tasks with a limited memory budget. Properly addressing these two problems allows us to train accurate predictors for a host of tasks, without compromising accuracy.   Through these advances we train in an end-to-end manner a CNN that simultaneously addresses (a) boundary detection, (b) normal estimation, (c) saliency estimation, (d) semantic segmentation, (e) human part segmentation, (f) semantic boundary detection, and (g) region proposal generation and object detection. We obtain competitive performance while jointly addressing all of these tasks in 0.7 seconds per frame on a single GPU. A demonstration of this system can be found at http://cvn.ecp.fr/ubernet/.\nWe consider a non-stationary formulation of the stochastic multi-armed bandit where the rewards are no longer assumed to be identically distributed. For the best-arm identification task, we introduce a version of Successive Elimination based on random shuffling of the $K$ arms.
We prove that under a novel and mild assumption on the mean gap $\\Delta$, this simple but powerful modification achieves the same guarantees in terms of sample complexity and cumulative regret as its original version, but in a much wider class of problems, as it is no longer constrained to stationary distributions. We also show that the original {\\sc Successive Elimination} fails to have controlled regret in this more general scenario, thus showing the benefit of shuffling. We then remove our mild assumption and adapt the algorithm to the best-arm identification task with switching arms. We adapt the definition of the sample complexity for that case and prove that, against an optimal policy with $N-1$ switches of the optimal arm, this new algorithm achieves an expected sample complexity of $O(\\Delta^{-2}\\sqrt{NK\\delta^{-1} \\log(K \\delta^{-1})})$, where $\\delta$ is the probability of failure of the algorithm, and an expected cumulative regret of $O(\\Delta^{-1}{\\sqrt{NTK \\log (TK)}})$ after $T$ time steps.\nThis paper reports the activities and outcomes in the Workshop on Grasping and Manipulation Datasets that was organized under the International Conference on Robotics and Automation (ICRA) 2016. The half day workshop was packed with nine invited talks, 12 interactive presentations, and one panel discussion with ten panelists. This paper summarizes all the talks and presentations and recaps what was discussed in the panel session. This summary serves as a review of recent developments in data collection in grasping and manipulation. Many of the presentations describe ongoing efforts or explorations that could be achieved and fully available in a year or two. The panel discussion not only commented on the current approaches, but also indicated new directions and focuses. The workshop clearly displayed the importance of quality datasets in robotics and robotic grasping and manipulation field. 
Hopefully the workshop could motivate larger efforts to create big datasets that are comparable with big datasets in other communities such as computer vision.\nNowadays, financial data analysis is becoming increasingly important in the business market. As companies collect more and more data from daily operations, they expect to extract useful knowledge from existing collected data to help make reasonable decisions for new customer requests, e.g. user credit category, churn analysis, real estate analysis, etc. Financial institutes have applied different data mining techniques to enhance their business performance. However, a simple application of these techniques could raise performance issues. Besides, there are very few general models for both understanding and forecasting different financial fields. We present in this paper a new classification model for analyzing financial data. We also evaluate this model with different real-world data to show its performance.\nWe consider scenarios from the real-time strategy game StarCraft as new benchmarks for reinforcement learning algorithms. We propose micromanagement tasks, which present the problem of the short-term, low-level control of army members during a battle. From a reinforcement learning point of view, these scenarios are challenging because the state-action space is very large, and because there is no obvious feature representation for the state-action evaluation function. We describe our approach to tackle the micromanagement scenarios with deep neural network controllers from raw state features given by the game engine. In addition, we present a heuristic reinforcement learning algorithm which combines direct exploration in the policy space and backpropagation. This algorithm allows for the collection of traces for learning using deterministic policies, which appears much more efficient than, for example, {\\epsilon}-greedy exploration. 
Experiments show that with this algorithm, we successfully learn non-trivial strategies for scenarios with armies of up to 15 agents, where both Q-learning and REINFORCE struggle.\nOne of the main challenges in Grid systems is designing an adaptive, scalable, and model-independent method for job scheduling to achieve a desirable degree of load balancing and system efficiency. Centralized job scheduling methods have some drawbacks, such as single point of failure and lack of scalability. Moreover, decentralized methods require a coordination mechanism with limited communications. In this paper, we propose a multi-agent approach to job scheduling in Grid, named Centralized Learning Distributed Scheduling (CLDS), by utilizing the reinforcement learning framework. The CLDS is a model free approach that uses the information of jobs and their completion time to estimate the efficiency of resources. In this method, there is a learner agent and several scheduler agents that perform the task of learning and job scheduling with the use of a coordination strategy that maintains the communication cost at a limited level. We evaluated the efficiency of the CLDS method by designing and performing a set of experiments on a simulated Grid system under different system scales and loads. The results show that the CLDS can effectively balance the load of the system even in large-scale and heavily loaded Grids, while maintaining its adaptive performance and scalability.\nProcess mining is a research field focused on the analysis of event data with the aim of extracting insights into processes. Applying process mining techniques on data from smart home environments has the potential to provide valuable insights into (un)healthy habits and to contribute to ambient assisted living solutions. 
Finding the right event labels to enable application of process mining techniques is however far from trivial, as simply using the triggering sensor as the label for sensor events results in uninformative models that allow for too much behavior (overgeneralizing). Refinements of sensor level event labels suggested by domain experts have been shown to enable discovery of more precise and insightful process models. However, no automated approach exists to generate refinements of event labels in the context of process mining. In this paper we propose a framework for automated generation of label refinements based on the time attribute of events. We show on a case study with real life smart home event data that behaviorally more specific, and therefore more insightful, process models can be found by using automatically generated refined labels in process discovery.\nCausal inference from observational data is a subject of active research and development in statistics and computer science. Many toolkits that depend on statistical software have been developed for this purpose. However, these toolkits do not scale to large datasets. In this paper we describe a suite of techniques for expressing causal inference tasks from observational data in SQL. This suite supports the state-of-the-art methods for causal inference and runs at scale within a database engine. In addition, we introduce several optimization techniques that significantly speed up causal inference, both in the online and offline setting. We evaluate the quality and performance of our techniques through experiments on real datasets.\nEvents and entities are closely related; entities are often actors or participants in events and events without entities are uncommon. The interpretation of events and entities is highly contextually dependent. Existing work in information extraction typically models events separately from entities, and performs inference at the sentence level, ignoring the rest of the document. 
In this paper, we propose a novel approach that models the dependencies among variables of events, entities, and their relations, and performs joint inference of these variables across a document. The goal is to enable access to document-level contextual information and facilitate context-aware predictions. We demonstrate that our approach substantially outperforms the state-of-the-art methods for event extraction as well as a strong baseline for entity extraction.\nGraph aggregation is the process of computing a single output graph that constitutes a good compromise between several input graphs, each provided by a different source. One needs to perform graph aggregation in a wide variety of situations, e.g., when applying a voting rule (graphs as preference orders), when consolidating conflicting views regarding the relationships between arguments in a debate (graphs as abstract argumentation frameworks), or when computing a consensus between several alternative clusterings of a given dataset (graphs as equivalence relations). In this paper, we introduce a formal framework for graph aggregation grounded in social choice theory. Our focus is on understanding which properties shared by the individual input graphs will transfer to the output graph returned by a given aggregation rule. We consider both common properties of graphs, such as transitivity and reflexivity, and arbitrary properties expressible in certain fragments of modal logic. Our results establish several connections between the types of properties preserved under aggregation and the choice-theoretic axioms satisfied by the rules used. The most important of these results is a powerful impossibility theorem that generalises Arrow's seminal result for the aggregation of preference orders to a large collection of different types of graphs.\nPDDL+ planning has its semantics rooted in hybrid automata (HA) and recent work has shown that it can be modeled as a network of HAs. 
Addressing the complexity of nonlinear PDDL+ planning as HAs requires both space and time efficient reasoning. Unfortunately, existing solvers either do not address nonlinear dynamics or do not natively support networks of automata.   We present a new algorithm, called HNSolve, which guides the variable selection of the dReal Satisfiability Modulo Theories (SMT) solver while reasoning about network encodings of nonlinear PDDL+ planning as HAs. HNSolve tightly integrates with dReal by solving a discrete abstraction of the HA network. HNSolve finds composite runs on the HA network that ignore continuous variables, but respect mode jumps and synchronization labels. HNSolve admissibly detects dead-ends in the discrete abstraction, and posts conflict clauses that prune the SMT solver's search. We evaluate the benefits of our HNSolve algorithm on PDDL+ benchmark problems and demonstrate its performance with respect to prior work.\nA common strategy for improving optimization algorithms is to restart the algorithm when it is believed to be trapped in an inferior part of the search space. However, while specific restart strategies have been developed for specific problems (and specific algorithms), restarts are typically not regarded as a general tool to speed up an optimization algorithm. In fact, many optimization algorithms do not employ restarts at all.   Recently, \"bet-and-run\" was introduced in the context of mixed-integer programming, where first a number of short runs with randomized initial conditions is made, and then the most promising run of these is continued. In this article, we consider two classical NP-complete combinatorial optimization problems, traveling salesperson and minimum vertex cover, and study the effectiveness of different bet-and-run strategies. In particular, our restart strategies do not take any problem knowledge into account, nor are they tailored to the optimization algorithm. Therefore, they can be used off-the-shelf. 
We observe that state-of-the-art solvers for these problems can benefit significantly from restarts on standard benchmark instances.\nReconstruction of the tridimensional geometry of a visual scene using the binocular disparity information is an important issue in computer vision and mobile robotics, which can be formulated as a Bayesian inference problem. However, computation of the full disparity distribution with an advanced Bayesian model is usually an intractable problem, and proves computationally challenging even with a simple model. In this paper, we show how probabilistic hardware using distributed memory and alternate representation of data as stochastic bitstreams can solve that problem with high performance and energy efficiency. We put forward a way to express discrete probability distributions using stochastic data representations and perform Bayesian fusion using those representations, and show how that approach can be applied to disparity computation. We evaluate the system using a simulated stochastic implementation and discuss possible hardware implementations of such architectures and their potential for sensorimotor processing and robotics.\nRelational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise to produce better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. 
CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals.\nIn this article we propose a method to refine the clustering results obtained with the nonnegative matrix factorization (NMF) technique, imposing consistency constraints on the final labeling of the data. The research community has focused its effort on the initialization and on the optimization part of this method, without paying attention to the final cluster assignments. We propose a game theoretic framework in which each object to be clustered is represented as a player, which has to choose its cluster membership. The information obtained with NMF is used to initialize the strategy space of the players and a weighted graph is used to model the interactions among the players. These interactions allow the players to choose a cluster which is coherent with the clusters chosen by similar players, a property which is not guaranteed by NMF, since it produces a soft clustering of the data. The results on common benchmarks show that our model is able to improve the performance of many NMF formulations.\nPK Dick once asked \"Do Androids Dream of Electric Sheep?\" In video games, a similar question could be asked of non-player characters: Do NPCs have dreams? Can they live and change as humans do? 
Can NPCs have personalities, and can these develop through interactions with players, other NPCs, and the world around them? Despite advances in personality AI for games, most NPCs are still undeveloped and undeveloping, reacting with flat affect and predictable routines that make them far less than human--in fact, they become little more than bits of the scenery that give out parcels of information. This need not be the case. Extreme AI, a psychology-based personality engine, creates adaptive NPC personalities. Originally developed as part of the thesis \"NPCs as People: Using Databases and Behaviour Trees to Give Non-Player Characters Personality,\" Extreme AI is now a fully functioning personality engine using all thirty facets of the Five Factor model of personality and an AI system that is live throughout gameplay. This paper discusses the research leading to Extreme AI; develops the ideas found in that thesis; discusses the development of other personality engines; and provides examples of Extreme AI's use in two game demos.\nA Bayesian agent acting in a multi-agent environment learns to predict the other agents' policies if its prior assigns positive probability to them (in other words, its prior contains a \\emph{grain of truth}). Finding a reasonably large class of policies that contains the Bayes-optimal policies with respect to this class is known as the \\emph{grain of truth problem}. Only small classes are known to have a grain of truth and the literature contains several related impossibility results. In this paper we present a formal and general solution to the full grain of truth problem: we construct a class of policies that contains all computable policies as well as Bayes-optimal policies for every lower semicomputable prior over the class. When the environment is unknown, Bayes-optimal agents may fail to act optimally even asymptotically. 
However, agents based on Thompson sampling converge to play {\\epsilon}-Nash equilibria in arbitrary unknown computable multi-agent environments. While these results are purely theoretical, we show that they can be computationally approximated arbitrarily closely.\nModeling polyphonic music is a particularly challenging task because of the intricate interplay between melody and harmony. A good model should satisfy three requirements: statistical accuracy (capturing faithfully the statistics of correlations at various ranges, horizontally and vertically), flexibility (coping with arbitrary user constraints), and generalization capacity (inventing new material, while staying in the style of the training corpus). Models proposed so far fail on at least one of these requirements. We propose a statistical model of polyphonic music, based on the maximum entropy principle. This model is able to learn and reproduce pairwise statistics between neighboring note events in a given corpus. The model is also able to invent new chords and to harmonize unknown melodies. We evaluate the invention capacity of the model by assessing the amount of cited, re-discovered, and invented chords on a corpus of Bach chorales. We discuss how the model enables the user to specify and enforce user-defined constraints, which makes it useful for style-based, interactive music generation.\nOperationalization of terminology for IT applications has revived the Wusterian approach. The conceptual dimension once more prevails after taking a back seat to specialised lexicography. This is demonstrated by the emergence of ontology in terminology. While the Terminology Principles as defined in Felber's manual and the ISO standards remain at the core of traditional terminology, their computational implementation raises some issues. 
In this article, while reiterating their importance, we will be re-examining these Principles from a dual perspective: that of logic in the mathematical sense of the term and that of epistemology as in the theory of knowledge. We will thus be clarifying and describing some of them so as to take into account advances in knowledge engineering (ontology) and formal systems (logic). The notion of ontoterminology, terminology whose conceptual system is a formal ontology, results from this approach.\nLanguage students are most engaged while reading texts at an appropriate difficulty level. However, existing methods of evaluating text difficulty focus mainly on vocabulary and do not prioritize grammatical features, hence they do not work well for language learners with limited knowledge of grammar. In this paper, we introduce grammatical templates, the expert-identified units of grammar that students learn in class, as an important feature of text difficulty evaluation. Experimental classification results show that grammatical template features significantly improve text difficulty prediction accuracy over baseline readability features by 7.4%. Moreover, we build a simple and human-understandable text difficulty evaluation approach with 87.7% accuracy, using only 5 grammatical template features.\nGradual and continuous changes in entities usually lead to the occurrence of events, yet it is usually assumed that an event occurs at once. In this research an integrated framework called continuous occurrence theory (COT) is presented to investigate the path leading to the occurrence of events in the real world. For this purpose, fundamental concepts are first defined. Afterwards, the appropriate tools such as occurrence variables computations, occurrence dependency function and occurrence model are introduced and explained in a systematic manner. 
Indeed, COT provides the possibility to: (a) monitor occurrence of events during time; (b) study background of the events; (c) recognize the relevant issues of each event; and (d) understand how these issues affect the considered event. The developed framework (COT) provides the necessary context to accurately analyze continual changes of the issues and the relevant events in the various branches of science and business. Finally, typical applications of COT and an applied modeling example of it have been explained and a mathematical programming example is modeled in the occurrence based environment.\nCan non-player characters have human-realistic personalities, changing over time depending on input from those around them? And can they have different reactions and thoughts about different people? Using Extreme AI, a psychology-based personality engine using the Five Factor model of personality, I answer these questions by creating personalities for 100 voters and allowing them to react to two politicians to see if the NPC voters' choice of candidate develops in a realistic-seeming way, based on initial and changing personality facets and on their differing feelings toward the politicians (in this case, across liking, trusting, and feeling affiliated with the candidates). After 16 test runs, the voters did indeed change their attitudes and feelings toward the candidates in different and yet generally realistic ways, and even changed their attitudes about other issues based on what a candidate extolled.\nThe continuous increase in the availability of data of any kind, coupled with the development of networks of high-speed communications, the popularization of cloud computing and the growth of data centers and the emergence of high-performance computing makes it essential to develop techniques that allow more efficient processing and analysis of large datasets and extraction of valuable information. 
In the following pages we discuss the development of this field in recent decades, and its potential and applicability in the various branches of scientific research. We also briefly review the different families of algorithms included in the data mining research area, their scalability with increasing dimensionality of the input data and how it can be addressed, and how different methods behave in scenarios where the information is distributed or processed in a decentralized way so as to optimize performance in heterogeneous environments.\nAdvances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. In this paper, we present the first architecture to tackle 3D environments in first-person shooter games, which involve partially observable states. Typically, deep reinforcement learning methods only utilize visual input for training. We present a method to augment these models to exploit game feature information such as the presence of enemies or items, during the training phase. Our model is trained to simultaneously learn these features along with minimizing a Q-learning objective, which is shown to dramatically improve the training speed and performance of our agent. Our architecture is also modularized to allow different models to be independently trained for different phases of the game. We show that the proposed architecture substantially outperforms built-in AI agents of the game as well as humans in deathmatch scenarios.\nThis paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is that it requires joint reasoning over the visual and text domains. 
The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the form of the question. CNN feature vectors cannot effectively capture situations as simple as multiple object instances, and LSTMs process questions as series of words, which does not reflect the true complexity of language structure. We instead propose to build graphs over the scene objects and over the question words, and we describe a deep neural network that exploits the structure in these representations. This shows significant benefit over the sequential processing of LSTMs. The overall efficacy of our approach is demonstrated by significant improvements over the state-of-the-art, from 71.2% to 74.4% in accuracy on the \"abstract scenes\" multiple-choice benchmark, and from 34.7% to 39.1% in accuracy over pairs of \"balanced\" scenes, i.e. images with fine-grained differences and opposite yes/no answers to the same question.\nBilattice-based triangle provides an elegant algebraic structure for reasoning with vague and uncertain information. But the truth and knowledge ordering of intervals in bilattice-based triangle cannot handle repetitive belief revisions, which are an essential characteristic of nonmonotonic reasoning. Moreover the ordering induced over the intervals by the bilattice-based triangle is sometimes not intuitive. In this work, we construct an alternative algebraic structure, namely the preorder-based triangle, and we formulate proper logical connectives for it. It is also demonstrated that the preorder-based triangle is a better alternative to the bilattice-based triangle for reasoning in application areas that involve nonmonotonic fuzzy reasoning with uncertain information.\nUnification in Description Logics has been introduced as a means to detect redundancies in ontologies. 
We try to extend the known decidability results for unification in the Description Logic $\\mathcal{EL}$ to disunification since negative constraints can be used to avoid unwanted unifiers. While decidability of the solvability of general $\\mathcal{EL}$-disunification problems remains an open problem, we obtain NP-completeness results for two interesting special cases: dismatching problems, where one side of each negative constraint must be ground, and local solvability of disunification problems, where we consider only solutions that are constructed from terms occurring in the input problem. More precisely, we first show that dismatching can be reduced to local disunification, and then provide two complementary NP-algorithms for finding local solutions of disunification problems.\nTime series interpretation aims to provide an explanation of what is observed in terms of its underlying processes. The present work is based on the assumption that common classification-based approaches to time series interpretation suffer from a set of inherent weaknesses whose ultimate cause lies in the monotonic nature of the deductive reasoning paradigm. In this document we propose a new approach to this problem based on the initial hypothesis that abductive reasoning properly accounts for the human ability to identify and characterize patterns appearing in a time series. The result of the interpretation is a set of conjectures in the form of observations, organized into an abstraction hierarchy, and explaining what has been observed. A knowledge-based framework and a set of algorithms for the interpretation task are provided, implementing a hypothesize-and-test cycle guided by an attentional mechanism. 
As a representative application domain, the interpretation of the electrocardiogram allows us to highlight the strengths of the proposed approach in comparison with traditional classification-based approaches.\nIn this paper, we present an approach that is able to handle Z-numbers in the context of Multi-Criteria Decision Making (MCDM) problems. Z-numbers are composed of two parts, the first one is a restriction on the values that can be assumed, and the second part is the reliability of the information. As human beings we communicate with other people by means of natural language using sentences like: the journey time from home to university takes about half an hour, very likely. Firstly, Z-numbers are converted to fuzzy numbers using a standard procedure. Next, the Z-TODIM and Z-TOPSIS are presented as a direct extension of the fuzzy TODIM and fuzzy TOPSIS, respectively. The proposed methods are applied to two case studies and compared with the standard approach using crisp values. Results obtained show the feasibility of the approach. In addition, a graphical interface was built to handle both methods, Z-TODIM and Z-TOPSIS, allowing ease of use for users in other areas of knowledge.\nUnderstanding the nature of dark energy, the mysterious force driving the accelerated expansion of the Universe, is a major challenge of modern cosmology. The next generation of cosmological surveys, specifically designed to address this issue, rely on accurate measurements of the apparent shapes of distant galaxies. However, shape measurement methods suffer from various unavoidable biases and therefore will rely on a precise calibration to meet the accuracy requirements of the science analysis. This calibration process remains an open challenge as it requires large sets of high quality galaxy images. To this end, we study the application of deep conditional generative models in generating realistic galaxy images. 
In particular we consider variations on the conditional variational autoencoder and introduce a new adversarial objective for training of conditional generative networks. Our results suggest a reliable alternative to the acquisition of expensive high quality observations for generating the calibration data needed by the next generation of cosmological surveys.\nThis provocation paper provides an overview of the underlying optimisation problem in the emerging field of Digital Manufacturing. Initially, this paper discusses how the notion of Digital Manufacturing is transforming from a term describing a suite of software tools for the integration of production and design functions towards a more general concept incorporating computerised manufacturing and supply chain processes, as well as information collection and utilisation across the product life cycle. On this basis, we use the example of one such manufacturing process, Additive Manufacturing, to identify an integrated multi-objective optimisation problem underlying Digital Manufacturing. Forming an opportunity for a concurrent application of data science and optimisation, a set of challenges arising from this problem is outlined.\nWe report on the phase transition of finding a complete subgraph, of specified dimensions, in a bipartite graph. Finding a complete subgraph in a bipartite graph is a problem that has been receiving growing attention in several domains, including bioinformatics, social network analysis and domain clustering. A key step for a successful phase transition study is identifying a suitable order parameter, when none is known. To this purpose, we have applied a decision tree classifier to real-world instances of this problem, in order to understand what problem features separate an instance that is hard to solve from one that is not. We have successfully identified one such order parameter and with it the phase transition of finding a complete bipartite subgraph of specified dimensions. 
Our phase transition study shows an easy-to-hard-to-easy-to-hard-to-easy pattern. Further, our results indicate that the hardest instances are in a region where it is more likely that the corresponding bipartite graph will have a complete subgraph of specified dimensions, a positive answer. By contrast, instances with a negative answer are more likely to appear in a region where the computational cost is negligible. This behaviour is remarkably similar across problems of different sizes.\nThe Gaussian mixture model is a classic technique for clustering and data modeling that is used in numerous applications. With the rise of big data, there is a need for parameter estimation techniques that can handle streaming data and distribute the computation over several processors. While online variants of the Expectation Maximization (EM) algorithm exist, their data efficiency is reduced by a stochastic approximation of the E-step and it is not clear how to distribute the computation over multiple processors. We propose a Bayesian learning technique that lends itself naturally to online and distributed computation. Since the Bayesian posterior is not tractable, we project it onto a family of tractable distributions after each observation by matching a set of sufficient moments. This Bayesian moment matching technique compares favorably to online EM in terms of time and accuracy on a set of data modeling benchmarks.\nReasoning and inference are central to human and artificial intelligence. Modeling inference in human language is very challenging. With the availability of large annotated datasets (Bowman et al., 2015), it has recently become feasible to train neural network based inference models, which have been shown to be very effective. In this paper, we present a new state-of-the-art result, achieving an accuracy of 88.6% on the Stanford Natural Language Inference Dataset.
Unlike the previous top models that use very complicated network architectures, we first demonstrate that carefully designing sequential inference models based on chain LSTMs can outperform all previous models. Based on this, we further show that by explicitly considering recursive architectures in both local inference modeling and inference composition, we achieve additional improvement. In particular, incorporating syntactic parsing information contributes to our best result---it further improves the performance even when added to the already very strong model.\nEntity Resolution, also called record linkage or deduplication, refers to the process of identifying and merging duplicate versions of the same entity into a unified representation. The standard practice is to use a rule-based or machine-learning-based model that compares entity pairs and assigns a score to represent the pairs' Match/Non-Match status. However, performing an exhaustive pair-wise comparison on all pairs of records leads to quadratic matcher complexity, and hence a Blocking step is performed before the Matching to group similar entities into smaller blocks that the matcher can then examine exhaustively. Several blocking schemes have been developed to efficiently and effectively block the input dataset into manageable groups. At CareerBuilder (CB), we perform deduplication on massive datasets of people profiles collected from disparate sources with varying informational content. We observed that employing a single blocking technique did not cover all possible scenarios due to the multi-faceted nature of our data sources. In this paper, we describe our ensemble approach to blocking that combines two different blocking techniques to leverage their respective strengths.\nAutomatic and accurate classification of items enables numerous downstream applications in many domains. These applications can range from faceted browsing of items to product recommendations and big data analytics.
In the online recruitment domain, we refer to classifying job ads into pre-defined or custom occupation categories as job title classification. A large-scale job title classification system can power various downstream applications such as semantic search, job recommendations and labor market analytics. In this paper, we discuss experiments conducted to improve our in-house job title classification system. The classification component of the system is composed of a two-stage coarse and fine level classifier cascade that classifies input text such as job titles and/or job ads into one of the thousands of job titles in our taxonomy. To improve classification accuracy and effectiveness, we experiment with various semantic representation strategies such as average W2V vectors and document similarity measures such as Word Mover's Distance (WMD). Our initial results show an overall improvement in the accuracy of Carotene[1].\nThe ability to automatically recognize a person's behavioral context can contribute to health monitoring, aging care and many other domains. Validating context recognition in-the-wild is crucial to promote practical applications that work in real-life settings. We collected over 300k minutes of sensor data with context labels from 60 subjects. Unlike previous studies, our subjects used their own personal phones, in any way that was convenient to them, and engaged in their routine in their natural environments. Unscripted behavior and unconstrained phone usage resulted in situations that are harder to recognize. We demonstrate how fusion of multi-modal sensors is important for resolving such cases. We present a baseline system, and encourage researchers to use our public dataset to compare methods and improve context recognition in-the-wild.\nRecognizing implicit discourse relations is a challenging but important task in the field of Natural Language Processing.
For such a complex text processing task, and unlike previous studies, we argue that it is necessary to repeatedly read the arguments and dynamically exploit the efficient features useful for recognizing discourse relations. To mimic the repeated reading strategy, we propose neural networks with multi-level attention (NNMA), combining the attention mechanism and external memories to gradually fix the attention on specific words helpful for judging the discourse relations. Experiments on the PDTB dataset show that our proposed method achieves state-of-the-art results. The visualization of the attention weights also illustrates how our model observes the arguments at each level and progressively locates the important words.\nThe paper introduces a new method for discrimination of documents given in different scripts. The document is mapped into a uniformly coded text of numerical values. The code is derived from the position of the letters in the text line, based on their typographical characteristics. Each code is considered as a gray level. Accordingly, the coded text determines a 1-D image, on which texture analysis by run-length statistics and local binary pattern is performed. This analysis defines feature vectors representing the script content of the document. A modified clustering approach applied to the document feature vectors groups documents written in the same script. Experimentation performed on two custom oriented databases of historical documents in old Cyrillic, angular and round Glagolitic as well as Antiqua and Fraktur scripts demonstrates the superiority of the proposed method over well-known state-of-the-art methods.\nThe Visual Question Answering (VQA) task has showcased a new stage of interaction between language and vision, two of the most pivotal components of artificial intelligence.
However, it has mostly focused on generating short and repetitive answers, mostly single words, which fall short of the rich linguistic capabilities of humans. We introduce the Full-Sentence Visual Question Answering (FSVQA) dataset, consisting of nearly 1 million pairs of questions and full-sentence answers for images, built by applying a number of rule-based natural language processing techniques to the original VQA dataset and captions in the MS COCO dataset. This poses many additional complexities beyond the conventional VQA task, and we provide a baseline for approaching and evaluating the task, on top of which we invite the research community to build further improvements.\nIn this work we explore deep generative models of text in which the latent representation of a document is itself drawn from a discrete language model distribution. We formulate a variational auto-encoder for inference in this model and apply it to the task of compressing sentences. In this application the generative model first draws a latent summary sentence from a background language model, and then draws the observed sentence conditioned on this latent summary. In our empirical evaluation we show that generative formulations of both abstractive and extractive compression yield state-of-the-art results when trained on a large amount of supervised data. Further, we explore semi-supervised compression scenarios where we show that it is possible to achieve performance competitive with previously proposed supervised models while training on a fraction of the supervised data.\nIn this paper we describe approaches for discovering acoustic concepts and relations in text. The first major goal is to be able to identify text phrases which contain a notion of audibility and can be termed a sound or an acoustic concept. We also propose a method to define an acoustic scene through a set of sound concepts.
We use pattern matching and part-of-speech tags to generate sound concepts from large-scale text corpora. We use dependency parsing and an LSTM recurrent neural network to predict a set of sound concepts for a given acoustic scene. These methods are not only helpful in creating an acoustic knowledge base but can also directly help acoustic event and scene detection research in the future.\nWe present the first reinforcement-learning model to self-improve its reward-modulated training implemented through a continuously improving \"intuition\" neural network. An agent was trained to play the arcade video game Pong with two reward-based alternatives, one where the paddle was placed randomly during training, and a second where the paddle was simultaneously trained on three additional neural networks such that it could develop a sense of \"certainty\" as to how likely its own predicted paddle position is to return the ball. If the agent was less than 95% certain to return the ball, the policy used an intuition neural network to place the paddle. We trained both architectures for an equivalent number of epochs and tested learning performance by letting the trained programs play against a near-perfect opponent. Through this, we found that the reinforcement learning model that uses an intuition neural network for placing the paddle during reward training quickly overtakes the simple architecture in its ability to outplay the near-perfect opponent, additionally outscoring that opponent by an increasingly wide margin after additional epochs of training.\nNowadays, several crowdsourcing projects exploit social choice methods for computing an aggregate ranking of alternatives given individual rankings provided by workers. Motivated by such systems, we consider a setting where each worker is asked to rank a fixed (small) number of alternatives and, then, a positional scoring rule is used to compute the aggregate ranking.
Among the apparently infinite such rules, what is the best one to use? To answer this question, we assume that we have partial access to an underlying true ranking. Then, the important optimization problem to be solved is to compute the positional scoring rule whose outcome, when applied to the profile of individual rankings, is as close as possible to the part of the underlying true ranking we know. We study this fundamental problem from a theoretical viewpoint and present positive and negative complexity results and, furthermore, complement our theoretical findings with experiments on real-world and synthetic data.\nMixture models and topic models generate each observation from a single cluster, but standard variational posteriors for each observation assign positive probability to all possible clusters. This requires dense storage and runtime costs that scale with the total number of clusters, even though typically only a few clusters have significant posterior mass for any data point. We propose a constrained family of sparse variational distributions that allow at most $L$ non-zero entries, where the tunable threshold $L$ trades off speed for accuracy. Previous sparse approximations have used hard assignments ($L=1$), but we find that moderate values of $L>1$ provide superior performance. Our approach easily integrates with stochastic or incremental optimization algorithms to scale to millions of examples. Experiments training mixture models of image patches and topic models for news articles show that our approach produces better-quality models in far less time than baseline methods.\nRecent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies. Even then they struggle to predict rare or unseen words even if the context makes the prediction unambiguous. 
We introduce the pointer sentinel mixture architecture for neural sequence models which has the ability to either reproduce a word from the recent context or produce a word from a standard softmax classifier. Our pointer sentinel-LSTM model achieves state-of-the-art language modeling performance on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM. In order to evaluate how well language models can exploit longer contexts and deal with more realistic vocabularies and larger corpora, we also introduce the freely available WikiText corpus.\n[Background]: Systematic Literature Review (SLR) has become an important software engineering research method but requires tremendous effort. [Aim]: This paper proposes an approach that leverages an empirically evolved ontology to support automating key SLR activities. [Method]: First, we propose an ontology, SLRONT, built on SLR experiences and best practices as a groundwork to capture common terminologies and their relationships during SLR processes; second, we present an extended version of SLRONT, the COSONT, and instantiate it with the knowledge and concepts extracted from structured abstracts. Case studies illustrate the details of applying it to support SLR steps. [Results]: Results show that by using COSONT, we reach the same conclusions as with purely manual work, but the effort involved is significantly reduced. [Conclusions]: The ontology-based approach can effectively and efficiently support the conduct of systematic literature reviews.\nWe introduce an online neural sequence to sequence model that learns to alternate between encoding and decoding segments of the input as it is read.
By independently tracking the encoding and decoding representations, our algorithm permits exact polynomial marginalization of the latent segmentation during training, and during decoding beam search is employed to find the best alignment path together with the predicted output sequence. Our model tackles the bottleneck of vanilla encoder-decoders that have to read and memorize the entire input sequence in their fixed-length hidden states before producing any output. It is different from previous attentive models in that, instead of treating the attention weights as output of a deterministic function, our model assigns attention weights to a sequential latent variable which can be marginalized out and permits online generation. Experiments on abstractive sentence summarization and morphological inflection show significant performance gains over the baseline encoder-decoders.\nRecommender systems play an increasingly important role in online applications to help users find what they need or prefer. Collaborative filtering algorithms that generate predictions by analyzing the user-item rating matrix perform poorly when the matrix is sparse. To alleviate this problem, this paper proposes a simple recommendation algorithm that fully exploits the similarity information among users and items and intrinsic structural information of the user-item matrix. The proposed method constructs a new representation which preserves affinity and structure information in the user-item rating matrix and then performs the recommendation task. To capture proximity information about users and items, two graphs are constructed. The idea of manifold learning is used to constrain the new representation to be smooth on these graphs, so as to enforce user and item proximities. Our model is formulated as a convex optimization problem, for which we only need to solve the well-known Sylvester equation.
We carry out extensive empirical evaluations on six benchmark datasets to show the effectiveness of this approach.\nDecision making is an important component in a speaker verification system. For the conventional GMM-UBM architecture, the decision is usually made based on the log likelihood ratio of the test utterance against the GMM of the claimed speaker and the UBM. This single-score decision is simple but tends to be sensitive to the complex variations in speech signals (e.g. text content, channel, speaking style, etc.). In this paper, we propose a decision making approach based on multiple scores derived from a set of cohort GMMs (cohort scores). Importantly, these cohort scores are not simply averaged as in conventional cohort methods; instead, we employ a powerful discriminative model as the decision maker. Experimental results show that the proposed method delivers substantial performance improvement over the baseline system, especially when a deep neural network (DNN) is used as the decision maker, and the DNN input involves some statistical features derived from the cohort scores.\nRobotic code needs to be verified to ensure its safety and functional correctness, especially when the robot is interacting with people. Testing real code in simulation is a viable option. However, generating tests that cover rare scenarios, as well as exercising most of the code, is a challenge amplified by the complexity of the interactions between the environment and the software. Model-based test generation methods can automate otherwise manual processes and facilitate reaching rare scenarios during testing.
In this paper, we compare using Belief-Desire-Intention (BDI) agents as models for test generation with more conventional automata-based techniques that exploit model checking, in terms of practicality, performance, transferability to different scenarios, and exploration (`coverage'), through two case studies: a cooperative manufacturing task, and a home care scenario. The results highlight the advantages of using BDI agents for test generation. BDI agents naturally emulate the agency present in Human-Robot Interactions (HRIs), and are thus more expressive than automata. The performance of the BDI-based test generation is at least as high, and the achieved coverage is higher or equivalent, compared to test generation based on model checking automata.\nPLDA is a popular normalization approach for the i-vector model, and it has delivered state-of-the-art performance in speaker verification. However, PLDA training requires a large amount of labelled development data, which is highly expensive in most cases. We present a cheap PLDA training approach, which assumes that speakers in the same session can be easily separated, and speakers in different sessions are simply different. This results in `weak labels' which are not fully accurate but cheap, leading to weak PLDA training. Our experimental results on real-life, large-scale telephony customer service archives demonstrated that the weak training can offer good performance when human-labelled data are limited. More interestingly, the weak training can be employed as a discriminative adaptation approach, which is more efficient than the prevailing unsupervised method when human-labelled data are insufficient.\nWhile the possibility of time travel in physics is still debated, the explosive growth of virtual-reality simulations opens up new possibilities to rigorously explore such time travel and its consequences in the digital domain.
Here we provide a computational model of time travel and a computer program that allows exploring digital time travel. In order to explain our method we formalize a simplified version of the famous grandfather paradox, show how the system can allow the participant to go back in time, try to kill their ancestors before they were born, and experience the consequences. The system has even come up with scenarios that can be considered consistent \"solutions\" of the grandfather paradox. We discuss the conditions for digital time travel, which indicate that it has a large number of practical applications.\nIn this paper, we present UbuntuWorld 1.0 LTS - a platform for developing automated technical support agents in the Ubuntu operating system. Specifically, we propose to use the Bash terminal as a simulator of the Ubuntu environment for a learning-based agent and demonstrate the usefulness of adopting reinforcement learning (RL) techniques for basic problem solving and troubleshooting in this environment. We provide a plug-and-play interface to the simulator as a python package where different types of agents can be plugged in and evaluated, and provide pathways for integrating data from online support forums like AskUbuntu into an automated agent's learning process. Finally, we show that the use of this data significantly improves the agent's learning efficiency. We believe that this platform can be adopted as a real-world test bed for research on automated technical support.\nTable (database) / relational database classification for big/smart/fast data machine learning is one of the most important tasks in predictive analytics and in extracting valuable information from data. It is a core applied technique for what is now understood as data science and/or artificial intelligence. Widely used Decision Tree (Random Forest) and rarely used rule-based PRISM, VFST, etc. classifiers are empirical substitutes for the theoretically correct approach of Boolean function minimization.
The development of algorithms for the minimization of Boolean functions started long ago with Edward Veitch in 1952. Since then, great effort has been made by the wide scientific/industrial community to find a feasible solution to Boolean function minimization. In this paper we propose to consider table data classification from a mathematical point of view, as the minimization of Boolean functions. We show that the data representation may be transformed into Boolean function form and how known algorithms can be used. For simplicity, a binary output function is used for development, which opens the door to developments with multivalued outputs.\nAcademic researchers often need to deal with a large collection of research papers in the literature. This problem may be even worse for postgraduate students who are new to a field and may not know where to start. To address this problem, we have developed an online catalog of research papers where the papers have been automatically categorized by a topic model. The catalog contains 7719 papers from the proceedings of two artificial intelligence conferences from 2000 to 2015. Rather than the commonly used Latent Dirichlet Allocation, we use a recently proposed method called hierarchical latent tree analysis for topic modeling. The resulting topic model contains a hierarchy of topics so that users can browse the topics from the top level to the bottom level. The topic model contains a manageable number of general topics at the top level and allows thousands of fine-grained topics at the bottom level. It can also detect topics that have emerged recently.\nThis paper presents an end-to-end approach for tracking static and dynamic objects for an autonomous vehicle driving through crowded urban environments. Unlike traditional approaches to tracking, this method is learned end-to-end, and is able to directly predict a full unoccluded occupancy grid map from raw laser input data.
Inspired by the recently presented DeepTracking approach [Ondruska, 2016], we employ a recurrent neural network (RNN) to capture the temporal evolution of the state of the environment, and propose to use Spatial Transformer modules to exploit estimates of the egomotion of the vehicle. Our results demonstrate the ability to track a range of objects, including cars, buses, pedestrians, and cyclists through occlusion, from both moving and stationary platforms, using a single learned model. Experimental results demonstrate that the model can also predict the future states of objects from current inputs, with greater accuracy than previous work.\nWe compare the effectiveness of four different syntactic CCG parsers for a semantic slot-filling task to explore how much syntactic supervision is required for downstream semantic analysis. This extrinsic, task-based evaluation provides a unique window to explore the strengths and weaknesses of semantics captured by unsupervised grammar induction systems. We release a new Freebase semantic parsing dataset called SPADES (Semantic PArsing of DEclarative Sentences) containing 93K cloze-style questions paired with answers. We evaluate all our models on this dataset. Our code and data are available at https://github.com/sivareddyg/graph-parser.\nThis paper presents a study of stability improvement in a single machine connected to an infinite bus (SMIB) power system using a static compensator (STATCOM). The gains of the Proportional-Integral-Derivative (PID) controller in the STATCOM are optimized by a heuristic technique based on Particle Swarm Optimization (PSO). Further, Bacterial Foraging Optimization (BFO) is also applied as an alternative heuristic method to select optimal gains of the PID controller. The performance of the STATCOM with the above soft-computing techniques is studied and compared with the conventional PID controller under various scenarios.
The simulation results are accompanied by a quantitative analysis based on performance indices. The analysis clearly demonstrates the robustness of the new scheme in terms of stability and voltage regulation when compared with conventional PID.\nDetecting a small number of outliers from a set of data observations is always challenging. This problem is more difficult in the setting of multiple network samples, where computing the anomalous degree of a network sample is generally not sufficient. In fact, explaining why the network is exceptional, expressed in the form of a subnetwork, is equally important. In this paper, we develop a novel algorithm to address these two key problems. We treat each network sample as a potential outlier and identify subnetworks that mostly discriminate it from nearby regular samples. The algorithm is developed in the framework of network regression combined with the constraints on both network topology and L1-norm shrinkage to perform subnetwork discovery. Our method thus goes beyond subspace/subgraph discovery and we show that it converges to a global optimum. Evaluation on various real-world network datasets demonstrates that our algorithm not only outperforms baselines in both network and high-dimensional settings, but also discovers highly relevant and interpretable local subnetworks, further enhancing our understanding of anomalous networks.\nWeb services are one of the most significant current topics in information sharing technologies and an example of service-oriented processing. To ensure accurate execution of web service operations, a web service must be adaptable to the policies of the social networks in which it signs up. This adaptation is implemented using controls called 'Commitment'. This paper describes the structure of commitments and existing research on commitments and social web services, then suggests an algorithm for the consistency of commitments in social web services.
Since commitments may be executed concurrently, a key challenge in executing web services based on the commitment structure is ensuring consistency at execution time. The purpose of this research is to provide an algorithm for ensuring consistency between web service operations based on the commitment structure.\nDeep learning has led to significant advances in artificial intelligence, in part, by adopting strategies motivated by neurophysiology. However, it is unclear whether deep learning could occur in the real brain. Here, we show that a deep learning algorithm that utilizes multi-compartment neurons might help us to understand how the brain optimizes cost functions. Like neocortical pyramidal neurons, neurons in our model receive sensory information and higher-order feedback in electrotonically segregated compartments. Thanks to this segregation, the neurons in different layers of the network can coordinate synaptic weight updates. As a result, the network can learn to categorize images better than a single layer network. Furthermore, we show that our algorithm takes advantage of multilayer architectures to identify useful representations---the hallmark of deep learning. This work demonstrates that deep learning can be achieved using segregated dendritic compartments, which may help to explain the dendritic morphology of neocortical pyramidal neurons.\nA number of attempts have been made to improve accuracy and/or scalability of the PC (Peter and Clark) algorithm, some well known (Buhlmann, et al., 2010; Kalisch and Buhlmann, 2007; 2008; Zhang, 2012, to give some examples). We add here one more tool to the toolbox: the simple observation that if one is forced to choose between a variety of possible conditioning sets for a pair of variables, one should choose the one with the highest p-value. One can use the CPC (Conservative PC, Ramsey et al., 2012) algorithm as a guide to possible sepsets for a pair of variables.
However, whereas CPC uses a voting rule to classify colliders versus noncolliders, our proposed algorithm, PC-Max, picks the conditioning set with the highest p-value, so that there are no ambiguities. We combine this with two other optimizations: (a) avoiding bidirected edges in the orientation of colliders, and (b) parallelization. For (b) we borrow ideas from the PC-Stable algorithm (Colombo and Maathuis, 2014). The result is an algorithm that scales quite well both in terms of accuracy and time, with no risk of bidirected edges.\nMany real-world problems involving constraints can be regarded as instances of the Max-SAT problem, which is the optimization variant of the classic satisfiability problem. In this paper, we propose a novel probabilistic approach for Max-SAT called ProMS. Our algorithm relies on a stochastic local search strategy using a novel probability distribution function with two strategies for picking variables, one based on available information and another purely random one. Moreover, while most previous algorithms based on WalkSAT choose unsatisfied clauses randomly, we introduce a novel clause selection strategy to improve our algorithm. Experimental results illustrate that ProMS outperforms many state-of-the-art stochastic local search solvers on hard unweighted random Max-SAT benchmarks.\nThe Perturb and Combine (P&C) group of methods generates multiple versions of the predictor by perturbing the training set or construction and then combining them into a single predictor (Breiman, 1996b). The motive is to improve the accuracy of unstable classification and regression methods. One of the most well-known methods in this group is Bagging. Arcing or Adaptive Resampling and Combining methods like AdaBoost are smarter variants of P&C methods. In this extended abstract, we lay the groundwork for a new family of methods under the P&C umbrella, known as Evolutionary Sampling (ES).
We employ Evolutionary algorithms to suggest smarter sampling in both the feature space (sub-spaces) as well as training samples. We discuss multiple fitness functions to assess ensembles and empirically compare our performance against randomized sampling of training data and feature sub-spaces.\nWe consider the problem of efficient \"on the fly\" tuning of existing, or {\\it legacy}, Artificial Intelligence (AI) systems. The legacy AI systems are allowed to be of arbitrary class, albeit the data they are using for computing interim or final decision responses should possess an underlying structure of a high-dimensional topological real vector space. The tuning method that we propose enables dealing with errors without the need to re-train the system. Instead of re-training, a simple cascade of perceptron nodes is added to the legacy system. The added cascade modulates the AI legacy system's decisions. If applied repeatedly, the process results in a network of modulating rules \"dressing up\" and improving performance of existing AI systems. Mathematical rationale behind the method is based on the fundamental property of measure concentration in high dimensional spaces. The method is illustrated with an example of fine-tuning a deep convolutional network that has been pre-trained to detect pedestrians in images.\nIn principle, reinforcement learning and policy search methods can enable robots to learn highly complex and general skills that may allow them to function amid the complexity and diversity of the real world. However, training a policy that generalizes well across a wide range of real-world conditions requires far greater quantity and diversity of experience than is practical to collect with a single robot. Fortunately, it is possible for multiple robots to share their experience with one another, and thereby, learn a policy collectively. 
In this work, we explore distributed and asynchronous policy learning as a means to achieve generalization and improved training times on challenging, real-world manipulation tasks. We propose a distributed and asynchronous version of Guided Policy Search and use it to demonstrate collective policy learning on a vision-based door opening task using four robots. We show that it achieves better generalization, utilization, and training times than the single robot alternative.\nA key challenge in scaling up robot learning to many skills and environments is removing the need for human supervision, so that robots can collect their own data and improve their own performance without being limited by the cost of requesting human feedback. Model-based reinforcement learning holds the promise of enabling an agent to learn to predict the effects of its actions, which could provide flexible predictive models for a wide range of tasks and environments, without detailed human supervision. We develop a method for combining deep action-conditioned video prediction models with model-predictive control that uses entirely unlabeled training data. Our approach does not require a calibrated camera, an instrumented training set-up, nor precise sensing and actuation. Our results show that our method enables a real robot to perform nonprehensile manipulation -- pushing objects -- and can handle novel objects not seen during training.\nThere is a practically unlimited amount of natural language data available. Still, recent work in text comprehension has focused on datasets which are small relative to current computing possibilities. This article is making a case for the community to move to larger data and as a step in that direction it is proposing the BookTest, a new dataset similar to the popular Children's Book Test (CBT), however more than 60 times larger. 
We show that training on the new data improves the accuracy of our Attention-Sum Reader model on the original CBT test data by a much larger margin than many recent attempts to improve the model architecture. On one version of the dataset our ensemble even exceeds the human baseline provided by Facebook. We then show in our own human study that there is still space for further improvement.\nOver the years, several meta-heuristic algorithms were proposed and are now emerging as common methods for constrained optimization problems. Among them, genetic algorithms (GA's) shine as popular evolutionary algorithms (EA's) in engineering optimization. Most engineering design problems are difficult to resolve with conventional optimization algorithms because they are highly nonlinear and contain constraints. In order to handle these constraints, the most common technique is to apply penalty functions. The major drawback is that they require tuning of parameters, which can be very challenging. In this paper, we present a constraint-handling technique for GA's solely using the violation factor, called VCH (Violation Constraint-Handling) method. Several benchmark problems from the literature are examined. The VCH technique was able to provide a consistent performance and match results from other GA-based techniques.\nTogether with the development of more accurate methods in Computer Vision and Natural Language Understanding, holistic architectures that answer on questions about the content of real-world images have emerged. In this tutorial, we build a neural-based approach to answer questions about images. We base our tutorial on two datasets: (mostly on) DAQUAR, and (a bit on) VQA. With small tweaks the models that we present here can achieve a competitive performance on both datasets, in fact, they are among the best methods that use a combination of LSTM with a global, full frame CNN representation of an image. 
We hope that after reading this tutorial, the reader will be able to use Deep Learning frameworks, such as Keras and introduced Kraino, to build various architectures that will lead to a further performance improvement on this challenging task.\nDetection rules have traditionally been designed for rational agents that minimize the Bayes risk (average decision cost). With the advent of crowd-sensing systems, there is a need to redesign binary hypothesis testing rules for behavioral agents, whose cognitive behavior is not captured by traditional utility functions such as Bayes risk. In this paper, we adopt prospect theory based models for decision makers. We consider special agent models namely optimists and pessimists in this paper, and derive optimal detection rules under different scenarios. Using an illustrative example, we also show how the decision rule of a human agent deviates from the Bayesian decision rule under various behavioral models, considered in this paper.\nWe present a weakly-supervised approach to segmenting proposed drivable paths in images with the goal of autonomous driving in complex urban environments. Using recorded routes from a data collection vehicle, our proposed method generates vast quantities of labelled images containing proposed paths and obstacles without requiring manual annotation, which we then use to train a deep semantic segmentation network. With the trained network we can segment proposed paths and obstacles at run-time using a vehicle equipped with only a monocular camera without relying on explicit modelling of road or lane markings. We evaluate our method on the large-scale KITTI and Oxford RobotCar datasets and demonstrate reliable path proposal and obstacle segmentation in a wide variety of environments under a range of lighting, weather and traffic conditions. 
We illustrate how the method can generalise to multiple path proposals at intersections and outline plans to incorporate the system into a framework for autonomous urban driving.\nSample complexity and safety are major challenges when learning policies with reinforcement learning for real-world tasks, especially when the policies are represented using rich function approximators like deep neural networks. Model-based methods where the real-world target domain is approximated using a simulated source domain provide an avenue to tackle the above challenges by augmenting real data with simulated data. However, discrepancies between the simulated source domain and the target domain pose a challenge for simulated training. We introduce the EPOpt algorithm, which uses an ensemble of simulated source domains and a form of adversarial training to learn policies that are robust and generalize to a broad range of possible target domains, including unmodeled effects. Further, the probability distribution over source domains in the ensemble can be adapted using data from target domain and approximate Bayesian methods, to progressively make it a better approximation. Thus, learning on a model ensemble, along with source domain adaptation, provides the benefit of both robustness and learning/adaptation.\nFace recognition (FR) is the most preferred mode for biometric-based surveillance, due to its passive nature of detecting subjects, amongst all different types of biometric traits. FR under surveillance scenario does not give satisfactory performance due to low contrast, noise and poor illumination conditions on probes, as compared to the training samples. A state-of-the-art technology, Deep Learning, even fails to perform well in these scenarios. 
We propose a novel soft-margin based learning method for multiple feature-kernel combinations, followed by feature transformed using Domain Adaptation, which outperforms many recent state-of-the-art techniques, when tested using three real-world surveillance face datasets.\nWith a large proportion of people carrying location-aware smartphones, we have an unprecedented platform from which to understand individuals and predict their future actions. This work builds upon the Context Tree data structure that summarises the historical contexts of individuals from augmented geospatial trajectories, and constructs a predictive model for their likely future contexts. The Predictive Context Tree (PCT) is constructed as a hierarchical classifier, capable of predicting both the future locations that a user will visit and the contexts that a user will be immersed within. The PCT is evaluated over real-world geospatial trajectories, and compared against existing location extraction and prediction techniques, as well as a proposed hybrid approach that uses identified land usage elements in combination with machine learning to predict future interactions. Our results demonstrate that higher predictive accuracies can be achieved using this hybrid approach over traditional extracted location datasets, and the PCT itself matches the performance of the hybrid approach at predicting future interactions, while adding utility in the form of context predictions. Such a prediction system is capable of understanding not only where a user will visit, but also their context, in terms of what they are likely to be doing.\nVisual Question Answering (VQA) is a recent problem in computer vision and natural language processing that has garnered a large amount of interest from the deep learning, computer vision, and natural language processing communities. In VQA, an algorithm needs to answer text-based questions about images. 
Since the release of the first VQA dataset in 2014, additional datasets have been released and many algorithms have been proposed. In this review, we critically examine the current state of VQA in terms of problem formulation, existing datasets, evaluation metrics, and algorithms. In particular, we discuss the limitations of current datasets with regard to their ability to properly train and assess VQA algorithms. We then exhaustively review existing algorithms for VQA. Finally, we discuss possible future directions for VQA and image understanding research.\nIn this paper, we study the Temporal Difference (TD) learning with linear value function approximation. It is well known that most TD learning algorithms are unstable with linear function approximation and off-policy learning. Recent development of Gradient TD (GTD) algorithms has addressed this problem successfully. However, the success of GTD algorithms requires a set of well chosen features, which are not always available. When the number of features is huge, the GTD algorithms might face the problem of overfitting and being computationally expensive. To cope with this difficulty, regularization techniques, in particular $\\ell_1$ regularization, have attracted significant attentions in developing TD learning algorithms. The present work combines the GTD algorithms with $\\ell_1$ regularization. We propose a family of $\\ell_1$ regularized GTD algorithms, which employ the well known soft thresholding operator. We investigate convergence properties of the proposed algorithms, and depict their performance with several numerical experiments.\nDeep Neural Networks (DNNs) have become very popular for prediction in many areas. Their strength is in representation with a high number of parameters that are commonly learned via gradient descent or similar optimization methods. 
However, the representation is non-standardized, and the gradient calculation methods are often performed using component-based approaches that break parameters down into scalar units, instead of considering the parameters as whole entities. In this work, these problems are addressed. Standard notation is used to represent DNNs in a compact framework. Gradients of DNN loss functions are calculated directly over the inner product space on which the parameters are defined. This framework is general and is applied to two common network types: the Multilayer Perceptron and the Deep Autoencoder.\nSubjective expected utility theory assumes that decision-makers possess unlimited computational resources to reason about their choices; however, virtually all decisions in everyday life are made under resource constraints - i.e. decision-makers are bounded in their rationality. Here we experimentally tested the predictions made by a formalization of bounded rationality based on ideas from statistical mechanics and information-theory. We systematically tested human subjects in their ability to solve combinatorial puzzles under different time limitations. We found that our bounded-rational model accounts well for the data. The decomposition of the fitted model parameter into the subjects' expected utility function and resource parameter provide interesting insight into the subjects' information capacity limits. Our results confirm that humans gradually fall back on their learned prior choice patterns when confronted with increasing resource limitations.\nArising from many applications at the intersection of decision making and machine learning, Marginal Maximum A Posteriori (Marginal MAP) Problems unify the two main classes of inference, namely maximization (optimization) and marginal inference (counting), and are believed to have higher complexity than both of them. 
We propose XOR_MMAP, a novel approach to solve the Marginal MAP Problem, which represents the intractable counting subproblem with queries to NP oracles, subject to additional parity constraints. XOR_MMAP provides a constant factor approximation to the Marginal MAP Problem, by encoding it as a single optimization in polynomial size of the original problem. We evaluate our approach in several machine learning and decision making applications, and show that our approach outperforms several state-of-the-art Marginal MAP solvers.\nWe present an interpretable neural network approach to predicting and understanding politeness in natural language requests. Our models are based on simple convolutional neural networks directly on raw text, avoiding any manual identification of complex sentiment or syntactic features, while performing better than such feature-based models from previous work. More importantly, we use the challenging task of politeness prediction as a testbed to next present a much-needed understanding of what these successful networks are actually learning. For this, we present several network visualizations based on activation clusters, first derivative saliency, and embedding space transformations, helping us automatically identify several subtle linguistics markers of politeness theories. Further, this analysis reveals multiple novel, high-scoring politeness strategies which, when added back as new features, reduce the accuracy gap between the original featurized system and the neural model, thus providing a clear quantitative interpretation of the success of these neural networks.\nThe crux of the problem in KDD Cup 2016 involves developing data mining techniques to rank research institutions based on publications. Rank importance of research institutions are derived from predictions on the number of full research papers that would potentially get accepted in upcoming top-tier conferences, utilizing public information on the web. 
This paper describes our solution to KDD Cup 2016. We used a two step approach in which we first identify full research papers corresponding to each conference of interest and then train two variants of exponential smoothing models to make predictions. Our solution achieves an overall score of 0.7508, while the winning submission scored 0.7656 in the overall results.\nHierarchical Reinforcement Learning has been previously shown to speed up the convergence rate of RL planning algorithms as well as mitigate feature-based model misspecification (Mankowitz et. al. 2016a,b, Bacon 2015). To do so, it utilizes hierarchical abstractions, also known as skills -- a type of temporally extended action (Sutton et. al. 1999) to plan at a higher level, abstracting away from the lower-level details. We incorporate risk sensitivity, also referred to as Situational Awareness (SA), into hierarchical RL for the first time by defining and learning risk aware skills in a Probabilistic Goal Semi-Markov Decision Process (PG-SMDP). This is achieved using our novel Situational Awareness by Risk-Conscious Skills (SARiCoS) algorithm which comes with a theoretical convergence guarantee. We show in a RoboCup soccer domain that the learned risk aware skills exhibit complex human behaviors such as `time-wasting' in a soccer game. In addition, the learned risk aware skills are able to mitigate reward-based model misspecification.\nAn interesting research problem in our age of Big Data is that of determining provenance. Granular evaluation of provenance of physical goods--e.g. tracking ingredients of a pharmaceutical or demonstrating authenticity of luxury goods--has often not been possible with today's items that are produced and transported in complex, inter-organizational, often internationally-spanning supply chains. Recent adoption of Internet of Things and Blockchain technologies give promise at better supply chain provenance. 
We are particularly interested in the blockchain as many favoured use cases of blockchain are for provenance tracking. We are also interested in applying ontologies as there has been some work done on knowledge provenance, traceability, and food provenance using ontologies. In this paper, we make a case for why ontologies can contribute to blockchain design. To support this case, we analyze a traceability ontology and translate some of its representations to smart contracts that execute a provenance trace and enforce traceability constraints on the Ethereum blockchain platform.\nIn classical machine learning, regression is treated as a black box process of identifying a suitable function from a hypothesis set without attempting to gain insight into the mechanism connecting inputs and outputs. In the natural sciences, however, finding an interpretable function for a phenomenon is the prime goal as it allows to understand and generalize results. This paper proposes a novel type of function learning network, called equation learner (EQL), that can learn analytical expressions and is able to extrapolate to unseen domains. It is implemented as an end-to-end differentiable feed-forward network and allows for efficient gradient based training. Due to sparsity regularization concise interpretable expressions can be obtained. Often the true underlying source expression is identified.\nModern robotics applications that involve human-robot interaction require robots to be able to communicate with humans seamlessly and effectively. Natural language provides a flexible and efficient medium through which robots can exchange information with their human partners. Significant advancements have been made in developing robots capable of interpreting free-form instructions, but less attention has been devoted to endowing robots with the ability to generate natural language. 
We propose a navigational guide model that enables robots to generate natural language instructions that allow humans to navigate a priori unknown environments. We first decide which information to share with the user according to their preferences, using a policy trained from human demonstrations via inverse reinforcement learning. We then \"translate\" this information into a natural language instruction using a neural sequence-to-sequence model that learns to generate free-form instructions from natural language corpora. We evaluate our method on a benchmark route instruction dataset and achieve a BLEU score of 72.18% when compared to human-generated reference instructions. We additionally conduct navigation experiments with human participants that demonstrate that our method generates instructions that people follow as accurately and easily as those produced by humans.\nWe describe a general method of detecting valid chains or links of pieces on a two-dimensional grid. Specifically, using the example of the chess variant known as Switch-Side Chain-Chess (SSCC). Presently, no foolproof method of detecting such chains in any given chess position is known and existing graph theory, to our knowledge, is unable to fully address this problem either. We therefore propose a solution implemented and tested using the C++ programming language. We have been unable to find an incorrect result and therefore offer it as the most viable solution thus far to the chain-detection problem in this chess variant. The algorithm is also scalable, in principle, to areas beyond two-dimensional grids such as 3D analysis and molecular chemistry.\nIn the context of contemporary monophonic music, expression can be seen as the difference between a musical performance and its symbolic representation, i.e. a musical score. In this paper, we show how Maximum Entropy (MaxEnt) models can be used to generate musical expression in order to mimic a human performance. 
As a training corpus, we had a professional pianist play about 150 melodies of jazz, pop, and latin jazz. The results show a good predictive power, validating the choice of our model. Additionally, we set up a listening test whose results reveal that on average, people significantly prefer the melodies generated by the MaxEnt model than the ones without any expression, or with fully random expression. Furthermore, in some cases, MaxEnt melodies are almost as popular as the human performed ones.\nAn accurate and reliable image based fruit detection system is critical for supporting higher level agriculture tasks such as yield mapping and robotic harvesting. This paper presents the use of a state-of-the-art object detection framework, Faster R-CNN, in the context of fruit detection in orchards, including mangoes, almonds and apples. Ablation studies are presented to better understand the practical deployment of the detection network, including how much training data is required to capture variability in the dataset. Data augmentation techniques are shown to yield significant performance gains, resulting in a greater than two-fold reduction in the number of training images required. In contrast, transferring knowledge between orchards contributed to negligible performance gain over initialising the Deep Convolutional Neural Network directly from ImageNet features. Finally, to operate over orchard data containing between 100-1000 fruit per image, a tiling approach is introduced for the Faster R-CNN framework. The study has resulted in the best yet detection performance for these orchards relative to previous works, with an F1-score of >0.9 achieved for apples and mangoes.\nA fall is an abnormal activity that occurs rarely, so it is hard to collect real data for falls. It is, therefore, difficult to use supervised learning methods to automatically detect falls. 
Another challenge in using machine learning methods to automatically detect falls is the choice of engineered features. In this paper, we propose to use an ensemble of autoencoders to extract features from different channels of wearable sensor data trained only on normal activities. We show that the traditional approach of choosing a threshold as the maximum of the reconstruction error on the training normal data is not the right way to identify unseen falls. We propose two methods for automatic tightening of reconstruction error from only the normal activities for better identification of unseen falls. We present our results on two activity recognition datasets and show the efficacy of our proposed method against traditional autoencoder models and two standard one-class classification methods.\nA methodology for the development of a fuzzy expert system (FES) with application to earthquake prediction is presented. The idea is to reproduce the performance of a human expert in earthquake prediction. To do this, at the first step, rules provided by the human expert are used to generate a fuzzy rule base. These rules are then fed into an inference engine to produce a fuzzy inference system (FIS) and to infer the results. In this paper, we have used a Sugeno type fuzzy inference system to build the FES. At the next step, the adaptive network-based fuzzy inference system (ANFIS) is used to refine the FES parameters and improve its performance. The proposed framework is then employed to attain the performance of a human expert used to predict earthquakes in the Zagros area based on the idea of coupled earthquakes. 
While the prediction results are promising in parts of the testing set, the general performance indicates that prediction methodology based on coupled earthquakes needs more investigation and more complicated reasoning procedure to yield satisfactory predictions.\nWith the advent of extremely high dimensional datasets, dimensionality reduction techniques are becoming mandatory. Among many techniques, feature selection has been growing in interest as an important tool to identify relevant features on huge datasets --both in number of instances and features--. The purpose of this work is to demonstrate that standard feature selection methods can be parallelized in Big Data platforms like Apache Spark, boosting both performance and accuracy. We thus propose a distributed implementation of a generic feature selection framework which includes a wide group of well-known Information Theoretic methods. Experimental results on a wide set of real-world datasets show that our distributed framework is capable of dealing with ultra-high dimensional datasets as well as those with a huge number of samples in a short period of time, outperforming the sequential version in all the cases studied.\nBilinear models provide rich representations compared with linear models. They have been applied in various visual tasks, such as object recognition, segmentation, and visual question-answering, to get state-of-the-art performances taking advantage of the expanded representations. However, bilinear representations tend to be high-dimensional, limiting the applicability to computationally complex tasks. We propose low-rank bilinear pooling using Hadamard product for an efficient attention mechanism of multimodal learning. 
We show that our model outperforms compact bilinear pooling in visual question-answering tasks with the state-of-the-art results on the VQA dataset, having a better parsimonious property.\nAccording to the distributional inclusion hypothesis, entailment between words can be measured via the feature inclusions of their distributional vectors. In recent work, we showed how this hypothesis can be extended from words to phrases and sentences in the setting of compositional distributional semantics. This paper focuses on inclusion properties of tensors; its main contribution is a theoretical and experimental analysis of how feature inclusion works in different concrete models of verb tensors. We present results for relational, Frobenius, projective, and holistic methods and compare them to the simple vector addition, multiplication, min, and max models. The degrees of entailment thus obtained are evaluated via a variety of existing word-based measures, such as Weed's and Clarke's, KL-divergence, APinc, balAPinc, and two of our previously proposed metrics at the phrase/sentence level. We perform experiments on three entailment datasets, investigating which version of tensor-based composition achieves the highest performance when combined with the sentence-level measures.\nAs Wireless Sensor Networks are penetrating into the industrial domain, many research opportunities are emerging. One such essential and challenging application is that of node localization. A feed-forward neural network based methodology is adopted in this paper. The Received Signal Strength Indicator (RSSI) values of the anchor node beacons are used. The number of anchor nodes and their configurations has an impact on the accuracy of the localization system, which is also addressed in this paper. Five different training algorithms are evaluated to find the training algorithm that gives the best result. The multi-layer Perceptron (MLP) neural network model was trained using Matlab. 
In order to evaluate the performance of the proposed method in real time, the model obtained was then implemented on the Arduino microcontroller. With four anchor nodes, an average 2D localization error of 0.2953 m has been achieved with a 12-12-2 neural network structure. The proposed method can also be implemented on any other embedded microcontroller system.\nReal life data often includes information from different channels. For example, in computer vision, we can describe an image using different image features, such as pixel intensity, color, HOG, GIST feature, SIFT features, etc.. These different aspects of the same objects are often called multi-view (or multi-modal) data. Low-rank regression model has been proved to be an effective learning mechanism by exploring the low-rank structure of real life data. But previous low-rank regression model only works on single view data. In this paper, we propose a multi-view low-rank regression model by imposing low-rank constraints on multi-view regression model. Most importantly, we provide a closed-form solution to the multi-view low-rank regression model. Extensive experiments on 4 multi-view datasets show that the multi-view low-rank regression model outperforms single-view regression model and reveals that multi-view low-rank structure is very helpful.\nCold start problem in Collaborative Filtering can be solved by asking new users to rate a small seed set of representative items or by asking representative users to rate a new item. The question is how to build a seed set that can give enough preference information for making good recommendations. One of the most successful approaches, called Representative Based Matrix Factorization, is based on Maxvol algorithm. Unfortunately, this approach has one important limitation --- a seed set of a particular size requires a rating matrix factorization of fixed rank that should coincide with that size. This is not necessarily optimal in the general case. 
In the current paper, we introduce a fast algorithm for an analytical generalization of this approach that we call Rectangular Maxvol. It allows the rank of factorization to be lower than the required size of the seed set. Moreover, the paper includes the theoretical analysis of the method's error, the complexity analysis of the existing methods and the comparison to the state-of-the-art approaches.\nThe weekly maintenance schedule specifies when maintenance activities should be performed on the equipment, taking into account the availability of workers and maintenance bays, and other operational constraints. The current approach to generating this schedule is labour intensive and requires coordination between the maintenance schedulers and operations staff to minimise its impact on the operation of the mine. This paper presents methods for automatically generating this schedule from the list of maintenance tasks to be performed, the availability roster of the maintenance staff, and time windows in which each piece of equipment is available for maintenance. Both Mixed-Integer Linear Programming (MILP) and genetic algorithms are evaluated, with the genetic algorithm shown to significantly outperform the MILP. Two fitness functions for the genetic algorithm are also examined, with a linear fitness function outperforming an inverse fitness function by up to 5% for the same calculation time. The genetic algorithm approach is computationally fast, allowing the schedule to be rapidly recalculated in response to unexpected delays and breakdowns.\nWe study a novel architecture and training procedure for locomotion tasks. A high-frequency, low-level \"spinal\" network with access to proprioceptive sensors learns sensorimotor primitives by training on simple tasks. This pre-trained module is fixed and connected to a low-frequency, high-level \"cortical\" network, with access to all sensors, which drives behavior by modulating the inputs to the spinal network. 
Where a monolithic end-to-end architecture fails completely, learning with a pre-trained spinal module succeeds at multiple high-level tasks, and enables the effective exploration required to learn from sparse rewards. We test our proposed architecture on three simulated bodies: a 16-dimensional swimming snake, a 20-dimensional quadruped, and a 54-dimensional humanoid. Our results are illustrated in the accompanying video at https://youtu.be/sboPYvhpraQ\nWe consider a set of learning agents in a collaborative peer-to-peer network, where each agent learns a personalized model according to its own learning objective. The question addressed in this paper is: how can agents improve upon their locally trained model by communicating with other agents that have similar objectives? We introduce and analyze two asynchronous gossip algorithms running in a fully decentralized manner. Our first approach, inspired by label propagation, aims to smooth pre-trained local models over the network while accounting for the confidence that each agent has in its initial model. In our second approach, agents jointly learn and propagate their model by making iterative updates based on both their local dataset and the behavior of their neighbors. To optimize this challenging objective, our decentralized algorithm is based on ADMM.\nThe number of optimization techniques in the combinatorial domain is large and diversified. Nevertheless, there is still a lack of real benchmarks to validate optimization algorithms. In this work, we introduce VRPBench, a tool to create instances and visualize solutions to the Vehicle Routing Problem (VRP) in a planar graph embedded in the Euclidean 2D space. We use VRPBench to model a real-world mail delivery case in the city of Artur Nogueira. Such scenarios were characterized as a multi-objective optimization of the VRP. We extracted a weighted graph from a digital map of the city to create a challenging benchmark for the VRP.
Each instance models one generic day of mail delivery with hundreds to thousands of delivery points, thus allowing both the comparison and validation of optimization algorithms for routing problems.\nThe problem of makespan optimal solving of cooperative path finding (CPF) is addressed in this paper. The task in CPF is to relocate a group of agents in a non-colliding way so that each agent eventually reaches its goal location from the given initial location. The abstraction adopted in this work assumes that agents are discrete items moving in an undirected graph by traversing edges. Makespan optimal solving of CPF means generating solutions that are as short as possible in terms of the total number of time steps required for the execution of the solution. We show that reducing CPF to propositional satisfiability (SAT) represents a viable option for obtaining makespan optimal solutions. Several encodings of CPF into propositional formulae are suggested and experimentally evaluated. The evaluation indicates that SAT-based CPF solving outperforms other makespan optimal methods significantly in highly constrained situations (environments that are densely occupied by agents).\nRecent work on weighted model counting has been very successfully applied to the problem of probabilistic inference in Bayesian networks. The probability distribution is encoded into a Boolean normal form and compiled to a target language, in order to represent local structure expressed among conditional probabilities more efficiently. We show that further improvements are possible by exploiting the knowledge that is lost during the encoding phase and incorporating it into a compiler inspired by Satisfiability Modulo Theories. Constraints among variables are used as a background theory, which allows us to optimize the Shannon decomposition.
We propose a new language, called Weighted Positive Binary Decision Diagrams, that reduces the cost of probabilistic inference by using this decomposition variant to induce an arithmetic circuit of reduced size.\nIn this paper we propose a causal analog to the purely observational Dynamic Bayesian Networks, which we call Dynamic Causal Networks. We provide a sound and complete algorithm for identification of Dynamic Causal Networks, namely, for computing the effect of an intervention or experiment, based on passive observations only, whenever possible. We note the existence of two types of confounder variables that affect the identification procedures in substantially different ways, a distinction with no analog in either Dynamic Bayesian Networks or standard causal graphs. We further propose a procedure for the transportability of causal effects in Dynamic Causal Network settings, where the result of causal experiments in a source domain may be used for the identification of causal effects in a target domain.\nThe horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but as shown in this paper, the results can be sensitive to the prior choice for the global shrinkage hyperparameter. We argue that the previous default choices are dubious due to their tendency to favor solutions with more unshrunk coefficients than we typically expect a priori. This can lead to bad results if this parameter is not strongly identified by data. We derive the relationship between the global parameter and the effective number of nonzeros in the coefficient vector, and show an easy and intuitive way of setting up the prior for the global parameter based on our prior beliefs about the number of nonzero coefficients in the model.
The results on real world data show that one can benefit greatly -- in terms of improved parameter estimates, prediction accuracy, and reduced computation time -- from transforming even a crude guess for the number of nonzero coefficients into the prior for the global parameter using our framework.\nIn this paper, we define a novel census signal temporal logic (CensusSTL) that focuses on the number of agents in different subsets of a group that complete a certain task specified by the signal temporal logic (STL). CensusSTL consists of an \"inner logic\" STL formula and an \"outer logic\" STL formula. We present a new inference algorithm to infer CensusSTL formulae from the trajectory data of a group of agents. We first identify the \"inner logic\" STL formula and then infer the subgroups based on whether the agents' behaviors satisfy the \"inner logic\" formula at each time point. We use two different approaches to infer the subgroups based on similarity and complementarity, respectively. The \"outer logic\" CensusSTL formula is inferred from the census trajectories of different subgroups. We apply the algorithm in analyzing data from a soccer match by inferring the CensusSTL formula for different subgroups of a soccer team.\nConventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems correspond to context-dependent tied states or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize that the senone probabilities obtained from a DNN trained with binary labels can provide more accurate targets to learn better acoustic models. However, DNN outputs bear inaccuracies which are exhibited as high dimensional unstructured noise, whereas the informative components are structured and low-dimensional. 
We exploit principal component analysis (PCA) and sparse coding to characterize the senone subspaces. Enhanced probabilities obtained from low-rank and sparse reconstructions are used as soft targets for DNN acoustic modeling, which also enables training with untranscribed data. Experiments conducted on the AMI corpus show a 4.6% relative reduction in word error rate.\nProbabilistic programming languages (PPLs) are a powerful modeling tool, able to represent any computable probability distribution. Unfortunately, probabilistic program inference is often intractable, and existing PPLs mostly rely on expensive, approximate sampling-based methods. To alleviate this problem, one could try to learn from past inferences, so that future inferences run faster. This strategy is known as amortized inference; it has recently been applied to Bayesian networks and deep generative models. This paper proposes a system for amortized inference in PPLs. In our system, amortization comes in the form of a parameterized guide program. Guide programs have similar structure to the original program, but can have richer data flow, including neural network components. These networks can be optimized so that the guide approximately samples from the posterior distribution defined by the original program. We present a flexible interface for defining guide programs and a stochastic gradient-based scheme for optimizing guide parameters, as well as some preliminary results on automatically deriving guide programs. We explore in detail the common machine learning pattern in which a 'local' model is specified by 'global' random values and used to generate independent observed data points; this gives rise to amortized local inference supporting global model learning.\nClassical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution.
The large noise and small signal in the resulting gradients make it difficult to use them for adaptive stepsize selection and automatic stopping. We propose alternative \"big batch\" SGD schemes that adaptively grow the batch size over time to maintain a nearly constant signal-to-noise ratio in the gradient approximation. The resulting methods have similar convergence rates to classical SGD, and do not require convexity of the objective. The high-fidelity gradients enable automated learning rate selection and do not require stepsize decay. Big batch methods are thus easily automated and can run with little or no oversight.\nDelay-coupled electro-optical systems have received much attention for their dynamical properties and their potential use in signal processing. In particular, it has recently been demonstrated, using the artificial intelligence algorithm known as reservoir computing, that photonic implementations of such systems solve complex tasks such as speech recognition. Here we show how the backpropagation algorithm can be physically implemented on the same electro-optical delay-coupled architecture used for computation with only minor changes to the original design. We find that, compared to when the backpropagation algorithm is not used, the error rate of the resulting computing device, evaluated on three benchmark tasks, decreases considerably. This demonstrates that electro-optical analog computers can embody a large part of their own training process, allowing them to be applied to new, more difficult tasks.\nThis paper examines the use of dynamic probabilistic networks (DPN) for human action recognition. The actions of lifting objects and walking in the room, sitting in the room and neutral standing pose were used for testing the classification. The research used the dynamic interrelation between various regions of interest (ROI) on the human body (face, body, arms, legs) and the time-series-based events related to these ROIs.
These dynamic links are then used to recognize the human behavioral aspects in the scene. First, a model is developed to identify the human activities in an indoor scene, and this model depends on the key features and interlinks between the various dynamic events using DPNs. The sub-ROIs are classified with DPN to associate the combined interlink with a specific human activity. Recognition accuracy under indoor (controlled) lighting conditions is compared with that under outdoor lighting conditions. The accuracy in outdoor scenes was lower than in the controlled environment.\nThe long-term memory of most connectionist systems lies entirely in the weights of the system. Since the number of weights is typically fixed, this bounds the total amount of knowledge that can be learned and stored. Though this is not normally a problem for a neural network designed for a specific task, such a bound is undesirable for a system that continually learns over an open range of domains. To address this, we describe a lifelong learning system that leverages a fast, though non-differentiable, content-addressable memory which can be exploited to encode both a long history of sequential episodic knowledge and semantic knowledge over many episodes for an unbounded number of domains. This opens the door for investigation into transfer learning, and leveraging prior knowledge that has been learned over a lifetime of experiences to new domains.\nHypothesis testing is an important cognitive process that supports human reasoning. In this paper, we introduce a computational hypothesis testing approach based on memory-augmented neural networks. Our approach involves a hypothesis testing loop that reconsiders and progressively refines a previously formed hypothesis in order to generate new hypotheses to test. We apply the proposed approach to a language comprehension task by using Neural Semantic Encoders (NSE).
Our NSE models achieve state-of-the-art results, showing an absolute improvement of 1.2% to 2.6% accuracy over previous results obtained by single and ensemble systems on standard machine comprehension benchmarks such as the Children's Book Test (CBT) and Who-Did-What (WDW) news article datasets.\nAutomatic construction of large knowledge graphs (KG) by mining web-scale text datasets has received considerable attention recently. Estimating the accuracy of such automatically constructed KGs is a challenging problem due to their size and diversity. This important problem has largely been ignored in prior research; we fill this gap and propose KGEval. KGEval binds facts of a KG using coupling constraints and crowdsources the facts that infer correctness of large parts of the KG. We demonstrate that the objective optimized by KGEval is submodular and NP-hard, allowing guarantees for our approximation algorithm. Through extensive experiments on real-world datasets, we demonstrate that KGEval is able to estimate KG accuracy more accurately than other competitive baselines, while requiring significantly fewer human evaluations.\nDecision makers, such as doctors and judges, make crucial decisions such as recommending treatments to patients, and granting bail to defendants on a daily basis. Such decisions typically involve weighing the potential benefits of taking an action against the costs involved. In this work, we aim to automate this task of learning \\emph{cost-effective, interpretable and actionable treatment regimes}. We formulate this as a problem of learning a decision list -- a sequence of if-then-else rules -- which maps characteristics of subjects (e.g., diagnostic test results of patients) to treatments. We propose a novel objective to construct a decision list which maximizes outcomes for the population, and minimizes overall costs.
We model the problem of learning such a list as a Markov Decision Process (MDP) and employ a variant of the Upper Confidence Bound for Trees (UCT) strategy which leverages customized checks for pruning the search space effectively. Experimental results on real-world observational data capturing judicial bail decisions and treatment recommendations for asthma patients demonstrate the effectiveness of our approach.\nIn most computer vision and image analysis problems, it is necessary to define a similarity measure between two or more different objects or images. Template matching is a classic and fundamental method used to score similarities between objects using certain mathematical algorithms. In this paper, we review the basic concept of matching, as well as advances in template matching and applications such as invariant features or novel applications in medical image analysis. Additionally, deformable models and templates originating from classic template matching are discussed. These models have broad applications in image registration, and they are a fundamental aspect of novel machine vision or deep learning algorithms, such as convolutional neural networks (CNN), which perform shift and scale invariant functions followed by classification. In general, although template matching methods have restrictions which limit their application, they are recommended for use with other object recognition methods as pre- or post-processing steps. Combining a template matching technique such as normalized cross-correlation or Dice coefficient with a robust decision-making algorithm yields a significant improvement in the accuracy rate for object detection and recognition.\nWe propose a novel method of regularization for recurrent neural networks called surprisal-driven zoneout. In this method, states zone out (maintain their previous value rather than updating) when the surprisal (discrepancy between the last state's prediction and target) is small.
Thus, regularization is adaptive and input-driven on a per-neuron basis. We demonstrate the effectiveness of this idea by achieving state-of-the-art bits per character of 1.31 on the Hutter Prize Wikipedia dataset, significantly reducing the gap to the best-known highly engineered compression methods.\nWe extend the Frank-Wolfe (FW) optimization algorithm to solve constrained smooth convex-concave saddle point (SP) problems. Remarkably, the method only requires access to linear minimization oracles. Leveraging recent advances in FW optimization, we provide the first proof of convergence of a FW-type saddle point solver over polytopes, thereby partially answering a 30-year-old conjecture. We also survey other convergence results and highlight gaps in the theoretical underpinnings of FW-style algorithms. Motivating applications without known efficient alternatives are explored through structured prediction with combinatorial penalties as well as games over matching polytopes involving an exponential number of constraints.\nThis work presents a parametrized family of divergences, namely Alpha-Beta Log-Determinant (Log-Det) divergences, between positive definite unitized trace class operators on a Hilbert space. This is a generalization of the Alpha-Beta Log-Determinant divergences between symmetric, positive definite matrices to the infinite-dimensional setting. The family of Alpha-Beta Log-Det divergences is highly general and contains many divergences as special cases, including the recently formulated infinite-dimensional affine-invariant Riemannian distance and the infinite-dimensional Alpha Log-Det divergences between positive definite unitized trace class operators. In particular, it includes a parametrized family of metrics between positive definite trace class operators, with the affine-invariant Riemannian distance and the square root of the symmetric Stein divergence being special cases.
For the Alpha-Beta Log-Det divergences between covariance operators on a Reproducing Kernel Hilbert Space (RKHS), we obtain closed-form formulas via the corresponding Gram matrices.\nAnt colony system (ACS) is a promising approach which has been widely used in problems such as Travelling Salesman Problems (TSP), Job shop scheduling problems (JSP) and Quadratic Assignment problems (QAP). In its original implementation, parameters of the algorithm were selected by trial and error. Over the last few years, novel approaches have been proposed for adapting the parameters of ACS to improve its performance. The aim of this paper is to use a framework introduced for self-tuning optimization algorithms combined with the firefly algorithm (FA) to tune the parameters of the ACS solving symmetric TSP problems. The FA optimizes the problem-specific parameters of ACS while the parameters of the FA are tuned by the selected framework itself. With this approach, the user has to work with neither the parameters of ACS nor those of FA. Using common symmetric TSP problems, we demonstrate that the framework fits well for the ACS. A detailed statistical analysis further confirms the advantage of the new ACS over the existing ACS and over the other techniques used to tune the parameters of ACS.\nGiven a state-of-the-art deep neural network classifier, we show the existence of a universal (image-agnostic) and very small perturbation vector that causes natural images to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations, and show that state-of-the-art deep neural networks are highly vulnerable to such perturbations, albeit being quasi-imperceptible to the human eye. We further empirically analyze these universal perturbations and show, in particular, that they generalize very well across neural networks.
The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers. It further outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.\nStatistical relational models provide compact encodings of probabilistic dependencies in relational domains, but result in highly intractable graphical models. The goal of lifted inference is to carry out probabilistic inference without needing to reason about each individual separately, by instead treating exchangeable, undistinguished objects as a whole. In this paper, we study the domain recursion inference rule, which, despite its central role in early theoretical results on domain-lifted inference, was later believed to be redundant. We show that this rule is more powerful than expected, and in fact significantly extends the range of models for which lifted inference runs in time polynomial in the number of individuals in the domain. This includes an open problem called S4, the symmetric transitivity model, and a first-order logic encoding of the birthday paradox. We further identify new classes S2FO2 and S2RU of domain-liftable theories, which respectively subsume FO2 and recursively unary theories, the largest classes of domain-liftable theories known so far, and show that using domain recursion can achieve exponential speedup even in theories that cannot fully be lifted with the existing set of inference rules.\nDistributed representations learned with neural networks have recently been shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences. Whether and how such an approach can be extended to help model larger spans of text, e.g., documents, is intriguing, and further investigation would still be desirable.
This paper aims to enhance neural network models for such a purpose. A typical problem of document-level modeling is automatic summarization, which aims to model documents in order to generate summaries. In this paper, we propose neural models to train computers not just to pay attention to specific regions and content of input documents with attention models, but also to distract them so that they traverse between different content of a document, so as to better grasp the overall meaning for summarization. Without engineering any features, we train the models on two large datasets. The models achieve state-of-the-art performance, and they significantly benefit from the distraction modeling, particularly when input documents are long.\nWe formalize synthesis of shared control protocols with correctness guarantees for temporal logic specifications. More specifically, we introduce a modeling formalism in which both a human and an autonomy protocol can issue commands to a robot towards performing a certain task. These commands are blended into a joint input to the robot. The autonomy protocol is synthesized using an abstraction of possible human commands accounting for randomness in decisions caused by factors such as fatigue or incomprehensibility of the problem at hand. The synthesis is designed to ensure that the resulting robot behavior satisfies given safety and performance specifications, e.g., in temporal logic. Our solution is based on nonlinear programming, and we address the inherent scalability issue by presenting alternative methods. We assess the feasibility and the scalability of the approach by an experimental evaluation.\nMany applications in speech, robotics, finance, and biology deal with sequential data, where ordering matters and recurrent structures are common. However, this structure cannot be easily captured by standard kernel functions. To model such structure, we propose expressive closed-form kernel functions for Gaussian processes.
The resulting model, GP-LSTM, fully encapsulates the inductive biases of long short-term memory (LSTM) recurrent networks, while retaining the non-parametric probabilistic advantages of Gaussian processes. We learn the properties of the proposed kernels by optimizing the Gaussian process marginal likelihood using a new provably convergent semi-stochastic gradient procedure and exploit the structure of these kernels for scalable training and prediction. This approach provides a practical representation for Bayesian LSTMs. We demonstrate state-of-the-art performance on several benchmarks, and thoroughly investigate a consequential autonomous driving application, where the predictive uncertainties provided by GP-LSTM are uniquely valuable.\nWe focus on generative autoencoders, such as variational or adversarial autoencoders, which jointly learn a generative model alongside an inference model. Generative autoencoders are those which are trained to softly enforce a prior on the latent distribution learned by the inference model. We call the distribution to which the inference model maps observed samples the learned latent distribution, which may not be consistent with the prior. We formulate a Markov chain Monte Carlo (MCMC) sampling process, equivalent to iteratively decoding and encoding, which allows us to sample from the learned latent distribution. Since the generative model learns to map from the learned latent distribution, rather than the prior, we may use MCMC to improve the quality of samples drawn from the generative model, especially when the learned latent distribution is far from the prior. Using MCMC sampling, we are able to reveal previously unseen differences between generative autoencoders trained either with or without a denoising criterion.\nThis paper proposes to use probabilistic model checking to synthesize optimal robot policies in multi-tasking autonomous systems that are subject to human-robot interaction.
Given the convincing empirical evidence that human behavior can be related to reinforcement models, we take as input a well-studied Q-table model of the human behavior for flexible scenarios. We first describe an automated procedure to distill a Markov decision process (MDP) for the human in an arbitrary but fixed scenario. The distinctive issue is that -- in contrast to existing models -- under-specification of the human behavior is included. Probabilistic model checking is used to predict the human's behavior. Finally, the MDP model is extended with a robot model. Optimal robot policies are synthesized by analyzing the resulting two-player stochastic game. Experiments with a prototype implementation using PRISM show promising results.\nIn the literature, two series of models have been proposed to address prediction problems including classification and regression. Simple models, such as generalized linear models, have ordinary performance but strong interpretability on a set of simple features. The other series, including tree-based models, organize numerical, categorical and high-dimensional features into a comprehensive structure with rich interpretable information in the data. In this paper, we propose a novel Discriminative Pattern-based Prediction framework (DPPred) to accomplish prediction tasks by taking advantage of both effectiveness and interpretability. Specifically, DPPred adopts the concise discriminative patterns that are on the prefix paths from the root to leaf nodes in the tree-based models. DPPred selects a limited number of the useful discriminative patterns by searching for the most effective pattern combination to fit generalized linear models. Extensive experiments show that in many scenarios, DPPred provides accuracy competitive with the state of the art as well as valuable interpretability for developers and experts.
In particular, taking a clinical application dataset as a case study, DPPred outperforms the baselines by using only 40 concise discriminative patterns out of a potentially exponentially large set of patterns.\nProbabilistic modeling is a powerful approach for analyzing empirical information. We describe Edward, a library for probabilistic modeling. Edward's design reflects an iterative process pioneered by George Box: build a model of a phenomenon, make inferences about the model given data, and criticize the model's fit to the data. Edward supports a broad class of probabilistic models, efficient algorithms for inference, and many techniques for model criticism. The library builds on top of TensorFlow to support distributed training and hardware such as GPUs. Edward enables the development of complex probabilistic models and their algorithms at a massive scale.\nCognitive engineering is a multi-disciplinary field and hence it is difficult to find a review article consolidating the leading developments in the field. The incredible pace at which technology is advancing pushes the boundaries of what is achievable in cognitive engineering. There are also differing approaches to cognitive engineering brought about by the multi-disciplinary nature of the field and the vastness of possible applications. Thus, research communities require more frequent reviews to keep up to date with the latest trends. In this paper, we shall discuss some of the approaches to cognitive engineering holistically to clarify the reasoning behind the different approaches and to highlight their strengths and weaknesses. We shall then show how developments from seemingly disjointed views could be integrated to achieve the same goal of creating cognitive machines.
By reviewing the major contributions in the different fields and showing the potential for a combined approach, this work intends to assist the research community in devising more unified methods and techniques for developing cognitive machines.\nChinese poetry generation is a very challenging task in natural language processing. In this paper, we propose a novel two-stage poetry generation method which first plans the sub-topics of the poem according to the user's writing intent, and then generates each line of the poem sequentially, using a modified recurrent neural network encoder-decoder framework. The proposed planning-based method can ensure that the generated poem is coherent and semantically consistent with the user's intent. A comprehensive evaluation with human judgments demonstrates that our proposed approach outperforms the state-of-the-art poetry generation methods and that the poem quality is somewhat comparable to that of human poets.\nThis work proposes a novel framework for the development of new products and services in transportation through an open innovation approach based on automatic content analysis of social media data. The framework is able to extract users' comments from Online Social Networks (OSN), and to process and analyze text through information extraction and sentiment analysis techniques to obtain relevant information about product reception on the market. A use case was developed using the mobile application Uber, which is today one of the fastest-growing technology companies in the world. We measured how a controversial, highly diffused event influences the volume of tweets about Uber and the perception of its users.
While the image of Uber did not change, we observed a large increase in the number of tweets mentioning the company, which amounted to a free and significant diffusion of its product.\nWe introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do \"compilation of inference\" because our method transforms a denotational specification of an inference problem in the form of a probabilistic program written in a universal programming language into a trained neural network denoted in a neural network specification language. When at test time this neural network is fed observational data and executed, it performs approximate inference in the original model specified by the probabilistic program. Our training objective and learning procedure are designed to allow the trained neural network to be used as a proposal distribution in a sequential importance sampling inference engine. We illustrate our method on mixture models and Captcha solving and show significant speedups in the efficiency of inference.\nHarnessing the statistical power of neural networks to perform language understanding and symbolic reasoning is difficult when it requires executing efficient discrete operations against a large knowledge base. In this work, we introduce a Neural Symbolic Machine, which contains (a) a neural \"programmer\", i.e., a sequence-to-sequence model that maps language utterances to programs and utilizes a key-variable memory to handle compositionality, and (b) a symbolic \"computer\", i.e., a Lisp interpreter that performs program execution, and helps find good programs by pruning the search space. We apply REINFORCE to directly optimize the task reward of this structured prediction problem. 
To train with weak supervision and improve the stability of REINFORCE, we augment it with an iterative maximum-likelihood training process. NSM outperforms the state-of-the-art on the WebQuestionsSP dataset when trained from question-answer pairs only, without requiring any feature engineering or domain-specific knowledge.\nIn this paper, we study the problem of using a planning algorithm to automatically create and update a Behavior Tree (BT), controlling a robot in a dynamic environment. Exploiting the characteristics of BTs, in terms of modularity and reactivity, the robot continually acts and plans to achieve a given goal using a set of abstract actions and conditions. The construction of the BT is based on an extension of the Hybrid Backward-Forward algorithm (HBF) that allows us to refine the acting process by mapping the descriptive models onto operational models of actions, thus integrating the ability of HBF to plan in infinite state spaces with the continuous modular reactive action execution of BTs. We believe that this might be a first step to address the recently raised open challenge in automated planning: the need for a hierarchical structure and a continuous online planning and acting framework. We prove the convergence of the proposed approach as well as the absence of deadlocks and livelocks, and we illustrate our approach in two different robotics scenarios.\nThe term \"affordance\" denotes the behavioral meaning of objects. We propose a cognitive architecture for the detection of affordances in the visual modality. This model is based on the internal simulation of movement sequences. For each movement step, the resulting sensory state is predicted by a forward model, which in turn triggers the generation of a new (simulated) motor command by an inverse model. Thus, a series of mental images in the sensory and in the motor domain is evoked. Starting from a real sensory state, a large number of such sequences is simulated in parallel. 
Final affordance detection is based on the generated motor commands. We apply this model to a real-world mobile robot which is faced with obstacle arrangements some of which are passable (corridor) and some of which are not (dead ends). The robot's task is to detect the right affordance (\"pass-through-able\" or \"non-pass-through-able\"). The required internal models are acquired in a hierarchical training process. Afterwards, the robotic agent is able to distinguish reliably between corridors and dead ends. This real-world result enhances the validity of the proposed mental simulation approach. In addition, we compare several key factors in the simulation process regarding performance and efficiency.\nWhether officials can be trusted to protect national security information has become a matter of great public controversy, reigniting a long-standing debate about the scope and nature of official secrecy. The declassification of millions of electronic records has made it possible to analyze these issues with greater rigor and precision. Using machine-learning methods, we examined nearly a million State Department cables from the 1970s to identify features of records that are more likely to be classified, such as international negotiations, military operations, and high-level communications. Even with incomplete data, algorithms can use such features to identify 90% of classified cables with <11% false positives. But our results also show that there are longstanding problems in the identification of sensitive information. Error analysis reveals many examples of both overclassification and underclassification. 
This indicates both the need for research on inter-coder reliability among officials as to what constitutes classified material and the opportunity to develop recommender systems to better manage both classification and declassification.\nThe use of bots as virtual confederates in online field experiments holds extreme promise as a new methodological tool in computational social science. However, this potential tool comes with inherent ethical challenges. Informed consent can be difficult to obtain in many cases, and the use of confederates necessarily implies the use of deception. In this work we outline a design space for bots as virtual confederates, and we propose a set of guidelines for meeting the status quo for ethical experimentation. We draw upon examples from prior work in the CSCW community and the broader social science literature for illustration. While a handful of prior researchers have used bots in online experimentation, our work is meant to inspire future work in this area and raise awareness of the associated ethical issues.\nPairwise comparison is an important tool in multi-attribute decision making. Pairwise comparison matrices (PCM) have been applied for ranking criteria and for scoring alternatives according to a given criterion. Our paper presents a special application of incomplete PCMs: ranking of professional tennis players based on their results against each other. The 25 selected players have been at the top of the ATP rankings for shorter or longer periods over the last 40 years. Some of them have never met on the court. One of the aims of the paper is to provide a ranking of the selected players; however, the analysis of incomplete pairwise comparison matrices is also a focus. The eigenvector method and the logarithmic least squares method were used to calculate weights from incomplete PCMs. In our results the top three players of four decades were Nadal, Federer and Sampras. 
Some questions have been raised about the properties of incomplete PCMs, and these remain open for further investigation.\nLink prediction, the problem of identifying missing links among a set of inter-related data entities, is a popular field of research due to its application to graph-like domains. Producing consistent evaluations of the performance of the many link prediction algorithms being proposed can be challenging due to variable graph properties, such as size and density. In this paper we first discuss traditional data mining solutions which are applicable to link prediction evaluation, arguing about their capacity for producing faithful and useful evaluations. We also introduce an innovative modification to a traditional evaluation methodology with the goal of adapting it to the problem of evaluating link prediction algorithms when applied to large graphs, by tackling the problem of class imbalance. We empirically evaluate the proposed methodology and, building on these findings, make a case for its importance in the evaluation of large-scale graph processing.\nIn this work, we are interested in structure learning for a set of spatially distributed dynamical systems, where individual subsystems are coupled via latent variables and observed through a filter. We represent this model as a directed acyclic graph (DAG) that characterises the unidirectional coupling between subsystems. Standard approaches to structure learning are not applicable in this framework due to the hidden variables; however, we can exploit the properties of certain dynamical systems to formulate exact methods based on state space reconstruction. We approach the problem by using reconstruction theorems to analytically derive a tractable expression for the KL-divergence of a candidate DAG from the observed dataset. We show this measure can be decomposed as a function of two information-theoretic measures, transfer entropy and stochastic interaction. 
We then present two mathematically robust scoring functions based on transfer entropy and statistical independence tests. These results support the previously held conjecture that transfer entropy can be used to infer effective connectivity in complex networks.\nOnline recommender systems often deal with continuous, potentially fast and unbounded flows of data. Ensemble methods for recommender systems have been used in the past in batch algorithms; however, they have never been studied with incremental algorithms that learn from data streams. We evaluate online bagging with an incremental matrix factorization algorithm for top-N recommendation with positive-only -- binary -- ratings. Our results show that online bagging is able to improve accuracy by up to 35% over the baseline, with small computational overhead.\nWe consider the problem of consistently matching multiple sets of elements to each other, which is a common task in fields such as computer vision. To solve the underlying NP-hard objective, existing methods often relax or approximate it, but end up with unsatisfying empirical performance due to a misaligned objective. We propose a coordinate update algorithm that directly optimizes the target objective. By using pairwise alignment information to build an undirected graph and initializing the permutation matrices along the edges of its Maximum Spanning Tree, our algorithm successfully avoids bad local optima. Theoretically, with high probability our algorithm guarantees an optimal solution under reasonable noise assumptions. Empirically, our algorithm consistently and significantly outperforms existing methods on several benchmark tasks on real datasets.\nProgressive filtering is a simple way to perform hierarchical classification, inspired by the behavior that most humans put into practice while attempting to categorize an item according to an underlying taxonomy. 
Each node of the taxonomy being associated with a different category, one may visualize the categorization process by looking at the item going downwards through all the nodes that accept it as belonging to the corresponding category. This paper is aimed at modeling the progressive filtering technique from a probabilistic perspective, in a hierarchical text categorization setting. As a result, the designer of a system based on progressive filtering should be facilitated in the task of devising, training, and testing it.\nIn this paper, we aim to find the meanings of words based on distinct situations. Word Sense Disambiguation is used to find the meanings of words in live contexts using supervised and unsupervised approaches. Unsupervised approaches use an online dictionary for learning, and supervised approaches use manual learning sets. Hand-tagged data are populated, which might not be effective or sufficient for the learning procedure. This limitation of information is the main flaw of the supervised approach. Our proposed approach aims to overcome this limitation using a learning set that is enriched dynamically by maintaining new data. A trivial filtering method is used to obtain appropriate training data. We introduce a mixed methodology combining the Modified Lesk approach and Bag-of-Words, with bags enriched using learning methods. Experimentally, our approach outperforms the individual Modified Lesk and Bag-of-Words approaches.\nCombining abstract, symbolic reasoning with continuous neural reasoning is a grand challenge of representation learning. As a step in this direction, we propose a new architecture, called neural equivalence networks, for the problem of learning continuous semantic representations of algebraic and logical expressions. These networks are trained to represent semantic equivalence, even of expressions that are syntactically very different. 
The challenge is that semantic representations must be computed in a syntax-directed manner, because semantics is compositional, but at the same time, small changes in syntax can lead to very large changes in semantics, which can be difficult for continuous neural architectures. We perform an exhaustive evaluation on the task of checking equivalence on a highly diverse class of symbolic algebraic and boolean expression types, showing that our model significantly outperforms existing architectures.\nWe propose a method to classify the causal relationship between two discrete variables given only the joint distribution of the variables, acknowledging that the method is subject to an inherent baseline error. We assume that the causal system is acyclic, but we do allow for hidden common causes. Our algorithm presupposes that the probability distribution $P(C)$ of a cause $C$ is independent of the probability distribution $P(E\\mid C)$ of the cause-effect mechanism. While our classifier is trained with a Bayesian assumption of flat hyperpriors, we do not make this assumption about our test data. This work connects to recent developments on the identifiability of causal models over continuous variables under the assumption of \"independent mechanisms\". Carefully-commented Python notebooks that reproduce all our experiments are available online at http://vision.caltech.edu/~kchalupk/code.html.\nWe introduce a novel generalization of Counterexample-Guided Inductive Synthesis (CEGIS) and instantiate it to yield a novel, competitive algorithm for solving Quantified Boolean Formulas (QBF). Current QBF solvers based on counterexample-guided expansion use a recursive approach which scales poorly with the number of quantifier alternations. Our generalization of CEGIS removes the need for this recursive approach, and we instantiate it to yield a simple and efficient algorithm for QBF solving. 
Lastly, this research is supported by a competitive, though straightforward, implementation of the algorithm, making it possible to study the practical impact of our algorithm design decisions, along with various optimizations.\nRecurrent neural networks are a powerful tool for modeling sequential data, but the dependence of each timestep's computation on the previous timestep's output limits parallelism and makes RNNs unwieldy for very long sequences. We introduce quasi-recurrent neural networks (QRNNs), an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, and a minimalist recurrent pooling function that applies in parallel across channels. Despite lacking trainable recurrent layers, stacked QRNNs have better predictive accuracy than stacked LSTMs of the same hidden size. Due to their increased parallelism, they are up to 16 times faster at train and test time. Experiments on language modeling, sentiment classification, and character-level neural machine translation demonstrate these advantages and underline the viability of QRNNs as a basic building block for a variety of sequence tasks.\nTransfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax and semantics would benefit each other by being trained in a single model. We introduce a joint many-task model together with a strategy for successively growing its depth to solve increasingly complex tasks. Higher layers include shortcut connections to lower-level task predictions to reflect linguistic hierarchies. We use a simple regularization term to allow for optimizing all model weights to improve one task's loss without exhibiting catastrophic interference of the other tasks. 
Our single end-to-end model obtains state-of-the-art or competitive results on five different tasks from tagging, parsing, relatedness, and entailment tasks.\nSeveral deep learning models have been proposed for question answering. However, due to their single-pass nature, they have no way to recover from local maxima corresponding to incorrect answers. To address this problem, we introduce the Dynamic Coattention Network (DCN) for question answering. The DCN first fuses co-dependent representations of the question and the document in order to focus on relevant parts of both. Then a dynamic pointing decoder iterates over potential answer spans. This iterative procedure enables the model to recover from initial local maxima corresponding to incorrect answers. On the Stanford question answering dataset, a single DCN model improves the previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains 80.4% F1.\nPolicy gradient is an efficient technique for improving a policy in a reinforcement learning setting. However, vanilla online variants are on-policy only and not able to take advantage of off-policy data. In this paper we describe a new technique that combines policy gradient with off-policy Q-learning, drawing experience from a replay buffer. This is motivated by making a connection between the fixed points of the regularized policy gradient algorithm and the Q-values. This connection allows us to estimate the Q-values from the action preferences of the policy, to which we apply Q-learning updates. We refer to the new technique as 'PGQL', for policy gradient and Q-learning. We also establish an equivalency between action-value fitting techniques and actor-critic algorithms, showing that regularized policy gradient techniques can be interpreted as advantage function learning algorithms. We conclude with some numerical examples that demonstrate improved data efficiency and stability of PGQL. 
In particular, we tested PGQL on the full suite of Atari games and achieved performance exceeding that of both asynchronous advantage actor-critic (A3C) and Q-learning.\nOne of the most important fields in robotics is the optimization of controllers. Currently, robots are treated as a black box in this optimization process, which is the reason why derivative-free optimization methods such as evolutionary algorithms or reinforcement learning are omnipresent. We propose an implementation of a modern physics engine, which has the ability to differentiate control parameters. This has been implemented on both CPU and GPU. We show how this speeds up the optimization process, even for small problems, and why it will scale to bigger problems. We explain why this is an alternative approach to deep Q-learning, for using deep learning in robotics. Lastly, we argue that this is a big step for deep learning in robotics, as it opens up new possibilities to optimize robots, both in hardware and software.\nCausality has been recently introduced in databases, to model, characterize, and possibly compute causes for query answers. Connections between QA-causality and consistency-based diagnosis and database repairs (wrt. integrity constraint violations) have already been established. In this work we establish precise connections between QA-causality and both abductive diagnosis and the view-update problem in databases, allowing us to obtain new algorithmic and complexity results for QA-causality. We also obtain new results on the complexity of view-conditioned causality, and investigate the notion of QA-causality in the presence of integrity constraints, obtaining complexity results from a connection with view-conditioned causality. The abduction connection under integrity constraints allows us to obtain algorithmic tools for QA-causality.\nWe present an approach to sensorimotor control in immersive environments. 
Our approach utilizes a high-dimensional sensory stream and a lower-dimensional measurement stream. The cotemporal structure of these streams provides a rich supervisory signal, which enables training a sensorimotor control model by interacting with the environment. The model is trained using supervised learning techniques, but without extraneous supervision. It learns to act based on raw sensory input from a complex three-dimensional environment. The presented formulation enables learning without a fixed goal at training time, and pursuing dynamically changing goals at test time. We conduct extensive experiments in three-dimensional simulations based on the classical first-person game Doom. The results demonstrate that the presented approach outperforms sophisticated prior formulations, particularly on challenging tasks. The results also show that trained models successfully generalize across environments and goals. A model trained using the presented approach won the Full Deathmatch track of the Visual Doom AI Competition, which was held in previously unseen environments.\nQuestion answering (QA) has been the subject of a resurgence over the past years. The said resurgence has led to a multitude of question answering (QA) systems being developed both by companies and research facilities. While a few components of QA systems get reused across implementations, most systems do not leverage the full potential of component reuse. Hence, the development of QA systems is currently still a tedious and time-consuming process. We address the challenge of accelerating the creation of novel or tailored QA systems by presenting a concept for a self-wiring approach to composing QA systems. Our approach will allow the reuse of existing, web-based QA systems or modules while developing new QA platforms. To this end, it will rely on QA modules being described using the Web Ontology Language. 
Based on these descriptions, our approach will be able to compose QA systems automatically using a data-driven approach.\nA framework is presented for unsupervised learning of representations based on the infomax principle for large-scale neural populations. We use an asymptotic approximation to Shannon's mutual information for a large neural population to demonstrate that a good initial approximation to the global information-theoretic optimum can be obtained by a hierarchical infomax method. Starting from the initial solution, an efficient algorithm based on gradient descent of the final objective function is proposed to learn representations from the input datasets, and the method works for complete, overcomplete, and undercomplete bases. As confirmed by numerical experiments, our method is robust and highly efficient for extracting salient features from input datasets. Compared with the main existing methods, our algorithm has a distinct advantage in both the training speed and the robustness of unsupervised representation learning. Furthermore, the proposed method is easily extended to the supervised or unsupervised model for training deep structure networks.\nInstability and variability of Deep Reinforcement Learning (DRL) algorithms tend to adversely affect their performance. Averaged-DQN is a simple extension to the DQN algorithm, based on averaging previously learned Q-value estimates, which leads to a more stable training procedure and improved performance by reducing approximation error variance in the target values. To understand the effect of the algorithm, we examine the source of value function estimation errors and provide an analytical comparison within a simplified model. We further present experiments on the Arcade Learning Environment benchmark that demonstrate significantly improved stability and performance due to the proposed extension.\nMany algorithms for data analysis exist, especially for classification problems. 
To solve a data analysis problem, a suitable algorithm must be chosen and its hyperparameters must be selected. In this paper, we present a new method for the simultaneous selection of an algorithm and its hyperparameters. To do so, we reduce this problem to the multi-armed bandit problem. We treat each algorithm as an arm, and a hyperparameter search for that algorithm during a fixed time as the corresponding arm play. We also suggest a problem-specific reward function. We performed experiments on 10 real datasets and compared the suggested method with the existing one implemented in Auto-WEKA. The results show that our method is significantly better in most cases and never worse than Auto-WEKA.\nMastering a video game requires skill, tactics and strategy. While these attributes may be acquired naturally by human players, teaching them to a computer program is a far more challenging task. In recent years, extensive research was carried out in the field of reinforcement learning and numerous algorithms were introduced, aiming to learn how to perform human tasks such as playing video games. As a result, the Arcade Learning Environment (ALE) (Bellemare et al., 2013) has become a commonly used benchmark environment allowing algorithms to train on various Atari 2600 games. In many games the state-of-the-art algorithms outperform humans. In this paper we introduce a new learning environment, the Retro Learning Environment --- RLE, that can run games on the Super Nintendo Entertainment System (SNES), Sega Genesis and several other gaming consoles. The environment is expandable, allowing for more video games and consoles to be easily added to the environment, while maintaining the same interface as ALE. Moreover, RLE is compatible with Python and Torch. 
SNES games pose a significant challenge to current algorithms due to their higher level of complexity and versatility.\nWe introduce the hierarchical compositional network (HCN), a directed generative model able to discover and disentangle, without supervision, the building blocks of a set of binary images. The building blocks are binary features defined hierarchically as a composition of some of the features in the layer immediately below, arranged in a particular manner. At a high level, HCN is similar to a sigmoid belief network with pooling. Inference and learning in HCN are very challenging and existing variational approximations do not work satisfactorily. A main contribution of this work is to show that both can be addressed using max-product message passing (MPMP) with a particular schedule (no EM required). Also, using MPMP as an inference engine for HCN makes new tasks simple: adding supervision information, classifying images, or performing inpainting all correspond to clamping some variables of the model to their known values and running MPMP on the rest. When used for classification, fast inference with HCN has exactly the same functional form as a convolutional neural network (CNN) with linear activations and binary weights. However, HCN's features are qualitatively very different.\nWe propose the Gaussian attention model for content-based neural memory access. With the proposed attention model, a neural network has the additional degree of freedom to control the focus of its attention from a laser sharp attention to a broad attention. It is applicable whenever we can assume that the distance in the latent space reflects some notion of semantics. We use the proposed attention model as a scoring function for the embedding of a knowledge base into a continuous vector space and then train a model that performs question answering about the entities in the knowledge base. 
The proposed attention model can handle both the propagation of uncertainty when following a series of relations and the conjunction of conditions in a natural way. On a dataset of soccer players who participated in the FIFA World Cup 2014, we demonstrate that our model can handle both path queries and conjunctive queries well.\nWe consider the problem of density estimation on Riemannian manifolds. Density estimation on manifolds has many applications in fluid mechanics, optics and plasma physics and it appears often when dealing with angular variables (such as those used in protein folding, robot limbs, gene-expression) and in general directional statistics. In spite of the multitude of algorithms available for density estimation in the Euclidean spaces $\\mathbf{R}^n$ that scale to large n (e.g. normalizing flows, kernel methods and variational approximations), most of these methods are not immediately suitable for density estimation in more general Riemannian manifolds. We revisit techniques related to homeomorphisms from differential geometry for projecting densities to sub-manifolds and use them to generalize the idea of normalizing flows to more general Riemannian manifolds. The resulting algorithm is scalable, simple to implement and suitable for use with automatic differentiation. We demonstrate concrete examples of this method on the n-sphere $\\mathbf{S}^n$.\nEvery design choice will have different effects on different units. However, traditional A/B tests are often underpowered to identify these heterogeneous effects. This is especially true when the set of unit-level attributes is high-dimensional and our priors are weak about which particular covariates are important. However, there are often observational data sets available that are orders of magnitude larger. We propose a method to combine these two data sources to estimate heterogeneous treatment effects. 
First, we use observational time series data to estimate a mapping from covariates to unit-level effects. These estimates are likely biased but under some conditions the bias preserves unit-level relative rank orderings. If these conditions hold, we only need sufficient experimental data to identify a monotonic, one-dimensional transformation from observationally predicted treatment effects to real treatment effects. This reduces power demands greatly and makes the detection of heterogeneous effects much easier. As an application, we show how our method can be used to improve Facebook page recommendations.\nWe analyze the data complexity of ontology-mediated querying where the ontologies are formulated in a description logic (DL) of the ALC family and queries are conjunctive queries, positive existential queries, or acyclic conjunctive queries. Our approach is non-uniform in the sense that we aim to understand the complexity of each single ontology instead of for all ontologies formulated in a certain language. While doing so, we quantify over the queries and are interested, for example, in the question whether all queries can be evaluated in polynomial time w.r.t. a given ontology. Our results include a PTime/coNP-dichotomy for ontologies of depth one in the description logic ALCFI, the same dichotomy for ALC- and ALCI-ontologies of unrestricted depth, and the non-existence of such a dichotomy for ALCF-ontologies. For the latter DL, we additionally show that it is undecidable whether a given ontology admits PTime query evaluation. We also consider the connection between PTime query evaluation and rewritability into (monadic) Datalog.\nHumans can learn concepts or recognize items from just a handful of examples, while machines require many more samples to perform the same task. In this paper, we build a computational model to investigate the possibility of this kind of rapid learning. 
The proposed method aims to improve the learning task of input from sensory memory by leveraging the information retrieved from long-term memory. We present a simple and intuitive technique called cognitive discriminative mappings (CDM) to explore the cognitive problem. First, CDM separates and clusters the data instances retrieved from long-term memory into distinct classes with a discrimination method in working memory when a sensory input triggers the algorithm. CDM then maps each sensory data instance to be as close as possible to the median point of the data group with the same class. The experimental results demonstrate that the CDM approach is effective for learning the discriminative features of supervised classifications with few training sensory input instances.\nWe formulate sequence to sequence transduction as a noisy channel decoding problem and use recurrent neural networks to parameterise the source and channel models. Unlike direct models which can suffer from explaining-away effects during training, noisy channel models must produce outputs that explain their inputs, and their component models can be trained with not only paired training samples but also unpaired samples from the marginal output distribution. Using a latent variable to control how much of the conditioning sequence the channel model needs to read in order to generate a subsequent symbol, we obtain a tractable and effective beam search decoder. Experimental results on abstractive sentence summarisation, morphological inflection, and machine translation show that noisy channel models outperform direct models, and that they significantly benefit from increased amounts of unpaired output data that direct models cannot easily use.\nModeling the structure of coherent texts is a key NLP problem. The task of coherently organizing a given set of sentences has been commonly used to build and evaluate models that understand such structure. 
We propose an end-to-end unsupervised deep learning approach based on the set-to-sequence framework to address this problem. Our model strongly outperforms prior methods in the order discrimination task and a novel task of ordering abstracts from scientific articles. Furthermore, our work shows that useful text representations can be obtained by learning to order sentences. Visualizing the learned sentence representations shows that the model captures high-level logical structure in paragraphs. Our representations perform comparably to state-of-the-art pre-training methods on sentence similarity and paraphrase detection tasks.\nCP-nets and their variants constitute one of the main AI approaches for specifying and reasoning about preferences. CI-nets, in particular, are a CP-inspired formalism for representing ordinal preferences over sets of goods, which are typically required to be monotonic.   Considering also that goods often come in multi-sets rather than sets, a natural question is whether CI-nets can be used more or less directly to encode preferences over multi-sets. We here provide some initial ideas on how to achieve this, in the sense that at least a restricted form of reasoning on our framework, which we call \"confined reasoning\", can be efficiently reduced to reasoning on CI-nets. Our framework nevertheless allows for encoding preferences over multi-sets with unbounded multiplicities. We also show the extent to which it can be used to represent preferences where multiplicites of the goods are not stated explicitly (\"purely qualitative preferences\") as well as a potential use of our generalization of CI-nets as a component of a recent system for evidence aggregation.\nA new agent architecture called Limited Instruction Set Agent (LISA) is introduced for autonomous control. 
The new architecture is based on previous implementations of AgentSpeak and it is structurally simpler than its predecessors with the aim of facilitating design-time and run-time verification methods. The process of abstracting the LISA system to two different types of discrete probabilistic models (DTMC and MDP) is investigated and illustrated. The LISA system provides a tool for complete modelling of the agent and the environment for probabilistic verification. The agent program can be automatically compiled into a DTMC or a MDP model for verification with Prism. The automatically generated Prism model can be used for both design-time and run-time verification. The run-time verification is investigated and illustrated in the LISA system as an internal modelling mechanism for prediction of future outcomes.\nWe propose a major revision of the format XCSP 2.1, called XCSP3, to build integrated representations of combinatorial constrained problems. This new format is able to deal with mono/multi optimization, many types of variables, cost functions, reification, views, annotations, variable quantification, distributed, probabilistic and qualitative reasoning. The new format is made compact, highly readable, and rather easy to parse. Interestingly, it captures the structure of the problem models, through the possibilities of declaring arrays of variables, and identifying syntactic and semantic groups of constraints. The number of constraints is kept under control by introducing a limited set of basic constraint forms, and producing almost automatically some of their variations through lifting, restriction, sliding, logical combination and relaxation mechanisms. As a result, XCSP3 encompasses practically all constraints that can be found in major constraint solvers developed by the CP community. A website, which is developed conjointly with the format, contains many models and series of instances. 
The user can make sophisticated queries for selecting instances from very precise criteria. The objective of XCSP3 is to ease the effort required to test and compare different algorithms by providing a common test-bed of combinatorial constrained instances.\nImportance sampling is often used in machine learning when training and testing data come from different distributions. In this paper we propose a new variant of importance sampling that can reduce the variance of importance sampling-based estimates by orders of magnitude when the supports of the training and testing distributions differ. After motivating and presenting our new importance sampling estimator, we provide a detailed theoretical analysis that characterizes both its bias and variance relative to the ordinary importance sampling estimator (in various settings, which include cases where ordinary importance sampling is biased, while our new estimator is not, and vice versa). We conclude with an example of how our new importance sampling estimator can be used to improve estimates of how well a new treatment policy for diabetes will work for an individual, using only data from when the individual used a previous treatment policy.\nInference in expressive probabilistic models is generally intractable, which makes them difficult to learn and limits their applicability. Sum-product networks are a class of deep models where, surprisingly, inference remains tractable even when an arbitrary number of hidden layers are present. In this paper, we generalize this result to a much broader set of learning problems: all those where inference consists of summing a function over a semiring. This includes satisfiability, constraint satisfaction, optimization, integration, and others. In any semiring, for summation to be tractable it suffices that the factors of every product have disjoint scopes. This unifies and extends many previous results in the literature. 
Enforcing this condition at learning time thus ensures that the learned models are tractable. We illustrate the power and generality of this approach by applying it to a new type of structured prediction problem: learning a nonconvex function that can be globally optimized in polynomial time. We show empirically that this greatly outperforms the standard approach of learning without regard to the cost of optimization.\nThis paper describes the USTC_NELSLIP systems submitted to the Trilingual Entity Detection and Linking (EDL) track in 2016 TAC Knowledge Base Population (KBP) contests. We have built two systems for entity discovery and mention detection (MD): one uses the conditional RNNLM and the other one uses the attention-based encoder-decoder framework. The entity linking (EL) system consists of two modules: a rule based candidate generation and a neural networks probability ranking model. Moreover, some simple string matching rules are used for NIL clustering. At the end, our best system has achieved an F1 score of 0.624 in the end-to-end typed mention ceaf plus metric.\nMost neural network models for document classification on social media focus on text infor-mation to the neglect of other information on these platforms. In this paper, we classify post stance on social media channels and develop UTCNN, a neural network model that incorporates user tastes, topic tastes, and user comments on posts. UTCNN not only works on social media texts, but also analyzes texts in forums and message boards. Experiments performed on Chinese Facebook data and English online debate forum data show that UTCNN achieves a 0.755 macro-average f-score for supportive, neutral, and unsupportive stance classes on Facebook data, which is significantly better than models in which either user, topic, or comment information is withheld. This model design greatly mitigates the lack of data for the minor class without the use of oversampling. 
In addition, UTCNN yields a 0.842 accuracy on English online debate forum data, which also significantly outperforms results from previous work as well as other deep learning models, showing that UTCNN performs well regardless of language or platform.\nSubjective questions such as `does neymar dive', or `is clinton lying', or `is trump a fascist', are popular queries to web search engines, as can be seen by autocompletion suggestions on Google, Yahoo and Bing. In the era of cognitive computing, beyond search, they could be handled as hypotheses issued for evaluation. Our vision is to leverage on unstructured data and metadata of the rich user-generated multimedia that is often shared as material evidence in favor or against hypotheses in social media platforms. In this paper we present two preliminary experiments along those lines and discuss challenges for a cognitive computing system that collects material evidence from user-generated multimedia towards aggregating it into some form of collective decision on the hypothesis.\nLearning to navigate in complex environments with dynamic elements is an important milestone in developing AI agents. In this work we formulate the navigation question as a reinforcement learning problem and show that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs. In particular we consider jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks. This approach can learn to navigate from raw sensory input in complicated 3D mazes, approaching human-level performance even under conditions where the goal location changes frequently. 
We provide detailed analysis of the agent behaviour, its ability to localise, and its network activity dynamics, showing that the agent implicitly learns key navigation abilities.\nDesigning effective exploration-exploitation algorithms in Markov decision processes (MDPs) with large state-action spaces is the main challenge in reinforcement learning (RL). In fact, the learning performance degrades with the number of states and actions in the MDP. However, MDPs often exhibit a low-dimensional latent structure in practice, where a small hidden state is observable through a possibly large number of observations. In this paper, we study the setting of rich-observation Markov decision processes (\\richmdp), where hidden states are mapped to observations through an injective mapping, so that an observation can be generated by only one hidden state. While this mapping is unknown a priori, we introduce a spectral decomposition method that consistently estimates how observations are clustered in the hidden states. The estimated clustering is then integrated into an optimistic algorithm for RL (UCRL), which operates on the smaller clustered space. The resulting algorithm proceeds through phases and we show that its per-step regret (i.e., the difference in cumulative reward between the algorithm and the optimal policy) decreases as more observations are clustered together and finally, matches the (ideal) performance of an RL algorithm running directly on the hidden MDP.\nCausal discovery studies the problem of mining causal relationships between variables from data, which is of primary interest in science. During the past decades, significant amount of progresses have been made toward this fundamental data mining paradigm. 
Recent years, as the availability of abundant large-sized and complex observational data, the constrain-based approaches have gradually attracted a lot of interest and have been widely applied to many diverse real-world problems due to the fast running speed and easy generalizing to the problem of causal insufficiency. In this paper, we aim to review the constraint-based causal discovery algorithms. Firstly, we discuss the learning paradigm of the constraint-based approaches. Secondly and primarily, the state-of-the-art constraint-based casual inference algorithms are surveyed with the detailed analysis. Thirdly, several related open-source software packages and benchmark data repositories are briefly summarized. As a conclusion, some open problems in constraint-based causal discovery are outlined for future research.\nWe propose a scalable approach to learn video-based question answering (QA): answer a \"free-form natural language question\" about a video content. Our approach automatically harvests a large number of videos and descriptions freely available online. Then, a large number of candidate QA pairs are automatically generated from descriptions rather than manually annotated. Next, we use these candidate QA pairs to train a number of video-based QA methods extended fromMN (Sukhbaatar et al. 2015), VQA (Antol et al. 2015), SA (Yao et al. 2015), SS (Venugopalan et al. 2015). In order to handle non-perfect candidate QA pairs, we propose a self-paced learning procedure to iteratively identify them and mitigate their effects in training. Finally, we evaluate performance on manually generated video-based QA pairs. 
The results show that our self-paced learning procedure is effective, and the extended SS model outperforms various baselines.\nThe budgeted information gathering problem - where a robot with a fixed fuel budget is required to maximize the amount of information gathered from the world - appears in practice across a wide range of applications in autonomous exploration and inspection with mobile robots. Although there is an extensive amount of prior work investigating effective approximations of the problem, these methods do not address the fact that their performance is heavily dependent on distribution of objects in the world. In this paper, we attempt to address this issue by proposing a novel data-driven imitation learning framework.   We present an efficient algorithm, EXPLORE, that trains a policy on the target distribution to imitate a clairvoyant oracle - an oracle that has full information about the world and computes non-myopic solutions to maximize information gathered. We validate the approach on a spectrum of results on a number of 2D and 3D exploration problems that demonstrates the ability of EXPLORE to adapt to different object distributions. Additionally, our analysis provides theoretical insight into the behavior of EXPLORE. Our approach paves the way forward for efficiently applying data-driven methods to the domain of information gathering.\nMeasuring research impact and ranking academic achievement are important and challenging problems. Having an objective picture of research institution is particularly valuable for students, parents and funding agencies, and also attracts attention from government and industry. KDD Cup 2016 proposes the paper acceptance rank prediction task, in which the participants are asked to rank the importance of institutions based on predicting how many of their papers will be accepted at the 8 top conferences in computer science. 
In our work, we adopt a three-step feature engineering method, including basic features definition, finding similar conferences to enhance the feature set, and dimension reduction using PCA. We propose three ranking models and the ensemble methods for combining such models. Our experiment verifies the effectiveness of our approach. In KDD Cup 2016, we achieved the overall rank of the 2nd place.\nSentiment analysis is crucial for extracting social signals from social media content. Due to the prevalence of images in social media, image sentiment analysis is receiving increasing attention in recent years. However, most existing systems are black-boxes that do not provide insight on how image content invokes sentiment and emotion in the viewers. Psychological studies have confirmed that salient objects in an image often invoke emotions. In this work, we investigate more fine-grained and more comprehensive interaction between visual saliency and visual sentiment. In particular, we partition images in several primary scene-type dimensions, including: open-closed, natural-manmade, indoor-outdoor, and face-noface. Using state of the art saliency detection algorithm and sentiment classification algorithm, we examine how the sentiment of the salient region(s) in an image relates to the overall sentiment of the image. The experiments on a representative image emotion dataset have shown interesting correlation between saliency and sentiment in different scene types and in turn shed light on the mechanism of visual sentiment evocation.\nRecent studies on knowledge base completion, the task of recovering missing facts based on observed facts, demonstrate the importance of learning embeddings from multi-step relations. Due to the size of knowledge bases, previous works manually design relation paths of observed triplets in symbolic space (e.g. random walk) to learn multi-step relations during training. 
However, these approaches suffer some limitations as most paths are not informative, and it is prohibitively expensive to consider all possible paths. To address the limitations, we propose learning to traverse in vector space directly without the need of symbolic space guidance. To remember the connections between related observed triplets and be able to adaptively change relation paths in vector space, we propose Implicit ReasoNets (IRNs), that is composed of a global memory and a controller module to learn multi-step relation paths in vector space and infer missing facts jointly without any human-designed procedure. Without using any axillary information, our proposed model achieves state-of-the-art results on popular knowledge base completion benchmarks.\nNetwork data mining has become an important area of study due to the large number of problems it can be applied to. This paper presents NOESIS, an open source framework for network data mining that provides a large collection of network analysis techniques, including the analysis of network structural properties, community detection methods, link scoring, and link prediction, as well as network visualization algorithms. It also features a complete stand-alone graphical user interface that facilitates the use of all these techniques. The NOESIS framework has been designed using solid object-oriented design principles and structured parallel programming. As a lightweight library with minimal external dependencies and a permissive software license, NOESIS can be incorporated into other software projects. Released under a BSD license, it is available from http://noesis.ikor.org.\nReal-time parking occupancy information is critical for a parking management system to facilitate drivers to park more efficiently. Recent advances in connected and automated vehicle technologies enable sensor-equipped cars (probe cars) to detect and broadcast available parking spaces when driving through parking lots. 
In this paper, we evaluate the impact of market penetration of probe cars on the system performance, and investigate different parking guidance policies to improve the data acquisition process. We adopt a simulation-based approach to impose four policies on an off- street parking lot influencing the behavior of probe cars to park in assigned parking spaces. This in turn effects the scanning route and the parking space occupancy estimations. The last policy we propose is a near-optimal guidance strategy that maximizes the information gain of posteriors. The results suggest that an efficient information gathering policy can compensate for low penetration of connected and automated vehicles. We also highlight the policy trade-off that occur while attempting to maximize information gain through explorations and improve assignment accuracy through exploitations. Our results can assist urban policy makers in designing and managing smart parking systems.\nAnswer Set Programming (ASP) is an expressive knowledge representation and reasoning framework. Due to its rather simple syntax paired with high-performance solvers, ASP is interesting for industrial applications. However, to err is human and thus debugging is an important activity during the development process. Therefore, tools for debugging non-ground answer set programs are needed. In this paper, we present a new graphical debugging interface for non-ground answer set programs. The tool is based on the recently-introduced DWASP approach for debugging and it simplifies the interaction with the debugger. Furthermore, the debugging interface is integrated in ASPIDE, a rich IDE for answer set programs. With our extension ASPIDE turns into a full-fledged IDE by offering debugging support.\nIn this work, a study on Variable Neighborhood Search algorithms for multi-depot dial-a-ride problems is presented. 
In dial-a-ride problems patients need to be transported from pre-specified pickup locations to pre-specified delivery locations, under different considerations. The addressed problem presents several constraints and features, such as heterogeneous vehicles, distributed in different depots, and heterogeneous patients. The aim is of minimizing the total routing cost, while respecting time-window, ride-time, capacity and route duration constraints. The objective of the study is of determining the best algorithm configuration in terms of initial solution, neighborhood and local search procedures. At this aim, two different procedures for the computation of an initial solution, six different type of neighborhoods and five local search procedures, where only intra-route changes are made, have been considered and compared.   We have also evaluated an \"adjusting procedure\" that aims to produce feasible solutions from infeasible solutions with small constraints violations. The different VNS algorithms have been tested on instances from literature as well as on random instances arising from a real-world healthcare application.\nThe CDCL algorithm is the leading solution adopted by state-of-the-art solvers for SAT, SMT, ASP, and others. Experiments show that the performance of CDCL solvers can be significantly boosted by embedding domain-specific heuristics, especially on large real-world problems. However, a proper integration of such criteria in off-the-shelf CDCL implementations is not obvious. In this paper, we distill the key ingredients that drive the search of CDCL solvers, and propose a general framework for designing and implementing new heuristics. We implemented our strategy in an ASP solver, and we experimented on two industrial domains. 
On hard problem instances, state-of-the-art implementations fail to find any solution in acceptable time, whereas our implementation is very successful and finds all solutions.\nJust recently, the concept of augmented and virtual reality (AR/VR) over wireless has taken the entire 5G ecosystem by storm spurring an unprecedented interest from both academia, industry and others. Yet, the success of an immersive VR experience hinges on solving a plethora of grand challenges cutting across multiple disciplines. This article underscores the importance of VR technology as a disruptive use case of 5G (and beyond) harnessing the latest development of storage/memory, fog/edge computing, computer vision, artificial intelligence and others. In particular, the main requirements of wireless interconnected VR are described followed by a selection of key enablers, then, research avenues and their underlying grand challenges are presented. Furthermore, we examine three VR case studies and provide numerical results under various storage, computing and network configurations. Finally, this article exposes the limitations of current networks and makes the case for more theory, and innovations to spearhead VR for the masses.\nWith the large volume of new information created every day, determining the validity of information in a knowledge graph and filling in its missing parts are crucial tasks for many researchers and practitioners. To address this challenge, a number of knowledge graph completion methods have been developed using low-dimensional graph embeddings. Although researchers continue to improve these models using an increasingly complex feature space, we show that simple changes in the architecture of the underlying model can outperform state-of-the-art models without the need for complex feature engineering. 
In this work, we present a shared variable neural network model called ProjE that fills-in missing information in a knowledge graph by learning joint embeddings of the knowledge graph's entities and edges, and through subtle, but important, changes to the standard loss function. In doing so, ProjE has a parameter size that is smaller than 11 out of 15 existing methods while performing $37\\%$ better than the current-best method on standard datasets. We also show, via a new fact checking task, that ProjE is capable of accurately determining the veracity of many declarative statements.\nPart of the appeal of Visual Question Answering (VQA) is its promise to answer new questions about previously unseen images. Most current methods demand training questions that illustrate every possible concept, and will therefore never achieve this capability, since the volume of required training data would be prohibitive. Answering general questions about images requires methods capable of Zero-Shot VQA, that is, methods able to answer questions beyond the scope of the training questions. We propose a new evaluation protocol for VQA methods which measures their ability to perform Zero-Shot VQA, and in doing so highlights significant practical deficiencies of current approaches, some of which are masked by the biases in current datasets. We propose and evaluate several strategies for achieving Zero-Shot VQA, including methods based on pretrained word embeddings, object classifiers with semantic embeddings, and test-time retrieval of example images. Our extensive experiments are intended to serve as baselines for Zero-Shot VQA, and they also achieve state-of-the-art performance in the standard VQA evaluation setting.\nWhen a processing unit relies on data from external streams, we may face the problem that the stream data needs to be rearranged in a way that allows the unit to perform its task(s). 
On arrival of new data, we must decide whether there is sufficient information available to start processing or whether to wait for more data. Furthermore, we need to ensure that the data meets the input specification of the processing step. In the case of multiple input streams it is also necessary to coordinate which data from which incoming stream should form the input of the next process instantiation. In this work, we propose a declarative approach as an interface between multiple streams and a processing unit. The idea is to specify via answer-set programming how to arrange incoming data in packages that are suitable as input for subsequent processing. Our approach is intended for use in asynchronous multi-context systems (aMCSs), a recently proposed framework for loose coupling of knowledge representation formalisms that allows for online reasoning in a dynamic environment. Contexts in aMCSs process data streams from external sources and other contexts.\nThe current trend in object detection and localization is to learn predictions with high capacity deep neural networks trained on a very large amount of annotated data and using a high amount of processing power. In this work, we propose a new neural model which directly predicts bounding box coordinates. The particularity of our contribution lies in the local computations of predictions with a new form of local parameter sharing which keeps the overall amount of trainable parameters low. Key components of the model are spatial 2D-LSTM recurrent layers which convey contextual information between the regions of the image. We show that this model is more powerful than the state of the art in applications where training data is not as abundant as in the classical configuration of natural images and Imagenet/Pascal VOC tasks. We particularly target the detection of text in document images, but our method is not limited to this setting. 
The proposed model also facilitates the detection of many objects in a single image and can deal with inputs of variable sizes without resizing.\nFeature subspace selection is an important part in speech emotion recognition. Most of the studies are devoted to finding a feature subspace for representing all emotions. However, some studies have indicated that the features associated with different emotions are not exactly the same. Hence, traditional methods may fail to distinguish some of the emotions with just one global feature subspace. In this work, we propose a new divide and conquer idea to solve the problem. First, the feature subspaces are constructed for all the combinations of every two different emotions (emotion-pair). Bi-classifiers are then trained on these feature subspaces respectively. The final emotion recognition result is derived by the voting and competition method. Experimental results demonstrate that the proposed method can get better results than the traditional multi-classification method.\nWe introduce two novel non-parametric statistical hypothesis tests. The first test, called the relative test of dependency, enables us to determine whether one source variable is significantly more dependent on a first target variable or a second. Dependence is measured via the Hilbert-Schmidt Independence Criterion (HSIC). The second test, called the relative test of similarity, is use to determine which of the two samples from arbitrary distributions is significantly closer to a reference sample of interest and the relative measure of similarity is based on the Maximum Mean Discrepancy (MMD). To construct these tests, we have used as our test statistics the difference of HSIC statistics and of MMD statistics, respectively. The resulting tests are consistent and unbiased, and have favorable convergence properties. 
The effectiveness of the relative dependency test is demonstrated on several real-world problems: we identify languages groups from a multilingual parallel corpus, and we show that tumor location is more dependent on gene expression than chromosome imbalance. We also demonstrate the performance of the relative test of similarity over a broad selection of model comparisons problems in deep generative models.\nAt the core of interpretable machine learning is the question of whether humans are able to make accurate predictions about a model's behavior. Assumed in this question are three properties of the interpretable output: coverage, precision, and effort. Coverage refers to how often humans think they can predict the model's behavior, precision to how accurate humans are in those predictions, and effort is either the up-front effort required in interpreting the model, or the effort required to make predictions about a model's behavior.   In this work, we propose anchor-LIME (aLIME), a model-agnostic technique that produces high-precision rule-based explanations for which the coverage boundaries are very clear. We compare aLIME to linear LIME with simulated experiments, and demonstrate the flexibility of aLIME with qualitative examples from a variety of domains and tasks.\nTraining deep neural networks for solving machine learning problems is one great challenge in the field, mainly due to its associated optimisation problem being highly non-convex. Recent developments have suggested that many training algorithms do not suffer from undesired local minima under certain scenario, and consequently led to great efforts in pursuing mathematical explanations for such observations. This work provides an alternative mathematical understanding of the challenge from a smooth optimisation perspective. By assuming exact learning of finite samples, sufficient conditions are identified via a critical point analysis to ensure any local minimum to be globally minimal as well. 
Furthermore, a state-of-the-art algorithm, known as the Generalised Gauss-Newton (GGN) algorithm, is rigorously revisited as an approximate Newton's algorithm, which shares the property of being locally quadratically convergent to a global minimum under the condition of exact learning.\nWe study the task of teaching a machine to classify objects using features and labels. We introduce the Error-Driven-Featuring design pattern for teaching using features and labels in which a teacher prefers to introduce features only if they are needed. We analyze the potential risks and benefits of this teaching pattern through the use of teaching protocols, illustrative examples, and by providing bounds on the effort required for an optimal machine teacher using a linear learning algorithm, the most commonly used type of learner in interactive machine learning systems. Our analysis provides a deeper understanding of potential trade-offs of using different learning algorithms and between the effort required for featuring (creating new features) and labeling (providing labels for objects).\nIn this paper we present an algorithm to build a road network map enriched with traffic rules such as one-way streets and forbidden turns, based on the interpretation of already detected and classified traffic signs. Such an algorithm helps automate the creation of maps for commercial navigation systems. Our solution is based on simulating navigation along the road network, determining at each point of interest the visibility of the signs and their effect on the roads. We test our approach in a small urban network and discuss various ways to generalize it to support more complex environments.\nThe Team-maxmin equilibrium prescribes the optimal strategies for a team of rational players sharing the same goal and without the capability of correlating their strategies in strategic games against an adversary. 
This solution concept can capture situations in which an agent controls multiple resources, corresponding to the team members, that cannot communicate. It is known that such an equilibrium always exists and is unique (barring degeneracy), and these properties make it a credible solution concept to be used in real-world applications, especially in security scenarios. Nevertheless, to the best of our knowledge, the Team-maxmin equilibrium is almost completely unexplored in the literature. In this paper, we investigate bounds of (in)efficiency of the Team-maxmin equilibrium w.r.t. the Nash equilibria and w.r.t. the Maxmin equilibrium when the team members can play correlated strategies. Furthermore, we study a number of algorithms to find and/or approximate an equilibrium, discussing their theoretical guarantees and evaluating their performance by using a standard testbed of game instances.\nRecurrent neural networks (RNNs) have been used extensively and with increasing success to model various types of sequential data. Much of this progress has been achieved through devising recurrent units and architectures with the flexibility to capture complex statistics in the data, such as long-range dependency or localized attention phenomena. However, while many types of sequential data (such as video, speech or language) can have highly variable information flow, most recurrent models still consume input features at a constant rate and perform a constant number of computations per time step, which can be detrimental to both speed and model capacity. In this paper, we explore a modification to existing recurrent units which allows them to learn to vary the amount of computation they perform at each step, without prior knowledge of the sequence's time structure. We show experimentally that not only do our models require fewer operations, they also lead to better performance overall on evaluation tasks.\nIn this paper we introduce a model of lifelong learning, based on a Network of Experts. 
New tasks / experts are learned and added to the model sequentially, building on what was learned before. To ensure scalability of this process, data from previous tasks cannot be stored and hence is not available when learning a new task. A critical issue in such a context, not addressed in the literature so far, relates to the decision of which expert to deploy at test time. We introduce a set of gating autoencoders that learn a representation for the task at hand, and, at test time, automatically forward the test sample to the relevant expert. This also brings memory efficiency as only one expert network has to be loaded into memory at any given time. Further, the autoencoders inherently capture the relatedness of one task to another, based on which the most relevant prior model to be used for training a new expert, with finetuning or learning without forgetting, can be selected. We evaluate our method on image classification and video prediction problems.\nResearchers have recently started investigating deep neural networks for dialogue applications. In particular, generative sequence-to-sequence (Seq2Seq) models have shown promising results for unstructured tasks, such as word-level dialogue response generation. The hope is that such models will be able to leverage massive amounts of data to learn meaningful natural language representations and response generation strategies, while requiring a minimum amount of domain knowledge and hand-crafting. An important challenge is to develop models that can effectively incorporate dialogue context and generate meaningful and diverse responses. 
In support of this goal, we review recently proposed models based on generative encoder-decoder neural network architectures, and show that these models have a better ability to incorporate long-term dialogue history, to model uncertainty and ambiguity in dialogue, and to generate responses with high-level compositional structure.\nNew developments in HPC technology in terms of increasing computing power on multi/many core processors, high-bandwidth memory/IO subsystems and communication interconnects have a direct impact on software and runtime system development. These advancements have become useful in producing high-performance collective communication interfaces that integrate efficiently on a wide variety of platforms and environments. However, the number of optimization options that show up with each new technology or software framework has resulted in a \\emph{combinatorial explosion} in feature space for tuning collective parameters such that finding the optimal set has become a nearly impossible task. The applicability of algorithmic choices available for optimizing collective communication depends largely on the scalability requirement for a particular use case. This problem can be further exacerbated by any requirement to run collective problems at very large scales such as in the case of exascale computing, at which impractical tuning by brute force may require many months of resources. Therefore, the application of statistical, data mining and artificial intelligence or more general hybrid learning models seems essential in many collective parameter optimization problems. We hope to explore current and cutting-edge collective communication optimization and tuning methods and conclude with possible future directions towards this problem.\nGenerative Adversarial Networks (GANs) have recently been shown to successfully approximate complex data distributions. 
A relevant extension of this model is conditional GANs (cGANs), where the introduction of external information allows one to determine specific representations of the generated images. In this work, we evaluate encoders to invert the mapping of a cGAN, i.e., mapping a real image into a latent space and a conditional representation. This allows, for example, reconstructing and modifying real images of faces conditioned on arbitrary attributes. Additionally, we evaluate the design of cGANs. The combination of an encoder with a cGAN, which we call Invertible cGAN (IcGAN), makes it possible to re-generate real images with deterministic complex modifications.\nIt is critical for advanced manufacturing machines to autonomously execute a task by following an end-user's natural language (NL) instructions. However, NL instructions are usually ambiguous and abstract, so the machines may misunderstand and incorrectly execute the task. To address this NL-based human-machine communication problem and enable the machines to appropriately execute tasks by following the end-user's NL instructions, we developed a Machine-Executable-Plan-Generation (exePlan) method. The exePlan method conducts task-centered semantic analysis to extract task-related information from ambiguous NL instructions. In addition, the method specifies machine execution parameters to generate a machine-executable plan by interpreting abstract NL instructions. To evaluate the exePlan method, an industrial robot Baxter was instructed by NL to perform three types of industrial tasks {'drill a hole', 'clean a spot', 'install a screw'}. The experimental results showed that the exePlan method was effective in generating machine-executable plans from the end-user's NL instructions. Such a method has the promise to endow a machine with the ability of NL-instructed task execution.\nWe propose a novel deep network architecture for grayscale and color image denoising that is based on a non-local image model. 
Our motivation for the overall design of the proposed network stems from variational methods that exploit the inherent non-local self-similarity property of natural images. We build on this concept and introduce deep networks that perform non-local processing and at the same time significantly benefit from discriminative learning. Experiments on the Berkeley segmentation dataset, comparing several state-of-the-art methods, show that the proposed non-local models achieve the best reported denoising performance both for grayscale and color images for all the tested noise levels. It is also worth noting that this increase in performance comes at no extra cost in the capacity of the network compared to existing alternative deep network architectures. In addition, we highlight a direct link of the proposed non-local models to convolutional neural networks. This connection is of significant importance since it allows our models to take full advantage of the latest advances in GPU computing in deep learning and makes them amenable to efficient implementations through their inherent parallelism.\nMany prediction problems can be phrased as inferences over local neighborhoods of graphs. The graph represents the interaction between entities, and the neighborhood of each entity contains information that allows the inferences or predictions. We present an approach for applying machine learning directly to such graph neighborhoods, yielding predictions for graph nodes on the basis of the structure of their local neighborhood and the features of the nodes in it. Our approach allows predictions to be learned directly from examples, bypassing the step of creating and tuning an inference model or summarizing the neighborhoods via a fixed set of hand-crafted features. The approach is based on a multi-level architecture built from Long Short-Term Memory neural nets (LSTMs); the LSTMs learn how to summarize the neighborhood from data. 
We demonstrate the effectiveness of the proposed technique on a synthetic example and on real-world data related to crowdsourced grading, Bitcoin transactions, and Wikipedia edit reversions.\nWe propose a new method to study the internal memory used by reinforcement learning policies. We estimate the amount of relevant past information by estimating mutual information between behavior histories and the current action of an agent. We perform this estimation in the passive setting, that is, we do not intervene but merely observe the natural behavior of the agent. Moreover, we provide a theoretical justification for our approach by showing that it yields an implementation-independent lower bound on the minimal memory capacity of any agent that implements the observed policy. We demonstrate our approach by estimating the use of memory of DQN policies on concatenated Atari frames, demonstrating sharply different use of memory across 49 games. The study of memory as information that flows from the past to the current action opens avenues to understand and improve successful reinforcement learning algorithms.\nEntity resolution (ER) is about identifying and merging records in a database that represent the same real-world entity. Matching dependencies (MDs) have been introduced and investigated as declarative rules that specify ER policies. An ER process induced by MDs over a dirty instance leads to multiple clean instances, in general. General \"answer set programs\" have been proposed to specify the MD-based cleaning task and its results. In this work, we extend MDs to \"relational MDs\", which capture more application semantics, and identify classes of relational MDs for which the general ASP can be automatically rewritten into a stratified Datalog program, with the single clean instance as its standard model.\nWe propose a higher-level associative memory for learning adversarial networks. 
The generative adversarial network (GAN) framework has a discriminator and a generator network. The generator (G) maps white noise (z) to data samples while the discriminator (D) maps data samples to a single scalar. To do so, G learns how to map from high-level representation space to data space, and D learns to do the opposite. We argue that higher-level representation spaces need not necessarily follow a uniform probability distribution. In this work, we use Restricted Boltzmann Machines (RBMs) as a higher-level associative memory and learn the probability distribution for the high-level features generated by D. The associative memory samples its underlying probability distribution and G learns how to map these samples to data space. The proposed associative adversarial networks (AANs) are generative models at the higher levels of learning, and use adversarial non-stochastic models D and G for learning the mapping between data and higher-level representation spaces. Experiments show the potential of the proposed networks.\nWe model coherent conversation continuation via RNN-based dialogue models equipped with a dynamic attention mechanism. Our attention-RNN language model dynamically increases the scope of attention on the history as the conversation continues, as opposed to standard attention (or alignment) models with a fixed input scope in a sequence-to-sequence model. This allows each generated word to be associated with the most relevant words in its corresponding conversation history. We evaluate the model on two popular dialogue datasets, the open-domain MovieTriples dataset and the closed-domain Ubuntu Troubleshoot dataset, and achieve significant improvements over the state-of-the-art and baselines on several metrics, including complementary diversity-based metrics, human evaluation, and qualitative visualizations. 
We also show that a vanilla RNN with dynamic attention outperforms more complex memory models (e.g., LSTM and GRU) by allowing for flexible, long-distance memory. We promote further coherence via topic modeling-based reranking.\nSurvival analysis is a fundamental tool in medical research to identify predictors of adverse events and develop systems for clinical decision support. In order to leverage large amounts of patient data, efficient optimisation routines are paramount. We propose an efficient training algorithm for the kernel survival support vector machine (SSVM). We directly optimise the primal objective function and employ truncated Newton optimisation and order statistic trees to significantly lower computational costs compared to previous training algorithms, which require $O(n^4)$ space and $O(p n^6)$ time for datasets with $n$ samples and $p$ features. Our results demonstrate that our proposed optimisation scheme allows analysing data of a much larger scale with no loss in prediction performance. Experiments on synthetic and 5 real-world datasets show that our technique outperforms existing kernel SSVM formulations if the amount of right censoring is high ($\\geq85\\%$), and performs comparably otherwise.\nAutomaton models are often seen as interpretable models. Interpretability itself is not well defined: it remains unclear what interpretability means without first explicitly specifying objectives or desired attributes. In this paper, we identify the key properties used to interpret automata and propose a modification of a state-merging approach to learn variants of finite state automata. We apply the approach to problems beyond typical grammar inference tasks. 
Additionally, we cover several use-cases for prediction, classification, and clustering on sequential data in both supervised and unsupervised scenarios to show how the identified key properties are applicable in a wide range of contexts.\nLossy image compression algorithms are pervasively used to reduce the size of images transmitted over the web and recorded on data storage media. However, we pay for their high compression rate with visual artifacts degrading the user experience. Deep convolutional neural networks have become a widespread tool to address high-level computer vision tasks very successfully. Recently, they have found their way into the areas of low-level computer vision and image processing to solve regression problems mostly with relatively shallow networks.   We present a novel 12-layer deep convolutional network for image compression artifact suppression with hierarchical skip connections and a multi-scale loss function. We achieve a boost of up to 1.79 dB in PSNR over ordinary JPEG and an improvement of up to 0.36 dB over the best previous ConvNet result. We show that a network trained for a specific quality factor (QF) is resilient to the QF used to compress the input image - a single network trained for QF 60 provides a PSNR gain of more than 1.5 dB over the wide QF range from 40 to 76.\nUnderstanding why a model made a certain prediction is crucial in many data science fields. Interpretable predictions engender appropriate trust and provide insight into how the model may be improved. However, with large modern datasets the best accuracy is often achieved by complex models even experts struggle to interpret, which creates a tension between accuracy and interpretability. Recently, several methods have been proposed for interpreting predictions from complex models by estimating the importance of input features. Here, we present how a model-agnostic additive representation of the importance of input features unifies current methods. 
This representation is optimal, in the sense that it is the only set of additive values that satisfies important properties. We show how we can leverage these properties to create novel visual explanations of model predictions. The thread of unity that this representation weaves through the literature indicates that there are common principles to be learned about the interpretation of model predictions that apply in many scenarios.\nIn this paper we introduce a new unsupervised reinforcement learning method for discovering the set of intrinsic options available to an agent. This set is learned by maximizing the number of different states an agent can reliably reach, as measured by the mutual information between the set of options and option termination states. To this end, we instantiate two policy-gradient-based algorithms, one that creates an explicit embedding space of options and one that represents options implicitly. The algorithms also provide an explicit measure of empowerment in a given state that can be used by an empowerment-maximizing agent. The algorithm scales well with function approximation and we demonstrate the applicability of the algorithm on a range of tasks.\nComplex problems may require sophisticated, non-linear learning methods such as kernel machines or deep neural networks to achieve state-of-the-art prediction accuracies. However, high prediction accuracies are not the only objective to consider when solving problems using machine learning. Instead, particular scientific applications require some explanation of the learned prediction function. Unfortunately, most methods do not come with an out-of-the-box, straightforward interpretation. Even linear prediction functions are not straightforward to explain if features exhibit complex correlation structure.   In this paper, we propose the Measure of Feature Importance (MFI). MFI is general and can be applied to any arbitrary learning machine (including kernel machines and deep learning). 
MFI is intrinsically non-linear and can detect features that by themselves are inconspicuous and only impact the prediction function through their interaction with other features. Lastly, MFI can be used for both model-based feature importance and instance-based feature importance (i.e., measuring the importance of a feature for a particular data point).\nRecent work in model-agnostic explanations of black-box machine learning has demonstrated that interpretability of complex models does not have to come at the cost of accuracy or model flexibility. However, it is not clear what kind of explanations, such as linear models, decision trees, and rule lists, are the appropriate family to consider, and different tasks and models may benefit from different kinds of explanations. Instead of picking a single family of representations, in this work we propose to use \"programs\" as model-agnostic explanations. We show that small programs can be expressive yet intuitive as explanations, and generalize over a number of existing interpretable families. We propose a prototype program induction method based on simulated annealing that approximates the local behavior of black-box classifiers around a specific prediction using random perturbations. Finally, we present preliminary applications on small datasets and show that the generated explanations are intuitive and accurate for a number of classifiers.\nA major open problem on the road to artificial intelligence is the development of incrementally learning systems that learn about more and more concepts over time from a stream of data. In this work, we introduce a new training strategy, iCaRL, that allows learning in such a class-incremental way: only the training data for a small number of classes has to be present at the same time and new classes can be added progressively. iCaRL learns strong classifiers and a data representation simultaneously. 
This distinguishes it from earlier works that were fundamentally limited to fixed data representations and therefore incompatible with deep learning architectures. We show by experiments on CIFAR-100 and ImageNet ILSVRC 2012 data that iCaRL can learn many classes incrementally over a long period of time where other strategies quickly fail.\nMean Field inference is central to statistical physics. It has attracted much interest in the Computer Vision community to efficiently solve problems expressible in terms of large Conditional Random Fields. However, since it models the posterior probability distribution as a product of marginal probabilities, it may fail to properly account for important dependencies between variables. We therefore replace the fully factorized distribution of Mean Field by a weighted mixture of such distributions, that similarly minimizes the KL-Divergence to the true posterior. By introducing two new ideas, namely, conditioning on groups of variables instead of single ones and using a parameter of the conditional random field potentials, which we identify with the temperature in the sense of statistical physics, to select such groups, we can perform this minimization efficiently. Our extension of the clamping method proposed in previous works allows us to both produce a more descriptive approximation of the true posterior and, inspired by the diverse MAP paradigms, fit a mixture of Mean Field approximations. We demonstrate that this positively impacts real-world algorithms that initially relied on mean fields.\nWe consider an orienteering problem (OP) where an agent needs to visit a series (possibly a subset) of depots, from which the maximal accumulated profits are desired within a given limited time budget. Unlike most existing works where the profits are assumed to be static, in this work we investigate a variant that has arbitrary time-dependent profits. 
Specifically, the profits to be collected change over time and they follow different (e.g., independent) time-varying functions. The problem is inherently nonlinear and difficult to solve with existing methods. To tackle the challenge, we present a simple and effective framework that incorporates time-variations into the fundamental planning process. Specifically, we propose a deterministic spatio-temporal representation where both spatial description and temporal logic are unified into one routing topology. By employing existing basic sorting and searching algorithms, the routing solutions can be computed in an extremely efficient way. The proposed method is easy to implement and extensive numerical results show that our approach is time-efficient and generates near-optimal solutions.\nThis work presents a multiscale framework to solve an inverse reinforcement learning (IRL) problem for continuous-time/state stochastic systems. We take advantage of a diffusion wavelet representation of the associated Markov chain to abstract the state space. This not only allows for effectively handling the large (and geometrically complex) decision space but also provides more interpretable representations of the demonstrated state trajectories and also of the resulting policy of IRL. In the proposed framework, the problem is divided into the global and local IRL, where the global approximation of the optimal value functions is obtained using coarse features and the local details are quantified using fine local features. An illustrative numerical example on robot path control in a complex environment is presented to verify the proposed method.\nWe study the problem of troubleshooting machine learning systems that rely on analytical pipelines of distinct components. Understanding and fixing errors that arise in such integrative systems is difficult as failures can occur at multiple points in the execution workflow. 
Moreover, errors can propagate, become amplified or be suppressed, making blame assignment difficult. We propose a human-in-the-loop methodology which leverages human intellect for troubleshooting system failures. The approach simulates potential component fixes through human computation tasks and measures the expected improvements in the holistic behavior of the system. The method provides guidance to designers about how they can best improve the system. We demonstrate the effectiveness of the approach on an automated image captioning system that has been pressed into real-world use.\nWe introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines of the introduced tasks.\nThis paper explores a novel way of analyzing tournament structures to find the most suitable one for the tournament under consideration. It considers three aspects: tournament conducting cost, competitiveness development and ranking precision. It then proposes a new method using a progress tree to detect potential throwaway matches. The analysis performed using the proposed method reveals the strengths and weaknesses of tournament structures. In conclusion, single elimination is best if we want to qualify only one winner, and all matches conducted are exciting in terms of competitiveness. 
Double elimination with a proper seeding system is a better choice if we want to qualify more winners. A reasonable number of extra matches needs to be conducted in exchange for being able to qualify the top four winners. Round-robin gives reliable ranking precision for all participants. However, its conducting cost is very high, and it fails to maintain competitiveness development.\nNeutrosophic theory and applications have been expanding in all directions at an astonishing rate, especially after the introduction of the journal entitled Neutrosophic Sets and Systems. New theories, techniques, and algorithms have been rapidly developed. One of the most striking trends in the neutrosophic theory is the hybridization of neutrosophic set with other potential sets such as rough set, bipolar set, soft set, hesitant fuzzy set, etc. Different hybrid structures, such as rough neutrosophic set, single valued neutrosophic rough set, bipolar neutrosophic set, single valued neutrosophic hesitant fuzzy set, etc., have been proposed in the literature in a short period of time. Neutrosophic set has been a very important tool in various areas of data mining, decision making, e-learning, engineering, medicine, social science, and more. The book New Trends in Neutrosophic Theories and Applications focuses on theories, methods, algorithms for decision making and also applications involving neutrosophic information. Some topics deal with data mining, decision making, e-learning, graph theory, medical diagnosis, probability theory, topology, and more.\nConstrained Local Models (CLMs) are a well-established family of methods for facial landmark detection. However, they have recently fallen out of favor relative to cascaded regression-based approaches. This is in part due to the inability of existing CLM local detectors to model the very complex individual landmark appearance that is affected by expression, illumination, facial hair, makeup, and accessories. 
In our work, we present a novel local detector, the Convolutional Experts Network (CEN), that brings together the advantages of neural architectures and mixtures of experts in an end-to-end framework. We further propose a Convolutional Experts Constrained Local Model (CE-CLM) algorithm that uses CEN as local detectors. We demonstrate that our proposed CE-CLM algorithm outperforms competitive state-of-the-art baselines for facial landmark detection by a large margin on four publicly-available datasets. Our approach is especially accurate and robust on challenging profile images.\nTraining robots to perceive, act and communicate using multiple modalities still represents a challenging problem, particularly if robots are expected to learn efficiently from small sets of example interactions. We describe a learning approach as a step in this direction, where we teach a humanoid robot how to play the game of noughts and crosses. Given that multiple multimodal skills can be trained to play this game, we focus our attention on training the robot to perceive the game, and to interact in this game. Our multimodal deep reinforcement learning agent perceives multimodal features and exhibits verbal and non-verbal actions while playing. Experimental results using simulations show that the robot can learn to win or draw up to 98% of the games. A pilot test of the proposed multimodal system for the targeted game, integrating speech, vision and gestures, reports that reasonable and fluent interactions can be achieved using the proposed approach.\nInventing targeted proof search strategies for specific problem sets is a difficult task. State-of-the-art automated theorem provers (ATPs) such as E allow a large number of user-specified proof search strategies described in a rich domain-specific language. Several machine learning methods that invent strategies automatically for ATPs were proposed previously. 
One of them is the Blind Strategymaker (BliStr), a system for automated invention of ATP strategies.   In this paper we introduce BliStrTune -- a hierarchical extension of BliStr. BliStrTune allows exploring much larger space of E strategies by interleaving search for high-level parameters with their fine-tuning. We use BliStrTune to invent new strategies based also on new clause weight functions targeted at problems from large ITP libraries. We show that the new strategies significantly improve E's performance in solving problems from the Mizar Mathematical Library.\nRandom embedding has been applied with empirical success to large-scale black-box optimization problems with low effective dimensions. This paper proposes the EmbeddedHunter algorithm, which incorporates the technique in a hierarchical stochastic bandit setting, following the optimism in the face of uncertainty principle and breaking away from the multiple-run framework in which random embedding has been conventionally applied similar to stochastic black-box optimization solvers. Our proposition is motivated by the bounded mean variation in the objective value for a low-dimensional point projected randomly into the decision space of Lipschitz-continuous problems. In essence, the EmbeddedHunter algorithm expands optimistically a partitioning tree over a low-dimensional---equal to the effective dimension of the problem---search space based on a bounded number of random embeddings of sampled points from the low-dimensional space. In contrast to the probabilistic theoretical guarantees of multiple-run random-embedding algorithms, the finite-time analysis of the proposed algorithm presents a theoretical upper bound on the regret as a function of the algorithm's number of iterations. Furthermore, numerical experiments were conducted to validate its performance. 
The results show a clear performance gain over recently proposed random embedding methods for large-scale problems, provided the intrinsic dimensionality is low.\nAutonomous driving is one of the most recent topics of interest which is aimed at replicating human driving behavior keeping in mind the safety issues. We approach the problem of learning synthetic driving using generative neural networks. The main idea is to make a controller trainer network using images plus key press data to mimic human learning. We used the architecture of a stable GAN to make predictions between driving scenes using key presses. We train our model on one video game (Road Rash) and tested the accuracy and compared it by running the model on other maps in Road Rash to determine the extent of learning.\nBoundary estimation in images and videos has been a very active topic of research, and organizing visual information into boundaries and segments is believed to be a corner stone of visual perception. While prior work has focused on estimating boundaries for observed frames, our work aims at predicting boundaries of future unobserved frames. This requires our model to learn about the fate of boundaries and corresponding motion patterns -- including a notion of \"intuitive physics\". We experiment on natural video sequences along with synthetic sequences with deterministic physics-based and agent-based motions. While not being our primary goal, we also show that fusion of RGB and boundary prediction leads to improved RGB predictions.\nWe introduce the BIN_COUNTS constraint, which deals with the problem of counting the number of decision variables in a set which are assigned values that lie in given bins. We illustrate a decomposition and a filtering algorithm that achieves generalised arc consistency. We contrast the filtering power of these two approaches and we discuss a number of applications. 
We show that BIN_COUNTS can be employed to develop a decomposition for the $\\chi^2$ test constraint, a new statistical constraint that we introduce in this work. We also show how this new constraint can be employed in the context of the Balanced Academic Curriculum Problem and of the Balanced Nursing Workload Problem. For both these problems we carry out numerical studies involving our reformulations. Finally, we present a further application of the $\\chi^2$ test constraint in the context of confidence interval analysis.\nThis paper addresses the task of set prediction using deep learning. This is important because the output of many computer vision tasks, including image tagging and object detection, are naturally expressed as sets of entities rather than vectors. As opposed to a vector, the size of a set is not fixed in advance, and it is invariant to the ordering of entities within it. We define a likelihood for a set distribution and learn its parameters using a deep neural network. We also derive a loss for predicting a discrete distribution corresponding to set cardinality. Set prediction is demonstrated on the problem of multi-class image classification. Moreover, we show that the proposed cardinality loss can also trivially be applied to the tasks of object counting and pedestrian detection. Our approach outperforms existing methods in all three cases on standard datasets.\nThis paper presents a novel form of policy gradient for model-free reinforcement learning (RL) with improved exploration properties. Current policy-based methods use entropy regularization to encourage undirected exploration of the reward landscape, which is ineffective in high dimensional spaces with sparse rewards. We propose a more directed exploration strategy that promotes exploration of under-appreciated reward regions. An action sequence is considered under-appreciated if its log-probability under the current policy under-estimates its resulting reward. 
The proposed exploration strategy is easy to implement, requiring small modifications to an implementation of the REINFORCE algorithm. We evaluate the approach on a set of algorithmic tasks that have long challenged RL methods. Our approach reduces hyper-parameter sensitivity and demonstrates significant improvements over baseline methods. Our algorithm successfully solves a benchmark multi-digit addition task and generalizes to long sequences. This is, to our knowledge, the first time that a pure RL method has solved addition using only reward feedback.\nWe describe a neural attention model with a learnable retinal sampling lattice. The model is trained on a visual search task requiring the classification of an object embedded in a visual scene amidst background distractors using the smallest number of fixations. We explore the tiling properties that emerge in the model's retinal sampling lattice after training. Specifically, we show that this lattice resembles the eccentricity dependent sampling lattice of the primate retina, with a high resolution region in the fovea surrounded by a low resolution periphery. Furthermore, we find conditions where these emergent properties are amplified or eliminated providing clues to their function.\nThere exist many problem domains where the interpretability of neural network models is essential for deployment. Here we introduce a recurrent architecture composed of input-switched affine transformations - in other words an RNN without any explicit nonlinearities, but with input-dependent recurrent weights. This simple form allows the RNN to be analyzed via straightforward linear methods: we can exactly characterize the linear contribution of each input to the model predictions; we can use a change-of-basis to disentangle input, output, and computational hidden unit subspaces; we can fully reverse-engineer the architecture's solution to a simple task. 
Despite this ease of interpretation, the input switched affine network achieves reasonable performance on a text modeling tasks, and allows greater computational efficiency than networks with standard nonlinearities.\nWe consider the problem of maximizing a non-monotone DR-submodular function subject to a cardinality constraint. Diminishing returns (DR) submodularity is a generalization of the diminishing returns property for functions defined over the integer lattice. This generalization can be used to solve many machine learning or combinatorial optimization problems such as optimal budget allocation, revenue maximization, etc. In this work we propose the first polynomial-time approximation algorithms for non-monotone constrained maximization. We implement our algorithms for a revenue maximization problem with a real-world dataset to check their efficiency and performance.\nDesigning appropriate features for acoustic event recognition tasks is an active field of research. Expressive features should both improve the performance of the tasks and also be interpret-able. Currently, heuristically designed features based on the domain knowledge requires tremendous effort in hand-crafting, while features extracted through deep network are difficult for human to interpret. In this work, we explore the experience guided learning method for designing acoustic features. This is a novel hybrid approach combining both domain knowledge and purely data driven feature designing. Based on the procedure of log Mel-filter banks, we design a filter bank learning layer. We concatenate this layer with a convolutional neural network (CNN) model. After training the network, the weight of the filter bank learning layer is extracted to facilitate the design of acoustic features. We smooth the trained weight of the learning layer and re-initialize it in filter bank learning layer as audio feature extractor. 
For the environmental sound recognition task based on the Urban- sound8K dataset, the experience guided learning leads to a 2% accuracy improvement compared with the fixed feature extractors (the log Mel-filter bank). The shape of the new filter banks are visualized and explained to prove the effectiveness of the feature design process.\nThis paper investigates the operation of a hybrid power system through a novel fuzzy control scheme. The hybrid power system employs various autonomous generation systems like wind turbine, solar photovoltaic, diesel engine, fuel-cell, aqua electrolyzer etc. Other energy storage devices like the battery, flywheel and ultra-capacitor are also present in the network. A novel fractional order (FO) fuzzy control scheme is employed and its parameters are tuned with a particle swarm optimization (PSO) algorithm augmented with two chaotic maps for achieving an improved performance. This FO fuzzy controller shows better performance over the classical PID, and the integer order fuzzy PID controller in both linear and nonlinear operating regimes. The FO fuzzy controller also shows stronger robustness properties against system parameter variation and rate constraint nonlinearity, than that with the other controller structures. The robustness is a highly desirable property in such a scenario since many components of the hybrid power system may be switched on/off or may run at lower/higher power output, at different time instants.\nAn important aspect of developing conversational agents is to give a bot the ability to improve through communicating with humans and to learn from the mistakes that it makes. Most research has focused on learning from fixed training sets of labeled data rather than interacting with a dialogue partner in an online fashion. In this paper we explore this direction in a reinforcement learning setting where the bot improves its question-answering ability from feedback a teacher gives following its generated responses. 
We build a simulator that tests various aspects of such learning in a synthetic environment, and introduce models that work in this regime. Finally, real experiments with Mechanical Turk validate the approach.\nWe present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles. We collect this dataset through a four-stage process designed to solicit exploratory questions that require reasoning. A thorough analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment. We measure human performance on the dataset and compare it to several strong neural models. The performance gap between humans and machines (0.198 in F1) indicates that significant progress can be made on NewsQA through future research. The dataset is freely available at https://datasets.maluuba.com/NewsQA.\nExploration in multi-task reinforcement learning is critical in training agents to deduce the underlying MDP. Many of the existing exploration frameworks such as $E^3$, $R_{max}$, Thompson sampling assume a single stationary MDP and are not suitable for system identification in the multi-task setting. We present a novel method to facilitate exploration in multi-task reinforcement learning using deep generative models. We supplement our method with a low dimensional energy model to learn the underlying MDP distribution and provide a resilient and adaptive exploration signal to the agent. We evaluate our method on a new set of environments and provide intuitive interpretation of our results.\nThis paper presents a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. 
We focus on the traveling salesman problem (TSP) and train a recurrent network that, given a set of city coordinates, predicts a distribution over different city permutations. Using negative tour length as the reward signal, we optimize the parameters of the recurrent network using a policy gradient method. We compare learning the network parameters on a set of training graphs against learning them on individual test graphs. Despite the computational expense, without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes. Applied to the KnapSack, another NP-hard problem, the same method obtains optimal solutions for instances with up to 200 items.\nThe relevance and importance of contextualizing data analytics is described. Qualitative characteristics might form the context of quantitative analysis. Topics that are at issue include: contrast, baselining, secondary data sources, supplementary data sources, dynamic and heterogeneous data. In geometric data analysis, especially with the Correspondence Analysis platform, various case studies are both experimented with, and are reviewed. In such aspects as paradigms followed, and technical implementation, implicitly and explicitly, an important point made is the major relevance of such work for both burgeoning analytical needs and for new analytical areas including Big Data analytics, and so on. For the general reader, it is aimed to display and describe, first of all, the analytical outcomes that are subject to analysis here, and then proceed to detail the more quantitative outcomes that fully support the analytics carried out.\nWe present an online deliberation system using mutual evaluation in order to collaboratively develop solutions. Participants submit their proposals and evaluate each other's proposals; some of them may then be invited by the system to rewrite 'problematic' proposals. 
Two cases are discussed: a proposal supported by many, but not by a given person, who is then invited to rewrite it for making yet more acceptable; and a poorly presented but presumably interesting proposal. The first of these cases has been successfully implemented. Proposals are evaluated along two axes-understandability (or clarity, or, more generally, quality), and agreement. The latter is used by the system to cluster proposals according to their ideas, while the former is used both to present the best proposals on top of their clusters, and to find poorly written proposals candidates for rewriting. These functionalities may be considered as important components of a large scale online deliberation system.\nEmotion estimation in music listening is confronting challenges to capture the emotion variation of listeners. Recent years have witnessed attempts to exploit multimodality fusing information from musical contents and physiological signals captured from listeners to improve the performance of emotion recognition. In this paper, we present a study of fusion of signals of electroencephalogram (EEG), a tool to capture brainwaves at a high-temporal resolution, and musical features at decision level in recognizing the time-varying binary classes of arousal and valence. Our empirical results showed that the fusion could outperform the performance of emotion recognition using only EEG modality that was suffered from inter-subject variability, and this suggested the promise of multimodal fusion in improving the accuracy of music-emotion recognition.\nUnderstanding how brain functions has been an intriguing topic for years. 
With the recent progress on collecting massive data and developing advanced technology, people have become interested in addressing the challenge of decoding brain wave data into meaningful mind states, with many machine learning models and algorithms being revisited and developed, especially the ones that handle time series data because of the nature of brain waves. However, many of these time series models, like HMM with hidden state in discrete space or State Space Model with hidden state in continuous space, only work with one source of data and cannot handle different sources of information simultaneously. In this paper, we propose an extension of State Space Model to work with different sources of information together with its learning and inference algorithms. We apply this model to decode the mind state of students during lectures based on their brain waves and reach a significant better results compared to traditional methods.\nThis paper presents a concept of a novel method for adjusting hyper-parameters in Deep Learning (DL) algorithms. An external agent-observer monitors a performance of a selected Deep Learning algorithm. The observer learns to model the DL algorithm using a series of random experiments. Consequently, it may be used for predicting a response of the DL algorithm in terms of a selected quality measurement to a set of hyper-parameters. This allows to construct an ensemble composed of a series of evaluators which constitute an observer-assisted architecture. The architecture may be used to gradually iterate towards to the best achievable quality score in tiny steps governed by a unit of progress. The algorithm is stopped when the maximum number of steps is reached or no further progress is made.\nSequence modeling with neural networks has lead to powerful models of symbolic music data. We address the problem of exploiting these models to reach creative musical goals, by combining with human input. 
To this end we generalise previous work, which sampled Markovian sequence models under the constraint that the sequence belong to the language of a given finite state machine provided by the human. We consider more expressive non-Markov models, thereby requiring approximate sampling which we provide in the form of an efficient sequential Monte Carlo method. In addition we provide and compare with a beam search strategy for conditional probability maximisation.   Our algorithms are capable of convincingly re-harmonising famous musical works. To demonstrate this we provide visualisations, quantitative experiments, a human listening test and audio examples. We find both the sampling and optimisation procedures to be effective, yet complementary in character. For the case of highly permissive constraint sets, we find that sampling is to be preferred due to the overly regular nature of the optimisation based results. The generality of our algorithms permits countless other creative applications.\nStochastic network design is a general framework for optimizing network connectivity. It has several applications in computational sustainability including spatial conservation planning, pre-disaster network preparation, and river network optimization. A common assumption in previous work has been made that network parameters (e.g., probability of species colonization) are precisely known, which is unrealistic in real- world settings. We therefore address the robust river network design problem where the goal is to optimize river connectivity for fish movement by removing barriers. We assume that fish passability probabilities are known only imprecisely, but are within some interval bounds. We then develop a planning approach that computes the policies with either high robust ratio or low regret. Empirically, our approach scales well to large river networks. 
We also provide insights into the solutions generated by our robust approach, which has significantly higher robust ratio than the baseline solution with mean parameter estimates.\nProblems such as predicting a new shading field (Y) for an image (X) are ambiguous: many very distinct solutions are good. Representing this ambiguity requires building a conditional model P(Y|X) of the prediction, conditioned on the image. Such a model is difficult to train, because we do not usually have training data containing many different shadings for the same image. As a result, we need different training examples to share data to produce good models. This presents a danger we call \"code space collapse\" - the training procedure produces a model that has a very good loss score, but which represents the conditional distribution poorly. We demonstrate an improved method for building conditional models by exploiting a metric constraint on training data that prevents code space collapse. We demonstrate our model on two example tasks using real data: image saturation adjustment, image relighting. We describe quantitative metrics to evaluate ambiguous generation results. Our results quantitatively and qualitatively outperform different strong baselines.\nThe paper analyzes the interaction between humans and computers in terms of response time in solving the image-based CAPTCHA. In particular, the analysis focuses on the attitude of the different Internet users in easily solving four different types of image-based CAPTCHAs which include facial expressions like: animated character, old woman, surprised face, worried face. To pursue this goal, an experiment is realized involving 100 Internet users in solving the four types of CAPTCHAs, differentiated by age, Internet experience, and education level. The response times are collected for each user. 
Then, association rules are extracted from user data, for evaluating the dependence of the response time in solving the CAPTCHA from age, education level and experience in internet usage by statistical analysis. The results implicitly capture the users' psychological states showing in what states the users are more sensible. It reveals to be a novelty and a meaningful analysis in the state-of-the-art.\nSystems for automatic extraction of semantic information about events from large textual resources are now available: these tools are capable to generate RDF datasets about text extracted events and this knowledge can be used to reason over the recognized events. On the other hand, text based tasks for event recognition, as for example event coreference (i.e. recognizing whether two textual descriptions refer to the same event), do not take into account ontological information of the extracted events in their process. In this paper, we propose a method to derive event coreference on text extracted event data using semantic based rule reasoning. We demonstrate our method considering a limited (yet representative) set of event types: we introduce a formal analysis on their ontological properties and, on the base of this, we define a set of coreference criteria. We then implement these criteria as RDF-based reasoning rules to be applied on text extracted event data. We evaluate the effectiveness of our approach over a standard coreference benchmark dataset.\nTime-efficient link discovery is of central importance to implement the vision of the Semantic Web. Some of the most rapid Link Discovery approaches rely internally on planning to execute link specifications. In newer works, linear models have been used to estimate the runtime the fastest planners. However, no other category of models has been studied for this purpose so far. In this paper, we study non-linear runtime estimation functions for runtime estimation. 
In particular, we study exponential and mixed models for the estimation of the runtimes of planners. To this end, we evaluate three different models for runtime on six datasets using 400 link specifications. We show that exponential and mixed models achieve better fits when trained but are only to be preferred in some cases. Our evaluation also shows that the use of better runtime approximation models has a positive impact on the overall execution of link specifications.\nWe present the Neural Physics Engine (NPE), a framework for learning simulators of intuitive physics that naturally generalize across variable object count and different scene configurations. We propose a factorization of a physical scene into composable object-based representations and a neural network architecture whose compositional structure factorizes object dynamics into pairwise interactions. Like a symbolic physics engine, the NPE is endowed with generic notions of objects and their interactions; realized as a neural network, it can be trained via stochastic gradient descent to adapt to specific object properties and dynamics of different worlds. We evaluate the efficacy of our approach on simple rigid body dynamics in two-dimensional worlds. By comparing to less structured architectures, we show that the NPE's compositional representation of the structure in physical interactions improves its ability to predict movement, generalize across variable object count and different scene configurations, and infer latent properties of objects such as mass.\nWe present a method for inducing new dialogue systems from very small amounts of unannotated dialogue data, showing how word-level exploration using Reinforcement Learning (RL), combined with an incremental and semantic grammar - Dynamic Syntax (DS) - allows systems to discover, generate, and understand many new dialogue variants. 
The method avoids the use of expensive and time-consuming dialogue act annotations, and supports more natural (incremental) dialogues than turn-based systems. Here, language generation and dialogue management are treated as a joint decision/optimisation problem, and the MDP model for RL is constructed automatically. With an implemented system, we show that this method enables a wide range of dialogue variations to be automatically captured, even when the system is trained from only a single dialogue. The variants include question-answer pairs, over- and under-answering, self- and other-corrections, clarification interaction, split-utterances, and ellipsis. This generalisation property results from the structural knowledge and constraints present within the DS grammar, and highlights some limitations of recent systems built using machine learning techniques only.\nThe ability to perform effective off-policy learning would revolutionize the process of building better interactive systems, such as search engines and recommendation systems for e-commerce, computational advertising and news. Recent approaches for off-policy evaluation and learning in these settings appear promising. With this paper, we provide real-world data and a standardized test-bed to systematically investigate these algorithms using data from display advertising. In particular, we consider the problem of filling a banner ad with an aggregate of multiple products the user may want to purchase. This paper presents our test-bed, the sanity checks we ran to ensure its validity, and shows results comparing state-of-the-art off-policy learning methods like doubly robust optimization, POEM, and reductions to supervised learning using regression baselines. 
Our results show experimental evidence that recent off-policy learning methods can improve upon state-of-the-art supervised learning techniques on a large-scale real-world data set.\nAdvances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variables, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables - such as the uni-modal Gaussian distribution - which are incapable of representing complex latent factors efficiently. To overcome this restriction, we propose the simple, but highly flexible, piecewise constant distribution. This distribution has the capacity to represent an exponential number of modes of a latent target distribution, while remaining mathematically tractable. Our results demonstrate that incorporating this new latent distribution into different models yields substantial improvements in natural language processing tasks such as document modeling and natural language generation for dialogue.\nA number of recent approaches to policy learning in 2D game domains have been successful going directly from raw input images to actions. However when employed in complex 3D environments, they typically suffer from challenges related to partial observability, combinatorial exploration spaces, path planning, and a scarcity of rewarding scenarios. Inspired from prior work in human cognition that indicates how humans employ a variety of semantic concepts and abstractions (object categories, localisation, etc.) to reason about the world, we build an agent-model that incorporates such abstractions into its policy-learning framework. We augment the raw image input to a Deep Q-Learning Network (DQN), by adding details of objects and structural elements encountered, along with the agent's localisation. 
The different components are automatically extracted and composed into a topological representation using on-the-fly object detection and 3D-scene reconstruction.We evaluate the efficacy of our approach in Doom, a 3D first-person combat game that exhibits a number of challenges discussed, and show that our augmented framework consistently learns better, more effective policies.\nDue to physiological variation, patients diagnosed with the same condition may exhibit divergent, but related, responses to the same treatments. Hidden Parameter Markov Decision Processes (HiP-MDPs) tackle this transfer-learning problem by embedding these tasks into a low-dimensional space. However, the original formulation of HiP-MDP had a critical flaw: the embedding uncertainty was modeled independently of the agent's state uncertainty, requiring an unnatural training procedure in which all tasks visited every part of the state space---possible for robots that can be moved to a particular location, impossible for human patients. We update the HiP-MDP framework and extend it to more robustly develop personalized medicine strategies for HIV treatment.\nAn important problem for HCI researchers is to estimate the parameter values of a cognitive model from behavioral data. This is a difficult problem, because of the substantial complexity and variety in human behavioral strategies. We report an investigation into a new approach using approximate Bayesian computation (ABC) to condition model parameters to data and prior knowledge. As the case study we examine menu interaction, where we have click time data only to infer a cognitive model that implements a search behaviour with parameters such as fixation duration and recall probability. Our results demonstrate that ABC (i) improves estimates of model parameter values, (ii) enables meaningful comparisons between model variants, and (iii) supports fitting models to individual users. 
ABC provides ample opportunities for theoretical HCI research by allowing principled inference of model parameter values and their uncertainty.\nAutomatic essay scoring (AES) refers to the process of scoring free text responses to given prompts, considering human grader scores as the gold standard. Writing such essays is an essential component of many language and aptitude exams. Hence, AES became an active and established area of research, and there are many proprietary systems used in real life applications today. However, not much is known about which specific linguistic features are useful for prediction and how much of this is consistent across datasets. This article addresses that by exploring the role of various linguistic features in automatic essay scoring using two publicly available datasets of non-native English essays written in test taking scenarios. The linguistic properties are modeled by encoding lexical, syntactic, discourse and error types of learner language in the feature set. Predictive models are then developed using these features on both datasets and the most predictive features are compared. While the results show that the feature set used results in good predictive models with both datasets, the question \"what are the most predictive features?\" has a different answer for each dataset.\nA major challenge facing existing sequential Monte-Carlo methods for parameter estimation in physics stems from the inability of existing approaches to robustly deal with experiments that have different mechanisms that yield the results with equivalent probability. We address this problem here by proposing a form of particle filtering that clusters the particles that comprise the sequential Monte-Carlo approximation to the posterior before applying a resampler. 
Through a new graphical approach to thinking about such models, we are able to devise an artificial-intelligence based strategy that automatically learns the shape and number of the clusters in the support of the posterior. We demonstrate the power of our approach by applying it to randomized gap estimation and a form of low circuit-depth phase estimation where existing methods from the physics literature either exhibit much worse performance or even fail completely.\nWe consider parallel asynchronous Markov Chain Monte Carlo (MCMC) sampling for problems where we can leverage (stochastic) gradients to define continuous dynamics which explore the target distribution. We outline a solution strategy for this setting based on stochastic gradient Hamiltonian Monte Carlo sampling (SGHMC) which we alter to include an elastic coupling term that ties together multiple MCMC instances. The proposed strategy turns inherently sequential HMC algorithms into asynchronous parallel versions. First experiments empirically show that the resulting parallel sampler significantly speeds up exploration of the target distribution, when compared to standard SGHMC, and is less prone to the harmful effects of stale gradients than a naive parallelization approach.\nSemantic sparsity is a common challenge in structured visual classification problems; when the output space is complex, the vast majority of the possible predictions are rarely, if ever, seen in the training set. This paper studies semantic sparsity in situation recognition, the task of producing structured summaries of what is happening in images, including activities, objects and the roles objects play within the activity. For this problem, we find empirically that most object-role combinations are rare, and current state-of-the-art models significantly underperform in this sparse data regime. 
We avoid many such errors by (1) introducing a novel tensor composition function that learns to share examples across role-noun combinations and (2) semantically augmenting our training data with automatically gathered examples of rarely observed outputs using web data. When integrated within a complete CRF-based structured prediction model, the tensor-based approach outperforms existing state of the art by a relative improvement of 2.11% and 4.40% on top-5 verb and noun-role accuracy, respectively. Adding 5 million images with our semantic augmentation techniques gives further relative improvements of 6.23% and 9.57% on top-5 verb and noun-role accuracy.\nWe show that the Bellman operator underlying the options framework leads to a matrix splitting, an approach traditionally used to speed up convergence of iterative solvers for large linear systems of equations. Based on standard comparison theorems for matrix splittings, we then show how the asymptotic rate of convergence varies as a function of the inherent timescales of the options. This new perspective highlights a trade-off between asymptotic performance and the cost of computation associated with building a good set of options.\nWe present the Mim-Solution's approach to the RecSys Challenge 2016, which ranked 2nd. The goal of the competition was to prepare job recommendations for the users of the website Xing.com.   Our two phase algorithm consists of candidate selection followed by the candidate ranking. We ranked the candidates by the predicted probability that the user will positively interact with the job offer. We have used Gradient Boosting Decision Trees as the regression tool.\nThis paper introduces DeepBach, a graphical model aimed at modeling polyphonic music and specifically hymn-like pieces. We claim that, after being trained on the chorale harmonizations by Johann Sebastian Bach, our model is capable of generating highly convincing chorales in the style of Bach. 
DeepBach's strength comes from the use of pseudo-Gibbs sampling coupled with an adapted representation of musical data. This is in contrast with many automatic music composition approaches which tend to compose music sequentially. Our model is also steerable in the sense that a user can constrain the generation by imposing positional constraints such as notes, rhythms or cadences in the generated score. We also provide a plugin on top of the MuseScore music editor making the interaction with DeepBach easy to use.\nDeep learning approaches have reached a celebrity status in artificial intelligence field, its success have mostly relied on Convolutional Networks (CNN) and Recurrent Networks. By exploiting fundamental spatial properties of images and videos, the CNN always achieves dominant performance on visual tasks. And the Recurrent Networks (RNN) especially long short-term memory methods (LSTM) can successfully characterize the temporal correlation, thus exhibits superior capability for time series tasks. Traffic flow data have plentiful characteristics on both time and space domain. However, applications of CNN and LSTM approaches on traffic flow are limited. In this paper, we propose a novel deep architecture combined CNN and LSTM to forecast future traffic flow (CLTFP). An 1-dimension CNN is exploited to capture spatial features of traffic flow, and two LSTMs are utilized to mine the short-term variability and periodicities of traffic flow. Given those meaningful features, the feature-level fusion is performed to achieve short-term forecasting. The proposed CLTFP is compared with other popular forecasting methods on an open datasets. Experimental results indicate that the CLTFP has considerable advantages in traffic flow forecasting. in additional, the proposed CLTFP is analyzed from the view of Granger Causality, and several interesting properties of CLTFP are discovered and discussed .\nSoftware estimation is a crucial task in software engineering. 
Software estimation encompasses cost, effort, schedule, and size. The importance of software estimation becomes critical in the early stages of the software life cycle when the details of software have not been revealed yet. Several commercial and non-commercial tools exist to estimate software in the early stages. Most software effort estimation methods require software size as one of the important metric inputs and consequently, software size estimation in the early stages becomes essential. One of the approaches that has been used for about two decades in the early size and effort estimation is called use case points. Use case points method relies on the use case diagram to estimate the size and effort of software projects. Although the use case points method has been widely used, it has some limitations that might adversely affect the accuracy of estimation. This paper presents some techniques using fuzzy logic and neural networks to improve the accuracy of the use case points method. Results showed that an improvement up to 22% can be obtained using the proposed approach.\nWe propose a scheme for training a computerized agent to perform complex human tasks such as highway steering. The scheme is designed to follow a natural learning process whereby a human instructor teaches a computerized trainee. The learning process consists of five elements: (i) unsupervised feature learning; (ii) supervised imitation learning; (iii) supervised reward induction; (iv) supervised safety module construction; and (v) reinforcement learning. We implemented the last four elements of the scheme using deep convolutional networks and applied it to successfully create a computerized agent capable of autonomous highway steering over the well-known racing game Assetto Corsa. We demonstrate that the use of the last four elements is essential to effectively carry out the steering task using vision alone, without access to a driving simulator internals, and operating in wall-clock time. 
This is made possible also through the introduction of a safety network, a novel way for preventing the agent from performing catastrophic mistakes during the reinforcement learning stage.\nWe examine the complexity of inference in Bayesian networks specified by logical languages. We consider representations that range from fragments of propositional logic to function-free first-order logic with equality; in doing so we cover a variety of plate models and of probabilistic relational models. We study the complexity of inferences when network, query and domain are the input (the inferential and the combined complexity), when the network is fixed and query and domain are the input (the query/data complexity), and when the network and query are fixed and the domain is the input (the domain complexity). We draw connections with probabilistic databases and liftability results, and obtain complexity classes that range from polynomial to exponential levels.\nExtending the success of deep neural networks to natural language understanding and symbolic reasoning requires complex operations and external memory. Recent neural program induction approaches have attempted to address this problem, but are typically limited to differentiable memory, and consequently cannot scale beyond small synthetic tasks. In this work, we propose the Manager-Programmer-Computer framework, which integrates neural networks with non-differentiable memory to support abstract, scalable and precise operations through a friendly neural computer interface. Specifically, we introduce a Neural Symbolic Machine, which contains a sequence-to-sequence neural \"programmer\", and a non-differentiable \"computer\" that is a Lisp interpreter with code assist. To successfully apply REINFORCE for training, we augment it with approximate gold programs found by an iterative maximum likelihood training process. NSM is able to learn a semantic parser from weak supervision over a large knowledge base. 
It achieves new state-of-the-art performance on WebQuestionsSP, a challenging semantic parsing dataset, with weak supervision. Compared to previous approaches, NSM is end-to-end, therefore does not rely on feature engineering or domain specific knowledge.\nIn this paper we extend the principle of proportional representation to rankings. We consider the setting where alternatives need to be ranked based on approval preferences. In this setting, proportional representation requires that cohesive groups of voters are represented proportionally in each initial segment of the ranking. Proportional rankings are desirable in situations where initial segments of different lengths may be relevant, e.g., hiring decisions (if it is unclear how many positions are to be filled), the presentation of competing proposals on a liquid democracy platform (if it is unclear how many proposals participants are taking into consideration), or recommender systems (if a ranking has to accommodate different user types). We study the proportional representation provided by several ranking methods and prove theoretical guarantees. Furthermore, we experimentally evaluate these methods and present preliminary evidence as to which methods are most suitable for producing proportional rankings.\nAndroid malware has been on the rise in recent years due to the increasing popularity of Android and the proliferation of third party application markets. Emerging Android malware families are increasingly adopting sophisticated detection avoidance techniques and this calls for more effective approaches for Android malware detection. Hence, in this paper we present and evaluate an n-gram opcode features based approach that utilizes machine learning to identify and categorize Android malware. This approach enables automated feature discovery without relying on prior expert or domain knowledge for pre-determined features. 
Furthermore, by using a data segmentation technique for feature selection, our analysis is able to scale up to 10-gram opcodes. Our experiments on a dataset of 2520 samples showed an f-measure of 98% using the n-gram opcode based approach. We also provide empirical findings that illustrate factors that have probable impact on the overall n-gram opcodes performance trends.\nArtificial intelligence offers the potential to automate challenging data-processing tasks in collider physics. To establish its prospects, we explore to what extent deep learning with convolutional neural networks can discriminate quark and gluon jets better than observables designed by physicists. Our approach builds upon the paradigm that a jet can be treated as an image, with intensity given by the local calorimeter deposits. We supplement this construction by adding color to the images, with red, green and blue intensities given by the transverse momentum in charged particles, transverse momentum in neutral particles, and pixel-level charged particle counts. Overall, the deep networks match or outperform traditional jet variables. We also find that, while various simulations produce different quark and gluon jets, the neural networks are surprisingly insensitive to these differences, similar to traditional observables. This suggests that the networks can extract robust physical information from imperfect simulations.\nIn the classic Vehicle Routing Problem (VRP) a fleet of of vehicles has to visit a set of customers while minimising the operations' costs. We study a rich variant of the VRP featuring split deliveries, an heterogeneous fleet, and vehicle-commodity incompatibility constraints. Our goal is twofold: define the cheapest routing and the most adequate fleet.   To do so, we split the problem into two interdependent components: a fleet design component and a routing component. First, we define two Mixed Integer Programming (MIP) formulations for each component. 
Then we discuss several improvements in the form of valid cuts and symmetry breaking constraints.   The main contribution of this paper is a comparison of the four resulting models for this Rich VRP. We highlight their strengths and weaknesses with extensive experiments.   Finally, we explore a lightweight integration with Constraint Programming (CP). We use a fast CP model which gives good solutions and use the solution to warm-start our models.\nScarce data is a major challenge to scaling robot learning to truly complex tasks, as we need to generalize locally learned policies over different \"contexts\". Bayesian optimization approaches to contextual policy search (CPS) offer data-efficient policy learning that generalize over a context space. We propose to improve data- efficiency by factoring typically considered contexts into two components: target- type contexts that correspond to a desired outcome of the learned behavior, e.g. target position for throwing a ball; and environment type contexts that correspond to some state of the environment, e.g. initial ball position or wind speed. Our key observation is that experience can be directly generalized over target-type contexts. Based on that we introduce Factored Contextual Policy Search with Bayesian Optimization for both passive and active learning settings. Preliminary results show faster policy generalization on a simulated toy problem.\nAttention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information from the image to predict non-visual words such as \"the\" and \"of\". Other words that may seem visual can often be predicted reliably just from the language model e.g., \"sign\" after \"behind a red stop\" or \"phone\" following \"talking on a cell\". In this paper, we propose a novel adaptive attention model with a visual sentinel. 
At each time step, our model decides whether to attend to the image (and if so, to which regions) or to the visual sentinel. The model decides whether to attend to the image and where, in order to extract meaningful information for sequential word generation. We test our method on the COCO image captioning 2015 challenge dataset and Flickr30K. Our approach sets the new state-of-the-art by a significant margin.\nOntologies in different natural languages often differ in quality in terms of richness of schema or richness of internal links. This difference is markedly visible when comparing a rich English language ontology with a non-English language counterpart. Discovering alignment between them is a useful endeavor as it serves as a starting point in bridging the disparity. In particular, our work is motivated by the absence of inter-language links for predicates in the localised versions of DBpedia. In this paper, we propose and demonstrate an ad-hoc system to find possible owl:equivalentProperty links between predicates in ontologies of different natural languages. We seek to achieve this mapping by using pre-existing inter-language links of the resources connected by the given predicate. Thus, our methodology stresses on semantic similarity rather than lexical. Moreover, through an evaluation, we show that our system is capable of outperforming a baseline system that is similar to the one used in recent OAEI campaigns.\nTransferring artistic styles onto everyday photographs has become an extremely popular task in both academia and industry. Recently, offline training has replaced on-line iterative optimization, enabling nearly real-time stylization. When those stylization networks are applied directly to high-resolution images, however, the style of localized regions often appears less similar to the desired artistic style. This is because the transfer process fails to capture small, intricate textures and maintain correct texture scales of the artworks. 
Here we propose a multimodal convolutional neural network that takes into consideration faithful representations of both color and luminance channels, and performs stylization hierarchically with multiple losses of increasing scales. Compared to state-of-the-art networks, our network can also perform style transfer in nearly real-time by conducting much more sophisticated training offline. By properly handling style and texture cues at multiple scales using several modalities, we can transfer not just large-scale, obvious style cues but also subtle, exquisite ones. That is, our scheme can generate results that are visually pleasing and more similar to multiple desired artistic styles with color and texture cues at multiple scales.\nThis paper studies Value-at-Risk (VaR) problems in short- and long-horizon Markov decision processes (MDPs) with finite state space and two different reward functions. Firstly we examine the effects of two reward functions under two criteria in a short-horizon MDP. We show that under the VaR criterion, when the original reward function is on both current and next states, the reward simplification will change the VaR. Secondly, for long-horizon MDPs, we estimate the Pareto front of the total reward distribution set with the aid of spectral theory and the central limit theorem. Since the estimation is for a Markov process with the simplified reward function only, we present a transformation algorithm for the Markov process with the original reward function, in order to estimate the Pareto front with an intact total reward distribution.\nAlthough Generative Adversarial Networks achieve state-of-the-art results on a variety of generative tasks, they are regarded as highly unstable and prone to miss modes. 
We argue that these bad behaviors of GANs are due to the very particular functional shape of the trained discriminators in high dimensional spaces, which can easily make training stuck or push probability mass in the wrong direction, towards that of higher concentration than that of the data generating distribution. We introduce several ways of regularizing the objective, which can dramatically stabilize the training of GAN models. We also show that our regularizers can help the fair distribution of probability mass across the modes of the data generating distribution, during the early phases of training and thus providing a unified solution to the missing modes problem.\nA key limitation of sampling algorithms for approximate inference is that it is difficult to quantify their approximation error. Widely used sampling schemes, such as sequential importance sampling with resampling and Metropolis-Hastings, produce output samples drawn from a distribution that may be far from the target posterior distribution. This paper shows how to upper-bound the symmetric KL divergence between the output distribution of a broad class of sequential Monte Carlo (SMC) samplers and their target posterior distributions, subject to assumptions about the accuracy of a separate gold-standard sampler. The proposed method applies to samplers that combine multiple particles, multinomial resampling, and rejuvenation kernels. The experiments show the technique being used to estimate bounds on the divergence of SMC samplers for posterior inference in a Bayesian linear regression model and a Dirichlet process mixture model.\nWe study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as a new state transition is observed. 
These methods use small storage and has low computational complexity per iteration. The SPD methods find an absolute-$\\epsilon$-optimal policy, with high probability, using $\\mathcal{O}\\left(\\frac{|\\mathcal{S}|^4 |\\mathcal{A}|^2\\sigma^2 }{(1-\\gamma)^6\\epsilon^2} \\right)$ iterations/samples for the infinite-horizon discounted-reward MDP and $\\mathcal{O}\\left(\\frac{|\\mathcal{S}|^4 |\\mathcal{A}|^2H^6\\sigma^2 }{\\epsilon^2} \\right)$ for the finite-horizon MDP.\nCompositional models were introduce by Jirousek and Shenoy in the general framework of valuation-based systems. They based their theory on an axiomatic system of valuations involving not only the operations of combination and marginalisation, but also of removal. They claimed that this systems covers besides the classical case of discrete probability distributions, also the cases of Gaussian densities and belief functions, and many other systems.   Whereas their results on the compositional operator are correct, the axiomatic basis is not sufficient to cover the examples claimed above. We propose here a different axiomatic system of valuation algebras, which permits a rigorous mathematical theory of compositional operators in valuation-based systems and covers all the examples mentioned above. It extends the classical theory of inverses in semigroup theory and places thereby the present theory into its proper mathematical frame. Also this theory sheds light on the different structures of valuation-based systems, like regular algebras (represented by probability potentials), canncellative algebras (Gaussian potentials) and general separative algebras (density functions).\nBuilding neural networks to query a knowledge base (a table) with natural language is an emerging research topic in deep learning. An executor for table querying typically requires multiple steps of execution because queries may have complicated structures. 
In previous studies, researchers have developed either fully distributed executors or symbolic executors for table querying. A distributed executor can be trained in an end-to-end fashion, but is weak in terms of execution efficiency and explicit interpretability. A symbolic executor is efficient in execution, but is very difficult to train especially at initial stages. In this paper, we propose to couple distributed and symbolic execution for natural language queries, where the symbolic executor is pretrained with the distributed executor's intermediate execution results in a step-by-step fashion. Experiments show that our approach significantly outperforms both distributed and symbolic executors, exhibiting high accuracy, high learning efficiency, high execution efficiency, and high interpretability.\nManaging patients with multimorbidity often results in polypharmacy: the prescription of multiple drugs. However, the long-term effects of specific combinations of drugs and diseases are typically unknown. In particular, drugs prescribed for one condition may result in adverse effects for the other. To investigate which types of drugs may affect the further progression of multimorbidity, we query models of diseases and prescriptions that are learned from primary care data. State-of-the-art tractable Bayesian network representations, on which such complex queries can be computed efficiently, are employed for these large medical networks. Our results confirm that prescriptions may lead to unintended negative consequences in further development of multimorbidity in cardiovascular diseases. Moreover, a drug treatment for one disease group may affect diseases of another group.\nBayesian Optimization (BO) has become a core method for solving expensive black-box optimization problems. While much research focussed on the choice of the acquisition function, we focus on online length-scale adaption and the choice of kernel function. 
Instead of choosing hyperparameters in view of maximum likelihood on past data, we propose to use the acquisition function to decide on hyperparameter adaptation more robustly and in view of the future optimization progress. Further, we propose a particular kernel function that includes non-stationarity and local anisotropy and thereby implicitly integrates the efficiency of local convex optimization with global Bayesian optimization. Comparisons to state-of-the art BO methods underline the efficiency of these mechanisms on global optimization benchmarks.\nLiterature reviews can be time-consuming and tedious to complete. By cataloging and refactoring three state-of-the-art active learning techniques from evidence-based medicine and legal electronic discovery, this paper finds and implements FASTREAD, a faster technique for studying a large corpus of documents. This paper assesses FASTREAD using datasets generated from existing SE literature reviews (Hall, Wahono, Radjenovi\\'c, Kitchenham et al.). Compared to manual methods, FASTREAD lets researchers find 95% relevant studies after reviewing an order of magnitude fewer papers. Compared to other state-of-the-art automatic methods, FASTREAD reviews 20-50% fewer studies while finding same number of relevant primary studies in a systematic literature review.\nReinforcement learning (RL) depends critically on the choice of reward functions used to capture the de- sired behavior and constraints of a robot. Usually, these are handcrafted by a expert designer and represent heuristics for relatively simple tasks. Real world applications typically involve more complex tasks with rich temporal and logical structure. In this paper we take advantage of the expressive power of temporal logic (TL) to specify complex rules the robot should follow, and incorporate domain knowledge into learning. 
We propose Truncated Linear Temporal Logic (TLTL) as specifications language, that is arguably well suited for the robotics applications, together with quantitative semantics, i.e., robustness degree. We propose a RL approach to learn tasks expressed as TLTL formulae that uses their associated robustness degree as reward functions, instead of the manually crafted heuristics trying to capture the same specifications. We show in simulated trials that learning is faster and policies obtained using the proposed approach outperform the ones learned using heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied. Furthermore, we demonstrate the proposed RL approach in a toast-placing task learned by a Baxter robot.\nWe provide a brief technical description of an online platform for disease monitoring, titled as the Flu Detector (fludetector.cs.ucl.ac.uk). Flu Detector, in its current version (v.0.5), uses either Twitter or Google search data in conjunction with statistical Natural Language Processing models to estimate the rate of influenza-like illness in the population of England. Its back-end is a live service that collects online data, utilises modern technologies for large-scale text processing, and finally applies statistical inference models that are trained offline. The front-end visualises the various disease rate estimates. Notably, the models based on Google data achieve a high level of accuracy with respect to the most recent four flu seasons in England (2012/13 to 2015/16). This highlighted Flu Detector as having a great potential of becoming a complementary source to the domestic traditional flu surveillance schemes.\nTraditional sentiment analysis often uses sentiment dictionary to extract sentiment information in text and classify documents. However, emerging informal words and phrases in user generated content call for analysis aware to the context. Usually, they have special meanings in a particular context. 
Because of its great performance in representing inter-word relation, we use sentiment word vectors to identify the special words. Based on the distributed language model word2vec, in this paper we represent a novel method about sentiment representation of word under particular context, to be detailed, to identify the words with abnormal sentiment polarity in long answers. Result shows the improved model shows better performance in representing the words with special meaning, while keep doing well in representing special idiomatic pattern. Finally, we will discuss the meaning of vectors representing in the field of sentiment, which may be different from general object-based conditions.\nSeveral methods exist for a computer to generate music based on data including Markov chains, recurrent neural networks, recombinancy, and grammars. We explore the use of unit selection and concatenation as a means of generating music using a procedure based on ranking, where, we consider a unit to be a variable length number of measures of music. We first examine whether a unit selection method, that is restricted to a finite size unit library, can be sufficient for encompassing a wide spectrum of music. We do this by developing a deep autoencoder that encodes a musical input and reconstructs the input by selecting from the library. We then describe a generative model that combines a deep structured semantic model (DSSM) with an LSTM to predict the next unit, where units consist of four, two, and one measures of music. We evaluate the generative model using objective metrics including mean rank and accuracy and with a subjective listening test in which expert musicians are asked to complete a forced-choiced ranking task. 
We compare our model to a note-level generative baseline that consists of a stacked LSTM trained to predict forward by one note.\nTensor decompositions have rich applications in statistics and machine learning, and developing efficient, accurate algorithms for the problem has received much attention recently. Here, we present a new method built on Kruskal's uniqueness theorem to decompose symmetric, nearly orthogonally decomposable tensors. Unlike the classical higher-order singular value decomposition, which unfolds a tensor along a single mode, we consider unfoldings along two modes and use rank-1 constraints to characterize the underlying components. This tensor decomposition method provably handles a greater level of noise compared to previous methods and achieves a high estimation accuracy. Numerical results demonstrate that our algorithm is robust to various noise distributions and that it performs especially favorably as the order increases.\nWe propose an online, end-to-end, neural generative conversational model for open-domain dialogue. It is trained using a unique combination of offline two-phase supervised learning and online human-in-the-loop active learning. While most existing research proposes offline supervision or hand-crafted reward functions for online reinforcement, we devise a novel interactive learning mechanism based on Hamming-diverse beam search for response generation and one-character user feedback at each step. Experiments show that our model inherently promotes the generation of semantically relevant and interesting responses, and can be used to train agents with customized personas, moods and conversational styles.\nRecord linkage is the process of identifying records from several databases that refer to the same entities. This process is challenging because commonly no unique entity identifiers are available. Linkage therefore has to rely on partially identifying attributes, such as names and addresses of people.
Recent years have seen the development of novel techniques for linking data from diverse application areas, where a major focus has been on linking complex data that contain records about different types of entities. Advanced approaches have been developed that exploit both the similarities between record attributes and the relationships between entities to identify clusters of matching records. In this application paper we study the novel problem where, rather than different types of entities, we have databases where the same entity can have different roles, and where these roles change over time. We specifically develop novel techniques for linking historical birth, death, marriage and census records with the aim of reconstructing the population covered by these records over a period of several decades. Our experimental evaluation on real Scottish data shows that even with advanced linkage techniques that consider group, relationship, and temporal aspects, it is challenging to achieve high-quality linkage from such complex data.\nRecent advances have shown the capability of Fully Convolutional Neural Networks (FCN) to model cost functions for motion planning in the context of learning driving preferences purely based on demonstration data from human drivers. While pure learning from demonstrations in the framework of Inverse Reinforcement Learning (IRL) is a promising approach, we can benefit from well-informed human priors and incorporate them into the learning process. Our work achieves this by pretraining a model to regress to a manual cost function and refining it based on Maximum Entropy Deep Inverse Reinforcement Learning. When injecting prior knowledge as pretraining for the network, we achieve higher robustness, more visually distinct obstacle boundaries, and the ability to capture instances of obstacles that elude models that purely learn from demonstration data.
Furthermore, by exploiting these human priors, the resulting model can more accurately handle corner cases that are scarcely seen in the demonstration data, such as stairs, slopes, and underpasses.\nIn this paper we present an agent-based model (ABM) of scientific inquiry aimed at investigating how different social networks impact the efficiency of scientists in acquiring knowledge. As such, the ABM is a computational tool for tackling issues in the domain of scientific methodology and science policy. In contrast to existing ABMs of science, our model aims to represent the argumentative dynamics that underlie scientific practice. To this end we employ abstract argumentation theory as the core design feature of the model. This helps to avoid a number of problematic idealizations which are present in other ABMs of science and which impede their relevance for actual scientific practice.\nWhereas CNNs have demonstrated immense progress in many vision problems, they suffer from a dependence on monumental amounts of labeled training data. On the other hand, dictionary learning does not scale to the size of problems that CNNs can handle, despite being very effective at low-level vision tasks such as denoising and inpainting. Recently, interest has grown in adapting dictionary learning methods for supervised tasks such as classification and inverse problems. We propose two new network layers that are based on dictionary learning: a sparse factorization layer and a convolutional sparse factorization layer, analogous to fully-connected and convolutional layers, respectively. Using our derivations, these layers can be dropped into existing CNNs, trained together in an end-to-end fashion with back-propagation, and leverage semi-supervision in ways classical CNNs cannot. We experimentally compare networks with these two new layers against a baseline CNN.
Our results demonstrate that networks with either of the sparse factorization layers are able to outperform classical CNNs when supervised data are few. They also show performance improvements in certain tasks when compared to the CNN with no sparse factorization layers with the same exact number of parameters.\nRecurrent Neural Networks (RNN), particularly Long Short Term Memory (LSTM) RNNs, are a popular and very successful method for learning and generating sequences. However, current generative RNN techniques do not allow real-time interactive control of the sequence generation process, thus aren't well suited for live creative expression. We propose a method of real-time continuous control and 'steering' of sequence generation using an ensemble of RNNs and dynamically altering the mixture weights of the models. We demonstrate the method using character based LSTM networks and a gestural interface allowing users to 'conduct' the generation of text.\nWe introduce a method for imposing higher-level structure on generated, polyphonic music. A Convolutional Restricted Boltzmann Machine (C-RBM) as a generative model is combined with gradient descent constraint optimization to provide further control over the generation process. Among other things, this allows for the use of a \"template\" piece, from which some structural properties can be extracted, and transferred as constraints to newly generated material. The sampling process is guided with Simulated Annealing in order to avoid local optima, and find solutions that both satisfy the constraints, and are relatively stable with respect to the C-RBM. 
Results show that with this approach it is possible to control the higher-level self-similarity structure, the meter, and the tonal properties of the resulting musical piece while preserving its local musical coherence.\nIn many model-based diagnosis applications it is impossible to provide a set of observations and/or measurements that makes it possible to identify the real cause of a fault. Therefore, diagnosis systems often return many possible candidates, leaving the burden of selecting the correct diagnosis to a user. Sequential diagnosis techniques solve this problem by automatically generating a sequence of queries to some oracle. The answers to these queries provide additional information necessary to gradually restrict the search space by removing diagnosis candidates inconsistent with the answers. During query computation, existing sequential diagnosis methods often require the generation of many unnecessary query candidates and strongly rely on expensive logical reasoners. We tackle this issue by devising efficient heuristic query search methods. The proposed methods enable, for the first time, completely reasoner-free query generation while at the same time guaranteeing optimality conditions of the returned query (e.g., minimal cardinality or best understandability) that existing methods cannot realize. Hence, the performance of this approach is independent of the (complexity of the) diagnosed system. Experiments conducted using real-world problems show that the new approach is highly scalable and outperforms existing methods by orders of magnitude.\nWe investigate a human-machine collaborative drawing environment in which an autonomous agent sketches images while optionally allowing a user to directly influence the agent's trajectory. We combine Monte Carlo Tree Search with image classifiers and test both shallow models (e.g. multinomial logistic regression) and deep Convolutional Neural Networks (e.g. LeNet, Inception v3).
We found that using the shallow model, the agent produces a limited variety of images, which are noticeably recognisable by humans. However, using the deeper models, the agent produces a more diverse range of images, and while the agent remains very confident (99.99%) in having achieved its objective, to humans they mostly resemble unrecognisable 'random' noise. We relate this to recent research which also discovered that 'deep neural networks are easily fooled' (Nguyen et al., 2015), and we discuss possible solutions and future directions for the research.\nA good dialogue agent should have the ability to interact with users both by responding to questions and by asking questions, and importantly to learn from both types of interaction. In this work, we explore this direction by designing a simulator and a set of synthetic tasks in the movie domain that allow such interactions between a learner and a teacher. We investigate how a learner can benefit from asking questions in both offline and online reinforcement learning settings, and demonstrate that the learner improves when asking questions. Finally, real experiments with Mechanical Turk validate the approach. Our work represents a first step in developing such end-to-end learned interactive dialogue agents.\nOntohub is a repository engine for managing distributed heterogeneous ontologies. The distributed nature enables communities to share and exchange their contributions easily. The heterogeneous nature makes it possible to integrate ontologies written in various ontology languages. Ontohub supports a wide range of formal logical and ontology languages, as well as various structuring and modularity constructs and inter-theory (concept) mappings, building on the OMG-standardized DOL language. Ontohub repositories are organised as Git repositories, thus inheriting all features of this popular version control system.
Moreover, Ontohub is the first repository engine meeting a substantial number of the requirements formulated in the context of the Open Ontology Repository (OOR) initiative, including an API for federation as well as support for logical inference and axiom selection.\nMini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple empirical inspection. The batch size significantly influences the behavior of the stochastic optimization algorithm, though, since it determines the variance of the gradient estimates. This variance also changes over the optimization process; when using a constant batch size, stability and convergence are thus often enforced by means of a (manually tuned) decreasing learning rate schedule. We propose a practical method for dynamic batch size adaptation. It estimates the variance of the stochastic gradients and adapts the batch size to decrease the variance proportionally to the value of the objective function, removing the need for the aforementioned learning rate decrease. In contrast to recent related work, our algorithm couples the batch size to the learning rate, directly reflecting the known relationship between the two. On popular image classification benchmarks, our batch size adaptation yields faster optimization convergence, while simultaneously simplifying learning rate tuning. A TensorFlow implementation is available.\nSeveral recently developed Multi-Agent Path Finding (MAPF) solvers scale to large MAPF instances by searching for MAPF plans on 2 levels: the high-level search resolves collisions between agents, and the low-level search plans paths for single agents under the constraints imposed by the high-level search.
We make the following contributions to solve the MAPF problem with imperfect plan execution with small average makespans: First, we formalize the MAPF Problem with Delay Probabilities (MAPF-DP), define valid MAPF-DP plans and propose the use of robust plan-execution policies for valid MAPF-DP plans to control how each agent proceeds along its path. Second, we discuss 2 classes of decentralized robust plan-execution policies (called Fully Synchronized Policies and Minimal Communication Policies) that prevent collisions during plan execution for valid MAPF-DP plans. Third, we present a 2-level MAPF-DP solver (called Approximate Minimization in Expectation) that generates valid MAPF-DP plans.\nThe National Basketball Association (NBA) has expanded its data gathering and has heavily invested in new technologies to gather advanced performance metrics on players. This expanded data set allows analysts to use unique performance metrics in models to estimate and classify player performance. Instead of grouping players together based on physical attributes and positions played, analysts can group together players that play similarly to each other based on these tracked metrics. Existing methods for player classification have typically used offensive metrics for clustering [1]. There have been attempts to classify players using past defensive metrics, but owing to the lack of quality metrics, these have not produced promising results. The classifications presented in the paper use newly introduced defensive metrics to find different defensive positions for each player. Without knowing the number of categories that players can be cast into, Gaussian Mixture Models (GMM) can be applied to find the optimal number of clusters. In the model presented, five different defensive player types can be identified.\nIn this paper we consider the problem of robot navigation in simple maze-like environments where the robot has to rely on its onboard sensors to perform the navigation task.
In particular, we are interested in solutions to this problem that do not require localization, mapping or planning. Additionally, we require that our solution can quickly adapt to new situations (e.g., changing navigation goals and environments). To meet these criteria we frame this problem as a sequence of related reinforcement learning tasks. We propose a successor feature based deep reinforcement learning algorithm that can learn to transfer knowledge from previously mastered navigation tasks to new problem instances. Our algorithm substantially decreases the required learning time after the first task instance has been solved, which makes it easily adaptable to changing environments. We validate our method in both simulated and real robot experiments with a Robotino and compare it to a set of baseline methods including classical planning-based navigation.\nWe propose a quantum machine learning algorithm for efficiently solving a class of problems encoded in quantum controlled unitary operations. The central physical mechanism of the protocol is the iteration of a quantum time-delayed equation that introduces feedback in the dynamics and eliminates the necessity of intermediate measurements. The performance of the quantum algorithm is analyzed by comparing the results obtained in numerical simulations with the outcome of classical machine learning methods for the same problem. The use of time-delayed equations enhances the toolbox of the field of quantum machine learning, which may enable unprecedented applications in quantum technologies.\nA softmax operator applied to a set of values acts somewhat like the maximization function and somewhat like an average. In sequential decision making, softmax is often used in settings where it is necessary to maximize utility but also to hedge against problems that arise from putting all of one's weight behind a single maximum utility decision. 
The Boltzmann softmax operator is the most commonly used softmax operator in this setting, but we show that this operator is prone to misbehavior. In this work, we study a differentiable softmax operator that, among other properties, is a non-expansion ensuring convergent behavior in learning and planning. We introduce a variant of the SARSA algorithm that, by utilizing the new operator, computes a Boltzmann policy with a state-dependent temperature parameter. We show that the algorithm is convergent and that it performs favorably in practice.\nWe investigate whether quantum annealers with select chip layouts can outperform classical computers in reinforcement learning tasks. We associate a transverse field Ising spin Hamiltonian with a layout of qubits similar to that of a deep Boltzmann machine (DBM) and use simulated quantum annealing (SQA) to numerically simulate quantum sampling from this system. We design a reinforcement learning algorithm in which the set of visible nodes representing the states and actions of an optimal policy are the first and last layers of the deep network. In the absence of a transverse field, our simulations show that DBMs train more effectively than restricted Boltzmann machines (RBM) with the same number of weights. Since sampling from Boltzmann distributions of a DBM is not classically feasible, this is evidence of the advantage of a non-Turing sampling oracle. We then develop a framework for training the network as a quantum Boltzmann machine (QBM) in the presence of a significant transverse field for reinforcement learning. This further improves the reinforcement learning method using DBMs.\nThe increasing availability of implicit feedback datasets has raised interest in developing effective collaborative filtering techniques able to deal asymmetrically with unambiguous positive feedback and ambiguous negative feedback.
In this paper, we propose a principled kernel-based collaborative filtering method for top-N item recommendation with implicit feedback. We present an efficient implementation using the linear kernel, and we show how to generalize it to kernels of the dot product family while preserving the efficiency. We also investigate the elements that influence the sparsity of a standard cosine kernel. This analysis shows that the sparsity of the kernel strongly depends on the properties of the dataset, in particular on the long-tail distribution. We compare our method with state-of-the-art algorithms, achieving good results in terms of both efficiency and effectiveness.\nModeling continuous-time physiological processes that manifest a patient's evolving clinical states is a key step in approaching many problems in healthcare. In this paper, we develop the Hidden Absorbing Semi-Markov Model (HASMM): a versatile probabilistic model that is capable of capturing modern electronic health record (EHR) data. Unlike existing models, an HASMM accommodates irregularly sampled, temporally correlated, and informatively censored physiological data, and can describe non-stationary clinical state transitions. Learning an HASMM from the EHR data is achieved via a novel forward-filtering backward-sampling Monte-Carlo EM algorithm that exploits the knowledge of the end-point clinical outcomes (informative censoring) in the EHR data, and implements the E-step by sequentially sampling the patients' clinical states in the reverse-time direction while conditioning on the future states. Real-time inferences are drawn via a forward-filtering algorithm that operates on a virtually constructed discrete-time embedded Markov chain that mirrors the patient's continuous-time state trajectory.
We demonstrate the diagnostic and prognostic utility of the HASMM in a critical care prognosis setting using a real-world dataset for patients admitted to the Ronald Reagan UCLA Medical Center.\nThe mathematical formalism of quantum theory exhibits significant effectiveness when applied to cognitive phenomena that have resisted traditional (set-theoretical) modeling. Relying on a decade of research on the operational foundations of micro-physical and conceptual entities, we present a theoretical framework for the representation of concepts and their conjunctions and disjunctions that uses the quantum formalism. This framework provides a unified solution to the 'conceptual combinations problem' of cognitive psychology, explaining the observed deviations from classical (Boolean, fuzzy set and Kolmogorovian) structures in terms of genuine quantum effects. In particular, natural concepts 'interfere' when they combine to form more complex conceptual entities, and they also exhibit a 'quantum-type context-dependence', which are responsible for the 'over- and under-extension' that are systematically observed in experiments on membership judgments.\nRecently, the attention mechanism has played a key role in achieving high performance for Neural Machine Translation models. However, as it computes a score function for the encoder states in all positions at each decoding step, the attention model greatly increases the computational complexity. In this paper, we investigate the adequate vision span of attention models in the context of machine translation, by proposing a novel attention framework that is capable of reducing redundant score computation dynamically. The term \"vision span\" means a window of the encoder states considered by the attention model in one step.
In our experiments, we found that the average window size of the vision span can be reduced by over 50% with a modest loss in accuracy on English-Japanese and German-English translation tasks. These results indicate that the conventional attention mechanism performs a significant amount of redundant computation.\nIn this work we propose a novel representation learning model which computes semantic representations for tweets accurately. Our model systematically exploits the chronologically adjacent tweets ('context') from users' Twitter timelines for this task. Further, we make our model user-aware so that it can do well in modeling the target tweet by exploiting rich knowledge about the user, such as the way the user writes posts, and by summarizing the topics on which the user writes. We empirically demonstrate that the proposed models outperform the state-of-the-art models in predicting user profile attributes such as spouse, education and job by 19.66%, 2.27% and 2.22%, respectively.\nThe user equilibrium traffic assignment principle is very important in the traffic assignment problem. Traditional algorithms use mathematical programming models to solve the user equilibrium problem. Recently, Physarum has shown the ability to address the user equilibrium and system optimization traffic assignment problems. However, the Physarum model is not efficient in real traffic networks with two-way traffic characteristics and multiple origin-destination pairs. In this article, a modified Physarum-inspired model for the user equilibrium problem is proposed. By decomposing traffic flux based on origin nodes, the traffic flux from different origin-destination pairs can be distinguished in the proposed model. The Physarum can obtain the equilibrium traffic flux when no shorter path can be discovered between each origin-destination pair.
Finally, numerical examples use the Sioux Falls network to demonstrate the rationality and convergence properties of the proposed model.\nAnalyzing textual data is a very challenging task because of the huge volume of data generated daily. Fundamental issues in text analysis include the lack of structure in document datasets, the need for various preprocessing steps (e.g., stem or lemma extraction, part-of-speech tagging, named entity recognition), and performance and scaling issues. Existing text analysis architectures only partly solve these issues, as they provide restrictive data schemas, address only one aspect of text preprocessing and focus on a single task when dealing with performance optimization. As a result, no definitive solution is currently available. Thus, we propose in this paper a new generic text analysis architecture, where document structure is flexible, many preprocessing techniques are integrated and textual datasets are indexed for efficient access. We implement our conceptual architecture using both a relational and a document-oriented database. Our experiments demonstrate the feasibility of our approach and the superiority of the document-oriented logical and physical implementation.\nThis paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation. Specifically, we use unsupervised motion-based segmentation on videos to obtain segments, which we use as 'pseudo ground truth' to train a convolutional network to segment objects from a single frame. Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed 'pretext' tasks studied in the literature. Indeed, our extensive experiments show that this is the case.
When used for transfer learning on object detection, our representation significantly outperforms previous unsupervised approaches across multiple settings, especially when training data for the target task is scarce.\nWe consider a committee voting setting in which each voter approves of a subset of candidates and, based on the approvals, a target number of candidates are selected. Aziz et al. (2015) proposed two representation axioms called justified representation and extended justified representation. Whereas the former can be tested as well as achieved in polynomial time, the latter property is coNP-complete to test and no polynomial-time algorithm is known to achieve it. Interestingly, Sánchez-Fernández et al. (2016) proposed an intermediate property called proportional justified representation that can be achieved by a polynomial-time algorithm. The complexity of testing proportional justified representation has remained an open problem. In this paper, we settle the complexity by proving that testing proportional justified representation is coNP-complete. We complement the complexity result by showing that the problem admits efficient algorithms if any of the following parameters are bounded: (1) number of voters, (2) number of candidates, (3) maximum number of candidates approved by a voter, (4) maximum number of voters approving a given candidate.\nIn pattern classification, polynomial classifiers are well-studied methods as they are capable of generating complex decision surfaces. Unfortunately, the use of multivariate polynomials is limited to kernels as in support vector machines, because polynomials quickly become impractical for high-dimensional problems. In this paper, we effectively overcome the curse of dimensionality by employing the tensor train format to represent a polynomial classifier.
Based on the structure of tensor trains, two learning algorithms are proposed which involve solving different optimization problems of low computational complexity. Furthermore, we show how both regularization to prevent overfitting and parallelization, which enables the use of large training sets, are incorporated into these methods. Both the efficiency and efficacy of our tensor-based polynomial classifier are then demonstrated on the two popular datasets USPS and MNIST.\nThis paper analyzes customer product-choice behavior based on the recency and frequency of each customer's page views on e-commerce sites. Recently, we devised an optimization model for estimating product-choice probabilities that satisfy monotonicity, convexity, and concavity constraints with respect to recency and frequency. This shape-restricted model delivered high predictive performance even when there were few training samples. However, typical e-commerce sites deal in many different varieties of products, so the predictive performance of the model can be further improved by integration of such product heterogeneity. For this purpose, we develop a novel latent-class shape-restricted model for estimating product-choice probabilities for each latent class of products. We also give a tailored expectation-maximization algorithm for parameter estimation. Computational results demonstrate that higher predictive performance is achieved with our latent-class model than with the previous shape-restricted model and common latent-class logistic regression.\nWhen building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help, but have strong biases that models can exploit to correctly answer questions without reasoning. They also conflate multiple sources of error, making it hard to pinpoint model weaknesses. 
We present a diagnostic dataset that tests a range of visual reasoning abilities. It contains minimal biases and has detailed annotations describing the kind of reasoning each question requires. We use this dataset to analyze a variety of modern visual reasoning systems, providing novel insights into their abilities and limitations.\nEvaluating agent performance when outcomes are stochastic and agents use randomized strategies can be challenging when limited data is available. The variance of sampled outcomes may make the simple approach of Monte Carlo sampling inadequate. This is the case for agents playing heads-up no-limit Texas hold'em poker, where man-machine competitions have involved multiple days of consistent play and have still not resulted in statistically significant conclusions even when the winner's margin is substantial. In this paper, we introduce AIVAT, a low-variance, provably unbiased value assessment tool that uses an arbitrary heuristic estimate of state value, as well as the explicit strategy of a subset of the agents. Unlike existing techniques which reduce the variance from chance events, or only consider game-ending actions, AIVAT reduces the variance both from choices by nature and by players with a known strategy. The resulting estimator in no-limit poker can reduce the number of hands needed to draw statistical conclusions by more than a factor of 10.\nIn many personalized recommendation problems, the available data consists only of positive interactions (implicit feedback) between users and items. This problem is also known as One-Class Collaborative Filtering (OC-CF). Linear models usually achieve state-of-the-art performance on OC-CF problems, and many efforts have been devoted to building more expressive and complex representations able to improve the recommendations. Recent analyses show that collaborative filtering (CF) datasets have peculiar characteristics such as high sparsity and a long-tailed distribution of the ratings.
In this paper we propose a boolean kernel, called Disjunctive kernel, which is less expressive than the linear one but is able to alleviate the sparsity issue in CF contexts. The embedding of this kernel is composed of all the combinations of a certain arity d of the input variables, and these combined features are semantically interpreted as disjunctions of the input variables. Experiments on several CF datasets show the effectiveness and the efficiency of the proposed kernel.\nThe rise in complexity of technical systems also raises the knowledge required to set them up and to maintain them. The cost to evolve such systems can be prohibitive. In the field of Autonomic Computing, technical systems should therefore have various self-healing capabilities allowing system owners to provide only partial, potentially inconsistent updates of the system. The self-healing or self-integrating system shall find out the remaining changes to communications and functionalities in order to accommodate change and yet still restore function. This issue becomes even more interesting in the context of the Internet of Things and the Industrial Internet, where previously unexpected device combinations can be assembled in order to provide a surprising new function. In order to pursue higher levels of self-integration capabilities I propose to think of self-integration as sophisticated error correcting communications. Therefore, this paper discusses an extended scope of error correction with the purpose of emphasizing error correction's role as an integrated element of bi-directional communication channels in self-integrating, autonomic communication scenarios.\nWe introduce a new family of graphical models that consists of graphs with possibly directed, undirected and bidirected edges but without directed cycles. We show that these models are suitable for representing causal models with additive error terms.
We provide a set of sufficient graphical criteria for the identification of arbitrary causal effects when the new models contain directed and undirected edges but no bidirected edge. We also provide a necessary and sufficient graphical criterion for the identification of the causal effect of a single variable on the rest of the variables. Moreover, we develop an exact algorithm for learning the new models from observational and interventional data via answer set programming. Finally, we introduce gated models for causal effect identification, a new family of graphical models that exploits context specific independences to identify additional causal effects.\nWhile the solution counting problem for propositional satisfiability (#SAT) has received renewed attention in recent years, this research trend has not affected other AI solving paradigms like answer set programming (ASP). Although ASP solvers are designed to enumerate all solutions, and counting can therefore be easily done, the involved materialization of all solutions is a clear bottleneck for the counting problem of ASP (#ASP). In this paper we propose dynamic programming-based #ASP algorithms that exploit the structure of the underlying (ground) ASP program. Experimental results for a prototype implementation show promise when compared to existing solvers.\nConnections between relations in relation extraction, which we call class ties, are common. In the distantly supervised scenario, one entity tuple may have multiple relation facts. Exploiting class ties between relations of one entity tuple will be promising for distantly supervised relation extraction. However, previous models are either ineffective or fail to model this property. In this work, to effectively leverage class ties, we propose to make joint relation extraction with a unified model that integrates a convolutional neural network (CNN) with a general pairwise ranking framework, in which three novel ranking loss functions are introduced.
Additionally, an effective method is presented to relieve the severe class imbalance problem caused by NR (not relation) for model training. Experiments on a widely used dataset show that leveraging class ties will enhance extraction and demonstrate the effectiveness of our model in learning class ties. Our model outperforms the baselines significantly, achieving state-of-the-art performance.\nThe past year saw the introduction of new architectures such as Highway networks and Residual networks which, for the first time, enabled the training of feedforward networks with dozens to hundreds of layers using simple gradient descent. While depth of representation has been posited as a primary reason for their success, there are indications that these architectures defy a popular view of deep learning as a hierarchical computation of increasingly abstract features at each layer.   In this report, we argue that this view is incomplete and does not adequately explain several recent findings. We propose an alternative viewpoint based on unrolled iterative estimation -- a group of successive layers iteratively refines their estimates of the same features instead of computing an entirely new representation. We demonstrate that this viewpoint directly leads to the construction of Highway and Residual networks. Finally we provide preliminary experiments to discuss the similarities and differences between the two architectures.\nIn this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is able to capture underlying sources of variations in the temporal sequences over very long time spans, on three datasets of different nature. Human evaluation of the generated samples indicates that our model is preferred over competing models.
We also show how each component of the model contributes to the exhibited performance.\nThe Kaczmarz method is an iterative algorithm for solving systems of linear equalities and inequalities, that iteratively projects onto these constraints. Recently, Strohmer and Vershynin [J. Fourier Anal. Appl., 15(2):262-278, 2009] gave a non-asymptotic convergence rate analysis for this algorithm, spurring numerous extensions and generalizations of the Kaczmarz method. Rather than the randomized selection rule analyzed in that work, in this paper we instead discuss greedy and approximate greedy selection rules. We show that in some applications the computational costs of greedy and random selection are comparable, and that in many cases greedy selection rules give faster convergence rates than random selection rules. Further, we give the first multi-step analysis of Kaczmarz methods for a particular greedy rule, and propose a provably-faster randomized selection rule for matrices with many pairwise-orthogonal rows.\nSingle image super-resolution is the task of inferring a high-resolution image from a single low-resolution input. Traditionally, the performance of algorithms for this task is measured using pixel-wise reconstruction measures such as peak signal-to-noise ratio (PSNR) which have been shown to correlate poorly with the human perception of image quality. As a result, algorithms minimizing these metrics tend to produce over-smoothed images that lack high-frequency textures and do not look natural despite yielding high PSNR values.   We propose a novel application of automated texture synthesis in combination with a perceptual loss focusing on creating realistic textures rather than optimizing for a pixel-accurate reproduction of ground truth images during training. By using feed-forward fully convolutional neural networks in an adversarial training setting, we achieve a significant boost in image quality at high magnification ratios. 
Extensive experiments on a number of datasets show the effectiveness of our approach, yielding state-of-the-art results in both quantitative and qualitative benchmarks.\nQuantum-inspired Evolutionary Algorithms were proposed more than a decade ago and have been employed for solving a wide range of difficult search and optimization problems. A number of changes have been proposed to improve the performance of canonical QEA. However, canonical QEA is one of the few evolutionary algorithms that use a search operator with a relatively large number of parameters. It is well known that the performance of evolutionary algorithms depends on the specific values of parameters for a given problem. The advantage of having a large number of parameters in an operator is that the search process can be made more powerful even with a single operator without requiring a combination of other operators for exploration and exploitation. However, the tuning of operators with a large number of parameters is complex and computationally expensive. This paper proposes a novel heuristic method for tuning parameters of canonical QEA. The tuned QEA outperforms canonical QEA on a class of discrete combinatorial optimization problems, which validates the design of the proposed parameter tuning framework. The proposed framework can be used for tuning other algorithms with both large and small numbers of tunable parameters.\nAlgorithms which sort lists of real numbers into ascending order have been studied for decades. They are typically based on a series of pairwise comparisons and run entirely on chip. However, people routinely sort lists which depend on subjective or complex judgements that cannot be automated. Examples include marketing research, where surveys are used to learn about customer preferences for products; the recruiting process, where interviewers attempt to rank potential employees; and sporting tournaments, where we infer team rankings from a series of one-on-one matches.
We develop a novel sorting algorithm, where each pairwise comparison reflects a subjective human judgement about which element is bigger or better. We introduce a finite and large error rate to each judgement, and we take the cost of each comparison to significantly exceed the cost of other computational steps. The algorithm must request the most informative sequence of comparisons from the user in order to identify the correct sorted list with minimum human input. Our Discrete Adiabatic Monte Carlo approach exploits the gradual acquisition of information by tracking a set of plausible hypotheses which are updated after each additional comparison.\nAUC (Area under the ROC curve) is an important performance measure for applications where the data is highly imbalanced. Learning to maximize AUC performance is thus an important research problem. Using a max-margin based surrogate loss function, the AUC optimization problem can be approximated as a pairwise rankSVM learning problem. Batch learning methods for solving the kernelized version of this problem suffer from poor scalability and may not result in sparse classifiers. Recent years have witnessed an increased interest in the development of online or single-pass online learning algorithms that design a classifier by maximizing the AUC performance. The AUC performance of nonlinear classifiers, designed using online methods, is not comparable with that of nonlinear classifiers designed using batch learning algorithms on many real-world datasets. Motivated by these observations, we design a scalable algorithm for maximizing AUC performance by greedily adding the required number of basis functions into the classifier model. The resulting sparse classifiers perform faster inference.
Our experimental results show that the level of sparsity achievable can be an order of magnitude smaller than the Kernel RankSVM model without affecting the AUC performance much.\nWe propose to apply Simplicity Theory (ST) to model interest in creative situations. ST has been designed to describe and predict interest in communication. Here we use ST to derive a decision rule that we apply to a simplified version of a creative game, the Poietic Generator. The decision rule produces what can be regarded as an elementary form of creativity. This study is meant as a proof of principle. It suggests that some creative actions may be motivated by the search for unexpected simplicity.\nObjects appear to scale differently in natural images. This fact requires methods dealing with object-centric tasks (e.g. object proposal) to have robust performance over variances in object scales. In this paper, we present a novel segment proposal framework, namely FastMask, which takes advantage of hierarchical features in deep convolutional neural networks to segment multi-scale objects in one shot. Innovatively, we adapt the segment proposal network into three different functional components (body, neck and head). We further propose a weight-shared residual neck module as well as a scale-tolerant attentional head module for efficient one-shot inference. On the MS COCO benchmark, the proposed FastMask outperforms all state-of-the-art segment proposal methods in average recall while being 2~5 times faster. Moreover, with a slight trade-off in accuracy, FastMask can segment objects in near real time (~13 fps) with 800*600 resolution images, demonstrating its potential in practical applications. Our implementation is available at https://github.com/voidrank/FastMask.\nDespite the pain and limited accuracy of blood tests for early recognition of cardiovascular disease, they dominate risk screening and triage.
On the other hand, heart rate variability is non-invasive and cheap, but not considered accurate enough for clinical practice. Here, we tackle heart beat interval based classification with deep learning. We introduce an end to end differentiable hybrid architecture, consisting of a layer of biological neuron models of cardiac dynamics (modified FitzHugh Nagumo neurons) and several layers of a standard feed-forward neural network. The proposed model is evaluated on ECGs from 474 stable at-risk (coronary artery disease) patients, and 1172 chest pain patients of an emergency department. We show that it can significantly outperform models based on traditional heart rate variability predictors, as well as approaching or in some cases outperforming clinical blood tests, based only on 60 seconds of inter-beat intervals.\nWe propose a new formalism for specifying and reasoning about problems that involve heterogeneous \"pieces of information\" -- large collections of data, decision procedures of any kind and complexity and connections between them. The essence of our proposal is to lift Codd's relational algebra from operations on relational tables to operations on classes of structures (with recursion), and to add a direction of information propagation. We observe the presence of information propagation in several formalisms for efficient reasoning and use it to express unary negation and operations used in graph databases. We carefully analyze several reasoning tasks and establish a precise connection between a generalized query evaluation and temporal logic model checking. Our development allows us to reveal a general correspondence between classical and modal logics and may shed a new light on the good computational properties of modal logics and related formalisms.\nTemporal Difference learning or TD($\\lambda$) is a fundamental algorithm in the field of reinforcement learning. 
However, setting TD's $\\lambda$ parameter, which controls the timescale of TD updates, is generally left up to the practitioner. We formalize the $\\lambda$ selection problem as a bias-variance trade-off where the solution is the value of $\\lambda$ that leads to the smallest Mean Squared Value Error (MSVE). To solve this trade-off we suggest applying Leave-One-Trajectory-Out Cross-Validation (LOTO-CV) to search the space of $\\lambda$ values. Unfortunately, this approach is too computationally expensive for most practical applications. For Least Squares TD (LSTD) we show that LOTO-CV can be implemented efficiently to automatically tune $\\lambda$ and apply function optimization methods to efficiently search the space of $\\lambda$ values. The resulting algorithm, ALLSTD, is parameter free and our experiments demonstrate that ALLSTD is significantly computationally faster than the na\\\"{i}ve LOTO-CV implementation while achieving similar performance.\nReferring expressions are natural language constructions used to identify particular objects within a scene. In this paper, we propose a unified framework for the tasks of referring expression comprehension and generation. Our model is composed of three modules: speaker, listener, and reinforcer. The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions. The listener-speaker modules are trained jointly in an end-to-end learning framework, allowing the modules to be aware of one another during learning while also benefiting from the discriminative reinforcer's feedback. We demonstrate that this unified framework and training achieves state-of-the-art results for both comprehension and generation on three referring expression datasets. 
Project and demo page: https://vision.cs.unc.edu/refer\nIn a Web Advertising Traffic Operation it's necessary to manage the day-to-day trafficking, pacing and optimization of digital and paid social campaigns. The data analyst on Traffic Operation can not only provide answers quickly but also speak the language of the Process Manager and visually display the discovered process problems. In order to solve a growing number of complaints in the customer service process, the weaknesses in the process itself must be identified and communicated to the department. With the help of Process Mining for the CRM data it is possible to identify unwanted loops and delays in the process. With this paper we propose a process discovery approach based on Machine Learning techniques to automatically discover variations and detect at first glance what the problem is, and undertake corrective measures.\nNon-negative matrix factorization (NMF) is a problem with many applications, ranging from facial recognition to document clustering. However, due to the variety of algorithms that solve NMF, the randomness involved in these algorithms, and the somewhat subjective nature of the problem, there is no clear \"correct answer\" to any particular NMF problem, and as a result, it can be hard to test new algorithms. This paper suggests some test cases for NMF algorithms derived from matrices with enumerable exact non-negative factorizations and perturbations of these matrices. Three algorithms using widely divergent approaches to NMF all give similar solutions over these test cases, suggesting that these test cases could be used as test cases for implementations of these existing NMF algorithms as well as potentially new NMF algorithms. This paper also describes how the proposed test cases could be used in practice.\nThe Human Phenotype Ontology (HPO) is a structured repository of concepts (HPO Terms) that are associated with one or more diseases. The process of association is referred to as annotation.
The relevance and the specificity of both HPO terms and annotations are evaluated by a measure defined as Information Content (IC). The analysis of annotated data is thus an important challenge for bioinformatics. There exist different approaches to analysis. Among them, the use of Association Rules (AR) may provide useful knowledge, and it has been used in some applications, e.g. improving the quality of annotations. Nevertheless, classical association rule algorithms take into account neither the source of annotation nor its importance, which leads to the generation of candidate rules with low IC. This paper presents HPO-Miner (Human Phenotype Ontology-based Weighted Association Rules), a methodology for extracting Weighted Association Rules. HPO-Miner can extract relevant rules from a biological point of view. A case study using HPO-Miner on publicly available HPO annotation datasets is used to demonstrate the effectiveness of our methodology.\nThe Workshops on (Constraint) Logic Programming (WLP) are the annual meeting of the German Society of Logic Programming (Gesellschaft f\\\"ur Logische Programmierung e.V., GLP) and bring together researchers interested in logic programming, constraint programming, answer set programming, and related areas like databases and artificial intelligence (not only from Germany).   The International Workshops on Functional and (Constraint) Logic Programming (WFLP) aim at bringing together researchers, students, and practitioners interested in functional programming, logic programming, and their integration.   
The workshops have a tradition of co-location to promote the cross-fertilizing exchange of ideas and experiences among and between the communities interested in the foundations, applications, and combinations of high-level, declarative programming languages and related areas.\nMany robotic planning applications involve continuous actions with highly non-linear constraints, which cannot be modeled using modern planners that construct a propositional representation. We introduce STRIPStream: an extension of the STRIPS language which can model these domains by supporting the specification of blackbox generators to handle complex constraints. The outputs of these generators interact with actions through possibly infinite streams of objects and static predicates. We provide two algorithms which both reduce STRIPStream problems to a sequence of finite-domain planning problems. The representation and algorithms are entirely domain independent. We demonstrate our framework on simple illustrative domains, and then on a high-dimensional, continuous robotic task and motion planning domain.\nIn the past, several models of consciousness have become popular and have led to the development of models for machine consciousness with varying degrees of success and challenges for simulation and implementations. Moreover, affective computing attributes that involve emotions, behavior and personality have not been the focus of models of consciousness as they lacked motivation for deployment in software applications and robots. The affective attributes are important factors for the future of machine consciousness with the rise of technologies that can assist humans. Personality and affection hence can give an additional flavor for the computational model of consciousness in humanoid robotics. 
Recent advances in areas of machine learning with a focus on deep learning can further help in developing aspects of machine consciousness in areas that can better replicate human sensory perceptions such as speech recognition and vision. With such advancements, one encounters further challenges in developing models that can synchronize different aspects of affective computing. In this paper, we review some existing models of consciousness and present an affective computational model that would enable the human touch and feel for robotic systems.\nWe address the problem of locating facilities on the $[0,1]$ interval based on reports from strategic agents. The cost of each agent is her distance to the closest facility, and the global objective is to minimize either the maximum cost of an agent or the social cost.   As opposed to the extensive literature on facility location which considers the multiplicative error, we focus on minimizing the worst-case additive error. Minimizing the additive error incentivizes mechanisms to adapt to the size of the instance. I.e., mechanisms can sacrifice little efficiency in small instances (location profiles in which all agents are relatively close to one another), in order to gain more [absolute] efficiency in large instances. We argue that this measure is better suited for many manifestations of the facility location problem in various domains.   We present tight bounds for mechanisms locating a single facility in both deterministic and randomized cases. We further provide several extensions for locating multiple facilities.\nIn this paper, a non-probabilistic method based on fuzzy logic is used to update finite element models (FEMs). Model updating techniques use the measured data to improve the accuracy of numerical models of structures. However, the measured data are contaminated with experimental noise and the models are inaccurate due to randomness in the parameters.
This kind of aleatory uncertainty is irreducible, and may decrease the accuracy of the finite element model updating process. However, uncertainty quantification methods can be used to identify the uncertainty in the updating parameters. In this paper, the uncertainties associated with the modal parameters are defined as fuzzy membership functions, while the model updating procedure is defined as an optimization problem at each {\\alpha}-cut level. To determine the membership functions of the updated parameters, an objective function is defined and minimized using two metaheuristic optimization algorithms: ant colony optimization (ACO) and particle swarm optimization (PSO). A structural example is used to investigate the accuracy of the fuzzy model updating strategy using the PSO and ACO algorithms. Furthermore, the results obtained by the fuzzy finite element model updating are compared with the Bayesian model updating results.\nLifted probabilistic inference (Poole, 2003) and symbolic dynamic programming for lifted stochastic planning (Boutilier et al, 2001) were introduced around the same time as algorithmic efforts to use abstraction in stochastic systems. Over the years, these ideas evolved into two distinct lines of research, each supported by a rich literature. Lifted probabilistic inference focused on efficient arithmetic operations on template-based graphical models under a finite domain assumption while symbolic dynamic programming focused on supporting sequential decision-making in rich quantified logical action models and on open domain reasoning. Given their common motivation but different focal points, both lines of research have yielded highly complementary innovations. In this chapter, we aim to help close the gap between these two research areas by providing an overview of lifted stochastic planning from the perspective of probabilistic inference, showing strong connections to other chapters in this book. 
This also allows us to define Generalized Lifted Inference as a paradigm that unifies these areas and elucidates open problems for future research that can benefit both lifted inference and stochastic planning.\nThis paper reviews the current status and challenges of Neural Networks (NNs) based machine learning approaches for modern power grid stability control including their design and implementation methodologies. NNs are widely accepted as Artificial Intelligence (AI) approaches offering an alternative way to control complex and ill-defined problems. In this paper, various applications of NNs to the power system rotor angle stabilization and control problem are discussed. The main focus of this paper is on the use of Reinforcement Learning (RL) and Supervised Learning (SL) algorithms in power system wide-area control (WAC). Generally, due to their capability of modeling nonlinearities and uncertainties, these algorithms are used for transient classification, neuro-control, wide-area monitoring and control, renewable energy management and control, and so on. The works of researchers in the field of conventional and renewable energy systems are reported and categorized. The paper concludes by presenting, comparing and evaluating various learning techniques and infrastructure configurations based on efficiency.\nIn this paper, we study learning generalized driving style representations from automobile GPS trip data. We propose a novel Autoencoder Regularized deep neural Network (ARNet) and a trip encoding framework trip2vec to learn drivers' driving styles directly from GPS records, by combining supervised and unsupervised feature learning in a unified architecture.
Experiments on a challenging driver number estimation problem and the driver identification problem show that ARNet can learn a good generalized driving style representation: It significantly outperforms existing methods and alternative architectures by reaching the least estimation error on average (0.68, less than one driver) and the highest identification accuracy (by at least 3% improvement) compared with traditional supervised learning methods.\nExisting multi-objective reinforcement learning (MORL) algorithms do not account for objectives that arise from players with differing beliefs. Concretely, consider two players with different beliefs and utility functions who may cooperate to build a machine that takes actions on their behalf. A representation is needed for how much the machine's policy will prioritize each player's interests over time. Assuming the players have reached common knowledge of their situation, this paper derives a recursion that any Pareto optimal policy must satisfy. Two qualitative observations can be made from the recursion: the machine must (1) use each player's own beliefs in evaluating how well an action will serve that player's utility function, and (2) shift the relative priority it assigns to each player's expected utilities over time, by a factor proportional to how well that player's beliefs predict the machine's inputs. Observation (2) represents a substantial divergence from na\\\"{i}ve linear utility aggregation (as in Harsanyi's utilitarian theorem, and existing MORL algorithms), which is shown here to be inadequate for Pareto optimal sequential decision-making on behalf of players with different beliefs.\nIn de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. 
We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active towards a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target.   Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria) it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery.\nTwo main techniques have been used so far to solve the #P-hard problem #SAT. The first one, used in practice, is based on an extension of DPLL for model counting called exhaustive DPLL. The second approach, more theoretical, exploits the structure of the input to compute the number of satisfying assignments, usually by using a dynamic programming scheme on a decomposition of the formula. In this paper, we make a first step toward the separation of these two techniques by exhibiting a family of formulas that can be solved in polynomial time with the first technique but requires exponential time with the second one. We show this by observing that both techniques implicitly construct a very specific boolean circuit equivalent to the input formula. We then show that every beta-acyclic formula can be represented by a polynomial size circuit corresponding to the first method and exhibit a family of beta-acyclic formulas which cannot be represented by polynomial size circuits corresponding to the second method. This result sheds new light on the complexity of #SAT and related problems on beta-acyclic formulas. As a byproduct, we give new handy tools to design algorithms on beta-acyclic hypergraphs.\nEntities are essential elements of natural language.
In this paper, we present methods for learning multi-level representations of entities on three complementary levels: character (character patterns in entity names extracted, e.g., by neural networks), word (embeddings of words in entity names) and entity (entity embeddings). We investigate state-of-the-art learning methods on each level and find large differences, e.g., for deep learning models, traditional ngram features and the subword model of fasttext (Bojanowski et al., 2016) on the character level; for word2vec (Mikolov et al., 2013) on the word level; and for the order-aware model wang2vec (Ling et al., 2015a) on the entity level. We confirm experimentally that each level of representation contributes complementary information and a joint representation of all three levels improves the existing embedding based baseline for fine-grained entity typing by a large margin. Additionally, we show that adding information from entity descriptions further improves multi-level representations of entities.\nWe present a general framework, the coupled compound Poisson factorization (CCPF), to capture the missing-data mechanism in extremely sparse data sets by coupling a hierarchical Poisson factorization with an arbitrary data-generating model. We derive a stochastic variational inference algorithm for the resulting model and, as examples of our framework, implement three different data-generating models---a mixture model, linear regression, and factor analysis---to robustly model non-random missing data in the context of clustering, prediction, and matrix factorization. 
In all three cases, we test our framework against models that ignore the missing-data mechanism on large-scale studies with non-random missing data, and we show that explicitly modeling the missing-data mechanism substantially improves the quality of the results, as measured using data log likelihood on a held-out test set.\nDespite enormous progress in object detection and classification, the problem of incorporating expected contextual relationships among object instances into modern recognition systems remains a key challenge. In this work we propose Information Pursuit, a Bayesian framework for scene parsing that combines prior models for the geometry of the scene and the spatial arrangement of object instances with a data model for the output of high-level image classifiers trained to answer specific questions about the scene. In the proposed framework, the scene interpretation is progressively refined as evidence accumulates from the answers to a sequence of questions. At each step, we choose the question to maximize the mutual information between the new answer and the full interpretation given the current evidence obtained from previous inquiries. We also propose a method for learning the parameters of the model from synthesized, annotated scenes obtained by top-down sampling from an easy-to-learn generative scene model. Finally, we introduce a database of annotated indoor scenes of dining room tables, which we use to evaluate the proposed approach.\nMaximizing product use is a central goal of many businesses, which makes retention and monetization two central analytics metrics in games. Player retention may refer to various duration variables quantifying product use: total playtime or session playtime are popular research targets, and active playtime is well-suited for subscription games. Such research often has the goal of increasing player retention or conversely decreasing player churn.
Survival analysis is a framework of powerful tools well suited for retention-type data. This paper contributes new methods to game analytics on how playtime can be analyzed using survival analysis without covariates. Survival and hazard estimates provide both a visual and an analytic interpretation of the playtime phenomena as a funnel-type nonparametric estimate. Metrics based on the survival curve can be used to aggregate this playtime information into a single statistic. Comparison of survival curves between cohorts provides a scientific A/B test. All these methods work on censored data and enable computation of confidence intervals. This is especially important for time- and sample-limited data, which occur during game development. Throughout this paper, we illustrate the application of these methods to real-world game development problems on the Hipster Sheep mobile game.\nDeep Reinforcement Learning has enabled the learning of policies for complex tasks in partially observable environments, without explicitly learning the underlying model of the tasks. While such model-free methods achieve considerable performance, they often ignore the structure of the task. We present a natural representation of Reinforcement Learning (RL) problems using Recurrent Convolutional Neural Networks (RCNNs), to better exploit this inherent structure. We define 3 such RCNNs, whose forward passes execute an efficient Value Iteration, propagate beliefs of state in partially observable environments, and choose optimal actions, respectively. Backpropagating gradients through these RCNNs allows the system to explicitly learn the Transition Model and Reward Function associated with the underlying MDP, serving as an elegant alternative to classical model-based RL. We evaluate the proposed algorithms in simulation, considering a robot planning problem.
We demonstrate the capability of our framework to reduce the cost of replanning, learn accurate MDP models, and finally re-plan with learnt models to achieve near-optimal policies.\nMulti-task learning (MTL) involves the simultaneous training of two or more related tasks over shared representations. In this work, we apply MTL to audio-visual automatic speech recognition (AV-ASR). Our primary task is to learn a mapping between audio-visual fused features and frame labels obtained from an acoustic GMM/HMM model. This is combined with an auxiliary task which maps visual features to frame labels obtained from a separate visual GMM/HMM model. The MTL model is tested at various levels of babble noise and the results are compared with a baseline hybrid DNN-HMM AV-ASR model. Our results indicate that MTL is especially useful at higher levels of noise. Compared to the baseline, up to 7\\% relative improvement in WER is reported at -3 dB SNR.\nHigher-order probabilistic programming languages allow programmers to write sophisticated models in machine learning and statistics in a succinct and structured way, but step outside the standard measure-theoretic formalization of probability theory. Programs may use both higher-order functions and continuous distributions, or even define a probability distribution on functions. But standard probability theory does not handle higher-order functions well: the category of measurable spaces is not Cartesian closed. Here we introduce quasi-Borel spaces. We show that these spaces: form a new formalization of probability theory replacing measurable spaces; form a Cartesian closed category and so support higher-order functions; form a well-pointed category and so support good proof principles for equational reasoning; and support continuous probability distributions.
We demonstrate the use of quasi-Borel spaces for higher-order functions and probability by: showing that a well-known construction of probability theory involving random functions gains a cleaner expression; and generalizing de Finetti's theorem, that is a crucial theorem in probability theory, to quasi-Borel spaces.\nWe introduce a simple and accurate neural model for dependency-based semantic role labeling. Our model predicts predicate-argument dependencies relying on states of a bidirectional LSTM encoder. The semantic role labeler achieves competitive performance on English, even without any kind of syntactic information and only using local inference. However, when automatically predicted part-of-speech tags are provided as input, it substantially outperforms all previous local models and approaches the best reported results on the English CoNLL-2009 dataset. We also consider Chinese, Czech and Spanish where our approach also achieves competitive results. Syntactic parsers are unreliable on out-of-domain data, so standard (i.e., syntactically-informed) SRL models are hindered when tested in this setting. Our syntax-agnostic model appears more robust, resulting in the best reported results on standard out-of-domain test sets.\nWe propose a novel decoding approach for neural machine translation (NMT) based on continuous optimisation. We convert decoding - basically a discrete optimization problem - into a continuous optimization problem. The resulting constrained continuous optimisation problem is then tackled using gradient-based methods. Our powerful decoding framework enables decoding intractable models such as the intersection of left-to-right and right-to-left (bidirectional) as well as source-to-target and target-to-source (bilingual) NMT models. 
Our empirical results show that our decoding framework is effective, and leads to substantial improvements in translations generated from the intersected models where the typical greedy or beam search is not feasible. We also compare our framework against reranking, and analyse its advantages and disadvantages.\nWe introduce an inference technique to produce discriminative context-aware image captions (captions that describe differences between images or visual concepts) using only generic context-agnostic training data (captions that describe a concept or an image in isolation). For example, given images and captions of \"siamese cat\" and \"tiger cat\", we generate language that describes the \"siamese cat\" in a way that distinguishes it from \"tiger cat\". Our key novelty is that we show how to do joint inference over a language model that is context-agnostic and a listener which distinguishes closely-related concepts. We first apply our technique to a justification task, namely to describe why an image contains a particular fine-grained category as opposed to another closely-related category of the CUB-200-2011 dataset. We then study discriminative image captioning to generate language that uniquely refers to one of two semantically-similar images in the COCO dataset. Evaluations with discriminative ground truth for justification and human studies for discriminative image captioning reveal that our approach outperforms baseline generative and speaker-listener approaches for discrimination.\nIn this paper, we try to predict the winning team of a match in the multiplayer eSports game Dota 2. To address the weaknesses of previous work, we consider more aspects of prior (pre-match) features from individual players' match history, as well as real-time (during-match) features at each minute as the match progresses. We use logistic regression, the proposed Attribute Sequence Model, and their combinations as the prediction models. 
In a dataset of 78362 matches where 20631 matches contain replay data, our experiments show that adding more aspects of prior features improves accuracy from 58.69% to 71.49%, and introducing real-time features achieves up to 93.73% accuracy when predicting at the 40th minute.\nFirst-Order Logic (FOL) is widely regarded as one of the most important foundations for knowledge representation. Nevertheless, in this paper, we argue that FOL has several critical issues for this purpose. Instead, we propose an alternative called assertional logic, in which all syntactic objects are categorized as set theoretic constructs including individuals, concepts and operators, and all kinds of knowledge are formalized by equality assertions. We first present a primitive form of assertional logic that uses minimal assumed knowledge and constructs. Then, we show how to extend it by definitions, which are special kinds of knowledge, i.e., assertions. We argue that assertional logic, although simpler, is more expressive and extensible than FOL. As a case study, we show how assertional logic can be used to unify logic and probability, and more building blocks in AI.\nSince Leonard Savage's epoch-making \"Foundations of Statistics\", Subjective Expected Utility Theory has been the presumptive model for decision-making. Savage provided an act-based axiomatization of standard expected utility theory. In this article, we provide a Savage-like axiomatization of nonstandard expected utility theory. It corresponds to a weakening of Savage's 6th axiom.\nIn this paper we investigate the links between instantiated argumentation systems and the axioms for non-monotonic reasoning described in [9] with the aim of characterising the nature of argument based reasoning. In doing so, we consider two possible interpretations of the consequence relation, and describe which axioms are met by ASPIC+ under each of these interpretations. 
We then consider the links between these axioms and the rationality postulates. Our results indicate that argument based reasoning as characterised by ASPIC+ is - according to the axioms of [9] - non-cumulative and non-monotonic, and therefore weaker than the weakest non-monotonic reasoning systems they considered possible. This weakness underpins ASPIC+'s success in modelling other reasoning systems, and we conclude by considering the relationship between ASPIC+ and other weak logical systems.\nWe propose Edward, a Turing-complete probabilistic programming language. Edward defines two compositional representations---random variables and inference. By treating inference as a first class citizen, on a par with modeling, we show that probabilistic programming can be as flexible and computationally efficient as traditional deep learning. For flexibility, Edward makes it easy to fit the same model using a variety of composable inference methods, ranging from point estimation to variational inference to MCMC. In addition, Edward can reuse the modeling representation as part of inference, facilitating the design of rich variational models and generative adversarial networks. For efficiency, Edward is integrated into TensorFlow, providing significant speedups over existing probabilistic systems. For example, we show on a benchmark logistic regression task that Edward is at least 35x faster than Stan and 6x faster than PyMC3. Further, Edward incurs no runtime overhead: it is as fast as handwritten TensorFlow.\nTask-oriented dialogue focuses on conversational agents that participate in user-initiated dialogues on domain-specific topics. In contrast to chatbots, which simply seek to sustain open-ended meaningful discourse, existing task-oriented agents usually explicitly model user intent and belief states. 
This paper examines bypassing such an explicit representation by depending on a latent neural embedding of state and learning selective attention to dialogue history together with copying to incorporate relevant prior context. We complement recent work by showing the effectiveness of simple sequence-to-sequence neural architectures with a copy mechanism. Our model outperforms more complex memory-augmented models by 7% in per-response generation and is on par with the current state-of-the-art on DSTC2.\nProviding Reinforcement Learning agents with expert advice can dramatically improve various aspects of learning. Prior work has developed teaching protocols that enable agents to learn efficiently in complex environments; many of these methods tailor the teacher's guidance to agents with a particular representation or underlying learning scheme, offering effective but specialized teaching procedures. In this work, we explore protocol programs, an agent-agnostic schema for Human-in-the-Loop Reinforcement Learning. Our goal is to incorporate the beneficial properties of a human teacher into Reinforcement Learning without making strong assumptions about the inner workings of the agent. We show how to represent existing approaches such as action pruning, reward shaping, and training in simulation as special cases of our schema and conduct preliminary experiments on simple domains.\nThe combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable. However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present opportunities for abstraction in environments where no two situations are exactly alike. 
In this work, we investigate approximate state abstractions, which treat nearly-identical situations as equivalent. We present theoretical guarantees of the quality of behaviors derived from four types of approximate abstractions. Additionally, we empirically demonstrate that approximate abstractions lead to a reduction in task complexity and bounded loss of optimality of behavior in a variety of environments.\nWe study characteristics of receptive fields of units in deep convolutional networks. The receptive field size is a crucial issue in many visual tasks, as the output must respond to large enough areas in the image to capture information about large objects. We introduce the notion of an effective receptive field, and show that it both has a Gaussian distribution and only occupies a fraction of the full theoretical receptive field. We analyze the effective receptive field in several architecture designs, and the effect of nonlinear activations, dropout, sub-sampling and skip connections on it. This leads to suggestions for ways to address its tendency to be too small.\nIn this paper, we improve the previously best known regret bound to achieve $\\epsilon$-differential privacy in oblivious adversarial bandits from $\\mathcal{O}{(T^{2/3}/\\epsilon)}$ to $\\mathcal{O}{(\\sqrt{T} \\ln T /\\epsilon)}$. This is achieved by combining a Laplace Mechanism with EXP3. We show that though EXP3 is already differentially private, it leaks a linear amount of information in $T$. However, we can improve this privacy by relying on its intrinsic exponential mechanism for selecting actions. This allows us to reach $\\mathcal{O}{(\\sqrt{\\ln T})}$-DP, with a regret of $\\mathcal{O}{(T^{2/3})}$ that holds against an adaptive adversary, an improvement over the best known $\\mathcal{O}{(T^{3/4})}$. This is done by using an algorithm that runs EXP3 in a mini-batch loop.
Finally, we run experiments that clearly demonstrate the validity of our theoretical analysis.\nWe present a novel extension of Thompson Sampling for stochastic sequential decision problems with graph feedback, even when the graph structure itself is unknown and/or changing. We provide theoretical guarantees on the Bayesian regret of the algorithm, linking its performance to the underlying properties of the graph. Thompson Sampling has the advantage of being applicable without the need to construct complicated upper confidence bounds for different problems. We illustrate its performance through extensive experimental results on real and simulated networks with graph feedback. More specifically, we tested our algorithms on power-law, planted-partition and Erdős-Rényi graphs, as well as on graphs derived from Facebook and Flixster data. These all show that our algorithms clearly outperform related methods that employ upper confidence bounds, even if the latter use more information about the graph.\nThe problem where a tropical cyclone intensifies dramatically within a short period of time is known as rapid intensification. This has been one of the major challenges for tropical weather forecasting. Recurrent neural networks have been promising for time series problems, which makes them appropriate for predicting rapid intensification. In this paper, recurrent neural networks are used to predict rapid intensification cases of tropical cyclones from the South Pacific and South Indian Ocean regions. A class imbalance problem is encountered, which makes it very challenging to achieve promising performance. A simple strategy was proposed to include more positive cases for detection where the false positive rate was slightly improved. The limitations of building an efficient system remain due to the challenges of addressing the class imbalance problem encountered for rapid intensification prediction.
This motivates further research in using innovative machine learning methods.\nMost existing community-related studies focus on detection, which aim to find the community membership for each user from user friendship links. However, membership alone, without a complete profile of what a community is and how it interacts with other communities, has limited applications. This motivates us to consider systematically profiling the communities and thereby developing useful community-level applications. In this paper, we formalize the concept of community profiling for the first time. With rich user information on the network, such as user-published content and user diffusion links, we characterize a community in terms of both its internal content profile and external diffusion profile. The difficulty of community profiling is often underestimated. We identify three unique challenges and propose a joint Community Profiling and Detection (CPD) model to address them accordingly. We also contribute a scalable inference algorithm, which scales linearly with the data size and is easily parallelizable. We evaluate CPD on large-scale real-world data sets, and show that it is significantly better than the state-of-the-art baselines in various tasks.\nOptimization is becoming a crucial element in industrial applications involving sustainable alternative energy systems. During the design of such systems, the engineer/decision maker would often encounter noise factors (e.g., solar insolation and ambient temperature fluctuations) when their system interacts with the environment. In this chapter, the sizing and design optimization of a solar-powered irrigation system was considered. This problem is multivariate, noisy, nonlinear and multiobjective. This design problem was tackled by first using the Fuzzy Type II approach to model the noise factors.
Subsequently, the Bacterial Foraging Algorithm (BFA) (in the context of a weighted sum framework) was employed to solve this multiobjective fuzzy design problem. This method was then used to construct the approximate Pareto frontier as well as to identify the best solution option in a fuzzy setting. Comprehensive analyses and discussions were performed on the generated numerical results with respect to the implemented solution methods.\nCrowdsourcing, a major economic issue, is the practice by which a firm outsources internal tasks to the crowd. It is a form of digital subcontracting for the general public. The evaluation of the participants' work quality is a major issue in crowdsourcing. Indeed, contributions must be controlled to ensure the effectiveness and relevance of the campaign. We are particularly interested in small, fast and non-automatable tasks. Several methods have been proposed to solve this problem, but they are applicable when the \"golden truth\" is not always known. The particularity of this work is that it proposes a method for calculating the degree of expertise in the presence of gold data in crowdsourcing. This method is based on belief function theory and proposes a structuring of data using graphs. The proposed approach will be assessed and applied to the data.\nPsychological traumas are thought to be present in a wide range of conditions, including post-traumatic stress disorder, disorganised attachment, personality disorders, dissociative identity disorder and psychosis. This work presents a new psychotherapy for psychological traumas, based on a functional model of the mind, built with elements borrowed from the fields of computer science, artificial intelligence and neural networks. The model revolves around the concept of hierarchical value and explains the emergence of dissociation and splitting in response to emotional pain.
The key intuition is that traumas are caused by too strong negative emotions, which are in turn made possible by a low-value self, which is in turn determined by low-value self-associated ideas. The therapeutic method compiles a list of patient's traumas, identifies for each trauma a list of low-value self-associated ideas, and provides for each idea a list of counterexamples, to raise the self value and solve the trauma. Since the psychotherapy proposed has not been clinically tested, statements on its effectiveness are premature. However, since the conceptual basis is solid and traumas are hypothesised to be present in many psychological disorders, the potential gain may be substantial.\nHumans are not only adept in recognizing what class an input instance belongs to (i.e., classification task), but perhaps more remarkably, they can imagine (i.e., generate) plausible instances of a desired class with ease, when prompted. Inspired by this, we propose a framework which allows transforming Cascade-Correlation Neural Networks (CCNNs) into probabilistic generative models, thereby enabling CCNNs to generate samples from a category of interest. CCNNs are a well-known class of deterministic, discriminative NNs, which autonomously construct their topology, and have been successful in giving accounts for a variety of psychological phenomena. Our proposed framework is based on a Markov Chain Monte Carlo (MCMC) method, called the Metropolis-adjusted Langevin algorithm, which capitalizes on the gradient information of the target distribution to direct its explorations towards regions of high probability, thereby achieving good mixing properties. 
Through extensive simulations, we demonstrate the efficacy of our proposed framework.\nInternship assignment is a complicated process for universities since it is necessary to take into account a multiplicity of variables to establish a compromise between companies' requirements and student competencies acquired during the university training. These variables build up a complex relations map that requires the formulation of an exhaustive and rigorous conceptual scheme. In this research a domain ontological model is presented as support to the student's decision making for opportunities of University studies level of the University Lumiere Lyon 2 (ULL) education system. The ontology is designed and created using methodological approach offering the possibility of improving the progressive creation, capture and knowledge articulation. In this paper, we draw a balance taking the demands of the companies across the capabilities of the students. This will be done through the establishment of an ontological model of an educational learners' profile and the internship postings which are written in a free text and using uncontrolled vocabulary. Furthermore, we outline the process of semantic matching which improves the quality of query results.\nThis article aims to achieve two goals: to show that probability is not the only way of dealing with uncertainty (and even more, that there are kinds of uncertainty which are for principled reasons not addressable with probabilistic means); and to provide evidence that logic-based methods can well support reasoning with uncertainty. 
For the latter claim, two paradigmatic examples are presented: Logic Programming with Kleene semantics for modelling reasoning from information in a discourse, to an interpretation of the state of affairs of the intended model, and a neural-symbolic implementation of Input/Output logic for dealing with uncertainty in dynamic normative contexts.\nFor agents and robots to become more useful, they must be able to quickly learn from non-technical users. This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has made the assumption that people provide feedback for decisions that is dependent on the behavior they are teaching and is independent from the learner's current policy. We present empirical results that show this assumption to be false---whether human trainers give a positive or negative feedback for a decision is influenced by the learner's current policy. We argue that policy-dependent feedback, in addition to being commonplace, enables useful training strategies from which agents should benefit. Based on this insight, we introduce Convergent Actor-Critic by Humans (COACH), an algorithm for learning from policy-dependent feedback that converges to a local optimum. Finally, we demonstrate that COACH can successfully learn multiple behaviors on a physical robot, even with noisy image features.\nIn this paper, for the first time, we study label propagation in heterogeneous graphs under heterophily assumption. Homophily label propagation (i.e., two connected nodes share similar labels) in homogeneous graph (with same types of vertices and relations) has been extensively studied before. 
Unfortunately, real-life networks are heterogeneous: they contain different types of vertices (e.g., users, images, texts) and relations (e.g., friendships, co-tagging) and allow for each node to propagate both the same and opposite copy of labels to its neighbors. We propose a $\\mathcal{K}$-partite label propagation model to handle the mystifying combination of heterogeneous nodes/relations and heterophily propagation. With this model, we develop a novel label inference algorithm framework with update rules in near-linear time complexity. Since real networks change over time, we devise an incremental approach, which supports fast updates for both new data and evidence (e.g., ground-truth labels) with guaranteed efficiency. We further provide a utility function to automatically determine whether an incremental or a re-modeling approach is favored. Extensive experiments on real datasets have verified the effectiveness and efficiency of our approach, and its superiority over the state-of-the-art label propagation methods.\nMany aspects of people's lives are proven to be deeply connected to their jobs. In this paper, we first investigate the distinct characteristics of major occupation categories based on tweets. From multiple social media platforms, we gather several types of user information. From users' LinkedIn webpages, we learn their proficiencies. To overcome the ambiguity of self-reported information, a soft clustering approach is applied to extract occupations from crowd-sourced data. Eight job categories are extracted, including Marketing, Administrator, Start-up, Editor, Software Engineer, Public Relation, Office Clerk, and Designer. Meanwhile, users' posts on Twitter provide cues for understanding their linguistic styles, interests, and personalities. Our results suggest that people of different jobs have unique tendencies in certain language styles and interests.
Our results also clearly reveal distinctive levels in terms of Big Five Traits for different jobs. Finally, a classifier is built to predict job types based on the features extracted from tweets. A high accuracy indicates a strong discrimination power of language features for job prediction task.\nThe fifth Dialog State Tracking Challenge (DSTC5) introduces a new cross-language dialog state tracking scenario, where the participants are asked to build their trackers based on the English training corpus, while evaluating them with the unlabeled Chinese corpus. Although the computer-generated translations for both English and Chinese corpus are provided in the dataset, these translations contain errors and careless use of them can easily hurt the performance of the built trackers. To address this problem, we propose a multichannel Convolutional Neural Networks (CNN) architecture, in which we treat English and Chinese language as different input channels of one single CNN model. In the evaluation of DSTC5, we found that such multichannel architecture can effectively improve the robustness against translation errors. Additionally, our method for DSTC5 is purely machine learning based and requires no prior knowledge about the target language. We consider this a desirable property for building a tracker in the cross-language context, as not every developer will be familiar with both languages.\nThis paper focuses on modeling ride requests and their variations over location and time, based on analyzing extensive real-world data from a ride-sharing service. We introduce a graph model that captures the spatial and temporal variability of ride requests and the potentials for ride pooling. We discover these ride request graphs exhibit a well known property called densification power law often found in real graphs modelling human behaviors. 
We show the pattern of ride requests and the potential of ride pooling for a city can be characterized by the densification factor of the ride request graphs. Previous works have shown that it is possible to automatically generate synthetic versions of these graphs that exhibit a given densification factor. We present an algorithm for automatic generation of synthetic ride request graphs that match quite well the densification factor of ride request graphs from actual ride request data.\nWe develop a framework for rendering photographic images, taking into account display limitations, so as to optimize perceptual similarity between the rendered image and the original scene. We formulate this as a constrained optimization problem, in which we minimize a measure of perceptual dissimilarity, the Normalized Laplacian Pyramid Distance (NLPD), which mimics the early stage transformations of the human visual system. When rendering images acquired with higher dynamic range than that of the display, we find that the optimized solution boosts the contrast of low-contrast features without introducing significant artifacts, yielding results of comparable visual quality to current state-of-the art methods with no manual intervention or parameter settings. We also examine a variety of other display constraints, including limitations on minimum luminance (black point), mean luminance (as a proxy for energy consumption), and quantized luminance levels (halftoning). Finally, we show that the method may be used to enhance details and contrast of images degraded by optical scattering (e.g. fog).\nPortable computing devices, which include tablets, smart phones and various types of wearable sensors, experienced a rapid development in recent years. One of the most critical limitations for these devices is the power consumption as they use batteries as the power supply. 
However, the bottleneck of the power saving schemes in both hardware design and software algorithm is the huge variability in power consumption. The variability is caused by a myriad of factors, including the manufacturing process, the ambient environment (temperature, humidity), and aging effects. As the technology node scales down to 28nm and below, the variability becomes more severe. As a result, a platform for variability characterization appears necessary and helpful.\nThe $k$-Means clustering problem on $n$ points is NP-Hard for any dimension $d\\ge 2$; however, for the 1D case there exist exact polynomial time algorithms. Previous literature reported an $O(kn^2)$ time dynamic programming algorithm that uses $O(kn)$ space. We present a new algorithm computing the optimal clustering in only $O(kn)$ time using linear space. For $k = \\Omega(\\lg n)$, we improve this even further to $n 2^{O(\\sqrt{ \\lg \\lg n \\lg k})}$ time. We generalize the new algorithm(s) to work for the absolute distance instead of squared distance and to work for any Bregman Divergence as well.\nExisting algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. 
Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.\nThe popularity of image sharing on social media and the engagement it creates between users reflects the important role that visual context plays in everyday conversations. We present a novel task, Image-Grounded Conversations (IGC), in which natural-sounding conversations are generated about a shared image. To benchmark progress, we introduce a new multiple-reference dataset of crowd-sourced, event-centric conversations on images. IGC falls on the continuum between chit-chat and goal-directed conversation models, where visual grounding constrains the topic of conversation to event-driven utterances. Experiments with models trained on social media data show that the combination of visual and textual context enhances the quality of generated conversational turns. In human evaluation, the gap between human performance and that of both neural and retrieval architectures suggests that multi-modal IGC presents an interesting challenge for dialogue research.\nNatural-language-facilitated human-robot cooperation (NLC), in which natural language (NL) is used to share knowledge between a human and a robot for conducting intuitive human-robot cooperation (HRC), is continuously developing in the recent decade. Currently, NLC is used in several robotic domains such as manufacturing, daily assistance and health caregiving. It is necessary to summarize current NLC-based robotic systems and discuss the future developing trends, providing helpful information for future NLC research. In this review, we first analyzed the driving forces behind the NLC research. 
Regarding a robot's cognition level during the cooperation, the NLC implementations were then categorized into four types {NL-based control, NL-based robot training, NL-based task execution, NL-based social companion} for comparison and discussion. Lastly, based on our perspective and comprehensive paper review, the future research trends were discussed.\nThe study of mereology (parts and wholes) in the context of formal approaches to vagueness can be approached in a number of ways. In the context of rough sets, mereological concepts with a set-theoretic or valuation based ontology acquire complex and diverse behavior. In this research a general rough set framework called granular operator spaces is extended and the nature of parthood in it is explored from a minimally intrusive point of view. This is used to develop counting strategies that help in classifying the framework. The developed methodologies would be useful for drawing involved conclusions about the nature of data (and validity of assumptions about it) from antichains derived from context. The problem addressed is also about whether counting procedures help in confirming that the approximations involved in formation of data are indeed rough approximations?\nAutonomous software agents operating in dynamic environments need to constantly reason about actions in pursuit of their goals, while taking into consideration norms which might be imposed on those actions. Normative practical reasoning supports agents making decisions about what is best for them to (not) do in a given situation. What makes practical reasoning challenging is the interplay between goals that agents are pursuing and the norms that the agents are trying to uphold. We offer a formalisation to allow agents to plan for multiple goals and norms in the presence of durative actions that can be executed concurrently. We compare plans based on decision-theoretic notions (i.e. 
utility) such that the utility gain of goals and utility loss of norm violations are the basis for this comparison. The set of optimal plans consists of plans that maximise the overall utility, each of which can be chosen by the agent to execute. We provide an implementation of our proposal in Answer Set Programming, thus allowing us to state the original problem in terms of a logic program that can be queried for solutions with specific properties. The implementation is proven to be sound and complete.\nWhen AI systems interact with humans in the loop, they are often called on to provide explanations for their plans and behavior. Past work on plan explanations primarily involved the AI system explaining the correctness of its plan and the rationale for its decision in terms of its own model. Such soliloquy is wholly inadequate in most realistic scenarios where the humans have domain and task models that differ significantly from that used by the AI system. We posit that the explanations are best studied in light of these differing models. In particular, we show how explanation can be seen as a \"model reconciliation problem\" (MRP), where the AI system in effect suggests changes to the human's model, so as to make its plan be optimal with respect to that changed human model. We will study the properties of such explanations, present algorithms for automatically computing them, and evaluate the performance of the algorithms.\nIn a recent conference paper, we have reported a rhythm transcription method based on a merged-output hidden Markov model (HMM) that explicitly describes the multiple-voice structure of polyphonic music. This model solves a major problem of conventional methods that could not properly describe the nature of multiple voices as in polyrhythmic scores or in the phenomenon of loose synchrony between voices. 
In this paper we present a complete description of the proposed model and develop an inference technique, which is valid for any merged-output HMMs for which output probabilities depend on past events. We also examine the influence of the architecture and parameters of the method in terms of accuracies of rhythm transcription and voice separation and perform comparative evaluations with six other algorithms. Using MIDI recordings of classical piano pieces, we found that the proposed model outperformed other methods by more than 12 points in the accuracy for polyrhythmic performances and performed almost as good as the best one for non-polyrhythmic performances. This reveals the state-of-the-art methods of rhythm transcription for the first time in the literature. Publicly available source codes are also provided for future comparisons.\nWe introduce new diversification methods for zero-one optimization that significantly extend strategies previously introduced in the setting of metaheuristic search. Our methods incorporate easily implemented strategies for partitioning assignments of values to variables, accompanied by processes called augmentation and shifting which create greater flexibility and generality. We then show how the resulting collection of diversified solutions can be further diversified by means of permutation mappings, which equally can be used to generate diversified collections of permutations for applications such as scheduling and routing. These methods can be applied to non-binary vectors by the use of binarization procedures and by Diversification-Based Learning (DBL) procedures which also provide connections to applications in clustering and machine learning. 
Detailed pseudocode and numerical illustrations are provided to show the operation of our methods and the collections of solutions they create.\nThis research presents an innovative and unique way of solving the advertisement prediction problem, which has been considered a learning problem over the past several years. Online advertising is a multi-billion-dollar industry and is growing every year at a rapid pace. The goal of this research is to enhance the click-through rate (CTR) of contextual advertisements using Linear Regression (LR). To address this problem, this paper proposes a new technique to predict the CTR, which will increase the overall revenue of the system by serving advertisements more suitable to the viewers with the help of feature extraction and by displaying advertisements based on the context of the publishers. The important steps include data collection, feature extraction, CTR prediction and advertisement serving. The statistical results obtained from the dynamically used technique show an efficient outcome by fitting the data close to perfection for the LR technique using optimized feature selection.\nThis paper formalises the problem of online algorithm selection in the context of Reinforcement Learning. The setup is as follows: given an episodic task and a finite number of off-policy RL algorithms, a meta-algorithm has to decide which RL algorithm is in control during the next episode so as to maximize the expected return. The article presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS). Its principle is to freeze the policy updates at each epoch, and to leave a rebooted stochastic bandit in charge of the algorithm selection. Under some assumptions, a thorough theoretical analysis demonstrates its near-optimality considering the structural sampling budget limitations. 
ESBAS is first empirically evaluated on a dialogue task where it is shown to outperform each individual algorithm in most configurations. ESBAS is then adapted to a true online setting where algorithms update their policies after each transition, which we call SSBAS. SSBAS is evaluated on a fruit collection task where it is shown to adapt the stepsize parameter more efficiently than the classical hyperbolic decay, and on an Atari game, where it improves the performance by a wide margin.\nThis article shows how the recent breakthroughs in Reinforcement Learning (RL) that have enabled robots to learn to play arcade video games, walk or assemble colored bricks, can be used to perform other tasks that are currently at the core of engineering cyberphysical systems. We present the first use of RL for the control of systems modeled by discretized non-linear Partial Differential Equations (PDEs) and devise a novel algorithm to use non-parametric control techniques for large multi-agent systems. We show how neural network based RL enables the control of discretized PDEs whose parameters are unknown, random, and time-varying. We introduce an algorithm of Mutual Weight Regularization (MWR) which alleviates the curse of dimensionality of multi-agent control schemes by sharing experience between agents while giving each agent the opportunity to specialize its action policy so as to tailor it to the local parameters of the part of the system it is located in.\nThe focus of this work is to enumerate the various approaches and algorithms that center around application of reinforcement learning in robotic manipulation tasks. Earlier methods utilized specialized policy representations and human demonstrations to constrict the policy. Such methods worked well with continuous state and policy space of robots but failed to come up with generalized policies. 
Subsequently, high dimensional non-linear function approximators like neural networks have been used to learn policies from scratch. Several novel and recent approaches have also embedded control policy with efficient perceptual representation using deep learning. This has led to the emergence of a new branch of dynamic robot control system called deep reinforcement learning (DRL). This work embodies a survey of the most recent algorithms, architectures and their implementations in simulations and real world robotic platforms. The gamut of DRL architectures is partitioned into two different branches, namely discrete action space algorithms (DAS) and continuous action space algorithms (CAS). Further, the CAS algorithms are divided into stochastic continuous action space (SCAS) and deterministic continuous action space (DCAS) algorithms. Along with elucidating an organisation of the DRL algorithms, this work also presents some of the state-of-the-art applications of these approaches in robotic manipulation tasks.\nWe examine the meaning and the complexity of probabilistic logic programs that consist of a set of rules and a set of independent probabilistic facts (that is, programs based on Sato's distribution semantics). We focus on two semantics, respectively based on stable and on well-founded models. We show that the semantics based on stable models (referred to as the \"credal semantics\") produces sets of probability models that dominate infinitely monotone Choquet capacities; we describe several useful consequences of this result. We then examine the complexity of inference with probabilistic logic programs. We distinguish between the complexity of inference when a probabilistic program and a query are given (the inferential complexity), and the complexity of inference when the probabilistic program is fixed and the query is given (the query complexity, akin to data complexity as used in database theory). 
We obtain results on the inferential and query complexity for acyclic, stratified, and cyclic propositional and relational programs, complexity reaches various levels of the counting hierarchy and even exponential levels.\nWe propose a novel rank aggregation method based on converting permutations into their corresponding Lehmer codes or other subdiagonal images. Lehmer codes, also known as inversion vectors, are vector representations of permutations in which each coordinate can take values not restricted by the values of other coordinates. This transformation allows for decoupling of the coordinates and for performing aggregation via simple scalar median or mode computations. We present simulation results illustrating the performance of this completely parallelizable approach and analytically prove that both the mode and median aggregation procedure recover the correct centroid aggregate with small sample complexity when the permutations are drawn according to the well-known Mallows models. The proposed Lehmer code approach may also be used on partial rankings, with similar performance guarantees.\nRetrosynthesis is a technique to plan the chemical synthesis of organic molecules, for example drugs, agro- and fine chemicals. In retrosynthesis, a search tree is built by analysing molecules recursively and dissecting them into simpler molecular building blocks until one obtains a set of known building blocks. The search space is intractably large, and it is difficult to determine the value of retrosynthetic positions. Here, we propose to model retrosynthesis as a Markov Decision Process. In combination with a Deep Neural Network policy learned from essentially the complete published knowledge of chemistry, Monte Carlo Tree Search (MCTS) can be used to evaluate positions. 
In exploratory studies, we demonstrate that MCTS with neural network policies outperforms the traditionally used best-first search with hand-coded heuristics.\nIn the fashion industry, order scheduling focuses on the assignment of production orders to appropriate production lines. In reality, before a new order can be put into production, a series of activities known as pre-production events need to be completed. In addition, in real production process, owing to various uncertainties, the daily production quantity of each order is not always as expected. In this research, by considering the pre-production events and the uncertainties in the daily production quantity, robust order scheduling problems in the fashion industry are investigated with the aid of a multi-objective evolutionary algorithm (MOEA) called nondominated sorting adaptive differential evolution (NSJADE). The experimental results illustrate that it is of paramount importance to consider pre-production events in order scheduling problems in the fashion industry. We also unveil that the existence of the uncertainties in the daily production quantity heavily affects the order scheduling.\nIn this paper we present an approach to extract ordered timelines of events, their participants, locations and times from a set of multilingual and cross-lingual data sources. Based on the assumption that event-related information can be recovered from different documents written in different languages, we extend the Cross-document Event Ordering task presented at SemEval 2015 by specifying two new tasks for, respectively, Multilingual and Cross-lingual Timeline Extraction. We then develop three deterministic algorithms for timeline extraction based on two main ideas. First, we address implicit temporal relations at document level since explicit time-anchors are too scarce to build a wide coverage timeline extraction system. 
Second, we leverage several multilingual resources to obtain a single, inter-operable, semantic representation of events across documents and across languages. The result is a highly competitive system that strongly outperforms the current state-of-the-art. Nonetheless, further analysis of the results reveals that linking the event mentions with their target entities and time-anchors remains a difficult challenge. The systems, resources and scorers are freely available to facilitate its use and guarantee the reproducibility of results.\nSafe interaction with human drivers is one of the primary challenges for autonomous vehicles. In order to plan driving maneuvers effectively, the vehicle's control system must infer and predict how humans will behave based on their latent internal state (e.g., intentions and aggressiveness). This research uses a simple model for human behavior with unknown parameters that make up the internal states of the traffic participants and presents a method for quantifying the value of estimating these states and planning with their uncertainty explicitly modeled. An upper performance bound is established by an omniscient Monte Carlo Tree Search (MCTS) planner that has perfect knowledge of the internal states. A baseline lower bound is established by planning with MCTS assuming that all drivers have the same internal state. MCTS variants are then used to solve a partially observable Markov decision process (POMDP) that models the internal state uncertainty to determine whether inferring the internal state offers an advantage over the baseline. Applying this method to a freeway lane changing scenario reveals that there is a significant performance gap between the upper bound and baseline. POMDP planning techniques come close to closing this gap, especially when important hidden model parameters are correlated with measurable parameters.\nThe problem of quantizing the activations of a deep neural network is considered. 
An examination of the popular binary quantization approach shows that this consists of approximating a classical non-linearity, the hyperbolic tangent, by two functions: a piecewise constant sign function, which is used in feedforward network computations, and a piecewise linear hard tanh function, used in the backpropagation step during network learning. The problem of approximating the ReLU non-linearity, widely used in the recent deep learning literature, is then considered. A half-wave Gaussian quantizer (HWGQ) is proposed for forward approximation and shown to have efficient implementation, by exploiting the statistics of network activations and batch normalization operations commonly used in the literature. To overcome the problem of gradient mismatch, due to the use of different forward and backward approximations, several piece-wise backward approximators are then investigated. The implementation of the resulting quantized network, denoted as HWGQ-Net, is shown to achieve much closer performance to full precision networks, such as AlexNet, ResNet, GoogLeNet and VGG-Net, than previously available low-precision networks, with 1-bit binary weights and 2-bit quantized activations.\nDeep neural networks have emerged as a widely used and effective means for tackling complex, real-world problems. However, a major obstacle in applying them to safety-critical systems is the great difficulty in providing formal guarantees about their behavior. We present a novel, scalable, and efficient technique for verifying properties of deep neural networks (or providing counter-examples). The technique is based on the simplex method, extended to handle the non-convex Rectified Linear Unit (ReLU) activation function, which is a crucial ingredient in many modern neural networks. The verification procedure tackles neural networks as a whole, without making any simplifying assumptions. 
We evaluated our technique on a prototype deep neural network implementation of the next-generation airborne collision avoidance system for unmanned aircraft (ACAS Xu). Results show that our technique can successfully prove properties of networks that are an order of magnitude larger than the largest networks verified using existing methods.\nReal-time optimization of traffic flow addresses important practical problems: reducing a driver's wasted time, improving city-wide efficiency, reducing gas emissions and improving air quality. Much of the current research in traffic-light optimization relies on extending the capabilities of traffic lights to either communicate with each other or communicate with vehicles. However, before such capabilities become ubiquitous, opportunities exist to improve traffic lights by being more responsive to current traffic situations within the current, already deployed, infrastructure. In this paper, we introduce a traffic light controller that employs bidding within micro-auctions to efficiently incorporate traffic sensor information; no other outside sources of information are assumed. We train and test traffic light controllers on large-scale data collected from opted-in Android cell-phone users over a period of several months in Mountain View, California and the River North neighborhood of Chicago, Illinois. The learned auction-based controllers surpass (in both the relevant metrics of road-capacity and mean travel time) the currently deployed lights, optimized static-program lights, and longer-term planning approaches, in both cities, measured using real user driving data.\nEntity resolution (ER) is the task of identifying all records in a database that refer to the same underlying entity, and are therefore duplicates of each other. Due to inherent ambiguity of data representation and poor data quality, ER is a challenging task for any automated process. 
As a remedy, human-powered ER via crowdsourcing has become popular in recent years. Using crowd to answer queries is costly and time consuming. Furthermore, crowd-answers can often be faulty. Therefore, crowd-based ER methods aim to minimize human participation without sacrificing the quality and use a computer generated similarity matrix actively. While, some of these methods perform well in practice, no theoretical analysis exists for them, and further their worst case performances do not reflect the experimental findings. This creates a disparity in the understanding of the popular heuristics for this problem. In this paper, we make the first attempt to close this gap. We provide a thorough analysis of the prominent heuristic algorithms for crowd-based ER. We justify experimental observations with our analysis and information theoretic lower bounds.\nWe present a family of logics for reasoning about agents' positions and motion in the plane which have several potential applications in the area of multi-agent systems (MAS), such as multi-agent planning and robotics. The most general logic includes (i) atomic formulas for representing the truth of a given fact or the presence of a given agent at a certain position of the plane, (ii) atomic programs corresponding to the four basic orientations in the plane (up, down, left, right) as well as the four program constructs of propositional dynamic logic (sequential composition, nondeterministic composition, iteration and test). As this logic is not computably enumerable, we study some interesting decidable and axiomatizable fragments of it. We also present a decidable extension of the iteration-free fragment of the logic by special programs representing motion of agents in the plane.\nThis paper describes the details of Sighthound's fully automated vehicle make, model and color recognition system. 
The backbone of our system is a deep convolutional neural network that is not only computationally inexpensive, but also provides state-of-the-art results on several competitive benchmarks. Additionally, our deep network is trained on a large dataset of several million images which are labeled through a semi-automated process. Finally we test our system on several public datasets as well as our own internal test dataset. Our results show that we outperform other methods on all benchmarks by significant margins. Our model is available to developers through the Sighthound Cloud API at https://www.sighthound.com/products/cloud\nWe present a technique for automatically extracting mutual exclusion invariants from temporal planning instances. It first identifies a set of invariant templates by inspecting the lifted representation of the domain and then checks these templates against properties that assure invariance. Our technique builds on other approaches to invariant synthesis presented in the literature, but departs from their limited focus on instantaneous actions by addressing temporal domains. To deal with time, we formulate invariance conditions that account for the entire structure of the actions and the possible concurrent interactions between them. As a result, we construct a significantly more comprehensive technique than previous methods, which is able to find not only invariants for temporal domains, but also a broader set of invariants for non-temporal domains. The experimental results reported in this paper provide evidence that identifying a broader set of invariants results in the generation of fewer multi-valued state variables with larger domains. 
We show that, in turn, this reduction in the number of variables reflects positively on the performance of a number of temporal planners that use a variable/value representation by significantly reducing their running time.\nWe present a visually grounded model of speech perception which projects spoken utterances and images to a joint semantic space. We use a multi-layer recurrent highway network to model the temporal nature of spoken speech, and show that it learns to extract both form and meaning-based linguistic knowledge from the input signal. We carry out an in-depth analysis of the representations used by different components of the trained model and show that encoding of semantic aspects tends to become richer as we go up the hierarchy of layers, whereas encoding of form-related aspects of the language input tends to initially increase and then plateau or decrease.\nWe propose a method to generate multiple diverse and valid human pose hypotheses in 3D all consistent with the 2D detection of joints in a monocular RGB image. We use a novel generative model uniform (unbiased) in the space of anatomically plausible 3D poses. Our model is compositional (produces a pose by combining parts) and since it is restricted only by anatomical constraints it can generalize to every plausible human 3D pose. Removing the model bias intrinsically helps to generate more diverse 3D pose hypotheses. We argue that generating multiple pose hypotheses is more reasonable than generating only a single 3D pose based on the 2D joint detection given the depth ambiguity and the uncertainty due to occlusion and imperfect 2D joint detection. We hope that the idea of generating multiple consistent pose hypotheses can give rise to a new line of future work that has not received much attention in the literature. 
We used the Human3.6M dataset for empirical evaluation.\nThe technique of kernelization consists in extracting, from an instance of a problem, an essentially equivalent instance whose size is bounded in a parameter k. Besides being the basis for efficient parameterized algorithms, this method also provides a wealth of information to reason about in the context of constraint programming. We study the use of kernelization for designing propagators through the example of the Vertex Cover constraint. Since the classic kernelization rules often correspond to dominance rather than consistency, we introduce the notion of \"loss-less\" kernel. While our preliminary experimental results show the potential of the approach, they also show some of its limits. In particular, this method is more effective for vertex covers of large and sparse graphs, as they tend to have, relatively, smaller kernels.\nWe present Deep Generalized Canonical Correlation Analysis (DGCCA) -- a method for learning nonlinear transformations of arbitrarily many views of data, such that the resulting transformations are maximally informative of each other. While methods for nonlinear two-view representation learning (Deep CCA, (Andrew et al., 2013)) and linear many-view representation learning (Generalized CCA (Horst, 1961)) exist, DGCCA is the first CCA-style multiview representation learning technique that combines the flexibility of nonlinear (deep) representation learning with the statistical power of incorporating information from many independent sources, or views. We present the DGCCA formulation as well as an efficient stochastic optimization algorithm for solving it. We learn DGCCA representations on two distinct datasets for three downstream tasks: phonetic transcription from acoustic and articulatory measurements, and recommending hashtags and friends on a dataset of Twitter users. 
We find that DGCCA representations soundly beat existing methods at phonetic transcription and hashtag recommendation, and in general perform no worse than standard linear many-view techniques.\nAlthough deep learning models have proven effective at solving problems in natural language processing, the mechanism by which they come to their conclusions is often unclear. As a result, these models are generally treated as black boxes, yielding no insight of the underlying learned patterns. In this paper we consider Long Short Term Memory networks (LSTMs) and demonstrate a new approach for tracking the importance of a given input to the LSTM for a given output. By identifying consistently important patterns of words, we are able to distill state of the art LSTMs on sentiment analysis and question answering into a set of representative phrases. This representation is then quantitatively validated by using the extracted phrases to construct a simple, rule-based classifier which approximates the output of the LSTM.\nIn application domains such as healthcare, we want accurate predictive models that are also causally interpretable. In pursuit of such models, we propose a causal regularizer to steer predictive models towards causally-interpretable solutions and theoretically study its properties. In a large-scale analysis of Electronic Health Records (EHR), our causally-regularized model outperforms its L1-regularized counterpart in causal accuracy and is competitive in predictive performance. We perform non-linear causality analysis by causally regularizing a special neural network architecture. 
We also show that the proposed causal regularizer can be used together with neural representation learning algorithms to yield up to 20% improvement over multilayer perceptron in detecting multivariate causation, a situation common in healthcare, where many causal factors should occur simultaneously to have an effect on the target variable.\nIn a smart city, real-time traffic sensors may be deployed for various applications, such as route planning. Unfortunately, sensors are prone to failures, which result in erroneous traffic data. Erroneous data can adversely affect applications such as route planning, and can cause increased travel time. To minimize the impact of sensor failures, we must detect them promptly and accurately. However, typical detection algorithms may lead to a large number of false positives (i.e., false alarms) and false negatives (i.e., missed detections), which can result in suboptimal route planning. In this paper, we devise an effective detector for identifying faulty traffic sensors using a prediction model based on Gaussian Processes. Further, we present an approach for computing the optimal parameters of the detector which minimize losses due to false-positive and false-negative errors. We also characterize critical sensors, whose failure can have high impact on the route planning application. Finally, we implement our method and evaluate it numerically using a real-world dataset and the route planning platform OpenTripPlanner.\nStatistical Relational Learning (SRL) methods have shown that classification accuracy can be improved by integrating relations between samples. Techniques such as iterative classification or relaxation labeling achieve this by propagating information between related samples during the inference process. When only a few samples are labeled and connections between samples are sparse, collective inference methods have shown large improvements over standard feature-based ML methods. 
However, in contrast to feature based ML, collective inference methods require complex inference procedures and often depend on the strong assumption of label consistency among related samples. In this paper, we introduce new relational features for standard ML methods by extracting information from direct and indirect relations. We show empirically on three standard benchmark datasets that our relational features yield results comparable to collective inference methods. Finally we show that our proposal outperforms these methods when additional information is available.\nParameterized algorithms are a way to solve hard problems more efficiently, given that a specific parameter of the input is small. In this paper, we apply this idea to the field of answer set programming (ASP). To this end, we propose two kinds of graph representations of programs to exploit their treewidth as a parameter. Treewidth roughly measures to which extent the internal structure of a program resembles a tree. Our main contribution is the design of parameterized dynamic programming algorithms, which run in linear time if the treewidth and weights of the given program are bounded. Compared to previous work, our algorithms handle the full syntax of ASP. Finally, we report on an empirical evaluation that shows good runtime behaviour for benchmark instances of low treewidth, especially for counting answer sets.\nMatrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. 
We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: 1. a fruit Gathering game and 2. a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real world social dilemmas affects cooperation.\nRecent research in psycholinguistics has provided increasing evidence that humans predict upcoming content. Prediction also affects perception and might be a key to robustness in human language processing. In this paper, we investigate the factors that affect human prediction by building a computational model that can predict upcoming discourse referents based on linguistic knowledge alone vs. linguistic knowledge jointly with common-sense knowledge in the form of scripts. We find that script knowledge significantly improves model estimates of human predictions. In a second study, we test the highly controversial hypothesis that predictability influences referring expression type but do not find evidence for such an effect.\nEnd-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors. We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates. Compared to existing end-to-end approaches, HCNs considerably reduce the amount of training data required, while retaining the key benefit of inferring a latent representation of dialog state. In addition, HCNs can be optimized with supervised learning, reinforcement learning, or a mixture of both. 
HCNs attain state-of-the-art performance on the bAbI dialog dataset, and outperform two commercially deployed customer-facing dialog systems.\nThe synapse crossbar is an elementary structure in Neuromorphic Computing Systems (NCS). However, the limited size of crossbars and heavy routing congestion impede the NCS implementations of big neural networks. In this paper, we propose a two-step framework (namely, group scissor) to scale NCS designs to big neural networks. The first step is rank clipping, which integrates low-rank approximation into the training to reduce total crossbar area. The second step is group connection deletion, which structurally prunes connections to reduce routing congestion between crossbars. Tested on convolutional neural networks of LeNet on the MNIST database and ConvNet on the CIFAR-10 database, our experiments show significant reduction of crossbar area and routing area in NCS designs. Without accuracy loss, rank clipping reduces total crossbar area to 13.62\\% and 51.81\\% in the NCS designs of LeNet and ConvNet, respectively. Following rank clipping, group connection deletion further reduces the routing area of LeNet and ConvNet to 8.1\\% and 52.06\\%, respectively.\nWe present Octopus, an AI agent to jointly balance three conflicting task objectives on a micro-crowdsourcing marketplace: the quality of work, total cost incurred, and time to completion. Previous control agents have mostly focused on cost-quality or cost-time tradeoffs, but not on directly controlling all three in concert. A naive formulation of three-objective optimization is intractable; Octopus takes a hierarchical POMDP approach, with three different components responsible for setting the pay per task, selecting the next task, and controlling task-level quality. We demonstrate that Octopus significantly outperforms existing state-of-the-art approaches in real experiments. 
We also deploy Octopus on Amazon Mechanical Turk, showing its ability to manage tasks in a real-world dynamic setting.\nThe lack of diversity in a genetic algorithm's population may lead to poor performance of the genetic operators, since there is no equilibrium between exploration and exploitation. In those cases, genetic algorithms exhibit fast but unsuitable convergence. In this paper we develop a novel hybrid genetic algorithm which attempts to obtain a balance between exploration and exploitation. It confronts the diversity problem using the so-called greedy diversification operator. Furthermore, the proposed algorithm applies a competition between parents and children so as to exploit the high-quality solutions visited. These operators are complemented by a simple selection mechanism designed to preserve and take advantage of the population diversity. Additionally, we extend our proposal to the field of memetic algorithms, obtaining an improved model with outstanding results in practice. The experimental study shows the validity of the approach as well as how important it is to take the exploration and exploitation concepts into account when designing an evolutionary algorithm.\nThis paper studies an auction design problem for a seller to sell a commodity in a social network, where each individual (the seller or a buyer) can only communicate with her neighbors. The challenge to the seller is to design a mechanism to incentivize the buyers, who are aware of the auction, to further propagate the information to their neighbors so that more buyers will participate in the auction and hence, the seller will be able to make a higher revenue. We propose a novel auction mechanism, called information diffusion mechanism (IDM), which incentivizes the buyers to not only truthfully report their valuations on the commodity to the seller, but also further propagate the auction information to all their neighbors. 
In comparison, the direct extension of the well-known Vickrey-Clarke-Groves (VCG) mechanism in social networks can also incentivize the information diffusion, but it will decrease the seller's revenue or even lead to a deficit sometimes. The formalization of the problem has not yet been addressed in the literature of mechanism design and our solution is very significant in the presence of large-scale online social networks.\nAgglutinative languages such as Turkish, Finnish and Hungarian require morphological disambiguation before further processing due to the complex morphology of words. A morphological disambiguator is used to select the correct morphological analysis of a word. Morphological disambiguation is important because it generally is one of the first steps of natural language processing and its performance affects subsequent analyses. In this paper, we propose a system that uses deep learning techniques for morphological disambiguation. Many of the state-of-the-art results in computer vision, speech recognition and natural language processing have been obtained through deep learning models. However, applying deep learning techniques to morphologically rich languages is not well studied. In this work, while we focus on Turkish morphological disambiguation we also present results for French and German in order to show that the proposed architecture achieves high accuracy with no language-specific feature engineering or additional resource. In the experiments, we achieve 84.12, 88.35 and 93.78 morphological disambiguation accuracy among the ambiguous words for Turkish, German and French respectively.\nThe Reservoir Computing (RC) paradigm utilizes a dynamical system, i.e., a reservoir, and a linear classifier, i.e., a read-out layer, to process data from sequential classification tasks. In this paper the usage of Cellular Automata (CA) as a reservoir is investigated. The use of CA in RC has been showing promising results. 
In this paper, selected state-of-the-art experiments are reproduced. It is shown that some CA-rules perform better than others, and the reservoir performance is improved by increasing the size of the CA reservoir itself. In addition, the usage of parallel loosely coupled CA-reservoirs, where each reservoir has a different CA-rule, is investigated. The experiments performed on quasi-uniform CA reservoir provide valuable insights in CA reservoir design. The results herein show that some rules do not work well together, while other combinations work remarkably well. This suggests that non-uniform CA could represent a powerful tool for novel CA reservoir implementations.\nNatural language sentence matching is a fundamental technology for a variety of tasks. Previous approaches either match sentences from a single direction or only apply single granular (word-by-word or sentence-by-sentence) matching. In this work, we propose a bilateral multi-perspective matching (BiMPM) model under the \"matching-aggregation\" framework. Given two sentences $P$ and $Q$, our model first encodes them with a BiLSTM encoder. Next, we match the two encoded sentences in two directions $P \\rightarrow Q$ and $P \\leftarrow Q$. In each matching direction, each time step of one sentence is matched against all time-steps of the other sentence from multiple perspectives. Then, another BiLSTM layer is utilized to aggregate the matching results into a fix-length matching vector. Finally, based on the matching vector, the decision is made through a fully connected layer. We evaluate our model on three tasks: paraphrase identification, natural language inference and answer sentence selection. Experimental results on standard benchmark datasets show that our model achieves the state-of-the-art performance on all tasks.\nUsually bilingual word vectors are trained \"online\". Mikolov et al. 
showed they can also be found \"offline\", whereby two pre-trained embeddings are aligned with a linear transformation, using dictionaries compiled from expert knowledge. In this work, we prove that the linear transformation between two spaces should be orthogonal. This transformation can be obtained using the singular value decomposition. We introduce a novel \"inverted softmax\" for identifying translation pairs, with which we improve the precision @1 of Mikolov's original mapping from 34% to 43%, when translating a test set composed of both common and rare English words into Italian. Orthogonal transformations are more robust to noise, enabling us to learn the transformation without expert bilingual signal by constructing a \"pseudo-dictionary\" from the identical character strings which appear in both languages, achieving 40% precision on the same test set. Finally, we extend our method to retrieve the true translations of English sentences from a corpus of 200k Italian sentences with a precision @1 of 68%.\nWe present our experiences using cloud computing to support data-intensive analytics on satellite imagery for commercial applications. Drawing from our background in high-performance computing, we draw parallels between the early days of clustered computing systems and the current state of cloud computing and its potential to disrupt the HPC market. Using our own virtual file system layer on top of cloud remote object storage, we demonstrate aggregate read bandwidth of 230 gigabytes per second using 512 Google Compute Engine (GCE) nodes accessing a USA multi-region standard storage bucket. This figure is comparable to the best HPC storage systems in existence. We also present several of our application results, including the identification of field boundaries in Ukraine, and the generation of a global cloud-free base layer from Landsat imagery.\nThe proposed algorithmic approach deals with finding the sense of a word in an electronic data. 
Nowadays, in communication media such as the internet and mobile services, people use a few words that are slang in nature. This approach detects those abusive words using a supervised learning procedure. In real-life scenarios, however, slang words are not always used in their complete forms. Most of the time, they appear in various abbreviated forms, such as sound-alike forms and taboo morphemes. The proposed approach can also detect these abbreviated forms using a semi-supervised learning procedure. Using synset and concept analysis of the text, it also evaluates the probability that a suspicious word is a slang word.\nMachine learning, and deep learning in particular, has advanced tremendously on perceptual tasks in recent years. However, it remains vulnerable to adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to a human. In this work, we propose to augment deep neural networks with a small \"detector\" subnetwork which is trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. Our method is orthogonal to prior work on addressing adversarial perturbations, which has mostly focused on making the classification network itself more robust. We show empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans. Moreover, while the detectors have been trained to detect only a specific adversary, they generalize to similar and weaker adversaries. In addition, we propose an adversarial attack that fools both the classifier and the detector and a novel training procedure for the detector that counteracts this attack.\nThis paper describes the details of Sighthound's fully automated age, gender and emotion recognition system. 
The backbone of our system consists of several deep convolutional neural networks that are not only computationally inexpensive, but also provide state-of-the-art results on several competitive benchmarks. To power our novel deep networks, we collected large labeled datasets through a semi-supervised pipeline to reduce the annotation effort/time. We tested our system on several public benchmarks and report outstanding results. Our age, gender and emotion recognition models are available to developers through the Sighthound Cloud API at https://www.sighthound.com/products/cloud\nThis paper studies scenarios of cyclic dominance in a coevolutionary spatial model in which game strategies and links between agents adaptively evolve over time. The Optional Prisoner's Dilemma (OPD) game is employed. The OPD is an extended version of the traditional Prisoner's Dilemma where players have a third option to abstain from playing the game. We adopt an agent-based simulation approach and use Monte Carlo methods to perform the OPD with coevolutionary rules. The necessary conditions to break the scenarios of cyclic dominance are also investigated. This work highlights that cyclic dominance is essential in the sustenance of biodiversity. Moreover, we also discuss the importance of a spatial coevolutionary model in maintaining cyclic dominance in adverse conditions.\nThis paper investigates the validity of Kleinberg's axioms for clustering functions with respect to the quite popular clustering algorithm called $k$-means. While Kleinberg's axioms have been discussed heavily in the past, we concentrate here on the case predominantly relevant for $k$-means algorithm, that is behavior embedded in Euclidean space. We point at some contradictions and counter intuitiveness aspects of this axiomatic set within $\\mathbb{R}^m$ that were evidently not discussed so far. 
Our results suggest that, without clearly defining what kind of clusters we expect, we will not be able to construct a valid axiomatic system. In particular we look at the shape of the clusters and the gaps between them. Finally we demonstrate that there exist several ways to reconcile the formulation of the axioms with their intended meaning, and that under this reformulation the axioms stop being contradictory and the real-world $k$-means algorithm conforms to this axiomatic system.\nThe Minimum Weight Dominating Set (MWDS) problem is an important generalization of the Minimum Dominating Set (MDS) problem with extensive applications. This paper proposes a new local search algorithm for the MWDS problem, which is based on two new ideas. The first idea is a heuristic called two-level configuration checking (CC2), which is a new variant of a recent powerful configuration checking strategy (CC) for effectively avoiding the recent search paths. The second idea is a novel scoring function based on the frequency with which vertices are uncovered. Our algorithm is called CC2FS, after the names of the two ideas. The experimental results show that CC2FS performs much better than some state-of-the-art algorithms in terms of solution quality on a broad range of MWDS benchmarks.\nThis article presents the prediction difference analysis method for visualizing the response of a deep neural network to a specific input. When classifying images, the method highlights areas in a given input image that provide evidence for or against a certain class. It overcomes several shortcomings of previous methods and provides great additional insight into the decision-making process of classifiers. Making neural network decisions interpretable through visualization is important both to improve models and to accelerate the adoption of black-box classifiers in application areas such as medicine. 
We illustrate the method in experiments on natural images (ImageNet data), as well as medical images (MRI brain scans).\nCongestive Heart Failure, or CHF, is a serious medical condition that can result in fluid buildup in the body as a result of a weak heart. When the heart can't pump enough blood to efficiently deliver nutrients and oxygen to the body, kidney function may be impaired, resulting in fluid retention. CHF patients require a broad drug regimen to maintain the delicate system balance, particularly between their heart and kidneys. These drugs include ACE inhibitors and Beta Blockers to control blood pressure, anticoagulants to prevent blood clots, and diuretics to reduce fluid overload. Many of these drugs may interact, and potential effects of these interactions must be weighed against their benefits. For this project, we consider a set of 44 drugs identified as specifically relevant for treating CHF by pediatric cardiologists at Lucile Packard Children's Hospital. This list was generated as part of our current work at the LPCH Heart Center. The goal of this project is to identify and evaluate potentially harmful drug-drug interactions (DDIs) in pediatric patients with Congestive Heart Failure. This identification will be done autonomously, so that it can be continuously updated by evaluating newly published literature.\nIn modern machine learning, pattern recognition replaces real-time semantic reasoning. The mapping from input to output is learned with fixed semantics by training outcomes deliberately. This is an expensive and static approach which depends heavily on the availability of a very particular kind of prior training data to make inferences in a single step. Conventional semantic network approaches, on the other hand, base multi-step reasoning on modal logics and handcrafted ontologies, which are ad hoc, expensive to construct, and fragile to inconsistency. 
Both approaches may be enhanced by a hybrid approach, which completely separates reasoning from pattern recognition. In this report, a quasi-linguistic approach to knowledge representation is discussed, motivated by spacetime structure. Tokenized patterns from diverse sources are integrated to build a lightly constrained and approximately scale-free network. This network is then parsed with very simple recursive algorithms to generate `brainstorming' sets of reasoned knowledge.\nIn this work we study the quantitative relation between the recursive teaching dimension (RTD) and the VC dimension (VCD) of concept classes of finite sizes. The RTD of a concept class $\\mathcal C \\subseteq \\{0, 1\\}^n$, introduced by Zilles et al. (2011), is a combinatorial complexity measure characterized by the worst-case number of examples necessary to identify a concept in $\\mathcal C$ according to the recursive teaching model. For any finite concept class $\\mathcal C \\subseteq \\{0,1\\}^n$ with $\\mathrm{VCD}(\\mathcal C)=d$, Simon & Zilles (2015) posed an open problem $\\mathrm{RTD}(\\mathcal C) = O(d)$, i.e., is RTD linearly upper bounded by VCD? Previously, the best known result was an exponential upper bound $\\mathrm{RTD}(\\mathcal C) = O(d \\cdot 2^d)$, due to Chen et al. (2016). In this paper, we show a quadratic upper bound: $\\mathrm{RTD}(\\mathcal C) = O(d^2)$, much closer to an answer to the open problem. We also discuss the challenges in fully solving the problem.\nDistributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony. In contrast, the synchronous approach is often thought to be impractical due to idle time wasted on waiting for straggling workers. We revisit these conventional beliefs in this paper, and examine the weaknesses of both approaches. 
We demonstrate that a third approach, synchronous optimization with backup workers, can avoid asynchronous noise while mitigating for the worst stragglers. Our approach is empirically validated and shown to converge faster and to better test accuracies.\nDistributed optimization algorithms are widely used in many industrial machine learning applications. However choosing the appropriate algorithm and cluster size is often difficult for users as the performance and convergence rate of optimization algorithms vary with the size of the cluster. In this paper we make the case for an ML-optimizer that can select the appropriate algorithm and cluster size to use for a given problem. To do this we propose building two models: one that captures the system level characteristics of how computation, communication change as we increase cluster sizes and another that captures how convergence rates change with cluster sizes. We present preliminary results from our prototype implementation called Hemingway and discuss some of the challenges involved in developing such a system.\nTraditionally, multi-layer neural networks use dot product between the output vector of previous layer and the incoming weight vector as the input to activation function. The result of dot product is unbounded, thus increases the risk of large variance. Large variance of neuron makes the model sensitive to the change of input distribution, thus results in poor generalization, and aggravates the internal covariate shift which slows down the training. To bound dot product and decrease the variance, we propose to use cosine similarity or centered cosine similarity (Pearson Correlation Coefficient) instead of dot product in neural networks, which we call cosine normalization. We compare cosine normalization with batch, weight and layer normalization in fully-connected neural networks as well as convolutional networks on the data sets of MNIST, 20NEWS GROUP, CIFAR-10/100 and SVHN. 
Experiments show that cosine normalization achieves better performance than other normalization techniques.\nReinforcement Learning algorithms can learn complex behavioral patterns for sequential decision-making tasks wherein an agent interacts with an environment and acquires feedback in the form of rewards sampled from it. Traditionally, such algorithms make decisions, i.e., select actions to execute, at every single time step of the agent-environment interactions. In this paper, we propose a novel framework, Fine Grained Action Repetition (FiGAR), which enables the agent to decide the action as well as the time scale of repeating it. FiGAR can be used for improving any Deep Reinforcement Learning algorithm which maintains an explicit policy estimate by enabling temporal abstractions in the action space. We empirically demonstrate the efficacy of our framework by showing performance improvements on top of three policy search algorithms in different domains: Asynchronous Advantage Actor Critic in the Atari 2600 domain, Trust Region Policy Optimization in the Mujoco domain and Deep Deterministic Policy Gradients in the TORCS car racing domain.\nIn humans, reasoning and inference require processing as well as memory skills. Neural networks can perform tasks like image recognition (better than humans) but are still limited in their memory (by the attention mechanism and by size). Recurrent Neural Networks (RNNs) and their modified version, the LSTM, can handle small memory contexts, but once the context grows beyond a threshold they become difficult to use. The solution is to use a large external memory. Still, this poses many challenges, such as how to train neural networks for discrete memory representations and how to describe long-term dependencies in sequential data. The most prominent neural architectures for such tasks are Memory Networks (inference components combined with long-term memory) and Neural Turing Machines (neural networks using external memory resources). 
Also, additional techniques like attention mechanism, end to end gradient descent on discrete memory representation are needed to support these solutions. Preliminary results of above neural architectures on simple algorithms (sorting, copying) and Question Answering (based on story, dialogs) application are comparable with the state of the art. In this paper, I explain these architectures (in general), the additional techniques used and the results of their application.\nOptimal stopping problems consider the question of deciding when to stop an observation-generating process in order to maximize a return. We examine the problem of simultaneously learning and planning in such domains, when data is collected directly from the environment. We propose GFSE, a simple and flexible model-free policy search method that reuses data for sample efficiency by leveraging problem structure. We bound the sample complexity of our approach to guarantee uniform convergence of policy value estimates, tightening existing PAC bounds to achieve logarithmic dependence on horizon length for our setting. We also examine the benefit of our method against prevalent model-based and model-free approaches on 3 domains taken from diverse fields.\nWe present a novel algorithm that synthesizes imperative programs for introductory programming courses. Given a set of input-output examples and a partial program, our algorithm generates a complete program that is consistent with every example. Our key idea is to combine enumerative program synthesis and static analysis, which aggressively prunes out a large search space while guaranteeing to find, if any, a correct solution. We have implemented our algorithm in a tool, called SIMPL, and evaluated it on 30 problems used in introductory programming courses. 
The results show that SIMPL is able to solve the benchmark problems in 6.6 seconds on average.\nIn order to obtain reliable accuracy estimates for automatic MOOC dropout predictors, it is important to train and test them in a manner consistent with how they will be used in practice. Yet most prior research on MOOC dropout prediction has measured test accuracy on the same course used for training the classifier, which can lead to overly optimistic accuracy estimates. In order to understand better how accuracy is affected by the training+testing regime, we compared the accuracy of a standard dropout prediction architecture (clickstream features + logistic regression) across 4 different training paradigms. Results suggest that (1) training and testing on the same course (\"post-hoc\") can overestimate accuracy by several percentage points; (2) dropout classifiers trained on proxy labels based on students' persistence are surprisingly competitive with post-hoc training (87.33% versus 90.20% AUC averaged over 8 weeks of 40 HarvardX MOOCs); and (3) classifier performance does not vary significantly with the academic discipline. Finally, we also research new dropout prediction architectures based on deep, fully-connected, feed-forward neural networks and find that (4) networks with as many as 5 hidden layers can statistically significantly increase test accuracy over that of logistic regression.\nColorization of grayscale images has been a hot topic in computer vision. Previous research mainly focuses on producing a colored image to match the original one. However, since many colors share the same gray value, an input grayscale image could be diversely colored while maintaining its reality. In this paper, we design a novel solution for unsupervised diverse colorization. 
Specifically, we leverage conditional generative adversarial networks to model the distribution of real-world item colors, in which we develop a fully convolutional generator with multi-layer noise to enhance diversity, with multi-layer condition concatenation to maintain reality, and with stride 1 to keep spatial information. With such a novel network architecture, the model yields highly competitive performance on the open LSUN bedroom dataset. The Turing test of 80 humans further indicates our generated color schemes are highly convincible.\nVisual question answering (VQA) has witnessed great progress since May, 2015 as a classic problem unifying visual and textual data into a system. Many enlightening VQA works explore deep into the image and question encodings and fusing methods, of which attention is the most effective and infusive mechanism. Current attention based methods focus on adequate fusion of visual and textual features, but lack the attention to where people focus to ask questions about the image. Traditional attention based methods attach a single value to the feature at each spatial location, which losses many useful information. To remedy these problems, we propose a general method to perform saliency-like pre-selection on overlapped region features by the interrelation of bidirectional LSTM (BiLSTM), and use a novel element-wise multiplication based attention method to capture more competent correlation information between visual and textual features. We conduct experiments on the large-scale COCO-VQA dataset and analyze the effectiveness of our model demonstrated by strong empirical results.\nRecent studies have shown that deep neural networks (DNN) are vulnerable to adversarial samples: maliciously-perturbed samples crafted to yield incorrect model outputs. Such attacks can severely undermine DNN systems, particularly in security-sensitive settings. 
It was observed that an adversary could easily generate adversarial samples by making a small perturbation on irrelevant feature dimensions that are unnecessary for the current classification task. To overcome this problem, we introduce a defensive mechanism called DeepCloak. By identifying and removing unnecessary features in a DNN model, DeepCloak limits the capacity an attacker can use generating adversarial samples and therefore increase the robustness against such inputs. Comparing with other defensive approaches, DeepCloak is easy to implement and computationally efficient. Experimental results show that DeepCloak can increase the performance of state-of-the-art DNN models against adversarial samples.\nThe algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as that direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable.   We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class.   We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes.\nThe field of Distributed Constraint Optimization has gained momentum in recent years, thanks to its ability to address various applications related to multi-agent cooperation. 
Nevertheless, solving Distributed Constraint Optimization Problems (DCOPs) optimally is NP-hard. Therefore, in large-scale, complex applications, incomplete DCOP algorithms are necessary. Current incomplete DCOP algorithms suffer of one or more of the following limitations: they (a) find local minima without providing quality guarantees; (b) provide loose quality assessment; or (c) are unable to benefit from the structure of the problem, such as domain-dependent knowledge and hard constraints. Therefore, capitalizing on strategies from the centralized constraint solving community, we propose a Distributed Large Neighborhood Search (D-LNS) framework to solve DCOPs. The proposed framework (with its novel repair phase) provides guarantees on solution quality, refining upper and lower bounds during the iterative process, and can exploit domain-dependent structures. Our experimental results show that D-LNS outperforms other incomplete DCOP algorithms on both structured and unstructured problem instances.\nThe field of Distributed Constraint Optimization has gained momentum in recent years thanks to its ability to address various applications related to multi-agent cooperation. While techniques to solve Distributed Constraint Optimization Problems (DCOPs) are abundant and have matured substantially since the field inception, the number of DCOP realistic applications and benchmark used to asses the performance of DCOP algorithms is lagging behind. To contrast this background we (i) introduce the Smart Home Device Scheduling (SHDS) problem, which describe the problem of coordinating smart devices schedules across multiple homes as a multi-agent system, (ii) detail the physical models adopted to simulate smart sensors, smart actuators, and homes environments, and (iii) introduce a DCOP realistic benchmark for SHDS problems.\nDevising an optimal strategy for navigation in a partially observable environment is one of the key objectives in AI. 
One of the problem in this context is the Canadian Traveler Problem (CTP). CTP is a navigation problem where an agent is tasked to travel from source to target in a partially observable weighted graph, whose edge might be blocked with a certain probability and observing such blockage occurs only when reaching upon one of the edges end points. The goal is to find a strategy that minimizes the expected travel cost. The problem is known to be P$\\#$ hard. In this work we study the CTP theoretically and empirically. First, we study the Dep-CTP, a CTP variant we introduce which assumes dependencies between the edges status. We show that Dep-CTP is intractable, and further we analyze two of its subclasses on disjoint paths graph. Second, we develop a general algorithm Gen-PAO that optimally solve the CTP. Gen-PAO is capable of solving two other types of CTP called Sensing-CTP and Expensive-Edges CTP. Since the CTP is intractable, Gen-PAO use some pruning methods to reduce the space search for the optimal solution. We also define some variants of Gen-PAO, compare their performance and show some benefits of Gen-PAO over existing work.\nApprenticeship learning has recently attracted a wide attention due to its capability of allowing robots to learn physical tasks directly from demonstrations provided by human experts. Most previous techniques assumed that the state space is known a priori or employed simple state representations that usually suffer from perceptual aliasing. Different from previous research, we propose a novel approach named Sequence-based Multimodal Apprenticeship Learning (SMAL), which is capable to simultaneously fusing temporal information and multimodal data, and to integrate robot perception with decision making. To evaluate the SMAL approach, experiments are performed using both simulations and real-world robots in the challenging search and rescue scenarios. 
The empirical study has validated that our SMAL approach can effectively learn plans for robots to make decisions using sequence of multimodal observations. Experimental results have also showed that SMAL outperforms the baseline methods using individual images.\nRepresentation learning of knowledge graphs encodes entities and relation types into a continuous low-dimensional vector space, learns embeddings of entities and relation types. Most existing methods only concentrate on knowledge triples, ignoring logic rules which contain rich background knowledge. Although there has been some work aiming at leveraging both knowledge triples and logic rules, they ignore the transitivity and antisymmetry of logic rules. In this paper, we propose a novel approach to learn knowledge representations with entities and ordered relations in knowledges and logic rules. The key idea is to integrate knowledge triples and logic rules, and approximately order the relation types in logic rules to utilize the transitivity and antisymmetry of logic rules. All entries of the embeddings of relation types are constrained to be non-negative. We translate the general constrained optimization problem into an unconstrained optimization problem to solve the non-negative matrix factorization. Experimental results show that our model significantly outperforms other baselines on knowledge graph completion task. It indicates that our model is capable of capturing the transitivity and antisymmetry information, which is significant when learning embeddings of knowledge graphs.\nWe introduce AI rationalization, an approach for generating explanations of autonomous system behavior as if a human had performed the behavior. We describe a rationalization technique that uses neural machine translation to translate internal state-action representations of an autonomous agent into natural language. 
We evaluate our technique in the Frogger game environment, training an autonomous game playing agent to rationalize its action choices using natural language. A natural language training corpus is collected from human players thinking out loud as they play the game. We motivate the use of rationalization as an approach to explanation generation and show the results of two experiments evaluating the effectiveness of rationalization. Results of these evaluations show that neural machine translation is able to accurately generate rationalizations that describe agent behavior, and that rationalizations are more satisfying to humans than other alternative methods of explanation.\nOpen forms of global constraints allow the addition of new variables to an argument during the execution of a constraint program. Such forms are needed for difficult constraint programming problems where problem construction and problem solving are interleaved, and fit naturally within constraint logic programming. However, in general, filtering that is sound for a global constraint can be unsound when the constraint is open. This paper provides a simple characterization, called contractibility, of the constraints where filtering remains sound when the constraint is open. With this characterization we can easily determine whether a constraint has this property or not. In the latter case, we can use it to derive a contractible approximation to the constraint. We demonstrate this work on both hard and soft constraints. In the process, we formulate two general classes of soft constraints.\nPolicy evaluation is a crucial step in many reinforcement-learning procedures, which estimates a value function that predicts states' long-term value under a given policy. In this paper, we focus on policy evaluation with linear function approximation over a fixed dataset. 
We first transform the empirical policy evaluation problem into a (quadratic) convex-concave saddle point problem, and then present a primal-dual batch gradient method, as well as two stochastic variance reduction methods for solving the problem. These algorithms scale linearly in both sample size and feature dimension. Moreover, they achieve linear convergence even when the saddle-point problem has only strong concavity in the dual variables but no strong convexity in the primal variables. Numerical experiments on benchmark problems demonstrate the effectiveness of our methods.\nDespite the successes in capturing continuous distributions, the application of generative adversarial networks (GANs) to discrete settings, like natural language tasks, is rather restricted. The fundamental reason is the difficulty of back-propagation through discrete random variables combined with the inherent instability of the GAN training objective. To address these problems, we propose Maximum-Likelihood Augmented Discrete Generative Adversarial Networks. Instead of directly optimizing the GAN objective, we derive a novel and low-variance objective using the discriminator's output that follows corresponds to the log-likelihood. Compared with the original, the new objective is proved to be consistent in theory and beneficial in practice. The experimental results on various discrete datasets demonstrate the effectiveness of the proposed approach.\nWe propose a method for learning expressive energy-based policies for continuous states and actions, which has been feasible only in tabular domains before. We apply our method to learning maximum entropy policies, resulting into a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution. We use the recently proposed amortized Stein variational gradient descent to learn a stochastic sampling network that approximates samples from this distribution. 
The benefits of the proposed algorithm include improved exploration and compositionality that allows transferring skills between tasks, which we confirm in simulated experiments with swimming and walking robots. We also draw a connection to actor-critic methods, which can be viewed performing approximate inference on the corresponding energy-based model.\nEffective teams are crucial for organisations, especially in environments that require teams to be constantly created and dismantled, such as software development, scientific experiments, crowd-sourcing, or the classroom. Key factors influencing team performance are competences and personality of team members. Hence, we present a computational model to compose proficient and congenial teams based on individuals' personalities and their competences to perform tasks of different nature. With this purpose, we extend Wilde's post-Jungian method for team composition, which solely employs individuals' personalities. The aim of this study is to create a model to partition agents into teams that are balanced in competences, personality and gender. Finally, we present some preliminary empirical results that we obtained when analysing student performance. Results show the benefits of a more informed team composition that exploits individuals' competences besides information about their personalities.\nBalancing fairness and efficiency in resource allocation is a classical economic and computational problem. The price of fairness measures the worst-case loss of economic efficiency when using an inefficient but fair allocation rule; for indivisible goods in many settings, this price is unacceptably high. One such setting is kidney exchange, where needy patients swap willing but incompatible kidney donors. In this work, we close an open problem regarding the theoretical price of fairness in modern kidney exchanges. 
We then propose a general hybrid fairness rule that balances a strict lexicographic preference ordering over classes of agents, and a utilitarian objective that maximizes economic efficiency. We develop a utility function for this rule that favors disadvantaged groups lexicographically; but if cost to overall efficiency becomes too high, it switches to a utilitarian objective. This rule has only one parameter which is proportional to a bound on the price of fairness, and can be adjusted by policymakers. We apply this rule to real data from a large kidney exchange and show that our hybrid rule produces more reliable outcomes than other fairness rules.\nWe study the problem of learning probabilistic first-order logical rules for knowledge base reasoning. This learning problem is difficult because it requires learning the parameters in a continuous space as well as the structure in a discrete space. We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model. This approach is inspired by a recently-developed differentiable logic called TensorLog, where inference tasks can be compiled into sequences of differentiable operations. We design a neural controller system that learns to compose these operations. Empirically, our method outperforms prior work on multiple knowledge base benchmark datasets, including Freebase and WikiMovies.\nThe runtime performance of modern SAT solvers on random $k$-CNF formulas is deeply connected with the 'phase-transition' phenomenon seen empirically in the satisfiability of random $k$-CNF formulas. Recent universal hashing-based approaches to sampling and counting crucially depend on the runtime performance of SAT solvers on formulas expressed as the conjunction of both $k$-CNF and XOR constraints (known as $k$-CNF-XOR formulas), but the behavior of random $k$-CNF-XOR formulas is unexplored in prior work. 
In this paper, we present the first study of the satisfiability of random $k$-CNF-XOR formulas. We show empirical evidence of a surprising phase-transition that follows a linear trade-off between $k$-CNF and XOR constraints. Furthermore, we prove that a phase-transition for $k$-CNF-XOR formulas exists for $k = 2$ and (when the number of $k$-CNF constraints is small) for $k > 2$.\nWe study the problem of causal structure learning over a set of random variables when the experimenter is allowed to perform at most $M$ experiments in a non-adaptive manner. We consider the optimal learning strategy in terms of minimizing the portions of the structure that remains unknown given the limited number of experiments in both Bayesian and minimax setting. We characterize the theoretical optimal solution and propose an algorithm, which designs the experiments efficiently in terms of time complexity. We show that for bounded degree graphs, in the minimax case and in the Bayesian case with uniform priors, our proposed algorithm is a $\\rho$-approximation algorithm, where $\\rho$ is independent of the order of the underlying graph. Simulations on both synthetic and real data show that the performance of our algorithm is very close to the optimal solution.\nFor a safe, natural and effective human-robot social interaction, it is essential to develop a system that allows a robot to demonstrate the perceivable responsive behaviors to complex human behaviors. We introduce the Multimodal Deep Attention Recurrent Q-Network using which the robot exhibits human-like social interaction skills after 14 days of interacting with people in an uncontrolled real world. Each and every day during the 14 days, the system gathered robot interaction experiences with people through a hit-and-trial method and then trained the MDARQN on these experiences using end-to-end reinforcement learning approach. 
The results of interaction based learning indicate that the robot has learned to respond to complex human behaviors in a perceivable and socially acceptable manner.\nThis paper presents an approach for transforming data granularity in hierarchical databases for binary decision problems by applying regression to categorical attributes at the lower grain levels. Attributes from a lower hierarchy entity in the relational database have their information content optimized through regression on the categories histogram trained on a small exclusive labelled sample, instead of the usual mode category of the distribution. The paper validates the approach on a binary decision task for assessing the quality of secondary schools focusing on how logistic regression transforms the students and teachers attributes into school attributes. Experiments were carried out on Brazilian schools public datasets via 10-fold cross-validation comparison of the ranking score produced also by logistic regression. The proposed approach achieved higher performance than the usual distribution mode transformation and equal to the expert weighing approach measured by the maximum Kolmogorov-Smirnov distance and the area under the ROC curve at 0.01 significance level.\nThe optimal allocation of resources for maximizing influence, spread of information or coverage, has gained attention in the past years, in particular in machine learning and data mining. But in applications, the parameters of the problem are rarely known exactly, and using wrong parameters can lead to undesirable outcomes. We hence revisit a continuous version of the Budget Allocation or Bipartite Influence Maximization problem introduced by Alon et al. (2012) from a robust optimization perspective, where an adversary may choose the least favorable parameters within a confidence set. The resulting problem is a nonconvex-concave saddle point problem (or game). 
We show that this nonconvex problem can be solved exactly by leveraging connections to continuous submodular functions, and by solving a constrained submodular minimization problem. Although constrained submodular minimization is hard in general, here, we establish conditions under which such a problem can be solved to arbitrary precision $\\epsilon$.\nWe consider elections where the voters come one at a time, in a streaming fashion, and devise space-efficient algorithms which identify an approximate winning committee with respect to common multiwinner proportional representation voting rules; specifically, we consider the Approval-based and the Borda-based variants of both the Chamberlin--Courant rule and the Monroe rule. We complement our algorithms with lower bounds. Somewhat surprisingly, our results imply that, using space which does not depend on the number of voters it is possible to efficiently identify an approximate representative committee of fixed size over vote streams with huge number of voters.\nWe establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization. Specifically, we show that softmax consistent action values correspond to optimal entropy regularized policy probabilities along any action sequence, regardless of provenance. From this observation, we develop a new RL algorithm, Path Consistency Learning (PCL), that minimizes a notion of soft consistency error along multi-step action sequences extracted from both on- and off-policy traces. We examine the behavior of PCL in different scenarios and show that PCL can be interpreted as generalizing both actor-critic and Q-learning algorithms. We subsequently deepen the relationship by showing how a single model can be used to represent both a policy and the corresponding softmax state values, eliminating the need for a separate critic.
The experimental evaluation demonstrates that PCL significantly outperforms strong actor-critic and Q-learning baselines across several benchmarks.\nTask-oriented dialog systems have been applied in various tasks, such as automated personal assistants, customer service providers and tutors. These systems work well when users have clear and explicit intentions that are well-aligned to the systems' capabilities. However, they fail if users intentions are not explicit. To address this shortcoming, we propose a framework to interleave non-task content (i.e. everyday social conversation) into task conversations. When the task content fails, the system can still keep the user engaged with the non-task content. We trained a policy using reinforcement learning algorithms to promote long-turn conversation coherence and consistency, so that the system can have smooth transitions between task and non-task content. To test the effectiveness of the proposed framework, we developed a movie promotion dialog system. Experiments with human users indicate that a system that interleaves social and task content achieves a better task success rate and is also rated as more engaging compared to a pure task-oriented system.\nOne-sided matching mechanisms are fundamental for assigning a set of indivisible objects to a set of self-interested agents when monetary transfers are not allowed. Two widely-studied randomized mechanisms in multiagent settings are the Random Serial Dictatorship (RSD) and the Probabilistic Serial Rule (PS). Both mechanisms require only that agents specify ordinal preferences and have a number of desirable economic and computational properties. However, the induced outcomes of the mechanisms are often incomparable and thus there are challenges when it comes to deciding which mechanism to adopt in practice. In this paper, we first consider the space of general ordinal preferences and provide empirical results on the (in)comparability of RSD and PS. 
We analyze their respective economic properties under general and lexicographic preferences. We then instantiate utility functions with the goal of gaining insights on the manipulability, efficiency, and envyfreeness of the mechanisms under different risk-attitude models. Our results hold under various preference distribution models, which further confirm the broad use of RSD in most practical applications.\nAn increasing amount of information is generated from the rapidly increasing number of sensor networks and smart devices. A wide variety of sources generate and publish information in different formats, thus highlighting interoperability as one of the key prerequisites for the success of Internet of Things (IoT). The BT Hypercat Data Hub provides a focal point for the sharing and consumption of available datasets from a wide range of sources. In this work, we propose a semantic enrichment of the BT Hypercat Data Hub, using well-accepted Semantic Web standards and tools. We propose an ontology that captures the semantics of the imported data and present the BT SPARQL Endpoint by means of a mapping between SPARQL and SQL queries. Furthermore, federated SPARQL queries allow queries over multiple hub-based and external data sources. Finally, we provide two use cases in order to illustrate the advantages afforded by our semantic approach.\nLarge computer-understandable proofs consist of millions of intermediate logical steps. The vast majority of such steps originate from manually selected and manually guided heuristics applied to intermediate goals. So far, machine learning has generally not been used to filter or generate these steps. In this paper, we introduce a new dataset based on Higher-Order Logic (HOL) proofs, for the purpose of developing new machine learning-based theorem-proving strategies. We make this dataset publicly available under the BSD license. 
We propose various machine learning tasks that can be performed on this dataset, and discuss their significance for theorem proving. We also benchmark a set of simple baseline machine learning models suited for the tasks (including logistic regression, convolutional neural networks and recurrent neural networks). The results of our baseline models show the promise of applying machine learning to HOL theorem proving.\nMost exact methods for k-nearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing (DCI) offers a promising way of circumventing the curse and successfully reduces the dependence of query time on intrinsic dimensionality from exponential to sublinear. In this paper, we propose a variant of DCI, which we call Prioritized DCI, and show a remarkable improvement in the dependence of query time on intrinsic dimensionality. In particular, a linear increase in intrinsic dimensionality, or equivalently, an exponential increase in the number of points near a query, can be mostly counteracted with just a linear increase in space. We also demonstrate empirically that Prioritized DCI significantly outperforms prior methods. In particular, relative to Locality-Sensitive Hashing (LSH), Prioritized DCI reduces the number of distance evaluations by a factor of 14 to 116 and the memory consumption by a factor of 21.\nLearning to Optimize is a recently proposed framework for learning optimization algorithms using reinforcement learning. In this paper, we explore learning an optimization algorithm for training shallow neural nets. Such high-dimensional stochastic optimization problems present interesting challenges for existing reinforcement learning algorithms. 
We develop an extension that is suited to learning optimization algorithms in this setting and demonstrate that the learned optimization algorithm consistently outperforms other known optimization algorithms even on unseen tasks and is robust to changes in stochasticity of gradients and the neural net architecture. More specifically, we show that an optimization algorithm trained with the proposed method on the problem of training a neural net on MNIST generalizes to the problems of training neural nets on the Toronto Faces Dataset, CIFAR-10 and CIFAR-100.\nThis paper presents OptNet, a network architecture that integrates optimization problems (here, specifically in the form of quadratic programs) as individual layers in larger end-to-end trainable deep networks. These layers encode constraints and complex dependencies between the hidden states that traditional convolutional and fully-connected layers often cannot capture. In this paper, we explore the foundations for such an architecture: we show how techniques from sensitivity analysis, bilevel optimization, and implicit differentiation can be used to exactly differentiate through these layers and with respect to layer parameters; we develop a highly efficient solver for these layers that exploits fast GPU-based batch solves within a primal-dual interior point method, and which provides backpropagation gradients with virtually no additional cost on top of the solve; and we highlight the application of these approaches in several problems. 
In one notable example, we show that the method is capable of learning to play mini-Sudoku (4x4) given just input and output games, with no a priori information about the rules of the game; this highlights the ability of our architecture to learn hard constraints better than other neural architectures.\nIn this paper, we present a general framework for learning social affordance grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human interactions, and transfer the grammar to humanoids to enable a real-time motion inference for human-robot interaction (HRI). Based on Gibbs sampling, our weakly supervised grammar learning can automatically construct a hierarchical representation of an interaction with long-term joint sub-tasks of both agents and short term atomic actions of individual agents. Based on a new RGB-D video dataset with rich instances of human interactions, our experiments of Baxter simulation, human evaluation, and real Baxter test demonstrate that the model learned from limited training data successfully generates human-like behaviors in unseen scenarios and outperforms both baselines.\nThe selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. 
Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.\nThe success of deep learning depends on finding an architecture to fit the task. As deep learning has scaled up to more challenging tasks, the architectures have become difficult to design by hand. This paper proposes an automated method, CoDeepNEAT, for optimizing deep learning architectures through evolution. By extending existing neuroevolution methods to topology, components, and hyperparameters, this method achieves results comparable to the best human designs in standard benchmarks in object recognition and language modeling. It also supports building a real-world application of automated image captioning on a magazine website. Given the anticipated increases in available computing power, evolution of deep networks is a promising approach to constructing deep learning applications in the future.\nOnline two-sided matching markets such as Q&A forums (e.g. StackOverflow, Quora) and online labour platforms (e.g. Upwork) critically rely on the ability to propose adequate matches based on imperfect knowledge of the two parties to be matched. This prompts the following question: Which matching recommendation algorithms can, in the presence of such uncertainty, lead to efficient platform operation? To answer this question, we develop a model of a task / server matching system. For this model, we give a necessary and sufficient condition for an incoming stream of tasks to be manageable by the system. We further identify a so-called back-pressure policy under which the throughput that the system can handle is optimized. 
We show that this policy achieves strictly larger throughput than a natural greedy policy. Finally, we validate our model and confirm our theoretical findings with experiments based on logs of Math.StackExchange, a StackOverflow forum dedicated to mathematics.\nMachine-learning techniques have been recently used with spectacular results to generate artefacts such as music or text. However, these techniques are still unable to capture and generate artefacts that are convincingly structured. In this paper we present an approach to generate structured musical sequences. We introduce a mechanism for efficiently sampling variations of musical sequences. Given an input sequence and a statistical model, this mechanism samples a set of sequences whose distance to the input sequence is approximately within specified bounds. This mechanism is implemented as an extension of belief propagation, and uses local fields to bias the generation. We show experimentally that sampled sequences are indeed closely correlated to the standard musical similarity measure defined by Mongeau and Sankoff. We then show how this mechanism can be used to implement composition strategies that enforce arbitrary structure on a musical lead sheet generation problem.\nPlan recognition algorithms are required to recognize a complete hierarchy explaining the agent's actions and goals. While the output of such algorithms is informative to the recognizer, the cost of its calculation is high in run-time, space, and completeness. Moreover, performing plan recognition online requires the observing agent to reason about future actions that have not yet been seen and maintain a set of hypotheses to support all possible options. This paper presents a new and efficient algorithm for online plan recognition called SLIM (Semi-Lazy Inference Mechanism). 
It combines bottom-up and top-down parsing processes, which allow it to commit only to the minimum necessary actions in real-time, but still provide complete hypotheses post factum. We show both theoretically and empirically that although the computational cost of this process is still exponential, there is a significant improvement in run-time when compared to a state-of-the-art plan recognition algorithm.\nUnsupervised image-to-image translation aims at learning a joint distribution of images in different domains by using images from the marginal distributions in individual domains. Since there exists an infinite set of joint distributions that can arrive at the given marginal distributions, one could infer nothing about the joint distribution from the marginal distributions without additional assumptions. To address the problem, we make a shared-latent space assumption and propose an unsupervised image-to-image translation framework based on Coupled GANs. We compare the proposed framework with competing approaches and present high quality image translation results on various challenging unsupervised image translation tasks, including street scene image translation, animal image translation, and face image translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on benchmark datasets. Code and additional results are available at https://github.com/mingyuliutw/unit .\nIn this work, we open up the DAWT dataset - Densely Annotated Wikipedia Texts across multiple languages. The annotations include labeled text mentions mapping to entities (represented by their Freebase machine ids) as well as the type of the entity. The data set contains a total of 13.6M articles, 5.0B tokens, 13.8M mention entity co-occurrences. DAWT contains 4.8 times more anchor text to entity links than originally present in the Wikipedia markup. 
Moreover, it spans several languages including English, Spanish, Italian, German, French and Arabic. We also present the methodology used to generate the dataset which enriches Wikipedia markup in order to increase the number of links. In addition to the main dataset, we open up several derived datasets including mention entity co-occurrence counts and entity embeddings, as well as mappings between Freebase ids and Wikidata item ids. We also discuss two applications of these datasets and hope that opening them up would prove useful for the Natural Language Processing and Information Retrieval communities, as well as facilitate multi-lingual research.\nGeneric generation and manipulation of text is challenging and has had limited success compared to recent deep generative modeling in the visual domain. This paper aims at generating plausible natural language sentences, whose attributes are dynamically controlled by learning disentangled latent representations with designated semantics. We propose a new neural generative model which combines variational auto-encoders and holistic attribute discriminators for effective imposition of semantic structures. With differentiable approximation to discrete text samples, explicit constraints on independent attribute controls, and efficient collaborative learning of generator and discriminators, our model learns highly interpretable representations even from word annotations alone, and produces realistic sentences with desired attributes. Quantitative evaluation validates the accuracy of sentence and attribute generation.\nRepresentation learning and option discovery are two of the biggest challenges in reinforcement learning (RL). Proto-value functions (PVFs) are a well-known approach for representation learning in MDPs. In this paper we address the option discovery problem by showing how PVFs implicitly define options. We do it by introducing eigenpurposes, intrinsic reward functions derived from the learned representations. 
The options discovered from eigenpurposes traverse the principal directions of the state space. They are useful for multiple tasks because they are discovered without taking the environment's rewards into consideration. Moreover, different options act at different time scales, making them helpful for exploration. We demonstrate features of eigenpurposes in traditional tabular domains as well as in Atari 2600 games.\nRestricted Boltzmann machines (RBMs) and conditional RBMs (CRBMs) are popular models for a wide range of applications. In previous work, learning on such models has been dominated by contrastive divergence (CD) and its variants. Belief propagation (BP) algorithms are believed to be slow for structured prediction on conditional RBMs (e.g., Mnih et al. [2011]), and not as good as CD when applied in learning (e.g., Larochelle et al. [2012]). In this work, we present a matrix-based implementation of belief propagation algorithms on CRBMs, which is easily scalable to tens of thousands of visible and hidden units. We demonstrate that, in both maximum likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than current state-of-the-art CD methods on structured prediction problems. We also include practical guidelines on training CRBMs with BP, and some insights on the interaction of learning and inference algorithms for CRBMs.\nOne of the major drawbacks of modularized task-completion dialogue systems is that each module is trained individually, which presents several challenges. For example, downstream modules are affected by earlier modules, and the performance of the entire system is not robust to the accumulated errors. This paper presents a novel end-to-end learning framework for task-completion dialogue systems to tackle such issues. 
Our neural dialogue system can directly interact with a structured database to assist users in accessing information and accomplishing certain tasks. The reinforcement learning based dialogue manager offers robust capabilities to handle noises caused by other components of the dialogue system. Our experiments in a movie-ticket booking domain show that our end-to-end system not only outperforms modularized dialogue system baselines for both objective and subjective evaluation, but also is robust to noises as demonstrated by several systematic experiments with different error granularity and rates specific to the language understanding module.\nWe design a new approach that allows robot learning of new activities from unlabeled human example videos. Given videos of humans executing the same activity from a human's viewpoint (i.e., first-person videos), our objective is to make the robot learn the temporal structure of the activity as its future regression network, and learn to transfer such model for its own motor execution. We present a new deep learning model: We extend the state-of-the-art convolutional object detection network for the representation/estimation of human hands in training videos, and newly introduce the concept of using a fully convolutional network to regress (i.e., predict) the intermediate scene representation corresponding to the future frame (e.g., 1-2 seconds later). Combining these allows direct prediction of future locations of human hands and objects, which enables the robot to infer the motor control plan using our manipulation network. We experimentally confirm that our approach makes learning of robot activities from unlabeled human interaction videos possible, and demonstrate that our robot is able to execute the learned collaborative activities in real-time directly based on its camera input.\nWe introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning. 
Our approach is inspired by the feudal reinforcement learning proposal of Dayan and Hinton, and gains power and efficacy by decoupling end-to-end learning across multiple levels -- allowing it to utilise different resolutions of time. Our framework employs a Manager module and a Worker module. The Manager operates at a lower temporal resolution and sets abstract goals which are conveyed to and enacted by the Worker. The Worker generates primitive actions at every tick of the environment. The decoupled structure of FuN conveys several benefits -- in addition to facilitating very long timescale credit assignment it also encourages the emergence of sub-policies associated with different goals set by the Manager. These properties allow FuN to dramatically outperform a strong baseline agent on tasks that involve long-term credit assignment or memorisation. We demonstrate the performance of our proposed system on a range of tasks from the ATARI suite and also from a 3D DeepMind Lab environment.\nCommunity-based question answering (CQA) services are facing key challenges to motivate domain experts to provide timely answers. Recently, CQA services are exploring new incentive models to engage experts and celebrities by allowing them to set a price on their answers. In this paper, we perform a data-driven analysis on two emerging payment-based CQA systems: Fenda (China) and Whale (US). By analyzing a large dataset of 220K questions (worth 1 million USD collectively), we examine how monetary incentives affect different players in the system. We find that, while monetary incentive enables quick answers from experts, it also drives certain users to aggressively game the system for profits. In addition, in this supplier-driven marketplace, users need to proactively adjust their price to make profits. Famous people are unwilling to lower their price, which in turn hurts their income and engagement over time. 
Finally, we discuss the key implications for future CQA design.\nIn recent years, work has been done to develop the theory of General Reinforcement Learning (GRL). However, there are few examples demonstrating these results in a concrete way. In particular, there are no examples demonstrating the known results regarding generalised discounting. We have added to the GRL simulation platform AIXIjs the functionality to assign an agent arbitrary discount functions, and an environment which can be used to determine the effect of discounting on an agent's policy. Using this, we investigate how geometric, hyperbolic and power discounting affect an informed agent in a simple MDP. We experimentally reproduce a number of theoretical results, and discuss some related subtleties. It was found that the agent's behaviour followed what is expected theoretically, assuming appropriate parameters were chosen for the Monte-Carlo Tree Search (MCTS) planning algorithm.\nAs statistical classifiers become integrated into real-world applications, it is important to consider not only their accuracy but also their robustness to changes in the data distribution. In this paper, we consider the case where there is an unobserved confounding variable $z$ that influences both the features $\\mathbf{x}$ and the class variable $y$. When the influence of $z$ changes from training to testing data, we find that the classifier accuracy can degrade rapidly. In our approach, we assume that we can predict the value of $z$ at training time with some error. The prediction for $z$ is then fed to Pearl's back-door adjustment to build our model. Because of the attenuation bias caused by measurement error in $z$, standard approaches to controlling for $z$ are ineffective. In response, we propose a method to properly control for the influence of $z$ by first estimating its relationship with the class variable $y$, then updating predictions for $z$ to match that estimated relationship. 
By adjusting the influence of $z$, we show that we can build a model that exceeds competing baselines on accuracy as well as on robustness over a range of confounding relationships.\nPlausible reasoning concerns situations whose inherent lack of precision is not quantified; that is, there are no degrees or levels of precision, and hence no use of numbers like probabilities. A hopefully comprehensive set of principles that clarifies what it means for a formal logic to do plausible reasoning is presented. A new propositional logic, called Propositional Plausible Logic (PPL), is defined and applied to some important examples. PPL is the only non-numeric non-monotonic logic we know of that satisfies all the principles and correctly reasons with all the examples. Some important results about PPL are proved.\nTo be able to interact better with humans, it is crucial for machines to understand sound - a primary modality of human perception. Previous works have used sound to learn embeddings for improved generic textual similarity assessment. In this work, we treat sound as a first-class citizen, studying downstream textual tasks which require aural grounding. To this end, we propose sound-word2vec - a new embedding scheme that learns specialized word embeddings grounded in sounds. For example, we learn that two seemingly (semantically) unrelated concepts, like leaves and paper, are similar due to the similar rustling sounds they make. Our embeddings prove useful in textual tasks requiring aural reasoning like text-based sound retrieval and discovering foley sound effects (used in movies). Moreover, our embedding space captures interesting dependencies between words and onomatopoeia and outperforms prior work on aurally-relevant word relatedness datasets such as AMEN and ASLex.\nThe Markov chain model is widely applied in many fields, especially the field of prediction. The classical discrete-time Markov chain (DTMC) is a widely used method for prediction. 
However, the classical DTMC model has some limitations when the system is complex with uncertain information or the state space is not discrete. To address this, a new belief Markov chain model is proposed by combining Dempster-Shafer evidence theory with the Markov chain. In our model, the uncertain data is allowed to be handled in the form of interval numbers and the basic probability assignment (BPA) is generated based on the distance between interval numbers. The new belief Markov chain model overcomes the shortcomings of the classical Markov chain and has an efficient ability in dealing with uncertain information. Moreover, an example of inventory prediction and the comparison between our model and the classical DTMC model show the effectiveness and rationality of our proposed model.\nSupplier selection is a typical multi-criteria decision making (MCDM) problem in which a great deal of uncertain information inevitably exists. To address this issue, a new method was proposed based on interval data fusion. Our method follows the original way to generate classical basic probability assignment (BPA) determined by the distance among the evidences. However, the weights of criteria are kept as interval numbers to generate interval BPAs and do the fusion of interval BPAs. Finally, the order is ranked and the decision is made according to the obtained interval BPAs. In this paper, a numerical example of supplier selection is applied to verify the feasibility and validity of our method. The new method is presented aiming at solving multiple-criteria decision-making problems in which the weights of criteria or experts are described in fuzzy data like linguistic terms or interval data.\nWe investigate the performance of the standard Greedy algorithm for cardinality constrained maximization of non-submodular nondecreasing set functions. While there are strong theoretical guarantees on the performance of Greedy for maximizing submodular functions, there are few guarantees for non-submodular ones. 
However, Greedy enjoys strong empirical performance for many important non-submodular functions, e.g., the Bayesian A-optimality objective in experimental design. We prove theoretical guarantees supporting the empirical performance. Our guarantees are characterized by a combination of the (generalized) curvature $\\alpha$ and the submodularity ratio $\\gamma$. In particular, we prove that Greedy enjoys a tight approximation guarantee of $\\frac{1}{\\alpha}(1- e^{-\\gamma\\alpha})$ for cardinality constrained maximization. In addition, we bound the submodularity ratio and curvature for several important real-world objectives, including the Bayesian A-optimality objective, the determinantal function of a square submatrix and certain linear programs with combinatorial constraints. We experimentally validate our theoretical findings for both synthetic and real-world applications.\nAdvances in neural network based classifiers have transformed automatic feature learning from a pipe dream of stronger AI to a routine and expected property of practical systems. Since the emergence of AlexNet, every winning submission of the ImageNet challenge has employed end-to-end representation learning, and due to the utility of good representations for transfer learning, representation learning has become an important task distinct from supervised learning. At present, this distinction is inconsequential, as supervised methods are state-of-the-art in learning transferable representations. But recent work has shown that generative models can also be powerful agents of representation learning. Will the representations learned from these generative methods ever rival the quality of those from their supervised competitors? In this work, we argue in the affirmative, that from an information theoretic perspective, generative models have greater potential for representation learning. 
Based on several experimentally validated assumptions, we show that supervised learning is upper bounded in its capacity for representation learning in ways that certain generative models, such as Generative Adversarial Networks (GANs) are not. We hope that our analysis will provide a rigorous motivation for further exploration of generative representation learning.\nEpistemic planning can be used for decision making in multi-agent situations with distributed knowledge and capabilities. Recently, Dynamic Epistemic Logic (DEL) has been shown to provide a very natural and expressive framework for epistemic planning. We extend the DEL-based epistemic planning framework to include perspective shifts, allowing us to define new notions of sequential and conditional planning with implicit coordination. With these, it is possible to solve planning tasks with joint goals in a decentralized manner without the agents having to negotiate about and commit to a joint policy at plan time. First we define the central planning notions and sketch the implementation of a planning system built on those notions. Afterwards we provide some case studies in order to evaluate the planner empirically and to show that the concept is useful for multi-agent systems in practice.\nA Robust Markov Decision Process (RMDP) is a sequential decision making model that accounts for uncertainty in the parameters of dynamic systems. This uncertainty introduces difficulties in learning an optimal policy, especially for environments with large state spaces. We propose two algorithms, RTD-DQN and Deep-RoK, for solving large-scale RMDPs using nonlinear approximation schemes such as deep neural networks. The RTD-DQN algorithm incorporates the robust Bellman temporal difference error into a robust loss function, yielding robust policies for the agent. 
The Deep-RoK algorithm is a robust Bayesian method, based on the Extended Kalman Filter (EKF), that accounts for both the uncertainty in the weights of the approximated value function and the uncertainty in the transition probabilities, improving the robustness of the agent. We provide theoretical results for our approach and test the proposed algorithms on a continuous state domain.\nThe sure thing principle and the law of total probability are basic laws in classical probability theory. A disjunction fallacy leads to the violation of these two classical probability laws. In this paper, a new quantum dynamic belief decision making model based on quantum dynamic modelling and Dempster-Shafer (D-S) evidence theory is proposed to address this issue and model the real human decision-making process. Some mathematical techniques are borrowed from quantum mathematics. Generally, belief and action are two parts in a decision making process. The uncertainty in the belief part is represented by a superposition of certain states. The uncertainty in actions is represented as an extra uncertainty state. The interference effect is produced due to the entanglement between beliefs and actions. Basic probability assignment (BPA) of decisions is generated by quantum dynamic modelling. Then the BPA of the extra uncertain state and an entanglement degree defined by an entropy function named Deng entropy are used to measure the interference effect. Compared with the existing model, our model has fewer free parameters. Finally, a classical categorization decision-making experiment is illustrated to show the effectiveness of our model.\nA team of robots sharing a common goal can benefit from coordination of the activities of team members, helping the team to reach the goal more reliably or quickly. We address the problem of coordinating the actions of a team of robots with periodic communication capability executing an information gathering task. 
We cast the problem as a multi-agent optimal decision-making problem with an information theoretic objective function. We show that appropriate techniques for solving decentralized partially observable Markov decision processes (Dec-POMDPs) are applicable in such information gathering problems. We quantify the usefulness of coordinated information gathering through simulation studies, and demonstrate the feasibility of the method in a real-world target tracking domain.\nWe consider the problem of learning a causal graph over a set of variables with interventions. We study the cost-optimal causal graph learning problem: For a given skeleton (undirected version of the causal graph), design the set of interventions with minimum total cost, that can uniquely identify any causal graph with the given skeleton. We show that this problem is solvable in polynomial time. Later, we consider the case when the number of interventions is limited. For this case, we provide polynomial time algorithms when the skeleton is a tree or a clique tree. For a general chordal skeleton, we develop an efficient greedy algorithm, which can be improved when the causal graph skeleton is an interval graph.\nThis work shows that policies with simple linear and RBF parameterizations can be trained to solve a variety of continuous control tasks, including the OpenAI gym benchmarks. The performance of these trained policies is competitive with state-of-the-art results, obtained with more elaborate parameterizations such as fully connected neural networks. Furthermore, existing training and testing scenarios are shown to be very limited and prone to over-fitting, thus giving rise to only trajectory-centric policies. Training with a diverse initial state distribution is shown to produce more global policies with better generalization. 
This allows for interactive control scenarios where the system recovers from large on-line perturbations, as shown in the supplementary video.\nThis paper is a tutorial on Formal Concept Analysis (FCA) and its applications. FCA is an applied branch of Lattice Theory, a mathematical discipline which enables formalisation of concepts as basic units of human thinking and analysing data in the object-attribute form. Originating in the early 80s, over the last three decades it has become a popular human-centred tool for knowledge representation and data analysis with numerous applications. Since the tutorial was specially prepared for RuSSIR 2014, the covered FCA topics include Information Retrieval with a focus on visualisation aspects, Machine Learning, Data Mining and Knowledge Discovery, Text Mining and several others.\nCluster analysis plays an important role in the decision making process for many knowledge-based systems. There exists a wide variety of different approaches for clustering applications including the heuristic techniques, probabilistic models, and traditional hierarchical algorithms. In this paper, a novel heuristic approach based on the big bang-big crunch algorithm is proposed for clustering problems. The proposed method not only takes advantage of heuristic nature to alleviate typical clustering algorithms such as k-means, but it also benefits from the memory based scheme as compared to its similar heuristic techniques. Furthermore, the performance of the proposed algorithm is investigated based on several benchmark test functions as well as on the well-known datasets. The experimental results show the significant superiority of the proposed method over the similar algorithms.\nCategorization is necessary for many decision making tasks. However, the categorization process may interfere with the decision making result and the law of total probability can be violated in some situations. 
To predict the interference effect of categorization, some models based on quantum probability have been proposed. In this paper, a new quantum dynamic belief (QDB) model is proposed. Considering that a precise decision may not be made during the process, the concept of uncertainty is introduced in our model to simulate the real human thinking process. Then the interference effect of categorization can be predicted by handling the uncertain information. The proposed model is applied to a categorization decision-making experiment to explain the interference effect of categorization. Compared with other models, our model is relatively more succinct and the result shows the correctness and effectiveness of our model.\nPeople can learn a wide range of tasks from their own experience, but can also learn from observing other creatures. This can accelerate acquisition of new skills even when the observed agent differs substantially from the learning agent in terms of morphology. In this paper, we examine how reinforcement learning algorithms can transfer knowledge between morphologically different agents (e.g., different robots). We introduce a problem formulation where two agents are tasked with learning multiple skills by sharing information. Our method uses the skills that were learned by both agents to train invariant feature spaces that can then be used to transfer other skills from one agent to another. The process of learning these invariant feature spaces can be viewed as a kind of \"analogy making\", or implicit learning of partial correspondences between two distinct domains. 
We evaluate our transfer learning algorithm on two simulated robotic manipulation skills, and illustrate that we can transfer knowledge between simulated robotic arms with different numbers of links, as well as simulated arms with different actuation mechanisms, where one robot is torque-driven while the other is tendon-driven.\nGene and protein networks are very important to model complex large-scale systems in molecular biology. Inferring or reverse-engineering such networks can be defined as the process of identifying gene/protein interactions from experimental data through computational analysis. However, this task is typically complicated by the enormously large scale of the unknowns in a rather small sample size. Furthermore, when the goal is to study causal relationships within the network, tools capable of overcoming the limitations of correlation networks are required. In this work, we make use of Bayesian Graphical Models to attack this problem and, specifically, we perform a comparative study of different state-of-the-art heuristics, analyzing their performance in inferring the structure of the Bayesian Network from breast cancer data.\nOne of the critical issues when adopting Bayesian networks (BNs) to model dependencies among random variables is to \"learn\" their structure, given the huge search space of possible solutions, i.e., all the possible directed acyclic graphs. This is a well-known NP-hard problem, which is also complicated by known pitfalls such as the issue of I-equivalence among different structures. In this work we restrict the investigations on BN structure learning to a specific class of networks, i.e., those representing the dynamics of phenomena characterized by the monotonic accumulation of events. Such phenomena allow one to set specific structural constraints based on Suppes' theory of probabilistic causation and, accordingly, to define constrained BNs, named Suppes-Bayes Causal Networks (SBCNs). 
We here investigate the structure learning of SBCNs via extensive simulations with various state-of-the-art search strategies, such as canonical local search techniques and Genetic Algorithms. Among the main results we show that Suppes' constraints greatly simplify the learning task, by reducing the solution search space and providing a temporal ordering on the variables.\nThe most recent financial upheavals have cast doubt on the adequacy of some of the conventional quantitative risk management strategies, such as VaR (Value at Risk), in many common situations. Consequently, there has been an increasing need for verisimilar financial stress testing, namely simulating and analyzing financial portfolios in extreme, albeit rare scenarios. Unlike conventional risk management which exploits statistical correlations among financial instruments, here we focus our analysis on the notion of probabilistic causation, which is embodied by Suppes-Bayes Causal Networks (SBCNs). SBCNs are probabilistic graphical models that have many attractive features in terms of more accurate causal analysis for generating financial stress scenarios. In this paper, we present a novel approach for conducting stress testing of financial portfolios based on SBCNs in combination with classical machine learning classification tools. The resulting method is shown to be capable of correctly discovering the causal relationships among financial factors that affect the portfolios and thus, simulating stress testing scenarios with a higher accuracy and lower computational complexity than conventional Monte Carlo Simulations.\nExtracting useful entities and attribute values from illicit domains such as human trafficking is a challenging problem with the potential for widespread social impact. Such domains employ atypical language models, have `long tails' and suffer from the problem of concept drift. 
In this paper, we propose a lightweight, feature-agnostic Information Extraction (IE) paradigm specifically designed for such domains. Our approach uses raw, unlabeled text from an initial corpus, and a few (12-120) seed annotations per domain-specific attribute, to learn robust IE models for unobserved pages and websites. Empirically, we demonstrate that our approach can outperform feature-centric Conditional Random Field baselines by over 18\\% F-Measure on five annotated sets of real-world human trafficking datasets in both low-supervision and high-supervision settings. We also show that our approach is demonstrably robust to concept drift, and can be efficiently bootstrapped even in a serial computing environment.\nThis paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification, and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.\nThis study proposes behavior-based navigation architecture, named BBFM, to deal with the problem of navigating the mobile robot in unknown environments in the presence of obstacles and local minimum regions. In the architecture, the complex navigation task is split into principal sub-tasks or behaviors. Each behavior is implemented by a fuzzy controller and executed independently to deal with a specific problem of navigation. 
The fuzzy controller is modified to contain only the fuzzification and inference procedures so that its output is a membership function representing the behavior's objective. The membership functions of all controllers are then used as the objective functions for a multi-objective optimization process to coordinate all behaviors. The result of this process is an overall control signal, which is Pareto-optimal, used to control the robot. A number of simulations, comparisons, and experiments were conducted. The results show that the proposed architecture outperforms some popular behavior-based architectures in terms of accuracy, smoothness, traveled distance, and time response.\nWe propose a new linear algebraic approach to the computation of Tarskian semantics in logic. We embed a finite model M in first-order logic with N entities in N-dimensional Euclidean space R^N by mapping entities of M to N dimensional one-hot vectors and k-ary relations to order-k adjacency tensors (multi-way arrays). Second, given a logical formula F in prenex normal form, we compile F into a set Sigma_F of algebraic formulas in multi-linear algebra with a nonlinear operation. In this compilation, existential quantifiers are compiled into a specific type of tensors, e.g., identity matrices in the case of quantifying two occurrences of a variable. It is shown that a systematic evaluation of Sigma_F in R^N gives the truth value, 1(true) or 0(false), of F in M. Based on this framework, we also propose an unprecedented way of computing the least models defined by Datalog programs in linear spaces via matrix equations and empirically show its effectiveness compared to state-of-the-art approaches.\nIn this paper we study selected argument forms involving counterfactuals and indicative conditionals under uncertainty. 
We selected argument forms to explore whether people with an Eastern cultural background reason differently about conditionals compared to Westerners, because of the differences in the location of negations. In a 2x2 between-participants design, 63 Japanese university students were allocated to four groups, crossing indicative conditionals and counterfactuals, and each presented in two random task orders. The data show close agreement between the responses of Easterners and Westerners. The modal responses provide strong support for the hypothesis that conditional probability is the best predictor for counterfactuals and indicative conditionals. Finally, the grand majority of the responses are probabilistically coherent, which endorses the psychological plausibility of choosing coherence-based probability logic as a rationality framework for psychological reasoning research.\nWe present a method for skin lesion segmentation for the ISIC 2017 Skin Lesion Segmentation Challenge. Our approach is based on a Fully Convolutional Network architecture which is trained end to end, from scratch, on a limited dataset. Our semantic segmentation architecture utilizes several recent innovations, in particular the combined use of (i) atrous convolutions to increase the effective field of view of the network's receptive field without increasing the number of parameters, (ii) network-in-network $1\\times1$ convolution layers to add capacity to the network and (iii) state-of-the-art super-resolution upsampling of predictions using subpixel CNN layers. We report a mean IoU score of 0.642 on the validation set provided by the organisers.\nAutonomous agents must often detect affordances: the set of behaviors enabled by a situation. Affordance detection is particularly helpful in domains with large action spaces, allowing the agent to prune its search space by avoiding futile behaviors. 
This paper presents a method for affordance extraction via word embeddings trained on a Wikipedia corpus. The resulting word vectors are treated as a common knowledge database which can be queried using linear algebra. We apply this method to a reinforcement learning agent in a text-only environment and show that affordance-based action selection improves performance most of the time. Our method increases the computational complexity of each learning step but significantly reduces the total number of steps needed. In addition, the agent's action selections begin to resemble those a human would choose.\nEvaluating a policy by deploying it in the real world can be risky and costly. Off-policy policy evaluation (OPE) algorithms use historical data collected from running a previous policy to evaluate a new policy, which provides a means for evaluating a policy without requiring it to ever be deployed. Importance sampling is a popular OPE method because it is robust to partial observability and works with continuous states and actions. However, the amount of historical data required by importance sampling can scale exponentially with the horizon of the problem: the number of sequential decisions that are made. We propose using policies over temporally extended actions, called options, and show that combining these policies with importance sampling can significantly improve performance for long-horizon problems. In addition, we can take advantage of special cases that arise due to options-based policies to further improve the performance of importance sampling. 
We further generalize these special cases to a general covariance testing rule that can be used to decide which weights to drop in an IS estimate, and derive a new IS algorithm called Incremental Importance Sampling that can provide significantly more accurate estimates for a broad class of domains.\nTraining deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and time consuming. Recently, researchers have tried to use deep learning algorithms to exploit the landscape of the loss function of the training problem of interest, and learn how to optimize over it in an automatic way. In this paper, we propose a new learning-to-learn model and some useful and practical tricks. Our optimizer outperforms generic, hand-crafted optimization algorithms and state-of-the-art learning-to-learn optimizers by DeepMind in many tasks. We demonstrate the effectiveness of our algorithms on a number of tasks, including deep MLPs, CNNs, and simple LSTMs.\nOur overall program objective is to provide more natural ways for soldiers to interact and communicate with robots, much like how soldiers communicate with other soldiers today. We describe how the Wizard-of-Oz (WOz) method can be applied to multimodal human-robot dialogue in a collaborative exploration task. While the WOz method can help design robot behaviors, traditional approaches place the burden of decisions on a single wizard. In this work, we consider two wizards to stand in for robot navigation and dialogue management software components. The scenario used to elicit data is one in which a human-robot team is tasked with exploring an unknown environment: a human gives verbal instructions from a remote location and the robot follows them, clarifying possible misunderstandings as needed via dialogue. 
We found the division of labor between wizards to be workable, which holds promise for future software development.\nNeural networks are among the most accurate supervised learning methods in use today, but their opacity makes them difficult to trust in critical applications, especially when conditions in training differ from those in test. Recent work on explanations for black-box models has produced tools (e.g. LIME) to show the implicit rules behind predictions, which can help us identify when models are right for the wrong reasons. However, these methods do not scale to explaining entire datasets and cannot correct the problems they reveal. We introduce a method for efficiently explaining and regularizing differentiable models by examining and selectively penalizing their input gradients, which provide a normal to the decision boundary. We apply these penalties both based on expert annotation and in an unsupervised fashion that encourages diverse models with qualitatively different decision boundaries for the same classification problem. On multiple datasets, we show our approach generates faithful explanations and models that generalize much better when conditions differ between training and test.\nBrain-inspired learning models attempt to mimic the cortical architecture and computations performed in the neurons and synapses constituting the human brain to achieve its efficiency in cognitive tasks. In this work, we present convolutional spike timing dependent plasticity based feature learning with biologically plausible leaky-integrate-and-fire neurons in Spiking Neural Networks (SNNs). We use shared weight kernels that are trained to encode representative features underlying the input patterns thereby improving the sparsity as well as the robustness of the learning model. 
We demonstrate that the proposed unsupervised learning methodology learns several visual categories for object recognition with fewer examples and outperforms traditional fully-connected SNN architectures while yielding competitive accuracy. Additionally, we observe that the learning model performs out-of-set generalization, further making the proposed biologically plausible framework a viable and efficient architecture for future neuromorphic applications.\nWe explore the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: By using a novel communication strategy based on common random numbers, our ES implementation only needs to communicate scalars, making it possible to scale to over a thousand parallel workers. This allows us to solve 3D humanoid walking in 10 minutes and obtain competitive results on most Atari games after one hour of training. In addition, we highlight several advantages of ES as a black box optimization technique: it is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation.\nIt is well-known that any admissible unidirectional heuristic search algorithm must expand all states whose $f$-value is smaller than the optimal solution cost when using a consistent heuristic. Such states are called \"surely expanded\" (s.e.). A recent study characterized s.e. pairs of states for bidirectional search with consistent heuristics: if a pair of states is s.e. then at least one of the two states must be expanded. This paper derives a lower bound, VC, on the minimum number of expansions required to cover all s.e. 
pairs, and presents a new admissible front-to-end bidirectional heuristic search algorithm, Near-Optimal Bidirectional Search (NBS), that is guaranteed to do no more than 2VC expansions. We further prove that no admissible front-to-end algorithm has a worst case better than 2VC. Experimental results show that NBS competes with or outperforms existing bidirectional search algorithms, and often outperforms A* as well.\nMachine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a new set of requirements, none of which are difficult to achieve in isolation, but the combination of which creates a challenge for existing distributed execution frameworks: computation with millisecond latency at high throughput, adaptive construction of arbitrary task graphs, and execution of heterogeneous kernels over diverse sets of resources. We assert that a new distributed execution framework is needed for such ML applications and propose a candidate approach with a proof-of-concept architecture that achieves a 63x performance improvement over a state-of-the-art execution framework for a representative application.\nRecently, reinforcement learning has been successfully applied to the logical game of Go, various Atari games, and even a 3D game, Labyrinth, though it continues to have problems in sparse reward settings. It is difficult to explore, but also difficult to exploit, a small number of successes when learning policy. To solve this issue, the subgoal and option framework have been proposed. However, discovering subgoals online is too expensive to be used to learn options in large state spaces. We propose Micro-objective learning (MOL) to solve this problem. The main idea is to estimate how important a state is while training and to give an additional reward proportional to its importance. 
We evaluated our algorithm in two Atari games: Montezuma's Revenge and Seaquest. With three experiments for each game, MOL significantly improved the baseline scores. Especially in Montezuma's Revenge, MOL achieved two times better results than the previous state-of-the-art model.\nWe introduce a method for learning the dynamics of complex nonlinear systems based on deep generative models over temporal segments of states and actions. Unlike dynamics models that operate over individual discrete timesteps, we learn the distribution over future state trajectories conditioned on past state, past action, and planned future action trajectories, as well as a latent prior over action trajectories. Our approach is based on convolutional autoregressive models and variational autoencoders. It makes stable and accurate predictions over long horizons for complex, stochastic systems, effectively expressing uncertainty and modeling the effects of collisions, sensory noise, and action delays. The learned dynamics model and action prior can be used for end-to-end, fully differentiable trajectory optimization and model-based policy optimization, which we use to evaluate the performance and sample-efficiency of our method.\nThe problem of finding conflict-free trajectories for multiple agents of identical circular shape, operating in shared 2D workspace, is addressed in the paper and a decoupled, i.e., prioritized, approach is used to solve this problem. Agents' workspace is tessellated into the square grid on which any-angle moves are allowed, i.e., each agent can move into an arbitrary direction as long as this move follows the straight line segment whose endpoints are tied to the distinct grid elements. A novel any-angle planner based on Safe Interval Path Planning (SIPP) algorithm is proposed to find trajectories for an agent moving amidst dynamic obstacles (other agents) on a grid. This algorithm is then used as part of a prioritized multi-agent planner AA-SIPP(m). 
On the theoretical side, we show that AA-SIPP(m) is complete under well-defined conditions. On the experimental side, in simulation tests with up to 200 agents involved, we show that our planner finds much better solutions in terms of cost (up to 20%) compared to the planners relying on cardinal moves only.\nWe approach structured output prediction by optimizing a deep value network (DVN) to precisely estimate the task loss on different output configurations for a given input. Once the model is trained, we perform inference by gradient descent on the continuous relaxations of the output variables to find outputs with promising scores from the value network. When applied to image segmentation, the value network takes an image and a segmentation mask as inputs and predicts a scalar estimating the intersection over union between the input and ground truth masks. For multi-label classification, the DVN's objective is to correctly predict the F1 score for any potential label configuration. The DVN framework achieves state-of-the-art results on multi-label prediction and image segmentation benchmarks.\nWith the increasing popularity of machine learning techniques, it has become common to see prediction algorithms operating within some larger process. However, the criteria by which we train these algorithms often differ from the ultimate criteria on which we evaluate them. This paper proposes an end-to-end approach for learning probabilistic machine learning models in a manner that directly captures the ultimate task-based objective for which they will be used, within the context of stochastic programming. We present three experimental evaluations of the proposed approach: a classical inventory stock problem, a real-world electrical grid scheduling task, and a real-world energy storage arbitrage task. 
We show that the proposed approach can outperform both traditional modeling and purely black-box policy optimization approaches in these applications.\nUse Case Points (UCP) is a well-known method to estimate the project size, based on the Use Case diagram, at early phases of software development. Although the Use Case diagram is widely accepted as a de-facto model for analyzing object-oriented software requirements around the world, the UCP method has not received sufficient attention because, as yet, there is no consensus on how to produce software effort from UCP. This paper aims to study the potential of using Fuzzy Model Tree to derive effort estimates based on the UCP size measure using a dataset collected for that purpose. The proposed approach has been validated against the Treeboost model, Multiple Linear Regression and classical effort estimation based on the UCP model. The obtained results are promising and show better performance than those obtained by classical UCP and Multiple Linear Regression, and slightly better than those obtained by the Treeboost model.\nCase-Based Reasoning (CBR) has been widely used to generate good software effort estimates. The predictive performance of CBR is dataset dependent and subject to an extremely large space of configuration possibilities. Regardless of the type of adaptation technique, deciding on the optimal number of similar cases to be used before applying CBR is a key challenge. In this paper we propose a new technique based on the Bisecting k-medoids clustering algorithm to better understand the structure of a dataset and discover the optimal cases for each individual project by excluding irrelevant cases. Results obtained showed that understanding the data characteristics prior to the prediction stage can help in automatically finding the best number of cases for each test project. 
Performance figures of the proposed estimation method are better than those of other regular K-based CBR methods.\nIn cooperative multiagent planning, it can often be beneficial for an agent to make commitments about aspects of its behavior to others, allowing them in turn to plan their own behaviors without taking the agent's detailed behavior into account. Extending previous work in the Bayesian setting, we consider instead a worst-case setting in which the agent has a set of possible environments (MDPs) it could be in, and develop a commitment semantics that allows for probabilistic guarantees on the agent's behavior in any of the environments it could end up facing. Crucially, an agent receives observations (of reward and state transitions) that allow it to potentially eliminate possible environments and thus obtain higher utility by adapting its policy to the history of observations. We develop algorithms and provide theory and some preliminary empirical results showing that they ensure an agent meets its commitments with history-dependent policies while minimizing maximum regret over the possible environments.\nHow can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. 
On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks.\nVoting systems typically treat all voters equally. We argue that perhaps they should not: Voters who have supported good choices in the past should be given higher weight than voters who have supported bad ones. To develop a formal framework for desirable weighting schemes, we draw on no-regret learning. Specifically, given a voting rule, we wish to design a weighting scheme such that applying the voting rule, with voters weighted by the scheme, leads to choices that are almost as good as those endorsed by the best voter in hindsight. We derive possibility and impossibility results for the existence of such weighting schemes, depending on whether the voting rule and the weighting scheme are deterministic or randomized, as well as on the social choice axioms satisfied by the voting rule.\nRecent development of large-scale question answering (QA) datasets triggered a substantial amount of research into end-to-end neural architectures for QA. Increasingly complex systems have been conceived without comparison to simpler neural baseline systems that would justify their complexity. In this work, we propose a simple heuristic that guides the development of neural baseline systems for the extractive QA task. We find that there are two ingredients necessary for building a high-performing neural QA system: first, the awareness of question words while processing the context and second, a composition function that goes beyond simple bag-of-words modeling, such as recurrent neural networks. Our results show that FastQA, a system that meets these two requirements, can achieve very competitive performance compared with existing models. 
We argue that this surprising finding puts results of previous systems and the complexity of recent QA datasets into perspective.\nDempster-Shafer theory of evidence is widely applied to uncertainty modelling and knowledge reasoning because of its advantages in dealing with uncertain information. But some conditions or requirements, such as exclusiveness hypothesis and completeness constraint, limit the development and application of that theory to a large extent. To overcome the shortcomings and enhance its capability of representing the uncertainty, a novel model, called D numbers, has been proposed recently. However, many key issues, for example how to implement the combination of D numbers, remain unsolved. In the paper, we have explored the combination of D Numbers from a perspective of conflict redistribution, and proposed two combination rules suitable for different situations for the fusion of two D numbers. The proposed combination rules can reduce to the classical Dempster's rule in Dempster-Shafer theory under certain conditions. Numerical examples and discussion about the proposed rules are also given in the paper.\nWe introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data. Resilience is a weaker condition than most other properties considered so far in the literature, and yet enables robust estimation in a broader variety of settings. We provide new information-theoretic results on robust distribution learning, robust estimation of stochastic block models, and robust mean estimation under bounded $k$th moments. We also provide new algorithmic results on robust distribution learning, as well as robust mean estimation in $\\ell_p$-norms. 
Among our proof techniques is a method for pruning a high-dimensional distribution with bounded $1$st moments to a stable \"core\" with bounded $2$nd moments, which may be of independent interest.\nThis paper presents a study of employing Ranking SVM and Convolutional Neural Network for two missions: legal information retrieval and question answering in the Competition on Legal Information Extraction/Entailment. For the first task, our proposed model used a triple of features (LSI, Manhattan, Jaccard), and is based on paragraph level instead of article level as in previous studies. In fact, each single-paragraph article corresponds to a particular paragraph in a huge multiple-paragraph article. For the legal question answering task, additional statistical features from information retrieval task integrated into Convolutional Neural Network contribute to higher accuracy.\nTwo-timescale Stochastic Approximation (SA) algorithms are widely used in Reinforcement Learning (RL). Their iterates have two parts that are updated using distinct stepsizes. In this work, we develop a novel recipe for their finite sample analysis. Using this, we provide a concentration bound, which is the first such result for a two-timescale SA. The type of bound we obtain is known as \"lock-in probability\". We also introduce a new projection scheme, in which the time between successive projections increases exponentially. This scheme allows one to elegantly transform a lock-in probability into a convergence rate result for projected two-timescale SA. From this latter result, we then extract key insights on stepsize selection. As an application, we finally obtain convergence rates for the projected two-timescale RL algorithms GTD(0), GTD2, and TDC.\nKeyword spotting (KWS) constitutes a major component of human-technology interfaces. Maximizing the detection accuracy at a low false alarm (FA) rate, while minimizing the footprint size, latency and complexity are the goals for KWS. 
Towards achieving them, we study Convolutional Recurrent Neural Networks (CRNNs). Inspired by large-scale state-of-the-art speech recognition systems, we combine the strengths of convolutional layers and recurrent layers to exploit local structure and long-range context. We analyze the effect of architecture parameters, and propose training strategies to improve performance. With only ~230k parameters, our CRNN model yields acceptably low latency, and achieves 97.71% accuracy at 0.5 FA/hour for 5 dB signal-to-noise ratio.\nWe consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of $\\tilde{O}( \\sqrt{HSAT} + H^2S^2A+H\\sqrt{T})$ where $H$ is the time horizon, $S$ the number of states, $A$ the number of actions and $T$ the number of time-steps. This result improves over the best previously known bound $\\tilde{O}(HS \\sqrt{AT})$ achieved by the UCRL2 algorithm of Jaksch et al., 2010. The key significance of our new results is that when $T\\geq H^3S^3A$ and $SA\\geq H$, it leads to a regret of $\\tilde{O}(\\sqrt{HSAT})$ that matches the established lower bound of $\\Omega(\\sqrt{HSAT})$ up to a logarithmic factor. Our analysis contains two key insights. We use careful application of concentration inequalities to the optimal value function as a whole, rather than to the transition probabilities (to improve scaling in $S$), and we define Bernstein-based \"exploration bonuses\" that use the empirical variance of the estimated values at the next states (to improve scaling in $H$).\nThe machining process is the most common method for metal cutting, and especially in the finishing of machined parts. In modern industry the goal of production is to manufacture products at a low cost, with high quality in the shortest time. 
In this research, different biomaterials, machinability properties, surface characteristics, cutting tools, cutting fluids and machining conditions for biomaterials with machinability capability are reviewed. In the first step, a prosthetic acetabular (PA) hip is designed and printed using the selective laser melting (SLM) process; then current limitations on fabrication are analyzed to optimize the production process and obtain higher-quality samples. The feasibility of artificial intelligence (AI) in machining is determined, and, to calculate dimensional deviation, the effect of tool path on tool deflection is modelled. The main focus of this research is determining the effect of machining conditions on surface quality and osseointegration, work hardening, and force analysis of PA. The effect of heat treatment on the machinability and mechanical properties of the produced parts is also determined.\nThe policy gradients of the expected return objective can react slowly to rare rewards. Yet, in some cases agents may wish to emphasize the low or high returns regardless of their probability. Borrowing from the economics and control literature, we review the risk-sensitive value function that arises from an exponential utility and illustrate its effects on an example. This risk-sensitive value function is not always applicable to reinforcement learning problems, so we introduce the particle value function defined by a particle filter over the distributions of an agent's experience, which bounds the risk-sensitive one. We illustrate the benefit of the policy gradients of this objective in Cliffworld.\nKnowledge graphs enable a wide variety of applications, including question answering and information retrieval. Despite the great effort invested in their creation and maintenance, even the largest (e.g., Yago, DBPedia or Wikidata) remain incomplete. 
We introduce Relational Graph Convolutional Networks (R-GCNs) and apply them to two standard knowledge base completion tasks: Link prediction (recovery of missing facts, i.e. subject-predicate-object triples) and entity classification (recovery of missing entity attributes). R-GCNs are related to a recent class of neural networks operating on graphs, and are developed specifically to deal with the highly multi-relational data characteristic of realistic knowledge bases. We demonstrate the effectiveness of R-GCNs as a stand-alone model for entity classification. We further show that factorization models for link prediction such as DistMult can be significantly improved by enriching them with an encoder model to accumulate evidence over multiple inference steps in the relational graph, demonstrating a large improvement of 29.8% on FB15k-237 over a decoder-only baseline.\nThe principle of the common cause claims that if an improbable coincidence has occurred, there must exist a common cause. This is generally taken to mean that positive correlations between non-causally related events should disappear when conditioning on the action of some underlying common cause. The extended interpretation of the principle, by contrast, urges that common causes should be called for in order to explain positive deviations between the estimated correlation of two events and the expected value of their correlation. The aim of this paper is to provide the extended reading of the principle with a general probabilistic model, capturing the simultaneous action of a system of multiple common causes. To this end, two distinct models are elaborated, and the necessary and sufficient conditions for their existence are determined.\nMany real-world tasks involve multiple agents with partial observability and limited communication. Learning is challenging in these settings due to local viewpoints of agents, which perceive the world as non-stationary due to concurrently-exploring teammates. 
Approaches that learn specialized policies for individual tasks face problems when applied to the real world: not only do agents have to learn and store distinct policies for each task, but in practice identities of tasks are often non-observable, making these approaches inapplicable. This paper formalizes and addresses the problem of multi-task multi-agent reinforcement learning under partial observability. We introduce a decentralized single-task learning approach that is robust to concurrent interactions of teammates, and present an approach for distilling single-task policies into a unified policy that performs well across multiple related tasks, without explicit provision of task identity.\nDeep convolutional neural networks (DCNNs) have been used to achieve state-of-the-art performance on many computer vision tasks (e.g., object recognition, object detection, semantic segmentation) thanks to a large repository of annotated image data. Large labeled datasets for other sensor modalities, e.g., multispectral imagery (MSI), are not available due to the large cost and manpower required. In this paper, we adapt state-of-the-art DCNN frameworks in computer vision for semantic segmentation of MSI. To overcome label scarcity for MSI data, we use generated synthetic MSI in place of real MSI to initialize a DCNN framework. We evaluate our network initialization scheme on the new RIT-18 dataset that we present in this paper. This dataset contains very high-resolution MSI collected by an unmanned aircraft system. The models initialized with synthetic imagery were less prone to over-fitting and provide a state-of-the-art baseline for future work.\nDeliberating over large or continuous state spaces has long been a challenge in reinforcement learning. Temporal abstraction has made this somewhat possible, but efficient planning with temporal abstraction remains an issue. 
Moreover, using spatial abstractions to learn policies for various situations at once, while using temporal abstraction models, is an open problem. We propose an efficient algorithm that is convergent under linear function approximation while planning with temporally abstract actions. We show how this algorithm can be used together with randomly generated option models over multiple time scales to plan for agents that need to act in real time. Using these randomly generated option models over multiple time scales is shown to reduce the number of decision epochs required to solve the given task, thereby reducing the time needed for deliberation.\nThe study of eye gaze fixations on photographic images is an active research area. In contrast, the image subcategory of freehand sketches has not received as much attention for such studies. In this paper, we analyze the results of a free-viewing gaze fixation study conducted on 3904 freehand sketches distributed across 160 object categories. Our analysis shows that fixation sequences exhibit marked consistency within a sketch, across sketches of a category and even across suitably grouped sets of categories. This multi-level consistency is remarkable given the variability in depiction and extreme image content sparsity that characterize hand-drawn object sketches. In our paper, we show that the multi-level consistency in the fixation data can be exploited to (a) predict a test sketch's category given only its fixation sequence and (b) build a computational model which predicts part-labels underlying fixations on objects. We hope that our findings motivate the community to deem sketch-like representations worthy of gaze-based studies vis-a-vis photographic images.\nRobust belief revision methods are crucial in streaming data situations for updating existing knowledge or beliefs with new incoming evidence. 
Bayes conditioning is the primary mechanism in use for belief revision in data fusion systems that use probabilistic inference. However, traditional conditioning methods face several challenges due to inherent data/source imperfections in big-data environments that harness soft (i.e., human or human-based) sources in addition to hard (i.e., physics-based) sensors. The objective of this paper is to investigate the most natural extension of Bayes conditioning that is suitable for evidence updating in the presence of such uncertainties. By viewing the evidence updating process as a thought experiment, an elegant strategy is derived for robust evidence updating in the presence of extreme uncertainties that are characteristic of big-data environments. In particular, utilizing the Fagin-Halpern conditional notions, a natural extension to Bayes conditioning is derived for evidence that takes the form of a general belief function. The presented work differs fundamentally from the Conditional Update Equation (CUE) and the authors' own extensions of it. An overview of this development is provided via illustrative examples. Insights into parameter selection under various fusion contexts are also provided.\nThis paper introduces the QMDP-net, a neural network architecture for planning under partial observability. The QMDP-net combines the strengths of model-free learning and model-based planning. It is a recurrent policy network, but it represents a policy for a parameterized set of tasks by connecting a model with a planning algorithm that solves the model, thus embedding the solution structure of planning in a network learning architecture. The QMDP-net is fully differentiable and allows for end-to-end training. We train a QMDP-net on different tasks so that it can generalize to new ones in the parameterized task set and \"transfer\" to other similar tasks beyond the set. 
In preliminary experiments, QMDP-net showed strong performance on several robotic tasks in simulation. Interestingly, while QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning.\nPrivacy has traditionally been a major motivation for distributed problem solving. Distributed Constraint Satisfaction Problem (DisCSP) as well as Distributed Constraint Optimization Problem (DCOP) are fundamental models used to solve various families of distributed problems. Even though several approaches have been proposed to quantify and preserve privacy in such problems, none of them is exempt from limitations. Here we approach the problem by assuming that computation is performed among utilitarian agents. We introduce a utilitarian approach where the utility of each state is estimated as the difference between the reward for reaching an agreement on assignments of shared variables and the cost of privacy loss. We investigate extensions to solvers where agents integrate the utility function to guide their search and decide which action to perform, defining thereby their policy. We show that these extended solvers succeed in significantly reducing privacy loss without significant degradation of the solution quality.\nLanguage understanding is a key component in a spoken dialogue system. In this paper, we investigate how the language understanding module influences the dialogue system performance by conducting a series of systematic experiments on a task-oriented neural dialogue system in a reinforcement learning based setting. The empirical study shows that among different types of language understanding errors, slot-level errors can have more impact on the overall performance of a dialogue system compared to intent-level errors. 
In addition, our experiments demonstrate that the reinforcement-learning-based dialogue system is able to learn when and what to confirm in order to achieve better performance and greater robustness.\nLocal Process Models (LPM) describe structured fragments of process behavior occurring in the context of less structured business processes. Traditional LPM discovery aims to generate a collection of process models that describe highly frequent behavior, but these models do not always provide useful answers for questions posed by process analysts aiming at business process improvement. We propose a framework for goal-driven LPM discovery, based on utility functions and constraints. We describe four scopes on which these utility functions and constraints can be defined, and show that utility functions and constraints on different scopes can be combined to form composite utility functions/constraints. Finally, we demonstrate the applicability of our approach by presenting several actionable business insights discovered with LPM discovery on two real-life data sets.\nIn all but the most trivial optimization problems, the structure of the solutions exhibits complex interdependencies between the input parameters. Decades of research with stochastic search techniques has shown the benefit of explicitly modeling the interactions between sets of parameters and the overall quality of the solutions discovered. We demonstrate a novel method, based on learning deep networks, to model the global landscapes of optimization problems. To represent the search space concisely and accurately, the deep networks must encode information about the underlying parameter interactions and their contributions to the quality of the solution. Once trained, the networks are probed to reveal parameter combinations with high expected performance with respect to the optimization task. 
These estimates are used to initialize fast, randomized, local search algorithms, which in turn expose more information about the search space that is subsequently used to refine the models. We demonstrate the technique on multiple optimization problems that have arisen in a variety of real-world domains, including packing, graphics, job scheduling, layout and compression. The problems include combinatorial search spaces, discontinuous and highly non-linear spaces, and span binary, higher-cardinality discrete, as well as continuous parameters. Strengths, limitations, and extensions of the approach are extensively discussed and demonstrated.\nStatistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high-probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.\nIn economics and psychology, delay discounting is often used to characterize how individuals choose between a smaller immediate reward and a larger delayed reward. People with a higher delay discounting rate (DDR) often choose smaller but more immediate rewards (a \"today person\"). In contrast, people with a lower discounting rate often choose larger future rewards (a \"tomorrow person\"). 
Since the ability to modulate the desire of immediate gratification for long term rewards plays an important role in our decision-making, the lower discounting rate often predicts better social, academic and health outcomes. In contrast, the higher discounting rate is often associated with problematic behaviors such as alcohol/drug abuse, pathological gambling and credit card default. Thus, research on understanding and moderating delay discounting has the potential to produce substantial societal benefits.\nWe propose a supervised algorithm for generating type embeddings in the same semantic vector space as a given set of entity embeddings. The algorithm is agnostic to the derivation of the underlying entity embeddings. It does not require any manual feature engineering, generalizes well to hundreds of types and achieves near-linear scaling on Big Graphs containing many millions of triples and instances by virtue of an incremental execution. We demonstrate the utility of the embeddings on a type recommendation task, outperforming a non-parametric feature-agnostic baseline while achieving 15x speedup and near-constant memory usage on a full partition of DBpedia. Using state-of-the-art visualization, we illustrate the agreement of our extensionally derived DBpedia type embeddings with the manually curated domain ontology. Finally, we use the embeddings to probabilistically cluster about 4 million DBpedia instances into 415 types in the DBpedia ontology.\nWe consider the problem of a robot learning the mechanical properties of objects through physical interaction with the object, and introduce a practical, data-efficient approach for identifying the motion models of these objects. The proposed method utilizes a physics engine, where the robot seeks to identify the inertial and friction parameters of the object by simulating its motion under different values of the parameters and identifying those that result in a simulation which matches the observed real motions. 
The problem is solved in a Bayesian optimization framework. The same framework is used for both identifying the model of an object online and searching for a policy that would minimize a given cost function according to the identified model. Experimental results both in simulation and using a real robot indicate that the proposed method outperforms state-of-the-art model-free reinforcement learning approaches.\nConvolutional Neural Networks have been a subject of great importance over the past decade and great strides have been made in their utility for producing state-of-the-art performance in many computer vision problems. However, the behavior of deep networks is yet to be fully understood and is still an active area of research. In this work, we present an intriguing behavior: pre-trained CNNs can be made to improve their predictions by structurally perturbing the input. We observe that these perturbations (referred to as Guided Perturbations) enable a trained network to improve its prediction performance without any learning or change in network weights. We perform various ablative experiments to understand how these perturbations affect the local context and feature representations. Furthermore, we demonstrate that this idea can improve the performance of several existing approaches on semantic segmentation and scene labeling tasks on the PASCAL VOC dataset and supervised classification tasks on the MNIST and CIFAR10 datasets.\nRecently, research on accelerated stochastic gradient descent methods (e.g., SVRG) has made exciting progress (e.g., linear convergence for strongly convex problems). However, the best-known methods (e.g., Katyusha) require at least two auxiliary variables and two momentum parameters. In this paper, we propose a fast stochastic variance reduction gradient (FSVRG) method, in which we design a novel update rule with Nesterov's momentum and incorporate the technique of growing epoch size. 
FSVRG has only one auxiliary variable and one momentum weight, and thus it is much simpler and has much lower per-iteration complexity. We prove that FSVRG achieves linear convergence for strongly convex problems and the optimal $\\mathcal{O}(1/T^2)$ convergence rate for non-strongly convex problems, where $T$ is the number of outer iterations. We also extend FSVRG to directly solve problems with non-smooth component functions, such as SVM. Finally, we empirically study the performance of FSVRG for solving various machine learning problems such as logistic regression, ridge regression, Lasso and SVM. Our results show that FSVRG outperforms the state-of-the-art stochastic methods, including Katyusha.\nMany efforts have been dedicated to identifying restrictions on ontologies expressed as tuple-generating dependencies (tgds), a.k.a. existential rules, that lead to decidability of the problem of answering ontology-mediated queries (OMQs). This has given rise to three families of formalisms: guarded, non-recursive, and sticky sets of tgds. In this work, we study the containment problem for OMQs expressed in such formalisms, which is a key ingredient for solving static analysis tasks associated with them. Our main contribution is the development of specially tailored techniques for OMQ containment under the classes of tgds stated above. This enables us to obtain sharp complexity bounds for the problems at hand, which in turn allow us to delimit their practical applicability. We also apply our techniques to pinpoint the complexity of problems associated with two emerging applications of OMQ containment: distribution over components and UCQ rewritability of OMQs.\nWe present the first treatment of the arc length of the Gaussian Process (GP) with more than a single output dimension. GPs are commonly used for tasks such as trajectory modelling, where path length is a crucial quantity of interest. 
Previously, only paths in one dimension have been considered, with no theoretical consideration of higher dimensional problems. We fill the gap in the existing literature by deriving the moments of the arc length for a stationary GP with multiple output dimensions. A new method is used to derive the mean of a one-dimensional GP over a finite interval, by considering the distribution of the arc length integrand. This technique is used to derive an approximate distribution over the arc length of a vector valued GP in $\\mathbb{R}^n$ by moment matching the distribution. Numerical simulations confirm our theoretical derivations.\nKnowledge bases (KBs) of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge bases are typically incomplete, it is useful to be able to perform knowledge base completion or link prediction, i.e., predict whether a relationship not in the knowledge base is likely to be true. This article serves as a brief overview of embedding models of entities and relationships for knowledge base completion, summarizing up-to-date experimental results on standard benchmark datasets FB15k, WN18, FB15k-237, WN18RR, FB13 and WN11.\nIn this paper, we propose a novel framework, called Semi-supervised Embedding in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector representation that systematically captures the topological proximity, attribute affinity and label similarity of vertices in a partially labeled attributed network (PLAN). Our method is designed to work in both transductive and inductive settings while explicitly alleviating noise effects from outliers. Experimental results on various datasets drawn from the web, text and image domains demonstrate the advantages of SEANO over state-of-the-art methods in semi-supervised classification under transductive as well as inductive settings. 
We also show that a subset of parameters in SEANO is interpretable as outlier score and can significantly outperform baseline methods when applied for detecting network outliers. Finally, we present the use of SEANO in a challenging real-world setting -- flood mapping of satellite images and show that it is able to outperform modern remote sensing algorithms for this task.\nThis paper presents a statistical method for use in music transcription that can estimate score times of note onsets and offsets from polyphonic MIDI performance signals. Because performed note durations can deviate largely from score-indicated values, previous methods had the problem of not being able to accurately estimate offset score times (or note values) and thus could only output incomplete musical scores. Based on observations that the pitch context and onset score times are influential on the configuration of note values, we construct a context-tree model that provides prior distributions of note values using these features and combine it with a performance model in the framework of Markov random fields. Evaluation results show that our method reduces the average error rate by around 40 percent compared to existing/simple methods. We also confirmed that, in our model, the score model plays a more important role than the performance model, and it automatically captures the voice structure by unsupervised learning.\nAs a general and thus popular model for autonomous systems, partially observable Markov decision process (POMDP) can capture uncertainties from different sources like sensing noises, actuation errors, and uncertain environments. However, its comprehensiveness makes the planning and control in POMDP difficult. Traditional POMDP planning problems target to find the optimal policy to maximize the expectation of accumulated rewards. 
However, for safety-critical applications, guarantees of system performance described by formal specifications are desired, which motivates us to consider formal methods to synthesize a supervisor for POMDPs. With system specifications given by Probabilistic Computation Tree Logic (PCTL), we propose a supervisory control framework with a type of deterministic finite automata (DFA), za-DFA, as the controller form. While existing work mainly relies on optimization techniques to learn fixed-size finite-state controllers (FSCs), we develop an $L^*$-learning-based algorithm to determine both the space and the transitions of za-DFA. Membership queries and different oracles for conjectures are defined. The learning algorithm is sound and complete. An example is given in detailed steps to illustrate the supervisor synthesis algorithm.\nA recurring problem faced when training neural networks is that there is typically not enough data to maximize the generalization capability of deep neural networks (DNNs). There are many techniques to address this, including data augmentation, dropout, and transfer learning. In this paper, we introduce an additional method, which we call Smart Augmentation, and we show how to use it to increase the accuracy and reduce overfitting on a target network. Smart Augmentation works by creating a network that learns how to generate augmented data during the training process of a target network in a way that reduces that network's loss. This allows us to learn augmentations that minimize the error of that network. Smart Augmentation has shown the potential to increase accuracy by significant margins on all datasets tested. In addition, it has shown potential to achieve similar or improved performance levels with significantly smaller network sizes in a number of tested cases.\nWe extend the $ASPIC^+$ framework for structured argumentation so as to allow applications of the reasoning by cases inference scheme for defeasible arguments. 
Given an argument with conclusion `$A$ or $B$', an argument based on $A$ with conclusion $C$, and an argument based on $B$ with conclusion $C$, we allow the construction of an argument with conclusion $C$. We show how our framework leads to different results from other approaches in non-monotonic logic for dealing with disjunctive information, such as disjunctive default theory or approaches based on the OR-rule (which allows one to derive a defeasible rule `If ($A$ or $B$) then $C$', given two defeasible rules `If $A$ then $C$' and `If $B$ then $C$'). We raise new questions regarding the subtleties of reasoning defeasibly with disjunctive information, and show that its formalization is more intricate than one would presume.\nAlthough information workers may complain about meetings, they are an essential part of their work life. Consequently, busy people spend a significant amount of time scheduling meetings. We present Calendar.help, a system that provides fast, efficient scheduling through structured workflows. Users interact with the system via email, delegating their scheduling needs to the system as if it were a human personal assistant. Common scheduling scenarios are broken down using well-defined workflows and completed as a series of microtasks that are automated when possible and executed by a human otherwise. Unusual scenarios fall back to a trained human assistant who executes them as unstructured macrotasks. We describe the iterative approach we used to develop Calendar.help, and share the lessons learned from scheduling thousands of meetings during a year of real-world deployments. Our findings provide insight into how complex information tasks can be broken down into repeatable components that can be executed efficiently to improve productivity.\nCatastrophic forgetting is a problem in which neural networks lose the information of the first task after being trained on the second task. Here, we propose a method, 
incremental moment matching (IMM), to resolve this problem. IMM incrementally matches the moments of the posterior distributions of the neural networks trained on the first and the second task, respectively. To smooth the search space of the posterior parameters, the IMM procedure is complemented by various transfer learning techniques, including weight transfer, the L2-norm between the old and the new parameters, and a variant of dropout with the old parameters. We analyze our approach on a variety of datasets, including the MNIST, CIFAR-10, Caltech-UCSD-Birds, and Lifelog datasets. The experimental results show that IMM achieves state-of-the-art performance by balancing the information between an old and a new network.\nWhether teaching in a classroom or in a Massive Open Online Course, it is crucial to present the material in a way that benefits the audience as a whole. We identify two important tasks to solve towards this objective: (1) grouping students so that they can maximally benefit from peer interaction and (2) finding an optimal schedule of the educational material for each group. Thus, in this paper, we solve the problem of team formation and content scheduling for education. Given a time frame d, a set of students S with their need to learn different activities T, and k as the number of desired groups, we study the problem of finding k groups of students. The goal is to teach students within time frame d such that their potential for learning is maximized, and to find the best schedule for each group. We show this problem to be NP-hard and develop a polynomial algorithm for it. We show our algorithm to be effective on both synthetic data and a real data set. For our experiments, we use real data on students' grades in a Computer Science department. 
As part of our contribution, we release a semi-synthetic dataset that mimics the properties of the real data.\nRecognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets. In this paper, we propose a new task that aims at parsing scenes with a large and open vocabulary, and several evaluation metrics are explored for this problem. Our proposed approach to this problem is a joint image pixel and word concept embeddings framework, where word concepts are connected by semantic relations. We validate the open vocabulary prediction ability of our framework on ADE20K dataset which covers a wide variety of scenes and objects. We further explore the trained joint embedding space to show its interpretability.\nThe goal of imitation learning is to mimic expert behavior without access to an explicit reward signal. Expert demonstrations provided by humans, however, often show significant variability due to latent factors that are typically not explicitly modeled. In this paper, we propose a new algorithm that can infer the latent structure of expert demonstrations in an unsupervised way. Our method, built on top of Generative Adversarial Imitation Learning, can not only imitate complex behaviors, but also learn interpretable and meaningful representations of complex behavioral data, including visual demonstrations. In the driving domain, we show that a model learned from human demonstrations is able to both accurately reproduce a variety of behaviors and accurately anticipate human actions using raw visual inputs. Compared with various baselines, our method can better capture the latent structure underlying expert demonstrations, often recovering semantically meaningful factors of variation in the data.\nFor robotic vehicles to navigate safely and efficiently in pedestrian-rich environments, it is important to model subtle human behaviors and navigation rules. 
However, while instinctive to humans, socially compliant navigation is still difficult to quantify due to the stochasticity in people's behaviors. Existing works are mostly focused on using feature-matching techniques to describe and imitate human paths, but often do not generalize well since the feature values can vary from person to person, and even run to run. This work notes that while it is challenging to directly specify the details of what to do (precise mechanisms of human navigation), it is straightforward to specify what not to do (violations of social norms). Specifically, using deep reinforcement learning, this work develops a time-efficient navigation policy that respects common social norms. The proposed method is shown to enable fully autonomous navigation of a robotic vehicle moving at human walking speed in an environment with many pedestrians.\nIn this paper, we present a transfer learning approach for music classification and regression tasks. We propose to use a pre-trained convnet feature, a concatenated feature vector using the activations of feature maps of multiple layers in a trained convolutional network. We show how this convnet feature can serve as general-purpose music representation. In the experiments, a convnet is trained for music tagging and then transferred to other music-related classification and regression tasks. The convnet feature outperforms the baseline MFCC feature in all the considered tasks and several previous approaches that are aggregating MFCCs as well as low- and high-level music features.\nRecently, deep learning (DL) methods have been introduced very successfully into human activity recognition (HAR) scenarios in ubiquitous and wearable computing. Especially the prospect of overcoming the need for manual feature design combined with superior classification capabilities render deep neural networks very attractive for real-life HAR application. 
Even though DL-based approaches now outperform the state of the art in a number of recognition tasks in the field, substantial challenges remain. Most prominently, issues with real-life datasets, typically including imbalanced datasets and problematic data quality, still limit the effectiveness of activity recognition using wearables. In this paper we tackle such challenges through Ensembles of deep Long Short Term Memory (LSTM) networks. We have developed modified training procedures for LSTM networks and combine sets of diverse LSTM learners into classifier collectives. We demonstrate, both formally and empirically, that Ensembles of deep LSTM learners outperform the individual LSTM networks. Through an extensive experimental evaluation on three standard benchmarks (Opportunity, PAMAP2, Skoda) we demonstrate the excellent recognition capabilities of our approach and its potential for real-life applications of human activity recognition.\nMultiple different approaches of generating adversarial examples have been proposed to attack deep neural networks. These approaches involve either directly computing gradients with respect to the image pixels, or directly solving an optimization on the image pixels. In this work, we present a fundamentally new method for generating adversarial examples that is fast to execute and provides exceptional diversity of output. We efficiently train feed-forward neural networks in a self-supervised manner to generate adversarial examples against a target network or set of networks. We call such a network an Adversarial Transformation Network (ATN). ATNs are trained to generate adversarial examples that minimally modify the classifier's outputs given the original input, while constraining the new classification to match an adversarial target class. 
We present methods to train ATNs and analyze their effectiveness targeting a variety of MNIST classifiers as well as the latest state-of-the-art ImageNet classifier Inception ResNet v2.\nThe exponential explosion of the set of patterns is one of the main challenges in pattern mining. This challenge is approached by introducing a constraint for pattern selection. One of the first constraints proposed in pattern mining is support (frequency) of a pattern in a dataset. Frequency is an anti-monotonic function, i.e., given an infrequent pattern, all its superpatterns are not frequent. However, many other constraints for pattern selection are neither monotonic nor anti-monotonic, which makes it difficult to generate patterns satisfying these constraints.   In order to deal with nonmonotonic constraints we introduce the notion of \"projection antimonotonicity\" and SOFIA algorithm that allow generating best patterns for a class of nonmonotonic constraints. Cosine interest, robustness, stability of closed itemsets, and the associated delta-measure are among these constraints. SOFIA starts from light descriptions of transactions in dataset (a small set of items in the case of itemset description) and then iteratively adds more information to these descriptions (more items with indication of tidsets they describe).\nWhile humor has been historically studied from a psychological, cognitive and linguistic standpoint, its study from a computational perspective is an area yet to be explored in Computational Linguistics. There exist some previous works, but a characterization of humor that allows its automatic recognition and generation is far from being specified. In this work we build a crowdsourced corpus of labeled tweets, annotated according to its humor value, letting the annotators subjectively decide which are humorous. 
A humor classifier for Spanish tweets is assembled based on supervised learning, reaching a precision of 84% and a recall of 69%.\nInverse reinforcement learning (IRL) aims to explain observed strategic behavior by fitting reinforcement learning models to behavioral data. However, traditional IRL methods are only applicable when the observations are in the form of state-action paths. This assumption may not hold in many real-world modelling settings, where only partial observations are available. In general, we may assume that there is a summarizing function $\\sigma$, which acts as a filter between us and the true state-action paths that constitute the demonstration. Some initial approaches to extending IRL to such situations have been presented, but with very specific assumptions about the structure of $\\sigma$, such as that only certain state observations are missing. This paper instead focuses on the most general case of the problem, where no assumptions are made about the summarizing function, except that it can be evaluated. We demonstrate that inference is still possible. The paper presents exact and approximate inference algorithms that allow full posterior inference, which is particularly important for assessing parameter uncertainty in this challenging inference situation. Empirical scalability is demonstrated to reasonably sized problems, and practical applicability is demonstrated by estimating the posterior for a cognitive science RL model based on observed user's task completion time only.\nThis paper investigates a novel task of generating texture images from perceptual descriptions. Previous work on texture generation focused on either synthesis from examples or generation from procedural models. Generating textures from perceptual attributes has not been well studied yet. Meanwhile, perceptual attributes, such as directionality, regularity and roughness are important factors for human observers to describe a texture. 
In this paper, we propose a joint deep network model that combines adversarial training and perceptual feature regression for texture generation, while only random noise and user-defined perceptual attributes are required as input. In this model, a preliminary trained convolutional neural network is essentially integrated with the adversarial framework, which can drive the generated textures to possess given perceptual attributes. An important aspect of the proposed model is that, if we change one of the input perceptual features, the corresponding appearance of the generated textures will also be changed. We design several experiments to validate the effectiveness of the proposed method. The results show that the proposed method can produce high quality texture images with desired perceptual properties.\nThe recently launched LinkedIn Salary product has been designed with the goal of providing compensation insights to the world's professionals and thereby helping them optimize their earning potential. We describe the overall design and architecture of the statistical modeling system underlying this product. We focus on the unique data mining challenges while designing and implementing the system, and describe the modeling components such as Bayesian hierarchical smoothing that help to compute and present robust compensation insights to users. We report on extensive evaluation with nearly one year of de-identified compensation data collected from over one million LinkedIn users, thereby demonstrating the efficacy of the statistical models. We also highlight the lessons learned through the deployment of our system at LinkedIn.\nSemantic segmentation requires a detailed labeling of image pixels by object category. Information derived from local image patches is necessary to describe the detailed shape of individual objects. However, this information is ambiguous and can result in noisy labels. 
Global inference of image content can instead capture the general semantic concepts present. We advocate that holistic inference of image concepts provides valuable information for detailed pixel labeling. We propose a generic framework to leverage holistic information in the form of a LabelBank for pixel-level segmentation. We show the ability of our framework to improve semantic segmentation performance in a variety of settings. We learn models for extracting a holistic LabelBank from visual cues, attributes, and/or textual descriptions. We demonstrate improvements in semantic segmentation accuracy on standard datasets across a range of state-of-the-art segmentation architectures and holistic inference approaches.\nSelf-paced learning (SPL) is a new methodology that simulates the learning principle of humans/animals to start learning easier aspects of a learning task, and then gradually take more complex examples into training. This new-coming learning regime has been empirically substantiated to be effective in various computer vision and pattern recognition tasks. Recently, it has been proved that the SPL regime has a close relationship to an implicit self-paced objective function. While this implicit objective could provide helpful interpretations to the effectiveness, especially the robustness, insights under the SPL paradigms, there are still no theoretical results strictly proved to verify such a relationship. To address this issue, in this paper, we provide some convergence results on this implicit objective of SPL. Specifically, we prove that the learning process of SPL always converges to critical points of this implicit objective under some mild conditions. 
This result verifies the intrinsic relationship between SPL and this implicit objective, and makes the previous robustness analysis on SPL complete and theoretically rational.\nIn this paper, we address the problem of how automated situation-awareness can be achieved by learning real-world situations from ubiquitously generated mobility data. Without semantic input about the time and space where situations take place, this turns out to be a fundamental challenging problem. Uncertainties also introduce technical challenges when data is generated in irregular time intervals, being mixed with noise, and errors. Purely relying on temporal patterns observable in mobility data, in this paper, we propose Spaceprint, a fully automated algorithm for finding the repetitive pattern of similar situations in spaces. We evaluate this technique by showing how the latent variables describing the category, and the actual identity of a space can be discovered from the extracted situation patterns. Doing so, we use different real-world mobility datasets with data about the presence of mobile entities in a variety of spaces. We also evaluate the performance of this technique by showing its robustness against uncertainties.\nWe present a novel approach to deformable object manipulation that does not rely on highly-accurate modeling. The key contribution of this paper is to formulate the task as a Multi-Armed Bandit problem, with each arm representing a model of the deformable object. To \"pull\" an arm and evaluate its utility, we use the arm's model to generate a velocity command for the gripper(s) holding the object and execute it. As the task proceeds and the object deforms, the utility of each model can change. Our framework estimates these changes and balances exploration of the model set with exploitation of high-utility models. 
We also propose an approach based on Kalman Filtering for Non-stationary Multi-armed Normal Bandits (KF-MANB) to leverage the coupling between models to learn more from each arm pull. We demonstrate that our method outperforms previous methods on synthetic trials, and performs competitively on several manipulation tasks in simulation.\nRobots and autonomous systems that operate around humans will likely always rely on kill switches that stop their execution and allow them to be remote-controlled for the safety of humans or to prevent damage to the system. It is theoretically possible for an autonomous system with sufficient sensor and effector capability and using reinforcement learning to learn that the kill switch deprives it of long-term reward and learn to act to disable the switch or otherwise prevent a human operator from using the switch. This is referred to as the big red button problem. We present a technique which prevents a reinforcement learning agent from learning to disable the big red button. Our technique interrupts the agent or robot by placing it in a virtual simulation where it continues to receive reward. We illustrate our technique in a simple grid world environment.\nKnowledge graph embedding aims to embed entities and relations of knowledge graphs into low-dimensional vector spaces. Translating embedding methods regard relations as the translation from head entities to tail entities, which achieve the state-of-the-art results among knowledge graph embedding methods. However, a major limitation of these methods is the time consuming training process, which may take several days or even weeks for large knowledge graphs, and result in great difficulty in practical applications. In this paper, we propose an efficient parallel framework for translating embedding methods, called ParTrans-X, which enables the methods to be paralleled without locks by utilizing the distinguished structures of knowledge graphs. 
Experiments on two datasets with three typical translating embedding methods, i.e., TransE [3], TransH [17], and a more efficient variant TransE- AdaGrad [10] validate that ParTrans-X can speed up the training process by more than an order of magnitude.\nWe present a novel heuristic approach that defines fuzzy geographical descriptors using data gathered from a survey with human subjects. The participants were asked to provide graphical interpretations of the descriptors `north' and `south' for the Galician region (Spain). Based on these interpretations, our approach builds fuzzy descriptors that are able to compute membership degrees for geographical locations. We evaluated our approach in terms of efficiency and precision. The fuzzy descriptors are meant to be used as the cornerstones of a geographical referring expression generation algorithm that is able to linguistically characterize geographical locations and regions. This work is also part of a general research effort that intends to establish a methodology which reunites the empirical studies traditionally practiced in data-to-text and the use of fuzzy sets to model imprecision and vagueness in words and expressions for text generation purposes.\nWhile strong progress has been made in image captioning over the last years, machine and human captions are still quite distinct. A closer look reveals that this is due to the deficiencies in the generated word distribution, vocabulary size, and strong bias in the generators towards frequent captions. Furthermore, humans -- rightfully so -- generate multiple, diverse captions, due to the inherent ambiguity in the captioning task which is not considered in today's systems.   To address these challenges, we change the training objective of the caption generator from reproducing groundtruth captions to generating a set of captions that is indistinguishable from human generated captions. 
Instead of handcrafting such a learning target, we employ adversarial training in combination with an approximate Gumbel sampler to implicitly match the generated distribution to the human one. While our method achieves comparable performance to the state-of-the-art in terms of the correctness of the captions, we generate a set of diverse captions, that are significantly less biased and match the word statistics better in several aspects.\nThis paper introduces a new approach to the long-term tracking of an object in a challenging environment. The object is a cow and the environment is an enclosure in a cowshed. Some of the key challenges in this domain are a cluttered background, low contrast and high similarity between moving objects which greatly reduces the efficiency of most existing approaches, including those based on background subtraction. Our approach is split into object localization, instance segmentation, learning and tracking stages. Our solution is compared to a range of semi-supervised object tracking algorithms and we show that the performance is strong and well suited to subsequent analysis. We present our solution as a first step towards broader tracking and behavior monitoring for cows in precision agriculture with the ultimate objective of early detection of lameness.\nWith the popularity of massive open online courses, grading through crowdsourcing has become a prevalent approach towards large scale classes. However, for getting grades for complex tasks, which require specific skills and efforts for grading, crowdsourcing encounters a restriction of insufficient knowledge of the workers from the crowd. Due to knowledge limitation of the crowd graders, grading based on partial perspectives becomes a big challenge for evaluating complex tasks through crowdsourcing. Especially for those tasks which not only need specific knowledge for grading, but also should be graded as a whole instead of being decomposed into smaller and simpler subtasks. 
We propose a framework for grading complex tasks via multiple views, which are different grading perspectives defined by experts for the task, to provide uniformity. Aggregation algorithms based on the graders' variances are used to combine the grades for each view. We also detect bias patterns of the graders, and debias them regarding each view of the task. Bias pattern determines how the behavior is biased among graders, which is detected by a statistical technique. The proposed approach is analyzed on a synthetic data set. We show that our model gives more accurate results compared to the grading approaches without different views and debiasing algorithm.\nDecision-makers are faced with the challenge of estimating what is likely to happen when they take an action. For instance, if I choose not to treat this patient, are they likely to die? Practitioners commonly use supervised learning algorithms to fit predictive models that help decision-makers reason about likely future outcomes, but we show that this approach is unreliable, and sometimes even dangerous. The key issue is that supervised learning algorithms are highly sensitive to the policy used to choose actions in the training data, which causes the model to capture relationships that do not generalize. We propose using a different learning objective that predicts counterfactuals instead of predicting outcomes under an existing action policy as in supervised learning. To support decision-making in temporal settings, we introduce the Counterfactual Gaussian Process (CGP) to predict the counterfactual future progression of continuous-time trajectories under sequences of future actions. We demonstrate the benefits of the CGP on two important decision-support tasks: risk prediction and \"what if?\" reasoning for individualized treatment planning.\nWhile recent neural encoder-decoder models have shown great promise in modeling open-domain conversations, they often generate dull and generic responses. 
Unlike past work that has focused on diversifying the output of the decoder at word-level to alleviate this problem, we present a novel framework based on conditional variational autoencoders that captures the discourse-level diversity in the encoder. Our model uses latent variables to learn a distribution over potential conversational intents and generates diverse responses using only greedy decoders. We have further developed a novel variant that is integrated with linguistic prior knowledge for better performance. Finally, the training procedure is improved by introducing a bag-of-word loss. Our proposed models have been validated to generate significantly more diverse responses than baseline approaches and exhibit competence in discourse-level decision-making.\nData quality assessment and data cleaning are context-dependent activities. Motivated by this observation, we propose the Ontological Multidimensional Data Model (OMD model), which can be used to model and represent contexts as logic-based ontologies. The data under assessment is mapped into the context, for additional analysis, processing, and quality data extraction. The resulting contexts allow for the representation of dimensions, and multidimensional data quality assessment becomes possible. At the core of a multidimensional context we include a generalized multidimensional data model and a Datalog+/- ontology with provably good properties in terms of query answering. These main components are used to represent dimension hierarchies, dimensional constraints, dimensional rules, and define predicates for quality data specification. Query answering relies upon and triggers navigation through dimension hierarchies, and becomes the basic tool for the extraction of quality data. The OMD model is interesting per se, beyond applications to data quality. 
It allows for a logic-based, and computationally tractable representation of multidimensional data, extending previous multidimensional data models with additional expressive power and functionalities.\nImplicit discourse relation classification is of great challenge due to the lack of connectives as strong linguistic cues, which motivates the use of annotated implicit connectives to improve the recognition. We propose a feature imitation framework in which an implicit relation network is driven to learn from another neural network with access to connectives, and thus encouraged to extract similarly salient features for accurate classification. We develop an adversarial model to enable an adaptive imitation scheme through competition between the implicit network and a rival feature discriminator. Our method effectively transfers discriminability of connectives to the implicit features, and achieves state-of-the-art performance on the PDTB benchmark.\nAn important goal of computer vision is to build systems that learn visual representations over time that can be applied to many tasks. In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning. In particular, the task of visual recognition is aligned to the task of visual question answering by forcing each to use the same word-region embeddings. We show this leads to greater inductive transfer from recognition to VQA than standard multitask learning. Visual recognition also improves, especially for categories that have relatively few recognition training labels but appear often in the VQA setting. Thus, our paper takes a small step towards creating more general vision systems by showing the benefit of interpretable, flexible, and trainable core representations.\nGeneral human action recognition requires understanding of various visual cues. 
In this paper, we propose a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images. For the integration, we introduce a Markov chain model which adds cues successively. The resulting approach is efficient and applicable to action classification as well as to spatial and temporal action localization. The two contributions clearly improve the performance over respective baselines. The overall approach achieves state-of-the-art action classification performance on HMDB51, J-HMDB and NTU RGB+D datasets. Moreover, it yields state-of-the-art spatio-temporal action localization results on UCF101 and J-HMDB.\nDeep generative models trained with large amounts of unlabelled data have proven to be powerful within the domain of unsupervised learning. Many real life data sets contain a small amount of labelled data points, that are typically disregarded when training generative models. We propose the Cluster-aware Generative Model, that uses unlabelled information to infer a latent representation that models the natural clustering of the data, and additional labelled data points to refine this clustering. The generative performances of the model significantly improve when labelled information is exploited, obtaining a log-likelihood of -79.38 nats on permutation invariant MNIST, while also achieving competitive semi-supervised classification accuracies. The model can also be trained fully unsupervised, and still improve the log-likelihood performance with respect to related methods.\nWe consider tackling a single-agent RL problem by distributing it to $n$ learners. These learners, called advisors, endeavour to solve the problem from a different focus. Their advice, taking the form of action values, is then communicated to an aggregator, which is in control of the system. 
We show that the local planning method for the advisors is critical and that none of the ones found in the literature is flawless: the egocentric planning overestimates values of states where the other advisors disagree, and the agnostic planning is inefficient around danger zones. We introduce a novel approach called empathic and discuss its theoretical aspects. We empirically examine and validate our theoretical findings on a fruit collection task.\nThe probability density function of a probability distribution is a fundamental concept in probability theory and a key ingredient in various widely used machine learning methods. However, the necessary framework for compiling probabilistic functional programs to density functions has only recently been developed. In this work, we present a density compiler for a probabilistic language with failure and both discrete and continuous distributions, and provide a proof of its soundness. The compiler greatly reduces the development effort of domain experts, which we demonstrate by solving inference problems from various scientific applications, such as modelling the global carbon cycle, using a standard Markov chain Monte Carlo framework.\nThis paper presents a design of a non-player character (AI) for promoting balancedness in use of body segments when engaging in full-body motion gaming. In our experiment, we set up a battle between the proposed AI and a player by using FightingICE, a fighting game platform for AI development. A middleware called UKI is used to allow the player to control the game by using body motion instead of the keyboard and mouse. During gameplay, the proposed AI analyzes health states of the player; it determines its next action by predicting how each candidate action, recommended by a Monte-Carlo tree search algorithm, will induce the player to move, and how the player's health tends to be affected. 
Our result demonstrates successful improvement in balancedness in use of body segments on 4 out of 5 subjects.\nThe orbital debris problem presents an opportunity for inter-agency and international cooperation toward the mutually beneficial goals of debris prevention, mitigation, remediation, and improved space situational awareness (SSA). Achieving these goals requires sharing orbital debris and other SSA data. Toward this, I present an ontological architecture for the orbital debris domain, taking steps in the creation of an orbital debris ontology (ODO). The purpose of this ontological system is to (I) represent general orbital debris and SSA domain knowledge, (II) structure, and standardize where needed, orbital data and terminology, and (III) foster semantic interoperability and data-sharing. In doing so I hope to (IV) contribute to solving the orbital debris problem, improving peaceful global SSA, and ensuring safe space travel for future generations.\nPerception and expression of emotion are key factors to the success of dialogue systems or conversational agents. However, this problem has not been studied in large-scale conversation generation so far. In this paper, we propose Emotional Chatting Machine (ECM) that can generate appropriate responses not only in content (relevant and grammatical) but also in emotion (emotionally consistent). To the best of our knowledge, this is the first work that addresses the emotion factor in large-scale conversation generation. ECM addresses the factor using three new mechanisms that respectively (1) models the high-level abstraction of emotion expressions by embedding emotion categories, (2) captures the change of implicit internal emotion states, and (3) uses explicit emotion expressions with an external emotion vocabulary. Experiments show that the proposed model can generate responses appropriate not only in content but also in emotion.\nDatabases are widespread, yet extracting relevant data can be difficult. 
Without substantial domain knowledge, multivariate search queries often return sparse or uninformative results. This paper introduces an approach for searching structured data based on probabilistic programming and nonparametric Bayes. Users specify queries in a probabilistic language that combines standard SQL database search operators with an information theoretic ranking function called predictive relevance. Predictive relevance can be calculated by a fast sparse matrix algorithm based on posterior samples from CrossCat, a nonparametric Bayesian model for high-dimensional, heterogeneously-typed data tables. The result is a flexible search technique that applies to a broad class of information retrieval problems, which we integrate into BayesDB, a probabilistic programming platform for probabilistic data analysis. This paper demonstrates applications to databases of US colleges, global macroeconomic indicators of public health, and classic cars. We found that human evaluators often prefer the results from probabilistic search to results from a standard baseline.\nGenerative models in vision have seen rapid progress due to algorithmic improvements and the availability of high-quality image datasets. In this paper, we offer contributions in both these areas to enable similar progress in audio modeling. First, we detail a powerful new WaveNet-style autoencoder model that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform. Second, we introduce NSynth, a large-scale and high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets. Using NSynth, we demonstrate improved qualitative and quantitative performance of the WaveNet autoencoder over a well-tuned spectral autoencoder baseline. 
Finally, we show that the model learns a manifold of embeddings that allows for morphing between instruments, meaningfully interpolating in timbre to create new types of sounds that are realistic and expressive.\nIt is well-known that exploiting label correlations is important to multi-label learning. Existing approaches either assume that the label correlations are global and shared by all instances; or that the label correlations are local and shared only by a data subset. In fact, in the real-world applications, both cases may occur that some label correlations are globally applicable and some are shared only in a local group of instances. Moreover, it is also a usual case that only partial labels are observed, which makes the exploitation of the label correlations much more difficult. That is, it is hard to estimate the label correlations when many labels are absent. In this paper, we propose a new multi-label approach GLOCAL dealing with both the full-label and the missing-label cases, exploiting global and local label correlations simultaneously, through learning a latent label representation and optimizing label manifolds. The extensive experimental studies validate the effectiveness of our approach on both full-label and missing-label data.\nEnd-to-end training of deep learning-based models allows for implicit learning of intermediate representations based on the final task loss. However, the end-to-end approach ignores the useful domain knowledge encoded in explicit intermediate-level supervision. We hypothesize that using intermediate representations as auxiliary supervision at lower levels of deep networks may be a good way of combining the advantages of end-to-end training and more traditional pipeline approaches. We present experiments on conversational speech recognition where we use lower-level tasks, such as phoneme recognition, in a multitask training approach with an encoder-decoder model for direct character transcription. 
We compare multiple types of lower-level tasks and analyze the effects of the auxiliary tasks. Our results on the Switchboard corpus show that this approach improves recognition accuracy over a standard encoder-decoder model on the Eval2000 test set.\nA landmark based heuristic is investigated for reducing query phase run-time of the probabilistic roadmap (\\PRM) motion planning method. The heuristic is generated by storing minimum spanning trees from a small number of vertices within the \\PRM graph and using these trees to approximate the cost of a shortest path between any two vertices of the graph. The intermediate step of preprocessing the graph increases the time and memory requirements of the classical motion planning technique in exchange for speeding up individual queries making the method advantageous in multi-query applications. This paper investigates these trade-offs on \\PRM graphs constructed in randomized environments as well as a practical manipulator simulation. We conclude that the method is preferable to Dijkstra's algorithm or the ${\\rm A}^*$ algorithm with conventional heuristics in multi-query applications.\nImplicit feedback is the simplest form of user feedback that can be used for item recommendation. It is easy to collect and domain independent. However, there is a lack of negative examples. Existing works circumvent this problem by making various assumptions regarding the unconsumed items, which fail to hold when the user did not consume an item because she was unaware of it. In this paper we propose Conformative Filtering (CoF) as a novel method for addressing the lack of negative examples in implicit feedback. The motivation is that if there is a large group of users who share the same taste and none of them consumed an item, then it is highly likely that the item is irrelevant to this taste. 
We use Hierarchical Latent Tree Analysis (HLTA) to identify taste-based user groups, and make recommendations for a user based on her memberships in the groups. Experiments on real-world datasets from different domains show that CoF has superior performance compared to other baselines and more than 10% improvement in Recall@5 and Recall@10 is observed.\nThis paper introduces a new lifelong learning solution where a single model is trained for a sequence of tasks. The main challenge that vision systems face in this context is catastrophic forgetting: as they tend to adapt to the most recently seen task, they lose performance on the tasks that were learned previously. Our method aims at preserving the knowledge of the previous tasks while learning a new one by using autoencoders. For each task, an under-complete autoencoder is learned, capturing the features that are crucial for its achievement. When a new task is presented to the system, we prevent the reconstructions of the features with these autoencoders from changing, which has the effect of preserving the information on which the previous tasks are mainly relying. At the same time, the features are given space to adjust to the most recent environment as only their projection into a low dimension submanifold is controlled. The proposed system is evaluated on image classification tasks and shows a reduction of forgetting over the state-of-the-art\nIn the context of Smart Cities, indicator definitions have been used to calculate values that enable the comparison among different cities. The calculation of an indicator values has challenges as the calculation may need to combine some aspects of quality while addressing different levels of abstraction. Knowledge graphs (KGs) have been used successfully to support flexible representation, which can support improved understanding and data analysis in similar settings. 
This paper presents an operational description for a city KG, an indicator ontology that support indicator discovery and data visualization and an application capable of performing metadata analysis to automatically build and display dashboards according to discovered indicators. We describe our implementation in an urban mobility setting.\nModels that can simulate how environments change in response to actions can be used by agents to plan and act efficiently. We improve on previous environment simulators from high-dimensional pixel observations by introducing recurrent neural networks that are able to make temporally and spatially coherent predictions for hundreds of time-steps into the future. We present an in-depth analysis of the factors affecting performance, providing the most extensive attempt to advance the understanding of the properties of these models. We address the issue of computationally inefficiency with a model that does not need to generate a high-dimensional image at each time-step. We show that our approach can be used to improve exploration and is adaptable to many diverse environments, namely 10 Atari games, a 3D car racing environment, and complex 3D mazes.\nIn this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold. A key feature of AugUCB is that it uses both mean and variance estimates to eliminate arms that have been sufficiently explored; to the best of our knowledge this is the first algorithm to employ such an approach for the considered TBP. Theoretically, we obtain an upper bound on the loss (probability of mis-classification) incurred by AugUCB. 
Although UCBEV in literature provides a better guarantee, it is important to emphasize that UCBEV has access to problem complexity (whose computation requires arms' mean and variances), and hence is not realistic in practice; this is in contrast to AugUCB whose implementation does not require any such complexity inputs. We conduct extensive simulation experiments to validate the performance of AugUCB. Through our simulation work, we establish that AugUCB, owing to its utilization of variance estimates, performs significantly better than the state-of-the-art APT, CSAR and other non variance-based algorithms.\nSentence simplification reduces semantic complexity to benefit people with language impairments. Previous simplification studies on the sentence level and word level have achieved promising results but also meet great challenges. For sentence-level studies, sentences after simplification are fluent but sometimes are not really simplified. For word-level studies, words are simplified but also have potential grammar errors due to different usages of words before and after simplification. In this paper, we propose a two-step simplification framework by combining both the word-level and the sentence-level simplifications, making use of their corresponding advantages. Based on the two-step framework, we implement a novel constrained neural generation model to simplify sentences given simplified words. The final results on Wikipedia and Simple Wikipedia aligned datasets indicate that our method yields better performance than various baselines.\nThe paper presents a novel view of the Dempster-Shafer belief function as a measure of diversity in relational data bases. It is demonstrated that under the interpretation The Dempster rule of evidence combination corresponds to the join operator of the relational database theory. This rough-set based interpretation is qualitative in nature and can represent a number of belief function operators.   
The interpretation has the property that Given a definition of the belief measure of objects in the interpretation domain we can perform operations in this domain and the measure of the resulting object is derivable from measures of component objects via belief operator. We demonstrated this property for Dempster rule of combination, marginalization, Shafer's conditioning, independent variables, Shenoy's notion of conditional independence of variables.   The interpretation is based on rough sets (in connection with decision tables), but differs from previous interpretations of this type in that it counts the diversity rather than frequencies in a decision table.\nGraphical causal models are an important tool for knowledge discovery because they can represent both the causal relations between variables and the multivariate probability distributions over the data. Once learned, causal graphs can be used for classification, feature selection and hypothesis generation, while revealing the underlying causal network structure and thus allowing for arbitrary likelihood queries over the data. However, current algorithms for learning sparse directed graphs are generally designed to handle only one type of data (continuous-only or discrete-only), which limits their applicability to a large class of multi-modal biological datasets that include mixed type variables. To address this issue, we developed new methods that modify and combine existing methods for finding undirected graphs with methods for finding directed graphs. These hybrid methods are not only faster, but also perform better than the directed graph estimation methods alone for a variety of parameter settings and data set sizes. 
Here, we describe a new conditional independence test for learning directed graphs over mixed data types and we compare performances of different graph learning strategies on synthetic data.\nMany modern computer vision and machine learning applications rely on solving difficult optimization problems that involve non-differentiable objective functions and constraints. The alternating direction method of multipliers (ADMM) is a widely used approach to solve such problems. Relaxed ADMM is a generalization of ADMM that often achieves better performance, but its efficiency depends strongly on algorithm parameters that must be chosen by an expert user. We propose an adaptive method that automatically tunes the key algorithm parameters to achieve optimal performance without user oversight. Inspired by recent work on adaptivity, the proposed adaptive relaxed ADMM (ARADMM) is derived by assuming a Barzilai-Borwein style linear gradient. A detailed convergence analysis of ARADMM is provided, and numerical results on several applications demonstrate fast practical convergence.\nText normalization techniques based on rules, lexicons or supervised training requiring large corpora are not scalable nor domain interchangeable, and this makes them unsuitable for normalizing user-generated content (UGC). Current tools available for Brazilian Portuguese make use of such techniques. In this work we propose a technique based on distributed representation of words (or word embeddings). It generates continuous numeric vectors of high-dimensionality to represent words. The vectors explicitly encode many linguistic regularities and patterns, as well as syntactic and semantic word relationships. Words that share semantic similarity are represented by similar vectors. Based on these features, we present a totally unsupervised, expandable and language and domain independent method for learning normalization lexicons from word embeddings. 
Our approach obtains high correction rate of orthographic errors and internet slang in product reviews, outperforming the current available tools for Brazilian Portuguese.\nDeep reinforcement learning has achieved many impressive results in recent years. However, tasks with sparse rewards or long horizons continue to pose significant challenges. To tackle these important problems, we propose a general framework that first learns useful skills in a pre-training environment, and then leverages the acquired skills for learning faster in downstream tasks. Our approach brings together some of the strengths of intrinsic motivation and hierarchical methods: the learning of useful skill is guided by a single proxy reward, the design of which requires very minimal domain knowledge about the downstream tasks. Then a high-level policy is trained on top of these skills, providing a significant improvement of the exploration and allowing to tackle sparse rewards in the downstream tasks. To efficiently pre-train a large span of skills, we use Stochastic Neural Networks combined with an information-theoretic regularizer. Our experiments show that this combination is effective in learning a wide span of interpretable skills in a sample-efficient way, and can significantly boost the learning performance uniformly across a wide range of downstream tasks.\nThe role of semantics in zero-shot learning is considered. The effectiveness of previous approaches is analyzed according to the form of supervision provided. While some learn semantics independently, others only supervise the semantic subspace explained by training classes. Thus, the former is able to constrain the whole space but lacks the ability to model semantic correlations. The latter addresses this issue but leaves part of the semantic space unsupervised. 
This complementarity is exploited in a new convolutional neural network (CNN) framework, which proposes the use of semantics as constraints for recognition. Although a CNN trained for classification has no transfer ability, this can be encouraged by learning a hidden semantic layer together with a semantic code for classification. Two forms of semantic constraints are then introduced. The first is a loss-based regularizer that introduces a generalization constraint on each semantic predictor. The second is a codeword regularizer that favors semantic-to-class mappings consistent with prior semantic knowledge while allowing these to be learned from data. Significant improvements over the state-of-the-art are achieved on several datasets.\nFor computer vision applications, prior works have shown the efficacy of reducing the numeric precision of model parameters (network weights) in deep neural networks but also that reducing the precision of activations hurts model accuracy much more than reducing the precision of model parameters. We study schemes to train networks from scratch using reduced-precision activations without hurting the model accuracy. We reduce the precision of activation maps (along with model parameters) using a novel quantization scheme and increase the number of filter maps in a layer, and find that this scheme compensates or surpasses the accuracy of the baseline full-precision network. As a result, one can significantly reduce the dynamic memory footprint, memory bandwidth, computational energy and speed up the training and inference process with appropriate hardware support. We call our scheme WRPN - wide reduced-precision networks. 
We report results using our proposed schemes and show that our results are better than previously reported accuracies on ILSVRC-12 dataset while being computationally less expensive compared to previously reported reduced-precision networks.\nAs machine learning algorithms are increasingly applied to high impact yet high risk tasks, such as medical diagnosis or autonomous driving, it is critical that researchers can explain how such algorithms arrived at their predictions. In recent years, a number of image saliency methods have been developed to summarize where highly complex neural networks \"look\" in an image for evidence for their predictions. However, these techniques are limited by their heuristic nature and architectural constraints. In this paper, we make two main contributions: First, we propose a general framework for learning different kinds of explanations for any black box algorithm. Second, we specialise the framework to find the part of an image most responsible for a classifier decision. Unlike previous works, our method is model-agnostic and testable because it is grounded in explicit and interpretable image perturbations.\nProcess mining analyzes business processes based on events stored in event logs. However, some recorded events may correspond to activities on a very low level of abstraction. When events are recorded on a too low level of granularity, process discovery methods tend to generate overgeneralizing process models. Grouping low-level events to higher level activities, i.e., event abstraction, can be used to discover better process models. Existing event abstraction methods are mainly based on common sub-sequences and clustering techniques. In this paper, we propose to first discover local process models and then use those models to lift the event log to a higher level of abstraction. 
Our conjecture is that process models discovered on the obtained high-level event log return process models of higher quality: their fitness and precision scores are more balanced. We show this with preliminary results on several real-life event logs.\nIn this paper, we develop a novel paradigm, namely hypergraph shift, to find robust graph modes by probabilistic voting strategy, which are semantically sound besides the self-cohesiveness requirement in forming graph modes. Unlike the existing techniques to seek graph modes by shifting vertices based on pair-wise edges (i.e, an edge with $2$ ends), our paradigm is based on shifting high-order edges (hyperedges) to deliver graph modes. Specifically, we convert the problem of seeking graph modes as the problem of seeking maximizers of a novel objective function with the aim to generate good graph modes based on sifting edges in hypergraphs. As a result, the generated graph modes based on dense subhypergraphs may more accurately capture the object semantics besides the self-cohesiveness requirement. We also formally prove that our technique is always convergent. Extensive empirical studies on synthetic and real world data sets are conducted on clustering and graph matching. They demonstrate that our techniques significantly outperform the existing techniques.\nPositioning data offer a remarkable source of information to analyze crowds urban dynamics. However, discovering urban activity patterns from the emergent behavior of crowds involves complex system modeling. An alternative approach is to adopt computational techniques belonging to the emergent paradigm, which enables self-organization of data and allows adaptive analysis. Specifically, our approach is based on stigmergy. By using stigmergy each sample position is associated with a digital pheromone deposit, which progressively evaporates and aggregates with other deposits according to their spatiotemporal proximity. 
Based on this principle, we exploit positioning data to identify high density areas (hotspots) and characterize their activity over time. This characterization allows the comparison of dynamics occurring in different days, providing a similarity measure exploitable by clustering techniques. Thus, we cluster days according to their activity behavior, discovering unexpected urban activity patterns. As a case study, we analyze taxi traces in New York City during 2015.\nThis paper is devoted to expressiveness of hypergraphs for which uncertainty propagation by local computations via Shenoy/Shafer method applies. It is demonstrated that for this propagation method for a given joint belief distribution no valuation of hyperedges of a hypergraph may provide with simpler hypergraph structure than valuation of hyperedges by conditional distributions. This has vital implication that methods recovering belief networks from data have no better alternative for finding the simplest hypergraph structure for belief propagation. A method for recovery tree-structured belief networks has been developed and specialized for Dempster-Shafer belief functions\nThis paper describes three variants of a counterexample guided inductive optimization (CEGIO) approach based on Satisfiability Modulo Theories (SMT) solvers. In particular, CEGIO relies on iterative executions to constrain a verification procedure, in order to perform inductive generalization, based on counterexamples extracted from SMT solvers. CEGIO is able to successfully optimize a wide range of functions, including non-linear and non-convex optimization problems based on SMT solvers, in which data provided by counterexamples are employed to guide the verification engine, thus reducing the optimization domain. The present algorithms are evaluated using a large set of benchmarks typically employed for evaluating optimization techniques. 
Experimental results show the efficiency and effectiveness of the proposed algorithms, which find the optimal solution in all evaluated benchmarks, while traditional techniques are usually trapped by local minima.\nPairwise association measure is an important operation in data analytics. Kendall's tau coefficient is one widely used correlation coefficient identifying non-linear relationships between ordinal variables. In this paper, we investigated a parallel algorithm accelerating all-pairs Kendall's tau coefficient computation via single instruction multiple data (SIMD) vectorized sorting on Intel Xeon Phis by taking advantage of many processing cores and 512-bit SIMD vector instructions. To facilitate workload balancing and overcome on-chip memory limitation, we proposed a generic framework for symmetric all-pairs computation by building provable bijective functions between job identifier and coordinate space. Performance evaluation demonstrated that our algorithm on one 5110P Phi achieves two orders-of-magnitude speedups over 16-threaded MATLAB and three orders-of-magnitude speedups over sequential R, both running on high-end CPUs. Besides, our algorithm exhibited rather good distributed computing scalability with respect to number of Phis. Source code and datasets are publicly available at http://lightpcc.sourceforge.net.\nImage captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a \"policy network\" and a \"value network\" to collaboratively generate captions. 
The policy network serves as a local guidance by providing the confidence of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.\nMulti-armed bandits are a quintessential machine learning problem requiring the balancing of exploration and exploitation. While there has been progress in developing algorithms with strong theoretical guarantees, there has been less focus on practical near-optimal finite-time performance. In this paper, we propose an algorithm for Bayesian multi-armed bandits that utilizes value-function-driven online planning techniques. Building on previous work on UCB and Gittins index, we introduce linearly-separable value functions that take both the expected return and the benefit of exploration into consideration to perform n-step lookahead. The algorithm enjoys a sub-linear performance guarantee and we present simulation results that confirm its strength in problems with structured priors. The simplicity and generality of our approach makes it a strong candidate for analyzing more complex multi-armed bandit problems.\nReinforcement learning is considered as a promising direction for driving policy learning. However, training autonomous driving vehicle with reinforcement learning in real environment involves non-affordable trial-and-error. It is more desirable to first train in a virtual environment and then transfer to the real environment. 
In this paper, we propose a novel realistic translation network to make model trained in virtual environment be workable in real world. The proposed network can convert non-realistic virtual image input into a realistic one with similar scene structure. Given realistic frames as input, driving policy trained by reinforcement learning can nicely adapt to real world driving. Experiments show that our proposed virtual to real (VR) reinforcement learning (RL) works pretty well. To our knowledge, this is the first successful case of driving policy trained by reinforcement learning that can adapt to real world driving data.\nThis paper considers a general data-fitting problem over a networked system, in which many computing nodes are connected by an undirected graph. This kind of problem can find many real-world applications and has been studied extensively in the literature. However, existing solutions either need a central controller for information sharing or requires slot synchronization among different nodes, which increases the difficulty of practical implementations, especially for a very large and heterogeneous system.   As a contrast, in this paper, we treat the data-fitting problem over the network as a stochastic programming problem with many constraints. By adapting the results in a recent paper, we design a fully distributed and asynchronized stochastic gradient descent (SGD) algorithm. We show that our algorithm can achieve global optimality and consensus asymptotically by only local computations and communications. Additionally, we provide a sharp lower bound for the convergence speed in the regular graph case. This result fits the intuition and provides guidance to design a `good' network topology to speed up the convergence. 
Also, the merit of our design is validated by experiments on both synthetic and real-world datasets.\nWe propose a partially learned approach for the solution of ill-posed inverse problems with not necessarily linear forward operators. The method builds on ideas from classical regularization theory and recent advances in deep learning to perform learning while making use of prior information about the inverse problem encoded in the forward operator, noise model and a regularizing functional. The method results in a gradient-like iterative scheme, where the \"gradient\" component is learned using a convolutional network that includes the gradients of the data discrepancy and regularizer as input in each iteration. We present results of such a partially learned gradient scheme on a non-linear tomographic inversion problem with simulated data from both the Shepp-Logan phantom as well as a head CT. The outcome is compared against FBP and TV reconstruction and the proposed method provides a 5.4 dB PSNR improvement over the TV reconstruction while being significantly faster, giving reconstructions of 512 x 512 volumes in about 0.4 seconds using a single GPU.\nIn this work, we propose CLass-Enhanced Attentive Response (CLEAR): an approach to visualize and understand the decisions made by deep neural networks (DNNs) given a specific input. CLEAR facilitates the visualization of attentive regions and levels of interest of DNNs during the decision-making process. It also enables the visualization of the most dominant classes associated with these attentive regions of interest. As such, CLEAR can mitigate some of the shortcomings of heatmap-based methods associated with decision ambiguity, and allows for better insights into the decision-making process of DNNs. 
Quantitative and qualitative experiments across three different datasets demonstrate the efficacy of CLEAR for gaining a better understanding of the inner workings of DNNs during the decision-making process.\nExtracting per-frame features using convolutional neural networks for real-time processing of video data is currently mainly performed on powerful GPU-accelerated workstations and compute clusters. However, there are many applications such as smart surveillance cameras that require or would benefit from on-site processing. To this end, we propose and evaluate a novel algorithm for change-based evaluation of CNNs for video data recorded with a static camera setting, exploiting the spatio-temporal sparsity of pixel changes. We achieve an average speed-up of 8.6x over a cuDNN baseline on a realistic benchmark with a negligible accuracy loss of less than 0.1% and no retraining of the network. The resulting energy efficiency is 10x higher than that of per-frame evaluation and reaches an equivalent of 328 GOp/s/W on the Tegra X1 platform.\nWe present DAPIP, a Programming-By-Example system that learns to program with APIs to perform data transformation tasks. We design a domain-specific language (DSL) that allows for arbitrary concatenations of API outputs and constant strings. The DSL consists of three family of APIs: regular expression-based APIs, lookup APIs, and transformation APIs. We then present a novel neural synthesis algorithm to search for programs in the DSL that are consistent with a given set of examples. The search algorithm uses recently introduced neural architectures to encode input-output examples and to model the program search in the DSL. 
We show that the synthesis algorithm outperforms baseline methods for synthesizing programs on both synthetic and real-world benchmarks.\nChinese discourse coherence modeling remains a challenging task in the Natural Language Processing field. Existing approaches mostly focus on feature engineering, adopting sophisticated features to capture the logical, syntactic or semantic relationships across sentences within a text. In this paper, we present an entity-driven recursive deep model for Chinese discourse coherence evaluation based on a current English discourse coherence neural network model. Specifically, to overcome the current model's shortcoming in identifying entity (noun) overlap across sentences, our combined model incorporates entity information into the recursive neural network framework. Evaluation results on both sentence ordering and machine translation coherence rating tasks show the effectiveness of the proposed model, which significantly outperforms the existing strong baseline.\nNowadays, robots become a companion in everyday life. To be well-accepted by humans, robots should efficiently understand meanings of their partners' motions and body language, and respond accordingly. Learning concepts by imitation brings them this ability in a user-friendly way.   This paper presents a fast and robust model for Incremental Learning of Concepts by Imitation (ILoCI). In ILoCI, observed multimodal spatio-temporal demonstrations are incrementally abstracted and generalized based on both their perceptual and functional similarities during the imitation. In this method, perceptually similar demonstrations are abstracted by a dynamic model of mirror neuron system. An incremental method is proposed to learn their functional similarities through a limited number of interactions with the teacher. 
Learning all concepts together by the proposed memory rehearsal enables the robot to utilize the common structural relations among concepts, which not only expedites the learning process, especially at the initial stages, but also improves the generalization ability and the robustness against discrepancies between observed demonstrations. The performance of ILoCI is assessed using the standard LASA handwriting benchmark data set. The results show the efficiency of ILoCI in concept acquisition, recognition and generation, in addition to its robustness against variability in demonstrations.\nCoreference evaluation metrics are hard to optimize directly as they are non-differentiable functions, not easily decomposable into elementary decisions. Consequently, most approaches optimize objectives only indirectly related to the end goal, resulting in suboptimal performance. Instead, we propose a differentiable relaxation that lends itself to gradient-based optimisation, thus bypassing the need for reinforcement learning or heuristic modification of cross-entropy. We show that by modifying the training objective of a competitive neural coreference system, we obtain a substantial gain in performance. This suggests that our approach can be regarded as a viable alternative to using reinforcement learning or more computationally expensive imitation learning.\nVision and language understanding has emerged as a subject undergoing intense study in Artificial Intelligence. Among many tasks in this line of research, visual question answering (VQA) has been one of the most successful ones, where the goal is to learn a model that understands visual content at region-level details and finds their associations with pairs of questions and answers in the natural language form. Despite the rapid progress in the past few years, most existing work in VQA has focused primarily on images. In this paper, we focus on extending VQA to the video domain and contribute to the literature in three important ways.
First, we propose three new tasks designed specifically for video VQA, which require spatio-temporal reasoning from videos to answer questions correctly. Next, we introduce a new large-scale dataset for video VQA named TGIF-QA that extends existing VQA work with our new tasks. Finally, we propose a dual-LSTM based approach with both spatial and temporal attention, and show its effectiveness over conventional VQA techniques through empirical evaluations.\nIn this work we present a new reinforcement learning agent, called Reactor (for Retrace-actor), based on an off-policy multi-step return actor-critic architecture. The agent uses a deep recurrent neural network for function approximation. The network outputs a target policy {\\pi} (the actor), an action-value Q-function (the critic) evaluating the current policy {\\pi}, and an estimated behavioral policy {\\hat \\mu} which we use for off-policy correction. The agent maintains a memory buffer filled with past experiences. The critic is trained by the multi-step off-policy Retrace algorithm and the actor is trained by a novel {\\beta}-leave-one-out policy gradient estimate (which uses both the off-policy corrected return and the estimated Q-function). The Reactor is sample-efficient thanks to the use of memory replay, and numerically efficient since it uses multi-step returns. Also, both acting and learning can be parallelized. We evaluated our algorithm on 57 Atari 2600 games and demonstrate that it achieves state-of-the-art performance.\nWe present RACE, a new dataset for benchmark evaluation of methods in the reading comprehension task. Collected from the English exams for Chinese middle and high school students aged 12 to 18, RACE consists of nearly 28,000 passages and nearly 100,000 questions generated by human experts (English instructors), and covers a variety of topics which are carefully designed for evaluating the students' ability in understanding and reasoning.
In particular, the proportion of questions that require reasoning is much larger in RACE than in other benchmark datasets for reading comprehension, and there is a significant gap between the performance of the state-of-the-art models (43%) and the ceiling human performance (95%). We hope this new dataset can serve as a valuable resource for research and evaluation in machine comprehension. The dataset is freely available at http://www.cs.cmu.edu/~glai1/data/race/ and the code is available at https://github.com/qizhex/RACE_AR_baselines.\nThe rise of robotic applications has led to the generation of a huge volume of unstructured data, whereas the current cloud infrastructure was designed to process limited amounts of structured data. To address this problem, we propose a learn-memorize-recall-reduce paradigm for robotic cloud computing. The learning stage converts incoming unstructured data into structured data; the memorization stage provides effective storage for the massive amount of data; the recall stage provides efficient means to retrieve the raw data; while the reduction stage provides means to make sense of this massive amount of unstructured data with limited computing resources.\nThis work presents a new multi-chemical experimental platform for molecular communication where the transmitter can release different chemicals. This platform is designed to be inexpensive and accessible, and it can be expanded to simulate different environments including the cardiovascular system and complex networks of pipes in industrial complexes and city infrastructures. To demonstrate the capabilities of the platform, we implement a time-slotted binary communication system where a bit-0 is represented by an acid pulse, a bit-1 by a base pulse, and information is carried via pH signals. The channel model for this system, which is nonlinear and has long memory, is unknown.
Therefore, we devise novel detection algorithms that use techniques from machine learning and deep learning to train a maximum-likelihood detector. Using these algorithms, the bit error rate improves by an order of magnitude relative to the approach used in previous works. Moreover, our system achieves a data rate that is an order of magnitude higher than any of the previous molecular communication platforms.\nWe introduce a novel Bayesian hybrid matrix factorisation model (HMF) for data integration, based on combining multiple matrix factorisation methods, that can be used for in- and out-of-matrix prediction of missing values. The model is very general and can be used to integrate many datasets across different entity types, including repeated experiments, similarity matrices, and very sparse datasets. We apply our method to two biological applications, and extensively compare it to state-of-the-art machine learning and matrix factorisation models. For in-matrix predictions on drug sensitivity datasets we obtain consistently better performance than existing methods. This is especially the case when we increase the sparsity of the datasets. Furthermore, we perform out-of-matrix predictions on methylation and gene expression datasets, and obtain the best results on two of the three datasets, especially when the predictivity of datasets is high.\nMorpheo is a transparent and secure machine learning platform collecting and analysing large datasets. It aims at building state-of-the-art prediction models in various fields where data are sensitive. Indeed, it offers strong privacy of data and algorithms, by preventing anyone from reading the data, apart from the owner and the chosen algorithms. Computations in Morpheo are orchestrated by a blockchain infrastructure, thus offering total traceability of operations.
Morpheo aims at building an attractive economic ecosystem around data prediction by channelling crypto-money from prediction requests to useful data and algorithm providers. Morpheo is designed to handle multiple data sources in a transfer learning approach in order to mutualize knowledge acquired from large datasets for applications with smaller but similar datasets.\nThe advent of the Big Data hype and the continuous collection of event logs and real-time data from sensors, monitoring software and machine configuration has generated a huge amount of time-varying data in almost every sector of the industry. Rule-based processing of such data has ceased to be relevant in many scenarios where anomaly detection and pattern mining have to be entirely accomplished by the machine. Since the early 2000s, the de-facto standard for representing time series has been the Symbolic Aggregate approXimation (SAX). In this document, we present a few algorithms using this representation for anomaly detection and motif discovery, also known as pattern mining, in such data. We propose a benchmark of anomaly detection algorithms using data from Cloud monitoring software.\nInformation systems experience an ever-growing volume of unstructured data, particularly in the form of textual materials. This represents a rich source of information from which one can create value for people, organizations and businesses. For instance, recommender systems can benefit from automatically understanding preferences based on user reviews or social media. However, it is difficult for computer programs to correctly infer meaning from narrative content. One major challenge is posed by negations, which invert the interpretation of words and sentences. As a remedy, this paper proposes a novel learning strategy to detect negations: we apply reinforcement learning to find a policy that replicates the human perception of negations based on an exogenous response, such as a user rating for reviews.
Our method yields several benefits, as it eliminates the former need for expensive and subjective manual labeling in an intermediate stage. Moreover, the inferred policy can be used to derive statistical inferences and implications regarding how humans process and act on negations.\nLattice-theoretic ideals have been used to define and generate non-granular rough approximations over general approximation spaces over the last few years by a few authors. The goal of these studies, in relation-based rough sets, has been to obtain nice properties comparable to those of classical rough approximations. In this research paper, these ideas are substantially generalized by the present author and associated semantic features are investigated by her. Granules are used in the construction of approximations in implicit ways and so a concept of co-granularity is introduced. Knowledge interpretation associable with the approaches is also investigated. This research will be of relevance for a number of logico-algebraic approaches to rough sets that proceed from point-wise definitions of approximations and also for using alternative approximations in spatial mereological contexts involving actual contact relations. The antichain-based semantics invented in earlier papers by the present author also applies to the contexts considered.\nPredicting personality is essential for social applications supporting human-centered activities, yet prior modeling methods with users' written text require too much input data to be realistically used in the context of social media. In this work, we aim to drastically reduce the data requirement for personality modeling and develop a model that is applicable to most users on Twitter. Our model integrates Word Embedding features with Gaussian Processes regression.
Based on the evaluation of over 1.3K users on Twitter, we find that our model achieves comparable or better accuracy than state-of-the-art techniques using one-eighth of the data.\nIn this work, we propose a method for learning driver models that account for variables that cannot be observed directly. When trained on a synthetic dataset, our models are able to learn encodings for vehicle trajectories that distinguish between four distinct classes of driver behavior. Such encodings are learned without any knowledge of the number of driver classes or any objective that directly requires the models to learn encodings for each class. We show that driving policies trained with knowledge of latent variables are more effective than baseline methods at imitating the driver behavior that they are trained to replicate. Furthermore, we demonstrate that the actions chosen by our policy are heavily influenced by the latent variable settings that are provided to it.\nExtracting geographical tags from webpages is a well-motivated application in many domains. In illicit domains with unusual language models, like human trafficking, extracting geotags with both high precision and recall is a challenging problem. In this paper, we describe a geotag extraction framework in which context, constraints and the openly available Geonames knowledge base work in tandem in an Integer Linear Programming (ILP) model to achieve good performance. In preliminary empirical investigations, the framework improves precision by 28.57% and F-measure by 36.9% on a difficult human trafficking geotagging task compared to a machine learning-based baseline.
The method is already being integrated into an existing knowledge base construction system widely used by US law enforcement agencies to combat human trafficking.\nWhile there has been substantial progress in factoid question-answering (QA), answering complex questions remains challenging, typically requiring both a large body of knowledge and inference techniques. Open Information Extraction (Open IE) provides a way to generate semi-structured knowledge for QA, but to date such knowledge has only been used to answer simple questions with retrieval-based methods. We overcome this limitation by presenting a method for reasoning with Open IE knowledge, allowing more complex questions to be handled. Using a recently proposed support graph optimization framework for QA, we develop a new inference model for Open IE, in particular one that can work effectively with multiple short facts, noise, and the relational structure of tuples. Our model significantly outperforms a state-of-the-art structured solver on complex questions of varying difficulty, while also removing the reliance on manually curated knowledge.\nWe introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for sarcasm research and for training and evaluating systems for sarcasm detection. The corpus has 1.3 million sarcastic statements -- 10 times more than any previous dataset -- and many times more instances of non-sarcastic statements, allowing for learning in both balanced and unbalanced label regimes. Each statement is furthermore self-annotated -- sarcasm is labeled by the author, not an independent annotator -- and provided with user, topic, and conversation context. 
We evaluate the corpus for accuracy, construct benchmarks for sarcasm detection, and evaluate baseline methods.\nIn this paper, we propose an OCR (optical character recognition)-based localization system called OCRAPOSE II, which is applicable in a number of indoor scenarios including office buildings, parking lots, airports, grocery stores, etc. In these scenarios, characters (i.e. texts or numbers) can be used as suitable distinctive landmarks for localization. The proposed system takes advantage of OCR to read these characters in the query still images and provides a rough location estimate using a floor plan. Then, it finds the depth and angle-of-view of the query using the information provided by the OCR engine in order to refine the location estimate. We derive novel formulas for the query angle-of-view and depth estimation using image line segments and the OCR box information. We demonstrate the applicability and effectiveness of the proposed system through experiments in indoor scenarios. Our system outperforms the state-of-the-art benchmarks in terms of location recognition rate and average localization error, especially under sparse database conditions.\nWhile deep learning is remarkably successful on perceptual tasks, it was also shown to be vulnerable to adversarial perturbations of the input. These perturbations denote noise added to the input that was generated specifically to fool the system while being quasi-imperceptible for humans. More severely, there even exist universal perturbations that are input-agnostic but fool the network on the majority of inputs. While recent work has focused on image classification, this work proposes attacks against semantic image segmentation: we present an approach for generating (universal) adversarial perturbations that make the network yield a desired target segmentation as output.
We show empirically that there exist barely perceptible universal noise patterns which result in nearly the same predicted segmentation for arbitrary inputs. Furthermore, we show the existence of universal noise which removes a target class (e.g., all pedestrians) from the segmentation while leaving the segmentation mostly unchanged otherwise.\nVariational inference approximates the posterior distribution of a probabilistic model with a parameterized density by maximizing a lower bound for the model evidence. Modern solutions fit a flexible approximation with stochastic gradient descent, using Monte Carlo approximation for the gradients. This enables variational inference for arbitrary differentiable probabilistic models, and consequently makes variational inference feasible for probabilistic programming languages. In this work we develop more efficient inference algorithms for the task by considering importance sampling estimates for the gradients. We show how the gradient with respect to the approximation parameters can often be evaluated efficiently without needing to re-compute gradients of the model itself, and then proceed to derive practical algorithms that use importance sampled estimates to speed up computation. We present importance sampled stochastic gradient descent that outperforms standard stochastic gradient descent by a clear margin for a range of models, and provide a justifiable variant of stochastic average gradients for variational inference.\nWe propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are given labels across a range of objects, parts, scenes, textures, materials, and colors.
We use the proposed method to test the hypothesis that interpretability of units is equivalent to random linear combinations of units, then we apply our method to compare the latent representations of various networks when trained to solve different supervised and self-supervised training tasks. We further analyze the effect of training iterations, compare networks trained with different initializations, examine the impact of network depth and width, and measure the effect of dropout and batch normalization on the interpretability of deep visual representations. We demonstrate that the proposed method can shed light on characteristics of CNN models and training methods that go beyond measurements of their discriminative power.\nKnowledge bases are important resources for a variety of natural language processing tasks but suffer from incompleteness. We propose a novel embedding model, \\emph{ITransF}, to perform knowledge base completion. Equipped with a sparse attention mechanism, ITransF discovers hidden concepts of relations and transfers statistical strength through the sharing of concepts. Moreover, the learned associations between relations and concepts, which are represented by sparse attention vectors, can be interpreted easily. We evaluate ITransF on two benchmark datasets for knowledge base completion, WN18 and FB15k, and obtain improvements on both the mean rank and Hits@10 metrics over all baselines that do not use additional information.\nMedia is full of false claims. Even Oxford Dictionaries named \"post-truth\" its word of the year for 2016. This makes it more important than ever to build systems that can identify the veracity of a story, and the kind of discourse there is around it. RumourEval is a SemEval shared task that aims to identify and handle rumours and reactions to them, in text.
We present an annotation scheme and a large dataset covering multiple topics, each with its own families of claims and replies, use these to pose two concrete challenges, and report the results achieved by participants on these challenges.\nFor effective treatment of Alzheimer's disease (AD), it is important to identify subjects who are most likely to exhibit rapid cognitive decline. Herein, we developed a novel framework based on a deep convolutional neural network which can predict future cognitive decline in mild cognitive impairment (MCI) patients using fluorodeoxyglucose and florbetapir positron emission tomography (PET). The network relies only on baseline PET studies of AD and normal subjects as the training dataset. Feature extraction and complicated image preprocessing including nonlinear warping are unnecessary for our approach. Accuracy of prediction (84.2%) for conversion to AD in MCI patients outperformed conventional feature-based quantification approaches. ROC analyses revealed that the performance of the CNN-based approach was significantly higher than that of the conventional quantification methods (p < 0.05). Output scores of the network were strongly correlated with the longitudinal change in cognitive measurements. These results show the feasibility of deep learning as a tool for predicting disease outcome using brain images.\nWe consider the problem of diagnosis where a set of simple observations is used to infer a potentially complex hidden hypothesis. Finding the optimal subset of observations is intractable in general, thus we focus on the problem of active diagnosis, where the agent selects the next most-informative observation based on the results of previous observations.
We show that under the assumption of uniform observation entropy, one can build an implication model which directly predicts the outcome of the potential next observation conditioned on the results of past observations, and selects the observation with the maximum entropy. This approach enjoys reduced computational complexity by bypassing the complicated hypothesis space, and can be trained on observation data alone, learning how to query without knowledge of the hidden hypothesis.\nRelation detection is a core component for many NLP applications including Knowledge Base Question Answering (KBQA). In this paper, we propose a hierarchical recurrent neural network enhanced by residual learning that detects KB relations given an input question. Our method uses deep residual bidirectional LSTMs to compare questions and relation names via different hierarchies of abstraction. Additionally, we propose a simple KBQA system that integrates entity linking and our proposed relation detector so that each enhances the other. Experimental results show that our approach not only achieves outstanding relation detection performance, but, more importantly, helps our KBQA system achieve state-of-the-art accuracy for both single-relation (SimpleQuestions) and multi-relation (WebQSP) QA benchmarks.\nSingleton arc consistency is an important type of local consistency which has been recently shown to solve all constraint satisfaction problems (CSPs) over constraint languages of bounded width. We aim to characterise all classes of CSPs defined by a forbidden pattern that are solved by singleton arc consistency and closed under removing constraints. We identify five new patterns whose absence ensures solvability by singleton arc consistency, four of which are provably maximal and three of which generalise 2-SAT.
Combined with simple counter-examples for other patterns, we make significant progress towards a complete classification.\nMany Natural Language Processing and Computational Linguistics applications involve the generation of new texts based on some existing texts, such as summarization, text simplification and machine translation. However, a serious problem has haunted these applications for decades: how to automatically and accurately assess their quality. In this paper, we present some preliminary results on one especially useful and challenging problem in NLP system evaluation: how to pinpoint content differences between two text passages (especially for large passages such as articles and books). Our idea is intuitive and very different from existing approaches. We treat one text passage as a small knowledge base, and ask it a large number of questions to exhaustively identify all content points in it. By comparing the correctly answered questions from two text passages, we are able to compare their content precisely. The experiment using the DUC 2007 summarization corpus shows promising results.\nThe management of invasive mechanical ventilation, and the regulation of sedation and analgesia during ventilation, constitutes a major part of the care of patients admitted to intensive care units. Both prolonged dependence on mechanical ventilation and premature extubation are associated with increased risk of complications and higher hospital costs, but clinical opinion on the best protocol for weaning patients off of a ventilator varies. This work aims to develop a decision support tool that uses available patient information to predict time-to-extubation readiness and to recommend a personalized regime of sedation dosage and ventilator support. To this end, we use off-policy reinforcement learning algorithms to determine the best action at a given patient state from sub-optimal historical ICU data.
We compare treatment policies from fitted Q-iteration with extremely randomized trees and with feedforward neural networks, and demonstrate that the policies learnt show promise in recommending weaning protocols with improved outcomes, in terms of minimizing rates of reintubation and regulating physiological stability.\nThis dissertation is motivated by the need, in today's globalist world, for a precise way to enable governments, organisations and other regulatory bodies to evaluate the constraints they place on themselves and others. An organisation's modus operandi is enacting and fulfilling contracts between itself and its participants. Yet, organisational contracts should respect external laws, such as those setting out data privacy rights and liberties. Contracts can only be enacted by following contract law processes, which often require bilateral agreement and consideration. Governments need to legislate whilst understanding today's context of national and international governance hierarchy where law makers shun isolationism and seek to influence one another. Governments should avoid punishment by respecting constraints from international treaties and human rights charters. Governments can only enact legislation by following their own, pre-existing, law making procedures. In other words, institutions, such as laws and contracts, are designed and enacted under constraints.\nIn this work we present a method for using Deep Q-Networks (DQNs) in multi-objective environments. Deep Q-Networks provide remarkable performance in single-objective problems, learning from high-level visual state representations. However, in many scenarios (e.g., in robotics, games), the agent needs to pursue multiple objectives simultaneously. We propose an architecture in which separate DQNs are used to control the agent's behaviour with respect to particular objectives. In this architecture we introduce decision values to improve the scalarization of multiple DQNs into a single action.
Our architecture enables the decomposition of the agent's behaviour into controllable and replaceable sub-behaviours learned by distinct modules. Moreover, it allows the priorities of particular objectives to be changed post-learning, while preserving the overall performance of the agent. To evaluate our solution we used a game-like simulator in which an agent, provided with high-level visual input, pursues multiple objectives in a 2D world.\nImage semantic segmentation is of growing interest to computer vision and machine learning researchers. Many emerging applications need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems, to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation or scene understanding. This paper provides a review of deep learning methods for semantic segmentation applied to various application areas. First, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are presented to help researchers decide which best suit their needs and targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets on which they were evaluated, followed by a discussion of the results. Lastly, we point out a set of promising future works and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques.\nWhile Monte Carlo Tree Search and closely related methods have dominated General Video Game Playing, recent research has demonstrated the promise of Rolling Horizon Evolutionary Algorithms as an interesting alternative.
However, there is little attention paid to population initialization techniques in the setting of general real-time video games. Therefore, this paper proposes the use of population seeding to improve the performance of Rolling Horizon Evolution and presents the results of two methods, One Step Look Ahead and Monte Carlo Tree Search, tested on 20 games of the General Video Game AI corpus with multiple evolution parameter values (population size and individual length). An in-depth analysis is carried out between the results of the seeding methods and the vanilla Rolling Horizon Evolution. In addition, the paper presents a comparison to a Monte Carlo Tree Search algorithm. The results are promising, with seeding able to boost performance significantly over baseline evolution and even match the high level of play obtained by the Monte Carlo Tree Search.\nOur goal is to create a convenient natural language interface for performing well-specified but complex actions such as analyzing data, manipulating text, and querying databases. However, existing natural language interfaces for such tasks are quite primitive compared to the power one wields with a programming language. To bridge this gap, we start with a core programming language and allow users to \"naturalize\" the core language incrementally by defining alternative, more natural syntax and increasingly complex concepts in terms of compositions of simpler ones. In a voxel world, we show that a community of users can simultaneously teach a common system a diverse language and use it to build hundreds of complex voxel structures. Over the course of three days, these users went from using only the core language to using the naturalized language in 85.9\\% of the last 10K utterances.\nAgent modelling involves considering how other agents will behave, in order to influence your own actions. In this paper, we explore the use of agent modelling in the hidden-information, collaborative card game Hanabi. 
We implement a number of rule-based agents, both from the literature and of our own devising, in addition to an Information Set Monte Carlo Tree Search (IS-MCTS) agent. We observe poor results from IS-MCTS, so we construct a new predictor version that uses a model of the agents with which it is paired. We observe a significant improvement in game-playing strength from this agent in comparison to IS-MCTS, resulting from its consideration of what the other agents in a game would do. In addition, we create a flawed rule-based agent to highlight the predictor's capabilities with such an agent.\nMonte Carlo Tree Search techniques have generally dominated General Video Game Playing, but recent research has started looking at Evolutionary Algorithms and their potential for matching the Tree Search level of play or even outperforming these methods. Online or Rolling Horizon Evolution is one of the options available to evolve sequences of actions for planning in General Video Game Playing, but no research has been done to date that explores the capabilities of the vanilla version of this algorithm in multiple games. This study aims to critically analyse the different configurations regarding population size and individual length in a set of 20 games from the General Video Game AI corpus. Distinctions are made between deterministic and stochastic games, and the implications of using larger time budgets are studied. Results show that there is scope for the use of these techniques, which in some configurations outperform Monte Carlo Tree Search, and also suggest that further research on these methods could boost their performance.\nThis paper describes team Turing's submission to SemEval 2017 RumourEval: Determining rumour veracity and support for rumours (SemEval 2017 Task 8, Subtask A). Subtask A addresses the challenge of rumour stance classification, which involves identifying the attitude of Twitter users towards the truthfulness of the rumour they are discussing. 
Stance classification is considered to be an important step towards rumour verification; therefore, performing well in this task is expected to be useful in debunking false rumours. In this work we classify a set of Twitter posts discussing rumours into either supporting, denying, questioning or commenting on the underlying rumours. We propose an LSTM-based sequential model that, through modelling the conversational structure of tweets, achieves an accuracy of 0.784 on the RumourEval test set, outperforming all other systems in Subtask A.\nTo bridge the gap between humans and machines in image understanding and describing, we need further insight into how people describe a perceived scene. In this paper, we study the agreement between bottom-up saliency-based visual attention and object referrals in scene description constructs. We investigate the properties of human-written descriptions and machine-generated ones. We then propose a saliency-boosted image captioning model in order to investigate benefits from low-level cues in language models. We learn that (1) humans mention more salient objects earlier than less salient ones in their descriptions, (2) the better a captioning model performs, the better attention agreement it has with human descriptions, (3) the proposed saliency-boosted model, compared to its baseline form, does not improve significantly on the MS COCO database, indicating explicit bottom-up boosting does not help when the task is well learnt and tuned on the data, (4) a better generalization is, however, observed for the saliency-boosted model on unseen data.\nVideo captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a challenge, especially given the lack of sufficient annotated data. 
We improve video captioning by sharing knowledge with two related directed-generation tasks: a temporally-directed unsupervised video prediction task to learn richer context-aware video encoder representations, and a logically-directed language entailment generation task to learn better video-entailed caption decoder representations. For this, we present a many-to-many multi-task learning model that shares parameters across the encoders and decoders of the three tasks. We achieve significant improvements and the new state-of-the-art on several standard video captioning datasets using diverse automatic and human evaluations. We also show mutual multi-task improvements on the entailment generation task.\nTo date, developing a good model for early intensive care unit (ICU) mortality prediction is still challenging. This paper presents a patient-based predictive modeling framework (PPMF) to improve the performance of ICU mortality prediction using data collected during the first 48 hours of ICU admission. PPMF consists of three main components verifying three related research hypotheses. The first component captures dynamic changes of patients' status in the ICU using their time series data (e.g., vital signs and laboratory tests). The second component is a local approximation algorithm that classifies patients based on their similarities. The third component is a Gradient Descent wrapper that updates feature weights according to the classification feedback. Experiments using data from MIMIC-III show that PPMF significantly outperforms: (1) the severity score systems, namely SAPS III, APACHE IV, and MPM0III, (2) the aggregation-based classifiers that utilize summarized time series, and (3) baseline feature selection methods.\nThere is a wide gap between symbolic reasoning and deep learning. In this research, we explore the possibility of using deep learning to improve symbolic reasoning. 
Briefly, in a reasoning system, a deep feedforward neural network is used to guide rewriting processes after learning from algebraic reasoning examples produced by humans. To enable the neural network to recognise patterns of algebraic expressions with non-deterministic sizes, reduced partial trees are used to represent the expressions. Also, to represent both top-down and bottom-up information of the expressions, a centralisation technique is used to improve the reduced partial trees. In addition, symbolic association vectors and rule application records are used to improve the rewriting processes. Experimental results reveal that the algebraic reasoning examples can be accurately learnt only if the feedforward neural network has enough hidden layers. Also, the centralisation technique, the symbolic association vectors and the rule application records can reduce error rates of reasoning. In particular, the above approaches have led to a 4.6% error rate of reasoning on a dataset of linear equations, differentials and integrals.\nPath planning for multiple robots is well studied in the AI and robotics communities. For a given discretized environment, robots need to find collision-free paths to a set of specified goal locations. Robots can be fully anonymous, non-anonymous, or organized in groups. Although powerful solvers for this abstract problem exist, they make simplifying assumptions by ignoring kinematic constraints, making it difficult to use the resulting plans on actual robots. In this paper, we present a solution which takes kinematic constraints, such as maximum velocities, into account, while guaranteeing a user-specified minimum safety distance between robots. We demonstrate our approach in simulation and on real robots in 2D and 3D environments.\nIn emotion recognition, it is difficult to recognize humans' emotional states using just a single modality. Moreover, the annotation of physiological emotional data is particularly expensive. 
These two aspects make building an effective emotion recognition model challenging. In this paper, we first build a multi-view deep generative model to simulate the generative process of multi-modality emotional data. By imposing a mixture-of-Gaussians assumption on the posterior approximation of the latent variables, our model can learn the shared deep representation from multiple modalities. To solve the labeled-data-scarcity problem, we further extend our multi-view model to the semi-supervised learning scenario by casting the semi-supervised classification problem as a specialized missing data imputation task. Our semi-supervised multi-view deep generative framework can leverage both labeled and unlabeled data from multiple modalities, where the weight factor for each modality can be learned automatically. Compared with previous emotion recognition methods, our method is more robust and flexible. The experiments conducted on two real multi-modal emotion datasets have demonstrated the superiority of our framework over a number of competitors.\nThis work introduces a method to tune a sequence-based generative model for molecular de novo design that through augmented episodic likelihood can learn to generate structures with certain specified desirable properties. We demonstrate how this model can execute a range of tasks such as generating analogues to a query structure and generating compounds predicted to be active against a biological target. As a proof of principle, the model is first trained to generate molecules that do not contain sulphur. As a second example, the model is trained to generate analogues to the drug Celecoxib, a technique that could be used for scaffold hopping or library expansion starting from a single molecule. 
Finally, when tuning the model towards generating compounds predicted to be active against the dopamine receptor type 2, the model generates structures of which more than 95% are predicted to be active, including experimentally confirmed actives that have not been included in either the generative model or the activity prediction model.\nWe propose a novel, semi-supervised approach towards domain taxonomy induction from an input vocabulary of seed terms. Unlike all previous approaches, which typically extract direct hypernym edges for terms, our approach utilizes a novel probabilistic framework to extract hypernym subsequences. Taxonomy induction from extracted subsequences is cast as an instance of the minimum-cost flow problem on a carefully designed directed graph. Through experiments, we demonstrate that our approach outperforms state-of-the-art taxonomy induction approaches across four languages. Importantly, we also show that our approach is robust to the presence of noise in the input vocabulary. To the best of our knowledge, no previous approaches have been empirically proven to manifest noise-robustness in the input vocabulary.\nVehicle climate control systems aim to keep passengers thermally comfortable. However, current systems control temperature rather than thermal comfort and tend to be energy-hungry, which is of particular concern when considering electric vehicles. This paper poses energy-efficient vehicle comfort control as a Markov Decision Process, which is then solved numerically using Sarsa({\\lambda}) and an empirically validated, single-zone, 1D thermal model of the cabin. The resulting controller was tested in simulation using 200 randomly selected scenarios and found to exceed the performance of bang-bang, proportional, simple fuzzy logic, and commercial controllers by 23%, 43%, 40%, and 56%, respectively. 
Compared to the next-best-performing controller, energy consumption is reduced by 13% while the proportion of time spent thermally comfortable is increased by 23%. These results indicate that this is a viable approach that promises to translate into substantial comfort and energy improvements in the car.\nOur goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available: examples are labeled with the correct execution result, but not the program itself. Consequently, we must search the space of programs for those that output the correct result, while not being misled by spurious programs: incorrect programs that coincidentally output the correct result. We connect two common learning paradigms, reinforcement learning (RL) and maximum marginal likelihood (MML), and then present a new learning algorithm that combines the strengths of both. The new algorithm guards against spurious programs by combining the systematic search traditionally employed in MML with the randomized exploration of RL, and by updating parameters such that probability is spread more evenly across consistent programs. We apply our learning algorithm to a new neural semantic parser and show significant gains over existing state-of-the-art results on a recent context-dependent semantic parsing task.\nThe $L_1$-regularized models are widely used for sparse regression or classification tasks. In this paper, we propose the orthant-wise passive descent algorithm (OPDA) for optimizing $L_1$-regularized models, as an improved substitute for proximal algorithms, which are the standard tools for optimizing the models nowadays. OPDA uses a stochastic variance-reduced gradient (SVRG) to initialize the descent direction, then applies a novel alignment operator to encourage each element to keep the same sign after one iteration of update, so the parameter remains in the same orthant as before. 
It also explicitly suppresses the magnitude of each element to impose sparsity. The quasi-Newton update can be utilized to incorporate curvature information and accelerate convergence. We prove a linear convergence rate for OPDA on general smooth and strongly convex loss functions. By conducting experiments on $L_1$-regularized logistic regression and convolutional neural networks, we show that OPDA outperforms state-of-the-art stochastic proximal algorithms, implying a wide range of applications in training sparse models.\nWhile the optimization problem behind deep neural networks is highly non-convex, it is frequently observed in practice that training deep networks seems possible without getting stuck in suboptimal points. It has been argued that this is the case as all local minima are close to being globally optimal. We show that this is (almost) true: in fact, almost all local minima are globally optimal, for a fully connected network with squared loss and analytic activation function given that the number of hidden units of one layer of the network is larger than the number of training points and the network structure from this layer on is pyramidal.\nWe propose a development of the Analytic Hierarchy Process (AHP) that allows the methodology to be used also in decision problems with a very large number of alternatives evaluated with respect to several criteria. While the application of the original AHP method involves many pairwise comparisons between alternatives and criteria, our proposal is composed of four steps: (i) direct evaluation of the alternatives at hand on the considered criteria; (ii) selection of some reference evaluations; (iii) application of the original AHP method to reference evaluations; (iv) revision of the direct evaluation on the basis of the prioritization supplied by AHP on reference evaluations. The new proposal has been tested and validated in an experiment conducted on a sample of university students. 
The new methodology has therefore been applied to a real-world problem involving the evaluation of 21 Social Housing initiatives sited in the Piedmont region (Italy). To take into account interaction between criteria, the Choquet integral preference model has been considered within a Non-Additive Robust Ordinal Regression approach.\nThis paper introduces a generalization of Convolutional Neural Networks (CNNs) from low-dimensional grid data, such as images, to graph-structured data. We propose a novel spatial convolution utilizing a random walk to uncover the relations within the input, analogous to the way the standard convolution uses the spatial neighborhood of a pixel on the grid. The convolution has an intuitive interpretation, is efficient and scalable and can also be used on data with varying graph structure. Furthermore, this generalization can be applied to many standard regression or classification problems, by learning the underlying graph. We empirically demonstrate the performance of the proposed CNN on MNIST, and challenge the state-of-the-art on the Merck molecular activity data set.\nDespite the recent success of deep-learning-based semantic segmentation, deploying a pre-trained road scene segmenter to a city whose images are not present in the training set would not achieve satisfactory performance due to dataset biases. Instead of collecting a large number of annotated images of each city of interest to train or refine the segmenter, we propose an unsupervised learning approach to adapt road scene segmenters across different cities. By utilizing Google Street View and its time-machine feature, we can collect unannotated images for each road scene at different times, so that the associated static-object priors can be extracted accordingly. By advancing a joint global and class-specific domain adversarial learning framework, adaptation of pre-trained segmenters to that city can be achieved without the need for any user annotation or interaction. 
We show that our method improves the performance of semantic segmentation in multiple cities across continents, while it performs favorably against state-of-the-art approaches requiring annotated training data.\nOne of the most challenging tasks when adopting Bayesian Networks (BNs) is learning their structure from data. This task is complicated by the huge search space of possible solutions and is a well-known NP-hard problem; hence, approximations are required. However, to the best of our knowledge, a quantitative analysis of the performance and characteristics of the different heuristics to solve this problem has never been done before. For this reason, in this work, we provide a detailed study of the different state-of-the-art methods for structural learning on simulated data considering both BNs with discrete and continuous variables, and with different rates of noise in the data. In particular, we investigate the characteristics of different widespread scores proposed for the inference and the statistical pitfalls within them.\nWe introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1. Parseval networks are empirically and theoretically motivated by an analysis of the robustness of the predictions made by deep neural networks when their input is subject to an adversarial perturbation. The most important feature of Parseval networks is to maintain the weight matrices of linear and convolutional layers as (approximately) Parseval tight frames, which are extensions of orthogonal matrices to non-square matrices. We describe how these constraints can be maintained efficiently during SGD. We show that Parseval networks match the state-of-the-art in terms of accuracy on CIFAR-10/100 and Street View House Numbers (SVHN) while being more robust than their vanilla counterpart against adversarial examples. 
Incidentally, Parseval networks also tend to train faster and make better use of the full capacity of the networks.\nWe present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., in a corpus that contains an order of magnitude more languages than parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting, to the best of our knowledge, the largest crosslingual computational study performed to date. We extend existing methodology for leveraging parallel corpora for typological analysis by overcoming a limiting assumption of earlier work: We only require that a linguistic feature is overtly marked in a few of the thousands of languages as opposed to requiring that it be marked in all languages under investigation.\nNeural conversational models require substantial amounts of dialogue data for their parameter estimation and are therefore usually learned on large corpora such as chat forums or movie subtitles. These corpora are, however, often challenging to work with, notably due to their frequent lack of turn segmentation and the presence of multiple references external to the dialogue itself. This paper shows that these challenges can be mitigated by adding a weighting model into the architecture. The weighting model, which is itself estimated from dialogue data, associates each training example with a numerical weight that reflects its intrinsic quality for dialogue modelling. At training time, these sample weights are included in the empirical loss to be minimised. Evaluation results on retrieval-based models trained on movie and TV subtitles demonstrate that the inclusion of such a weighting model improves the model performance on unsupervised metrics.\nWe study automatic question generation for sentences from text passages in reading comprehension. 
We introduce an attention-based sequence learning model for the task and investigate the effect of encoding sentence- vs. paragraph-level information. In contrast to all previous work, our model does not rely on hand-crafted rules or a sophisticated NLP pipeline; it is instead trainable end-to-end via sequence-to-sequence learning. Automatic evaluation results show that our system significantly outperforms the state-of-the-art rule-based system. In human evaluations, questions generated by our system are also rated as being more natural (i.e., grammaticality, fluency) and as more difficult to answer (in terms of syntactic and lexical divergence from the original text and reasoning needed to answer).\nIn this paper we show how the defense relation among abstract arguments can be used to encode the reasons for accepting arguments. After introducing a novel notion of defenses and defense graphs, we propose a defense semantics together with a new notion of defense equivalence of argument graphs, and compare defense equivalence with standard equivalence and strong equivalence, respectively. Then, based on defense semantics, we define two kinds of reasons for accepting arguments, i.e., direct reasons and root reasons, and a notion of root equivalence of argument graphs. Finally, we show how the notion of root equivalence can be used in argumentation summarization.\nOpen world games present players with more freedom than games with linear progression structures. However, without clearly-defined objectives, they often leave players without a sense of purpose. Most of the time, quests and objectives are hand-authored and overlaid atop an open world's mechanics. But what if they could be generated organically from the gameplay itself? The goal of our project was to develop a model of the mechanics in Minecraft that could be used to determine the ideal placement of objectives in an open world setting. 
We formalized the game logic of Minecraft in terms of logical rules that can be manipulated in two ways: they may be executed to generate graphs representative of the player experience when playing an open world game with little developer direction; and they may be statically analyzed to determine dependency orderings, feedback loops, and bottlenecks. These analyses may then be used to place achievements on gameplay actions algorithmically.\nSemi-supervised learning plays an important role in large-scale machine learning. Properly using additional unlabeled data (largely available nowadays) often can improve the machine learning accuracy. However, if the machine learning model is misspecified for the underlying true data distribution, the model performance could be seriously jeopardized. This issue is known as model misspecification. To address this issue, we focus on generative models and propose a criterion to detect the onset of model misspecification by measuring the performance difference between models obtained using supervised and semi-supervised learning. Then, we propose to automatically modify the generative models during model training to achieve an unbiased generative model. Rigorous experiments were carried out to evaluate the proposed method using two image classification data sets PASCAL VOC'07 and MIR Flickr. Our proposed method has been demonstrated to outperform a number of state-of-the-art semi-supervised learning approaches for the classification task.\nWe propose a software architecture designed to ease the implementation of dialogue systems. The Modular Architecture for Conversational Agents (MACA) uses a plug-n-play style that allows quick prototyping, thereby facilitating the development of new techniques and the reproduction of previous work. The architecture separates the domain of the conversation from the agent's dialogue strategy, and as such can be easily extended to multiple domains. 
MACA provides tools to host dialogue agents on Amazon Mechanical Turk (mTurk) for data collection and allows processing of other sources of training data. The current version of the framework already incorporates several domains and existing dialogue strategies from the recent literature.\nThe increase in connectivity and its impact on everyday life are raising new and existing security problems that are becoming important for social good. We introduce two particular problems: cyber attack attribution and regulatory data sharing. For both problems, decisions about which rules to apply should be taken under incomplete and context-dependent information. The solution we propose is based on argumentation reasoning, which is a well-suited technique for implementing decision-making mechanisms under conflicting and incomplete information. Our proposal permits us to identify the attacker of a cyber attack and decide the regulation rule that should be used while using and sharing data. We illustrate our solution through concrete examples.\nLogical theories have been developed which have allowed temporal reasoning about eventualities (à la Galton) such as states, processes, actions, events and complex eventualities such as sequences and recurrences of other eventualities. This paper presents the problem of coincidence within the framework of a first-order logical theory formalising temporal multiple recurrence of two sequences of fixed-duration eventualities and presents a solution to it. The coincidence problem is described as: if two complex eventualities (or eventuality sequences) consisting respectively of component eventualities x0, x1, ..., xr and y0, y1, ..., ys both recur over an interval k and all eventualities are of fixed durations, is there a sub-interval of k over which the incidences xt and yu, for t between 0..r and u between 0..s, coincide? 
The solution presented here formalises the intuition that a solution can be found by temporal projection over a cycle of the multiple recurrence of both sequences.\nIt is not rare that the performance of one metaheuristic algorithm can be improved by incorporating ideas taken from another. In this article we present how Simulated Annealing (SA) can be used to improve the efficiency of the Ant Colony System (ACS) and Enhanced ACS when solving the Sequential Ordering Problem (SOP). Moreover, we show how the very same ideas can be applied to improve the convergence of a dedicated local search, i.e. the SOP-3-exchange algorithm. A statistical analysis of the proposed algorithms both in terms of finding suitable parameter values and the quality of the generated solutions is presented based on a series of computational experiments conducted on SOP instances from the well-known TSPLIB and SOPLIB2006 repositories. The proposed ACS-SA and EACS-SA algorithms often generate solutions of better quality than the ACS and EACS, respectively. Moreover, the EACS-SA algorithm combined with the proposed SOP-3-exchange-SA local search was able to find 10 new best solutions for the SOP instances from the SOPLIB2006 repository, thus improving the state-of-the-art results as known from the literature. Overall, the best known or improved solutions were found in 41 out of 48 cases.\nProviding an efficient strategy to navigate safely through unsignaled intersections is a difficult task that requires determining the intent of other drivers. We explore the effectiveness of Deep Reinforcement Learning to handle intersection problems. Using recent advances in Deep RL, we are able to learn policies that surpass the performance of a commonly-used heuristic approach in several metrics including task completion time and goal success rate and have limited ability to generalize. We then explore a system's ability to learn active sensing behaviors to enable navigating safely in the case of occlusions. 
Our analysis provides insight into the intersection handling problem; the solutions learned by the network point out several shortcomings of current rule-based methods, and the failures of our current deep reinforcement learning system point to future research directions.\nCognitive arithmetic studies the mental processes used in solving math problems. This area of research explores the retrieval mechanisms and strategies used by people during a common cognitive task. Past research has shown that human performance in arithmetic operations is correlated with the numerical size of the problem. Past research on cognitive arithmetic has attributed this trend to either retrieval strength, error checking, or strategy-based approaches when solving equations. This paper describes a rule-based computational model that performs the four major arithmetic operations (addition, subtraction, multiplication and division) on two operands. We then evaluated our model to probe its validity in representing the prevailing concepts observed in psychology experiments from the related works. The experiments specifically explore the problem size effect, an activation-based model for fact retrieval, backup strategies when retrieval fails, and finally optimization strategies when faced with large operands. From our experimental results, we concluded that our model's response times were comparable to results observed when people performed similar tasks during psychology experiments. The fit of our model in reproducing these results and incorporating accuracy into our model are discussed.\nThe state-of-the-art online learning approaches are only capable of learning the metric for predefined tasks. In this paper, we consider the lifelong learning problem to mimic \"human learning\", i.e., endowing a new capability to the learned metric for a new task from new online samples and incorporating previous experiences and knowledge. 
Therefore, we propose a new metric learning framework: lifelong metric learning (LML), which only utilizes the data of the new task to train the metric model while preserving the original capabilities. More specifically, the proposed LML maintains a common subspace for all learned metrics, named the lifelong dictionary, transfers knowledge from the common subspace to each new metric task with task-specific idiosyncrasy, and redefines the common subspace over time to maximize performance across all metric tasks. For model optimization, we apply the online passive-aggressive optimization algorithm to solve the proposed LML framework, where the lifelong dictionary and task-specific partition are optimized alternately and consecutively. Finally, we evaluate our approach by analyzing several multi-task metric learning datasets. Extensive experimental results demonstrate the effectiveness and efficiency of the proposed framework.\nWe present an approach for the verification of feed-forward neural networks in which all nodes have a piece-wise linear activation function. Such networks are often used in deep learning and have been shown to be hard to verify for modern satisfiability modulo theory (SMT) and integer linear programming (ILP) solvers. The starting point of our approach is the addition of a global linear approximation of the overall network behavior to the verification problem that helps with SMT-like reasoning over the network behavior. We present a specialized verification algorithm that employs this approximation in a search process in which it infers additional node phases for the non-linear nodes in the network from partial node phase assignments, similar to unit propagation in classical SAT solving. We also show how to infer additional conflict clauses and safe node fixtures from the results of the analysis steps performed during the search. 
The resulting approach is evaluated on collision avoidance and handwritten digit recognition case studies.\nNon-stationary domains, where unforeseen changes happen, present a challenge for agents to find an optimal policy for a sequential decision making problem. This work investigates a solution to this problem that combines Markov Decision Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to find the possible trajectories of an MDP, from where Reinforcement Learning is applied to learn the optimal policy of the problem. Results show that ASP(RL) is capable of efficiently finding the optimal solution of an MDP representing non-stationary domains.\nThe postulate of independence of cause and mechanism (ICM) has recently led to several new causal discovery algorithms. The interpretation of independence and the way it is utilized, however, varies across these methods. Our aim in this paper is to propose a group theoretic framework for ICM to unify and generalize these approaches. In our setting, the cause-mechanism relationship is assessed by comparing it against a null hypothesis through the application of random generic group transformations. We show that the group theoretic view provides a very general tool to study the structure of data generating mechanisms with direct applications to machine learning.\nApplication of models to data is fraught. Data-generating collaborators often only have a very basic understanding of the complications of collating, processing and curating data. Challenges include: poor data collection practices, missing values, inconvenient storage mechanisms, intellectual property, security and privacy. All these aspects obstruct the sharing and interconnection of data, and the eventual interpretation of data through machine learning or other approaches. 
In project reporting, a major challenge is in encapsulating these problems and enabling goals to be built around the processing of data. Project overruns can occur due to failure to account for the amount of time required to curate and collate. But to understand these failures we need to have a common language for assessing the readiness of a particular data set. This position paper proposes the use of data readiness levels: it gives a rough outline of three stages of data preparedness and speculates on how formalisation of these levels into a common language for data readiness could facilitate project management.\nLarge-scale multi-relational embedding refers to the task of learning the latent representations for entities and relations in large knowledge graphs. An effective and scalable solution for this problem is crucial for the true success of knowledge-based inference in a broad range of applications. This paper proposes a novel framework for optimizing the latent representations with respect to the \\textit{analogical} properties of the embedded entities and relations. By formulating the learning objective in a differentiable fashion, our model enjoys both theoretical power and computational scalability, and significantly outperformed a large number of representative baseline methods on benchmark datasets. Furthermore, the model offers an elegant unification of several well-known methods in multi-relational embedding, which can be proven to be special instantiations of our framework.\nThis article describes their biopolitical implications for design from psychological, cultural, legal, functional and aesthetic/perceptive ways, in the framework of Hyperconnectivity: the condition according to which person-to-person, person-to-machine and machine-to-machine communication progressively shift to networked and digital means. 
A definition is given for the terms \"interface biopolitics\" and \"data biopolitics\", as well as evidence supporting these definitions and a description of the technological, theoretical and practice-based innovations bringing them into meaningful existence. Interfaces, algorithms, artificial intelligences of various types, the quantified-self trend and the concept of \"information bubbles\" will be examined in terms of interface and data biopolitics, from the point of view of design, and for their implications in terms of freedoms, transparency, justice and accessibility to human rights. A working hypothesis is described for technologically relevant design practices and education processes, in order to confront these issues in critical, ethical and inclusive ways.\nOnline health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs --- this being one of the problems where large-scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information.\nWe propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. 
While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm running through epochs; in each epoch we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the epoch, an optimization oracle returns the optimal memoryless planning policy which maximizes the expected reward based on the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling with respect to the dimensionality of observation and action spaces.\nHow to handle uncertainty in medical diagnosis is an open issue. In this paper, a new decision-making methodology based on Z-numbers is presented. Firstly, the experts' opinions are represented by Z-numbers. A Z-number is an ordered pair of fuzzy numbers denoted as Z = (A, B). Then, a new method for ranking fuzzy numbers is proposed. Based on the proposed fuzzy number ranking method, a novel method is presented to transform the Z-numbers into Basic Probability Assignments (BPAs). As a result, the information from different sources is combined by Dempster's combination rule. The final decision making is more reasonable due to the advantage of information fusion. Finally, two experiments, risk analysis and medical diagnosis, are illustrated to show the efficiency of the proposed methodology.\nUnderstanding and discovering knowledge from GPS (Global Positioning System) traces of human activities is an essential topic in mobility-based urban computing. We propose TrajectoryNet, a neural network architecture for point-based trajectory classification to infer real-world human transportation modes from GPS traces. 
To overcome the challenge of capturing the underlying latent factors in the low-dimensional and heterogeneous feature space imposed by GPS data, we develop a novel representation that embeds the original feature space into another space that can be understood as a form of basis expansion. We also enrich the feature space via segment-based information and use Maxout activations to improve the predictive power of Recurrent Neural Networks (RNNs). We achieve over 98% classification accuracy when detecting four types of transportation modes, outperforming existing models without additional sensory data or location-based prior knowledge.\nOnline reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers' purchasing decisions. However, the proliferation of non-credible reviews -- either fake (promoting/demoting an item), incompetent (involving irrelevant aspects), or biased -- entails the problem of identifying credible reviews. Prior works involve classifiers harnessing rich information about items/users -- which might not be readily available in several domains -- that provide only limited interpretability as to why a review is deemed non-credible. This paper presents a novel approach to address the above issues. We utilize latent topic models leveraging review texts, item ratings, and timestamps to derive consistency features without relying on item/user histories, unavailable for \"long-tail\" items/users. We develop models for computing review credibility scores that provide interpretable evidence for non-credible reviews and are also transferable to other domains, addressing the scarcity of labeled data. Experiments on real-world datasets demonstrate improvements over state-of-the-art baselines.\nOnline review communities are dynamic as users join and leave, adopt new vocabulary, and adapt to evolving trends. 
Recent work has shown that recommender systems benefit from explicit consideration of user experience. However, prior work assumes a fixed number of discrete experience levels, whereas in reality users gain experience and mature continuously over time. This paper presents a new model that captures the continuous evolution of user experience, and the resulting language model in reviews and other posts. Our model is unsupervised and combines principles of Geometric Brownian Motion, Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal progression of user experience and language model, respectively. We develop practical algorithms for estimating the model parameters from data and for inference with our model (e.g., to recommend items). Extensive experiments with five real-world datasets show that our model not only fits data better than discrete-model baselines, but also outperforms state-of-the-art methods for predicting item ratings.\nWearable computing is one of the fastest-growing technologies today. Smart watches are poised to take over at least half of the wearable device market in the near future. Smart watch screen size, however, is a limiting factor for growth, as it restricts practical text input. On the other hand, wearable devices have some features, such as consistent user interaction and hands-free, heads-up operations, which pave the way for gesture recognition methods of text entry. This paper proposes a new text input method for smart watches, which utilizes motion sensor data and machine learning approaches to detect letters written in the air by a user. This method is less computationally intensive and less expensive when compared to computer vision approaches. It is also not affected by lighting factors, which limit computer vision solutions. The AirDraw system prototype developed to test this approach is presented. 
Additionally, experimental results showing close to 71% accuracy are presented.\nConsumers often react expressively to products such as food samples, perfume, jewelry, sunglasses, and clothing accessories. This research discusses a multimodal affect recognition system developed to classify whether a consumer likes or dislikes a product tested at a counter or kiosk, by analyzing the consumer's facial expression, body posture, hand gestures, and voice after testing the product. A depth-capable camera and microphone system (Kinect for Windows) is utilized. An emotion identification engine has been developed to analyze the images and voice to determine the affective state of the customer. The image is segmented using skin color and adaptive thresholding. Face, body and hands are detected using the Haar cascade classifier. Canny edges are identified and the lip, body and hand contours are extracted using spatial filtering. Edge count and orientation around the mouth, cheeks, eyes, shoulders, fingers and the location of the edges are used as features. Classification is performed using an emotion template mapping algorithm and a classifier trained with support vector machines. The real-time performance, accuracy and feasibility of multimodal affect recognition in feedback assessment are evaluated.\nWe study the problem of finding a small subset of items that is \\emph{agreeable} to all agents, meaning that all agents value the subset at least as much as its complement. Previous work has shown worst-case bounds, over all instances with a given number of agents and items, on the number of items that may need to be included in such a subset. Our goal in this paper is to efficiently compute an agreeable subset whose size approximates the size of the smallest agreeable subset for a given instance. We consider three well-known models for representing the preferences of the agents: ordinal preferences on single items, the value oracle model, and additive utilities. 
In each of these models, we establish virtually tight bounds on the approximation ratio that can be obtained by algorithms running in polynomial time.\nThe text in natural scene images can contain various kinds of personal information, such as telephone numbers and home addresses. There is a high risk of leaking this information if such images are published. In this paper, we propose a scene text erasing method that hides the information via an inpainting convolutional neural network (CNN) model. The input is a scene text image, and the expected output is a text-erased image in which all character regions are filled with the colors of the surrounding background pixels. This is accomplished by a CNN model with a convolution-to-deconvolution architecture and interconnections. The training samples and the corresponding inpainted images serve as teaching signals for training. To evaluate the text erasing performance, the output images are examined with a novel scene text detection method. The same text detection measurement is then applied to the images in the ICDAR2013 benchmark dataset. Compared with direct text detection, the scene text erasing process produces a drastic decrease in precision, recall and F-score, which demonstrates the effectiveness of the proposed method for erasing text in natural scene images.\nGenerative Adversarial Nets (GANs) represent an important milestone for effective generative models, and have inspired numerous variants seemingly different from each other. One of the main contributions of this paper is to reveal a unified geometric structure in GAN and its variants. Specifically, we show that adversarial generative model training can be decomposed into three geometric steps: separating hyperplane search, discriminator parameter update away from the separating hyperplane, and the generator update along the normal vector direction of the separating hyperplane. 
This geometric intuition reveals the limitations of the existing approaches and leads us to propose a new formulation called geometric GAN using the SVM separating hyperplane that maximizes the margin. Our theoretical analysis shows that the geometric GAN converges to a Nash equilibrium between the discriminator and generator. In addition, extensive numerical results show the superior performance of geometric GAN.\nMachine learning has become pervasive in multiple domains, impacting a wide variety of applications, such as knowledge discovery and data mining, natural language processing, information retrieval, computer vision, social and health informatics, ubiquitous computing, etc. Two essential problems of machine learning are how to generate features and how to acquire labels for machines to learn. In particular, labeling large amounts of data for each domain-specific problem can be very time-consuming and costly. It has become a key obstacle in making learning protocols realistic in applications. In this paper, we will discuss how to use the existing general-purpose world knowledge to enhance machine learning processes, by enriching the features or reducing the labeling work. We start from the comparison of world knowledge with domain-specific knowledge, and then introduce three key problems in using world knowledge in learning processes, i.e., explicit and implicit feature representation, inference for knowledge linking and disambiguation, and learning with direct or indirect supervision. Finally, we discuss the future directions of this research topic.\nIn imperfect-information games, the optimal strategy in a subgame may depend on the strategy in other, unreached subgames. Thus a subgame cannot be solved in isolation and must instead consider the strategy for the entire game as a whole, unlike perfect-information games. Nevertheless, it is possible to first approximate a solution for the whole game and then improve it by solving individual subgames. 
This is referred to as subgame solving. We introduce subgame-solving techniques that outperform prior methods both in theory and practice. We also show how to adapt them, and past subgame-solving techniques, to respond to opponent actions that are outside the original action abstraction; this significantly outperforms the prior state-of-the-art approach, action translation. Finally, we show that subgame solving can be repeated as the game progresses down the game tree, leading to far lower exploitability. These techniques were a key component of Libratus, the first AI to defeat top humans in heads-up no-limit Texas hold'em poker.\nWord and phrase tables are key inputs to machine translation, but costly to produce. New unsupervised learning methods represent words and phrases in a high-dimensional vector space, and these monolingual embeddings have been shown to encode syntactic and semantic relationships between language elements. The information captured by these embeddings can be exploited for bilingual translation by learning a transformation matrix that allows one to match relative positions across two monolingual vector spaces. This method aims to identify high-quality candidates for word and phrase translation more cost-effectively from unlabeled data. This paper expands the scope of previous attempts at bilingual translation to four languages (English, German, Spanish, and French). It shows how to process the source data and train a neural network to learn the high-dimensional embeddings for individual languages, and it expands the framework for testing their quality beyond the English language. Furthermore, it shows how to learn bilingual transformation matrices, obtain candidates for word and phrase translation, and assess their quality.\nThis paper proposes a path planning strategy for an Autonomous Ground Vehicle (AGV) navigating in a partially known environment. 
Global path planning is performed by first extracting from a suitable spatial database the region to be traversed, along with selected attributes such as height data and soil information. The database is processed using a biomimetic swarm algorithm that is inspired by the nest-building strategies followed by termites. Local path planning is performed online using information from various sensors regarding contingencies that affect the safe navigation of the AGV. The simulation discussed has been implemented on the open-source Player-Stage-Gazebo platform.\nIn process mining, precision measures are used to quantify how much a process model overapproximates the behavior seen in an event log. Although several measures have been proposed throughout the years, no research has been done to validate whether these measures achieve the intended aim of quantifying over-approximation in a consistent way for all models and logs. This paper fills this gap by postulating a number of axioms for quantifying precision consistently for any log and any model. Further, we show through counter-examples that none of the existing measures consistently quantifies precision.\nWe propose a logic of asynchronous announcements, where truthful announcements are publicly sent but individually received by agents. In addition to epistemic modalities, the logic therefore contains two types of dynamic modalities, for sending messages and for receiving messages. The semantics defines truth relative to the current state of reception of messages for all agents. This means that knowledge need not be truthful, because some messages may not have been received by the knowing agent. Messages that are announcements may also result in partial synchronization, namely when an agent learns from receiving an announcement that other announcements must already have been received by other agents. 
We give detailed examples of the semantics, and prove several semantic results, including that: after an announcement an agent knows that a proposition is true, if and only if on condition of the truth of that announcement, the agent knows that after that announcement and after any number of other agents also receiving it, the proposition is true. We show that on multi-agent epistemic models, each formula in asynchronous announcement logic is equivalent to a formula in epistemic logic.\nSpoken Language Understanding (SLU), which parses user utterances into semantic frame representations, is a key component of goal-oriented dialogue systems. Traditionally, SLU does not utilize the dialogue history beyond the previous system turn, and contextual ambiguities are resolved by the downstream components. In this paper, we explore novel approaches for modeling dialogue context in a recurrent neural network (RNN) based language understanding system. We propose the Sequential Dialogue Encoder Network, which allows encoding context from the dialogue history in chronological order. We compare the performance of our proposed architecture with two context models, one that uses just the previous turn context and another that encodes dialogue context in a memory network, but loses the order of utterances in the dialogue history. Experiments with a multi-domain dialogue dataset demonstrate that the proposed architecture results in reduced semantic frame error rates.\nIn most common settings of Markov Decision Processes (MDPs), an agent evaluates a policy based on the expectation of the (discounted) sum of rewards. 
However, in many applications this criterion might not be suitable, for two reasons: first, in risk-averse situations the expectation of accumulated rewards is not robust enough, particularly when the distribution of accumulated reward is heavily skewed; second, many applications naturally take several objectives into consideration when evaluating a policy; for instance, in autonomous driving an agent needs to balance speed and safety when choosing an appropriate decision. In this paper, we consider evaluating a policy based on a sequence of quantiles it induces on a set of target states; our idea is to reformulate the original problem as a multi-objective MDP problem with a naturally defined lexicographic preference. To find an optimal policy, we propose an algorithm, \\textbf{FLMDP}, that can solve general multi-objective MDPs with lexicographic reward preferences.\nIt has been shown that Chinese poems can be successfully generated by sequence-to-sequence neural models, particularly with the attention mechanism. A potential problem of this approach, however, is that neural models can only learn abstract rules, while poem generation is a highly creative process that involves not only rules but also innovations for which pure statistical models are not appropriate in principle. This work proposes a memory-augmented neural model for Chinese poem generation, where the neural model and the augmented memory work together to balance the requirements of linguistic accordance and aesthetic innovation, leading to innovative generations that are still rule-compliant. In addition, it is found that the memory mechanism provides interesting flexibility that can be used to generate poems with different styles.\nWe consider a novel formulation of the multi-armed bandit model, which we call the contextual bandit with restricted context, where only a limited number of features can be accessed by the learner at every iteration. 
This novel formulation is motivated by different online problems arising in clinical trials, recommender systems and attention modeling. Herein, we adapt the standard multi-armed bandit algorithm known as Thompson Sampling to take advantage of our restricted context setting, and propose two novel algorithms, called Thompson Sampling with Restricted Context (TSRC) and Windows Thompson Sampling with Restricted Context (WTSRC), for handling stationary and nonstationary environments, respectively. Our empirical results demonstrate advantages of the proposed approaches on several real-life datasets.\nVisual question answering (or VQA) is a new and exciting problem that combines natural language processing and computer vision techniques. We present a survey of the various datasets and models that have been used to tackle this task. The first part of the survey details the various datasets for VQA and compares them along some common factors. The second part of this survey details the different approaches for VQA, classified into four types: non-deep learning models, deep learning models without attention, deep learning models with attention, and other models which do not fit into the first three. Finally, we compare the performance of these approaches and provide some directions for future work.\nThis paper explores the use of Answer Set Programming (ASP) in solving Distributed Constraint Optimization Problems (DCOPs). The paper provides the following novel contributions: (1) It shows how one can formulate DCOPs as logic programs; (2) It introduces ASP-DPOP, the first DCOP algorithm that is based on logic programming; (3) It experimentally shows that ASP-DPOP can be up to two orders of magnitude faster than DPOP (its imperative programming counterpart) as well as solve some problems that DPOP fails to solve, due to memory limitations; and (4) It demonstrates the applicability of ASP in a wide array of multi-agent problems currently modeled as DCOPs. 
Under consideration in Theory and Practice of Logic Programming (TPLP).\nCritical node problems involve identifying a subset of critical nodes from an undirected graph whose removal results in optimizing a pre-defined measure over the residual graph. As useful models for a variety of practical applications, these problems are computationally challenging. In this paper, we study the classic critical node problem (CNP) and introduce an effective memetic algorithm for solving CNP. The proposed algorithm combines a double backbone-based crossover operator (to generate promising offspring solutions), a component-based neighborhood search procedure (to find high-quality local optima) and a rank-based pool updating strategy (to guarantee a healthy population). Specifically, the component-based neighborhood search integrates two key techniques, i.e., a two-phase node exchange strategy and a node weighting scheme. The double backbone-based crossover extends the idea of general backbone-based crossovers. Extensive evaluations on 42 synthetic and real-world benchmark instances show that the proposed algorithm discovers 21 new upper bounds and matches 18 previous best-known upper bounds. We also demonstrate the relevance of our algorithm for effectively solving a variant of the classic CNP, called the cardinality-constrained critical node problem. Finally, we investigate the usefulness of each key algorithmic component.\nSolving algebraic word problems requires executing a series of arithmetic operations---a program---to obtain a final answer. However, since programs can be arbitrarily complicated, inducing them directly from question-answer pairs is a formidable challenge. To make this task more feasible, we solve these problems by generating answer rationales, sequences of natural language and human-readable mathematical expressions that derive the final answer through a series of small steps. 
Although rationales do not explicitly specify programs, they provide a scaffolding for their structure via intermediate milestones. To evaluate our approach, we have created a new 100,000-sample dataset of questions, answers and rationales. Experimental results show that indirect supervision of program learning via answer rationales is a promising strategy for inducing arithmetic programs.\nIn this paper we present the first empirical study of the emphatic temporal-difference learning algorithm (ETD), comparing it with conventional temporal-difference learning, in particular, with linear TD(0), on on-policy and off-policy variations of the Mountain Car problem. The initial motivation for developing ETD was that it has good convergence properties under off-policy training (Sutton, Mahmood and White 2016), but it is also a new algorithm for the on-policy case. In both our on-policy and off-policy experiments, we found that each method converged to a characteristic asymptotic level of error, with ETD better than TD(0). TD(0) achieved a still lower error level temporarily before falling back to its higher asymptote, whereas ETD never showed this kind of \"bounce\". In the off-policy case (in which TD(0) is not guaranteed to converge), ETD was significantly slower.\nHumans make complex inferences on faces, ranging from objective properties (gender, ethnicity, expression, age, identity, etc) to subjective judgments (facial attractiveness, trustworthiness, sociability, friendliness, etc). While the objective aspects of face perception have been extensively studied, relatively fewer computational models have been developed for the social impressions of faces. Bridging this gap, we develop a method to predict human impressions of faces in 40 subjective social dimensions, using deep representations from state-of-the-art neural networks. 
We find that model performance grows as the human consensus on a face trait increases, and that model predictions outperform human groups in correlation with human averages. This illustrates the learnability of subjective social perception of faces, especially when there is high human consensus. Our system can be used to decide which photographs from a personal collection will make the best impression. The results are significant for the field of social robotics, demonstrating that robots can learn the subjective judgments defining the underlying fabric of human interaction.\nThe pancake puzzle is a classic optimization problem that has become a standard benchmark for heuristic search algorithms. In this paper, we provide full proofs regarding the local search topology of the gap heuristic for the pancake puzzle. First, we show that in any non-goal state in which there is no move that will decrease the number of gaps, there is a move that will keep the number of gaps constant. We then classify any state in which the number of gaps cannot be decreased in a single action into two groups: those requiring 2 actions to decrease the number of gaps, and those which require 3 actions to decrease the number of gaps.\nExisting person re-identification (re-id) methods rely mostly on either localised or global feature representation alone. This ignores their joint benefit and mutually complementary effects. In this work, we show the advantages of jointly learning local and global features in a Convolutional Neural Network (CNN) by aiming to discover correlated local and global features in different contexts. Specifically, we formulate a method for joint learning of local and global feature selection losses designed to optimise person re-id when using only generic matching metrics such as the L2 distance. 
We design a novel CNN architecture for Jointly Learning Multi-Loss (JLML) of local and global discriminative feature optimisation subject concurrently to the same re-id labelled information. Extensive comparative evaluations demonstrate the advantages of this new JLML model for person re-id over a wide range of state-of-the-art re-id methods on five benchmarks (VIPeR, GRID, CUHK01, CUHK03, Market-1501).\nThe brain's self-monitoring of activities, including internal activities -- a functionality that we refer to as awareness -- has been suggested as a key element of consciousness. Here we investigate whether the presence of an inner-eye-like process (monitor) that supervises the activities of a number of subsystems (operative agents) engaged in the solution of a problem can improve the problem-solving efficiency of the system. The problem is to find the global maximum of an NK fitness landscape and the performance is measured by the time required to find that maximum. The operative agents explore the fitness landscape blindly and the monitor provides them with feedback on the quality (fitness) of the proposed solutions. This feedback is then used by the operative agents to bias their searches towards the fittest regions of the landscape. We find that weak feedback between the monitor and the operative agents improves the performance of the system, regardless of the difficulty of the problem, which is gauged by the number of local maxima in the landscape. 
For easy problems (i.e., landscapes without local maxima), the performance improves monotonically as the feedback strength increases, but for difficult problems, there is an optimal value of the feedback strength beyond which the system performance degrades very rapidly.\nIt has long been assumed that high-dimensional continuous control problems cannot be solved effectively by discretizing individual dimensions of the action space due to the exponentially large number of bins over which policies would have to be learned. In this paper, we draw inspiration from the recent success of sequence-to-sequence models for structured prediction problems to develop policies over discretized spaces. Central to this method is the realization that complex functions over high-dimensional spaces can be modeled by neural networks that use next step prediction. Specifically, we show how Q-values and policies over continuous spaces can be modeled using a next step prediction model over discretized dimensions. With this parameterization, it is possible to both leverage the compositional structure of action spaces during learning, as well as compute maxima over action spaces (approximately). On a simple example task we demonstrate empirically that our method can perform global search, which effectively gets around the local optimization issues that plague DDPG and NAF. We apply the technique to off-policy (Q-learning) methods and show that our method can achieve state-of-the-art results for off-policy methods on several continuous control tasks.\nPenetration testing is a well-established practical concept for the identification of potentially exploitable security weaknesses and an important component of a security audit. Providing a holistic security assessment for networks consisting of several hundred hosts is, however, hardly feasible without some sort of mechanization. 
Mitigation, prioritizing countermeasures subject to a given budget, currently lacks a solid theoretical understanding and is hence more art than science. In this work, we propose the first approach for conducting comprehensive what-if analyses in order to reason about mitigation in a conceptually well-founded manner. To evaluate and compare mitigation strategies, we use simulated penetration testing, i.e., automated attack-finding, based on a network model to which a subset of a given set of mitigation actions, e.g., changes to the network topology, system updates, configuration changes, etc., is applied. We determine optimal combinations that minimize the maximal attacker success (similar to a Stackelberg game), and thus provide a well-founded basis for a holistic mitigation strategy. We show that these what-if analysis models can largely be derived from network scans, public vulnerability databases and manual inspection with various degrees of automation and detail, and we simulate mitigation analysis on networks of different sizes and vulnerability.\nUser opinions expressed in the form of ratings can influence an individual's view of an item. However, the true quality of an item is often obfuscated by user biases, and it is not obvious from the observed ratings how much importance different users place on different aspects of an item. We propose a probabilistic model of the observed aspect ratings to infer (i) each user's aspect bias and (ii) the latent intrinsic quality of an item. We model multi-aspect ratings as ordered discrete data and encode the dependency between different aspects by using a latent Gaussian structure. We handle the Gaussian-Categorical non-conjugacy using a stick-breaking formulation coupled with P\\'{o}lya-Gamma auxiliary variable augmentation for a simple, fully Bayesian inference. 
On two real-world datasets, we demonstrate the predictive ability of our model and its effectiveness in learning explainable user biases to provide insights towards a more reliable product quality estimation.\nMassive public resume data emerging on the WWW indicates individual-related characteristics in terms of profile and career experiences. Resume Analysis (RA) provides opportunities for many applications, such as talent seeking and evaluation. Existing RA studies based on statistical analysis have primarily focused on talent recruitment by identifying explicit attributes. However, they failed to discover the implicit semantic information, i.e., individual career progress patterns and social relations, which are vital to a comprehensive understanding of career development. Moreover, visualizing this information for better human cognition is also challenging. To tackle these issues, we propose a visual analytics system, ResumeVis, to mine and visualize resume data. Firstly, a text-mining-based approach is presented to extract semantic information. Then, a set of visualizations is devised to represent the semantic information from multiple perspectives. Through interactive exploration of ResumeVis by domain experts, the following tasks can be accomplished: to trace individual career trajectories; to mine latent social relations among individuals; and to obtain the full picture of the collective mobility captured in massive numbers of resumes. Case studies with over 2500 online officer resumes demonstrate the effectiveness of our system. We provide a demonstration video.\nIn this paper, we propose a single-agent logic of goal-directed knowing how extending the standard epistemic logic of knowing that with a new knowing how operator. The semantics of the new operator is based on the idea that knowing how to achieve $\\phi$ means that there exists a (uniform) strategy such that the agent knows that it can make sure $\\phi$. 
We give an intuitive axiomatization of our logic and prove the soundness, completeness, and decidability of the logic. The crucial axioms relating knowing that and knowing how illustrate our understanding of knowing how in this setting. This logic can be used in representing both knowledge-that and knowledge-how.\nLocal consistencies stronger than arc consistency have received a lot of attention since the early days of CSP research. However, they have not been widely adopted by CSP solvers. This is because applying such consistencies can sometimes result in considerably smaller search tree sizes and therefore in significant speed-ups, but in other cases the search space reduction may be small, causing severe run-time penalties. Taking advantage of recent advances in parallelization, we propose a novel approach for the application of strong local consistencies (SLCs) that can improve their performance by largely preserving the speed-ups they offer in cases where they are successful, and eliminating the run-time penalties in cases where they are unsuccessful. This approach is presented in the form of two search algorithms. Both algorithms consist of a master search process, which is a typical CSP solver, and a number of slave processes, with each one implementing an SLC method. The first algorithm runs the different SLCs synchronously at each node of the search tree explored in the master process, while the second one can run them asynchronously at different nodes of the search tree. Experimental results demonstrate the benefits of the proposed method.\nWe develop the theory and practice of an approach to modelling and probabilistic inference in causal networks that is suitable when application-specific or analysis-specific constraints should inform such inference or when little or no data for the learning of causal network structure or probability values at nodes are available. 
Constrained Bayesian Networks generalize a Bayesian Network such that probabilities can be symbolic, arithmetic expressions and where the meaning of the network is constrained by finitely many formulas from the theory of the reals. A formal semantics for constrained Bayesian Networks over first-order logic of the reals is given, which enables non-linear and non-convex optimisation algorithms that rely on decision procedures for this logic, and supports the composition of several constrained Bayesian Networks. A non-trivial case study in arms control, where few or no data are available to assess the effectiveness of an arms inspection process, evaluates our approach. An open-access prototype implementation of these foundations and their algorithms uses the SMT solver Z3 as decision procedure, leverages an open-source package for Bayesian inference to symbolic computation, and is evaluated experimentally.\nAlthough learning-based methods have great potential for robotics, one concern is that a robot that updates its parameters might cause large amounts of damage before it learns the optimal policy. We formalize the idea of safe learning in a probabilistic sense by defining an optimization problem: we desire to maximize the expected return while keeping the expected damage below a given safety limit. We study this optimization for the case of a robot manipulator with safety-based torque limits. We would like to ensure that the damage constraint is maintained at every step of the optimization and not just at convergence. To achieve this aim, we introduce a novel method which predicts how modifying the torque limit, as well as how updating the policy parameters, might affect the robot's safety. 
We show through a number of experiments that our approach allows the robot to improve its performance while ensuring that the expected damage constraint is not violated during the learning process.\nThere has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition. Hard attention can offer benefits over soft attention such as decreased computational cost, but training hard attention models can be difficult because of the discrete latent variables they introduce. Previous work used REINFORCE and Q-learning to approach these issues, but those methods can provide high-variance gradient estimates and be slow to train. In this paper, we tackle the problem of learning hard attention for a sequential task using variational inference methods, specifically the recently introduced VIMCO and NVIL. Furthermore, we propose a novel baseline that adapts VIMCO to this setting. We demonstrate our method on a phoneme recognition task in clean and noisy environments and show that our method outperforms REINFORCE, with the difference being greater for a more complicated task.\nUpdate rules for learning in dynamic time warping spaces are based on optimal warping paths between parameter and input time series. In general, optimal warping paths are not unique resulting in adverse effects in theory and practice. Under the assumption of squared error local costs, we show that no two warping paths have identical costs almost everywhere in a measure-theoretic sense. Two direct consequences of this result are: (i) optimal warping paths are unique almost everywhere, and (ii) the set of all pairs of time series with multiple equal-cost warping paths coincides with the union of exponentially many zero sets of quadratic forms. 
One implication of the proposed results is that typical distance-based cost functions such as the k-means objective are differentiable almost everywhere and can be minimized by subgradient methods.\nKnowledge bases (KBs) have attracted increasing attention due to their great success in various areas, such as Web and mobile search. Existing KBs are restricted to objective factual knowledge, such as city population or fruit shape, whereas subjective knowledge, such as 'big city', which is commonly mentioned in Web and mobile queries, has been neglected. Subjective knowledge differs from objective knowledge in that it has no documented or observed ground truth. Instead, the truth depends on people's dominant opinion. Thus, we can use the crowdsourcing technique to get opinions from the crowd. In our work, we propose a system, called crowdsourced subjective knowledge acquisition (CoSKA), for subjective knowledge acquisition powered by crowdsourcing and existing KBs. The acquired knowledge can be used to enrich existing KBs in the subjective dimension, which bridges the gap between existing objective knowledge and subjective queries. The main challenge of CoSKA is the conflict between large-scale knowledge facts and limited crowdsourcing resources. To address this challenge, in this work, we define knowledge inference rules and then select the seed knowledge judiciously for crowdsourcing to maximize the inference power under the resource constraint. Our experimental results on a real knowledge base and crowdsourcing platform verify the effectiveness of the CoSKA system.\nThe availability of large-scale event data with time stamps has given rise to dynamically evolving knowledge graphs that contain temporal information for each edge. Reasoning over time in such dynamic knowledge graphs is not yet well understood. To this end, we present Know-Evolve, a novel deep evolutionary knowledge network that learns non-linearly evolving entity representations over time. 
The occurrence of a fact (edge) is modeled as a multivariate point process whose intensity function is modulated by the score for that fact computed based on the learned entity embeddings. We demonstrate significantly improved performance over various relational learning approaches on two large scale real-world datasets. Further, our method effectively predicts occurrence or recurrence time of a fact which is novel compared to prior reasoning approaches in multi-relational setting.\nThis paper describes a method for identification of the informative variables in the information system with discrete decision variables. It is targeted specifically towards discovery of the variables that are non-informative when considered alone, but are informative when the synergistic interactions between multiple variables are considered. To this end, the mutual entropy of all possible k-tuples of variables with decision variable is computed. Then, for each variable the maximal information gain due to interactions with other variables is obtained. For non-informative variables this quantity conforms to the well known statistical distributions. This allows for discerning truly informative variables from non-informative ones. For demonstration of the approach, the method is applied to several synthetic datasets that involve complex multidimensional interactions between variables. It is capable of identifying most important informative variables, even in the case when the dimensionality of the analysis is smaller than the true dimensionality of the problem. What is more, the high sensitivity of the algorithm allows for detection of the influence of nuisance variables on the response variable.\nLatent features learned by deep learning approaches have proven to be a powerful tool for machine learning. They serve as a data abstraction that makes learning easier by capturing regularities in data explicitly. Their benefits motivated their adaptation to relational learning context. 
In our previous work, we introduced an approach that learns relational latent features by means of clustering instances and their relations. The major drawback of latent representations is that they are often black-box and difficult to interpret. This work addresses this issue and shows that (1) latent features created by clustering are interpretable and capture interesting properties of data; (2) they identify local regions of instances that match well with the label, which partially explains their benefit; and (3) although the number of latent features generated by this approach is large, often many of them are highly redundant and can be removed without hurting performance much.\nThe city has proven to be the most successful form of human agglomeration and provides wide employment opportunities for its dwellers. As advances in robotics and artificial intelligence revive concerns about the impact of automation on jobs, a question looms: How will automation affect employment in cities? Here, we provide a comparative picture of the impact of automation across U.S. urban areas. Small cities will undertake greater adjustments, such as worker displacement and job content substitutions. We demonstrate that large cities exhibit increased occupational and skill specialization due to an increased abundance of managerial and technical professions. These occupations are not easily automatable, and, thus, reduce the potential impact of automation in large cities. Our results pass several robustness checks including potential errors in the estimation of occupational automation and sub-sampling of occupations. Our study provides the first empirical law connecting two societal forces: urban agglomeration and automation's impact on employment.\nBased on Alan Turing's proposition on AI and computing machinery, which shaped Computing as we know it today, the new AI computing machinery should comprise a universal computer and a universal learning machine. 
The latter should understand linear algebra natively to overcome the slowdown of Moore's law. In such a universal learning machine, a computing unit does not need to keep the legacy of a universal computing core. The data can be distributed to the computing units, and the results can be collected from them through Collective Streaming, reminiscent of Collective Communication in Supercomputing. It is not necessary to use a GPU-like deep memory hierarchy, nor a TPU-like fine-grain mesh.\nThe sense of touch, being the earliest sensory system to develop in a human body [1], plays a critical part in our daily interaction with the environment. In order to successfully complete a task, many manipulation interactions require incorporating haptic feedback. However, manually designing a feedback mechanism can be extremely challenging. In this work, we consider manipulation tasks that need to incorporate tactile sensor feedback in order to modify a provided nominal plan. To incorporate partial observation, we present a new framework that models the task as a partially observable Markov decision process (POMDP) and learns an appropriate representation of haptic feedback which can serve as the state for a POMDP model. The model, which is parametrized by deep recurrent neural networks, utilizes variational Bayes methods to optimize the approximate posterior. Finally, we build on deep Q-learning to be able to select the optimal action in each state without access to a simulator. We test our model on a PR2 robot on multiple tasks of turning a knob until it clicks.\nIn this work, we present a methodology that enables an agent to make efficient use of its exploratory actions by autonomously identifying possible objectives in its environment and learning them in parallel. The identification of objectives is achieved using an online and unsupervised adaptive clustering algorithm. The identified objectives are learned (at least partially) in parallel using Q-learning. 
Using a simulated agent and environment, it is shown that the converged or partially converged value function weights resulting from off-policy learning can be used to accumulate knowledge about multiple objectives without any additional exploration. We claim that the proposed approach could be useful in scenarios where the objectives are initially unknown or in real world scenarios where exploration is typically a time and energy intensive process. The implications and possible extensions of this work are also briefly discussed.\nReinforcement learning is a powerful technique to train an agent to perform a task. However, an agent that is trained using reinforcement learning is only capable of achieving the single task that is specified via its reward function. Such an approach does not scale well to settings in which an agent needs to perform a diverse set of tasks, such as navigating to varying positions in a room or moving objects to varying locations. Instead, we propose a method that allows an agent to automatically discover the range of tasks that it is capable of performing. We use a generator network to propose tasks for the agent to try to achieve, specified as goal states. The generator network is optimized using adversarial training to produce tasks that are always at the appropriate level of difficulty for the agent. Our method thus automatically produces a curriculum of tasks for the agent to learn. We show that, by using this framework, an agent can efficiently and automatically learn to perform a wide set of tasks without requiring any prior knowledge of its environment. Our method can also learn to achieve tasks with sparse rewards, which traditionally pose significant challenges.\nIn Machine Learning, the parent set identification problem is to find a set of random variables that best explain selected variable given the data and some predefined scoring function. 
This problem is a critical component of structure learning of Bayesian networks and Markov blanket discovery, and thus has many practical applications, ranging from fraud detection to clinical decision support. In this paper, we introduce a new distributed-memory approach to the exact parent set assignment problem. To achieve scalability, we derive theoretical bounds to constrain the search space when the MDL scoring function is used, and we reorganize the underlying dynamic programming such that the computational density is increased and fine-grain synchronization is eliminated. We then design an efficient realization of our approach on the Apache Spark platform. Through experimental results, we demonstrate that the method maintains strong scalability on a 500-core standalone Spark cluster, and it can be used to efficiently process data sets with 70 variables, far beyond the reach of the currently available solutions.\nWe introduce a package service model where trucks as well as drones can deliver packages. Drones can travel on trucks or fly; but while flying, drones can only carry one package at a time and have to return to a truck to charge after each delivery. We present a heuristic algorithm to solve the problem of finding a good schedule for all drones and trucks. The algorithm is based on two nested local searches; thus, the definition of suitable neighbourhoods of solutions is crucial for the algorithm. Empirical tests show that our algorithm performs significantly better than a natural Greedy algorithm. Moreover, the savings compared to solutions without drones turn out to be substantial, suggesting that delivery systems might considerably benefit from using drones in addition to trucks.\nStatistical Relational Learning (SRL) methods for anomaly detection are introduced via a security-related application. 
Operational requirements for online learning stability are outlined and compared to mathematical definitions as applied to the learning process of a representative SRL method - Bayesian Logic Programs (BLP). Since a formal proof of online stability appears to be impossible, tentative common-sense requirements are formulated and tested by theoretical and experimental analysis of a simple and analytically tractable BLP model. It is found that learning algorithms in initial stages of online learning can lock onto unstable false predictors that nevertheless comply with our tentative stability requirements and thus masquerade as bona fide solutions. The very expressiveness of SRL seems to cause significant stability issues in settings with many variables and scarce data. We conclude that reliable anomaly detection with SRL methods requires monitoring by an overarching framework that may involve a comprehensive context knowledge base or human supervision.\nThe sure thing principle and the law of total probability are basic laws in classical probability theory. A disjunction fallacy leads to the violation of these two classical laws. In this paper, an Evidential Markov (EM) decision-making model based on Dempster-Shafer (D-S) evidence theory and Markov modelling is proposed to address this issue and model the real human decision-making process. In an evidential framework, the states are extended by introducing an uncertain state which represents the hesitance of a decision maker. The classical Markov model, which assumes that a decision has to be certain at any one time, cannot produce the disjunction effect. However, the state is allowed to be uncertain in the EM model before the final decision is made. An extra uncertainty degree parameter is defined by a belief entropy, named Deng entropy, to determine the basic probability assignment of the uncertain state, which is the key to predicting the disjunction effect. 
A classical categorization decision-making experiment is used to illustrate the effectiveness and validity of the EM model. The disjunction effect can be well predicted with fewer free parameters than existing models.\nAs mobile devices have become indispensable in modern life, mobile security is becoming much more important. Traditional password or PIN-like point-of-entry security measures score low on usability and are vulnerable to brute force and other types of attacks. In order to improve mobile security, an adaptive neuro-fuzzy inference system (ANFIS)-based implicit authentication system is proposed in this paper to provide authentication in a continuous and transparent manner. To illustrate the applicability and capability of ANFIS in our implicit authentication system, experiments were conducted on behavioural data collected for up to 12 weeks from different Android users. The ability of the ANFIS-based system to detect an adversary is also tested with scenarios involving an attacker with varying levels of knowledge. The results demonstrate that ANFIS is a feasible and efficient approach for implicit authentication, with an average user recognition rate of 95%. Moreover, the use of an ANFIS-based system for implicit authentication significantly reduces manual tuning and configuration tasks due to its self-learning capability.\nMotivated by the common academic problem of allocating papers to referees for conference reviewing, we propose a novel mechanism for solving the assignment problem when we have a two-sided matching problem with preferences from one side (the agents/reviewers) over the other side (the objects/papers) and both sides have capacity constraints. The assignment problem is a fundamental problem in both computer science and economics with applications in many areas including task and resource allocation. 
We draw inspiration from multi-criteria decision making and voting and use order weighted averages (OWAs) to propose a novel and flexible class of algorithms for the assignment problem. We show an algorithm for finding a $\\Sigma$-OWA assignment in polynomial time, in contrast to the NP-hardness of finding an egalitarian assignment. Inspired by this setting we observe an interesting connection between our model and the classic proportional multi-winner election problem in social choice.\nWe propose an efficient method to estimate the accuracy of classifiers using only unlabeled data. We consider a setting with multiple classification problems where the target classes may be tied together through logical constraints. For example, a set of classes may be mutually exclusive, meaning that a data instance can belong to at most one of them. The proposed method is based on the intuition that: (i) when classifiers agree, they are more likely to be correct, and (ii) when the classifiers make a prediction that violates the constraints, at least one classifier must be making an error. Experiments on four real-world data sets produce accuracy estimates within a few percent of the true accuracy, using solely unlabeled data. Our models also outperform existing state-of-the-art solutions in both estimating accuracies, and combining multiple classifier outputs. The results emphasize the utility of logical constraints in estimating accuracy, thus validating our intuition.\nThe field of Statistical Relational Learning (SRL) is concerned with learning probabilistic models from relational data. Learned SRL models are typically represented using some kind of weighted logical formulas, which make them considerably more interpretable than those obtained by e.g. neural networks. In practice, however, these models are often still difficult to interpret correctly, as they can contain many formulas that interact in non-trivial ways and weights do not always have an intuitive meaning. 
To address this, we propose a new SRL method which uses possibilistic logic to encode relational models. Learned models are then essentially stratified classical theories, which explicitly encode what can be derived with a given level of certainty. Compared to Markov Logic Networks (MLNs), our method is faster and produces considerably more interpretable models.\nOntology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign views over the data to ontology predicates. Motivated by the need for OBDA systems supporting database-style aggregate queries, we propose a bag semantics for OBDA, where duplicate tuples in the views defined by the mappings are retained, as is the case in standard databases. We show that bag semantics makes conjunctive query answering in OBDA coNP-hard in data complexity. To regain tractability, we consider a rather general class of queries and show its rewritability to a generalisation of the relational calculus to bags.\nMany different methods to train deep generative models have been introduced in the past. In this paper, we propose to extend the variational auto-encoder (VAE) framework with a new type of prior which we call \"Variational Mixture of Posteriors\" prior, or VampPrior for short. The VampPrior consists of a mixture distribution (e.g., a mixture of Gaussians) with components given by variational posteriors conditioned on learnable pseudo-inputs. We further extend this prior to a two layer hierarchical model and show that this architecture with a coupled prior and posterior, learns significantly better models. The model also avoids the usual local optima issues related to useless latent dimensions that plague VAEs. 
We provide empirical studies on six datasets, namely, static and binary MNIST, OMNIGLOT, Caltech 101 Silhouettes, Frey Faces and Histopathology patches, and show that applying the hierarchical VampPrior delivers state-of-the-art results on all datasets in the unsupervised permutation-invariant setting, and results that are the best or comparable to SOTA methods for the approach with convolutional networks.\nAction planning using learned and differentiable forward models of the world is a general approach which has a number of desirable properties, including improved sample complexity over model-free RL methods, reuse of learned models across different tasks, and the ability to perform efficient gradient-based optimization in continuous action spaces. However, this approach does not apply straightforwardly when the action space is discrete. In this work, we show that it is in fact possible to effectively perform planning via backprop in discrete action spaces, using a simple parameterization of the action vectors on the simplex combined with input noise when training the forward model. Our experiments show that this approach can match or outperform model-free RL and discrete planning methods on gridworld navigation tasks in terms of performance and/or planning time while using limited environment interactions, and can additionally be used to perform model-based control in a challenging new task where the action space combines discrete and continuous actions. We furthermore propose a policy distillation approach which yields a fast policy network which can be used at inference time, removing the need for an iterative planning procedure.\nWe propose studying GAN training dynamics as regret minimization, which is in contrast to the popular view that there is consistent minimization of a divergence between real and generated distributions. We analyze the convergence of GAN training from this new point of view to understand why mode collapse happens. 
We hypothesize that undesirable local equilibria in this non-convex game are responsible for mode collapse. We observe that these local equilibria often exhibit sharp gradients of the discriminator function around some real data points. We demonstrate that these degenerate local equilibria can be avoided with a gradient penalty scheme called DRAGAN. We show that DRAGAN enables faster training, achieves improved stability with fewer mode collapses, and leads to generator networks with better modeling performance across a variety of architectures and objective functions.\nApproximate probabilistic inference algorithms are central to many fields. Examples include sequential Monte Carlo inference in robotics, variational inference in machine learning, and Markov chain Monte Carlo inference in statistics. A key problem faced by practitioners is measuring the accuracy of an approximate inference algorithm on a specific data set. This paper introduces the auxiliary inference divergence estimator (AIDE), an algorithm for measuring the accuracy of approximate inference algorithms. AIDE is based on the observation that inference algorithms can be treated as probabilistic models and the random variables used within the inference algorithm can be viewed as auxiliary variables. This view leads to a new estimator for the symmetric KL divergence between the approximating distributions of two inference algorithms. The paper illustrates the application of AIDE to algorithms for inference in regression, hidden Markov, and Dirichlet process mixture models. The experiments show that AIDE captures the qualitative behavior of a broad class of inference algorithms and can detect failure modes of inference algorithms that are missed by standard heuristics.\nIn this paper, we extend an attention-based neural machine translation (NMT) model by allowing it to access an entire training set of parallel sentence pairs even after training. The proposed approach consists of two stages. 
In the first stage (the retrieval stage), an off-the-shelf, black-box search engine is used to retrieve a small subset of sentence pairs from a training set given a source sentence. These pairs are further filtered using a fuzzy matching score based on edit distance. In the second stage (the translation stage), a novel translation model, called translation memory enhanced NMT (TM-NMT), seamlessly uses both the source sentence and a set of retrieved sentence pairs to perform the translation. Empirical evaluation on three language pairs (En-Fr, En-De, and En-Es) shows that the proposed approach significantly outperforms the baseline approach and that the improvement is more significant when more relevant sentence pairs are retrieved.\nA number of real world problems in many domains (e.g. sociology, biology, political science and communication networks) can be modeled as dynamic networks with nodes representing entities of interest and edges representing interactions among the entities at different points in time. A common representation for such models is the snapshot model, where a network is defined at logical time-stamps. An important problem under this model is change point detection. In this work we devise an effective and efficient three-step approach for detecting change points in dynamic networks under the snapshot model. Our algorithm achieves up to 9X speedup over the state-of-the-art while improving quality on both synthetic and real world networks.\nWord embeddings improve the performance of NLP systems by revealing the hidden structural relationships between words. Despite their success in many applications, word embeddings have seen very little use in computational social science NLP tasks, presumably due to their reliance on big data, and to a lack of interpretability. I propose a probabilistic model-based word embedding method which can recover interpretable embeddings, without big data. 
The key insight is to leverage mixed membership modeling, in which global representations are shared, but individual entities (i.e. dictionary words) are free to use these representations to uniquely differing degrees. I show how to train the model using a combination of state-of-the-art training techniques for word embeddings and topic models. The experimental results show an improvement in predictive language modeling of up to 63% in MRR over the skip-gram, and demonstrate that the representations are beneficial for supervised learning. I illustrate the interpretability of the models with computational social science case studies on State of the Union addresses and NIPS articles.\nThe stochastic shortest path problem (SSP) is a highly expressive model for probabilistic planning. The computational hardness of SSPs has sparked interest in determinization-based planners that can quickly solve large problems. However, existing methods employ a simplistic approach to determinization. In particular, they ignore the possibility of tailoring the determinization to the specific characteristics of the target domain. In this work we examine this question, by showing that learning a good determinization for a planning domain can be done efficiently and can improve performance. Moreover, we show how to directly incorporate probabilistic reasoning into the planning problem when a good determinization is not sufficient by itself. Based on these insights, we introduce a planner, FF-LAO*, that outperforms state-of-the-art probabilistic planners on several well-known competition benchmarks.\nAnswer Set Programming (ASP) is a powerful modeling formalism for combinatorial problems. However, writing ASP models is not trivial. We propose a novel method, called Sketched Answer Set Programming (SkASP), aiming at supporting the user in resolving this issue. The user writes an ASP program while marking uncertain parts open with question marks. 
In addition, the user provides a number of positive and negative examples of the desired program behaviour. The sketched model is rewritten into another ASP program, which is solved by traditional methods. As a result, the user obtains a functional and reusable ASP program modelling her problem. We evaluate our approach on 21 well known puzzles and combinatorial problems inspired by Karp's 21 NP-complete problems and demonstrate a use-case for a database application based on ASP.\nWe present a novel method for frequentist statistical inference in $M$-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGD-based inference procedure is a first order method, and is well-suited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.\nWe propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations. This result enables us to formalize a number of state-of-the-art entropy-regularized reinforcement learning algorithms as approximate variants of Mirror Descent or Dual Averaging, and thus to argue about the convergence properties of these methods. 
In particular, we show that the exact version of the TRPO algorithm of Schulman et al. (2015) actually converges to the optimal policy, while the entropy-regularized policy gradient methods of Mnih et al. (2016) may fail to converge to a fixed point. Finally, we illustrate empirically the effects of using various regularization techniques on learning performance in a simple reinforcement learning setup.\nWe frame Question Answering (QA) as a Reinforcement Learning task, an approach that we call Active Question Answering. We propose an agent that sits between the user and a black box QA system and learns to reformulate questions to elicit the best possible answers. The agent probes the system with, potentially many, natural language reformulations of an initial question and aggregates the returned evidence to yield the best answer. The reformulation system is trained end-to-end to maximize answer quality using policy gradient. We evaluate on SearchQA, a dataset of complex questions extracted from Jeopardy!. The agent outperforms a state-of-the-art base model, playing the role of the environment, and other benchmarks. We also analyze the language that the agent has learned while interacting with the question answering system. We find that successful question reformulations look quite different from natural language paraphrases. The agent is able to discover non-trivial reformulation strategies that resemble classic information retrieval techniques such as term re-weighting (tf-idf) and stemming.\nUnderstanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. 
In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.\nWe propose a new algorithm for training generative adversarial networks that jointly learns latent codes for both identities (e.g. individual humans) and observations (e.g. specific photographs). By fixing the identity portion of the latent codes, we can generate diverse images of the same subject, and by fixing the observation portion, we can traverse the manifold of subjects while maintaining contingent aspects such as lighting and pose. Our algorithm features a pairwise training scheme in which each sample from the generator consists of two images with a common identity code. Corresponding samples from the real dataset consist of two distinct photographs of the same subject. In order to fool the discriminator, the generator must produce pairs that are photorealistic, distinct, and appear to depict the same individual. We augment both the DCGAN and BEGAN approaches with Siamese discriminators to facilitate pairwise training. 
Experiments with human judges and an off-the-shelf face verification system demonstrate our algorithm's ability to generate convincing, identity-matched photographs.\nRepresentation learning has become an invaluable approach for learning from symbolic data such as text and graphs. However, while complex symbolic datasets often exhibit a latent hierarchical structure, state-of-the-art methods typically learn embeddings in Euclidean vector spaces, which do not account for this property. For this purpose, we introduce a new approach for learning hierarchical representations of symbolic data by embedding them into hyperbolic space -- or more precisely into an n-dimensional Poincar\\'e ball. Due to the underlying hyperbolic geometry, this allows us to learn parsimonious representations of symbolic data by simultaneously capturing hierarchy and similarity. We introduce an efficient algorithm to learn the embeddings based on Riemannian optimization and show experimentally that Poincar\\'e embeddings outperform Euclidean embeddings significantly on data with latent hierarchies, both in terms of representation capacity and in terms of generalization ability.\nThe design and analysis of communication systems typically rely on the development of mathematical models that describe the underlying communication channel, which dictates the relationship between the transmitted and the received signals. However, in some systems, such as molecular communication systems where chemical signals are used for transfer of information, it is not possible to accurately model this relationship. In these scenarios, because of the lack of mathematical channel models, a completely new approach to design and analysis is required. In this work, we focus on one important aspect of communication systems, the detection algorithms, and demonstrate that by borrowing tools from deep learning, it is possible to train detectors that perform well, without any knowledge of the underlying channel models. 
We evaluate these algorithms using experimental data that is collected by a chemical communication platform, where the channel model is unknown and difficult to model analytically. We show that deep learning algorithms perform significantly better than a simple detector that was used in previous works, which also did not assume any knowledge of the channel.\nApplying deep reinforcement learning (RL) to real systems suffers from slow data sampling. We propose an enhanced generative adversarial network (EGAN) to initialize an RL agent in order to achieve faster learning. The EGAN utilizes the relation between states and actions to enhance the quality of data samples generated by a GAN. Pre-training the agent with the EGAN shows a steeper learning curve, with a 20% improvement in training time at the beginning of learning compared to no pre-training, and an improvement of about 5% with smaller variations compared to training with a GAN. For real-time systems with sparse and slow data sampling, the EGAN could be used to speed up the early phases of the training process.\nExplaining and reasoning about processes which underlie observed black-box phenomena enables the discovery of causal mechanisms, derivation of suitable abstract representations and the formulation of more robust predictions. We propose to learn high level functional programs in order to represent abstract models which capture the invariant structure in the observed data. We introduce the $\\pi$-machine (program-induction machine), an architecture able to induce interpretable LISP-like programs from observed data traces. We propose an optimisation procedure for program learning based on backpropagation, gradient descent and A* search. We apply the proposed method to three problems: system identification of dynamical systems, explaining the behaviour of a DQN agent and learning by demonstration in a human-robot interaction scenario. 
Our experimental results show that the $\\pi$-machine can efficiently induce interpretable programs from individual data traces.\nNo real-world reward function is perfect. Sensory errors and software bugs may result in RL agents observing higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent's optimisation, reward corruption can be partially managed under some assumptions.\nLTLf synthesis is the process of finding a strategy that satisfies a linear temporal specification over finite traces. An existing solution to this problem relies on a reduction to a DFA game. In this paper, we propose a symbolic framework for LTLf synthesis based on this technique, by performing the computation over a representation of the DFA as a boolean formula rather than as an explicit graph. This approach enables strategy generation by utilizing the mechanism of boolean synthesis. We implement this symbolic synthesis method in a tool called Syft, and demonstrate by experiments on scalable benchmarks that the symbolic approach scales better than the explicit one.\nThe rise of connected personal devices together with privacy concerns call for machine learning algorithms capable of leveraging the data of a large number of agents to learn personalized models under strong privacy requirements. 
In this paper, we introduce an efficient algorithm to address the above problem in a fully decentralized (peer-to-peer) and asynchronous fashion, with a provable convergence rate. We show how to make the algorithm differentially private to protect against the disclosure of information about the personal datasets, and formally analyze the trade-off between utility and privacy. Our experiments show that our approach dramatically outperforms previous work in the non-private case, and that under privacy constraints, we can significantly improve over models learned in isolation.\nRecent work has shown that state-of-the-art classifiers are quite brittle, in the sense that a small adversarial change to an input that was originally classified correctly with high confidence leads to a wrong classification, again with high confidence. This raises concerns that such classifiers are vulnerable to attacks and calls into question their usage in safety-critical systems. In this paper, we give for the first time formal guarantees on the robustness of a classifier by providing instance-specific lower bounds on the norm of the input manipulation required to change the classifier decision. Based on this analysis, we propose the Cross-Lipschitz regularization functional. We show that using this form of regularization in kernel methods and neural networks improves the robustness of the classifier without any loss in prediction performance.\nWe introduce second-order vector representations of words, induced from nearest neighborhood topological features in pre-trained contextual word embeddings. We then analyze the effects of using second-order embeddings as input features in two deep natural language processing models, for named entity recognition and recognizing textual entailment, as well as a linear model for paraphrase recognition. Surprisingly, we find that nearest neighbor information alone is sufficient to capture most of the performance benefits derived from using pre-trained word embeddings. 
Furthermore, second-order embeddings are able to handle highly heterogeneous data better than first-order representations, though at the cost of some specificity. Additionally, augmenting contextual embeddings with second-order information further improves model performance in some cases. Due to variance in the random initializations of word embeddings, utilizing nearest neighbor features from multiple first-order embedding samples can also contribute to downstream performance gains. Finally, we identify intriguing characteristics of second-order embedding spaces for further research, including much higher density and different semantic interpretations of cosine similarity.\nRandomized experiments have been used to assist decision-making in many areas. They help people select the optimal treatment for the test population with certain statistical guarantee. However, subjects can show significant heterogeneity in response to treatments. The problem of customizing treatment assignment based on subject characteristics is known as uplift modeling, differential response analysis, or personalized treatment learning in literature. A key feature for uplift modeling is that the data is unlabeled. It is impossible to know whether the chosen treatment is optimal for an individual subject because response under alternative treatments is unobserved. This presents a challenge to both the training and the evaluation of uplift models. In this paper we describe how to obtain an unbiased estimate of the key performance metric of an uplift model, the expected response. We present a new uplift algorithm which creates a forest of randomized trees. The trees are built with a splitting criterion designed to directly optimize their uplift performance based on the proposed evaluation method. Both the evaluation method and the algorithm apply to arbitrary number of treatments and general response types. 
Experimental results on synthetic data and industry-provided data show that our algorithm leads to significant performance improvement over other applicable methods.\nSelective classification techniques (also known as reject option) have not yet been considered in the context of deep neural networks (DNNs). These techniques can potentially significantly improve DNNs' prediction performance by trading off coverage. In this paper we propose a method to construct a selective classifier given a trained neural network. Our method allows a user to set a desired risk level. At test time, the classifier rejects instances as needed to guarantee the desired risk (with high probability). Empirical results over CIFAR and ImageNet convincingly demonstrate the viability of our method, which opens up possibilities to operate DNNs in mission-critical applications. For example, using our method an unprecedented 2% error in top-5 ImageNet classification can be guaranteed with probability 99.9%, and almost 60% test coverage.\nThe explosive growth of location-enabled devices coupled with the increasing use of Internet services has led to an increasing awareness of the importance and usage of geospatial information in many applications. The navigation apps (often called Maps) use a variety of available data sources to calculate and predict the travel time as well as several options for routing in public transportation, car or pedestrian modes. This paper evaluates the pedestrian mode of Maps apps in three major smartphone operating systems (Android, iOS and Windows Phone). In the paper, we will show that the Maps apps on iOS, Android and Windows Phone in pedestrian mode predict travel time without learning from the individual's movement profile. In addition, we will exemplify that those apps suffer from a specific data quality issue which relates to the absence of information about location and type of pedestrian crossings. 
Finally, we will illustrate learning from movement profile of individuals using various predictive analytics models to improve the accuracy of travel time estimation.\nA major challenge in designing neural network (NN) systems is to determine the best structure and parameters for the network given the data for the machine learning problem at hand. Examples of parameters are the number of layers and nodes, the learning rates, and the dropout rates. Typically, these parameters are chosen based on heuristic rules and manually fine-tuned, which may be very time-consuming, because evaluating the performance of a single parametrization of the NN may require several hours. This paper addresses the problem of choosing appropriate parameters for the NN by formulating it as a box-constrained mathematical optimization problem, and applying a derivative-free optimization tool that automatically and effectively searches the parameter space. The optimization tool employs a radial basis function model of the objective function (the prediction accuracy of the NN) to accelerate the discovery of configurations yielding high accuracy. Candidate configurations explored by the algorithm are trained to a small number of epochs, and only the most promising candidates receive full training. The performance of the proposed methodology is assessed on benchmark sets and in the context of predicting drug-drug interactions, showing promising results. The optimization tool used in this paper is open-source.\nLarge-scale kernel approximation is an important problem in machine learning research. Approaches using random Fourier features have become increasingly popular [Rahimi and Recht, 2007], where kernel approximation is treated as empirical mean estimation via Monte Carlo (MC) or Quasi-Monte Carlo (QMC) integration [Yang et al., 2014]. A limitation of the current approaches is that all the features receive an equal weight summing to 1. 
In this paper, we propose a novel shrinkage estimator based on the \"Stein effect\", which provides a data-driven weighting strategy for random features and enjoys theoretical justifications in terms of lowering the empirical risk. We further present an efficient randomized algorithm for large-scale applications of the proposed method. Our empirical results on six benchmark data sets demonstrate the advantageous performance of this approach over representative baselines in both kernel approximation and supervised learning tasks.\nReinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied to safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees. Specifically, we extend control-theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down.\nSemi-supervised learning methods using Generative Adversarial Networks (GANs) have shown promising empirical success recently. Most of these methods use a shared discriminator/classifier which discriminates real examples from fake while also predicting the class label. 
Motivated by the ability of the GAN generator to capture the data manifold well, we propose to estimate the tangent space to the data manifold using GANs and employ it to inject invariances into the classifier. In the process, we propose enhancements over existing methods for learning the inverse mapping (i.e., the encoder) which greatly improve the semantic similarity of the reconstructed sample to the input sample. We observe considerable empirical gains in semi-supervised learning over baselines, particularly in the cases when the number of labeled examples is low. We also provide insights into how fake examples influence the semi-supervised learning procedure.\nAdversarial learning of probabilistic models has recently emerged as a promising alternative to maximum likelihood. Implicit models such as generative adversarial networks (GAN) often generate better samples compared to explicit models trained by maximum likelihood. Yet, GANs sidestep the characterization of an explicit density, which makes quantitative evaluations challenging. To bridge this gap, we propose Flow-GANs, a generative adversarial network for which we can perform exact likelihood evaluation, thus supporting both adversarial and maximum likelihood training. When trained adversarially, Flow-GANs generate high-quality samples but attain extremely poor log-likelihood scores, inferior even to a mixture model memorizing the training data; the opposite is true when trained by maximum likelihood. Results on MNIST and CIFAR-10 demonstrate that hybrid training can attain high held-out likelihoods while retaining visual fidelity in the generated samples.\nCooperative multi-agent systems can be naturally used to model many real world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. 
To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.\nTypical reinforcement learning (RL) agents learn to complete tasks specified by reward functions tailored to their domain. As such, the policies they learn do not generalize even to similar domains. To address this issue, we develop a framework through which a deep RL agent learns to generalize policies from smaller, simpler domains to more complex ones using a recurrent attention mechanism. The task is presented to the agent as an image and an instruction specifying the goal. This meta-controller guides the agent towards its goal by designing a sequence of smaller subtasks on the part of the state space within the attention, effectively decomposing it. As a baseline, we consider a setup without attention as well. Our experiments show that the meta-controller learns to create subgoals within the attention.\nWe propose a probabilistic framework for domain adaptation that blends both generative and discriminative modeling in a principled way. 
Under this framework, generative and discriminative models correspond to specific choices of the prior over parameters. This provides us with a very general way to interpolate between generative and discriminative extremes through different choices of priors. By maximizing both the marginal and the conditional log-likelihoods, models derived from this framework can use both labeled instances from the source domain and unlabeled instances from both source and target domains. Under this framework, we show that the popular reconstruction loss of autoencoders corresponds to an upper bound of the negative marginal log-likelihoods of unlabeled instances, where marginal distributions are given by proper kernel density estimations. This provides a way to interpret the empirical success of autoencoders in domain adaptation and semi-supervised learning. We instantiate our framework using neural networks, and build a concrete model, DAuto. Empirically, we demonstrate the effectiveness of DAuto on text, image and speech datasets, showing that it outperforms related competitors when domain adaptation is possible.\nIncremental methods for structure learning of pairwise Markov random fields (MRFs), such as grafting, improve scalability to large systems by avoiding inference over the entire feature space in each optimization step. Instead, inference is performed over an incrementally grown active set of features. In this paper, we address the computational bottlenecks that current techniques still suffer from by introducing online edge grafting, an incremental, structured method that activates edges as groups of features in a streaming setting. The framework is based on reservoir sampling of edges that satisfy a necessary activation condition, approximating the search for the optimal edge to activate. Online edge grafting performs an informed edge search set reorganization using search history and structure heuristics. 
Experiments show a significant computational speedup for structure learning and a controllable trade-off between the speed and the quality of learning.\nIn reinforcement learning, we often define goals by specifying rewards within desirable states. One problem with this approach is that we typically need to redefine the rewards each time the goal changes, which often requires some understanding of the solution in the agent's environment. When humans are learning to complete tasks, we regularly utilize alternative sources that guide our understanding of the problem. Such task representations allow one to specify goals on their own terms, thus providing specifications that can be appropriately interpreted across various environments. This motivates our own work, in which we represent goals in environments that are different from the agent's. We introduce Cross-Domain Perceptual Reward (CDPR) functions, learned rewards that represent the visual similarity between an agent's state and a cross-domain goal image. We report results for learning the CDPRs with a deep neural network and using them to solve two tasks with deep reinforcement learning.\nIn this paper, we focus on learning structure-aware document representations from data without recourse to a discourse parser or additional annotations. Drawing inspiration from recent efforts to empower neural networks with a structural bias, we propose a model that can encode a document while automatically inducing rich structural dependencies. Specifically, we embed a differentiable non-projective parsing algorithm into a neural model and use attention mechanisms to incorporate the structural biases. Experimental evaluation across different tasks and datasets shows that the proposed model achieves state-of-the-art results on document modeling tasks while inducing intermediate structures which are both interpretable and meaningful.\nWe study the notion of robustness in stable matching problems. 
We first define robustness by introducing $(a,b)$-supermatches. An $(a,b)$-supermatch is a stable matching in which if $a$ pairs break up it is possible to find another stable matching by changing the partners of those $a$ pairs and at most $b$ other pairs. In this context, we define the most robust stable matching as a $(1,b)$-supermatch where $b$ is minimal. We show that checking whether a given stable matching is a $(1,b)$-supermatch can be done in polynomial time. Next, we use this procedure to design a constraint programming model, a local search approach, and a genetic algorithm to find the most robust stable matching. Our empirical evaluation on large instances shows that local search outperforms the other approaches.\nRecurrent neural networks have achieved remarkable success at generating sequences with complex structures, thanks to advances that include richer embeddings of input and cures for vanishing gradients. Trained only on sequences from a known grammar, though, they can still struggle to learn rules and constraints of the grammar.   Neural Attribute Machines (NAMs) are equipped with a logical machine that represents the underlying grammar, which is used to teach the constraints to the neural machine by (i) augmenting the input sequence, and (ii) optimizing a custom loss function. Unlike traditional RNNs, NAMs are exposed to the grammar, as well as samples from the language of the grammar. During generation, NAMs make significantly fewer violations of the constraints of the underlying grammar than RNNs trained only on samples from the language of the grammar.\nWhen used as a surrogate objective for maximum likelihood estimation in latent variable models, the evidence lower bound (ELBO) produces state-of-the-art results. Inspired by this, we consider the extension of the ELBO to a family of lower bounds defined by a particle filter's estimator of the marginal likelihood, the filtering variational objectives (FIVOs). 
FIVOs take the same arguments as the ELBO, but can exploit a model's sequential structure to form tighter bounds. We present results that relate the tightness of FIVO's bound to the variance of the particle filter's estimator by considering the generic case of bounds defined as log-transformed likelihood estimators. Experimentally, we show that training with FIVO results in substantial improvements over training the same model architecture with the ELBO on sequential data.\nThe existence of a coalition strategy to achieve a goal does not necessarily mean that the coalition has enough information to know how to follow the strategy. Neither does it mean that the coalition knows that such a strategy exists. The article studies an interplay between the distributed knowledge, coalition strategies, and coalition \"know-how\" strategies. The main technical result is a sound and complete trimodal logical system that describes the properties of this interplay.\nWe study Robust Subspace Recovery (RSR) in distributed settings. We consider a huge dataset in an ad hoc network without a central processor, where each node has access only to one chunk of the dataset. We assume that part of the whole dataset lies around a low-dimensional subspace and the other part is composed of outliers that lie away from that subspace. The goal is to recover the underlying subspace for the whole dataset, without transferring the data itself between the nodes. We apply the Consensus-Based Gradient method for the Geometric Median Subspace algorithm for RSR. We propose an iterative solution for the local dual minimization problem and establish its $r$-linear convergence. We also explain how to distributedly implement the Reaper and Fast Median Subspace algorithms for RSR. 
We demonstrate the competitive performance of our algorithms on both synthetic and real data.\nGiven a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional, dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we efficiently discover the optimal or $\\alpha$-approximate top-$k$ dependencies? These are exactly the questions we answer in this paper.   As we want to be agnostic on the form of the dependence, we adopt an information-theoretic approach, and construct a reliable, bias-correcting score that can be efficiently computed. Moreover, we give an effective optimistic estimator of this score, by which for the first time we can mine the approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias for variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most importantly, it remains reliable in the face of data sparsity.\nOur experience of the world is multimodal: we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced and a research problem is characterized as multimodal when it includes multiple such modalities. In order for Artificial Intelligence to make progress in understanding the world around us, it needs to be able to interpret such multimodal signals together. Multimodal machine learning aims to build models that can process and relate information from multiple modalities. It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential. 
Instead of focusing on specific multimodal applications, this paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy. We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and co-learning. This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research.\nOnline music services are increasing in popularity. They enable us to analyze people's music listening behavior based on play logs. Although it is known that people listen to music based on topic (e.g., rock or jazz), we assume that when a user is addicted to an artist, s/he chooses the artist's songs regardless of topic. Based on this assumption, in this paper, we propose a probabilistic model to analyze people's music listening behavior. Our main contributions are three-fold. First, to the best of our knowledge, this is the first study modeling music listening behavior by taking into account the influence of addiction to artists. Second, by using real-world datasets of play logs, we showed the effectiveness of our proposed model. Third, we carried out qualitative experiments and showed that taking addiction into account enables us to analyze music listening behavior from a new viewpoint in terms of how people listen to music according to the time of day, how an artist's songs are listened to by people, etc. We also discuss the possibility of applying the analysis results to applications such as artist similarity computation and song recommendation.\nThis paper addresses the problem of automatic speech recognition (ASR) error detection and their use for improving spoken language understanding (SLU) systems. 
In this study, the SLU task consists of automatically extracting, from ASR transcriptions, semantic concepts and concept/value pairs in, e.g., a touristic information system. An approach is proposed for enriching the set of semantic labels with error-specific labels and by using a recently proposed neural approach based on word embeddings to compute well-calibrated ASR confidence measures. Experimental results are reported showing that it is possible to significantly decrease the Concept/Value Error Rate of a state-of-the-art system, outperforming previously published results on the same experimental data. It is also shown that, by combining an SLU approach based on conditional random fields with a neural encoder/decoder attention-based architecture, it is possible to effectively identify confidence islands and uncertain semantic output segments useful for deciding appropriate error handling actions by the dialogue manager strategy.\nThe quadratic unconstrained binary optimization (QUBO) problem arises in diverse optimization applications ranging from Ising spin problems to classical problems in graph theory and binary discrete optimization. The use of preprocessing to transform the graph representing the QUBO problem into a smaller equivalent graph is important for improving solution quality and time for both exact and metaheuristic algorithms and is a step towards mapping large-scale QUBO to hardware graphs used in quantum annealing computers. In an earlier paper (Lewis and Glover, 2016) a set of rules was introduced that achieved significant QUBO reductions as verified through computational testing. Here this work is extended with additional rules that provide further reductions that succeed in exactly solving 10% of the benchmark QUBO problems. 
An algorithm and associated data structures to efficiently implement the entire set of rules is detailed and computational experiments are reported that demonstrate their efficacy.\nThe goal of this paper is to analyze the geometric properties of deep neural network classifiers in the input space. We specifically study the topology of classification regions created by deep networks, as well as their associated decision boundary. Through a systematic empirical investigation, we show that state-of-the-art deep nets learn connected classification regions, and that the decision boundary in the vicinity of datapoints is flat along most directions. We further draw an essential connection between two seemingly unrelated properties of deep networks: their sensitivity to additive perturbations in the inputs, and the curvature of their decision boundary. The directions where the decision boundary is curved in fact remarkably characterize the directions to which the classifier is the most vulnerable. We finally leverage a fundamental asymmetry in the curvature of the decision boundary of deep nets, and propose a method to discriminate between original images, and images perturbed with small adversarial examples. We show the effectiveness of this purely geometric approach for detecting small adversarial perturbations in images, and for recovering the labels of perturbed images.\nDeep networks have recently been shown to be vulnerable to universal perturbations: there exist very small image-agnostic perturbations that cause most natural images to be misclassified by such classifiers. In this paper, we propose the first quantitative analysis of the robustness of classifiers to universal perturbations, and draw a formal link between the robustness to universal perturbations, and the geometry of the decision boundary. Specifically, we establish theoretical bounds on the robustness of classifiers under two decision boundary models (flat and curved models). 
We show in particular that the robustness of deep networks to universal perturbations is driven by a key property of their curvature: there exist shared directions along which the decision boundary of deep networks is systematically positively curved. Under such conditions, we prove the existence of small universal perturbations. Our analysis further provides a novel geometric method for computing universal perturbations, in addition to explaining their properties.\nGenerative adversarial networks (GANs) can implicitly learn rich distributions over images, audio, and data which are hard to model with an explicit likelihood. We present a practical Bayesian formulation for unsupervised and semi-supervised learning with GANs. Within this framework, we use stochastic gradient Hamiltonian Monte Carlo to marginalize the weights of the generator and discriminator networks. The resulting approach is straightforward and obtains good performance without any standard interventions such as feature matching or mini-batch discrimination. By exploring an expressive posterior over the parameters of the generator, the Bayesian GAN avoids mode collapse, produces interpretable and diverse candidate samples, and provides state-of-the-art quantitative results for semi-supervised learning on benchmarks including SVHN, CelebA, and CIFAR-10, outperforming DCGAN, Wasserstein GANs, and DCGAN ensembles.\nAutonomous systems can substantially enhance a human's efficiency and effectiveness in complex environments. Machines, however, are often unable to observe the preferences of the humans that they serve. Despite the fact that the human's and machine's objectives are aligned, asymmetric information, along with heterogeneous sensitivities to risk by the human and machine, makes their joint optimization process a game with strategic interactions. 
We propose a framework based on risk-sensitive dynamic games; the human seeks to optimize her risk-sensitive criterion according to her true preferences, while the machine seeks to adaptively learn the human's preferences and at the same time provide a good service to the human. We develop a class of performance measures for the proposed framework based on the concept of regret. We then evaluate their dependence on the risk-sensitivity and the degree of uncertainty. We present applications of our framework to self-driving taxis, and robo-financial advising.\nThe Quadratic Unconstrained Binary Optimization problem (QUBO) has become a unifying model for representing a wide range of combinatorial optimization problems, and for linking a variety of disciplines that face these problems. A new class of quantum annealing computer that maps QUBO onto a physical qubit network structure with specific size and edge density restrictions is generating a growing interest in ways to transform the underlying QUBO structure into an equivalent graph having fewer nodes and edges. In this paper we present rules for reducing the size of the QUBO matrix by identifying variables whose value at optimality can be predetermined. We verify that the reductions improve both solution quality and time to solution and, in the case of metaheuristic methods where optimal solutions cannot be guaranteed, the quality of solutions obtained within reasonable time limits.   We discuss the general QUBO structural characteristics that can take advantage of these reduction techniques and perform careful experimental design and analysis to identify and quantify the specific characteristics most affecting reduction. The rules make it possible to dramatically improve solution times on a new set of problems using both the exact Cplex solver and a tabu search metaheuristic.\nIn this work we present strategies for (optimal) measurement selection in model-based sequential diagnosis. 
In particular, assuming a set of leading diagnoses being given, we show how queries (sets of measurements) can be computed and optimized along two dimensions: expected number of queries and cost per query. By means of a suitable decoupling of two optimizations and a clever search space reduction the computations are done without any inference engine calls. For the full search space, we give a method requiring only a polynomial number of inferences and guaranteeing query properties existing methods cannot provide. Evaluation results using real-world problems indicate that the new method computes (virtually) optimal queries instantly independently of the size and complexity of the considered diagnosis problems.\nIn this paper, we present the Role Playing Learning (RPL) scheme for a mobile robot to navigate socially with its human companion in populated environments. Neural networks (NN) are constructed to parameterize a stochastic policy that directly maps sensory data collected by the robot to its velocity outputs, while respecting a set of social norms. An efficient simulative learning environment is built with maps and pedestrians trajectories collected from a number of real-world crowd data sets. In each learning iteration, a robot equipped with the NN policy is created virtually in the learning environment to play itself as a companied pedestrian and navigate towards a goal in a socially concomitant manner. Thus, we call this process Role Playing Learning, which is formulated under a reinforcement learning (RL) framework. The NN policy is optimized end-to-end using Trust Region Policy Optimization (TRPO), with consideration of the imperfectness of robot's sensor measurements. Simulative and experimental results are provided to demonstrate the efficacy and superiority of our method.\nRecent progress in variational inference has paid much attention to the flexibility of variational posteriors. 
One promising direction is to use implicit distributions, i.e., distributions without tractable densities as the variational posterior. However, existing methods on implicit posteriors still face challenges of noisy estimation and computational infeasibility when applied to models with high-dimensional latent variables. In this paper, we present a new approach named Kernel Implicit Variational Inference that addresses these challenges. As far as we know, for the first time implicit variational inference is successfully applied to Bayesian neural networks, which shows promising results on both regression and classification tasks.\nThere are two common approaches for optimizing the performance of a machine: genetic algorithms and machine learning. A genetic algorithm is applied over many generations whereas machine learning works by applying feedback until the system meets a performance threshold. Though these are methods that typically operate separately, we combine evolutionary adaptation and machine learning into one approach. Our focus is on machines that can learn during their lifetime, but instead of equipping them with a machine learning algorithm we aim to let them evolve their ability to learn by themselves. We use evolvable networks of probabilistic and deterministic logic gates, known as Markov Brains, as our computational model organism. The ability of Markov Brains to learn is augmented by a novel adaptive component that can change its computational behavior based on feedback. We show that Markov Brains can indeed evolve to incorporate these feedback gates to improve their adaptability to variable environments. By combining these two methods, we now also implemented a computational model that can be used to study the evolution of learning.\nWe introduce contextual explanation networks (CENs)---a class of models that learn to predict by generating and leveraging intermediate explanations. 
CENs are deep networks that generate parameters for context-specific probabilistic graphical models which are further used for prediction and play the role of explanations. Contrary to the existing post-hoc model-explanation tools, CENs learn to predict and to explain jointly. Our approach offers two major advantages: (i) for each prediction, valid instance-specific explanations are generated with no computational overhead and (ii) prediction via explanation acts as a regularization and boosts performance in low-resource settings. We prove that local approximations to the decision boundary of our networks are consistent with the generated explanations. Our results on image and text classification and survival analysis tasks demonstrate that CENs are competitive with the state-of-the-art while offering additional insights behind each prediction, valuable for decision support.\nMultisensory policies are known to enhance both state estimation and target tracking. However, in the space of end-to-end sensorimotor control, this multi-sensor outlook has received limited attention. Moreover, systematic ways to make policies robust to partial sensor failure are not well explored. In this work, we propose a specific customization of Dropout, called \\textit{Sensor Dropout}, to improve multisensory policy robustness and handle partial failure in the sensor set. We also introduce an additional auxiliary loss on the policy network in order to reduce variance in the band of potential multi- and uni-sensory policies to reduce jerks during policy switching triggered by an abrupt sensor failure or deactivation/activation. Finally, through the visualization of gradients, we show that the learned policies are conditioned on the same latent state representation despite having diverse observation spaces, a hallmark of true sensor fusion. 
Simulation results of the multisensory policy, as visualized in TORCS racing game, can be seen here: https://youtu.be/QAK2lcXjNZc.\nDespite the current interest in Open Data publishing, a formal and comprehensive methodology supporting an organization in deciding which data to publish and carrying out precise procedures for publishing high-quality data, is still missing. In this paper we argue that the Ontology-based Data Management paradigm can provide a formal basis for a principled approach to publish high quality, semantically annotated Open Data. We describe two main approaches to using an ontology for this endeavor, and then we present some technical results on one of the approaches, called bottom-up, where the specification of the data to be published is given in terms of the sources, and specific techniques allow deriving suitable annotations for interpreting the published data under the light of the ontology.\nDeep neural networks trained on large supervised datasets have led to impressive results in image classification and other tasks. However, well-annotated datasets can be time-consuming and expensive to collect, lending increased interest to larger but noisy datasets that are more easily obtained. In this paper, we show that deep neural networks are capable of generalizing from training data for which true labels are massively outnumbered by incorrect labels. We demonstrate remarkably high test performance after training on corrupted data from MNIST, CIFAR, and ImageNet. For example, on MNIST we obtain test accuracy above 90 percent even after each clean training example has been diluted with 100 randomly-labeled examples. Such behavior holds across multiple patterns of label noise, even when erroneous labels are biased towards confusing classes. We show that training in this regime requires a significant but manageable increase in dataset size that is related to the factor by which correct labels have been diluted. 
Finally, we provide an analysis of our results that shows how increasing noise decreases the effective batch size.\nHuman trafficking is one of the most atrocious crimes and among the challenging problems facing law enforcement which demands attention of global magnitude. In this study, we leverage textual data from the website \"Backpage\" (used for classified advertisements) to discern potential patterns of human trafficking activities which manifest online and identify advertisements of high interest to law enforcement. Due to the lack of ground truth, we rely on a human analyst from law enforcement for hand-labeling a small portion of the crawled data. We extend the existing Laplacian SVM and present S3VM-R by adding a regularization term to exploit exogenous information embedded in our feature space in favor of the task at hand. We train the proposed method using labeled and unlabeled data and evaluate it on a fraction of the unlabeled data, herein referred to as unseen data, with our expert's further verification. Results from comparisons between our method and other semi-supervised and supervised approaches on the labeled data demonstrate that our learner is effective in identifying advertisements of high interest to law enforcement.\nDeep learning algorithms for connectomics rely upon localized classification, rather than overall morphology. This leads to a high incidence of erroneously merged objects. Humans, by contrast, can easily detect such errors by acquiring intuition for the correct morphology of objects. Biological neurons have complicated and variable shapes, which are challenging to learn, and merge errors take a multitude of different forms. We present an algorithm, MergeNet, that shows 3D ConvNets can, in fact, detect merge errors from high-level neuronal morphology. MergeNet follows unsupervised training and operates across datasets. 
We demonstrate the performance of MergeNet both on a variety of connectomics data and on a dataset created from merged MNIST images.\nClause Learning is one of the most important components of a conflict-driven clause learning (CDCL) SAT solver that is effective on industrial instances. Since the number of learned clauses is proved to be exponential in the worst case, it is necessary to identify the most relevant clauses to maintain and delete the irrelevant ones. As reported in the literature, several learned-clause deletion strategies have been proposed. However, the diversity in both the number of clauses to be removed at each step of reduction and the results obtained with each strategy makes it difficult to determine which criterion is better. Thus, the problem of selecting which learned clauses are to be removed during the search step remains very challenging. In this paper, we propose a novel approach to identify the most relevant learned clauses without favoring or excluding any of the proposed measures, but by adopting the notion of dominance relationship among those measures. Our approach bypasses the problem of the diversity of results and reaches a compromise between the assessments of these measures. Furthermore, the proposed approach also avoids another non-trivial problem which is the number of clauses to be deleted at each reduction of the learned clause database.\nRepresenting symbolic knowledge into a connectionist network is the key element for the integration of scalable learning and sound reasoning. Most of the previous studies focus on discriminative neural networks which unnecessarily require a separation of input/output variables. Recent development of generative neural networks such as restricted Boltzmann machines (RBMs) has shown a capability of learning semantic abstractions directly from data, posing a promise for general symbolic learning and reasoning. 
Previous work on Penalty logic shows a link between propositional logic and symmetric connectionist networks; however, it is not applicable to RBMs. This paper proposes a novel method to represent propositional formulas in RBMs or stacks of RBMs, where Gibbs sampling can be seen as maximising satisfiability. It also shows a promising use of RBMs to learn symbolic knowledge through maximum likelihood estimation.\nGenerative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise, however, are not commensurate with the progress made in generating images, and still lag far behind likelihood-based methods. In this paper, we take a step towards generating natural language with a GAN objective alone. We introduce a simple baseline that addresses the discrete output space problem without relying on gradient estimators and show that it is able to achieve state-of-the-art results on a Chinese poem generation dataset. We present quantitative results on generating sentences from context-free and probabilistic context-free grammars, and qualitative language modeling results. A conditional version is also described that can generate sequences conditioned on sentence characteristics.\nPartially observable environments present an important open challenge in the domain of sequential control learning with delayed rewards. Despite numerous attempts during the last two decades, the majority of reinforcement learning algorithms and associated approximate models, applied to this context, still assume Markovian state transitions. In this paper, we explore the use of a recently proposed attention-based model, the Gated End-to-End Memory Network, for sequential control. We call the resulting model the Gated End-to-End Memory Policy Network. 
More precisely, we use a model-free value-based algorithm to learn policies for partially observed domains using this memory-enhanced neural network. This model is end-to-end learnable and it features unbounded memory. Indeed, because of its attention mechanism and associated non-parametric memory, the proposed model allows us to define an attention mechanism over the observation stream, unlike recurrent models. We show encouraging results that illustrate the capability of our attention-based model in the context of the continuous-state non-stationary control problem of stock trading. We also present an OpenAI Gym environment for a simulated stock exchange and explain its relevance as a benchmark for the field of non-Markovian decision process learning.\nRecent progress in Reinforcement Learning (RL), fueled by its combination with Deep Learning, has enabled impressive results in learning to interact with complex virtual environments, yet real-world applications of RL are still scarce. A key limitation is data efficiency, with current state-of-the-art approaches requiring millions of training samples. A promising way to tackle this problem is to augment RL with learning from human demonstrations. However, human demonstration data is not yet readily available. This hinders progress in this direction. The present work addresses this problem as follows. We (i) collect and describe a large dataset of human Atari 2600 replays (the largest and most diverse such data set publicly released to date), (ii) illustrate an example use of this dataset by analyzing the relation between demonstration quality and imitation learning performance, and (iii) outline possible research directions that are opened up by our work.\nWe introduce neural networks for end-to-end differentiable proving of queries to knowledge bases by operating on dense vector representations of symbols. 
These neural networks are constructed recursively by taking inspiration from the backward chaining algorithm as used in Prolog. Specifically, we replace symbolic unification with a differentiable computation on vector representations of symbols using a radial basis function kernel, thereby combining symbolic reasoning with learning subsymbolic vector representations. By using gradient descent, the resulting neural network can be trained to infer facts from a given incomplete knowledge base. It learns to (i) place representations of similar symbols in close proximity in a vector space, (ii) make use of such similarities to prove queries, (iii) induce logical rules, and (iv) use provided and induced logical rules for multi-hop reasoning. We demonstrate that this architecture outperforms ComplEx, a state-of-the-art neural link prediction model, on three out of four benchmark knowledge bases while at the same time inducing interpretable function-free first-order logic rules.\nLearning meaningful representations that maintain the content necessary for a particular task while filtering away detrimental variations is a problem of great interest in machine learning. In this paper, we tackle the problem of learning representations invariant to a specific factor or trait of data. The representation learning process is formulated as an adversarial minimax game. We analyze the optimal equilibrium of such a game and find that it amounts to maximizing the uncertainty of inferring the detrimental factor given the representation while maximizing the certainty of making task-specific predictions. On three benchmark tasks, namely fair and bias-free classification, language-independent generation, and lighting-independent image classification, we show that the proposed framework induces an invariant representation, and leads to better generalization evidenced by the improved performance.\nMulti-start algorithms are a common and effective tool for metaheuristic searches. 
In this paper we amplify multi-start capabilities by employing the parallel processing power of the graphics processing unit (GPU) to quickly generate a diverse starting set of solutions for the Unconstrained Binary Quadratic Optimization Problem, which are evaluated and used to implement screening methods to select solutions for further optimization. This method is implemented as an initial high-quality solution generation phase prior to a secondary steepest ascent search, and a comparison of results to best known approaches on benchmark unconstrained binary quadratic problems demonstrates that GPU-enabled diversified multi-start with screening quickly yields very good results.\nIn [1], we introduced mechanical learning and proposed 2 approaches to mechanical learning. Here, we follow one such approach to describe well the objects and the processes of learning. We discuss 2 kinds of patterns: objective and subjective patterns. Subjective patterns are crucial for a learning machine. We prove that for any objective pattern we can find a proper subjective pattern based upon least base patterns to express the objective pattern well. An X-form is an algebraic expression for a subjective pattern. A collection of X-forms forms an internal representation space, which is the center of a learning machine. We discuss learning with and without teaching. We define data sufficiency by X-form. We then discuss some learning strategies. We show that, in each strategy, with sufficient data and with certain capabilities, a learning machine can indeed learn any pattern (universal learning machine). In the appendix, with knowledge of learning machines, we try to view deep learning from a different angle, i.e. its internal representation space and its learning dynamics.\nRecent theoretical and experimental results suggest the possibility of using current and near-future quantum hardware in challenging sampling tasks. 
In this paper, we introduce free energy-based reinforcement learning (FERL) as an application of quantum hardware. We propose a method for processing a quantum annealer's measured qubit spin configurations in approximating the free energy of a quantum Boltzmann machine (QBM). We then apply this method to perform reinforcement learning on the grid-world problem using the D-Wave 2000Q quantum annealer. The experimental results show that our technique is a promising method for harnessing the power of quantum sampling in reinforcement learning tasks.\nWe introduce a diversified top-k partial MaxSAT problem, a combination of the partial MaxSAT problem and the enumeration problem. Given a partial MaxSAT formula F and a positive integer k, the diversified top-k partial MaxSAT problem is to find k maximal solutions for F such that the k maximal solutions satisfy the maximum number of soft clauses of F. This problem can be widely used in many applications including community detection, sensor placement, motif discovery, and combinatorial testing. We prove the problem is NP-hard and propose an approach for solving the problem. The concrete idea of the approach is to design an encoding EE which reduces the diversified top-k partial MaxSAT problem to the partial MaxSAT problem, and then solve the resulting problem with state-of-the-art solvers. In addition, we present an algorithm, MEMKC, that solves the diversified top-k partial MaxSAT problem exactly. Through several experiments we show that our approach can be successfully applied to this problem.\nRobots will eventually be part of every household. It is thus critical to enable algorithms to learn from and be guided by non-expert users. In this paper, we bring a human in the loop, and enable a human teacher to give feedback to a learning agent in the form of natural language. We argue that a descriptive sentence can provide a much stronger learning signal than a numeric reward in that it can easily point to where the mistakes are and how to correct them. 
We focus on the problem of image captioning in which the quality of the output can easily be judged by non-experts. We propose a hierarchical phrase-based captioning model trained with policy gradients, and design a feedback network that provides reward to the learner by conditioning on the human-provided feedback. We show that by exploiting descriptive feedback our model learns to perform better than when given independently written human captions.\nFeature engineering is one of the most important and time-consuming tasks in predictive analytics projects. It involves understanding domain knowledge and data exploration to discover relevant hand-crafted features from raw data. In this paper, we introduce a system called One Button Machine, or OneBM for short, which automates feature discovery in relational databases. OneBM automatically performs a key activity of data scientists, namely, joining database tables and applying advanced data transformations to extract useful features from data. We validated OneBM in Kaggle competitions in which OneBM achieved performance as good as the top 16% to 24% of data scientists in three Kaggle competitions. More importantly, OneBM outperformed the state-of-the-art system in a Kaggle competition in terms of prediction accuracy and ranking on the Kaggle leaderboard. The results show that OneBM can be useful for both data scientists and non-experts. It helps data scientists reduce data exploration time, allowing them to try out many ideas by trial and error in a short time. On the other hand, it enables non-experts, who are not familiar with data science, to quickly extract value from their data with little effort, time and cost.\nAs robots begin to cohabit with humans in semi-structured environments, the need arises to understand instructions involving rich variability---for instance, learning to ground symbols in the physical world. 
Realistically, this task must cope with small datasets consisting of a particular user's contextual assignment of meaning to terms. We present a method for processing a raw stream of cross-modal input---i.e., linguistic instructions, visual perception of a scene and a concurrent trace of 3D eye tracking fixations---to produce the segmentation of objects with a corresponding association to high-level concepts. To test our framework we present experiments in a table-top object manipulation scenario. Our results show our model learns the user's notion of colour and shape from a small number of physical demonstrations, generalising to identifying physical referents for novel combinations of the words.\nThe growing adoption of IT-systems for modeling and executing (business) processes or services has thrust the scientific investigation towards techniques and tools which support more complex forms of process analysis. Many of them, such as conformance checking, process alignment, mining and enhancement, rely on complete observation of past (tracked and logged) executions. In many real cases, however, the lack of human or IT-support on all the steps of process execution, as well as information hiding and abstraction of model and data, result in incomplete log information of both data and activities. This paper tackles the issue of automatically repairing traces with missing information by notably considering not only activities but also data manipulated by them. Our technique recasts such a problem as a reachability problem and provides an encoding in an action language which allows virtually any state-of-the-art planner to be used to return solutions.\nTopic models have been widely explored as probabilistic generative models of documents. Traditional inference methods have sought closed-form derivations for updating the models; however, as the expressiveness of these models grows, so does the difficulty of performing fast and accurate inference over their parameters. 
This paper presents alternative neural approaches to topic modelling by providing parameterisable distributions over topics which permit training by backpropagation in the framework of neural variational inference. In addition, with the help of a stick-breaking construction, we propose a recurrent network that is able to discover a notionally unbounded number of topics, analogous to Bayesian non-parametric topic models. Experimental results on the MXM Song Lyrics, 20NewsGroups and Reuters News datasets demonstrate the effectiveness and efficiency of these neural topic models.\nWe present Attract-Repel, an algorithm for improving the semantic quality of word vectors by injecting constraints extracted from lexical resources. Attract-Repel facilitates the use of constraints from mono- and cross-lingual resources, yielding semantically specialised cross-lingual vector spaces. Our evaluation shows that the method can make use of existing cross-lingual lexicons to construct high-quality vector spaces for a plethora of different languages, facilitating semantic transfer from high- to lower-resource ones. The effectiveness of our approach is demonstrated with state-of-the-art results on semantic similarity datasets in six languages. We next show that Attract-Repel-specialised vectors boost performance in the downstream task of dialogue state tracking (DST) across multiple languages. Finally, we show that cross-lingual vector spaces produced by our algorithm facilitate the training of multilingual DST models, which brings further performance improvements.\nWe introduce the relational ontology log, or relational olog, a knowledge representation system based on the category of sets and relations. It is inspired by Spivak and Kent's olog, a recent categorical framework for knowledge representation. Relational ologs interpolate between ologs and description logic, the dominant formalism for knowledge representation today. 
In this paper, we investigate relational ologs both for their own sake and to gain insight into the relationship between the algebraic and logical approaches to knowledge representation. On a practical level, we show by example that relational ologs have a friendly and intuitive--yet fully precise--graphical syntax, derived from the string diagrams of monoidal categories. We explain several other useful features of relational ologs not possessed by most description logics, such as a type system and a rich, flexible notion of instance data. In a more theoretical vein, we draw on categorical logic to show how relational ologs can be translated to and from logical theories in a fragment of first-order logic. Although we make extensive use of categorical language, this paper is designed to be self-contained and has considerable expository content. The only prerequisites are knowledge of first-order logic and the rudiments of category theory.\nDeep neural networks are able to solve tasks across a variety of domains and modalities of data. Despite many empirical successes, we lack the ability to clearly understand and interpret the learned internal mechanisms that contribute to such effective behaviors or, more critically, failure modes. In this work, we present a general method for visualizing an arbitrary neural network's inner mechanisms and their power and limitations. Our dataset-centric method produces visualizations of how a trained network attends to components of its inputs. The computed \"attention masks\" support improved interpretability by highlighting which input attributes are critical in determining output. We demonstrate the effectiveness of our framework on a variety of deep neural network architectures in domains from computer vision, natural language processing, and reinforcement learning. 
The primary contribution of our approach is an interpretable visualization of attention that provides unique insights into the network's underlying decision-making process irrespective of the data modality.\nWhile several matrix factorization (MF) and tensor factorization (TF) models have been proposed for knowledge base (KB) inference, they have rarely been compared across various datasets. Is there a single model that performs well across datasets? If not, what characteristics of a dataset determine the performance of MF and TF models? Is there a joint TF+MF model that performs robustly on all datasets? We perform an extensive evaluation to compare popular KB inference models across popular datasets in the literature. In addition to answering the questions above, we remove a limitation in the standard evaluation protocol for MF models, propose an extension to MF models so that they can better handle out-of-vocabulary (OOV) entity pairs, and develop a novel combination of TF and MF models. We also analyze and explain the results based on models and dataset characteristics. Our best model is robust, and obtains strong results across all datasets.\nIn many robotic applications, some aspects of the system dynamics can be modeled accurately while others are difficult to obtain or model. We present a novel reinforcement learning (RL) method for continuous state and action spaces that learns with partial knowledge of the system and without active exploration. It solves linearly-solvable Markov decision processes (L-MDPs), which are well suited for continuous state and action spaces, based on an actor-critic architecture. Compared to previous RL methods for L-MDPs and path integral methods which are model based, the actor-critic learning does not need a model of the uncontrolled dynamics and, importantly, transition noise levels; however, it requires knowing the control dynamics for the problem. 
We evaluate our method on two synthetic test problems, and one real-world problem in simulation and using real traffic data. Our experiments demonstrate improved learning and policy performance.\nPathfinding is a very popular area in computer game development. While two-dimensional (2D) pathfinding is widely applied in most of the popular game engines, few implementations of real three-dimensional (3D) pathfinding can be found. This research presents a dynamic search space optimization algorithm which can be applied to tessellate 3D search space unevenly, significantly reducing the total number of resulting nodes. The algorithm can be used with popular pathfinding algorithms in 3D game engines. Furthermore, a simplified standalone 3D pathfinding algorithm is proposed in this paper. The proposed algorithm relies on ray-casting or line vision to generate a feasible path during runtime without requiring division of the search space into a 3D grid. Both of the proposed algorithms are simulated on Unreal Engine to show their inner workings and to compare the resulting paths with A*. The advantages and shortcomings of the proposed algorithms are also discussed along with future directions.\nNon-stationary domains, which change in unpredictable ways, are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named {\\em Online ASP for MDP} (oASP(MDP)), which is a method capable of constructing the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules represent a set of domain constraints that are processed as ASP programs reducing the search space. 
Results show that oASP(MDP) is capable of finding solutions for problems in non-stationary domains without interfering with the action-value function approximation process.\nBayesian optimization (BO) has become an effective approach for black-box function optimization problems when function evaluations are expensive and the optimum can be achieved within a relatively small number of queries. However, many cases, such as the ones with high-dimensional inputs, may require a much larger number of observations for optimization. Despite an abundance of observations thanks to parallel experiments, current BO techniques have been limited to merely a few thousand observations. In this paper, we propose ensemble Bayesian optimization (EBO) to address three current challenges in BO simultaneously: (1) large-scale observations; (2) high dimensional input spaces; and (3) selections of batch queries that balance quality and diversity. The key idea of EBO is to operate on an ensemble of additive Gaussian process models, each of which possesses a randomized strategy to divide and conquer. We show unprecedented, previously impossible results of scaling up BO to tens of thousands of observations within minutes of computation.\nA significant number of search queries originate from some real-world information need or task. In order to improve the search experience of the end users, it is important to have accurate representations of tasks. As a result, a significant amount of research has been devoted to extracting proper representations of tasks in order to enable search systems to help users complete their tasks, as well as to provide the end user with better query suggestions, better recommendations, satisfaction prediction, and improved personalization in terms of tasks. Most existing task extraction methodologies focus on representing tasks as flat structures. 
However, tasks often tend to have multiple subtasks associated with them and a more naturalistic representation of tasks would be in terms of a hierarchy, where each task can be composed of multiple (sub)tasks. To this end, we propose an efficient Bayesian nonparametric model for extracting hierarchies of such tasks \\& subtasks. We evaluate our method based on real-world query log data through both quantitative and crowdsourced experiments and highlight the importance of considering task/subtask hierarchies.\nMulti-task learning (MTL) is a supervised learning paradigm in which the prediction models for several related tasks are learned jointly to achieve better generalization performance. When there are only a few training examples per task, MTL considerably outperforms the traditional single-task learning (STL) in terms of prediction accuracy. In this work we develop an MTL-based approach for classifying documents that are archived within dual concept hierarchies, namely, DMOZ and Wikipedia. We solve the multi-class classification problem by defining one-versus-rest binary classification tasks for each of the different classes across the two hierarchical datasets. Instead of learning a linear discriminant for each of the different tasks independently, we use an MTL approach with relationships between the different tasks across the datasets established using the non-parametric, lazy, nearest neighbor approach. We also develop and evaluate a transfer learning (TL) approach and compare the MTL (and TL) methods against the standard single-task learning and semi-supervised learning approaches. 
Our empirical results demonstrate the strength of our developed methods, which show an improvement especially when there are fewer training examples per classification task.\nIn this paper, we explore an SPPIM-based text classification method, and the experiment reveals that the SPPIM method is equal or even superior to the SGNS method in the text classification task on three international and standard text datasets, namely 20newsgroups, Reuters52 and WebKB. Compared to SGNS, although SPPMI provides a better solution, it is not necessarily better than SGNS in text classification tasks. Based on our analysis, SGNS takes weight calculation into consideration during the decomposition process, so it has better performance than SPPIM on some standard datasets. Inspired by this, we propose a WL-SPPIM semantic model based on the SPPIM model, and experiments show that the WL-SPPIM approach has better classification performance and higher scalability in the text classification task compared with LDA, SGNS and SPPIM approaches.\nDeep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods allows us to combine the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks. 
Our results show that RL with parameter noise learns more efficiently than traditional RL with action space noise and evolutionary strategies individually.\nEpistemic logic with non-standard knowledge operators, especially the \"knowing-value\" operator, has recently gathered much attention. With the \"knowing-value\" operator, we can express knowledge of individual variables, but not of the relations between them in general. In this paper, we propose a new operator Kf to express knowledge of the functional dependencies between variables. The semantics of this Kf operator uses a function domain which imposes a constraint on what counts as a functional dependency relation. By adjusting this function domain, different interesting logics arise, and in this paper we axiomatize three such logics in a single-agent setting. Then we show how these three logics can be unified by allowing the function domain to vary relative to different agents and possible worlds. A multiagent axiomatization is given in this case.\nArtifact-centric process models aim to describe complex processes as a collection of interacting artifacts. Recent developments in process mining allow for the discovery of such models. However, the focus is often on the representation of the individual artifacts rather than their interactions. Based on event data we can automatically discover composite state machines representing artifact-centric processes. Moreover, we provide ways of visualizing and quantifying interactions among different artifacts. For example, we are able to highlight strongly correlated behaviours in different artifacts. The approach has been fully implemented as a ProM plug-in; the CSM Miner provides an interactive artifact-centric process discovery tool focussing on interactions. 
The approach has been evaluated using real-life data sets, including the personal loan and overdraft process of a Dutch financial institution.\nIn the context of solving large distributed constraint optimization problems (DCOP), belief-propagation and approximate inference algorithms are candidates of choice. However, in general, when the factor graph is very loopy (i.e. cyclic), these solution methods perform poorly, due to non-convergence and many exchanged messages. To improve the performance of the Max-Sum inference algorithm when solving loopy constraint optimization problems, we propose here to take inspiration from the belief-propagation-guided decimation used to solve sparse random graphs (k-satisfiability). We propose the novel DeciMaxSum method, which is parameterized in terms of policies to decide when to trigger decimation, which variables to decimate, and which values to assign to decimated variables. Based on an empirical evaluation on a classical BP benchmark (the Ising model), some of these combinations of policies exhibit better performance than state-of-the-art competitors.\nMaking inferences from partial information constitutes a critical aspect of cognition. During visual perception, pattern completion enables recognition of poorly visible or occluded objects. We combined psychophysics, physiology and computational models to test the hypothesis that pattern completion is implemented by recurrent computations and present three pieces of evidence that are consistent with this hypothesis. First, subjects robustly recognized objects even when rendered <15% visible, but recognition was largely impaired when processing was interrupted by backward masking. Second, invasive physiological responses along the human ventral cortex exhibited visually selective responses to partially visible objects that were delayed compared to whole objects, suggesting the need for additional computations. 
These physiological delays were correlated with the effects of backward masking. Third, state-of-the-art feed-forward computational architectures were not robust to partial visibility. However, recognition performance was recovered when the model was augmented with attractor-based recurrent connectivity. These results provide a strong argument of plausibility for the role of recurrent computations in making visual inferences from partial information.\nAs we know, some global optimization problems cannot be solved using analytic methods, so numeric/algorithmic approaches are used to find near-optimal solutions for them. A stochastic global optimization algorithm (SGoal) is an iterative algorithm that generates a new population (a set of candidate solutions) from a previous population using stochastic operations. Although some research works have formalized SGoals using Markov kernels, such formalization is not general and is sometimes unclear. In this paper, we propose a comprehensive and systematic formal approach for studying SGoals. First, we present the required theory of probability (\\sigma-algebras, measurable functions, kernels, Markov chains, products, convergence and so on) and prove that some algorithmic functions like swapping and projection can be represented by kernels. Then, we introduce the notion of join-kernel as a way of characterizing the combination of stochastic methods. Next, we define the optimization space, a formal structure (a set with a \\sigma-algebra that contains strict \\epsilon-optimal states) for studying SGoals, and we develop kernels, like sort and permutation, on such a structure. Finally, we present some popular SGoals in terms of the developed theory, we introduce sufficient conditions for convergence of an SGoal, and we prove convergence of some popular SGoals.\nIt has been previously observed that variational autoencoders tend to ignore the latent code when combined with a decoding distribution that is too flexible. 
This undermines the purpose of unsupervised representation learning. In this paper, we additionally show that existing training criteria can lead to extremely poor amortized inference distributions and overestimation of the posterior variance, even when trained to optimality. We identify the reason for both shortcomings in the regularization term used in the ELBO criterion to match the variational posterior to the latent prior distribution. We propose a class of training criteria termed InfoVAE that solves the two problems. We show that these models maximize the mutual information between input and latent features, make effective use of the latent features regardless of the flexibility of the decoding distribution, and avoid the variance over-estimation problem. Through extensive qualitative and quantitative analyses, we demonstrate that our models do not suffer from these problems, and outperform models trained with ELBO on multiple metrics of performance.\nCan computers overcome human capabilities? This is a paradoxical and controversial question, particularly because there are many hidden assumptions. This article focuses on that issue, highlighting some misconceptions related to future generations of machines and the understanding of the brain. It will be discussed to what extent computers might reach human capabilities, and how it could be possible only if the computer is a conscious machine. However, it will be shown that if the computer is conscious, an interference process due to consciousness would affect the information processing of the system. Therefore, it might be possible to make conscious machines that overcome human capabilities, but they will have limitations just as humans do. 
In other words, trying to overcome human capabilities with computers implies the paradoxical conclusion that a computer will never overcome human capabilities at all, or if the computer does, it should not be considered as a computer anymore.\nWe explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.\nIn this paper, we introduce a generalized value iteration network (GVIN), which is an end-to-end neural network planning module. GVIN emulates the value iteration algorithm by using a novel graph convolution operator, which enables GVIN to learn and plan on irregular spatial graphs. We propose three novel differentiable kernels as graph convolution operators and show that the embedding based kernel achieves the best performance. We further propose episodic Q-learning, an improvement upon traditional n-step Q-learning that stabilizes training for networks that contain a planning module. 
Lastly, we evaluate GVIN on planning problems in 2D mazes, irregular graphs, and real-world street networks, showing that GVIN generalizes well for both arbitrary graphs and unseen graphs of larger scale and outperforms a naive generalization of VIN (discretizing a spatial graph into a 2D image).\nAutomating statistical modelling is a challenging problem in artificial intelligence. The Automatic Statistician takes a first step in this direction, by employing a kernel search algorithm with Gaussian Processes (GP) to provide interpretable statistical models for regression problems. However this does not scale due to its $O(N^3)$ running time for the model selection. We propose Scalable Kernel Composition (SKC), a scalable kernel search algorithm that extends the Automatic Statistician to bigger data sets. In doing so, we derive a cheap upper bound on the GP marginal likelihood that sandwiches the marginal likelihood with the variational lower bound . We show that the upper bound is significantly tighter than the lower bound and thus useful for model selection.\nThe continuing development of Semantic Web technologies and the increasing user adoption in the recent years have accelerated the progress incorporating explicit semantics with data on the Web. With the rapidly growing RDF (Resource Description Framework) data on the Semantic Web, processing large semantic graph data have become more challenging. Constructing a summary graph structure from the raw RDF can help obtain semantic type relations and reduce the computational complexity for graph processing purposes. In this paper, we addressed the problem of graph summarization in RDF graphs, and we proposed an approach for building summary graph structures automatically from RDF graph data. Moreover, we introduced a measure to help discover optimum class dissimilarity thresholds and an effective method to discover the type classes automatically. 
In future work, we plan to investigate further improvement options on the scalability of the proposed method.\nCommon-sense or background knowledge is required to understand natural language, but in most neural natural language understanding (NLU) systems, the requisite background knowledge is indirectly acquired from static corpora. We develop a new reading architecture for the dynamic integration of explicit background knowledge in NLU models. A new task-agnostic reading module provides refined word representations to a task-specific NLU architecture by processing background knowledge in the form of free-text statements, together with the task-specific inputs. Strong performance on the tasks of document question answering (DQA) and recognizing textual entailment (RTE) demonstrates the effectiveness and flexibility of our approach. Analysis shows that our models learn to exploit knowledge selectively and in a semantically appropriate way.\nDigital games are one of the major and most important fields in the entertainment domain, which also involves cinema and music. Numerous attempts have been made to improve the quality of the games including more realistic artistic production and computer science. Assessing the player's behavior, a task known as player modeling, is currently the need of the hour which leads to possible improvements in terms of: (i) better game interaction experience, (ii) better exploitation of the relationship between players, and (iii) increasing/maintaining the number of players interested in the game. In this paper we model players using the basic four behaviors proposed in \\cite{BartleArtigo}, namely: achiever, explorer, socializer and killer. Our analysis is carried out using data obtained from the game \"World of Warcraft\" over 3 years (2006 $-$ 2009). 
We employ a semi-supervised learning technique in order to find out characteristics that possibly impact player's behavior.\nWe present a new preprocessing algorithm for embedding the nodes of a given edge-weighted undirected graph into a Euclidean space. The Euclidean distance between any two nodes in this space approximates the length of the shortest path between them in the given graph. Later, at runtime, a shortest path between any two nodes can be computed with A* search using the Euclidean distances as heuristic. Our preprocessing algorithm, called FastMap, is inspired by the data mining algorithm of the same name and runs in near-linear time. Hence, FastMap is orders of magnitude faster than competing approaches that produce a Euclidean embedding using Semidefinite Programming. FastMap also produces admissible and consistent heuristics and therefore guarantees the generation of shortest paths. Moreover, FastMap applies to general undirected graphs for which many traditional heuristics, such as the Manhattan Distance heuristic, are not well defined. Empirically, we demonstrate that A* search using the FastMap heuristic is competitive with A* search using other state-of-the-art heuristics, such as the Differential heuristic.\nWe provide a novel notion of what it means to be interpretable, looking past the usual association with human understanding. Our key insight is that interpretability is not an absolute concept and so we define it relative to a target model, which may or may not be a human. We define a framework that allows for comparing interpretable procedures by linking it to important practical aspects such as accuracy and robustness. We characterize many of the current state-of-the-art interpretable methods in our framework portraying its general applicability. 
Finally, principled interpretable strategies are proposed and empirically evaluated on synthetic data, as well as on the largest public olfaction dataset that was made recently available \\cite{olfs}. We also experiment on MNIST with a simple target model and different oracle models of varying complexity. This leads to the insight that the improvement in the target model is not only a function of the oracle model's performance, but also its relative complexity with respect to the target model.\nOn a daily investment decision in a security market, the price earnings (PE) ratio is one of the most widely applied methods being used as a firm valuation tool by investment experts. Unfortunately, recent academic developments in financial econometrics and machine learning rarely look at this tool. In practice, fundamental PE ratios are often estimated only by subjective expert opinions. The purpose of this research is to formalize a process of fundamental PE estimation by employing advanced dynamic Bayesian network (DBN) methodology. The estimated PE ratio from our model can be used either as an information support for an expert to make investment decisions, or as an automatic trading system illustrated in experiments. Forward-backward inference and EM parameter estimation algorithms are derived with respect to the proposed DBN structure. Unlike existing works in the literature, the economic interpretation of our DBN model is well-justified by behavioral finance evidence of volatility. A simple but practical trading strategy is invented based on the result of Bayesian inference. 
Extensive experiments show that our trading strategy equipped with the inferenced PE ratios consistently outperforms standard investment benchmarks.\nIn this paper we explore methods to exploit symmetries for ensuring sample efficiency in reinforcement learning (RL), this problem deserves ever increasing attention with the recent advances in the use of deep networks for complex RL tasks which require large amount of training data. We introduce a novel method to detect symmetries using reward trails observed during episodic experience and prove its completeness. We also provide a framework to incorporate the discovered symmetries for functional approximation. Finally we show that the use of potential based reward shaping is especially effective for our symmetry exploitation mechanism. Experiments on various classical problems show that our method improves the learning performance significantly by utilizing symmetry information.\nIn this research, we investigate the subject of path-finding. A pruned version of visibility graph based on Candidate Vertices is formulated, followed by a new visibility check technique. Such combination enables us to quickly identify the useful vertices and thus find the optimal path more efficiently. The algorithm proposed is demonstrated on various path-finding cases. The performance of the new technique on visibility graphs is compared to the traditional A* on Grids, Theta* and A* on Visibility Graphs in terms of path length, number of nodes evaluated, as well as computational time. The key algorithmic contribution is that the new approach combines the merits of grid-based method and visibility graph-based method and thus yields better overall performance.\nWe study the skip-thought model with neighborhood information as weak supervision. More specifically, we propose a skip-thought neighbor model to consider the adjacent sentences as a neighborhood. 
We train our skip-thought neighbor model on a large corpus with continuous sentences, and then evaluate the trained model on 7 tasks, which include semantic relatedness, paraphrase detection, and classification benchmarks. Both quantitative comparison and qualitative investigation are conducted. We empirically show that our skip-thought neighbor model performs as well as the skip-thought model on evaluation tasks. In addition, we found that incorporating an autoencoder path in our model didn't help our model perform better, while it hurts the performance of the skip-thought model.\nMost existing matching algorithms are one-off algorithms, i.e., they usually measure the distance between the two image feature representation vectors only once. In contrast, the human visual system achieves this task, i.e., image matching, by recursively looking at specific/related parts of both images and then making the final judgement. Towards this end, we propose a novel loopy recurrent neural network (Loopy RNN), which is capable of aggregating relationship information of two input images in a progressive/iterative manner and outputting the consolidated matching score in the final iteration. A Loopy RNN has two distinctive features. First, built on conventional long short-term memory (LSTM) nodes, it links the output gate of the tail node to the input gate of the head node, which gives it the symmetry property required for matching. Second, a monotonous loss designed for the proposed network guarantees increasing confidence during the recursive matching process. Extensive experiments on several image matching benchmarks demonstrate the great potential of the proposed method.\nCommunication is a critical factor for the big multi-agent world to stay organized and productive. 
Typically, most previous multi-agent \"learning-to-communicate\" studies try to predefine the communication protocols or use technologies such as tabular reinforcement learning and evolutionary algorithm, which can not generalize to changing environment or large collection of agents.   In this paper, we propose an Actor-Coordinator-Critic Net (ACCNet) framework for solving \"learning-to-communicate\" problem. The ACCNet naturally combines the powerful actor-critic reinforcement learning technology with deep learning technology. It can efficiently learn the communication protocols even from scratch under partially observable environment. We demonstrate that the ACCNet can achieve better results than several baselines under both continuous and discrete action space environments. We also analyse the learned protocols and discuss some design considerations.\nWe consider the task of evaluating a policy for a Markov decision process (MDP). The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance. We show that the data collected from deploying a different policy, commonly called the behavior policy, can be used to produce unbiased estimates with lower mean squared error than this standard technique. We derive an analytic expression for the optimal behavior policy --- the behavior policy that minimizes the mean squared error of the resulting estimates. Because this expression depends on terms that are unknown in practice, we propose a novel policy evaluation sub-problem, behavior policy search: searching for a behavior policy that reduces mean squared error. We present a behavior policy search algorithm and empirically demonstrate its effectiveness in lowering the mean squared error of policy performance estimates.\nHyperparameter tuning is one of the most time-consuming workloads in deep learning. 
State-of-the-art optimizers, such as AdaGrad, RMSProp and Adam, reduce this labor by adaptively tuning an individual learning rate for each variable. Recently, researchers have shown renewed interest in simpler methods like momentum SGD as they may yield better test metrics. Motivated by this trend, we ask: can simple adaptive methods based on SGD perform as well or better? We revisit the momentum SGD algorithm and show that hand-tuning a single learning rate and momentum makes it competitive with Adam. We then analyze its robustness to learning rate misspecification and objective curvature variation. Based on these insights, we design YellowFin, an automatic tuner for momentum and learning rate in SGD. YellowFin optionally uses a negative-feedback loop to compensate for the momentum dynamics in asynchronous settings on the fly. We empirically show that YellowFin can converge in fewer iterations than Adam on ResNets and LSTMs for image recognition, language modeling and constituency parsing, with a speedup of up to 3.28x in synchronous and up to 2.69x in asynchronous settings.\nFactoid question answering (QA) has recently benefited from the development of deep learning (DL) systems. Neural network models outperform traditional approaches in domains where large datasets exist, such as SQuAD (ca. 100,000 questions) for Wikipedia articles. However, these systems have not yet been applied to QA in more specific domains, such as biomedicine, because datasets are generally too small to train a DL system from scratch. For example, the BioASQ dataset for biomedical QA comprises fewer than 900 factoid (single answer) and list (multiple answers) QA instances. In this work, we adapt a neural QA system trained on a large open-domain dataset (SQuAD, source) to a biomedical dataset (BioASQ, target) by employing various transfer learning techniques. 
Our network architecture is based on a state-of-the-art QA system, extended with biomedical word embeddings and a novel mechanism to answer list questions. In contrast to existing biomedical QA systems, our system does not rely on domain-specific ontologies, parsers or entity taggers, which are expensive to create. Despite this fact, our systems achieve state-of-the-art results on factoid questions and competitive results on list questions.\nFor sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been previously learned from human feedback.\nUnsupervised learning of low-dimensional, semantic representations of words and entities has recently gained attention. In this paper we describe the Semantic Entity Retrieval Toolkit (SERT) that provides implementations of our previously published entity representation models. The toolkit provides a unified interface to different representation learning algorithms, fine-grained parsing configuration and can be used transparently with GPUs. In addition, users can easily modify existing models or implement their own models in the framework. 
After model training, SERT can be used to rank entities according to a textual query and extract the learned entity/word representation for use in downstream algorithms, such as clustering or recommendation.\nMeasurement error in the observed values of the variables can greatly change the output of various causal discovery methods. This problem has received much attention in multiple fields, but it is not clear to what extent the causal model for the measurement-error-free variables can be identified in the presence of measurement error with unknown variance. In this paper, we study precise sufficient identifiability conditions for the measurement-error-free causal model and show what information of the causal model can be recovered from observed data. In particular, we present two different sets of identifiability conditions, based on the second-order statistics and higher-order statistics of the data, respectively. The former was inspired by the relationship between the generating model of the measurement-error-contaminated data and the factor analysis model, and the latter makes use of the identifiability result of the over-complete independent component analysis problem.\nThe population in Sweden is growing rapidly due to immigration. In this light, the issue of infrastructure upgrades to provide telecommunication services is of importance. New antennas can be installed at hot spots of user demand, which will require an investment, and/or the clientele expansion can be carried out in a planned manner to promote the exploitation of the infrastructure in the less loaded geographical zones. In this paper, we explore the second alternative. Informally speaking, the term Infrastructure-Stressing describes a user who stays in the zones of high demand, which are prone to produce service failures, if further loaded. We have studied the Infrastructure-Stressing population in the light of their correlation with geo-demographic segments. 
This is motivated by the fact that specific geo-demographic segments can be targeted via marketing campaigns. Fuzzy logic is applied to create an interface between big data, numeric methods for processing big data and a manager.\nA major investment made by a telecom operator goes into the infrastructure and its maintenance, while business revenues are proportional to how big and good the customer base is. We present a data-driven analytic strategy based on combinatorial optimization and analysis of historical data. The data cover historical mobility of the users in one region of Sweden during a week. Applying the proposed method to the case study, we have identified the optimal proportion of geo-demographic segments in the customer base, developed a functionality to assess the potential of a planned marketing campaign, and explored the problem of an optimal number and types of the geo-demographic segments to target through marketing campaigns. With the help of fuzzy logic, the conclusions of data analysis are automatically translated into comprehensible recommendations in a natural language.\nAutomatic summarisation is a popular approach to reduce a document to its main arguments. Recent research in the area has focused on neural approaches to summarisation, which can be very data-hungry. However, few large datasets exist and none for the traditionally popular domain of scientific publications, which opens up challenging research avenues centered on encoding large, complex documents. In this paper, we introduce a new dataset for summarisation of computer science publications by exploiting a large resource of author provided summaries and show straightforward ways of extending it further. 
We develop models on the dataset making use of both neural sentence encoding and traditionally used summarisation features and show that models which encode sentences as well as their local and global context perform best, significantly outperforming well-established baseline methods.\nIn this paper we provide a first analysis of the research questions that arise when dealing with the problem of communicating pieces of formal argumentation through natural language interfaces. It is a generally held opinion that formal models of argumentation naturally capture human argument, and some preliminary studies have focused on justifying this view. Unfortunately, the results are not only inconclusive, but seem to suggest that explaining formal argumentation to humans is a rather articulated task. Graphical models for expressing argumentation-based reasoning are appealing, but often humans require significant training to use these tools effectively. We claim that natural language interfaces to formal argumentation systems offer a real alternative, and may be the way forward for systems that capture human argument.\nWe show that relation extraction can be reduced to answering simple reading comprehension questions, by associating one or more natural-language questions with each relation slot. This reduction has several advantages: we can (1) learn relation-extraction models by extending recent neural reading-comprehension techniques, (2) build very large training sets for those models by combining relation-specific crowd-sourced questions with distant supervision, and even (3) do zero-shot learning by extracting new relation types that are only specified at test-time, for which we have no labeled training examples. 
Experiments on a Wikipedia slot-filling task demonstrate that the approach can generalize to new questions for known relation types with high accuracy, and that zero-shot generalization to unseen relation types is possible, at lower accuracy levels, setting the bar for future work on this task.\nA reinforcement algorithm solves a classical optimization problem by introducing a feedback to the system which slowly changes the energy landscape and converges the algorithm to an optimal solution in the configuration space. Here, we use this strategy to concentrate (localize) preferentially the wave function of a quantum particle, which explores the configuration space of the problem, on an optimal configuration. We examine the method by solving numerically the equations governing the evolution of the system, which are similar to the nonlinear Schr\\\"odinger equations, for small problem sizes. In particular, we observe that reinforcement increases the minimal energy gap of the system in a quantum annealing algorithm. Our numerical simulations and the latter observation show that such kind of quantum feedbacks might be helpful in solving a computationally hard optimization problem by a quantum reinforcement algorithm.\nIn built infrastructure monitoring, an efficient path planning algorithm is essential for robotic inspection of large surfaces using computer vision. In this work, we first formulate the inspection path planning problem as an extended travelling salesman problem (TSP) in which both the coverage and obstacle avoidance were taken into account. An enhanced discrete particle swarm optimization (DPSO) algorithm is then proposed to solve the TSP, with performance improvement by using deterministic initialization, random mutation, and edge exchange. Finally, we take advantage of parallel computing to implement the DPSO in a GPU-based framework so that the computation time can be significantly reduced while keeping the hardware requirement unchanged. 
To show the effectiveness of the proposed algorithm, experimental results are included for datasets obtained from UAV inspection of an office building and a bridge.\nMapping in the GPS-denied environment is an important and challenging task in the field of robotics. In the large environment, mapping can be significantly accelerated by multiple robots exploring different parts of the environment. Accordingly, a key problem is how to integrate these local maps built by different robots into a single global map. In this paper, we propose an approach for simultaneous merging of multiple grid maps by the robust motion averaging. The main idea of this approach is to recover all global motions for map merging from a set of relative motions. Therefore, it firstly adopts the pair-wise map merging method to estimate relative motions for grid map pairs. To obtain as many reliable relative motions as possible, a graph-based sampling scheme is utilized to efficiently remove unreliable relative motions obtained from the pair-wise map merging. Subsequently, the accurate global motions can be recovered from the set of reliable relative motions by the motion averaging. Experimental results carried on real robot data sets demonstrate that proposed approach can achieve simultaneous merging of multiple grid maps with good performances.\nWe propose a two-stage neural model to tackle question generation from documents. Our model first estimates the probability that word sequences in a document compose \"interesting\" answers using a neural model trained on a question-answering corpus. We thus take a data-driven approach to interestingness. Predicted key phrases then act as target answers that condition a sequence-to-sequence question generation model with a copy mechanism. Empirically, our neural key phrase detection model significantly outperforms an entity-tagging baseline system and existing rule-based approaches. 
We demonstrate that the question generator formulates good quality natural language questions from extracted key phrases, and a human study indicates that our system's generated question-answer pairs are competitive with those of an earlier approach. We foresee our system being used in an educational setting to assess reading comprehension and also as a data augmentation technique for semi-supervised learning.\nMany existing global constraints can be encoded as a conjunction of among constraints. An among constraint holds if the number of the variables in its scope whose value belongs to a prespecified set, which we call its range, is within some given bounds. It is known that domain filtering algorithms can benefit from reasoning about the interaction of among constraints so that values can be filtered out taking into consideration several among constraints simultaneously. The present paper embarks on a systematic investigation of the circumstances under which it is possible to obtain efficient and complete domain filtering algorithms for conjunctions of among constraints. We start by observing that restrictions on both the scope and the range of the among constraints are necessary to obtain meaningful results. Then, we derive a domain flow-based filtering algorithm and present several applications. In particular, it is shown that the algorithm unifies and generalizes several previously existing results.\nAs a step towards developing zero-shot task generalization capabilities in reinforcement learning (RL), we introduce a new RL problem where the agent should learn to execute sequences of instructions after learning useful skills that solve subtasks. In this problem, we consider two types of generalizations: to previously unseen instructions and to longer sequences of instructions. For generalization over unseen instructions, we propose a new objective which encourages learning correspondences between similar subtasks by making analogies. 
For generalization over sequential instructions, we present a hierarchical architecture where a meta controller learns to use the acquired skills for executing the instructions. To deal with delayed reward, we propose a new neural architecture in the meta controller that learns when to update the subtask, which makes learning more efficient. Experimental results on a stochastic 3D domain show that the proposed ideas are crucial for generalization to longer instructions as well as unseen instructions.\nJoint extraction of entities and relations is an important task in information extraction. To tackle this problem, we firstly propose a novel tagging scheme that can convert the joint extraction task to a tagging problem. Then, based on our tagging scheme, we study different end-to-end models to extract entities and their relations directly, without identifying entities and relations separately. We conduct experiments on a public dataset produced by distant supervision method and the experimental results show that the tagging based methods are better than most of the existing pipelined and joint learning methods. What's more, the end-to-end model proposed in this paper, achieves the best results on the public dataset.\nMuch of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions. Negotiations require complex communication and reasoning skills, but success is easy to measure, making this an interesting task for AI. We gather a large dataset of human-human negotiations on a multi-issue bargaining task, where agents who cannot observe each other's reward functions must reach an agreement (or a deal) via natural language dialogue. For the first time, we show it is possible to train end-to-end models for negotiation, which must learn both linguistic and reasoning skills with no annotated dialogue states. 
We also introduce dialogue rollouts, in which the model plans ahead by simulating possible complete continuations of the conversation, and find that this technique dramatically improves performance. Our code and dataset are publicly available (https://github.com/facebookresearch/end-to-end-negotiator).\nWe consider the question of extending propositional logic to a logic of plausible reasoning, and posit four requirements that any such extension should satisfy. Each is a requirement that some property of classical propositional logic be preserved in the extended logic; as such, the requirements are simpler and less problematic than those used in Cox's Theorem and its variants. As with Cox's Theorem, our requirements imply that the extended logic must be isomorphic to (finite-set) probability theory. We also obtain specific numerical values for the probabilities, recovering the classical definition of probability as a theorem, with truth assignments that satisfy the premise playing the role of the \"possible cases.\"\nWe study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal. This class of learning problems is difficult because of the often large combined action and observation spaces. In the fully centralized and decentralized approaches, we find the problem of spurious rewards and a phenomenon we call the \"lazy agent\" problem, which arises due to partial observability. We address these problems by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions. 
We perform an experimental evaluation across a range of partially-observable multi-agent domains and show that learning such value-decompositions leads to superior results, in particular when combined with weight sharing, role information and information channels.\nAdaptive gradient methods have become recently very popular, in particular as they have been shown to be useful in the training of deep neural networks. In this paper we have analyzed RMSProp, originally proposed for the training of deep neural networks, in the context of online convex optimization and show $\\sqrt{T}$-type regret bounds. Moreover, we propose two variants SC-Adagrad and SC-RMSProp for which we show logarithmic regret bounds for strongly convex functions. Finally, we demonstrate in the experiments that these new variants outperform other adaptive gradient techniques or stochastic gradient descent in the optimization of strongly convex functions as well as in training of deep neural networks.\nMany tourist applications provide a personalized tourist agenda with the list of recommended activities to the user. These applications must undoubtedly deal with the constraints and preferences that define the user interests. Among these preferences, we can find those that define the travel style of the user, such as the rhythm of the trip, the number of visits to include in the tour or the priority to visits of special interest for the user. In this paper, we deal with the task of creating a customized tourist agenda as a planning and scheduling application capable of conveniently scheduling the most appropriate goals (visits) so as to maximize the user satisfaction with the tourist route. This paper makes an analysis of the meaning of the travel style preferences and compares the quality of the solutions obtained by two different solvers, a PDDL-based planner and a Constraint Satisfaction Problem solver. 
We also define several quality metrics and perform extensive experiments in order to evaluate the results obtained with both solvers.\nDiffusions and related random walk procedures are of central importance in many areas of machine learning, data analysis, and applied mathematics. Because they spread mass agnostically at each step in an iterative manner, they can sometimes spread mass \"too aggressively,\" thereby failing to find the \"right\" clusters. We introduce a novel Capacity Releasing Diffusion (CRD) Process, which is both faster and stays more local than the classical spectral diffusion process. As an application, we use our CRD Process to develop an improved local algorithm for graph clustering. Our local graph clustering method can find local clusters in a model of clustering where one begins the CRD Process in a cluster whose vertices are connected better internally than externally by an $O(\\log^2 n)$ factor, where $n$ is the number of nodes in the cluster. Thus, our CRD Process is the first local graph clustering algorithm that is not subject to the well-known quadratic Cheeger barrier. Our result requires a certain smoothness condition, which we expect to be an artifact of our analysis. Our empirical evaluation demonstrates improved results, in particular for realistic social graphs where there are moderately good---but not very good---clusters.\nThis work proposes a new algorithm for training a re-weighted L2 Support Vector Machine (SVM), inspired on the re-weighted Lasso algorithm of Cand\\`es et al. and on the equivalence between Lasso and SVM shown recently by Jaggi. In particular, the margin required for each training vector is set independently, defining a new weighted SVM model. These weights are selected to be binary, and they are automatically adapted during the training of the model, resulting in a variation of the Frank-Wolfe optimization algorithm with essentially the same computational complexity as the original algorithm. 
As shown experimentally, this algorithm is computationally cheaper to apply since it requires fewer iterations to converge, and it produces models that have a sparser representation in terms of support vectors and are more stable with respect to the selection of the regularization hyper-parameter.\nThe use of semi-autonomous and autonomous robotic assistants to aid in the care of the elderly is expected to ease the burden on human caretakers, with small-stage testing already occurring in a variety of countries. Yet, it is likely that these robots will need to request human assistance via teleoperation when domain expertise is needed for a specific task. As deployment of robotic assistants moves to scale, mapping these requests for human aid to the teleoperators themselves will be a difficult online optimization problem. In this paper, we design a system that allocates requests to a limited number of teleoperators, each with different specialities, in an online fashion. We generalize a recent model of online job scheduling with a worst-case competitive-ratio bound to our setting. Next, we design a scalable machine-learning-based teleoperator-aware task scheduling algorithm and show, experimentally, that it performs well when compared to an omniscient optimal scheduling algorithm.\nInteger Linear Programming (ILP) has a broad range of applications in various areas of artificial intelligence. Yet in spite of recent advances, we still lack a thorough understanding of which structural restrictions make ILP tractable. Here we study ILP instances consisting of a small number of \"global\" variables and/or constraints such that the remaining part of the instance consists of small and otherwise independent components; this is captured in terms of a structural measure we call fracture backdoors which generalizes, for instance, the well-studied class of N-fold ILP instances.   Our main contributions can be divided into three parts. 
First, we formally develop fracture backdoors and obtain exact and approximation algorithms for computing these. Second, we exploit these backdoors to develop several new parameterized algorithms for ILP; the performance of these algorithms will naturally scale based on the number of global variables or constraints in the instance. Finally, we complement the developed algorithms with matching lower bounds. Altogether, our results paint a near-complete complexity landscape of ILP with respect to fracture backdoors.\nAs a means of human-based computation, crowdsourcing has been widely used to annotate large-scale unlabeled datasets. One of the obvious challenges is how to aggregate these possibly noisy labels provided by a set of heterogeneous annotators. Another challenge stems from the difficulty in evaluating the annotator reliability without even knowing the ground truth, which can be used to build incentive mechanisms in crowdsourcing platforms. When each instance is associated with many possible labels simultaneously, the problem becomes even harder because of its combinatorial nature. In this paper, we present new flexible Bayesian models and efficient inference algorithms for multi-label annotation aggregation by taking both annotator reliability and label dependency into account. Extensive experiments on real-world datasets confirm that the proposed methods outperform other competitive alternatives, and the model can recover the type of the annotators with high accuracy. In addition, we empirically find that a mixture of multiple independent Bernoulli distributions is able to accurately capture label dependency in this unsupervised multi-label annotation aggregation scenario.\nIn this report, we provide a comparative analysis of different techniques for user intent classification towards the task of app recommendation. 
We analyse the performance of different models and architectures for multi-label classification over a dataset with a relatively large number of classes and only a handful of examples of each class. We focus, in particular, on memory network architectures, and compare how well the different versions perform under the task constraints. Since the classifier is meant to serve as a module in a practical dialog system, it needs to be able to work with limited training data and incorporate new data on the fly. We devise a 1-shot learning task to test the models under the above constraint. We conclude that relatively simple versions of memory networks perform better than other approaches. However, for tasks with very limited data, simple non-parametric methods perform comparably, without needing the extra training data.\nWe propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1--4\\% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The code is available at https://github.com/jklj077/meProp\nHuman conversation is inherently complex, often spanning many different topics/domains. This makes policy learning for dialogue systems very challenging. 
Standard flat reinforcement learning methods do not provide an efficient framework for modelling such dialogues. In this paper, we focus on the under-explored problem of multi-domain dialogue management. First, we propose a new method for hierarchical reinforcement learning using the option framework. Next, we show that the proposed architecture learns faster and arrives at a better policy than the existing flat ones do. Moreover, we show how pretrained policies can be adapted to more complex systems with an additional set of new actions. In doing that, we show that our approach has the potential to facilitate policy optimisation for more sophisticated multi-domain dialogue systems.\nGenerative adversarial nets (GANs) are a promising technique for modeling a distribution from samples. It is however well known that GAN training suffers from instability due to the nature of its maximin formulation. In this paper, we explore ways to tackle the instability problem by dualizing the discriminator. We start from linear discriminators in which case conjugate duality provides a mechanism to reformulate the saddle point objective into a maximization problem, such that both the generator and the discriminator of this 'dualing GAN' act in concert. We then demonstrate how to extend this intuition to non-linear formulations. For GANs with linear discriminators our approach is able to remove the instability in training, while for GANs with nonlinear discriminators our approach provides an alternative to the commonly used GAN training algorithm.\nIn \"The Logic of Campaigning\", Dean and Parikh consider a candidate making campaign statements to appeal to the voters. They model these statements as Boolean formulas over variables that represent stances on the issues, and study optimal candidate strategies under three proposed models of voter preferences based on the assignments that satisfy these formulas. 
We prove that voter utility evaluation is computationally hard under these preference models (in one case, #P-hard), along with certain problems related to candidate strategic reasoning. Our results raise questions about the desirable characteristics of a voter preference model and to what extent a polynomial-time-evaluable function can capture them.\nThis paper presents preliminary results of our work with a major financial company, where we try to use methods of plan recognition in order to investigate the interactions of a customer with the company's online interface. In this paper, we present the first steps of integrating a plan recognition algorithm in a real-world application for detecting and analyzing the interactions of a customer. It uses a novel approach for plan recognition from bare-bone UI data, which reasons about the plan library at the lowest recognition level in order to define the relevancy of actions in our domain, and then uses it to perform plan recognition.   We present preliminary results of inference on three different use-cases modeled by domain experts from the company, and show that this approach manages to decrease the overload of information required from an analyst to evaluate a customer's session: whether this is a malicious or benign session, whether the intended tasks were completed, and if not, what actions are expected next.\nThe highly influential framework of conceptual spaces provides a geometric way of representing knowledge. Instances are represented by points in a high-dimensional space and concepts are represented by convex regions in this space. After pointing out a problem with the convexity requirement, we propose a formalization of conceptual spaces based on fuzzy star-shaped sets. Our formalization uses a parametric definition of concepts and extends the original framework by adding means to represent correlations between different domains in a geometric way. 
Moreover, we define computationally efficient operations on concepts (intersection, union, and projection onto a subspace) and show that these operations can support both learning and reasoning processes.\nInspired by the recent evolution of deep neural networks (DNNs) in machine learning, we explore their application to PL-related topics. This paper is the first step towards this goal; we propose a proof-synthesis method for the negation-free propositional logic in which we use a DNN to guide proof search. The idea is to view the proof-synthesis problem as a translation from a proposition to its proof. We train seq2seq, which is a popular network in neural machine translation, so that it generates a proof, encoded as a $\\lambda$-term, of a given proposition. We implement the whole framework and empirically observe that a generated proof term is close to a correct proof in terms of the tree edit distance of ASTs. This observation justifies using the output from a trained seq2seq model as a guide for proof search.\nThe neural network is a powerful computing framework that has been exploited by biological evolution and by humans for solving diverse problems. Although the computational capabilities of neural networks are determined by their structure, the current understanding of the relationships between a neural network's architecture and function is still primitive. Here we reveal that a neural network's modular architecture plays a vital role in determining the neural dynamics and memory performance of the network. In particular, we demonstrate that there exists an optimal modularity for memory performance, where a balance between local cohesion and global connectivity is established, allowing optimally modular networks to remember longer. 
Our results suggest that insights from dynamical analysis of neural networks and information spreading processes can be leveraged to better design neural networks and may shed light on the brain's modular organization.\nWe introduce a new formulation of the Hidden Parameter Markov Decision Process (HiP-MDP), a framework for modeling families of related tasks using low-dimensional latent embeddings. Our new framework correctly models the joint uncertainty in the latent parameters and the state space. We also replace the original Gaussian Process-based model with a Bayesian Neural Network, enabling more scalable inference. Thus, we expand the scope of the HiP-MDP to applications with higher dimensions and more complex dynamics.\nThis paper presents a word-entity duet framework for utilizing knowledge bases in ad-hoc retrieval. In this work, the query and documents are modeled by word-based representations and entity-based representations. Ranking features are generated by the interactions between the two representations, incorporating information from the word space, the entity space, and the cross-space connections through the knowledge graph. To handle the uncertainties from the automatically constructed entity representations, an attention-based ranking model AttR-Duet is developed. With back-propagation from ranking labels, the model learns simultaneously how to demote noisy entities and how to rank documents with the word-entity duet. 
Evaluation results on the TREC Web Track ad-hoc task demonstrate that all of the four-way interactions in the duet are useful, the attention mechanism successfully steers the model away from noisy entities, and together they significantly outperform both word-based and entity-based learning to rank systems.\nThis paper addresses the design and implementation of complex Reinforcement Learning (RL) behaviors where multi-dimensional action spaces are involved, as well as the need to execute the behaviors in real-time using robotic platforms with limited computational resources and training times. For this purpose, we propose the use of decentralized RL, in combination with finite support basis functions as alternatives to Gaussian RBF, in order to alleviate the effects of the curse of dimensionality on the action and state spaces respectively, and to reduce the computation time. As a testbed, an RL-based controller for the in-walk kick in NAO robots, a challenging and critical problem for soccer robotics, is used. The reported experiments show empirically that our solution saves up to 99.94% of execution time and 98.82% of memory consumption during execution, without diminishing performance compared to classical approaches.\nIn this paper, we try to solve the problem of temporal link prediction in information networks. This implies predicting the time it takes for a link to appear in the future, given its features that have been extracted at the current network snapshot. To this end, we introduce a probabilistic non-parametric approach, called \"Non-Parametric Generalized Linear Model\" (NP-GLM), which infers the hidden underlying probability distribution of the link advent time given its features. We then present a learning algorithm for NP-GLM and an inference method to answer time-related queries. 
Extensive experiments conducted on both synthetic data and real-world Sina Weibo social network demonstrate the effectiveness of NP-GLM in solving temporal link prediction problem vis-a-vis competitive baselines.\nMotor adaptation displays a structure-learning effect: adaptation to a new perturbation occurs more quickly when the subject has prior exposure to perturbations with related structure. Although this `learning-to-learn' effect is well documented, its underlying computational mechanisms are poorly understood. We present a new model of motor structure learning, approaching it from the point of view of deep reinforcement learning. Previous work outside of motor control has shown how recurrent neural networks can account for learning-to-learn effects. We leverage this insight to address motor learning, by importing it into the setting of model-based reinforcement learning. We apply the resulting processing architecture to empirical findings from a landmark study of structure learning in target-directed reaching (Braun et al., 2009), and discuss its implications for a wider range of learning-to-learn phenomena.\nIn this work, we present Web-STAR, an online platform for story understanding built on top of the STAR (STory comprehension through ARgumentation) reasoning engine. This platform includes a web-based IDE, integration with the STAR system and a web service infrastructure to support integration with other systems that rely on story understanding functionality to complete their tasks. The platform also delivers a number of \"social\" features like public story sharing with a built-in commenting system, a public repository for sharing stories with the community and collaboration tools that can be used from both project team members for development and educators for teaching. Moreover, we discuss the ongoing work on adding new features and functionality to this platform.\nWe propose a new system for generating art. 
The system generates art by looking at art and learning about style; and becomes creative by increasing the arousal potential of the generated art by deviating from the learned styles. We build over Generative Adversarial Networks (GAN), which have shown the ability to learn to generate novel images simulating a given distribution. We argue that such networks are limited in their ability to generate creative products in their original design. We propose modifications to its objective to make it capable of generating creative art by maximizing deviation from established styles and minimizing deviation from art distribution. We conducted experiments to compare the response of human subjects to the generated art with their response to art created by artists. The results show that human subjects could not distinguish art generated by the proposed system from art generated by contemporary artists and shown in top art fairs. Human subjects even rated the generated images higher on various scales.\nRecently, a technique called Layer-wise Relevance Propagation (LRP) was shown to deliver insightful explanations in the form of input space relevances for understanding feed-forward neural network classification decisions. In the present work, we extend the usage of LRP to recurrent neural networks. We propose a specific propagation rule applicable to multiplicative connections as they arise in recurrent network architectures such as LSTMs and GRUs. We apply our technique to a word-based bi-directional LSTM model on a five-class sentiment prediction task, and evaluate the resulting LRP relevances both qualitatively and quantitatively, obtaining better results than a gradient-based related method which was used in previous work.\nTo perform tasks specified by natural language instructions, autonomous agents need to extract semantically meaningful representations of language and map it to visual elements and actions in the environment. 
This problem is called task-oriented language grounding. We propose an end-to-end trainable neural architecture for task-oriented language grounding in 3D environments which assumes no prior linguistic or perceptual knowledge and requires only raw pixels from the environment and the natural language instruction as input. The proposed model combines the image and text representations using a Gated-Attention mechanism and learns a policy to execute the natural language instruction using standard reinforcement and imitation learning methods. We show the effectiveness of the proposed model on unseen instructions as well as unseen maps, both quantitatively and qualitatively. We also introduce a novel environment based on a 3D game engine to simulate the challenges of task-oriented language grounding over a rich set of instructions and environment states.\nTechnological advancement in Wireless Sensor Networks (WSN) has made them an invaluable component of a reliable environmental monitoring system; they form the 'digital skin' through which to 'sense' and collect the context of the surroundings, and they provide information on the processes leading to complex events such as drought. However, these environmental properties are measured by various heterogeneous sensors of different modalities in distributed locations making up the WSN, using different abstruse terms and vocabulary in most cases to denote the same observed property, causing data heterogeneity. Adding semantics and understanding the relationships that exist between the observed properties, and augmenting them with local indigenous knowledge, is necessary for an accurate drought forecasting system. 
In this paper, we propose the framework for the semantic representation of sensor data and integration with indigenous knowledge on drought using a middleware for an efficient drought forecasting system.\nDomain adaptation deals with adapting classifiers trained on data from a source distribution, to work effectively on data from a target distribution. In this paper, we introduce the Nonlinear Embedding Transform (NET) for unsupervised domain adaptation. The NET reduces cross-domain disparity through nonlinear domain alignment. It also embeds the domain-aligned data such that similar data points are clustered together. This results in enhanced classification. To determine the parameters in the NET model (and in other unsupervised domain adaptation models), we introduce a validation procedure by sampling source data points that are similar in distribution to the target data. We test the NET and the validation procedure using popular image datasets and compare the classification results across competitive procedures for unsupervised domain adaptation.\nExisting Markov Chain Monte Carlo (MCMC) methods are either based on general-purpose and domain-agnostic schemes which can lead to slow convergence, or hand-crafting of problem-specific proposals by an expert. We propose A-NICE-MC, a novel method to train flexible parametric Markov chain kernels to produce samples with desired properties. First, we propose an efficient likelihood-free adversarial training method to train a Markov chain and mimic a given data distribution. Then, we leverage flexible volume preserving flows to obtain parametric kernels for MCMC. Using a bootstrap approach, we show how to train efficient Markov chains to sample from a prescribed posterior distribution by iteratively improving the quality of both the model and the samples. A-NICE-MC provides the first framework to automatically design efficient domain-specific MCMC proposals. 
Empirical results demonstrate that A-NICE-MC combines the strong guarantees of MCMC with the expressiveness of deep neural networks, and is able to significantly outperform competing methods such as Hamiltonian Monte Carlo.\nThis work investigates whether artificial intelligence can recognize phase transitions without prior human knowledge. If successful, it could be applied to, for instance, analyzing data from quantum simulations of unsolved physical models. Toward this goal, we first need to apply the machine learning algorithm to well-understood models and see whether the outputs are consistent with our prior knowledge, which serves as the benchmark of this approach. In this work, we feed the computer data generated by the classical Monte Carlo simulation for the XY model in frustrated triangular and union jack lattices, which has two order parameters and exhibits two phase transitions. We show that the outputs of the principal component analysis agree very well with our understanding of different orders in different phases, and the temperature dependences of the major components detect the nature and the locations of the phase transitions. Our work offers promise for using machine learning techniques to study sophisticated statistical models, and our results can be further improved by using principal component analysis with kernel tricks and the neural network method.\nIn this paper, we present a toolbox for a specific optimization problem that frequently arises in bioinformatics or genomics. In this specific optimisation problem, the state space is a set of words of specified length over a finite alphabet. Each word is associated with a score. The overall objective is to find the words which have the lowest possible score. 
This type of general optimization problem is encountered in, e.g., 3D conformation optimisation for protein structure prediction, or largest core genes subset discovery based on the best-supported phylogenetic tree for a set of species. In order to solve this problem, we propose a toolbox that can be easily launched using MPI and embeds 3 well-known metaheuristics. The toolbox is fully parametrized and well documented. It has been specifically designed to be easily modified and possibly improved by the user depending on the application, and does not require the user to be a computer scientist. We show that the toolbox performs very well on two difficult practical problems.\nIn Markov Decision Processes (MDPs), the reward obtained in a state depends on the properties of the last state and action. This state dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle such non-Markovian reward functions was the subject of two previous lines of work, both using variants of LTL to specify the reward function and then compiling the new model back into a Markovian model. Building upon recent progress in the theories of temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.\nIn this paper, random forests are proposed for the diagnostics of operating devices in the presence of a variable number of features. In various contexts, like large or difficult-to-access monitored areas, wired sensor networks providing features to achieve diagnostics are either very costly to use or impossible to deploy. Using a wireless sensor network can solve this problem, but the latter is more prone to flaws. 
Furthermore, the networks' topology often changes, leading to variability in the quality of coverage in the targeted area. Diagnostics at the sink level must take into consideration that both the number and the quality of the provided features are not constant, and that some policies such as scheduling or data aggregation may be developed across the network. The aim of this article is ($1$) to show that random forests are relevant in this context, due to their flexibility and robustness, and ($2$) to provide first examples of the use of this method for diagnostics based on data provided by a wireless sensor network.\nWe propose a simple and efficient approach to learning sparse models. Our approach consists of (1) projecting the data into a lower dimensional space, (2) learning a dense model in the lower dimensional space, and then (3) recovering the sparse model in the original space via compressive sensing. We apply this approach to Non-negative Matrix Factorization (NMF), tensor decomposition and linear classification---showing that it obtains $10\\times$ compression with negligible loss in accuracy on real data, and obtains up to $5\\times$ speedups. Our main theoretical contribution is to show the following result for NMF: if the original factors are sparse, then their projections are the sparsest solutions to the projected NMF problem. This explains why our method works for NMF and shows an interesting new property of random projections: they can preserve the solutions of non-convex optimization problems such as NMF.\nFinding solution values for unknowns in Boolean equations was a principal reasoning mode in the Algebra of Logic of the 19th century. Schr\\\"oder investigated it as \"Aufl\\\"osungsproblem\" (\"solution problem\"). It is closely related to the modern notion of Boolean unification. Today it is commonly presented in an algebraic setting, but seems potentially useful also in knowledge representation based on predicate logic. 
We show that it can be modeled on the basis of first-order logic extended by second-order quantification. A wealth of classical results transfers, foundations for algorithms unfold, and connections with second-order quantifier elimination and Craig interpolation show up. Although for first-order inputs the set of solutions is recursively enumerable, the development of constructive methods remains a challenge. We identify some cases that allow constructions, most of them based on Craig interpolation, and show a method to take vocabulary restrictions on solution components into account.\nGenerative encoder-decoder models offer great promise in developing domain-general dialog systems. However, they have mainly been applied to open-domain conversations. This paper presents a practical and novel framework for building task-oriented dialog systems based on encoder-decoder models. This framework enables encoder-decoder models to accomplish slot-value independent decision-making and interact with external databases. Moreover, this paper shows the flexibility of the proposed method by interleaving chatting capability with a slot-filling system for better out-of-domain recovery. The models were trained on both real-user data from a bus information system and human-human chat data. Results show that the proposed framework achieves good performance in both offline evaluation metrics and in task success rate with human users.\nA number of recent works have proposed techniques for end-to-end learning of communication protocols among cooperative multi-agent populations, and have simultaneously found the emergence of grounded human-interpretable language in the protocols developed by the agents, all learned without any human supervision!   In this paper, using a Task and Tell reference game between two agents as a testbed, we present a sequence of 'negative' results culminating in a 'positive' one -- showing that while most agent-invented languages are effective (i.e. 
achieve near-perfect task rewards), they are decidedly not interpretable or compositional.   In essence, we find that natural language does not emerge 'naturally', despite the semblance of ease of natural-language-emergence that one may gather from recent literature. We discuss how it is possible to coax the invented languages to become more and more human-like and compositional by increasing restrictions on how two agents may communicate.\nThis paper describes our submission to the 2017 BioASQ challenge. We participated in Task B, Phase B, which is concerned with biomedical question answering (QA). We focus on factoid and list questions, using an extractive QA model; that is, we restrict our system to output substrings of the provided text snippets. At the core of our system, we use FastQA, a state-of-the-art neural QA system. We extended it with biomedical word embeddings and changed its answer layer to be able to answer list questions in addition to factoid questions. We pre-trained the model on a large-scale open-domain QA dataset, SQuAD, and then fine-tuned the parameters on the BioASQ training set. With our approach, we achieve state-of-the-art results on factoid questions and competitive results on list questions.\nNoisy data, non-convex objectives, model misspecification, and numerical instability can all cause undesired behaviors in machine learning systems. As a result, detecting actual implementation errors can be extremely difficult. We demonstrate a methodology in which developers use an interactive proof assistant to both implement their system and to state a formal theorem defining what it means for their system to be correct. The process of proving this theorem interactively in the proof assistant exposes all implementation errors since any error in the program would cause the proof to fail. As a case study, we implement a new system, Certigrad, for optimizing over stochastic computation graphs, and we generate a formal (i.e. 
machine-checkable) proof that the gradients sampled by the system are unbiased estimates of the true mathematical gradients. We train a variational autoencoder using Certigrad and find the performance comparable to training the same model in TensorFlow.\nOver the years, complexity theorists have proposed many structural parameters to explain the surprising efficiency of conflict-driven clause-learning (CDCL) SAT solvers on a wide variety of large industrial Boolean instances. While some of these parameters have been studied empirically, until now there has not been a unified comparative study of their explanatory power on a comprehensive benchmark. We correct this state of affairs by conducting a large-scale empirical evaluation of CDCL SAT solver performance on nearly 7000 industrial and crafted formulas against several structural parameters such as backdoors, treewidth, backbones, and community structure.   Our study led us to several results. First, we show that while such parameters only weakly correlate with CDCL solving time, certain combinations of them yield much better regression models. Second, we show how some parameters can be used as a \"lens\" to better understand the efficiency of different solving heuristics. Finally, we propose a new complexity-theoretic parameter, which we call learning-sensitive with restarts (LSR) backdoors, that extends the notion of learning-sensitive (LS) backdoors to incorporate restarts, and we discuss algorithms to compute them. We mathematically prove that for a certain class of instances, minimal LSR-backdoors are exponentially smaller than minimal LS-backdoors.\nInspired by the tremendous success of deep Convolutional Neural Networks as generic feature extractors for images, we propose TimeNet: a deep recurrent neural network (RNN) trained on diverse time series in an unsupervised manner using sequence to sequence (seq2seq) models to extract features from time series. 
Rather than relying on data from the problem domain, TimeNet attempts to generalize time series representation across domains by ingesting time series from several domains simultaneously. Once trained, TimeNet can be used as a generic off-the-shelf feature extractor for time series. The representations or embeddings given by a pre-trained TimeNet are found to be useful for time series classification (TSC). For several publicly available datasets from UCR TSC Archive and an industrial telematics sensor data from vehicles, we observe that a classifier learned over the TimeNet embeddings yields significantly better performance compared to (i) a classifier learned over the embeddings given by a domain-specific RNN, as well as (ii) a nearest neighbor classifier based on Dynamic Time Warping.\nOne major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state-of-the-art.\nWe present a deep, fully convolutional neural network that learns to route a circuit layout net with appropriate choice of metal tracks and wire class combinations. Inputs to the network are the encoded layouts containing spatial location of pins to be routed. 
After 15 fully convolutional stages followed by a score comparator, the network outputs 8 layout layers (corresponding to 4 route layers, 3 via layers and an identity-mapped pin layer) which are then decoded to obtain the routed layouts. We formulate this as a binary segmentation problem on a per-pixel per-layer basis, where the network is trained to correctly classify pixels in each layout layer to be 'on' or 'off'. To demonstrate learnability of layout design rules, we train the network on a dataset of 50,000 train and 10,000 validation samples that we generate based on certain pre-defined layout constraints. Precision, recall and $F_1$ score metrics are used to track the training progress. Our network achieves $F_1\\approx97\\%$ on the train set and $F_1\\approx92\\%$ on the validation set. We use PyTorch for implementing our model. Code is made publicly available at https://github.com/sjain-stanford/deep-route .\nAdditively separable hedonic games and fractional hedonic games have received considerable attention. They are coalition forming games of selfish agents based on their mutual preferences. Most of the work in the literature characterizes the existence and structure of stable outcomes (i.e., partitions in coalitions), assuming that preferences are given. However, there is little discussion on this assumption. In fact, agents receive different utilities if they belong to different partitions, and thus it is natural for them to declare their preferences strategically in order to maximize their benefit. In this paper we consider strategyproof mechanisms for additively separable hedonic games and fractional hedonic games, that is, partitioning methods without payments such that utility maximizing agents have no incentive to lie about their true preferences. We focus on social welfare maximization and provide several lower and upper bounds on the performance achievable by strategyproof mechanisms for general and specific additive functions. 
In most of the cases we provide tight or asymptotically tight results. All our mechanisms are simple and can be computed in polynomial time. Moreover, all the lower bounds are unconditional, that is, they do not rely on any computational or complexity assumptions.\nA descriptive approach for automatic generation of visual blends is presented. The implemented system, the Blender, is composed of two components: the Mapper and the Visual Blender. The approach uses structured visual representations along with sets of visual relations which describe how the elements (in which the visual representation can be decomposed) relate among each other. Our system is a hybrid blender, as the blending process starts at the Mapper (conceptual level) and ends at the Visual Blender (visual representation level). The experimental results show that the Blender is able to create analogies from input mental spaces and produce well-composed blends, which follow the rules imposed by its base-analogy and its relations. The resulting blends are visually interesting and some can be considered as unexpected.\nIn order to alleviate data sparsity and overfitting problems in maximum likelihood estimation (MLE) for sequence prediction tasks, we propose the Generative Bridging Network (GBN), in which a novel bridge module is introduced to assist the training of the sequence prediction model (the generator network). Unlike MLE directly maximizing the conditional likelihood, the bridge extends the point-wise ground truth to a bridge distribution conditioned on it, and the generator is optimized to minimize their KL-divergence. Three different GBNs, namely uniform GBN, language-model GBN and coaching GBN, are proposed to penalize confidence, enhance language smoothness and relieve learning burden. 
Experiments conducted on two recognized sequence prediction tasks (machine translation and abstractive text summarization) show that our proposed GBNs can yield significant improvements over strong baselines. Furthermore, by analyzing samples drawn from different bridges, expected influences on the generator are verified.\nClass-agnostic object tracking is particularly difficult in cluttered environments as target specific discriminative models cannot be learned a priori. Inspired by how the human visual cortex employs spatial attention and separate \"where\" and \"what\" processing pathways to actively suppress irrelevant visual features, this work develops a hierarchical attentive recurrent model for single object tracking in videos. The first layer of attention discards the majority of background by selecting a region containing the object of interest, while the subsequent layers tune in on visual features particular to the tracked object. This framework is fully differentiable and can be trained in a purely data driven fashion by gradient methods. To improve training convergence, we augment the loss function with terms for a number of auxiliary tasks relevant for tracking. Evaluation of the proposed model is performed on two datasets: pedestrian tracking on the KTH activity recognition dataset and the more difficult KITTI object tracking dataset.\nThis paper presents a collection of path planning algorithms for real-time movement of multiple robots across a Robotic Mobile Fulfillment System (RMFS). Robots are assigned to move storage units to pickers at working stations instead of requiring pickers to go to the storage area. Path planning algorithms aim to find paths for the robots to fulfill the requests without collisions or deadlocks. 
The state-of-the-art path planning algorithms, including WHCA*, FAR, BCP, OD&ID and CBS, were adapted to suit path planning in RMFS and integrated within a simulation tool to guide the robots from their starting points to their destinations during the storage and retrieval processes. Ten different layouts with a variety of numbers of robots, floors, pods, stations and the sizes of storage areas were considered in the simulation study. Performance metrics of throughput, path length and search time were monitored. Simulation results identify the best algorithm for each performance metric.\nEfficient exact parameterized algorithms are a vibrant theoretical research area. Very recent solving competitions such as the PACE challenge show that there is also increasing practical interest in the parameterized algorithms community. An important research question is whether dedicated parameterized exact algorithms have practical relevance and can even beat well-established problem solvers. We consider the logic-based declarative modeling language and problem solving framework Answer Set Programming (ASP). State-of-the-art ASP solvers rely considerably on Sat-based algorithms. An ASP solver (DynASP2), which is based on classical dynamic programming on tree decompositions, has been published very recently. Unfortunately, DynASP2 can outperform modern ASP solvers on programs of small treewidth only if the question of interest is to count the number of solutions. In this paper, we describe underlying concepts of our new implementation (DynASP2.5) that shows competitive behavior to state-of-the-art ASP solvers even for finding just one solution when solving problems such as the Steiner tree problem that have been modeled in ASP on graphs with low treewidth.
Our implementation is based on a novel approach that we call multi-pass dynamic programming (M-DPSINC).\nIn this paper, we study Reiter's propositional default logic when the treewidth of a certain graph representation (semi-primal graph) of the input theory is bounded. We establish a dynamic programming algorithm on tree decompositions that decides whether a theory has a consistent stable extension (Ext). Our algorithm can even be used to enumerate all generating defaults (ExtEnum) that lead to stable extensions.   We show that our algorithm decides Ext in linear time in the input theory and triple exponential time in the treewidth (so-called fixed-parameter linear algorithm).   Further, our algorithm solves ExtEnum with a pre-computation step that is linear in the input theory and triple exponential in the treewidth followed by a linear delay to output solutions.\nWe present an approach for agents to learn representations of a global map from sensor data, to aid their exploration in new environments. To achieve this, we embed procedures mimicking that of traditional Simultaneous Localization and Mapping (SLAM) into the soft attention based addressing of external memory architectures, in which the external memory acts as an internal representation of the environment. This structure encourages the evolution of SLAM-like behaviors inside a completely differentiable deep neural network. We show that this approach can help reinforcement learning agents to successfully explore new environments where long-term memory is essential. We validate our approach in both challenging grid-world environments and preliminary Gazebo experiments. A video of our experiments can be found at: https://goo.gl/G2Vu5y.\nIn this paper, we introduce Path Integral Networks (PI-Net), a recurrent network representation of the Path Integral optimal control algorithm. The network includes both system dynamics and cost models, used for optimal control based planning. 
PI-Net is fully differentiable, learning both dynamics and cost models end-to-end by back-propagation and stochastic gradient descent. Because of this, PI-Net can learn to plan. PI-Net has several advantages: it can generalize to unseen states thanks to planning, it can be applied to continuous control tasks, and it allows for a wide variety of learning schemes, including imitation and reinforcement learning. Preliminary experimental results show that PI-Net, trained by imitation learning, can mimic control demonstrations for two simulated problems: a linear system and a pendulum swing-up problem. We also show that PI-Net is able to learn dynamics and cost models latent in the demonstrations.\nResearch in UAV scheduling has obtained an emerging interest from scientists in the optimization field. While scheduling itself has had strong roots since the 19th century, works on UAV scheduling in indoor environments have come forth in the latest decade. Several works on scheduling UAV operations in indoor (two and three dimensional) and outdoor environments are reported. In this paper, a further study on UAV scheduling in three dimensional indoor environment is investigated. Dealing with indoor environment\\textemdash where humans, UAVs, and other elements or infrastructures are likely to coexist in the same space\\textemdash draws attention towards the safety of the operations. In relation to the battery level, a preserved battery level leads to safer operations, promoting the UAV to have a decent remaining power level. A methodology which consists of a heuristic approach based on Restful Task Assignment Algorithm, incorporated with Particle Swarm Optimization Algorithm, is proposed. The motivation is to preserve the battery level throughout the operations, which promotes less possibility in having failed UAVs on duty.
This methodology is tested with 54 benchmark datasets stressing 4 different aspects: geographical distance, number of tasks, number of predecessors, and slack time. The test results and their characteristics in regard to the proposed methodology are discussed and presented.\nIt is well known that speaker identification performs extremely well in neutral talking environments; however, identification performance declines sharply in shouted talking environments. This work aims at proposing, implementing and testing a new approach to enhance the degraded performance in shouted talking environments. The new proposed approach is based on gender-dependent speaker identification using Suprasegmental Hidden Markov Models (SPHMMs) as classifiers. This proposed approach has been tested on two different and separate speech databases: our collected database and the Speech Under Simulated and Actual Stress (SUSAS) database. The results of this work show that gender-dependent speaker identification based on SPHMMs outperforms gender-independent speaker identification based on the same models and gender-dependent speaker identification based on Hidden Markov Models (HMMs) by about 6% and 8%, respectively. The results obtained based on the proposed approach are close to those obtained in subjective evaluation by human judges.\nVirtual reality simulation is becoming popular as a training platform in surgical education. However, one important aspect of simulation-based surgical training that has not received much attention is the provision of automated real-time performance feedback to support the learning process. Performance feedback is actionable advice that improves novice behaviour. In simulation, automated feedback is typically extracted from prediction models trained using data mining techniques. Existing techniques suffer from either low effectiveness or low efficiency, which prevents their use in real time.
In this paper, we propose a random forest based method that finds a balance between effectiveness and efficiency. Experimental results in a temporal bone surgery simulation show that the proposed method is able to extract highly effective feedback at a high level of efficiency.\nThe challenge of sharing and communicating information is crucial in complex human-robot interaction (HRI) scenarios. Ontologies and symbolic reasoning are the state-of-the-art approaches for a natural representation of knowledge, especially within the Semantic Web domain. In such a context, scripted paradigms have been adopted to achieve high expressiveness. Nevertheless, since symbolic reasoning is a high complexity problem, optimizing its performance requires a careful design of the knowledge. Specifically, a robot architecture requires the integration of several components implementing different behaviors and generating a series of beliefs. Most of the components are expected to access, manipulate, and reason upon a run-time generated semantic representation of knowledge grounding robot behaviors and perceptions through formal axioms, with soft real-time requirements.\nIn this paper the elements of the CAPTCHA usability are analyzed. CAPTCHA, as a time progressive element in computer science, has been under constant interest of ordinary, professional as well as the scientific users of the Internet. The analysis is given based on the usability elements of CAPTCHA which are abbreviated as user-centric approach to the CAPTCHA. To demonstrate it, the specific type of Dice CAPTCHA is used in the experiment. The experiment is conducted on 190 Internet users with different demographic characteristics on laptop and tablet computers. The obtained results are statistically processed. 
Finally, the results are compared and conclusions about their use are drawn.\nInfluence maximization is the problem of finding a set of social network users, called influencers, that can trigger a large cascade of propagation. Influencers are very useful for making a marketing campaign go viral through social networks, for example. In this paper, we propose an influence measure that combines many influence indicators. In addition, we consider the reliability of each influence indicator and we present a distance-based process that allows us to estimate the reliability of each indicator. The proposed measure is defined under the framework of the theory of belief functions. Furthermore, the reliability-based influence measure is used with an influence maximization model to select a set of users that are able to maximize the influence in the network. Finally, we present a set of experiments on a dataset collected from Twitter. These experiments show the performance of the proposed solution in detecting social influencers with good quality.\nIt is widely observed that deep learning models with learned parameters generalize well, even with many more model parameters than the number of training samples. We systematically investigate the underlying reasons why deep neural networks often generalize well, and reveal the difference between the minima (with the same training error) that generalize well and those that don't. We show that it is the characteristics of the landscape of the loss function that explain the good generalization capability. For the landscape of loss function for deep networks, the volume of basin of attraction of good minima dominates over that of poor minima, which guarantees that optimization methods with random initialization converge to good minima. We theoretically justify our findings by analyzing 2-layer neural networks and show that the low-complexity solutions have a small norm of Hessian matrix with respect to model parameters.
For deeper networks, extensive numerical evidence helps to support our arguments.\nThe current paper proposes a novel variational Bayes predictive coding RNN model, which can learn to generate fluctuated temporal patterns from exemplars. The model learns to maximize the lower bound of the weighted sum of the regularization and reconstruction error terms. We examined how this weighting can affect development of different types of information processing while learning fluctuated temporal patterns. Simulation results show that strong weighting of the reconstruction term causes the development of deterministic chaos for imitating the randomness observed in target sequences, while strong weighting of the regularization term causes the development of stochastic dynamics imitating probabilistic processes observed in targets. Moreover, results indicate that the most generalized learning emerges between these two extremes. The paper concludes with implications in terms of the underlying neuronal mechanisms for autism spectrum disorder and for free action.\nSeveral domains have adopted the increasing use of IoT-based devices to collect sensor data for generating abstractions and perceptions of the real world. This sensor data is multi-modal and heterogeneous in nature. This heterogeneity induces interoperability issues while developing cross-domain applications, thereby restricting the possibility of reusing sensor data to develop new applications. As a solution to this, semantic approaches have been proposed in the literature to tackle problems related to interoperability of sensor data. Several ontologies have been proposed to handle different aspects of IoT-based sensor data collection, ranging from discovering the IoT sensors for data collection to applying reasoning on the collected sensor data for drawing inferences. In this paper, we survey these existing semantic ontologies to provide an overview of the recent developments in this field. 
We highlight the fundamental ontological concepts (e.g., sensor-capabilities and context-awareness) required for an IoT-based application, and survey the existing ontologies which include these concepts. Based on our study, we also identify the shortcomings of currently available ontologies, which serves as a stepping stone to state the need for a common unified ontology for the IoT domain.\nDeep reinforcement learning (RL) methods have significant potential for dialogue policy optimisation. However, they suffer from a poor performance in the early stages of learning. This is especially problematic for on-line learning with real users. Two approaches are introduced to tackle this problem. Firstly, to speed up the learning process, two sample-efficient neural networks algorithms: trust region actor-critic with experience replay (TRACER) and episodic natural actor-critic with experience replay (eNACER) are presented. For TRACER, the trust region helps to control the learning step size and avoid catastrophic model changes. For eNACER, the natural gradient identifies the steepest ascent direction in policy space to speed up the convergence. Both models employ off-policy learning with experience replay to improve sample-efficiency. Secondly, to mitigate the cold start issue, a corpus of demonstration data is utilised to pre-train the models prior to on-line reinforcement learning. Combining these two approaches, we demonstrate a practical approach to learn deep RL-based dialogue policies and demonstrate their effectiveness in a task-oriented information seeking domain.\nWe propose Teacher-Student Curriculum Learning (TSCL), a framework for automatic curriculum learning, where the Student tries to learn a complex task and the Teacher automatically chooses subtasks from a given set for the Student to train on. We describe a family of Teacher algorithms that rely on the intuition that the Student should practice more those tasks on which it makes the fastest progress, i.e. 
where the slope of the learning curve is highest. In addition, the Teacher algorithms address the problem of forgetting by also choosing tasks where the Student's performance is getting worse. We demonstrate that TSCL matches or surpasses the results of carefully hand-crafted curricula in two tasks: addition of decimal numbers with LSTM and navigation in Minecraft. Using our automatically generated curriculum enabled to solve a Minecraft maze that could not be solved at all when training directly on solving the maze, and the learning was an order of magnitude faster than uniform sampling of subtasks.\nIn this work, we formulated a real-world problem related to sewer pipeline gas detection using the classification-based approaches. The primary goal of this work was to identify the hazardousness of sewer pipeline to offer safe and non-hazardous access to sewer pipeline workers so that the human fatalities, which occurs due to the toxic exposure of sewer gas components, can be avoided. The dataset acquired through laboratory tests, experiments, and various literature sources was organized to design a predictive model that was able to identify/classify hazardous and non-hazardous situation of sewer pipeline. To design such prediction model, several classification algorithms were used and their performances were evaluated and compared, both empirically and statistically, over the collected dataset. In addition, the performances of several ensemble methods were analyzed to understand the extent of improvement offered by these methods. The result of this comprehensive study showed that the instance-based learning algorithm performed better than many other algorithms such as multilayer perceptron, radial basis function network, support vector machine, reduced pruning tree. 
Similarly, it was observed that the multi-scheme ensemble approach enhanced the performance of base predictors.\nWe propose a novel approach for group elevator scheduling by formulating it as the maximization of a submodular function under a matroid constraint. In particular, we propose to model the total waiting time of passengers using a quadratic Boolean function. The unary and pairwise terms in the function denote the waiting time for single and pairwise allocation of passengers to elevators, respectively. We show that this objective function is submodular. The matroid constraints ensure that every passenger is allocated to exactly one elevator. We use a greedy algorithm to maximize the submodular objective function, and derive provable guarantees on the optimality of the solution. We tested our algorithm using Elevate 8, a commercial-grade elevator simulator that allows simulation with a wide range of elevator settings. We achieve significant improvement over the existing algorithms.\nWe present Solrex, an automated solver for the game of Reverse Hex. Reverse Hex, also known as Rex, or Misere Hex, is the variant of the game of Hex in which the player who joins her two sides loses the game. Solrex performs a mini-max search of the state space using Scalable Parallel Depth First Proof Number Search, enhanced by the pruning of inferior moves and the early detection of certain winning strategies. Solrex is implemented on the same code base as the Hex program Solver, and can solve arbitrary positions on board sizes up to 6x6, with the hardest position taking less than four hours on four threads.\nModeling preference time in triathlons means predicting the intermediate times of particular sports disciplines from a given overall finish time in a specific triathlon course for an athlete with a known personal best result.
This is a hard task for athletes and sport trainers due to a lot of different factors that need to be taken into account, e.g., athlete's abilities, health, mental preparations and even their current sports form. So far, this process was calculated manually without any specific software tools or using the artificial intelligence. This paper presents the new solution for modeling preference time in middle distance triathlons based on particle swarm optimization algorithm and archive of existing sports results. Initial results are presented, which suggest the usefulness of proposed approach, while remarks for future improvements and use are also emphasized.\nA popular testbed for deep learning has been multimodal recognition of human activity or gesture involving diverse inputs such as video, audio, skeletal pose and depth images. Deep learning architectures have excelled on such problems due to their ability to combine modality representations at different levels of nonlinear feature extraction. However, designing an optimal architecture in which to fuse such learned representations has largely been a non-trivial human engineering effort. We treat fusion structure optimization as a hyper-parameter search and cast it as a discrete optimization problem under the Bayesian optimization framework. We propose a novel graph-induced kernel to compute structural similarities in the search space of tree-structured multimodal architectures and demonstrate its effectiveness using two challenging multimodal human activity recognition datasets.\nThis paper addresses the challenge of viewing and navigating Bayesian networks as their structural size and complexity grow. Starting with a review of the state of the art of visualizing Bayesian networks, an area which has largely been passed over, we improve upon existing visualizations in three ways. First, we apply a disciplined approach to the graphic design of the basic elements of the Bayesian network. 
Second, we propose a technique for direct, visual comparison of posterior distributions resulting from alternative evidence sets. Third, we leverage a central mathematical tool in information theory to assist the user in finding variables of interest in the network, and to reduce visual complexity where unimportant. We present our methods applied to two modestly large Bayesian networks constructed from real-world data sets. Results suggest the new techniques can be a useful tool for discovering information flow phenomena, and also for qualitative comparisons of different evidence configurations, especially in large probabilistic networks.\nWe present a conditional generative model that maps low-dimensional embeddings of multiple modalities of data to a common latent space, hence extracting semantic relationships between them. The embedding specific to a modality is first extracted and subsequently a constrained optimization procedure is performed to project the two embedding spaces to a common manifold. The individual embeddings are generated back from this common latent space. However, in order to enable independent conditional inference for separately extracting the corresponding embeddings from the common latent space representation, we deploy a proxy variable trick, wherein the single shared latent space is replaced by the respective separate latent spaces of each modality. We design an objective function such that, during training, we can force these separate spaces to lie close to each other by minimizing the distance between their probability distribution functions. Experimental results demonstrate that the learned joint model can generalize to learning concepts of double MNIST digits with additional attributes of colors, from both textual and speech input.\nThe set-based concept approach has been suggested as a means to simultaneously explore different design concepts, which are meaningful sub-sets of the entire set of solutions.
Previous efforts concerning the suggested approach focused on either revealing the global front (s-Pareto front) of all the concepts, or on finding the concepts' fronts within a relaxation zone. In contrast, here the aim is to reveal which of the concepts have at least one solution with a performance vector within a pre-defined window-of-interest (WOI). This paper provides the rationale for this new concept-based exploration problem, and suggests a WOI-based rather than Pareto-based multi-objective evolutionary algorithm. The proposed algorithm, which simultaneously explores different concepts, is tested using a recently suggested concept-based benchmarking approach. The numerical study of this paper shows that the algorithm can cope with various numerical difficulties simultaneously, and that it outperforms a sequential exploration approach.\nA network of driven nonlinear oscillators without dissipation has recently been proposed for solving combinatorial optimization problems via quantum adiabatic evolution through its bifurcation point. Here we investigate the behavior of the quantum bifurcation machine in the presence of dissipation. Our numerical study suggests that the output probability distribution of the dissipative quantum bifurcation machine is Boltzmann-like, where the energy in the Boltzmann distribution corresponds to the cost function of the optimization problem. We explain the Boltzmann distribution by generalizing the concept of quantum heating in a single oscillator to the case of multiple coupled oscillators.
The present result also suggests that such driven dissipative nonlinear oscillator networks can be applied to Boltzmann sampling, which is used, e.g., for Boltzmann machine learning in the field of artificial intelligence.\nWe propose Black Box Explanations through Transparent Approximations (BETA), a novel model agnostic framework for explaining the behavior of any black-box classifier by simultaneously optimizing for fidelity to the original model and interpretability of the explanation. To this end, we develop a novel objective function which allows us to learn (with optimality guarantees), a small number of compact decision sets each of which explains the behavior of the black box model in unambiguous, well-defined regions of feature space. Furthermore, our framework also is capable of accepting user input when generating these approximations, thus allowing users to interactively explore how the black-box model behaves in different subspaces that are of interest to the user. To the best of our knowledge, this is the first approach which can produce global explanations of the behavior of any given black box model through joint optimization of unambiguity, fidelity, and interpretability, while also allowing users to explore model behavior based on their preferences. Experimental evaluation with real-world datasets and user studies demonstrates that our approach can generate highly compact, easy-to-understand, yet accurate approximations of various kinds of predictive models compared to state-of-the-art baselines.\nUnsupervised rank aggregation on score-based permutations, which is widely used in many applications, has not been deeply explored yet. This work studies the use of submodular optimization for rank aggregation on score-based permutations in an unsupervised way. Specifically, we propose an unsupervised approach based on the Lovasz Bregman divergence for setting up linear structured convex and nested structured concave objective functions. 
In addition, stochastic optimization methods are applied in the training process and efficient algorithms for inference can be guaranteed. The experimental results from Information Retrieval, Combining Distributed Neural Networks, Influencers in Social Networks, and Distributed Automatic Speech Recognition tasks demonstrate the effectiveness of the proposed methods.\nSentiment analysis is the Natural Language Processing (NLP) task dealing with the detection and classification of sentiments in texts. While some tasks deal with identifying the presence of sentiment in the text (Subjectivity analysis), other tasks aim at determining the polarity of the text, categorizing it as positive, negative or neutral. Whenever there is a presence of sentiment in the text, it has a source (people, group of people or any entity) and the sentiment is directed towards some entity, object, event or person. Sentiment analysis tasks aim to determine the subject, the target and the polarity or valence of the sentiment. In our work, we try to automatically extract sentiment (positive or negative) from Facebook posts using a machine learning approach. While some work has been done on code-mixed social media data and on sentiment analysis separately, our work is the first attempt (as of now) which aims at performing sentiment analysis of code-mixed social media text. We have used extensive pre-processing to remove noise from raw text. A Multilayer Perceptron model has been used to determine the polarity of the sentiment. We have also developed the corpus for this task by manually labeling Facebook posts with their associated sentiments.\nVarious measures can be used to estimate bias or unfairness in a predictor. Previous work has already established that some of these measures are incompatible with each other. 
Here we show that, when groups differ in prevalence of the predicted event, several intuitive, reasonable measures of fairness (probability of positive prediction given occurrence or non-occurrence; probability of occurrence given prediction or non-prediction; and ratio of predictions over occurrences for each group) are all mutually exclusive: if one of them is equal among groups, the other two must differ. The only exceptions are for perfect, or trivial (always-positive or always-negative) predictors. As a consequence, any non-perfect, non-trivial predictor must necessarily be \"unfair\" under two out of three reasonable sets of criteria. This result readily generalizes to a wide range of well-known statistical quantities (sensitivity, specificity, false positive rate, precision, etc.), all of which can be divided into three mutually exclusive groups. Importantly, the results apply to all predictors, whether algorithmic or human. We conclude with possible ways to handle this effect when assessing and designing prediction methods.\nModeling users for the purpose of identifying their preferences and then personalizing services on the basis of these models is a complex task, primarily due to the need to take into consideration various explicit and implicit signals, missing or uncertain information, contextual aspects, and more. In this study, a novel generic approach for uncovering latent preference patterns from user data is proposed and evaluated. The approach relies on representing the data using graphs, and then systematically extracting graph-based features and using them to enrich the original user models. The extracted features encapsulate complex relationships between users, items, and metadata. The enhanced user models can then serve as an input to any recommendation algorithm. 
The proposed approach is domain-independent (demonstrated on data from movies, music, and business recommender systems), and is evaluated using several state-of-the-art machine learning methods, on different recommendation tasks, and using different evaluation metrics. The results show a unanimous improvement in the recommendation accuracy across tasks and domains. In addition, the evaluation provides a deeper analysis regarding the performance of the approach in special scenarios, including high sparsity and variability of ratings.\nCausation discovery without manipulation is considered a crucial problem to a variety of applications. The state-of-the-art solutions are applicable only when large numbers of samples are available or the problem domain is sufficiently small. Motivated by the observations of the local sparsity properties on causal structures, we propose a general Split-and-Merge framework, named SADA, to enhance the scalability of a wide class of causation discovery algorithms. In SADA, the variables are partitioned into subsets, by finding causal cut on the sparse causal structure over the variables. By running mainstream causation discovery algorithms as basic causal solvers on the subproblems, complete causal structure can be reconstructed by combining the partial results. SADA benefits from the recursive division technique, since each small subproblem generates more accurate result under the same number of samples. We theoretically prove that SADA always reduces the scales of problems without sacrifice on accuracy, under the condition of local causal sparsity and reliable conditional independence tests. We also present sufficient condition to accuracy enhancement by SADA, even when the conditional independence tests are vulnerable. 
Extensive experiments on both simulated and real-world datasets verify the improvements on scalability and accuracy by applying SADA together with existing causation discovery algorithms.\nIn typical reinforcement learning (RL), the environment is assumed given and the goal of the learning is to identify an optimal policy for the agent taking actions through its interactions with the environment. In this paper, we extend this setting by considering the environment is not given, but controllable and learnable through its interaction with the agent at the same time. Theoretically, we find a dual Markov decision process (MDP) w.r.t. the environment to that w.r.t. the agent, and solving the dual MDP-policy pair yields a policy gradient solution to optimizing the parametrized environment. Furthermore, environments with discontinuous parameters are addressed by a proposed general generative framework. While the idea is illustrated by an extended two-agent rock-paper-scissors game, our experiments on a Maze game design task show the effectiveness of the proposed algorithm in generating diverse and challenging Mazes against different agents with various settings.\nContent-invariance in mapping codes learned by GAEs is a useful feature for various relation learning tasks. In this paper we show that the content-invariance of mapping codes for images of 2D and 3D rotated objects can be substantially improved by extending the standard GAE loss (symmetric reconstruction error) with a regularization term that penalizes the symmetric cross-reconstruction error. This error term involves reconstruction of pairs with mapping codes obtained from other pairs exhibiting similar transformations. 
Although this would principally require knowledge of the transformations exhibited by training pairs, our experiments show that a bootstrapping approach can sidestep this issue, and that the regularization term can effectively be used in an unsupervised setting.\nMany practical problems are characterized by a preference relation over admissible solutions, where preferred solutions are minimal in some sense. For example, a preferred diagnosis usually comprises a minimal set of reasons that is sufficient to cause the observed anomaly. Alternatively, a minimal correction subset comprises a minimal set of reasons whose deletion is sufficient to eliminate the observed anomaly. Circumscription formalizes such preference relations by associating propositional theories with minimal models. The resulting enumeration problem is addressed here by means of a new algorithm taking advantage of unsatisfiable core analysis. Empirical evidence of the efficiency of the algorithm is given by comparing the performance of the resulting solver, CIRCUMSCRIPTINO, with HCLASP, CAMUS MCS, LBX and MCSLS on the enumeration of minimal models for problems originating from practical applications.   This paper is under consideration for acceptance in TPLP.\nDealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay which allows sample-efficient learning from rewards which are sparse and binary and therefore avoid the need for complicated reward engineering. It can be combined with an arbitrary off-policy RL algorithm and may be seen as a form of implicit curriculum.   We demonstrate our approach on the task of manipulating objects with a robotic arm. In particular, we run experiments on three different tasks: pushing, sliding, and pick-and-place, in each case using only binary rewards indicating whether or not the task is completed. 
Our ablation studies show that Hindsight Experience Replay is a crucial ingredient which makes training possible in these challenging environments. We show that our policies trained on a physics simulation can be deployed on a physical robot and successfully complete the task.\nOver the past few years, ride-sharing has emerged as an effective way to relieve traffic congestion. A key problem for these platforms is to come up with a revenue-optimal (or GMV-optimal) pricing scheme and an induced vehicle dispatching policy that incorporate geographic and temporal information. In this paper, we aim to tackle this problem via an economic approach.   Modeled naively, the underlying optimization problem may be non-convex and thus hard to compute. To this end, we use a so-called \"ironing\" technique to convert the problem into an equivalent convex optimization one via a clean Markov decision process (MDP) formulation, where the states are the driver distributions and the decision variables are the prices for each pair of locations. Our main finding is an efficient algorithm that computes the exact revenue-optimal (or GMV-optimal) randomized pricing schemes. We characterize the optimal solution of the MDP by a primal-dual analysis of a corresponding convex program. We also conduct empirical evaluations of our solution through real data of a major ride-sharing platform and show its advantages over fixed pricing schemes as well as several prevalent surge-based pricing schemes.\nThis paper aims at providing insight on the transferability of deep CNN features to unsupervised problems. We study the impact of different pretrained CNN feature extractors on the problem of image set clustering for object classification as well as fine-grained classification. We propose a rather straightforward pipeline combining deep-feature extraction using a CNN pretrained on ImageNet and a classic clustering algorithm to classify sets of images. 
This approach is compared to state-of-the-art algorithms in image-clustering and provides better results. These results strengthen the belief that supervised training of deep CNN on large datasets, with a large variability of classes, extracts better features than most carefully designed engineering approaches, even for unsupervised tasks. We also validate our approach on a robotic application, consisting in sorting and storing objects smartly based on clustering.\nAutomatic image description systems are commonly trained and evaluated on large image description datasets. Recently, researchers have started to collect such datasets for languages other than English. An unexplored question is how different these datasets are from English and, if there are any differences, what causes them to differ. This paper provides a cross-linguistic comparison of Dutch, English, and German image descriptions. We find that these descriptions are similar in many respects, but the familiarity of crowd workers with the subjects of the images has a noticeable influence on description specificity.\nTrust region methods, such as TRPO, are often used to stabilize policy optimization algorithms in reinforcement learning (RL). While current trust region strategies are effective for continuous control, they typically require a prohibitively large amount of on-policy interaction with the environment. To address this problem, we propose an off-policy trust region method, Trust-PCL. The algorithm is the result of observing that the optimal policy and state values of a maximum reward objective with a relative-entropy regularizer satisfy a set of multi-step pathwise consistencies along any path. Thus, Trust-PCL is able to maintain optimization stability while exploiting off-policy data to improve sample efficiency. 
When evaluated on a number of continuous control tasks, Trust-PCL improves the solution quality and sample efficiency of TRPO.\nQuestion answering is an important and difficult task in the natural language processing domain, because many basic natural language processing tasks can be cast into a question answering task. Several deep neural network architectures have been developed recently, which employ memory and inference components to memorize and reason over text information, and generate answers to questions. However, a major drawback of many such models is that they are capable of only generating single-word answers. In addition, they require large amount of training data to generate accurate answers. In this paper, we introduce the Long-Term Memory Network (LTMN), which incorporates both an external memory module and a Long Short-Term Memory (LSTM) module to comprehend the input data and generate multi-word answers. The LTMN model can be trained end-to-end using back-propagation and requires minimal supervision. We test our model on two synthetic data sets (based on Facebook's bAbI data set) and the real-world Stanford question answering data set, and show that it can achieve state-of-the-art performance.\nWe introduce a graphical framework for fair division in cake cutting, where comparisons between agents are limited by an underlying network structure. We generalize the classical fairness notions of envy-freeness and proportionality to this graphical setting. Given a simple undirected graph G, an allocation is envy-free on G if no agent envies any of her neighbor's share, and is proportional on G if every agent values her own share no less than the average among her neighbors, with respect to her own measure. These generalizations open new research directions in developing simple and efficient algorithms that can produce fair allocations under specific graph structures.   
On the algorithmic frontier, we first propose a moving-knife algorithm that outputs an envy-free allocation on trees. The algorithm is significantly simpler than the discrete and bounded envy-free algorithm recently designed by Aziz and Mackenzie for complete graphs. Next, we give a discrete and bounded algorithm for computing a proportional allocation on descendant graphs, a class of graphs by taking a rooted tree and connecting all its ancestor-descendant pairs.\nThe concept of leader--follower (or Stackelberg) equilibrium plays a central role in a number of real--world applications of game theory. While the case with a single follower has been thoroughly investigated, results with multiple followers are only sporadic and the problem of designing and evaluating computationally tractable equilibrium-finding algorithms is still largely open. In this work, we focus on the fundamental case where multiple followers play a Nash equilibrium once the leader has committed to a strategy---as we illustrate, the corresponding equilibrium finding problem can be easily shown to be $\\mathcal{FNP}$--hard and not in Poly--$\\mathcal{APX}$ unless $\\mathcal{P} = \\mathcal{NP}$ and therefore it is one among the hardest problems to solve and approximate. We propose nonconvex mathematical programming formulations and global optimization methods to find both exact and approximate equilibria, as well as a heuristic black box algorithm. All the methods and formulations that we introduce are thoroughly evaluated computationally.\nAutomated documentation of programming source code and automated code generation from natural language are challenging tasks of both practical and scientific interest. Progress in these areas has been limited by the low availability of parallel corpora of code and natural language descriptions, which tend to be small and constrained to specific domains.   
In this work we introduce a large and diverse parallel corpus of a hundred thousand Python functions with their documentation strings (\"docstrings\") generated by scraping open source repositories on GitHub. We describe baseline results for the code documentation and code generation tasks obtained by neural machine translation. We also experiment with data augmentation techniques to further increase the amount of training data. We release our datasets and processing scripts in order to stimulate research in these areas.\nThe highly influential framework of conceptual spaces provides a geometric way of representing knowledge. Instances are represented by points in a high-dimensional space and concepts are represented by regions in this space. Our recent mathematical formalization of this framework is capable of representing correlations between different domains in a geometric way. In this paper, we extend our formalization by providing quantitative mathematical definitions for the notions of concept size, subsethood, implication, similarity, and betweenness. This considerably increases the representational power of our formalization by introducing measurable ways of describing relations between concepts.\nDiversity is one of the fundamental properties for the survival of species, populations, and organizations. Recent advances in deep learning allow for the rapid and automatic assessment of organizational diversity and possible discrimination by race, sex, age and other parameters. Automating the process of assessing the organizational diversity using the deep neural networks and eliminating the human factor may provide a set of real-time unbiased reports to all stakeholders. 
In this pilot study we applied the deep-learned predictors of race and sex to the executive management and board member profiles of the 500 largest companies from the 2016 Forbes Global 2000 list and compared the predicted ratios to the ratios within each company's country of origin and ranked them by the sex-, age- and race- diversity index (DI). While the study has many limitations and no claims are being made concerning the individual companies, it demonstrates a method for the rapid and impartial assessment of organizational diversity using deep neural networks.\nState-of-the-art slot filling models for goal-oriented human/machine conversational language understanding systems rely on deep learning methods. While multi-task training of such models alleviates the need for large in-domain annotated datasets, bootstrapping a semantic parsing model for a new domain using only the semantic frame, such as the back-end API or knowledge graph schema, is still one of the holy grail tasks of language understanding for dialogue systems. This paper proposes a deep learning based approach that can utilize only the slot description in context without the need for any labeled or unlabeled in-domain examples, to quickly bootstrap a new domain. The main idea of this paper is to leverage the encoding of the slot names and descriptions within a multi-task deep learned slot filling model, to implicitly align slots across domains. The proposed approach is promising for solving the domain scaling problem and eliminating the need for any manually annotated data or explicit schema alignment. 
Furthermore, our experiments on multiple domains show that this approach results in significantly better slot-filling performance when compared to using only in-domain data, especially in the low data regime.\nFor safe and efficient planning and control in autonomous driving, we need a driving policy which can achieve desirable driving quality over a long-term horizon with guaranteed safety and feasibility. Optimization-based approaches, such as Model Predictive Control (MPC), can provide such optimal policies, but their computational complexity is generally unacceptable for real-time implementation. To address this problem, we propose a fast integrated planning and control framework that combines learning- and optimization-based approaches in a two-layer hierarchical structure. The first layer, defined as the \"policy layer\", is established by a neural network which learns the long-term optimal driving policy generated by MPC. The second layer, called the \"execution layer\", is a short-term optimization-based controller that tracks the reference trajectories given by the \"policy layer\" with guaranteed short-term safety and feasibility. Moreover, with efficient and highly-representative features, a small-size neural network is sufficient in the \"policy layer\" to handle many complicated driving scenarios. This enables online imitation learning with Dataset Aggregation (DAgger), so that the performance of the \"policy layer\" can be improved rapidly and continuously online. Several example driving scenarios are demonstrated to verify the effectiveness and efficiency of the proposed framework.\nThe current study applies deep learning to herbalism. Toward the goal, we acquired the de-identified health insurance reimbursements that were claimed in a 10-year period from 2004 to 2013 in the National Health Insurance Database of Taiwan, the total number of reimbursement records equaling 340 million. 
Two artificial intelligence techniques were applied to the dataset: residual convolutional neural network multitask classifier and attention-based recurrent neural network. The former works to translate from herbal prescriptions to diseases; and the latter from diseases to herbal prescriptions. Analysis of the classification results indicates that herbal prescriptions are specific to: anatomy, pathophysiology, sex and age of the patient, and season and year of the prescription. Further analysis identifies temperature and gross domestic product as the meteorological and socioeconomic factors that are associated with herbal prescriptions. Analysis of the neural machine transitional result indicates that the recurrent neural network learnt not only syntax but also semantics of diseases and herbal prescriptions.\nEvery year at the United Nations, member states deliver statements during the General Debate discussing major issues in world politics. These speeches provide invaluable information on governments' perspectives and preferences on a wide range of issues, but have largely been overlooked in the study of international politics. This paper introduces a new dataset consisting of over 7,701 English-language country statements from 1970-2016. We demonstrate how the UN General Debate Corpus (UNGDC) can be used to derive country positions on different policy dimensions using text analytic methods. The paper provides applications of these estimates, demonstrating the contribution the UNGDC can make to the study of international politics.\nAdversarial samples are strategically modified samples, which are crafted with the purpose of fooling a classifier at hand. An attacker introduces specially crafted adversarial samples to a deployed classifier, which are being mis-classified by the classifier. However, the samples are perceived to be drawn from entirely different classes and thus it becomes hard to detect the adversarial samples. 
Most of the prior works have been focused on synthesizing adversarial samples in the image domain. In this paper, we propose a new method of crafting adversarial text samples by modification of the original samples. Modifications of the original text samples are done by deleting or replacing the important or salient words in the text or by introducing new words in the text sample. Our algorithm works best for the datasets which have sub-categories within each of the classes of examples. While crafting adversarial samples, one of the key constraints is to generate meaningful sentences which can pass off as legitimate from a language (English) viewpoint. Experimental results on IMDB movie review dataset for sentiment analysis and Twitter dataset for gender detection show the efficiency of our proposed method.\nThe amount of text that is generated every day is increasing dramatically. This tremendous volume of mostly unstructured text cannot be simply processed and perceived by computers. Therefore, efficient and effective techniques and algorithms are required to discover useful patterns. Text mining is the task of extracting meaningful information from text, which has gained significant attention in recent years. In this paper, we describe several of the most fundamental text mining tasks and techniques including text pre-processing, classification and clustering. Additionally, we briefly explain text mining in biomedical and health care domains.\nIn this paper, we propose a multi-task learning from demonstration method that works using raw images as input to autonomously accomplish a wide variety of tasks in the real world using a low-cost robotic arm. The controller is a single recurrent neural network that can generate robot arm trajectories to perform different manipulation tasks. In order to learn complex skills from relatively few demonstrations, we share parameters across different tasks. 
Our network also combines VAE-GAN-based reconstruction with autoregressive multimodal action prediction for improved data efficiency. Our results show that weight sharing and reconstruction substantially improve generalization and robustness, and that training on multiple tasks simultaneously greatly improves the success rate on all of the tasks. Our experiments, performed on a real-world low-cost Lynxmotion arm, illustrate a variety of picking and placing tasks, as well as non-prehensile manipulation.\nWe investigate a generalisation of the coherent choice functions considered by Seidenfeld et al. (2010), by sticking to the convexity axiom but imposing no Archimedeanity condition. We define our choice functions on vector spaces of options, which allows us to incorporate as special cases both Seidenfeld et al.'s (2010) choice functions on horse lotteries and sets of desirable gambles (Quaeghebeur, 2014), and to investigate their connections. We show that choice functions based on sets of desirable options (gambles) satisfy Seidenfeld's convexity axiom only for very particular types of sets of desirable options, which are in a one-to-one relationship with the lexicographic probabilities. We call them lexicographic choice functions. Finally, we prove that these choice functions can be used to determine the most conservative convex choice function associated with a given binary relation.\nMachine learning based systems are increasingly being used for sensitive tasks such as security surveillance, guiding autonomous vehicles, making investment decisions, detecting and blocking network intrusion and malware, etc. However, recent research has shown that machine learning models are vulnerable to attacks by adversaries at all phases of machine learning (e.g., training data collection, training, operation). All model classes of machine learning systems can be misled by providing carefully crafted inputs making them wrongly classify inputs. 
Maliciously created input samples can affect the learning process of an ML system by either slowing down the learning process, or affecting the performance of the learned model, or causing the system to make error(s) only in the attacker's planned scenario. Because of these developments, understanding the security of machine learning algorithms and systems is emerging as an important research area among computer security and machine learning researchers and practitioners. We present a survey of this emerging area in machine learning.\nRecently, many variance reduced stochastic alternating direction method of multipliers (ADMM) methods (e.g.\\ SAG-ADMM, SDCA-ADMM and SVRG-ADMM) have made exciting progress such as linear convergence rates for strongly convex problems. However, the best known convergence rate for general convex problems is $O(1/T)$ as opposed to $O(1/T^2)$ of accelerated batch algorithms, where $T$ is the number of iterations. Thus, there still remains a gap in convergence rates between existing stochastic ADMM and batch algorithms. To bridge this gap, we introduce the momentum acceleration trick for batch optimization into the stochastic variance reduced gradient based ADMM (SVRG-ADMM), which leads to an accelerated (ASVRG-ADMM) method. Then we design two different momentum term update rules for strongly convex and general convex cases. We prove that ASVRG-ADMM converges linearly for strongly convex problems. Besides having a per-iteration complexity as low as that of existing stochastic ADMM methods, ASVRG-ADMM improves the convergence rate on general convex problems from $O(1/T)$ to $O(1/T^2)$. Our experimental results show the effectiveness of ASVRG-ADMM.\nWhile general game playing is an active field of research, the learning of game design has tended to be either a secondary goal of such research or it has been solely the domain of humans. 
We propose a field of research, Automated Game Design Learning (AGDL), with the direct purpose of learning game designs directly through interaction with games in the mode that most people experience games: via play. We detail existing work that touches the edges of this field, describe current successful projects in AGDL and the theoretical foundations that enable them, point to promising applications enabled by AGDL, and discuss next steps for this exciting area of study. The key moves of AGDL are to use game programs as the ultimate source of truth about their own design, and to make these design properties available to other systems and avenues of inquiry.\nWe propose and evaluate a new technique for learning hybrid automata automatically by observing the runtime behavior of a dynamical system. Working from a sequence of continuous state values and predicates about the environment, CHARDA recovers the distinct dynamic modes, learns a model for each mode from a given set of templates, and postulates causal guard conditions which trigger transitions between modes. Our main contribution is the use of information-theoretic measures (1)~as a cost function for data segmentation and model selection to penalize over-fitting and (2)~to determine the likely causes of each transition. CHARDA is easily extended with different classes of model templates, fitting methods, or predicates. In our experiments on a complex videogame character, CHARDA successfully discovers a reasonable over-approximation of the character's true behaviors. Our results also compare favorably against recent work in automatically learning probabilistic timed automata in an aircraft domain: CHARDA exactly learns the modes of these simpler automata.\nForeign policy analysis has been struggling to find ways to measure policy preferences and paradigm shifts in international political systems. 
This paper presents a novel, potential solution to this challenge, through the application of a neural word embedding (Word2vec) model on a dataset featuring speeches by heads of state or government in the United Nations General Debate. The paper provides three key contributions based on the output of the Word2vec model. First, it presents a set of policy attention indices, synthesizing the semantic proximity of political speeches to specific policy themes. Second, it introduces country-specific semantic centrality indices, based on topological analyses of countries' semantic positions with respect to each other. Third, it tests the hypothesis that there exists a statistical relation between the semantic content of political speeches and UN voting behavior, falsifying it and suggesting that political speeches contain information of a different nature than that behind voting outcomes. The paper concludes with a discussion of the practical use of its results and consequences for foreign policy analysis, public accountability, and transparency.\nThis paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. 
Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.\nSensor-based activity recognition seeks profound high-level knowledge about human activities from multitudes of low-level sensor readings. Conventional pattern recognition approaches have made tremendous progress in the past years. However, those methods often rely heavily on heuristic hand-crafted feature extraction, which could hinder their generalization performance. Additionally, existing methods are poorly suited to unsupervised and incremental learning tasks. Recent advances in deep learning make it possible to perform automatic high-level feature extraction and thus achieve promising performance in many areas. Since then, deep learning based methods have been widely adopted for sensor-based activity recognition tasks. This paper surveys recent advances in deep learning based sensor-based activity recognition. We summarize existing literature from three aspects: sensor modality, deep model, and application. We also present detailed insights on existing work and propose grand challenges for future research.\nThe Semantic Web began to emerge as its standards and technologies developed rapidly in recent years. The continuing development of Semantic Web technologies has facilitated publishing explicit semantics with data on the Web in the RDF data model. This study proposes a semantic search framework to support efficient keyword-based semantic search on RDF data utilizing near neighbor explorations. The framework augments the search results with the resources in close proximity by utilizing the entity type semantics. Along with the search results, the system generates a relevance confidence score measuring the inferred semantic relatedness of returned entities based on the degree of similarity.
Furthermore, evaluations assessing the effectiveness of the framework and the accuracy of the results are presented.\nModels that can execute natural language instructions for situated robotic tasks such as assembly and navigation have several useful applications in homes, offices, and remote scenarios. We study the semantics of spatially-referred configuration and arrangement instructions, based on the challenging Bisk-2016 blank-labeled block dataset. This task involves finding a source block and moving it to the target position (mentioned via a reference block and offset), where the blocks have no names or colors and are just referred to via spatial location features. We present novel models for the subtasks of source block classification and target position regression, based on joint-loss language and spatial-world representation learning, as well as CNN-based and dual attention models to compute the alignment between the world blocks and the instruction phrases. For target position prediction, we compare two inference approaches: annealed sampling via policy gradient versus expectation inference via supervised regression. Our models achieve a new state of the art on this task, with an improvement of 47% on source block accuracy and 22% on target position distance.\nSeveral approaches to structuring (factorization, decomposition) Dempster-Shafer joint belief functions from the literature are reviewed with special emphasis on their capability to capture independence from the point of view of the claim that belief functions generalize the Bayesian notion of probability. It is demonstrated that Zhu and Lee's {Zhu:93} logical networks and Smets' {Smets:93} directed acyclic graphs are unable to capture statistical dependence/independence of Bayesian networks {Pearl:88}.
On the other hand, though Shenoy and Shafer's hypergraphs can explicitly represent Bayesian network factorization of Bayesian belief functions, they disclaim any need for representation of independence of variables in belief functions. Cano et al. {Cano:93} reject the hypergraph representation of Shenoy and Shafer just on grounds of missing representation of variable independence, but in their frameworks some belief functions factorizable in the Shenoy/Shafer framework cannot be factored. The approach in {Klopotek:93f}, on the other hand, combines the merits of both the Cano et al. and the Shenoy/Shafer approaches: for the Shenoy/Shafer approach no simpler factorization than that in the {Klopotek:93f} approach exists, and all independences among variables captured in the Cano et al. framework, and many more, are captured in the {Klopotek:93f} approach.\nThe Mathematical Theory of Evidence, also called Dempster-Shafer Theory (DST), is known as a foundation for reasoning when knowledge is expressed at various levels of detail. Though much research effort has been committed to this theory since its foundation, many questions remain open. One of the most important open questions seems to be the relationship between frequencies and the Mathematical Theory of Evidence. The theory is blamed for leaving frequencies outside (or aside from) its framework. The seriousness of this accusation is obvious: (1) no experiment may be run to compare the performance of DST-based models of real-world processes against real-world data, (2) data may not serve as a foundation for the construction of an appropriate belief model. In this paper we develop a frequentist interpretation of the DST that refutes the above argument against DST. An immediate consequence is the possibility of developing algorithms that automatically acquire DST belief models from data.
We propose three such algorithms for various classes of belief model structures: for tree structured belief networks, for poly-tree belief networks and for general type belief networks.\nGame maps are useful for human players, general-game-playing agents, and data-driven procedural content generation. These maps are generally made by hand-assembling manually-created screenshots of game levels. Besides being tedious and error-prone, this approach requires additional effort for each new game and level to be mapped. The results can still be hard for humans or computational systems to make use of, privileging visual appearance over semantic information. We describe a software system, Mappy, that produces a good approximation of a linked map of rooms given a Nintendo Entertainment System game program and a sequence of button inputs exploring its world. In addition to visual maps, Mappy outputs grids of tiles (and how they change over time), positions of non-tile objects, clusters of similar rooms that might in fact be the same room, and a set of links between these rooms. We believe this is a necessary step towards developing larger corpora of high-quality semantically-annotated maps for PCG via machine learning and other applications.\nEngineers widely use the Gaussian process regression framework to construct surrogate models aimed at replacing computationally expensive physical models while exploring a design space. Thanks to Gaussian process properties we can use both samples generated by a high fidelity function (an expensive and accurate representation of a physical phenomenon) and a low fidelity function (a cheap and coarse approximation of the same physical phenomenon) while constructing a surrogate model. However, if sample sizes exceed a few thousand points, the computational costs of Gaussian process regression become prohibitive both for learning and for prediction.
We propose two approaches to circumvent this computational burden: one approach is based on the Nystr\\\"om approximation of sample covariance matrices and another is based on the intelligent use of a black box that can evaluate a low fidelity function on the fly at any point of a design space. We examine the performance of the proposed approaches using a number of artificial and real problems, including engineering optimization of a rotating disk shape.\nDomain knowledge can often be encoded in the structure of a network, such as convolutional layers for vision, which has been shown to increase generalization and decrease sample complexity, or the number of samples required for successful learning. In this study, we ask whether sample complexity can be reduced for systems where the structure of the domain is unknown beforehand, and the structure and parameters must both be learned from the data. We show that sample complexity reduction through learning structure is possible for at least two simple cases. In studying these cases, we also gain insight into how this might be done for more complex domains.\nProgramming by Optimization tools perform automatic software configuration according to the specification supplied by a software developer. Developers specify design spaces for program components, and the onerous task of determining which configuration best suits a given use case is carried out using automated analysis tools and optimization heuristics. However, in current approaches to Programming by Optimization, design space specification and exploration rely on external configuration algorithms, executable wrappers and fragile, preprocessed programming language extensions. Here we show that the architectural pattern of Dependency Injection provides a superior alternative to the traditional Programming by Optimization pipeline.
We demonstrate that configuration tools based on Dependency Injection fit naturally into the software development process, while requiring less overhead than current wrapper-based mechanisms. Furthermore, the structural correspondence between Dependency Injection and context-free grammars yields a new class of evolutionary metaheuristics for automated algorithm configuration. We found that the new heuristics significantly outperform existing configuration algorithms on many problems of interest (in one case by two orders of magnitude). We anticipate that these developments will make Programming by Optimization immediately applicable to a large number of enterprise software projects.\nAnswer Set Programming (ASP) is a well-established declarative paradigm. One of the successes of ASP is the availability of efficient systems. State-of-the-art systems are based on the ground+solve approach. In some applications this approach is infeasible because the grounding of one or a few constraints is expensive. In this paper, we systematically compare alternative strategies, based on custom extensions of the solver, for avoiding the instantiation of problematic constraints. Results on real and synthetic benchmarks highlight some strengths and weaknesses of the different strategies. (Under consideration for acceptance in TPLP, ICLP 2017 Special Issue.)\nThe Youtube-8M dataset promotes the development of large-scale video recognition technology, just as the ImageNet dataset has encouraged image classification, recognition and detection in artificial intelligence. For this large video dataset, classifying videos over a huge number of labels is a challenging task. With a change of perspective, we propose a novel method that regards labels as words. In detail, we describe online learning approaches to multi-label video classification that are guided by deep recurrent neural networks acting as video-to-sentence translators.
We designed the translator based on LSTMs and found that a stochastic gating before the input of each LSTM cell can help us to design the structural details. In addition, we incorporated batch normalization into our models to improve our LSTM models. Since our models are feature extractors, they can be used with other classifiers. Finally, we report improved validation results of our models on the large-scale Youtube-8M dataset and discuss further improvements.\nMethods that align distributions by minimizing an adversarial distance between them have recently achieved impressive results. However, these approaches are difficult to optimize with gradient descent and they often do not converge well without careful hyperparameter tuning and proper initialization. We investigate whether turning the adversarial min-max problem into an optimization problem by replacing the maximization part with its dual improves the quality of the resulting alignment and explore its connections to Maximum Mean Discrepancy. Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions. We test our hypothesis on the problem of aligning two synthetic point clouds on a plane and on a real-image domain adaptation problem on digits. In both cases, the dual formulation yields an iterative procedure that gives more stable and monotonic improvement over time.\nThe recent series 5 of the ASP system clingo provides generic means to enhance basic Answer Set Programming (ASP) with theory reasoning capabilities. We instantiate this framework with different forms of linear constraints, discuss the respective implementations, and present techniques for using these constraints in a reactive context.
More precisely, we introduce extensions to clingo with difference and linear constraints over integers and reals, respectively, and realize them in complementary ways. Finally, we empirically evaluate the resulting clingo derivatives clingo[dl] and clingo[lp] on common fragments and contrast them with related ASP systems. This paper is under consideration for acceptance in TPLP.\nMachine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many different approaches for many different IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. Additionally, it is interesting to see what key insights into IR problems the new technologies are able to give us. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR research. It covers key architectures, as well as the most promising future directions.\nIn this paper, we describe the Lithium Natural Language Processing (NLP) system: a resource-constrained, high-throughput and language-agnostic system for information extraction from noisy user-generated text on social media. Lithium NLP extracts a rich set of information including entities, topics, hashtags and sentiment from text. We discuss several real-world applications of the system currently incorporated in Lithium products. We also compare our system with existing commercial and academic NLP systems in terms of performance, information extracted and languages supported. We show that Lithium NLP is on par with and, in some cases, outperforms state-of-the-art commercial NLP systems.\nModern software systems in many application areas offer the user a multitude of parameters, switches and other customisation hooks.
Humans tend to have difficulties determining the best configurations for particular applications. Modern optimising compilers are an example of such software systems; their many parameters need to be tuned for optimal performance, but are often left at the default values for convenience. In this work, we automatically determine compiler parameter settings that result in optimised performance for particular applications. Specifically, we apply a state-of-the-art automated parameter configuration procedure based on cutting-edge machine learning and optimisation techniques to two prominent JavaScript compilers and demonstrate that significant performance improvements, more than 35% in some cases, can be achieved over the default parameter settings on a diverse set of benchmarks.\nThis paper verifies a result of {Shenoy:94} concerning the graphoidal structure of Shenoy's notion of independence for the Dempster-Shafer theory of belief functions. Shenoy proved that his notion of independence has graphoidal properties for positive normal valuations. The requirement of strictly positive normal valuations as a prerequisite for the application of graphoidal properties excludes a wide class of DS belief functions. In particular, it excludes so-called probabilistic belief functions. It is demonstrated that the requirement of positiveness of valuation may be weakened, in that it may instead be required that the commonality function is non-zero for singleton sets, and the graphoidal properties for independence of belief function variables are then preserved. In particular, this means that probabilistic belief functions with all singleton sets as focal points possess graphoidal properties for independence.\nLearners regularly abandon online coding tutorials when they get bored or frustrated, but there are few techniques for anticipating this abandonment to intervene. In this paper, we examine the feasibility of predicting abandonment with machine-learned classifiers.
Using interaction logs from an online programming game, we extracted a collection of features that are potentially related to learner abandonment and engagement, then developed classifiers for each level. Across the first five levels of the game, our classifiers successfully predicted 61% to 76% of learners who did not complete the next level, achieving an average AUC of 0.68. In these classifiers, features negatively associated with abandonment included account activation and help-seeking behaviors, whereas features positively associated with abandonment included features indicating difficulty and disengagement. These findings highlight the feasibility of providing timely intervention to learners likely to quit.\nWe present the first general-purpose framework for marginal maximum a posteriori estimation of probabilistic program variables. By using a series of code transformations, the evidence of any probabilistic program, and therefore of any graphical model, can be optimized with respect to an arbitrary subset of its sampled variables. To carry out this optimization, we develop the first Bayesian optimization package to directly exploit the source code of its target, leading to innovations in problem-independent hyperpriors, unbounded optimization, and implicit constraint satisfaction, and delivering significant performance improvements over prominent existing packages. We present applications of our method to a number of tasks including engineering design and parameter optimization.\nFreeway merging in congested traffic is a significant challenge toward fully automated driving. Merging vehicles need to decide not only how to merge into a spot, but also where to merge. We present a method for freeway merging based on multi-policy decision making with a reinforcement learning method called {\\em passive actor-critic} (pAC), which learns with less knowledge of the system and without active exploration.
The method selects a merging spot candidate by using the state value learned with pAC. We evaluate our method using real traffic data. Our experiments show that pAC achieves a 92\\% success rate when merging into a freeway, which is comparable to human decision making.\nReliability assessment of distribution systems, based on historical data and probabilistic methods, leads to an unreliable estimation of reliability indices since the data for the distribution components are usually inaccurate or unavailable. Fuzzy logic is an efficient method to deal with the uncertainty in reliability inputs. In this paper, the ENS index and other commonly used indices in reliability assessment are evaluated for the distribution system using fuzzy logic. Accordingly, the variables influencing the failure rate and outage duration time of the distribution components, which are natural or human-made, are explained using the proposed fuzzy membership functions. The reliability indices are calculated and compared for different cases of system operation by simulation on the IEEE RBTS Bus 2. The simulation results show how utilities can significantly improve the reliability of their distribution system by considering the risk of the influential variables.\nVAEs (Variational AutoEncoders) have proved to be powerful in the context of density modeling and have been used in a variety of contexts for creative purposes. In many settings, the data we model possesses continuous attributes that we would like to take into account at generation time. We propose in this paper GLSR-VAE, a Geodesic Latent Space Regularization for the Variational AutoEncoder architecture and its generalizations which allows fine control over the embedding of the data into the latent space. When augmenting the VAE loss with this regularization, changes in the learned latent space reflect changes in the attributes of the data.
This deeper understanding of the VAE latent space structure offers the possibility of modulating the attributes of the generated data in a continuous way. We demonstrate its efficiency on a monophonic music generation task where we manage to generate variations of discrete sequences in an intended and playful way.\nIt is quite exceptional, if it ever happens, that a new conceptual domain be built from scratch. Usually, it is developed and mastered in interaction, both positive and negative, with other more operational existing domains. Few reasoning mechanisms have been proposed to account for the interplay of different conceptual domains and the transfer of information from one to another. Analogical reasoning is one; blending is another. This paper presents a new mechanism, called the 'tunnel effect', that may explain, in part, how scientists and students reason while constructing a new conceptual domain. One experimental study with high school students and analyses from the history of science, particularly about the birth of classical thermodynamics, provide evidence and illustrate this mechanism. The knowledge organization, processes and conditions for its appearance are detailed and put into the perspective of a computational model. Specifically, we put forward the hypothesis that two levels of knowledge, notional and conceptual, cooperate in the scientific discovery process when a new conceptual domain is being built. The type of conceptual learning that can be associated with the tunnel effect is discussed and a thorough comparison is made with analogical reasoning in order to underline the main features of the newly proposed mechanism.\nManagement of chronic diseases such as heart failure (HF) is a major public health problem. A standard approach to managing chronic diseases by the medical community is to have a committee of experts develop guidelines that all physicians should follow.
Due to their complexity, these guidelines are difficult to implement and are adopted slowly by the medical community at large. We have developed a physician advisory system that codes the entire set of clinical practice guidelines for managing HF using answer set programming (ASP). In this paper we show how abductive reasoning can be deployed to find missing symptoms and conditions that the patient must exhibit in order for a treatment prescribed by a physician to work effectively. Thus, if a physician does not make an appropriate recommendation or makes a non-adherent recommendation, our system will advise the physician about symptoms and conditions that must be in effect for that recommendation to apply. This paper is under consideration for acceptance in TPLP.\nWe introduce a novel variant of the multi-armed bandit problem, in which bandits are streamed one at a time to the player, and at each point, the player can either choose to pull the current bandit or move on to the next bandit. Once a player has moved on from a bandit, they may never visit it again, which is a crucial difference between our problem and classic multi-armed bandit problems. In this online context, we study Bernoulli bandits (bandits with payout Ber($p_i$) for some underlying mean $p_i$) with underlying means drawn i.i.d. from various distributions, including the uniform distribution, and in general, all distributions that have a CDF satisfying certain differentiability conditions near zero. In all cases, we suggest several strategies and investigate their expected performance. Furthermore, we bound the performance of any optimal strategy and show that the strategies we have suggested are indeed optimal up to a constant factor. We also investigate the case where the distribution from which the underlying means are drawn is not known ahead of time.
Again, we are able to suggest algorithms that are optimal up to a constant factor for this case, given certain mild conditions on the universe of distributions.\nAn approach for coalition formation in multi-agent pursuit based on a neural network and the AGRMF model is proposed. This paper constructs a novel neural network called AGRMF-ANN, which consists of a feature extraction part and a group generation part. On the one hand, the convolutional layers of the feature extraction part can abstract the features of the agent group role membership function (AGRMF) for all of the groups; on the other hand, those features are fed to the group generation part, based on a self-organizing map (SOM) layer, which is used to place pursuers with similar features in the same group. In addition, we propose a group attractiveness function (GAF) to evaluate the quality of groups and the pursuers' contributions, in order to adjust the main ability indicators of AGRMF and the other weights of the whole neural network. The simulation experiment showed that this proposal can improve the effectiveness of coalition formation for multi-agent pursuit and its ability to adapt to pursuit-evasion problems as the pursuer team grows.\nRecent works on representation learning for graph-structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks such as graph classification and clustering require representing entire graphs as fixed-length feature vectors. While the aforementioned approaches are naturally unequipped to learn such representations, graph kernels remain the most effective way of obtaining them. However, these graph kernels use handcrafted features (e.g., shortest paths, graphlets, etc.) and hence are hampered by problems such as poor generalization.
To address this limitation, in this work, we propose a neural embedding framework named graph2vec to learn data-driven distributed representations of arbitrary-sized graphs. graph2vec's embeddings are learnt in an unsupervised manner and are task-agnostic. Hence, they could be used for any downstream task such as graph classification, clustering and even seeding supervised representation learning approaches. Our experiments on several benchmark and large real-world datasets show that graph2vec achieves significant improvements in classification and clustering accuracies over substructure representation learning approaches and is competitive with state-of-the-art graph kernels.\nSequential Constraint Grammar (SCG) (Karlsson, 1990) and its extensions have lacked clear connections to formal language theory. The purpose of this article is to lay a foundation for these connections by simplifying the definition of strings processed by the grammar and by showing that Nonmonotonic SCG is undecidable and that derivations similar to the Generative Phonology exist. The current investigations propose resource bounds that restrict the generative power of SCG to a subset of context-sensitive languages and present a strong finite-state condition for grammars as wholes. We show that a grammar is equivalent to a finite-state transducer if it is implemented with a Turing machine that runs in o(n log n) time. This condition opens new finite-state hypotheses and avenues for deeper analysis of SCG instances in a way inspired by Finite-State Phonology.\nElectronic medical records contain multi-format electronic medical data that consist of an abundance of medical knowledge. Faced with a patient's symptoms, experienced caregivers make the right medical decisions based on their professional knowledge, which accurately grasps the relationships between symptoms, diagnoses and corresponding treatments.
In this paper, we aim to capture these relationships by constructing a large and high-quality heterogeneous graph linking patients, diseases, and drugs (PDD) in EMRs. Specifically, we propose a novel framework to extract important medical entities from MIMIC-III (Medical Information Mart for Intensive Care III) and automatically link them with the existing biomedical knowledge graphs, including the ICD-9 ontology and DrugBank. The PDD graph presented in this paper is accessible on the Web via the SPARQL endpoint, and provides a pathway for medical discovery and applications, such as effective treatment recommendations.\nGenerating adversarial examples is a critical step for evaluating and improving the robustness of learning machines. So far, most existing methods only work for classification and are not designed to alter the true performance measure of the problem at hand. We introduce a novel flexible approach named Houdini for generating adversarial examples specifically tailored for the final performance measure of the task considered, even when that measure is combinatorial and non-decomposable. We successfully apply Houdini to a range of applications such as speech recognition, pose estimation and semantic segmentation. In all cases, the attacks based on Houdini achieve a higher success rate than those based on the traditional surrogates used to train the models while using a less perceptible adversarial perturbation.\nIn this paper, we propose jointly learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance.
Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.\nHuman language is one of the most natural interfaces for humans to interact with robots. This paper presents a robot system that retrieves everyday objects with unconstrained natural language descriptions. A core issue for the system is semantic and spatial grounding, which is to infer objects and their spatial relationships from images and natural language expressions. We introduce a two-stage neural-network grounding pipeline that maps natural language referring expressions directly to objects in the images. The first stage uses visual descriptions in the referring expressions to generate a candidate set of relevant objects. The second stage examines all pairwise relationships between the candidates and predicts the most likely referred object according to the spatial descriptions in the referring expressions. A key feature of our system is that by leveraging a large dataset of images labeled with text descriptions, it allows unrestricted object types and natural language referring expressions. Preliminary results indicate that our system outperforms a near state-of-the-art object comprehension system on standard benchmark datasets. We also present a robot system that follows voice commands to pick and place previously unseen objects.\nUnprecedentedly high volumes of data are becoming available with the growth of the advanced metering infrastructure.
These are expected to benefit planning and operation of the future power system, and to help the customers transition from a passive to an active role. In this paper, we explore for the first time in the smart grid context the benefits of using Deep Reinforcement Learning, a hybrid type of methods that combines Reinforcement Learning with Deep Learning, to perform on-line optimization of schedules for building energy management systems. The learning procedure was explored using two methods, Deep Q-learning and Deep Policy Gradient, both of them being extended to perform multiple actions simultaneously. The proposed approach was validated on the large-scale Pecan Street Inc. database. This highly-dimensional database includes information about photovoltaic power generation, electric vehicles as well as buildings appliances. Moreover, these on-line energy scheduling strategies could be used to provide real-time feedback to consumers to encourage more efficient use of electricity.\nExisting region-based object detectors are limited to regions with fixed box geometry to represent objects, even if those are highly non-rectangular. In this paper we introduce DP-FCN, a deep model for object detection which explicitly adapts to shapes of objects with deformable parts. Without additional annotations, it learns to focus on discriminative elements and to align them, and simultaneously brings more invariance for classification and geometric information to refine localization. DP-FCN is composed of three main modules: a Fully Convolutional Network to efficiently maintain spatial resolution, a deformable part-based RoI pooling layer to optimize positions of parts and build invariance, and a deformation-aware localization module explicitly exploiting displacements of parts to improve accuracy of bounding box regression. We experimentally validate our model and show significant gains. 
DP-FCN achieves state-of-the-art performances of 83.1% and 80.9% on PASCAL VOC 2007 and 2012 with VOC data only.\nFor decomposable score-based structure learning of Bayesian networks, existing approaches first compute a collection of candidate parent sets for each variable and then optimize over this collection by choosing one parent set for each variable without creating directed cycles while maximizing the total score. We target the task of constructing the collection of candidate parent sets when the score of choice is the Bayesian Information Criterion (BIC). We provide new non-trivial results that can be used to prune the search space of candidate parent sets of each node. We analyze how these new results relate to previous ideas in the literature both theoretically and empirically. We show in experiments with UCI data sets that gains can be significant. Since the new pruning rules are easy to implement and have low computational costs, they can be promptly integrated into all state-of-the-art methods for structure learning of Bayesian networks.\nWe introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance, and robustness to model misspecification compared to several baselines.\nWe present a novel method for obtaining high-quality, domain-targeted multiple choice questions from crowd workers. Generating these questions can be difficult without trading away originality, relevance or diversity in the answer options. 
Our method addresses these problems by leveraging a large corpus of domain-specific text and a small set of existing questions. It produces model suggestions for document selection and answer distractor choice which aid the human question generation process. With this method we have assembled SciQ, a dataset of 13.7K multiple choice science exam questions (Dataset available at http://allenai.org/data.html). We demonstrate that the method produces in-domain questions by providing an analysis of this new dataset and by showing that humans cannot distinguish the crowdsourced questions from original questions. When using SciQ as additional training data to existing questions, we observe accuracy improvements on real science exams.\nPairwise comparison data arises in many domains, including tournament rankings, web search, and preference elicitation. Given noisy comparisons of a fixed subset of pairs of items, we study the problem of estimating the underlying comparison probabilities under the assumption of strong stochastic transitivity (SST). We also consider the noisy sorting subclass of the SST model. We show that when the assignment of items to the topology is arbitrary, these permutation-based models, unlike their parametric counterparts, do not admit consistent estimation for most comparison topologies used in practice. We then demonstrate that consistent estimation is possible when the assignment of items to the topology is randomized, thus establishing a dichotomy between worst-case and average-case designs. We propose two estimators in the average-case setting and analyze their risk, showing that it depends on the comparison topology only through the degree sequence of the topology. The rates achieved by these estimators are shown to be optimal for a large class of graphs. Our results are corroborated by simulations on multiple comparison topologies.\nComputational models for sarcasm detection have often relied on the content of utterances in isolation. 
However, speaker's sarcastic intent is not always obvious without additional context. Focusing on social media discussions, we investigate two issues: (1) does modeling of conversation context help in sarcasm detection and (2) can we understand what part of conversation context triggered the sarcastic reply. To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the sarcastic response. We show that the conditional LSTM network (Rocktaschel et al., 2015) and LSTM networks with sentence level attention on context and response outperform the LSTM model that reads only the response. To address the second issue, we present a qualitative analysis of attention weights produced by the LSTM models with attention and discuss the results compared with human performance on the task.\nLPMLN is a recent addition to probabilistic logic programming languages. Its main idea is to overcome the rigid nature of the stable model semantics by assigning a weight to each rule in a way similar to Markov Logic is defined. We present two implementations of LPMLN, $\\text{LPMLN2ASP}$ and $\\text{LPMLN2MLN}$. System $\\text{LPMLN2ASP}$ translates LPMLN programs into the input language of answer set solver $\\text{CLINGO}$, and using weak constraints and stable model enumeration, it can compute most probable stable models as well as exact conditional and marginal probabilities. System $\\text{LPMLN2MLN}$ translates LPMLN programs into the input language of Markov Logic solvers, such as $\\text{ALCHEMY}$, $\\text{TUFFY}$, and $\\text{ROCKIT}$, and allows for performing approximate probabilistic inference on LPMLN programs. We also demonstrate the usefulness of the LPMLN systems for computing other languages, such as ProbLog and Pearl's Causal Models, that are shown to be translatable into LPMLN. 
(Under consideration for acceptance in TPLP)\nLearning cooperative policies for multi-agent systems is often challenged by partial observability and a lack of coordination. In some settings, the structure of a problem allows a distributed solution with limited communication. Here, we consider a scenario where no communication is available, and instead we learn local policies for all agents that collectively mimic the solution to a centralized multi-agent static optimization problem. Our main contribution is an information theoretic framework based on rate distortion theory which facilitates analysis of how well the resulting fully decentralized policies are able to reconstruct the optimal solution. Moreover, this framework provides a natural extension that addresses which nodes an agent should communicate with to improve the performance of its individual policy.\nVideo Question Answering is a challenging problem in visual information retrieval, which provides the answer to the referenced video content according to the question. However, the existing visual question answering approaches mainly tackle the problem of static image question, which may be ineffective for video question answering due to the insufficiency of modeling the temporal dynamics of video contents. In this paper, we study the problem of video question answering by modeling its temporal dynamics with frame-level attention mechanism. We propose the attribute-augmented attention network learning framework that enables the joint frame-level attribute detection and unified video representation learning for video question answering. We then incorporate the multi-step reasoning process for our proposed attention network to further improve the performance. We construct a large-scale video question answering dataset. 
We conduct the experiments on both multiple-choice and open-ended video question answering tasks to show the effectiveness of the proposed method.\nBayesian Filtering for plan and activity recognition is challenging for scenarios that contain many observation equivalent entities (i.e. entities that produce the same observations). This is due to the combinatorial explosion in the number of hypotheses that need to be tracked. However, this class of problems exhibits a certain symmetry that can be exploited for state space representation and inference. We analyze current state of the art methods and find that none of them completely fits the requirements arising in this problem class. We sketch a novel inference algorithm that provides a solution by incorporating concepts from Lifted Inference algorithms, Probabilistic Multiset Rewriting Systems, and Computational State Space Models. Two experiments confirm that this novel algorithm has the potential to perform efficient probabilistic inference on this problem class.\nThis paper proposes an image dehazing model built with a convolutional neural network (CNN), called All-in-One Dehazing Network (AOD-Net). It is designed based on a re-formulated atmospheric scattering model. Instead of estimating the transmission matrix and the atmospheric light separately as most previous models did, AOD-Net directly generates the clean image through a light-weight CNN. Such a novel end-to-end design makes it easy to embed AOD-Net into other deep models, e.g., Faster R-CNN, for improving high-level task performance on hazy images. Experimental results on both synthesized and natural hazy image datasets demonstrate our superior performance than the state-of-the-art in terms of PSNR, SSIM and the subjective visual quality. 
Furthermore, when concatenating AOD-Net with Faster R-CNN and training the joint pipeline from end to end, we witness a large improvement of the object detection performance on hazy images.\nWe study the problem of learning to reason in large scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes the accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets.\nWe describe a generalization of the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) which is able to encode prior information that state transitions are more likely between \"nearby\" states. This is accomplished by defining a similarity function on the state space and scaling transition probabilities by pair-wise similarities, thereby inducing correlations among the transition distributions. We present an augmented data representation of the model as a Markov Jump Process in which: (1) some jump attempts fail, and (2) the probability of success is proportional to the similarity between the source and destination states. This augmentation restores conditional conjugacy and admits a simple Gibbs sampler. We evaluate the model and inference method on a speaker diarization task and a \"harmonic parsing\" task using four-part chorale data, as well as on several synthetic datasets, achieving favorable comparisons to existing models.\nAnswer Set Programming (ASP) is a well-established formalism for nonmonotonic reasoning. 
An ASP program can have no answer set due to cyclic default negation. In this case, it is not possible to draw any conclusion, even if this is not intended. Recently, several paracoherent semantics have been proposed that address this issue, and several potential applications for these semantics have been identified. However, paracoherent semantics have essentially been inapplicable in practice, due to the lack of efficient algorithms and implementations. In this paper, this lack is addressed, and several different algorithms to compute semi-stable and semi-equilibrium models are proposed and implemented into an answer set solving framework. An empirical performance comparison among the new algorithms on benchmarks from ASP competitions is given as well.\nIn this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the expectation of this return, or value. Although there is an established body of literature studying the value distribution, thus far it has always been used for a specific purpose such as implementing risk-aware behaviour. We begin with theoretical results in both the policy evaluation and control settings, exposing a significant distributional instability in the latter. We then use the distributional perspective to design a new algorithm which applies Bellman's equation to the learning of approximate value distributions. We evaluate our algorithm using the suite of games from the Arcade Learning Environment. We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning. 
Finally, we combine theoretical and empirical evidence to highlight the ways in which the value distribution impacts learning in the approximate setting.\nAnswer Set Programming (ASP) is a well-established declarative problem solving paradigm which became widely used in AI and recognized as a powerful tool for knowledge representation and reasoning (KRR), especially for its high expressiveness and the ability to deal also with incomplete knowledge.   Recently, thanks to the availability of a number of robust and efficient implementations, ASP has been increasingly employed in a number of different domains, and used for the development of industrial-level and enterprise applications. This made clear the need for proper development tools and interoperability mechanisms for easing interaction and integration with external systems in the widest range of real-world scenarios, including mobile applications and educational contexts.   In this work we present a framework for integrating the KRR capabilities of ASP into generic applications. We show the use of the framework by illustrating proper specializations for some relevant ASP systems over different platforms, including the mobile setting; furthermore, the potential of the framework for educational purposes is illustrated by means of the development of several ASP-based applications.\nRecent rapid advances in Artificial Intelligence (AI) and Machine Learning have raised many questions about the regulatory and governance mechanisms for autonomous machines. Many commentators, scholars, and policy-makers now call for ensuring that algorithms governing our lives are transparent, fair, and accountable. Here, I propose a conceptual framework for the regulation of AI and algorithmic systems. I argue that we need tools to program, debug and maintain an algorithmic social contract, a pact between various human stakeholders, mediated by machines. 
To achieve this, we can adapt the concept of human-in-the-loop (HITL) from the fields of modeling and simulation, and interactive machine learning. In particular, I propose an agenda I call society-in-the-loop (SITL), which combines the HITL control paradigm with mechanisms for negotiating the values of various stakeholders affected by AI systems, and monitoring compliance with the agreement. In short, `SITL = HITL + Social Contract.'\nBecause preferences naturally arise and play an important role in many real-life decisions, they are at the backbone of various fields. In particular preferences are increasingly used in almost all matching procedures-based applications. In this work we highlight the benefit of using AI insights on preferences in a large scale application, namely the French Admission Post-Baccalaureat Platform (APB). Each year APB allocates hundreds of thousands first year applicants to universities. This is done automatically by matching applicants preferences to university seats. In practice, APB can be unable to distinguish between applicants which leads to the introduction of random selection. This has created frustration in the French public since randomness, even used as a last mean does not fare well with the republican egalitarian principle. In this work, we provide a solution to this problem. We take advantage of recent AI Preferences Theory results to show how to enhance APB in order to improve expressiveness of applicants preferences and reduce their exposure to random decisions.\nSupervisory signals have the potential to make low-dimensional data representations, like those learned by mixture and topic models, more interpretable and useful. We propose a framework for training latent variable models that explicitly balances two goals: recovery of faithful generative explanations of high-dimensional data, and accurate prediction of associated semantic labels. 
Existing approaches fail to achieve these goals due to an incomplete treatment of a fundamental asymmetry: the intended application is always predicting labels from data, not data from labels. Our prediction-constrained objective for training generative models coherently integrates loss-based supervisory signals while enabling effective semi-supervised learning from partially labeled data. We derive learning algorithms for semi-supervised mixture and topic models using stochastic gradient descent with automatic differentiation. We demonstrate improved prediction quality compared to several previous supervised topic models, achieving predictions competitive with high-dimensional logistic regression on text sentiment analysis and electronic health records tasks while simultaneously learning interpretable topics.\nMachine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve. Yet, current neural machine translation training focuses on expensive human-generated reference translations. We describe a reinforcement learning algorithm that improves neural machine translation systems from simulated human feedback. Our algorithm combines the advantage actor-critic algorithm (Mnih et al., 2016) with the attention-based neural encoder-decoder architecture (Luong et al., 2015). This algorithm (a) is well-designed for problems with a large action space and delayed rewards, (b) effectively optimizes traditional corpus-level machine translation metrics, and (c) is robust to skewed, high-variance, granular feedback modeled after actual human behaviors.\nWe present a simple method for assessing the quality of generated images in Generative Adversarial Networks (GANs). The method can be applied in any kind of GAN without interfering with the learning procedure or affecting the learning objective. 
The central idea is to define a likelihood function that correlates with the quality of the generated images. In particular, we derive a Gaussian likelihood function from the distribution of the embeddings (hidden activations) of the real images in the discriminator, and based on this, define two simple measures of how likely it is that the embeddings of generated images are from the distribution of the embeddings of the real images. This yields a simple measure of fitness for generated images, for all varieties of GANs. Empirical results on CIFAR-10 demonstrate a strong correlation between the proposed measures and the perceived quality of the generated images.\nWe propose a methodology that adapts graph embedding techniques (DeepWalk (Perozzi et al., 2014) and node2vec (Grover and Leskovec, 2016)) as well as cross-lingual vector space mapping approaches (Least Squares and Canonical Correlation Analysis) in order to merge the corpus and ontological sources of lexical knowledge. We also perform comparative analysis of the used algorithms in order to identify the best combination for the proposed system. We then apply this to the task of enhancing the coverage of an existing word embedding's vocabulary with rare and unseen words. We show that our technique can provide considerable extra coverage (over 99%), leading to consistent performance gain (around 10% absolute gain is achieved with w2v-gn-500K cf.\\S 3.3) on the Rare Word Similarity dataset.\nDeep neural networks have become a primary tool for solving problems in many fields. They are also used for addressing information retrieval problems and show strong performance in several tasks. Training these models requires large, representative datasets and for most IR tasks, such data contains sensitive information from users. 
Privacy and confidentiality concerns prevent many data owners from sharing the data, thus today the research community can only benefit from research on large-scale datasets in a limited manner. In this paper, we discuss privacy preserving mimic learning, i.e., using predictions from a privacy preserving trained model instead of labels from the original sensitive training data as a supervision signal. We present the results of preliminary experiments in which we apply the idea of mimic learning and privacy preserving mimic learning for the task of document re-ranking as one of the core IR tasks. This research is a step toward laying the ground for enabling researchers from data-rich environments to share knowledge learned from actual users' data, which should facilitate research collaborations.\nThere have been some works that learn a lexicon together with the corpus to improve the word embeddings. However, they either model the lexicon separately but update the neural networks for both the corpus and the lexicon by the same likelihood, or minimize the distance between all of the synonym pairs in the lexicon. Such methods do not consider the relatedness and difference of the corpus and the lexicon, and may not be the best optimized. In this paper, we propose a novel method that considers the relatedness and difference of the corpus and the lexicon. It trains word embeddings by learning the corpus to predict a word and its corresponding synonym under the context at the same time. For polysemous words, we use a word sense disambiguation filter to eliminate the synonyms that have different meanings for the context. To evaluate the proposed method, we compare the performance of the word embeddings trained by our proposed model, the control groups without the filter or the lexicon, and the prior works in the word similarity tasks and text classification task. 
The experimental results show that the proposed model provides better embeddings for polysemous words and improves the performance for text classification.\nElectronic Health Records are electronic data generated during or as a byproduct of routine patient care. Structured, semi-structured and unstructured EHR offer researchers unprecedented phenotypic breadth and depth and have the potential to accelerate the development of precision medicine approaches at scale. A main EHR use-case is defining phenotyping algorithms that identify disease status, onset and severity. Phenotyping algorithms utilize diagnoses, prescriptions, laboratory tests, symptoms and other elements in order to identify patients with or without a specific trait. No common standardized, structured, computable format exists for storing phenotyping algorithms. The majority of algorithms are stored as human-readable descriptive text documents making their translation to code challenging due to their inherent complexity and hinders their sharing and re-use across the community. In this paper, we evaluate the two key Semantic Web Technologies, the Web Ontology Language and the Resource Description Framework, for enabling computable representations of EHR-driven phenotyping algorithms.\nThe number of scientific articles has grown rapidly over the years and there are no signs that this growth will slow down in the near future. Because of this, it becomes increasingly difficult to keep up with the latest developments in a scientific field. To address this problem, we present here an approach to help researchers learn about the latest developments and findings by extracting in a normalized form core claims from scientific articles. This normalized representation is a controlled natural language of English sentences called AIDA, which has been proposed in previous work as a method to formally structure and organize scientific findings and discourse. 
We show how such AIDA sentences can be automatically extracted by detecting the core claim of an article, checking for AIDA compliance, and - if necessary - transforming it into a compliant sentence. While our algorithm is still far from perfect, our results indicate that the different steps are feasible and they support the claim that AIDA sentences might be a promising approach to improve scientific communication in the future.\nThe quest for better data analysis and artificial intelligence has led to more and more data being collected and stored. As a consequence, more data are exposed to malicious entities. This paper examines the problem of privacy in machine learning for classification. We utilize the Ridge Discriminant Component Analysis (RDCA) to desensitize data with respect to a privacy label. Based on five experiments, we show that desensitization by RDCA can effectively protect privacy (i.e. low accuracy on the privacy label) with small loss in utility. On HAR and CMU Faces datasets, the use of desensitized data results in random guess level accuracies for privacy at a cost of 5.14% and 0.04%, on average, drop in the utility accuracies. For Semeion Handwritten Digit dataset, accuracies of the privacy-sensitive digits are almost zero, while the accuracies for the utility-relevant digits drop by 7.53% on average. This presents a promising solution to the problem of privacy in machine learning for classification.\nTraining robots for operation in the real world is a complex, time consuming and potentially expensive task. Despite significant success of reinforcement learning in games and simulations, research in real robot applications has not been able to match similar progress. While sample complexity can be reduced by training policies in simulation, such policies can perform sub-optimally on the real platform given imperfect calibration of model dynamics. 
We present an approach -- supplemental to fine tuning on the real robot -- to further benefit from parallel access to a simulator during training and reduce sample requirements on the real robot. The developed approach harnesses auxiliary rewards to guide the exploration for the real world agent based on the proficiency of the agent in simulation and vice versa. In this context, we demonstrate empirically that the reciprocal alignment for both agents provides further benefit as the agent in simulation can adjust to optimize its behaviour for states commonly visited by the real-world agent.\nEntity retrieval is the task of finding entities such as people or products in response to a query, based solely on the textual documents they are associated with. Recent semantic entity retrieval algorithms represent queries and experts in finite-dimensional vector spaces, where both are constructed from text sequences.   We investigate entity vector spaces and the degree to which they capture structural regularities. Such vector spaces are constructed in an unsupervised manner without explicit information about structural aspects. For concreteness, we address these questions for a specific type of entity: experts in the context of expert finding. We discover how clusterings of experts correspond to committees in organizations, the ability of expert representations to encode the co-author graph, and the degree to which they encode academic rank. We compare latent, continuous representations created using methods based on distributional semantics (LSI), topic models (LDA) and neural networks (word2vec, doc2vec, SERT). Vector spaces created using neural methods, such as doc2vec and SERT, systematically perform better at clustering than LSI, LDA and word2vec. 
When it comes to encoding entity relations, SERT performs best.\nProcessing and publishing the data of the historical sciences in the semantic web is an interesting challenge in which the representation of temporal aspects plays a key role. We propose in this paper a model of temporal knowledge representation adapted to work on historical documents. This model is based on the notion of fluent that is represented in RDF graphs. We show how this model allows to represent the knowledge necessary to the historians and how it can be used to reason on this knowledge using the SWRL and SPARQL languages. This model is being used in a project to digitize, study and publish the manuscripts of linguist Ferdinand de Saussure.\nAcademic research in the field of recommender systems mainly focuses on the problem of maximizing the users' utility by trying to identify the most relevant items for each user. However, such items are not necessarily the ones that maximize the utility of the service provider (e.g., an online retailer) in terms of the business value, such as profit. One approach to increasing the providers' utility is to incorporate purchase-oriented information, e.g., the price, sales probabilities, and the resulting profit, into the recommendation algorithms. In this paper we specifically focus on price- and profit-aware recommender systems. We provide a brief overview of the relevant literature and use numerical simulations to illustrate the potential business benefit of such approaches.\nMulti-Task Learning (MTL) is a learning paradigm in machine learning and its aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks. In this paper, we give a survey for MTL. 
First, we classify different MTL algorithms into several categories: feature learning approach, low-rank approach, task clustering approach, task relation learning approach, dirty approach, multi-level approach and deep learning approach. In order to compare different approaches, we discuss the characteristics of each approach. In order to improve the performance of learning tasks further, MTL can be combined with other learning paradigms including semi-supervised learning, active learning, reinforcement learning, multi-view learning and graphical models. When the number of tasks is large or the data dimensionality is high, batch MTL models are difficult to handle this situation and online, parallel and distributed MTL models as well as feature hashing are reviewed to reveal the computational and storage advantages. Many real-world applications use MTL to boost their performance and we introduce some representative works. Finally, we present theoretical analyses and discuss several future directions for MTL.\nIn this paper, we present a new task that investigates how people interact with and make judgments about towers of blocks. In Experiment~1, participants in the lab solved a series of problems in which they had to re-configure three blocks from an initial to a final configuration. We recorded whether they used one hand or two hands to do so. In Experiment~2, we asked participants online to judge whether they think the person in the lab used one or two hands. The results revealed a close correspondence between participants' actions in the lab, and the mental simulations of participants online. To explain participants' actions and mental simulations, we develop a model that plans over a symbolic representation of the situation, executes the plan using a geometric solver, and checks the plan's feasibility by taking into account the physical constraints of the scene. 
Our model explains participants' actions and judgments to a high degree of quantitative accuracy.\nManufacturers of safety-critical systems must make the case that their product is sufficiently safe for public deployment. Much of this case often relies upon critical event outcomes from real-world testing, requiring manufacturers to be strategic about how they allocate testing resources in order to maximize their chances of demonstrating system safety. This work frames the partially observable and belief-dependent problem of test scheduling as a Markov decision process, which can be solved efficiently to yield closed-loop manufacturer testing policies. By solving for policies over a wide range of problem formulations, we are able to provide high-level guidance for manufacturers and regulators on issues relating to the testing of safety-critical systems. This guidance spans an array of topics, including circumstances under which manufacturers should continue testing despite observed incidents, when manufacturers should test aggressively, and when regulators should increase or reduce the real-world testing requirements for an autonomous vehicle.\nNowadays, there are many approaches designed for the task of detecting communities in social networks. Among them, some methods only consider the topological graph structure, while others take use of both the graph structure and the node attributes. In real-world networks, there are many uncertain and noisy attributes in the graph. In this paper, we will present how we detect communities in graphs with uncertain attributes in the first step. The numerical, probabilistic as well as evidential attributes are generated according to the graph structure. In the second step, some noise will be added to the attributes. We perform experiments on graphs with different types of attributes and compare the detection results in terms of the Normalized Mutual Information (NMI) values. 
The experimental results show that the clustering with evidential attributes gives better results comparing to those with probabilistic and numerical attributes. This illustrates the advantages of evidential attributes.\nWe introduce $\\mathcal{DLR}^+$, an extension of the n-ary propositionally closed description logic $\\mathcal{DLR}$ to deal with attribute-labelled tuples (generalising the positional notation), projections of relations, and global and local objectification of relations, able to express inclusion, functional, key, and external uniqueness dependencies. The logic is equipped with both TBox and ABox axioms. We show how a simple syntactic restriction on the appearance of projections sharing common attributes in a $\\mathcal{DLR}^+$ knowledge base makes reasoning in the language decidable with the same computational complexity as $\\mathcal{DLR}$. The obtained $\\mathcal{DLR}^\\pm$ n-ary description logic is able to encode more thoroughly conceptual data models such as EER, UML, and ORM.\nDomain adaptation is an important open problem in deep reinforcement learning (RL). In many scenarios of interest data is hard to obtain, so agents may learn a source policy in a setting where data is readily available, with the hope that it generalises well to the target domain. We propose a new multi-stage RL agent, DARLA (DisentAngled Representation Learning Agent), which learns to see before learning to act. DARLA's vision is based on learning a disentangled representation of the observed environment. Once DARLA can see, it is able to acquire source policies that are robust to many domain shifts - even with no access to the target domain. 
DARLA significantly outperforms conventional baselines in zero-shot domain adaptation scenarios, an effect that holds across a variety of RL environments (Jaco arm, DeepMind Lab) and base RL algorithms (DQN, A3C and EC).\nIn this work we present a technique to use natural language to help reinforcement learning generalize to unseen environments. This technique uses neural machine translation, specifically the use of encoder-decoder networks, to learn associations between natural language behavior descriptions and state-action information. We then use this learned model to guide agent exploration using a modified version of policy shaping to make it more effective at learning in unseen environments. We evaluate this technique using the popular arcade game, Frogger, under ideal and non-ideal conditions. This evaluation shows that our modified policy shaping algorithm improves over a Q-learning agent as well as a baseline version of policy shaping.\nMatching 3D rigid point clouds in complex environments robustly and accurately is still a core technique used in many applications. This paper proposes a new architecture combining error estimation from sample covariances and dual global probability alignment based on the convolution of adaptive Gaussian Mixture Models (GMM) from point clouds. Firstly, a novel adaptive GMM is defined using probability distributions from the corresponding points. Then rigid point cloud alignment is performed by maximizing the global probability from the convolution of dual adaptive GMMs in the whole 2D or 3D space, which can be efficiently optimized and has a large zone of accurate convergence. 
Thousands of trials have been conducted on 200 models from public 2D and 3D datasets to demonstrate superior robustness and accuracy in complex environments with unpredictable noise, outliers, occlusion, initial rotation, shape and missing points.\nRobots operating alongside humans in diverse, stochastic environments must be able to accurately interpret natural language commands. These instructions often fall into one of two categories: those that specify a goal condition or target state, and those that specify explicit actions, or how to perform a given task. Recent approaches have used reward functions as a semantic representation of goal-based commands, which allows for the use of a state-of-the-art planner to find a policy for the given task. However, these reward functions cannot be directly used to represent action-oriented commands. We introduce a new hybrid approach, the Deep Recurrent Action-Goal Grounding Network (DRAGGN), for task grounding and execution that handles natural language from either category as input, and generalizes to unseen environments. Our robot-simulation results demonstrate that a system successfully interpreting both goal-oriented and action-oriented task specifications brings us closer to robust natural language understanding for human-robot interaction.\nGossip protocols aim at arriving, by means of point-to-point or group communications, at a situation in which all the agents know each other secrets. Recently a number of authors studied distributed epistemic gossip protocols. These protocols use as guards formulas from a simple epistemic logic, which makes their analysis and verification substantially easier.   We study here common knowledge in the context of such a logic. First, we analyze when it can be reduced to iterated knowledge. Then we show that the semantics and truth for formulas without nested common knowledge operator are decidable. 
This implies that implementability, partial correctness and termination of distributed epistemic gossip protocols that use non-nested common knowledge operator is decidable, as well. Given that common knowledge is equivalent to an infinite conjunction of nested knowledge, these results are non-trivial generalizations of the corresponding decidability results for the original epistemic logic, established in (Apt & Wojtczak, 2016).   K. R. Apt & D. Wojtczak (2016): On Decidability of a Logic of Gossips. In Proc. of JELIA 2016, pp. 18-33, doi:10.1007/978-3-319-48758-8_2.\nAn abstract argumentation framework can be used to model the argumentative stance of an agent at a high level of abstraction, by indicating for every pair of arguments that is being considered in a debate whether the first attacks the second. When modelling a group of agents engaged in a debate, we may wish to aggregate their individual argumentation frameworks to obtain a single such framework that reflects the consensus of the group. Even when agents disagree on many details, there may well be high-level agreement on important semantic properties, such as the acceptability of a given argument. Using techniques from social choice theory, we analyse under what circumstances such semantic properties agreed upon by the individual agents can be preserved under aggregation.\nWhile there have been many attempts, going back to BAN logic, to base reasoning about security protocols on epistemic notions, they have not been all that successful. Arguably, this has been due to the particular logics chosen. We present a simple logic based on the well-understood modal operators of knowledge, time, and probability, and show that it is able to handle issues that have often been swept under the rug by other approaches, while being flexible enough to capture all the higher-level security notions that appear in BAN logic. 
Moreover, while still assuming that the knowledge operator allows for unbounded computation, it can handle the fact that a computationally bounded agent cannot decrypt messages in a natural way, by distinguishing strings and message terms. We demonstrate that our logic can capture BAN logic notions by providing a translation of the BAN operators into our logic, capturing belief by a form of probabilistic knowledge.\nWe introduce an axiomatic approach to group recommendations, in line of previous work on the axiomatic treatment of trust-based recommendation systems, ranking systems, and other foundational work on the axiomatic approach to internet mechanisms in social choice settings. In group recommendations we wish to recommend to a group of agents, consisting of both opinionated and undecided members, a joint choice that would be acceptable to them. Such a system has many applications, such as choosing a movie or a restaurant to go to with a group of friends, recommending games for online game players, & other communal activities.   Our method utilizes a given social graph to extract information on the undecided, relying on the agents influencing them. We first show that a set of fairly natural desired requirements (a.k.a axioms) leads to an impossibility, rendering mutual satisfaction of them unreachable. However, we also show a modified set of axioms that fully axiomatize a group variant of the random-walk recommendation system, expanding a previous result from the individual recommendation case.\nThis paper combines two studies: a topological semantics for epistemic notions and abstract argumentation theory. In our combined setting, we use a topological semantics to represent the structure of an agent's collection of evidence, and we use argumentation theory to single out the relevant sets of evidence through which a notion of beliefs grounded on arguments is defined. 
We discuss the formal properties of this newly defined notion, providing also a formal language with a matching modality together with a sound and complete axiom system for it. Despite the fact that our agent can combine her evidence in a 'rational' way (captured via the topological structure), argument-based beliefs are not closed under conjunction. This illustrates the difference between an agent's reasoning abilities (i.e. the way she is able to combine her available evidence) and the closure properties of her beliefs. We use this point to argue for why the failure of closure under conjunction of belief should not bear the burden of the failure of rationality.\nLegal probabilism (LP) claims the degrees of conviction in juridical fact-finding are to be modeled exactly the way degrees of beliefs are modeled in standard bayesian epistemology. Classical legal probabilism (CLP) adds that the conviction is justified if the credence in guilt given the evidence is above an appropriate guilt probability threshold. The views are challenged on various counts, especially by the proponents of the so-called narrative approach, on which the fact-finders' decision is the result of a dynamic interplay between competing narratives of what happened. I develop a way a bayesian epistemologist can make sense of the narrative approach. I do so by formulating a probabilistic framework for evaluating competing narrations in terms of formal explications of the informal evaluation criteria used in the narrative approach.\nRecent years witnessed a growing interest in non-standard epistemic logics of knowing whether, knowing how, knowing what, knowing why and so on. The new epistemic modalities introduced in those logics all share, in their semantics, the general schema of $\\exists x \\Box \\phi$, e.g., knowing how to achieve $\\phi$ roughly means that there exists a way such that you know that it is a way to ensure that $\\phi$. Moreover, the resulting logics are decidable. 
Inspired by those particular logics, in this work, we propose a very general and powerful framework based on quantifier-free predicate language extended by a new modality $\\Box^x$, which packs exactly $\\exists x \\Box$ together. We show that the resulting language, though much more expressive, shares many good properties of the basic propositional modal logic over arbitrary models, such as finite-tree-model property and van Benthem-like characterization w.r.t.\\ first-order modal logic. We axiomatize the logic over S5 frames with intuitive axioms to capture the interaction between $\\Box^x$ and know-that operator in an epistemic setting.\nWe propose a general and model-free approach for Reinforcement Learning (RL) on real robotics with sparse rewards. We build upon the Deep Deterministic Policy Gradient (DDPG) algorithm to use demonstrations. Both demonstrations and actual interactions are used to fill a replay buffer and the sampling ratio between demonstrations and transitions is automatically tuned via a prioritized replay mechanism. Typically, carefully engineered shaping rewards are required to enable the agents to efficiently explore on high dimensional control problems such as robotics. They are also required for model-based acceleration methods relying on local solvers such as iLQG (e.g. Guided Policy Search and Normalized Advantage Function). The demonstrations replace the need for carefully engineered rewards, and reduce the exploration problem encountered by classical RL approaches in these domains. Demonstrations are collected by a robot kinesthetically force-controlled by a human demonstrator. Results on four simulated insertion tasks show that DDPG from demonstrations out-performs DDPG, and does not require engineered rewards. 
Finally, we demonstrate the method on a real robotics task consisting of inserting a clip (flexible object) into a rigid object.\nDeep residual learning (ResNet) is a new method for training very deep neural networks using identity mapping for shortcut connections. ResNet has won the ImageNet ILSVRC 2015 classification task, and achieved state-of-the-art performances in many computer vision tasks. However, the effect of residual learning on noisy natural language processing tasks is still not well understood. In this paper, we design a novel convolutional neural network (CNN) with residual learning, and investigate its impacts on the task of distantly supervised noisy relation extraction. Contrary to the popular belief that ResNet only works well for very deep networks, we found that even with 9 layers of CNNs, using identity mapping could significantly improve the performance for distantly-supervised relation extraction.\nIn this article we study the transfer learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the average performance of the teacher, its variance and the importance of reward discounting in advising. The experiments show the non-trivial importance of the coefficient of variation (CV) as a statistic for choosing policies that generate advice. The CV statistic relates variance to the corresponding mean. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution we formulate the problem as a learning one and propose a novel RL algorithm capable of learning when to advise, adapting to the student and the task at hand. 
Furthermore, we argue that learning to advise under a budget is an instance of a more generic learning problem: Constrained Exploitation Reinforcement Learning.\nMachine comprehension(MC) style question answering is a representative problem in natural language processing. Previous methods rarely spend time on the improvement of encoding layer, especially the embedding of syntactic information and name entity of the words, which are very crucial to the quality of encoding. Moreover, existing attention methods represent each query word as a vector or use a single vector to represent the whole query sentence, neither of them can handle the proper weight of the key words in query sentence. In this paper, we introduce a novel neural network architecture called Multi-layer Embedding with Memory Network(MEMEN) for machine reading task. In the encoding layer, we employ classic skip-gram model to the syntactic and semantic information of the words to train a new kind of embedding layer. We also propose a memory network of full-orientation matching of the query and passage to catch more pivotal information. Experiments show that our model has competitive results both from the perspectives of precision and efficiency in Stanford Question Answering Dataset(SQuAD) among all published results and achieves the state-of-the-art results on TriviaQA dataset.\nA novel data-driven stochastic robust optimization (DDSRO) framework is proposed for optimization under uncertainty leveraging labeled multi-class uncertainty data. Uncertainty data in large datasets are often collected from various conditions, which are encoded by class labels. Machine learning methods including Dirichlet process mixture model and maximum likelihood estimation are employed for uncertainty modeling. A DDSRO framework is further proposed based on the data-driven uncertainty model through a bi-level optimization structure. 
The outer optimization problem follows a two-stage stochastic programming approach to optimize the expected objective across different data classes; adaptive robust optimization is nested as the inner problem to ensure the robustness of the solution while maintaining computational tractability. A decomposition-based algorithm is further developed to solve the resulting multi-level optimization problem efficiently. Case studies on process network design and planning are presented to demonstrate the applicability of the proposed framework and algorithm.\nWe propose a recurrent extension of the Ladder networks whose structure is motivated by the inference required in hierarchical latent variable models. We demonstrate that the recurrent Ladder is able to handle a wide variety of complex learning tasks that benefit from iterative inference and temporal modeling. The architecture shows close-to-optimal results on temporal modeling of video data, competitive results on music modeling, and improved perceptual grouping based on higher order abstractions, such as stochastic textures and motion cues. We present results for fully supervised, semi-supervised, and unsupervised tasks. The results suggest that the proposed architecture and principles are powerful tools for learning a hierarchy of abstractions, learning iterative inference and handling temporal information.\nTopological models of empirical and formal inquiry are increasingly prevalent. They have emerged in such diverse fields as domain theory [1, 16], formal learning theory [18], epistemology and philosophy of science [10, 15, 8, 9, 2], statistics [6, 7] and modal logic [17, 4]. In those applications, open sets are typically interpreted as hypotheses deductively verifiable by true propositional information that rules out relevant possibilities. However, in statistical data analysis, one routinely receives random samples logically compatible with every statistical hypothesis. 
We bridge the gap between propositional and statistical data by solving for the unique topology on probability measures in which the open sets are exactly the statistically verifiable hypotheses. Furthermore, we extend that result to a topological characterization of learnability in the limit from statistical data.\nWe present an approach to synthesizing photographic images conditioned on semantic layouts. Given a semantic label map, our approach produces an image with photographic appearance that conforms to the input layout. The approach thus functions as a rendering engine that takes a two-dimensional semantic specification of the scene and produces a corresponding photographic image. Unlike recent and contemporaneous work, our approach does not rely on adversarial training. We show that photographic images can be synthesized from semantic layouts by a single feedforward network with appropriate structure, trained end-to-end with a direct regression objective. The presented approach scales seamlessly to high resolutions; we demonstrate this by synthesizing photographic images at 2-megapixel resolution, the full resolution of our training data. Extensive perceptual experiments on datasets of outdoor and indoor scenes demonstrate that images synthesized by the presented approach are considerably more realistic than alternative approaches. The results are shown in the supplementary video at https://youtu.be/0fhUJT21-bs\nA device which contains number of symbol input keys, where the number of available keys is less than the number of symbols of an alphabet of any given language, screen, and dynamic reordering table of the symbols which are mapped onto those keys, according to a disambiguation method based on the previously entered symbols. 
The device incorporates a previously entered keystrokes tracking mechanism, and the key selected by the user detector, as well as a mechanism to select the dynamic symbol reordering mapped onto this key according to the information contained to the reordering table. The reordering table occurs from a disambiguation method which reorders the symbol appearance. The reordering information occurs from Bayesian Belief network construction and training from text corpora of the specific language.\nIn this work we present a novel system for PET estimation using CT scans. We explore the use of fully convolutional networks (FCN) and conditional generative adversarial networks (GAN) to export PET data from CT data. Our dataset includes 25 pairs of PET and CT scans where 17 were used for training and 8 for testing. The system was tested for detection of malignant tumors in the liver region. Initial results look promising showing high detection performance with a TPR of 92.3% and FPR of 0.25 per case. Future work entails expansion of the current system to the entire body using a much larger dataset. Such a system can be used for tumor detection and drug treatment evaluation in a CT-only environment instead of the expensive and radioactive PET-CT scan.\nPrecision medicine requires the precision disease risk prediction models. In literature, there have been a lot well-established (inter-)national risk models, but when applying them into the local population, the prediction performance becomes unsatisfactory. To address the localization issue, this paper exploits the way to develop knowledge-enhanced localized risk models. On the one hand, we tune models by learning from regional Electronic Health Record (EHR) repositories, and on the other hand, we propose knowledge injection into the EHR data learning process. 
For experiments, we leverage the Pooled Cohort Equations (PCE, as recommended in ACC/AHA guidelines to estimate the risk of ASCVD) to develop a localized ASCVD risk prediction model in diabetes. The experimental results show that, if directly using the PCE algorithm on our cohort, the AUC is only 0.653, while our knowledge-enhanced localized risk model can achieve higher prediction performance with AUC of 0.723 (improved by 10.7%).\nConvolutional Neural Networks have been highly successful in performing a host of computer vision tasks such as object recognition, object detection, image segmentation and texture synthesis. In 2015, Gatys et. al [7] show how the style of a painter can be extracted from an image of the painting and applied to another normal photograph, thus recreating the photo in the style of the painter. The method has been successfully applied to a wide range of images and has since spawned multiple applications and mobile apps. In this paper, the neural style transfer algorithm is applied to fashion so as to synthesize new custom clothes. We construct an approach to personalize and generate new custom clothes based on a users preference and by learning the users fashion choices from a limited set of clothes from their closet. The approach is evaluated by analyzing the generated images of clothes and how well they align with the users fashion style.\nModel based iterative reconstruction (MBIR) algorithms for low-dose X-ray CT are computationally expensive. To address this problem, we recently proposed a deep convolutional neural network (CNN) for low-dose X-ray CT and won the second place in 2016 AAPM Low-Dose CT Grand Challenge. However, some of the texture were not fully recovered. To address this problem, here we propose a novel framelet-based denoising algorithm using wavelet residual network which synergistically combines the expressive power of deep learning and the performance guarantee from the framelet-based denoising algorithms. 
The new algorithms were inspired by the recent interpretation of the deep convolutional neural network (CNN) as a cascaded convolution framelet signal representation. Extensive experimental results confirm that the proposed networks have significantly improved performance and preserves the detail texture of the original images.\nComputer vision has benefited from initializing multiple deep layers with weights pretrained on large supervised training sets like ImageNet. Natural language processing (NLP) typically sees initialization of only the lowest layer of deep models with pretrained word vectors. In this paper, we use a deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation (MT) to contextualize word vectors. We show that adding these context vectors (CoVe) improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks: sentiment analysis (SST, IMDb), question classification (TREC), entailment (SNLI), and question answering (SQuAD). For fine-grained sentiment analysis and entailment, CoVe improves performance of our baseline models to the state of the art.\nThe combination of argumentation and probability paves the way to new accounts of qualitative and quantitative uncertainty, thereby offering new theoretical and applicative opportunities. Due to a variety of interests, probabilistic argumentation is approached in the literature with different frameworks, pertaining to structured and abstract argumentation, and with respect to diverse types of uncertainty, in particular the uncertainty on the credibility of the premises, the uncertainty about which arguments to consider, and the uncertainty on the acceptance status of arguments or statements. 
Towards a general framework for probabilistic argumentation, we investigate a labelling-oriented framework encompassing a basic setting for rule-based argumentation and its (semi-) abstract account, along with diverse types of uncertainty. Our framework provides a systematic treatment of various kinds of uncertainty and of their relationships and allows us to back or question assertions from the literature.\nProjective Simulation was introduced as a novel approach to Artificial Intelligence. It involves a deliberation procedure that consists of a random walk on a graph of clips and allows for the learning agent to project itself into the future before committing to an action. Here we study and analyze a quantum mechanical version in which the random walk is performed by two kinds of Hamiltonians. The first kind is implemented by naively embedding the classical model in a quantum model by turning the clips into qubits. The other allows for storing clips in superpositions of qubits allowing for a potentially purely quantum mechanical learning procedure in which the perception of the environment is purely quantum mechanical but the action is classical. We lastly introduce the concept of interacting projective agents for both the classical and quantum mechanical case.\nRecently, some E-commerce sites launch a new interaction box called Tips on their mobile apps. Users can express their experience and feelings or provide suggestions using short texts typically several words or one sentence. In essence, writing some tips and giving a numerical rating are two facets of a user's product assessment action, expressing the user experience and feelings. Jointly modeling these two facets is helpful for designing a better recommendation system. While some existing models integrate text information such as item specifications or user reviews into user and item latent factors for improving the rating prediction, no existing works consider tips for improving recommendation quality. 
We propose a deep learning based framework named NRT which can simultaneously predict precise ratings and generate abstractive tips with good linguistic quality simulating user experience and feelings. For abstractive tips generation, gated recurrent neural networks are employed to \"translate\" user and item latent representations into a concise sentence. Extensive experiments on benchmark datasets from different domains show that NRT achieves significant improvements over the state-of-the-art methods. Moreover, the generated tips can vividly predict the user experience and feelings.\nExemplar-based face sketch synthesis methods usually meet the challenging problem that input photos are captured in different lighting conditions from training photos. The critical step causing the failure is the search of similar patch candidates for an input photo patch. Conventional illumination invariant patch distances are adopted rather than directly relying on pixel intensity difference, but they will fail when local contrast within a patch changes. In this paper, we propose a fast preprocessing method named Bidirectional Luminance Remapping (BLR), which interactively adjust the lighting of training and input photos. Our method can be directly integrated into state-of-the-art exemplar-based methods to improve their robustness with ignorable computational cost.\nDiscriminative correlation filters (DCFs) have been shown to perform superiorly in visual tracking. They only need a small set of training samples from the initial frame to generate an appearance model. However, existing DCFs learn the filters separately from feature extraction, and update these filters using a moving average operation with an empirical weight. These DCF trackers hardly benefit from the end-to-end training. In this paper, we propose the CREST algorithm to reformulate DCFs as a one-layer convolutional neural network. 
Our method integrates feature extraction, response map generation as well as model update into the neural networks for an end-to-end training. To reduce model degradation during online update, we apply residual learning to take appearance changes into account. Extensive experiments on the benchmark datasets demonstrate that our CREST tracker performs favorably against state-of-the-art trackers.\nExplaining and reasoning about processes which underlie observed black-box phenomena enables the discovery of causal mechanisms, derivation of suitable abstract representations and the formulation of more robust predictions. We propose to learn high level functional programs in order to represent abstract models which capture the invariant structure in the observed data. We introduce the $\\pi$-machine (program-induction machine) -- an architecture able to induce interpretable LISP-like programs from observed data traces. We propose an optimisation procedure for program learning based on backpropagation, gradient descent and A* search. We apply the proposed method to two problems: system identification of dynamical systems and explaining the behaviour of a DQN agent. Our results show that the $\\pi$-machine can efficiently induce interpretable programs from individual data traces.\nHierarchical reinforcement learning methods offer a powerful means of planning flexible behavior in complicated domains. However, learning an appropriate hierarchical decomposition of a domain into subtasks remains a substantial challenge. We present a novel algorithm for subtask discovery, based on the recently introduced multitask linearly-solvable Markov decision process (MLMDP) framework. The MLMDP can perform never-before-seen tasks by representing them as a linear combination of a previously learned basis set of tasks. In this setting, the subtask discovery problem can naturally be posed as finding an optimal low-rank approximation of the set of tasks the agent will face in a domain. 
We use non-negative matrix factorization to discover this minimal basis set of tasks, and show that the technique learns intuitive decompositions in a variety of domains. Our method has several qualitatively desirable features: it is not limited to learning subtasks with single goal states, instead learning distributed patterns of preferred states; it learns qualitatively different hierarchical decompositions in the same domain depending on the ensemble of tasks the agent will face; and it may be straightforwardly iterated to obtain deeper hierarchical decompositions.\nAs technology becomes more advanced, those who design, use and are otherwise affected by it want to know that it will perform correctly, understand why it does what it does, and know how to use it appropriately. In essence, they want to be able to trust the systems that are being designed. In this survey we present assurances, the methods by which users can understand how to trust this technology. Trust between humans and autonomy is reviewed, and the implications for the design of assurances are highlighted. A survey of research that has been performed with respect to assurances is presented, and several key ideas are extracted in order to refine the definition of assurances. Several directions for future research are identified and discussed.\nWe explain that the difficulties of training deep neural networks come from a syndrome of three consistency issues. This paper describes our efforts in their analysis and treatment. The first issue is the training speed inconsistency in different layers. We propose to address it with an intuitive, simple-to-implement, low-footprint second-order method. The second issue is the scale inconsistency between the layer inputs and the layer residuals. We explain how second-order information helps remove this roadblock. The third and most challenging issue is the inconsistency in residual propagation.
Based on the fundamental theorem of linear algebra, we provide a mathematical characterization of the famous vanishing gradient problem. Thus, an important design principle for future optimization and neural network design is derived. We conclude this paper with the construction of a novel contractive neural network.\nAlgorithms learned from data are increasingly used for deciding many aspects of our lives: from the movies we see to the prices we pay or the medicine we get. Yet there is growing evidence that decision making by inappropriately trained algorithms may unintentionally discriminate against people. For example, in automated matching of candidate CVs with job descriptions, algorithms may capture and propagate ethnicity-related biases. Several repairs for selected algorithms have already been proposed, but the underlying mechanisms of how such discrimination happens from the computational perspective are not yet scientifically understood. We need to develop a theoretical understanding of how algorithms may become discriminatory, and establish fundamental machine learning principles for prevention. We need to analyze the machine learning process as a whole to systematically explain the roots of discrimination occurrence, which will allow us to devise global machine learning optimization criteria for guaranteed prevention, as opposed to pushing empirical constraints into existing algorithms case-by-case. As a result, the state-of-the-art will advance from heuristic repairing to proactive and theoretically supported prevention. This is needed not only because the law requires protecting vulnerable people. The penetration of big data initiatives will only increase, and computer science needs to provide solid explanations and accountability to the public, before public concerns lead to unnecessarily restrictive regulations against machine learning.\nWhile online communities have become increasingly important over the years, the moderation of user-generated content is still performed mostly manually.
Automating this task is an important step in reducing the financial cost associated with moderation, but the majority of automated approaches strictly based on message content are highly vulnerable to intentional obfuscation. In this paper, we discuss methods for extracting conversational networks based on raw multi-participant chat logs, and we study the contribution of graph features to a classification system that aims to determine if a given message is abusive. The conversational graph-based system yields unexpectedly high performance, with results comparable to those previously obtained with a content-based approach.\nIt has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal.\nHigh-dimensional representations, such as radial basis function networks or tile coding, are common choices for policy evaluation in reinforcement learning.
Learning with such high-dimensional representations, however, can be expensive, particularly for matrix methods, such as least-squares temporal difference learning or quasi-Newton methods that approximate matrix step-sizes. In this work, we explore the utility of sketching for these two classes of algorithms. We highlight issues with sketching the high-dimensional features directly, which can incur significant bias. As a remedy, we demonstrate how to use sketching more sparingly, with only a left-sided sketch, that can still enable significant computational gains and the use of these matrix-based learning algorithms that are less sensitive to parameters. We empirically investigate these algorithms in four domains with a variety of representations. Our aim is to provide insights into the effective use of sketching in practice.\nWe describe the University of Maryland machine translation systems submitted to the WMT17 German-English Bandit Learning Task. The task is to adapt a translation system to a new domain, using only bandit feedback: the system receives a German sentence to translate, produces an English sentence, and only gets a scalar score as feedback. Targeting these two challenges (adaptation and bandit learning), we built a standard neural machine translation system and extended it in two ways: (1) robust reinforcement learning techniques to learn effectively from the bandit feedback, and (2) domain adaptation using data selection from a large corpus of parallel data.\nAgent-based modeling and simulation tools provide a mature platform for the development of complex simulations. However, they have not been applied much in the domain of mainstream modeling and simulation of computer networks.
In this article, we evaluate whether and how these tools can offer any added value in the modeling & simulation of complex networks such as pervasive computing, large-scale peer-to-peer systems, and networks involving considerable environment and human/animal/habitat interaction. Specifically, we demonstrate the effectiveness of NetLogo, a tool that has been widely used in the area of agent-based social simulation.\nIn the real world, agents or entities are in a continuous state of interaction. These interactions lead to various types of complexity dynamics. One key difficulty in the study of complex agent interactions is the difficulty of modeling agent communication on the basis of rewards. Game theory offers a perspective for analyzing and modeling these interactions. While a large amount of literature is available on game theory, most of it is from specific domains and does not cater for the concepts from an agent-based perspective. In this paper, we present a comprehensive multidisciplinary state-of-the-art review and taxonomy of game theory models of complex interactions between agents.\nThe success of various applications, including robotics, digital content creation, and visualization, demands a structured and abstract representation of the 3D world from limited sensor data. Inspired by the nature of human perception of 3D shapes as a collection of simple parts, we explore such an abstract shape representation based on primitives. Given a single depth image of an object, we present 3D-PRNN, a generative recurrent neural network that synthesizes multiple plausible shapes composed of a set of primitives. Our generative model encodes symmetry characteristics of common man-made objects, preserves long-range structural coherence, and describes objects of varying complexity with a compact representation. We also propose a method based on Gaussian Fields to generate a large-scale dataset of primitive-based shape representations to train our network.
We evaluate our approach on a wide range of examples and show that it outperforms nearest-neighbor-based shape retrieval methods and is on par with voxel-based generative models while using a significantly reduced parameter space.\nVariational inference is a popular technique to approximate a possibly intractable Bayesian posterior with a more tractable one. Recently, boosting variational inference has been proposed as a new paradigm to approximate the posterior by a mixture of densities by greedily adding components to the mixture. However, as is the case with many other variational inference algorithms, its theoretical properties have not been studied. In the present work, we study the convergence properties of this approach from a modern optimization viewpoint by establishing connections to the classic Frank-Wolfe algorithm. Our analysis yields novel theoretical insights regarding the sufficient conditions for convergence, explicit rates, and algorithmic simplifications. Since much previous work on variational inference has focused on tractability, our work is especially important as a much-needed attempt to bridge the gap between probabilistic models and their corresponding theoretical properties.\nIn this paper we present a new dataset and user simulator e-QRAQ (explainable Query, Reason, and Answer Question) which tests an Agent's ability to read an ambiguous text; ask questions until it can answer a challenge question; and explain the reasoning behind its questions and answer. The User simulator provides the Agent with a short, ambiguous story and a challenge question about the story. The story is ambiguous because some of the entities have been replaced by variables. At each turn the Agent may ask for the value of a variable or try to answer the challenge question. In response, the User simulator provides a natural language explanation of why the Agent's query or answer was useful in narrowing down the set of possible answers, or not.
To demonstrate one potential application of the e-QRAQ dataset, we train a new neural architecture based on End-to-End Memory Networks to successfully generate both predictions and partial explanations of its current understanding of the problem. We observe a strong correlation between the quality of the prediction and explanation.\nIn this work we introduce declarative statistics, a suite of declarative modelling tools for statistical analysis. Statistical constraints represent the key building block of declarative statistics. First, we introduce a range of relevant counting and matrix constraints and associated decompositions, some of which are novel, that are instrumental in the design of statistical constraints. Second, we introduce a selection of novel statistical constraints and associated decompositions, which constitute a self-contained toolbox that can be used to tackle a wide range of problems typically encountered by statisticians. Finally, we apply these statistical constraints to a wide range of application areas drawn from classical statistics and we contrast our framework against established practices.\nWe methodologically address the problem of Q-value overestimation in deep reinforcement learning to handle high-dimensional state spaces efficiently. By adapting concepts from information theory, we introduce an intrinsic penalty signal encouraging reduced Q-value estimates. The resultant algorithm encompasses a wide range of learning outcomes containing deep Q-networks as a special case. Different learning outcomes can be demonstrated by tuning a Lagrange multiplier accordingly. We furthermore propose a novel scheduling scheme for this Lagrange multiplier to ensure efficient and robust learning. In experiments on Atari games, our algorithm outperforms other algorithms (e.g., deep and double deep Q-networks) in terms of both game-play performance and sample complexity.\nThe vanishing gradient problem was a major obstacle for the success of deep learning.
In recent years, it has been gradually alleviated through multiple techniques. However, the problem has not been overcome in a fundamental way, since it is inherent to neural networks with activation functions based on dot products. In a series of papers, we are going to analyze alternative neural network structures which are not based on dot products. In this first paper, we revisit neural networks built from layers based on distance measures and Gaussian activation functions. These kinds of networks were only sparsely used in the past since they are hard to train when using plain stochastic gradient descent methods. We show that by using Root Mean Square Propagation (RMSProp) it is possible to efficiently learn multi-layer neural networks. Furthermore, we show that, when appropriately initialized, these kinds of neural networks suffer much less from the vanishing and exploding gradient problem than traditional neural networks, even for deep networks.\nFrom scientific experiments to online A/B testing, the previously observed data often affects how future experiments are performed, which in turn affects which data will be collected. Such adaptivity introduces complex correlations between the data and the collection procedure. In this paper, we prove that when the data collection procedure satisfies natural conditions, sample means of the data have systematic \\emph{negative} biases. As an example, consider an adaptive clinical trial where additional data points are more likely to be tested for treatments that show initial promise. Our surprising result implies that the average observed treatment effects would underestimate the true effects of each treatment. We quantitatively analyze the magnitude and behavior of this negative bias in a variety of settings. We also propose a novel debiasing algorithm based on selective inference techniques.
In experiments, our method can effectively reduce bias and estimation error.\nWe release a dataset of 65646 StarCraft replays that contains 1535 million frames and 496 million player actions. We provide full game state data along with the original replays that can be viewed in StarCraft. The game state data was recorded every 3 frames, which ensures suitability for a wide variety of machine learning tasks such as strategy classification, inverse reinforcement learning, imitation learning, forward modeling, partial information extraction, and others. We use TorchCraft to extract and store the data, which standardizes the data format for both reading from replays and reading directly from the game. Furthermore, the data can be used on different operating systems and platforms. The dataset contains valid, non-corrupted replays only and its quality and diversity were ensured by a number of heuristics. We illustrate the diversity of the data with various statistics and provide examples of tasks that benefit from the dataset. We make the dataset available at https://github.com/TorchCraft/StarData . En Taro Adun!\nIn this work we focus on the following question: how important was the i-th feature in determining the outcome for a given datapoint? We identify a family of influence measures: functions that, given a datapoint x, assign a value phi_i(x) to every feature i, which roughly corresponds to feature i's importance in determining the outcome for x. This family is uniquely derived from a set of axioms: desirable properties that any reasonable influence measure should satisfy. Departing from prior work on influence measures, we assume no knowledge of, or access to, the underlying classifier labelling the dataset. In other words, our influence measures are based on the dataset alone, and do not make any queries to the classifier.
While this requirement naturally limits the scope of the explanations we provide, we show that it is effective on real datasets.\nQuestions play a prominent role in social interactions, performing rhetorical functions that go beyond those of simple informational exchange. The surface form of a question can signal the intention and background of the person asking it, as well as the nature of their relation with the interlocutor. While the informational nature of questions has been extensively examined in the context of question-answering applications, their rhetorical aspects have been largely understudied. In this work we introduce an unsupervised methodology for extracting surface motifs that recur in questions, and for grouping them according to their latent rhetorical role. By applying this framework to the setting of question sessions in the UK parliament, we show that the resulting typology encodes key aspects of the political discourse (such as the bifurcation in questioning behavior between government and opposition parties) and reveals new insights into the effects of a legislator's tenure and political career ambitions.\nGenerative statistical models of chord sequences play crucial roles in music processing. To capture syntactic similarities among certain chords (e.g., in the key of C major, between G and G7 and between F and Dm), we study hidden Markov models and probabilistic context-free grammar models with latent variables describing syntactic categories of chord symbols and their unsupervised learning techniques for inducing the latent grammar from data. Surprisingly, we find that these models often outperform conventional Markov models in predictive power, and the self-emergent categories often correspond to traditional harmonic functions. This implies the need for chord categories in harmony models from the informatics perspective.\nIn a robotised warehouse, a major issue is the safety of human operators in case of intervention in the work area of the robots.
The current solution is to shut down every robot, but this causes a loss of productivity, especially for large robotised warehouses. To avoid this loss, we need to ensure the operator's security during his/her intervention in the warehouse without powering off the robots. The human operator needs to be localised in the warehouse and the trajectories of the robots have to be modified so that they do not interfere with the human. The purpose of this paper is to demonstrate a visual localisation method with visual elements that are already available in the current warehouse setup.\nSequence-to-sequence models have shown promising improvements on the temporal task of video captioning, but they optimize word-level cross-entropy loss during training. First, using policy gradient and mixed-loss methods for reinforcement learning, we directly optimize sentence-level task-based metrics (as rewards), achieving significant improvements over the baseline, based on both automatic metrics and human evaluation on multiple datasets. Next, we propose a novel entailment-enhanced reward (CIDEnt) that corrects phrase-matching-based metrics (such as CIDEr) to only allow for logically implied partial matches and avoid contradictions, achieving further significant improvements over the CIDEr-reward model. Overall, our CIDEnt-reward model achieves a new state of the art on the MSR-VTT dataset.\nWe present a simple sequential sentence encoder for multi-domain natural language inference. Our encoder is based on stacked bidirectional LSTM-RNNs with shortcut connections and fine-tuning of word embeddings. The overall supervised model uses the above encoder to encode two input sentences into two vectors, and then uses a classifier over the vector combination to label the relationship between these two sentences as that of entailment, contradiction, or neutral.
Our Shortcut-Stacked sentence encoders achieve strong improvements over existing encoders on matched and mismatched multi-domain natural language inference (top non-ensemble single-model result in the EMNLP RepEval 2017 Shared Task (Nangia et al., 2017)). Moreover, they achieve a new state-of-the-art encoding result on the original SNLI dataset (Bowman et al., 2015).\nIn this paper, we propose a secure multibiometric system that uses deep neural networks and error-correction coding. We present a feature-level fusion framework to generate a secure multibiometric template from each user's multiple biometrics. Two fusion architectures, a fully connected architecture and a bilinear architecture, are implemented to develop a robust multibiometric shared representation. The shared representation is used to generate a cancelable biometric template that involves the selection of a different set of reliable and discriminative features for each user. This cancelable template is a binary vector and is passed through an appropriate error-correcting decoder to find the closest codeword, which is then hashed to generate the final secure template. The efficacy of the proposed approach is shown using a multimodal database where we achieve state-of-the-art matching performance, along with cancelability and security.\nThe literature on the modeling and simulation of complex adaptive systems (cas) has primarily advanced vertically in different scientific domains with scientists developing a variety of domain-specific approaches and applications. However, while cas researchers are inherently interested in an interdisciplinary comparison of models, to the best of our knowledge, there is currently no single unified framework for facilitating the development, comparison, communication and validation of models across different scientific domains.
In this thesis, we propose first steps towards such a unified framework using a combination of agent-based and complex network-based modeling approaches and guidelines formulated in the form of a set of four levels of usage, which allow multidisciplinary researchers to adopt a suitable framework level on the basis of available data types, their research study objectives and expected outcomes, thus allowing them to better plan and conduct their respective research case studies.\nRecently, software development companies have started to embrace Machine Learning (ML) techniques for introducing a range of advanced functionality in their products such as personalisation of the user experience, improved search, content recommendation and automation. The technical challenges for tackling these problems are heavily researched in the literature. A less studied area is a pragmatic approach to the role of humans in a complex modern industrial environment where ML-based systems are developed. Key stakeholders affect the system from inception up to operation and maintenance. Product managers want to embed \"smart\" experiences for their users and drive the decisions on what should be built next; software engineers are challenged to build or utilise ML software tools that require skills that are well outside their comfort zone; legal and risk departments may influence design choices and data access; operations teams are requested to maintain ML systems which are non-stationary in their nature and change behaviour over time; and finally ML practitioners should communicate with all these stakeholders to successfully build a reliable system. This paper discusses some of the challenges we faced at Atlassian as we started investing more in the ML space.\nActive learning aims to select a small subset of data for annotation such that a classifier learned on the data is highly accurate.
This is usually done using heuristic selection methods; however, the effectiveness of such methods is limited and, moreover, the performance of heuristics varies between datasets. To address these shortcomings, we introduce a novel formulation by reframing active learning as a reinforcement learning problem and explicitly learning a data selection policy, where the policy takes the role of the active learning heuristic. Importantly, our method allows the selection policy learned using simulation on one language to be transferred to other languages. We demonstrate our method using cross-lingual named entity recognition, observing uniform improvements over traditional active learning.\nMany stochastic optimization algorithms work by estimating the gradient of the cost function on the fly by sampling datapoints uniformly at random from a training set. However, the estimator might have a large variance, which inadvertently slows down the convergence rate of the algorithms. One way to reduce this variance is to sample the datapoints from a carefully selected non-uniform distribution. In this work, we propose a novel non-uniform sampling approach that uses the multi-armed bandit framework. Theoretically, we show that our algorithm asymptotically approximates the optimal variance within a factor of 3. Empirically, we show that using this datapoint-selection technique results in a significant reduction in the convergence time and variance of several stochastic optimization algorithms such as SGD, SVRG and SAGA. This approach for sampling datapoints is general, and can be used in conjunction with any algorithm that uses an unbiased gradient estimate; we expect it to have broad applicability beyond the specific examples explored in this work.\nThis paper presents a state-of-the-art model for visual question answering (VQA), which won first place in the 2017 VQA Challenge.
VQA is a task of significant importance for research in artificial intelligence, given its multimodal nature, clear evaluation protocol, and potential real-world applications. The performance of deep neural networks for VQA is very dependent on choices of architectures and hyperparameters. To help further research in the area, we describe in detail our high-performing, though relatively simple model. Through a massive exploration of architectures and hyperparameters representing more than 3,000 GPU-hours, we identified tips and tricks that lead to its success, namely: sigmoid outputs, soft training targets, image features from bottom-up attention, gated tanh activations, output embeddings initialized using GloVe and Google Images, large mini-batches, and smart shuffling of training data. We provide a detailed analysis of their impact on performance to assist others in making an appropriate selection.\nReinforcement learning is a proven technique for an agent to learn a task. However, when learning a task using reinforcement learning, the agent cannot distinguish the characteristics of the environment from those of the task. This makes it harder to transfer skills between tasks in the same environment. Furthermore, this does not reduce risk when training for a new task. In this paper, we introduce an approach to decouple the environment characteristics from the task-specific ones, allowing an agent to develop a sense of survival. We evaluate our approach in an environment where an agent must learn a sequence of collection tasks, and show that decoupled learning allows for a safer utilization of prior knowledge.\nWe discuss memory models which are based on tensor decompositions using latent representations of entities and events. We show how episodic memory and semantic memory can be realized and discuss how new memory traces can be generated from sensory input: Existing memories are the basis for perception and new memories are generated via perception. 
We relate our mathematical approach to the hippocampal memory indexing theory. We describe the first detailed mathematical models for the complete processing pipeline from sensory input and its semantic decoding, i.e., perception, to the formation of episodic and semantic memories and their declarative semantic decodings. Our main hypothesis is that perception includes an active semantic decoding process, which relies on latent representations of entities and predicates, and that episodic and semantic memories depend on the same decoding process. We contribute to the debate between the leading memory consolidation theories, i.e., the standard consolidation theory (SCT) and the multiple trace theory (MTT). The latter is closely related to the complementary learning systems (CLS) framework. In particular, we show explicitly how episodic memory can teach the neocortex to form a semantic memory, which is a core issue in MTT and CLS.\nData mining is best known for its analytical and prediction capabilities. It is used in several areas such as fraud detection, client behavior prediction, money market behavior, and bankruptcy prediction. It can also help in establishing an educational ecosystem, which discovers useful knowledge and assists educators in taking proactive decisions to boost student performance and employability. This paper presents an empirical study that compares various classification algorithms on two datasets of MCA (Masters in Computer Applications) students collected from various affiliated colleges of a reputed state university in India. One dataset includes only primary attributes, whereas the other dataset also includes secondary psychometric attributes. The results show that primary academic attributes alone do not yield good accuracy in predicting students' employability while they are in the first year of their education.
The study analyzes and stresses the role of secondary psychometric attributes for better prediction accuracy and analysis of students' performance. Timely prediction and analysis of students' performance can help management, teachers and students work on their gray areas for better results and employment opportunities.\nHierarchically structured agent plans are important for efficient planning and acting, and they also serve (among other things) to produce \"richer\" classical plans, composed not just of a sequence of primitive actions, but also \"abstract\" ones representing the supplied hierarchies. A crucial step for this and other approaches is deriving precondition and effect \"summaries\" from a given plan hierarchy. This paper provides mechanisms to do this for more pragmatic and conventional hierarchies than in the past. To this end, we formally define the notion of a precondition and an effect for a hierarchical plan; we present data structures and algorithms for automatically deriving this information; and we analyse the properties of the presented algorithms. We conclude the paper by detailing how our algorithms may be used together with a classical planner in order to obtain abstract plans.\nFacing an unknown situation, a person may not be able to firmly elicit his/her preferences over different alternatives, so he/she tends to express uncertain preferences. When a community of different persons expresses preferences over certain alternatives under uncertainty, a preference fusion process is required to obtain a collective, representative opinion of the whole community. The aim of this work is to propose a preference fusion method that copes with uncertainty and escapes the Condorcet paradox. To model preferences under uncertainty, we develop a model of preferences based on belief function theory that accurately describes and captures the uncertainty associated with individual or collective preferences.
This work improves and extends the contribution presented in a previous work. The benefits of our contribution are twofold. On the one hand, we propose a qualitative and expressive preference modeling strategy based on belief-function theory which scales better with the number of sources. On the other hand, we propose an incremental distance-based algorithm (using Jousselme distance) for the construction of the collective preference order to avoid the Condorcet paradox.\nKnowledge graphs and vector space models are robust knowledge representation techniques with individual strengths and weaknesses. Vector space models excel at determining similarity between concepts, but are severely constrained when evaluating complex dependency relations and other logic-based operations that are a strength of knowledge graphs. We describe the VKG structure that helps unify knowledge graphs and vector representation of entities, and enables powerful inference methods and search capabilities that combine their complementary strengths. We analogize this to thinking 'fast' in vector space along with thinking 'slow' and 'deeply' by reasoning over the knowledge graph. We have created a query processing engine that takes complex queries and decomposes them into subqueries optimized to run on the respective knowledge graph or vector view of a VKG. We show that the VKG structure can process specific queries that are not efficiently handled by vector spaces or knowledge graphs alone. We also demonstrate and evaluate the VKG structure and the query processing engine by developing a system called Cyber-All-Intel for knowledge extraction, representation and querying in an end-to-end pipeline grounded in the cybersecurity informatics domain.\nThere exist two main approaches to automatically extract affective orientation: lexicon-based and corpus-based. 
In this work, we argue that these two methods are compatible and show that combining them can improve the accuracy of emotion classifiers. In particular, we introduce a novel variant of the Label Propagation algorithm that is tailored to distributed word representations, we apply batch gradient descent to accelerate the optimization of label propagation and to make the optimization feasible for large graphs, and we propose a reproducible method for emotion lexicon expansion. We conclude that label propagation can expand an emotion lexicon in a meaningful way and that the expanded emotion lexicon can be leveraged to improve the accuracy of an emotion classifier.\nWe describe and evaluate a novel optimization-based off-line path planning algorithm for mobile robots based on the Counterexample-Guided Inductive Optimization (CEGIO) technique. CEGIO iteratively employs counterexamples generated from Boolean Satisfiability (SAT) and Satisfiability Modulo Theories (SMT) solvers, in order to guide the optimization process and to ensure global optimization. This paper marks the first application of CEGIO for planning mobile robot path. In particular, CEGIO has been successfully applied to obtain optimal two-dimensional paths for autonomous mobile robots using off-the-shelf SAT and SMT solvers.\nRobotic manipulation in complex open-world scenarios requires both reliable physical manipulation skills and effective and generalizable perception. In this paper, we propose a method where general purpose pretrained visual models serve as an object-centric prior for the perception system of a learned policy. We devise an object-level attentional mechanism that can be used to determine relevant objects from a few trajectories or demonstrations, and then immediately incorporate those objects into a learned policy. A task-independent meta-attention locates possible objects in the scene, and a task-specific attention identifies which objects are predictive of the trajectories. 
The scope of the task-specific attention is easily adjusted by showing demonstrations with distractor objects or with diverse relevant objects. Our results indicate that this approach exhibits good generalization across object instances using very few samples, and can be used to learn a variety of manipulation tasks using reinforcement learning.\nWe study motion planning problems where agents move inside environments that are not fully observable and subject to uncertainties. The goal is to compute a strategy for an agent that is guaranteed to satisfy certain safety and performance specifications. Such problems are naturally modelled by partially observable Markov decision processes (POMDPs). Because of the potentially huge or even infinite belief space of POMDPs, verification and strategy synthesis is in general computationally intractable. We tackle this difficulty by exploiting typical structural properties of such scenarios; for instance, we assume that agents have the ability to observe their own positions inside an environment. Ambiguity in the state of the environment is abstracted into non-deterministic choices over the possible states of the environment. Technically, this abstraction transforms POMDPs into probabilistic two-player games (PGs). For these PGs, efficient verification tools are able to determine strategies that approximate certain measures on the POMDP. If an approximation is too coarse to provide guarantees, an abstraction refinement scheme further resolves the belief space of the POMDP. We demonstrate that our method improves the state of the art by orders of magnitude compared to a direct solution of the POMDP.\nAs demand drives systems to generalize to various domains and problems, the study of multitask, transfer and lifelong learning has become an increasingly important pursuit. In discrete domains, performance on the Atari game suite has emerged as the de facto benchmark for assessing multitask learning. 
However, in continuous domains there is a lack of agreement on standard multitask evaluation environments which makes it difficult to compare different approaches fairly. In this work, we describe a benchmark set of tasks that we have developed in an extendable framework based on OpenAI Gym. We run a simple baseline using Trust Region Policy Optimization and release the framework publicly to be expanded and used for the systematic comparison of multitask, transfer, and lifelong learning in continuous domains.\nLearning representation for graph classification turns a variable-size graph into a fixed-size vector (or matrix). Such a representation works nicely with algebraic manipulations. Here we introduce a simple method to augment an attributed graph with a virtual node that is bidirectionally connected to all existing nodes. The virtual node represents the latent aspects of the graph, which are not immediately available from the attributes and local connectivity structures. The expanded graph is then put through any node representation method. The representation of the virtual node is then the representation of the entire graph. In this paper, we use the recently introduced Column Network for the expanded graph, resulting in a new end-to-end graph classification model dubbed Virtual Column Network (VCN). The model is validated on two tasks: (i) predicting bio-activity of chemical compounds, and (ii) finding software vulnerability from source code. Results demonstrate that VCN is competitive against well-established rivals.\nIn this paper, we consider a novel machine learning problem, that is, learning a classifier from noisy label distributions. In this problem, each instance with a feature vector belongs to at least one group. Then, instead of the true label of each instance, we observe the label distribution of the instances associated with a group, where the label distribution is distorted by an unknown noise. 
Our goals are to (1) estimate the true label of each instance, and (2) learn a classifier that predicts the true label of a new instance. We propose a probabilistic model that considers true label distributions of groups and parameters that represent the noise as hidden variables. The model can be learned based on a variational Bayesian method. In numerical experiments, we show that the proposed model outperforms existing methods in terms of the estimation of the true labels of instances.\nDebate summarization is a novel and challenging research area in automatic text summarization that has been largely unexplored. In this paper, we develop a debate summarization pipeline to summarize key topics which are discussed or argued in the two opposing sides of online debates. We argue that debate summaries can be generated by clustering, cluster labeling, and visualization. In our work, we investigate two different clustering approaches for the generation of the summaries. In the first approach, we generate the summaries by applying purely term-based clustering and cluster labeling. The second approach makes use of X-means for clustering and Mutual Information for labeling the clusters. Both approaches are driven by ontologies. We visualize the results using bar charts. We think that our results offer a smooth entry point for users who want a first impression of what is discussed within a debate topic containing a vast number of arguments.\nStochastic gradient descent (SGD) is a popular stochastic optimization method in machine learning. Traditional parallel SGD algorithms, e.g., SimuParallel SGD, often require all nodes to have the same performance or to consume equal quantities of data. However, these requirements are difficult to satisfy when the parallel SGD algorithms run in a heterogeneous computing environment; low-performance nodes will exert a negative influence on the final result. 
In this paper, we propose an algorithm called weighted parallel SGD (WP-SGD). WP-SGD combines weighted model parameters from different nodes in the system to produce the final output. WP-SGD makes use of the reduction in standard deviation to compensate for the loss from the inconsistency in performance of nodes in the cluster, which means that WP-SGD does not require that all nodes consume equal quantities of data. We also analyze the theoretical feasibility of running two other parallel SGD algorithms combined with WP-SGD in a heterogeneous environment. The experimental results show that WP-SGD significantly outperforms the traditional parallel SGD algorithms on distributed training systems with an unbalanced workload.\nThis paper continues the research that considers a new cognitive model based strongly on the human brain. In particular, it considers the neural binding structure of an earlier paper. It also describes some new methods in the areas of image processing and behaviour simulation. The work is all based on earlier research by the author and the new additions are intended to fit in with the overall design. For image processing, a grid-like structure is used with 'full linking'. Each cell in the classifier grid stores a list of all other cells it gets associated with and this is used as the learned image that new input is compared to. For the behaviour metric, a new prediction equation is suggested, as part of a simulation, that uses feedback and history to dynamically determine its course of action. While the new methods are from widely different topics, both can be compared with the binary-analog type of interface that is the main focus of the paper. It is suggested that the simplest of linking between a tree and ensemble can explain neural binding and variable signal strengths.\nSum-product networks (SPNs) are a class of probabilistic graphical models that allow tractable marginal inference. 
However, maximum a posteriori (MAP) inference in SPNs is NP-hard. We investigate MAP inference in SPNs from both theoretical and algorithmic perspectives. For the theoretical part, we reduce general MAP inference to its special case without evidence and hidden variables; we also show that it is NP-hard to approximate the MAP problem to $2^{n^\\epsilon}$ for fixed $0 \\leq \\epsilon < 1$, where $n$ is the input size. For the algorithmic part, we first present an exact MAP solver that runs reasonably fast and could handle SPNs with up to 1k variables and 150k arcs in our experiments. We then present a new approximate MAP solver with a good balance between speed and accuracy, and our comprehensive experiments on real-world datasets show that it has better overall performance than existing approximate solvers.\nCross-modal data retrieval has been the basis of various creative tasks performed by Artificial Intelligence (AI). One such highly challenging task for AI is to convert a book into its corresponding movie, which creative filmmakers do today. In this research, we take the first step towards it by visualizing the content of a book using its corresponding movie visuals. Given a set of sentences from a book or even a fan-fiction written in the same universe, we employ deep learning models to visualize the input by stitching together relevant frames from the movie. We studied and compared three different settings to match the book with the movie content: (i) Dialog model: using only the dialog from the movie, (ii) Visual model: using only the visual content from the movie, and (iii) Hybrid model: using the dialog and the visual content from the movie. Experiments on the publicly available MovieBook dataset show the effectiveness of the proposed models.\nT-distributed stochastic neighbor embedding (tSNE) is a popular and prize-winning approach for dimensionality reduction and visualizing high-dimensional data. 
However, tSNE is non-parametric: once a visualization is built, tSNE is not designed to incorporate additional data into the existing representation. This severely limits the applicability of tSNE to scenarios where data are added or updated over time (like dashboards or series of data snapshots). In this paper we propose, analyze and evaluate LION-tSNE (Local Interpolation with Outlier coNtrol), a novel approach for incorporating new data into tSNE representation. LION-tSNE is based on local interpolation in the vicinity of training data, outlier detection and a special outlier mapping algorithm. We show that the LION-tSNE method is robust both to outliers and to new samples from existing clusters. We also discuss multiple possible improvements for special cases. We compare LION-tSNE to a comprehensive list of possible benchmark approaches that include multiple interpolation techniques, gradient descent for new data, and neural network approximation.\nEntity alignment is the task of finding entities in two knowledge bases (KBs) that represent the same real-world object. When facing KBs in different natural languages, conventional cross-lingual entity alignment methods rely on machine translation to eliminate the language barriers. These approaches often suffer from the uneven quality of translations between languages. While recent embedding-based techniques encode entities and relationships in KBs and do not need machine translation for cross-lingual entity alignment, a significant number of attributes remain largely unexplored. In this paper, we propose a joint attribute-preserving embedding model for cross-lingual entity alignment. It jointly embeds the structures of two KBs into a unified vector space and further refines it by leveraging attribute correlations in the KBs. 
Our experimental results on real-world datasets show that this approach significantly outperforms the state-of-the-art embedding approaches for cross-lingual entity alignment and could be complemented with methods based on machine translation.\nSupport vector data description (SVDD) is a popular technique for detecting anomalies. The SVDD classifier partitions the whole space into an inlier region, which consists of the region near the training data, and an outlier region, which consists of points away from the training data. The computation of the SVDD classifier requires a kernel function, and the Gaussian kernel is a common choice for the kernel function. The Gaussian kernel has a bandwidth parameter, whose value is important for good results. A small bandwidth leads to overfitting, and the resulting SVDD classifier overestimates the number of anomalies. A large bandwidth leads to underfitting, and the classifier fails to detect many anomalies. In this paper we present a new automatic, unsupervised method for selecting the Gaussian kernel bandwidth. The selected value can be computed quickly, and it is competitive with existing bandwidth selection methods.\nThe cognitive framework of conceptual spaces [3] provides geometric means for representing knowledge. A conceptual space is a high-dimensional space whose dimensions are partitioned into so-called domains. Within each domain, the Euclidean metric is used to compute distances. Distances in the overall space are computed by applying the Manhattan metric to the intra-domain distances. Instances are represented as points in this space and concepts are represented by regions. In this paper, we derive a formula for the size of a hyperball under the combined metric of a conceptual space. One can think of such a hyperball as the set of all points having a certain minimal similarity to the hyperball's center.\nKnowledge is useless without structure. 
While the classification of knowledge has been an enduring philosophical enterprise, it recently found applications in computer science, notably for artificial intelligence. The availability of large databases allowed for complex ontologies to be built automatically, for example by extracting structured content from Wikipedia. However, this approach is subject to manual categorization decisions made by online editors. Here we show that an implicit classification hierarchy emerges spontaneously on Wikipedia. We study the network of first links between articles, and find that it centers on a core cycle involving concepts of fundamental classifying importance. We argue that this structure is rooted in cultural history. For European languages, articles like Philosophy and Science are central, whereas Human and Earth dominate for East Asian languages. This reflects the differences between ancient Greek thought and Chinese tradition. Our results reveal the powerful influence of culture on the intrinsic architecture of complex data sets.\nAmong the local consistency techniques used for solving constraint networks, path-consistency (PC) has received a great deal of attention. However, enforcing PC is computationally expensive and sometimes even unnecessary. Directional path-consistency (DPC) is a weaker notion of PC that considers a given variable ordering and can thus be enforced more efficiently than PC. This paper shows that DPC (the DPC enforcing algorithm of Dechter and Pearl) decides the constraint satisfaction problem (CSP) of a constraint language if it is complete and has the variable elimination property (VEP). However, we also show that no complete VEP constraint language can have a domain with more than 2 values. We then present a simple variant of the DPC algorithm, called DPC*, and show that the CSP of a constraint language can be decided by DPC* if it is closed under a majority operation. 
In fact, DPC* is sufficient for guaranteeing backtrack-free search for such constraint networks. Examples of majority-closed constraint classes include the classes of connected row-convex (CRC) constraints and tree-preserving constraints, which have found applications in various domains, such as scene labeling, temporal reasoning, geometric reasoning, and logical filtering. Our experimental evaluations show that DPC* significantly outperforms the state-of-the-art algorithms for solving majority-closed constraints.\nWe present LADDER, the first deep reinforcement learning agent that can successfully learn control policies for large-scale real-world problems directly from raw inputs composed of high-level semantic information. The agent is based on an asynchronous stochastic variant of DQN (Deep Q Network) named DASQN. The inputs of the agent are plain-text descriptions of states of a game of incomplete information, i.e. real-time large-scale online auctions, and the rewards are auction profits of very large scale. We apply the agent to an essential portion of JD's online RTB (real-time bidding) advertising business and find that it easily beats the former state-of-the-art bidding policy that had been carefully engineered and calibrated by human experts: during JD.com's June 18th anniversary sale, the agent increased the company's ads revenue from the portion by more than 50%, while the advertisers' ROI (return on investment) also improved significantly.\nLong Short-Term Memory (LSTM) is the primary recurrent neural network architecture for acoustic modeling in automatic speech recognition systems. Residual learning is an efficient method to help neural networks converge more easily and faster. In this paper, we propose several types of residual LSTM methods for our acoustic modeling. Our experiments indicate that, compared with classic LSTM, our architecture shows a more than 8% relative reduction in Phone Error Rate (PER) on TIMIT tasks. 
At the same time, our residual fast LSTM approach shows a 4% relative reduction in PER on the same task. Moreover, we find that all of these architectures achieve good results on the THCHS-30, Librispeech and Switchboard corpora.\nOne of the most crucial issues in data mining is to model human behaviour in order to provide personalisation, adaptation and recommendation. This usually involves implicit or explicit knowledge, either by observing user interactions, or by asking users directly. But these sources of information are always subject to the volatility of human decisions, making utilised data uncertain to a particular extent. In this contribution, we elaborate on the impact of this human uncertainty when it comes to comparative assessments of different data mining approaches. In particular, we reveal two problems: (1) biasing effects on various metrics of model-based prediction and (2) the propagation of uncertainty and the error probabilities it thereby induces for algorithm rankings. For this purpose, we introduce a probabilistic view and prove the existence of those problems mathematically, as well as provide possible solution strategies. We exemplify our theory mainly in the context of recommender systems along with the metric RMSE as a prominent example of precision quality measures.\nThis article constructs a Turing Machine which can solve for $\\beta^{'}$ which is RE-complete. Such a machine is only possible if there is something wrong with the foundations of computer science and mathematics. We therefore check our work by looking very closely at Cantor's diagonalization and construct a novel formal language as an Abelian group which allows us, through equivalence relations, to provide a non-trivial counterexample to Cantor's argument. As if that wasn't enough, we then discover that the impredicative nature of G\\\"odel's diagonalization lemma leads to logical tautology, invalidating any meaning behind the method, leaving no doubt that diagonalization is flawed. 
Our discovery in regards to these foundational arguments opens the door to solving the P vs NP problem.\nData analytics helps basketball teams to create tactics. However, manual data collection and analytics are costly and ineffective. Therefore, we applied a deep bidirectional long short-term memory (BLSTM) and mixture density network (MDN) approach. This model is not only capable of predicting a basketball trajectory based on real data, but it also can generate new trajectory samples. It is an excellent application to help coaches and players decide when and where to shoot. Its structure is particularly suitable for dealing with time series problems. BLSTM receives forward and backward information at the same time, while stacking multiple BLSTMs further increases the learning ability of the model. Combined with BLSTMs, MDN is used to generate a multi-modal distribution of outputs. Thus, the proposed model can, in principle, represent arbitrary conditional probability distributions of output variables. We tested our model with two experiments on three-pointer datasets from NBA SportVu data. In the hit-or-miss classification experiment, the proposed model outperformed other models in terms of the convergence speed and accuracy. In the trajectory generation experiment, eight model-generated trajectories at a given time closely matched real trajectories.\nDeep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. 
In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep $Q$-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.\nIn this paper we present a novel Formal Agent-Based Simulation framework (FABS). FABS uses formal specification as a means of clear description of wireless sensor networks (WSN) sensing a Complex Adaptive Environment. This specification model is then used to develop an agent-based model of both the wireless sensor network as well as the environment. As proof of concept, we demonstrate the application of FABS to a boids model of self-organized flocking of animals monitored by a random deployment of proximity sensors.\nIn this paper, a new type of 3D bin packing problem (BPP) is proposed, in which a number of cuboid-shaped items must be put into a bin one by one orthogonally. The objective is to find a way to place these items that can minimize the surface area of the bin. This problem is based on the fact that there is no fixed-sized bin in many real business scenarios and the cost of a bin is proportional to its surface area. Our research shows that this problem is NP-hard. Based on previous research on 3D BPP, the surface area is determined by the sequence, spatial locations and orientations of items. Among these factors, the sequence of items plays a key role in minimizing the surface area. 
Inspired by recent achievements of deep reinforcement learning (DRL) techniques, especially the Pointer Network, on combinatorial optimization problems such as TSP, a DRL-based method is applied to optimize the sequence of items to be packed into the bin. Numerical results show that the method proposed in this paper achieves about a 5% improvement over the heuristic method.\nThe methodology of Software-Defined Robotics hierarchical-based and stand-alone framework can be designed and implemented to program and control different sets of robots, regardless of their manufacturers' parameters and specifications, with unified commands and communications. This framework approach will increase the capability of (re)programming a specific group of robots during the runtime without affecting the others as desired in the critical missions and industrial operations, expand the shared bandwidth, enhance the reusability of code, leverage the computational processing power, decrease the unnecessary analyses of vast supplemental electrical components for each robot, as well as take advantage of the most state-of-the-art industrial trends in the cloud-based computing, Virtual Machines (VM), and Robot-as-a-Service (RaaS) technologies.\nHandwritten character recognition has been the center of research and a benchmark problem in the sector of pattern recognition and artificial intelligence, and it continues to be a challenging research topic. Due to its enormous range of applications, much work has been done in this field on different languages. Arabic, being a diversified language, has a huge scope of research with potential challenges. A convolutional neural network model for recognizing handwritten numerals in the Arabic language is proposed in this paper, where the dataset is subject to various augmentations in order to add the robustness needed for a deep learning approach. The proposed method is empowered by the presence of dropout regularization to do away with the problem of overfitting. 
Moreover, suitable change is introduced in activation function to overcome the problem of vanishing gradient. With these modifications, the proposed system achieves an accuracy of 99.4\\% which performs better than every previous work on the dataset.\nTraining large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of the output layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary. This paper proposes a Batch Noise Contrastive Estimation (B-NCE) approach to alleviate this problem. This is achieved by reducing the vocabulary, at each time step, to the target words in the batch and then replacing the softmax by the noise contrastive estimation approach, where these words play the role of targets and noise samples at the same time. In doing so, the proposed approach can be fully formulated and implemented using optimal dense matrix operations. Applying B-NCE to train different NNLMs on the Large Text Compression Benchmark (LTCB) and the One Billion Word Benchmark (OBWB) shows a significant reduction of the training time with no noticeable degradation of the models performance. This paper also presents a new baseline comparative study of different standard NNLMs on the large OBWB on a single Titan-X GPU.\nThe increasing availability of affect-rich multimedia resources has bolstered interest in understanding sentiment and emotions in and from visual content. Adjective-noun pairs (ANP) are a popular mid-level semantic construct for capturing affect via visually detectable concepts such as \"cute dog\" or \"beautiful landscape\". Current state-of-the-art methods approach ANP prediction by considering each of these compound concepts as individual tokens, ignoring the underlying relationships in ANPs. This work aims at disentangling the contributions of the `adjectives' and `nouns' in the visual prediction of ANPs. 
Two specialised classifiers, one trained for detecting adjectives and another for nouns, are fused to predict 553 different ANPs. The resulting ANP prediction model is more interpretable as it allows us to study contributions of the adjective and noun components. Source code and models are available at https://imatge-upc.github.io/affective-2017-musa2/ .\nEfficient Monte Carlo inference often requires manual construction of model-specific proposals. We propose an approach to automated proposal construction by training neural networks to provide fast approximations to block Gibbs conditionals. The learned proposals generalize to occurrences of common structural motifs both within a given model and across models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We explore several applications including open-universe Gaussian mixture models, in which our learned proposals outperform a hand-tuned sampler, and a real-world named entity recognition task, in which our sampler's ability to escape local modes yields higher final F1 scores than single-site Gibbs.\nIn this era of digitization, knowing users' sociolect aspects has become essential for building user-specific recommendation systems. These sociolect aspects can be found by mining the language users share as text in social media and reviews. This paper describes the experiment performed in the PAN Author Profiling 2017 shared task. The objective of the task is to find the sociolect aspects of users from their tweets. The sociolect aspects considered in this experiment are the user's gender and native language. Here, users' tweets written in a language different from their native language are represented as a Document-Term Matrix with document frequency as the constraint. 
Further classification is done using the Support Vector Machine by taking gender and native language as target classes. This experiment attains the average accuracy of 73.42% in gender prediction and 76.26% in the native language identification task.\nIn this paper we present a comprehensive view of prominent causal discovery algorithms, categorized into two main categories (1) assuming acyclic and no latent variables, and (2) allowing both cycles and latent variables, along with experimental results comparing them from three perspectives: (a) structural accuracy, (b) standard predictive accuracy, and (c) accuracy of counterfactual inference. For (b) and (c) we train causal Bayesian networks with structures as predicted by each causal discovery technique to carry out counterfactual or standard predictive inference. We compare causal algorithms on two publicly available and one simulated datasets having different sample sizes: small, medium and large. Experiments show that structural accuracy of a technique does not necessarily correlate with higher accuracy of inferencing tasks. Further, surveyed structure learning algorithms do not perform well in terms of structural accuracy in case of datasets having large number of variables.\nWord embeddings have been found to capture a surprisingly rich amount of syntactic and semantic knowledge. However, it is not yet sufficiently well-understood how the relational knowledge that is implicitly encoded in word embeddings can be extracted in a reliable way. In this paper, we propose two probabilistic models to address this issue. The first model is based on the common relations-as-translations view, but is cast in a probabilistic setting. Our second model is based on the much weaker assumption that there is a linear relationship between the vector representations of related words. 
Compared to existing approaches, our models lead to more accurate predictions, and they are more explicit about what can and cannot be extracted from the word embedding.\nThe nursing literature shows that cultural competence is an important requirement for effective healthcare. We claim that personal assistive robots should likewise be culturally competent, that is, they should be aware of general cultural characteristics and of the different forms they take in different individuals, and take these into account while perceiving, reasoning, and acting. The CARESSES project is an Europe-Japan collaborative effort that aims at designing, developing and evaluating culturally competent assistive robots. These robots will be able to adapt the way they behave, speak and interact to the cultural identity of the person they assist. This paper describes the approach taken in the CARESSES project, its initial steps, and its future plans.\nNetworks are models representing relationships between entities. Often these relationships are explicitly given, or we must learn a representation which generalizes and predicts observed behavior in underlying individual data (e.g. attributes or labels). Whether given or inferred, choosing the best representation affects subsequent tasks and questions on the network. This work focuses on model selection to evaluate network representations from data, focusing on fundamental predictive tasks on networks. We present a modular methodology using general, interpretable network models, task neighborhood functions found across domains, and several criteria for robust model selection. We demonstrate our methodology on three online user activity datasets and show that network model selection for the appropriate network task vs. an alternate task increases performance by an order of magnitude in our experiments.\nIn recent years, car makers and tech companies have been racing towards self driving cars. 
It seems that the main parameter in this race is who will have the first car on the road. The goal of this paper is to add to the equation two additional crucial parameters. The first is standardization of safety assurance --- what are the minimal requirements that every self-driving car must satisfy, and how can we verify these requirements. The second parameter is scalability --- engineering solutions that lead to unleashed costs will not scale to millions of cars, which will push interest in this field into a niche academic corner, and drive the entire field into a \"winter of autonomous driving\". In the first part of the paper we propose a white-box, interpretable, mathematical model for safety assurance, which we call Responsibility-Sensitive Safety (RSS). In the second part we describe a design of a system that adheres to our safety assurance requirements and is scalable to millions of cars.\nMany real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the initiation set of options conditional on the previously-executed option, and show that options with such Option-Observation Initiation Sets (OOIs) are at least as expressive as Finite State Controllers (FSCs), a state-of-the-art approach for learning in POMDPs. OOIs are easy to design based on an intuitive description of the task, lead to explainable policies and keep the top-level and option policies memoryless. 
Our experiments show that OOIs allow agents to learn optimal policies in challenging POMDPs, while being much more sample-efficient than a recurrent neural network over options.\nHuman action recognition involves the characterization of human actions through the automated analysis of video data and is integral in the development of smart computer vision systems. However, several challenges like dynamic backgrounds, camera stabilization, complex actions, occlusions etc. make action recognition in a real time and robust fashion difficult. Several complex approaches exist but are computationally intensive. This paper presents a novel approach of using a combination of good features along with iterative optical flow algorithm to compute feature vectors which are classified using a multilayer perceptron (MLP) network. The use of multiple features for motion descriptors enhances the quality of tracking. Resilient backpropagation algorithm is used for training the feedforward neural network reducing the learning time. The overall system accuracy is improved by optimizing the various parameters of the multilayer perceptron network.\nKnowledge graphs are large, useful, but incomplete knowledge repositories. They encode knowledge through entities and relations which define each other through the connective structure of the graph. This has inspired methods for the joint embedding of entities and relations in continuous low-dimensional vector spaces, that can be used to induce new edges in the graph, i.e., link prediction in knowledge graphs. Learning these representations relies on contrasting positive instances with negative ones. Knowledge graphs include only positive relation instances, leaving the door open for a variety of methods for selecting negative examples. In this paper we present an empirical study on the impact of negative sampling on the learned embeddings, assessed through the task of link prediction. 
We use state-of-the-art knowledge graph embeddings -- RESCAL, TransE, DistMult and ComplEx -- and evaluate on benchmark datasets -- FB15k and WN18. We compare well known methods for negative sampling and additionally propose embedding based sampling methods. We note a marked difference in the impact of these sampling methods on the two datasets, with the \"traditional\" corrupting positives method leading to best results on WN18, while embedding based methods benefit the task on FB15k.\nThe electronic health record (EHR) contains a large amount of multi-dimensional and unstructured clinical data of significant operational and research value. Distinguished from previous studies, our approach embraces a double-annotated dataset and strays away from obscure \"black-box\" models to comprehensive deep learning models. In this paper, we present a novel neural attention mechanism that not only classifies clinically important findings. Specifically, convolutional neural networks (CNN) with attention analysis are used to classify radiology head computed tomography reports based on five categories that radiologists would account for in assessing acute and communicable findings in daily practice. The experiments show that our CNN attention models outperform non-neural models, especially when trained on a larger dataset. Our attention analysis demonstrates the intuition behind the classifier's decision by generating a heatmap that highlights attended terms used by the CNN model; this is valuable when potential downstream medical decisions are to be performed by human experts or the classifier information is to be used in cohort construction such as for epidemiological studies.\nAnytime predictors first produce crude results quickly, and then continuously refine them until the test-time computational budget is depleted. 
Such predictors are used in real-time vision systems and streaming-data processing to efficiently utilize varying test-time budgets, and to reduce average prediction cost via early-exits. However, anytime prediction algorithms have difficulties utilizing the accurate predictions of deep neural networks (DNNs), because DNNs are often computationally expensive without competitive intermediate results. In this work, we propose to add auxiliary predictions in DNNs to generate anytime predictions, and optimize these predictions simultaneously by minimizing a carefully constructed weighted sum of losses, where the weights also oscillate during training. The proposed anytime neural networks (ANNs) produce reasonable anytime predictions without sacrificing the final performance or incurring noticeable extra computation. This enables us to assemble a sequence of exponentially deepening ANNs, and it achieves, both theoretically and practically, near-optimal anytime predictions at every budget after spending a constant fraction of extra cost. The proposed methods are shown to produce anytime predictions at the state-of-the-art level on visual recognition data-sets, including ILSVRC2012.\nRecurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often face challenges like slow inference, vanishing gradients and difficulty in capturing long term dependencies. In backpropagation through time settings, these issues are tightly coupled with the large, sequential computational graph resulting from unfolding the RNN in time. We introduce the Skip RNN model which extends existing RNN models by learning to skip state updates and shortens the effective size of the computational graph. This model can also be encouraged to perform fewer state updates through a budget constraint. 
We evaluate the proposed model on various tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models. Source code is publicly available at https://imatge-upc.github.io/skiprnn-2017-telecombcn/ .\nThe past decade has seen a significant interest in learning tractable probabilistic representations. Arithmetic circuits (ACs) were among the first proposed tractable representations, with some subsequent representations being instances of ACs with weaker or stronger properties. In this paper, we provide a formal basis under which variants on ACs can be compared, and where the precise roles and semantics of their various properties can be made more transparent. This allows us to place some recent developments on ACs in a clearer perspective and to also derive new results for ACs. This includes an exponential separation between ACs with and without determinism; completeness and incompleteness results; and tractability results (or lack thereof) when computing most probable explanations (MPEs).\nThe Koopman operator has recently garnered much attention for its value in dynamical systems analysis and data-driven model discovery. However, its application has been hindered by the computational complexity of extended dynamic mode decomposition; this requires a combinatorially large basis set to adequately describe many nonlinear systems of interest, e.g. cyber-physical infrastructure systems, biological networks, social systems, and fluid dynamics. Often the dictionaries generated for these problems are manually curated, requiring domain-specific knowledge and painstaking tuning. In this paper we introduce a deep learning framework for learning Koopman operators of nonlinear dynamical systems. We show that this novel method automatically selects efficient deep dictionaries, outperforming state-of-the-art methods. 
We benchmark this method on partially observed nonlinear systems, including the glycolytic oscillator and show it is able to predict quantitatively 100 steps into the future, using only a single timepoint, and qualitative oscillatory behavior 400 steps into the future.\nDuring the last years, Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in image classification. Their architectures have largely drawn inspiration by models of the primate visual system. However, while recent research results of neuroscience prove the existence of non-linear operations in the response of complex visual cells, little effort has been devoted to extend the convolution technique to non-linear forms. Typical convolutional layers are linear systems, hence their expressiveness is limited. To overcome this, various non-linearities have been used as activation functions inside CNNs, while also many pooling strategies have been applied. We address the issue of developing a convolution method in the context of a computational model of the visual cortex, exploring quadratic forms through the Volterra kernels. Such forms, constituting a more rich function space, are used as approximations of the response profile of visual cells. Our proposed second-order convolution is tested on CIFAR-10 and CIFAR-100. We show that a network which combines linear and non-linear filters in its convolutional layers, can outperform networks that use standard linear filters with the same architecture, yielding results competitive with the state-of-the-art on these datasets.\nThe goal of continuous emotion recognition is to assign an emotion value to every frame in a sequence of acoustic features. We show that incorporating long-term temporal dependencies is critical for continuous emotion recognition tasks. To this end, we first investigate architectures that use dilated convolutions. 
We show that even though such architectures outperform previously reported systems, the output signals produced from such architectures undergo erratic changes between consecutive time steps. This is inconsistent with the slow moving ground-truth emotion labels that are obtained from human annotators. To deal with this problem, we model a downsampled version of the input signal and then generate the output signal through upsampling. Not only does the resulting downsampling/upsampling network achieve good performance, it also generates smooth output trajectories. Our method yields the best known audio-only performance on the RECOLA dataset.\nImage relighting changes the illumination of an image to a target illumination effect without knowing the original scene geometry, material information and illumination condition. We propose a novel outdoor scene relighting method, which needs only a single reference image and is based on material constrained layer decomposition. Firstly, the material map is extracted from the input image. Then, the reference image is warped to the input image through patch match based image warping. Lastly, the input image is relit using material constrained layer decomposition. The experimental results reveal that our method can produce similar illumination effect as that of the reference image on the input image using only a single reference image.\nIn this article, we present a survey of recent advances in passive human behaviour recognition in indoor areas using the channel state information (CSI) of commercial WiFi systems. Movement of the human body causes a change in the wireless signal reflections, which results in variations in the CSI. By analyzing the data streams of CSIs for different activities and comparing them against stored models, human behaviour can be recognized. This is done by extracting features from CSI data streams and using machine learning techniques to build models and classifiers. 
The techniques from the literature that are presented herein perform well; however, instead of the machine learning techniques employed in these works, we propose to use deep learning techniques such as long short-term memory (LSTM) recurrent neural network (RNN), and show the improved performance. We also discuss different challenges such as environment change, frame rate selection, and multi-user scenario, and suggest possible directions for future work.\nAutomatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem. Unfortunately, existing automatic evaluation metrics are biased and correlate very poorly with human judgements of response quality. Yet having an accurate automatic evaluation procedure is crucial for dialogue research, as it allows rapid prototyping and testing of new models with fewer expensive human evaluations. In response to this challenge, we formulate automatic dialogue evaluation as a learning problem. We present an evaluation model (ADEM) that learns to predict human-like scores to input responses, using a new dataset of human response scores. We show that the ADEM model's predictions correlate significantly, and at a level much higher than word-overlap metrics such as BLEU, with human judgements at both the utterance and system-level. We also show that ADEM can generalize to evaluating dialogue models unseen during training, an important step for automatic dialogue evaluation.\nAn exhaustive study on neural network language modeling (NNLM) is performed in this paper. Different architectures of basic neural network language models are described and examined. A number of different improvements over basic neural network language models, including importance sampling, word classes, caching and bidirectional recurrent neural network (BiRNN), are studied separately, and the advantages and disadvantages of every technique are evaluated. 
Then, the limits of neural network language modeling are explored from the aspects of model architecture and knowledge representation. Part of the statistical information from a word sequence is lost when it is processed word by word in a certain order, and the mechanism of training neural networks by updating weight matrices and vectors imposes severe restrictions on any significant enhancement of NNLM. For knowledge representation, the knowledge represented by neural network language models is the approximate probabilistic distribution of word sequences from a certain training data set rather than the knowledge of a language itself or the information conveyed by word sequences in a natural language. Finally, some directions for improving neural network language modeling further are discussed.\nWe consider the problem of learning for planning, where knowledge acquired while planning is reused to plan faster in new problem instances. For robotic tasks, among others, plan execution can be captured as a sequence of visual images. For such domains, we propose to use deep neural networks in learning for planning, based on learning a reactive policy that imitates execution traces produced by a planner. We investigate architectural properties of deep networks that are suitable for learning long-horizon planning behavior, and explore how to learn, in addition to the policy, a heuristic function that can be used with classical planners or search algorithms such as A*. Our results on the challenging Sokoban domain show that, with a suitable network design, complex decision making policies and powerful heuristic functions can be learned through imitation.\nProportional representation (PR) is often discussed in voting settings as a major desideratum. For the past century or so, it is common both in practice and in the academic literature to jump to STV (Single Transferable Vote) as the solution for achieving PR. 
Some of the most prominent electoral reform movements around the globe are pushing for the adoption of STV.   It has been termed a major open problem to design a voting rule that satisfies the same PR properties as STV and better monotonicity properties. We present a rule called EAR (Expanding Approvals Rule) that satisfies properties stronger than the central PR axiom satisfied by STV, can handle indifferences in a convenient and computationally efficient manner, and also satisfies better candidate monotonicity properties. In view of this, our proposed rule seems to be a compelling solution for achieving proportional representation in voting settings.\nWe study the problem of allocating impressions to sellers in e-commerce websites, such as Amazon, eBay or Taobao, aiming to maximize the total revenue generated by the platform. We employ a general framework of reinforcement mechanism design, which uses deep reinforcement learning to design efficient algorithms, taking the strategic behaviour of the sellers into account. Specifically, we model the impression allocation problem as a Markov decision process, where the states encode the history of impressions, prices, transactions and generated revenue and the actions are the possible impression allocations in each round. To tackle the problem of continuity and high-dimensionality of states and actions, we adopt the ideas of the DDPG algorithm to design an actor-critic policy gradient algorithm which takes advantage of the problem domain in order to achieve convergence and stability. We evaluate our proposed algorithm, coined IA(GRU), by comparing it against DDPG, as well as several natural heuristics, under different rationality models for the sellers - we assume that sellers follow well-known no-regret type strategies which may vary in their degree of sophistication. 
We find that IA(GRU) outperforms all algorithms in terms of the total revenue.\nRecently, deep neural networks have demonstrated excellent performances in recognizing the age and gender on human face images. However, these models were applied in a black-box manner with no information provided about which facial features are actually used for prediction and how these features depend on image preprocessing, model initialization and architecture choice. We present a study investigating these different effects.   In detail, our work compares four popular neural network architectures, studies the effect of pretraining, evaluates the robustness of the considered alignment preprocessings via cross-method test set swapping and intuitively visualizes the model's prediction strategies in given preprocessing conditions using the recent Layer-wise Relevance Propagation (LRP) algorithm. Our evaluations on the challenging Adience benchmark show that suitable parameter initialization leads to a holistic perception of the input, compensating artefactual data representations. With a combination of simple preprocessing steps, we reach state of the art performance in gender recognition.\nWe consider the problem of minimizing the difference between the demand and the supply of power using microgrids. We set up multiple microgrids that provide electricity to a village. They have access to the batteries that can store renewable power and also the electrical lines from the main grid. During each time period, these microgrids need to decide on the amount of renewable power to be used from the batteries as well as the amount of power needed from the main grid. We formulate this problem in the framework of Markov Decision Process (MDP), similar to the one discussed in [1]. The power allotment to the village from the main grid is fixed and bounded, whereas the renewable energy generation is uncertain in nature. 
Therefore we adapt a distributed version of the popular Reinforcement learning technique, Multi-Agent Q-Learning to the problem. Finally, we also consider a variant of this problem where the cost of power production at the main site is taken into consideration. In this scenario the microgrids need to minimize the demand-supply deficit, while maintaining the desired average cost of the power production.\nIn recent years, many deep-learning based models are proposed for text classification. This kind of models well fits the training set from the statistical point of view. However, it lacks the capacity of utilizing instance-level information from individual instances in the training set. In this work, we propose to enhance neural network models by allowing them to leverage information from $k$-nearest neighbor (kNN) of the input text. Our model employs a neural network that encodes texts into text embeddings. Moreover, we also utilize $k$-nearest neighbor of the input text as an external memory, and utilize it to capture instance-level information from the training set. The final prediction is made based on features from both the neural network encoder and the kNN memory. Experimental results on several standard benchmark datasets show that our model outperforms the baseline model on all the datasets, and it even beats a very deep neural network model (with 29 layers) in several datasets. Our model also shows superior performance when training instances are scarce, and when the training set is severely unbalanced. Our model also leverages techniques such as semi-supervised training and transfer learning quite well.\nIn this paper, we propose a novel 3D-RecGAN approach, which reconstructs the complete 3D structure of a given object from a single arbitrary depth view using generative adversarial networks. 
Unlike the existing work which typically requires multiple views of the same object or class labels to recover the full 3D geometry, the proposed 3D-RecGAN only takes the voxel grid representation of a depth view of the object as input, and is able to generate the complete 3D occupancy grid by filling in the occluded/missing regions. The key idea is to combine the generative capabilities of autoencoders and the conditional Generative Adversarial Networks (GAN) framework, to infer accurate and fine-grained 3D structures of objects in high-dimensional voxel space. Extensive experiments on large synthetic datasets show that the proposed 3D-RecGAN significantly outperforms the state of the art in single view 3D object reconstruction, and is able to reconstruct unseen types of objects. Our code and data are available at: https://github.com/Yang7879/3D-RecGAN.\nWe consider the problem of tracking an intruder using a network of wireless sensors. For tracking the intruder at each instant, the optimal number and the right configuration of sensors has to be powered. As powering the sensors consumes energy, there is a trade off between accurately tracking the position of the intruder at each instant and the energy consumption of sensors. This problem has been formulated in the framework of Partially Observable Markov Decision Process (POMDP). Even for the state-of-the-art algorithm in the literature, the curse of dimensionality renders the problem intractable. In this paper, we formulate the Intrusion Detection (ID) problem with a suitable state-action space in the framework of POMDP and develop a Reinforcement Learning (RL) algorithm utilizing the Upper Confidence Tree Search (UCT) method to solve the ID problem. Through simulations, we show that our algorithm performs and scales well with the increasing state and action spaces.\nTraditional tools for configuring cloud services can run much slower than the workflows they are trying to optimize. 
For example, in the case studies reported here, we find cases where (using traditional methods) it takes hours to find ways to make a workflow terminate in tens of seconds. Such slow optimizers are a poor choice of tools for reacting to changing operational environmental conditions. Hence, they are unsuited for cloud services that support rapidly changing workflows, e.g., scientific workflows or workflows from the media or telecommunication industries.   To solve this problem, this paper presents RIOT (Randomized Instance Order Types), a new configuration tool. RIOT has a very low optimization overhead-- often, less than 10\\% of the system runtime, especially for every complex workflow. Instead of simulating many configurations, RIOT uses a novel surrogate sampling method to quickly find promising solutions. As shown by this paper, RIOT achieves comparable results to the other approaches but does so in a fraction of the time.\nToday, the practice of returning entities from a knowledge base in response to search queries has become widespread. One of the distinctive characteristics of entities is that they are typed, i.e., assigned to some hierarchically organized type system (type taxonomy). The primary objective of this paper is to gain a better understanding of how entity type information can be utilized in entity retrieval. We perform this investigation in an idealized \"oracle\" setting, assuming that we know the distribution of target types of the relevant entities for a given query. We perform a thorough analysis of three main aspects: (i) the choice of type taxonomy, (ii) the representation of hierarchical type information, and (iii) the combination of type-based and term-based similarity in the retrieval model. Using a standard entity search test collection based on DBpedia, we find that type information proves most useful when using large type taxonomies that provide very specific types. 
We provide further insights on the extensional coverage of entities and on the utility of target types.\nThe areas of machine learning and communication technology are converging. Today's communications systems generate a huge amount of traffic data, which can help to significantly enhance the design and management of networks and communication components when combined with advanced machine learning methods. Furthermore, recently developed end-to-end training procedures offer new ways to jointly optimize the components of a communication system. Also in many emerging application fields of communication technology, e.g., smart cities or internet of things, machine learning methods are of central importance. This paper gives an overview over the use of machine learning in different areas of communications and discusses two exemplar applications in wireless networking. Furthermore, it identifies promising future research topics and discusses their potential impact.\nUbiquitous bio-sensing for personalized health monitoring is slowly becoming a reality with the increasing availability of small, diverse, robust, high fidelity sensors. This oncoming flood of data begs the question of how we will extract useful information from it. In this paper we explore the use of a variety of representations and machine learning algorithms applied to the task of seizure detection in high resolution, multichannel EEG data. We explore classification accuracy, computational complexity and memory requirements with a view toward understanding which approaches are most suitable for such tasks as the number of people involved and the amount of data they produce grows to be quite large. In particular, we show that layered learning approaches such as Deep Belief Networks excel along these dimensions.\nNatural disasters can have catastrophic impacts on the functionality of infrastructure systems and cause severe physical and socio-economic losses. 
Given budget constraints, it is crucial to optimize decisions regarding mitigation, preparedness, response, and recovery practices for these systems. This requires accurate and efficient means to evaluate the infrastructure system reliability. While numerous research efforts have addressed and quantified the impact of natural disasters on infrastructure systems, typically using the Monte Carlo approach, they still suffer from high computational cost and, thus, are of limited applicability to large systems. This paper presents a deep learning framework for accelerating infrastructure system reliability analysis. In particular, two distinct deep neural network surrogates are constructed and studied: (1) a classifier surrogate that speeds up the connectivity determination of networks, and (2) an end-to-end surrogate that replaces a number of components such as roadway status realization, connectivity determination, and connectivity averaging. The proposed approach is applied to a simulation-based study of the two-terminal connectivity of a California transportation network subject to extreme probabilistic earthquake events. Numerical results highlight the effectiveness of the proposed approach in accelerating the transportation system two-terminal reliability analysis with extremely high prediction accuracy.\nReinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive system called a shield. The shield is introduced into the traditional learning process in two alternative ways, depending on the location at which the shield is implemented. 
In the first, the shield acts each time the learning agent is about to make a decision and provides a list of safe actions. In the second, the shield is placed after the learning agent. The shield monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification. We discuss which requirements a shield must meet to preserve the convergence guarantees of the learner. Finally, we demonstrate the versatility of our approach on several challenging reinforcement learning scenarios.\nWe introduce a new class of graphical models that generalizes Lauritzen-Wermuth-Frydenberg chain graphs by relaxing the semi-directed acyclicity constraint so that only directed cycles are forbidden. Moreover, up to two edges are allowed between any pair of nodes. Specifically, we present local, pairwise and global Markov properties for the new graphical models and prove their equivalence. We also present an equivalent factorization property. Finally, we present a causal interpretation of the new models.\nIn this paper, we build the case that 5G and concomitant emerging technologies (such as IoT, big data, artificial intelligence, and machine learning) will transform global healthcare systems in the near future. Our optimism around 5G-enabled healthcare stems from a confluence of significant technical pushes that are already at play: apart from the availability of high-throughput, low-latency wireless connectivity, other significant factors include the democratization of computing through cloud computing; the democratization of AI and cognitive computing (e.g., IBM Watson); and the commoditization of data through crowdsourcing and digital exhaust. These technologies together can finally crack a dysfunctional healthcare system that has largely been impervious to technological innovations. 
We highlight the persistent deficiencies of the current healthcare system, and then demonstrate how the 5G-enabled healthcare revolution can fix these deficiencies. We also highlight open technical research challenges, and potential pitfalls, that may hinder the development of such a 5G-enabled health revolution.\nGenerative models are widely used for unsupervised learning with various applications, including data compression and signal restoration. Training methods for such systems focus on the generality of the network given a limited amount of training data. A less researched type of technique concerns the generation of only a single type of input. This is useful for applications such as constraint handling, noise reduction and anomaly detection. In this paper we present a technique to limit the generative capability of the network using negative learning. The proposed method searches for the solution in the gradient direction for the desired input and in the opposite direction for the undesired input. One application is anomaly detection, where the undesired inputs are the anomalous data. In the results section we demonstrate the features of the algorithm using the MNIST handwritten digit dataset and later apply the technique to a real-world obstacle detection problem. The results clearly show that the proposed learning technique can significantly improve anomaly detection performance.\nMany genres of natural language text are narratively structured, a testament to our predilection for organizing our experiences as narratives. There is broad consensus that understanding a narrative requires identifying and tracking the goals and desires of the characters and their narrative outcomes. However, to date, there has been limited work on computational models for this problem. 
We introduce a new dataset, DesireDB, which includes gold-standard labels for identifying statements of desire, textual evidence for desire fulfillment, and annotations for whether the stated desire is fulfilled given the evidence in the narrative context. We report experiments on tracking desire fulfillment using different methods, and show that an LSTM Skip-Thought model achieves an F-measure of 0.7 on our corpus.\nThe optimization of functions to find the best solution according to one or several objectives has a central role in many engineering and research fields. Recently, a new family of optimization algorithms, named Quality-Diversity optimization, has been introduced, and contrasts with classic algorithms. Instead of searching for a single solution, Quality-Diversity algorithms search for a large collection of both diverse and high-performing solutions. The role of this collection is to cover the range of possible solution types as much as possible, and to contain the best solution for each type. The contribution of this paper is threefold. Firstly, we present a unifying framework of Quality-Diversity optimization algorithms that covers the two main algorithms of this family (Multi-dimensional Archive of Phenotypic Elites and the Novelty Search with Local Competition), and that highlights the large variety of variants that can be investigated within this family. Secondly, we propose algorithms with a new selection mechanism for Quality-Diversity algorithms that outperform all the algorithms tested in this paper. Lastly, we present a new collection management scheme that overcomes the erosion issues observed when using unstructured collections. These three contributions are supported by extensive experimental comparisons of Quality-Diversity algorithms on three different experimental scenarios.\nThe aim of the current work is to assess the challenges that gamification in education is facing nowadays. 
Benefits and disadvantages of using gamification in the classroom are both discussed to offer a clearer view of the impact of gamification within the learning process. Exploratory case studies are provided to investigate the relation between students' motivation and engagement and gamification in training. Following this idea, a survey was conducted to assess how students' behavior and motivation are affected by introducing a single, specific gamification element during a semester learning process. To stimulate competition among students, a ranking-type plugin was introduced within the university learning management system used for extramural education. The results show that motivation decreased compared with the previous semester.\nWe develop an end-to-end training algorithm for whole-image breast cancer diagnosis based on mammograms. It requires lesion annotations only at the first stage of training. After that, a whole-image classifier can be trained using only image-level labels. This greatly reduces the reliance on lesion annotations. Our approach is implemented using an all-convolutional design that is simple yet provides superior performance in comparison with the previous methods. On DDSM, our best single model achieves a per-image AUC score of 0.88 and three-model averaging increases the score to 0.91. On INbreast, our best single model achieves a per-image AUC score of 0.96. Using DDSM as a benchmark, our models compare favorably with the current state-of-the-art. We also demonstrate that a whole-image model trained on DDSM can be easily transferred to INbreast without using its lesion annotations and using only a small amount of training data. Code and model availability: https://github.com/lishen/end2end-all-conv\nMuch of the user-generated content on social media is provided by ordinary people telling stories about their daily lives. 
We develop and test a novel method for learning fine-grained common-sense knowledge from these stories about contingent (causal and conditional) relationships between everyday events. This type of knowledge is useful for text and story understanding, information extraction, question answering, and text summarization. We test and compare different methods for learning contingency relations, and compare what is learned from topic-sorted story collections vs. general-domain stories. Our experiments show that using topic-specific datasets enables learning finer-grained knowledge about events and results in significant improvement over the baselines. An evaluation on Amazon Mechanical Turk shows that 82% of the relations between events that we learn from topic-sorted stories are judged as contingent.\nHuman understanding of narrative is mainly driven by reasoning about causal relations between events, and thus recognizing them is a key capability for computational models of language understanding. Computational work in this area has approached this via two different routes: by focusing on acquiring a knowledge base of common causal relations between events, or by attempting to understand a particular story or macro-event, along with its storyline. In this position paper, we focus on the knowledge acquisition approach and claim that newswire is a relatively poor source for learning fine-grained causal relations between everyday events. We describe experiments using an unsupervised method to learn causal relations between events in the narrative genres of first-person narratives and film scene descriptions. We show that our method learns fine-grained causal relations, judged by humans as likely to be causal over 80% of the time. 
We also demonstrate that the learned event pairs do not exist in publicly available event-pair datasets extracted from newswire.\nRecent efforts in bioinformatics have achieved tremendous progress in the machine reading of biomedical literature, and the assembly of the extracted biochemical interactions into large-scale models such as protein signaling pathways. However, batch machine reading of literature at today's scale (PubMed alone indexes over 1 million papers per year) is infeasible due to both cost and processing overhead. In this work, we introduce a focused reading approach that guides the machine reading of biomedical literature toward the literature that should be read to answer a biomedical query as efficiently as possible. We introduce a family of algorithms for focused reading, including an intuitive, strong baseline, and a second approach that uses a reinforcement learning (RL) framework that learns when to explore (widen the search) or exploit (narrow it). We demonstrate that the RL approach is capable of answering more queries than the baseline, while being more efficient, i.e., reading fewer documents.\nGenerating texts from structured data (e.g., a table) is important for various natural language processing tasks such as question answering and dialog systems. In recent studies, researchers use neural language models and encoder-decoder frameworks for table-to-text generation. However, these neural network-based approaches do not model the order of contents during text generation. When a human writes a summary based on a given table, he or she would probably consider the content order before wording. In a biography, for example, a person's nationality is typically mentioned before their occupation. In this paper, we propose an order-planning text generation model to capture the relationship between different fields and use this relationship to make the generated text more fluent and smooth. 
We conducted experiments on the WikiBio dataset and achieved significantly higher performance than previous methods in terms of BLEU, ROUGE, and NIST scores.\nWe study the problem of inferring the type of a networked device in a home network by leveraging low-level traffic activity indicators seen at commodity home gateways. We analyze a dataset of detailed device network activity obtained from 240 subscriber homes of a large European ISP and extract a number of traffic and spatial fingerprints for individual devices. We develop a two-level taxonomy to describe devices, onto which we map individual devices using a number of heuristics. We leverage the heuristically derived labels to train classifiers that distinguish device classes based on the traffic and spatial fingerprints of a device. Our results show an accuracy of up to 91% for the coarse-level category and up to 84% for the fine-grained category. By incorporating information from other sources (e.g., MAC OUI), we are able to further improve accuracy to above 97% and 92%, respectively. Finally, we also extract a set of simple and human-readable rules that concisely capture the behaviour of these distinct device categories.\nTraditionally, psychometric tests were used for profiling incoming workers. These methods use the DISC profiling method to classify people into distinct personality types, which are further used to predict whether a person may be a good fit for the organizational culture. This concept is taken further by introducing a novel technique to predict whether a particular pair of an incoming worker and the manager being assigned is compatible on a psychological scale. This is done using a multilayer perceptron neural network, which can be adaptively trained to showcase the true nature of the compatibility index. 
The proposed prototype model is used to quantify the relevant attributes, to use them to train the prediction engine, and to define the data pipeline required for it.\nWe propose two multimodal deep learning architectures that allow for cross-modal dataflow (XFlow) between the feature extractors, thereby extracting more interpretable features and obtaining a better representation than through unimodal learning, for the same amount of training data. These models can usefully exploit correlations between audio and visual data, which have a different dimensionality and are therefore nontrivially exchangeable. Our work improves on existing multimodal deep learning methodologies in two essential ways: (1) it presents a novel method for performing cross-modality (before features are learned from individual modalities) and (2) it extends the previously proposed cross-connections, which only transfer information between streams that process compatible data. Both cross-modal architectures outperformed their baselines (by up to 7.5%) when evaluated on the AVletters dataset.\nResearch on the structure of dialogue has been hampered for years because large dialogue corpora have not been available. This has impacted the dialogue research community's ability to develop better theories, as well as good off-the-shelf tools for dialogue processing. Happily, an increasing amount of information and opinion exchange occurs in natural dialogue in online forums, where people share their opinions about a vast range of topics. In particular, we are interested in rejection in dialogue, also called disagreement and denial, where the size of available dialogue corpora, for the first time, offers an opportunity to empirically test theoretical accounts of the expression and inference of rejection in dialogue. In this paper, we test whether topic-independent features motivated by theoretical predictions can be used to recognize rejection in online forums in a topic-independent way. 
Our results show that our theoretically motivated features achieve 66% accuracy, an absolute improvement of 6% over a unigram baseline.\nSemantics-based knowledge representations such as ontologies are found to be very useful in automatically generating meaningful factual questions. Determining the difficulty level of these system-generated questions is helpful to effectively utilize them in various educational and professional applications. The existing approaches for finding the difficulty level of factual questions are very simple and are limited to a few basic principles. We propose a new methodology for this problem by considering an educational theory called Item Response Theory (IRT). In IRT, the knowledge proficiency of end users (learners) is considered when assigning difficulty levels, based on the assumption that a given question is perceived differently by learners of various proficiencies. We have conducted a detailed study of the features (factors) of a question statement that could determine its difficulty level for three learner categories (experts, intermediates and beginners). We formulate ontology-based metrics for these features. We then train three logistic regression models to predict the difficulty level corresponding to the three learner categories.\nAspect-level sentiment classification aims at identifying the sentiment polarity of a specific target in its context. Previous approaches have realized the importance of targets in sentiment classification and developed various methods with the goal of precisely modeling their contexts via generating target-specific representations. However, these studies always ignore the separate modeling of targets. In this paper, we argue that both targets and contexts deserve special treatment and need their own representations, learned via interactive learning. 
Then, we propose interactive attention networks (IAN) to interactively learn attentions in the contexts and targets, and generate the representations for targets and contexts separately. With this design, the IAN model can represent a target and its collocative context well, which is helpful for sentiment classification. Experimental results on SemEval 2014 datasets demonstrate the effectiveness of our model.\nMobile applications are being used every day by more than half of the world's population to perform a great variety of tasks. With the increasingly widespread usage of these applications, the need arises for efficient techniques to test them. Many frameworks allow the process of application testing to be automated; however, existing frameworks mainly rely on the application developer to provide testing scripts for each developed application, thus preventing reuse of these tests for similar applications. In this paper, we present a novel approach for automating the testing of Android applications by leveraging machine learning techniques and reusing popular test scenarios. We discuss and demonstrate the potential benefits of our approach in an empirical study where we show that our developed testing tool, based on the proposed approach, outperforms standard methods in realistic settings.\nWe explain how the prototype automatic chess problem composer, Chesthetica, successfully composed a rare and interesting chess problem using the new Digital Synaptic Neural Substrate (DSNS) computational creativity approach. This problem represents a greater challenge from a creative standpoint because the checkmate is not always clear and the method of winning even less so. Creating a decisive chess problem of this type without the aid of an omniscient 7-piece endgame tablebase (and one that also abides by several chess composition conventions) would therefore be a challenge for most human players and composers working on their own. 
The fact that a small computer with relatively low processing power and memory was sufficient to compose such a problem using the DSNS approach in just 10 days is therefore noteworthy. In this report, we document the event and the result in some detail. The result lends additional credence to the DSNS as a viable new approach in the field of computational creativity, particularly in areas where human-like creativity is required for targeted or specific problems with no clear path to the solution.\nWe investigate the non-identifiability issues associated with bidirectional adversarial training for joint distribution matching. Within a framework of conditional entropy, we propose both adversarial and non-adversarial approaches to learn desirable matched joint distributions for unsupervised and supervised tasks. We unify a broad family of adversarial models as joint distribution matching problems. Our approach stabilizes learning of unsupervised bidirectional adversarial learning methods. Further, we introduce an extension for semi-supervised learning tasks. Theoretical results are validated on synthetic data and in real-world applications.\nWe report a proof-of-principle experimental demonstration of the quantum speed-up for learning agents utilizing a small-scale quantum information processor based on radiofrequency-driven trapped ions. The decision-making process of a quantum learning agent within the projective simulation paradigm for machine learning is implemented in a system of two qubits. The latter are realized using hyperfine states of two frequency-addressed atomic ions exposed to a static magnetic field gradient. We show that the deliberation time of this quantum learning agent is quadratically improved with respect to comparable classical learning agents. 
The performance of this quantum-enhanced learning agent highlights the potential of scalable quantum processors taking advantage of machine learning.\nA central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, greatly deteriorating their performance. Typically, to escape from saddles, one has to use second-order methods. However, most works on second-order methods rely extensively on expensive Hessian-based computations, making them impractical in large-scale settings. To tackle this challenge, we introduce a generic framework that minimizes Hessian-based computations while at the same time provably converging to second-order critical points. Our framework carefully alternates between a first-order and a second-order subroutine, using the latter only close to saddle points, and yields convergence results competitive with the state of the art. Empirical results suggest that our strategy also enjoys good practical performance.\nMany efforts have been made to use various forms of domain knowledge in malware detection. Currently, there exist two common approaches to malware detection without domain knowledge, namely byte n-grams and strings. In this work, we explore the feasibility of applying neural networks to malware detection and feature learning. We do this by restricting ourselves to a minimal amount of domain knowledge in order to extract a portion of the Portable Executable (PE) header. By doing this, we show that neural networks can learn from raw bytes without explicit feature construction, and perform even better than a domain knowledge approach that parses the PE header into explicit features.\nFine-tuning of a deep convolutional neural network (CNN) is often desired. 
This paper provides an overview of our publicly available py-faster-rcnn-ft software library that can be used to fine-tune the VGG_CNN_M_1024 model on custom subsets of the Microsoft Common Objects in Context (MS COCO) dataset. For example, we improved the procedure so that the user no longer has to search the dataset by hand for suitable image files to use in the demo program. Our implementation randomly selects images that contain at least one object of the categories on which the model is fine-tuned.\nWe develop a second-order primal-dual method for optimization problems in which the objective function is given by the sum of a strongly convex, twice differentiable term and a possibly nondifferentiable convex regularizer. After introducing an auxiliary variable, we utilize the proximal operator of the nonsmooth regularizer to transform the associated augmented Lagrangian into a function that is once, but not twice, continuously differentiable. The saddle point of this function corresponds to the solution of the original optimization problem. We employ a generalization of the Hessian to define second-order updates on this function and prove global exponential stability of the corresponding differential inclusion. Furthermore, we develop a globally convergent customized algorithm that utilizes the primal-dual augmented Lagrangian as a merit function. We show that the search direction can be computed efficiently and prove quadratic/superlinear asymptotic convergence. We use the $\\ell_1$-regularized least squares problem and the problem of designing a distributed controller for a spatially invariant system to demonstrate the merits and the effectiveness of our method.\nThis paper presents the EACare project, an ambitious multi-disciplinary collaboration with the aim of developing an embodied system capable of carrying out neuropsychological tests to detect early signs of dementia, e.g., due to Alzheimer's disease. 
The system will use methods from Machine Learning and Social Robotics, and be trained with examples of recorded clinician-patient interactions. The interaction will be developed using a participatory design approach. We describe the scope and method of the project, and report on a first Wizard of Oz prototype.\nIn outdoor environments, mobile robots are required to navigate through terrain with varying characteristics, some of which might significantly affect the integrity of the platform. Ideally, the robot should be able to identify areas that are safe for navigation based on its own percepts of the environment while avoiding damage to itself. Bayesian optimisation (BO) has been successfully applied to the task of learning a model of terrain traversability while guiding the robot through more traversable areas. An issue, however, is that localisation uncertainty can end up guiding the robot to unsafe areas and distort the model being learnt. In this paper, we address this problem and present a novel method that allows BO to consider localisation uncertainty by applying a Gaussian process model for uncertain inputs as a prior. We evaluate the proposed method in simulation and in experiments with a real robot navigating over rough terrain and compare it against standard BO methods.\nIn this paper, we propose an uncertainty-aware learning from demonstration method by presenting a novel uncertainty estimation method utilizing a mixture density network appropriate for modeling complex and noisy human behaviors. The proposed uncertainty acquisition can be done with a single forward pass without Monte Carlo sampling and is suitable for real-time robotics applications. The properties of the proposed uncertainty measure are analyzed through three synthetic scenarios: absence of data, heavy measurement noise, and composition of functions. 
We show that each case can be distinguished using the proposed uncertainty measure and, using this property, present an uncertainty-aware learning-from-demonstration method for autonomous driving. The proposed method outperforms the other compared methods in terms of safety on a complex real-world driving dataset.\nWe build a model using Gaussian processes to infer a spatio-temporal vector field from observed agent trajectories. Significant landmarks or influence points in agent surroundings are jointly derived through vector calculus operations that indicate the presence of sources and sinks. We evaluate these influence points by using the Kullback-Leibler divergence between the posterior and prior Laplacian of the inferred spatio-temporal vector field. By locating significant features that influence trajectories, our model aims to give greater insight into the underlying causal utility functions that determine agent decision-making. A key feature of our model is that it infers a joint Gaussian process over the observed trajectories, the time-varying vector field of utility and canonical vector calculus operators. We apply our model to both synthetic data and lion GPS data collected at the Bubye Valley Conservancy in southern Zimbabwe.\nObtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects training label quality, but is difficult to learn without any ground truth labels. We instead rely on these weak supervision sources having some structure by virtue of being encoded programmatically. We present Coral, a paradigm that infers generative model structure by statically analyzing the code for these heuristics, thus significantly reducing the data required to learn structure. 
We prove that Coral's sample complexity scales quasilinearly with the number of heuristics and the number of relations found, improving over the standard sample complexity, which is exponential in $n$ for identifying $n^{\\textrm{th}}$-degree relations. Experimentally, Coral matches or outperforms traditional structure learning approaches by up to 3.81 F1 points. Using Coral to model dependencies instead of assuming independence outperforms a fully supervised model by 3.07 accuracy points when heuristics are used to label radiology data without ground truth labels.\nThis paper contains an analysis and extension of exploiter-based knowledge extraction methods, which allow new knowledge to be generated from the basic knowledge. The main achievement of the paper is a proof of useful properties of some universal exploiters, which allow the set of basic classes and the set of basic relations to be extended by a finite set of new classes of objects and relations among them, forming a complete lattice. The proposed approach makes it possible to compute the number of new classes that can be generated with it and the number of different types that each obtained class describes; to construct a defined hierarchy of classes with a determined subsumption relation; to avoid some problems of inheritance; and to restore basic knowledge within the database more efficiently.\nThe ability to rapidly identify symmetry and anti-symmetry is an essential attribute of intelligence. Symmetry perception is a central process in human vision and may be key to human 3D visualization. While previous work in understanding neuron symmetry perception has concentrated on the neuron as an integrator, here we show how the coincidence-detecting property of the spiking neuron can be used to reveal symmetry density in spatial data. We develop a method for synchronizing symmetry-identifying spiking artificial neural networks to enable layering and feedback in the network. 
We show a method for building a network capable of identifying symmetry density between sets of data and present a digital logic implementation demonstrating an 8x8 leaky-integrate-and-fire symmetry detector in a field-programmable gate array. Our results show that the efficiencies of spiking neural networks can be harnessed to rapidly identify symmetry in spatial data, with applications in image processing, 3D computer vision, and robotics.\nA new approach to the study of Generalized Graphs as semantic data structures using machine learning techniques is presented. We show how vector representations maintaining semantic characteristics of the original data can be obtained from a given graph using neural encoding architectures and considering the topological properties of the graph. Semantic features of these new representations are tested on several machine learning tasks, and new directions for efficient link discovery, entity retrieval and long-distance query methodologies on large relational datasets are investigated using real datasets.\nSocial dilemmas have been regarded as the essence of evolutionary game theory, in which the prisoner's dilemma game is the most famous metaphor for the problem of cooperation. 
Recent findings revealed that people's behavior violates the Sure Thing Principle in such games. Classical probability methodologies have difficulty explaining the underlying mechanisms of people's behavior. In this paper, a novel quantum-like Bayesian Network is proposed to accommodate this paradoxical phenomenon. The network can take interference into consideration, which is likely to be an efficient way to describe the underlying mechanism. With the assistance of a belief entropy, named Deng entropy, the paper proposes Belief Distance to render the model practical. Tested on empirical data, the proposed model is shown to be predictive and effective.\nIn this paper, we propose: (a) a restart schedule for an adaptive simulated annealer, and (b) parallel simulated annealing, with an adaptive and parameter-free annealing schedule. The foundation of our approach is the Modified Lam annealing schedule, which adaptively controls the temperature parameter to track a theoretically ideal rate of acceptance of neighboring states. A sequential implementation of Modified Lam simulated annealing is almost parameter-free. However, it requires prior knowledge of the annealing length. We eliminate this parameter using restarts, with an exponentially increasing schedule of annealing lengths. We then extend this restart schedule to a parallel implementation, executing several Modified Lam simulated annealers in parallel, with varying initial annealing lengths, and our proposed parallel annealing length schedule. To validate our approach, we conduct experiments on an NP-Hard scheduling problem with sequence-dependent setup constraints. We compare our approach to fixed-length restarts, both sequentially and in parallel. 
Our results show that our approach can achieve substantial performance gains throughout the course of the run, demonstrating that it is an effective anytime algorithm.\nWe discuss how the majority of traditional modeling approaches follow the idealist point of view in scientific modeling, which relies on set-theoretical notions of models based on abstract universals. We show that, while successful in many classical modeling domains, set-theoretical models face fundamental limits when dealing with complex systems that have many potential aspects or properties depending on the perspective. As an alternative to abstract universals, we propose a conceptual modeling framework based on concrete universals that can be interpreted as a category-theoretical approach to modeling. We call this modeling framework pre-specific modeling. We further discuss how a certain group of mathematical and computational methods, along with ever-growing data streams, are able to operationalize the concept of pre-specific modeling.\nIn this paper, we tackle the problem of extracting frequent opinions from uncertain databases. We introduce the foundation of an opinion mining approach with the definition of pattern and support measure. The support measure is derived from the commitment definition. A new algorithm called OpMiner that extracts the set of frequent opinions modelled as mass functions is detailed. Finally, we apply our approach to a real-world biomedical database that stores opinions of experts to evaluate the reliability level of biomedical data. Performance analysis showed better-quality patterns for our proposed model in comparison with literature-based methods.\nCurrent metropolises largely depend on a functioning transport infrastructure, and the increasing demand can only be satisfied by a well-organized mass transit system. 
One example of a crucial mass transit system is New York City's Staten Island Ferry, connecting the two boroughs of Staten Island and Manhattan with a regular passenger service. Today's demand already exceeds 2500 passengers for a single cycle during peak hours, and future projections suggest that it will further increase. One way to appraise how the system will cope with future demand is by simulation. This contribution proposes an integrated simulation approach to evaluate the system performance with respect to future demand. The simulation relies on a multiscale modeling approach where the terminal buildings are simulated by a microscopic and quantitatively valid cellular automaton (CA) and the journeys of the ferries themselves are modeled by a mesoscopic queue simulation approach. Based on the simulation results, recommendations with respect to the future demand are given.\nWe describe a novel approach to monitoring high-level behaviors using concepts from AI planning. Our goal is to understand what a program is doing based on its system call trace. This ability is particularly important for detecting malware. We approach this problem by building an abstract model of the operating system using the STRIPS planning language, casting system calls as planning operators. Given a system call trace, we simulate the corresponding operators on our model and, by observing the properties of the state reached, we learn about the nature of the original program and its behavior. Thus, unlike most statistical detection methods that focus on syntactic features, our approach is semantic in nature. Therefore, it is more robust against obfuscation techniques used by malware that change the outward appearance of the trace but not its effect. 
We demonstrate the efficacy of our approach by evaluating it on actual system call traces.\nThis paper introduces a novel activity dataset which exhibits real-life and diverse scenarios of complex, temporally-extended human activities and actions. The dataset presents a set of videos of actors performing everyday activities in a natural and unscripted manner. The dataset was recorded using a static Kinect 2 sensor, which is commonly used on many robotic platforms. The dataset comprises RGB-D images, point cloud data and automatically generated skeleton tracks, in addition to crowdsourced annotations. We also describe the methodology used to acquire annotations through crowdsourcing. Finally, some activity recognition benchmarks are presented using current state-of-the-art techniques. We believe that this dataset is particularly suitable as a testbed for activity recognition research, but it can also be applied to other common tasks in robotics/computer vision research such as object detection and human skeleton tracking.\nVulnerability of Deep Neural Networks (DNNs) to adversarial attacks has been attracting a lot of attention in recent studies. It has been shown that for many state-of-the-art DNNs performing image classification there exist universal adversarial perturbations: image-agnostic perturbations whose mere addition to natural images leads, with high probability, to their misclassification. In this work we propose a new algorithm for constructing such universal perturbations. Our approach is based on computing the so-called $(p, q)$-singular vectors of the Jacobian matrices of hidden layers of a network. The resulting perturbations exhibit interesting visual patterns, and by using only 64 images we were able to construct universal perturbations with a fooling rate of more than 60\\% on a dataset consisting of 50000 images. 
We also investigate a correlation between the maximal singular value of the Jacobian matrix and the fooling rate of the corresponding singular vector, and show that the constructed perturbations generalize across networks.\nRandomized experiments have been critical decision-making tools for decades. However, subjects can show significant heterogeneity in response to treatments in many important applications. Therefore, it is not enough to simply know which treatment is optimal for the entire population. What we need is a model that correctly customizes treatment assignment based on subject characteristics. The problem of constructing such models from randomized experiment data is known as Uplift Modeling in the literature. Many algorithms have been proposed for uplift modeling and some have generated promising results on various data sets. Yet little is known about the theoretical properties of these algorithms. In this paper, we propose a new tree-based ensemble algorithm for uplift modeling. Experiments show that our algorithm can achieve competitive results on both synthetic and industry-provided data. In addition, by properly tuning the \"node size\" parameter, our algorithm is proved to be consistent under mild regularity conditions. This is the first consistent algorithm for uplift modeling that we are aware of.\nIn this paper, we propose a recurrent neural network (RNN) with residual attention (RRA) to learn long-range dependencies from sequential data. We propose to add residual connections across timesteps to the RNN, which explicitly enhances the interaction between the current state and hidden states that are several timesteps apart. This also allows training errors to be directly back-propagated through residual connections and effectively alleviates the vanishing gradient problem. We further reformulate an attention mechanism over residual connections. 
An attention gate is defined to summarize the individual contribution from multiple previous hidden states in computing the current state. We evaluate RRA on three tasks: the adding problem, pixel-by-pixel MNIST classification and sentiment analysis on the IMDB dataset. Our experiments demonstrate that RRA yields better performance, faster convergence and more stable training compared to a standard LSTM network. Furthermore, RRA shows highly competitive performance compared with state-of-the-art methods.\nThe recent development of CNN-based image dehazing has revealed the effectiveness of end-to-end modeling. However, extending the idea to end-to-end video dehazing has not been explored yet. In this paper, we propose an End-to-End Video Dehazing Network (EVD-Net) to exploit the temporal consistency between consecutive video frames. A thorough study has been conducted over a number of structure options to identify the best temporal fusion strategy. Furthermore, we build an End-to-End United Video Dehazing and Detection Network (EVDD-Net), which concatenates and jointly trains EVD-Net with a video object detection model. The resulting augmented end-to-end pipeline has demonstrated much more stable and accurate detection results in hazy video.\nExisting neural conversational models process natural language primarily on a lexico-syntactic level, thereby ignoring one of the most crucial components of human-to-human dialogue: its affective content. We take a step in this direction by proposing three novel ways to incorporate affective/emotional aspects into long short-term memory (LSTM) encoder-decoder neural conversation models: (1) affective word embeddings, which are cognitively engineered, (2) affect-based objective functions that augment the standard cross-entropy loss, and (3) affectively diverse beam search for decoding. 
Experiments show that these techniques improve the open-domain conversational prowess of encoder-decoder networks by enabling them to produce emotionally rich responses that are more interesting and natural.\nWe describe a method to use discrete human feedback to enhance the performance of deep learning agents in virtual three-dimensional environments by extending deep reinforcement learning (DRL) to model the confidence and consistency of human feedback. This enables deep reinforcement learning algorithms to determine the most appropriate time to listen to the human feedback, exploit the current policy model, or explore the agent's environment. Managing the trade-off between these three strategies allows DRL agents to be robust to inconsistent or intermittent human feedback. Through experimentation using a synthetic oracle, we show that our technique improves the training speed and overall performance of deep reinforcement learning in navigating three-dimensional environments using Minecraft. We further show that our technique is robust to highly inaccurate human feedback and can also operate when no human feedback is given.\nAlthough neural machine translation (NMT) with the encoder-decoder framework has achieved great success in recent times, it still suffers from some drawbacks: RNNs tend to forget old information that is often useful, and the encoder operates only on individual words without considering word relationships. To solve these problems, we introduce a relation network (RN) into NMT to refine the encoding representations of the source. In our method, the RN first augments the representation of each source word with its neighbors and reasons about all possible pairwise relations between them. Then the source representations and all the relations are fed to the attention module and the decoder together, keeping the main encoder-decoder architecture unchanged. 
Experiments on two Chinese-to-English data sets of different scales both show that our method can significantly outperform competitive baselines.\nCrowdfunding has emerged as a prominent way for entrepreneurs to secure funding without sophisticated intermediation. In crowdfunding, an entrepreneur often has to decide how to disclose the campaign status in order to collect as many contributions as possible. Such decisions are difficult to make primarily due to incomplete information. We propose information design as a tool to help the entrepreneur improve revenue by influencing backers' beliefs. We introduce a heuristic algorithm to dynamically compute information-disclosure policies for the entrepreneur, followed by an empirical evaluation to demonstrate its competitiveness over the widely-adopted immediate-disclosure policy. Our results demonstrate that, despite its ease of implementation, the immediate-disclosure policy is not optimal when backers follow thresholding policies. With appropriate heuristics, an entrepreneur can benefit from dynamic information disclosure. Our work sheds light on information design in a dynamic setting where agents make decisions using thresholding policies.\nRecurrent neural networks (RNNs) are widely used to model sequential data, but their non-linear dependencies between sequence elements prevent parallelizing training over sequence length. We show that the training of RNNs with only linear sequential dependencies can be parallelized over the sequence length using the parallel scan algorithm, leading to rapid training on long sequences even with small minibatch sizes. We develop a parallel linear recurrence CUDA kernel and show that it can be applied to immediately speed up training and inference of several state-of-the-art RNN architectures by up to 9x. 
We abstract recent work on linear RNNs into a new framework of linear surrogate RNNs and develop a linear surrogate model for the long short-term memory unit, the GILR-LSTM, that utilizes parallel linear recurrence. We extend sequence learning to new extremely long sequence regimes that were previously out of reach by successfully training a GILR-LSTM on a synthetic sequence classification task with a one-million-timestep dependency.\nMulti-scanner Antivirus systems provide insightful information on the nature of a suspect application; however, there is often a lack of consensus and consistency between different Anti-Virus engines. In this article, we analyze more than 250 thousand malware signatures generated by 61 different Anti-Virus engines after analyzing 82 thousand different Android malware applications. We identify 41 different malware classes grouped into three major categories, namely Adware, Harmful Threats and Unknown or Generic signatures. We further investigate the relationships between these 41 classes using community detection algorithms from graph theory to identify similarities between them; finally, we propose a Structural Equation Model to identify which Anti-Virus engines are more powerful at detecting each macro-category. As an application, we show how such models can help in identifying whether Unknown malware applications are more likely to be of Harmful or Adware type.\nWe compare Tetrad (Java) algorithms to the other public software packages BNT (Bayes Net Toolbox, Matlab), pcalg (R) and bnlearn (R) on the \"vanilla\" task of recovering DAG structure to the extent possible from data generated recursively from linear, Gaussian structural equation models (SEMs) with no latent variables, for random graphs, with no additional knowledge of variable order or adjacency structure, and without additional specification of intervention information. Each one of the above packages offers at least one implementation suitable for this purpose. 
We compare them on adjacency and orientation accuracy as well as time performance, for fixed datasets. We vary the number of variables, the number of samples, and the density of graph, for a total of 27 combinations, averaging all statistics over 10 runs, for a total of 270 datasets. All runs are carried out on the same machine and on their native platforms. An interactive visualization tool is provided for the reader who wishes to know more than can be documented explicitly in this report.\nIn this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems. By mimicking the relational structure of planning problems, ASNets are able to adopt a weight-sharing scheme which allows the network to be applied to any problem from a given planning domain. This allows the cost of training the network to be amortised over all problems in that domain. Further, we propose a training method which balances exploration and supervised training on small problems to produce a policy which remains robust when evaluated on larger problems. In experiments, we show that ASNet's learning capability allows it to significantly outperform traditional non-learning planners in several challenging domains.\nAs the use of cloud computing continues to rise, controlling cost becomes increasingly important. Yet there is evidence that 30\\% - 45\\% of cloud spend is wasted. Existing tools for cloud provisioning typically rely on highly trained human experts to specify what to monitor, thresholds for triggering action, and actions. In this paper we explore the use of reinforcement learning (RL) to acquire policies to balance performance and spend, allowing humans to specify what they want as opposed to how to do it, minimizing the need for cloud expertise. 
Empirical results with tabular, deep, and dueling double deep Q-learning with the CloudSim simulator show the utility of RL and the relative merits of the approaches. We also demonstrate effective policy transfer learning from an extremely simple simulator to CloudSim, with the next step being transfer from CloudSim to an Amazon Web Services physical environment.\nWeighted finite automata (WFA) can expressively model functions defined over strings but are inherently linear models. Given the recent successes of nonlinear models in machine learning, it is natural to wonder whether extending WFA to the nonlinear setting would be beneficial. In this paper, we propose a novel neural-network-based nonlinear WFA model (NL-WFA) along with a learning algorithm. Our learning algorithm is inspired by the spectral learning algorithm for WFA and relies on a nonlinear decomposition of the so-called Hankel matrix by means of an auto-encoder network. The expressive power of NL-WFA and the proposed learning algorithm are assessed on both synthetic and real-world data, showing that NL-WFA can lead to smaller model sizes and infer complex grammatical structures from data.\nIn this paper, we report on the visualization capabilities of an Explainable AI Planning (XAIP) agent that can support human-in-the-loop decision making. Imposing transparency and explainability requirements on such agents is especially important in order to establish trust and common ground with the end-to-end automated planning system. Visualizing the agent's internal decision-making processes is a crucial step towards achieving this. This may include externalizing the \"brain\" of the agent, starting from its sensory inputs and progressing to the higher-order decisions it makes to drive its planning components. 
We also show how the planner can bootstrap on the latest techniques in explainable planning to cast plan visualization as a plan explanation problem, and thus provide concise model-based visualization of its plans. We demonstrate these functionalities in the context of the automated planning components of a smart assistant in an instrumented meeting space.\nIn this paper, we introduce the problem of denoting and deriving the complexity of workflows (plans, schedules) in collaborative, planner-assisted settings where humans and agents are trying to jointly solve a task. The interactions -- and hence the workflows that connect the human and the agents -- may differ according to the domain and the kind of agents. We adapt insights from prior work in human-agent teaming and workflow analysis to suggest metrics for workflow complexity. The main motivation behind this work is to highlight metrics for human comprehensibility of plans and schedules. The planning community has seen its fair share of work on the synthesis of plans that take diversity into account -- what value do such plans hold if their generation is not guided at least in part by metrics that reflect the ease of engaging with and using those plans?\nThe prediction of organic reaction outcomes is a fundamental problem in computational chemistry. Since a reaction may involve hundreds of atoms, fully exploring the space of possible transformations is intractable. The current solution utilizes reaction templates to limit the space, but it suffers from coverage and efficiency issues. In this paper, we propose a template-free approach to efficiently explore the space of product molecules by first pinpointing the reaction center -- the set of nodes and edges where graph edits occur. Since only a small number of atoms contribute to the reaction center, we can directly enumerate candidate products. 
The generated candidates are scored by a Weisfeiler-Lehman Difference Network that models high-order interactions between changes occurring at nodes across the molecule. Our framework outperforms the top-performing template-based approach by a 10\\% margin, while running orders of magnitude faster. Finally, we demonstrate that the model accuracy rivals the performance of domain experts.\nReinforcement learning (RL), while often powerful, can suffer from slow learning speeds, particularly in high-dimensional spaces. The autonomous decomposition of tasks and use of hierarchical methods hold the potential to significantly speed up learning in such domains. This paper proposes a novel practical method that can autonomously decompose tasks by leveraging association rule mining, which discovers hidden relationships among entities in data mining. We introduce a novel method called ARM-HSTRL (Association Rule Mining to extract Hierarchical Structure of Tasks in Reinforcement Learning). It extracts temporal and structural relationships of sub-goals in RL and multi-task RL. In particular, it finds sub-goals and the relationships among them. The proposed method is shown to achieve significant efficiency and performance in these two main areas of RL.\nRandom walks are at the heart of many existing deep learning algorithms for graph data. However, such algorithms have many limitations that arise from the use of random walks, e.g., the features resulting from these methods are unable to transfer to new nodes and graphs as they are tied to node identity. In this work, we introduce the notion of attributed random walks, which serve as a basis for generalizing existing methods such as DeepWalk, node2vec, and many others that leverage random walks. Our proposed framework enables these methods to be more widely applicable for both transductive and inductive learning as well as for use on graphs with attributes (if available). 
This is achieved by learning functions that generalize to new nodes and graphs. We show that our proposed framework is effective, with an average AUC improvement of 16.1% while requiring on average 853 times less space than existing methods on a variety of graphs from several domains.\nThe performance of many hard combinatorial problem solvers depends strongly on their parameter settings, and since manual parameter tuning is both tedious and suboptimal, the AI community has recently developed several algorithm configuration (AC) methods to automatically address this problem. While all existing AC methods start the configuration process of an algorithm A from scratch for each new type of benchmark instance, here we propose to exploit information about A's performance on previous benchmarks in order to warmstart its configuration on new types of benchmarks. We introduce two complementary ways in which we can exploit this information to warmstart AC methods based on a predictive model. Experiments for optimizing a very flexible modern SAT solver on twelve different instance sets show that our methods often yield substantial speedups over existing AC methods (up to 165-fold) and can also find substantially better configurations given the same compute budget.\nWe present KBLRN, a framework for end-to-end learning of knowledge base representations from latent, relational, and numerical features. KBLRN integrates feature types with a novel combination of neural representation learning and probabilistic product of experts models. To the best of our knowledge, KBLRN is the first approach that learns representations of knowledge bases by integrating latent, relational, and numerical features. We show that instances of KBLRN outperform existing methods on a range of knowledge base completion tasks. We contribute novel data sets that enrich commonly used knowledge base completion benchmarks with numerical features. We have made the data sets available for further research. 
We also investigate the impact numerical features have on the KB completion performance of KBLRN.\nWe present a novel method to solve image analogy problems: it allows one to learn the relation between paired images present in training data, and then to generalize and generate images that correspond to the relation but were never seen in the training set. We call the method Conditional Analogy Generative Adversarial Network (CAGAN), as it is based on adversarial training and employs deep convolutional neural networks. An especially interesting application of this technique is the automatic swapping of clothing on fashion model photos. Our work makes the following contributions. First, we define the end-to-end trainable CAGAN architecture, which implicitly learns segmentation masks without expensive supervised labeling data. Second, experimental results show plausible segmentation masks and often convincing swapped images, given the target article. Finally, we discuss the next steps for this technique: neural network architecture improvements and more advanced applications.\nDespite the recent developments that allowed neural networks to achieve impressive performance in a variety of applications, these models are intrinsically affected by the problem of overgeneralization, due to their partitioning of the full input space into the fixed set of target classes used during training. Thus, it is possible for novel inputs belonging to categories unknown during training, or even completely unrecognizable to humans, to fool the system into classifying them as one of the known classes, even with a high degree of confidence. Solving this problem may help improve the security of such systems in critical applications, and may further lead to applications in the context of open set recognition and 1-class recognition. 
This paper presents a novel way to compute a confidence score using denoising autoencoders and shows that such a confidence score can correctly identify the regions of the input space close to the training distribution by approximately identifying its local maxima.\nHigh-dimensional data requires scalable algorithms. We propose and analyze three scalable and related algorithms for semi-supervised discriminant analysis (SDA). These methods are based on Krylov subspace methods, which exploit the data sparsity and the shift-invariance of Krylov subspaces. In addition, the problem definition was improved by adding centralization to the semi-supervised setting. The proposed methods are evaluated on an industry-scale data set from a pharmaceutical company to predict compound activity on target proteins. The results show that SDA achieves good predictive performance and our methods require only a few seconds, significantly improving computation time over the previous state of the art.\nIn order for a robot to be a generalist that can perform a wide range of jobs, it must be able to acquire a wide variety of skills quickly and efficiently in complex unstructured environments. High-capacity models such as deep neural networks can enable a robot to represent complex skills, but learning each skill from scratch then becomes infeasible. In this work, we present a meta-imitation learning method that enables a robot to learn how to learn more efficiently, allowing it to acquire new skills from just a single demonstration. Unlike prior methods for one-shot imitation, our method can scale to raw pixel inputs and requires data from significantly fewer prior tasks for effective learning of new skills. 
Our experiments on both simulated and real robot platforms demonstrate the ability to learn new tasks, end-to-end, from a single visual demonstration.\nDeep Reinforcement Learning has been able to achieve amazing successes in a variety of domains from video games to continuous control by trying to maximize the cumulative reward. However, most of these successes rely on algorithms that require a large amount of data to train in order to obtain results on par with human-level performance. This is not feasible if we are to deploy these systems on real world tasks and hence there has been an increased thrust in exploring data efficient algorithms. To this end, we propose the Shared Learning framework aimed at making $Q$-ensemble algorithms data-efficient. For achieving this, we look into some principles of transfer learning which aim to study the benefits of information exchange across tasks in reinforcement learning and adapt transfer to learning our value function estimates in a novel manner. In this paper, we consider the special case of transfer between the value function estimates in the $Q$-ensemble architecture of BootstrappedDQN. We further empirically demonstrate how our proposed framework can help in speeding up the learning process in $Q$-ensembles with minimum computational overhead on a suite of Atari 2600 Games.\nToday's general-purpose deep convolutional neural networks (CNN) for image classification and object detection are trained offline on large static datasets. Some applications, however, will require training in real-time on live video streams with a human-in-the-loop. We refer to this class of problem as Time-ordered Online Training (ToOT) - these problems will require a consideration of not only the quantity of incoming training data, but the human effort required to tag and use it. In this paper, we define training benefit as a metric to measure the effectiveness of a sequence in using each user interaction. 
We demonstrate and evaluate a system tailored to performing ToOT in the field, capable of training an image classifier on a live video stream through minimal input from a human operator. We show that by exploiting the time-ordered nature of the video stream through optical flow-based object tracking, we can increase the effectiveness of human actions by about 8 times.\nIn this paper, we introduce Query-based Attention CNN(QACNN) for Text Similarity Map, an end-to-end neural network for question answering. This network is composed of compare mechanism, two-staged CNN architecture with attention mechanism, and a prediction layer. First, the compare mechanism compares between the given passage, query, and multiple answer choices to build similarity maps. Then, the two-staged CNN architecture extracts features through word-level and sentence-level. At the same time, attention mechanism helps CNN focus more on the important part of the passage based on the query information. Finally, the prediction layer find out the most possible answer choice. We conduct this model on the MovieQA dataset using Plot Synopses only, and achieve 79.99% accuracy which is the state of the art on the dataset.\nSmall objects detection is a challenging task in computer vision due to its limited resolution and information. In order to solve this problem, the majority of existing methods sacrifice speed for improvement in accuracy. In this paper, we aim to detect small objects at a fast speed, using the best object detector Single Shot Multibox Detector (SSD) with respect to accuracy-vs-speed trade-off as base architecture. We propose a multi-level feature fusion method for introducing contextual information in SSD, in order to improve the accuracy for small objects. In detailed fusion operation, we design two feature fusion modules, concatenation module and element-sum module, different in the way of adding contextual information. 
Experimental results show that these two fusion modules obtain higher mAP on PASCALVOC2007 than baseline SSD by 1.6 and 1.7 points respectively, especially with 2-3 points improvement on some smallobjects categories. The testing speed of them is 43 and 40 FPS respectively, superior to the state of the art Deconvolutional single shot detector (DSSD) by 29.4 and 26.4 FPS. Keywords: small object detection, feature fusion, real-time, single shot multi-box detector\nInformation Cascades Model captures dynamical properties of user activity in a social network. In this work, we develop a novel framework for activity shaping under the Continuous-Time Information Cascades Model which allows the administrator for local control actions by allocating targeted resources that can alter the spread of the process. Our framework employs the optimization of the spectral radius of the Hazard matrix, a quantity that has been shown to drive the maximum influence in a network, while enjoying a simple convex relaxation when used to minimize the influence of the cascade. In addition, use-cases such as quarantine and node immunization are discussed to highlight the generality of the proposed activity shaping framework. Finally, we present the NetShape influence minimization method which is compared favorably to baseline and state-of-the-art approaches through simulations on real social networks.\nWe introduce a framework to leverage knowledge acquired from a repository of (heterogeneous) supervised datasets to new unsupervised datasets. Our perspective avoids the subjectivity inherent in unsupervised learning by reducing it to supervised learning, and provides a principled way to evaluate unsupervised algorithms. We demonstrate the versatility of our framework via simple agnostic bounds on unsupervised problems. 
In the context of clustering, our approach helps choose the number of clusters and the clustering algorithm, remove the outliers, and provably circumvent the Kleinberg's impossibility result. Experimental results across hundreds of problems demonstrate improved performance on unsupervised data with simple algorithms, despite the fact that our problems come from heterogeneous domains. Additionally, our framework lets us leverage deep networks to learn common features from many such small datasets, and perform zero shot learning.\nIn this paper, we propose a novel explanation module to explain the predictions made by a deep network. The explanation module works by embedding a high-dimensional deep network layer nonlinearly into a low-dimensional explanation space while retaining faithfulness, so that the original deep learning predictions can be constructed from the few concepts extracted by the explanation module. We then visualize such concepts for human to learn about the high-level concepts that deep learning is using to make decisions. We propose an algorithm called Sparse Reconstruction Autoencoder (SRAE) for learning the embedding to the explanation space. SRAE aims to reconstruct part of the original feature space while retaining faithfulness. A pull-away term is applied to SRAE to make the explanation space more orthogonal. A visualization system is then introduced for human understanding of the features in the explanation space. The proposed method is applied to explain CNN models in image classification tasks, and several novel metrics are introduced to evaluate the performance of explanations quantitatively without human involvement. Experiments show that the proposed approach generates interesting explanations of the mechanisms CNN use for making predictions.\nWe consider the exploration/exploitation problem in reinforcement learning. 
For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps. We prove that the unique fixed point of the UBE yields an upper bound on the variance of the estimated value of any fixed policy. This bound can be much tighter than traditional count-based bonuses that compound standard deviation rather than variance. Importantly, and unlike several existing approaches to optimism, this method scales naturally to large systems with complex generalization. Substituting our UBE-exploration strategy for $\\epsilon$-greedy improves DQN performance on 51 out of 57 games in the Atari suite.\nCross-view video understanding is an important yet under-explored area in computer vision. In this paper, we introduce a joint parsing framework that integrates view-centric proposals into scene-centric parse graphs that represent a coherent scene-centric understanding of cross-view scenes. Our key observations are that overlapping fields of views embed rich appearance and geometry correlations and that knowledge fragments corresponding to individual vision tasks are governed by consistency constraints available in commonsense knowledge. The proposed joint parsing framework represents such correlations and constraints explicitly and generates semantic scene-centric parse graphs. Quantitative experiments show that scene-centric predictions in the parse graph outperform view-centric predictions.\nAdapted from biological sequence alignment, trace alignment is a process mining technique used to visualize and analyze workflow data. Any analysis done with this method, however, is affected by the alignment quality. 
The best existing trace alignment techniques use progressive guide-trees to heuristically approximate the optimal alignment in O(N2L2) time. These algorithms are heavily dependent on the selected guide-tree metric, often return sum-of-pairs-score-reducing errors that interfere with interpretation, and are computationally intensive for large datasets. To alleviate these issues, we propose process-oriented iterative multiple alignment (PIMA), which contains specialized optimizations to better handle workflow data. We demonstrate that PIMA is a flexible framework capable of achieving better sum-of-pairs score than existing trace alignment algorithms in only O(NL2) time. We applied PIMA to analyzing medical workflow data, showing how iterative alignment can better represent the data and facilitate the extraction of insights from data visualization.\nIn this work, we develop an end-to-end Reinforcement Learning based architecture for a conversational search agent to assist users in searching on an e-commerce marketplace for digital assets. Our approach caters to a search task fundamentally different from the ones which have limited search modalities where the user can express his preferences objectively. The system interacts with the users to display search results to the queries, and gauges user's intent and context of the conversation to choose the next action and reply. To train the agent in the absence of true conversation data, a virtual user is constructed to model a human user using the query and session logs from a major stock photography and digital assets marketplace. The system provides an alternative that is more engaging than the traditional search while maintaining similar effectiveness. This work provides a mechanism to build and deploy bootstrapped version of an effective conversational agent from readily available query log data. The system can then be used to acquire true conversational data and be fine-tuned further. 
The methodology discussed in this paper can be extended to e-commerce domains in general.\nRecently, digital music libraries have been developed and can be plainly accessed. Latest research showed that current organization and retrieval of music tracks based on album information are inefficient. Moreover, they demonstrated that people use emotion tags for music tracks in order to search and retrieve them. In this paper, we discuss separability of a set of emotional labels, proposed in the categorical emotion expression, using Fisher's separation theorem. We determine a set of adjectives to tag music parts: happy, sad, relaxing, exciting, epic and thriller. Temporal, frequency and energy features have been extracted from the music parts. It could be seen that the maximum separability within the extracted features occurs between relaxing and epic music parts. Finally, we have trained a classifier using Support Vector Machines to automatically recognize and generate emotional labels for a music part. Accuracy for recognizing each label has been calculated; where the results show that epic music can be recognized more accurately (77.4%), comparing to the other types of music.\nIn this paper, we present the first-of-its-kind machine learning (ML) system, called AI Programmer, that can automatically generate full software programs requiring only minimal human guidance. At its core, AI Programmer uses genetic algorithms (GA) coupled with a tightly constrained programming language that minimizes the overhead of its ML search space. Part of AI Programmer's novelty stems from (i) its unique system design, including an embedded, hand-crafted interpreter for efficiency and security and (ii) its augmentation of GAs to include instruction-gene randomization bindings and programming language-specific genome construction and elimination techniques. 
We provide a detailed examination of AI Programmer's system design, several examples detailing how the system works, and experimental data demonstrating its software generation capabilities and performance using only mainstream CPUs.\nA modular method is proposed to learn and transfer visuo-motor policies from simulation to the real world in an efficient manner by combining domain randomization and adaptation. The feasibility of the approach is demonstrated in a table-top object reaching task where a 7 DoF arm is controlled in velocity mode to reach a blue cuboid in clutter through visual observations. The learned visuo-motor policies are robust to novel (not seen in training) objects in clutter and even a moving target, achieving a 93.3% success rate and 2.2 cm control accuracy.\nTo aide simultaneous localization and mapping (SLAM), future perception systems will incorporate forms of scene understanding. In a step towards fully integrated probabilistic geometric scene understanding, localization and mapping we propose the first direction-aware semi-dense SLAM system. It jointly infers the directional Stata Center World (SCW) segmentation and a surfel-based semi-dense map while performing real-time camera tracking. The joint SCW map model connects a scene-wide Bayesian nonparametric Dirichlet Process von-Mises-Fisher mixture model (DP-vMF) prior on surfel orientations with the local surfel locations via a conditional random field (CRF). Camera tracking leverages the SCW segmentation to improve efficiency via guided observation selection. Results demonstrate improved SLAM accuracy and tracking efficiency at state of the art performance.\nIn the propositional setting, the marginal problem is to find a (maximum-entropy) distribution that has some given marginals. We study this problem in a relational setting and make the following contributions. First, we compare two different notions of relational marginals. 
Second, we show a duality between the resulting relational marginal problems and the maximum likelihood estimation of the parameters of relational models, which generalizes a well-known duality from the propositional setting. Third, by exploiting the relational marginal formulation, we present a statistically sound method to learn the parameters of relational models that will be applied in settings where the number of constants differs between the training and test data. Furthermore, based on a relational generalization of marginal polytopes, we characterize cases where the standard estimators based on feature's number of true groundings needs to be adjusted and we quantitatively characterize the consequences of these adjustments. Fourth, we prove bounds on expected errors of the estimated parameters, which allows us to lower-bound, among other things, the effective sample size of relational training data.\nIn this paper we introduce ZhuSuan, a python probabilistic programming library for Bayesian deep learning, which conjoins the complimentary advantages of Bayesian methods and deep learning. ZhuSuan is built upon Tensorflow. Unlike existing deep learning libraries, which are mainly designed for deterministic neural networks and supervised tasks, ZhuSuan is featured for its deep root into Bayesian inference, thus supporting various kinds of probabilistic models, including both the traditional hierarchical Bayesian models and recent deep generative models. We use running examples to illustrate the probabilistic programming on ZhuSuan, including Bayesian logistic regression, variational auto-encoders, deep sigmoid belief networks and Bayesian recurrent neural networks.\nCross-correlator plays a significant role in many visual perception tasks, such as object detection and tracking. Beyond the linear cross-correlator, this paper proposes a kernel cross-correlator (KCC) that breaks traditional limitations. 
First, by introducing the kernel trick, the KCC extends the linear cross-correlation to non-linear space, which is more robust to signal noises and distortions. Second, the connection to the existing works shows that KCC provides a unified solution for correlation filters. Third, KCC is applicable to any kernel function and is not limited to circulant structure on training data, thus it is able to predict affine transformations with customized properties. Last, by leveraging the fast Fourier transform (FFT), KCC eliminates direct calculation of kernel vectors, thus achieves better performance yet still with a reasonable computational cost. Comprehensive experiments on visual tracking and human activity recognition using wearable devices demonstrate its robustness, flexibility, and efficiency. The source codes of both experiments are released at https://github.com/wang-chen/KCC\nWhat if $\\{$a tourist, a train addict, Dr. Sheldon Cooper, somebody who likes to waste time$\\}$ wants to visit all metro lines or carriages in a given network in a minimum number of steps? We study this problem with an application to the metro network of Paris and Tokyo, proposing optimal solutions thanks to mathematical programming tools. Quite surprisingly, it appears that you can visit all 16 Parisian metro lines in only 26 steps (we denote by a step the act of taking the metro from one station to an adjacent one). Perhaps even more surprisingly, adding the 5 RER lines to these 16 lines does not increase the size of the best solution. It is also possible to visit the 13 lines of (the dense network of) Tokyo with only 15 steps.\nGraph classification is a problem with practical applications in many different domains. Most of the existing methods take the entire graph into account when calculating graph features. In a graphlet-based approach, for instance, the entire graph is processed to get the total count of different graphlets or sub-graphs. 
In the real-world, however, graphs can be both large and noisy with discriminative patterns confined to certain regions in the graph only. In this work, we study the problem of attentional processing for graph classification. The use of attention allows us to focus on small but informative parts of the graph, avoiding noise in the rest of the graph. We present a novel RNN model, called the Graph Attention Model (GAM), that processes only a portion of the graph by adaptively selecting a sequence of \"interesting\" nodes. The model is equipped with an external memory component which allows it to integrate information gathered from different parts of the graph. We demonstrate the effectiveness of the model through various experiments.\nIn this paper we focus on the linear algebra theory behind feedforward (FNN) and recurrent (RNN) neural networks. We review backward propagation, including backward propagation through time (BPTT). Also, we obtain a new exact expression for Hessian, which represents second order effects. We show that for $t$ time steps the weight gradient can be expressed as a rank-$t$ matrix, while the weight Hessian is as a sum of $t^{2}$ Kronecker products of rank-$1$ and $W^{T}AW$ matrices, for some matrix $A$ and weight matrix $W$. Also, we show that for a mini-batch of size $r$, the weight update can be expressed as a rank-$rt$ matrix. Finally, we briefly comment on the eigenvalues of the Hessian matrix.\nWe analyze the convergence of (stochastic) gradient descent algorithm for learning a convolutional filter with Rectified Linear Unit (ReLU) activation function. Our analysis does not rely on any specific form of the input distribution and our proofs only use the definition of ReLU, in contrast with previous works that are restricted to standard Gaussian input. 
We show that (stochastic) gradient descent with random initialization can learn the convolutional filter in polynomial time and the convergence rate depends on the smoothness of the input distribution and the closeness of patches. To the best of our knowledge, this is the first recovery guarantee of gradient-based algorithms for convolutional filter on non-Gaussian input distributions. Our theory also justifies the two-stage learning rate strategy in deep neural networks. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.\nWhile imitation learning is becoming common practice in robotics, this approach often suffers from data mismatch and compounding errors. DAgger is an iterative algorithm that addresses these issues by continually aggregating training data from both the expert and novice policies, but does not consider the impact of safety. We present a probabilistic extension to DAgger, which uses the distribution over actions provided by the novice policy, for a given observation. Our method, which we call DropoutDAgger, uses dropout to train the novice as a Bayesian neural network that provides insight to its confidence. Using the distribution over the novice's actions, we estimate a probabilistic measure of safety with respect to the expert action, tuned to balance exploration and exploitation. The utility of this approach is evaluated on the MuJoCo HalfCheetah and in a simple driving experiment, demonstrating improved performance and safety compared to other DAgger variants and classic imitation learning.\nRobust Stable Marriage (RSM) is a variant of the classical Stable Marriage problem, where the robustness of a given stable matching is measured by the number of modifications required for repairing it in case an unforeseen event occurs. We focus on the complexity of finding an (a,b)-supermatch. 
An (a,b)-supermatch is defined as a stable matching in which if any 'a' (non-fixed) men/women break up it is possible to find another stable matching by changing the partners of those 'a' men/women and also the partners of at most 'b' other couples. In order to show deciding if there exists an (a,b)-supermatch is NP-Complete, we first introduce a SAT formulation that is NP-Complete by using Schaefer's Dichotomy Theorem. Then, we show the equivalence between the SAT formulation and finding a (1,1)-supermatch on a specific family of instances.\nOnline solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to be successful where previous approaches fail.\nThe hospitality industry is one of the data-rich industries that receives huge Volumes of data streaming at high Velocity with considerably Variety, Veracity, and Variability. These properties make the data analysis in the hospitality industry a big data problem. Meeting the customers' expectations is a key factor in the hospitality industry to grasp the customers' loyalty. 
To achieve this goal, marketing professionals in this industry actively look for ways to utilize their data in the best possible manner and advance their data analytic solutions, such as identifying a unique market segmentation clustering and developing a recommendation system. In this paper, we present a comprehensive literature review of existing big data clustering algorithms and their advantages and disadvantages for various use cases. We implement the existing big data clustering algorithms and provide a quantitative comparison of the performance of different clustering algorithms for different scenarios. We also present our insights and recommendations regarding the suitability of different big data clustering algorithms for different use cases. These recommendations will be helpful for hoteliers in selecting the appropriate market segmentation clustering algorithm for different clustering datasets to improve the customer experience and maximize the hotel revenue.\nA value learning system has incentives to follow shutdown instructions, assuming the shutdown instruction provides information (in the technical sense) about which actions lead to valuable outcomes. However, this assumption is not robust to model mis-specification (e.g., in the case of programmer errors). We demonstrate this by presenting some Supervised POMDP scenarios in which errors in the parameterized reward function remove the incentive to follow shutdown commands. These difficulties parallel those discussed by Soares et al. (2015) in their paper on corrigibility. We argue that it is important to consider systems that follow shutdown commands under some weaker set of assumptions (e.g., that one small verified module is correctly implemented; as opposed to an entire prior probability distribution and/or parameterized reward function). 
We discuss some difficulties with simple ways to attempt to attain these sorts of guarantees in a value learning framework.\nIn this paper, a sparse Markov decision process (MDP) with novel causal sparse Tsallis entropy regularization is proposed.The proposed policy regularization induces a sparse and multi-modal optimal policy distribution of a sparse MDP. The full mathematical analysis of the proposed sparse MDP is provided.We first analyze the optimality condition of a sparse MDP. Then, we propose a sparse value iteration method which solves a sparse MDP and then prove the convergence and optimality of sparse value iteration using the Banach fixed point theorem. The proposed sparse MDP is compared to soft MDPs which utilize causal entropy regularization. We show that the performance error of a sparse MDP has a constant bound, while the error of a soft MDP increases logarithmically with respect to the number of actions, where this performance error is caused by the introduced regularization term. In experiments, we apply sparse MDPs to reinforcement learning problems. The proposed method outperforms existing methods in terms of the convergence speed and performance.\nRecurrent Neural Networks (RNNS) are now widely used on sequence generation tasks due to their ability to learn long-range dependencies and to generate sequences of arbitrary length. However, their left-to-right generation procedure only allows a limited control from a potential user which makes them unsuitable for interactive and creative usages such as interactive music generation. This paper introduces a novel architecture called Anticipation-RNN which possesses the assets of the RNN-based generative models while allowing to enforce user-defined positional constraints. We demonstrate its efficiency on the task of generating melodies satisfying positional constraints in the style of the soprano parts of the J.S. Bach chorale harmonizations. 
Sampling using the Anticipation-RNN is of the same order of complexity than sampling from the traditional RNN model. This fast and interactive generation of musical sequences opens ways to devise real-time systems that could be used for creative purposes.\nLearning to remember long sequences remains a challenging task for recurrent neural networks. Register memory and attention mechanisms were both proposed to resolve the issue with either high computational cost to retain memory differentiability, or by discounting the RNN representation learning towards encoding shorter local contexts than encouraging long sequence encoding. Associative memory, which studies the compression of multiple patterns in a fixed size memory, were rarely considered in recent years. Although some recent work tries to introduce associative memory in RNN and mimic the energy decay process in Hopfield nets, it inherits the shortcoming of rule-based memory updates, and the memory capacity is limited. This paper proposes a method to learn the memory update rule jointly with task objective to improve memory capacity for remembering long sequences. Also, we propose an architecture that uses multiple such associative memory for more complex input encoding. We observed some interesting facts when compared to other RNN architectures on some well-studied sequence learning tasks.\nGenerative adversarial networks (GANs) are an exciting alternative to algorithms for solving density estimation problems---using data to assess how likely samples are to be drawn from the same distribution. Instead of explicitly computing these probabilities, GANs learn a generator that can match the given probabilistic source. This paper looks particularly at this matching capability in the context of problems with one-dimensional outputs. We identify a class of function decompositions with properties that make them well suited to the critic role in a leading approach to GANs known as Wasserstein GANs. 
We show that Taylor and Fourier series decompositions belong to our class, provide examples of these critics outperforming standard GAN approaches, and suggest how they can be scaled to higher dimensional problems in the future.\nIn recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL). Reproducing existing work and accurately judging the improvements offered by novel methods is vital to sustaining this progress. Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determinism in standard benchmark environments, combined with variance intrinsic to the methods, can make reported results tough to interpret. Without significance metrics and tighter standardization of experimental reporting, it is difficult to determine whether improvements over the prior state-of-the-art are meaningful. In this paper, we investigate challenges posed by reproducibility, proper experimental techniques, and reporting procedures. We illustrate the variability in reported metrics and results when comparing against common baselines and suggest guidelines to make future results in deep RL more reproducible. We aim to spur discussion about how to ensure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.\nUnderstanding properties of deep neural networks is an important challenge in deep learning. In this paper, we take a step in this direction by proposing a rigorous way of verifying properties of a popular class of neural networks, Binarized Neural Networks, using the well-developed means of Boolean satisfiability. Our main contribution is a construction that creates a representation of a binarized neural network as a Boolean formula. Our encoding is the first exact Boolean representation of a deep neural network. 
Using this encoding, we leverage the power of modern SAT solvers along with a proposed counterexample-guided search procedure to verify various properties of these networks. A particular focus will be on the critical property of robustness to adversarial perturbations. For this property, our experimental results demonstrate that our approach scales to medium-size deep neural networks used in image classification tasks. To the best of our knowledge, this is the first work on verifying properties of deep neural networks using an exact Boolean encoding of the network.\nRepresenting the semantic relations that exist between two given words (or entities) is an important first step in a wide-range of NLP applications such as analogical reasoning, knowledge base completion and relational information retrieval. A simple, yet surprisingly accurate method for representing a relation between two words is to compute the vector offset (\\PairDiff) between their corresponding word embeddings. Despite the empirical success, it remains unclear as to whether \\PairDiff is the best operator for obtaining a relational representation from word embeddings. We conduct a theoretical analysis of generalised bilinear operators that can be used to measure the $\\ell_{2}$ relational distance between two word-pairs. We show that, if the word embeddings are standardised and uncorrelated, such an operator will be independent of bilinear terms, and can be simplified to a linear form, where \\PairDiff is a special case. For numerous word embedding types, we empirically verify the uncorrelation assumption, demonstrating the general applicability of our theoretical result. Moreover, we experimentally discover \\PairDiff from the bilinear relation composition operator on several benchmark analogy datasets.\nReinforcement learning has shown promise in learning policies that can solve complex problems. 
However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectories arising from a diverse set of underlying reward functions rather than a single one. Thus, in inverse reinforcement learning, it is useful to consider such a decomposition. The options framework in reinforcement learning is specifically designed to decompose policies in a similar light. We therefore extend the options framework and propose a method to simultaneously recover reward options in addition to policy options. We leverage adversarial methods to learn joint reward-policy options using only observed expert states. We show that this approach works well in both simple and complex continuous control tasks and shows significant performance increases in one-shot transfer learning.\nWe present a general approach to automating ethical decisions, drawing on machine learning and computational social choice. In a nutshell, we propose to learn a model of societal preferences, and, when faced with a specific ethical dilemma at runtime, efficiently aggregate those preferences to identify a desirable choice. We provide a concrete algorithm that instantiates our approach; some of its crucial steps are informed by a new theory of swap-dominance efficient voting rules. Finally, we implement and evaluate a system for ethical decision making in the autonomous vehicle domain, using preference data collected from 1.3 million people through the Moral Machine website.\nRecently, evolving networks have become a suitable form for modeling many real-world complex systems, due to their ability to represent the systems and their constituent entities, the interactions between the entities, and the time-variability of their structure and properties.
Designing computational models able to analyze evolving networks becomes relevant in many applications. The goal of this research project is to evaluate the possible contribution of temporal pattern mining techniques in the analysis of evolving networks. In particular, we aim at exploiting available snapshots for the recognition of valuable and potentially useful knowledge about the temporal dynamics exhibited by the network over time, without making any prior assumption about the underlying evolutionary schema. Pattern-based approaches of temporal pattern mining can be exploited to detect and characterize changes exhibited by a network over time, starting from observed snapshots.\nThis paper presents an evaluation of deep neural networks for recognition of digits entered by users on a smartphone touchscreen. A new large dataset of Arabic numerals was collected for training and evaluation of the network. The dataset consists of spatial and temporal touch data recorded for 80 digits entered by 260 users. Two neural network models were investigated. The first model was a 2D convolutional neural network (ConvNet) applied to bitmaps of the glyphs created by interpolation of the sensed screen touches, and its topology is similar to that of previously published models for offline handwriting recognition from scanned images. The second model used a 1D ConvNet architecture but was applied to the sequence of polar vectors connecting the touch points. The models were found to provide accuracies of 98.50% and 95.86%, respectively. The second model was much simpler, providing a reduction in the number of parameters from 1,663,370 to 287,690. The dataset has been made available to the community as an open source resource.\nOne of the most interesting features of Bayesian optimization for direct policy search is that it can leverage priors (e.g., from simulation or from previous tasks) to accelerate learning on a robot.
In this paper, we are interested in situations for which several priors exist but we do not know in advance which one best fits the current situation. We tackle this problem by introducing a novel acquisition function, called Most Likely Expected Improvement (MLEI), that combines the likelihood of the priors and the expected improvement. We evaluate this new acquisition function on a transfer learning task for a 5-DOF planar arm and on a possibly damaged, 6-legged robot that has to learn to walk on flat ground and on stairs, with priors corresponding to different stairs and different kinds of damage. Our results show that MLEI effectively identifies and exploits the priors, even when there is no obvious match between the current situation and the priors.\nDeep reinforcement learning yields great results for a large array of problems, but models are generally retrained anew for each new problem to be solved. Prior learning and knowledge are difficult to incorporate when training new models, requiring increasingly longer training as problems become more complex. This is especially problematic for problems with sparse rewards. We provide a solution to these problems by introducing Concept Network Reinforcement Learning (CNRL), a framework which allows us to decompose problems using a multi-level hierarchy. Concepts in a concept network are reusable, and flexible enough to encapsulate feature extractors, skills, or other concept networks. With this hierarchical learning approach, deep reinforcement learning can be used to solve complex tasks in a modular way, through problem decomposition. We demonstrate the strength of CNRL by training a model to grasp a rectangular prism and precisely stack it on top of a cube using a gripper on a Kinova JACO arm, simulated in MuJoCo.
Our experiments show that our use of hierarchy results in a 45x reduction in environment interactions compared to the state-of-the-art on this task.\nState-of-the-art knowledge compilers generate deterministic subsets of DNNF, which have been recently shown to be exponentially less succinct than DNNF. In this paper, we propose a new method to compile DNNFs without enforcing determinism necessarily. Our approach is based on compiling deterministic DNNFs with the addition of auxiliary variables to the input formula. These variables are then existentially quantified from the deterministic structure in linear time, which would lead to a DNNF that is equivalent to the input formula and not necessarily deterministic. On the theoretical side, we show that the new method could generate exponentially smaller DNNFs than deterministic ones, even by adding a single auxiliary variable. Further, we show that various existing techniques that introduce auxiliary variables to the input formulas can be employed in our framework. On the practical side, we empirically demonstrate that our new method can significantly advance DNNF compilation on certain benchmarks.\nOperationalizing machine learning based security detections is extremely challenging, especially in a continuously evolving cloud environment. Conventional anomaly detection does not produce satisfactory results for analysts that are investigating security incidents in the cloud. Model evaluation alone presents its own set of problems due to a lack of benchmark datasets. When deploying these detections, we must deal with model compliance, localization, and data silo issues, among many others. We pose the problem of \"attack disruption\" as a way forward in the security data science space. 
In this paper, we describe the framework, challenges, and open questions surrounding the successful operationalization of machine learning based security detections in a cloud environment and provide some insights into how we have addressed them.\nFeature engineering is a crucial step in the process of predictive modeling. It involves the transformation of a given feature space, typically using mathematical functions, with the objective of reducing the modeling error for a given target. However, there is no well-defined basis for performing effective feature engineering. It involves domain knowledge, intuition, and most of all, a lengthy process of trial and error. The human attention involved in overseeing this process significantly influences the cost of model generation. We present a new framework to automate feature engineering. It is based on performance-driven exploration of a transformation graph, which systematically and compactly enumerates the space of given options. A highly efficient exploration strategy is derived through reinforcement learning on past examples.\nDeep learning algorithms offer a powerful means to automatically analyze the content of medical images. However, many biological samples of interest are primarily transparent to visible light and contain features that are difficult to resolve with a standard optical microscope. Here, we use a convolutional neural network (CNN) not only to classify images, but also to optimize the physical layout of the imaging device itself. We increase the classification accuracy of a microscope's recorded images by merging an optical model of image formation into the pipeline of a CNN. The resulting network simultaneously determines an ideal illumination arrangement to highlight important sample features during image acquisition, along with a set of convolutional weights to classify the detected images post-capture.
We demonstrate our joint optimization technique with an experimental microscope configuration that automatically identifies malaria-infected cells with 5-10% higher accuracy than standard and alternative microscope lighting designs.\nWe study the problem of learning description logic (DL) ontologies in Angluin et al.'s framework of exact learning via queries. We admit membership queries (\"is a given subsumption entailed by the target ontology?\") and equivalence queries (\"is a given ontology equivalent to the target ontology?\"). We present three main results: (1) ontologies formulated in (two relevant versions of) the description logic DL-Lite can be learned with polynomially many queries of polynomial size; (2) this is not the case for ontologies formulated in the description logic EL, even when only acyclic ontologies are admitted; and (3) ontologies formulated in a fragment of EL related to the web ontology language OWL 2 RL can be learned in polynomial time. We also show that neither membership nor equivalence queries alone are sufficient in cases (1) and (3).\nWe present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathematical update equation based on a list of primitive functions, such as the gradient, running average of the gradient, etc. The controller is trained with Reinforcement Learning to maximize the performance of a model after a few epochs. On CIFAR-10, our method discovers several update rules that are better than many commonly used optimizers, such as Adam, RMSProp, or SGD with and without Momentum on a ConvNet model. 
We introduce two new optimizers, named PowerSign and AddSign, which we show transfer well and improve training on a variety of different tasks and architectures, including ImageNet classification and Google's neural machine translation system.\nIn the smart grid, the intent is to use flexibility in demand, both to balance demand and supply and to resolve potential congestion. A first prominent example of such flexible demand is the charging of electric vehicles, which do not necessarily need to be charged as soon as they are plugged in. The problem of optimally scheduling the charging demand of electric vehicles within the constraints of the electricity infrastructure is called the charge scheduling problem. The models of the charging speed, horizon, and charging demand determine the computational complexity of the charge scheduling problem. For about 20 variants, we show, using a dynamic programming approach, that the problem is either in P or weakly NP-hard. We also show that about 10 variants of the problem are strongly NP-hard, presenting a potentially significant obstacle to their use in practical situations of scale.\nIn this paper we focus on the unconstrained binary quadratic optimization model, maximize x^t Qx, x binary, and consider the problem of identifying optimal solutions that are robust with respect to perturbations in the Q matrix. We are motivated to find robust, or stable, solutions because of the uncertainty inherent in the big data origins of Q and limitations in computer numerical precision, particularly in a new class of quantum annealing computers. Experimental design techniques are used to generate a diverse subset of possible scenarios, from which robust solutions are identified. An illustrative example with practical application to business decision making is examined.
The approach presented also generates a surface response equation which is used to estimate upper bounds in constant time for Q instantiations within the scenario extremes. In addition, a theoretical framework for the robustness of individual x_i variables is considered by examining the range of Q values over which the x_i are predetermined.\nMany AI systems have a black box nature that makes it difficult to understand how they make their recommendations. This can be unsettling, as the designer cannot be certain how the system will respond to novelty. To penetrate our Na\\\"ive Bayes recommender's black box, we first asked, what do we want to know from our system, and how can it be obtained? The answers led us to recursively define a common lexicon with the AI, a lingua franca, using the very items that the system ranks to create meta-symbols recognized by the system, and enabling us to understand the system's knowledge in plain terms and at different levels of abstraction. As one bonus, using its existing knowledge, the lingua franca can enable the system to extend recommendations to related, but entirely new areas, ameliorating the cold start problem. We also supplement the lingua franca with techniques for visualizing the system's knowledge state, develop metrics for evaluating the meaningfulness of terms in the lingua franca, and generalize the requirements for developing a similar lingua franca in other applications.\nLocal search is a basic building block in memetic algorithms. Guided Local Search (GLS) can improve the efficiency of local search. By changing the guide function, GLS guides a local search to escape from locally optimal solutions and find better solutions. The key component of GLS is its penalizing mechanism which determines which feature is selected to penalize when the search is trapped in a locally optimal solution. The original GLS penalizing mechanism only makes use of the cost and the current penalty value of each feature. 
It is well known that many combinatorial optimization problems have a big valley structure, i.e., the better a solution is, the more likely it is to be close to a globally optimal solution. This paper proposes to use the big valley structure assumption to improve the GLS penalizing mechanism. An improved GLS algorithm called Elite Biased GLS (EB-GLS) is proposed. EB-GLS records and maintains an elite solution as an estimate of the globally optimal solutions, and reduces the chance of penalizing the features in this solution. We have systematically tested the proposed algorithm on the symmetric traveling salesman problem. Experimental results show that EB-GLS is significantly better than GLS.\nAutomatic mesh-based shape generation is of great interest across a wide range of disciplines, from industrial design to gaming, computer graphics and various other forms of digital art. While most traditional methods focus on primitive based model generation, advances in deep learning have made it possible to learn 3-dimensional geometric shape representations in an end-to-end manner. However, most current deep learning based frameworks focus on the representation and generation of voxel and point-cloud based shapes, making them not directly applicable to the design and graphics communities. This study addresses the need for automatic generation of mesh-based geometries, and proposes a novel framework that utilizes a signed distance function representation to generate detail-preserving three-dimensional surface meshes by a deep learning based approach.\nWe make an important connection to existing results in econometrics to describe an alternative formulation of inverse reinforcement learning (IRL). In particular, we describe an algorithm using Conditional Choice Probabilities (CCP), which are maximum likelihood estimates of the policy estimated from expert demonstrations, to solve the IRL problem.
Using the language of structural econometrics, we re-frame the optimal decision problem and introduce an alternative representation of value functions due to (Hotz and Miller 1993). In addition to presenting the theoretical connections that bridge the IRL literature between Economics and Robotics, the use of CCPs also has the practical benefit of reducing the computational cost of solving the IRL problem. Specifically, under the CCP representation, we show how one can avoid repeated calls to the dynamic programming subroutine typically used in IRL. We show via extensive experimentation on standard IRL benchmarks that CCP-IRL is able to outperform MaxEnt-IRL, with as much as a 5x speedup and without compromising on the quality of the recovered reward function.\nMany state-of-the-art algorithms for solving hard combinatorial problems include elements of stochasticity that lead to high variations in runtime, even for a fixed problem instance, across runs with different pseudo-random number seeds. Knowledge about the runtime distributions (RTDs) of algorithms on given problem instances can be exploited in various meta-algorithmic procedures, such as algorithm selection, portfolios, and randomized restarts. Previous work has shown that machine learning can be used to individually predict mean, median and variance of RTDs. To establish a new state-of-the-art in predicting RTDs, we demonstrate that the parameters of an RTD should be learned jointly and that neural networks can do this well by directly optimizing the likelihood of an RTD given runtime observations. In an empirical study involving four algorithms for SAT solving and AI planning, we show that our neural networks predict the true RTDs of unseen instances better than previous methods. 
As an exemplary application of RTD predictions, we show that our RTD models also yield good predictions of running these algorithms in parallel.\nAppropriate comments on code snippets provide insight into code functionality, which is helpful for program comprehension. However, due to the high cost of writing comments, many code projects do not contain adequate comments. Automatic comment generation techniques have been proposed to generate comments from pieces of code in order to alleviate the human effort of annotating code. Most existing approaches attempt to exploit certain correlations (usually manually given) between code and generated comments, which could be easily violated if the coding patterns change and hence the performance of comment generation declines. In this paper, we first build C2CGit, a large dataset from open projects in GitHub, which is more than 20$\\times$ larger than existing datasets. Then we propose a new attention module called Code Attention to translate code to comments, which is able to utilize the domain features of code snippets, such as symbols and identifiers. We conduct ablation studies to determine the effects of different parts of Code Attention. Experimental results demonstrate that the proposed module outperforms existing approaches in both BLEU and METEOR.\nWhile deep reinforcement learning techniques have recently produced considerable achievements on many decision-making problems, their use in robotics has largely been limited to simulated worlds or restricted motions, since unconstrained trial-and-error interactions in the real world can have undesirable consequences for the robot or its environment. To overcome such limitations, we propose a novel reinforcement learning architecture, OptLayer, that takes as inputs possibly unsafe actions predicted by a neural network and outputs the closest actions that satisfy chosen constraints.
While learning control policies often requires carefully crafted rewards and penalties when exploring the range of possible actions, OptLayer ensures that only safe actions are actually executed and unsafe predictions are penalized during training. We demonstrate the effectiveness of our approach on robot reaching tasks, both simulated and in the real world.\nThe \"Loving AI\" project involves developing software enabling humanoid robots to interact with people in loving and compassionate ways, and to promote people's self-understanding and self-transcendence. Currently the project centers on the Hanson Robotics robot \"Sophia\" -- specifically, on supplying Sophia with personality content and cognitive, linguistic, perceptual and behavioral content aimed at enabling loving interactions supportive of human self-transcendence. In September 2017 a small pilot study was conducted, involving the Sophia robot leading human subjects through dialogues and exercises focused on meditation, visualization and relaxation. The pilot was an apparent success, qualitatively demonstrating the viability of the approach and the ability of appropriate human-robot interaction to increase human well-being and advance human consciousness.\nThis paper stands in the context of reinforcement learning with partial observability and limited data. In this setting, we focus on the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data), and theoretically show that while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. Our analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations.
Finally, we also discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting.\nWe propose a method to build quantum memristors in quantum photonic platforms. We first design an effective beam splitter, which is tunable in real time, by means of a Mach-Zehnder-type array with two equal 50:50 beam splitters and a tunable retarder, which allows us to control its reflectivity. Then, we show that this tunable beam splitter, when equipped with weak measurements and classical feedback, behaves as a quantum memristor. Indeed, in order to prove its quantumness, we show how to codify quantum information in the coherent beams. Moreover, we estimate the memory capability of the quantum memristor. Finally, we show the feasibility of the proposed setup in integrated quantum photonics.\nWe propose a protocol to perform generalized quantum reinforcement learning with quantum technologies. At variance with recent results on quantum reinforcement learning with superconducting circuits [L. Lamata, Sci. Rep. 7, 1609 (2017)], in our current protocol coherent feedback during the learning process is not required, enabling its implementation in a wide variety of quantum systems. We consider diverse possible scenarios for an agent, an environment, and a register that connects them, involving multiqubit and multilevel systems, as well as open-system dynamics. We finally propose possible implementations of this protocol in trapped ions and superconducting circuits. The field of quantum reinforcement learning with quantum technologies will enable enhanced quantum control, as well as more efficient machine learning calculations.\nSubgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution.
In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable. That is, the distribution of this control variable should be the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-$k$ subgroups that are both exceptional and representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time.\nModels based on Convolutional Neural Networks (CNNs) have proven very successful for semantic segmentation and object parsing that yield hierarchies of features. Our key insight is to build convolutional networks that take input of arbitrary size and produce object parsing output with efficient inference and learning. In this work, we focus on the task of instance segmentation and parsing, which recognizes and localizes objects down to the pixel level based on a deep CNN. Therefore, unlike some related work, a pixel cannot belong to multiple instances and parsing. Our model is based on a deep neural network trained for object masking, supervised with the input image, which then incorporates a Conditional Random Field (CRF) with end-to-end trainable piecewise order potentials based on object parsing outputs. In each CRF unit we designed terms to capture the short range and long range dependencies from various neighbors. The accurate instance-level segmentation that our network produces is reflected by the considerable improvements obtained over previous work at high APr thresholds.
We demonstrate the effectiveness of our model with extensive experiments on a challenging subset of the PASCAL VOC2012 dataset.\nThis paper contains an analysis of the concept of a class within different object-oriented knowledge representation models. The main attention is paid to the structure of the class and its efficiency in the context of data storage, using object-relational mapping. The main achievement of the paper is the extension of the concept of a homogeneous class of objects by introducing the concepts of single-core and multi-core inhomogeneous classes of objects, which allow several different types to be defined simultaneously within one class of objects, avoiding duplication of properties and methods in the representation of types, decreasing the size of program code and providing more efficient information storage in databases. In addition, the paper contains the results of an experiment, which show that data storage in a relational database using the proposed extensions of the class is in some cases more efficient than the use of homogeneous classes of objects.\nThe new era of the Web is known as the semantic Web or the Web of data. The semantic Web depends on ontologies, which are seen as one of its pillars. The bigger these ontologies, the greater their exploitation. However, when these ontologies become too big, other problems may appear, such as the difficulty of loading big files into memory, the time needed to download such files and especially the time needed to reason over them. In this paper we discuss approaches for segmenting such big Web ontologies as well as their usefulness. The segmentation method extracts from an existing ontology a segment that represents a layer or a generation in the existing ontology, i.e. a horizontal extraction. The extracted segment should itself be an ontology.\nTo resolve conflicts among norms, various nonmonotonic formalisms can be used to perform prioritized normative reasoning.
Meanwhile, formal argumentation provides a way to represent nonmonotonic logics. In this paper, we propose a representation of prioritized normative reasoning by argumentation. Using hierarchical abstract normative systems, we define three kinds of prioritized normative reasoning approaches, called Greedy, Reduction, and Optimization. Then, after formulating an argumentation theory for a hierarchical abstract normative system, we show that for a totally ordered hierarchical abstract normative system, Greedy and Reduction can be represented in argumentation by applying the weakest link and the last link principles respectively, and Optimization can be represented by introducing additional defeats capturing the idea that any argument containing a norm not belonging to the maximal obeyable set should be rejected.\nWe analyse multimodal time-series data corresponding to weight, sleep and steps measurements. We focus on predicting whether a user will successfully achieve his/her weight objective. For this, we design several deep long short-term memory (LSTM) architectures, including a novel cross-modal LSTM (X-LSTM), and demonstrate their superiority over baseline approaches. The X-LSTM improves parameter efficiency by processing each modality separately and allowing for information flow between them by way of recurrent cross-connections. We present a general hyperparameter optimisation technique for X-LSTMs, which allows us to significantly improve on the LSTM and a prior state-of-the-art cross-modal approach, using a comparable number of parameters. Finally, we visualise the model's predictions, revealing implications about latent variables in this task.\nSelf-supervised learning (SSL) is a reliable learning mechanism in which a robot enhances its perceptual capabilities. Typically, in SSL a trusted, primary sensor cue provides supervised training data to a secondary sensor cue.
In this article, a theoretical analysis is performed on the fusion of the primary and secondary cue in a minimal model of SSL. A proof is provided that determines the specific conditions under which it is favorable to perform fusion. In short, it is favorable when (i) the prior on the target value is strong or (ii) the secondary cue is sufficiently accurate. The theoretical findings are validated with computational experiments. Subsequently, a real-world case study is performed to investigate whether fusion in SSL is also beneficial when assumptions of the minimal model are not met. In particular, a flying robot learns to map pressure measurements to sonar height measurements and then fuses the two, resulting in better height estimation. Fusion is also beneficial in the opposite case, when pressure is the primary cue. The analysis and results encourage the study of SSL fusion for other robots and sensors as well.\nWe present a probabilistic model of an intrusion in a marked renewal process. Given a process and a sequence of events, an intrusion is a subsequence of events that is not produced by the process. Applications of the model are, for example, online payment fraud with the fraudster taking over a user's account and performing payments on the user's behalf, or unexpected equipment failures due to unintended use. We adopt a Bayesian approach to infer the probability of an intrusion in a sequence of events, a MAP subsequence of events constituting the intrusion, and the marginal probability that each event in a sequence belongs to the intrusion. We evaluate the model for intrusion detection on synthetic data, as well as on anonymized data from an online payment system.\nTransfer learning significantly accelerates the reinforcement learning process by exploiting relevant knowledge from previous experiences. The problem of optimally selecting source policies during the learning process is important yet challenging.
There has been little theoretical analysis of this problem. In this paper, we develop an optimal online method to select source policies for reinforcement learning. This method formulates online source policy selection as a multi-armed bandit problem and augments Q-learning with policy reuse. We provide theoretical guarantees of the optimal selection process and convergence to the optimal policy. In addition, we conduct experiments on a grid-based robot navigation domain to demonstrate its efficiency and robustness by comparing to the state-of-the-art transfer learning method.\nWhile deep reinforcement learning (RL) methods have achieved unprecedented successes in a range of challenging problems, their applicability has been mainly limited to simulation or game domains due to the high sample complexity of the trial-and-error learning process. However, real-world robotic applications often need a data-efficient learning process with safety-critical constraints. In this paper, we consider the challenging problem of learning unmanned aerial vehicle (UAV) control for tracking a moving target. To acquire a strategy that combines perception and control, we represent the policy by a convolutional neural network. We develop a hierarchical approach that combines a model-free policy gradient method with a conventional feedback proportional-integral-derivative (PID) controller to enable stable learning without catastrophic failure. The neural network is trained by a combination of supervised learning from raw images and reinforcement learning from games of self-play. We show that the proposed approach can learn a target following policy in a simulator efficiently and the learned behavior can be successfully transferred to the DJI quadrotor platform for real-world UAV control.\nThe continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. 
Central to these information processing methods is document classification, which has become an important application for supervised learning. Recently the performance of these traditional classifiers has degraded as the number of documents has increased. This is because along with this growth in the number of documents has come an increase in the number of categories. This paper approaches this problem differently from current document classification methods that view the problem as multi-class classification. Instead we perform hierarchical classification using an approach we call Hierarchical Deep Learning for Text classification (HDLTex). HDLTex employs stacks of deep learning architectures to provide specialized understanding at each level of the document hierarchy.\nWe introduce Graph-Structured Sum-Product Networks (GraphSPNs), a probabilistic approach to structured prediction for problems where dependencies between latent variables are expressed in terms of arbitrary, dynamic graphs. While many approaches to structured prediction place strict constraints on the interactions between inferred variables, many real-world problems can be only characterized using complex graph structures of varying size, often contaminated with noise when obtained from real data. Here, we focus on one such problem in the domain of robotics. We demonstrate how GraphSPNs can be used to bolster inference about semantic, conceptual place descriptions using noisy topological relations discovered by a robot exploring large-scale office spaces. Through experiments, we show that GraphSPNs consistently outperform the traditional approach based on undirected graphical models, successfully disambiguating information in global semantic maps built from uncertain, noisy local evidence. 
We further exploit the probabilistic nature of the model to infer marginal distributions over semantic descriptions of as yet unexplored places and detect spatial environment configurations that are novel and incongruent with the known evidence.\nPersuasiveness is a creative art aimed at making people believe in a certain set of beliefs. Many times, such creativity is about adapting the richness of one domain into another to strike a chord with the target audience. In this research, we present PersuAIDE! - A persuasive system based on linguistic creativity to transform a given sentence into various forms of persuading sentences. These various forms cover multiple foci of persuasion such as memorability and sentiment. For a given simple product line, the algorithm is composed of several steps including: (i) select an appropriate well-known expression for the target domain to add memorability, (ii) identify keywords and entities in the given sentence and expression and transform it to produce a creative persuading sentence, and (iii) add positive or negative sentiment for further persuasion. The persuasive conversions were manually verified using qualitative results and the effectiveness of the proposed approach is empirically discussed.\nCryptovirological augmentations present an immediate, incomparable threat. Over the last decade, the substantial proliferation of crypto-ransomware has had widespread consequences for consumers and organisations alike. Established preventive measures perform well; however, the problem has not ceased. Reverse engineering potentially malicious software is a cumbersome task due to platform eccentricities and obfuscated transmutation mechanisms, hence requiring smarter, more efficient detection strategies. The following manuscript presents a novel approach for the classification of cryptographic primitives in compiled binary executables using deep learning. 
The model blueprint, a DCNN, is fittingly configured to learn from variable-length control flow diagnostics output from a dynamic trace. To rival the size and variability of contemporary data compendiums, hence feeding the model cognition, a methodology for the procedural generation of synthetic cryptographic binaries is defined, utilising core primitives from OpenSSL with multivariate obfuscation, to draw a vastly scalable distribution. The library, CryptoKnight, rendered an algorithmic pool of AES, RC4, Blowfish, MD5 and RSA to synthesise combinable variants which are automatically fed into its core model. Converging at 91% accuracy, CryptoKnight is successfully able to classify the sample algorithms with minimal loss.\nGraph-based semi-supervised learning (GSSL) has an intuitive representation and can be improved by exploiting the matrix calculation. However, it has to perform iterative optimization to achieve a preset objective, which usually leads to low efficiency. Another inconvenience lying in GSSL is that when new data come, the graph construction and the optimization have to be conducted all over again. We propose a sound assumption, arguing that: the neighboring data points are not in peer-to-peer relation, but in a partially ordered relation induced by the local density and distance between the data; and the label of a center can be regarded as the contribution of its followers. Starting from the assumption, we develop a highly efficient non-iterative label propagation algorithm based on a novel data structure named optimal leading forest (LaPOLeaF). The major weaknesses of the traditional GSSL are addressed by this study. We further scale LaPOLeaF to accommodate big data by utilizing block distance matrix technique, parallel computing, and Locality-Sensitive Hashing (LSH). 
Experiments on large datasets have shown the promising results of the proposed methods.\nIn this paper we focus on developing a control algorithm for multi-terrain tracked robots with flippers using a reinforcement learning (RL) approach. The work is based on the deep deterministic policy gradient (DDPG) algorithm, proven to be very successful in simple simulation environments. The algorithm works in an end-to-end fashion in order to control the continuous position of the flippers. This end-to-end approach makes it easy to apply the controller to a wide array of circumstances, but the huge flexibility comes at the cost of an increased difficulty of solution. The complexity of the task is enlarged even more by the fact that real multi-terrain robots move in partially observable environments. Notwithstanding these complications, being able to smoothly control a multi-terrain robot can produce huge benefits in the daily lives of impaired people or in search and rescue situations.\nWe study the quantum synchronization between a pair of two-level systems inside two coupled cavities. Using a digital-analog decomposition of the master equation that rules the system dynamics, we show that this approach leads to quantum synchronization between both two-level systems. Moreover, we can identify in this digital-analog block decomposition the fundamental elements of a quantum machine learning protocol, in which the agent and the environment (learning units) interact through a mediating system, namely, the register. If we can additionally equip this algorithm with a classical feedback mechanism, which consists of projective measurements in the register, reinitialization of the register state and local conditional operations on the agent and register subspace, a powerful and flexible quantum machine learning protocol emerges. 
Indeed, numerical simulations show that this protocol enhances the synchronization process, even when every subsystem experiences different loss/decoherence mechanisms, and gives us flexibility to choose the synchronization state. Finally, we propose an implementation based on current technologies in superconducting circuits.\nThe growing importance and utilization of measuring brain waves (e.g. EEG signals of eye state) in brain-computer interface (BCI) applications has highlighted the need for suitable classification methods. In this paper, a comparison between three well-known classification methods (i.e. support vector machine (SVM), hidden Markov model (HMM), and radial basis function (RBF)) for EEG-based eye state classification was performed. Furthermore, a suggested method that is based on an ensemble model was tested. The suggested (ensemble system) method is based on a voting algorithm with two kernels: random forest (RF) and Kstar classification methods. The performance was tested using three measurement parameters: accuracy, mean absolute error (MAE), and confusion matrix. Results showed that the proposed method outperforms the other tested methods. For instance, the suggested method's performance was 97.27% accuracy and 0.13 MAE.\nDaily operation of a large-scale experiment is a challenging task, particularly from perspectives of routine monitoring of quality for data being taken. We describe an approach that uses Machine Learning for the automated system to monitor data quality, which is based on partial use of data qualified manually by detector experts. The system automatically classifies marginal cases: both of good and bad data, and uses human expert decisions to classify the remaining \"grey area\" cases. This study uses collision data collected by the CMS experiment at LHC in 2010. 
We demonstrate that the proposed workflow is able to automatically process at least 20% of samples without noticeable degradation of the result.\nAdversarial attacks are known to succeed on classifiers, but it has been an open question whether more complex vision systems are vulnerable. In this paper, we study adversarial examples for vision and language models, which incorporate natural language understanding and complex structures such as attention, localization, and modular architectures. In particular, we investigate attacks on a dense captioning model and on two visual question answering (VQA) models. Our evaluation shows that we can generate adversarial examples with a high success rate (i.e., > 90%) for these models. Our work sheds new light on understanding adversarial attacks on vision systems which have a language component and shows that attention, bounding box localization, and compositional internal structures are vulnerable to adversarial attacks. These observations will inform future work towards building effective defenses.\nThe process of building ontologies is a difficult task that involves collaboration between ontology developers and domain experts and requires an ongoing interaction between them. This collaboration is made more difficult because they tend to use different tool sets, which can hamper this interaction. In this paper, we propose to decrease this distance between domain experts and ontology developers by creating more readable forms of ontologies, and further to enable editing in normal office environments. Building on a programmatic ontology development environment, such as Tawny-OWL, we are now able to generate these readable/editable forms from the raw ontological source and its embedded comments. We have implemented this translation to HTML for reading; this environment provides rich hyperlinking as well as active features such as hiding the source code in favour of comments. 
We are now working on translation to a Word document that also enables editing. Taken together, this should provide a significant new route for collaboration between the ontologist and domain specialist.\nThis paper provides an overview of evolutionary robotics techniques applied to on-line distributed evolution for robot collectives -- namely, embodied evolution. It provides a definition of embodied evolution as well as a thorough description of the underlying concepts and mechanisms. The paper also presents a comprehensive summary of research published in the field since its inception (1999-2017), providing various perspectives to identify the major trends. In particular, we identify a shift from considering embodied evolution as a parallel search method within small robot collectives (fewer than 10 robots) to embodied evolution as an on-line distributed learning method for designing collective behaviours in swarm-like collectives. The paper concludes with a discussion of applications and open questions, providing a milestone for past and an inspiration for future research.\nQuestion processing is a fundamental step in a question answering (QA) application, and its quality impacts the performance of the QA application. The major challenge in processing questions is how to extract the semantics of natural language questions (NLQs). Human language is ambiguous. Ambiguity may occur at two levels: lexical and syntactic. In this paper, we propose a new approach for resolving the lexical ambiguity problem by integrating context knowledge and concepts knowledge of a domain into shallow natural language processing (SNLP) techniques. Concepts knowledge is modeled using ontology, while context knowledge is obtained from WordNet, and it is determined based on neighborhood words in a question. The approach will be applied to a university QA system.\nResource allocation is still a difficult issue to deal with in wireless networks. 
The unstable channel condition and traffic demand for Quality of Service (QoS) raise some barriers that interfere with the process. It is important that an optimal policy takes into account some resources available to each traffic class while considering the spectral efficiency and other related channel issues. Reinforcement learning is a dynamic and effective method to support the accomplishment of resource allocation properly maintaining QoS levels for applications. The technique can track the system state as feedback to enhance the performance of a given task. Herein, we propose a simple reinforcement learning mechanism for LTE-A networks, aimed at choosing and limiting the number of resources allocated to each traffic class, regarding the QoS Class Identifier (QCI), at each Transmission Time Interval (TTI) along the scheduling procedure. The proposed mechanism implements a Markov Decision Process (MDP) solved by the Q-Learning algorithm to find an optimal action-state decision policy. The results obtained from simulation exhibit good performance, especially for the real-time Video application.\nPredicting traffic conditions has been recently explored as a way to relieve traffic congestion. Several pioneering approaches have been proposed based on traffic observations of the target location as well as its adjacent regions, but they obtain somewhat limited accuracy due to the lack of road topology mining. To address the effect attenuation problem, we propose to take account of the traffic of surrounding locations (wider than the adjacent range). We propose an end-to-end framework called DeepTransport, in which Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are utilized to obtain spatial-temporal traffic information within a transport network topology. In addition, an attention mechanism is introduced to align spatial and temporal information. 
Moreover, we constructed and released a real-world large traffic condition dataset with 5-minute resolution. Our experiments on this dataset demonstrate that our method captures the complex relationships in the temporal and spatial domains. It significantly outperforms traditional statistical methods and a state-of-the-art deep learning method.\nIn the context of the Electronic Health Record, automated diagnosis coding of patient notes is a useful task, but a challenging one due to the large number of codes and the length of patient notes. We investigate four models for assigning multiple ICD codes to discharge summaries taken from both MIMIC II and III. We present Hierarchical Attention-GRU (HA-GRU), a hierarchical approach to tag a document by identifying the sentences relevant for each label. HA-GRU achieves state-of-the-art results. Furthermore, the learned sentence-level attention layer highlights the model decision process, allows easier error analysis, and suggests future directions for improvement.\nReward engineering is an important aspect of reinforcement learning. Whether or not the user's intentions can be correctly encapsulated in the reward function can significantly impact the learning outcome. Current methods rely on manually crafted reward functions that often require parameter tuning to obtain the desired behavior. This operation can be expensive when exploration requires systems to interact with the physical world. In this paper, we explore the use of temporal logic (TL) to specify tasks in reinforcement learning. A TL formula can be translated to a real-valued function that measures its level of satisfaction against a trajectory. We take advantage of this function and propose temporal logic policy search (TLPS), a model-free learning technique that finds a policy that satisfies the TL specification. 
A set of simulated experiments is conducted to evaluate the proposed approach.\nNamed Entity Recognition (NER) is one of the most common tasks of natural language processing. The purpose of NER is to find and classify tokens in text documents into predefined categories called tags, such as person names, quantity expressions, percentage expressions, names of locations, organizations, as well as expressions of time, currency and others. Although a number of approaches have been proposed for this task in the Russian language, there is still substantial potential for better solutions. In this work, we studied several deep neural network models starting from vanilla Bi-directional Long Short-Term Memory (Bi-LSTM) then supplementing it with Conditional Random Fields (CRF) as well as highway networks and finally adding external word embeddings. All models were evaluated across three datasets: Gareev's dataset, Person-1000, FactRuEval-2016. We found that extending the Bi-LSTM model with CRF significantly increased the quality of predictions. Encoding input tokens with external word embeddings reduced training time and allowed us to achieve state-of-the-art results for the Russian NER task.\nGoal recognition is the problem of inferring the goal of an agent, based on its observed actions. An inspiring approach - plan recognition by planning (PRP) - uses off-the-shelf planners to dynamically generate plans for given goals, eliminating the need for the traditional plan library. However, the existing PRP formulation is inherently inefficient in online recognition, and cannot be used with motion planners for continuous spaces. In this paper, we utilize a different PRP formulation which allows for online goal recognition, and for application in continuous spaces. We present an online recognition algorithm, where two heuristic decision points may be used to improve run-time significantly over existing work. 
We specify heuristics for continuous domains, prove guarantees on their use, and empirically evaluate the algorithm over hundreds of experiments in both a 3D navigational environment and a cooperative robotic team task.\nThe reliable measurement of confidence in classifiers' predictions is very important for many applications and is, therefore, an important part of classifier design. Yet, although deep learning has received tremendous attention in recent years, not much progress has been made in quantifying the prediction confidence of neural network classifiers. Bayesian models offer a mathematically grounded framework to reason about model uncertainty, but usually come with prohibitive computational costs. In this paper we propose a simple, scalable method to achieve a reliable confidence score, based on the data embedding derived from the penultimate layer of the network. We investigate two ways to achieve desirable embeddings, by using either a distance-based loss or Adversarial Training. We then test the benefits of our method when used for classification error prediction, weighting an ensemble of classifiers, and novelty detection. In all tasks we show significant improvement over traditional, commonly used confidence scores.\nWe report on an extensive study of the current benefits and limitations of deep learning approaches to robot vision and introduce a novel dataset used for our investigation. To avoid the biases in currently available datasets, we consider a human-robot interaction setting to design a data-acquisition protocol for visual object recognition on the iCub humanoid robot. Considering the performance of off-the-shelf models trained on off-line large-scale image retrieval datasets, we show the necessity for knowledge transfer. Indeed, we analyze different ways in which this last step can be done, and identify the major bottlenecks in robotics scenarios. 
By studying both object categorization and identification tasks, we highlight the key differences between object recognition in robotics and in image retrieval tasks, for which the considered deep learning approaches were originally designed. In a nutshell, our results confirm, also in the considered setting, the remarkable improvements yielded by deep learning, while pointing to specific open challenges that need to be addressed for seamless deployment in robotics.\nThe excellent performance of deep neural networks has enabled us to solve several automation problems, opening an era of autonomous devices. However, current deep net architectures are heavy with millions of parameters and require billions of floating point operations. Several works have been developed to compress a pre-trained deep network to reduce memory footprint and, possibly, computation. Instead of compressing a pre-trained network, in this work, we propose a generic neural network layer structure employing multilinear projection as the primary feature extractor. The proposed architecture requires several times less memory than traditional Convolutional Neural Networks (CNN), while inheriting similar design principles of a CNN. In addition, the proposed architecture is equipped with two computation schemes that enable computation reduction or scalability. Experimental results show the effectiveness of our compact projection that outperforms traditional CNN, while requiring far fewer parameters.\nOne of the key challenges for operations researchers solving real-world problems is designing and implementing high-quality heuristics to guide their search procedures. In the past, machine learning techniques have failed to play a major role in operations research approaches, especially in terms of guiding branching and pruning decisions. 
We integrate deep neural networks into a heuristic tree search procedure to decide which branch to choose next and to estimate a bound for pruning the search tree of an optimization problem. We call our approach Deep Learning assisted heuristic Tree Search (DLTS) and apply it to a well-known problem from the container terminals literature, the container pre-marshalling problem (CPMP). Our approach is able to learn heuristics customized to the CPMP solely through analyzing the solutions to CPMP instances, and applies this knowledge within a heuristic tree search to produce the highest quality heuristic solutions to the CPMP to date.\nDexterous multi-fingered hands are extremely versatile and provide a generic way to perform multiple tasks in human-centric environments. However, effectively controlling them remains challenging due to their high dimensionality and large number of potential contacts. Deep reinforcement learning (DRL) provides a model-agnostic approach to control complex dynamical systems, but has not been shown to scale to high-dimensional dexterous manipulation. Furthermore, deployment of DRL on physical systems remains challenging due to sample inefficiency. Thus, the success of DRL in robotics has thus far been limited to simpler manipulators and tasks. In this work, we show that model-free DRL with natural policy gradients can effectively scale up to complex manipulation tasks with a high-dimensional 24-DoF hand, and solve them from scratch in simulated experiments. Furthermore, with the use of a small number of human demonstrations, the sample complexity can be significantly reduced, and enable learning within the equivalent of a few hours of robot experience. We demonstrate successful policies for multiple complex tasks: object relocation, in-hand manipulation, tool use, and door opening.\nExploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). 
Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude of speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy.\nThis paper proposes a novel neural machine reading model for open-domain question answering at scale. Existing machine comprehension models typically assume that a short piece of relevant text containing answers is already identified and given to the models, from which the models are designed to extract answers. This assumption, however, is not realistic for building a large-scale open-domain question answering system which requires both deep text understanding and identifying relevant text from corpus simultaneously.   In this paper, we introduce Neural Comprehensive Ranker (NCR) that integrates both passage ranking and answer extraction in one single framework. A Q&A system based on this framework allows users to issue an open-domain question without needing to provide a piece of text that must contain the answer. 
Experiments show that the unified NCR model is able to outperform the state of the art in both retrieval of relevant text and answer extraction.\nThe ability to deploy neural networks in real-world, safety-critical systems is severely limited by the presence of adversarial examples: slightly perturbed inputs that are misclassified by the network. In recent years, several techniques have been proposed for increasing robustness to adversarial examples --- and yet most of these have been quickly shown to be vulnerable to future attacks. For example, over half of the defenses proposed by papers accepted at ICLR 2018 have already been broken. We propose to address this difficulty through formal verification techniques. We show how to construct provably minimally distorted adversarial examples: given an arbitrary neural network and input sample, we can construct adversarial examples which we prove are of minimal distortion. Using this approach, we demonstrate that one of the recent ICLR defense proposals, adversarial retraining, provably succeeds at increasing the distortion required to construct adversarial examples by a factor of 4.2.\nWe present a multi-modal dialogue system for interactive learning of perceptually grounded word meanings from a human tutor. The system integrates an incremental, semantic parsing/generation framework - Dynamic Syntax and Type Theory with Records (DS-TTR) - with a set of visual classifiers that are learned throughout the interaction and which ground the meaning representations that it produces. We use this system in interaction with a simulated human tutor to study the effects of different dialogue policies and capabilities on the accuracy of learned meanings, learning rates, and efforts/costs to the tutor. 
We show that the overall performance of the learning agent is affected by (1) who takes initiative in the dialogues; (2) the ability to express/use their confidence level about visual attributes; and (3) the ability to process elliptical and incrementally constructed dialogue turns. Ultimately, we train an adaptive dialogue policy which optimises the trade-off between classifier accuracy and tutoring costs.\nWe present a novel framework for the automatic discovery and recognition of human motion primitives from motion capture data. Human motion primitives are discovered by optimizing the 'motion flux', a quantity which depends on the motion of a group of skeletal joints. Models of each primitive category are computed via non-parametric Bayes methods and recognition is performed based on their geometric properties. A normalization of the primitives is proposed in order to make them invariant with respect to anatomical variations and data sampling rate. Using our framework we build a publicly available dataset of human motion primitives based on motion capture sequences taken from well-known datasets. We expect that our framework, by providing an objective way for discovering and categorizing human motion, will be a useful tool in numerous research fields related to Robotics including human-inspired motion generation, learning by demonstration, and intuitive human-robot interaction.\nExecution monitoring of high-level robot actions can be effectively improved by visually monitoring the state of the world in terms of preconditions and postconditions that hold before and after the execution of an action. Furthermore, a policy for deciding where to look, either for verifying the relations that specify the pre and postconditions or to refocus in case of a failure, can tremendously improve the robot execution in an uncharted environment. 
Thanks to the remarkable results of deep learning, it is now possible to rely strongly on visual perception and assume that the environment is observable. In this work we present visual execution monitoring for a robot executing tasks in an uncharted Lab environment. The execution monitor interacts with the environment via a visual stream that uses two DCNNs for recognizing the objects the robot has to deal with and manipulate, and a non-parametric Bayes estimation to discover the relations out of the DCNN features. To recover from lack of focus and failures due to missed objects we resort to visual search policies via deep reinforcement learning.\nIn this paper we study the personalized text search problem. The keyword-based search method in conventional algorithms has a low efficiency in understanding users' intentions since the semantic meaning, user profile, and user interests are not always considered. Firstly, we propose a novel text search algorithm using an inverse filtering mechanism that is very efficient for label-based item search. Secondly, we adopt the Bayesian network to implement the user interest prediction for an improved personalized search. According to user input, it searches the related items using keyword information and predicted user interest. Thirdly, the word vectorization is used to discover potential targets according to the semantic meaning. Experimental results show that the proposed search engine has an improved efficiency and accuracy and it can operate on embedded devices with very limited computational resources.\nDeep reinforcement learning for multi-agent cooperation and competition has been a hot topic recently. This paper focuses on the cooperative multi-agent problem based on actor-critic methods under local observation settings. Multi-agent deep deterministic policy gradient obtained state-of-the-art results for some multi-agent games; however, it does not scale well with a growing number of agents. 
In order to boost scalability, we propose a parameter-sharing deterministic policy gradient method with three variants based on neural networks: actor-critic sharing, actor sharing, and actor sharing with a partially shared critic. Benchmarks from rllab show that the proposed method has advantages in learning speed and memory efficiency, scales well with a growing number of agents, and can make full use of reward sharing and exchangeability where possible.\nSelecting an optimal event representation is essential for event classification in real-world contexts. In this paper, we investigate the application of qualitative spatial reasoning (QSR) frameworks for classification of human-object interaction in three-dimensional space, in comparison with the use of quantitative feature extraction approaches for the same purpose. In particular, we modify QSRLib, a library that allows computation of Qualitative Spatial Relations and Calculi, and employ it for feature extraction, before inputting features into our neural network models. Using an experimental setup involving motion captures of human-object interaction as three-dimensional inputs, we observe that the use of qualitative spatial features significantly improves the performance of our machine learning algorithm against our baseline, while quantitative features of similar kinds fail to deliver similar improvement. We also observe that sequential representations of QSR features yield the best classification performance. A result of our learning method is a simple approach to the qualitative representation of 3D activities as compositions of 2D actions that can be visualized and learned using 2-dimensional QSR.\nWe examine the problem of learning and planning on high-dimensional domains with long horizons and sparse rewards. Recent approaches have shown great successes in many Atari 2600 domains. 
However, domains with long horizons and sparse rewards, such as Montezuma's Revenge and Venture, remain challenging for existing methods. Methods using abstraction (Dietterich 2000; Sutton, Precup, and Singh 1999) have been shown to be useful in tackling long-horizon problems. We combine recent techniques of deep reinforcement learning with existing model-based approaches using an expert-provided state abstraction. We construct toy domains that elucidate the problem of long horizons, sparse rewards and high-dimensional inputs, and show that our algorithm significantly outperforms previous methods on these domains. Our abstraction-based approach outperforms Deep Q-Networks (Mnih et al. 2015) on Montezuma's Revenge and Venture, and exhibits backtracking behavior that is absent from previous methods.\nSpeech recognition is largely taking advantage of deep learning, showing that substantial benefits can be obtained by modern Recurrent Neural Networks (RNNs). The most popular RNNs are Long Short-Term Memory networks (LSTMs), which typically reach state-of-the-art performance in many tasks thanks to their ability to learn long-term dependencies and robustness to vanishing gradients. Nevertheless, LSTMs have a rather complex design with three multiplicative gates, which might impair their efficient implementation. An attempt to simplify LSTMs has recently led to Gated Recurrent Units (GRUs), which are based on just two multiplicative gates. This paper builds on these efforts by further revising GRUs and proposing a simplified architecture potentially more suitable for speech recognition. The contribution of this work is two-fold. First, we suggest removing the reset gate in the GRU design, resulting in a more efficient single-gate architecture. Second, we propose replacing tanh with ReLU activations in the state update equations. 
Results show that, in our implementation, the revised architecture reduces the per-epoch training time by more than 30% and consistently improves recognition performance across different tasks, input features, and noisy conditions when compared to a standard GRU.\nPartially observable Markov decision processes (POMDPs) are widely used in probabilistic planning problems in which an agent interacts with an environment using noisy and imprecise sensors. We study a setting in which the sensors are only partially defined and the goal is to synthesize \"weakest\" additional sensors, such that in the resulting POMDP, there is a small-memory policy for the agent that almost surely (with probability 1) satisfies a reachability objective. We show that the problem is NP-complete, and present a symbolic algorithm by encoding the problem into SAT instances. We illustrate trade-offs between the amount of memory of the policy and the number of additional sensors on a simple example. We have implemented our approach and consider three classical POMDP examples from the literature, and show that in all the examples the number of sensors can be significantly decreased (as compared to the existing solutions in the literature) without increasing the complexity of the policies.\nAutomatic feature learning algorithms are at the forefront of modern-day machine learning research. We present a novel algorithm, supervised Q-walk, which applies Q-learning to generate random walks on graphs such that the walks prove to be useful for learning node features suitable for tackling the node classification problem. We present another novel algorithm, the k-hops neighborhood-based confidence values learner, which learns confidence values of labels for unlabelled nodes in the network without first learning the node embedding. These confidence values aid in learning an apt reward function for Q-learning. 
We demonstrate the efficacy of the supervised Q-walk approach over existing state-of-the-art random-walk-based node embedding learners in solving the single-/multi-label multi-class node classification problem using several real-world datasets. In summary, our approach represents a novel state-of-the-art technique for learning features of nodes in networks, tailor-made for the node classification problem.\nDeep Neural Networks (DNNs) require very large amounts of computation both for training and for inference when deployed in the field. Many different algorithms have been proposed to implement the most computationally expensive layers of DNNs. Further, each of these algorithms has a large number of variants, which offer different trade-offs of parallelism, data locality, memory footprint, and execution time. In addition, specific algorithms operate much more efficiently on specialized data layouts and formats. We state the problem of optimal primitive selection in the presence of data format transformations, and show that it is NP-hard by demonstrating an embedding in the Partitioned Boolean Quadratic Assignment problem (PBQP). We propose an analytic solution via a PBQP solver, and evaluate our approach experimentally by optimizing several popular DNNs using a library of more than 70 DNN primitives, on an embedded platform and a general-purpose platform. We show experimentally that significant gains are possible versus state-of-the-art vendor libraries by using a principled analytic solution to the problem of layout selection in the presence of data format transformations.\nLow-dimensional embeddings that capture the main variations of interest in collections of data are important for many applications. One way to construct these embeddings is to acquire estimates of similarity from the crowd. However, similarity is a multi-dimensional concept that varies from individual to individual. 
Existing models for learning embeddings from the crowd typically make simplifying assumptions, such as that all individuals estimate similarity using the same criteria, that the list of criteria is known in advance, or that the crowd workers are not influenced by the data that they see. To overcome these limitations, we introduce Context Embedding Networks (CENs). In addition to learning interpretable embeddings from images, CENs also model worker biases for different attributes along with the visual context, i.e., the visual attributes highlighted by a set of images. Experiments on two noisy crowd-annotated datasets show that modeling both worker bias and visual context results in more interpretable embeddings compared to existing approaches.\nIn this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the ideas of few-shot learning from demonstration and neural program induction. NTP takes as input a task specification (e.g., video demonstration of a task) and recursively decomposes it into finer sub-task specifications. These specifications are fed to a hierarchical neural program, where bottom-level programs are callable subroutines that interact with the environment. We validate our method in three robot manipulation tasks. NTP achieves strong generalization across sequential tasks that exhibit hierarchical and compositional structures. The experimental results show that NTP learns to generalize well towards unseen tasks with increasing lengths, variable topologies, and changing objectives.\nA current challenge for data management systems is to support the construction and maintenance of machine learning models over data that is large, multi-dimensional, and evolving. While systems that could support these tasks are emerging, the need to scale to distributed, streaming data requires new models and algorithms. 
In this setting, in addition to computational scalability and model accuracy, we also need to minimize the amount of communication between distributed processors, which is the chief component of latency. We study Bayesian networks, the workhorse of graphical models, and present a communication-efficient method for continuously learning and maintaining a Bayesian network model over data that is arriving as a distributed stream partitioned across multiple processors. We show a strategy for maintaining model parameters that leads to an exponential reduction in communication when compared with baseline approaches to maintain the exact MLE (maximum likelihood estimation). At the same time, our strategy provides similar prediction errors for the target distribution and for classification tasks.\nLifted Relational Neural Networks (LRNNs) describe relational domains using weighted first-order rules which act as templates for constructing feed-forward neural networks. While previous work has shown that using LRNNs can lead to state-of-the-art results in various ILP tasks, these results depended on hand-crafted rules. In this paper, we extend the framework of LRNNs with structure learning, thus enabling a fully automated learning process. Similarly to many ILP methods, our structure learning algorithm proceeds in an iterative fashion by searching top-down through the hypothesis space of all possible Horn clauses, considering the predicates that occur in the training examples as well as invented soft concepts entailed by the best weighted rules found so far. In the experiments, we demonstrate the ability to automatically induce useful hierarchical soft concepts leading to deep LRNNs with competitive predictive power.\nIn this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior. 
First, we decompose the learning of VAEs into layerwise density estimation, and argue that having a flexible prior is beneficial to both sample generation and inference. Second, we analyze the family of inverse autoregressive flows (inverse AF) and show that with further improvement, inverse AF could be used as a universal approximator of any complicated posterior. Our analysis results in a unified approach to parameterizing a VAE, without the need to restrict ourselves to factorial Gaussians in the latent real space.\nRecurrent neural networks have shown remarkable success in modeling sequences. However, low-resource situations still adversely affect the generalizability of these models. We introduce a new family of models, called Lattice Recurrent Units (LRU), to address the challenge of learning deep multi-layer recurrent models with limited resources. LRU models achieve this goal by creating distinct (but coupled) flows of information inside the units: a first flow along the time dimension and a second flow along the depth dimension. It also offers a symmetry in how information can flow horizontally and vertically. We analyze the effects of decoupling three different components of our LRU model: Reset Gate, Update Gate and Projected State. We evaluate this family of new LRU models on computational convergence rates and statistical efficiency. Our experiments are performed on four publicly available datasets, comparing with Grid-LSTM and Recurrent Highway networks. Our results show that LRU has better empirical computational convergence rates and statistical efficiency values, along with learning more accurate language models.\nWe present a hybrid neural network and rule-based system that generates pop music. Music produced by pure rule-based systems often sounds mechanical. Music produced by machine learning sounds better, but still lacks hierarchical temporal structure. 
We restore temporal hierarchy by augmenting machine learning with a temporal production grammar, which generates the music's overall structure and chord progressions. A compatible melody is then generated by a conditional variational recurrent autoencoder. The autoencoder is trained with eight-measure segments from a corpus of 10,000 MIDI files, each of which has had its melody track and chord progressions identified heuristically. The autoencoder maps melody into a multi-dimensional feature space, conditioned by the underlying chord progression. A melody is then generated by feeding a random sample from that space to the autoencoder's decoder, along with the chord progression generated by the grammar. The autoencoder can make musically plausible variations on an existing melody, suitable for recurring motifs. It can also reharmonize a melody to a new chord progression, keeping the rhythm and contour. The generated music compares favorably with that generated by other academic and commercial software designed for the music-as-a-service industry.\nThe deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance.\nWe present an approach for mobile robots to learn to navigate in dynamic environments with pedestrians via raw depth inputs, in a socially compliant manner. To achieve this, we adopt a generative adversarial imitation learning (GAIL) strategy, which improves upon a pre-trained behavior cloning policy. 
Our approach overcomes the disadvantages of previous methods, which heavily depend on full knowledge of the location and velocity of nearby pedestrians; this not only requires specific sensors, but extracting such state information from raw sensory input can also consume considerable computation time. In this paper, our proposed GAIL-based model operates directly on raw depth inputs and plans in real time. Experiments show that our GAIL-based approach greatly improves the safety and efficiency of the behavior of mobile robots compared with pure behavior cloning. The real-world deployment also shows that our method is capable of guiding autonomous vehicles to navigate in a socially compliant manner directly through raw depth inputs. In addition, we release a simulation plugin for modeling pedestrian behaviors based on the social force model.\nDigital image segmentation is the process of assigning distinct labels to different objects in a digital image, and the fuzzy segmentation algorithm has been successfully used in the segmentation of images from a wide variety of sources. However, the traditional fuzzy segmentation algorithm fails to segment objects that are characterized by textures whose patterns cannot be successfully described by simple statistics computed over a very restricted area. In this paper, we propose an extension of the fuzzy segmentation algorithm that uses adaptive textural affinity functions to perform the segmentation of such objects on bidimensional images. The adaptive affinity functions compute their appropriate neighborhood size as they compute the texture descriptors surrounding the seed spels (spatial elements), according to the characteristics of the texture being processed. The algorithm then segments the image with an appropriate neighborhood for each object. 
We performed experiments on mosaic images that were composed using images from the Brodatz database, and compared our results with the ones produced by a recently published texture segmentation algorithm, showing the applicability of our method.\nIn this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value-of-information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to an optimal regret that is logarithmic with respect to the number of episodes.\nThis paper presents a learning algorithm based on the Recurrent Network-based Deterministic Policy Gradient. Long Short-Term Memory is utilized to support the Partially Observed Markov Decision Process framework. The novelty lies in improvements to LSTM networks: multi-step temporal difference updates, removal of backpropagation through time on the actor, initialisation of the hidden state by scanning past trajectories, and injection of external experiences learned by other agents. Our methods help the reinforcement learning agent infer the desirable action by referring to the trajectories of both past observations and actions. The proposed algorithm was implemented to solve the Bipedal-Walker challenge in the OpenAI virtual environment, where only partial state information is available. 
The validation on the extremely rugged terrain demonstrates the effectiveness of the proposed algorithm by achieving a new record of highest rewards in the challenge. The autonomous behaviors generated by our agent are highly adaptive to a variety of obstacles as shown in the simulation results.\nMachine learning algorithms for prediction are increasingly being used in critical decisions affecting human lives. Various fairness formalizations, with no firm consensus yet, are employed to prevent such algorithms from systematically discriminating against people based on certain attributes protected by law. The aim of this article is to survey how fairness is formalized in the machine learning literature for the task of prediction and present these formalizations with their corresponding notions of distributive justice from the social sciences literature. We provide theoretical as well as empirical critiques of these notions from the social sciences literature and explain how these critiques limit the suitability of the corresponding fairness formalizations to certain domains. We also suggest two notions of distributive justice which address some of these critiques and discuss avenues for prospective fairness formalizations.\nIn this paper we propose a function space approach to Representation Learning and the analysis of the representation layers in deep learning architectures. We show how to compute a weak-type Besov smoothness index that quantifies the geometry of the clustering in the feature space. This approach was already applied successfully to improve the performance of machine learning algorithms such as the Random Forest and tree-based Gradient Boosting. Our experiments demonstrate that in well-known and well-performing trained networks, the Besov smoothness of the training set, measured in the corresponding hidden layer feature map representation, increases from layer to layer. 
We also contribute to the understanding of generalization by showing how the Besov smoothness of the representations decreases as we add more mislabeling to the training data. We hope this approach will contribute to the demystification of some aspects of deep learning.\nMany applications infer the structure of a probabilistic graphical model from data to elucidate the relationships between variables. But how can we train graphical models on a massive data set? In this paper, we show how to construct coresets (compressed data sets which can be used as a proxy for the original data and have provably bounded worst-case error) for Gaussian dependency networks (DNs), i.e., cyclic directed graphical models over Gaussians, where the parents of each variable are its Markov blanket. Specifically, we prove that Gaussian DNs admit coresets of size independent of the size of the data set. Unfortunately, this does not extend to DNs over members of the exponential family in general. As we will prove, Poisson DNs do not admit small coresets. Despite this worst-case result, we will provide an argument for why our coreset construction for DNs can still work well in practice on count data. To corroborate our theoretical results, we empirically evaluated the resulting Core DNs on real data sets. The results\nNatural language place descriptions in everyday communication provide a rich source of spatial knowledge about places. An important step in utilizing such knowledge in information systems is geo-referencing all the places referred to in these descriptions. Current techniques for geo-referencing places from text documents use place name recognition and disambiguation; however, place descriptions often contain place references that are not known by gazetteers, or that are expressed in other, more flexible ways. 
Hence, the approach for geo-referencing presented in this paper starts from a place graph that contains the place references as well as spatial relationships extracted from place descriptions. Spatial relationships are used to constrain the locations of places and enable the subsequent best-matching process for geo-referencing. The novel geo-referencing process results in higher precision and recall compared to state-of-the-art toponym resolution approaches on several tested place description datasets.\nReasoning about causes and effects naturally arises in the engineering of safety-critical systems. A classical example is Fault Tree Analysis, a deductive technique used for system safety assessment, whereby an undesired state is reduced to the set of its immediate causes. The design of fault management systems also requires reasoning about causality relationships. In particular, a fail-operational system needs to ensure timely detection and identification of faults, i.e., recognize the occurrence of run-time faults through their observable effects on the system. Even more complex scenarios arise when multiple faults are involved and may interact in subtle ways. In this work, we propose a formal approach to fault management for complex systems. We first introduce the notions of fault tree and minimal cut sets. We then present a formal framework for the specification and analysis of diagnosability, and for the design of fault detection and identification (FDI) components. Finally, we review recent advances in fault propagation analysis, based on the Timed Failure Propagation Graphs (TFPG) formalism.\nHow do we determine the mutational effects in exome sequencing data with little or no statistical evidence? Can protein structural information fill in the gap of not having enough statistical evidence? In this work, we answer these two questions with the goal of determining pathogenic effects of rare variants in rare diseases. 
We take the approach of determining the importance of point mutation loci, focusing on protein structure features. The proposed structure-based features encode geometric, physicochemical, and functional information about mutation loci and their structural neighbors. The structure-based features, trained on 80% of HumDiv and tested on the remaining 20% of HumDiv and on the ClinVar dataset, showed high levels of discernibility between pathogenic and benign mutation effects: F scores of 0.71 and 0.68, respectively, using a multi-layer perceptron. Combining structure- and sequence-based features further improves the accuracy: F scores of 0.86 (HumDiv) and 0.75 (ClinVar). Also, careful examination of the rare variants in rare disease cases showed that structure-based features are important in discerning the importance of variant loci.\nIn this paper, we propose a novel end-to-end neural architecture for ranking candidate answers, which adapts a hierarchical recurrent neural network and a latent topic clustering module. With our proposed model, a text is encoded to a vector representation from the word level to the chunk level to effectively capture the entire meaning. In particular, by adapting the hierarchical structure, our model shows very small performance degradations in longer text comprehension, whereas other state-of-the-art recurrent neural network models suffer from such degradation. Additionally, the latent topic clustering module extracts semantic information from target samples. This clustering module is useful for any text-related task by allowing each data sample to find its nearest topic cluster, thus helping the neural network model analyze the entire data. We evaluate our models on the Ubuntu Dialogue Corpus and a consumer electronics domain question answering dataset related to Samsung products. 
The proposed model shows state-of-the-art results for ranking question-answer pairs.\nIn several domains, obtaining class annotations is expensive while at the same time unlabelled data are abundant. While most semi-supervised approaches enforce restrictive assumptions on the data distribution, recent work has managed to learn semi-supervised models in a non-restrictive regime. However, so far such approaches have only been proposed for linear models. In this work, we introduce semi-supervised parameter learning for Sum-Product Networks (SPNs). SPNs are deep probabilistic models admitting inference in linear time in the number of network edges. Our approach has several advantages, as it (1) allows generative and discriminative semi-supervised learning, (2) guarantees that adding unlabelled data can increase, but not degrade, the performance (safe), and (3) is computationally efficient and does not enforce restrictive assumptions on the data distribution. We show on a variety of data sets that safe semi-supervised learning with SPNs is competitive with the state of the art and can lead to a better generative and discriminative objective value than a purely supervised approach.\nThis work addresses the inverse reinforcement learning (IRL) problem where only a small number of demonstrations, insufficient to estimate an accurate reward function, are available from a demonstrator for each high-dimensional task. Observing that each demonstrator has an inherent reward for each state and the task-specific behaviors mainly depend on a small number of key states, we propose a meta IRL algorithm that first models the reward function for each task as a distribution conditioned on a baseline reward function shared by all tasks and dependent only on the demonstrator, and then finds the most likely reward function in the distribution that explains the task-specific behaviors. 
We test the method in a simulated environment on path-planning tasks with limited demonstrations, and show that the accuracy of the learned reward function is significantly improved. We also apply the method to analyze the motion of a patient undergoing rehabilitation.\nReinforcement learning algorithms can train agents that solve problems in complex, interesting environments. Normally, the complexity of the trained agent is closely related to the complexity of the environment. This suggests that a highly capable agent requires a complex environment for training. In this paper, we point out that a competitive multi-agent environment trained with self-play can produce behaviors that are far more complex than the environment itself. We also point out that such environments come with a natural curriculum, because for any skill level, an environment full of agents of this level will have the right level of difficulty. This work introduces several competitive multi-agent environments where agents compete in a 3D world with simulated physics. The trained agents learn a wide variety of complex and interesting skills, even though the environments themselves are relatively simple. The skills include behaviors such as running, blocking, ducking, tackling, fooling opponents, kicking, and defending using both arms and legs. A highlight of the learned behaviors can be found here: https://goo.gl/eR7fbX\nThe recent breakthroughs of the deep reinforcement learning (DRL) technique in AlphaGo and Atari game playing have set a good example in handling the large state and action spaces of complicated control problems. The DRL technique is composed of (i) an offline deep neural network (DNN) construction phase, which derives the correlation between each state-action pair of the system and its value function, and (ii) an online deep Q-learning phase, which adaptively derives the optimal action and updates value estimates. 
In this paper, we first present the general DRL framework, which can be widely utilized in many applications with different optimization objectives. This is followed by the introduction of three specific applications: the cloud computing resource allocation problem, the residential smart grid task scheduling problem, and the building HVAC system optimal control problem. The effectiveness of the DRL technique in these three cyber-physical applications has been validated. Finally, this paper investigates stochastic-computing-based hardware implementations of the DRL framework, which achieve a significant improvement in area efficiency and power consumption compared with their binary-based counterparts.\nSteering a car through traffic is a complex task that is difficult to cast into algorithms. Therefore, researchers turn to training artificial neural networks from front-facing camera data streams along with the associated steering angles. Nevertheless, most existing solutions consider only the visual camera frames as input, thus ignoring the temporal relationship between frames. In this work, we propose a Convolutional Long Short-Term Memory Recurrent Neural Network (C-LSTM) that is end-to-end trainable, to learn both visual and dynamic temporal dependencies of driving. Additionally, we propose posing the steering angle regression problem as classification while imposing a spatial relationship between the output layer neurons. This method is based on learning a sinusoidal function that encodes steering angles. To train and validate our proposed methods, we used the publicly available Comma.ai dataset. Our solution improved steering root mean square error by 35% over recent methods, and led to 87% more stable steering.\nWe propose a deep semantic characterization of space and motion categorically from the viewpoint of grounding embodied human-object interactions. 
Our key focus is on an ontological model that would be adept to formalisation from the viewpoint of commonsense knowledge representation, relational learning, and qualitative reasoning about space and motion in cognitive robotics settings. We demonstrate key aspects of the space & motion ontology and its formalization as a representational framework in the backdrop of select examples from a dataset of everyday activities. Furthermore, focussing on human-object interaction data obtained from RGBD sensors, we also illustrate how declarative (spatio-temporal) reasoning in the (constraint) logic programming family may be performed with the developed deep semantic abstractions.\nIn this paper we demonstrate a new algorithm for sparse prestack azimuthal AVO inversion. A novel Euclidean prior model is developed to at once respect sparseness in the layered earth and smoothness in the model of reflectivity. Recognizing that methods of artificial intelligence and Bayesian computation are finding an every increasing role in augmenting the process of interpretation and analysis of geophysical data, we derive a generalized matrix-variate model of reflectivity in terms of orthogonal basis functions, subject to sparse constraints. This supports a direct application of machine learning methods, in a way that can be mapped back onto the physical principles known to govern reflection seismology. As a demonstration we present an application of these methods to the Marcellus shale. Attributes extracted using the azimuthal inversion are clustered using an unsupervised learning algorithm. Interpretation of the clusters is performed in the context of the Ruger model of azimuthal AVO.\nWe present a novel formalization of counterfactual conditionals in a quantified modal logic. Counterfactual conditionals play a vital role in ethical and moral reasoning. 
Prior work has shown that moral reasoning systems (and more generally, theory-of-mind reasoning systems) should be at least as expressive as first-order (quantified) modal logic (QML) to be well-behaved. While existing work on moral reasoning has focused on counterfactual-free QML moral reasoning, we present a fully specified and implemented formal system that includes counterfactual conditionals. We validate our model with two projects. In the first project, we demonstrate that our system can be used to model a complex moral principle, the doctrine of double effect. In the second project, we use the system to build a data-set with true and false counterfactuals as licensed by our theory, which we believe can be useful for other researchers. This project also shows that our model can be computationally feasible.\nWe present Synkhronos, an extension to Theano for multi-GPU computations leveraging data parallelism. Our framework provides automated execution and synchronization across devices, allowing users to continue to write serial programs without risk of race conditions. The NVIDIA Collective Communication Library is used for high-bandwidth inter-GPU communication. Further enhancements to the Theano function interface include input slicing (with aggregation) and input indexing, which perform common data-parallel computation patterns efficiently. One example use case is synchronous SGD, which has recently been shown to scale well for a growing set of deep learning problems. When training ResNet-50, we achieve a near-linear speedup of 7.5x on an NVIDIA DGX-1 using 8 GPUs, relative to Theano-only code running a single GPU in isolation. Yet Synkhronos remains general to any data-parallel computation programmable in Theano. 
By implementing parallelism at the level of individual Theano functions, our framework uniquely addresses a niche between manual multi-device programming and prescribed multi-GPU training routines.\nMachine learning, the core of artificial intelligence and big data science, is one of today's most rapidly growing interdisciplinary fields. Recently, its tools and techniques have been adopted to tackle intricate quantum many-body problems. In this work, we introduce machine learning techniques to the detection of quantum nonlocality in many-body systems, with a focus on the restricted-Boltzmann-machine (RBM) architecture. Using reinforcement learning, we demonstrate that RBM is capable of finding the maximum quantum violations of multipartite Bell inequalities with given measurement settings. Our results build a novel bridge between computer-science-based machine learning and quantum many-body nonlocality, which will benefit future studies in both areas.\nWe propose Marve, a system for extracting measurement values, units, and related words from natural language text. Marve uses conditional random fields (CRF) to identify measurement values and units, followed by a rule-based system to find related entities, descriptors and modifiers within a sentence. Sentence tokens are represented by an undirected graphical model, and rules are based on part-of-speech and word dependency patterns connecting values and units to contextual words. Marve is unique in its focus on measurement context and early experimentation demonstrates Marve's ability to generate high-precision extractions with strong recall. We also discuss Marve's role in refining measurement requirements for NASA's proposed HyspIRI mission, a hyperspectral infrared imaging satellite that will study the world's ecosystems. In general, our work with HyspIRI demonstrates the value of semantic measurement extractions in characterizing quantitative discussion contained in large corpuses of natural language text. 
These extractions accelerate broad, cross-cutting research and expose scientists new algorithmic approaches and experimental nuances. They also facilitate identification of scientific opportunities enabled by HyspIRI leading to more efficient scientific investment and research.\nSentence vectors represent an appealing approach to meaning: learn an embedding that encompasses the meaning of a sentence in a single vector, that can be used for a variety of semantic tasks. Existing models for learning sentence embeddings either require extensive computational resources to train on large corpora, or are trained on costly, manually curated datasets of sentence relations. We observe that humans naturally annotate the relations between their sentences with discourse markers like \"but\" and \"because\". These words are deeply linked to the meanings of the sentences they connect. Using this natural signal, we automatically collect a classification dataset from unannotated text. Training a model to predict these discourse markers yields high quality sentence embeddings. Our model captures complementary information to existing models and achieves comparable generalization performance to state of the art models.\nIn practical analysis, domain knowledge about analysis target has often been accumulated, although, typically, such knowledge has been discarded in the statistical analysis stage, and the statistical tool has been applied as a black box. In this paper, we introduce sign constraints that are a handy and simple representation for non-experts in generic learning problems. We have developed two new optimization algorithms for the sign-constrained regularized loss minimization, called the sign-constrained Pegasos (SC-Pega) and the sign-constrained SDCA (SC-SDCA), by simply inserting the sign correction step into the original Pegasos and SDCA, respectively. 
We present theoretical analyses that guarantee that insertion of the sign correction step does not degrade the convergence rate for both algorithms. Two applications, where the sign-constrained learning is effective, are presented. The one is exploitation of prior information about correlation between explanatory variables and a target variable. The other is introduction of the sign-constrained to SVM-Pairwise method. Experimental results demonstrate significant improvement of generalization performance by introducing sign constraints in both applications.\nBayesian inference for models that have an intractable partition function is known as a doubly intractable problem, where standard Monte Carlo methods are not applicable. The past decade has seen the development of auxiliary variable Monte Carlo techniques (M{\\o}ller et al., 2006; Murray et al., 2006) for tackling this problem; these approaches being members of the more general class of pseudo-marginal, or exact-approximate, Monte Carlo algorithms (Andrieu and Roberts, 2009), which make use of unbiased estimates of intractable posteriors. Everitt et al. (2017) investigated the use of exact-approximate importance sampling (IS) and sequential Monte Carlo (SMC) in doubly intractable problems, but focussed only on SMC algorithms that used data-point tempering. This paper describes SMC samplers that may use alternative sequences of distributions, and describes ways in which likelihood estimates may be improved adaptively as the algorithm progresses, building on ideas from Moores et al. (2015). This approach is compared with a number of alternative algorithms for doubly intractable problems, including approximate Bayesian computation (ABC), which we show is closely related to the method of M{\\o}ller et al. (2006).\nUnderstanding driving behaviors is essential for improving safety and mobility of our transportation systems. Data is usually collected via simulator-based studies or naturalistic driving studies. 
Those techniques allow for understanding relations between demographics, road conditions and safety. On the other hand, they are very costly and time consuming. Thanks to the ubiquity of smartphones, we have an opportunity to substantially complement more traditional data collection techniques with data extracted from phone sensors, such as GPS, accelerometer gyroscope and camera. We developed statistical models that provided insight into driver behavior in the San Francisco metro area based on tens of thousands of driver logs. We used novel data sources to support our work. We used cell phone sensor data drawn from five hundred drivers in San Francisco to understand the speed of traffic across the city as well as the maneuvers of drivers in different areas. Specifically, we clustered drivers based on their driving behavior. We looked at driver norms by street and flagged driving behaviors that deviated from the norm.\nIn Crowdfunding platforms, people turn their prototype ideas into real products by raising money from the crowd, or invest in someone else's projects. In reward-based crowdfunding platforms such as Kickstarter and Indiegogo, selecting accurate reward delivery duration becomes crucial for creators, backers, and platform providers to keep the trust between the creators and the backers, and the trust between the platform providers and users. According to Kickstarter, 35% backers did not receive rewards on time. Unfortunately, little is known about on-time and late reward delivery projects, and there is no prior work to estimate reward delivery duration. To fill the gap, in this paper, we (i) extract novel features that reveal latent difficulty levels of project rewards; (ii) build predictive models to identify whether a creator will deliver all rewards in a project on time or not; and (iii) build a regression model to estimate accurate reward delivery duration (i.e., how long it will take to produce and deliver all the rewards). 
Experimental results show that our models achieve good performance -- 82.5% accuracy, 78.1 RMSE, and 0.108 NRMSE at the first 5% of the longest reward delivery duration.\nRecent developments within memory-augmented neural networks have solved sequential problems requiring long-term memory, which are intractable for traditional neural networks. However, current approaches still struggle to scale to large memory sizes and sequence lengths. In this paper we show how access to memory can be encoded geometrically through a HyperNEAT-based Neural Turing Machine (HyperENTM). We demonstrate that using the indirect HyperNEAT encoding allows for training on small memory vectors in a bit-vector copy task and then applying the knowledge gained from such training to speed up training on larger size memory vectors. Additionally, we demonstrate that in some instances, networks trained to copy bit-vectors of size 9 can be scaled to sizes of 1,000 without further training. While the task in this paper is simple, these results could open up the problems amendable to networks with external memories to problems with larger memory vectors and theoretically unbounded memory sizes.\nWe propose Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork, $h$, is a neural network which learns to transform a simple noise distribution, $p(\\epsilon) = \\mathcal{N}(0,I)$, to a distribution $q(\\theta) \\doteq q(h(\\epsilon))$ over the parameters $\\theta$ of another neural network (the \"primary network\"). We train $q$ with variational inference, using an invertible $h$ to enable efficient estimation of the variational lower bound on the posterior $p(\\theta | \\mathcal{D})$ via sampling. In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap i.i.d. sampling of $q(\\theta)$. 
We demonstrate these qualitative advantages of Bayesian hypernets, which also achieve competitive performance on a suite of tasks that demonstrate the advantage of estimating model uncertainty, including active learning and anomaly detection.\nWith the recent renaissance of deep convolution neural networks, encouraging breakthroughs have been achieved on the supervised recognition tasks, where each class has sufficient training data and fully annotated training data. However, to scale the recognition to a large number of classes with few or now training samples for each class remains an unsolved problem. One approach to scaling up the recognition is to develop models capable of recognizing unseen categories without any training instances, or zero-shot recognition/ learning. This article provides a comprehensive review of existing zero-shot recognition techniques covering various aspects ranging from representations of models, and from datasets and evaluation settings. We also overview related recognition tasks including one-shot and open set recognition which can be used as natural extensions of zero-shot recognition when limited number of class samples become available or when zero-shot recognition is implemented in a real-world setting. Importantly, we highlight the limitations of existing approaches and point out future research directions in this existing new research area.\nThis paper describes and motivates a new decision theory known as functional decision theory (FDT), as distinct from causal decision theory and evidential decision theory. 
Functional decision theorists hold that the normative principle for action is to treat one's decision as the output of a fixed mathematical function that answers the question, \"Which output of this very function would yield the best outcome?\" Adhering to this principle delivers a number of benefits, including the ability to maximize wealth in an array of traditional decision-theoretic and game-theoretic problems where CDT and EDT perform poorly. Using one simple and coherent decision rule, functional decision theorists (for example) achieve more utility than CDT on Newcomb's problem, more utility than EDT on the smoking lesion problem, and more utility than both in Parfit's hitchhiker problem. In this paper, we define FDT, explore its prescriptions in a number of different decision problems, compare it to CDT and EDT, and give philosophical justifications for FDT as a normative theory of decision-making.\nNetworks are fundamental models for data used in practically every application domain. In most instances, several implicit or explicit choices about the network definition impact the translation of underlying data to a network representation, and the subsequent question(s) about the underlying system being represented. Users of downstream network data may not even be aware of these choices or their impacts. We propose a task-focused network model selection methodology which addresses several key challenges. Our approach constructs network models from underlying data and uses minimum description length (MDL) criteria for selection. Our methodology measures efficiency, a general and comparable measure of the network's performance of a local (i.e. node-level) predictive task of interest. Selection on efficiency favors parsimonious (e.g. sparse) models to avoid overfitting and can be applied across arbitrary tasks and representations. 
We show stability, sensitivity, and significance testing in our methodology.\nWe study learning algorithms that are restricted to using a small amount of information from their input sample. We introduce a category of learning algorithms we term $d$-bit information learners, which are algorithms whose output conveys at most $d$ bits of information of their input. A central theme in this work is that such algorithms generalize.   We focus on the learning capacity of these algorithms, and prove sample complexity bounds with tight dependencies on the confidence and error parameters. We also observe connections with well studied notions such as sample compression schemes, Occam's razor, PAC-Bayes and differential privacy.   We discuss an approach that allows us to prove upper bounds on the amount of information that algorithms reveal about their inputs, and also provide a lower bound by showing a simple concept class for which every (possibly randomized) empirical risk minimizer must reveal a lot of information. On the other hand, we show that in the distribution-dependent setting every VC class has empirical risk minimizers that do not reveal a lot of information.\nWe present the Multi-vAlue Rule Set (MARS) model for interpretable classification with feature efficient presentations. MARS introduces a more generalized form of association rules that allows multiple values in a condition. Rules of this form are more concise than traditional single-valued rules in capturing and describing patterns in data. MARS mitigates the problem of dealing with continuous features and high-cardinality categorical features faced by rule-based models. Our formulation also pursues a higher efficiency of feature utilization, which reduces the cognitive load to understand the decision process. We propose an efficient inference method for learning a maximum a posteriori model, incorporating theoretically grounded bounds to iteratively reduce the search space to improve search efficiency. 
Experiments with synthetic and real-world data demonstrate that MARS models have significantly smaller complexity and fewer features, providing better interpretability while being competitive in predictive accuracy. We conducted a usability study with human subjects and results show that MARS is the easiest to use compared with other competing rule-based models, in terms of the correct rate and response time. Overall, MARS introduces a new approach to rule-based models that balance accuracy and interpretability with feature-efficient representations.\nIn this work, we propose an infinite restricted Boltzmann machine~(RBM), whose maximum likelihood estimation~(MLE) corresponds to a constrained convex optimization. We consider the Frank-Wolfe algorithm to solve the program, which provides a sparse solution that can be interpreted as inserting a hidden unit at each iteration, so that the optimization process takes the form of a sequence of finite models of increasing complexity. As a side benefit, this can be used to easily and efficiently identify an appropriate number of hidden units during the optimization. The resulting model can also be used as an initialization for typical state-of-the-art RBM training algorithms such as contrastive divergence, leading to models with consistently higher test likelihood than random initialization.\nIn his seminal paper that inaugurated abstract argumentation, Dung proved that the set of complete extensions forms a complete semilattice with respect to set inclusion. In this note we demonstrate that this proof is incorrect with counterexamples. We then trace the error in the proof and explain why it arose. We then examine the implications for the grounded extension.   [Reason for withdrawal continued] Page 4, Example 2 is not a counterexample to Dung 1995 Theorem 25(3). It was believed to be a counter-example because the author misunderstood ``glb'' to be set-theoretic intersection. 
But in this case, ``glb'' is defined to be other than set-theoretic intersection such that Theorem 25(3) is true.   The author was motivated to fully understand the lattice-theoretic claims of Dung 1995 in writing this note and was not aware that this issue is probably folklore; the author bears full responsibility for this error.\nPolicy evaluation or value function or Q-function approximation is a key procedure in reinforcement learning (RL). It is a necessary component of policy iteration and can be used for variance reduction in policy gradient methods. Therefore its quality has a significant impact on most RL algorithms. Motivated by manifold regularized learning, we propose a novel kernelized policy evaluation method that takes advantage of the intrinsic geometry of the state space learned from data, in order to achieve better sample efficiency and higher accuracy in Q-function approximation. Applying the proposed method in the Least-Squares Policy Iteration (LSPI) framework, we observe superior performance compared to widely used parametric basis functions on two standard benchmarks in terms of policy quality.\nWe introduce a novel generative model for interpretable subgroup analysis for causal inference applications, Causal Rule Sets (CRS). A CRS model uses a small set of short rules to capture a subgroup where the average treatment effect is elevated compared to the entire population. We present a Bayesian framework for learning a causal rule set. The Bayesian framework consists of a prior that favors simpler models and a Bayesian logistic regression that characterizes the relation between outcomes, attributes and subgroup membership. We find maximum a posteriori models using discrete Monte Carlo steps in the joint solution space of rules sets and parameters. We provide theoretically grounded heuristics and bounding strategies to improve search efficiency. 
Experiments show that the search algorithm can efficiently recover a true underlying subgroup and CRS shows consistently competitive performance compared to other state-of-the-art baseline methods.\nFlow is a new computational framework, built to support a key need triggered by the rapid growth of autonomy in ground traffic: controllers for autonomous vehicles in the presence of complex nonlinear dynamics in traffic. Leveraging recent advances in deep Reinforcement Learning (RL), Flow enables the use of RL methods such as policy gradient for traffic control and enables benchmarking the performance of classical (including hand-designed) controllers with learned policies (control laws). Flow integrates traffic microsimulator SUMO with deep reinforcement learning library rllab and enables the easy design of traffic tasks, including different networks configurations and vehicle dynamics. We use Flow to develop reliable controllers for complex problems, such as controlling mixed-autonomy traffic (involving both autonomous and human-driven vehicles) in a ring road. For this, we first show that state-of-the-art hand-designed controllers excel when in-distribution, but fail to generalize; then, we show that even simple neural network policies can solve the stabilization task across density settings and generalize to out-of-distribution settings.\nWith a direct analysis of neural networks, this paper presents a mathematically tight generalization theory to partially address an open problem regarding the generalization of deep learning. Unlike previous bound-based theory, our main theory is quantitatively as tight as possible for every dataset individually, while producing qualitative insights competitively. Our results give insight into why and how deep learning can generalize well, despite its large capacity, complexity, possible algorithmic instability, nonrobustness, and sharp minima, answering to an open question in the literature. 
We also discuss limitations of our results and propose additional open problems.\nHow can a delivery robot navigate reliably to a destination in a new office building, with minimal prior information? To tackle this challenge, this paper introduces a two-level hierarchical approach, which integrates model-free deep learning and model-based path planning. At the low level, a neural-network motion controller, called the intention-net, is trained end-to-end to provide robust local navigation. The intention-net maps images from a single monocular camera and \"intentions\" directly to robot controls. At the high level, a path planner uses a crude map, e.g., a 2-D floor plan, to compute a path from the robot's current location to the goal. The planned path provides intentions to the intention-net. Preliminary experiments suggest that the learned motion controller is robust against perceptual uncertainty and by integrating with a path planner, it generalizes effectively to new environments and goals.\nProcess mining has emerged as a way to analyze the behavior of an organization by extracting knowledge from event logs and by offering techniques to discover, monitor and enhance real processes. In the discovery of process models, retrieving a complex one, i.e., a hardly readable process model, can hinder the extraction of information. Even in well-structured process models, there is information that cannot be obtained with the current techniques. In this paper, we present WoMine, an algorithm to retrieve frequent behavioural patterns from the model. Our approach searches in process models extracting structures with sequences, selections, parallels and loops, which are frequently executed in the logs. This proposal has been validated with a set of process models, including some from BPI Challenges, and compared with the state of the art techniques. 
Experiments have validated that WoMine can find all types of patterns, extracting information that cannot be mined with the state of the art techniques.\nOptical Character Recognition (OCR) has been a topic of interest for many years. It is defined as the process of digitizing a document image into its constituent characters. Despite decades of intense research, developing OCR with capabilities comparable to that of human still remains an open challenge. Due to this challenging nature, researchers from industry and academic circles have directed their attentions towards Optical Character Recognition. Over the last few years, the number of academic laboratories and companies involved in research on Character Recognition has increased dramatically. This research aims at summarizing the research so far done in the field of OCR. It provides an overview of different aspects of OCR and discusses corresponding proposals aimed at resolving issues of OCR.\nBecause of the increasing availability of spatiotemporal data, a variety of data-analytic applications have become possible. Characterizing driving context, where context may be thought of as a combination of location and time, is a new challenging application. An example of such a characterization is finding the correlation between driving behavior and traffic conditions. This contextual information enables analysts to validate observation-based hypotheses about the driving of an individual. In this paper, we present DriveContext, a novel framework to find the characteristics of a context, by extracting significant driving patterns (e.g., a slow-down), and then identifying the set of potential causes behind patterns (e.g., traffic congestion). Our experimental results confirm the feasibility of the framework in identifying meaningful driving patterns, with improvements in comparison with the state-of-the-art. 
We also demonstrate how the framework derives interesting characteristics for different contexts, through real-world examples.\nWe develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain-shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent's lifetime as it learns to drive in a realistic simulated environment.\nWe present PubMed 200k RCT, a new dataset based on PubMed for sequential sentence classification. The dataset consists of approximately 200,000 abstracts of randomized controlled trials, totaling 2.3 million sentences. Each sentence of each abstract is labeled with their role in the abstract using one of the following classes: background, objective, method, result, or conclusion. The purpose of releasing this dataset is twofold. First, the majority of datasets for sequential short-text classification (i.e., classification of short texts that appear in sequences) are small: we hope that releasing a new large dataset will help develop more accurate algorithms for this task. Second, from an application perspective, researchers need better tools to efficiently skim through the literature. 
Automatically classifying each sentence in an abstract would help researchers read abstracts more efficiently, especially in fields where abstracts may be long, such as the medical field.\nIn this paper, an original heuristic algorithm of empty vehicles management in personal rapid transit network is presented. The algorithm is used for the delivery of empty vehicles for waiting passengers, for balancing the distribution of empty vehicles within the network, and for providing an empty space for vehicles approaching a station. Each of these tasks involves a decision on the trip that has to be done by a selected empty vehicle from its actual location to some determined destination. The decisions are based on a multi-parameter function involving a set of factors and thresholds. An important feature of the algorithm is that it does not use any central database of passenger input (demand) and locations of free vehicles. Instead, it is based on the local exchange of data between stations: on their states and on the vehicles they expect. Therefore, it seems well-tailored for a distributed implementation. The algorithm is uniform, meaning that the same basic procedure is used for multiple tasks using a task-specific set of parameters.\nRecent universal-hashing based approaches to sampling and counting crucially depend on the runtime performance of SAT solvers on formulas expressed as the conjunction of both CNF constraints and variable-width XOR constraints (known as CNF-XOR formulas). In this paper, we present the first study of the runtime behavior of SAT solvers equipped with XOR-reasoning techniques on random CNF-XOR formulas. We empirically demonstrate that a state-of-the-art SAT solver scales exponentially on random CNF-XOR formulas across a wide range of XOR-clause densities, peaking around the empirical phase-transition location. 
On the theoretical front, we prove that the solution space of a random CNF-XOR formula 'shatters' at all nonzero XOR-clause densities into well-separated components, similar to the behavior seen in random CNF formulas known to be difficult for many SAT algorithms.\nMany iterative procedures in stochastic optimization exhibit a transient phase followed by a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in that region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect such phase transition in the context of stochastic gradient descent with constant learning rate. We present theory and experiments suggesting that the region where the proposed diagnostic is activated coincides with the convergence region. For a class of loss functions, we derive a closed-form solution describing such region. Finally, we suggest an application to speed up convergence of stochastic gradient descent by halving the learning rate each time stationarity is detected. This leads to a new variant of stochastic gradient descent, which in many settings is comparable to state-of-art.\nLearning-based approaches to robotic manipulation are limited by the scalability of data collection and accessibility of labels. In this paper, we present a multi-task domain adaptation framework for instance grasping in cluttered scenes by utilizing simulated robot experiments. Our neural network takes monocular RGB images and the instance segmentation mask of a specified target object as inputs, and predicts the probability of successfully grasping the specified object for each candidate motor command. 
The proposed transfer learning framework trains a model for instance grasping in simulation and uses a domain-adversarial loss to transfer the trained model to real robots using indiscriminate grasping data, which is available both in simulation and the real world. We evaluate our model in real-world robot experiments, comparing it with alternative model architectures as well as an indiscriminate grasping baseline.\nMost Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine comprehension methods, but currently there exist no resources to train and test this capability. We propose a novel task to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods. In our task, a model learns to seek and combine evidence - effectively performing multi-hop (alias multi-step) inference. We devise a methodology to produce datasets for this task, given a collection of query-answer pairs and thematically linked documents. Two datasets from different domains are induced, and we identify potential pitfalls and devise circumvention strategies. We evaluate two previously proposed competitive models and find that one can integrate information across documents. However, both models struggle to select relevant information, as providing documents guaranteed to be relevant greatly improves their performance. While the models outperform several strong baselines, their best accuracy reaches 42.9% compared to human performance at 74.0% - leaving ample room for improvement.\nA key challenge in multi-robot and multi-agent systems is generating solutions that are robust to other self-interested or even adversarial parties who actively try to prevent the agents from achieving their goals. 
The practicality of existing works addressing this challenge is limited to only small-scale synchronous decision-making scenarios or a single agent planning its best response against a single adversary with fixed, procedurally characterized strategies. In contrast this paper considers a more realistic class of problems where a team of asynchronous agents with limited observation and communication capabilities need to compete against multiple strategic adversaries with changing strategies. This problem necessitates agents that can coordinate to detect changes in adversary strategies and plan the best response accordingly. Our approach first optimizes a set of stratagems that represent these best responses. These optimized stratagems are then integrated into a unified policy that can detect and respond when the adversaries change their strategies. The near-optimality of the proposed framework is established theoretically as well as demonstrated empirically in simulation and hardware.\nDeep reinforcement learning (RL) has proven a powerful technique in many sequential decision making domains. However, Robotics poses many challenges for RL, most notably training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. While several recent works have shown promising results in transferring policies trained in simulation to the real world, they often do not fully utilize the advantage of working with a simulator. In this work, we exploit the full state observability in the simulator to train better policies which take as input only partial observations (RGBD images). We do this by employing an actor-critic training algorithm in which the critic is trained on full states while the actor (or policy) gets rendered images as input. We show experimentally on a range of simulated tasks that using these asymmetric inputs significantly improves performance. 
Finally, we combine this method with domain randomization and show real robot experiments for several tasks like picking, pushing, and moving a block. We achieve this simulation to real world transfer without training on any real world data.\nExperience replay is a key technique behind many recent advances in deep reinforcement learning. Allowing the agent to learn from earlier memories can speed up learning and break undesirable temporal correlations. Despite its wide-spread application, very little is understood about the properties of experience replay. How does the amount of memory kept affect learning dynamics? Does it help to prioritize certain experiences? In this paper, we address these questions by formulating a dynamical systems ODE model of Q-learning with experience replay. We derive analytic solutions of the ODE for a simple setting. We show that even in this very simple setting, the amount of memory kept can substantially affect the agent's performance. Too much or too little memory both slow down learning. Moreover, we characterize regimes where prioritized replay harms the agent's learning. We show that our analytic solutions have excellent agreement with experiments. Finally, we propose a simple algorithm for adaptively changing the memory buffer size which achieves consistently good empirical performance.\nWe tackle highly nonconvex, nonsmooth composite optimization problems whose objectives comprise a Moreau-Yosida regularized term. Classical nonconvex proximal splitting algorithms, such as nonconvex ADMM, suffer from lack of convergence for such a problem class. To overcome this difficulty, in this work we consider a lifted variant of the Moreau-Yosida regularized model and propose a novel multiblock primal-dual algorithm that intrinsically stabilizes the dual block. 
We provide a complete convergence analysis of our algorithm and identify respective optimality qualifications under which stationarity of the original model is retrieved at convergence. Numerically, we demonstrate the relevance of Moreau-Yosida regularized models and the efficiency of our algorithm on robust regression as well as joint feature selection and semi-supervised learning.\nGraph embedding has attracted increasing attention due to its critical application in social network analysis. Most existing algorithms for graph embedding only rely on the typology information and fail to use the copious information in nodes as well as edges. As a result, their performance for many tasks may not be satisfactory. In this paper, we proposed a novel and general framework of representation learning for graph with rich text information through constructing a bipartite heterogeneous network. Specially, we designed a biased random walk to explore the constructed heterogeneous network with the notion of flexible neighborhood. The efficacy of our method is demonstrated by extensive comparison experiments with several baselines on various datasets. It improves the Micro-F1 and Macro-F1 of node classification by 10% and 7% on Cora dataset.\nThe past decade has witnessed a successful application of deep learning to solving many challenging problems in machine learning and artificial intelligence. However, the loss functions of deep neural networks (especially nonlinear networks) are still far from being well understood from a theoretical aspect. In this paper, we enrich the current understanding of the landscape of the square loss functions for three types of neural networks. Specifically, when the parameter matrices are square, we provide an explicit characterization of the global minimizers for linear networks, linear residual networks, and nonlinear networks with one hidden layer. 
Then, we establish two quadratic types of landscape properties for the square loss of these neural networks, i.e., the gradient dominance condition within the neighborhood of their full rank global minimizers, and the regularity condition along certain directions and within the neighborhood of their global minimizers. These two landscape properties are desirable for the optimization around the global minimizers of the loss function for these neural networks.\nWhile most machine translation systems to date are trained on large parallel corpora, humans learn language in a different way: by being grounded in an environment and interacting with other humans. In this work, we propose a communication game where two agents, native speakers of their own respective languages, jointly learn to solve a visual referential task. We find that the ability to understand and translate a foreign language emerges as a means to achieve shared goals. The emergent translation is interactive and multimodal, and crucially does not require parallel corpora, but only monolingual, independent text and corresponding images. Our proposed translation model achieves this by grounding the source and target languages into a shared visual modality, and outperforms several baselines on both word-level and sentence-level translation tasks. Furthermore, we show that agents in a multilingual community learn to translate better and faster than in a bilingual communication setting.\nSpeech-based natural language question-answering interfaces to enterprise systems are gaining a lot of attention. General-purpose speech engines can be integrated with NLP systems to provide such interfaces. Usually, general-purpose speech engines are trained on large `general' corpus. However, when such engines are used for specific domains, they may not recognize domain-specific words well, and may produce erroneous output. 
Further, the accent and the environmental conditions in which the speaker speaks a sentence may induce the speech engine to inaccurately recognize certain words. The subsequent natural language question-answering does not produce the requisite results as the question does not accurately represent what the speaker intended. Thus, the speech engine's output may need to be adapted for a domain before further natural language processing is carried out. We present two mechanisms for such an adaptation, one based on evolutionary development and the other based on machine learning, and show how we can repair the speech-output to make the subsequent natural language question-answering better.\nSocial dilemmas, where mutual cooperation can lead to high payoffs but participants face incentives to cheat, are ubiquitous in multi-agent interaction. We wish to construct agents that cooperate with pure cooperators, avoid exploitation by pure defectors, and incentivize cooperation from the rest. However, often the actions taken by a partner are (partially) unobserved or the consequences of individual actions are hard to predict. We show that in a large class of games good strategies can be constructed by conditioning one's behavior solely on outcomes (ie. one's past rewards). We call this consequentialist conditional cooperation. We show how to construct such strategies using deep reinforcement learning techniques and demonstrate, both analytically and experimentally, that they are effective in social dilemmas beyond simple matrix games. We also show the limitations of relying purely on consequences and discuss the need for understanding both the consequences of and the intentions behind an action.\nWe use decision trees to build a helpdesk agent reference network to facilitate the on-the-job advising of junior or less experienced staff on how to better address telecommunication customer fault reports. 
Such reports generate field measurements and remote measurements which, when coupled with location data and client attributes, and fused with organization-level statistics, can produce models of how support should be provided. Beyond decision support, these models can help identify staff who can act as advisors, based on the quality, consistency and predictability of dealing with complex troubleshooting reports. Advisor staff models are then used to guide less experienced staff in their decision making; thus, we advocate the deployment of a simple mechanism which exploits the availability of staff with a sound track record at the helpdesk to act as dormant tutors.\nIn this study, we present Swift Linked Data Miner, an interruptible algorithm that can directly mine an online Linked Data source (e.g., a SPARQL endpoint) for OWL 2 EL class expressions to extend an ontology with new SubClassOf: axioms. The algorithm works by downloading only a small part of the Linked Data source at a time, building a smart index in the memory and swiftly iterating over the index to mine axioms. We propose a transformation function from mined axioms to RDF Data Shapes. We show, by means of a crowdsourcing experiment, that most of the axioms mined by Swift Linked Data Miner are correct and can be added to an ontology. We provide a ready to use Prot\\'eg\\'e plugin implementing the algorithm, to support ontology engineers in their daily modeling work.\nWe propose a two phase time dependent vehicle routing and scheduling optimization model that identifies the safest routes, as a substitute for the classical objectives given in the literature such as shortest distance or travel time, through (1) avoiding recurring congestions, and (2) selecting routes that have a lower probability of crash occurrences and non-recurring congestion caused by those crashes. 
In the first phase, we solve a mixed-integer programming model which takes the dynamic speed variations into account on a graph of roadway networks according to the time of day, and identify the routing of a fleet and sequence of nodes on the safest feasible paths. Second phase considers each route as an independent transit path (fixed route with fixed node sequences), and tries to avoid congestion by rescheduling the departure times of each vehicle from each node, and by adjusting the sub-optimal speed on each arc. A modified simulated annealing (SA) algorithm is formulated to solve both complex models iteratively, which is found to be capable of providing solutions in a considerably short amount of time.\nThis paper focuses on preserving the privacy of sensitive patterns when inducing decision trees. We adopt a record augmentation approach for hiding sensitive classification rules in binary datasets. Such a hiding methodology is preferred over other heuristic solutions like output perturbation or cryptographic techniques - which restrict the usability of the data - since the raw data itself is readily available for public use. In this paper, we propose a look ahead approach using linear Diophantine equations in order to add the appropriate number of instances while minimally disturbing the initial entropy of the nodes.\nIn this study we developed an automated system that evaluates speech and language features from audio recordings of neuropsychological examinations of 92 subjects in the Framingham Heart Study. A total of 265 features were used in an elastic-net regularized binomial logistic regression model to classify the presence of cognitive impairment, and to select the most predictive features. 
We compared performance with a demographic model from 6,258 subjects in the greater study cohort (0.79 AUC), and found that a system that incorporated both audio and text features performed the best (0.92 AUC), with a True Positive Rate of 29% (at 0% False Positive Rate) and a good model fit (Hosmer-Lemeshow test > 0.05). We also found that decreasing pitch and jitter, shorter segments of speech, and responses phrased as questions were positively associated with cognitive impairment.\nWe present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system. Deep Voice 3 matches state-of-the-art neural speech synthesis systems in naturalness while training ten times faster. We scale Deep Voice 3 to data set sizes unprecedented for TTS, training on more than eight hundred hours of audio from over two thousand speakers. In addition, we identify common error modes of attention-based speech synthesis networks, demonstrate how to mitigate them, and compare several different waveform synthesis methods. We also describe how to scale inference to ten million queries per day on one single-GPU server.\nThe mathematical model underlying the Neural Engineering Framework (NEF) expresses neuronal input as a linear combination of synaptic currents. However, in biology, synapses are not perfect current sources and are thus nonlinear. Detailed synapse models are based on channel conductances instead of currents, which require independent handling of excitatory and inhibitory synapses. This, in particular, significantly affects the influence of inhibitory signals on the neuronal dynamics. In this technical report we first summarize the relevant portions of the NEF and conductance-based synapse models. We then discuss a na\\\"ive translation between populations of LIF neurons with current- and conductance-based synapses based on an estimation of an average membrane potential. 
Experiments show that this simple approach works relatively well for feed-forward communication channels, yet performance degrades for NEF networks describing more complex dynamics, such as integration.\nIn this paper, we present an automated feature engineering based approach to dramatically reduce false positives in fraud prediction. False positives plague the fraud prediction industry. It is estimated that only 1 in 5 declared as fraud are actually fraud and roughly 1 in every 6 customers have had a valid transaction declined in the past year. To address this problem, we use the Deep Feature Synthesis algorithm to automatically derive behavioral features based on the historical data of the card associated with a transaction. We generate 237 features (>100 behavioral patterns) for each transaction, and use a random forest to learn a classifier. We tested our machine learning model on data from a large multinational bank and compared it to their existing solution. On an unseen data of 1.852 million transactions, we were able to reduce the false positives by 54% and provide a savings of 190K euros. We also assess how to deploy this solution, and whether it necessitates streaming computation for real time scoring. We found that our solution can maintain similar benefits even when historical features are computed once every 7 days.\nThe use of random perturbations of ground truth data, such as random translation or scaling of bounding boxes, is a common heuristic used for data augmentation that has been shown to prevent overfitting and improve generalization. Since the design of data augmentation is largely guided by reported best practices, it is difficult to understand if those design choices are optimal. To provide a more principled perspective, we develop a game-theoretic interpretation of data augmentation in the context of object detection. 
We aim to find an optimal adversarial perturbations of the ground truth data (i.e., the worst case perturbations) that forces the object bounding box predictor to learn from the hardest distribution of perturbed examples for better test-time performance. We establish that the game theoretic solution, the Nash equilibrium, provides both an optimal predictor and optimal data augmentation distribution. We show that our adversarial method of training a predictor can significantly improve test time performance for the task of object detection. On the ImageNet object detection task, our adversarial approach improves performance by over 16\\% compared to the best performing data augmentation method\nIdentifying arbitrary topologies of power networks in real time is a computationally hard problem due to the number of hypotheses that grows exponentially with the network size. A new \"Learning-to-Infer\" variational inference method is developed for efficient inference of every line status in the network. Optimizing the variational model is transformed to and solved as a discriminative learning problem based on Monte Carlo samples generated with power flow simulations. A major advantage of the developed Learning-to-Infer method is that the labeled data used for training can be generated in an arbitrarily large amount fast and at very little cost. As a result, the power of offline training is fully exploited to learn very complex classifiers for effective real-time topology identification. The proposed methods are evaluated in the IEEE 30, 118 and 300 bus systems. Excellent performance in identifying arbitrary power network topologies in real time is achieved even with relatively simple variational models and a reasonably small amount of data.\nDeep neural networks are powerful learning models that achieve state-of-the-art performance on many computer vision, speech, and language processing tasks. 
In this paper, we study a fundamental question that arises when designing deep network architectures: Given a target network architecture can we design a smaller network architecture that approximates the operation of the target network? The question is, in part, motivated by the challenge of parameter reduction (compression) in modern deep neural networks, as the ever increasing storage and memory requirements of these networks pose a problem in resource constrained environments.   In this work, we focus on deep convolutional neural network architectures, and propose a novel randomized tensor sketching technique that we utilize to develop a unified framework for approximating the operation of both the convolutional and fully connected layers. By applying the sketching technique along different tensor dimensions, we design changes to the convolutional and fully connected layers that substantially reduce the number of effective parameters in a network. We show that the resulting smaller network can be trained directly, and has a classification accuracy that is comparable to the original network.\nWe study the never-worse relation (NWR) for Markov decision processes with an infinite-horizon reachability objective. A state q is never worse than a state p if the maximal probability of reaching the target set of states from p is at most the same value from q, regardless of the probabilities labelling the transitions. Extremal-probability states, end components, and essential states are all special cases of the equivalence relation induced by the NWR. Using the NWR, states in the same equivalence class can be collapsed. Then, actions leading to sub-optimal states can be removed. We show the natural decision problem associated to computing the NWR is coNP-complete. 
Finally, we extend a previously known incomplete polynomial-time iterative algorithm to under-approximate the NWR.\nApprenticeship learning (AL) is a class of \"learning from demonstrations\" techniques where the reward function of a Markov Decision Process (MDP) is unknown to the learning agent and the agent has to derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of how to make AL algorithms inherently safe while still meeting its learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure both safety and performance of the learnt policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.\nIn this semi-tutorial paper, we first review the information-theoretic approach to account for the computational costs incurred during the search for optimal actions in a sequential decision-making problem. The traditional (MDP) framework ignores computational limitations while searching for optimal policies, essentially assuming that the acting agent is perfectly rational and aims for exact optimality. Using the free-energy, a variational principle is introduced that accounts not only for the value of a policy alone, but also considers the cost of finding this optimal policy. The solution of the variational equations arising from this formulation can be obtained using familiar Bellman-like value iterations from dynamic programming (DP) and the Blahut-Arimoto (BA) algorithm from rate distortion theory. Finally, we demonstrate the utility of the approach for generating hierarchies of state abstractions that can be used to best exploit the available computational resources. 
A numerical example showcases these concepts for a path-planning problem in a grid world environment.\nSince the study of deep convolutional neural network became prevalent, one of the important discoveries is that a feature map from a convolutional network can be extracted before going into the fully connected layer and can be used as a saliency map for object detection. Furthermore, the model can use features from each different layer for accurate object detection: the features from different layers can have different properties. As the model goes deeper, it has many latent skip connections and feature maps to elaborate object detection. Although there are many intermediate layers that we can use for semantic segmentation through skip connection, still the characteristics of each skip connection and the best skip connection for this task are uncertain. Therefore, in this study, we exhaustively research skip connections of state-of-the-art deep convolutional networks and investigate the characteristics of the features from each intermediate layer. In addition, this study would suggest how to use a recent deep neural network model for semantic segmentation and it would therefore become a cornerstone for later studies with the state-of-the-art network models.\nWe study transfer learning in convolutional network architectures applied to the task of recognizing audio, such as environmental sound events and speech commands. Our key finding is that not only is it possible to transfer representations from an unrelated task like environmental sound classification to a voice-focused task like speech command recognition, but also that doing so improves accuracies significantly. 
We also investigate the effect of increased model capacity for transfer learning audio, by first validating known results from the field of Computer Vision of achieving better accuracies with increasingly deeper networks on two audio datasets: UrbanSound8k and the newly released Google Speech Commands dataset. Then we propose a simple multiscale input representation using dilated convolutions and show that it is able to aggregate larger contexts and increase classification performance. Further, the models trained using a combination of transfer learning and multiscale input representations need only 40% of the training data to achieve similar accuracies as a freshly trained model with 100% of the training data. Finally, we demonstrate a positive interaction effect for the multiscale input and transfer learning, making a case for the joint application of the two techniques.\nHealth related social media mining is a valuable apparatus for the early recognition of the diverse antagonistic medicinal conditions. Mostly, the existing methods are based on machine learning with knowledge-based learning. This working note presents the Recurrent neural network (RNN) and Long short-term memory (LSTM) based embedding for automatic health text classification in the social media mining. For each task, two systems are built and that classify the tweet at the tweet level. RNN and LSTM are used for extracting features and non-linear activation function at the last layer facilitates to distinguish the tweets of different categories. The experiments are conducted on 2nd Social Media Mining for Health Applications Shared Task at AMIA 2017. The experiment results are considerable; however the proposed method is appropriate for the health text classification. 
This is primarily due to the reason that, it doesn't rely on any feature engineering mechanisms.\nServerless computing has emerged as a compelling paradigm for the development and deployment of a wide range of event based cloud applications. At the same time, cloud providers and enterprise companies are heavily adopting machine learning and Artificial Intelligence to either differentiate themselves, or provide their customers with value added services. In this work we evaluate the suitability of a serverless computing environment for the inferencing of large neural network models. Our experimental evaluations are executed on the AWS Lambda environment using the MxNet deep learning framework. Our experimental results show that while the inferencing latency can be within an acceptable range, longer delays due to cold starts can skew the latency distribution and hence risk violating more stringent SLAs.\nThe study of representations invariant to common transformations of the data is important to learning. Most techniques have focused on local approximate invariance implemented within expensive optimization frameworks lacking explicit theoretical guarantees. In this paper, we study kernels that are invariant to a unitary group while having theoretical guarantees in addressing the important practical issue of unavailability of transformed versions of labelled data. A problem we call the Unlabeled Transformation Problem which is a special form of semi-supervised learning and one-shot learning. We present a theoretically motivated alternate approach to the invariant kernel SVM based on which we propose Max-Margin Invariant Features (MMIF) to solve this problem. 
As an illustration, we design an framework for face recognition and demonstrate the efficacy of our approach on a large scale semi-synthetic dataset with 153,000 images and a new challenging protocol on Labelled Faces in the Wild (LFW) while out-performing strong baselines.\nThis paper describes a novel text-to-speech (TTS) technique based on deep convolutional neural networks (CNN), without any recurrent units. Recurrent neural network (RNN) has been a standard technique to model sequential data recently, and this technique has been used in some cutting-edge neural TTS techniques. However, training RNN component often requires a very powerful computer, or very long time typically several days or weeks. Recent other studies, on the other hand, have shown that CNN-based sequence synthesis can be much faster than RNN-based techniques, because of high parallelizability. The objective of this paper is to show an alternative neural TTS system, based only on CNN, that can alleviate these economic costs of training. In our experiment, the proposed Deep Convolutional TTS can be sufficiently trained only in a night (15 hours), using an ordinary gaming PC equipped with two GPUs, while the quality of the synthesized speech was almost acceptable.\nData and knowledge representation are fundamental concepts in machine learning. The quality of the representation impacts the performance of the learning model directly. Feature learning transforms or enhances raw data to structures that are effectively exploited by those models. In recent years, several works have been using complex networks for data representation and analysis. However, no feature learning method has been proposed for such category of techniques. Here, we present an unsupervised feature learning mechanism that works on datasets with binary features. First, the dataset is mapped into a feature--sample network. 
Then, a multi-objective optimization process selects a set of new vertices to produce an enhanced version of the network. The new features depend on a nonlinear function of a combination of preexisting features. Effectively, the process projects the input data into a higher-dimensional space. To solve the optimization problem, we design two metaheuristics based on the lexicographic genetic algorithm and the improved strength Pareto evolutionary algorithm (SPEA2). We show that the enhanced network contains more information and can be exploited to improve the performance of machine learning methods. The advantages and disadvantages of each optimization strategy are discussed.\nA core business in the fashion industry is the understanding and prediction of customer needs and trends. Search engines and social networks are at the same time a fundamental bridge and a costly middleman between the customer's purchase intention and the retailer. To better exploit Europe's distinctive characteristics, e.g., multiple languages, fashion and cultural differences, it is pivotal to reduce retailers' dependence on search engines. This goal can be achieved by harnessing various data channels (manufacturers and distribution networks, online shops, large retailers, social media, market observers, call centers, press/magazines etc.) that retailers can leverage in order to gain more insight into potential buyers and into industry trends as a whole. This can enable the creation of novel on-line shopping experiences, the detection of influencers, and the prediction of upcoming fashion trends. In this paper, we provide an overview of the main research challenges and an analysis of the most promising technological solutions that we are investigating in the FashionBrain project.\nThis paper presents Klout Topics, a lightweight ontology to describe social media users' topics of interest and expertise. 
Klout Topics is designed to: be human-readable and consumer-friendly; cover multiple domains of knowledge in depth; and promote data extensibility via knowledge base entities. We discuss why this ontology is well-suited for text labeling and interest modeling applications, and how it compares to available alternatives. We show its coverage against common social media interest sets, and examples of how it is used to model the interests of over 780M social media users on Klout.com. Finally, we open the ontology for external use.\nNeural network-based systems can now learn to locate the referents of words and phrases in images, answer questions about visual scenes, and even execute symbolic instructions as first-person actors in partially-observable worlds. To achieve this so-called grounded language learning, models must overcome certain well-studied learning challenges that are also fundamental to infants learning their first words. While it is notable that models with no meaningful prior knowledge overcome these learning obstacles, AI researchers and practitioners currently lack a clear understanding of exactly how they do so. Here we address this question as a way of achieving a clearer general understanding of grounded language learning, both to inform future research and to improve confidence in model predictions. For maximum control and generality, we focus on a simple neural network-based language learning agent trained via policy-gradient methods to interpret synthetic linguistic instructions in a simulated 3D world. We apply experimental paradigms from developmental psychology to this agent, exploring the conditions under which established human biases and learning effects emerge. We further propose a novel way to visualise and analyse semantic representation in grounded language learning agents that yields a plausible computational account of the observed effects.\nVisual Analytics might be defined as data mining assisted by interactive visual interfaces. 
The field has been receiving prominent attention from researchers, developers and the industry. The literature, however, is complex because it involves multiple fields of knowledge and is relatively recent. In this article we describe an initial tentative organization of the knowledge in the field as an OWL ontology and a SKOS vocabulary. This effort might be useful in many ways that include conceptual considerations and software implementations. Within the results and discussions, we present a core and an example expansion of the conceptualization, and incorporate design issues that enhance the expressive power of the abstraction.\nIn reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean. That is, we examine methods of learning the value distribution instead of the value function. We give results that close a number of gaps between the theoretical and algorithmic results given by Bellemare, Dabney, and Munos (2017). First, we extend existing results to the approximate distribution setting. Second, we present a novel distributional reinforcement learning algorithm consistent with our theoretical formulation. 
Finally, we evaluate this new algorithm on the Atari 2600 games, observing that it significantly outperforms many of the recent improvements on DQN, including the related distributional algorithm C51.\nWe study multiwinner voting problems when there is an additional requirement that the selected committee should be fair with respect to attributes such as gender, ethnicity, or political parties. Every setting of an attribute gives rise to a group, and the goal is to ensure that each group is neither over- nor under-represented in the selected committee. Prior work has largely focused on designing specialized score functions that lead to a precise level of representation with respect to disjoint attributes (e.g., only political affiliation). Here we propose a general algorithmic framework that allows the use of any score function and can guarantee flexible notions of fairness with respect to multiple, non-disjoint attributes (e.g., political affiliation and gender). Technically, we study the complexity of this constrained multiwinner voting problem subject to group-fairness constraints for monotone submodular score functions. We present approximation algorithms and hardness of approximation results for various attribute set structures and score functions.\nThe literature on Multiple Criteria Decision Analysis (MCDA) proposes several methods in order to sort alternatives evaluated on several attributes into ordered classes. Non Compensatory Sorting models (NCS) assign alternatives to classes based on the way they compare to multicriteria profiles separating the consecutive classes. Previous works have proposed approaches to learn the parameters of an NCS model based on a learning set. Exact approaches based on mixed integer linear programming ensure that the learning set is best restored, but can only handle datasets of limited size. Heuristic approaches can handle large learning sets, but do not provide any guarantee about the inferred model. 
In this paper, we propose an alternative formulation to learn an NCS model. This formulation, based on a SAT problem, is guaranteed to find a model fully consistent with the learning set (whenever it exists), and is computationally much more efficient than existing exact MIP approaches.\nProviding elderly people and people with special needs, including those suffering from physical disabilities and chronic diseases, with the possibility of retaining their independence as much as possible is one of the most important challenges our society is expected to face. Assistance models based on the home care paradigm are being adopted rapidly in almost all industrialized and emerging countries. Such paradigms hypothesize that it is necessary to ensure that the so-called Activities of Daily Living (ADL) are correctly and regularly performed by the assisted person to increase the perception of an improved quality of life. This chapter describes the computational inference engine at the core of Arianna, a system able to understand whether an assisted person performs a given set of ADL and to motivate him/her to perform them through a speech-mediated motivational dialogue, using a set of nearables to be installed in an apartment, plus a wearable to be worn or fit in garments.\nThe ability to modulate vocal sounds and generate speech is one of the features which set humans apart from other living beings. The human voice can be characterized by several attributes such as pitch, timbre, loudness, and vocal tone. It has often been observed that humans express their emotions by varying different vocal attributes during speech generation. Hence, inferring human emotions through voice and speech analysis is practically plausible and could potentially be beneficial for improving human conversational and persuasion skills. This paper presents an algorithmic approach for detection and analysis of human emotions with the help of voice and speech processing. 
The proposed approach has been developed with the objective of incorporating it into future artificial intelligence systems for improving human-computer interactions.\nIn this paper, we provide an approach to clustering relational matrices whose entries correspond to either similarities or dissimilarities between objects. Our approach is based on the value of information, a parameterized, information-theoretic criterion that measures the change in costs associated with changes in information. Optimizing the value of information yields a deterministic annealing style of clustering with many benefits. For instance, investigators need not specify the number of clusters a priori, as the partitions naturally undergo phase changes during the annealing process, whereby the number of clusters changes in a data-driven fashion. The global-best partition can also often be identified.\nInspired by the recent neuroscience studies on the left-right asymmetry of the human brain in processing low and high spatial frequency information, this paper introduces a dual skipping network which carries out coarse-to-fine object categorization. Such a network has two branches to simultaneously deal with both coarse and fine-grained classification tasks. Specifically, we propose a layer-skipping mechanism that learns a gating network to predict which layers to skip in the testing stage. This layer-skipping mechanism endows the network with good flexibility and capability in practice. Evaluations are conducted on several widely used coarse-to-fine object categorization benchmarks, and promising results are achieved by our proposed network model.\nVisual localization under large changes in scale is an important capability in many robotic mapping applications, such as localizing at low altitudes in maps built at high altitudes, or performing loop closure over long distances. Existing approaches, however, are robust only up to a 3x difference in scale between map and query images. 
We propose a novel combination of deep-learning-based object features and hand-engineered point-features that yields improved robustness to scale change, perspective change, and image noise. We conduct experiments in simulation and in real-world outdoor scenes exhibiting up to a 7x change in scale, and compare our approach against localization using state-of-the-art SIFT features. This technique is training-free and class-agnostic, and in principle can be deployed in any environment out-of-the-box.\nCamera relocalization plays a vital role in many robotics and computer vision tasks, such as global localization, recovery from tracking failure and loop closure detection. Recent random-forest-based methods exploit randomly sampled pixel comparison features to predict 3D world locations for 2D image locations to guide the camera pose optimization. However, these image features are only sampled randomly in the images, without considering the spatial structures or geometric information, leading to large errors or failure cases in the presence of poorly textured areas or motion blur. Line segment features are more robust in these environments. In this work, we propose to jointly exploit points and lines within the framework of uncertainty-driven regression forests. The proposed approach is thoroughly evaluated on three publicly available datasets against several strong state-of-the-art baselines in terms of several different error metrics. Experimental results prove the efficacy of our method, showing superior or on-par state-of-the-art performance.\nThis paper introduces a new routing problem referred to as the vehicle routing problem with vector profits. Given a network composed of nodes (depot/sites) and arcs connecting the nodes, the problem determines routes that depart from the depot, visit sites to collect profits, and return to the depot. 
There are multiple stakeholders interested in the mission and each site is associated with a vector whose k-th element represents the profit value for the k-th stakeholder. The objective of the problem is to maximize the profit sum for the least satisfied stakeholder, i.e., the stakeholder with the smallest total profit value. An approach based on the linear programming relaxation and column-generation to solve this max-min type routing problem was developed. Two case studies, the planetary surface exploration and the Rome tour cases, were presented to demonstrate the effectiveness of the proposed problem formulation and solution methodology.\nRegularization is one of the crucial ingredients of deep learning, yet the term regularization has various definitions, and regularization methods are often studied separately from each other. In our work we present a systematic, unifying taxonomy to categorize existing methods. We distinguish methods that affect data, network architectures, error terms, regularization terms, and optimization procedures. We do not provide all details about the listed methods; instead, we present an overview of how the methods can be sorted into meaningful categories and sub-categories. This helps reveal links and fundamental similarities between them. Finally, we include practical recommendations both for users and for developers of new regularization methods.\nThird-generation neural networks, or Spiking Neural Networks (SNNs), aim at harnessing the energy efficiency of spike-domain processing by building on computing elements that operate on, and exchange, spikes. In this paper, the problem of training a two-layer SNN is studied for the purpose of classification, under a Generalized Linear Model (GLM) probabilistic neural model that was previously considered within the computational neuroscience literature. Conventional classification rules for SNNs operate offline based on the number of output spikes at each output neuron. 
In contrast, a novel training method is proposed here for a first-to-spike decoding rule, whereby the SNN can make an early classification decision once spike firing is detected at an output neuron. Numerical results bring insights into the optimal parameter selection for the GLM neuron and into the accuracy-complexity trade-off performance of conventional and first-to-spike decoding.\nGenerative Adversarial Network (GAN) and its variants exhibit state-of-the-art performance in the class of generative models. To capture higher-dimensional distributions, the common learning procedure requires high computational complexity and a large number of parameters. The problem of employing such a massive framework arises when deploying it on a platform with limited computational power such as mobile phones. In this paper, we present a new generative adversarial framework by representing each layer as a tensor structure connected by multilinear operations, aiming to reduce the number of model parameters by a large factor while preserving the generative performance and sample quality. To learn the model, we employ an efficient algorithm which alternately optimizes both discriminator and generator. Experimental outcomes demonstrate that our model can achieve a high compression rate for model parameters, up to $35$ times compared to the original GAN on the MNIST dataset.\nRecurrent neural networks (RNNs) have been successfully applied to various natural language processing (NLP) tasks and achieved better results than conventional methods. However, the lack of understanding of the mechanisms behind their effectiveness limits further improvements to their architectures. In this paper, we present a visual analytics method for understanding and comparing RNN models for NLP tasks. We propose a technique to explain the function of individual hidden state units based on their expected response to input texts. 
We then co-cluster hidden state units and words based on the expected response and visualize co-clustering results as memory chips and word clouds to provide more structured knowledge on RNNs' hidden states. We also propose a glyph-based sequence visualization based on aggregate information to analyze the behavior of an RNN's hidden state at the sentence level. The usability and effectiveness of our method are demonstrated through case studies and reviews from domain experts.\nExtreme learning machine (ELM) is a new single hidden layer feedforward neural network. The weights of the input layer and the biases of neurons in the hidden layer are randomly generated, and the weights of the output layer can be determined analytically. ELM has achieved good results on a large number of classification tasks. In this paper, a new extreme learning machine called rough extreme learning machine (RELM) is proposed. RELM uses rough sets to divide data into an upper approximation set and a lower approximation set, and the two approximation sets are utilized to train upper approximation neurons and lower approximation neurons. In addition, an attribute reduction is executed in this algorithm to remove redundant attributes. The experimental results show that, compared with the comparison algorithms, RELM achieves better accuracy and repeatability in most cases; RELM not only maintains the advantage of fast speed, but also effectively copes with classification tasks on high-dimensional data.\nWe present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. 
By stacking layers in which nodes are able to attend over their neighborhoods' features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved or matched state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training).\nWe analyze the expressiveness and loss surface of practical deep convolutional neural networks (CNNs) with shared weights and max pooling layers. We show that such CNNs produce linearly independent features at a \"wide\" layer which has more neurons than the number of training samples. This condition holds, e.g., for the VGG network. Furthermore, we provide for such wide CNNs necessary and sufficient conditions for global minima with zero training error. For the case where the wide layer is followed by a fully connected layer, we show that almost every critical point of the empirical loss is a global minimum with zero training error. Our analysis suggests that both depth and width are very important in deep learning. While depth brings more representational power and allows the network to learn high-level features, width smooths the optimization landscape of the loss function in the sense that a sufficiently wide network has a well-behaved loss surface with potentially no bad local minima.\nIn spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs. 
There have been several proposals to alleviate this issue with, for instance, triangulation and semi-supervised learning techniques, but they still require a strong cross-lingual signal. In this work, we completely remove the need for parallel data and propose a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora. Our model builds upon the recent work on unsupervised embedding mappings, and consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and backtranslation. Despite the simplicity of the approach, our system obtains 15.56 and 10.21 BLEU points in WMT 2014 French-to-English and German-to-English translation. The model can also profit from small parallel corpora, and attains 21.81 and 15.24 points when combined with 100,000 parallel sentences, respectively. Our implementation is released as an open source project.\nOptions in reinforcement learning allow agents to hierarchically decompose a task into subtasks, having the potential to speed up learning and planning. However, autonomously learning effective sets of options is still a major challenge in the field. In this paper we focus on the recently introduced idea of using representation learning methods to guide the option discovery process. Specifically, we look at eigenoptions, options obtained from representations that encode diffusive information flow in the environment. We extend the existing algorithms for eigenoption discovery to settings with stochastic transitions and in which handcrafted features are not available. We propose an algorithm that discovers eigenoptions while learning non-linear state representations from raw pixels. It exploits recent successes in the deep reinforcement learning literature and the equivalence between proto-value functions and the successor representation. 
We use traditional tabular domains to provide intuition about our approach and Atari 2600 games to demonstrate its potential.\nWe focus on the problem of estimating the change in the dependency structures of two $p$-dimensional Gaussian Graphical models (GGMs). Previous studies for sparse change estimation in GGMs involve expensive and difficult non-smooth optimization. We propose a novel method, DIFFEE, for estimating DIFFerential networks via an Elementary Estimator in a high-dimensional setting. DIFFEE has a fast, closed-form solution that enables it to work in large-scale settings. We conduct a rigorous statistical analysis showing that, surprisingly, DIFFEE achieves the same asymptotic convergence rates as the state-of-the-art estimators that are much more difficult to compute. Our experimental results on multiple synthetic datasets and one real-world dataset about brain connectivity show strong performance improvements over baselines, as well as significant computational benefits.\nThis paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GAN), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions by experts. Then, we incorporate the discriminator as another critic into the advantage actor-critic (A2C) framework, to encourage the dialogue agent to explore state-action within the regions where the agent takes actions similar to those of the experts. Experimental results in a movie-ticket booking domain show that the proposed Adversarial A2C can accelerate policy exploration efficiently.\nWe consider the problems of learning forward models that map state to high-dimensional images and inverse models that map high-dimensional images to state in robotics. 
Specifically, we present a perceptual model for generating video frames from state with deep networks, and provide a framework for its use in tracking and prediction tasks. We show that our proposed model greatly outperforms standard deconvolutional methods and GANs for image generation, producing clear, photo-realistic images. We also develop a convolutional neural network model for state estimation and compare the result to an Extended Kalman Filter to estimate robot trajectories. We validate all models on a real robotic system.\nDue to their complex nature, it is hard to characterize the ways in which machine learning models can misbehave or be exploited when deployed. Recent work on adversarial examples, i.e. inputs with minor perturbations that result in substantially different model predictions, is helpful in evaluating the robustness of these models by exposing the adversarial scenarios where they fail. However, these malicious perturbations are often unnatural, not semantically meaningful, and not applicable to complicated domains such as language. In this paper, we propose a framework to generate natural and legible adversarial examples that lie on the data manifold, by searching in semantic space of dense and continuous data representation, utilizing the recent advances in generative adversarial networks. We present generated adversaries to demonstrate the potential of the proposed approach for black-box classifiers for a wide range of applications such as image classification, textual entailment, and machine translation. We include experiments to show that the generated adversaries are natural, legible to humans, and useful in evaluating and analyzing black-box classifiers.\nIt is commonly agreed that the use of relevant invariances as a good statistical bias is important in machine-learning. 
However, most approaches that explicitly incorporate invariances into a model architecture only make use of very simple transformations, such as translations and rotations. Hence, there is a need for methods to model and extract richer transformations that capture much higher-level invariances. To that end, we introduce a tool that parametrizes the set of filters of a trained convolutional neural network with the latent space of a generative adversarial network. We then show that the method can capture highly non-linear invariances of the data by visualizing their effect in the data space.\nDeep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, algorithms that estimate state and state-action value functions typically assume a fully observed state and must compensate for partial or non-Markovian observations by using finite-length frame-history observations or recurrent networks. In this work, we propose a new deep reinforcement learning algorithm based on counterfactual regret minimization that iteratively updates an approximation to a cumulative clipped advantage function and is robust to partially observed state. We demonstrate that on several partially observed reinforcement learning tasks, this new class of algorithms can substantially outperform strong baseline methods: on Pong with single-frame observations, and on the challenging Doom (ViZDoom) and Minecraft (Malm\\\"o) first-person navigation benchmarks.\nThis paper introduces a novel framework for combining scientific knowledge of physics-based models with neural networks to advance scientific discovery. This framework, termed physics-guided neural network (PGNN), leverages the output of physics-based model simulations along with observational features to generate predictions using a neural network architecture. 
Further, this paper presents a novel framework for using physics-based loss functions in the learning objective of neural networks, to ensure that the model predictions not only show lower errors on the training set but are also scientifically consistent with the known physics on the unlabeled set. We illustrate the effectiveness of PGNN for the problem of lake temperature modeling, where physical relationships between the temperature, density, and depth of water are used to design a physics-based loss function. By using scientific knowledge to guide the construction and learning of neural networks, we are able to show that the proposed framework ensures better generalizability as well as scientific consistency of results.\nIn this paper we argue that crime drama exemplified in television programs such as CSI:Crime Scene Investigation is an ideal testbed for approximating real-world natural language understanding and the complex inferences associated with it. We propose to treat crime drama as a new inference task, capitalizing on the fact that each episode poses the same basic question (i.e., who committed the crime) and naturally provides the answer when the perpetrator is revealed. We develop a new dataset based on CSI episodes, formalize perpetrator identification as a sequence labeling problem, and develop an LSTM-based model which learns from multi-modal data. Experimental results show that an incremental inference strategy is key to making accurate guesses as well as learning from representations fusing textual, visual, and acoustic input.\nLearning to learn is a powerful paradigm for enabling models to learn from data more effectively and efficiently. A popular approach to meta-learning is to train a recurrent model to read in a training dataset as input and output the parameters of a learned model, or output predictions for new test inputs. 
Alternatively, a more recent approach to meta-learning aims to acquire deep representations that can be effectively fine-tuned, via standard gradient descent, to new tasks. In this paper, we consider the meta-learning problem from the perspective of universality, formalizing the notion of learning algorithm approximation and comparing the expressive power of the aforementioned recurrent models to the more recent approaches that embed gradient descent into the meta-learner. In particular, we seek to answer the following question: does deep representation combined with standard gradient descent have sufficient capacity to approximate any learning algorithm? We find that this is indeed true, and further find, in our experiments, that gradient-based meta-learning consistently leads to learning strategies that generalize more widely compared to those represented by recurrent models.\nMachine translation has recently achieved impressive performance thanks to advances in deep learning and the availability of large-scale parallel corpora. There have been numerous attempts to extend these successes to low-resource language pairs, but these still require tens of thousands of parallel sentences. In this work, we take this research direction to the extreme and investigate whether it is possible to learn to translate even without any parallel data. We propose a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space. By learning to reconstruct in both languages from this shared feature space, the model effectively learns to translate without using any labeled data. 
We demonstrate our model on two widely used datasets and two language pairs, reporting BLEU scores of 32.8 and 15.1 on the Multi30k and WMT English-French datasets, without using even a single parallel sentence at training time.\nTraditional models for question answering optimize using cross entropy loss, which encourages exact answers at the cost of penalizing nearby or overlapping answers that are sometimes equally accurate. We propose a mixed objective that combines cross entropy loss with self-critical policy learning. The objective uses rewards derived from word overlap to solve the misalignment between evaluation metric and optimization objective. In addition to the mixed objective, we improve dynamic coattention networks (DCN) with a deep residual coattention encoder that is inspired by recent work in deep self-attention and residual networks. Our proposals improve model performance across question types and input lengths, especially for long questions that require the ability to capture long-term dependencies. On the Stanford Question Answering Dataset, our model achieves state-of-the-art results with 75.1% exact match accuracy and 83.1% F1, while the ensemble obtains 78.9% exact match accuracy and 86.0% F1.\nExisting deep multitask learning (MTL) approaches align layers shared between tasks in a parallel ordering. Such an organization significantly constricts the types of shared structure that can be learned. The necessity of parallel ordering for deep MTL is first tested by comparing it with permuted ordering of shared layers. The results indicate that a flexible ordering can enable more effective sharing, thus motivating the development of a soft ordering approach, which learns how shared layers are applied in different ways for different tasks. Deep MTL with soft ordering outperforms parallel ordering methods across a series of domains.
These results suggest that the power of deep MTL comes from learning highly general building blocks that can be assembled to meet the demands of each task.\nAn obstacle that prevents the wide adoption of (deep) reinforcement learning (RL) in control systems is its need for a large number of interactions with the environment in order to master a skill. The learned skill usually generalizes poorly across domains and re-training is often necessary when presented with a new task. We present a framework that combines techniques from formal methods with hierarchical reinforcement learning (HRL). The set of techniques we provide allows for convenient specification of tasks with complex logic, learns hierarchical policies (meta-controller and low-level controllers) with well-defined intrinsic rewards using any RL method, and is able to construct new skills from existing ones without additional learning. We evaluate the proposed methods in a simple grid world simulation as well as simulation on a Baxter robot.\nWe present pomegranate, an open source machine learning package for probabilistic modeling in Python. Probabilistic modeling encompasses a wide range of methods that explicitly describe uncertainty using probability distributions. Three widely used probabilistic models implemented in pomegranate are general mixture models, hidden Markov models, and Bayesian networks. A primary focus of pomegranate is to abstract away the complexities of training models from their definition. This allows users to focus on specifying the correct model for their application instead of being limited by their understanding of the underlying algorithms. An aspect of this focus involves the collection of additive sufficient statistics from data sets as a strategy for training models.
This approach trivially enables many useful learning strategies, such as out-of-core learning, minibatch learning, and semi-supervised learning, without requiring the user to consider how to partition data or modify the algorithms to handle these tasks themselves. pomegranate is written in Cython to speed up calculations and releases the global interpreter lock to allow for built-in multithreaded parallelism, making it competitive with or superior to other implementations of similar algorithms. This paper presents an overview of the design choices in pomegranate, and how they have enabled complex features to be supported by simple code.\nHumans can understand and produce new utterances effortlessly, thanks to their compositional skills. Once a person learns the meaning of a new verb \"dax,\" he or she can immediately understand the meaning of \"dax twice\" or \"sing and dax.\" In this paper, we introduce the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences. We then test the zero-shot generalization capabilities of a variety of recurrent neural networks (RNNs) trained on SCAN with sequence-to-sequence methods. We find that RNNs can make successful zero-shot generalizations when the differences between training and test commands are small, so that they can apply \"mix-and-match\" strategies to solve the task. However, when generalization requires systematic compositional skills (as in the \"dax\" example above), RNNs fail spectacularly.
We conclude with a proof-of-concept experiment in neural machine translation, suggesting that lack of systematicity might be partially responsible for neural networks' notorious training data thirst.\nIt is often argued that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a Pareto-optimal policy, i.e., a policy that cannot be improved upon for one agent without making sacrifices for another. A famous theorem of Harsanyi shows that, when the principals have a common prior on the outcome distributions of all policies, a Pareto-optimal policy for the agent is one that maximizes a fixed, weighted linear combination of the principals' utilities. In this paper, we show that Harsanyi's theorem does not hold for principals with different priors and, as our main result, derive a more precise generalization that does hold. In this more general case, the relative weight given to each principal's utility should evolve over time according to how well the agent's observations conform with that principal's prior. The result has implications for the design of contracts, treaties, joint ventures, and robots.\nThis paper introduces and addresses a wide class of stochastic bandit problems where the function mapping the arm to the corresponding reward exhibits some known structural properties. Most existing structures (e.g., linear, Lipschitz, unimodal, combinatorial, dueling, ...) are covered by our framework. We derive an asymptotic instance-specific regret lower bound for these problems, and develop OSSB, an algorithm whose regret matches this fundamental limit. OSSB is not based on the classical principle of \"optimism in the face of uncertainty\" or on Thompson sampling, but instead aims at matching the minimal exploration rates of sub-optimal arms as characterized in the derivation of the regret lower bound.
We illustrate the efficiency of OSSB using numerical experiments in the case of the linear bandit problem and show that OSSB outperforms existing algorithms, including Thompson sampling.\nAs data-driven methods rise in popularity in materials science applications, a key question is how these machine learning models can be used to understand microstructure. Given the importance of process-structure-property relations throughout materials science, it seems logical that models that can leverage microstructural data would be more capable of predicting property information. While there have been some recent attempts to use convolutional neural networks to understand microstructural images, these early studies have focused only on which featurizations yield the highest machine learning model accuracy for a single data set. This paper explores the use of convolutional neural networks for classifying microstructure with a more holistic set of objectives in mind: generalization between data sets, number of features required, and interpretability.\nThis paper presents the design of the machine learning architecture that underlies the Alexa Skills Kit (ASK), a large-scale Spoken Language Understanding (SLU) Software Development Kit (SDK) that enables developers to extend the capabilities of Amazon's virtual assistant, Alexa. At Amazon, the infrastructure powers over 25,000 skills deployed through the ASK, as well as AWS's Amazon Lex SLU Service. The ASK emphasizes flexibility, predictability and a rapid iteration cycle for third-party developers.
It imposes inductive biases that allow it to learn robust SLU models from extremely small and sparse datasets and, in doing so, removes significant barriers to entry for software developers and dialogue systems researchers.\nDisentangled representations, where the higher-level data generative factors are reflected in disjoint latent dimensions, offer several benefits such as ease of deriving invariant representations, transferability to other tasks, interpretability, etc. We consider the problem of unsupervised learning of disentangled representations from a large pool of unlabeled observations, and propose a variational-inference-based approach to infer disentangled latent factors. We introduce a regularizer on the expectation of the approximate posterior over observed data that encourages the disentanglement. We evaluate the proposed approach using several quantitative metrics and empirically observe significant gains over existing methods in terms of both disentanglement and data likelihood (reconstruction quality).\nDom/wdeg is one of the best performing heuristics for dynamic variable ordering in backtrack search [Boussemart et al., 2004]. As originally defined, this heuristic increments the weight of the constraint that causes a domain wipeout (i.e., a dead-end) when enforcing arc consistency during search. \"The process of weighting constraints with dom/wdeg is not defined when more than one constraint lead to a domain wipeout [Vion et al., 2011].\" In this paper, we investigate how weights should be updated in the context of two high-level consistencies, namely, singleton (POAC) and relational consistencies (RNIC). We propose, analyze, and empirically evaluate several strategies for updating the weights.
We statistically compare the proposed strategies and conclude with our recommendations.\nProgram analysis is a technique to reason about programs without executing them, and it has various applications in compilers, integrated development environments, and security. In this work, we present a machine learning pipeline that induces a security analyzer for programs by example. The security analyzer determines whether a program is secure or insecure based on symbolic rules that were deduced by our machine learning pipeline. The pipeline has two stages: a Recurrent Neural Network (RNN) and an Extractor that converts the RNN into symbolic rules. To evaluate the quality of the learned symbolic rules, we propose a sampling-based similarity measurement between two infinite regular languages. We conduct a case study using real-world data. In this work, we discuss the limitations of existing techniques and possible improvements in the future. The results show that with sufficient training data and a fair distribution of program paths it is feasible to deduce symbolic security rules for the OpenJDK library, which has millions of lines of code.\nIn meta-learning, an agent extracts knowledge from observed tasks, aiming to facilitate learning of novel future tasks. Under the assumption that future tasks are 'related' to previous tasks, representations should be learned in a way which captures the common structure across learned tasks, while allowing the learner sufficient flexibility to adapt to novel aspects of new tasks. We present a framework for meta-learning that is based on generalization error bounds, allowing us to extend various PAC-Bayes bounds to meta-learning. Learning takes place through the construction of a distribution over hypotheses based on the observed tasks, and its utilization for learning a new task. Thus, prior knowledge is incorporated through setting an experience-dependent prior for novel tasks.
We develop a gradient-based algorithm which minimizes an objective function derived from the bounds and demonstrate its effectiveness numerically with deep neural networks. In addition to establishing the improved performance available through meta-learning, we demonstrate the intuitive way by which prior information is manifested at different levels of the network.\nMarkov Logic Networks join probabilistic modeling with first-order logic and have been shown to integrate well with the Semantic Web foundations. While several approaches have been devised to tackle the subproblems of rule mining, grounding, and inference, no comprehensive workflow has been proposed so far. In this paper, we fill this gap by introducing a framework called Mandolin, which implements a workflow for knowledge discovery specifically on RDF datasets. Our framework imports knowledge from referenced graphs, creates similarity relationships among similar literals, and relies on state-of-the-art techniques for rule mining, grounding, and inference computation. We show that our best configuration scales well and achieves at least comparable results with respect to other statistical-relational-learning algorithms on link prediction.\nThis paper describes the design and development of a decentralized firewall system powered by a novel malware detection engine. The firewall is built using blockchain technology. The detection engine aims to classify Portable Executable (PE) files as malicious or benign. File classification is carried out using a deep belief neural network (DBN) as the detection engine. Our approach is to model the files as grayscale images and use the DBN to classify those images into the aforementioned two classes. An extensive data set of 10,000 files is used to train the DBN. Validation is carried out using 4,000 files previously unexposed to the network. 
The final result of whether to allow or block a file is obtained by arriving at a proof-of-work-based consensus in the blockchain network.\nMachine learning is usually defined in behaviourist terms, where external validation is the primary mechanism of learning. In this paper, I argue for a more holistic interpretation in which finding more probable, efficient and abstract representations is as central to learning as performance. In other words, machine learning should be extended with strategies to reason over its own learning process, leading to so-called meta-cognitive machine learning. As such, the de facto definition of machine learning should be reformulated in these intrinsically multi-objective terms, taking into account not only the task performance but also internal learning objectives. To this end, we suggest defining a \"model entropy function\" that quantifies the efficiency of the internal learning processes. It is conjectured that the minimization of this model entropy leads to concept formation. Besides philosophical aspects, some initial illustrations are included to support the claims.\nIndividual neurons in the nervous system exhibit various dynamics. To capture these dynamics for single neurons, we tune the parameters of an electrophysiological model of nerve cells, to fit experimental data obtained by calcium imaging. A search for the biophysical parameters of this model is performed by means of a genetic algorithm, where the model neuron is exposed to a predefined input current representing overall inputs from other parts of the nervous system. The algorithm is then constrained to keep the ion-channel currents within reasonable ranges, while producing the best fit to a calcium imaging time series of the AVA interneuron, from the brain of the soil worm, C. elegans.
Our setting enables us to map a set of biophysical parameters to the neuron kinetics observed in neuronal imaging.\nDeep learning approaches such as convolutional neural nets have consistently outperformed previous methods on challenging tasks such as dense, semantic segmentation. However, the various proposed networks perform differently, with behaviour largely influenced by architectural choices and training settings. This paper explores Ensembles of Multiple Models and Architectures (EMMA) for robust performance through aggregation of predictions from a wide range of methods. The approach reduces the influence of the meta-parameters of individual models and the risk of overfitting the configuration to a particular database. EMMA can be seen as an unbiased, generic deep learning model which is shown to yield excellent performance, winning first place in the BRATS 2017 competition among 50+ participating teams.\nRather than learning new control policies for each new task, it is possible, when tasks share some structure, to compose a \"meta-policy\" from previously learned policies. This paper reports results from experiments using Deep Reinforcement Learning on a continuous-state, discrete-action autonomous driving simulator. We explore how Deep Neural Networks can represent meta-policies that switch among a set of previously learned policies, specifically in settings where the dynamics of a new scenario are composed of a mixture of previously learned dynamics and where the state observation is possibly corrupted by sensing noise. We also report the results of experiments varying dynamics mixes, distractor policies, magnitudes/distributions of sensing noise, and obstacles. In a fully observed experiment, the meta-policy learning algorithm achieves 2.6x the reward achieved by the next best policy composition technique with 80% less exploration.
In a partially observed experiment, the meta-policy learning algorithm converges after 50 iterations while a direct application of RL fails to converge even after 200 iterations.\nThe next leap of the internet, the Semantic Web, has already started. At its core, the Semantic Web transforms the document-oriented web into a data-oriented web enriched with semantics embedded as metadata. This change in perspective towards the web offers numerous benefits for the many data-intensive industries that are bound to the web and its related applications. These industries are diverse, ranging from oil and gas exploration to investigative journalism, and everything in between. This paper discusses eight different industries which currently reap the benefits of the Semantic Web. The paper also offers a future outlook on Semantic Web applications and discusses the areas in which the Semantic Web will play a key role in the future.\nWe study the relationship between geometry and capacity measures for deep neural networks from an invariance viewpoint. We introduce a new notion of capacity, the Fisher-Rao norm, that possesses desirable invariance properties and is motivated by Information Geometry. We discover an analytical characterization of the new capacity measure, through which we establish norm-comparison inequalities and further show that the new measure serves as an umbrella for several existing norm-based complexity measures. We discuss upper bounds on the generalization error induced by the proposed measure. Extensive numerical experiments on CIFAR-10 support our theoretical findings. Our theoretical analysis rests on a key structural lemma about partial derivatives of multi-layer rectifier networks.\nLong Short-Term Memory (LSTM) is a popular approach to boosting the ability of Recurrent Neural Networks to store longer-term temporal information. The capacity of an LSTM network can be increased by widening and adding layers.
However, usually the former introduces additional parameters, while the latter increases the runtime. As an alternative we propose the Tensorized LSTM in which the hidden states are represented by tensors and updated via a cross-layer convolution. By increasing the tensor size, the network can be widened efficiently without additional parameters since the parameters are shared across different locations in the tensor; by delaying the output, the network can be deepened implicitly with little additional runtime since deep computations for each timestep are merged into temporal computations of the sequence. Experiments conducted on five challenging sequence learning tasks show the potential of the proposed model.\nA remarkable feature of human beings is their capacity for creative behaviour, referring to their ability to react to problems in ways that are novel, surprising, and useful. Transformational creativity is a form of creativity where the creative behaviour is induced by a transformation of the actor's conceptual space, that is, the representational system with which the actor interprets its environment. In this report, we focus on ways of adapting systems of learned representations as they switch from performing one task to performing another. We describe an experimental comparison of multiple strategies for adaptation of learned features, and evaluate how effectively each of these strategies realizes the adaptation, in terms of the amount of training, and in terms of their ability to cope with restricted availability of training data. We show, among other things, that across handwritten digits, natural images, and classical music, adaptive strategies are systematically more effective than a baseline method that starts learning from scratch.\nTraining a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. 
In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we further improve performance by an additional 7% relative and eliminate confusion between different languages.\nRoboCup is an international scientific robot competition in which teams of multiple robots compete against each other. Its different leagues provide many sources of robotics data, that can be used for further analysis and application of machine learning. This paper describes a large dataset from games of some of the top teams (from 2016 and 2017) in RoboCup Soccer Simulation League (2D), where teams of 11 robots (agents) compete against each other. Overall, we used 10 different teams to play each other, resulting in 45 unique pairings. For each pairing, we ran 25 matches (of 10mins), leading to 1125 matches or more than 180 hours of game play. The generated CSV files are 17GB of data (zipped), or 229GB (unzipped). The dataset is unique in the sense that it contains both the ground truth data (global, complete, noise-free information of all objects on the field), as well as the noisy, local and incomplete percepts of each robot. 
These data are made available as CSV files, as well as in the original soccer simulator formats.\nWe propose a neural language model capable of unsupervised syntactic structure induction. The model leverages the structure information to form better semantic representations and better language modeling. Standard recurrent neural networks are limited by their structure and fail to efficiently use syntactic information. On the other hand, tree-structured recursive networks usually require additional structural supervision at the cost of human expert annotation. In this paper, we propose a novel neural language model, called the Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be directly back-propagated from the language model loss into the neural parsing network. Experiments show that the proposed model can discover the underlying syntactic structure and achieve state-of-the-art performance on word/character-level language model tasks.\nNeural networks (NNs) have begun to have a pervasive impact on various applications of machine learning. However, the problem of finding an optimal NN architecture for large applications has remained open for several decades. Conventional approaches search for the optimal NN architecture through extensive trial-and-error. Such a procedure is quite inefficient. In addition, the generated NN architectures incur substantial redundancy. To address these problems, we propose an NN synthesis tool (NeST) that automatically generates very compact architectures for a given dataset. NeST starts with a seed NN architecture. It iteratively tunes the architecture with gradient-based growth and magnitude-based pruning of neurons and connections. Our experimental results show that NeST yields accurate yet very compact NNs across a wide range of seed architectures.
For example, for the LeNet-300-100 (LeNet-5) NN architecture derived from the MNIST dataset, we reduce network parameters by 34.1x (74.3x) and floating-point operations (FLOPs) by 35.8x (43.7x). For the AlexNet NN architecture derived from the ImageNet dataset, we reduce network parameters by 15.7x and FLOPs by 4.6x. All these results are the current state of the art for these architectures.\nIn this paper, we study the representational power of deep neural networks (DNN) that belong to the family of piecewise-linear (PWL) functions, based on PWL activation units such as rectifier or maxout. We investigate the complexity of such networks by studying the number of linear regions of the PWL function. Typically, a PWL function from a DNN can be seen as a large family of linear functions acting on millions of such regions. We directly build upon the work of Montufar et al. (2014), Montufar (2017) and Raghu et al. (2017) by refining the upper and lower bounds on the number of linear regions for rectified and maxout networks. In addition to achieving tighter bounds, we also develop a novel method to perform exact enumeration or counting of the number of linear regions with a mixed-integer linear formulation that maps the input space to output. We use this new capability to visualize how the number of linear regions changes while training DNNs.\nState-of-the-art results on neural machine translation often use attentional sequence-to-sequence models with some form of convolution or recursion. Vaswani et al. (2017) propose a new architecture that avoids recurrence and convolution completely. Instead, it uses only self-attention and feed-forward layers. While the proposed architecture achieves state-of-the-art results on several machine translation tasks, it requires a large number of parameters and training iterations to converge.
We propose Weighted Transformer, a Transformer with modified attention layers, that not only outperforms the baseline network in BLEU score but also converges 15-40% faster. Specifically, we replace the multi-head attention by multiple self-attention branches that the model learns to combine during the training process. Our model improves the state-of-the-art performance by 0.5 BLEU points on the WMT 2014 English-to-German translation task and by 0.4 on the English-to-French translation task.\nWe present a novel technique for learning the mass matrices in samplers obtained from discretized dynamics that preserve some energy function. Existing adaptive samplers use Riemannian preconditioning techniques, where the mass matrices are functions of the parameters being sampled. This leads to significant complexities in the energy reformulations and resultant dynamics, often leading to implicit systems of equations and requiring inversion of high-dimensional matrices in the leapfrog steps. Our approach provides a simpler alternative, by using existing dynamics in the sampling step of a Monte Carlo EM framework, and learning the mass matrices in the M step with a novel online technique. We also propose a way to adaptively set the number of samples gathered in the E step, using sampling error estimates from the leapfrog dynamics. Along with a novel stochastic sampler based on Nos\\'{e}-Poincar\\'{e} dynamics, we use this framework with standard Hamiltonian Monte Carlo (HMC) as well as newer stochastic algorithms such as SGHMC and SGNHT, and show strong performance on synthetic and real high-dimensional sampling scenarios; we achieve sampling accuracies comparable to Riemannian samplers while being significantly faster.\nApproximate algorithms for structured prediction problems---such as the popular alpha-expansion algorithm (Boykov et al. 2001) in computer vision---typically far exceed their theoretical performance guarantees on real-world instances. 
These algorithms often find solutions that are very close to optimal. The goal of this paper is to partially explain the performance of alpha-expansion on MAP inference in Ferromagnetic Potts models (FPMs). Our main results use the connection between energy minimization in FPMs and the Uniform Metric Labeling problem to give a stability condition under which the alpha-expansion algorithm provably recovers the optimal MAP solution. This theoretical result complements the numerous empirical observations of alpha-expansion's performance. Additionally, we give a different stability condition under which an LP-based algorithm recovers the optimal solution.\nDeep reinforcement learning has achieved many recent successes, but our understanding of its strengths and limitations is hampered by the lack of rich environments in which we can fully characterize optimal behavior, and correspondingly diagnose individual actions against such a characterization. Here we consider a family of combinatorial games, arising from work of Erdos, Selfridge, and Spencer, and we propose their use as environments for evaluating and comparing different approaches to reinforcement learning. These games have a number of appealing features: they are challenging for current learning approaches, but they form (i) a low-dimensional, simply parametrized environment where (ii) there is a linear closed form solution for optimal behavior from any state, and (iii) the difficulty of the game can be tuned by changing environment parameters in an interpretable way. We use these Erdos-Selfridge-Spencer games not only to compare different algorithms, but test for generalization, make comparisons to supervised learning, analyse multiagent play, and even develop a self play algorithm.\nWe study the problem of learning overcomplete HMMs---those that have many hidden states but a small output alphabet. 
Despite having significant practical importance, such HMMs are poorly understood with no known positive or negative results for efficient learning. In this paper, we present several new results---both positive and negative---which help define the boundaries between the tractable and intractable settings. Specifically, we show positive results for a large subclass of HMMs whose transition matrices are sparse, well-conditioned, and have small probability mass on short cycles. On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree. We also discuss these results in the context of learning HMMs which can capture long-term dependencies.\nA major drawback of backpropagation through time (BPTT) is the difficulty of learning long-term dependencies, coming from having to propagate credit information backwards through every single step of the forward computation. This makes BPTT both computationally impractical and biologically implausible. For this reason, full backpropagation through time is rarely used on long sequences, and truncated backpropagation through time is used as a heuristic. However, this usually leads to biased estimates of the gradient in which longer term dependencies are ignored. Addressing this issue, we propose an alternative algorithm, Sparse Attentive Backtracking, which might also be related to principles used by brains to learn long-term dependencies. Sparse Attentive Backtracking learns an attention mechanism over the hidden states of the past and selectively backpropagates through paths with high attention weights. 
This allows the model to learn long term dependencies while only backtracking for a small number of time steps, not just from the recent past but also from attended relevant past states.\nThe main challenge of online multi-object tracking is to reliably associate object trajectories with detections in each video frame based on their tracking history. In this work, we propose the Recurrent Autoregressive Network (RAN), a temporal generative modeling framework to characterize the appearance and motion dynamics of multiple objects over time. The RAN couples an external memory and an internal memory. The external memory explicitly stores previous inputs of each trajectory in a time window, while the internal memory learns to summarize long-term tracking history and associate detections by processing the external memory. We conduct experiments on the MOT 2015 and 2016 datasets to demonstrate the robustness of our tracking method in highly crowded and occluded scenes. Our method achieves top-ranked results on the two benchmarks.\nRecurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. Sparsity is a technique to reduce compute and memory requirements of deep learning models. Sparse RNNs are easier to deploy on devices and high-end server processors. Even though sparse operations need less compute and memory relative to their dense counterparts, the speed-up observed by using sparse operations is less than expected on different hardware platforms. In order to address this issue, we investigate two different approaches to induce block sparsity in RNNs: pruning blocks of weights in a layer and using group lasso regularization to create blocks of weights with zeros. Using these techniques, we demonstrate that we can create block-sparse RNNs with sparsity ranging from 80% to 90% with small loss in accuracy. This allows us to reduce the model size by roughly 10x. 
Additionally, we can prune a larger dense network to recover this loss in accuracy while maintaining high block sparsity and reducing the overall parameter count. Our technique works with a variety of block sizes up to 32x32. Block-sparse RNNs eliminate overheads related to data storage and irregular memory accesses while increasing hardware efficiency compared to unstructured sparsity.\nWe improve the performance of the American Fuzzy Lop (AFL) fuzz testing framework by using Generative Adversarial Network (GAN) models to reinitialize the system with novel seed files. We assess performance based on the temporal rate at which we produce novel and unseen code paths. We compare this approach to seed file generation from a random draw of bytes observed in the training seed files. The code path lengths and variations were not sufficiently diverse to fully replace AFL input generation. However, augmenting native AFL with these additional code paths demonstrated improvements over AFL alone. Specifically, experiments showed the GAN was faster and more effective than the LSTM and out-performed a random augmentation strategy, as measured by the number of unique code paths discovered. GAN helps AFL discover 14.23% more code paths than the random strategy in the same amount of CPU time, finds 6.16% more unique code paths, and finds paths that are on average 13.84% longer. Using GAN shows promise as a reinitialization strategy for AFL to help the fuzzer exercise deep paths in software.\nAutonomous agents optimize the reward function we give them. What they don't know is how hard it is for us to design a reward function that actually captures what we want. When designing the reward, we might think of some specific training scenarios, and make sure that the reward will lead to the right behavior in those scenarios. Inevitably, agents encounter new scenarios (e.g., new types of terrain) where optimizing that same reward may lead to undesired behavior. 
Our insight is that reward functions are merely observations about what the designer actually wants, and that they should be interpreted in the context in which they were designed. We introduce inverse reward design (IRD) as the problem of inferring the true objective based on the designed reward and the training MDP. We introduce approximate methods for solving IRD problems, and use their solution to plan risk-averse behavior in test MDPs. Empirical results suggest that this approach can help alleviate negative side effects of misspecified reward functions and mitigate reward hacking.\nSparsity inducing regularization is an important part for learning over-complete visual representations. Despite the popularity of $\\ell_1$ regularization, in this paper, we investigate the usage of non-convex regularizations in this problem. Our contribution consists of three parts. First, we propose the leaky capped norm regularization (LCNR), which allows model weights below a certain threshold to be regularized more strongly as opposed to those above, therefore imposes strong sparsity and only introduces controllable estimation bias. We propose a majorization-minimization algorithm to optimize the joint objective function. Second, our study over monocular 3D shape recovery and neural networks with LCNR outperforms $\\ell_1$ and other non-convex regularizations, achieving state-of-the-art performance and faster convergence. Third, we prove a theoretical global convergence speed on the 3D recovery problem. To the best of our knowledge, this is the first convergence analysis of the 3D recovery problem.\nThis paper deals with unsupervised clustering with feature selection. The problem is to estimate both labels and a sparse projection matrix of weights. To address this combinatorial non-convex problem maintaining a strict control on the sparsity of the matrix of weights, we propose an alternating minimization of the Frobenius norm criterion. 
We provide a new efficient algorithm named K-sparse which alternates k-means with projection-gradient minimization. The projection-gradient step is a method of splitting type, with exact projection on the $\\ell^1$ ball to promote sparsity. The convergence of the gradient-projection step is addressed, and a preliminary analysis of the alternating minimization is made. The Frobenius norm criterion converges as the number of iterates in Algorithm K-sparse goes to infinity. Experiments on Single Cell RNA sequencing datasets show that our method significantly improves the results of PCA k-means, spectral clustering, SIMLR, and Sparcl methods, and achieves a relevant selection of genes. The complexity of K-sparse is linear in the number of samples (cells), so that the method scales up to large datasets.\nRoguelike games generally feature exploration problems as a critical, yet often repetitive element of gameplay. Automated approaches, however, face challenges in terms of optimality, as well as due to incomplete information, such as from the presence of secret doors. This paper presents an algorithmic approach to exploration of roguelike dungeon environments. Our design aims to minimize exploration time, balancing coverage and discovery of secret areas with resource cost. Our algorithm is based on the concept of occupancy maps popular in robotics, adapted to encourage efficient discovery of secret access points. Through extensive experimentation on NetHack maps we show that this technique is significantly more efficient than simpler greedy approaches. We further investigate optimized parameterization for the algorithm through a comprehensive data analysis. These results point towards better automation for players as well as heuristics applicable to fully automated gameplay.\nWe consider stochastic multi-armed bandit problems with graph feedback, where the decision maker is allowed to observe the neighboring actions of the chosen action. 
We allow the graph structure to vary with time and consider both deterministic and Erd\\H{o}s-R\\'enyi random graph models. For such a graph feedback model, we first present a novel analysis of Thompson sampling that leads to a tighter performance bound than existing work. Next, we propose new Information Directed Sampling based policies that are graph-aware in their decision making. Under the deterministic graph case, we establish a Bayesian regret bound for the proposed policies that scales with the clique cover number of the graph instead of the number of actions. Under the random graph case, we provide a Bayesian regret bound for the proposed policies that scales with the ratio of the number of actions over the expected number of observations per iteration. To the best of our knowledge, this is the first analytical result for stochastic bandits with random graph feedback. Finally, using numerical evaluations, we demonstrate that our proposed IDS policies outperform existing approaches, including adaptations of upper confidence bound, $\\epsilon$-greedy and Exp3 algorithms.\nCloze test is widely adopted in language exams to evaluate students' language proficiency. In this paper, we propose the first large-scale human-designed cloze test dataset CLOTH, in which the questions were used in middle-school and high-school language exams. With the missing blanks carefully created by teachers and candidate choices purposely designed to be confusing, CLOTH requires a deeper language understanding and a wider attention span than previous automatically generated cloze datasets. We show humans outperform specifically designed baseline models by a significant margin, even when the model is trained on sufficiently large external data.
We investigate the source of the performance gap, trace model deficiencies to some distinct properties of CLOTH, and identify the limited ability to comprehend a long-term context as the key bottleneck.\nComputational models of decision-making must contend with the variance of context and any number of possible decisions that a defined strategic actor can make at a given time. Relying on cognitive science theory, the authors have created an algorithm that captures the orientation of the actor towards an object and arrays the possible decisions available to that actor based on their given intersubjective orientation. This algorithm, like a traditional K-means clustering algorithm, relies on a core-periphery structure that gives the likelihood of moves as those closest to the cluster's centroid. The result is an algorithm that enables unsupervised classification of an array of decision points belonging to an actor's present state and deeply rooted in cognitive science theory.\nProgram synthesis is a class of regression problems where one seeks a solution, in the form of a source-code program, mapping the inputs to their corresponding outputs exactly. Due to its precise and combinatorial nature, program synthesis is commonly formulated as a constraint satisfaction problem, where input-output examples are encoded as constraints and solved with a constraint solver. A key challenge of this formulation is scalability: while constraint solvers work well with a few well-chosen examples, a large set of examples can incur significant overhead in both time and memory. We describe a method to discover a subset of examples that is both small and representative: the subset is constructed iteratively, using a neural network to predict the probability of unchosen examples conditioned on the chosen examples in the subset, and greedily adding the least probable example.
We empirically evaluate the representativeness of the subsets constructed by our method, and demonstrate such subsets can significantly improve synthesis time and stability.\nNetwork integration studies try to assess the impact of future developments, such as the increase of Renewable Energy Sources or the introduction of Smart Grid Technologies, on large-scale network areas. Goals can be to support strategic alignment in the regulatory framework or to adapt the network planning principles of Distribution System Operators. This study outlines an approach for the automated distribution system planning that can calculate network reconfiguration, reinforcement and extension plans in a fully automated fashion. This allows the estimation of the expected cost in massive probabilistic simulations of large numbers of real networks and constitutes a core component of a framework for large-scale network integration studies. Exemplary case study results are presented that were performed in cooperation with different major distribution system operators. The case studies cover the estimation of expected network reinforcement costs, technical and economical assessment of smart grid technologies and structural network optimisation.\nOntology engineering is a hard and error-prone task, in which small changes may lead to errors, or even produce an inconsistent ontology. As ontologies grow in size, the need for automated methods for repairing inconsistencies while preserving as much of the original knowledge as possible increases. Most previous approaches to this task are based on removing a few axioms from the ontology to regain consistency. We propose a new method based on weakening these axioms to make them less restrictive, employing the use of refinement operators. We introduce the theoretical framework for weakening DL ontologies, propose algorithms to repair ontologies based on the framework, and provide an analysis of the computational complexity. 
Through an empirical analysis made over real-life ontologies, we show that our approach preserves significantly more of the original knowledge of the ontology than removing axioms.\nKnowledge Graphs (KGs) have been applied to many tasks including Web search, link prediction, recommendation, natural language processing, and entity linking. However, most KGs are far from complete and are growing at a rapid pace. To address these problems, Knowledge Graph Completion (KGC) has been proposed to improve KGs by filling in its missing connections. Unlike existing methods which hold a closed-world assumption, i.e., where KGs are fixed and new entities cannot be easily added, in the present work we relax this assumption and propose a new open-world KGC task. As a first attempt to solve this task we introduce an open-world KGC model called ConMask. This model learns embeddings of the entity's name and parts of its text-description to connect unseen entities to the KG. To mitigate the presence of noisy text descriptions, ConMask uses a relationship-dependent content masking to extract relevant snippets and then trains a fully convolutional neural network to fuse the extracted snippets with entities in the KG. Experiments on large data sets, both old and new, show that ConMask performs well in the open-world KGC task and even outperforms existing KGC models on the standard closed-world KGC task.\nFor applications as varied as Bayesian neural networks, determinantal point processes, elliptical graphical models, and kernel learning for Gaussian processes (GPs), one must compute a log determinant of an $n \\times n$ positive definite matrix, and its derivatives - leading to prohibitive $\\mathcal{O}(n^3)$ computations. We propose novel $\\mathcal{O}(n)$ approaches to estimating these quantities from only fast matrix vector multiplications (MVMs). 
These stochastic approximations are based on Chebyshev, Lanczos, and surrogate models, and converge quickly even for kernel matrices that have challenging spectra. We leverage these approximations to develop a scalable Gaussian process approach to kernel learning. We find that Lanczos is generally superior to Chebyshev for kernel learning, and that a surrogate approach can be highly efficient and accurate with popular kernels.\nRepresenting the semantics of words is a long-standing problem for the natural language processing community. Most methods compute word semantics given their textual context in large corpora. More recently, researchers attempted to integrate perceptual and visual features. Most of these works consider the visual appearance of objects to enhance word representations but they ignore the visual environment and context in which objects appear. We propose to unify text-based techniques with vision-based techniques by simultaneously leveraging textual and visual context to learn multimodal word embeddings. We explore various choices for what can serve as a visual context and present an end-to-end method to integrate visual context elements in a multimodal skip-gram model. We provide experiments and extensive analysis of the obtained results.\nWe propose a new splitting criterion for a meta-learning approach to multiclass classifier design that adaptively merges the classes into a tree-structured hierarchy of increasingly difficult binary classification problems. The classification tree is constructed from empirical estimates of the Henze-Penrose bounds on the pairwise Bayes misclassification rates that rank the binary subproblems in terms of difficulty of classification. The proposed empirical estimates of the Bayes error rate are computed from the minimal spanning tree (MST) of the samples from each pair of classes. 
Moreover, a meta-learning technique is presented for quantifying the one-vs-rest Bayes error rate for each individual class from a single MST on the entire dataset. Extensive simulations on benchmark datasets show that the proposed hierarchical method can often be learned much faster than competing methods, while achieving competitive accuracy.\nThis paper proposes a computational approach for analysis of strokes in line drawings by artists. We aim at developing an AI methodology that facilitates attribution of drawings of unknown authors in a way that is not easily deceived by forged art. The methodology used is based on quantifying the characteristics of individual strokes in drawings. We propose a novel algorithm for segmenting individual strokes. We designed and compared different hand-crafted and learned features for the task of quantifying stroke characteristics. We also propose and compare different classification methods at the drawing level. We experimented with a dataset of 300 digitized drawings with over 80 thousand strokes. The collection mainly consisted of drawings of Pablo Picasso, Henri Matisse, and Egon Schiele, besides a small number of representative works of other artists. The experiments show that the proposed methodology can classify individual strokes with accuracy 70%-90%, and aggregate over drawings with accuracy above 80%, while being robust against fakes (with accuracy 100% for detecting fakes in most settings).\nThe multi-armed bandit problem has been extensively studied under the stationary assumption. However, in reality, this assumption often does not hold because the distributions of rewards themselves may change over time. In this paper, we propose a change-detection (CD) based framework for multi-armed bandit problems under the piecewise-stationary setting, and study a class of change-detection based UCB (Upper Confidence Bound) policies, CD-UCB, that actively detect change points and restart the UCB indices.
We then develop CUSUM-UCB and PHT-UCB, which belong to the CD-UCB class and use cumulative sum (CUSUM) and Page-Hinkley Test (PHT) to detect changes. We show that CUSUM-UCB obtains the best known regret upper bound under mild assumptions. We also demonstrate the regret reduction of the CD-UCB policies over arbitrary Bernoulli rewards and Yahoo! datasets of webpage click-through rates.\nWith an abundance of research papers in deep learning, reproducibility or adoption of the existing works becomes a challenge. This is due to the lack of open source implementations provided by the authors. Further, re-implementing research papers in a different library is a daunting task. To address these challenges, we propose a novel extensible approach, DLPaper2Code, to extract and understand deep learning design flow diagrams and tables available in a research paper and convert them to an abstract computational graph. The extracted computational graph is then converted into execution-ready source code in both Keras and Caffe, in real-time. An arXiv-like website is created where the automatically generated designs are made publicly available for 5,000 research papers. The generated designs could be rated and edited using an intuitive drag-and-drop UI framework in a crowdsourced manner. To evaluate our approach, we create a simulated dataset with over 216,000 valid design visualizations using a manually defined grammar. Experiments on the simulated dataset show that the proposed framework provides more than $93\\%$ accuracy in flow diagram content extraction.\nWe study the performance of stochastically trained deep neural networks (DNNs) whose synaptic weights are implemented using emerging memristive devices that exhibit limited dynamic range, resolution, and variability in their programming characteristics. We show that a key device parameter to optimize the learning efficiency of DNNs is the variability in its programming characteristics.
DNNs with such memristive synapses, even with dynamic range as low as $15$ and only $32$ discrete levels, when trained based on stochastic updates suffer less than $3\\%$ loss in accuracy compared to floating point software baseline. We also study the performance of stochastic memristive DNNs when used as inference engines with noise corrupted data and find that if the device variability can be minimized, the relative degradation in performance for the Stochastic DNN is better than that of the software baseline. Hence, our study presents a new optimization corner for memristive devices for building large noise-immune deep learning systems.\nIntrinsic decomposition from a single image is a highly challenging task, due to its inherent ambiguity and the scarcity of training data. In contrast to traditional fully supervised learning approaches, in this paper we propose learning intrinsic image decomposition by explaining the input image. Our model, the Rendered Intrinsics Network (RIN), joins together an image decomposition pipeline, which predicts reflectance, shape, and lighting conditions given a single image, with a recombination function, a learned shading model used to recompose the original input based off of intrinsic image predictions. Our network can then use unsupervised reconstruction error as an additional signal to improve its intermediate representations. This allows large-scale unlabeled data to be useful during training, and also enables transferring learned knowledge to images of unseen object categories, lighting conditions, and shapes. Extensive experiments demonstrate that our method performs well on both intrinsic image decomposition and knowledge transfer.\nWe introduce models for saliency prediction for mobile user interfaces. A mobile interface may include elements like buttons, text, etc. in addition to natural images which enable performing a variety of tasks. Saliency in natural images is a well studied area. 
However, given the difference in what constitutes a mobile interface, and the usage context of these devices, we postulate that saliency prediction for mobile interface images requires a fresh approach. Mobile interface design involves operating on elements, the building blocks of the interface. We first collected eye-gaze data from mobile devices for a free-viewing task. Using this data, we develop a novel autoencoder-based multi-scale deep learning model that provides saliency prediction at the mobile interface element level. Compared to saliency prediction approaches developed for natural images, we show that our approach performs significantly better on a range of established metrics.\nIn this paper we deal with the problem of extending Zadeh's operators on fuzzy sets (FSs) to interval-valued (IVFSs), set-valued (SVFSs) and type-2 (T2FSs) fuzzy sets. Namely, it is known that seeing FSs as SVFSs, or T2FSs, whose membership degrees are singletons is not order-preserving. We then describe a family of lattice embeddings from FSs to SVFSs. Alternatively, if the former singleton viewpoint is required, we reformulate the intersection on hesitant fuzzy sets and introduce what we have called closed-valued fuzzy sets. This new type of fuzzy sets extends standard union and intersection on FSs. In addition, it allows handling together membership degrees of different nature as, for instance, closed intervals and finite sets. Finally, all these constructions are viewed as T2FSs forming a chain of lattices.\nA temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient.
However, if the option set for the task is not ideal, and cannot express the primitive optimal policy exactly, shorter options offer more flexibility and can yield a better solution. Thus, the termination condition puts learning efficiency at odds with solution quality. We propose to resolve this dilemma by decoupling the behavior and target terminations, just like it is done with policies in off-policy learning. To this end, we give a new algorithm, Q(\\beta), that learns the solution with respect to any termination condition, regardless of how the options actually terminate. We derive Q(\\beta) by casting learning with options into a common framework with well-studied multi-step off-policy learning. We validate our algorithm empirically, and show that it holds up to its motivating claims.\nIn this work we propose a new method for the rhythm classification of short single-lead ECG records, using a set of high-level and clinically meaningful features provided by the abductive interpretation of the records. These features include morphological and rhythm-related features that are used to build two classifiers: one that evaluates the record globally, using aggregated values for each feature; and another one that evaluates the record as a sequence, using a Recurrent Neural Network fed with the individual features for each detected heartbeat. The two classifiers are finally combined using the stacking technique, providing an answer by means of four target classes: Normal sinus rhythm, Atrial fibrillation, Other anomaly, and Noisy. The approach has been validated against the 2017 Physionet/CinC Challenge dataset, obtaining a final score of 0.83 and ranking first in the competition.\nWe introduce CARLA, an open-source simulator for autonomous driving research. CARLA has been developed from the ground up to support development, training, and validation of autonomous urban driving systems. 
In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely. The simulation platform supports flexible specification of sensor suites and environmental conditions. We use CARLA to study the performance of three approaches to autonomous driving: a classic modular pipeline, an end-to-end model trained via imitation learning, and an end-to-end model trained via reinforcement learning. The approaches are evaluated in controlled scenarios of increasing difficulty, and their performance is examined via metrics provided by CARLA, illustrating the platform's utility for autonomous driving research. The supplementary video can be viewed at https://youtu.be/Hp8Dz-Zek2E\nRapid advances of hardware-based technologies during the past decades have opened up new possibilities for Life scientists to gather multimodal data in various application domains (e.g., Omics, Bioimaging, Medical Imaging, and [Brain/Body]-Machine Interfaces), thus generating novel opportunities for development of dedicated data intensive machine learning techniques. Overall, recent research in Deep learning (DL), Reinforcement learning (RL), and their combination (Deep RL) promise to revolutionize Artificial Intelligence. The growth in computational power accompanied by faster and increased data storage and declining computing costs have already allowed scientists in various fields to apply these techniques on datasets that were previously intractable for their size and complexity. This review article provides a comprehensive survey on the application of DL, RL, and Deep RL techniques in mining Biological data. In addition, we compare performances of DL techniques when applied to different datasets across various application domains. 
Finally, we outline open issues in this challenging research area and discuss future development perspectives.\nTo efficiently answer queries, datalog systems often materialise all consequences of a datalog program, so the materialisation must be updated whenever the input facts change. Several solutions to the materialisation update problem have been proposed. The Delete/Rederive (DRed) and the Backward/Forward (B/F) algorithms solve this problem for general datalog, but both contain steps that evaluate rules 'backwards' by matching their heads to a fact and evaluating the partially instantiated rule bodies as queries. We show that this can be a considerable source of overhead even on very small updates. In contrast, the Counting algorithm does not evaluate the rules 'backwards', but it can handle only nonrecursive rules. We present two hybrid approaches that combine DRed and B/F with Counting so as to reduce or even eliminate 'backward' rule evaluation while still handling arbitrary datalog programs. We show empirically that our hybrid algorithms are usually significantly faster than existing approaches, sometimes by orders of magnitude.\nIn recent years, there has been an increasing interest in extending traditional stream processing engines with logical, rule-based, reasoning capabilities. This poses significant theoretical and practical challenges since rules can derive new information and propagate it both towards past and future time points; as a result, streamed query answers can depend on data that has not yet been received, as well as on data that arrived far in the past. Stream reasoning algorithms, however, must be able to stream out query answers as soon as possible, and can only keep a limited number of previous input facts in memory. 
In this paper, we propose novel reasoning problems to deal with these challenges, and study their computational properties on Datalog extended with a temporal sort and the successor function (a core rule-based language for stream reasoning applications).\nWithin-Class Covariance Normalization (WCCN) is a powerful post-processing method for normalizing the within-class covariance of a set of data points. WCCN projects the observations into a linear sub-space where the within-class variability is reduced. This property has proven to be beneficial in subsequent recognition tasks. The central idea of this paper is to reformulate the classic WCCN as a Deep Neural Network (DNN) compatible version. We propose the Deep Within-Class Covariance Analysis (DWCCA), which can be incorporated in a DNN architecture. This formulation enables us to exploit the beneficial properties of WCCN, and still allows for training with Stochastic Gradient Descent (SGD) in an end-to-end fashion. We investigate the advantages of DWCCA on deep neural networks with convolutional layers for supervised learning. Our results on Acoustic Scene Classification show that via DWCCA we can achieve equal or superior performance in a VGG-style deep neural network.\nWe search for digital biomarkers from Parkinson's Disease by observing approximate repetitive patterns matching hypothesized step and stride periodic cycles. These observations were modeled as a cycle of hidden states with randomness allowing deviation from a canonical pattern of transitions and emissions, under the hypothesis that the averaged features of hidden states would serve to informatively characterize classes of patients/controls. We propose a Hidden Semi-Markov Model (HSMM), a latent-state model, emitting 3D-acceleration vectors. Transitions and emissions are inferred from data. We fit separate models per unique device and training label.
Hidden Markov Models (HMM) force geometric distributions of the duration spent at each state before transition to a new state. Instead, our HSMM allows us to specify the distribution of state duration. This modified version is more effective because we are more interested in each state's duration than in the sequence of distinct states, allowing inclusion of these durations in the feature vector.\nTraining a personalized dialogue system requires a lot of data, and the data collected for a single user is usually insufficient. One common practice for this problem is to share training dialogues between different users and train multiple sequence-to-sequence dialogue models together with transfer learning. However, current sequence-to-sequence transfer learning models operate on the entire sentence, which might cause negative transfer if different personal information from different users is mixed up. We propose a personalized decoder model to transfer finer granularity phrase-level knowledge between different users while keeping personal preferences of each user intact. A novel personal control gate is introduced, enabling the personalized decoder to switch between generating personalized phrases and shared phrases. The proposed personalized decoder model can be easily combined with various deep models and can be trained with reinforcement learning. Real-world experimental results demonstrate that the phrase-level personalized decoder improves the BLEU score over multiple sentence-level transfer baseline models by as much as 7.5%.\nGenerating emotional language is a key step towards building empathetic natural language processing agents. However, a major challenge for this line of research is the lack of large-scale labeled training data, and previous studies are limited to only small sets of human annotated sentiment labels. Additionally, explicitly controlling the emotion and sentiment of generated text is also difficult. 
In this paper, we take a more radical approach: we exploit the idea of leveraging Twitter data that are naturally labeled with emojis. More specifically, we collect a large corpus of Twitter conversations that include emojis in the response, and assume the emojis convey the underlying emotions of the sentence. We then introduce a reinforced conditional variational encoder approach to train a deep generative model on these conversations, which allows us to use emojis to control the emotion of the generated text. Experimentally, we show in our quantitative and qualitative analyses that the proposed models can successfully generate high-quality abstractive conversation responses in accordance with designated emotions.\nThe LocatedNear relation describes two typically co-located objects, which is a type of useful commonsense knowledge for computer vision, natural language understanding, machine comprehension, etc. We propose to automatically extract such relationships through a sentence-level classifier by aggregating the scores of entity pairs detected from a large number of sentences. To enable research on these tasks, we release two benchmark datasets, one containing 5,000 sentences annotated with whether a mentioned entity pair has the LocatedNear relation in the given sentence or not; the other containing 500 pairs of physical objects and whether they are commonly located nearby. We also propose some baseline methods for the tasks and compare the results with a state-of-the-art general-purpose relation classifier.\nTracking with a Pan-Tilt-Zoom (PTZ) camera has been a research topic in computer vision for many years. Compared to tracking with a still camera, the images captured with a PTZ camera are highly dynamic in nature because the camera can perform large motions resulting in quickly changing capture conditions. Furthermore, tracking with a PTZ camera involves camera control to position the camera on the target. 
For successful tracking and camera control, the tracker must be fast enough or must be able to accurately predict the next position of the target. Therefore, standard benchmarks do not allow a proper assessment of tracker quality for the PTZ scenario. In this work, we use a virtual PTZ framework to evaluate different tracking algorithms and compare their performances. We also extend the framework to add target position prediction for the next frame, accounting for camera motion and processing delays. By doing this, we can assess if predicting can make long-term tracking more robust as it may help slower algorithms keep the target in the field of view of the camera. Results confirm that both speed and robustness are required for tracking under the PTZ scenario.\nThe quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.\nJuba recently proposed a formulation of learning abductive reasoning from examples, in which both the relative plausibility of various explanations, as well as which explanations are valid, are learned directly from data. 
The main shortcoming of this formulation of the task is that it assumes access to full-information (i.e., fully specified) examples; relatedly, it offers no role for declarative background knowledge, as such knowledge is rendered redundant in the abduction task by complete information. In this work, we extend the formulation to utilize such partially specified examples, along with declarative background knowledge about the missing data. We show that it is possible to use implicitly learned rules together with the explicitly given declarative knowledge to support hypotheses in the course of abduction. We observe that when a small explanation exists, it is possible to obtain a much-improved guarantee in the challenging exception-tolerant setting. Such small, human-understandable explanations are of particular interest for potential applications of the task.\nNeural networks have recently had a lot of success for many tasks. However, neural network architectures that perform well are still typically designed manually by experts in a cumbersome trial-and-error process. We propose a new method to automatically search for well-performing CNN architectures based on a simple hill climbing procedure whose operators apply network morphisms, followed by short optimization runs by cosine annealing. Surprisingly, this simple method yields competitive results, despite only requiring resources in the same order of magnitude as training a single network. E.g., on CIFAR-10, our method designs and trains networks with an error rate below 6% in only 12 hours on a single GPU; training for one day reduces this error further, to almost 5%.\nIn the paper, a parallel Tabu Search algorithm for the Resource Constrained Project Scheduling Problem is proposed. To deal with this NP-hard combinatorial problem many optimizations have been performed. For example, a resource evaluation algorithm is selected by a heuristic and an effective Tabu List was designed. 
In addition to that, a capacity-indexed resource evaluation algorithm was proposed and the GPU (Graphics Processing Unit) version uses a homogeneous model to reduce the required communication bandwidth. According to the experiments, the GPU version outperforms the optimized parallel CPU version with respect to the computational time and the quality of solutions. In comparison with other existing heuristics, the proposed solution often gives better quality solutions.\nTraining automatic speech recognition (ASR) systems requires large amounts of data in the target language in order to achieve good performance. Whereas large training corpora are readily available for languages like English, there exists a long tail of languages which do suffer from a lack of resources. One method to handle data sparsity is to use data from additional source languages and build a multilingual system. Recently, ASR systems based on recurrent neural networks (RNNs) trained with connectionist temporal classification (CTC) have gained substantial research interest. In this work, we extended our previous approach towards training CTC-based systems multilingually. Our systems feature a global phone set, based on the joint phone sets of each source language. We evaluated the use of different language combinations as well as the addition of Language Feature Vectors (LFVs). As a contrastive experiment, we built systems based on graphemes as well. Systems having a multilingual phone set are known to suffer in performance compared to their monolingual counterparts. With our proposed approach, we could reduce the gap between these mono- and multilingual setups, using either graphemes or phonemes.\nIn this work, we focus on multilingual systems based on recurrent neural networks (RNNs), trained using the Connectionist Temporal Classification (CTC) loss function. Using a multilingual set of acoustic units poses difficulties. 
To address this issue, we proposed Language Feature Vectors (LFVs) to train language-adaptive multilingual systems. Language adaptation, in contrast to speaker adaptation, needs to be applied not only on the feature level, but also to deeper layers of the network. In this work, we therefore extended our previous approach by introducing a novel technique which we call \"modulation\". Based on this method, we modulated the hidden layers of RNNs using LFVs. We evaluated this approach in both full and low-resource conditions, as well as for grapheme- and phone-based systems. Lower error rates throughout the different conditions could be achieved by the use of the modulation.\nThis paper considers two important problems, one on the supply side and one on the demand side, and studies both in a unified framework. On the supply side, we study the problem of energy sharing among microgrids with the goal of maximizing profit obtained from selling power while meeting customer demand. On the other hand, under a power shortage, this problem becomes one of deciding the amount of power to be bought with dynamically varying prices. On the demand side, we consider the problem of optimally scheduling the time-adjustable demand, i.e., loads with flexible time windows in which they can be scheduled. While previous works have treated these two problems in isolation, we combine these problems together and provide, for the first time in the literature, a unified Markov decision process (MDP) framework for these problems. We then apply the Q-learning algorithm, a popular model-free reinforcement learning technique, to obtain the optimal policy. Through simulations, we show that our model outperforms the traditional power sharing models.\nThis article presents the use of Answer Set Programming (ASP) to mine sequential patterns. 
ASP is a high-level declarative logic programming paradigm for encoding combinatorial and optimization problems as well as for knowledge representation and reasoning. Thus, ASP is a good candidate for implementing pattern mining with background knowledge, which has been a data mining issue for a long time. We propose encodings of the classical sequential pattern mining tasks within two representations of embeddings (fill-gaps vs skip-gaps) and for various kinds of patterns: frequent, constrained and condensed. We compare the computational performance of these encodings with each other to get a good insight into the efficiency of ASP encodings. The results show that the fill-gaps strategy is better on real problems due to lower memory consumption. Finally, compared to a constraint programming approach (CPSM), another declarative programming paradigm, our proposal showed comparable performance.\nRecent industry reports attest to the rise of web robots, which comprise more than half of the total web traffic. They not only threaten the security, privacy and efficiency of the web but they also distort analytics and metrics, casting doubt on the veracity of the information being promoted. In the academic publishing domain, this can cause articles to be falsely presented as prominent and influential. In this paper, we present our approach to detecting web robots in academic publishing websites. We use different supervised learning algorithms with a variety of characteristics derived from both the log files of the server and the content served by the website. Our approach relies on the assumption that human users will be interested in specific domains or articles, while web robots crawl a web library incoherently. We experiment with features adopted in previous studies with the addition of novel semantic characteristics which are derived by performing a semantic analysis using the Latent Dirichlet Allocation (LDA) algorithm. 
Our real-world case study shows promising results, pinpointing the significance of semantic features in the web robot detection problem.\nA popular recent approach to answering open-domain questions is to first search for question-related passages and then apply reading comprehension models to extract answers. Existing methods usually extract answers from single passages independently. But some questions require a combination of evidence from across different sources to answer correctly. In this paper, we propose two models which make use of multiple passages to generate their answers. Both use an answer-reranking approach which reorders the answer candidates generated by an existing state-of-the-art QA model. We propose two methods, namely, strength-based re-ranking and coverage-based re-ranking, to make use of the aggregated evidence from different passages to better determine the answer. Our models have achieved state-of-the-art results on three public open-domain QA datasets: Quasar-T, SearchQA and the open-domain version of TriviaQA, with about 8 percentage points of improvement over the former two datasets.\nNeuromorphic hardware tends to pose limits on the connectivity of deep networks that one can run on it. Generic hardware and software implementations of deep learning also run more efficiently for sparse networks. Several methods exist for pruning connections of a neural network after it was trained without connectivity constraints. We present an algorithm, DEEP R, that enables us to directly train a sparsely connected neural network. DEEP R automatically rewires the network during supervised training so that connections are placed where they are most needed for the task, while their total number remains strictly bounded at all times. We demonstrate that DEEP R can be used to train very sparse feedforward and recurrent neural networks on standard benchmark tasks with just a minor loss in performance. 
DEEP R is based on a rigorous theoretical foundation that views rewiring as stochastic sampling of network configurations from a posterior.\nHumans process visual scenes selectively and sequentially using attention. Central to models of human visual attention is the saliency map. We propose a hierarchical visual architecture that operates on a saliency map and uses a novel attention mechanism to sequentially focus on salient regions and take additional glimpses within those regions. The architecture is motivated by human visual attention, and is used for multi-label image classification on a novel multiset task, demonstrating that it achieves high precision and recall while localizing objects with its attention. Unlike conventional multi-label image classification models, the model supports multiset prediction due to a reinforcement-learning based training process that allows for arbitrary label permutation and multiple instances per label.\nInspired by the magic sets for Datalog, we present a novel goal-driven approach for answering queries over terminating existential rules with equality (aka TGDs and EGDs). Our technique improves the performance of query answering by pruning the consequences that are not relevant for the query. This is challenging in our setting because equalities can potentially affect all predicates in a dataset. We address this problem by combining the existing singularization technique with two new ingredients: an algorithm for identifying the rules relevant to a query and a new magic sets algorithm. We show empirically that our technique can significantly improve the performance of query answering, and that it can mean the difference between answering a query in a few seconds or not being able to process the query at all.\nSemantic parsers translate language utterances to programs, but are often trained from utterance-denotation pairs only. 
Consequently, parsers must overcome the problem of spuriousness at training time, where an incorrect program found at search time accidentally leads to a correct denotation. We propose that in small well-typed domains, we can semi-automatically generate an abstract representation for examples that facilitates information sharing across examples. This alleviates spuriousness, as the probability of randomly obtaining a correct answer from a program decreases across multiple examples. We test our approach on CNLVR, a challenging visual reasoning dataset, where spuriousness is central because denotations are either TRUE or FALSE, and thus random programs have high probability of leading to a correct denotation. We develop the first semantic parser for this task and reach 83.5% accuracy, a 15.7% absolute accuracy improvement compared to the best reported accuracy so far.\nWe study the problem of multiset prediction. The goal of multiset prediction is to train a predictor that maps an input to a multiset consisting of multiple items. Unlike existing problems in supervised learning, such as classification, ranking and sequence generation, there is no known order among items in a target multiset, and each item in the multiset may appear more than once, making this problem extremely challenging. In this paper, we propose a novel multiset loss function by viewing this problem from the perspective of sequential decision making. The proposed multiset loss function is empirically evaluated on two families of datasets, one synthetic and the other real, with varying levels of difficulty, against various baseline loss functions including reinforcement learning, sequence, and aggregated distribution matching loss functions. The experiments reveal the effectiveness of the proposed loss function over the others.\nWe address the problem of learning vector representations for entities and relations in Knowledge Graphs (KGs) for Knowledge Base Completion (KBC). 
This problem has received significant attention in the past few years and multiple methods have been proposed. Most of the existing methods in the literature use a predefined characteristic scoring function for evaluating the correctness of KG triples. These scoring functions distinguish correct triples (high score) from incorrect ones (low score). However, their performance varies across different datasets. In this work, we demonstrate that a simple neural-network-based score function can consistently achieve near state-of-the-art performance on multiple datasets. We also quantitatively demonstrate biases in standard benchmark datasets, and highlight the need to perform evaluation spanning various datasets.\nOur team Hibikino-Musashi@Home was founded in 2010. It is based in Kitakyushu Science and Research Park, Japan. Since 2010, we have participated in the RoboCup@Home Japan open competition open-platform league every year. Currently, the Hibikino-Musashi@Home team has 24 members from seven different laboratories based in the Kyushu Institute of Technology. Our home-service robots are used as platforms for both education and implementation of our research outcomes. In this paper, we introduce our team and the technologies that we have implemented in our robots.\nModel-Based Diagnosis deals with the identification of the real cause of a system's malfunction based on a formal system model and observations of the system behavior. When a malfunction is detected, there is usually not enough information available to pinpoint the real cause and one needs to discriminate between multiple fault hypotheses (called diagnoses). To this end, Sequential Diagnosis approaches ask an oracle for additional system measurements. This work presents strategies for (optimal) measurement selection in model-based sequential diagnosis. 
In particular, assuming a set of leading diagnoses being given, we show how queries (sets of measurements) can be computed and optimized along two dimensions: expected number of queries and cost per query. By means of a suitable decoupling of two optimizations and a clever search space reduction the computations are done without any inference engine calls. For the full search space, we give a method requiring only a polynomial number of inferences and show how query properties can be guaranteed which existing methods do not provide. Evaluation results using real-world problems indicate that the new method computes (virtually) optimal queries instantly independently of the size and complexity of the considered diagnosis problems and outperforms equally general methods not exploiting the proposed theory by orders of magnitude.\nBy introducing sign constraints on the weights, this paper proposes sign constrained rectifier networks (SCRNs), whose training can be solved efficiently by the well known majorization-minimization (MM) algorithms. We prove that the proposed two-hidden-layer SCRNs, which exhibit negative weights in the second hidden layer and negative weights in the output layer, are capable of separating any two (or more) disjoint pattern sets. Furthermore, the proposed two-hidden-layer SCRNs can decompose the patterns of each class into several clusters so that each cluster is convexly separable from all the patterns from the other classes. This provides a means to learn the pattern structures and analyse the discriminant factors between different classes of patterns.\nWe consider a reinforcement learning (RL) setting in which the agent interacts with a sequence of episodic MDPs. At the start of each episode the agent has access to some side-information or context that determines the dynamics of the MDP for that episode. 
Our setting is motivated by applications in healthcare where baseline measurements of a patient at the start of a treatment episode form the context that may provide information about how the patient might respond to treatment decisions. We propose algorithms for learning in such Contextual Markov Decision Processes (CMDPs) under an assumption that the unobserved MDP parameters vary smoothly with the observed context. We also give lower and upper PAC bounds under the smoothness assumption. Because our lower bound has an exponential dependence on the dimension, we consider a tractable linear setting where the context is used to create linear combinations of a finite set of MDPs. For the linear setting, we give a PAC learning algorithm based on KWIK learning techniques.\nDeformable image registration and regression are important tasks in medical image analysis. However, they are computationally expensive, especially when analyzing large-scale datasets that contain thousands of images. Hence, cluster computing is typically used, making the approaches dependent on such computational infrastructure. Even larger computational resources are required as study sizes increase. This limits the use of deformable image registration and regression for clinical applications and as component algorithms for other image analysis approaches. We therefore propose using a fast predictive approach to perform image registrations. In particular, we employ these fast registration predictions to approximate a simplified geodesic regression model to capture longitudinal brain changes. The resulting method is orders of magnitude faster than the standard optimization-based regression model and hence facilitates large-scale analysis on a single graphics processing unit (GPU). 
We evaluate our results on 3D brain magnetic resonance images (MRI) from the ADNI datasets.\nIn this paper, we consider the problem of optimizing the quantiles of the cumulative rewards of Markov Decision Processes (MDP), which we refer to as Quantile Markov Decision Processes (QMDP). Traditionally, the goal of a Markov Decision Process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. Our framework of QMDP provides analytical results characterizing the optimal QMDP solution and presents the algorithm for solving the QMDP. We illustrate the model with two experiments: a grid game and an HIV optimal treatment experiment.\nWe investigate some well-known (and a few not-so-well-known) many-valued logics that have a small number (3 or 4) of truth values. For some of them we complain that they do not have any \\emph{logical} use (despite their perhaps having some intuitive semantic interest) and we look at ways to add features so as to make them useful, while retaining their intuitive appeal. At the end, we show some surprising results in the system FDE, and its relationships with features of other logics. We close with some new examples of \"synonymous logics.\" An Appendix contains a natural deduction system for our augmented FDE, and proofs of soundness and completeness.\nApproximate Bayesian computation (ABC) and synthetic likelihood (SL) techniques have enabled the use of Bayesian inference for models that may be simulated, but for which the likelihood cannot be evaluated pointwise at values of an unknown parameter $\\theta$. 
The main idea in ABC and SL is, for different values of $\\theta$ (usually chosen using a Monte Carlo algorithm), to build estimates of the likelihood based on simulations from the model conditional on $\\theta$. The quality of these estimates determines the efficiency of an ABC/SL algorithm. In standard ABC/SL, the only means to improve an estimated likelihood at $\\theta$ is to simulate more times from the model conditional on $\\theta$, which is infeasible in cases where the simulator is computationally expensive. In this paper we describe how to use bootstrapping as a means for improving SL estimates whilst using fewer simulations from the model, and also investigate its use in ABC. Further, we investigate the use of the bag of little bootstraps as a means for applying this approach to large datasets, yielding Monte Carlo algorithms that accurately approximate posterior distributions whilst only simulating subsamples of the full data. Examples of the approach applied to i.i.d., temporal and spatial data are given.\nField Programmable Gate Arrays (FPGAs) play an increasingly important role in the data sampling and processing industries due to their highly parallel architecture, low power consumption, and flexibility in custom algorithms. Especially in the artificial intelligence field, training and implementing neural networks and machine learning algorithms heavily demands energy-efficient hardware implementations and massively parallel computing capacity. Therefore, many global companies have applied FPGAs in AI and machine learning fields such as autonomous driving and Automatic Spoken Language Recognition (Baidu) [1] [2] and Bing search (Microsoft) [3]. Considering the great potential of FPGAs in these fields, we aim to implement a general neural network hardware architecture on the XILINX ZU9CG System On Chip (SOC) platform [4], which contains abundant hardware resources and powerful processing capacity. 
The general neural network architecture on the FPGA SOC platform can perform forward and backward algorithms in deep neural networks (DNN) with high performance and can easily be adjusted according to the type and scale of the neural networks.\nKnowledge bases (KB) constructed through information extraction from text play an important role in query answering and reasoning. In this work, we study a particular reasoning task, the problem of discovering causal relationships between entities, known as causal discovery. There are two contrasting types of approaches to discovering causal knowledge. One approach attempts to identify causal relationships from text using automatic extraction techniques, while the other approach infers causation from observational data. However, extractions alone are often insufficient to capture complex patterns and full observational data is expensive to obtain. We introduce a probabilistic method for fusing noisy extractions with observational data to discover causal knowledge. We propose a principled approach that uses the probabilistic soft logic (PSL) framework to encode well-studied constraints to recover long-range patterns and consistent predictions, while cheaply acquired extractions provide a proxy for unseen observations. We apply our method to gene regulatory networks and show the promise of exploiting KB signals in causal discovery, suggesting a critical, new area of research.\nWe study the multi-armed bandit problem with multiple plays and a budget constraint for both the stochastic and the adversarial setting. At each round, exactly $K$ out of $N$ possible arms have to be played (with $1\\leq K \\leq N$). In addition to observing the individual rewards for each arm played, the player also learns a vector of costs which has to be covered with an a-priori defined budget $B$. The game ends when the sum of current costs associated with the played arms exceeds the remaining budget. 
Firstly, we analyze this setting for the stochastic case, for which we assume each arm to have an underlying cost and reward distribution with support $[c_{\\min}, 1]$ and $[0, 1]$, respectively. We derive an Upper Confidence Bound (UCB) algorithm which achieves $O(NK^4 \\log B)$ regret. Secondly, for the adversarial case in which the entire sequence of rewards and costs is fixed in advance, we derive an upper bound on the regret of order $O(\\sqrt{NB\\log(N/K)})$ utilizing an extension of the well-known $\\texttt{Exp3}$ algorithm. We also provide upper bounds that hold with high probability and a lower bound of order $\\Omega((1 - K/N)^2 \\sqrt{NB/K})$.\nGoal-conditional policies allow reinforcement learning agents to pursue specific goals during different episodes. In addition to their potential to generalize desired behavior to unseen goals, such policies may also help in defining options for arbitrary subgoals, enabling higher-level planning. While trying to achieve a specific goal, an agent may also be able to exploit information about the degree to which it has achieved alternative goals. Reinforcement learning agents have only recently been endowed with such capacity for hindsight, which is highly valuable in environments with sparse rewards. In this paper, we show how hindsight can be introduced to likelihood-ratio policy gradient methods, generalizing this capacity to an entire class of highly successful algorithms. Our preliminary experiments suggest that hindsight may increase the sample efficiency of policy gradient methods.\nIn order to automate the verification process, regulatory rules written in natural language need to be translated into a format that machines can understand. However, none of the existing formalisms can fully represent the elements that appear in legal norms. For instance, most of these formalisms do not provide features to capture the behavior of deontic effects, which is an important aspect in automated compliance checking. 
This paper presents an approach for transforming legal norms represented using LegalRuleML to a variant of Modal Defeasible Logic (and vice versa) such that a legal statement represented using LegalRuleML can be transformed into a machine-readable format that can be understood and reasoned about depending upon the client's preferences.\nWe consider the problem of mining signal temporal logical requirements from a dataset of regular (good) and anomalous (bad) trajectories of a dynamical system. We assume the training set to be labeled by human experts and that we have access only to a limited amount of data, typically noisy. We provide a systematic approach to synthesize both the syntactical structure and the parameters of the temporal logic formula using a two-step procedure: first, we leverage a novel evolutionary algorithm for learning the structure of the formula; second, we perform the parameter synthesis operating on the statistical emulation of the average robustness for a candidate formula w.r.t. its parameters. We compare our results with our previous work [BufoBSBLB14] and with a recently proposed decision-tree [bombara_decision_2016] based method. We present experimental results on two case studies: an anomalous trajectory detection problem of a naval surveillance system and the characterization of an Ineffective Respiratory effort, showing the usefulness of our work.\nA major target of linguistics and cognitive science has been to understand what class of learning systems can acquire the key structures of natural language. Until recently, the computational requirements of language have been used to argue that learning is impossible without a highly constrained hypothesis space. Here, we describe a learning system that is maximally unconstrained, operating over the space of all computations, and is able to acquire several of the key structures present in natural language from positive evidence alone. The model successfully acquires regular (e.g. 
$(ab)^n$), context-free (e.g. $a^n b^n$, $x x^R$), and context-sensitive (e.g. $a^nb^nc^n$, $a^nb^mc^nd^m$, $xx$) formal languages. Our approach develops the concept of factorized programs in Bayesian program induction in order to help manage the complexity of representation. We show in learning, the model predicts several phenomena empirically observed in human grammar acquisition experiments.\nThere is an increasing interest in exploiting mobile sensing technologies and machine learning techniques for mental health monitoring and intervention. Researchers have effectively used contextual information, such as mobility, communication and mobile phone usage patterns for quantifying individuals' mood and wellbeing. In this paper, we investigate the effectiveness of neural network models for predicting users' level of stress by using the location information collected by smartphones. We characterize the mobility patterns of individuals using the GPS metrics presented in the literature and employ these metrics as input to the network. We evaluate our approach on the open-source StudentLife dataset. Moreover, we discuss the challenges and trade-offs involved in building machine learning models for digital mental health and highlight potential future work in this direction.\nWe introduce a data-driven approach to aid the repairing and conservation of archaeological objects: ORGAN, an object reconstruction generative adversarial network (GAN). By using an encoder-decoder 3D deep neural network on a GAN architecture, and combining two loss objectives: a completion loss and an Improved Wasserstein GAN loss, we can train a network to effectively predict the missing geometry of damaged objects. As archaeological objects can greatly differ between them, the network is conditioned on a variable, which can be a culture, a region or any metadata of the object. 
In our results, we show that our method can recover most of the information from damaged objects, even in cases where more than half of the voxels are missing, without producing many errors.\nWe present a method for explaining the image classification predictions of deep convolution neural networks, by highlighting the pixels in the image which influence the final class prediction. Our method requires the identification of a heuristic method to select parameters hypothesized to be most relevant in this prediction, and here we use Kullback-Leibler divergence to provide this focus. Overall, our approach helps in understanding and interpreting deep network predictions and we hope contributes to a foundation for such understanding of deep learning networks. In this brief paper, our experiments evaluate the performance of two popular networks in this context of interpretability.\nEsports has emerged as a popular genre for players as well as spectators, supporting a global entertainment industry. Esports analytics has evolved to address the requirement for data-driven feedback, and is focused on cyber-athlete evaluation, strategy and prediction. Towards the latter, previous work has used match data from a variety of player ranks from hobbyist to professional players. However, professional players have been shown to behave differently than lower ranked players. Given the comparatively limited supply of professional data, a key question is thus whether mixed-rank match datasets can be used to create data-driven models which predict winners in professional matches and provide a simple in-game statistic for viewers and broadcasters. Here we show that, although there is a slightly reduced accuracy, mixed-rank datasets can be used to predict the outcome of professional matches, with suitably optimized configurations.\nAchieving superhuman playing level by AlphaGo corroborated the capabilities of convolutional neural architectures (CNNs) for capturing complex spatial patterns. 
This result was to a great extent due to several analogies between Go board states and 2D images CNNs have been designed for, in particular translational invariance and a relatively large board. In this paper, we verify whether CNN-based move predictors prove effective for Othello, a game with significantly different characteristics, including a much smaller board size and complete lack of translational invariance. We compare several CNN architectures and board encodings, augment them with state-of-the-art extensions, train on an extensive database of experts' moves, and examine them with respect to move prediction accuracy and playing strength. The empirical evaluation confirms high capabilities of neural move predictors and suggests a strong correlation between prediction accuracy and playing strength. The best CNNs not only surpass all other 1-ply Othello players proposed to date but defeat (2-ply) Edax, the best open-source Othello player.\nThe goal of point set registration is to find point-by-point correspondences between point sets, each of which characterizes the shape of an object. Because local preservation of object geometry is assumed, prevalent algorithms in the area can often elegantly solve the problems without using geometric information specific to the objects. This means that registration performance can be further improved by using prior knowledge of object geometry. In this paper, we propose a novel point set registration method using the Gaussian mixture model with prior shape information encoded as a statistical shape model. Our transformation model is defined as a combination of the similar transformation, motion coherence, and the statistical shape model. Therefore, the proposed method works effectively if the target point set includes outliers and missing regions, or if it is rotated. The computational cost can be reduced to linear, and therefore the method is scalable to large point sets. 
The effectiveness of the method will be verified through comparisons with existing algorithms using datasets concerning human body shapes, hands, and faces.\nWe present a self-supervised approach to ignoring \"distractors\" in camera images for the purposes of robustly estimating vehicle motion in cluttered urban environments. We leverage offline multi-session mapping approaches to automatically generate a per-pixel ephemerality mask and depth map for each input image, which we use to train a deep convolutional network. At run-time we use the predicted ephemerality and depth as an input to a monocular visual odometry (VO) pipeline, using either sparse features or dense photometric matching. Our approach yields metric-scale VO using only a single camera and can recover the correct egomotion even when 90% of the image is obscured by dynamic, independently moving objects. We evaluate our robust VO methods on more than 400km of driving from the Oxford RobotCar Dataset and demonstrate reduced odometry drift and significantly improved egomotion estimation in the presence of large moving vehicles in urban traffic.\nEpisodic control has been proposed as a third approach to reinforcement learning, besides model-free and model-based control, by analogy with the three types of human memory. i.e. episodic, procedural and semantic memory. But the theoretical properties of episodic control are not well investigated. Here I show that in deterministic tree Markov decision processes, episodic control is equivalent to a form of prioritized sweeping in terms of sample efficiency as well as memory and computation demands. For general deterministic and stochastic environments, prioritized sweeping performs better even when memory and computation demands are restricted to be equal to those of episodic control. 
These results suggest generalizations of prioritized sweeping to partially observable environments, its combined use with function approximation and the search for possible implementations of prioritized sweeping in brains.\nGiven the recent success of Deep Learning applied to a variety of single tasks, it is natural to consider more human-realistic settings. Perhaps the most difficult of these settings is that of continual lifelong learning, where the model must learn online over a continuous stream of non-stationary data. A continual lifelong learning system must have three primary capabilities to succeed: it must learn and adapt over time, it must not forget what it has learned, and it must be efficient in both training time and memory. Recent techniques have focused their efforts largely on the first two capabilities while the third capability remains largely unexplored. In this paper, we consider the problem of efficient and effective storage of experiences over very large time-frames. In particular we consider the case where typical experiences are n bits and memories are limited to k bits for k << n. We present a novel scalable architecture and training algorithm in this challenging domain and provide an extensive evaluation of its performance. Our results show that we can achieve considerable gains on top of state-of-the-art methods such as GEM.\nIn this paper we introduce new types of square-piece jigsaw puzzles, where in addition to the unknown location and orientation of each piece, a piece might also need to be flipped. These puzzles, which are associated with a number of real world problems, are considerably harder, from a computational standpoint. Specifically, we present a novel generalized genetic algorithm (GA)-based solver that can handle puzzle pieces of unknown location and orientation (Type 2 puzzles) and (two-sided) puzzle pieces of unknown location, orientation, and face (Type 4 puzzles). 
To the best of our knowledge, our solver provides a new state-of-the-art, solving previously attempted puzzles faster and far more accurately, handling puzzle sizes that have never been attempted before, and assembling the newly introduced two-sided puzzles automatically and effectively. This paper also presents, among other results, the most extensive set of experimental results, compiled as of yet, on Type 2 puzzles.\nThis paper proposes a novel game-theoretical autonomous decision-making framework to address a task allocation problem for a swarm of multiple agents. We consider cooperation of self-interested agents and show that agents who have social inhibition can converge to a Nash stable partition (i.e., social agreement) using our proposed decentralised algorithm within polynomial time. The algorithm is simple and executable based on local interactions with neighbour agents under a strongly-connected communication network and even in asynchronous environments. We analytically present a mathematical formulation for computing the lower bound of a converged solution's suboptimality and additionally show that 50 % of suboptimality can be minimally guaranteed if social utilities are non-decreasing functions with respect to the number of co-working agents. Through numerical experiments, it is confirmed that the proposed framework is scalable, fast adaptable against dynamical environments, and robust even in a realistic situation where some of the agents temporarily somehow do not operate during a mission.\nIn this paper, we present our approach to solve a physics-based reinforcement learning challenge \"Learning to Run\" with objective to train physiologically-based human model to navigate a complex obstacle course as quickly as possible. The environment is computationally expensive, has a high-dimensional continuous action space and is stochastic. 
We benchmark state of the art policy-gradient methods and test several improvements, such as layer normalization, parameter noise, action and state reflecting, to stabilize training and improve its sample-efficiency. We found that the Deep Deterministic Policy Gradient method is the most efficient method for this environment and the improvements we have introduced help to stabilize training. Learned models are able to generalize to new physical scenarios, e.g. different obstacle courses.\nWe provide, to the best of our knowledge, the first computational study of extensive-form adversarial team games. These games are sequential, zero-sum games in which a team of players, sharing the same utility function, faces an adversary. We define three different scenarios according to the communication capabilities of the team. In the first, the teammates can communicate and correlate their actions both before and during the play. In the second, they can only communicate before the play. In the third, no communication is possible at all. We define the most suitable solution concepts, and we study the inefficiency caused by partial or null communication, showing that the inefficiency can be arbitrarily large in the size of the game tree. Furthermore, we study the computational complexity of the equilibrium-finding problem in the three scenarios mentioned above, and we provide, for each of the three scenarios, an exact algorithm. Finally, we empirically evaluate the scalability of the algorithms in random games and the inefficiency caused by partial or null communication.\nThere are many methodologies and techniques for easing the task of ontology building. Here we describe the intersection of two of these: ontology normalisation and fully programmatic ontology development. The first of these describes a standardized organisation for an ontology, with singly inherited self-standing entities, and a number of small taxonomies of refining entities. 
The former are described and defined in terms of the latter and used to manage the polyhierarchy of the self-standing entities. Fully programmatic development is a technique where an ontology is developed using a domain-specific language within a programming language, meaning that as well as defining ontological entities, it is possible to add arbitrary patterns or new syntax within the same environment. We describe how new patterns can be used to enable a new style of ontology development that we call hypernormalisation.\nThis paper introduces a new neural structure called FusionNet, which extends existing attention approaches from three perspectives. First, it puts forward a novel concept of \"history of word\" to characterize attention information from the lowest word-level embedding up to the highest semantic-level representation. Second, it introduces an improved attention scoring function that better utilizes the \"history of word\" concept. Third, it proposes a fully-aware multi-level attention mechanism to capture the complete information in one text (such as a question) and exploit it in its counterpart (such as context or passage) layer by layer. We apply FusionNet to the Stanford Question Answering Dataset (SQuAD) and it achieves the first position for both single and ensemble model on the official SQuAD leaderboard at the time of writing (Oct. 4th, 2017). Meanwhile, we verify the generalization of FusionNet with two adversarial SQuAD datasets and it sets up the new state-of-the-art on both datasets: on AddSent, FusionNet increases the best F1 metric from 46.6% to 51.4%; on AddOneSent, FusionNet boosts the best F1 metric from 56.0% to 60.7%.\nThe Deep Q-Network proposed by Mnih et al. [2015] has become a benchmark and building point for much deep reinforcement learning research. 
However, replicating results for complex systems is often challenging since original scientific publications are not always able to describe in detail every important parameter setting and software engineering solution. In this paper, we present results from our work reproducing the results of the DQN paper. We highlight key areas in the implementation that were not covered in great detail in the original paper to make it easier for researchers to replicate these results, including termination conditions and gradient descent algorithms. Finally, we discuss methods for improving the computational performance and provide our own implementation that is designed to work with a range of domains, and not just the original Arcade Learning Environment [Bellemare et al., 2013].\nComputer poetry generation is our first step towards computer writing. Writing must have a theme. The current approaches of using sequence-to-sequence models with attention often produce non-thematic poems. We present a novel conditional variational autoencoder with a hybrid decoder adding the deconvolutional neural networks to the general recurrent neural networks to fully learn topic information via latent variables. This approach significantly improves the relevance of the generated poems by representing each line of the poem not only in a context-sensitive manner but also in a holistic way that is highly related to the given keyword and the learned topic. A proposed augmented word2vec model further improves the rhythm and symmetry. Tests show that the generated poems by our approach are mostly satisfying with regulated rules and consistent themes, and 73.42% of them receive an Overall score no less than 3 (the highest score is 5).\nTemporal gates play a significant role in modern recurrent-based neural encoders, enabling fine-grained control over recursive compositional operations over time. 
In recurrent models such as the long short-term memory (LSTM), temporal gates control the amount of information retained or discarded over time, not only playing an important role in influencing the learned representations but also serving as a protection against vanishing gradients. This paper explores the idea of learning temporal gates for sequence pairs (question and answer), jointly influencing the learned representations in a pairwise manner. In our approach, temporal gates are learned via 1D convolutional layers and then subsequently cross applied across question and answer for joint learning. Empirically, we show that this conceptually simple sharing of temporal gates can lead to competitive performance across multiple benchmarks. Intuitively, what our network achieves can be interpreted as learning representations of question and answer pairs that are aware of what each other is remembering or forgetting, i.e., pairwise temporal gating. Via extensive experiments, we show that our proposed model achieves state-of-the-art performance on two community-based QA datasets and competitive performance on one factoid-based QA dataset.\nMultimodal features play a key role in wearable sensor based Human Activity Recognition (HAR). Selecting the most salient features adaptively is a promising way to maximize the effectiveness of multimodal sensor data. In this regard, we propose a \"collect fully and select wisely (Fullie and Wiselie)\" principle as well as a dual-stream recurrent convolutional attention model, Recurrent Attention and Activity Frame (RAAF), to improve the recognition performance. We first collect modality features and the relations between each pair of features to generate activity frames, and then introduce an attention mechanism to select the most prominent regions from activity frames precisely. The selected frames not only maximize the utilization of valid features but also reduce the number of features to be computed effectively. 
We further analyze the hyper-parameters, accuracy, interpretability, and annotation dependency of the proposed model based on extensive experiments. The results show that RAAF achieves competitive performance on two benchmarked datasets and works well in real life scenarios.\nThe paper introduces the Hidden Tree Markov Network (HTN), a neuro-probabilistic hybrid fusing the representation power of generative models for trees with the incremental and discriminative learning capabilities of neural networks. We put forward a modular architecture in which multiple generative models of limited complexity are trained to learn structural feature detectors whose outputs are then combined and integrated by neural layers at a later stage. In this respect, the model is both deep, thanks to the unfolding of the generative models on the input structures, as well as wide, given the potentially large number of generative modules that can be trained in parallel. Experimental results show that the proposed approach can outperform state-of-the-art syntactic kernels as well as generative kernels built on the same probabilistic model as the HTN.\nHierarchical abstractions, also known as options -- a type of temporally extended action (Sutton et. al. 1999) that enables a reinforcement learning agent to plan at a higher level, abstracting away from the lower-level details. In this work, we learn reusable options whose parameters can vary, encouraging different behaviors, based on the current situation. In principle, these behaviors can include vigor, defence or even risk-averseness. These are some examples of what we refer to in the broader context as Situational Awareness (SA). We incorporate SA, in the form of vigor, into hierarchical RL by defining and learning situationally aware options in a Probabilistic Goal Semi-Markov Decision Process (PG-SMDP). 
This is achieved using our Situationally Aware oPtions (SAP) policy gradient algorithm which comes with a theoretical convergence guarantee. We learn reusable options in different scenarios in a RoboCup soccer domain (i.e., winning/losing). These options learn to execute with different levels of vigor resulting in human-like behaviours such as `time-wasting' in the winning scenario. We show the potential of the agent to exit bad local optima using reusable options in RoboCup. Finally, using SAP, the agent mitigates feature-based model misspecification in a Bottomless Pit of Death domain.\nPreference elicitation is the task of suggesting a highly preferred configuration to a decision maker. The preferences are typically learned by querying the user for choice feedback over pairs or sets of objects. In its constructive variant, new objects are synthesized \"from scratch\" by maximizing an estimate of the user utility over a combinatorial (possibly infinite) space of candidates. In the constructive setting, most existing elicitation techniques fail because they rely on exhaustive enumeration of the candidates. A previous solution explicitly designed for constructive tasks comes with no formal performance guarantees, and can be very expensive in (or inapplicable to) problems with non-Boolean attributes. We propose the Choice Perceptron, a Perceptron-like algorithm for learning user preferences from set-wise choice feedback over constructive domains and hybrid Boolean-numeric feature spaces. We provide a theoretical analysis on the attained regret that holds for a large class of query selection strategies, and devise a heuristic strategy that aims at optimizing the regret in practice. 
Finally, we demonstrate its effectiveness by empirical evaluation against existing competitors on constructive scenarios of increasing complexity.\nSpinal cord stimulation has enabled humans with motor complete spinal cord injury (SCI) to independently stand and recover some lost autonomic function. Quantifying the quality of bipedal standing under spinal stimulation is important for spinal rehabilitation therapies and for new strategies that seek to combine spinal stimulation and rehabilitative robots (such as exoskeletons) in real time feedback. To study the potential for automated electromyography (EMG) analysis in SCI, we evaluated the standing quality of paralyzed patients undergoing electrical spinal cord stimulation using both video and multi-channel surface EMG recordings during spinal stimulation therapy sessions. The quality of standing under different stimulation settings was quantified manually by experienced clinicians. By correlating features of the recorded EMG activity with the expert evaluations, we show that multi-channel EMG recording can provide accurate, fast, and robust estimation for the quality of bipedal standing in spinally stimulated SCI patients. Moreover, our analysis shows that the total number of EMG channels needed to effectively predict standing quality can be reduced while maintaining high estimation accuracy, which provides more flexibility for rehabilitation robotic systems to incorporate EMG recordings.\nWe consider the use of Deep Learning methods for modeling complex phenomena like those occurring in natural physical processes. With the large amount of data gathered on these phenomena the data intensive paradigm could begin to challenge more traditional approaches elaborated over the years in fields like maths or physics. However, despite considerable successes in a variety of application domains, the machine learning field is not yet ready to handle the level of complexity required by such problems. 
Using an example application, namely Sea Surface Temperature Prediction, we show how general background knowledge gained from physics could be used as a guideline for designing efficient Deep Learning models. In order to motivate the approach and to assess its generality we demonstrate a formal link between the solution of a class of differential equations underlying a large family of physical phenomena and the proposed model. Experiments and comparison with series of baselines including a state of the art numerical approach is then provided.\nMany current methods to interpret convolutional neural networks (CNNs) use visualization techniques and words to highlight concepts of the input seemingly relevant to a CNN's decision. The methods hypothesize that the recognition of these concepts are instrumental in the decision a CNN reaches, but the nature of this relationship has not been well explored. To address this gap, this paper examines the quality of a concept's recognition by a CNN and the degree to which the recognitions are associated with CNN decisions. The study considers a CNN trained for scene recognition over the ADE20k dataset. It uses a novel approach to find and score the strength of minimally distributed representations of input concepts (defined by objects in scene images) across late stage feature maps. Subsequent analysis finds evidence that concept recognition impacts decision making. Strong recognition of concepts frequently-occurring in few scenes are indicative of correct decisions, but recognizing concepts common to many scenes may mislead the network.\nHumans possess an ability to abstractly reason about objects and their interactions, an ability not shared with state-of-the-art deep learning models. Relational networks, introduced by Santoro et al. (2017), add the capacity for relational reasoning to deep neural networks, but are limited in the complexity of the reasoning tasks they can address. 
We introduce recurrent relational networks which increase the suite of solvable tasks to those that require an order of magnitude more steps of relational reasoning. We use recurrent relational networks to solve Sudoku puzzles and achieve state-of-the-art results by solving 96.6% of the hardest Sudoku puzzles, where relational networks fail to solve any. We also apply our model to the BaBi textual QA dataset solving 19/20 tasks which is competitive with state-of-the-art sparse differentiable neural computers. The recurrent relational network is a general purpose module that can augment any neural network model with the capacity to do many-step relational reasoning.\nPolicy optimization methods have shown great promise in solving complex reinforcement and imitation learning tasks. While model-free methods are broadly applicable, they often require many samples to optimize complex policies. Model-based methods greatly improve sample-efficiency but at the cost of poor generalization, requiring a carefully handcrafted model of the system dynamics for each task. Recently, hybrid methods have been successful in trading off applicability for improved sample-complexity. However, these have been limited to continuous action spaces. In this work, we present a new hybrid method based on an approximation of the dynamics as an expectation over the next state under the current policy. This relaxation allows us to derive a novel hybrid policy gradient estimator, combining score function and pathwise derivative estimators, that is applicable to discrete action spaces. We show significant gains in sample complexity, ranging between $1.7$ and $25\\times$, when learning parameterized policies on Cart Pole, Acrobot, Mountain Car and Hand Mass. 
Our method is applicable to both discrete and continuous action spaces, when competing pathwise methods are limited to the latter.\nStackelberg equilibria have become increasingly important as a solution concept in computational game theory, largely inspired by practical problems such as security settings. In practice, however, there is typically uncertainty regarding the model about the opponent. This paper is, to our knowledge, the first to investigate Stackelberg equilibria under uncertainty in extensive-form games, one of the broadest classes of game. We introduce robust Stackelberg equilibria, where the uncertainty is about the opponent's payoffs, as well as ones where the opponent has limited lookahead and the uncertainty is about the opponent's node evaluation function. We develop a new mixed-integer program for the deterministic limited-lookahead setting. We then extend the program to the robust setting for Stackelberg equilibrium under unlimited and under limited lookahead by the opponent. We show that for the specific case of interval uncertainty about the opponent's payoffs (or about the opponent's node evaluations in the case of limited lookahead), robust Stackelberg equilibria can be computed with a mixed-integer program that is of the same asymptotic size as that for the deterministic setting.\nAction abstractions restrict the number of legal actions available during search in multi-unit real-time adversarial games, thus allowing algorithms to focus their search on a set of promising actions. Optimal strategies derived from un-abstracted spaces are guaranteed to be no worse than optimal strategies derived from action-abstracted spaces. In practice, however, due to real-time constraints and the state space size, one is only able to derive good strategies in un-abstracted spaces in small-scale games. In this paper we introduce search algorithms that use an action abstraction scheme we call asymmetric abstraction. 
Asymmetric abstractions retain the un-abstracted spaces' theoretical advantage over regularly abstracted spaces while still allowing the search algorithms to derive effective strategies, even in large-scale games. Empirical results on combat scenarios that arise in a real-time strategy game show that our search algorithms are able to substantially outperform state-of-the-art approaches.\nThe dynamics of infectious diseases spread is crucial in determining their risk and offering ways to contain them. We study sequential vaccination of individuals in networks. In the original (deterministic) version of the Firefighter problem, a fire breaks out at some node of a given graph. At each time step, b nodes can be protected by a firefighter and then the fire spreads to all unprotected neighbors of the nodes on fire. The process ends when the fire can no longer spread. We extend the Firefighter problem to a probabilistic setting, where the infection is stochastic. We devise a simple policy that only vaccinates neighbors of infected nodes and is optimal on regular trees and on general graphs for a sufficiently large budget. We derive methods for calculating upper and lower bounds of the expected number of infected individuals, as well as provide estimates on the budget needed for containment in expectation. We calculate these explicitly on trees, d-dimensional grids, and Erd\\H{o}s R\\'{e}nyi graphs. Finally, we construct a state-dependent budget allocation strategy and demonstrate its superiority over constant budget allocation on real networks following a first order acquaintance vaccination policy.\nWe tackle the problem of constructive preference elicitation, that is the problem of learning user preferences over very large decision problems, involving a combinatorial space of possible outcomes. 
In this setting, the suggested configuration is synthesized on-the-fly by solving a constrained optimization problem, while the preferences are learned iteratively by interacting with the user. Previous work has shown that Coactive Learning is a suitable method for learning user preferences in constructive scenarios. In Coactive Learning the user provides feedback to the algorithm in the form of an improvement to a suggested configuration. When the problem involves many decision variables and constraints, this type of interaction poses a significant cognitive burden on the user. We propose a decomposition technique for large preference-based decision problems relying exclusively on inference and feedback over partial configurations. This has the clear advantage of drastically reducing the user cognitive load. Additionally, part-wise inference can be (up to exponentially) less computationally demanding than inference over full configurations. We discuss the theoretical implications of working with parts and present promising empirical results on one synthetic and two realistic constructive problems.\nA first step to reach Theory of Mind (ToM) abilities (attribution of beliefs to others) in synthetic agents through sensorimotor interactions would be to tag sensory data with agent typology and action intentions: autonomous agent X moved an object under the box. We propose a dual arm robotic setup in which ToM could be probed. We then discuss what measures can be extracted from sensorimotor interaction data (based on a correlation analysis) in the proposed setup that allow us to distinguish self from other and other/inanimate from other/active with intentions. 
We finally discuss what elements are missing in current cognitive architectures to be able to acquire ToM abilities in synthetic agents from sensorimotor interactions, bottom-up from reactive agent interaction behaviors and top-down from the optimization of social behaviour and cooperation.\nHuman motion recognition is one of the most important branches of human-centered research activities. In recent years, motion recognition based on RGB-D data has attracted much attention. Along with the development in artificial intelligence, deep learning techniques have gained remarkable success in computer vision. In particular, convolutional neural networks (CNN) have achieved great success for image-based tasks, and recurrent neural networks (RNN) are renowned for sequence-based problems. Specifically, deep learning methods based on the CNN and RNN architectures have been adopted for motion recognition using RGB-D data. In this paper, a detailed overview of recent advances in RGB-D-based motion recognition is presented. The reviewed methods are broadly categorized into four groups, depending on the modality adopted for recognition: RGB-based, depth-based, skeleton-based and multi-modal-based. As a survey focused on the application of deep learning to RGB-D-based motion recognition, we explicitly discuss the advantages and limitations of existing techniques. Particularly, we highlight the methods of encoding spatial-temporal-structural information inherent in video sequences, and discuss potential directions for future research.\nThe discriminative approach to classification using deep neural networks has become the de-facto standard in various fields. Complementing recent reservations about safety against adversarial examples, we show that conventional discriminative methods can easily be fooled to provide incorrect labels with very high confidence to out of distribution examples. 
We posit that a generative approach is the natural remedy for this problem, and propose a method for classification using generative models. At training time, we learn a generative model for each class, while at test time, given an example to classify, we query each generator for its most similar generation, and select the class corresponding to the most similar one. Our approach is general and can be used with expressive models such as GANs and VAEs. At test time, our method accurately \"knows when it does not know,\" and provides resilience to out of distribution examples while maintaining competitive performance for standard examples.\nIn this paper, we build upon previous work on designing informative and efficient Exploratory Landscape Analysis features for characterizing problems' landscapes and show their effectiveness in automatically constructing algorithm selection models in continuous black-box optimization problems. Focussing on algorithm performance results of the COCO platform of several years, we construct a representative set of high-performing complementary solvers and present an algorithm selection model that manages to outperform the single best solver out of the portfolio by a factor of two. Acting on the assumption that the function set of the Black-Box Optimization Benchmark is representative enough for practical applications, the model allows for selecting the best-suited optimization algorithm within the considered set for unseen problems prior to the optimization itself based on a small sample of function evaluations. Note that such a sample can even be reused for the initial algorithm population so that feature costs become negligible.\nThe use of dialogue systems as a medium for human-machine interaction is an increasingly prevalent paradigm. A growing number of dialogue systems use conversation strategies that are learned from large datasets. 
There are well-documented instances where interactions with these systems have resulted in biased or even offensive conversations due to the data-driven training process. Here, we highlight potential ethical issues that arise in dialogue systems research, including: implicit biases in data-driven systems, the rise of adversarial examples, potential sources of privacy violations, safety concerns, special considerations for reinforcement learning systems, and reproducibility concerns. We also suggest areas stemming from these issues that deserve further investigation. Through this initial survey, we hope to spur research leading to robust, safe, and ethically sound dialogue systems.\nWe propose the cascade attribute learning network (CALNet), which can learn attributes in a control task separately and assemble them together. Our contribution is twofold: first we propose attribute learning in reinforcement learning (RL). Attributes used to be modeled using constraint functions or terms in the objective function, making them hard to transfer. Attribute learning, on the other hand, models these task properties as modules in the policy network. We also propose using novel cascading compensative networks in the CALNet to learn and assemble attributes. Using the CALNet, one can zero-shot an unseen task by separately learning all its attributes, and assembling the attribute modules. We have validated the capacity of our model on a wide variety of control problems with attributes in time, position, velocity and acceleration phases.\nAdversarial decision making is a particular type of decision making problem where the gain a decision maker obtains as a result of his decisions is affected by the actions taken by others. Representation of alternatives' evaluations and methods to find the optimal alternative are two important aspects of adversarial decision making. 
The aim of this study is to develop a general framework for solving the adversarial decision making issue under an uncertain environment. By combining fuzzy set theory, game theory and D numbers theory (DNT), a DNT-based game-theoretic framework for adversarial decision making under a fuzzy environment is presented. Within the proposed framework or model, fuzzy set theory is used to model the uncertain evaluations of decision makers to alternatives, the non-exclusiveness among fuzzy evaluations is taken into consideration by using DNT, and the conflict of interests among decision makers is considered in a two-person non-constant sum game theory perspective. An illustrative application is given to demonstrate the effectiveness of the proposed model. This work, on the one hand, has developed an effective framework for adversarial decision making under a fuzzy environment; on the other hand, it has further improved the basis of DNT as a generalization of Dempster-Shafer theory for uncertainty reasoning.\nWe present a general-purpose method to train Markov chain Monte Carlo kernels, parameterized by deep neural networks, that converge and mix quickly to their target distribution. Our method generalizes Hamiltonian Monte Carlo and is trained to maximize expected squared jumped distance, a proxy for mixing speed. We demonstrate large empirical gains on a collection of simple but challenging distributions, for instance achieving a 106x improvement in effective sample size in one case, and mixing when standard HMC makes no measurable progress in a second. Finally, we show quantitative and qualitative gains on a real-world task: latent-variable generative modeling. We release an open source TensorFlow implementation of the algorithm.\nIn this paper, we propose an adversarial process for abstractive text summarization, in which we simultaneously train a generative model G and a discriminative model D. 
In particular, we build the generator G as an agent of reinforcement learning, which takes the raw text as input and predicts the abstractive summarization. We also build a discriminator which attempts to distinguish the generated summary from the ground truth summary. Extensive experiments demonstrate that our model achieves competitive ROUGE scores with the state-of-the-art methods on CNN/Daily Mail dataset. Qualitatively, we show that our model is able to generate more abstractive, readable and diverse summaries.\nA common assumption in machine learning is that training data are i.i.d. samples from some distribution. Processes that generate i.i.d. samples are, in a sense, uninformative---they produce data without regard to how good this data is for learning. By contrast, cognitive science research has shown that when people generate training data for others (i.e., teaching), they deliberately select examples that are helpful for learning. Because the data is more informative, learning can require less data. Interestingly, such examples are most effective when learners know that the data were pedagogically generated (as opposed to randomly generated). We call this pedagogical learning---when a learner assumes that evidence comes from a helpful teacher. In this work, we ask how pedagogical learning might work for machine learning algorithms. Studying this question requires understanding how people actually teach complex concepts with examples, so we conducted a behavioral study examining how people teach regular expressions using example strings. We found that teachers' examples contain powerful clustering structure that can greatly facilitate learning. We then develop a model of teaching and show a proof of concept that using this model inside of a learner can improve performance.\nWe introduce a one-shot learning approach for video object tracking. 
The proposed algorithm requires seeing the object to be tracked only once, and employs an external memory to store and remember the evolving features of the foreground object as well as backgrounds over time during tracking. With the relevant memory retrieved and updated at each tracking step, our tracking model is capable of maintaining long-term memory of the object, and thus can naturally deal with hard tracking scenarios including partial and total occlusion, motion changes and large scale and shape variations. In our experiments we use the ImageNet ILSVRC2015 video detection dataset to train and use the VOT-2016 benchmark to test and compare our Memory-Augmented Video Object Tracking (MAVOT) model. From the results, we conclude that given its one-shot property and simplicity in design, MAVOT is an attractive approach in visual tracking because it shows good performance on the VOT-2016 benchmark and is among the top 5 performers in accuracy and robustness in occlusion, motion changes and empty target.\nInterval Pairwise Comparison Matrices have been widely used to account for uncertain statements concerning the preferences of decision makers. Several approaches have been proposed in the literature, such as multiplicative and fuzzy interval matrices. In this paper, we propose a general unified approach to Interval Pairwise Comparison Matrices, based on Abelian linearly ordered groups. In this framework, we generalize some consistency conditions provided for multiplicative and/or fuzzy interval pairwise comparison matrices and provide inclusion relations between them. Then, we provide a concept of distance between intervals that, together with a notion of mean defined over real continuous Abelian linearly ordered groups, allows us to provide a consistency index and an indeterminacy index. 
In this way, by means of suitable isomorphisms between Abelian linearly ordered groups, we will be able to compare the inconsistency and the indeterminacy of different kinds of Interval Pairwise Comparison Matrices, e.g. multiplicative, additive, and fuzzy, on a unique Cartesian coordinate system.\nSepsis is a leading cause of mortality in intensive care units and costs hospitals billions annually. Treating a septic patient is highly challenging, because individual patients respond very differently to medical interventions and there is no universally agreed-upon treatment for sepsis. In this work, we propose an approach to deduce treatment policies for septic patients by using continuous state-space models and deep reinforcement learning. Our model learns clinically interpretable treatment policies, similar in important aspects to the treatment policies of physicians. The learned policies could be used to aid intensive care clinicians in medical decision making and improve the likelihood of patient survival.\nThis paper proposes a new algorithm for controlling classification results by generating a small additive perturbation without changing the classifier network. Our work is inspired by existing works generating adversarial perturbation that worsens classification performance. In contrast to the existing methods, our work aims to generate perturbations that can enhance overall classification performance. To solve this performance enhancement problem, we newly propose a perturbation generation network (PGN) influenced by the adversarial learning strategy. In our problem, the information in a large external dataset is summarized by a small additive perturbation, which helps to improve the performance of the classifier trained with the target dataset. In addition to this performance enhancement problem, we show that the proposed PGN can be adopted to solve the classical adversarial problem without utilizing the information on the target classifier. 
The mentioned characteristics of our method are verified through extensive experiments on publicly available visual datasets.\nIn this paper, we present a hybrid model that combines a neural conversational model and a rule-based graph dialogue system that assists users in scheduling reminders through a chat conversation. The graph based system has high precision and provides a grammatically accurate response but has a low recall. The neural conversation model can cater to a variety of requests, as it generates the responses word by word as opposed to using canned responses. The hybrid system shows significant improvements over the existing baseline system of rule based approach and caters to complex queries with a domain-restricted neural model. Restricting the conversation topic and combination of graph based retrieval system with a neural generative model makes the final system robust enough for a real world application.\nThis work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, a recommendation of what classifier is likely to perform best is made based on classifier performance over similar known data sets. This similarity is measured according to a data set characterization that includes several state-of-the-art metrics taking into account physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm is able to exploit the attributes in a data set to predict class labels, and compare it to the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we have conducted an extensive experimentation including 8 of the main machine learning classification methods with varying configurations and 65 binary data sets, leading to over 2331 experiments. 
Our results show that using the information from the permutation test clearly improves the quality of the recommendations.\nTable-to-text generation aims to generate a description for a factual table which can be viewed as a set of field-value records. To encode both the content and the structure of a table, we propose a novel structure-aware seq2seq architecture which consists of field-gating encoder and description generator with dual attention. In the encoding phase, we update the cell memory of the LSTM unit by a field gate and its corresponding field value in order to incorporate field information into table representation. In the decoding phase, dual attention mechanism which contains word level attention and field level attention is proposed to model the semantic relevance between the generated description and the table. We conduct experiments on the \\texttt{WIKIBIO} dataset which contains over 700k biographies and corresponding infoboxes from Wikipedia. The attention visualizations and case studies show that our model is capable of generating coherent and informative descriptions based on the comprehensive understanding of both the content and the structure of a table. Automatic evaluations also show our model outperforms the baselines by a great margin. Code for this work is available on https://github.com/tyliupku/wiki2bio.\nDeep neural networks have proved to be a very effective way to perform classification tasks. They excel when the input data is high dimensional, the relationship between the input and the output is complicated, and the number of labeled training examples is large. But it is hard to explain why a learned network makes a particular classification decision on a particular test case. This is due to their reliance on distributed hierarchical representations. 
If we could take the knowledge acquired by the neural net and express the same knowledge in a model that relies on hierarchical decisions instead, explaining a particular decision would be much easier. We describe a way of using a trained neural net to create a type of soft decision tree that generalizes better than one learned directly from the training data.\nTensor completion is a problem of filling the missing or unobserved entries of partially observed tensors. Due to the multidimensional character of tensors in describing complex datasets, tensor completion algorithms and their applications have received wide attention and achievement in data mining, computer vision, signal processing, and neuroscience, etc. In this survey, we provide a modern overview of recent advances in tensor completion algorithms from the perspective of big data analytics characterized by diverse variety, large volume, and high velocity. Towards a better comprehension and comparison of vast existing advances, we summarize and categorize them into four groups including general tensor completion algorithms, tensor completion with auxiliary information (variety), scalable tensor completion algorithms (volume) and dynamic tensor completion algorithms (velocity). Besides, we introduce their applications on real-world data-driven problems and present an open-source package covering several widely used tensor decomposition and completion algorithms. Our goal is to summarize these popular methods and introduce them to researchers for promoting the research process in this field and give an available repository for practitioners. In the end, we also discuss some challenges and promising research directions in this community for future explorations.\nDistributed training of deep neural networks has received significant research interest, and its major approaches include implementations on multiple GPUs and clusters. 
Parallelization can dramatically improve the efficiency of training deep and complicated models with large-scale data. A fundamental barrier against the speedup of DNN training, however, is the trade-off between computation and communication time. In other words, increasing the number of worker nodes decreases the time consumed in computation while simultaneously increasing communication overhead under constrained network bandwidth, especially in commodity hardware environments. To alleviate this trade-off, we suggest the idea of homomorphic parameter compression, which compresses parameters with the least expense and trains the DNN with the compressed representation. Although the specific method is yet to be discovered, we demonstrate that there is a high probability that the homomorphism can reduce the communication overhead, thanks to little compression and decompression times. We also provide theoretical speedup of homomorphic compression.\nRecently, model-free reinforcement learning algorithms have been shown to solve challenging problems by learning from extensive interaction with the environment. A significant issue with transferring this success to the robotics domain is that interaction with the real world is costly, but training on limited experience is prone to overfitting. We present a method for learning to navigate, to a fixed goal and in a known environment, on a mobile robot. The robot leverages an interactive world model built from a single traversal of the environment, a pre-trained visual feature encoder, and stochastic environmental augmentation, to demonstrate successful zero-shot transfer under real-world environmental variations without fine-tuning.\nQuantitative CBA is a postprocessing algorithm for association rule classification algorithm CBA (Liu et al, 1998). 
QCBA uses original, undiscretized numerical attributes to optimize the discovered association rules, refining the boundaries of literals in the antecedent of the rules produced by CBA. Some rules as well as literals from the rules can consequently be removed, which makes the resulting classifier smaller. One-rule classification and crisp rules make CBA classification models possibly the most comprehensible among all association rule classification algorithms. These viable properties are retained by QCBA. The postprocessing is conceptually fast, because it is performed on a relatively small number of rules that passed data coverage pruning in CBA. Benchmarking our QCBA approach on 22 UCI datasets shows an average 53% decrease in the total size of the model as measured by the total number of conditions in all rules. Model accuracy remains on the same level as for CBA.\nLearning an optimal policy from a multi-modal reward function is a challenging problem in reinforcement learning (RL). Hierarchical RL (HRL) tackles this problem by learning a hierarchical policy, where multiple option policies are in charge of different strategies corresponding to modes of a reward function and a gating policy selects the best option for a given context. Although HRL has been demonstrated to be promising, current state-of-the-art methods still cannot perform well in complex real-world problems due to the difficulty of identifying modes of the reward function. In this paper, we propose a novel method called hierarchical policy search via return-weighted density estimation (HPSDE), which can efficiently identify the modes through density estimation with return-weighted importance sampling. Our proposed method finds option policies corresponding to the modes of the return function and automatically determines the number and the location of option policies, which significantly reduces the burden of hyper-parameter tuning. 
Through experiments, we demonstrate that the proposed HPSDE successfully learns option policies corresponding to modes of the return function and that it can be successfully applied to a challenging motion planning problem of a redundant robotic manipulator.\nThis paper presents the Crossmodal Attentive Skill Learner (CASL), integrated with the recently-introduced Asynchronous Advantage Option-Critic (A2OC) architecture [Harb et al., 2017] to enable hierarchical reinforcement learning across multiple sensory inputs. We provide concrete examples where the approach not only improves performance in a single task, but accelerates transfer to new tasks. We demonstrate the attention mechanism anticipates and identifies useful latent features, while filtering irrelevant sensor modalities during execution. We modify the Arcade Learning Environment [Bellemare et al., 2013] to support audio queries, and conduct evaluations of crossmodal learning in the Atari 2600 game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017], we open-source a fast hybrid CPU-GPU implementation of CASL.\nRecent systems on structured prediction focus on increasing the level of structural dependencies within the model. However, our study suggests that complex structures entail high overfitting risks. To control the structure-based overfitting, we propose to conduct structure regularization decoding (SR decoding). The decoding of the complex structure model is regularized by the additionally trained simple structure model. We theoretically analyze the quantitative relations between the structural complexity and the overfitting risk. The analysis shows that complex structure models are prone to the structure-based overfitting. Empirical evaluations show that the proposed method improves the performance of the complex structure models by reducing the structure-based overfitting. 
On the sequence labeling tasks, the proposed method substantially improves the performance of the complex neural network models. The maximum F1 error rate reduction is 36.4% for the third-order model. The proposed method also works for the parsing task. The maximum UAS improvement is 5.5% for the tri-sibling model. The results are competitive with or better than the state-of-the-art results.\nA supervised learning algorithm searches over a set of functions $A \\to B$ parametrised by a space $P$ to find the best approximation to some ideal function $f\\colon A \\to B$. It does this by taking examples $(a,f(a)) \\in A\\times B$, and updating the parameter according to some rule. We define a category where these update rules may be composed, and show that gradient descent---with respect to a fixed step size and an error function satisfying a certain property---defines a monoidal functor from a category of parametrised functions to this category of update rules. This provides a structural perspective on backpropagation, as well as a broad generalisation of neural networks.\nDemographic studies suggest that changes in the retinal vasculature geometry, especially in vessel width, are associated with the incidence or progression of eye-related or systemic diseases. To date, the main information source for width estimation from fundus images has been the intensity profile between vessel edges. However, there are many factors affecting the intensity profile: pathologies, the central light reflex and local illumination levels, to name a few. In this study, we introduce three information sources for width estimation. These are the probability profiles of vessel interior, centreline and edge locations generated by a deep network. The probability profiles provide direct access to vessel geometry and are used in the likelihood calculation for a Bayesian method, particle filtering. 
We also introduce a geometric model which can handle non-ideal conditions of the probability profiles. Our experiments conducted on the REVIEW dataset yielded consistent estimates of vessel width, even in cases when one of the vessel edges is difficult to identify. Moreover, our results suggest that the method is better than human observers at locating edges of low contrast vessels.\nWe introduce physics informed neural networks -- neural networks that are trained to solve supervised learning tasks while respecting any given law of physics described by general nonlinear partial differential equations. In this two part treatise, we present our developments in the context of solving two main classes of problems: data-driven solution and data-driven discovery of partial differential equations. Depending on the nature and arrangement of the available data, we devise two distinct classes of algorithms, namely continuous time and discrete time models. The resulting neural networks form a new class of data-efficient universal function approximators that naturally encode any underlying physical laws as prior information. In this first part, we demonstrate how these networks can be used to infer solutions to partial differential equations, and obtain physics-informed surrogate models that are fully differentiable with respect to all input coordinates and free parameters.\nIncremental class learning involves sequentially learning classes in bursts of examples from the same class. This violates the assumptions that underlie methods for training standard deep neural networks, and will cause them to suffer from catastrophic forgetting. Arguably, the best method for incremental class learning is iCaRL, but it requires storing training examples for each class, making it challenging to scale. Here, we propose FearNet for incremental class learning. FearNet is a generative model that does not store previous examples, making it memory efficient. 
FearNet uses a brain-inspired dual-memory system in which new memories are consolidated from a network for recent memories inspired by the mammalian hippocampal complex to a network for long-term storage inspired by medial prefrontal cortex. Memory consolidation is inspired by mechanisms that occur during sleep. FearNet also uses a module inspired by the basolateral amygdala for determining which memory system to use for recall. FearNet achieves state-of-the-art performance at incremental class learning on image (CIFAR-100, CUB-200) and audio classification (AudioSet) benchmarks.\nWe introduce physics informed neural networks -- neural networks that are trained to solve supervised learning tasks while respecting any given law of physics described by general nonlinear partial differential equations. In this second part of our two-part treatise, we focus on the problem of data-driven discovery of partial differential equations. Depending on whether the available data is scattered in space-time or arranged in fixed temporal snapshots, we introduce two main classes of algorithms, namely continuous time and discrete time models. The effectiveness of our approach is demonstrated using a wide range of benchmark problems in mathematical physics, including conservation laws, incompressible fluid flow, and the propagation of nonlinear shallow-water waves.\nThis paper presents a proof-of-concept study for demonstrating the viability of building collaboration among multiple agents through a standard Q learning algorithm embedded in particle swarm optimisation. Collaboration is formulated to be achieved among the agents via some sort of competition, where the agents are expected to balance their actions in such a way that none of them drifts away from the team and none intrudes on any fellow neighbour's territory. Particles are devised with a Q learning algorithm for self-training to learn how to act as members of a swarm and how to produce collaborative/collective behaviours. 
The produced results are supportive to the algorithmic structures suggesting that a substantive collaboration can be built via the proposed learning algorithm.\nThe TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation. Building on two basic abstractions, it offers flexible building blocks for probabilistic computation. Distributions provide fast, numerically stable methods for generating samples and computing statistics, e.g., log density. Bijectors provide composable volume-tracking transformations with automatic caching. Together these enable modular construction of high dimensional distributions and transformations not possible with previous libraries (e.g., pixelCNNs, autoregressive flows, and reversible residual networks). They are the workhorse behind deep probabilistic programming systems like Edward and empower fast black-box inference in probabilistic models built on deep-network components. TensorFlow Distributions has proven an important part of the TensorFlow toolkit within Google and in the broader deep learning community.\nPredicting unseen weather phenomena is an important issue for disaster management. In this paper, we suggest a model for a convolutional sequence-to-sequence autoencoder for predicting undiscovered weather situations from previous satellite images. We also propose a symmetric skip connection between encoder and decoder modules to produce more comprehensive image predictions. To examine our model performance, we conducted experiments for each suggested model to predict future satellite images from historical satellite images. A specific combination of skip connection and sequence-to-sequence autoencoder was able to generate closest prediction from the ground truth image.\nThis paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. 
We identify two sources of uncertainty that are relevant for exploration. The first originates from limited data (parametric uncertainty), while the second originates from the distribution of the returns (return uncertainty). We identify methods to learn these distributions with deep neural networks, where we estimate parametric uncertainty with Bayesian drop-out, while return uncertainty is propagated through the Bellman equation as a Gaussian distribution. Then, we identify that both can be jointly estimated in one network, which we call the Double Uncertain Value Network. The policy is directly derived from the learned distributions based on Thompson sampling. Experimental results show that both types of uncertainty may vastly improve learning in domains with a strong exploration challenge.\nThis work explores attention models to weight the contribution of local convolutional representations for the instance search task. We present a retrieval framework based on bags of local convolutional features (BLCF) that benefits from saliency weighting to build an efficient image representation. The use of human visual attention models (saliency) allows significant improvements in retrieval performance without the need to conduct region analysis or spatial verification, and without requiring any feature fine tuning. We investigate the impact of different saliency models, finding that higher performance on saliency benchmarks does not necessarily equate to improved performance when used in instance search tasks. The proposed approach outperforms the state-of-the-art on the challenging INSTRE benchmark by a large margin, and provides similar performance on the Oxford and Paris benchmarks compared to more complex methods that use off-the-shelf representations. 
The source code used in this project is available at https://imatge-upc.github.io/salbow/\nWe propose a novel computational strategy based on deep and reinforcement learning techniques for de-novo design of molecules with desired properties. This strategy integrates two deep neural networks -generative and predictive - that are trained separately but employed jointly to generate novel chemical structures with the desired properties. Generative models are trained to produce chemically feasible SMILES, and predictive models are derived to forecast the desired compound properties. In the first phase of the method, generative and predictive models are separately trained with supervised learning algorithms. In the second phase, both models are trained jointly with reinforcement learning approach to bias newly generated chemical structures towards those with desired physical and biological properties. In this proof-of-concept study, we have employed this integrative strategy to design chemical libraries biased toward compounds with either maximal, minimal, or specific range of physical properties, such as melting point and hydrophobicity, as well as to develop novel putative inhibitors of JAK2. This new approach can find a general use for generating targeted chemical libraries optimized for a single desired property or multiple properties.\nIn the covariate shift learning scenario, the training and test covariate distributions differ, so that a predictor's average loss over the training and test distributions also differ. In this work, we explore the potential of extreme dimension reduction, i.e. to very low dimensions, in improving the performance of importance weighting methods for handling covariate shift, which fail in high dimensions due to potentially high train/test covariate divergence and the inability to accurately estimate the requisite density ratios. 
We first formulate and solve a problem optimizing over linear subspaces a combination of their predictive utility and train/test divergence within. Applying it to simulated and real data, we show extreme dimension reduction helps sometimes but not always, due to a bias introduced by dimension reduction.\nExisting music recognition applications require a connection to a server that performs the actual recognition. In this paper we present a low-power music recognizer that runs entirely on a mobile device and automatically recognizes music without user interaction. To reduce battery consumption, a small music detector runs continuously on the mobile device's DSP chip and wakes up the main application processor only when it is confident that music is present. Once woken, the recognizer on the application processor is provided with a few seconds of audio which is fingerprinted and compared to the stored fingerprints in the on-device fingerprint database of tens of thousands of songs. Our presented system, Now Playing, has a daily battery usage of less than 1% on average, respects user privacy by running entirely on-device and can passively recognize a wide range of music.\nWe introduce a method for embedding words as probability densities in a low-dimensional space. Rather than assuming that a word embedding is fixed across the entire text collection, as in standard word embedding methods, in our Bayesian model we generate it from a word-specific prior density for each occurrence of a given word. Intuitively, for each word, the prior density encodes the distribution of its potential 'meanings'. These prior densities are conceptually similar to Gaussian embeddings. Interestingly, unlike the Gaussian embeddings, we can also obtain context-specific densities: they encode uncertainty about the sense of a word given its context and correspond to posterior distributions within our model. 
The context-dependent densities have many potential applications: for example, we show that they can be directly used in the lexical substitution task. We describe an effective estimation method based on the variational autoencoding framework. We also demonstrate that our embeddings achieve competitive results on standard benchmarks.\nModern social platforms are characterized by the presence of rich user-behavior data associated with the publication, sharing and consumption of textual content. Users interact with content and with each other in a complex and dynamic social environment while simultaneously evolving over time. In order to effectively characterize users and predict their future behavior in such a setting, it is necessary to overcome several challenges. Content heterogeneity and temporal inconsistency of behavior data result in severe sparsity at the user level. In this paper, we propose a novel mutual-enhancement framework to simultaneously partition and learn latent activity profiles of users. We propose a flexible user partitioning approach to effectively discover rare behaviors and tackle user-level sparsity. We extensively evaluate the proposed framework on massive datasets from real-world platforms including Q&A networks and interactive online courses (MOOCs). Our results indicate significant gains over state-of-the-art behavior models ( 15% avg ) in a varied range of tasks and our gains are further magnified for users with limited interaction data. The proposed algorithms are amenable to parallelization, scale linearly in the size of datasets, and provide flexibility to model diverse facets of user behavior.\nVideo captioning is the task of automatically generating a textual description of the actions in a video. Although previous work (e.g. 
sequence-to-sequence model) has shown promising results in abstracting a coarse description of a short video, it is still very challenging to caption a video containing multiple fine-grained actions with a detailed description. This paper aims to address the challenge by proposing a novel hierarchical reinforcement learning framework for video captioning, where a high-level Manager module learns to design sub-goals and a low-level Worker module recognizes the primitive actions to fulfill the sub-goal. With this compositional framework to reinforce video captioning at different levels, our approach significantly outperforms all the baseline methods on a newly introduced large-scale dataset for fine-grained video captioning. Furthermore, our non-ensemble model has already achieved the state-of-the-art results on the widely-used MSR-VTT dataset.\nThis paper develops a novel methodology for using symbolic knowledge in deep learning. From first principles, we derive a semantic loss function that bridges between neural output vectors and logical constraints. This loss function captures how close the neural network is to satisfying the constraints on its output. An experimental evaluation shows that our semantic loss function effectively guides the learner to achieve (near-)state-of-the-art results on semi-supervised multi-class classification. Moreover, it significantly increases the ability of the neural network to predict structured objects, such as rankings and paths. These discrete concepts are tremendously difficult to learn, and benefit from a tight integration of deep learning and symbolic reasoning methods.\nThis paper proposes a real-time embedded fall detection system using a DVS(Dynamic Vision Sensor) that has never been used for traditional fall detection, a dataset for fall detection using that, and a DVS-TN(DVS-Temporal Network). 
The first contribution is building a DVS Falls Dataset, which made our network to recognize a much greater variety of falls than the existing datasets that existed before and solved privacy issues using the DVS. Secondly, we introduce the DVS-TN : optimized deep learning network to detect falls using DVS. Finally, we implemented a fall detection system which can run on low-computing H/W with real-time, and tested on DVS Falls Dataset that takes into account various falls situations. Our approach achieved 95.5% on the F1-score and operates at 31.25 FPS on NVIDIA Jetson TX1 board.\nIn this paper, we propose a method for training neural networks when we have a large set of data with weak labels and a small amount of data with true labels. In our proposed model, we train two neural networks: a target network, the learner and a confidence network, the meta-learner. The target network is optimized to perform a given task and is trained using a large set of unlabeled data that are weakly annotated. We propose to control the magnitude of the gradient updates to the target network using the scores provided by the second confidence network, which is trained on a small amount of supervised data. Thus we avoid that the weight updates computed from noisy labels harm the quality of the target network model.\nConvNets and Imagenet have driven the recent success of deep learning for image classification. However, the marked slowdown in performance improvement, the recent studies on the lack of robustness of neural networks to adversarial examples and their tendency to exhibit undesirable biases (e.g racial biases) questioned the reliability and the sustained development of these methods. This work investigates these questions from the perspective of the end-user by using human subject studies and explanations. We experimentally demonstrate that the accuracy and robustness of ConvNets measured on Imagenet are underestimated. 
We show that explanations can mitigate the impact of misclassified adversarial examples from the perspective of the end-user and we introduce a novel tool for uncovering the undesirable biases learned by a model. These contributions also show that explanations are a promising tool for improving our understanding of ConvNets' predictions and for designing more reliable models\nThis paper is concerned with how to make efficient use of social information to improve recommendations. Most existing social recommender systems assume people share similar preferences with their social friends. Which, however, may not hold true due to various motivations of making online friends and dynamics of online social networks. Inspired by recent causal process based recommendations that first model user exposures towards items and then use these exposures to guide rating prediction, we utilize social information to capture user exposures rather than user preferences. We assume that people get information of products from their online friends and they do not have to share similar preferences, which is less restrictive and seems closer to reality. Under this new assumption, in this paper, we present a novel recommendation approach (named SERec) to integrate social exposure into collaborative filtering. We propose two methods to implement SERec, namely social regularization and social boosting, each with different ways to construct social exposures. Experiments on four real-world datasets demonstrate that our methods outperform the state-of-the-art methods on top-N recommendations. Further study compares the robustness and scalability of the two proposed methods.\nDetermining semantic similarity between academic documents is crucial to many tasks such as plagiarism detection, automatic technical survey and semantic search. Current studies mostly focus on semantic similarity between concepts, sentences and short text fragments. 
However, document-level semantic matching is still based on statistical information in surface level, neglecting article structures and global semantic meanings, which may cause the deviation in document understanding. In this paper, we focus on the document-level semantic similarity issue for academic literatures with a novel method. We represent academic articles with topic events that utilize multiple information profiles, such as research purposes, methodologies and domains to integrally describe the research work, and calculate the similarity between topic events based on the domain ontology to acquire the semantic similarity between articles. Experiments show that our approach achieves significant performance compared to state-of-the-art methods.\nWe propose to use neural networks for simultaneous detection and localization of multiple sound sources in human-robot interaction. In contrast to conventional signal processing techniques, neural network-based sound source localization methods require fewer strong assumptions about the environment. Previous neural network-based methods have been focusing on localizing a single sound source, which do not extend to multiple sources in terms of detection and localization. In this paper, we thus propose a likelihood-based encoding of the network output, which naturally allows the detection of an arbitrary number of sources. In addition, we investigate the use of sub-band cross-correlation information as features for better localization in sound mixtures, as well as three different network architectures based on different motivations. Experiments on real data recorded from a robot show that our proposed methods significantly outperform the popular spatial spectrum-based approaches.\nReinforcement Learning and the Evolutionary Strategy are two major approaches in addressing complicated control problems. Both are strong contenders and have their own devotee communities. 
Both groups have been very active in developing new advances in their own domain and devising, in recent years, leading-edge techniques to address complex continuous control tasks. Here, in the context of Deep Reinforcement Learning, we formulate a parallelized version of the Proximal Policy Optimization method and a Deep Deterministic Policy Gradient method. Moreover, we conduct a thorough comparison between the state-of-the-art techniques in both camps for continuous control; evolutionary methods and Deep Reinforcement Learning methods. The results show there is no consistent winner.\nRecent progress in deep learning has been accompanied by a growing concern for whether models are fair for users, with equally good performance across different demographics. In computer vision research, such questions are relevant to face detection and the related task of face attribute detection, among others. We measure race and gender inclusion in the context of smiling detection, and introduce a method for improving smiling detection across demographic groups. Our method introduces several modifications over existing detection methods, leveraging twofold transfer learning to better model facial diversity. Results show that this technique improves accuracy against strong baselines for most demographic groups as well as overall. Our best-performing model defines a new state-of-the-art for smiling detection, reaching 91% on the Faces of the World dataset. The accompanying multi-head diversity classifier also defines a new state-of-the-art for gender classification, reaching 93.87% on the Faces of the World dataset. This research demonstrates the utility of modeling race and gender to improve a face attribute detection task, using a twofold transfer learning framework that allows for privacy towards individuals in a target dataset.\nLearning Automata (LA) are considered as one of the most powerful tools in the field of reinforcement learning. 
The family of estimator algorithms is proposed to improve the convergence rate of LA and has made great achievements. However, the estimators perform poorly on estimating the reward probabilities of actions in the initial stage of the learning process of LA. In this situation, a lot of rewards would be added to the probabilities of non-optimal actions. Thus, a large number of extra iterations are needed to compensate for these wrong rewards. In order to improve the speed of convergence, we propose a new P-model absorbing learning automaton by utilizing a double competitive strategy which is designed for updating the action probability vector. In this way, the wrong rewards can be corrected instantly. Hence, the proposed Double Competitive Algorithm overcomes the drawbacks of existing estimator algorithms. A refined analysis is presented to show the $\\epsilon-optimality$ of the proposed scheme. The extensive experimental results in benchmark environments demonstrate that our proposed learning automata perform more efficiently than the most classic LA $SE_{RI}$ and the current fastest LA $DGCPA^{*}$.\nThe task of decision-making under uncertainty is daunting, especially for problems which have significant complexity. Healthcare policy makers across the globe are facing problems under challenging constraints, with limited tools to help them make data driven decisions. In this work we frame the process of finding an optimal malaria policy as a stochastic multi-armed bandit problem, and implement three agent based strategies to explore the policy space. We apply a Gaussian Process regression to the findings of each agent, both for comparison and to account for stochastic results from simulating the spread of malaria in a fixed population. The generated policy spaces are compared with published results to give a direct reference with human expert decisions for the same simulated population. 
Our novel approach provides a powerful resource for policy makers, and a platform which can be readily extended to capture future more nuanced policy spaces.\nHumans are able to identify a referred visual object in a complex scene via a few rounds of natural language communications. Successful communication requires both parties to engage and learn to adapt for each other. In this paper, we introduce an interactive training method to improve the natural language conversation system for a visual grounding task. During interactive training, both agents are reinforced by the guidance from a common reward function. The parametrized reward function also cooperatively updates itself via interactions, and contributes to accomplishing the task. We evaluate the method on GuessWhat?! visual grounding task, and significantly improve the task success rate. However, we observe language drifting problem during training and propose to use reward engineering to improve the interpretability for the generated conversations. Our result also indicates evaluating goal-ended visual conversation tasks require semantic relevant metrics beyond task success rate.\nThis paper briefly elaborates on a development in (applied) fuzzy logic that has taken place in the last couple of decades, namely, the complementation or even replacement of the traditional knowledge-based approach to fuzzy rule-based systems design by a data-driven one. It is argued that the classical rule-based modeling paradigm is actually more amenable to the knowledge-based approach, for which it has originally been conceived, while being less apt to data-driven model design. An important reason that prevents fuzzy (rule-based) systems from being leveraged in large-scale applications is the flat structure of rule bases, along with the local nature of fuzzy rules and their limited ability to express complex dependencies between variables. 
This motivates alternative approaches to fuzzy systems modeling, in which functional dependencies can be represented more flexibly and more compactly in terms of hierarchical structures.\nProgramming trends suggest that software development will undergo a radical change in the future: the combination of machine learning, artificial intelligence, natural language processing, and code generation technologies will improve in such a way that machines, instead of humans, will write most of their own code by 2040. This poses a number of interesting challenges for scientific research, especially as the hardware on which this Machine Generated Code will run becomes extremely heterogeneous. Indeed, extreme heterogeneity may drive the creation of this technology because it will allow humans to cope with the difficulty of programming different devices efficiently and easily.\nAlzheimer's disease is the most common cause of dementia, yet hard to diagnose precisely without invasive techniques, particularly at the onset of the disease. This work approaches image analysis and classification of synthetic multispectral images composed by diffusion-weighted magnetic resonance (MR) cerebral images for the evaluation of cerebrospinal fluid area and measuring the advance of Alzheimer's disease. A clinical 1.5 T MR imaging system was used to acquire all images presented. The classification methods are based on multilayer perceptrons and Kohonen Self-Organized Map classifiers. We assume the classes of interest can be separated by hyperquadrics. Therefore, a 2-degree polynomial network is used to classify the original image, generating the ground truth image. 
The classification results are used to improve the usual analysis of the apparent diffusion coefficient map.\nWe consider the problem of learning a one-hidden-layer neural network with non-overlapping convolutional layer and ReLU activation function, i.e., $f(\\mathbf{Z}; \\mathbf{w}, \\mathbf{a}) = \\sum_j a_j\\sigma(\\mathbf{w}^\\top\\mathbf{Z}_j)$, in which both the convolutional weights $\\mathbf{w}$ and the output weights $\\mathbf{a}$ are parameters to be learned. We prove that with Gaussian input $\\mathbf{Z}$, there is a spurious local minimum that is not a global minimum. Surprisingly, in the presence of local minimum, starting from randomly initialized weights, gradient descent with weight normalization can still be proven to recover the true parameters with constant probability (which can be boosted to arbitrarily high accuracy with multiple restarts). We also show that with constant probability, the same procedure could also converge to the spurious local minimum, showing that the local minimum plays a non-trivial role in the dynamics of gradient descent. Furthermore, a quantitative analysis shows that the gradient descent dynamics has two phases: it starts off slow, but converges much faster after several iterations.\nThe ability to learn at different resolutions in time may help overcome one of the main challenges in deep reinforcement learning -- sample efficiency. Hierarchical agents that operate at different levels of temporal abstraction can learn tasks more quickly because they can divide the work of learning behaviors among multiple policies and can also explore the environment at a higher level. In this paper, we present a novel approach to hierarchical reinforcement learning called Hierarchical Actor-Critic (HAC) that enables agents to learn to break down problems involving continuous action spaces into simpler subproblems belonging to different time scales. 
HAC has two key advantages over most existing hierarchical learning methods: (i) the potential for faster learning as agents learn short policies at each level of the hierarchy and (ii) an end-to-end approach. We demonstrate that HAC significantly accelerates learning in a series of tasks that require behavior over a relatively long time horizon and involve sparse rewards.\nPerformance appraisal (PA) is an important HR process to periodically measure and evaluate every employee's performance vis-a-vis the goals established by the organization. A PA process involves purposeful multi-step multi-modal communication between employees, their supervisors and their peers, such as self-appraisal, supervisor assessment and peer feedback. Analysis of the structured data and text produced in PA is crucial for measuring the quality of appraisals and tracking actual improvements. In this paper, we apply text mining techniques to produce insights from PA text. First, we perform sentence classification to identify strengths, weaknesses and suggestions of improvements found in the supervisor assessments and then use clustering to discover broad categories among them. Next we use multi-class multi-label classification techniques to match supervisor assessments to predefined broad perspectives on performance. Finally, we propose a short-text summarization technique to produce a summary of peer feedback comments for a given employee and compare it with manual summaries. All techniques are illustrated using a real-life dataset of supervisor assessment and peer feedback text produced during the PA of 4528 employees in a large multi-national IT company.\nA correspondence between database tuples as causes for query answers in databases and tuple-based repairs of inconsistent databases with respect to denial constraints has already been established. 
In this work, answer-set programs that specify repairs of databases are used as a basis for solving computational and reasoning problems about causes. Here, causes are also introduced at the attribute level by appealing to a both null-based and attribute-based repair semantics. The corresponding repair programs are presented, and they are used as a basis for computation and reasoning about attribute-level causes.\nRecently experience replay is widely used in various deep reinforcement learning (RL) algorithms, however in this paper we showcase that it is not as good as people think. To be more specific, experience replay will significantly hurt the learning process if the size of replay buffer is not well tuned. Although experience replay is a necessary component in modern deep RL algorithms to stabilize the network, we should be aware that the idea of experience replay itself is not as good as people think. The size of the replay buffer is an important hyper-parameter, which can significantly influence the performance and has unfortunately been underestimated in the community for a long time. In this paper we did a systematic empirical study of experience replay under various function representations. We showcase that a large replay buffer can significantly hurt the performance. Moreover, we propose a simple O(1) method to remedy the negative influence of a large replay buffer. We showcase its utility in both simple grid world and challenging domains like Atari games. Moreover, we visualize how a large replay buffer hurts the learning process.\nInteractive systems have taken over the web and mobile space with increasing participation from users. Applications across every marketing domain can now be accessed through mobile or web where users can directly perform certain actions and reach a desired outcome. Actions of user on a system, though, can be representative of a certain intent. 
Ability to learn this intent through user's actions can help draw certain insight into the behavior of users on a system.   In this paper, we present models to optimize interactive systems by learning and analyzing user intent through their actions on the system. We present a four phased model that uses time-series of interaction actions sequentially using a Long Short-Term Memory (LSTM) based sequence learning system that helps build a model for intent recognition. Our system then provides an objective specific maximization followed by analysis and contrasting methods in order to identify spaces of improvement in the interaction system. We discuss deployment scenarios for such a system and present results from evaluation on an online marketplace using user clickstream data.\nIn this work we propose a blackbox intervention method for visual dialog models, with the aim of assessing the contribution of individual linguistic or visual components. Concretely, we conduct structured or randomized interventions that aim to impair an individual component of the model, and observe changes in task performance. We reproduce a state-of-the-art visual dialog model and demonstrate that our methodology yields surprising insights, namely that both dialog and image information have minimal contributions to task performance. The intervention method presented here can be applied as a sanity check for the strength and robustness of each component in visual dialog systems.\nDeriving event storylines is an effective summarization method to succinctly organize extensive information, which can significantly alleviate the pain of information overload. The critical challenge is the lack of widely recognized definition of storyline metric. Prior studies have developed various approaches based on different assumptions about users' interests. These works can extract interesting patterns, but their assumptions do not guarantee that the derived patterns will match users' preference. 
On the other hand, their exclusiveness of single modality source misses cross-modality information. This paper proposes a method, multimodal imitation learning via generative adversarial networks(MIL-GAN), to directly model users' interests as reflected by various data. In particular, the proposed model addresses the critical challenge by imitating users' demonstrated storylines. Our proposed model is designed to learn the reward patterns given user-provided storylines and then applies the learned policy to unseen data. The proposed approach is demonstrated to be capable of acquiring the user's implicit intent and outperforming competing methods by a substantial margin with a user study.\nThe search for increased trustworthiness of SAT solvers is very active and uses various methods. Some of these methods obtain a proof from the provers then check it, normally by replicating the search based on the proof's information. Because the certification process involves another nontrivial proof search, the trust we can place in it is decreased. Some attempts to amend this use certifiers which have been verified by proofs assistants such as Isabelle/HOL and Coq. Our approach is different because it is based on an extremely simplified certifier. This certifier enjoys a very high level of trust but is very inefficient. In this paper, we experiment with this approach and conclude that by placing some restrictions on the formats, one can mostly eliminate the need for search and in principle, can certify proofs of arbitrary size.\nA major challenge in Entity Linking (EL) is making effective use of contextual information to disambiguate mentions to Wikipedia that might refer to different entities in different contexts. The problem exacerbates with cross-lingual EL which involves linking mentions written in non-English documents to entries in the English Wikipedia: to compare textual clues across languages we need to compute similarity between textual fragments across languages. 
In this paper, we propose a neural EL model that trains fine-grained similarities and dissimilarities between the query and candidate document from multiple perspectives, combined with convolution and tensor networks. Further, we show that this English-trained system can be applied, in zero-shot learning, to other languages by making surprisingly effective use of multi-lingual embeddings. The proposed system has strong empirical evidence yielding state-of-the-art results in English as well as cross-lingual: Spanish and Chinese TAC 2015 datasets.\nAttention-based sequence-to-sequence models for automatic speech recognition jointly train an acoustic model, language model, and alignment mechanism. Thus, the language model component is only trained on transcribed audio-text pairs. This leads to the use of shallow fusion with an external language model at inference time. Shallow fusion refers to log-linear interpolation with a separately trained language model at each step of the beam search. In this work, we investigate the behavior of shallow fusion across a range of conditions: different types of language models, different decoding units, and different tasks. On Google Voice Search, we demonstrate that the use of shallow fusion with a neural LM with wordpieces yields a 9.1% relative word error rate reduction (WERR) over our competitive attention-based sequence-to-sequence model, obviating the need for second-pass rescoring.\nIn this paper, we propose a new algorithm for learning general latent-variable probabilistic graphical models using the techniques of predictive state representation, instrumental variable regression, and reproducing-kernel Hilbert space embeddings of distributions. 
Under this new learning framework, we first convert latent-variable graphical models into corresponding latent-variable junction trees, and then reduce the hard parameter learning problem into a pipeline of supervised learning problems, whose results will then be used to perform predictive belief propagation over the latent junction tree during the actual inference procedure. We then give proofs of our algorithm's correctness, and demonstrate its good performance in experiments on one synthetic dataset and two real-world tasks from computational biology and computer vision - classifying DNA splice junctions and recognizing human actions in videos.\nAttention mechanism has been used as an ancillary means to help RNN or CNN. However, the Transformer (Vaswani et al., 2017) recently recorded the state-of-the-art performance in machine translation with a dramatic reduction in training time by solely using attention. Motivated by the Transformer, Directional Self Attention Network (Shen et al., 2017), a fully attention-based sentence encoder, was proposed. It showed good performance with various data by using forward and backward directional information in a sentence. But in their study, not considered at all was the distance between words, an important feature when learning the local dependency to help understand the context of input text. We propose Distance-based Self-Attention Network, which considers the word distance by using a simple distance mask in order to model the local dependency without losing the ability of modeling global dependency which attention has inherent. Our model shows good performance with NLI data, and it records the new state-of-the-art result with SNLI data. Additionally, we show that our model has a strength in long sentences or documents.\nAn adversarial example is an example that has been adjusted to produce a wrong label when presented to a system at test time. 
To date, adversarial example constructions have been demonstrated for classifiers, but not for detectors. If adversarial examples that could fool a detector exist, they could be used to (for example) maliciously create security hazards on roads populated with smart vehicles. In this paper, we demonstrate a construction that successfully fools two standard detectors, Faster RCNN and YOLO. The existence of such examples is surprising, as attacking a classifier is very different from attacking a detector, and that the structure of detectors - which must search for their own bounding box, and which cannot estimate that box very accurately - makes it quite likely that adversarial patterns are strongly disrupted. We show that our construction produces adversarial examples that generalize well across sequences digitally, even though large perturbations are needed. We also show that our construction yields physical objects that are adversarial.\nWith access to large datasets, deep neural networks (DNN) have achieved human-level accuracy in image and speech recognition tasks. However, in chemistry, data is inherently small and fragmented. In this work, we develop an approach of using rule-based knowledge for training ChemNet, a transferable and generalizable deep neural network for chemical property prediction that learns in a weak-supervised manner from large unlabeled chemical databases. When coupled with transfer learning approaches to predict other smaller datasets for chemical properties that it was not originally trained on, we show that ChemNet's accuracy outperforms contemporary DNN models that were trained using conventional supervised learning. Furthermore, we demonstrate that the ChemNet pre-training approach is equally effective on both CNN (Chemception) and RNN (SMILES2vec) models, indicating that this approach is network architecture agnostic and is effective across multiple data modalities. 
Our results indicate a pre-trained ChemNet that incorporates chemistry domain knowledge, enables the development of generalizable neural networks for more accurate prediction of novel chemical properties.\nThis paper is concerned with paraphrase detection. The ability to detect similar sentences written in natural language is crucial for several applications, such as text mining, text summarization, plagiarism detection, authorship authentication and question answering. Given two sentences, the objective is to detect whether they are semantically identical. An important insight from this work is that existing paraphrase systems perform well when applied on clean texts, but they do not necessarily deliver good performance against noisy texts. Challenges with paraphrase detection on user generated short texts, such as Twitter, include language irregularity and noise. To cope with these challenges, we propose a novel deep neural network-based approach that relies on coarse-grained sentence modeling using a convolutional neural network and a long short-term memory model, combined with a specific fine-grained word-level similarity matching model. Our experimental results show that the proposed approach outperforms existing state-of-the-art approaches on user-generated noisy social media data, such as Twitter texts, and achieves highly competitive performance on a cleaner corpus.\nLearning a goal-oriented dialog policy is generally performed offline with supervised learning algorithms or online with reinforcement learning (RL). Additionally, as companies accumulate massive quantities of dialog transcripts between customers and trained human agents, encoder-decoder methods have gained popularity as agent utterances can be directly treated as supervision without the need for utterance-level annotations. However, one potential drawback of such approaches is that they myopically generate the next agent utterance without regard for dialog-level considerations. 
To resolve this concern, this paper describes an offline RL method for learning from unannotated corpora that can optimize a goal-oriented policy at both the utterance and dialog level. We introduce a novel reward function and use both on-policy and off-policy policy gradient to learn a policy offline without requiring online user interaction or an explicit state space definition.\nRecent advances with in-memory columnar database techniques have increased the performance of analytical queries on very large databases and data warehouses. At the same time, advances in artificial intelligence (AI) algorithms have increased the ability to analyze data. We use the term AI to encompass both Deep Learning (DL or neural network) and Machine Learning (ML aka Big Data analytics). Our exploration of the AI full stack has led us to a cross-stack columnar database innovation that efficiently creates features for AI analytics. The innovation is to create Augmented Dictionary Values (ADVs) to add to existing columnar database dictionaries in order to increase the efficiency of featurization by minimizing data movement and data duplication. We show how various forms of featurization (feature selection, feature extraction, and feature creation) can be efficiently calculated in a columnar database. The full stack AI investigation has also led us to propose an integrated columnar database and AI architecture. This architecture has information flows and feedback loops to improve the whole analytics cycle during multiple iterations of extracting data from the data sources, featurization, and analysis.\nCoordinate descent methods minimize a cost function by updating a single decision variable (corresponding to one coordinate) at a time. Ideally, one would update the decision variable that yields the largest marginal decrease in the cost function. However, finding this coordinate would require checking all of them, which is not computationally practical. 
We instead propose a new adaptive method for coordinate descent. First, we define a lower bound on the decrease of the cost function when a coordinate is updated and, instead of calculating this lower bound for all coordinates, we use a multi-armed bandit algorithm to learn which coordinates result in the largest marginal decrease while simultaneously performing coordinate descent. We show that our approach improves the convergence of the coordinate methods (including parallel versions) both theoretically and experimentally.\nIn this paper, we describe and study the indicator mining problem in the online sex advertising domain. We present an in-development system, FlagIt (Flexible and adaptive generation of Indicators from text), which combines the benefits of both a lightweight expert system and classical semi-supervision (heuristic re-labeling) with recently released state-of-the-art unsupervised text embeddings to tag millions of sentences with indicators that are highly correlated with human trafficking. The FlagIt technology stack is open source. On preliminary evaluations involving five indicators, FlagIt illustrates promising performance compared to several alternatives. The system is being actively developed, refined and integrated into a domain-specific search system used by over 200 law enforcement agencies to combat human trafficking, and is being aggressively extended to mine at least six more indicators with minimal programming effort. FlagIt is a good example of a system that operates in limited label settings, and that requires creative combinations of established machine learning techniques to produce outputs that could be used by real-world non-technical analysts.\nAn outstanding challenge in nonlinear systems theory is identification or learning of a given nonlinear system's Koopman operator directly from data or models. 
Advances in extended dynamic mode decomposition approaches and machine learning methods have enabled data-driven discovery of Koopman operators, for both continuous and discrete-time systems. Since Koopman operators are often infinite-dimensional, they are approximated in practice using finite-dimensional systems. The fidelity and convergence of a given finite-dimensional Koopman approximation is a subject of ongoing research. In this paper we introduce a class of Koopman observable functions that confer an approximate closure property on their corresponding finite-dimensional approximations of the Koopman operator. We derive error bounds for the fidelity of this class of observable functions, as well as identify two key learning parameters which can be used to tune performance. We illustrate our approach on two classical nonlinear system models: the Van Der Pol oscillator and the bistable toggle switch.\nFeature selection is an important preprocessing step for classification problems. It deals with selecting near optimal features in the original dataset. Feature selection is an NP-hard problem, so meta-heuristics can be more efficient than exact methods. In this work, Ant Lion Optimizer (ALO), which is a recent metaheuristic algorithm, is employed as a wrapper feature selection method. Six variants of ALO are proposed where each employ a transfer function to map a continuous search space to a discrete search space. The performance of the proposed approaches is tested on eighteen UCI datasets and compared to a number of existing approaches in the literature: Particle Swarm Optimization, Gravitational Search Algorithm, and two existing ALO-based approaches. 
Computational experiments show that the proposed approaches efficiently explore the feature space and select the most informative features, which help to improve the classification accuracy.\nAs of February 2016 Facebook allows users to express their experienced emotions about a post by using five so-called `reactions'. This research paper proposes and evaluates alternative methods for predicting these reactions to user posts on public pages of firms/companies (like supermarket chains). For this purpose, we collected posts (and their reactions) from Facebook pages of large supermarket chains and constructed a dataset which is available for other researches. In order to predict the distribution of reactions of a new post, neural network architectures (convolutional and recurrent neural networks) were tested using pretrained word embeddings. Results of the neural networks were improved by introducing a bootstrapping approach for sentiment and emotion mining on the comments for each post. The final model (a combination of neural network and a baseline emotion miner) is able to predict the reaction distribution on Facebook posts with a mean squared error (or misclassification rate) of 0.135.\nWhile off-policy temporal difference methods have been broadly used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have been relatively understudied. This is mainly because the max operator in the Bellman optimality equation brings non-linearity and inconsistent distributions over value function. In this paper, we introduce a new Bayesian approach to off-policy TD methods using Assumed Density Filtering, called ADFQ, which updates beliefs on action-values (Q) through an online Bayesian inference method. Uncertainty measures in the beliefs not only are used in exploration but they provide a natural regularization in the belief updates. We also present a connection between ADFQ and Q-learning. 
Our empirical results show the proposed ADFQ algorithms outperform comparing algorithms in several task domains. Moreover, our algorithms improve general drawbacks in BRL such as computational complexity, usage of uncertainty, and nonlinearity.\nThis paper proposes adversarial attacks for Reinforcement Learning (RL) and then improves the robustness of Deep Reinforcement Learning algorithms (DRL) to parameter uncertainties with the help of these attacks. We show that even a naively engineered attack successfully degrades the performance of DRL algorithm. We further improve the attack using gradient information of an engineered loss function which leads to further degradation in performance. These attacks are then leveraged during training to improve the robustness of RL within robust control framework. We show that this adversarial training of DRL algorithms like Deep Double Q learning and Deep Deterministic Policy Gradients leads to significant increase in robustness to parameter variations for RL benchmarks such as Cart-pole, Mountain Car, Hopper and Half Cheetah environment.\nIn recent years, many techniques have been developed to improve the performance and efficiency of data center networks. While these techniques provide high accuracy, they are often designed using heuristics that leverage domain-specific properties of the workload or hardware.   In this vision paper, we argue that many data center networking techniques, e.g., routing, topology augmentation, energy savings, with diverse goals actually share design and architectural similarity. We present a design for developing general intermediate representations of network topologies using deep learning that is amenable to solving classes of data center problems. We develop a framework, DeepConfig, that simplifies the processing of configuring and training deep learning agents that use the intermediate representation to learns different tasks. 
To illustrate the strength of our approach, we configured, implemented, and evaluated a DeepConfig-Agent that tackles the data center topology augmentation problem. Our initial results are promising --- DeepConfig performs comparably to the optimal.\nWe present MINOS, a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments. The simulator leverages large datasets of complex 3D environments and supports flexible configuration of multimodal sensor suites. We use MINOS to benchmark deep-learning-based navigation methods, to analyze the influence of environmental complexity on navigation performance, and to carry out a controlled study of multimodality in sensorimotor learning. The experiments show that current deep reinforcement learning approaches fail in large realistic environments. The experiments also indicate that multimodality is beneficial in learning to navigate cluttered scenes. MINOS is released open-source to the research community at http://minosworld.org . A video that shows MINOS can be found at https://youtu.be/c0mL9K64q84\nTransfer Learning helps to build a system to recognize and apply knowledge and experience learned in previous tasks (source task) to new tasks or new domains (target task), which share some commonality. The two important factors that impact the performance of transfer learning models are: (a) the size of the target dataset and (b) the similarity in distribution between source and target domains. Thus far there has been little investigation into just how important these factors are. In this paper, we investigated the impact of target dataset size and source/target domain similarity on model performance through a series of experiments. We found that more data is always beneficial, and that model performance improved linearly with the log of data size, until we were out of data. 
As source/target domains differ, more data is required and fine tuning will render better performance than feature extraction. When source/target domains are similar and data size is small, fine tuning and feature extraction renders equivalent performance. We hope that our study inspires further work in transfer learning, which continues to be a very important technique for developing practical machine learning applications in business domains.\nModern virtual personal assistants provide a convenient interface for completing daily tasks via voice commands. An important consideration for these assistants is the ability to recover from automatic speech recognition (ASR) and natural language understanding (NLU) errors. In this paper, we focus on learning robust dialog policies to recover from these errors. To this end, we develop a user simulator which interacts with the assistant through voice commands in realistic scenarios with noisy audio, and use it to learn dialog policies through deep reinforcement learning. We show that dialogs generated by our simulator are indistinguishable from human generated dialogs, as determined by human evaluators. Furthermore, preliminary experimental results show that the learned policies in noisy environments achieve the same execution success rate with fewer dialog turns compared to fixed rule-based policies.\nEigenoptions (EOs) have been recently introduced as a promising idea for generating a diverse set of options through the graph Laplacian, having been shown to allow efficient exploration. 
Despite its initial promising results, a couple of issues in current algorithms limit its application, namely: (1) EO methods require two separate steps (eigenoption discovery and reward maximization) to learn a control policy, which can incur a significant amount of storage and computation; (2) EOs are only defined for problems with discrete state-spaces and; (3) it is not easy to take the environment's reward function into consideration when discovering EOs. To addresses these issues, we introduce an algorithm termed eigenoption-critic (EOC) based on the Option-critic (OC) framework [Bacon17], a general hierarchical reinforcement learning (RL) algorithm that allows learning the intra-option policies simultaneously with the policy over options. We also propose a generalization of EOC to problems with continuous state-spaces through the Nystr\\\"om approximation. EOC can also be seen as extending OC to nonstationary settings, where the discovered options are not tailored for a single task.\nThe performance of optimization algorithms relies crucially on their parameterizations. Finding good parameter settings is called algorithm tuning. Using a simple simulated annealing algorithm, we will demonstrate how optimization algorithms can be tuned using the sequential parameter optimization toolbox (SPOT). SPOT provides several tools for automated and interactive tuning. The underling concepts of the SPOT approach are explained. This includes key techniques such as exploratory fitness landscape analysis and response surface methodology. Many examples illustrate how SPOT can be used for understanding the performance of algorithms and gaining insight into algorithm's behavior. 
Furthermore, we demonstrate how SPOT can be used as an optimizer and how a sophisticated ensemble approach is able to combine several meta models via stacking.\nIn this paper, we present a comprehensive study and evaluation of existing single image dehazing algorithms, using a new large-scale benchmark consisting of both synthetic and real-world hazy images, called REalistic Single Image DEhazing (RESIDE). RESIDE highlights diverse data sources and image contents, and is divided into five subsets, each serving different training or evaluation purposes. We further provide a rich variety of criteria for dehazing algorithm evaluation, ranging from full-reference metrics, to no-reference metrics, to subjective evaluation and the novel task-driven evaluation. Experiments on RESIDE shed light on the comparisons and limitations of state-of-the-art dehazing algorithms, and suggest promising future directions.\nModeling and verifying real-world cyber-physical systems are challenging, especially so for complex systems where manually modeling is infeasible. In this work, we report our experience on combining model learning and abstraction refinement to analyze a challenging system, i.e., a real-world Secure Water Treatment (SWaT) system. Given a set of safety requirements, the objective is to either show that the system is safe with a high probability (so that a system shutdown is rarely triggered due to safety violation) or otherwise. As the system is too complicated to be manually modelled, we apply latest automatic model learning techniques to construct a set of Markov chains through abstraction and refinement, based on two long system execution logs (one for training and the other for testing). For each probabilistic property, we either report it does not hold with a certain level of probabilistic confidence, or report that it holds by showing the evidence in the form of an abstract Markov chain. 
The Markov chains can subsequently be implemented as runtime monitors in SWaT. This is the first case study of applying model learning techniques to a real-world system as far as we know.\nSequential pattern mining techniques extract patterns corresponding to frequent subsequences from a sequence database. A practical limitation of these techniques is that they overload the user with too many patterns. Local Process Model (LPM) mining is an alternative approach coming from the field of process mining. While in traditional sequential pattern mining, a pattern describes one subsequence, an LPM captures a set of subsequences. Also, while traditional sequential patterns only match subsequences that are observed in the sequence database, an LPM may capture subsequences that are not explicitly observed, but that are related to observed subsequences. In other words, LPMs generalize the behavior observed in the sequence database. These properties make it possible for a set of LPMs to cover the behavior of a much larger set of sequential patterns. Yet, existing LPM mining techniques still suffer from the pattern explosion problem because they produce sets of redundant LPMs. In this paper, we propose several heuristics to mine a set of non-redundant LPMs either from a set of redundant LPMs or from a set of sequential patterns. We empirically compare the proposed heuristics between them and against existing (local) process mining techniques in terms of coverage, precision, and complexity of the produced sets of LPMs.\nThe search for interpretable reinforcement learning policies is of high academic and industrial interest. Especially for industrial systems, domain experts are more likely to deploy autonomously learned controllers if they are understandable and convenient to evaluate. Basic algebraic equations are supposed to meet these requirements, as long as they are restricted to an adequate complexity. 
Here we introduce the genetic programming for reinforcement learning (GPRL) approach based on model-based batch reinforcement learning and genetic programming, which autonomously learns policy equations from pre-existing default state-action trajectory samples. GPRL is compared to a straight-forward method which utilizes genetic programming for symbolic regression, yielding policies imitating an existing well-performing, but non-interpretable policy. Experiments on three reinforcement learning benchmarks, i.e., mountain car, cart-pole balancing, and industrial benchmark, demonstrate the superiority of our GPRL approach compared to the symbolic regression method. GPRL is capable of producing well-performing interpretable reinforcement learning policies from pre-existing default trajectory data.\nIn this paper, we will show that (1) the results about the fuzzy reasoning algoritm obtained in the paper \"Computer Sciences Vol. 34, No.4, pp.145-148, 2007\" according to the paper \"IEEE Transactions On systems, Man and cybernetics, 18, pp.1049-1056, 1988\" are correct; (2) example 2 in the paper \"An Algorithm of General Fuzzy Inference With The Reductive Property\" presented by He Ying-Si, Quan Hai-Jin and Deng Hui-Wen according to the paper \"An approximate analogical reasoning approach based on similarity measures\" presented by Tursken I.B. and Zhong zhao is incorrect; (3) the mistakes in their paper are modified and then a calculation example of FMT is supplemented.\nBanking is one of the most significant adopters of cutting-edge information technologies. Since its modern era beginning in the form of paper based accounting maintained in the branch, adoption of computerized system made it possible to centralize the processing in data centre and improve customer experience by making a more available and efficient system. 
The latest twist in this evolution is adoption of natural language processing and speech recognition in the user interface between the human and the system and use of machine learning and advanced analytics, in general, for backend processing as well. The paper reviews the progress of technology adoption in the field and comments on the maturity level of solutions involving less studied or low-resource languages like Hindi and also other Indian, regional languages. Furthermore, it also provides an analysis from a prototype built by us. The future directions of this area are also highlighted.\nWe examine the issue of stability of probability in reasoning about complex systems with uncertainty in structure. Normally, propositions are viewed as probability functions on an abstract random graph where it is implicitly assumed that the nodes of the graph have stable properties. But what if some of the nodes change their characteristics? This is a situation that cannot be covered by abstractions of either static or dynamic sets when these changes take place at regular intervals. We propose the use of sets with elements that change, and modular forms are proposed to account for one type of such change. An expression for the dependence of the mean on the probability of the switching elements has been determined. The system is also analyzed from the perspective of decision between different hypotheses. Such sets are likely to be of use in complex system queries and in analysis of surveys.\nThis paper presents a framework for intrinsic point of interest discovery from trajectory databases. Intrinsic points of interest are regions of a geospatial area innately defined by the spatial and temporal aspects of trajectory data, and can be of varying size, shape, and resolution. 
Any trajectory database exhibits such points of interest, and hence are intrinsic, as compared to most other point of interest definitions which are said to be extrinsic, as they require trajectory metadata, external knowledge about the region the trajectories are observed, or other application-specific information. Spatial and temporal aspects are qualities of any trajectory database, making the framework applicable to data from any domain and of any resolution. The framework is developed under recent developments on the consistency of nonparametric hierarchical density estimators and enables the possibility of formal statistical inference and evaluation over such intrinsic points of interest. Comparisons of the POIs uncovered by the framework in synthetic truth data to thousands of parameter settings for common POI discovery methods show a marked improvement in fidelity without the need to tune any parameters by hand.\nTo harness the complexity of their high-dimensional bodies during sensorimotor development, infants are guided by patterns of freezing and freeing of degrees of freedom. For instance, when learning to reach, infants free the degrees of freedom in their arm proximodistally, i.e. from joints that are closer to the body to those that are more distant. Here, we formulate and study computationally the hypothesis that such patterns can emerge spontaneously as the result of a family of stochastic optimization processes (evolution strategies with covariance-matrix adaptation), without an innate encoding of a maturational schedule. In particular, we present simulated experiments with an arm where a computational learner progressively acquires reaching skills through adaptive exploration, and we show that a proximodistal organization appears spontaneously, which we denote PDFF (ProximoDistal Freezing and Freeing of degrees of freedom). 
We also compare this emergent organization between different arm morphologies -- from human-like to quite unnatural ones -- to study the effect of different kinematic structures on the emergence of PDFF. Keywords: human motor learning; proximo-distal exploration; stochastic optimization; modelling; evolution strategies; cross-entropy methods; policy search; morphology.\nThis paper considers the integrated problem of quay crane assignment, quay crane scheduling, yard location assignment, and vehicle dispatching operations at a container terminal. The main objective is to minimize vessel turnover times and maximize the terminal throughput, which are key economic drivers in terminal operations. Due to their computational complexities, these problems are not optimized jointly in existing work. This paper revisits this limitation and proposes Mixed Integer Programming (MIP) and Constraint Programming (CP) models for the integrated problem, under some realistic assumptions. Experimental results show that the MIP formulation can only solve small instances, while the CP model finds optimal solutions in reasonable times for realistic instances derived from actual container terminal operations.\nIn this work, we propose a goal-driven collaborative task that contains vision, language, and action in a virtual environment as its core components. Specifically, we develop a collaborative `Image Drawing' game between two agents, called CoDraw. Our game is grounded in a virtual world that contains movable clip art objects. Two players, Teller and Drawer, are involved. The Teller sees an abstract scene containing multiple clip arts in a semantically meaningful configuration, while the Drawer tries to reconstruct the scene on an empty canvas using available clip arts. The two players communicate in both directions using natural language. 
We collect the CoDraw dataset of ~10K dialogs consisting of 138K messages exchanged between a Teller and a Drawer from Amazon Mechanical Turk (AMT). We analyze our dataset and present three models to model the players' behaviors, including an attention model to describe and draw multiple clip arts at each round. The attention models are quantitatively compared to the other models to show how the conventional approaches work for this new task. We also present qualitative visualizations.\nRecurrent neural networks with differentiable attention mechanisms have had success in generative and classification tasks. We show that the classification performance of such models can be enhanced by guiding a randomly initialized model to attend to salient regions of the input in early training iterations. We further show that, if explicit heuristics for guidance are unavailable, a model that is pretrained on an unsupervised reconstruction task can discover good attention policies without supervision. We demonstrate that increased efficiency of the attention mechanism itself contributes to these performance improvements. Based on these insights, we introduce bootstrapped glimpse mimicking, a simple, theoretically task-general method of more effectively training attention models. Our work draws inspiration from and parallels results on human learning of attention.\nInverse reinforcement learning (IRL) attempts to infer human rewards or preferences from observed behavior. Since human planning systematically deviates from rationality, several approaches have been tried to account for specific human shortcomings. However, there has been little analysis of the general problem of inferring the reward of a human of unknown rationality. The observed behavior can, in principle, be decomposed into two components: a reward function and a planning algorithm, both of which have to be inferred from behavior. 
This paper presents a No Free Lunch theorem, showing that, without making `normative' assumptions beyond the data, nothing about the human reward function can be deduced from human behavior. Unlike most No Free Lunch theorems, this cannot be alleviated by regularising with simplicity assumptions. We show that the simplest hypotheses which explain the data are generally degenerate.\nAs more robots act in physical proximity to people, it is essential to ensure they make decisions and execute actions that align with human values. To do so, robots need to understand the true intentions behind human-issued commands. In this paper, we define a safe robot as one that receives a natural-language command from humans, considers an action in response to that command, and accurately predicts how humans will judge that action if it is executed in reality. Our contribution is two-fold: First, we introduce a web platform for human users to propose commands to simulated robots. The robots receive commands and act based on those proposed commands, and then the users provide positive and/or negative reinforcement. Next, we train a critic for each robot to predict the crowd's responses to one of the crowd-proposed commands. Second, we show that the morphology of a robot plays a role in the way it grounds language: The critics show that two of the robots used in the experiment achieve a lower prediction error than the others. Thus, those two robots are safer, according to our definition, since they ground the proposed command more accurately.\nThe next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in terms of performance and flexibility. In this paper, we consider these requirements and present Ray, a distributed system to address them. 
Ray implements a dynamic task graph computation model that supports both the task-parallel and the actor programming models. To meet the performance requirements of AI applications, we propose an architecture that logically centralizes the system's control state using a sharded storage system and a novel bottom-up distributed scheduler. In our experiments, we demonstrate sub-millisecond remote task latencies and linear throughput scaling beyond 1.8 million tasks per second. We empirically validate that Ray speeds up challenging benchmarks and serves as both a natural and performant fit for an emerging class of reinforcement learning applications and algorithms.\nDeep learning has led to a paradigm shift in artificial intelligence, including web, text and image search, speech recognition, as well as bioinformatics, with growing impact in chemical physics. Machine learning in general and deep learning in particular are ideally suited for representing quantum-mechanical interactions, enabling the modeling of nonlinear potential-energy surfaces or enhancing the exploration of chemical compound space. Here we present the deep learning architecture SchNet that is specifically designed to model atomistic systems by making use of continuous-filter convolutional layers. We demonstrate the capabilities of SchNet by accurately predicting a range of properties across chemical space for \\emph{molecules and materials} where our model learns chemically plausible embeddings of atom types across the periodic table. Finally, we employ SchNet to predict potential-energy surfaces and energy-conserving force fields for molecular dynamics simulations of small molecules and perform an exemplary study of the quantum-mechanical properties of C$_{20}$-fullerene that would have been infeasible with regular ab initio molecular dynamics.\nThe visual explanation of the learned representations of models helps us understand the fundamentals of learning. 
Previous work on attentional models visualized the attended regions over an image or text using the learned weights, to confirm the intended mechanism. Kim et al. (2016) show that the Hadamard product in multimodal deep networks, which is widely used as the joint function in visual question answering tasks, implicitly performs an attentional mechanism for visual inputs. In this work, we extend their work to show, using the proposed gradient-based visualization technique, that the Hadamard product in multimodal deep networks performs attention not only for visual inputs but also for textual inputs simultaneously. The attentional effect of the Hadamard product is visualized for both visual and textual inputs by analyzing the two inputs and an output of the Hadamard product with the proposed method, and is compared with the learned attentional weights of a visual question answering model.\n`Indifference' refers to a class of methods that are used to control a reward-based agent. These methods of control work even if the implications of the agent's reward are otherwise not fully understood. Though they all come out of similar ideas, indifference techniques can be classified as ways of achieving one or more of three distinct goals: rewards dependent on certain events (with no motivation for the agent to manipulate the probability of those events), effective disbelief that an event will ever occur, and seamless transition from one behaviour to another. This paper analyses methods of achieving these goals in the POMDP setting, and establishes their uses, strengths, and limitations. It aims to make the tools of indifference generally accessible and usable to agent designers.\nWe would like to learn latent representations that are low-dimensional and highly interpretable. A model that has these characteristics is the Gaussian Process Latent Variable Model. 
The benefits and drawbacks of the GP-LVM are complementary to those of the Variational Autoencoder: the former provides interpretable low-dimensional latent representations, while the latter is able to handle large amounts of data and can use non-Gaussian likelihoods. Our aim in this paper is to marry these two approaches and reap the benefits of both. To do so, we introduce a novel approximate inference scheme inspired by the GP-LVM and the VAE. We show experimentally that the approximation allows the capacity of the generative bottle-neck (Z) of the VAE to be arbitrarily large without losing a highly interpretable representation, so that reconstruction quality is not limited by Z, while a low-dimensional space can still be used both for ancestral sampling and as a means to reason about the embedded data.\nWe show that the forward and backward propagation can be formulated as a solution of lower and upper triangular systems of equations. For standard feedforward (FNNs) and recurrent neural networks (RNNs) the triangular systems are always block bi-diagonal, while for a general computation graph (directed acyclic graph) they can have a more complex triangular sparsity pattern. We discuss direct and iterative parallel algorithms that can be used for their solution and interpreted as different ways of performing model parallelism. Also, we show that for FNNs and RNNs with $k$ layers and $\\tau$ time steps the backward propagation can be performed in parallel in O($\\log k$) and O($\\log k \\log \\tau$) steps, respectively. Finally, we outline the generalization of this technique using Jacobians that potentially allows us to handle arbitrary layers.\nFrom past experience, we have seen that the growth of cities depends heavily on transportation networks. In mega cities, transportation networks determine to a significant extent where people will move and houses will be built. 
Hence, transportation network data is crucial to an urban growth prediction system. Existing works have used manually derived distance-based features computed from the road networks to build models of urban growth. But due to the non-generic and laborious nature of the manual feature engineering process, we can shift to End-to-End systems which do not rely on manual feature engineering. In this paper, we propose a method to integrate road network data into an existing Rule-based End-to-End framework without manual feature engineering. Our method employs recurrent neural networks to represent road networks in a structured way such that they can be plugged into the previously proposed End-to-End framework. The proposed approach enhances the performance in terms of Figure of Merit, Producer's accuracy, User's accuracy and Overall accuracy of the existing Rule-based End-to-End framework.\nFor relational monadic formulas (the L\\\"owenheim class) second-order quantifier elimination, which is closely related to computation of uniform interpolants, projection and forgetting - operations that currently receive much attention in knowledge processing - always succeeds. The decidability proof for this class by Heinrich Behmann from 1922 explicitly proceeds by elimination with equivalence preserving formula rewriting. Here we reconstruct the results from Behmann's publication in detail and discuss related issues that are relevant in the context of modern approaches to second-order quantifier elimination in computational logic. In addition, an extensive documentation of the letters and manuscripts in Behmann's bequest that concern second-order quantifier elimination is given, including a commented register and English abstracts of the German sources with focus on technical material. In the late 1920s Behmann attempted to develop an elimination-based decision method for formulas with predicates whose arity is larger than one. 
His manuscripts and the correspondence with Wilhelm Ackermann show technical aspects that are still of interest today and give insight into the genesis of Ackermann's landmark paper \"Untersuchungen \\\"uber das Eliminationsproblem der mathematischen Logik\" from 1935, which laid the foundation of the two prevailing modern approaches to second-order quantifier elimination.\nNowadays many artificial intelligence systems rely on knowledge bases for enriching the information they process. Such knowledge bases are usually difficult to obtain and are therefore crowdsourced: anyone on the internet can suggest edits and add new information. Unfortunately, they are sometimes targeted by vandals who insert inaccurate or offensive information. This is especially bad for the systems that use these knowledge bases: for them it is important to use reliable information to make correct inferences. One such knowledge base is Wikidata, and to fight vandals the organizers of WSDM Cup 2017 challenged participants to build a model for detecting untrustworthy edits. In this paper we present the second place solution to the cup: we show that it is possible to achieve competitive performance with simple linear classification. With our approach we achieve a ROC AUC of 0.938 on the test data. Additionally, compared to other approaches, ours is significantly faster. The solution is made available on GitHub.\nA common goal in Reinforcement Learning is to derive a good strategy given a limited batch of data. In this paper, we adopt the safe policy improvement (SPI) approach: we compute a target policy guaranteed to perform at least as well as a given baseline policy. Our SPI strategy, inspired by the knows-what-it-knows paradigms, consists in bootstrapping the target policy with the baseline policy when it does not know. 
We develop two computationally efficient bootstrapping algorithms, one value-based and one policy-based, both accompanied by theoretical SPI bounds. Three algorithm variants are proposed. We empirically show the limits of existing algorithms from the literature on a small stochastic gridworld problem, and then demonstrate that our five algorithms improve not only the worst-case scenarios but also the mean performance.\nIn the context of public transport modeling and simulation, we address the problem of mismatch between simulated transit trips and observed ones. We point to the weakness of the current travel demand modeling process; the trips it generates are over-optimistic and do not reflect the real passenger choices. We introduce the notion of mini activities the travelers do during the trips; they can explain the deviation of simulated trips from the observed trips. We propose to mine the smart card data to extract the mini activities. We develop a technique to integrate them in the generated trips and learn such an integration from two available sources, the trip history and trip planner recommendations. For an input travel demand, we build a Markov chain over the trip collection and apply Markov chain Monte Carlo (MCMC) to integrate mini activities in such a way that the selected characteristics converge to the desired distributions. We test our method in different settings on the passenger trip collection of Nancy, France. We report experimental results demonstrating a large reduction in the mismatch.\nThis paper proposes a novel column generation framework for combinatorial software testing. In particular, it combines Mathematical Programming and Constraint Programming in a hybrid decomposition to generate covering arrays. The approach allows generating parameterized test cases with coverage guarantees between parameter interactions of a given application. 
Compared to exhaustive testing, combinatorial test case generation significantly reduces the number of tests to run. Our column generation algorithm is generic and can accommodate mixed coverage arrays over heterogeneous alphabets. The algorithm is realized in practice as a cloud service and recognized as one of the five winners of the company-wide cloud application challenge at Oracle. The service is currently helping software developers from a range of different product teams in their testing efforts while exposing declarative constraint models and hybrid optimization techniques to a broader audience.\nThe randomized-feature approach has been successfully employed in large-scale kernel approximation and supervised learning. The distribution from which the random features are drawn impacts the number of features required to efficiently perform a learning task. Recently, it has been shown that employing data-dependent randomization improves the performance in terms of the required number of random features. In this paper, we are concerned with the randomized-feature approach in supervised learning for good generalizability. We propose the Energy-based Exploration of Random Features (EERF) algorithm based on a data-dependent score function that explores the set of possible features and exploits the promising regions. We prove that the proposed score function with high probability recovers the spectrum of the best fit within the model class. Our empirical results on several benchmark datasets further verify that our method requires a smaller number of random features to achieve a certain generalization error than the state of the art, while introducing negligible pre-processing overhead. 
EERF can be implemented in a few lines of code and requires no additional tuning parameters.\nThe emerging vehicular networks are expected to make everyday vehicular operation safer, greener, and more efficient, and pave the path to autonomous driving with the advent of the fifth generation (5G) cellular system. Machine learning, as a major branch of artificial intelligence, has been recently applied to wireless networks to provide a data-driven approach to solve traditionally challenging problems. In this article, we review recent advances in applying machine learning in vehicular networks and attempt to bring more attention to this emerging area. After a brief overview of the major concepts of machine learning, we present some application examples of machine learning in solving problems arising in vehicular networks. We finally discuss and highlight several open issues that warrant further research.\nLearning policies for complex tasks that require multiple different skills is a major challenge in reinforcement learning (RL). It is also a requirement for its deployment in real-world scenarios. This paper proposes a novel framework for efficient multi-task reinforcement learning. Our framework trains agents to employ hierarchical policies that decide when to use a previously learned policy and when to learn a new skill. This enables agents to continually acquire new skills during different stages of training. Each learned task corresponds to a human language description. Because agents can only access previously learned skills through these descriptions, the agent can always provide a human-interpretable description of its choices. In order to help the agent learn the complex temporal dependencies necessary for the hierarchical policy, we provide it with a stochastic temporal grammar that modulates when to rely on previously learned skills and when to execute new skills. 
We validate our approach on Minecraft games designed to explicitly test the ability to reuse previously learned skills while simultaneously learning new skills.\nSecond-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are rarely applied to deep learning in practice because of high computational cost and the need for model-dependent algorithmic variations. We introduce a variant of the Hessian-free method that leverages a block-diagonal approximation of the generalized Gauss-Newton matrix. Our method computes the curvature approximation matrix only for pairs of parameters from the same layer or block of the neural network and performs conjugate gradient updates independently for each block. Experiments on deep autoencoders, deep convolutional networks, and multilayer LSTMs demonstrate better convergence and generalization compared to the original Hessian-free approach and the Adam method.\nBreast cancer is already one of the most common forms of cancer worldwide. Mammography image analysis is still the most effective diagnostic method to promote the early detection of breast cancer. Accurately segmenting tumors in digital mammography images is important to improve diagnosis capabilities of health specialists and avoid misdiagnosis. In this work, we evaluate the feasibility of applying GrowCut to segment regions of tumor and we propose two semi-supervised versions of GrowCut. All the analysis was performed by evaluating the application of segmentation techniques to a set of images obtained from the Mini-MIAS mammography image database. GrowCut segmentation was compared to Region Growing, Active Contours, Random Walks and Graph Cut techniques. Experiments showed that GrowCut, when compared to the other techniques, achieved better results for the metrics analyzed. 
Moreover, the proposed semi-supervised versions of GrowCut were shown to have a clinically satisfactory quality of segmentation.\nGastric cancer is the second leading cause of cancer-related deaths worldwide, and the major hurdle in biomedical image analysis is the determination of the cancer extent. This assignment has high clinical relevance and would generally require vast microscopic assessment by pathologists. Recent advances in deep learning have produced inspiring results on biomedical image segmentation, but these results rely on comprehensive annotation. Such annotation is labor-intensive, because the ground truth must be annotated meticulously by pathologists. In this paper, a reiterative learning framework was presented to train our network on partially annotated biomedical images, and superior performance was achieved without any pre-training or further manual annotation. We eliminate the boundary error of the patch-based model through our overlapped region forecast algorithm. With these methods, a mean intersection over union coefficient (IOU) of 0.883 and a mean accuracy of 91.09% were achieved on the partially labeled dataset, which won us the 2017 China Big Data & Artificial Intelligence Innovation and Entrepreneurship Competitions.\nOpen-domain social dialogue is one of the long-standing goals of Artificial Intelligence. This year, the Amazon Alexa Prize challenge was announced for the first time, where real customers get to rate systems developed by leading universities worldwide. The aim of the challenge is to converse \"coherently and engagingly with humans on popular topics for 20 minutes\". We describe our Alexa Prize system (called 'Alana') consisting of an ensemble of bots, combining rule-based and machine learning systems, and using a contextual ranking mechanism to choose a system response. 
The ranker was trained on real user feedback received during the competition, where we address the problem of how to train on the noisy and sparse feedback obtained during the competition.\nKnowledge base (KB) completion aims to infer missing facts from existing ones in a KB. Among various approaches, path ranking (PR) algorithms have received increasing attention in recent years. PR algorithms enumerate paths between entity pairs in a KB and use those paths as features to train a model for missing fact prediction. Due to their good performances and high model interpretability, several methods have been proposed. However, most existing methods suffer from scalability (high RAM consumption) and feature explosion (trains on an exponentially large number of features) problems. This paper proposes a Context-aware Path Ranking (C-PR) algorithm to solve these problems by introducing a selective path exploration strategy. C-PR learns global semantics of entities in the KB using word embedding and leverages the knowledge of entity semantics to enumerate contextually relevant paths using bidirectional random walk. Experimental results on three large KBs show that the path features (fewer in number) discovered by C-PR not only improve predictive performance but also are more interpretable than existing baselines.\nApproximate model counting for bit-vector SMT formulas (generalizing \\#SAT) has many applications such as probabilistic inference and quantitative information-flow security, but it is computationally difficult. Adding random parity constraints (XOR streamlining) and then checking satisfiability is an effective approximation technique, but it requires a prior hypothesis about the model count to produce useful results. We propose an approach inspired by statistical estimation to continually refine a probabilistic estimate of the model count for a formula, so that each XOR-streamlined query yields as much information as possible. 
We implement this approach, with an approximate probability model, as a wrapper around an off-the-shelf SMT solver or SAT solver. Experimental results show that the implementation is faster than the most similar previous approaches, which used simpler refinement strategies. The technique also lets us model count formulas over floating-point constraints, which we demonstrate with an application to a vulnerability in differential privacy mechanisms.\nWe consider the problem of Bayesian inference in the family of probabilistic models implicitly defined by stochastic generative models of data. In scientific fields ranging from population biology to cosmology, low-level mechanistic components are composed to create complex generative models. These models lead to intractable likelihoods and are typically non-differentiable, which poses challenges for traditional approaches to inference. We extend previous work in \"inference compilation\", which combines universal probabilistic programming and deep learning methods, to large-scale scientific simulators, and introduce a C++ based probabilistic programming library called CPProb. We successfully use CPProb to interface with SHERPA, a large code-base used in particle physics. Here we describe the technical innovations realized and planned for this library.\nNeural networks have been widely used to solve complex real-world problems. Due to the complicated, nonlinear, non-convex nature of neural networks, formal safety guarantees for the output behaviors of neural networks will be crucial for their applications in safety-critical systems. In this paper, the output reachable set computation and safety verification problems for a class of neural networks consisting of Rectified Linear Unit (ReLU) activation functions are addressed. A layer-by-layer approach is developed to compute the output reachable set. 
The computation is formulated in the form of a set of manipulations for a union of polyhedra, which can be efficiently applied with the aid of polyhedron computation tools. Based on the output reachable set computation results, the safety verification for a ReLU neural network can be performed by checking the intersections of unsafe regions and the output reachable set described by a union of polyhedra. A numerical example of a randomly generated ReLU neural network is provided to show the effectiveness of the approach developed in this paper.\nThe potential lack of fairness in the outputs of machine learning algorithms has recently gained attention both within the research community and in society more broadly. Surprisingly, there is no prior work developing tree-induction algorithms for building fair decision trees or fair random forests. These methods have widespread popularity as they are among the few methods that are simultaneously interpretable, non-linear, and easy-to-use. In this paper we develop, to our knowledge, the first technique for the induction of fair decision trees. We show that our \"Fair Forest\" retains the benefits of the tree-based approach, while providing both greater accuracy and fairness than other alternatives, for both \"group fairness\" and \"individual fairness.\" We also introduce new measures for fairness which are able to handle multinomial and continuous attributes as well as regression problems, as opposed to binary attributes and labels only. Finally, we demonstrate a new, more robust evaluation procedure for algorithms that considers the dataset in its entirety rather than only a specific protected attribute.\nWe present a neural architecture that takes as input a 2D or 3D shape and outputs a program that generates the shape. The instructions in our program are based on constructive solid geometry principles, i.e., a set of boolean operations on shape primitives defined recursively. 
Bottom-up techniques for this shape parsing task rely on primitive detection and are inherently slow since the search space over possible primitive combinations is large. In contrast, our model uses a recurrent neural network that parses the input shape in a top-down manner, which is significantly faster and yields a compact and easy-to-interpret sequence of modeling instructions. Our model is also more effective as a shape detector compared to existing state-of-the-art detection techniques. We finally demonstrate that our network can be trained on novel datasets without ground-truth program annotations through policy gradient techniques.\nIn the context of post-hoc interpretability, this paper addresses the task of explaining the prediction of a classifier, considering the case where no information is available, neither on the classifier itself, nor on the processed data (neither the training nor the test data). It proposes an instance-based approach whose principle consists in determining the minimal changes needed to alter a prediction: given a data point whose classification must be explained, the proposed method consists in identifying a close neighbour classified differently, where the closeness definition integrates a sparsity constraint. This principle is implemented using observation generation in the Growing Spheres algorithm. Experimental results on two datasets illustrate the relevance of the proposed approach that can be used to gain knowledge about the classifier.\nConditional preference networks (CP-nets) are a graphical representation of a person's (conditional) preferences over a set of discrete variables. In this paper, we introduce a novel method of quantifying preference for any given outcome based on a CP-net representation of a user's preferences. We demonstrate that these values are useful for reasoning about user preferences. 
In particular, they allow us to order (any subset of) the possible outcomes in accordance with the user's preferences. Further, these values can be used to improve the efficiency of outcome dominance testing. That is, given a pair of outcomes, we can more efficiently determine which one the user prefers. We show that these results also hold for CP-nets that express indifference between variable values.\nDiscovery of an accurate causal Bayesian network structure from observational data can be useful in many areas of science. Often the discoveries are made under uncertainty, which can be expressed as probabilities. To guide the use of such discoveries, including directing further investigation, it is important that those probabilities be well-calibrated. In this paper, we introduce a novel framework to derive calibrated probabilities of causal relationships from observational data. The framework consists of three components: (1) an approximate method for generating initial probability estimates of the edge types for each pair of variables, (2) the availability of a relatively small number of the causal relationships in the network for which the truth status is known, which we call a calibration training set, and (3) a calibration method for using the approximate probability estimates and the calibration training set to generate calibrated probabilities for the many remaining pairs of variables. We also introduce a new calibration method based on a shallow neural network. Our experiments on simulated data indicate that the proposed approach improves the calibration of causal edge predictions. The results also indicate that the approach often improves the precision and recall of predictions.\nQuestions that require counting a variety of objects in images remain a major challenge in visual question answering (VQA).
The most common approaches to VQA involve either classifying answers based on fixed-length representations of both the image and question or summing fractional counts estimated from each section of the image. In contrast, we treat counting as a sequential decision process and force our model to make discrete choices of what to count. Specifically, the model sequentially selects from detected objects and learns interactions between objects that influence subsequent selections. A distinction of our approach is its intuitive and interpretable output, as discrete counts are automatically grounded in the image. Furthermore, our method outperforms the state-of-the-art architecture for VQA on multiple metrics that evaluate counting.\nIn domains where knowledge is highly distributed, a natural objective is to create principled foundations for collaborative interactive learning environments. We present a first mathematical characterization of a collaborative learning group, a consortium, based on closure systems of attribute sets and the well-known attribute exploration algorithm from formal concept analysis. To this end, we introduce (weak) local experts for subdomains of a given knowledge domain. These entities are able to refute and potentially accept a given (implicational) query for some closure system that is a restriction of the whole domain. On this basis we build a consortial expert and present first insights into the ability of such an expert to answer queries. Furthermore, we describe techniques for coping with falsely accepted implications and for combining counterexamples. Using notions from combinatorial design theory, we further extend those insights to first results on the decidability problem of whether a given consortium is able to explore some target domain.
Applications in conceptual knowledge acquisition as well as in collaborative interactive ontology learning are at hand.\nDeep neural networks are built with generalization beyond the training set in mind, using techniques such as regularization, early stopping and dropout. However, measures to make them more resilient to adversarial examples are rarely taken. As deep neural networks become more prevalent in mission-critical and real-time systems, attackers have begun to target them by intentionally causing a network to misclassify an object of one type as another type. This can be catastrophic in scenarios where the classification of a deep neural network can lead to a fatal decision by a machine. In this work, we used the GTSRB dataset to craft adversarial samples with the Fast Gradient Sign Method and the Jacobian Saliency Method, used those crafted samples to attack another deep convolutional neural network, and then made the attacked network more resilient to adversarial attacks through Defensive Distillation and Adversarial Training.\nAs efficient traffic-management platforms, public vehicle (PV) systems are envisioned to be a promising approach to solving traffic congestion and pollution in future smart cities. PV systems provide online/dynamic peer-to-peer ride-sharing services with the goal of serving a sufficient number of customers with the minimum number of vehicles and the lowest possible cost. A key component of the PV system is the online ride-sharing scheduling strategy. In this paper, we propose an efficient path planning strategy that focuses on a limited potential search area for each vehicle by filtering out the requests that violate the passenger service quality level, so that the global search is reduced to a local search. We analyze the performance of the proposed solution, for example the reduction ratio of computational complexity.
Simulations based on the Manhattan taxi data set show that the computing time is reduced by 22% compared with the exhaustive search method under the same service quality performance.\nIn research on the impact of gestures used by a lecturer, one challenging task is to infer the attention of an audience. Two important measurements that can help infer the level of attention are eye movement data and Electroencephalography (EEG) data. Under the fundamental assumption that a group of people would look at the same place if they all pay attention at the same time, we apply a method, \"Time Warp Edit Distance\", to calculate the similarity of their eye movement trajectories. We also cluster the eye movement patterns of audience members based on these pairwise similarity metrics. In addition, since we do not have a direct ground-truth metric for \"attention\", a visual assessment is useful for evaluating the gesture-attention relationship. We therefore also implement a visualization tool.\nWhile end-to-end neural conversation models have led to promising advances in reducing hand-crafted features and errors induced by the traditional complex system architecture, they typically require an enormous amount of data due to the lack of modularity. Previous studies adopted a hybrid approach with knowledge-based components either to abstract out domain-specific information or to augment data to cover more diverse patterns. In contrast, we propose to directly address the problem using recent developments in the space of continual learning for neural models. Specifically, we adopt a domain-independent neural conversational model and introduce a novel neural continual learning algorithm that allows a conversational agent to accumulate skills across different tasks in a data-efficient way. To the best of our knowledge, this is the first work that applies continual learning to conversation systems.
We verified the efficacy of our method through a conversational skill transfer from either synthetic dialogs or human-human dialogs to human-computer conversations in a customer support domain.\nThe culture of sharing instead of owning is spreading rapidly in individual behavior. Particularly in transportation, concepts of sharing a ride through either carpooling or ridesharing have recently been adopted. An efficient optimization approach to match passengers in real time is the core of any ridesharing system. In this paper, we model ridesharing as an online matching problem on general graphs such that passengers do not drive private cars but use shared taxis. We propose an optimization algorithm to solve it. The outlined algorithm calculates the optimal waiting time when a passenger arrives. This leads to a matching with minimal overall overheads while maximizing the number of partnerships. To evaluate the behavior of our algorithm, we used a real-life NYC taxi data set. The results show a substantial reduction in overall overheads.\nUnder covariate shift, training (source) data and testing (target) data differ in input space distribution, but share the same conditional label distribution. This poses a challenging machine learning task. Robust Bias-Aware (RBA) prediction provides the conditional label distribution that is robust to the worst-case logarithmic loss for the target distribution while matching feature expectation constraints from the source distribution. However, employing RBA with insufficient feature constraints may result in high-certainty predictions for much of the source data, while leaving too much uncertainty for target data predictions. To overcome this issue, we extend the representer theorem to the RBA setting, enabling minimization of regularized expected target risk by a reweighted kernel expectation under the source distribution.
By applying kernel methods, we establish consistency guarantees and demonstrate better performance of the RBA classifier than competing methods on synthetically biased UCI datasets as well as datasets that have natural covariate shift.\nInterior tomography for region-of-interest (ROI) imaging has the advantages of using a small detector and reducing the X-ray radiation dose. However, standard analytic reconstruction suffers from severe cupping artifacts due to the existence of a null space in the truncated Radon transform. Existing penalized reconstruction methods may address this problem, but they require extensive computation due to iterative reconstruction. Inspired by the recent deep learning approaches to low-dose and sparse-view CT, here we propose a deep learning architecture that removes null space signals from the FBP reconstruction. Experimental results have shown that the proposed method provides near-perfect reconstruction with about 7-10 dB improvement in PSNR over existing methods, despite significantly reduced run-time complexity.\nHierarchies are of fundamental interest in both stochastic optimal control and biological control due to their facilitation of a range of desirable computational traits in a control algorithm and the possibility that they may form a core principle of sensorimotor and cognitive control systems. However, a theoretically justified construction of state-space hierarchies over all spatial resolutions and their evolution through a policy inference process remains elusive. Here, a formalism for deriving such normative representations of discrete Markov decision processes is introduced in the context of graphs.
The resulting hierarchies correspond to a hierarchical policy inference algorithm approximating a discrete gradient flow between state-space trajectory densities generated by the prior and optimal policies.\nLearning probability distributions on the weights of neural networks (NNs) has recently proven beneficial in many applications. Bayesian methods, such as Stein variational gradient descent (SVGD), offer an elegant framework to reason about NN model uncertainty. However, by assuming independent Gaussian priors for the individual NN weights (as often applied), SVGD does not impose prior knowledge that there is often structural information (dependence) among weights. We propose efficient posterior learning of structural weight uncertainty, within an SVGD framework, by employing matrix-variate Gaussian priors on NN parameters. We further investigate the learned structural uncertainty in sequential decision-making problems, including contextual bandits and reinforcement learning. Experiments on several synthetic and real datasets indicate the superiority of our model, compared with state-of-the-art methods.\nThis paper presents a new deep learning architecture for Natural Language Inference (NLI). Firstly, we introduce a new compare-propagate architecture where alignment pairs are compared and then propagated to upper layers for enhanced representation learning. Secondly, we adopt novel factorization layers for efficient compression of alignment vectors into scalar-valued features, which are then used to augment the base word representations. The design of our approach aims to be conceptually simple, compact and yet powerful. We conduct experiments on three popular benchmarks, SNLI, MultiNLI and SciTail, achieving state-of-the-art performance on all of them. A lightweight parameterization of our model enjoys a $\\approx 300\\%$ reduction in parameter size compared to the ESIM and DIIN, while maintaining competitive performance.
Visual analysis shows that our propagated features are highly interpretable, opening new avenues to explainability in neural NLI models.\nGame-theoretic centrality is a flexible and sophisticated approach to identify the most important nodes in a network. It builds upon the methods from cooperative game theory and network theory. The key idea is to treat nodes as players in a cooperative game, where the value of each coalition is determined by certain graph-theoretic properties. Using solution concepts from cooperative game theory, it is then possible to measure how responsible each node is for the worth of the network.   The literature on the topic is already quite large, and is scattered among game-theoretic and computer science venues. We review the main game-theoretic network centrality measures from both bodies of literature and organize them into two categories: those that are more focused on the connectivity of nodes, and those that are more focused on the synergies achieved by nodes in groups. We present and explain each centrality, with a focus on algorithms and complexity.\nIn this paper we present a neurally plausible model of robot reaching inspired by human infant reaching that is based on embodied artificial intelligence, which emphasizes the importance of the sensory-motor interaction of an agent and the world. This model encompasses both learning sensory-motor correlations through motor babbling and also arm motion planning using spreading activation. This model is organized in three layers of neural maps with parallel structures representing the same sensory-motor space. The motor babbling period shapes the structure of the three neural maps as well as the connections within and between them. We describe an implementation of this model and an investigation of this implementation using a simple reaching task on a humanoid robot. 
The robot successfully learned to plan reaching motions for a test set with high accuracy and smoothness.\nWe propose a scalable divergence estimation method based on hashing. Consider two continuous random variables $X$ and $Y$ whose densities have bounded support. We consider a particular locality-sensitive random hashing, and consider the ratio of samples in each hash bin having non-zero numbers of $Y$ samples. We prove that the weighted average of these ratios over all of the hash bins converges to f-divergences between the two sample sets. We show that the proposed estimator is optimal in terms of both MSE rate and computational complexity. We derive the MSE rates for two families of smooth functions: the H\\\"{o}lder smoothness class and differentiable functions. In particular, it is proved that if the density functions have bounded derivatives up to the order $d/2$, where $d$ is the dimension of samples, the optimal parametric MSE rate of $O(1/N)$ can be achieved. The computational complexity is shown to be $O(N)$, which is optimal. To the best of our knowledge, this is the first empirical divergence estimator that has optimal computational complexity and achieves the optimal parametric MSE estimation rate.\nEffective presentation skills can help one succeed in business, career and academia. This paper presents the design of speech assessment during an oral presentation and an algorithm for speech evaluation based on criteria of optimal intonation. As the pace of speech and its optimal intonation vary from language to language, automatic identification of the language during the presentation is required. The proposed algorithm was tested with presentations delivered in the Kazakh language. For testing purposes, the features of Kazakh phonemes were extracted using MFCC and PLP methods, and a Hidden Markov Model (HMM) [5] of Kazakh phonemes was created.
Kazakh vowel formants were defined, and the correlation between the deviation rate in fundamental frequency and the liveliness of the speech was analyzed to evaluate the intonation of the presentation. It was established that the threshold value between monotone and dynamic speech is 0.16 and the error for intonation evaluation is 19%.\nThe number of optimization techniques in the combinatorial domain is large and diversified. Nevertheless, real-world benchmarks for testing algorithms are few. This work creates an extensible real-world mail delivery benchmark for the Vehicle Routing Problem (VRP) in a planar graph embedded in the 2D Euclidean space. The problem is multi-objective, on a roadmap with up to 25 vehicles and 30,000 deliveries per day. Each instance models one generic day of mail delivery, allowing both comparison and validation of optimization algorithms for routing problems. The benchmark may be extended to model other scenarios.\nMoney laundering is a crime that makes it possible to finance other crimes; for this reason, it is important to criminal organizations, and combating it is a priority for nations around the world. The anti-money laundering process has not evolved as expected because it has prioritized only the signaling of suspicious transactions. The constant increase in the volume of transactions has overloaded the indispensable human work of final evaluation of the suspicions. This article presents a multiagent system that aims to go beyond the capture of suspicious transactions, seeking to assist the human expert in the analysis of suspicions.
The agents created use data mining techniques to create transactional behavioral profiles; apply rules generated in the learning process in conjunction with specific rules based on legal aspects and profiles created to capture suspicious transactions; and analyze these suspicious transactions, indicating to the human expert those that require more detailed analysis.\nViZDoom is a robust, first-person shooter reinforcement learning environment, characterized by a significant degree of latent state information. In this paper, double-Q learning and prioritized experience replay methods are tested under a certain ViZDoom combat scenario using a competitive deep recurrent Q-network (DRQN) architecture. In addition, an ensembling technique known as snapshot ensembling is employed using a specific annealed learning rate to observe differences in ensembling efficacy under these two methods. Annealed learning rates are important in general to the training of deep neural network models, as they shake up the status quo and counter a model's tendency towards local optima. While both variants show performance exceeding that of the game's built-in AI agents, the known stabilizing effects of double-Q learning are illustrated, and prioritized experience replay is again validated in its usefulness by showing immediate results early on in agent development, with the caveat that value overestimation is accelerated in this case. In addition, some unique behaviors are observed to develop for the prioritized experience replay (PER) and double-Q (DDQ) variants, and snapshot ensembling of both PER and DDQ proves valuable for improving the performance of the ViZDoom Marine.\nLanguages shared by people differ across regions in accent, pronunciation and word usage. In this era, language is shared mainly through social media and blogs.
A swarm of such micro posts appears every second, which creates the need to process them in order to extract knowledge. Knowledge extraction differs from application to application, and research in cognitive science has shaped these needs. This work moves such research forward by extracting semantic information from streaming and batch data in applications like Named Entity Recognition and Author Profiling. For Named Entity Recognition, the context of a single micro post is used, whereas the context spanning a pool of micro posts is used to identify the sociolect aspects of the author of those micro posts. In this work, a Conditional Random Field is used for entity recognition, and a novel approach is proposed to find the sociolect aspects of the author (gender, age group).\nA common problem in machine learning is to rank a set of n items based on pairwise comparisons. Here ranking refers to partitioning the items into sets of pre-specified sizes according to their scores, which includes identification of the top-k items as the most prominent special case. The score of a given item is defined as the probability that it beats a randomly chosen other item. Finding an exact ranking typically requires a prohibitively large number of comparisons, but in practice, approximate rankings are often adequate. Accordingly, we study the problem of finding approximate rankings from pairwise comparisons. We analyze an active ranking algorithm that counts the number of comparisons won, and decides whether to stop or which pair of items to compare next, based on confidence intervals computed from the data collected in previous steps. We show that this algorithm succeeds in recovering approximate rankings using a number of comparisons that is close to optimal up to logarithmic factors.
We also present numerical results, showing that in practice, approximation can drastically reduce the number of comparisons required to estimate a ranking.\nFor homeland and transportation security applications, 2D X-ray explosive detection systems (EDS) have been widely used, but they are limited in recognizing the 3D shape of hidden objects. Among the various types of 3D computed tomography (CT) systems that address this issue, this paper focuses on a stationary CT using fixed X-ray sources and detectors. However, due to the limited number of projection views, analytic reconstruction algorithms produce severe streaking artifacts. Inspired by the recent success of deep learning approaches for sparse-view CT reconstruction, here we propose a novel image and sinogram domain deep learning architecture for 3D reconstruction from very sparse-view measurements. The algorithm has been tested with real data from a prototype 9-view dual-energy stationary CT EDS carry-on baggage scanner developed by GEMSS Medical Systems, Korea, which confirms its superior reconstruction performance over existing approaches.\nModel-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy, that is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods.
By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.\nChoosing a multi-winner rule, i.e., a voting rule that selects a subset of $k$ alternatives based on preferences of a certain population, is a hard and ambiguous task. What constitutes an \"optimal\" committee varies widely depending on the context. In this paper, we offer a new perspective to measure the quality of committees and, consequently, of multi-winner rules. We provide a quantitative analysis using methods from the theory of approximation algorithms and estimate how well multi-winner rules approximate two extreme objectives: diversity as captured by the (Approval) Chamberlin--Courant rule (CC) and individual excellence as captured by Approval Voting (AV). With both theoretical and experimental methods, we establish a classification of multi-winner rules in terms of their quantitative alignment with these two opposing objectives.\nDeep learning has achieved impressive results on many problems. However, tuning its hyperparameters well requires a high degree of expertise or considerable experience, and such a manual tuning process is likely to be biased. Moreover, it is not practical to try out as many different hyperparameter configurations in deep learning as in other machine learning scenarios, because evaluating each hyperparameter configuration in deep learning means training a deep neural network, which usually takes a long time. The Hyperband algorithm achieves state-of-the-art performance on various hyperparameter optimization problems in the field of deep learning.
However, the Hyperband algorithm does not use historical information about previously explored hyperparameter configurations, so the solution it finds is suboptimal. We propose to combine the Hyperband algorithm with Bayesian optimization (which does not ignore history when sampling the next trial configuration). Experimental results show that our combined approach is superior to other hyperparameter optimization approaches, including the Hyperband algorithm.\nIn this paper, we use the witness-functions to analyze cryptographic protocols for secrecy under nonempty equational theories. The witness-functions are safe metrics used to compute security. An analysis with a witness-function consists in making sure that the security of every atomic message does not decrease during its lifecycle in the protocol. The analysis gets more difficult under nonempty equational theories. Indeed, the intruder can take advantage of the algebraic properties of the cryptographic primitives to derive secrets. These properties arise from the use of mathematical functions, such as multiplication, addition, exclusive-or or modular exponentiation in the cryptosystems and the protocols. Here, we show how to use the witness-functions under nonempty equational theories and we run an analysis on the Needham-Schroeder-Lowe protocol under the cipher homomorphism. This analysis reveals that although this protocol is proved secure under the perfect encryption assumption, its security collapses under the homomorphic primitives. We show how the witness-functions help to illustrate an attack scenario on it and we propose an amended version to fix it.\nEvaluating pairwise comparisons breaks down complex decision problems into tractable ones. Pairwise comparison matrices (PCMs) are regularly used to solve multiple-criteria decision-making (MCDM) problems using Saaty's analytic hierarchy process (AHP) framework. There are two significant drawbacks of using PCMs. First, humans evaluate PCMs in an inconsistent manner.
Second, PCMs of large problems often have missing entries. We address these two issues by first establishing a novel connection between PCMs and time-irreversible Markov processes. Specifically, we show that every PCM induces a family of dissipative maximum path entropy random walks (MERW) over the set of alternatives. We show that only \"consistent\" PCMs correspond to detailed-balanced MERWs. We identify the non-equilibrium entropy production in the induced MERWs as a metric of inconsistency of the underlying PCMs. Notably, the entropy production satisfies all of the recently laid out criteria for reasonable consistency indices. We also propose an approach to use incompletely filled PCMs in AHP. Potential future avenues are discussed as well.\nThe Graph Brain Project is an experiment in how the use of automated mathematical discovery software, databases, large collaboration, and systematic investigation provides a model for how mathematical research might proceed in the future.   Our Project began with the development of a program that can be used to generate invariant-relation and property-relation conjectures in many areas of mathematics. This program can produce conjectures which are not implied by existing (published) theorems. Here we propose a new approach to push forward existing mathematical research goals: using automated mathematical discovery software. We suggest how to initiate and harness large-scale collaborative mathematics. We envision mathematical research labs similar to those that exist in other sciences, new avenues for funding, new opportunities for training students, and a more efficient and effective use of published mathematical research.   Our experiment in graph theory can be imitated in many other areas of mathematical research. Big Mathematics is the idea of large, systematic, collaborative research on problems of existing mathematical interest.
What is possible when we put our skills, tools, and results together systematically?\nThe Semantic Web is becoming a large-scale framework that enables data to be published, shared, and reused in the form of ontologies. The ontology, which is considered the basic building block of the Semantic Web, consists of two layers: the data layer and the schema layer. With the current exponential growth of ontologies in both data size and schema complexity, ontology understanding, which plays an important role in tasks such as ontology engineering and ontology learning, is becoming more difficult. Ontology summarization, as a way to distill knowledge from an ontology and generate an abridged version that facilitates better understanding, has recently been getting more attention. Various approaches are available for ontology summarization, focusing on different measures in order to produce a proper summary for a given ontology. In this paper, we mainly focus on the common metrics used for ontology summarization and review the state of the art in ontology summarization.\nInductive inference is the process of extracting general rules from specific observations. This problem also arises in the analysis of biological networks, such as genetic regulatory networks, where the interactions are complex and the observations are incomplete. A typical task in these problems is to extract general interaction rules as combinations of Boolean covariates that explain a measured response variable. The inductive inference process can be considered as an incompletely specified Boolean function synthesis problem. This incompleteness of the problem will also generate spurious inferences, which are a serious threat to valid inductive inference rules. Using random Boolean data as a null model, here we attempt to measure the competition between valid and spurious inductive inference rules from a given data set.
We formulate two greedy search algorithms, which synthesize a given Boolean response variable from the observation data in a sparse disjunctive normal form and in a sparse generalized algebraic normal form of the variables, respectively, and we numerically evaluate their performance.\nRecurrent Neural Networks and in particular Long Short-Term Memory (LSTM) networks have demonstrated state-of-the-art accuracy in several emerging Artificial Intelligence tasks. However, the models are becoming increasingly demanding in terms of computational and memory load. Emerging latency-sensitive applications including mobile robots and autonomous vehicles often operate under stringent computation time constraints. In this paper, we address the challenge of deploying computationally demanding LSTMs under a constrained time budget by introducing an approximate computing scheme that combines iterative low-rank compression and pruning, along with a novel FPGA-based LSTM architecture. Combined in an end-to-end framework, the approximation method's parameters are optimised and the architecture is configured to address the problem of high-performance LSTM execution in time-constrained applications. Quantitative evaluation on a real-life image captioning application indicates that the proposed methods required up to 6.5x less time to achieve the same application-level accuracy compared to a baseline method, while achieving an average of 25x higher accuracy under the same computation time constraints.\nThe Indian regional movie dataset is the first database of regional Indian movies, users and their ratings. It consists of movies belonging to 18 different Indian regional languages and metadata of users with varying demographics. Through this dataset, the diversity of Indian regional cinema and its huge viewership are captured.
We analyze the dataset that contains roughly 10K ratings of 919 users and 2,851 movies using some supervised and unsupervised collaborative filtering techniques like Probabilistic Matrix Factorization, Matrix Completion, Blind Compressed Sensing etc. The dataset consists of metadata information of users like age, occupation, home state and known languages. It also consists of metadata of movies like genre, language, release year and cast. India has a wide base of viewers which is evident by the large number of movies released every year and the huge box-office revenue. This dataset can be used for designing recommendation systems for Indian users and regional movies, which do not, yet, exist. The dataset can be downloaded from \\href{https://goo.gl/EmTPv6}{https://goo.gl/EmTPv6}.\nRecent work in deep reinforcement learning has allowed algorithms to learn complex tasks such as Atari 2600 games just from the reward provided by the game, but these algorithms presently require millions of training steps in order to learn, making them approximately five orders of magnitude slower than humans. One reason for this is that humans build robust shared representations that are applicable to collections of problems, making it much easier to assimilate new variants. This paper first introduces the idea of automatically-generated game sets to aid in transfer learning research, and then demonstrates the utility of shared representations by showing that models can substantially benefit from the incorporation of relevant architectural priors. This technique affords a remarkable 50x positive transfer on a toy problem-set.\nThis paper investigates the following problem: how to find a GSMem malicious activity effectively. To this end, this paper puts forward a new method based on Artificial Intelligence (AI). At first, we use a large quantity of data in terms of frequencies and amplitudes of some electromagnetic waves to train our models. 
Then, we input a given frequency and amplitude into the obtained models to predict whether a GSMem malicious activity occurs or not. The simulated experiments show that the new method has the potential to detect GSMem activity, with low False Positive Rates (FPR) and low False Negative Rates (FNR).\nWe present a micro-traffic simulation (named \"DeepTraffic\") where the perception, control, and planning systems for one of the cars are all handled by a single neural network as part of a model-free, off-policy reinforcement learning process. The primary goal of DeepTraffic is to make the hands-on study of deep reinforcement learning accessible to thousands of students, educators, and researchers in order to inspire and fuel the exploration and evaluation of DQN variants and hyperparameter configurations through large-scale, open competition. This paper investigates the crowd-sourced hyperparameter tuning of the policy network that resulted from the first iteration of the DeepTraffic competition, where thousands of participants actively searched through the hyperparameter space with the objective of getting their neural network submission onto the top-10 leaderboard.\nWe present a study in Distributed Deep Reinforcement Learning (DDRL) focused on scalability of a state-of-the-art Deep Reinforcement Learning algorithm known as Batch Asynchronous Advantage Actor-Critic (BA3C). We show that using the Adam optimization algorithm with a batch size of up to 2048 is a viable choice for carrying out large scale machine learning computations. This, combined with careful reexamination of the optimizer's hyperparameters, using synchronous training on the node level (while keeping the local, single node part of the algorithm asynchronous) and minimizing the memory footprint of the model, allowed us to achieve linear scaling for up to 64 CPU nodes. 
This corresponds to a training time of 21 minutes on 768 CPU cores, as opposed to 10 hours when using a single node with 24 cores achieved by a baseline single-node implementation.\nThis paper demonstrates the development of ontology for satellite databases. First, I create a computational ontology for the Union of Concerned Scientists (UCS) Satellite Database (UCSSD for short), called the UCS Satellite Ontology (or UCSSO). Second, in developing UCSSO I show that The Space Situational Awareness Ontology (SSAO) (Rovetto and Kelso 2016)--an existing space domain reference ontology--and related ontology work by the author (Rovetto 2015, 2016) can be used either (i) with a database-specific local ontology such as UCSSO, or (ii) in its stead. In case (i), local ontologies such as UCSSO can reuse SSAO terms, perform term mappings, or extend it. In case (ii), the author's orbital space ontology work, such as the SSAO, is usable by the UCSSD and organizations with other space object catalogs, as a reference ontology suite providing a common semantically-rich domain model. The SSAO, UCSSO, and the broader Orbital Space Environment Domain Ontology project is online at http://purl.org/space-ontology and GitHub. This ontology effort aims, in part, to provide accurate formal representations of the domain for various applications. Ontology engineering has the potential to facilitate the sharing and integration of satellite data from federated databases and sensors for safer spaceflight.\nWe propose a deep learning model - Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients (PPES-Met) for estimating short-term life expectancy (3 months) of the patients by analyzing free-text clinical notes in the electronic medical record, while maintaining the temporal visit sequence. 
In a single framework, we integrated semantic data mapping and neural embedding technique to produce a text processing method that extracts relevant information from heterogeneous types of clinical notes in an unsupervised manner, and we designed a recurrent neural network to model the temporal dependency of the patient visits. The model was trained on a large dataset (10,293 patients) and validated on a separated dataset (1818 patients). Our method achieved an area under the ROC curve (AUC) of 0.89. To provide explain-ability, we developed an interactive graphical tool that may improve physician understanding of the basis for the model's predictions. The high accuracy and explain-ability of the PPES-Met model may enable our model to be used as a decision support tool to personalize metastatic cancer treatment and provide valuable assistance to the physicians.\nExperience replay allows a reinforcement learning agent to train on samples from a large amount of the most recent experiences. A simple in-RAM experience replay stores these most recent experiences in a list in RAM, and then copies sampled batches to the GPU for training. I moved this list to the GPU, thus creating an in-GPU experience replay, and a training step that no longer has inputs copied from the CPU. I trained an agent to play Super Smash Bros. Melee, using internal game memory values as inputs and outputting controller button presses. A single state in Melee contains 27 floats, so the full experience replay fits on a single GPU. For a batch size of 128, the in-GPU experience replay trained twice as fast as the in-RAM experience replay. As far as I know, this is the first in-GPU implementation of experience replay. Finally, I note a few ideas for fitting the experience replay inside the GPU when the environment state requires more memory.\nWe present a formalization and computational implementation of the second formulation of Kant's categorical imperative. 
This ethical principle requires an agent to never treat someone merely as a means but always also as an end. Here we interpret this principle in terms of how persons are causally affected by actions. We introduce Kantian causal agency models in which moral patients, actions, goals, and causal influence are represented, and we show how to formalize several readings of Kant's categorical imperative that correspond to Kant's concept of strict and wide duties towards oneself and others. Stricter versions handle cases where an action directly causally affects oneself or others, whereas the wide version maximizes the number of persons being treated as an end. We discuss limitations of our formalization by pointing to one of Kant's cases that the machinery cannot handle in a satisfying way.\nGraph Convolutional Neural Networks (Graph CNNs) are generalizations of classical CNNs to handle graph data such as molecular data, point clouds and social networks. Current filters in graph CNNs are built for fixed and shared graph structure. However, for most real data, the graph structures vary in both size and connectivity. The paper proposes a generalized and flexible graph CNN taking data of arbitrary graph structure as input. In that way a task-driven adaptive graph is learned for each graph data sample while training. To efficiently learn the graph, a distance metric learning method is proposed. Extensive experiments on nine graph-structured datasets have demonstrated superior performance on both convergence speed and predictive accuracy.\nCurrent crowdsourcing platforms provide little support for worker feedback. Workers are sometimes invited to post free text describing their experience and preferences in completing tasks. They can also use forums such as Turker Nation to exchange preferences on tasks and requesters. In fact, crowdsourcing platforms rely heavily on observing workers and inferring their preferences implicitly. 
In this work, we believe that asking workers to indicate their preferences explicitly improves their experience in task completion and hence the quality of their contributions. Explicit elicitation can indeed help to build more accurate worker models for task completion that capture the evolving nature of worker preferences. We design a worker model whose accuracy is improved iteratively by requesting preferences for task factors such as required skills, task payment, and task relevance. We propose a generic framework, develop efficient solutions in realistic scenarios, and run extensive experiments that show the benefit of explicit preference elicitation over implicit elicitation with statistical significance.\nMethods for learning optimal policies in autonomous agents often assume that the way the domain is conceptualised (its possible states and actions and their causal structure) is known in advance and does not change during learning. This is an unrealistic assumption in many scenarios, because new evidence can reveal important information about what is possible, possibilities that the agent was not aware existed prior to learning. We present a model of an agent which both discovers and learns to exploit unforeseen possibilities using two sources of evidence: direct interaction with the world and communication with a domain expert. We use a combination of probabilistic and symbolic reasoning to estimate all components of the decision problem, including its set of random variables and their causal dependencies. Agent simulations show that the agent converges on optimal policies even when it starts out unaware of factors that are critical to behaving optimally.\nWe consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. 
We employ an iterative optimization scheme, where we train an RNN on a dataset of K best programs from a priority queue of the generated programs so far. Then, we synthesize new programs and add them to the priority queue by sampling from the RNN. We benchmark our algorithm, called priority queue training (or PQT), against genetic algorithm and reinforcement learning baselines on a simple but expressive Turing complete programming language called BF. Our experimental results show that our simple PQT algorithm significantly outperforms the baselines. By adding a program length penalty to the reward function, we are able to synthesize short, human readable programs.\nMonte Carlo inference has asymptotic guarantees, but can be slow when using generic proposals. Handcrafted proposals that rely on user knowledge about the posterior distribution can be efficient, but are difficult to derive and implement. This paper proposes to let users express their posterior knowledge in the form of proposal programs, which are samplers written in probabilistic programming languages. One strategy for writing good proposal programs is to combine domain-specific heuristic algorithms with neural network models. The heuristics identify high probability regions, and the neural networks model the posterior uncertainty around the outputs of the algorithm. Proposal programs can be used as proposal distributions in importance sampling and Metropolis-Hastings samplers without sacrificing asymptotic consistency, and can be optimized offline using inference compilation. Support for optimizing and using proposal programs is easily implemented in a sampling-based probabilistic programming runtime. 
The paper illustrates the proposed technique with a proposal program that combines RANSAC and neural networks to accelerate inference in a Bayesian linear regression with outliers model.\nDialog evaluation is a challenging problem, especially for non task-oriented dialogs where conversational success is not well-defined. We propose to evaluate dialog quality using topic-based metrics that describe the ability of a conversational bot to sustain coherent and engaging conversations on a topic, and the diversity of topics that a bot can handle. To detect conversation topics per utterance, we adopt Deep Average Networks (DAN) and train a topic classifier on a variety of question and query data categorized into multiple topics. We propose a novel extension to DAN by adding a topic-word attention table that allows the system to jointly capture topic keywords in an utterance and perform topic classification. We compare our proposed topic based metrics with the ratings provided by users and show that our metrics both correlate with and complement human judgment. Our analysis is performed on tens of thousands of real human-bot dialogs from the Alexa Prize competition and highlights user expectations for conversational bots.\nIn order to answer natural language questions over knowledge graphs, most processing pipelines involve entity and relation linking. Traditionally, entity linking and relation linking has been performed either as dependent sequential tasks or independent parallel tasks. In this paper, we propose a framework called \"EARL\", which performs entity linking and relation linking as a joint single task. EARL uses a graph connection based solution to the problem. We model the linking task as an instance of the Generalised Travelling Salesman Problem (GTSP) and use GTSP approximate algorithm solutions. 
We later develop EARL which uses a pair-wise graph-distance based solution to the problem. The system determines the best semantic connection between all keywords of the question by referring to a knowledge graph. This is achieved by exploiting the \"connection density\" between entity candidates and relation candidates. The \"connection density\" based solution performs on par with the approximate GTSP solution. We have empirically evaluated the framework on a dataset with 5000 questions. Our system surpasses state-of-the-art scores for the entity linking task, reporting an accuracy of 0.65 compared with 0.40 for the next best entity linker.\nThe highly influential framework of conceptual spaces provides a geometric way of representing knowledge. Instances are represented by points in a similarity space and concepts are represented by convex regions in this space. After pointing out a problem with the convexity requirement, we propose a formalization of conceptual spaces based on fuzzy star-shaped sets. Our formalization uses a parametric definition of concepts and extends the original framework by adding means to represent correlations between different domains in a geometric way. Moreover, we define various operations for our formalization, both for creating new concepts from old ones and for measuring relations between concepts. We present an illustrative toy example and sketch a research project on concept formation that is based on both our formalization and its implementation.\nDeep reinforcement learning has achieved great strides in solving challenging motion control tasks. Recently, there has been significant work on methods for exploiting the data gathered during training, but there has been less work on how to best generate the data to learn from. For continuous action domains, the most common method for generating exploratory actions involves sampling from a Gaussian distribution centred around the mean action output by a policy. 
Although these methods can be quite capable, they do not scale well with the dimensionality of the action space, and can be dangerous to apply on hardware. We consider learning a forward dynamics model to predict the result, ($x_{t+1}$), of taking a particular action, ($u$), given a specific observation of the state, ($x_{t}$). With this model we perform internal look-ahead predictions of outcomes and seek actions we believe have a reasonable chance of success. This method alters the exploratory action space, thereby increasing learning speed and enables higher quality solutions to difficult problems, such as robotic locomotion and juggling.\nLearning of user preferences, as represented by, for example, Conditional Preference Networks (CP-nets), has become a core issue in AI research. Recent studies investigate learning of CP-nets from randomly chosen examples or from membership and equivalence queries. To assess the optimality of learning algorithms as well as to better understand the combinatorial structure of classes of CP-nets, it is helpful to calculate certain learning-theoretic information complexity parameters. This paper determines bounds on or exact values of some of the most central information complexity parameters, namely the VC dimension, the (recursive) teaching dimension, the self-directed learning complexity, and the optimal mistake bound, for classes of acyclic CP-nets. We further provide an algorithm that learns tree-structured CP-nets from membership queries. Using our results on complexity parameters, we assess the optimality of our algorithm as well as that of another query learning algorithm for acyclic CP-nets presented in the literature. Our algorithm is near-optimal, and can, under certain assumptions be adapted to the case when the membership oracle is faulty.\nTrust is essential for human-robot collaboration and user adoption of autonomous systems, such as robot assistants. 
This paper introduces a computational model which integrates trust into robot decision-making. Specifically, we learn from data a partially observable Markov decision process (POMDP) with human trust as a latent variable. The trust-POMDP model provides a principled approach for the robot to (i) infer the trust of a human teammate through interaction, (ii) reason about the effect of its own actions on human behaviors, and (iii) choose actions that maximize team performance over the long term. We validated the model through human subject experiments on a table-clearing task in simulation (201 participants) and with a real robot (20 participants). The results show that the trust-POMDP improves human-robot team performance in this task. They further suggest that maximizing trust in itself may not improve team performance.\nNeural programming involves training neural networks to learn programs from data. Previous works have failed to achieve good generalization performance, especially on programs with high complexity or on large domains. This is because they mostly rely either on black-box function evaluations that do not capture the structure of the program, or on detailed execution traces that are expensive to obtain, and hence the training data has poor coverage of the domain under consideration. We present a novel framework that utilizes black-box function evaluations, in conjunction with symbolic expressions that integrate relationships between the given functions. We employ tree LSTMs to incorporate the structure of the symbolic expression trees. We use tree encoding for numbers present in function evaluation data, based on their decimal representation. We present an evaluation benchmark for this task to demonstrate our proposed model combines symbolic reasoning and function evaluation in a fruitful manner, obtaining high accuracies in our experiments. 
Our framework generalizes significantly better to expressions of higher depth and is able to fill partial equations with valid completions.\nWe introduce a new computational model of moral decision making, drawing on a recent theory of commonsense moral learning via social dynamics. Our model describes moral dilemmas as a utility function that computes trade-offs in values over abstract moral dimensions, which provide interpretable parameter values when implemented in machine-led ethical decision-making. Moreover, characterizing the social structures of individuals and groups as a hierarchical Bayesian model, we show that a useful description of an individual's moral values - as well as a group's shared values - can be inferred from a limited amount of observed data. Finally, we apply and evaluate our approach to data from the Moral Machine, a web application that collects human judgments on moral dilemmas involving autonomous vehicles.\nAutomated decision making systems are increasingly being used in real-world applications. In these systems for the most part, the decision rules are derived by minimizing the training error on the available historical data. Therefore, if there is a bias related to a sensitive attribute such as gender, race, religion, etc. in the data, say, due to cultural/historical discriminatory practices against a certain demographic, the system could continue discrimination in decisions by including the said bias in its decision rule. We present an information theoretic framework for designing fair predictors from data, which aim to prevent discrimination against a specified sensitive attribute in a supervised learning setting. We use equalized odds as the criterion for discrimination, which demands that the prediction should be independent of the protected attribute conditioned on the actual label. To ensure fairness and generalization simultaneously, we compress the data to an auxiliary variable, which is used for the prediction task. 
This auxiliary variable is chosen such that it is decontaminated from the discriminatory attribute in the sense of equalized odds. The final predictor is obtained by applying a Bayesian decision rule to the auxiliary variable.\nRecent work has shown local convergence of GAN training for absolutely continuous data and generator distributions. In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. Furthermore, we discuss regularization strategies that were recently proposed to stabilize GAN training. Our analysis shows that GAN training with instance noise or zero-centered gradient penalties converges. On the other hand, we show that Wasserstein-GANs and WGAN-GP with a finite number of discriminator updates per generator update do not always converge to the equilibrium point. We discuss these results, leading us to a new explanation for the stability problems of GAN training. Based on our analysis, we extend our convergence results to more general GANs and prove local convergence for simplified gradient penalties even if the generator and data distribution lie on lower dimensional manifolds. We find these penalties to work well in practice and use them to learn a generative image model of all 1000 Imagenet classes in a single GAN with little hyperparameter tuning.\nApart from few exceptions, the mathematical runtime analysis of evolutionary algorithms is mostly concerned with expected runtimes. In this work, we argue that stochastic domination is a notion that should be used more frequently in this area. 
Stochastic domination allows to formulate much more informative performance guarantees than the expectation alone, it allows to decouple the algorithm analysis into the true algorithmic part of detecting a domination statement and probability theoretic part of deriving the desired probabilistic guarantees from this statement, and it allows simpler and more natural proofs.   As particular results, we prove a fitness level theorem which shows that the runtime is dominated by a sum of independent geometric random variables, we prove tail bounds for several classic problems, and we give a short and natural proof for Witt's result that the runtime of any $(\\mu,p)$ mutation-based algorithm on any function with unique optimum is subdominated by the runtime of a variant of the (1+1) evolutionary algorithm on the OneMax function.\nConvNets, through their architecture, only enforce invariance to translation. In this paper, we introduce a new class of deep convolutional architectures called Non-Parametric Transformation Networks (NPTNs) which can learn \\textit{general} invariances and symmetries directly from data. NPTNs are a natural generalization of ConvNets and can be optimized directly using gradient descent. Unlike almost all previous works in deep architectures, they make no assumption regarding the structure of the invariances present in the data and in that aspect are flexible and powerful. We also model ConvNets and NPTNs under a unified framework called Transformation Networks (TN), which yields a better understanding of the connection between the two. We demonstrate the efficacy of NPTNs on data such as MNIST and CIFAR10 where they outperform ConvNet baselines with the same number of parameters. We show it is more effective than ConvNets in modelling symmetries from data, without the explicit knowledge of the added arbitrary nuisance transformations. 
Finally, we replace ConvNets with NPTNs within Capsule Networks and show that this enables Capsule Nets to perform even better.\nLearning a classifier with control on the false-positive rate plays a critical role in many machine learning applications. Existing approaches either introduce prior knowledge dependent label cost or tune parameters based on traditional classifiers, which lack consistency in methodology because they do not strictly adhere to the false-positive rate constraint. In this paper, we propose a novel scoring-thresholding approach, tau-False Positive Learning (tau-FPL) to address this problem. We show the scoring problem which takes the false-positive rate tolerance into accounts can be efficiently solved in linear time, also an out-of-bootstrap thresholding method can transform the learned ranking function into a low false-positive classifier. Both theoretical analysis and experimental results show superior performance of the proposed tau-FPL over existing approaches.\nA large body of compelling evidence has been accumulated demonstrating that embodiment - the agent's physical setup, including its shape, materials, sensors and actuators - is constitutive for any form of cognition and as a consequence, models of cognition need to be embodied. In contrast to methods from empirical sciences to study cognition, robots can be freely manipulated and virtually all key variables of their embodiment and control programs can be systematically varied. As such, they provide an extremely powerful tool of investigation. We present a robotic bottom-up or developmental approach, focusing on three stages: (a) low-level behaviors like walking and reflexes, (b) learning regularities in sensorimotor spaces, and (c) human-like cognition. 
We also show that robotic based research is not only a productive path to deepening our understanding of cognition, but that robots can strongly benefit from human-like cognition in order to become more autonomous, robust, resilient, and safe.\nWe propose Machines Talking To Machines (M2M), a framework combining automation and crowdsourcing to rapidly bootstrap end-to-end dialogue agents for goal-oriented dialogues in arbitrary domains. M2M scales to new tasks with just a task schema and an API client from the dialogue system developer, but it is also customizable to cater to task-specific interactions. Compared to the Wizard-of-Oz approach for data collection, M2M achieves greater diversity and coverage of salient dialogue flows while maintaining the naturalness of individual utterances. In the first phase, a simulated user bot and a domain-agnostic system bot converse to exhaustively generate dialogue \"outlines\", i.e. sequences of template utterances and their semantic parses. In the second phase, crowd workers provide contextual rewrites of the dialogues to make the utterances more natural while preserving their meaning. The entire process can finish within a few hours. We propose a new corpus of 3,000 dialogues spanning 2 domains collected with M2M, and present comparisons with popular dialogue datasets on the quality and diversity of the surface forms and dialogue flows.\nTopic modeling enables exploration and compact representation of a corpus. The CaringBridge (CB) dataset is a massive collection of journals written by patients and caregivers during a health crisis. Topic modeling on the CB dataset, however, is challenging due to the asynchronous nature of multiple authors writing about their health journeys. To overcome this challenge we introduce the Dynamic Author-Persona topic model (DAP), a probabilistic graphical model designed for temporal corpora with multiple authors. 
The novelty of the DAP model lies in its representation of authors by a persona, where personas capture the propensity to write about certain topics over time. Further, we present a regularized variational inference algorithm, which we use to encourage the DAP model's personas to be distinct. Our results show significant improvements over competing topic models, particularly after regularization, and highlight the DAP model's unique ability to capture common journeys shared by different authors.\nWe present extensive experiments training and testing hidden units in deep networks that emit only a predefined, static number of discretized values. These units provide benefits in real-world deployment in systems in which memory and/or computation may be limited. Additionally, they are particularly well suited for use in large recurrent network models that require the maintenance of large amounts of internal state in memory. Surprisingly, we find that despite reducing the number of values that can be represented in the output activations from $2^{32}-2^{64}$ to between 64 and 256, there is little to no degradation in network performance across a variety of different settings. We investigate simple classification and regression tasks, as well as memorization and compression problems. We compare the results with more standard activations, such as tanh and relu. Unlike previous discretization studies which often concentrate only on binary units, we examine the effects of varying the number of allowed activation levels. Compared to existing approaches for discretization, the approach presented here is both conceptually and programmatically simple, has no stochastic component, and allows the training, testing, and usage phases to be treated in exactly the same manner.\nThis paper proposes a novel adaptive algorithm for the automated short-term trading of financial instruments. 
The algorithm adopts a semantic sentiment analysis technique to inspect Twitter posts and uses them to predict the behaviour of the stock market. Indeed, the algorithm is specifically developed to take advantage of both the sentiment and the past values of a certain financial instrument in order to choose the best investment decision. This allows the algorithm to maximize the obtainable profits when trading on the stock market. We have conducted an investment simulation and compared the performance of our proposed approach with a well-known benchmark (DJTATO index) and the optimal results, in which an investor knows in advance the future price of a product. The result shows that our approach outperforms the benchmark and achieves a performance score close to the optimal result.\nFeature engineering is one of the most important but tedious tasks in data science projects. This work studies automation of feature learning for relational data. We first prove theoretically that learning relevant features from relational data for a given predictive analytics problem is NP-hard. However, it is possible to show empirically that an efficient rule-based approach, predefining transformations a priori based on heuristics, can extract very useful features from relational data. Indeed, the proposed approach outperformed the state-of-the-art solutions by a significant margin. We further introduce a deep neural network which automatically learns appropriate transformations of relational data into a representation that predicts the target variable well, instead of having them predefined a priori by users. In an extensive experiment with Kaggle competitions, the proposed methods could win late medals. 
To the best of our knowledge, this is the first time an automation system could win medals in Kaggle competitions with complex relational data.\nControl systems behavior can be analyzed taking into account a large number of parameters: performances, reliability, availability, security. Each control system presents various security vulnerabilities that affect in lower or higher measure its functioning. In this paper the authors present a method to assess the impact of security issues on the systems availability. A fuzzy model for estimating the availability of the system based on the security level and achieved availability coefficient (depending on MTBF and MTR) is developed and described. The results of the fuzzy inference system (FIS) are presented in the last section of the paper.\nThis paper concerns open-world classification, where the classifier not only needs to classify test examples into seen classes that have appeared in training but also reject examples from unseen or novel classes that have not appeared in training. Specifically, this paper focuses on discovering the hidden unseen classes of the rejected examples. Clearly, without prior knowledge this is difficult. However, we do have the data from the seen training classes, which can tell us what kind of similarity/difference is expected for examples from the same class or from different classes. It is reasonable to assume that this knowledge can be transferred to the rejected examples and used to discover the hidden unseen classes in them. This paper aims to solve this problem. It first proposes a joint open classification model with a sub-model for classifying whether a pair of examples belongs to the same or different classes. This sub-model can serve as a distance function for clustering to discover the hidden classes of the rejected examples. 
Experimental results show that the proposed model is highly promising.\nDempster-Shafer evidence theory has been widely used in various fields of applications. Besides, it has been proven that the quantum theory has powerful capabilities of solving the decision making problems. However, due to the inconsistency of the expression, the classical Dempster-Shafer evidence theory modelled by real numbers can not be integrated directly with the quantum theory modelled by complex numbers. The main contribution in this study is that, unlike the existing evidence theory, a mass function in the generalized Dempster-Shafer evidence theory is modelled by a complex number, called as a complex mass function. When the complex mass function is degenerated from complex numbers to real numbers, the generalized Dempster's combination rule degenerates to the classical evidence theory. This generalized Dempster-Shafer evidence theory provides a promising way to model and handle more uncertain information. Numerical examples are illustrated to show the efficiency of the generalized Dempster-Shafer evidence theory.\nThe increasing use of deep neural networks for safety-critical applications, such as autonomous driving and flight control, raises concerns about their safety and reliability. Formal verification can address these concerns by guaranteeing that a deep learning system operates as intended, but the state of the art is limited to small systems. In this work-in-progress report we give an overview of our work on mitigating this difficulty, by pursuing two complementary directions: devising scalable verification techniques, and identifying design choices that result in deep learning systems that are more amenable to verification.\nWith the demand for machine learning increasing, so does the demand for tools which make it easier to use. 
Automated machine learning (AutoML) tools have been developed to address this need, such as the Tree-Based Pipeline Optimization Tool (TPOT) which uses genetic programming to build optimal pipelines. We introduce Layered TPOT, a modification to TPOT which aims to create pipelines equally good as the original, but in significantly less time. This approach evaluates candidate pipelines on increasingly large subsets of the data according to their fitness, using a modified evolutionary algorithm to allow for separate competition between pipelines trained on different sample sizes. Empirical evaluation shows that, on sufficiently large datasets, Layered TPOT indeed finds better models faster.\nWe train multi-task autoencoders on linguistic tasks and analyze the learned hidden sentence representations. The representations change significantly when translation and part-of-speech decoders are added. The more decoders a model employs, the better it clusters sentences according to their syntactic similarity, as the representation space becomes less entangled. We explore the structure of the representation space by interpolating between sentences, which yields interesting pseudo-English sentences, many of which have recognizable syntactic structure. Lastly, we point out an interesting property of our models: The difference-vector between two sentences can be added to change a third sentence with similar features in a meaningful way.\nA problem faced by many instructors is that of designing exams that accurately assess the abilities of the students. Typically these exams are prepared several days in advance, and generic question scores are used based on rough approximation of the question difficulty and length. For example, for a recent class taught by the author, there were 30 multiple choice questions worth 3 points, 15 true/false with explanation questions worth 4 points, and 5 analytical exercises worth 10 points. 
We describe a novel framework where algorithms from machine learning are used to modify the exam question weights in order to optimize the exam scores, using the overall class grade as a proxy for a student's true ability. We show that significant error reduction can be obtained by our approach over standard weighting schemes, and we make several new observations regarding the properties of the \"good\" and \"bad\" exam questions that can have impact on the design of improved future evaluation methods.\nTraining a task-completion dialogue agent with real users via reinforcement learning (RL) could be prohibitively expensive, because it requires many interactions with users. One alternative is to resort to a user simulator, while the discrepancy of between simulated and real users makes the learned policy unreliable in practice. This paper addresses these challenges by integrating planning into the dialogue policy learning based on Dyna-Q framework, and provides a more sample-efficient approach to learn the dialogue polices. The proposed agent consists of a planner trained on-line with limited real user experience that can generate large amounts of simulated experience to supplement with limited real user experience, and a policy model trained on these hybrid experiences. The effectiveness of our approach is validated on a movie-booking task in both a simulation setting and a human-in-the-loop setting.\nTopological data analysis offers a robust way to extract useful information from noisy, unstructured data by identifying its underlying structure. Recently, an efficient quantum algorithm was proposed [Lloyd, Garnerone, Zanardi, Nat. Commun. 7, 10138 (2016)] for calculating Betti numbers of data points -- topological features that count the number of topological holes of various dimensions in a scatterplot. 
Here, we implement a proof-of-principle demonstration of this quantum algorithm by employing a six-photon quantum processor to successfully analyze the topological features of Betti numbers of a network including three data points, providing new insights into data analysis in the era of quantum computing.\nStrict partial order is a mathematical structure commonly seen in relational data. One obstacle to extracting such type of relations at scale is the lack of large-scale labels for building effective data-driven solutions. We develop an active learning framework for mining such relations subject to a strict order. Our approach incorporates relational reasoning not only in finding new unlabeled pairs whose labels can be deduced from an existing label set, but also in devising new query strategies that consider the relational structure of labels. Our experiments on concept prerequisite relations show our proposed framework can substantially improve the classification performance with the same query budget compared to other baseline approaches.\nMulti-view networks are ubiquitous in real-world applications. In order to extract knowledge or business value, it is of interest to transform such networks into representations that are easily machine-actionable. Meanwhile, network embedding has emerged as an effective approach to generate distributed network representations. Therefore, we are motivated to study the problem of multi-view network embedding, with a focus on the characteristics that are specific and important in embedding this type of networks. In our practice of embedding real-world multi-view networks, we identify two such characteristics, which we refer to as preservation and collaboration. We then explore the feasibility of achieving better embedding quality by simultaneously modeling preservation and collaboration, and propose the mvn2vec algorithms. 
With experiments on a series of synthetic datasets, an internal Snapchat dataset, and two public datasets, we further confirm the presence and importance of preservation and collaboration. These experiments also demonstrate that better embedding can be obtained by simultaneously modeling the two characteristics, while not over-complicating the model or requiring additional supervision.\nWe introduce a continuous-time analog solver for MaxSAT, a quintessential class of NP-hard discrete optimization problems, where the task is to find a truth assignment for a set of Boolean variables satisfying the maximum number of given logical constraints. We show that the scaling of an invariant of the solver's dynamics, the escape rate, as function of the number of unsatisfied clauses can predict the global optimum value, often well before reaching the corresponding state. We demonstrate the performance of the solver on hard MaxSAT competition problems. We then consider the two-color Ramsey number $R(m,m)$ problem, translate it to SAT, and apply our algorithm to the still unknown $R(5,5)$. We find edge colorings without monochromatic 5-cliques for complete graphs up to 42 vertices, while on 43 vertices we find colorings with only two monochromatic 5-cliques, the best coloring found so far, supporting the conjecture that $R(5,5) = 43$.\nDisplaying the large number of bands in a hyper spectral image on a trichromatic monitor has been an active research topic. The visualized image shall convey as much information as possible form the original data and facilitate image interpretation. Most existing methods display HSIs in false colors which contradict with human's experience and expectation. In this paper, we propose a nonlinear approach to visualize an input HSI with natural colors by taking advantage of a corresponding RGB image. 
Our approach is based on Moving Least Squares, an interpolation scheme for reconstructing a surface from a set of control points, which in our case is a set of matching pixels between the HSI and the corresponding RGB image. Based on MLS, the proposed method solves for each spectral signature a unique transformation so that the non linear structure of the HSI can be preserved. The matching pixels between a pair of HSI and RGB image can be reused to display other HSIs captured b the same imaging sensor with natural colors. Experiments show that the output image of the proposed method no only have natural colors but also maintain the visual information necessary for human analysis.\nThe seminal work of Chow and Liu (1968) shows that approximation of a finite probabilistic system by Markov trees can achieve the minimum information loss with the topology of a maximum spanning tree. Our current paper generalizes the result to Markov networks of tree width $\\leq k$, for every fixed $k\\geq 2$. In particular, we prove that approximation of a finite probabilistic system with such Markov networks has the minimum information loss when the network topology is achieved with a maximum spanning $k$-tree. While constructing a maximum spanning $k$-tree is intractable for even $k=2$, we show that polynomial algorithms can be ensured by a sufficient condition accommodated by many meaningful applications. In particular, we prove an efficient algorithm for learning the optimal topology of higher order correlations among random variables that belong to an underlying linear structure.\nIn this paper, we present a new approach to Transfer Learning (TL) in Reinforcement Learning (RL) for cross-domain tasks. Many of the available techniques approach the transfer architecture as a method of speeding up the target task learning. We propose to adapt and reuse the mapped source task optimal-policy directly in related domains. 
We show the optimal policy from a related source task can be near optimal in target domain provided an adaptive policy accounts for the model error between target and source. The main benefit of this policy augmentation is generalizing policies across multiple related domains without having to re-learn the new tasks. Our results show that this architecture leads to better sample efficiency in the transfer, reducing sample complexity of target task learning to target apprentice learning.\nThis paper is concerned with the sparsification of the input-hidden weights of ELM (Extreme Learning Machine). For ordinary feedforward neural networks, the sparsification is usually done by introducing certain regularization technique into the learning process of the network. But this strategy can not be applied for ELM, since the input-hidden weights of ELM are supposed to be randomly chosen rather than to be learned. To this end, we propose a modified ELM, called ELM-LC (ELM with local connections), which is designed for the sparsification of the input-hidden weights as follows: The hidden nodes and the input nodes are divided respectively into several corresponding groups, and an input node group is fully connected with its corresponding hidden node group, but is not connected with any other hidden node group. As in the usual ELM, the hidden-input weights are randomly given, and the hidden-output weights are obtained through a least square learning. In the numerical simulations on some benchmark problems, the new ELM-CL behaves better than the traditional ELM.\nWe study generalization properties of distributed algorithms in the setting of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We first investigate distributed stochastic gradient methods (SGM), with mini-batches and multi-passes over the data. We show that optimal generalization error bounds can be retained for distributed SGM provided that the partition level is not too large. 
We then extend our results to spectral-regularization algorithms (SRA), including kernel ridge regression (KRR), kernel principal component analysis, and gradient methods. Our results are superior to the state-of-the-art theory. Particularly, our results show that distributed SGM has a smaller theoretical computational complexity, compared with distributed KRR and classic SGM. Moreover, even for non-distributed SRA, they provide the first optimal, capacity-dependent convergence rates, considering the case that the regression function may not be in the RKHS.\nChit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to (i) condition on their given profile information; and (ii) information about the person they are talking to, resulting in improved dialogues, as measured by next utterance prediction. Since (ii) is initially unknown our model is trained to engage its partner with personal topics, and we show the resulting dialogue can be used to predict profile information about the interlocutors.\nThis paper describes the application of comparison training (CT) for automatic feature weight tuning, with the final objective of improving the evaluation functions used in Chinese chess programs. First, we propose an n-tuple network to extract features, since n-tuple networks require very little expert knowledge through its large numbers of features, while simulta-neously allowing easy access. Second, we propose a novel evalua-tion method that incorporates tapered eval into CT. Experiments show that with the same features and the same Chinese chess program, the automatically tuned comparison training feature weights achieved a win rate of 86.58% against the weights that were hand-tuned. 
The above trained version was then improved by adding additional features, most importantly n-tuple features. This improved version achieved a win rate of 81.65% against the trained version without additional features.\nWe analyze the language learned by an agent trained with reinforcement learning as a component of the ActiveQA system [Buck et al., 2017]. In ActiveQA, question answering is framed as a reinforcement learning task in which an agent sits between the user and a black box question-answering system. The agent learns to reformulate the user's questions to elicit the optimal answers. It probes the system with many versions of a question that are generated via a sequence-to-sequence question reformulation model, then aggregates the returned evidence to find the best answer. This process is an instance of \\emph{machine-machine} communication. The question reformulation model must adapt its language to increase the quality of the answers returned, matching the language of the question answering system. We find that the agent does not learn transformations that align with semantic intuitions but discovers through learning classical information retrieval techniques such as tf-idf re-weighting and stemming.\nMachine learning is a tool for building models that accurately represent input training data. When undesired biases concerning demographic groups are in the training data, well-trained models will reflect those biases. We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary. The input to the network X, here text or census data, produces a prediction Y, such as an analogy completion or income bracket, while the adversary tries to model a protected variable Z, here gender or zip code.   The objective is to maximize the predictor's ability to predict Y while minimizing the adversary's ability to predict Z. 
Applied to analogy completion, this method results in accurate predictions that exhibit less evidence of stereotyping Z. When applied to a classification task using the UCI Adult (Census) Dataset, it results in a predictive model that does not lose much accuracy while achieving very close to equality of odds (Hardt, et al., 2016). The method is flexible and applicable to multiple definitions of fairness as well as a wide range of gradient-based learning models, including both regression and classification tasks.\nFacial analysis technologies have recently measured up to the capabilities of expert clinicians in syndrome identification. To date, these technologies could only identify phenotypes of a few diseases, limiting their role in clinical settings where hundreds of diagnoses must be considered.   We developed a facial analysis framework, DeepGestalt, using computer vision and deep learning algorithms, that quantifies similarities to hundreds of genetic syndromes based on unconstrained 2D images. DeepGestalt is currently trained with over 26,000 patient cases from a rapidly growing phenotype-genotype database, consisting of tens of thousands of validated clinical cases, curated through a community-driven platform. DeepGestalt currently achieves 91% top-10-accuracy in identifying over 215 different genetic syndromes and has outperformed clinical experts in three separate experiments.   We suggest that this form of artificial intelligence is ready to support medical genetics in clinical and laboratory practices and will play a key role in the future of precision medicine.\nClustering is a fundamental machine learning method. The quality of its results is dependent on the data distribution. For this reason, deep neural networks can be used for learning better representations of the data. In this paper, we propose a systematic taxonomy for clustering with deep learning, in addition to a review of methods from the field. 
Based on our taxonomy, creating new methods is more straightforward. We also propose a new approach which is built on the taxonomy and surpasses some of the limitations of some previous work. Our experimental evaluation on image datasets shows that the method approaches state-of-the-art clustering quality, and performs better in some cases.\nBased on the observation that semantic segmentation errors are partially predictable, we propose a compact formulation using confusion statistics of the trained classifier to refine (re-estimate) the initial pixel label hypotheses. The proposed strategy is contingent upon computing the classifier confusion probabilities for a given dataset and estimating a relevant prior on the object classes present in the image to be classified. We provide a procedure to robustly estimate the confusion probabilities and explore multiple prior definitions. Experiments are shown comparing performances on multiple challenging datasets using different priors to improve a state-of-the-art semantic segmentation classifier. This study demonstrates the potential to significantly improve semantic labeling and motivates future work for reliable label prior estimation from images.\nInspired by the matching of supply to demand in logistical problems, the optimal transportation (or Monge-Kantorovich) problem involves the matching of probability distributions defined over a geometric domain such as a surface or manifold. After discretization, optimal transportation becomes a large-scale linear program, which typically is infeasible to solve efficiently on triangle meshes, graphs, point clouds, and other domains encountered in graphics and machine learning. Recent breakthroughs in numerical optimal transportation enable scalability to orders-of-magnitude larger problems, solvable in a fraction of a second. 
In these lecture notes, we discuss advances in numerical optimal transport that leverage understanding of both discrete and smooth aspects of the problem. State-of-the-art techniques in discrete optimal transportation combine insight from partial differential equations (PDE) with convex analysis to reformulate, discretize, and optimize transportation problems. The end result is a set of theoretically-justified models suitable for domains with thousands or millions of vertices. Since numerical optimal transport is a relatively new discipline, special emphasis is placed on identifying and explaining open problems in need of mathematical insight and additional research.\nWe present a simple and general framework for feature learning from point cloud. The key to the success of CNNs is the convolution operator that is capable of leveraging spatially-local correlation in data represented densely in grids (e.g. images). However, point cloud are irregular and unordered, thus a direct convolving of kernels against the features associated with the points will result in deserting the shape information while being variant to the orders. To address these problems, we propose to learn a X-transformation from the input points, and then use it to simultaneously weight the input features associated with the points and permute them into latent potentially canonical order, before the element-wise product and sum operations are applied. The proposed method is a generalization of typical CNNs into learning features from point cloud, thus we call it PointCNN. Experiments show that PointCNN achieves on par or better performance than state-of-the-art methods on multiple challenging benchmark datasets and tasks.\nThe evaluation of interactive machine learning systems remains a difficult task. These systems learn from and adapt to the human, but at the same time, the human receives feedback and adapts to the system. 
Getting a clear understanding of these subtle mechanisms of co-operation and co-adaptation is challenging. In this chapter, we report on our experience in designing and evaluating various interactive machine learning applications from different domains. We argue for coupling two types of validation: algorithm-centered analysis, to study the computational behaviour of the system; and human-centered evaluation, to observe the utility and effectiveness of the application for end-users. We use a visual analytics application for guided search, built using an interactive evolutionary approach, as an exemplar of our work. Our observation is that human-centered design and evaluation complement algorithmic analysis, and can play an important role in addressing the \"black-box\" effect of machine learning. Finally, we discuss research opportunities that require human-computer interaction methodologies, in order to support both the visible and hidden roles that humans play in interactive machine learning.\nGeometric analysis is a very capable theory to understand the influence of the high dimensionality of the input data in machine learning (ML) and knowledge discovery (KD). With our approach we can assess how far the application of a specific KD/ML-algorithm to a concrete data set is prone to the curse of dimensionality. To this end we extend V.~Pestov's axiomatic approach to the instrinsic dimension of data sets, based on the seminal work by M.~Gromov on concentration phenomena, and provide an adaptable and computationally feasible model for studying observable geometric invariants associated to features that are natural to both the data and the learning procedure. In detail, we investigate data represented by formal contexts and give first theoretical as well as experimental insights into the intrinsic dimension of a concept lattice. 
Because of the correspondence between formal concepts and maximal cliques in graphs, applications to social network analysis are at hand.\nAlthough Recurrent Neural Network (RNN) has been a powerful tool for modeling sequential data, its performance is inadequate when processing sequences with multiple patterns. In this paper, we address this challenge by introducing an external memory and constructing a novel persistent memory augmented RNN (termed as PRNN). The PRNN captures the principle patterns in training sequences and stores them in an external memory. By leveraging the persistent memory, the proposed method can adaptively update states according to the similarities between encoded inputs and memory slots, leading to a stronger capacity in assimilating sequences with multiple patterns. Content-based addressing is suggested in memory accessing, and gradient descent is utilized for implicitly updating the memory. Our approach can be further extended by combining the prior knowledge of data. Experiments on several datasets demonstrate the effectiveness of the proposed method.\nIn this paper, we address referring expression comprehension: localizing an image region described by a natural language expression. While most recent work treats expressions as a single unit, we propose to decompose them into three modular components related to subject appearance, location, and relationship to other objects. This allows us to flexibly adapt to expressions containing different types of information in an end-to-end framework. In our model, which we call the Modular Attention Network (MAttNet), two types of attention are utilized: language-based attention that learns the module weights as well as the word/phrase attention that each module should focus on; and visual attention that allows the subject and relationship modules to focus on relevant image components. Module weights combine scores from all three modules dynamically to output an overall score. 
Experiments show that MAttNet outperforms previous state-of-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. Demo and code are provided.\nIn this paper, we study the problem of discovering the Markov blanket (MB) of a target variable from multiple interventional datasets. Datasets attained from interventional experiments contain richer causal information than passively observed data (observational data) for MB discovery. However, almost all existing MB discovery methods are designed for finding MBs from a single observational dataset. To identify MBs from multiple interventional datasets, we face two challenges: (1) unknown intervention variables; (2) nonidentical data distributions. To tackle the challenges, we theoretically analyze (a) under what conditions we can find the correct MB of a target variable, and (b) under what conditions we can identify the causes of the target variable via discovering its MB. Based on the theoretical analysis, we propose a new algorithm for discovering MBs from multiple interventional datasets, and present the conditions/assumptions which assure the correctness of the algorithm. To our knowledge, this work is the first to present the theoretical analyses about the conditions for MB discovery in multiple interventional datasets and the algorithm to find the MBs in relation to the conditions. Using benchmark Bayesian networks and real-world datasets, the experiments have validated the effectiveness and efficiency of the proposed algorithm in the paper.\nTo solve the text-based question and answering task that requires relational reasoning, it is necessary to memorize a large amount of information and find out the question relevant information from the memory. Most approaches were based on external memory and four components proposed by Memory Network. The distinctive component among them was the way of finding the necessary information and it contributes to the performance. 
Recently, a simple but powerful neural network module for reasoning called Relation Network (RN) has been introduced. We analyzed RN from the view of Memory Network, and realized that its MLP component is able to reveal the complicate relation between question and object pair. Motivated from it, we introduce which uses MLP to find out relevant information on Memory Network architecture. It shows new state-of-the-art results in jointly trained bAbI-10k story-based question answering tasks and bAbI dialog-based question answering tasks.\nThis paper presents a novel hybrid algorithm named Since Cosine Crow Search Algorithm. To propose the SCCSA, two novel algorithms are considered including Crow Search Algorithm (CSA) and Since Cosine Algorithm (SCA). The advantages of the two algorithms are considered and utilize to design an efficient hybrid algorithm which can perform significantly better in various benchmark functions. The combination of concept and operators of the two algorithms enable the SCCSA to make an appropriate trade-off between exploration and exploitation abilities of the algorithm. To evaluate the performance of the proposed SCCSA, seven well-known benchmark functions are utilized. The results indicated that the proposed hybrid algorithm is able to provide very competitive solution comparing to other state-of-the-art meta heuristics.\nThis Perspective provides examples of current and future applications of deep learning in pharmacogenomics, including: (1) identification of novel regulatory variants located in noncoding domains and their function as applied to pharmacoepigenomics; (2) patient stratification from medical records; and (3) prediction of drugs, targets, and their interactions. Deep learning encapsulates a family of machine learning algorithms that over the last decade has transformed many important subfields of artificial intelligence (AI) and has demonstrated breakthrough performance improvements on a wide range of tasks in biomedicine. 
We anticipate that in the future deep learning will be widely used to predict personalized drug response and optimize medication selection and dosing, using knowledge extracted from large and complex molecular, epidemiological, clinical, and demographic datasets.\nKnowledge graphs contain rich relational structures of the world, and thus complement data-driven machine learning in heterogeneous data. One of the most effective methods in representing knowledge graphs is to embed symbolic relations and entities into continuous spaces, where relations are approximately linear translations between projected images of entities in the relation space. However, state-of-the-art relation projection methods such as TransR, TransD or TransSparse do not model the correlation between relations, and thus are not scalable to complex knowledge graphs with thousands of relations, both in computational demand and in statistical robustness. To this end we introduce TransF, a novel translation-based method which mitigates the burden of relation projection by explicitly modeling the basis subspaces of projection matrices. As a result, TransF is far more lightweight than the existing projection methods, and is robust when facing a high number of relations. Experimental results on the canonical link prediction task show that our proposed model outperforms competing rivals by a large margin and achieves state-of-the-art performance. In particular, TransF improves by 9%/5% in the head/tail entity prediction task for N-to-1/1-to-N relations over the best-performing translation-based method.\nWe address the problem of deploying a reinforcement learning (RL) agent on a physical system such as a datacenter cooling unit or robot, where critical constraints must never be violated. We show how to exploit the typically smooth dynamics of these systems and enable RL algorithms to never violate constraints during learning. 
Our technique is to directly add to the policy a safety layer that analytically solves an action correction formulation for each state. The elegant closed-form solution is made possible by a linearized model, learned on past trajectories consisting of arbitrary actions. This is to mimic the real-world circumstances where data logs were generated with a behavior policy that is implausible to describe mathematically; such cases render the known safety-aware off-policy methods inapplicable. We demonstrate the efficacy of our approach on new representative physics-based environments, and prevail where reward shaping fails by maintaining zero constraint violations.\nReinforcement Learning (RL) is a research area that has blossomed tremendously in recent years and has shown remarkable potential in, among other things, successfully playing computer games. However, only a few game platforms exist that provide the diversity in tasks and state space needed to advance RL algorithms. The existing platforms offer RL access to Atari and a few web-based games, but no platform fully exposes access to Flash games. This is unfortunate because applying RL to Flash games has the potential to push the research of RL algorithms.   This paper introduces the Flash Reinforcement Learning platform (FlashRL) which attempts to fill this gap by providing an environment for thousands of Flash games on a novel platform for Flash automation. It opens up easy experimentation with RL algorithms for Flash games, which has previously been challenging. The platform shows excellent performance with as little as 5% CPU utilization on consumer hardware. It shows promising results for novel reinforcement learning algorithms.\nWe introduce a new formal model -- based on the mathematical construct of sheaves -- for representing contradictory information in textual sources. 
This model has the advantage of letting us (a) identify the causes of the inconsistency; (b) measure how strong it is; (c) and do something about it, e.g. suggest ways to reconcile inconsistent advice. This model naturally represents the distinction between contradictions and disagreements. It is based on the idea of representing natural language sentences as formulas with parameters sitting on lattices, creating partial orders based on predicates shared by theories, and building sheaves on these partial orders with products of lattices as stalks. Degrees of disagreement are measured by the existence of global and local sections.   Limitations of the sheaf approach and connections to recent work in natural language processing, as well as the topics of contextuality in physics, data fusion, topological data analysis and epistemology are also discussed.\nThis paper presents the first deep reinforcement learning (DRL) framework to estimate the optimal Dynamic Treatment Regimes from observational medical data. This framework is more flexible and adaptive for high dimensional action and state spaces than existing reinforcement learning methods to model real-life complexity in heterogeneous disease progression and treatment choices, with the goal of providing doctor and patients the data-driven personalized decision recommendations. The proposed DRL framework comprises (i) a supervised learning step to predict the most possible expert actions, and (ii) a deep reinforcement learning step to estimate the long-term value function of Dynamic Treatment Regimes. Both steps depend on deep neural networks.   As a key motivational example, we have implemented the proposed framework on a data set from the Center for International Bone Marrow Transplant Research (CIBMTR) registry database, focusing on the sequence of prevention and treatments for acute and chronic graft versus host disease after transplantation. 
In the experimental results, we have demonstrated promising accuracy in predicting human experts' decisions, as well as the high expected reward function in the DRL-based dynamic treatment regimes.\nProportional representation (PR) is a fundamental principle of many democracies world-wide which employ PR-based voting rules to elect their representatives. However, the normative properties of these voting rules are often only understood in the context of sincere voting.   In this paper we consider PR in the presence of strategic voters. We construct a voting rule such that for every preference profile there exists at least one costly voting equilibrium satisfying PR with respect to voters' private and unrevealed preferences; such a voting rule is said to be strategically robust. In contrast, a commonly applied voting rule is shown not to be strategically robust. Furthermore, we prove a limit on `how strategically robust' a PR-based voting rule can be; we show that there is no PR-based voting rule which ensures that every equilibrium satisfies PR. Collectively, our results highlight the possibility and limit of achieving PR in the presence of strategic voters and a positive role for mechanisms, such as pre-election polls, which coordinate voter behaviour towards equilibria which satisfy PR.\nWe propose two general and falsifiable hypotheses about expectations on generalization error when learning in the context of concept drift. One posits that as drift rate increases, the forgetting rate that minimizes generalization error will also increase and vice versa. The other posits that as a learner's forgetting rate increases, the bias/variance profile that minimizes generalization error will have lower variance and vice versa. 
These hypotheses lead to the concept of the sweet path, a path through the 3-d space of alternative drift rates, forgetting rates and bias/variance profiles on which generalization error will be minimized, such that slow drift is coupled with low forgetting and low bias, while rapid drift is coupled with fast forgetting and low variance. We present experiments that support the existence of such a sweet path. We also demonstrate that simple learners that select appropriate forgetting rates and bias/variance profiles are highly competitive with the state-of-the-art in incremental learners for concept drift on real-world drift problems.\nSystematic reviews are essential to summarizing the results of different clinical and social science studies. The first step in a systematic review task is to identify all the studies relevant to the review. The task of identifying relevant studies for a given systematic review is usually performed manually, and as a result, involves substantial amounts of expensive human resource. Lately, there have been some attempts to reduce this manual effort using active learning. In this work, we build upon some such existing techniques, and validate by experimenting on a larger and comprehensive dataset than has been attempted until now. Our experiments provide insights on the use of different feature extraction models for different disciplines. More importantly, we identify that a naive active learning based screening process is biased in favour of selecting similar documents. We aimed to improve the performance of the screening process using a novel active learning algorithm with success. Additionally, we propose a mechanism to choose the best feature extraction method for a given review.\nMulti-vehicle routing has become increasingly important with the rapid development of autonomous vehicle technology. 
Dial-a-ride problem, a variant of vehicle routing problem (VRP), deals with the allocation of customer requests to vehicles, scheduling the pick-up and drop-off times and the sequence of serving those requests by ensuring high customer satisfaction with minimized travel cost. In this paper, we propose an improved tabu search (ITS) heuristic for static dial-a-ride problem (DARP) with the objective of obtaining high-quality solutions in short time. Two new techniques, initialization heuristic, and time window adjustment are proposed to achieve faster convergence to the global optimum. Various numerical experiments are conducted for the proposed solution methodology using DARP test instances from the literature and the convergence speed up is validated.\nRelational data sources are still one of the most popular ways to store enterprise or Web data, however, the issue with relational schema is the lack of a well-defined semantic description. A common ontology provides a way to represent the meaning of a relational schema and can facilitate the integration of heterogeneous data sources within a domain. Semantic labeling is achieved by mapping attributes from the data sources to the classes and properties in the ontology. We formulate this problem as a multi-class classification problem where previously labeled data sources are used to learn rules for labeling new data sources. The majority of existing approaches for semantic labeling have focused on data integration challenges such as naming conflicts and semantic heterogeneity. In addition, machine learning approaches typically have issues around class imbalance, lack of labeled instances and relative importance of attributes. To address these issues, we develop a new machine learning model with engineered features as well as two deep learning models which do not require extensive feature engineering. 
We evaluate our new approaches with the state-of-the-art.\nA flashover occurs when a fire spreads very rapidly through crevices due to intense heat. Flashovers present one of the most frightening and challenging fire phenomena to those who regularly encounter them: firefighters. Firefighters' safety and lives often depend on their ability to predict flashovers before they occur. Typical pre-flashover fire characteristics include dark smoke, high heat, and rollover (\"angel fingers\") and can be quantified by color, size, and shape. Using a color video stream from a firefighter's body camera, we applied generative adversarial neural networks for image enhancement. The neural networks were trained to enhance very dark fire and smoke patterns in videos and monitor dynamic changes in smoke and fire areas. Preliminary tests with limited flashover training videos showed that we predicted a flashover as early as 55 seconds before it occurred.\nAccurate and transparent prediction of cancer survival times on the level of individual patients can inform and improve patient care and treatment practices. In this paper, we design a model that concurrently learns to accurately predict patient-specific survival distributions and to explain its predictions in terms of patient attributes such as clinical tests or assessments. Our model is flexible and based on a recurrent network, can handle various modalities of data including temporal measurements, and yet constructs and uses simple explanations in the form of patient- and time-specific linear regression. For analysis, we use two publicly available datasets and show that our networks outperform a number of baselines in prediction while providing a way to inspect the reasons behind each prediction.\nEffective collaboration between humans and AI-based systems requires effective modeling of the human in the loop, both in terms of the mental state as well as the physical capabilities of the latter. 
However, these models can also open up pathways for manipulating and exploiting the human in the hopes of achieving some greater good, especially when the intent or values of the AI and the human are not aligned or when they have an asymmetrical relationship with respect to knowledge or computation power. In fact, such behavior does not necessarily require any malicious intent but can rather be borne out of cooperative scenarios. It is also beyond simple misinterpretation of intents, as in the case of value alignment problems, and thus can be effectively engineered if desired. Such techniques already exist and pose several unresolved ethical and moral questions with regards to the design of autonomy. In this paper, we illustrate some of these issues in a teaming scenario and investigate how they are perceived by participants in a thought experiment.\nClustering is inherently ill-posed: there often exist multiple valid clusterings of a single dataset, and without any additional information a clustering system has no way of knowing which clustering it should produce. This motivates the use of constraints in clustering, as they allow users to communicate their interests to the clustering system. Active constraint-based clustering algorithms select the most useful constraints to query, aiming to produce a good clustering using as few constraints as possible. We propose COBRA, an active method that first over-clusters the data by running K-means with a $K$ that is intended to be too large, and subsequently merges the resulting small clusters into larger ones based on pairwise constraints. In its merging step, COBRA is able to keep the number of pairwise queries low by maximally exploiting constraint transitivity and entailment. 
We experimentally show that COBRA outperforms the state of the art in terms of clustering quality and runtime, without requiring the number of clusters in advance.\nGeneralized planning is concerned with the characterization and computation of plans that solve many instances at once. In the standard formulation, a generalized plan is a mapping from feature or observation histories into actions, assuming that the instances share a common pool of features and actions. This assumption, however, excludes the standard relational planning domains where actions and objects change across instances. In this work, we extend the formulation of generalized planning to such domains. This is achieved by projecting the actions over the features, resulting in a common set of abstract actions which can be tested for soundness and completeness, and which can be used for generating general policies such as \"if the gripper is empty, pick the clear block above x and place it on the table\" that achieve the goal clear(x) in any Blocksworld instance. In this policy, \"pick the clear block above x\" is an abstract action that may represent the action Unstack(a, b) in one situation and the action Unstack(b, c) in another. Transformations are also introduced for computing such policies by means of fully observable non-deterministic (FOND) planners. The value of generalized representations for learning general policies is also discussed.\nIt is inconceivable how chaotic the world would look to humans, faced with innumerable decisions a day to be made under uncertainty, had they been lacking the capacity to distinguish the relevant from the irrelevant---a capacity which computationally amounts to handling probabilistic independence relations. The highly parallel and distributed computational machinery of the brain suggests that a satisfying process-level account of human independence judgment should also mimic these features. 
In this work, we present the first rational, distributed, message-passing, process-level account of independence judgment, called $\\mathcal{D}^\\ast$. Interestingly, $\\mathcal{D}^\\ast$ shows a curious, but normatively-justified tendency for quick detection of dependencies, whenever they hold. Furthermore, $\\mathcal{D}^\\ast$ outperforms all the previously proposed algorithms in the AI literature in terms of worst-case running time, and a salient aspect of it is supported by recent work in neuroscience investigating possible implementations of Bayes nets at the neural level. $\\mathcal{D}^\\ast$ nicely exemplifies how the pursuit of cognitive plausibility can lead to the discovery of state-of-the-art algorithms with appealing properties, and its simplicity makes $\\mathcal{D}^\\ast$ potentially a good candidate for pedagogical purposes.\nThe cross entropy (CE) method is a model based search method to solve optimization problems where the objective function has minimal structure. The Monte-Carlo version of the CE method employs the naive sample averaging technique which is inefficient, both computationally and space wise. We provide a novel stochastic approximation version of the CE method, where the sample averaging is replaced with incremental geometric averaging. This approach can save considerable computational and storage costs. Our algorithm is incremental in nature and possesses additional attractive features such as accuracy, stability, robustness and convergence to the global optimum for a particular class of objective functions. We evaluate the algorithm on a variety of global optimization benchmark problems and the results obtained corroborate our theoretical findings.\nDeep learning relies on a very specific kind of neural networks: those superposing several neural layers. In the last few years, deep learning achieved major breakthroughs in many tasks such as image analysis, speech recognition, natural language processing, and so on. 
Yet, there is no theoretical explanation of this success. In particular, it is not clear why the deeper the network, the better it actually performs.   We argue that the explanation is intimately connected to a key feature of the data collected from our surrounding universe to feed the machine learning algorithms: large non-parallelizable logical depth. Roughly speaking, we conjecture that the shortest computational descriptions of the universe are algorithms with inherently large computation times, even when a large number of computers are available for parallelization. Interestingly, this conjecture, combined with the folklore conjecture in theoretical computer science that $P \\neq NC$, explains the success of deep learning.\nPretraining with expert demonstrations has been found useful in speeding up the training process of deep reinforcement learning algorithms since less online simulation data is required. Some methods use supervised learning to speed up the process of feature learning, while others pretrain the policies by imitating expert demonstrations. However, these methods are unstable and not suitable for actor-critic reinforcement learning algorithms. Also, some existing methods rely on the global optimum assumption, which is not true in most scenarios. In this paper, we employ expert demonstrations in an actor-critic reinforcement learning framework, and meanwhile ensure that the performance is not affected by the fact that expert demonstrations are not globally optimal. We theoretically derive a method for computing policy gradients and value estimators with only expert demonstrations. Our method is theoretically plausible for actor-critic reinforcement learning algorithms that pretrain both policy and value functions. 
We apply our method to two of the typical actor-critic reinforcement learning algorithms, DDPG and ACER, and demonstrate with experiments that our method not only outperforms the RL algorithms without pretraining process, but also is more simulation efficient.\nNovice programmers often struggle with the formal syntax of programming languages. To assist them, we design a novel programming language correction framework amenable to reinforcement learning. The framework allows an agent to mimic human actions for text navigation and editing. We demonstrate that the agent can be trained through self-exploration directly from the raw input, that is, program text itself, without any knowledge of the formal syntax of the programming language. We leverage expert demonstrations for one tenth of the training data to accelerate training. The proposed technique is evaluated on 6975 erroneous C programs with typographic errors, written by students during an introductory programming course. Our technique fixes 14% more programs and 29% more compiler error messages relative to those fixed by a state-of-the-art tool, DeepFix, which uses a fully supervised neural machine translation approach.\nAutomatic music generation is a compelling task where much recent progress has been made with deep learning models. In this paper, we ask how these models can be integrated into interactive music systems; how can they encourage or enhance the music making of human users? Musical performance requires prediction to operate instruments, and perform in groups. We argue that predictive models could help interactive systems to understand their temporal context, and ensemble behaviour. Deep learning can allow data-driven models with a long memory of past states.   We advocate for predictive musical interaction, where a predictive model is embedded in a musical interface, assisting users by predicting unknown states of musical processes. 
We propose a framework for incorporating such predictive models into the sensing, processing, and result architecture that is often used in musical interface design. We show that our framework accommodates deep generative models, as well as models for predicting gestural states, or other high-level musical information. We motivate the framework with two examples from our recent work, as well as systems from the literature, and suggest musical use-cases where prediction is a necessary component.\nWe present a model for recursive Bayesian filtering based on lifted multiset states. Combining multisets with lifting makes it possible to simultaneously exploit multiple strategies for reducing inference complexity when compared to list-based grounded state representations. The core idea is to borrow the concept of Maximally Parallel Multiset Rewriting Systems and to enhance it with concepts from Rao-Blackwellisation and Lifted Inference, giving a representation of state distributions that enables efficient inference. In worlds where the random variables that define the system state are exchangeable - where the identity of entities does not matter - it automatically uses a representation that abstracts from ordering (achieving an exponential reduction in complexity) and it automatically adapts when observations or system dynamics destroy exchangeability by breaking symmetry.\nFor 50 years, research in the area of inductive inference has investigated the learning of formal languages and has been influenced by computability theory, complexity theory, cognitive science, machine learning, and more generally artificial intelligence. Gold, one of the pioneers, investigated the most common formalization, learning in the limit, both from solely positive examples and from positive and negative information. 
The first mode of presentation has been studied extensively, including insights into how different additional requirements on the hypothesis sequence of the learner, or requested properties of the learner itself, restrict what collections of languages are learnable.   We focus on the second paradigm, learning from informants, and study how imposing different restrictions on the learning process affects learnability. For example, we show that learners can be assumed to change their hypothesis only when it is inconsistent with the data (such learners are called conservative). Further, we give a picture of how the most important learning restrictions relate. Our investigations support the claim that delayability is the right structural property for gaining a deeper understanding of the nature of learning restrictions.\nNegative affect is a proxy for mental health in adults. By being able to predict participants' negative affect states unobtrusively, researchers and clinicians will be better positioned to deliver targeted, just-in-time mental health interventions via mobile applications. This work attempts to personalize the passive recognition of negative affect states via group-based modeling of user behavior captured from mobility, communication, and activity patterns. Results show that group models outperform generalized models in a dataset based on two weeks of users' daily lives.\nWhen humans perform inductive learning, they often enhance the process with background knowledge. With the increasing availability of well-formed collaborative knowledge bases, the performance of learning algorithms could be significantly enhanced if a way were found to exploit these knowledge bases. In this work, we present a novel algorithm for injecting external knowledge into induction algorithms using feature generation. 
Given a feature, the algorithm defines a new learning task over its set of values, and uses the knowledge base to solve the constructed learning task. The resulting classifier is then used as a new feature for the original problem. We have applied our algorithm to the domain of text classification using large semantic knowledge bases. We have shown that the generated features significantly improve the performance of existing learning algorithms.\nThough deep neural networks (DNNs) achieve remarkable performances in many artificial intelligence tasks, the lack of training instances remains a notorious challenge. As the network goes deeper, the generalization accuracy decays rapidly in the situation of lacking massive amounts of training data. In this paper, we propose novel deep neural network structures that can be inherited from all existing DNNs with almost the same level of complexity, and develop simple training algorithms. We show our paradigm successfully resolves the lack of data issue. Tests on the CIFAR10 and CIFAR100 image recognition datasets show that the new paradigm leads to 20$\\%$ to $30\\%$ relative error rate reduction compared to their base DNNs. The intuition of our algorithms for deep residual network stems from theories of the partial differential equation (PDE) control problems. Code will be made available.\nWe propose an architecture for VQA which utilizes recurrent layers to generate visual and textual attention. The memory characteristic of the proposed recurrent attention units offers a rich joint embedding of visual and textual features and enables the model to reason relations between several parts of the image and question. Our single model outperforms the first place winner on the VQA 1.0 dataset, performs within margin to the current state-of-the-art ensemble model. We also experiment with replacing attention mechanisms in other state-of-the-art models with our implementation and show increased accuracy. 
In both cases, our recurrent attention mechanism improves performance in tasks requiring sequential or relational reasoning on the VQA dataset.\nIn this paper, we propose a novel approach, 3D-RecGAN++, which reconstructs the complete 3D structure of a given object from a single arbitrary depth view using generative adversarial networks. Unlike existing work which typically requires multiple views of the same object or class labels to recover the full 3D geometry, the proposed 3D-RecGAN++ only takes the voxel grid representation of a depth view of the object as input, and is able to generate the complete 3D occupancy grid with a high resolution of 256^3 by recovering the occluded/missing regions. The key idea is to combine the generative capabilities of autoencoders and the conditional Generative Adversarial Networks (GAN) framework, to infer accurate and fine-grained 3D structures of objects in high-dimensional voxel space. Extensive experiments on large synthetic datasets and real-world Kinect datasets show that the proposed 3D-RecGAN++ significantly outperforms the state of the art in single view 3D object reconstruction, and is able to reconstruct unseen types of objects.\nWe identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. For each of the three types of obfuscated gradients we discover, we describe characteristic behaviors of defenses exhibiting this effect and develop attack techniques to overcome it. In a case study, examining non-certified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 8 defenses relying on obfuscated gradients. 
Our new attacks successfully circumvent 6 completely and 1 partially.\nWe present Adaptive Memory Networks (AMN) that processes input-question pairs to dynamically construct a network architecture optimized for lower inference times for Question Answering (QA) tasks. AMN processes the input story to extract entities and stores them in memory banks. Starting from a single bank, as the number of input entities increases, AMN learns to create new banks as the entropy in a single bank becomes too high. Hence, after processing an input-question(s) pair, the resulting network represents a hierarchical structure where entities are stored in different banks, distanced by question relevance. At inference, one or few banks are used, creating a tradeoff between accuracy and performance. AMN is enabled by dynamic networks that allow input dependent network creation and efficiency in dynamic mini-batching as well as our novel bank controller that allows learning discrete decision making with high accuracy. In our results, we demonstrate that AMN learns to create variable depth networks depending on task complexity and reduces inference times for QA tasks.\nRecently, feature selection has become an increasingly important area of research due to the surge in high-dimensional datasets in all areas of modern life. A plethora of feature selection algorithms have been proposed, but it is difficult to truly analyse the quality of a given algorithm. Ideally, an algorithm would be evaluated by measuring how well it removes known bad features. Acquiring datasets with such features is inherently difficult, and so a common technique is to add synthetic bad features to an existing dataset. While adding noisy features is an easy task, it is very difficult to automatically add complex, redundant features. This work proposes one of the first approaches to generating redundant features, using a novel genetic programming approach. 
Initial experiments show that our proposed method can automatically create difficult, redundant features which have the potential to be used for creating high-quality feature selection benchmark datasets.\nModel interpretability is a requirement in many applications in which crucial decisions are made by users relying on a model's outputs. The recent movement for \"algorithmic fairness\" also stipulates explainability, and therefore interpretability of learning models. And yet the most successful contemporary Machine Learning approaches, the Deep Neural Networks, produce models that are highly non-interpretable. We attempt to address this challenge by proposing a technique called CNN-INTE to interpret deep Convolutional Neural Networks (CNN) via meta-learning. In this work, we interpret a specific hidden layer of the deep CNN model on the MNIST image dataset. We use a clustering algorithm in a two-level structure to find the meta-level training data and Random Forest as base learning algorithms to generate the meta-level test data. The interpretation results are displayed visually via diagrams, which clearly indicates how a specific test instance is classified. Our method achieves global interpretation for all the test instances without sacrificing the accuracy obtained by the original deep CNN model. This means our model is faithful to the deep CNN model, which leads to reliable interpretations.\nThis paper reviews recent studies in understanding neural-network representations and learning neural networks with interpretable/disentangled middle-layer representations. Although deep neural networks have exhibited superior performance in various tasks, the interpretability is always the Achilles' heel of deep neural networks. At present, deep neural networks obtain high discrimination power at the cost of low interpretability of their black-box representations. 
We believe that high model interpretability may help people to break several bottlenecks of deep learning, e.g., learning from very few annotations, learning via human-computer communications at the semantic level, and semantically debugging network representations. We focus on convolutional neural networks (CNNs), and we revisit the visualization of CNN representations, methods of diagnosing representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, learning of CNNs with disentangled representations, and middle-to-end learning based on model interpretability. Finally, we discuss prospective trends in explainable artificial intelligence.\nRecent years have seen a boom in interest in machine learning systems that can provide a human-understandable rationale for their predictions or decisions. However, exactly what kinds of explanation are truly human-interpretable remains poorly understood. This work advances our understanding of what makes explanations interpretable in the specific context of verification. Suppose we have a machine learning system that predicts X, and we provide rationale for this prediction X. Given an input, an explanation, and an output, is the output consistent with the input and the supposed rationale? Via a series of user-studies, we identify what kinds of increases in complexity have the greatest effect on the time it takes for humans to verify the rationale, and which seem relatively insensitive.\nTraining deep neural networks results in strong learned representations that show good generalization capabilities. In most cases, training involves iterative modification of all weights inside the network via back-propagation. In Extreme Learning Machines, it has been suggested to set the first layer of a network to fixed random values instead of learning it. 
In this paper, we propose to take this approach a step further and fix almost all layers of a deep convolutional neural network, allowing only a small portion of the weights to be learned. As our experiments show, fixing even the majority of the parameters of the network often results in performance which is on par with the performance of learning all of them. The implications of this intriguing property of deep neural networks are discussed and we suggest ways to harness it to create more robust representations.\nMulti-view sequential learning is a fundamental problem in machine learning dealing with multi-view sequences. In a multi-view sequence, there exists two forms of interactions between different views: view-specific interactions and cross-view interactions. In this paper, we present a new neural architecture for multi-view sequential learning called the Memory Fusion Network (MFN) that explicitly accounts for both interactions in a neural architecture and continuously models them through time. The first component of the MFN is called the System of LSTMs, where view-specific interactions are learned in isolation through assigning an LSTM function to each view. The cross-view interactions are then identified using a special attention mechanism called the Delta-memory Attention Network (DMAN) and summarized through time with a Multi-view Gated Memory. Through extensive experimentation, MFN is compared to various proposed approaches for multi-view sequential learning on multiple publicly available benchmark datasets. MFN outperforms all the existing multi-view approaches. Furthermore, MFN outperforms all current state-of-the-art models, setting new state-of-the-art results for these multi-view datasets.\nKnowledge graphs, on top of entities and their relationships, contain another important element: literals. Literals encode interesting properties (e.g. the height) of entities that are not captured by links between entities alone. 
Most of the existing work on embedding (or latent feature) based knowledge graph modeling focuses mainly on the relations between entities. In this work, we study the effect of incorporating literal information into existing knowledge graph models. Our approach, which we name LiteralE, is an extension that can be plugged into existing latent feature methods. LiteralE merges entity embeddings with their literal information using a learnable, parametrized function, such as a simple linear or nonlinear transformation, or a multilayer neural network. We extend several popular embedding models using LiteralE and evaluate the performance on the task of link prediction. Despite its simplicity, LiteralE proves to be an effective way to incorporate literal information into existing embedding based models, improving their performance on different standard datasets, which we augmented with their literals and provide as testbed for further research.\nMulti-person articulated pose tracking in complex unconstrained videos is an important and challenging problem. In this paper, going along the road of top-down approaches, we propose a decent and efficient pose tracker based on pose flows. First, we design an online optimization framework to build association of cross-frame poses and form pose flows. Second, a novel pose flow non maximum suppression (NMS) is designed to robustly reduce redundant pose flows and re-link temporal disjoint pose flows. Extensive experiments show our method significantly outperforms best reported results on two standard Pose Tracking datasets (PoseTrack dataset and PoseTrack Challenge dataset) by 13 mAP 25 MOTA and 6 mAP 3 MOTA respectively. 
Moreover, in the case of working on detected poses in individual frames, the extra computation of proposed pose tracker is very minor, requiring 0.01 second per frame only.\nRecent work in explanation generation for decision making agents has looked at how unexplained behavior of autonomous systems can be understood in terms of differences in the model of the system and the human's understanding of the same, and how the explanation process as a result of this mismatch can be then seen as a process of reconciliation of these models. Existing algorithms in such settings, while having been built on contrastive, selective and social properties of explanations as studied extensively in the psychology literature, have not, to the best of our knowledge, been evaluated in settings with actual humans in the loop. As such, the applicability of such explanations to human-AI and human-robot interactions remains suspect. In this paper, we set out to evaluate these explanation generation algorithms in a series of studies in a mock search and rescue scenario with an internal semi-autonomous robot and an external human commander. We demonstrate to what extent the properties of these algorithms hold as they are evaluated by humans, and how the dynamics of trust between the human and the robot evolve during the process of these interactions.\nAdvanced and accurate modelling of a Flapping Wing Micro Air Vehicle (FW MAV) and its control is one of the recent research topics related to the field of autonomous Unmanned Aerial Vehicles (UAVs). In this work, a four wing Natureinspired (NI) FW MAV is modeled and controlled inspiring by its advanced features like quick flight, vertical take-off and landing, hovering, and fast turn, and enhanced manoeuvrability when contrasted with comparable-sized fixed and rotary wing UAVs. 
The Fuzzy C-Means (FCM) clustering algorithm is utilized to demonstrate the NIFW MAV model, which has points of interest over first principle based modelling since it does not depend on the system dynamics, rather based on data and can incorporate various uncertainties like sensor error. The same clustering strategy is used to develop an adaptive fuzzy controller. The controller is then utilized to control the altitude of the NIFW MAV, that can adapt with environmental disturbances by tuning the antecedent and consequent parameters of the fuzzy system.\nIn recent years, neural network approaches have been widely adopted for machine learning tasks, with applications in computer vision. More recently, unsupervised generative models based on neural networks have been successfully applied to model data distributions via low-dimensional latent spaces. In this paper, we use Generative Adversarial Networks (GANs) to impose structure in compressed sensing problems, replacing the usual sparsity constraint. We propose to train the GANs in a task-aware fashion, specifically for reconstruction tasks. We also show that it is possible to train our model without using any (or much) non-compressed data. Finally, we show that the latent space of the GAN carries discriminative information and can further be regularized to generate input features for general inference tasks. We demonstrate the effectiveness of our method on a variety of reconstruction and classification problems.\nWe build a virtual agent for learning language in a 2D maze-like world. The agent sees images of the surrounding environment, listens to a virtual teacher, and takes actions to receive rewards. It interactively learns the teacher's language from scratch based on two language use cases: sentence-directed navigation and question answering. It learns simultaneously the visual representations of the world, the language, and the action control. 
By disentangling language grounding from other computational routines and sharing a concept detection function between language grounding and prediction, the agent reliably interpolates and extrapolates to interpret sentences that contain new word combinations or new words missing from training sentences. The new words are transferred from the answers of language prediction. Such a language ability is trained and evaluated on a population of over 1.6 million distinct sentences consisting of 119 object words, 8 color words, 9 spatial-relation words, and 50 grammatical words. The proposed model significantly outperforms five comparison methods for interpreting zero-shot sentences. In addition, we demonstrate human-interpretable intermediate outputs of the model in the appendix.\nThis paper describes a problem arising in sea exploration, where the aim is to schedule the expedition of a ship for collecting information about the resources on the seafloor. The aim is to collect data by probing on a set of carefully chosen locations, so that the information available is optimally enriched. This problem has similarities with the orienteering problem, where the aim is to plan a time-limited trip for visiting a set of vertices, collecting a prize at each of them, in such a way that the total value collected is maximum. In our problem, the score at each vertex is associated with an estimation of the level of the resource on the given surface, which is done by regression using Gaussian processes. Hence, there is a correlation among scores on the selected vertices; this is a first difference with respect to the standard orienteering problem. The second difference is the location of each vertex, which in our problem is a freely chosen point on a given surface.\nReinforcement learning in environments with many action-state pairs is challenging. At issue is the number of episodes needed to thoroughly search the policy space. 
Most conventional heuristics address this search problem in a stochastic manner. This can leave large portions of the policy space unvisited during the early training stages. In this paper, we propose an uncertainty-based, information-theoretic approach for performing guided stochastic searches that more effectively cover the policy space. Our approach is based on the value of information, a criterion that provides the optimal trade-off between expected costs and the granularity of the search process. The value of information yields a stochastic routine for choosing actions during learning that can explore the policy space in a coarse to fine manner. We augment this criterion with a state-transition uncertainty factor, which guides the search process into previously unexplored regions of the policy space.\nCycles of attacking arguments pose non-trivial issues in Dung style argumentation theory, apparent behavioural difference between odd and even length cycles being a notable one. While a few methods were proposed for treating them, to - in particular - enable selection of acceptable arguments in an odd-length cycle when Dung semantics could select none, so far the issues have been observed from a purely argument-graph-theoretic perspective. Per contra, we consider argument graphs together with a certain lattice like semantic structure over arguments e.g. ontology. As we show, the semantic-argumentgraphic hybrid theory allows us to apply abstract interpretation, a widely known methodology in static program analysis, to formal argumentation. With this, even where no arguments in a cycle could be selected sensibly, we could say more about arguments acceptability of an argument framework that contains it. In a certain sense, we can verify Dung extensions with respect to a semantic structure in this hybrid theory, to consolidate our confidence in their suitability. 
By defining the theory, and by making comparisons to existing approaches, we ultimately discover that whether Dung semantics, or an alternative semantics such as cf2, is adequate or problematic depends not just on an argument graph but also on the semantic relation among the arguments in the graph.\nWe focus on learning the desired objective function for a robot. Although trajectory demonstrations can be very informative of the desired objective, they can also be difficult for users to provide. Answers to comparison queries, asking which of two trajectories is preferable, are much easier for users, and have emerged as an effective alternative. Unfortunately, comparisons are far less informative. We propose that there is much richer information that users can easily provide and that robots ought to leverage. We focus on augmenting comparisons with feature queries, and introduce a unified formalism for treating all answers as observations about the true desired reward. We derive an active query selection algorithm, and test these queries in simulation and on real users. We find that richer, feature-augmented queries can extract more information faster, leading to robots that better match user preferences in their behavior.\nDecomposition methods have been proposed in the past to approximate solutions to large sequential decision making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used where each individual entity is considered independently. The individual utility functions are then combined in real time to solve the global problem. Although these techniques can perform well empirically, they sacrifice optimality. This paper proposes an approach inspired from multi-fidelity optimization to learn a correction term with a neural network representation. Learning this correction can significantly improve performance. We demonstrate this approach on a pedestrian avoidance problem for autonomous driving. 
By leveraging strategies to avoid a single pedestrian, the decomposition method can scale to avoid multiple pedestrians. We verify empirically that the proposed correction method leads to a significant improvement over the decomposition method alone and outperforms a policy trained on the full scale problem without utility decomposition.\nAttention-based sequence-to-sequence model has proved successful in Neural Machine Translation (NMT). However, the attention without consideration of decoding history, which includes the past information in the decoder and the attention mechanism, often causes much repetition. To address this problem, we propose the decoding-history-based Adaptive Control of Attention (ACA) for the NMT model. ACA learns to control the attention by keeping track of the decoding history and the current information with a memory vector, so that the model can take the translated contents and the current information into consideration. Experiments on Chinese-English translation and the English-Vietnamese translation have demonstrated that our model significantly outperforms the strong baselines. The analysis shows that our model is capable of generating translation with less repetition and higher accuracy. The code will be available at https://github.com/lancopku\nIn the last years many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, delineating explicitly or implicitly its own definition of interpretability and explanation.
The aim of this paper is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation this survey should help the researcher to find the proposals more useful for his own work. The proposed classification of approaches to open black box models should also be useful for putting the many research open questions in perspective.\nVariational encoder-decoders (VEDs) have shown promising results in dialogue generation. However, the latent variable distributions are usually approximated by a much simpler model than the powerful RNN structure used for encoding and decoding, yielding the KL-vanishing problem and inconsistent training objective. In this paper, we separate the training step into two phases: The first phase learns to autoencode discrete texts into continuous embeddings, from which the second phase learns to generalize latent representations by reconstructing the encoded embedding. In this case, latent variables are sampled by transforming Gaussian noise through multi-layer perceptrons and are trained with a separate VED model, which has the potential of realizing a much more flexible distribution. We compare our model with current popular models and the experiment demonstrates substantial improvement in both metric-based and human evaluations.\nInception and the Resnet family of Convolutional Neural Network architectures have broken records in the past few years, but recent state of the art models have also incurred very high computational cost in terms of training, inference and model size, making the deployment of these models on Edge devices impractical.
In light of this, we present a new novel architecture that is designed for high computational efficiency on both GPUs and CPUs, and is highly suited for deployment on Mobile Applications, Smart Cameras, Iot devices and controllers as well as low cost drones. Our architecture boasts competitive accuracies on standard Datasets even out-performing the original Resnet. We present below the motivation for this research, the architecture of the network, single test accuracies on CIFAR 10 and CIFAR 100 , a detailed comparison with other well-known architectures and link to an implementation in Keras.\nInertial sensors play a pivotal role in indoor localization, which in turn lays the foundation for pervasive personal applications. However, low-cost inertial sensors, as commonly found in smartphones, are plagued by bias and noise, which leads to unbounded growth in error when accelerations are double integrated to obtain displacement. Small errors in state estimation propagate to make odometry virtually unusable in a matter of seconds. We propose to break the cycle of continuous integration, and instead segment inertial data into independent windows. The challenge becomes estimating the latent states of each window, such as velocity and orientation, as these are not directly observable from sensor data. We demonstrate how to formulate this as an optimization problem, and show how deep recurrent neural networks can yield highly accurate trajectories, outperforming state-of-the-art shallow techniques, on a wide range of tests and attachments. In particular, we demonstrate that IONet can generalize to estimate odometry for non-periodic motion, such as a shopping trolley or baby-stroller, an extremely challenging task for existing techniques.\nBayesian optimization has become a standard technique for hyperparameter optimization, including data-intensive models such as deep neural networks that may take days or weeks to train. 
We consider the setting where previous optimization runs are available, and we wish to use their results to warm-start a new optimization run. We develop an ensemble model that can incorporate the results of past optimization runs, while avoiding the poor scaling that comes with putting all results into a single Gaussian process model. The ensemble combines models from past runs according to estimates of their generalization performance on the current optimization. Results from a large collection of hyperparameter optimization benchmark problems and from optimization of a production computer vision platform at Facebook show that the ensemble can substantially reduce the time it takes to obtain near-optimal configurations, and is useful for warm-starting expensive searches or running quick re-optimizations.\nThis research proposes a novel indicator-based hybrid evolutionary approach that combines approximate and exact algorithms. We apply it to a new bi-criteria formulation of the travelling thief problem, which is known to the Evolutionary Computation community as a benchmark multi-component optimisation problem that interconnects two classical NP-hard problems: the travelling salesman problem and the 0-1 knapsack problem. Our approach employs the exact dynamic programming algorithm for the underlying Packing-While-Travelling (PWT) problem as a subroutine within a bi-objective evolutionary algorithm. This design takes advantage of the data extracted from Pareto fronts generated by the dynamic program to achieve better solutions. Furthermore, we develop a number of novel indicators and selection mechanisms to strengthen synergy of the two algorithmic components of our approach. The results of computational experiments show that the approach is capable to outperform the state-of-the-art results for the single-objective case of the problem.\nLearning a Bayesian networks with bounded treewidth is important for reducing the complexity of the inferences. 
We present a novel anytime algorithm (k-MAX) method for this task, which scales up to thousands of variables. Through extensive experiments we show that it consistently yields higher-scoring structures than its competitors on complete data sets. We then consider the problem of structure learning from incomplete data sets. This can be addressed by structural EM, which however is computationally very demanding. We thus adopt the novel k-MAX algorithm in the maximization step of structural EM, obtaining an efficient computation of the expected sufficient statistics. We test the resulting structural EM method on the task of imputing missing data, comparing it against the state-of-the-art approach based on random forests. Our approach achieves the same imputation accuracy of the competitors, but in about one tenth of the time. Furthermore we show that it has worst-case complexity linear in the input size, and that it is easily parallelizable.\nWe train and validate a semi-supervised, multi-task LSTM on 57,675 person-weeks of data from off-the-shelf wearable heart rate sensors, showing high accuracy at detecting multiple medical conditions, including diabetes (0.8451), high cholesterol (0.7441), high blood pressure (0.8086), and sleep apnea (0.8298). We compare two semi-supervised training methods, semi-supervised sequence learning and heuristic pretraining, and show they outperform hand-engineered biomarkers from the medical literature. We believe our work suggests a new approach to patient risk stratification based on cardiovascular risk scores derived from popular wearables such as Fitbit, Apple Watch, or Android Wear.\nWe present PPFNet - Point Pair Feature NETwork for deeply learning a globally informed 3D local feature descriptor to find correspondences in unorganized point clouds. PPFNet learns local descriptors on pure geometry and is highly aware of the global context, an important cue in deep learning.
Our 3D representation is computed as a collection of point-pair-features combined with the points and normals within a local vicinity. Our permutation invariant network design is inspired by PointNet and sets PPFNet to be ordering-free. As opposed to voxelization, our method is able to consume raw point clouds to exploit the full sparsity. PPFNet uses a novel $\\textit{N-tuple}$ loss and architecture injecting the global information naturally into the local descriptor. It shows that context awareness also boosts the local feature representation. Qualitative and quantitative evaluations of our network suggest increased recall, improved robustness and invariance as well as a vital step in the 3D descriptor extraction performance.\nFish in schooling formations navigate complex flow-fields replete with mechanical energy in the vortex wakes of their companions. Their schooling behaviour has been associated with evolutionary advantages including collective energy savings. How fish harvest energy from their complex fluid environment and the underlying physical mechanisms governing energy-extraction during collective swimming, is still unknown. Here we show that fish can improve their sustained propulsive efficiency by actively following, and judiciously intercepting, vortices in the wake of other swimmers. This swimming strategy leads to collective energy-savings and is revealed through the first ever combination of deep reinforcement learning with high-fidelity flow simulations. We find that a `smart-swimmer' can adapt its position and body deformation to synchronise with the momentum of the oncoming vortices, improving its average swimming-efficiency at no cost to the leader. The results show that fish may harvest energy deposited in vortices produced by their peers, and support the conjecture that swimming in formation is energetically advantageous. 
Moreover, this study demonstrates that deep reinforcement learning can produce navigation algorithms for complex flow-fields, with promising implications for energy savings in autonomous robotic swarms.\nWhile the incipient internet was largely text-based, the modern digital world is becoming increasingly multi-modal. Here, we examine multi-modal classification where one modality is discrete, e.g. text, and the other is continuous, e.g. visual representations transferred from a convolutional neural network. In particular, we focus on scenarios where we have to be able to classify large quantities of data quickly. We investigate various methods for performing multi-modal fusion and analyze their trade-offs in terms of classification accuracy and computational efficiency. Our findings indicate that the inclusion of continuous information improves performance over text-only on a range of multi-modal classification tasks, even with simple fusion methods. In addition, we experiment with discretizing the continuous features in order to speed up and simplify the fusion process even further. Our results show that fusion with discretized features outperforms text-only classification, at a fraction of the computational cost of full multi-modal fusion, with the additional benefit of improved interpretability.\nRandom walks are at the heart of many existing network embedding methods. However, such algorithms have many limitations that arise from the use of random walks, e.g., the features resulting from these methods are unable to transfer to new nodes and graphs as they are tied to vertex identity. In this work, we introduce the Role2Vec framework which uses the flexible notion of attributed random walks, and serves as a basis for generalizing existing methods such as DeepWalk, node2vec, and many others that leverage random walks. 
Our proposed framework enables these methods to be more widely applicable for both transductive and inductive learning as well as for use on graphs with attributes (if available). This is achieved by learning functions that generalize to new nodes and graphs. We show that our proposed framework is effective with an average AUC improvement of 16.55% while requiring on average 853x less space than existing methods on a variety of graphs.\nIn the era of Big Data and Internet-of-Things (IoT), all real-world environments are gradually becoming cyber-physical (e.g., emergency management, healthcare, smart manufacturing, etc.), with the presence of connected devices and embedded ICT systems (e.g., smartphones, sensors, actuators) producing huge amounts of data and events that influence the enactment of the Cyber Physical Processes (CPPs) enacted in such environments. A Process Management System (PMS) employed for executing CPPs is required to automatically adapt its running processes to anomalous situations and exogenous events by minimising any human intervention at run-time. In this paper, we tackle this issue by introducing an approach and an adaptive Cognitive PMS that combines process execution monitoring, unanticipated exception detection and automated resolution strategies leveraging on well-established action-based formalisms in Artificial Intelligence, which allow to interpret the ever-changing knowledge of cyber-physical environments and to adapt CPPs by preserving their base structure.\nThe world is connected through the Internet. As the abundance of Internet users connected into the Web and the popularity of cloud computing research, the need of Artificial Intelligence (AI) is demanding. In this research, Genetic Algorithm (GA) as AI optimization method through natural selection and genetic evolution is utilized. There are many applications of GA such as web mining, load balancing, routing, and scheduling or web service selection.
Hence, it is a challenging task to discover whether the code mainly server side and web based language technology affects the performance of GA. Travelling Salesperson Problem (TSP) as Non Polynomial-hard (NP-hard) problem is provided to be a problem domain to be solved by GA. While many scientists prefer Python in GA implementation, another popular high-level interpreter programming language such as PHP (PHP Hypertext Preprocessor) and Ruby were benchmarked. Line of codes, file sizes, and performances based on GA implementation and runtime were found varies among these programming languages. Based on the result, the use of Ruby in GA implementation is recommended.\nWithin the context of video games the notion of perfectly rational agents can be undesirable as it leads to uninteresting situations, where humans face tough adversarial decision makers. Current frameworks for stochastic games and reinforcement learning prohibit tuneable strategies as they seek optimal performance. In this paper, we enable such tuneable behaviour by generalising soft Q-learning to stochastic games, where more than one agent interact strategically. We contribute both theoretically and empirically. On the theory side, we show that games with soft Q-learning exhibit a unique value and generalise team games and zero-sum games far beyond these two extremes to cover a continuous spectrum of gaming behaviour. Experimentally, we show how tuning agents' constraints affect performance and demonstrate, through a neural network architecture, how to reliably balance games with high-dimensional representations.\nRobust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters have strong uncertainty. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. 
We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty. We utilize ROPI to learn robust options with the Robust Options Deep Q Network (RO-DQN) that solves multiple tasks and mitigates model misspecification due to model uncertainty. We present experimental results which suggest that policy iteration with linear features may have an inherent form of robustness when using coarse feature representations. In addition, we present experimental results which demonstrate that robustness helps policy iteration implemented on top of deep neural networks to generalize over a much broader range of dynamics than non-robust policy iteration.\nInference in continuous label Markov random fields is a challenging task. We use particle belief propagation (PBP) for solving the inference problem in continuous label space. Sampling particles from the belief distribution is typically done by using Metropolis-Hastings Markov chain Monte Carlo methods which involves sampling from a proposal distribution. This proposal distribution has to be carefully designed depending on the particular model and input data to achieve fast convergence. We propose to avoid dependence on a proposal distribution by introducing a slice sampling based PBP algorithm. The proposed approach shows superior convergence performance on an image denoising toy example. Our findings are validated on a challenging relational 2D feature tracking application.\nATPboost is a system for solving sets of large-theory problems by interleaving ATP runs with state-of-the-art machine learning of premise selection from the proofs. Unlike many previous approaches that use multi-label setting, the learning is implemented as binary classification that estimates the pairwise-relevance of (theorem, premise) pairs. ATPboost uses for this the XGBoost gradient boosting algorithm, which is fast and has state-of-the-art performance on many tasks. 
Learning in the binary setting however requires negative examples, which is nontrivial due to many alternative proofs. We discuss and implement several solutions in the context of the ATP/ML feedback loop, and show that ATPboost with such methods significantly outperforms the k-nearest neighbors multilabel classifier.\nThe robust and efficient recognition of visual relations in images is a hallmark of biological vision. Here, we argue that, despite recent progress in visual recognition, modern machine vision algorithms are severely limited in their ability to learn visual relations. Through controlled experiments, we demonstrate that visual-relation problems strain convolutional neural networks (CNNs). The networks eventually break altogether when rote memorization becomes impossible such as when the intra-class variability exceeds their capacity. We further show that another type of feedforward network, called a relational network (RN), which was shown to successfully solve seemingly difficult visual question answering (VQA) problems on the CLEVR datasets, suffers similar limitations. Motivated by the comparable success of biological vision, we argue that feedback mechanisms including working memory and attention are the key computational components underlying abstract visual reasoning.\nAn important issue in neural network research is how to choose the number of nodes and layers such as to solve a classification problem. We provide new intuitions based on earlier results by An et al. (2015) by deriving an upper bound on the number of nodes in networks with two hidden layers such that linear separability can be achieved. Concretely, we show that if the data can be described in terms of N finite sets and the used activation function f is non-constant, increasing and has a left asymptote, we can derive how many nodes are needed to linearly separate these sets. This will be an upper bound that depends on the structure of the data. 
This structure can be analyzed using an algorithm. For the leaky rectified linear activation function, we prove separately that under some conditions on the slope, the same number of layers and nodes as for the aforementioned activation functions is sufficient. We empirically validate our claims.\nHigh spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we proposed an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we used a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopted a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieved encouraging classification accuracy using a small number of data for training.\nLearning a deep model from small data is yet an opening and challenging problem. We focus on one-shot classification by deep learning approach based on a small quantity of training samples. We proposed a novel deep learning approach named Local Contrast Learning (LCL) based on the key insight about a human cognitive behavior that human recognizes the objects in a specific context by contrasting the objects in the context or in her/his memory. 
LCL is used to train a deep model that can contrast the recognizing sample with a couple of contrastive samples randomly drawn and shuffled. On one-shot classification task on Omniglot, the deep model based LCL with 122 layers and 1.94 millions of parameters, which was trained on a tiny dataset with only 60 classes and 20 samples per class, achieved the accuracy 97.99% that outperforms human and state-of-the-art established by Bayesian Program Learning (BPL) trained on 964 classes. LCL is a fundamental idea which can be applied to alleviate parametric model's overfitting resulted by lack of training samples.\nThe given work describes methodological principles of design instrumental complex of ontological purpose. Instrumental complex intends for the implementation of the integrated information technologies automated build of domain ontologies. Results focus on enhancing the effectiveness of the automatic analysis and understanding of natural-language texts, building of knowledge description of subject areas (primarily in the area of science and technology) and for interdisciplinary research in conjunction with the solution of complex problems.\nGraph representations of large knowledge bases may comprise billions of edges. Usually built upon human-generated ontologies, several knowledge bases do not feature declared ontological rules and are far from being complete. Current rule mining approaches rely on schemata or store the graph in-memory, which can be unfeasible for large graphs. In this paper, we introduce HornConcerto, an algorithm to discover Horn clauses in large graphs without the need of a schema. Using a standard fact-based confidence score, we can mine close Horn rules having an arbitrary body size. We show that our method can outperform existing approaches in terms of runtime and memory consumption and mine high-quality rules for the link prediction task, achieving state-of-the-art results on a widely-used benchmark. 
Moreover, we find that rules alone can perform inference significantly faster than embedding-based methods and achieve accuracies on link prediction comparable to resource-demanding approaches such as Markov Logic Networks.\nGraph planning gives rise to fundamental algorithmic questions such as shortest path, traveling salesman problem, etc. A classical problem in discrete planning is to consider a weighted graph and construct a path that maximizes the sum of weights for a given time horizon $T$. However, in many scenarios, the time horizon is not fixed, but the stopping time is chosen according to some distribution such that the expected stopping time is $T$. If the stopping time distribution is not known, then to ensure robustness, the distribution is chosen by an adversary, to represent the worst-case scenario.   A stationary plan for every vertex always chooses the same outgoing edge. For fixed horizon or fixed stopping-time distribution, stationary plans are not sufficient for optimality. Quite surprisingly we show that when an adversary chooses the stopping-time distribution with expected stopping time $T$, then stationary plans are sufficient. While computing optimal stationary plans for fixed horizon is NP-complete, we show that computing optimal stationary plans under adversarial stopping-time distribution can be achieved in polynomial time. Consequently, our polynomial-time algorithm for adversarial stopping time also computes an optimal plan among all possible plans.\nThe famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g, $n$-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has to our knowledge not been carefully analyzed yet. 
In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.\nWe present NeuroSAT, a message passing neural network that learns to solve SAT problems after only being trained as a classifier to predict satisfiability. Although it is not competitive with state-of-the-art SAT solvers, NeuroSAT can solve problems that are substantially larger and more difficult than it ever saw during training by simply running for more iterations. Moreover, NeuroSAT generalizes to novel distributions; after training only on random SAT problems, at test time it can solve SAT problems encoding graph coloring, clique detection, dominating set, and vertex cover problems, all on a range of distributions over small random graphs.\nOntology learning (OL) is the process of automatically generating an ontological knowledge base from a plain text document. In this paper, we propose a new ontology learning approach and tool, called DLOL, which generates a knowledge base in the description logic (DL) SHOQ(D) from a collection of factual non-negative IS-A sentences in English. We provide extensive experimental results on the accuracy of DLOL, giving experimental comparisons to three state-of-the-art existing OL tools, namely Text2Onto, FRED, and LExO. Here, we use the standard OL accuracy measure, called lexical accuracy, and a novel OL accuracy measure, called instance-based inference model. 
In our experimental results, DLOL turns out to be about 21% and 46%, respectively, better than the best of the other three approaches.\nAI applications pose increasing demands on performance, so it is not surprising that the era of client-side distributed software is becoming important. On top of many AI applications already using mobile hardware, and even browsers for computationally demanding AI applications, we are already witnessing the emergence of client-side (federated) machine learning algorithms, driven by the interests of large corporations and startups alike. Apart from mathematical and algorithmic concerns, this trend especially demands new levels of computational efficiency from client environments. Consequently, this paper deals with the question of state-of-the-art performance by presenting a comparison study between native code and different browser-based implementations: JavaScript, ASM.js as well as WebAssembly on a representative mix of algorithms. Our results show that current efforts in runtime optimization push the boundaries well towards (and even beyond) native binary performance. We analyze the results obtained and speculate on the reasons behind some surprises, rounding the paper off by outlining future possibilities as well as some of our own research efforts.\nWe propose a connectionist-inspired kernel machine model with three key advantages over traditional kernel machines. First, it is capable of learning distributed and hierarchical representations. Second, its performance is highly robust to the choice of kernel function. Third, the solution space is not limited to the span of images of training data in reproducing kernel Hilbert space (RKHS). Together with the architecture, we propose a greedy learning algorithm that allows the proposed multilayer network to be trained layer-wise without backpropagation by optimizing the geometric properties of images in RKHS. 
With a single fixed generic kernel for each layer and two layers in total, our model compares favorably with state-of-the-art multiple kernel learning algorithms using significantly more kernels and popular deep architectures on widely used classification benchmarks.\nWe study the problem of explaining a rich class of behavioral properties of deep neural networks. Distinctively, our influence-directed explanations approach this problem by peering inside the network to identify neurons with high influence on the property and distribution of interest using an axiomatically justified influence measure, and then providing an interpretation for the concepts these neurons represent. We evaluate our approach by training convolutional neural networks on MNIST, ImageNet, Pubfig, and Diabetic Retinopathy datasets. Our evaluation demonstrates that influence-directed explanations (1) identify influential concepts that generalize across instances, (2) help extract the essence of what the network learned about a class, (3) isolate individual features the network uses to make decisions and distinguish related instances, and (4) assist in understanding misclassifications.\nIn general, neural networks are not currently capable of learning tasks in a sequential fashion. When a novel, unrelated task is learnt by a neural network, it substantially forgets how to solve previously learnt tasks. One of the original solutions to this problem is pseudo-rehearsal, which involves learning the new task while rehearsing generated items representative of the previous task/s. This is very effective for simple tasks. However, pseudo-rehearsal has not yet been successfully applied to very complex tasks because in these tasks it is difficult to generate representative items. We accomplish pseudo-rehearsal by using a Generative Adversarial Network to generate items so that our deep network can learn to sequentially classify the CIFAR-10, SVHN and MNIST datasets. 
After training on all tasks, our network loses only 1.67% absolute accuracy on CIFAR-10 and gains 0.24% absolute accuracy on SVHN. Our model's performance is a substantial improvement compared to the current state of the art solution.\nFaced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Motivated by medical diagnosis, where diseases (targets), cause symptoms (observations), we focus on label shift, where the label marginal $p(y)$ changes but the conditional $p(x|y)$ does not. We propose Black Box Shift Estimation (BBSE) to estimate the test distribution $p(y)$. BBSE exploits arbitrary black box predictors to reduce dimensionality prior to shift correction. While better predictors give tighter estimates, BBSE works even when predictors are biased, inaccurate, or uncalibrated, so long as their confusion matrices are invertible. We prove BBSE's consistency, bound its error, and introduce a statistical test that uses BBSE to detect shift. We also leverage BBSE to correct classifiers. Experiments demonstrate accurate estimates and improved prediction, even on high-dimensional datasets of natural images.\nIn this note we describe an application of Wasserstein distance to Reinforcement Learning. The Wasserstein distance in question is between the distribution of mappings of trajectories of a policy into some metric space, and some other fixed distribution (which may, for example, come from another policy). Different policies induce different distributions, so given an underlying metric, the Wasserstein distance quantifies how different policies are. This can be used to learn multiple polices which are different in terms of such Wasserstein distances by using a Wasserstein regulariser. 
Changing the sign of the regularisation parameter, one can learn a policy for which its trajectory mapping distribution is attracted to a given fixed distribution.\nWatchlist (also hint list) is a mechanism that allows related proofs to guide a proof search for a new conjecture. This mechanism has been used with the Otter and Prover9 theorem provers, both for interactive formalizations and for human-assisted proving of open conjectures in small theories. In this work we explore the use of watchlists in large theories coming from first-order translations of large ITP libraries, aiming at improving hammer-style automation by smarter internal guidance of the ATP systems. In particular, we (i) design watchlist-based clause evaluation heuristics inside the E ATP system, and (ii) develop new proof guiding algorithms that load many previous proofs inside the ATP and focus the proof search using a dynamically updated notion of proof matching. The methods are evaluated on a large set of problems coming from the Mizar library, showing significant improvement of E's standard portfolio of strategies, and also of the previous best set of strategies invented for Mizar by evolutionary methods.\nIn this paper, we deal with a calculus system SLCD (Syllogistic Logic with Carroll Diagrams), which gives a formal approach to logical reasoning with diagrams, for representations of the fundamental Aristotelian categorical propositions and show that they are closed under the syllogistic criterion of inference which is the deletion of middle term. Therefore, it is implemented to let the formalism comprise synchronically bilateral and trilateral diagrammatical appearance and a naive algorithmic nature. And also, there is no need specific knowledge or exclusive ability to understand as well as to use it. 
Consequently, we give an effective algorithm used to determine whether a syllogistic reasoning valid or not by using SLCD.\nRepresentation learning algorithms are designed to learn abstract features that characterize data. State representation learning (SRL) focuses on a particular kind of representation learning where learned features are in low dimension, evolve through time, and are influenced by actions of an agent. As the representation learned captures the variation in the environment generated by agents, this kind of representation is particularly suitable for robotics and control scenarios. In particular, the low dimension helps to overcome the curse of dimensionality, provides easier interpretation and utilization by humans and can help improve performance and speed in policy learning algorithms such as reinforcement learning.   This survey aims at covering the state-of-the-art on state representation learning in the most recent years. It reviews different SRL methods that involve interaction with the environment, their implementations and their applications in robotics control tasks (simulated or real). In particular, it highlights how generic learning objectives are differently exploited in the reviewed algorithms. Finally, it discusses evaluation methods to assess the representation learned and summarizes current and future lines of research.\nNoisy observations coupled with nonlinear dynamics pose one of the biggest challenges in robot motion planning. By decomposing the nonlinear dynamics into a discrete set of local dynamics models, hybrid dynamics provide a natural way to model nonlinear dynamics, especially in systems with sudden \"jumps\" in the dynamics, due to factors such as contacts. We propose a hierarchical POMDP planner that develops locally optimal motion plans for hybrid dynamics models. The hierarchical planner first develops a high-level motion plan to sequence the local dynamics models to be visited. 
The high-level plan is then converted into a detailed cost-optimized continuous state plan. This hierarchical planning approach results in a decomposition of the POMDP planning problem into smaller sub-parts that can be solved with significantly lower computational costs. The ability to sequence the visitation of local dynamics models also provides a powerful way to leverage the hybrid dynamics to reduce state uncertainty. We evaluate the proposed planner for two navigation and localization tasks in simulated domains, as well as an assembly task with a real robotic manipulator.\nWe present an end-to-end framework for solving Vehicle Routing Problem (VRP) using deep reinforcement learning. In this approach, we train a single model that finds near-optimal solutions for problem instances sampled from a given distribution, only by observing the reward signals and following feasibility rules. Our model represents a parameterized stochastic policy, and by applying a policy gradient algorithm to optimize its parameters, the trained model produces the solution as a sequence of consecutive actions in real time, without the need to re-train for every new problem instance. Our method is faster in both training and inference than a recent method that solves the Traveling Salesman Problem (TSP), with nearly identical solution quality. On the more general VRP, our approach outperforms classical heuristics on medium-sized instances in both solution quality and computation time (after training). Our proposed framework can be applied to variants of the VRP such as the stochastic VRP, and has the potential to be applied more generally to combinatorial optimization problems.\nIn this work, we propose a simple but effective method to interpret black-box machine learning models globally. That is, we use a compact binary tree, the interpretation tree, to explicitly represent the most important decision rules that are implicitly contained in the black-box machine learning models. 
This tree is learned from the contribution matrix which consists of the contributions of input variables to predicted scores for each single prediction. To generate the interpretation tree, a unified process recursively partitions the input variable space by maximizing the difference in the average contribution of the split variable between the divided spaces. We demonstrate the effectiveness of our method in diagnosing machine learning models on multiple tasks. Also, it is useful for new knowledge discovery as such insights are not easily identifiable when only looking at single predictions. In general, our work makes it easier and more efficient for human beings to understand machine learning models.\nModern reinforcement learning algorithms reach super-human performance in many board and video games, but they are sample inefficient, i.e. they typically require significantly more playing experience than humans to reach an equal performance level. To improve sample efficiency, an agent may build a model of the environment and use planning methods to update its policy. In this article we introduce VaST (Variational State Tabulation), which maps an environment with a high-dimensional state space (e.g. the space of visual inputs) to an abstract tabular environment. Prioritized sweeping with small backups, a highly efficient planning method, can then be used to update state-action values. We show how VaST can rapidly learn to maximize reward in tasks like 3D navigation and efficiently adapt to sudden changes in rewards or transition probabilities.\nMotivated by problems in data clustering, we establish general conditions under which families of nonparametric mixture models are identifiable by introducing a novel framework for clustering overfitted \\emph{parametric} (i.e. misspecified) mixture models. These conditions generalize existing conditions in the literature, and are flexible enough to include for example mixtures of Gaussian mixtures. 
In contrast to the recent literature on estimating nonparametric mixtures, we allow for general nonparametric mixture components, and instead impose regularity assumptions on the underlying mixing measure. As our primary application, we apply these results to partition-based clustering, generalizing the well-known notion of a Bayes optimal partition from classical model-based clustering to nonparametric settings. Furthermore, this framework is constructive in that it yields a practical algorithm for learning identified mixtures, which is illustrated through several examples. The key conceptual device in the analysis is the convex, metric geometry of probability distributions on metric spaces and its connection to optimal transport and the Wasserstein convergence of mixing measures. The result is a flexible framework for nonparametric clustering with formal consistency guarantees.\nTraining large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. signSGD can exploit mismatches between L1 and L2 geometry: when noise and curvature are much sparser than the gradients, signSGD is expected to converge at the same rate or faster than full-precision SGD. Measurements of the L1 versus L2 geometry of real networks support our theoretical claims, and we find that the momentum counterpart of signSGD is able to match the accuracy and convergence speed of Adam on deep Imagenet models. We extend our theory to the distributed setting, where the parameter server uses majority vote to aggregate gradient signs from each worker enabling 1-bit compression of worker-server communication in both directions. 
Using a theorem by Gauss, we prove that the non-convex convergence rate of majority vote matches that of distributed SGD. Thus, there is great promise for sign-based optimisation schemes to achieve both communication efficiency and high accuracy.\nWe use model-free reinforcement learning, extensive simulation, and transfer learning to develop a continuous control algorithm that has good zero-shot performance in a real physical environment. We train a simulated agent to act optimally across a set of similar environments, each with dynamics drawn from a prior distribution. We propose that the agent is able to adjust its actions almost immediately, based on small set of observations. This robust and adaptive behavior is enabled by using a policy gradient algorithm with an Long Short Term Memory (LSTM) function approximation. Finally, we train an agent to navigate a two-dimensional environment with uncertain dynamics and noisy observations. We demonstrate that this agent has good zero-shot performance in a real physical environment. Our preliminary results indicate that the agent is able to infer the environmental dynamics after only a few timesteps, and adjust its actions accordingly.\nSentential decision diagrams (SDDs) introduced by Darwiche in 2011 are a promising representation type used in knowledge compilation. The relative succinctness of representation types is an important subject in this area. The aim of the paper is to identify which kind of Boolean functions can be represented by SDDs of small size with respect to the number of variables the functions are defined on. For this reason the sets of Boolean functions representable by different representation types in polynomial size are investigated and SDDs are compared with representation types from the classical knowledge compilation map of Darwiche and Marquis. Ordered binary decision diagrams (OBDDs) which are a popular data structure for Boolean functions are one of these representation types. 
SDDs are more general than OBDDs by definition but only recently, a Boolean function was presented with polynomial SDD size but exponential OBDD size. This result is strengthened in several ways. The main result is a quasipolynomial simulation of SDDs by equivalent unambiguous nondeterministic OBDDs, a nondeterministic variant where there exists exactly one accepting computation for each satisfying input. As a side effect an open problem about the relative succinctness between SDDs and free binary decision diagrams (FBDDs) which are more general than OBDDs is answered.\nEfficient exploration remains a challenging research problem in reinforcement learning, especially when an environment contains large state spaces, deceptive local optima, or sparse rewards. To tackle this problem, we present a diversity-driven approach for exploration, which can be easily combined with both off- and on-policy reinforcement learning algorithms. We show that by simply adding a distance measure to the loss function, the proposed methodology significantly enhances an agent's exploratory behaviors, and thus preventing the policy from being trapped in local optima. We further propose an adaptive scaling method for stabilizing the learning process. Our experimental results in Atari 2600 show that our method outperforms baseline approaches in several tasks in terms of mean scores and exploration efficiency.\nIn recent years, the importance of deep learning has significantly increased in pattern recognition, computer vision, and artificial intelligence research, as well as in industry. However, despite the existence of multiple deep learning frameworks, there is a lack of comprehensible and easy-to-use high-level tools for the design, training, and testing of deep neural networks (DNNs). In this paper, we introduce Barista, an open-source graphical high-level interface for the Caffe deep learning framework. 
While Caffe is one of the most popular frameworks for training DNNs, editing prototext files in order to specify the net architecture and hyperparameters can become a cumbersome and error-prone task. Instead, Barista offers a fully graphical user interface with a graph-based net topology editor and provides an end-to-end training facility for DNNs, which allows researchers to focus on solving their problems without having to write code, edit text files, or manually parse logged data.\nIn this work, we present a weakly supervised sentence extraction technique for identifying important sentences in scientific papers that are worthy of inclusion in the abstract. We propose a new attention-based deep learning architecture that jointly learns to identify important content, as well as the cue phrases that are indicative of summary-worthy sentences. We propose a new context embedding technique for determining the focus of a given paper using topic models and use it jointly with an LSTM-based sequence encoder to learn attention weights across the sentence words. We use a collection of articles publicly available through the ACL Anthology for our experiments. Our system outperforms several state-of-the-art extractive techniques in terms of several ROUGE metrics. It also generates more coherent summaries and preserves the overall structure of the document.\nDeep reinforcement learning has demonstrated increasing capabilities for continuous control problems, including agents that can move with skill and agility through their environment. An open problem in this setting is that of developing good strategies for integrating or merging policies for multiple skills, where each individual policy is a specialist in a specific skill and its associated state distribution. 
We extend policy distillation methods to the continuous action setting and leverage this technique to combine expert policies, as evaluated in the domain of simulated bipedal locomotion across different classes of terrain. We also introduce an input injection method for augmenting an existing policy network to exploit new input features. Lastly, our method uses transfer learning to assist in the efficient acquisition of new skills. The combination of these methods allows a policy to be incrementally augmented with new skills. We compare our progressive learning and integration via distillation (PLAID) method against three alternative baselines.\nWe propose a meta-learning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take into account the agent's history, it enables fast task learning and eliminates the need for reward shaping at test time. Empirical results show that our evolved policy gradient algorithm achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. Moreover, at test time, our learner optimizes only its learned loss function, and requires no explicit reward signal. In effect, the agent internalizes the reward structure, suggesting a direction toward agents that learn to solve new tasks simply from intrinsic motivation.\nThere is no denying the tremendous leap in the performance of machine learning methods in the past half-decade. Some might even say that specific sub-fields in pattern recognition, such as machine-vision, are as good as solved, reaching human and super-human levels. 
Arguably, a lack of training data and computation power is all that stands between us and solving the remaining ones. In this position paper, we highlight cases in vision that are challenging for machines and even for human observers. This is to show limitations of contemporary models that are hard to ameliorate by following the current trend of increasing training data, network capacity, or computational power. Moreover, we claim that attempting to do so is in principle a suboptimal approach. We provide a taster of such examples in the hope of encouraging and challenging the machine learning community to develop new directions to solve these difficulties.\nIn the quest towards general artificial intelligence (AI), researchers have explored developing loss functions that act as intrinsic motivators in the absence of external rewards. This paper argues that such research has overlooked an important and useful intrinsic motivator: social interaction. We posit that making an AI agent aware of implicit social feedback from humans can allow for faster learning of more generalizable and useful representations, and could potentially impact AI safety. We collect social feedback in the form of facial expression reactions to samples from Sketch RNN, an LSTM-based variational autoencoder (VAE) designed to produce sketch drawings. We use a Latent Constraints GAN (LC-GAN) to learn from the facial feedback of a small group of viewers, and then show in an independent evaluation with 76 users that this model produced sketches that led to significantly more positive facial expressions. Thus, we establish that implicit social feedback can improve the output of a deep learning model.\nHuman behavior understanding is arguably one of the most important mid-level components in artificial intelligence. In order to efficiently make use of data, multi-task learning has been studied in diverse computer vision tasks including human behavior understanding. 
However, multi-task learning relies on task-specific datasets, and constructing such datasets can be cumbersome. It requires huge amounts of data, labeling effort, statistical considerations, etc. In this paper, we leverage existing single-task datasets for human action classification and captioning data for efficient human behavior learning. Since the data in each dataset has its own heterogeneous annotations, traditional multi-task learning is not effective in this scenario. To this end, we propose a novel alternating directional optimization method to efficiently learn from the heterogeneous data. We demonstrate the effectiveness of our model and show performance improvements on both classification and sentence retrieval tasks in comparison to the models trained on each of the single-task datasets.\nThe problem of rating the performance of soccer players is attracting the interest of many companies, websites, and the scientific community, thanks to the availability of massive data capturing all the events generated during a game (e.g., tackles, passes, shots, etc.). Existing approaches fail to fully exploit the richness of the available data and lack proper validation. In this paper, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We validate the framework through an experimental analysis advised by soccer experts, based on a massive dataset of millions of events pertaining to four seasons of the five prominent European leagues. Experiments show that PlayeRank is robust in agreeing with the experts' evaluation of players, significantly improving the state of the art. We also explore an application of PlayeRank --- i.e., searching for players --- by introducing a special form of spatial query on the soccer field. 
This shows its flexibility and efficiency, which makes it worth using in the design of a scalable platform for soccer analytics.\nPatients in the intensive care unit (ICU) require constant and close supervision. To assist clinical staff in this task, hospitals use monitoring systems that trigger audiovisual alarms if their algorithms indicate that a patient's condition may be worsening. However, current monitoring systems are extremely sensitive to movement artefacts and technical errors. As a result, they typically trigger hundreds to thousands of false alarms per patient per day, drowning the important alarms in noise and adding to the exhaustion of clinical staff. In this setting, data is abundantly available, but obtaining trustworthy annotations by experts is laborious and expensive. We frame the problem of false alarm reduction from multivariate time series as a machine-learning task and address it with a novel multitask network architecture that utilises distant supervision through multiple related auxiliary tasks in order to reduce the number of expensive labels required for training. We show that our approach leads to significant improvements over several state-of-the-art baselines on real-world ICU data and provide new insights on the importance of task selection and architectural choices in distantly supervised multitask learning.\nThe prediction of gas production from mature gas wells is challenging, due to their complex end-of-life behavior, and crucial for operational decision making. In this paper, we apply a modified deep LSTM model to predict gas flow rates in mature gas wells, including the uncertainties in input parameters. Additionally, due to changes in the system over time and in order to increase the accuracy and robustness of the prediction, the Ensemble Kalman Filter (EnKF) is used to update the flow rate predictions based on new observations. 
The developed approach was tested on data from two mature gas production wells whose production is highly dynamic and suffers from salt deposition. The results show that flow predictions using the EnKF-updated model lead to better Jeffreys' J-divergences than predictions without the EnKF model updating scheme.\nThis paper presents a framework for generating adventure games from open data. Focusing on the murder mystery type of adventure games, the generator is able to transform open data from Wikipedia articles, OpenStreetMap and images from Wikimedia Commons into WikiMysteries. Every WikiMystery game revolves around the murder of a person with a Wikipedia article and populates the game with suspects who must be arrested by the player if guilty of the murder or absolved if innocent. Starting from only one person as the victim, an extensive generative pipeline finds suspects, their alibis, and paths connecting them from open data, and transforms open data into cities, buildings, non-player characters, locks and keys, and dialog options. The paper describes in detail each generative step, provides a specific playthrough of one WikiMystery where Albert Einstein is murdered, and evaluates the outcomes of games generated for the 100 most influential people of the 20th century.\nDeep-embedding methods aim to discover representations of a domain that make explicit the domain's class structure. Disentangling methods aim to make explicit compositional or factorial structure. We combine these two active but independent lines of research and propose a new paradigm for discovering disentangled representations of class structure; these representations reveal the underlying factors that jointly determine class. We propose and evaluate a novel loss function based on the $F$ statistic, which describes the separation of two or more distributions. 
By ensuring that distinct classes are well separated on a subset of embedding dimensions, we obtain embeddings that are useful for few-shot learning. By not requiring separation on all dimensions, we encourage the discovery of disentangled representations. Our embedding procedure matches or beats state-of-the-art procedures on deep embeddings, as evaluated by performance on recall@$k$ and few-shot learning tasks. To evaluate alternative approaches on disentangling, we formalize two key properties of a disentangled representation: modularity and explicitness. By these criteria, our procedure yields disentangled representations, whereas traditional procedures fail. The goal of our work is to obtain more interpretable, manipulable, and generalizable deep representations of concepts and categories.\nRobust real-world learning should benefit from both demonstrations and interactions with the environment. Current approaches to learning from demonstration and reward perform supervised learning on expert demonstration data and use reinforcement learning to further improve performance based on the reward received from the environment. These tasks have divergent losses which are difficult to jointly optimize and such methods can be very sensitive to noisy demonstrations. We propose a unified reinforcement learning algorithm, Normalized Actor-Critic (NAC), that effectively normalizes the Q-function, reducing the Q-values of actions unseen in the demonstration data. NAC learns an initial policy network from demonstrations and refines the policy in the environment, surpassing the demonstrator's performance. Crucially, both learning from demonstration and interactive refinement use the same objective, unlike prior approaches that combine distinct supervised and reinforcement losses. This makes NAC robust to suboptimal demonstration data since the method is not forced to mimic all of the examples in the dataset. 
We show that our unified reinforcement learning algorithm can learn robustly and outperform existing baselines when evaluated on several realistic driving games.\nDespite the recent successes of deep neural networks in various fields such as image and speech recognition, natural language processing, and reinforcement learning, we still face big challenges in bringing the power of numeric optimization to symbolic reasoning. Researchers have proposed different avenues such as neural machine translation for proof synthesis, vectorization of symbols and expressions for representing symbolic patterns, and coupling of neural back-ends for dimensionality reduction with symbolic front-ends for decision making. However, these initial explorations are still only point solutions, and bear other shortcomings such as a lack of correctness guarantees. In this paper, we present our approach of casting symbolic reasoning as games, and directly harnessing the power of deep reinforcement learning in the style of Alpha(Go) Zero on symbolic problems. Using the Boolean Satisfiability (SAT) problem as a showcase, we demonstrate the feasibility of our method, and the advantages of modularity, efficiency, and correctness guarantees.\nMulti-channel speech enhancement with ad-hoc sensors has been a challenging task. Speech-model-guided beamforming algorithms are able to recover natural-sounding speech, but the speech models tend to be oversimplified or the inference would otherwise be too complicated. On the other hand, deep-learning-based enhancement approaches are able to learn complicated speech distributions and perform efficient inference, but they are unable to deal with a variable number of input channels. Also, deep learning approaches introduce a lot of errors, particularly in the presence of unseen noise types and settings. We therefore propose an enhancement framework called DEEPBEAM, which combines the two complementary classes of algorithms. 
DEEPBEAM introduces a beamforming filter to produce natural-sounding speech, but the filter coefficients are determined with the help of a monaural speech enhancement neural network. Experiments on synthetic and real-world data show that DEEPBEAM is able to produce clean, dry and natural-sounding speech, and is robust against unseen noise.\nScaling Bayesian optimization to high dimensions is a challenging task, as the global optimization of a high-dimensional acquisition function can be expensive and often infeasible. Existing methods depend either on limited active variables or the additive form of the objective function. We propose a new method for high-dimensional Bayesian optimization that uses a dropout strategy to optimize only a subset of variables at each iteration. We derive theoretical bounds for the regret and show how they can inform the derivation of our algorithm. We demonstrate the efficacy of our algorithms for optimization on two benchmark functions and two real-world applications: training cascade classifiers and optimizing alloy composition.\nExisting multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of user interactions. In this paper, we present Mean Field Reinforcement Learning, where the interactions within the population of agents are approximated by those between a single agent and the average effect from the overall population or neighboring agents; the interplay between the two entities is mutually reinforced: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution. 
Experiments on resource allocation, Ising model estimation, and battle game tasks verify the learning effectiveness of our mean field approaches in handling many-agent interactions in the population.\nThe discovery of time series motifs has emerged as one of the most useful primitives in time series data mining. Researchers have shown its utility for exploratory data mining, summarization, visualization, segmentation, classification, clustering, and rule discovery. Although there has been more than a decade of extensive research, there is still no technique to allow the discovery of time series motifs in the presence of missing data, despite the well-documented ubiquity of missing data in scientific, industrial, and medical datasets. In this work, we introduce a technique for motif discovery in the presence of missing data. We formally prove that our method is admissible, producing no false negatives. We also show that our method can piggy-back off the fastest known motif discovery method with a small constant factor time/space overhead. We demonstrate our approach on diverse datasets with varying amounts of missing data.\nDuring sleep and awake rest, the hippocampus replays sequences of place cells that have been activated during prior experiences. These replays have been interpreted as a memory consolidation process, but recent results suggest a possible interpretation in terms of reinforcement learning. The Dyna reinforcement learning algorithms use off-line replays to improve learning. Under a limited replay budget, a prioritized sweeping approach, which requires a model of the transitions to the predecessors, can be used to improve performance. We investigate whether such algorithms can explain the experimentally observed replays. We propose a neural network version of prioritized sweeping Q-learning, for which we developed a growing multiple expert algorithm, able to cope with multiple predecessors. 
The resulting architecture is able to improve the learning of simulated agents confronted with a navigation task. We predict that, in animals, learning the world model should occur during rest periods, and that the corresponding replays should be shuffled.\nIn the modern era, abundant information is easily accessible from various sources; however, only a few of these sources are reliable, as most contain unverified content. We develop a system to validate the truthfulness of a given statement together with underlying evidence. The proposed system provides supporting evidence when the statement is tagged as false. Our work relies on an inference method on a knowledge graph (KG) to identify the truthfulness of statements. In order to extract the evidence of falseness, the proposed algorithm takes into account combined knowledge from the KG and ontologies. The system shows very good results as it provides valid and concise evidence. The quality of the KG plays a role in the performance of the inference method, which directly affects the performance of our evidence-extracting algorithm.\nGiven a target name, which can be a product aspect or entity, identifying its aspect words and opinion words in a given corpus is a fine-grained task in target-based sentiment analysis (TSA). This task is challenging, especially when we have no labeled data and we want to perform it for any given domain. To address it, we propose a general two-stage approach. Stage one extracts/groups the target-related words (called t-words) for a given target. This is relatively easy as we can apply an existing semantics-based learning technique. Stage two separates the aspect and opinion words from the grouped t-words, which is challenging because we often do not have enough word-level aspect and opinion labels. In this work, we formulate this problem in a PU learning setting and incorporate the idea of lifelong learning to solve it. 
Experimental results show the effectiveness of our approach.\nIntegrated task and motion planning has emerged as a challenging problem in sequential decision making, where a robot needs to compute a high-level strategy and low-level motion plans for solving complex tasks. While high-level strategies require decision making over longer time horizons and scales, their feasibility depends on low-level constraints based upon the geometries and continuous dynamics of the environment. The hybrid nature of this problem makes it difficult to scale; most existing approaches focus on deterministic, fully observable scenarios. We present a new approach where the high-level decision problem occurs in a stochastic setting and can be modeled as a Markov decision process. In contrast to prior efforts, we show that complete MDP policies, or contingent behaviors, can be computed effectively in an anytime fashion. Our algorithm continuously improves the quality of the solution and is guaranteed to be probabilistically complete. We evaluate the performance of our approach on a challenging, realistic test problem: autonomous aircraft inspection. Our results show that we can effectively compute consistent task and motion policies for the most likely execution-time outcomes using only a fraction of the computation required to develop the complete task and motion policy.\nIn this paper, we unify causal and non-causal feature selection methods based on the Bayesian network framework. We first show that the objectives of causal and non-causal feature selection methods are the same, namely to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We demonstrate that causal and non-causal feature selection make different assumptions about dependencies among features to find the Markov blanket, and that their algorithms achieve different levels of approximation in finding the Markov blanket. 
In this framework, we are able to analyze the sample and error bounds of causal and non-causal methods. We conducted extensive experiments to show the correctness of our theoretical analysis.\nThis work exploits translation data as a source of semantically relevant learning signal for models of word representation. In particular, we exploit equivalence through translation as a form of distributed context and jointly learn how to embed and align with a deep generative model. Our EmbedAlign model embeds words in their complete observed context and learns by marginalisation of latent lexical alignments. Moreover, it embeds words as posterior probability densities, rather than point estimates, which allows us to compare words in context using a measure of overlap between distributions (e.g., KL divergence). We investigate our model's performance on a range of lexical semantics tasks, achieving competitive results on several standard benchmarks including natural language inference, paraphrasing, and text similarity.\nWe present a technique for estimating the similarity between objects such as movies or foods whose proper representation depends on human perception. Our technique combines a modest number of human similarity assessments to infer a pairwise similarity function between the objects. This similarity function captures some human notion of similarity which may be difficult or impossible to automatically extract, such as which movie from a collection would be a better substitute when the desired one is unavailable. In contrast to prior techniques, our method does not assume that all similarity questions on the collection can be answered or that all users perceive similarity in the same way. When combined with a user model, our method reveals how each assessor's tastes vary, affecting their perception of similarity.\nRecently, interest in reinforcement learning for game playing has been renewed. This is evidenced by the groundbreaking results achieved by AlphaGo. 
General Game Playing (GGP) provides a good testbed for reinforcement learning, currently one of the hottest fields of AI. In GGP, a specification of the game rules is given. The description specifies a reinforcement learning problem, leaving programs to find strategies for playing well. Q-learning is one of the canonical reinforcement learning methods and has been used as a baseline in some previous work (Banerjee & Stone, IJCAI 2007). We implement Q-learning in GGP for three small board games (Tic-Tac-Toe, Connect-Four, Hex). We find that Q-learning converges, and thus that this general reinforcement learning method is indeed applicable to General Game Playing. However, convergence is slow in comparison to MCTS (a reinforcement learning method reported to achieve good results). We enhance Q-learning with Monte Carlo Search. This enhancement improves the performance of pure Q-learning, although it does not yet outperform MCTS. Future work is needed on the relation between MCTS and Q-learning, and on larger problem instances.\nRecent developments in the field of robot grasping have shown great improvements in the grasp success rates when dealing with unknown objects. In this work, we improve on one of the most promising approaches, the Grasp Quality Convolutional Neural Network (GQ-CNN) trained on the DexNet 2.0 dataset. We propose a new architecture for the GQ-CNN and describe practical improvements that increase the model validation accuracy from 92.2% to 95.8% and from 85.9% to 88.0% on image-wise and object-wise training and validation splits, respectively.\nThe field of learning analytics needs to adopt a more rigorous approach for predictive model evaluation that matches the complex practice of model-building. In this work, we present a procedure to statistically test hypotheses about model performance which goes beyond the state-of-the-practice in the community to analyze both algorithms and feature extraction methods from raw data. 
We apply this method to a series of algorithms and feature sets derived from a large sample of Massive Open Online Courses (MOOCs). While a complete comparison of all potential modeling approaches is beyond the scope of this paper, we show that this approach reveals a large gap in dropout prediction performance between forum-, assignment-, and clickstream-based feature extraction methods, where the latter is significantly better than the former two, which are in turn indistinguishable from one another. This work has methodological implications for evaluating predictive or AI-based models of student success, and practical implications for the design and targeting of at-risk student models and interventions.\nAlthough chatbots have been very popular in recent years, they still have some serious weaknesses which limit the scope of their applications. One major weakness is that they cannot learn new knowledge during the conversation process, i.e., their knowledge is fixed beforehand and cannot be expanded or updated during conversation. In this paper, we propose to build a general knowledge learning engine for chatbots to enable them to continuously and interactively learn new knowledge during conversations. As time goes by, they become more and more knowledgeable and better and better at learning and conversation. We model the task as an open-world knowledge base completion problem and propose a novel technique called lifelong interactive learning and inference (LiLi) to solve it. LiLi works by imitating how humans acquire knowledge and perform inference during an interactive conversation. Our experimental results show LiLi is highly promising.\nIn this paper, we consider an online optimization process, where the objective functions are not convex (nor concave) but instead belong to a broad class of continuous submodular functions. We first propose a variant of the Frank-Wolfe algorithm that has access to the full gradient of the objective functions. 
We show that it achieves a regret bound of $O(\\sqrt{T})$ (where $T$ is the horizon of the online optimization problem) against a $(1-1/e)$-approximation to the best feasible solution in hindsight. However, in many scenarios, only an unbiased estimate of the gradients is available. For such settings, we then propose an online stochastic gradient ascent algorithm that also achieves a regret bound of $O(\\sqrt{T})$, albeit against a weaker $1/2$-approximation to the best feasible solution in hindsight. We also generalize our results to $\\gamma$-weakly submodular functions and prove the same sublinear regret bounds. Finally, we demonstrate the efficiency of our algorithms on a few problem instances, including non-convex/non-concave quadratic programs, multilinear extensions of submodular set functions, and D-optimal design.\nUsers of AI systems may rely upon them to produce plans for achieving desired objectives. Such AI systems should be able to compute obfuscated plans whose execution in adversarial situations protects privacy, as well as legible plans that are easy for team members to understand in collaborative situations. We develop a unified framework that addresses these dual problems by computing plans with a desired level of comprehensibility from the point of view of a partially informed observer. Our approach produces obfuscated plans with observations that are consistent with at least 'k' goals from a given set of decoy goals. In addition, when the goal is known to the observer, our approach generates obfuscated plans with observations that are diverse with at least 'l' candidate plans. Our approach for plan legibility produces plans that achieve a goal while being consistent with at most 'j' goals in a given set of confounding goals. 
We provide an empirical evaluation to show the feasibility and usefulness of our approaches.\nPlanning under uncertainty is critical for robust robot performance in uncertain, dynamic environments, but it incurs a high computational cost. State-of-the-art online search algorithms, such as DESPOT, have vastly improved the computational efficiency of planning under uncertainty and made it a valuable tool for robotics in practice. This work takes one step further by leveraging both CPU and GPU parallelization in order to achieve near real-time online planning performance for complex tasks with large state, action, and observation spaces. Specifically, we propose Hybrid Parallel DESPOT (HyP-DESPOT), a massively parallel online planning algorithm that integrates CPU and GPU parallelism in a multi-level scheme. It performs parallel DESPOT tree search by simultaneously traversing multiple independent paths using multi-core CPUs and performs parallel Monte-Carlo simulations at the leaf nodes of the search tree using GPUs. Experimental results show that HyP-DESPOT speeds up online planning by up to several hundred times, compared with the original DESPOT algorithm, in several challenging robotic tasks in simulation.\nEffective optimization is essential for interactive systems to provide a satisfactory user experience. However, it is often challenging to find an objective to optimize for. Generally, such objectives are manually crafted and rarely capture complex user needs accurately. Instead, we propose an approach that infers the objective directly from observed user interactions. These inferences can be made regardless of prior knowledge and across different types of user behavior. We then introduce Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization. Our main contribution is a new, general, principled approach to optimizing interactive systems using data-driven objectives. 
We demonstrate the high effectiveness of ISO over several GridWorld simulations.\nA common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. While in such cases maximum likelihood estimation may be implemented using numerical integration, the approach becomes computationally intensive. In contrast, the score matching method of Hyv\\\"arinen (2005) avoids direct calculation of the normalizing constant and yields closed-form estimates for exponential families of continuous distributions over $\\mathbb{R}^m$. Hyv\\\"arinen (2007) extended the approach to distributions supported on the non-negative orthant $\\mathbb{R}_+^m$. In this paper, we give a generalized form of score matching for non-negative data that improves estimation efficiency. We also generalize the regularized score matching method of Lin et al. (2016) for non-negative Gaussian graphical models, with improved theoretical guarantees.\nIn this paper we consider online mirror descent (OMD) algorithms, a class of scalable online learning algorithms exploiting data geometric structures through mirror maps. Necessary and sufficient conditions are presented in terms of the step size sequence $\\{\\eta_t\\}_{t}$ for the convergence of an OMD algorithm with respect to the expected Bregman distance induced by the mirror map. The condition is $\\lim_{t\\to\\infty}\\eta_t=0, \\sum_{t=1}^{\\infty}\\eta_t=\\infty$ in the case of positive variances. It is reduced to $\\sum_{t=1}^{\\infty}\\eta_t=\\infty$ in the case of zero variances for which the linear convergence may be achieved by taking a constant step size sequence. A sufficient condition on the almost sure convergence is also given. We establish tight error bounds under mild conditions on the mirror map, the loss function, and the regularizer. 
Our results are achieved by some novel analysis on the one-step progress of the OMD algorithm using smoothness and strong convexity of the mirror map and the loss function.\nString kernels are attractive data analysis tools for analyzing string data. Among them, alignment kernels are known for their high prediction accuracies in string classifications when tested in combination with SVMs in various applications. However, alignment kernels have a crucial drawback in that they scale poorly due to their quadratic computation complexity in the number of input strings, which limits large-scale applications in practice. We present the first approximation named ESP+SFM for alignment kernels by leveraging a metric embedding named edit-sensitive parsing (ESP) and space-efficient feature maps (SFM) for random Fourier features (RFF) for large-scale string analyses. Input strings are projected into vectors of RFF by leveraging ESP and SFM. Then, SVMs are trained on the projected vectors, which significantly improves the scalability of alignment kernels while preserving their prediction accuracies. We experimentally test ESP+SFM on its ability to learn SVMs for large-scale string classifications with various massive string data, and we demonstrate the superior performance of ESP+SFM with respect to prediction accuracy, scalability and computation efficiency.\nNatural learners must compute an estimate of future outcomes that follow from a stimulus in continuous time. Critically, the learner cannot in general know a priori the relevant time scale over which meaningful relationships will be observed. Widely used reinforcement learning algorithms discretize continuous time and use the Bellman equation to estimate exponentially-discounted future reward. However, exponential discounting introduces a time scale to the computation of value. 
Scaling is a serious problem in continuous time: efficient learning with scaled algorithms requires prior knowledge of the relevant scale. That is, with scaled algorithms one must know at least part of the solution to a problem prior to attempting a solution. We present a computational mechanism, developed based on work in psychology and neuroscience, for computing a scale-invariant timeline of future events. This mechanism efficiently computes a model for future time on a logarithmically-compressed scale, and can be used to generate a scale-invariant power-law-discounted estimate of expected future reward. Moreover, the representation of future time retains information about what will happen when, enabling flexible decision making based on future events. The entire timeline can be constructed in a single parallel operation.\nLarge-scale online ride-sharing platforms have substantially transformed our lives by reallocating transportation resources to alleviate traffic congestion and promote transportation efficiency. An efficient fleet management strategy not only can significantly improve the utilization of transportation resources but also increase the revenue and customer satisfaction. It is a challenging task to design an effective fleet management strategy that can adapt to an environment involving complex dynamics between demand and supply. Existing studies usually work on a simplified problem setting that can hardly capture the complicated stochastic demand-supply variations in high-dimensional space. In this paper we propose to tackle the large-scale fleet management problem using reinforcement learning, and propose a contextual multi-agent reinforcement learning framework including two concrete algorithms, namely contextual deep Q-learning and contextual multi-agent actor-critic, to achieve explicit coordination among a large number of agents adaptive to different contexts. 
We show significant improvements of the proposed framework over state-of-the-art approaches through extensive empirical studies.\nNeural networks are very powerful learning systems, but they do not readily generalize from one task to the other. This is partly due to the fact that they do not learn in a compositional way, that is, by discovering skills that are shared by different tasks, and recombining them to solve new problems. In this paper, we explore the compositional generalization capabilities of recurrent neural networks (RNNs). We first propose the lookup table composition domain as a simple setup to test compositional behaviour and show that it is theoretically possible for a standard RNN to learn to behave compositionally in this domain when trained with standard gradient descent and provided with additional supervision. We then remove this additional supervision and perform a search over a large number of model initializations to investigate the proportion of RNNs that can still converge to a compositional solution. We discover that a small but non-negligible proportion of RNNs do reach partial compositional solutions even without special architectural constraints. This suggests that a combination of gradient descent and evolutionary strategies directly favouring the minority models that developed more compositional approaches might suffice to lead standard RNNs towards compositional solutions.\nConstrained Markov Decision Process (CMDP) is a natural framework for reinforcement learning tasks with safety constraints, where agents learn a policy that maximizes the long-term reward while satisfying the constraints on the long-term cost. A canonical approach for solving CMDPs is the primal-dual method which updates parameters in primal and dual spaces in turn. Existing methods for CMDPs only use on-policy data for dual updates, which results in sample inefficiency and slow convergence. 
In this paper, we propose a policy search method for CMDPs called Accelerated Primal-Dual Optimization (APDO), which incorporates an off-policy trained dual variable in the dual update procedure while updating the policy in primal space with on-policy likelihood ratio gradient. Experimental results on a simulated robot locomotion task show that APDO achieves better sample efficiency and faster convergence than state-of-the-art approaches for CMDPs.\nWe provide a new computationally-efficient class of estimators for risk minimization. We show that these estimators are robust for general statistical models: in the classical Huber epsilon-contamination model and in heavy-tailed settings. Our workhorse is a novel robust variant of gradient descent, and we provide conditions under which our gradient descent variant provides accurate estimators in a general convex risk minimization problem. We provide specific consequences of our theory for linear regression, logistic regression and for estimation of the canonical parameters in an exponential family. These results provide some of the first computationally tractable and provably robust estimators for these canonical statistical models. Finally, we study the empirical performance of our proposed methods on synthetic and real datasets, and find that our methods convincingly outperform a variety of baselines.\nHierarchical learning (HL) is key to solving complex sequential decision problems with long horizons and sparse rewards. It allows learning agents to break up large problems into smaller, more manageable subtasks. A common approach to HL is to provide the agent with a number of high-level skills that solve small parts of the overall problem. A major open question, however, is how to identify a suitable set of reusable skills. We propose a principled approach that uses human demonstrations to infer a set of subgoals based on changes in the demonstration dynamics. 
Using these subgoals, we decompose the learning problem into an abstract high-level representation and a set of low-level subtasks. The abstract description captures the overall problem structure, while subtasks capture desired skills. We demonstrate that we can jointly optimize over both levels of learning. We show that the resulting method significantly outperforms previous baselines on two challenging problems: the Atari 2600 game Montezuma's Revenge, and a simulated robotics problem moving the ant robot through a maze.\nIn this paper, we introduce a novel approach for diagnosis of Parkinson's Disease (PD) based on deep Echo State Networks (ESNs). The identification of PD is performed by analyzing the whole time-series collected from a tablet device during the sketching of spiral tests, without the need for feature extraction and data preprocessing. We evaluated the proposed approach on a public dataset of spiral tests. The results of experimental analysis show that DeepESNs perform significantly better than shallow ESN models. Overall, the proposed approach obtains state-of-the-art results in the identification of PD on this kind of temporal data.\nHierarchical classification is a supervised multi-class classification problem over the set of class labels organized according to a hierarchy. In this report, we study the work by Ramaswamy et al. on hierarchical classification over symmetric tree distance loss. We extend the consistency of the hierarchical classification algorithm to asymmetric tree distance loss. We design a $\\mathcal{O}(nk\\log{}n)$ algorithm to find Bayes optimal classification for a k-ary tree as a hierarchy. We show that under reasonable assumptions over asymmetric loss function, the Bayes optimal classification over this asymmetric loss can be found in $\\mathcal{O}(k\\log{}n)$. 
We exploit this insight and attempt to extend the Ova-Cascade algorithm \\citet{ramaswamy2015convex} for hierarchical classification over the asymmetric loss.\nDeep neural networks, although shown to be a successful class of machine learning algorithms, are known to be extremely unstable to adversarial perturbations. Improving the robustness of neural networks against these attacks is important, especially for security-critical applications. To defend against such attacks, we propose dividing the input image into multiple patches, denoising each patch independently, and reconstructing the image, without losing significant image content. This proposed defense mechanism is non-differentiable which makes it non-trivial for an adversary to apply gradient-based attacks. Moreover, we do not fine-tune the network with adversarial examples, making it more robust against unknown attacks. We present a thorough analysis of the tradeoff between accuracy and robustness against adversarial attacks. We evaluate our method under black-box, grey-box, and white-box settings. The proposed method outperforms the state-of-the-art by a significant margin on the ImageNet dataset under grey-box attacks while maintaining good accuracy on clean images. We also establish a strong baseline for a novel white-box attack.\nWe consider the problem of joint source and channel coding of structured data such as natural language over a noisy channel. The typical approach to this problem in both theory and practice involves performing source coding to first compress the text and then channel coding to add robustness for the transmission across the channel. This approach is optimal in terms of minimizing end-to-end distortion with arbitrarily large block lengths of both the source and channel codes when transmission is over discrete memoryless channels. However, the optimality of this approach is no longer ensured for documents of finite length and limitations on the length of the encoding. 
We will show in this scenario that we can achieve lower word error rates by developing a deep learning based encoder and decoder. While the approach of separate source and channel coding would minimize bit error rates, our approach preserves semantic information of sentences by first embedding sentences in a semantic space where sentences closer in meaning are located closer together, and then performing joint source and channel coding on these embeddings.\nWe propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients as convolutions and turns them into multiplications. The obtained analytical solutions allow us to capture the low variance benefits of EPG in a broad range of settings. For the critic, we treat trigonometric and radial basis functions, two function families with the universal approximation property. The choice of policy can be almost arbitrary, including mixtures or hybrid continuous-discrete probability distributions. Moreover, we derive a general family of sample-based estimators for stochastic policy gradients, which unifies existing results on sample-based approximation. We believe that this technique has the potential to shape the next generation of policy gradient approaches, powered by analytical results.\nThere is a growing interest within the AI research community to develop autonomous systems capable of explaining their behavior to users. One aspect of the explanation generation problem that has yet to receive much attention is the task of explaining plans to users whose level of expertise differ from that of the explainer. We propose an approach for addressing this problem by representing the user's model as an abstraction of the domain model that the planner uses. We present algorithms for generating minimal explanations in cases where this abstract human model is not known. 
We reduce the problem of generating explanation to a search over the space of abstract models and investigate possible greedy approximations for minimal explanations. We also empirically show that our approach can efficiently compute explanations for a variety of problems.\nIn this paper we construct preimage attack on the truncated variant of the MD4 hash function. Specifically, we study the MD4-39 function defined by the first 39 steps of the MD4 algorithm. We suggest a new attack on MD4-39, which develops the ideas proposed by H. Dobbertin in 1998. Namely, the special relaxation constraints are introduced in order to simplify the equations corresponding to the problem of finding a preimage for an arbitrary MD4-39 hash value. The equations supplemented with the relaxation constraints are then reduced to the Boolean Satisfiability Problem (SAT) and solved using the state-of-the-art SAT solvers. We show that the effectiveness of a set of relaxation constraints can be evaluated using the black-box function of a special kind. Thus, we suggest automatic method of relaxation constraints generation by applying the black-box optimization to this function. The proposed method made it possible to find new relaxation constraints that contribute to a SAT-based preimage attack on MD4-39 which significantly outperforms the competition.\nDetecting novelty of an entire document is an Artificial Intelligence (AI) frontier problem that has widespread NLP applications, such as extractive document summarization, tracking development of news events, predicting impact of scholarly articles, etc. Important though the problem is, we are unaware of any benchmark document level data that correctly addresses the evaluation of automatic novelty detection techniques in a classification framework. To bridge this gap, we present here a resource for benchmarking the techniques for document level novelty detection. 
We create the resource via event-specific crawling of news documents across several domains in a periodic manner. We release the annotated corpus with necessary statistics and show its use with a developed system for the problem in concern.\nThe problem of identifying end-use electrical appliances from their individual consumption profiles, known as the appliance identification problem, is a primary stage in both Non-Intrusive Load Monitoring (NILM) and automated plug-wise metering. Therefore, appliance identification has received dedicated studies with various electric appliance signatures, classification models, and evaluation datasets. In this paper, we propose a neural network ensembles approach to address this problem using high resolution measurements. The models are trained on the raw current and voltage waveforms, and thus, eliminating the need for well engineered appliance signatures. We evaluate the proposed model on a publicly available appliance dataset from 55 residential buildings, 11 appliance categories, and over 1000 measurements. We further study the stability of the trained models with respect to training dataset, sampling frequency, and variations in the steady-state operation of appliances.\nWe study the problem of maximizing a monotone set function subject to a cardinality constraint $k$ in the setting where some number of elements $\\tau$ is deleted from the returned set. The focus of this work is on the worst-case adversarial setting. While there exist constant-factor guarantees when the function is submodular, there are no guarantees for non-submodular objectives. In this work, we present a new algorithm Oblivious-Greedy and prove the first constant-factor approximation guarantees for a wider class of non-submodular objectives. The obtained theoretical bounds are the first constant-factor bounds that also hold in the linear regime, i.e. when the number of deletions $\\tau$ is linear in $k$. 
Our bounds depend on established parameters such as the submodularity ratio and some novel ones such as the inverse curvature. We bound these parameters for two important objectives including support selection and variance reduction. Finally, we numerically demonstrate the robust performance of Oblivious-Greedy for these two objectives on various datasets.\nChatbots, taking advantage of the success of the messaging apps and recent advances in Artificial Intelligence, have become very popular, from helping businesses improve customer services to chatting to users for the sake of conversation and engagement (celebrity or personal bots). However, developing and improving a chatbot requires understanding the data generated by its users. Dialog data differs in nature from a simple question-and-answer interaction, in that context and temporal properties (turn order) create a different understanding of such data. In this paper, we propose a novel metric to compute dialogs' similarity based not only on the text content but also on the information related to the dialog structure. Our experimental results performed over the Switchboard dataset show that using evidence from both textual content and the dialog structure leads to more accurate results than using each measure in isolation.\nExploration is a fundamental challenge in reinforcement learning (RL). Many of the current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical applications of RL involve learning more than a single task, and prior tasks can be used to inform how exploration should be performed in new tasks. In this work, we explore how prior tasks can inform an agent about how to explore effectively in new situations. We introduce a novel gradient-based fast adaptation algorithm -- model agnostic exploration with structured noise (MAESN) -- to learn exploration strategies from prior experience. 
The prior experience is used both to initialize a policy and to acquire a latent exploration space that can inject structured stochasticity into a policy, producing exploration strategies that are informed by prior knowledge and are more effective than random action-space noise. We show that MAESN is more effective at learning exploration strategies when compared to prior meta-RL methods, RL without learned exploration strategies, and task-agnostic exploration methods. We evaluate our method on a variety of simulated tasks: locomotion with a wheeled robot, locomotion with a quadrupedal walker, and object manipulation.\nInfants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. We seek to mathematically formalize these abilities using a neural network that implements curiosity-driven intrinsic motivation. Using a simple but ecologically naturalistic simulated environment in which an agent can move and interact with objects it sees, we propose a \"world-model\" network that learns to predict the dynamic consequences of the agent's actions. Simultaneously, we train a separate explicit \"self-model\" that allows the agent to track the error map of its own world-model, and then uses the self-model to adversarially challenge the developing world-model. We demonstrate that this policy causes the agent to explore novel and informative interactions with its environment, leading to the generation of a spectrum of complex behaviors, including ego-motion prediction, object attention, and object gathering. Moreover, the world-model that the agent learns supports improved performance on object dynamics prediction, detection, localization and recognition tasks. 
Taken together, our results are initial steps toward creating flexible autonomous agents that self-supervise in complex novel physical environments.\nInfants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. We seek to replicate some of these abilities with a neural network that implements curiosity-driven intrinsic motivation. Using a simple but ecologically naturalistic simulated environment in which the agent can move and interact with objects it sees, the agent learns a world model predicting the dynamic consequences of its actions. Simultaneously, the agent learns to take actions that adversarially challenge the developing world model, pushing the agent to explore novel and informative interactions with its environment. We demonstrate that this policy leads to the self-supervised emergence of a spectrum of complex behaviors, including ego motion prediction, object attention, and object gathering. Moreover, the world model that the agent learns supports improved performance on object dynamics prediction and localization tasks. Our results are a proof-of-principle that computational models of intrinsic motivation might account for key features of developmental visuomotor learning in infants.\nThis paper introduces epistemic graphs as a generalization of the epistemic approach to probabilistic argumentation. In these graphs, an argument can be believed or disbelieved up to a given degree, thus providing a more fine--grained alternative to the standard Dung's approaches when it comes to determining the status of a given argument. Furthermore, the flexibility of the epistemic approach allows us to both model the rationale behind the existing semantics as well as completely deviate from them when required. Epistemic graphs can model both attack and support as well as relations that are neither support nor attack. 
The way other arguments influence a given argument is expressed by the epistemic constraints that can restrict the belief we have in an argument with a varying degree of specificity. The fact that we can specify the rules under which arguments should be evaluated and we can include constraints between unrelated arguments permits the framework to be more context--sensitive. It also allows for better modelling of imperfect agents, which can be important in multi--agent applications.\nIn this paper we propose a novel method that provides contrastive explanations justifying the classification of an input by a black box classifier such as a deep neural network. Given an input we find what should be minimally and sufficiently present (viz. important object pixels in an image) to justify its classification and analogously what should be minimally and necessarily \\emph{absent} (viz. certain background pixels). We argue that such explanations are natural for humans and are used commonly in domains such as health care and criminology. What is minimally but critically \\emph{absent} is an important part of an explanation, which to the best of our knowledge, has not been touched upon by current explanation methods that attempt to explain predictions of neural networks. We validate our approach on three real datasets obtained from diverse domains; namely, a handwritten digits dataset MNIST, a large procurement fraud dataset and an fMRI brain imaging dataset. In all three cases, we witness the power of our approach in generating precise explanations that are also easy for human experts to understand and evaluate.\nDespite a growing body of research focused on creating interpretable machine learning methods, there have been few empirical studies verifying whether interpretable methods achieve their intended effects on end users. 
We present a framework for assessing the effects of model interpretability on users via pre-registered experiments in which participants are shown functionally identical models that vary in factors thought to influence interpretability. Using this framework, we ran a sequence of large-scale randomized experiments, varying two putative drivers of interpretability: the number of features and the model transparency (clear or black-box). We measured how these factors impact trust in model predictions, the ability to simulate a model, and the ability to detect a model's mistakes. We found that participants who were shown a clear model with a small number of features were better able to simulate the model's predictions. However, we found no difference in multiple measures of trust and found that clear models did not improve the ability to correct mistakes. These findings suggest that interpretability research could benefit from more emphasis on empirically verifying that interpretable models achieve all their intended effects.\nWe present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning where the action representation adds to the-curse-of-dimensionality; that is, with continuous or large action sets, thus making it infeasible to estimate state-action value functions (Q functions). Using state-value functions helps to lift the curse and as a result naturally turn our policy-gradient solution into classical Actor-Critic architecture whose Actor uses state-value function for the update. 
Our algorithms, Gradient Actor-Critic and Emphatic Actor-Critic, are derived based on the exact gradient of the averaged state-value function objective and thus are guaranteed to converge to its optimal solution, while maintaining all the desirable properties of classical Actor-Critic methods with no additional hyper-parameters. To our knowledge, this is the first time that convergent off-policy learning methods have been extended to classical Actor-Critic methods with function approximation.\nIn ensemble methods, the outputs of a collection of diverse classifiers are combined in the expectation that the global prediction be more accurate than the individual ones. Heterogeneous ensembles consist of predictors of different types, which are likely to have different biases. If these biases are complementary, the combination of their decisions is beneficial. In this work, a family of heterogeneous ensembles is built by pooling classifiers from M homogeneous ensembles of different types of size T. Depending on the fraction of base classifiers of each type, a particular heterogeneous combination in this family is represented by a point in a regular simplex in M dimensions. The M vertices of this simplex represent the different homogeneous ensembles. A displacement away from one of these vertices effects a smooth transformation of the corresponding homogeneous ensemble into a heterogeneous one. The optimal composition of such a heterogeneous ensemble can be determined using cross-validation or, if bootstrap samples are used to build the individual classifiers, out-of-bag data. An empirical analysis of such combinations of bootstrapped ensembles composed of neural networks, SVMs, and random trees (i.e. 
from a standard random forest) illustrates the gains that can be achieved by this heterogeneous ensemble creation method.\nThis paper proposes a class of well-conditioned neural networks in which a unit amount of change in the inputs causes at most a unit amount of change in the outputs or any of the internal layers. We develop the known methodology of controlling Lipschitz constants to realize its full potential in maximizing robustness: our linear and convolution layers subsume those in the previous Parseval networks as a special case and allow greater degrees of freedom; aggregation, pooling, splitting and other operators are adapted in new ways, and a new loss function is proposed, all for the purpose of improving robustness. With MNIST and CIFAR-10 classifiers, we demonstrate a number of advantages. Without needing any adversarial training, the proposed classifiers exceed the state of the art in robustness against white-box L2-bounded adversarial attacks. Their outputs are quantitatively more meaningful than ordinary networks and indicate levels of confidence. They are also free of exploding gradients, among other desirable properties.\nWe study the robustness of classifiers to various kinds of random noise models. In particular, we consider noise drawn uniformly from the $\\ell\\_p$ ball for $p \\in [1, \\infty]$ and Gaussian noise with an arbitrary covariance matrix. We characterize this robustness to random noise in terms of the distance to the decision boundary of the classifier. This analysis applies to linear classifiers as well as classifiers with locally approximately flat decision boundaries, a condition which is satisfied by state-of-the-art deep neural networks. The predicted robustness is verified experimentally.\nWe address the task of generating query suggestions for task-based search. The current state of the art relies heavily on suggestions provided by a major search engine. In this paper, we solve the task without reliance on search engines. 
Specifically, we focus on the first step of a two-stage pipeline approach, which is dedicated to the generation of query suggestion candidates. We present three methods for generating candidate suggestions and apply them to multiple information sources. Using a purpose-built test collection, we find that these methods are able to generate high-quality suggestion candidates.\nEntity-oriented search deals with a wide variety of information needs, from displaying direct answers to interacting with services. In this work, we aim to understand what the prominent entity-oriented search intents are and how they can be fulfilled. We develop a scheme of entity intent categories, and use them to annotate a sample of queries. Specifically, we annotate unique query refiners on the level of entity types. We observe that, on average, over half of those refiners seek to interact with a service, while over a quarter of the refiners search for information that may be looked up in a knowledge base.\nAutonomous robots need to interact with unknown, unstructured and changing environments, constantly facing novel challenges. Therefore, continuous online adaptation for lifelong learning and the need for sample-efficient mechanisms to adapt to changes in the environment, the constraints, the tasks, or the robot itself are crucial. In this work, we propose a novel framework for probabilistic online motion planning with online adaptation based on a bio-inspired stochastic recurrent neural network. By using learning signals which mimic the intrinsic motivation signal cognitive dissonance in addition to a mental replay strategy to intensify experiences, the stochastic recurrent network can learn from few physical interactions and adapt to novel environments in seconds. We evaluate our online planning and adaptation framework on an anthropomorphic KUKA LWR arm. 
The rapid online adaptation is shown by learning unknown workspace constraints sample-efficiently from a few physical interactions while following given via points.\nAlgorithmic collusion is an emerging concept in the current artificial intelligence age. Whether algorithmic collusion is a credible threat remains debated. In this paper, we propose an algorithm which can extort its human rival into colluding in a Cournot duopoly market. In experiments, we show that the algorithm can successfully extort its human rival and earn higher profit in the long run, while the human rival fully colludes with the algorithm. As a result, social welfare declines rapidly and stably. Both in theory and in experiment, our work confirms that algorithmic collusion can be a credible threat. In application, we hope that the frameworks, the algorithm design, and the experiment environment illustrated in this work can serve as an incubator or a test bed for researchers and policymakers to handle the emerging algorithmic collusion.\nDeep models that are both effective and explainable are desirable in many settings; prior explainable models have been unimodal, offering either image-based visualization of attention weights or text-based generation of post-hoc justifications. We propose a multimodal approach to explanation, and argue that the two modalities provide complementary explanatory strengths. We collect two new datasets to define and evaluate this task, and propose a novel model which can provide joint textual rationale generation and attention visualization. Our datasets define visual and textual justifications of a classification decision for activity recognition tasks (ACT-X) and for visual question answering tasks (VQA-X). We quantitatively show that training with the textual explanations not only yields better textual justification models, but also better localizes the evidence that supports the decision. 
We also qualitatively show cases where visual explanation is more insightful than textual explanation, and vice versa, supporting our thesis that multimodal explanation models offer significant benefits over unimodal approaches.\nWe propose a reliable intersection control mechanism for strategic autonomous and connected vehicles (agents) in non-cooperative environments. Each agent has access to his/her earliest possible and desired passing times, and reports a passing time to the intersection manager, who allocates the intersection temporally to the agents on a First-Come-First-Serve basis. However, the agents might have conflicting interests and can take actions strategically. To this end, we analyze the strategic behaviors of the agents and formulate Nash equilibria for all possible scenarios. Furthermore, among all Nash equilibria we identify a socially optimal equilibrium that leads to a fair intersection allocation, and correspondingly we describe a strategy-proof intersection mechanism, which achieves reliable intersection control such that the strategic agents do not have any incentive to misreport their passing times strategically.\nDescription Logics (DLs) under Rational Closure (RC) is a well-known framework for non-monotonic reasoning in DLs. In this paper, we address the concept subsumption decision problem under RC for nominal safe $\\mathcal{ELO}_\\bot$, a notable and practically important DL representative of the OWL 2 profile OWL 2 EL.   Our contribution here is to define a polynomial time subsumption procedure for nominal safe $\\mathcal{ELO}_\\bot$ under RC that relies entirely on a series of classical, monotonic $\\mathcal{EL}_\\bot$ subsumption tests. Therefore, any existing classical monotonic $\\mathcal{EL}_\\bot$ reasoner can be used as a black box to implement our method. 
We then also adapt the method to one of the known extensions of RC for DLs, namely Defeasible Inheritance-based DLs without losing the computational tractability.\nWe introduce tensor field networks, which are locally equivariant to 3D rotations, translations, and permutations of points at every layer. 3D rotation equivariance removes the need for data augmentation to identify features in arbitrary orientations. Our network uses filters built from spherical harmonics; due to the mathematical consequences of this filter choice, each layer accepts as input (and guarantees as output) scalars, vectors, and higher-order tensors, in the geometric sense of these terms. We demonstrate how tensor field networks learn to model simple physics (Newtonian gravitation and moment of inertia), classify simple 3D shapes (trained on one orientation and tested on shapes in arbitrary orientations), and, given a small organic molecule with an atom removed, replace the correct element at the correct location in space.\nVanishing long-term gradients are a major issue in training standard recurrent neural networks (RNNs), which can be alleviated by long short-term memory (LSTM) models with memory cells. However, the extra parameters associated with the memory cells mean an LSTM layer has four times as many parameters as an RNN with the same hidden vector size. This paper addresses the vanishing gradient problem using a high order RNN (HORNN) which has additional connections from multiple previous time steps. 
Speech recognition experiments using British English multi-genre broadcast (MGB3) data showed that the proposed HORNN architectures for rectified linear unit and sigmoid activation functions reduced word error rates (WER) by 4.2% and 6.3% over the corresponding RNNs, and gave similar WERs to a (projected) LSTM while using only 20%--50% of the recurrent layer parameters and computation.\nWe examine two fundamental tasks associated with graph representation learning: link prediction and semi-supervised node classification. We present a densely connected autoencoder architecture capable of learning a joint representation of both local graph structure and available external node features for the multi-task learning of link prediction and node classification. To the best of our knowledge, this is the first architecture that can be efficiently trained end-to-end in a single learning stage to simultaneously perform link prediction and node classification. We provide comprehensive empirical evaluation of our models on a range of challenging benchmark graph-structured datasets, and demonstrate significant improvement in accuracy over related methods for graph representation learning. Code implementation is available at https://github.com/vuptran/graph-representation-learning\nReal-time bidding (RTB) is arguably the most important mechanism in online display advertising, where a proper bid for each page view plays a vital role in achieving good marketing results. Budget constrained bidding is a typical scenario in the RTB mechanism where advertisers hope to maximize the total value of winning impressions under a pre-set budget constraint. However, the optimal strategy is hard to derive due to the complexity and volatility of the auction environment. To address these challenges, in this paper, we formulate budget constrained bidding as a Markov Decision Process. 
Quite different from prior model-based work, we propose a novel framework based on model-free reinforcement learning which sequentially regulates the bidding parameter rather than directly producing the bid. Along this line, we further design a novel reward function which deploys a deep neural network to learn an appropriate reward and thus leads the agent to deliver the optimal policy effectively; we also design an adaptive $\\epsilon$-greedy strategy which adjusts the exploration behaviour dynamically and further improves the performance. Experimental results on a real dataset demonstrate the effectiveness of our framework.\nNeural networks are commonly regarded as black boxes performing incomprehensible functions. For classification problems networks provide maps from high dimensional feature space to K-dimensional image space. Images of training vectors are projected onto polygon vertices, providing visualization of network function. Such visualization may show the dynamics of learning, allow for comparison of different networks, display training vectors around which potential problems may arise, show differences due to regularization and optimization procedures, investigate stability of network classification under perturbation of original vectors, and place new data samples in relation to training data, allowing for estimation of confidence in classification of a given sample. An illustrative example for the three-class Wine data and five-class Satimage data is described. The visualization method proposed here is applicable to any black box system that provides continuous outputs.\nAlthough single-agent deep reinforcement learning has achieved significant success thanks to the experience replay mechanism, this mechanism should be reconsidered in multiagent environments. This work focuses on the stochastic cooperative environment. 
We apply a specific adaptation to one recently proposed weighted double estimator and propose a multiagent deep reinforcement learning framework, named Weighted Double Deep Q-Network (WDDQN). To achieve efficient cooperation, \\textit{Lenient Reward Network} and \\textit{Mixture Replay Strategy} are introduced. By utilizing the deep neural network and the weighted double estimator, WDDQN can not only reduce the bias effectively but also be extended to many deep RL scenarios with only raw pixel images as input. Empirically, WDDQN outperforms the existing DRL algorithm (double DQN) and multiagent RL algorithm (lenient Q-learning) in terms of performance and convergence within stochastic cooperative environments.\nWith an increasing demand from emerging logistics businesses, the Vehicle Routing Problem with Private fleet and common Carrier (VRPPC) has been introduced to manage package delivery services from a supplier to customers. However, almost all existing studies focus on the deterministic problem that assumes all parameters are known perfectly at the time when the planning and routing decisions are made. In reality, some parameters are random and unknown. Therefore, in this paper, we consider VRPPC with hard time windows and random demand, called Optimal Delivery Planning (ODP). The proposed ODP aims to minimize the total package delivery cost while meeting the customer time window constraints. We use stochastic integer programming to formulate the optimization problem incorporating the customer demand uncertainty. Moreover, we evaluate the performance of the ODP using test data from a benchmark dataset and from an actual Singapore road map.\nThere has been recent interest in applying cognitively or empirically motivated bounds on recursion depth to limit the search space of grammar induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016). 
This work extends this depth-bounding approach to probabilistic context-free grammar induction (DB-PCFG), which has a smaller parameter space than hierarchical sequence models, and therefore more fully exploits the space reductions of depth-bounding. Results for this model on grammar acquisition from transcribed child-directed speech and newswire text exceed or are competitive with those of other models when evaluated on parse accuracy. Moreover, grammars acquired from this model demonstrate a consistent use of category labels, something which has not been demonstrated by other acquisition models.\nUnderstanding and visualizing human discourse has long been a challenging task. Although recent work on argument mining has shown success in classifying the role of various sentences, the task of recognizing concepts and understanding the ways in which they are discussed remains challenging. Given an email thread or a transcript of a group discussion, our task is to extract the relevant concepts and understand how they are referenced and re-referenced throughout the discussion. In the present work, we present a preliminary approach for extracting and visualizing group discourse by adapting Wikipedia's category hierarchy to be an external concept ontology. From a user study, we found that our method achieved better results than 4 strong alternative approaches, and we illustrate our visualization method based on the extracted discourse flows.\nChoosing optimal (or at least better) policies is an important problem in domains from medicine to education to finance and many others. One approach to this problem is through controlled experiments/trials - but controlled experiments are expensive. Hence it is important to choose the best policies on the basis of observational data. This presents two difficult challenges: (i) missing counterfactuals, and (ii) selection bias. 
This paper presents theoretical bounds on estimation errors of counterfactuals from observational data by making connections to domain adaptation theory. It also presents a principled way of choosing optimal policies using domain adversarial neural networks. We illustrate the effectiveness of domain adversarial training together with various features of our algorithm on a semi-synthetic breast cancer dataset and a supervised UCI dataset (Statlog).\nDeep learning is a branch of artificial intelligence where networks of simple interconnected units are used to extract patterns from data in order to solve complex problems. Deep learning algorithms have shown groundbreaking performance in a variety of sophisticated tasks, especially those related to images. They have often matched or exceeded human performance. Since the medical field of radiology mostly relies on extracting useful information from images, it is a very natural application area for deep learning, and research in this area has rapidly grown in recent years. In this article, we review the clinical reality of radiology and discuss the opportunities for application of deep learning algorithms. We also introduce basic concepts of deep learning including convolutional neural networks. Then, we present a survey of the research in deep learning applied to radiology. We organize the studies by the types of specific tasks that they attempt to solve and review the broad range of utilized deep learning algorithms. Finally, we briefly discuss opportunities and challenges for incorporating deep learning in the radiology practice of the future.\nModeling and generating graphs is fundamental for studying networks in biology, engineering, and social sciences. However, modeling complex distributions over graphs and then efficiently sampling from these distributions is challenging due to the non-unique, high-dimensional nature of graphs and the complex, non-local dependencies that exist between edges in a given graph. 
Here we propose GraphRNN, a deep autoregressive model that addresses the above challenges and approximates any distribution of graphs with minimal assumptions about their structure. GraphRNN learns to generate graphs by training on a representative set of graphs and decomposes the graph generation process into a sequence of node and edge formations, conditioned on the graph structure generated so far.   In order to quantitatively evaluate the performance of GraphRNN, we introduce a benchmark suite of datasets, baselines and novel evaluation metrics based on Maximum Mean Discrepancy, which measure distances between sets of graphs. Our experiments show that GraphRNN significantly outperforms all baselines, learning to generate diverse graphs that match the structural characteristics of a target set, while also scaling to graphs 50 times larger than previous deep models.\nCombinatorial optimization is a common theme in computer science which underlies a considerable variety of problems. In contrast to the continuous setting, combinatorial problems require special solution strategies, and it's hard to come by generic schemes like gradient methods for continuous domains. We follow a standard construction of a parametric sampling distribution that transforms the problem to the continuous domain, allowing us to optimize the expectation of a given objective using estimates of the gradient. In spite of the apparent generality, such constructions are known to suffer from highly variable gradient estimates, and thus require careful tuning that is done in a problem specific manner. We show that a simple trick of converting the objective values to their cumulative probabilities fixes the distribution of the objective, allowing us to derive an online optimization algorithm that can be applied in a generic fashion. 
As an experimental benchmark we use the task of finding cliques in undirected graphs, and we show that our method, even when blindly applied, consistently outperforms related methods. Notably, on the DIMACS clique benchmark, our method approaches the performance of the best clique finding algorithms without access to the graph structure, and only through objective function evaluations, thus providing significant evidence of the generality and effectiveness of our method.\nDesigning conversational user interface experience is complicated because conversation comes with many expectations. When these expectations are met, we feel the interface is natural, but once they are violated, we feel something is amiss. The last decade has witnessed human language technologies and behaviours that enable humans to converse with software using spoken dialogue to access, create and process information. Less is known about the practicalities of designing chatbot interactions. In this paper, we introduce the nature of conversational user interfaces (CUIs) and describe the underlying technologies they are based on. Moreover, we define guidelines for designing conversational interfaces in various domains. This paper particularly focuses on classifying the elements and techniques used in CUI design patterns. After outlining certain challenges of CUIs, we discuss important features and chatbot states to be considered in CUI design for specific domains. We envisage this study helping CUI researchers design tailored chatbots for specific domains and addressing current research challenges in the field of Artificial Intelligence and conversational agents.\nDue to recent technical and scientific advances, we have a wealth of information hidden in unstructured text data such as offline/online narratives, research articles, and clinical reports. 
Because of the innate ambiguity of these data, mining them properly calls for a Word Sense Disambiguation (WSD) algorithm, which can avoid a number of difficulties in the Natural Language Processing (NLP) pipeline. However, considering the large number of ambiguous words in one language or technical domain, we may encounter limiting constraints for proper deployment of existing WSD models. This paper attempts to address the problem of one-classifier-per-one-word WSD algorithms by proposing a single Bidirectional Long Short-Term Memory (BLSTM) network which, by considering senses and context sequences, works on all ambiguous words collectively. Evaluated on the SensEval-3 benchmark, the results of our model are comparable with those of top-performing WSD algorithms. We also discuss how applying additional modifications alleviates the model fault and the need for more training data.\nCategory, or property, generalization is a central function in human cognition. It plays a crucial role in a variety of domains, such as learning, everyday reasoning, specialized reasoning, and decision making. Judging the content of a dish as edible, a hormone level as healthy, a building as belonging to the same architectural style as previously seen buildings, are examples of category generalization. In this paper, we propose self-organizing maps as candidates to explain the psychological mechanisms underlying category generalization. Self-organizing maps are psychologically and biologically plausible neural network models that, just like humans, learn after limited exposure to positive category examples, without any need for contrastive information. They reproduce human behavior in category generalization, in particular for what concerns the well-known Numerosity and Variability effects, which are usually explained with Bayesian tools. 
Where category generalization is concerned, self-organizing maps are good candidates to bridge the gap between the computational level of analysis in Marr's hierarchy (where Bayesian models are situated) and the algorithmic level of analysis in which plausible mechanisms are described.\nThe purpose of this technical report is two-fold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place with a Fetch robotic arm as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Multi-Goal Reinforcement Learning (RL) framework in which an agent is told what to do using an additional input.   The second part of the paper presents a set of concrete research ideas for improving RL algorithms, most of which are related to Multi-Goal RL and Hindsight Experience Replay.\nWe propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent. We apply this approach to robotic manipulation tasks and train end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities. We demonstrate that our approach can solve a wide variety of visuomotor tasks, for which engineering a scripted controller would be laborious. Our experiments indicate that our reinforcement and imitation agent achieves significantly better performance than agents trained with reinforcement learning or imitation learning alone. We also illustrate that these policies, trained with large visual and dynamics variations, can achieve preliminary successes in zero-shot sim2real transfer. 
A brief visual description of this work can be viewed at https://youtu.be/EDl8SQUNjj0\nWe consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so the agents must infer the other players' hidden goals from their observed behavior in order to solve the tasks. We propose a new approach for learning in these domains: Self Other-Modeling (SOM), in which an agent uses its own policy to predict the other agent's actions and update its belief of their hidden state in an online manner. We evaluate this approach on three different tasks and show that the agents are able to learn better policies using their estimate of the other players' hidden states, in both cooperative and adversarial settings.\nThis paper offers a multi-disciplinary review of knowledge acquisition methods in human activity systems. The review captures the degree of involvement of various types of agencies in the knowledge acquisition process, and proposes a classification with three categories of methods: the human agent, the human-inspired agent, and the autonomous machine agent methods. In the first two categories, the acquisition of knowledge is seen as a cognitive task analysis exercise, while in the third category knowledge acquisition is treated as an autonomous knowledge-discovery endeavour. The motivation for this classification stems from the continuous change over time of the structure, meaning and purpose of human activity systems, which are seen as the factor that fuelled researchers' and practitioners' efforts in knowledge acquisition for more than a century.   
We show through this review that the KA field is increasingly active due to the ever-increasing pace of change in human activity, and conclude by discussing the emergence of a fourth category of knowledge acquisition methods, which are based on red-teaming and co-evolution.\nPurpose: We propose a phenotype-based artificial intelligence system that can self-learn and is accurate for screening purposes, and test it on a Level IV monitoring system. Methods: Based on physiological knowledge, we hypothesize that the phenotype information will allow us to find subjects from a well-annotated database that share similar sleep apnea patterns. Therefore, for a newly arriving subject, we can establish a prediction model from the existing database that is adaptive to the subject. We test the proposed algorithm on a database consisting of 62 subjects with the signals recorded from a Level IV wearable device measuring the thoracic and abdominal movements and the SpO2. Results: With leave-one-out cross validation, the accuracy of the proposed algorithm to screen subjects with an apnea-hypopnea index greater than or equal to 15 is 93.6%, the positive likelihood ratio is 6.8, and the negative likelihood ratio is 0.03. Conclusion: The results confirm the hypothesis and show that the proposed algorithm has great potential to screen patients with SAS.\nIn sequential hypothesis testing, Generalized Binary Search (GBS) greedily chooses the test with the highest information gain at each step. It is known that GBS obtains the gold standard query cost of $O(\\log n)$ for problems satisfying the $k$-neighborly condition, which requires any two tests to be connected by a sequence of tests where neighboring tests disagree on at most $k$ hypotheses. In this paper, we introduce a weaker condition, split-neighborly, which requires that for the set of hypotheses two neighbors disagree on, any subset is splittable by some test. 
For four problems that are not $k$-neighborly for any constant $k$, we prove that they are split-neighborly, which allows us to obtain the optimal $O(\\log n)$ worst-case query cost.\nReal-time advertising allows advertisers to bid for each impression for a visiting user. To optimize a specific goal such as maximizing the revenue led by ad placements, advertisers not only need to estimate the relevance between the ads and user's interests, but most importantly require a strategic response with respect to other advertisers bidding in the market. In this paper, we formulate bidding optimization with multi-agent reinforcement learning. To deal with a large number of advertisers, we propose a clustering method and assign a strategic bidding agent to each cluster. A practical Distributed Coordinated Multi-Agent Bidding (DCMAB) method has been proposed and implemented to balance the tradeoff between the competition and cooperation among advertisers. The empirical study on our industry-scaled real-world data has demonstrated the effectiveness of our modeling methods. Our results show that cluster-based bidding largely outperforms single-agent and bandit approaches, and the coordinated bidding achieves better overall objectives than purely self-interested bidding agents.\nMany of the current scientific advances in the life sciences have their origin in the intensive use of data for knowledge discovery. In no area is this so clear as in bioinformatics, led by technological breakthroughs in data acquisition technologies. It has been argued that bioinformatics could quickly become the field of research generating the largest data repositories, beating other data-intensive areas such as high-energy physics or astroinformatics. Over the last decade, deep learning has become a disruptive advance in machine learning, giving new life to the long-standing connectionist paradigm in artificial intelligence. 
Deep learning methods are ideally suited to large-scale data and, therefore, they should be ideally suited to knowledge discovery in bioinformatics and biomedicine at large. In this brief paper, we review key aspects of the application of deep learning in bioinformatics and medicine, drawing from the themes covered by the contributions to an ESANN 2018 special session devoted to this topic.\nWe tackle here the problem of multimodal image non-rigid registration, which is of prime importance in remote sensing and medical imaging. The difficulties encountered by classical registration approaches include feature design and slow optimization by gradient descent. By analyzing these methods, we note the significance of the notion of scale. We design easy-to-train, fully-convolutional neural networks able to learn scale-specific features. Once chained appropriately, they perform global registration in linear time, getting rid of gradient descent schemes by predicting the deformation directly. We show their performance in terms of quality and speed through various tasks of remote sensing multimodal image alignment. In particular, we are able to register correctly cadastral maps of buildings as well as road polylines onto RGB images, and outperform current keypoint matching methods.\nComplex data is usually produced by interacting sources with different mechanisms. Here we introduce a parameter-free model-based approach, based upon the seminal concept of Algorithmic Probability, that decomposes an observation and signal into its most likely algorithmic generative sources. Our methods use a causal calculus to infer model representations. We demonstrate the method's ability to distinguish interacting mechanisms and deconvolve them, regardless of whether the objects produce strings, space-time evolution diagrams, images or networks. 
We numerically test and evaluate our causal separation methods and find that they can disentangle examples of observations from discrete dynamical systems and complex networks. We think that these causal separation techniques can help tackle the challenge of causation, yielding estimations of better-rooted probability distributions and thereby complementing more limited statistically oriented techniques that would otherwise lack model inference capabilities.\nRecent work has shown that tight concentration of the entire spectrum of singular values of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude. Therefore, to guide important design choices, it is important to build a full theoretical understanding of the spectra of Jacobians at initialization. To this end, we leverage powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters including the nonlinearity, the weight and bias distributions, and the depth. For a variety of nonlinearities, our work reveals the emergence of new universal limiting spectral distributions that remain concentrated around one even as the depth goes to infinity.\nThe loss functions of deep neural networks are complex and their geometric properties are not well understood. We show that the optima of these complex loss functions are in fact connected by simple curves, such as a polygonal chain with only one bend, over which training and test accuracy are nearly constant. We introduce a training procedure to discover these high-accuracy pathways between modes. Inspired by this new geometric insight, we also propose a new ensembling method entitled Fast Geometric Ensembling (FGE). Using FGE we can train high-performing ensembles in the time required to train a single model. 
We achieve improved performance compared to the recent state-of-the-art Snapshot Ensembles, on CIFAR-10 and CIFAR-100, using state-of-the-art deep residual networks. On ImageNet we improve the top-1 error-rate of a pre-trained ResNet by 0.56% by running FGE for just 5 epochs.\nWhat makes humans so good at solving seemingly complex video games? Unlike computers, humans bring in a great deal of prior knowledge about the world, enabling efficient decision making. This paper investigates the role of human priors for solving video games. Given a sample game, we conduct a series of ablation studies to quantify the importance of various priors on human performance. We do this by modifying the video game environment to systematically mask different types of visual information that could be used by humans as priors. We find that removal of some prior knowledge causes a drastic degradation in the speed with which human players solve the game, e.g. from 2 minutes to over 20 minutes. Furthermore, our results indicate that general priors, such as the importance of objects and visual consistency, are critical for efficient game-play. Videos and the game manipulations are available at https://rach0012.github.io/humanRL_website/\nTraditional methods for assessing illness severity and predicting in-hospital mortality among critically ill patients require manual, time-consuming, and error-prone calculations that are further hindered by the use of static variable thresholds derived from aggregate patient populations. These coarse frameworks do not capture time-sensitive individual physiological patterns and are not suitable for instantaneous assessment of patients' acuity trajectories, a critical task for the ICU where conditions often change rapidly. Furthermore, they are ill-suited to capitalize on the emerging availability of streaming electronic health record data. 
We propose a novel acuity score framework (DeepSOFA) that leverages temporal patient measurements in conjunction with deep learning models to make accurate assessments of a patient's illness severity at any point during their ICU stay. We compare DeepSOFA with SOFA baseline models using the same predictors and find that at any point during an ICU admission, DeepSOFA yields more accurate predictions of in-hospital mortality.\nDeep neural networks have achieved remarkable accuracy in many artificial intelligence applications, e.g. computer vision, at the cost of a large number of parameters and high computational complexity. Weight pruning can compress DNN models by removing redundant parameters in the networks, but it introduces sparsity into the weight matrix, and therefore makes the computation inefficient on GPUs. Although pruning can remove more than 80% of the weights, it actually hurts inference performance (speed) when running models on GPUs. Two major problems cause this unsatisfactory performance on GPUs. First, lowering convolution onto matrix multiplication reduces data reuse opportunities and wastes memory bandwidth. Second, the sparsity brought by pruning makes the computation irregular, which leads to inefficiency when running on massively parallel GPUs. To overcome these two limitations, we propose Escort, which efficiently runs sparse convolutional neural networks on GPUs. Instead of using the lowering method, we choose to compute the sparse convolutions directly. We then orchestrate the parallelism and locality for the direct sparse convolution kernel, and apply customized optimization techniques to further improve performance. Evaluation on NVIDIA GPUs shows that Escort can improve sparse convolution speed by 2.63x and 3.07x, and inference speed by 1.38x and 1.60x, compared to CUBLAS and CUSPARSE, respectively.\nGeneral Video Game Playing (GVGP) aims at designing an agent that is capable of playing multiple video games with no human intervention.
In 2014, the General Video Game AI (GVGAI) competition framework was created and released with the purpose of providing researchers a common open-source and easy-to-use platform for testing their AI methods on a potentially infinite number of games created using the Video Game Description Language (VGDL). The framework has been expanded into several tracks during the last few years to meet the demands of different research directions. The agents are required to either play multiple unknown games with or without access to game simulations, or to design new game levels or rules. This survey paper presents the VGDL, the GVGAI framework, existing tracks, and reviews the wide use of the GVGAI framework in research, education and competitions five years after its birth. A future plan of framework improvements is also described.\nIn systems of multiple agents, identifying the cause of observed agent dynamics is challenging. Often, these agents operate in diverse, non-stationary environments, where models rely on hand-crafted environment-specific features to infer influential regions in the system's surroundings. To overcome the limitations of these inflexible models, we present GP-LAPLACE, a technique for locating sources and sinks from trajectories in time-varying fields. Using Gaussian processes, we jointly infer a spatio-temporal vector field, as well as canonical vector calculus operations on that field. Notably, we do this from only agent trajectories without requiring knowledge of the environment, and also obtain a metric for denoting the significance of inferred causal features in the environment by exploiting our probabilistic method.
To evaluate our approach, we apply it to both synthetic and real-world GPS data, demonstrating the applicability of our technique in the presence of multiple agents, as well as its superiority over existing methods.\nIn psychological measurements, two levels should be distinguished: the 'individual level', relative to the different participants in a given cognitive situation, and the 'collective level', relative to the overall statistics of their outcomes, which we propose to associate with a notion of 'collective participant'. When the distinction between these two levels is properly formalized, it reveals why the modeling of the collective participant generally requires beyond-quantum (non-Bornian) probabilistic models when sequential measurements at the individual level are considered, even though a pure quantum description remains valid for single measurement situations.\nMost reinforcement learning algorithms are inefficient for learning multiple tasks in complex robotic systems, where different tasks share a set of actions. In such environments a compound policy may be learnt with shared neural network parameters, which performs multiple tasks concurrently. However, such a compound policy may get biased towards a task or the gradients from different tasks may negate each other, making the learning unstable and sometimes less data efficient. In this paper, we propose a new approach for simultaneous training of multiple tasks sharing a set of common actions in continuous action spaces, which we call DiGrad (Differential Policy Gradient). The proposed framework is based on differential policy gradients and can accommodate multi-task learning in a single actor-critic network. We also propose a simple heuristic in the differential policy gradient update to further improve the learning.
The proposed architecture was tested on an 8-link planar manipulator and a 27-degree-of-freedom (DoF) humanoid for learning multi-goal reachability tasks for 3 and 2 end effectors, respectively. We show that our approach supports efficient multi-task learning in complex robotic systems, outperforming related methods in continuous action spaces.\nClose human-robot cooperation is a key enabler for new developments in advanced manufacturing and assistive applications. Close cooperation requires robots that can predict human actions and intent, and understand human non-verbal cues. Recent approaches based on neural networks have led to encouraging results in the human action prediction problem in both continuous and discrete spaces. Our approach extends the research in this direction. Our contributions are three-fold. First, we validate the use of gaze and body pose cues as a means of predicting human action through a feature selection method. Next, we address two shortcomings of the existing literature: predicting multiple and variable-length action sequences. This is achieved by introducing an encoder-decoder recurrent neural network topology in the discrete action prediction problem. In addition, we theoretically demonstrate the importance of predicting multiple action sequences as a means of estimating the stochastic reward in a human-robot cooperation scenario. Finally, we show the ability to effectively train the prediction model on an action prediction dataset involving human motion data, and explore the influence of the model's parameters on its performance.\nModel-free reinforcement learning (RL) methods are succeeding in a growing number of tasks, aided by recent advances in deep learning. However, they tend to suffer from high sample complexity, which hinders their use in real-world domains.
Alternatively, model-based reinforcement learning promises to reduce sample complexity, but tends to require careful tuning and to date has succeeded mainly in restrictive domains where simple models are sufficient for learning. In this paper, we analyze the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and show that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training. To overcome this issue, we propose to use an ensemble of models to maintain the model uncertainty and regularize the learning process. We further show that the use of likelihood ratio derivatives yields much more stable learning than backpropagation through time. Altogether, our approach, Model-Ensemble Trust-Region Policy Optimization (ME-TRPO), significantly reduces the sample complexity compared to model-free deep RL methods on challenging continuous control benchmark tasks.\nIn the recent literature the important role of depth in deep learning has been emphasized. In this paper we argue that sufficient width of a feedforward network is equally important by answering the simple question of under which conditions the decision regions of a neural network are connected. It turns out that for a class of activation functions including leaky ReLU, neural networks having a pyramidal structure, that is, no layer has more hidden units than the input dimension, necessarily produce connected decision regions. This implies that a sufficiently wide layer is necessary to produce disconnected decision regions.
We discuss the implications of this result for the construction of neural networks, in particular the relation to the problem of adversarial manipulation of classifiers.\nRecent model-free reinforcement learning algorithms have proposed incorporating learned dynamics models as a source of additional data with the intention of reducing sample complexity. Such methods hold the promise of incorporating imagined data coupled with a notion of model uncertainty to accelerate the learning of continuous control tasks. Unfortunately, they rely on heuristics that limit usage of the dynamics model. We present model-based value expansion, which controls for uncertainty in the model by only allowing imagination to fixed depth. By enabling wider use of learned dynamics models within a model-free reinforcement learning algorithm, we improve value estimation, which, in turn, reduces the sample complexity of learning.\nIn partially observed environments, it can be useful for a human to provide the robot with declarative information that augments its direct sensory observations. For instance, given a robot on a search-and-rescue mission, a human operator might suggest locations of interest. We provide a representation for the robot's internal knowledge that supports efficient combination of raw sensory information with high-level declarative information presented in a formal language. Computational efficiency is achieved by dynamically selecting an appropriate factoring of the belief state, combining aspects of the belief when they are correlated through information and separating them when they are not. This strategy works in open domains, in which the set of possible objects is not known in advance, and provides significant improvements in inference time, leading to more efficient planning for complex partially observable tasks. 
We validate our approach experimentally in two open-domain planning problems: a 2D discrete gridworld task and a 3D continuous cooking task.\nDespite recent advances in training recurrent neural networks (RNNs), capturing long-term dependencies in sequences remains a fundamental challenge. Most approaches use backpropagation through time (BPTT), which is difficult to scale to very long sequences. This paper proposes a simple method that improves the ability to capture long-term dependencies in RNNs by adding an unsupervised auxiliary loss to the original objective. This auxiliary loss forces RNNs to either reconstruct previous events or predict next events in a sequence, making truncated backpropagation feasible for long sequences and also improving full BPTT. We evaluate our method in a variety of settings, including pixel-by-pixel image classification with sequence lengths up to 16\\,000, and a real document classification benchmark. Our results highlight good performance and resource efficiency of this approach over competitive baselines, including other recurrent models and a comparably sized Transformer. Further analyses reveal beneficial effects of the auxiliary loss on optimization and regularization, as well as extreme cases where there is little to no backpropagation.\nHuman inertial thinking schemes can be formed through learning, which are then applied to quickly solve similar problems later. However, when problems are significantly different, inertial thinking generally produces imperfect solutions. In such cases, people will apply creative thinking, such as reverse thinking, to solve problems. Similarly, machine learning methods also form inertial thinking schemes by learning knowledge from a large amount of data. However, when the testing data are vastly different, the formed inertial thinking schemes will inevitably generate errors. This kind of inertial thinking is called illusion inertial thinking.
Because existing machine learning methods do not consider illusion inertial thinking, in this paper we propose a new method that uses reverse thinking to correct illusion inertial thinking, which increases the generalization ability of machine learning methods. Experimental results on benchmark datasets are used to validate the proposed method.\nThe Iterated Prisoner's Dilemma has guided research on social dilemmas for decades. However, it distinguishes between only two atomic actions: cooperate and defect. In real-world prisoner's dilemmas, these choices are temporally extended and different strategies may correspond to sequences of actions, reflecting grades of cooperation. We introduce a Sequential Prisoner's Dilemma (SPD) game to better capture the aforementioned characteristics. In this work, we propose a deep multiagent reinforcement learning approach that investigates the evolution of mutual cooperation in SPD games. Our approach consists of two phases. The first phase is offline: it synthesizes policies with different cooperation degrees and then trains a cooperation degree detection network. The second phase is online: an agent adaptively selects its policy based on the detected degree of opponent cooperation. The effectiveness of our approach is demonstrated in two representative SPD 2D games: the Apple-Pear game and the Fruit Gathering game. Experimental results show that our strategy can avoid being exploited by exploitative opponents and achieve cooperation with cooperative opponents.\nFacial expression recognition (FER) has always been a challenging issue in computer vision. The different expressions of emotion and uncontrolled environmental factors lead to inconsistencies in the complexity of FER and variability between expression categories, which is often overlooked by facial expression recognition systems.
In order to solve this problem effectively, we presented a simple and efficient CNN model to extract facial features, and proposed a complexity perception classification (CPC) algorithm for FER. The CPC algorithm divided the dataset into an easy classification sample subspace and a complex classification sample subspace by evaluating the complexity of facial features that are suitable for classification. The experimental results of our proposed algorithm on Fer2013 and CK-plus datasets demonstrated the algorithm's effectiveness and superiority over other state-of-the-art approaches.\nThe design of gaits for robot locomotion can be a daunting process which requires significant expert knowledge and engineering. This process is even more challenging for robots that do not have an accurate physical model, such as compliant or micro-scale robots. Data-driven gait optimization provides an automated alternative to analytical gait design. In this paper, we propose a novel approach to efficiently learn a wide range of locomotion tasks with walking robots. This approach formalizes locomotion as a contextual policy search task to collect data, and subsequently uses that data to learn multi-objective locomotion primitives that can be used for planning. As a proof-of-concept we consider a simulated hexapod modeled after a recently developed microrobot, and we thoroughly evaluate the performance of this microrobot on different tasks and gaits. Our results validate the proposed controller and learning scheme on single and multi-objective locomotion tasks. Moreover, the experimental simulations show that without any prior knowledge about the robot used (e.g., dynamics model), our approach is capable of learning locomotion primitives within 250 trials and subsequently using them to successfully navigate through a maze.\nIn order to explore and act autonomously in an environment, an agent needs to learn from the sensorimotor information that is captured while acting. 
By extracting the regularities in this sensorimotor stream, it can learn a model of the world, which in turn can be used as a basis for action and exploration. This requires the acquisition of compact representations from a possibly high-dimensional raw observation, which is noisy and ambiguous. In this paper, we learn sensory representations from sensorimotor prediction. We propose a model which integrates sensorimotor information over time, and projects it into a sensory representation which is useful for prediction. Using a simple example, we emphasize the role of motor and memory for learning sensory representations.\nResearch on multi-robot systems has demonstrated promising results in manifold applications and domains. Still, efficiently learning effective robot behaviors is very difficult, due to unstructured scenarios, high uncertainties, and large state dimensionality (e.g. hyper-redundant robots and groups of robots). To alleviate this problem, we present Q-CP, a cooperative model-based reinforcement learning algorithm, which exploits action values to both (1) guide the exploration of the state space and (2) generate effective policies. Specifically, we exploit Q-learning to attack the curse-of-dimensionality in the iterations of a Monte-Carlo Tree Search. We implement and evaluate Q-CP on different stochastic cooperative (general-sum) games: (1) a simple cooperative navigation problem among 3 robots, (2) a cooperation scenario between a pair of KUKA YouBots performing hand-overs, and (3) a coordination task between two mobile robots entering a door. The obtained results show the effectiveness of Q-CP in the chosen applications, where action values drive the exploration and reduce the computational demand of the planning process while achieving good performance.\nMultilayer graphs encode different kinds of interactions between the same set of entities.
When one wants to cluster such a multilayer graph, the natural question arises of how one should merge the information from different layers. In this paper we introduce a one-parameter family of matrix power means for merging the Laplacians from different layers and analyze it in expectation in the stochastic block model. We show that this family allows one to recover ground-truth clusters under different settings and verify this on real-world data. While computing the matrix power mean can be very expensive for large graphs, we introduce a numerical scheme to efficiently compute its eigenvectors for the case of large sparse graphs.\nThe tasks that an agent will need to solve often are not known during training. However, if the agent knows which properties of the environment are important then, after learning how its actions affect those properties, it may be able to use this knowledge to solve complex tasks without training specifically for them. Towards this end, we consider a setup in which an environment is augmented with a set of user-defined attributes that parameterize the features of interest. We propose a method that learns a policy for transitioning between \"nearby\" sets of attributes, and maintains a graph of possible transitions. Given a task at test time that can be expressed in terms of a target set of attributes, and a current state, our model infers the attributes of the current state and searches over paths through attribute space to get a high-level plan, and then uses its low-level policy to execute the plan. We show in 3D block stacking, grid-world games, and StarCraft that our model is able to generalize to longer, more complex tasks at test time by composing simpler learned policies.\nOnline structure learning approaches, such as those stemming from Statistical Relational Learning, enable the discovery of complex relations in noisy data streams.
However, these methods assume the existence of fully-labelled training data, which is unrealistic for most real-world applications. We present a novel approach for completing the supervision of a semi-supervised structure learning task. We incorporate graph cut minimisation, a technique that derives labels for unlabelled data, based on their distance to their labelled counterparts. In order to adapt graph cut minimisation to first order logic, we employ a suitable structural distance for measuring the distance between sets of logical atoms. The labelling process is achieved online (single-pass) by means of a caching mechanism and the Hoeffding bound, a statistical tool to approximate globally-optimal decisions from locally-optimal ones. We evaluate our approach on the task of composite event recognition by using a benchmark dataset for human activity recognition, as well as a real dataset for maritime monitoring. The evaluation suggests that our approach can effectively complete the missing labels and eventually, improve the accuracy of the underlying structure learning system.\nWe study the problem of learning policies over long time horizons. We present a framework that leverages and integrates two key concepts. First, we utilize hierarchical policy classes that enable planning over different time scales, i.e., the high level planner proposes a sequence of subgoals for the low level planner to achieve. Second, we utilize expert demonstrations within the hierarchical action space to dramatically reduce cost of exploration. Our framework is flexible and can incorporate different combinations of imitation learning (IL) and reinforcement learning (RL) at different levels of the hierarchy. Using long-horizon benchmarks, including Montezuma's Revenge, we empirically demonstrate that our approach can learn significantly faster compared to hierarchical RL, and can be significantly more label- and sample-efficient compared to flat IL. 
We also provide theoretical analysis of the labeling cost for certain instantiations of our framework.\nWe introduce a new memory architecture for navigation in previously unseen environments, inspired by landmark-based navigation in animals. The proposed semi-parametric topological memory (SPTM) consists of a (non-parametric) graph with nodes corresponding to locations in the environment and a (parametric) deep network capable of retrieving nodes from the graph based on observations. The graph stores no metric information, only the connectivity of locations corresponding to the nodes. We use SPTM as a planning module in a navigation system. Given only 5 minutes of footage of a previously unseen maze, an SPTM-based navigation agent can build a topological map of the environment and use it to confidently navigate towards goals. The average success rate of the SPTM agent in goal-directed navigation across test environments is higher than the best-performing baseline by a factor of three. A video of the agent is available at https://youtu.be/vRF7f4lhswo\nAerial robots are becoming popular among the general public, and with the development of artificial intelligence (AI), there is a trend to equip aerial robots with a natural user interface (NUI). Hand/arm gestures are an intuitive way for humans to communicate, and various research works have focused on controlling an aerial robot with natural gestures. However, the techniques in this area are still far from mature. Many issues in this area have been poorly addressed, such as the principles of choosing gestures from the design point of view, hardware requirements from an economic point of view, considerations of data availability, and algorithm complexity from a practical perspective. Our work focuses on building an economical monocular system particularly designed for gesture-based piloting of an aerial robot. Natural arm gestures are mapped to rich target directions and convenient fine adjustment is achieved.
Practical piloting scenarios, hardware cost and algorithm applicability are jointly considered in our system design. The entire system is successfully implemented in an aerial robot and various properties of the system are tested.\nIntrinsically motivated goal exploration algorithms enable machines to discover repertoires of policies that produce a diversity of effects in complex environments. These exploration algorithms have been shown to allow real world robots to acquire skills such as tool use in high-dimensional continuous state and action spaces. However, they have so far assumed that self-generated goals are sampled in a specifically engineered feature space, limiting their autonomy. In this work, we propose to use deep representation learning algorithms to learn an adequate goal space. This is a developmental 2-stage approach: first, in a perceptual learning stage, deep learning algorithms use passive raw sensor observations of world changes to learn a corresponding latent space; then goal exploration happens in a second stage by sampling goals in this latent space. We present experiments where a simulated robot arm interacts with an object, and we show that exploration algorithms using such learned representations can match the performance obtained using engineered representations.\nAutomatic chess problem or puzzle composition typically involves generating and testing various different positions, sometimes using particular piece sets. Once a position has been generated, it is then usually tested for positional legality based on the game rules. However, it is useful to be able to estimate what the search space size for particular piece combinations is to begin with. So if a desirable chess problem was successfully generated by examining 'merely' 100,000 or so positions in a theoretical search space of about 100 billion, this would imply the composing approach used was quite viable and perhaps even impressive. 
In this article, I explain a method of calculating the size of this search space using a combinatorics and permutations approach. While the mathematics itself may already be established, a precise method and justification of applying it with regard to the chessboard and chess pieces has not been documented, to the best of our knowledge. Additionally, the method could serve as a useful starting point for further estimations of search space size which filter out positions for legality and rotation, depending on how the automatic composer is allowed to place pieces on the board (because this affects its total search space size).\nTraining neural networks involves finding minima of a high-dimensional non-convex loss function. Knowledge of the structure of this energy landscape is sparse. Relaxing from linear interpolations, we construct continuous paths between minima of recent neural network architectures on CIFAR10 and CIFAR100. Surprisingly, the paths are essentially flat in both the training and test landscapes. This implies that neural networks have enough capacity for structural changes, or that these changes are small between minima. Also, each minimum has at least one vanishing Hessian eigenvalue in addition to those resulting from trivial invariance.\nThe underlying paradigm of big data-driven machine learning reflects the desire of deriving better conclusions from simply analyzing more data, without the necessity of looking at theory and models. Is having simply more data always helpful? In 1936, The Literary Digest collected 2.3M filled in questionnaires to predict the outcome of that year's US presidential election. The outcome of this big data prediction proved to be entirely wrong, whereas George Gallup only needed 3K handpicked people to make an accurate prediction. Generally, biases occur in machine learning whenever the distributions of training set and test set are different. 
In this work, we provide a review of different sorts of biases in (big) data sets in machine learning. We provide definitions and discussions of the most commonly appearing biases in machine learning: class imbalance and covariate shift. We also show how these biases can be quantified and corrected. This work is an introductory text for both researchers and practitioners to become more aware of this topic and thus to derive more reliable models for their learning problems.\nWe propose a Multi-Instance-Learning (MIL) approach for weakly-supervised learning problems, where a training set is formed by bags (sets of feature vectors or instances) and only bag-level labels are provided. Specifically, we consider the Multi-Instance Dynamic-Ordinal-Regression (MI-DOR) setting, where the instance labels are naturally represented as ordinal variables and bags are structured as temporal sequences. To this end, we propose Multi-Instance Dynamic Ordinal Random Fields (MI-DORF). In this framework, we treat instance-labels as temporally-dependent latent variables in an Undirected Graphical Model. Different MIL assumptions are modelled via newly introduced high-order potentials relating bag and instance-labels within the energy function of the model. We also extend our framework to address the Partially-Observed MI-DOR problems, where a subset of instance labels are available during training. We show on the tasks of weakly-supervised facial behavior analysis, Facial Action Unit (DISFA dataset) and Pain (UNBC dataset) Intensity estimation, that the proposed framework outperforms alternative learning approaches. Furthermore, we show that MI-DORF can be employed to substantially reduce the data annotation effort in this context.\nIt is widely conjectured that the reason training algorithms for neural networks are successful is that all local minima lead to similar performance (see, for example, LeCun et al., 2015; Choromanska et al., 2015; Dauphin et al., 2014).
Performance is typically measured in terms of two metrics: training performance and generalization performance. Here we focus on the training performance of single-layered neural networks for binary classification, and provide conditions under which the training error is zero at all local minima of a smooth hinge loss function. Our conditions are roughly in the following form: the neurons have to be strictly convex and the surrogate loss function should be a smooth version of hinge loss. We also provide counterexamples to show that when the loss function is replaced with quadratic loss or logistic loss, the result may not hold.\nDecision trees effectively represent the sparse, high dimensional and noisy nature of chemical data from experiments. Having learned a function from this data, we may want to thereafter optimize the function, e.g., picking the best chemical process catalyst. In this way, we may repurpose legacy predictive models. This work studies a large-scale, industrially-relevant mixed-integer quadratic optimization problem involving: (i) gradient-boosted pre-trained regression trees modeling catalyst behavior, (ii) penalty functions mitigating risk, and (iii) penalties enforcing composition constraints. We develop heuristic methods and an exact, branch-and-bound algorithm leveraging structural properties of gradient-boosted trees and penalty functions. We numerically test our methods on an industrial instance.\nSimulation is an appealing option for validating the safety of autonomous vehicles. Generative Adversarial Imitation Learning (GAIL) has recently been shown to learn representative human driver models. These human driver models were learned through training in single-agent environments, but they have difficulty in generalizing to multi-agent driving scenarios. We argue these difficulties arise because observations at training and test time are sampled from different distributions. 
This difference makes such models unsuitable for the simulation of driving scenes, where multiple agents must interact realistically over long time horizons. We extend GAIL to address these shortcomings through a parameter-sharing approach grounded in curriculum learning. Compared with single-agent GAIL policies, policies generated by our PS-GAIL method prove superior at interacting stably in a multi-agent setting and capturing the emergent behavior of human drivers.\nEmploying a voice-based emotion recognition function in artificial intelligence (AI) products will improve the user experience. Most research to date has focused only on speech collected under controlled conditions, in well-controlled evaluation scenarios. Conventional approaches may fail when background noise or non-speech fillers are present. In this paper, we propose an ensemble framework combining several aspects of features from audio. The framework incorporates gender and speaker information via multi-task learning, which allows it to capture as much emotional information as possible. This framework is evaluated on the multimodal emotion challenge (MEC) 2017 corpus, which is close to real-world conditions. The proposed framework outperformed the best baseline system by 29.5% (relative improvement).\nWe provide new theoretical insights on why over-parametrization is effective in learning neural networks. For a $k$ hidden node shallow network with quadratic activation and $n$ training data points, we show that as long as $ k \\ge \\sqrt{2n}$, over-parametrization enables local search algorithms to find a \\emph{globally} optimal solution for general smooth and convex loss functions. Further, even though the number of parameters may exceed the sample size, using the theory of Rademacher complexity we show that with weight decay the solution also generalizes well if the data is sampled from a regular distribution such as Gaussian. 
To prove when $k\\ge \\sqrt{2n}$, the loss function has benign landscape properties, we adopt an idea from smoothed analysis, which may have other applications in studying loss surfaces of neural networks.\nThe considered problem is how to optimally allocate a set of jobs to technicians of different skills such that the number of technicians of each skill does not exceed the number of persons with that skill designation. The key motivation is the quick sensitivity analysis in terms of the workforce size which is quite necessary in many industries in the presence of unexpected work orders. A time-indexed mathematical model is proposed to minimize the total weighted completion time of the jobs. The proposed model is decomposed into a number of single-skill sub-problems so that each one is a combination of a series of nested binary Knapsack problems. A heuristic procedure is proposed to solve the problem. Our experimental results, based on a real-world case study, reveal that the proposed method quickly produces a schedule statistically close to the optimal one while the classical optimal procedure is very time-consuming.\nFor most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. 
We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks.\nIt is conventional wisdom in machine learning and data mining that logical models such as rule sets are more interpretable than other models, and that among such rule-based models, simpler models are more interpretable than more complex ones. In this position paper, we question this latter assumption, and recapitulate evidence for and against this postulate. We also report the results of an evaluation in a crowd-sourcing study, which does not reveal a strong preference for simple rules, whereas we can observe a weak preference for longer rules in some domains. We then continue to review criteria for interpretability from the psychological literature, evaluate some of them, and briefly discuss their potential use in machine learning.\nIn its simplest form, the traffic flow prediction problem is restricted to predicting a single time-step into the future. Multi-step traffic flow prediction extends this set-up to the case where predicting multiple time-steps into the future based on some finite history is of interest. This problem is significantly more difficult than its single-step variant and is known to suffer from degradation in predictions as the time step increases. In this paper, two approaches to improve multi-step traffic flow prediction performance in recursive and multi-output settings are introduced. In particular, a model that allows recursive prediction approaches to take into account the temporal context in terms of the time-step index when making predictions is introduced. In addition, a conditional generative adversarial network-based data augmentation method is proposed to improve prediction performance in the multi-output setting. 
The experiments on a real-world traffic flow dataset show that the two methods improve on multi-step traffic flow prediction in recursive and multi-output settings, respectively.\nAutonomous vehicles (AVs) require accurate metric and topological location estimates for safe, effective navigation and decision-making. Although many high-definition (HD) roadmaps exist, they are not always accurate since public roads are dynamic, shaped unpredictably by both human activity and nature. Thus, AVs must be able to handle situations in which the topology specified by the map does not agree with reality. We present the Variable Structure Multiple Hidden Markov Model (VSM-HMM) as a framework for localizing in the presence of topological uncertainty, and demonstrate its effectiveness on an AV where lane membership is modeled as a topological localization process. VSM-HMMs use a dynamic set of HMMs to simultaneously reason about location within a set of most likely current topologies and therefore may also be applied to topological structure estimation as well as AV lane estimation. In addition, we present an extension to the Earth Mover's Distance which allows uncertainty to be taken into account when computing the distance between belief distributions on simplices of arbitrary relative sizes.\nWith the growing integration of smartphones into our daily lives, and their increased ease of use, mobile games have become highly popular across all demographics. People listen to music, play games or read the news while in transit or bridging gap times. While mobile gaming is gaining popularity, mobile expression of creativity is still in its early stages. We present here a new type of mobile app -- fluidic games -- and illustrate our iterative approach to their design. This new type of app seamlessly integrates exploration of the design space into the actual user experience of playing the game, and aims to enrich the user experience. 
To better illustrate the game domain and our approach, we discuss one specific fluidic game, which is available as a commercial product. We also briefly discuss open challenges such as player support and how generative techniques can aid the exploration of the game space further.\nFor real-time management of bridges under dynamic conditions, this paper develops a rule-based decision support framework to extract the necessary rules from simulation results produced by Aimsun. In this rule-based system, supervised and unsupervised learning algorithms are applied to generalize the rules, where the initial set of rules is provided with the aid of fuzzy rule generation algorithms applied to the results of the Aimsun traffic micro-simulation software. As a pilot case study, Nasr Bridge in Tehran is simulated in Aimsun7 and WEKA data mining software is used to execute the learning algorithms. Based on this experiment, the accuracy of the supervised algorithms in generalizing the rules is greater than 80%. In addition, the CART decision tree and sequential minimal optimization (SMO) provide 100% accuracy on normal data, so these algorithms are reliable for crisis management on the bridge. This means that it is possible to use such machine learning methods to manage bridges in real time.\nEstimating the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) is a challenging problem since the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. Existing approaches rely on various local heuristics for enforcing the acyclicity constraint and are not well-suited to general purpose optimization packages for their solution. In this paper, we introduce a fundamentally different strategy: We formulate the structure learning problem as a smooth, constrained optimization problem over real matrices that avoids this combinatorial constraint entirely. 
This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting nonconvex, constrained program involves smooth functions whose gradients are easy to compute and only involve elementary matrix operations. By using existing black-box optimization routines, our method uses global search to find an optimal DAG and can be implemented in about 50 lines of Python and outperforms existing methods without imposing any structural constraints.\nWe describe N-body networks, a neural network architecture for learning the behavior and properties of complex many-body physical systems. Our specific application is to learn atomic potential energy surfaces for use in molecular dynamics simulations. Our architecture is novel in that (a) it is based on a hierarchical decomposition of the many-body system into subsystems, (b) the activations of the network correspond to the internal state of each subsystem, (c) the \"neurons\" in the network are constructed explicitly so as to guarantee that each of the activations is covariant to rotations, (d) the neurons operate entirely in Fourier space, and the nonlinearities are realized by tensor products followed by Clebsch-Gordan decompositions. As part of the description of our network, we characterize the ways in which the weights of the network may interact with the activations so as to ensure that the covariance property is maintained.\nMany online applications, such as online social networks or knowledge bases, are often attacked by malicious users who commit different types of actions such as vandalism on Wikipedia or fraudulent reviews on eBay. Currently, most of the fraud detection approaches require a training dataset that contains records of both benign and malicious users. However, in practice, there are often no or very few records of malicious users. In this paper, we develop one-class adversarial nets (OCAN) for fraud detection using training data with only benign users. 
OCAN first uses LSTM-Autoencoder to learn the representations of benign users from their sequences of online activities. It then detects malicious users by training a discriminator with a complementary GAN model that is different from the regular GAN model. Experimental results show that our OCAN outperforms the state-of-the-art one-class classification models and achieves comparable performance with the latest multi-source LSTM model that requires both benign and malicious users in the training phase.\nMachine learning (ML) is the fastest growing field in computer science and healthcare, providing future benefits in improved medical diagnoses, disease analyses and prevention. In this paper, we introduce an application of interactive machine learning (iML) in a telemedicine system, to enable automatic and personalised interventions for lifestyle promotion. We first present the high level architecture of the system and the components forming the overall architecture. We then illustrate the interactive machine learning process design. Prediction models are expected to be trained through the participants' profiles, activity performance, and feedback from the caregiver. Finally, we show some preliminary results during the system implementation and discuss future directions. We envisage the proposed system to be digitally implemented, and behaviourally designed to promote healthy lifestyle and activities, and hence prevent users from the risk of chronic diseases.\nEvaluation of summarization tasks is extremely crucial to determining the quality of machine generated summaries. Over the last decade, ROUGE has become the standard automatic evaluation measure for evaluating summarization tasks. While ROUGE has been shown to be effective in capturing n-gram overlap between system and human composed summaries, there are several limitations with the existing ROUGE measures in terms of capturing synonymous concepts and coverage of topics. 
Thus, ROUGE scores often do not reflect the true quality of summaries and prevent multi-faceted evaluation of summaries (e.g., by topic or by overall content coverage). In this paper, we introduce ROUGE 2.0, which has several updated measures of ROUGE: ROUGE-N+Synonyms, ROUGE-Topic, ROUGE-Topic+Synonyms, ROUGE-TopicUniq and ROUGE-TopicUniq+Synonyms; all of which are improvements over the core ROUGE measures.\nAutonomous systems in remote locations have a high degree of autonomy and there is a need to explain what they are doing and why in order to increase transparency and maintain trust. Here, we describe a natural language chat interface that enables vehicle behaviour to be queried by the user. We obtain an interpretable model of autonomy through having an expert 'speak out loud' and provide explanations during a mission. This approach is agnostic to the type of autonomy model and, as expert and operator are from the same user-group, we predict that these explanations will align well with the operator's mental model, increase transparency and assist with operator training.\nVehicle Routing Problem with Private fleet and common Carrier (VRPPC) has been proposed to help a supplier manage package delivery services from a single depot to multiple customers. Most existing VRPPC work considers deterministic parameters, which may not be practical; uncertainty has to be taken into account. In this paper, we propose the Optimal Stochastic Delivery Planning with Deadline (ODPD) to help a supplier plan and optimize the package delivery. The aim of ODPD is to service all customers within a given deadline while considering the randomness in customer demands and traveling time. We formulate the ODPD as a stochastic integer program, and use the cardinality minimization approach for calculating the deadline violation probability. To accelerate computation, the L-shaped decomposition method is adopted. 
We conduct extensive performance evaluation based on real customer locations and traveling time from Google Maps.\nWe present an algorithm for rapidly learning controllers for robotics systems. The algorithm follows the model-based reinforcement learning paradigm, and improves upon existing algorithms, namely Probabilistic Inference for Learning Control (PILCO) and a sample-based version of PILCO with neural network dynamics (Deep-PILCO). We propose training a neural network dynamics model using variational dropout with truncated Log-Normal noise. This allows us to obtain a dynamics model with calibrated uncertainty, which can be used to simulate controller executions via rollouts. We also describe a set of techniques, inspired by viewing PILCO as a recurrent neural network model, that are crucial to improve the convergence of the method. We test our method on a variety of benchmark tasks, demonstrating data-efficiency that is competitive with PILCO, while being able to optimize complex neural network controllers. Finally, we assess the performance of the algorithm for learning motor controllers for a six-legged autonomous underwater vehicle. This demonstrates the potential of the algorithm for scaling up the dimensionality and dataset sizes, in more complex control tasks.\nLarge-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to. We show that, in a significant portion of such data, this protocol leaves clues that make it possible to identify the label by looking only at the hypothesis, without observing the premise. Specifically, we show that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI (Bowman et al., 2015) and 53% of MultiNLI (Williams et al., 2017). 
Our analysis reveals that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes. Our findings suggest that the success of natural language inference models to date has been overestimated, and that the task remains a hard open problem.\nState-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. Moreover, the gradients of expected reward with respect to the mean and covariance of a parameterized Gaussian policy can be recovered from the gradient and Hessian of the smoothed Q-value function. Based on these relationships, we develop new algorithms for training a Gaussian policy directly from a learned smoothed Q-value approximator. The approach is additionally amenable to proximal optimization by augmenting the objective with a penalty on KL-divergence from a previous policy. We find that the ability to learn both a mean and covariance during training leads to significantly improved results on standard continuous control benchmarks.\nWe develop three efficient approaches for generating visual explanations from 3D convolutional neural networks (3D-CNNs) for Alzheimer's disease classification. One approach conducts sensitivity analysis on hierarchical 3D image segmentation, and the other two visualize network activations on a spatial map. Visual checks and a quantitative localization benchmark indicate that all approaches identify important brain parts for Alzheimer's disease diagnosis. 
Comparative analysis shows that the sensitivity-analysis-based approach has difficulty handling loosely distributed cerebral cortex, and approaches based on visualization of activations are constrained by the resolution of the convolutional layer. The complementarity of these methods improves the understanding of 3D-CNNs in Alzheimer's disease classification from different perspectives.\nObject cosegmentation addresses the problem of discovering similar objects from multiple images and segmenting them as foreground simultaneously. In this paper, we propose a novel end-to-end pipeline to segment the similar objects simultaneously from a relevant set of images using supervised learning via a deep learning framework. We experiment with multiple sets of object proposal generation techniques and perform extensive numerical evaluations by training the Siamese network with generated object proposals. Similar object proposals for the test images are retrieved using the ANNOY (Approximate Nearest Neighbor) library and deep semantic segmentation is performed on them. Finally, we form a collage from the segmented similar objects based on the relative importance of the objects.\nIn this work, we present a Multi-Channel deep convolutional Pyramid Person Matching Network (MC-PPMN) based on the combination of the semantic-components and the color-texture distributions to address the problem of person re-identification. In particular, we learn separate deep representations for semantic-components and color-texture distributions from two person images and then employ pyramid person matching network (PPMN) to obtain correspondence representations. These correspondence representations are fused to perform the re-identification task. Further, the proposed framework is optimized via a unified end-to-end deep learning scheme. 
Extensive experiments on several benchmark datasets demonstrate the effectiveness of our approach against the state-of-the-art literature, especially on the rank-1 recognition rate.\nIn this paper, we propose GPSP, a novel Graph Partition and Space Projection based approach, to learn the representation of a heterogeneous network that consists of multiple types of nodes and links. Concretely, we first partition the heterogeneous network into homogeneous and bipartite subnetworks. Then, the projective relations hidden in bipartite subnetworks are extracted by learning the projective embedding vectors. Finally, we concatenate the projective vectors from bipartite subnetworks with the ones learned from homogeneous subnetworks to form the final representation of the heterogeneous network. Extensive experiments are conducted on a real-life dataset. The results demonstrate that GPSP outperforms the state-of-the-art baselines in two key network mining tasks: node classification and clustering.\nExtracting action sequences from texts in natural language is challenging, as it requires commonsense inferences based on world knowledge. Although there has been work on extracting action scripts, instructions, navigation actions, etc., these approaches require either that the set of candidate actions be provided in advance or that action descriptions be restricted to a specific form, e.g., description templates. In this paper, we aim to extract action sequences from texts in free natural language, i.e., without any restricted templates, when the candidate set of actions is unknown. We propose to extract action sequences from texts based on the deep reinforcement learning framework. Specifically, we view \"selecting\" or \"eliminating\" words from texts as \"actions\", and texts associated with actions as \"states\". We then build Q-networks to learn the policy of extracting actions and extract plans from the labelled texts. 
We demonstrate the effectiveness of our approach on several datasets in comparison with state-of-the-art approaches, including online experiments interacting with humans.\nLearning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP). We want to learn a model that approximates the conditional latent space over the representations of a logical antecedent of the given statement. In our paper, we propose an approach to generating sentences, conditioned on an input sentence and a logical inference label. We do this by modeling the different possibilities for the output sentence as a distribution over the latent representation, which we train using an adversarial objective. We evaluate the model using two state-of-the-art models for the Recognizing Textual Entailment (RTE) task, and measure the BLEU scores against the actual sentences as a probe for the diversity of sentences produced by our model. The experiment results show that, given our framework, we have clear ways to improve the quality and diversity of generated sentences.\nOntologies are critical sources of semantic information for many application domains. Hence, ontologies have been proposed and utilized for domains such as medicine, chemical engineering, and electrical energy. In this paper, we present an improved and extended version of a previously proposed wind energy ontology. First, the ontology is restructured to increase its understandability and coverage. Secondly, it is enriched with new concepts, crisp/fuzzy attributes, and instances to increase its usability in semantic applications regarding wind energy. The final ontology is utilized within a Web-based semantic portal application for wind energy, in order to showcase its contribution in a genuine application. 
Hence, the current study is a significant contribution to wind, and thereby renewable, energy informatics, with the presented publicly-available wind energy ontology and the implemented proof-of-concept system.\nDeep reinforcement learning (RL) has achieved many recent successes, yet experiment turn-around time remains a key bottleneck in research and in practice. We investigate how to optimize existing deep RL algorithms for modern computers, specifically for a combination of CPUs and GPUs. We confirm that both policy gradient and Q-value learning algorithms can be adapted to learn using many parallel simulator instances. We further find it possible to train using batch sizes considerably larger than are standard, without negatively affecting sample complexity or final performance. We leverage these facts to build a unified framework for parallelization that dramatically hastens experiments in both classes of algorithm. All neural network computations use GPUs, accelerating both data collection and training. Our results include using an entire NVIDIA DGX-1 to learn successful strategies in Atari games in single-digit minutes, using both synchronous and asynchronous algorithms.\nIn high dimensions, most machine learning methods are brittle to even a small fraction of structured outliers. To address this, we introduce a new meta-algorithm that can take in a base learner such as least squares or stochastic gradient descent, and harden the learner to be resistant to outliers. Our method, Sever, possesses strong theoretical guarantees yet is also highly scalable -- beyond running the base learner itself, it only requires computing the top singular vector of a certain $n \\times d$ matrix. We apply Sever on a drug design dataset and a spam classification dataset, and find that in both cases it has substantially greater robustness than several baselines. 
On the spam dataset, with $1\\%$ corruptions, we achieved $7.4\\%$ test error, compared to $13.4\\%-20.5\\%$ for the baselines, and $3\\%$ error on the uncorrupted dataset. Similarly, on the drug design dataset, with $10\\%$ corruptions, we achieved $1.42$ test mean-squared error, compared to $1.51$-$2.33$ for the baselines, and $1.23$ error on the uncorrupted dataset.\nMuch of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action. One shortcoming is that this orientation does not account for time sensitivity, which can play a crucial role when learning an optimal action requires much more information than near-optimal ones. Indeed, popular approaches such as upper-confidence-bound methods and Thompson sampling can fare poorly in such situations. We consider instead learning a satisficing action, which is near-optimal while requiring less information, and propose satisficing Thompson sampling, an algorithm that serves this purpose. We establish a general bound on expected discounted regret and study the application of satisficing Thompson sampling to linear and infinite-armed bandits, demonstrating arbitrarily large benefits over Thompson sampling. We also discuss the relation between the notion of satisficing and the theory of rate distortion, which offers guidance on the selection of satisficing actions.\nIn this work we propose a simple and efficient framework for learning sentence representations from unlabelled data. Drawing inspiration from the distributional hypothesis and recent work on learning sentence representations, we reformulate the problem of predicting the context in which a sentence appears as a classification problem. Given a sentence and its context, a classifier distinguishes context sentences from other contrastive sentences based on their vector representations. 
This allows us to efficiently learn different types of encoding functions, and we show that the model learns high-quality sentence representations. We demonstrate that our sentence representations outperform state-of-the-art unsupervised and supervised representation learning methods on several downstream NLP tasks that involve understanding sentence semantics while achieving an order of magnitude speedup in training time.\nWe propose novel techniques for task allocation and planning in multi-robot systems operating in uncertain environments. Task allocation is performed simultaneously with planning, which provides more detailed information about individual robot behaviour, but also exploits the independence between tasks to do so efficiently. We use Markov decision processes to model robot behaviour and linear temporal logic to specify tasks and safety constraints. Building upon techniques and tools from formal verification, we show how to generate a sequence of multi-robot policies, iteratively refining them to reallocate tasks if individual robots fail, and providing probabilistic guarantees on the performance (and safe operation) of the team of robots under the resulting policy. We implement our approach and evaluate it on a benchmark multi-robot example.\nIn multiagent environments, the capability of learning is important for an agent to behave appropriately in face of unknown opponents and dynamic environment. From the system designer's perspective, it is desirable if the agents can learn to coordinate towards socially optimal outcomes, while also avoiding being exploited by selfish opponents. To this end, we propose a novel gradient ascent based algorithm (SA-IGA) which augments the basic gradient-ascent algorithm by incorporating social awareness into the policy update process. 
We theoretically analyze the learning dynamics of SA-IGA using dynamical system theory and SA-IGA is shown to have linear dynamics for a wide range of games including symmetric games. The learning dynamics of two representative games (the prisoner's dilemma game and the coordination game) are analyzed in detail. Based on the idea of SA-IGA, we further propose a practical multiagent learning algorithm, called SA-PGA, based on the Q-learning update rule. Simulation results show that SA-PGA agent can achieve higher social welfare than previous social-optimality oriented Conditional Joint Action Learner (CJAL) and also is robust against individually rational opponents by reaching Nash equilibrium solutions.\nWe present the MAC network, a novel fully differentiable neural network architecture, designed to facilitate explicit and expressive reasoning. Drawing inspiration from first principles of computer organization, MAC moves away from monolithic black-box neural architectures towards a design that encourages both transparency and versatility. The model approaches problems by decomposing them into a series of attention-based reasoning steps, each performed by a novel recurrent Memory, Attention, and Composition (MAC) cell that maintains a separation between control and memory. By stringing the cells together and imposing structural constraints that regulate their interaction, MAC effectively learns to perform iterative reasoning processes that are directly inferred from the data in an end-to-end approach. We demonstrate the model's strength, robustness and interpretability on the challenging CLEVR dataset for visual reasoning, achieving a new state-of-the-art 98.9% accuracy, halving the error rate of the previous best model. 
More importantly, we show that the model is computationally-efficient and data-efficient, in particular requiring 5x less data than existing models to achieve strong results.\nThe enormous amount of data to be represented using large graphs exceeds in some cases the resources of a conventional computer. Edges in particular can take up a considerable amount of memory as compared to the number of nodes. However, rigorous edge storage might not always be essential to be able to draw the needed conclusions. A similar problem takes records with many variables and attempts to extract the most discernible features. It is said that the \"dimension\" of this data is reduced. Following an approach with the same objective in mind, we can map a graph representation to a k-dimensional space and answer queries of neighboring nodes by measuring Euclidean distances. The accuracy of our answers would decrease but would be compensated for by fuzzy logic which gives an idea about the likelihood of error. This method allows for reasonable representation in memory while maintaining a fair amount of useful information. Promising preliminary results are obtained and reported by testing the proposed approach on a number of Facebook graphs.\nReinforcement learning (RL) is a promising approach to solve dialogue policy optimisation. Traditional RL algorithms, however, fail to scale to large domains due to the curse of dimensionality. We propose a novel Dialogue Management architecture, based on Feudal RL, which decomposes the decision into two steps; a first step where a master policy selects a subset of primitive actions, and a second step where a primitive action is chosen from the selected subset. The structural information included in the domain ontology is used to abstract the dialogue state space, taking the decisions at each step using different parts of the abstracted state. 
This, combined with an information sharing mechanism between slots, increases the scalability to large domains. We show that an implementation of this approach, based on Deep-Q Networks, significantly outperforms the previous state of the art in several dialogue domains and environments, without the need for any additional reward signal.\nNetwork quantization is an effective solution to compress deep neural networks for practical usage. Existing network quantization methods cannot sufficiently exploit the depth information to generate low-bit compressed networks. In this paper, we propose two novel network quantization approaches, single-level network quantization (SLQ) for high-bit quantization and multi-level network quantization (MLQ) for extremely low-bit quantization (ternary). We are the first to consider network quantization at both the width and depth levels. At the width level, parameters are divided into two parts: one for quantization and the other for re-training to eliminate the quantization loss. SLQ leverages the distribution of the parameters to improve the width level. At the depth level, we introduce incremental layer compensation to quantize layers iteratively, which decreases the quantization loss in each iteration. The proposed approaches are validated with extensive experiments based on the state-of-the-art neural networks including AlexNet, VGG-16, GoogleNet and ResNet-18. Both SLQ and MLQ achieve impressive results.\nIncreasing energy efficiency in buildings can reduce costs and emissions substantially. Historically, this has been treated as a local, or single-agent, optimization problem. However, many buildings utilize the same types of thermal equipment, e.g. electric heaters and hot water vessels. During operation, occupants in these buildings interact with the equipment differently, thereby driving them to diverse regions in the state-space. 
Reinforcement learning agents can learn from these interactions, recorded as sensor data, to optimize the overall energy efficiency. However, if these agents operate individually at a household level, they can not exploit the replicated structure in the problem. In this paper, we demonstrate that this problem can indeed benefit from multi-agent collaboration by making use of targeted exploration of the state-space allowing for better generalization. We also investigate trade-offs between integrating human knowledge and additional sensors. Results show that savings of over 40% are possible with collaborative multi-agent systems making use of either expert knowledge or additional sensors with no loss of occupant comfort. We find that such multi-agent systems comfortably outperform comparable single agent systems.\nClassical anomaly detection (AD) is principally concerned with point-based anomalies, anomalies that occur at a single point in time. While point-based anomalies are useful, many real-world anomalies are range-based, meaning they occur over a period of time. Therefore, applying classical point-based accuracy measures to range-based AD systems can be misleading. In this paper, we present a new mathematical model that more accurately gauges the classification correctness of AD systems for range-based anomalies. Unlike prior work, our mathematical definitions are a superset of the classical AD definitions, enabling our system to also subsume point-based anomalies. Moreover, our system is broadly generalizable and provides a number of specialization functions that can control the application's bias along a multi-dimensional axis.\nMultitask learning, i.e. learning several tasks at once with the same neural network, can improve performance in each of the tasks. Designing deep neural network architectures for multitask learning is a challenge: There are many ways to tie the tasks together, and the design choices matter. 
The size and complexity of this problem exceed human design ability, making it a compelling domain for evolutionary optimization. Using the existing state-of-the-art soft ordering architecture as the starting point, methods for evolving the modules of this architecture and for evolving the overall topology or routing between modules are evaluated in this paper. A synergetic approach of evolving custom routings with evolved, shared modules for each task is found to be very powerful, significantly improving the state of the art in the Omniglot multitask, multialphabet character recognition domain. This result demonstrates how evolution can be instrumental in advancing deep neural network and complex system design in general.\nAccurate demand forecasts can help on-line retail organizations better plan their supply-chain processes. The challenge, however, is the large number of associative factors that result in large, non-stationary shifts in demand, which traditional time series and regression approaches fail to model. In this paper, we propose a Neural Network architecture called AR-MDN that simultaneously models associative factors, time-series trends and the variance in the demand. We first identify several causal features and use a combination of feature embeddings, MLP and LSTM to represent them. We then model the output density as a learned mixture of Gaussian distributions. The AR-MDN can be trained end-to-end without the need for additional supervision. We experiment on a dataset of a year's worth of data over tens-of-thousands of products from Flipkart. The proposed architecture yields a significant improvement in forecasting accuracy when compared with existing alternatives.\nWe present a semantically rich graph representation for indoor robotic navigation. Our graph representation encodes: semantic locations such as offices or corridors as nodes, and navigational behaviors such as enter office or cross a corridor as edges. 
In particular, our navigational behaviors operate directly from visual inputs to produce motor controls and are implemented with deep learning architectures. This enables the robot to avoid explicit computation of its precise location or the geometry of the environment, and enables navigation at a higher level of semantic abstraction. We evaluate the effectiveness of our representation by simulating navigation tasks in a large number of virtual environments. Our results show that using a simple set of perceptual and navigational behaviors, the proposed approach can successfully guide the robot as it completes navigational missions such as going to a specific office. Furthermore, our implementation proves effective at controlling the selection and switching of behaviors.\nAlthough there is an emerging trend towards generating embeddings for primarily unstructured data, and recently for structured data, there is not yet any systematic suite for measuring the quality of embeddings. This deficiency is felt even more with respect to embeddings generated for structured data because there are no concrete evaluation metrics measuring the quality of encoded structure as well as semantic patterns in the embedding space. In this paper, we introduce a framework containing three distinct tasks concerned with the individual aspects of ontological concepts: (i) the categorization aspect, (ii) the hierarchical aspect, and (iii) the relational aspect. Then, in the scope of each task, a number of intrinsic metrics are proposed for evaluating the quality of the embeddings. Furthermore, w.r.t. this framework multiple experimental studies were run to compare the quality of the available embedding models. Employing this framework in future research can reduce misjudgment and provide greater insight into quality comparisons of embeddings for ontological concepts.\nIn a multi-source environment, each source has its own credibility. 
If there is no external knowledge about credibility, then we can use the information provided by the sources to assess their credibility. In this paper, we propose a way to measure conflict in a multi-source environment as a normal measure. We examine our algorithm using three simulated examples of increasing conflict and one experimental example. The results demonstrate that the proposed measure can represent conflict in a meaningful way similar to what a human might expect, and from it we can identify conflict within our sources.\nPropositional satisfiability (SAT) is at the nucleus of state-of-the-art approaches to a variety of computationally hard problems, one of which is cryptanalysis. Moreover, a number of practical applications of SAT can only be tackled efficiently by identifying and exploiting a subset of a formula's variables called a backdoor set (or simply backdoors). This paper proposes a new class of backdoor sets for SAT used in the context of cryptographic attacks, namely guess-and-determine attacks. The idea is to identify the best set of backdoor variables subject to a statistically estimated hardness of the guess-and-determine attack using a SAT solver. Experimental results on weakened variants of the renowned encryption algorithms exhibit the advantage of the proposed approach compared to the state of the art in terms of the estimated hardness of the resulting guess-and-determine attacks.\nIn this work, we provide theoretical guarantees for reward decomposition in deterministic MDPs. Reward decomposition is a special case of Hierarchical Reinforcement Learning that allows one to learn many policies in parallel and combine them into a composite solution. Our approach builds on mapping this problem into a Reward Discounted Traveling Salesman Problem, and then deriving approximate solutions for it. In particular, we focus on approximate solutions that are local, i.e., solutions that only observe information about the current state. 
Local policies are easy to implement and do not require substantial computational resources as they do not perform planning. While local deterministic policies, like Nearest Neighbor, are being used in practice for hierarchical reinforcement learning, we propose three stochastic policies that guarantee better performance than any deterministic policy.\nResidential location choice modeling is one of the substantial components of land use and transportation models. While numerous aggregated mathematical and statistical approaches have been developed to model the residence choice behavior of households, disaggregated approaches such as the agent-based modeling have shown interesting capabilities. In this article, a novel agent-based approach is developed to simulate the residential location choice of tenants in Tehran, the capital of Iran. Tenants are considered as agents who select their desired residential alternatives according to their characteristics and preferences for various criteria such as the rent, accessibility to different services and facilities, environmental pollution, and distance from their workplace and former residence. The choice set of agents is limited to their desired residential alternatives by applying a constrained NSGA-II algorithm. Then, agents compete with each other to select their final residence among their alternatives. Results of the proposed approach are validated by comparing simulated and actual residences of a sample of tenants. Results show that the proposed approach is able to accurately simulate the residence of 59.3% of tenants at the traffic analysis zone level.\nBecause of improving accessibility, transport developments play an important role in residence choice of renter households. In this paper, an agent-based model is developed to investigate impacts of different transport developments on residence choice of renter households in Tehran, the capital of Iran. 
In the proposed model, renter households are considered as agents who make a multi-objective decision and compete with each other to rent a preferred residential zone. Then, three transport development scenarios, including construction of a new highway, subway, and bus rapid transit (BRT) line, are simulated and the resulting changes in residence choice of agents are evaluated. Results show that transport development scenarios significantly affect residence choice behavior of different socio-economic categories of renter households and lead to considerable changes in the residential demand, composition of residents, mean income level and mean car ownership in their vicinities.\nThe performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the choice of the exploration policy. Existing exploration methods are mostly based on adding noise to the on-going actor policy and can only explore \\emph{local} regions close to what the actor policy dictates. In this work, we develop a simple meta-policy gradient algorithm that allows us to adaptively learn the exploration policy in DDPG. Our algorithm allows us to train flexible exploration behaviors that are independent of the actor policy, yielding a \\emph{global exploration} that significantly speeds up the learning process. With an extensive study, we show that our method significantly improves the sample-efficiency of DDPG on a variety of reinforcement learning tasks.\nHierarchical Temporal Memory (HTM) is a neuromorphic algorithm that emulates sparsity, hierarchy and modularity resembling the working principles of the neocortex. Feature encoding is an important step to create sparse binary patterns. This sparsity is introduced by the binary weights and random weight assignment in the initialization stage of the HTM. 
We propose an alternative deterministic method for the HTM initialization stage, which connects the HTM weights to the input data and preserves the natural sparsity of the input information. Further, we introduce a hardware implementation of the deterministic approach and compare it to the traditional HTM and existing hardware implementations. We test the proposed approach on the face recognition problem and show that it outperforms the conventional HTM approach.\nDeep reinforcement learning (DRL) has proven to be an effective tool for creating general video-game AI. However, most current DRL video-game agents learn end-to-end from the video-output of the game, which is superfluous for many applications and creates a number of additional problems. More importantly, directly working on pixel-based raw video data is substantially distinct from what a human player does. In this paper, we present a novel method which enables DRL agents to learn directly from object information. This is obtained via the use of an object embedding network (OEN) that compresses a set of object feature vectors of different lengths into a single fixed-length unified feature vector representing the current game-state and fulfills the DRL simultaneously. We evaluate our OEN-based DRL agent by comparing it to several state-of-the-art approaches on a selection of games from the GVG-AI Competition. Experimental results suggest that our object-based DRL agent yields performance comparable to that of those approaches used in our comparative study.\nAccurate Traffic Sign Detection (TSD) can help drivers make better decisions according to the traffic regulations. TSD, regarded as a typical small object detection problem in some way, is fundamental in the field of self-driving and advanced driver assistance systems. However, small object detection is still an open question. In this paper, we propose a human-brain-inspired network to handle this problem. 
Since attention is an essential function of our brain, we use a novel recurrent attentive neural network to improve the detection accuracy in a fine-grained manner. Further, as humans can combine domain-specific knowledge and intuitive knowledge to solve tricky tasks, we assume that the location of the traffic signs obeys the reverse Gaussian distribution, which means the location is around the central bias of every picture. Experimental results show that our methods achieve better performance than several popular methods used in object detection.\nShort-term synaptic plasticity (STSP) affects the efficiency of synaptic transmission for persistent presynaptic activities. We consider attractor neural networks, for which the attractors are given, in the absence of STSP, by cell assemblies of excitatory cliques. We show that STSP may transform these attracting states into attractor relics, inducing ongoing transient-state dynamics in terms of sequences of transiently activated cell assemblies, the former attractors. Subsequent cell assemblies may be either disjoint or partially overlapping. It may hence be possible to use the resulting dynamics for the generation of motor control sequences.\nIn this work we describe a novel deep reinforcement learning neural network architecture that allows multiple actions to be selected at every time-step. Multi-action policies allow complex behaviors to be learnt that are otherwise hard to achieve when using single action selection techniques. This work describes an algorithm that uses both imitation learning (IL) and temporal difference (TD) reinforcement learning (RL) to provide a 4x improvement in training time and 2.5x improvement in performance over single action selection TD RL. We demonstrate the capabilities of this network using a complex in-house 3D game. 
Mimicking the behavior of the expert teacher significantly improves world state exploration and allows the agent's vision system to be trained more rapidly than TD RL alone. This initial training technique kick-starts TD learning and the agent quickly learns to surpass the capabilities of the expert.\nDeep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence. We show that simple averaging of multiple points along the trajectory of SGD, with a cyclical or constant learning rate, leads to better generalization than conventional training. We also show that this Stochastic Weight Averaging (SWA) procedure finds much broader optima than SGD, and approximates the recent Fast Geometric Ensembling (FGE) approach with a single model. Using SWA we achieve notable improvement in test accuracy over conventional SGD training on a range of state-of-the-art residual networks, PyramidNets, DenseNets, and Shake-Shake networks on CIFAR-10, CIFAR-100, and ImageNet. In short, SWA is extremely easy to implement, improves generalization, and has almost no computational overhead.\nWe present a new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering. Together, these constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI. The ARC question set is partitioned into a Challenge Set and an Easy Set, where the Challenge Set contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. The dataset contains only natural, grade-school science questions (authored for human tests), and is the largest public-domain set of this kind (7,787 questions). 
We test several baselines on the Challenge Set, including leading neural models from the SQuAD and SNLI tasks, and find that none are able to significantly outperform a random baseline, reflecting the difficult nature of this task. We are also releasing the ARC Corpus, a corpus of 14M science sentences relevant to the task, and implementations of the three neural baseline models tested. Can your model perform better? We pose ARC as a challenge to the community.\nIn this study we approach the problem of distinguishing general profanity from hate speech in social media, something which has not been widely considered. Using a new dataset annotated specifically for this task, we employ supervised classification along with a set of features that includes n-grams, skip-grams and clustering-based word representations. We apply approaches based on single classifiers as well as more advanced ensemble classifiers and stacked generalization, achieving the best result of 80% accuracy for this 3-class classification task. Analysis of the results reveals that discriminating hate speech and profanity is not a simple task, which may require features that capture a deeper understanding of the text not always possible with surface n-grams. The variability of gold labels in the annotated data, due to differences in the subjective adjudications of the annotators, is also an issue. Other directions for future work are discussed.\nRearranging objects on a tabletop surface by means of nonprehensile manipulation is a task which requires skillful interaction with the physical world. Usually, this is achieved by precisely modeling physical properties of the objects, robot, and the environment for explicit planning. In contrast, as explicitly modeling the physical environment is not always feasible and involves various uncertainties, we learn a nonprehensile rearrangement strategy with deep reinforcement learning based on only visual feedback. 
For this, we model the task with rewards and train a deep Q-network. Our potential field-based heuristic exploration strategy reduces the amount of collisions which lead to suboptimal outcomes and we actively balance the training set to avoid bias towards poor examples. Our training process leads to quicker learning and better performance on the task as compared to uniform exploration and standard experience replay. We demonstrate empirical evidence from simulation that our method leads to a success rate of 85%, show that our system can cope with sudden changes of the environment, and compare our performance with human level performance.\nDeep Neural Networks (DNNs) have achieved remarkable performance in a myriad of realistic applications. However, recent studies show that well-trained DNNs can be easily misled by adversarial examples (AE) -- the maliciously crafted inputs by introducing small and imperceptible input perturbations. Existing mitigation solutions, such as adversarial training and defensive distillation, suffer from expensive retraining cost and demonstrate marginal robustness improvement against the state-of-the-art attacks like CW family adversarial examples. In this work, we propose a novel low-cost \"feature distillation\" strategy to purify the adversarial input perturbations of AEs by redesigning the popular image compression framework \"JPEG\". The proposed \"feature distillation\" wisely maximizes the malicious feature loss of AE perturbations during image compression while suppressing the distortions of benign features essential for high accurate DNN classification. 
Experimental results show that our method can drastically reduce the success rate of various state-of-the-art AE attacks by ~60% on average for both CIFAR-10 and ImageNet benchmarks without harming the testing accuracy, outperforming existing solutions like default JPEG compression and \"feature squeezing\".\nIn the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated into the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.\nGo is a struggle between two adversaries, playing black and white stones, each aiming to control the most board territory. The rules are simple, but the fighting in Go is highly intricate. Stone placement and interaction on the board appear random, much like interaction phenomena among basic elements in physics (thermodynamics), chemistry, biology, or social issues. We model Go game dynamics using an Ising model energy function, whose interaction coefficients reflect the application of rules and tactics to build long-term strategies. 
At any step of the game, the energy function of the model assesses the control and strength of a player over the board. A close fit between the model's predictions and actual game scores is obtained. AlphaGo is the current top Go player, but its behavior does not wholly reveal the nature of Go. The Ising function allows us to precisely model the stochastic evolution of Go gaming patterns and so to advance the understanding of Go's own dynamics, beyond the player's abilities. The analysis of the frequency and combination of tactics shows the formation of patterns in the groups of stones during a game, regarding the turn of each player, or if human or computer adversaries are confronted.\nThere is a need for systems to dynamically interact with ageing populations to gather information, monitor health conditions and provide support, especially after hospital discharge or in at-home settings. Several smart devices have been delivered by digital health, bundled with telemedicine systems, smartphones and other digital services. While such solutions offer personalised data and suggestions, the real disruptive step comes from the interaction of the new digital ecosystem, represented by chatbots. Chatbots will play a leading role by embodying the function of a virtual assistant and bridging the gap between patients and clinicians. Powered by AI and machine learning algorithms, chatbots are forecasted to save healthcare costs when used in place of a human or to assist them as a preliminary step of helping to assess a condition and providing self-care recommendations. This paper describes integrating chatbots into telemedicine systems intended for elderly patients after their hospital discharge. The paper discusses possible ways to utilise chatbots to assist healthcare providers and support patients with their condition.\nWe introduce MeSys, a meaning-based approach to solving English math word problems (MWPs) via understanding and reasoning in this paper. 
It first analyzes the text, transforms both body and question parts into their corresponding logic forms, and then performs inference on them. The associated context of each quantity is represented with proposed role-tags (e.g., nsubj, verb, etc.), which provides the flexibility for annotating an extracted math quantity with its associated context information (i.e., the physical meaning of this quantity). Statistical models are proposed to select the operator and operands. A noisy dataset is designed to assess if a solver solves MWPs mainly via understanding or pattern matching. Experimental results show that our approach outperforms existing systems on both benchmark datasets and the noisy dataset, which demonstrates that the proposed approach better understands the meaning of each quantity in the text.\nThe Renormalisation Group (RG) provides a framework in which it is possible to assess whether a deep-learning network is sensitive to small changes in the input data and hence prone to error, or susceptible to adversarial attack. Distinct classification outputs are associated with different RG fixed points and sensitivity to small changes in the input data is due to the presence of relevant operators at a fixed point. A numerical scheme, based on Monte Carlo RG ideas, is proposed for identifying the existence of relevant operators and the corresponding directions of greatest sensitivity in the input data. Thus, a trained deep-learning network may be tested for its robustness and, if it is vulnerable to attack, dangerous perturbations of the input data can be identified.\nIn this short paper, we consider the roles of HCI in enabling the better governance of consequential machine learning systems using the rights and obligations laid out in the recent 2016 EU General Data Protection Regulation (GDPR), a law which involves heavy interaction with people and systems. 
Focussing on those areas that relate to algorithmic systems in society, we propose roles for HCI in legal contexts in relation to fairness, bias and discrimination; data protection by design; data protection impact assessments; transparency and explanations; the mitigation and understanding of automation bias; and the communication of envisaged consequences of processing.\nWe describe an efficient, scalable machine learning library that enables very fast training of generalized linear models. We demonstrate that our library can remove the training time as a bottleneck for machine learning workloads, opening the door to a range of new applications. For instance, it allows more agile development, faster and more fine-grained exploration of the hyper-parameter space, enables scaling to massive datasets and makes frequent re-training of models possible in order to adapt to events as they occur. Our library, named Snap Machine Learning (Snap ML), combines recent advances in machine learning systems and algorithms in a nested manner to reflect the hierarchical architecture of modern distributed systems. This allows us to effectively leverage available network, memory and heterogeneous compute resources. On a terabyte-scale publicly available dataset for click-through-rate prediction in computational advertising, we demonstrate the training of a logistic regression classifier in 1.53 minutes, a 46x improvement over the fastest reported performance.\nThis work proposed a novel learning objective to train a deep neural network to perform end-to-end image pixel clustering. We applied the approach to instance segmentation, which is at the intersection of image semantic segmentation and object detection. We utilize the most fundamental property of instance labeling -- the pairwise relationship between pixels -- as the supervision to formulate the learning objective, then apply it to train a fully convolutional network (FCN) for learning to perform pixel-wise clustering. 
The resulting clusters can be used as the instance labeling directly. To support labeling of an unlimited number of instances, we further formulate ideas from graph coloring theory into the proposed learning objective. The evaluation on the Cityscapes dataset demonstrates strong performance and therefore proves the concept. Moreover, our approach won second place in the lane detection competition of the 2017 CVPR Autonomous Driving Challenge, and was the top performer without using external data.\nTo adequately model mathematical arguments, the analyst must be able to represent the mathematical objects under discussion and the relationships between them, as well as inferences drawn about these objects and relationships as the discourse unfolds. We introduce a framework with these properties, which has been applied to both mathematical dialogues and expository texts. The framework can recover salient elements of discourse at, and within, the sentence level, as well as the way mathematical content connects to form larger argumentative structures. We show how the framework might be used to support computational reasoning, and argue that it provides a more natural way to examine the process of proving theorems than do Lamport's structured proofs.\nSpiking activity of neurons engaged in learning and performing a task shows complex spatiotemporal dynamics. While the output of recurrent network models can learn to perform various tasks, the possible range of recurrent dynamics that emerge after learning remains unknown. Here we show that modifying the recurrent connectivity with a recursive least squares algorithm provides sufficient flexibility for synaptic and spiking rate dynamics of spiking networks to produce a wide range of spatiotemporal activity.
We apply the training method to learn arbitrary firing patterns, stabilize irregular spiking activity of a balanced network, and reproduce the heterogeneous spiking rate patterns of cortical neurons engaged in motor planning and movement. We identify sufficient conditions for successful learning, characterize two types of learning errors, and assess the network capacity. Our findings show that synaptically-coupled recurrent spiking networks possess a vast computational capability that can support the diverse activity patterns in the brain.\nAnswering complex questions is a time-consuming activity for humans that requires reasoning and integration of information. Recent work on reading comprehension made headway in answering simple questions, but tackling complex questions is still an ongoing research challenge. Conversely, semantic parsers have been successful at handling compositionality, but only when the information resides in a target knowledge-base. In this paper, we present a novel framework for answering broad and complex questions, assuming answering simple questions is possible using a search engine and a reading comprehension model. We propose to decompose complex questions into a sequence of simple questions, and compute the final answer from the sequence of answers. To illustrate the viability of our approach, we create a new dataset of complex questions, ComplexWebQuestions, and present a model that decomposes questions and interacts with the web to compute an answer. We empirically demonstrate that question decomposition improves performance from 20.8 precision@1 to 27.5 precision@1 on this new dataset.\nSelecting a set of alternatives based on the preferences of agents is an important problem in committee selection and beyond. Among the various criteria put forth for the desirability of a committee, Pareto optimality is a minimal and important requirement. 
As asking agents to specify their preferences over exponentially many subsets of alternatives is practically infeasible, we assume that each agent specifies a weak order on single alternatives, from which a preference relation over subsets is derived using some preference extension. We consider five prominent extensions (responsive, downward lexicographic, upward lexicographic, best, and worst). For each of them, we consider the corresponding Pareto optimality notion, and we study the complexity of computing and verifying Pareto optimal outcomes. We also consider strategic issues: for four of the set extensions, we present a linear-time, Pareto optimal and strategyproof algorithm that even works for weak preferences.\nMachine learning is a crucial aspect of artificial intelligence. This paper details an approach for quantum Hebbian learning through a batched version of quantum state exponentiation. Here, batches of quantum data are interacted with learning and processing quantum bits (qubits) by a series of elementary controlled partial swap operations, resulting in a Hamiltonian simulation of the statistical ensemble of the data. We decompose this elementary operation into one and two qubit quantum gates from the Clifford+$T$ set and use the decomposition to perform an efficiency analysis. Our construction of quantum Hebbian learning is motivated by extension from the established classical approach, and it can be used to find details about the data such as eigenvalues through phase estimation. This work contributes to the near-term development and implementation of quantum machine learning techniques.\nA common belief in model-free reinforcement learning is that methods based on random search in the parameter space of policies exhibit significantly worse sample complexity than those that explore the space of actions. 
We dispel such beliefs by introducing a random search method for training static, linear policies for continuous control problems, matching state-of-the-art sample efficiency on the benchmark MuJoCo locomotion tasks. Our method also finds a nearly optimal controller for a challenging instance of the Linear Quadratic Regulator, a classical problem in control theory, when the dynamics are not known. Computationally, our random search algorithm is at least 15 times more efficient than the fastest competing model-free methods on these benchmarks. We take advantage of this computational efficiency to evaluate the performance of our method over hundreds of random seeds and many different hyperparameter configurations for each benchmark task. Our simulations highlight a high variability in performance in these benchmark tasks, suggesting that commonly used estimations of sample efficiency do not adequately evaluate the performance of RL algorithms.\nReward shaping allows reinforcement learning (RL) agents to accelerate learning by receiving additional reward signals. However, these signals can be difficult to design manually, especially for complex RL tasks. We propose a simple and general approach that determines the reward of pre-defined events by their rarity alone. Here events become less rewarding as they are experienced more often, which encourages the agent to continually explore new types of events as it learns. The adaptiveness of this reward function results in a form of automated curriculum learning that does not have to be specified by the experimenter. We demonstrate that this Rarity of Events (RoE) approach enables the agent to succeed in challenging VizDoom scenarios without access to the extrinsic reward from the environment. Furthermore, the results demonstrate that RoE learns a more versatile policy that adapts well to critical changes in the environment. 
Rewarding events based on their rarity could help in many unsolved RL environments that are characterized by sparse extrinsic rewards but a plethora of known event types.\nThis paper presents a systematic survey on recent development of neural text generation models. Specifically, we start from recurrent neural network language models with the traditional maximum likelihood estimation training scheme and point out its shortcoming for text generation. We thus introduce the recently proposed methods for text generation based on reinforcement learning, re-parametrization tricks and generative adversarial nets (GAN) techniques. We compare different properties of these models and the corresponding techniques to handle their common problems such as gradient vanishing and generation diversity. Finally, we conduct a benchmarking experiment with different types of neural text generation models on two well-known datasets and discuss the empirical results along with the aforementioned model properties.\nThis paper describes the methodology followed to build a neural machine translation system in the biomedical domain for the English-Catalan language pair. This task can be considered a low-resourced task from the point of view of the domain and the language pair. To face this task, this paper reports experiments on a cascade pivot strategy through Spanish for the neural machine translation using the English-Spanish SCIELO and Spanish-Catalan El Peri\\'odico database. To test the final performance of the system, we have created a new test data set for English-Catalan in the biomedical domain which is freely available on request.\nResearch in human action recognition has accelerated significantly since the introduction of powerful machine learning tools such as Convolutional Neural Networks (CNNs). However, effective and efficient methods for incorporation of temporal information into CNNs are still being actively explored in the recent literature. 
Motivated by the popular recurrent attention models in the research area of natural language processing, we propose the Attention-based Temporal Weighted CNN (ATW), which embeds a visual attention model into a temporal weighted multi-stream CNN. This attention model is simply implemented as temporal weighting yet it effectively boosts the recognition performance of video representations. Besides, each stream in the proposed ATW framework is capable of end-to-end training, with both network parameters and temporal weights optimized by stochastic gradient descent (SGD) with backpropagation. Our experiments show that the proposed attention mechanism contributes substantially to the performance gains with the more discriminative snippets by focusing on more relevant video segments.\nIt has been challenging for the technical and regulatory communities to formulate requirements for trustworthiness of the cyber-physical systems (CPS) due to the complexity of the issues associated with their design, deployment, and operations. The US National Institute of Standards and Technology (NIST), through a public working group, has released a CPS Framework that adopts a broad and integrated view of CPS and positions trustworthiness among other aspects of CPS. This paper takes the model created by the CPS Framework and its further developments one step further, by applying ontological approaches and reasoning techniques in order to achieve greater understanding of CPS. The example analyzed in the paper demonstrates the enrichment of the original CPS model obtained through ontology and reasoning and its ability to deliver additional insights to the developers and operators of CPS.\nAs concerns about unfairness and discrimination in \"black box\" machine learning systems rise, a legal \"right to an explanation\" has emerged as a compellingly attractive approach for challenge and redress. 
We outline recent debates on the limited provisions in European data protection law, and introduce and analyze newer explanation rights in French administrative law and the draft modernized Council of Europe Convention 108. While individual rights can be useful, in privacy law they have historically unreasonably burdened the average data subject. \"Meaningful information\" about algorithmic logics is more technically possible than commonly thought, but this exacerbates a new \"transparency fallacy\"---an illusion of remedy rather than anything substantively helpful. While rights-based approaches deserve a firm place in the toolbox, other forms of governance, such as impact assessments, \"soft law,\" judicial review, and model repositories deserve more attention, alongside catalyzing agencies acting for users to control algorithmic system design.\nA useful computation when acting in a complex environment is to infer the marginal probabilities or most probable states of task-relevant variables. Probabilistic graphical models can efficiently represent the structure of such complex data, but performing these inferences is generally difficult. Message-passing algorithms, such as belief propagation, are a natural way to disseminate evidence amongst correlated variables while exploiting the graph structure, but these algorithms can struggle when the conditional dependency graphs contain loops. Here we use Graph Neural Networks (GNNs) to learn a message-passing algorithm that solves these inference tasks. We first show that the architecture of GNNs is well-matched to inference tasks. We then demonstrate the efficacy of this inference approach by training GNNs on an ensemble of graphical models and showing that they substantially outperform belief propagation on loopy graphs. 
Our message-passing algorithms generalize out of the training set to larger graphs and graphs with different structure.\nExisting research studies on vision and language grounding for robot navigation focus on improving model-free deep reinforcement learning (DRL) models in synthetic environments. However, model-free DRL models do not consider the dynamics in the real-world environments, and they often fail to generalize to new scenes. In this paper, we take a radical approach to bridge the gap between synthetic studies and real-world practices---We propose a novel, planned-ahead hybrid reinforcement learning model that combines model-free and model-based reinforcement learning to solve a real-world vision-language navigation task. Our look-ahead module tightly integrates a look-ahead policy model with an environment model that predicts the next state and the reward. Experimental results suggest that our proposed method significantly outperforms the baselines and achieves the best on the real-world Room-to-Room dataset. Moreover, our scalable method is more generalizable when transferring to unseen environments, and the relative success rate is increased by 15.5% on the unseen test set.\nRecently, increasing attention has been directed to the study of the speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate the content differences. However, the expression of speech emotion is a dynamic process, which is reflected through dynamic durations, energies, and some other prosodic information when one speaks. In this paper, a novel local dynamic pitch probability distribution feature, which is obtained by drawing the histogram, is proposed to improve the accuracy of speech emotion recognition. Compared with most of the previous works using global features, the proposed method takes advantage of the local dynamic information conveyed by the emotional speech. 
Several experiments on the Berlin Database of Emotional Speech are conducted to verify the effectiveness of the proposed method. The experimental results demonstrate that the local dynamic information obtained with the proposed method is more effective for speech emotion recognition than the traditional global features.\nDecades of research on the neural code underlying spatial navigation have revealed a diverse set of neural response properties. The Entorhinal Cortex (EC) of the mammalian brain contains a rich set of spatial correlates, including grid cells which encode space using tessellating patterns. However, the mechanisms and functional significance of these spatial representations remain largely mysterious. As a new way to understand these neural representations, we trained recurrent neural networks (RNNs) to perform navigation tasks in 2D arenas based on velocity inputs. Surprisingly, we find that grid-like spatial response patterns emerge in trained networks, along with units that exhibit other spatial correlates, including border cells and band-like cells. All these different functional types of neurons have been observed experimentally. The order of the emergence of grid-like and border cells is also consistent with observations from developmental studies. Together, our results suggest that grid cells, border cells and others as observed in EC may be a natural solution for representing space efficiently given the predominant recurrent connections in the neural circuits.\nAutomatic human action recognition is indispensable for almost all artificial intelligence systems such as video surveillance, human-computer interfaces, video retrieval, etc. Despite a lot of progress, recognizing actions in an unknown video is still a challenging task in computer vision. Recently, deep learning algorithms have proved their great potential in many vision-related recognition tasks.
In this paper, we propose the use of Deep Residual Neural Networks (ResNets) to learn and recognize human action from skeleton data provided by Kinect sensor. Firstly, the body joint coordinates are transformed into 3D-arrays and saved in RGB images space. Five different deep learning models based on ResNet have been designed to extract image features and classify them into classes. Experiments are conducted on two public video datasets for human action recognition containing various challenges. The results show that our method achieves the state-of-the-art performance comparing with existing approaches.\nWe consider the problem of metric learning for multi-view data and present a novel method for learning within-view as well as between-view metrics in vector-valued kernel spaces, as a way to capture multi-modal structure of the data. We formulate two convex optimization problems to jointly learn the metric and the classifier or regressor in kernel feature spaces. An iterative three-step multi-view metric learning algorithm is derived from the optimization problems. In order to scale the computation to large training sets, a block-wise Nystr{\\\"o}m approximation of the multi-view kernel matrix is introduced. We justify our approach theoretically and experimentally, and show its performance on real-world datasets against relevant state-of-the-art methods.\nKnowledge Graph Embedding methods aim at representing entities and relations in a knowledge base as points or vectors in a continuous vector space. Several approaches using embeddings have shown promising results on tasks such as link prediction, entity recommendation, question answering, and triplet classification. However, only a few methods can compute low-dimensional embeddings of very large knowledge bases. In this paper, we propose KG2Vec, a novel approach to Knowledge Graph Embedding based on the skip-gram model. 
Instead of using a predefined scoring function, we learn it relying on Long Short-Term Memories. We evaluated the goodness of our embeddings on knowledge graph completion and show that KG2Vec is comparable to the quality of the scalable state-of-the-art approaches and can process large graphs by parsing more than a hundred million triples in less than 6 hours on common hardware.\nFormal Concept Analysis and its associated conceptual structures have been used to support exploratory search through conceptual navigation. Relational Concept Analysis (RCA) is an extension of Formal Concept Analysis to process relational datasets. RCA and its multiple interconnected structures represent good candidates to support exploratory search in relational datasets, as they are enabling navigation within a structure as well as between the connected structures. However, building the entire structures does not present an efficient solution to explore a small localised area of the dataset, for instance to retrieve the closest alternatives to a given query. In these cases, generating only a concept and its neighbour concepts at each navigation step appears as a less costly alternative. In this paper, we propose an algorithm to compute a concept and its neighbourhood in extended concept lattices. The concepts are generated directly from the relational context family, and possess both formal and relational attributes. The algorithm takes into account two RCA scaling operators. We illustrate it on an example.\nDynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that word co-occurrence statistics change continuously and therefore impose continuous stochastic process priors on their model parameters. These dynamical priors make inference much harder than in regular topic models, and also limit scalability. In this paper, we present several new results around DTMs. 
First, we extend the class of tractable priors from Wiener processes to the generic class of Gaussian processes (GPs). This allows us to explore topics that develop smoothly over time, that have a long-term memory or are temporally concentrated (for event detection). Second, we show how to perform scalable approximate inference in these models based on ideas around stochastic variational inference and sparse Gaussian processes. This way we can train a rich family of DTMs to massive data. Our experiments on several large-scale datasets show that our generalized model allows us to find interesting patterns that were not accessible by previous approaches.\nSelfridge, along with Sutherland and Marr provided some of the earliest proposals for how to program computers to recognize shapes. Their emphasis on filtering for contour features, especially the orientation of boundary segments, was reinforced by the Nobel Prize winning work of Hubel & Wiesel who discovered that neurons in primary visual cortex selectively respond as a function of contour orientation. Countless investigators and theorists have continued to build on this approach. These models are often described as neuromorphic, which implies that the computational methods are based on biologically plausible principles. Recent work from the present lab has challenged the emphasis on orientation selectivity and the use of neural network principles. The goal of the present report is not to relitigate those issues, but to provide an alternative concept for encoding of shape information that may be useful to neuromorphic modelers.\nIn this paper, we study the problem of image-text matching. Inferring the latent semantic alignment between objects or other salient stuffs (e.g. snow, sky, lawn) and the corresponding words in sentences allows to capture fine-grained interplay between vision and language, and makes image-text matching more interpretable. 
Prior works either simply aggregate the similarity of all possible pairs of regions and words without attending differentially to more and less important words or regions, or use a multi-step attentional process to capture limited number of semantic alignments which is less interpretable. In this paper, we present Stacked Cross Attention to discover the full latent alignments using both image regions and words in sentence as context and infer the image-text similarity. Our approach achieves the state-of-the-art results on the MS-COCO and Flickr30K datasets. On Flickr30K, our approach outperforms the current best methods by 22.1% in text retrieval from image query, and 18.2% in image retrieval with text query (based on Recall@1). On MS-COCO, our approach improves sentence retrieval by 17.8% and image retrieval by 16.6% (based on Recall@1 using the 5K test set).\nWe revisit the Blind Deconvolution problem with a focus on understanding its robustness and convergence properties. Provable robustness to noise and other perturbations is receiving recent interest in vision, from obtaining immunity to adversarial attacks to assessing and describing failure modes of algorithms in mission critical applications. Further, many blind deconvolution methods based on deep architectures internally make use of or optimize the basic formulation, so a clearer understanding of how this sub-module behaves, when it can be solved, and what noise injection it can tolerate is a first order requirement. We derive new insights into the theoretical underpinnings of blind deconvolution. The algorithm that emerges has nice convergence guarantees and is provably robust in a sense we formalize in the paper. Interestingly, these technical results play out very well in practice, where on standard datasets our algorithm yields results competitive with or superior to the state of the art. 
Keywords: blind deconvolution, robust continuous optimization\nConsidered as a data-driven approach, Fingerprinting Localization Solutions (FPSs) enjoy huge popularity due to their good performance and minimal environment information requirement. This paper addresses applications of artificial intelligence to solve two problems in Received Signal Strength Indicator (RSSI)-based FPS: first, the cumbersome training database construction, and second, the extrapolation of fingerprinting algorithms to similar buildings with slight environmental changes. After a concise overview of deep learning design techniques, two main techniques widely used in deep learning are exploited for the above-mentioned issues, namely data augmentation and transfer learning. We train a multi-layer neural network that learns the mapping from the observations to the locations. A data augmentation method is proposed to increase the training database size based on the structure of RSSI measurements, hence effectively reducing the amount of training data that must be collected. Then it is shown experimentally how a model trained for a particular building can be transferred to a similar one by fine-tuning with significantly fewer training samples. The paper implicitly discusses new guidelines to consider for deep learning designs when they are employed in a new application context.\nLearning-based methods have been successful in solving complex control tasks without significant prior knowledge about the system. However, these methods typically do not provide any safety guarantees, which prevents their use in safety-critical, real-world applications. In this paper, we present a learning-based model predictive control scheme that provides provable high-probability safety guarantees. To this end, we exploit regularity assumptions on the dynamics in terms of a Gaussian process prior to construct provably accurate confidence intervals on predicted trajectories.
Unlike previous approaches, we do not assume that model uncertainties are independent. Based on these predictions, we guarantee that trajectories satisfy safety constraints. Moreover, we use a terminal set constraint to recursively guarantee the existence of safe control actions at every iteration. In our experiments, we show that the resulting algorithm can be used to safely and efficiently explore and learn about dynamic systems.\nCultural adaptation, i.e., the matching of a robot's behaviours to the cultural norms and preferences of its user, is a well known key requirement for the success of any assistive application. However, culture-dependent robot behaviours are often implicitly set by designers, thus not allowing for an easy and automatic adaptation to different cultures. This paper presents a method for the design of culture-aware robots, that can automatically adapt their behaviour to conform to a given culture. We propose a mapping from cultural factors to related parameters of robot behaviours which relies on linguistic variables to encode heterogeneous cultural factors in a uniform formalism, and on fuzzy rules to encode qualitative relations among multiple variables. We illustrate the approach in two practical case studies.\nMotivated by Supervised Opinion Analysis, we propose a novel framework devoted to Structured Output Learning with Abstention (SOLA). The structure prediction model is able to abstain from predicting some labels in the structured output at a cost chosen by the user in a flexible way. For that purpose, we decompose the problem into the learning of a pair of predictors, one devoted to structured abstention and the other, to structured output prediction. To compare fully labeled training data with predictions potentially containing abstentions, we define a wide class of asymmetric abstention-aware losses. 
Learning is achieved by surrogate regression in an appropriate feature space while prediction with abstention is performed by solving a new pre-image problem. Thus, SOLA extends recent ideas about Structured Output Prediction via surrogate problems and calibration theory and enjoys statistical guarantees on the resulting excess risk. Instantiated on a hierarchical abstention-aware loss, SOLA is shown to be relevant for fine-grained opinion mining and gives state-of-the-art results on this task. Moreover, the abstention-aware representations can be used to competitively predict user-review ratings based on a sentence-level opinion predictor.\nConversational agents have become ubiquitous, ranging from goal-oriented systems for helping with reservations to chit-chat models found in modern virtual assistants. In this survey paper, we explore this fascinating field. We look at some of the pioneering work that defined the field and gradually move to the current state-of-the-art models. We look at statistical, neural, generative adversarial network based and reinforcement learning based approaches and how they evolved. Along the way we discuss various challenges that the field faces, lack of context in utterances, not having a good quantitative metric to compare models, lack of trust in agents because they do not have a consistent persona etc. We structure this paper in a way that answers these pertinent questions and discusses competing approaches to solve them.\nDeep reinforcement learning has been successfully applied to several visual-input tasks using model-free methods. In this paper, we propose a model-based approach that combines learning a DNN-based transition model with Monte Carlo tree search to solve a block-placing task in Minecraft. Our learned transition model predicts the next frame and the rewards one step ahead given the last four frames of the agent's first-person-view image and the current action. 
Then a Monte Carlo tree search algorithm uses this model to plan the best sequence of actions for the agent to perform. On the proposed task in Minecraft, our model-based approach reaches performance comparable to that of the Deep Q-Network, but learns faster and is thus more sample efficient in training.\nUnseen Action Recognition (UAR) aims to recognise novel action categories without training examples. While previous methods focus on inner-dataset seen/unseen splits, this paper proposes a pipeline using a large-scale training source to achieve a Universal Representation (UR) that can generalise to a more realistic Cross-Dataset UAR (CD-UAR) scenario. We first address UAR as a Generalised Multiple-Instance Learning (GMIL) problem and discover 'building-blocks' from the large-scale ActivityNet dataset using distribution kernels. Essential visual and semantic components are preserved in a shared space to achieve the UR that can efficiently generalise to new datasets. Predicted UR exemplars can be improved by a simple semantic adaptation, and then an unseen action can be directly recognised using UR at test time. Extensive experiments show that, without further training, the approach achieves significant improvements on the UCF101 and HMDB51 benchmarks.\nWe present a method for generating colored 3D shapes from natural language. To this end, we first learn joint embeddings of freeform text descriptions and colored 3D shapes. Our model combines and extends learning by association and metric learning approaches to learn implicit cross-modal connections, and produces a joint representation that captures the many-to-many relations between language and physical properties of 3D shapes such as color and shape. To evaluate our approach, we collect a large dataset of natural language descriptions for physical 3D objects in the ShapeNet dataset. With this learned joint embedding we demonstrate text-to-shape retrieval that outperforms baseline approaches. 
Using our embeddings with a novel conditional Wasserstein GAN framework, we generate colored 3D shapes from text. Our method is the first to connect natural language text with realistic 3D objects exhibiting rich variations in color, texture, and shape detail. See video at https://youtu.be/zraPvRdl13Q\nWe propose an effective way to create interpretable control agents, by re-purposing the function of a biological neural circuit model, to govern simulated and real-world reinforcement learning (RL) test-beds. We model the tap-withdrawal (TW) neural circuit of the nematode, C. elegans, a circuit responsible for the worm's reflexive response to external mechanical touch stimulations, and learn its synaptic and neuronal parameters as a policy for controlling basic RL tasks. We also autonomously park a real rover robot on a pre-defined trajectory, by deploying such neuronal circuit policies learned in a simulated environment. To repurpose the TW neural circuit, we adopt a search-based RL algorithm. We show that our neuronal policies perform as well as deep neural network policies, with the advantage of realizing interpretable dynamics at the cell level.\nPredictive process monitoring is concerned with the analysis of events produced during the execution of a process in order to predict the future state of ongoing cases thereof. Existing techniques in this field are able to predict, at each step of a case, the likelihood that the case will end up in an undesired outcome. These techniques, however, do not take into account what process workers may do with the generated predictions in order to decrease the likelihood of undesired outcomes. This paper proposes a framework for prescriptive process monitoring, which extends predictive monitoring approaches with concepts of alarms, interventions, compensations, and mitigation effects. 
The framework incorporates a parameterized cost model to assess the cost-benefit tradeoffs of applying prescriptive process monitoring in a given setting. The paper also outlines an approach to optimize the generation of alarms given a dataset and a set of cost model parameters. The proposed approach is empirically evaluated using a range of real-life event logs.\nRandom Differential Equations provide a natural extension of Ordinary Differential Equations to the stochastic setting. We show how, and under which conditions, every equilibrium state of a Random Differential Equation (RDE) can be described by a Structural Causal Model (SCM), while preserving the causal semantics. This provides an SCM that captures the stochastic and causal behavior of the RDE, which can model both cycles and confounders. This enables the study of the equilibrium states of the RDE by applying the theory and statistical tools available for SCMs, for example, marginalizations and Markov properties, as we illustrate by means of an example. Our work thus provides a direct connection between two fields that so far have been developing in isolation.\nThis paper introduces an innovative approach for handling 2D compound hypotheses within the Belief Function Theory framework. We propose a polygon-based generic representation which relies on polygon clipping operators. This approach allows the computational cost to reflect the precision of the representation, independently of the cardinality of the discernment frame. For the BBA combination and decision making, we propose efficient algorithms which rely on hashes for fast lookup, and on a topological ordering of the focal elements within a directed acyclic graph encoding their interconnections. Additionally, an implementation of the functionalities proposed in this paper is provided as an open-source library. Experimental results on a pedestrian localization problem are reported. 
The experiments show that the solution is accurate and that it fully benefits from the scalability of the 2D search space granularity provided by our representation.\nChu Spaces and Channel Theory are well established areas of investigation in the general context of category theory. We review a range of examples and applications of these methods in logic and computer science, including Formal Concept Analysis, distributed systems and ontology development. We then employ these methods to describe human object perception, beginning with the construction of uncategorized object files and proceeding through categorization, individual object identification and the tracking of object identity through time. We investigate the relationship between abstraction and mereological categorization, particularly as these affect object identity tracking. This we accomplish in terms of information flow that is semantically structured in terms of local logics, while at the same time this framework also provides an inferential mechanism towards identification and perception. We show how a mereotopology naturally emerges from the representation of classifications by simplicial complexes, and briefly explore the emergence of geometric relations and interactions between objects.\nThe increasing use of electronic forms of communication presents new opportunities in the study of mental health, including the ability to investigate the manifestations of psychiatric diseases unobtrusively and in the setting of patients' daily lives. A pilot study to explore the possible connections between bipolar affective disorder and mobile phone usage was conducted. In this study, participants were provided a mobile phone to use as their primary phone. This phone was loaded with a custom keyboard that collected metadata consisting of keypress entry time and accelerometer movement. Individual character data with the exceptions of the backspace key and space bar were not collected due to privacy concerns. 
We propose an end-to-end deep architecture based on late fusion, named DeepMood, to model the multi-view metadata for the prediction of mood scores. Experimental results show that 90.31% prediction accuracy on the depression score can be achieved based on session-level mobile phone typing dynamics, where a session typically lasts less than one minute. This demonstrates the feasibility of using mobile phone metadata to infer mood disturbance and severity.\nWe propose an algorithm to predict room layout from a single image that generalizes across panoramas and perspective images, cuboid layouts and more general layouts (e.g., L-shaped rooms). Our method operates directly on the panoramic image, rather than decomposing it into perspective images as recent works do. Our network architecture is similar to that of RoomNet, but we show improvements due to aligning the image based on vanishing points, predicting multiple layout elements (corners, boundaries, size and translation), and fitting a constrained Manhattan layout to the resulting predictions. Our method compares well in speed and accuracy to other existing work on panoramas, achieves among the best accuracy for perspective images, and can handle both cuboid-shaped and more general Manhattan layouts.\nDeep Convolutional Neural Networks have become a Swiss Army knife for solving critical artificial intelligence tasks. However, deploying deep CNN models for latency-critical tasks remains challenging because of the complex nature of CNNs. Recently, the FPGA has become a favorable device for accelerating deep CNNs thanks to its high parallel processing capability and energy efficiency. In this work, we explore different fast convolution algorithms including Winograd and Fast Fourier Transform (FFT), and find an optimal strategy to apply them together on different types of convolutions. We also propose an optimization scheme to exploit parallelism on novel CNN architectures such as Inception modules in GoogLeNet. 
We implement a configurable IP-based face recognition acceleration system based on FaceNet using High-Level Synthesis. Our implementation on a Xilinx Ultrascale device achieves 3.75x latency speedup compared to a high-end NVIDIA GPU and surpasses previous FPGA results significantly.\nCurrently there is no standard way to identify how a dataset was created, and what characteristics, motivations, and potential skews it represents. To begin to address this issue, we propose the concept of a datasheet for datasets, a short document to accompany public datasets, commercial APIs, and pretrained models. The goal of this proposal is to enable better communication between dataset creators and users, and help the AI community move toward greater transparency and accountability. By analogy, in computer hardware, it has become industry standard to accompany everything from the simplest components (e.g., resistors), to the most complex microprocessor chips, with datasheets detailing standard operating characteristics, test results, recommended usage, and other information. We outline some of the questions a datasheet for datasets should answer. These questions focus on when, where, and how the training data was gathered, its recommended use cases, and, in the case of human-centric datasets, information regarding the subjects' demographics and consent as applicable. We develop prototypes of datasheets for two well-known datasets: Labeled Faces in The Wild~\\cite{lfw} and the Pang \\& Lee Polarity Dataset~\\cite{polarity}.\nWe propose MRU (Multi-Range Reasoning Units), a new fast compositional encoder for machine comprehension (MC). Our proposed MRU encoders are characterized by multi-ranged gating, executing a series of parameterized contract-and-expand layers for learning gating vectors that benefit from long and short-term dependencies. 
The aims of our approach are as follows: (1) learning representations that are concurrently aware of long- and short-term context, (2) modeling relationships between intra-document blocks and (3) fast and efficient sequence encoding. We show that our proposed encoder demonstrates promising results both as a standalone encoder and as a complementary building block. We conduct extensive experiments on three challenging MC datasets, namely RACE, SearchQA and NarrativeQA, achieving highly competitive performance on all. On the RACE benchmark, our model outperforms DFN (Dynamic Fusion Networks) by 1.5%-6% without using any recurrent or convolution layers. Similarly, we achieve competitive performance relative to AMANDA on the SearchQA benchmark and BiDAF on the NarrativeQA benchmark without using any LSTM/GRU layers. Finally, incorporating MRU encoders with standard BiLSTM architectures further improves performance, achieving state-of-the-art results.\nDesigners of autonomous agents, whether in physical or virtual environments, need to express nondeterminism, failure, and parallelism in behaviors, as well as to account for synchronous coordination between agents. Behavior Trees are a semi-formalism deployed widely for this purpose in the games industry, but with challenges to scalability, reasoning, and reuse of common sub-behaviors. We present an alternative formulation of behavior trees through a language design perspective, giving a formal operational semantics, type system, and corresponding implementation. We express specifications for atomic behaviors as linear logic formulas describing how they transform the environment, and our type system uses linear sequent calculus to derive a compositional type assignment to behavior tree expressions. 
These types expose the conditions required for behaviors to succeed and allow abstraction over parameters to behaviors, enabling the development of behavior \"building blocks\" amenable to compositional reasoning and reuse.\nWe present a neural model for representing snippets of code as continuous distributed vectors. The main idea is to represent code as a collection of paths in its abstract syntax tree, and aggregate these paths, in a smart and scalable way, into a single fixed-length $\\textit{code vector}$, which can be used to predict semantic properties of the snippet.   We demonstrate the effectiveness of our approach by using it to predict a method's name from the vector representation of its body. We evaluate our approach by training a model on a dataset of $14$M methods. We show that code vectors trained on this dataset can predict method names from files that were completely unobserved during training. Furthermore, we show that our model learns useful method name vectors that capture semantic similarities, combinations, and analogies.   Comparing previous techniques over the same data set, our approach obtains a relative improvement of over $75\\%$, being the first to successfully predict method names based on a large, cross-project, corpus.\nThe aggregate behaviors of users can collectively encode deep semantic information about the objects with which they interact. In this paper, we demonstrate novel ways in which the synthesis of these data can illuminate the terrain of users' environment and support them in their decision making and wayfinding. A novel application of Recurrent Neural Networks and skip-gram models, approaches popularized by their application to modeling language, are brought to bear on student university enrollment sequences to create vector representations of courses and map out traversals across them. 
We present demonstrations of how scrutability from these neural networks can be gained and how the combination of these techniques can be seen as an evolution of content tagging and a means for a recommender to balance user preferences inferred from data with those explicitly specified. From validation of the models to the development of a UI, we discuss additional requisite functionality informed by the results of a field study leading to the ultimate deployment of the system at a university.\nModels of intrinsic motivation present an important means to produce sensible behaviour in the absence of extrinsic rewards. Applications in video games are varied, and range from intrinsically motivated general game-playing agents to non-player characters such as companions and enemies. The information-theoretic quantity of Empowerment is a particularly promising candidate motivation to produce believable, generic and robust behaviour. However, while it can be used in the absence of external reward functions that would need to be crafted and learned, empowerment is computationally expensive. In this paper, we propose a modified UCT tree search method to mitigate empowerment's computational complexity in discrete and deterministic scenarios. We demonstrate how to modify a Monte-Carlo Search Tree with UCT to realise empowerment maximisation, and discuss three additional modifications that facilitate better sampling. We evaluate the approach both quantitatively, by analysing how close our approach gets to the baseline of exhaustive empowerment computation with varying amounts of computational resources, and qualitatively, by analysing the resulting behaviour in a Minecraft-like scenario.\nDue to the intractable partition function, the exact likelihood function for a Markov random field (MRF), in many situations, can only be approximated. Major approximation approaches include pseudolikelihood and Laplace approximation. 
In this paper, we propose a novel way of approximating the likelihood function through first approximating the marginal likelihood functions of individual parameters and then reconstructing the joint likelihood function from these marginal likelihood functions. For approximating the marginal likelihood functions, we derive a particular likelihood function from a modified scenario of coin tossing which is useful for capturing how one parameter interacts with the remaining parameters in the likelihood function. For reconstructing the joint likelihood function, we use an appropriate copula to link up these marginal likelihood functions. Numerical investigation suggests the superior performance of our approach. In particular, as the size of the MRF increases, both the numerical performance and the computational cost of our approach remain consistently satisfactory, whereas Laplace approximation deteriorates and pseudolikelihood becomes computationally unbearable.\nWe propose the Image-Semantic-Transformation-Reconstruction-Circle (ISTRC) model, a novel and powerful method that uses FaceNet's Euclidean latent space to understand images. As the name suggests, ISTRC constructs a circle that is able to perfectly reconstruct images. One powerful Euclidean latent space embedded in ISTRC is FaceNet's last layer, with its power of distinguishing and understanding images. Our model reconstructs images and manipulates Euclidean latent vectors to achieve semantic transformations and semantic image arithmetic. In this paper, we show that ISTRC performs 10 high-level semantic transformations such as \"Male and female\", \"add smile\", \"open mouth\", \"deduct beard or add mustache\", \"bigger/smaller nose\", \"make older and younger\", \"bigger lips\", \"bigger eyes\", \"bigger/smaller mouths\" and \"more attractive\". 
It takes just 3 hours (GTX 1080) to train the models for the 10 semantic transformations.\nUnfair pricing policies have been shown to be one of the most negative perceptions customers can have concerning pricing, and may result in long-term losses for a company. Despite the fact that dynamic pricing models help companies maximize revenue, fairness and equality should be taken into account in order to avoid unfair price differences between groups of customers. This paper shows how to solve dynamic pricing by using Reinforcement Learning (RL) techniques so that prices are maximized while keeping a balance between revenue and fairness. We demonstrate that RL provides two main features to support fairness in dynamic pricing: on the one hand, RL is able to learn from recent experience, adapting the pricing policy to complex market environments; on the other hand, it provides a trade-off between short- and long-term objectives, hence integrating fairness into the model's core. Considering these two features, we propose the application of RL for revenue optimization, with the additional integration of fairness as part of the learning procedure by using Jain's index as a metric. Results in a simulated environment show a significant improvement in fairness while at the same time maintaining optimisation of revenue.\nThis paper introduces a method, based on deep reinforcement learning, for automatically generating a general-purpose decision-making function. A Deep Q-Network agent was trained in a simulated environment to handle speed and lane-change decisions for a truck-trailer combination. In a highway driving case, it is shown that the method produced an agent that matched or surpassed the performance of a commonly used reference model. To demonstrate the generality of the method, the exact same algorithm was also tested by training it for an overtaking case on a road with oncoming traffic. 
Furthermore, a novel way of applying a convolutional neural network to high-level input that represents interchangeable objects is also introduced.\nIn computer vision, one is often confronted with problems of domain shifts, which occur when one applies a classifier trained on a source dataset to target data sharing similar characteristics (e.g., the same classes), but also different latent data structures (e.g., different acquisition conditions). In such a situation, the model will perform poorly on the new data, since the classifier is specialized to recognize visual cues specific to the source domain. In this work we explore a solution, named DeepJDOT, to tackle this problem: through a measure of discrepancy on joint deep representations/labels based on optimal transport, we not only learn new data representations aligned between the source and target domain, but also simultaneously preserve the discriminative information used by the classifier. We applied DeepJDOT to a series of visual recognition tasks, where it compares favorably against state-of-the-art deep domain adaptation methods.\nSpeech recognition has received less attention in Bengali literature due to the lack of a comprehensive dataset. In this paper, we describe the development process of the first comprehensive Bengali speech dataset on real numbers. It covers all the possible words that may arise in uttering any Bengali real number. The corpus has ten native Bengali speakers from different regions. It comprises more than two thousand speech samples with a total duration of close to four hours. We also provide a deep analysis of our corpus, highlight some of its notable features, and finally evaluate the performance of two notable Bengali speech recognizers on it.\nGoals for reinforcement learning problems are typically defined through hand-specified rewards. 
To design such problems, developers of learning algorithms must inherently be aware of what the task goals are, yet we often require agents to discover them on their own without any supervision beyond these sparse rewards. While much of the power of reinforcement learning derives from the concept that agents can learn with little guidance, this requirement greatly burdens the training process. If we relax this one restriction and endow the agent with knowledge of the reward function, and in particular of the goal, we can leverage backwards induction to accelerate training. To achieve this, we propose training a model to learn to take imagined reversal steps from known goal states. Rather than training an agent exclusively to determine how to reach a goal while moving forwards in time, our approach travels backwards to jointly predict how we got there. We evaluate our work in Gridworld and Towers of Hanoi and empirically demonstrate that it yields better performance than standard DDQN.\nThis paper uses neuroevolution of augmenting topologies to evolve control tactics for groups of units in real-time strategy games. In such games, players build economies to generate armies composed of multiple types of units with different attack and movement characteristics to combat each other. This paper evolves neural networks to control movement and attack commands, also called micro, for a group of ranged units skirmishing with a group of melee units. Our results show that neuroevolution of augmenting topologies can effectively generate neural networks capable of good micro for our ranged units against a group of hand-coded melee units. The evolved neural networks lead to kiting behavior for the ranged units which is a common tactic used by professional players in ranged versus melee skirmishes in popular real-time strategy games like Starcraft. The evolved neural networks also generalized well to other starting positions and numbers of units. 
We believe these results indicate the potential of neuroevolution for generating effective micro in real-time strategy games.\nCan deep learning (DL) guide our understanding of computations happening in the biological brain? We will first briefly consider how DL has contributed to the research on visual object recognition. In the main part we will assess whether DL could also help us to clarify the computations underlying higher cognitive functions such as Theory of Mind. In addition, we will compare the objectives and learning signals of brains and machines, leading us to conclude that simply scaling up the current DL algorithms will not lead to human-level mindreading skills. We then provide some insights about how to fairly compare human and DL performance. In the end we find that DL can contribute to our understanding of biological computations by providing an example of an end-to-end algorithm that solves the same problems the biological agents face.\nQuantified modal logic provides a natural logical language for reasoning about modal attitudes even while retaining the richness of quantification for referring to predicates over domains. However, most fragments of the logic are undecidable over many model classes. Over the years, only a few fragments (such as the monodic) have been shown to be decidable. In this paper, we study fragments that bundle quantifiers and modalities together, inspired by earlier work on epistemic logics of know-how/why/what. As always with quantified modal logics, it makes a significant difference whether the domain stays the same across worlds, or not. In particular, we show that the bundle $\\forall \\Box$ is undecidable over constant domain interpretations, even with only monadic predicates, whereas the $\\exists \\Box$ bundle is decidable. On the other hand, over increasing domain interpretations, we get decidability with both $\\forall \\Box$ and $\\exists \\Box$ bundles with unrestricted predicates. 
In these cases, we also obtain tableau-based procedures that run in \\PSPACE. We further show that the $\\exists \\Box$ bundle cannot distinguish between constant domain and increasing domain interpretations.\nThe CHiME challenge series aims to advance robust automatic speech recognition (ASR) technology by promoting research at the interface of speech and language processing, signal processing, and machine learning. This paper introduces the 5th CHiME Challenge, which considers the task of distant multi-microphone conversational ASR in real home environments. Speech material was elicited using a dinner party scenario with efforts taken to capture data that is representative of natural conversational speech and recorded by 6 Kinect microphone arrays and 4 binaural microphone pairs. The challenge features a single-array track and a multiple-array track and, for each track, distinct rankings will be produced for systems focusing on robustness with respect to distant-microphone capture vs. systems attempting to address all aspects of the task including conversational language modeling. We discuss the rationale for the challenge and provide a detailed description of the data collection procedure, the task, and the baseline systems for array synchronization, speech enhancement, and conventional and end-to-end ASR.\nWith super-resolution optical microscopy, it is now possible to observe molecular interactions in living cells. The obtained images have a very high spatial precision but their overall quality can vary a lot depending on the structure of interest and the imaging parameters. Moreover, evaluating this quality is often difficult for non-expert users. In this work, we tackle the problem of learning the quality function of super-resolution images from scores provided by experts. More specifically, we propose a system based on a deep neural network that can provide a quantitative quality measure of a STED image of neuronal structures given as input. 
We conduct a user study in order to evaluate the quality of the predictions of the neural network against those of a human expert. Results show the potential of the proposed approach while highlighting some of its limits.\nWe propose a generalization of the best-arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample complexity of standard algorithms, but can be offset if we have access to partial feedback received before a pull is completed. We propose a general framework to model the relationship between partial and delayed feedback, and as a special case we introduce efficient algorithms for settings where the partial feedback is a biased or unbiased estimator of the delayed feedback. Additionally, we propose a novel extension of the algorithms to the parallel MAB setting where an agent can control a batch of arms. Our experiments in real-world settings, involving policy search and hyperparameter optimization in computational sustainability domains for fast charging of batteries and wildlife corridor construction, demonstrate that exploiting the structure of partial feedback can lead to significant improvements over baselines in both sequential and parallel MAB.\nAs multicore computing is now standard, it seems irresponsible for constraints researchers to ignore the implications of it. Researchers need to address a number of issues to exploit parallelism, such as: investigating which constraint algorithms are amenable to parallelisation; whether to use shared memory or distributed computation; whether to use static or dynamic decomposition; and how to best exploit portfolios and cooperating search. We review the literature, and see that we can sometimes do quite well, some of the time, on some instances, but we are far from a general solution. 
Yet there seems to be little overall guidance that can be given on how best to exploit multicore computers to speed up constraint solving. We hope at least that this survey will provide useful pointers to future researchers wishing to correct this situation. Under consideration in Theory and Practice of Logic Programming (TPLP).\nRear-end collision warning systems play a major role in enhancing driving safety. Such a system uses several measures to estimate the danger and warns drivers to be more cautious. Its processing must run in real time, leaving enough time and distance to avoid a collision with the vehicle in front. To this end, in this paper a new system is developed using a random forest classifier. To evaluate the performance of the proposed system, vehicle trajectory data from the 100-Car database of the Virginia Tech Transportation Institute are used, and the methods are compared based on their accuracy and their processing time. Using the TOPSIS multi-criteria selection method, we show that the implemented classifier outperforms other classifiers, including Bayesian network, naive Bayes, MLP neural network, support vector machine, nearest neighbor, rule-based methods and decision tree. The presented experiments reveal that the random forest is an acceptable algorithm for the proposed driver assistance system, with 88.4% accuracy for detecting warning situations and 94.7% for detecting safe situations.\nSMOTE is an oversampling technique for balancing datasets and is considered a pre-processing step in learning algorithms. In this paper, four new enhanced SMOTE variants are proposed, which include an improved version of KNN in which the attribute weights are first defined by mutual information and then replaced by maximum entropy, Renyi entropy and Tsallis entropy. 
These four pre-processing methods are combined with 1NN and J48 classifiers and their performance are compared with the previous methods on 11 imbalanced datasets from KEEL repository. The results show that these pre-processing methods improves the accuracy compared with the previous stablished works. In addition, as a case study, the first pre-processing method is applied on transportation data of Tehran-Bazargan Highway in Iran with IR equal to 36.\nWe present a training framework for neural abstractive summarization based on actor-critic approaches from reinforcement learning. In the traditional neural network based methods, the objective is only to maximize the likelihood of the predicted summaries, no other assessment constraints are considered, which may generate low-quality summaries or even incorrect sentences. To alleviate this problem, we employ an actor-critic framework to enhance the training procedure. For the actor, we employ the typical attention based sequence-to-sequence (seq2seq) framework as the policy network for summary generation. For the critic, we combine the maximum likelihood estimator with a well designed global summary quality estimator which is a neural network based binary classifier aiming to make the generated summaries indistinguishable from the human-written ones. Policy gradient method is used to conduct the parameter learning. An alternating training strategy is proposed to conduct the joint training of the actor and critic models. Extensive experiments on some benchmark datasets in different languages show that our framework achieves improvements over the state-of-the-art methods.\nWe present a novel automated method to segment the myocardium of both left and right ventricles in MRI volumes. The segmentation is consistent in 3D across the slices such that it can be directly used for mesh generation. 
Two specific neural networks with multi-scale coarse-to-fine prediction structure are proposed to cope with the small training dataset and trained using an original loss function. The former segments a slice in the middle of the volume. Then the latter iteratively propagates the slice segmentations towards the base and the apex, in a spatially consistent way. We perform 5-fold cross-validation on the 15 cases from STACOM to validate the method. For training, we use real cases and their synthetic variants generated by combining motion simulation and image synthesis. Accurate and consistent testing results are obtained.\nHuman conversation is a complex mechanism with subtle nuances. It is hence an ambitious goal to develop artificial intelligence agents that can participate fluently in a conversation. While we are still far from achieving this goal, recent progress in visual question answering, image captioning, and visual question generation shows that dialog systems may be realizable in the not too distant future. To this end, a novel dataset was introduced recently and encouraging results were demonstrated, particularly for question answering. In this paper, we demonstrate a simple symmetric discriminative baseline, that can be applied to both predicting an answer as well as predicting a question. We show that this method performs on par with the state of the art, even memory net based methods. In addition, for the first time on the visual dialog dataset, we assess the performance of a system asking questions, and demonstrate how visual dialog can be generated from discriminative question generation and question answering.\nIn the face of shifting means of production from manual human labor to labor automation, one solution that stands out is the advancement of a Universal Basic Income, UBI to every citizen from the government with no strings attached. 
The proposal, however, has encountered sharp criticism from different quarters questioning the morality behind sourcing of funds, largely through taxation, to uphold an institution designed to provide social support. Others also perceive the idea as a form of socialism, or a capitalist road to communism. The current discussion, however, seeks to demonstrate that the provision of such stipend can occur through the utilization of revenues realized from production driven by Artificial Intelligence (AI), and to a small extent, philanthropic contributions from the top 1 percent of the population.\nRecently, caption generation with an encoder-decoder framework has been extensively studied and applied in different domains, such as image captioning, code captioning, and so on. In this paper, we propose a novel architecture, namely Auto-Reconstructor Network (ARNet), which, coupling with the conventional encoder-decoder framework, works in an end-to-end fashion to generate captions. ARNet aims at reconstructing the previous hidden state with the present one, besides behaving as the input-dependent transition operator. Therefore, ARNet encourages the current hidden state to embed more information from the previous one, which can help regularize the transition dynamics of recurrent neural networks (RNNs). Extensive experimental results show that our proposed ARNet boosts the performance over the existing encoder-decoder models on both image captioning and source code captioning tasks. Additionally, ARNet remarkably reduces the discrepancy between training and inference processes for caption generation. Furthermore, the performance on permuted sequential MNIST demonstrates that ARNet can effectively regularize RNN, especially on modeling long-term dependencies. Our code is available at: https://github.com/chenxinpeng/ARNet\nWe propose a scalable, efficient and accurate approach to retrieve 3D models for objects in the wild. Our contribution is twofold. 
We first present a 3D pose estimation approach for object categories which significantly outperforms the state-of-the-art on Pascal3D+. Second, we use the estimated pose as a prior to retrieve 3D models which accurately represent the geometry of objects in RGB images. For this purpose, we render depth images from 3D models under our predicted pose and match learned image descriptors of RGB images against those of rendered depth images using a CNN-based multi-view metric learning approach. In this way, we are the first to report quantitative results for 3D model retrieval on Pascal3D+, where our method chooses the same models as human annotators for 50% of the validation images on average. In addition, we show that our method, which was trained purely on Pascal3D+, retrieves rich and accurate 3D models from ShapeNet given RGB images of objects in the wild.\nThere is an increasing concern in computer vision devices invading the privacy of their users by recording unwanted videos. On one hand, we want the camera systems/robots to recognize important events and assist human daily life by understanding its videos, but on the other hand we also want to ensure that they do not intrude people's privacy. In this paper, we propose a new principled approach for learning a video face anonymizer. We use an adversarial training setting in which two competing systems fight: (1) a video anonymizer that modifies the original video to remove privacy-sensitive information (i.e., human face) while still trying to maximize spatial action detection performance, and (2) a discriminator that tries to extract privacy-sensitive information from such anonymized videos. The end result is a video anonymizer that performs a pixel-level modification to anonymize each person's face, with minimal effect on action detection performance. 
We experimentally confirm the benefit of our approach compared to conventional hand-crafted video/face anonymization methods including masking, blurring, and noise adding. See the project page https://jason718.github.io/project/privacy/main.html for a demo video and more results.\nProspection, the act of predicting the consequences of many possible futures, is intrinsic to human planning and action, and may even be at the root of consciousness. Surprisingly, this idea has been explored comparatively little in robotics. In this work, we propose a neural network architecture and associated planning algorithm that (1) learns a representation of the world useful for generating prospective futures after the application of high-level actions, (2) uses this generative model to simulate the result of sequences of high-level actions in a variety of environments, and (3) uses this same representation to evaluate these actions and perform tree search to find a sequence of high-level actions in a new environment. Models are trained via imitation learning on a variety of domains, including navigation, pick-and-place, and a surgical robotics task. Our approach allows us to visualize intermediate motion goals and learn to plan complex activity from visual information.\nIn the encoding of many real-world problems to propositional satisfiability, the cardinality constraint is a recurrent constraint that needs to be managed effectively. Several efficient encodings have been proposed while missing that such a constraint can be involved in a more general propositional formulation. To avoid combinatorial explosion, Tseitin principle usually used to translate such general propositional formula to Conjunctive Normal Form (CNF), introduces fresh propositional variables to represent sub-formulas and/or complex constraints. 
Thanks to Plaisted and Greenbaum improvement, the polarity of the sub-formula $\\Phi$ is taken into account leading to conditional constraints of the form $y\\rightarrow \\Phi$, or $\\Phi\\rightarrow y$, where $y$ is a fresh propositional variable. In the case where $\\Phi$ represents a cardinality constraint, such translation leads to conditional cardinality constraints subject of the present paper. We first show that when all the clauses encoding the cardinality constraint are augmented with an additional new variable, most of the well-known encodings cease to maintain the generalized arc consistency property. Then, we consider some of these encodings and show how they can be extended to recover such important property. An experimental validation is conducted on a SAT-based pattern mining application, where such conditional cardinality constraints is a cornerstone, showing the relevance of our proposed approach.\nPlayer modeling is an important concept that has gained much attention in game research due to its utility in developing adaptive techniques to target better designs for engagement and retention. Previous work has explored modeling individual differences using machine learning algorithms performed on aggregated game actions. However, players' individual differences may be better manifested through sequential patterns of the in-game player's actions. While few works have explored sequential analysis of player data, none have explored the use of Hidden Markov Models (HMM) to model individual differences, which is the topic of this paper. In particular, we developed a modeling approach using data collected from players playing a Role-Playing Game (RPG). Our proposed approach is two fold: 1. We present a Hidden Markov Model (HMM) of player in-game behaviors to model individual differences, and 2.
using the output of the HMM, we generate behavioral features used to classify real world players' characteristics, including game expertise and the big five personality traits. Our results show predictive power for some of personality traits, such as game expertise and conscientiousness, but the most influential factor was game expertise.\nWe address a largely open problem of multilabel classification over graphs. Unlike traditional vector input, a graph has rich variable-size substructures which are related to the labels in some ways. We believe that uncovering these relations might hold the key to classification performance and explainability. We introduce GAML (Graph Attentional Multi-Label learning), a novel graph neural network that can handle this problem effectively. GAML regards labels as auxiliary nodes and models them in conjunction with the input graph. By applying message passing and attention mechanisms to both the label nodes and the input nodes iteratively, GAML can capture the relations between the labels and the input subgraphs at various resolution scales. Moreover, our model can take advantage of explicit label dependencies. It also scales linearly with the number of labels and graph size thanks to our proposed hierarchical attention. We evaluate GAML on an extensive set of experiments with both graph-structured inputs and classical unstructured inputs. The results show that GAML significantly outperforms other competing methods. Importantly, GAML enables intuitive visualizations for better understanding of the label-substructure relations and explanation of the model behaviors.\nMomentum is a simple and widely used trick which allows gradient-based optimizers to pick up speed in low curvature directions. Its performance depends crucially on a damping coefficient $\\beta$. 
Large $\\beta$ values can potentially deliver much larger speedups, but are prone to oscillations and instability; hence one typically resorts to small values such as 0.5 or 0.9. We propose Aggregated Momentum (AggMo), a variant of momentum which combines multiple velocity vectors with different $\\beta$ parameters. AggMo is trivial to implement, but significantly dampens oscillations, enabling it to remain stable even for aggressive $\\beta$ values such as 0.999. We reinterpret Nesterov's accelerated gradient descent as a special case of AggMo and provide theoretical convergence bounds for online convex optimization. Empirically, we find that AggMo is a suitable drop-in replacement for other momentum methods, and frequently delivers faster convergence.\nThis paper investigates exploration strategies of Deep Reinforcement Learning (DRL) methods to learn navigation policies for mobile robots. In particular, we augment the normal external reward for training DRL algorithms with intrinsic reward signals measured by curiosity. We test our approach in a mapless navigation setting, where the autonomous agent is required to navigate without the occupancy map of the environment, to targets whose relative locations can be easily acquired through low-cost solutions (e.g., visible light localization, Wi-Fi signal localization). We validate that the intrinsic motivation is crucial for improving DRL performance in tasks with challenging exploration requirements. Our experimental results show that our proposed method is able to more effectively learn navigation policies, and has better generalization capabilities in previously unseen environments. A video of our experimental results can be found at https://goo.gl/pWbpcF.\nIn the recent times, autoencoders, besides being used for compression, have been proven quite useful even for regenerating similar images or help in image denoising. They have also been explored for anomaly detection in a few cases. 
However, due to location invariance property of convolutional neural network, autoencoders tend to learn from or search for learned features in the complete image. This creates issues when all the items in the image are not equally important and their location matters. For such cases, a semi supervised solution - regional priority based autoencoder (RPAE) has been proposed. In this model, similar to object detection models, a region proposal network identifies the relevant areas in the images as belonging to one of the predefined categories and then those bounding boxes are fed into appropriate decoder based on the category they belong to. Finally, the error scores from all the decoders are combined based on their importance to provide total reconstruction error.\nWhile deep neural networks (DNN) have become an effective computational tool, the prediction results are often criticized by the lack of interpretability, which is essential in many real-world applications such as health informatics. Existing attempts based on local interpretations aim to identify relevant features contributing the most to the prediction of DNN by monitoring the neighborhood of a given input. They usually simply ignore the intermediate layers of the DNN that might contain rich information for interpretation. To bridge the gap, in this paper, we propose to investigate a guided feature inversion framework for taking advantage of the deep architectures towards effective interpretation. The proposed framework not only determines the contribution of each feature in the input but also provides insights into the decision-making process of DNN models. By further interacting with the neuron of the target category at the output layer of the DNN, we enforce the interpretation result to be class-discriminative. We apply the proposed interpretation model to different CNN architectures to provide explanations for image data and conduct extensive experiments on ImageNet and PASCAL VOC07 datasets. 
The interpretation results demonstrate the effectiveness of our proposed framework in providing class-discriminative interpretation for DNN-based prediction.\nPredictions of driver's intentions and their behaviors using the road is of great importance for planning and decision making processes of autonomous driving vehicles. In particular, relatively short-term driving intentions are the fundamental units that constitute more sophisticated driving goals, behaviors, such as overtaking the slow vehicle in front, exit or merge onto a high way, etc. While it is not uncommon that most of the time human driver can rationalize, in advance, various on-road behaviors, intentions, as well as the associated risks, aggressiveness, reciprocity characteristics, etc., such reasoning skills can be challenging and difficult for an autonomous driving system to learn. In this article, we demonstrate a disciplined methodology that can be used to build and train a predictive drive system, therefore to learn the on-road characteristics aforementioned.\nIn this study, we explore capsule networks with dynamic routing for text classification. We propose three strategies to stabilize the dynamic routing process to alleviate the disturbance of some noise capsules which may contain \"background\" information or have not been successfully trained. A series of experiments are conducted with capsule networks on six text classification benchmarks. Capsule networks achieve state of the art on 4 out of 6 datasets, which shows the effectiveness of capsule networks for text classification. We additionally show that capsule networks exhibit significant improvement when transfer single-label to multi-label text classification over strong baseline methods. 
To the best of our knowledge, this is the first work that capsule networks have been empirically investigated for text modeling.\nPredictive analysis in business process monitoring aims at forecasting the future information of a running business process. The prediction is typically made based on the model extracted from historical process execution logs (event logs). In practice, different business domains might require different kinds of predictions. Hence, it is important to have a means for properly specifying the desired prediction tasks, and a mechanism to deal with these various prediction tasks. Although there have been many studies in this area, they mostly focus on a specific prediction task. This work introduces a language for specifying the desired prediction tasks, and this language allows us to express various kinds of prediction tasks. This work also presents a mechanism for automatically creating the corresponding prediction model based on the given specification. Thus, different from previous studies, our approach enables us to deal with various kinds of prediction tasks based on the given specification. A prototype implementing our approach has been developed and experiments using a real-life event log have been conducted.\nA key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization. To this end, we introduce universal planning networks (UPN). UPNs embed differentiable planning within a goal-directed policy. This planning computation unrolls a forward model in a latent space and infers an optimal action plan through gradient descent trajectory optimization. The plan-by-gradient-descent process and its underlying representations are learned end-to-end to directly optimize a supervised imitation learning objective. 
We find that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images. The learned representations can be leveraged to specify distance-based rewards to reach new target states for model-free reinforcement learning, resulting in substantially more effective learning when solving new tasks described via image-based goals. We were able to achieve successful transfer of visuomotor planning strategies across robots with significantly different morphologies and actuation capabilities.\nTraditional machine learning models have problems with handling sequence data, because the lengths of sequences may vary between samples. In this paper, we present an unsupervised learning model for sequence data, called the Integrated Sequence Autoencoder (ISA), to learn a fixed-length vectorial representation by minimizing the reconstruction error. Specifically, we propose to integrate two classical mechanisms for sequence reconstruction which takes into account both the global silhouette information and the local temporal dependencies. Furthermore, we propose a stop feature that serves as a temporal stamp to guide the reconstruction process, and which results in a higher-quality representation. Extensive validation on real-world datasets shows that the learned representation is able to effectively summarize not only the apparent features, but also the underlying and high-level style information. Take for example a speech sequence sample: our ISA model can not only recognize the spoken text (apparent feature), but can also discriminate the speaker who utters the audio (more high-level style).\nWe present an approach where two different models (Deep and Shallow) are trained separately on the data and a weighted average of the outputs is taken as the final result. 
For the Deep approach, we use different combinations of models like Convolution Neural Network, pretrained word2vec embeddings and LSTMs to get representations which are then used to train a Deep Neural Network. For Clarity prediction, we also use an Attentive Pooling approach for the pooling operation so as to be aware of the Title-Category pair. For the shallow approach, we use boosting technique LightGBM on features generated using title and categories. We find that an ensemble of these approaches does a better job than using them alone suggesting that the results of the deep and shallow approach are highly complementary\nWe propose the idea of transferring common-sense knowledge from source categories to target categories for scalable object detection. In our setting, the training data for the source categories have bounding box annotations, while those for the target categories only have image-level annotations. Current state-of-the-art approaches focus on image-level visual or semantic similarity to adapt a detector trained on the source categories to the new target categories. In contrast, our key idea is to (i) use similarity not at image-level, but rather at region-level, as well as (ii) leverage richer common-sense (based on attribute, spatial, etc.,) to guide the algorithm towards learning the correct detections. We acquire such common-sense cues automatically from readily-available knowledge bases without any extra human effort. On the challenging MS COCO dataset, we find that using common-sense knowledge substantially improves detection performance over existing transfer-learning baselines.\nModern 3D human pose estimation techniques rely on deep networks, which require large amounts of training data. While weakly-supervised methods require less supervision, by utilizing 2D poses or multi-view imagery without annotations, they still need a sufficiently large set of samples with 3D annotations for learning to succeed.   
In this paper, we propose to overcome this problem by learning a geometry-aware body representation from multi-view images without annotations. To this end, we use an encoder-decoder that predicts an image from one viewpoint given an image from another viewpoint. Because this representation encodes 3D geometry, using it in a semi-supervised setting makes it easier to learn a mapping from it to 3D human pose. As evidenced by our experiments, our approach significantly outperforms fully-supervised methods given the same amount of labeled data, and improves over other semi-supervised methods while using as little as 1% of the labeled data.\nSynthesizing user-intended programs from a small number of input-output examples is a challenging problem with several important applications like spreadsheet manipulation, data wrangling and code refactoring. Existing synthesis systems either completely rely on deductive logic techniques that are extensively hand-engineered or on purely statistical models that need massive amounts of data, and in general fail to provide real-time synthesis on challenging benchmarks. In this work, we propose Neural Guided Deductive Search (NGDS), a hybrid synthesis technique that combines the best of both symbolic logic techniques and statistical models. Thus, it produces programs that satisfy the provided specifications by construction and generalize well on unseen examples, similar to data-driven systems. Our technique effectively utilizes the deductive search framework to reduce the learning problem of the neural component to a simple supervised learning setup. Further, this allows us to both train on sparingly available real-world data and still leverage powerful recurrent neural network encoders. 
We demonstrate the effectiveness of our method by evaluating on real-world customer scenarios by synthesizing accurate programs with up to 12x speed-up compared to state-of-the-art systems.\nMining frequent sequential patterns consists in extracting recurrent behaviors, modeled as patterns, in a big sequence dataset. Such patterns inform about which events are frequently observed in sequences, i.e. what does really happen. Sometimes, knowing that some specific event does not happen is more informative than extracting a lot of observed events. Negative sequential patterns (NSP) formulate recurrent behaviors by patterns containing both observed events and absent events. Few approaches have been proposed to mine such NSPs. In addition, the syntax and semantics of NSPs differ in the different methods which makes it difficult to compare them. This article provides a unified framework for the formulation of the syntax and the semantics of NSPs. Then, we introduce a new algorithm, NegPSpan, that extracts NSPs using a PrefixSpan depth-first scheme and enabling maxgap constraints that other approaches do not take into account. The formal framework allows for highlighting the differences between the proposed approach wrt to the methods from the literature, especially wrt the state of the art approach eNSP. Intensive experiments on synthetic and real datasets show that NegPSpan can extract meaningful NSPs and that it can process bigger datasets than eNSP thanks to significantly lower memory requirements and better computation times.\nWord embeddings have emerged as a popular approach to unsupervised learning of word relationships in machine learning and natural language processing. In this article, we benchmark two of the most popular algorithms, GloVe and word2vec, to assess their suitability for capturing medical relationships in large sources of biomedical data. 
Leaning on recent theoretical insights, we provide a unified view of these algorithms and demonstrate how different sources of data can be combined to construct the largest ever set of embeddings for 108,477 medical concepts using an insurance claims database of 60 million members, 20 million clinical notes, and 1.7 million full text biomedical journal articles. We evaluate our approach, called cui2vec, on a set of clinically relevant benchmarks and in many instances demonstrate state of the art performance relative to previous results. Finally, we provide a downloadable set of pre-trained embeddings for other researchers to use, as well as an online tool for interactive exploration of the cui2vec embeddings.\nThis paper describes an abstractive summarization method for tabular data which employs a knowledge base semantic embedding to generate the summary. Assuming the dataset contains descriptive text in headers, columns and/or some augmenting metadata, the system employs the embedding to recommend a subject/type for each text segment. Recommendations are aggregated into a small collection of super types considered to be descriptive of the dataset by exploiting the hierarchy of types in a pre-specified ontology. Using February 2015 Wikipedia as the knowledge base, and a corresponding DBpedia ontology as types, we present experimental results on open data taken from several sources--OpenML, CKAN and data.world--to illustrate the effectiveness of the approach.\nBeing able to predict what may happen in the future requires an in-depth understanding of the physical and causal rules that govern the world. A model that is able to do so has a number of appealing applications, from robotic planning to representation learning. 
However, learning to predict raw future observations, such as frames in a video, is exceedingly challenging -- the ambiguous nature of the problem can cause a naively designed model to average together possible futures into a single, blurry prediction. Recently, this has been addressed by two distinct approaches: (a) latent variational variable models that explicitly model underlying stochasticity and (b) adversarially-trained models that aim to produce naturalistic images. However, a standard latent variable model can struggle to produce realistic results, and a standard adversarially-trained model underutilizes latent variables and fails to produce diverse predictions. We show that these distinct methods are in fact complementary. Combining the two produces predictions that look more realistic to human raters and better cover the range of possible futures. Our method outperforms prior and concurrent work in these aspects.\nWe revisit the classical problem of exact inference on probabilistic graphical models (PGMs). Our algorithm is based on recent worst-case optimal database join algorithms, which can be asymptotically faster than traditional data processing methods. We present the first empirical evaluation of these new algorithms via JoinInfer, a new exact inference engine. We empirically explore the properties of the data for which our engine can be expected to outperform traditional inference engines refining current theoretical notions. Further, JoinInfer outperforms existing state-of-the-art inference engines (ACE, IJGP and libDAI) on some standard benchmark datasets by up to a factor of 630x. Finally, we propose a promising data-driven heuristic that extends JoinInfer to automatically tailor its parameters and/or switch to the traditional inference algorithms.\nLearning latent variable models with stochastic variational inference is challenging when the approximate posterior is far from the true posterior, due to high variance in the gradient estimates. 
We propose a novel rejection sampling step that discards samples from the variational posterior which are assigned low likelihoods by the model. Our approach provides an arbitrarily accurate approximation of the true posterior at the expense of extra computation. Using a new gradient estimator for the resulting unnormalized proposal distribution, we achieve average improvements of 3.71 nats and 0.21 nats over state-of-the-art single-sample and multi-sample alternatives respectively for estimating marginal log-likelihoods using sigmoid belief networks on the MNIST dataset.\nWe present an end-to-end trained memory system that quickly adapts to new data and generates samples like them. Inspired by Kanerva's sparse distributed memory, it has a robust distributed reading and writing mechanism. The memory is analytically tractable, which enables optimal on-line compression via a Bayesian update-rule. We formulate it as a hierarchical conditional generative model, where memory provides a rich data-dependent prior distribution. Consequently, the top-down memory and bottom-up perception are combined to produce the code representing an observation. Empirically, we demonstrate that the adaptive memory significantly improves generative models trained on both the Omniglot and CIFAR datasets. Compared with the Differentiable Neural Computer (DNC) and its variants, our memory model has greater capacity and is significantly easier to train.\nMost saliency estimation methods aim to explicitly model low-level conspicuity cues such as edges or blobs and may additionally incorporate top-down cues using face or text detection. Data-driven methods for training saliency models using eye-fixation data are increasingly popular, particularly with the introduction of large-scale datasets and deep architectures. However, current methods in this latter paradigm use loss functions designed for classification or regression tasks whereas saliency estimation is evaluated on topographical maps. 
In this work, we introduce a new saliency map model which formulates a map as a generalized Bernoulli distribution. We then train a deep architecture to predict such maps using novel loss functions which pair the softmax activation function with measures designed to compute distances between probability distributions. We show in extensive experiments the effectiveness of such loss functions over standard ones on four public benchmark datasets, and demonstrate improved performance over state-of-the-art saliency methods.\nIn 2015, Google's DeepMind announced an advancement in creating an autonomous agent based on deep reinforcement learning (DRL) that could beat a professional player in a series of 49 Atari games. However, the current manifestation of DRL is still immature and has significant drawbacks. One of DRL's imperfections is its lack of \"exploration\" during the training process, especially when working with high-dimensional problems. In this paper, we propose a mixed strategy approach that mimics human behavior when interacting with the environment, and create a \"thinking\" agent that allows for more efficient exploration in the DRL training process. The simulation results based on the Breakout game show that our scheme achieves a higher probability of obtaining a maximum score than does the baseline DRL algorithm, i.e., the asynchronous advantage actor-critic method. The proposed scheme therefore can be applied effectively to solving a complicated task in a real-world application.\nMiss-ratio curve (MRC), or equivalently hit-ratio curve (HRC), construction techniques have recently gathered the attention of many researchers. Recent advancements have allowed for approximating these curves in constant time, allowing for online working-set-size (WSS) measurement. Techniques span the algorithmic design paradigm from classic dynamic programming to artificial intelligence inspired techniques. 
Our survey produces a broad classification of the current techniques primarily based on \\emph{what} locality metric is being recorded and \\emph{how} that metric is stored for processing. Applications of these curves span from dynamic cache partitioning in the processor to improving block allocation at the operating system level. Our survey will give an overview of the historical, exact MRC construction methods, and compare them with the state-of-the-art methods present in today's literature. In addition, we will show where there are still open areas of research and remain excited to see what this domain can produce with a strong theoretical background.\nState-of-the-art pedestrian detection models have achieved great success in many benchmarks. However, these models require lots of annotation information and the labeling process usually takes much time and effort. In this paper, we propose a method to generate labeled pedestrian data and adapt them to support the training of pedestrian detectors. The proposed framework is built on the Generative Adversarial Network (GAN) with multiple discriminators, trying to synthesize realistic pedestrians and learn the background context simultaneously. To handle the pedestrians of different sizes, we adopt the Spatial Pyramid Pooling (SPP) layer in the discriminator. We conduct experiments on two benchmarks. The results show that our framework can smoothly synthesize pedestrians on background images with variations and different levels of detail. To quantitatively evaluate our approach, we add the generated samples into training data of the baseline pedestrian detectors and show the synthetic images are able to improve the detectors' performance.\nObject retrieval and classification in point cloud data is challenged by noise, irregular sampling density and occlusion. To address this issue, we propose a point pair descriptor that is robust to noise and occlusion and achieves high retrieval accuracy. 
We further show how the proposed descriptor can be used in a 4D convolutional neural network for the task of object classification. We propose a novel 4D convolutional layer that is able to learn class-specific clusters in the descriptor histograms. Finally, we provide experimental validation on 3 benchmark datasets, which confirms the superiority of the proposed approach.\nWith the ever-growing diversity of devices and applications that will be connected to 5G networks, flexible and agile service orchestration with acknowledged QoE that satisfies end users' functional and QoS requirements is necessary. SDN (Software-Defined Networking) and NFV (Network Function Virtualization) are considered key enabling technologies for 5G core networks. In this regard, this paper proposes a reinforcement learning based QoS/QoE-aware Service Function Chaining (SFC) in SDN/NFV-enabled 5G slices. First, it implements a lightweight QoS information collector based on LLDP, which works in a piggyback fashion on the southbound interface of the SDN controller, to enable QoS-awareness. Then, a DQN (Deep Q Network) based agent framework is designed to support SFC in the context of NFV. The agent takes into account the QoE and QoS as key aspects to formulate the reward so that it is expected to maximize QoE while respecting QoS constraints. The experimental results show that this framework exhibits good performance in QoE provisioning and QoS requirements maintenance for SFC in dynamic network environments.\nThe idea of end-to-end learning of communications systems through neural network-based autoencoders has the shortcoming that it requires a differentiable channel model. We present in this paper a novel learning algorithm which alleviates this problem. The algorithm iterates between supervised training of the receiver and reinforcement learning-based training of the transmitter. 
We demonstrate that this approach works as well as fully supervised methods on additive white Gaussian noise (AWGN) and Rayleigh block-fading (RBF) channels. Surprisingly, while our method converges more slowly on AWGN channels than supervised training, it converges faster on RBF channels. Our results are a first step towards learning of communications systems over any type of channel without prior assumptions.\nThe problem of comparing concepts of dependence in general rough sets with those in probability theory had been initiated by the present author in some of her recent papers. This problem relates to the identification of the limitations of translating between the methodologies and possibilities in the identification of concepts. Comparison of ideas of dependence in the approaches had been attempted from a set-valuation based minimalist perspective by the present author. The deviant probability framework has been the result of such an approach. Other Bayesian reasoning perspectives (involving numeric valuations) and frequentist approaches are also known. In this research, duality results are adapted to demonstrate the possibility of improved comparisons across implications between ontologically distinct concepts in a common logic-based framework by the present author. Both positive and negative results are proved that delimit possible comparisons in a clearer way by her.\nOne of the distinguishing aspects of human language is its compositionality, which allows us to describe complex environments with limited vocabulary. Previously, it has been shown that neural network agents can learn to communicate in a highly structured, possibly compositional language based on disentangled input (e.g. hand-engineered features). Humans, however, do not learn to communicate based on well-summarized features. In this work, we train neural agents to simultaneously develop visual perception from raw image pixels, and learn to communicate with a sequence of discrete symbols. 
The agents play an image description game where the image contains factors such as colors and shapes. We train the agents using the obverter technique where an agent introspects to generate messages that maximize its own understanding. Through qualitative analysis, visualization and a zero-shot test, we show that the agents can develop, out of raw image pixels, a language with compositional properties, given proper pressure from the environment.\nPredictive process monitoring has recently gained traction in academia and is also maturing in companies. However, with the growing body of research, it might be daunting for companies to navigate this domain in order to find, provided certain data, what can be predicted and what methods to use. The main objective of this paper is to develop a value-driven framework for classifying existing work on predictive process monitoring. This objective is achieved by systematically identifying, categorizing, and analyzing existing approaches for predictive process monitoring. The review is then used to develop a value-driven framework that can support organizations in navigating the predictive process monitoring field and help them to find value and exploit the opportunities enabled by these analysis techniques.\nWe study the problem of generating interpretable and verifiable policies through reinforcement learning. Unlike the popular Deep Reinforcement Learning (DRL) paradigm, in which the policy is represented by a neural network, the aim in Programmatically Interpretable Reinforcement Learning is to find a policy that can be represented in a high-level programming language. Such programmatic policies have the benefits of being more easily interpreted than neural networks, and being amenable to verification by symbolic methods. We propose a new method, called Neurally Directed Program Search (NDPS), for solving the challenging nonsmooth optimization problem of finding a programmatic policy with maximal reward. 
NDPS works by first learning a neural policy network using DRL, and then performing a local search over programmatic policies that seeks to minimize a distance from this neural \"oracle\". We evaluate NDPS on the task of learning to drive a simulated car in the TORCS car-racing environment. We demonstrate that NDPS is able to discover human-readable policies that pass some significant performance bars. We also find that a well-designed policy language can serve as a regularizer, and result in the discovery of policies that lead to smoother trajectories and are more easily transferred to environments not encountered during training.\nWe present a novel algorithm for reciprocal collision avoidance between heterogeneous agents of different shapes and sizes. We present a novel CTMAT representation based on medial axis transform to compute a tight-fitting bounding shape for each agent. Each CTMAT is represented using tuples, which are composed of circular arcs and line segments. Based on the reciprocal velocity obstacle formulation, we reduce the problem to solving a low-dimensional linear program between each pair of tuples belonging to adjacent agents. We precompute the Minkowski sums of tuples to accelerate the runtime performance. Finally, we provide an efficient method to update the orientation of each agent in a local manner. We have implemented the algorithm and highlight its performance on benchmarks corresponding to road traffic scenarios and different vehicles. The overall runtime performance is comparable to prior multi-agent collision avoidance algorithms that use circular or elliptical agents. Our approach is less conservative and results in fewer false collisions.\nPartially Observable Markov Decision Processes (POMDPs) offer an elegant framework to model sequential decision making in uncertain environments. Solving POMDPs online is an active area of research, and given the size of real-world problems, approximate solvers are used. 
Recently, a few approaches have been suggested for solving POMDPs by using MDP solvers in conjunction with imitation learning. MDP based POMDP solvers work well for some cases, while catastrophically failing for others. The main failure point of such solvers is the lack of motivation for MDP solvers to gain information, since under their assumption the environment is either already known as much as it can be or the uncertainty will disappear after the next step. However, for solving POMDP problems, gaining information can lead to efficient solutions. In this paper we derive a set of conditions where MDP based POMDP solvers are provably sub-optimal. We then use the well-known tiger problem to demonstrate such sub-optimality. We show that multi-resolution, budgeted information gathering cannot be addressed using MDP based POMDP solvers. The contribution of the paper helps identify the properties of a POMDP problem for which the use of MDP based POMDP solvers is inappropriate, enabling better design choices.\nA Self-Organizing Map (SOM) is trained using unsupervised learning to produce a two-dimensional discretized representation of the input space of the training cases. The Growing Hierarchical SOM (GHSOM) is an architecture which grows both in a hierarchical way, representing the structure of the data distribution, and in a horizontal way, representing the size of each individual map. This paper proposes a method to control the degree of growth of GHSOM by pruning redundant branches of the hierarchy. Moreover, an interface tool for the proposed method, called interactive GHSOM, is developed. We discuss the computational results on the Iris data obtained with the developed tool.\nWe present and evaluate the Fast (conditional) Independence Test (FIT) -- a nonparametric conditional independence test. The test is based on the idea that when $P(X \\mid Y, Z) = P(X \\mid Y)$, $Z$ is not useful as a feature to predict $X$, as long as $Y$ is also a regressor. 
Conversely, if $P(X \\mid Y, Z) \\neq P(X \\mid Y)$, $Z$ might improve prediction results. FIT applies to thousand-dimensional random variables with a hundred thousand samples in a fraction of the time required by alternative methods. We provide an extensive evaluation that compares FIT to six extant nonparametric independence tests. The evaluation shows that FIT has a low probability of making both Type I and Type II errors compared to other tests, especially as the number of available samples grows. Our implementation of FIT is publicly available.\nThe convergence speed of stochastic gradient descent (SGD) can be improved by actively selecting mini-batches. We explore sampling schemes where similar data points are less likely to be selected in the same mini-batch. In particular, we prove that such repulsive sampling schemes lower the variance of the gradient estimator. This generalizes recent work on using Determinantal Point Processes (DPPs) for mini-batch diversification (Zhang et al., 2017) to the broader class of repulsive point processes. We first show that the phenomenon of variance reduction by diversified sampling generalizes in particular to non-stationary point processes. We then show that other point processes may be computationally much more efficient than DPPs. In particular, we propose and investigate Poisson Disk sampling---frequently encountered in the computer graphics community---for this task. We show empirically that our approach improves over standard SGD both in terms of convergence speed and final model performance.\nRecently, advanced image processing techniques have been required to extract image features in real time. In our research, the tourist subjective data are collected from the Mobile Phone based Participatory Sensing (MPPS) system. Each record consists of image files with GPS, geographic location name, user's numerical evaluation, and comments written in natural language at sightseeing spots that the user actually visits. 
In our previous research, famous landmarks at sightseeing spots could be detected by the Clonal Selection Algorithm with Immunological Memory Cell (CSAIM). However, some landmarks were not detected correctly by the previous method because they did not provide enough information for feature extraction. To address this weakness, we propose a method for generating immunological memory using Restricted Boltzmann Machines. To verify the effectiveness of the method, we conduct experiments on classifying the subjective data using machine learning tools for Deep Learning.\nDecentralized (PO)MDPs provide an expressive framework for sequential decision making in a multiagent system. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We address such a subclass called CDEC-POMDP where the collective behavior of a population of agents affects the joint reward and environment dynamics. Our main contribution is an actor-critic (AC) reinforcement learning method for optimizing CDEC-POMDP policies. Vanilla AC has slow convergence for larger problems. To address this, we show how a particular decomposition of the approximate action-value function over agents leads to effective updates, and also derive a new way to train the critic based on local reward signals. Comparisons on a synthetic benchmark and a real-world taxi fleet optimization problem show that our new AC approach provides better-quality solutions than the previous best approaches.\nThe most enigmatic aspect of consciousness is the fact that it is felt, as a subjective sensation. This particular aspect is explained by the theory proposed here. The theory encompasses both the computation that is presumably involved and the way in which that computation may be realized in the brain's neurobiology. 
It is assumed that the brain makes an internal estimate of an individual's own evolutionary fitness, which can be shown to produce an irreducible, distinct cause. Communicating components of the fitness estimate (either for external or internal use) requires inverting them. Such inversion can be performed by the thalamocortical feedback loop in the mammalian brain, if that loop is operating in a switched, dual-stage mode. A first (nonconscious) stage produces forward estimates, whereas the second (conscious) stage inverts those estimates. It is argued that inversion produces irreducible, distinct, and spatially localized causes, which are plausibly sensed as the feeling of consciousness.\nThis paper investigates to what extent cognitive biases affect human understanding of interpretable machine learning models, in particular of rules discovered from data. Twenty cognitive biases (illusions, effects) are covered, as are possibly effective debiasing techniques that can be adopted by designers of machine learning algorithms and software. While there seems to be no universal approach for eliminating all the identified cognitive biases, it follows from our analysis that the effect of most biases can be ameliorated by making rule-based models more concise. Due to the lack of previous research, our review transfers general results obtained in cognitive psychology to the domain of machine learning. It needs to be followed by empirical studies specifically aimed at the machine learning domain.\nHate speech detection is a critical yet challenging problem in Natural Language Processing (NLP). Despite the existence of numerous studies dedicated to the development of NLP hate speech detection approaches, the accuracy is still poor. The central problem is that social media posts are short and noisy, and most existing hate speech detection solutions take each post as an isolated input instance, which is likely to yield high false positive and negative rates. 
In this paper, we radically improve automated hate speech detection by presenting a novel model that leverages intra-user and inter-user representation learning for robust hate speech detection on Twitter. In addition to the target Tweet, we collect and analyze the user's historical posts to model intra-user Tweet representations. To suppress the noise in a single Tweet, we also model the similar Tweets posted by all other users with reinforced inter-user representation learning techniques. Experimentally, we show that leveraging these two representations can significantly improve the f-score of a strong bidirectional LSTM baseline model by 10.1%.\nRapidly creating effective visualizations using expressive grammars is challenging for users who have limited time and limited skills in statistics and data visualization. Even high-level, dedicated visualization tools often require users to manually select among data attributes, decide which transformations to apply, and specify mappings between visual encoding variables and raw or transformed attributes. In this paper, we introduce Data2Vis, a neural translation model, for automatically generating visualizations from given datasets. We formulate visualization generation as a sequence to sequence translation problem where data specification is mapped to a visualization specification in a declarative language (Vega-Lite). To this end, we train a multilayered Long Short-Term Memory (LSTM) model with attention on a corpus of visualization specifications. Qualitative results show that our model learns the vocabulary and syntax for a valid visualization specification, appropriate transformations (count, bins, mean) and how to use common data selection patterns that occur within data visualizations. 
Our model generates visualizations that are comparable to manually created visualizations in a fraction of the time, with the potential to learn more complex visualization strategies at scale.\nMagnetic resonance images (MRI) play an important role in supporting and substituting clinical information in the diagnosis of multiple sclerosis (MS) disease by revealing lesions in brain MR images. In this paper, an algorithm for MS lesion segmentation from brain MR images is presented. We revisit modifications of the properties of fuzzy c-means algorithms and Canny edge detection. By modifying and reforming fuzzy c-means clustering algorithms, and applying the canny contraction principle, a relationship between MS lesions and edge detection is established. For the special case of FCM, we derive a sufficient condition and clustering parameters, allowing identification of them as (local) minima of the objective function.\nInformation Extraction (IE) refers to automatically extracting structured relation tuples from unstructured texts. Common IE solutions, including Relation Extraction (RE) and open IE systems, can hardly handle cross-sentence tuples, and are severely restricted by limited relation types as well as informal relation specifications (e.g., free-text based relation tuples). In order to overcome these weaknesses, we propose a novel IE framework named QA4IE, which leverages flexible question answering (QA) approaches to produce high-quality relation triples across sentences. Based on the framework, we develop a large IE benchmark with high-quality human evaluation. This benchmark contains 293K documents, 2M golden relation triples, and 636 relation types. We compare our system with some IE baselines on our benchmark and the results show that our system achieves great improvements.\nVariational autoencoders (VAE) combined with hierarchical RNNs have emerged as a powerful framework for conversation modeling. 
However, they suffer from the notorious degeneration problem, where the decoders learn to ignore latent variables and reduce to vanilla RNNs. We empirically show that this degeneracy occurs mostly due to two reasons. First, the expressive power of hierarchical RNN decoders is often high enough to model the data using only its decoding distributions without relying on the latent variables. Second, the conditional VAE structure whose generation process is conditioned on a context, makes the range of training targets very sparse; that is, the RNN decoders can easily overfit to the training data ignoring the latent variables. To solve the degeneration problem, we propose a novel model named Variational Hierarchical Conversation RNNs (VHCR), involving two key ideas of (1) using a hierarchical structure of latent variables, and (2) exploiting an utterance drop regularization. With evaluations on two datasets of Cornell Movie Dialog and Ubuntu Dialog Corpus, we show that our VHCR successfully utilizes latent variables and outperforms state-of-the-art models for conversation generation. Moreover, it can perform several new utterance control tasks, thanks to its hierarchical latent structure.\nThis paper proposes learning disentangled but complementary face features with minimal supervision by face identification. Specifically, we construct an identity Distilling and Dispelling Autoencoder (D2AE) framework that adversarially learns the identity-distilled features for identity verification and the identity-dispelled features to fool the verification system. Thanks to the design of two-stream cues, the learned disentangled features represent not only the identity or attribute but the complete input image. Comprehensive evaluations further demonstrate that the proposed features not only maintain state-of-the-art identity verification performance on LFW, but also acquire competitive discriminative power for face attribute recognition on CelebA and LFWA. 
Moreover, the proposed system is able to semantically control face generation/editing based on various identities and attributes in an unsupervised manner.\nProbabilistic topic models are popular unsupervised learning methods, including probabilistic latent semantic indexing (pLSI) and latent Dirichlet allocation (LDA). So far, their training has been implemented on general-purpose computers (GPCs), which are flexible in programming but energy-consuming. Towards low-energy implementations, this paper investigates their training on an emerging hardware technology called neuromorphic multi-chip systems (NMSs). NMSs are very effective for a family of algorithms called spiking neural networks (SNNs). We present three SNNs to train topic models. The first SNN is a batch algorithm combining the conventional collapsed Gibbs sampling (CGS) algorithm and an inference SNN to train LDA. The other two SNNs are online algorithms targeting both energy- and storage-limited environments. The two online algorithms are equivalent to training LDA by using maximum a posteriori estimation and maximizing the semi-collapsed likelihood, respectively. They use novel, tailored ordinary differential equations for stochastic optimization. We simulate the new algorithms and show that they are comparable to the GPC algorithms, while being suitable for NMS implementation. We also propose an extension to train pLSI and a method to prune the network to obey the limited fan-in of some NMSs.\nLogic is a foundation for many modern areas of computer science. In artificial intelligence, as a basis of database query languages, as well as in formal software and hardware verification --- modelling scenarios using logical formalisms and inferring new knowledge are important skills for future computer scientists. The Iltis project aims to provide a web-based, interactive system that supports teaching logical methods. 
In particular the system shall (a) support to learn to model knowledge and to infer new knowledge using propositional logic, modal logic and first-order logic, and (b) provide immediate feedback and support to students. This article presents a prototypical system that currently supports the above tasks for propositional logic. First impressions on its use in a second year logic course for computer science students are reported.\nResearch has shown that personalization of health interventions can contribute to an improved effectiveness. Reinforcement learning algorithms can be used to perform such tailoring using data that is collected about users. Learning is however very fragile for health interventions as only limited time is available to learn from the user before disengagement takes place, or before the opportunity to intervene passes. In this paper, we present a cluster-based reinforcement learning approach which learns across groups of users. Such an approach can speed up the learning process while still giving a level of personalization. The clustering algorithm uses a distance metric over traces of states and rewards. We apply both online and batch learning to learn policies over the clusters and introduce a publicly available simulator which we have developed to evaluate the approach. The results show batch learning clearly outperforms online learning. Furthermore, clustering can be beneficial provided that a proper clustering is found.\nWe present new intuitions and theoretical assessments of the emergence of disentangled representation in variational autoencoders. Taking a rate-distortion theory perspective, we show the circumstances under which representations aligned with the underlying generative factors of variation of data emerge when optimising the modified ELBO bound in $\\beta$-VAE, as training progresses. 
From these insights, we propose a modification to the training regime of $\\beta$-VAE, that progressively increases the information capacity of the latent code during training. This modification facilitates the robust learning of disentangled representations in $\\beta$-VAE, without the previous trade-off in reconstruction accuracy.\nThe objective of transfer reinforcement learning is to generalize from a set of previous tasks to unseen new tasks. In this work, we focus on the transfer scenario where the dynamics among tasks are the same, but their goals differ. Although general value function (Sutton et al., 2011) has been shown to be useful for knowledge transfer, learning a universal value function can be challenging in practice. To attack this, we propose (1) to use universal successor representations (USR) to represent the transferable knowledge and (2) a USR approximator (USRA) that can be trained by interacting with the environment. Our experiments show that USR can be effectively applied to new tasks, and the agent initialized by the trained USRA can achieve the goal considerably faster than random initialization.\nWe propose Cooperative Training (CoT) for training generative models that measure a tractable density function for target data. CoT coordinately trains a generator $G$ and an auxiliary predictive mediator $M$. The training target of $M$ is to estimate a mixture density of the learned distribution $G$ and the target distribution $P$, and that of $G$ is to minimize the Jensen-Shannon divergence estimated through $M$. CoT achieves independent success without the necessity of pre-training via Maximum Likelihood Estimation or involving high-variance algorithms like REINFORCE. This low-variance algorithm is theoretically proved to be unbiased for both generative and predictive tasks. 
We also theoretically and empirically show the superiority of CoT over most previous algorithms, in terms of generative quality and diversity, predictive generalization ability and computational cost.\nAn innovative model of parcel distribution is emerging from the accelerated evolution of drones and the effort of logistics companies to make faster deliveries at a reduced cost. This new modality gave rise to the Flying Sidekick Traveling Salesman Problem (FSTSP), in which customers are served either by a truck or a drone. Additionally, this variant of the Traveling Salesman Problem (TSP) presents several new restrictions concerning the drone, such as endurance and payload capacity. This work proposes a hybrid heuristic in which the initial solution is created from the optimal TSP solution reached by a Mixed-Integer Programming (MIP) solver. Next, an implementation of the General Variable Neighborhood Search is used to obtain the delivery routes of truck and drone. Computational experiments show the potential of the algorithm to improve the total delivery time by up to 67.79%. New best-known solutions (BKS) are established for all FSTSP instances whose results are reported in the literature. Furthermore, a new set of instances based on well-known TSPLIB instances is provided.\nA characteristic of existing predictive process monitoring techniques is to first construct a predictive model based on past process executions, and then use it to predict the future of new ongoing cases, without the possibility of updating it with new cases when they complete their execution. This can make predictive process monitoring too rigid to deal with the variability of processes working in real environments that continuously evolve and/or exhibit new variant behaviors over time. As a solution to this problem, we propose the use of algorithms that allow the incremental construction of the predictive model. 
These incremental learning algorithms update the model whenever new cases become available so that the predictive model evolves over time to fit the current circumstances. The algorithms have been implemented using different case encoding strategies and evaluated on a number of real and synthetic datasets. The results provide a first evidence of the potential of incremental learning strategies for predicting process monitoring in real environments, and of the impact of different case encoding strategies in this setting.\nWe present a simulation-based approach for generating barrier certificate functions for safety verification of cyber-physical systems (CPS) that contain neural network-based controllers. A linear programming solver is utilized to find a candidate generator function from a set of simulation traces obtained by randomly selecting initial states for the CPS model. A level set of the generator function is then selected to act as a barrier certificate for the system, meaning it demonstrates that no unsafe system states are reachable from a given set of initial states. The barrier certificate properties are verified with an SMT solver. This approach is demonstrated on a case study in which a Dubins car model of an autonomous vehicle is controlled by a neural network to follow a given path.\nMulti-agent reinforcement learning offers a way to study how communication could emerge in communities of agents needing to solve specific problems. In this paper, we study the emergence of communication in the negotiation environment, a semi-cooperative model of agent interaction. We introduce two communication protocols -- one grounded in the semantics of the game, and one which is \\textit{a priori} ungrounded and is a form of cheap talk. We show that self-interested agents can use the pre-grounded communication channel to negotiate fairly, but are unable to effectively use the ungrounded channel. 
However, prosocial agents do learn to use cheap talk to find an optimal negotiating strategy, suggesting that cooperation is necessary for language to emerge. We also study communication behaviour in a setting where one agent interacts with agents in a community with different levels of prosociality and show how agent identifiability can aid negotiation.\nThe ability of algorithms to evolve or learn (compositional) communication protocols has traditionally been studied in the language evolution literature through the use of emergent communication tasks. Here we scale up this research by using contemporary deep learning methods and by training reinforcement-learning neural network agents on referential communication games. We extend previous work, in which agents were trained in symbolic environments, by developing agents which are able to learn from raw pixel data, a more challenging and realistic input representation. We find that the degree of structure found in the input data affects the nature of the emerged protocols, and thereby corroborate the hypothesis that structured compositional language is most likely to emerge when agents perceive the world as being structured.\nExploration is a fundamental aspect of Reinforcement Learning, typically implemented using stochastic action-selection. Exploration, however, can be more efficient if directed toward gaining new world knowledge. Visit-counters have been proven useful both in practice and in theory for directed exploration. However, a major limitation of counters is their locality. While there are a few model-based solutions to this shortcoming, a model-free approach is still missing. We propose $E$-values, a generalization of counters that can be used to evaluate the propagating exploratory value over state-action trajectories. We compare our approach to commonly used RL techniques, and show that using $E$-values improves learning and performance over traditional counters. 
We also show how our method can be implemented with function approximation to efficiently learn continuous MDPs. We demonstrate this by showing that our approach surpasses state of the art performance in the Freeway Atari 2600 game.\nInferring socioeconomic attributes of social media users such as occupation and income is an important problem in computational social science. Automated inference of such characteristics has applications in personalised recommender systems, targeted computational advertising and online political campaigning. While previous work has shown that language features can reliably predict socioeconomic attributes on Twitter, employing information coming from users' social networks has not yet been explored for such complex user characteristics. In this paper, we describe a method for predicting the occupational class and the income of Twitter users given information extracted from their extended networks by learning a low-dimensional vector representation of users, i.e. graph embeddings. We use this representation to train predictive models for occupational class and income. Results on two publicly available datasets show that our method consistently outperforms the state-of-the-art methods in both tasks. We also obtain further significant improvements when we combine graph embeddings with textual features, demonstrating that social network and language information are complementary.\nMarket making is a fundamental trading problem in which an agent provides liquidity by continually offering to buy and sell a security. The problem is challenging due to inventory risk, the risk of accumulating an unfavourable position and ultimately losing money. In this paper, we develop a high-fidelity simulation of limit order book markets, and use it to design a market making agent using temporal-difference reinforcement learning. 
We use a linear combination of tile codings as a value function approximator, and design a custom reward function that controls inventory risk. We demonstrate the effectiveness of our approach by showing that our agent outperforms both simple benchmark strategies and a recent online learning approach from the literature.\nIn the medical domain, identifying and expanding abbreviations in clinical texts is a vital task for both better human and machine understanding. It is a challenging task because many abbreviations are ambiguous especially for intensive care medicine texts, in which phrase abbreviations are frequently used. Besides the fact that there is no universal dictionary of clinical abbreviations and no universal rules for abbreviation writing, such texts are difficult to acquire, expensive to annotate and even sometimes, confusing to domain experts. This paper proposes a novel and effective approach -- exploiting task-oriented resources to learn word embeddings for expanding abbreviations in clinical notes. We achieved 82.27\\% accuracy, close to expert human performance.\nIn several recently proposed stochastic optimization methods (e.g. RMSProp, Adam, Adadelta), parameter updates are scaled by the inverse square roots of exponential moving averages of squared past gradients. Maintaining these per-parameter second-moment estimators requires memory equal to the number of parameters. For the case of neural network weight matrices, we propose maintaining only the per-row and per-column sums of these moving averages, and estimating the per-parameter second moments based on these sums. We demonstrate empirically that this method produces similar results to the baseline. Secondly, we show that adaptive methods can produce larger-than-desired updates when the decay rate of the second moment accumulator is too slow. We propose update clipping and a gradually increasing decay rate scheme as remedies. 
Combining these methods and dropping momentum, we achieve comparable results to the published Adam regime in training the Transformer model on the WMT 2014 English-German machine translation task, while using very little auxiliary storage in the optimizer. Finally, we propose scaling the parameter updates based on the scale of the parameters themselves.\nWe suggest that the analysis of incomplete contracting developed by law and economics researchers can provide a useful framework for understanding the AI alignment problem and help to generate a systematic approach to finding solutions. We first provide an overview of the incomplete contracting literature and explore parallels between this work and the problem of AI alignment. As we emphasize, misalignment between principal and agent is a core focus of economic analysis. We highlight some technical results from the economics literature on incomplete contracts that may provide insights for AI alignment researchers. Our core contribution, however, is to bring to bear an insight that economists have been urged to absorb from legal scholars and other behavioral scientists: the fact that human contracting is supported by substantial amounts of external structure, such as generally available institutions (culture, law) that can supply implied terms to fill the gaps in incomplete contracts. We propose a research agenda for AI alignment work that focuses on the problem of how to build AI that can replicate the human cognitive processes that connect individual incomplete contracts with this supporting external structure.\nA new large-scale video dataset for human action recognition, called STAIR Actions is introduced. STAIR Actions contains 100 categories of action labels representing fine-grained everyday home actions so that it can be applied to research in various home tasks such as nursing, caring, and security. In STAIR Actions, each video has a single action label. 
Moreover, for each action category, there are around 1,000 videos that were obtained from YouTube or produced by crowdsource workers. The duration of each video is mostly five to six seconds. The total number of videos is 102,462. We explain how we constructed STAIR Actions and show the characteristics of STAIR Actions compared to existing datasets for human action recognition. Experiments with three major models for action recognition show that STAIR Actions can train large models and achieve good performance. STAIR Actions can be downloaded from http://actions.stair.center.\nThis paper demonstrates two novel methods to estimate the global SNR of speech signals. In both methods, Deep Neural Network-Hidden Markov Model (DNN-HMM) acoustic model used in speech recognition systems is leveraged for the additional task of SNR estimation. In the first method, the entropy of the DNN-HMM output is computed. Recent work on bayesian deep learning has shown that a DNN-HMM trained with dropout can be used to estimate model uncertainty by approximating it as a deep Gaussian process. In the second method, this approximation is used to obtain model uncertainty estimates. Noise specific regressors are used to predict the SNR from the entropy and model uncertainty. The DNN-HMM is trained on GRID corpus and tested on different noise profiles from the DEMAND noise database at SNR levels ranging from -10 dB to 30 dB.\nThe Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. 
In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.\n3D Convolutional Neural Networks are sensitive to transformations applied to their input. This is a problem because a voxelized version of a 3D object, and its rotated clone, will look unrelated to each other after passing through to the last layer of a network. Instead, an idealized model would preserve a meaningful representation of the voxelized object, while explaining the pose-difference between the two inputs. An equivariant representation vector has two components: the invariant identity part, and a discernable encoding of the transformation. Models that can't explain pose-differences risk \"diluting\" the representation, in pursuit of optimizing a classification or regression loss function.   We introduce a Group Convolutional Neural Network with linear equivariance to translations and right angle rotations in three dimensions. We call this network CubeNet, reflecting its cube-like symmetry. By construction, this network helps preserve a 3D shape's global and local signature, as it is transformed through successive layers. We apply this network to a variety of 3D inference problems, achieving state-of-the-art on the ModelNet10 classification challenge, and comparable performance on the ISBI 2012 Connectome Segmentation Benchmark. 
To the best of our knowledge, this is the first 3D rotation equivariant CNN for voxel representations.\nImage segmentation needs both local boundary position information and global object context information. The performance of the recent state-of-the-art method, fully convolutional networks, reaches a bottleneck due to the neural network limit after balancing between the two types of information simultaneously in an end-to-end training style. To overcome this problem, we divide the semantic image segmentation into temporal subtasks. First, we find a possible pixel position of some object boundary; then trace the boundary at steps within a limited length until the whole object is outlined. We present the first deep reinforcement learning approach to semantic image segmentation, called DeepOutline, which outperforms other algorithms in Coco detection leaderboard in the middle and large size person category in Coco val2017 dataset. Meanwhile, it provides an insight into a divide and conquer way by reinforcement learning on computer vision problems.\nJoint visual attention is characterized by two or more individuals looking at a common target at the same time. The ability to identify joint attention in scenes, the people involved, and their common target, is fundamental to the understanding of social interactions, including others' intentions and goals. In this work we deal with the extraction of joint attention events, and the use of such events for image descriptions. The work makes two novel contributions. First, our extraction algorithm is the first which identifies joint visual attention in single static images. It computes 3D gaze direction, identifies the gaze target by combining gaze direction with a 3D depth map computed for the image, and identifies the common gaze target. 
Second, we use a human study to demonstrate the sensitivity of humans to joint attention, suggesting that the detection of such a configuration in an image can be useful for understanding the image, including the goals of the agents and their joint activity, and therefore can contribute to image captioning and related tasks.\nThe web contains countless semi-structured websites, which can be a rich source of information for populating knowledge bases. Existing methods for extracting relations from the DOM trees of semi-structured webpages can achieve high precision and recall only when manual annotations for each website are available. Although there have been efforts to learn extractors from automatically-generated labels, these methods are not sufficiently robust to succeed in settings with complex schemas and information-rich websites.   In this paper we present a new method for automatic extraction from semi-structured websites based on distant supervision. We automatically generate training labels by aligning an existing knowledge base with a web page and leveraging the unique structural characteristics of semi-structured websites. We then train a classifier based on the potentially noisy and incomplete labels to predict new relation instances. Our method can compete with annotation-based techniques in the literature in terms of extraction quality. A large-scale experiment on over 400,000 pages from dozens of multi-lingual long-tail websites harvested 1.25 million facts at a precision of 90%.\nUnintentional falls can cause severe injuries and even death, especially if no immediate assistance is given. The aim of Fall Detection Systems (FDSs) is to detect an occurring fall. This information can be used to trigger the necessary assistance in case of injury. This can be done by using either ambient-based sensors, e.g. cameras, or wearable devices. 
The aim of this work is to study the technical aspects of FDSs based on wearable devices and artificial intelligence techniques, in particular Deep Learning (DL), to implement an effective algorithm for on-line fall detection. The proposed classifier is based on a Recurrent Neural Network (RNN) model with underlying Long Short-Term Memory (LSTM) blocks. The method is tested on the publicly available SisFall dataset, with extended annotation, and compared with the results obtained by the SisFall authors.\nThe W3C's Web of Things working group is aimed at addressing the interoperability problem on the Internet of Things using Linked Data as uniform interface. While Linked Data paves the way towards combining such devices into integrated applications, traditional solutions for specifying the control flow of applications do not work seamlessly with Linked Data. We therefore tackle the problem of the specification, execution, and monitoring of applications in the context of Linked Data. We present a novel approach that combines workflows, semantic reasoning, and RESTful interaction into one integrated solution. We contribute to the state of the art by (1) defining an ontology for describing workflow models and instances, (2) providing operational semantics for the ontology that allows for the execution and monitoring of workflow instances, (3) presenting a benchmark to evaluate our solution. Moreover, we showcase how we used the ontology and the operational semantics to monitor pilots executing workflows in virtual aircraft cockpits.\nThe analysis of practical probabilistic models on the computer demands a convenient representation for the available knowledge and an efficient algorithm to perform inference. An appealing representation is the influence diagram, a network that makes explicit the random variables in a model and their probabilistic dependencies. Recent advances have developed solution procedures based on the influence diagram. 
In this paper, we examine the fundamental properties that underlie those techniques, and the information about the probabilistic structure that is available in the influence diagram representation. The influence diagram is a convenient representation for computer processing while also being clear and non-mathematical. It displays probabilistic dependence precisely, in a way that is intuitive for decision makers and experts to understand and communicate. As a result, the same influence diagram can be used to build, assess and analyze a model, facilitating changes in the formulation and feedback from sensitivity analysis. The goal in this paper is to determine arbitrary conditional probability distributions from a given probabilistic model. Given qualitative information about the dependence of the random variables in the model we can, for a specific conditional expression, specify precisely what quantitative information we need to be able to determine the desired conditional probability distribution. It is also shown how we can find that probability distribution by performing operations locally, that is, over subspaces of the joint distribution. In this way, we can exploit the conditional independence present in the model to avoid having to construct or manipulate the full joint distribution. These results are extended to include maximal processing when the information available is incomplete, and optimal decision making in an uncertain environment. Influence diagrams as a computer-aided modeling tool were developed by Miller, Merkofer, and Howard [5] and extended by Howard and Matheson [2]. Good descriptions of how to use them in modeling are in Owen [7] and Howard and Matheson [2]. The notion of solving a decision problem through influence diagrams was examined by Olmsted [6] and such an algorithm was developed by Shachter [8]. The latter paper also shows how influence diagrams can be used to perform a variety of sensitivity analyses. 
This paper extends those results by developing a theory of the properties of the diagram that are used by the algorithm, and the information needed to solve arbitrary probability inference problems. Section 2 develops the notation and the framework for the paper and the relationship between influence diagrams and joint probability distributions. The general probabilistic inference problem is posed in Section 3. In Section 4 the transformations on the diagram are developed and then put together into a solution procedure in Section 5. In Section 6, this procedure is used to calculate the information requirement to solve an inference problem and the maximal processing that can be performed with incomplete information. Section 7 contains a summary of results.\nThis paper surveys the emerging science of how to design a ``COllective INtelligence'' (COIN). A COIN is a large multi-agent system where:   (i) There is little to no centralized communication or control; and   (ii) There is a provided world utility function that rates the possible histories of the full system.   In particular, we are interested in COINs in which each agent runs a reinforcement learning (RL) algorithm. Rather than use a conventional modeling approach (e.g., model the system dynamics, and hand-tune agents to cooperate), we aim to solve the COIN design problem implicitly, via the ``adaptive'' character of the RL algorithms of each of the agents. This approach introduces an entirely new, profound design problem: Assuming the RL algorithms are able to achieve high rewards, what reward functions for the individual agents will, when pursued by those agents, result in high world utility? In other words, what reward functions will best ensure that we do not have phenomena like the tragedy of the commons, Braess's paradox, or the liquidity trap?   
Although still very young, research specifically concentrating on the COIN design problem has already resulted in successes in artificial domains, in particular in packet-routing, the leader-follower problem, and in variants of Arthur's El Farol bar problem. It is expected that as it matures and draws upon other disciplines related to COINs, this research will greatly expand the range of tasks addressable by human engineers. Moreover, in addition to drawing on them, such a fully developed science of COIN design may provide much insight into other already established scientific fields, such as economics, game theory, and population biology.\nThis paper shows how a machine, which observes stimuli through an uncharacterized, uncalibrated channel and sensor, can glean machine-independent information (i.e., channel- and sensor-independent information) about the stimuli. First, we demonstrate that a machine defines a specific coordinate system on the stimulus state space, with the nature of that coordinate system depending on the device's channel and sensor. Thus, machines with different channels and sensors \"see\" the same stimulus trajectory through state space, but in different machine-specific coordinate systems. For a large variety of physical stimuli, statistical properties of that trajectory endow the stimulus configuration space with differential geometric structure (a metric and parallel transfer procedure), which can then be used to represent relative stimulus configurations in a coordinate-system-independent manner (and, therefore, in a channel- and sensor-independent manner). The resulting description is an \"inner\" property of the stimulus time series in the sense that it does not depend on extrinsic factors like the observer's choice of a coordinate system in which the stimulus is viewed (i.e., the observer's choice of channel and sensor). This methodology is illustrated with analytic examples and with a numerically simulated experiment. 
In an intelligent sensory device, this kind of representation \"engine\" could function as a \"front-end\" that passes channel/sensor-independent stimulus representations to a pattern recognition module. After a pattern recognizer has been trained in one of these devices, it could be used without change in other devices having different channels and sensors.\nWe present results from the first geological field tests of the `Cyborg Astrobiologist', which is a wearable computer and video camcorder system that we are using to test and train a computer-vision system towards having some of the autonomous decision-making capabilities of a field-geologist and field-astrobiologist. The Cyborg Astrobiologist platform has thus far been used for testing and development of these algorithms and systems: robotic acquisition of quasi-mosaics of images, real-time image segmentation, and real-time determination of interesting points in the image mosaics. The hardware and software systems function reliably, and the computer-vision algorithms are adequate for the first field tests. In addition to the proof-of-concept aspect of these field tests, the main result of these field tests is the enumeration of those issues that we can improve in the future, including: first, detection and accounting for shadows caused by 3D jagged edges in the outcrop; second, reincorporation of more sophisticated texture-analysis algorithms into the system; third, creation of hardware and software capabilities to control the camera's zoom lens in an intelligent manner; and fourth, development of algorithms for interpretation of complex geological scenery. 
Nonetheless, despite these technical inadequacies, this Cyborg Astrobiologist system, consisting of a camera-equipped wearable-computer and its computer-vision algorithms, has demonstrated its ability of finding genuinely interesting points in real-time in the geological scenery, and then gathering more information about these interest points in an automated manner.\nWe present results from the first geological field tests of the `Cyborg Astrobiologist', which is a wearable computer and video camcorder system that we are using to test and train a computer-vision system towards having some of the autonomous decision-making capabilities of a field-geologist. The Cyborg Astrobiologist platform has thus far been used for testing and development of these algorithms and systems: robotic acquisition of quasi-mosaics of images, real-time image segmentation, and real-time determination of interesting points in the image mosaics. This work is more of a test of the whole system, rather than of any one part of the system. However, beyond the concept of the system itself, the uncommon map (despite its simplicity) is the main innovative part of the system. The uncommon map helps to determine interest-points in a context-free manner. Overall, the hardware and software systems function reliably, and the computer-vision algorithms are adequate for the first field tests. In addition to the proof-of-concept aspect of these field tests, the main result of these field tests is the enumeration of those issues that we can improve in the future, including: dealing with structural shadow and microtexture, and also, controlling the camera's zoom lens in an intelligent manner. 
Nonetheless, despite these and other technical inadequacies, this Cyborg Astrobiologist system, consisting of a camera-equipped wearable-computer and its computer-vision algorithms, has demonstrated its ability of finding genuinely interesting points in real-time in the geological scenery, and then gathering more information about these interest points in an automated manner. We use these capabilities for autonomous guidance towards geological points-of-interest.\nIn recent years genetic algorithms have emerged as a useful tool for the heuristic solution of complex discrete optimisation problems. In particular there has been considerable interest in their use in tackling problems arising in the areas of scheduling and timetabling. However, the classical genetic algorithm paradigm is not well equipped to handle constraints and successful implementations usually require some sort of modification to enable the search to exploit problem specific knowledge in order to overcome this shortcoming. This paper is concerned with the development of a family of genetic algorithms for the solution of a nurse rostering problem at a major UK hospital. The hospital is made up of wards of up to 30 nurses. Each ward has its own group of nurses whose shifts have to be scheduled on a weekly basis. In addition to fulfilling the minimum demand for staff over three daily shifts, nurses' wishes and qualifications have to be taken into account. The schedules must also be seen to be fair, in that unpopular shifts have to be spread evenly amongst all nurses, and other restrictions, such as team nursing and special conditions for senior staff, have to be satisfied. The basis of the family of genetic algorithms is a classical genetic algorithm consisting of n-point crossover, single-bit mutation and a rank-based selection. 
The solution space consists of all schedules in which each nurse works the required number of shifts, but the remaining constraints, both hard and soft, are relaxed and penalised in the fitness function. The talk will start with a detailed description of the problem and the initial implementation and will go on to highlight the shortcomings of such an approach, in terms of the key element of balancing feasibility, i.e. covering the demand and work regulations, and quality, as measured by the nurses' preferences. A series of experiments involving parameter adaptation, niching, intelligent weights, delta coding, local hill climbing, migration and special selection rules will then be outlined and it will be shown how a series of these enhancements were able to eradicate these difficulties. Results based on several months' real data will be used to measure the impact of each modification, and to show that the final algorithm is able to compete with a tabu search approach currently employed at the hospital. The talk will conclude with some observations as to the overall quality of this approach to this and similar problems.\nGesture recognition is mainly apprehensive on analyzing the functionality of human wits. The main goal of gesture recognition is to create a system which can recognize specific human gestures and use them to convey information or for device control. Hand gestures provide a separate complementary modality to speech for expressing one's ideas. Information associated with hand gestures in a conversation is degree, discourse structure, spatial and temporal structure. The approaches present can be mainly divided into Data-Glove Based and Vision Based approaches. An important face feature point is the nose tip. Since nose is the highest protruding point from the face. Besides that, it is not affected by facial expressions. Another important function of the nose is that it is able to indicate the head pose. 
Knowledge of the nose location will enable us to align an unknown 3D face with those in a face database. Eye detection is divided into eye position detection and eye contour detection. Existing works in eye detection can be classified into two major categories: traditional image-based passive approaches and the active IR based approaches. The former uses intensity and shape of eyes for detection and the latter works on the assumption that eyes have a reflection under near IR illumination and produce bright/dark pupil effect. The traditional methods can be broadly classified into three categories: template based methods, appearance based methods and feature based methods. The purpose of this paper is to compare various human Gesture recognition systems for interfacing machines directly to human wits without any corporeal media in an ambient environment.\nSummary of results in last project period (1. 10. 2009 - 30. 9. 2010) of SNFS Project \"From locomotion to cognition\"   The research that we have been involved in, and will continue to do, starts from the insight that in order to understand and design intelligent behavior, we must adopt an embodied perspective, i.e. we must take the entire agent, including its shape or morphology, the materials out of which it is built, and its interaction with the environment into account, in addition to the neural control. A lot of our research in the past has been on relatively low-level sensory-motor tasks such as locomotion (e.g. walking, running, jumping), navigation, and grasping. While this research is of interest in itself, in the context of artificial intelligence and cognitive science, this leads to the question of what these kinds of tasks have to do with higher levels of cognition, or to put it more provocatively, \"What does walking have to do with thinking?\" This question is of course reminiscent of the notorious \"symbol grounding problem\". 
In contrast to most of the research on symbol grounding, we propose to exploit the dynamic interaction between the embodied agent and the environment as the basis for grounding. We use the term \"morphological computation\" to designate the fact that some of the control or computation can be taken over by the dynamic interaction derived from morphological properties (e.g. the passive forward swing of the leg in walking, the spring-like properties of the muscles, and the weight distribution). By taking morphological computation into account, an agent will be able to achieve not only faster, more robust, and more energy-efficient behavior, but also more situated exploration by the agent for the comprehensive understanding of the environment.\nSummary of results (project period 1. 10. 2008 - 30. 9. 2009) of SNFS Project \"From locomotion to cognition\"   The research that we have been involved in, and will continue to do, starts from the insight that in order to understand and design intelligent behavior, we must adopt an embodied perspective, i.e. we must take the entire agent, including its shape or morphology, the materials out of which it is built, and its interaction with the environment into account, in addition to the neural control. A lot of our research in the past has been on relatively low-level sensory-motor tasks such as locomotion (e.g. walking, running, jumping), navigation, and grasping. While this research is of interest in itself, in the context of artificial intelligence and cognitive science, this leads to the question of what these kinds of tasks have to do with higher levels of cognition, or to put it more provocatively, \"What does walking have to do with thinking?\" This question is of course reminiscent of the notorious \"symbol grounding problem\". In contrast to most of the research on symbol grounding, we propose to exploit the dynamic interaction between the embodied agent and the environment as the basis for grounding. 
We use the term \"morphological computation\" to designate the fact that some of the control or computation can be taken over by the dynamic interaction derived from morphological properties (e.g. the passive forward swing of the leg in walking, the spring-like properties of the muscles, and the weight distribution). By taking morphological computation into account, an agent will be able to achieve not only faster, more robust, and more energy-efficient behavior, but also more situated exploration by the agent for the comprehensive understanding of the environment.\nIn India, financial markets have existed for many years. A functionally accented, diverse, efficient and flexible financial system is vital to the national objective of creating a market driven, productive and competitive economy. Today markets of varying maturity exist in equity, debt, commodities and foreign exchange. In this work we attempt to generate a prediction rule scheme for stock price movement at the Bombay Stock Exchange using an important Soft Computing paradigm, viz., the Rough Fuzzy Multi Layer Perceptron. The use of Computational Intelligence Systems such as Neural Networks, Fuzzy Sets, Genetic Algorithms, etc. for Stock Market Predictions has been widely established. The process is to extract knowledge in the form of rules from daily stock movements. These rules can then be used to guide investors. To increase the efficiency of the prediction process, Rough Sets are used to discretize the data. The methodology uses a Genetic Algorithm to obtain a structured network suitable for both classification and rule extraction. The modular concept, based on a divide-and-conquer strategy, provides accelerated training and a compact network suitable for generating a minimum number of rules with high certainty values. The concept of a variable mutation operator is introduced for preserving the localized structure of the constituting Knowledge Based sub-networks, while they are integrated and evolved.
Rough Set Dependency Rules are generated directly from the real-valued attribute table containing Fuzzy membership values. The paradigm is thus used to develop a rule extraction algorithm. The extracted rules are compared with some of the related rule extraction techniques on the basis of some quantitative performance indices. The proposed methodology extracts rules which are fewer in number, are accurate, have high certainty factor and have low confusion with less computation time.\nIf we consider Big History as simply 'our' example of the process of cosmic evolution playing out, then we can seek to broaden our view of our possible fate as a species by asking questions about what paths or trajectories other species' own versions of Big History might take or have taken. This paper explores the broad outlines of possible scenarios for the evolution of long-lived intelligent engineering species---scenarios which might have been part of another species' own Big History story, or which may yet lie ahead in our own distant future. A sufficiently long-lived engineering-oriented species may decide to undertake a program of macro-engineering projects that might eventually lead to a re-engineered galaxy so altered that its artificiality may be detectable from Earth. We consider activities that lead ultimately to a galactic structure consisting of a central inner core surrounded by a more distant ring of stars separated by a relatively sparser 'gap', where star systems and stellar materials may have been removed, 'lifted' or turned into Dyson Spheres. When one looks to the sky, one finds that such galaxies do indeed exist---including the beautiful ringed galaxy known as 'Hoag's Object' (PGC 54559) in the constellation Serpens. This leads us to pose the question: Is Hoag's Object an example of galaxy-scale macro-engineering?
And this suggests a program of possible observational activities and theoretical explorations, several of which are presented here, that could be carried out in order to begin to investigate this beguiling question.\nEvery day, billions of mobile network events (i.e. CDRs) are generated by cellular phone operator companies. Latent in this data are inspiring insights about human actions and behaviors, the discovery of which is important because context-aware applications and services hold the key to user-driven, intelligent services, which can enhance our everyday lives in areas such as social and economic development, urban planning, and health prevention. The major challenge in this area is that interpreting such a big stream of data requires a deep understanding of mobile network events' context through available background knowledge. This article addresses the issues in context awareness given heterogeneous and uncertain data of mobile network events missing reliable information on the context of this activity. The contribution of this research is a model from a combination of logical and statistical reasoning standpoints for enabling human activity inference in qualitative terms from open geographical data that aims at improving the quality of human behavior recognition tasks from CDRs. We use open geographical data, OpenStreetMap (OSM), as a proxy for predicting the content of human activity in the area. The user study performed in Trento shows that predicted human activities (top level) match the survey data with around 93% overall accuracy. An extensive validation for predicting a more specific economic type of human activity was performed in Barcelona, employing credit card transaction data. The analysis identifies that appropriately normalized data on points of interest (POI) is a good proxy for predicting human economic activities, with 84% accuracy on average.
The model thus proves efficient for predicting the context of human activity when its total level can be observed from cell phone data records, even though those records lack contextual information.\nWe propose a high-level network architecture for an economic system that integrates money, governance and reputation. We introduce a method for issuing and redeeming a digital coin using a mechanism to create a sustainable global economy and a free market. To maintain a currency's value over time, and therefore be money proper, we claim it must be issued by the buyer and backed for value by the seller, exchanging the products of labour, in a free market. We also claim that a free market and sustainable economy cannot be maintained using economically arbitrary creation and allocation of money. Nakamoto, with Bitcoin, introduced a new technology called the cryptographic blockchain to operate a decentralised and distributed accounts ledger without the need for a trusted third party. This blockchain technology creates and allocates new digital currency as a reward for \"proof-of-work\", to secure the network. However, no currency, digital or otherwise, has solved how to create and allocate money in an economically non-arbitrary way, or how to govern and trust a world-scale free enterprise money system. We propose an \"Ontologically Networked Exchange\" (ONE), with purpose as its highest order domain. Each purpose is defined in a contract, and the entire economy of contracts is structured in a unified ontology. We claim to secure the ONE network using economically non-arbitrary methodologies and economically incented human behaviour. Decisions influenced by reputation help to secure the network without a trusted third party. The stack of contracts, organised in a unified ontology, functions as a super recursive algorithm, with individual use programming the algorithm, acting as the \"oracle\".
The state of the algorithm becomes the \"memory\" of a scalable and trustable artificial intelligence (AI). This AI offers a new platform for what we call the \"Autonomy-of-Things\" (AoT).\nRoute choice in multimodal networks varies considerably across individuals and with the current situational context. Personalization of recommendation algorithms is already common in many areas, e.g., online retail. However, most online routing applications still provide shortest distance or shortest travel-time routes only, neglecting individual preferences as well as the current situation. Both aspects are of particular importance in a multimodal setting, as the attractiveness of some transportation modes such as biking crucially depends on personal characteristics and exogenous factors like the weather. This paper introduces the FAVourite rOUte Recommendation (FAVOUR) approach to provide personalized, situation-aware route proposals based on three steps: first, at the initialization stage, the user provides limited information (home location, work place, mobility options, sociodemographics) used to select one out of a small number of initial profiles. Second, based on this information, a stated preference survey is designed in order to sharpen the profile. In this step a mass preference prior is used to encode the prior knowledge on preferences from the class identified in step one. Third, the profile is continuously updated during usage of the routing services. The last two steps use Bayesian learning techniques in order to incorporate information from all contributing individuals. The FAVOUR approach is presented in detail and tested on a small number of survey participants. The experimental results on this real-world dataset show that FAVOUR generates better-quality recommendations than alternative learning algorithms from the literature.
In particular, the definition of the mass preference prior for initialization of step two is shown to provide better predictions than a number of alternatives from the literature.\nNowadays, large amounts of GPS trajectory data are being continuously collected by GPS-enabled devices such as vehicle navigation systems and mobile phones. GPS trajectory data is useful for applications such as traffic management, location forecasting, and itinerary planning. Such applications often need to extract the time-stamped Sequence of Visited Locations (SVLs) of the mobile objects. The nearest neighbor query (NNQ) is the most widely applied method for labeling the visited locations based on the IDs of the POIs in the process of SVL generation. In some scenarios, NNQ is not accurate enough. To improve the quality of the extracted SVLs, instead of using NNQ, we label the visited locations with the IDs of the POIs which geometrically intersect with the GPS observations. The intersection operator requires accurate geometries of the points of interest, which we refer to as Geometries of Interest (GOIs). In some application domains (e.g. movement trajectories of animals), adequate information about the POIs and their GOIs may not be available a priori, or they may not be publicly accessible and, therefore, they need to be derived from GPS trajectory data. In this paper we propose a novel method for estimating the POIs and their GOIs, which consists of three phases: (i) extracting the geometries of the stay regions; (ii) constructing the geometry of destination regions based on the extracted stay regions; and (iii) constructing the GOIs based on the geometries of the destination regions.
Using the geometric similarity to known GOIs as the major evaluation criterion, the experiments we performed using long-term GPS trajectory data show that our method outperforms the existing approaches.\nWe introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from history, and answer the question accurately. Visual Dialog is disentangled enough from a specific downstream task so as to serve as a general test of machine intelligence, while being grounded in vision enough to allow objective evaluation of individual responses and benchmark progress. We develop a novel two-person chat data-collection protocol to curate a large-scale Visual Dialog dataset (VisDial). VisDial v0.9 has been released and contains 1 dialog with 10 question-answer pairs on each of ~120k images from COCO, with a total of ~1.2M dialog question-answer pairs. We introduce a family of neural encoder-decoder models for Visual Dialog with 3 encoders -- Late Fusion, Hierarchical Recurrent Encoder and Memory Network -- and 2 decoders (generative and discriminative), which outperform a number of sophisticated baselines. We propose a retrieval-based evaluation protocol for Visual Dialog where the AI agent is asked to sort a set of candidate answers and evaluated on metrics such as mean-reciprocal-rank of human response. We quantify the gap between machine and human performance on the Visual Dialog task via human studies. Putting it all together, we demonstrate the first 'visual chatbot'!
Our dataset, code, trained models and visual chatbot are available at https://visualdialog.org\nIn this work, a computational intelligence (CI) technique named flexible neural tree (FNT) was developed to predict die filling performance of pharmaceutical granules and to identify significant die filling process variables. FNT resembles a feedforward neural network and creates a tree-like structure by using genetic programming. To improve accuracy, FNT parameters were optimized by using the differential evolution algorithm. The performance of the FNT-based CI model was evaluated and compared with other CI techniques: multilayer perceptron, Gaussian process regression, and reduced error pruning tree. The accuracy of the CI model was evaluated experimentally using die filling as a case study. The die filling experiments were performed using a model shoe system and three different grades of microcrystalline cellulose (MCC) powders (MCC PH 101, MCC PH 102, and MCC DG). The feed powders were roll-compacted and milled into granules. The granules were then sieved into samples of various size classes. The mass of granules deposited into the die at different shoe speeds was measured. From these experiments, a dataset consisting of true density, mean diameter (d50), granule size, and shoe speed as the inputs and the deposited mass as the output was generated. Cross-validation (CV) methods such as 10FCV and 5x2FCV were applied to develop and to validate the predictive models. It was found that the FNT-based CI model (for both CV methods) performed much better than other CI models. Additionally, it was observed that process variables such as the granule size and the shoe speed had a higher impact on the predictability than powder properties such as d50.
Furthermore, validation of model prediction with experimental data showed that the die filling behavior of coarse granules could be better predicted than that of fine granules.\nObject detection is considered one of the most challenging problems in the field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection performance compared to other approaches, with YOLOv2 (an improved You Only Look Once model) being one of the state-of-the-art DNN-based object detection methods in terms of both speed and accuracy. Although YOLOv2 can achieve real-time performance on a powerful GPU, it remains very challenging to leverage this approach for real-time object detection in video on embedded computing devices with limited computational power and limited memory. In this paper, we propose a new framework called Fast YOLO, a fast You Only Look Once framework which accelerates YOLOv2 to perform real-time object detection in video on embedded devices. First, we leverage the evolutionary deep intelligence framework to evolve the YOLOv2 network architecture and produce an optimized architecture (referred to as O-YOLOv2 here) that has 2.8X fewer parameters with just a ~2% IOU drop. To further reduce power consumption on embedded devices while maintaining performance, a motion-adaptive inference method is introduced into the proposed Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2 based on temporal motion characteristics.
Experimental results show that the proposed Fast YOLO framework can reduce the number of deep inferences by an average of 38.13% and achieve an average speedup of ~3.3X for object detection in video compared to the original YOLOv2, leading Fast YOLO to run at an average of ~18FPS on an Nvidia Jetson TX1 embedded system.\nThe mobile network, which millions of people use every day, is one of the most complex systems in the real world. Optimization of mobile networks to meet exploding customer demand and reduce CAPEX/OPEX poses greater challenges than those addressed in prior works. Learning to solve complex problems in the real world to benefit everyone and make the world better has long been the ultimate goal of AI. However, it still remains an unsolved problem for deep reinforcement learning (DRL), given imperfect information in the real world, huge state/action space, lots of data needed for training, associated time/cost, multi-agent interactions, potential negative impact on the real world, etc. To bridge this reality gap, we proposed a DRL framework to directly transfer the optimal policy learned from multi-tasks in the source domain to unseen similar tasks in the target domain without any further training in either domain. First, we distilled temporal-spatial relationships between cells and mobile users into a scalable 3D image-like tensor to best characterize the partially observed mobile network. Second, inspired by AlphaGo, we used a novel self-play mechanism to empower the DRL agent to gradually improve its intelligence by competing for the best record on multiple tasks. Third, a decentralized DRL method is proposed to coordinate multi-agents to compete and cooperate as a team to maximize global reward and minimize potential negative impact. Using 7693 unseen test tasks over 160 unseen simulated mobile networks and 6 field trials over 4 commercial mobile networks in the real world, we demonstrated the capability of our approach to directly transfer the learning from one simulator to another simulator, and from simulation to the real world.
This is the first time that a DRL agent successfully transfers its learning directly from simulation to very complex real-world problems with incomplete and imperfect information, huge state/action space and multi-agent interactions.\nRecent breakthroughs in machine learning, especially artificial intelligence, shift the paradigm of wireless communication towards intelligent radios. One of their core operations is automatic modulation recognition (AMR). Existing research focuses on coherent modulation schemes such as QAM, PSK and FSK. The AMR of (non-coherent) space-time modulation remains an uncharted area despite its wide deployment in modern multiple-input-multiple-output (MIMO) systems. The scheme using a so-called Grassmann constellation enables rate-enhancement using multi-antennas and blind detection. In this work, we propose an AMR approach for Grassmann constellation based on data clustering, which differs from traditional AMR based on classification using a modulation database. The approach allows algorithms for clustering on the Grassmann manifold, such as Grassmann K-means and depth-first search, originally developed for computer vision, to be applied to AMR. We further develop an analytical framework for studying and designing these algorithms in the context of AMR. First, the maximum-likelihood Grassmann constellation detection is proved to be equivalent to clustering on the Grassmannian. Thereby, a well-known machine-learning result that was originally established only for the Euclidean space is rediscovered for the Grassmannian. Next, despite a rich literature on algorithmic design, theoretical analysis of data clustering is largely overlooked due to the lack of tractable techniques. We tackle the challenge by introducing probabilistic metrics for measuring the inter-cluster separability and intra-cluster connectivity of received space-time symbols and deriving them using tools from differential geometry and Grassmannian packing.
The results provide useful insights into the effects of various parameters ranging from the signal-to-noise ratio to constellation size, facilitating algorithmic design.\nNeoteny, a form of paedomorphosis, can be defined in biological terms as the retention by an organism of juvenile or even larval traits into later life. In some species, all morphological development is retarded; the organism is juvenilized but sexually mature. Such shifts of reproductive capability would appear to have adaptive significance to organisms that exhibit them. In terms of evolutionary theory, the process of paedomorphosis suggests that larval stages and developmental phases of existing organisms may give rise, under certain circumstances, to wholly new organisms. Although the present work does not attempt to model or simulate the biological details of such a concept in any way, these ideas were incorporated into a rather simple abstract computational strategy, in order to allow (if possible) for faster convergence in simple non-memetic Genetic Algorithms, i.e. without using local improvement procedures (e.g. via Baldwin or Lamarckian learning). As a case study, the Genetic Algorithm was used for colour image segmentation by using K-means unsupervised clustering methods, namely for guiding the evolutionary algorithm in its search for the optimal or sub-optimal data partition. Average results suggest that the use of neotenic strategies, employing juvenile genotypes in later generations, and the use of linear-dynamic mutation rates instead of constant ones can increase fitness values by 58% compared to classical Genetic Algorithms, independently of the starting population characteristics on the search space.
KEYWORDS: Genetic Algorithms, Artificial Neoteny, Dynamic Mutation Rates, Faster Convergence, Colour Image Segmentation, Classification, Clustering.\nBarring swarm robotics, a substantial share of current machine-human and machine-machine learning and interaction mechanisms are being developed and fed by results of agent-based computer simulations, game-theoretic models, or robotic experiments based on a dyadic communication pattern. Yet, in real life, humans no less frequently communicate in groups, and gain knowledge and take decisions based on information cumulatively gleaned from more than one single source. These properties should be taken into consideration in the design of autonomous artificial cognitive systems designed to interact with and learn from more than one contact or 'neighbour'. To this end, significant practical import can be gleaned from research applying exact-science methodology to human and social phenomena, e.g. to the discovery of realistic creativity potential spans, or the 'exposure thresholds' after which new information could be accepted by a cognitive agent. The results of a project analysing the social propagation of neologisms in a microblogging service will be presented. From local, low-level interactions and information flows between agents inventing and imitating discrete lexemes we aim to describe the processes of the emergence of more global systemic order and dynamics, using the latest methods of complexity science. Whether in order to mimic them, or to 'enhance' them, parameters gleaned from complexity science approaches to humans' social and humanistic behaviour should subsequently be incorporated as points of reference in the field of robotics and human-machine interaction.\nIt has been repeatedly proposed to expand the scope of SETI, and one of the suggested alternatives to radio is biological media. Genomic DNA is already used on Earth to store non-biological information.
The genetic code is smaller in capacity but stronger in noise immunity. The code is a flexible mapping between codons and amino acids, and this flexibility allows modifying the code artificially. But once fixed, the code might stay unchanged over cosmological timescales. Thus, it represents a reliable store for an intelligent signature, if that conforms to biological and thermodynamic requirements. As the actual scenario for the origin of terrestrial life is far from being settled, the proposal that it might have been seeded intentionally cannot be ruled out. A statistically strong signal in the genetic code is then a testable consequence of such a scenario. Here we show that the terrestrial code displays a thorough precision orderliness matching the criteria to be considered an informational signal. Simple arrangements of the code reveal an ensemble of arithmetical and ideographical patterns of the same symbolic language. Accurate and systematic, these underlying patterns appear as a product of precision logic and nontrivial computing rather than of stochastic processes. The patterns are profound to the extent that the code mapping itself is uniquely deduced from their algebraic representation. The signal displays readily recognizable hallmarks of artificiality. Besides, extraction of the signal involves logically straightforward but abstract operations, making the patterns essentially irreducible to any natural origin. A plausible way of embedding the signal into the code and a possible interpretation of its content are discussed. Overall, while the code is nearly optimized biologically, its limited capacity is used extremely efficiently to store non-biological information.\nWe propose an artificial immune model for intrusion detection in distributed systems based on a relatively recent theory in immunology called Danger theory. According to Danger theory, immune response in natural systems is a result of sensing corruption as well as sensing unknown substances.
In contrast, traditional self-nonself discrimination theory states that immune response is only initiated by sensing nonself (unknown) patterns. Danger theory solves many problems that could only be partially explained by the traditional model. Although the traditional model is simpler, such problems result in high false positive rates in immune-inspired intrusion detection systems. We believe using danger theory in a multi-agent environment that computationally emulates the behavior of natural immune systems is effective in reducing false positive rates. We first describe a simplified scenario of immune response in natural systems based on danger theory and then, convert it to a computational model as a network protocol. In our protocol, we define several immune signals and model cell signaling via message passing between agents that emulate cells. Most messages include application-specific patterns that must be meaningfully extracted from various system properties. We show how to model these messages in practice by performing a case study on the problem of detecting distributed denial-of-service attacks in wireless sensor networks. We conduct a set of systematic experiments to find a set of performance metrics that can accurately distinguish malicious patterns. The results indicate that the system can be efficiently used to detect malicious patterns with a high level of accuracy.\nThe discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. An example is to decide whether altitude causes temperature, or vice versa, given only joint measurements of both variables. Even under the simplifying assumptions of no confounding, no feedback loops, and no selection bias, such bivariate causal discovery problems are challenging. 
Nevertheless, several approaches for addressing those problems have been proposed in recent years. We review two families of such methods: Additive Noise Methods (ANM) and Information Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs that consists of data for 100 different cause-effect pairs selected from 37 datasets from various domains (e.g., meteorology, biology, medicine, engineering, economy, etc.) and motivate our decisions regarding the \"ground truth\" causal directions of all pairs. We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and in addition on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. One of the best performing methods overall is the additive-noise method originally proposed by Hoyer et al. (2009), which obtains an accuracy of 63+-10 % and an AUC of 0.74+-0.05 on the real-world benchmark. As the main theoretical contribution of this work we prove the consistency of that method.\nBy drawing on ideas from optimisation theory, artificial neural networks (ANN), graph embeddings and sparse representations, I develop a novel technique, termed SENNS (Sparse Extraction Neural NetworkS), aimed at addressing the feature extraction problem. The proposed method uses (preferably deep) ANNs for projecting input attribute vectors to an output space wherein pairwise distances are maximized for vectors belonging to different classes, but minimized for those belonging to the same class, while simultaneously enforcing sparsity on the ANN outputs. The vectors that result from the projection can then be used as features in any classifier of choice. 
Mathematically, I formulate the proposed method as the minimisation of an objective function which can be interpreted, in the ANN output space, as a negative factor of the sum of the squares of the pair-wise distances between output vectors belonging to different classes, added to a positive factor of the sum of squares of the pair-wise distances between output vectors belonging to the same classes, plus sparsity and weight decay terms. To derive an algorithm for minimizing the objective function via gradient descent, I use the multi-variate version of the chain rule to obtain the partial derivatives of the function with respect to ANN weights and biases, and find that each of the required partial derivatives can be expressed as a sum of six terms. As it turns out, four of those six terms can be computed using the standard back propagation algorithm; the fifth can be computed via a slight modification of the standard backpropagation algorithm; while the sixth one can be computed via simple arithmetic. Finally, I propose experiments on the ARABASE Arabic corpora of digits and letters, the CMU PIE database of faces, the MNIST digits database, and other standard machine learning databases.\nWe propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no change in the model architecture from our base system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. The rest of the model, which includes encoder, decoder and attention, remains unchanged and is shared across all languages. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model without any increase in parameters, which is significantly simpler than previous proposals for Multilingual NMT. 
Our method often improves the translation quality of all involved language pairs, even while keeping the total number of model parameters constant. On the WMT'14 benchmarks, a single multilingual model achieves comparable performance for English$\\rightarrow$French and surpasses state-of-the-art results for English$\\rightarrow$German. Similarly, a single multilingual model surpasses state-of-the-art results for French$\\rightarrow$English and German$\\rightarrow$English on WMT'14 and WMT'15 benchmarks respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and show some interesting examples when mixing languages.\nThe foundations of all methodologies for the measurement and verification (M&V) of energy savings are based on the same five key principles: accuracy, completeness, conservatism, consistency and transparency. The most widely accepted methodologies tend to generalise M&V so as to ensure applicability across the spectrum of energy conservation measures (ECMs). These do not provide a rigid calculation procedure to follow. This paper aims to bridge the gap between high-level methodologies and the practical application of modelling algorithms, with a focus on the industrial buildings sector. This is achieved with the development of a novel, machine-learning-supported methodology for M&V 2.0 which enables accurate quantification of savings.   
A novel and computationally efficient feature selection algorithm and powerful machine learning regression algorithms are employed to maximise the effectiveness of available data. The baseline period energy consumption is modelled using artificial neural networks, support vector machines, k-nearest neighbours and multiple ordinary least squares regression. Improved knowledge discovery and an expanded boundary of analysis allow more complex energy systems to be analysed, thus increasing the applicability of M&V. A case study in a large biomedical manufacturing facility is used to demonstrate the methodology's ability to accurately quantify the savings under real-world conditions. The ECM was found to result in 604,527 kWh of energy savings with 57% uncertainty at a confidence interval of 68%. Twenty baseline energy models are developed using an exhaustive approach, with the optimal model being used to quantify savings. The range of savings estimated with each model is presented and the acceptability of uncertainty is reviewed. The case study demonstrates the ability of the methodology to perform M&V to an acceptable standard in challenging circumstances.\nAlthough simple individually, artificial neurons provide state-of-the-art performance when interconnected in deep networks. Unknown to many, there exists an arguably even simpler and more versatile learning mechanism, namely, the Tsetlin Automaton. Merely by means of a single integer as memory, it learns the optimal action in stochastic environments. In this paper, we introduce the Tsetlin Machine, which solves complex pattern recognition problems with easy-to-interpret propositional formulas, composed of a collective of Tsetlin Automata. To eliminate the longstanding problem of vanishing signal-to-noise ratio, the Tsetlin Machine orchestrates the automata using a novel game. 
Our theoretical analysis establishes that the Nash equilibria of the game are aligned with the propositional formulas that provide optimal pattern recognition accuracy. This translates to learning without local optima, only global ones. We argue that the Tsetlin Machine finds the propositional formula that provides optimal accuracy, with probability arbitrarily close to unity. In four distinct benchmarks, the Tsetlin Machine outperforms Neural Networks, SVMs, Random Forests, the Naive Bayes Classifier and Logistic Regression. It further turns out that the accuracy advantage of the Tsetlin Machine increases as data become scarcer. The Tsetlin Machine has a significant computational performance advantage since inputs, patterns, and outputs are all expressed as bits, while recognition of patterns relies on bit manipulation. The combination of accuracy, interpretability, and computational simplicity makes the Tsetlin Machine a promising tool for a wide range of domains, including safety-critical medicine. Since the Tsetlin Machine is the first of its kind, we believe it will kick-start completely new paths of research, with a potentially significant impact on the AI field and the applications of AI.\nMultiple sequence alignment (MSA) is a ubiquitous problem in computational biology. Although it is NP-hard to find an optimal solution for an arbitrary number of sequences, due to the importance of this problem researchers are trying to push the limits of exact algorithms further. Since MSA can be cast as a classical path-finding problem, it is attracting a growing number of AI researchers interested in heuristic search algorithms as a challenge with actual practical relevance. In this paper, we first review two previous, complementary lines of research. Based on Hirschberg's algorithm, Dynamic Programming needs O(kN^(k-1)) space to store both the search frontier and the nodes needed to reconstruct the solution path, for k sequences of length N. 
Best first search, on the other hand, has the advantage of bounding the search space that has to be explored using a heuristic. However, it is necessary to maintain all explored nodes up to the final solution in order to prevent the search from re-expanding them at higher cost. Earlier approaches to reduce the Closed list are either incompatible with pruning methods for the Open list, or must retain at least the boundary of the Closed list. In this article, we present an algorithm that attempts at combining the respective advantages; like A* it uses a heuristic for pruning the search space, but reduces both the maximum Open and Closed size to O(kN^(k-1)), as in Dynamic Programming. The underlying idea is to conduct a series of searches with successively increasing upper bounds, but using the DP ordering as the key for the Open priority queue. With a suitable choice of thresholds, in practice, a running time below four times that of A* can be expected. In our experiments we show that our algorithm outperforms one of the currently most successful algorithms for optimal multiple sequence alignments, Partial Expansion A*, both in time and memory. Moreover, we apply a refined heuristic based on optimal alignments not only of pairs of sequences, but of larger subsets. This idea is not new; however, to make it practically relevant we show that it is equally important to bound the heuristic computation appropriately, or the overhead can obliterate any possible gain. Furthermore, we discuss a number of improvements in time and space efficiency with regard to practical implementations. 
Our algorithm, used in conjunction with higher-dimensional heuristics, is able to calculate for the first time the optimal alignment for almost all of the problems in Reference 1 of the benchmark database BAliBASE.\nGiven a universe of discourse X (a domain of possible outcomes), an experiment may consist of selecting one of its elements, subject to the operation of chance, or of observing the elements, subject to imprecision. A priori uncertainty about the actual result of the experiment may be quantified, representing either the likelihood of the choice of x ∈ X or the degree to which any such x would be suitable as a description of the outcome. The former case corresponds to a probability distribution, while the latter gives a possibility assignment on X. The study of such assignments and their properties falls within the purview of possibility theory [DP88, Y80, Z78]. Like probability theory, it assigns values between 0 and 1 to express likelihoods of outcomes. Here, however, the similarity ends. Possibility theory uses the maximum and minimum functions to combine uncertainties, whereas probability theory uses the plus and times operations. This leads to very dissimilar theories in terms of analytical framework, even though they share several semantic concepts. One of the shared concepts consists of expressing quantitatively the uncertainty associated with a given distribution. In probability theory its value corresponds to the gain of information that would result from conducting an experiment and ascertaining an actual result. This gain of information can equally well be viewed as a decrease in uncertainty about the outcome of an experiment. In this case the standard measure of information, and thus uncertainty, is Shannon entropy [AD75, G77]. It enjoys several advantages: it is characterized uniquely by a few, very natural properties, and it can be conveniently used in decision processes. 
This application is based on the principle of maximum entropy; it has become a popular method of relating decisions to uncertainty. This paper demonstrates that an equally integrated theory can be built on the foundation of possibility theory. We first show how to define measures of information and uncertainty for possibility assignments. Next we construct an information-based metric on the space of all possibility distributions defined on a given domain. It allows us to capture the notion of proximity in information content among the distributions. Lastly, we show that all the above constructions can be carried out for continuous distributions, that is, possibility assignments on arbitrary measurable domains. We consider this step very significant: finite domains of discourse are but approximations of the real-life infinite domains. If possibility theory is to represent real-world situations, it must handle continuous distributions both directly and through finite approximations. In the last section we discuss a principle of maximum uncertainty for possibility distributions. We show how such a principle could be formalized as an inference rule. We also suggest it could be derived as a consequence of simple assumptions about combining information. We would like to mention that possibility assignments can be viewed as fuzzy sets and that every fuzzy set gives rise to an assignment of possibilities. This correspondence has far-reaching consequences in logic and in control theory. Our treatment here is independent of any special interpretation; in particular we speak of possibility distributions and possibility measures, defining them as measurable mappings into the interval [0, 1]. Our presentation is intended as a self-contained, albeit terse, summary. Topics discussed were selected with care, to demonstrate both the completeness and a certain elegance of the theory. 
Proofs are not included; we only offer illustrative examples.\nCoalition formation is a fundamental type of interaction that involves the creation of coherent groupings of distinct, autonomous agents in order to efficiently achieve their individual or collective goals. Forming effective coalitions is a major research challenge in the field of multi-agent systems. Central to this endeavour is the problem of determining which of the many possible coalitions to form in order to achieve some goal. This usually requires calculating a value for every possible coalition, known as the coalition value, which indicates how beneficial that coalition would be if it were formed. Once these values are calculated, the agents usually need to find a combination of coalitions, in which every agent belongs to exactly one coalition, and by which the overall outcome of the system is maximized. However, this coalition structure generation problem is extremely challenging due to the number of possible solutions that need to be examined, which grows exponentially with the number of agents involved. To date, therefore, many algorithms have been proposed to solve this problem using different techniques ranging from dynamic programming, to integer programming, to stochastic search, all of which suffer from major limitations relating to execution time, solution quality, and memory requirements.   With this in mind, we develop an anytime algorithm to solve the coalition structure generation problem. Specifically, the algorithm uses a novel representation of the search space, which partitions the space of possible solutions into sub-spaces such that it is possible to compute upper and lower bounds on the values of the best coalition structures in them. These bounds are then used to identify the sub-spaces that cannot contain the optimal solution so that they can be pruned. 
The algorithm then searches through the remaining sub-spaces very efficiently, using a branch-and-bound technique to avoid examining all the solutions within the searched sub-space(s). In this setting, we prove that our algorithm enumerates all coalition structures efficiently by avoiding redundant and invalid solutions automatically. Moreover, in order to effectively test our algorithm we develop a new type of input distribution which allows us to generate more reliable benchmarks compared to the input distributions previously used in the field. Given this new distribution, we show that for 27 agents our algorithm is able to find solutions that are optimal in 0.175% of the time required by the fastest available algorithm in the literature. The algorithm is anytime, and if interrupted before it would have normally terminated, it can still provide a solution that is guaranteed to be within a bound of the optimal one. Moreover, the guarantees we provide on the quality of the solution are significantly better than those provided by the previous state-of-the-art algorithms designed for this purpose. For example, for the worst-case distribution with 25 agents, our algorithm is able to find a 90% efficient solution in around 10% of the time it takes to find the optimal solution.\n(1) I have enough evidence to render the sentence S probable. (1a) So, relative to what I know, it is rational of me to believe S. (2) Now that I have more evidence, S may no longer be probable. (2a) So now, relative to what I know, it is not rational of me to believe S. These seem a perfectly ordinary, common-sense pair of situations. Generally and vaguely, I take them to embody what I shall call probabilistic inference. This form of inference is clearly non-monotonic. Relatively few people have taken this form of inference, based on high probability, to serve as a foundation for non-monotonic logic or for a logical or defeasible inference. 
There are exceptions: Jane Nutter [16] thinks that sometimes probability has something to do with non-monotonic reasoning. Judea Pearl [17] has recently been exploring the possibility. There are any number of people whom one might call probability enthusiasts who feel that probability provides all the answers by itself, with no need of help from logic. Cheeseman [1], Henrion [5] and others think it useful to look at a distribution of probabilities over a whole algebra of statements, to update that distribution in the light of new evidence, and to use the latest updated distribution of probability over the algebra as a basis for planning and decision making. A slightly weaker form of this approach is captured by Nilsson [15], where one assumes certain probabilities for certain statements, and infers the probabilities, or constraints on the probabilities, of other statements. None of this corresponds to what I call probabilistic inference. All of the inference that is taking place, whether in Bayesian updating or in probabilistic logic, is strictly deductive. Deductive inference, particularly that concerned with the distribution of classical probabilities or chances, is of great importance. But this is not to say that there is no important role for what earlier logicians have called \"ampliative\" or \"inductive\" or \"scientific\" inference, in which the conclusion goes beyond the premises and asserts more than do the premises. This depends on what David Israel [6] has called \"real rules of inference\". It is characteristic of any such logic or inference procedure that it can go wrong: that statements accepted at one point may be rejected at a later point. Research underlying the results reported here has been partially supported by the Signals Warfare Center of the United States Army.\nProbability theory, epistemically interpreted, provides an excellent, if not the best available, account of inductive reasoning. 
This is so because there are general and definite rules for the change of subjective probabilities through information or experience; induction and belief change are one and the same topic, after all. The most basic of these rules is simply to conditionalize with respect to the information received; and there are similar and more general rules. Hence, a fundamental reason for the epistemological success of probability theory is that a well-behaved concept of conditional probability exists at all. Still, people have, and have reasons for, various concerns over probability theory. One of these is my starting point: Intuitively, we have the notion of plain belief; we believe propositions to be true (or to be false or neither). Probability theory, however, offers no formal counterpart to this notion. Believing A is not the same as having probability 1 for A, because probability 1 is incorrigible; but plain belief is clearly corrigible. And believing A is not the same as giving A a probability larger than some 1 - c, because believing A and believing B is usually taken to be equivalent to believing A & B. Thus, it seems that the formal representation of plain belief has to take a non-probabilistic route. Indeed, representing plain belief seems easy enough: simply represent an epistemic state by the set of all propositions believed true in it or, since I make the common assumption that plain belief is deductively closed, by the conjunction of all propositions believed true in it. But this does not yet provide a theory of induction, i.e., an answer to the question of how epistemic states so represented are changed through information or experience. There is a convincing partial answer: if the new information is compatible with the old epistemic state, then the new epistemic state is simply represented by the conjunction of the new information and the old beliefs. 
This answer is partial because it does not cover the quite common case where the new information is incompatible with the old beliefs. It is, however, important to complete the answer and to cover this case, too; otherwise, we would not represent plain belief as corrigible. The crucial problem is that there is no good completion. When epistemic states are represented simply by the conjunction of all propositions believed true in them, the answer cannot be completed; and though there is a lot of fruitful work, no other representation of epistemic states has been proposed, as far as I know, which provides a complete solution to this problem. In this paper, I want to suggest such a solution. In [4], I have more fully argued that this is the only solution, if certain plausible desiderata are to be satisfied. Here, in section 2, I will be content with formally defining and intuitively explaining my proposal. I will compare my proposal with probability theory in section 3. It will turn out that the theory I am proposing is structurally homomorphic to probability theory in important respects and that it is thus equally easily implementable, and moreover computationally simpler. Section 4 contains a very brief comparison with various kinds of logics, in particular conditional logic, with Shackle's functions of potential surprise and related theories, and with the Dempster-Shafer theory of belief functions.\nGraphs are commonly used to characterise interactions between objects of interest. Because they are based on a straightforward formalism, they are used in many scientific fields from computer science to historical sciences. In this paper, we give an introduction to some methods relying on graphs for learning. This includes both unsupervised and supervised methods. Unsupervised learning algorithms usually aim at visualising graphs in latent spaces and/or clustering the nodes. Both focus on extracting knowledge from graph topologies. 
While most existing techniques are only applicable to static graphs, where edges do not evolve through time, recent developments have shown that they could be extended to deal with evolving networks. In a supervised context, one generally aims at inferring labels or numerical values attached to nodes using both the graph and, when they are available, node characteristics. Balancing the two sources of information can be challenging, especially as they can disagree locally or globally. In both contexts, supervised and unsupervised, data can be relational (augmented with one or several global graphs) as described above, or graph-valued. In this latter case, each object of interest is given as a full graph (possibly completed by other characteristics). In this context, natural tasks include graph clustering (as in producing clusters of graphs rather than clusters of nodes in a single graph), graph classification, etc. 1 Real networks. One of the first practical studies on graphs dates back to the original work of Moreno [51] in the 1930s. Since then, there has been a growing interest in graph analysis associated with strong developments in the modelling and the processing of these data. Graphs are now used in many scientific fields. In biology [54, 2, 7], for instance, metabolic networks can describe pathways of biochemical reactions [41], while in social sciences networks are used to represent relational ties between actors [66, 56, 36, 34]. Other examples include power grids [71] and the web [75]. Recently, networks have also been considered in other areas such as geography [22] and history [59, 39]. In machine learning, networks are seen as powerful tools to model problems in order to extract information from data and for prediction purposes. This is the object of this paper. For more complete surveys, we refer to [28, 62, 49, 45]. In this section, we introduce notations and highlight properties shared by most real networks. 
In Section 2, we then consider methods aiming at extracting information from a single network. We will particularly focus on clustering methods where the goal is to find clusters of vertices. Finally, in Section 3, techniques that take a series of networks into account, where each network is\nA spatially extended model of the collective behavior of a large number of locally acting organisms is proposed in which organisms move probabilistically between local cells in space, but with weights dependent on local morphogenetic substances, or morphogens. The morphogens are in turn affected by the passage of an organism. The evolution of the morphogens and the corresponding flow of the organisms constitute the collective behavior of the group. Such models have various types of phase transitions and self-organizing properties controlled both by the level of the noise and by other parameters.   The model is then applied to the specific case of ants moving on a lattice. The local behavior of the ants is inspired by the actual behavior observed in the laboratory, and analytic results for the collective behavior are compared to the corresponding laboratory results.   It is hoped that the present model might serve as a paradigmatic example of a complex cooperative system in nature. In particular, swarm models can be used to explore the relation of nonequilibrium phase transitions to at least three important issues encountered in artificial life. Firstly, that of emergence as complex adaptive behavior. Secondly, as an exploration of continuous phase transitions in biological systems. Lastly, to derive behavioral criteria for the evolution of collective behavior in social organisms.\nThis paper shows that a new type of artificial neural network (ANN) -- the Simultaneous Recurrent Network (SRN) -- can, if properly trained, solve a difficult function approximation problem which conventional ANNs -- either feedforward or Hebbian -- cannot. 
This problem, the problem of generalized maze navigation, is typical of problems which arise in building truly intelligent control systems using neural networks. (Such systems are discussed in the chapter by Werbos in K. Pribram, Brain and Values, Erlbaum 1998.) The paper provides a general review of other types of recurrent networks and alternative training techniques, including a flowchart of the Error Critic training design, arguably the only plausible approach to explain how the brain adapts time-lagged recurrent systems in real time. The C code of the test is appended. As in the first tests of backprop, the training here was slow, but there are ways to do better after more experience using this type of network.\nWe introduce constraints necessary for type checking a higher-order concurrent constraint language, and solve them with an incremental algorithm. Our constraint system extends rational unification by constraints $x \\subseteq y$ saying that ``$x$ has at least the structure of $y$'', modelled by a weak instance relation between trees. This notion of instance has been carefully chosen to be weaker than the usual one, which renders semi-unification undecidable. Semi-unification has more than once served to link unification problems arising from type inference and those considered in computational linguistics. Just as polymorphic recursion corresponds to subsumption through the semi-unification problem, our type constraint problem corresponds to weak subsumption of feature graphs in linguistics. The decidability problem for weak subsumption of feature graphs has been settled by Dörre (1994). In contrast to Dörre's, our algorithm is fully incremental and does not refer to finite-state automata. Our algorithm is also considerably more flexible. 
It allows a number of extensions (records, sorts, disjunctive types, type declarations, and others) which make it suitable for type inference of a full-fledged programming language.\nThis paper shows how agents' choice in communicative action can be designed to mitigate the effect of their resource limits in the context of particular features of a collaborative planning task. I first motivate a number of hypotheses about effective language behavior based on a statistical analysis of a corpus of natural collaborative planning dialogues. These hypotheses are then tested in a dialogue testbed whose design is motivated by the corpus analysis. Experiments in the testbed examine the interaction between (1) agents' resource limits in attentional capacity and inferential capacity; (2) agents' choice in communication; and (3) features of communicative tasks that affect task difficulty such as inferential complexity, degree of belief coordination required, and tolerance for errors. The results show that good algorithms for communication must be defined relative to the agents' resource limits and the features of the task. Algorithms that are inefficient for inferentially simple, low coordination or fault-tolerant tasks are effective when tasks require coordination or complex inferences, or are fault-intolerant. The results provide an explanation for the occurrence of utterances in human dialogues that, prima facie, appear inefficient, and provide the basis for the design of effective algorithms for communicative choice for resource limited agents.\nOver the past thirty years, there has been considerable progress in the design of natural language interfaces to databases. Most of this work has concerned snapshot databases, in which there are only limited facilities for manipulating time-varying information. The database community is becoming increasingly interested in temporal databases, databases with special support for time-dependent entries. 
We have developed a framework for constructing natural language interfaces to temporal databases, drawing on research on temporal phenomena within logic and linguistics. The central part of our framework is a logic-like formal language, called TOP, which can capture the semantics of a wide range of English sentences. We have implemented an HPSG-based sentence analyser that converts a large set of English queries involving time into TOP formulae, and have formulated a provably correct procedure for translating TOP expressions into queries in the TSQL2 temporal database language. In this way we have established a sound route from English to a general-purpose temporal database language.\nFASTUS is a system for extracting information from natural language text for entry into a database and for other applications. It works essentially as a cascaded, nondeterministic finite-state automaton. There are five stages in the operation of FASTUS. In Stage 1, names and other fixed form expressions are recognized. In Stage 2, basic noun groups, verb groups, and prepositions and some other particles are recognized. In Stage 3, certain complex noun groups and verb groups are constructed. Patterns for events of interest are identified in Stage 4 and corresponding ``event structures'' are built. In Stage 5, distinct event structures that describe the same event are identified and merged, and these are used in generating database entries. This decomposition of language processing enables the system to do exactly the right amount of domain-independent syntax, so that domain-dependent semantic and pragmatic processing can be applied to the right larger-scale structures. 
FASTUS is very efficient and effective, and has been used successfully in a number of applications.\nMost existing natural language database interfaces (NLDBs) were designed to be used with database systems that provide very limited facilities for manipulating time-dependent data, and they do not adequately support temporal linguistic mechanisms (verb tenses, temporal adverbials, temporal subordinate clauses, etc.). The database community is becoming increasingly interested in temporal database systems, which are intended to store and manipulate, in a principled manner, information not only about the present, but also about the past and future. When interfacing to temporal databases, supporting temporal linguistic mechanisms becomes crucial.   We present a framework for constructing natural language interfaces for temporal databases (NLTDBs) that draws on research in tense and aspect theories, temporal logics, and temporal databases. The framework consists of a temporal intermediate representation language, called TOP, an HPSG grammar that maps a wide range of questions involving temporal mechanisms to appropriate TOP expressions, and a provably correct method for translating from TOP to TSQL2, TSQL2 being a recently proposed temporal extension of the SQL database language. This framework was employed to implement a prototype NLTDB using ALE and Prolog.\nThis work has been motivated by two long-term goals: to understand how humans learn language and to build programs that can understand language. Using a representation that makes the relevant features explicit is a prerequisite for successful learning and understanding. Therefore, I chose to represent relations between individual words explicitly in my model. Lexical attraction is defined as the likelihood of such relations. 
I introduce a new class of probabilistic language models named lexical attraction models which can represent long distance relations between words and I formalize this new class of models using information theory.   Within the framework of lexical attraction, I developed an unsupervised language acquisition program that learns to identify linguistic relations in a given sentence. The only explicitly represented linguistic knowledge in the program is lexical attraction. There is no initial grammar or lexicon built in and the only input is raw text. Learning and processing are interdigitated. The processor uses the regularities detected by the learner to impose structure on the input. This structure enables the learner to detect higher level regularities. Using this bootstrapping procedure, the program was trained on 100 million words of Associated Press material and was able to achieve 60% precision and 50% recall in finding relations between content-words. Using knowledge of lexical attraction, the program can identify the correct relations in syntactically ambiguous sentences such as ``I saw the Statue of Liberty flying over New York.''\nWe introduce a highly structured family of hard satisfiable 3-SAT formulas corresponding to an ordered spin-glass model from statistical physics. This model has provably \"glassy\" behavior; that is, it has many local optima with large energy barriers between them, so that local search algorithms get stuck and have difficulty finding the true ground state, i.e., the unique satisfying assignment. We test the hardness of our formulas with two Davis-Putnam solvers, Satz and zChaff, the recently introduced Survey Propagation (SP), and two local search algorithms, Walksat and Record-to-Record Travel (RRT). We compare our formulas to random 3-XOR-SAT formulas and to two other generators of hard satisfiable instances, the minimum disagreement parity formulas of Crawford et al., and Hirsch's hgen. 
For the complete solvers the running time of our formulas grows exponentially in sqrt(n), and exceeds that of random 3-XOR-SAT formulas for small problem sizes. SP is unable to solve our formulas with as few as 25 variables. For Walksat, our formulas appear to be harder than any other known generator of satisfiable instances. Finally, our formulas can be solved efficiently by RRT but only if the parameter d is tuned to the height of the barriers between local minima, and we use this parameter to measure the barrier heights in random 3-XOR-SAT formulas as well.\nConditional logics play an important role in recent attempts to formulate theories of default reasoning. This paper investigates first-order conditional logic. We show that, as for first-order probabilistic logic, it is important not to confound statistical conditionals over the domain (such as ``most birds fly''), and subjective conditionals over possible worlds (such as ``I believe that Tweety is unlikely to fly''). We then address the issue of ascribing semantics to first-order conditional logic. As in the propositional case, there are many possible semantics. To study the problem in a coherent way, we use plausibility structures. These provide us with a general framework in which many of the standard approaches can be embedded. We show that while these standard approaches are all the same at the propositional level, they are significantly different in the context of a first-order language. Furthermore, we show that plausibilities provide the most natural extension of conditional logic to the first-order case: We provide a sound and complete axiomatization that contains only the KLM properties and standard axioms of first-order modal logic. 
We show that most of the other approaches have additional properties, which result in an inappropriate treatment of an infinitary version of the lottery paradox.\nIn many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations ``eat a peach'' and ``eat a beach'' is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on ``most similar'' words.   We describe probabilistic word association models based on distributional word similarity, and apply them to two tasks, language modeling and pseudo-word disambiguation. In the language modeling task, a similarity-based model is used to improve probability estimates for unseen bigrams in a back-off language model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error.   We also compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency to avoid giving too much weight to easy-to-disambiguate high-frequency configurations. The similarity-based methods perform up to 40% better on this particular task.\nSeveral important decision problems on conjunctive queries (CQs) are NP-complete in general but become tractable, and actually highly parallelizable, if restricted to acyclic or nearly acyclic queries. Examples are the evaluation of Boolean CQs and query containment. 
These problems were shown tractable for conjunctive queries of bounded treewidth and of bounded degree of cyclicity. The so far most general concept of nearly acyclic queries was the notion of queries of bounded query-width introduced by Chekuri and Rajaraman (1997). While CQs of bounded query width are tractable, it remained unclear whether such queries are efficiently recognizable. Chekuri and Rajaraman stated as an open problem whether for each constant k it can be determined in polynomial time if a query has query width less than or equal to k. We give a negative answer by proving this problem NP-complete (specifically, for k=4). In order to circumvent this difficulty, we introduce the new concept of hypertree decomposition of a query and the corresponding notion of hypertree width. We prove: (a) for each k, the class of queries with query width bounded by k is properly contained in the class of queries whose hypertree width is bounded by k; (b) unlike query width, constant hypertree-width is efficiently recognizable; (c) Boolean queries of constant hypertree width can be efficiently evaluated.\nScheduling dialogs, during which people negotiate the times of appointments, are common in everyday life. This paper reports the results of an in-depth empirical investigation of resolving explicit temporal references in scheduling dialogs. There are four phases of this work: data annotation and evaluation, model development, system implementation and evaluation, and model evaluation and analysis. The system and model were developed primarily on one set of data, and then applied later to a much more complex data set, to assess the generalizability of the model for the task being performed. Many different types of empirical methods are applied to pinpoint the strengths and weaknesses of the approach. 
Detailed annotation instructions were developed and an intercoder reliability study was performed, showing that naive annotators can reliably perform the targeted annotations. A fully automatic system has been developed and evaluated on unseen test data, with good results on both data sets. We adopt a pure realization of a recency-based focus model to identify precisely when it is and is not adequate for the task being addressed. In addition to system results, an in-depth evaluation of the model itself is presented, based on detailed manual annotations. The results are that few errors occur specifically due to the model of focus being used, and the set of anaphoric relations defined in the model is low in ambiguity for both data sets.\nThe relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random, relative to every contemplated hypothesis and also these hypotheses are random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's minimal sufficient statistic. 
In general we show that data compression is almost always the best strategy, both in hypothesis identification and prediction.\nThe paper argues that Fodor and Lepore are misguided in their attack on Pustejovsky's Generative Lexicon, largely because their argument rests on a traditional, but implausible and discredited, view of the lexicon on which it is effectively empty of content, a view that stands in the long line of explaining word meaning (a) by ostension and then (b) explaining it by means of a vacuous symbol in a lexicon, often the word itself after typographic transmogrification. (a) and (b) both share the wrong belief that to a word must correspond a simple entity that is its meaning. I then turn to the semantic rules that Pustejovsky uses and argue first that, although they have novel features, they are in a well-established Artificial Intelligence tradition of explaining meaning by reference to structures that mention other structures assigned to words that may occur in close proximity to the first. It is argued that Fodor and Lepore's view that there cannot be such rules is without foundation, and indeed systems using such rules have proved their practical worth in computational systems. Their justification descends from a line of argument, whose high points were probably Wittgenstein and Quine, that meaning is not to be understood by simple links to the world, ostensive or otherwise, but by the relationship of whole cultural representational structures to each other and to the world as a whole.\nThe abstract mathematical theory of partial differential equations (PDEs) is formulated in terms of manifolds, scalar fields, tensors, and the like, but these algebraic structures are hardly recognizable in actual PDE solvers. The general aim of the Sophus programming style is to bridge the gap between theory and practice in the domain of PDE solvers. 
Its main ingredients are a library of abstract datatypes corresponding to the algebraic structures used in the mathematical theory and an algebraic expression style similar to the expression style used in the mathematical theory. Because of its emphasis on abstract datatypes, Sophus is most naturally combined with object-oriented languages or other languages supporting abstract datatypes. The resulting source code patterns are beyond the scope of current compiler optimizations, but are sufficiently specific for a dedicated source-to-source optimizer. The limited, domain-specific, character of Sophus is the key to success here. This kind of optimization has been tested on computationally intensive Sophus style code with promising results. The general approach may be useful for other styles and in other application domains as well.\nWe consider the problem of how to design large decentralized multi-agent systems (MAS's) in an automated fashion, with little or no hand-tuning. Our approach has each agent run a reinforcement learning algorithm. This converts the problem into one of how to automatically set/update the reward functions for each of the agents so that the global goal is achieved. In particular we do not want the agents to ``work at cross-purposes'' as far as the global goal is concerned. We use the term artificial COllective INtelligence (COIN) to refer to systems that embody solutions to this problem. In this paper we present a summary of a mathematical framework for COINs. We then investigate the real-world applicability of the core concepts of that framework via two computer experiments: we show that our COINs perform near optimally in a difficult variant of Arthur's bar problem (and in particular avoid the tragedy of the commons for that problem), and we also illustrate optimal performance for our COINs in the leader-follower problem.\nWe study here constraint satisfaction problems that are based on predefined, explicitly given finite constraints. 
To solve them we propose a notion of rule consistency that can be expressed in terms of rules derived from the explicit representation of the initial constraints.   This notion of local consistency is weaker than arc consistency for constraints of arbitrary arity but coincides with it when all domains are unary or binary. For Boolean constraints rule consistency coincides with the closure under the well-known propagation rules for Boolean constraints.   By generalizing the format of the rules we obtain a characterization of arc consistency in terms of so-called inclusion rules. The advantage of rule consistency and this rule-based characterization of the arc consistency is that the algorithms that enforce both notions can be automatically generated, as CHR rules. So these algorithms could be integrated into constraint logic programming systems such as Eclipse.   We illustrate the usefulness of this approach to constraint propagation by discussing the implementations of both algorithms and their use on various examples, including Boolean constraints, three-valued logic of Kleene, constraints dealing with Waltz's language for describing polyhedral scenes, and Allen's qualitative approach to temporal logic.\nThe semantic framework for the modal logic of knowledge due to Halpern and Moses provides a way to ascribe knowledge to agents in distributed and multi-agent systems. In this paper we study two special cases of this framework: full systems and hypercubes. Both model static situations in which no agent has any information about another agent's state. Full systems and hypercubes are an appropriate model for the initial configurations of many systems of interest. We establish a correspondence between full systems and hypercube systems and certain classes of Kripke frames. We show that these classes of systems correspond to the same logic. Moreover, this logic is also the same as that generated by the larger class of weakly directed frames. 
We provide a sound and complete axiomatization, S5WDn, of this logic. Finally, we show that under certain natural assumptions, in a model where knowledge evolves over time, S5WDn characterizes the properties of knowledge not just at the initial configuration, but also at all later configurations. In particular, this holds for homogeneous broadcast systems, which capture settings in which agents are initially ignorant of each other's local states, operate synchronously, have perfect recall and can communicate only by broadcasting.\nIn this paper, we focus on the problem of existence and computation of small and large stable models. We show that for every fixed integer k, there is a linear-time algorithm to decide the problem LSM (large stable models problem): does a logic program P have a stable model of size at least |P|-k. In contrast, we show that the problem SSM (small stable models problem) to decide whether a logic program P has a stable model of size at most k is much harder. We present two algorithms for this problem but their running time is given by polynomials of order depending on k. We show that the problem SSM is fixed-parameter intractable by demonstrating that it is W[2]-hard. This result implies that it is unlikely that an algorithm exists to compute stable models of size at most k that would run in time O(n^c), where c is a constant independent of k. We also provide an upper bound on the fixed-parameter complexity of the problem SSM by showing that it belongs to the class W[3].\nGlobal SLS-resolution and SLG-resolution are two representative mechanisms for top-down evaluation of the well-founded semantics of general logic programs. Global SLS-resolution is linear for query evaluation but suffers from infinite loops and redundant computations. In contrast, SLG-resolution resolves infinite loops and redundant computations by means of tabling, but it is not linear. 
The principal disadvantage of a non-linear approach is that it cannot be implemented using a simple, efficient stack-based memory structure nor can it be easily extended to handle some strictly sequential operators such as cuts in Prolog.   In this paper, we present a linear tabling method, called SLT-resolution, for top-down evaluation of the well-founded semantics. SLT-resolution is a substantial extension of SLDNF-resolution with tabling. Its main features include: (1) It resolves infinite loops and redundant computations while preserving the linearity. (2) It is terminating, and sound and complete w.r.t. the well-founded semantics for programs with the bounded-term-size property with non-floundering queries. Its time complexity is comparable with SLG-resolution and polynomial for function-free logic programs. (3) Because of its linearity for query evaluation, SLT-resolution bridges the gap between the well-founded semantics and standard Prolog implementation techniques. It can be implemented by an extension to any existing Prolog abstract machines such as WAM or ATOAM.\nWe introduced decomposable negation normal form (DNNF) recently as a tractable form of propositional theories, and provided a number of powerful logical operations that can be performed on it in polynomial time. We also presented an algorithm for compiling any conjunctive normal form (CNF) into DNNF and provided a structure-based guarantee on its space and time complexity. We present in this paper a linear-time algorithm for converting an ordered binary decision diagram (OBDD) representation of a propositional theory into an equivalent DNNF, showing that DNNFs scale as well as OBDDs. We also identify a subclass of DNNF which we call deterministic DNNF, d-DNNF, and show that the previous complexity guarantees on compiling DNNF continue to hold for this stricter subclass, which has stronger properties. 
In particular, we present a new operation on d-DNNF which allows us to count its models under the assertion, retraction and flipping of every literal by traversing the d-DNNF twice. That is, after such traversal, we can test in constant time: the entailment of any literal by the d-DNNF, and the consistency of the d-DNNF under the retraction or flipping of any literal. We demonstrate the significance of these new operations by showing how they allow us to implement linear-time, complete truth maintenance systems and linear-time, complete belief revision systems for two important classes of propositional theories.\nInfinite loops and redundant computations are long-recognized open problems in Prolog. Two ways have been explored to resolve these problems: loop checking and tabling. Loop checking can cut infinite loops, but it cannot be both sound and complete even for function-free logic programs. Tabling seems to be an effective way to resolve infinite loops and redundant computations. However, existing tabulated resolutions, such as OLDT-resolution, SLG-resolution, and Tabulated SLS-resolution, are non-linear because they rely on the solution-lookup mode in formulating tabling. The principal disadvantage of non-linear resolutions is that they cannot be implemented using a simple stack-based memory structure like that in Prolog. Moreover, some strictly sequential operators such as cuts may not be handled as easily as in Prolog.   In this paper, we propose a hybrid method to resolve infinite loops and redundant computations. We combine the ideas of loop checking and tabling to establish a linear tabulated resolution called TP-resolution. TP-resolution has two distinctive features: (1) It makes linear tabulated derivations in the same way as Prolog except that infinite loops are broken and redundant computations are reduced. It handles cuts as effectively as Prolog. (2) It is sound and complete for positive logic programs with the bounded-term-size property. 
The underlying algorithm can be implemented by an extension to any existing Prolog abstract machines such as WAM or ATOAM.\nWe present a general, consistency-based framework for belief change. Informally, in revising K by A, we begin with A and incorporate as much of K as consistently possible. Formally, a knowledge base K and sentence A are expressed, via renaming propositions in K, in separate languages. Using a maximization process, we assume the languages are the same insofar as consistently possible. Lastly, we express the resultant knowledge base in a single language. There may be more than one way in which A can be so extended by K: in choice revision, one such ``extension'' represents the revised state; alternatively, revision consists of the intersection of all such extensions.   The most general formulation of our approach is flexible enough to express other approaches to revision and update, the merging of knowledge bases, and the incorporation of static and dynamic integrity constraints. Our framework differs from work based on ordinal conditional functions, notably with respect to iterated revision. We argue that the approach is well-suited for implementation: the choice revision operator gives better complexity results than general revision; the approach can be expressed in terms of a finite knowledge base; and the scope of a revision can be restricted to just those propositions mentioned in the sentence for revision A.\nIn solving a query, the SLD proof procedure for definite programs sometimes searches an infinite space for a nonexistent solution, for example when querying a planner for an unreachable goal state. Such programs motivate the development of methods to prove the absence of a solution. Considering the definite program and the query ``<- Q'' as clauses of a first-order theory, one can apply model generators which search for a finite interpretation in which the program clauses as well as the clause ``false <- Q'' are true. 
This paper develops a new approach which exploits the fact that all clauses are definite. It is based on a goal-directed abductive search in the space of finite pre-interpretations for a pre-interpretation such that ``Q'' is false in the least model of the program based on it. Several methods for efficiently searching the space of pre-interpretations are presented. Experimental results confirm that our approach finds solutions with less search than with the use of a first-order model generator.\nWe study here a natural situation when constraint programming can be entirely reduced to rule-based programming. To this end we explain first how one can compute on constraint satisfaction problems using rules represented by simple first-order formulas. Then we consider constraint satisfaction problems that are based on predefined, explicitly given constraints. To solve them we first derive rules from these explicitly given constraints and limit the computation process to a repeated application of these rules, combined with labeling. We consider here two types of rules. The first type, that we call equality rules, leads to a new notion of local consistency, called {\\em rule consistency} that turns out to be weaker than arc consistency for constraints of arbitrary arity (called hyper-arc consistency in \\cite{MS98b}). For Boolean constraints rule consistency coincides with the closure under the well-known propagation rules for Boolean constraints. The second type of rules, that we call membership rules, yields a rule-based characterization of arc consistency. To show feasibility of this rule-based approach to constraint programming we show how both types of rules can be automatically generated, as {\\tt CHR} rules of \\cite{fruhwirth-constraint-95}. This yields an implementation of this approach to programming by means of constraint logic programming. 
We illustrate the usefulness of this approach to constraint programming by discussing various examples, including Boolean constraints, two typical examples of many valued logics, constraints dealing with Waltz's language for describing polyhedral scenes, and Allen's qualitative approach to temporal logic.\nAnswer-set programming (ASP) has emerged recently as a viable programming paradigm well attuned to search problems in AI, constraint satisfaction and combinatorics. Propositional logic is, arguably, the simplest ASP system with an intuitive semantics supporting direct modeling of problem constraints. However, for some applications, especially those requiring that transitive closure be computed, it requires additional variables and results in large theories. Consequently, it may not be a practical computational tool for such problems. On the other hand, ASP systems based on nonmonotonic logics, such as stable logic programming, can handle transitive closure computation efficiently and, in general, yield very concise theories as problem representations. Their semantics is, however, more complex. Searching for the middle ground, in this paper we introduce a new nonmonotonic logic, DATALOG with constraints or DC. Informally, DC theories consist of propositional clauses (constraints) and of Horn rules. The semantics is a simple and natural extension of the semantics of the propositional logic. However, thanks to the presence of Horn rules in the system, modeling of transitive closure becomes straightforward. We describe the syntax and semantics of DC, and study its properties. We discuss an implementation of DC and present results of experimental study of the effectiveness of DC, comparing it with CSAT, a satisfiability checker and SMODELS implementation of stable logic programming. 
Our results show that DC is competitive with the other two approaches, in case of many search problems, often yielding much more efficient solutions.\nThe implicit theory that a simulation represents is precisely not in the individual choices but rather in the 'envelope' of possible trajectories - what is important is the shape of the whole envelope. Typically a huge amount of computation is required when experimenting with factors bearing on the dynamics of a simulation to tease out what affects the shape of this envelope. In this paper we present a methodology aimed at systematically exploring this envelope. We propose a method for searching for tendencies and proving their necessity relative to a range of parameterisations of the model and agents' choices, and to the logic of the simulation language. The exploration consists of a forward chaining generation of the trajectories associated to and constrained by such a range of parameterisations and choices. Additionally, we propose a computational procedure that helps implement this exploration by translating a Multi Agent System simulation into a constraint-based search over possible trajectories by 'compiling' the simulation rules into a more specific form, namely by partitioning the simulation rules using appropriate modularity in the simulation. An example of this procedure is exhibited.   Keywords: Constraint Search, Constraint Logic Programming, Proof, Emergence, Tendencies\nFor years, Caisse des Depots et Consignations has produced information filtering applications. To be operational, these applications require high filtering performances which are achieved by using rule-based filters. With this technique, an administrator has to tune a set of rules for each topic. However, filters become obsolescent over time. The decrease of their performances is due to diachronic polysemy of terms that involves a loss of precision and to diachronic polymorphism of concepts that involves a loss of recall.   
To help the administrator to maintain his filters, we have developed a method which automatically detects filtering obsolescence. It consists in making a learning-based control filter using a set of documents which have already been categorised as relevant or not relevant by the rule-based filter. The idea is to supervise this filter by processing a differential comparison of its outcomes with those of the control one.   This method has many advantages. It is simple to implement since the training set used by the learning is supplied by the rule-based filter. Thus, both the making and the use of the control filter are fully automatic. With automatic detection of obsolescence, learning-based filtering finds a rich application which offers interesting prospects.\nFor the TREC-8 routing, one specific filter is built for each topic. Each filter is a classifier trained to recognize the documents that are relevant to the topic. When presented with a document, each classifier estimates the probability for the document to be relevant to the topic for which it has been trained. Since the procedure for building a filter is topic-independent, the system is fully automatic.   By making use of a sample of documents that have previously been evaluated as relevant or not relevant to a particular topic, a term selection is performed, and a neural network is trained. Each document is represented by a vector of frequencies of a list of selected terms. This list depends on the topic to be filtered; it is constructed in two steps. The first step defines the characteristic words used in the relevant documents of the corpus; the second one chooses, among the previous list, the most discriminant ones. The length of the vector is optimized automatically for each topic. At the end of the term selection, a vector of typically 25 words is defined for the topic, so that each document which has to be processed is represented by a vector of term frequencies.   
This vector is subsequently input to a classifier that is trained from the same sample. After training, the classifier estimates for each document of a test set its probability of being relevant; for submission to TREC, the top 1000 documents are ranked in order of decreasing relevance.\nWe describe a slightly sub-exponential time algorithm for learning parity functions in the presence of random classification noise. This results in a polynomial-time algorithm for the case of parity functions that depend on only the first O(log n log log n) bits of input. This is the first known instance of an efficient noise-tolerant algorithm for a concept class that is provably not learnable in the Statistical Query model of Kearns. Thus, we demonstrate that the set of problems learnable in the statistical query model is a strict subset of those problems learnable in the presence of noise in the PAC model.   In coding-theory terms, what we give is a poly(n)-time algorithm for decoding linear k by n codes in the presence of random noise for the case of k = c log n loglog n for some c > 0. (The case of k = O(log n) is trivial since one can just individually check each of the 2^k possible messages and choose the one that yields the closest codeword.)   A natural extension of the statistical query model is to allow queries about statistical properties that involve t-tuples of examples (as opposed to single examples). The second result of this paper is to show that any class of functions learnable (strongly or weakly) with t-wise queries for t = O(log n) is also weakly learnable with standard unary queries. Hence this natural extension to the statistical query model does not increase the set of weakly learnable functions.\nThinking is one of the most interesting mental processes. Its complexity is sometimes simplified and its different manifestations are classified into normal and abnormal, like the delusional and disorganized thought or the creative one. 
The boundaries between these facets of thinking are fuzzy, causing difficulties in medical, academic, and philosophical discussions. Considering the dopaminergic signal-to-noise neuronal modulation in the central nervous system, and the existence of semantic maps in the human brain, a self-organizing neural network model was developed to unify the different thought processes into a single neurocomputational substrate. Simulations were performed varying the dopaminergic modulation and observing the different patterns that emerged at the semantic map. Assuming that the thought process is the total pattern elicited at the output layer of the neural network, the model shows how normal and abnormal thinking are generated and that there are no borders between their different manifestations. Actually, a continuum of different qualitative reasoning, ranging from delusion to disorganization of thought, and passing through the normal and the creative thinking, seems to be more plausible. The model is far from explaining the complexities of human thinking but, at least, it seems to be a good metaphorical and unifying view of the many facets of this phenomenon usually studied in separate settings.\nWe examine carefully the rationale underlying the approaches to belief change taken in the literature, and highlight what we view as methodological problems. We argue that to study belief change carefully, we must be quite explicit about the ``ontology'' or scenario underlying the belief change process. This is something that has been missing in previous work, with its focus on postulates. Our analysis shows that we must pay particular attention to two issues that have often been taken for granted: The first is how we model the agent's epistemic state. (Do we use a set of beliefs, or a richer structure, such as an ordering on worlds? And if we use a set of beliefs, in what language are these beliefs expressed?) 
We show that even postulates that have been called ``beyond controversy'' are unreasonable when the agent's beliefs include beliefs about her own epistemic state as well as the external world. The second is the status of observations. (Are observations known to be true, or just believed? In the latter case, how firm is the belief?) Issues regarding the status of observations arise particularly when we consider iterated belief revision, and we must confront the possibility of revising by p and then by not-p.\nWe consider an approach to update nonmonotonic knowledge bases represented as extended logic programs under answer set semantics. New information is incorporated into the current knowledge base subject to a causal rejection principle enforcing that, in case of conflicts, more recent rules are preferred and older rules are overridden. Such a rejection principle is also exploited in other approaches to update logic programs, e.g., in dynamic logic programming by Alferes et al. We give a thorough analysis of properties of our approach, to get a better understanding of the causal rejection principle. We review postulates for update and revision operators from the area of theory change and nonmonotonic reasoning, and some new properties are considered as well. We then consider refinements of our semantics which incorporate a notion of minimality of change. As well, we investigate the relationship to other approaches, showing that our approach is semantically equivalent to inheritance programs by Buccafurri et al. and that it coincides with certain classes of dynamic logic programs, for which we provide characterizations in terms of graph conditions. Therefore, most of our results about properties of causal rejection principle apply to these approaches as well. 
Finally, we deal with computational complexity of our approach, and outline how the update semantics and its refinements can be implemented on top of existing logic programming engines.\nThe notion of arc consistency plays a central role in constraint satisfaction. It is known that the notion of local consistency can be extended to constraint optimisation problems defined by soft constraint frameworks based on an idempotent cost combination operator. This excludes non idempotent operators such as + which define problems which are very important in practical applications such as Max-CSP, where the aim is to minimize the number of violated constraints. In this paper, we show that using a weak additional axiom satisfied by most existing soft constraints proposals, it is possible to define a notion of soft arc consistency that extends the classical notion of arc consistency and this even in the case of non idempotent cost combination operators. A polynomial time algorithm for enforcing this soft arc consistency exists and its space and time complexities are identical to that of enforcing arc consistency in CSPs when the cost combination operator is strictly monotonic (for example Max-CSP). A directional version of arc consistency is potentially even stronger than the non-directional version, since it allows non local propagation of penalties. We demonstrate the utility of directional arc consistency by showing that it not only solves soft constraint problems on trees, but that it also implies a form of local optimality, which we call arc irreducibility.\nMost recently, Answer Set Programming (ASP) is attracting interest as a new paradigm for problem solving. An important aspect which needs to be supported is the handling of preferences between rules, for which several approaches have been presented. In this paper, we consider the problem of implementing preference handling approaches by means of meta-interpreters in Answer Set Programming. 
In particular, we consider the preferred answer set approaches by Brewka and Eiter, by Delgrande, Schaub and Tompits, and by Wang, Zhou and Lin. We present suitable meta-interpreters for these semantics using DLV, which is an efficient engine for ASP. Moreover, we also present a meta-interpreter for the weakly preferred answer set approach by Brewka and Eiter, which uses the weak constraint feature of DLV as a tool for expressing and solving an underlying optimization problem. We also consider advanced meta-interpreters, which make use of graph-based characterizations and often allow for more efficient computations. Our approach shows the suitability of ASP in general and of DLV in particular for fast prototyping. This can be fruitfully exploited for experimenting with new languages and knowledge-representation formalisms.\nWe consider the question of whether collusion among bidders (a \"bidding ring\") can be supported in equilibrium of unrepeated first-price auctions. Unlike previous work on the topic such as that by McAfee and McMillan [1992] and Marshall and Marx [2007], we do not assume that non-colluding agents have perfect knowledge about the number of colluding agents whose bids are suppressed by the bidding ring, and indeed even allow for the existence of multiple cartels. Furthermore, while we treat the association of bidders with bidding rings as exogenous, we allow bidders to make strategic decisions about whether to join bidding rings when invited. We identify a bidding ring protocol that results in an efficient allocation in Bayes-Nash equilibrium, under which non-colluding agents bid straightforwardly, and colluding agents join bidding rings when invited and truthfully declare their valuations to the ring center. We show that bidding rings benefit ring centers and all agents, both members and non-members of bidding rings, at the auctioneer's expense. 
The techniques we introduce in this paper may also be useful for reasoning about other problems in which agents have asymmetric information about a setting.\nThis paper is aimed at providing a uniform framework for reasoning about beliefs of multiple agents and their fusion. In the first part of the paper, we develop logics for reasoning about cautiously merged beliefs of agents with different degrees of reliability. The logics are obtained by combining the multi-agent epistemic logic and multi-sources reasoning systems. Every ordering for the reliability of the agents is represented by a modal operator, so we can reason with the merged results under different situations. The fusion is cautious in the sense that if an agent's belief is in conflict with those of higher priorities, then his belief is completely discarded from the merged result. We consider two strategies for the cautious merging of beliefs. In the first one, if inconsistency occurs at some level, then all beliefs at the lower levels are discarded simultaneously, so it is called level cutting strategy. For the second one, only the level at which the inconsistency occurs is skipped, so it is called level skipping strategy. The formal semantics and axiomatic systems for these two strategies are presented. In the second part, we extend the logics both syntactically and semantically to cover some more sophisticated belief fusion and revision operators. While most existing approaches treat belief fusion operators as meta-level constructs, these operators are directly incorporated into our object logic language. Thus it is possible to reason not only with the merged results but also about the fusion process in our logics. The relationship of our extended logics with the conditional logics of belief revision is also discussed.\nPrevious works suggested the use of Branch and Bound techniques for finding the optimal allocation in (multi-unit) combinatorial auctions. 
They remarked that Linear Programming could provide a good upper-bound to the optimal allocation, but they went on using lighter and less tight upper-bound heuristics, on the ground that LP was too time-consuming to be used repetitively to solve large combinatorial auctions. We present the results of extensive experiments solving large (multi-unit) combinatorial auctions generated according to distributions proposed by different researchers. Our surprising conclusion is that Linear Programming is worth using. Investing almost all of one's computing time in using LP to bound from above the value of the optimal solution in order to prune aggressively pays off. We present a way to save on the number of calls to the LP routine and experimental results comparing different heuristics for choosing the bid to be considered next. Those results show that the ordering based on the square root of the size of the bids that was shown to be theoretically optimal in a previous paper by the authors performs surprisingly better than others in practice. Choosing to deal first with the bid with largest coefficient (typically 1) in the optimal solution of the relaxed LP problem, is also a good choice. The gap between the lower bound provided by greedy heuristics and the upper bound provided by LP is typically small and pruning is therefore extensive. For most distributions, auctions of a few hundred goods among a few thousand bids can be solved in practice. All experiments were run on a PC under Matlab.\nTarski gave a general semantics for deductive reasoning: a formula a may be deduced from a set A of formulas iff a holds in all models in which each of the elements of A holds. A more liberal semantics has been considered: a formula a may be deduced from a set A of formulas iff a holds in all of the \"preferred\" models in which all the elements of A hold. Shoham proposed that the notion of \"preferred\" models be defined by a partial ordering on the models of the underlying language. 
A more general semantics is described in this paper, based on a set of natural properties of choice functions. This semantics is here shown to be equivalent to a semantics based on comparing the relative \"importance\" of sets of models, by what amounts to a qualitative probability measure. The consequence operations defined by the equivalent semantics are then characterized by a weakening of Tarski's properties in which the monotonicity requirement is replaced by three weaker conditions. Classical propositional connectives are characterized by natural introduction-elimination rules in a nonmonotonic setting. Even in the nonmonotonic setting, one obtains classical propositional logic, thus showing that monotonicity is not required to justify classical propositional connectives.\nWe introduce a methodology and framework for expressing general preference information in logic programming under the answer set semantics. An ordered logic program is an extended logic program in which rules are named by unique terms, and in which preferences among rules are given by a set of atoms of form s < t where s and t are names. An ordered logic program is transformed into a second, regular, extended logic program wherein the preferences are respected, in that the answer sets obtained in the transformed program correspond with the preferred answer sets of the original program. Our approach allows the specification of dynamic orderings, in which preferences can appear arbitrarily within a program. Static orderings (in which preferences are external to a logic program) are a trivial restriction of the general dynamic case. First, we develop a specific approach to reasoning with preferences, wherein the preference ordering specifies the order in which rules are to be applied. We then demonstrate the wide range of applicability of our framework by showing how other approaches, among them that of Brewka and Eiter, can be captured within our framework. 
Since the result of each of these transformations is an extended logic program, we can make use of existing implementations, such as dlv and smodels. To this end, we have developed a publicly available compiler as a front-end for these programming systems.\nIn multiagent settings where the agents have different preferences, preference aggregation is a central issue. Voting is a general method for preference aggregation, but seminal results have shown that all general voting protocols are manipulable. One could try to avoid manipulation by using voting protocols where determining a beneficial manipulation is hard. Especially among computational agents, it is reasonable to measure this hardness by computational complexity. Some earlier work has been done in this area, but it was assumed that the number of voters and candidates is unbounded. We derive hardness results for practical multiagent settings where the number of candidates is small but the number of voters can be large. We show that with complete information about the others' votes, individual manipulation is easy, and coalitional manipulation is easy with unweighted voters. However, constructive coalitional manipulation with weighted voters is intractable for all of the voting protocols under study, except for the nonrandomized Cup. Destructive manipulation tends to be easier. Randomizing over instantiations of the protocols (such as schedules of the Cup protocol) can be used to make manipulation hard. Finally, we show that under weak assumptions, if weighted coalitional manipulation with complete information about the others' votes is hard in some voting protocol, then individual and unweighted manipulation is hard when there is uncertainty about the others' votes.\nThis paper studies the problem of modeling complex domains of actions and change within high-level action description languages. 
We investigate two main issues of concern: (a) can we represent complex domains that capture together different problems such as ramifications, non-determinism and concurrency of actions, at a high-level, close to the given natural ontology of the problem domain and (b) what features of such a representation can affect, and how, its computational behaviour. The paper describes the main problems faced in this representation task and presents the results of an empirical study, carried out through a series of controlled experiments, to analyze the computational performance of reasoning in these representations. The experiments compare different representations obtained, for example, by changing the basic ontology of the domain or by varying the degree of use of indirect effect laws through domain constraints. This study has helped to expose the main sources of computational difficulty in the reasoning and suggest some methodological guidelines for representing complex domains. Although our work has been carried out within one particular high-level description language, we believe that the results, especially those that relate to the problems of representation, are independent of the specific modeling language.\nNested logic programs have recently been introduced in order to allow for arbitrarily nested formulas in the heads and the bodies of logic program rules under the answer sets semantics. Nested expressions can be formed using conjunction, disjunction, as well as the negation as failure operator in an unrestricted fashion. This provides a very flexible and compact framework for knowledge representation and reasoning. Previous results show that nested logic programs can be transformed into standard (unnested) disjunctive logic programs in an elementary way, applying the negation as failure operator to body literals only. 
This is of great practical relevance since it allows us to evaluate nested logic programs by means of off-the-shelf disjunctive logic programming systems, like DLV. However, it turns out that this straightforward transformation results in an exponential blow-up in the worst-case, despite the fact that complexity results indicate that there is a polynomial translation among both formalisms. In this paper, we take up this challenge and provide a polynomial translation of logic programs with nested expressions into disjunctive logic programs. Moreover, we show that this translation is modular and (strongly) faithful. We have implemented both the straightforward as well as our advanced transformation; the resulting compiler serves as a front-end to DLV and is publicly available on the Web.\nGroenendijk and Stokhof (1984, 1996; Groenendijk 1999) provide a logically attractive theory of the semantics of natural language questions, commonly referred to as the partition theory. Two central notions in this theory are entailment between questions and answerhood. For example, the question \"Who is going to the party?\" entails the question \"Is John going to the party?\", and \"John is going to the party\" counts as an answer to both. Groenendijk and Stokhof define these two notions in terms of partitions of a set of possible worlds.   We provide a syntactic characterization of entailment between questions and answerhood. We show that answers are, in some sense, exactly those formulas that are built up from instances of the question. This result lets us compare the partition theory with other approaches to interrogation -- both linguistic analyses, such as Hamblin's and Karttunen's semantics, and computational systems, such as Prolog. 
Our comparison separates a notion of answerhood into three aspects: equivalence (when two questions or answers are interchangeable), atomic answers (what instances of a question count as answers), and compound answers (how answers compose).\nA recently introduced general-purpose heuristic for finding high-quality solutions for many hard optimization problems is reviewed. The method is inspired by recent progress in understanding far-from-equilibrium phenomena in terms of {\\em self-organized criticality,} a concept introduced to describe emergent complexity in physical systems. This method, called {\\em extremal optimization,} successively replaces the value of extremely undesirable variables in a sub-optimal solution with new, random ones. Large, avalanche-like fluctuations in the cost function self-organize from this dynamics, effectively scaling barriers to explore local optima in distant neighborhoods of the configuration space while eliminating the need to tune parameters. Drawing upon models used to simulate the dynamics of granular media, evolution, or geology, extremal optimization complements approximation methods inspired by equilibrium statistical physics, such as {\\em simulated annealing}. It may be but one example of applying new insights into {\\em non-equilibrium phenomena} systematically to hard optimization problems. This method is widely applicable and so far has proved competitive with -- and even superior to -- more elaborate general-purpose heuristics on testbeds of constrained optimization problems with up to $10^5$ variables, such as bipartitioning, coloring, and satisfiability. Analysis of a suitable model predicts the only free parameter of the method in accordance with all experimental results.\nThis paper presents the DLV system, which is widely considered the state-of-the-art implementation of disjunctive logic programming, and addresses several aspects. 
As for problem solving, we provide a formal definition of its kernel language, function-free disjunctive logic programs (also known as disjunctive datalog), extended by weak constraints, which are a powerful tool to express optimization problems. We then illustrate the usage of DLV as a tool for knowledge representation and reasoning, describing a new declarative programming methodology which allows one to encode complex problems (up to $\\Delta^P_3$-complete problems) in a declarative fashion. On the foundational side, we provide a detailed analysis of the computational complexity of the language of DLV, and by deriving new complexity results we chart a complete picture of the complexity of this language and important fragments thereof.   Furthermore, we illustrate the general architecture of the DLV system which has been influenced by these results. As for applications, we overview application front-ends which have been developed on top of DLV to solve specific knowledge representation tasks, and we briefly describe the main international projects investigating the potential of the system for industrial exploitation. Finally, we report about thorough experimentation and benchmarking, which has been carried out to assess the efficiency of the system. The experimental results confirm the solidity of DLV and highlight its potential for emerging application areas like knowledge management and information integration.\nWith the inclusion of an effective methodology, this article answers in detail a question that, for a quarter of a century, remained open despite intense study by various researchers. Is the formula XCB = e(x,e(e(e(x,y),e(z,y)),z)) a single axiom for the classical equivalential calculus when the rules of inference consist of detachment (modus ponens) and substitution? 
Where the function e represents equivalence, this calculus can be axiomatized quite naturally with the formulas e(x,x), e(e(x,y),e(y,x)), and e(e(x,y),e(e(y,z),e(x,z))), which correspond to reflexivity, symmetry, and transitivity, respectively. (We note that e(x,x) is dependent on the other two axioms.) Heretofore, thirteen shortest single axioms for classical equivalence of length eleven had been discovered, and XCB was the only remaining formula of that length whose status was undetermined. To show that XCB is indeed such a single axiom, we focus on the rule of condensed detachment, a rule that captures detachment together with an appropriately general, but restricted, form of substitution. The proof we present in this paper consists of twenty-five applications of condensed detachment, completing with the deduction of transitivity followed by a deduction of symmetry. We also discuss some factors that may explain in part why XCB resisted relinquishing its treasure for so long. Our approach relied on diverse strategies applied by the automated reasoning program OTTER. Thus ends the search for shortest single axioms for the equivalential calculus.\nAnswer-set programming (ASP) paradigm is a way of using logic to solve search problems. Given a search problem, to solve it one designs a theory in the logic so that models of this theory represent problem solutions. To compute a solution to a problem one needs to compute a model of the corresponding theory. Several answer-set programming formalisms have been developed on the basis of logic programming with the semantics of stable models. In this paper we show that also the logic of predicate calculus gives rise to effective implementations of the ASP paradigm, similar in spirit to logic programming with stable model semantics and with a similar scope of applicability. Specifically, we propose two logics based on predicate calculus as formalisms for encoding search problems. 
We show that the expressive power of these logics is given by the class NP-search. We demonstrate how to use them in programming and develop computational tools for model finding. In the case of one of the logics our techniques reduce the problem to that of propositional satisfiability and allow one to use off-the-shelf satisfiability solvers. The language of the other logic has more complex syntax and provides explicit means to model some high-level constraints. For theories in this logic, we designed our own solver that takes advantage of the expanded syntax. We present experimental results demonstrating computational effectiveness of the overall approach.\nThis paper begins with a general theory of error in cross-validation testing of algorithms for supervised learning from examples. It is assumed that the examples are described by attribute-value pairs, where the values are symbolic. Cross-validation requires a set of training examples and a set of testing examples. The value of the attribute that is to be predicted is known to the learner in the training set, but unknown in the testing set. The theory demonstrates that cross-validation error has two components: error on the training set (inaccuracy) and sensitivity to noise (instability). This general theory is then applied to voting in instance-based learning. Given an example in the testing set, a typical instance-based learning algorithm predicts the designated attribute by voting among the k nearest neighbors (the k most similar examples) to the testing example in the training set. Voting is intended to increase the stability (resistance to noise) of instance-based learning, but a theoretical analysis shows that there are circumstances in which voting can be destabilizing. The theory suggests ways to minimize cross-validation error, by ensuring that voting is stable and does not adversely affect accuracy.\nThis paper first analyzes the resolution complexity of two random CSP models (i.e. 
Model RB/RD) for which we can establish the existence of phase transitions and identify the threshold points exactly. By encoding CSPs into CNF formulas, it is proved that almost all instances of Model RB/RD have no tree-like resolution proofs of less than exponential size. Thus, we not only introduce new families of CNF formulas hard for resolution, which is a central task of Proof-Complexity theory, but also propose models with both many hard instances and exact phase transitions. Then, the implications of such models are addressed. It is shown both theoretically and experimentally that an application of Model RB/RD might be in the generation of hard satisfiable instances, which is not only of practical importance but also related to some open problems in cryptography such as generating one-way functions. Subsequently, a further theoretical support for the generation method is shown by establishing exponential lower bounds on the complexity of solving random satisfiable and forced satisfiable instances of RB/RD near the threshold. Finally, conclusions are presented, as well as a detailed comparison of Model RB/RD with the Hamiltonian cycle problem and random 3-SAT, which, respectively, exhibit three different kinds of phase transition behavior in NP-complete problems.\nThe paper studies an implementation methodology for partial and disjunctive stable models where partiality and disjunctions are unfolded from a logic program so that an implementation of stable models for normal (disjunction-free) programs can be used as the core inference engine. The unfolding is done in two separate steps. Firstly, it is shown that partial stable models can be captured by total stable models using a simple linear and modular program transformation. Hence, reasoning tasks concerning partial stable models can be solved using an implementation of total stable models. 
Disjunctive partial stable models have been lacking implementations which now become available as the translation handles also the disjunctive case. Secondly, it is shown how total stable models of disjunctive programs can be determined by computing stable models for normal programs. Hence, an implementation of stable models of normal programs can be used as a core engine for implementing disjunctive programs. The feasibility of the approach is demonstrated by constructing a system for computing stable models of disjunctive programs using the smodels system as the core engine. The performance of the resulting system is compared to that of dlv which is a state-of-the-art special purpose system for disjunctive programs.\nIt has been shown that a neural network model recently proposed to describe basic memory performance is based on a ternary/binary coding/decoding algorithm which leads to a new neural network assembly memory model (NNAMM) providing maximum-likelihood recall/recognition properties and implying a new memory unit architecture with Hopfield two-layer network, N-channel time gate, auxiliary reference memory, and two nested feedback loops. For the data coding used, conditions are found under which a version of Hopfield network implements maximum-likelihood convolutional decoding algorithm and, simultaneously, linear statistical classifier of arbitrary binary vectors with respect to Hamming distance between vector analyzed and reference vector given. In addition to basic memory performance, the model explicitly describes the dependence on time of memory trace retrieval, gives a possibility of one-trial learning, metamemory simulation, generalized knowledge representation, and distinct description of conscious and unconscious mental processes. It has been shown that an assembly memory unit may be viewed as a model of a smallest inseparable part or an 'atom' of consciousness. 
Some nontraditional neurobiological backgrounds (dynamic spatiotemporal synchrony, properties of time dependent and error detector neurons, early precise spike firing, etc) and the model's application to solve some interdisciplinary problems from different scientific fields are discussed.\nThe study of Complex Systems is considered by many to be a new scientific field, and is distinguished by being a discipline that has applications within many separate areas of scientific study. The study of Neural Networks, Traffic Patterns, Artificial Intelligence, Social Systems, and many other scientific areas can all be considered to fall within the realm of Complex Systems, and can be studied from this new perspective. The advent of more capable computer systems has allowed these systems to be simulated and modeled with far greater ease, and new understanding of computer modeling approaches has allowed the fledgling science to be studied as never before.   The preliminary focus of this paper will be to provide a general overview of the science of Complex Systems, including terminology, definitions, history, and examples. I will attempt to look at some of the most important trends in different areas of research, and give a general overview of research methods that have been used in parallel with computer modeling. Also, I will further define the areas of the science that concern themselves with computer modeling and simulation, and I will attempt to make it clear why the science only came into its own when the proper modeling and simulation tools were finally available. In addition, although there seems to be general agreement between different authors and institutes regarding the generalities of the study, there are some differences in terminology and methodology. 
I have attempted in this paper to bring as many elements together as possible, as far as the scope of the subject is concerned, without losing focus by studying Complex System techniques that are bound to one particular area of scientific study, unless that area is that of computer modeling.\nWhen reasoning with uncertainty there are many situations where evidences are not only uncertain but their propositions may also be weakly specified in the sense that it may not be certain to which event a proposition is referring. It is then crucial not to combine such evidences in the mistaken belief that they are referring to the same event. This situation would become manageable if the evidences could be clustered into subsets representing events that should be handled separately. In an earlier article we established within Dempster-Shafer theory a criterion function called the metaconflict function. With this criterion we can partition a set of evidences into subsets, each representing a separate event. In this article we will not only find the most plausible subset for each piece of evidence, we will also find the plausibility for every subset that the evidence belongs to the subset. Also, when the number of subsets is uncertain, we aim to find a posterior probability distribution regarding the number of subsets.\nThomas M. Strat has developed a decision-theoretic apparatus for Dempster-Shafer theory (Decision analysis using belief functions, Intern. J. Approx. Reason. 4(5/6), 391-417, 1990). In this apparatus, expected utility intervals are constructed for different choices. The choice with the highest expected utility is preferable to others. However, to find the preferred choice when the expected utility interval of one choice is included in that of another, it is necessary to interpolate a discerning point in the intervals. 
This is done by the parameter rho, defined as the probability that the ambiguity about the utility of every nonsingleton focal element will turn out as favorable as possible. If there are several different decision makers, we might sometimes be more interested in having the highest expected utility among the decision makers rather than only trying to maximize our own expected utility regardless of choices made by other decision makers. The preference of each choice is then determined by the probability of yielding the highest expected utility. This probability is equal to the maximal interval length of rho under which an alternative is preferred. We must here take into account not only the choices already made by other decision makers but also the rational choices we can assume to be made by later decision makers. In Strat's apparatus, an assumption, unwarranted by the evidence at hand, has to be made about the value of rho. We demonstrate that no such assumption is necessary. It is sufficient to assume a uniform probability distribution for rho to be able to discern the most preferable choice. We discuss when this approach is justifiable.\nCurrently, there is renewed interest in the problem, raised by Shafer in 1985, of updating probabilities when observations are incomplete. This is a fundamental problem in general, and of particular interest for Bayesian networks. Recently, Grunwald and Halpern have shown that commonly used updating strategies fail in this case, except under very special assumptions. In this paper we propose a new method for updating probabilities with incomplete observations. Our approach is deliberately conservative: we make no assumptions about the so-called incompleteness mechanism that associates complete with incomplete observations. We model our ignorance about this mechanism by a vacuous lower prevision, a tool from the theory of imprecise probabilities, and we use only coherence arguments to turn prior into posterior probabilities. 
In general, this new approach to updating produces lower and upper posterior probabilities and expectations, as well as partially determinate decisions. This is a logical consequence of the existing ignorance about the incompleteness mechanism. We apply the new approach to the problem of classification of new evidence in probabilistic expert systems, where it leads to a new, so-called conservative updating rule. In the special case of Bayesian networks constructed using expert knowledge, we provide an exact algorithm for classification based on our updating rule, which has linear-time complexity for a class of networks wider than polytrees. This result is then extended to the more general framework of credal networks, where computations are often much harder than with Bayesian nets. Using an example, we show that our rule appears to provide a solid basis for reliable updating with incomplete observations, when no strong assumptions about the incompleteness mechanism are justified.\nSolomonoff unified Occam's razor and Epicurus' principle of multiple explanations to one elegant, formal, universal theory of inductive inference, which initiated the field of algorithmic information theory. His central result is that the posterior of his universal semimeasure M converges rapidly to the true sequence generating posterior mu, if the latter is computable. Hence, M is eligible as a universal predictor in case of unknown mu. We investigate the existence and convergence of computable universal (semi)measures for a hierarchy of computability classes: finitely computable, estimable, enumerable, and approximable. For instance, M is known to be enumerable, but not finitely computable, and to dominate all enumerable semimeasures. We define seven classes of (semi)measures based on these four computability concepts. Each class may or may not contain a (semi)measure which dominates all elements of another class. 
The analysis of these 49 cases can be reduced to four basic cases, two of them being new. The results hold for discrete and continuous semimeasures. We also investigate more closely the types of convergence, possibly implied by universality: in difference and in ratio, with probability 1, in mean sum, and for Martin-Loef random sequences. We introduce a generalized concept of randomness for individual sequences and use it to exhibit difficulties regarding these issues.\nWe give a purely model-theoretic characterization of the semantics of logic programs with negation-as-failure allowed in clause bodies. In our semantics the meaning of a program is, as in the classical case, the unique minimum model in a program-independent ordering. We use an expanded truth domain that has an uncountable linearly ordered set of truth values between False (the minimum element) and True (the maximum), with a Zero element in the middle. The truth values below Zero are ordered like the countable ordinals. The values above Zero have exactly the reverse order. Negation is interpreted as reflection about Zero followed by a step towards Zero; the only truth value that remains unaffected by negation is Zero. We show that every program has a unique minimum model M_P, and that this model can be constructed with a T_P iteration which proceeds through the countable ordinals. Furthermore, we demonstrate that M_P can also be obtained through a model intersection construction which generalizes the well-known model intersection theorem for classical logic programming. Finally, we show that by collapsing the true and false values of the infinite-valued model M_P to (the classical) True and False, we obtain a three-valued model identical to the well-founded one.\nCoalition formation is a key problem in automated negotiation among self-interested agents, and other multiagent applications. 
A coalition of agents can sometimes accomplish things that the individual agents cannot, or can do things more efficiently. However, motivating the agents to abide by a solution requires careful analysis: only some of the solutions are stable in the sense that no group of agents is motivated to break off and form a new coalition. This constraint has been studied extensively in cooperative game theory. However, the computational questions around this constraint have received less attention. When it comes to coalition formation among software agents (that represent real-world parties), these questions become increasingly explicit.   In this paper we define a concise general representation for games in characteristic form that relies on superadditivity, and show that it allows for efficient checking of whether a given outcome is in the core. We then show that determining whether the core is nonempty is $\\mathcal{NP}$-complete both with and without transferable utility. We demonstrate that what makes the problem hard in both cases is determining the collaborative possibilities (the set of outcomes possible for the grand coalition), by showing that if these are given, the problem becomes tractable in both cases. However, we then demonstrate that for a hybrid version of the problem, where utility transfer is possible only within the grand coalition, the problem remains $\\mathcal{NP}$-complete even when the collaborative possibilities are given.\nVoting is a general method for preference aggregation in multiagent settings, but seminal results have shown that all (nondictatorial) voting protocols are manipulable. One could try to avoid manipulation by using voting protocols where determining a beneficial manipulation is hard computationally. A number of recent papers study the complexity of manipulating existing protocols. This paper is the first work to take the next step of designing new protocols that are especially hard to manipulate. 
Rather than designing these new protocols from scratch, we instead show how to tweak existing protocols to make manipulation hard, while leaving much of the original nature of the protocol intact. The tweak studied consists of adding one elimination preround to the election. Surprisingly, this extremely simple and universal tweak makes typical protocols hard to manipulate! The protocols become NP-hard, #P-hard, or PSPACE-hard to manipulate, depending on whether the schedule of the preround is determined before the votes are collected, after the votes are collected, or the scheduling and the vote collecting are interleaved, respectively. We prove general sufficient conditions on the protocols for this tweak to introduce the hardness, and show that the most common voting protocols satisfy those conditions. These are the first results in voting settings where manipulation is in a higher complexity class than NP (presuming PSPACE $\\neq$ NP).\nThe aim of this work is to provide a family of qualitative theories for spatial change in general, and for motion of spatial scenes in particular. To achieve this, we consider a spatio-temporalisation, MTALC(D_x), of the well-known ALC(D) family of Description Logics (DLs) with a concrete domain. In particular, the concrete domain D_x is generated by a qualitative spatial Relation Algebra (RA) x. We show the important result that satisfiability of an MTALC(D_x) concept with respect to a weakly cyclic TBox is decidable in nondeterministic exponential time, by reducing it to the emptiness problem of a weak alternating automaton augmented with spatial constraints, which we show to remain decidable, although the accepting condition of a run involves, in addition to the standard case, consistency of a potentially infinite CSP (Constraint Satisfaction Problem). 
The result provides an effective tableaux-like satisfiability procedure, which is also discussed.\nWe define a ternary Relation Algebra (RA) of relative position relations on two-dimensional directed lines (d-lines for short). A d-line has two degrees of freedom (DFs): a rotational DF (RDF), and a translational DF (TDF). The representation of the RDF of a d-line will be handled by an RA of 2D orientations, CYC_t, known in the literature. A second algebra, TA_t, which will handle the TDF of a d-line, will be defined. The two algebras, CYC_t and TA_t, will constitute, respectively, the rotational and the translational components of the RA, PA_t, of relative position relations on d-lines: the PA_t atoms will consist of those pairs <t,r> of a TA_t atom and a CYC_t atom that are compatible. We present in detail the RA PA_t, with its converse table, its rotation table and its composition tables. We show that a (polynomial) constraint propagation algorithm, known in the literature, is complete for a subset of PA_t relations including almost all of the atomic relations. We will discuss the application scope of the RA, which includes incidence geometry, GIS (Geographic Information Systems), shape representation, localisation in (multi-)robot navigation, and the representation of motion prepositions in NLP (Natural Language Processing). We then compare the RA to existing ones, such as an algebra for reasoning about rectangles parallel to the axes of an (orthogonal) coordinate system, a ``spatial Odyssey'' of Allen's interval algebra, and an algebra for reasoning about 2D segments.\nThis paper describes how the elements of the SP theory (Wolff, 2003a) may be realised with neural structures and processes. To the extent that this is successful, the insights that have been achieved in the SP theory - the integration and simplification of a range of phenomena in perception and cognition - may be incorporated in a neural view of brain function.   
These proposals may be seen as a development of Hebb's (1949) concept of a 'cell assembly'. By contrast with that concept and variants of it, the version described in this paper proposes that any one neuron can belong in one assembly and only one assembly. A distinctive feature of the present proposals is that any neuron or cluster of neurons within a cell assembly may serve as a proxy or reference for another cell assembly or class of cell assemblies. This device provides solutions to many of the problems associated with cell assemblies, it allows information to be stored in a compressed form, and it provides a robust mechanism by which assemblies may be connected to form hierarchies, grammars and other kinds of knowledge structure.   Drawing on insights derived from the SP theory, the paper also describes how unsupervised learning may be achieved with neural structures and processes. This theory of learning overcomes weaknesses in the Hebbian concept of learning and it is, at the same time, compatible with the observations that Hebb's theory was designed to explain.\nKeyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. A limitation of previous keyphrase extraction algorithms is that the selected keyphrases are occasionally incoherent. That is, the majority of the output keyphrases may fit together well, but there may be a minority that appear to be outliers, with no clear semantic relation to the majority or to each other. This paper presents enhancements to the Kea keyphrase extraction algorithm that are designed to increase the coherence of the extracted keyphrases. 
The approach is to use the degree of statistical association among candidate keyphrases as evidence that they may be semantically related. The statistical association is measured using web mining. Experiments demonstrate that the enhancements improve the quality of the extracted keyphrases. Furthermore, the enhancements are not domain-specific: the algorithm generalizes well when it is trained on one domain (computer science documents) and tested on another (physics documents).\nA ternary/binary data coding algorithm and conditions under which Hopfield networks implement optimal convolutional or Hamming decoding algorithms have been described. Using the introduced coding/decoding approach (an optimal Binary Signal Detection Theory, BSDT), a Neural Network Assembly Memory Model (NNAMM) is built. The model provides optimal basic memory performance and demands the use of a new memory unit architecture with a two-layer Hopfield network, an N-channel time gate, an auxiliary reference memory, and two nested feedback loops. NNAMM explicitly describes the time dependence of memory trace retrieval, and makes possible metamemory simulation, generalized knowledge representation, and a distinct description of conscious and unconscious mental processes. A model of the smallest inseparable part, or \"atom\", of consciousness is also defined. The NNAMM's neurobiological backgrounds and its applications to solving some interdisciplinary problems are briefly discussed. BSDT could implement the \"best neural code\" used in nervous tissues of animals and humans.\nWe present a new method for clustering based on compression. The method doesn't use subject-specific features or background knowledge, and works as follows: First, we determine a universal similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. 
The NCD is universal in that it is not restricted to a specific application area, and works across application area boundaries. A theoretical precursor, the normalized information distance, co-developed by one of the authors, is provably optimal but uses the non-computable notion of Kolmogorov complexity. We propose precise notions of similarity metric, normal compressor, and show that the NCD based on a normal compressor is a similarity metric that approximates universality. To extract a hierarchy of clusters from the distance matrix, we determine a dendrogram (binary tree) by a new quartet method and a fast heuristic to implement it. The method is implemented and available as public software, and is robust under choice of different compressors. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block sorting compressors. In genomics we presented new evidence for major questions in Mammalian evolution, based on whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta hypothesis against the Theria hypothesis.\nWe discuss the computational complexity of random 2D Ising spin glasses, which represent an interesting class of constraint satisfaction problems for black box optimization. Two extremal cases are considered: (1) the +/- J spin glass, and (2) the Gaussian spin glass. We also study a smooth transition between these two extremal cases. The computational complexity of all studied spin glass systems is found to be dominated by rare events of extremely hard spin glass samples. We show that complexity of all studied spin glass systems is closely related to Frechet extremal value distribution. 
In a hybrid algorithm that combines the hierarchical Bayesian optimization algorithm (hBOA) with a deterministic bit-flip hill climber, the number of steps performed by both the global searcher (hBOA) and the local searcher follow Frechet distributions. Nonetheless, unlike in methods based purely on local search, the parameters of these distributions confirm good scalability of hBOA with local search. We further argue that standard performance measures for optimization algorithms--such as the average number of evaluations until convergence--can be misleading. Finally, our results indicate that for highly multimodal constraint satisfaction problems, such as Ising spin glasses, recombination-based search can provide qualitatively better results than mutation-based search.\nMutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(1/n^3), where n is the sample size. Leading order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information become a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. 
Some of these prospective applications are discussed, and one of them, namely feature selection, is shown to perform significantly better when inductive mutual information is used.\nOne proposes a first alternative rule of combination to WAO (Weighted Average Operator) proposed recently by Josang, Daniel and Vannoorenberghe, called Proportional Conflict Redistribution rule (denoted PCR1). PCR1 and WAO are particular cases of WO (the Weighted Operator) because the conflicting mass is redistributed with respect to some weighting factors. In this first PCR rule, the proportionalization is done for each non-empty set with respect to the non-zero sum of its corresponding mass matrix - instead of its mass column average as in WAO, but the results are the same as Ph. Smets has pointed out. Also, we extend WAO (which herein gives no solution) for the degenerate case when all column sums of all non-empty sets are zero, and then the conflicting mass is transferred to the non-empty disjunctive form of all non-empty sets together; but if this disjunctive form happens to be empty, then one considers an open world (i.e. the frame of discernment might contain new hypotheses) and thus all conflicting mass is transferred to the empty set. In addition to WAO, we propose a general formula for PCR1 (WAO for non-degenerate cases).\nThe development of effective knowledge discovery techniques has become in the recent few years a very active research area due to the important impact it has in several relevant application areas. One interesting task thereof is that of singling out anomalous individuals from a given population, e.g., to detect rare events in time-series analysis settings, or to identify objects whose behavior is deviant w.r.t. a codified standard set of \"social\" rules. Such exceptional individuals are usually referred to as outliers in the literature.   Recently, outlier detection has also emerged as a relevant KR&R problem. 
In this paper, we formally state the concept of outliers by generalizing in several respects an approach recently proposed in the context of default logic, for instance, by not restricting outliers to single individuals but, rather, in the more general case, allowing them to correspond to entire (sub)theories. We do that within the context of logic programming and, mainly through examples, we discuss its potential practical impact in applications. The formalization we propose is a novel one and helps shed some light on the real nature of outliers. Moreover, as a major contribution of this work, we illustrate the exploitation of minimality criteria in outlier detection. The computational complexity of outlier detection problems arising in this novel setting is thoroughly investigated and accounted for in the paper as well. Finally, we also propose a rewriting algorithm that transforms any outlier detection problem into an equivalent inference problem under the stable model semantics, thereby making outlier computation effective and realizable on top of any stable model solver.\nNormal forms for logic programs under stable/answer set semantics are introduced. We argue that these forms can simplify the study of program properties, mainly consistency. The first normal form, called the {\\em kernel} of the program, is useful for studying existence and number of answer sets. A kernel program is composed of the atoms which are undefined in the Well-founded semantics, which are those that directly affect the existence of answer sets. The body of rules is composed of negative literals only. Thus, the kernel form tends to be significantly more compact than other formulations. Also, it is possible to check consistency of kernel programs in terms of colorings of the Extended Dependency Graph program representation which we previously developed. 
The second normal form is called {\\em 3-kernel.} A 3-kernel program is composed of the atoms which are undefined in the Well-founded semantics. Rules in 3-kernel programs have at most two conditions, and each rule either belongs to a cycle, or defines a connection between cycles. 3-kernel programs may have positive conditions. The 3-kernel normal form is very useful for the static analysis of program consistency, i.e., the syntactic characterization of existence of answer sets. This result can be obtained thanks to a novel graph-like representation of programs, called the Cycle Graph, which is presented in the companion article \\cite{Cos04b}.\nConsider the problem of tracking a set of moving targets. Apart from the tracking result, it is often important to know where the tracking fails, either to steer sensors to that part of the state-space, or to inform a human operator about the status and quality of the obtained information. An intuitive quality measure is the correlation between two tracking results based on uncorrelated observations. In the case of Bayesian trackers such a correlation measure could be the Kullback-Leibler difference.   We focus on a scenario with a large number of military units moving in some terrain. The units are observed by several types of sensors and \"meta-sensors\" with force aggregation capabilities. The sensors register units of different size. Two separate multi-target probability hypothesis density (PHD) particle filters are used to track one type of unit (e.g., companies) and its sub-units (e.g., platoons), respectively, based on observations of units of those sizes. Each observation is used in one filter only.   Although the state-space may well be the same in both filters, the posterior PHD distributions are not directly comparable -- one unit might correspond to three or four spatially distributed sub-units. 
Therefore, we introduce a mapping function between distributions for different unit sizes, based on doctrine knowledge of unit configuration.   The mapped distributions can now be compared -- locally or globally -- using some measure, which gives the correlation between two PHD distributions in a bounded volume of the state-space. To locate areas where the tracking fails, a discretized quality map of the state-space can be generated by applying the measure locally to different parts of the space.\nSegmentation of a colour image composed of different kinds of texture regions can be a hard problem, namely computing exact texture fields and deciding the optimum number of segmentation areas in an image when it contains similar and/or non-stationary texture fields. In this work, a method is described for evolving adaptive procedures for these problems. In many real world applications data clustering constitutes a fundamental issue whenever behavioural or feature domains can be mapped into topological domains. We formulate the segmentation problem upon such images as an optimisation problem and adopt the evolutionary strategy of Genetic Algorithms for the clustering of small regions in colour feature space. The present approach incorporates k-Means unsupervised clustering into Genetic Algorithms, namely to guide this Evolutionary Algorithm in its search for the optimal or sub-optimal data partition, a task that, as we know, requires a non-trivial search because of its intrinsic NP-complete nature. To solve this task, the appropriate genetic coding is also discussed, since this is a key aspect in the implementation. Our purpose is to demonstrate the efficiency of Genetic Algorithms for automatic and unsupervised texture segmentation. Some examples in Colour Maps, Ornamental Stones and in Human Skin Mark segmentation are presented and overall results discussed. 
KEYWORDS: Genetic Algorithms, Colour Image Segmentation, Classification, Clustering.\nIn the absence of a pure noise-free image it is hard to define what noise is, in any original noisy image, and as a consequence also where it is, and in what amount. In fact, the definition of noise depends largely on our own aim in the whole image analysis process, and (perhaps more importantly) on our self-perception of noise. For instance, when we perceive noise as disconnected and small it is normal to use MM-ASF filters to treat it. There are two reasons for this. First, in many instances there is no ideal and pure noise-free image against which to compare our filtering process (nothing but our self-perception of its pure image); second, and related to this first point, the MM transformations that we choose are based only on our own, perhaps fuzzy, notion. The present proposal combines the results of two MM filtering transformations (FT1, FT2) and makes use of some measures and quantitative relations on their Size/Intensity Diagrams to find the most appropriate noise removal process. Results can also be used for finding the most appropriate stopping criteria, and the right sequence of MM operators combination on Alternating Sequential Filters (ASF), if these measures are applied, for instance, on a Genetic Algorithm's target function.\nThis paper introduces a fundamental result, which is relevant for Answer Set programming, and planning. For the first time since the definition of the stable model semantics, the class of logic programs for which a stable model exists is given a syntactic characterization. This condition may have a practical importance both for defining new algorithms for checking consistency and computing answer sets, and for improving the existing systems. The approach of this paper is to introduce a new canonical form (to which any logic program can be reduced), to focus the attention on cyclic dependencies. 
The technical result is then given in terms of programs in canonical form (canonical programs), without loss of generality. The result is based on identifying the cycles contained in the program, showing that stable models of the overall program are composed of stable models of suitable sub-programs, corresponding to the cycles, and on defining the Cycle Graph. Each vertex of this graph corresponds to one cycle, and each edge corresponds to one handle, which is a literal containing an atom that, occurring in both cycles, actually determines a connection between them. In fact, the truth value of the handle in the cycle where it appears as the head of a rule influences the truth value of the atoms of the cycle(s) where it occurs in the body. We can therefore introduce the concept of a handle path, connecting different cycles. If for every odd cycle we can find a handle path with certain properties, then the existence of a stable model is guaranteed.\nComputability logic is a formal theory of computational tasks and resources. Formulas in it represent interactive computational problems, and \"truth\" is understood as algorithmic solvability. Interactive computational problems, in turn, are defined as a certain sort of games between a machine and its environment, with logical operators standing for operations on such games. Within the ambitious program of finding axiomatizations for incrementally rich fragments of this semantically introduced logic, the earlier article \"From truth to computability I\" proved soundness and completeness for system CL3, whose language has the so-called parallel connectives (including negation), choice connectives, choice quantifiers, and blind quantifiers. The present paper extends that result to the significantly more expressive system CL4 with the same collection of logical operators. What makes CL4 expressive is the presence of two sorts of atoms in its language: elementary atoms, representing elementary computational problems (i.e. 
predicates, i.e. problems of zero degree of interactivity), and general atoms, representing arbitrary computational problems. CL4 conservatively extends CL3, with the latter being nothing but the general-atom-free fragment of the former. Removing the blind (classical) group of quantifiers from the language of CL4 is shown to yield a decidable logic despite the fact that the latter is still first-order. A comprehensive online source on computability logic can be found at http://www.cis.upenn.edu/~giorgi/cl.html\nAnswer set programming (ASP) with disjunction offers a powerful tool for declaratively representing and solving hard problems. Many NP-complete problems can be encoded in the answer set semantics of logic programs in a very concise and intuitive way, where the encoding reflects the typical \"guess and check\" nature of NP problems: The property is encoded in a way such that polynomial size certificates for it correspond to stable models of a program. However, the problem-solving capacity of full disjunctive logic programs (DLPs) is beyond NP, and captures a class of problems at the second level of the polynomial hierarchy. While these problems also have a clear \"guess and check\" structure, finding an encoding in a DLP reflecting this structure may sometimes be a non-obvious task, in particular if the \"check\" itself is a coNP-complete problem; usually, such problems are solved by interleaving separate guess and check programs, where the check is expressed by inconsistency of the check program. In this paper, we present general transformations of head-cycle free (extended) disjunctive logic programs into stratified and positive (extended) disjunctive logic programs based on meta-interpretation techniques. The answer sets of the original and the transformed program are in simple correspondence, and, moreover, inconsistency of the original program is indicated by a designated answer set of the transformed program. 
Our transformations facilitate the integration of separate \"guess\" and \"check\" programs, which are often easy to obtain, automatically into a single disjunctive logic program. Our results complement recent results on meta-interpretation in ASP, and extend methods and techniques for a declarative \"guess and check\" problem solving paradigm through ASP.\nTo test incomplete search algorithms for constraint satisfaction problems such as 3-SAT, we need a source of hard, but satisfiable, benchmark instances. A simple way to do this is to choose a random truth assignment A, and then choose clauses randomly from among those satisfied by A. However, this method tends to produce easy problems, since the majority of literals point toward the ``hidden'' assignment A. Last year, Achlioptas, Jia and Moore proposed a problem generator that cancels this effect by hiding both A and its complement. While the resulting formulas appear to be just as hard for DPLL algorithms as random 3-SAT formulas with no hidden assignment, they can be solved by WalkSAT in only polynomial time. Here we propose a new method to cancel the attraction to A, by choosing a clause with t > 0 literals satisfied by A with probability proportional to q^t for some q < 1. By varying q, we can generate formulas whose variables have no bias, i.e., which are equally likely to be true or false; we can even cause the formula to ``deceptively'' point away from A. We present theoretical and experimental results suggesting that these formulas are exponentially hard both for DPLL algorithms and for incomplete algorithms such as WalkSAT.\nThe uncertainty of classification outcomes is of crucial importance for many safety critical applications including, for example, medical diagnostics. In such applications the uncertainty of classification can be reliably estimated within a Bayesian model averaging technique that allows the use of prior information. 
Decision Tree (DT) classification models used within such a technique give experts additional information by making this classification scheme observable. The use of the Markov Chain Monte Carlo (MCMC) methodology of stochastic sampling makes the Bayesian DT technique feasible to perform. However, in practice, the MCMC technique may become stuck in a particular DT which is far away from a region with a maximal posterior. Sampling such DTs causes bias in the posterior estimates, and as a result the evaluation of classification uncertainty may be incorrect. In a particular case, the negative effect of such sampling may be reduced by giving additional prior information on the shape of DTs. In this paper we describe a new approach based on sweeping the DTs without additional priors on the preferred shape of DTs. The performance of Bayesian DT techniques with the standard and sweeping strategies is compared on synthetic data as well as on real datasets. Quantitatively evaluating the uncertainty in terms of the entropy of class posterior probabilities, we found that the sweeping strategy is superior to the standard strategy.\nObjective: The aim of this paper is to survey the recent work in medical document summarization. Background: During the last decade, document summarization has received increasing attention from the AI research community. More recently it has also attracted the interest of the medical research community, due to the enormous growth of information that is available to physicians and researchers in medicine, through the large and growing number of published journals, conference proceedings, medical sites and portals on the World Wide Web, electronic medical records, etc. Methodology: This survey first gives a general background on document summarization, presenting the factors that summarization depends upon, discussing evaluation issues and briefly describing the various types of summarization techniques.
It then examines the characteristics of the medical domain through the different types of medical documents. Finally, it presents and discusses the summarization techniques used so far in the medical domain, referring to the corresponding systems and their characteristics. Discussion and conclusions: The paper thoroughly discusses the promising paths for future research in medical document summarization. It mainly focuses on the issue of scaling to large collections of documents in various languages and from different media, on personalization issues, on portability to new sub-domains, and on the integration of summarization technology in practical applications.\nBayesian averaging over classification models allows the uncertainty of classification outcomes to be evaluated, which is of crucial importance for making reliable decisions in applications such as finance, in which risks have to be estimated. The uncertainty of classification is determined by a trade-off between the amount of data available for training, the diversity of a classifier ensemble and the required performance. The interpretability of classification models can also give useful information for experts responsible for making reliable classifications. For this reason Decision Trees (DTs) seem to be attractive classification models. The required diversity of the DT ensemble can be achieved by Bayesian model averaging over all possible DTs. In practice, the Bayesian approach can be implemented on the basis of a Markov Chain Monte Carlo (MCMC) technique of random sampling from the posterior distribution. For sampling large DTs, the MCMC method is extended by the Reversible Jump technique, which allows inducing DTs under given priors. For the case when the prior information on the DT size is unavailable, the sweeping technique, which defines the prior implicitly, shows better performance.
Within this Chapter we explore the classification uncertainty of the Bayesian MCMC techniques on some datasets from the StatLog Repository and real financial data. The classification uncertainty is compared within an Uncertainty Envelope technique dealing with the class posterior distribution and a given confidence probability. This technique provides realistic estimates of the classification uncertainty which can be easily interpreted in statistical terms with the aim of risk evaluation.\nMultiple Classifier Systems (MCSs) allow evaluation of the uncertainty of classification outcomes that is of crucial importance for safety critical applications. The uncertainty of classification is determined by a trade-off between the amount of data available for training, the classifier diversity and the required performance. The interpretability of MCSs can also give useful information for experts responsible for making reliable classifications. For this reason Decision Trees (DTs) seem to be attractive classification models for experts. The required diversity of MCSs exploiting such classification models can be achieved by using two techniques, the Bayesian model averaging and the randomised DT ensemble. Both techniques have revealed promising results when applied to real-world problems. In this paper we experimentally compare the classification uncertainty of the Bayesian model averaging with a restarting strategy and the randomised DT ensemble on a synthetic dataset and some domain problems commonly used in the machine learning community. To make the Bayesian DT averaging feasible, we use a Markov Chain Monte Carlo technique. The classification uncertainty is evaluated within an Uncertainty Envelope technique dealing with the class posterior distribution and a given confidence probability. Exploring a full posterior distribution, this technique produces realistic estimates which can be easily interpreted in statistical terms. 
In our experiments we found that the Bayesian DTs are superior to the randomised DT ensembles within the Uncertainty Envelope technique.\nAn important task for Homeland Security is the prediction of threat vulnerabilities, such as through the detection of relationships between seemingly disjoint entities. A structure used for this task is a \"semantic graph\", also known as a \"relational data graph\" or an \"attributed relational graph\". These graphs encode relationships as \"typed\" links between a pair of \"typed\" nodes. Indeed, semantic graphs are very similar to semantic networks used in AI. The node and link types are related through an ontology graph (also known as a schema). Furthermore, each node has a set of attributes associated with it (e.g., \"age\" may be an attribute of a node of type \"person\"). Unfortunately, the selection of types and attributes for both nodes and links depends on human expertise and is somewhat subjective and even arbitrary. This subjectivity introduces biases into any algorithm that operates on semantic graphs. Here, we raise some knowledge representation issues for semantic graphs and provide some possible solutions using recently developed ideas in the field of complex networks. In particular, we use the concept of transitivity to evaluate the relevance of individual links in the semantic graph for detecting relationships. We also propose new statistical measures for semantic graphs and illustrate these semantic measures on graphs constructed from movies and terrorism data.\nIn the framework of designing a knowledge discovery system, we have developed stochastic models based on high-order hidden Markov models. These models are capable of mapping sequences of data into a Markov chain in which the transitions between the states depend on the $n$ previous states according to the order of the model.
We study the process of achieving information extraction from spatial and temporal data by means of an unsupervised classification. We therefore use a French national database related to the land use of a region, named Teruti, which describes the land use in both the spatial and temporal domains. Land-use categories (wheat, corn, forest, ...) are logged every year on each site regularly spaced in the region. They constitute a temporal sequence of images in which we look for spatial and temporal dependencies. The temporal segmentation of the data is done by means of a second-order Hidden Markov Model (HMM2) that appears to have very good capabilities for locating stationary segments, as shown in our previous work in speech recognition. The spatial classification is performed by defining a fractal scanning of the images with the help of a Hilbert-Peano curve that introduces a total order on the sites, preserving the relation of neighborhood between the sites. We show that the HMM2 performs a classification that is meaningful for the agronomists. Spatial and temporal classification may be achieved simultaneously by means of a two-level HMM2 that measures the a posteriori probability of mapping a temporal sequence of images onto a set of hidden classes.\nThe task of outlier detection is to find small groups of data objects that are exceptional when compared with the rest of a large amount of data. Detection of such outliers is important for many applications such as fraud detection and customer migration. Most such applications are high-dimensional domains in which the data may contain hundreds of dimensions. However, the outlier detection problem itself is not well defined and none of the existing definitions are widely accepted, especially in high-dimensional space. In this paper, our first contribution is to propose a unified framework for outlier detection in high-dimensional spaces from an ensemble-learning viewpoint.
In our new framework, the outlying-ness of each data object is measured by fusing outlier factors in different subspaces using a combination function. Accordingly, we show that all existing research on outlier detection can be regarded as special cases in the unified framework with respect to the set of subspaces considered and the type of combination function used. In addition, to demonstrate the usefulness of the ensemble-learning-based outlier detection framework, we developed a very simple and fast algorithm, namely SOE1 (Subspace Outlier Ensemble using 1-dimensional Subspaces), in which only subspaces with one dimension are used for mining outliers from large categorical datasets. The SOE1 algorithm needs only two scans over the dataset and hence is very appealing in real data mining applications. Experimental results on real datasets and large synthetic datasets show that: (1) SOE1 performs comparably to state-of-the-art outlier detection algorithms in identifying true outliers and (2) SOE1 can be an order of magnitude faster than one of the fastest outlier detection algorithms known so far.\nThe present paper investigates consequence relations that are both non-monotonic and paraconsistent. More precisely, we focus on preferential consequence relations, i.e. those relations that can be defined by a binary preference relation on states labelled by valuations. We work with a general notion of valuation that covers e.g. the classical valuations as well as certain kinds of many-valued valuations. In the many-valued cases, preferential consequence relations are paraconsistent (in addition to being non-monotonic), i.e. they are capable of drawing reasonable conclusions which contain contradictions. The first purpose of this paper is to provide in our general framework syntactic characterizations of several families of preferential relations.
The second and main purpose is to provide, again in our general framework, characterizations of several families of preferential discriminative consequence relations. They are defined exactly as the plain version, but any conclusion such that its negation is also a conclusion is rejected (these relations bring something new essentially in the many-valued cases).\nWe consider the problem of sequential decision making under uncertainty in which the loss caused by a decision depends on the following binary observation. In competitive on-line learning, the goal is to design decision algorithms that are almost as good as the best decision rules in a wide benchmark class, without making any assumptions about the way the observations are generated. However, standard algorithms in this area can only deal with finite-dimensional (often countable) benchmark classes. In this paper we give similar results for decision rules ranging over an arbitrary reproducing kernel Hilbert space. For example, it is shown that for a wide class of loss functions (including the standard square, absolute, and log loss functions) the average loss of the master algorithm, over the first $N$ observations, does not exceed the average loss of the best decision rule with a bounded norm plus $O(N^{-1/2})$. Our proof technique is very different from the standard ones and is based on recent results about defensive forecasting. Given the probabilities produced by a defensive forecasting algorithm, which are known to be well calibrated and to have good resolution in the long run, we use the expected loss minimization principle to find a suitable decision.\nRecursive loops in a logic program present a challenging problem to the PLP framework. On the one hand, they loop forever so that the PLP backward-chaining inferences would never stop. On the other hand, they generate cyclic influences, which are disallowed in Bayesian networks. 
Therefore, in existing PLP approaches logic programs with recursive loops are considered to be problematic and thus are excluded. In this paper, we propose an approach that makes use of recursive loops to build a stationary dynamic Bayesian network. Our work stems from an observation that recursive loops in a logic program imply a time sequence and thus can be used to model a stationary dynamic Bayesian network without using explicit time parameters. We introduce a Bayesian knowledge base with logic clauses of the form $A \\leftarrow A_1,...,A_l, true, Context, Types$, which naturally represents the knowledge that the $A_i$s have direct influences on $A$ in the context $Context$ under the type constraints $Types$. We then use the well-founded model of a logic program to define the direct influence relation and apply SLG-resolution to compute the space of random variables together with their parental connections. We introduce a novel notion of influence clauses, based on which a declarative semantics for a Bayesian knowledge base is established and algorithms for building a two-slice dynamic Bayesian network from a logic program are developed.\nQuery containment and query answering are two important computational tasks in databases. While query answering amounts to computing the result of a query over a database, query containment is the problem of checking whether, for every database, the result of one query is a subset of the result of another query. In this paper, we deal with unions of conjunctive queries, and we address query containment and query answering under Description Logic constraints. Every such constraint is essentially an inclusion dependency between concepts and relations, and the expressive power of such constraints is due to the possibility of using complex expressions, e.g., intersection and difference of relations, special forms of quantification, regular expressions over binary relations, in the specification of the dependencies.
These types of constraints capture a great variety of data models, including the relational, the entity-relationship, and the object-oriented model, all extended with various forms of constraints, and also the basic features of the ontology languages used in the context of the Semantic Web.   We present the following results on both query containment and query answering. We provide a method for query containment under Description Logic constraints, thus showing that the problem is decidable, and analyze its computational complexity. We prove that query containment is undecidable in the case where we allow inequalities in the right-hand side query, even for very simple constraints and queries. We show that query answering under Description Logic constraints can be reduced to query containment, and illustrate how such a reduction provides upper bound results with respect to both combined and data complexity.\nThis paper introduces Latent Relational Analysis (LRA), a method for measuring semantic similarity. LRA measures similarity in the semantic relations between two pairs of words. When two pairs have a high degree of relational similarity, they are analogous. For example, the pair cat:meow is analogous to the pair dog:bark. There is evidence from cognitive science that relational similarity is fundamental to many cognitive and linguistic tasks (e.g., analogical reasoning). In the Vector Space Model (VSM) approach to measuring relational similarity, the similarity between two pairs is calculated by the cosine of the angle between the vectors that represent the two pairs. The elements in the vectors are based on the frequencies of manually constructed patterns in a large corpus. LRA extends the VSM approach in three ways: (1) patterns are derived automatically from the corpus, (2) Singular Value Decomposition is used to smooth the frequency data, and (3) synonyms are used to reformulate word pairs. 
This paper describes the LRA algorithm and experimentally compares LRA to VSM on two tasks: answering college-level multiple-choice word analogy questions and classifying semantic relations in noun-modifier expressions. LRA achieves state-of-the-art results, reaching human-level performance on the analogy questions and significantly exceeding VSM performance on both tasks.\nWe develop and analyze methods for computing provably optimal {\\em maximum a posteriori} (MAP) configurations for a subclass of Markov random fields defined on graphs with cycles. By decomposing the original distribution into a convex combination of tree-structured distributions, we obtain an upper bound on the optimal value of the original problem (i.e., the log probability of the MAP assignment) in terms of the combined optimal values of the tree problems. We prove that this upper bound is tight if and only if all the tree distributions share a common optimal configuration. An important implication is that any such shared configuration must also be a MAP configuration for the original distribution. Next we develop two approaches that attempt to obtain tight upper bounds: (a) a {\\em tree-relaxed linear program} (LP), which is derived from the Lagrangian dual of the upper bounds; and (b) a {\\em tree-reweighted max-product message-passing algorithm} that is related to but distinct from the max-product algorithm. In this way, we establish a connection between a certain LP relaxation of the mode-finding problem and a reweighted form of the max-product (min-sum) message-passing algorithm.\nWe posit a new paradigm for image information processing. For the last 25 years, this task has usually been approached within the framework of Treisman's two-stage paradigm [1].
The latter supposes an unsupervised, bottom-up directed process of gathering preliminary pieces of information at the lower processing stages and a supervised, top-down directed process of binding and grouping these pieces at the higher stages. It is acknowledged that these sub-processes interact and interfere with one another in a tricky and complicated manner. Notwithstanding the prevalence of this paradigm in biological and computer vision, we nevertheless propose to replace it with a new one, which we would like to designate as a two-part paradigm. In it, information contained in an image is initially extracted in an independent top-down manner by one part of the system, and then it is examined and interpreted by another, separate system part. We argue that the new paradigm is more plausible than its forerunner. We provide evidence from studies of human visual attention and insights from Kolmogorov complexity theory to support our arguments. We also provide some reasons in favor of separate image interpretation issues.\nProtein-protein and protein-nucleic acid interactions are vitally important for a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed machine learning approaches for predicting which amino acids of a protein participate in its interactions with other proteins and/or nucleic acids, using only the protein sequence as input. In this paper, we describe an application of classifiers trained on datasets of well-characterized protein-protein and protein-RNA complexes for which experimental structures are available. We apply these classifiers to the problem of predicting protein and RNA binding sites in the sequence of a clinically important protein for which the structure is not known: the regulatory protein Rev, essential for the replication of HIV-1 and other lentiviruses.
We compare our predictions with published biochemical, genetic and partial structural information for HIV-1 and EIAV Rev and with our own published experimental mapping of RNA binding sites in EIAV Rev. The predicted and experimentally determined binding sites are in very good agreement. The ability to predict reliably the residues of a protein that directly contribute to specific binding events - without the requirement for structural information regarding either the protein or complexes in which it participates - can potentially generate new disease intervention strategies.\nWhen solving numerical constraints such as nonlinear equations and inequalities, solvers often exploit pruning techniques, which remove redundant value combinations from the domains of variables, at pruning steps. To find the complete solution set, most of these solvers alternate the pruning steps with branching steps, which split each problem into subproblems. This forms the so-called branch-and-prune framework, well known among the approaches for solving numerical constraints. The basic branch-and-prune search strategy that uses domain bisections in place of the branching steps is called the bisection search. In general, the bisection search works well in case (i) the solutions are isolated, but it can be improved further in case (ii) there are continuums of solutions (this often occurs when inequalities are involved). In this paper, we propose a new branch-and-prune search strategy along with several variants, which not only allow yielding better branching decisions in the latter case, but also work as well as the bisection search does in the former case. These new search algorithms enable us to employ various pruning techniques in the construction of inner and outer approximations of the solution set. 
Our experiments show that these algorithms often speed up the solving process by one order of magnitude or more when solving problems with continuums of solutions, while keeping the same performance as the bisection search when the solutions are isolated.\nFor graphs $G$ and $H$, a mapping $f: V(G)\\to V(H)$ is a homomorphism of $G$ to $H$ if $uv\\in E(G)$ implies $f(u)f(v)\\in E(H)$. If, moreover, each vertex $u \\in V(G)$ is associated with costs $c_i(u), i \\in V(H)$, then the cost of the homomorphism $f$ is $\\sum_{u\\in V(G)}c_{f(u)}(u)$. For each fixed graph $H$, we have the {\\em minimum cost homomorphism problem}, written as MinHOM($H$). The problem is to decide, for an input graph $G$ with costs $c_i(u)$, $u \\in V(G)$, $i\\in V(H)$, whether there exists a homomorphism of $G$ to $H$ and, if one exists, to find one of minimum cost. Minimum cost homomorphism problems encompass (or are related to) many well-studied optimization problems. We describe a dichotomy of the minimum cost homomorphism problems for graphs $H$, with loops allowed. When each connected component of $H$ is either a reflexive proper interval graph or an irreflexive proper interval bigraph, the problem MinHOM($H$) is polynomial-time solvable. In all other cases the problem MinHOM($H$) is NP-hard. This solves an open problem from an earlier paper. Along the way, we prove a new characterization of the class of proper interval bigraphs.
Moreover, we reduce normal closed ASP to loosely guarded OASP, enabling, for the first time, a characterization of an answer set semantics by muLGF formulas. We further extend the open answer set semantics to programs with generalized literals. Such generalized programs (gPs) have interesting properties, e.g., the ability to express infinity axioms. We restrict the syntax of gPs such that both rules and generalized literals are guarded. Via a translation to guarded fixed point logic, we deduce 2-exptime-completeness of satisfiability checking in such guarded gPs (GgPs). Bound GgPs are restricted GgPs with exptime-complete satisfiability checking, but still sufficiently expressive to optimally simulate computation tree logic (CTL). We translate Datalog lite programs to GgPs, establishing equivalence of GgPs under an open answer set semantics, alternation-free muGF, and Datalog lite.\nProgram analysis and verification require decision procedures to reason about theories of data structures. Many problems can be reduced to the satisfiability of sets of ground literals in a theory T. If a sound and complete inference system for first-order logic is guaranteed to terminate on T-satisfiability problems, any theorem-proving strategy with that system and a fair search plan is a T-satisfiability procedure. We prove termination of a rewrite-based first-order engine on the theories of records, integer offsets, integer offsets modulo and lists. We give a modularity theorem stating sufficient conditions for termination on a combination of theories, given termination on each. The above theories, as well as others, satisfy these conditions. We introduce several sets of benchmarks on these theories and their combinations, including both parametric synthetic benchmarks to test scalability and real-world problems to test performance on huge sets of literals. We compare the rewrite-based theorem prover E with the validity checkers CVC and CVC Lite.
Contrary to the folklore that a general-purpose prover cannot compete with reasoners with built-in theories, the experiments are overall favorable to the theorem prover, showing that the rewriting approach is not only elegant and conceptually simple but also has important practical implications.\nFuzzy automata, whose input alphabet is a set of numbers or symbols, are a formal model of computing with values. Motivated by Zadeh's paradigm of computing with words rather than numbers, Ying proposed a kind of fuzzy automata, whose input alphabet consists of all fuzzy subsets of a set of symbols, as a formal model of computing with all words. In this paper, we introduce a somewhat general formal model of computing with (some special) words. The new features of the model are that the input alphabet only comprises some (not necessarily all) fuzzy subsets of a set of symbols and the fuzzy transition function can be specified arbitrarily. By employing the methodology of fuzzy control, we establish a retraction principle from computing with words to computing with values for handling crisp inputs and a generalized extension principle from computing with words to computing with all words for handling fuzzy inputs. These principles show that computing with values and computing with all words can be respectively implemented by computing with words. Some algebraic properties of retractions and generalized extensions are addressed as well.\nThrough the Internet and the World-Wide Web, a vast number of information sources from different providers have become available, offering information on various subjects, often in heterogeneous formats. This calls for tools and methods for building an advanced information-processing infrastructure. One issue in this area is the selection of suitable information sources in query answering.
In this paper, we present a knowledge-based approach to this problem, in the setting where one among a set of information sources (prototypically, data repositories) should be selected for evaluating a user query. We use extended logic programs (ELPs) to represent rich descriptions of the information sources, an underlying domain theory, and user queries in a formal query language (here, XML-QL, but other languages can be handled as well). Moreover, we use ELPs for declarative query analysis and generation of a query description. Central to our approach are declarative source-selection programs, for which we define syntax and semantics. Due to the structured nature of the considered data items, the semantics of such programs must carefully respect implicit context information in source-selection rules, and furthermore combine it with possible user preferences. A prototype implementation of our approach has been realized using the DLV KR system and its plp front-end for prioritized ELPs. We describe a representative example involving specific movie databases and report on experimental results.\nLogical formalisms for reasoning about relations between spatial regions play a fundamental role in geographical information systems, spatial and constraint databases, and spatial reasoning in AI. In analogy with Halpern and Shoham's modal logic of time intervals based on the Allen relations, we introduce a family of modal logics equipped with eight modal operators that are interpreted by the Egenhofer-Franzosa (or RCC8) relations between regions in topological spaces such as the real plane. We investigate the expressive power and computational complexity of logics obtained in this way. It turns out that our modal logics have the same expressive power as the two-variable fragment of first-order logic, but are exponentially less succinct.
The complexity ranges from (undecidable and) recursively enumerable to highly undecidable, where the recursively enumerable logics are obtained by considering substructures of structures induced by topological spaces. As our undecidability results also capture logics based on the real line, they improve upon undecidability results for interval temporal logics by Halpern and Shoham. We also analyze modal logics based on the five RCC5 relations, with similar results regarding the expressive power, but weaker results regarding the complexity.\nFuzzy {\\it discrete event systems} (DESs) were proposed recently by Lin and Ying [19], which may better cope with the real-world problems with fuzziness, impreciseness, and subjectivity such as those in biomedicine. As a continuation of [19], in this paper we further develop fuzzy DESs by dealing with supervisory control of fuzzy DESs. More specifically, (i) we reformulate the parallel composition of crisp DESs, and then define the parallel composition of fuzzy DESs that is equivalent to that in [19]; {\\it max-product} and {\\it max-min} automata for modeling fuzzy DESs are considered; (ii) we deal with a number of fundamental problems regarding supervisory control of fuzzy DESs, particularly demonstrate controllability theorem and nonblocking controllability theorem of fuzzy DESs, and thus present the conditions for the existence of supervisors in fuzzy DESs; (iii) we analyze the complexity for presenting a uniform criterion to test the fuzzy controllability condition of fuzzy DESs modeled by max-product automata; in particular, we present in detail a general computing method for checking whether or not the fuzzy controllability condition holds, if max-min automata are used to model fuzzy DESs, and by means of this method we can search for all possible fuzzy states reachable from initial fuzzy state in max-min automata; also, we introduce the fuzzy $n$-controllability condition for some practical problems; (iv) a number of 
examples serving to illustrate the applications of the derived results and methods are described; some basic properties related to supervisory control of fuzzy DESs are investigated. To conclude, some related issues are raised for further consideration.\nThis paper presents results of an ongoing interdisciplinary study to develop a computational theory of creativity for engineering design. Human design activities are surveyed, and popular computer-aided design methodologies are examined. It is argued that semiotics has the potential to merge and unite various design approaches into one fundamental theory that is naturally interpretable and so comprehensible in terms of computer use. Reviewing related work in philosophy, psychology, and cognitive science provides a general and encompassing vision of the creativity phenomenon. Basic notions of algebraic semiotics are given and explained in terms of design. This is to define a model of the design creative process, which is seen as a process of semiosis, where concepts and their attributes represented as signs organized into systems are evolved, blended, and analyzed, resulting in the development of new concepts. The model allows us to formally describe and investigate essential properties of the design process, namely its dynamics and non-determinism inherent in creative thinking. A stable pattern of creative thought - analogical and metaphorical reasoning - is specified to demonstrate the expressive power of the modeling approach; illustrative examples are given. The developed theory is applied to clarify the nature of emergence in design: it is shown that while emergent properties of a product may influence its creative value, emergence can simply be seen as a by-product of the creative process. 
Concluding remarks summarize the research, point to some unresolved issues, and outline directions for future work.\nWe present an unsupervised learning algorithm that mines large text corpora for patterns that express implicit semantic relations. For a given input word pair X:Y with some unspecified semantic relations, the corresponding output list of patterns <P1,...,Pm> is ranked according to how well each pattern Pi expresses the relations between X and Y. For example, given X=ostrich and Y=bird, the two highest ranking output patterns are \"X is the largest Y\" and \"Y such as the X\". The output patterns are intended to be useful for finding further pairs with the same relations, to support the construction of lexicons, ontologies, and semantic networks. The patterns are sorted by pertinence, where the pertinence of a pattern Pi for a word pair X:Y is the expected relational similarity between the given pair and typical pairs for Pi. The algorithm is empirically evaluated on two tasks, solving multiple-choice SAT word analogy questions and classifying semantic relations in noun-modifier pairs. On both tasks, the algorithm achieves state-of-the-art results, performing significantly better than several alternative pattern ranking algorithms, based on tf-idf.\nWe develop a Genetic Programming-based methodology that enables discovery of novel functional forms for classical inter-atomic force-fields, used in molecular dynamics simulations. Unlike previous efforts in the field, which fit only the parameters to the fixed functional forms, we instead use a novel algorithm to search the space of many possible functional forms. While a follow-on practical procedure will use experimental and {\\it ab initio} data to find an optimal functional form for a force-field, we first validate the approach using a manufactured solution. This validation has the advantage of a well-defined metric of success. 
We manufactured a training set of atomic coordinate data with an associated set of global energies using the well-known Lennard-Jones inter-atomic potential. We performed an automatic functional form fitting procedure starting with a population of random functions, using a genetic programming functional formulation, and a parallel tempering Metropolis-based optimization algorithm. Our massively-parallel method independently discovered the Lennard-Jones function after searching for several hours on 100 processors and covering a minuscule portion of the configuration space. We find that the method is suitable for unsupervised discovery of functional forms for inter-atomic potentials/force-fields. We also find that our parallel tempering Metropolis-based approach significantly improves the optimization convergence time, and takes good advantage of the parallel cluster architecture.\nIn answer set programming (ASP), a problem at hand is solved by (i) writing a logic program whose answer sets correspond to the solutions of the problem, and by (ii) computing the answer sets of the program using an answer set solver as a search engine. Typically, a programmer creates a series of gradually improving logic programs for a particular problem when optimizing program length and execution time on a particular solver. This leads the programmer to a meta-level problem of ensuring that the programs are equivalent, i.e., they give rise to the same answer sets. To ease answer set programming at methodological level, we propose a translation-based method for verifying the equivalence of logic programs. The basic idea is to translate logic programs P and Q under consideration into a single logic program EQT(P,Q) whose answer sets (if such exist) yield counter-examples to the equivalence of P and Q. The method is developed here in a slightly more general setting by taking the visibility of atoms properly into account when comparing answer sets. 
The translation-based approach presented in the paper has been implemented as a translator called lpeq that enables the verification of weak equivalence within the smodels system using the same search engine as for the search of models. Our experiments with lpeq and smodels suggest that establishing the equivalence of logic programs in this way is in certain cases much faster than naive cross-checking of answer sets.\nTermination is a major question in both logic and computer science. In logic, termination is at the heart of proof theory where it is usually called strong normalization (of cut elimination). In computer science, termination has always been an important issue for showing programs correct. In the early days of logic, strong normalization was usually shown by assigning ordinals to expressions in such a way that eliminating a cut would yield an expression with a smaller ordinal. In the early days of verification, computer scientists used similar ideas, interpreting the arguments of a program call by a natural number, such as their size. Showing the size of the arguments to decrease for each recursive call gives a termination proof of the program, which is however rather weak since it can only yield quite small ordinals. In the sixties, Tait invented a new method for showing cut elimination of natural deduction, based on a predicate over the set of terms, such that the membership of an expression to the predicate implied the strong normalization property for that expression. The predicate being defined by induction on types, or even as a fixpoint, this method could yield much larger ordinals. Later generalized by Girard under the name of reducibility or computability candidates, it showed very effective in proving the strong normalization property of typed lambda-calculi...\nWe consider sensor scheduling as the optimal observability problem for partially observable Markov decision processes (POMDP). 
This model fits to the cases where a Markov process is observed by a single sensor which needs to be dynamically adjusted or by a set of sensors which are selected one at a time in a way that maximizes the information acquisition from the process. Similar to conventional POMDP problems, in this model the control action is based on all past measurements; however here this action is not for the control of state process, which is autonomous, but it is for influencing the measurement of that process. This POMDP is a controlled version of the hidden Markov process, and we show that its optimal observability problem can be formulated as an average cost Markov decision process (MDP) scheduling problem. In this problem, a policy is a rule for selecting sensors or adjusting the measuring device based on the measurement history. Given a policy, we can evaluate the estimation entropy for the joint state-measurement processes which inversely measures the observability of state process for that policy. Considering estimation entropy as the cost of a policy, we show that the problem of finding optimal policy is equivalent to an average cost MDP scheduling problem where the cost function is the entropy function over the belief space. This allows the application of the policy iteration algorithm for finding the policy achieving minimum estimation entropy, thus optimum observability.\nWe give some semantic results for an epistemic logic incorporating dynamic operators to describe information changing events. Such events include epistemic changes, where agents become more informed about the non-changing state of the world, and ontic changes, wherein the world changes. The events are executed in information states that are modeled as pointed Kripke models. Our contribution consists of three semantic results. (i) Given two information states, there is an event transforming one into the other. 
The linguistic correspondent to this is that every consistent formula can be made true in every information state by the execution of an event. (ii) A more technical result is that: every event corresponds to an event in which the postconditions formalizing ontic change are assignments to `true' and `false' only (instead of assignments to arbitrary formulas in the logical language). `Corresponds' means that execution of either event in a given information state results in bisimilar information states. (iii) The third, also technical, result is that every event corresponds to a sequence of events wherein all postconditions are assignments of a single atom only (instead of simultaneous assignments of more than one atom).\nRecently, the diagnosability of {\\it stochastic discrete event systems} (SDESs) was investigated in the literature, and, the failure diagnosis considered was {\\it centralized}. In this paper, we propose an approach to {\\it decentralized} failure diagnosis of SDESs, where the stochastic system uses multiple local diagnosers to detect failures and each local diagnoser possesses its own information. In a way, the centralized failure diagnosis of SDESs can be viewed as a special case of the decentralized failure diagnosis presented in this paper with only one projection. The main contributions are as follows: (1) We formalize the notion of codiagnosability for stochastic automata, which means that a failure can be detected by at least one local stochastic diagnoser within a finite delay. (2) We construct a codiagnoser from a given stochastic automaton with multiple projections, and the codiagnoser associated with the local diagnosers is used to test codiagnosability condition of SDESs. (3) We deal with a number of basic properties of the codiagnoser. In particular, a necessary and sufficient condition for the codiagnosability of SDESs is presented. (4) We give a computing method in detail to check whether codiagnosability is violated. 
And (5) some examples are described to illustrate the applications of the codiagnosability and its computing method.\nWeighted Max-SAT is the optimization version of SAT and many important problems can be naturally encoded as such. Solving weighted Max-SAT is an important problem from both a theoretical and a practical point of view. In recent years, there has been considerable interest in finding efficient solving techniques. Most of this work focuses on the computation of good quality lower bounds to be used within a branch and bound DPLL-like algorithm. Most often, these lower bounds are described in a procedural way. Because of that, it is difficult to realize the {\\em logic} behind them.   In this paper we introduce an original framework for Max-SAT that stresses the parallelism with classical SAT. Then, we extend the two basic SAT solving techniques: {\\em search} and {\\em inference}. We show that many algorithmic {\\em tricks} used in state-of-the-art Max-SAT solvers are easily expressible in {\\em logic} terms with our framework in a unified manner.   Besides, we introduce an original search algorithm that performs a restricted amount of {\\em weighted resolution} at each visited node. We empirically compare our algorithm with a variety of solving alternatives on several benchmarks. Our experiments, which constitute to the best of our knowledge the most comprehensive Max-SAT evaluation ever reported, show that our algorithm is generally orders of magnitude faster than any competitor.\nA fuzzy logic based classification engine has been developed for classifying mass spectra obtained with an imaging internal source Fourier transform mass spectrometer (I^2LD-FTMS). Traditionally, an operator uses the relative abundance of ions with specific mass-to-charge (m/z) ratios to categorize spectra. An operator does this by comparing the spectrum of m/z versus abundance of an unknown sample against a library of spectra from known samples. 
Automated positioning and acquisition allow I^2LD-FTMS to acquire data from very large grids, which would require classifying up to 3600 spectra per hour to keep pace with the acquisition. The tedious job of classifying numerous spectra generated in an I^2LD-FTMS imaging application can be replaced by a fuzzy rule base if the cues an operator uses can be encapsulated. We present the translation of linguistic rules to a fuzzy classifier for mineral phases in basalt. This paper also describes a method for gathering statistics on ions, which are not currently used in the rule base, but which may be candidates for making the rule base more accurate and complete or to form new rule bases based on data obtained from known samples. A spatial method for classifying spectra with low membership values, based on neighboring sample classifications, is also presented.\nDescription Logics (DLs) are appropriate, widely used, logics for managing structured knowledge. They allow reasoning about individuals and concepts, i.e. sets of individuals with common properties. Typically, DLs are limited to dealing with crisp, well defined concepts. That is, concepts for which the problem whether an individual is an instance of it is a yes/no question. More often than not, the concepts encountered in the real world do not have precisely defined criteria of membership: we may say that an individual is an instance of a concept only to a certain degree, depending on the individual's properties. The DLs that deal with such fuzzy concepts are called fuzzy DLs. In order to deal with fuzzy, incomplete, indeterminate and inconsistent concepts, we need to extend the fuzzy DLs, combining the neutrosophic logic with a classical DL. In particular, concepts become neutrosophic (here neutrosophic means fuzzy, incomplete, indeterminate, and inconsistent), thus reasoning about neutrosophic concepts is supported. 
We define its syntax and semantics, and describe its properties.\nThe local reconstruction of a railway schedule following a small perturbation of the traffic, seeking minimization of the total accumulated delay, is a very difficult and tightly constrained combinatorial problem. Notoriously enough, the railway company's public image degrades proportionally to the amount of daily delays, and the same goes for its profit! This paper describes an inoculation procedure which greatly enhances an evolutionary algorithm for train re-scheduling. The procedure consists in building the initial population around a pre-computed solution based on problem-related information available beforehand. The optimization is performed by adapting times of departure and arrival, as well as allocation of tracks, for each train at each station. This is achieved by a permutation-based evolutionary algorithm that relies on a semi-greedy heuristic scheduler to gradually reconstruct the schedule by inserting trains one after another. Experimental results are presented on various instances of a large real-world case involving around 500 trains and more than 1 million constraints. In terms of competition with the commercial mathematical programming tool ILOG CPLEX, it appears that within a large class of instances, excluding trivial instances as well as too difficult ones, and with very few exceptions, a clever initialization turns an encouraging failure into a clear-cut success auguring of substantial financial savings.\nA product configurator which is complete, backtrack free and able to compute the valid domains at any state of the configuration can be constructed by building a Binary Decision Diagram (BDD). Despite the fact that the size of the BDD is exponential in the number of variables in the worst case, BDDs have proved to work very well in practice. Current BDD-based techniques can only handle interactive configuration with small finite domains. 
In this paper we extend the approach to handle string variables constrained by regular expressions. The user is allowed to change the strings by adding letters at the end of the string. We show how to make a data structure that can perform fast valid domain computations given some assignment on the set of string variables.   We first show how to do this by using one large DFA. Since this approach is too space consuming to be of practical use, we construct a data structure that simulates the large DFA and in most practical cases are much more space efficient. As an example a configuration problem on $n$ string variables with only one solution in which each string variable is assigned to a value of length of $k$ the former structure will use $\\Omega(k^n)$ space whereas the latter only need $O(kn)$. We also show how this framework easily can be combined with the recent BDD techniques to allow both boolean, integer and string variables in the configuration problem.\nRecently, M. Chertkov and V.Y. Chernyak derived an exact expression for the partition sum (normalization constant) corresponding to a graphical model, which is an expansion around the Belief Propagation solution. By adding correction terms to the BP free energy, one for each \"generalized loop\" in the factor graph, the exact partition sum is obtained. However, the usually enormous number of generalized loops generally prohibits summation over all correction terms. In this article we introduce Truncated Loop Series BP (TLSBP), a particular way of truncating the loop series of M. Chertkov and V.Y. Chernyak by considering generalized loops as compositions of simple loops. We analyze the performance of TLSBP in different scenarios, including the Ising model, regular random graphs and on Promedas, a large probabilistic medical diagnostic system. We show that TLSBP often improves upon the accuracy of the BP solution, at the expense of increased computation time. 
We also show that the performance of TLSBP strongly depends on the degree of interaction between the variables. For weak interactions, truncating the series leads to significant improvements, whereas for strong interactions it can be ineffective, even if a high number of terms is considered.\nConstraint Programming (CP) has been successfully applied to both constraint satisfaction and constraint optimization problems. A wide variety of specialized global constraints provide critical assistance in achieving a good model that can take advantage of the structure of the problem in the search for a solution. However, a key outstanding issue is the representation of 'ad-hoc' constraints that do not have an inherent combinatorial nature, and hence are not modeled well using narrowly specialized global constraints. We attempt to address this issue by considering a hybrid of search and compilation. Specifically we suggest the use of Reduced Ordered Multi-Valued Decision Diagrams (ROMDDs) as the supporting data structure for a generic global constraint. We give an algorithm for maintaining generalized arc consistency (GAC) on this constraint that amortizes the cost of the GAC computation over a root-to-leaf path in the search tree without requiring asymptotically more space than used for the MDD. Furthermore we present an approach for incrementally maintaining the reduced property of the MDD during the search, and show how this can be used for providing domain entailment detection. Finally we discuss how to apply our approach to other similar data structures such as AOMDDs and Case DAGs. The technique used can be seen as an extension of the GAC algorithm for the regular language constraint on finite length input.\nAxiomatic approach has demonstrated its power in mathematics. The main goal of this preprint is to show that axiomatic methods are also very efficient for computer science. It is possible to apply these methods to many problems in computer science. 
Here the main modes of computer functioning and program execution are described, formalized, and studied in an axiomatic context. The emphasis is on three principal modes: computation, decision, and acceptation. Now the prevalent mode for computers is computation. Problems of artificial intelligence involve decision mode, while communication functions of computer demand accepting mode. The main goal of this preprint is to study properties of these modes and relations between them. These problems are closely related to such fundamental concepts of computer science and technology as computability, decidability, and acceptability. In other words, we are concerned with the question what computers and software systems can do working in this or that mode. Consequently, results of this preprint allow one to achieve higher understanding of computations and in such a way, to find some basic properties of computers and their applications. Classes of algorithms, which model different kinds of computers and software, are compared with respect to their computing, accepting or deciding power. Operations with algorithms and machines are introduced. Examples show how to apply axiomatic results to different classes of algorithms and machines in order to enhance their performance.\nIn this article, we study directed graphs (digraphs) with a coloring constraint due to Von Neumann and related to Nim-type games. This is equivalent to the notion of kernels of digraphs, which appears in numerous fields of research such as game theory, complexity theory, artificial intelligence (default logic, argumentation in multi-agent systems), 0-1 laws in monadic second order logic, combinatorics (perfect graphs)... Kernels of digraphs lead to numerous difficult questions (in the sense of NP-completeness, #P-completeness). 
However, we show here that it is possible to use a generating function approach to get new information: we use techniques of symbolic and analytic combinatorics (generating functions and their singularities) in order to get exact and asymptotic results, e.g. for the existence of a kernel in a circuit or in a unicircuit digraph. This is a first step toward a generatingfunctionology treatment of kernels, while using, e.g., an approach \"a la Wright\". Our method could be applied to more general \"local coloring constraints\" in decomposable combinatorial structures.\nDiscrete temporal transitions occur in a variety of domains, but this work is mainly motivated by applications in molecular biology: explaining and analyzing observed transcriptome and proteome time series by literature and database knowledge. The starting point of a formal concept analysis model is presented. The objects of a formal context are states of the interesting entities, and the attributes are the variable properties defining the current state (e.g. observed presence or absence of proteins). Temporal transitions assign a relation to the objects, defined by deterministic or non-deterministic transition rules between sets of pre- and postconditions. This relation can be generalized to its transitive closure, i.e. states are related if one results from the other by a transition sequence of arbitrary length. The focus of the work is the adaptation of the attribute exploration algorithm to such a relational context, so that questions concerning temporal dependencies can be asked during the exploration process and be answered from the computed stem base. Results are given for the abstract example of a game and a small gene regulatory network relevant to a biomedical question.\nWe propose a new class of quantum computing algorithms which generalize many standard ones. The goal of our algorithms is to estimate probability distributions. 
Such estimates are useful in, for example, applications of Decision Theory and Artificial Intelligence, where inferences are made based on uncertain knowledge. The class of algorithms that we propose is based on a construction method that generalizes a Fredkin-Toffoli (F-T) construction method used in the field of classical reversible computing. F-T showed how, given any binary deterministic circuit, one can construct another binary deterministic circuit which does the same calculations in a reversible manner. We show how, given any classical stochastic network (classical Bayesian net), one can construct a quantum network (quantum Bayesian net). By running this quantum Bayesian net on a quantum computer, one can calculate any conditional probability that one would be interested in calculating for the original classical Bayesian net. Thus, we generalize the F-T construction method so that it can be applied to any classical stochastic circuit, not just binary deterministic ones. We also show that, in certain situations, our class of algorithms can be combined with Grover's algorithm to great advantage.\nWe present some results from simulation of a network of nodes connected by c-NOT gates with nearest neighbors. Though initially we begin with pure states of varying boundary conditions, the updating with time quickly involves a complicated entanglement involving all or most nodes. As a normal c-NOT gate, though unitary for a single pair of nodes, seems to be not so when used in a network in a naive way, we use a manifestly unitary form of the transition matrix with c?-NOT gates, which invert the phase as well as flipping the qubit. This leads to complete entanglement of the net, but with variable coefficients for the different components of the superposition. It is interesting to note that by a simple logical back projection the original input state can be recovered in most cases. 
We also prove that it is not possible for a sequence of unitary operators working on a net to make it move from an aperiodic regime to a periodic one, unlike some classical cases where phase-locking happens in course of evolution. However, we show that it is possible to introduce by hand periodic orbits to sets of initial states, which may be useful in forming dynamic pattern recognition systems.\nGraphical models of probabilistic dependencies have been extensively investigated in the context of classical uncertainty. However, in some domains (most notably, in computational physics and quantum computing) the nature of the relevant uncertainty is non-classical, and the laws of classical probability theory are superseded by those of quantum mechanics. In this paper we introduce Markovian Entanglement Networks (MEN), a novel class of graphical representations of quantum-mechanical dependencies in the context of such non-classical systems. MEN are the quantum-mechanical analogue of Markovian Networks, a family of undirected graphical representations which, in the classical domain, exploit a notion of conditional independence among subsystems.   After defining a notion of conditional independence appropriate to our domain (conditional separability), we prove that the conditional separabilities induced by a quantum-mechanical wave function are effectively reflected in the graphical structure of MEN. Specifically, we show that for any wave function there exists a MEN which is a perfect map of its conditional separabilities. Next, we show how the graphical structure of MEN can be used to effectively classify the pure states of three-qubit systems. We also demonstrate that, in large systems, exploiting conditional independencies may dramatically reduce the computational burden of various inference tasks. 
In principle, the graph-theoretic representation of conditional independencies afforded by MEN may not only facilitate the classical simulation of quantum systems, but also provide a guide to the efficient design and complexity analysis of quantum algorithms and circuits.\nMotivation: Profile hidden Markov Models (pHMMs) are a popular and very useful tool in the detection of the remote homologue protein families. Unfortunately, their performance is not always satisfactory when proteins are in the 'twilight zone'. We present HMMER-STRUCT, a model construction algorithm and tool that tries to improve pHMM performance by using structural information while training pHMMs. As a first step, HMMER-STRUCT constructs a set of pHMMs. Each pHMM is constructed by weighting each residue in an aligned protein according to a specific structural property of the residue. Properties used were primary, secondary and tertiary structures, accessibility and packing. HMMER-STRUCT then prioritizes the results by voting. Results: We used the SCOP database to perform our experiments. Throughout, we apply leave-one-family-out cross-validation over protein superfamilies. First, we used the MAMMOTH-mult structural aligner to align the training set proteins. Then, we performed two sets of experiments. In a first experiment, we compared structure weighted models against standard pHMMs and against each other. In a second experiment, we compared the voting model against individual pHMMs. We compare method performance through ROC curves and through Precision/Recall curves, and assess significance through the paired two tailed t-test. Our results show significant performance improvements of all structurally weighted models over default HMMER, and a significant improvement in sensitivity of the combined models over both the original model and the structurally weighted models.\nOne way of getting a better view of data is using frequent patterns. 
In this paper frequent patterns are subsets that occur a minimal number of times in a stream of itemsets. However, the discovery of frequent patterns in streams has always been problematic. Because streams are potentially endless it is in principle impossible to say if a pattern is often occurring or not. Furthermore the number of patterns can be huge and a good overview of the structure of the stream is lost quickly. The proposed approach will use clustering to facilitate the analysis of the structure of the stream.   A clustering on the co-occurrence of patterns will give the user an improved view on the structure of the stream. Some patterns might occur so much together that they should form a combined pattern. In this way the patterns in the clustering will be the largest frequent patterns: maximal frequent patterns.   Our approach to decide if patterns occur often together will be based on a method of clustering when only the distance between pairs is known. The number of maximal frequent patterns is much smaller and combined with clustering methods these patterns provide a good view on the structure of the stream.\nMining frequent subgraphs is an area of research where we have a given set of graphs (each graph can be seen as a transaction), and we search for (connected) subgraphs contained in many of these graphs. In this work we will discuss techniques used in our framework Lattice2SAR for mining and analysing frequent subgraph data and their corresponding lattice information. Lattice information is provided by the graph mining algorithm gSpan; it contains all supergraph-subgraph relations of the frequent subgraph patterns -- and their supports.   Lattice2SAR is in particular used in the analysis of frequent graph patterns where the graphs are molecules and the frequent subgraphs are fragments. In the analysis of fragments one is interested in the molecules where patterns occur. 
This data can be very extensive and in this paper we focus on a technique of making it better available by using the lattice information in our clustering. Now we can reduce the number of times the highly compressed occurrence data needs to be accessed by the user. The user does not have to browse all the occurrence data in search of patterns occurring in the same molecules. Instead one can directly see which frequent subgraphs are of interest.\nThe SEMATECH sponsored J-88-E project teaming Texas Instruments with NeuroDyne (et al.) focused on Fault Detection and Classification (FDC) on a Lam 9600 aluminum plasma etch reactor, used in the process of semiconductor fabrication. Fault classification was accomplished by implementing a series of virtual sensor models which used data from real sensors (Lam Station sensors, Optical Emission Spectroscopy, and RF Monitoring) to predict recipe setpoints and wafer state characteristics. Fault detection and classification were performed by comparing predicted recipe and wafer state values with expected values. Models utilized include linear PLS, Polynomial PLS, and Neural Network PLS. Prediction of recipe setpoints based upon sensor data provides a capability for cross-checking that the machine is maintaining the desired setpoints. Wafer state characteristics such as Line Width Reduction and Remaining Oxide were estimated on-line using these same process sensors (Lam, OES, RFM). Wafer-to-wafer measurement of these characteristics in a production setting (where typically this information may be only sparsely available, if at all, after batch processing runs with numerous wafers have been completed) would provide important information to the operator that the process is or is not producing wafers within acceptable bounds of product quality. 
Production yield is increased, and correspondingly per-unit cost is reduced, by providing the operator with the opportunity to adjust the process or machine before etching more wafers.\nHaplotype Inference is a challenging problem in bioinformatics that consists of inferring the basic genetic constitution of diploid organisms on the basis of their genotype. This information allows researchers to perform association studies for the genetic variants involved in diseases and the individual responses to therapeutic agents.   A notable approach to the problem is to encode it as a combinatorial problem (under certain hypotheses, such as the pure parsimony criterion) and to solve it using off-the-shelf combinatorial optimization techniques. The main methods applied to Haplotype Inference are either simple greedy heuristics or exact methods (Integer Linear Programming, Semidefinite Programming, SAT encoding) that, at present, are adequate only for moderate-size instances.   We believe that metaheuristic and hybrid approaches could provide better scalability. Moreover, metaheuristics can be very easily combined with problem-specific heuristics and they can also be integrated with tree-based search techniques, thus providing a promising framework for hybrid systems in which a good trade-off between effectiveness and efficiency can be reached.   In this paper we illustrate a feasibility study of the approach and discuss some relevant design issues, such as modeling and design of approximate solvers that combine constructive heuristics, local search-based improvement strategies and learning mechanisms.
Besides the relevance of the Haplotype Inference problem itself, this preliminary analysis is also an interesting case study because the formulation of the problem poses some challenges in modeling and hybrid metaheuristic solver design that can be generalized to other problems.\nMany systems can be described in terms of networks of discrete elements and their various relationships to one another. A semantic network, or multi-relational network, is a directed labeled graph consisting of a heterogeneous set of entities connected by a heterogeneous set of relationships. Semantic networks serve as a promising general-purpose modeling substrate for complex systems. Various standardized formats and tools are now available to support practical, large-scale semantic network models. First, the Resource Description Framework (RDF) offers a standardized semantic network data model that can be further formalized by ontology modeling languages such as RDF Schema (RDFS) and the Web Ontology Language (OWL). Second, the recent introduction of highly performant triple-stores (i.e. semantic network databases) allows semantic network models on the order of $10^9$ edges to be efficiently stored and manipulated. RDF and its related technologies are currently used extensively in the domains of computer science, digital library science, and the biological sciences. This article will provide an introduction to RDF/RDFS/OWL and an examination of its suitability to model discrete element complex systems.\nIn this paper we present efficient evaluation algorithms for the Horn Transaction Logic (a generalization of the regular Horn logic programs with state updates). We present two complementary methods for optimizing the implementation of Transaction Logic. The first method is based on tabling and we modified the proof theory to table calls and answers on states (practically, equivalent to dynamic programming). 
The call-answer table is indexed on the call and a signature of the state in which the call was made. The answer columns contain the answer unification and a signature of the state after the call was executed. The states are signed efficiently using a technique based on tries and counting. The second method is based on incremental evaluation and it applies when the data oracle contains derived relations. The deletions and insertions (executed in the transaction oracle) change the state of the database. Using the heuristic of inertia (only a part of the state changes in response to elementary updates), most of the time it is cheaper to compute only the changes in the state than to recompute the entire state from scratch. The two methods are complementary in that the first method optimizes the evaluation when a call is repeated in the same state, and the second method optimizes the evaluation of a new state when a call-state pair is not found by the tabling mechanism (i.e. the first method). The proof theory of Transaction Logic with the application of tabling and incremental evaluation is sound and complete with respect to its model theory.\nA growing number of indicators are now being used with some confidence to measure the metallicity (Z) of photoionisation regions in planetary nebulae, galactic HII regions (GHIIRs), extra-galactic HII regions (EGHIIRs) and HII galaxies (HIIGs). However, a universal indicator valid also at high metallicities has yet to be found. Here, we report on a new artificial intelligence-based approach to determine metallicity indicators that shows promise for the provision of improved empirical fits. The method hinges on the application of an evolutionary neural network to observational emission line data. The network's DNA, encoded in its architecture, weights and neuron transfer functions, is evolved using a genetic algorithm.
Furthermore, selection, operating on a set of 10 distinct neuron transfer functions, means that the empirical relation encoded in the network solution architecture is in functional rather than numerical form. Thus the network solutions provide an equation for the metallicity in terms of line ratios without a priori assumptions. Tapping into the mathematical power offered by this approach, we applied the network to detailed observations of both nebular and auroral emission lines in the optical for a sample of 96 HII-type regions and we were able to obtain an empirical relation between Z and S23 with a dispersion of only 0.16 dex. We show how the method can be used to identify new diagnostics as well as the nonlinear relationship supposed to exist between the metallicity Z, ionisation parameter U and effective (or equivalent) temperature T*.\nIn this paper we study cellular automata (CAs) that perform the computational Majority task. This task is a good example of the phenomenon of emergence in complex systems. We take an interest in the reasons that make this particular fitness landscape a difficult one. The first goal is to study the landscape as such, and thus it is ideally independent of the actual heuristics used to search the space. However, a second goal is to understand the features a good search technique for this particular problem space should possess. We statistically quantify in various ways the degree of difficulty of searching this landscape. Due to neutrality, investigations based on sampling techniques on the whole landscape are difficult to conduct. We therefore explore the landscape from the top. Although it has been proved that no CA can perform the task perfectly, several efficient CAs for this task have been found. Exploiting similarities between these CAs and symmetries in the landscape, we define the Olympus landscape which is regarded as the ''heavenly home'' of the best local optima known (blok).
Then we measure several properties of this subspace. Although it is easier to find relevant CAs in this subspace than in the overall landscape, there are structural reasons that prevent a searcher from finding overfitted CAs in the Olympus. Finally, we study dynamics and performance of genetic algorithms on the Olympus in order to confirm our analysis and to find efficient CAs for the Majority problem with low computational cost.\nWe develop a general framework for MAP estimation in discrete and Gaussian graphical models using Lagrangian relaxation techniques. The key idea is to reformulate an intractable estimation problem as one defined on a more tractable graph, but subject to additional constraints. Relaxing these constraints gives a tractable dual problem, one defined by a thin graph, which is then optimized by an iterative procedure. When this iterative optimization leads to a consistent estimate, one which also satisfies the constraints, then it corresponds to an optimal MAP estimate of the original model. Otherwise there is a ``duality gap'', and we obtain a bound on the optimal solution. Thus, our approach combines convex optimization with dynamic programming techniques applicable for thin graphs. The popular tree-reweighted max-product (TRMP) method may be seen as solving a particular class of such relaxations, where the intractable graph is relaxed to a set of spanning trees. We also consider relaxations to a set of small induced subgraphs, thin subgraphs (e.g. loops), and a connected tree obtained by ``unwinding'' cycles. In addition, we propose a new class of multiscale relaxations that introduce ``summary'' variables. 
The potential benefits of such generalizations include: reducing or eliminating the ``duality gap'' in hard problems, reducing the number of Lagrange multipliers in the dual problem, and accelerating convergence of the iterative optimization procedure.\nElectrical Impedance Tomography (EIT) is a functional imaging method that is being developed for bedside use in critical care medicine. Aiming at improving the chest anatomical resolution of EIT images, we developed a fuzzy model based on EIT high temporal resolution and the functional information contained in the pulmonary perfusion and ventilation signals. EIT data from an experimental animal model were collected during normal ventilation and apnea while an injection of hypertonic saline was used as a reference. The fuzzy model was elaborated in three parts: a modeling of the heart, a pulmonary map from ventilation images, and a pulmonary map from perfusion images. Image segmentation was performed using a threshold method and a ventilation/perfusion map was generated. EIT images treated by the fuzzy model were compared with the hypertonic saline injection method and CT-scan images, presenting good results from both qualitative (the image obtained by the model was very similar to that of the CT-scan) and quantitative (the ROC curve provided an area equal to 0.93) points of view. Undoubtedly, these results represent an important step in the EIT images area, since they open the possibility of developing EIT-based bedside clinical methods, which are not available nowadays. These achievements could serve as the base to develop EIT diagnosis systems for some life-threatening diseases commonly found in critical care medicine.\nNear optimal decoding of good error control codes is generally a difficult task. However, for a certain type of (sufficiently) good codes an efficient decoding algorithm with near optimal performance exists.
These codes are defined via a combination of constituent codes with low-complexity trellis representations. Their decoding algorithm is an instance of (loopy) belief propagation and is based on an iterative transfer of constituent beliefs. The beliefs are thereby given by the symbol probabilities computed in the constituent trellises. Even though weak constituent codes are employed, close to optimal performance is obtained, i.e., the encoder/decoder pair (almost) achieves the information-theoretic capacity. However, (loopy) belief propagation only performs well for a rather specific set of codes, which limits its applicability.   In this paper a generalisation of iterative decoding is presented. It is proposed to transfer more values than just the constituent beliefs. This is achieved by the transfer of beliefs obtained by independently investigating parts of the code space. This leads to the concept of discriminators, which are used to improve the decoder resolution within certain areas and define discriminated symbol beliefs. It is shown that these beliefs approximate the overall symbol probabilities. This leads to an iteration rule that (below channel capacity) typically only admits the solution of the overall decoding problem. Via a Gauss approximation a low-complexity version of this algorithm is derived. Moreover, the approach may then be applied to a wide range of channel maps without significant complexity increase.\nWe consider the discrete-time infinite-horizon optimal control problem formalized by Markov Decision Processes. We revisit the work of Bertsekas and Ioffe, who introduced $\\lambda$ Policy Iteration, a family of algorithms parameterized by $\\lambda$ that generalizes the standard algorithms Value Iteration and Policy Iteration, and has some deep connections with the Temporal Differences algorithm TD($\\lambda$) described by Sutton and Barto.
We deepen the original theory developed by the authors by providing convergence rate bounds which generalize standard bounds for Value Iteration described for instance by Puterman. Then, the main contribution of this paper is to develop the theory of this algorithm when it is used in an approximate form and show that this is sound. Doing so, we extend and unify the separate analyses developed by Munos for Approximate Value Iteration and Approximate Policy Iteration. Finally, we revisit the use of this algorithm in the training of a Tetris-playing controller as originally done by Bertsekas and Ioffe. We provide an original performance bound that can be applied to such an undiscounted control problem. Our empirical results are different from those of Bertsekas and Ioffe (which were originally qualified as \"paradoxical\" and \"intriguing\"), and conform much more closely to what one would expect from a learning experiment. We discuss the possible reason for such a difference.\nThe pace of progress in the fields of Evolutionary Computation and Machine Learning is currently limited -- in the former field, by the improbability of making advantageous extensions to evolutionary algorithms when their capacity for adaptation is poorly understood, and in the latter by the difficulty of finding effective semi-principled reductions of hard real-world problems to relatively simple optimization problems. In this paper we explain why a theory which can accurately explain the simple genetic algorithm's remarkable capacity for adaptation has the potential to address both these limitations. We describe what we believe to be the impediments -- historic and analytic -- to the discovery of such a theory and highlight the negative role that the building block hypothesis (BBH) has played.
We argue based on experimental results that a fundamental limitation which is widely believed to constrain the SGA's adaptive ability (and is strongly implied by the BBH) is in fact illusory and does not exist. The SGA therefore turns out to be more powerful than it is currently thought to be. We give conditions under which it becomes feasible to numerically approximate and study the multivariate marginals of the search distribution of an infinite-population SGA over multiple generations even when its genomes are long, and explain why this analysis is relevant to the riddle of the SGA's remarkable adaptive abilities.\nIn this paper, we present a Mirroring Neural Network architecture to perform non-linear dimensionality reduction and Object Recognition using a reduced low-dimensional characteristic vector. In addition to dimensionality reduction, the network also reconstructs (mirrors) the original high-dimensional input vector from the reduced low-dimensional data. The Mirroring Neural Network architecture has more processing elements (adalines) in the outer layers and the fewest elements in the central layer, forming a converging-diverging shape in its configuration. Since this network is able to reconstruct the original image from the output of the innermost layer (which contains all the information about the input pattern), these outputs can be used as an object signature to classify patterns. The network is trained to minimize the discrepancy between actual output and the input by backpropagating the mean squared error from the output layer to the input layer. Once trained, the network can reduce the dimension of input vectors and mirror the patterns fed to it. The Mirroring Neural Network architecture gave very good results on various test patterns.\nThis paper proposes an unsupervised learning technique by using Multi-layer Mirroring Neural Network and Forgy's clustering algorithm.
Multi-layer Mirroring Neural Network is a neural network that can be trained with generalized data inputs (different categories of image patterns) to perform non-linear dimensionality reduction, and the resultant low-dimensional code is used for unsupervised pattern classification using Forgy's algorithm. By adapting the non-linear activation function (modified sigmoidal function) and initializing the weights and bias terms to small random values, mirroring of the input pattern is initiated. In training, the weights and bias terms are changed in such a way that the input presented is reproduced at the output by backpropagating the error. The mirroring neural network is capable of reducing the input vector to a great degree (approximately 1/30th the original size) and is also able to reconstruct the input pattern at the output layer from these reduced code units. The feature set (output of the central hidden layer) extracted from this network is fed to Forgy's algorithm, which classifies input data patterns into distinguishable classes. In the implementation of Forgy's algorithm, initial seed points are selected in such a way that they are distant enough to be perfectly grouped into different categories. Thus a new method of unsupervised learning is formulated and demonstrated in this paper. This method gave impressive results when applied to classification of different image patterns.\nLogic programming under the answer-set semantics nowadays deals with numerous different notions of program equivalence. This is due to the fact that equivalence for substitution (known as strong equivalence) and ordinary equivalence are different concepts. The former holds, given programs P and Q, iff P can be faithfully replaced by Q within any context R, while the latter holds iff P and Q provide the same output, that is, they have the same answer sets.
Notions in between strong and ordinary equivalence have been introduced as theoretical tools to compare incomplete programs and are defined by either restricting the syntactic structure of the considered context programs R or by bounding the set A of atoms allowed to occur in R (relativized equivalence). For the latter approach, different A yield properly different equivalence notions, in general. For the former approach, however, it turned out that any ``reasonable'' syntactic restriction to R coincides with either ordinary, strong, or uniform equivalence. In this paper, we propose a parameterization for equivalence notions which takes care of both such kinds of restrictions simultaneously by bounding, on the one hand, the atoms which are allowed to occur in the rule heads of the context and, on the other hand, the atoms which are allowed to occur in the rule bodies of the context. We introduce a general semantical characterization which includes known ones such as SE-models (for strong equivalence) or UE-models (for uniform equivalence) as special cases. Moreover, we provide complexity bounds for the problem in question and sketch a possible implementation method.   To appear in Theory and Practice of Logic Programming (TPLP).\nComputability logic (CL) (see http://www.cis.upenn.edu/~giorgi/cl.html) is a semantical platform and research program for redeveloping logic as a formal theory of computability, as opposed to the formal theory of truth which it has more traditionally been. Formulas in CL stand for (interactive) computational problems, understood as games between a machine and its environment; logical operators represent operations on such entities; and \"truth\" is understood as existence of an effective solution, i.e., of an algorithmic winning strategy.   The formalism of CL is open-ended, and may undergo a series of extensions as the study of the subject advances.
The main groups of operators on which CL has been focused so far are the parallel, choice, branching, and blind operators. The present paper introduces a new important group of operators, called sequential. The latter come in the form of sequential conjunction and disjunction, sequential quantifiers, and sequential recurrences. As the name may suggest, the algorithmic intuitions associated with this group are those of sequential computations, as opposed to the intuitions of parallel computations associated with the parallel group of operations: playing a sequential combination of games means playing its components in a sequential fashion, one after another.   The main technical result of the present paper is a sound and complete axiomatization of the propositional fragment of computability logic whose vocabulary, together with negation, includes all three -- parallel, choice and sequential -- sorts of conjunction and disjunction. An extension of this result to the first-order level is also outlined.\nMany problems that arise in the machine learning domain deal with nonlinearity and quite often require users to obtain globally optimal solutions rather than locally optimal ones. Optimization problems are inherent in machine learning algorithms and hence many methods in machine learning were inherited from the optimization literature. In what is popularly known as the initialization problem, the ideal set of parameters found depends significantly on the given initialization values. The recently developed TRUST-TECH (TRansformation Under STability-reTaining Equilibria CHaracterization) methodology systematically explores the subspace of the parameters to obtain a complete set of locally optimal solutions. In this thesis work, we propose TRUST-TECH-based methods for solving several optimization and machine learning problems.
Two stages, namely the local stage and the neighborhood-search stage, are repeated alternately in the solution space to achieve improvements in the quality of the solutions. Our methods were tested on both synthetic and real datasets and the advantages of using this novel framework are clearly manifested. This framework not only reduces the sensitivity to initialization, but also gives practitioners the flexibility to use various global and local methods that work well for a particular problem of interest. Other hierarchical stochastic algorithms like evolutionary algorithms and smoothing algorithms are also studied and frameworks for combining these methods with TRUST-TECH have been proposed and evaluated on several test systems.\nMost definitions of ontology, viewed as a \"specification of a conceptualization\", agree on the fact that if an ontology can take different forms, it necessarily includes a vocabulary of terms and some specification of their meaning in relation to the domain's conceptualization. And as domain knowledge is mainly conveyed through scientific and technical texts, we can hope to extract some useful information from them for building ontology. But is it as simple as this? In this article we shall see that the lexical structure, i.e. the network of words linked by linguistic relationships, does not necessarily match the domain conceptualization. We have to bear in mind that writing documents is the concern of textual linguistics, of which one of the principles is the incompleteness of text, whereas building ontology - viewed as task-independent knowledge - is concerned with conceptualization based on formal and not natural languages. Nevertheless, the famous Sapir and Whorf hypothesis, concerning the interdependence of thought and language, is also applicable to formal languages.
This means that the way an ontology is built and a concept is defined depends directly on the formal language which is used; and the results will not be the same. The introduction of the notion of ontoterminology makes it possible to take into account epistemological principles for formal ontology building.\nDisjunctive Logic Programming (DLP) is a very expressive formalism: it allows for expressing every property of finite structures that is decidable in the complexity class SigmaP2 (= NP^NP). Despite this high expressiveness, there are some simple properties, often arising in real-world applications, which cannot be encoded in a simple and natural manner. Especially properties that require the use of arithmetic operators (like sum, times, or count) on a set or multiset of elements, which satisfy some conditions, cannot be naturally expressed in classic DLP.   To overcome this deficiency, we extend DLP by aggregate functions in a conservative way. In particular, we avoid the introduction of constructs with disputed semantics, by requiring aggregates to be stratified. We formally define the semantics of the extended language (called DLP^A), and illustrate how it can be profitably used for representing knowledge. Furthermore, we analyze the computational complexity of DLP^A, showing that the addition of aggregates does not bring a higher cost in that respect. Finally, we provide an implementation of DLP^A in DLV -- a state-of-the-art DLP system -- and report on experiments which confirm the usefulness of the proposed extension also for the efficiency of computation.\nThere are two kinds of approaches for termination analysis of logic programs: \"transformational\" and \"direct\" ones. Direct approaches prove termination directly on the basis of the logic program. Transformational approaches transform a logic program into a term rewrite system (TRS) and then analyze termination of the resulting TRS instead.
Thus, transformational approaches make all methods previously developed for TRSs available for logic programs as well. However, the applicability of most existing transformations is quite restricted, as they can only be used for certain subclasses of logic programs. (Most of them are restricted to well-moded programs.) In this paper we improve these transformations such that they become applicable for any definite logic program. To simulate the behavior of logic programs by TRSs, we slightly modify the notion of rewriting by permitting infinite terms. We show that our transformation results in TRSs which are indeed suitable for automated termination analysis. In contrast to most other methods for termination of logic programs, our technique is also sound for logic programming without occur check, which is typically used in practice. We implemented our approach in the termination prover AProVE and successfully evaluated it on a large collection of examples.\nWe provide deterministic, polynomial-time computable voting rules that approximate Dodgson's and (the ``minimization version'' of) Young's scoring rules to within a logarithmic factor. Our approximation of Dodgson's rule is tight up to a constant factor, as Dodgson's rule is $\\NP$-hard to approximate to within some logarithmic factor. The ``maximization version'' of Young's rule is known to be $\\NP$-hard to approximate by any constant factor. Both approximations are simple, and natural as rules in their own right: Given a candidate we wish to score, we can regard either its Dodgson or Young score as the edit distance between a given set of voter preferences and one in which the candidate to be scored is the Condorcet winner. (The difference between the two scoring rules is the type of edits allowed.) 
We regard the marginal cost of a sequence of edits to be the number of edits divided by the number of reductions (in the candidate's deficit against any of its opponents in the pairwise race against that opponent) that the edits yield. Over a series of rounds, our scoring rules greedily choose a sequence of edits that modify exactly one voter's preferences and whose marginal cost is no greater than any other such single-vote-modifying sequence.\nThis paper focuses on the problem of kernelizing an existing supervised Mahalanobis distance learner. The following features are included in the paper. Firstly, three popular learners, namely, \"neighborhood component analysis\", \"large margin nearest neighbors\" and \"discriminant neighborhood embedding\", which do not have kernel versions are kernelized in order to improve their classification performances. Secondly, an alternative kernelization framework called \"KPCA trick\" is presented. Implementing a learner in the new framework gains several advantages over the standard framework, e.g. no mathematical formulas and no reprogramming are required for a kernel implementation, the framework avoids troublesome problems such as singularity, etc. Thirdly, while the truths of representer theorems are just assumptions in previous papers related to ours, here, representer theorems are formally proven. The proofs validate both the kernel trick and the KPCA trick in the context of Mahalanobis distance learning. Fourthly, unlike previous works which always apply brute force methods to select a kernel, we investigate two approaches which can be efficiently adopted to construct an appropriate kernel for a given dataset. Finally, numerical results on various real-world datasets are presented.\nThis paper explores the links between Knowledge Management and new community-based models of the organization from both a theoretical and an empirical perspective. 
From a theoretical standpoint, we look at Communities of Practice (CoPs) and Knowledge Management (KM) and explore the links between the two as they relate to the use of information systems to manage knowledge. We begin by reviewing technologically supported approaches to KM and introduce the idea of \"Systemes d'Aide a la Gestion des Connaissances\" SAGC (Systems to aid the Management of Knowledge). Following this we examine the contribution that communal structures such as CoPs can make to intraorganizational KM and highlight some of 'success factors' for this approach to KM that are found in the literature. From an empirical standpoint, we present the results of a survey involving the Chief Knowledge Officers (CKOs) of twelve large French businesses; the objective of this study was to identify the factors that might influence the success of such approaches. The survey was analysed using thematic content analysis and the results are presented here with some short illustrative quotes from the CKOs. Finally, the paper concludes with some brief reflections on what can be learnt from looking at this problem from these two perspectives.\nKnowledge could be gained from experts, specialists in the area of interest, or it can be gained by induction from sets of data. Automatic induction of knowledge from data sets, usually stored in large databases, is called data mining. Data mining methods are important in the management of complex systems. There are many technologies available to data mining practitioners, including Artificial Neural Networks, Regression, and Decision Trees. Neural networks have been successfully applied in wide range of supervised and unsupervised learning applications. Neural network methods are not commonly used for data mining tasks, because they often produce incomprehensible models, and require long training times. 
One way in which the collective properties of a neural network may be used to implement a computational task is by way of the concept of energy minimization. The Hopfield network is well-known example of such an approach. The Hopfield network is useful as content addressable memory or an analog computer for solving combinatorial-type optimization problems. Wan Abdullah [1] proposed a method of doing logic programming on a Hopfield neural network. Optimization of logical inconsistency is carried out by the network after the connection strengths are defined from the logic program; the network relaxes to neural states corresponding to a valid interpretation. In this article, we describe how Hopfield network is able to induce logical rules from large database by using reverse analysis method: given the values of the connections of a network, we can hope to know what logical rules are entrenched in the database.\nComputability logic (CL) (see http://www.cis.upenn.edu/~giorgi/cl.html) is a recently launched program for redeveloping logic as a formal theory of computability, as opposed to the formal theory of truth that logic has more traditionally been. Formulas in it represent computational problems, \"truth\" means existence of an algorithmic solution, and proofs encode such solutions. Within the line of research devoted to finding axiomatizations for ever more expressive fragments of CL, the present paper introduces a new deductive system CL12 and proves its soundness and completeness with respect to the semantics of CL. Conservatively extending classical predicate calculus and offering considerable additional expressive and deductive power, CL12 presents a reasonable, computationally meaningful, constructive alternative to classical logic as a basis for applied theories. To obtain a model example of such theories, this paper rebuilds the traditional, classical-logic-based Peano arithmetic into a computability-logic-based counterpart. 
Among the purposes of the present contribution is to provide a starting point for what, as the author wishes to hope, might become a new line of research with a potential of interesting findings -- an exploration of the presumably quite unusual metatheory of CL-based arithmetic and other CL-based applied systems.\nMany social Web sites allow users to publish content and annotate with descriptive metadata. In addition to flat tags, some social Web sites have recently began to allow users to organize their content and metadata hierarchically. The social photosharing site Flickr, for example, allows users to group related photos in sets, and related sets in collections. The social bookmarking site Del.icio.us similarly lets users group related tags into bundles. Although the sites themselves don't impose any constraints on how these hierarchies are used, individuals generally use them to capture relationships between concepts, most commonly the broader/narrower relations. Collective annotation of content with hierarchical relations may lead to an emergent classification system, called a folksonomy. While some researchers have explored using tags as evidence for learning folksonomies, we believe that hierarchical relations described above offer a high-quality source of evidence for this task.   We propose a simple approach to aggregate shallow hierarchies created by many distinct Flickr users into a common folksonomy. Our approach uses statistics to determine if a particular relation should be retained or discarded. The relations are then woven together into larger hierarchies. Although we have not carried out a detailed quantitative evaluation of the approach, it looks very promising since it generates very reasonable, non-trivial hierarchies.\nIn the last year more than 70,000 people have been brought to the UK hospitals with serious injuries. 
Each time a clinician has to urgently take a patient through a screening procedure to make a reliable decision on the trauma treatment. Typically, such procedure comprises around 20 tests; however the condition of a trauma patient remains very difficult to be tested properly. What happens if these tests are ambiguously interpreted, and information about the severity of the injury will come misleading? The mistake in a decision can be fatal: using a mild treatment can put a patient at risk of dying from posttraumatic shock, while using an overtreatment can also cause death. How can we reduce the risk of the death caused by unreliable decisions? It has been shown that probabilistic reasoning, based on the Bayesian methodology of averaging over decision models, allows clinicians to evaluate the uncertainty in decision making. Based on this methodology, in this paper we aim at selecting the most important screening tests, keeping a high performance. We assume that the probabilistic reasoning within the Bayesian methodology allows us to discover new relationships between the screening tests and uncertainty in decisions. In practice, selection of the most informative tests can also reduce the cost of a screening procedure in trauma care centers. In our experiments we use the UK Trauma data to compare the efficiency of the proposed technique in terms of the performance. We also compare the uncertainty in decisions in terms of entropy.\nThe way a rational agent changes her belief in certain propositions/hypotheses in the light of new evidence lies at the heart of Bayesian inference. The basic natural assumption, as summarized in van Fraassen's Reflection Principle ([1984]), would be that in the absence of new evidence the belief should not change. Yet, there are examples that are claimed to violate this assumption. 
The apparent paradox presented by such examples, if not settled, would demonstrate the inconsistency and/or incompleteness of the Bayesian approach and without eliminating this inconsistency, the approach cannot be regarded as scientific.   The Sleeping Beauty Problem is just such an example. The existing attempts to solve the problem fall into three categories. The first two share the view that new evidence is absent, but differ about the conclusion of whether Sleeping Beauty should change her belief or not, and why. The third category is characterized by the view that, after all, new evidence (although hidden from the initial view) is involved.   My solution is radically different and does not fall in either of these categories. I deflate the paradox by arguing that the two different degrees of belief presented in the Sleeping Beauty Problem are in fact beliefs in two different propositions, i.e. there is no need to explain the (un)change of belief.\nIt is generally accepted that human vision is an extremely powerful information processing system that facilitates our interaction with the surrounding world. However, despite extended and extensive research efforts, which encompass many exploration fields, the underlying fundamentals and operational principles of visual information processing in human brain remain unknown. We still are unable to figure out where and how along the path from eyes to the cortex the sensory input perceived by the retina is converted into a meaningful object representation, which can be consciously manipulated by the brain. Studying the vast literature considering the various aspects of brain information processing, I was surprised to learn that the respected scholarly discussion is totally indifferent to the basic keynote question: \"What is information?\" in general or \"What is visual information?\" in particular. In the old days, it was assumed that any scientific research approach has first to define its basic departure points. 
Why was it overlooked in brain information processing research remains a conundrum. In this paper, I am trying to find a remedy for this bizarre situation. I propose an uncommon definition of \"information\", which can be derived from Kolmogorov's Complexity Theory and Chaitin's notion of Algorithmic Information. Embracing this new definition leads to an inevitable revision of traditional dogmas that shape the state of the art of brain information processing research. I hope this revision would better serve the challenging goal of human visual information processing modeling.\nWe investigate the use of message-passing algorithms for the problem of finding the max-weight independent set (MWIS) in a graph. First, we study the performance of the classical loopy max-product belief propagation. We show that each fixed point estimate of max-product can be mapped in a natural way to an extreme point of the LP polytope associated with the MWIS problem. However, this extreme point may not be the one that maximizes the value of node weights; the particular extreme point at final convergence depends on the initialization of max-product. We then show that if max-product is started from the natural initialization of uninformative messages, it always solves the correct LP -- if it converges. This result is obtained via a direct analysis of the iterative algorithm, and cannot be obtained by looking only at fixed points.   The tightness of the LP relaxation is thus necessary for max-product optimality, but it is not sufficient. Motivated by this observation, we show that a simple modification of max-product becomes gradient descent on (a convexified version of) the dual of the LP, and converges to the dual optimum. We also develop a message-passing algorithm that recovers the primal MWIS solution from the output of the descent algorithm. We show that the MWIS estimate obtained using these two algorithms in conjunction is correct when the graph is bipartite and the MWIS is unique.  
 Finally, we show that any problem of MAP estimation for probability distributions over finite domains can be reduced to an MWIS problem. We believe this reduction will yield new insights and algorithms for MAP estimation.\nThis project describes the electricity demand and energy consumption management system and its application to Southern Peru smelter. It is composed of an hourly demand-forecasting module and of a simulation component for a plant electrical system. The first module was done using dynamic neural networks with backpropagation training algorithm; it is used to predict the electric power demanded every hour, with an error percentage below of 1%. This information allows efficient management of energy peak demands before this happen, distributing the raise of electric load to other hours or improving those equipments that increase the demand. The simulation module is based in advanced estimation techniques, such as: parametric estimation, neural network modeling, statistic regression and previously developed models, which simulates the electric behavior of the smelter plant. These modules facilitate electricity demand and consumption proper planning, because they allow knowing the behavior of the hourly demand and the consumption patterns of the plant, including the bill components, but also energy deficiencies and opportunities for improvement, based on analysis of information about equipments, processes and production plans, as well as maintenance programs. Finally the results of its application in Southern Peru smelter are presented.\nThe notions of hypertree width and generalized hypertree width were introduced by Gottlob, Leone, and Scarcello in order to extend the concept of hypergraph acyclicity. These notions were further generalized by Grohe and Marx, who introduced the fractional hypertree width of a hypergraph. 
All these width parameters on hypergraphs are useful for extending tractability of many problems in database theory and artificial intelligence. In this paper, we study the approximability of (generalized, fractional) hyper treewidth of sparse hypergraphs where the criterion of sparsity reflects the sparsity of their incidence graphs. Our first step is to prove that the (generalized, fractional) hypertree width of a hypergraph H is constant-factor sandwiched by the treewidth of its incidence graph, when the incidence graph belongs to some apex-minor-free graph class. This determines the combinatorial borderline above which the notion of (generalized, fractional) hypertree width becomes essentially more general than treewidth, justifying that way its functionality as a hypergraph acyclicity measure. While for more general sparse families of hypergraphs treewidth of incidence graphs and all hypertree width parameters may differ arbitrarily, there are sparse families where a constant factor approximation algorithm is possible. In particular, we give a constant factor approximation polynomial time algorithm for (generalized, fractional) hypertree width on hypergraphs whose incidence graphs belong to some H-minor-free graph class.\nWikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks.   This article provides a comprehensive description of this work. 
It focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval and information extraction; and as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed in the last few years. We provide a comprehensive list of the open-source software they have produced.\nIn this paper, a Gaifman-Shapiro-style module architecture is tailored to the case of Smodels programs under the stable model semantics. The composition of Smodels program modules is suitably limited by module conditions which ensure the compatibility of the module system with stable models. Hence the semantics of an entire Smodels program depends directly on stable models assigned to its modules. This result is formalized as a module theorem which truly strengthens Lifschitz and Turner's splitting-set theorem for the class of Smodels programs. To streamline generalizations in the future, the module theorem is first proved for normal programs and then extended to cover Smodels programs using a translation from the latter class of programs to the former class. Moreover, the respective notion of module-level equivalence, namely modular equivalence, is shown to be a proper congruence relation: it is preserved under substitutions of modules that are modularly equivalent. Principles for program decomposition are also addressed. The strongly connected components of the respective dependency graph can be exploited in order to extract a module structure when there is no explicit a priori knowledge about the modules of a program. 
The paper includes a practical demonstration of tools that have been developed for automated (de)composition of Smodels programs.   To appear in Theory and Practice of Logic Programming.\nMulti-relational networks are used extensively to structure knowledge. Perhaps the most popular instance, due to the widespread adoption of the Semantic Web, is the Resource Description Framework (RDF). One of the primary purposes of a knowledge network is to reason; that is, to alter the topology of the network according to an algorithm that uses the existing topological structure as its input. There exist many such reasoning algorithms. With respect to the Semantic Web, the bivalent, monotonic reasoners of the RDF Schema (RDFS) and the Web Ontology Language (OWL) are the most prevalent. However, nothing prevents other forms of reasoning from existing in the Semantic Web. This article presents a non-bivalent, non-monotonic, evidential logic and reasoner that is an algebraic ring over a multi-relational network equipped with two binary operations that can be composed to execute various forms of inference. Given its multi-relational grounding, it is possible to use the presented evidential framework as another method for structuring knowledge and reasoning in the Semantic Web. The benefits of this framework are that it works with arbitrary, partial, and contradictory knowledge while, at the same time, it supports a tractable approximate reasoning process.\nWe proof a theorem that shows that a collection of experimental data of membership weights of items with respect to a pair of concepts and its conjunction cannot be modeled within a classical measure theoretic weight structure in case the experimental data contain the effect called overextension. 
Since the effect of overextension, analogue to the well-known guppy effect for concept combinations, is abundant in all experiments testing weights of items with respect to pairs of concepts and their conjunctions, our theorem constitutes a no-go theorem for classical measure structure for common data of membership weights of items with respect to concepts and their combinations. We put forward a simple geometric criterion that reveals the non classicality of the membership weight structure and use experimentally measured membership weights estimated by subjects in experiments to illustrate our geometrical criterion. The violation of the classical weight structure is similar to the violation of the well-known Bell inequalities studied in quantum mechanics, and hence suggests that the quantum formalism and hence the modeling by quantum membership weights can accomplish what classical membership weights cannot do.\nInspired by a quantum mechanical formalism to model concepts and their disjunctions and conjunctions, we put forward in this paper a specific hypothesis. Namely that within human thought two superposed layers can be distinguished: (i) a layer given form by an underlying classical deterministic process, incorporating essentially logical thought and its indeterministic version modeled by classical probability theory; (ii) a layer given form under influence of the totality of the surrounding conceptual landscape, where the different concepts figure as individual entities rather than (logical) combinations of others, with measurable quantities such as 'typicality', 'membership', 'representativeness', 'similarity', 'applicability', 'preference' or 'utility' carrying the influences. We call the process in this second layer 'quantum conceptual thought', which is indeterministic in essence, and contains holistic aspects, but is equally well, although very differently, organized than logical thought. 
A substantial part of the 'quantum conceptual thought process' can be modeled by quantum mechanical probabilistic and mathematical structures. We consider examples of three specific domains of research where the effects of the presence of quantum conceptual thought and its deviations from classical logical thought have been noticed and studied, i.e. economics, decision theory, and concept theories and which provide experimental evidence for our hypothesis.\nIn the context of the Semantic Web, several approaches to the combination of ontologies, given in terms of theories of classical first-order logic and rule bases, have been proposed. They either cast rules into classical logic or limit the interaction between rules and ontologies. Autoepistemic logic (AEL) is an attractive formalism which allows to overcome these limitations, by serving as a uniform host language to embed ontologies and nonmonotonic logic programs into it. For the latter, so far only the propositional setting has been considered. In this paper, we present three embeddings of normal and three embeddings of disjunctive non-ground logic programs under the stable model semantics into first-order AEL. While the embeddings all correspond with respect to objective ground atoms, differences arise when considering non-atomic formulas and combinations with first-order theories. We compare the embeddings with respect to stable expansions and autoepistemic consequences, considering the embeddings by themselves, as well as combinations with classical theories. Our results reveal differences and correspondences of the embeddings and provide useful guidance in the choice of a particular embedding for knowledge combination.\nCollaborative tagging systems, such as Delicious, CiteULike, and others, allow users to annotate resources, e.g., Web pages or scientific papers, with descriptive labels called tags. 
The social annotations contributed by thousands of users, can potentially be used to infer categorical knowledge, classify documents or recommend new relevant information. Traditional text inference methods do not make best use of social annotation, since they do not take into account variations in individual users' perspectives and vocabulary. In a previous work, we introduced a simple probabilistic model that takes interests of individual annotators into account in order to find hidden topics of annotated resources. Unfortunately, that approach had one major shortcoming: the number of topics and interests must be specified a priori. To address this drawback, we extend the model to a fully Bayesian framework, which offers a way to automatically estimate these numbers. In particular, the model allows the number of interests and topics to change as suggested by the structure of the data. We evaluate the proposed model in detail on the synthetic and real-world data by comparing its performance to Latent Dirichlet Allocation on the topic extraction task. For the latter evaluation, we apply the model to infer topics of Web resources from social annotations obtained from Delicious in order to discover new resources similar to a specified one. Our empirical results demonstrate that the proposed model is a promising method for exploiting social knowledge contained in user-generated annotations.\nMany databases store data in relational format, with different types of entities and information about links between the entities. The field of statistical-relational learning (SRL) has developed a number of new statistical models for such data. In this paper we focus on learning class-level or first-order dependencies, which model the general database statistics over attributes of linked objects and links (e.g., the percentage of A grades given in computer science classes). 
Class-level statistical relationships are important in themselves, and they support applications like policy making, strategic planning, and query optimization. Most current SRL methods find class-level dependencies, but their main task is to support instance-level predictions about the attributes or links of specific entities. We focus only on class-level prediction, and describe algorithms for learning class-level models that are orders of magnitude faster for this task. Our algorithms learn Bayes nets with relational structure, leveraging the efficiency of single-table nonrelational Bayes net learners. An evaluation of our methods on three data sets shows that they are computationally feasible for realistic table sizes, and that the learned structures represent the statistical information in the databases well. After learning compiles the database statistics into a Bayes net, querying these statistics via Bayes net inference is faster than with SQL queries, and does not depend on the size of the database.\nThe Quantum Decision Theory, developed recently by the authors, is applied to clarify the role of risk and uncertainty in decision making and in particular in relation to the phenomenon of dynamic inconsistency. By formulating this notion in precise mathematical terms, we distinguish three types of inconsistency: time inconsistency, planning paradox, and inconsistency occurring in some discounting effects. While time inconsistency is well accounted for in classical decision theory, the planning paradox is in contradiction with classical utility theory. It finds a natural explanation in the frame of the Quantum Decision Theory. Different types of discounting effects are analyzed and shown to enjoy a straightforward explanation within the suggested theory. We also introduce a general methodology based on self-similar approximation theory for deriving the evolution equations for the probabilities of future prospects. 
This provides a novel classification of possible discount factors, which include the previously known cases (exponential or hyperbolic discounting), but also predicts a novel class of discount factors that decay to a strictly positive constant for very large future time horizons. This class may be useful to deal with very long-term discounting situations associated with intergenerational public policy choices, encompassing issues such as global warming and nuclear waste disposal.\nWe have proposed a model based upon flocking on a complex network, and then developed two clustering algorithms on the basis of it. In the algorithms, firstly a \\textit{k}-nearest neighbor (knn) graph as a weighted and directed graph is produced among all data points in a dataset each of which is regarded as an agent who can move in space, and then a time-varying complex network is created by adding long-range links for each data point. Furthermore, each data point is not only acted by its \\textit{k} nearest neighbors but also \\textit{r} long-range neighbors through fields established in space by them together, so it will take a step along the direction of the vector sum of all fields. It is more important that these long-range links provides some hidden information for each data point when it moves and at the same time accelerate its speed converging to a center. As they move in space according to the proposed model, data points that belong to the same class are located at a same position gradually, whereas those that belong to different classes are away from one another. Consequently, the experimental results have demonstrated that data points in datasets are clustered reasonably and efficiently, and the rates of convergence of clustering algorithms are fast enough. 
Moreover, the comparison with other algorithms also provides an indication of the effectiveness of the proposed approach.\nWe introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006) allows to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in (Certkov et al., 2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for the partition function approximation for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model, can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference.\nThis paper presents several types of evolutionary algorithms (EAs) used for global optimization on real domains. The interest has been focused on multimodal problems, where the difficulties of a premature convergence usually occurs. First the standard genetic algorithm (SGA) using binary encoding of real values and its unsatisfactory behavior with multimodal problems is briefly reviewed together with some improvements of fighting premature convergence. Two types of real encoded methods based on differential operators are examined in detail: the differential evolution (DE), a very modern and effective method firstly published by R. Storn and K. 
Price, and the simplified real-coded differential genetic algorithm SADE proposed by the authors. In addition, an improvement of the SADE method, called CERAF technology, enabling the population of solutions to escape from local extremes, is examined. All methods are tested on an identical set of objective functions and a systematic comparison based on a reliable methodology is presented. It is confirmed that real coded methods generally exhibit better behavior on real domains than the binary algorithms, even when extended by several improvements. Furthermore, the positive influence of the differential operators due to their possibility of self-adaptation is demonstrated. From the reliability point of view, it seems that the real encoded differential algorithm, improved by the technology described in this paper, is a universal and reliable method capable of solving all proposed test problems.\nWe study the combination of the following already known ideas for showing confluence of unconditional or conditional term rewriting systems into practically more useful confluence criteria for conditional systems: Our syntactical separation into constructor and non-constructor symbols, Huet's introduction and Toyama's generalization of parallel closedness for non-noetherian unconditional systems, the use of shallow confluence for proving confluence of noetherian and non-noetherian conditional systems, the idea that certain kinds of limited confluence can be assumed for checking the fulfilledness or infeasibility of the conditions of conditional critical pairs, and the idea that (when termination is given) only prime superpositions have to be considered and certain normalization restrictions can be applied for the substitutions fulfilling the conditions of conditional critical pairs. 
Besides combining and improving already known methods, we present the following new ideas and results: We strengthen the criterion for overlay joinable noetherian systems, and, by using the expressiveness of our syntactical separation into constructor and non-constructor symbols, we are able to present criteria for level confluence that are not criteria for shallow confluence actually and also able to weaken the severe requirement of normality (stiffened with left-linearity) in the criteria for shallow confluence of noetherian and non-noetherian conditional systems to the easily satisfied requirement of quasi-normality. Finally, the whole paper may also give a practically useful overview of the syntactical means for showing confluence of conditional term rewriting systems.\nWe present the only proof of Pierre Fermat by descente infinie that is known to exist today. As the text of its Latin original requires active mathematical interpretation, it is more a proof sketch than a proper mathematical proof. We discuss descente infinie from the mathematical, logical, historical, linguistic, and refined logic-historical points of view. We provide the required preliminaries from number theory and develop a self-contained proof in a modern form, which nevertheless is intended to follow Fermat's ideas closely. We then annotate an English translation of Fermat's original proof with terms from the modern proof. Including all important facts, we present a concise and self-contained discussion of Fermat's proof sketch, which is easily accessible to laymen in number theory as well as to laymen in the history of mathematics, and which provides new clarification of the Method of Descente Infinie to the experts in these fields. Last but not least, this paper fills a gap regarding the easy accessibility of the subject.\n1-Nearest Neighbor with the Dynamic Time Warping (DTW) distance is one of the most effective classifiers on time series domain. 
Since the global constraint was introduced in the speech community, many global constraint models have been proposed, including the Sakoe-Chiba (S-C) band, the Itakura Parallelogram, and the Ratanamahatana-Keogh (R-K) band. The R-K band is a general global constraint model that can effectively represent any global constraint with arbitrary shape and size. However, we need a good learning algorithm to discover the most suitable set of R-K bands, and the current R-K band learning algorithm still suffers from an 'overfitting' phenomenon. In this paper, we propose two new learning algorithms, i.e., a band boundary extraction algorithm and an iterative learning algorithm. The band boundary extraction is calculated from the bound of all possible warping paths in each class, and the iterative learning is adjusted from the original R-K band learning. We also use a Silhouette index, a well-known clustering validation technique, as a heuristic function, and the lower bound function, LB_Keogh, to enhance the prediction speed. Twenty datasets from the Workshop and Challenge on Time Series Classification, held in conjunction with SIGKDD 2007, are used to evaluate our approach.\nAffective computing has proven to be a viable field of research comprised of a large number of multidisciplinary researchers resulting in work that is widely published. The majority of this work consists of computational models of emotion recognition, computational modeling of causal factors of emotion and emotion expression through rendered and robotic faces. A smaller part is concerned with modeling the effects of emotion, formal modeling of cognitive appraisal theory and models of emergent emotions. Part of the motivation for affective computing as a field is to better understand emotional processes through computational modeling. One of the four major topics in affective computing is computers that have emotions (the others are recognizing, expressing and understanding emotions).
A critical and neglected aspect of having emotions is the experience of emotion (Barrett, Mesquita, Ochsner, and Gross, 2007): what does the content of an emotional episode look like, how does this content change over time, and when do we call the episode emotional? Few modeling efforts have these topics as their primary focus. The launch of a journal on synthetic emotions should motivate research initiatives in this direction, and this research should have a measurable impact on emotion research in psychology. I show that a good way to do so is to investigate the psychological core of what an emotion is: an experience. I present ideas on how the experience of emotion could be modeled and provide evidence that several computational models of emotion are already addressing the issue.\nIn this work, we deal with the question of modeling programming exercises for novices in an e-learning scenario. Our purpose is to identify basic requirements, raise some key questions and propose potential answers from a conceptual perspective. Presented as a general picture, we hypothetically situate our work in a general context where e-learning instructional material needs to be adapted to form part of an introductory Computer Science (CS) e-learning course at the CS1 level. What is meant is a potential course which aims at improving novices' skills and knowledge of the essentials of programming by using e-learning based approaches in connection (at least conceptually) with a general host framework like Activemath (www.activemath.org). Our elaboration covers contextual and, particularly, cognitive elements preparing the terrain for eventual research stages in a derived project, as indicated. We concentrate our main efforts on reasoning mechanisms about exercise complexity that can eventually offer tool support for the task of exercise authoring.
We base our requirements analysis on our own perception of the exercise subsystem provided by Activemath especially within the domain reasoner area. We enrich the analysis by bringing to the discussion several relevant contextual elements from the CS1 courses, its definition and implementation. Concerning cognitive models and exercises, we build upon the principles of Bloom's Taxonomy as a relatively standardized basis and use them as a framework for study and analysis of complexity in basic programming exercises. Our analysis includes requirements for the domain reasoner which are necessary for the exercise analysis. We propose for such a purpose a three-layered conceptual model considering exercise evaluation, programming and metaprogramming.\nComplexity theory is a useful tool to study computational issues surrounding the elicitation of preferences, as well as the strategic manipulation of elections aggregating together preferences of multiple agents. We study here the complexity of determining when we can terminate eliciting preferences, and prove that the complexity depends on the elicitation strategy. We show, for instance, that it may be better from a computational perspective to elicit all preferences from one agent at a time than to elicit individual preferences from multiple agents. We also study the connection between the strategic manipulation of an election and preference elicitation. We show that what we can manipulate affects the computational complexity of manipulation. In particular, we prove that there are voting rules which are easy to manipulate if we can change all of an agent's vote, but computationally intractable if we can change only some of their preferences. This suggests that, as with preference elicitation, a fine-grained view of manipulation may be informative. Finally, we study the connection between predicting the winner of an election and preference elicitation. 
Based on this connection, we identify a voting rule where it is computationally difficult to decide the probability of a candidate winning given a probability distribution over the votes.\nSemantic memory is the subsystem of human memory that stores knowledge of concepts or meanings, as opposed to life-specific experiences. The organization of concepts within semantic memory can be understood as a semantic network, where the concepts (nodes) are associated (linked) to others depending on perceptions, similarities, etc. Lexical access is the complementary part of this system and allows the retrieval of such organized knowledge. While conceptual information is stored under a certain underlying organization (and thus gives rise to a specific topology), it is crucial to have accurate access to any of the information units, e.g. the concepts, for efficiently retrieving semantic information for real-time needs. An example of an information retrieval process occurs in verbal fluency tasks, and it is known to involve two different mechanisms: -clustering-, or generating words within a subcategory, and, when a subcategory is exhausted, -switching- to a new subcategory. We extended this approach to random-walking on a network (clustering) in combination with jumping (switching) to any node with a certain probability, and derived its analytical expression based on Markov chains. Results show that this dual mechanism contributes to optimizing the exploration of different network models in terms of the mean first passage time. Additionally, this cognitively inspired dual mechanism opens a new framework to better understand and evaluate exploration, propagation and transport phenomena in other complex systems where switching-like phenomena are feasible.\nObservational astronomy has changed drastically in the last decade: manually driven target-by-target instruments have been replaced by fully automated robotic telescopes.
Data acquisition methods have advanced to the point that terabytes of data are flowing in and being stored on a daily basis. At the same time, the vast majority of analysis tools in stellar astrophysics still rely on manual expert interaction. To bridge this gap, we foresee that the next decade will witness a fundamental shift in the approaches to data analysis: case-by-case methods will be replaced by fully automated pipelines that will process the data from their reduction stage, through analysis, to storage. While major effort has been invested in data reduction automation, automated data analysis has mostly been neglected despite the urgent need. Scientific data mining will face serious challenges to identify, understand and eliminate the sources of systematic errors that will arise from this automation. As a special case, we present an artificial intelligence (AI)-driven pipeline that is prototyped in the domain of stellar astrophysics (eclipsing binaries in particular), current results and the challenges still ahead.\nThis paper develops a logical language for representing probabilistic causal laws. Our interest in such a language is twofold. First, it can be motivated as a fundamental study of the representation of causal knowledge. Causality has an inherent dynamic aspect, which has been studied at the semantical level by Shafer in his framework of probability trees. In such a dynamic context, where the evolution of a domain over time is considered, the idea of a causal law as something which guides this evolution is quite natural. In our formalization, a set of probabilistic causal laws can be used to represent a class of probability trees in a concise, flexible and modular way. In this way, our work extends Shafer's by offering a convenient logical representation for his semantical objects.   Second, this language also has relevance for the area of probabilistic logic programming.
In particular, we prove that the formal semantics of a theory in our language can be equivalently defined as a probability distribution over the well-founded models of certain logic programs, rendering it formally quite similar to existing languages such as ICL or PRISM. Because we can motivate and explain our language in a completely self-contained way as a representation of probabilistic causal laws, this provides a new way of explaining the intuitions behind such probabilistic logic programs: we can say precisely which knowledge such a program expresses, in terms that are equally understandable by a non-logician. Moreover, we also obtain an additional piece of knowledge representation methodology for probabilistic logic programs, by showing how they can express probabilistic causal laws.\nSubspace clustering has gained increasing popularity in the analysis of gene expression data. Among subspace cluster models, the recently introduced order-preserving sub-matrix (OPSM) has demonstrated high promise. An OPSM, essentially a pattern-based subspace cluster, is a subset of rows and columns in a data matrix for which all the rows induce the same linear ordering of columns. Existing OPSM discovery methods do not scale well to increasingly large expression datasets. In particular, twig clusters having few genes and many experiments incur explosive computational costs and are completely pruned off by existing methods. However, it is of particular interest to determine small groups of genes that are tightly coregulated across many conditions. In this paper, we present KiWi, an OPSM subspace clustering algorithm that is scalable to massive datasets, capable of discovering twig clusters and identifying negative as well as positive correlations. 
We extensively validate KiWi using relevant biological datasets and show that KiWi correctly assigns redundant probes to the same cluster, groups experiments with common clinical annotations, differentiates real promoter sequences from negative control sequences, and shows good association with cis-regulatory motif predictions.\nReal-world datasets are sparse, dirty and contain hundreds of items. In such situations, discovering interesting rules (results) using the traditional frequent itemset mining approach with a user-defined input support threshold is not appropriate: without any domain knowledge, setting the support threshold too large or too small can output either nothing or a large number of redundant, uninteresting results, respectively. Recently, a novel approach of mining only the N-most/Top-K interesting frequent itemsets has been proposed, which discovers the top N interesting results without specifying any user-defined support threshold. However, mining interesting frequent itemsets without a minimum support threshold is more costly in terms of itemset search space exploration and processing cost. The efficiency of such mining therefore depends heavily on three main factors: (1) the database representation approach used for itemset frequency counting, (2) the projection of relevant transactions to lower-level nodes of the search space, and (3) the algorithm implementation technique. Therefore, to improve the efficiency of the mining process, in this paper we present two novel algorithms, N-MostMiner and Top-K-Miner, using the bit-vector representation approach, which is very efficient in terms of itemset frequency counting and transaction projection. In addition, we present several efficient implementation techniques for N-MostMiner and Top-K-Miner drawn from our implementation experience.
Our experimental results on benchmark datasets suggest that N-MostMiner and Top-K-Miner are very efficient in terms of processing time compared to the current best algorithms, BOMO and TFP.\nComputability logic (CL) (see http://www.cis.upenn.edu/~giorgi/cl.html ) is a research program for redeveloping logic as a formal theory of computability, as opposed to the formal theory of truth which it has more traditionally been. Formulas in CL stand for interactive computational problems, seen as games between a machine and its environment; logical operators represent operations on such entities; and \"truth\" is understood as existence of an effective solution. The formalism of CL is open-ended, and may undergo a series of extensions as the studies of the subject advance. So far three -- parallel, sequential and choice -- sorts of conjunction and disjunction have been studied. The present paper adds one more natural kind to this collection, termed toggling. The toggling operations can be characterized as lenient versions of choice operations where choices are retractable, being allowed to be reconsidered any finite number of times. This way, they model trial-and-error style decision steps in interactive computation. The main technical result of this paper is constructing a sound and complete axiomatization for the propositional fragment of computability logic whose vocabulary, together with negation, includes all four -- parallel, toggling, sequential and choice -- kinds of conjunction and disjunction. Along with toggling conjunction and disjunction, the paper also introduces the toggling versions of quantifiers and recurrence operations.\nOver 50 million people worldwide suffer from epilepsy. Traditional diagnosis of epilepsy relies on tedious visual screening by highly trained clinicians of lengthy EEG recordings that contain seizure (ictal) activity.
Nowadays, there are many automatic systems that can recognize seizure-related EEG signals to help the diagnosis. However, it is very costly and inconvenient to obtain long-term EEG data with seizure activities, especially in areas short of medical resources. We demonstrate in this paper that we can use the interictal scalp EEG data, which is much easier to collect than the ictal data, to automatically diagnose whether a person is epileptic. In our automated EEG recognition system, we extract three classes of features from the EEG data and build Probabilistic Neural Networks (PNNs) fed with these features. We optimize the feature extraction parameters and combine these PNNs through a voting mechanism. As a result, our system achieves an impressive 94.07% accuracy, which is very close to reported human recognition accuracy by experienced medical professionals.\nThis paper studies the stable model semantics of logic programs with (abstract) constraint atoms and their properties. We introduce a succinct abstract representation of these constraint atoms in which a constraint atom is represented compactly. We show two applications. First, under this representation of constraint atoms, we generalize the Gelfond-Lifschitz transformation and apply it to define stable models (also called answer sets) for logic programs with arbitrary constraint atoms. The resulting semantics turns out to coincide with the one defined by Son et al., which is based on a fixpoint approach. One advantage of our approach is that it can be applied, in a natural way, to define stable models for disjunctive logic programs with constraint atoms, which may appear in the disjunctive head as well as in the body of a rule. As a result, our approach to the stable model semantics for logic programs with constraint atoms generalizes a number of previous approaches. 
Second, we show that our abstract representation of constraint atoms provides a means to characterize dependencies of atoms in a program with constraint atoms, so that some standard characterizations and properties relying on these dependencies in the past for logic programs with ordinary atoms can be extended to logic programs with constraint atoms.\nOur aim in this paper is to analyse the phenotypic effects (evolvability) of diverse coding conversion operators in an instance of the states-based evolutionary algorithm (SEA). Since the representation of solutions or the selection of the best encoding during the optimization process has been proved to be very important for the efficiency of evolutionary algorithms (EAs), we will discuss a strategy of coupling more than one representation and different procedures of conversion from one coding to another during the search. Elsewhere, some EAs try to use multiple representations (SM-GA, SEA, etc.) in an attempt to benefit from the characteristics of each of them. In spite of those results, this paper shows that the change of the representation is also a crucial approach to take into consideration while attempting to increase the performance of such EAs. As a demonstrative example, we use a two-state SEA (2-SEA) which has two identical search spaces but different coding conversion operators. The results show that the way of changing from one coding to another, and not only the choice of the best representation or the representation itself, is very advantageous and must be taken into account in order to design EAs well and improve their execution.\nMany regression problems involve not one but several response variables (y's). Often the responses are suspected to share a common underlying structure, in which case it may be advantageous to share information across them; this is known as multitask learning.
As a special case, we can use multiple responses to better identify shared predictive features -- a project we might call multitask feature selection.   This thesis is organized as follows. Section 1 introduces feature selection for regression, focusing on ell_0 regularization methods and their interpretation within a Minimum Description Length (MDL) framework. Section 2 proposes a novel extension of MDL feature selection to the multitask setting. The approach, called the \"Multiple Inclusion Criterion\" (MIC), is designed to borrow information across regression tasks by more easily selecting features that are associated with multiple responses. We show in experiments on synthetic and real biological data sets that MIC can reduce prediction error in settings where features are at least partially shared across responses. Section 3 surveys hypothesis testing by regression with a single response, focusing on the parallel between the standard Bonferroni correction and an MDL approach. Mirroring the ideas in Section 2, Section 4 proposes a novel MIC approach to hypothesis testing with multiple responses and shows that on synthetic data with significant sharing of features across responses, MIC sometimes outperforms standard FDR-controlling methods in terms of finding true positives for a given level of false positives. Section 5 concludes.\nMining frequent sequential patterns from sequence databases has been a central research topic in data mining and various efficient mining sequential patterns algorithms have been proposed and studied. 
Recently, in many problem domains (e.g., program execution traces), a novel line of sequential pattern mining research, called mining repetitive gapped sequential patterns, has attracted the attention of many researchers, since considering not only the repetition of a sequential pattern across different sequences but also its repetition within a sequence is more meaningful than general sequential pattern mining, which only captures occurrences in different sequences. However, the number of repetitive gapped sequential patterns generated by even these closed mining algorithms may be too large for users to understand, especially when the support threshold is low. In this paper, we propose and study the problem of compressing repetitive gapped sequential patterns. Inspired by the ideas of summarizing frequent itemsets, RPglobal, we develop an algorithm, CRGSgrow (Compressing Repetitive Gapped Sequential pattern grow), including an efficient pruning strategy, SyncScan, and an efficient representative pattern checking scheme, -dominate sequential pattern checking. CRGSgrow is a two-step approach: in the first step, we obtain all closed repetitive sequential patterns as the candidate set of representative repetitive sequential patterns, and at the same time obtain most of the representative repetitive sequential patterns; in the second step, we spend only a little time finding the remaining representative patterns from the candidate set. An empirical study with both real and synthetic data sets clearly shows that CRGSgrow has good performance.\nAmong many existing distance measures for time series data, Dynamic Time Warping (DTW) distance has been recognized as one of the most accurate and suitable distance measures due to its flexibility in sequence alignment. However, DTW distance calculation is computationally intensive.
Especially in very large time series databases, sequential scan through the entire database is definitely impractical, even with random access that exploits some index structures since high dimensionality of time series data incurs extremely high I/O cost. More specifically, a sequential structure consumes high CPU but low I/O costs, while an index structure requires low CPU but high I/O costs. In this work, we therefore propose a novel indexed sequential structure called TWIST (Time Warping in Indexed Sequential sTructure) which benefits from both sequential access and index structure. When a query sequence is issued, TWIST calculates lower bounding distances between a group of candidate sequences and the query sequence, and then identifies the data access order in advance, hence reducing a great number of both sequential and random accesses. Impressively, our indexed sequential structure achieves significant speedup in a querying process by a few orders of magnitude. In addition, our method shows superiority over existing rival methods in terms of query processing time, number of page accesses, and storage requirement with no false dismissal guaranteed.\nConference paper assignment, i.e., the task of assigning paper submissions to reviewers, presents multi-faceted issues for recommender systems research. Besides the traditional goal of predicting `who likes what?', a conference management system must take into account aspects such as: reviewer capacity constraints, adequate numbers of reviews for papers, expertise modeling, conflicts of interest, and an overall distribution of assignments that balances reviewer preferences with conference objectives. Among these, issues of modeling preferences and tastes in reviewing have traditionally been studied separately from the optimization of paper-reviewer assignment. In this paper, we present an integrated study of both these aspects. 
First, due to the paucity of data per reviewer or per paper (relative to other recommender systems applications) we show how we can integrate multiple sources of information to learn paper-reviewer preference models. Second, our models are evaluated not just in terms of prediction accuracy but in terms of the end-assignment quality. Using a linear programming-based assignment optimization formulation, we show how our approach better explores the space of unsupplied assignments to maximize the overall affinities of papers assigned to reviewers. We demonstrate our results on real reviewer preference data from the IEEE ICDM 2007 conference.\nWe study the termination problem of the chase algorithm, a central tool in various database problems such as the constraint implication problem, Conjunctive Query optimization, rewriting queries using views, data exchange, and data integration. The basic idea of the chase is, given a database instance and a set of constraints as input, to fix constraint violations in the database instance. It is well-known that, for an arbitrary set of constraints, the chase does not necessarily terminate (in general, it is even undecidable if it does or not). Addressing this issue, we review the limitations of existing sufficient termination conditions for the chase and develop new techniques that allow us to establish weaker sufficient conditions. In particular, we introduce two novel termination conditions called safety and inductive restriction, and use them to define the so-called T-hierarchy of termination conditions. We then study the interrelations of our termination conditions with previous conditions and the complexity of checking our conditions. This analysis leads to an algorithm that checks membership in a level of the T-hierarchy and accounts for the complexity of termination conditions. As another contribution, we study the problem of data-dependent chase termination and present sufficient termination conditions w.r.t. 
fixed instances. They might guarantee termination even though the chase does not terminate in the general case. As an application of our techniques beyond those already mentioned, we transfer our results into the field of query answering over knowledge bases where the chase on the underlying database may not terminate, making existing algorithms applicable to broader classes of constraints.\nIn most contemporary approaches to decision making, a decision problem is described by a set of states and a set of outcomes, and a rich set of acts, which are functions from states to outcomes over which the decision maker (DM) has preferences. Most interesting decision problems, however, do not come with a state space and an outcome space. Indeed, in complex problems it is often far from clear what the state and outcome spaces would be. We present an alternative foundation for decision making, in which the primitive objects of choice are syntactic programs. A representation theorem is proved in the spirit of standard representation theorems, showing that if the DM's preference relation on objects of choice satisfies appropriate axioms, then there exist a set S of states, a set O of outcomes, a way of interpreting the objects of choice as functions from S to O, a probability on S, and a utility function on O, such that the DM prefers choice a to choice b if and only if the expected utility of a is higher than that of b. Thus, the state space and outcome space are subjective, just like the probability and utility; they are not part of the description of the problem. In principle, a modeler can test for SEU behavior without having access to states or outcomes.
We illustrate the power of our approach by showing that it can capture decision makers who are subject to framing effects.\nAlthough researchers often comment on the rising popularity of nature-inspired meta-heuristics (NIM), there has been a paucity of data to directly support the claim that NIM are growing in prominence compared to other optimization techniques. This study presents evidence that the use of NIM is not only growing, but indeed appears to have surpassed mathematical optimization techniques (MOT) in several important metrics related to academic research activity (publication frequency) and commercial activity (patenting frequency). Motivated by these findings, this article discusses some of the possible origins of this growing popularity. I review different explanations for NIM popularity and discuss why some of these arguments remain unsatisfying. I argue that a compelling and comprehensive explanation should directly account for the manner in which most NIM success has actually been achieved, e.g. through hybridization and customization to different problem environments. By taking a problem lifecycle perspective, this paper offers a fresh look at the hypothesis that nature-inspired meta-heuristics derive much of their utility from being flexible. I discuss global trends within the business environments where optimization algorithms are applied and I speculate that highly flexible algorithm frameworks could become increasingly popular within our diverse and rapidly changing world.\nCapability planning problems are pervasive throughout many areas of human interest with prominent examples found in defense and security. Planning provides a unique context for optimization that has not been explored in great detail and involves a number of interesting challenges which are distinct from traditional optimization research. 
Planning problems demand solutions that can satisfy a number of competing objectives on multiple scales related to robustness, adaptiveness, risk, etc. The scenario method is a key approach for planning. Scenarios can be defined for long-term as well as short-term plans. This paper introduces computational scenario-based planning problems and proposes ways to accommodate strategic positioning within the tactical planning domain. We demonstrate the methodology in a resource planning problem that is solved with a multi-objective evolutionary algorithm. Our discussion and results highlight the fact that scenario-based planning is naturally framed within a multi-objective setting. However, the conflicting objectives occur on different system levels rather than within a single system alone. This paper also contends that planning problems are of vital interest in many human endeavors and that Evolutionary Computation may be well positioned for this problem domain.\nCollective graphical models exploit inter-instance associative dependence to output more accurate labelings. However, existing models support a very limited kind of associativity, which restricts accuracy gains. This paper makes two major contributions. First, we propose a general collective inference framework that biases data instances to agree on a set of {\\em properties} of their labelings. Agreement is encouraged through symmetric clique potentials. We show that rich properties lead to bigger gains, and present a systematic inference procedure for a large class of such properties. The procedure performs message passing on the cluster graph, where property-aware messages are computed with cluster-specific algorithms. This provides an inference-only solution for domain adaptation. Our experiments on bibliographic information extraction illustrate significant test error reduction over unseen domains.
Our second major contribution consists of algorithms for computing outgoing messages from clique clusters with symmetric clique potentials. Our algorithms are exact for arbitrary symmetric potentials on binary labels and for max-like and majority-like potentials on multiple labels. For majority potentials, we also provide an efficient Lagrangian Relaxation based algorithm that compares favorably with the exact algorithm. We present a 13/15-approximation algorithm for the NP-hard Potts potential, with runtime sub-quadratic in the clique size. In contrast, the best known previous guarantee for graphs with Potts potentials is only 1/2. We empirically show that our method for Potts potentials is an order of magnitude faster than the best alternatives, and our Lagrangian Relaxation based algorithm for majority potentials beats the best applicable heuristic -- ICM.\nMaking decisions about the structure of a future military fleet is a challenging task. Several issues need to be considered such as the existence of multiple competing objectives and the complexity of the operating environment. A particular challenge is posed by the various types of uncertainty that the future might hold. It is uncertain what future events might be encountered; how fleet design decisions will influence and shape the future; and how present and future decision makers will act based on available information, their personal biases regarding the importance of different objectives, and their economic preferences. In order to assist strategic decision-making, an analysis of future fleet options needs to account for conditions in which these different classes of uncertainty are exposed. It is important to understand what assumptions a particular fleet is robust to, what the fleet can readily adapt to, and what conditions present clear risks to the fleet. We call this the analysis of a fleet's strategic positioning. 
This paper shows how strategic positioning can be evaluated using computer simulations. Our main aim is to introduce a framework for capturing information that can be useful to a decision maker and for defining the concepts of robustness and adaptiveness in the context of future fleet design. We demonstrate our conceptual framework using simulation studies of an air transportation fleet. We capture uncertainty by employing an explorative scenario-based approach. Each scenario represents a sampling of different future conditions, different model assumptions, and different economic preferences. Proposed changes to a fleet are then analysed based on their influence on the fleet's robustness, adaptiveness, and risk to different scenarios.\nAnt Colony Optimization (ACO) has time complexity O(t*m*N*N), and its typical application is solving the Traveling Salesman Problem (TSP), where t, m, and N denote the number of iterations, the number of ants, and the number of cities, respectively. Reducing running time is one research focus, and one way to do so is to decrease the parameters t and N, especially N. To this end, this paper presents the following method. First, a novel clustering algorithm named Special Local Clustering (SLC) is designed and applied to partition all cities into compact classes, i.e., classes whose cities cluster tightly in a small region. Second, ACO is run on each class to obtain a local TSP route. Third, all local TSP routes are joined to form a solution. Fourth, the inaccuracy of the solution caused by clustering is eliminated. Simulations show that the presented method speeds up ACO by a factor of at least 200. This speedup results from two factors. One is that each class is small, so parameter N is cut down; moreover, the route length converges over the iteration steps when ACO acts on a compact class. 
The other factor is that using the convergence of route length as the termination criterion of ACO cuts down parameter t.\nA key question in cooperative game theory is that of coalitional stability, usually captured by the notion of the \\emph{core}--the set of outcomes such that no subgroup of players has an incentive to deviate. However, some coalitional games have empty cores, and any outcome in such a game is unstable. In this paper, we investigate the possibility of stabilizing a coalitional game by using external payments. We consider a scenario where an external party, which is interested in having the players work together, offers a supplemental payment to the grand coalition (or, more generally, a particular coalition structure). This payment is conditional on players not deviating from their coalition(s). The sum of this payment plus the actual gains of the coalition(s) may then be divided among the agents so as to promote stability. We define the \\emph{cost of stability (CoS)} as the minimal external payment that stabilizes the game. We provide general bounds on the cost of stability in several classes of games, and explore its algorithmic properties. To develop a better intuition for the concepts we introduce, we provide a detailed algorithmic study of the cost of stability in weighted voting games, a simple but expressive class of games which can model decision-making in political bodies, and cooperation in multiagent settings. Finally, we extend our model and results to games with coalition structures.\nThe study of topological information of spatial objects has for a long time been a focus of research in disciplines like computational geometry, spatial reasoning, cognitive science, and robotics. 
While most of this research emphasised the topological relations between spatial objects, this work studies the internal topological structure of bounded plane regions, which could consist of multiple pieces and/or have holes and islands to any finite level. The insufficiency of simple regions (regions homeomorphic to closed disks) to cope with the variety and complexity of spatial entities and phenomena has been widely acknowledged. Another significant drawback of simple regions is that they are not closed under the set operations union, intersection, and difference. This paper considers bounded semi-algebraic regions, which are closed under set operations and can closely approximate most plane regions arising in practice.\nThe grid environment is a service-oriented infrastructure in which many heterogeneous resources participate to provide high performance computation. One of the big issues in the grid environment is the vagueness and uncertainty between advertised resources and requested resources. Furthermore, in an environment such as the grid, dynamicity is a crucial issue that must be dealt with. Classical rough sets have been used to deal with uncertainty and vagueness, but they can only be applied to static systems and cannot support dynamicity. In this work we propose a solution, called Dynamic Rough Set Resource Discovery (DRSRD), for dealing with vagueness and uncertainty based on dynamic rough set theory, which considers the dynamic features of this environment. In this approach, each requested resource property is assigned a weight as its priority, and resource matchmaking and ranking are performed according to these weights. We also report results obtained from simulations in the GridSim simulator. DRSRD is compared with a classical rough set theory based algorithm and with a combined UDDI and OWL-S algorithm. 
DRSRD shows much better precision than the classical rough set theory based algorithm and the combined UDDI and OWL-S algorithm in cases with vagueness and uncertainty in a dynamic system such as the grid.\nIn many scenarios, such as emergency response or ad hoc collaboration, it is critical to reduce the overhead in integrating data. Ideally, one could perform the entire process interactively under one unified interface: defining extractors and wrappers for sources, creating a mediated schema, and adding schema mappings, while seeing how these impact the integrated view of the data, and refining the design accordingly. We propose a novel smart copy and paste (SCP) model and architecture for seamlessly combining the design-time and run-time aspects of data integration, and we describe an initial prototype, the CopyCat system. In CopyCat, the user does not need special tools for the different stages of integration: instead, the system watches as the user copies data from applications (including the Web browser) and pastes them into CopyCat's spreadsheet-like workspace. CopyCat generalizes these actions and presents proposed auto-completions, each with an explanation in the form of provenance. The user provides feedback on these suggestions (through either direct interactions or further copy-and-paste operations), and the system learns from this feedback. This paper provides an overview of our prototype system, and identifies key research challenges in achieving SCP in its full generality.\nThis paper presents greedy gossip with eavesdropping (GGE), a novel randomized gossip algorithm for distributed computation of the average consensus problem. In gossip algorithms, nodes in the network randomly communicate with their neighbors and exchange information iteratively. The algorithms are simple and decentralized, making them attractive for wireless network applications. 
In general, gossip algorithms are robust to unreliable wireless conditions and time-varying network topologies. In this paper we introduce GGE and demonstrate that greedy updates lead to rapid convergence. We do not require nodes to have any location information. Instead, greedy updates are made possible by exploiting the broadcast nature of wireless communications. During the operation of GGE, when a node decides to gossip, instead of choosing one of its neighbors at random, it makes a greedy selection, choosing the node which has the value most different from its own. In order to make this selection, nodes need to know their neighbors' values. Therefore, we assume that all transmissions are wireless broadcasts and nodes keep track of their neighbors' values by eavesdropping on their communications. We show that the convergence of GGE is guaranteed for connected network topologies. We also study the rates of convergence and illustrate, through theoretical bounds and numerical simulations, that GGE consistently outperforms randomized gossip and performs comparably to geographic gossip on moderate-sized random geometric graph topologies.\nActor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological relevance. In this paper, we introduce an online temporal difference based actor-critic algorithm which is proved to converge to a neighborhood of a local maximum of the average reward. Linear function approximation is used by the critic in order to estimate the value function, and the temporal difference signal, which is passed from the critic to the actor. 
The main distinguishing feature of the present convergence proof is that both the actor and the critic operate on a similar time scale, while in most current convergence proofs they are required to have very different time scales in order to converge. Moreover, the same temporal difference signal is used to update the parameters of both the actor and the critic. A limitation of the proposed approach, compared to results available for two time scale convergence, is that convergence is guaranteed only to a neighborhood of an optimal value, rather than to an optimal value itself. The single time scale and identical temporal difference signal used by the actor and the critic may provide a step towards constructing more biologically realistic models of reinforcement learning in the brain.\nWe consider multi-agent systems where agents' preferences are aggregated via sequential majority voting: each decision is taken by performing a sequence of pairwise comparisons where each comparison is a weighted majority vote among the agents. Incompleteness in the agents' preferences is common in many real-life settings due to privacy issues or an ongoing elicitation process. In addition, there may be uncertainty about how the preferences are aggregated. For example, the agenda (a tree whose leaves are labelled with the decisions being compared) may not yet be known or fixed. We therefore study how to determine collectively optimal decisions (also called winners) when preferences may be incomplete, and when the agenda may be uncertain. We show that it is computationally easy to determine if a candidate decision always wins, or may win, whatever the agenda. On the other hand, it is computationally hard to know whether a candidate decision wins in at least one agenda for at least one completion of the agents' preferences. These results hold even if the agenda must be balanced so that each candidate decision faces the same number of majority votes. 
Such results are useful for reasoning about preference elicitation. They help understand the complexity of tasks such as determining if a decision can be taken collectively, as well as knowing if the winner can be manipulated by appropriately ordering the agenda.\nWe introduce the Reduced-Rank Hidden Markov Model (RR-HMM), a generalization of HMMs that can model smooth state evolution as in Linear Dynamical Systems (LDSs) as well as non-log-concave predictive distributions as in continuous-observation HMMs. RR-HMMs assume an m-dimensional latent state and n discrete observations, with a transition matrix of rank k <= m. This implies the dynamics evolve in a k-dimensional subspace, while the shape of the set of predictive distributions is determined by m. Latent state belief is represented with a k-dimensional state vector and inference is carried out entirely in R^k, making RR-HMMs as computationally efficient as k-state HMMs yet more expressive. To learn RR-HMMs, we relax the assumptions of a recently proposed spectral learning algorithm for HMMs (Hsu, Kakade and Zhang 2009) and apply it to learn k-dimensional observable representations of rank-k RR-HMMs. The algorithm is consistent and free of local optima, and we extend its performance guarantees to cover the RR-HMM case. We show how this algorithm can be used in conjunction with a kernel density estimator to efficiently model high-dimensional multivariate continuous data. We also relax the assumption that single observations are sufficient to disambiguate state, and extend the algorithm accordingly. Experiments on synthetic data and a toy video, as well as on a difficult robot vision modeling problem, yield accurate models that compare favorably with standard alternatives in simulation quality and prediction capability.\nWe analyze and exploit some scaling properties of the Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007). 
First, we observe that a divide-and-conquer strategy, applied hierarchically to a large data set, reduces the complexity ${\\cal O}(N^2)$ to ${\\cal O}(N^{(h+2)/(h+1)})$, for a data set of size $N$ and a depth $h$ of the hierarchical strategy. For a data set embedded in a $d$-dimensional space, we show that this is obtained without notably damaging the precision except in dimension $d=2$. In fact, for $d$ larger than 2 the relative loss in precision scales like $N^{(2-d)/(h+1)d}$. Finally, under some conditions we observe that there is a value $s^*$ of the penalty coefficient, a free parameter used to fix the number of clusters, which separates a fragmentation phase (for $s<s^*$) from a coalescent one (for $s>s^*$) of the underlying hidden cluster structure. At this precise point a self-similarity property holds, which the hierarchical strategy can exploit to locate its position. From this observation, a strategy based on AP can be defined to determine how many clusters are present in a given dataset.\nNurse rostering is a complex scheduling problem that affects hospital personnel on a daily basis all over the world. This paper presents a new component-based approach with evolutionary eliminations for a nurse scheduling problem arising at a major UK hospital. The main idea behind this technique is to decompose a schedule into its components (i.e. the allocated shift pattern of each nurse), and then to implement two evolutionary elimination strategies mimicking natural selection and natural mutation processes on these components, respectively, to iteratively deliver better schedules. The worthiness of all components in the schedule has to be continuously demonstrated in order for them to remain there. This demonstration employs an evaluation function which evaluates how well each component contributes towards the final objective. 
Two elimination steps are then applied: the first eliminates a number of components that are deemed not worthy to stay in the current schedule; the second may also throw out, with a low probability, some worthy components. The eliminated components are replenished with new ones using a set of constructive heuristics based on local optimality criteria. Computational results using 52 data instances demonstrate the applicability of the proposed approach in solving real-world problems.\nThe quest for robust heuristics that are able to solve more than one problem is ongoing. In this paper, we present, discuss and analyse a technique called Evolutionary Squeaky Wheel Optimisation and apply it to two different personnel scheduling problems. Evolutionary Squeaky Wheel Optimisation improves the original Squeaky Wheel Optimisation's effectiveness and execution speed by incorporating two extra steps (Selection and Mutation) for added evolution. In Evolutionary Squeaky Wheel Optimisation, a cycle of Analysis-Selection-Mutation-Prioritization-Construction continues until stopping conditions are reached. The aim of the Analysis step is to identify below-average solution components by calculating a fitness value for all components. The Selection step then chooses amongst these underperformers and discards some probabilistically based on fitness. The Mutation step further discards a few components at random. Solutions can become incomplete and thus repairs may be required. The repairs are carried out by using the Prioritization step to first produce priorities that determine the order in which the following Construction step schedules the remaining components. Therefore, improvement in Evolutionary Squeaky Wheel Optimisation is achieved by selective solution disruption mixed with iterative improvement and constructive repair. 
Strong experimental results are reported on two different domains of personnel scheduling: bus and rail driver scheduling and hospital nurse scheduling.\nMedical Informatics and the application of modern signal processing in the assistance of the diagnostic process in medical imaging is one of the more recent and active research areas today. This thesis addresses a variety of issues related to the general problem of medical image analysis, specifically in mammography, and presents a series of algorithms and design approaches for all the intermediate levels of a modern system for computer-aided diagnosis (CAD). The diagnostic problem is analyzed with a systematic approach, first defining the imaging characteristics and features that are relevant to probable pathology in mammograms. Next, these features are quantified and fused into new, integrated radiological systems that exhibit embedded digital signal processing, in order to improve the final result and minimize the radiological dose for the patient. At a higher level, special algorithms are designed for detecting and encoding these clinically interesting imaging features, in order to be used as input to advanced pattern classifiers and machine learning models. Finally, these approaches are extended in multi-classifier models under the scope of Game Theory and optimum collective decision, in order to produce efficient solutions for combining classifiers with minimum computational costs for advanced diagnostic systems. The material covered in this thesis is related to a total of 18 published papers, 6 in scientific journals and 12 in international conferences.\nWe consider directed graphs over a set of n agents, where an edge (i,j) is taken to mean that agent i supports or trusts agent j. Given such a graph and an integer k\\leq n, we wish to select a subset of k agents that maximizes the sum of indegrees, i.e., a subset of k most popular or most trusted agents. 
At the same time we assume that each individual agent is only interested in being selected, and may misreport its outgoing edges to this end. This problem formulation captures realistic scenarios where agents choose among themselves, which can be found in the context of Internet search, social networks like Twitter, or reputation systems like Epinions.   Our goal is to design mechanisms without payments that map each graph to a k-subset of agents to be selected and satisfy the following two constraints: strategyproofness, i.e., agents cannot benefit from misreporting their outgoing edges, and approximate optimality, i.e., the sum of indegrees of the selected subset of agents is always close to optimal. Our first main result is a surprising impossibility: for k \\in {1,...,n-1}, no deterministic strategyproof mechanism can provide a finite approximation ratio. Our second main result is a randomized strategyproof mechanism with an approximation ratio that is bounded from above by four for any value of k, and approaches one as k grows.\nBoolean Satisfiability (SAT) solvers are now routinely used in the verification of large industrial problems. However, their application in safety-critical domains such as the railways, avionics, and automotive industries requires some form of assurance for the results, as the solvers can (and sometimes do) have bugs. Unfortunately, the complexity of modern, highly optimized SAT solvers renders impractical the development of direct formal proofs of their correctness. This paper presents an alternative approach where an untrusted, industrial-strength, SAT solver is plugged into a trusted, formally certified, SAT proof checker to provide industrial-strength certified SAT solving. 
The key novelties and characteristics of our approach are (i) that the checker is automatically extracted from the formal development, (ii) that the combined system can be used as a standalone executable program independent of any supporting theorem prover, and (iii) that the checker certifies any SAT solver respecting the agreed format for satisfiability and unsatisfiability claims. The core of the system is a certified checker for unsatisfiability claims that is formally designed and verified in Coq. We present its formal design and outline the correctness proofs. The actual standalone checker is automatically extracted from the Coq development. An evaluation of the certified checker on a representative set of industrial benchmarks from the SAT Race Competition shows that, although it is slower than uncertified SAT checkers, it is significantly faster than certified checkers implemented on top of an interactive theorem prover.\nImages can be segmented by first using a classifier to predict an affinity graph that reflects the degree to which image pixels must be grouped together and then partitioning the graph to yield a segmentation. Machine learning has been applied to the affinity classifier to produce affinity graphs that are good in the sense of minimizing edge misclassification rates. However, this error measure is only indirectly related to the quality of segmentations produced by ultimately partitioning the affinity graph. We present the first machine learning algorithm for training a classifier to produce affinity graphs that are good in the sense of producing segmentations that directly minimize the Rand index, a well known segmentation performance measure. The Rand index measures segmentation performance by quantifying the classification of the connectivity of image pixel pairs after segmentation. 
By using the simple graph partitioning algorithm of finding the connected components of the thresholded affinity graph, we are able to train an affinity classifier to directly minimize the Rand index of segmentations resulting from the graph partitioning. Our learning algorithm corresponds to the learning of maximin affinities between image pixel pairs, which are predictive of the pixel-pair connectivity.\nPrivacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the $\\epsilon$-differential privacy definition due to Dwork et al. (2006). First, we apply the output perturbation ideas of Dwork et al. (2006) to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. 
Our results show that both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.\nThe problem is sequence prediction in the following setting. A sequence $x_1,...,x_n,...$ of discrete-valued observations is generated according to some unknown probabilistic law (measure) $\\mu$. After observing each outcome, it is required to give the conditional probabilities of the next observation. The measure $\\mu$ belongs to an arbitrary but known class $C$ of stochastic process measures. We are interested in predictors $\\rho$ whose conditional probabilities converge (in some sense) to the \"true\" $\\mu$-conditional probabilities if any $\\mu\\in C$ is chosen to generate the sequence. The contribution of this work is in characterizing the families $C$ for which such predictors exist, and in providing a specific and simple form in which to look for a solution. We show that if any predictor works, then there exists a Bayesian predictor, whose prior is discrete, and which works too. We also find several sufficient and necessary conditions for the existence of a predictor, in terms of topological characterizations of the family $C$, as well as in terms of local behaviour of the measures in $C$, which in some cases lead to procedures for constructing such predictors. It should be emphasized that the framework is completely general: the stochastic processes considered are not required to be i.i.d., stationary, or to belong to any parametric or countable family.\nWe propose a database model that allows users to annotate data with belief statements. Our motivation comes from scientific database applications where a community of users is working together to assemble, revise, and curate a shared data repository. 
As the community accumulates knowledge and the database content evolves over time, it may contain conflicting information and members can disagree on the information it should store. For example, Alice may believe that a tuple should be in the database, whereas Bob disagrees. He may also insert the reason why he thinks Alice believes the tuple should be in the database, and explain what he thinks the correct tuple should be instead.   We propose a formal model for Belief Databases that interprets users' annotations as belief statements. These annotations can refer both to the base data and to other annotations. We give a formal semantics based on a fragment of multi-agent epistemic logic and define a query language over belief databases. We then prove a key technical result, stating that every belief database can be encoded as a canonical Kripke structure. We use this structure to describe a relational representation of belief databases, and give an algorithm for translating queries over the belief database into standard relational queries. Finally, we report early experimental results with our prototype implementation on synthetic data.\nIn this paper, we propose causality as a unified framework to explain query answers and non-answers, thus generalizing and extending several previously proposed approaches of provenance and missing query result explanations.   We develop our framework starting from the well-studied definition of actual causes by Halpern and Pearl. After identifying some undesirable characteristics of the original definition, we propose functional causes as a refined definition of causality with several desirable properties. 
These properties allow us to apply our notion of causality in a database context and apply it uniformly to define the causes of query results and their individual contributions in several ways: (i) we can model both provenance and non-answers, (ii) we can define explanations as either data in the input relations or relational operations in a query plan, and (iii) we can give graded degrees of responsibility to individual causes, thus allowing us to rank causes. In particular, our approach allows us to explain contributions to relational aggregate functions and to rank causes according to their respective responsibilities. We give complexity results and describe polynomial algorithms for evaluating causality in tractable cases. Throughout the paper, we illustrate the applicability of our framework with several examples. Overall, we develop in this paper the theoretical foundations of causality theory in a database context.\nThe Internet and expert systems have offered new ways of sharing and distributing knowledge, but there has been little research on web-based expert systems. This paper presents the development of a web-based expert system for the regulations of civil service in the Kingdom of Saudi Arabia, named RCSES. It is the first system developed for applying civil service regulations, and the first such system to be developed using a web-based approach. The proposed system considers 17 regulations of the civil service system. The different phases of developing RCSES are presented, such as knowledge acquisition and selection, and ontology and knowledge representation in XML format. XML rule-based knowledge sources and the inference mechanisms were implemented using ASP.NET. An interactive tool was built for entering the ontology and knowledge base and for performing inference. It makes it easy to use, modify, update, and extend the existing knowledge base. 
The knowledge was validated by experts in the domain of civil service regulations, and the proposed RCSES was tested, verified, and validated by different technical users and the development staff. RCSES is compared with other related web-based expert systems, and the comparison demonstrated its quality, usability, and high performance.\nIn our research we investigate the output accuracy of discrete event simulation models and agent based simulation models when studying human centric complex systems. In this paper we focus on human reactive behaviour, as both modelling approaches can implement human reactive behaviour in the model using standard methods. As a case study we have chosen the retail sector, in particular the operations of the fitting room in the womenswear department of a large UK department store. In our case study we looked at ways of determining the efficiency of implementing new management policies for the fitting room operation through modelling the reactive behaviour of staff and customers of the department. First, we carried out a validation experiment in which we compared the results from our models to the performance of the real system. This experiment also allowed us to establish differences in output accuracy between the two modelling methods. In a second step, a multi-scenario experiment was carried out to study the behaviour of the models when they are used for the purpose of operational improvement. Overall, we found that for our case study both discrete event simulation and agent based simulation have the same potential to support the investigation into the efficiency of implementing new management policies.\nMany problems can be specified by patterns of propositional formulae depending on a parameter, e.g. the specification of a circuit usually depends on the number of bits of its input. We define a logic whose formulae, called "iterated schemata", allow one to express such patterns. 
Schemata extend propositional logic with indexed propositions, e.g. P_i, P_i+1, P_1, and with generalized connectives, e.g. /\\i=1..n or \\/i=1..n (called "iterations"), where n is an (unbound) integer variable called a "parameter". The expressive power of iterated schemata is strictly greater than that of propositional logic; it is even beyond the scope of first-order logic. We define a proof procedure, called DPLL*, that can prove that a schema is satisfiable for at least one value of its parameter, in the spirit of the DPLL procedure. However, the converse problem, i.e. proving that a schema is unsatisfiable for every value of the parameter, is undecidable, so DPLL* does not terminate in general. Still, we prove that it terminates for schemata of a syntactic subclass called "regularly nested". This is the first non-trivial class for which DPLL* is proved to terminate. Furthermore, the class of regularly nested schemata is the first decidable class to allow nesting of iterations, i.e. to allow schemata of the form /\\i=1..n (/\\j=1..n ...).\nFaces are highly deformable objects which may easily change their appearance over time. Not all face areas are subject to the same variability. Therefore, decoupling the information from independent areas of the face is of paramount importance to improve the robustness of any face recognition technique. This paper presents a robust face recognition technique based on the extraction and matching of SIFT features related to independent face areas. Both a global and a local (recognition from parts) matching strategy are proposed. The local strategy is based on matching individual salient facial SIFT features connected to facial landmarks such as the eyes and the mouth. In the global matching strategy, all SIFT features are combined to form a single feature. In order to reduce identification errors, the Dempster-Shafer decision theory is applied to fuse the two matching techniques. 
The proposed algorithms are evaluated with the ORL and the IITK face databases. The experimental results demonstrate the effectiveness and potential of the proposed face recognition techniques also in the case of partially occluded faces or with missing information.\nEar biometrics is considered one of the most reliable and invariant biometric characteristics, in line with iris and fingerprint characteristics. In many cases, ear biometrics can be compared with face biometrics regarding many physiological and texture characteristics. In this paper, a robust and efficient ear recognition system is presented, which uses the Scale Invariant Feature Transform (SIFT) as feature descriptor for structural representation of ear images. In order to make it more robust to user authentication, only the regions having color probabilities in certain ranges are considered for invariant SIFT feature extraction, where the K-L divergence is used for keeping color consistency. The ear skin color model is formed by a Gaussian mixture model (GMM) and by clustering the ear color pattern using vector quantization. Finally, K-L divergence is applied to the GMM framework for recording the color similarity in the specified ranges by comparing color similarity between a pair of reference model and probe ear images. After segmentation of ear images in some color slice regions, SIFT keypoints are extracted and an augmented vector of extracted SIFT features is created for matching, which is accomplished between a pair of reference model and probe ear images. The proposed technique has been tested on the IITK Ear database and the experimental results show improvements in recognition accuracy while invariant features are extracted from color slice regions to maintain the robustness of the system.\nThe search for patterns or motifs in data represents an area of key interest to many researchers.
In this paper we present the Motif Tracking Algorithm, a novel immune inspired pattern identification tool that is able to identify unknown motifs which repeat within time series data. The power of the algorithm is derived from its use of a small number of parameters with minimal assumptions. The algorithm searches from a completely neutral perspective that is independent of the data being analysed, and the underlying motifs. In this paper the motif tracking algorithm is applied to the search for patterns within sequences of low level system calls between the Linux kernel and the operating system's user space. The MTA is able to compress data found in large system call data sets to a limited number of motifs which summarise that data. The motifs provide a resource from which a profile of executed processes can be built. The potential for these profiles and new implications for security research are highlighted. A higher level call system language for measuring similarity between patterns of such calls is also suggested.\nIn this paper, an AI-based modeling technique is investigated to optimize the development of new alloys with the necessary improvements in properties and chemical mixture over existing alloys, as per the functional requirements of the product. The current research work applies AI-based predictions to establish an association between material and product design. Advanced computational simulation techniques such as CFD and FEA are used to validate product dynamics in the context of experimental investigations. Accordingly, the current research is focused on binding relationships between the material design and product design domains. The input to the feed-forward back-propagation prediction network model consists of material design features. Parameters relevant to product design strategies are furnished as target outputs. The outcomes of the ANN show good signs of correlation between the material and product design domains.
The study opens a new path to illustrate material factors at the time of new product development.\nThe application of bio-inspired algorithms to complicated power system stability problems has recently attracted researchers in the field of Artificial Intelligence. Low frequency oscillations after a disturbance in a power system, if not sufficiently damped, can drive the system unstable. This paper provides a systematic procedure to damp the low frequency oscillations based on the bio-inspired Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The proposed controller design is based on formulating an optimization criterion for system damping ratio enhancement to compute the optimal controller parameters for better stability. The novel and contrasting feature of this work is the mathematical modeling and simulation of the synchronous generator model including the Steam Governor Turbine (GT) dynamics. To show the robustness of the proposed controller, nonlinear time-domain simulations have been carried out under various system operating conditions. Also, a detailed comparative study has been done to show the superiority of the bio-inspired algorithm based controllers over the conventional lead-lag controller.\nThe max-product algorithm, a local message-passing scheme that attempts to compute the most probable assignment (MAP) of a given probability distribution, has been successfully employed as a method of approximate inference for applications arising in coding theory, computer vision, and machine learning. However, the max-product algorithm is not guaranteed to converge to the MAP assignment, and if it does, is not guaranteed to recover the MAP assignment.   Alternative convergent message-passing schemes have been proposed to overcome these difficulties.
This work provides a systematic study of such message-passing algorithms that extends the known results by exhibiting new sufficient conditions for convergence to local and/or global optima, providing a combinatorial characterization of these optima based on graph covers, and describing a new convergent and correct message-passing algorithm whose derivation unifies many of the known convergent message-passing algorithms.   While convergent and correct message-passing algorithms represent a step forward in the analysis of max-product style message-passing algorithms, the conditions needed to guarantee convergence to a global optimum can be too restrictive in both theory and practice. This limitation of convergent and correct message-passing schemes is characterized by graph covers and illustrated by example.\nAssociation rules are among the most widely employed data analysis methods in the field of Data Mining. An association rule is a form of partial implication between two sets of binary variables. In the most common approach, association rules are parameterized by a lower bound on their confidence, which is the empirical conditional probability of their consequent given the antecedent, and/or by some other parameter bounds such as \"support\" or deviation from independence. We study here notions of redundancy among association rules from a fundamental perspective. We see each transaction in a dataset as an interpretation (or model) in the propositional logic sense, and consider existing notions of redundancy, that is, of logical entailment, among association rules, of the form \"any dataset in which this first rule holds must obey also that second rule, therefore the second is redundant\". We discuss several existing alternative definitions of redundancy between association rules and provide new characterizations and relationships among them. 
We show that the main alternatives we discuss correspond actually to just two variants, which differ in the treatment of full-confidence implications. For each of these two notions of redundancy, we provide a sound and complete deduction calculus, and we show how to construct complete bases (that is, axiomatizations) of absolutely minimum size in terms of the number of rules. We explore finally an approach to redundancy with respect to several association rules, and fully characterize its simplest case of two partial premises.\nPersonalized web services strive to adapt their services (advertisements, news articles, etc) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are both fast in learning and computation.   In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.   The contributions of this work are three-fold. First, we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory. Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. 
Results showed a 12.5% click lift compared to a standard context-free bandit algorithm, and the advantage becomes even greater when data gets more scarce.\nRecent advancement in web services plays an important role in business-to-business and business-to-consumer interaction. A discovery mechanism is not only used to find a suitable service but also provides collaboration between service providers and consumers by using standard protocols. A static web service discovery mechanism is not only time consuming but also requires continuous human interaction. This paper proposes an efficient dynamic web services discovery mechanism that can locate relevant and updated web services from service registries and repositories with timestamps, based on indexing value and categorization, for faster and more efficient discovery of services. The proposed prototype focuses on quality of service issues and introduces the concepts of a local cache, categorization of services, an indexing mechanism, a CSP (Constraint Satisfaction Problem) solver, aging, and the use of a translator. The performance of the proposed framework is evaluated by implementing the algorithm, and the correctness of our method is shown. The results of the proposed framework show greater performance and accuracy in the dynamic discovery of web services, resolving the existing issues of flexibility and scalability based on quality of service, and discovering updated and most relevant services with ease of use.\nHandwritten numeral recognition is in general a benchmark problem of Pattern Recognition and Artificial Intelligence. Compared to the problem of printed numeral recognition, the problem of handwritten numeral recognition is compounded due to variations in shapes and sizes of handwritten characters. Considering all these, the present work addresses the problem of handwritten numeral recognition with respect to handwritten Arabic numerals.
Arabic is spoken throughout the Arab World and is the fifth most popular language in the world, slightly ahead of Portuguese and Bengali. For the present work, a feature set of 88 features is designed to represent samples of handwritten Arabic numerals. It includes 72 shadow and 16 octant features. A Multi Layer Perceptron (MLP) based classifier is used here for recognition of handwritten Arabic digits represented with the said feature set. On experimentation with a database of 3000 samples, the technique yields an average recognition rate of 94.93% evaluated after three-fold cross validation of results. It is useful for applications related to OCR of handwritten Arabic digits and can also be extended to include OCR of handwritten characters of the Arabic alphabet.\nWe propose a new method for mining frequent patterns in a language that combines both Semantic Web ontologies and rules. In particular we consider the setting of using a language that combines description logics with DL-safe rules. This setting is important for the practical application of data mining to the Semantic Web. We focus on the relation of the semantics of the representation formalism to the task of frequent pattern discovery, and for the core of our method, we propose an algorithm that exploits the semantics of the combined knowledge base. We have developed a proof-of-concept data mining implementation of this. Using this we have empirically shown that using the combined knowledge base to perform semantic tests can make data mining faster by pruning useless candidate patterns before their evaluation. We have also shown that the quality of the set of patterns produced may be improved: the patterns are more compact, and there are fewer patterns. We conclude that exploiting the semantics of a chosen representation formalism is key to the design and application of (onto-)relational frequent pattern discovery methods.
Note: To appear in Theory and Practice of Logic Programming (TPLP)\nThis paper is concerned with developing a semantic agreement maintenance method based on semantic distance, by calculating the change of a local schema or ontology. This approach is important in dynamic and autonomous environments, whereas current approaches assume that the agreement or mapping exists in a static environment. The contribution of this research is a framework based on a semantic agreement maintenance approach for P2P environments. The framework is based on a two-level hybrid P2P model architecture, which consists of two peer types: (1) a super peer, used to register and manage the other peers, and (2) a simple peer, which exports and shares its contents with others. This research develops a model to maintain the semantic agreement in a P2P environment, since current approaches have no mechanism to detect change: they assume that the ontology and local schema are static, which does not hold in dynamic conditions. The main issues are how to calculate the change of a local schema or common ontology, and how to use the calculation result to determine which algorithm to apply in maintaining the agreement. Experiments on the job matching domain in Indonesia have been performed to show the performance of the approach. The main results of the experiments are: (i) the more change, the more the F-measure value tends to decrease, (ii) there is no significant difference in F-measure value across modification types (add, delete, rename), and (iii) the correct choice of algorithm would improve the F-measure value.\nMulti-agent systems offer a new and exciting way of understanding the world of work. We apply agent-based modeling and simulation to investigate a set of problems in a retail context. Specifically, we are working to understand the relationship between people management practices on the shop-floor and retail performance.
Despite the fact we are working within a relatively novel and complex domain, it is clear that using an agent-based approach offers great potential for improving organizational capabilities in the future. Our multi-disciplinary research team has worked closely with one of the UK's top ten retailers to collect data and build an understanding of shop-floor operations and the key actors in a department (customers, staff, and managers). Based on this case study we have built and tested our first version of a retail branch agent-based simulation model where we have focused on how we can simulate the effects of people management practices on customer satisfaction and sales. In our experiments we have looked at employee development and cashier empowerment as two examples of shop floor management practices. In this paper we describe the underlying conceptual ideas and the features of our simulation model. We present a selection of experiments we have conducted in order to validate our simulation model and to show its potential for answering \"what-if\" questions in a retail context. We also introduce a novel performance measure which we have created to quantify customers' satisfaction with service, based on their individual shopping experiences.\nThis paper reports on continuing research into the modelling of an order picking process within a Crossdocking distribution centre using Simulation Optimisation. The aim of this project is to optimise a discrete event simulation model and to understand factors that affect finding its optimal performance. Our initial investigation revealed that the precision of the selected simulation output performance measure and the number of replications required for the evaluation of the optimisation objective function through simulation influences the ability of the optimisation technique. 
We experimented with Common Random Numbers, in order to improve the precision of our simulation output performance measure, and intended to use the number of replications utilised for this purpose as the initial number of replications for the optimisation of our Crossdocking distribution centre simulation model. Our results demonstrate that we can improve the precision of our selected simulation output performance measure value using Common Random Numbers at various levels of replications. Furthermore, after optimising our Crossdocking distribution centre simulation model, we are able to achieve optimal performance using fewer simulation runs for the simulation model which uses Common Random Numbers as compared to the simulation model which does not use Common Random Numbers.\nWe consider a living organism as an observer of the evolution of its environment recording sensory information about the state space X of the environment in real time. Sensory information is sampled and then processed on two levels. On the biological level, the organism serves as an evaluation mechanism of the subjective relevance of the incoming data to the observer: the observer assigns excitation values to events in X it could recognize using its sensory equipment. On the algorithmic level, sensory input is used for updating a database, the memory of the observer whose purpose is to serve as a geometric/combinatorial model of X, whose nodes are weighted by the excitation values produced by the evaluation mechanism. These values serve as a guidance system for deciding how the database should transform as observation data mounts. We define a searching problem for the proposed model and discuss the model's flexibility and its computational efficiency, as well as the possibility of implementing it as a dynamic network of neuron-like units. We show how various easily observable properties of the human memory and thought process can be explained within the framework of this model.
These include: reasoning (with efficiency bounds), errors, temporary and permanent loss of information. We are also able to define general learning problems in terms of the new model, such as the language acquisition problem.\nSolving stochastic optimization problems under partial observability, where one needs to adaptively make decisions with uncertain outcomes, is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive submodularity, generalizing submodular set functions to adaptive policies. We prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy. In addition to providing performance guarantees for both stochastic maximization and coverage, adaptive submodularity can be exploited to drastically speed up the greedy algorithm by using lazy evaluations. We illustrate the usefulness of the concept by giving several examples of adaptive submodular objectives arising in diverse applications including sensor placement, viral marketing and active learning. Proving adaptive submodularity for these problems allows us to recover existing results in these applications as special cases, improve approximation guarantees and handle natural generalizations.\nAs an immune inspired algorithm, the Dendritic Cell Algorithm (DCA) has been applied to a range of problems, particularly in the area of intrusion detection. Ideally, the intrusion detection should be performed in real-time, to continuously detect misuses as soon as they occur. Consequently, the analysis process performed by an intrusion detection system must operate in real-time or near-to real-time. The analysis process of the DCA is currently performed offline, therefore to improve the algorithm's performance we suggest the development of a real-time analysis component. The initial step of the development is to apply segmentation to the DCA. 
This involves segmenting the current output of the DCA into slices and performing the analysis in various ways. Two segmentation approaches are introduced and tested in this paper, namely antigen based segmentation (ABS) and time based segmentation (TBS). The results of the corresponding experiments suggest that applying segmentation produces different and significantly better results in some cases, when compared to the standard DCA without segmentation. Therefore, we conclude that the segmentation is applicable to the DCA for the purpose of real-time analysis.\nIn density estimation task, maximum entropy model (Maxent) can effectively use reliable prior information via certain constraints, i.e., linear constraints without empirical parameters. However, reliable prior information is often insufficient, and the selection of uncertain constraints becomes necessary but poses considerable implementation complexity. Improper setting of uncertain constraints can result in overfitting or underfitting. To solve this problem, a generalization of Maxent, under Tsallis entropy framework, is proposed. The proposed method introduces a convex quadratic constraint for the correction of (expected) Tsallis entropy bias (TEB). Specifically, we demonstrate that the expected Tsallis entropy of sampling distributions is smaller than the Tsallis entropy of the underlying real distribution. This expected entropy reduction is exactly the (expected) TEB, which can be expressed by a closed-form formula and act as a consistent and unbiased correction. TEB indicates that the entropy of a specific sampling distribution should be increased accordingly. This entails a quantitative re-interpretation of the Maxent principle. By compensating TEB and meanwhile forcing the resulting distribution to be close to the sampling distribution, our generalized TEBC Maxent can be expected to alleviate the overfitting and underfitting. We also present a connection between TEB and Lidstone estimator. 
As a result, TEB-Lidstone estimator is developed by analytically identifying the rate of probability correction in Lidstone. Extensive empirical evaluation shows promising performance of both TEBC Maxent and TEB-Lidstone in comparison with various state-of-the-art density estimation methods.\nMessage passing type algorithms such as the so-called Belief Propagation algorithm have recently gained a lot of attention in the statistics, signal processing and machine learning communities as attractive algorithms for solving a variety of optimization and inference problems. As a decentralized, easy to implement and empirically successful algorithm, BP deserves attention from the theoretical standpoint, and here not much is known at the present stage. In order to fill this gap we consider the performance of the BP algorithm in the context of the capacitated minimum-cost network flow problem - the classical problem in the operations research field. We prove that BP converges to the optimal solution in the pseudo-polynomial time, provided that the optimal solution of the underlying problem is unique and the problem input is integral. Moreover, we present a simple modification of the BP algorithm which gives a fully polynomial-time randomized approximation scheme (FPRAS) for the same problem, which no longer requires the uniqueness of the optimal solution. This is the first instance where BP is proved to have fully-polynomial running time. Our results thus provide a theoretical justification for the viability of BP as an attractive method to solve an important class of optimization problems.\nTerrorism has led to many problems in Thai societies, not only property damage but also civilian casualties. Predicting terrorism activities in advance can help prepare and manage risk from sabotage by these activities. This paper proposes a framework focusing on event classification in terrorism domain using fuzzy inference systems (FISs). 
Each FIS is a decision-making model combining fuzzy logic and approximate reasoning. It is generated in five main parts: the input interface, the fuzzification interface, knowledge base unit, decision making unit and output defuzzification interface. Adaptive neuro-fuzzy inference system (ANFIS) is a FIS model adapted by combining the fuzzy logic and neural network. The ANFIS utilizes automatic identification of fuzzy logic rules and adjustment of membership function (MF). Moreover, neural network can directly learn from data set to construct fuzzy logic rules and MF implemented in various applications. FIS settings are evaluated based on two comparisons. The first evaluation is the comparison between unstructured and structured events using the same FIS setting. The second comparison is the model settings between FIS and ANFIS for classifying structured events. The data set consists of news articles related to terrorism events in three southern provinces of Thailand. The experimental results show that the classification performance of the FIS resulting from structured events achieves satisfactory accuracy and is better than the unstructured events. 
In addition, the classification of structured events using ANFIS gives higher performance than the classification using only FIS in the prediction of terrorism events.\nThis thesis investigates the use of problem-specific knowledge to enhance a genetic algorithm approach to multiple-choice optimisation problems. It shows that such information can significantly enhance performance, but that the choice of information and the way it is included are important factors for success. Two multiple-choice problems are considered. The first is constructing a feasible nurse roster that considers as many requests as possible. In the second problem, shops are allocated to locations in a mall subject to constraints and maximising the overall income. Genetic algorithms are chosen for their well-known robustness and ability to solve large and complex discrete optimisation problems. However, a survey of the literature reveals room for further research into generic ways to include constraints into a genetic algorithm framework. Hence, the main theme of this work is to balance feasibility and cost of solutions. In particular, co-operative co-evolution with hierarchical sub-populations, problem structure exploiting repair schemes and indirect genetic algorithms with self-adjusting decoder functions are identified as promising approaches. The research starts by applying standard genetic algorithms to the problems and explaining the failure of such approaches due to epistasis. To overcome this, problem-specific information is added in a variety of ways, some of which are designed to increase the number of feasible solutions found whilst others are intended to improve the quality of such solutions. As well as a theoretical discussion as to the underlying reasons for using each operator, extensive computational experiments are carried out on a variety of data. These show that the indirect approach relies less on problem structure and hence is easier to implement and superior in solution quality.\nThe
successful execution of a construction project is heavily impacted by making the right decisions during tendering processes. Managing tender procedures is very complex and uncertain, involving coordination of many tasks and individuals with different priorities and objectives. Bias and inconsistent decisions are inevitable if the decision-making process depends totally on intuition, subjective judgement or emotion. To make transparent decisions and ensure healthy competition in tendering, there is a need for a flexible guidance tool for decision support. The aim of this paper is to review current practices of Decision Support Systems (DSS) technology in construction tendering processes. Current practices of general tendering processes as applied in most countries in different regions, such as the United States, Europe, the Middle East and Asia, are comprehensively discussed. Applications of Web-based tendering processes are also summarised in terms of their properties. Besides that, a summary of Decision Support System (DSS) components is included in the next section. Furthermore, prior research on the implementation of DSS approaches in tendering processes is discussed in detail. Current issues arising from both paper-based and Web-based tendering processes are outlined. Finally, a conclusion is included at the end of this paper.\nInter-subject parcellation of functional Magnetic Resonance Imaging (fMRI) data based on a standard General Linear Model (GLM) and spectral clustering was recently proposed as a means to alleviate the issues associated with spatial normalization in fMRI. However, for all its appeal, a GLM-based parcellation approach introduces its own biases, in the form of a priori knowledge about the shape of Hemodynamic Response Function (HRF) and task-related signal changes, or about the subject behaviour during the task.
In this paper, we introduce a data-driven version of the spectral clustering parcellation, based on Independent Component Analysis (ICA) and Partial Least Squares (PLS) instead of the GLM. First, a number of independent components are automatically selected. Seed voxels are then obtained from the associated ICA maps and we compute the PLS latent variables between the fMRI signal of the seed voxels (which covers regional variations of the HRF) and the principal components of the signal across all voxels. Finally, we parcellate all subjects' data with a spectral clustering of the PLS latent variables. We present results of the application of the proposed method on both single-subject and multi-subject fMRI datasets. Preliminary experimental results, evaluated with intra-parcel variance of GLM t-values and PLS derived t-values, indicate that this data-driven approach offers improvement in terms of parcellation accuracy over GLM based techniques.\nBiometric authentication techniques are more consistent and efficient than conventional authentication techniques and can be used in monitoring, transaction authentication, information retrieval, access control, forensics, etc. In this paper, we have presented a detailed comparative analysis between Principal Component Analysis (PCA) and Independent Component Analysis (ICA), which are used for feature extraction, on the basis of different Artificial Neural Networks (ANNs) such as Back Propagation (BP), Radial Basis Function (RBF) and Learning Vector Quantization (LVQ). In this paper, we have chosen \"TULIPS1 database, (Movellan, 1995)\" which is a small audiovisual database of 12 subjects saying the first 4 digits in English for the incorporation of the above methods. The six geometric lip features i.e.
height of the outer corners of the mouth, width of the outer corners of the mouth, height of the inner corners of the mouth, width of the inner corners of the mouth, height of the upper lip, and height of the lower lip which extracts the identity relevant information are considered for the research work. After the comprehensive analysis and evaluation a maximum of 91.07% accuracy in speaker recognition is achieved using PCA and RBF and 87.36% accuracy is achieved using ICA and RBF. Speaker identification has a wide scope of applications such as access control, monitoring, transaction authentication, information retrieval, forensics, etc.\nWe often encounter probability distributions given as unnormalized products of non-negative functions. The factorization structures are represented by hypergraphs called factor graphs. Such distributions appear in various fields, including statistics, artificial intelligence, statistical physics, error correcting codes, etc. Given such a distribution, computations of marginal distributions and the normalization constant are often required. However, they are computationally intractable because of their computational costs. One successful approximation method is Loopy Belief Propagation (LBP) algorithm. The focus of this thesis is an analysis of the LBP algorithm. If the factor graph is a tree, i.e. having no cycle, the algorithm gives the exact quantities. If the factor graph has cycles, however, the LBP algorithm does not give exact results and possibly exhibits oscillatory and non-convergent behaviors. The thematic question of this thesis is \"How the behaviors of the LBP algorithm are affected by the discrete geometry of the factor graph?\" The primary contribution of this thesis is the discovery of a formula that establishes the relation between the LBP, the Bethe free energy and the graph zeta function. 
This formula provides new techniques for the analysis of the LBP algorithm, connecting properties of the graph with those of the LBP and the Bethe free energy. We demonstrate applications of the techniques to several problems, including the (non)convexity of the Bethe free energy and the uniqueness and stability of the LBP fixed point. We also discuss the loop series initiated by Chertkov and Chernyak. The loop series is a subgraph expansion of the normalization constant, or partition function, and reflects the graph geometry. We investigate the theoretical nature of the series. Moreover, we show a partial connection between the loop series and the graph zeta function.\nOne of the objectives of designing feature selection learning algorithms is to obtain classifiers that depend on a small number of attributes and have verifiable future performance guarantees. There are few, if any, approaches that successfully address the two goals simultaneously. Performance guarantees become crucial for tasks such as microarray data analysis due to very small sample sizes resulting in limited empirical evaluation. To the best of our knowledge, such algorithms that give theoretical bounds on the future performance have not been proposed so far in the context of the classification of gene expression data. In this work, we investigate the premise of learning a conjunction (or disjunction) of decision stumps in Occam's Razor, Sample Compression, and PAC-Bayes learning settings for identifying a small subset of attributes that can be used to perform reliable classification tasks. We apply the proposed approaches for gene identification from DNA microarray data and compare our results to those of well-known successful approaches proposed for the task. We show that our algorithm not only finds hypotheses with a much smaller number of genes while giving competitive classification accuracy, but also has tight risk guarantees on future performance, unlike other approaches. 
The proposed approaches are general and extensible in terms of both designing novel algorithms and application to other domains.\nProbabilistic databases play a crucial role in the management and understanding of uncertain data. However, incorporating probabilities into the semantics of incomplete databases has posed many challenges, forcing systems to sacrifice modeling power, scalability, or restrict the class of relational algebra formula under which they are closed. We propose an alternative approach where the underlying relational database always represents a single world, and an external factor graph encodes a distribution over possible worlds; Markov chain Monte Carlo (MCMC) inference is then used to recover this uncertainty to a desired level of fidelity. Our approach allows the efficient evaluation of arbitrary queries over probabilistic databases with arbitrary dependencies expressed by graphical models with structure that changes during inference. MCMC sampling provides efficiency by hypothesizing {\\em modifications} to possible worlds rather than generating entire worlds from scratch. Queries are then run over the portions of the world that change, avoiding the onerous cost of running full queries over each sampled world. A significant innovation of this work is the connection between MCMC sampling and materialized view maintenance techniques: we find empirically that using view maintenance techniques is several orders of magnitude faster than naively querying each sampled world. 
We also demonstrate our system's ability to answer relational queries with aggregation, and demonstrate additional scalability through the use of parallelization.\nIn this paper, a new learning algorithm for adaptive network intrusion detection using a naive Bayesian classifier and a decision tree is presented, which performs balanced detection and keeps false positives at an acceptable level for different types of network attacks, and eliminates redundant attributes as well as contradictory examples from training data that make the detection model complex. The proposed algorithm also addresses some difficulties of data mining such as handling continuous attributes, dealing with missing attribute values, and reducing noise in training data. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviours, several data-mining-based intrusion detection techniques have been applied to network-based traffic data and host-based data over the last decades. However, various issues in current intrusion detection systems (IDS) remain to be examined. We compared the performance of our proposed algorithm with existing learning algorithms on the KDD99 benchmark intrusion detection dataset. The experimental results show that the proposed algorithm achieved high detection rates (DR) and significantly reduced false positives (FP) for different types of network intrusions using limited computational resources.\nEmail is used daily by millions of people to communicate around the globe, and it is a mission-critical application for many businesses. Over the last decade, unsolicited bulk email has become a major problem for email users. An overwhelming amount of spam is flowing into users' mailboxes daily. In 2004, an estimated 62% of all email was attributed to spam. 
Spam is not only frustrating for most email users; it also strains the IT infrastructure of organizations and costs businesses billions of dollars in lost productivity. In recent years, spam has evolved from an annoyance into a serious security threat, and is now a prime medium for phishing of sensitive information, as well as the spread of malicious software. This work presents a first approach to attack the spam problem. We propose an algorithm that improves a classifier's results by adjusting its training data. It improves the document's vocabulary representation by detecting good topic descriptors and discriminators.\nRepresenting distributions over permutations can be a daunting task because the number of permutations of $n$ objects scales factorially in $n$. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called \\emph{riffled independence}, encompassing a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the \\emph{riffle shuffle}, common in card games, to combine the two permutations to form a single permutation. Within the context of ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. In this paper, we provide a formal introduction to riffled independence and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers. Additionally, we propose an automated method for discovering sets of items which are riffle independent from a training set of rankings. 
We show that our clustering-like algorithms can be used to discover meaningful latent coalitions from real preference ranking datasets and to learn the structure of hierarchically decomposable models based on riffled independence.\nInterval temporal logics (ITLs) are logics for reasoning about temporal statements expressed over intervals, i.e., periods of time. The most famous ITL studied so far is Halpern and Shoham's HS, which is the logic of Allen's thirteen interval relations. Unfortunately, HS and most of its fragments have an undecidable satisfiability problem. This discouraged research in this area until recently, when a number of non-trivial decidable ITLs were discovered. This paper is a contribution towards the complete classification of all different fragments of HS. We consider different combinations of the interval relations Begins, After, Later and their inverses Abar, Bbar, and Lbar. We know from previous works that the combination ABBbarAbar is decidable only when finite domains are considered (and undecidable elsewhere), and that ABBbar is decidable over the natural numbers. We extend these results by showing that decidability of ABBbar can be further extended to capture the language ABBbarLbar, which lies between ABBbar and ABBbarAbar, and which turns out to be maximal w.r.t. decidability over strongly discrete linear orders (e.g., finite orders, the naturals, the integers). We also prove that the proposed decision procedure is optimal with respect to the complexity class.\nThis research investigated the behaviour of a traditional discrete event simulation model and of a combined discrete event and agent based simulation model when modelling human reactive and proactive behaviour in human-centric complex systems. A department store was chosen as a human-centric complex case study, where the operation system of a fitting room in the WomensWear department was investigated. 
We have looked at ways to determine the efficiency of new management policies for the fitting room operation by simulating the reactive and proactive behaviour of staff towards customers. Once the simulation models had been developed and verified, we carried out a validation experiment in the form of a sensitivity analysis. Subsequently, we performed a statistical analysis in which the mixed reactive and proactive behaviour experimental results were compared with some reactive experimental results from previously published works. Generally, this case study found that simple proactive individual behaviour could be modelled in both simulation models. In addition, we found that the traditional discrete event model produced simulation output similar to that of the combined discrete event and agent based simulation when modelling similar human behaviour.\nIn response to a 1997 problem of M. Vidyasagar, we state a necessary and sufficient condition for distribution-free PAC learnability of a concept class $\\mathscr C$ under the family of all non-atomic (diffuse) measures on the domain $\\Omega$. Clearly, finiteness of the classical Vapnik-Chervonenkis dimension of $\\mathscr C$ is a sufficient, but no longer necessary, condition. Besides, learnability of $\\mathscr C$ under non-atomic measures does not imply the uniform Glivenko-Cantelli property with regard to non-atomic measures. Our learnability criterion is stated in terms of a combinatorial parameter $\\VC({\\mathscr C}\\,{\\mathrm{mod}}\\,\\omega_1)$ which we call the VC dimension of $\\mathscr C$ modulo countable sets. The new parameter is obtained by ``thickening up'' single points in the definition of VC dimension to uncountable ``clusters''. Equivalently, $\\VC({\\mathscr C}\\,{\\mathrm{mod}}\\,\\omega_1)\\leq d$ if and only if every countable subclass of $\\mathscr C$ has VC dimension $\\leq d$ outside a countable subset of $\\Omega$. 
The new parameter can also be expressed as the classical VC dimension of $\\mathscr C$ calculated on a suitable subset of a compactification of $\\Omega$. We do not make any measurability assumptions on $\\mathscr C$, assuming instead the validity of Martin's Axiom (MA).\nNotions of core, support and inversion of a soft set have been defined and studied. Soft approximations are soft sets developed through core and support, and are used for granulating the soft space. The membership structure of a soft set has been probed, and many interesting properties are presented. The mathematical apparatus developed so far in this paper yields a detailed analysis of two works viz. [N. Cagman, S. Enginoglu, Soft set theory and uni-int decision making, European Jr. of Operational Research (article in press, available online 12 May 2010)] and [N. Cagman, S. Enginoglu, Soft matrix theory and its decision making, Computers and Mathematics with Applications 59 (2010) 3308 - 3314.]. We prove (Theorem 8.1) that the uni-int method of Cagman is equivalent to a core-support expression which is computationally far less expensive than uni-int. This also highlights some shortcomings in Cagman's uni-int method and thus motivates us to improve the method. We first suggest an improvement to the uni-int method and then present a new conjecture to solve the optimum choice problem given by Cagman and Enginoglu. Our Example 8.6 presents a case where the optimum choice is intuitively clear, yet both uni-int methods (Cagman's and our improved one) give the wrong answer, whereas the new conjecture solves the problem correctly.\nConstraint satisfaction problems (or CSPs) have been extensively studied in, for instance, artificial intelligence, database theory, graph theory, and statistical physics. From a practical viewpoint, it is beneficial to approximately solve those CSPs. 
When one tries to approximate the total number of truth assignments that satisfy all Boolean-valued constraints for (unweighted) Boolean CSPs, there is a known trichotomy theorem by which all such counting problems are neatly classified into exactly three categories under polynomial-time (randomized) approximation-preserving reductions. In contrast, we obtain a dichotomy theorem of approximate counting for complex-weighted Boolean CSPs, provided that all complex-valued unary constraints are freely available to use. It is the expressive power of free unary constraints that enables us to prove such a stronger, complete classification theorem. This discovery makes a step forward in the quest for the approximation-complexity classification of all counting CSPs. To deal with complex weights, we employ proof techniques of factorization and arity reduction along the line of solving Holant problems. Moreover, we introduce a novel notion of T-constructibility that naturally induces approximation-preserving reducibility. Our result also gives an approximation analogue of the dichotomy theorem on the complexity of exact counting for complex-weighted Boolean CSPs.\nWe consider a common type of symmetry where we have a matrix of decision variables with interchangeable rows and columns. A simple and efficient method to deal with such row and column symmetry is to post symmetry breaking constraints like DOUBLELEX and SNAKELEX. We provide a number of positive and negative results on posting such symmetry breaking constraints. On the positive side, we prove that we can compute in polynomial time a unique representative of an equivalence class in a matrix model with row and column symmetry if the number of rows (or of columns) is bounded and in a number of other special cases. On the negative side, we show that whilst DOUBLELEX and SNAKELEX are often effective in practice, they can leave a large number of symmetric solutions in the worst case. 
In addition, we prove that propagating DOUBLELEX completely is NP-hard. Finally we consider how to break row, column and value symmetry, correcting a result in the literature about the safeness of combining different symmetry breaking constraints. We end with the first experimental study on how much symmetry is left by DOUBLELEX and SNAKELEX on some benchmark problems.\nClassic decision-theory is based on the maximum expected utility (MEU) principle, but crucially ignores the resource costs incurred when determining optimal decisions. Here we propose an axiomatic framework for bounded decision-making that considers resource costs. Agents are formalized as probability measures over input-output streams. We postulate that any such probability measure can be assigned a corresponding conjugate utility function based on three axioms: utilities should be real-valued, additive and monotonic mappings of probabilities. We show that these axioms enforce a unique conversion law between utility and probability (and thereby, information). Moreover, we show that this relation can be characterized as a variational principle: given a utility function, its conjugate probability measure maximizes a free utility functional. Transformations of probability measures can then be formalized as a change in free utility due to the addition of new constraints expressed by a target utility function. Accordingly, one obtains a criterion to choose a probability measure that trades off the maximization of a target utility function and the cost of the deviation from a reference distribution. We show that optimal control, adaptive estimation and adaptive control problems can be solved this way in a resource-efficient way. When resource costs are ignored, the MEU principle is recovered. 
Our formalization might thus provide a principled approach to bounded rationality that establishes a close link to information theory.\nFrom the advent of the application of satellite imagery to land cover mapping, one of the growing areas of research interest has been in the area of image classification. Image classifiers are algorithms used to extract land cover information from satellite imagery. Most of the initial research has focussed on the development and application of algorithms to better existing and emerging classifiers. In this paper, a paradigm shift is proposed whereby a committee of classifiers is used to determine the final classification output. Two of the key components of an ensemble system are that there should be diversity among the classifiers and that there should be a mechanism through which the results are combined. In this paper, the members of the ensemble system include: Linear SVM, Gaussian SVM and Quadratic SVM. The final output was determined through a simple majority vote of the individual classifiers. From the results obtained it was observed that the final derived map generated by an ensemble system can potentially improve on the results derived from the individual classifiers making up the ensemble system. The ensemble system classification accuracy was, in this case, better than the linear and quadratic SVM result. It was however less than that of the RBF SVM. Areas for further research could focus on improving the diversity of the ensemble system used in this research.\nStrategic Environmental Assessment is a procedure aimed at introducing systematic assessment of the environmental effects of plans and programs. This procedure is based on the so-called coaxial matrices that define dependencies between plan activities (infrastructures, plants, resource extractions, buildings, etc.) and positive and negative environmental impacts, and dependencies between these impacts and environmental receptors. 
Up to now, this procedure is manually implemented by environmental experts for checking the environmental effects of a given plan or program, but it is never applied during the plan/program construction. A decision support system, based on a clear logic semantics, would be an invaluable tool not only in assessing a single, already defined plan, but also during the planning process in order to produce an optimized, environmentally assessed plan and to study possible alternative scenarios. We propose two logic-based approaches to the problem, one based on Constraint Logic Programming and one on Probabilistic Logic Programming that could be, in the future, conveniently merged to exploit the advantages of both. We test the proposed approaches on a real energy plan and we discuss their limitations and advantages.\nThis paper presents new results for the (partial) maximum a posteriori (MAP) problem in Bayesian networks, which is the problem of querying the most probable state configuration of some of the network variables given evidence. First, it is demonstrated that the problem remains hard even in networks with very simple topology, such as binary polytrees and simple trees (including the Naive Bayes structure). Such proofs extend previous complexity results for the problem. Inapproximability results are also derived in the case of trees if the number of states per variable is not bounded. Although the problem is shown to be hard and inapproximable even in very simple scenarios, a new exact algorithm is described that is empirically fast in networks of bounded treewidth and bounded number of states per variable. The same algorithm is used as basis of a Fully Polynomial Time Approximation Scheme for MAP under such assumptions. Approximation schemes were generally thought to be impossible for this problem, but we show otherwise for classes of networks that are important in practice. 
The algorithms are extensively tested using some well-known networks as well as randomly generated cases to show their effectiveness.\nAn approach to the revision of logic programs under the answer set semantics is presented. For programs P and Q, the goal is to determine the answer sets that correspond to the revision of P by Q, denoted P * Q. A fundamental principle of classical (AGM) revision, and the one that guides the approach here, is the success postulate. In AGM revision, this stipulates that A is in K * A. By analogy with the success postulate, for programs P and Q, this means that the answer sets of Q will in some sense be contained in those of P * Q. The essential idea is that for P * Q, a three-valued answer set for Q, consisting of positive and negative literals, is first determined. The positive literals constitute a regular answer set, while the negated literals make up a minimal set of naf literals required to produce the answer set from Q. These literals are propagated to the program P, along with those rules of Q that are not decided by these literals. The approach differs from work in update logic programs in two main respects. First, we ensure that the revising logic program has higher priority, and so we satisfy the success postulate; second, for the preference implicit in a revision P * Q, the program Q as a whole takes precedence over P, unlike update logic programs, since answer sets of Q are propagated to P. We show that a core group of the AGM postulates is satisfied, as are the postulates that have been proposed for update logic programs.\nThe stable marriage problem is a well-known problem of matching men to women so that no man and woman, who are not married to each other, both prefer each other. Such a problem has a wide variety of practical applications, ranging from matching resident doctors to hospitals, to matching students to schools or more generally to any two-sided market. 
In the classical stable marriage problem, both men and women express a strict preference order over the members of the other sex, in a qualitative way. Here we consider stable marriage problems with quantitative preferences: each man (resp., woman) provides a score for each woman (resp., man). Such problems are more expressive than the classical stable marriage problems. Moreover, in some real-life situations it is more natural to express scores (to model, for example, profits or costs) rather than a qualitative preference ordering. In this context, we define new notions of stability and optimality, and we provide algorithms to find marriages which are stable and/or optimal according to these notions. While expressivity greatly increases by adopting quantitative preferences, we show that in most cases the desired solutions can be found by adapting existing algorithms for the classical stable marriage problem.\nIn this work we have compared two indexing algorithms that have been used to index and retrieve Carnatic music songs. We have compared a modified algorithm of the Dual ternary indexing algorithm for music indexing and retrieval with the multi-key hashing indexing algorithm proposed by us. The modification in the dual ternary algorithm was essential to handle variable length query phrase and to accommodate features specific to Carnatic music. The dual ternary indexing algorithm is adapted for Carnatic music by segmenting using the segmentation technique for Carnatic music. The dual ternary algorithm is compared with the multi-key hashing algorithm designed by us for indexing and retrieval in which features like MFCC, spectral flux, melody string and spectral centroid are used as features for indexing data into a hash table. The way in which collision resolution was handled by this hash table is different than the normal hash table approaches. 
It was observed that multi-key hashing based retrieval had lower time complexity than dual-ternary based indexing. The algorithms were also compared on precision and recall, and multi-key hashing had better recall than modified dual ternary indexing for the sample data considered.\nEvent-driven automation of reactive functionalities for complex event processing is an urgent need in today's distributed service-oriented architectures and Web-based event-driven environments. An important problem to be addressed is how to correctly and efficiently capture and process the event-based behavioral, reactive logic embodied in reaction rules, and how to combine this with other conditional decision logic embodied, e.g., in derivation rules. This paper elaborates a homogeneous integration approach that combines derivation rules, reaction rules and other rule types such as integrity constraints into the general framework of logic programming, the industrial-strength version of declarative programming. We describe the syntax and semantics of the language, implement a distributed web-based middleware using enterprise service technologies and illustrate its adequacy in terms of expressiveness, efficiency and scalability through examples extracted from industrial use cases. The developed reaction rule language provides expressive features such as modular ID-based updates with support for external imports and self-updates of the intensional and extensional knowledge bases, transactions including integrity testing and roll-backs of update transition paths. 
It also supports distributed complex event processing, event messaging and event querying via efficient and scalable enterprise middleware technologies and event/action reasoning based on an event/action algebra implemented by an interval-based event calculus variant as a logic inference formalism.\nIn this paper we analyze judgement aggregation problems in which a group of agents independently votes on a set of complex propositions that has some interdependency constraint between them (e.g., transitivity when describing preferences). We consider the issue of judgement aggregation from the perspective of approximation. That is, we generalize the previous results by studying approximate judgement aggregation. We relax the two main constraints assumed in the current literature, Consistency and Independence, and consider mechanisms that only approximately satisfy these constraints, that is, satisfy them up to a small portion of the inputs. The main question we raise is whether the relaxation of these notions significantly alters the class of satisfying aggregation mechanisms. The recent works on preference aggregation by Kalai, Mossel, and Keller fit into this framework. The main result of this paper is that, as in the case of preference aggregation, in the case of a subclass of a natural class of aggregation problems termed `truth-functional agendas', the set of satisfying aggregation mechanisms does not extend non-trivially when relaxing the constraints. Our proof techniques involve the Boolean Fourier transform and analysis of voter influences for voting protocols. The question we raise for Approximate Aggregation can be stated in terms of Property Testing. For instance, as a corollary of our result we get a generalization of the classic result for property testing of linearity of Boolean functions. 
An updated version (RePEc:huj:dispap:dp574R) is available at http://www.ratio.huji.ac.il/dp_files/dp574R.pdf\nSupport vector machines (SVMs) are invaluable tools for many practical applications in artificial intelligence, e.g., classification and event recognition. However, popular SVM solvers are not sufficiently efficient for applications with a great deal of samples as well as a large number of features. In this paper, thus, we present NESVM, a fast gradient SVM solver that can optimize various SVM models, e.g., classical SVM, linear programming SVM and least square SVM. Compared against SVM-Perf \\cite{SVM_Perf}\\cite{PerfML} (its convergence rate in solving the dual SVM is upper bounded by $\\mathcal O(1/\\sqrt{k})$, wherein $k$ is the number of iterations.) and Pegasos \\cite{Pegasos} (online SVM that converges at rate $\\mathcal O(1/k)$ for the primal SVM), NESVM achieves the optimal convergence rate at $\\mathcal O(1/k^{2})$ and a linear time complexity. In particular, NESVM smoothes the non-differentiable hinge loss and $\\ell_1$-norm in the primal SVM. Then the optimal gradient method without any line search is adopted to solve the optimization. In each iteration round, the current gradient and historical gradients are combined to determine the descent direction, while the Lipschitz constant determines the step size. Only two matrix-vector multiplications are required in each iteration round. Therefore, NESVM is more efficient than existing SVM solvers. In addition, NESVM is available for both linear and nonlinear kernels. We also propose \"homotopy NESVM\" to accelerate NESVM by dynamically decreasing the smooth parameter and using the continuation method. Our experiments on census income categorization, indoor/outdoor scene classification, event recognition and scene recognition suggest the efficiency and the effectiveness of NESVM. 
The MATLAB code of NESVM will be available on our website for further assessment.\nThe search strategy of a CP solver is determined by the variable and value ordering heuristics it employs and by the branching scheme it follows. Although the effects of variable and value ordering heuristics on search effort have been widely studied, the effects of different branching schemes have received less attention. In this paper we study this effect through an experimental evaluation that includes standard branching schemes such as 2-way, d-way, and dichotomic domain splitting, as well as variations of set branching where branching is performed on sets of values. We also propose and evaluate a generic approach to set branching where the partition of a domain into sets is created using the scores assigned to values by a value ordering heuristic, and a clustering algorithm from machine learning. Experimental results demonstrate that although exponential differences between branching schemes, as predicted in theory between 2-way and d-way branching, are not very common, still the choice of branching scheme can make quite a difference on certain classes of problems. Set branching methods are very competitive with 2-way branching and outperform it on some problem classes. A statistical analysis of the results reveals that our generic clustering-based set branching method is the best among the methods compared.\nWe motivate and analyse a new Tree Search algorithm, GPTS, based on recent theoretical advances in the use of Gaussian Processes for Bandit problems. We consider tree paths as arms and we assume the target/reward function is drawn from a GP distribution. The posterior mean and variance, after observing data, are used to define confidence intervals for the function values, and we sequentially play arms with highest upper confidence bounds. 
We give an efficient implementation of GPTS and we adapt previous regret bounds by determining the decay rate of the eigenvalues of the kernel matrix on the whole set of tree paths. We consider two kernels in the feature space of binary vectors indexed by the nodes of the tree: linear and Gaussian. The regret grows in square root of the number of iterations T, up to a logarithmic factor, with a constant that improves with bigger Gaussian kernel widths. We focus on practical values of T, smaller than the number of arms. Finally, we apply GPTS to Open Loop Planning in discounted Markov Decision Processes by modelling the reward as a discounted sum of independent Gaussian Processes. We report similar regret bounds to those of the OLOP algorithm.\nAn answer to a query has a well-defined lineage expression (alternatively called how-provenance) that explains how the answer was derived. Recent work has also shown how to compute the lineage of a non-answer to a query. However, the cause of an answer or non-answer is a more subtle notion and consists, in general, of only a fragment of the lineage. In this paper, we adapt Halpern, Pearl, and Chockler's recent definitions of causality and responsibility to define the causes of answers and non-answers to queries, and their degree of responsibility. Responsibility captures the notion of degree of causality and serves to rank potentially many causes by their relative contributions to the effect. Then, we study the complexity of computing causes and responsibilities for conjunctive queries. It is known that computing causes is NP-complete in general. Our first main result shows that all causes to conjunctive queries can be computed by a relational query which may involve negation. Thus, causality can be computed in PTIME, and very efficiently so. Next, we study computing responsibility. Here, we prove that the complexity depends on the conjunctive query and demonstrate a dichotomy between PTIME and NP-complete cases. 
For the PTIME cases, we give a non-trivial algorithm, consisting of a reduction to the max-flow computation problem. Finally, we prove that, even when it is in PTIME, responsibility is complete for LOGSPACE, implying that, unlike causality, it cannot be computed by a relational query.\nComplex network theory aims to model and analyze complex systems that consist of multiple and interdependent components. Among all studies on complex networks, topological structure analysis is of the most fundamental importance, as it represents a natural route to understand the dynamics, as well as to synthesize or optimize the functions, of networks. A broad spectrum of network structural patterns has been reported in the past decade, such as communities, multipartites, hubs, authorities, outliers, bow ties, and others. Here, we show that most individual real-world networks demonstrate multiplex structures. That is, a multitude of known or even unknown (hidden) patterns can coexist in the same network, and moreover they may overlap and nest within each other to collaboratively form a heterogeneous, nested or hierarchical organization, in which different connective phenomena can be observed at different levels of granularity. In addition, we show that the multiplex structures hidden in exploratory networks can be well defined as well as effectively recognized within a unified framework consisting of a set of proposed concepts, models, and algorithms. Our findings provide strong evidence that most real-world complex systems are driven by a combination of heterogeneous mechanisms that may collaboratively shape the ubiquitous multiplex structures we currently observe.
This work also contributes a mathematical tool for analyzing different sources of networks from a new perspective of unveiling multiplex structures, which will be beneficial to multiple disciplines including sociology, economics and computer science.\nMargin theory provides one of the most popular explanations for the success of \\texttt{AdaBoost}, where the central point lies in the recognition that \\textit{margin} is the key for characterizing the performance of \\texttt{AdaBoost}. This theory has been very influential, e.g., it has been used to argue that \\texttt{AdaBoost} usually does not overfit since it tends to enlarge the margin even after the training error reaches zero. Previously the \\textit{minimum margin bound} was established for \\texttt{AdaBoost}; however, \\cite{Breiman1999} pointed out that maximizing the minimum margin does not necessarily lead to better generalization. Later, \\cite{Reyzin:Schapire2006} emphasized that the margin distribution rather than minimum margin is crucial to the performance of \\texttt{AdaBoost}. In this paper, we first present the \\textit{$k$th margin bound} and further study its relationship to previous work such as the minimum margin bound and Emargin bound. Then, we improve the previous empirical Bernstein bounds \\citep{Maurer:Pontil2009,Audibert:Munos:Szepesvari2009}, and based on such findings, we defend the margin-based explanation against Breiman's doubts by proving a new generalization error bound that considers exactly the same factors as \\cite{Schapire:Freund:Bartlett:Lee1998} but is sharper than \\cite{Breiman1999}'s minimum margin bound. By incorporating factors such as average margin and variance, we present a generalization error bound that is closely related to the whole margin distribution.
We also provide margin distribution bounds for generalization error of voting classifiers in finite VC-dimension space.\nPervasive user-centric applications are systems which are meant to sense the presence, mood, and intentions of users in order to optimize user comfort and performance. Building such applications requires not only state-of-the-art techniques from artificial intelligence but also sound software engineering methods for facilitating modular design, runtime adaptation and verification of critical system requirements.   In this paper we focus on high-level design and analysis, and use the algebraic rewriting language Real-Time Maude for specifying applications in a real-time setting. We propose a generic component-based approach for modeling pervasive user-centric systems and we show how to analyze and prove crucial properties of the system architecture through model checking and simulation. For proving time-dependent properties we use Metric Temporal Logic (MTL) and present analysis algorithms for model checking two subclasses of MTL formulas: time-bounded response and time-bounded safety MTL formulas. The underlying idea is to extend the Real-Time Maude model with suitable clocks, to transform the MTL formulas into LTL formulas over the extended specification, and then to use the LTL model checker of Maude. It is shown that these analyses are sound and complete for maximal time sampling. The approach is illustrated by a simple adaptive advertising scenario in which an adaptive advertisement display can react to actions of the users in front of the display.\nThe aim of this work is to study, from the perspective of Abstract Algebraic Logic, some bilattice-based logical systems introduced in the nineties by Ofer Arieli and Arnon Avron. The motivation for such an investigation has two main roots.
On the one hand there is an interest in bilattices as an elegant formalism that gave rise in the last two decades to a variety of applications, especially in the field of Theoretical Computer Science and Artificial Intelligence. In this respect, the present study aims to be a contribution to a better understanding of the mathematical and logical framework that underlies these applications. On the other hand, our interest in bilattice-based logics comes from Abstract Algebraic Logic. In very general terms, algebraic logic can be described as the study of the connections between algebra and logic. One of the main reasons that motivate this study is the possibility of treating logical problems with algebraic methods and vice versa: this is accomplished by associating to a logical system a class of algebraic models that can be regarded as the algebraic counterpart of that logic. Starting from the work of Tarski and his collaborators, the method of algebraizing logics has been increasingly developed and generalized. In the last two decades, algebraic logicians have focused their attention on the process of algebraization itself: this kind of investigation now forms a subfield of algebraic logic known as Abstract Algebraic Logic (which we abbreviate AAL).\nModern scientific data mainly consist of huge datasets gathered by a very large number of techniques and stored in very diversified and often incompatible data repositories. More generally, in the e-science environment, it is considered a critical and urgent requirement to integrate services across distributed, heterogeneous, dynamic \"virtual organizations\" formed by different resources within a single enterprise. In the last decade, Astronomy has become an immensely data-rich field due to the evolution of detectors (plates to digital to mosaics), telescopes and space instruments.
The Virtual Observatory approach consists in the federation under common standards of all astronomical archives available worldwide, as well as data analysis, data mining and data exploration applications. The main drive behind this effort is that, once the infrastructure is completed, it will allow a new type of multi-wavelength, multi-epoch science that can barely be imagined. Data Mining, or Knowledge Discovery in Databases, while being the main methodology to extract the scientific information contained in such MDS (Massive Data Sets), poses crucial problems since it has to orchestrate complex problems posed by transparent access to different computing environments, scalability of algorithms, reusability of resources, etc. In the present paper we summarize the present status of the MDS in the Virtual Observatory and what is currently being done and planned to bring advanced Data Mining methodologies within the framework of the DAME (DAta Mining & Exploration) project.\nWe describe a mathematical model of grounded symbols in the brain. It also serves as a computational foundation for the Perceptual Symbol System (PSS). This development requires new mathematical methods of dynamic logic (DL), which have overcome limitations of classical artificial intelligence and connectionist approaches. The paper discusses these past limitations, relates them to combinatorial complexity (exponential explosion) of algorithms in the past, and further to the static nature of classical logic. The new mathematical theory, DL, is a process-logic. A salient property of this process is the evolution of vague representations into crisp ones. The paper first applies it to one aspect of PSS: situation learning from object perceptions. Then we relate DL to the essential PSS mechanisms of concepts, simulators, grounding, productivity, binding, recursion, and to the mechanisms relating grounded and amodal symbols.
We discuss DL as a general theory describing the process of cognition on multiple levels of abstraction. We also discuss the implications of this theory for interactions between cognition and language, mechanisms of language grounding, and the possible role of language in grounding abstract cognition. The developed theory makes experimental predictions, and will impact future theoretical developments in cognitive science, including knowledge representation and perception-cognition interaction. Experimental neuroimaging evidence for DL and PSS is discussed, as well as future research directions.\nThis report outlines the use of a relational representation in a Multi-Agent domain to model the behaviour of the whole system. A desired property in these systems is the ability of the team members to work together to achieve a common goal in a cooperative manner. The aim is to define a systematic method to verify the effective collaboration among the members of a team and to compare the different multi-agent behaviours. Using external observations of a Multi-Agent System to analyse, model, and recognize agent behaviour could be very useful to direct team actions. In particular, this report focuses on the challenge of autonomous unsupervised sequential learning of the team's behaviour from observations. Our approach allows us to learn a symbolic sequence (a relational representation) to translate raw multi-agent, multi-variate observations of a dynamic, complex environment, into a set of sequential behaviours that are characteristic of the team in question, represented by a set of sequences expressed in first-order logic atoms. We propose to use a relational learning algorithm to mine meaningful frequent patterns among the relational sequences to characterise team behaviours. We compared the performance of two teams in the RoboCup four-legged league environment, which have very different approaches to the game.
One uses a Case-Based Reasoning approach, the other a purely reactive behaviour.\nWe propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation, called Predictive State Temporal Difference (PSTD) learning. As in SSID for predictive state representations, PSTD finds a linear compression operator that projects a large set of features down to a small set that preserves the maximum amount of predictive information. As in RL, PSTD then uses a Bellman recursion to estimate a value function. We discuss the connection between PSTD and prior approaches in RL and SSID. We prove that PSTD is statistically consistent, perform several experiments that illustrate its properties, and demonstrate its potential on a difficult optimal stopping problem.\nPredicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future or which existing interactions we are missing.
Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open.   We develop an algorithm based on Supervised Random Walks that naturally combines the information from the network structure with node-level and edge-level attributes. We achieve this by using these attributes to guide a random walk on the graph. We formulate a supervised learning task where the goal is to learn a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future. We develop an efficient training algorithm to directly learn the edge strength estimation function.   Our experiments on the Facebook social graph and large collaboration networks show that our approach outperforms state-of-the-art unsupervised approaches as well as approaches that are based on feature extraction.\nDistributed search problems are ubiquitous in Artificial Life (ALife). Many distributed search problems require identifying a rare and previously unseen event and producing a rapid response. This challenge amounts to finding and removing an unknown needle in a very large haystack. Traditional computational search models are unlikely to find, let alone appropriately respond to, novel events, particularly given data distributed across multiple platforms in a variety of formats and sources with variable and unknown reliability. Biological systems have evolved solutions to distributed search and response under uncertainty. Immune systems and ant colonies efficiently scale up massively parallel search with automated response in highly dynamic environments, and both do so using distributed coordination without centralized control.
These properties are relevant to ALife, where distributed, autonomous, robust and adaptive control is needed to design robot swarms, mobile computing networks, computer security systems and other distributed intelligent systems. They are also relevant for searching, tracking the spread of ideas, and understanding the impact of innovations in online social networks. We review design principles for Scalable, Robust, Adaptive, Decentralized search with Automated Response (Scalable RADAR) in biology. We discuss how biological RADAR scales up efficiently, and then discuss in detail how modular search in the immune system can be mimicked or built upon in ALife. Such search mechanisms are particularly useful when components have limited capacity to communicate and social or physical distance makes long-distance communication more costly.\nBoth deterministic and indeterministic physical laws are incompatible with control by genuine (non-illusory) free will. We propose that an indeterministic dynamics can be $weakly$ compatible with free will (FW), whereby the latter acts by altering the probability distribution over allowed outcomes. In the quantum physical world, such a FW can collapse the wave function, introducing deviations from the Born rule. In principle, this deviation would stand in conflict with both special relativity and (a variant of) the Strong Church-Turing thesis, implying that the brain may be an arena of exotic, non-standard physics. However, in practice, these deviations would not be directly or easily observable, because they occur in sub-neuronal superpositions in the brain, where they would be shrouded in random measurement errors, noise and statistical fluctuations. Our result elucidates the difference between the FW of human observers and that of observed particles in the Free Will Theorem. This difference is a basic reason why FW (and, in general, consciousness) cannot be recreated by standard artificial intelligence (AI) technology.
We propose various neurobiological experiments to test our proposed theory. We speculate that for observers to be aware of a physical theory such as quantum mechanics, FW is necessary and that the theory must therefore not be universal. We suggest that FW may be regarded as a primitive principle in Nature for explaining quantum indeterminism.\nGraph coloring, also known as vertex coloring, considers the problem of assigning colors to the nodes of a graph such that adjacent nodes do not share the same color. The optimization version of the problem concerns the minimization of the number of used colors. In this paper we deal with the problem of finding valid colorings of graphs in a distributed way, that is, by means of an algorithm that only uses local information for deciding the color of the nodes. Such algorithms dispense with any central control. Because quite a few practical applications require finding colorings in a distributed way, the interest in distributed algorithms for graph coloring has been growing during the last decade. As an example, consider wireless ad-hoc and sensor networks, where tasks such as the assignment of frequencies or the assignment of TDMA slots are strongly related to graph coloring.   The algorithm proposed in this paper is inspired by the calling behavior of Japanese tree frogs. Male frogs use their calls to attract females. Interestingly, groups of males that are located near each other desynchronize their calls. This is because female frogs are only able to correctly localize the male frogs when their calls are not too close in time. We experimentally show that our algorithm is very competitive with the current state of the art, using different sets of problem instances and comparing it with one of the most competitive algorithms from the literature.
These statistical models of conditional independence structure are described by acyclic directed graphs whose nodes correspond to (random) variables under consideration. A quite important topic is the learning of Bayesian network structures, that is, determining the best-fitting statistical model on the basis of given data. Although there are learning methods based on statistical conditional independence tests, contemporary methods are mainly based on maximization of a suitable quality criterion that evaluates how well the graph explains the occurrence of the observed data. This leads to a nonlinear combinatorial optimization problem that is in general NP-hard to solve. In this paper we deal with the complexity of learning restricted Bayesian network structures, that is, we wish to find network structures of highest score within a given subset of all possible network structures. For this, we introduce a new unique algebraic representative for these structures, called the characteristic imset. We show that these imsets are always 0-1-vectors and that they have many nice properties that allow us to simplify long proofs for some known results and to easily establish new complexity results for learning restricted Bayes network structures.\nA huge number of problems, from various areas, are being solved by reducing them to SAT. However, for many applications, translation into SAT is performed by specialized, problem-specific tools. In this paper we describe a new system for uniform solving of a wide class of problems by reducing them to SAT. The system uses a new specification language URSA that combines imperative and declarative programming paradigms. The reduction to SAT is defined precisely by the semantics of the specification language.
The domain of the approach is wide (e.g., many NP-complete problems can be simply specified and then solved by the system) and there are problems easily solvable by the proposed system that can hardly be solved using other programming languages or constraint programming systems. So, the system can be seen not only as a tool for solving problems by reducing them to SAT, but also as a general-purpose constraint solving system (for finite domains). In this paper, we also describe an open-source implementation of the described approach. The performed experiments suggest that the system is competitive with state-of-the-art related modelling systems.\nThe previous decade has brought a remarkable increase in interest in applications that deal with querying and mining of time series data. Many of the research efforts in this context have focused on introducing new representation methods for dimensionality reduction or novel similarity measures for the underlying data. In the vast majority of cases, each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive experimental study re-implementing eight different time series representations and nine similarity measures and their variants, and testing their effectiveness on thirty-eight time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness.
In addition to providing a unified validation of some of the existing achievements, our experiments also indicate that, in some cases, certain claims in the literature may be unduly optimistic.\nTemporal Logic Model Checking is a verification method in which we describe a system, the model, and then we verify whether some properties, expressed in a temporal logic formula, hold in the system. It has many industrial applications. In order to improve performance, some tools allow preprocessing of the model, verifying on-line a set of properties reusing the same compiled model; we prove that the complexity of the Model Checking problem, without any preprocessing or preprocessing the model or the formula in a polynomial data structure, is the same. As a result, preprocessing does not always exponentially improve performance.   Symbolic Model Checking algorithms work by manipulating sets of states, and these sets are often represented by BDDs. It has been observed that the size of BDDs may grow exponentially as the model and formula increase in size. As a side result, we formally prove that a superpolynomial increase of the size of these BDDs is unavoidable in the worst case. While this exponential growth has been empirically observed, to the best of our knowledge it has never been proved in general terms. This result holds not only for all types of BDDs regardless of the variable ordering, but also for more powerful data structures, such as BEDs, RBCs, MTBDDs, and ADDs.\nIn massively collaborative projects such as scientific or community databases, users often need to agree or disagree on the content of individual data items. On the other hand, trust relationships often exist between users, allowing them to accept or reject other users' beliefs by default. As those trust relationships become complex, however, it becomes difficult to define and compute a consistent snapshot of the conflicting information.
Previous solutions to a related problem, the update reconciliation problem, are dependent on the order in which the updates are processed and, therefore, do not guarantee a globally consistent snapshot. This paper proposes the first principled solution to the automatic conflict resolution problem in a community database. Our semantics is based on the certain tuples of all stable models of a logic program. While evaluating stable models in general is well known to be hard, even for very simple logic programs, we show that the conflict resolution problem admits a PTIME solution. To the best of our knowledge, ours is the first PTIME algorithm that allows conflict resolution in a principled way. We further discuss extensions to negative beliefs and prove that some of these extensions are hard. This work is done in the context of the BeliefDB project at the University of Washington, which focuses on the efficient management of conflicts in community databases.\nThis article deals with the part family formation problem, which is believed to be moderately complicated to solve in polynomial time, in the context of Group Technology (GT). In the past literature, part family formation techniques have been principally based on production flow analysis (PFA), which usually considers operational requirements, sequences and time. Part Coding Analysis (PCA), which is believed to be an effective method for identifying part families, has rarely been considered in GT. PCA classifies parts by allotting them to different families based on their resemblances in: (1) design characteristics such as shape and size, and/or (2) manufacturing characteristics (machining requirements). A novel approach based on simulated annealing, namely SAPFOCS, is adopted in this study to develop effective part families exploiting the PCA technique.
Thereafter, Taguchi's orthogonal design method is employed to address the critical issue of parameter selection for the proposed metaheuristic algorithm. The adopted technique is then tested on 5 different datasets of size 5 {\\times} 9 to 27 {\\times} 9 and the obtained results are compared with the C-Linkage clustering technique. The experimental results show that the proposed metaheuristic algorithm is extremely effective in terms of the quality of the solution obtained and outperforms the C-Linkage algorithm in most instances.\nIn a Role-Playing Game, finding optimal trajectories is one of the most important tasks. In fact, the strategy decision system becomes a key component of a game engine. Determining the way in which decisions are taken (online, batch or simulated) and the resources consumed in decision making (e.g. execution time, memory) will strongly influence game performance. When classical search algorithms such as A* can be used, they are the very first option. Nevertheless, such methods rely on precise and complete models of the search space, and there are many interesting scenarios where their application is not possible. In those cases, model-free methods for sequential decision making under uncertainty are the best choice. In this paper, we propose a heuristic planning strategy to incorporate the ability of heuristic search in path-finding into a Dyna agent. The proposed Dyna-H algorithm, as A* does, selects branches more likely to produce outcomes than other branches. In addition, it has the advantages of being a model-free online reinforcement learning algorithm. The proposal was evaluated against the one-step Q-Learning and Dyna-Q algorithms, obtaining excellent experimental results: Dyna-H significantly outperforms both methods in all experiments. We also suggest a functional analogy between the proposed sampling from worst trajectories heuristic and the role of dreams (e.g.
nightmares) in human behavior.\nEngineered systems are designed to deftly operate under predetermined conditions yet are notoriously fragile when unexpected perturbations arise. In contrast, biological systems operate in a highly flexible manner: they quickly learn adequate responses to novel conditions and evolve new routines/traits to remain competitive under persistent environmental change. A recent theory on the origins of biological flexibility has proposed that degeneracy - the existence of multi-functional components with partially overlapping functions - is a primary determinant of the robustness and adaptability found in evolved systems. While degeneracy's contribution to biological flexibility is well documented, there has been little investigation of degeneracy design principles for achieving flexibility in systems engineering. In fact, the conditions that can lead to degeneracy are routinely eliminated in engineering design.   With the planning of transportation vehicle fleets taken as a case study, this paper reports evidence that degeneracy improves robustness and adaptability of a simulated fleet without incurring costs to efficiency. We find degeneracy dramatically increases robustness of a fleet to unpredicted changes in the environment while it also facilitates robustness to anticipated variations. When we allow a fleet's architecture to be adapted in response to environmental change, we find degeneracy can be selectively acquired, leading to faster rates of design adaptation and ultimately to better designs.
Given the range of conditions where favorable short-term and long-term performance outcomes are observed, we propose that degeneracy design principles fundamentally alter the propensity for adaptation and may be useful within several engineering and planning contexts.\nTechniques in which words are represented as vectors have proved useful in many applications in computational linguistics; however, there is currently no general semantic formalism for representing meaning in terms of vectors. We present a framework for natural language semantics in which words, phrases and sentences are all represented as vectors, based on a theoretical analysis which assumes that meaning is determined by context.   In the theoretical analysis, we define a corpus model as a mathematical abstraction of a text corpus. The meaning of a string of words is assumed to be a vector representing the contexts in which it occurs in the corpus model. Based on this assumption, we can show that the vector representations of words can be considered as elements of an algebra over a field. We note that in applications of vector spaces to representing meanings of words there is an underlying lattice structure; we interpret the partial ordering of the lattice as describing entailment between meanings. We also define the context-theoretic probability of a string, and, based on this and the lattice structure, a degree of entailment between strings.   We relate the framework to existing methods of composing vector-based representations of meaning, and show that our approach generalises many of these, including vector addition, component-wise multiplication, and the tensor product.\nExternal information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing it to react to a wide spectrum of environmental changes. High-throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners.
Some of them may be detected using data coming from the integration of a protein-protein interaction network and mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired by statistical physics, and apply this scheme to alpha factor and drug perturbation data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is especially suited for very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field.\nThe emerging need for qualitative approaches in context-aware information processing calls for proper modeling of context information and efficient handling of its inherent uncertainty resulting from human interpretation and usage. Many of the current approaches to context-awareness either lack a solid theoretical basis for modeling or ignore important requirements such as modularity, high-order uncertainty management and group-based context-awareness. Therefore, their real-world application and extendability remain limited. In this paper, we present f-Context as a service-based context-awareness framework, based on language-action perspective (LAP) theory for modeling. Then we identify some of the complex, informational parts of context which contain high-order uncertainties due to differences between members of the group in defining them. An agent-based perceptual computer architecture is proposed for implementing f-Context that uses computing with words (CWW) for handling uncertainty.
The feasibility of f-Context is analyzed using a realistic scenario involving a group of mobile users. We believe that the proposed approach can open the door to future research on context-awareness by offering a theoretical foundation based on human communication, and a service-based layered architecture which exploits CWW for context-aware, group-based and platform-independent access to information systems.\nRecent research in multi-robot exploration and mapping has focused on sampling environmental fields, which are typically modeled using the Gaussian process (GP). Existing information-theoretic exploration strategies for learning GP-based environmental field maps adopt the non-Markovian problem structure and consequently scale poorly with the length of history of observations. Hence, it becomes computationally impractical to use these strategies for in situ, real-time active sampling. To ease this computational burden, this paper presents a Markov-based approach to efficient information-theoretic path planning for active sampling of GP-based fields. We analyze the time complexity of solving the Markov-based path planning problem, and demonstrate analytically that it scales better than that of deriving the non-Markovian strategies with increasing length of planning horizon. For a class of exploration tasks called the transect sampling task, we provide theoretical guarantees on the active sampling performance of our Markov-based policy, from which ideal environmental field conditions and sampling task settings can be established to limit its performance degradation due to violation of the Markov assumption. 
Empirical evaluation on real-world temperature and plankton density field data shows that our Markov-based policy can generally achieve active sampling performance comparable to that of the widely-used non-Markovian greedy policies under less favorable realistic field conditions and task settings while enjoying significant computational gain over them.\nBoolean Satisfiability solvers have gone through dramatic improvements in their performances and scalability over the last few years by considering symmetries. It has been shown that by using graph symmetries and generating symmetry breaking predicates (SBPs) it is possible to break symmetries in Conjunctive Normal Form (CNF). The SBPs cut down the search space to the nonsymmetric regions of the space without affecting the satisfiability of the CNF formula. The symmetry breaking predicates are created by representing the formula as a graph, finding the graph symmetries and using some symmetry extraction mechanism (Crawford et al.). Here in this paper we take one non-trivial CNF and explore its symmetries. Finally, we generate the SBPs and adding it to CNF we show how it helps to prune the search tree, so that SAT solver would take short time. Here we present the pruning procedure of the search tree from scratch, starting from the CNF and its graph representation. As we explore the whole mechanism by a non-trivial example, it would be easily comprehendible. Also we have given a new idea of generating symmetry breaking predicates for breaking symmetry in CNF, not derived from Crawford's conditions. At last we propose a backtrack SAT solver with inbuilt SBP generator.\nOne of the most interesting scientific challenges nowadays deals with the analysis and the understanding of complex networks' dynamics and how their processes lead to emergence according to the interactions among their components. 
In this paper we approach the definition of new methodologies for the visualization and the exploration of the dynamics at play in real dynamic social networks. We present a recently introduced formalism called TVG (for time-varying graphs), which was initially developed to model and analyze highly-dynamic and infrastructure-less communication networks such as mobile ad-hoc networks, wireless sensor networks, or vehicular networks. We discuss its applicability to complex networks in general, and social networks in particular, by showing how it enables the specification and analysis of complex dynamic phenomena in terms of temporal interactions, and allows to easily switch the perspective between local and global dynamics. As an example, we chose the case of scientific communities by analyzing portion of the ArXiv repository (ten years of publications in physics) focusing on the social determinants (e.g. goals and potential interactions among individuals) behind the emergence and the resilience of scientific communities. We consider that scientific communities are at the same time communities of practice (through co-authorship) and that they exist also as representations in the scientists' mind, since references to other scientists' works is not merely an objective link to a relevant work, but it reveals social objects that one manipulates, select and refers to. In the paper we show the emergence/selection of a community as a goal-driven preferential attachment toward a set of authors among which there are some key scientists (Nobel prizes).\nWe give the first analysis of the computational complexity of {\\it coalition structure generation over graphs}. Given an undirected graph $G=(N,E)$ and a valuation function $v:2^N\\rightarrow\\RR$ over the subsets of nodes, the problem is to find a partition of $N$ into connected subsets, that maximises the sum of the components' values. 
This problem is generally NP--complete; in particular, it is hard for a defined class of valuation functions which are {\\it independent of disconnected members}---that is, two nodes have no effect on each other's marginal contribution to their vertex separator. Nonetheless, for all such functions we provide bounds on the complexity of coalition structure generation over general and minor free graphs. Our proof is constructive and yields algorithms for solving corresponding instances of the problem. Furthermore, we derive polynomial time bounds for acyclic, $K_{2,3}$ and $K_4$ minor free graphs. However, as we show, the problem remains NP--complete for planar graphs, and hence, for any $K_k$ minor free graphs where $k\\geq 5$. Moreover, our hardness result holds for a particular subclass of valuation functions, termed {\\it edge sum}, where the value of each subset of nodes is simply determined by the sum of given weights of the edges in the induced subgraph.\nIn this paper we introduce the olog, or ontology log, a category-theoretic model for knowledge representation (KR). Grounded in formal mathematics, ologs can be rigorously formulated and cross-compared in ways that other KR models (such as semantic networks) cannot. An olog is similar to a relational database schema; in fact an olog can serve as a data repository if desired. Unlike database schemas, which are generally difficult to create or modify, ologs are designed to be user-friendly enough that authoring or reconfiguring an olog is a matter of course rather than a difficult chore. It is hoped that learning to author ologs is much simpler than learning a database definition language, despite their similarity. We describe ologs carefully and illustrate with many examples. As an application we show that any primitive recursive function can be described by an olog. We also show that ologs can be aligned or connected together into a larger network using functors. 
The various methods of information flow and institutions can then be used to integrate local and global world-views. We finish by providing several different avenues for future research.\nWe present a novel variant of decision making based on the mathematical theory of separable Hilbert spaces. This mathematical structure captures the effect of superposition of composite prospects, including many incorporated intentions, which allows us to describe a variety of interesting fallacies and anomalies that have been reported to particularize the decision making of real human beings. The theory characterizes entangled decision making, non-commutativity of subsequent decisions, and intention interference. We demonstrate how the violation of the Savage's sure-thing principle, known as the disjunction effect, can be explained quantitatively as a result of the interference of intentions, when making decisions under uncertainty. The disjunction effects, observed in experiments, are accurately predicted using a theorem on interference alternation that we derive, which connects aversion-to-uncertainty to the appearance of negative interference terms suppressing the probability of actions. The conjunction fallacy is also explained by the presence of the interference terms. A series of experiments are analysed and shown to be in excellent agreement with a priori evaluation of interference effects. The conjunction fallacy is also shown to be a sufficient condition for the disjunction effect and novel experiments testing the combined interplay between the two effects are suggested.\nThe problem of consciousness faced several challenges for a few reasons: (a) a lack of necessary and sufficient conditions, without which we would not know how close we are to the solution, (b) a lack of a synthesis framework to build conscious systems and (c) a lack of mechanisms explaining the transition between the lower-level chemical dynamics and the higher-level abstractions. 
In this paper, I address these issues using a new framework. The central result is that a person is 'minimally' conscious if and only if he knows at least one truth. This lets us move away from the vagueness surrounding consciousness and instead focus equivalently on: (i) what truths are and how our brain represents/relates them to each other and (ii) how we attain a feeling of knowing for a truth. For the former problem, since truths are things that do not change, I replace the abstract notion with a dynamical one called fixed sets. These sets are guaranteed to exist for our brain and other stable parallel looped systems. The relationships between everyday events are now built using relationships between fixed sets, until our brain creates a unique dynamical state called the self-sustaining threshold 'membrane' of fixed sets. For the latter problem, I present necessary and sufficient conditions for attaining a feeling of knowing using a definition of continuity applied to abstractions. Combining these results, I now say that a person is minimally conscious if and only if his brain has a self-sustaining dynamical membrane with abstract continuous paths. A synthetic system built to satisfy this equivalent self-sustaining membrane condition appears indistinguishable from human consciousness.\nBisimulations have been widely used in many areas of computer science to model equivalence between various systems, and to reduce the number of states of these systems, whereas uniform fuzzy relations have recently been introduced as a means to model the fuzzy equivalence between elements of two possible different sets. Here we use the conjunction of these two concepts as a powerful tool in the study of equivalence between fuzzy automata. 
We prove that a uniform fuzzy relation between fuzzy automata $\\cal A$ and $\\cal B$ is a forward bisimulation if and only if its kernel and co-kernel are forward bisimulation fuzzy equivalences on $\\cal A$ and $\\cal B$ and there is a special isomorphism between factor fuzzy automata with respect to these fuzzy equivalences. As a consequence we get that fuzzy automata $\\cal A$ and $\\cal B$ are UFB-equivalent, i.e., there is a uniform forward bisimulation between them, if and only if there is a special isomorphism between the factor fuzzy automata of $\\cal A$ and $\\cal B$ with respect to their greatest forward bisimulation fuzzy equivalences. This result reduces the problem of testing UFB-equivalence to the problem of testing isomorphism of fuzzy automata, which is closely related to the well-known graph isomorphism problem. We prove some similar results for backward-forward bisimulations, and we point to fundamental differences. Because of the duality with the studied concepts, backward and forward-backward bisimulations are not considered separately. Finally, we give a comprehensive overview of various concepts on deterministic, nondeterministic, fuzzy, and weighted automata, which are related to bisimulations.\nThe problem of business-IT alignment is of widespread economic concern.   As one way of addressing the problem, this paper describes an online system that functions as a kind of Wiki -- one that supports the collaborative writing and running of business and scientific applications, as rules in open vocabulary, executable English, using a browser.   Since the rules are in English, they are indexed by Google and other search engines. This is useful when looking for rules for a task that one has in mind.   The design of the system integrates the semantics of data, with a semantics of an inference method, and also with the meanings of English sentences. 
As such, the system has functionality that may be useful for the Rules, Logic, Proof and Trust requirements of the Semantic Web.   The system accepts rules, and small numbers of facts, typed or copy-pasted directly into a browser. One can then run the rules, again using a browser. For larger amounts of data, the system uses information in the rules to automatically generate and run SQL over networked databases. From a few highly declarative rules, the system typically generates SQL that would be too complicated to write reliably by hand. However, the system can explain its results in step-by-step hypertexted English, at the business or scientific level   As befits a Wiki, shared use of the system is free.\nThe Elo system for rating chess players, also used in other games and sports, was adopted by the World Chess Federation over four decades ago. Although not without controversy, it is accepted as generally reliable and provides a method for assessing players' strengths and ranking them in official tournaments.   It is generally accepted that the distribution of players' rating data is approximately normal but, to date, no stochastic model of how the distribution might have arisen has been proposed. We propose such an evolutionary stochastic model, which models the arrival of players into the rating pool, the games they play against each other, and how the results of these games affect their ratings. Using a continuous approximation to the discrete model, we derive the distribution for players' ratings at time $t$ as a normal distribution, where the variance increases in time as a logarithmic function of $t$. We validate the model using published rating data from 2007 to 2010, showing that the parameters obtained from the data can be recovered through simulations of the stochastic model.   The distribution of players' ratings is only approximately normal and has been shown to have a small negative skew. 
We show how to modify our evolutionary stochastic model to take this skewness into account, and we validate the modified model using the published official rating data.\nAn emotional version of Sapir-Whorf hypothesis suggests that differences in language emotionalities influence differences among cultures no less than conceptual differences. Conceptual contents of languages and cultures to significant extent are determined by words and their semantic differences; these could be borrowed among languages and exchanged among cultures. Emotional differences, as suggested in the paper, are related to grammar and mostly cannot be borrowed. Conceptual and emotional mechanisms of languages are considered here along with their functions in the mind and cultural evolution. A fundamental contradiction in human mind is considered: language evolution requires reduced emotionality, but \"too low\" emotionality makes language \"irrelevant to life,\" disconnected from sensory-motor experience. Neural mechanisms of these processes are suggested as well as their mathematical models: the knowledge instinct, the language instinct, the dual model connecting language and cognition, dynamic logic, neural modeling fields. Mathematical results are related to cognitive science, linguistics, and psychology. Experimental evidence and theoretical arguments are discussed. Approximate equations for evolution of human minds and cultures are obtained. Their solutions identify three types of cultures: \"conceptual\"-pragmatic cultures, in which emotionality of language is reduced and differentiation overtakes synthesis resulting in fast evolution at the price of uncertainty of values, self doubts, and internal crises; \"traditional-emotional\" cultures where differentiation lags behind synthesis, resulting in cultural stability at the price of stagnation; and \"multi-cultural\" societies combining fast cultural evolution and stability. 
Unsolved problems and future theoretical and experimental directions are discussed.\nKnowledge compilation is an approach to tackle the computational intractability of general reasoning problems. According to this approach, knowledge bases are converted off-line into a target compilation language which is tractable for on-line querying. Reduced ordered binary decision diagram (ROBDD) is one of the most influential target languages. We generalize ROBDD by associating some implied literals in each node and the new language is called reduced ordered binary decision diagram with implied literals (ROBDD-L). Then we discuss a kind of subsets of ROBDD-L called ROBDD-i with precisely i implied literals (0 \\leq i \\leq \\infty). In particular, ROBDD-0 is isomorphic to ROBDD; ROBDD-\\infty requires that each node should be associated by the implied literals as many as possible. We show that ROBDD-i has uniqueness over some specific variables order, and ROBDD-\\infty is the most succinct subset in ROBDD-L and can meet most of the querying requirements involved in the knowledge compilation map. Finally, we propose an ROBDD-i compilation algorithm for any i and a ROBDD-\\infty compilation algorithm. Based on them, we implement a ROBDD-L package called BDDjLu and then get some conclusions from preliminary experimental results: ROBDD-\\infty is obviously smaller than ROBDD for all benchmarks; ROBDD-\\infty is smaller than the d-DNNF the benchmarks whose compilation results are relatively small; it seems that it is better to transform ROBDDs-\\infty into FBDDs and ROBDDs rather than straight compile the benchmarks.\nWe show that several important resource allocation problems in wireless networks fit within the common framework of Constraint Satisfaction Problems (CSPs). 
Inspired by the requirements of these applications, where variables are located at distinct network devices that may not be able to communicate but may interfere, we define natural criteria that a CSP solver must possess in order to be practical. We term these algorithms decentralized CSP solvers. The best known CSP solvers were designed for centralized problems and do not meet these criteria. We introduce a stochastic decentralized CSP solver and prove that it will find a solution in almost surely finite time, should one exist, also showing it has many practically desirable properties. We benchmark the algorithm's performance on a well-studied class of CSPs, random k-SAT, illustrating that the time the algorithm takes to find a satisfying assignment is competitive with stochastic centralized solvers on problems with order a thousand variables despite its decentralized nature. We demonstrate the solver's practical utility for the problems that motivated its introduction by using it to find a non-interfering channel allocation for a network formed from data from downtown Manhattan.\nWe study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as contextual bandits, encompasses a wide variety of applications including health-care policy and Internet advertising. A central task is evaluation of a new policy given historic data consisting of contexts, actions and received rewards. The key challenge is that the past data typically does not faithfully represent proportions of actions taken by a new policy. Previous approaches rely either on models of rewards or models of the past policy. The former are plagued by a large bias whereas the latter have a large variance.   In this work, we leverage the strength and overcome the weaknesses of the two approaches by applying the doubly robust technique to the problems of policy evaluation and optimization. 
We prove that this approach yields accurate value estimates when we have either a good (but not necessarily consistent) model of rewards or a good (but not necessarily consistent) model of past policy. Extensive empirical comparison demonstrates that the doubly robust approach uniformly improves over existing techniques, achieving both lower variance in value estimation and better policies. As such, we expect the doubly robust approach to become common practice.\nSome existing notions of redundancy among association rules allow for a logical-style characterization and lead to irredundant bases of absolutely minimum size. One can push the intuition of redundancy further and find an intuitive notion of interest of an association rule, in terms of its \"novelty\" with respect to other rules. Namely: an irredundant rule is so because its confidence is higher than what the rest of the rules would suggest; then, one can ask: how much higher? We propose to measure such a sort of \"novelty\" through the confidence boost of a rule, which encompasses two previous similar notions (confidence width and rule blocking, of which the latter is closely related to the earlier measure \"improvement\"). Acting as a complement to confidence and support, the confidence boost helps to obtain small and crisp sets of mined association rules, and solves the well-known problem that, in certain cases, rules of negative correlation may pass the confidence bound. We analyze the properties of two versions of the notion of confidence boost, one of them a natural generalization of the other. We develop efficient algorithmics to filter rules according to their confidence boost, compare the concept to some similar notions in the bibliography, and describe the results of some experimentation employing the new notions on standard benchmark datasets. 
We describe an open-source association mining tool that embodies one of our variants of confidence boost in such a way that the data mining process does not require the user to select any value for any parameter.\nSocial computation, whether in the form of searches performed by swarms of agents or collective predictions of markets, often supplies remarkably good solutions to complex problems. In many examples, individuals trying to solve a problem locally can aggregate their information and work together to arrive at a superior global solution. This suggests that there may be general principles of information aggregation and coordination that can transcend particular applications. Here we show that the general structure of this problem can be cast in terms of information theory and derive mathematical conditions that lead to optimal multi-agent searches. Specifically, we illustrate the problem in terms of local search algorithms for autonomous agents looking for the spatial location of a stochastic source. We explore the types of search problems, defined in terms of the statistical properties of the source and the nature of measurements at each agent, for which coordination among multiple searchers yields an advantage beyond that gained by having the same number of independent searchers. We show that effective coordination corresponds to synergy and that ineffective coordination corresponds to independence as defined using information theory. We classify explicit types of sources in terms of their potential for synergy. We show that sources that emit uncorrelated signals provide no opportunity for synergetic coordination while sources that emit signals that are correlated in some way, do allow for strong synergy between searchers. 
These general considerations are crucial for designing optimal algorithms for particular search problems in real world settings.\nHomomorphisms between relational structures are not only fundamental mathematical objects, but are also of great importance in an applied computational context. Indeed, constraint satisfaction problems (CSPs), a wide class of algorithmic problems that occur in many different areas of computer science such as artificial intelligence or database theory, may be viewed as asking for homomorphisms between two relational structures [FedVar98]. In a logical setting, homomorphisms may be viewed as witnesses for positive primitive formulas in a relational language. As we shall see, homomorphisms, or more precisely the numbers of homomorphisms between two structures, are also related to a fundamental computational problem of statistical physics.   In this article, we are concerned with the complexity of counting homomorphisms from a given structure A to a fixed structure B. Actually, we are mainly interested in a generalization of this problem to weighted homomorphisms (or partition functions). We almost exclusively focus on graphs. The first part of the article is a short survey of what is known about the problem. In the second part, we give a proof of a theorem due to Bulatov and the first author of this paper [BulGro05], which classifies the complexity of partition functions described by matrices with non-negative entries. The proof we give here is essentially the same as the original one, with a few shortcuts due to [Thu09], but it is phrased in a different, more graph theoretical language that may make it more accessible to most readers.\nFree variables occur frequently in mathematics and computer science with ad hoc and altering semantics. 
We present the most recent version of our free-variable framework for two-valued logics with properly improved functionality, but only two kinds of free variables left (instead of three): implicitly universally and implicitly existentially quantified ones, now simply called \"free atoms\" and \"free variables\", respectively. The quantificational expressiveness and the problem-solving facilities of our framework exceed standard first-order and even higher-order modal logics, and directly support Fermat's descente infinie. With the improved version of our framework, we can now model also Henkin quantification, neither using quantifiers (binders) nor raising (Skolemization). We propose a new semantics for Hilbert's epsilon as a choice operator with the following features: We avoid overspecification (such as right-uniqueness), but admit indefinite choice, committed choice, and classical logics. Moreover, our semantics for the epsilon supports reductive proof search optimally.\nAnswer Set Programming (ASP) is an increasingly popular framework for declarative programming that admits the description of problems by means of rules and constraints that form a disjunctive logic program. In particular, many AI problems such as reasoning in a nonmonotonic setting can be directly formulated in ASP. Although the main problems of ASP are of high computational complexity, located at the second level of the Polynomial Hierarchy, several restrictions of ASP have been identified in the literature, under which ASP problems become tractable.   In this paper we use the concept of backdoors to identify new restrictions that make ASP problems tractable. Small backdoors are sets of atoms that represent \"clever reasoning shortcuts\" through the search space and represent a hidden structure in the problem input. The concept of backdoors is widely used in the areas of propositional satisfiability and constraint satisfaction. We show that it can be fruitfully adapted to ASP. 
We demonstrate how backdoors can serve as a unifying framework that accommodates several tractable restrictions of ASP known from the literature. Furthermore, we show how backdoors allow us to deploy recent algorithmic results from parameterized complexity theory to the domain of answer set programming.\nWe present in this paper a novel approach for training deterministic auto-encoders. We show that by adding a well chosen penalty term to the classical reconstruction cost function, we can achieve results that equal or surpass those attained by other regularized auto-encoders as well as denoising auto-encoders on a range of datasets. This penalty term corresponds to the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input. We show that this penalty term results in a localized space contraction which in turn yields robust features on the activation layer. Furthermore, we show how this penalty term is related to both regularized auto-encoders and denoising encoders and how it can be seen as a link between deterministic and non-deterministic auto-encoders. We find empirically that this penalty helps to carve a representation that better captures the local directions of variation dictated by the data, corresponding to a lower-dimensional non-linear manifold, while being more invariant to the vast majority of directions orthogonal to the manifold. Finally, we show that by using the learned features to initialize a MLP, we achieve state of the art classification error on a range of datasets, surpassing other methods of pre-training.\nOne of the key challenges in electronic government (e-government) is the development of systems that can be easily integrated and interoperated to provide seamless services delivery to citizens. In recent years, Semantic Web technologies based on ontology have emerged as promising solutions to the above engineering problems. 
However, current research practicing semantic development in e-government does not focus on the application of available methodologies and platforms for developing government domain ontologies. Furthermore, only a few of these researches provide detailed guidelines for developing semantic ontology models from a government service domain. This research presents a case study combining an ontology building methodology and two state-of-the-art Semantic Web platforms namely Protege and Java Jena ontology API for semantic ontology development in e-government. Firstly, a framework adopted from the Uschold and King ontology building methodology is employed to build a domain ontology describing the semantic content of a government service domain. Thereafter, UML is used to semi-formally represent the domain ontology. Finally, Protege and Jena API are employed to create the Web Ontology Language (OWL) and Resource Description Framework (RDF) representations of the domain ontology respectively to enable its computer processing. The study aims at: (1) providing e-government developers, particularly those from the developing world with detailed guidelines for practicing semantic content development in their e-government projects and (2), strengthening the adoption of semantic technologies in e-government. The study would also be of interest to novice Semantic Web developers who might used it as a starting point for further investigations.\nMutation has traditionally been regarded as an important operator in evolutionary algorithms. In particular, there have been many experimental studies which showed the effectiveness of adapting mutation rates for various static optimization problems. 
Given the perceived effectiveness of adaptive and self-adaptive mutation for static optimization problems, there have been speculations that adaptive and self-adaptive mutation can benefit dynamic optimization problems even more since adaptation and self-adaptation are capable of following a dynamic environment. However, few theoretical results are available in analyzing rigorously evolutionary algorithms for dynamic optimization problems. It is unclear when adaptive and self-adaptive mutation rates are likely to be useful for evolutionary algorithms in solving dynamic optimization problems. This paper provides the first rigorous analysis of adaptive mutation and its impact on the computation times of evolutionary algorithms in solving certain dynamic optimization problems. More specifically, for both individual-based and population-based EAs, we have shown that any time-variable mutation rate scheme will not significantly outperform a fixed mutation rate on some dynamic optimization problem instances. The proofs also offer some insights into conditions under which any time-variable mutation scheme is unlikely to be useful and into the relationships between the problem characteristics and algorithmic features (e.g., different mutation schemes).\nThe use of L1 regularisation for sparse learning has generated immense research interest, with successful application in such diverse areas as signal acquisition, image coding, genomics and collaborative filtering. While existing work highlights the many advantages of L1 methods, in this paper we find that L1 regularisation often dramatically underperforms in terms of predictive performance when compared with other methods for inferring sparsity. We focus on unsupervised latent variable models, and develop L1 minimising factor models, Bayesian variants of \"L1\", and Bayesian models with a stronger L0-like sparsity induced through spike-and-slab distributions. 
These spike-and-slab Bayesian factor models encourage sparsity while accounting for uncertainty in a principled manner and avoiding unnecessary shrinkage of non-zero values. We demonstrate on a number of data sets that in practice spike-and-slab Bayesian methods outperform L1 minimisation, even on a computational budget. We thus highlight the need to re-assess the wide use of L1 methods in sparsity-reliant applications, particularly when we care about generalising to previously unseen data, and provide an alternative that, over many varying conditions, provides improved generalisation performance.\nUsing the Hilbert-Bernays account as a spring-board, we first define four ways in which two objects can be discerned from one another, using the non-logical vocabulary of the language concerned. (These definitions are based on definitions made by Quine and Saunders.) Because of our use of the Hilbert-Bernays account, these definitions are in terms of the syntax of the language. But we also relate our definitions to the idea of permutations on the domain of quantification, and their being symmetries. These relations turn out to be subtle---some natural conjectures about them are false. We will see in particular that the idea of symmetry meshes with a species of indiscernibility that we will call `absolute indiscernibility'. We then report all the logical implications between our four kinds of discernibility. We use these four kinds as a resource for stating four metaphysical theses about identity. Three of these theses articulate two traditional philosophical themes: viz. the principle of the identity of indiscernibles (which will come in two versions), and haecceitism. The fourth is recent. Its most notable feature is that it makes diversity (i.e. non-identity) weaker than what we will call individuality (being an individual): two objects can be distinct but not individuals. For this reason, it has been advocated both for quantum particles and for spacetime points. 
Finally, we locate this fourth metaphysical thesis in a broader position, which we call structuralism. We conclude with a discussion of the semantics suitable for a structuralist, with particular reference to physical theories as well as elementary model theory.\nWe generalize the belief-propagation algorithm to sparse random networks with arbitrary distributions of motifs (triangles, loops, etc.). Each vertex in these networks belongs to a given set of motifs (a generalization of the configuration model). These networks can be treated as sparse uncorrelated hypergraphs in which hyperedges represent motifs. Here a hypergraph is a generalization of a graph, where a hyperedge can connect any number of vertices. These uncorrelated hypergraphs are tree-like (hypertrees), which crucially simplifies the problem and allows us to apply the belief-propagation algorithm to these loopy networks with arbitrary motifs. As natural examples, we consider motifs in the form of finite loops and cliques. We apply the belief-propagation algorithm to the ferromagnetic Ising model on the resulting random networks. We obtain an exact solution of this model on networks with finite loops or cliques as motifs. We find an exact critical temperature of the ferromagnetic phase transition and demonstrate that, compared to ordinary tree-like complex networks, the critical temperature increases as the clustering coefficient and the loop size increase. Our solution also gives the birth point of the giant connected component in these loopy networks.\nNatural language generation (NLG) systems are computer software systems that produce texts in English and other human languages, often from non-linguistic input data. NLG systems, like most AI systems, need substantial amounts of knowledge. However, our experience in two NLG projects suggests that it is difficult to acquire correct knowledge for NLG systems; indeed, every knowledge acquisition (KA) technique we tried had significant problems. 
In general terms, these problems were due to the complexity, novelty, and poorly understood nature of the tasks our systems attempted, and were worsened by the fact that people write so differently. This meant in particular that corpus-based KA approaches suffered because it was impossible to assemble a sizable corpus of high-quality consistent manually written texts in our domains; and structured expert-oriented KA techniques suffered because experts disagreed and because we could not get enough information about special and unusual cases to build robust systems. We believe that such problems are likely to affect many other NLG systems as well. In the long term, we hope that new KA techniques may emerge to help NLG system builders. In the shorter term, we believe that understanding how individual KA techniques can fail, and using a mixture of different KA techniques with different strengths and weaknesses, can help developers acquire NLG knowledge that is mostly correct.\nSearch is a major technique for planning. It amounts to exploring a state space of planning domains typically modeled as a directed graph. However, the prohibitively large size of the search space makes search expensive. Developing better heuristic functions has been the main technique for improving search efficiency. Nevertheless, recent studies have shown that improving heuristics alone has certain fundamental limits on improving search efficiency. Recently, a new direction of research called partial order based reduction (POR) has been proposed as an alternative to improving heuristics. POR has shown promise in speeding up searches. POR has been extensively studied in model checking research and is a key enabling technique for the scalability of model checking systems. Although the POR theory has been extensively studied in model checking, it has never been developed systematically for planning before. 
In addition, the conditions for POR in the model checking theory are abstract and not directly applicable in planning. Previous work on POR algorithms for planning did not establish the connection between these algorithms and existing theory in model checking. In this paper, we develop a theory for POR in planning. The new theory we develop connects the stubborn set theory in model checking and POR methods in planning. We show that previous POR algorithms in planning can be explained by the new theory. Based on the new theory, we propose a new, stronger POR algorithm. Experimental results on various planning domains show further search cost reduction using the new algorithm.\nThe AdaBoost algorithm was designed to combine many \"weak\" hypotheses that perform slightly better than random guessing into a \"strong\" hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the \"exponential loss.\" Unlike previous work, our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss are finite. Our first result shows that at iteration $t$, the exponential loss of AdaBoost's computed parameter vector will be at most $\\epsilon$ more than that of any parameter vector of $\\ell_1$-norm bounded by $B$ in a number of rounds that is at most a polynomial in $B$ and $1/\\epsilon$. We also provide lower bounds showing that a polynomial dependence on these parameters is necessary. Our second result is that within $C/\\epsilon$ iterations, AdaBoost achieves a value of the exponential loss that is at most $\\epsilon$ more than the best possible value, where $C$ depends on the dataset. 
We show that this dependence of the rate on $\\epsilon$ is optimal up to constant factors, i.e., at least $\\Omega(1/\\epsilon)$ rounds are necessary to achieve within $\\epsilon$ of the optimal exponential loss.\nWith the recent technological feasibility of electronic commerce over the Internet, much attention has been given to the design of electronic markets for various types of electronically-tradable goods. Such markets, however, will normally need to function in some relationship with markets for other related goods, usually those downstream or upstream in the supply chain. Thus, for example, an electronic market for rubber tires for trucks will likely need to be strongly influenced by the rubber market as well as by the truck market. In this paper we design protocols for exchange of information between a sequence of markets along a single supply chain. These protocols allow each of these markets to function separately, while the information exchanged ensures efficient global behavior across the supply chain. Each market that forms a link in the supply chain operates as a double auction, where the bids on one side of the double auction come from bidders in the corresponding segment of the industry, and the bids on the other side are synthetically generated by the protocol to express the combined information from all other links in the chain. The double auctions in each of the markets can be of several types, and we study several variants of incentive compatible double auctions, comparing them in terms of their efficiency and of the market revenue.\nMultiagent learning is a necessary yet challenging problem as multiagent systems become more prevalent and environments become more dynamic. Much of the groundbreaking work in this area draws on notable results from game theory, in particular, the concept of Nash equilibria. Learners that directly learn an equilibrium obviously rely on their existence. 
Learners that instead seek to play optimally with respect to the other players also depend upon equilibria, since equilibria are fixed points for learning. From another perspective, agents with limitations are real and common. These may be undesired physical limitations as well as self-imposed rational limitations, such as abstraction and approximation techniques, used to make learning tractable. This article explores the interactions of these two important concepts: equilibria and limitations in learning. We introduce the question of whether equilibria continue to exist when agents have limitations. We look at the general effects limitations can have on agent behavior, and define a natural extension of equilibria that accounts for these limitations. Using this formalization, we make three major contributions: (i) a counterexample for the general existence of equilibria with limitations, (ii) sufficient conditions on limitations that preserve their existence, and (iii) three general classes of games and limitations that satisfy these conditions. We then present empirical results from a specific multiagent learning algorithm applied to a specific instance of limited agents. These results demonstrate that learning with limitations is feasible when the conditions outlined by our theoretical analysis hold.\nA time series consists of a series of values or events obtained over repeated measurements in time. Analysis of time series is an important tool in many application areas, such as stock market analysis, process and quality control, observation of natural phenomena, medical treatments, etc. A vital component in many types of time-series analysis is the choice of an appropriate distance/similarity measure. Numerous measures have been proposed to date, with the most successful ones based on dynamic programming. 
Because these measures have quadratic time complexity, however, global constraints are often employed to limit the search space in the matrix during the dynamic programming procedure, in order to speed up computation. Furthermore, it has been reported that such constrained measures can also achieve better accuracy. In this paper, we investigate two representative time-series distance/similarity measures based on dynamic programming, Dynamic Time Warping (DTW) and Longest Common Subsequence (LCS), and the effects of global constraints on them. Through extensive experiments on a large number of time-series data sets, we demonstrate how global constraints can significantly reduce the computation time of DTW and LCS. We also show that, if the constraint parameter is tight enough (less than 10-15% of the time-series length), the constrained measure becomes significantly different from its unconstrained counterpart, in the sense of producing qualitatively different 1-nearest neighbor graphs. This observation explains the potential for accuracy gains when using constrained measures, highlighting the need for careful tuning of constraint parameters in order to achieve a good trade-off between speed and accuracy.\nIn previous work we have introduced a network-based model that abstracts many details of the underlying landscape and compresses the landscape information into a weighted, oriented graph which we call the local optima network. The vertices of this graph are the local optima of the given fitness landscape, while the arcs are transition probabilities between local optima basins. Here we extend this formalism to neutral fitness landscapes, which are common in difficult combinatorial search spaces. By using two known neutral variants of the NK family (i.e. NKp and NKq) in which the amount of neutrality can be tuned by a parameter, we show that our new definitions of the optima networks and the associated basins are consistent with the previous definitions for the non-neutral case. 
Moreover, our empirical study and statistical analysis show that the features of neutral landscapes interpolate smoothly between landscapes with maximum neutrality and non-neutral ones. We found some previously unknown structural differences between the two studied families of neutral landscapes. Overall, however, the network features studied confirmed that neutrality, in landscapes with percolating neutral networks, may enhance heuristic search. Our current methodology requires the exhaustive enumeration of the underlying search space. Therefore, sampling techniques should be developed before this analysis can have practical implications. We argue, however, that the proposed model offers a new perspective into the problem difficulty of combinatorial optimization problems and may inspire the design of more effective search heuristics.\nKnowledge mining is the process of deriving new and useful knowledge from vast volumes of data and background knowledge. Modern healthcare organizations regularly generate huge amounts of electronic data stored in databases. These data are a valuable resource for mining useful knowledge to help medical practitioners make appropriate and accurate decisions on the diagnosis and treatment of diseases. In this paper, we propose the design of a novel medical expert system based on a logic-programming framework. The proposed system includes a knowledge-mining component as a repertoire of tools for discovering useful knowledge. The implementation of classification and association mining tools based on the higher order and meta-level programming schemes using Prolog is presented to demonstrate the power of logic-based languages. Such languages also provide a pattern matching facility, which is an essential function for the development of knowledge-intensive tasks. 
Besides the major goal of medical decision support, the knowledge discovered by our logic-based knowledge-mining component can also be deployed as background knowledge to pre-treat data from other sources as well as to guard the data repositories against constraint violation. A framework for knowledge deployment is also presented.\nMany real-world domains require the representation of a measure of uncertainty. The most common such representation is probability, and the combination of probability with logic programs has given rise to the field of Probabilistic Logic Programming (PLP), leading to languages such as the Independent Choice Logic, Logic Programs with Annotated Disjunctions (LPADs), ProbLog, PRISM and others. These languages share a similar distribution semantics, and methods have been devised to translate programs between these languages. The complexity of computing the probability of queries to these general PLP programs is very high due to the need to combine the probabilities of explanations that may not be mutually exclusive. As one alternative, the PRISM system reduces the complexity of query answering by restricting the form of programs it can evaluate. As an entirely different alternative, Possibilistic Logic Programs adopt a simpler metric of uncertainty than probability. Each of these approaches -- general PLP, restricted PLP, and Possibilistic Logic Programming -- can be useful in different domains depending on the form of uncertainty to be represented, on the form of programs needed to model problems, and on the scale of the problems to be solved. In this paper, we show how the PITA system, which originally supported the general PLP language of LPADs, can also efficiently support restricted PLP and Possibilistic Logic Programs. 
PITA relies on tabling with answer subsumption and consists of a transformation along with an API for library functions that interface with answer subsumption.\nAn important problem in bioinformatics is the inference of gene regulatory networks (GRN) from temporal expression profiles. In general, the main limitations faced by GRN inference methods are the small number of samples with huge dimensionality and the noisy nature of the expression measurements. In the face of these limitations, alternatives are needed to get better accuracy on the GRN inference problem. This work addresses this problem by presenting an alternative feature selection method, called SFFS-BA, that applies prior knowledge in its search strategy. The proposed search strategy is based on the Sequential Floating Forward Selection (SFFS) algorithm, with the inclusion of scale-free (Barab\\'asi-Albert) topology information in order to guide the search process to improve inference. The proposed algorithm explores the scale-free property by pruning the search space and using a power law as a weight for reducing it. In this way, the search space traversed by the SFFS-BA method combines a breadth-first search when the number of combinations is small (<k> <= 2) with a depth-first search when the number of combinations becomes explosive (<k> >= 3), being guided by the scale-free prior information. Experimental results show that SFFS-BA provides better inference similarities than SFS and SFFS, keeping the robustness of the SFS and SFFS methods, thus presenting very good results.\nLearning to operate a vehicle is generally accomplished by forming a new cognitive map between the body motions and extrapersonal space. Here, we consider the challenge of remapping movement-to-space representations in survivors of spinal cord injury, for the control of powered wheelchairs. 
Our goal is to facilitate this remapping by developing interfaces between residual body motions and navigational commands that exploit the degrees of freedom that disabled individuals are most capable of coordinating. We present a new framework for allowing spinal cord injured persons to control powered wheelchairs through signals derived from their residual mobility. The main novelty of this approach lies in replacing the more common joystick controllers of powered wheelchairs with a sensor shirt. This allows the whole upper body of the user to operate as an adaptive joystick. Considerations about learning and risks have led us to develop a safe testing environment in 3D Virtual Reality. A Personal Augmented Reality Immersive System (PARIS) allows us to analyse learning skills and provide users with adequate training to control a simulated wheelchair through the signals generated by body motions in a safe environment. We provide a description of the basic theory, of the development phases and of the operation of the complete system. We also present preliminary results illustrating the processing of the data and supporting the feasibility of this approach.\nEfficient collaborative decision making is an important challenge for multiagent systems. Finding optimal joint actions is especially challenging when each agent has only imperfect information about the state of its environment. Such problems can be modeled as collaborative Bayesian games in which each agent receives private information in the form of its type. However, representing and solving such games requires space and computation time exponential in the number of agents. This article introduces collaborative graphical Bayesian games (CGBGs), which facilitate more efficient collaborative decision making by decomposing the global payoff function as the sum of local payoff functions that depend on only a few agents. 
We propose a framework for the efficient solution of CGBGs based on the insight that they possess two different types of independence, which we call agent independence and type independence. In particular, we present a factor graph representation that captures both forms of independence and thus enables efficient solutions. In addition, we show how this representation can provide leverage in sequential tasks by using it to construct a novel method for decentralized partially observable Markov decision processes. Experimental results in both random and benchmark tasks demonstrate the improved scalability of our methods compared to several existing alternatives.\nSpecifying and implementing flexible human-computer dialogs, such as those used in kiosks and smart phone apps, is challenging because of the numerous and varied directions in which each user might steer a dialog. The objective of this research is to improve dialog specification and implementation. To do so, we enriched a notation based on concepts from programming languages, especially partial evaluation, for specifying a variety of unsolicited reporting, mixed-initiative dialogs in a concise representation that serves as a design for dialog implementation. We also built a dialog mining system that extracts a specification in this notation from requirements. To demonstrate that such a specification provides a design for dialog implementation, we built a system that automatically generates an implementation of the dialog, called a stager, from it. These two components constitute a dialog modeling toolkit that automates dialog specification and implementation. These results provide a proof of concept and demonstrate the study of dialog specification and implementation from a programming languages perspective. 
The ubiquity of dialogs in domains such as travel, education, and health care, combined with the demand for smart phone apps, provides a landscape for further investigation of these results.\nCrowdsourcing websites (e.g. Yahoo! Answers and Amazon Mechanical Turk) that allow requesters from all around the world to post tasks and seek help from an equally global pool of workers have emerged in recent years. However, intrinsic incentive problems reside in crowdsourcing applications, as workers and requesters are selfish and aim to strategically maximize their own benefit. In this paper, we propose to provide incentives for workers to exert effort using a novel game-theoretic model based on repeated games. As there is always a gap in the social welfare between the non-cooperative equilibria emerging when workers pursue their self-interests and the desirable Pareto efficient outcome, we propose a novel class of incentive protocols based on social norms which integrate reputation mechanisms into the existing pricing schemes currently implemented on crowdsourcing websites, in order to improve the performance of the non-cooperative equilibria emerging in such applications. We first formulate the exchanges on a crowdsourcing website as a two-sided market where requesters and workers are matched and play gift-giving games repeatedly. Subsequently, we study the protocol designer's problem of finding an optimal and sustainable (equilibrium) protocol which achieves the highest social welfare for that website. We prove that the proposed incentive protocol can make the website operate close to Pareto efficiency. 
Moreover, we also examine an alternative scenario, where the protocol designer aims at maximizing the revenue of the website and evaluate the performance of the optimal protocol.\nWe propose a dynamic logic of lying, wherein a 'lie that phi' (where phi is a formula in the logic) is an action in the sense of dynamic modal logic, that is interpreted as a state transformer relative to the formula phi. The states that are being transformed are pointed Kripke models encoding the uncertainty of agents about their beliefs. Lies can be about factual propositions but also about modal formulas, such as the beliefs of other agents or the belief consequences of the lies of other agents. We distinguish (i) an outside observer who is lying to an agent that is modelled in the system, from (ii) one agent who is lying to another agent, and where both are modelled in the system. For either case, we further distinguish (iii) the agent who believes everything that it is told (even at the price of inconsistency), from (iv) the agent who only believes what it is told if that is consistent with its current beliefs, and from (v) the agent who believes everything that it is told by consistently revising its current beliefs. The logics have complete axiomatizations, which can most elegantly be shown by way of their embedding in what is known as action model logic or the extension of that logic to belief revision.\nGiven a set of several inputs into a system (e.g., independent variables characterizing stimuli) and a set of several stochastically non-independent outputs (e.g., random variables describing different aspects of responses), how can one determine, for each of the outputs, which of the inputs it is influenced by? The problem has applications ranging from modeling pairwise comparisons to reconstructing mental processing architectures to conjoint testing. 
A necessary and sufficient condition for a given pattern of selective influences is provided by the Joint Distribution Criterion, according to which the problem of \"what influences what\" is equivalent to that of the existence of a joint distribution for a certain set of random variables. For inputs and outputs with finite sets of values, this criterion translates into a test of consistency of a certain system of linear equations and inequalities (Linear Feasibility Test) which can be performed by means of linear programming. The Joint Distribution Criterion also leads to a metatheoretical principle for generating a broad class of necessary conditions (tests) for diagrams of selective influences. Among them is the class of distance-type tests based on the observation that certain functionals on jointly distributed random variables satisfy the triangle inequality.\nThere is little research concerning comparisons and combination of System Dynamics Simulation (SDS) and Agent Based Simulation (ABS). ABS is a paradigm used in many levels of abstraction, including those levels covered by SDS. We believe that the establishment of frameworks for the choice between these two simulation approaches would contribute to simulation research. Hence, our work aims to establish directions for the choice between SDS and ABS approaches for immune system-related problems. Previously, we compared the use of ABS and SDS for modelling agents' behaviour in an environment with no movement or interactions between these agents. We concluded that for these types of agents it is preferable to use SDS, as it takes up fewer computational resources and produces the same results as those obtained by the ABS model. In order to move this research forward, our next research question is: if we introduce interactions between these agents, will SDS still be the most appropriate paradigm to be used? 
To answer this question for immune system simulation problems, we will use, as case studies, models involving interactions between tumour cells and immune effector cells. Experiments show that there are cases where SDS and ABS cannot be used interchangeably, and therefore, their comparison is not straightforward.\nDung's famous abstract argumentation frameworks represent the core formalism for many problems and applications in the field of argumentation, which has evolved significantly within the last decade. Recent work in the field has thus focused on implementations for these frameworks, whereby one of the main approaches is to use Answer-Set Programming (ASP). While some of the argumentation semantics can be nicely expressed within the ASP language, others required rather cumbersome encoding techniques. Recent advances in ASP systems, in particular the metasp optimization frontend for the ASP package gringo/claspD, provide direct commands to filter answer sets satisfying certain subset-minimality (or -maximality) constraints. This allows much simpler encodings than those in the standard ASP language. In this paper, we experimentally compare the original encodings (for the argumentation semantics based on preferred, semi-stable, and respectively, stage extensions) with new metasp encodings. Moreover, we provide novel encodings for the recently introduced resolution-based grounded semantics. Our experimental results indicate that the metasp approach works well in those cases where the complexity of the encoded problem is adequately mirrored within the metasp approach.\nTo reduce datacenter energy consumption and cost, current practice has considered demand-proportional resource provisioning schemes, where servers are turned on/off according to the load of requests. Most existing work considers instantaneous (Internet) requests only, which are explicitly or implicitly assumed to be delay-sensitive. 
On the other hand, in datacenters, there exists a large number of delay-tolerant jobs, such as background/maintenance jobs. In this paper, we explicitly differentiate between delay-sensitive jobs and delay-tolerant jobs. We focus on the problem of using delay-tolerant jobs to fill the extra capacity of datacenters, referred to as trough/valley filling. Because they give higher priority to delay-sensitive jobs, our schemes complement most existing demand-proportional resource provisioning schemes. Our goal is to design intelligent trough filling mechanisms that are energy efficient and also achieve good delay performance. Specifically, we propose two joint dynamic speed scaling and traffic shifting schemes, one subgradient-based and the other queue-based. Our schemes assume little statistical information about the system, which is usually difficult to obtain in practice. In both schemes, energy cost savings come from dynamic speed scaling, statistical multiplexing, electricity price diversity, and service efficiency diversity. In addition, good delay performance is achieved in the queue-based scheme via load shifting and capacity allocation based on queue conditions. Practical issues that may arise in datacenter networks are considered, including capacity and bandwidth constraints, service agility constraints, and load shifting costs. We use both artificial and real datacenter traces to evaluate the proposed schemes.\nObtaining the set of cosmological parameters consistent with observational data is an important exercise in current cosmological research. It involves finding the global maximum of the likelihood function in the multi-dimensional parameter space. Currently, sampling-based methods, which are in general stochastic in nature, such as Markov-Chain Monte Carlo (MCMC), are commonly used for parameter estimation. 
The beauty of stochastic methods is that the computational cost grows at most linearly, rather than exponentially (as in grid-based approaches), with the dimensionality of the search space. MCMC methods sample the full joint probability distribution (posterior), from which one- and two-dimensional probability distributions, best-fit (average) parameter values, and error bars can be computed. In the present work, we demonstrate the application of another stochastic method, named Particle Swarm Optimization (PSO), that is widely used in the fields of engineering and artificial intelligence, for cosmological parameter estimation from WMAP seven-year data. We find that there is good agreement between the best-fit parameter values obtained from PSO and those from the publicly available code COSMOMC. However, there is a slight disagreement between the error bars, mainly because errors are computed differently in PSO. Apart from presenting the results of our exercise, we also discuss the merits of PSO and explain its usefulness for more extensive searches in higher-dimensional parameter spaces.\nWe introduce a stochastic graph-based method for computing the relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. 
Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.\nThe paper studies machine learning problems where each example is described using a set of Boolean features and where hypotheses are represented by linear threshold elements. One method of increasing the expressiveness of learned hypotheses in this context is to expand the feature set to include conjunctions of basic features. This can be done explicitly or where possible by using a kernel function. Focusing on the well-known Perceptron and Winnow algorithms, the paper demonstrates a tradeoff between the computational efficiency with which the algorithm can be run over the expanded feature space and the generalization ability of the corresponding learning algorithm. We first describe several kernel functions which capture either limited forms of conjunctions or all conjunctions. We show that these kernels can be used to efficiently run the Perceptron algorithm over a feature space of exponentially many conjunctions; however we also show that using such kernels, the Perceptron algorithm can provably make an exponential number of mistakes even when learning simple functions.
We then consider the question of whether kernel functions can analogously be used to run the multiplicative-update Winnow algorithm over an expanded feature space of exponentially many conjunctions. Known upper bounds imply that the Winnow algorithm can learn Disjunctive Normal Form (DNF) formulae with a polynomial mistake bound in this setting. However, we prove that it is computationally hard to simulate Winnow's behavior for learning DNF over such a feature set. This implies that the kernel functions which correspond to running Winnow for this problem are not efficiently computable, and that there is no general construction that can run Winnow with kernels.\nIn this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are states that are undesirable or dangerous to enter. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We will show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that has a good performance with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column.
This control task was originally formulated as an optimal control problem with chance constraints, and it was solved under certain assumptions on the model to obtain an optimal solution. The power of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.\nWe discuss an attentional model for simultaneous object tracking and recognition that is driven by gaze data. Motivated by theories of perception, the model consists of two interacting pathways: identity and control, intended to mirror the what and where pathways in neuroscience models. The identity pathway models object appearance and performs classification using deep (factored)-Restricted Boltzmann Machines. At each point in time the observations consist of foveated images, with decaying resolution toward the periphery of the gaze. The control pathway models the location, orientation, scale and speed of the attended object. The posterior distribution of these states is estimated with particle filtering. Deeper in the control pathway, we encounter an attentional mechanism that learns to select gazes so as to minimize tracking uncertainty. Unlike in our previous work, we introduce gaze selection strategies which operate in the presence of partial information and on a continuous action space. We show that a straightforward extension of the existing approach to the partial information setting results in poor performance, and we propose an alternative method based on modeling the reward surface as a Gaussian Process. This approach gives good performance in the presence of partial information and allows us to expand the action space from a small, discrete set of fixation points to a continuous domain.\nIn answer-set programming (ASP), the solutions of a problem are encoded in dedicated models, called answer sets, of a logical theory. 
These answer sets are computed from the program that represents the theory by means of an ASP solver and returned to the user as sets of ground first-order literals. As this type of representation is often cumbersome for the user to interpret, tools like ASPVIZ and IDPDraw were developed that allow for visualising answer sets. The tool Kara, introduced in this paper, follows these approaches, using ASP itself as a language for defining visualisations of interpretations. Unlike existing tools that position graphic primitives according to static coordinates only, Kara allows for more high-level specifications, supporting graph structures, grids, and relative positioning of graphical elements. Moreover, generalising the functionality of previous tools, Kara provides modifiable visualisations such that interpretations can be manipulated by graphically editing their visualisations. This is realised by resorting to abductive reasoning techniques. Kara is part of SeaLion, a forthcoming integrated development environment (IDE) for ASP.\nThis paper proposes a novel latent semantic learning method for extracting high-level features (i.e. latent semantics) from a large vocabulary of abundant mid-level features (i.e. visual keywords) with structured sparse representation, which can help to bridge the semantic gap in the challenging task of human action recognition. To discover the manifold structure of midlevel features, we develop a spectral embedding approach to latent semantic learning based on L1-graph, without the need to tune any parameter for graph construction as a key step of manifold learning. More importantly, we construct the L1-graph with structured sparse representation, which can be obtained by structured sparse coding with its structured sparsity ensured by novel L1-norm hypergraph regularization over mid-level features. In the new embedding space, we learn latent semantics automatically from abundant mid-level features through spectral clustering. 
The learnt latent semantics can be readily used for human action recognition with SVM by defining a histogram intersection kernel. Different from the traditional latent semantic analysis based on topic models, our latent semantic learning method can explore the manifold structure of mid-level features in both L1-graph construction and spectral embedding, which results in compact but discriminative high-level features. The experimental results on the commonly used KTH action dataset and unconstrained YouTube action dataset show the superior performance of our method.\nThis paper studies the topic modeling problem of tagged documents and images. Higher-order relations among tagged documents and images are major and ubiquitous characteristics, and play positive roles in extracting reliable and interpretable topics. In this paper, we propose the tag-topic models (TTM) to depict such higher-order topic structural dependencies within the Markov random field (MRF) framework. First, we use the novel factor graph representation of latent Dirichlet allocation (LDA)-based topic models from the MRF perspective, and present an efficient loopy belief propagation (BP) algorithm for approximate inference and parameter estimation. Second, we propose the factor hypergraph representation of TTM, and focus on both pairwise and higher-order relation modeling among tagged documents and images. Efficient loopy BP algorithm is developed to learn TTM, which encourages the topic labeling smoothness among tagged documents and images. 
Extensive experimental results confirm the incorporation of higher-order relations to be effective in enhancing the overall topic modeling performance, when compared with current state-of-the-art topic models, in many text and image mining tasks of broad interest such as word and link prediction, document classification, and tag recommendation.\nEffective coordination of agents' actions in partially-observable domains is a major challenge of multi-agent systems research. To address this, many researchers have developed techniques that allow the agents to make decisions based on estimates of the states and actions of other agents that are typically learnt using some form of machine learning algorithm. Nevertheless, many of these approaches fail to provide an actual means by which the necessary information is made available so that the estimates can be learnt. To this end, we argue that cooperative communication of state information between agents is one such mechanism. However, in a dynamically changing environment, the accuracy and timeliness of this communicated information determine the fidelity of the learned estimates and the usefulness of the actions taken based on these. Given this, we propose a novel information-sharing protocol, post-task-completion sharing, for the distribution of state information. We then show, through a formal analysis, the improvement in the quality of estimates produced using our strategy over the widely used protocol of sharing information between nearest neighbours. Moreover, communication heuristics designed around our information-sharing principle are subjected to empirical evaluation along with other benchmark strategies (including Littman's Q-routing and Stone's TPOT-RL) in a simulated call-routing application.
These studies, conducted across a range of environmental settings, show that, compared to the different benchmarks used, our strategy generates an improvement of up to 60% in the call connection rate; of more than 1000% in the ability to connect long-distance calls; and incurs as low as 0.25 of the message overhead.\nOne of the biggest challenges in the development and deployment of spoken dialogue systems is the design of the spoken language generation module. This challenge arises from the need for the generator to adapt to many features of the dialogue domain, user population, and dialogue context. A promising approach is trainable generation, which uses general-purpose linguistic knowledge that is automatically adapted to the features of interest, such as the application domain, individual user, or user group. In this paper we present and evaluate a trainable sentence planner for providing restaurant information in the MATCH dialogue system. We show that trainable sentence planning can produce complex information presentations whose quality is comparable to the output of a template-based generator tuned to this domain. We also show that our method easily supports adapting the sentence planner to individuals, and that the individualized sentence planners generally perform better than models trained and tested on a population of individuals. Previous work has documented and utilized individual preferences for content selection, but to our knowledge, these results provide the first demonstration of individual preferences for sentence planning operations, affecting the content order, discourse structure and sentence structure of system responses. 
Finally, we evaluate the contribution of different feature sets, and show that, in our application, n-gram features often do as well as features based on higher-level linguistic representations.\nDescription logic programs (dl-programs) under the answer set semantics formulated by Eiter {\\em et al.} have been considered as a prominent formalism for integrating rules and ontology knowledge bases. A question of interest has been whether dl-programs can be captured in a general formalism of nonmonotonic logic. In this paper, we study the possibility of embedding dl-programs into default logic. We show that dl-programs under the strong and weak answer set semantics can be embedded in default logic by combining two translations, one of which eliminates the constraint operator from nonmonotonic dl-atoms and the other translates a dl-program into a default theory. For dl-programs without nonmonotonic dl-atoms but with the negation-as-failure operator, our embedding is polynomial, faithful, and modular. In addition, our default logic encoding can be extended in a simple way to capture recently proposed weakly well-supported answer set semantics, for arbitrary dl-programs. These results reinforce the argument that default logic can serve as a fruitful foundation for query-based approaches to integrating ontology and rules. With its simple syntax and intuitive semantics, plus available computational results, default logic can be considered an attractive approach to integration of ontology and rules.\nElectronic government (e-government) has been one of the most active areas of ontology development during the past six years. In e-government, ontologies are being used to describe and specify e-government services (e-services) because they enable easy composition, matching, mapping and merging of various e-government services. More importantly, they also facilitate the semantic integration and interoperability of e-government services. 
However, it is still unclear in the current literature how an existing ontology building methodology can be applied to develop semantic ontology models in a government service domain. In this paper the Uschold and King ontology building methodology is applied to develop semantic ontology models in a government service domain. Firstly, the Uschold and King methodology is presented, discussed and applied to build a government domain ontology. Secondly, the domain ontology is evaluated for semantic consistency using its semi-formal representation in Description Logic. Thirdly, an alignment of the domain ontology with the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) upper level ontology is drawn to allow its wider visibility and facilitate its integration with existing metadata standards. Finally, the domain ontology is formally written in Web Ontology Language (OWL) to enable its automatic processing by computers. The study aims to provide direction for the application of existing ontology building methodologies in the Semantic Web development processes of e-government domain specific ontology models, which would enable their repeatability in other e-government projects and strengthen the adoption of semantic technologies in e-government.\nRGB-D cameras, which give an RGB image together with depths, are becoming increasingly popular for robotic perception. In this paper, we address the task of detecting commonly found objects in the 3D point cloud of indoor scenes obtained from such cameras. Our method uses a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurrence relationships and geometric relationships. With a large number of object classes and relations, the model's parsimony becomes important and we address that by using multiple types of edge potentials. We train the model using a maximum-margin learning approach.
In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views), we get a performance of 84.06% and 73.38% in labeling office and home scenes respectively for 17 object classes each. We also present a method for a robot to search for an object using the learned model and the contextual information available from the current labelings of the scene. We applied this algorithm successfully on a mobile robot for the task of finding 12 object classes in 10 different offices and achieved a precision of 97.56% with 78.43% recall.\nA famous result by Jeavons, Cohen, and Gyssens shows that every constraint satisfaction problem (CSP) where the constraints are preserved by a semi-lattice operation can be solved in polynomial time. This is one of the basic facts for the so-called universal-algebraic approach to a systematic theory of tractability and hardness in finite domain constraint satisfaction.   Not surprisingly, the theorem of Jeavons et al. fails for arbitrary infinite domain CSPs. Many CSPs of practical interest, though, and in particular those CSPs that are motivated by qualitative reasoning calculi from Artificial Intelligence, can be formulated with constraint languages that are rather well-behaved from a model-theoretic point of view. In particular, the automorphism group of these constraint languages tends to be large in the sense that the number of orbits of n-subsets of the automorphism group is bounded by some function in n.   In this paper we present a generalization of the theorem by Jeavons et al. to infinite domain CSPs where the number of orbits of n-subsets grows sub-exponentially in n, and prove that preservation under a semi-lattice operation for such CSPs implies polynomial-time tractability. 
Unlike the result of Jeavons et al., this includes many CSPs that cannot be solved by Datalog.\nA few decades of work in the AI field have focused efforts on developing a new generation of systems which can acquire knowledge via interaction with the world. Yet, until very recently, most such attempts were underpinned by research which predominantly regarded linguistic phenomena as separated from the brain and body. This could lead one into believing that to emulate linguistic behaviour, it suffices to develop 'software' operating on abstract representations that will work on any computational machine. This picture is inaccurate for several reasons, which are elucidated in this paper and extend beyond sensorimotor and semantic resonance. Beginning with a review of research, I list several heterogeneous arguments against disembodied language, in an attempt to draw conclusions for developing embodied multisensory agents which communicate verbally and non-verbally with their environment. Without taking into account both the architecture of the human brain, and embodiment, it is unrealistic to replicate accurately the processes which take place during language acquisition, comprehension, production, or during non-linguistic actions. While robots are far from isomorphic with humans, they could benefit from strengthened associative connections in the optimization of their processes and their reactivity and sensitivity to environmental stimuli, and in situated human-machine interaction. The concept of multisensory integration should be extended to cover linguistic input and the complementary information combined from temporally coincident sensory impressions.\nCharacter posing is of interest in computer animation. It is difficult due to its dependence on inverse kinematics (IK) techniques and the articulated structure of human characters.
To solve the IK problem, classical methods that rely on numerical solutions often suffer from the under-determination problem and cannot guarantee naturalness. Existing data-driven methods address this problem by learning from motion capture data. When facing a large variety of poses, however, these methods may not be able to capture the pose styles or be applicable in real-time environments. Inspired by the low-rank motion de-noising and completion model in \\cite{lai2011motion}, we propose a novel model for character posing based on sparse coding. Unlike conventional approaches, our model directly captures the pose styles in Euclidean space to provide intuitive training error measurements and facilitate pose synthesis. A pose dictionary is learned in the training stage, and based on it, natural poses are synthesized to satisfy users' constraints. We compare our model with existing models for tasks of pose de-noising and completion. Experiments show our model obtains lower de-noising and completion error. We also provide User Interface (UI) examples illustrating that our model is effective for interactive character posing.\nText mining is becoming vital as Web 2.0 offers collaborative content creation and sharing. Researchers now have a growing interest in text mining methods for discovering knowledge. Text mining researchers come from a variety of areas, such as Natural Language Processing, Computational Linguistics, Machine Learning, and Statistics. A typical text mining application involves preprocessing of text, stemming and lemmatization, tagging and annotation, deriving knowledge patterns, evaluating and interpreting the results. There are numerous approaches for performing text mining tasks, such as clustering, categorization, sentiment analysis, and summarization. There is a growing need to standardize the evaluation of these tasks. One major component of establishing standardization is to provide standard datasets for these tasks.
Although there are various standard datasets available for traditional text mining tasks, there are very few, and expensive, datasets for the blog-mining task. Blogs, a new genre in Web 2.0, are digital diaries of web users; they have chronological entries and contain a lot of useful knowledge, and thus offer a lot of challenges and opportunities for text mining. In this paper, we report a new indigenous dataset for the Pakistani political blogosphere. The paper describes the process of data collection, organization, and standardization. We have used this dataset for carrying out various text mining tasks for the blogosphere, such as blog search, political sentiment analysis and tracking, identification of influential bloggers, and clustering of blog posts. We wish to offer this dataset free to others who aspire to pursue further work in this domain.\nThe bi-objective winner determination problem (2WDP-SC) of a combinatorial procurement auction for transport contracts is characterized by a set B of bundle bids, with each bundle bid b in B consisting of a bidding carrier c_b, a bid price p_b, and a set tau_b of transport contracts which is a subset of the set T of tendered transport contracts. Additionally, the transport quality q_{t,c_b} is given which is expected to be realized when a transport contract t is executed by a carrier c_b. The task of the auctioneer is to find a set X of winning bids (X subset B), such that each transport contract is part of at least one winning bid, the total procurement costs are minimized, and the total transport quality is maximized. This article presents a metaheuristic approach for the 2WDP-SC which integrates the greedy randomized adaptive search procedure with a two-stage candidate component selection procedure, large neighborhood search, and self-adaptive parameter setting in order to find a competitive set of non-dominated solutions. The heuristic outperforms all existing approaches.
For seven small benchmark instances, the heuristic is the sole approach that finds all Pareto-optimal solutions. For 28 out of 30 large instances, none of the existing approaches is able to compute a solution that dominates a solution found by the proposed heuristic.\nMultilabel classification is a relatively recent subfield of machine learning. Unlike the classical approach, where instances are labeled with only one category, in multilabel classification, an arbitrary number of categories is chosen to label an instance. Due to the problem complexity (the solution is one among an exponential number of alternatives), a common solution (the binary method) is frequently used, learning a binary classifier for every category, and combining them all afterwards. The independence assumption made by this solution is not realistic, and in this work we give examples where the decisions for all the labels are not taken independently, and thus, a supervised approach should learn those existing relationships among categories to make a better classification. Therefore, we show here a generic methodology that can improve the results obtained by a set of independent probabilistic binary classifiers, by using a combination procedure with a classifier trained on the co-occurrences of the labels. We present an exhaustive experimental evaluation on three different standard corpora of labeled documents (Reuters-21578, Ohsumed-23 and RCV1), which shows noticeable improvements on all of them when using our methodology with three probabilistic base classifiers.\nThis paper develops generalizations of empowerment to continuous states. Empowerment is a recently introduced information-theoretic quantity motivated by hypotheses about the efficiency of the sensorimotor loop in biological organisms, but also by considerations stemming from curiosity-driven learning.
Empowerment measures, for agent-environment systems with stochastic transitions, how much influence an agent has on its environment, but only that influence that can be sensed by the agent's sensors. It is an information-theoretic generalization of joint controllability (influence on environment) and observability (measurement by sensors) of the environment by the agent, both controllability and observability being usually defined in control theory as the dimensionality of the control/observation spaces. Earlier work has shown that empowerment has various interesting and relevant properties, e.g., it allows us to identify salient states using only the dynamics, and it can act as an intrinsic reward without requiring an external reward. However, in this previous work empowerment was limited to the case of small-scale and discrete domains and furthermore state transition probabilities were assumed to be known. The goal of this paper is to extend empowerment to the significantly more important and relevant case of continuous vector-valued state spaces and initially unknown state transition probabilities. The continuous state space is addressed by Monte-Carlo approximation; the unknown transitions are addressed by model learning and prediction for which we apply Gaussian process regression with iterated forecasting. In a number of well-known continuous control tasks we examine the dynamics induced by empowerment and include an application to exploration and online model learning.\nIn the contexts of automated reasoning and formal verification, important decision problems are effectively encoded into Satisfiability Modulo Theories (SMT). In the last decade efficient SMT solvers have been developed for several theories of practical interest (e.g., linear arithmetic, arrays, bit-vectors).
Surprisingly, very little work has been done to extend SMT to deal with optimization problems; in particular, we are not aware of any work on SMT solvers able to produce solutions which minimize cost functions over arithmetical variables. This is unfortunate, since some problems of interest require this functionality.   In this paper we start filling this gap. We present and discuss two general procedures for leveraging SMT to handle the minimization of LA(Q) cost functions, combining SMT with standard minimization techniques. We have implemented the proposed approach within the MathSAT SMT solver. Due to the lack of competitors in AR and SMT domains, we experimentally evaluated our implementation against state-of-the-art tools for the domain of linear generalized disjunctive programming (LGDP), which is closest in spirit to our domain, on sets of problems which have been previously proposed as benchmarks for the latter tools. The results show that our tool is very competitive with, and often outperforms, these tools on these problems, clearly demonstrating the potential of the approach.\nAchieving joint objectives by teams of cooperative planning agents requires significant coordination and communication efforts. For a single-agent system facing a plan failure in a dynamic environment, arguably, attempts to repair the failed plan in general do not straightforwardly bring any benefit in terms of time complexity. However, in multi-agent settings the communication complexity might be of much higher importance; a high communication overhead might even be prohibitive in certain domains. We hypothesize that in decentralized systems, where coordination is enforced to achieve joint objectives, attempts to repair failed multi-agent plans should lead to lower communication overhead than replanning from scratch.   The contribution of the presented paper is threefold.
Firstly, we formally introduce the multi-agent plan repair problem and formally present the core hypothesis underlying our work. Secondly, we propose three algorithms for multi-agent plan repair reducing the problem to specialized instances of the multi-agent planning problem. Finally, we present results of experimental validation confirming the core hypothesis of the paper.\nIt is a high-quality algorithm for hierarchical clustering of large software source code. This effectively allows one to break down the complexity of tens of millions of lines of source code, so that a human software engineer can comprehend a software system at a high level by looking at its architectural diagram, which is reconstructed automatically from the source code of the software system. The architectural diagram shows a tree of subsystems having OOP classes in its leaves (in other words, a nested software decomposition). The tool reconstructs the missing (inconsistent/incomplete/inexistent) architectural documentation for a software system from its source code. This facilitates software maintenance: change requests can be performed substantially faster. Simply speaking, this unique tool allows one to lift the comprehensible grain of object-oriented software systems from OOP class-level to subsystem-level. It is estimated that a commercial tool, developed on the basis of this work, will reduce software maintenance expenses tenfold relative to current needs, and will allow one to implement next-generation software systems which are currently too complex to be within the range of human comprehension and therefore can't yet be designed or implemented. The implemented prototype is open source: http://sourceforge.net/p/insoar/code-0/1/tree/\nIn this paper we present {\\em refinement modal logic}. A refinement is like a bisimulation, except that from the three relational requirements only `atoms' and `back' need to be satisfied.
Our logic contains a new operator 'all' in addition to the standard modalities 'box' for each agent. The operator 'all' acts as a quantifier over the set of all refinements of a given model. As a variation on a bisimulation quantifier, this refinement operator or refinement quantifier 'all' can be seen as quantifying over a variable not occurring in the formula bound by it. The logic combines the simplicity of multi-agent modal logic with some powers of monadic second-order quantification. We present a sound and complete axiomatization of multi-agent refinement modal logic. We also present an extension of the logic to the modal mu-calculus, and an axiomatization for the single-agent version of this logic. Examples and applications are also discussed: to software verification and design (the set of agents can also be seen as a set of actions), and to dynamic epistemic logic. We further give detailed results on the complexity of satisfiability, and on succinctness.\nThe assignment of tasks to multiple resources becomes an interesting game theoretic problem, when both the task owner and the resources are strategic. In the classical, nonstrategic setting, where the states of the tasks and resources are observable by the controller, this problem is that of finding an optimal policy for a Markov decision process (MDP). When the states are held by strategic agents, the problem of an efficient task allocation extends beyond that of solving an MDP and becomes that of designing a mechanism. Motivated by this fact, we propose a general mechanism which decides on an allocation rule for the tasks and resources and a payment rule to incentivize agents' participation and truthful reports. 
In contrast to related dynamic strategic control problems studied in recent literature, the problem studied here has interdependent values: the benefit of an allocation to the task owner is not simply a function of the characteristics of the task itself and the allocation, but also of the state of the resources. We introduce a dynamic extension of Mezzetti's two phase mechanism for interdependent valuations. In this changed setting, the proposed dynamic mechanism is efficient, within period ex-post incentive compatible, and within period ex-post individually rational.\nKnuth (1990) introduced the class of nested formulas and showed that their satisfiability can be decided in polynomial time. We show that, parameterized by the size of a smallest strong backdoor set to the target class of nested formulas, checking the satisfiability of any CNF formula is fixed-parameter tractable. Thus, for any k>0, the satisfiability problem can be solved in polynomial time for any formula F for which there exists a variable set B of size at most k such that for every truth assignment t to B, the formula F[t] is nested; moreover, the degree of the polynomial is independent of k.   Our algorithm uses the grid-minor theorem of Robertson and Seymour (1986) to either find that the incidence graph of the formula has bounded treewidth - a case that is solved using model checking for monadic second order logic - or to find many vertex-disjoint obstructions in the incidence graph. For the latter case, new combinatorial arguments are used to find a small backdoor set. Combining both cases leads to an approximation algorithm producing a strong backdoor set whose size is upper bounded by a function of the optimum. Going through all assignments to this set of variables and using Knuth's algorithm, the satisfiability of the input formula is decided.\nBayesian Optimization aims at optimizing an unknown non-convex/concave function that is costly to evaluate. 
We are interested in application scenarios where concurrent function evaluations are possible. Under such a setting, BO could choose either to evaluate the function sequentially, one input at a time, waiting for the output of the function before making the next selection, or to evaluate the function at a batch of multiple inputs at once. These two different settings are commonly referred to as the sequential and batch settings of Bayesian Optimization. In general, the sequential setting leads to better optimization performance as each function evaluation is selected with more information, whereas the batch setting has an advantage in terms of the total experimental time (the number of iterations). In this work, our goal is to combine the strengths of both settings. Specifically, we systematically analyze Bayesian optimization using a Gaussian process as the posterior estimator and provide a hybrid algorithm that, based on the current state, dynamically switches between a sequential policy and a batch policy with variable batch sizes. We provide theoretical justification for our algorithm and present experimental results on eight benchmark BO problems. The results show that our method achieves substantial speedup (up to 78%) compared to a pure sequential policy, without suffering any significant performance loss.\nThis paper presents new and effective algorithms for learning kernels. In particular, as shown by our empirical results, these algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression. Our algorithms are based on the notion of centered alignment which is used as a similarity measure between kernels or kernel matrices. 
We present a number of novel algorithmic, theoretical, and empirical results for learning kernels based on our notion of centered alignment. In particular, we describe efficient algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP and discuss a one-stage algorithm for learning both a kernel and a hypothesis based on that kernel using an alignment-based regularization. Our theoretical results include a novel concentration bound for centered alignment between kernel matrices, the proof of the existence of effective predictors for kernels with high alignment, both for classification and for regression, and the proof of stability-based generalization bounds for a broad family of algorithms for learning kernels based on centered alignment. We also report the results of experiments with our centered alignment-based algorithms in both classification and regression.\nIn the paper, frameworks for electronic shopping of composite (modular) products are described: (a) multicriteria selection (product is considered as a whole system, it is a traditional approach), (b) combinatorial synthesis (composition) of the product from its components, (c) aggregation of the product from several selected products/prototypes. The following product model is examined: (i) general tree-like structure, (ii) set of system parts/components (leaf nodes), (iii) design alternatives (DAs) for each component, (iv) ordinal priorities for DAs, and (v) estimates of compatibility between DAs for different components. The combinatorial synthesis is realized as morphological design of a composite (modular) product or an extended composite product (e.g., product and support services as financial instruments). 
Here the solving process is based on Hierarchical Morphological Multicriteria Design (HMMD): (i) multicriteria selection of alternatives for system parts, (ii) composing the selected alternatives into a resultant combination (while taking into account ordinal quality of the alternatives above and their compatibility). The aggregation framework is based on consideration of aggregation procedures, for example: (i) addition procedure: design of a product's substructure or an extended substructure ('kernel') and addition of elements, and (ii) design procedure: design of the composite solution based on all elements of product superstructure. Applied numerical examples (e.g., composite product, extended composite product, product repair plan, and product trajectory) illustrate the proposed approaches.\nWe consider unsupervised estimation of mixtures of discrete graphical models, where the class variable corresponding to the mixture components is hidden and each mixture component over the observed variables can have a potentially different Markov graph structure and parameters. We propose a novel approach for estimating the mixture components, and our output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. Our method is efficient when the union graph, which is the union of the Markov graphs of the mixture components, has sparse vertex separators between any pair of observed variables. This includes tree mixtures and mixtures of bounded degree graphs. For such models, we prove that our method correctly recovers the union graph structure and the tree structures corresponding to maximum-likelihood tree approximations of the mixture components. The sample and computational complexities of our method scale as $\\poly(p, r)$, for an $r$-component mixture of $p$-variate graphical models. 
We further extend our results to the case when the union graph has sparse local separators between any pair of observed variables, such as mixtures of locally tree-like graphs, and the mixture components are in the regime of correlation decay.\nIn Multi-Source Feedback or 360 Degree Feedback, data on the performance of an individual are collected systematically from a number of stakeholders and are used for improving performance. The 360-Degree Feedback approach provides a consistent management philosophy meeting the criterion outlined previously. The 360-degree feedback appraisal process describes a human resource methodology that is frequently used for both employee appraisal and employee development. Used in employee performance appraisals, the 360-degree feedback methodology is differentiated from traditional, top-down appraisal methods in which the supervisor responsible for the appraisal provides the majority of the data. Instead, it seeks to use information gained from other sources to provide a fuller picture of employees' performances. Similarly, when this technique is used in employee development, it augments employees' perceptions of training needs with those of the people with whom they interact. The 360-degree feedback based appraisal is a comprehensive method wherein the feedback about the employee comes from all the sources that come into contact with the employee on his/her job. The respondents for an employee can be her/his peers, managers, subordinates, team members, customers, suppliers and vendors. 
Hence, anyone who comes into contact with the employee can act as a respondent. The 360-degree appraisal has four components that include self-appraisal, superior's appraisal, subordinate's appraisal, student's appraisal and peer's appraisal. The proposed system is an attempt to implement the 360-degree feedback based appraisal system in academics, especially in engineering colleges.\nTo understand the structural dynamics of a large-scale social, biological or technological network, it may be useful to discover behavioral roles representing the main connectivity patterns present over time. In this paper, we propose a scalable non-parametric approach to automatically learn the structural dynamics of the network and individual nodes. Roles may represent structural or behavioral patterns such as the center of a star, peripheral nodes, or bridge nodes that connect different communities. Our novel approach learns the appropriate structural role dynamics for any arbitrary network and tracks the changes over time. In particular, we uncover the specific global network dynamics and the local node dynamics of a technological, communication, and social network. We identify interesting node and network patterns such as stationary and non-stationary roles, spikes/steps in role-memberships (perhaps indicating anomalies), increasing/decreasing role trends, among many others. Our results indicate that the nodes in each of these networks have distinct connectivity patterns that are non-stationary and evolve considerably over time. Overall, the experiments demonstrate the effectiveness of our approach for fast mining and tracking of the dynamics in large networks. Furthermore, the dynamic structural representation provides a basis for building more sophisticated models and tools that are fast for exploring large dynamic networks.\nBayesian structure learning is the NP-hard problem of discovering a Bayesian network that optimally represents a given set of training data. 
In this paper we study the computational worst-case complexity of exact Bayesian structure learning under graph theoretic restrictions on the super-structure. The super-structure (a concept introduced by Perrier, Imoto, and Miyano, JMLR 2008) is an undirected graph that contains as subgraphs the skeletons of solution networks. Our results apply to several variants of score-based Bayesian structure learning where the score of a network decomposes into local scores of its nodes. Results: We show that exact Bayesian structure learning can be carried out in non-uniform polynomial time if the super-structure has bounded treewidth and in linear time if in addition the super-structure has bounded maximum degree. We complement this with a number of hardness results. We show that both restrictions (treewidth and degree) are essential and cannot be dropped without losing uniform polynomial time tractability (subject to a complexity-theoretic assumption). Furthermore, we show that the restrictions remain essential if we do not search for a globally optimal network but aim to improve a given network by means of at most k arc additions, arc deletions, or arc reversals (k-neighborhood local search).\nGaussian processes (GPs) provide a probabilistic nonparametric representation of functions in regression, classification, and other problems. Unfortunately, exact learning with GPs is intractable for large datasets. A variety of approximate GP methods have been proposed that essentially map the large dataset into a small set of basis points. Among them, two state-of-the-art methods are sparse pseudo-input Gaussian process (SPGP) (Snelson and Ghahramani, 2006) and variable-sigma GP (VSGP) (Walder et al., 2008), which generalizes SPGP and allows each basis point to have its own length scale. However, VSGP was only derived for regression. 
In this paper, we propose a new sparse GP framework that uses expectation propagation to directly approximate general GP likelihoods using a sparse and smooth basis. It includes both SPGP and VSGP for regression as special cases. Plus, as an EP algorithm, it inherits the ability to process data online. As a particular choice of approximating family, we blur each basis point with a Gaussian distribution that has a full covariance matrix representing the data distribution around that basis point; as a result, we can summarize local data manifold information with a small set of basis points. Our experiments demonstrate that this framework outperforms previous GP classification methods on benchmark datasets in terms of minimizing divergence to the non-sparse GP solution as well as achieving a lower misclassification rate.\nUndirected graphical models are widely used in statistics, physics and machine vision. However, Bayesian parameter estimation for undirected models is extremely challenging, since evaluation of the posterior typically involves the calculation of an intractable normalising constant. This problem has received much attention, but very little of this has focussed on the important practical case where the data consists of noisy or incomplete observations of the underlying hidden structure. This paper specifically addresses this problem, comparing two alternative methodologies. In the first of these approaches, particle Markov chain Monte Carlo (Andrieu et al., 2010) is used to efficiently explore the parameter space, combined with the exchange algorithm (Murray et al., 2006) for avoiding the calculation of the intractable normalising constant (a proof showing that this combination targets the correct distribution is found in a supplementary appendix online). This approach is compared with approximate Bayesian computation (Pritchard et al., 1999). Applications to estimating the parameters of Ising models and exponential random graphs from noisy data are presented. 
Each algorithm used in the paper targets an approximation to the true posterior due to the use of MCMC to simulate from the latent graphical model, in lieu of being able to do this exactly in general. The supplementary appendix also describes the nature of the resulting approximation.\nThe problem of structure estimation in graphical models with latent variables is considered. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider models where the underlying Markov graph is locally tree-like, and the model is in the regime of correlation decay. For the special case of the Ising model, the number of samples $n$ required for structural consistency of our method scales as $n=\\Omega(\\theta_{\\min}^{-\\delta\\eta(\\eta+1)-2}\\log p)$, where $p$ is the number of variables, $\\theta_{\\min}$ is the minimum edge potential, $\\delta$ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and $\\eta$ is a parameter which depends on the bounds on node and edge potentials in the Ising model. Necessary conditions for structural consistency under any algorithm are derived and our method nearly matches the lower bound on sample requirements. Further, the proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph.\nI present a new approach to recover the primordial density fluctuations and the cosmic web structure underlying a galaxy distribution. The method is based on sampling Gaussian fields which are compatible with a galaxy distribution and a structure formation model. This is achieved by splitting the inversion problem into two Gibbs-sampling steps: the first being a Gaussianisation step transforming a distribution of point sources at Lagrangian positions (which are not given a priori) into a linear alias-free Gaussian field. 
This step is based on Hamiltonian sampling with a Gaussian-Poisson model. The second step consists of a likelihood comparison in which the set of matter tracers at the initial conditions is constrained on the galaxy distribution and the assumed structure formation model. For computational reasons, second-order Lagrangian Perturbation Theory is used. However, the presented approach is flexible enough to adopt any structure formation model. A semi-analytic halo-model based galaxy mock catalog is taken to demonstrate that the recovered initial conditions are closely unbiased with respect to the actual ones from the corresponding N-body simulation down to scales of a ~ 5 Mpc/h. The cross-correlation between them shows a substantial gain of information, more than doubling at k ~ 0.3 h/Mpc. In addition, the initial conditions are extremely well Gaussian distributed and the power-spectra follow the shape of the linear power-spectrum, being very close to the actual one from the simulation down to scales of k ~ 1 h/Mpc.\nRelational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. 
In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.\nWe propose a graphical model for representing networks of stochastic processes, the minimal generative model graph. It is based on reduced factorizations of the joint distribution over time. We show that under appropriate conditions, it is unique and consistent with another type of graphical model, the directed information graph, which is based on a generalization of Granger causality. We demonstrate how directed information quantifies Granger causality in a particular sequential prediction setting. We also develop efficient methods to estimate the topological structure from data that obviate estimating the joint statistics. One algorithm assumes upper-bounds on the degrees and uses the minimal dimension statistics necessary. In the event that the upper-bounds are not valid, the resulting graph is nonetheless an optimal approximation. Another algorithm uses near-minimal dimension statistics when no bounds are known but the distribution satisfies a certain criterion. Analogous to how structure learning algorithms for undirected graphical models use mutual information estimates, these algorithms use directed information estimates. We characterize the sample-complexity of two plug-in directed information estimators and obtain confidence intervals. For the setting when point estimates are unreliable, we propose an algorithm that uses confidence intervals to identify the best approximation that is robust to estimation error. 
Lastly, we demonstrate the effectiveness of the proposed algorithms through analysis of both synthetic data and real data from the Twitter network. In the latter case, we identify which news sources influence users in the network by merely analyzing tweet times.\nWe present an automated classification of 2165 \\textit{Kepler} eclipsing binary (EB) light curves that accompanied the second \\textit{Kepler} data release. The light curves are classified using Locally Linear Embedding, a general nonlinear dimensionality reduction tool, into morphology types (detached, semi-detached, overcontact, ellipsoidal). The method, related to the more widely used Principal Component Analysis, produces a lower-dimensional representation of the input data while preserving local geometry and, consequently, the similarity between neighboring data points. We use this property to reduce the dimensionality in a series of steps to a one-dimensional manifold and classify light curves with a single parameter that is a measure of \"detachedness\" of the system. This fully automated classification correlates well with the manual determination of morphology from the data release, and also efficiently highlights any misclassified objects. Once a lower-dimensional projection space is defined, the classification of additional light curves runs in a negligible time and the method can therefore be used as a fully automated classifier in pipeline structures. The classifier forms a tier of the \\textit{Kepler} EB pipeline that pre-processes light curves for the artificial intelligence based parameter estimator.\nNovel research in the field of Linked Data focuses on the problem of entity summarization. This field addresses the problem of ranking features according to their importance for the task of identifying a particular entity. Besides a more human-friendly presentation, these summarizations can play a central role for semantic search engines and semantic recommender systems. 
Current approaches apply entity summarization based on patterns that are inherent to the regarded data.   The proposed approach of this paper focuses on the movie domain. It utilizes usage data in order to support measuring the similarity between movie entities. Using this similarity it is possible to determine the k-nearest neighbors of an entity. This leads to the idea that features that entities share with their nearest neighbors can be considered as significant or important for these entities. Additionally, we introduce a downgrading factor (similar to TF-IDF) in order to overcome the high number of commonly occurring features. We exemplify the approach based on a movie-ratings dataset that has been linked to Freebase entities.\nWe present an approach to labeling short video clips with English verbs as event descriptions. A key distinguishing aspect of this work is that it labels videos with verbs that describe the spatiotemporal interaction between event participants, humans and objects interacting with each other, abstracting away all object-class information and fine-grained image characteristics, and relying solely on the coarse-grained motion of the event participants. We apply our approach to a large set of 22 distinct verb classes and a corpus of 2,584 videos, yielding two surprising outcomes. First, a classification accuracy of greater than 70% on a 1-out-of-22 labeling task and greater than 85% on a variety of 1-out-of-10 subsets of this labeling task is independent of the choice of which of two different time-series classifiers we employ. Second, we achieve this level of accuracy using a highly impoverished intermediate representation consisting solely of the bounding boxes of one or two event participants as a function of time. 
This indicates that successful event recognition depends more on the choice of appropriate features that characterize the linguistic invariants of the event classes than on the particular classifier algorithms.\nThe application of reinforcement learning algorithms to real-life problems always poses the challenge of filtering the environmental state out of raw sensor readings. While most approaches use heuristics, biology suggests that there must exist an unsupervised method to construct such filters automatically. Besides the extraction of environmental states, the filters have to represent them in a fashion that supports modern reinforcement algorithms. Many popular algorithms use a linear architecture, so one should aim at filters that have good approximation properties in combination with linear functions. This thesis proposes the unsupervised method slow feature analysis (SFA) for this task. Presented with a random sequence of sensor readings, SFA learns a set of filters. With growing model complexity and training examples, the filters converge to trigonometric polynomial functions. These are known to possess excellent approximation capabilities and should therefore support the reinforcement algorithms well. We evaluate this claim on a robot. The task is to learn a navigational control in a simple environment using the least-squares policy iteration (LSPI) algorithm. The only accessible sensor is a head-mounted video camera, but without meaningful filtering, video images are not suited as LSPI input. We will show that filters learned by SFA, based on a random walk video of the robot, allow the learned control to navigate successfully in ca. 80% of the test trials.\nA relatively recent advance in cognitive neuroscience has been multi-voxel pattern analysis (MVPA), which enables researchers to decode brain states and/or the type of information represented in the brain during a cognitive operation. 
MVPA methods utilize machine learning algorithms to distinguish among types of information or cognitive states represented in the brain, based on distributed patterns of neural activity. In the current investigation, we propose a new approach for representation of neural data for pattern analysis, namely a Mesh Learning Model. In this approach, at each time instant, a star mesh is formed around each voxel, such that the voxel corresponding to the center node is surrounded by its p-nearest neighbors. The arc weights of each mesh are estimated from the voxel intensity values by the least-squares method. The estimated arc weights of all the meshes, called Mesh Arc Descriptors (MADs), are then used to train a classifier, such as Neural Networks, k-Nearest Neighbor, Na\\\"ive Bayes and Support Vector Machines. The proposed Mesh Model was tested on neuroimaging data acquired via functional magnetic resonance imaging (fMRI) during a recognition memory experiment using categorized word lists, employing a previously established experimental paradigm (\\\"Oztekin & Badre, 2011). Results suggest that the proposed Mesh Learning approach can provide an effective algorithm for pattern analysis of brain activity during cognitive processing.\nMethods for automated discovery of causal relationships from non-interventional data have received much attention recently. A widely used and well understood model family is given by linear acyclic causal models (recursive structural equation models). For Gaussian data both constraint-based methods (Spirtes et al., 1993; Pearl, 2000) (which output a single equivalence class) and Bayesian score-based methods (Geiger and Heckerman, 1994) (which assign relative scores to the equivalence classes) are available. 
In contrast, all current methods able to utilize non-Gaussianity in the data (Shimizu et al., 2006; Hoyer et al., 2008) always return only a single graph or a single equivalence class, and so are fundamentally unable to express the degree of certainty attached to that output. In this paper we develop a Bayesian score-based approach able to take advantage of non-Gaussianity when estimating linear acyclic causal models, and we empirically demonstrate that, at least on very modest-size networks, its accuracy is as good as or better than existing methods. We provide a complete code package (in R) which implements all algorithms and performs all of the analysis provided in the paper, and hope that this will further the application of these methods to solving causal inference problems.\nThe choice of the kernel is critical to the success of many learning algorithms but it is typically left to the user. Instead, the training data can be used to learn the kernel by selecting it out of a given family, such as that of non-negative linear combinations of p base kernels, constrained by a trace or L1 regularization. This paper studies the problem of learning kernels with the same family of kernels but with an L2 regularization instead, and for regression problems. We analyze the problem of learning kernels with ridge regression. We derive the form of the solution of the optimization problem and give an efficient iterative algorithm for computing that solution. We present a novel theoretical analysis of the problem based on stability and give learning bounds for orthogonal kernels that contain only an additive term $O(\\sqrt{p}/m)$ when compared to the standard kernel ridge regression stability bound. We also report the results of experiments indicating that L1 regularization can lead to modest improvements for a small number of kernels, but to performance degradations in larger-scale cases. 
In contrast, L2 regularization never degrades performance and in fact achieves significant improvements with a large number of kernels.\nEvaluating conjunctive queries and solving constraint satisfaction problems are fundamental problems in database theory and artificial intelligence, respectively. These problems are NP-hard, so that several research efforts have been made in the literature for identifying tractable classes, known as islands of tractability, as well as for devising clever heuristics for efficiently solving real-world instances. Many heuristic approaches are based on enforcing on the given instance a property called local consistency, where (in database terms) each tuple in every query atom matches at least one tuple in every other query atom. Interestingly, it turns out that, for many well-known classes of queries, such as the acyclic queries, enforcing local consistency is even sufficient to solve the given instance correctly. However, the precise power of such a procedure was unclear, except for some very restricted cases. The paper provides full answers to the long-standing questions about the precise power of algorithms based on enforcing local consistency. The classes of instances where enforcing local consistency turns out to be a correct query-answering procedure are however not efficiently recognizable. In fact, the paper finally focuses on certain subclasses defined in terms of the novel notion of greedy tree projections. These latter classes are shown to be efficiently recognizable and strictly larger than most islands of tractability known so far, both in the general case of tree projections and for specific structural decomposition methods.\nWe introduce kLog, a novel approach to statistical relational learning. Unlike standard approaches, kLog does not represent a probability distribution directly. It is rather a language to perform kernel-based learning on expressive logical and relational representations. 
kLog allows users to specify learning problems declaratively. It builds on simple but powerful concepts: learning from interpretations, entity/relationship data modeling, logic programming, and deductive databases. Access by the kernel to the rich representation is mediated by a technique we call graphicalization: the relational representation is first transformed into a graph --- in particular, a grounded entity/relationship diagram. Subsequently, a choice of graph kernel defines the feature space. kLog supports mixed numerical and symbolic data, as well as background knowledge in the form of Prolog or Datalog programs as in inductive logic programming systems. The kLog framework can be applied to tackle the same range of tasks that has made statistical relational learning so popular, including classification, regression, multitask learning, and collective classification. We also report empirical comparisons, showing that kLog can be either more accurate, or much faster at the same level of accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at http://klog.dinfo.unifi.it along with tutorials.\nI am most honoured to have the privilege to present the Foreword to this fascinating and wonderfully varied collection of contributions, concerning the nature of computation and of its deep connection with the operation of those basic laws, known or yet unknown, governing the universe in which we live. Fundamentally deep questions are indeed being grappled with here, and the fact that we find so many different viewpoints is something to be expected, since, in truth, we know little about the foundational nature and origins of these basic laws, despite the immense precision that we so often find revealed in them. 
Accordingly, it is not surprising that within the viewpoints expressed here is some unabashed speculation, occasionally bordering on just partially justified guesswork, while elsewhere we find a good deal of precise reasoning, some in the form of rigorous mathematical theorems. Both of these are as should be, for without some inspired guesswork we cannot have new ideas as to where to look in order to make genuinely new progress, and without precise mathematical reasoning, no less than in precise observation, we cannot know when we are right -- or, more usually, when we are wrong.\nIn order to involve user knowledge in determining equality of sets, which may not be equal in the mathematical sense, three types of approximate (rough) equalities were introduced by Novotny and Pawlak ([8, 9, 10]). These notions were generalized by Tripathy, Mitra and Ojha ([13]), who introduced the concepts of approximate (rough) equivalences of sets. Rough equivalences capture equality of sets at a higher level than rough equalities. More properties of these concepts were established in [14]. Combining the conditions for the two types of approximate equalities, two more approximate equalities were introduced by Tripathy [12] and a comparative analysis of their relative efficiency was provided. In [15], the four types of approximate equalities were extended by considering rough fuzzy sets instead of only rough sets. In fact the concepts of leveled approximate equalities were introduced and properties were studied. In this paper we proceed further by introducing and studying the approximate equalities based on rough intuitionistic fuzzy sets instead of rough fuzzy sets. That is we introduce the concepts of approximate (rough) equalities of intuitionistic fuzzy sets and study their properties. 
We provide some real life examples to show the applications of rough equalities of fuzzy sets and rough equalities of intuitionistic fuzzy sets.\nWithin the framework proposed in this paper, we address the issue of extending the certain networks to a fuzzy certain networks in order to cope with a vagueness and limitations of existing models for decision under imprecise and uncertain knowledge. This paper proposes a framework that combines two disciplines to exploit their own advantages in uncertain and imprecise knowledge representation problems. The framework proposed is a possibilistic logic based one in which Bayesian nodes and their properties are represented by local necessity-valued knowledge base. Data in properties are interpreted as set of valuated formulas. In our contribution possibilistic Bayesian networks have a qualitative part and a quantitative part, represented by local knowledge bases. The general idea is to study how a fusion of these two formalisms would permit representing compact way to solve efficiently problems for knowledge representation. We show how to apply possibility and necessity measures to the problem of knowledge representation with large scale data. On the other hand fuzzification of crisp certainty degrees to fuzzy variables improves the quality of the network and tends to bring smoothness and robustness in the network performance. The general aim is to provide a new approach for decision under uncertainty that combines three methodologies: Bayesian networks certainty distribution and fuzzy logic.\nPertinence Feedback is a technique that enables a user to interactively express his information requirement by modifying his original query formulation with further information. This information is provided by explicitly confirming the pertinent of some indicating objects and/or goals extracted by the system. 
Obviously the user cannot mark objects and/or goals as pertinent until some are extracted, so the first search has to be initiated by a query and the initial query specification has to be good enough to pick out some pertinent objects and/or goals from the Semantic Network. In this paper we present a short survey of fuzzy and Semantic approaches to Knowledge Extraction. The goal of such approaches is to define flexible Knowledge Extraction Systems able to deal with the inherent vagueness and uncertainty of the Extraction process. It has long been recognised that interactivity improves the effectiveness of Knowledge Extraction systems. Novice user's queries are the most natural and interactive medium of communication and recent progress in recognition is making it possible to build systems that interact with the user. However, given the typical novice user's queries submitted to Knowledge Extraction Systems, it is easy to imagine that the effects of goal recognition errors in novice user's queries must be severely destructive on the system's effectiveness. The experimental work reported in this paper shows that the use of possibility theory in classical Knowledge Extraction techniques for novice user's query processing is more robust than the use of the probability theory. Moreover, both possibilistic and probabilistic pertinence feedback can be effectively employed to improve the effectiveness of novice user's query processing.\nWe analyze different aspects of our quantum modeling approach of human concepts, and more specifically focus on the quantum effects of contextuality, interference, entanglement and emergence, illustrating how each of them makes its appearance in specific situations of the dynamics of human concepts and their combinations. We point out the relation of our approach, which is based on an ontology of a concept as an entity in a state changing under influence of a context, with the main traditional concept theories, i.e. 
prototype theory, exemplar theory and theory theory. We ponder about the question why quantum theory performs so well in its modeling of human concepts, and shed light on this question by analyzing the role of complex amplitudes, showing how they allow to describe interference in the statistics of measurement outcomes, while in the traditional theories statistics of outcomes originates in classical probability weights, without the possibility of interference. The relevance of complex numbers, the appearance of entanglement, and the role of Fock space in explaining contextual emergence, all as unique features of the quantum modeling, are explicitly revealed in this paper by analyzing human concepts and their dynamics.\nThe similarity between trajectory patterns in clustering has played an important role in discovering movement behaviour of different groups of mobile objects. Several approaches have been proposed to measure the similarity between sequences in trajectory data. Most of these measures are based on Euclidean space or on spatial network and some of them have been concerned with temporal aspect or ordering types. However, they are not appropriate to characteristics of spatiotemporal mobility patterns in wireless networks. In this paper, we propose a new similarity measure for mobility patterns in cellular space of wireless network. The framework for constructing our measure is composed of two phases as follows. First, we present formal definitions to capture mathematically two spatial and temporal similarity measures for mobility patterns. And then, we define the total similarity measure by means of a weighted combination of these similarities. The truth of the partial and total similarity measures are proved in mathematics. Furthermore, instead of the time interval or ordering, our work makes use of the timestamp at which two mobility patterns share the same cell. 
A case study is also described to give a comparison of the combination measure with other ones.\nThis paper presents a method of optimization, based on both Bayesian Analysis technical and Gallois Lattice, of a Fuzzy Semantic Networks. The technical System we use learn by interpreting an unknown word using the links created between this new word and known words. The main link is provided by the context of the query. When novice's query is confused with an unknown verb (goal) applied to a known noun denoting either an object in the ideal user's Network or an object in the user's Network, the system infer that this new verb corresponds to one of the known goal. With the learning of new words in natural language as the interpretation, which was produced in agreement with the user, the system improves its representation scheme at each experiment with a new user and, in addition, takes advantage of previous discussions with users. The semantic Net of user objects thus obtained by these kinds of learning is not always optimal because some relationships between couple of user objects can be generalized and others suppressed according to values of forces that characterize them. Indeed, to simplify the obtained Net, we propose to proceed to an inductive Bayesian analysis, on the Net obtained from Gallois lattice. The objective of this analysis can be seen as an operation of filtering of the obtained descriptive graph.\nAnswer Set Programming (ASP) is a well-established paradigm of declarative programming in close relationship with other declarative formalisms such as SAT Modulo Theories, Constraint Handling Rules, FO(.), PDDL and many others. Since its first informal editions, ASP systems have been compared in the now well-established ASP Competition. 
The Third (Open) ASP Competition, as the sequel to the ASP Competitions Series held at the University of Potsdam in Germany (2006-2007) and at the University of Leuven in Belgium in 2009, took place at the University of Calabria (Italy) in the first half of 2011. Participants competed on a pre-selected collection of benchmark problems, taken from a variety of domains as well as real world applications. The Competition ran on two tracks: the Model and Solve (M&S) Track, based on an open problem encoding, and open language, and open to any kind of system based on a declarative specification paradigm; and the System Track, run on the basis of fixed, public problem encodings, written in a standard ASP language. This paper discusses the format of the Competition and the rationale behind it, then reports the results for both tracks. Comparison with the second ASP competition and state-of-the-art solutions for some of the benchmark domains is eventually discussed.   To appear in Theory and Practice of Logic Programming (TPLP).\nParameter estimation in Markov random fields (MRFs) is a difficult task, in which inference over the network is run in the inner loop of a gradient descent procedure. Replacing exact inference with approximate methods such as loopy belief propagation (LBP) can suffer from poor convergence. In this paper, we provide a different approach for combining MRF learning and Bethe approximation. We consider the dual of maximum likelihood Markov network learning - maximizing entropy with moment matching constraints - and then approximate both the objective and the constraints in the resulting optimization problem. Unlike previous work along these lines (Teh & Welling, 2003), our formulation allows parameter sharing between features in a general log-linear model, parameter regularization and conditional training. We show that piecewise training (Sutton & McCallum, 2005) is a very restricted special case of this formulation. 
We study two optimization strategies: one based on a single convex approximation and one that uses repeated convex approximations. We show results on several real-world networks that demonstrate that these algorithms can significantly outperform learning with loopy and piecewise. Our results also provide a framework for analyzing the trade-offs of different relaxations of the entropy objective and of the constraints.\nMuch recent work has concerned sparse approximations to speed up the Gaussian process regression from the unfavorable O(n^3) scaling in computational time to O(nm^2). Thus far, work has concentrated on models with one covariance function. However, in many practical situations additive models with multiple covariance functions may perform better, since the data may contain both long and short length-scale phenomena. The long length-scales can be captured with global sparse approximations, such as fully independent conditional (FIC), and the short length-scales can be modeled naturally by covariance functions with compact support (CS). CS covariance functions lead to naturally sparse covariance matrices, which are computationally cheaper to handle than full covariance matrices. In this paper, we propose a new sparse Gaussian process model with two additive components: FIC for the long length-scales and CS covariance function for the short length-scales. We give theoretical and experimental results and show that under certain conditions the proposed model has the same computational complexity as FIC. We also compare the model performance of the proposed model to additive models approximated by fully and partially independent conditional (PIC). We use real data sets and show that our model outperforms FIC and PIC approximations for data sets with two additive phenomena.\nWe consider online planning in Markov decision processes (MDPs). 
In online planning, the agent focuses on its current state only, deliberates about the set of possible policies from that state onwards and, when interrupted, uses the outcome of that exploratory deliberation to choose what action to perform next. The performance of algorithms for online planning is assessed in terms of simple regret, which is the agent's expected performance loss when the chosen action, rather than an optimal one, is followed.   To date, state-of-the-art algorithms for online planning in general MDPs are either best effort, or guarantee only polynomial-rate reduction of simple regret over time. Here we introduce a new Monte-Carlo tree search algorithm, BRUE, that guarantees exponential-rate reduction of simple regret and error probability. This algorithm is based on a simple yet non-standard state-space sampling scheme, MCTS2e, in which different parts of each sample are dedicated to different exploratory objectives. Our empirical evaluation shows that BRUE not only provides superior performance guarantees, but is also very effective in practice and favorably compares to state-of-the-art. We then extend BRUE with a variant of \"learning by forgetting.\" The resulting set of algorithms, BRUE(alpha), generalizes BRUE, improves the exponential factor in the upper bound on its reduction rate, and exhibits even more attractive empirical performance.\nIn the recent advancement of multimedia technologies, it becomes a major concern of detecting visual attention regions in the field of image processing. The popularity of the terminal devices in a heterogeneous environment of the multimedia technology gives us enough scope for the betterment of image visualization. Although there exist numerous methods, feature based image extraction becomes a popular one in the field of image processing. 
The objective of image segmentation is the domain-independent partition of the image into a set of regions, which are visually distinct and uniform with respect to some property, such as grey level, texture or colour. Segmentation and subsequent extraction can be considered the first step and key issue in object recognition, scene understanding and image analysis. Its application area encompasses mobile devices, industrial quality control, medical appliances, robot navigation, geophysical exploration, military applications, etc. In all these areas, the quality of the final results depends largely on the quality of the preprocessing work. Most of the times, acquiring spurious-free preprocessing data requires a lot of application cum mathematical intensive background works. We propose a feature based fuzzy rule guided novel technique that is functionally devoid of any external intervention during execution. Experimental results suggest that this approach is an efficient one in comparison to different other techniques extensively addressed in literature. In order to justify the supremacy of performance of our proposed technique in respect of its competitors, we take recourse to effective metrics like Mean Squared Error (MSE), Mean Absolute Error (MAE) and Peak Signal to Noise Ratio (PSNR).\nWe systematically investigate the complexity of model checking the existential positive fragment of first-order logic. In particular, for a set of existential positive sentences, we consider model checking where the sentence is restricted to fall into the set; a natural question is then to classify which sentence sets are tractable and which are intractable. With respect to fixed-parameter tractability, we give a general theorem that reduces this classification question to the corresponding question for primitive positive logic, for a variety of representations of structures. 
This general theorem allows us to deduce that an existential positive sentence set having bounded arity is fixed-parameter tractable if and only if each sentence is equivalent to one in bounded-variable logic. We then use the lens of classical complexity to study these fixed-parameter tractable sentence sets. We show that such a set can be NP-complete, and consider the length needed by a translation from sentences in such a set to bounded-variable logic; we prove superpolynomial lower bounds on this length using the theory of compilability, obtaining an interesting type of formula size lower bound. Overall, the tools, concepts, and results of this article set the stage for the future consideration of the complexity of model checking on more expressive logics.\nIn this work we present an algorithm for covering continuous connected domains by ant-like robots with very limited capabilities. The robots can mark visited places with pheromone marks and sense the level of the pheromone in their local neighborhood. In case of multiple robots these pheromone marks can be sensed by all robots and provide the only way of (indirect) communication between the robots. The robots are assumed to be memoryless, and to have no global information such as the domain map, their own position (either absolute or relative), total marked area percentage, maximal pheromone level, etc. Despite the robots' simplicity, we show that they are able, by running a very simple rule of behavior, to ensure efficient covering of arbitrary connected domains, including non-planar and multidimensional ones. The novelty of our algorithm lies in the fact that, unlike previously proposed methods, our algorithm works on continuous domains without relying on some \"induced\" underlying graph, that effectively reduces the problem to a discrete case of graph covering. The algorithm guarantees complete coverage of any connected domain. 
We also prove that the algorithm is noise immune, i.e., it is able to cope with any initial pheromone profile (noise). In addition the algorithm provides a bounded constant time between two successive visits of the robot, and thus, is suitable for patrolling or surveillance applications.\nOnline sequence prediction is the problem of predicting the next element of a sequence given previous elements. This problem has been extensively studied in the context of individual sequence prediction, where no prior assumptions are made on the origin of the sequence. Individual sequence prediction algorithms work quite well for long sequences, where the algorithm has enough time to learn the temporal structure of the sequence. However, they might give poor predictions for short sequences. A possible remedy is to rely on the general model of prediction with expert advice, where the learner has access to a set of $r$ experts, each of which makes its own predictions on the sequence. It is well known that it is possible to predict almost as well as the best expert if the sequence length is order of $\\log(r)$. But, without firm prior knowledge on the problem, it is not clear how to choose a small set of {\\em good} experts. In this paper we describe and analyze a new algorithm that learns a good set of experts using a training set of previously observed sequences. We demonstrate the merits of our approach by applying it on the task of click prediction on the web.\nWe consider the problem of parameter estimation using weakly supervised datasets, where a training sample consists of the input and a partially specified annotation, which we refer to as the output. The missing information in the annotation is modeled using latent variables. Previous methods overburden a single distribution with two separate tasks: (i) modeling the uncertainty in the latent variables during training; and (ii) making accurate predictions for the output and the latent variables during testing. 
We propose a novel framework that separates the demands of the two tasks using two distributions: (i) a conditional distribution to model the uncertainty of the latent variables for a given input-output pair; and (ii) a delta distribution to predict the output and the latent variables for a given input. During learning, we encourage agreement between the two distributions by minimizing a loss-based dissimilarity coefficient. Our approach generalizes latent SVM in two important ways: (i) it models the uncertainty over latent variables instead of relying on a pointwise estimate; and (ii) it allows the use of loss functions that depend on latent variables, which greatly increases its applicability. We demonstrate the efficacy of our approach on two challenging problems---object detection and action detection---using publicly available datasets.\nSparse coding is an unsupervised learning algorithm that learns a succinct high-level representation of the inputs given only unlabeled data; it represents each input as a sparse linear combination of a set of basis functions. Originally applied to modeling the human visual cortex, sparse coding has also been shown to be useful for self-taught learning, in which the goal is to solve a supervised classification task given access to additional unlabeled data drawn from different classes than that in the supervised learning problem. Shift-invariant sparse coding (SISC) is an extension of sparse coding which reconstructs a (usually time-series) input using all of the basis functions in all possible shifts. In this paper, we present an efficient algorithm for learning SISC bases. Our method is based on iteratively solving two large convex optimization problems: The first, which computes the linear coefficients, is an L1-regularized linear least squares problem with potentially hundreds of thousands of variables. 
Existing methods typically use a heuristic to select a small subset of the variables to optimize, but we present a way to efficiently compute the exact solution. The second, which solves for bases, is a constrained linear least squares problem. By optimizing over complex-valued variables in the Fourier domain, we reduce the coupling between the different variables, allowing the problem to be solved efficiently. We show that SISC's learned high-level representations of speech and music provide useful features for classification tasks within those domains. When applied to classification, under certain conditions the learned features outperform state of the art spectral and cepstral features.\nInstrumental variables have proven useful, in particular within the social sciences and economics, for making inference about the causal effect of a random variable, B, on another random variable, C, in the presence of unobserved confounders. In the case where relationships are linear, causal effects can be identified exactly from studying the regression of C on A and the regression of B on A, where A is the instrument. In the more general case, bounds have been developed in the literature for the causal effect of B on C, given observational data on the joint distribution of C, B and A. Using an approach based on the analysis of convex polytopes, we develop bounds for the same causal effect when given data on (C,A) and (B,A) only. The bounds developed are thus in direct analogy to the standard use of instruments in econometrics, but we make no assumption of linearity. Use of the bounds is illustrated for experiments with partial compliance. The bounds are, for example, relevant in genetic epidemiology, where the 'Mendelian instrument' S represents a genotype, and where joint data on all of C, B and A may rarely be available but studies involving pairs of these may be abundant. 
Other examples of bounding causal effects are considered to show that the method applies to DAGs in general, subject to certain conditions.\nIn many application domains, such as computational biology, the goal of graphical model structure learning is to uncover discrete relationships between entities. For example, in our problem of interest concerning HIV vaccine design, we want to infer which HIV peptides interact with which immune system molecules (HLA molecules). For problems of this nature, we are interested in determining the number of nonspurious arcs in a learned graphical model. We describe both a Bayesian and frequentist approach to this problem. In the Bayesian approach, we use the posterior distribution over model structures to compute the expected number of true arcs in a learned model. In the frequentist approach, we develop a method based on the concept of the False Discovery Rate. On synthetic data sets generated from models similar to the ones learned, we find that both the Bayesian and frequentist approaches yield accurate estimates of the number of non-spurious arcs. In addition, we speculate that the frequentist approach, which is non-parametric, may outperform the parametric Bayesian approach in situations where the models learned are less representative of the data. Finally, we apply the frequentist approach to our problem of HIV vaccine design.\nAssistive systems for persons with cognitive disabilities (e.g. dementia) are difficult to build due to the wide range of different approaches people can take to accomplishing the same task, and the significant uncertainties that arise from both the unpredictability of client's behaviours and from noise in sensor readings. Partially observable Markov decision process (POMDP) models have been used successfully as the reasoning engine behind such assistive systems for small multi-step tasks such as hand washing. 
POMDP models are a powerful, yet flexible framework for modelling assistance that can deal with uncertainty and utility. Unfortunately, POMDPs usually require a very labour intensive, manual procedure for their definition and construction. Our previous work has described a knowledge driven method for automatically generating POMDP activity recognition and context sensitive prompting systems for complex tasks. We call the resulting POMDP a SNAP (SyNdetic Assistance Process). The spreadsheet-like result of the analysis does not correspond to the POMDP model directly and the translation to a formal POMDP representation is required. To date, this translation had to be performed manually by a trained POMDP expert. In this paper, we formalise and automate this translation process using a probabilistic relational model (PRM) encoded in a relational database. We demonstrate the method by eliciting three assistance tasks from non-experts. We validate the resulting POMDP models using case-based simulations to show that they are reasonable for the domains. We also show a complete case study of a designer specifying one database, including an evaluation in a real-life experiment with a human actor.\nRepresentations are internal models of the environment that can provide guidance to a behaving agent, even in the absence of sensory information. It is not clear how representations are developed and whether or not they are necessary or even essential for intelligent behavior. We argue here that the ability to represent relevant features of the environment is the expected consequence of an adaptive process, give a formal definition of representation based on information theory, and quantify it with a measure R. To measure how R changes over time, we evolve two types of networks---an artificial neural network and a network of hidden Markov gates---to solve a categorization task using a genetic algorithm. 
We find that the capacity to represent increases during evolutionary adaptation, and that agents form representations of their environment during their lifetime. This ability allows the agents to act on sensorial inputs in the context of their acquired representations and enables complex and context-dependent behavior. We examine which concepts (features of the environment) our networks are representing, how the representations are logically encoded in the networks, and how they form as an agent behaves to solve a task. We conclude that R should be able to quantify the representations within any cognitive system, and should be predictive of an agent's long-term adaptive success.\nDiscriminative linear models are a popular tool in machine learning. These can be generally divided into two types: The first is linear classifiers, such as support vector machines, which are well studied and provide state-of-the-art results. One shortcoming of these models is that their output (known as the 'margin') is not calibrated, and cannot be translated naturally into a distribution over the labels. Thus, it is difficult to incorporate such models as components of larger systems, unlike probabilistic based approaches. The second type of approach constructs class conditional distributions using a nonlinearity (e.g. log-linear models), but is occasionally worse in terms of classification error. We propose a supervised learning method which combines the best of both approaches. Specifically, our method provides a distribution over the labels, which is a linear function of the model parameters. As a consequence, differences between probabilities are linear functions, a property which most probabilistic models (e.g. log-linear) do not have.   Our model assumes that classes correspond to linear subspaces (rather than to half spaces). 
Using a relaxed projection operator, we construct a measure which evaluates the degree to which a given vector 'belongs' to a subspace, resulting in a distribution over labels. Interestingly, this view is closely related to similar concepts in quantum detection theory. The resulting models can be trained either to maximize the margin or to optimize average likelihood measures. The corresponding optimization problems are semidefinite programs which can be solved efficiently. We illustrate the performance of our algorithm on real world datasets, and show that it outperforms 2nd order kernel methods.\nMany tasks require finding groups of elements in a matrix of numbers, symbols or class likelihoods. One approach is to use efficient bi- or tri-linear factorization techniques including PCA, ICA, sparse matrix factorization and plaid analysis. These techniques are not appropriate when addition and multiplication of matrix elements are not sensibly defined. More directly, methods like bi-clustering can be used to classify matrix elements, but these methods make the overly-restrictive assumption that the class of each element is a function of a row class and a column class. We introduce a general computational problem, `matrix tile analysis' (MTA), which consists of decomposing a matrix into a set of non-overlapping tiles, each of which is defined by a subset of usually nonadjacent rows and columns. MTA does not require an algebra for combining tiles, but must search over discrete combinations of tile assignments. Exact MTA is a computationally intractable integer programming problem, but we describe an approximate iterative technique and a computationally efficient sum-product relaxation of the integer program. We compare the effectiveness of these methods to PCA and plaid on hundreds of randomly generated tasks. 
Using double-gene-knockout data, we show that MTA finds groups of interacting yeast genes that have biologically-related functions.\nRecently, a theory for stochastic optimal control in non-linear dynamical systems in continuous space-time has been developed (Kappen, 2005). We apply this theory to collaborative multi-agent systems. The agents evolve according to a given non-linear dynamics with additive Wiener noise. Each agent can control its own dynamics. The goal is to minimize the accumulated joint cost, which consists of a state dependent term and a term that is quadratic in the control. We focus on systems of non-interacting agents that have to distribute themselves optimally over a number of targets, given a set of end-costs for the different possible agent-target combinations. We show that optimal control is the combinatorial sum of independent single-agent single-target optimal controls weighted by a factor proportional to the end-costs of the different combinations. Thus, multi-agent control is related to a standard graphical model inference problem. The additional computational cost compared to single-agent control is exponential in the tree-width of the graph specifying the combinatorial sum times the number of targets. We illustrate the result by simulations of systems with up to 42 agents.\nWe present a machine learning approach for estimating the second derivative of a drivable surface, its roughness. Robot perception generally focuses on the first derivative, obstacle detection. However, the second derivative is also important due to its direct relation (with speed) to the shock the vehicle experiences. Knowing the second derivative allows a vehicle to slow down in advance of rough terrain. Estimating the second derivative is challenging due to uncertainty. For example, at range, laser readings may be so sparse that significant information about the surface is missing. Also, a high degree of precision is required in projecting laser readings. 
This precision may be unavailable due to latency or error in the pose estimation. We model these sources of error as a multivariate polynomial. Its coefficients are learned using the shock data as ground truth -- the accelerometers are used to train the lasers. The resulting classifier operates on individual laser readings from a road surface described by a 3D point cloud. The classifier identifies sections of road where the second derivative is likely to be large. Thus, the vehicle can slow down in advance, reducing the shock it experiences. The algorithm is an evolution of one we used in the 2005 DARPA Grand Challenge. We analyze it using data from that route.\nCombinatorial optimization is widely applied in a number of areas nowadays. Unfortunately, many combinatorial optimization problems are NP-hard which usually means that they are unsolvable in practice. However, it is often unnecessary to have an exact solution. In this case one may use heuristic approach to obtain a near-optimal solution in some reasonable time.   We focus on two combinatorial optimization problems, namely the Generalized Traveling Salesman Problem and the Multidimensional Assignment Problem. The first problem is an important generalization of the Traveling Salesman Problem; the second one is a generalization of the Assignment Problem for an arbitrary number of dimensions. Both problems are NP-hard and have hosts of applications.   In this work, we discuss different aspects of heuristics design and evaluation. A broad spectrum of related subjects, covered in this research, includes test bed generation and analysis, implementation and performance issues, local search neighborhoods and efficient exploration algorithms, metaheuristics design and population sizing in memetic algorithm.   The most important results are obtained in the areas of local search and memetic algorithms for the considered problems. 
In both cases we have significantly advanced the existing knowledge on the local search neighborhoods and algorithms by systematizing and improving the previous results. We have proposed a number of efficient heuristics which dominate the existing algorithms in a wide range of time/quality requirements.   Several new approaches, introduced in our memetic algorithms, make them the state-of-the-art metaheuristics for the corresponding problems. Population sizing is one of the most promising among these approaches; it is expected to be applicable to virtually any memetic algorithm.\nOntologies are key enablers for sharing precise and machine-understandable semantics among different applications and parties. Yet, for ontologies to meet these expectations, their quality must be of a good standard. The quality of an ontology is strongly based on the design method employed. This paper addresses the design problems related to the modelling of ontologies, with specific concentration on the issues related to the quality of the conceptualisations produced. The paper aims to demonstrate the impact of the modelling paradigm adopted on the quality of ontological models and, consequently, the potential impact that such a decision can have in relation to the development of software applications. To this aim, an ontology that is conceptualised based on the Object-Role Modelling (ORM) approach (a representative of endurantism) is re-engineered into a one modelled on the basis of the Object Paradigm (OP) (a representative of perdurantism). Next, the two ontologies are analytically compared using the specified criteria. 
The conducted comparison highlights that using the OP for ontology conceptualisation can provide more expressive, reusable, objective and temporal ontologies than those conceptualised on the basis of the ORM approach.\nDynamic treatment regimes operationalize the clinical decision process as a sequence of functions, one for each clinical decision, where each function takes as input up-to-date patient information and gives as output a single recommended treatment. Current methods for estimating optimal dynamic treatment regimes, for example Q-learning, require the specification of a single outcome by which the `goodness' of competing dynamic treatment regimes are measured. However, this is an over-simplification of the goal of clinical decision making, which aims to balance several potentially competing outcomes. For example, often a balance must be struck between treatment effectiveness and side-effect burden. We propose a method for constructing dynamic treatment regimes that accommodates competing outcomes by recommending sets of treatments at each decision point. Formally, we construct a sequence of set-valued functions that take as input up-to-date patient information and give as output a recommended subset of the possible treatments. For a given patient history, the recommended set of treatments contains all treatments that are not inferior according to any of the competing outcomes. When there is more than one decision point, constructing these set-valued functions requires solving a non-trivial enumeration problem. We offer an exact enumeration algorithm by recasting the problem as a linear mixed integer program. The proposed methods are illustrated using data from a depression study and the CATIE schizophrenia study.\nPIDE is a general framework for document-oriented prover interaction and integration, based on a bilingual architecture that combines ML and Scala. 
The overall aim is to connect LCF-style provers like Isabelle (or Coq or HOL) with sophisticated front-end technology on the JVM platform, overcoming command-line interaction at last.   The present system description specifically covers Isabelle/jEdit as part of the official release of Isabelle2011-1 (October 2011). It is a concrete Prover IDE implementation based on Isabelle/PIDE library modules (implemented in Scala) on the one hand, and the well-known text editor framework of jEdit (implemented in Java) on the other hand.   The interaction model of our Prover IDE follows the idea of continuous proof checking: the theory source text is annotated by semantic information by the prover as it becomes available incrementally. This works via an asynchronous protocol that neither blocks the editor nor stops the prover from exploiting parallelism on multi-core hardware. The jEdit GUI provides standard metaphors for augmented text editing (highlighting, squiggles, tooltips, hyperlinks etc.) that we have instrumented to render the formal content from the prover context. Further refinement of the jEdit display engine via suitable plugins and fonts approximates mathematical rendering in the text buffer, including symbols from the TeX repertoire, and sub-/superscripts.   Isabelle/jEdit is presented here both as a usable interface for current Isabelle, and as a reference application to inspire further projects based on PIDE.\nWe address the problem of unsupervised learning of complex articulated object models from 3D range data. We describe an algorithm whose input is a set of meshes corresponding to different configurations of an articulated object. The algorithm automatically recovers a decomposition of the object into approximately rigid parts, the location of the parts in the different object instances, and the articulated object skeleton linking the parts. Our algorithm first registers all the meshes using an unsupervised non-rigid technique described in a companion paper. 
It then segments the meshes using a graphical model that captures the spatial contiguity of parts. The segmentation is done using the EM algorithm, iterating between finding a decomposition of the object into rigid parts, and finding the location of the parts in the object instances. Although the graphical model is densely connected, the object decomposition step can be performed optimally and efficiently, allowing us to identify a large number of object parts while avoiding local maxima. We demonstrate the algorithm on real world datasets, recovering a 15-part articulated model of a human puppet from just 7 different puppet configurations, as well as a 4-part model of a flexing arm where significant non-rigid deformation was present.\nClassical learning assumes the learner is given a labeled data sample, from which it learns a model. The field of Active Learning deals with the situation where the learner begins not with a training sample, but instead with resources that it can use to obtain information to help identify the optimal model. To better understand this task, this paper presents and analyses the simplified \"(budgeted) active model selection\" version, which captures the pure exploration aspect of many active learning problems in a clean and simple problem formulation. Here the learner can use a fixed budget of \"model probes\" (where each probe evaluates the specified model on a random indistinguishable instance) to identify which of a given set of possible models has the highest expected accuracy. Our goal is a policy that sequentially determines which model to probe next, based on the information observed so far. We present a formal description of this task, and show that it is NP-hard in general. 
We then investigate a number of algorithms for this task, including several existing ones (e.g., \"Round-Robin\", \"Interval Estimation\", \"Gittins\") as well as some novel ones (e.g., \"Biased-Robin\"), describing first their approximation properties and then their empirical performance on various problem instances. We observe empirically that the simple biased-robin algorithm significantly outperforms the other algorithms in the case of identical costs and priors.\nHaplotypes, the global patterns of DNA sequence variation, have important implications for identifying complex traits. Recently, blocks of limited haplotype diversity have been discovered in human chromosomes, intensifying the research on modelling the block structure as well as the transitions or co-occurrence of the alleles in these blocks as a way to compress the variability and infer the associations more robustly. The haplotype block structure analysis is typically complicated by the fact that the phase information for each SNP is missing, i.e., the observed allele pairs are not given in a consistent order across the sequence. The techniques for circumventing this require additional information, such as family data, or a more complex sequencing procedure. In this paper we present a hierarchical statistical model and the associated learning and inference algorithms that simultaneously deal with the allele ambiguity per locus, missing data, block estimation, and the complex trait association. While the block structure may differ from the structures inferred by other methods, which use the pedigree information or previously known alleles, the parameters we estimate, including the learned block structure and the estimated block transitions per locus, define a good model of variability in the set. 
The method is completely data-driven and can detect Crohn's disease from the SNP data taken from the human chromosome 5q31 with the detection rate of 80% and a small error variance.\nOne of the major problems in modeling natural signals is that signals with very similar structure may locally have completely different measurements, e.g., images taken under different illumination conditions, or the speech signal captured in different environments. While there have been many successful attempts to address these problems in application-specific settings, we believe that underlying a large set of problems in signal representation is a representational deficiency of intensity-derived local measurements that are the basis of most efficient models. We argue that interesting structure in signals is better captured when the signal is defined as a matrix whose entries are discrete indices to a separate palette of possible measurements. In order to model the variability in signal structure, we define a signal class not by a single index map, but by a probability distribution over the index maps, which can be estimated from the data, and which we call probabilistic index maps. The existing algorithm can be adapted to work with this representation. Furthermore, the probabilistic index map representation leads to algorithms with computational costs proportional to either the size of the palette or the log of the size of the palette, making the cost of significantly increased invariance to non-structural changes quite bearable. We illustrate the benefits of the probabilistic index map representation in several applications in computer vision and speech processing.\nThe lack of interoperability between mobile cellular access networks has long been a challenging obstacle, which telecommunication engineering is trying to overcome. In second generation networks for example, this problem lies in the fact that there are multiple standards. 
Each of these standards can operate in the same frequency range. However, each utilizes a different Radio Technology and Modulation Scheme, which are characteristics of the standard. Therefore, the lack of interoperability in 2G occurs because of the lack of standardization. Interoperability within 3G networks is limited to a few operating modes using different Radio Transmission Technologies that are not inter-operable. Thus, interoperability remains an issue for 3G. 4G technology even being successful in its various trials cannot guarantee the interoperability. This is within each network generation; meanwhile between heterogeneous network generations the situation seems to be worst. This approach is first to analyze the structure, inputs, and outputs of three different cellular technologies, performing a domain analysis (of this subset of technologies) and producing a feature model of the domain. Finally, we sought to build an ontology capable of providing a common view of the domain, providing an effective representation of relations between representations of corresponding concepts in different cellular technologies.\nThe exploration/exploitation (E/E) dilemma arises naturally in many subfields of Science. Multi-armed bandit problems formalize this dilemma in its canonical form. Most current research in this field focuses on generic solutions that can be applied to a wide range of problems. However, in practice, it is often the case that a form of prior information is available about the specific class of target problems. Prior knowledge is rarely used in current solutions due to the lack of a systematic approach to incorporate it into the E/E strategy.   
To address a specific class of E/E problems, we propose to proceed in three steps: (i) model prior knowledge in the form of a probability distribution over the target class of E/E problems; (ii) choose a large hypothesis space of candidate E/E strategies; and (iii), solve an optimization problem to find a candidate E/E strategy of maximal average performance over a sample of problems drawn from the prior distribution.   We illustrate this meta-learning approach with two different hypothesis spaces: one where E/E strategies are numerically parameterized and another where E/E strategies are represented as small symbolic formulas. We propose appropriate optimization algorithms for both cases. Our experiments, with two-armed Bernoulli bandit problems and various playing budgets, show that the meta-learnt E/E strategies outperform generic strategies of the literature (UCB1, UCB1-Tuned, UCB-v, KL-UCB and epsilon greedy); they also evaluate the robustness of the learnt E/E strategies, by tests carried out on arms whose rewards follow a truncated Gaussian distribution.\nToday's conventional search engines hardly do provide the essential content relevant to the user's search query. This is because the context and semantics of the request made by the user is not analyzed to the full extent. So here the need for a semantic web search arises. SWS is upcoming in the area of web search which combines Natural Language Processing and Artificial Intelligence. The objective of the work done here is to design, develop and implement a semantic search engine- SIEU(Semantic Information Extraction in University Domain) confined to the university domain. SIEU uses ontology as a knowledge base for the information retrieval process. It is not just a mere keyword search. It is one layer above what Google or any other search engines retrieve by analyzing just the keywords. Here the query is analyzed both syntactically and semantically. 
The developed system retrieves the web results more relevant to the user query through keyword expansion. The results obtained here will be accurate enough to satisfy the request made by the user. The level of accuracy will be enhanced since the query is analyzed semantically. The system will be of great use to the developers and researchers who work on web. The Google results are re-ranked and optimized for providing the relevant links. For ranking an algorithm has been applied which fetches more apt results for the user query.\nCreating and monitoring competitive and cost-effective pay-per-click advertisement campaigns through the web-search channel is a resource demanding task in terms of expertise and effort. Assisting or even automating the work of an advertising specialist will have an unrivaled commercial value. In this paper we propose a methodology, an architecture, and a fully functional framework for semi- and fully- automated creation, monitoring, and optimization of cost-efficient pay-per-click campaigns with budget constraints. The campaign creation module generates automatically keywords based on the content of the web page to be advertised extended with corresponding ad-texts. These keywords are used to create automatically the campaigns fully equipped with the appropriate values set. The campaigns are uploaded to the auctioneer platform and start running. The optimization module focuses on the learning process from existing campaign statistics and also from applied strategies of previous periods in order to invest optimally in the next period. The objective is to maximize the performance (i.e. clicks, actions) under the current budget constraint. 
The fully functional prototype is experimentally evaluated on real world Google AdWords campaigns and presents a promising behavior with regards to campaign performance statistics as it outperforms systematically the competing manually maintained campaigns.\nInferring probabilistic networks from data is a notoriously difficult task. Under various goodness-of-fit measures, finding an optimal network is NP-hard, even if restricted to polytrees of bounded in-degree. Polynomial-time algorithms are known only for rare special cases, perhaps most notably for branchings, that is, polytrees in which the in-degree of every node is at most one. Here, we study the complexity of finding an optimal polytree that can be turned into a branching by deleting some number of arcs or nodes, treated as a parameter.   We show that the problem can be solved via a matroid intersection formulation in polynomial time if the number of deleted arcs is bounded by a constant. The order of the polynomial time bound depends on this constant, hence the algorithm does not establish fixed-parameter tractability when parameterized by the number of deleted arcs. We show that a restricted version of the problem allows fixed-parameter tractability and hence scales well with the parameter. We contrast this positive result by showing that if we parameterize by the number of deleted nodes, a somewhat more powerful parameter, the problem is not fixed-parameter tractable, subject to a complexity-theoretic assumption.\nFuzzy rule based classification systems are one of the most popular fuzzy modeling systems used in pattern classification problems. This paper investigates the effect of applying nine different T-norms in fuzzy rule based classification systems. In the recent researches, fuzzy versions of confidence and support merits from the field of data mining have been widely used for both rules selecting and weighting in the construction of fuzzy rule based classification systems. 
For calculating these merits the product has been usually used as a T-norm. In this paper different T-norms have been used for calculating the confidence and support measures. Therefore, the calculations in rule selection and rule weighting steps (in the process of constructing the fuzzy rule based classification systems) are modified by employing these T-norms. Consequently, these changes in calculation results in altering the overall accuracy of rule based classification systems. Experimental results obtained on some well-known data sets show that the best performance is produced by employing the Aczel-Alsina operator in terms of the classification accuracy, the second best operator is Dubois-Prade and the third best operator is Dombi. In experiments, we have used 12 data sets with numerical attributes from the University of California, Irvine machine learning repository (UCI).\nWith such increasing popularity and availability of digital text data, authorships of digital texts can not be taken for granted due to the ease of copying and parsing. This paper presents a new text style analysis called natural frequency zoned word distribution analysis (NFZ-WDA), and then a basic authorship attribution scheme and an open authorship attribution scheme for digital texts based on the analysis. NFZ-WDA is based on the observation that all authors leave distinct intrinsic word usage traces on texts written by them and these intrinsic styles can be identified and employed to analyze the authorship. The intrinsic word usage styles can be estimated through the analysis of word distribution within a text, which is more than normal word frequency analysis and can be expressed as: which groups of words are used in the text; how frequently does each group of words occur; how are the occurrences of each group of words distributed in the text. 
Next, the basic authorship attribution scheme and the open authorship attribution scheme provide solutions for both closed and open authorship attribution problems. Through analysis and extensive experimental studies, this paper demonstrates the efficiency of the proposed method for authorship attribution.\nA major computational burden, while performing document clustering, is the calculation of similarity measure between a pair of documents. Similarity measure is a function that assign a real number between 0 and 1 to a pair of documents, depending upon the degree of similarity between them. A value of zero means that the documents are completely dissimilar whereas a value of one indicates that the documents are practically identical. Traditionally, vector-based models have been used for computing the document similarity. The vector-based models represent several features present in documents. These approaches to similarity measures, in general, cannot account for the semantics of the document. Documents written in human languages contain contexts and the words used to describe these contexts are generally semantically related. Motivated by this fact, many researchers have proposed semantic-based similarity measures by utilizing text annotation through external thesauruses like WordNet (a lexical database). In this paper, we define a semantic similarity measure based on documents represented in topic maps. Topic maps are rapidly becoming an industrial standard for knowledge representation with a focus for later search and extraction. The documents are transformed into a topic map based coded knowledge and the similarity between a pair of documents is represented as a correlation between the common patterns. 
The experimental studies on the text mining datasets reveal that this new similarity measure is more effective as compared to commonly used similarity measures in text clustering.\nMuch current research in AI and games is being devoted to Monte Carlo search (MCS) algorithms. While the quest for a single unified MCS algorithm that would perform well on all problems is of major interest for AI, practitioners often know in advance the problem they want to solve, and spend plenty of time exploiting this knowledge to customize their MCS algorithm in a problem-driven way. We propose an MCS algorithm discovery scheme to perform this in an automatic and reproducible way. We first introduce a grammar over MCS algorithms that enables inducing a rich space of candidate algorithms. Afterwards, we search in this space for the algorithm that performs best on average for a given distribution of training problems. We rely on multi-armed bandits to approximately solve this optimization problem. The experiments, generated on three different domains, show that our approach enables discovering algorithms that outperform several well-known MCS algorithms such as Upper Confidence bounds applied to Trees and Nested Monte Carlo search. We also show that the discovered algorithms are generally quite robust with respect to changes in the distribution over the training problems.\nDirect policy search (DPS) and look-ahead tree (LT) policies are two widely used classes of techniques to produce high performance policies for sequential decision-making problems. To make DPS approaches work well, one crucial issue is to select an appropriate space of parameterized policies with respect to the targeted problem. A fundamental issue in LT approaches is that, to take good decisions, such policies must develop very large look-ahead trees which may require excessive online computational resources. 
In this paper, we propose a new hybrid policy learning scheme that lies at the intersection of DPS and LT, in which the policy is an algorithm that develops a small look-ahead tree in a directed way, guided by a node scoring function that is learned through DPS. The LT-based representation is shown to be a versatile way of representing policies in a DPS scheme, while at the same time, DPS enables to significantly reduce the size of the look-ahead trees that are required to take high-quality decisions.   We experimentally compare our method with two other state-of-the-art DPS techniques and four common LT policies on four benchmark domains and show that it combines the advantages of the two techniques from which it originates. In particular, we show that our method: (1) produces overall better performing policies than both pure DPS and pure LT policies, (2) requires a substantially smaller number of policy evaluations than other DPS techniques, (3) is easy to tune and (4) results in policies that are quite robust with respect to perturbations of the initial conditions.\nAs an important tool for information filtering in the era of socialized web, recommender systems have witnessed rapid development in the last decade. As benefited from the better interpretability, neighborhood-based collaborative filtering techniques, such as item-based collaborative filtering adopted by Amazon, have gained a great success in many practical recommender systems. However, the neighborhood-based collaborative filtering method suffers from the rating bound problem, i.e., the rating on a target item that this method estimates is bounded by the observed ratings of its all neighboring items. Therefore, it cannot accurately estimate the unobserved rating on a target item, if its ground truth rating is actually higher (lower) than the highest (lowest) rating over all items in its neighborhood. 
In this paper, we address this problem by formalizing rating estimation as a task of recovering a scalar rating function. With a linearity assumption, we infer all the ratings by optimizing the low-order norm, e.g., the $l_1/2$-norm, of the second derivative of the target scalar function, while remaining its observed ratings unchanged. Experimental results on three real datasets, namely Douban, Goodreads and MovieLens, demonstrate that the proposed approach can well overcome the rating bound problem. Particularly, it can significantly improve the accuracy of rating estimation by 37% than the conventional neighborhood-based methods.\nWe propose parametric constructive Kripke-semantics for multi-agent KD45-belief and S5-knowledge in terms of elementary set-theoretic constructions of two basic functional building blocks, namely bias (or viewpoint) and visibility, functioning also as the parameters of the doxastic and epistemic accessibility relation. The doxastic accessibility relates two possible worlds whenever the application of the composition of bias with visibility to the first world is equal to the application of visibility to the second world. The epistemic accessibility is the transitive closure of the union of our doxastic accessibility and its converse. Therefrom, accessibility relations for common and distributed belief and knowledge can be constructed in a standard way. As a result, we obtain a general definition of knowledge in terms of belief that enables us to view S5-knowledge as accurate (unbiased and thus true) KD45-belief, negation-complete belief and knowledge as exact KD45-belief and S5-knowledge, respectively, and perfect S5-knowledge as precise (exact and accurate) KD45-belief, and all this generically for arbitrary functions of bias and visibility. Our results can be seen as a semantic complement to previous foundational results by Halpern et al. 
about the (un)definability and (non-)reducibility of knowledge in terms of and to belief, respectively.\nThe aim of this paper is to develop a methodology that is useful for analysing from a microeconomic perspective the incentives to entry, permanence and exit in the market for pharmaceutical generics under fuzzy conditions. In an empirical application of our proposed methodology, the potential towards permanence of labs with different characteristics has been estimated. The case we deal with is set in an open market where global players diversify into different national markets of pharmaceutical generics. Risk issues are significantly important in deterring decision makers from expanding in the generic pharmaceutical business. However, not all players are affected in the same way and/or to the same extent. Small, non-diversified generics labs are in the worse position. We have highlighted that the expected NPV and the number of generics in the portfolio of a pharmaceutical lab are important variables, but that it is also important to consider the degree of diversification. Labs with a higher potential for diversification across markets have an advantage over smaller labs. We have described a fuzzy decision support system based on the Mamdani model in order to determine the incentives for a laboratory to remain in the market both when it is stable and when it is growing.\nCultural algorithm is a kind of evolutionary algorithm inspired from societal evolution and is composed of a belief space, a population space and a protocol that enables exchange of knowledge between these sources. Knowledge created in the population space is accepted into the belief space while this collective knowledge from these sources is combined to influence the decisions of the individual agents in solving problems. 
Classification rules come under descriptive knowledge discovery in data mining and are the most sought after by users since they represent a highly comprehensible form of knowledge. The rules have certain properties which make them useful forms of actionable knowledge to users. The rules are evaluated using these properties, namely the rule metrics. In the current study a Cultural Algorithm Toolkit for Classification Rule Mining (CAT-CRM) is proposed which allows the user to control three different sets of parameters, namely the evolutionary parameters, the rule parameters and the agent parameters, and hence can be used for experimenting with an evolutionary system, a rule mining system or an agent-based social system. Results of experiments conducted to observe the effect of different numbers and types of metrics on the performance of the algorithm on benchmark data sets are reported.\nEfficient ontology debugging is a cornerstone for many activities in the context of the Semantic Web, especially when automatic tools produce (parts of) ontologies such as in the field of ontology matching. The best currently known interactive debugging systems rely upon some meta information in terms of fault probabilities, which can speed up the debugging procedure in the good case, but can also have a negative impact on the performance in the bad case. The problem is that assessment of the meta information is only possible a-posteriori. Consequently, as long as the actual fault is unknown, there is always some risk of suboptimal interactive diagnoses discrimination. As an alternative, one might prefer to rely on a tool which pursues a no-risk strategy. In this case, however, possibly well-chosen meta information cannot be exploited, resulting again in inefficient debugging actions. In this work we present a reinforcement learning strategy that continuously adapts its behavior depending on the performance achieved and minimizes the risk of using low-quality meta information. 
Therefore, this method is suitable for application scenarios where reliable a-priori fault estimates are difficult to obtain. Using problematic ontologies in the field of ontology matching, we show that the proposed risk-aware query strategy outperforms both active learning approaches and no-risk strategies on average in terms of the required amount of user interaction.\nA central problem of surveillance is to monitor multiple targets moving in a large-scale, obstacle-ridden environment with occlusions. This paper presents a novel principled Partially Observable Markov Decision Process-based approach to coordinating and controlling a network of active cameras for tracking and observing multiple mobile targets at high resolution in such surveillance environments. Our proposed approach is capable of (a) maintaining a belief over the targets' states (i.e., locations, directions, and velocities) to track them, even when they may not be observed directly by the cameras at all times, (b) coordinating the cameras' actions to simultaneously improve the belief over the targets' states and maximize the expected number of targets observed with a guaranteed resolution, and (c) exploiting the inherent structure of our surveillance problem to improve its scalability (i.e., linear time) in the number of targets to be observed. Quantitative comparisons with state-of-the-art multi-camera coordination and control techniques show that our approach can achieve higher surveillance quality in real time. The practical feasibility of our approach is also demonstrated using real AXIS 214 PTZ cameras.\nCovering is a common type of data structure and covering-based rough set theory is an efficient tool to process this data. The lattice is an important algebraic structure and is used extensively in investigating some types of generalized rough sets. In this paper, we propose two families of sets and study the conditions under which these two sets become lattice structures. 
These two sets consist of the fixed points of the lower approximations of the first type and the sixth type of covering-based rough sets, respectively. These two sets are called the fixed point set of neighborhoods and the fixed point set of covering, respectively. First, for any covering, the fixed point set of neighborhoods is a complete and distributive lattice; at the same time, it is also a double p-algebra. In particular, when the neighborhood forms a partition of the universe, the fixed point set of neighborhoods is both a Boolean lattice and a double Stone algebra. Second, for any covering, the fixed point set of covering is a complete lattice. When the covering is unary, the fixed point set of covering becomes a distributive lattice and a double p-algebra. In particular, when the reduction of the covering forms a partition of the universe, the fixed point set of covering is both a Boolean lattice and a double Stone algebra.\nWe design temporal description logics suitable for reasoning about temporal conceptual data models and investigate their computational complexity. Our formalisms are based on DL-Lite logics with three types of concept inclusions (ranging from atomic concept inclusions and disjointness to the full Booleans), as well as cardinality constraints and role inclusions. In the temporal dimension, they capture future and past temporal operators on concepts, flexible and rigid roles, the operators `always' and `some time' on roles, data assertions for particular moments of time and global concept inclusions. The logics are interpreted over the Cartesian products of object domains and the flow of time (Z,<), satisfying the constant domain assumption. We prove that the most expressive of our temporal description logics (which can capture lifespan cardinalities and either qualitative or quantitative evolution constraints) are undecidable. 
However, by omitting some of the temporal operators on concepts/roles or by restricting the form of concept inclusions we obtain logics whose complexity ranges between PSpace and NLogSpace. These positive results were obtained by reduction to various clausal fragments of propositional temporal logic, which opens a way to employ propositional or first-order temporal provers for reasoning about temporal data models.\nTaaable is a case-based reasoning system that adapts cooking recipes to user constraints. Within it, the preparation part of recipes is formalised as a graph. This graph is a semantic representation of the sequence of instructions composing the cooking process and is used to compute the procedure adaptation, conjointly with the textual adaptation. It is composed of cooking actions and ingredients, among others, represented as vertices, and semantic relations between those, shown as arcs, and is built automatically thanks to natural language processing. The result of the automatic annotation process is often a disconnected graph, representing an incomplete annotation, or may contain errors. Therefore, a validating and correcting step is required. In this paper, we present an existing graphic tool named \\kcatos, conceived for representing and editing decision trees, and show how it has been adapted and integrated into WikiTaaable, the semantic wiki in which the knowledge used by Taaable is stored. This interface provides the wiki users with a way to correct the case representation of the cooking process, improving at the same time the quality of the knowledge about cooking procedures stored in WikiTaaable.\nNormally distributed measurement error is ubiquitous in applications. Generally, smaller measurement error requires a better instrument and a higher test cost. In decision making based on attribute values of objects, we shall select an attribute subset with appropriate measurement error to minimize the total test cost. 
Recently, an error-range-based covering rough set with uniformly distributed error was proposed to investigate this issue. However, in most applications measurement errors follow a normal distribution rather than a uniform distribution, which is overly simple. In this paper, we introduce normal distribution measurement errors into the covering-based rough set model, and deal with the test-cost-sensitive attribute reduction problem in this new model. The major contributions of this paper are four-fold. First, we build a new data model based on normal distribution measurement errors. With the new data model, the error range is an ellipse in a two-dimensional space. Second, the covering-based rough set with normal distribution measurement errors is constructed through the \"3-sigma\" rule. Third, the test-cost-sensitive attribute reduction problem is redefined on this covering-based rough set. Fourth, a heuristic algorithm is proposed to deal with this problem. The algorithm is tested on ten UCI (University of California - Irvine) datasets. The experimental results show that the algorithm is more effective and efficient than the existing one. This study is a step toward realistic applications of cost-sensitive learning.\nA fundamental question in systems biology is the construction of mathematical models and their training to data. Logic formalisms have become very popular to model signaling networks because their simplicity allows us to model large systems encompassing hundreds of proteins. An approach to train (Boolean) logic models to high-throughput phospho-proteomics data was recently introduced and solved using optimization heuristics based on stochastic methods. Here we demonstrate how this problem can be solved using Answer Set Programming (ASP), a declarative problem solving paradigm, in which a problem is encoded as a logical program such that its answer sets represent solutions to the problem. 
ASP offers significant improvements over heuristic methods in terms of efficiency and scalability; it guarantees global optimality of solutions and provides a complete set of solutions. We illustrate the application of ASP with in silico cases based on realistic networks and data.\nThe notion of meta-mining has appeared recently and extends traditional meta-learning in two ways. First, it does not learn meta-models that provide support only for the learning algorithm selection task but ones that support the whole data-mining process. Second, it abandons the so-called black-box approach to algorithm description followed in meta-learning. Now, in addition to the datasets, algorithms and workflows also have descriptors. For the latter two these descriptions are semantic, describing properties of the algorithms. With the availability of descriptors both for datasets and data mining workflows, the traditional modelling techniques followed in meta-learning, typically based on classification and regression algorithms, are no longer appropriate. Instead we are faced with a problem the nature of which is much more similar to the problems that appear in recommendation systems. The most important meta-mining requirements are that suggestions should use only dataset and workflow descriptors, and that the cold-start problem, e.g., providing workflow suggestions for new datasets, must be handled. In this paper we take a different view on the meta-mining modelling problem and treat it as a recommender problem. In order to account for the meta-mining specificities we derive a novel metric-based-learning recommender approach. Our method learns two homogeneous metrics, one in the dataset and one in the workflow space, and a heterogeneous one in the dataset-workflow space. All learned metrics reflect similarities established from the dataset-workflow preference matrix. We demonstrate our method on meta-mining over biological (microarray datasets) problems. 
The application of our method is not limited to the meta-mining problem; its formulation is general enough that it can be applied to problems with similar requirements.\nAnswer Set Programming (ASP) is a well-known problem solving approach based on nonmonotonic logic programs and efficient solvers. To enable access to external information, HEX-programs extend programs with external atoms, which allow for a bidirectional communication between the logic program and external sources of computation (e.g., description logic reasoners and Web resources). Current solvers evaluate HEX-programs by a translation to ASP itself, in which values of external atoms are guessed and verified after the ordinary answer set computation. This elegant approach does not scale with the number of external accesses in general, in particular in the presence of nondeterminism (which is instrumental for ASP). In this paper, we present a novel, native algorithm for evaluating HEX-programs which uses learning techniques. In particular, we extend conflict-driven ASP solving techniques, which prevent the solver from running into the same conflict again, from ordinary to HEX-programs. We show how to gain additional knowledge from external source evaluations and how to use it in a conflict-driven algorithm. We first target the uninformed case, i.e., when we have no extra information on external sources, and then extend our approach to the case where additional meta-information is available. Experiments show that learning from external sources can significantly decrease both the runtime and the number of considered candidate compatible sets.\nExisting Bayesian models, especially nonparametric Bayesian methods, rely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations. While priors can affect posterior distributions through Bayes' rule, imposing posterior regularization is arguably more direct and in some cases more natural and general. 
In this paper, we present regularized Bayesian inference (RegBayes), a novel computational framework that performs posterior inference with a regularization term on the desired post-data posterior distribution under an information theoretical formulation. RegBayes is more flexible than the procedure that elicits expert knowledge via priors, and it covers both directed Bayesian networks and undirected Markov networks whose Bayesian formulation results in hybrid chain graph models. When the regularization is induced from a linear operator on the posterior distributions, such as the expectation operator, we present a general convex-analysis theorem to characterize the solution of RegBayes. Furthermore, we present two concrete examples of RegBayes, infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the large-margin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets, which appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics. Such results were not available until now, and contribute to pushing forward the interface between these two important subfields, which have been largely treated as isolated in the community.\nThe fundamental, powerful process of computation in the brain has been widely misunderstood. The paper [1] associates the general failure to build intelligent thinking machines with current reductionist principles of temporal coding and advocates for a change in paradigm regarding the brain analogy. Since fragments of information are stored in proteins which can shift between several structures to perform their function, the biological substrate is actively involved in physical computation. 
The intrinsic nonlinear dynamics of action potentials and synaptic activities maintain physical interactions within and between neurons in the brain. During these events the required information is exchanged between molecular structures (proteins) which store fragments of information and the generated electric flux which carries and integrates information in the brain. The entire process of physical interaction explains how the brain actively creates or experiences meaning. This process of interaction during an action potential generation can be simply seen as the moment when the neuron solves a many-body problem. A neuroelectrodynamic theory shows that the neuron solves equations rather than exclusively computes functions. With the main focus on temporal patterns, the spike timing dogma (STD) has neglected important forms of computation which do occur inside neurons. In addition, artificial neural models have missed the most important part since the real super-computing power of the brain has its origins in computations that occur within neurons.\nAndroid and Facebook provide third-party applications with access to users' private data and the ability to perform potentially sensitive operations (e.g., post to a user's wall or place phone calls). As a security measure, these platforms restrict applications' privileges with permission systems: users must approve the permissions requested by applications before the applications can make privacy- or security-relevant API calls. However, recent studies have shown that users often do not understand permission requests and lack a notion of typicality of requests. As a first step towards simplifying permission systems, we cluster a corpus of 188,389 Android applications and 27,029 Facebook applications to find patterns in permission requests. 
Using a method for Boolean matrix factorization for finding overlapping clusters, we find that Facebook permission requests follow a clear structure that exhibits high stability when fitted with only five clusters, whereas Android applications demonstrate more complex permission requests. We also find that low-reputation applications often deviate from the permission request patterns that we identified for high-reputation applications, suggesting that permission request patterns are indicative of user satisfaction or application quality.\nMulti-view learning algorithms typically assume a complete bipartite mapping between the different views in order to exchange information during the learning process. However, many applications provide only a partial mapping between the views, creating a challenge for current methods. To address this problem, we propose a multi-view algorithm based on constrained clustering that can operate with an incomplete mapping. Given a set of pairwise constraints in each view, our approach propagates these constraints using a local similarity measure to those instances that can be mapped to the other views, allowing the propagated constraints to be transferred across views via the partial mapping. It uses co-EM to iteratively estimate the propagation within each view based on the current clustering model, transfer the constraints across views, and then update the clustering model. By alternating the learning process between views, this approach produces a unified clustering model that is consistent with all views. We show that this approach significantly improves clustering performance over several other methods for transferring constraints and allows multi-view clustering to be reliably applied when given a limited mapping between the views. 
Our evaluation reveals that the propagated constraints have high precision with respect to the true clusters in the data, explaining their benefit to clustering performance in both single- and multi-view learning scenarios.\nIn social networks, information and influence diffuse among users as cascades. While the importance of studying cascades has been recognized in various applications, it is difficult to observe the complete structure of cascades in practice. Moreover, much less is known about how to infer cascades based on partial observations. In this paper we study the cascade inference problem following the independent cascade model, and provide a full treatment from complexity to algorithms: (a) We propose the idea of consistent trees as the inferred structures for cascades; these trees connect source nodes and observed nodes with paths satisfying the constraints from the observed temporal information. (b) We introduce metrics to measure the likelihood of consistent trees as inferred cascades, as well as several optimization problems for finding them. (c) We show that the decision problems for consistent trees are in general NP-complete, and that the optimization problems are hard to approximate. (d) We provide approximation algorithms with performance guarantees on the quality of the inferred cascades, as well as heuristics. We experimentally verify the efficiency and effectiveness of our inference algorithms, using real and synthetic data.\nQuick Summary is an innovative implementation of an automatic document summarizer that takes a document in the English language as input and evaluates each sentence. The scanner or evaluator determines criteria based on each sentence's grammatical structure and place in the paragraph. The program then asks the user to specify the number of sentences they wish to highlight. For example, should the user ask for the three most important sentences, it would highlight the first and most important sentence in green. 
Commonly this is the sentence containing the conclusion. Then Quick Summary finds the second most important sentence, usually called a satellite, and highlights it in yellow. This is usually the topic sentence. Then the program finds the third most important sentence and highlights it in red. The implementations of this technology are useful in a society of information overload when a person typically receives 42 emails a day (Microsoft). The paper also takes a candid look at the difficulty that machine learning has with textual translation. However, it also discusses how to overcome the obstacles that historically prevented progress. This paper proposes mathematical meta-data criteria that justify the place of importance of a sentence. Just as tools for the study of relational symmetry in bio-informatics, this tool seeks to classify words with greater clarity. \"Survey Finds Workers Average Only Three Productive Days per Week.\" Microsoft News Center. Microsoft. Web. 31 Mar. 2012.\nCongestion games model a wide variety of real-world resource congestion problems, such as selfish network routing, traffic route guidance in congested areas, taxi fleet optimization and crowd movement in busy areas. However, existing research in congestion games assumes: (a) deterministic movement of agents between resources; and (b) perfect rationality (i.e. maximizing their own expected value) of all agents. Such assumptions are not reasonable in dynamic domains where decision support has to be provided to humans. For instance, in optimizing the performance of a taxi fleet serving a city, movement of taxis can be involuntary or nondeterministic (decided by the specific customer who hires the taxi) and more importantly, taxi drivers may not follow advice provided by the decision support system (due to bounded rationality of humans). To that end, we contribute: (a) a general framework for representing congestion games under uncertainty for populations with assorted notions of rationality. 
(b) a scalable approach for solving the decision problem for perfectly rational agents that are mixed with boundedly rational agents; and (c) a detailed evaluation on a synthetic and a real-world data set to illustrate the usefulness of our new approach with respect to key social welfare metrics in the context of an assorted human-agent population. An interesting result from our experiments on a real-world taxi fleet optimization problem is that it is better (in terms of revenue and operational efficiency) for taxi drivers to follow perfectly rational strategies irrespective of the percentage of drivers not following the advice.\nA determinantal point process (DPP) is a random process useful for modeling the combinatorial problem of subset selection. In particular, DPPs encourage a random subset Y to contain a diverse set of items selected from a base set 𝒴. For example, we might use a DPP to display a set of news headlines that are relevant to a user's interests while covering a variety of topics. Suppose, however, that we are asked to sequentially select multiple diverse sets of items, for example, displaying new headlines day-by-day. We might want these sets to be diverse not just individually but also through time, offering headlines today that are unlike the ones shown yesterday. In this paper, we construct a Markov DPP (M-DPP) that models a sequence of random sets {Y_t}. The proposed M-DPP defines a stationary process that maintains DPP margins. Crucially, the induced union process Z_t = Y_t ∪ Y_{t-1} is also marginally DPP-distributed. Jointly, these properties imply that the sequence of random sets is encouraged to be diverse both at a given time step and across time steps. We describe an exact, efficient sampling procedure, and a method for incrementally learning a quality measure over items in the base set 𝒴 based on external preferences. 
We apply the M-DPP to the task of sequentially displaying diverse and relevant news articles to a user with topic preferences.\nIn spectral clustering, one defines a similarity matrix for a collection of data points, transforms the matrix to get the Laplacian matrix, finds the eigenvectors of the Laplacian matrix, and obtains a partition of the data using the leading eigenvectors. The last step is sometimes referred to as rounding, where one needs to decide how many leading eigenvectors to use, to determine the number of clusters, and to partition the data points. In this paper, we propose a novel method for rounding. The method differs from previous methods in three ways. First, we relax the assumption that the number of clusters equals the number of eigenvectors used. Second, when deciding the number of leading eigenvectors to use, we not only rely on information contained in the leading eigenvectors themselves, but also use subsequent eigenvectors. Third, our method is model-based and solves all three subproblems of rounding using a class of graphical models called latent tree models. We evaluate our method on both synthetic and real-world data. The results show that our method works correctly in the ideal case where between-cluster similarity is 0, and degrades gracefully as one moves away from the ideal case.\nLatent variable models are used to estimate variables of interest, quantities which are observable only up to some measurement error. In many studies, such variables are known but not precisely quantifiable (such as \"job satisfaction\" in social sciences and marketing, \"analytical ability\" in educational testing, or \"inflation\" in economics). This leads to the development of measurement instruments, such as surveys, tests and price indexes, to record noisy indirect evidence for such unobserved variables. In such problems, there are postulated latent variables and a given measurement model. 
At the same time, other unanticipated latent variables can add further unmeasured confounding to the observed variables. The problem is how to deal with unanticipated latent variables. In this paper, we provide a method loosely inspired by canonical correlation that makes use of background information concerning the \"known\" latent variables. Given a partially specified structure, it provides a structure learning approach to detect \"unknown unknowns,\" the confounding effect of potentially infinitely many other latent variables. This is done without explicitly modeling such extra latent factors. Because of the special structure of the problem, we are able to exploit a new variation of composite likelihood fitting to efficiently learn this structure. Validation is provided with experiments in synthetic data and the analysis of a large survey done with a sample of over 100,000 staff members of the National Health Service of the United Kingdom.\nUnderstanding the adaptation process of plants to drought stress is essential in improving management practices, breeding strategies as well as engineering viable crops for a sustainable agriculture in the coming decades. Hyper-spectral imaging provides a particularly promising approach to gain such understanding since it allows one to discover, non-destructively, spectral characteristics of plants governed primarily by scattering and absorption characteristics of the leaf internal structure and biochemical constituents. Several drought stress indices have been derived using hyper-spectral imaging. However, they are typically based on only a few hyper-spectral images, rely on interpretations of experts, and consider only a few wavelengths. In this study, we present the first data-driven approach to discovering spectral drought stress indices, treating it as an unsupervised labeling problem at massive scale. 
To make use of short-range dependencies of spectral wavelengths, we develop an online variational Bayes algorithm for latent Dirichlet allocation with a convolved Dirichlet regularizer. This approach scales to massive datasets and, hence, provides a more objective complement to plant physiological practices. The spectral topics found conform to plant physiological knowledge and can be computed in a fraction of the time compared to existing LDA approaches.\nIn time series analysis research there is a strong interest in discrete representations of real-valued data streams. One approach that emerged over a decade ago and is still considered state-of-the-art is the Symbolic Aggregate Approximation (SAX) algorithm. This discretization algorithm was the first symbolic approach that mapped a real-valued time series to a symbolic representation that was guaranteed to lower-bound Euclidean distance. The interest of this paper concerns the SAX assumption of data being highly Gaussian and the use of the standard normal curve to choose partitions to discretize the data. Generally, though not necessarily, and certainly in its canonical form, the SAX approach chooses partitions on the standard normal curve that would produce an equal probability for each symbol in a finite alphabet to occur. This procedure is generally valid as a time series is normalized before the rest of the SAX algorithm is applied. However, there exists a caveat to this assumption of equi-probability due to the intermediate step of Piecewise Aggregate Approximation (PAA). What we will show in this paper is that when PAA is applied the distribution of the data is indeed altered, resulting in a shrinking standard deviation that is proportional to the number of points used to create a segment of the PAA representation and the degree of auto-correlation within the series. Data that exhibits statistically significant auto-correlation is less affected by this shrinking distribution. 
As the standard deviation of the data contracts, the mean remains the same; however, the distribution is no longer standard normal and therefore the partitions based on the standard normal curve are no longer valid for the assumption of equal probability.\nOur broader goal is to automatically translate English sentences into formulas in appropriate knowledge representation languages as a step towards understanding and thus answering questions with respect to English text. Our focus in this paper is on the language of Answer Set Programming (ASP). Our approach to translate sentences to ASP rules is inspired by Montague's use of lambda calculus formulas as the meanings of words and phrases. With ASP as the target language, the meanings of words and phrases are ASP-lambda formulas. In an earlier work we illustrated our approach by manually developing a dictionary of words and their ASP-lambda formulas. However, such an approach is not scalable. In this paper our focus is on two algorithms that allow one to construct ASP-lambda formulas in an inverse manner. In particular the two algorithms take as input two lambda-calculus expressions G and H and compute a lambda-calculus expression F such that F with input as G, denoted by F@G, is equal to H; and similarly G@F = H. We present correctness and complexity results about these algorithms. To do that we develop the notion of typed ASP-lambda calculus theories and their orders and use it in developing the completeness results. (To appear in Theory and Practice of Logic Programming.)\nIn the family of Learning Classifier Systems, the classifier system XCS has been successfully used for many applications. However, the standard XCS has no memory mechanism and can only learn an optimal policy in Markov environments, where the optimal action is determined solely by the current sensory input. In practice, most environments are partially observable with respect to the agent's sensation; such environments are also known as non-Markov environments. 
Within these environments, XCS either fails, or only develops a suboptimal policy, since it has no memory. In this work, we develop a new classifier system based on XCS to tackle this problem. It adds an internal message list to XCS as the memory list to record input sensation history, and extends a small number of classifiers with memory conditions. The classifier's memory condition, as a foothold to disambiguate non-Markov states, is used to sense a specified element in the memory list. Besides, a detection method is employed to recognize non-Markov states in environments, to avoid these states controlling over classifiers' memory conditions. Furthermore, four sets of different complex maze environments have been tested by the proposed method. Experimental results show that our system is one of the best techniques to solve partially observable environments, compared with some well-known classifier systems proposed for these environments.\nThe propositional planning problem is a notoriously difficult computational problem. Downey et al. (1999) initiated the parameterized analysis of planning (with plan length as the parameter) and B\\\"ackstr\\\"om et al. (2012) picked up this line of research and provided an extensive parameterized analysis under various restrictions, leaving open only one stubborn case. We continue this work and provide a full classification. In particular, we show that the case when actions have no preconditions and at most $e$ postconditions is fixed-parameter tractable if $e\\leq 2$ and W[1]-complete otherwise. We show fixed-parameter tractability by a reduction to a variant of the Steiner Tree problem; this problem has been shown fixed-parameter tractable by Guo et al. (2007). If a problem is fixed-parameter tractable, then it admits a polynomial-time self-reduction to instances whose input size is bounded by a function of the parameter, called the kernel. 
For some problems, this function is even polynomial which has desirable computational implications. Recent research in parameterized complexity has focused on classifying fixed-parameter tractable problems on whether they admit polynomial kernels or not. We revisit all the previously obtained restrictions of planning that are fixed-parameter tractable and show that none of them admits a polynomial kernel unless the polynomial hierarchy collapses to its third level.\nThe systematic biases seen in people's probability judgments are typically taken as evidence that people do not reason about probability using the rules of probability theory, but instead use heuristics which sometimes yield reasonable judgments and sometimes systematic biases. This view has had a major impact in economics, law, medicine, and other fields; indeed, the idea that people cannot reason with probabilities has become a widespread truism. We present a simple alternative to this view, where people reason about probability according to probability theory but are subject to random variation or noise in the reasoning process. In this account the effect of noise is cancelled for some probabilistic expressions: analysing data from two experiments we find that, for these expressions, people's probability judgments are strikingly close to those required by probability theory. For other expressions this account produces systematic deviations in probability estimates. These deviations explain four reliable biases in human probabilistic reasoning (conservatism, subadditivity, conjunction and disjunction fallacies). 
These results suggest that people's probability judgments embody the rules of probability theory, and that biases in those judgments are due to the effects of random noise.\nPerhaps surprisingly, it is possible to predict how long an algorithm will take to run on a previously unseen input, using machine learning techniques to build a model of the algorithm's runtime as a function of problem-specific instance features. Such models have important applications to algorithm analysis, portfolio-based algorithm selection, and the automatic configuration of parameterized algorithms. Over the past decade, a wide variety of techniques have been studied for building such models. Here, we describe extensions and improvements of existing models, new families of models, and -- perhaps most importantly -- a much more thorough treatment of algorithm parameters as model inputs. We also comprehensively describe new and existing features for predicting algorithm runtime for propositional satisfiability (SAT), travelling salesperson (TSP) and mixed integer programming (MIP) problems. We evaluate these innovations through the largest empirical analysis of its kind, comparing to a wide range of runtime modelling techniques from the literature. Our experiments consider 11 algorithms and 35 instance distributions; they also span a very wide range of SAT, MIP, and TSP instances, with the least structured having been generated uniformly at random and the most structured having emerged from real industrial applications. Overall, we demonstrate that our new models yield substantially better runtime predictions than previous approaches in terms of their generalization to new problem instances, to new algorithms from a parameterized space, and to both simultaneously.\nWe introduce a new model of membership query (MQ) learning, where the learning algorithm is restricted to query points that are \\emph{close} to random examples drawn from the underlying distribution. 
The learning model is intermediate between the PAC model (Valiant, 1984) and the PAC+MQ model (where the queries are allowed to be arbitrary points).   Membership query algorithms are not popular among machine learning practitioners. Apart from the obvious difficulty of adaptively querying labelers, it has also been observed that querying \\emph{unnatural} points leads to increased noise from human labelers (Lang and Baum, 1992). This motivates our study of learning algorithms that make queries that are close to examples generated from the data distribution.   We restrict our attention to functions defined on the $n$-dimensional Boolean hypercube and say that a membership query is local if its Hamming distance from some example in the (random) training data is at most $O(\\log(n))$. We show the following results in this model:   (i) The class of sparse polynomials (with coefficients in R) over $\\{0,1\\}^n$ is polynomial time learnable under a large class of \\emph{locally smooth} distributions using $O(\\log(n))$-local queries. This class also includes the class of $O(\\log(n))$-depth decision trees.   (ii) The class of polynomial-sized decision trees is polynomial time learnable under product distributions using $O(\\log(n))$-local queries.   (iii) The class of polynomial size DNF formulas is learnable under the uniform distribution using $O(\\log(n))$-local queries in time $n^{O(\\log(\\log(n)))}$.   (iv) In addition we prove a number of results relating the proposed model to the traditional PAC model and the PAC+MQ model.\nIn this paper, secured wireless communication using fuzzy logic based high speed public key cryptography (FLHSPKC) is proposed, addressing major issues like computational safety, power management and restricted memory usage in wireless communication. A Wireless Sensor Network (WSN) has several major constraints, like an inadequate energy source, restricted computational capability and limited memory. 
Although conventional Elliptic Curve Cryptography (ECC), a form of public key cryptography used in wireless communication, provides a level of security equivalent to other existing public key algorithms while using smaller parameters, traditional ECC does not address all of these major limitations of WSNs. In conventional ECC, given an elliptic curve point P, an arbitrary integer k and a modulus m, ECC carries out the scalar multiplication kP mod m, which takes about 80% of the key computation time on a WSN. The proposed FLHSPKC scheme provides several novel strategies, including a soft computing based strategy, to speed up scalar multiplication in conventional ECC; this shortens computation time and satisfies the power consumption and memory usage constraints without hampering the security level. A performance analysis of the different strategies under the FLHSPKC scheme and a comparison with existing conventional ECC methods are also presented.\nThe paper addresses the modular design of composite solving strategies for multicriteria ranking (sorting). Here a 'scale of creativity' that is close to the creative levels proposed by Altshuller is used as the reference viewpoint: (i) a basic object, (ii) a selected object, (iii) a modified object, and (iv) a designed object (e.g., composition of object components). These levels may be used in various parts of decision support systems (DSS) (e.g., information, operations, user). The paper focuses on the most creative of the above-mentioned levels (i.e., composition or combinatorial synthesis) for the operational part (i.e., composite solving strategy). This is important for a search/exploration mode of the decision making process, with the use of various procedures and techniques and the analysis/integration of the obtained results. The paper describes methodological issues of decision technology and the synthesis of a composite strategy for multicriteria ranking. 
The synthesis of composite strategies is based on 'hierarchical morphological multicriteria design' (HMMD) which is based on selection and combination of design alternatives (DAs) (here: local procedures or techniques) while taking into account their quality and quality of their interconnections (IC). A new version of HMMD with interval multiset estimates for DAs is used. The operational environment of DSS COMBI for multicriteria ranking, consisting of a morphology of local procedures or techniques (as design alternatives DAs), is examined as a basic one.\nWe consider the online distributed non-stochastic experts problem, where the distributed system consists of one coordinator node that is connected to $k$ sites, and the sites are required to communicate with each other via the coordinator. At each time-step $t$, one of the $k$ site nodes has to pick an expert from the set $\\{1, \\ldots, n\\}$, and the same site receives information about payoffs of all experts for that round. The goal of the distributed system is to minimize regret at time horizon $T$, while simultaneously keeping communication to a minimum.   The two extreme solutions to this problem are: (i) Full communication: This essentially simulates the non-distributed setting to obtain the optimal $O(\\sqrt{\\log(n)T})$ regret bound at the cost of $T$ communication. (ii) No communication: Each site runs an independent copy: the regret is $O(\\sqrt{\\log(n)kT})$ and the communication is 0. This paper shows the difficulty of simultaneously achieving regret asymptotically better than $\\sqrt{kT}$ and communication better than $T$. We give a novel algorithm that for an oblivious adversary achieves a non-trivial trade-off: regret $O(\\sqrt{k^{5(1+\\epsilon)/6} T})$ and communication $O(T/k^{\\epsilon})$, for any value of $\\epsilon \\in (0, 1/5)$. We also consider a variant of the model, where the coordinator picks the expert. In this model, we show that the label-efficient forecaster of Cesa-Bianchi et al. 
(2005) already gives us a strategy that is near-optimal in the regret vs. communication trade-off.\nOptimization problems associated with the interaction of linked particles are at the heart of polymer science, protein folding and other important problems in the physical sciences. In this review we explain how to recast these problems as constraint satisfaction problems such as linear programming, maximum satisfiability, and pseudo-boolean optimization. By encoding problems this way, one can leverage substantial insight and powerful solvers from the computer science community which studies constraint programming for diverse applications such as logistics, scheduling, artificial intelligence, and circuit design. We demonstrate how to constrain and embed lattice heteropolymer problems using several strategies. Each strikes a unique balance between number of constraints, complexity of constraints, and number of variables. Finally, we show how to reduce the locality of couplings in these energy functions so they can be realized as Hamiltonians on existing quantum annealing machines. We intend that this review be used as a case study for encoding related combinatorial optimization problems in a form suitable for adiabatic quantum optimization.\nInformation-Geometric Optimization (IGO) is a unified framework of stochastic algorithms for optimization problems. Given a family of probability distributions, IGO turns the original optimization problem into a new maximization problem on the parameter space of the probability distributions. IGO updates the parameter of the probability distribution along the natural gradient, taken with respect to the Fisher metric on the parameter manifold, aiming at maximizing an adaptive transform of the objective function. 
IGO recovers several known algorithms as particular instances: for the family of Bernoulli distributions IGO recovers PBIL, for the family of Gaussian distributions the pure rank-mu CMA-ES update is recovered, and for exponential families in expectation parametrization the cross-entropy/ML method is recovered. This article provides a theoretical justification for the IGO framework, by proving that any step size not greater than 1 guarantees monotone improvement over the course of optimization, in terms of q-quantile values of the objective function f. The range of admissible step sizes is independent of f and its domain. We extend the result to cover the case of different step sizes for blocks of the parameters in the IGO algorithm. Moreover, we prove that expected fitness improves over time when fitness-proportional selection is applied, in which case the RPP algorithm is recovered.\nGiven the limited performance of 2D cellular automata in terms of space when the number of documents increases, and in terms of cluster visualization, our motivation was to experiment with these cellular automata by increasing their dimension to assess the impact of dimension on the quality of the results. The representation of textual data was carried out by a vector model whose components are derived from the overall weighting of the corpus used, Term Frequency Inverse Document Frequency (TF-IDF). The WordNet thesaurus was used to address the problem of lemmatizing the words, because the representation used in this study is the bag of words. Another, language-independent method used to represent textual records is that of n-grams. Several similarity measures have been tested. To validate the classification, we have used two assessment measures based on recall and precision (f-measure and entropy). The results are promising and confirm the idea of increasing the dimension to address the problem of the spatiality of the classes. 
The results obtained in terms of class purity (i.e., the minimum value of entropy) show that the larger the number of documents, the better the results for 3D cellular automata, which was not the case in 2D. In terms of spatial navigation, 3D cellular automata provide much better visualization performance than 2D cellular automata.\nThis paper introduces TwitterPaul, a system designed to make use of Social Media data to help predict game outcomes for the 2010 FIFA World Cup tournament. To this end, we extracted over 538K mentions of football games from a large sample of tweets that occurred during the World Cup, and we classified them into different types with a precision of up to 88%. The different mentions were aggregated in order to make predictions about the outcomes of the actual games. We attempt to learn which Twitter users are accurate predictors and explore several techniques in order to exploit this information to make more accurate predictions. We compare our results to strong baselines and against the betting line (prediction market) and find that the quality of extractions is more important than the quantity, suggesting that high precision methods working on a medium-sized dataset are preferable to low precision methods that use a larger amount of data. Finally, by aggregating some classes of predictions, the system performance comes close to that of the betting line. Furthermore, we believe that this domain independent framework can help to predict other sports, elections, product release dates and other future events that people talk about in social media.\nSmartphone technology is increasingly becoming the predominant communication tool for people across the world. People use their smartphones to keep their contact data, to browse the internet, to exchange messages, to keep notes, to carry their personal files and documents, etc. 
While browsing, users can also shop online, which requires them to type their credit card numbers and security codes. As smartphones become widespread, so do the security threats and vulnerabilities facing this technology. Recent news and articles indicate a huge increase in malware and viruses for operating systems employed on smartphones (primarily Android and iOS). Major limitations of smartphone technology are its processing power and its scarce energy source, since smartphones run on batteries. Since smartphones are devices which change their network location as the user moves between different places, intrusion detection systems for smartphone technology are most often classified as IDSs designed for mobile ad-hoc networks. The aim of this research is to give a brief overview of IDS technology, give an overview of major machine learning and pattern recognition algorithms used in IDS technologies, give an overview of the security models of iOS and Android, propose a new host-based IDS model for smartphones, and create a proof-of-concept application for the Android platform for the newly proposed model. Keywords: IDS, SVM, Android, iOS.\nRecent works have validated the possibility of improving energy efficiency in radio access networks (RANs), achieved by dynamically turning on/off some base stations (BSs). In this paper, we extend the research on BS switching operations, which should match up with traffic load variations. Instead of depending on dynamic traffic loads, which are still quite challenging to forecast precisely, we first formulate the traffic variations as a Markov decision process. Afterwards, in order to foresightedly minimize the energy consumption of RANs, we design a BS switching operation scheme based on a reinforcement learning framework. 
Furthermore, to avoid the underlying curse of dimensionality in reinforcement learning, a transfer actor-critic algorithm (TACT), which utilizes the transferred learning expertise in historical periods or neighboring regions, is proposed and provably converges. In the end, we evaluate our proposed scheme by extensive simulations under various practical configurations and show that the proposed TACT algorithm contributes to a performance jumpstart and demonstrates the feasibility of significant energy efficiency improvement at the expense of tolerable delay performance.\nThe considerable mathematical knowledge encoded by the Flyspeck project is combined with external automated theorem provers (ATPs) and machine-learning premise selection methods trained on the proofs, producing an AI system capable of answering a wide range of mathematical queries automatically. The performance of this architecture is evaluated in a bootstrapping scenario emulating the development of Flyspeck from axioms to the last theorem, each time using only the previous theorems and proofs. It is shown that 39% of the 14185 theorems could be proved in a push-button mode (without any high-level advice and user interaction) in 30 seconds of real time on a fourteen-CPU workstation. The necessary work involves: (i) an implementation of sound translations of the HOL Light logic to ATP formalisms: untyped first-order, polymorphic typed first-order, and typed higher-order, (ii) export of the dependency information from HOL Light and ATP proofs for the machine learners, and (iii) choice of suitable representations and methods for learning from previous proofs, and their integration as advisors with HOL Light. This work is described and discussed here, and an initial analysis of the body of proofs that were found fully automatically is provided.\nIn game theory, an Evolutionarily Stable Set (ES set) is a set of Nash Equilibrium (NE) strategies that give the same payoffs. 
Similar to an Evolutionarily Stable Strategy (ES strategy), an ES set is also a strict NE. This work investigates the evolutionary stability of classical and quantum strategies in the quantum penny flip games. In particular, we developed an evolutionary game theory model to conduct a series of simulations where a population of mixed classical strategies from the ES set of the game were invaded by quantum strategies. We found that when only one of the two players' mixed classical strategies were invaded, the results were different. In one case, due to the interference phenomenon of superposition, quantum strategies provided more payoff, hence successfully replaced the mixed classical strategies in the ES set. In the other case, the mixed classical strategies were able to sustain the invasion of quantum strategies and remained in the ES set. Moreover, when both players' mixed classical strategies were invaded by quantum strategies, a new quantum ES set emerged. The strategies in the quantum ES set give both players payoff 0, which is the same as the payoff of the strategies in the mixed classical ES set of this game.\nTraining a Support Vector Machine (SVM) requires the solution of a quadratic programming problem (QP) whose computational complexity becomes prohibitively expensive for large scale datasets. Traditional optimization methods cannot be directly applied in these cases, mainly due to memory restrictions.   By adopting a slightly different objective function and under mild conditions on the kernel used within the model, efficient algorithms to train SVMs have been devised under the name of Core Vector Machines (CVMs). This framework exploits the equivalence of the resulting learning problem with the task of building a Minimal Enclosing Ball (MEB) problem in a feature space, where data is implicitly embedded by a kernel function.   
In this paper, we improve on the CVM approach by proposing two novel methods to build SVMs based on the Frank-Wolfe algorithm, recently revisited as a fast method to approximate the solution of a MEB problem. In contrast to CVMs, our algorithms do not require computing the solutions of a sequence of increasingly complex QPs and are defined by using only analytic optimization steps. Experiments on a large collection of datasets show that our methods scale better than CVMs in most cases, sometimes at the price of a slightly lower accuracy. Like CVMs, the proposed methods can be easily extended to machine learning problems other than binary classification. However, effective classifiers are also obtained using kernels which do not satisfy the condition required by CVMs, and the methods can thus be used for a wider set of problems.\nFor effective autonomous navigation, estimation of the pose of the robot is essential at every sampling time. For computing an accurate estimation, odometric error needs to be reduced with the help of data from an external sensor. In this work, a technique has been developed for accurate pose estimation of a mobile robot by using Laser Range data. The technique is robust to noisy data, which may contain a considerable amount of outliers. A grey image is formed from laser range data and the key points from this image are extracted by the Harris corner detector. The key points from consecutive data sets are matched, while outliers are rejected by the RANSAC method. The robot state is measured by the correspondence between the two sets of keypoints. Finally, the optimal robot state is estimated by an Extended Kalman Filter. The technique has been applied to an operational robot in the laboratory environment to show the robustness of the technique in the presence of noisy sensor data. The performance of this new technique has been compared with that of the conventional ICP method. 
Through this method, effective and accurate navigation has been achieved even in the presence of substantial noise in the sensor data, at the cost of a small amount of additional computational complexity.\nFrom 2008 to 2011 the author gave several courses on Foundations of Scientific Research at the Computer Science Faculty of the National Aviation University in Kiev. This text presents lecture material for these courses. It consists of 18 sections, and some ideas of the manual can be seen from their titles. These include: General notions about scientific research. Ontologies and upper ontologies. Ontologies of object domains. Examples of Research Activity. Some Notions of the Theory of Finite and Discrete Sets. Algebraic Operations and Algebraic Structures. Elements of the Theory of Graphs and Nets. Scientific activity on the example of Information and its investigation. Scientific research in Artificial Intelligence. Compilers and compilation. Objective, Concepts and History of Computer security. Methodological and categorical apparatus of scientific research. Methodology and methods of scientific research. Scientific idea and significance of scientific research. Forms of scientific knowledge organization and principles of scientific research. Theoretical study, applied study and creativity. Types of scientific research: theoretical study, applied study. Types of scientific research: forms of representation of material. Some sections of the text contain enough material for lectures, but in other cases they are sketches without references to Foundations of Research Activities. This is the first version of the manual, and the author plans to edit, modify and extend it. Several reasons prompted the author to post it as an e-print. 
The author compiled material from many sources and hopes that it gives various points of view on Foundations of Research Activities.\nThe monotone duality problem is defined as follows: Given two monotone formulas f and g in irredundant DNF, decide whether f and g are dual. This problem is the same as duality testing for hypergraphs, that is, checking whether a hypergraph H consists of precisely all minimal transversals of a simple hypergraph G. By exploiting a recent problem-decomposition method by Boros and Makino (ICALP 2009), we show that duality testing for hypergraphs, and thus for monotone DNFs, is feasible in DSPACE[log^2 n], i.e., in quadratic logspace. As the monotone duality problem is equivalent to a number of problems in the areas of databases, data mining, and knowledge discovery, the results presented here yield new complexity results for those problems, too. For example, it follows from our results that whenever for a Boolean-valued relation (whose attributes represent items), a number of maximal frequent itemsets and a number of minimal infrequent itemsets are known, then it can be decided in quadratic logspace whether there exist additional frequent or infrequent itemsets.\nAutomatic analysis of biomedical time series such as electroencephalogram (EEG) and electrocardiographic (ECG) signals has attracted great interest in the community of biomedical engineering due to its important applications in medicine. In this work, a simple yet effective bag-of-words representation that is able to capture both local and global structure similarity information is proposed for biomedical time series representation. In particular, similar to the bag-of-words model used in the text document domain, the proposed method treats a time series as a text document and extracts local segments from the time series as words. The biomedical time series is then represented as a histogram of codewords, each entry of which is the number of times a codeword appears in the time series. 
Although the temporal order of the local segments is ignored, the bag-of-words representation is able to capture high-level structural information because both local and global structural information are well utilized. The performance of the bag-of-words model is validated on three datasets extracted from real EEG and ECG signals. The experimental results demonstrate that the proposed method is not only insensitive to parameters of the bag-of-words model such as local segment length and codebook size, but also robust to noise.\nTree projections provide a mathematical framework that encompasses all the various (purely) structural decomposition methods that have been proposed in the literature to single out classes of nearly-acyclic (hyper)graphs, such as the tree decomposition method, which is the most powerful decomposition method on graphs, and the (generalized) hypertree decomposition method, which is its natural counterpart on arbitrary hypergraphs. The paper analyzes this framework, by focusing in particular on \"minimal\" tree projections, that is, on tree projections without useless redundancies. First, it is shown that minimal tree projections enjoy a number of properties that are usually required for normal form decompositions in various structural decomposition methods. In particular, they enjoy the same kind of connection properties as (minimal) tree decompositions of graphs, with the result being tight in the light of the negative answer that is provided to the open question about whether they enjoy a slightly stronger notion of connection property, defined to speed-up the computation of hypertree decompositions. Second, it is shown that tree projections admit a natural game-theoretic characterization in terms of the Captain and Robber game. In this game, as for the Robber and Cops game characterizing tree decompositions, the existence of winning strategies implies the existence of monotone ones. 
As a special case, the Captain and Robber game can be used to characterize the generalized hypertree decomposition method, where such a game-theoretic characterization was missing and asked for. Besides their theoretical interest, these results have immediate algorithmic applications both for the general setting and for structural decomposition methods that can be recast in terms of tree projections.\nWe exploit the redundancy and volume of information on the web to build a computerized player for the ABC TV game show 'Who Wants To Be A Millionaire?' The player consists of a question-answering module and a decision-making module. The question-answering module utilizes question transformation techniques, natural language parsing, multiple information retrieval algorithms, and multiple search engines; results are combined in the spirit of ensemble learning using an adaptive weighting scheme. Empirically, the system correctly answers about 75% of questions from the Millionaire CD-ROM, 3rd edition: general-interest trivia questions, often about popular culture and common knowledge. The decision-making module chooses from allowable actions in the game in order to maximize expected risk-adjusted winnings, where the estimated probability of answering correctly is a function of past performance and confidence in correctly answering the current question. When given a six-question head start (i.e., when starting from the $2,000 level), we find that the system performs about as well on average as humans starting at the beginning. Our system demonstrates the potential of simple but well-chosen techniques for mining answers from unstructured information such as the web.\nCollaborative filtering is a very useful general technique for exploiting the preference patterns of a group of users to predict the utility of items to a particular user. Previous research has studied several probabilistic graphic models for collaborative filtering with promising results. 
However, while these models have succeeded in capturing the similarity among users and items in one way or the other, none of them has considered the fact that users with similar interests in items can have very different rating patterns; some users tend to assign a higher rating to all items than other users. In this paper, we propose and study two new graphic models that address the distinction between user preferences and ratings. In one model, called the decoupled model, we introduce two different variables to decouple a user's preferences from his ratings. In the other, called the preference model, we model the orderings of items preferred by a user, rather than the user's numerical ratings of items. Empirical study over two datasets of movie ratings shows that appropriate modeling of the distinction between user preferences and ratings improves the performance substantially and consistently. Specifically, the proposed decoupled model significantly outperforms all five existing approaches that we compare with, but the preference model is not very successful. These results suggest that explicit modeling of the underlying user preferences is very important for collaborative filtering, but we cannot afford to ignore the rating information completely.\nCollaborative filtering (CF) and content-based filtering (CBF) have widely been used in information filtering applications. Both approaches have their strengths and weaknesses, which is why researchers have developed hybrid systems. This paper proposes a novel approach to unify CF and CBF in a probabilistic framework, named collaborative ensemble learning. 
It uses probabilistic SVMs to model each user's profile (as CBF does). At the prediction phase, it combines a society of users' profiles, represented by their respective SVM models, to predict an active user's preferences (the CF idea). The combination scheme is embedded in a probabilistic framework and retains an intuitive explanation. Moreover, collaborative ensemble learning does not require a global training stage and thus can incrementally incorporate new data. We report results based on two data sets. For the Reuters-21578 text data set, we simulate user ratings under the assumption that each user is interested in only one category. In the second experiment, we use users' opinions on a set of 642 art images that were collected through a web-based survey. For both data sets, collaborative ensemble achieved excellent performance in terms of recommendation accuracy.\nThe mean field methods, which entail approximating intractable probability distributions variationally with distributions from a tractable family, enjoy high efficiency, guaranteed convergence, and provide lower bounds on the true likelihood. But due to the requirement for model-specific derivation of the optimization equations and unclear inference quality in various models, they are not widely used as a generic approximate inference algorithm. In this paper, we discuss a generalized mean field theory on variational approximation to a broad class of intractable distributions using a rich set of tractable distributions via constrained optimization over distribution spaces. We present a class of generalized mean field (GMF) algorithms for approximate inference in complex exponential family models, which entails limiting the optimization over the class of cluster-factorizable distributions. GMF is a generic method requiring no model-specific derivations. 
It factors a complex model into a set of disjoint variable clusters, uses a set of canonical fixed-point equations to iteratively update the cluster distributions, and converges to locally optimal cluster marginals that preserve the original dependency structure within each cluster, hence fully decomposing the overall inference problem. We empirically analyzed the effect of different tractable families (clusters of different granularity) on inference quality, and compared GMF with BP on several canonical models. Possible extensions to higher-order MF approximation are also discussed.\nMethods for learning Bayesian network structure can discover dependency structure between observed variables, and have been shown to be useful in many applications. However, in domains that involve a large number of variables, the space of possible network structures is enormous, making it difficult, for both computational and statistical reasons, to identify a good model. In this paper, we consider a solution to this problem, suitable for domains where many variables have similar behavior. Our method is based on a new class of models, which we call module networks. A module network explicitly represents the notion of a module - a set of variables that have the same parents in the network and share the same conditional probability distribution. We define the semantics of module networks, and describe an algorithm that learns a module network from data. The algorithm learns both the partitioning of the variables into modules and the dependency structure between the variables. We evaluate our algorithm on synthetic data, and on real data in the domains of gene expression and the stock market. 
Our results show that module networks generalize better than Bayesian networks, and that the learned module network structure reveals regularities that are obscured in learned Bayesian networks.\nThe Semantic Web ontology language OWL 2 DL comes with a variety of language features that enable sophisticated and practically useful modeling. However, the use of these features has been severely restricted in order to retain decidability of the language. For example, OWL 2 DL does not allow a property to be both transitive and asymmetric, which would be desirable, e.g., for representing an ancestor relation. In this paper, we argue that the so-called global restrictions of OWL 2 DL preclude many useful forms of modeling, by providing a catalog of basic modeling patterns that would be available in OWL 2 DL if the global restrictions were discarded. We then report on the results of evaluating several state-of-the-art OWL 2 DL reasoners on problems that use combinations of features in a way that violates the global restrictions. The systems turn out to rely heavily on the global restrictions and are thus largely incapable of coping with the modeling patterns. Next we show how off-the-shelf first-order logic theorem proving technology can be used to perform reasoning in the OWL 2 direct semantics, the semantics that underlies OWL 2 DL, but without requiring the global restrictions. Applying a naive proof-of-concept implementation of this approach to the test problems was successful in all cases. Based on our observations, we make suggestions for future lines of research on expressive description logic-style OWL reasoning.\nThe chase algorithm is a fundamental tool for query evaluation and query containment under constraints, where the constraints are (sub-classes of) tuple-generating dependencies (TGDs) and equality-generating dependencies (EGDs). 
So far, most of the research on this topic has focused on cases where the chase procedure terminates, with some notable exceptions. In this paper we take a general approach, and we propose large classes of TGDs under which the chase does not always terminate. Our languages, in particular, are inspired by guarded logic: we show that by enforcing syntactic properties on the form of the TGDs, we are able to ensure decidability of the problem of answering conjunctive queries despite the non-terminating chase. We provide tight complexity bounds for the problem of conjunctive query evaluation for several classes of TGDs. We then introduce EGDs, and provide a condition under which EGDs do not interact with TGDs, and therefore do not take part in query answering. We show applications of our classes of constraints to the problem of answering conjunctive queries under F-Logic Lite, a recently introduced ontology language, and under prominent tractable Description Logics languages. All the results in this paper immediately extend to the problem of conjunctive query containment.\nNowadays, huge efforts are made to modernize the air traffic management systems to cope with uncertainty, complexity and sub-optimality. An answer is to enhance the information sharing between the stakeholders. This paper introduces a framework that bridges the gap between air traffic management and air traffic control on the one hand, and bridges the gap between the ground, the approach and the en-route centers on the other hand. An original system is presented, that has three essential components: the trajectory models, the optimization process, and the monitoring process. The uncertainty of the trajectory is modeled with a Bayesian Network, where the nodes are associated to two types of random variables: the time of overflight on metering points of the airspace, and the traveling time of the routes linking these points. 
The resulting Bayesian Network covers the complete airspace, and Monte Carlo simulations are done to estimate the probabilities of sector congestion and delays. On top of this trajectory model, an optimization process minimizes these probabilities by tuning the parameters of the Bayesian trajectory model related to overflight times on metering points. The last component is the monitoring process, which continuously updates the situation of the airspace, modifying the trajectories' uncertainties according to the actual positions of aircraft. After each update, a new optimal set of overflight times is computed, and can be communicated to the controllers as clearances for the aircraft pilots. The paper presents a formal specification of this global optimization problem, whose underlying rationale was derived with the help of air traffic controllers at Thales Air Systems.\nThis paper presents a model based on a hybrid system to numerically simulate the climbing phase of an aircraft. This model is then used within a trajectory prediction tool. Finally, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) optimization algorithm is used to tune five selected parameters, and thus improve the accuracy of the model. Incorporated within a trajectory prediction tool, this model can be used to derive the order of magnitude of the prediction error over time, and thus the domain of validity of the trajectory prediction. A first validation experiment of the proposed model is based on the errors along time for a one-time trajectory prediction at the takeoff of the flight with respect to the default values of the theoretical BADA model. This experiment, assuming complete information, also shows the limit of the model. A second experiment presents an on-line trajectory prediction, in which the prediction is continuously updated based on the current aircraft position. 
This approach raises several issues, for which improvements of the basic model are proposed, and the resulting trajectory prediction tool shows statistically significantly more accurate results than those of the default model.\nSoftware design is crucial to successful software development, yet is a demanding multi-objective problem for software engineers. In an attempt to assist the software designer, interactive (i.e. human in-the-loop) meta-heuristic search techniques such as evolutionary computing have been applied and show promising results. Recent investigations have also shown that Ant Colony Optimization (ACO) can outperform evolutionary computing as a potential search engine for interactive software design. With a limited computational budget, ACO produces superior candidate design solutions in a smaller number of iterations. Building on these findings, we propose a novel interactive ACO (iACO) approach to assist the designer in early lifecycle software design, in which the search is steered jointly by subjective designer evaluation as well as machine fitness functions relating the structural integrity and surrogate elegance of software designs. Results show that iACO is speedy, responsive and highly effective in enabling interactive, dynamic multi-objective search in early lifecycle software design. Study participants rate the iACO search experience as compelling. Results of machine learning of fitness measure weightings indicate that software design elegance does indeed play a significant role in designer evaluation of candidate software design. 
We conclude that the evenness of the number of attributes and methods among classes (NAC) is a significant surrogate elegance measure, which in turn suggests that this evenness of distribution, when combined with structural integrity, is an implicit but crucial component of effective early lifecycle software design.\nIn science and engineering, intelligent processing of complex signals such as images, sound or language is often performed by a parameterized hierarchy of nonlinear processing layers, sometimes biologically inspired. Hierarchical systems (or, more generally, nested systems) offer a way to generate complex mappings using simple stages. Each layer performs a different operation and achieves an ever more sophisticated representation of the input, as, for example, in a deep artificial neural network, an object recognition cascade in computer vision or speech front-end processing. Joint estimation of the parameters of all the layers and selection of an optimal architecture is widely considered to be a difficult numerical nonconvex optimization problem, difficult to parallelize for execution in a distributed computation environment, and requiring significant human expert effort, which leads to suboptimal systems in practice. We describe a general mathematical strategy to learn the parameters and, to some extent, the architecture of nested systems, called the method of auxiliary coordinates (MAC). This replaces the original problem involving a deeply nested function with a constrained problem involving a different function in an augmented space without nesting. The constrained problem may be solved with penalty-based methods using alternating optimization over the parameters and the auxiliary coordinates. 
MAC has provable convergence, is easy to implement reusing existing algorithms for single layers, can be parallelized trivially and massively, applies even when parameter derivatives are not available or not desirable, and is competitive with state-of-the-art nonlinear optimizers even in the serial computation setting, often providing reasonable models within a few iterations.\nA plethora of words are used to describe the spectrum of human emotions, but how many emotions are there really, and how do they interact? Over the past few decades, several theories of emotion have been proposed, each based around the existence of a set of 'basic emotions', and each supported by an extensive variety of research including studies in facial expression, ethology, neurology and physiology. Here we present research based on a theory that people transmit their understanding of emotions through the language they use surrounding emotion keywords. Using a labelled corpus of over 21,000 tweets, six of the basic emotion sets proposed in existing literature were analysed using Latent Semantic Clustering (LSC), evaluating the distinctiveness of the semantic meaning attached to the emotional label. We hypothesise that the more distinct the language used to express a certain emotion, the more distinct the perception (including proprioception) of that emotion, and thus the more 'basic' it is. This allows us to select the dimensions best representing the entire spectrum of emotion. We find that Ekman's set, arguably the most frequently used for classifying emotions, is in fact the most semantically distinct overall. Next, taking all analysed (that is, previously proposed) emotion terms into account, we determine the optimal semantically irreducible basic emotion set using an iterative LSC algorithm. 
Our newly-derived set (Accepting, Ashamed, Contempt, Interested, Joyful, Pleased, Sleepy, Stressed) generates a 6.1% increase in distinctiveness over Ekman's set (Angry, Disgusted, Joyful, Sad, Scared). We also demonstrate how using LSC data can help visualise emotions. We introduce the concept of an Emotion Profile and briefly analyse compound emotions both visually and mathematically.\nVisual features can help predict if a manipulation behavior will succeed at a given location. For example, the success of a behavior that flips light switches depends on the location of the switch. Within this paper, we present methods that enable a mobile manipulator to autonomously learn a function that takes an RGB image and a registered 3D point cloud as input and returns a 3D location at which a manipulation behavior is likely to succeed. Given a pair of manipulation behaviors that can change the state of the world between two sets (e.g., light switch up and light switch down), classifiers that detect when each behavior has been successful, and an initial hint as to where one of the behaviors will be successful, the robot autonomously trains a pair of support vector machine (SVM) classifiers by trying out the behaviors at locations in the world and observing the results. When an image feature vector associated with a 3D location is provided as input to one of the SVMs, the SVM predicts if the associated manipulation behavior will be successful at the 3D location. To evaluate our approach, we performed experiments with a PR2 robot from Willow Garage in a simulated home using behaviors that flip a light switch, push a rocker-type light switch, and operate a drawer. By using active learning, the robot efficiently learned SVMs that enabled it to consistently succeed at these tasks. After training, the robot also continued to learn in order to adapt in the event of failure.\nSubmodular functions have many applications. Matchings have many applications. 
The bitext word alignment problem can be modeled as the problem of maximizing a nonnegative, monotone, submodular function constrained to matchings in a complete bipartite graph where each vertex corresponds to a word in the two input sentences and each edge represents a potential word-to-word translation. We propose a more general problem of maximizing a nonnegative, monotone, submodular function defined on the edge set of a complete graph constrained to matchings; we call this problem the CSM-Matching problem. CSM-Matching also generalizes the maximum-weight matching problem, which has a polynomial-time algorithm; however, we show that it is NP-hard to approximate CSM-Matching within a factor of e/(e-1) by reducing the max k-cover problem to it. Our main result is a simple, greedy, 3-approximation algorithm for CSM-Matching. Then we reduce CSM-Matching to maximizing a nonnegative, monotone, submodular function over two matroids, i.e., CSM-2-Matroids. CSM-2-Matroids has a (2+epsilon)-approximation algorithm, called LSV2. We show that we can find a (4+epsilon)-approximate solution to CSM-Matching using LSV2. We extend this approach to similar problems.\nTravel sharing, i.e., the problem of finding parts of routes which can be shared by several travellers with different points of departure and destinations, is a complex multiagent problem that requires taking into account individual agents' preferences to come up with mutually acceptable joint plans. In this paper, we apply state-of-the-art planning techniques to real-world public transportation data to evaluate the feasibility of multiagent planning techniques in this domain. Improving travel sharing technology has great potential application value due to its ability to reduce the environmental impact of travelling while providing benefits to travellers at the same time. 
We propose a three-phase algorithm that utilises performant single-agent planners to find individual plans in a simplified domain first, then merges them using a best-response planner which ensures resulting solutions are individually rational, and then maps the resulting plan onto the full temporal planning domain to schedule actual journeys. The evaluation of our algorithm on real-world, multi-modal public transportation data for the United Kingdom shows linear scalability both in the scenario size and in the number of agents, where trade-offs have to be made between total cost improvement, the percentage of feasible timetables identified for journeys, and the prolongation of these journeys. Our system constitutes the first implementation of strategic multiagent planning algorithms in large-scale domains and provides insights into the engineering process of translating general domain-independent multiagent planning algorithms to real-world applications.\nIn probabilistic approaches to classification and information extraction, one typically builds a statistical model of words under the assumption that future data will exhibit the same regularities as the training data. In many data sets, however, there are scope-limited features whose predictive power is only applicable to a certain subset of the data. For example, in information extraction from web pages, word formatting may be indicative of extraction category in different ways on different web pages. The difficulty with using such features is capturing and exploiting the new regularities encountered in previously unseen data. In this paper, we propose a hierarchical probabilistic model that uses both local/scope-limited features, such as word formatting, and global features, such as word content. The local regularities are modeled as an unobserved random parameter which is drawn once for each local data set. 
This random parameter is estimated during the inference process and then used to perform classification with both the local and global features, a procedure which is akin to automatically retuning the classifier to the local regularities on each newly encountered web page. Exact inference is intractable and we present approximations via point estimates and variational methods. Empirical results on large collections of web data demonstrate that this method significantly improves performance over traditional models of global features alone.\nIn this paper we propose a measure of clustering quality or accuracy that is appropriate in situations where it is desirable to evaluate a clustering algorithm by somehow comparing the clusters it produces with \"ground truth\" consisting of classes assigned to the patterns by manual means or some other means in whose veracity there is confidence. Such measures are referred to as \"external\". Our measure also has the characteristic of allowing clusterings with different numbers of clusters to be compared in a quantitative and principled way. Our evaluation scheme quantitatively measures how useful the cluster labels of the patterns are as predictors of their class labels. In cases where all clusterings to be compared have the same number of clusters, the measure is equivalent to the mutual information between the cluster labels and the class labels. In cases where the numbers of clusters are different, however, it computes the reduction in the number of bits that would be required to encode (compress) the class labels if both the encoder and decoder have free access to the cluster labels. To achieve this encoding, the estimated conditional probabilities of the class labels given the cluster labels must also be encoded. 
These estimated probabilities can be seen as a model for the class labels and their associated code length as a model cost.\nIn this paper, by adopting a coherence-based probabilistic approach to default reasoning, we focus the study on the logical operation of quasi conjunction and the Goodman-Nguyen inclusion relation for conditional events. We recall that quasi conjunction is a basic notion for defining consistency of conditional knowledge bases. By deepening some results given in a previous paper we show that, given any finite family of conditional events F and any nonempty subset S of F, the family F p-entails the quasi conjunction C(S); then, given any conditional event E|H, we analyze the equivalence between p-entailment of E|H from F and p-entailment of E|H from C(S), where S is some nonempty subset of F. We also illustrate some alternative theorems related to p-consistency and p-entailment. Finally, we deepen the study of the connections between the notions of p-entailment and inclusion relation by introducing for a pair (F,E|H) the (possibly empty) class K of the subsets S of F such that C(S) implies E|H. We show that the class K satisfies many properties; in particular K is additive and has a greatest element which can be determined by applying a suitable algorithm.\nWithin the area of computational models of argumentation, the instantiation-based approach is gaining more and more attention, not least because meaningful input for Dung's abstract frameworks is provided in that way. In a nutshell, the aim of instantiation-based argumentation is to form, from a given knowledge base, a set of arguments and to identify the conflicts between them. The resulting network is then evaluated by means of extension-based semantics on an abstract level, i.e. on the resulting graph. While several systems are nowadays available for the latter step, the automation of the instantiation process itself has received less attention. 
In this work, we provide a novel approach to construct and visualize an argumentation framework from a given knowledge base. The system we propose relies on Answer-Set Programming and follows a two-step approach. A first program yields the logic-based arguments as its answer-sets; a second program is then used to specify the relations between arguments based on the answer-sets of the first program. As it turns out, this approach not only allows for a flexible and extensible tool for instantiation-based argumentation, but also provides a new method for answer-set visualization in general.\nPrevious research into the relation between ASP and classical logic has identified at least two different ways in which the former extends the latter. First, ASP programs typically contain sets of rules that can be naturally interpreted as inductive definitions, and the language FO(ID) has shown that such inductive definitions can elegantly be added to classical logic in a modular way. Second, there is of course also the well-known epistemic component of ASP, which was mainly emphasized in the early papers on stable model semantics. To investigate whether this kind of knowledge can also, and in a similarly modular way, be added to classical logic, the language of Ordered Epistemic Logic was presented in recent work. However, this logic views the epistemic component as entirely separate from the inductive definition component, thus ignoring any possible interplay between the two. In this paper, we present a language that extends the inductive definition construct found in FO(ID) with an epistemic component, making such interplay possible. 
The eventual goal of this work is to discover whether it is really appropriate to view the epistemic component and the inductive definition component of ASP as two separate extensions of classical logic, or whether there is also something of importance in the combination of the two.\nArtifact systems are a novel paradigm for specifying and implementing business processes described in terms of interacting modules called artifacts. Artifacts consist of data and lifecycles, accounting respectively for the relational structure of the artifacts' states and their possible evolutions over time. In this paper we put forward artifact-centric multi-agent systems, a novel formalisation of artifact systems in the context of multi-agent systems operating on them. Differently from the usual process-based models of services, the semantics we give explicitly accounts for the data structures on which artifact systems are defined. We study the model checking problem for artifact-centric multi-agent systems against specifications written in a quantified version of temporal-epistemic logic expressing the knowledge of the agents in the exchange. We begin by noting that the problem is undecidable in general. We then identify two noteworthy restrictions, one syntactical and one semantical, that enable us to find bisimilar finite abstractions and therefore reduce the model checking problem to the instance on finite models. Under these assumptions we show that the model checking problem for these systems is EXPSPACE-complete. We then introduce artifact-centric programs, compact and declarative representations of the programs governing both the artifact system and the agents. We show that, while these in principle generate infinite-state systems, under natural conditions their verification problem can be solved on finite abstractions that can be effectively computed from the programs. 
Finally we exemplify the theoretical results of the paper through a mainstream procurement scenario from the artifact systems literature.\nRecently, there has been a burst in the number of research projects on human computation via crowdsourcing. Multiple choice (or labeling) questions could be referred to as a common type of problem which is solved by this approach. As an application, crowd labeling is applied to find true labels for large machine learning datasets. Since crowds are not necessarily experts, the labels they provide are rather noisy and erroneous. This challenge is usually resolved by collecting multiple labels for each sample, and then aggregating them to estimate the true label. Although the mechanism leads to high-quality labels, it is not actually cost-effective. As a result, efforts are currently made to maximize the accuracy in estimating true labels, while fixing the number of acquired labels.   This paper surveys methods to aggregate redundant crowd labels in order to estimate unknown true labels. It presents a unified statistical latent model where the differences among popular methods in the field correspond to different choices for the parameters of the model. Afterwards, algorithms to make inference on these models will be surveyed. Moreover, adaptive methods which iteratively collect labels based on the previously collected labels and estimated models will be discussed. In addition, this paper compares the distinguished methods, and provides guidelines for future work required to address the current open issues.\nIn this work we consider the problem of learning the structure of Markov networks from data. We present an approach for tackling this problem called IBMAP, together with an efficient instantiation of the approach: the IBMAP-HC algorithm, designed for avoiding important limitations of existing independence-based algorithms. 
These algorithms proceed by performing statistical independence tests on data, trusting the outcome of each test completely. In practice, tests may be incorrect, resulting in potential cascading errors and the consequent reduction in the quality of the structures learned. IBMAP contemplates this uncertainty in the outcome of the tests through a probabilistic maximum-a-posteriori approach. The approach is instantiated in the IBMAP-HC algorithm, a structure selection strategy that performs a polynomial heuristic local search in the space of possible structures. We present an extensive empirical evaluation on synthetic and real data, showing that our algorithm significantly outperforms the current independence-based algorithms, in terms of data efficiency and quality of learned structures, with equivalent computational complexities. We also show the performance of IBMAP-HC in a real-world application of knowledge discovery: EDAs, which are evolutionary algorithms that use structure learning on each generation for modeling the distribution of populations. The experiments show that when IBMAP-HC is used to learn the structure, EDAs improve the convergence to the optimum.\nFeature selection aims to select the smallest subset of features for a specified level of performance. The optimal achievable classification performance on a feature subset is summarized by its Receiver Operating Characteristic (ROC) curve. When infinite data is available, the Neyman-Pearson (NP) design procedure provides the most efficient way of obtaining this curve. In practice the design procedure is applied to density estimates from finite data sets. We perform a detailed statistical analysis of the resulting error propagation on finite alphabets. We show that the estimated performance curve (EPC) produced by the design procedure is arbitrarily accurate given sufficient data, independent of the size of the feature set. 
However, the underlying likelihood ranking procedure is highly sensitive to errors that reduce the probability that the EPC is in fact the ROC. In the worst case, guaranteeing that the EPC is equal to the ROC may require data sizes exponential in the size of the feature set. These results imply that in theory the NP design approach may only be valid for characterizing relatively small feature subsets, even when the performance of any given classifier can be estimated very accurately. We discuss the practical limitations of on-line methods that ensure that the NP procedure operates in a statistically valid region.\nGiven a set of possible models (e.g., Bayesian network structures) and a data sample, in the unsupervised model selection problem the task is to choose the most accurate model with respect to the domain joint probability distribution. In contrast, in supervised model selection it is known a priori that the chosen model will be used in the future for prediction tasks involving more \"focused\" predictive distributions. Although focused predictive distributions can be produced from the joint probability distribution by marginalization, in practice the best model in the unsupervised sense does not necessarily perform well in supervised domains. In particular, the standard marginal likelihood score is a criterion for the unsupervised task and, although also frequently used for supervised model selection, does not perform well in such tasks. 
In this paper we empirically study the performance of the marginal likelihood score in supervised Bayesian network selection tasks by using a large number of publicly available classification data sets, and compare the results to those obtained by alternative model selection criteria, including empirical cross-validation methods, an approximation of a supervised marginal likelihood measure, and a supervised version of Dawid's prequential (predictive sequential) principle. The results demonstrate that the marginal likelihood score does not perform well for supervised model selection, while the best results are obtained by using Dawid's prequential approach.\nThis paper is about metric data structures in high-dimensional or non-Euclidean space that permit cached sufficient statistics accelerations of learning algorithms.   It has recently been shown that for fewer than about 10 dimensions, decorating kd-trees with additional \"cached sufficient statistics\" such as first and second moments and contingency tables can provide satisfying acceleration for a very wide range of statistical learning tasks such as kernel regression, locally weighted regression, k-means clustering, mixture modeling and Bayes Net learning.   In this paper, we begin by defining the anchors hierarchy: a fast data structure and algorithm for localizing data based only on a triangle-inequality-obeying distance metric. We show how this, in its own right, gives a fast and effective clustering of data. But more importantly, we show how it can produce a well-balanced structure similar to a Ball-Tree (Omohundro, 1991) or a kind of metric tree (Uhlmann, 1991; Ciaccia, Patella, & Zezula, 1997) in a way that is neither \"top-down\" nor \"bottom-up\" but instead \"middle-out\". 
We then show how this structure, decorated with cached sufficient statistics, allows a wide variety of statistical learning algorithms to be accelerated even in thousands of dimensions.\nThe growth of Internet commerce has stimulated the use of collaborative filtering (CF) algorithms as recommender systems. Such systems leverage knowledge about the known preferences of multiple users to recommend items of interest to other users. CF methods have been harnessed to make recommendations about such items as web pages, movies, books, and toys. Researchers have proposed and evaluated many approaches for generating recommendations. We describe and evaluate a new method called \\emph{personality diagnosis (PD)}. Given a user's preferences for some items, we compute the probability that he or she is of the same \"personality type\" as other users, and, in turn, the probability that he or she will like new items. PD retains some of the advantages of traditional similarity-weighting techniques in that all data is brought to bear on each prediction and new data can be added easily and incrementally. Additionally, PD has a meaningful probabilistic interpretation, which may be leveraged to justify, explain, and augment results. We report empirical results on the EachMovie database of movie ratings, and on user profile data collected from the CiteSeer digital library of Computer Science research papers. The probabilistic framework naturally supports a variety of descriptive measurements; in particular, we consider the applicability of a value of information (VOI) computation.\n\"Information Processing\" is a recently launched buzzword whose meaning is vague and obscure even for the majority of its users. The reason for this is the lack of a suitable definition for the term \"information\". 
In my attempt to amend this bizarre situation, I have realized that, following the insights of Kolmogorov's complexity theory, information can be defined as a description of structures observable in a given data set. Two types of structures can easily be distinguished in every data set; accordingly, two types of information (information descriptions) should be designated: physical information and semantic information. Kolmogorov's theory also posits that the information descriptions should be provided as a linguistic text structure. This inevitably leads to the assertion that information processing has to be seen as a kind of text processing. The idea is not new: inspired by the observation that human information processing is deeply rooted in natural language handling customs, Lotfi Zadeh and his followers have introduced the so-called \"Computing With Words\" paradigm. Despite promotional efforts, the idea has not yet taken off. The reason is a lack of a coherent understanding of what should be called \"information\", and, as a result, misleading research roadmaps and objectives. I hope my humble attempt to clarify these issues will be helpful in avoiding common traps and pitfalls.\nThis paper presents a method to automatically compute topological relations using SWRL rules. The calculation of these rules is based on the definition of a Selective Nef Complexes Nef Polyhedra structure generated from a standard polyhedron. The Selective Nef Complexes is a data model providing a set of binary Boolean operators such as Union, Difference, Intersection and Symmetric difference, and unary operators such as Interior, Closure and Boundary. In this work, these operators are used to compute topological relations between objects defined by the constraints of the 9-Intersection Model (9-IM) of Egenhofer. With the help of these constraints, we defined a procedure to compute the topological relations on Nef polyhedra. 
These topological relationships are Disjoint, Meets, Contains, Inside, Covers, CoveredBy, Equals and Overlaps, and they are defined in a top-level ontology with specific semantic definitions of relation properties such as Transitive, Symmetric, Asymmetric, Functional, Reflexive, and Irreflexive. The results of the computation of topological relationships are stored in an OWL-DL ontology, which then allows inference over these new relationships between objects. In addition, logic rules based on the Semantic Web Rule Language allow the definition of logic programs that specify which topological relationships have to be computed on which kinds of objects with specific attributes. For instance, a \"Building\" that overlaps a \"Railway\" is a \"RailStation\".\nThe dependency graph is a data architecture that models all the dependencies between the different types of assets in the game. It depicts the dependency-based relationships between the assets of a game. For example, a player must construct an arsenal before he can build weapons. It is vital that the dependency graph of a game is designed logically to ensure a logical sequence of game play. However, a merely logical dependency graph is not sufficient to sustain the players' enduring interest in a game, which brings the problem of game balancing into the picture. The issue of game balancing arises when players do not feel they have a chance of winning against AI opponents who are more skillful at game play. In the current state of research, the architecture of the dependency graph is monolithic for the players. The sequence of asset possession is always foreseeable because there is only a single dependency graph. Game balancing is impossible when the assets of AI players overwhelmingly outnumber those of human players. This paper proposes a parallel dependency graph architecture for AI players and human players. 
Instead of having a single dependency graph, a parallel architecture is proposed in which the dependency graph of the AI player can be adjusted relative to that of the human player, using a support dependency as a game balancing mechanism. This paper shows that the parallel dependency graph helps to improve game balancing.\nCollaborative filtering or recommender systems use a database about user preferences to predict additional topics or products a new user might like. 
In this paper we describe several algorithms designed for this task, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods. We compare the predictive accuracy of the various methods in a set of representative problem domains. We use two basic classes of evaluation metrics. The first characterizes accuracy over a set of individual predictions in terms of average absolute deviation. The second estimates the utility of a ranked list of suggested items. This metric uses an estimate of the probability that a user will see a recommendation in an ordered list. Experiments were run for datasets associated with 3 application areas, 4 experimental protocols, and the 2 evaluation metrics for the various algorithms. Results indicate that for a wide range of conditions, Bayesian networks with decision trees at each node and correlation methods outperform Bayesian-clustering and vector-similarity methods. Between correlation methods and Bayesian networks, the preferred method depends on the nature of the dataset, the nature of the application (ranked versus one-by-one presentation), and the availability of votes with which to make predictions. Other considerations include the size of the database, speed of predictions, and learning time.\nWe investigate a class of hierarchical mixtures-of-experts (HME) models where exponential family regression models with generalized linear mean functions of the form $\\psi(\\alpha+\\mathbf{x}^T\\boldsymbol{\\beta})$ are mixed. Here $\\psi(\\cdot)$ is the inverse link function. Suppose the true response $y$ follows an exponential family regression model with mean function belonging to a class of smooth functions of the form $\\psi(h(\\mathbf{x}))$, where $h(\\cdot) \\in W_2^\\infty$ (a Sobolev class over $[0,1]^s$). It is shown that the HME probability density functions can approximate the true density at a rate of $O(m^{-2/s})$ in $L_p$ norm and at a rate of $O(m^{-4/s})$ in Kullback-Leibler divergence. 
These rates can be achieved within the family of HME structures with no more than $s$ layers, where $s$ is the dimension of the predictor $\\mathbf{x}$. It is also shown that likelihood-based inference based on HME is consistent in recovering the truth, in the sense that as the sample size $n$ and the number of experts $m$ both increase, the mean square error of the predicted mean response goes to zero. Conditions for such results to hold are stated and discussed.\nA key problem of robotic environmental sensing and monitoring is that of active sensing: How can a team of robots plan the most informative observation paths to minimize the uncertainty in modeling and predicting an environmental phenomenon? This paper presents two principled approaches to efficient information-theoretic path planning based on entropy and mutual information criteria for in situ active sensing of an important broad class of widely-occurring environmental phenomena called anisotropic fields. Our proposed algorithms are novel in addressing a trade-off between active sensing performance and time efficiency. An important practical consequence is that our algorithms can exploit the spatial correlation structure of Gaussian process-based anisotropic fields to improve time efficiency while preserving near-optimal active sensing performance. We analyze the time complexity of our algorithms and prove analytically that they scale better than state-of-the-art algorithms with increasing planning horizon length. We provide theoretical guarantees on the active sensing performance of our algorithms for a class of exploration tasks called transect sampling, which, in particular, can be improved with longer planning time and/or lower spatial correlation along the transect. 
Empirical evaluation on real-world anisotropic field data shows that our algorithms can perform better than or at least as well as the state-of-the-art algorithms while often incurring a few orders of magnitude less computational time, even when the field conditions are less favorable.\nAssignment methods are at the heart of many algorithms for unsupervised learning and clustering, in particular the well-known K-means and Expectation-Maximization (EM) algorithms. In this work, we study several different methods of assignment, including the \"hard\" assignments used by K-means and the \"soft\" assignments used by EM. While it is known that K-means minimizes the distortion on the data and EM maximizes the likelihood, little is known about the systematic differences in behavior between the two algorithms. Here we shed light on these differences via an information-theoretic analysis. The cornerstone of our results is a simple decomposition of the expected distortion, showing that K-means (and its extension for inferring general parametric densities from unlabeled sample data) must implicitly manage a trade-off between how similar the data assigned to each cluster are, and how the data are balanced among the clusters. How well the data are balanced is measured by the entropy of the partition defined by the hard assignments. In addition to letting us predict and verify systematic differences between K-means and EM on specific examples, the decomposition allows us to give a rather general argument showing that K-means will consistently find densities with less \"overlap\" than EM. We also study a third natural assignment method that we call posterior assignment, which is close in spirit to the soft assignments of EM but leads to a surprisingly different algorithm.\nOver the years, numerous experiments have accumulated showing that cooperation is not random but depends on the payoffs of the game. 
These findings suggest that humans have a natural attitude toward cooperation and that the same person may act more or less cooperatively depending on the particular payoffs. In other words, people do not act a priori as single agents; instead, they forecast how the game would be played if they formed coalitions and then play according to their best forecast. In this paper we formalize this idea and define a new solution concept for one-shot normal-form games. We prove that this \\emph{cooperative equilibrium} exists for all finite games and explains a number of different experimental findings, such as (1) the rate of cooperation in the Prisoner's dilemma depends on the cost-benefit ratio; (2) the rate of cooperation in the Traveler's dilemma depends on the bonus/penalty; (3) the rate of cooperation in the Public Goods game depends on the per-capita marginal return and on the number of players; (4) the rate of cooperation in the Bertrand competition depends on the number of players; (5) players tend to be fair in the bargaining problem; (6) players tend to be fair in the Ultimatum game; (7) players tend to be altruistic in the Dictator game; (8) offers in the Ultimatum game are larger than offers in the Dictator game.\nWe aim at providing a foundation for a theory of \"good\" SAT representations F of boolean functions f. We argue that the hierarchy UC_k of unit-refutation complete clause-sets of level k, introduced by the authors, provides the most basic target classes, that is, F in UC_k is to be achieved for k as small as feasible. If F does not contain new variables, i.e., F is equivalent (as a CNF) to f, then F in UC_1 is similar to \"achieving (generalised) arc consistency\" known from the literature (it is somewhat weaker, but theoretically much nicer to handle). We show that for polysize representations of boolean functions in this sense, the hierarchy UC_k is strict. 
The boolean functions for these separations are \"doped\" minimally unsatisfiable clause-sets of deficiency 1; these functions have been introduced in [Sloan, Soerenyi, Turan, 2007], and we generalise their construction and show a correspondence to a strengthened notion of irredundant sub-clause-sets. Turning from lower bounds to upper bounds, we believe that many common CNF representations fit into the UC_k scheme, and we give some basic tools to construct representations in UC_1 with new variables, based on the Tseitin translation. Note that regarding new variables the UC_1-representations are stronger than mere \"arc consistency\", since the new variables are not excluded from consideration.\nOne of the most challenging problems in recommender systems based on the collaborative filtering (CF) concept is data sparseness, i.e., limited user preference data is available for making recommendations. Cross-domain collaborative filtering (CDCF) has been studied as an effective mechanism to alleviate data sparseness of one domain using the knowledge about user preferences from other domains. A key question to be answered in the context of CDCF is what common characteristics can be deployed to link different domains for effective knowledge transfer. In this paper, we assess the usefulness of user-contributed (social) tags in this respect. We do so by means of the Generalized Tag-induced Cross-domain Collaborative Filtering (GTagCDCF) approach that we propose in this paper and that we developed based on the general collective matrix factorization framework. Assessment is done by a series of experiments, using publicly available CF datasets that represent three cross-domain cases, i.e., two two-domain cases and one three-domain case. 
A comparative analysis on two-domain cases involving GTagCDCF and several state-of-the-art CDCF approaches indicates the increased benefit of using social tags as representatives of explicit links between domains for CDCF as compared to the implicit links deployed by the existing CDCF methods. In addition, we show that users from different domains can already benefit from GTagCDCF if they only share a few common tags. Finally, we use the three-domain case to validate the robustness of GTagCDCF with respect to the scale of datasets and the varying number of domains.\nSaliency detection has been an intuitive way to provide useful cues for object detection and segmentation, as desired for many vision and graphics applications. In this paper, we provide a robust method for salient object detection and segmentation. Rather than using various pixel-level contrast definitions, we exploit global image structures and propose a new geodesic method dedicated to salient object detection. In the proposed approach, a new geodesic scheme, namely geodesic tunneling, is proposed to tackle textures and local chaotic structures. With our new geodesic approach, a geodesic saliency map is estimated in correspondence to spatial structures in an image. Experimental evaluation on a salient object benchmark dataset validated that our algorithm consistently outperformed a number of state-of-the-art saliency methods, yielding higher precision and better recall rates. With the robust saliency estimation, we also present an unsupervised hierarchical salient object cut scheme simply using adaptive saliency thresholding, which attained the highest score in our F-measure test. 
We also applied our geodesic cut scheme to a number of image editing tasks as demonstrated in additional experiments.\nThe marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of \"mixed-product\" message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel \"argmax-product\" message update. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods.\nEvery cellular network deployment requires planning and optimization in order to provide adequate coverage, capacity, and quality of service (QoS). Optimizing mobile radio network planning is a very complex task, as many aspects must be taken into account. With the rapid development of mobile networks, effective network planning tools are needed to satisfy the needs of customers. 
However, deciding on the optimum placement of base stations (BSs) to achieve the best service while reducing cost is a complex task requiring vast computational resources. This paper introduces spatial clustering to solve the mobile network planning problem. It addresses the antenna placement problem, or cell planning problem, which involves locating and configuring infrastructure for mobile networks, by modifying the original Partitioning Around Medoids (PAM) algorithm. M-PAM (Modified Partitioning Around Medoids) is proposed to satisfy the requirements and constraints. PAM requires the number of clusters (k) to be specified before it starts searching for the best base station locations. The M-PAM algorithm uses radio network planning to determine k. For each cluster, we calculate its coverage and capacity and determine whether they satisfy the mobile requirements; if not, we increase k and reapply the algorithm using one of two clustering methods. An implementation of this algorithm on a real case study is presented. Experimental results and analysis indicate that the M-PAM algorithm with method two is effective under heavy load distribution and leads to the minimum number of base stations, which directly affects the cost of planning the network.\nMost optimal routing problems focus on minimizing travel time or distance traveled. Oftentimes, a more useful objective is to maximize the probability of on-time arrival, which requires statistical distributions of travel times, rather than just mean values. We propose a method to estimate travel time distributions on large-scale road networks, using probe vehicle data collected from GPS. We present a framework that works with large input data and scales linearly with the size of the network. Leveraging the planar topology of the graph, the method efficiently computes the time correlations between neighboring streets. 
First, raw probe vehicle traces are compressed into pairs of travel times and number of stops for each traversed road segment using a \"stop-and-go\" algorithm developed for this work. The compressed data is then used as input for training a path travel time model, which couples a Markov model with a Gaussian Markov random field. Finally, scalable inference algorithms are developed for obtaining path travel time distributions from the composite MM-GMRF model. We illustrate the accuracy and scalability of our model on a 505,000-road-link network spanning the San Francisco Bay Area.\nThis paper deals with chain graphs under the Andersson-Madigan-Perlman (AMP) interpretation. In particular, we present a constraint-based algorithm for learning an AMP chain graph to which a given probability distribution is faithful. Moreover, we show that the extension of Meek's conjecture to AMP chain graphs does not hold, which compromises the development of efficient and correct score+search learning algorithms under assumptions weaker than faithfulness.   We also introduce a new family of graphical models that consists of undirected and bidirected edges. We name this new family maximal covariance-concentration graphs (MCCGs) because it includes both covariance and concentration graphs as subfamilies. However, every MCCG can be seen as the result of marginalizing out some nodes in an AMP CG. We describe global, local and pairwise Markov properties for MCCGs and prove their equivalence. We characterize when two MCCGs are Markov equivalent, and show that every Markov equivalence class of MCCGs has a distinguished member. We present a constraint-based algorithm for learning an MCCG to which a given probability distribution is faithful.   Finally, we present a graphical criterion for reading dependencies from an MCCG of a probability distribution that satisfies the graphoid properties, weak transitivity and composition. 
We prove that the criterion is sound and complete in a certain sense.\nIntroduction. Case-Based Reasoning (CBR) is an emerging decision-making paradigm in medical research where new cases are solved relying on previously solved similar cases. Usually, a database of solved cases is provided, and every case is described through a set of attributes (inputs) and a label (output). Extracting useful information from this database can help the CBR system provide more reliable results on the yet-to-be-solved cases. Objective. For that purpose, we suggest a general framework where a CBR system, viz. the K-Nearest Neighbor (K-NN) algorithm, is combined with various information obtained from a Logistic Regression (LR) model. Methods. LR is applied on the case database to assign weights to the attributes as well as the solved cases. Thus, five possible decision-making systems based on K-NN and/or LR were identified: a standalone K-NN, a standalone LR and three soft K-NN algorithms that rely on the weights based on the results of the LR. The evaluation of the described approaches is performed in the field of renal transplant access waiting list. Results and conclusion. The results show that our suggested approach, where the K-NN algorithm relies on both weighted attributes and cases, can efficiently deal with non-relevant attributes, whereas the four other approaches suffer from this kind of noisy setup. The robustness of this approach suggests interesting perspectives for medical problem solving tools using CBR methodology.\nWe consider the problem of creating fair course timetables in the setting of a university. Our motivation is to improve the overall satisfaction of individuals concerned (students, teachers, etc.) by providing a fair timetable to them. The central idea is that undesirable arrangements in the course timetable, i.e., violations of soft constraints, should be distributed in a fair way among the individuals. 
We propose two formulations for the fair course timetabling problem that are based on max-min fairness and Jain's fairness index, respectively. Furthermore, we present and experimentally evaluate an optimization algorithm based on simulated annealing for solving max-min fair course timetabling problems. The new contribution is concerned with measuring the energy difference between two timetables, i.e., how much worse a timetable is compared to another timetable with respect to max-min fairness. We introduce three different energy difference measures and evaluate their impact on the overall algorithm performance. The second proposed problem formulation focuses on the tradeoff between fairness and the total amount of soft constraint violations. Our experimental evaluation shows that the known best solutions to the ITC2007 curriculum-based course timetabling instances are quite fair with respect to Jain's fairness index. However, the experiments also show that the fairness can be improved further for only a rather small increase in the total amount of soft constraint violations.\nBayesian Reinforcement Learning (RL) is capable of not only incorporating domain knowledge, but also solving the exploration-exploitation dilemma in a natural way. As Bayesian RL is intractable except for special cases, previous work has proposed several approximation methods. However, these methods are usually too sensitive to parameter values, and finding an acceptable parameter setting is practically impossible in many applications. In this paper, we propose a new algorithm that greedily approximates Bayesian RL to achieve robustness in parameter space. We show that for a desired learning behavior, our proposed algorithm has a polynomial sample complexity that is lower than those of existing algorithms. We also demonstrate that the proposed algorithm naturally outperforms other existing algorithms when the prior distributions are not significantly misleading. 
On the other hand, the proposed algorithm cannot handle greatly misspecified priors as well as the other algorithms can. This is a natural consequence of the fact that the proposed algorithm is greedier than the other algorithms. Accordingly, we discuss a way to select an appropriate algorithm for different tasks based on the algorithms' greediness. We also introduce a new way of simplifying Bayesian planning, based on which future work would be able to derive new algorithms.\nVT (Viterbi training), or hard EM, is an efficient way of parameter learning for probabilistic models with hidden variables. Given an observation $y$, it searches for a state of hidden variables $x$ that maximizes $p(x,y \\mid \\theta)$ by coordinate ascent on parameters $\\theta$ and $x$. In this paper we introduce VT to PRISM, a logic-based probabilistic modeling system for generative models. VT improves PRISM in three ways. First, VT in PRISM converges faster than EM in PRISM due to VT's termination condition. Second, parameters learned by VT often show good prediction performance compared to those learned by EM. We conducted two parsing experiments with probabilistic grammars while learning parameters by a variety of inference methods, i.e., VT, EM, MAP and VB. The result is that VT achieved the best parsing accuracy among them in both experiments. We also conducted a similar experiment for classification tasks where, unlike in probabilistic grammars, a hidden variable is not a prediction target. We found that in such a case VT does not necessarily yield superior performance. Third, since VT always deals with a single probability of a single explanation, the Viterbi explanation, the exclusiveness condition imposed on PRISM programs is no longer required if we learn parameters by VT.   Last but not least, since VT in PRISM is general and applicable to any PRISM program, it largely reduces the need for the user to develop a specific VT algorithm for a specific model. 
Furthermore, since VT in PRISM can be used just by setting a PRISM flag appropriately, it makes VT easily accessible to (probabilistic) logic programmers. To appear in Theory and Practice of Logic Programming (TPLP).\nAssociative classification is a recent and rewarding technique that integrates association rule mining and classification into a model for prediction and achieves maximum accuracy. Associative classifiers are especially suited to applications where a predictive model with maximum accuracy is desired. There are many domains, such as medicine, where maximum model accuracy is desired. Heart disease is the single largest cause of death in developed countries and one of the main contributors to disease burden in developing countries. Mortality data from the Registrar General of India show that heart disease is a major cause of death in India, and in Andhra Pradesh coronary heart disease causes about 30% of deaths in rural areas. Hence there is a need to develop a decision support system for predicting heart disease in a patient. In this paper we propose an efficient associative classification algorithm using a genetic approach for heart disease prediction. The main motivation for using a genetic algorithm in the discovery of high-level prediction rules is that the discovered rules are highly comprehensible, have high predictive accuracy and have high interestingness values. Experimental results show that most of the classifier rules help in the best prediction of heart disease, which even helps doctors in their diagnosis decisions.\nAssociative memories store content in such a way that the content can be later retrieved by presenting the memory with a small portion of the content, rather than presenting the memory with an address as in more traditional memories. Associative memories are used as building blocks for algorithms within database engines, anomaly detection systems, compression algorithms, and face recognition systems.
A classical example of an associative memory is the Hopfield neural network. Recently, Gripon and Berrou have introduced an alternative construction which builds on ideas from the theory of error correcting codes and which greatly outperforms the Hopfield network in capacity, diversity, and efficiency. In this paper we implement a variation of the Gripon-Berrou associative memory on a general purpose graphics processing unit (GPU). The work of Gripon and Berrou proposes two retrieval rules, sum-of-sum and sum-of-max. The sum-of-sum rule uses only matrix-vector multiplication and is easily implemented on the GPU. The sum-of-max rule is much less straightforward to implement because it involves non-linear operations. However, the sum-of-max rule gives significantly better retrieval error rates. We propose a hybrid rule tailored for implementation on a GPU which achieves an 880-fold speedup without sacrificing any accuracy.\nEstablishing arc consistency on two relational structures is one of the most popular heuristics for the constraint satisfaction problem. We aim at determining the time complexity of arc consistency testing. The input structures $G$ and $H$ can be assumed to be connected colored graphs, as the general problem reduces to this particular case. We first observe the upper bound $O(e(G)v(H)+v(G)e(H))$, which implies the bound $O(e(G)e(H))$ in terms of the number of edges and the bound $O((v(G)+v(H))^3)$ in terms of the number of vertices. We then show that both bounds are tight up to a constant factor as long as an arc consistency algorithm is based on constraint propagation (like any algorithm currently known). Our argument for the lower bounds is based on examples of slow constraint propagation. We measure the speed of constraint propagation observed on a pair $G,H$ by the size of a proof, in a natural combinatorial proof system, that Spoiler wins the existential 2-pebble game on $G,H$.
The proof size is bounded from below by the game length $D(G,H)$, and a crucial ingredient of our analysis is the existence of $G,H$ with $D(G,H)=\\Omega(v(G)v(H))$. We find one such example among old benchmark instances for the arc consistency problem and also suggest a new, different construction.\nOrthogonality is a discipline of programming that in a syntactic manner guarantees determinism of functional specifications. Essentially, orthogonality avoids, on the one hand, the inherent ambiguity of nondeterminism, prohibiting the existence of different rules that specify the same function and that may apply simultaneously (non-ambiguity), and, on the other hand, it eliminates the possibility of occurrence of repetitions of variables in the left-hand side of these rules (left linearity). In the theory of term rewriting systems (TRSs) determinism is captured by the well-known property of confluence, which basically states that whenever different computations or simplifications from a term are possible, the computed answers should coincide. Although the proofs are technically elaborate, confluence is well known to be a consequence of orthogonality. Thus, orthogonality is an important mathematical discipline intrinsic to the specification of recursive functions that is naturally applied in functional programming and specification. Starting from a formalization of the theory of TRSs in the proof assistant PVS, this work describes how confluence of orthogonal TRSs has been formalized in this proof assistant, based on axiomatizations of properties of rules, positions and substitutions involved in parallel steps of reduction.
Proofs for some similar but restricted properties such as the property of confluence of non-ambiguous and (left and right) linear TRSs have been fully formalized.\nWeb service composition is the process of synthesizing a new composite service using a set of available Web services in order to satisfy a client request that cannot be handled by any available Web service. The Web services space is a dynamic environment characterized by a huge number of elements. Furthermore, many Web services offer similar functionalities. In this paper we propose a model for Web service composition designed to address the scale effect and the redundancy issue. The Web services space is represented by a two-layered network architecture. A concrete similarity network layer organizes the Web services operations into communities of functionally similar operations. An abstract interaction network layer represents the composition relationships between the sets of communities. Composition synthesis is performed by a two-phase graph search algorithm. First, the interaction network is mined in order to discover abstract solutions to the request goal. Then, the abstract compositions are instantiated with concrete operations selected from the similarity network. This strategy allows an efficient exploration of the Web services space. Furthermore, operations grouped in a community can be easily substituted if necessary during the composition synthesis process.\nThe highest level of mathematics research is traditionally seen as a solitary activity. Yet new innovations by mathematicians themselves are starting to harness the power of social computation to create new modes of mathematical production. We study the effectiveness of one such system, and make proposals for enhancement, drawing on AI and computer-based mathematics. We analyse the content of a sample of questions and responses in the community question answering system for research mathematicians, MathOverflow.
We find that MathOverflow is very effective, with 90% of our sample of questions answered completely or in part. A typical response is an informal dialogue, allowing error and speculation, rather than rigorous mathematical argument: 37% of our sample discussions acknowledged error. Responses typically present information known to the respondent, and readily checked by other users: thus the effectiveness of MathOverflow comes from information sharing. We conclude that extending the power and reach of MathOverflow through a combination of people and machines raises new challenges for artificial intelligence and computational mathematics, in particular how to handle error, analogy and informal reasoning.\nComplex networks refer to large-scale graphs with nontrivial connection patterns. The salient and interesting features that the study of complex networks offers in comparison to graph theory are the emphasis on the dynamical properties of the networks and the ability to inherently uncover pattern formation among the vertices. In this paper, we present a hybrid data classification technique combining a low-level and a high-level classifier. The low-level term can be equipped with any traditional classification technique, which realizes the classification task considering only physical features (e.g., geometrical or statistical features) of the input data. On the other hand, the high-level term has the ability to detect data patterns with semantic meanings. In this way, the classification is realized by means of the extraction of the features of the underlying network constructed from the input data. As a result, the high-level classification process measures the compliance of the test instances with the pattern formation of the training data. Out of various high-level perspectives that can be utilized to capture semantic meaning, we utilize the dynamical features that are generated by a tourist walker in a networked environment.
Specifically, a weighted combination of transient and cycle lengths generated by the tourist walk is employed to that end. Interestingly, our study shows that the proposed technique is able to further improve the already optimized performance of traditional classification techniques.\nIn this work, we investigate a novel semantic approach for pattern discovery in trajectories that, relying on ontologies, enhances object movement information with event semantics. The approach can be applied to the detection of movement patterns and behaviors whenever the semantics of events occurring along the trajectory is, explicitly or implicitly, available. In particular, we tested it against an exacting case scenario in maritime surveillance, i.e., the discovery of suspicious container transportations. The methodology we have developed entails the formalization of the application domain through a domain ontology, extending the Moving Object Ontology (MOO) described in this paper. Afterwards, movement patterns have to be formalized, either as Description Logic (DL) axioms or queries, enabling the retrieval of the trajectories that follow the patterns. In our experimental evaluation, we have considered a real-world dataset of 18 million container events describing the operations undertaken in a port to accomplish the shipping (e.g., loading on a vessel, export operation). Leveraging events, we have reconstructed almost 300 thousand container trajectories referring to 50 thousand containers travelling over three years. We have formalized the anomalous itinerary patterns as DL axioms, testing different ontology APIs and DL reasoners to retrieve the suspicious transportations. Our experiments demonstrate that the approach is feasible and efficient.
In particular, the joint use of Pellet and SPARQL-DL makes it possible to detect the trajectories following a given pattern in a reasonable time on large datasets.\nThe extended mind hypothesis has stimulated much interest in cognitive science. However, its core claim, i.e., that the process of cognition can extend beyond the brain via the body and into the environment, has been heavily criticized. A prominent critique of this claim holds that when some part of the world is coupled to a cognitive system this does not necessarily entail that the part is also constitutive of that cognitive system. This critique is known as the \"coupling-constitution fallacy\". In this paper we respond to this reductionist challenge by using an evolutionary robotics approach to create a minimal model of two acoustically coupled agents. We demonstrate how the interaction process as a whole has properties that cannot be reduced to the contributions of the isolated agents. We also show that the neural dynamics of the coupled agents have formal properties that are inherently impossible for those neural networks in isolation. By keeping the complexity of the model to an absolute minimum, we are able to illustrate how the coupling-constitution fallacy is in fact based on an inadequate understanding of the constitutive role of nonlinear interactions in dynamical systems theory.\nRecent research in robot exploration and mapping has focused on sampling environmental hotspot fields. This exploration task is formalized by Low, Dolan, and Khosla (2008) in a sequential decision-theoretic planning under uncertainty framework called MASP. The time complexity of solving MASP approximately depends on the map resolution, which limits its use in large-scale, high-resolution exploration and mapping.
To alleviate this computational difficulty, this paper presents an information-theoretic approach to MASP (iMASP) for efficient adaptive path planning; by reformulating the cost-minimizing iMASP as a reward-maximizing problem, its time complexity becomes independent of map resolution and is less sensitive to increasing robot team size as demonstrated both theoretically and empirically. Using the reward-maximizing dual, we derive a novel adaptive variant of maximum entropy sampling, thus improving the induced exploration policy performance. It also allows us to establish theoretical bounds quantifying the performance advantage of optimal adaptive over non-adaptive policies and the performance quality of approximately optimal vs. optimal adaptive policies. We show analytically and empirically the superior performance of iMASP-based policies for sampling the log-Gaussian process over that of policies for the widely used Gaussian process in mapping the hotspot field. Lastly, we provide sufficient conditions that, when met, guarantee adaptivity has no benefit under an assumed environment model.\nThe efficiency of current cargo screening processes at seaports and airports is unknown as no benchmarks exist against which they could be measured. Some manufacturer benchmarks exist for individual sensors but we have not found any benchmarks that take a holistic view of the screening procedures assessing a combination of sensors and also taking operator variability into account. Just adding up resources and manpower used is not an effective way of assessing systems where human decision-making and operator compliance with rules play a vital role. For such systems more advanced assessment methods need to be used, taking into account that the cargo screening process is of a dynamic and stochastic nature.
The aim of our project is to develop a decision support tool (cargo-screening system simulator) that will map the right technology and manpower to the right commodity-threat combination in order to maximize detection rates. In this paper we present a project outline and highlight the research challenges we have identified so far. In addition, we introduce our first case study, where we investigate the cargo screening process at the ferry port in Calais.\nMany advances in research regarding immuno-interactions with cancer were developed with the help of ordinary differential equation (ODE) models. These models, however, cannot effectively represent problems involving individual localisation, memory and emergent properties, which are common characteristics of cells and molecules of the immune system. Agent-based modelling and simulation is an alternative paradigm to ODE models that overcomes these limitations. In this paper we investigate the potential contribution of agent-based modelling and simulation when compared to ODE modelling and simulation. We seek answers to the following questions: Is it possible to obtain an equivalent agent-based model from the ODE formulation? Do the outcomes differ? Are there any benefits of using one method compared to the other? To answer these questions, we have considered three case studies using established mathematical models of immune interactions with early-stage cancer. These case studies were re-conceptualised under an agent-based perspective and the simulation results were then compared with those from the ODE models. Our results show that it is possible to obtain equivalent agent-based models (i.e., implementing the same mechanisms); the simulation output of both types of models, however, might differ depending on the attributes of the system to be modelled. In some cases, additional insight from using agent-based modelling was obtained.
Overall, we can confirm that agent-based modelling is a useful addition to the tool set of immunologists, as it has extra features that allow for simulations with characteristics that are closer to the biological phenomena.\nWe present in this paper a new approach for the automatic annotation of medical images, using the \"bag-of-words\" approach to represent the visual content of the medical image, combined with a tf.idf-based text descriptor approach reduced by latent semantic analysis to extract the co-occurrence between terms and visual terms. A medical report is composed of a text describing a medical image. First, we index the text and extract all relevant terms using a thesaurus containing MeSH medical concepts. In a second phase, the medical image is indexed by recovering areas of interest which are invariant to changes in scale, light and tilt. To annotate a new medical image, we use the \"bag-of-words\" approach to recover the feature vector. We then use the vector space model to retrieve similar medical images from the training database. The relevance of an image to the query image is computed with the cosine function. We conclude with an experiment carried out on five types of radiological imaging to evaluate the performance of our medical annotation system. The results showed that our approach works better with more images from skull radiology.\nThis manuscript discusses computation of the Partition Function (PF) and the Minimum Weight Perfect Matching (MWPM) on arbitrary, non-bipartite graphs. We present two novel problem formulations - one for computing the PF of a Perfect Matching (PM) and one for finding MWPMs - that build upon the inter-related Bethe Free Energy, Belief Propagation (BP), Loop Calculus (LC), Integer Linear Programming (ILP) and Linear Programming (LP) frameworks. First, we describe an extension of the LC framework to the PM problem.
The resulting formulas, coined (fractional) Bootstrap-BP, express the PF of the original model via the BFE of an alternative PM problem. We then study the zero-temperature version of this Bootstrap-BP formula for approximately solving the MWPM problem. We do so by leveraging the Bootstrap-BP formula to construct a sequence of MWPM problems, where each new problem in the sequence is formed by contracting odd-sized cycles (or blossoms) from the previous problem. This Bootstrap-and-Contract procedure converges reliably and generates an empirically tight upper bound for the MWPM. We conclude by discussing the relationship between our iterative procedure and the famous Blossom Algorithm of Edmonds '65 and demonstrate the performance of the Bootstrap-and-Contract approach on a variety of weighted PM problems.\nDetermining the 3D conformation of proteins is necessary to understand their functions or interactions with other molecules. It is commonly accepted that, when proteins fold from their primary linear structures to their final 3D conformations, they tend to choose the ones that minimize their free energy. To find the 3D conformation of a protein knowing its amino acid sequence, bioinformaticians use various models of different resolutions and artificial intelligence tools, as the protein folding prediction problem is an NP-complete one. More precisely, determining the backbone structure of the protein using the low-resolution models (2D HP square and 3D HP cubic), by finding the conformation that minimizes free energy, is intractable to solve exactly. Both the proof of NP-completeness and the 2D prediction consider that acceptable conformations have to satisfy a self-avoiding walk (SAW) requirement, as two different amino acids cannot occupy the same position in the lattice.
It is shown in this document that the SAW requirement considered when proving NP-completeness is different from the SAW requirement used in various prediction programs, and that both are different from the real biological requirement. Indeed, the proof of NP-completeness and the predictions in silico consider conformations that are not possible in practice. Consequences of this fact are investigated in this research work.\nWe consider a voting setting where candidates have preferences about the outcome of the election and are free to join or leave the election. The corresponding candidacy game, where candidates choose strategically whether to participate, has been studied by Dutta et al., who showed that no non-dictatorial voting procedure satisfying unanimity is candidacy-strategyproof, that is, is such that the joint action where all candidates enter the election is always a pure strategy Nash equilibrium. Dutta et al. also showed that for some voting tree procedures, there are candidacy games with no pure Nash equilibria, and that for the rule that outputs the sophisticated winner of voting by successive elimination, all games have a pure Nash equilibrium. No results were known about other voting rules. Here we prove several such results. For four candidates, the message is, roughly, that most scoring rules (with the exception of Borda) do not guarantee the existence of a pure Nash equilibrium but that Condorcet-consistent rules, for an odd number of voters, do. For five candidates, most rules we study no longer have this guarantee. Finally, we identify one prominent rule that guarantees the existence of a pure Nash equilibrium for any number of candidates (and for an odd number of voters): the Copeland rule.
We also show that under mild assumptions on the voting rule, the existence of strong equilibria cannot be guaranteed.\nExpert finding is an information retrieval task concerned with the search for the most knowledgeable people on some topic, based on documents describing people's activities. The task involves taking a user query as input and returning a list of people sorted by their level of expertise regarding the user query. This paper introduces a novel approach for combining multiple estimators of expertise based on a multisensor data fusion framework together with the Dempster-Shafer theory of evidence and Shannon's entropy. More specifically, we defined three sensors which detect heterogeneous information derived from the textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the academic experts. Given the evidence collected, the sensors may identify different candidates as experts and consequently disagree on a final ranking. To deal with these conflicts, we applied the Dempster-Shafer theory of evidence combined with Shannon's entropy formula to fuse this information and come up with a more accurate and reliable final ranking list. Experiments performed on two datasets of academic publications from the Computer Science domain attest to the adequacy of the proposed approach over the traditional state-of-the-art approaches. We also ran experiments against representative supervised state-of-the-art algorithms. Results revealed that the proposed method achieved a similar performance when compared to these supervised techniques, confirming the capabilities of the proposed framework.\nLogic programs under the stable model semantics, or answer-set programs, provide an expressive rule-based knowledge representation framework, featuring a formal, declarative and well-understood semantics. However, handling the evolution of rule bases is still a largely open problem.
The AGM framework for belief change was shown to give inappropriate results when directly applied to logic programs under a non-monotonic semantics such as the stable models. The approaches to address this issue, developed so far, proposed update semantics based on manipulating the syntactic structure of programs and rules.   More recently, AGM revision has been successfully applied to a significantly more expressive semantic characterisation of logic programs based on SE-models. This is an important step, as it changes the focus from the evolution of a syntactic representation of a rule base to the evolution of its semantic content.   In this paper, we borrow results from the area of belief update to tackle the problem of updating (instead of revising) answer-set programs. We prove a representation theorem which makes it possible to constructively define any operator satisfying a set of postulates derived from Katsuno and Mendelzon's postulates for belief update. We define a specific operator based on this theorem, examine its computational complexity and compare the behaviour of this operator with syntactic rule update semantics from the literature. Perhaps surprisingly, we uncover a serious drawback of all rule update operators based on Katsuno and Mendelzon's approach to update and on SE-models.\nAnswer Set Programming (ASP) is a truly-declarative programming paradigm proposed in the area of non-monotonic reasoning and logic programming, that has been recently employed in many applications. The development of efficient ASP systems is, thus, crucial. Having in mind the task of improving the solving methods for ASP, there are two usual ways to reach this goal: $(i)$ extending state-of-the-art techniques and ASP solvers, or $(ii)$ designing a new ASP solver from scratch. 
An alternative to these trends is to build on top of state-of-the-art solvers, and to apply machine learning techniques for automatically choosing the \"best\" available solver on a per-instance basis. In this paper we pursue this latter direction. We first define a set of cheap-to-compute syntactic features that characterize several aspects of ASP programs. Then, we apply classification methods that, given the features of the instances in a training set and the solvers' performance on these instances, inductively learn algorithm selection strategies to be applied to a test set. We report the results of a number of experiments considering solvers and different training and test sets of instances taken from the ones submitted to the \"System Track\" of the 3rd ASP Competition. Our analysis shows that, by applying machine learning techniques to ASP solving, it is possible to obtain very robust performance: our approach can solve more instances than any solver that entered the 3rd ASP Competition. (To appear in Theory and Practice of Logic Programming (TPLP).)\nWe propose a decomposition of the max-min fair curriculum-based course timetabling (MMF-CB-CTT) problem. The decomposition models the room assignment subproblem as a generalized lexicographic bottleneck optimization problem (LBOP). We show that the generalized LBOP can be solved efficiently if the corresponding sum optimization problem can be solved efficiently. As a consequence, the room assignment subproblem of the MMF-CB-CTT problem can be solved efficiently. We use this insight to improve a previously proposed heuristic algorithm for the MMF-CB-CTT problem. Our experimental results indicate that using the new decomposition improves the performance of the algorithm on most of the 21 ITC2007 test instances with respect to the quality of the best solution found. Furthermore, we introduce a measure of the quality of a solution to a max-min fair optimization problem.
This measure helps to overcome some limitations imposed by the qualitative nature of max-min fairness and aids the statistical evaluation of the performance of randomized algorithms for such problems. We use this measure to show that, using the new decomposition, the algorithm outperforms the original one on most instances with respect to the average solution quality.\nThe food consumed each day by patients with any disease greatly affects their health and healing process, and people with kidney and urinary tract diseases are no exception. This paper presents the determination of diet composition, in the form of food substances, for people with kidney and urinary tract diseases using a genetic fuzzy approach. This approach combines fuzzy logic and genetic algorithms, using fuzzy logic tools and techniques to model the components of the genetic algorithm and to adapt its control parameters, with the aim of improving system performance. A Mamdani fuzzy inference model and fuzzy rules based on population parameters and generation are used to determine the crossover and mutation probabilities. In this study, survey data on 400 foods along with their substances were used as test material. From the data, populations of varying size are established. Each chromosome has 10 genes, in which the value of each gene indicates the index number of a foodstuff in the database. The fuzzy genetic approach produces the 10 best food substances and their compositions. The composition of these foods has nutritional value in accordance with the number of calories needed by people with kidney and urinary tract diseases, by type of food.\nSmart home technology is a better choice for people who care about security, comfort and power saving.
It is necessary to develop technologies that recognize the Activities of Daily Living (ADLs) of the residents at home and detect abnormal behavior in the individual's patterns. Data mining techniques such as Frequent Pattern Mining (FPM) and High Utility Pattern (HUP) mining have been used to find those activity patterns in the collected sensor data. However, applying these techniques to activity recognition from a temporal sensor data stream is a highly complex and challenging task. Therefore, a new approach is proposed for activity recognition from sensor data streams, which is achieved by constructing a Frequent Pattern Stream tree (FPS-tree). FPS is a sliding-window-based approach that discovers recent activity patterns over time from data streams. The proposed work aims at identifying the frequent patterns of the user from the sensor data streams, which are later modeled for activity recognition. The proposed FPM algorithm uses a data structure called Linked Sensor Data Stream (LSDS) for storing the sensor data stream information, which increases the efficiency of the frequent pattern mining algorithm in both space and time. The experimental results show the efficiency of the proposed algorithm, and this FPM is further extended to power efficiency, using HUP to detect high power consumption by residents of the smart home.\nThe idea of computer vision as the Bayesian inverse problem to computer graphics has a long history and an appealing elegance, but it has proved difficult to directly implement. Instead, most vision tasks are approached via complex bottom-up processing pipelines. Here we show that it is possible to write short, simple probabilistic graphics programs that define flexible generative models and to automatically invert them to interpret real-world images.
Generative probabilistic graphics programs consist of a stochastic scene generator, a renderer based on graphics software, a stochastic likelihood model linking the renderer's output and the data, and latent variables that adjust the fidelity of the renderer and the tolerance of the likelihood model. Representations and algorithms from computer graphics, originally designed to produce high-quality images, are instead used as the deterministic backbone for highly approximate and stochastic generative models. This formulation combines probabilistic programming, computer graphics, and approximate Bayesian computation, and depends only on general-purpose, automatic inference techniques. We describe two applications: reading sequences of degraded and adversarially obscured alphanumeric characters, and inferring 3D road models from vehicle-mounted camera images. Each of the probabilistic graphics programs we present relies on under 20 lines of probabilistic code, and supports accurate, approximately Bayesian inferences about ambiguous real-world images.\nThe intuitive notion of evidence has both semantic and syntactic features. In this paper, we develop an evidence logic for epistemic agents faced with possibly contradictory evidence from different sources. The logic is based on a neighborhood semantics, where a neighborhood $N$ indicates that the agent has reason to believe that the true state of the world lies in $N$. Further notions of relative plausibility between worlds and beliefs based on the latter ordering are then defined in terms of this evidence structure, yielding our intended models for evidence-based beliefs. In addition, we consider a second, more general flavor, where belief and plausibility are modeled using additional primitive relations, and we prove a representation theorem showing that each such general model is a $p$-morphic image of an intended one.
This semantics invites a number of natural special cases, depending on how uniform we make the evidence sets and how coherent their total structure is. We give a structural study of the resulting `uniform' and `flat' models. Our main results are sound and complete axiomatizations for the logics of all four major model classes with respect to the modal language of evidence, belief and safe belief. We conclude with an outlook toward logics for the dynamics of changing evidence, and the resulting language extensions and connections with logics of plausibility change.\nDue to dynamic network conditions, routing is the most critical part of wireless mesh networks (WMNs) and needs to be optimised. The routing strategies developed for WMNs must be efficient enough to make the network operationally self-configurable. We therefore resort to near-shortest-path evaluation, which calls for soft computing approaches that deliver a near-shortest path in affordable computing time. This paper proposes a fuzzy-logic-based integrated cost measure in terms of delay, throughput and jitter. Using this distance (cost) between two adjacent nodes, we evaluate the minimal shortest path used to update the routing tables. We apply two recent soft computing approaches, namely Big Bang-Big Crunch (BB-BC) and Biogeography-Based Optimization (BBO), to enumerate shortest or near-shortest paths. BB-BC is inspired by the evolution of the universe, whereas BBO is inspired by the dynamical equilibrium in the number of species on an island. Both algorithms have low computational time and high convergence speed. Simulation results show that the proposed routing algorithms find the optimal shortest path while taking into account the three most important parameters of network dynamics. 
It was further observed that, for the shortest path problem, BB-BC outperforms BBO in terms of both speed and the percent error between the evaluated minimal path and the actual shortest path.\nThe dynamic behaviour of a WMN imposes stringent constraints on the routing policy of the network. In shortest-path-based routing, the shortest paths need to be evaluated within the time frame allowed by the WMN dynamics. Shortest path evaluation methods based on exact reasoning usually fail to meet this rigid requirement, which calls for soft computing approaches that can replace \"best for sure\" solutions with \"good enough\" solutions. This paper proposes a framework for optimal routing in WMNs, in which we investigate the suitability of Big Bang-Big Crunch (BB-BC), a soft computing approach, for evaluating shortest/near-shortest paths. To make routing optimal, we first propose replacing the distance between adjacent nodes with an integrated cost measure that takes into account the throughput, delay, jitter and residual energy of a node. A fuzzy-logic-based inference mechanism evaluates this cost measure at each node. Using this distance measure, we apply the BB-BC optimization algorithm to evaluate shortest/near-shortest paths and update the routing tables periodically, as dictated by network requirements. A large number of simulations show that the BB-BC algorithm is a high-potential candidate for routing in WMNs.\nPhilosophers writing about the ravens paradox often note that Nicod's Condition (NC) holds given some set of background information and fails to hold against others, but rarely go any further. That is, it is usually not explored which background information makes NC true or false. The present paper aims to fill this gap. For us, \"(objective) background knowledge\" is restricted to information that can be expressed as probability events. 
Any other configuration is regarded as being subjective and a property of the a priori probability distribution. We study NC in two specific settings. In the first case, a complete description of some individuals is known, e.g. one knows of each of a group of individuals whether they are black and whether they are ravens. In the second case, the number of individuals having a particular property is given, e.g. one knows how many ravens or how many black things there are (in the relevant population). While some of the most famous answers to the paradox are measure-dependent, our discussion is not restricted to any particular probability measure. Our most interesting result is that in the second setting, NC violates a simple kind of inductive inference (namely projectability). Since relative to NC, this latter rule is more closely related to, and more directly justified by our intuitive notion of inductive reasoning, this tension makes a case against the plausibility of NC. In the end, we suggest that the informal representation of NC may seem to be intuitively plausible because it can easily be mistaken for reasoning by analogy.\nThe language of probability is used to define several different types of conditional statements. There are four principal types: subjunctive, material, existential, and feasibility. Two further types of conditionals are defined using the propositional calculus and Boole's mathematical logic: truth-functional and Boolean feasibility (which turn out to be special cases of probabilistic conditionals). Each probabilistic conditional is quantified by a fractional parameter between zero and one that says whether it is purely affirmative, purely negative, or intermediate in its sense. Conditionals can be specialized further by their content to express factuality and counterfactuality, and revised or reformulated to account for exceptions and confounding factors. 
The various conditionals have distinct mathematical representations: through intermediate probability expressions and logical formulas, each conditional is eventually translated into a set of polynomial equations and inequalities (with real coefficients). The polynomial systems from different types of conditionals exhibit different patterns of behavior, concerning for example opposing conditionals or false antecedents. Interesting results can be computed from the relevant polynomial systems using well-known methods from algebra and computer science. Among other benefits, the proposed framework of analysis offers paraconsistent procedures for logical deduction that produce such familiar results as modus ponens, transitivity, disjunction introduction, and disjunctive syllogism; all while avoiding any explosion of consequences from inconsistent premises. Several example problems from Goodman and Adams are analyzed. A new perspective called polylogicism is presented: mathematical logic that respects the diversity among conditionals in particular and logic problems in general.\nThis paper establishes theoretical bonafides for implicit concurrent multivariate effect evaluation--implicit concurrency for short---a broad and versatile computational learning efficiency thought to underlie general-purpose, non-local, noise-tolerant optimization in genetic algorithms with uniform crossover (UGAs). We demonstrate that implicit concurrency is indeed a form of efficient learning by showing that it can be used to obtain close-to-optimal bounds on the time and queries required to approximately correctly solve a constrained version (k=7, \\eta=1/5) of a recognizable computational learning problem: learning parities with noisy membership queries. 
We argue that a UGA that treats the noisy membership query oracle as a fitness function can be straightforwardly used to approximately correctly learn the essential attributes in O(log^1.585 n) queries and O(n log^1.585 n) time, where n is the total number of attributes. Our proof relies on an accessible symmetry argument and the use of statistical hypothesis testing to reject a global null hypothesis at the 10^-100 level of significance. It is, to the best of our knowledge, the first relatively rigorous identification of efficient computational learning in an evolutionary algorithm on a non-trivial learning problem.\nLearning the structure of a Markov network from data is a problem that has received considerable attention in machine learning and in many other application fields. This work focuses on a particular approach for this purpose called independence-based (IB) learning. Such an approach efficiently guarantees learning the correct structure whenever the data are sufficient to represent the underlying distribution. However, an important issue with this approach is that the learned structures are encoded in an undirected graph. The problem with graphs is that they cannot encode some types of independence relations, such as context-specific independences (CSIs). These are a particular case of conditional independences that hold only for a certain assignment of the conditioning set, in contrast to conditional independences, which must hold for all of its assignments. In this work we present CSPC, an independence-based algorithm for learning structures that encode context-specific independences and for encoding them in a log-linear model instead of a graph. The central idea of CSPC is to combine the theoretical guarantees provided by the independence-based approach with the benefits of representing complex structures using features in a log-linear model. 
We present experiments in a synthetic case, showing that CSPC is more accurate than the state-of-the-art IB algorithms when the underlying distribution contains CSIs.\nPlanning is a notoriously difficult computational problem of high worst-case complexity. Researchers have been investing significant effort in developing heuristics or restrictions to make planning practically feasible. Case-based planning is a heuristic approach where one tries to reuse previous experience when solving similar problems in order to avoid some of the planning effort. Plan reuse may offer an interesting alternative to plan generation in some settings.   We provide theoretical results that identify situations in which plan reuse is provably tractable. We perform our analysis in the framework of parameterized complexity, which supports a rigorous worst-case complexity analysis that takes structural properties of the input into account in terms of parameters. A central notion of parameterized complexity is fixed-parameter tractability, which extends the classical notion of polynomial-time tractability by utilizing the effect of structural properties of the problem input.   We draw a detailed map of the parameterized complexity landscape of several variants of problems that arise in the context of case-based planning. In particular, we consider the problem of reusing an existing plan, imposing various restrictions in terms of parameters, such as the number of steps that can be added to the existing plan to turn it into a solution of the planning instance at hand.\nVerification of multi-agent systems (MAS) has recently been studied taking into account the need to express resource bounds. Several logics for specifying properties of MAS have been presented in a wide variety of scenarios with bounded resources. 
In this paper, we study a different formalism, called Priced Resource-Bounded Alternating-time Temporal Logic (PRBATL), whose main novelty consists in moving the notion of resources from a syntactic level (part of the formula) to a semantic one (part of the model). This allows us to track the evolution of resource availability along the computations and provides us with a formalism capable of modeling a number of real-world scenarios. Two relevant aspects are the notion of global availability of the resources on the market, which are shared by the agents, and the notion of the price of resources, which depends on their availability. In a previous work of ours, an initial step towards this new formalism was introduced, along with an EXPTIME algorithm for the model checking problem. In this paper we analyze the features of the proposed formalism in more depth, also in comparison with previous approaches. The main technical contribution is the proof of the EXPTIME-hardness of the model checking problem for PRBATL, based on a reduction from the acceptance problem for Linearly-Bounded Alternating Turing Machines. In particular, since the problem has multiple parameters, we show two fixed-parameter reductions.\nThe magnetic permeability of a ferrite is an important factor in designing devices such as inductors, transformers, and microwave absorbing materials, among others. It is therefore advisable to study the magnetic permeability of a ferrite as a function of frequency.   When an excitation corresponding to a harmonic magnetic field $\\mathbf{H}$ is applied to the system, the system responds with a magnetic flux density $\\mathbf{B}$; the relation between these two vectors can be expressed as $\\mathbf{B}=\\mu(\\omega)\\mathbf{H}$, where $\\mu$ is the magnetic permeability.   In this paper, ferrites were considered linear, homogeneous, and isotropic materials. A magnetic permeability model was applied to NiZn ferrites doped with yttrium.   
The parameters of the model were adjusted using a Genetic Algorithm. In the computer science field of artificial intelligence, Genetic Algorithms and Machine Learning rely upon nature's bounty for both inspiration and mechanisms. Genetic Algorithms are probabilistic search procedures that generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover.   Numerical fitting usually relies on a nonlinear least-squares method, a calculus-based algorithm that starts from an initial set of variable values. This approach is mathematically elegant compared to exhaustive or random searches, but it easily gets stuck in local minima. Random methods, on the other hand, use probabilistic calculations to find variable sets. They tend to be slower but are more successful at finding the global minimum regardless of the initial values of the variables.\nTime-series classification has attracted considerable research attention due to the variety of domains in which time-series data are observed, ranging from medicine to econometrics. Traditionally, the focus of time-series classification has been on short time-series data composed of a unique pattern with intraclass pattern distortions and variations, while recently there have been attempts to focus on longer series composed of various local patterns. This study presents a novel method that can detect local patterns in long time-series by fitting local polynomial functions of arbitrary degree. The coefficients of the polynomial functions are converted to symbolic words via equivolume discretizations of the coefficients' distributions. The symbolic polynomial words enable the detection of similar local patterns by assigning the same words to similar polynomials. Moreover, a histogram of the frequencies of the words is constructed from each time-series' bag of words. 
Each row of the histogram provides a new representation of the series and symbolizes the existence of local patterns and their frequencies. Experimental evidence demonstrates outstanding results for our method compared to state-of-the-art baselines, with the best classification accuracy on all datasets and statistically significant improvements in the absolute majority of experiments.\nWe provide a systematic analysis of levels of integration between discrete high-level reasoning and continuous low-level reasoning to address hybrid planning problems in robotics. We identify four distinct strategies for such an integration: (i) low-level checks are done for all possible cases in advance and then this information is used during plan generation, (ii) low-level checks are done exactly when they are needed during the search for a plan, (iii) first all plans are computed and then infeasible ones are filtered, and (iv) by means of replanning, after finding a plan, low-level checks identify whether it is infeasible or not; if it is infeasible, a new plan is computed considering the results of previous low-level checks. We perform experiments on hybrid planning problems in robotic manipulation and legged locomotion domains considering these four methods of integration, as well as some of their combinations. We analyze the usefulness of levels of integration in these domains, both from the point of view of computational efficiency (in time and space) and from the point of view of plan quality relative to its feasibility. We discuss advantages and disadvantages of each strategy in the light of experimental results and provide some guidelines on choosing proper strategies for a given domain.\nThere has been significant interest in crowdsourcing and human computation. One subclass of human computation applications consists of those directed at tasks that involve planning (e.g. travel planning) and scheduling (e.g. conference scheduling). 
Much of this work appears outside the traditional automated planning forums, and at the outset it is not clear whether automated planning has much of a role to play in these human computation systems. Interestingly, however, work on these systems shows that even primitive forms of automated oversight of the human planner do significantly improve the effectiveness of the humans/crowd. In this paper, we will argue that the automated oversight used in these systems can be viewed as a primitive automated planner, and that there are several opportunities for more sophisticated automated planning in effectively steering crowdsourced planning. Straightforward adaptation of current planning technology is, however, hampered by the mismatch between the capabilities of human workers and automated planners. We identify two important challenges that need to be overcome before such adaptation of planning technology can occur: (i) interpreting the inputs of the human workers (and the requester) and (ii) steering or critiquing the plans being produced by the human workers, armed only with incomplete domain and preference models. In this paper, we discuss approaches for handling these challenges, and characterize existing human computation systems in terms of the specific choices they make in handling these challenges.\nWhen eliciting opinions from a group of experts, traditional devices used to promote honest reporting assume that there is an observable future outcome. In practice, however, this assumption is not always reasonable. In this paper, we propose a scoring method built on strictly proper scoring rules to induce honest reporting without assuming observable outcomes. Our method provides scores based on pairwise comparisons between the reports made by each pair of experts in the group. For ease of exposition, we introduce our scoring method by illustrating its application to the peer-review process. 
In order to do so, we start by modeling the peer-review process using a Bayesian model where the uncertainty regarding the quality of the manuscript is taken into account. Thereafter, we introduce our scoring method to evaluate the reported reviews. Under the assumptions that reviewers are Bayesian decision-makers and that they cannot influence the reviews of other reviewers, we show that risk-neutral reviewers strictly maximize their expected scores by honestly disclosing their reviews. We also show how the group's scores can be used to find a consensual review. Experimental results show that encouraging honest reporting through the proposed scoring method creates more accurate reviews than the traditional peer-review process.\nThe theory of natural selection cannot describe how early life evolved, in part because acquired characteristics are passed on through horizontal exchange. It has been proposed that culture, like life, began with the emergence of autopoietic form, thus its evolution too cannot be described by natural selection. The evolution of autopoietic form can be described using a framework referred to as Context-driven Actualization of Potential (CAP), which grew out of a generalization of the formalisms of quantum mechanics, and encompasses nondeterministic as well as deterministic change of state. The autopoietic structure that evolves through culture is the mind, or more accurately the conceptual network that yields an individual's internal model of the world. A branch of CAP research referred to as the state-context-property (SCOP) formalism provides a mathematical framework for reconciling the stability of conceptual structure with its susceptibility to context-driven change. The combination of two or more concepts (an extreme case of contextual influence), as occurs in insight, is modeled as a state of entanglement. 
Theoretical and empirical findings are presented that challenge assumptions underlying virtually all of cognitive science, such as the notion of spreading activation and the assumption that cognitive processes can be described with a Kolmogorovian probability model.\nWe investigate a method to deal with sector congestion and delays in the tactical phase of air traffic flow and capacity management. It relies on temporal objectives given for every point of the flight plans and shared among the controllers in order to create a collaborative environment. This would enhance the transition from the network view of flow management to the local view of air traffic control. Uncertainty is modeled at the trajectory level with temporal information on the boundary points of the crossed sectors, from which we infer the probabilistic occupancy count. We can therefore model the accuracy of the trajectory prediction in the optimization process in order to set safety margins. On the one hand, the more accurate our prediction, the more efficient the proposed solutions will be, because of the tighter safety margins. On the other hand, when uncertainty is not negligible, the proposed solutions will be more robust to disruptions. Furthermore, a multiobjective algorithm is used to find the tradeoff between delays and congestion, which are antagonistic in airspace with high traffic density. The flow management position can choose the adequate solution manually, or automatically with a preference-based algorithm. This method is tested on two instances, one with 10 flights and 5 sectors and one with 300 flights and 16 sectors.\nIn this paper we study a particular aspect of urban community policing: routine patrol route planning. We seek routes that guarantee visibility, as this has a sizable impact on the community's perceived safety, allowing quick emergency responses and providing surveillance of selected sites (e.g., hospitals, schools). 
The planning is constrained by the availability of vehicles and strives to achieve balanced routes. We study an adaptation of the model for the multi-vehicle covering tour problem, in which a set of locations must be visited, whereas another subset must be close enough to the planned routes. It constitutes an NP-complete integer programming problem. Suboptimal solutions are obtained with several heuristics, some adapted from the literature and others developed by us. We solve some adapted instances from TSPLIB and an instance with real data, the former being compared with results from the literature and the latter with empirical data.\nThis paper presents a novel meta-algorithm, Partition-Merge (PM), which takes existing centralized algorithms for graph computation and makes them distributed and faster. In a nutshell, PM divides the graph into small subgraphs using our novel randomized partitioning scheme, runs the centralized algorithm on each partition separately, and then stitches the resulting solutions together to produce a global solution. We demonstrate the efficiency of the PM algorithm on two popular problems: computation of the Maximum A Posteriori (MAP) assignment in an arbitrary pairwise Markov Random Field (MRF), and modularity optimization for community detection. We show that the resulting distributed algorithms for these problems essentially run in time linear in the number of nodes in the graph, and perform as well as, or even better than, the original centralized algorithm as long as the graph has geometric structure. Here we say a graph has geometric structure, or the polynomial growth property, when the number of nodes within distance $r$ of any given node grows no faster than a polynomial function of $r$. More precisely, if the centralized algorithm is a $C$-factor approximation with constant $C \\ge 1$, the resulting distributed algorithm is a $(C+\\delta)$-factor approximation for any small $\\delta>0$; but if the centralized algorithm is a non-constant (e.g. 
logarithmic) factor approximation, then the resulting distributed algorithm becomes a constant factor approximation. For general graphs, we compute explicit bounds on the loss of performance of the resulting distributed algorithm with respect to the centralized algorithm.\nIn addition to their limpid interface with semantics, categorial grammars enjoy another important property: learnability. This was first noticed by Buszkowski and Penn and further studied by Kanazawa, for Bar-Hillel categorial grammars.   What about Lambek categorial grammars? In a previous paper we showed that product-free Lambek grammars were learnable from structured sentences, the structures being incomplete natural deductions. These grammars were shown to be unlearnable from strings by Foret and Le Nir. In the present paper we show that Lambek grammars, possibly with product, are learnable from proof frames that are incomplete proof nets.   After a short reminder on grammatical inference \\`a la Gold, we provide an algorithm that learns Lambek grammars with product from proof frames and we prove its convergence. We do so for 1-valued (also known as rigid) Lambek grammars with product, since standard techniques can extend our result to $k$-valued grammars. Because of the correspondence between cut-free proof nets and normal natural deductions, our initial result on product-free Lambek grammars can be recovered.   We are sad to dedicate the present paper to Philippe Darondeau, with whom we started to study such questions in Rennes at the beginning of the millennium, and who passed away prematurely.   
We are glad to dedicate the present paper to Jim Lambek for his 90th birthday: he is the living proof that research is an eternal learning process.\nSmall groups of interneurons, called central pattern generators (CPGs), are arranged into neural networks to generate a variety of core bursting rhythms with specific phase-locked states, on distinct time scales, that govern vital motor behaviors in invertebrates such as chewing, swimming, etc. These movements in lower-level animals mimic motions of organs in higher animals due to evolutionarily conserved mechanisms. Hence, various neurological diseases can be linked to abnormal movement of body parts that are regulated by a malfunctioning CPG. In this paper, inspired by recent experimental studies of neuronal activity patterns recorded from a swimming motion CPG of the sea slug {\\it Melibe leonina}, we examine a mathematical model of a 4-cell network that can plausibly and stably underlie the observed bursting rhythm. We develop a dynamical systems framework for explaining the existence and robustness of phase-locked states in activity patterns produced by the modeled CPGs. The proposed tools can be used for identifying core components of other CPG networks with reliable bursting outcomes and specific phase relationships between the interneurons. Our findings can be employed for identifying or implementing the conditions for normal and pathological functioning of basic CPGs of animals and artificially intelligent prosthetics that can regulate various movements.\nState-of-the-art algorithms for industrial instances of the MaxSAT problem rely on iterative calls to a SAT solver. Preprocessing is crucial for the acceleration of SAT solving, and the key preprocessing techniques rely on the application of resolution and subsumption elimination. Additionally, satisfiability-preserving clause elimination procedures are often used. 
Since MaxSAT computation typically involves a large number of SAT calls, we are interested in whether an input instance to a MaxSAT problem can be preprocessed up-front, i.e. prior to running the MaxSAT solver, rather than (or, in addition to) during each iterative SAT solver call. The key requirement in this setting is that the preprocessing has to be sound, i.e. so that the solution can be reconstructed correctly and efficiently after the execution of a MaxSAT algorithm on the preprocessed instance. While, as we demonstrate in this paper, certain clause elimination procedures are sound for MaxSAT, it is well-known that this is not the case for resolution and subsumption elimination. In this paper we show how to adapt these preprocessing techniques to MaxSAT. To achieve this we recast the MaxSAT problem in a recently introduced labelled-CNF framework, and show that within the framework the preprocessing techniques can be applied soundly. Furthermore, we show that MaxSAT algorithms restated in the framework have a natural implementation on top of an incremental SAT solver. We evaluate the prototype implementation of a MaxSAT algorithm WMSU1 in this setting, demonstrate the effectiveness of preprocessing, and show overall improvement with respect to non-incremental versions of the algorithm on some classes of problems.\nEven though modern service-oriented and data-oriented architectures promise to deliver loosely coupled control systems, they are inherently brittle as they commonly depend on a priori agreed interfaces and data models. At the same time, the Semantic Web and a whole set of accompanying standards and tools are emerging, advocating ontologies as the basis for knowledge exchange. In this paper we aim to identify a number of key ideas from the myriad of knowledge-based practices that can readily be implemented by control systems today. 
We demonstrate with a practical example (a three-channel imager for the Mercator Telescope) how ontologies developed in the Web Ontology Language (OWL) can serve as a meta-model for our instrument, covering as many engineering aspects of the project as needed. We show how a concrete system model can be built on top of this meta-model via a set of Domain Specific Languages (DSLs), supporting both formal verification and the generation of software and documentation artifacts. Finally we reason how the available semantics can be exposed at run-time by adding a \"semantic layer\" that can be browsed, queried, monitored etc. by any OPC UA-enabled client.\nProbabilistic inference over large data sets is a challenging data management problem since exact inference is generally #P-hard and is most often solved approximately with sampling-based methods today. This paper proposes an alternative approach for approximate evaluation of conjunctive queries with standard relational databases: In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known PTIME self-join-free conjunctive queries: A query is in PTIME if and only if our algorithm returns one single plan. Furthermore, our approach is a generalization of a family of efficient ranking methods from graphs to hypergraphs. We also adapt three relational query optimization techniques to evaluate all necessary plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers. 
We also note that the techniques developed in this paper apply immediately to lifted inference from statistical relational models since lifted inference corresponds to PTIME plans in probabilistic databases.\nDesigning algorithms capable of efficiently constructing minimal models of CNFs is an important task in AI. This paper provides new results along this research line and presents new algorithms for performing minimal model finding and checking over positive propositional CNFs and model minimization over propositional CNFs. An algorithmic schema, called the Generalized Elimination Algorithm (GEA), is presented that computes a minimal model of any positive CNF. The schema generalizes the Elimination Algorithm (EA) [BP97], which computes a minimal model of positive head-cycle-free (HCF) CNF theories. While the EA always runs in polynomial time in the size of the input HCF CNF, the complexity of the GEA depends on the complexity of the specific eliminating operator invoked therein, which may in general turn out to be exponential. Therefore, a specific eliminating operator is defined by which the GEA computes, in polynomial time, a minimal model for a class of CNF that strictly includes head-elementary-set-free (HEF) CNF theories [GLL06], which form, in their turn, a strict superset of HCF theories. Furthermore, in order to deal with the high complexity associated with recognizing HEF theories, an \"incomplete\" variant of the GEA (called IGEA) is proposed: the resulting schema, once instantiated with an appropriate elimination operator, always constructs a model of the input CNF, which is guaranteed to be minimal if the input theory is HEF. In the light of the above results, the main contribution of this work is the enlargement of the tractability frontier for the minimal model finding and checking and the model minimization problems.\nHuman activity and the environment produce sounds, such as, at home, the noise of water, coughing, or television. 
These sounds can be used to determine the activity in the environment. The objective is to monitor a person's activity or determine his environment by sound analysis using a single low-cost microphone. The purpose is to adapt programs to the activity or environment, or to detect abnormal situations. Some over-expressed patterns that repeat in the sequences of recognized sounds, both within and across environments, make it possible to characterize activities such as a person entering the house or a TV program being watched. We first manually annotated 1500 recognized sounds from the daily-life activity of elderly persons living at home. Then we inferred an ontology and enriched the annotation database with a crowdsourced manual annotation of 7500 sounds to help with the annotation of the most frequent sounds. Using sound-learning algorithms, we defined 50 types of the most frequent sounds. We used this set of recognizable sounds as a basis for tagging sounds. By using over-expressed motifs in the sequences of tags, we were able to categorize, using only a single low-cost microphone, complex activities of daily life of a person at home, such as watching TV, a person entering the apartment, or a phone conversation, and to detect unknown activities as repeated tasks performed by users.\nOntology matching finds correspondences between similar entities of different ontologies. Two ontologies may be similar in some aspects such as structure, semantics, etc. Most ontology matching systems integrate multiple matchers to extract all the similarities that two ontologies may have. Thus, we face the major problem of aggregating different similarities. Some matching systems use experimentally chosen weights to aggregate the similarities of different matchers, while others use machine learning approaches and optimization algorithms to find optimal weights for the different matchers. However, both approaches have their own deficiencies. 
In this paper, we point out the problems and shortcomings of current similarity aggregation strategies. Then, we propose a new strategy for the similarity aggregation task, which uses the structural information of ontologies to derive the weights of matchers. To achieve this goal, we create a new ontology matching system that uses three available matchers, namely GMO, ISub and VDoc. We have tested our similarity aggregation strategy on the OAEI 2012 data set. Experimental results show significant improvements in accuracy in several cases, especially in matching the classes of ontologies. We also compare the performance of our similarity aggregation strategy with other well-known strategies.\nWe present algorithms to effectively represent a set of Markov decision processes (MDPs), whose optimal policies have already been learned, by a smaller source subset for lifelong, policy-reuse-based transfer learning in reinforcement learning. This is necessary when the number of previous tasks is large and the cost of measuring similarity counteracts the benefit of transfer. The source subset forms an `$\\epsilon$-net' over the original set of MDPs, in the sense that for each previous MDP $M_p$, there is a source $M^s$ whose optimal policy has $<\\epsilon$ regret in $M_p$. Our contributions are as follows. We present EXP-3-Transfer, a principled policy-reuse algorithm that optimally reuses a given source policy set when learning for a new MDP. We present a framework to cluster the previous MDPs to extract a source subset. The framework consists of (i) a distance $d_V$ over MDPs to measure policy-based similarity between MDPs; (ii) a cost function $g(\\cdot)$ that uses $d_V$ to measure how good a particular clustering is for generating useful source tasks for EXP-3-Transfer and (iii) a provably convergent algorithm, MHAV, for finding the optimal clustering. 
We validate our algorithms through experiments in a surveillance domain.\nMany optimization tasks have to be handled in noisy environments, where we cannot obtain the exact evaluation of a solution but only a noisy one. For noisy optimization tasks, evolutionary algorithms (EAs), a kind of stochastic metaheuristic search algorithm, have been widely and successfully applied. Previous work has mainly focused on empirically studying and designing EAs for noisy optimization, while the theoretical counterpart has been little investigated. In this paper, we investigate a largely ignored question, i.e., whether an optimization problem will always become harder for EAs in a noisy environment. We prove that the answer is negative, with respect to the measurement of the expected running time. The result implies that, for optimization tasks that are already quite hard to solve, the noise may not have a negative effect, and the easier a task, the more negatively it is affected by the noise. On a representative problem where the noise has a strong negative effect, we examine two commonly employed mechanisms in EAs dealing with noise, the re-evaluation and the threshold selection strategies. The analysis reveals, however, that neither strategy is effective, i.e., neither makes the EA more noise tolerant. We then find that a small modification of the threshold selection allows it to be proven as an effective strategy for dealing with the noise in the problem.\nAnswer Set Programming (ASP) is a popular framework for modeling combinatorial problems. However, ASP cannot easily be used for reasoning about uncertain information. Possibilistic ASP (PASP) is an extension of ASP that combines possibilistic logic and ASP. In PASP a weight is associated with each rule, where this weight is interpreted as the certainty with which the conclusion can be established when the body is known to hold. 
As such, it allows us to model and reason about uncertain information in an intuitive way. In this paper we present new semantics for PASP, in which rules are interpreted as constraints on possibility distributions. Special models of these constraints are then identified as possibilistic answer sets. In addition, since ASP is a special case of PASP in which all the rules are entirely certain, we obtain a new characterization of ASP in terms of constraints on possibility distributions. This allows us to uncover a new form of disjunction, called weak disjunction, that has not been previously considered in the literature. In addition to introducing and motivating the semantics of weak disjunction, we also pinpoint its computational complexity. In particular, while the complexity of most reasoning tasks coincides with standard disjunctive ASP, we find that brave reasoning for programs with weak disjunctions is easier.\nIn this paper, we consider active information acquisition when the prediction model is meant to be applied to a targeted subset of the population. The goal is to label a pre-specified fraction of customers in the target or test set by iteratively querying for information from the non-target or training set. The number of queries is limited by an overall budget. Motivated by two rather disparate applications, banking and medical diagnosis, we pose the active information acquisition problem as a constrained optimization problem. We propose two greedy iterative algorithms for solving the above problem. We conduct experiments with synthetic data and compare the results of our proposed algorithms with a few other baseline approaches. The experimental results show that our proposed approaches perform better than the baseline schemes.\nImproving the throughput of molecular docking, a computationally intensive phase of the virtual screening process, is a highly sought-after area of research, since it carries significant weight in the drug design process. 
With such improvements, the world might find cures for incurable diseases such as HIV and cancer sooner. The approach presented in this paper uses a multi-core environment to introduce Data Level Parallelism (DLP) into Autodock Vina, a widely used molecular docking software. Autodock Vina already exploits Instruction Level Parallelism (ILP) in multi-core environments and is therefore optimized for such environments. Nevertheless, our results clearly show that our approach increases the throughput of the already optimized software by more than six times. Together with the current shift in processor technology from multi-core to many-core, this will dramatically reduce the time consumed by the lead identification phase of drug design. Therefore, we believe that the contribution of this project will make it possible to expand the number of small molecules docked against a drug target and improve the chances of designing drugs for incurable diseases.\nBike sharing systems are a very popular means to provide bikes to citizens in a simple and cheap way. The idea is to install bike stations at various points in the city, from which a registered user can easily loan a bike by removing it from a specialized rack. After the ride, the user may return the bike at any station (if there is a free rack). Services of this kind are mainly public or semi-public, often aimed at increasing the attractiveness of non-motorized means of transportation, and are usually free, or almost free, of charge for the users. Depending on their location, bike stations have specific patterns regarding when they are empty or full. For instance, in cities where most jobs are located near the city centre, the commuters cause certain peaks in the morning: the central bike stations are filled, while the stations in the outskirts are emptied. 
Furthermore, stations located on top of a hill are more likely to be empty, since users are less keen on cycling uphill to return the bike, and often leave their bike at a more reachable station. These issues result in substantial user dissatisfaction which may eventually cause the users to abandon the service. This is why nowadays most bike sharing system providers take measures to rebalance their systems. Over the last few years, balancing bike sharing systems (BBSS) has become an increasingly studied optimization problem. As such, generating meaningful instances to serve as benchmarks for the proposed approaches is an important task. In this technical report we describe the procedure we used to generate BBSS problem instances from data of the CitiBike NYC bike sharing system.\nThe amount of malicious software (malware) is growing out of control. Syntactic signature-based detection cannot cope with such growth, and manual construction of malware signature databases needs to be replaced by computer learning-based approaches. Currently, a single modern signature capturing the semantics of a malicious behavior can be used to replace an arbitrarily large number of old-fashioned syntactical signatures. However, teaching computers to learn such behaviors is a challenge. Existing work relies on dynamic analysis to extract malicious behaviors, but such techniques do not guarantee coverage of all behaviors. To sidestep this limitation, we show how to learn malware signatures using static reachability analysis. The idea is to model binary programs using pushdown systems (that can be used to model the stack operations occurring during the binary code execution), use reachability analysis to extract behaviors in the form of trees, and use subtrees that are common among the trees extracted from a training set of malware files as signatures. 
To detect malware we propose to use a tree automaton to compactly store malicious behavior trees and check if any of the subtrees extracted from the file under analysis is malicious. Experimental data shows that our approach can be used to learn signatures from a training set of malware files and use them to detect a test set of malware that is 5 times the size of the training set.\nWe consider the discrete assignment problem in which agents express ordinal preferences over objects and these objects are allocated to the agents in a fair manner. We use the stochastic dominance relation between fractional or randomized allocations to systematically define varying notions of proportionality and envy-freeness for discrete assignments. The computational complexity of checking whether a fair assignment exists is studied for these fairness notions. We also characterize the conditions under which a fair assignment is guaranteed to exist. For a number of fairness concepts, polynomial-time algorithms are presented to check whether a fair assignment exists. Our algorithmic results also extend to the case of unequal entitlements of agents. Our NP-hardness result, which holds for several variants of envy-freeness, answers an open question posed by Bouveret, Endriss, and Lang (ECAI 2010). We also propose fairness concepts that always suggest a non-empty set of assignments with meaningful fairness properties. Among these concepts, optimal proportionality and optimal weak proportionality appear to be desirable fairness concepts.\nDifferent machine learning techniques have been proposed and used for modeling individual and group user needs, interests and preferences. In the traditional predictive modeling instances are described by observable variables, called attributes. The goal is to learn a model for predicting the target variable for unseen instances. 
For example, for marketing purposes a company may consider profiling a new user based on her observed web browsing behavior, referral keywords or other relevant information. In many real-world applications the values of some attributes are not only observable, but can be actively decided by a decision maker. Furthermore, in some such applications the decision maker is interested not only in generating accurate predictions, but also in maximizing the probability of the desired outcome. For example, a direct marketing manager can choose which type of special offer to send to a client (actionable attribute), hoping that the right choice will result in a positive response with a higher probability. We study how to learn to choose the value of an actionable attribute in order to maximize the probability of a desired outcome in predictive modeling. We emphasize that not all instances are equally sensitive to changes in actions. Accurate choice of an action is critical for those instances that are on the borderline (e.g. users who do not have a strong opinion one way or the other). We formulate three supervised learning approaches for learning to select the value of an actionable attribute at an instance level. We also introduce a focused training procedure which puts more emphasis on the situations where varying the action is most likely to take effect. The proof-of-concept experimental validation on two real-world case studies in web analytics and e-learning domains highlights the potential of the proposed approaches.\nOntology Learning (OL) is the computational task of generating a knowledge base in the form of an ontology given an unstructured corpus whose content is in natural language (NL). Several works can be found in this area, most of which are limited to statistical and lexico-syntactic pattern-matching techniques (Light-Weight OL). These techniques do not lead to very accurate learning, mostly because of several linguistic nuances in NL. 
Formal OL is an alternative (less explored) methodology where deep linguistic analysis is performed using theory and tools from computational linguistics to generate formal axioms and definitions instead of simply inducing a taxonomy. In this paper we propose a \"Description Logic (DL)\" based formal OL framework for learning factual IS-A type sentences in English. We claim that the semantic construction of IS-A sentences is non-trivial. Hence, we also claim that such sentences require special study in the context of OL before any truly formal OL can be proposed. We introduce a learner tool, called DLOL_IS-A, that generates such ontologies in the OWL format. We have adopted \"Gold Standard\" based OL evaluation on the IS-A rich WCL v.1.1 dataset and on our own Community representative IS-A dataset. We observed a significant improvement of DLOL_IS-A when compared to the light-weight OL tool Text2Onto and the formal OL tool FRED.\nA Constraint Satisfaction Problem (CSP) is a framework used for modeling and solving constrained problems. Tree-search algorithms like backtracking try to construct a solution to a CSP by selecting the variables of the problem one after another. The order in which these algorithms select the variables can have a significant impact on search performance. Various heuristics have been proposed for choosing a good variable ordering. Many powerful variable ordering heuristics weigh the constraints first and then use the weights to select a good order of the variables. Constraint weighting is essentially employed to identify global bottlenecks in a CSP.   In this paper, we propose a new approach for learning constraint weights using a competitive coevolutionary Genetic Algorithm (GA). Weights learned by the coevolutionary GA later help to make better choices for the first few variables in a search. In the competitive coevolutionary GA, constraints and candidate solutions for a CSP evolve together through an inverse fitness interaction process. 
We have conducted experiments on several random, quasi-random and patterned instances to measure the efficiency of the proposed approach. The results and analysis show that the proposed approach is good at learning weights to distinguish the hard constraints for quasi-random instances and forced satisfiable random instances generated with Model RB. For other types of instances, our experiments show that RNDI still seems to be the best approach.\nThese are the proceedings of the Second Workshop on GRAPH Inspection and Traversal Engineering (GRAPHITE 2013), which took place on March 24, 2013 in Rome, Italy, as a satellite event of the 16th European Joint Conferences on Theory and Practice of Software (ETAPS 2013).   The topic of the GRAPHITE workshop is graph analysis in all its forms in computer science. Graphs are used to represent data in many application areas, and they are subjected to various computational algorithms in order to acquire the desired information. These graph algorithms tend to have common characteristics, such as duplicate detection to guarantee their termination, independent of their application domain. Over the past few years, it has been shown that the scalability of such algorithms can be dramatically improved by using, e.g., external memory, by exploiting parallel architectures, such as clusters, multi-core CPUs, and graphics processing units, and by using heuristics to guide the search. Novel techniques to further scale graph search algorithms, and new applications of graph search are within the scope of this workshop.   Another topic of interest of the event is more related to the structural properties of graphs: which kind of graph characteristics are relevant for a particular application area, and how can these be measured? Finally, any novel way of using graphs for a particular application area is on topic.   
The goal of this event is to gather scientists from different communities, such as model checking, artificial intelligence planning, game playing, and algorithm engineering, who do research on graph search algorithms, so as to increase awareness of each other's work.\nIn this paper we examine the usefulness of two classes of algorithms widely used in genetics, Distance Methods and Discrete Character Methods (Felsenstein and Felsenstein 2003), for predicting the family relationships among a set of related languages and therefore diachronic language change. Applying these algorithms to the data on the numbers of shared cognates-with-change and changed as well as unchanged cognates for a group of six languages belonging to a Dravidian language sub-family given in Krishnamurti et al. (1983), we observed that the resultant phylogenetic trees are largely in agreement with the linguistic family tree constructed using the comparative method of reconstruction, with only a few minor differences. Furthermore, we studied these minor differences and found that they were cases of genuine ambiguity even for a well-trained historical linguist. We evaluated the trees obtained through our experiments using a well-defined criterion and report the results here. We finally conclude that quantitative methods like the ones we examined are quite useful in predicting family relationships among languages. In addition, we conclude that a modest degree of confidence attached to the intuition that there could indeed exist a parallelism between the processes of linguistic and genetic change is not totally misplaced.\nInvestigation of the underlying physics or biology from empirical data requires a quantifiable notion of similarity - when do two observed data sets indicate nearly identical generating processes, and when do they not. 
The discriminating characteristics to look for in data are often determined by heuristics designed by experts, $e.g.$, distinct shapes of \"folded\" lightcurves may be used as \"features\" to classify variable stars, while determination of pathological brain states might require a Fourier analysis of brainwave activity. Finding good features is non-trivial. Here, we propose a universal solution to this problem: we delineate a principle for quantifying similarity between sources of arbitrary data streams, without a priori knowledge, features or training. We uncover an algebraic structure on a space of symbolic models for quantized data, and show that such stochastic generators may be added and uniquely inverted; and that a model and its inverse always sum to the generator of flat white noise. Therefore, every data stream has an anti-stream: data generated by the inverse model. Similarity between two streams, then, is the degree to which one, when summed to the other's anti-stream, mutually annihilates all statistical structure to noise. We call this data smashing. We present diverse applications, including disambiguation of brainwaves pertaining to epileptic seizures, detection of anomalous cardiac rhythms, and classification of astronomical objects from raw photometry. In our examples, the data smashing principle, without access to any domain knowledge, meets or exceeds the performance of specialized algorithms tuned by domain experts.\nConstraints have played an important role in the construction of GUIs, where they are mainly used to define the layout of the widgets. Resizing behavior is very important in GUIs because areas have domain-specific parameters, such as those arising from the resizing of windows. If a linear objective function is used and the window is resized, the error is not distributed equally. To distribute the error equally, a quadratic objective function is introduced. 
Different algorithms are widely used for solving linear constraints and quadratic problems in a variety of scientific areas. The linear relaxation, Kaczmarz, direct and linear programming methods are common methods for solving linear constraints for GUI layout. The interior point and active set methods are the most commonly used techniques to solve quadratic programming problems. Current constraint solvers designed for GUI layout do not use interior point methods for solving a quadratic objective function subject to linear equality and inequality constraints. In this paper, the performance and convergence speed of interior point and active set methods are compared, along with one of the most commonly used linear programming methods, when they are implemented for graphical user interface layout. The performance and convergence of the proposed algorithms are evaluated empirically using randomly generated UI layout specifications of various sizes. The results show that the interior point algorithms perform significantly better than the Simplex method and the QOCA solver, which uses an active set method implementation for solving quadratic optimization.\nIn a sequential auction with multiple bidding agents, it is highly challenging to determine the ordering of the items to sell in order to maximize revenue, because the autonomy and private information of the agents heavily influence the outcome of the auction.   The main contribution of this paper is two-fold. First, we demonstrate how to apply machine learning techniques to solve the optimal ordering problem in sequential auctions. We learn regression models from historical auctions, which are subsequently used to predict the expected value of orderings for new auctions. 
Given the learned models, we propose two types of optimization methods: a black-box best-first search approach, and a novel white-box approach that maps learned models to integer linear programs (ILP) which can then be solved by any ILP-solver. Although the studied auction design problem is hard, our proposed optimization methods obtain good orderings with high revenues.   Our second main contribution is the insight that the internal structure of regression models can be efficiently evaluated inside an ILP solver for optimization purposes. To this end, we provide efficient encodings of regression trees and linear regression models as ILP constraints. This new way of using learned models for optimization is promising. As the experimental results show, it significantly outperforms the black-box best-first search in nearly all settings.\nMany computer programs have graphical user interfaces (GUIs), which need good layout to make efficient use of the available screen real estate. Most GUIs do not have a fixed layout, but are resizable and able to adapt themselves. Constraints are a powerful tool for specifying adaptable GUI layouts: they are used to specify a layout in a general form, and a constraint solver is used to find a satisfying concrete layout, e.g.\\ for a specific GUI size. The constraint solver has to calculate a new layout every time a GUI is resized or changed, so it needs to be efficient to ensure a good user experience. One approach for constraint solvers is based on the Gauss-Seidel algorithm and successive over-relaxation (SOR).   Our observation is that a solution after resizing or changing is similar in structure to a previous solution. Thus, our hypothesis is that we can increase the computational performance of an SOR-based constraint solver if we reuse the solution of a previous layout to warm-start the solving of a new layout. 
In this paper we report on experiments to test this hypothesis for three common use cases: big-step resizing, small-step resizing and constraint change. In our experiments, we measured the solving time for randomly generated GUI layout specifications of various sizes. For all three cases we found that the performance is improved if an existing solution is used as a starting solution for a new layout.\nThe problem of defining and locating free will (FW) in physics is studied. On the basis of logical paradoxes, we argue that FW has a meta-theoretic character, like the concept of truth in Tarski's undefinability theorem. Free will exists relative to a base theory if there is freedom to deviate from the deterministic or indeterministic dynamics in the theory, with the deviations caused by parameters (representing will) in the meta-theory. By contrast, determinism and indeterminism do not require meta-theoretic considerations in their formalization, making FW a fundamentally new causal primitive. FW exists relative to the meta-theory if there is freedom for deviation, due to higher-order causes. Absolute free will, which corresponds to our intuitive introspective notion of free will, exists if this meta-theoretic hierarchy is infinite. We argue that this hierarchy corresponds to higher levels of uncomputability. In other words, at any finitely high order in the hierarchy, there are uncomputable deviations from the law at that order. Applied to the human condition, the hierarchy corresponds to deeper levels of the subconscious or unconscious mind. Possible ramifications of our model for physics, neuroscience and artificial intelligence (AI) are briefly considered.\nFollowing the \"decomposition-and-ensemble\" principle, the empirical mode decomposition (EMD)-based modeling framework has been widely used as a promising alternative for nonlinear and nonstationary time series modeling and prediction. 
The end effect, which occurs during the sifting process of EMD and tends to distort the decomposed sub-series and hurt the subsequent modeling process, has, however, been ignored in previous studies. Addressing the end effect issue, this study proposes to incorporate end condition methods into the EMD-based decomposition and ensemble modeling framework for one- and multi-step ahead time series prediction. Four well-established end condition methods, Mirror method, Coughlin's method, Slope-based method, and Rato's method, are selected, and support vector regression (SVR) is employed as the modeling technique. For the purpose of justification and comparison, well-known NN3 competition data sets are used and four well-established prediction models are selected as benchmarks. The experimental results demonstrated that significant improvement can be achieved by the proposed EMD-based SVR models with end condition methods. The EMD-SBM-SVR model and EMD-Rato-SVR model, in particular, achieved the best prediction performances in terms of goodness of forecast measures and equality of accuracy of competing forecasts test.\nWe consider schemes for obtaining truthful reports on a common but hidden signal from large groups of rational, self-interested agents. One example is online feedback mechanisms, where users provide observations about the quality of a product or service so that other users can have an accurate idea of what quality they can expect. However, (i) providing such feedback is costly, and (ii) there are many motivations for providing incorrect feedback.   Both problems can be addressed by reward schemes which (i) cover the cost of obtaining and reporting feedback, and (ii) maximize the expected reward of a rational agent who reports truthfully. We address the design of such incentive-compatible rewards for feedback generated in environments with pure adverse selection. 
Here, the correlation between the true knowledge of an agent and her beliefs regarding the likelihoods of reports of other agents can be exploited to make honest reporting a Nash equilibrium.   In this paper we extend existing methods for designing incentive-compatible rewards by also considering collusion. We analyze different scenarios, where, for example, some or all of the agents collude. For each scenario we investigate whether a collusion-resistant, incentive-compatible reward scheme exists, and use automated mechanism design to specify an algorithm for deriving an efficient reward mechanism.\nThis paper presents a new method for inferring the semantic properties of documents by leveraging free-text keyphrase annotations. Such annotations are becoming increasingly abundant due to the recent dramatic growth in semi-structured, user-generated online content. One especially relevant domain is product reviews, which are often annotated by their authors with pros/cons keyphrases such as \"a real bargain\" or \"good value\". These annotations are representative of the underlying semantic properties; however, unlike expert annotations, they are noisy: lay authors may use different labels to denote the same property, and some labels may be missing. To learn using such noisy annotations, we find a hidden paraphrase structure which clusters the keyphrases. The paraphrase structure is linked with a latent topic model of the review texts, enabling the system to predict the properties of unannotated documents and to effectively aggregate the semantic properties of multiple reviews. Our approach is implemented as a hierarchical Bayesian model with joint inference. We find that joint inference increases the robustness of the keyphrase clustering and encourages the latent topics to correlate with semantically meaningful properties. 
Multiple evaluations demonstrate that our model substantially outperforms alternative approaches for summarizing single and multiple documents into a set of semantically salient keyphrases.\nVickrey-Clarke-Groves (VCG) mechanisms are often used to allocate tasks to selfish and rational agents. VCG mechanisms are incentive compatible, direct mechanisms that are efficient (i.e., maximise social utility) and individually rational (i.e., agents prefer to join rather than opt out). However, an important assumption of these mechanisms is that the agents will \"always\" successfully complete their allocated tasks. Clearly, this assumption is unrealistic in many real-world applications, where agents can, and often do, fail in their endeavours. Moreover, whether an agent is deemed to have failed may be perceived differently by different agents. Such subjective perceptions about an agent's probability of succeeding at a given task are often captured and reasoned about using the notion of \"trust\". Given this background, in this paper we investigate the design of novel mechanisms that take into account the trust between agents when allocating tasks.   Specifically, we develop a new class of mechanisms, called \"trust-based mechanisms\", that can take into account multiple subjective measures of the probability of an agent succeeding at a given task and produce allocations that maximise social utility, whilst ensuring that no agent obtains a negative utility. 
We then show that such mechanisms pose a challenging new combinatorial optimisation problem (that is NP-complete), devise a novel representation for solving the problem, and develop an effective integer programming solution (that can solve instances with about 2x10^5 possible allocations in 40 seconds).\nComplex questions that require inferencing and synthesizing information from multiple documents can be seen as a kind of topic-oriented, informative multi-document summarization where the goal is to produce a single text as a compressed version of a set of documents with a minimum loss of relevant information. In this paper, we experiment with one empirical method and two unsupervised statistical machine learning techniques: K-means and Expectation Maximization (EM), for computing relative importance of the sentences. We compare the results of these approaches. Our experiments show that the empirical approach outperforms the other two techniques and EM performs better than K-means. However, the performance of these approaches depends entirely on the feature set used and the weighting of these features. In order to measure the importance and relevance to the user query we extract different kinds of features (i.e. lexical, lexical semantic, cosine similarity, basic element, tree kernel based syntactic and shallow-semantic) for each of the document sentences. We use a local search technique to learn the weights of the features. To the best of our knowledge, no study has used tree kernel functions to encode syntactic/semantic information for more complex tasks such as computing the relatedness between the query sentences and the document sentences in order to generate query-focused summaries (or answers to complex questions). For each of our methods of generating summaries (i.e. 
empirical, K-means and EM) we show the effects of syntactic and shallow-semantic features over the bag-of-words (BOW) features.\nA highly comparative, feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are derived from across the scientific time-series analysis literature, and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After computing thousands of features for each time series in a training set, those that are most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and circumvent the need to calculate distances between time series. Representing time series in this way results in orders of magnitude of dimensionality reduction, allowing the method to perform well on very large datasets containing long time series or time series of different lengths. For many of the datasets studied, classification performance exceeded that of conventional instance-based classifiers, including one nearest neighbor classifiers using Euclidean distances and dynamic time warping and, most importantly, the features selected provide an understanding of the properties of the dataset, insight that can guide further scientific investigation.\nVast amounts of text on the Web are unstructured and ungrammatical, such as classified ads, auction listings, forum postings, etc. We call such text \"posts.\" Despite their inconsistent structure and lack of grammar, posts are full of useful information. 
This paper presents work on semi-automatically building tables of relational information, called \"reference sets,\" by analyzing such posts directly. Reference sets can be applied to a number of tasks such as ontology maintenance and information extraction. Our reference-set construction method starts with just a small amount of background knowledge, and constructs tuples representing the entities in the posts to form a reference set. We also describe an extension to this approach for the special case where even this small amount of background knowledge is impossible to discover and use. To evaluate the utility of the machine-constructed reference sets, we compare them to manually constructed reference sets in the context of reference-set-based information extraction. Our results show the reference sets constructed by our method outperform manually constructed reference sets. We also compare the reference-set-based extraction approach using the machine-constructed reference set to supervised extraction approaches using generic features. These results demonstrate that using machine-constructed reference sets outperforms the supervised methods, even though the supervised methods require training data.\nIn the usual models of cooperative game theory, the outcome of a coalition formation process is either the grand coalition or a coalition structure that consists of disjoint coalitions. However, in many domains where coalitions are associated with tasks, an agent may be involved in executing more than one task, and thus may distribute his resources among several coalitions. To tackle such scenarios, we introduce a model for cooperative games with overlapping coalitions--or overlapping coalition formation (OCF) games. We then explore the issue of stability in this setting. In particular, we introduce a notion of the core, which generalizes the corresponding notion in the traditional (non-overlapping) scenario. 
Then, under some quite general conditions, we characterize the elements of the core, and show that any element of the core maximizes the social welfare. We also introduce a concept of balancedness for overlapping coalitional games, and use it to characterize coalition structures that can be extended to elements of the core. Finally, we generalize the notion of convexity to our setting, and show that under some natural assumptions convex games have a non-empty core. Moreover, we introduce two alternative notions of stability in OCF that allow a wider range of deviations, and explore the relationships among the corresponding definitions of the core, as well as the classic (non-overlapping) core and the Aubin core. We illustrate the general properties of the three cores, and also study them from a computational perspective, thus obtaining additional insights into their fundamental structure.\nWeighted voting is a classic model of cooperation among agents in decision-making domains. In such games, each player has a weight, and a coalition of players wins the game if its total weight meets or exceeds a given quota. A player's power in such games is usually not directly proportional to his weight, and is measured by a power index, the most prominent among which are the Shapley-Shubik index and the Banzhaf index. In this paper, we investigate by how much a player can change his power, as measured by the Shapley-Shubik index or the Banzhaf index, by means of a false-name manipulation, i.e., splitting his weight among two or more identities. For both indices, we provide upper and lower bounds on the effect of weight-splitting. We then show that checking whether a beneficial split exists is NP-hard, and discuss efficient algorithms for restricted cases of this problem, as well as randomized algorithms for the general case. We also provide an experimental evaluation of these algorithms. 
Finally, we examine related forms of manipulative behavior, such as annexation, where a player subsumes other players, or merging, where several players unite into one. We characterize the computational complexity of such manipulations and provide limits on their effects. For the Banzhaf index, we describe a new paradox, which we term the Annexation Non-monotonicity Paradox.\nThere has been significant recent interest in game-theoretic approaches to security, with much of the recent research focused on utilizing the leader-follower Stackelberg game model. Among the major applications are the ARMOR program deployed at LAX Airport and the IRIS program in use by the US Federal Air Marshals (FAMS). The foundational assumption for using Stackelberg games is that security forces (leaders), acting first, commit to a randomized strategy; while their adversaries (followers) choose their best response after surveillance of this randomized strategy. Yet, in many situations, a leader may face uncertainty about the follower's surveillance capability. Previous work fails to address how a leader should compute her strategy given such uncertainty. We provide five contributions in the context of a general class of security games. First, we show that the Nash equilibria in security games are interchangeable, thus alleviating the equilibrium selection problem. Second, under a natural restriction on security games, any Stackelberg strategy is also a Nash equilibrium strategy; and furthermore, the solution is unique in a class of security games of which ARMOR is a key exemplar. Third, when faced with a follower that can attack multiple targets, many of these properties no longer hold. Fourth, we show experimentally that in most (but not all) games where the restriction does not hold, the Stackelberg strategy is still a Nash equilibrium strategy, but this is no longer true when the attacker can attack multiple targets. 
Finally, as a possible direction for future research, we propose an extensive-form game model that makes the defender's uncertainty about the attacker's ability to observe explicit.\nThe problem of adversarial multi-robot patrol has gained interest in recent years, mainly due to its immediate relevance to various security applications. In this problem, robots are required to repeatedly visit a target area in a way that maximizes their chances of detecting an adversary trying to penetrate through the patrol path. When facing a strong adversary that knows the patrol strategy of the robots, if the robots use a deterministic patrol algorithm, then in many cases it is easy for the adversary to penetrate undetected (in fact, in some of those cases the adversary can guarantee penetration). Therefore this paper presents a non-deterministic patrol framework for the robots. Assuming that the strong adversary will take advantage of its knowledge and try to penetrate through the patrol's weakest spot, an optimal algorithm is one that maximizes the chances of detection at that point. We therefore present a polynomial-time algorithm for determining an optimal patrol under the Markovian strategy assumption for the robots, such that the probability of detecting the adversary at the patrol's weakest spot is maximized. We build upon this framework and describe an optimal patrol strategy for several robotic models based on their movement abilities (directed or undirected) and sensing abilities (perfect or imperfect), and in different environment models - either patrol around a perimeter (closed polygon) or an open fence (open polyline).\nIn automatic summarization, centrality-as-relevance means that the most important content of an information source, or a collection of information sources, corresponds to the most central passages, considering a representation where such notion makes sense (graph, spatial, etc.). 
We assess the main paradigms, and introduce a new centrality-based relevance model for automatic summarization that relies on the use of support sets to better estimate the relevant content. Geometric proximity is used to compute semantic relatedness. Centrality (relevance) is determined by considering the whole input source (and not only local information), and by taking into account the existence of minor topics or lateral subjects in the information sources to be summarized. The method consists of creating, for each passage of the input source, a support set consisting only of the most semantically related passages. Then, the determination of the most relevant content is achieved by selecting the passages that occur in the largest number of support sets. This model produces extractive summaries that are generic, and language- and domain-independent. Thorough automatic evaluation shows that the method achieves state-of-the-art performance, both in written text, and automatically transcribed speech summarization, including when compared to considerably more complex approaches.\nThe Aviation Safety Reporting System collects voluntarily submitted reports on aviation safety incidents to facilitate research work aiming to reduce such incidents. To effectively reduce these incidents, it is vital to accurately identify why these incidents occurred. More precisely, given a set of possible causes, or shaping factors, this task of cause identification involves identifying all and only those shaping factors that are responsible for the incidents described in a report. We investigate two approaches to cause identification. Both approaches exploit information provided by a semantic lexicon, which is automatically constructed via Thelen and Riloff's Basilisk framework augmented with our linguistic and algorithmic modifications. 
The first approach labels a report using a simple heuristic, which looks for the words and phrases acquired during the semantic lexicon learning process in the report. The second approach recasts cause identification as a text classification problem, employing supervised and transductive text classification algorithms to learn models from incident reports labeled with shaping factors and using the models to label unseen reports. Our experiments show that both the heuristic-based approach and the learning-based approach (when given sufficient training data) outperform the baseline system significantly.\nToday, mobile robots are expected to carry out increasingly complex tasks in multifarious, real-world environments. Often, the tasks require a certain semantic understanding of the workspace. Consider, for example, spoken instructions from a human collaborator referring to objects of interest; the robot must be able to accurately detect these objects to correctly understand the instructions. However, existing object detection, while competent, is not perfect. In particular, the performance of detection algorithms is commonly sensitive to the position of the sensor relative to the objects in the scene. This paper presents an online planning algorithm which learns an explicit model of the spatial dependence of object detection and generates plans which maximize the expected performance of the detection, and by extension the overall plan performance. Crucially, the learned sensor model incorporates spatial correlations between measurements, capturing the fact that successive measurements taken at the same or nearby locations are not independent. We show how this sensor model can be incorporated into an efficient forward search algorithm in the information space of detected objects, allowing the robot to generate motion plans efficiently. 
We investigate the performance of our approach by addressing the tasks of door and text detection in indoor environments and demonstrate significant improvement in detection performance during task execution over alternative methods in simulated and real robot experiments.\nMany relations of scientific interest are nonlinear, and even in linear systems distributions are often non-Gaussian, for example in fMRI BOLD data. A class of search procedures for causal relations in high dimensional data relies on sample derived conditional independence decisions. The most common applications rely on Gaussian tests that can be systematically erroneous in nonlinear non-Gaussian cases. Recent work (Gretton et al. (2009), Tillman et al. (2009), Zhang et al. (2011)) has proposed conditional independence tests using Reproducing Kernel Hilbert Spaces (RKHS). Among these, perhaps the most efficient has been KCI (Kernel Conditional Independence, Zhang et al. (2011)), with computational requirements that grow effectively at least as O(N^3), placing it out of range of large sample size analysis, and restricting its applicability to high dimensional data sets. We propose a class of O(N^2) tests using conditional correlation independence (CCI) that require a few seconds on a standard workstation for tests that require tens of minutes to hours for the KCI method, depending on degree of parallelization, with similar accuracy. For accuracy on difficult nonlinear, non-Gaussian data sets, we also compare a recent test due to Harris & Drton (2012), applicable to nonlinear, non-Gaussian distributions in the Gaussian copula, as well as to partial correlation, a linear Gaussian test.\nLarge bilingual parallel texts (also known as bitexts) are usually stored in a compressed form, and previous work has shown that they can be more efficiently compressed if the fact that the two texts are mutual translations is exploited. 
For example, a bitext can be seen as a sequence of biwords ---pairs of parallel words with a high probability of co-occurrence--- that can be used as an intermediate representation in the compression process. However, the simple biword approach described in the literature can only exploit one-to-one word alignments and cannot tackle the reordering of words. We therefore introduce a generalization of biwords which can describe multi-word expressions and reorderings. We also describe some methods for the binary compression of generalized biword sequences, and compare their performance when different schemes are applied to the extraction of the biword sequence. In addition, we show that this generalization of biwords allows for the implementation of an efficient algorithm to look on the compressed bitext for words or text segments in one of the texts and retrieve their counterpart translations in the other text ---an application usually referred to as translation spotting--- with only some minor modifications in the compression algorithm.\nThe computation of relatedness between two fragments of text in an automated manner requires taking into account a wide range of factors pertaining to the meaning the two fragments convey, and the pairwise relations between their words. Without doubt, a measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words. Such a measure that captures well both aspects of text relatedness may help in many tasks, such as text retrieval, classification and clustering. In this paper we present a new approach for measuring the semantic relatedness between words based on their implicit semantic links. The approach exploits only a word thesaurus in order to devise implicit semantic links between words. 
Based on this approach, we introduce Omiotis, a new measure of semantic relatedness between texts which capitalizes on the word-to-word semantic relatedness measure (SR) and extends it to measure the relatedness between texts. We gradually validate our method: we first evaluate the performance of the semantic relatedness measure between individual words, covering word-to-word similarity and relatedness, synonym identification and word analogy; then, we proceed with evaluating the performance of our method in measuring text-to-text semantic relatedness in two tasks, namely sentence-to-sentence similarity and paraphrase recognition. Experimental evaluation shows that the proposed method outperforms every lexicon-based method of semantic relatedness in the selected tasks and the used data sets, and competes well against corpus-based and hybrid approaches.\nQuality of General Game Playing (GGP) matches suffers from slow state-switching and weak knowledge modules. Instantiation and Propositional Networks offer great performance gains over Prolog-based reasoning, but do not scale well. In this publication mGDL, a variant of GDL stripped of function constants, has been defined as a basis for simple reasoning machines. mGDL allows rules to be easily mapped to C++ functions. 253 out of 270 tested GDL rule sheets conformed to mGDL without any modifications; the rest required minor changes. A revised (m)GDL to C++ translation scheme has been reevaluated; it brought gains ranging from 28% to 7300% over YAP Prolog, managing to compile even demanding rule sheets in under a few seconds. For strengthening game knowledge, spatial features inspired by similar successful techniques from computer Go have been proposed. Because they required a Euclidean metric, a small board extension to GDL has been defined through a set of ground atomic sentences. 
An SGA-based genetic algorithm has been designed for tweaking game parameters and conducting self-plays, so the features could be mined from meaningful game records. The approach has been tested on a small cluster, giving performance gains of up to 20% more wins against the baseline UCT player. Implementations of the proposed ideas constitute the core of GGP Spatium - a small C++/Python GGP framework, created for developing compact GGP Players and problem solvers.\nIn the last decade, a lot of effort has been put into securing software applications during development in the software industry. Software security is a research field in this area which looks at how security can be woven into software at each phase of the software development lifecycle (SDLC). The use of attack patterns is one of the approaches that have been proposed for integrating security during the design phase of SDLC. While this approach helps developers identify security flaws in their software designs, applying the proper security capability to mitigate each identified threat is also very important. To assist in this area, the use of security patterns has been proposed to help developers identify solutions to recurring security problems. However, due to the different types of security patterns and their taxonomy, software developers are faced with the challenge of finding and selecting appropriate security patterns that address the security risks in their design. In this paper, we propose a tool based on a neural network for proposing solutions in the form of security patterns to threats in attack patterns matching attacking patterns. From the performance results of the neural network, we found that the neural network was able to match attack patterns to security patterns that can mitigate the threat in the attack pattern. 
With this information developers are better informed when deciding how to secure their applications.\nAutomatic extraction of temporal relations between event pairs is an important task for several natural language processing applications such as Question Answering, Information Extraction, and Summarization. Since most existing methods are supervised and require large corpora, which for many languages do not exist, we have concentrated our efforts on reducing the need for annotated data as much as possible. This paper presents two different algorithms towards this goal. The first algorithm is a weakly supervised machine learning approach for classification of temporal relations between events. In the first stage, the algorithm learns a general classifier from an annotated corpus. Then, inspired by the hypothesis of \"one type of temporal relation per discourse\", it extracts useful information from a cluster of topically related documents. We show that by combining the global information of such a cluster with local decisions of a general classifier, a bootstrapping cross-document classifier can be built to extract temporal relations between events. Our experiments show that without any additional annotated data, the accuracy of the proposed algorithm is higher than that of several previous successful systems. The second proposed method for temporal relation extraction is based on the expectation maximization (EM) algorithm. Within EM, we used different techniques such as a greedy best-first search and integer linear programming for temporal inconsistency removal. We think that the experimental results of our EM based algorithm, as a first step toward a fully unsupervised temporal relation extraction method, are encouraging.\nWe give the analysis of the computational complexity of coalition structure generation over graphs. 
Given an undirected graph G = (N,E) and a valuation function v : P(N) \\to R over the subsets of nodes, the problem is to find a partition of N into connected subsets that maximises the sum of the components' values. This problem is generally NP-complete; in particular, it is hard for a defined class of valuation functions which are independent of disconnected members - that is, two nodes have no effect on each other's marginal contribution to their vertex separator. Nonetheless, for all such functions we provide bounds on the complexity of coalition structure generation over general and minor-free graphs. Our proof is constructive and yields algorithms for solving corresponding instances of the problem. Furthermore, we derive linear time bounds for graphs of bounded treewidth. However, as we show, the problem remains NP-complete for planar graphs, and hence, for any K_k-minor-free graphs where k \\geq 5. Moreover, a 3-SAT problem with m clauses can be represented by a coalition structure generation problem over a planar graph with O(m^2) nodes. Importantly, our hardness result holds for a particular subclass of valuation functions, termed edge sum, where the value of each subset of nodes is simply determined by the sum of given weights of the edges in the induced subgraph.\nTo tackle the vocabulary problem in conversational systems, previous work has applied unsupervised learning approaches on co-occurring speech and eye gaze during interaction to automatically acquire new words. Although these approaches have shown promise, several issues related to human language behavior and human-machine conversation have not been addressed. First, psycholinguistic studies have shown certain temporal regularities between human eye movement and language production. While these regularities can potentially guide the acquisition process, they have not been incorporated in the previous unsupervised approaches. 
Second, conversational systems generally have an existing knowledge base about the domain and vocabulary. While the existing knowledge can potentially help bootstrap and constrain the acquired new words, it has not been incorporated in the previous models. Third, eye gaze could serve different functions in human-machine conversation. Some gaze streams may not be closely coupled with speech stream, and thus are potentially detrimental to word acquisition. Automated recognition of closely-coupled speech-gaze streams based on conversation context is important. To address these issues, we developed new approaches that incorporate user language behavior, domain knowledge, and conversation context in word acquisition. We evaluated these approaches in the context of situated dialogue in a virtual world. Our experimental results have shown that incorporating the above three types of contextual information significantly improves word acquisition performance.\nWe propose a novel language-independent approach for improving machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X_1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X_1-Y and a larger bi-text for X_2-Y for some resource-rich language X_2 that is closely related to X_1. This is achieved by taking advantage of the opportunities that vocabulary overlap and similarities between the languages X_1 and X_2 in spelling, word order, and syntax offer: (1) we improve the word alignments for the resource-poor language, (2) we further augment it with additional translation options, and (3) we take care of potential spelling differences through appropriate transliteration. 
The evaluation for Indonesian->English using Malay and for Spanish->English using Portuguese and pretending Spanish is resource-poor shows an absolute gain of up to 1.35 and 3.37 BLEU points, respectively, which is an improvement over the best rivaling approaches, while using much less additional data. Overall, our method cuts the amount of necessary \"real\" training data by a factor of 2--5.\nThe thyroid, an endocrine gland that secretes hormones into the blood, circulates its products to all tissues of the body, where they control vital functions in every cell. Normal levels of thyroid hormone help the brain, heart, intestines, muscles and reproductive system function normally. Thyroid hormones control the metabolism of the body. Abnormalities of thyroid function are usually related to production of too little thyroid hormone (hypothyroidism) or production of too much thyroid hormone (hyperthyroidism). Therefore, the correct diagnosis of these diseases is a very important topic. In this study, Linguistic Hedges Neural-Fuzzy Classifier with Selected Features (LHNFCSF) is presented for diagnosis of thyroid diseases. The performance evaluation of this system is estimated by using classification accuracy and k-fold cross-validation. The results indicated that the classification accuracy without feature selection was 98.6047% and 97.6744% during training and testing phases, respectively, with RMSE of 0.02335. After applying the feature selection algorithm, LHNFCSF achieved 100% for all cluster sizes during the training phase. However, in the testing phase LHNFCSF achieved 88.3721% using one cluster for each class, 90.6977% using two clusters, 91.8605% using three clusters and 97.6744% using four clusters for each class and 12 fuzzy rules. The obtained classification accuracy was very promising with regard to the other classification applications in literature for this problem.\nBiological organisms are composed of numerous interconnected biochemical processes. 
Diseases occur when the normal functioning of these processes is disrupted. Thus, understanding these biochemical processes and their interrelationships is a primary task in biomedical research and a prerequisite for diagnosing diseases and developing drugs. Scientists studying these processes have identified various pathways responsible for drug metabolism, signal transduction, etc.   Newer techniques and speed improvements have resulted in deeper knowledge about these pathways, yielding refined models that tend to be large and complex, making it difficult for a person to remember all of their aspects. Thus, computer models are needed to analyze them. We want to build a system that allows modeling of biological systems and pathways in such a way that we can answer questions about them.   Many existing models focus on structural and/or factoid questions, using surface-level knowledge that does not require understanding the underlying model. We believe these are not the kinds of questions that a biologist may ask someone to test their understanding of the biological processes. We want our system to answer the kinds of questions a biologist may ask. Such questions appear in early college-level textbooks.   Thus, the main goal of our thesis is to develop a system that allows us to encode knowledge about biological pathways and answer such questions about them, demonstrating understanding of the pathway. To that end, we develop a language that will allow posing such questions and illustrate the utility of our framework with various applications in the biological domain. We use some existing tools with modifications to accomplish our goal.   
Finally, we apply our system to real-world applications by extracting pathway knowledge from text and answering questions related to drug development.\nAlignment-free sequence analysis approaches provide important alternatives to multiple sequence alignment (MSA) in biological sequence analysis because they have low computational complexity and do not depend on a high level of sequence identity. However, most existing alignment-free methods do not use the full information content of sequences and thus cannot accurately reveal similarities and differences among DNA sequences. We present a novel alignment-free computational method for sequence analysis based on the Ramanujan-Fourier transform (RFT), in which the complete information of DNA sequences is retained. We represent DNA sequences as four binary indicator sequences and apply the RFT to the indicator sequences to convert them into the frequency domain. The Euclidean distance between the complete RFT coefficients of DNA sequences is used as the similarity measure. To handle the different lengths of the RFT coefficient vectors in Euclidean space, we pad the shorter DNA binary sequences with zeros so that all binary sequences match the longest length in the compared sequence data. Thus, the DNA sequences are compared in a frequency space of the same dimension without information loss. We demonstrate the usefulness of the proposed method by presenting experimental results on hierarchical clustering of genes and genomes. The proposed method opens a new channel for biological sequence analysis, classification, and structural module identification.\nOptimizing an interactive system against a predefined online metric is particularly challenging when the metric is computed from user feedback such as clicks and payments. 
The key challenge is the counterfactual nature: in the case of Web search, any change to a component of the search engine may result in a different search result page for the same query, but we normally cannot reliably infer from search logs how users would react to the new result page. Consequently, it appears impossible to accurately estimate online metrics that depend on user feedback, unless the new engine is run to serve users and compared with a baseline in an A/B test. This approach, while valid and successful, is unfortunately expensive and time-consuming. In this paper, we propose to address this problem using causal inference techniques, under the contextual-bandit framework. This approach effectively allows one to run (potentially infinitely) many A/B tests offline from search logs, making it possible to estimate and optimize online metrics quickly and inexpensively. Focusing on an important component in a commercial search engine, we show how these ideas can be instantiated and applied, and obtain very promising results that suggest the wide applicability of these techniques.\nScientific practice typically involves repeatedly studying a system, each time trying to unravel a different perspective. In each study, the scientist may take measurements under different experimental conditions (interventions, manipulations, perturbations) and measure different sets of quantities (variables). The result is a collection of heterogeneous data sets coming from different data distributions. In this work, we present the algorithm COmbINE, which accepts a collection of data sets over overlapping variable sets under different experimental conditions; COmbINE then outputs a summary of all causal models indicating the invariant and variant structural characteristics of all models that simultaneously fit all of the input data sets. 
COmbINE converts estimated dependencies and independencies in the data into path constraints on the data-generating causal model and encodes them as a SAT instance. The algorithm is sound and complete in the sample limit. To account for conflicting constraints arising from statistical errors, we introduce a general method for sorting constraints in order of confidence, computed as a function of their corresponding p-values. In our empirical evaluation, COmbINE outperforms, in terms of efficiency, the only pre-existing similar algorithm; the latter additionally admits feedback cycles but does not admit conflicting constraints, which hinders its applicability to real data. As a proof of concept, COmbINE is employed to co-analyze 4 real mass-cytometry data sets measuring phosphorylated protein concentrations of overlapping protein sets under 3 different interventions.\nHigh accuracy in cancer prediction is important to improve the quality of treatment and the survival rate of patients. As data volume increases rapidly in healthcare research, the analytical challenge doubles. The use of an effective sampling technique in classification algorithms always yields good prediction accuracy. The SEER public-use cancer database provides various prominent class labels for prognosis prediction. The main objective of this paper is to find the effect of sampling techniques on classifying the prognosis variable and to propose an ideal sampling method based on the outcome of the experiments. In the first phase of this work, the traditional random sampling and stratified sampling techniques have been used. At the next level, balanced stratified sampling with variations according to the choice of prognosis class labels has been tested. Much of the initial time has been spent on pre-processing the SEER data set. 
The classification model for experimentation has been built using the breast cancer, respiratory cancer and mixed cancer data sets with three traditional classifiers, namely Decision Tree, Naive Bayes and K-Nearest Neighbor. The three prognosis factors, survival, stage and metastasis, have been used as class labels for experimental comparisons. The results show a steady increase in the prediction accuracy of the balanced stratified model as the sample size increases, whereas the traditional approach fluctuates before reaching the optimum results.\nProbabilistic Logic Programming (PLP) languages enable programmers to specify systems that combine logical models with statistical knowledge. The inference problem, to determine the probability of query answers in PLP, is intractable in general, thereby motivating the need for approximate techniques. In this paper, we present a technique for approximate inference of conditional probabilities for PLP queries. It is an Adaptive Markov Chain Monte Carlo (MCMC) technique, where the distribution from which samples are drawn is modified as the Markov Chain is explored. In particular, the distribution is progressively modified to increase the likelihood that a generated sample is consistent with evidence. In our context, each sample is uniquely characterized by the outcomes of a set of random variables. Inspired by reinforcement learning, our technique propagates rewards to random variable/outcome pairs used in a sample based on whether the sample was consistent or not. The cumulative rewards of each outcome are used to derive a new \"adapted distribution\" for each random variable. For a sequence of samples, the distributions are progressively adapted after each sample. For a query with \"Markovian evaluation structure\", we show that the adapted distribution of samples converges to the query's conditional probability distribution. 
For Markovian queries, we present a modified adaptation process that can be used in adaptive MCMC as well as adaptive independent sampling. We empirically evaluate the effectiveness of the adaptive sampling methods for queries with and without Markovian evaluation structure.\nIn this paper we introduce a Bayesian framework for solving a class of problems termed Multi-agent Inverse Reinforcement Learning (MIRL). Compared to the well-known Inverse Reinforcement Learning (IRL) problem, MIRL is formalized in the context of a stochastic game rather than a Markov decision process (MDP). Games bring two primary challenges: First, the concept of optimality, central to MDPs, loses its meaning and must be replaced with a more general solution concept, such as the Nash equilibrium. Second, the non-uniqueness of equilibria means that in MIRL, in addition to multiple reasonable solutions for a given inversion model, there may be multiple inversion models that are all equally sensible approaches to solving the problem. We establish a theoretical foundation for competitive two-agent MIRL problems and propose a Bayesian optimization algorithm to solve the problem. We focus on the case of two-person zero-sum stochastic games, developing a generative model for the likelihood of unknown rewards of agents given observed game play assuming that the two agents follow a minimax bipolicy. As a numerical illustration, we apply our method in the context of an abstract soccer game. For the soccer game, we investigate relationships between the extent of prior information and the quality of learned rewards. Results suggest that covariance structure is more important than mean value in reward priors.\nWe describe Venture, an interactive virtual machine for probabilistic programming that aims to be sufficiently expressive, extensible, and efficient for general-purpose use. 
Like Church, probabilistic models and inference problems in Venture are specified via a Turing-complete, higher-order probabilistic language descended from Lisp. Unlike Church, Venture also provides a compositional language for custom inference strategies built out of scalable exact and approximate techniques. We also describe four key aspects of Venture's implementation that build on ideas from probabilistic graphical models. First, we describe the stochastic procedure interface (SPI) that specifies and encapsulates primitive random variables. The SPI supports custom control flow, higher-order probabilistic procedures, partially exchangeable sequences and \"likelihood-free\" stochastic simulators. It also supports external models that do inference over latent variables hidden from Venture. Second, we describe probabilistic execution traces (PETs), which represent execution histories of Venture programs. PETs capture conditional dependencies, existential dependencies and exchangeable coupling. Third, we describe partitions of execution histories called scaffolds that factor global inference problems into coherent sub-problems. Finally, we describe a family of stochastic regeneration algorithms for efficiently modifying PET fragments contained within scaffolds. Stochastic regeneration achieves linear runtime scaling in cases where many previous approaches scaled quadratically. We show how to use stochastic regeneration and the SPI to implement general-purpose inference strategies such as Metropolis-Hastings, Gibbs sampling, and blocked proposals based on particle Markov chain Monte Carlo and mean-field variational inference techniques.\nThis paper introduces a new paradigm for minimax game-tree search algorithms. MT is a memory-enhanced version of Pearl's Test procedure. By changing the way MT is called, a number of best-first game-tree search algorithms can be simply and elegantly constructed (including SSS*). 
Most of the assessments of minimax search algorithms have been based on simulations. However, these simulations generally do not address two of the key ingredients of high-performance game-playing programs: iterative deepening and memory usage. This paper presents experimental data from three game-playing programs (checkers, Othello and chess), covering the range from low to high branching factor. The improved move ordering due to iterative deepening and memory usage leads to results significantly different from those portrayed in the literature. Whereas some simulations show Alpha-Beta expanding almost 100% more leaf nodes than other algorithms [12], our results showed variations of less than 20%. One new instance of our framework (MTD-f) outperforms our best alpha-beta searcher (aspiration NegaScout) on leaf nodes, total nodes and execution time. To our knowledge, these are the first reported results that compare both depth-first and best-first algorithms given the same amount of memory.\nIn 1979 Stockman introduced the SSS* minimax search algorithm that dominates Alpha-Beta in the number of leaf nodes expanded. Further investigation of the algorithm showed that it had three serious drawbacks, which prevented its use by practitioners: it is difficult to understand, it has large memory requirements, and it is slow. This paper presents an alternate formulation of SSS*, in which it is implemented as a series of Alpha-Beta calls that use a transposition table (AB-SSS*). The reformulation solves all three perceived drawbacks of SSS*, making it a practical algorithm. Further, because the search is now based on Alpha-Beta, the extensive research on minimax search enhancements can be easily integrated into AB-SSS*. To test AB-SSS* in practise, it has been implemented in three state-of-the-art programs: for checkers, Othello and chess. 
AB-SSS* is comparable in performance to Alpha-Beta on leaf node count in all three games, making it a viable alternative to Alpha-Beta in practise. Whereas SSS* has usually been regarded as being entirely different from Alpha-Beta, it turns out to be just an Alpha-Beta enhancement, like null-window searching. This runs counter to published simulation results. Our research leads to the surprising result that iterative deepening versions of Alpha-Beta can expand fewer leaf nodes than iterative deepening versions of SSS* due to dynamic move re-ordering.\nKnuth and Moore presented a theoretical lower bound on the number of leaves that any fixed-depth minimax tree-search algorithm traversing a uniform tree must explore, the so-called minimal tree. Since real-life minimax trees are not uniform, the exact size of this tree is not known for most applications. Further, most games have transpositions, implying that there exists a minimal graph which is smaller than the minimal tree. For three games (chess, Othello and checkers) we compute the size of the minimal tree and the minimal graph. Empirical evidence shows that in all three games, enhanced Alpha-Beta search is capable of building a tree that is close in size to that of the minimal graph. Hence, it appears game-playing programs build nearly optimal search trees. However, the conventional definition of the minimal graph is wrong. There are ways in which the size of the minimal graph can be reduced: by maximizing the number of transpositions in the search, and generating cutoffs using branches that lead to smaller search trees. The conventional definition of the minimal graph is just a left-most approximation. Calculating the size of the real minimal graph is too computationally intensive. However, upper bound approximations show it to be significantly smaller than the left-most minimal graph. Hence, it appears that game-playing programs are not searching as efficiently as is widely believed. 
Understanding the left-most and real minimal search graphs leads to some new ideas for enhancing Alpha-Beta search. One of them, enhanced transposition cutoffs, is shown to significantly reduce search tree size.\nWe present a polyphonic MIDI score-following algorithm capable of following performances with arbitrary repeats and skips, based on a probabilistic model of musical performances. It is attractive in practical applications of score following to handle repeats and skips which may be made arbitrarily during performances, but the algorithms previously described in the literature cannot be applied to scores of practical length due to problems with large computational complexity. We propose a new type of hidden Markov model (HMM) as a performance model which can describe arbitrary repeats and skips including performer tendencies on distributed score positions before and after them, and derive an efficient score-following algorithm that reduces computational complexity without pruning. A theoretical discussion on how much such information on performer tendencies improves the score-following results is given. The proposed score-following algorithm also admits performance mistakes and is demonstrated to be effective in practical situations by carrying out evaluations with human performances. The proposed HMM is potentially valuable for other topics in information processing and we also provide a detailed description of inference algorithms.\nOne important challenge for probabilistic logics is reasoning with very large knowledge bases (KBs) of imperfect information, such as those produced by modern web-scale information extraction systems. One scalability problem shared by many probabilistic logics is that answering queries involves \"grounding\" the query---i.e., mapping it to a propositional representation---and the size of a \"grounding\" grows with database size. 
To address this bottleneck, we present a first-order probabilistic language called ProPPR in which approximate \"local groundings\" can be constructed in time independent of database size. Technically, ProPPR is an extension to stochastic logic programs (SLPs) that is biased towards short derivations; it is also closely related to an earlier relational learning algorithm called the path ranking algorithm (PRA). We show that the problem of constructing proofs for this logic is related to the computation of personalized PageRank (PPR) on a linearized version of the proof space, and using this connection, we develop a provably correct approximate grounding scheme, based on the PageRank-Nibble algorithm. Building on this, we develop a fast and easily parallelized weight-learning algorithm for ProPPR. In experiments, we show that learning for ProPPR is orders of magnitude faster than learning for Markov logic networks; that allowing mutual recursion (joint learning) in KB inference leads to improvements in performance; and that ProPPR can learn weights for a mutually recursive program with hundreds of clauses, which define scores of interrelated predicates, over a KB containing one million entities.\nSocial status, defined as the relative rank or position that an individual holds in a social hierarchy, is known to be among the most important motivating forces in social behaviors. In this paper, we consider the notion of status from the perspective of a position or title held by a person in an enterprise. We study the intersection of social status and social networks in an enterprise. We study whether enterprise communication logs can help reveal how social interactions and individual status manifest themselves in social networks. To that end, we use two enterprise datasets with three communication channels --- voice call, short message, and email --- to demonstrate the social-behavioral differences among individuals with different status. 
We have several interesting findings, and based on these findings we also develop a model to predict social status. On the individual level, high-status individuals are more likely to span structural holes by linking to people in parts of the enterprise networks that are otherwise not well connected to one another. On the community level, the principle of homophily, social balance and clique theory generally indicate a \"rich club\" maintained by high-status individuals, in the sense that this community is much more connected, balanced and dense. Our model can predict the social status of individuals with 93% accuracy.\nFace verification remains a challenging problem in very complex conditions with large variations such as pose, illumination, expression, and occlusions. This problem is exacerbated when we rely unrealistically on a single training data source, which is often insufficient to cover the intrinsically complex face variations. This paper proposes a principled multi-task learning approach based on the Discriminative Gaussian Process Latent Variable Model, named GaussianFace, to enrich the diversity of training data. In comparison to existing methods, our model exploits additional data from multiple source domains to improve the generalization performance of face verification in an unknown target domain. Importantly, our model can adapt automatically to complex data distributions, and therefore can capture well the complex face variations inherent in multiple sources. Extensive experiments demonstrate the effectiveness of the proposed model in learning from diverse data sources and generalizing to unseen domains. Specifically, our algorithm achieves an impressive accuracy rate of 98.52% on the well-known and challenging Labeled Faces in the Wild (LFW) benchmark. For the first time, human-level performance in face verification (97.53%) on LFW is surpassed.\nScoring systems are an extremely important class of election systems. 
A length-$m$ (so-called) scoring vector applies only to $m$-candidate elections. To handle general elections, one must use a family of vectors, one per length. The most elegant approach to making sure such families are \"family-like\" is the recently introduced notion of (polynomial-time uniform) pure scoring rules [Betzler and Dorn 2010], where each scoring vector is obtained from its precursor by adding one new coefficient. We obtain the first dichotomy theorem for pure scoring rules for a control problem. In particular, for constructive control by adding voters (CCAV), we show that CCAV is solvable in polynomial time for $k$-approval with $k \\leq 3$, $k$-veto with $k \\leq 2$, every pure scoring rule in which only the two top-rated candidates gain nonzero scores, and a particular rule that is a \"hybrid\" of 1-approval and 1-veto. For all other pure scoring rules, CCAV is NP-complete. We also investigate the descriptive richness of different models for defining pure scoring rules, proving how more rule-generation time gives more rules, proving that rationals give more rules than do the natural numbers, and proving that some restrictions previously thought to be \"w.l.o.g.\" in fact do lose generality.\nWe study techniques to incentivize self-interested agents to form socially desirable solutions in scenarios where they benefit from mutual coordination. Towards this end, we consider coordination games where agents have different intrinsic preferences but they stand to gain if others choose the same strategy as them. For non-trivial versions of our game, stable solutions like Nash Equilibrium may not exist, or may be socially inefficient even when they do exist. This motivates us to focus on designing efficient algorithms to compute (almost) stable solutions like Approximate Equilibrium that can be realized if agents are provided some additional incentives. 
Our results apply in many settings, such as the adoption of new products, project selection, and group formation, where a central authority can direct agents towards a strategy but agents may defect if they have better alternatives. We show that for any given instance, we can either compute a high-quality approximate equilibrium or a near-optimal solution that can be stabilized by providing small payments to some players. We then generalize our model to encompass situations where player relationships may exhibit complementarities and present an algorithm to compute an Approximate Equilibrium whose stability factor is linear in the degree of complementarity. Our results imply that a little influence is necessary in order to ensure that selfish players coordinate and form socially efficient solutions.\nResearch on multi-agent planning has been popular in recent years. While previous research has been motivated by the understanding that, through cooperation, multi-agent systems can achieve tasks that are unachievable by single-agent systems, there are no formal characterizations of situations where cooperation is required to achieve a goal, thus warranting the application of multi-agent systems. In this paper, we provide such a formal discussion from the planning perspective. We first show that determining whether there is required cooperation (RC) is intractable in general. Then, by dividing the problems that require cooperation (referred to as RC problems) into two classes, problems with heterogeneous agents and problems with homogeneous agents, we aim to identify all the conditions that can cause RC in these two classes. We establish that when none of these identified conditions hold, the problem is single-agent solvable. Furthermore, with a few assumptions, we provide an upper bound on the minimum number of agents required for RC problems with homogeneous agents. This study not only provides new insights into multi-agent planning, but also has many applications. 
For example, in human-robot teaming, when a robot cannot achieve a task, it may be due to RC. In such cases, the human teammate should be informed and, consequently, coordinate with other available robots for a solution.\nA {\\it dynamic reasoning system} (DRS) is an adaptation of a conventional formal logical system that explicitly portrays reasoning as a temporal activity, with each extralogical input to the system and each inference rule application being viewed as occurring at a distinct time step. Every DRS incorporates some well-defined logic together with a controller that serves to guide the reasoning process in response to user inputs. Logics are generic, whereas controllers are application-specific. Every controller does, nonetheless, provide an algorithm for nonmonotonic belief revision. The general notion of a DRS comprises a framework within which one can formulate the logic and algorithms for a given application and prove that the algorithms are correct, i.e., that they serve to (i) derive all salient information and (ii) preserve the consistency of the belief set. This paper illustrates the idea with ordinary first-order predicate calculus, suitably modified for the present purpose, and an example. The example revisits some classic nonmonotonic reasoning puzzles (Opus the Penguin, Nixon Diamond) and shows how these can be resolved in the context of a DRS, using an expanded version of first-order logic that incorporates typed predicate symbols. All concepts are rigorously defined and effectively computable, thereby providing the foundation for a future software implementation.\nRuntime monitoring is one of the central tasks to provide operational decision support to running business processes, and check on-the-fly whether they comply with constraints and rules. We study runtime monitoring of properties expressed in LTL on finite traces (LTLf) and in its extension LDLf. 
LDLf is a powerful logic that captures all monadic second-order logic on finite traces, which is obtained by combining regular expressions and LTLf, adopting the syntax of propositional dynamic logic (PDL). Interestingly, in spite of its greater expressivity, LDLf has exactly the same computational complexity as LTLf. We show that LDLf is able to capture, in the logic itself, not only the constraints to be monitored, but also the de facto standard RV-LTL monitors. This makes it possible to declaratively capture monitoring metaconstraints, and check them by relying on the usual logical services instead of ad-hoc algorithms. This, in turn, makes it possible to flexibly monitor constraints depending on the monitoring state of other constraints, e.g., \"compensation\" constraints that are only checked when others are detected to be violated. In addition, we devise a direct translation of LDLf formulas into nondeterministic automata, avoiding a detour through Buechi automata or alternating automata, and we use it to implement a monitoring plug-in for the PROM suite.\nOutlier detection in a large-scale database is a significant and complex issue in the knowledge discovery field. As the data distributions are obscure and uncertain in high-dimensional space, most existing solutions try to solve the issue by taking into account two intuitive points: first, outliers are extremely far away from other points in high-dimensional space; second, outliers appear clearly different in projected-dimensional subspaces. However, for the complicated case in which outliers are hidden inside the normal points in all dimensions, existing detection methods fail to find such inner outliers. In this paper, we propose a method with two rounds of dimension projection, which integrates primary subspace outlier detection and secondary point-projection between subspaces, and sums up the multiple weight values for each point. The local density ratio of each point is computed separately in the twice-projected dimensions. 
After this process, outliers are the points with the largest weight values. The proposed method succeeds in finding all inner outliers in the synthetic test datasets with dimensionality varying from 100 to 10000. The experimental results also show that the proposed algorithm can work in low-dimensional space and can achieve perfect performance in high-dimensional space. For this reason, our proposed approach has considerable potential for multimedia applications, helping to process images or video with large-scale attributes.\nAnswer Set Programming (ASP) is logic programming under the stable model or answer set semantics. During the last decade, this paradigm has seen several extensions by generalizing the notion of atom used in these programs. Among these, there are aggregate atoms, HEX atoms, generalized quantifiers, and abstract constraints. In this paper we refer to these constructs collectively as generalized atoms. The idea common to all of these constructs is that their satisfaction depends on the truth values of a set of (non-generalized) atoms, rather than the truth value of a single (non-generalized) atom. Motivated by several examples, we argue that for some of the more intricate generalized atoms, the previously suggested semantics provide unintuitive results, and we propose an alternative semantics, which we call supportedly stable or SFLP answer sets. We show that it is equivalent to the major previously proposed semantics for programs with convex generalized atoms, and that it in general admits more intended models than other semantics in the presence of non-convex generalized atoms. We show that the complexity of supportedly stable models is on the second level of the polynomial hierarchy, similar to previous proposals and to stable models of disjunctive logic programs. 
Given these complexity results, we provide a compilation method that compactly transforms programs with generalized atoms in disjunctive normal form to programs without generalized atoms. Variants are given for the new supportedly stable and the existing FLP semantics, for which a similar compilation technique has not been known so far.\nIn this paper, a mathematical theory of learning is proposed that has many parallels with information theory. We consider Vapnik's General Setting of Learning in which the learning process is defined to be the act of selecting a hypothesis in response to a given training set. Such a hypothesis can, for example, be a decision boundary in classification, a set of centroids in clustering, or a set of frequent item-sets in association rule mining. Depending on the hypothesis space and how the final hypothesis is selected, we show that a learning process can be assigned a numeric score, called learning capacity, which is analogous to Shannon's channel capacity and also satisfies similar interesting properties, such as the data-processing inequality and the information-cannot-hurt inequality. In addition, learning capacity provides the tightest possible bound on the difference between true risk and empirical risk of the learning process for all loss functions that are parametrized by the chosen hypothesis. It is also shown that the notion of learning capacity equivalently quantifies how sensitive the choice of the final hypothesis is to a small perturbation in the training set. Consequently, algorithmic stability is both necessary and sufficient for generalization. 
While the theory does not rely on concentration inequalities, we finally show that analogs to classical results in learning theory using the Probably Approximately Correct (PAC) model can be immediately deduced using this theory, and conclude with information-theoretic bounds on learning capacity.\nThe maximum mean discrepancy (MMD) is a recently proposed test statistic for the two-sample test. Its quadratic time complexity, however, greatly hampers its use in large-scale applications. To accelerate the MMD calculation, in this study we propose an efficient method called FastMMD. The core idea of FastMMD is to equivalently transform the MMD with shift-invariant kernels into the amplitude expectation of a linear combination of sinusoid components based on Bochner's theorem and Fourier transform (Rahimi & Recht, 2007). By sampling the Fourier transform, FastMMD decreases the time complexity for MMD calculation from $O(N^2 d)$ to $O(L N d)$, where $N$ and $d$ are the size and dimension of the sample set, respectively. Here $L$ is the number of basis functions for approximating kernels, which determines the approximation accuracy. For kernels that are spherically invariant, the computation can be further accelerated to $O(L N \\log d)$ by using the Fastfood technique (Le et al., 2013). The uniform convergence of our method has also been theoretically proved for both unbiased and biased estimates. We have further provided a geometric explanation for our method, namely an ensemble of circular discrepancy, which helps us understand the insight behind MMD and may inspire more extensive metrics for two-sample testing. 
Experimental results substantiate that FastMMD achieves accuracy similar to exact MMD, with faster computation and lower variance than existing MMD approximation methods.\nLifted inference has been proposed for various probabilistic logical frameworks in order to compute the probability of queries in a time that depends on the size of the domains of the random variables rather than the number of instances. Although various authors have underlined its importance for probabilistic logic programming (PLP), lifted inference has so far been applied only to relational languages outside of logic programming. In this paper we adapt Generalized Counting First Order Variable Elimination (GC-FOVE) to the problem of computing the probability of queries to probabilistic logic programs under the distribution semantics. In particular, we extend the Prolog Factor Language (PFL) to include two new types of factors that are needed for representing ProbLog programs. These factors take into account the existing causal independence relationships among random variables and are managed by the extension to variable elimination proposed by Zhang and Poole for dealing with convergent variables and heterogeneous factors. Two new operators are added to GC-FOVE for treating heterogeneous factors. The resulting algorithm, called LP$^2$ for Lifted Probabilistic Logic Programming, has been implemented by modifying the PFL implementation of GC-FOVE and tested on three benchmarks for lifted inference. A comparison with PITA and ProbLog2 shows the potential of the approach.\nIn the event that a bacteriological or chemical toxin is introduced to a water distribution network, a large population of consumers may become exposed to the contaminant. A contamination event may be a poorly predictable dynamic process due to the interactions of consumers and utility managers during an event. 
Consumers that become aware of a threat may select protective actions that change their water demands from typical demand patterns, and new hydraulic conditions can arise that differ from conditions that are predicted when demands are considered as exogenous inputs. Consequently, the movement of the contaminant plume in the pipe network may shift from its expected trajectory. A sociotechnical model is developed here to integrate agent-based models of consumers with an engineering water distribution system model and capture the dynamics between consumer behaviors and the water distribution system for predicting contaminant transport and public exposure. Consumers are simulated as agents with behaviors defined for water use activities, mobility, word-of-mouth communication, and demand reduction, based on a set of rules representing an agent's autonomy and reaction to health impacts, the environment, and the actions of other agents. As consumers decrease their water use, the demand exerted on the water distribution system is updated; as the flow directions and volumes shift in response, the location of the contaminant plume is updated and the amount of contaminant consumed by each agent is calculated. The framework is tested by simulating realistic contamination scenarios for a virtual city and water distribution system.\nAnswer Set Programming (ASP) is a powerful modelling formalism that is very efficient in solving combinatorial problems. ASP solvers implement the stable model semantics that eliminates circular derivations between Boolean variables from the solutions of a logic program. Due to this, ASP solvers are better suited than propositional satisfiability (SAT) and Constraint Programming (CP) solvers to solve a certain class of problems whose specification includes inductive definitions such as reachability in a graph. On the other hand, ASP solvers suffer from the grounding bottleneck that occurs due to their inability to model finite domain variables. 
Furthermore, the existing stable model semantics are not sufficient to disallow circular reasoning on the bounds of numeric variables. An example where this is required is modelling shortest paths between nodes in a graph. Just as reachability can be encoded as an inductive definition with one or more base cases and recursive rules, shortest paths between nodes can also be modelled with similar base cases and recursive rules for their upper bounds. This deficiency of stable model semantics introduces another type of grounding bottleneck in ASP systems that cannot be removed by naively merging ASP with CP solvers, but requires a theoretical extension of the semantics from Booleans and normal rules to bounds over numeric variables and more general rules. In this work, we propose Bound Founded Answer Set Programming (BFASP), which resolves this issue and, consequently, removes all types of grounding bottleneck inherent in ASP systems.\nBayesian model averaging (BMA) is the state-of-the-art approach for overcoming model uncertainty. Yet, especially on small data sets, the results yielded by BMA might be sensitive to the prior over the models. Credal Model Averaging (CMA) addresses this problem by replacing the single prior over the models with a set of priors (credal set). Such an approach solves the problem of how to choose the prior over the models and automates sensitivity analysis. We discuss various CMA algorithms for building an ensemble of logistic regressors characterized by different sets of covariates. We show how CMA can be appropriately tuned to the case in which one is prior-ignorant and to the case in which instead domain knowledge is available. CMA detects prior-dependent instances, namely instances in which a different class is more probable depending on the prior over the models. On such instances CMA suspends judgment, returning multiple classes. 
We thoroughly compare different BMA and CMA variants on a real case study, predicting the presence of Alpine marmot burrows in an Alpine valley. We find that BMA is almost a random guesser on the instances recognized as prior-dependent by CMA.\nAnswer Set Programming (ASP) is a well-established paradigm of declarative programming that has been developed in the field of logic programming and nonmonotonic reasoning. Advances in ASP solving technology are customarily assessed in competition events, as is the case for other closely-related problem-solving technologies like SAT/SMT, QBF, Planning and Scheduling. ASP Competitions are (usually) biennial events; however, the Fifth ASP Competition departs from tradition, in order to join the FLoC Olympic Games at the Vienna Summer of Logic 2014, which is expected to be the largest event in the history of logic. This edition of the ASP Competition series is jointly organized by the University of Calabria (Italy), Aalto University (Finland), and the University of Genova (Italy), and is affiliated with the 30th International Conference on Logic Programming (ICLP 2014). It features a completely re-designed setup, with novelties involving the design of tracks, the scoring schema, and the adherence to a fixed modeling language in order to push the adoption of the ASP-Core-2 standard. Benchmark domains are taken from past editions, and the best system packages submitted in 2013 are compared with new versions and solvers. To appear in Theory and Practice of Logic Programming (TPLP).\nThe belief bias effect is a phenomenon which occurs when we think that we judge an argument based on our reasoning, but are actually influenced by our beliefs and prior knowledge. Evans, Barston and Pollard carried out a psychological syllogistic reasoning task to prove this effect. Participants were asked whether they would accept or reject a given syllogism. 
We discuss one specific case which is commonly assumed to be believable but which is actually not logically valid. By introducing abnormalities, abduction and background knowledge, we adequately model this case under the weak completion semantics. Our formalization reveals new questions about possible extensions in abductive reasoning. For instance, observations and their explanations might include some relevant prior abductive contextual information concerning some side-effect or leading to a contestable or refutable side-effect. A weaker notion indicates the support of some relevant consequences by a prior abductive context. Yet another definition describes jointly supported relevant consequences, which captures the idea of two observations containing mutually supportive side-effects. Though motivated with and exemplified by the running psychology application, the various new general abductive context definitions are introduced here and given a declarative semantics for the first time, and have a much wider scope of application. Inspection points, a concept introduced by Pereira and Pinto, allow us to express these definitions syntactically and intertwine them into an operational semantics.\nThis work deals with the problem of combining reactive features, such as the ability to respond to events and define complex events, with the execution of transactions over general Knowledge Bases (KBs). With this goal, we build on Transaction Logic (TR), a logic precisely designed to model and execute transactions in KBs defined by arbitrary logic theories. In it, transactions are written in a logic-programming style, by combining primitive update operations over a general KB, with the usual logic programming connectives and some additional connectives, e.g., to express sequences of actions. 
While TR is a natural choice to deal with transactions, the question remains whether TR can be used not only to express complex events, but also to deal simultaneously with the detection of complex events and the execution of transactions. In this paper we show that the former is possible while the latter is not. For that, we start by illustrating how TR can express complex events, and in particular, how SNOOP event expressions can be translated into the logic. Afterwards, we show why TR fails to deal with the two issues together, and, to solve the intended problem, propose Transaction Logic with Events, its syntax, model theory and executional semantics. The achieved solution is a non-monotonic extension of TR, which guarantees that every complex event detected in a transaction is necessarily responded to.\nThe stable model (SM) semantics lacks the properties of existence, relevance and cumulativity. If we prospectively consider the class of conservative extensions of SM semantics (i.e., semantics that for each normal logic program P retrieve a superset of the set of stable models of P), one may wonder how the semantics of this class behave with respect to the aforementioned properties. That is the type of issue dealt with in this paper. We define a large class of conservative extensions of the SM semantics, dubbed affix stable model semantics, ASM, and study the above-mentioned properties in two non-disjoint subfamilies of the class ASM, here dubbed ASMh and ASMm. From this study a number of results stem which facilitate the assessment of semantics in the class ASMh U ASMm with respect to the properties of existence, relevance and cumulativity, whilst unveiling relations among these properties. 
As a result of the approach taken in our work, light is shed on the characterization of the SM semantics, as we show that the properties of (lack of) existence and (lack of) cautious monotony are equivalent, which contradicts statements on this issue that may be found in the literature; we also characterize the relevance failure of SM semantics more clearly than is usually done in the literature.\nIn this paper, we propose an unsupervised method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books. We construct networks based on distributional thesauri from data at different time points and cluster each of them separately to obtain word-centric sense clusters corresponding to the different time points. Subsequently, we compare these sense clusters of two different time points to find (i) whether a new sense has been born, (ii) whether an older sense has split into more than one sense, (iii) whether a newer sense has been formed by joining older senses, or (iv) whether a particular sense has died. We conduct a thorough evaluation of the proposed methodology both manually and through comparison with WordNet. Manual evaluation indicates that the algorithm could correctly identify 60.4% of birth cases from a set of 48 randomly picked samples and 57% of split/join cases from a set of 21 randomly picked samples. Remarkably, in 44% of cases the birth of a novel sense is attested by WordNet, while in 46% and 43% of cases, splits and joins respectively are confirmed by WordNet. Our approach can be applied to lexicography, as well as to applications like word sense disambiguation or semantic search.\nShapley's impossibility result indicates that the two-person bargaining problem has no non-trivial ordinal solution with the traditional game-theoretic bargaining model. 
Although the result is no longer true for bargaining problems with more than two agents, none of the well-known bargaining solutions are ordinal. Searching for meaningful ordinal solutions, especially for the bilateral bargaining problem, has been a challenging issue in bargaining theory for more than three decades. This paper proposes a logic-based ordinal solution to the bilateral bargaining problem. We argue that if a bargaining problem is modeled in terms of the logical relation of players' physical negotiation items, a meaningful bargaining solution can be constructed based on the ordinal structure of bargainers' preferences. We represent bargainers' demands in propositional logic and bargainers' preferences over their demands as a total preorder. We show that the solution satisfies most desirable logical properties, such as individual rationality (logical version), consistency and collective rationality, as well as a few typical game-theoretic properties, such as weak Pareto optimality and contraction invariance. In addition, if all players' demand sets are logically closed, the solution satisfies a fixed-point condition, which says that the outcome of a negotiation is the result of mutual belief revision. Finally, we define various decision problems in relation to our bargaining model and study their computational complexity.\nWe present a framework for learning human user models from joint-action demonstrations that enables the robot to compute a robust policy for a collaborative task with a human. The learning takes place completely automatically, without any human intervention. First, we describe the clustering of demonstrated action sequences into different human types using an unsupervised learning algorithm. These demonstrated sequences are also used by the robot to learn a reward function that is representative of each type, through the employment of an inverse reinforcement learning algorithm. 
The learned model is then used as part of a Mixed Observability Markov Decision Process formulation, wherein the human type is a partially observable variable. With this framework, we can infer, either offline or online, the human type of a new user that was not included in the training set, and can compute a policy for the robot that will be aligned with the preference of this new user and will be robust to deviations of the human actions from prior demonstrations. Finally, we validate the approach using data collected in human subject experiments, and conduct proof-of-concept demonstrations in which a person performs a collaborative task with a small industrial robot.\nAdvances in high energy physics have created the need to increase computational capacity. Project HEPGAME was set up to address this challenge. One of the issues is that expressions of current interest have millions of terms, and their numerical integration takes weeks to compute. We have investigated ways to simplify these expressions, using Horner schemes and common subexpression elimination. Our approach applies MCTS, a search procedure that has been successful in AI. We use it to find near-optimal Horner schemes. Although MCTS finds better solutions, this approach gives rise to two further challenges. (1) MCTS (with UCT) introduces a constant, $C_p$, that governs the balance between exploration and exploitation. This constant has to be tuned manually. (2) There should be more guided exploration at the bottom of the tree, since the current approach reduces the quality of the solution towards the end of the expression. We investigate NMCS (Nested Monte Carlo Search) to address both issues, but find that NMCS is computationally infeasible for our problem. Then, we modify the MCTS formula by introducing a dynamic exploration-exploitation parameter $T$ that decreases linearly with the iteration number. Subsequently, we provide a performance analysis. 
We observe that a variable $C_p$ solves our domain: it yields more exploration at the bottom and, as a result, the tuning problem is simplified. The region in $C_p$ for which good values are found is increased more than tenfold. This result encourages us to continue our research to solve other prominent problems in High Energy Physics.\nSemantic composition is the task of understanding the meaning of text by composing the meanings of the individual words in the text. Semantic decomposition is the task of understanding the meaning of an individual word by decomposing it into various aspects (factors, constituents, components) that are latent in the meaning of the word. We take a distributional approach to semantics, in which a word is represented by a context vector. Much recent work has considered the problem of recognizing compositions and decompositions, but we tackle the more difficult generation problem. For simplicity, we focus on noun-modifier bigrams and noun unigrams. A test for semantic composition is, given context vectors for the noun and modifier in a noun-modifier bigram (\"red salmon\"), generate a noun unigram that is synonymous with the given bigram (\"sockeye\"). A test for semantic decomposition is, given a context vector for a noun unigram (\"snifter\"), generate a noun-modifier bigram that is synonymous with the given unigram (\"brandy glass\"). With a vocabulary of about 73,000 unigrams from WordNet, there are 73,000 candidate unigram compositions for a bigram and 5,300,000,000 (73,000 squared) candidate bigram decompositions for a unigram. We generate ranked lists of potential solutions in two passes. A fast unsupervised learning algorithm generates an initial list of candidates and then a slower supervised learning algorithm refines the list. We evaluate the candidate solutions by comparing them to WordNet synonym sets. 
For decomposition (unigram to bigram), the top 100 most highly ranked bigrams include a WordNet synonym of the given unigram 50.7% of the time. For composition (bigram to unigram), the top 100 most highly ranked unigrams include a WordNet synonym of the given bigram 77.8% of the time.\nRecently, multiple formulations of vision problems as probabilistic inversions of generative models based on computer graphics have been proposed. However, applications to 3D perception from natural images have focused on low-dimensional latent scenes, due to challenges in both modeling and inference. Accounting for the enormous variability in 3D object shape and 2D appearance via realistic generative models seems intractable, as does inverting even simple versions of the many-to-many computations that link 3D scenes to 2D images. This paper proposes and evaluates an approach that addresses key aspects of both these challenges. We show that it is possible to solve challenging, real-world 3D vision problems by approximate inference in generative models for images based on rendering the outputs of probabilistic CAD (PCAD) programs. Our PCAD object geometry priors generate deformable 3D meshes corresponding to plausible objects and apply affine transformations to place them in a scene. Image likelihoods are based on similarity in a feature space based on standard mid-level image representations from the vision literature. Our inference algorithm integrates single-site and locally blocked Metropolis-Hastings proposals, Hamiltonian Monte Carlo and discriminative data-driven proposals learned from training data generated from our models. 
We apply this approach to 3D human pose estimation and object shape reconstruction from single images, achieving quantitative and qualitative performance improvements over state-of-the-art baselines.\nRecent work introduced Generalized First Order Decision Diagrams (GFODD) as a knowledge representation that is useful in mechanizing decision theoretic planning in relational domains. GFODDs generalize function-free first order logic and include numerical values and numerical generalizations of existential and universal quantification. Previous work presented heuristic inference algorithms for GFODDs and implemented these heuristics in systems for decision theoretic planning. In this paper, we study the complexity of the computational problems addressed by such implementations. In particular, we study the evaluation problem, the satisfiability problem, and the equivalence problem for GFODDs under the assumption that the size of the intended model is given with the problem, a restriction that guarantees decidability. Our results provide a complete characterization placing these problems within the polynomial hierarchy. The same characterization applies to the corresponding restriction of problems in first order logic, giving an interesting new avenue for efficient inference when the number of objects is bounded. Our results show that for $\\Sigma_k$ formulas, and for corresponding GFODDs, evaluation and satisfiability are $\\Sigma_k^p$ complete, and equivalence is $\\Pi_{k+1}^p$ complete. For $\\Pi_k$ formulas evaluation is $\\Pi_k^p$ complete, satisfiability is one level higher and is $\\Sigma_{k+1}^p$ complete, and equivalence is $\\Pi_{k+1}^p$ complete.\nAn originally chaotic system can be controlled into various periodic dynamics. When it is implemented into a legged robot's locomotion control as a central pattern generator (CPG), sophisticated gait patterns arise so that the robot can perform various walking behaviors. 
However, such a single chaotic CPG controller has difficulties dealing with leg malfunction. Specifically, in the scenarios presented here, its movement permanently deviates from the desired trajectory. To address this problem, we extend the single chaotic CPG to multiple CPGs with learning. The learning mechanism is based on a simulated annealing algorithm. In a normal situation, the CPGs synchronize and their dynamics are identical. With leg malfunction or disability, the CPGs lose synchronization leading to independent dynamics. In this case, the learning mechanism is applied to automatically adjust the remaining legs' oscillation frequencies so that the robot adapts its locomotion to deal with the malfunction. As a consequence, the trajectory produced by the multiple chaotic CPGs resembles the original trajectory far better than the one produced by only a single CPG. The performance of the system is evaluated first in a physical simulation of a quadruped as well as a hexapod robot and finally in a real six-legged walking machine called AMOSII. The experimental results presented here reveal that using multiple CPGs with learning is an effective approach for adaptive locomotion generation where, for instance, different body parts have to perform independent movements for malfunction compensation.\nThe dynamics of belief and knowledge is one of the major components of any autonomous system that should be able to incorporate new pieces of information. In order to apply the rationality result of belief dynamics theory to various practical problems, it should be generalized in two respects: first it should allow a certain part of belief to be declared as immutable; and second, the belief state need not be deductively closed. Such a generalization of belief dynamics, referred to as base dynamics, is presented in this paper, along with the concept of a generalized revision algorithm for knowledge bases (Horn or Horn logic with stratified negation). 
We show that knowledge base dynamics has an interesting connection with kernel change via hitting set and abduction. In this paper, we show how techniques from disjunctive logic programming can be used for efficient (deductive) database updates. The key idea is to transform the given database together with the update request into a disjunctive (datalog) logic program and apply disjunctive techniques (such as minimal model reasoning) to solve the original update problem. The approach extends and integrates standard techniques for efficient query answering and integrity checking. The generation of a hitting set is carried out through a hyper tableaux calculus and magic set that is focused on the goal of minimality.\nCounterexample-guided inductive synthesis (CEGIS) is used to synthesize programs from a candidate space of programs. The technique is guaranteed to terminate and synthesize the correct program if the space of candidate programs is finite. But the technique may or may not terminate with the correct program if the candidate space of programs is infinite. In this paper, we perform a theoretical analysis of the counterexample-guided inductive synthesis technique. We investigate whether the set of candidate spaces for which the correct program can be synthesized using CEGIS depends on the counterexamples used in inductive synthesis, that is, whether there are good mistakes which would increase the synthesis power. We investigate whether the use of minimal counterexamples instead of arbitrary counterexamples expands the set of candidate spaces of programs for which inductive synthesis can successfully synthesize a correct program. We consider two kinds of counterexamples: minimal counterexamples and history bounded counterexamples. The history bounded counterexample used in any iteration of CEGIS is bounded by the examples used in previous iterations of inductive synthesis. We examine the relative change in power of inductive synthesis in both cases. 
We show that the synthesis technique using minimal counterexamples (MinCEGIS) has the same synthesis power as CEGIS, but the synthesis technique using history bounded counterexamples (HCEGIS) has different power than that of CEGIS, and neither dominates the other.\nDetecting faults in electrical power grids is of paramount importance, from both the electricity operator's and the consumer's viewpoints. Modern electric power grids (smart grids) are equipped with smart sensors that gather real-time information regarding the physical status of all the component elements belonging to the whole infrastructure (e.g., cables and related insulation, transformers, breakers and so on). In real-world smart grid systems, additional information related to the operational status of the grid itself, such as meteorological information, is usually collected. Designing a suitable recognition (discrimination) model of faults in a real-world smart grid system is hence a challenging task. This follows, first, from the heterogeneity of the information that actually determines a typical fault condition. The second point is that, for synthesizing a recognition model, in practice only the conditions of observed faults are usually meaningful. Therefore, a suitable recognition model should be synthesized by making use of the observed fault conditions only. In this paper, we deal with the problem of modeling and recognizing faults in a real-world smart grid system, which supplies the entire city of Rome, Italy. Recognition of faults is addressed by following a combined approach of multiple dissimilarity measures customization and one-class classification techniques. We provide here an in-depth study related to the available data and to the models synthesized by the proposed one-class classifier. 
We also offer a comprehensive analysis of the fault recognition results by exploiting a fuzzy set based reliability decision rule.\nIn this paper we address the problem of planning in rich domains, where knowledge representation is a key aspect for managing the complexity and size of the planning domain. We follow the approach of Description Logic (DL) based Dynamic Knowledge Bases, where a state of the world is represented concisely by a (possibly changing) ABox and a (fixed) TBox containing the axioms, and actions that change the content of the ABox. The plan goal is given in terms of satisfaction of a DL query. In this paper we start from a traditional forward planning algorithm and we propose a much more efficient variant by combining backward and forward search. In particular, we propose a Backward State-space Reduction technique that consists of two phases: first, an Abstract Planning Graph P is created by using the Abstract Backward Planning Algorithm (ABP), then the abstract planning graph P is instantiated into a corresponding planning graph P by using the Forward Plan Instantiation Algorithm (FPI). The advantage is that in the preliminary ABP phase we produce a symbolic plan that is a pattern to direct the search of the concrete plan. This can be seen as a kind of informed search where the preliminary backward phase is useful to discover properties of the state-space that can be used to direct the subsequent forward phase. We evaluate the effectiveness of our ABP+FPI algorithm in the reduction of the explored planning domain by comparing it to a standard forward planning algorithm and applying both of them to a concrete business case study.\nThis paper deals with the relations among structural, topological, and chemical properties of the E.Coli proteome from the vantage point of the solubility/aggregation propensity of proteins. Each E.Coli protein is initially represented according to its known folded 3D shape. 
This step consists of representing the available E.Coli proteins in terms of graphs. We first analyze those graphs by considering pure topological characterizations, i.e., by analyzing the mass fractal dimension and the distribution underlying both shortest paths and vertex degrees. Results confirm the general architectural principles of proteins. Next, we focus on the statistical properties of a representation of such graphs in terms of vectors composed of several numerical features, which we extracted from their structural representation. We found that protein size is the main discriminator for solubility, although other factors also help explain the solubility degree. We finally analyze such data through a novel one-class classifier, with the aim of discriminating between very soluble and poorly soluble proteins. Results are encouraging and consolidate the potential of pattern recognition techniques when employed to describe complex biological systems.\nBots are, for many Web and social media users, the source of many dangerous attacks or the carrier of unwanted messages, such as spam. Nevertheless, crawlers and software agents are a precious tool for analysts, and they are continuously executed to collect data or to test distributed applications. However, the real potential of a bot whose purpose is to control a community, to manipulate consensus, or to influence user behavior remains unknown. It is commonly believed that the better an agent simulates human behavior in a social network, the greater the impact it can have on that community. We help shed light on this issue through an online social experiment aimed at studying to what extent a bot with no trust, no profile, and no aim to reproduce human behavior can become popular and influential on social media. 
Results show that a basic social probing activity can be used to acquire social relevance on the network and that the so-acquired popularity can be effectively leveraged to drive users in their social connectivity choices. We also observe that our bot activity unveiled hidden social polarization patterns in the community and triggered an emotional response from individuals that brings to light subtle privacy hazards perceived by the user base.\nThe strong solutions of Nine Men's Morris and its variant, Lasker Morris, are well-known results (the starting positions are draws). We re-examined both of these games, and calculated extended strong solutions for them. By this we mean the game-theoretic values of all possible game states that could be reached from certain starting positions where the number of stones to be placed by the players differs from the standard rules. These were also calculated for a previously unsolved third variant, Morabaraba, with interesting results: most of the starting positions where the players can place an equal number of stones (including the standard starting position) are wins for the first player (as opposed to the above games, where these are usually draws). We also developed a multi-valued retrograde analysis, and used it as a basis for an algorithm for solving these games ultra-strongly. This means that when our program is playing against a fallible opponent, it has a greater chance of achieving a better result than the game-theoretic value, compared to randomly selecting between \"just strongly\" optimal moves. Previous attempts at ultra-strong solutions used local heuristics or learning during games, but we incorporated our algorithm into the retrograde analysis.\nDue to advances in sensors, increasingly large and complex medical image data can visualize pathological changes at the cellular or even the molecular level, as well as anatomical changes in tissues and organs. 
As a consequence, medical images have the potential to enhance diagnosis of disease, prediction of clinical outcomes, characterization of disease progression, management of health care and development of treatments, but they also pose great methodological and computational challenges for representation and selection of features in image cluster analysis. To address these challenges, we first extend one-dimensional functional principal component analysis to two-dimensional functional principal component analysis (2DFPCA) to fully capture the spatial variation of image signals. Image signals contain a large number of redundant and irrelevant features that provide no additional or useful information for cluster analysis. Widely used methods for removing redundant and irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the choice of penalty parameters and of a threshold for selecting features, which are difficult to determine in practice. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image cluster analysis. The proposed method is applied to ovarian and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis.\nAfter experimentation with other designs, the major search engines converged on the weighted, generalized second-price auction (wGSP) for selling keyword advertisements. Notably, this convergence occurred before position auctions were well understood (or, indeed, widely studied) theoretically. 
While much progress has been made since, theoretical analysis still cannot settle the question of why search engines found wGSP preferable to other position auctions. We approach this question in a new way, adopting a new analytical paradigm we dub \"computational mechanism analysis.\" By sampling position auction games from a given distribution, encoding them in a computationally efficient representation language, computing their Nash equilibria, and then calculating economic quantities of interest, we can quantitatively answer questions that theoretical methods have not. We considered seven widely studied valuation models from the literature and three position auction variants (generalized first price, unweighted generalized second price, and wGSP). We found that wGSP consistently showed the best ads of any position auction, measured both by social welfare and by relevance (expected number of clicks). Even in models where wGSP was already known to have bad worst-case efficiency, we found that it almost always performed well on average. In contrast, we found that revenue was extremely variable across auction mechanisms, and was highly sensitive to equilibrium selection, the preference model, and the valuation distribution.\nExact Bayesian structure discovery in Bayesian networks requires exponential time and space. Using dynamic programming (DP), the fastest known sequential algorithm computes the exact posterior probabilities of structural features in $O(2(d+1)n2^n)$ time and space, if the number of nodes (variables) in the Bayesian network is $n$ and the in-degree (the number of parents) per node is bounded by a constant $d$. Here we present a parallel algorithm capable of computing the exact posterior probabilities for all $n(n-1)$ edges with optimal parallel space efficiency and nearly optimal parallel time efficiency. 
That is, if $p=2^k$ processors are used, the run-time reduces to $O(5(d+1)n2^{n-k}+k(n-k)^d)$ and the space usage becomes $O(n2^{n-k})$ per processor. Our algorithm is based on the observation that the subproblems in the sequential DP algorithm constitute an $n$-$D$ hypercube. We carefully coordinate the computation of correlated DP procedures so that a large amount of data exchange is avoided. Further, we develop parallel techniques for two variants of the well-known \\emph{zeta transform}, which have applications outside the context of Bayesian networks. We demonstrate the capability of our algorithm on datasets with up to 33 variables and its scalability on up to 2048 processors. We apply our algorithm to a biological data set for discovering the yeast pheromone response pathways.\nThis paper addresses the path selection problem from a known source to the destination in dense networks. The proposed solution for route discovery uses the genetic algorithm approach for a QoS based network. Multi-point crossover and mutation help in determining the optimal path and an alternate path when required. The input to the genetic algorithm is a learnt module, part of the cognitive router, that takes care of four QoS parameters. Here the set of nodes selected for routing is determined by delay, jitter and loss. On this graded surface of selected nodes, the bandwidth parameter is considered for path selection. The aim of the approach is to maximize the bandwidth occupied along the forward channels and minimize the route length. The population size is taken as the fixed set of nodes participating in the network scenario, which is limited to a known topology size. 
Simulation results show that with the genetic algorithm (GA) approach, the probability of convergence to the shortest path is higher.\nIn many applications that require matrix solutions of minimal rank, the underlying cost function is non-convex, leading to an intractable, NP-hard optimization problem. Consequently, the convex nuclear norm is frequently used as a surrogate penalty term for matrix rank. The problem is that in many practical scenarios there is no longer any guarantee that we can correctly estimate generative low-rank matrices of interest, theoretical special cases notwithstanding. Consequently, this paper proposes an alternative empirical Bayesian procedure built upon a variational approximation that, unlike the nuclear norm, retains the same globally minimizing point estimate as the rank function under many useful constraints. However, locally minimizing solutions are largely smoothed away via marginalization, allowing the algorithm to succeed when standard convex relaxations completely fail. While the proposed methodology is generally applicable to a wide range of low-rank applications, we focus our attention on the robust principal component analysis problem (RPCA), which involves estimating an unknown low-rank matrix with unknown sparse corruptions. Theoretical and empirical evidence is presented to show that our method is potentially superior to related MAP-based approaches, for which the convex principal component pursuit (PCP) algorithm (Candes et al., 2011) can be viewed as a special case.\nEditing faces in videos is a popular yet challenging aspect of computer vision and graphics, which encompasses several applications including facial attractiveness enhancement, makeup transfer, face replacement, and expression manipulation. 
Simply applying image-based warping algorithms to video-based face editing produces temporal incoherence in the synthesized videos because it is impossible to consistently localize facial features in two frames representing two different faces in two different videos (or even two consecutive frames representing the same face in one video). Therefore, high-performance face editing usually requires significant manual manipulation. In this paper we propose a novel temporal-spatial-smooth warping (TSSW) algorithm to effectively exploit the temporal information in two consecutive frames, as well as the spatial smoothness within each frame. TSSW precisely estimates two control lattices in the horizontal and vertical directions respectively from the corresponding control lattices in the previous frame, by minimizing a novel energy function that unifies a data-driven term, a smoothness term, and feature point constraints. Corresponding warping surfaces then precisely map source frames to the target frames. Experimental testing on facial attractiveness enhancement, makeup transfer, face replacement, and expression manipulation demonstrates that the proposed approaches can effectively preserve spatial smoothness and temporal coherence in editing facial geometry, skin detail, identity, and expression, and that they outperform the existing face editing methods. In particular, TSSW is robust to subtly inaccurate localization of feature points and is a vast improvement over image-based warping methods.\nWe evaluate a version of the recently proposed classification system named Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space of sequences of generic objects. The ODSE system was originally presented as a classification system for patterns represented as labeled graphs. 
However, since ODSE is founded on the dissimilarity space representation of the input data, the classifier can be easily adapted to any input domain where it is possible to define a meaningful dissimilarity measure. Here we demonstrate the effectiveness of the ODSE classifier for sequences by considering an application dealing with the recognition of the solubility degree of the Escherichia coli proteome. Solubility, or analogously aggregation propensity, is an important property of protein molecules, which is intimately related to the mechanisms underlying the chemico-physical process of folding. Each protein of our dataset is initially associated with a solubility degree and is represented as a sequence of symbols denoting the 20 amino acid residues. The computational results obtained here, which we stress were achieved without any context-dependent tuning of the ODSE system, confirm the validity and generality of the ODSE-based approach for structured data classification.\nWe propose and evaluate a number of solutions to the problem of calculating the cost to serve each location in a single-vehicle transport setting. Such cost-to-serve analysis has applications in transportation at both the strategic and operational levels. The problem is formally given by the traveling salesperson game (TSG), a cooperative total utility game in which agents correspond to locations in a traveling salesperson problem (TSP). The cost to serve a location is an allocated portion of the cost of an optimal tour. The Shapley value is one of the most important normative division schemes in cooperative games, giving a principled and fair allocation both for the TSG and more generally. We consider a number of direct and sampling-based procedures for calculating the Shapley value, and present the first proof that approximating the Shapley value of the TSG within a constant factor is NP-hard. 
Treating the Shapley value as an ideal baseline allocation, we then develop six proxies for that value which are relatively easy to compute. We perform an experimental evaluation using synthetic Euclidean games as well as games derived from real-world tours calculated for fast-moving consumer goods scenarios. Our experiments show that several computationally tractable allocation techniques correspond to good proxies for the Shapley value.\nIn this paper, we address the knowledge engineering problems for hypothesis generation motivated by applications that require timely exploration of hypotheses under unreliable observations. We consider two applications: malware detection and intensive care delivery. In intensive care, the goal is to generate plausible hypotheses about the condition of the patient from clinical observations and further refine these hypotheses to create a recovery plan for the patient. Similarly, preventing malware spread within a corporate network involves generating hypotheses from network traffic data and selecting preventive actions. To this end, building on the already established characterization and use of AI planning for similar problems, we propose the use of planning for the hypothesis generation problem. However, to deal with uncertainty, incomplete model description and unreliable observations, we need to use a planner capable of generating multiple high-quality plans. To capture the model description we propose a language called LTS++ and a web-based tool that enables the specification of the LTS++ model and a set of observations. We also propose a 9-step process that helps provide guidance to the domain expert in specifying the LTS++ model. The hypotheses are then generated by running a planner on the translated LTS++ model and the provided trace. 
The hypotheses can be visualized and shown to the analyst or can be further investigated automatically.\nWe introduce the notion of online reactive planning with sensing actions for systems with temporal logic constraints in partially observable and dynamic environments. With incomplete information on the dynamic environment, reactive controller synthesis amounts to solving a two-player game with partial observations, which has impractically high computational complexity. To alleviate the high computational burden, online replanning via sensing actions avoids solving for a strategy in the reactive system under partial observations. Instead, we only solve for a strategy that ensures a given temporal logic specification can be satisfied had the system had complete observations of its environment. Such a strategy is then transformed into one which makes control decisions based on the observed sequence of states (of the interacting system and its environment). When the system encounters a belief---a set including all possible hypotheses the system has for the current state---for which the observation-based strategy is undefined, a sequence of sensing actions is triggered, chosen by an active sensing strategy, to reduce the uncertainty in the system's belief. We show that by alternating between the observation-based strategy and the active sensing strategy, under a mild technical assumption on the set of sensors in the system, the given temporal logic specification can be satisfied with probability 1.\nIn image classification, visual separability between different object categories is highly uneven, and some categories are more difficult to distinguish than others. Such difficult categories demand more dedicated classifiers. However, existing deep convolutional neural networks (CNN) are trained as flat N-way classifiers, and few efforts have been made to leverage the hierarchical structure of categories. 
In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy. An HD-CNN separates easy classes using a coarse category classifier while distinguishing difficult classes using fine category classifiers. During HD-CNN training, component-wise pretraining is followed by global finetuning with a multinomial logistic loss regularized by a coarse category consistency term. In addition, conditional executions of fine category classifiers and layer parameter compression make HD-CNNs scalable for large-scale visual recognition. We achieve state-of-the-art results on both CIFAR100 and large-scale ImageNet 1000-class benchmark datasets. In our experiments, we build up three different HD-CNNs and they lower the top-1 error of the standard CNNs by 2.65%, 3.1% and 1.1%, respectively.\nWhile most Bayesian nonparametric models in machine learning have focused on the Dirichlet process, the beta process, or their variants, the gamma process has recently emerged as a useful nonparametric prior in its own right. Current inference schemes for models involving the gamma process are restricted to MCMC-based methods, which limits their scalability. In this paper, we present a variational inference framework for models involving gamma process priors. Our approach is based on a novel stick-breaking constructive definition of the gamma process. We prove correctness of this stick-breaking process by using the characterization of the gamma process as a completely random measure (CRM), and we explicitly derive the rate measure of our construction using Poisson process machinery. We also derive error bounds on the truncation of the infinite process required for variational inference, similar to the truncation analyses for other nonparametric models based on the Dirichlet and beta processes. 
Our representation is then used to derive a variational inference algorithm for a particular Bayesian nonparametric latent structure formulation known as the infinite Gamma-Poisson model, where the latent variables are drawn from a gamma process prior with Poisson likelihoods. Finally, we present results for our algorithms on nonnegative matrix factorization tasks on document corpora, and show that we compare favorably to both sampling-based techniques and variational approaches based on beta-Bernoulli priors.\nWe propose a framework grounded in Logic Programming for representing and reasoning about business processes from both the procedural and ontological points of view. In particular, our goal is threefold: (1) define a logical language and a formal semantics for process models enriched with ontology-based annotations; (2) provide an effective inference mechanism that supports the combination of reasoning services dealing with the structural definition of a process model, its behavior, and the domain knowledge related to the participating business entities; (3) implement such a theoretical framework in a process modeling and reasoning platform. To this end we define a process ontology covering a relevant fragment of the popular BPMN modeling notation. The behavioral semantics of a process is defined as a state transition system by following an approach similar to the Fluent Calculus, and allows us to specify state change in terms of preconditions and effects of the enactment of activities. Then we show how the procedural process knowledge can be seamlessly integrated with the domain knowledge specified by using the OWL 2 RL rule-based ontology language. Our framework provides a wide range of reasoning services, including CTL model checking, which can be performed by using standard Logic Programming inference engines through a goal-oriented, efficient, sound and complete evaluation procedure. 
We also present a software environment implementing the proposed framework, and we report on an experimental evaluation of the system, whose results are encouraging and show the viability of the approach.\nIn part one of the Critique of Judgment, Immanuel Kant wrote that \"the judgment of taste...is not a cognitive judgment, and so not logical, but is aesthetic.\"\\cite{Kant} While the condition of aesthetic discernment has long been the subject of philosophical discourse, the role of the arbiters of that judgment has more often been assumed than questioned. The art historian, critic, connoisseur, and curator have long held the esteemed position of the aesthetic judge, their training, instinct, and eye part of the inimitable subjective processes that Kant described as occurring upon artistic evaluation. Although the concept of intangible knowledge in regard to aesthetic theory has been much explored, little discussion has arisen in response to the development of new types of artificial intelligence as a challenge to the seemingly ineffable abilities of the human observer. This paper examines the developments in the field of computer vision analysis of paintings from canonical movements within the history of Western art and the reaction of art historians to the application of this technology in the field. Through an investigation of the ethical consequences of this innovative technology, the unquestioned authority of the art expert is challenged and the subjective nature of aesthetic judgment is subjected to philosophical scrutiny once again.\nGiven recent advances in information technology and artificial intelligence, web-based education systems have become complementary and, in some cases, viable alternatives to traditional classroom teaching. The popularity of these systems stems from their ability to make education available to a large demographic (see MOOCs). 
However, existing systems do not take advantage of the personalization which becomes possible when web-based education is offered: they continue to be one-size-fits-all. In this paper, we aim to provide a first systematic method for designing a personalized web-based education system. Personalizing education is challenging: (i) students need to be provided personalized teaching and training depending on their contexts (e.g., classes already taken, methods of learning preferred, etc.), (ii) for each specific context, the best teaching and training method (e.g., type and order of teaching materials to be shown) must be learned, (iii) teaching and training should be adapted online, based on the scores/feedback (e.g., tests, quizzes, final exam, likes/dislikes etc.) of the students. Our personalized online system, e-Tutor, is able to address these challenges by learning how to adapt the teaching methodology (in this case what sequence of teaching material to present to a student) to maximize her performance in the final exam, while minimizing the time spent by the students to learn the course (and possibly the number of dropouts). We illustrate the efficiency of the proposed method on a real-world eTutor platform which is used for remedial training for a Digital Signal Processing (DSP) course.\nMany tasks in data mining and related fields can be formalized as matching between objects in two heterogeneous domains, including collaborative filtering, link prediction, image tagging, and web search. Machine learning techniques, referred to as learning-to-match in this paper, have been successfully applied to these problems. Among them, a class of state-of-the-art methods, named feature-based matrix factorization, formalizes the task as an extension to matrix factorization by incorporating auxiliary features into the model. 
Unfortunately, making those algorithms scale to real-world problems is challenging, and simple parallelization strategies fail due to the complex cross-talk patterns between sub-tasks. In this paper, we tackle this challenge with a novel parallel and efficient algorithm for feature-based matrix factorization. Our algorithm, based on coordinate descent, can easily handle hundreds of millions of instances and features on a single machine. The key recipe of this algorithm is an iterative relaxation of the objective to facilitate parallel updates of parameters, with guaranteed convergence on minimizing the original objective function. Experimental results demonstrate that the proposed method is effective on a wide range of matching problems, with efficiency significantly improved upon the baselines while accuracy remains unchanged.\nFunctional Magnetic Resonance Imaging (fMRI) is a powerful non-invasive tool for localizing and analyzing brain activity. This study focuses on one very important aspect of the functional properties of the human brain, specifically the estimation of the level of parallelism when performing complex cognitive tasks. Using fMRI as the main modality, human brain activity is investigated through a purely data-driven signal processing and dimensionality analysis approach. Specifically, the fMRI signal is treated as a multi-dimensional data space and its intrinsic `complexity' is studied via dataset fractal analysis and blind-source separation (BSS) methods. One simulated and two real fMRI datasets are used in combination with Independent Component Analysis (ICA) and fractal analysis for estimating the intrinsic (true) dimensionality, in order to provide data-driven experimental evidence on the number of independent brain processes that run in parallel when visual or visuo-motor tasks are performed. 
Although this number cannot be defined as a strict threshold, only as a continuous range, when a specific activation level is defined, a corresponding number of parallel processes, or the casual equivalent of `cpu cores', can be detected in normal human brain activity.\nAs language and visual understanding by machines progresses rapidly, we are observing an increasing interest in holistic architectures that tightly interlink both modalities in a joint learning and inference process. This trend has allowed the community to progress towards more challenging and open tasks and refueled the hope of achieving the old AI dream of building machines that could pass a Turing test in open domains. In order to steadily make progress towards this goal, we realize that quantifying performance becomes increasingly difficult. We therefore ask how such challenges can be precisely defined and how different algorithms can be evaluated on these open tasks. In this paper, we summarize and discuss such challenges as well as try to give answers where appropriate options are available in the literature. We exemplify some of the solutions on a recently presented question-answering dataset based on real-world indoor images that establishes a visual Turing challenge. Finally, we argue that despite the success of unique ground-truth annotation, we likely have to step away from carefully curated datasets and rather rely on 'social consensus' as the main driving force to create suitable benchmarks. Providing coverage in this inherently ambiguous output space is an emerging challenge that we face in order to make quantifiable progress in this area.\nOne important challenge for a set of agents to achieve more efficient collaboration is for these agents to maintain proper models of each other. An important aspect of these models of other agents is that they are often partial and incomplete. 
Thus far, there are two common representations of agent models: MDP based and action based, which are both based on action modeling. In many applications, agent models may not have been given, and hence must be learnt. While it may seem convenient to use either MDP based or action based models for learning, in this paper, we introduce a new representation based on capability models, which has several unique advantages. First, we show that learning capability models can be performed efficiently online via Bayesian learning, and the learning process is robust to high degrees of incompleteness in plan execution traces (e.g., with only start and end states). While high degrees of incompleteness in plan execution traces present learning challenges for MDP based and action based models, capability models can still learn to {\\em abstract} useful information out of these traces. As a result, capability models are useful in applications in which such incompleteness is common, e.g., robots learning human models from observations and interactions. Furthermore, when used in multi-agent planning (with each agent modeled separately), capability models provide flexible abstraction of actions. The limitation, however, is that the synthesized plan is incomplete and abstract.\nWe study the data space $D$ of any given data set $X$ and explain how functions and relations are defined over $D$. From $D$ and for a specific domain $\\Delta$ we construct the information space $I$ of $X$ by interpreting variables, functions, and explicit relations over $D$ in $\\Delta$ and by including other relations that $D$ implies under the interpretation in $\\Delta$. 
Then from $I$ we build up the knowledge space $K$ of $X$ as the product of two spaces $K_T$ and $K_P$, where $K_T$ is obtained from $I$ by using the induction principle to generalize propositional relations to quantified relations, the deduction principle to generate new relations, and standard mechanisms to validate relations and $K_P$ is the space of specifications of methods with operational instructions which are valid in $K_T$. Through our construction of the three topological spaces the following key observation is made clear: the retrieval of information from the given data set for $\\Delta$ consists essentially in mining domain objects and relations, and the discovery of knowledge from the retrieved information consists essentially in applying the induction and deduction principles to generate propositions, synthesizing and modeling the information to generate specifications of methods with operational instructions, and validating the propositions and specifications. Based on this observation, efficient approaches may be designed to discover profound knowledge automatically from simple data, as demonstrated by the result of our study in the case of geometry.\nAnswering conjunctive queries (CQs) over $\\mathcal{EL}$ knowledge bases (KBs) with complex role inclusions is PSPACE-hard and in PSPACE in certain cases; however, if complex role inclusions are restricted to role transitivity, the tight upper complexity bound has so far been unknown. Furthermore, the existing algorithms cannot handle reflexive roles, and they are not practicable. Finally, the problem is tractable for acyclic CQs and $\\mathcal{ELH}$, and NP-complete for unrestricted CQs and $\\mathcal{ELHO}$ KBs. In this paper we complete the complexity landscape of CQ answering for several important cases. 
In particular, we present a practicable NP algorithm for answering CQs over $\\mathcal{ELHO}^s$ KBs---a logic containing all of OWL 2 EL, but with complex role inclusions restricted to role transitivity. Our preliminary evaluation suggests that the algorithm can be suitable for practical use. Moreover, we show that, even for a restricted class of so-called arborescent acyclic queries, CQ answering over $\\mathcal{EL}$ KBs becomes NP-hard in the presence of either transitive or reflexive roles. Finally, we show that answering arborescent CQs over $\\mathcal{ELHO}$ KBs is tractable, whereas answering acyclic CQs is NP-hard.\nThis short paper concerns discretization schemes for representing and computing approximate Nash equilibria, with emphasis on graphical games, but briefly touching on normal-form and poly-matrix games. The main technical contribution is a representation theorem that informally states that, to account for every exact Nash equilibrium using a nearby approximate Nash equilibrium on a grid over mixed strategies, a uniform discretization size linear in the inverse of the approximation quality and natural game-representation parameters suffices. For graphical games, under natural conditions, the discretization is logarithmic in the game-representation size, a substantial improvement over the linear dependency previously required. 
The paper has five other objectives: (1) given the venue, to highlight the important, but often ignored, role that work on constraint networks in AI has in simplifying the derivation and analysis of algorithms for computing approximate Nash equilibria; (2) to summarize the state of the art on computing approximate Nash equilibria, with emphasis on relevance to graphical games; (3) to help clarify the distinction between sparse-discretization and sparse-support techniques; (4) to illustrate and advocate for the deliberate mathematical simplicity of the formal proof of the representation theorem; and (5) to list and discuss important open problems, emphasizing graphical-game generalizations, which the AI community is best suited to solve.\nHuman beings do not observe the world from the outside, but rather are fully embedded in it. The sciences, however, often give the observer both a \"god's eye\" perspective and substantial a priori knowledge. Motivated by W. Ross Ashby's statement, \"the theory of the Black Box is merely the theory of real objects or systems, when close attention is given to the question, relating object and observer, about what information comes from the object, and how it is obtained\" (Introduction to Cybernetics, 1956, p. 110), I develop here an alternate picture of the world as a black box to which the observer is coupled. Within this framework I prove purely-classical analogs of the \"no-go\" theorems of quantum theory. Focussing on the question of identifying macroscopic objects, such as laboratory apparatus or even other observers, I show that the standard quantum formalism of superposition is required to adequately represent the classical information that an observer can obtain. 
I relate these results to supporting considerations from evolutionary biology, cognitive and developmental psychology, and artificial intelligence.\nWe propose a scalable temporal latent space model for link prediction in dynamic social networks, where the goal is to predict links over time based on a sequence of previous graph snapshots. The model assumes that each user lies in an unobserved latent space and interactions are more likely to form between similar users in the latent space representation. In addition, the model allows each user to gradually move its position in the latent space as the network structure evolves over time. We present a global optimization algorithm to effectively infer the temporal latent space, with a quadratic convergence rate. Two alternative optimization algorithms with local and incremental updates are also proposed, allowing the model to scale to larger networks without compromising prediction accuracy. Empirically, we demonstrate that our model, when evaluated on a number of real-world dynamic networks, significantly outperforms existing approaches for temporal link prediction in terms of both scalability and predictive power.\nThe automatic design of controllers for mobile robots usually requires two stages. In the first stage, sensorial data are preprocessed or transformed into high level and meaningful values of variables which are usually defined from expert knowledge. In the second stage, a machine learning technique is applied to obtain a controller that maps these high level variables to the control commands that are actually sent to the robot. This paper describes an algorithm that is able to embed the preprocessing stage into the learning stage in order to get controllers directly starting from sensorial raw data with no expert knowledge involved. 
Due to the high dimensionality of the sensorial data, this approach uses Quantified Fuzzy Rules (QFRs), which are able to transform low-level input variables into high-level input variables, reducing the dimensionality through summarization. The proposed learning algorithm, called Iterative Quantified Fuzzy Rule Learning (IQFRL), is based on genetic programming. IQFRL is able to learn rules with different structures, and can manage linguistic variables with multiple granularities. The algorithm has been tested with the implementation of the wall-following behavior both in several realistic simulated environments with different complexity and on a Pioneer 3-AT robot in two real environments. Results have been compared with several well-known learning algorithms combined with different data preprocessing techniques, showing that IQFRL exhibits better and statistically significant performance. Moreover, three real world applications for which IQFRL plays a central role are also presented: path and object tracking with static and moving obstacles avoidance.\nWe investigate the potential of using ordinal peer grading for the evaluation of students in massive open online courses (MOOCs). According to such grading schemes, each student receives a few assignments (by other students) which she has to rank. Then, a global ranking (possibly translated into numerical scores) is produced by combining the individual ones. This is a novel application area for social choice concepts and methods where the important problem to be solved is as follows: how should the assignments be distributed so that the collected individual rankings can be easily merged into a global one that is as close as possible to the ranking that represents the relative performance of the students in the assignment? 
Our main theoretical result suggests that, using very simple ways to distribute the assignments so that each student has to rank only $k$ of them, a Borda-like aggregation method can recover a $1-O(1/k)$ fraction of the true ranking when each student correctly ranks the assignments she receives. Experimental results strengthen our analysis further and also demonstrate that the same method is extremely robust even when students have imperfect capabilities as graders. We believe that our results provide strong evidence that ordinal peer grading can be a highly effective and scalable solution for evaluation in MOOCs.\nA wide range of evidence points toward the existence of a common algorithm underlying the processing of information throughout the cerebral cortex. Several hypothesized features of this cortical algorithm are reviewed, including sparse distributed representation, Bayesian inference, hierarchical organization composed of alternating template matching and pooling layers, temporal slowness and predictive coding. Hierarchical Temporal Memory (HTM) is a family of learning algorithms and corresponding theories of cortical function that embodies these principles. HTM has previously been applied mainly to perceptual tasks typical of posterior cortex. In order to evaluate HTM as a candidate model of cortical function, it is necessary also to investigate its compatibility with the requirements of frontal cortical function. To this end, a variety of models of frontal cortical function are reviewed and integrated, to arrive at the hypothesis that frontal functions including attention, working memory and action selection depend largely upon the same basic algorithms as do posterior functions, with the notable additions of a mechanism for the active maintenance of representations and of multiple cortico-striato-thalamo-cortical loops that allow communication between regions of frontal cortex to be gated in an adaptive manner. 
Computational models of this system are reviewed. Finally, there is a discussion of how HTM can contribute to the understanding of frontal cortical function, and of what the requirements of frontal cortical function mean for the future development of HTM.\nNonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests - those which are consistent without any assumptions about how the distributions may differ (\\textit{general} alternatives), and those which are designed to specifically test easier alternatives, like a difference in means (\\textit{mean-shift} alternatives).   The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. Specifically, we explicitly derive the power of the linear-time Maximum Mean Discrepancy statistic using the Gaussian kernel, where the dimension and sample size can both tend to infinity at any rate, and the two distributions differ in their means. As a corollary, we find that if the signal-to-noise ratio is held constant, then the test's power goes to one if the number of samples increases faster than the dimension increases. This is the first explicit power derivation for a general nonparametric test in the high-dimensional setting, and also the first analysis of how tests designed for general alternatives perform when faced with easier ones.\nIn railway operations, a timetable is established to determine the departure and arrival times for the trains or other rolling stock at the different stations or relevant points inside the rail network or a subset of this network. 
The elaboration of this timetable is done to respond to the commercial requirements for both passenger and freight traffic, but it must also respect a set of security and capacity constraints associated with the railway network, rolling stock and legislation. Combining these requirements and constraints, as well as the large number of trains and schedules to plan, makes the preparation of a feasible timetable a complex and time-consuming process that normally takes several months to be completed. This article addresses the problem of generating periodic timetables, which means that the involved trains operate in a recurrent pattern. For instance, the trains belonging to the same train line depart from some station every 15 minutes or every hour. To tackle the problem, we present a constraint-based model suitable for this kind of problem. Then, we propose a genetic algorithm, allowing a rapid generation of feasible periodic timetables. Finally, two case studies are presented: the first describes a subset of the Netherlands rail network, and the second a large portion of the Nord-Pas-de-Calais regional rail network. Both are solved using our algorithm, and the results are presented and discussed.\nEvolutionary Robotics allows robots with limited sensors and processing to tackle complex tasks by means of sensory-motor coordination. In this paper we show the first application of the Behaviour Tree framework to a real robotic platform using the Evolutionary Robotics methodology. This framework is used to improve the intelligibility of the emergent robotic behaviour as compared to the traditional Neural Network formulation. As a result, the behaviour is easier to comprehend and manually adapt when crossing the reality gap from simulation to reality. This functionality is shown by performing real-world flight tests with the 20-gram DelFly Explorer flapping wing Micro Air Vehicle equipped with a 4-gram onboard stereo vision system. 
The experiments show that the DelFly can fully autonomously search for and fly through a window with only its onboard sensors and processing. The success rate of the optimised behaviour in simulation is 88% and the corresponding real-world performance is 54% after user adaptation. Although this leaves room for improvement, it is higher than the 46% success rate from a tuned user-defined controller.\nWithin the Kolmogorov theory of probability, Bayes' rule allows one to perform statistical inference by relating conditional probabilities to unconditional probabilities. As we show here, however, there is a continuous set of alternative inference rules that yield the same results, and that may have computational or practical advantages for certain problems. We formulate generalized axioms for probability theory, according to which the reverse conditional probability distribution P(B|A) is not specified by the forward conditional probability distribution P(A|B) and the marginals P(A) and P(B). Thus, in order to perform statistical inference, one must specify an additional \"inference axiom,\" which relates P(B|A) to P(A|B), P(A), and P(B). We show that when Bayes' rule is chosen as the inference axiom, the axioms are equivalent to the classical Kolmogorov axioms. We then derive consistency conditions on the inference axiom, and thereby characterize the set of all possible rules for inference. The set of \"first-order\" inference axioms, defined as the set of axioms in which P(B|A) depends on the first power of P(A|B), is found to be a 1-simplex, with Bayes' rule at one of the extreme points. The other extreme point, the \"inversion rule,\" is studied in depth.\nChaos provides many interesting properties that can be used to achieve computational tasks. Such properties are sensitivity to initial conditions, space filling, control and synchronization. Chaotic neural models have been devised to exploit such properties. 
In this paper, a chaotic spiking neuron model is investigated experimentally. This investigation is performed to understand the dynamic behaviours of the model.   The aim of this research is to investigate the dynamics of the nonlinear dynamic state neuron (NDS) experimentally. The experimental approach has revealed some quantitative and qualitative properties of the NDS model such as the control mechanism, the reset mechanism, and the way the model may exhibit dynamic behaviours in phase space. It is shown experimentally in this paper that both the reset mechanism and the self-feedback control mechanism are important for the NDS model to work and to stabilise to one of the large number of available unstable periodic orbits (UPOs) that are embedded in its attractor. The experimental investigation suggests that the internal dynamics of the NDS neuron provide a rich set of dynamic behaviours that can be controlled and stabilised. This wide range of dynamic behaviours may be exploited to carry out information processing tasks.\nPlanned experiments are the gold standard in reliably comparing the causal effect of switching from a baseline policy to a new policy. One critical shortcoming of classical experimental methods, however, is that they typically do not take into account the dynamic nature of response to policy changes. For instance, in an experiment where we seek to understand the effects of a new ad pricing policy on auction revenue, agents may adapt their bidding in response to the experimental pricing changes. Thus, causal effects of the new pricing policy after such an adaptation period, the {\\em long-term causal effects}, are not captured by the classical methodology even though they clearly are more indicative of the value of the new policy. Here, we formalize a framework to define and estimate long-term causal effects of policy changes in multiagent economies. 
Central to our approach is behavioral game theory, which we leverage to formulate the ignorability assumptions that are necessary for causal inference. Under such assumptions we estimate long-term causal effects through a latent space approach, where a behavioral model of how agents act conditional on their latent behaviors is combined with a temporal model of how behaviors evolve over time.\nParticipatory democracy advances in virtually all governments and especially in South America, which exhibits a mixed culture and social predisposition. This article presents the \"Social Participation Ontology\" (OPS from the Brazilian name \\emph{Ontologia de Participa\\c{c}\\~ao Social}) implemented in compliance with the Web Ontology Language standard (OWL) for fostering social participation, especially on virtual platforms. The entities and links of OPS were defined based on an extensive collaboration of specialists. It is shown that OPS is instrumental for information retrieval from the contents of the portal, in terms of the actors (at various levels) as well as of mechanisms and activities. Significantly, OPS is linked to other OWL ontologies as an upper ontology and via FOAF and BFO as higher upper ontologies, which yields sound organization and access of knowledge and data. In order to illustrate the usefulness of OPS, we present results on ontological expansion and integration with other ontologies and data. Ongoing work involves further adoption of OPS by the official Brazilian federal portal for social participation and NGOs, and further linkage to other ontologies for social participation.\nReservoir computing is a recent bio-inspired approach for processing time-dependent signals. It has enabled a breakthrough in analog information processing, with several experiments, both electronic and optical, demonstrating state-of-the-art performance for hard tasks such as speech recognition, time series prediction and nonlinear channel equalization. 
A proof-of-principle experiment using a linear optical circuit on a photonic chip to process digital signals was recently reported. Here we present a photonic implementation of a reservoir computer based on a coherently driven passive fiber cavity processing analog signals. Our experiment has an error rate as low as or lower than that of previous experiments on a wide variety of tasks, and also has lower power consumption. Furthermore, the analytical model describing our experiment is also of interest, as it constitutes a very simple high performance reservoir computer algorithm. The present experiment, given its good performance, low energy consumption and conceptual simplicity, confirms the great potential of photonic reservoir computing for information processing applications ranging from artificial intelligence to telecommunications.\nUnderstanding infant development is one of the greatest challenges of contemporary science. A large source of difficulty comes from the fact that the development of skills in infants results from the interactions of multiple mechanisms at multiple spatio-temporal scales. The concepts of \"innate\" or \"acquired\" are no longer adequate tools for explanations, which call for a shift from reductionist to systemic accounts. To address this challenge, building and experimenting with robots modeling the growing infant brain and body is crucial. Systemic explanations of pattern formation in sensorimotor, cognitive and social development, viewed as a complex dynamical system, require the use of formal models based on mathematics, algorithms and robots. Formulating hypotheses about development using such models, and exploring them through experiments, allows us to consider in detail the interaction between many mechanisms and parameters. This complements traditional experimental methods in psychology and neuroscience where only a few variables can be studied at the same time. Furthermore, the use of robots is of particular importance. 
The laws of physics generate everywhere around us spontaneous patterns in the inorganic world. They also strongly impact the living, and in particular constrain and guide infant development through the properties of its (changing) body in interaction with the physical environment. Being able to consider the body as an experimental variable, something that can be systematically changed in order to study the impact on skill formation, has been a dream of many developmental scientists. This is today becoming possible with developmental robotics.\nIn view of the paradigm shift that makes science ever more data-driven, in this thesis we propose a synthesis method for encoding and managing large-scale deterministic scientific hypotheses as uncertain and probabilistic data.   In the form of mathematical equations, hypotheses symmetrically relate aspects of the studied phenomena. For computing predictions, however, deterministic hypotheses can be abstracted as functions. We build upon Simon's notion of structural equations in order to efficiently extract the (so-called) causal ordering between variables, implicit in a hypothesis structure (set of mathematical equations).   We show how to process the hypothesis predictive structure effectively through original algorithms for encoding it into a set of functional dependencies (fd's) and then performing causal reasoning in terms of acyclic pseudo-transitive reasoning over fd's. Such reasoning reveals important causal dependencies implicit in the hypothesis predictive data and guides our synthesis of a probabilistic database. As in the field of graphical models in AI, such a probabilistic database should be normalized so that the uncertainty arising from competing hypotheses is decomposed into factors and propagated properly onto predictive data by recovering its joint probability distribution through a lossless join. 
That is motivated as a design-theoretic principle for data-driven hypothesis management and predictive analytics.   The method is applicable to both quantitative and qualitative deterministic hypotheses and demonstrated in realistic use cases from computational science.\nWeb services allow communication between heterogeneous systems in a distributed environment. Their enormous success and increasing use have led to thousands of Web services being available on the Internet. This large and ever-growing number of Web services makes them difficult to locate and classify; these problems arise mainly during Web service discovery and substitution. Traditional keyword-based search is not successful in this context: its results do not take the structure of Web services into account, and it considers only the identifiers of the Web Services Description Language (WSDL) interface elements. Semantics-based methods (WSDLS, OWLS, SAWSDL, ...), which augment the WSDL description of a Web service with a semantic description, partially address this problem, but their complexity and difficulty delay their adoption in real cases. Measuring the similarity between Web service interfaces is the most suitable solution for this kind of problem: it makes it possible to classify the available Web services so as to know those that best match the searched profile and those that do not. Thus, the main goal of this work is to study the degree of similarity between any two Web services by offering a new method that is more effective than existing approaches.\nIn the domain of online advertising, our aim is to serve the best ad to a user who visits a certain webpage, to maximize the chance of a desired action being performed by this user after seeing the ad. 
While it is possible to generate a different prediction model for each user to tell if he/she will act on a given ad, the prediction result typically will be quite unreliable with huge variance, since the desired actions are extremely sparse, and the set of users is huge (hundreds of millions) and extremely volatile, i.e., a lot of new users are introduced every day, while others are no longer valid. In this paper we aim to improve the accuracy in finding users who will perform the desired action, by assigning each user to a cluster, where the number of clusters is much smaller than the number of users (on the order of hundreds). Two users fall into the same cluster if their event histories are similar. For this purpose, we modify the probabilistic latent semantic analysis (pLSA) model by assuming the independence of the user and the cluster id, given the history of events. This assumption helps us to identify the cluster of a new user without re-clustering all the users. We present the details of the algorithm we employed as well as the distributed implementation on Hadoop, and some initial results on the clusters that were generated by the algorithm.\nThis thesis contributes to ongoing research related to the categorical compositional model for natural language of Coecke, Sadrzadeh and Clark in three ways: Firstly, I propose a concrete instantiation of the abstract framework based on Frobenius algebras (joint work with Sadrzadeh). The theory addresses shortcomings of previous proposals, extends the coverage of the language, and is supported by experimental work that improves existing results. The proposed framework describes a new class of compositional models that find intuitive interpretations for a number of linguistic phenomena. Secondly, I propose and evaluate in practice a new compositional methodology which explicitly deals with the different levels of lexical ambiguity (joint work with Pulman). 
A concrete algorithm is presented, based on the separation of vector disambiguation from composition in an explicit prior step. Extensive experimental work shows that the proposed methodology indeed results in more accurate composite representations for the framework of Coecke et al. in particular and for every other class of compositional models in general. As a last contribution, I formalize the explicit treatment of lexical ambiguity in the context of the categorical framework by resorting to categorical quantum mechanics (joint work with Coecke). In the proposed extension, the concept of a distributional vector is replaced with that of a density matrix, which compactly represents a probability distribution over the different potential meanings of the specific word. Composition takes the form of quantum measurements, leading to interesting analogies between quantum physics and linguistics.\nTwo recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. 
However, the gains we see in BLEU do not translate to human judgments.\nWe describe a new instance-based learning algorithm called the Boundary Forest (BF) algorithm, which can be used for supervised and unsupervised learning. The algorithm builds a forest of trees whose nodes store previously seen examples. It can be presented with data points one at a time and updates itself incrementally, hence it is naturally online. Few instance-based algorithms have this property while being simultaneously fast, which the BF is. This is crucial for applications where one needs to respond to input data in real time. The number of children of each node is not set beforehand but obtained from the training procedure, which makes the algorithm very flexible with regard to what data manifolds it can learn. We test its generalization performance and speed on a range of benchmark datasets and detail in which settings it outperforms the state of the art. Empirically we find that training time scales as O(DNlog(N)) and testing as O(Dlog(N)), where D is the dimensionality and N the amount of data.\nThe basic indicators of a researcher's productivity and impact are still the number of publications and their citation counts. These metrics are clear, straightforward, and easy to obtain. When a ranking of scholars is needed, for instance in grant, award, or promotion procedures, their use is the fastest and cheapest way of prioritizing some scientists over others. However, due to their nature, there is a danger of oversimplifying scientific achievements. Therefore, many other indicators have been proposed, including the usage of the PageRank algorithm known for the ranking of webpages and its modifications suited to citation networks. Nevertheless, this recursive method is computationally expensive, and even if it has the advantage of favouring prestige over popularity, its application should be well justified, particularly when compared to the standard citation counts. 
In this study, we analyze three large datasets of computer science papers in the categories of artificial intelligence, software engineering, and theory and methods and apply 12 different ranking methods to the citation networks of authors. We compare the resulting rankings with self-compiled lists of outstanding researchers selected as frequent editorial board members of prestigious journals in the field and conclude that there is no evidence of PageRank-based methods outperforming simple citation counts.\nThe Stable Matching Problem with Couples (SMP-C) is a ubiquitous real-world extension of the stable matching problem (SMP) involving complementarities. Although SMP can be solved in polynomial time, SMP-C is NP-complete. Hence, it is not clear which, if any, of the theoretical results surrounding the canonical SMP problem apply in this setting. In this paper, we use a recently developed SAT encoding to solve SMP-C exactly. This allows us to enumerate all stable matchings for any given instance of SMP-C. With this tool, we empirically evaluate some of the properties that have been hypothesized to hold for SMP-C.   We take particular interest in investigating if, as the size of the market grows, the percentage of instances with unique stable matchings also grows. While we did not find this trend among the random problem instances we sampled, we did find that the percentage of instances with a resident-optimal matching seems to more closely follow the trends predicted by previous conjectures. We also define and investigate resident Pareto optimal stable matchings, finding that, even though this is an important desideratum for the deferred acceptance style algorithms previously designed to solve SMP-C, they do not always find one.   We also investigate strategy-proofness for SMP-C, showing that even if only one stable matching exists, residents still have an incentive to misreport their preferences. 
However, if a problem has a resident-optimal stable matching, we show that residents cannot manipulate via truncation.\nFormal synthesis is the process of generating a program satisfying a high-level formal specification. In recent times, effective formal synthesis methods have been proposed based on the use of inductive learning. We refer to this class of methods that learn programs from examples as formal inductive synthesis. In this paper, we present a theoretical framework for formal inductive synthesis. We discuss how formal inductive synthesis differs from traditional machine learning. We then describe oracle-guided inductive synthesis (OGIS), a framework that captures a family of synthesizers that operate by iteratively querying an oracle. An instance of OGIS that has had much practical impact is counterexample-guided inductive synthesis (CEGIS). We present a theoretical characterization of CEGIS for learning any program that computes a recursive language. In particular, we analyze the relative power of CEGIS variants where the types of counterexamples generated by the oracle vary. We also consider the impact of bounded versus unbounded memory available to the learning algorithm. In the special case where the universe of candidate programs is finite, we relate the speed of convergence to the notion of teaching dimension studied in machine learning theory. Altogether, the results of the paper take a first step towards a theoretical foundation for the emerging field of formal inductive synthesis.\nWe focus on the problem of finding a non-linear classification function that lies in a Reproducing Kernel Hilbert Space (RKHS) both from the primal point of view (finding a perfect separator when one exists) and the dual point of view (giving a certificate of non-existence), with special focus on generalizations of two classical schemes - the Perceptron (primal) and Von-Neumann (dual) algorithms.   
We cast our problem as one of maximizing the regularized normalized hard-margin ($\\rho$) in an RKHS and use the Representer Theorem to rephrase it in terms of a Mahalanobis dot-product/semi-norm associated with the kernel's (normalized and signed) Gram matrix. We derive an accelerated smoothed algorithm with a convergence rate of $\\tfrac{\\sqrt {\\log n}}{\\rho}$ given $n$ separable points, which is strikingly similar to the classical kernelized Perceptron algorithm whose rate is $\\tfrac1{\\rho^2}$. When no such classifier exists, we prove a version of Gordan's separation theorem for RKHSs, and give a reinterpretation of negative margins. This allows us to give guarantees for a primal-dual algorithm that halts in $\\min\\{\\tfrac{\\sqrt n}{|\\rho|}, \\tfrac{\\sqrt n}{\\epsilon}\\}$ iterations with a perfect separator in the RKHS if the primal is feasible or a dual $\\epsilon$-certificate of near-infeasibility.\nInteresting theoretical associations have been established by recent papers between the fields of active learning and stochastic convex optimization due to the common role of feedback in sequential querying mechanisms. In this paper, we continue this thread in two parts by exploiting these relations for the first time to yield novel algorithms in both fields, further motivating the study of their intersection. First, inspired by a recent optimization algorithm that was adaptive to unknown uniform convexity parameters, we present a new active learning algorithm for one-dimensional thresholds that can yield minimax rates by adapting to unknown noise parameters. 
Next, we show that one can perform $d$-dimensional stochastic minimization of smooth uniformly convex functions when only granted oracle access to noisy gradient signs along any coordinate instead of real-valued gradients, by using a simple randomized coordinate descent procedure where each line search can be solved by $1$-dimensional active learning, provably achieving the same error convergence rate as having the entire real-valued gradient. Combining these two parts yields an algorithm that solves stochastic convex optimization of uniformly convex and smooth functions using only noisy gradient signs by repeatedly performing active learning, achieves optimal rates and is adaptive to all unknown convexity and smoothness parameters.\nA fundamental challenge in developing high-impact machine learning technologies is balancing the need to model rich, structured domains with the ability to scale to big data. Many important problem areas are both richly structured and large scale, from social and biological networks, to knowledge graphs and the Web, to images, video, and natural language. In this paper, we introduce two new formalisms for modeling structured data, and show that they can both capture rich structure and scale to big data. The first, hinge-loss Markov random fields (HL-MRFs), is a new kind of probabilistic graphical model that generalizes different approaches to convex inference. We unite three approaches from the randomized algorithms, probabilistic graphical models, and fuzzy logic communities, showing that all three lead to the same inference objective. We then define HL-MRFs by generalizing this unified objective. The second new formalism, probabilistic soft logic (PSL), is a probabilistic programming language that makes HL-MRFs easy to define using a syntax based on first-order logic. 
We introduce an algorithm for inferring most-probable variable assignments (MAP inference) that is much more scalable than general-purpose convex optimization methods, because it uses message passing to take advantage of sparse dependency structures. We then show how to learn the parameters of HL-MRFs. The learned HL-MRFs are as accurate as analogous discrete models, but much more scalable. Together, these algorithms enable HL-MRFs and PSL to model rich, structured data at scales not previously possible.\nWriting rap lyrics requires both creativity to construct a meaningful, interesting story and lyrical skills to produce complex rhyme patterns, which form the cornerstone of good flow. We present a rap lyrics generation method that captures both of these aspects. First, we develop a prediction model to identify the next line of existing lyrics from a set of candidate next lines. This model is based on two machine-learning techniques: the RankSVM algorithm and a deep neural network model with a novel structure. Results show that the prediction model can identify the true next line among 299 randomly selected lines with an accuracy of 17%, i.e., over 50 times better than random guessing. Second, we employ the prediction model to combine lines from existing songs, producing lyrics with rhyme and meaning. An evaluation of the produced lyrics shows that in terms of quantitative rhyme density, the method outperforms the best human rappers by 21%. The rap lyrics generator has been deployed as an online tool called DeepBeat, and the performance of the tool has been assessed by analyzing its usage logs. This analysis shows that machine-learned rankings correlate with user preferences.\nWe recently performed cognitive experiments on conjunctions and negations of two concepts with the aim of investigating the combination problem of concepts. Our experiments confirmed the deviations (conceptual vagueness, underextension, overextension, etc.) 
from the rules of classical (fuzzy) logic and probability theory observed by several scholars in concept theory, while our data were successfully modeled in a quantum-theoretic framework developed by ourselves. In this paper, we isolate a new, very stable and systematic pattern of violation of classicality that occurs in concept combinations. In addition, the strength and regularity of this non-classical effect lead us to believe that it occurs at a more fundamental level than the deviations observed up to now. It is our opinion that we have identified a deep non-classical mechanism determining not only how concepts are combined but, rather, how they are formed. We show that this effect can be faithfully modeled in a two-sector Fock space structure, and that it can be exactly explained by assuming that human thought is the superposition of two processes, a 'logical reasoning', guided by 'logic', and a 'conceptual reasoning' guided by 'emergence', and that the latter generally prevails over the former. All these findings provide new fundamental support for our quantum-theoretic approach to human cognition.\nThe paper introduces a new modular action language, ALM, and illustrates the methodology of its use. It is based on the approach of Gelfond and Lifschitz (1993; 1998) in which a high-level action language is used as a front end for a logic programming system description. The resulting logic programming representation is used to perform various computational tasks. The methodology based on existing action languages works well for small and even medium-size systems, but is not meant to deal with larger systems that require structuring of knowledge. ALM is meant to remedy this problem. Structuring of knowledge in ALM is supported by the concepts of module (a formal description of a specific piece of knowledge packaged as a unit), module hierarchy, and library, and by the division of a system description of ALM into two parts: theory and structure. 
A theory consists of one or more modules with a common theme, possibly organized into a module hierarchy based on a dependency relation. It contains declarations of sorts, attributes, and properties of the domain together with axioms describing them. Structures are used to describe the domain's objects. These features, together with the means for defining classes of a domain as special cases of previously defined ones, facilitate the stepwise development, testing, and readability of a knowledge base, as well as the creation of knowledge representation libraries. To appear in Theory and Practice of Logic Programming (TPLP).\nThe proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages, respectively, and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine-grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine-grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. 
These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and infeasible.\nThe choice of approximate posterior distribution is one of the core problems in variational inference. Most applications of variational inference employ simple families of posterior approximations in order to allow for efficient inference, focusing on mean-field or other simple structured approximations. This restriction has a significant impact on the quality of inferences made using variational methods. We introduce a new approach for specifying flexible, arbitrarily complex and scalable approximate posterior distributions. Our approximations are distributions constructed through a normalizing flow, whereby a simple initial density is transformed into a more complex one by applying a sequence of invertible transformations until a desired level of complexity is attained. We use this view of normalizing flows to develop categories of finite and infinitesimal flows and provide a unified view of approaches for constructing rich posterior approximations. We demonstrate that the theoretical advantages of having posteriors that better match the true posterior, combined with the scalability of amortized variational approaches, provide a clear improvement in performance and applicability of variational inference.\nShort Message Service (SMS) based Information Systems (SMSbIS) provide an excellent alternative to the traditional approach of obtaining specific information by direct (through phone) or indirect (IVRS, Web, Email) probing. Information and communication technology and far-reaching mobile penetration have opened this new research trend. A number of key players in the search industry, including Microsoft and Google, are attracted by the expected increase in the volume of use of such applications. The wide range of applications and their public acceptance has motivated researchers to work in this research domain. 
Several applications have emerged, such as SMS-based information access using database management services, SMS-based information retrieval through the internet (search engines), SMS-based information extraction, question answering, and image retrieval. To understand the functionality involved in these systems, we planned and carried out an extensive review of a few of these SMSbISs. These systems are classified into four categories based on the objectives and domains of the applications. As a result of this study, a well-structured functional model is presented. The model is evaluated along different dimensions. In addition, this paper compiles a chronology of research and development progress in this emerging field. This review should help researchers and developers understand the technical aspects of the field, and the functional framework presented here should be useful to system designers who design and develop an SMS-based Information System for a specific domain.\nThe Web has made it possible to harness human cognition en masse to achieve new capabilities. Some of these successes are well known; for example, Wikipedia has become the go-to place for basic information on all things; Duolingo engages millions of people in real-life translation of text, while simultaneously teaching them to speak foreign languages; and fold.it has enabled public-driven scientific discoveries by recasting complex biomedical challenges into popular online puzzle games. These and other early successes hint at the tremendous potential for future crowd-powered capabilities for the benefit of health, education, science, and society. In the process, a new field called Human Computation has emerged to better understand, replicate, and improve upon these successes through scientific research. 
Human Computation refers to the science that underlies online crowd-powered systems and was the topic of a recent visioning activity in which a representative cross-section of researchers, industry practitioners, visionaries, funding agency representatives, and policy makers came together to understand what makes crowd-powered systems successful. Teams of experts considered past, present, and future human computation systems to explore which kinds of crowd-powered systems have the greatest potential for societal impact and which kinds of research will best enable the efficient development of new crowd-powered systems to achieve this impact. This report summarizes the products and findings of those activities as well as the unconventional process and activities employed by the workshop, which were informed by human computation research.\nStackelberg security game models and associated computational tools have seen deployment in a number of high-consequence security settings, such as LAX canine patrols and the Federal Air Marshal Service. These models focus on isolated systems with only one defender, despite being part of a more complex system with multiple players. Furthermore, many real systems such as transportation networks and the power grid exhibit interdependencies between targets and, consequently, between decision makers jointly charged with protecting them. To understand such multidefender strategic interactions present in security, we investigate game theoretic models of security games with multiple defenders. Unlike most prior analysis, we focus on situations in which each defender must protect multiple targets, so that even a single defender's best response decision is, in general, highly non-trivial. We start with an analytical investigation of multidefender security games with independent targets, offering an equilibrium and price-of-anarchy analysis of three models with increasing generality. 
In all models, we find that defenders have an incentive to over-protect targets, at times significantly. Additionally, in the simpler models, we find that the price of anarchy is unbounded, linearly increasing both in the number of defenders and the number of targets per defender. Considering interdependencies among targets, we develop a novel mixed-integer linear programming formulation to compute a defender's best response, and make use of this formulation in approximating Nash equilibria of the game. We apply this approach towards computational strategic analysis of several models of networks representing interdependencies, including real-world power networks. Our analysis shows how network structure and the probability of failure spread determine the propensity of defenders to over- or under-invest in security.\nCloud controllers aim at responding to application demands by automatically scaling the compute resources at runtime to meet performance guarantees and minimize resource costs. Existing cloud controllers often resort to scaling strategies that are codified as a set of adaptation rules. However, for a cloud provider, applications running on top of the cloud infrastructure are more or less black boxes, making it difficult at design time to define optimal or pre-emptive adaptation rules. Thus, the burden of taking adaptation decisions is often delegated to the cloud application. Yet, in most cases, application developers in turn have limited knowledge of the cloud infrastructure. In this paper, we propose learning adaptation rules during runtime. To this end, we introduce FQL4KE, a self-learning fuzzy cloud controller. In particular, FQL4KE learns and modifies fuzzy rules at runtime. The benefit is that for designing cloud controllers, we do not have to rely solely on precise design-time knowledge, which may be difficult to acquire. 
FQL4KE empowers users to specify cloud controllers by simply adjusting weights representing priorities in system goals instead of specifying complex adaptation rules. The applicability of FQL4KE has been experimentally assessed as part of the cloud application framework ElasticBench. The experimental results indicate that FQL4KE outperforms our previously developed fuzzy controller without learning mechanisms and the native Azure auto-scaling.\nAchieving efficient and scalable exploration in complex domains poses a major challenge in reinforcement learning. While Bayesian and PAC-MDP approaches to the exploration problem offer strong formal guarantees, they are often impractical in higher dimensions due to their reliance on enumerating the state-action space. Hence, exploration in complex domains is often performed with simple epsilon-greedy methods. In this paper, we consider the challenging Atari games domain, which requires processing raw pixel inputs and delayed rewards. We evaluate several more sophisticated exploration strategies, including Thompson sampling and Boltzmann exploration, and propose a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics. By parameterizing our learned model with a neural network, we are able to develop a scalable and efficient approach to exploration bonuses that can be applied to tasks with complex, high-dimensional state spaces. In the Atari domain, our method provides the most consistent improvement across a range of games that pose a major challenge for prior methods. In addition to raw game scores, we also develop an AUC-100 metric for the Atari Learning domain to evaluate the impact of exploration on this benchmark.\nIn this paper we propose an approach for constructing partitionings of hard variants of the Boolean satisfiability problem (SAT). Such partitionings can be used for solving corresponding SAT instances in parallel. 
For the same SAT instance one can construct different partitionings, each of which is a set of simplified versions of the original SAT instance. The effectiveness of an arbitrary partitioning is determined by the total time needed to solve all SAT instances in it. We suggest an approach, based on the Monte Carlo method, for estimating the processing time of an arbitrary partitioning. With each partitioning we associate a point in a special finite search space. The estimated effectiveness of a particular partitioning is the value of the predictive function at the corresponding point of this space. The search for an effective partitioning can thus be formulated as the problem of optimizing the predictive function. We use metaheuristic algorithms (simulated annealing and tabu search) to move from point to point in the search space. In our computational experiments we found partitionings for SAT instances encoding the inversion of some cryptographic functions. Several of these SAT instances with realistic predicted solving times were successfully solved on a computing cluster and in the volunteer computing project SAT@home. The solving times agree well with the estimates obtained by the proposed method.\nAs software systems are getting increasingly connected, there is a need for equipping nonmonotonic logic programs with access to external sources that are possibly remote and may contain information in heterogeneous formats. To cater for this need, HEX programs were designed as a generalization of answer set programs with an API style interface that allows access to arbitrary external sources, providing great flexibility. Efficient evaluation of such programs, however, is challenging, and it requires interleaving external computation and model building; deciding when to switch between these tasks is difficult, and existing approaches have limited scalability in many real-world application scenarios. 
We present a new approach for the evaluation of logic programs with external source access, which is based on a configurable framework for dividing the non-ground program into possibly overlapping smaller parts called evaluation units. The latter will be processed by interleaving external evaluation and model building using an evaluation graph and a model graph, respectively, and by combining intermediate results. Experiments with our prototype implementation show a significant improvement compared to previous approaches. While designed for HEX-programs, the new evaluation approach may be deployed to related rule-based formalisms as well.\nOur goal is to answer elementary-level science questions using knowledge extracted automatically from science textbooks, expressed in a subset of first-order logic. Given the incomplete and noisy nature of these automatically extracted rules, Markov Logic Networks (MLNs) seem a natural model to use, but the exact way of leveraging MLNs is by no means obvious. We investigate three ways of applying MLNs to our task. In the first, we simply use the extracted science rules directly as MLN clauses. Unlike typical MLN applications, our domain has long and complex rules, leading to an unmanageable number of groundings. We exploit the structure present in hard constraints to improve tractability, but the formulation remains ineffective. In the second approach, we instead interpret science rules as describing prototypical entities, thus mapping rules directly to grounded MLN assertions, whose constants are then clustered using existing entity resolution methods. This drastically simplifies the network, but still suffers from brittleness. Finally, our third approach, called Praline, uses MLNs to align the lexical elements as well as define and control how inference should be performed in this task. 
Our experiments, demonstrating a 15\\% accuracy boost and a 10x reduction in runtime, suggest that the flexibility and different inference semantics of Praline are a better fit for the natural language question answering task.\nIn real clustering applications, proximity data, in which only pairwise similarities or dissimilarities are known, is more general than object data, in which each pattern is described explicitly by a list of attributes. Medoid-based clustering algorithms, which assume the prototypes of classes are objects, are of great value for partitioning relational data sets. In this paper, a new prototype-based clustering method named Evidential C-Medoids (ECMdd), an extension of Fuzzy C-Medoids (FCMdd) within the theoretical framework of belief functions, is proposed. In ECMdd, medoids are utilized as the prototypes to represent the detected classes, including specific classes and imprecise classes. Specific classes are for the data which are distinctly far from the prototypes of other classes, while imprecise classes accept the objects that may be close to the prototypes of more than one class. This soft decision mechanism could make the clustering results more cautious and reduce the misclassification rates. Experiments on synthetic and real data sets are used to illustrate the performance of ECMdd. The results show that ECMdd captures the uncertainty in the internal data structure well. Moreover, it is more robust to initialization than FCMdd.\nMany algorithms have been parallelized successfully on the Intel Xeon Phi coprocessor, especially those with regular, balanced, and predictable data access patterns and instruction flows. Irregular and unbalanced algorithms are harder to parallelize efficiently. They are, for instance, present in artificial intelligence search algorithms such as Monte Carlo Tree Search (MCTS). 
In this paper we study the scaling behavior of MCTS, on a highly optimized real-world application, on real hardware. The Intel Xeon Phi allows shared memory scaling studies up to 61 cores and 244 hardware threads. We compare work-stealing (Cilk Plus and TBB) and work-sharing (FIFO scheduling) approaches. Interestingly, we find that a straightforward thread pool with a work-sharing FIFO queue shows the best performance. A crucial element for this high performance is control of the grain size, an approach that we call Grain Size Controlled Parallel MCTS. Our subsequent comparison with the Xeon CPUs shows an even clearer distinction in performance between different threading libraries. We achieve, to the best of our knowledge, the fastest implementation of a parallel MCTS on the 61-core Intel Xeon Phi using a real application (a speedup of 47 relative to a sequential run).\nReal-time bidding (RTB) has become one of the largest online advertising markets in the world. Today the bid price per ad impression is typically decided by the expected value of how it can lead to a desired action event (e.g., registering an account or placing a purchase order) to the advertiser. However, this industry-standard approach to deciding the bid price does not consider the actual effect of the ad shown to the user, which should be measured based on the performance lift among users who have been or have not been exposed to a certain treatment of ads. In this paper, we propose a new bidding strategy and prove that if the bid price is decided based on the performance lift rather than absolute performance value, advertisers can actually gain more action events. We describe the modeling methodology to predict the performance lift and demonstrate the actual performance gain through blind A/B tests with real ad campaigns in an industry-leading Demand-Side Platform (DSP). We also discuss the relationship between attribution models and bidding strategies. 
We prove that, to move the DSPs to bid based on performance lift, they should be rewarded according to the relative performance lift they contribute.\nThe visualization of an image collection is the process of displaying a collection of images on a screen under some specific layout requirements. This paper focuses on an important problem that is not well addressed by the previous methods: visualizing image collections into arbitrary layout shapes while arranging images according to user-defined semantic or visual correlations (e.g., color or object category). To this end, we first propose a property-based tree construction scheme to organize images of a collection into a tree structure according to user-defined properties. In this way, images can be adaptively placed with the desired semantic or visual correlations in the final visualization layout. Then, we design a two-step visualization optimization scheme to further optimize image layouts. As a result, multiple layout effects including layout shape and image overlap ratio can be effectively controlled to guarantee a satisfactory visualization. Finally, we also propose a tree-transfer scheme such that visualization layouts can be adaptively changed when users select different \"images of interest\". We demonstrate the effectiveness of our proposed approach through comparisons with state-of-the-art visualization techniques.\nPredicting the energy consumption of buildings has been a major concern in recent years, and much effort has gone into improving the energy management of buildings. In particular, predicting building energy consumption is essential for the energy operator to build an optimal operating strategy, which could be integrated into the building's energy management system (BEMS). This paper proposes a prediction model for building energy consumption using a support vector machine (SVM). Data-driven models such as SVM are very sensitive to the selection of training data. 
Thus, a relevant-day data selection method based on Dynamic Time Warping is used to train the SVM model. In addition, to account for the thermal inertia of the building, a pseudo-dynamic model is applied, since it takes into account information on the transition of energy consumption effects and the occupancy profile. Both the relevant-day data selection model and the whole-training-data model are applied to a case study of an office building at Ecole des Mines de Nantes, France. The results showed that the support vector machine based on the relevant data selection method is able to predict the energy consumption of the building with higher accuracy than whole-data training. In addition, the relevant data selection method is computationally cheaper (around 8 minutes of training time) than whole-data training (around 31 hours for weekends and 116 hours for working days), which makes a realistic control implementation for online systems possible as well.\nAlgorithmic decision-making systems are ubiquitous across a wide variety of online as well as offline services. These systems rely on complex learning methods and vast amounts of data to optimize the service functionality, satisfaction of the end user and profitability. However, there is a growing concern that these automated decisions can lead, even in the absence of intent, to a lack of fairness, i.e., their outcomes can disproportionately hurt (or benefit) particular groups of people sharing one or more sensitive attributes (e.g., race, sex). In this paper, we introduce a flexible mechanism to design fair classifiers by leveraging a novel intuitive measure of decision boundary (un)fairness. We instantiate this mechanism with two well-known classifiers, logistic regression and support vector machines, and show on real-world data that our mechanism allows for fine-grained control over the degree of fairness, often at a small cost in terms of accuracy.\nDisjunctive Answer Set Programming is a powerful declarative programming paradigm with complexity beyond NP. 
Identifying classes of programs for which the consistency problem is in NP is of interest from the theoretical standpoint and can potentially lead to improvements in the design of answer set programming solvers. One such class consists of dual-normal programs, where the number of positive body atoms in proper rules is at most one. Unlike other classes of programs, dual-normal programs have received little attention so far. In this paper we study this class. We relate dual-normal programs to propositional theories and to normal programs by presenting several inter-translations. With the translation from dual-normal to normal programs at hand, we introduce the novel class of body-cycle free programs, which are in many respects dual to head-cycle free programs. We establish the expressive power of dual-normal programs in terms of SE- and UE-models, and compare them to normal programs. We also discuss the complexity of deciding whether dual-normal programs are strongly and uniformly equivalent.\nThe Support Vector Machine (SVM) method has been widely used in numerous classification tasks. The main idea of this algorithm is based on the principle of margin maximization to find a hyperplane which separates the data into two different classes. In this paper, SVM is applied to the phoneme recognition task. However, in many real-world problems, each phoneme in the data set for recognition problems may differ in the degree of significance due to noise, inaccuracies, or abnormal characteristics; all these problems can lead to inaccuracies in the prediction phase. Unfortunately, the standard formulation of SVM does not take into account all these problems and, in particular, the variation in the speech input. This paper presents a new formulation of SVM (B-SVM) that attributes to each phoneme a confidence degree computed based on its geometric position in the space. Then, this degree is used in order to strengthen the class membership of the tested phoneme. 
Hence, we introduce a reformulation of the standard SVM that incorporates the degree of belief. Experimental results on the TIMIT database show the effectiveness of the proposed method B-SVM on a phoneme recognition problem.\nWords (phrases or symbols) play a key role in human life. Word (phrase or symbol) representation is the fundamental problem for knowledge representation and understanding. A word (phrase or symbol) usually represents a name of a category. However, it is always a challenge to represent a category so that it can be easily understood. In this paper, a new representation for a category is discussed, which can be considered a generalization of the classic set. In order to reduce representation complexity, the economy principle of category representation is proposed. The proposed category representation provides a powerful tool for analyzing conceptual systems, relations between words, communication, knowledge, and situations. More specifically, the conceptual system, word relations and communication are mathematically defined and classified into types such as the ideal conceptual system, perfect communication and so on; the relation between words and sentences is also studied, which shows that knowledge is words. Furthermore, how conceptual systems and words depend on situations is presented, and how truth is defined is also discussed.\nApproximate Newton methods are a standard optimization tool which aim to maintain the benefits of Newton's method, such as a fast rate of convergence, whilst alleviating its drawbacks, such as computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov Decision Processes (MDPs). We first analyse the structure of the Hessian of the objective function for MDPs. We show that, like the gradient, the Hessian exhibits useful structure in the context of MDPs and we use this analysis to motivate two Gauss-Newton methods for MDPs. 
Like the Gauss-Newton method for non-linear least squares, these methods involve approximating the Hessian by ignoring certain terms in the Hessian which are difficult to estimate. The approximate Hessians possess desirable properties, such as negative definiteness, and we demonstrate several important performance guarantees including guaranteed ascent directions, invariance to affine transformation of the parameter space, and convergence guarantees. We finally provide a unifying perspective of key policy search algorithms, demonstrating that our second Gauss-Newton algorithm is closely related to both the EM-algorithm and natural gradient ascent applied to MDPs, but performs significantly better in practice on a range of challenging domains.\nThe lack of reliable data in developing countries is a major obstacle to sustainable development, food security, and disaster relief. Poverty data, for example, is typically scarce, sparse in coverage, and labor-intensive to obtain. Remote sensing data such as high-resolution satellite imagery, on the other hand, is becoming increasingly available and inexpensive. Unfortunately, such data is highly unstructured and currently no techniques exist to automatically extract useful insights to inform policy decisions and help direct humanitarian efforts. We propose a novel machine learning approach to extract large-scale socioeconomic indicators from high-resolution satellite imagery. The main challenge is that training data is very scarce, making it difficult to apply modern techniques such as Convolutional Neural Networks (CNN). We therefore propose a transfer learning approach where nighttime light intensities are used as a data-rich proxy. We train a fully convolutional CNN model to predict nighttime lights from daytime imagery, simultaneously learning features that are useful for poverty prediction. 
The model learns filters identifying different terrains and man-made structures, including roads, buildings, and farmlands, without any supervision beyond nighttime lights. We demonstrate that these learned features are highly informative for poverty mapping, even approaching the predictive performance of survey data collected in the field.\nThis paper examines the role and efficiency of non-convex loss functions for binary classification problems. In particular, we investigate how to design a simple and effective boosting algorithm that is robust to outliers in the data. The role of a particular non-convex loss in prediction accuracy varies depending on the diminishing tail properties of the gradient of the loss (i.e., the ability of the loss to efficiently adapt to the outlying data), the local convex properties of the loss, and the proportion of contaminated data. In order to use these properties efficiently, we propose a new family of non-convex losses named $\\gamma$-robust losses. Moreover, we present a new boosting framework, {\\it Arch Boost}, designed for augmenting the existing work such that its corresponding classification algorithm is significantly more adaptable to unknown data contamination. Along with the Arch Boosting framework, the non-convex losses lead to a new class of boosting algorithms, named adaptive robust boosting (ARB). Furthermore, we present theoretical examples that demonstrate the robustness properties of the proposed algorithms. In particular, we develop a new breakdown point analysis and a new influence function analysis that demonstrate gains in robustness. Moreover, we present new theoretical results, based only on local curvatures, which may be used to establish statistical and optimization properties of the proposed Arch boosting algorithms with highly non-convex loss functions. 
Extensive numerical calculations are used to illustrate these theoretical properties and reveal advantages over the existing boosting methods when data exhibits a number of outliers.\nPurpose: In this paper, we investigate a framework for interactive brain tumor segmentation which, at its core, treats the problem of interactive brain tumor segmentation as a machine learning problem.   Methods: This method has an advantage over typical machine learning methods for this task where generalization is made across brains. The problem with these methods is that they need to deal with intensity bias correction and other MRI-specific noise. In this paper, we avoid these issues by approaching the problem as one of within brain generalization. Specifically, we propose a semi-automatic method that segments a brain tumor by training and generalizing within that brain only, based on some minimum user interaction.   Conclusion: We investigate how adding spatial feature coordinates (i.e. $i$, $j$, $k$) to the intensity features can significantly improve the performance of different classification methods such as SVM, kNN and random forests. This would only be possible within an interactive framework. We also investigate the use of a more appropriate kernel and the adaptation of hyper-parameters specifically for each brain.   Results: As a result of these experiments, we obtain an interactive method whose results reported on the MICCAI-BRATS 2013 dataset are the second most accurate compared to published methods, while using significantly less memory and processing power than most state-of-the-art methods.\nAnswer set programming is a declarative programming paradigm oriented towards difficult combinatorial search problems. A fundamental task in answer set programming is to compute stable models, i.e., solutions of logic programs. Answer set solvers are the programs that perform this task. 
The problem of deciding whether a disjunctive program has a stable model is $\\Sigma^P_2$-complete. The high complexity of reasoning within disjunctive logic programming is responsible for there being only a few solvers capable of dealing with such programs, namely DLV, GnT, Cmodels, CLASP and WASP. In this paper we show that transition systems introduced by Nieuwenhuis, Oliveras, and Tinelli to model and analyze satisfiability solvers can be adapted for disjunctive answer set solvers. Transition systems give a unifying perspective and bring clarity to the description and comparison of solvers. They can be effectively used for analyzing, comparing and proving correctness of search algorithms as well as inspiring new ideas in the design of disjunctive answer set solvers. In this light, we introduce a general template, which accounts for major techniques implemented in disjunctive solvers. We then illustrate how this general template captures the solvers DLV, GnT and Cmodels. We also show how this framework provides a convenient tool for designing new solving algorithms by means of combinations of techniques employed in different solvers.\nTransferring knowledge from prior source tasks in solving a new target task can be useful in several learning applications. The application of transfer poses two serious challenges which have not been adequately addressed. First, the agent should be able to avoid negative transfer, which happens when the transfer hampers or slows down the learning instead of helping it. Second, the agent should be able to selectively transfer, which is the ability to select and transfer from different and multiple source tasks for different parts of the state space of the target task. We propose A2T (Attend, Adapt and Transfer), an attentive deep architecture which adapts and transfers from these source tasks. Our model is generic enough to effect transfer of either policies or value functions. 
Empirical evaluations on different learning algorithms show that A2T is an effective architecture for transfer by being able to avoid negative transfer while transferring selectively from multiple source tasks in the same domain.\nMuch of the world's data is streaming, time-series data, where anomalies give significant information in critical situations; examples abound in domains such as finance, IT, security, medical, and energy. Yet detecting anomalies in streaming data is a difficult task, requiring detectors to process data in real time, not in batches, and learn while simultaneously making predictions. There are no benchmarks to adequately test and score the efficacy of real-time anomaly detectors. Here we propose the Numenta Anomaly Benchmark (NAB), which attempts to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data. The perfect detector would detect all anomalies as soon as possible, trigger no false alarms, work with real-world time-series data across a variety of domains, and automatically adapt to changing statistics. Rewarding these characteristics is formalized in NAB, using a scoring algorithm designed for streaming data. NAB evaluates detectors on a benchmark dataset with labeled, real-world time-series data. We present these components, and give results and analyses for several open source, commercially-used algorithms. The goal for NAB is to provide a standard, open source framework with which the research community can compare and evaluate different algorithms for detecting anomalies in streaming data.\nThe increasing complexity of deep learning architectures is resulting in training times of weeks or even months. 
This slow training is due in part to vanishing gradients, in which the gradients used by back-propagation are extremely large for weights connecting deep layers (layers near the output layer), and extremely small for shallow layers (near the input layer); this results in slow learning in the shallow layers. Additionally, it has also been shown that in highly non-convex problems, such as deep neural networks, there is a proliferation of high-error low curvature saddle points, which slows down learning dramatically. In this paper, we attempt to overcome the two above problems by proposing an optimization method for training deep neural networks which uses learning rates which are both specific to each layer in the network and adaptive to the curvature of the function, increasing the learning rate at low curvature points. This enables us to speed up learning in the shallow layers of the network and quickly escape high-error low curvature saddle points. We test our method on standard image classification datasets such as MNIST, CIFAR10 and ImageNet, and demonstrate that our method increases accuracy as well as reduces the required training time over standard algorithms.\nThe only rigorous approaches for achieving a numerical proof of optimality in global optimization are interval-based methods that interleave branching of the search-space and pruning of the subdomains that cannot contain an optimal solution. State-of-the-art solvers generally integrate local optimization algorithms to compute a good upper bound of the global minimum over each subspace. In this document, we propose a cooperative framework in which interval methods cooperate with evolutionary algorithms. The latter are stochastic algorithms in which a population of candidate solutions iteratively evolves in the search-space to reach satisfactory solutions.   
Within our cooperative solver Charibde, the evolutionary algorithm and the interval-based algorithm run in parallel and exchange bounds, solutions and search-space in an advanced manner via message passing. A comparison of Charibde with state-of-the-art interval-based solvers (GlobSol, IBBA, Ibex) and NLP solvers (Couenne, BARON) on a benchmark of difficult COCONUT problems shows that Charibde is highly competitive against non-rigorous solvers and converges faster than rigorous solvers by an order of magnitude.\nWe analyze the RI-TIMEXes in temporally annotated corpora and propose two hypotheses regarding the normalization of RI-TIMEXes in the clinical narrative domain: the anchor point hypothesis and the anchor relation hypothesis. We annotate the RI-TIMEXes in three corpora to study the characteristics of RI-TIMEXes in different domains. This informed the design of our RI-TIMEX normalization system for the clinical domain, which consists of an anchor point classifier, an anchor relation classifier and a rule-based RI-TIMEX text span parser. We experiment with different feature sets and perform error analysis for each system component. The annotation confirmed the hypotheses that we can simplify the RI-TIMEX normalization task using two multi-label classifiers. Our system achieves anchor point classification, anchor relation classification and rule-based parsing accuracy of 74.68%, 87.71% and 57.2% (82.09% under relaxed matching criteria) respectively on the held-out test set of the 2012 i2b2 temporal relation challenge. Experiments with feature sets reveal some interesting findings, such as that the verbal tense feature does not inform the anchor relation classification in clinical narratives as much as the tokens near the RI-TIMEX. Error analysis shows that underrepresented anchor point and anchor relation classes are difficult to detect. We formulate the RI-TIMEX normalization problem as a pair of multi-label classification problems. 
Considering only the RI-TIMEX extraction and normalization, the system achieves statistically significant improvement over the RI-TIMEX results of the best systems in the 2012 i2b2 challenge.\nLatent variable models have accumulated a considerable amount of interest from the industry and academia for their versatility in a wide range of applications. A large amount of effort has been made to develop systems that scale these models up, in the hope of applying them to industry-scale data. In this paper, we describe a system that operates at a scale orders of magnitude higher than previous works, and an order of magnitude faster than the state-of-the-art system at the same scale, while also showing more robustness and more accurate results.   Our system uses a number of advances in distributed inference: high performance in synchronization of sufficient statistics with a relaxed consistency model; fast sampling, using the Metropolis-Hastings-Walker method to overcome dense generative models; statistical modeling, moving beyond Latent Dirichlet Allocation (LDA) to Pitman-Yor distributions (PDP) and Hierarchical Dirichlet Process (HDP) models; sophisticated parameter projection schemes, to resolve the conflicts within the constraint between parameters arising from the relaxed consistency model.   This work significantly extends the domain of applicability of what is commonly known as the Parameter Server. We obtain results with up to hundreds of billions of tokens, thousands of topics, and a vocabulary of a few million token-types, using up to 60,000 processor cores operating on a production cluster of a large Internet company. 
This demonstrates the feasibility of scaling to problems orders of magnitude larger than any previously published work.\nEvidence for small amounts of very hot plasma has been found in active regions and might indicate impulsive heating, released at spatial scales smaller than the cross section of a single loop. We investigate the heating and substructure of coronal loops in the core of one such active region by analyzing the light curves in the smallest resolution elements of solar observations in two EUV channels (94 A and 335 A) from the Atmospheric Imaging Assembly on-board the Solar Dynamics Observatory. We model the evolution of a bundle of strands heated by a storm of nanoflares by means of a hydrodynamic 0D loop model (EBTEL). The light curves obtained from the random combination of those of single strands are compared to the observed light curves either in a single pixel or in a row of pixels, simultaneously in the two channels and using two independent methods: an artificial intelligence system (Probabilistic Neural Network, PNN) and a simple cross-correlation technique. We explore the space of the parameters to constrain the distribution of the heat pulses, their duration and their spatial size, and, as a feedback on the data, their signatures on the light curves. From both methods the best agreement is obtained for a relatively large population of events (1000) with a short duration (less than 1 min) and a relatively shallow distribution (power law with index 1.5) in a limited energy range (1.5 decades). The feedback on the data indicates that bumps in the light curves, especially in the 94 A channel, are signatures of a heating excess that occurred a few minutes before.\nRecently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. 
However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (PAC guarantee). In this paper, we derive an upper PAC bound $\\tilde O(\\frac{|\\mathcal S|^2 |\\mathcal A| H^2}{\\epsilon^2} \\ln\\frac 1 \\delta)$ and a lower PAC bound $\\tilde \\Omega(\\frac{|\\mathcal S| |\\mathcal A| H^2}{\\epsilon^2} \\ln \\frac 1 {\\delta + c})$ that match up to log-terms and an additional linear dependency on the number of states $|\\mathcal S|$. The lower bound is the first of its kind for this setting. Our upper bound leverages Bernstein's inequality to improve on previous bounds for episodic finite-horizon MDPs which have a time-horizon dependency of at least $H^3$.\nWe consider the problem of learning causal networks with interventions, when each intervention is limited in size under Pearl's Structural Equation Model with independent errors (SEM-IE). The objective is to minimize the number of experiments to discover the causal directions of all the edges in a causal graph. Previous work has focused on the use of separating systems for complete graphs for this task. We prove that any deterministic adaptive algorithm needs to be a separating system in order to learn complete graphs in the worst case. In addition, we present a novel separating system construction, whose size is close to optimal and is arguably simpler than previous work in combinatorics. We also develop a novel information theoretic lower bound on the number of interventions that applies in full generality, including for randomized adaptive learning algorithms.   
For general chordal graphs, we derive worst case lower bounds on the number of interventions. Building on observations about induced trees, we give a new deterministic adaptive algorithm to learn directions on any chordal skeleton completely. In the worst case, our achievable scheme is an $\\alpha$-approximation algorithm where $\\alpha$ is the independence number of the graph. We also show that there exist graph classes for which the sufficient number of experiments is close to the lower bound. In the other extreme, there are graph classes for which the required number of experiments is multiplicatively $\\alpha$ away from our lower bound.   In simulations, our algorithm almost always performs very close to the lower bound, while the approach based on separating systems for complete graphs is significantly worse for random chordal graphs.\nRecent applications of Stackelberg Security Games (SSG), from wildlife crime to urban crime, have employed machine learning tools to learn and predict adversary behavior using available data about defender-adversary interactions. Given these recent developments, this paper commits to an approach of directly learning the response function of the adversary. Using the PAC model, this paper lays a firm theoretical foundation for learning in SSGs (e.g., theoretically answer questions about the numbers of samples required to learn adversary behavior) and provides utility guarantees when the learned adversary model is used to plan the defender's strategy. The paper also aims to answer practical questions such as how much more data is needed to improve an adversary model's accuracy. Additionally, we explain a recently observed phenomenon that prediction accuracy of learned adversary behavior is not enough to discover the utility maximizing defender strategy. 
We provide four main contributions: (1) a PAC model of learning adversary response functions in SSGs; (2) PAC-model analysis of the learning of key, existing bounded rationality models in SSGs; (3) an entirely new approach to adversary modeling based on a non-parametric class of response functions with PAC-model analysis and (4) identification of conditions under which computing the best defender strategy against the learned adversary behavior is indeed the optimal strategy. Finally, we conduct experiments with real-world data from a national park in Uganda, showing the benefit of our new adversary modeling approach and verification of our PAC model predictions.\nClassification is a fundamental task in machine learning and data mining. Existing classification methods are designed to classify unknown instances within a set of previously known training classes. Such a classification takes the form of a prediction within a closed set of classes. However, a more realistic scenario that fits real-world applications is to consider the possibility of encountering instances that do not belong to any of the training classes, $i.e.$, an open-set classification. In such a situation, existing closed-set classifiers will assign a training label to these instances, resulting in misclassification. In this paper, we introduce Galaxy-X, a novel multi-class classification approach for open-set recognition problems. For each class of the training set, Galaxy-X creates a minimum bounding hyper-sphere that encompasses the distribution of the class by enclosing all of its instances. In this manner, our method is able to distinguish instances resembling previously seen classes from those belonging to unknown classes. To adequately evaluate open-set classification, we introduce a novel evaluation procedure. 
Experimental results on benchmark datasets show the efficiency of our approach in classifying novel instances from known as well as unknown classes.\nDeviations from rational decision-making due to limited computational resources have been studied in the field of bounded rationality, originally proposed by Herbert Simon. There have been a number of different approaches to model bounded rationality ranging from optimality principles to heuristics. Here we take an information-theoretic approach to bounded rationality, where information-processing costs are measured by the relative entropy between a posterior decision strategy and a given fixed prior strategy. In the case of multiple environments, it can be shown that there is an optimal prior rendering the bounded rationality problem equivalent to the rate distortion problem for lossy compression in information theory. Accordingly, the optimal prior and posterior strategies can be computed by the well-known Blahut-Arimoto algorithm, which requires the computation of partition sums over all possible outcomes and cannot be applied straightforwardly to continuous problems. Here we derive a sampling-based alternative update rule for the adaptation of prior behaviors of decision-makers and we show convergence to the optimal prior predicted by rate distortion theory. Importantly, the update rule avoids typical infeasible operations such as the computation of partition sums. We show in simulations a proof of concept for discrete action and environment domains. This approach is not only interesting as a generic computational method, but might also provide a more realistic model of human decision-making processes occurring on a fast and a slow time scale.\nA recommender system is an information filtering technology which can be used to predict preference ratings of items (products, services, movies, etc.) and/or to output a ranking of items that are likely to be of interest to the user. 
Context-aware recommender systems (CARS) learn and predict the tastes and preferences of users by incorporating available contextual information in the recommendation process. One of the major challenges in context-aware recommender systems research is the lack of automatic methods to obtain contextual information for these systems. Considering this scenario, in this paper, we propose to use contextual information from topic hierarchies of the items (web pages) to improve the performance of context-aware recommender systems. The topic hierarchies are constructed by an extension of the LUPI-based Incremental Hierarchical Clustering method that considers three types of information: traditional bag-of-words (technical information), and the combination of named entities (privileged information I) with domain terms (privileged information II). We evaluated the contextual information in four context-aware recommender systems. Different weights were assigned to each type of information. The empirical results demonstrated that topic hierarchies with the combination of the two kinds of privileged information can provide better recommendations.\nMulti-person event recognition is a challenging task, often with many people active in the scene but only a small subset contributing to an actual event. In this paper, we propose a model which learns to detect events in such videos while automatically \"attending\" to the people responsible for the event. Our model does not use explicit annotations regarding who or where those people are during training and testing. In particular, we track people in videos and use a recurrent neural network (RNN) to represent the track features. We learn time-varying attention weights to combine these features at each time-instant. The attended features are then processed using another RNN for event detection/classification. 
Since most video datasets with multiple people are restricted to a small number of videos, we also collected a new basketball dataset comprising 257 basketball games with 14K event annotations corresponding to 11 event classes. Our model outperforms state-of-the-art methods for both event classification and detection on this new dataset. Additionally, we show that the attention mechanism is able to consistently localize the relevant players.\nA theoretical framework that supports automated construction of dynamic prime models purely from experimental time series data has been invented and developed, which can automatically generate (construct) data-driven models of any time series data in seconds. This has resulted in the formulation and formalisation of new reverse engineering and dynamic methods for automated systems modelling of complex systems, including complex biological, financial, control, and artificial neural network systems. The systems/model theory behind the invention has been formalised as a new, effective and robust system identification strategy complementary to process-based modelling. The proposed dynamic modelling and network inference solutions often involve tackling extremely difficult parameter estimation challenges, inferring unknown underlying network structures, and unsupervised formulation and construction of smart and intelligent ODE models of complex systems. In underdetermined conditions, i.e., when data-consistent prime models of unknown (or well-studied) complex systems must be constructed rapidly from small time series data sets, inferring the unknown underlying network of interactions is more challenging. 
This article reports a robust step-by-step mathematical and computational analysis of the entire prime model construction process that determines a model from data in less than a minute.\nMost machine learning models are static, but the world is dynamic, and increasing online deployment of learned models gives increasing urgency to the development of efficient and effective mechanisms to address learning in the context of non-stationary distributions, or, as it is commonly called, concept drift. However, the key issue of characterizing the different types of drift that can occur has not previously been subjected to rigorous definition and analysis. In particular, while some qualitative drift categorizations have been proposed, few have been formally defined, and the quantitative descriptions required for precise and objective understanding of learner performance have not existed. We present the first comprehensive framework for quantitative analysis of drift. This supports the development of the first comprehensive set of formal definitions of types of concept drift. The formal definitions clarify ambiguities and identify gaps in previous definitions, giving rise to a new comprehensive taxonomy of concept drift types and a solid foundation for research into mechanisms to detect and address concept drift.\nLearning about the social structure of hidden and hard-to-reach populations (such as drug users and sex workers) is a major goal of epidemiological and public health research on risk behaviors and disease prevention. Respondent-driven sampling (RDS) is a peer-referral process widely used by many health organizations, where research subjects recruit other subjects from their social network. In such surveys, researchers observe who recruited whom, along with the time of recruitment and the total number of acquaintances (network degree) of respondents. However, due to privacy concerns, the identities of acquaintances are not disclosed. 
In this work, we show how to reconstruct the underlying network structure through which the subjects are recruited. We formulate the dynamics of RDS as a continuous-time diffusion process over the underlying graph and derive the likelihood for the recruitment time series under an arbitrary recruitment time distribution. We develop an efficient stochastic optimization algorithm called RENDER (REspoNdent-Driven nEtwork Reconstruction) that finds the network that best explains the collected data. We support our analytical results through an exhaustive set of experiments on both synthetic and real data.\nIn the task of object recognition, there exists a dichotomy between categorizing objects and estimating object pose, where the former necessitates a view-invariant representation, while the latter requires a representation capable of capturing pose information over different categories of objects. With the rise of deep architectures, the prime focus has been on object category recognition. Deep learning methods have achieved wide success in this task. In contrast, object pose regression using these approaches has received relatively little attention. In this paper we show how deep architectures, specifically Convolutional Neural Networks (CNN), can be adapted to the task of simultaneous categorization and pose estimation of objects. We investigate and analyze the layers of various CNN models and extensively compare them, with the goal of discovering how the layers of distributed representations of CNNs represent object pose information and how this conflicts with object category representations. We extensively experiment on two recent large and challenging multi-view datasets. Our models achieve better than state-of-the-art performance on both datasets.\nWe address the problem of Visual Question Answering (VQA), which requires joint image and language understanding to answer a question about a given photograph. 
Recent approaches have applied deep image captioning methods based on convolutional-recurrent networks to this problem, but have failed to model spatial inference. To remedy this, we propose a model we call the Spatial Memory Network and apply it to the VQA task. Memory networks are recurrent neural networks with an explicit attention mechanism that selects certain parts of the information stored in memory. Our Spatial Memory Network stores neuron activations from different spatial regions of the image in its memory, and uses the question to choose relevant regions for computing the answer, a process that constitutes a single \"hop\" in the network. We propose a novel spatial attention architecture that aligns words with image patches in the first hop, and obtain improved results by adding a second attention hop which considers the whole question to choose visual evidence based on the results of the first hop. To better understand the inference process learned by the network, we design synthetic questions that specifically require spatial inference and visualize the attention weights. We evaluate our model on two published visual question answering datasets, DAQUAR [1] and VQA [2], and obtain improved results compared to a strong deep baseline model (iBOWIMG) which concatenates image and question features to predict the answer [3].\nTraditional algorithms for robots that need to integrate into a wireless network often focus on one specific task. In this work we want to develop simple, adaptive and reusable algorithms for real world applications for this scenario. Starting with the most basic task for mobile wireless network nodes, finding the position of another node, we introduce an algorithm able to solve this task. We then show how this algorithm can readily be employed to solve a large number of other related tasks like finding the optimal position to bridge two static network nodes. 
For this we first introduce a meta-algorithm inspired by autonomous robot learning strategies and the concept of internal models which yields a class of source seeking algorithms for mobile nodes. The effectiveness of this algorithm is demonstrated in real world experiments using a physical mobile robot and standard 802.11 wireless LAN in an office environment. We also discuss the differences from conventional algorithms and give the robotics perspective on this class of algorithms. Then we proceed to show how more complex tasks, which might be encountered by mobile nodes, can be encoded in the same framework and how the introduced algorithm can solve them. These tasks can be direct (cross layer) optimization tasks or can also encode more complex tasks like bridging two network nodes. We choose the bridging scenario as an example, implemented on a real physical robot, and show how the robot can solve it in a real world experiment.\nIn practice, there are often explicit constraints on what representations or decisions are acceptable in an application of machine learning. For example it may be a legal requirement that a decision must not favour a particular group. Alternatively it can be that the representation of data must not have identifying information. We address these two related issues by learning flexible representations that minimize the capability of an adversarial critic. This adversary is trying to predict the relevant sensitive variable from the representation, and so minimizing the performance of the adversary ensures there is little or no information in the representation about the sensitive variable. We demonstrate this adversarial approach on two problems: making decisions free from discrimination and removing private information from images. We formulate the adversarial model as a minimax problem, and optimize that minimax objective using a stochastic gradient alternate min-max optimizer. 
We demonstrate the ability to provide discrimination-free representations for standard test problems, and compare with previous state-of-the-art methods for fairness, showing statistically significant improvement across most cases. The flexibility of this method is shown via a novel problem: removing annotations from images, from unaligned training examples of annotated and unannotated images, and with no a priori knowledge of the form of annotation provided to the model.\nComputer system monitoring generates huge amounts of logs that record the interaction of system entities. How to query such data to better understand system behaviors and identify potential system risks and malicious behaviors becomes a challenging task for system administrators due to the dynamics and heterogeneity of the data. System monitoring data are essentially heterogeneous temporal graphs with nodes being system entities and edges being their interactions over time. Given the complexity of such graphs, it becomes time-consuming for system administrators to manually formulate useful queries in order to examine abnormal activities, attacks, and vulnerabilities in computer systems. In this work, we investigate how to query temporal graphs and treat query formulation as a discriminative temporal graph pattern mining problem. We introduce TGMiner to mine discriminative patterns from system logs, and these patterns can be taken as templates for building more complex queries. TGMiner leverages temporal information in graphs to prune graph patterns that share similar growth trends without compromising pattern quality. Experimental results on real system data show that TGMiner is 6-32 times faster than baseline methods. 
The discovered patterns were verified by system experts; they achieved high precision (97%) and recall (91%).\nMethods for learning word representations using large text corpora have received much attention lately due to their impressive performance in numerous natural language processing (NLP) tasks such as semantic similarity measurement and word analogy detection. Despite their success, these data-driven word representation learning methods do not consider the rich semantic relational structure between words in a co-occurring context. On the other hand, much manual effort has already gone into the construction of semantic lexicons such as WordNet that represent the meanings of words by defining the various relationships that exist among the words in a language. We consider the question: can we improve the word representations learnt from a corpus by integrating the knowledge from semantic lexicons? For this purpose, we propose a joint word representation learning method that simultaneously predicts the co-occurrences of two words in a sentence subject to the relational constraints given by the semantic lexicon. We use relations that exist between words in the lexicon to regularize the word representations learnt from the corpus. Our proposed method statistically significantly outperforms previously proposed methods for incorporating semantic lexicons into word representations on several benchmark datasets for semantic similarity and word analogy.\nWe propose a method for hand pose estimation based on a deep regressor trained on two different kinds of input. Raw depth data is fused with an intermediate representation in the form of a segmentation of the hand into parts. This intermediate representation contains important topological information and provides useful cues for reasoning about joint locations. 
The mapping from raw depth to segmentation maps is learned in a semi/weakly-supervised way from two different datasets: (i) a synthetic dataset created through a rendering pipeline including densely labeled ground truth (pixelwise segmentations); and (ii) a dataset with real images for which ground truth joint positions are available, but not dense segmentations. Loss for training on real images is generated from a patch-wise restoration process, which aligns tentative segmentation maps with a large dictionary of synthetic poses. The underlying premise is that the domain shift between synthetic and real data is smaller in the intermediate representation, where labels carry geometric and topological meaning, than in the raw input domain. Experiments on the NYU dataset show that the proposed training method decreases error on joints over direct regression of joints from depth data by 15.7%.\nNon-sentential utterances (NSUs) are utterances that lack a complete sentential form but whose meaning can be inferred from the dialogue context, such as \"OK\", \"where?\", \"probably at his apartment\". The interpretation of non-sentential utterances is an important problem in computational linguistics since they constitute a frequent phenomenon in dialogue and they are intrinsically context-dependent. The interpretation of NSUs is the task of retrieving their full semantic content from their form and the dialogue context. The first half of this thesis is devoted to the NSU classification task. Our work builds upon Fern\\'andez et al. (2007), who present a series of machine-learning experiments on the classification of NSUs. We extended their approach with a combination of new features and semi-supervised learning techniques. The empirical results presented in this thesis show a modest but significant improvement over the state-of-the-art classification performance. 
The subsequent, yet independent, problem is how to infer an appropriate semantic representation of such NSUs on the basis of the dialogue context. Fern\\'andez (2006) formalizes this task in terms of \"resolution rules\" built on top of the Type Theory with Records (TTR). Our work is focused on the reimplementation of the resolution rules from Fern\\'andez (2006) with a probabilistic account of the dialogue state. The probabilistic rules formalism of Lison (2014) is particularly suited for this task because, similarly to the framework developed by Ginzburg (2012) and Fern\\'andez (2006), it involves the specification of update rules on the variables of the dialogue state to capture the dynamics of the conversation. However, the probabilistic rules can also encode probabilistic knowledge, thereby providing a principled account of ambiguities in the NSU resolution process.\nWe introduce the concept of a continuous transportation task to the context of multi-agent systems. A continuous transportation task is one in which a multi-agent team visits a number of fixed locations, picks up objects, and delivers them to a final destination. The goal is to maximize the rate of transportation while the objects are replenished over time. Examples of problems that need continuous transportation are foraging, area sweeping, and the first/last mile problem. Previous approaches typically neglect the interference and are highly dependent on communications among agents. Some also incorporate an additional reconnaissance agent to gather information. In this paper, we present a hybrid of centralized and distributed approaches that minimize the interference and communications in the multi-agent team without the need for a reconnaissance agent. We contribute two partitioning-transportation algorithms inspired by existing algorithms, and contribute one novel online partitioning-transportation algorithm with information gathering in the multi-agent team. 
Our algorithms have been implemented and tested extensively in simulation. The results presented in this paper demonstrate the effectiveness of our algorithms, which outperform the existing algorithms, even without any communications between the agents and without the presence of a reconnaissance agent.\nThis paper proposes algorithms for learning two-level Boolean rules in Conjunctive Normal Form (CNF, i.e. AND-of-ORs) or Disjunctive Normal Form (DNF, i.e. OR-of-ANDs) as a type of human-interpretable classification model, aiming for a favorable trade-off between the classification accuracy and the simplicity of the rule. Two formulations are proposed. The first is an integer program whose objective function is a combination of the total number of errors and the total number of features used in the rule. We generalize a previously proposed linear programming (LP) relaxation from one-level to two-level rules. The second formulation replaces the 0-1 classification error with the Hamming distance from the current two-level rule to the closest rule that correctly classifies a sample. Based on this second formulation, block coordinate descent and alternating minimization algorithms are developed. Experiments show that the two-level rules can yield noticeably better performance than one-level rules due to their dramatically larger modeling capacity, and the two algorithms based on the Hamming distance formulation are generally superior to the other two-level rule learning methods in our comparison. A proposed approach to binarize any fractional values in the optimal solutions of LP relaxations is also shown to be effective.\nEmbedding learning, a.k.a. representation learning, has been shown to be able to model large-scale semantic knowledge graphs. A key concept is a mapping of the knowledge graph to a tensor representation whose entries are predicted by models using latent representations of generalized entities. 
Latent variable models are well suited to deal with the high dimensionality and sparsity of typical knowledge graphs. In recent publications the embedding models were extended to also consider time evolutions, time patterns and subsymbolic representations. In this paper we map embedding models, which were developed purely as solutions to technical problems for modelling temporal knowledge graphs, to various cognitive memory functions, in particular to semantic and concept memory, episodic memory, sensory memory, short-term memory, and working memory. We discuss learning, query answering, the path from sensory input to semantic decoding, and the relationship between episodic memory and semantic memory. We introduce a number of hypotheses on human memory that can be derived from the developed mathematical models.\nHuman language has been recognized as a very complex domain for decades. No computer system has been able to reach human levels of performance so far. The only known computational system capable of proper language processing is the human brain. While we gather more and more data about the brain, its fundamental computational processes still remain obscure. The lack of a sound computational brain theory also prevents the fundamental understanding of Natural Language Processing. As always when science lacks a theoretical foundation, statistical modeling is applied to accommodate as much sampled real-world data as possible. An unsolved fundamental issue is the actual representation of language (data) within the brain, denoted as the Representational Problem. Starting with Jeff Hawkins' Hierarchical Temporal Memory (HTM) theory, a consistent computational theory of the human cortex, we have developed a corresponding theory of language data representation: The Semantic Folding Theory. 
The process of encoding words into a sparse binary representational vector, using a topographic semantic space as a distributional reference frame, is called Semantic Folding and is the central topic of this document. Semantic Folding describes a method of converting language from its symbolic representation (text) into an explicit, semantically grounded representation that can be generically processed by Hawkins' HTM networks. As it turned out, this change in representation, by itself, can solve many complex NLP problems by applying Boolean operators and a generic similarity function like the Euclidean distance. Many practical problems of statistical NLP systems, like the high cost of computation, the fundamental incongruity of precision and recall, the complex tuning procedures, etc., can be elegantly overcome by applying Semantic Folding.\nThis paper addresses the general problem of reinforcement learning (RL) in partially observable environments. In 2013, our large RL recurrent neural networks (RNNs) learned from scratch to drive simulated cars from high-dimensional video input. However, real brains are more powerful in many ways. In particular, they learn a predictive model of their initially unknown environment, and somehow use it for abstract (e.g., hierarchical) planning and reasoning. Guided by algorithmic information theory, we describe RNN-based AIs (RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending sequences of tasks, some of them provided by the user, others invented by the RNNAI itself in a curious, playful fashion, to improve its RNN-based world model. Unlike our previous model-building RNN-based RL machines dating back to 1990, the RNNAI learns to actively query its model for abstract reasoning and planning and decision making, essentially \"learning to think.\" The basic ideas of this report can be applied to many other cases where one RNN-like system exploits the algorithmic information content of another. 
They are taken from a grant proposal submitted in Fall 2014, and also explain concepts such as \"mirror neurons.\" Experimental results will be described in separate papers.\nBicycle-sharing systems, which can provide shared bike usage services for the public, have been launched in many big cities. In bicycle-sharing systems, people can borrow and return bikes at any station in the service region very conveniently. Therefore, bicycle-sharing systems are normally used as a short-distance trip supplement for private vehicles as well as regular public transportation. Meanwhile, for stations located at different places in the service region, bike usage can be quite skewed and imbalanced. Some stations have too many incoming bikes and get jammed without enough docks for upcoming bikes, while some other stations get empty quickly and lack enough bikes for people to check out. Therefore, inferring the potential destinations and arrival time of each individual trip beforehand can effectively help the service providers schedule manual bike re-dispatch in advance. In this paper, we will study the individual trip prediction problem for bicycle-sharing systems. To address the problem, we study a real-world bicycle-sharing system and analyze individuals' bike usage behaviors first. Based on the analysis results, a new trip destination prediction and trip duration inference model will be introduced. Experiments conducted on a real-world bicycle-sharing system demonstrate the effectiveness of the proposed model.\nThe constraint satisfaction problem (CSP) involves deciding, given a set of variables and a set of constraints on the variables, whether or not there is an assignment to the variables satisfying all of the constraints. One formulation of the CSP is as the problem of deciding, given a pair (G,H) of relational structures, whether or not there is a homomorphism from the first structure to the second structure. 
The CSP is in general NP-hard; a common way to restrict this problem is to fix the second structure H, so that each structure H gives rise to a problem CSP(H). The problem family CSP(H) has been studied using an algebraic approach, which links the algorithmic and complexity properties of each problem CSP(H) to a set of operations, the so-called polymorphisms of H. Certain types of polymorphisms are known to imply the polynomial-time tractability of $CSP(H)$, and others are conjectured to do so. This article systematically studies---for various classes of polymorphisms---the computational complexity of deciding whether or not a given structure H admits a polymorphism from the class. Among other results, we prove the NP-completeness of deciding a condition conjectured to characterize the tractable problems CSP(H), as well as the NP-completeness of deciding if CSP(H) has bounded width.\nThe main advantage of Constraint Programming (CP) approaches for sequential pattern mining (SPM) is their modularity, which includes the ability to add new constraints (regular expressions, length restrictions, etc). The current best CP approach for SPM uses a global constraint (module) that computes the projected database and enforces the minimum frequency; it does this with a filtering algorithm similar to the PrefixSpan method. However, the resulting system is not as scalable as some of the most advanced mining systems like Zaki's cSPADE. We show how, using techniques from both data mining and CP, one can use a generic constraint solver and yet outperform existing specialized systems. 
This is mainly due to two improvements in the module that computes the projected frequencies: first, computing the projected database can be sped up by pre-computing the positions at which a symbol can become unsupported by a sequence, thereby avoiding a scan of the full sequence each time; and second, by taking inspiration from the trailing used in CP solvers to devise a backtracking-aware data structure that allows fast incremental storing and restoring of the projected database. Detailed experiments show how this approach outperforms existing CP as well as specialized systems for SPM, and that the gain in efficiency translates directly into increased efficiency for other settings such as mining with regular expressions.\nFeature extraction has gained increasing attention in the field of machine learning, as in order to detect patterns, extract information, or predict future observations from big data, the need for informative features is crucial. The process of extracting features is highly linked to dimensionality reduction as it implies the transformation of the data from a sparse high-dimensional space to higher-level meaningful abstractions. This dissertation employs Neural Networks for distributed paragraph representations, and Latent Dirichlet Allocation to capture higher level features of paragraph vectors. Although Neural Networks for distributed paragraph representations are considered the state of the art for extracting paragraph vectors, we show that a quick topic analysis model such as Latent Dirichlet Allocation can provide meaningful features too. We evaluate the two methods on the CMU Movie Summary Corpus, a collection of 25,203 movie plot summaries extracted from Wikipedia. Finally, for both approaches, we use K-Nearest Neighbors to discover similar movies, and plot the projected representations using T-Distributed Stochastic Neighbor Embedding to depict the context similarities. 
These similarities, expressed as movie distances, can be used for movie recommendation. The recommended movies of this approach are compared with the recommended movies from IMDB, which uses a collaborative filtering recommendation approach, to show that our two models could constitute either an alternative or a supplementary recommendation approach.\nResidual learning has recently surfaced as an effective means of constructing very deep neural networks for object recognition. However, current incarnations of residual networks do not allow for the modeling and integration of complex relations between closely coupled recognition tasks or across domains. Such problems are often encountered in multimedia applications involving large-scale content recognition. We propose a novel extension of residual learning for deep networks that enables intuitive learning across multiple related tasks using cross-connections called cross-residuals. These cross-residual connections can be viewed as a form of in-network regularization and enable greater network generalization. We show how cross-residual learning (CRL) can be integrated in multitask networks to jointly train and detect visual concepts across several tasks. We present a single multitask cross-residual network with >40% fewer parameters that is able to achieve competitive, or even better, detection performance on a visual sentiment concept detection problem normally requiring multiple specialized single-task networks. The resulting multitask cross-residual network also achieves better detection performance by about 10.4% over a standard multitask residual network without cross-residuals with even a small amount of cross-task weighting.\nWhat is the right supervisory signal to train visual representations? Current approaches in computer vision use category labels from datasets such as ImageNet to train ConvNets. 
However, in the case of biological agents, visual representation learning does not require millions of semantic labels. We argue that biological agents use physical interactions with the world to learn visual representations, unlike current vision systems, which just use passive observations (images and videos downloaded from the web). For example, babies push objects, poke them, put them in their mouth and throw them to learn representations. Towards this goal, we build one of the first systems on a Baxter platform that pushes, pokes, grasps and observes objects in a tabletop environment. It uses four different types of physical interactions to collect more than 130K datapoints, with each datapoint providing supervision to a shared ConvNet architecture allowing us to learn visual representations. We show the quality of learned representations by observing neuron activations and performing nearest neighbor retrieval on this learned representation. Quantitatively, we evaluate our learned ConvNet on image classification tasks and show improvements compared to learning without external data. Finally, on the task of instance retrieval, our network outperforms the ImageNet network on recall@1 by 3%.\nRepresentation and learning of commonsense knowledge is one of the foundational problems in the quest to enable deep language understanding. This issue is particularly challenging for understanding causal and correlational relationships between events. While this topic has received a lot of interest in the NLP community, research has been hindered by the lack of a proper evaluation framework. This paper attempts to address this problem with a new framework for evaluating story understanding and script learning: the 'Story Cloze Test'. This test requires a system to choose the correct ending to a four-sentence story. We created a new corpus of ~50k five-sentence commonsense stories, ROCStories, to enable this evaluation. 
This corpus is unique in two ways: (1) it captures a rich set of causal and temporal commonsense relations between daily events, and (2) it is a high quality collection of everyday life stories that can also be used for story generation. Experimental evaluation shows that a host of baselines and state-of-the-art models based on shallow language understanding struggle to achieve a high score on the Story Cloze Test. We discuss these implications for script and story learning, and offer suggestions for deeper language understanding.\nMachine learning techniques are often used in computer vision due to their ability to leverage large amounts of training data to improve performance. Unfortunately, most generic object trackers are still trained from scratch online and do not benefit from the large number of videos that are readily available for offline training. We propose a method for offline training of neural networks that can track novel objects at test-time at 100 fps. Our tracker is significantly faster than previous methods that use neural networks for tracking, which are typically very slow to run and not practical for real-time applications. Our tracker uses a simple feed-forward network with no online training required. The tracker learns a generic relationship between object motion and appearance and can be used to track novel objects that do not appear in the training set. We test our network on a standard tracking benchmark to demonstrate our tracker's state-of-the-art performance. Further, our performance improves as we add more videos to our offline training set. To the best of our knowledge, our tracker is the first neural-network tracker that learns to track generic objects at 100 fps.\nIn theoretical cognitive science, there is a tension between highly structured models whose parameters have a direct psychological interpretation and highly complex, general-purpose models whose parameters and representations are difficult to interpret. 
The former typically provide more insight into cognition but the latter often perform better. This tension has recently surfaced in the realm of educational data mining, where a deep learning approach to predicting students' performance as they work through a series of exercises---termed deep knowledge tracing or DKT---has demonstrated a stunning performance advantage over the mainstay of the field, Bayesian knowledge tracing or BKT. In this article, we attempt to understand the basis for DKT's advantage by considering the sources of statistical regularity in the data that DKT can leverage but which BKT cannot. We hypothesize four forms of regularity that BKT fails to exploit: recency effects, the contextualized trial sequence, inter-skill similarity, and individual variation in ability. We demonstrate that when BKT is extended to allow it more flexibility in modeling statistical regularities---using extensions previously proposed in the literature---BKT achieves a level of performance indistinguishable from that of DKT. We argue that while DKT is a powerful, useful, general-purpose framework for modeling student learning, its gains do not come from the discovery of novel representations---the fundamental advantage of deep learning. To answer the question posed in our title, knowledge tracing may be a domain that does not require `depth'; shallow models like BKT can perform just as well and offer us greater interpretability and explanatory power.\nThree-dimensional AUV path planning in a complex, turbulent underwater environment is investigated in this research, in which static current map data and uncertain static and moving time-variant obstacles are taken into account. Robustness of AUV path planning to this strong variability is known to be a complex NP-hard problem and is considered a critical issue for ensuring the vehicles' safe deployment. 
Efficient evolutionary techniques have substantial potential for handling the NP-hard complexity of the path planning problem, as they are among the more powerful and fast algorithms for this problem. For the purpose of this research, the Differential Evolution (DE) technique is used to solve the AUV path planning problem in a realistic underwater environment. The path planners designed in this paper are capable of extracting feasible areas of a real map to determine the allowed spaces for deployment, taking into account coastal areas, islands, static/dynamic obstacles and ocean currents, and they provide an efficient path with a small computation time. The experimental results demonstrate the inherent robustness and marked efficiency of the proposed scheme in enhancing the vehicles' path planning capability: coping with undesired currents, using useful current flows, and avoiding collision boundaries in real time. The proposed approach is also flexible and strictly respects the vehicle's kinematic constraints while resisting current instabilities.\nWe present a method for the classification of multi-labelled text documents explicitly designed for data stream applications that need to process a virtually infinite sequence of data using constant memory and constant processing time. Our method is composed of an online procedure used to efficiently map text into a low-dimensional feature space and a partition of this space into a set of regions for which the system extracts and keeps statistics used to predict multi-label text annotations. Documents are fed into the system as a sequence of words, mapped to a region of the partition, and annotated using the statistics computed from the labelled instances colliding in the same region. This approach is referred to as clashing. We illustrate the method on real-world text data, comparing the results with those obtained using other text classifiers. 
In addition, we provide an analysis about the effect of the representation space dimensionality on the predictive performance of the system. Our results show that the online embedding indeed approximates the geometry of the full corpus-wise TF and TF-IDF space. The model obtains competitive F measures with respect to the most accurate methods, using significantly fewer computational resources. In addition, the method achieves a higher macro-averaged F measure than methods with similar running time. Furthermore, the system is able to learn faster than the other methods from partially labelled streams.\nPeer review, evaluation, and selection is a fundamental aspect of modern science. Funding bodies the world over employ experts to review and select the best proposals of those submitted for funding. The problem of peer selection, however, is much more general: a professional society may want to give a subset of its members awards based on the opinions of all members; an instructor for a MOOC or online course may want to crowdsource grading; or a marketing company may select ideas from group brainstorming sessions based on peer evaluation. We make three fundamental contributions to the study of procedures or mechanisms for peer selection, a specific type of group decision-making problem, studied in computer science, economics, and political science. First, we propose a novel mechanism that is strategyproof, i.e., agents cannot benefit by reporting insincere valuations. Second, we demonstrate the effectiveness of our mechanism by a comprehensive simulation-based comparison with a suite of mechanisms found in the literature. 
Finally, our mechanism employs a randomized rounding technique that is of independent interest, as it solves the apportionment problem that arises in various settings where discrete resources such as parliamentary representation slots need to be divided proportionally.\nWe consider the well-studied cake cutting problem in which the goal is to find an envy-free allocation based on queries from $n$ agents. The problem has received attention in computer science, mathematics, and economics. It has been a major open problem whether there exists a discrete and bounded envy-free protocol. We resolve the problem by proposing a discrete and bounded envy-free protocol for any number of agents. The maximum number of queries required by the protocol is $n^{n^{n^{n^{n^n}}}}$. We additionally show that even if we do not run our protocol to completion, it can find in at most $n^3{(n^2)}^n$ queries a partial allocation of the cake that achieves proportionality (each agent gets at least $1/n$ of the value of the whole cake) and envy-freeness. Finally we show that an envy-free partial allocation can be computed in at most $n^3{(n^2)}^n$ queries such that each agent gets a connected piece that gives the agent at least $1/(3n)$ of the value of the whole cake.\nA real-world newspaper distribution problem with recycling policy is tackled in this work. In order to meet all the complex restrictions contained in such a problem, it has been modeled as a rich vehicle routing problem, which can be more specifically considered as an asymmetric and clustered vehicle routing problem with simultaneous pickup and deliveries, variable costs and forbidden paths (AC-VRP-SPDVCFP). This is the first study of such a problem in the literature. For this reason, a benchmark composed by 15 instances has been also proposed. In the design of this benchmark, real geographical positions have been used, located in the province of Bizkaia, Spain. 
For the proper treatment of this AC-VRP-SPDVCFP, a discrete firefly algorithm (DFA) has been developed. This application is the first application of the firefly algorithm to any rich vehicle routing problem. To prove that the proposed DFA is a promising technique, its performance has been compared with two other well-known techniques: an evolutionary algorithm and an evolutionary simulated annealing. Our results have shown that the DFA has outperformed these two classic meta-heuristics.\nSemantic matching, which aims to determine the matching degree between two texts, is a fundamental problem for many NLP applications. Recently, deep learning approach has been applied to this problem and significant improvements have been achieved. In this paper, we propose to view the generation of the global interaction between two texts as a recursive process: i.e. the interaction of two texts at each position is a composition of the interactions between their prefixes as well as the word level interaction at the current position. Based on this idea, we propose a novel deep architecture, namely Match-SRNN, to model the recursive matching structure. Firstly, a tensor is constructed to capture the word level interactions. Then a spatial RNN is applied to integrate the local interactions recursively, with importance determined by four types of gates. Finally, the matching score is calculated based on the global interaction. We show that, after degenerated to the exact matching scenario, Match-SRNN can approximate the dynamic programming process of longest common subsequence. Thus, there exists a clear interpretation for Match-SRNN. Our experiments on two semantic matching tasks showed the effectiveness of Match-SRNN, and its ability of visualizing the learned matching structure.\nThanks to the efforts of the robotics and autonomous systems community, robots are becoming ever more capable. 
There is also an increasing demand from end-users for autonomous service robots that can operate in real environments for extended periods. In the STRANDS project we are tackling this demand head-on by integrating state-of-the-art artificial intelligence and robotics research into mobile service robots, and deploying these systems for long-term installations in security and care environments. Over four deployments, our robots have been operational for a combined duration of 104 days autonomously performing end-user defined tasks, covering 116km in the process. In this article we describe the approach we have used to enable long-term autonomous operation in everyday environments, and how our robots are able to use their long run times to improve their own performance.\nIn multiagent systems, we often have a set of agents each of which have a preference ordering over a set of items and one would like to know these preference orderings for various tasks, for example, data analysis, preference aggregation, voting etc. However, we often have a large number of items which makes it impractical to ask the agents for their complete preference ordering. In such scenarios, we usually elicit these agents' preferences by asking (a hopefully small number of) comparison queries --- asking an agent to compare two items. Prior works on preference elicitation focus on unrestricted domain and the domain of single peaked preferences and show that the preferences in single peaked domain can be elicited by much less number of queries compared to unrestricted domain. We extend this line of research and study preference elicitation for single peaked preferences on trees which is a strict superset of the domain of single peaked preferences. 
We show that the query complexity crucially depends on the number of leaves, the path cover number, and the distance from path of the underlying single peaked tree, whereas the other natural parameters like maximum degree, diameter, pathwidth do not play any direct role in determining query complexity. We then investigate the query complexity for finding a weak Condorcet winner for preferences single peaked on a tree and show that this task has much less query complexity than preference elicitation. Here again we observe that the number of leaves in the underlying single peaked tree and the path cover number of the tree influence the query complexity of the problem.\nBio-inspired algorithms like Genetic Algorithms and Fuzzy Inference Systems (FIS) are nowadays widely adopted as hybrid techniques in commercial and industrial environment. In this paper we present an interesting application of the fuzzy-GA paradigm to Smart Grids. The main aim consists in performing decision making for power flow management tasks in the proposed microgrid model equipped by renewable sources and an energy storage system, taking into account the economical profit in energy trading with the main-grid. In particular, this study focuses on the application of a Hierarchical Genetic Algorithm (HGA) for tuning the Rule Base (RB) of a Fuzzy Inference System (FIS), trying to discover a minimal fuzzy rules set in a Fuzzy Logic Controller (FLC) adopted to perform decision making in the microgrid. The HGA rationale focuses on a particular encoding scheme, based on control genes and parametric genes applied to the optimization of the FIS parameters, allowing to perform a reduction in the structural complexity of the RB. This approach will be referred in the following as fuzzy-HGA. Results are compared with a simpler approach based on a classic fuzzy-GA scheme, where both FIS parameters and rule weights are tuned, while the number of fuzzy rules is fixed in advance. 
Experiments show how the fuzzy-HGA approach adopted for the synthesis of the proposed controller outperforms the classic fuzzy-GA scheme, increasing the accounting profit by 67\\% in the considered energy trading problem yielding at the same time a simpler RB.\nAn Autonomous Underwater Vehicle (AUV) needs to acquire a certain degree of autonomy for any particular underwater mission to fulfill the mission objectives successfully and ensure its safety in all stages of the mission in a large scale operating field. In this paper, a novel combinatorial conflict-free-task assignment strategy consisting of an interactive engagement of a local path planner and an adaptive global route planner is introduced. The method is established upon the heuristic search potency of the Particle Swarm Optimisation (PSO) algorithm to address the discrete nature of routing-task assignment approach and the complexity of NP-hard path planning problem. The proposed hybrid method is highly efficient for having a reactive guidance framework that guarantees successful completion of missions specifically in cluttered environments. To examine the performance of the method in a context of mission productivity, mission time management and vehicle safety, a series of simulation studies are undertaken. The results of simulations indicate that the proposed method is reliable and robust, particularly in dealing with uncertainties, and it can significantly enhance the level of vehicle's autonomy by relying on its reactive nature and capability of providing fast feasible solutions.\nIn this work we present a novel end-to-end framework for tracking and classifying a robot's surroundings in complex, dynamic and only partially observable real-world environments. The approach deploys a recurrent neural network to filter an input stream of raw laser measurements in order to directly infer object locations, along with their identity in both visible and occluded areas.
To achieve this we first train the network using unsupervised Deep Tracking, a recently proposed theoretical framework for end-to-end space occupancy prediction. We show that by learning to track on a large amount of unsupervised data, the network creates a rich internal representation of its environment which we in turn exploit through the principle of inductive transfer of knowledge to perform the task of its semantic classification. As a result, we show that only a small amount of labelled data suffices to steer the network towards mastering this additional task. Furthermore we propose a novel recurrent neural network architecture specifically tailored to tracking and semantic classification in real-world robotics applications. We demonstrate the tracking and classification performance of the method on real-world data collected at a busy road junction. Our evaluation shows that the proposed end-to-end framework compares favourably to a state-of-the-art, model-free tracking solution and that it outperforms a conventional one-shot training scheme for semantic classification.\nEliciting the preferences of a set of agents over a set of alternatives is a problem of fundamental importance in social choice theory. Prior work on this problem has studied the query complexity of preference elicitation for the unrestricted domain and for the domain of single peaked preferences. In this paper, we consider the domain of single crossing preference profiles and study the query complexity of preference elicitation under various settings. We consider two distinct situations: when an ordering of the voters with respect to which the profile is single crossing is known versus when it is unknown. We also consider different access models: when the votes can be accessed at random, as opposed to when they are coming in a pre-defined sequence.
In the sequential access model, we distinguish two cases when the ordering is known: the first is that sequence in which the votes appear is also a single-crossing order, versus when it is not.   The main contribution of our work is to provide polynomial time algorithms with low query complexity for preference elicitation in all the above six cases. Further, we show that the query complexities of our algorithms are optimal up to constant factors for all but one of the above six cases. We then present preference elicitation algorithms for profiles which are close to being single crossing under various notions of closeness, for example, single crossing width, minimum number of candidates | voters whose deletion makes a profile single crossing.\nAutomatic image annotation has been an important research topic in facilitating large scale image management and retrieval. Existing methods focus on learning image-tag correlation or correlation between tags to improve annotation accuracy. However, most of these methods evaluate their performance using top-k retrieval performance, where k is fixed. Although such setting gives convenience for comparing different methods, it is not the natural way that humans annotate images. The number of annotated tags should depend on image contents. Inspired by the recent progress in machine translation and image captioning, we propose a novel Recurrent Image Annotator (RIA) model that forms image annotation task as a sequence generation problem so that RIA can natively predict the proper length of tags according to image contents. We evaluate the proposed model on various image annotation datasets. In addition to comparing our model with existing methods using the conventional top-k evaluation measures, we also provide our model as a high quality baseline for the arbitrary length image tagging task. 
Moreover, the results of our experiments show that the order of tags in training phase has a great impact on the final annotation performance.\nWhile probability theory is normally applied to external environments, there has been some recent interest in probabilistic modeling of the outputs of computations that are too expensive to run. Since mathematical logic is a powerful tool for reasoning about computer programs, we consider this problem from the perspective of integrating probability and logic. Recent work on assigning probabilities to mathematical statements has used the concept of coherent distributions, which satisfy logical constraints such as the probability of a sentence and its negation summing to one. Although there are algorithms which converge to a coherent probability distribution in the limit, this yields only weak guarantees about finite approximations of these distributions. In our setting, this is a significant limitation: Coherent distributions assign probability one to all statements provable in a specific logical theory, such as Peano Arithmetic, which can prove what the output of any terminating computation is; thus, a coherent distribution must assign probability one to the output of any terminating computation. To model uncertainty about computations, we propose to work with approximations to coherent distributions. We introduce inductive coherence, a strengthening of coherence that provides appropriate constraints on finite approximations, and propose an algorithm which satisfies this criterion.\nA function $f: \\mathbb{R}^d \\rightarrow \\mathbb{R}$ is referred to as a Sparse Additive Model (SPAM), if it is of the form $f(\\mathbf{x}) = \\sum_{l \\in \\mathcal{S}}\\phi_{l}(x_l)$, where $\\mathcal{S} \\subset [d]$, $|\\mathcal{S}| \\ll d$. Assuming $\\phi_l$'s and $\\mathcal{S}$ to be unknown, the problem of estimating $f$ from its samples has been studied extensively. 
In this work, we consider a generalized SPAM, allowing for second order interaction terms. For some $\\mathcal{S}_1 \\subset [d], \\mathcal{S}_2 \\subset {[d] \\choose 2}$, the function $f$ is assumed to be of the form: $$f(\\mathbf{x}) = \\sum_{p \\in \\mathcal{S}_1}\\phi_{p} (x_p) + \\sum_{(l,l^{\\prime}) \\in \\mathcal{S}_2}\\phi_{(l,l^{\\prime})} (x_{l},x_{l^{\\prime}}).$$ Assuming $\\phi_{p},\\phi_{(l,l^{\\prime})}$, $\\mathcal{S}_1$ and, $\\mathcal{S}_2$ to be unknown, we provide a randomized algorithm that queries $f$ and exactly recovers $\\mathcal{S}_1,\\mathcal{S}_2$. Consequently, this also enables us to estimate the underlying $\\phi_p, \\phi_{(l,l^{\\prime})}$. We derive sample complexity bounds for our scheme and also extend our analysis to include the situation where the queries are corrupted with noise -- either stochastic, or arbitrary but bounded. Lastly, we provide simulation results on synthetic data, that validate our theoretical findings.\nThe field of iterated belief change has focused mainly on revision, with the other main operator of AGM belief change theory, i.e. contraction, receiving relatively little attention. In this paper we extend the Harper Identity from single-step change to define iterated contraction in terms of iterated revision. Specifically, just as the Harper Identity provides a recipe for defining the belief set resulting from contracting A in terms of (i) the initial belief set and (ii) the belief set resulting from revision by not-A, we look at ways to define the plausibility ordering over worlds resulting from contracting A in terms of (iii) the initial plausibility ordering, and (iv) the plausibility ordering resulting from revision by not-A. After noting that the most straightforward such extension leads to a trivialisation of the space of permissible orderings, we provide a family of operators for combining plausibility orderings that avoid such a result. 
These operators are characterised in our domain of interest by a pair of intuitively compelling properties, which turn out to enable the derivation of a number of iterated contraction postulates from postulates for iterated revision. We finish by observing that a salient member of this family allows for the derivation of counterparts for contraction of some well known iterated revision operators, as well as for defining new iterated contraction operators.\nIt is important to have multi-agent robotic system specifications that ensure correctness properties of safety and liveness. As these systems have concurrency, and often have dynamic environment, the formal specification and verification of these systems along with step-wise refinement from abstract to concrete concepts play a major role in system correctness. Formal verification is used for exhaustive investigation of the system space thus ensuring that undetected failures in the behavior are excluded. We construct the system incrementally from subcomponents, based on software architecture. The challenge is to develop a safe multi-agent robotic system, more specifically to ensure the correctness properties of safety and liveness. Formal specifications based on model-checking are flexible, have a concrete syntax, and play vital role in correctness of a multi-agent robotic system. To formally verify safety and liveness of such systems is important because they have high concurrency and in most of the cases have dynamic environment. We have considered a case-study of a multi-agent robotic system for the transport of stock between storehouses to exemplify our formal approach. Our proposed development approach allows for formal verification during specification definition. 
The development process has been classified into four major phases of requirement specifications, verification specifications, architecture specifications and implementation.\nTwo important requirements when aggregating the preferences of multiple agents are that the outcome should be economically efficient and the aggregation mechanism should not be manipulable. In this paper, we provide a computer-aided proof of a sweeping impossibility using these two conditions for randomized aggregation mechanisms. More precisely, we show that every efficient aggregation mechanism can be manipulated for all expected utility representations of the agents' preferences. This settles an open problem and strengthens a number of existing theorems, including statements that were shown within the special domain of assignment. Our proof is obtained by formulating the claim as a satisfiability problem over predicates from real-valued arithmetic, which is then checked using an SMT (satisfiability modulo theories) solver. In order to verify the correctness of the result, a minimal unsatisfiable set of constraints returned by the SMT solver was translated back into a proof in higher-order logic, which was automatically verified by an interactive theorem prover. To the best of our knowledge, this is the first application of SMT solvers in computational social choice.\nWe consider the problem of selecting the best variable-value strategy for solving a given problem in constraint programming. We show that the recent Embarrassingly Parallel Search method (EPS) can be used for this purpose. EPS proposes to solve a problem by decomposing it into many subproblems and to give them on-demand to workers which run in parallel. Our method uses a part of these subproblems as a simple sample as defined in statistics for comparing some strategies in order to select the most promising one that will be used for solving the remaining subproblems.
For each subproblem of the sample, the parallelism helps us to control the running time of the strategies because it gives us the possibility to introduce timeouts by stopping a strategy when it requires more than twice the time of the best one. Thus, we can deal with the great disparity in solving times for the strategies. The selections we made are based on the Wilcoxon signed rank tests because no assumption has to be made on the distribution of the solving times and because these tests can deal with the censored data that we obtain after introducing timeouts. The experiments we performed on a set of classical benchmarks for satisfaction and optimization problems show that our method obtains good performance by selecting almost all the time the best variable-value strategy and by almost never choosing a variable-value strategy which is dramatically slower than the best one. Our method also outperforms the portfolio approach consisting of running some strategies in parallel and is competitive with the multi armed bandit framework.\nPrivacy has traditionally been a major motivation for decentralized problem solving. However, even though several metrics have been proposed to quantify it, none of them is easily integrated with common solvers. Constraint programming is a fundamental paradigm used to approach various families of problems. We introduce Utilitarian Distributed Constraint Satisfaction Problems (UDisCSP) where the utility of each state is estimated as the difference between the expected rewards for agreements on assignments for shared variables, and the expected cost of privacy loss. Therefore, a traditional DisCSP with privacy requirements is viewed as a planning problem. The actions available to agents are: communication and local inference. Common decentralized solvers are evaluated here from the point of view of their interpretation as greedy planners.
Further, we investigate some simple extensions where these solvers start taking into account the utility function. In these extensions we assume that the planning problem is further restricting the set of communication actions to only the communication primitives present in the corresponding solver protocols. The solvers obtained for the new type of problems propose the action (communication/inference) to be performed in each situation, defining thereby the policy.\nGroup discussions are essential for organizing every aspect of modern life, from faculty meetings to senate debates, from grant review panels to papal conclaves. While costly in terms of time and organization effort, group discussions are commonly seen as a way of reaching better decisions compared to solutions that do not require coordination between the individuals (e.g. voting)---through discussion, the sum becomes greater than the parts. However, this assumption is not irrefutable: anecdotal evidence of wasteful discussions abounds, and in our own experiments we find that over 30% of discussions are unproductive.   We propose a framework for analyzing conversational dynamics in order to determine whether a given task-oriented discussion is worth having or not. We exploit conversational patterns reflecting the flow of ideas and the balance between the participants, as well as their linguistic choices. We apply this framework to conversations naturally occurring in an online collaborative world exploration game developed and deployed to support this research. Using this setting, we show that linguistic cues and conversational patterns extracted from the first 20 seconds of a team discussion are predictive of whether it will be a wasteful or a productive one.\nTensor factorization is a powerful tool to analyse multi-way data. Compared with traditional multi-linear methods, nonlinear tensor factorization models are capable of capturing more complex relationships in the data. 
However, they are computationally expensive and may suffer severe learning bias in case of extreme data sparsity. To overcome these limitations, in this paper we propose a distributed, flexible nonlinear tensor factorization model. Our model can effectively avoid the expensive computations and structural restrictions of the Kronecker-product in existing TGP formulations, allowing an arbitrary subset of tensorial entries to be selected to contribute to the training. At the same time, we derive a tractable and tight variational evidence lower bound (ELBO) that enables highly decoupled, parallel computations and high-quality inference. Based on the new bound, we develop a distributed inference algorithm in the MapReduce framework, which is key-value-free and can fully exploit the memory cache mechanism in fast MapReduce systems such as SPARK. Experimental results fully demonstrate the advantages of our method over several state-of-the-art approaches, in terms of both predictive performance and computational efficiency. Moreover, our approach shows a promising potential in the application of Click-Through-Rate (CTR) prediction for online advertising.\nThis paper is motivated by a series of (related) questions as to whether a computer can have pleasure and pain, what pleasure (and intensity of pleasure) is, and, ultimately, what concepts of emotion are.   To determine what an emotion is, is a matter of conceptualization, namely, understanding and explicitly encoding the concept of emotion as people use it in everyday life. This is a notoriously difficult problem (Frijda, 1986, Fehr \\& Russell, 1984). This paper firstly shows why this is a difficult problem by aligning it with the conceptualization of a few other so called semantic primitives such as \"EXIST\", \"FORCE\", \"BIG\" (plus \"LIMIT\"). The definitions of these thought-to-be-indefinable concepts, given in this paper, show what formal definitions of concepts look like and how concepts are constructed. 
As a by-product, owing to the explicit account of the meaning of \"exist\", the famous dispute between Einstein and Bohr is naturally resolved from linguistic point of view. Secondly, defending Frijda's view that emotion is action tendency (or Ryle's behavioral disposition (propensity)), we give a list of emotions defined in terms of action tendency. In particular, the definitions of pleasure and the feeling of beauty are presented.   Further, we give a formal definition of \"action tendency\", from which the concept of \"intensity\" of emotions (including pleasure) is naturally derived in a formal fashion. The meanings of \"wish\", \"wait\", \"good\", \"hot\" are analyzed.\nDespite being the standard loss function to train multi-class neural networks, the log-softmax has two potential limitations. First, it involves computations that scale linearly with the number of output classes, which can restrict the size of problems we are able to tackle with current hardware. Second, it remains unclear how close it matches the task loss such as the top-k error rate or other non-differentiable evaluation metrics which we aim to optimize ultimately. In this paper, we introduce an alternative classification loss function, the Z-loss, which is designed to address these two issues. Unlike the log-softmax, it has the desirable property of belonging to the spherical loss family (Vincent et al., 2015), a class of loss functions for which training can be performed very efficiently with a complexity independent of the number of output classes. We show experimentally that it significantly outperforms the other spherical loss functions previously investigated. Furthermore, we show on a word language modeling task that it also outperforms the log-softmax with respect to certain ranking scores, such as top-k scores, suggesting that the Z-loss has the flexibility to better match the task loss. 
These qualities thus make the Z-loss an appealing candidate to train very efficiently large output networks such as word-language models or other extreme classification problems. On the One Billion Word (Chelba et al., 2014) dataset, we are able to train a model with the Z-loss 40 times faster than the log-softmax and more than 4 times faster than the hierarchical softmax.\nIn existing literature, while approximate approaches based on Monte-Carlo simulation technique have been proposed to compute the semantics of probabilistic argumentation, how to improve the efficiency of computation without using simulation technique is still an open problem. In this paper, we address this problem from the following two perspectives. First, conceptually, we define specific properties to characterize the subgraphs of a PrAG with respect to a given extension, such that the probability of a set of arguments E being an extension can be defined in terms of these properties, without (or with less) construction of subgraphs. Second, computationally, we take preferred semantics as an example, and develop algorithms to evaluate the efficiency of our approach. The results show that our approach not only dramatically decreases the time for computing p(E^\\sigma), but also has an attractive property, which is contrary to that of existing approaches: the denser the edges of a PrAG are or the bigger the size of a given extension E is, the more efficiently our approach computes p(E^\\sigma). Meanwhile, it is shown that under complete and preferred semantics, the problems of determining p(E^\\sigma) are fixed-parameter tractable.\nSteering a complex system towards a desired outcome is a challenging task. The lack of clarity on the system's exact architecture and the often scarce scientific data upon which to base the operationalisation of the dynamic rules that underpin the interactions between participant entities are two contributing factors.
We describe an analytical approach that builds on Fuzzy Cognitive Mapping (FCM) to address the latter and represent the system as a complex network. We apply results from network controllability to address the former and determine minimal control configurations - subsets of factors, or system levers, which comprise points for strategic intervention in steering the system. We have implemented the combination of these techniques in an analytical tool that runs in the browser, and generates all minimal control configurations of a complex network. We demonstrate our approach by reporting on our experience of working alongside industrial, local-government, and NGO stakeholders in the Humber region, UK. Our results are applied to the decision-making process involved in the transition of the region to a bio-based economy.\nAnswer Set Programming (ASP) is a popular logic programming paradigm that has been applied for solving a variety of complex problems. Among the most challenging real-world applications of ASP are two industrial problems defined by Siemens: the Partner Units Problem (PUP) and the Combined Configuration Problem (CCP). The hardest instances of PUP and CCP are out of reach for state-of-the-art ASP solvers. Experiments show that the performance of ASP solvers could be significantly improved by embedding domain-specific heuristics, but a proper effective integration of such criteria in off-the-shelf ASP implementations is not obvious. In this paper the combination of ASP and domain-specific heuristics is studied with the goal of effectively solving real-world problem instances of PUP and CCP. As a byproduct of this activity, the ASP solver WASP was extended with an interface that eases embedding new external heuristics in the solver. The evaluation shows that our domain-heuristic-driven ASP solver finds solutions for all the real-world instances of PUP and CCP ever provided by Siemens. 
This paper is under consideration for acceptance in TPLP.\nWord puzzles and the problem of their representations in logic languages have received considerable attention in the last decade (Ponnuru et al. 2004; Shapiro 2011; Baral and Dzifcak 2012; Schwitter 2013). Of special interest is the problem of generating such representations directly from natural language (NL) or controlled natural language (CNL). An interesting and, to the best of our knowledge, scarcely explored variation of this problem arises when the input information is inconsistent. In such situations, the existing encodings of word puzzles produce inconsistent representations and break down. In this paper, we bring a well-known paraconsistent logic, Annotated Predicate Calculus (APC) (Kifer and Lozinskii 1992), to bear on the problem. We introduce a new kind of non-monotonic semantics for APC, called consistency preferred stable models, and argue that it makes APC into a suitable platform for dealing with inconsistency in word puzzles and, more generally, in NL sentences. We also devise a number of general principles to help the user choose among the different representations of NL sentences, which might seem equivalent but, in fact, behave differently when inconsistent information is taken into account. These principles can be incorporated into existing CNL translators, such as Attempto Controlled English (ACE) (Fuchs et al. 2008) and PENG Light (White and Schwitter 2009). Finally, we show that APC with the consistency preferred stable model semantics can be equivalently embedded in ASP with preferences over stable models, and we use this embedding to implement this version of APC in Clingo (Gebser et al. 2011) and its Asprin add-on (Brewka et al. 2015).\nSome argue that biologically inspired algorithms are the future of solving difficult problems in computer science. 
Others strongly believe that the future lies in the exploration of mathematical foundations of the problems at hand. The field of computer security tends to accept the latter view as a more appropriate approach due to its more workable validation and verification possibilities. The lack of rigorous scientific practices prevalent in biologically inspired security research does not aid in presenting bio-inspired security approaches as a viable way of dealing with complex security problems. This chapter introduces a biologically inspired algorithm, called the Self-Organising Map (SOM), that was developed by Teuvo Kohonen in 1981. Since the algorithm's inception it has been scrutinised by the scientific community and analysed in more than 4000 research papers, many of which dealt with various computer security issues, from anomaly detection and analysis of executables all the way to wireless network monitoring. In this chapter a review of past security-related SOM research is presented and analysed. The algorithm's biological analogies are detailed and the author's view on the future possibilities of this successful bio-inspired approach is given. The SOM algorithm's close relation to a number of vital functions of the human brain and the emergence of multi-core computer architectures are the two main reasons behind our assumption that the future of the SOM algorithm and its variations is promising, notably in the field of computer security.\nAnswer set programming (ASP) is a well-established logic programming language that offers an intuitive, declarative syntax for problem solving. In its traditional application, a fixed ASP program for a given problem is designed and the actual instance of the problem is fed into the program as a set of facts. This approach typically results in programs with comparatively short and simple rules. 
However, as is known from complexity analysis, such an approach limits the expressive power of ASP; in fact, an entire NP-check can be encoded into a single large rule body of bounded arity that performs both a guess and a check within the same rule.   Here, we propose a novel paradigm for encoding hard problems in ASP by making explicit use of large rules which depend on the actual instance of the problem. We illustrate how this new encoding paradigm can be used, providing examples of problems from the first, second, and even third level of the polynomial hierarchy. As state-of-the-art solvers are tuned towards short rules, rule decomposition is a key technique in the practical realization of our approach. We also provide some preliminary benchmarks which indicate that giving up the convenient way of specifying a fixed program can lead to a significant speed-up.   This paper is under consideration for acceptance into TPLP.\nIn recent years, several frameworks and systems have been proposed that extend Inductive Logic Programming (ILP) to the Answer Set Programming (ASP) paradigm. In ILP, examples must all be explained by a hypothesis together with a given background knowledge. In existing systems, the background knowledge is the same for all examples; however, examples may be context-dependent. This means that some examples should be explained in the context of some information, whereas others should be explained in different contexts. In this paper, we capture this notion and present a context-dependent extension of the Learning from Ordered Answer Sets framework. In this extension, contexts can be used to further structure the background knowledge. We then propose a new iterative algorithm, ILASP2i, which exploits this feature to scale up the existing ILASP2 system to learning tasks with large numbers of examples. We demonstrate the gain in scalability by applying both algorithms to various learning tasks. 
Our results show that, compared to ILASP2, the newly proposed ILASP2i system can be two orders of magnitude faster and use two orders of magnitude less memory, whilst preserving the same average accuracy. This paper is under consideration for acceptance in TPLP.\nAdaptive behavior is mainly the result of adaptive brains. We go a step further and claim that the brain does not only adapt to its surrounding reality but rather builds itself up to construct its own reality. That is, rather than just trying to passively understand its environment, the brain is the architect of its own reality in an active process where its internal models of the external world frame how its new interactions with the environment are assimilated. These internal models represent relevant predictive patterns of interaction all over the different brain structures: perceptual, sensorimotor, motor, etc. Adaptive behavior emerges from this self-constructive nature of the brain, based on the following principles of organization: self-experimental, self-growing, and self-repairing. Self-experimental, since to ensure survival, the self-constructive brain (SCB) is an active machine capable of performing experiments on its own interactions with the environment by mental simulation. Self-growing, since it dynamically and incrementally constructs internal structures in order to build a model of the world as it gathers statistics from its interactions with the environment. Self-repairing, since to survive the SCB must also be robust and capable of finding ways to repair parts of previously working structures and hence reconstruct a previously relevant pattern of activity.\nMarkov networks are extensively used to model complex sequential, spatial, and relational interactions in a wide range of fields. 
By learning the structure of independences of a domain, more accurate joint probability distributions can be obtained for inference tasks or, more directly, for interpreting the most significant relations among the variables. Recently, several researchers have investigated techniques for automatically learning the structure from data by obtaining the probabilistic maximum-a-posteriori structure given the available data. However, all the proposed approximations decompose the posterior of the whole structure into local sub-problems, by assuming that the posteriors of the Markov blankets of all the variables are mutually independent. In this work, we propose a scoring function that relaxes this assumption. The Blankets Joint Posterior score computes the joint posterior of structures as a joint distribution over the collection of its Markov blankets. Essentially, the whole posterior is obtained by computing the posterior of the blanket of each variable as a conditional distribution that takes into account information from other blankets in the network. We show in our experimental results that the proposed approximation can improve the sample complexity of state-of-the-art scores when learning complex networks, where the independence assumption between blanket variables is clearly incorrect.\nSeveral methods exist to infer causal networks from massive volumes of observational data. However, almost all existing methods require a considerable length of time-series data to capture cause-and-effect relationships. In contrast, memory-less transition networks or Markov Chain data, which refer to one-step transitions to and from an event, have not been explored for causality inference even though such data is widely available. We find that causal networks can be inferred from the characteristics of four unique distribution zones around each event. We call this Composition of Transitions and show that cause, effect, and random events exhibit different behavior in their compositions. 
We applied machine learning models to learn these different behaviors and to infer causality. We name this new method Causality Inference using Composition of Transitions (CICT). To evaluate CICT, we used an administrative inpatient healthcare dataset to set up a network of patients' transitions between different diagnoses. We show that CICT is highly accurate in inferring whether the transition between a pair of events is causal or random and performs well in identifying the direction of causality in a bi-directional association.\nDeriving an effective facial expression recognition component is important for a successful human-computer interaction system. Nonetheless, recognizing facial expression remains a challenging task. This paper describes a novel approach to the facial expression recognition task. The proposed method is motivated by the success of Convolutional Neural Networks (CNN) on the face recognition problem. Unlike other works, we focus on achieving good accuracy while requiring only a small amount of training data. Scale Invariant Feature Transform (SIFT) features are used to improve performance on small datasets, as SIFT does not require extensive training data to generate useful features. In this paper, both Dense SIFT and regular SIFT are studied and compared when merged with CNN features. Moreover, an aggregator of the models is developed. The proposed approach is tested on the FER-2013 and CK+ datasets. Results demonstrate the superiority of CNN with Dense SIFT over conventional CNN and CNN with SIFT. Accuracy increased further when all the models were aggregated, yielding state-of-the-art results of 73.4% on FER-2013 and 99.1% on CK+.\nWe propose stochastic rank-$1$ bandits, a class of online learning problems where at each step a learning agent chooses a pair of row and column arms, and receives the product of their values as a reward. 
The main challenge of the problem is that the individual values of the row and column are unobserved. We assume that these values are stochastic and drawn independently. We propose a computationally efficient algorithm for solving our problem, which we call Rank1Elim. We derive an $O((K + L) (1 / \\Delta) \\log n)$ upper bound on its $n$-step regret, where $K$ is the number of rows, $L$ is the number of columns, and $\\Delta$ is the minimum of the row and column gaps, under the assumption that the mean row and column rewards are bounded away from zero. To the best of our knowledge, we present the first bandit algorithm that finds the maximum entry of a rank-$1$ matrix whose regret is linear in $K + L$, $1 / \\Delta$, and $\\log n$. We also derive a nearly matching lower bound. Finally, we evaluate Rank1Elim empirically on multiple problems. We observe that it leverages the structure of our problems and can learn near-optimal solutions even if our modeling assumptions are mildly violated.\nCharacterizing genes with semantic information is an important process regarding the description of gene products. Although the complete genomes of many organisms have already been sequenced, the biological functions of their genes are still not fully known. Since experimentally studying the functions of those genes, one by one, would be infeasible, new computational methods for gene function inference are needed. We present here a novel computational approach for inferring biological function for a set of genes with previously unknown function, given a set of genes with well-known information. This approach is based on the premise that genes with similar behaviour should be grouped together. This is known as the guilt-by-association principle. Thus, it is possible to take advantage of clustering techniques to obtain groups of unknown genes that are co-clustered with genes that have well-known semantic information (GO annotations). 
Meaningful knowledge to infer unknown semantic information can therefore be provided by these well-known genes. We provide a method to explore the potential function of new genes according to those currently annotated. The results obtained indicate that the proposed approach could be a useful and effective tool when used by biologists to guide the inference of biological functions for recently discovered genes. Our work sets an important landmark in the field of identifying unknown gene functions through clustering, using an external source of biological input. A simple web interface to this proposal can be found at http://fich.unl.edu.ar/sinc/webdemo/gamma-am/.\nFinding the most effective way to aggregate multi-subject fMRI data is a long-standing and challenging problem. It is of increasing interest in contemporary fMRI studies of human cognition due to the scarcity of data per subject and the variability of brain anatomy and functional response across subjects. Recent work on latent factor models shows promising results in this task but this approach does not preserve spatial locality in the brain. We examine two ways to combine the ideas of a factor model and a searchlight based analysis to aggregate multi-subject fMRI data while preserving spatial locality. We first do this directly by combining a recent factor method known as a shared response model with searchlight analysis. Then we design a multi-view convolutional autoencoder for the same task. Both approaches preserve spatial locality and have competitive or better performance compared with standard searchlight analysis and the shared response model applied across the whole brain. We also report a system design to handle the computational challenge of training the convolutional autoencoder.\nPlanning plays an important role in the broad class of decision theory. Planning has drawn much attention in recent work in the robotics and sequential decision making areas. 
Recently, Reinforcement Learning (RL), as an agent-environment interaction problem, has brought further attention to planning methods. Generally in RL, one can assume a generative model, e.g. graphical models, for the environment, and then the task for the RL agent is to learn the model parameters and find the optimal strategy based on these learnt parameters. Based on environment behavior, the agent can assume various types of generative models, e.g. Multi-Armed Bandit for a static environment, or Markov Decision Process (MDP) for a dynamic environment. The advantage of these popular models is their simplicity, which results in tractable methods of learning the parameters and finding the optimal policy. The drawback of these models is again their simplicity: these models usually underfit and underestimate the actual environment behavior. For example, in robotics, the agent usually has noisy observations of the environment's inner state, and an MDP is not a suitable model. More complex models like the Partially Observable Markov Decision Process (POMDP) can compensate for this drawback. Fitting this model to the environment, where the partial observation is given to the agent, generally yields a dramatic performance improvement, sometimes an unbounded one, compared to an MDP. In general, finding the optimal policy for the POMDP model is computationally intractable and fully non-convex, even for the class of memoryless policies. The open problem is to come up with a method to find an exact or an approximate optimal stochastic memoryless policy for POMDP models.\nState-of-the-art answer set programming (ASP) solvers rely on a program called a grounder to convert non-ground programs containing variables into variable-free, propositional programs. The size of this grounding depends heavily on the size of the non-ground rules, and thus, reducing the size of such rules is a promising approach to improve solving performance. 
To this end, in this paper we announce lpopt, a tool that decomposes large logic programming rules into smaller rules that are easier to handle for current solvers. The tool is specifically tailored to handle the standard syntax of the ASP language (ASP-Core) and makes it easier for users to write efficient and intuitive ASP programs, which would otherwise often require significant hand-tuning by expert ASP engineers. It is based on an idea proposed by Morak and Woltran (2012) that we extend significantly in order to handle the full ASP syntax, including complex constructs like aggregates, weak constraints, and arithmetic expressions. We present the algorithm, the theoretical foundations on how to treat these constructs, as well as an experimental evaluation showing the viability of our approach.\nAccuracy and interpretability are two dominant features of successful predictive models. Typically, a choice must be made in favor of complex black box models such as recurrent neural networks (RNN) for accuracy versus less accurate but more interpretable traditional models such as logistic regression. This tradeoff poses challenges in medicine where both accuracy and interpretability are important. We addressed this challenge by developing the REverse Time AttentIoN model (RETAIN) for application to Electronic Health Records (EHR) data. RETAIN achieves high accuracy while remaining clinically interpretable and is based on a two-level neural attention model that detects influential past visits and significant clinical variables within those visits (e.g. key diagnoses). RETAIN mimics physician practice by attending the EHR data in a reverse time order so that recent clinical visits are likely to receive higher attention. 
RETAIN was tested on a large health system EHR dataset with 14 million visits completed by 263K patients over an 8 year period and demonstrated predictive accuracy and computational scalability comparable to state-of-the-art methods such as RNN, and ease of interpretability comparable to traditional models.\nIn this work a discrete counterpart to the continuous harmonic potential field approach is suggested. The extension to the discrete case makes use of the strong relation HPF-based planning has to connectionist artificial intelligence (AI). Connectionist AI systems are networks of simple, interconnected processors running in parallel within the confines of the environment in which the planning action is to be synthesized. It is not hard to see that such a paradigm naturally lends itself to planning on weighted graphs where the processors may be seen as the vertices of the graph and the relations among them as its edges. Electrical networks are an effective realization of connectionist AI. The utility of the discrete HPF (DHPF) approach is demonstrated in three ways. First, the capability of the DHPF approach to generate new, abstract, planning techniques is demonstrated by constructing a novel, efficient, optimal, discrete planning method called the M* algorithm. Also, its ability to augment the capabilities of existing planners is demonstrated by suggesting a generic solution to the lower bound problem faced by the A* algorithm. The DHPF approach is shown to be useful in solving specific planning problems in communication. It is demonstrated that the discrete HPF paradigm can support routing on-the-fly while the network is still in a transient state. 
It is shown by simulation that if a path to the target always exists and the switching delays in the routers are negligible, a packet will reach its destination despite the changes in the network which may simultaneously take place while the packet is being routed.\nPartial monitoring games are repeated games where the learner receives feedback that might be different from the adversary's move or even the reward gained by the learner. Recently, a general model of combinatorial partial monitoring (CPM) games was proposed \\cite{lincombinatorial2014}, where the learner's action space can be exponentially large and the adversary samples its moves from a bounded, continuous space, according to a fixed distribution. The paper gave a confidence-bound-based algorithm (GCB) that achieves $O(T^{2/3}\\log T)$ distribution-independent and $O(\\log T)$ distribution-dependent regret bounds. The implementation of their algorithm depends on two separate offline oracles and the distribution-dependent regret additionally requires the existence of a unique optimal action for the learner. Adopting their CPM model, our first contribution is a Phased Exploration with Greedy Exploitation (PEGE) algorithmic framework for the problem. Different algorithms within the framework achieve $O(T^{2/3}\\sqrt{\\log T})$ distribution-independent and $O(\\log^2 T)$ distribution-dependent regret respectively. Crucially, our framework needs only the simpler \"argmax\" oracle from GCB and the distribution-dependent regret does not require the existence of a unique optimal action. Our second contribution is another algorithm, PEGE2, which combines gap estimation with a PEGE algorithm, to achieve an $O(\\log T)$ regret bound, matching the GCB guarantee but removing the dependence on the size of the learner's action space. However, like GCB, PEGE2 requires access to both offline oracles and the existence of a unique optimal action. 
Finally, we discuss how our algorithm can be efficiently applied to a CPM problem of practical interest: namely, online ranking with feedback at the top.\nDeep learning has been popularized by its recent successes on challenging artificial intelligence problems. One of the reasons for its dominance is also an ongoing challenge: the need for immense amounts of computational power. Hardware architects have responded by proposing a wide array of promising ideas, but to date, the majority of the work has focused on specific algorithms in somewhat narrow application domains. While their specificity does not diminish these approaches, there is a clear need for more flexible solutions. We believe the first step is to examine the characteristics of cutting-edge models from across the deep learning community. Consequently, we have assembled Fathom: a collection of eight archetypal deep learning workloads for study. Each of these models comes from a seminal work in the deep learning community, ranging from the familiar deep convolutional neural network of Krizhevsky et al., to the more exotic memory networks from Facebook's AI research group. Fathom has been released online, and this paper focuses on understanding the fundamental performance characteristics of each model. We use a set of application-level modeling tools built around the TensorFlow deep learning framework in order to analyze the behavior of the Fathom workloads. We present a breakdown of where time is spent, the similarities between the performance profiles of our models, an analysis of behavior in inference and training, and the effects of parallelism on scaling.\nSequential data modeling and analysis have become indispensable tools for analyzing sequential data, such as time-series data, because larger amounts of sensed event data have become available. These methods capture the sequential structure of the data of interest, such as input-output relationships and correlations among datasets. 
However, since most studies in this area are specialized or limited to their respective applications, rigorous requirement analysis of such models has not been carried out from a general point of view. Hence, we particularly examine the structure of sequential data, and extract the necessity of \"state duration\" and \"state interval\" of events for efficient and rich representation of sequential data. Specifically addressing the hidden semi-Markov model (HSMM) that represents such state duration inside a model, we attempt to add the capability of representing the state interval of events to HSMM. To this end, we propose two extended models: one is the interval state hidden semi-Markov model (IS-HSMM), which expresses the length of the state interval with a special state node designated as an \"interval state node\". The other is the interval length probability hidden semi-Markov model (ILP-HSMM), which represents the length of the state interval with a new probabilistic parameter, \"interval length probability.\" From exhaustive simulations, we show superior performance of the proposed models in comparison with HSMM. To the best of our knowledge, our proposed models are the first extensions of HMM to support state interval representation as well as state duration representation.\nMulti-view data clustering refers to categorizing a data set by making good use of related information from multiple representations of the data. It has become important because more and more data can be collected in a variety of ways, in different settings and from different sources, so each data set can be represented by different sets of features to form different views of it. Many approaches have been proposed to improve clustering performance by exploring and integrating heterogeneous information underlying different views. In this paper, we propose a new multi-view fuzzy clustering approach called MinimaxFCM by using minimax optimization based on the well-known fuzzy c-means. 
In MinimaxFCM the consensus clustering results are generated based on minimax optimization in which the maximum disagreements of different weighted views are minimized. Moreover, the weight of each view can be learned automatically in the clustering process. In addition, there is only one parameter to be set besides the fuzzifier. The detailed problem formulation, update rule derivation, and the in-depth analysis of the proposed MinimaxFCM are provided here. Experimental studies on nine multi-view data sets including real-world image and document data sets have been conducted. We observed that MinimaxFCM outperforms related multi-view clustering approaches in terms of clustering accuracy, demonstrating the great potential of MinimaxFCM for multi-view data analysis.\nA great video title describes the most salient event compactly and captures the viewer's attention. In contrast, video captioning tends to generate sentences that describe the video as a whole. Although generating a video title automatically is a very useful task, it is much less addressed than video captioning. We address video title generation for the first time by proposing two methods that extend state-of-the-art video captioners to this new task. First, we make video captioners highlight-sensitive by priming them with a highlight detector. Our framework allows for jointly training a model for title generation and video highlight localization. Second, we induce high sentence diversity in video captioners, so that the generated titles are also diverse and catchy. This means that a large number of sentences might be required to learn the sentence structure of titles. Hence, we propose a novel sentence augmentation method to train a captioner with additional sentence-only examples that come without corresponding videos. We collected a large-scale Video Titles in the Wild (VTW) dataset of 18100 automatically crawled user-generated videos and titles. 
On VTW, our methods consistently improve title prediction accuracy, and achieve the best performance in both automatic and human evaluation. Finally, our sentence augmentation method also outperforms the baselines on the M-VAD dataset.\nANDy (Activity Networks with Delays) is a discrete-time framework aimed at the qualitative modelling of time-dependent activities. The modular and concise syntax makes ANDy suitable for an easy and natural modelling of time-dependent biological systems (e.g., regulatory pathways). Activities involve entities playing the role of activators, inhibitors or products of biochemical network operation. Activities may have a given duration, i.e., the time required to obtain results. An entity may represent an object (e.g., an agent, a biochemical species or a family thereof) with a local attribute, a state denoting its level (e.g., concentration, strength). Entity levels may change as a result of an activity or may decay gradually as time passes. The semantics of ANDy is formally given via high-level Petri nets, thereby ensuring some modularity. As main results, we show that ANDy systems have finite state representations even for potentially infinite processes and that ANDy adapts well to the modelling of toxic behaviours. As an illustration, we present a classification of toxicity properties and give some hints on how they can be verified with existing tools on ANDy systems. A small case study on blood glucose regulation is provided to exemplify the ANDy framework and the toxicity properties.\nRelation classification is associated with many potential applications in the artificial intelligence area. Recent approaches usually leverage neural networks based on structure features such as syntactic or dependency features to solve this problem. However, high-cost structure features make such approaches inconvenient to use directly. In addition, structure features are probably domain-dependent. 
Therefore, this paper proposes a bi-directional long-short-term-memory recurrent-neural-network (Bi-LSTM-RNN) model based on low-cost sequence features to address relation classification. This model divides a sentence or text segment into five parts, namely two target entities and their three contexts. It learns the representations of entities and their contexts, and uses them to classify relations. We evaluate our model on two standard benchmark datasets in different domains, namely SemEval-2010 Task 8 and BioNLP-ST 2016 Task BB3. On the former dataset, our model achieves performance comparable to other models using sequence features. On the latter dataset, our model obtains the third-best results compared with other models in the official evaluation. Moreover, we find that the context between two target entities plays the most important role in relation classification. Furthermore, statistical experiments show that the context between two target entities can be used as an approximate replacement of the shortest dependency path when dependency parsing is not used.\nAn important role carried out by cyber-security experts is the assessment of proposed computer systems, during their design stage. This task is fraught with difficulties and uncertainty, making the knowledge provided by human experts essential for successful assessment. Today, the increasing number of progressively complex systems has led to an urgent need to produce tools that support the expert-led process of system-security assessment. In this research, we use weighted averages (WAs) and ordered weighted averages (OWAs) with evolutionary algorithms (EAs) to create aggregation operators that model parts of the assessment process. We show how individual overall ratings for security components can be produced from ratings of their characteristics, and how these individual overall ratings can be aggregated to produce overall rankings of potential attacks on a system. 
As well as the identification of salient attacks and weak points in a prospective system, the proposed method also highlights which factors and security components contribute most to a component's difficulty and attack ranking respectively. A real world scenario is used in which experts were asked to rank a set of technical attacks, and to answer a series of questions about the security components that are the subject of the attacks. The work shows how finding good aggregation operators, and identifying important components and factors of a cyber-security problem can be automated. The resulting operators have the potential for use as decision aids for systems designers and cyber-security experts, increasing the amount of assessment that can be achieved with the limited resources available.\nWith the constant growth of the World Wide Web and the number of documents in different languages accordingly, the need for reliable language detection tools has increased as well. Platforms such as Twitter with predominantly short texts are becoming important information resources, which additionally imposes the need for short texts language detection algorithms. In this paper, we show how incorporating personalized user-specific information into the language detection algorithm leads to an important improvement of detection results. To choose the best algorithm for language detection for short text messages, we investigate several machine learning approaches. These approaches include the use of the well-known classifiers such as SVM and logistic regression, a dictionary based approach, and a probabilistic model based on modified Kneser-Ney smoothing. Furthermore, the extension of the probabilistic model to include additional user-specific information such as evidence accumulation per user and user interface language is explored, with the goal of improving the classification performance. 
The proposed approaches are evaluated on randomly collected Twitter data containing Latin as well as non-Latin alphabet languages and the quality of the obtained results is compared, followed by the selection of the best-performing algorithm. This algorithm is then evaluated against two already existing general language detection tools: Chromium Compact Language Detector 2 (CLD2) and langid, where our method significantly outperforms the results achieved by both of the mentioned methods. Additionally, a preview of benefits and possible applications of having a reliable language detection algorithm is given.\nThe tremendous success of ImageNet-trained deep features on a wide range of transfer tasks begs the question: what are the properties of the ImageNet dataset that are critical for learning good, general-purpose features? This work provides an empirical investigation of various facets of this question: Is more pre-training data always better? How does feature quality depend on the number of training examples per class? Does adding more object classes improve performance? For the same data budget, how should the data be split into classes? Is fine-grained recognition necessary for learning good features? Given the same number of training classes, is it better to have coarse classes or fine-grained classes? Which is better: more classes or more examples per class? To answer these and related questions, we pre-trained CNN features on various subsets of the ImageNet dataset and evaluated transfer performance on PASCAL detection, PASCAL action classification, and SUN scene classification tasks. Our overall findings suggest that most changes in the choice of pre-training data long thought to be critical do not significantly affect transfer performance.
\nThe number of completely sequenced chloroplast genomes increases rapidly every day, leading to the possibility to build large-scale phylogenetic trees of plant species. Considering a subset of close plant species defined according to their chloroplasts, the phylogenetic tree that can be inferred by their core genes is not necessarily well supported, due to the possible occurrence of problematic genes (i.e., homoplasy, incomplete lineage sorting, horizontal gene transfers, etc.) which may blur the phylogenetic signal. However, a trustworthy phylogenetic tree can still be obtained provided the number of such blurring genes is reduced. The problem is thus to determine the largest subset of core genes that produces the best-supported tree. To discard problematic genes and due to the overwhelming number of possible combinations, this article focuses on how to extract the largest subset of sequences in order to obtain the most supported species tree. Due to computational complexity, a Binary Particle Swarm Optimization (BPSO) is proposed in both sequential and distributed versions. Obtained results from both versions of the BPSO are compared with those computed using a hybrid approach embedding both genetic algorithms and statistical tests. The proposal has been applied to different cases of plant families, leading to encouraging results for these families.\nMany problems, especially those with a composite structure, can naturally be expressed in higher order logic. From a KR perspective, modeling these problems in an intuitive way is a challenging task. In this paper we study the graph mining problem as an example of a higher order problem. In short, this problem asks us to find a graph that frequently occurs as a subgraph among a set of example graphs. We start from the problem's mathematical definition to solve it in three state-of-the-art specification systems. 
For IDP and ASP, which have no native support for higher order logic, we propose the use of encoding techniques such as the disjoint union technique and the saturation technique. ProB benefits from the higher order support for sets. We compare the performance of the three approaches to get an idea of the overhead of the higher order support.   We propose higher-order language extensions for IDP-like specification languages and discuss what kind of solver support is needed. Native higher order shifts the burden of rewriting specifications using encoding techniques from the user to the solver itself.\nOver the years, nonmonotonic rules have proven to be a very expressive and useful knowledge representation paradigm. They have recently been used to complement the expressive power of Description Logics (DLs), leading to the study of integrative formal frameworks, generally referred to as hybrid knowledge bases, where both DL axioms and rules can be used to represent knowledge. The need to use these hybrid knowledge bases in dynamic domains has called for the development of update operators, which, given the substantially different way Description Logics and rules are usually updated, has turned out to be an extremely difficult task.   In [SL10], a first step towards addressing this problem was taken, and an update operator for hybrid knowledge bases was proposed. Despite its significance -- not only for being the first update operator for hybrid knowledge bases in the literature, but also because it has some applications - this operator was defined for a restricted class of problems where only the ABox was allowed to change, which considerably diminished its applicability. Many applications that use hybrid knowledge bases in dynamic scenarios require both DL axioms and rules to be updated.   
In this paper, motivated by real world applications, we introduce an update operator for a large class of hybrid knowledge bases where both the DL component and the rule component are allowed to change dynamically. We introduce splitting sequences and a splitting theorem for hybrid knowledge bases, use them to define a modular update semantics, investigate its basic properties, and illustrate its use on a realistic example about cargo imports.\nDeveloping smart house systems has been a great challenge for researchers and engineers in this area because implementing and evaluating these systems is costly and very time-consuming. Testing a designed smart house before actually building it is considered an obstacle towards an efficient smart house project. This is because of the variety of sensors, home appliances and devices available for a real smart environment. In this paper, we present the design and implementation of a multi-purpose smart house simulation system for designing and simulating all aspects of a smart house environment. This simulator provides the ability to design the house plan and different virtual sensors and appliances in a two-dimensional model of the virtual house environment. This simulator can connect to any external smart house remote controlling system, making it much easier than before to evaluate such systems. It also supports adding newly emerging sensors and devices in detail, to help maintain its compatibility with future simulation needs. Scenarios can also be defined for testing various possible combinations of device states, so different criteria and variables can be evaluated simply without the need for experimenting in a real environment.\nUsing an interactive theorem prover to reason about programs involves a sequence of interactions where the user challenges the theorem prover with conjectures. 
Invariably, many of the conjectures posed are in fact false, and users often spend considerable effort examining the theorem prover's output before realizing this. We present a synergistic integration of testing with theorem proving, implemented in the ACL2 Sedan (ACL2s), for automatically generating concrete counterexamples. Our method uses the full power of the theorem prover and associated libraries to simplify conjectures; this simplification can transform conjectures for which finding counterexamples is hard into conjectures where finding counterexamples is trivial. In fact, our approach even leads to better theorem proving, e.g. if testing shows that a generalization step leads to a false conjecture, we force the theorem prover to backtrack, allowing it to pursue more fruitful options that may yield a proof. The focus of the paper is on the engineering of a synergistic integration of testing with interactive theorem proving; this includes extending ACL2 with new functionality that we expect to be of general interest. We also discuss our experience in using ACL2s to teach freshman students how to reason about their programs.\nWe propose a nonparametric generalization of belief propagation, Kernel Belief Propagation (KBP), for pairwise Markov random fields. Messages are represented as functions in a reproducing kernel Hilbert space (RKHS), and message updates are simple linear operations in the RKHS. KBP makes none of the assumptions commonly required in classical BP algorithms: the variables need not arise from a finite domain or a Gaussian distribution, nor must their relations take any particular parametric form. Rather, the relations between variables are represented implicitly, and are learned nonparametrically from training data. KBP has the advantage that it may be used on any domain where kernels are defined (Rd, strings, groups), even where explicit parametric models are not known, or closed form expressions for the BP updates do not exist. 
The computational cost of message updates in KBP is polynomial in the training data size. We also propose a constant-time approximate message update procedure by representing messages using a small number of basis functions. In experiments, we apply KBP to image denoising, depth prediction from still images, and protein configuration prediction: KBP is faster than competing classical and nonparametric approaches (by orders of magnitude, in some cases), while providing significantly more accurate results.\nThis paper applies machine learning and the mathematics of chaos to the task of designing indoor rock-climbing routes. Chaotic variation has been used to great advantage in music and dance, but the challenges here are quite different, beginning with the representation. We present a formalized system for transcribing rock climbing problems, then describe a variation generator that is designed to support human route-setters in designing new and interesting climbing problems. This variation generator, termed Strange Beta, combines chaos and machine learning, using the former to introduce novelty and the latter to smooth transitions in a manner that is consistent with the style of the climbs. This entails parsing the domain-specific natural language that rock climbers use to describe routes and movement and then learning the patterns in the results. We validated this approach with a pilot study in a small university rock climbing gym, followed by a large blinded study in a commercial climbing gym, in cooperation with experienced climbers and expert route setters. The results show that {\\sc Strange Beta} can help a human setter produce routes that are at least as good as, and in some cases better than, those produced in the traditional manner.\nThe distribution semantics is one of the most prominent approaches for the combination of logic programming and probability theory. 
Many languages follow this semantics, such as Independent Choice Logic, PRISM, pD, Logic Programs with Annotated Disjunctions (LPADs) and ProbLog. When a program contains function symbols, the distribution semantics is well-defined only if the set of explanations for a query is finite and so is each explanation. Well-definedness is usually either explicitly imposed or is achieved by severely limiting the class of allowed programs. In this paper we identify a larger class of programs for which the semantics is well-defined together with an efficient procedure for computing the probability of queries. Since LPADs offer the most general syntax, we present our results for them, but our results are applicable to all languages under the distribution semantics. We present the algorithm \"Probabilistic Inference with Tabling and Answer subsumption\" (PITA) that computes the probability of queries by transforming a probabilistic program into a normal program and then applying SLG resolution with answer subsumption. PITA has been implemented in XSB and tested on six domains: two with function symbols and four without. The execution times are compared with those of ProbLog, cplint and CVE; PITA was almost always able to solve larger problems in a shorter time, on domains with and without function symbols.\nWe study the performance of different message passing algorithms in the two-dimensional Edwards-Anderson model. We show that the standard Belief Propagation (BP) algorithm converges only at high temperature to a paramagnetic solution. Then, we test a Generalized Belief Propagation (GBP) algorithm, derived from a Cluster Variational Method (CVM) at the plaquette level. We compare its performance with BP and with other algorithms derived under the same approximation: Double Loop (DL) and a two-way message passing algorithm (HAK). 
The plaquette-CVM approximation improves BP in at least three ways: the quality of the paramagnetic solution at high temperatures, a better estimate (lower) for the critical temperature, and the fact that the GBP message passing algorithm converges also to non paramagnetic solutions. The lack of convergence of the standard GBP message passing algorithm at low temperatures seems to be related to the implementation details and not to the appearance of long range order. In fact, we prove that a gauge invariance of the constrained CVM free energy can be exploited to derive a new message passing algorithm which converges at even lower temperatures. In all its region of convergence this new algorithm is faster than HAK and DL by some orders of magnitude.\nGiven a collection of objects and an associated similarity measure, the all-pairs similarity search problem asks us to find all pairs of objects with similarity greater than a certain user-specified threshold. Locality-sensitive hashing (LSH) based methods have become a very popular approach for this problem. However, most such methods only use LSH for the first phase of similarity search - i.e. efficient indexing for candidate generation. In this paper, we present BayesLSH, a principled Bayesian algorithm for the subsequent phase of similarity search - performing candidate pruning and similarity estimation using LSH. A simpler variant, BayesLSH-Lite, which calculates similarities exactly, is also presented. BayesLSH is able to quickly prune away a large majority of the false positive candidate pairs, leading to significant speedups over baseline approaches. For BayesLSH, we also provide probabilistic guarantees on the quality of the output, both in terms of accuracy and recall. Finally, the quality of BayesLSH's output can be easily tuned and does not require any manual setting of the number of hashes to use for similarity estimation, unlike standard approaches. 
For two state-of-the-art candidate generation algorithms, AllPairs and LSH, BayesLSH enables significant speedups, typically in the range 2x-20x for a wide variety of datasets.\nThis paper addresses the estimation of parameters of a Bayesian network from incomplete data. The task is usually tackled by running the Expectation-Maximization (EM) algorithm several times in order to obtain a high log-likelihood estimate. We argue that choosing the maximum log-likelihood estimate (as well as the maximum penalized log-likelihood and the maximum a posteriori estimate) has severe drawbacks, being affected both by overfitting and model uncertainty. Two ideas are discussed to overcome these issues: a maximum entropy approach and a Bayesian model averaging approach. Both ideas can be easily applied on top of EM, while the entropy idea can be also implemented in a more sophisticated way, through a dedicated non-linear solver. A vast set of experiments shows that these ideas produce significantly better estimates and inferences than the traditional and widely used maximum (penalized) log-likelihood and maximum a posteriori estimates. In particular, if EM is adopted as optimization engine, the model averaging approach is the best performing one; its performance is matched by the entropy approach when implemented using the non-linear solver. The results suggest that the applicability of these ideas is immediate (they are easy to implement and to integrate in currently available inference engines) and that they constitute a better way to learn Bayesian network parameters.\nScene understanding includes many related sub-tasks, such as scene categorization, depth estimation, object detection, etc. Each of these sub-tasks is often notoriously hard, and state-of-the-art classifiers already exist for many of them. These classifiers operate on the same raw image and provide correlated outputs. 
It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier.   We propose Feedback Enabled Cascaded Classification Models (FE-CCM), that jointly optimizes all the sub-tasks, while requiring only a `black-box' interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers information about which error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in the domain of scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection. Our method also improves performance in two robotic applications: an object-grasping robot and an object-finding robot.\nRecently, quantitative versions of the Gibbard-Satterthwaite theorem were proven for $k=3$ alternatives by Friedgut, Kalai, Keller and Nisan and for neutral functions on $k \\geq 4$ alternatives by Isaksson, Kindler and Mossel.   We prove a quantitative version of the Gibbard-Satterthwaite theorem for general social choice functions for any number $k \\geq 3$ of alternatives. In particular we show that for a social choice function $f$ on $k \\geq 3$ alternatives and $n$ voters, which is $\\epsilon$-far from the family of nonmanipulable functions, a uniformly chosen voter profile is manipulable with probability at least inverse polynomial in $n$, $k$, and $\\epsilon^{-1}$.   Removing the neutrality assumption of previous theorems is important for multiple reasons. For one, it is known that there is a conflict between anonymity and neutrality, and since most common voting rules are anonymous, they cannot always be neutral. 
Second, virtual elections are used in many applications in artificial intelligence, where there are often restrictions on the outcome of the election, and so neutrality is not a natural assumption in these situations.   Ours is a unified proof which in particular covers all previous cases established before. The proof crucially uses reverse hypercontractivity in addition to several ideas from the two previous proofs. Much of the work is devoted to understanding functions of a single voter, and in particular we also prove a quantitative Gibbard-Satterthwaite theorem for one voter.\nThe CDOI outcome measure - a patient-reported outcome (PRO) instrument utilizing direct client feedback - was implemented in a large, real-world behavioral healthcare setting in order to evaluate previous findings from smaller controlled studies. PROs provide an alternative window into treatment effectiveness based on client perception and facilitate detection of problems/symptoms for which there is no discernible measure (e.g. pain). The principal focus of the study was to evaluate the utility of the CDOI for predictive modeling of outcomes in a live clinical setting. Implementation factors were also addressed within the framework of the Theory of Planned Behavior by linking adoption rates to implementation practices and clinician perceptions. The results showed that the CDOI does contain significant capacity to predict outcome delta over time based on baseline and early change scores in a large, real-world clinical setting, as suggested in previous research. The implementation analysis revealed a number of critical factors affecting successful implementation and adoption of the CDOI outcome measure, though there was a notable disconnect between clinician intentions and actual behavior. 
Most importantly, the predictive capacity of the CDOI underscores the utility of direct client feedback measures such as PROs and their potential use as the basis for next generation clinical decision support tools and personalized treatment approaches.\nThe basis of the method proposed in this article is the idea that information is one of the most important factors in strategic decisions, including decisions in computer chess and other strategy games. The model proposed in this article and the algorithm described are based on the idea of an information-theoretic basis of decision in strategy games. The model generalizes and provides a mathematical justification for one of the most popular search algorithms used in leading computer chess programs, the fractional ply scheme. However, despite its success in leading computer chess applications, until now little has been published about this method. The article creates a fundamental basis for this method in the axioms of information theory, then derives the principles used in programming the search and describes mathematically the form of the coefficients. One of the most important parameters of the fractional ply search is derived from fundamental principles. Until now this coefficient has usually been handcrafted or determined from intuitive elements or data mining. There is a deep, information-theoretic justification for such a parameter. In one way the method proposed is a generalization of previous methods. More importantly, it shows why the fractional depth ply scheme is so powerful: the algorithm navigates along the lines where the highest information gain is possible. A working and original implementation has been written and tested for this algorithm and is provided in the appendix. The article is essentially self-contained and gives proper background knowledge and references. 
The assumptions are intuitive and in the direction expected and described intuitively by great champions of chess.\nAdaptation to changing environments is a hallmark of biological systems. Diversity in traits is necessary for adaptation and can influence the survival of a population faced with novelty. In habitats that remain stable over many generations, stabilizing selection reduces trait differences within populations, thereby appearing to remove the diversity needed for heritable adaptive responses in new environments. Paradoxically, field studies have documented numerous populations under long periods of stabilizing selection and evolutionary stasis that have rapidly evolved under changed environmental conditions. In this article, we review how cryptic genetic variation (CGV) resolves this diversity paradox by allowing populations in a stable environment to gradually accumulate hidden genetic diversity that is revealed as trait differences when environments change. Instead of being in conflict, environmental stasis supports CGV accumulation and thus appears to facilitate rapid adaptation in new environments as suggested by recent CGV studies. Similarly, degeneracy has been found to support both genetic and non-genetic adaptation at many levels of biological organization. Degenerate, as opposed to diverse or redundant, ensembles appear functionally redundant in certain environmental contexts but functionally diverse in others. CGV and degeneracy paradigms for adaptation are integrated in this review, revealing a common set of principles that support adaptation at multiple levels of biological organization. 
Through a discussion of simulation studies, molecular-based experimental systems, principles from population genetics, and field experiments, we demonstrate that CGV and degeneracy reflect complementary top-down and bottom-up, respectively, conceptualizations of the same basic phenomenon and arguably capture a universal feature of biological adaptive processes.\nIn this paper, we study CPU utilization time patterns of several Map-Reduce applications. After extracting running patterns of several applications, the patterns with their statistical information are saved in a reference database to be later used to tweak system parameters to efficiently execute unknown applications in the future. To achieve this goal, CPU utilization patterns of new applications along with their statistical information are compared with the already known ones in the reference database to find/predict their most probable execution patterns. Because of differing pattern lengths, Dynamic Time Warping (DTW) is utilized for such comparison; a statistical analysis is then applied to DTWs' outcomes to select the most suitable candidates. Moreover, under a hypothesis, another algorithm is proposed to classify applications under similar CPU utilization patterns. Three widely used text processing applications (WordCount, Distributed Grep, and Terasort) and another application (Exim Mainlog parsing) are used to evaluate our hypothesis in tweaking system parameters in executing similar applications. Results were very promising and showed the effectiveness of our approach on a 5-node Map-Reduce platform.\nAutomating various Web activities by replacing human-to-human interactions on the Internet has become indispensable due to the Internet's enormous growth. However, bots, also known as Web-bots, which have malicious intent and pretend to be humans, pose a severe threat to various services on the Internet that implicitly assume a human interaction. 
Accordingly, before allowing access to such services, Web service providers use various Human Interaction Proofs (HIPs) to verify that the user is a human and not a bot. Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a class of HIP tests based on Artificial Intelligence. These tests are easy for humans to pass and hard for bots to solve. Several Web services use CAPTCHAs as a defensive mechanism against automated Web-bots. In this paper, we review the existing CAPTCHA schemes that have been proposed or are being used to protect various Web services. We classify them in groups and compare them with each other in terms of security and usability. We present the general methods used to generate and break text-based and image-based CAPTCHAs. Further, we discuss various security and usability issues in CAPTCHA design and provide guidelines for improving their robustness and usability.\nThe importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents like the World Wide Web (WWW). The next challenge lies in performing clustering based on the semantic contents of the documents. The problem of document clustering has two main components: (1) to represent the document in such a form that inherently captures the semantics of the text, which may also help to reduce the dimensionality of the document, and (2) to define a similarity measure based on the semantic representation such that it assigns higher numerical values to document pairs which have a higher semantic relationship. The feature space of the documents can be very challenging for document clustering. A document may contain multiple topics, it may contain a large set of class-independent general words, and a handful of class-specific core words. 
With these features in mind, traditional agglomerative clustering algorithms, which are based on either the Document Vector model (DVM) or the Suffix Tree model (STC), are less efficient in producing results with high cluster quality. This paper introduces a new approach for document clustering based on the Topic Map representation of the documents. The document is transformed into a compact form. A similarity measure is proposed based upon the information inferred from topic map data and structures. The suggested method is implemented using agglomerative hierarchical clustering and tested on standard Information Retrieval (IR) datasets. The comparative experiment reveals that the proposed approach is effective in improving cluster quality.\nDocument clustering is an unsupervised approach extensively used to navigate, filter, summarize and manage large document repositories such as the World Wide Web (WWW). Recently, the focus in this domain has shifted from traditional vector-based document similarity for clustering to suffix-tree-based document similarity, as it offers a more semantic representation of the text in the document. In this paper, we compare and contrast two recently introduced approaches to document clustering based on the suffix tree data model. The first is Efficient Phrase-based document clustering, which extracts phrases from documents to form a compact document representation and uses a similarity measure based on a common suffix tree to cluster the documents. The second approach is frequent word/word meaning sequence-based document clustering; it similarly extracts the common word sequences from the document, uses the common word sequence/common word meaning sequence to build the compact representation, and finally uses a document clustering approach to cluster the compact documents. 
Both algorithms use agglomerative hierarchical document clustering to perform the actual clustering step; the differences between the approaches lie mainly in the extraction of phrases, the representation of the model as a compact document, and the similarity measures used for clustering. This paper investigates the computational aspects of the two algorithms and the quality of the results they produce.\nThe causal inference literature has provided a clear formal definition of confounding expressed in terms of counterfactual independence. The literature has not, however, come to any consensus on a formal definition of a confounder, as it has given priority to the concept of confounding over that of a confounder. We consider a number of candidate definitions arising from various more informal statements made in the literature. We consider the properties satisfied by each candidate definition, principally focusing on (i) whether under the candidate definition control for all \"confounders\" suffices to control for \"confounding\" and (ii) whether each confounder in some context helps eliminate or reduce confounding bias. Several of the candidate definitions do not have these two properties. Only one candidate definition of those considered satisfies both properties. We propose that a \"confounder\" be defined as a pre-exposure covariate C for which there exists a set of other covariates X such that the effect of the exposure on the outcome is unconfounded conditional on (X,C) but such that for no proper subset of (X,C) is the effect of the exposure on the outcome unconfounded given the subset. We also provide a conditional analogue of the above definition; and we propose that a variable that helps reduce but not eliminate bias be referred to as a \"surrogate confounder.\" These definitions are closely related to those given by Robins and Morgenstern [Comput. Math. Appl. 14 (1987) 869-916]. 
The implications that hold among the various candidate definitions are discussed.\nThe computational characterization of game-theoretic solution concepts is a central topic in artificial intelligence, with the aim of developing computationally efficient tools for finding optimal ways to behave in strategic interactions. The central solution concept in game theory is Nash equilibrium (NE). However, it fails to capture the possibility that agents can form coalitions (even in the 2-agent case). Strong Nash equilibrium (SNE) refines NE to this setting. It is known that finding an SNE is NP-complete when the number of agents is constant. This hardness is solely due to the existence of mixed-strategy SNEs, given that the problem of enumerating all pure-strategy SNEs is trivially in P. Our central result is that, in order for a game to have at least one non-pure-strategy SNE, the agents' payoffs restricted to the agents' supports must, in the case of 2 agents, lie on the same line, and, in the case of n agents, lie on an (n - 1)-dimensional hyperplane. Leveraging this result, we provide two contributions. First, we develop worst-case instances for support-enumeration algorithms. These instances have only one SNE and the support size can be chosen to be of any size (in particular, arbitrarily large). Second, we prove that, unlike NE, finding an SNE is in smoothed polynomial time: generic game instances (i.e., all instances except knife-edge cases) have only pure-strategy SNEs.\nPattern-Based Constraint Satisfaction and Logic Puzzles develops a pure logic, pattern-based perspective on solving the finite Constraint Satisfaction Problem (CSP), with emphasis on finding the \"simplest\" solution. Different ways of reasoning with the constraints are formalised by various families of \"resolution rules\", each of them carrying its own notion of simplicity. A large part of the book illustrates the power of the approach by applying it to various popular logic puzzles. 
It provides a unified view of how to model and solve them, even though they involve very different types of constraints: obvious symmetric ones in Sudoku, non-symmetric but transitive ones (inequalities) in Futoshiki, topological and geometric ones in Map colouring, Numbrix and Hidato, and even much more complex non-binary arithmetic ones in Kakuro (or Cross Sums). It also shows that the most familiar techniques for these puzzles can indeed be understood as mere application-specific presentations of the general rules. Sudoku is used as the main example throughout the book, making it also an advanced-level sequel to \"The Hidden Logic of Sudoku\" (another book by the same author), with: many examples of relationships among different rules and of exceptional situations; comparisons of the resolution potential of various families of rules; detailed statistics of puzzle hardness; analysis of extreme instances.\nThe rules of a Decision Support System (DSS) contain more than one antecedent, and the degrees of strength of the antecedents need to be combined to determine the overall strength of the rule consequent. The membership values of the linguistic variables in fuzzy logic have to be combined using an aggregation operator. But it is not feasible to predefine the form of aggregation operators in decision making. Instead, each rule should be found based on the feeling of the experts and on their actual decision pattern over the set of typical examples. Thus this work illustrates how the choice of aggregation operators is intended to mimic human decision making and can be selected and adjusted to fit empirical data, a series of test cases. Both parametrized and nonparametrized aggregation operators are adapted to fit empirical data. Moreover, they provide compensatory properties and, therefore, seem to produce a better decision support system. To solve the problem, a threshold point from the output of the aggregation operators is chosen as the separation point between two classes. 
The operator that achieves the best accuracy is chosen as the appropriate aggregation operator. Thus a medical decision can be generated that is very close to a practitioner's guideline.\nIn this Part II, we apply the general theory developed in Part I to a detailed analysis of the Constraint Satisfaction Problem (CSP). We show how specific types of resolution rules can be defined. In particular, we introduce the general notions of a chain and a braid. As in Part I, these notions are illustrated in detail with the Sudoku example, a problem known to be NP-complete and which is therefore typical of a broad class of hard problems. For Sudoku, we also show how far one can go in 'approximating' a CSP with a resolution theory and we give an empirical statistical analysis of how the various puzzles, corresponding to different sets of entries, can be classified along a natural scale of complexity. For any CSP, we also prove the confluence property of some Resolution Theories based on braids and we show how it can be used to define different resolution strategies. Finally, we prove that, in any CSP, braids have the same solving capacity as Trial-and-Error (T&E) with no guessing and we comment on this result in the Sudoku case.\nTrustworthiness, especially for service-oriented systems, is a very important topic in the IT field worldwide. Many successful e-commerce organizations currently operate around the world, but e-commerce has not reached its full potential. The main reason behind this is people's lack of trust in e-commerce. Moreover, proper models for calculating the trust of different e-commerce organizations are still absent. Most present trust models are subjective and fail to account for the vagueness and ambiguity of different domains. In this paper we propose a new fuzzy-logic-based Certain Trust model that considers the ambiguity and vagueness of different domains. 
The Fuzzy Based Certain Trust Model depends on certain values given by experts and developers. It can be applied in systems such as cloud computing, the internet, websites, e-commerce, etc. to ensure the trustworthiness of these platforms. In this paper we show that, although fuzzy logic deals with uncertainty, the proposed model works with certain values. Some experimental results and a validation of the model with linguistic terms are presented in the last part of the paper.\nAn undirected graphical model is a joint probability distribution defined on an undirected graph G*, where the vertices in the graph index a collection of random variables and the edges encode conditional independence relationships among random variables. The undirected graphical model selection (UGMS) problem is to estimate the graph G* given observations drawn from the undirected graphical model. This paper proposes a framework for decomposing the UGMS problem into multiple subproblems over clusters and subsets of the separators in a junction tree. The junction tree is constructed using a graph that contains a superset of the edges in G*. We highlight three main properties of using junction trees for UGMS. First, different regularization parameters or different UGMS algorithms can be used to learn different parts of the graph. This is possible since the subproblems we identify can be solved independently of each other. Second, under certain conditions, a junction tree based UGMS algorithm can produce consistent results with fewer observations than the usual requirements of existing algorithms. Third, both our theoretical and experimental results show that the junction tree framework does a significantly better job at finding the weakest edges in a graph than existing methods. This property is a consequence of both the first and second properties. 
Finally, we note that our framework is independent of the choice of the UGMS algorithm and can be used as a wrapper around standard UGMS algorithms for more accurate graph estimation.\nThe paper describes development (improvement/extension) approaches for composite (modular) systems (as combinatorial reengineering). The following system improvement/extension actions are considered: (a) improvement of system component(s) (e.g., improvement of a system component, replacement of a system component); (b) improvement of system component interconnection (compatibility); (c) joint improvement of system component(s) and their interconnection; (d) improvement of system structure (replacement of system part(s), addition of a system part, deletion of a system part, modification of system structure). The study of system improvement approaches involves some crucial issues: (i) scales for evaluation of system components and component compatibility (quantitative scale, ordinal scale, poset-like scale, scale based on interval multiset estimate), (ii) evaluation of integrated system quality, (iii) integration methods to obtain the integrated system quality. The system improvement/extension strategies can be examined as selection/combination of the improvement action(s) above and as modification of system structure. The strategies are based on combinatorial optimization problems (e.g., multicriteria selection, knapsack problem, multiple choice problem, combinatorial synthesis based on morphological clique problem, assignment/reassignment problem, graph recoloring problem, spanning problems, hotlink assignment). Here, heuristics are used. Various system improvement/extension strategies are presented, including illustrative numerical examples.\nWe consider stochastic strongly convex optimization with a complex inequality constraint. 
This complex inequality constraint may lead to computationally expensive projections in algorithmic iterations of the stochastic gradient descent (SGD) methods. To reduce the computational costs pertaining to the projections, we propose an Epoch-Projection Stochastic Gradient Descent (Epro-SGD) method. The proposed Epro-SGD method consists of a sequence of epochs; it applies SGD to an augmented objective function at each iteration within the epoch, and then performs a projection at the end of each epoch. Given a strongly convex optimization problem and a total of $T$ iterations, Epro-SGD requires only $\\log(T)$ projections, and meanwhile attains an optimal convergence rate of $O(1/T)$, both in expectation and with high probability. To exploit the structure of the optimization problem, we propose a proximal variant of Epro-SGD, namely Epro-ORDA, based on the optimal regularized dual averaging method. We apply the proposed methods to real-world applications; the empirical results demonstrate the effectiveness of our methods.\nIn the domain of Computing with words (CW), fuzzy linguistic approaches are known to be relevant in many decision-making problems. Indeed, they allow us to model human reasoning by replacing words, assessments, preferences, choices, wishes... by ad hoc variables, such as fuzzy sets or more sophisticated variables.   This paper focuses on a particular model: Herrera & Martinez' 2-tuple linguistic model and their approach to deal with unbalanced linguistic term sets. It is interesting since the computations are accomplished without loss of information while the results of the decision-making processes always refer to the initial linguistic term set. They propose a fuzzy partition which distributes data on the axis by using linguistic hierarchies to manage the non-uniformity. 
However, the required input (especially the density around the terms) taken by their fuzzy partition algorithm may be considered too demanding in a real-world application, since density is not always easy to determine. Moreover, in some limit cases (especially when two terms are semantically very close to each other), the partition does not comply with the data themselves and is not close to reality. Therefore we propose to modify the required input, in order to offer a simpler and more faithful partition. We have added an extension to the package jFuzzyLogic and to the corresponding script language FCL. This extension supports both 2-tuple models: Herrera & Martinez' and ours. In addition to the partition algorithm, we present two aggregation algorithms: the arithmetic mean and the addition. We also discuss these kinds of 2-tuple models.\nWe reconsider the stochastic (sub)gradient approach to the unconstrained primal L1-SVM optimization. We observe that if the learning rate is inversely proportional to the number of steps, i.e., the number of times any training pattern is presented to the algorithm, the update rule may be transformed into that of the classical perceptron with margin in which the margin threshold increases linearly with the number of steps. Moreover, if we cycle repeatedly through the possibly randomly permuted training set, the dual variables, defined naturally via the expansion of the weight vector as a linear combination of the patterns on which margin errors were made, are shown to automatically obey, at the end of each complete cycle, the box constraints arising in dual optimization. This renders the dual Lagrangian a running lower bound on the primal objective tending to it at the optimum and makes available an upper bound on the relative accuracy achieved which provides a meaningful stopping criterion. 
In addition, we propose a mechanism of presenting the same pattern repeatedly to the algorithm which maintains the above properties. Finally, we give experimental evidence that algorithms constructed along these lines exhibit a considerably improved performance.\nProbabilistic logic programs are logic programs in which some of the facts are annotated with probabilities. This paper investigates how classical inference and learning tasks known from the graphical model community can be tackled for probabilistic logic programs. Several such tasks such as computing the marginals given evidence and learning from (partial) interpretations have not really been addressed for probabilistic logic programs before.   The first contribution of this paper is a suite of efficient algorithms for various inference tasks. It is based on a conversion of the program and the queries and evidence to a weighted Boolean formula. This allows us to reduce the inference tasks to well-studied tasks such as weighted model counting, which can be solved using state-of-the-art methods known from the graphical model and knowledge compilation literature. The second contribution is an algorithm for parameter estimation in the learning from interpretations setting. The algorithm employs Expectation Maximization, and is built on top of the developed inference algorithms.   The proposed approach is experimentally evaluated. The results show that the inference algorithms improve upon the state-of-the-art in probabilistic logic programming and that it is indeed possible to learn the parameters of a probabilistic logic program from interpretations.\nResearchers since at least Darwin have debated whether and to what extent emotions are universal or culture-dependent. However, previous studies have primarily focused on facial expressions and on a limited set of emotions. 
Given that emotions have a substantial impact on human lives, evidence for cultural emotional relativity might be derived by applying distributional semantics techniques to a text corpus of self-reported behaviour. Here, we explore this idea by measuring the valence and arousal of the twelve most popular emotion keywords expressed on the micro-blogging site Twitter. We do this in three geographical regions: Europe, Asia and North America. We demonstrate that in our sample, the valence and arousal levels of the same emotion keywords differ significantly with respect to these geographical regions: Europeans are, or at least present themselves as, more positive and aroused; North Americans are more negative; and Asians appear to be more positive but less aroused when compared to global valence and arousal levels of the same emotion keywords. Our work is the first of its kind to programmatically map large text corpora to a dimensional model of affect.\nThe junction tree algorithm is a way of computing marginals of boolean multivariate probability distributions that factorise over sets of random variables. The junction tree algorithm first constructs a tree called a junction tree whose vertices are sets of random variables. The algorithm then performs a generalised version of belief propagation on the junction tree. The Shafer-Shenoy and Hugin architectures are two ways to perform this belief propagation that trade off time and space complexities in different ways: Hugin propagation is at least as fast as Shafer-Shenoy propagation and, in cases where we have large vertices of high degree, is significantly faster. However, this speed increase comes at the cost of an increased space complexity. This paper first introduces a simple novel architecture, ARCH-1, which has the best of both worlds: the speed of Hugin propagation and the low space requirements of Shafer-Shenoy propagation. 
A more complicated novel architecture, ARCH-2, is then introduced which has, up to a factor only linear in the maximum cardinality of any vertex, time and space complexities at least as good as ARCH-1 and, in cases where we have large vertices of high degree, is significantly faster than ARCH-1.\nWith the proliferation of its applications in various industries, sentiment analysis using publicly available web data has become an active research area in text classification in recent years. It is argued by researchers that semi-supervised learning is an effective approach to this problem since it is capable of mitigating the manual labeling effort, which is usually expensive and time-consuming. However, there has been a long-standing debate on the effectiveness of unlabeled data in text classification. This was partially caused by the fact that many assumptions in theoretical analysis often do not hold in practice. We argue that this problem may be further understood by adding an additional dimension to the experiment. This allows us to address this problem from the perspective of bias and variance in a broader view. We show that the well-known performance degradation issue caused by unlabeled data can be reproduced as a subset of the whole scenario. We argue that if the bias-variance trade-off is better balanced by a more effective feature selection method, unlabeled data is very likely to boost the classification performance. We then propose a feature selection framework in which labeled and unlabeled training samples are both considered. We discuss its potential in achieving such a balance. In addition, the application in financial sentiment analysis is chosen because it not only exemplifies an important application but its data also possesses better illustrative power. 
The implications of this study in text classification and financial sentiment analysis are both discussed.\nThe Bayesian approach to machine learning amounts to computing posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define measure-transformer combinators inspired by theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that is processed by an existing inference engine for factor graphs, which are data structures that enable many efficient inference algorithms. This allows efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.\nWhat are the cognitive after-effects of making a similarity judgement? What, cognitively, is left behind and what effect might these residues have on subsequent processing? In this paper, we probe for such after-effects using a visual search task, performed after a task in which pictures of real-world objects were compared. So, target objects were first presented in a comparison task (e.g., rate the similarity of this object to another) thus, presumably, modifying some of their features before asking people to visually search for the same object in complex scenes (with distractors and camouflaged backgrounds). 
As visual search is known to be influenced by the features of target objects, any after-effects of the comparison task should be revealed in subsequent visual searches. Results showed that when people previously rated an object as being high on a scale (e.g., colour similarity or general similarity), visual search is inhibited (slower RTs and more saccades in eye-tracking) relative to an object rated as low on the same scale. There was also some evidence that different comparison tasks (e.g., compare on colour or compare on general similarity) have differential effects on visual search.\nIn this paper, a new structure of cooperative learning automata, so-called extended learning automata (eDLA), is introduced. Based on the proposed structure, a new iterative randomized heuristic algorithm for finding the optimal sub-graph in a stochastic edge-weighted graph through sampling is proposed. It has been shown that the proposed algorithm based on the new networked structure can solve optimization problems on stochastic graphs with fewer samples compared to standard sampling. Stochastic graphs are graphs in which the edge weights follow unknown probability distributions. The proposed algorithm uses an eDLA to find a policy that leads to an induced sub-graph that satisfies some restrictions such as minimum or maximum weight (length). At each stage of the proposed algorithm, the eDLA determines which edges are to be sampled. This eDLA-based sampling method may reduce unnecessary samples and hence the time that the algorithm requires for finding the optimal sub-graph. It has been shown that the proposed method converges to the optimal solution; furthermore, the probability of this convergence can be made arbitrarily close to 1 by using a sufficiently small learning rate. A new variance-aware threshold value is proposed that can significantly improve the convergence rate of the proposed eDLA-based algorithm. 
It has been shown that the proposed algorithm is competitive in terms of the quality of the solution.\nFloating-point computations are quickly finding their way into the design of safety- and mission-critical systems, despite the fact that designing floating-point algorithms is significantly more difficult than designing integer algorithms. For this reason, verification and validation of floating-point computations is a hot research topic. An important verification technique, especially in some industrial sectors, is testing. However, generating test data for floating-point intensive programs has proved to be a challenging problem. Existing approaches usually resort to random or search-based test data generation, but without symbolic reasoning it is almost impossible to generate test inputs that execute complex paths controlled by floating-point computations. Moreover, as constraint solvers over the reals or the rationals do not natively support the handling of rounding errors, the need arises for efficient constraint solvers over floating-point domains. In this paper, we present and fully justify improved algorithms for the propagation of arithmetic IEEE 754 binary floating-point constraints. The key point of these algorithms is a generalization of an idea by B. Marre and C. Michel that exploits a property of the representation of floating-point numbers.\nThis paper summarizes efforts to computationally model two transitions in the evolution of human creativity: its origins about two million years ago, and the 'big bang' of creativity about 50,000 years ago. Using a computational model of cultural evolution in which neural network based agents evolve ideas for actions through invention and imitation, we tested the hypothesis that human creativity began with the onset of the capacity for recursive recall. We compared runs in which agents were limited to single-step actions to runs in which they used recursive recall to chain simple actions into complex ones. 
Chaining resulted in higher diversity, open-ended novelty, no ceiling on the mean fitness of actions, and greater ability to make use of learning. Using a computational model of portrait painting, we tested the hypothesis that the explosion of creativity in the Middle/Upper Paleolithic was due to the onset of contextual focus: the capacity to shift between associative and analytic thought. This resulted in faster convergence on portraits that resembled the sitter, employed painterly techniques, and were rated as preferable. We conclude that recursive recall and contextual focus provide a computationally plausible explanation of how humans evolved the means to transform this planet.\nThe Shapley value (probably the most important normative payoff division scheme in coalitional games) has recently been advocated as a useful measure of centrality in networks. However, although this approach has a variety of real-world applications (including social and organisational networks, biological networks and communication networks), its computational properties have not been widely studied. To date, the only practicable approach to computing Shapley value-based centrality has been via Monte Carlo simulations, which are computationally expensive and not guaranteed to give an exact answer. Against this background, this paper presents the first study of the computational aspects of the Shapley value for network centralities. Specifically, we develop exact analytical formulae for Shapley value-based centrality in both weighted and unweighted networks and develop efficient (polynomial time) and exact algorithms based on them. We empirically evaluate these algorithms on two real-life examples (an infrastructure network representing the topology of the Western States Power Grid and a collaboration network from the field of astrophysics) and demonstrate that they deliver significant speedups over the Monte Carlo approach. 
For instance, in the case of unweighted networks our algorithms are able to return the exact solution about 1600 times faster than the Monte Carlo approximation, even if we allow for a generous 10% error margin for the latter method.\nMany feature subset selection (FSS) algorithms have been proposed, but not all of them are appropriate for a given feature selection problem. At the same time, so far there is rarely a good way to choose appropriate FSS algorithms for the problem at hand. Thus, FSS algorithm automatic recommendation is very important and practically useful. In this paper, a meta learning based FSS algorithm automatic recommendation method is presented. The proposed method first identifies the data sets that are most similar to the one at hand by the k-nearest neighbor classification algorithm, and the distances among these data sets are calculated based on the commonly-used data set characteristics. Then, it ranks all the candidate FSS algorithms according to their performance on these similar data sets, and chooses the algorithms with best performance as the appropriate ones. The performance of the candidate FSS algorithms is evaluated by a multi-criteria metric that takes into account not only the classification accuracy over the selected features, but also the runtime of feature selection and the number of selected features. The proposed recommendation method is extensively tested on 115 real world data sets with 22 well-known and frequently-used different FSS algorithms for five representative classifiers. The results show the effectiveness of our proposed FSS algorithm recommendation method.\nWe consider how selfish agents are likely to share revenues derived from maintaining connectivity between important network servers. We model a network where a failure of one node may disrupt communication between other nodes as a cooperative game called the vertex Connectivity Game (CG). 
In this game, each agent owns a vertex, and controls all the edges going to and from that vertex. A coalition of agents wins if it fully connects a certain subset of vertices in the graph, called the primary vertices. Power indices measure an agent's ability to affect the outcome of the game. We show that in our domain, such indices can be used to both determine the fair share of the revenues an agent is entitled to, and identify significant possible points of failure affecting the reliability of communication in the network. We show that in general graphs, calculating the Shapley and Banzhaf power indices is #P-complete, but suggest a polynomial algorithm for calculating them in trees. We also investigate finding stable payoff divisions of the revenues in CGs, captured by the game theoretic solution of the core, and its relaxations, the epsilon-core and least core. We show a polynomial algorithm for computing the core of a CG, but show that testing whether an imputation is in the epsilon-core is coNP-complete. Finally, we show that for trees, it is possible to test for epsilon-core imputations in polynomial time.\nTo achieve an optimal outcome in many situations, agents need to choose distinct actions from one another. This is the case notably in many resource allocation problems, where a single resource can only be used by one agent at a time. How shall a designer of a multi-agent system program its identical agents to each behave in a different way? From a game theoretic perspective, such situations lead to undesirable Nash equilibria. For example, consider a resource allocation game in which two players compete for exclusive access to a single resource. It has three Nash equilibria. The two pure-strategy NE are efficient, but not fair. The one mixed-strategy NE is fair, but not efficient. Aumann's notion of correlated equilibrium fixes this problem: It assumes a correlation device that suggests each agent an action to take. 
However, such a \"smart\" coordination device might not be available. We propose using a randomly chosen, \"stupid\" integer coordination signal. \"Smart\" agents learn which action they should use for each value of the coordination signal. We present a multi-agent learning algorithm that converges in a polynomial number of steps to a correlated equilibrium of a channel allocation game, a variant of the resource allocation game. We show that the agents learn to play for each coordination signal value a randomly chosen pure-strategy Nash equilibrium of the game. Therefore, the outcome is an efficient correlated equilibrium. This CE becomes fairer as the number of available coordination signal values increases.\nContent-based and collaborative filtering methods are the most successful solutions in recommender systems. The content-based method relies on item attributes. This method checks the features of users' favourite items and then proposes the items which have the most similar characteristics to those items. The collaborative filtering method is based on the determination of similar items or similar users, which are called item-based and user-based collaborative filtering, respectively. In this paper, we propose a hybrid method that integrates collaborative filtering and content-based methods. The proposed method can be viewed as a user-based collaborative filtering technique. However, to find users with similar taste to the active user, we used content features of the item under investigation to put more emphasis on users' ratings for similar items. In other words, two users are similar if their ratings are similar on items that have similar context. 
This is achieved by assigning a weight to each rating when calculating the similarity of two users. We used the MovieLens data set to assess the performance of the proposed method in comparison with basic user-based collaborative filtering and other popular methods.\nWe describe a probabilistic framework for synthesizing control policies for general multi-robot systems, given environment and sensor models and a cost function. Decentralized, partially observable Markov decision processes (Dec-POMDPs) are a general model of decision processes where a team of agents must cooperate to optimize some objective (specified by a shared reward or cost function) in the presence of uncertainty, but where communication limitations mean that the agents cannot share their state, so execution must proceed in a decentralized fashion. While Dec-POMDPs are typically intractable to solve for real-world problems, recent research on the use of macro-actions in Dec-POMDPs has significantly increased the size of problem that can be practically solved as a Dec-POMDP. We describe this general model, and show how, in contrast to most existing methods that are specialized to a particular problem class, it can synthesize control policies that use whatever opportunities for coordination are present in the problem, while balancing off uncertainty in outcomes, sensor information, and information about other agents. We use three variations on a warehouse task to show that a single planner of this type can generate cooperative behavior using task allocation, direct communication, and signaling, as appropriate.\nUnsupervised ranking faces one critical challenge in evaluation applications, that is, no ground truth is available. While PageRank and its variants offer a good solution in related subjects, they are applicable only to ranking from link-structure data. In this work, we focus on unsupervised ranking from multi-attribute data, which is also common in evaluation tasks. 
To overcome the challenge, we propose five essential meta-rules for the design and assessment of unsupervised ranking approaches: scale and translation invariance, strict monotonicity, linear/nonlinear capacities, smoothness, and explicitness of parameter size. These meta-rules are regarded as high-level knowledge for unsupervised ranking tasks. Inspired by the works in [8] and [14], we propose a ranking principal curve (RPC) model, which learns a one-dimensional manifold function to perform unsupervised ranking tasks on multi-attribute observations. Furthermore, the RPC is modeled as a cubic B\\'ezier curve with control points restricted to the interior of a hypercube, thereby complying with all five meta-rules to infer a reasonable ranking list. With control points as the model parameters, one is able to understand the learned manifold and to interpret the ranking list semantically. Numerical experiments with the presented RPC model are conducted on two open datasets of different ranking applications. In comparison with the state-of-the-art approaches, the new model produces more reasonable ranking lists.\nIn the last decade, scenario-based serious games have become a main tool for learning new skills and capabilities. An important factor in the development of such systems is the overhead in time, cost and human resources to manually create the content for these scenarios. We focus on how to create content for scenarios in medical, military, commerce and gaming applications where maintaining the integrity and coherence of the content is integral to the system's success. To do so, we present an automatic method for generating content about everyday activities by combining computer science techniques with the crowd. We use the crowd in three basic ways: to capture a database of scenarios of everyday activities, to generate a database of likely replacements for specific events within that scenario, and to evaluate the resulting scenarios. 
We found that the generated scenarios were rated as reliable and consistent by the crowd when compared to the scenarios that were originally captured. We also compared the generated scenarios to those created by traditional planning techniques. We found that both methods were equally effective in generating reliable and consistent scenarios, yet the main advantage of our approach is that the content we generate is more varied and much easier to create. We have begun integrating this approach within a scenario-based training application for novice investigators within law enforcement departments to improve their questioning skills.\nAs digital games continue to be explored as solutions to educational and behavioural challenges, the need for evaluation methodologies which support both the unique nature of the format and the need for comparison with other approaches continues to increase. In this workshop paper, a range of challenges are described related specifically to the case of cultural learning using digital games, in terms of how it may best be assessed, understood, and sustained through an iterative process supported by research. An evaluation framework is proposed, identifying metrics for reach and impact and their associated challenges, as well as presenting ethical considerations and the means to utilize evaluation outcomes within an iterative cycle, and to provide feedback to learners. Presenting as a case study a serious game from the Mobile Assistance for Social Inclusion and Empowerment of Immigrants with Persuasive Learning Technologies and Social Networks (MASELTOV) project, the use of the framework in the context of an integrative project is discussed, with emphasis on the need to view game-based learning as a blended component of the cultural learning process, rather than a standalone solution. 
The particular case of mobile gaming is also considered within this case study, providing a platform by which to deliver and update content in response to evaluation outcomes. Discussion reflects upon the general challenges related to the assessment of cultural learning, and behavioural change in more general terms, suggesting future work should address the need to provide sustainable, research-driven platforms for game-based learning content.\nWe propose a statistical learning model for classifying cognitive processes based on distributed patterns of neural activation in the brain, acquired via functional magnetic resonance imaging (fMRI). In the proposed learning method, local meshes are formed around each voxel. The distance between voxels in the mesh is determined by using a functional neighbourhood concept. In order to define the functional neighbourhood, the similarities between the time series recorded for voxels are measured and functional connectivity matrices are constructed. Then, the local mesh for each voxel is formed by including the functionally closest neighbouring voxels in the mesh. The relationship between the voxels within a mesh is estimated by using a linear regression model. These relationship vectors, called Functional Connectivity aware Local Relational Features (FC-LRF) are then used to train a statistical learning machine. The proposed method was tested on a recognition memory experiment, including data pertaining to encoding and retrieval of words belonging to ten different semantic categories. Two popular classifiers, namely k-nearest neighbour (k-nn) and Support Vector Machine (SVM), are trained in order to predict the semantic category of the item being retrieved, based on activation patterns during encoding. 
The classification performance of the Functional Mesh Learning model, which ranges from 62% to 71%, is superior to that of the classical multi-voxel pattern analysis (MVPA) methods, which range from 40% to 48%, for ten semantic categories.\nEvent recognition systems rely on properly engineered knowledge bases of event definitions to infer occurrences of events in time. The manual development of such knowledge is a tedious and error-prone task; thus, event-based applications may benefit from automated knowledge construction techniques, such as Inductive Logic Programming (ILP), which combines machine learning with the declarative and formal semantics of First-Order Logic. However, learning temporal logical formalisms, which are typically utilized by logic-based Event Recognition systems, is a challenging task, which most ILP systems cannot fully undertake. In addition, event-based data is usually massive and collected at different times and under various circumstances. Ideally, systems that learn from temporal data should be able to operate in an incremental mode, that is, revise previously constructed knowledge in the face of new evidence. Most ILP systems are batch learners, in the sense that in order to account for new evidence they have no alternative but to forget past knowledge and learn from scratch. Given the increased inherent complexity of ILP and the volumes of real-life temporal data, this results in algorithms that scale poorly. In this work we present an incremental method for learning and revising event-based knowledge, in the form of Event Calculus programs. The proposed algorithm relies on abductive-inductive learning and comprises a scalable clause refinement methodology, based on a compressive summarization of clause coverage in a stream of examples. We present an empirical evaluation of our approach on real and synthetic data from activity recognition and city transport applications.\nHospital readmission has become a critical metric of quality and cost of healthcare. 
Medicare anticipates that nearly $17 billion is paid out on the 20% of patients who are readmitted within 30 days of discharge. Although several interventions such as transition care management and discharge reengineering have been practiced in recent years, their effectiveness and sustainability depend on how well they can identify and target patients at high risk of rehospitalization. Based on the literature, most current risk prediction models fail to reach an acceptable accuracy level; none of them considers patients' history of readmission and impacts of patient attribute changes over time; and they often do not discriminate between planned and unnecessary readmissions. Tackling such drawbacks, we develop a new readmission metric based on administrative data that can distinguish potentially avoidable readmissions from all other types of readmission. We further propose a tree-based classification method to estimate the predicted probability of readmission that can directly incorporate patients' history of readmission and changes in risk factors over time. The proposed methods are validated with 2011-12 Veterans Health Administration data from inpatients hospitalized for heart failure, acute myocardial infarction, pneumonia, or chronic obstructive pulmonary disease in the State of Michigan. Results show improved discrimination power compared to the literature (c-statistics>80%) and good calibration.\nAlthough many algorithms for the multi-armed bandit problem are well-understood theoretically, empirical confirmation of their effectiveness is generally scarce. This paper presents a thorough empirical study of the most popular multi-armed bandit algorithms. Three important observations can be made from our results. Firstly, simple heuristics such as epsilon-greedy and Boltzmann exploration outperform theoretically sound algorithms in most settings by a significant margin. 
Secondly, the performance of most algorithms varies dramatically with the parameters of the bandit problem. Our study identifies for each algorithm the settings where it performs well, and the settings where it performs poorly. Thirdly, the algorithms' performance relative to each other is affected only by the number of bandit arms and the variance of the rewards. This finding may guide the design of subsequent empirical evaluations. In the second part of the paper, we turn our attention to an important area of application of bandit algorithms: clinical trials. Although the design of clinical trials has been one of the principal practical problems motivating research on multi-armed bandits, bandit algorithms have never been evaluated as potential treatment allocation strategies. Using data from a real study, we simulate the outcome that a 2001-2002 clinical trial would have had if bandit algorithms had been used to allocate patients to treatments. We find that an adaptive trial would have successfully treated at least 50% more patients, while significantly reducing the number of adverse effects and increasing patient retention. At the end of the trial, the best treatment could still have been identified with a high level of statistical confidence. Our findings demonstrate that bandit algorithms are attractive alternatives to current adaptive treatment allocation strategies.\nIn this paper we propose a structural parameter of CNF formulas and use it to identify instances of weighted MaxSAT and #SAT that can be solved in polynomial time. Given a CNF formula we say that a set of clauses is precisely satisfiable if there is some complete assignment satisfying these clauses only. Let the ps-value of the formula be the number of precisely satisfiable sets of clauses. Applying the notion of branch decompositions to CNF formulas and using ps-value as cut function, we define the ps-width of a formula. 
For a formula given with a decomposition of polynomial ps-width we show dynamic programming algorithms solving weighted MaxSAT and #SAT in polynomial time. Combining this with results of 'Belmonte and Vatshelle, Graph classes with structured neighborhoods and algorithmic applications, Theor. Comput. Sci. 511: 54-65 (2013)' we get polynomial-time algorithms solving weighted MaxSAT and #SAT for some classes of structured CNF formulas. For example, we get $O(m^2(m + n)s)$ algorithms for formulas $F$ of $m$ clauses and $n$ variables and size $s$, if $F$ has a linear ordering of the variables and clauses such that for any variable $x$ occurring in clause $C$, if $x$ appears before $C$ then any variable between them also occurs in $C$, and if $C$ appears before $x$ then $x$ occurs also in any clause between them. Note that the class of incidence graphs of such formulas does not have bounded clique-width.\nSponsored search is an important monetization channel for search engines, in which an auction mechanism is used to select the ads shown to users and determine the prices charged to advertisers. There have been several pieces of work in the literature that investigate how to design an auction mechanism in order to optimize the revenue of the search engine. However, due to some unrealistic assumptions used, the practical value of these studies is not very clear. In this paper, we propose a novel \\emph{game-theoretic machine learning} approach, which naturally combines machine learning and game theory, and learns the auction mechanism using a bilevel optimization framework. In particular, we first learn a Markov model from historical data to describe how advertisers change their bids in response to an auction mechanism, and then for any given auction mechanism, we use the learnt model to predict its corresponding future bid sequences. Next we learn the auction mechanism through empirical revenue maximization on the predicted bid sequences. 
We show that the empirical revenue will converge when the prediction period approaches infinity, and that a Genetic Programming algorithm can effectively optimize this empirical revenue. Our experiments indicate that the proposed approach is able to produce a much more effective auction mechanism than several baselines.\nThe satisfiability problem for SPARQL patterns is undecidable in general, since the expressive power of SPARQL 1.0 is comparable with that of the relational algebra. The goal of this paper is to delineate the boundary of decidability of satisfiability in terms of the constraints allowed in filter conditions. The classes of constraints considered are bound-constraints, negated bound-constraints, equalities, nonequalities, constant-equalities, and constant-nonequalities. The main result of the paper can be summarized by saying that, as soon as inconsistent filter conditions can be formed, satisfiability is undecidable. The key insight in each case is to find a way to emulate the set difference operation. Undecidability can then be obtained from a known undecidability result for the algebra of binary relations with union, composition, and set difference. When no inconsistent filter conditions can be formed, satisfiability is efficiently decidable by simple checks on bound variables and on the use of literals. The paper also points out that satisfiability for the so-called `well-designed' patterns can be decided by a check on bound variables and a check for inconsistent filter conditions.\nWe perform two experiments with the aim of investigating the effects of negation on the combination of natural concepts. In the first experiment, we test the membership weights of a list of exemplars with respect to two concepts, e.g., {\\it Fruits} and {\\it Vegetables}, and their conjunction {\\it Fruits And Vegetables}. 
In the second experiment, we test the membership weights of the same list of exemplars with respect to the same two concepts, but negating the second, e.g., {\\it Fruits} and {\\it Not Vegetables}, and again their conjunction {\\it Fruits And Not Vegetables}. The collected data confirm existing results on conceptual combination, namely, they show dramatic deviations from the predictions of classical (fuzzy set) logic and probability theory. More precisely, they exhibit conceptual vagueness, gradedness of membership, overextension and double overextension of membership weights with respect to the given conjunctions. Then, we show that the quantum probability model in Fock space recently elaborated to model Hampton's data on concept conjunction (Hampton, 1988a) and disjunction (Hampton, 1988b) faithfully accords with the collected data. Our quantum-theoretic modeling enables us to describe these non-classical effects in terms of genuine quantum effects, namely `contextuality', `superposition', `interference' and `emergence'. The obtained results confirm and strengthen the analysis in Aerts (2009a) and Sozzo (2014) on the identification of quantum aspects in experiments on conceptual vagueness. Our results fit within the general research on the identification of quantum structures in cognitive and decision processes.\nThis paper investigates the impact of query topology on the difficulty of answering conjunctive queries in the presence of OWL 2 QL ontologies. Our first contribution is to clarify the worst-case size of positive existential (PE), non-recursive Datalog (NDL), and first-order (FO) rewritings for various classes of tree-like conjunctive queries, ranging from linear queries to bounded treewidth queries. Perhaps our most surprising result is a superpolynomial lower bound on the size of PE-rewritings that holds already for linear queries and ontologies of depth 2. 
More positively, we show that polynomial-size NDL-rewritings always exist for tree-shaped queries with a bounded number of leaves (and arbitrary ontologies), and for bounded treewidth queries paired with bounded depth ontologies. For FO-rewritings, we equate the existence of polysize rewritings with well-known problems in Boolean circuit complexity. As our second contribution, we analyze the computational complexity of query answering and establish tractability results (either NL- or LOGCFL-completeness) for a range of query-ontology pairs. Combining our new results with those from the literature yields a complete picture of the succinctness and complexity landscapes for the considered classes of queries and ontologies.\nHotspot detection aims at identifying subgroups in the observations that are unexpected, with respect to some baseline information. For instance, in disease surveillance, the purpose is to detect sub-regions in spatiotemporal space, where the count of reported diseases (e.g., cancer) is higher than expected, with respect to the population. The state-of-the-art method for this kind of problem is the Space-Time Scan Statistics (STScan), which exhaustively searches the whole space through a sliding window looking for significant spatiotemporal clusters. STScan makes some restrictive assumptions about the distribution of data, the shape of the hotspots and the quality of data, which can be unrealistic for some nontraditional data sources. A novel methodology called EigenSpot is proposed which, instead of an exhaustive search over the space, tracks the changes in a space-time correlation structure. Not only is the new approach much more computationally efficient, but it also makes no assumption about the data distribution, hotspot shape or data quality. 
The principal idea is that with the joint combination of abnormal elements in the principal spatial and the temporal singular vectors, the location of hotspots in the spatiotemporal space can be approximated. A comprehensive experimental evaluation, both on simulated and real data sets reveals the effectiveness of the proposed method.\nFaces are a class of visual stimuli with unique significance, for a variety of reasons. They are ubiquitous throughout the course of a person's life, and face recognition is crucial for daily social interaction. Faces are also unlike any other stimulus class in terms of certain physical stimulus characteristics. Furthermore, faces have been empirically found to elicit certain characteristic behavioral phenomena, which are widely held to be evidence of \"holistic\" processing of faces. However, little is known about the neural mechanisms underlying such holistic face processing. In other words, for the processing of faces by the primate visual system, the input and output characteristics are relatively well known, but the internal neural computations are not. The main aim of this work is to further the fundamental understanding of what causes the visual processing of faces to be different from that of objects. In this computational modeling work, we show that a single factor - \"neural tuning size\" - is able to account for three key phenomena that are characteristic of face processing, namely the Composite Face Effect (CFE), Face Inversion Effect (FIE) and Whole-Part Effect (WPE). Our computational proof-of-principle provides specific neural tuning properties that correspond to the poorly-understood notion of holistic face processing, and connects these neural properties to psychophysical behavior. 
Overall, our work provides a unified and parsimonious theoretical account for the disparate empirical data on face-specific processing, deepening the fundamental understanding of face processing.\nThe semantics of determiner phrases, be they definite descriptions, indefinite descriptions or quantified noun phrases, is often assumed to be a fully solved question: common nouns are properties, and determiners are generalised quantifiers that apply to two predicates: the property corresponding to the common noun and the one corresponding to the verb phrase. We first present a criticism of this standard view. Firstly, the semantics of determiners does not follow the syntactical structure of the sentence. Secondly, the standard interpretation of the indefinite article cannot account for nominal sentences. Thirdly, the standard view misses the linguistic asymmetry between the two properties of a generalised quantifier. In the sequel, we propose a treatment of determiners and quantifiers as Hilbert terms in a richly typed system that we initially developed for lexical semantics, using a many-sorted logic for semantical representations. We present this semantical framework called the Montagovian generative lexicon and show how these terms better match the syntactical structure and avoid the aforementioned problems of the standard approach. Hilbert terms differ from choice functions in that there is one polymorphic operator and not one operator per formula. They also open an intriguing connection between the logic for meaning assembly, the typed lambda calculus handling compositionality and the many-sorted logic for semantical representations. Furthermore, epsilon terms naturally introduce type-judgements and confirm the claim that type judgements are a form of presupposition.\nThe search for binary sequences with a high figure of merit, known as the low autocorrelation binary sequence ($labs$) problem, represents a formidable computational challenge. 
To mitigate the computational constraints of the problem, we consider solvers that accept odd values of sequence length $L$ and return solutions for skew-symmetric binary sequences only, with the consequence that not all best solutions under this constraint will be optimal for each $L$. To improve both the search for the best merit factor and the asymptotic runtime performance, we instrumented three stochastic solvers: the first two are state-of-the-art solvers that rely on variants of memetic and tabu search ($lssMAts$ and $lssRRts$); the third solver ($lssOrel$) organizes the search as a sequence of independent contiguous self-avoiding walk segments. By adapting a rigorous statistical methodology to performance testing of all three combinatorial solvers, experiments show that the solver with the best asymptotic average-case performance, $lssOrel\\_8 = 0.000032*1.1504^L$, has the best chance of finding solutions that improve, as $L$ increases, figures of merit reported to date. The same methodology can be applied to engineering new $labs$ solvers that may return merit factors even closer to the conjectured asymptotic value of 12.3248.\nWe present a simple noise-robust margin-based active learning algorithm to find homogeneous (passing through the origin) linear separators and analyze its error convergence when labels are corrupted by noise. We show that when the imposed noise satisfies the Tsybakov low noise condition (Mammen, Tsybakov, and others 1999; Tsybakov 2004) the algorithm is able to adapt to an unknown level of noise and achieves the optimal statistical rate up to poly-logarithmic factors. We also derive lower bounds for margin-based active learning algorithms under Tsybakov noise conditions (TNC) for the membership query synthesis scenario (Angluin 1988). Our result implies lower bounds for the stream-based selective sampling scenario (Cohn 1990) under TNC for some fairly simple data distributions. 
Quite surprisingly, we show that the sample complexity cannot be improved even if the underlying data distribution is as simple as the uniform distribution on the unit ball. Our proof involves the construction of a well-separated hypothesis set on the d-dimensional unit ball along with carefully designed label distributions for the Tsybakov noise condition. Our analysis might provide insights for other forms of lower bounds as well.\nVideo Surveillance is a fast-evolving field of research and development (R&D) driven by the urgent need for public security and safety (due to the growing threats of terrorism, vandalism, and anti-social behavior). Traditionally, surveillance systems consist of two components: video cameras distributed over the guarded area and human observers watching and analyzing the incoming video. The explosive growth of installed cameras and the limited ability of human operators to process the delivered video content raise an urgent demand for developing surveillance systems with human-like cognitive capabilities, that is, cognitive surveillance systems. The growing interest in this issue is attested by the tens of workshops, symposiums and conferences held around the world each year. The IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS) is certainly one of them. However, for unknown reasons, the term Cognitive Surveillance never appears among its topics. In my view, the explanation for this is simple: the complexity and the indefinable nature of the term \"Cognition\". In this paper, I try to resolve the problem by providing a novel definition of cognition equally suitable for biological as well as technological applications. I hope my humble efforts will be helpful.\nMethods for combining predictions from different models in a supervised learning setting must somehow estimate/predict the quality of a model's predictions at unknown future inputs. 
Many of these methods (often implicitly) make the assumption that the test inputs are identical to the training inputs, which is seldom reasonable. By failing to take into account that prediction will generally be harder for test inputs that did not occur in the training set, such methods tend to select overly complex models. Based on a novel, unbiased expression for KL divergence, we propose XAIC and its special case FAIC as versions of AIC intended for prediction that use different degrees of knowledge of the test inputs. Both methods substantially differ from and may outperform all the known versions of AIC even when the training and test inputs are iid, and are especially useful for deterministic inputs and under covariate shift. Our experiments on linear models suggest that if the test and training inputs differ substantially, then XAIC and FAIC predictively outperform AIC, BIC and several other methods, including Bayesian model averaging.\nWe present a physics-inspired heuristic method for solving combinatorial optimization problems. Our approach is specifically motivated by the desire to avoid becoming trapped in metastable local minima, a common occurrence in hard problems with multiple extrema. Our method involves (i) coupling otherwise independent simulations of a system (\"replicas\") via geometrical distances as well as (ii) probabilistic inference applied to the solutions found by individual replicas. The {\\it ensemble} of replicas evolves so as to maximize the inter-replica correlation while simultaneously minimizing the local intra-replica cost function (e.g., the total path length in the Traveling Salesman Problem within each replica). We demonstrate how our method improves the performance of rudimentary local optimization schemes long applied to the NP-hard Traveling Salesman Problem. In particular, we apply our method to the well-known \"$k$-opt\" algorithm and examine two particular cases: $k=2$ and $k=3$. 
With the aid of geometrical coupling alone, we are able to determine the optimal tour length for systems of up to $280$ cities (an order of magnitude larger than the largest systems typically solved by bare $k=3$-opt). The probabilistic replica-based inference approach improves $k$-opt even further: it determines the optimal solution of a problem with $318$ cities and finds tours whose total length is close to that of the optimal solutions for other systems with larger numbers of cities.\nIt is well known in the literature that the problem of learning the structure of Bayesian networks is very hard to tackle: its computational complexity is super-exponential in the number of nodes in the worst case and polynomial in most real-world scenarios.   Efficient implementations of score-based structure learning benefit from past and current research in optimisation theory, which can be adapted to the task by using the network score as the objective function to maximise. This is not true for approaches based on conditional independence tests, called constraint-based learning algorithms. The only optimisation in widespread use, backtracking, leverages the symmetries implied by the definitions of neighbourhood and Markov blanket.   In this paper we illustrate how backtracking is implemented in recent versions of the bnlearn R package, and how it degrades the stability of Bayesian network structure learning for little gain in terms of speed. As an alternative, we describe a software architecture and framework that can be used to parallelise constraint-based structure learning algorithms (also implemented in bnlearn) and we demonstrate its performance using four reference networks and two real-world data sets from genetics and systems biology. 
We show that on modern multi-core or multiprocessor hardware parallel implementations are preferable over backtracking, which was developed when single-processor machines were the norm.\nIn this paper, the Dempster-Shafer method is employed as the theoretical basis for creating data classification systems. Testing is carried out using three popular (multiple attribute) benchmark datasets that have two, three and four classes. In each case, a subset of the available data is used for training to establish thresholds, limits or likelihoods of class membership for each attribute, and hence create mass functions that establish probability of class membership for each attribute of the test data. Classification of each data item is achieved by combination of these probabilities via Dempster's Rule of Combination. Results for the first two datasets show extremely high classification accuracy that is competitive with other popular methods. The third dataset is non-numerical and difficult to classify, but good results can be achieved provided the system and mass functions are designed carefully and the right attributes are chosen for combination. In all cases the Dempster-Shafer method provides comparable performance to other more popular algorithms, but the overhead of generating accurate mass functions increases the complexity with the addition of new attributes. Overall, the results suggest that the D-S approach provides a suitable framework for the design of classification systems and that automating the mass function design and calculation would increase the viability of the algorithm for complex classification problems.\nThe Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximized over rather than integrated out. 
In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows us to approximately integrate out the latent variables and subsequently train a GP-LVM by maximizing an analytic lower bound on the exact marginal likelihood. We apply this method to learning a GP-LVM from iid observations and to learning non-linear dynamical systems where the observations are temporally correlated. We show that a benefit of the variational Bayesian procedure is its robustness to overfitting and its ability to automatically select the dimensionality of the nonlinear latent space. The resulting framework is generic, flexible and easy to extend for other purposes, such as Gaussian process regression with uncertain inputs and semi-supervised Gaussian processes. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as challenging real-world datasets, including high-resolution video data.\nDeep learning has made significant breakthroughs in various fields of artificial intelligence. Advantages of deep learning include the ability to capture highly complicated features and little reliance on human engineering. However, it is still virtually impossible to use deep learning to analyze programs, since deep architectures cannot be trained effectively with pure back propagation. In this pioneering paper, we propose the \"coding criterion\" to build program vector representations, which are the premise of deep learning for program analysis. Our representation learning approach directly makes deep learning a reality in this new field. We evaluate the learned vector representations both qualitatively and quantitatively. We conclude from the experiments that the coding criterion is successful in building program representations. 
To evaluate whether deep learning is beneficial for program analysis, we feed the representations to deep neural networks and achieve higher accuracy in the program classification task than \"shallow\" methods, such as logistic regression and the support vector machine. This result confirms the feasibility of using deep learning to analyze programs. It also gives preliminary evidence of its success in this new field. We believe deep learning will become an outstanding technique for program analysis in the near future.\nWe consider the problem of learning the canonical parameters specifying an undirected graphical model (Markov random field) from the mean parameters. For graphical models representing a minimal exponential family, the canonical parameters are uniquely determined by the mean parameters, so the problem is feasible in principle. The goal of this paper is to investigate the computational feasibility of this statistical task. Our main result shows that parameter estimation is in general intractable: no algorithm can learn the canonical parameters of a generic pairwise binary graphical model from the mean parameters in time bounded by a polynomial in the number of variables (unless RP = NP). Indeed, such a result has been believed to be true (see the monograph by Wainwright and Jordan (2008)) but no proof was known.   Our proof gives a polynomial-time reduction from approximating the partition function of the hard-core model, known to be hard, to learning approximate parameters. Our reduction entails showing that the marginal polytope boundary has an inherent repulsive property, which validates an optimization procedure over the polytope that does not use any knowledge of its structure (as required by the ellipsoid method and others).\nIn 2013 Intel introduced the Xeon Phi, a new parallel co-processor board. The Xeon Phi is a cache-coherent many-core shared memory architecture claiming CPU-like versatility, programmability, high performance, and power efficiency. 
The first published micro-benchmark studies indicate that many of Intel's claims appear to be true. The current paper is the first study of a complex artificial intelligence application on the Phi. The application studied is an open-source MCTS program for playing tournament-quality Go (an oriental board game). We report the first speedup figures for up to 240 parallel threads on a real machine, allowing a direct comparison to previous simulation studies. After a substantial amount of work, we observed that performance scales well up to 32 threads, largely confirming previous simulation results of this Go program, although the performance surprisingly deteriorates between 32 and 240 threads. Furthermore, we report (1) unexpected performance anomalies between the Xeon Phi and Xeon CPU for small problem sizes and small numbers of threads, and (2) that performance is sensitive to scheduling choices. Achieving good performance on the Xeon Phi for complex programs is not straightforward; it requires a deep understanding of (1) search patterns, (2) scheduling, and (3) the architecture and its many cores and caches. In practice, the Xeon Phi is less straightforward to program for than originally envisioned by Intel.\nThis paper introduces a multi-period inspector scheduling problem (MPISP), which is a new variant of the multi-trip vehicle routing problem with time windows (VRPTW). In the MPISP, each inspector is scheduled to perform a route in a given multi-period planning horizon. At the end of each period, each inspector is not required to return to the depot but has to stay at one of the vertices for recuperation. If the remaining time of the current period is insufficient for an inspector to travel from his/her current vertex $A$ to a certain vertex $B$, he/she can choose either to wait at vertex $A$ until the start of the next period or to travel to a vertex $C$ that is closer to vertex $B$. 
Therefore, the shortest transit time between any vertex pair is affected by the length of the period and the departure time. We first describe an approach for computing the shortest transit time between any pair of vertices with an arbitrary departure time. To solve the MPISP, we then propose several local search operators adapted from classical operators for the VRPTW and integrate them into a tabu search framework. In addition, we present a constrained knapsack model that is able to produce an upper bound for the problem. Finally, we evaluate the effectiveness of our algorithm with extensive experiments based on a set of test instances. Our computational results indicate that our approach generates high-quality solutions.\nLocal field potentials (LFPs) sampled with extracellular electrodes are frequently used as a measure of population neuronal activity. However, relating such measurements to underlying neuronal behaviour and connectivity is non-trivial. To help study this link, we developed the Virtual Electrode Recording Tool for EXtracellular potentials (VERTEX). We first identified a reduced neuron model that retained the spatial and frequency filtering characteristics of extracellular potentials from neocortical neurons. We then developed VERTEX as an easy-to-use Matlab tool for simulating LFPs from large populations (>100 000 neurons). A VERTEX-based simulation successfully reproduced features of the LFPs from an in vitro multi-electrode array recording of macaque neocortical tissue. Our model, with virtual electrodes placed anywhere in 3D, allows direct comparisons with the in vitro recording setup. We envisage that VERTEX will encourage experimentalists, clinicians, and computational neuroscientists to use models to understand the mechanisms underlying measured brain dynamics in health and disease.\nWe consider \\textit{anytime} linear prediction in the common machine learning setting, where features are in groups that have costs. 
We achieve anytime (or interruptible) predictions by sequencing the computation of feature groups and reporting results using the computed features at interruption. We extend Orthogonal Matching Pursuit (OMP) and Forward Regression (FR) to learn the sequencing greedily under this group setting with costs. We theoretically guarantee that our algorithms achieve near-optimal linear predictions at each budget when a feature group is chosen. With a novel analysis of OMP, we improve its theoretical bound to the same strength as that of FR. In addition, we develop a novel algorithm that consumes cost $4B$ to approximate the optimal performance of \\textit{any} cost $B$, and prove that with cost less than $4B$, such an approximation is impossible. To our knowledge, these are the first anytime bounds at \\textit{all} budgets. We test our algorithms on two real-world data sets and evaluate them in terms of anytime linear prediction performance against cost-weighted Group Lasso and alternative greedy algorithms.\nActive security is mainly concerned with performing one or more security functions when a host in a communication network is subject to an attack. Such security functions include appropriate actions against attackers. To properly support active security actions, a set of software subsystems should be integrated so that they can automatically detect and appropriately address any vulnerability in the underlying network. This work presents an integrated active security response model. The proposed model introduces an Active Response Mechanism (ARM) for tracing anonymous attacks in the network back to their source. This work is motivated by the increased frequency and sophistication of denial-of-service attacks and by the difficulty in tracing packets with incorrect, or \"spoofed\", source addresses. This paper presents within the proposed model two tracing approaches based on:   1. Sleepy Watermark Tracing (SWT) for unauthorized access attacks.   
2. Probabilistic Packet Marking (PPM) in the network for Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks. On the basis of the proposed model, cooperative network security tools, such as a firewall and an intrusion detection system with an IP tracing mechanism, have been designed to take rapid active responses against the real IPs of attackers. The proposed model is able to detect network vulnerabilities, trace attack source IPs, and reconfigure the attacked subnetworks.\nTo analyze high-dimensional systems, many fields in science and engineering rely on high-level descriptions, sometimes called \"macrostates,\" \"coarse-grainings,\" or \"effective theories\". Examples of such descriptions include the thermodynamic properties of a large collection of point particles undergoing reversible dynamics, the variables in a macroeconomic model describing the individuals that participate in an economy, and the summary state of a cell composed of a large set of biochemical networks.   Often these high-level descriptions are constructed without considering the ultimate reason for needing them in the first place. Here, we formalize and quantify one such purpose: the need to predict observables of interest concerning the high-dimensional system as accurately as possible, while minimizing the computational cost of doing so. The resulting State Space Compression (SSC) framework provides a guide for how to solve for the optimal high-level description of a given dynamical system, rather than constructing it based on human intuition alone.   In this preliminary report, we introduce SSC, and illustrate it with several information-theoretic quantifications of \"accuracy\", each with different implications for the optimal compression. We also discuss some other possible applications of SSC beyond the goal of accurate prediction. 
These include SSC as a measure of the complexity of a dynamical system, and as a way to quantify information flow between the scales of a system.\nMachine learning, a branch of artificial intelligence that learns from previous experience to optimize performance, is ubiquitous in various fields such as computer science, financial analysis, robotics, and bioinformatics. A challenge is that machine learning with the rapidly growing \"big data\" could become intractable for classical computers. Recently, quantum machine learning algorithms [Lloyd, Mohseni, and Rebentrost, arXiv:1307.0411] were proposed that could offer an exponential speedup over classical algorithms. Here, we report the first experimental entanglement-based classification of 2-, 4-, and 8-dimensional vectors to different clusters using a small-scale photonic quantum computer, which are then used to implement supervised and unsupervised machine learning. The results demonstrate the working principle of using quantum computers to manipulate and classify high-dimensional vectors, the core mathematical routine in machine learning. The method can in principle be scaled to larger numbers of qubits, and may provide a new route to accelerate machine learning.\nProbabilistic graphical models such as Bayesian Networks are among the most powerful structures known by the Computer Science community for deriving probabilistic inferences. However, modern cognitive psychology has revealed that human decisions may not follow the rules of classical probability theory, because humans cannot process large amounts of data in order to make judgements. Consequently, the inferences performed are based on limited data coupled with several heuristics, leading to violations of the law of total probability. This means that probabilistic graphical models based on classical probability theory are too limited to fully simulate and explain various aspects of human decision making.   
Quantum probability theory was developed in order to accommodate the paradoxical findings that the classical theory could not explain. Recent findings in cognitive psychology revealed that quantum probability can fully describe human decisions in an elegant framework. These findings suggest that, before taking a decision, human thoughts are seen as superposed waves that can interfere with each other, influencing the final decision.   In this work, we propose a new Bayesian Network based on the psychological findings of cognitive scientists. We performed experiments with two well-known Bayesian Networks from the literature. The results obtained revealed that the quantum-like Bayesian Network can drastically affect the probabilistic inferences, especially when the levels of uncertainty of the network are very high (no pieces of evidence observed). When the levels of uncertainty are very low, the proposed quantum-like network collapses to its classical counterpart.\nDue to the huge availability of documents in digital form, and the possibilities for deception inherent in digital documents and the way they are spread, the authorship attribution problem has steadily grown in relevance. Nowadays, authorship attribution, for both information retrieval and analysis, has gained great importance in the context of security, trust and copyright preservation. This work proposes an innovative multi-agent driven machine learning technique that has been developed for authorship attribution. By means of preprocessing for word grouping and time-period-related analysis of the common lexicon, we determine a bias reference level for the recurrence frequency of the words within analysed texts, and then train a Radial Basis Neural Networks (RBPNN)-based classifier to identify the correct author. 
The main advantage of the proposed approach lies in the generality of the semantic analysis, which can be applied to different contexts and lexical domains without requiring any modification. Moreover, the proposed system is able to incorporate an external input, meant to tune the classifier, and then self-adjust by means of continuous learning reinforcement.\nThe Turing machine, as it was presented by Turing himself, models the calculations done by a person. This means that we can compute whatever any Turing machine can compute, and therefore we are Turing complete. The question addressed here is: why are we Turing complete? Being Turing complete also means that somehow our brain implements the function that a universal Turing machine implements. The point is that evolution achieved Turing completeness, so the explanation should be evolutionary, but our explanation is mathematical. The trick is to introduce a mathematical theory of problems, under the basic assumption that solving more problems provides more survival opportunities. So we build a problem theory by fusing set and computing theories. Then we construct a series of resolvers, where each resolver is defined by its computing capacity, that exhibits the following property: all problems solved by a resolver are also solved by the next resolver in the series if a certain condition is satisfied. The last of the conditions is to be Turing complete. This series defines a resolver hierarchy that could be seen as a framework for the evolution of cognition. Then the answer to our question would be: to solve most problems. In addition, the problem theory defines adaptation, perception, and learning, and it shows that there are just three ways to resolve any problem: routine, trial, and analogy. 
And, most importantly, this theory demonstrates how problems can be used to found mathematics and computing on biology.\nA database of fetal heart rate (FHR) time series measured from 7221 patients during labor is analyzed with the aim of learning the types of features of these recordings that are informative of low cord pH. Our 'highly comparative' analysis involves extracting over 9000 time-series analysis features from each FHR time series, including measures of autocorrelation, entropy, distribution, and various model fits. This diverse collection of features was developed in previous work, and is publicly available. We describe five features that most accurately classify a balanced training set of 59 'low pH' and 59 'normal pH' FHR recordings. We then describe five of the features with the strongest linear correlation to cord pH across the full dataset of FHR time series. The features identified in this work may be used as part of a system for guiding intervention during labor in future. This work successfully demonstrates the utility of comparing across a large, interdisciplinary literature on time-series analysis to automatically contribute new scientific results for specific biomedical signal processing challenges.\nThe FO Model Counting problem (FOMC) is the following: given a sentence $\\Phi$ in FO and a number $n$, compute the number of models of $\\Phi$ over a domain of size $n$; the Weighted variant (WFOMC) generalizes the problem by associating a weight to each tuple and defining the weight of a model to be the product of weights of its tuples. In this paper we study the complexity of the symmetric WFOMC, where all tuples of a given relation have the same weight. Our motivation comes from an important application, inference in Knowledge Bases with soft constraints, like Markov Logic Networks, but the problem is also of independent theoretical interest. We study both the data complexity, and the combined complexity of FOMC and WFOMC. 
For the data complexity we prove the existence of an FO$^{3}$ formula for which FOMC is #P$_1$-complete, and the existence of a Conjunctive Query for which WFOMC is #P$_1$-complete. We also prove that all $\\gamma$-acyclic queries have polynomial-time data complexity. For the combined complexity, we prove that, for every fragment FO$^{k}$, $k\\geq 2$, the combined complexity of FOMC (or WFOMC) is #P-complete.\nDeep neural networks (DNNs) have recently been achieving state-of-the-art performance on a variety of pattern-recognition tasks, most notably visual classification problems. Given that DNNs are now able to classify objects in images with near-human-level performance, questions naturally arise as to what differences remain between computer and human vision. A recent study revealed that changing an image (e.g. of a lion) in a way imperceptible to humans can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion as a library). Here we show a related result: it is easy to produce images that are completely unrecognizable to humans, but that state-of-the-art DNNs believe to be recognizable objects with 99.99% confidence (e.g. labeling with certainty that white noise static is a lion). Specifically, we take convolutional neural networks trained to perform well on either the ImageNet or MNIST datasets and then find images with evolutionary algorithms or gradient ascent that DNNs label with high confidence as belonging to each dataset class. It is possible to produce images totally unrecognizable to human eyes that DNNs believe with near certainty are familiar objects, which we call \"fooling images\" (more generally, fooling examples). Our results shed light on interesting differences between human vision and current DNNs, and raise questions about the generality of DNN computer vision.\nJoint attention is a core, early-developing form of social interaction. 
It is based on our ability to discriminate the third-party objects that other people are looking at. While it has been shown that people can accurately determine whether another person is looking directly at them versus away, little is known about human ability to discriminate a third person's gaze directed towards objects that are further away, especially in unconstrained cases where the looker can move her head and eyes freely. In this paper we address this question by jointly exploring human psychophysics and a cognitively motivated computer vision model, which can detect the 3D direction of gaze from 2D face images. The synthesis of behavioral study and computer vision yields several interesting discoveries. (1) Human accuracy of discriminating targets 8°-10° of visual angle apart is around 40% in a free-looking gaze task; (2) The ability to interpret the gaze of different lookers varies dramatically; (3) This variance can be captured by the computational model; (4) Humans significantly outperform the current model. These results collectively show that the acuity of human joint attention is indeed highly impressive, given the computational challenge of the natural looking task. Moreover, the gap between human and model performance, as well as the variability of gaze interpretation across different lookers, calls for further understanding of the underlying mechanisms utilized by humans for this challenging task.\nHuman-robot interaction can be divided into two categories based on the physical distance between the human and robot: remote and proximal. In proximal interaction, the human and robot often engage in close coordination; in remote interaction, the human and robot are less coupled due to communication constraints. As a result, providing automation for the robot in remote interaction becomes more important. 
Thus far, human factors studies on automation in remote human-robot interaction have been restricted to various forms of supervision, in which the robot is essentially being used as a smart mobile manipulation platform with sensing capabilities. In this paper, we investigate the incorporation of general planning capability into the robot to facilitate peer-to-peer human-robot teaming, in which the human and robot are viewed as teammates that are physically separated. The human and robot share the same global goal and collaborate to achieve it. Note that humans may feel uncomfortable with such robot autonomy, which can potentially reduce teaming performance. One important difference between peer-to-peer teaming and supervised teaming is that an autonomous robot in peer-to-peer teaming can achieve the goal alone when the task information is completely specified. However, incompleteness often exists, which implies information asymmetry. While information asymmetry can sometimes be desirable, it may also lead to the robot choosing improper actions that negatively influence the teaming performance. We aim to investigate the various trade-offs, e.g., mental workload and situation awareness, between these two types of remote human-robot teaming.\nHalpern and Pearl introduced a definition of actual causality; Eiter and Lukasiewicz showed that computing whether X=x is a cause of Y=y is NP-complete in binary models (where all variables can take on only two values) and $\\Sigma_2^P$-complete in general models. In the final version of their paper, Halpern and Pearl slightly modified the definition of actual cause, in order to deal with problems pointed out by Hopkins and Pearl. As we show, this modification has a nontrivial impact on the complexity of computing actual cause. To characterize the complexity, a new family $D_k^P$, $k = 1, 2, 3, \\ldots$, of complexity classes is introduced, which generalizes the class DP introduced by Papadimitriou and Yannakakis (DP is just $D_1^P$). 
We show that the complexity of computing causality under the updated definition is $D_2^P$-complete.   Chockler and Halpern extended the definition of causality by introducing notions of responsibility and blame. The complexity of determining the degree of responsibility and blame using the original definition of causality was completely characterized. Again, we show that changing the definition of causality affects the complexity, and completely characterize it using the updated definition.\nMastering the game of Go has remained a long-standing challenge to the field of AI. Modern computer Go systems rely on processing millions of possible future positions to play well, but intuitively a stronger and more 'humanlike' way to play the game would be to rely on pattern recognition abilities rather than brute-force computation. Following this sentiment, we train deep convolutional neural networks to play Go by training them to predict the moves made by expert Go players. To solve this problem we introduce a number of novel techniques, including a method of tying weights in the network to 'hard-code' symmetries that are expected to exist in the target function, and demonstrate in an ablation study that they considerably improve performance. Our final networks are able to achieve move prediction accuracies of 41.1% and 44.4% on two different Go datasets, surpassing the previous state of the art on this task by significant margins. Additionally, while previous move prediction programs have not yielded strong Go-playing programs, we show that the networks trained in this work acquired high levels of skill. Our convolutional neural networks can consistently defeat the well-known Go program GNU Go, indicating it is state of the art among programs that do not use Monte Carlo Tree Search. 
They are also able to win some games against the state-of-the-art Go program Fuego while using a fraction of the play time. This success at playing Go indicates that high-level principles of the game were learned.\nCausal models defined in terms of structural equations have proved to be quite a powerful way of representing knowledge regarding causality. However, a number of authors have given examples that seem to show that the Halpern-Pearl (HP) definition of causality gives intuitively unreasonable answers. Here it is shown that, for each of these examples, we can give two stories consistent with the description in the example, such that intuitions regarding causality are quite different for each story. By adding additional variables, we can disambiguate the stories. Moreover, in the resulting causal models, the HP definition of causality gives the intuitively correct answer. It is also shown that, by adding extra variables, a modification to the original HP definition made to deal with an example of Hopkins and Pearl may not be necessary. Given how much can be done by adding extra variables, there might be a concern that the notion of causality is somewhat unstable. Can adding extra variables in a \"conservative\" way (i.e., maintaining all the relations between the variables in the original model) cause the answer to the question \"Is X=x a cause of Y=y\" to alternate between \"yes\" and \"no\"? It is shown that we can have such alternation infinitely often, but if we take normality into consideration, we cannot. Indeed, under appropriate normality assumptions, adding an extra variable can change the answer from \"yes\" to \"no\", but after that, it cannot change back to \"yes\".\nDropout is a simple but effective technique for learning in neural networks and other settings. A sound theoretical understanding of dropout is needed to determine when dropout should be applied and how to use it most effectively. 
In this paper, we continue the exploration of dropout as a regularizer pioneered by Wager et al. We focus on linear classification where a convex proxy to the misclassification loss (i.e., the logistic loss used in logistic regression) is minimized. We show: (a) when the dropout-regularized criterion has a unique minimizer, (b) when the dropout-regularization penalty goes to infinity with the weights, and when it remains bounded, (c) that the dropout regularization can be non-monotonic as individual weights increase from 0, and (d) that the dropout regularization penalty may not be convex. This last point is particularly surprising because the combination of dropout regularization with any convex loss proxy is always a convex function. In order to contrast dropout regularization with $L_2$ regularization, we formalize the notion of when different sources are more compatible with different regularizers. We then exhibit distributions that are provably more compatible with dropout regularization than $L_2$ regularization, and vice versa. These sources provide additional insight into how the inductive biases of dropout and $L_2$ regularization differ. We provide some similar results for $L_1$ regularization.\nThe process of multiple criteria decision making (MCDM) is that of determining the best choice among all of the possible alternatives. The problem of supplier selection, about which decision makers usually have vague and imprecise knowledge, is a typical example of a multi-criteria group decision-making problem. Conventional crisp techniques have not been very effective for solving MCDM problems because of the imprecise or fuzzy nature of linguistic assessments. Finding exact values for MCDM problems is difficult, and in many real-world cases impossible. So, it is more reasonable to consider the values of alternatives according to the criteria as single valued neutrosophic sets (SVNS). 
This paper deals with the technique for order preference by similarity to ideal solution (TOPSIS) and extends the TOPSIS method to MCDM problems with single valued neutrosophic information. The value of each alternative and the weight of each criterion are characterized by single valued neutrosophic numbers. Here, the importance of criteria and alternatives is identified by aggregating individual opinions of decision makers (DMs) via the single valued neutrosophic weighted averaging (IFWA) operator. The proposed method is easy to use, precise, and practical for solving MCDM problems with single valued neutrosophic data. Finally, to show the applicability of the developed method, a numerical experiment on supplier choice is given at the end of this paper as an application of the single valued neutrosophic TOPSIS method.\nWe investigate modal logics of high probability having two unary modal operators: an operator $K$ expressing probabilistic certainty and an operator $B$ expressing probability exceeding a fixed rational threshold $c\\geq\\frac 12$. Identifying knowledge with the former and belief with the latter, we may think of $c$ as the agent's betting threshold, which leads to the motto \"belief is willingness to bet.\" The logic $\\mathsf{KB.5}$ for $c=\\frac 12$ has an $\\mathsf{S5}$ $K$ modality along with a sub-normal $B$ modality that extends the minimal modal logic $\\mathsf{EMND45}$ by way of four schemes relating $K$ and $B$, one of which is a complex scheme arising out of a theorem due to Scott. Lenzen was the first to use Scott's theorem to show that a version of this logic is sound and complete for the probability interpretation. We reformulate Lenzen's results and present them here in a modern and accessible form. In addition, we introduce a new epistemic neighborhood semantics that will be more familiar to modern modal logicians. 
Using Scott's theorem, we provide the Lenzen-derivative properties that must be imposed on finite epistemic neighborhood models so as to guarantee the existence of a probability measure respecting the neighborhood function in the appropriate way for threshold $c=\\frac 12$. This yields a link between probabilistic and modal neighborhood semantics that we hope will be of use in future work on modal logics of qualitative probability. We leave open the question of which properties must be imposed on finite epistemic neighborhood models so as to guarantee the existence of an appropriate probability measure for thresholds $c\\neq\\frac 12$.\nWhile influence maximization in social networks has been studied extensively in the computer science community over the last decade, the focus has been on progressive influence models, such as the independent cascade (IC) and linear threshold (LT) models, which cannot capture the reversibility of choices. In this paper, we present the Heat Conduction (HC) model, which is a non-progressive influence model with real-world interpretations. We show that HC unifies, generalizes, and extends the existing nonprogressive models, such as the Voter model [1] and non-progressive LT [2]. We then prove that selecting the optimal seed set of influential nodes is NP-hard for HC, but by establishing the submodularity of influence spread, we can tackle the influence maximization problem with a scalable and provably near-optimal greedy algorithm. We are the first to present a scalable solution for influence maximization under the nonprogressive LT model, as a special case of the HC model. In sharp contrast to other greedy influence maximization methods, our fast and efficient C2GREEDY algorithm benefits from two analytically computable steps: closed-form computation for finding the influence spread as well as the greedy seed selection. 
Through extensive experiments on several large real and synthetic networks, we show that C2GREEDY outperforms the state-of-the-art methods in terms of both influence spread and scalability.\nIn the last decade, with the enormous growth of digital content on the internet and in databases, sentiment analysis has received increasing attention from information retrieval and natural language processing researchers. Sentiment analysis aims to use automated tools to detect subjective information in reviews. One of the main challenges in sentiment analysis is feature selection. Feature selection is widely used as the first stage of analysis and classification tasks to reduce the dimensionality of the problem and to improve speed by eliminating irrelevant and redundant features. Few studies have addressed feature selection in sentiment analysis, and work on Persian sentiment analysis is especially rare. This paper considers the problem of sentiment classification using different feature selection methods for online customer reviews in the Persian language. Three of the challenges of Persian text are the use of a wide variety of declensional suffixes, varying word spacing, and many informal or colloquial words. In this paper, we study these challenges by proposing a model for sentiment classification of Persian review documents. The proposed model is based on lemmatization and feature selection and employs the Naive Bayes algorithm for classification. We evaluate the performance of the model on a manually gathered collection of cellphone reviews, where the results show the effectiveness of the proposed approaches.\nIn this work, we propose an abductive framework for biosignal interpretation, based on the concept of Temporal Abstraction Patterns. A temporal abstraction pattern defines an abstraction relation between an observation hypothesis and a set of observations constituting its evidence support. 
New observations are generated abductively from any subset of the evidence of a pattern, building an abstraction hierarchy of observations in which higher levels contain those observations with greater interpretative value of the physiological processes underlying a given signal. Non-monotonic reasoning techniques have been applied to this model in order to find the best interpretation of a set of initial observations, even permitting these observations to be corrected by removing, adding, or modifying them in order to make them consistent with the available domain knowledge. Some preliminary experiments have been conducted to apply this framework to a well-known and bounded problem: QRS detection on ECG signals. The objective is not to provide a new, better QRS detector, but to test the validity of an abductive paradigm. These experiments show that a knowledge base comprising just a few very simple rhythm abstraction patterns can enhance the results of a state-of-the-art algorithm by significantly improving its detection F1-score, in addition to demonstrating the ability of the abductive framework to correct both sensitivity and specificity failures.\nBelief revision of knowledge bases represented by a set of sentences in a given logic has been extensively studied, but only for specific logics, mainly propositional, and also recently Horn and description logics. Here, we propose to generalize this operation from a model-theoretic point of view, by defining revision in an abstract model theory known under the name of satisfaction systems. In this framework, we generalize to any satisfaction system the characterization of the well-known AGM postulates given by Katsuno and Mendelzon for propositional logic in terms of minimal change among interpretations. 
Moreover, we study how to define revision, satisfying the AGM postulates, from relaxation notions that were first introduced in description logics to define dissimilarity measures between concepts, and whose consequence is to relax the set of models of the old belief until it becomes consistent with the new pieces of knowledge. We show how the proposed general framework can be instantiated in different logics such as propositional, first-order, description and Horn logics. In particular, for description logics, we introduce several concrete relaxation operators tailored for the description logic $\\ALC{}$ and its fragments $\\EL{}$ and $\\ELext{}$, discuss their properties and provide some illustrative examples.\nEnforcing local consistencies in cost function networks is performed by applying so-called Equivalence Preserving Transformations (EPTs) to the cost functions. As EPTs transform the cost functions, they may break the property that was making local consistency enforcement tractable on a global cost function. A global cost function is called tractable projection-safe when applying an EPT to it is tractable and does not break the tractability property. In this paper, we prove that depending on the size r of the smallest scopes used for performing EPTs, the tractability of global cost functions can be preserved (r = 0) or destroyed (r > 1). When r = 1, the answer is indefinite. We show that on a large family of cost functions, EPTs can be computed via dynamic programming-based algorithms, leading to tractable projection-safety. We also show that when a global cost function can be decomposed into a Berge acyclic network of bounded-arity cost functions, soft local consistencies such as soft Directed or Virtual Arc Consistency can directly emulate dynamic programming. 
These different approaches to decomposable cost functions are then embedded in a solver for extensive experiments that confirm the feasibility and efficiency of our proposal.\nSentiment analysis on user reviews helps to keep track of user reactions towards products and to offer advice to users about what to buy. State-of-the-art review-level sentiment classification techniques can achieve precision above 90%. However, current phrase-level sentiment analysis approaches might only achieve sentiment polarity labelling precision of around 70% to 80%, which is far from satisfactory and restricts their application in many practical tasks. In this paper, we focus on the problem of phrase-level sentiment polarity labelling and attempt to bridge the gap between phrase-level and review-level sentiment analysis. We investigate the inconsistency between the numerical star ratings and the sentiment orientation of textual user reviews. Although they have long been treated as identical, which serves as a basic assumption in previous work, we find that this assumption is not necessarily true. We further propose to leverage the results of review-level sentiment classification to boost the performance of phrase-level polarity labelling using a novel constrained convex optimization framework. Moreover, the framework is capable of integrating various kinds of information sources and heuristics, while giving the globally optimal solution due to its convexity. Experimental results on both English and Chinese reviews show that our framework achieves high labelling precision of up to 89%, which is a significant improvement over current approaches.\nClassical collaborative filtering and content-based filtering methods try to learn a static recommendation model given training data. These approaches are far from ideal in highly dynamic recommendation domains such as news recommendation and computational advertisement, where the set of items and users is very fluid. 
In this work, we investigate an adaptive clustering technique for content recommendation based on exploration-exploitation strategies in contextual multi-armed bandit settings. Our algorithm takes into account the collaborative effects that arise due to the interaction of the users with the items, by dynamically grouping users based on the items under consideration and, at the same time, grouping items based on the similarity of the clusterings induced over the users. The resulting algorithm thus takes advantage of preference patterns in the data in a way akin to collaborative filtering methods. We provide an empirical analysis on medium-size real-world datasets, showing scalability and increased prediction performance (as measured by click-through rate) over state-of-the-art methods for clustering bandits. We also provide a regret analysis within a standard linear stochastic noise setting.\nMultiple hypothesis testing is a significant problem in nearly all neuroimaging studies. In order to correct for this phenomenon, we require a reliable estimate of the Family-Wise Error Rate (FWER). The well-known Bonferroni correction method, while simple to implement, is quite conservative, and can substantially under-power a study because it ignores dependencies between test statistics. Permutation testing, on the other hand, is an exact, non-parametric method of estimating the FWER for a given $\\alpha$-threshold, but for acceptably low thresholds the computational burden can be prohibitive. In this paper, we show that permutation testing in fact amounts to populating the columns of a very large matrix ${\\bf P}$. By analyzing the spectrum of this matrix, under certain conditions, we see that ${\\bf P}$ has a low-rank plus low-variance residual decomposition, which makes it suitable for highly sub-sampled (on the order of $0.5\\%$) matrix completion methods. 
Based on this observation, we propose a novel permutation testing methodology that offers a large speedup without sacrificing the fidelity of the estimated FWER. Our evaluations on four different neuroimaging datasets show that a computational speedup factor of roughly $50\\times$ can be achieved while recovering the FWER distribution to very high accuracy. Further, we show that the estimated $\\alpha$-threshold is also recovered faithfully, and is stable.\nThe proliferation of heterogeneous data sources of semantic knowledge bases intensifies the need for an automatic instance matching technique. However, the efficiency of instance matching is often influenced by the weight of a property associated with instances. Automatic weight generation is a non-trivial but important task in instance matching. Identifying an appropriate metric for generating property weights automatically is, however, a formidable task. In this paper, we investigate an approach to generating weights automatically based on two hypotheses: (1) the weight of a property is directly proportional to the ratio of the number of its distinct values to the number of instances containing the property, and (2) the weight is also proportional to the ratio of the number of distinct values of a property to the number of instances in a training dataset. The basic intuition behind our approach is the classical theory of information content, according to which infrequent words are more informative than frequent ones. Our mathematical model derives a metric for generating property weights automatically, which is applied in an instance matching system to produce reconciled instances efficiently. Our experiments and evaluations show the effectiveness of our proposed metric of automatic weight generation for properties in an instance matching technique.\nMonaural source separation is important for many real-world applications. 
It is challenging because, with only a single channel of information available and without any constraints, an infinite number of solutions are possible. In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including monaural speech separation, monaural singing voice separation, and speech denoising. The joint optimization of the deep recurrent neural networks with an extra masking layer enforces a reconstruction constraint. Moreover, we explore a discriminative criterion for training neural networks to further enhance the separation performance. We evaluate the proposed system on the TSP, MIR-1K, and TIMIT datasets for speech separation, singing voice separation, and speech denoising tasks, respectively. Our approaches achieve 2.30-4.98 dB SDR gain compared to NMF models in the speech separation task, 2.30-2.48 dB GNSDR gain and 4.32-5.42 dB GSIR gain compared to existing models in the singing voice separation task, and outperform NMF and DNN baselines in the speech denoising task.\nThis book discusses computational curiosity, from the psychology of curiosity to the computational models of curiosity, and then showcases several interesting applications of computational curiosity. A brief overview of the book is given as follows. Chapter 1 discusses the underpinnings of curiosity in human beings, including the major categories of curiosity, curiosity-related emotions and behaviors, and the benefits of curiosity. Chapter 2 reviews the arousal theories of curiosity in psychology and summarizes a general two-step process model for computational curiosity. Based on the perspective of the two-step process model, Chapter 3 reviews and analyzes some of the traditional computational models of curiosity. Chapter 4 introduces a novel generic computational model of curiosity, which is developed based on the arousal theories of curiosity. 
After the discussion of computational models of curiosity, in Chapter 5 we outline the important applications where computational curiosity may have significant impact. Chapter 6 discusses the application of the generic computational model of curiosity in a machine learning framework. Chapter 7 discusses the application of the generic computational model of curiosity in a recommender system. In Chapters 8 and 9, the generic computational model of curiosity is studied in two types of pedagogical agents. In Chapter 8, a curious peer learner is studied. It is a non-player character that aims to provide a believable virtual learning environment for users. In Chapter 9, a curious learning companion is studied. It aims to enhance users' learning experience by providing meaningful interactions with them. Chapter 10 discusses open questions in the research field of computational curiosity.\nRecent years have seen the development of methods for multiagent planning under uncertainty that scale to tens or even hundreds of agents. However, most of these methods either make restrictive assumptions on the problem domain, or provide approximate solutions without any guarantees on quality. Methods in the former category typically build on heuristic search using upper bounds on the value function. Unfortunately, no techniques exist to compute such upper bounds for problems with non-factored value functions. To allow for meaningful benchmarking through measurable quality guarantees on a very general class of problems, this paper introduces a family of influence-optimistic upper bounds for factored decentralized partially observable Markov decision processes (Dec-POMDPs) that do not have factored value functions. Intuitively, we derive bounds on very large multiagent planning problems by subdividing them into sub-problems and, for each of these sub-problems, making optimistic assumptions with respect to the influence that will be exerted by the rest of the system. 
We numerically compare the different upper bounds and demonstrate how we can achieve a non-trivial guarantee that a heuristic solution for problems with hundreds of agents is close to optimal. Furthermore, we provide evidence that the upper bounds may improve the effectiveness of heuristic influence search, and discuss further potential applications to multiagent planning.\nWe design mechanisms for online procurement of data held by strategic agents for machine learning tasks. The challenge is to use past data to actively price future data and give learning guarantees even when an agent's cost for revealing her data may depend arbitrarily on the data itself. We achieve this goal by showing how to convert a large class of no-regret algorithms into online posted-price and learning mechanisms. Our results in a sense parallel classic sample complexity guarantees, but with the key resource being money rather than quantity of data: with a budget constraint $B$, we give robust risk (predictive error) bounds on the order of $1/\\sqrt{B}$. Because we use an active approach, we can often guarantee significantly better performance by leveraging correlations between costs and data. Our algorithms and analysis go through a model of no-regret learning with $T$ arriving pairs (cost, data) and a budget constraint of $B$. Our regret bounds for this model are on the order of $T/\\sqrt{B}$, and we give lower bounds of the same order.\nAutomatic reconstruction of 3D models from images using multi-view Structure-from-Motion methods has been one of the most fruitful outcomes of computer vision. These advances, combined with the growing popularity of Micro Aerial Vehicles as an autonomous imaging platform, have made 3D vision tools ubiquitous for a large number of Architecture, Engineering and Construction applications among audiences mostly unskilled in computer vision. 
However, to obtain high-resolution, accurate reconstructions of a large-scale object using SfM, there are many critical constraints on the quality of image data, which often become sources of inaccuracy because current 3D reconstruction pipelines do not help users determine the fidelity of input data during image acquisition. In this paper, we present and advocate a closed-loop interactive approach that performs incremental reconstruction in real time and gives users online feedback about quality parameters such as Ground Sampling Distance (GSD), image redundancy, etc., on a surface mesh. We also propose a novel multi-scale camera network design to prevent scene drift caused by incremental map building, and release the first multi-scale image sequence dataset as a benchmark. Further, we evaluate our system on real outdoor scenes, and show that our interactive pipeline combined with a multi-scale camera network approach provides compelling accuracy in multi-view reconstruction tasks when compared against the state-of-the-art methods.\nRecent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application for video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description. In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions. First, our approach incorporates a spatial temporal 3-D convolutional neural network (3-D CNN) representation of the short temporal dynamics. The 3-D CNN representation is trained on video action recognition tasks, so as to produce a representation that is tuned to human motion and behavior. 
Second, we propose a temporal attention mechanism that allows the model to go beyond local temporal modeling and learns to automatically select the most relevant temporal segments given the text-generating RNN. Our approach exceeds the current state of the art for both BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on a new, larger, and more challenging dataset of paired video and natural language descriptions.\nRecursive neural models, which use syntactic parse trees to recursively generate representations bottom-up, are a popular architecture. But there have not been rigorous evaluations showing for exactly which tasks this syntax-based method is appropriate. In this paper, we benchmark {\\bf recursive} neural models against sequential {\\bf recurrent} neural models (simple recurrent and LSTM models), enforcing apples-to-apples comparison as much as possible. We investigate 4 tasks: (1) sentiment classification at the sentence level and phrase level; (2) matching questions to answer-phrases; (3) discourse parsing; (4) semantic relation extraction (e.g., {\\em component-whole} between nouns). Our goal is to better understand when, and why, recursive models can outperform simpler models. We find that recursive models help mainly on tasks (like semantic relation extraction) that require associating headwords across a long distance, particularly on very long sequences. We then introduce a method for allowing recurrent models to achieve similar performance: breaking long sentences into clause-like units at punctuation and processing them separately before combining. Our results thus help us understand the limitations of both classes of models, and suggest directions for improving recurrent models.\nThe global influence of Big Data is not only growing but seemingly endless. The trend is leaning towards knowledge that is attained easily and quickly from massive pools of Big Data. Today we are living in the technological world that Dr. 
Usama Fayyad and his distinguished research fellows predicted nearly two decades ago in their introductory explanations of Knowledge Discovery in Databases (KDD). Indeed, they were precise in their outlook on Big Data analytics. In fact, the continued improvement of the interoperability of machine learning, statistics, and database building and querying fused to create this increasingly popular science: Data Mining and Knowledge Discovery. Next-generation computational theories are geared towards helping to extract insightful knowledge from even larger volumes of data at higher rates of speed. As the trend grows in popularity, a highly adaptive solution for knowledge discovery will be necessary. In this research paper, we introduce the investigation and development of 23 bit-questions for a Metaknowledge template for Big Data processing and clustering purposes. This research aims to demonstrate the construction of this methodology and to prove its validity and the benefits it brings to Knowledge Discovery from Big Data.\nThis article provides a thorough meta-analysis of the anomaly detection problem. To accomplish this, we first identify approaches to benchmarking anomaly detection algorithms across the literature and produce a large corpus of anomaly detection benchmarks that vary in their construction across several dimensions we deem important to real-world applications: (a) point difficulty, (b) relative frequency of anomalies, (c) clusteredness of anomalies, and (d) relevance of features. We apply a representative set of anomaly detection algorithms to this corpus, yielding a very large collection of experimental results. We analyze these results to understand many phenomena observed in previous work. First, we observe the effects of experimental design on experimental results. Second, results are evaluated with two metrics, ROC Area Under the Curve and Average Precision. 
We employ statistical hypothesis testing to demonstrate the value (or lack thereof) of our benchmarks. We then offer several approaches to summarizing our experimental results, drawing several conclusions about the impact of our methodology as well as the strengths and weaknesses of some algorithms. Last, we compare results against a trivial solution as an alternative means of normalizing the reported performance of algorithms. The intended contributions of this article are many; in addition to providing a large publicly available corpus of anomaly detection benchmarks, we provide an ontology for describing anomaly detection contexts, a methodology for controlling various aspects of benchmark creation, guidelines for future experimental design and a discussion of the many potential pitfalls of trying to measure success in this field.\nGiven a hierarchical plan (or schedule) with uncertain task times, we propose a deterministic polynomial (time and memory) algorithm for estimating the probability that it meets a deadline, or, alternatively, that its makespan is less than a given duration. Approximation is needed as it is known that this problem is NP-hard even for sequential plans (just a sum of random variables). In addition, we show two new complexity results: (1) counting the number of events that do not cross the deadline is #P-hard; (2) computing the expected makespan of a hierarchical plan is NP-hard. For the proposed approximation algorithm, we establish formal approximation bounds and show that the time and memory complexities grow polynomially with the required accuracy, the number of nodes in the plan, and the size of the support of the random variables that represent the durations of the primitive tasks. We examine these approximation bounds empirically and demonstrate, using task networks taken from the literature, how our scheme outperforms sampling techniques and exact computation in terms of accuracy and run-time. 
As the empirical data show much better error bounds than guaranteed, we also suggest a method for tightening the bounds in some cases.\nPoker is one of the most popular card games, whose rational investigation also represents one of the major challenges in several scientific areas, spanning from information theory and artificial intelligence to game theory and statistical physics. In principle, several variants of Poker can be identified, although all of them make use of money to make the challenge meaningful and, moreover, can be played in two different formats: tournament and cash game. An important issue when dealing with Poker is its classification, i.e., as a `skill game' or as gambling. Nowadays, its classification still represents an open question, having a long list of implications (e.g., legal and healthcare) that vary from country to country. In this study, we analyze Poker challenges, considering the cash game format, in terms of thermodynamic systems. Notably, we propose a framework to represent a cash game Poker challenge that, although based on a simplified scenario, allows us both to obtain useful information for rounders (i.e., Poker players) and to evaluate the role of the Poker room in this context. Finally, starting from a model based on thermodynamics, we show the evolution of a Poker challenge, making a direct connection with the probability theory underlying its dynamics and finding that, even if we consider these games `skill games', taking a real profit from Poker is really hard.\nConjunctive database queries have been extended with a mechanism for object creation to capture important applications such as data exchange, data integration, and ontology-based data access. Object creation generates new object identifiers in the result that do not belong to the set of constants in the source database. The new object identifiers can also be seen as Skolem terms. 
Hence, object-creating conjunctive queries can also be regarded as restricted second-order tuple-generating dependencies (SO tgds), considered in the data exchange literature.   In this paper, we focus on the class of single-function object-creating conjunctive queries, or sifo CQs for short. We give a new characterization for oid-equivalence of sifo CQs that is simpler than the one given by Hull and Yoshikawa and places the problem in the complexity class NP. Our characterization is based on Cohen's equivalence notions for conjunctive queries with multiplicities. We also solve the logical entailment problem for sifo CQs, showing that also this problem belongs to NP. Results by Pichler et al. have shown that logical equivalence for more general classes of SO tgds is either undecidable or decidable with as yet unknown complexity upper bounds.\nMotivated by online settings where users can provide explicit feedback about the relevance of products that are sequentially presented to them, we look at the recommendation process as a problem of dynamically optimizing this relevance feedback. Such an algorithm optimizes the fine tradeoff between presenting the products that are most likely to be relevant, and learning the preferences of the user so that more relevant recommendations can be made in the future.   We assume a standard predictive model inspired by collaborative filtering, in which a user is sampled from a distribution over a set of possible types. For every product category, each type has an associated relevance feedback that is assumed to be binary: the category is either relevant or irrelevant. Assuming that the user stays for each additional recommendation opportunity with probability $\\beta$ independent of the past, the problem is to find a policy that maximizes the expected number of recommendations that are deemed relevant in a session.   We analyze this problem and prove key structural properties of the optimal policy. 
Based on these properties, we first present an algorithm that strikes a balance between recursion and dynamic programming to compute this policy. We further propose and analyze two heuristic policies: a `farsighted' greedy policy that attains at least a $1-\\beta$ factor of the optimal payoff, and a naive greedy policy that attains at least a $\\frac{1-\\beta}{1+\\beta}$ factor of the optimal payoff in the worst case. Extensive simulations show that these heuristics are very close to optimal in practice.\nThis paper investigates the effectiveness of factorial speech processing models in noise-robust automatic speech recognition tasks. For this purpose, the paper proposes an idealistic approach for modeling the state-conditional observation distribution of factorial models based on weighted stereo samples. This approach extends previous single-pass retraining for ideal model compensation to support multiple audio sources. Non-stationary noises can be considered as one of these audio sources with multiple states. Experiments on set A of the Aurora 2 dataset show that recognition performance can be improved by this consideration. The improvement is significant in low signal-to-noise energy conditions, reaching up to 4% absolute word recognition accuracy. In addition to accurately representing the state-conditional observation distribution, the proposed method has an important advantage over previous methods: it allows feature spaces to be selected independently for source and corrupted features. This opens a new window for seeking better feature spaces appropriate for noisy speech, independent of clean speech features.\nWe study sequential decision making in environments where rewards are only partially observed, but can be modeled as a function of observed contexts and the action chosen by the decision maker. 
This setting, known as contextual bandits, encompasses a wide variety of applications such as health care, content recommendation and Internet advertising. A central task is evaluation of a new policy given historic data consisting of contexts, actions and received rewards. The key challenge is that the past data typically does not faithfully represent proportions of actions taken by a new policy. Previous approaches rely either on models of rewards or models of the past policy. The former are plagued by a large bias whereas the latter have a large variance. In this work, we leverage the strengths and overcome the weaknesses of the two approaches by applying the doubly robust estimation technique to the problems of policy evaluation and optimization. We prove that this approach yields accurate value estimates when we have either a good (but not necessarily consistent) model of rewards or a good (but not necessarily consistent) model of past policy. Extensive empirical comparison demonstrates that the doubly robust estimation uniformly improves over existing techniques, achieving both lower variance in value estimation and better policies. As such, we expect the doubly robust approach to become common practice in policy evaluation and optimization.\nWe analyse in this paper the data collected in a set of experiments performed on human subjects on the combination of natural concepts. We investigate the mutual influence of conceptual conjunction and negation by measuring the membership weights of a list of exemplars with respect to two concepts, e.g., 'Fruits' and 'Vegetables', and their conjunction 'Fruits And Vegetables', but also their conjunction when one or both concepts are negated, namely, 'Fruits And Not Vegetables', 'Not Fruits And Vegetables' and 'Not Fruits And Not Vegetables'. Our findings sharpen existing analysis on conceptual combinations, revealing systematic and remarkable deviations from classical (fuzzy set) logic and probability theory. 
More importantly, our results provide considerable further evidence for the validity of our quantum-theoretic framework for the combination of two concepts. Indeed, the representation of conceptual negation naturally arises from the general assumptions of our two-sector Fock space model, and this representation faithfully agrees with the collected data. In addition, we find a further significant and a priori unexpected deviation from classicality, which can be exactly explained by assuming that human reasoning is the superposition of an 'emergent reasoning' and a 'logical reasoning', and that these two processes can be successfully represented in a Fock space algebraic structure.\nThe idea of dynamic move chains has been described in a preceding paper [10]. Re-using an earlier piece of search allows the tree to be forward-pruned, which is known to be dangerous, because it can potentially remove new information that would only be realised through a more exhaustive search process. The justification is that the integrity of the position and the small changes between positions make it more likely that an earlier result still applies. Larger problems where exhaustive search is not possible would also benefit from a method that can guess accurately. This paper has added to the forward-pruning technique by using 'move tables' that can act in the same way as Transposition Tables, but for moves rather than positions. The tables use an efficient memory structure, and the design is placed in the context of short- or long-term memories. The long-term memory simply includes rote-learning of other players' games. The forward-pruning technique can also be fortified to help remove some potential errors. Another idea is 'long branches'. This plays a short move sequence before returning to a full search at the resulting leaf nodes. Therefore, with some configuration the dynamic tables can be used reliably and relatively independently of the position. 
This has advanced some of the future work theory of the earlier paper, and made more explicit where logical plans and more knowledge-based approaches might be applied. The author would argue that the process is a very human approach to searching for chess moves.\nDimension reduction is often needed in the area of data mining. The goal of these methods is to map the given high-dimensional data into a low-dimensional space while preserving certain properties of the initial data. There are two kinds of techniques for this purpose. The first, projective methods, build an explicit linear projection from the high-dimensional space to the low-dimensional one. The second, nonlinear methods, utilize a nonlinear and implicit mapping between the two spaces. In both cases, the methods considered in the literature have usually relied on computationally very intensive matrix factorizations, frequently the Singular Value Decomposition (SVD). The computational burden of SVD quickly renders these dimension reduction methods infeasible due to the ever-increasing sizes of practical datasets.   In this paper, we present a new decomposition strategy, Reduced Basis Decomposition (RBD), which is inspired by the Reduced Basis Method (RBM). Given the high-dimensional data $X$, the method approximates it by $Y \\, T (\\approx X)$ with $Y$ being the low-dimensional surrogate and $T$ the transformation matrix. $Y$ is obtained through a greedy algorithm and is thus extremely efficient. In fact, it is significantly faster than SVD with comparable accuracy. $T$ can be computed on the fly. Moreover, unlike many compression algorithms, it easily finds the mapping for an arbitrary ``out-of-sample'' vector and it comes with an ``error indicator'' certifying the accuracy of the compression. 
Numerical results validating these claims are shown.\nDeep Convolutional Neural Networks (CNNs) have demonstrated excellent performance in image classification, but still show room for improvement in object-detection tasks with many categories, in particular for cluttered scenes and occlusion. Modern detection algorithms like Regions with CNNs (Girshick et al., 2014) rely on Selective Search (Uijlings et al., 2013) to propose regions which with high probability represent objects, where in turn CNNs are deployed for classification. Selective Search represents a family of sophisticated algorithms that are engineered with multiple segmentation, appearance and saliency cues, typically coming with a significant run-time overhead. Furthermore, Hosang et al. (2014) have shown that most methods suffer from low reproducibility due to unstable superpixels, even for slight image perturbations. Although CNNs are subsequently used for classification in top-performing object-detection pipelines, current proposal methods are agnostic to how these models parse objects and their rich learned representations. As a result, they may propose regions that do not resemble high-level objects, or miss some objects entirely. To overcome these drawbacks we propose a boosting approach which directly takes advantage of hierarchical CNN features to detect regions of interest quickly. We demonstrate its performance on the ImageNet 2013 detection benchmark and compare it with state-of-the-art methods.\nSearching through a large volume of data is critical for companies, scientists, and search engine applications because of its time and memory complexity. In this paper, a new technique for generating a FuzzyFind Dictionary for text mining is introduced. We simply map the English alphabet into a 23-bit FuzzyFind Dictionary, or into more than 23 bits by using additional FuzzyFind Dictionaries, reflecting the presence or absence of particular letters. 
This representation preserves the closeness of word distortions as closeness of the created binary vectors, within a Hamming distance of 2. This paper discusses the Golay Coding Transformation Hash Table and how it can be applied to a FuzzyFind Dictionary as a new technology for searching through big data. The method has linear time complexity for generating the dictionary and constant time complexity for accessing the data; updating with new data sets takes time linear in the number of new data points. This technique is based on searching only for English letters, with each segment having 23 bits; it can also use more than 23 bits and work with more segments as a reference table.\nIn recent years, poker has gained a lot of prestige in several countries and, besides being one of the most famous card games, it represents a modern challenge for scientists belonging to different communities, spanning from artificial intelligence to physics and from psychology to mathematics. Unlike for games like chess, classifying the nature of poker (i.e., as a 'skill game' or as gambling) seems really hard, and it also constitutes a current problem whose solution has several implications. In general, gambling offers equal winning probabilities both to rational players (i.e., those that use a strategy) and to irrational ones (i.e., those without a strategy). Therefore, in order to uncover the nature of poker, a viable way is to compare the performances of rational versus irrational players during a series of challenges. Recently, a study on this topic revealed that rationality is a fundamental ingredient for success in poker tournaments. In this study we analyze a simple model of poker challenges by a statistical physics approach, with the aim of uncovering the nature of this game. As the main result, we found that, under particular conditions, a few irrational players can turn poker into gambling. 
Therefore, although rationality is a key ingredient for success in poker, the format of the challenges also plays an important role in these dynamics, as it can strongly influence the underlying nature of the game. The importance of our results lies in their implications, for instance in identifying the limits within which poker can be considered a `skill game' and, as a consequence, which kind of format must be chosen to devise algorithms able to face humans.\nWe present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by `controlled' Markov noise. In particular, both the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time-scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Finally, using our results, we present a solution to the off-policy convergence problem for temporal difference learning with linear function approximation.\nThis paper investigates distributed cooperative learning algorithms for data processing in a network setting. Specifically, the extreme learning machine (ELM) is introduced to train a set of data distributed across several components, and each component runs a program on a subset of the entire data. In this scheme, there is no requirement for a fusion center in the network due to, e.g., practical limitations, security, or privacy reasons. We first reformulate the centralized ELM training problem into a separable form among nodes with consensus constraints. Then, we solve the equivalent problem using distributed optimization tools. A new distributed cooperative learning algorithm based on ELM, called DC-ELM, is proposed. 
The architecture of this algorithm differs from that of some existing parallel/distributed ELMs based on MapReduce or cloud computing. We also present an online version of the proposed algorithm that can learn data sequentially in a one-by-one or chunk-by-chunk mode. The novel algorithm is well suited for potential applications such as artificial intelligence, computational biology, finance, wireless sensor networks, and so on, involving datasets that are often extremely large, high-dimensional and located on distributed data sources. We show simulation results on both synthetic and real-world data sets.\nThe traditional way of storing facts in triplets ({\\it head\\_entity, relation, tail\\_entity}), abbreviated as ({\\it h, r, t}), makes knowledge intuitively displayable and easily acquired by humans, but hard for AI machines to compute with or reason about. Inspired by the success of applying {\\it Distributed Representations} to AI-related fields, recent studies aim to represent each entity and relation with a unique low-dimensional embedding, which is different from the symbolic and atomic framework of displaying knowledge in triplets. In this way, knowledge computing and reasoning can be essentially facilitated by means of a simple {\\it vector calculation}, i.e. ${\\bf h} + {\\bf r} \\approx {\\bf t}$. We thus contribute an effective model to learn better embeddings satisfying the formula by pulling the positive tail entities ${\\bf t^{+}}$ together and close to ${\\bf h} + {\\bf r}$ ({\\it Nearest Neighbor}), and simultaneously pushing the negatives ${\\bf t^{-}}$ away from the positives ${\\bf t^{+}}$ by keeping a {\\it Large Margin}. We also design a corresponding learning algorithm to efficiently find the optimal solution based on {\\it Stochastic Gradient Descent} in an iterative fashion. 
Quantitative experiments illustrate that our approach can achieve state-of-the-art performance compared with several recent methods on benchmark datasets for two classical applications, i.e., {\\it Link prediction} and {\\it Triplet classification}. Moreover, we analyze the parameter complexities of all the evaluated models, and the analytical results indicate that our model needs fewer computational resources while outperforming the other methods.\nIdentification of falls while performing normal activities of daily living (ADL) is important to ensure personal safety and well-being. However, falling is a short-term activity that occurs infrequently. This poses a challenge to traditional classification algorithms, because there may be very little training data for falls (or none at all). This paper proposes an approach for the identification of falls using a wearable device in the absence of training data for falls but with plentiful data for normal ADL. We propose three `X-Factor' Hidden Markov Model (XHMM) approaches. The XHMMs model unseen falls using \"inflated\" output covariances (observation models). To estimate the inflated covariances, we propose a novel cross validation method to remove \"outliers\" from the normal ADL that serve as proxies for the unseen falls and allow learning the XHMMs using only normal activities. We test the proposed XHMM approaches on two activity recognition datasets and show high detection rates for falls in the absence of fall-specific training data. We show that the traditional method of choosing a threshold based on the maximum of the negative log-likelihood to identify unseen falls is ill-posed for this problem. We also show that supervised classification methods perform poorly when very limited fall data are available during the training phase.\nNowadays, data sets are available in very complex and heterogeneous forms. 
Mining of such data collections is essential to support many real-world applications ranging from healthcare to marketing. In this work, we focus on the analysis of \"complex\" sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of Formal Concept Analysis (FCA) and its extension based on \"pattern structures\". Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures along with projections (i.e., a data reduction of sequential structures), are able to enumerate more meaningful patterns and increase the computing efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analyzing interesting patient patterns from a French healthcare data set on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported in this use case which is the main motivation for this work.   Keywords: data mining; formal concept analysis; pattern structures; projections; sequences; sequential data.\nWe present a computational framework for automatically quantifying verbal and nonverbal behaviors in the context of job interviews. The proposed framework is trained by analyzing the videos of 138 interview sessions with 69 internship-seeking undergraduates at the Massachusetts Institute of Technology (MIT). Our automated analysis includes facial expressions (e.g., smiles, head gestures, facial tracking points), language (e.g., word counts, topic modeling), and prosodic information (e.g., pitch, intonation, and pauses) of the interviewees. The ground truth labels are derived by taking a weighted average over the ratings of 9 independent judges. 
Our framework can automatically predict the ratings for interview traits such as excitement, friendliness, and engagement with correlation coefficients of 0.75 or higher, and can quantify the relative importance of prosody, language, and facial expressions. By analyzing the relative feature weights learned by the regression models, our framework recommends speaking more fluently, using fewer filler words, speaking as \"we\" (vs. \"I\"), using more unique words, and smiling more. We also find that the students who were rated highly while answering the first interview question were also rated highly overall (i.e., first impressions matter). Finally, our MIT Interview dataset will be made available to other researchers to further validate and expand our findings.\nEvolutionary Algorithms (EAs) have been shown to be powerful tools for complex optimization problems, which are ubiquitous in both communication and big data analytics. This paper presents a new EA, namely Negatively Correlated Search (NCS), which maintains multiple individual search processes in parallel and models the search behaviors of individual search processes as probability distributions. NCS explicitly promotes negatively correlated search behaviors by encouraging differences among the probability distributions (search behaviors). By this means, individual search processes share information and cooperate with each other to search diverse regions of a search space, which makes NCS a promising method for non-convex optimization. The cooperation scheme of NCS could also be regarded as a novel diversity preservation scheme that, unlike other existing schemes, directly promotes diversity at the level of search behaviors rather than merely trying to maintain diversity among candidate solutions. Empirical studies showed that NCS is competitive with well-established search methods in the sense that NCS achieved the best overall performance on 20 multimodal (non-convex) continuous optimization problems. 
The advantages of NCS over state-of-the-art approaches are also demonstrated with a case study on the synthesis of unequally spaced linear antenna arrays.\nThe number of completely sequenced chloroplast genomes increases rapidly every day, making it possible to build large-scale phylogenetic trees of plant species. Considering a subset of close plant species defined according to their chloroplasts, the phylogenetic tree that can be inferred from their core genes is not necessarily well supported, due to the possible occurrence of \"problematic\" genes (e.g., due to homoplasy, incomplete lineage sorting, horizontal gene transfers, etc.) which may blur the phylogenetic signal. However, a trustworthy phylogenetic tree can still be obtained if the number of problematic genes is low, the problem being to determine the largest subset of core genes that produces the best-supported tree. To discard problematic genes, and given the overwhelming number of possible combinations, we propose a hybrid approach that embeds both genetic algorithms and statistical tests. Given a set of organisms, the result is a multi-stage pipeline for the production of well-supported phylogenetic trees. The proposal has been applied to several plant families, leading to encouraging results for these families.\nThis paper addresses the task of time-separated aerial image registration. The ability to solve this problem accurately and reliably is important for a variety of subsequent image understanding applications. The principal challenge lies in the extent and nature of transient appearance variation that a land area can undergo, such as that caused by changes in illumination conditions, seasonal variations, or occlusion by non-persistent objects (people, cars). 
Our work introduces several novelties: (i) unlike all previous work on aerial image registration, we approach the problem using a set-based paradigm; (ii) we show how local, pair-wise constraints can be used to enforce a globally good registration using a constraints graph structure; (iii) we show how a simple holistic representation derived from raw aerial images can be used as a basic building block of the constraints graph in a manner which achieves both high registration accuracy and speed. We demonstrate: (i) that the proposed method already outperforms the state of the art for pair-wise registration, achieving greater accuracy and reliability, while at the same time reducing the computational cost of the task; and (ii) that increasing the number of available images in a set consistently reduces the average registration error.\nIn the classic AGM belief revision theory, beliefs are static and do not change their own shape. For instance, if p is accepted by a rational agent, it will remain p to the agent. But this rarely happens to us. Often, when we accept some information p, what is actually accepted is not the whole p, but only a portion of it; not necessarily because we select the portion but because p must be perceived. Only the perceived p is accepted; and the perception is subject to what we already believe (know). What may, however, happen to the rest of p that initially escaped our attention? In this work we argue that the invisible part is also accepted by the agent, if only unconsciously. Hence some parts of p are accepted as visible beliefs, while other parts are accepted as latent beliefs. The division is not static. As the set of beliefs changes, what was hidden may become visible. We present a perception-based belief theory that incorporates latent beliefs.\nTime series forecasting is an important predictive methodology which can be applied to a wide range of problems. 
In particular, forecasting the indoor temperature permits improved utilization of the HVAC (Heating, Ventilating and Air Conditioning) systems in a home and thus better energy efficiency. With this purpose, the paper describes how to implement an Artificial Neural Network (ANN) algorithm in a low-cost system-on-chip to develop an autonomous intelligent wireless sensor network. The present paper uses a Wireless Sensor Network (WSN) to monitor and forecast the indoor temperature in a smart home, based on low-resource, low-cost microcontroller technology such as the 8051MCU. An on-line learning approach, based on the Back-Propagation (BP) algorithm for ANNs, has been developed for real-time time series learning. It trains the model with every new data point that arrives at the system, without saving enormous quantities of data to create a historical database as usual, i.e., without prior knowledge. Consequently, to validate the approach, a simulation study using a Bayesian baseline model has been carried out and compared against a database from a real application to assess performance and accuracy. The core of the paper is a new algorithm, based on the BP one, which has been described in detail, and the challenge was how to implement a computationally demanding algorithm in a simple architecture with very few hardware resources.\nConstraint programming is a family of techniques for solving combinatorial problems, where the problem is modelled as a set of decision variables (typically with finite domains) and a set of constraints that express relations among the decision variables. One key concept in constraint programming is propagation: reasoning on a constraint or set of constraints to derive new facts, typically to remove values from the domains of decision variables. Specialised propagation algorithms (propagators) exist for many classes of constraints.   The concept of support is pervasive in the design of propagators. 
Traditionally, when a domain value ceases to have support, it may be removed because it takes part in no solutions. Arc-consistency algorithms such as AC2001 make use of support in the form of a single domain value. GAC algorithms such as GAC-Schema use a tuple of values to support each literal. We generalize these notions of support in two ways. First, we allow a set of tuples to act as support. Second, the supported object is generalized from a set of literals (GAC-Schema) to an entire constraint or any part of it.   We design a methodology for developing correct propagators using generalized support. A constraint is expressed as a family of support properties, which may be proven correct against the formal semantics of the constraint. Using Curry-Howard isomorphism to interpret constructive proofs as programs, we show how to derive correct propagators from the constructive proofs of the support properties. The framework is carefully designed to allow efficient algorithms to be produced. Derived algorithms may make use of dynamic literal triggers or watched literals for efficiency. Finally, two case studies of deriving efficient algorithms are given.\nSocial media is becoming an increasingly important source of information to complement traditional pharmacovigilance methods. In order to identify signals of potential adverse drug reactions, it is necessary to first identify medical concepts in the social media text. Most of the existing studies use dictionary-based methods which are not evaluated independently from the overall signal detection task.   We compare different approaches to automatically identify and normalise medical concepts in consumer reviews in medical forums. Specifically, we implement several dictionary-based methods popular in the relevant literature, as well as a method we suggest based on a state-of-the-art machine learning method for entity recognition. MetaMap, a popular biomedical concept extraction tool, is used as a baseline. 
Our evaluations were performed in a controlled setting on a common corpus which is a collection of medical forum posts annotated with concepts and linked to controlled vocabularies such as MedDRA and SNOMED CT.   To our knowledge, our study is the first to systematically examine the effect of popular concept extraction methods in the area of signal detection for adverse reactions. We show that the choice of algorithm or controlled vocabulary has a significant impact on concept extraction, which will impact the overall signal detection process. We also show that our proposed machine learning approach significantly outperforms all the other methods in identification of both adverse reactions and drugs, even when trained with a relatively small set of annotated text.\nIn this paper, we present a probabilistic framework for goal-driven spoken dialog systems. A new dynamic stochastic state (DS-state) is then defined to characterize the goal set of a dialog state at different stages of the dialog process. Furthermore, an entropy minimization dialog management (EMDM) strategy is also proposed to combine with the DS-states to facilitate a robust and efficient solution in reaching a user's goals. A Song-On-Demand task, with a total of 38117 songs and 12 attributes corresponding to each song, is used to test the performance of the proposed approach. In an ideal simulation, assuming no errors, the EMDM strategy is the most efficient goal-seeking method among all tested approaches, returning the correct song within 3.3 dialog turns on average. 
Furthermore, in a practical scenario, with the top five candidates to handle the unavoidable automatic speech recognition (ASR) and natural language understanding (NLU) errors, the results show that only 61.7\\% of the dialog goals can be successfully obtained in 6.23 dialog turns on average when random questions are asked by the system, whereas if the proposed DS-states are updated with the top 5 candidates from the SLU output using the proposed EMDM strategy executed at every DS-state, then an 86.7\\% dialog success rate can be accomplished effectively within 5.17 dialog turns on average. We also demonstrate that entropy-based DM strategies are more efficient than non-entropy based DM. Moreover, using the goal set distributions in EMDM, the results are better than those without them, such as in state-of-the-art database summary DM.\nWe consider existential rules (aka Datalog+) as a formalism for specifying ontologies. In recent years, many classes of existential rules have been exhibited for which conjunctive query (CQ) entailment is decidable. However, most of these classes cannot express transitivity of binary relations, a frequently used modelling construct. In this paper, we address the issue of whether transitivity can be safely combined with decidable classes of existential rules.   First, we prove that transitivity is incompatible with one of the simplest decidable classes, namely aGRD (acyclic graph of rule dependencies), which clarifies the landscape of `finite expansion sets' of rules.   Second, we show that transitivity can be safely added to linear rules (a subclass of guarded rules, which generalizes the description logic DL-Lite-R) in the case of atomic CQs, and also for general CQs if we place a minor syntactic restriction on the rule set. This is shown by means of a novel query rewriting algorithm that is specially tailored to handle transitivity rules.   
Third, for the identified decidable cases, we pinpoint the combined and data complexities of query entailment.\nParticle Swarm Optimization (PSO) is a nature-inspired meta-heuristic for solving continuous optimization problems. In the literature, the potential of the particles of the swarm has been used to show that slightly modified PSO guarantees convergence to local optima. Here we show that under specific circumstances the unmodified PSO, even with swarm parameters known (from the literature) to be good, almost surely does not yield convergence to a local optimum. This undesirable phenomenon is called stagnation. For this purpose, the particles' potential in each dimension is analyzed mathematically. Additionally, some reasonable assumptions on the behavior of the particles' potential are made. Depending on the objective function and, interestingly, the number of particles, the potential in some dimensions may decrease much faster than in other dimensions. Therefore, these dimensions lose relevance, i.e., the contribution of their entries to the decisions about attractor updates becomes insignificant and, with positive probability, they never regain relevance. If Brownian Motion is assumed to be an approximation of the time-dependent drop of potential, practical, i.e., large values for this probability are calculated. Finally, on chosen multidimensional polynomials of degree two, experiments are provided showing that the required circumstances occur quite frequently. Furthermore, experiments are provided showing that even when the very simple sphere function is processed, the described stagnation phenomenon occurs. Consequently, unmodified PSO does not converge to any local optimum of the chosen functions for the tested parameter settings.\nThe Coalitional Manipulation (CM) problem has been studied extensively in the literature for many voting rules. 
The CM problem, however, has been studied only in the complete information setting, that is, when the manipulators know the votes of the non-manipulators. A more realistic scenario is an incomplete information setting where the manipulators do not know the exact votes of the non-manipulators but may have some partial knowledge of the votes. In this paper, we study a setting where the manipulators know a partial order for each voter that is consistent with the vote of that voter. In this setting, we introduce and study two natural computational problems: (1) the Weak Manipulation (WM) problem, where the manipulators wish to vote in a way that makes their preferred candidate win in at least one extension of the partial votes of the non-manipulators; (2) the Strong Manipulation (SM) problem, where the manipulators wish to vote in a way that makes their preferred candidate win in all possible extensions of the partial votes of the non-manipulators. We study the computational complexity of the WM and the SM problems for commonly used voting rules such as plurality, veto, k-approval, k-veto, maximin, Copeland, and Bucklin. Our key finding is that, barring a few exceptions, manipulation becomes a significantly harder problem in the setting of incomplete votes.\nSuppose there is a large collection of items, each with an associated cost and an inherent utility that is revealed only once we commit to selecting it. Given a budget on the cumulative cost of the selected items, how can we pick a subset of maximal value? This task generalizes several important problems such as multi-arm bandits, active search and the knapsack problem. We present an algorithm, GP-Select, which utilizes prior knowledge about similarity between items, expressed as a kernel function. GP-Select uses Gaussian process prediction to balance exploration (estimating the unknown value of items) and exploitation (selecting items of high value). 
We extend GP-Select to be able to discover sets that simultaneously have high utility and are diverse. Our preference for diversity can be specified as an arbitrary monotone submodular function that quantifies the diminishing returns obtained when selecting similar items. Furthermore, we exploit the structure of the model updates to achieve an order of magnitude (up to 40X) speedup in our experiments without resorting to approximations. We provide strong guarantees on the performance of GP-Select and apply it to three real-world case studies of industrial relevance: (1) Refreshing a repository of prices in a Global Distribution System for the travel industry, (2) Identifying diverse, binding-affine peptides in a vaccine design task and (3) Maximizing clicks in a web-scale recommender system by recommending items to users.\nWe describe Quizz, a gamified crowdsourcing system that simultaneously assesses the knowledge of users and acquires new knowledge from them. Quizz operates by asking users to complete short quizzes on specific topics; as a user answers the quiz questions, Quizz estimates the user's competence. To acquire new knowledge, Quizz also incorporates questions for which we do not have a known answer; the answers given by competent users provide useful signals for selecting the correct answers for these questions. Quizz actively tries to identify knowledgeable users on the Internet by running advertising campaigns, effectively leveraging the targeting capabilities of existing, publicly available, ad placement services. Quizz quantifies the contributions of the users using information theory and sends feedback to the advertising system about each user. The feedback allows the ad targeting mechanism to further optimize ad placement.   
Our experiments, which involve over ten thousand users, confirm that we can crowdsource knowledge curation for niche and specialized topics, as the advertising network can automatically identify users with the desired expertise and interest in the given topic. We present controlled experiments that examine the effect of various incentive mechanisms, highlighting the need for having short-term rewards as goals, which incentivize the users to contribute. Finally, our cost-quality analysis indicates that the cost of our approach is below that of hiring workers through paid-crowdsourcing platforms, while offering the additional advantage of giving access to billions of potential users all over the planet, and being able to reach users with specialized expertise that is not typically available through existing labor marketplaces.\nA neuromorphic chip that combines CMOS analog spiking neurons and memristive synapses offers a promising solution to brain-inspired computing, as it can provide massive neural network parallelism and density. Previous hybrid analog CMOS-memristor approaches required extensive CMOS circuitry for training, and thus eliminated most of the density advantages gained by the adoption of memristor synapses. Further, they used different waveforms for pre and post-synaptic spikes that added undesirable circuit overhead. Here we describe a hardware architecture that can feature a large number of memristor synapses to learn real-world patterns. We present a versatile CMOS neuron that combines integrate-and-fire behavior, drives passive memristors and implements competitive learning in a compact circuit module, and enables in-situ plasticity in the memristor synapses. We demonstrate handwritten-digits recognition using the proposed architecture using transistor-level circuit simulations. 
As the described neuromorphic architecture is homogeneous, it realizes a fundamental building block for large-scale energy-efficient brain-inspired silicon chips that could lead to next-generation cognitive computing.\nWe deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such uncertainties, arising from the loss of precision induced by numerical calculation with limited time or hardware, are important for much contemporary science and industry. Within applications such as climate science and astrophysics, the need to make decisions on the basis of computations with large and complex data has led to a renewed focus on the management of numerical uncertainty. We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. We then show that the probabilistic view suggests new algorithms that can flexibly be adapted to suit application specifics, while delivering improved empirical performance. We provide concrete illustrations of the benefits of probabilistic numeric algorithms on real scientific problems from astrometry and astronomical imaging, while highlighting open problems with these new algorithms. Finally, we describe how probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (e.g. both numerical optimisers and differential equation solvers), potentially allowing the diagnosis (and control) of error sources in computations.\nThe monolithic approach to policy representation in Markov Decision Processes (MDPs) looks for a single policy that can be represented as a function from states to actions. 
For the monolithic approach to succeed (and this is not always possible), a complex feature representation is often necessary since the policy is a complex object that has to prescribe what actions to take all over the state space. This is especially true in large domains with complicated dynamics. It is also computationally inefficient to both learn and plan in MDPs using a complex monolithic approach. We present a different approach where we restrict the policy space to policies that can be represented as combinations of simpler, parameterized skills---a type of temporally extended action, with a simple policy representation. We introduce Learning Skills via Bootstrapping (LSB) that can use a broad family of Reinforcement Learning (RL) algorithms as a \"black box\" to iteratively learn parametrized skills. Initially, the learned skills are short-sighted but each iteration of the algorithm allows the skills to bootstrap off one another, improving each skill in the process. We prove that this bootstrapping process returns a near-optimal policy. Furthermore, our experiments demonstrate that LSB can solve MDPs that, given the same representational power, could not be solved by a monolithic approach. Thus, planning with learned skills results in better policies without requiring complex policy representations.\nThis paper reveals the tree structure as an intermediate result of clustering by fast search and find of density peaks (DPCLUS), and explores the power of using this tree to perform hierarchical clustering. The array used to hold the index of the nearest higher-density object for each object can be transformed into a Leading Tree (LT), in which each parent node P leads its child nodes to join the same cluster as P itself, and the child nodes are sorted by their gamma values in descending order to accelerate the disconnection of the root in each subtree. 
There are two major advantages of the LT. One is a dramatic reduction in the running time of assigning noncenter data points to their cluster ID, because the assigning process is turned into just disconnecting the links from each center to its parent. The other is that the tree model for representing clusters is more informative, because we can check which objects are more likely to be selected as centers in finer-grained clustering, or which objects reach their center via fewer jumps. Experimental results and analysis show the effectiveness and efficiency of the assigning process with an LT.\nPlace classification is a fundamental ability that a robot should possess to carry out effective human-robot interactions. It is a nontrivial classification problem which has attracted much research. In recent years, Artificial Intelligence algorithms have been heavily exploited in robotics applications. Inspired by the recent successes of deep learning methods, we propose an end-to-end learning approach for the place classification problem. With the deep architectures, this methodology automatically discovers features and contributes in general to higher classification accuracies. The pipeline of our approach is composed of three parts. Firstly, we construct multiple layers of laser range data to represent the environment information in different levels of granularity. Secondly, each layer of data is fed into a deep neural network model for classification, where a graph regularization is imposed on the deep architecture to keep local consistency between adjacent samples. Finally, the predicted labels obtained from all the layers are fused based on confidence trees to maximize the overall confidence. Experimental results validate the effectiveness of our end-to-end place classification framework in which both the multi-layer structure and the graph regularization promote the classification performance. 
Furthermore, results show that the features automatically learned from the raw input range data can achieve competitive results to the features constructed based on statistical and geometrical information.\nOwing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express the most important theme of the document, has been an active area of research and experimentation. On the other hand, word embedding has emerged as a newly favorite research subject because of its excellent performance in many natural language processing (NLP)-related tasks. However, as far as we are aware, there are relatively few studies investigating its use in extractive text or speech summarization. A common thread of leveraging word embeddings in the summarization process is to represent the document (or sentence) by averaging the word embeddings of the words occurring in the document (or sentence). Then, intuitively, the cosine similarity measure can be employed to determine the relevance degree between a pair of representations. Beyond the continued efforts made to improve the representation of words, this paper focuses on building novel and efficient ranking models based on the general word embedding methods for extractive speech summarization. Experimental results demonstrate the effectiveness of our proposed methods, compared to existing state-of-the-art methods.\nDeclarative spatial reasoning denotes the ability to (declaratively) specify and solve real-world problems related to geometric and qualitative spatial representation and reasoning within standard knowledge representation and reasoning (KR) based methods (e.g., logic programming and derivatives). One approach for encoding the semantics of spatial relations within a declarative programming framework is by systems of polynomial constraints. 
However, solving such constraints is computationally intractable in general (i.e. the theory of real-closed fields).   We present a new algorithm, implemented within the declarative spatial reasoning system CLP(QS), that drastically improves the performance of deciding the consistency of spatial constraint graphs over conventional polynomial encodings. We develop pruning strategies founded on spatial symmetries that form equivalence classes (based on affine transformations) at the qualitative spatial level. Moreover, pruning strategies are themselves formalised as knowledge about the properties of space and spatial symmetries. We evaluate our algorithm using a range of benchmarks in the class of contact problems, and proofs in mereology and geometry. The empirical results show that CLP(QS) with knowledge-based spatial pruning outperforms conventional polynomial encodings by orders of magnitude, and can thus be applied to problems that are otherwise unsolvable in practice.\nModern conflict-driven clause-learning (CDCL) Boolean SAT solvers provide efficient automatic analysis of real-world feature models (FM) of systems ranging from cars to operating systems. It is well-known that solver-based analysis of real-world FMs scales very well even though SAT instances obtained from such FMs are large, and the corresponding analysis problems are known to be NP-complete. To better understand why SAT solvers are so effective, we systematically studied many syntactic and semantic characteristics of a representative set of large real-world FMs. We discovered that a key reason why large real-world FMs are easy to analyze is that the vast majority of the variables in these models are unrestricted, i.e., the models are satisfiable for both true and false assignments to such variables under the current partial assignment. 
Given this discovery and our understanding of CDCL SAT solvers, we show that solvers can easily find satisfying assignments for such models without too many backtracks relative to the model size, explaining why solvers scale so well. Further analysis showed that the presence of unrestricted variables in these real-world models can be attributed to their high-degree of variability. Additionally, we experimented with a series of well-known non-backtracking simplifications that are particularly effective in solving FMs. The remaining variables/clauses after simplifications, called the core, are so few that they are easily solved even with backtracking, further strengthening our conclusions.\nThis paper introduces a high-performance hybrid algorithm, called Hybrid Hypervolume Maximization Algorithm (H2MA), for multi-objective optimization that alternates between exploring the decision space and exploiting the already obtained non-dominated solutions. The proposal is centered on maximizing the hypervolume indicator, thus converting the multi-objective problem into a single-objective one. The exploitation employs gradient-based methods, but considering a single candidate efficient solution at a time, to overcome limitations associated with population-based approaches and also to allow an easy control of the number of solutions provided. There is an interchange between two steps. The first step is a deterministic local exploration, endowed with an automatic procedure to detect stagnation. When stagnation is detected, the search is switched to a second step characterized by a stochastic global exploration using an evolutionary algorithm. Using five ZDT benchmarks with 30 variables, the performance of the new algorithm is compared to state-of-the-art algorithms for multi-objective optimization, more specifically NSGA-II, SPEA2, and SMS-EMOA. 
The solutions found by the H2MA lead to higher hypervolume and smaller distance to the true Pareto frontier with significantly fewer function evaluations, even when the gradient is estimated numerically. Furthermore, although only continuous decision spaces have been considered here, discrete decision spaces could also have been treated, replacing gradient-based search by hill-climbing. Finally, a thorough explanation is provided to support the substantial gain in performance that was achieved.\nIt is often desirable to be able to recognize when inputs to a recognition function learned in a supervised manner correspond to classes unseen at training time. With this ability, new class labels could be assigned to these inputs by a human operator, allowing them to be incorporated into the recognition function --- ideally under an efficient incremental update mechanism. While good algorithms that assume inputs from a fixed set of classes exist, e.g., artificial neural networks and kernel machines, it is not immediately obvious how to extend them to perform incremental learning in the presence of unknown query classes. Existing algorithms take little to no distributional information into account when learning recognition functions and lack a strong theoretical foundation. We address this gap by formulating a novel, theoretically sound classifier --- the Extreme Value Machine (EVM). The EVM has a well-grounded interpretation derived from statistical Extreme Value Theory (EVT), and is the first classifier to be able to perform nonlinear kernel-free variable bandwidth incremental learning. Compared to other classifiers in the same deep network derived feature space, the EVM is accurate and efficient on an established benchmark partition of the ImageNet dataset.\nThis paper presents a framework for exact discovery of the top-k sequential patterns under Leverage. 
It combines (1) a novel definition of the expected support for a sequential pattern - a concept on which most interestingness measures directly rely - with (2) SkOPUS: a new branch-and-bound algorithm for the exact discovery of top-k sequential patterns under a given measure of interest. Our interestingness measure employs the partition approach. A pattern is interesting to the extent that it is more frequent than can be explained by assuming independence between any of the pairs of patterns from which it can be composed. The larger the support compared to the expectation under independence, the more interesting is the pattern. We build on these two elements to exactly extract the k sequential patterns with highest leverage, consistent with our definition of expected support. We conduct experiments on both synthetic data with known patterns and real-world datasets; both experiments confirm the consistency and relevance of our approach with regard to the state of the art. This article was published in Data Mining and Knowledge Discovery and is accessible at http://dx.doi.org/10.1007/s10618-016-0467-9.\nProbabilistic graphical models offer a powerful framework to account for the dependence structure between variables, which is represented as a graph. However, the dependence between variables may render inference tasks intractable. In this paper we review techniques exploiting the graph structure for exact inference, borrowed from optimisation and computer science. They are built on the principle of variable elimination whose complexity is dictated in an intricate way by the order in which variables are eliminated. The so-called treewidth of the graph characterises this algorithmic complexity: low-treewidth graphs can be processed efficiently. 
The first message that we illustrate is therefore the idea that for inference in graphical model, the number of variables is not the limiting factor, and it is worth checking for the treewidth before turning to approximate methods. We show how algorithms providing an upper bound of the treewidth can be exploited to derive a 'good' elimination order enabling to perform exact inference. The second message is that when the treewidth is too large, algorithms for approximate inference linked to the principle of variable elimination, such as loopy belief propagation and variational approaches, can lead to accurate results while being much less time consuming than Monte-Carlo approaches. We illustrate the techniques reviewed in this article on benchmarks of inference problems in genetic linkage analysis and computer vision, as well as on hidden variables restoration in coupled Hidden Markov Models.\nWe address the problem of belief revision of logic programs, i.e., how to incorporate to a logic program P a new logic program Q. Based on the structure of SE interpretations, Delgrande et al. adapted the well-known AGM framework to logic program (LP) revision. They identified the rational behavior of LP revision and introduced some specific operators. In this paper, a constructive characterization of all rational LP revision operators is given in terms of orderings over propositional interpretations with some further conditions specific to SE interpretations. It provides an intuitive, complete procedure for the construction of all rational LP revision operators and makes easier the comprehension of their semantic and computational properties. We give a particular consideration to logic programs of very general form, i.e., the generalized logic programs (GLPs). We show that every rational GLP revision operator is derived from a propositional revision operator satisfying the original AGM postulates. 
Interestingly, the further conditions specific to GLP revision are independent from the propositional revision operator on which a GLP revision operator is based. Taking advantage of our characterization result, we embed the GLP revision operators into structures of Boolean lattices, that allow us to bring to light some potential weaknesses in the adapted AGM postulates. To illustrate our claim, we introduce and characterize axiomatically two specific classes of (rational) GLP revision operators which arguably have a drastic behavior. We additionally consider two more restricted forms of logic programs, i.e., the disjunctive logic programs (DLPs) and the normal logic programs (NLPs) and adapt our characterization result to DLP and NLP revision operators.\nChecking software application suitability using automated software tools has become a vital element for most organisations irrespective of whether they produce in-house software or simply customise off-the-shelf software applications for internal use. As software solutions become ever more complex, the industry becomes increasingly dependent on software automation tools, yet the brittle nature of the available software automation tools limits their effectiveness. Companies invest significantly in obtaining and implementing automation software but most of the tools fail to deliver when the cost of maintaining an effective automation test suite exceeds the cost and time that would have otherwise been spent on manual testing. A failing in the current generation of software automation tools is they do not adapt to unexpected modifications and obstructions without frequent (and time expensive) manual interference. Such issues are commonly acknowledged amongst industry practitioners, yet none of the current generation of tools have leveraged the advances in machine learning and artificial intelligence to address these problems.   
This paper proposes a framework solution that utilises machine learning concepts, namely fuzzy matching and error recovery. The suggested solution applies adaptive techniques to recover from unexpected obstructions that would otherwise have prevented the script from proceeding. Recovery details are presented to the user in a report which can be analysed to determine if the recovery procedure was acceptable, and the framework will adapt future runs based on the decisions of the user. Using this framework, a practitioner can run the automated suites without human intervention while minimising the risk of schedule delays.\nPurpose. Radiation therapy is a local treatment aimed at cells in and around a tumor. The goal of this study is to develop an algorithmic solution for predicting the position of a target in 3D in real time, aiming for the short fixed calibration time for each patient at the beginning of the procedure. Accurate predictions of lung tumor motion are expected to improve the precision of radiation treatment by controlling the position of a couch or a beam in order to compensate for respiratory motion during radiation treatment.   Methods. For developing the algorithmic solution, data mining techniques are used. A model form from the family of exponential smoothing is assumed, and the model parameters are fitted by minimizing the absolute disposition error, and the fluctuations of the prediction signal (jitter). The predictive performance is evaluated retrospectively on clinical datasets capturing different behavior (being quiet, talking, laughing), and validated in real-time on a prototype system with respiratory motion imitation.   Results. An algorithmic solution for respiratory motion prediction (called ExSmi) is designed. ExSmi achieves good accuracy of prediction (error $4-9$ mm/s) with acceptable jitter values (5-7 mm/s), as tested on out-of-sample data. 
The datasets, the code for algorithms and the experiments are openly available for research purposes on a dedicated website.   Conclusions. The developed algorithmic solution performs well enough to be prototyped and deployed in applications of radiotherapy.\nPerceptron is a classic online algorithm for learning a classification function. In this paper, we provide a novel extension of the perceptron algorithm to the learning to rank problem in information retrieval. We consider popular listwise performance measures such as Normalized Discounted Cumulative Gain (NDCG) and Average Precision (AP). A modern perspective on perceptron for classification is that it is simply an instance of online gradient descent (OGD), during mistake rounds, using the hinge loss function. Motivated by this interpretation, we propose a novel family of listwise, large margin ranking surrogates. Members of this family can be thought of as analogs of the hinge loss. Exploiting a certain self-bounding property of the proposed family, we provide a guarantee on the cumulative NDCG (or AP) induced loss incurred by our perceptron-like algorithm. We show that, if there exists a perfect oracle ranker which can correctly rank each instance in an online sequence of ranking data, with some margin, the cumulative loss of the perceptron algorithm on that sequence is bounded by a constant, irrespective of the length of the sequence. This result is reminiscent of Novikoff's convergence theorem for the classification perceptron. Moreover, we prove a lower bound on the cumulative loss achievable by any deterministic algorithm, under the assumption of the existence of a perfect oracle ranker. The lower bound shows that our perceptron bound is not tight, and we propose another, \\emph{purely online}, algorithm which achieves the lower bound. 
We provide empirical results on simulated and large commercial datasets to corroborate our theoretical results.\nPresently, a very large number of public and private data sets are available from local governments. In most cases, they are not semantically interoperable and a huge human effort would be needed to create integrated ontologies and knowledge bases for smart cities. Smart City ontology is not yet standardized, and a lot of research work is needed to identify models that can easily support data reconciliation and the management of complexity, and that allow data reasoning. In this paper, a system for data ingestion and reconciliation of smart-city-related aspects such as the road graph, services available on the roads, traffic sensors etc., is proposed. The system can manage a large volume of data coming from a variety of sources, considering both static and dynamic data. These data are mapped to a smart-city ontology, called KM4City (Knowledge Model for City), and stored into an RDF-Store where they are available for applications via SPARQL queries to provide new services to the users via specific applications of public administration and enterprises. The paper presents the process adopted to produce the ontology and the big data architecture for the knowledge base feeding on the basis of open and private data, and the mechanisms adopted for the data verification, reconciliation and validation. Some examples of the possible usage of the coherent big data knowledge base produced are also offered and are accessible from the RDF-Store and related services. The article also presents the work performed on reconciliation algorithms and their comparative assessment and selection.\nThe logic-based machine-understandable framework of the Semantic Web often challenges naive users when they try to query ontology-based knowledge bases. Existing research efforts have approached this problem by introducing Natural Language (NL) interfaces to ontologies. 
These NL interfaces have the ability to construct SPARQL queries based on NL user queries. However, most efforts were restricted to queries expressed in English, and they often benefited from the advancement of English NLP tools. Little research has been done to support querying Arabic content on the Semantic Web by using NL queries. This paper presents a domain-independent approach to translate Arabic NL queries to SPARQL by leveraging linguistic analysis. With special consideration of Noun Phrases (NPs), our approach uses a language parser to extract NPs and the relations from Arabic parse trees and match them to the underlying ontology. It then utilizes knowledge in the ontology to group NPs into triple-based representations. A SPARQL query is finally generated by extracting targets and modifiers, and interpreting them into SPARQL. The interpretation of advanced semantic features including negation, conjunctive and disjunctive modifiers is also supported. The approach was evaluated by using two datasets consisting of OWL test data and queries, and the obtained results have confirmed its feasibility for translating Arabic NL queries to SPARQL.\nLarge knowledge graphs increasingly add value to various applications that require machines to recognize and understand queries and their semantics, as in search or question answering systems. Latent variable models have increasingly gained attention for the statistical modeling of knowledge graphs, showing promising results in tasks related to knowledge graph completion and cleaning. Besides storing facts about the world, schema-based knowledge graphs are backed by rich semantic descriptions of entities and relation-types that allow machines to understand the notion of things and their semantic relationships. In this work, we study how type-constraints can generally support the statistical modeling with latent variable models. 
More precisely, we integrated prior knowledge in the form of type-constraints into various state-of-the-art latent variable approaches. Our experimental results show that prior knowledge on relation-types significantly improves these models, by up to 77% in link-prediction tasks. The achieved improvements are especially prominent when a low model complexity is enforced, a crucial requirement when these models are applied to very large datasets. Unfortunately, type-constraints are neither always available nor always complete; e.g., they can become fuzzy when entities lack proper typing. We show that in these cases, it can be beneficial to apply a local closed-world assumption that approximates the semantics of relation-types based on observations made in the data.\nAlgorithms for hyperparameter optimization abound, all of which work well under different and often unverifiable assumptions. Motivated by the general challenge of sequentially choosing which algorithm to use, we study the more specific task of choosing among distributions to use for random hyperparameter optimization. This work is naturally framed in the extreme bandit setting, which deals with sequentially choosing which distribution from a collection to sample in order to minimize (maximize) the single best cost (reward). Whereas the distributions in the standard bandit setting are primarily characterized by their means, a number of subtleties arise when we care about the minimal cost as opposed to the average cost. For example, there may not be a well-defined \"best\" distribution as there is in the standard bandit setting. The best distribution depends on the rewards that have been obtained and on the remaining time horizon. Whereas in the standard bandit setting, it is sensible to compare policies with an oracle which plays the single best arm, in the extreme bandit setting, there are multiple sensible oracle models. 
We define a sensible notion of \"extreme regret\" in the extreme bandit setting, which parallels the concept of regret in the standard bandit setting. We then prove that no policy can asymptotically achieve no extreme regret.\nThe primary challenge of rocket propulsion is the burden of needing to accelerate the spacecraft's own fuel, resulting in only a logarithmic gain in maximum speed as propellant is added to the spacecraft. Light sails offer an attractive alternative in which fuel is not carried by the spacecraft, with acceleration being provided by an external source of light. By artificially illuminating the spacecraft with beamed radiation, speeds are only limited by the area of the sail, heat resistance of its material, and power use of the accelerating apparatus. In this paper, we show that leakage from a light sail propulsion apparatus in operation around a solar system analogue would be detectable. To demonstrate this, we model the launch and arrival of a microwave beam-driven light sail constructed for transit between planets in orbit around a single star, and find an optimal beam frequency on the order of tens of GHz. Leakage from these beams yields transients with flux densities of Jy and durations of tens of seconds at 100 pc. Because most travel within a planetary system would be conducted between the habitable worlds within that system, multiply-transiting exoplanetary systems offer the greatest chance of detection, especially when the planets are in projected conjunction as viewed from Earth. If interplanetary travel via beam-driven light sails is commonly employed in our galaxy, this activity could be revealed by radio follow-up of nearby transiting exoplanetary systems. 
The expected signal properties define a new strategy in the search for extraterrestrial intelligence (SETI).\nA general tension-reduction (GTR) model was recently considered to derive quantum probabilities as (universal) averages over all possible forms of non-uniform fluctuations, and explain their considerable success in describing experimental situations also outside of the domain of physics, for instance in the ambit of quantum models of cognition and decision. Yet, this result also highlighted the possibility of observing violations of the predictions of the Born rule, in those situations where the averaging would not be large enough, or would be altered because of the combination of multiple measurements. In this article we show that this is indeed the case in typical psychological measurements exhibiting question order effects, by showing that their statistics of outcomes are inherently non-Hilbertian, and require the larger framework of the GTR-model to receive an exact mathematical description. We also consider another unsolved problem of quantum cognition: response replicability. It has been observed that when question order effects and response replicability occur together, the situation can no longer be handled by quantum theory. However, we show that it can be easily and naturally described in the GTR-model. Based on these findings, we motivate the adoption in cognitive science of a hidden-measurements interpretation of the quantum formalism, and of its GTR-model generalization, as the natural interpretational framework explaining the data of psychological measurements on conceptual entities.\nLearning novel concepts and relations from relational databases is an important problem with many applications in database systems and machine learning. Relational learning algorithms learn the definition of a new relation in terms of existing relations in the database. 
Nevertheless, the same data set may be represented under different schemas for various reasons, such as efficiency, data quality, and usability. Unfortunately, the output of current relational learning algorithms tends to vary quite substantially over the choice of schema, both in terms of learning accuracy and efficiency. This variation complicates their off-the-shelf application. In this paper, we introduce and formalize the property of schema independence of relational learning algorithms, and study both the theoretical and empirical dependence of existing algorithms on the common class of (de)composition schema transformations. We study both sample-based learning algorithms, which learn from sets of labeled examples, and query-based algorithms, which learn by asking queries to an oracle. We prove that current relational learning algorithms are generally not schema independent. For query-based learning algorithms we show that the (de)composition transformations influence their query complexity. We propose Castor, a sample-based relational learning algorithm that achieves schema independence by leveraging data dependencies. We support the theoretical results with an empirical study that demonstrates the schema dependence/independence of several algorithms on existing benchmark and real-world datasets under (de)compositions.\nThis paper describes an architecture that combines the complementary strengths of probabilistic graphical models and declarative programming to enable robots to represent and reason with logic-based and probabilistic descriptions of uncertainty and domain knowledge. An action language is extended to support non-boolean fluents and non-deterministic causal laws. This action language is used to describe tightly-coupled transition diagrams at two levels of granularity, refining a coarse-resolution transition diagram of the domain to obtain a fine-resolution transition diagram. 
The coarse-resolution system description, and a history that includes (prioritized) defaults, are translated into an Answer Set Prolog (ASP) program. For any given goal, inference in the ASP program provides a plan of abstract actions. To implement each such abstract action probabilistically, the part of the fine-resolution transition diagram relevant to this action is identified, and a probabilistic representation of the uncertainty in sensing and actuation is included and used to construct a partially observable Markov decision process (POMDP). The policy obtained by solving the POMDP is invoked repeatedly to implement the abstract action as a sequence of concrete actions, with the corresponding observations being recorded in the coarse-resolution history and used for subsequent reasoning. The architecture is evaluated in simulation and on a mobile robot moving objects in an indoor domain, to show that it supports reasoning with violation of defaults, noisy observations and unreliable actions, in complex domains.\nIn this paper, we propose a model-based clustering method (TVClust) that robustly incorporates noisy side information as soft-constraints and aims to seek a consensus between side information and the observed data. Our method is based on a nonparametric Bayesian hierarchical model that combines the probabilistic model for the data instance and the one for the side-information. An efficient Gibbs sampling algorithm is proposed for posterior inference. Using the small-variance asymptotics of our probabilistic model, we then derive a new deterministic clustering algorithm (RDP-means). It can be viewed as an extension of K-means that allows for the inclusion of side information and has the additional property that the number of clusters does not need to be specified a priori. 
Empirical studies have been carried out to compare our work with many constrained clustering algorithms from the literature on both a variety of data sets and under a variety of conditions such as using noisy side information and erroneous k values. Our experiments show strong results for our probabilistic and deterministic approaches under these conditions when compared to other algorithms in the literature.\nConsider a setting where selfish agents are to be assigned to coalitions or projects from a fixed set P. Each project k is characterized by a valuation function; v_k(S) is the value generated by a set S of agents working on project k. We study the following classic problem in this setting: \"how should the agents divide the value that they collectively create?\". One traditional approach in cooperative game theory is to study core stability with the implicit assumption that there are infinite copies of one project, and agents can partition themselves into any number of coalitions. In contrast, we consider a model with a finite number of non-identical projects; this makes computing both high-welfare solutions and core payments highly non-trivial.   The main contribution of this paper is a black-box mechanism that reduces the problem of computing a near-optimal core stable solution to the purely algorithmic problem of welfare maximization; we apply this to compute an approximately core stable solution that extracts one-fourth of the optimal social welfare for the class of subadditive valuations. We also show much stronger results for several popular sub-classes: anonymous, fractionally subadditive, and submodular valuations, as well as provide new approximation algorithms for welfare maximization with anonymous functions. 
Finally, we establish a connection between our setting and the well-studied simultaneous auctions with item bidding; we adapt our results to compute approximate pure Nash equilibria for these auctions.\nThis paper addresses the problem of predicting the k events that are most likely to occur next, over historical real-time event streams. Existing approaches to causal prediction queries have a number of limitations. First, they exhaustively search over an acyclic causal network to find the most likely k effect events; however, data from real event streams frequently reflect cyclic causality. Second, they contain conservative assumptions intended to exclude all possible non-causal links in the causal network; this leads to the omission of many less-frequent but important causal links. We overcome these limitations by proposing a novel event precedence model and a run-time causal inference mechanism. The event precedence model constructs a first order absorbing Markov chain incrementally over event streams, where an edge between two events signifies a temporal precedence relationship between them, which is a necessary condition for causality. Then, the run-time causal inference mechanism learns causal relationships dynamically during query processing. This is done by removing some of the temporal precedence relationships that do not exhibit causality in the presence of other events in the event precedence model. This paper presents two query processing algorithms -- one performs exhaustive search on the model and the other performs a more efficient reduced search with early termination. Experiments using two real datasets (cascading blackouts in power systems and web page views) verify the effectiveness of the probabilistic top-k prediction queries and the efficiency of the algorithms. 
Specifically, the reduced search algorithm reduced runtime, relative to exhaustive search, by 25-80% (depending on the application) with only a small reduction in accuracy.\nThis work proposes a unified heuristic algorithm for a large class of earliness-tardiness (E-T) scheduling problems. We consider single/parallel machine E-T problems that may or may not consider some additional features such as idle time, setup times and release dates. In addition, we also consider those problems whose objective is to minimize either the total (average) weighted completion time or the total (average) weighted flow time, which arise as particular cases when the due dates of all jobs are either set to zero or to their associated release dates, respectively. The developed local search based metaheuristic framework is quite simple, but at the same time relies on sophisticated procedures for efficiently performing local search according to the characteristics of the problem. We present efficient move evaluation approaches for some parallel machine problems that generalize the existing ones for single machine problems. The algorithm was tested on hundreds of instances of several E-T problems and particular cases. The results obtained show that our unified heuristic is capable of producing high quality solutions when compared to the best ones available in the literature that were obtained by specific methods. Moreover, we provide an extensive annotated bibliography on the problems related to those considered in this work, where we not only indicate the approach(es) used in each publication, but we also point out the characteristics of the problem(s) considered. Beyond that, we classify the existing methods in different categories so as to have a better idea of the popularity of each type of solution procedure.\nAn approach for game bot detection in MMORPGs is proposed based on the analysis of game playing behavior. Since MMORPGs are large scale games, users can play in various ways. 
This variety in playing behavior makes it hard to detect game bots based on play behaviors. In order to cope with this problem, the proposed approach observes game playing behaviors of users and groups them by their behavioral similarities. Then, it develops a local bot detection model for each player group. Since the locally optimized models can more accurately detect game bots within each player group, the combination of those models brings about overall improvement. For a practical purpose of reducing the workloads of the game servers in service, the game data is collected at a low resolution in time. Behavioral features are selected and developed to accurately detect game bots with the low resolution data, considering common aspects of MMORPG playing. Through experiments with real data from a game currently in service, it is shown that the proposed local model approach yields more accurate results.\nSuccessful applications of reinforcement learning in real-world problems often require dealing with partially observable states. It is in general very challenging to construct and infer hidden states as they often depend on the agent's entire interaction history and may require substantial domain knowledge. In this work, we investigate a deep-learning approach to learning the representation of states in partially observable tasks, with minimal prior knowledge of the domain. In particular, we propose a new family of hybrid models that combines the strengths of both supervised learning (SL) and reinforcement learning (RL), trained in a joint fashion: The SL component can be a recurrent neural network (RNN) or its long short-term memory (LSTM) version, which is equipped with the desired property of being able to capture long-term dependency on history, thus providing an effective way of learning the representation of hidden states. The RL component is a deep Q-network (DQN) that learns to optimize the control for maximizing long-term rewards. 
Extensive experiments in a direct mailing campaign problem demonstrate the effectiveness and advantages of the proposed approach, which performs the best among a set of previous state-of-the-art methods.\nThis work presents and analyzes three convolutional neural network (CNN) models for efficient pixelwise classification of images. When convolutional neural networks are used to classify single pixels in patches of a whole image, sliding window networks carry out many redundant computations. This set of new architectures solves this issue by either removing redundant computations or using fully convolutional architectures that inherently predict many pixels at once.   The implementations of the three models are accessible through a new utility on top of the Caffe library. The utility provides support for a wide range of image input and output formats, pre-processing parameters and methods to equalize the label histogram during training. The Caffe library has been extended by new layers and a new backend for availability on a wider range of hardware such as CPUs and GPUs through OpenCL.   On AMD GPUs, speedups of $54\\times$ (SK-Net), $437\\times$ (U-Net) and $320\\times$ (USK-Net) have been observed, taking the SK equivalent SW (sliding window) network as the baseline. The label throughput is up to one megapixel per second.   The analyzed neural networks have distinctive characteristics that apply during training or processing, and not every data set is suitable to every architecture. The quality of the predictions is assessed on two neural tissue data sets, of which one is the ISBI 2012 challenge data set. Two different loss functions, Malis loss and Softmax loss, were used during training.   
The whole pipeline, consisting of models, interface and modified Caffe library, is available as Open Source software under the working title Project Greentea.\nStudies in computational neuroscience using functional magnetic resonance imaging (fMRI), and the biologically inspired systems that follow them, state that human action recognition in the mammalian brain involves two distinct pathways, which are specialized for the analysis of motion (optic flow) and form information. Principally, we have defined novel and robust form features by applying an active basis model as the form extractor in the form pathway of the biologically inspired model. An unbalanced synergetic neural network classifies the shapes and structures of human objects, while its attention parameter is tuned by quantum particle swarm optimization (QPSO) initialized via Centroidal Voronoi Tessellations. These tools are used and justified as strong tools for following the biological system model in the form pathway. The final decision, however, is made by combining the outcomes of both pathways via fuzzy inference, which increases the novelty of the proposed model. The two brain pathways are combined by representing each feature set with Gaussian membership functions and applying the fuzzy product inference method. Two configurations have been proposed for the form pathway: the first applies multi-prototype human action templates, using a two-time synergetic neural network to obtain a uniform template for each action; the second abstracts each human action into four key-frames. Experimental results showed promising accuracy on different datasets (KTH and Weizmann).\nThe Singleton Property (SP) approach has been proposed for representing and querying metadata about RDF triples such as provenance, time, location, and evidence. In this approach, one singleton property is created to uniquely represent a relationship in a particular context, and in general, generates a large property hierarchy in the schema. 
It has become the subject of important questions from Semantic Web practitioners. Can an existing reasoner recognize the singleton property triples? And how? If the singleton property triples describe a data triple, then how can a reasoner infer this data triple from the singleton property triples? Or would the large property hierarchy affect the reasoners in some way? We address these questions in this paper and present our study about the reasoning aspects of the singleton properties. We propose a simple mechanism to enable existing reasoners to recognize the singleton property triples, as well as to infer the data triples described by the singleton property triples. We evaluate the effect of the singleton property triples in the reasoning processes by comparing the performance on RDF datasets with and without singleton properties. Our evaluation uses as benchmark the LUBM datasets and the LUBM-SP datasets derived from LUBM with temporal information added through singleton properties.\nIn many human brain network studies, we do not have a sufficient number (n) of images relative to the number (p) of voxels due to the prohibitively expensive cost of scanning enough subjects. Thus, brain network models usually suffer from the small-n large-p problem. Such a problem is often remedied by sparse network models, which are usually solved numerically by optimizing L1-penalties. Unfortunately, due to the computational bottleneck associated with optimizing L1-penalties, it is not practical to apply such methods to construct large-scale brain networks at the voxel-level. In this paper, we propose a new scalable sparse network model using cross-correlations that bypass the computational bottleneck. Our model can build sparse brain networks at the voxel level with p > 25000. 
Instead of using a single sparse parameter that may not be optimal in other studies and datasets, the computational speed gain enables us to analyze the collection of networks at every possible sparse parameter in a coherent mathematical framework via persistent homology. The method is subsequently applied to determine the extent of heritability on a functional brain network at the voxel-level for the first time using twin fMRI.\nSystematic use of the published results of randomized clinical trials is increasingly important in evidence-based medicine. In order to collate and analyze the results from potentially numerous trials, evidence tables are used to represent trials concerning a set of interventions of interest. An evidence table has columns for the patient group, for each of the interventions being compared, for the criterion for the comparison (e.g. proportion who survived after 5 years from treatment), and for each of the results. Currently, it is a labour-intensive activity to read each published paper and extract the information for each field in an evidence table. There have been some NLP studies investigating how some of the features from papers can be extracted, or at least the relevant sentences identified. However, there is a lack of an NLP system for the systematic extraction of each item of information required for an evidence table. We address this need by a combination of a maximum entropy classifier, and integer linear programming. We use the latter to handle constraints on what is an acceptable classification of the features to be extracted. With experimental results, we demonstrate substantial advantages in using global constraints (such as the features describing the patient group, and the interventions, must occur before the features describing the results of the comparison).\nA generic algorithm for the extraction of probabilistic (Bayesian) information about model parameters from data is presented. 
The algorithm propagates an ensemble of particles in the product space of model parameters and outputs. Each particle update consists of a random jump in parameter space followed by a simulation of a model output and a Metropolis acceptance/rejection step based on a comparison of the simulated output to the data. The distance of a particle to the data is interpreted as an energy, and the algorithm reduces the associated temperature of the ensemble such that entropy production is minimized. If this simulated annealing is not too fast compared to the mixing speed in parameter space, the parameter marginal of the ensemble approaches the Bayesian posterior distribution. Annealing is adaptive and depends on certain extensive thermodynamic quantities that can easily be measured throughout run-time. In the general case, we propose annealing with a constant entropy production rate, which is optimal as long as annealing is not too fast. For the practically relevant special case of no prior knowledge, we derive an optimal fast annealing schedule with a non-constant entropy production rate. The algorithm does not require the calculation of the density of the model likelihood, which makes it interesting for Bayesian parameter inference with stochastic models, whose likelihood functions are typically very high-dimensional integrals.\nAn open problem in robotics is that of using vision to identify a robot's own body and the world around it. Many models attempt to recover the traditional C-space parameters. Instead, we propose an alternative C-space by deriving generalized coordinates from $n$ images of the robot. We show that the space of such images is bijective to the motion space, so these images lie on a manifold $\\mathcal{V}$ homeomorphic to the canonical C-space. We now approximate this manifold as a set of $n$ neighbourhood tangent spaces that result in a graph, which we call the Visual Roadmap (VRM). 
Given a new robot image, we perform inverse kinematics visually by interpolating between nearby images in the image space. Obstacles are projected onto the VRM in $O(n)$ time by superimposition of images, leading to the identification of collision poses. The edges joining the free nodes can now be checked with a visual local planner, and free-space motions computed in $O(n \\log n)$ time. This enables us to plan paths in the image space for a robot manipulator with unknown link geometries, DOF, kinematics, obstacles, and camera pose. We sketch the proofs for the main theoretical ideas, identify the assumptions, and demonstrate the approach for both articulated and mobile robots. We also investigate the feasibility of the process by examining various metrics and image sampling densities, and demonstrate it on simulated and real robots.\nProtein-protein interaction (PPI) prediction is an important problem in machine learning and computational biology. However, there is no data set for training or evaluation purposes, where all the instances are accurately labeled. Instead, what is available are instances of positive class (with possibly noisy labels) and no instances of negative class. The non-availability of negative class data is typically handled with the observation that randomly chosen protein-pairs have a nearly 100% chance of being negative class, as only 1 in 1,500 protein pairs is expected to be an interacting pair. In this paper, we focused on the problem that non-availability of accurately labeled testing data sets in the domain of protein-protein interaction (PPI) prediction may lead to biased evaluation results. We first showed that not acknowledging the inherent skew in the interactome (i.e. rare occurrence of positive instances) leads to an over-estimated accuracy of the predictor. 
Then we show that, with the belief that positive interactions are a rare category, sampling random pairs of proteins excluding known interacting proteins set as the negative testing data set could lead to an under-estimated evaluation result. We formalized those two problems to validate the above claim, and based on the formalization, we proposed a balancing method to cancel out the over-estimation with under-estimation. Finally, our experiments validated the theoretical aspects and showed that this balancing evaluation could evaluate the exact performance without availability of golden standard data sets.\nWhile most approaches to automatically recognizing entailment relations have used classifiers employing hand engineered features derived from complex natural language processing pipelines, in practice their performance has been only slightly better than bag-of-word pair classifiers using only lexical similarity. The only attempt so far to build an end-to-end differentiable neural network for entailment failed to outperform such a simple similarity classifier. In this paper, we propose a neural model that reads two sentences to determine entailment using long short-term memory units. We extend this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases. Furthermore, we present a qualitative analysis of attention weights produced by this model, demonstrating such reasoning capabilities. On a large entailment dataset this model outperforms the previous best neural model and a classifier with engineered features by a substantial margin. It is the first generic end-to-end differentiable system that achieves state-of-the-art accuracy on a textual entailment dataset.\nMax-product Belief Propagation (BP) is a popular message-passing algorithm for computing a Maximum-A-Posteriori (MAP) assignment over a distribution represented by a Graphical Model (GM). 
It has been shown that BP can solve a number of combinatorial optimization problems including minimum weight matching, shortest path, network flow and vertex cover under the following common assumption: the respective Linear Programming (LP) relaxation is tight, i.e., no integrality gap is present. However, when LP shows an integrality gap, no model has been known which can be solved systematically via sequential applications of BP. In this paper, we develop the first such algorithm, coined Blossom-BP, for solving the minimum weight matching problem over arbitrary graphs. Each step of the sequential algorithm requires applying BP over a modified graph constructed by contractions and expansions of blossoms, i.e., odd sets of vertices. Our scheme guarantees termination in O(n^2) of BP runs, where n is the number of vertices in the original graph. In essence, the Blossom-BP offers a distributed version of the celebrated Edmonds' Blossom algorithm by jumping at once over many sub-steps with a single BP. Moreover, our result provides an interpretation of the Edmonds' algorithm as a sequence of LPs.\nThis study explores the design and control of the behaviour of agents and robots using simple circuits of spiking neurons and Spike Timing Dependent Plasticity (STDP) as a mechanism of associative and unsupervised learning. Based on a \"reward and punishment\" classical conditioning, it is demonstrated that these robots learnt to identify and avoid obstacles as well as to identify and look for rewarding stimuli. Using the simulation and programming environment NetLogo, a software engine for the Integrate and Fire model was developed, which allowed us to monitor in discrete time steps the dynamics of each single neuron, synapse and spike in the proposed neural networks. These spiking neural networks (SNN) served as simple brains for the experimental robots. The Lego Mindstorms robot kit was used for the embodiment of the simulated agents. 
In this paper the topological building blocks are presented as well as the neural parameters required to reproduce the experiments. This paper summarizes the resulting behaviour as well as the observed dynamics of the neural circuits. The Internet-link to the NetLogo code is included in the annex.\nHumans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form a symbol system and obtain semiotic skills through their autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding human social interactions and developing a robot that can smoothly communicate with human users in the long term, requires an understanding of the dynamics of symbol systems and is crucially important. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system. The emergent symbol system is socially self-organized through both semiotic communications and physical interactions with autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-art research topics concerning SER, e.g., multimodal categorization, word discovery, and a double articulation analysis, that enable a robot to obtain words and their embodied meanings from raw sensory--motor information, including visual information, haptic information, auditory information, and acoustic speech signals, in a totally unsupervised manner. 
Finally, we suggest future directions of research in SER.\nThis paper presents an ontology-based approach for the design of a collaborative business process model (CBP). This CBP is considered as a specification of needs in order to build a collaboration information system (CIS) for a network of organisations. The study is a part of a model driven engineering approach of the CIS in a specific enterprise interoperability framework that will be summarised. An adaptation of the Business Process Modeling Notation (BPMN) is used to represent the CBP model. We develop a knowledge-based system (KbS) which is composed of three main parts: knowledge gathering, knowledge representation and reasoning, and collaborative business process modelling. The first part starts from a high abstraction level where knowledge from business partners is captured. A collaboration ontology is defined in order to provide a structure to store and use the knowledge captured. In parallel, we try to reuse generic existing knowledge about business processes from the MIT Process Handbook repository. This results in a collaboration process ontology that is also described. A set of rules is defined in order to extract knowledge about fragments of the CBP model from the two previous ontologies. These fragments are finally assembled in the third part of the KbS. A prototype of the KbS has been developed in order to implement and support this approach. The prototype is a computer-aided design tool of the CBP. In this paper, we will present the theoretical aspects of each part of this KbS as well as the tools that we developed and used in order to support its functionalities.\nThe Mediation Information System Engineering project is currently finishing its second iteration (MISE 2.0). The main objective of this scientific project is to provide any emerging collaborative situation with methods and tools to deploy a Mediation Information System (MIS). 
MISE 2.0 aims at defining and designing a service-based platform, dedicated to initiating and supporting the interoperability of collaborative situations among potential partners. This MISE 2.0 platform implements a model-driven engineering approach to the design of a service-oriented MIS dedicated to supporting the collaborative situation. This approach is structured in three layers, each providing their own key innovative points: (i) the gathering of individual and collaborative knowledge to provide appropriate collaborative business behaviour (key point: knowledge management, including semantics, exploitation and capitalization), (ii) deployment of a mediation information system able to computerize the previously deduced collaborative processes (key point: the automatic generation of collaborative workflows, including connection with existing devices or services), (iii) the management of the agility of the obtained collaborative network of organizations (key point: supervision of collaborative situations and relevant exploitation of the gathered data). MISE covers business issues (through BPM), technical issues (through an SOA) and agility issues of collaborative situations (through EDA).\nSymbolic (or Literal) Neutrosophic Theory is referring to the use of abstract symbols (i.e. the letters T, I, F, or their refined indexed letters Tj, Ik, Fl) in neutrosophics. We extend the dialectical triad thesis-antithesis-synthesis to the neutrosophic tetrad thesis-antithesis-neutrothesis-neutrosynthesis. Then we introduce the neutrosophic system that is a quasi or (t,i,f) classical system, in the sense that the neutrosophic system deals with quasi-terms (concepts, attributes, etc.). Then the notions of Neutrosophic Axiom, Neutrosophic Deducibility, Degree of Contradiction (Dissimilarity) of Two Neutrosophic Axioms, etc. Afterwards a new type of structures, called (t, i, f) Neutrosophic Structures, and we show particular cases of such structures in geometry and in algebra. 
Also, a short history of the neutrosophic set, neutrosophic numerical components and neutrosophic literal components, neutrosophic numbers, etc. We construct examples of splitting the literal indeterminacy (I) into literal subindeterminacies (I1, I2, and so on, Ir), and to define a multiplication law of these literal subindeterminacies in order to be able to build refined I neutrosophic algebraic structures. We define three neutrosophic actions and their properties. We then introduce the prevalence order on T,I,F with respect to a given neutrosophic operator. And the refinement of neutrosophic entities A, neutA, and antiA. Then we extend the classical logical operators to neutrosophic literal (symbolic) logical operators and to refined literal (symbolic) logical operators, and we define the refinement neutrosophic literal (symbolic) space. We introduce the neutrosophic quadruple numbers (a+bT+cI+dF) and the refined neutrosophic quadruple numbers. Then we define an absorbance law, based on a prevalence order, in order to multiply the neutrosophic quadruple numbers.\nWe present a very general geometrico-dynamical description of physical or more abstract entities, called the 'general tension-reduction' (GTR) model, where not only states, but also measurement-interactions can be represented, and the associated outcome probabilities calculated. Underlying the model is the hypothesis that indeterminism manifests as a consequence of unavoidable fluctuations in the experimental context, in accordance with the 'hidden-measurements interpretation' of quantum mechanics. When the structure of the state space is Hilbertian, and measurements are of the 'universal' kind, i.e., are the result of an average over all possible ways of selecting an outcome, the GTR-model provides the same predictions of the Born rule, and therefore provides a natural completed version of quantum mechanics. 
However, when the structure of the state space is non-Hilbertian and/or not all possible ways of selecting an outcome are available to be actualized, the predictions of the model generally differ from the quantum ones, especially when sequential measurements are considered. Some paradigmatic examples will be discussed, taken from physics and human cognition. Particular attention will be given to some known psychological effects, like question order effects and response replicability, which we show are able to generate non-Hilbertian statistics. We also suggest a realistic interpretation of the GTR-model, when applied to human cognition and decision, which we think could become the generally adopted interpretative framework in quantum cognition research.\nWe proposed Neural Enquirer as a neural network architecture to execute a natural language (NL) query on a knowledge-base (KB) for answers. Basically, Neural Enquirer finds the distributed representation of a query and then executes it on knowledge-base tables to obtain the answer as one of the values in the tables. Unlike similar efforts in end-to-end training of semantic parsers, Neural Enquirer is fully \"neuralized\": it not only gives distributional representation of the query and the knowledge-base, but also realizes the execution of compositional queries as a series of differentiable operations, with intermediate results (consisting of annotations of the tables at different levels) saved on multiple layers of memory. Neural Enquirer can be trained with gradient descent, with which not only the parameters of the controlling components and semantic parsing component, but also the embeddings of the tables and query words can be learned from scratch. The training can be done in an end-to-end fashion, but it can take stronger guidance, e.g., the step-by-step supervision for complicated queries, and benefit from it. 
Neural Enquirer is one step towards building neural network systems which seek to understand language by executing it in the real world. Our experiments show that Neural Enquirer can learn to execute fairly complicated NL queries on tables with rich structures.\nWe first discuss certain problems with the classical probabilistic approach for assessing forensic evidence, in particular its inability to distinguish between lack of belief and disbelief, and its inability to model complete ignorance within a given population. We then discuss Shafer belief functions, a generalization of probability distributions, which can deal with both these objections. We use a calculus of belief functions which does not use the much criticized Dempster rule of combination, but only the very natural Dempster-Shafer conditioning. We then apply this calculus to some classical forensic problems like the various island problems and the problem of parental identification. If we impose no prior knowledge apart from assuming that the culprit or parent belongs to a given population (something which is possible in our setting), then our answers differ from the classical ones when uniform or other priors are imposed. We can actually retrieve the classical answers by imposing the relevant priors, so our setup can and should be interpreted as a generalization of the classical methodology, allowing more flexibility. We show how our calculus can be used to develop an analogue of Bayes' rule, with belief functions instead of classical probabilities. We also discuss consequences of our theory for legal practice.\nThere is a widespread need for statistical methods that can analyze high-dimensional datasets without imposing restrictive or opaque modeling assumptions. This paper describes a domain-general data analysis method called CrossCat. 
CrossCat infers multiple non-overlapping views of the data, each consisting of a subset of the variables, and uses a separate nonparametric mixture to model each view. CrossCat is based on approximately Bayesian inference in a hierarchical, nonparametric model for data tables. This model consists of a Dirichlet process mixture over the columns of a data table in which each mixture component is itself an independent Dirichlet process mixture over the rows; the inner mixture components are simple parametric models whose form depends on the types of data in the table. CrossCat combines strengths of mixture modeling and Bayesian network structure learning. Like mixture modeling, CrossCat can model a broad class of distributions by positing latent variables, and produces representations that can be efficiently conditioned and sampled from for prediction. Like Bayesian networks, CrossCat represents the dependencies and independencies between variables, and thus remains accurate when there are multiple statistical signals. Inference is done via a scalable Gibbs sampling scheme; this paper shows that it works well in practice. This paper also includes empirical results on heterogeneous tabular data of up to 10 million cells, such as hospital cost and quality measures, voting records, unemployment rates, gene expression measurements, and images of handwritten digits. CrossCat infers structure that is consistent with accepted findings and common-sense knowledge in multiple domains and yields predictive accuracy competitive with generative, discriminative, and model-free alternatives.\nOnce known to be used exclusively in military domain, unmanned aerial vehicles (drones) have stepped up to become a part of new logistic method in commercial sector called \"last-mile delivery\". 
In this novel approach, small unmanned aerial vehicles (UAV), also known as drones, are deployed alongside with trucks to deliver goods to customers in order to improve the service quality or reduce the transportation cost. It gives rise to a new variant of the traveling salesman problem (TSP), of which we call TSP with drone (TSP-D). In this article, we consider a variant of TSP-D where the main objective is to minimize the total transportation cost. We also propose two heuristics: \"Drone First, Truck Second\" (DFTS) and \"Truck First, Drone Second\" (TFDS), to effectively solve the problem. The former constructs route for drone first while the latter constructs route for truck first. We solve a TSP to generate route for truck and propose a mixed integer programming (MIP) formulation with different profit functions to build route for drone. Numerical results obtained on many instances with different sizes and characteristics are presented. Recommendations on promising algorithm choices are also provided.\nThis paper presents a restricted visual Turing test (VTT) for story-line based deep understanding in long-term and multi-camera captured videos. Given a set of videos of a scene (such as a multi-room office, a garden, and a parking lot.) and a sequence of story-line based queries, the task is to provide answers either simply in binary form \"true/false\" (to a polar query) or in an accurate natural language description (to a non-polar query). Queries, polar or non-polar, consist of view-based queries which can be answered from a particular camera view and scene-centered queries which involves joint inference across different cameras. The story lines are collected to cover spatial, temporal and causal understanding of input videos. The data and queries distinguish our VTT from recently proposed visual question answering in images and video captioning. 
A vision system is proposed to perform joint video and query parsing which integrates different vision modules, a knowledge base and a query engine. The system provides unified interfaces for different modules so that individual modules can be reconfigured to test a new method. We provide a benchmark dataset and a toolkit for ontology guided story-line query generation which consists of about 93.5 hours videos captured in four different locations and 3,426 queries split into 127 story lines. We also provide a baseline implementation and result analyses.\nIn this dissertation, we analyze the computational properties of game-theoretic centrality measures. The key idea behind game-theoretic approach to network analysis is to treat nodes as players in a cooperative game, where the value of each coalition of nodes is determined by certain graph properties. Next, the centrality of any individual node is determined by a chosen game-theoretic solution concept (notably, the Shapley value) in the same way as the payoff of a player in a cooperative game. On one hand, the advantage of game-theoretic centrality measures is that nodes are ranked not only according to their individual roles but also according to how they contribute to the role played by all possible subsets of nodes. On the other hand, the disadvantage is that the game-theoretic solution concepts are typically computationally challenging. The main contribution of this dissertation is that we show that a wide variety of game-theoretic solution concepts on networks can be computed in polynomial time. Our focus is on centralities based on the Shapley value and its various extensions, such as the Semivalues and Coalitional Semivalues. Furthermore, we prove #P-hardness of computing the Shapley value in connectivity games and propose an algorithm to compute it. Finally, we analyse computational properties of generalized version of cooperative games in which order of player matters. 
We propose a new representation for such games, called generalized marginal contribution networks, that allows for polynomial computation in the size of the representation of two dedicated extensions of the Shapley value to this class of games.\nTo explore the hypothesis that KIC 8462852's aperiodic dimming is caused by artificial megastructures in orbit (Wright et al. 2015), rather than a natural cause such as cometary fragments in a highly elliptical orbit (Marengo et al. 2015), we searched for electromagnetic signals from KIC 8462852 indicative of extraterrestrial intelligence. The primary observations were in the visible optical regime using the Boquete Optical SETI Observatory in Panama. In addition, as a preparatory exercise for the possible future detection of a candidate signal (Heidmann 1991), three of six observing runs simultaneously searched radio frequencies at the Allen Telescope Array in California. No periodic optical signals greater than 67 photons/m2 within a time frame of 25 ns were seen. This limit corresponds to isotropic optical pulses of 8E22 joules. If, however, any inhabitants of KIC 8462852 were targeting our solar system (Shostak & Villard 2004), the required energy would be reduced greatly. The limits on narrowband radio signals were 180 - 300 Jy Hz at 1 and 8 GHz, respectively, corresponding to a transmitter with an effective isotropic radiated power of 4E15 W (and 7E15 W) at the distance of KIC 8462852. While these power requirements are high, even modest targeting could - just as for optical signals - lower these numbers substantially.\nMany scientific datasets are of high dimension, and the analysis usually requires visual manipulation by retaining the most important structures of data. Principal curve is a widely used approach for this purpose. However, many existing methods work only for data with structures that are not self-intersected, which is quite restrictive for real applications. 
A few methods can overcome the above problem, but they either require complicated human-made rules for a specific task with lack of convergence guarantee and adaption flexibility to different tasks, or cannot obtain explicit structures of data. To address these issues, we develop a new regularized principal graph learning framework that captures the local information of the underlying graph structure based on reversed graph embedding. As showcases, models that can learn a spanning tree or a weighted undirected $\\ell_1$ graph are proposed, and a new learning algorithm is developed that learns a set of principal points and a graph structure from data, simultaneously. The new algorithm is simple with guaranteed convergence. We then extend the proposed framework to deal with large-scale data. Experimental results on various synthetic and six real world datasets show that the proposed method compares favorably with baselines and can uncover the underlying structure correctly.\nThe proliferation of contextualized knowledge in the Semantic Web (SW) has led to the popularity of knowledge formats such as \\emph{quads} in the SW community. A quad is an extension of an RDF triple with contextual information of the triple. In this paper, we study the problem of query answering over quads augmented with forall-existential bridge rules that enable interoperability of reasoning between triples in various contexts. We call a set of quads together with such expressive bridge rules, a quad-system. Query answering over quad-systems is undecidable, in general. We derive decidable classes of quad-systems, for which query answering can be done using forward chaining. Sound, complete and terminating procedures, which are adaptations of the well known chase algorithm, are provided for these classes for deciding query entailment. 
Safe, msafe, and csafe class of quad-systems restrict the structure of blank nodes generated during the chase computation process to be directed acyclic graphs (DAGs) of bounded depth. RR and restricted RR classes do not allow the generation of blank nodes during the chase computation process. Both data and combined complexity of query entailment has been established for the classes derived. We further show that quad-systems are equivalent to forall-existential rules whose predicates are restricted to ternary arity, modulo polynomial time translations. We subsequently show that the technique of safety, strictly subsumes in expressivity, some of the well known and expressive techniques, such as joint acyclicity and model faithful acyclicity, used for decidability guarantees in the realm of forall-existential rules.\nBackground: Lung cancer was known as primary cancers and the survival rate of cancer is about 15%. Early detection of lung cancer is the leading factor in survival rate. All symptoms (features) of lung cancer do not appear until the cancer spreads to other areas. It needs an accurate early detection of lung cancer, for increasing the survival rate. For accurate detection, it need characterizes efficient features and delete redundancy features among all features. Feature selection is the problem of selecting informative features among all features. Materials and Methods: Lung cancer database consist of 32 patient records with 57 features. This database collected by Hong and Youngand indexed in the University of California Irvine repository. Experimental contents include the extracted from the clinical data and X-ray data, etc. The data described 3 types of pathological lung cancers and all features are taking an integer value 0-3. In our study, new method is proposed for identify efficient features of lung cancer. It is based on Hyper-Heuristic. Results: We obtained an accuracy of 80.63% using reduced 11 feature set. 
The proposed method compare to the accuracy of 5 machine learning feature selections. The accuracy of these 5 methods are 60.94, 57.81, 68.75, 60.94 and 68.75. Conclusions: The proposed method has better performance with the highest level of accuracy. Therefore, the proposed model is recommended for identifying an efficient symptom of Disease. These finding are very important in health research, particularly in allocation of medical resources for patients who predicted as high-risks\nReverse engineering the brain is proving difficult, perhaps impossible. While many believe that this is just a matter of time and effort, a different approach might help. Here, we describe a very simple idea which explains the power of the brain as well as its structure, exploiting complex dynamics rather than abstracting it away. Just as a Turing Machine is a Universal Digital Computer operating in a world of symbols, we propose that the brain is a Universal Dynamical Systems Modeller, evolved bottom-up (itself using nested networks of interconnected, self-organised dynamical systems) to prosper in a world of dynamical systems.   Recent progress in Applied Mathematics has produced startling evidence of what happens when abstract Dynamical Systems interact. Key latent information describing system A can be extracted by system B from very simple signals, and signals can be used by one system to control and manipulate others. Using these facts, we show how a region of the neocortex uses its dynamics to intrinsically \"compute\" about the external and internal world.   Building on an existing \"static\" model of cortical computation (Hawkins' Hierarchical Temporal Memory - HTM), we describe how a region of neocortex can be viewed as a network of components which together form a Dynamical Systems modelling module, connected via sensory and motor pathways to the external world, and forming part of a larger dynamical network in the brain.   
Empirical modelling and simulations of Dynamical HTM are possible with simple extensions and combinations of currently existing open source software. We list a number of relevant projects.\nSince the introduction of the stable marriage problem (SMP) by Gale and Shapley (1962), several variants and extensions have been investigated. While this variety is useful to widen the application potential, each variant requires a new algorithm for finding the stable matchings. To address this issue, we propose an encoding of the SMP using answer set programming (ASP), which can straightforwardly be adapted and extended to suit the needs of specific applications. The use of ASP also means that we can take advantage of highly efficient off-the-shelf solvers. To illustrate the flexibility of our approach, we show how our ASP encoding naturally allows us to select optimal stable matchings, i.e. matchings that are optimal according to some user-specified criterion. To the best of our knowledge, our encoding offers the first exact implementation to find sex-equal, minimum regret, egalitarian or maximum cardinality stable matchings for SMP instances in which individuals may designate unacceptable partners and ties between preferences are allowed.   This paper is under consideration in Theory and Practice of Logic Programming (TPLP).\nAn active object recognition system has the advantage of being able to act in the environment to capture images that are more suited for training and that lead to better performance at test time. In this paper, we propose a deep convolutional neural network for active object recognition that simultaneously predicts the object label, and selects the next action to perform on the object with the aim of improving recognition performance. We treat active object recognition as a reinforcement learning problem and derive the cost function to train the network for joint prediction of the object label and the action. 
A generative model of object similarities based on the Dirichlet distribution is proposed and embedded in the network for encoding the state of the system. The training is carried out by simultaneously minimizing the label and action prediction errors using gradient descent. We empirically show that the proposed network is able to predict both the object label and the actions on GERMS, a dataset for active object recognition. We compare the test label prediction accuracy of the proposed model with Dirichlet and Naive Bayes state encoding. The results of experiments suggest that the proposed model equipped with Dirichlet state encoding is superior in performance, and selects images that lead to better training and higher accuracy of label prediction at test time.\nWe study Matching and other related problems in a partial information setting where the agents' utilities for being matched to other agents are hidden and the mechanism only has access to ordinal preference information. Our model is motivated by the fact that in many settings, agents cannot express the numerical values of their utility for different outcomes, but are still able to rank the outcomes in their order of preference. Specifically, we study problems where the ground truth exists in the form of a weighted graph, and look to design algorithms that approximate the true optimum matching using only the preference orderings for each agent (induced by the hidden weights) as input. If no restrictions are placed on the weights, then one cannot hope to do better than the simple greedy algorithm, which yields a half optimal matching. Perhaps surprisingly, we show that by imposing a little structure on the weights, we can improve upon the trivial algorithm significantly: we design a 1.6-approximation algorithm for instances where the hidden weights obey the metric inequality. 
Using our algorithms for matching as a black-box, we also design new approximation algorithms for other closely related problems: these include a 3.2-approximation for the problem of clustering agents into equal-sized partitions, a 4-approximation algorithm for Densest k-subgraph, and a 2.14-approximation algorithm for Max TSP. These results are the first non-trivial ordinal approximation algorithms for such problems, and indicate that we can design robust algorithms even when we are agnostic to the precise agent utilities.\nGaussian Processes (GPs) are widely used tools in statistics, machine learning, robotics, computer vision, and scientific computation. However, despite their popularity, they can be difficult to apply; all but the simplest classification or regression applications require specification and inference over complex covariance functions that do not admit simple analytical posteriors. This paper shows how to embed Gaussian processes in any higher-order probabilistic programming language, using an idiom based on memoization, and demonstrates its utility by implementing and extending classic and state-of-the-art GP applications. The interface to Gaussian processes, called gpmem, takes an arbitrary real-valued computational process as input and returns a statistical emulator that automatically improves as the original process is invoked and its input-output behavior is recorded. The flexibility of gpmem is illustrated via three applications: (i) robust GP regression with hierarchical hyper-parameter learning, (ii) discovering symbolic expressions from time-series data by fully Bayesian structure learning over kernels generated by a stochastic grammar, and (iii) a bandit formulation of Bayesian optimization with automatic inference and action selection. All applications share a single 50-line Python library and require fewer than 20 lines of probabilistic code each.\nAn important use of machine learning is to learn what people value. 
What posts or photos should a user be shown? Which jobs or activities would a person find rewarding? In each case, observations of people's past choices can inform our inferences about their likes and preferences. If we assume that choices are approximately optimal according to some utility function, we can treat preference inference as Bayesian inverse planning. That is, given a prior on utility functions and some observed choices, we invert an optimal decision-making process to infer a posterior distribution on utility functions. However, people often deviate from approximate optimality. They have false beliefs, their planning is sub-optimal, and their choices may be temporally inconsistent due to hyperbolic discounting and other biases. We demonstrate how to incorporate these deviations into algorithms for preference inference by constructing generative models of planning for agents who are subject to false beliefs and time inconsistency. We explore the inferences these models make about preferences, beliefs, and biases. We present a behavioral experiment in which human subjects perform preference inference given the same observations of choices as our model. Results show that human subjects (like our model) explain choices in terms of systematic deviations from optimal behavior and suggest that they take such deviations into account when inferring preferences.\nWe study mechanisms for candidate selection that seek to minimize the social cost, where voters and candidates are associated with points in some underlying metric space. The social cost of a candidate is the sum of its distances to each voter. Some of our work assumes that these points can be modeled on a real line, but other results of ours are more general.   A question closely related to candidate selection is that of minimizing the sum of distances for facility location. 
The difference is that in our setting there is a fixed set of candidates, whereas the large body of work on facility location seems to consider every point in the metric space to be a possible candidate. This gives rise to three types of mechanisms which differ in the granularity of their input space (voting, ranking and location mechanisms). We study the relationships between these three classes of mechanisms.   While it may seem that Black's 1948 median algorithm is optimal for candidate selection on the line, this is not the case. We give matching upper and lower bounds for a variety of settings. In particular, when candidates and voters are on the line, our universally truthful spike mechanism gives a [tight] approximation of two. When assessing candidate selection mechanisms, we seek several desirable properties: (a) efficiency (minimizing the social cost) (b) truthfulness (dominant strategy incentive compatibility) and (c) simplicity (a smaller input space). We quantify the effect that truthfulness and simplicity impose on the efficiency.\nDeep convolutional neural networks have led to breakthrough results in numerous practical machine learning tasks such as classification of images in the ImageNet data set, control-policy-learning to play Atari games or the board game Go, and image captioning. Many of these applications first perform feature extraction and then feed the results thereof into a trainable classifier. The mathematical analysis of deep convolutional neural networks for feature extraction was initiated by Mallat, 2012. Specifically, Mallat considered so-called scattering networks based on a wavelet transform followed by the modulus non-linearity in each network layer, and proved translation invariance (asymptotically in the wavelet scale parameter) and deformation stability of the corresponding feature extractor. 
This paper complements Mallat's results by developing a theory that encompasses general convolutional transforms, or in more technical parlance, general semi-discrete frames (including Weyl-Heisenberg filters, curvelets, shearlets, ridgelets, wavelets, and learned filters), general Lipschitz-continuous non-linearities (e.g., rectified linear units, shifted logistic sigmoids, hyperbolic tangents, and modulus functions), and general Lipschitz-continuous pooling operators emulating, e.g., sub-sampling and averaging. In addition, all of these elements can be different in different network layers. For the resulting feature extractor we prove a translation invariance result of vertical nature in the sense of the features becoming progressively more translation-invariant with increasing network depth, and we establish deformation sensitivity bounds that apply to signal classes such as, e.g., band-limited functions, cartoon functions, and Lipschitz functions.\nOnline reviews are often our first port of call when considering products and purchases online. When evaluating a potential purchase, we may have a specific query in mind, e.g. `will this baby seat fit in the overhead compartment of a 747?' or `will I like this album if I liked Taylor Swift's 1989?'. To answer such questions we must either wade through huge volumes of consumer reviews hoping to find one that is relevant, or otherwise pose our question directly to the community via a Q/A system.   In this paper we hope to fuse these two paradigms: given a large volume of previously answered queries about products, we hope to automatically learn whether a review of a product is relevant to a given query. We formulate this as a machine learning problem using a mixture-of-experts-type framework---here each review is an `expert' that gets to vote on the response to a particular query; simultaneously we learn a relevance function such that `relevant' reviews are those that vote correctly. 
At test time this learned relevance function allows us to surface reviews that are relevant to new queries on-demand. We evaluate our system, Moqa, on a novel corpus of 1.4 million questions (and answers) and 13 million reviews. We show quantitatively that it is effective at addressing both binary and open-ended queries, and qualitatively that it surfaces reviews that human evaluators consider to be relevant.\nIn today's world, we follow news which is distributed globally. Significant events are reported by different sources and in different languages. In this work, we address the problem of tracking of events in a large multilingual stream. Within a recently developed system Event Registry we examine two aspects of this problem: how to compare articles in different languages and how to link collections of articles in different languages which refer to the same event. Taking a multilingual stream and clusters of articles from each language, we compare different cross-lingual document similarity measures based on Wikipedia. This allows us to compute the similarity of any two articles regardless of language. Building on previous work, we show there are methods which scale well and can compute a meaningful similarity between articles from languages with little or no direct overlap in the training data. Using this capability, we then propose an approach to link clusters of articles across languages which represent the same event. We provide an extensive evaluation of the system as a whole, as well as an evaluation of the quality and robustness of the similarity measure and the linking algorithm.\nAttribute reduction is one of the most important topics in rough set theory. Heuristic attribute reduction algorithms have been presented to solve the attribute reduction problem. It is generally known that fitness functions play a key role in developing heuristic attribute reduction algorithms. 
The monotonicity of fitness functions can guarantee the validity of heuristic attribute reduction algorithms. In the probabilistic rough set model, distribution reducts can ensure the decision rules derived from the reducts are compatible with those derived from the original decision table. However, there are few studies on developing heuristic attribute reduction algorithms for finding distribution reducts. This is partly due to the fact that there are no monotonic fitness functions that are used to design heuristic attribute reduction algorithms in the probabilistic rough set model. The main objective of this paper is to develop heuristic attribute reduction algorithms for finding distribution reducts in the probabilistic rough set model. For one thing, two monotonic fitness functions are constructed, from which equivalence definitions of distribution reducts can be obtained. For another, two modified monotonic fitness functions are proposed to evaluate the significance of attributes more effectively. On this basis, two heuristic attribute reduction algorithms for finding distribution reducts are developed based on the addition-deletion method and the deletion method. In particular, the monotonicity of fitness functions guarantees the rationality of the proposed heuristic attribute reduction algorithms. Results of experimental analysis are included to quantify the effectiveness of the proposed fitness functions and distribution reducts.\nThis paper discusses the representation of ontologies in the first-order logical environment FOLE (Kent 2013). An ontology defines the primitives with which to model the knowledge resources for a community of discourse (Gruber 2009). These primitives, consisting of classes, relationships and properties, are represented by the entity-relationship-attribute ERA data model (Chen 1976). An ontology uses formal axioms to constrain the interpretation of these primitives. In short, an ontology specifies a logical theory. 
This paper is the first in a series of three papers that provide a rigorous mathematical representation for the ERA data model in particular, and ontologies in general, within the first-order logical environment FOLE. The first two papers show how FOLE represents the formalism and semantics of (many-sorted) first-order logic in a classification form corresponding to ideas discussed in the Information Flow Framework (IFF). In particular, this first paper provides a foundation that connects elements of the ERA data model with components of the first-order logical environment FOLE, and the second paper provides a superstructure that extends FOLE to the formalisms of first-order logic. The third paper defines an interpretation of FOLE in terms of the transformational passage, first described in (Kent 2013), from the classification form of first-order logic to an equivalent interpretation form, thereby defining the formalism and semantics of first-order logical/relational database systems (Kent 2011). The FOLE representation follows a conceptual structures approach, that is completely compatible with formal concept analysis (Ganter and Wille 1999) and information flow (Barwise and Seligman 1997).\nBeing able to reason in an environment with a large number of discrete actions is essential to bringing reinforcement learning to a larger class of problems. Recommender systems, industrial plants and language models are only some of the many real-world tasks involving large numbers of discrete actions for which current methods are difficult or even often impossible to apply. An ability to generalize over the set of actions as well as sub-linear complexity relative to the size of the set are both necessary to handle such tasks. Current approaches are not able to provide both of these, which motivates the work in this paper. Our proposed approach leverages prior information about the actions to embed them in a continuous space upon which it can generalize. 
Additionally, approximate nearest-neighbor methods allow for logarithmic-time lookup complexity relative to the number of actions, which is necessary for time-wise tractable training. This combined approach allows reinforcement learning methods to be applied to large-scale learning problems previously intractable with current methods. We demonstrate our algorithm's abilities on a series of tasks having up to one million actions.\nProbabilistic Graphical Models (PGM) are very useful in the fields of machine learning and data mining. The crucial limitation of those models, however, is the scalability. The Bayesian Network, which is one of the most common PGMs used in machine learning and data mining, demonstrates this limitation when the training data consists of random variables, each of which has a large set of possible values. In the big data era, one would expect new extensions to the existing PGMs to handle the massive amount of data produced these days by computers, sensors and other electronic devices. With hierarchical data - data that is arranged in a treelike structure with several levels - one would expect to see hundreds of thousands or millions of values distributed over even just a small number of levels. When modeling this kind of hierarchical data across large data sets, Bayesian Networks become infeasible for representing the probability distributions. In this paper we introduce an extension to Bayesian Networks to handle massive sets of hierarchical data in a reasonable amount of time and space. The proposed model achieves perfect precision of 1.0 and high recall of 0.93 when it is used as a multi-label classifier for the annotation of mass spectrometry data. 
On another data set of 1.5 billion search logs provided by CareerBuilder.com, the model was able to predict latent semantic relationships between search keywords with accuracy up to 0.80.\nThe scientific community is becoming more and more interested in the research that applies the mathematical formalism of quantum theory to model human decision-making. In this paper, we provide the theoretical foundations of the quantum approach to cognition that we developed in Brussels. These foundations rest on the results of two decades of studies on the axiomatic and operational-realistic approaches to the foundations of quantum physics. The deep analogies between the foundations of physics and cognition lead us to investigate the validity of quantum theory as a general and unitary framework for cognitive processes, and the empirical success of the Hilbert space models derived by such investigation provides a strong theoretical confirmation of this validity. However, two situations in the cognitive realm, 'question order effects' and 'response replicability', indicate that even the Hilbert space framework could be insufficient to reproduce the collected data. This does not mean that the mentioned operational-realistic approach would be incorrect, but simply that a larger class of measurements would be in force in human cognition, so that an extended quantum formalism may be needed to deal with all of them. As we will explain, the recently derived 'extended Bloch representation' of quantum theory (and the associated 'general tension-reduction' model) precisely provides such an extended formalism, while remaining within the same unitary interpretative framework.\nWe study abduction in First Order Horn logic theories where all atoms can be abduced and we are looking for preferred solutions with respect to three objective functions: cardinality minimality, coherence, and weighted abduction. 
We represent this reasoning problem in Answer Set Programming (ASP), in order to obtain a flexible framework for experimenting with global constraints and objective functions, and to test the boundaries of what is possible with ASP. Realizing this problem in ASP is challenging as it requires value invention and equivalence between certain constants, because the Unique Names Assumption does not hold in general. To permit reasoning in cyclic theories, we formally describe fine-grained variations of limiting Skolemization. We identify term equivalence as a main instantiation bottleneck, and improve the efficiency of our approach with on-demand constraints that were used to eliminate the same bottleneck in state-of-the-art solvers. We evaluate our approach experimentally on the ACCEL benchmark for plan recognition in Natural Language Understanding. Our encodings are publicly available, modular, and our approach is more efficient than state-of-the-art solvers on the ACCEL benchmark.\nWe consider data in the form of pairwise comparisons of n items, with the goal of precisely identifying the top k items for some value of k < n, or alternatively, recovering a ranking of all the items. We analyze the Copeland counting algorithm that ranks the items in order of the number of pairwise comparisons won, and show it has three attractive features: (a) its computational efficiency leads to speed-ups of several orders of magnitude in computation time as compared to prior work; (b) it is robust in that theoretical guarantees impose no conditions on the underlying matrix of pairwise-comparison probabilities, in contrast to some prior work that applies only to the BTL parametric model; and (c) it is an optimal method up to constant factors, meaning that it achieves the information-theoretic limits for recovering the top k-subset. 
We extend our results to obtain sharp guarantees for approximate recovery under the Hamming distortion metric, and more generally, to any arbitrary error requirement that satisfies a simple and natural monotonicity condition.\nIn previous work, we proposed a logic-based framework in which computation is the execution of actions in an attempt to make reactive rules of the form if antecedent then consequent true in a canonical model of a logic program determined by an initial state, sequence of events, and the resulting sequence of subsequent states. In this model-theoretic semantics, reactive rules are the driving force, and logic programs play only a supporting role.   In the canonical model, states, actions and other events are represented with timestamps. But in the operational semantics, for the sake of efficiency, timestamps are omitted and only the current state is maintained. State transitions are performed reactively by executing actions to make the consequents of rules true whenever the antecedents become true. This operational semantics is sound, but incomplete. It cannot make reactive rules true by preventing their antecedents from becoming true, or by proactively making their consequents true before their antecedents become true.   In this paper, we characterize the notion of reactive model, and prove that the operational semantics can generate all and only such models. In order to focus on the main issues, we omit the logic programming component of the framework.\nWe propose a formal mathematical model for sparse representations and active dendrites in neocortex. Our model is inspired by recent experimental findings on active dendritic processing and NMDA spikes in pyramidal neurons. These experimental and modeling studies suggest that the basic unit of pattern memory in the neocortex is instantiated by small clusters of synapses operated on by localized non-linear dendritic processes. 
We derive a number of scaling laws that characterize the accuracy of such dendrites in detecting activation patterns in a neuronal population under adverse conditions. We introduce the union property which shows that synapses for multiple patterns can be randomly mixed together within a segment and still lead to highly accurate recognition. We describe simulation results that provide further insight into sparse representations as well as two primary results. First we show that pattern recognition by a neuron with active dendrites can be extremely accurate and robust with high dimensional sparse inputs even when using a tiny number of synapses to recognize large patterns. Second, equations representing recognition accuracy of a dendrite predict optimal NMDA spiking thresholds under a generous set of assumptions. The prediction tightly matches NMDA spiking thresholds measured in the literature. Our model matches many of the known properties of pyramidal neurons. As such the theory provides a mathematical framework for understanding the benefits and limits of sparse representations in cortical networks.\nWe consider the problem of learning preferences over trajectories for mobile manipulators such as personal robots and assembly line robots. The preferences we learn are more intricate than simple geometric constraints on trajectories; they are rather governed by the surrounding context of various objects and human interactions in the environment. We propose a coactive online learning framework for teaching preferences in contextually rich environments. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this coactive preference feedback can be more easily elicited than demonstrations of optimal trajectories. 
Nevertheless, theoretical regret bounds of our algorithm match the asymptotic rates of optimal trajectory algorithms.   We implement our algorithm on two high degree-of-freedom robots, PR2 and Baxter, and present three intuitive mechanisms for providing such incremental feedback. In our experimental evaluation we consider two context-rich settings -- household chores and grocery store checkout -- and show that users are able to train the robot with just a few rounds of feedback (taking only a few minutes).\\footnote{Parts of this work have been published at NIPS and ISRR conferences~\\citep{Jain13,Jain13b}. This journal submission presents a consistent full paper, and also includes the proof of regret bounds, more details of the robotic system, and a thorough related work section.}\nThe performance of deep neural networks is well-known to be sensitive to the setting of their hyperparameters. Recent advances in reverse-mode automatic differentiation allow for optimizing hyperparameters with gradients. The standard way of computing these gradients involves a forward and backward pass of computations. However, the backward pass usually needs to consume unaffordable memory to store all the intermediate variables to exactly reverse the forward training procedure. In this work we propose a simple but effective method, DrMAD, to distill the knowledge of the forward pass into a shortcut path, through which we approximately reverse the training trajectory. Experiments on several image benchmark datasets show that DrMAD is at least 45 times faster and consumes 100 times less memory compared to state-of-the-art methods for optimizing hyperparameters with minimal compromise to its effectiveness. To the best of our knowledge, DrMAD is the first research attempt to make it practical to automatically tune thousands of hyperparameters of deep neural networks. 
The code can be downloaded from https://github.com/bigaidream-projects/drmad\nPopular online enrichment analysis tools from the field of molecular systems biology provide users with the ability to submit their experimental results as gene sets for individual analysis. Such queries are kept private, and have never before been considered as a resource for integrative analysis. By harnessing gene set query submissions from thousands of users, we aim to discover biological knowledge beyond the scope of an individual study. In this work, we investigated a large collection of gene sets submitted to the tool Enrichr by thousands of users. Based on co-occurrence, we constructed a global gene-gene association network. We interpret this inferred network as providing a summary of the structure present in this crowdsourced gene set library, and show that this network recapitulates known protein-protein interactions and functional associations between genes. This finding implies that this network also offers predictive value. Furthermore, we visualize this gene-gene association network using a new edge-pruning algorithm that retains both the local and global structures of large-scale networks. Our ability to make predictions for currently unknown gene associations, that may not be captured by individual researchers and data sources, is a demonstration of the potential of harnessing collective knowledge from users of popular tools in the field of molecular systems biology.\nThere is a large variety of objects and appliances in human environments, such as stoves, coffee dispensers, juice extractors, and so on. It is challenging for a roboticist to program a robot for each of these object types and for each of their instantiations. In this work, we present a novel approach to manipulation planning based on the idea that many household objects share similarly-operated object parts. 
We formulate the manipulation planning as a structured prediction problem and learn to transfer manipulation strategies across different objects by embedding point-cloud, natural language, and manipulation trajectory data into a shared embedding space using a deep neural network. In order to learn semantically meaningful spaces throughout our network, we introduce a method for pre-training its lower layers for multimodal feature embedding and a method for fine-tuning this embedding space using a loss-based margin. In order to collect a large number of manipulation demonstrations for different objects, we develop a new crowd-sourcing platform called Robobarista. We test our model on our dataset consisting of 116 objects and appliances with 249 parts along with 250 language instructions, for which there are 1225 crowd-sourced manipulation demonstrations. We further show that our robot with our model can even prepare a cup of latte with appliances it has never seen before.\nWe consider the problem of generating motion plans for a robot that are guaranteed to succeed despite uncertainty in the environment, parametric model uncertainty, and disturbances. Furthermore, we consider scenarios where these plans must be generated in real-time, because constraints such as obstacles in the environment may not be known until they are perceived (with a noisy sensor) at runtime. Our approach is to pre-compute a library of \"funnels\" along different maneuvers of the system that the state is guaranteed to remain within (despite bounded disturbances) when the feedback controller corresponding to the maneuver is executed. We leverage powerful computational machinery from convex optimization (sums-of-squares programming in particular) to compute these funnels. The resulting funnel library is then used to sequentially compose motion plans at runtime while ensuring the safety of the robot. 
A major advantage of the work presented here is that by explicitly taking into account the effect of uncertainty, the robot can evaluate motion plans based on how vulnerable they are to disturbances.   We demonstrate and validate our method using extensive hardware experiments on a small fixed-wing airplane avoiding obstacles at high speed (~12 mph), along with thorough simulation experiments of ground vehicle and quadrotor models navigating through cluttered environments. To our knowledge, these demonstrations constitute one of the first examples of provably safe and robust control for robotic systems with complex nonlinear dynamics that need to plan in real-time in environments with complex geometric constraints.\nMachine learning algorithms are increasingly influencing our decisions and interacting with us in all parts of our daily lives. Therefore, just like for power plants, highways, and myriad other engineered sociotechnical systems, we must consider the safety of systems involving machine learning. In this paper, we first discuss the definition of safety in terms of risk, epistemic uncertainty, and the harm incurred by unwanted outcomes. Then we examine dimensions, such as the choice of cost function and the appropriateness of minimizing the empirical average training cost, along which certain real-world applications may not be completely amenable to the foundational principle of modern statistical machine learning: empirical risk minimization. In particular, we note an emerging dichotomy of applications: ones in which safety is important and risk minimization is not the complete story (we name these Type A applications), and ones in which safety is not so critical and risk minimization is sufficient (we name these Type B applications). 
Finally, we discuss how four different strategies for achieving safety in engineering (inherently safe design, safety reserves, safe fail, and procedural safeguards) can be mapped to the machine learning context through interpretability and causality of predictive models, objectives beyond expected prediction accuracy, human involvement for labeling difficult or rare examples, and user experience design of software.\nThe categorical compositional distributional model of natural language provides a conceptually motivated procedure to compute the meaning of sentences, given grammatical structure and the meanings of its words. This approach has outperformed other models in mainstream empirical language processing tasks. However, until recently it has lacked the crucial feature of lexical entailment -- as do other distributional models of meaning.   In this paper we solve the problem of entailment for categorical compositional distributional semantics. Taking advantage of the abstract categorical framework allows us to vary our choice of model. This enables the introduction of a notion of entailment, exploiting ideas from the categorical semantics of partial knowledge in quantum computation.   The new model of language uses density matrices, on which we introduce a novel robust graded order capturing the entailment strength between concepts. This graded measure emerges from a general framework for approximate entailment, induced by any commutative monoid. Quantum logic embeds in our graded order.   Our main theorem shows that entailment strength lifts compositionally to the sentence level, giving a lower bound on sentence entailment. 
We describe the essential properties of graded entailment such as continuity, and provide a procedure for calculating entailment strength.\nA number of organizations ranging from terrorist groups such as ISIS to politicians and nation states reportedly conduct explicit campaigns to influence opinion on social media, posing a risk to democratic processes. There is thus a growing need to identify and eliminate \"influence bots\" - realistic, automated identities that illicitly shape discussion on sites like Twitter and Facebook - before they get too influential. Spurred by such events, DARPA held a 4-week competition in February/March 2015 in which multiple teams supported by the DARPA Social Media in Strategic Communications program competed to identify a set of previously identified \"influence bots\" serving as ground truth on a specific topic within Twitter. Past work regarding influence bots often has difficulty supporting claims about accuracy, since there is limited ground truth (though some exceptions do exist [3,7]). However, with the exception of [3], no past work has looked specifically at identifying influence bots on a specific topic. This paper describes the DARPA Challenge and describes the methods used by the three top-ranked teams.\nTeachable agents are computer agents based on the pedagogical concept of learning-by-teaching. During the tutoring process, where students take on the role of the tutor to teach a computer agent tutee, learners have been observed to gain deeper understanding of the subject matter. Teachable agents are commonly used in the areas of science and mathematics learning where learners are able to learn complex concepts and deep reasoning by teaching the teachable agent through graphic representation such as concept maps.   
Literature review on teachable agents, as well as observations during field studies conducted by the researcher, has shown that many current teachable agents lack the interaction abilities required to keep learners engaged in learning tasks. As a result, learners deviate from the teaching process and are unable to benefit fully from learning with the teachable agent. The applications of teachable agents are also restricted to the learning of academic subjects such as mathematics and science.   In this book, we propose the Persuasive Teachable Agent (PTA), a teachable agent based on the theoretical framework of persuasion and on computational, goal-oriented agent modelling. We argue that the PTA, an autonomous agent capable of encouraging attitude and behavioural change, can offer a more meaningful and engaging learning experience for learners from different age groups. Based on the findings from our research, we argue that persuasive feedback actions generated by the PTA significantly influence learners' decisions to participate in intergenerational learning. The PTA plays a crucial role in the development of future persuasive technologies in artificially intelligent agents.\nGoal models have been widely used in Computer Science to represent software requirements, business objectives, and design qualities. Existing goal modelling techniques, however, have shown limitations of expressiveness and/or tractability in coping with complex real-world problems. 
In this work, we exploit advances in automated reasoning technologies, notably Satisfiability and Optimization Modulo Theories (SMT/OMT), and we propose and formalize: (i) an extended modelling language for goals, namely the Constrained Goal Model (CGM), which makes explicit the notion of goal refinement and of domain assumption, allows for expressing preferences between goals and refinements, and allows for associating numerical attributes to goals and refinements for defining constraints and optimization goals over multiple objective functions, refinements and their numerical attributes; (ii) a novel set of automated reasoning functionalities over CGMs, allowing for automatically generating suitable refinements of input CGMs, under user-specified assumptions and constraints, that also maximize preferences and optimize given objective functions. We have implemented these modelling and reasoning functionalities in a tool, named CGM-Tool, using the OMT solver OptiMathSAT as automated reasoning backend. Moreover, we have conducted an experimental evaluation on large CGMs to support the claim that our proposal scales well for goal models with thousands of elements.\nThis paper presents HEALER, a software agent that recommends sequential intervention plans for use by homeless shelters, who organize these interventions to raise awareness about HIV among homeless youth. HEALER's sequential plans (built using knowledge of social networks of homeless youth) choose intervention participants strategically to maximize influence spread, while reasoning about uncertainties in the network. While previous work presents influence maximizing techniques to choose intervention participants, they do not address three real-world issues: (i) they completely fail to scale up to real-world sizes; (ii) they do not handle deviations in execution of intervention plans; (iii) constructing real-world social networks is an expensive process. 
HEALER handles these issues via four major contributions: (i) HEALER casts this influence maximization problem as a POMDP and solves it using a novel planner which scales up to previously unsolvable real-world sizes; (ii) HEALER allows shelter officials to modify its recommendations, and updates its future plans in a deviation-tolerant manner; (iii) HEALER constructs social networks of homeless youth at low cost, using a Facebook application. Finally, (iv) we show hardness results for the problem that HEALER solves. HEALER will be deployed in the real world in early Spring 2016 and is currently undergoing testing at a homeless shelter.\nLearning to maximize AUC performance is an important research problem in Machine Learning and Artificial Intelligence. Unlike traditional batch learning methods for maximizing AUC, which often suffer from poor scalability, recent years have witnessed some emerging studies that attempt to maximize AUC by single-pass online learning approaches. Despite the encouraging results reported, the existing online AUC maximization algorithms often adopt simple online gradient descent approaches that fail to exploit the geometrical knowledge of the data observed during the online learning process, and thus could suffer from relatively larger regret. To address this limitation, in this work, we explore a novel algorithm, Adaptive Online AUC Maximization (AdaOAM), which employs an adaptive gradient method that exploits the knowledge of historical gradients to perform more informative online learning. The new adaptive updating strategy of AdaOAM is less sensitive to the parameter settings and maintains the same time complexity as previous non-adaptive counterparts. Additionally, we extend the algorithm to handle high-dimensional sparse data (SAdaOAM) and address sparsity in the solution by performing lazy gradient updating. We analyze the theoretical bounds and evaluate the algorithms' empirical performance on various types of data sets. 
The encouraging empirical results clearly highlight the effectiveness and efficiency of the proposed algorithms.\nIn this paper, we propose a novel unsupervised learning method for the lexical acquisition of words related to places visited by robots, from human continuous speech signals. We address the problem of learning novel words by a robot that has no prior knowledge of these words except for a primitive acoustic model. Further, we propose a method that allows a robot to effectively use the learned words and their meanings for self-localization tasks. The proposed method is a nonparametric Bayesian spatial concept acquisition method (SpCoA) that integrates the generative model for self-localization and the unsupervised word segmentation in uttered sentences via latent variables related to the spatial concept. We implemented the proposed method SpCoA on SIGVerse, which is a simulation environment, and TurtleBot2, which is a mobile robot in a real environment. We then conducted experiments to evaluate the performance of SpCoA. The experimental results showed that SpCoA enabled the robot to acquire the names of places from speech sentences. They also revealed that the robot could effectively utilize the acquired spatial concepts and reduce the uncertainty in self-localization.\nThe current paper proposes a novel neural network model for recognizing visually perceived human actions. The proposed multiple spatio-temporal scales recurrent neural network (MSTRNN) model is derived by introducing multiple timescale recurrent dynamics to the conventional convolutional neural network model. One of the essential characteristics of the MSTRNN is that its architecture imposes both spatial and temporal constraints simultaneously on the neural activity, which varies at multiple scales among different layers. 
As suggested by the principle of upward and downward causation, it is assumed that the network can develop meaningful structures such as functional hierarchy by taking advantage of such constraints during the course of learning. To evaluate the characteristics of the model, the current study uses three types of human action video datasets consisting of different types of primitive actions and different levels of compositionality on them. The performance of the MSTRNN on these datasets is compared with that of other representative deep learning models used in the field. The analysis of the internal representation obtained through learning with the datasets clarifies what sorts of functional hierarchy can be developed by extracting the essential compositionality underlying the datasets.\nFinding efficient and provable methods to solve non-convex optimization problems is an outstanding challenge in machine learning and optimization theory. A popular approach used to tackle non-convex problems is to use convex relaxation techniques to find a convex surrogate for the problem. Unfortunately, convex relaxations typically must be found on a problem-by-problem basis. Thus, providing a general-purpose strategy to estimate a convex relaxation would have a wide-reaching impact. Here, we introduce Convex Relaxation Regression (CoRR), an approach for learning convex relaxations for a class of smooth functions. The main idea behind our approach is to estimate the convex envelope of a function $f$ by evaluating $f$ at a set of $T$ random points and then fitting a convex function to these function evaluations. We prove that with probability greater than $1-\\delta$, the solution of our algorithm converges to the global optimizer of $f$ with error $\\mathcal{O} \\Big( \\big(\\frac{\\log(1/\\delta)}{T} \\big)^{\\alpha} \\Big)$ for some $\\alpha > 0$. 
Our approach enables the use of convex optimization tools to solve a class of non-convex optimization problems.\nWhen data analysts train a classifier and check if its accuracy is significantly different from random guessing, they are implicitly and indirectly performing a hypothesis test (two-sample testing), and it is important to ask whether this indirect method for testing is statistically optimal or not. Given that hypothesis tests attempt to maximize statistical power subject to a bound on the allowable false positive rate, while prediction attempts to minimize statistical risk on future predictions on unseen data, we wish to study whether a predictive approach is prudent when the ultimate aim is testing. We formalize this problem by considering the two-sample mean-testing setting where one must determine if the means of two Gaussians (with known and equal covariance) are the same or not, but the analyst indirectly does so by checking whether the accuracy achieved by Fisher's LDA classifier is significantly different from chance or not. Unexpectedly, we find that the asymptotic power of LDA's sample-splitting classification accuracy is actually minimax rate-optimal in terms of problem-dependent parameters. Since prediction is commonly thought to be harder than testing, it might come as a surprise to some that solving a harder problem does not create an information-theoretic bottleneck for the easier one. On the flip side, even though the power is rate-optimal, our derivation suggests that it may be worse by a small constant factor; hence practitioners must be wary of using (admittedly flexible) prediction methods on disguised testing problems.\nThe information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the information contained in vast document collections. 
General data mining methods based on machine learning face challenges with the scale of text data, posing a need for scalable text mining methods.   This thesis proposes a solution to scalable text mining: generative models combined with sparse computation. A unifying formalization for generative text models is defined, bringing together research traditions that have used formally equivalent models but ignored each other's parallel developments. This framework allows the use of methods developed in different processing tasks such as retrieval and classification, yielding effective solutions across different text mining tasks. Sparse computation using inverted indices is proposed for inference on probabilistic models. This reduces the computational complexity of common text mining operations according to sparsity, yielding probabilistic models with the scalability of modern search engines.   The proposed combination provides sparse generative models: a solution for text mining that is general, effective, and scalable. Extensive experimentation on text classification and ranked retrieval datasets is conducted, showing that the proposed solution matches or outperforms the leading task-specific methods in effectiveness, with an order-of-magnitude decrease in classification times for Wikipedia article categorization with a million classes. The developed methods were further applied in two 2014 Kaggle data mining prize competitions with over a hundred competing teams, earning first and second places.\nIn clinical data sets, we often find static information (e.g. patient gender, blood type, etc.) combined with sequences of data that are recorded during multiple hospital visits (e.g. medications prescribed, tests performed, etc.). Recurrent Neural Networks (RNNs) have proven to be very successful for modelling sequences of data in many areas of Machine Learning. 
In this work, we present an approach based on RNNs, specifically designed for the clinical domain, that combines static and dynamic information in order to predict future events. We work with a database collected at the Charité Hospital in Berlin that contains complete information concerning patients who underwent kidney transplantation. After transplantation, three main endpoints can occur: rejection of the kidney, loss of the kidney, and death of the patient. Our goal is to predict, based on information recorded in the Electronic Health Record of each patient, whether any of those endpoints will occur within the next six or twelve months after each visit to the clinic. We compared different types of RNNs that we developed for this work with a model based on a Feedforward Neural Network and a Logistic Regression model. We found that the RNN that we developed based on Gated Recurrent Units provides the best performance for this task. We also used the same models for a second task, i.e., next event prediction, and found that here the model based on a Feedforward Neural Network outperformed the other models. Our hypothesis is that long-term dependencies are not as relevant in this task.\nIn order to distribute the best arm identification task as close as possible to the users' devices, on the edge of the Radio Access Network, we propose a new problem setting, where distributed players collaborate to find the best arm. This architecture guarantees privacy to end-users since no events are stored. The only thing that can be observed by an adversary through the core network is aggregated information across users. We provide a first algorithm, Distributed Median Elimination, which is optimal in terms of the number of transmitted bits and near-optimal in terms of the speed-up factor with respect to an optimal algorithm run independently on each player. 
In practice, this first algorithm cannot handle the trade-off between the communication cost and the speed-up factor, and requires some knowledge about the distribution of players. Extended Distributed Median Elimination overcomes these limitations, by playing in parallel different instances of Distributed Median Elimination and selecting the best one. Experiments illustrate and complete the analysis. According to the analysis, in comparison to Median Elimination performed on each player, the proposed algorithm shows significant practical improvements.\nWhether in groups of humans or groups of computer agents, collaboration is most effective between individuals who have the ability to coordinate on a joint strategy for collective action. However, in general a rational actor will only intend to coordinate if that actor believes the other group members have the same intention. This circular dependence makes rational coordination difficult in uncertain environments if communication between actors is unreliable and no prior agreements have been made. An important normative question with regard to coordination in these ad hoc settings is therefore how one can come to believe that other actors will coordinate, and with regard to systems involving humans, an important empirical question is how humans arrive at these expectations. We introduce an exact algorithm for computing the infinitely recursive hierarchy of graded beliefs required for rational coordination in uncertain environments, and we introduce a novel mechanism for multiagent coordination that uses it. Our algorithm is valid in any environment with a finite state space, and extensions to certain countably infinite state spaces are likely possible. We test our mechanism for multiagent coordination as a model for human decisions in a simple coordination game using existing experimental data. 
We then explore via simulations whether modeling humans in this way may improve human-agent collaboration.\nWe study a problem of allocating divisible jobs, arriving online, to workers in a crowdsourcing setting which involves learning two parameters of strategically behaving workers. Each job is split into a certain number of tasks that are then allocated to workers. Each arriving job has to be completed within a deadline, and each task has to be completed while satisfying an upper bound on the probability of failure. The job population is homogeneous while the workers are heterogeneous in terms of costs, completion times, and times to failure. The job completion time and time to failure of each worker are stochastic with fixed but unknown means. The requester is faced with the challenge of learning two separate parameters of each (strategically behaving) worker simultaneously, namely, the mean job completion time and the mean time to failure. The time to failure of a worker depends on the duration of the task handled by the worker. Assuming non-strategic workers to start with, we solve this biparameter learning problem by applying the Robust UCB algorithm. Then, we non-trivially extend this algorithm to the setting where the workers are strategic about their costs. Our proposed mechanism is dominant strategy incentive compatible and ex-post individually rational with asymptotically optimal regret performance.\nThe exploration of social conversations for addressing patients' needs is an important analytical task, and many scholarly publications are contributing to filling the knowledge gap in this area. The main difficulty remains the inability to turn such contributions into pragmatic processes the pharmaceutical industry can leverage in order to generate insight from social media data, which can be considered one of the most challenging sources of information available today due to its sheer volume and noise. 
This study is based on the work by Scott Spangler and Jeffrey Kreulen and applies it to identify structure in social media through the extraction of a topical taxonomy able to capture the latent knowledge in social conversations on health-related sites. The mechanism for automatically identifying and generating a taxonomy from social conversations is developed and pressure-tested using public data from media sites focused on the needs of cancer patients and their families. Moreover, a novel method for generating category labels and determining an optimal number of categories is presented, which extends Spangler and Kreulen's research in a meaningful way. We assume the reader is familiar with taxonomies, what they are and how they are used.\nDespite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). 
We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.\nModern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning.   We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent.\nComplex network topologies and hyperbolic geometry seem specularly connected, and one of the most fascinating and challenging problems of recent complex network theory is to map a given network to its hyperbolic space. The Popularity Similarity Optimization (PSO) model represents - at the moment - the climax of this theory. 
It suggests that the trade-off between node popularity and similarity is a mechanism to explain how complex network topologies emerge - as discrete samples - from the continuous world of hyperbolic geometry. The hyperbolic space seems appropriate to represent real complex networks. In fact, it preserves many of their fundamental topological properties, and can be exploited for real applications such as, among others, link prediction and community detection. Here, we observe for the first time that a topological-based machine learning class of algorithms - for nonlinear unsupervised dimensionality reduction - can directly approximate the network's node angular coordinates of the hyperbolic model into a two-dimensional space, according to a similar topological organization that we named angular coalescence. On the basis of this phenomenon, we propose a new class of algorithms that offers fast and accurate coalescent embedding of networks in the hyperbolic space even for graphs with thousands of nodes.\nAutomatic video keyword generation is one of the key ingredients in reducing the burden of security officers in analyzing surveillance videos. Keywords or attributes are generally chosen manually based on expert knowledge of surveillance. Most existing works primarily aim at either supervised learning approaches relying on extensive manual labelling or hierarchical probabilistic models that assume the features are extracted using the bag-of-words approach; thus limiting the utilization of the other features. To address this, we turn our attention to automatic attribute discovery approaches. However, it is not clear which automatic discovery approach can discover the most meaningful attributes. Furthermore, little research has been done on how to compare and choose the best automatic attribute discovery methods. 
In this paper, we propose a novel approach, based on the shared structure exhibited amongst meaningful attributes, that enables us to compare different automatic attribute discovery approaches. We then validate our approach by comparing various attribute discovery methods such as PiCoDeS on two attribute datasets. The evaluation shows that our approach is able to select the automatic discovery approach that discovers the most meaningful attributes. We then employ the best discovery approach to generate keywords for videos recorded from a surveillance system. This work shows it is possible to massively reduce the amount of manual work in generating video keywords without limiting ourselves to a particular video feature descriptor.\nThis paper presents a strategy to guide a mobile ground robot equipped with a camera or depth sensor in order to autonomously map the visible part of a bounded three-dimensional structure. We describe motion planning algorithms that determine appropriate successive viewpoints and attempt to fill holes automatically in a point cloud produced by the sensing and perception layer. The emphasis is on accurately reconstructing a 3D model of a structure of moderate size rather than mapping large open environments, with applications, for example, in architecture, construction and inspection. The proposed algorithms do not require any initialization in the form of a mesh model or a bounding box, and the paths generated are well adapted to situations where the vision sensor is used simultaneously for mapping and for localizing the robot, in the absence of an additional absolute positioning system. We analyze the coverage properties of our policy, and compare its performance to the classic frontier-based exploration algorithm. 
We illustrate its efficacy for different structure sizes, levels of localization accuracy and range of the depth sensor, and validate our design on a real-world experiment.\nFrom smart homes that prepare coffee when we wake, to phones that know not to interrupt us during important conversations, our collective visions of HCI imagine a future in which computers understand a broad range of human behaviors. Today our systems fall short of these visions, however, because this range of behaviors is too large for designers or programmers to capture manually. In this paper, we instead demonstrate it is possible to mine a broad knowledge base of human behavior by analyzing more than one billion words of modern fiction. Our resulting knowledge base, Augur, trains vector models that can predict many thousands of user activities from surrounding objects in modern contexts: for example, whether a user may be eating food, meeting with a friend, or taking a selfie. Augur uses these predictions to identify actions that people commonly take on objects in the world and estimate a user's future activities given their current situation. We demonstrate Augur-powered, activity-based systems such as a phone that silences itself when the odds of you answering it are low, and a dynamic music player that adjusts to your present activity. A field deployment of an Augur-powered wearable camera resulted in 96% recall and 71% precision on its unsupervised predictions of common daily activities. A second evaluation where human judges rated the system's predictions over a broad set of input images found that 94% were rated sensible.\nThe present complexity in designing web applications makes software security a difficult goal to achieve. An attacker can explore a deployed service on the web and attack at his/her own leisure. 
Moving Target Defense (MTD) in web applications is an effective mechanism to nullify the advantage attackers gain from reconnaissance, but the framework demands a good strategy for switching between multiple configurations of its web-stack. To address this issue, we propose modeling a real-world MTD web application as a repeated Bayesian game. We then formulate an optimization problem that generates an effective switching strategy while considering the cost of switching between different web-stack configurations. To incorporate this model into a developed MTD system, we develop an automated system for generating attack sets of Common Vulnerabilities and Exposures (CVEs) for input attacker types with predefined capabilities. Our framework obtains realistic reward values for the players (defenders and attackers) in this game by using security domain expertise on CVEs obtained from the National Vulnerability Database (NVD). We also address the issue of prioritizing vulnerabilities that, when fixed, improve the security of the MTD system. Lastly, we demonstrate the robustness of our proposed model by evaluating its performance when there is uncertainty about input attacker information.\nIn the study of human learning, there is broad evidence that our ability to retain information improves with repeated exposure and decays with delay since last exposure. This plays a crucial role in the design of educational software, leading to a trade-off between teaching new material and reviewing what has already been taught. A common way to balance this trade-off is spaced repetition, which uses periodic review of content to improve long-term retention. Though spaced repetition is widely used in practice, e.g., in electronic flashcard software, there is little formal understanding of the design of these systems. Our paper addresses this gap in three ways. 
First, we mine log data from spaced repetition software to establish the functional dependence of retention on reinforcement and delay. Second, we use this memory model to develop a stochastic model for spaced repetition systems. We propose a queueing network model of the Leitner system for reviewing flashcards, along with a heuristic approximation that admits a tractable optimization problem for review scheduling. Finally, we empirically evaluate our queueing model through a Mechanical Turk experiment, verifying a key qualitative prediction of our model: the existence of a sharp phase transition in learning outcomes upon increasing the rate of new item introductions.\nDespite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the interactions and relationships between objects in an image. When asked \"What vehicle is the person riding?\", computers will need to identify the objects in an image as well as the relationships riding(man, carriage) and pulling(horse, carriage) in order to answer correctly that \"the person is riding a horse-drawn carriage\".   In this paper, we present the Visual Genome dataset to enable the modeling of such relationships. We collect dense annotations of objects, attributes, and relationships within each image to learn these models. Specifically, our dataset contains over 100K images where each image has an average of 21 objects, 18 attributes, and 18 pairwise relationships between objects. 
We canonicalize the objects, attributes, relationships, and noun phrases in region descriptions and question-answer pairs to WordNet synsets. Together, these annotations represent the densest and largest dataset of image descriptions, objects, attributes, relationships, and question-answer pairs.\nWe consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level must remain positive in all steps until the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, building on existing POMDP solvers and using RTDP as its main method. Our second contribution is related to policy representation. For larger POMDP instances, the policies computed by existing solvers are too large to be understandable. We present an automated procedure based on machine learning techniques that automatically extracts important decisions of the policy, allowing us to compute succinct, human-readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels.\nMutation is one of the most important stages of the genetic algorithm because of its role in exploring for global optima and overcoming premature convergence. There are many types of mutation, and the problem lies in selecting the appropriate type, a decision that becomes more difficult and requires more trial and error. 
This paper investigates the use of more than one mutation operator to enhance the performance of genetic algorithms. Novel mutation operators are proposed, in addition to two selection strategies for the mutation operators, one of which is based on selecting the best mutation operator and the other on randomly selecting any operator. Several experiments on some Travelling Salesman Problems (TSP) were conducted to evaluate the proposed methods, and these were compared to the well-known exchange mutation and rearrangement mutation. The results show the importance of some of the proposed methods, in addition to the significant enhancement of the genetic algorithm's performance, particularly when using more than one mutation operator.\nNon-technical losses (NTL) such as electricity theft cause significant harm to our economies, as in some countries they may range up to 40% of the total electricity distributed. Detecting NTLs requires costly on-site inspections. Accurate prediction of NTLs for customers using machine learning is therefore crucial. To date, related research largely ignores that the two classes of regular and non-regular customers are highly imbalanced and that NTL proportions may change, and mostly considers small data sets, often not allowing the results to be deployed in production. In this paper, we present a comprehensive approach to assess three NTL detection models (Boolean rules, fuzzy logic and Support Vector Machine) for different NTL proportions in large real-world data sets of hundreds of thousands of customers. This work has resulted in appreciable results that are about to be deployed in a leading industry solution. We believe that the considerations and observations made in this contribution are necessary for future smart meter research in order to report their effectiveness on imbalanced and large real-world data sets.\nThe neutrosophic set has the ability to handle uncertain, incomplete, inconsistent, and indeterminate information in a more accurate way. 
In this paper, we proposed a neutrosophic recommender system, based on the neutrosophic set, to predict diseases; it includes a single-criterion neutrosophic recommender system (SC-NRS) and a multi-criterion neutrosophic recommender system (MC-NRS). Further, we investigated some algebraic operations of the neutrosophic recommender system, such as union, complement, intersection, probabilistic sum, bold sum, bold intersection, bounded difference, symmetric difference, convex linear sum of min and max operators, and Cartesian product, as well as their associativity, commutativity and distributivity. Based on these operations, we studied algebraic structures such as lattices, Kleene algebra, de Morgan algebra, Brouwerian algebra, BCK algebra, Stone algebra and MV algebra. In addition, we introduced several types of similarity measures based on these algebraic operations and studied some of their theoretical properties. Moreover, we derived a prediction formula using the proposed algebraic similarity measure. We also proposed a new algorithm for medical diagnosis based on the neutrosophic recommender system. To check the validity of the proposed methodology, we ran experiments on the Heart, RHC, Breast cancer, Diabetes and DMD datasets. We then presented the MSE and computational time of the proposed algorithm compared with relevant ones such as ICSM, DSM, CARE and CFMD, as well as other variants, namely Variant 67, Variant 69, and Variant 71, in both tabular and graphical form to analyze efficiency and accuracy. Finally, we analyzed the strength of all 8 algorithms using the ANOVA statistical tool.\nIn many common interactive scenarios, participants lack information about other participants, and specifically about the preferences of other participants. In this work, we model an extreme case of incomplete information, which we term games with type ambiguity, where a participant lacks even the information needed to form a belief about the preferences of others. 
Under type ambiguity, one cannot analyze the scenario using the commonly used Bayesian framework, and therefore one needs to model the participants using a different decision model.   In this work, we present the ${\\rm MINthenMAX}$ decision model under ambiguity. This model is a refinement of Wald's MiniMax principle, which we show to be too coarse for games with type ambiguity. We characterize ${\\rm MINthenMAX}$ as the finest refinement of the MiniMax principle that satisfies three properties we claim are necessary for games with type ambiguity. The prior-less approach we present here also follows the common computer-science practice of worst-case analysis.   Finally, we define and analyze the corresponding equilibrium concept assuming all players follow ${\\rm MINthenMAX}$. We demonstrate this equilibrium by applying it to two common economic scenarios: coordination games and bilateral trade. We show that in both scenarios, an equilibrium in pure strategies always exists, and we analyze the equilibria.\nWe address the robot grasp optimization problem of unknown objects considering uncertainty in the input space. Grasping unknown objects can be achieved by using a trial-and-error exploration strategy. Bayesian optimization is a sample-efficient optimization algorithm that is especially suitable for this setup, as it actively reduces the number of trials for learning about the function to optimize. In fact, this active object exploration is the same strategy that infants use to learn optimal grasps. One problem that arises while learning grasping policies is that some configurations of grasp parameters may be very sensitive to error in the relative pose between the object and the robot end-effector. We call these configurations unsafe because small errors during grasp execution may turn good grasps into bad grasps. Therefore, to reduce the risk of grasp failure, grasps should be planned in safe areas. 
We propose a new algorithm, Unscented Bayesian optimization, that is able to perform sample-efficient optimization while taking input noise into consideration to find safe optima. The contribution of Unscented Bayesian optimization is twofold, as it provides a new decision process that drives exploration to safe regions and a new selection procedure that chooses the optimum in terms of its safety without extra analysis or computational cost. Both contributions are rooted in the strong theory behind the unscented transformation, a popular nonlinear approximation method. We show its advantages with respect to classical Bayesian optimization both in synthetic problems and in realistic robot grasp simulations. The results highlight that our method achieves optimal and robust grasping policies after few trials while the selected grasps remain in safe regions.\nWe investigate a paradigm in multi-task reinforcement learning (MT-RL) in which an agent is placed in an environment and needs to learn to perform a series of tasks within this space. Since the environment does not change, there is potentially a lot of common ground amongst tasks, and learning to solve them individually seems extremely wasteful. In this paper, we explicitly model and learn this shared structure as it arises in the state-action value space. We will show how one can jointly learn optimal value functions by modifying the popular Value-Iteration and Policy-Iteration procedures to accommodate this shared representation assumption and leverage the power of multi-task supervised learning. Finally, we demonstrate that the proposed model and training procedures are able to infer good value functions, even in low-sample regimes. In addition to data efficiency, we will show in our analysis that learning abstractions of the state space jointly across tasks leads to more robust, transferable representations with the potential for better generalization. 
\nWe describe a learning-based approach to hand-eye coordination for robotic grasping from monocular images. To learn hand-eye coordination for grasping, we trained a large convolutional neural network to predict the probability that task-space motion of the gripper will result in successful grasps, using only monocular camera images and independently of camera calibration or the current robot pose. This requires the network to observe the spatial relationship between the gripper and objects in the scene, thus learning hand-eye coordination. We then use this network to servo the gripper in real time to achieve successful grasps. To train our network, we collected over 800,000 grasp attempts over the course of two months, using between 6 and 14 robotic manipulators at any given time, with differences in camera placement and hardware. Our experimental evaluation demonstrates that our method achieves effective real-time control, can successfully grasp novel objects, and corrects mistakes by continuous servoing.\nAdditive utility function models are widely used in multiple criteria decision analysis. In such models, a numerical value is associated to each alternative involved in the decision problem. It is computed by aggregating the scores of the alternative on the different criteria of the decision problem. The score of an alternative is determined by a marginal value function that evolves monotonically as a function of the performance of the alternative on this criterion. 
Determining the shape of the marginals is not easy for a decision maker. It is easier for him/her to make statements such as \"alternative $a$ is preferred to $b$\". In order to help the decision maker, UTA disaggregation procedures use linear programming to approximate the marginals by piecewise linear functions based only on such statements. In this paper, we propose to infer polynomials and splines instead of piecewise linear functions for the marginals. To this end, we use semidefinite programming instead of linear programming. We illustrate this new elicitation method and present some experimental results.\nRecent research has shown great progress on fine-grained entity typing. Most existing methods require pre-defining a set of types and training a multi-class classifier from a large labeled data set based on multi-level linguistic features. They are thus limited to certain domains, genres and languages. In this paper, we propose a novel unsupervised entity typing framework by combining symbolic and distributional semantics. We start by learning general embeddings for each entity mention, then compose the embeddings of specific contexts using linguistic structures, link the mention to knowledge bases, and learn its related knowledge representations. Then we develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework doesn't rely on any annotated data, predefined typing schema, or hand-crafted features; therefore, it can be quickly adapted to a new domain, genre and language. Furthermore, it has great flexibility in incorporating linguistic structures (e.g., Abstract Meaning Representation (AMR), dependency relations) to improve specific context representation. Experiments on genres (news and discussion forum) show comparable performance to state-of-the-art supervised typing systems trained on a large amount of labeled data. 
Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.\nNearly all previous work on geo-locating latent states and activities from social media confounds general discussions about activities, self-reports of users participating in those activities at times in the past or future, and self-reports made at the immediate time and place the activity occurs. Activities, such as alcohol consumption, may occur at different places and types of places, and it is important not only to detect the local regions where these activities occur, but also to analyze the degree of participation in them by local residents. In this paper, we develop new machine learning based methods for fine-grained localization of activities and home locations from Twitter data. We apply these methods to discover and compare alcohol consumption patterns in a large urban area, New York City, and a more suburban and rural area, Monroe County. We find positive correlations between the rate of alcohol consumption reported among a community's Twitter users and the density of alcohol outlets, demonstrating that the degree of correlation varies significantly between urban and suburban areas. While our experiments are focused on alcohol use, our methods for locating homes and distinguishing temporally-specific self-reports are applicable to a broad range of behaviors and latent states.\nTwo fundamental problems in computational game theory are computing a Nash equilibrium and learning to exploit opponents given observations of their play (opponent exploitation). The latter is perhaps even more important than the former: Nash equilibrium does not have a compelling theoretical justification in game classes other than two-player zero-sum, and for all games one can potentially do better by exploiting perceived weaknesses of the opponent than by following a static equilibrium strategy throughout the match. 
The natural setting for opponent exploitation is the Bayesian setting where we have a prior model that is integrated with observations to create a posterior opponent model that we respond to. The most natural, and a well-studied prior distribution is the Dirichlet distribution. An exact polynomial-time algorithm is known for best-responding to the posterior distribution for an opponent assuming a Dirichlet prior with multinomial sampling in normal-form games; however, for imperfect-information games the best known algorithm is based on approximating an infinite integral without theoretical guarantees. We present the first exact algorithm for a natural class of imperfect-information games. We demonstrate that our algorithm runs quickly in practice and outperforms the best prior approaches. We also present an algorithm for the uniform prior setting.\nOne goal of online social recommendation systems is to harness the wisdom of crowds in order to identify high quality content. Yet the sequential voting mechanisms that are commonly used by these systems are at odds with existing theoretical and empirical literature on optimal aggregation. This literature suggests that sequential voting will promote herding---the tendency for individuals to copy the decisions of others around them---and hence lead to suboptimal content recommendation. Is there a problem with our practice, or a problem with our theory? Previous attempts at answering this question have been limited by a lack of objective measurements of content quality. Quality is typically defined endogenously as the popularity of content in absence of social influence. The flaw of this metric is its presupposition that the preferences of the crowd are aligned with underlying quality. Domains in which content quality can be defined exogenously and measured objectively are thus needed in order to better assess the design choices of social recommendation systems. 
In this work, we look to the domain of education, where content quality can be measured via how well students are able to learn from the material presented to them. Through a behavioral experiment involving a simulated massive open online course (MOOC) run on Amazon Mechanical Turk, we show that sequential voting systems can surface better content than systems that elicit independent votes.\nWe propose Turing Learning, a novel system identification method for inferring the behavior of natural or artificial systems. Turing Learning simultaneously optimizes two populations of computer programs, one representing models of the behavior of the system under investigation, and the other representing classifiers. By observing the behavior of the system as well as the behaviors produced by the models, two sets of data samples are obtained. The classifiers are rewarded for discriminating between these two sets, that is, for correctly categorizing data samples as either genuine or counterfeit. Conversely, the models are rewarded for 'tricking' the classifiers into categorizing their data samples as genuine. Unlike other methods for system identification, Turing Learning does not require predefined metrics to quantify the difference between the system and its models. We present two case studies with swarms of simulated robots and prove that the underlying behaviors cannot be inferred by a metric-based system identification method. By contrast, Turing Learning infers the behaviors with high accuracy. It also produces a useful by-product - the classifiers - that can be used to detect abnormal behavior in the swarm. Moreover, we show that Turing Learning also successfully infers the behavior of physical robot swarms. The results show that collective behaviors can be directly inferred from motion trajectories of individuals in the swarm, which may have significant implications for the study of animal collectives. 
Furthermore, Turing Learning could prove useful whenever a behavior is not easily characterizable using metrics, making it suitable for a wide range of applications.\nHumans have an impressive ability to reason about new concepts and experiences from just a single example. In particular, humans have an ability for one-shot generalization: an ability to encounter a new concept, understand its structure, and then be able to generate compelling alternative variations of the concept. We develop machine learning systems with this important capacity by developing new deep generative models, models that combine the representational power of deep learning with the inferential power of Bayesian reasoning. We develop a class of sequential generative models that are built on the principles of feedback and attention. These two characteristics lead to generative models that are among the state of the art in density estimation and image generation. We demonstrate the one-shot generalization ability of our models using three tasks: unconditional sampling, generating new exemplars of a given concept, and generating new exemplars of a family of concepts. In all cases, our models are able to generate compelling and diverse samples---having seen new examples just once---providing an important class of general-purpose models for one-shot machine learning.\nThis paper presents a Neural Aggregation Network (NAN) for video face recognition. The network takes a face video or face image set of a person with a variable number of face images as its input, and produces a compact, fixed-dimension feature representation for recognition. The whole network is composed of two modules. The feature embedding module is a deep Convolutional Neural Network (CNN) which maps each face image to a feature vector. The aggregation module consists of two attention blocks which adaptively aggregate the feature vectors to form a single feature inside the convex hull spanned by them. 
Due to the attention mechanism, the aggregation is invariant to the image order. Our NAN is trained with a standard classification or verification loss without any extra supervision signal, and we found that it automatically learns to advocate high-quality face images while repelling low-quality ones such as blurred, occluded and improperly exposed faces. The experiments on IJB-A, YouTube Face, Celebrity-1000 video face recognition benchmarks show that it consistently outperforms naive aggregation methods and achieves the state-of-the-art accuracy.\nRecently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as \"in-the-wild\"). This is partially attributed to the fact that comprehensive \"in-the-wild\" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not been thoroughly evaluated yet is deformable face tracking \"in-the-wild\". Until now, the performance has mainly been assessed qualitatively by visually assessing the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model free tracking plus generic facial landmark localisation, as well as (c) hybrid approaches using state-of-the-art face detection, model free tracking and facial landmark localisation technologies. 
Our evaluation reveals avenues for further research on the topic.\nThere has been an explosion of work in the vision & language community during the past few years, from image captioning to video transcription and answering questions about images. These tasks have focused on literal descriptions of the image. To move beyond the literal, we choose to explore how questions about an image are often directed at commonsense inference and the abstract events evoked by objects in the image. In this paper, we introduce the novel task of Visual Question Generation (VQG), where the system is tasked with asking a natural and engaging question when shown an image. We provide three datasets which cover a variety of images from object-centric to event-centric, with considerably more abstract training data than provided to state-of-the-art captioning systems thus far. We train and test several generative and retrieval models to tackle the task of VQG. Evaluation results show that while such models ask reasonable questions for a variety of images, there is still a wide gap with human performance, which motivates further work on connecting images with commonsense knowledge and pragmatics. Our proposed task offers a new challenge to the community which we hope furthers interest in exploring deeper connections between vision & language.\nWe present a method for automatically generating repair feedback for syntax errors for introductory programming problems. Syntax errors constitute one of the largest classes of errors (34%) in our dataset of student submissions obtained from a MOOC on edX. Previous techniques for generating automated feedback on programming assignments have focused on functional correctness and style considerations of student programs. These techniques analyze the AST of the program and then perform some dynamic and symbolic analyses to compute repair feedback. 
Unfortunately, it is not possible to generate ASTs for student programs with syntax errors, and therefore previous feedback techniques are not applicable to repairing syntax errors.   We present a technique for providing feedback on syntax errors that uses recurrent neural networks (RNNs) to model syntactically valid token sequences. Our approach is inspired by recent work on learning language models from Big Code (large code corpora). For a given programming assignment, we first learn an RNN to model all valid token sequences using the set of syntactically correct student submissions. Then, for a student submission with syntax errors, we query the learnt RNN model with the prefix token sequence to predict token sequences that can fix the error by either replacing or inserting the predicted token sequence at the error location. We evaluate our technique on over 14,000 student submissions with syntax errors. Our technique can completely repair 31.69% (4501/14203) of submissions with syntax errors and, in addition, partially correct 6.39% (908/14203) of the submissions.\nWe study methods for aggregating pairwise comparison data in order to estimate outcome probabilities for future comparisons among a collection of $n$ items. Working within a flexible framework that imposes only a form of strong stochastic transitivity (SST), we introduce an adaptivity index defined by the indifference sets of the pairwise comparison probabilities. In addition to measuring the usual worst-case risk of an estimator, this adaptivity index also captures the extent to which the estimator adapts to instance-specific difficulty relative to an oracle estimator. We prove three main results that involve this adaptivity index and different algorithms. First, we propose a three-step estimator termed Count-Randomize-Least squares (CRL), and show that its adaptivity index is upper bounded by $\\sqrt{n}$ up to logarithmic factors. 
We then show that, conditional on the hardness of planted clique, no computationally efficient estimator can achieve an adaptivity index smaller than $\\sqrt{n}$. Second, we show that a regularized least squares estimator can achieve a poly-logarithmic adaptivity index, thereby demonstrating a $\\sqrt{n}$-gap between optimal and computationally achievable adaptivity. Finally, we prove that the standard least squares estimator, which is known to be optimally adaptive in several closely related problems, fails to adapt in the context of estimating pairwise probabilities.\nIntegration between biology and information science benefits both fields. Many related models have been proposed, such as computational visual cognition models, computational motor control models, integrations of both, and so on. In general, the robustness and precision of recognition is one of the key problems for object recognition models.   In this paper, inspired by features of the human recognition process and their biological mechanisms, a new integrated and dynamic framework is proposed to mimic semantic extraction, concept formation and feature re-selection in human visual processing. The main contributions of the proposed model are as follows:   (1) Semantic feature extraction: Local semantic features are learnt from episodic features that are extracted from raw images through a deep neural network;   (2) Integrated concept formation: Concepts are formed with local semantic information and structural information learnt through the network;   (3) Feature re-selection: When ambiguity is detected during the recognition process, distinctive features based on the differences between ambiguous candidates are re-selected for recognition.   
Experimental results on hand-written digit and facial shape datasets show that, compared with other methods, the proposed model exhibits higher robustness and precision for visual recognition, especially when input samples are semantically ambiguous. Meanwhile, the introduced biological mechanisms further strengthen the interaction between neuroscience and information science.\nIn this paper, we present preliminary work on an approach to fill the gap between logic-based argumentation and the numerous approaches that tackle the dynamics of abstract argumentation frameworks. Our idea is that, even when arguments and attacks are defined by means of a logical belief base, there may be some uncertainty about how accurate the content of an argument is, and hence about the presence (or absence) of attacks concerning it. We use enthymemes to illustrate this notion of uncertainty of arguments and attacks. Indeed, as argued in the literature, real arguments are often enthymemes instead of completely specified deductive arguments. This means that some parts of the pair (support, claim) may be missing because they are supposed to belong to some \"common knowledge\", and should then be deduced by the agent that receives the enthymeme. But the perception that agents have of the common knowledge may be wrong, and then a first agent may state an enthymeme that her opponent is not able to decode accurately. It is likely that the decoding of the enthymeme by the agent leads to mistaken attacks between this new argument and the existing ones. In this case, the agent can receive some information about attacks or argument acceptance statuses that disagrees with her argumentation framework. 
We exemplify a way to incorporate this new piece of information by means of existing work on the dynamics of abstract argumentation frameworks.\nStructural decomposition methods have been developed for identifying tractable classes of instances of fundamental problems in databases, such as conjunctive queries and query containment, of the constraint satisfaction problem in artificial intelligence, or more generally of the homomorphism problem over relational structures. Most structural decomposition methods can be characterized through hypergraph games that are variations of the Robber and Cops graph game that characterizes the notion of treewidth. In particular, decomposition trees somehow correspond to monotone winning strategies, where the escape space of the robber on the hypergraph is shrunk monotonically by the cops. In fact, unlike the treewidth case, there are hypergraphs where monotonic strategies do not exist, while the robber can be captured by means of more complex non-monotonic strategies. However, these powerful strategies do not correspond in general to valid decompositions. The paper provides a general way to exploit the power of non-monotonic strategies, by allowing a \"disciplined\" form of non-monotonicity, characteristic of cops playing in a greedy way. It is shown that deciding the existence of a (non-monotone) greedy winning strategy (and computing one, if any) is tractable. Moreover, despite their non-monotonicity, such strategies always induce valid decomposition trees, which can be computed efficiently based on them. As a consequence, greedy strategies allow us to define new islands of tractability for the considered problems, properly including all previously known classes of tractable instances.\nVisual recognition systems mounted on autonomous moving agents face the challenge of unconstrained data, but simultaneously have the opportunity to improve their performance by moving to acquire new views of test data. 
In this work, we first show how a recurrent neural network-based system may be trained to perform end-to-end learning of motion policies suited for this \"active recognition\" setting. Further, we hypothesize that active vision requires an agent to have the capacity to reason about the effects of its motions on its view of the world. To verify this hypothesis, we attempt to induce this capacity in our active recognition pipeline, by simultaneously learning to forecast the effects of the agent's motions on its internal representation of the environment conditional on all past views. Results across two challenging datasets confirm both that our end-to-end system successfully learns meaningful policies for active category recognition, and that \"learning to look ahead\" further boosts recognition performance.\nSeveral \"edge-discovery\" applications over graph-based data models are known to have worst-case quadratic time complexity in the nodes, even if the discovered edges are sparse. One example is the generic link discovery problem between two graphs, which has invited research interest in several communities. Specific versions of this problem include link prediction in social networks, ontology alignment between metadata-rich RDF data, approximate joins, and entity resolution between instance-rich data. As large datasets continue to proliferate, reducing quadratic complexity to make the task practical is an important research problem. Within the entity resolution community, the problem is commonly referred to as blocking. A particular class of learnable blocking schemes is known as Disjunctive Normal Form (DNF) blocking schemes, and has emerged as the state of the art for homogeneous (i.e., same-schema) tabular data. Despite the promise of these schemes, a formalism or learning framework has not been developed for them when input data instances are generic, attributed graphs possessing both node and edge heterogeneity. 
With such a development, the complexity-reducing scope of DNF schemes becomes applicable to a variety of problems, including entity resolution and type alignment between heterogeneous graphs, and link prediction in networks represented as attributed graphs. This paper presents a graph-theoretic formalism for DNF schemes, and investigates their learnability in an optimization framework. We also briefly describe an empirical case study encapsulating some of the principles in this paper.\nThe static bike rebalancing problem (SBRP) concerns the task of repositioning bikes among stations in self-service bike-sharing systems. This problem can be seen as a variant of the one-commodity pickup and delivery vehicle routing problem, where multiple visits are allowed to be performed at each station, i.e., the demand of a station is allowed to be split. Moreover, a vehicle may temporarily drop its load at a station, leaving it in excess or, alternatively, collect more bikes from a station (even all of them), thus leaving it in default. Both cases require further visits in order to meet the actual demand of such a station. This paper deals with a particular case of the SBRP, in which only a single vehicle is available and the objective is to find a least-cost route that meets the demand of all stations and does not violate the minimum (zero) and maximum (vehicle capacity) load limits along the tour. Therefore, the number of bikes to be collected or delivered at each station should be appropriately determined in order to respect such constraints. We propose an iterated local search (ILS) based heuristic to solve the problem. The ILS algorithm was tested on 980 benchmark instances from the literature and the results obtained are quite competitive when compared to other existing methods. 
Moreover, our heuristic was capable of finding most of the known optimal solutions and also of improving the results on a number of open instances.\nHumans demonstrate remarkable abilities to predict physical events in complex scenes. Two classes of models for physical scene understanding have recently been proposed: \"Intuitive Physics Engines\", or IPEs, which posit that people make predictions by running approximate probabilistic simulations in causal mental models similar in nature to video-game physics engines, and memory-based models, which make judgments based on analogies to stored experiences of previously encountered scenes and physical outcomes. Versions of the latter have recently been instantiated in convolutional neural network (CNN) architectures. Here we report four experiments that, to our knowledge, are the first rigorous comparisons of simulation-based and CNN-based models, where both approaches are concretely instantiated in algorithms that can run on raw image inputs and produce as outputs physical judgments such as whether a stack of blocks will fall. Both approaches can achieve super-human accuracy levels and can quantitatively predict human judgments to a similar degree, but only the simulation-based models generalize to novel situations in ways that people do, and are qualitatively consistent with systematic perceptual illusions and judgment asymmetries that people show.\nIn order to be effective teammates, robots need to be able to understand high-level human behavior to recognize, anticipate, and adapt to human motion. We have designed a new approach to enable robots to perceive human group motion in real-time, anticipate future actions, and synthesize their own motion accordingly. We explore this within the context of joint action, where humans and robots move together synchronously. In this paper, we present an anticipation method which takes high-level group behavior into account. 
We validate the method within a human-robot interaction scenario, where an autonomous mobile robot observes a team of human dancers, and then successfully and contingently coordinates its movements to \"join the dance\". We compared our anticipation method for moving the robot with another method that did not rely on high-level group behavior, and found that our method performed better, both in more closely synchronizing the robot's motion to the team and in exhibiting more contingent and fluent motion. These findings suggest that the robot performs better when it has an understanding of high-level group behavior than when it does not. This work will help enable others in the robotics community to build more fluent and adaptable robots in the future.\nMachines of all kinds from vehicles to industrial equipment are increasingly instrumented with hundreds of sensors. Using such data to detect anomalous behaviour is critical for safety and efficient maintenance. However, anomalies occur rarely and with great variety in such systems, so there is often insufficient anomalous data to build reliable detectors. A standard approach to mitigate this problem is to use one-class methods relying only on data from normal behaviour. Unfortunately, even these approaches are more likely to fail in the scenario of a dynamical system with manual control input(s). Normal behaviour in response to novel control input(s) might look very different to the learned detector and may therefore be incorrectly detected as anomalous. In this paper, we address this issue by modelling time-series via Ordinary Differential Equations (ODE) and utilising such an ODE model to simulate the behaviour of dynamical systems under varying control inputs. The available data is then augmented with data generated from the ODE, and the anomaly detector is retrained on this augmented dataset. 
Experiments demonstrate that ODE-augmented training data allows better coverage of possible control input(s) and results in learning more accurate distinctions between normal and anomalous behaviour in time-series.\nEpistemic logic has become a major field of philosophical logic ever since the groundbreaking work by Hintikka (1962). Despite its various successful applications in theoretical computer science, AI, and game theory, the technical development of the field has mainly focused on the propositional part, i.e., the propositional modal logics of \"knowing that\". However, knowledge is expressed in everyday life by using various other locutions such as \"knowing whether\", \"knowing what\", \"knowing how\" and so on (knowing-wh hereafter). Such knowledge expressions are better captured in quantified epistemic logic, as was already discussed at length by Hintikka (1962) and in his sequel works. This paper aims to draw attention back to such a fascinating but largely neglected topic. We first survey what Hintikka and others did in the literature of quantified epistemic logic, and then advocate a new quantifier-free approach to study the epistemic logics of knowing-wh, which we believe can balance expressivity and complexity, and capture the essential reasoning patterns about knowing-wh. We survey our recent line of work on the epistemic logics of \"knowing whether\", \"knowing what\" and \"knowing how\" to demonstrate the use of this new approach.\nWe study the Bipartite Boolean Quadratic Programming Problem (BBQP) which is an extension of the well-known Boolean Quadratic Programming Problem (BQP). Applications of the BBQP include mining discrete patterns from binary data, approximating matrices by rank-one binary matrices, computing the cut-norm of a matrix, and solving optimisation problems such as maximum weight biclique, bipartite maximum weight cut, maximum weight induced sub-graph of a bipartite graph, etc. 
For the BBQP, we first present several algorithmic components, specifically, hill climbers and mutations, and then show how to combine them in a high-performance metaheuristic. Instead of hand-tuning a standard metaheuristic to test the efficiency of the hybrid of the components, we chose to use an automated generation of a multi-component metaheuristic to save human time, and also improve objectivity in the analysis and comparisons of components. For this we designed a new metaheuristic schema which we call Conditional Markov Chain Search (CMCS). We show that CMCS is flexible enough to model several standard metaheuristics; this flexibility is controlled by multiple numeric parameters, and so is convenient for automated generation. We study the configurations revealed by our approach and show that the best of them outperforms the previous state-of-the-art BBQP algorithm by several orders of magnitude. In our experiments we use benchmark instances introduced in the preliminary version of this paper and described here, which have already become the de facto standard in the BBQP literature.\nThe generalized belief propagation (GBP), introduced by Yedidia et al., is an extension of the belief propagation (BP) algorithm, which is widely used in different problems involved in calculating exact or approximate marginals of probability distributions. In many problems, it has been observed that the accuracy of GBP considerably outperforms that of BP. However, because in general the computational complexity of GBP is higher than BP, its application is limited in practice.   In this paper, we introduce a stochastic version of GBP called stochastic generalized belief propagation (SGBP) that can be considered as an extension to the stochastic BP (SBP) algorithm introduced by Noorshams et al. They have shown that SBP reduces the complexity per iteration of BP by an order of magnitude in alphabet size. 
In contrast to SBP, SGBP can reduce the computational complexity if certain topological conditions are met by the region graph associated with a graphical model. However, this reduction can be larger than only one order of magnitude in alphabet size. In this paper, we characterize these conditions and the amount of computation gain that we can obtain by using SGBP. Finally, using proof techniques similar to those employed by Noorshams et al., for general graphical models satisfying contraction conditions, we prove the asymptotic convergence of SGBP to the unique GBP fixed point, and provide non-asymptotic upper bounds on the mean square error and on the high probability error.\nThe Ant Colony System (ACS) is, next to Ant Colony Optimization (ACO) and the MAX-MIN Ant System (MMAS), one of the most efficient metaheuristic algorithms inspired by the behavior of ants. In this article we present three novel parallel versions of the ACS for the graphics processing units (GPUs). To the best of our knowledge, this is the first such work on the ACS which shares many key elements of the ACO and the MMAS, but differences in the process of building solutions and updating the pheromone trails make obtaining an efficient parallel version for the GPUs a difficult task. The proposed parallel versions of the ACS differ mainly in their implementations of the pheromone memory. The first two use the standard pheromone matrix, and the third uses a novel selective pheromone memory. Computational experiments conducted on several Travelling Salesman Problem (TSP) instances of sizes ranging from 198 to 2392 cities showed that the parallel ACS on Nvidia Kepler GK104 GPU (1536 CUDA cores) is able to obtain a speedup of up to 24.29x vs the sequential ACS running on a single core of Intel Xeon E5-2670 CPU. 
The parallel ACS with the selective pheromone memory achieved speedups of up to 16.85x, but in most cases the obtained solutions were of significantly better quality than for the sequential ACS.\nPsychological research results have confirmed that people can have different emotional reactions to different visual stimuli. Several papers have been published on the problem of visual emotion analysis. In particular, attempts have been made to analyze and predict people's emotional reactions towards images. To this end, different kinds of hand-tuned features have been proposed. The results reported on several carefully selected and labeled small image data sets have confirmed the promise of such features. While the recent successes of many computer vision related tasks are due to the adoption of Convolutional Neural Networks (CNNs), visual emotion analysis has not achieved the same level of success. This may be primarily due to the unavailability of confidently labeled and relatively large image data sets for visual emotion analysis. In this work, we introduce a new data set, which started from 3+ million weakly labeled images of different emotions and ended up being 30 times as large as the current largest publicly available visual emotion data set. We hope that this data set encourages further research on visual emotion analysis. We also perform extensive benchmarking analyses on this large data set using state-of-the-art methods, including CNNs.\nWe address a question answering task on real-world images that is set up as a Visual Turing Test. By combining the latest advances in image representation and natural language processing, we propose Ask Your Neurons, a scalable, jointly trained, end-to-end formulation of this problem.   In contrast to previous efforts, we are facing a multi-modal problem where the language output (answer) is conditioned on visual and natural language inputs (image and question). 
We provide additional insights into the problem by analyzing how much information is contained only in the language part for which we provide a new human baseline. To study human consensus, which is related to the ambiguities inherent in this challenging task, we propose two novel metrics and collect additional answers which extend the original DAQUAR dataset to DAQUAR-Consensus.   Moreover, we also extend our analysis to VQA, a large-scale question answering about images dataset, where we investigate some particular design choices and show the importance of stronger visual models. At the same time, we achieve strong performance of our model that still uses a global image representation. Finally, based on such analysis, we refine our Ask Your Neurons on DAQUAR, which also leads to a better performance on this challenging task.\nIdentifying context-specific entity networks from aggregated data is an important task, arising often in bioinformatics and neuroimaging. Computationally, this task can be formulated as jointly estimating multiple different, but related, sparse Undirected Graphical Models (UGM) from aggregated samples across several contexts. Previous joint-UGM studies have mostly focused on sparse Gaussian Graphical Models (sGGMs) and can't identify context-specific edge patterns directly. We, therefore, propose a novel approach, SIMULE (detecting Shared and Individual parts of MULtiple graphs Explicitly) to learn multi-UGM via a constrained L1 minimization. SIMULE automatically infers both specific edge patterns that are unique to each context and shared interactions preserved among all the contexts. Through the L1 constrained formulation, this problem is cast as multiple independent subtasks of linear programming that can be solved efficiently in parallel. In addition to Gaussian data, SIMULE can also handle multivariate Nonparanormal data that greatly relaxes the normality assumption that many real-world applications do not follow. 
We provide a novel theoretical proof showing that SIMULE achieves a consistent result at the rate O(log(Kp)/n_{tot}). On multiple synthetic datasets and two biomedical datasets, SIMULE shows significant improvement over state-of-the-art multi-sGGM and single-UGM baselines.\nYield and quality improvement is of paramount importance to any manufacturing company. One of the ways of improving yield is through discovery of the root causal factors affecting yield. We propose the use of data-driven interpretable causal models to identify key factors affecting yield. We focus on factors that are measured in different stages of production and testing in the manufacturing cycle of a product. We apply causal structure learning techniques on real data collected from a manufacturing line. Specifically, the goal of this work is to learn interpretable causal models from observational data produced by manufacturing lines.   Emphasis has been given to the interpretability of the models to make them actionable in the field of manufacturing. We highlight the challenges presented by assembly line data and propose ways to alleviate them. We also identify unique characteristics of data originating from assembly lines and how to leverage them in order to improve causal discovery. Standard evaluation techniques for causal structure learning show that the learned causal models seem to closely represent the underlying latent causal relationships between different factors in the production process. These results were also validated by manufacturing domain experts who found them promising. This work demonstrates how data mining and knowledge discovery can be used for root cause analysis in the domain of manufacturing and connected industry.\nThe estimation of class prevalence, i.e., the fraction of a population that belongs to a certain class, is a very useful tool in data analytics and learning, and finds applications in many domains such as sentiment analysis, epidemiology, etc. 
For example, in sentiment analysis, the objective is often not to estimate whether a specific text conveys a positive or a negative sentiment, but rather to estimate the overall distribution of positive and negative sentiments during an event window. A popular way of performing the above task, often dubbed quantification, is to use supervised learning to train a prevalence estimator from labeled data.   Contemporary literature cites several performance measures used to measure the success of such prevalence estimators. In this paper we propose the first online stochastic algorithms for directly optimizing these quantification-specific performance measures. We also provide algorithms that optimize hybrid performance measures that seek to balance quantification and classification performance. Our algorithms present a significant advancement in the theory of multivariate optimization and we show, by a rigorous theoretical analysis, that they exhibit optimal convergence. We also report extensive experiments on benchmark and real data sets which demonstrate that our methods significantly outperform existing optimization techniques used for these performance measures.\nLearning the true ordering between objects by aggregating a set of expert opinion rank order lists is an important and ubiquitous problem in many applications ranging from social choice theory to natural language processing and search aggregation. We study the problem of unsupervised rank aggregation where no ground truth ordering information is available, neither about the true preference ordering between any set of objects nor about the quality of individual rank lists. Aggregating the often inconsistent and poor-quality rank lists in such an unsupervised manner is a highly challenging problem, and standard consensus-based methods are often ill-defined and difficult to solve. 
In this manuscript we propose a novel framework to bypass these issues by using object attributes to augment the standard rank aggregation framework. We design algorithms that learn joint models on both rank lists and object features to obtain an aggregated rank ordering that is more accurate and robust, and also helps weed out rank lists of dubious validity. We validate our techniques on synthetic datasets where our algorithm is able to estimate the true rank ordering even when the rank lists are corrupted. Experiments on three real datasets, MQ2008, MQ2008 and OHSUMED, show that using object features can result in significant improvement in performance over existing rank aggregation methods that do not use object information. Furthermore, when at least some of the rank lists are of high quality, our methods are able to effectively exploit their high expertise to output an aggregated rank ordering of great accuracy.\nDatabases in domains such as healthcare are routinely released to the public in aggregated form. Unfortunately, naive modeling with aggregated data may significantly diminish the accuracy of inferences at the individual level. This paper addresses the scenario where features are provided at the individual level, but the target variables are only available as histogram aggregates or order statistics. We consider a limiting case of generalized linear modeling when the target variables are only known up to permutation, and explore how this relates to permutation testing; a standard technique for assessing statistical dependency. Based on this relationship, we propose a simple algorithm to estimate the model parameters and individual level inferences via alternating imputation and standard generalized linear model fitting. Our results suggest the effectiveness of the proposed approach when, in the original data, permutation testing accurately ascertains the veracity of the linear relationship. 
The framework is extended to general histogram data with larger bins, with order statistics such as the median as a limiting case. Our experimental results on simulated data and aggregated healthcare data suggest a diminishing returns property with respect to the granularity of the histogram: when a linear relationship holds in the original data, the targets can be predicted accurately given relatively coarse histograms.\nTime-frequency methods for vibration-based gearbox fault detection have been considered the most efficient methods. Among these methods, the continuous wavelet transform (CWT), one of the best time-frequency methods, has been used for both stationary and transitory signals. Some deficiencies of CWT are the overlapping and distortion of signals. In this case, a large amount of redundant information exists, which may cause false alarms or misinterpretation by the operator. In this paper, a modified method called Exact Wavelet Analysis is used to minimize the effects of overlapping and distortion in the case of gearbox faults. To implement exact wavelet analysis, the Particle Swarm Optimization (PSO) algorithm has been used. This method has been applied to acceleration signals from a 2D acceleration sensor, acquired by an Advantech PCI-1710 card from a gearbox test setup at Amirkabir University of Technology. The gearbox has been considered in both healthy and chipped-tooth conditions. A kernelized Support Vector Machine (SVM) with radial basis functions uses the features extracted by exact wavelet analysis for classification. The efficiency of this classifier is then evaluated with the other signals acquired from the test setup. The results show that, compared with CWT, the PSO Exact Wavelet Transform has better feature-extraction ability at the price of more computational effort. 
In addition, with an equal population size, PSO exact wavelet is faster than Genetic Algorithm (GA) exact wavelet because the PSO algorithm does not involve mutation and crossover. The SVM classifier using the extracted gearbox features shows very good results, demonstrating its ability.\nThe rise in popularity and ubiquity of Twitter has made sentiment analysis of tweets an important and well-covered area of research. However, the 140-character limit imposed on tweets makes it hard to use standard linguistic methods for sentiment classification. On the other hand, what tweets lack in structure they make up for in sheer volume and rich metadata. This metadata includes geolocation, temporal and author information. We hypothesize that sentiment is dependent on all these contextual factors. Different locations, times and authors have different emotional valences. In this paper, we explored this hypothesis by utilizing distant supervision to collect millions of labelled tweets from different locations, times and authors. We used this data to analyse the variation of tweet sentiments across different authors, times and locations. Once we explored and understood the relationship between these variables and sentiment, we used a Bayesian approach to combine these variables with more standard linguistic features such as n-grams to create a Twitter sentiment classifier. This combined classifier outperforms the purely linguistic classifier, showing that integrating the rich contextual information available on Twitter into sentiment classification is a promising direction of research.\nApnea-bradycardia is one of the major clinical early indicators of late-onset sepsis occurring in approximately 7% to 10% of all neonates and in more than 25% of very low birth weight infants in the NICU. The objective of this paper was to determine whether HRV, respiration and their relationships help to diagnose infection in premature infants non-invasively in the NICU. 
Therefore, we implement Mono-Channel (MC) and Bi-Channel (BC) Analysis in two groups: sepsis (S) vs. non-sepsis (NS). Firstly, we studied RR series not only by linear methods (time domain and frequency domain) but also by non-linear methods (chaos theory and information theory). The results show that alpha Slow, alpha Fast and Sample Entropy are significant parameters to distinguish S from NS. Secondly, the question of the functional coupling of HRV and nasal respiration is addressed. The local linear correlation coefficient r^2_{t,f} was explored, while the non-linear regression coefficient h^2 was calculated in both directions. It is obvious that r^2_{t,f} within the third frequency band (0.2<f<0.4 Hz) and h^2 in both directions were complementary approaches to diagnosing sepsis. Thirdly, a feasibility study is carried out on the candidate parameters selected from MC and BC, respectively. We discovered that the proposed test based on the optimal fusion of 6 features shows good performance, with the largest AUC and a reduced probability of false alarm (PFA).\nThis paper introduces an automated skill acquisition framework in reinforcement learning which involves identifying a hierarchical description of the given task in terms of abstract states and extended actions between abstract states. Identifying such structures present in the task provides ways to simplify and speed up reinforcement learning algorithms. These structures also help to generalize such algorithms over multiple tasks without relearning policies from scratch. We use ideas from dynamical systems to find metastable regions in the state space and associate them with abstract states. The spectral clustering algorithm PCCA+ is used to identify suitable abstractions aligned to the underlying structure. Skills are defined in terms of the sequence of actions that lead to transitions between such abstract states. The connectivity information from PCCA+ is used to generate these skills or options. 
These skills are independent of the learning task and can be efficiently reused across a variety of tasks defined over the same model. This approach works well even without the exact model of the environment by using sample trajectories to construct an approximate estimate. We also present our approach to scaling the skill acquisition framework to complex tasks with large state spaces for which we perform state aggregation using the representation learned from an action conditional video prediction network and use the skill acquisition framework on the aggregated state space.\nIn this article, we propose CANDIES (Combined Approach for Novelty Detection in Intelligent Embedded Systems), a new approach to novelty detection in technical systems. We assume that in a technical system several processes interact. If we observe these processes with sensors, we are able to model the observations (samples) with a probabilistic model, where, in an ideal case, the components of the parametric mixture density model we use, correspond to the processes in the real world. Eventually, at run-time, novel processes emerge in the technical systems such as in the case of an unpredictable failure. As a consequence, new kinds of samples are observed that require an adaptation of the model. CANDIES relies on mixtures of Gaussians which can be used for classification purposes, too. New processes may emerge in regions of the models' input spaces where few samples were observed before (low-density regions) or in regions where already many samples were available (high-density regions). The latter case is more difficult, but most existing solutions focus on the former. Novelty detection in low- and high-density regions requires different detection strategies. With CANDIES, we introduce a new technique to detect novel processes in high-density regions by means of a fast online goodness-of-fit test. 
For detection in low-density regions we combine this approach with a 2SND (Two-Stage-Novelty-Detector) which we presented in preliminary work. The properties of CANDIES are evaluated using artificial data and benchmark data from the field of intrusion detection in computer networks, where the task is to detect new kinds of attacks.\nUtilities face the challenge of responding to power outages due to storms and ice damage, but most power grids are not equipped with sensors to pinpoint the precise location of the faults causing the outage. Instead, utilities have to depend primarily on phone calls (trouble calls) from customers who have lost power to guide the dispatching of utility trucks. In this paper, we develop a policy that routes a utility truck to restore outages in the power grid as quickly as possible, using phone calls to create beliefs about outages, but also using utility trucks as a mechanism for collecting additional information. This means that routing decisions change not only the physical state of the truck (as it moves from one location to another) and the grid (as the truck performs repairs), but also our belief about the network, creating the first stochastic vehicle routing problem that explicitly models information collection and belief modeling. We address the problem of managing a single utility truck, which we start by formulating as a sequential stochastic optimization model which captures our belief about the state of the grid. We propose a stochastic lookahead policy, and use Monte Carlo tree search (MCTS) to produce a practical policy that is asymptotically optimal. Simulation results show that the developed policy restores the power grid much faster compared to standard industry heuristics.\nSparse versions of principal component analysis (PCA) have imposed themselves as simple, yet powerful ways of selecting relevant features of high-dimensional data in an unsupervised manner. 
However, when several sparse principal components are computed, the interpretation of the selected variables is difficult since each axis has its own sparsity pattern and has to be interpreted separately. To overcome this drawback, we propose a Bayesian procedure called globally sparse probabilistic PCA (GSPPCA) that allows one to obtain several sparse components with the same sparsity pattern. This allows the practitioner to identify the original variables which are relevant to describe the data. To this end, using Roweis' probabilistic interpretation of PCA and a Gaussian prior on the loading matrix, we provide the first exact computation of the marginal likelihood of a Bayesian PCA model. To avoid the drawbacks of discrete model selection, a simple relaxation of this framework is presented. It allows one to find a path of models using a variational expectation-maximization algorithm. The exact marginal likelihood is then maximized over this path. This approach is illustrated on real and synthetic data sets. In particular, using unlabeled microarray data, GSPPCA infers much more relevant gene subsets than traditional sparse PCA algorithms.\nMany AI applications rely on knowledge about a relevant real-world domain that is encoded by means of some logical knowledge base (KB). The most essential benefit of logical KBs is the opportunity to perform automatic reasoning to derive implicit knowledge or to answer complex queries about the modeled domain. The feasibility of meaningful reasoning requires KBs to meet some minimal quality criteria such as logical consistency. Without adequate tool assistance, the task of resolving violated quality criteria in KBs can be extremely tough even for domain experts, especially when the problematic KB includes a large number of logical formulas or comprises complicated logical formalisms.   
Published non-interactive debugging systems often cannot localize all possible faults (incompleteness), suggest the deletion or modification of unnecessarily large parts of the KB (non-minimality), return incorrect solutions which lead to a repaired KB not satisfying the imposed quality requirements (unsoundness) or suffer from poor scalability due to the inherent complexity of the KB debugging problem. Even if a system is complete and sound and considers only minimal solutions, there are generally exponentially many solution candidates to select one from. However, any two repaired KBs obtained from these candidates differ in their semantics in terms of entailments and non-entailments. Selection of just any of these repaired KBs might result in unexpected entailments, the loss of desired entailments or unwanted changes to the KB.   This work proposes complete, sound and optimal methods for the interactive debugging of KBs that suggest the one (minimally invasive) error correction of the faulty KB that yields a repaired KB with exactly the intended semantics. Users, e.g. domain experts, are involved in the debugging process by answering automatically generated queries about the intended domain.\nAfter data selection, pre-processing, transformation, and feature extraction, knowledge extraction is not the final step in a data mining process. It is then necessary to understand this knowledge in order to apply it efficiently and effectively. Up to now, there is a lack of appropriate techniques that support this significant step. This is partly due to the fact that the assessment of knowledge is often highly subjective, e.g., regarding aspects such as novelty or usefulness. These aspects depend on the specific knowledge and requirements of the data miner. There are, however, a number of aspects that are objective and for which it is possible to provide appropriate measures. 
In this article we focus on classification problems and use probabilistic generative classifiers based on mixture density models that are quite common in data mining applications. We define objective measures to assess the informativeness, uniqueness, importance, discrimination, representativity, uncertainty, and distinguishability of rules contained in these classifiers numerically. These measures not only support a data miner in evaluating results of a data mining process based on such classifiers. As we will see in illustrative case studies, they may also be used to improve the data mining process itself or to support the later application of the extracted knowledge.\nMarkov chains and diffusion processes are indispensable tools in machine learning and statistics that are used for inference, sampling, and modeling. With the growth of large-scale datasets, the computational cost associated with simulating these stochastic processes can be considerable, and many algorithms have been proposed to approximate the underlying Markov chain or diffusion. A fundamental question is how the computational savings trade off against the statistical error incurred due to approximations. This paper develops general results that address this question. We bound the Wasserstein distance between the equilibrium distributions of two diffusions as a function of their mixing rates and the deviation in their drifts. We show that this error bound is tight in simple Gaussian settings. Our general result on continuous diffusions can be discretized to provide insights into the computational-statistical trade-off of Markov chains. As an illustration, we apply our framework to derive finite-sample error bounds of approximate unadjusted Langevin dynamics. We characterize computation-constrained settings where, by using fast-to-compute approximate gradients in the Langevin dynamics, we obtain more accurate samples compared to using the exact gradients. 
Finally, as an additional application of our approach, we quantify the accuracy of approximate zig-zag sampling. Our theoretical analyses are supported by simulation experiments.\nWeb services are software systems designed for supporting interoperable dynamic cross-enterprise interactions. The result of attacks to Web services can be catastrophic, causing the disclosure of enterprises' confidential data. As new approaches of attacking arise every day, anomaly detection systems seem to be invaluable tools in this context. The aim of this work has been to target the attacks that reside in the Web service layer and the extensible markup language (XML)-structured simple object access protocol (SOAP) messages. After studying the shortcomings of the existing solutions, a new approach for detecting anomalies in Web services is outlined. More specifically, the proposed technique illustrates how to identify anomalies by employing mining methods on XML-structured SOAP messages. This technique also takes advantage of tree-based association rule mining to extract knowledge in the training phase, which is used in the test phase to detect anomalies. In addition, this novel composition of techniques achieves a low false alarm rate while maintaining a reasonably high detection rate, which is shown by a case study.\nIn many settings people must give numerical scores to entities from a small discrete set. For instance, rating physical attractiveness from 1--5 on dating sites, or papers from 1--10 for conference reviewing. We study the problem of understanding when using a different number of options is optimal. For concreteness we assume the true underlying scores are integers from 1--100. We consider the case when scores are uniform random and Gaussian. We study when using 2, 3, 4, 5, and 10 options is optimal in these models. 
One may expect that using more options would always improve performance in this model, but we show that this is not necessarily the case, and that using fewer choices---even just two---can surprisingly be optimal in certain situations. While in theory for this setting it would be optimal to use all 100 options, in practice this is prohibitive, and it is preferable to utilize a smaller number of options due to humans' limited computational resources. Our results suggest that using a smaller number of options than is typical could be optimal in certain situations. This would have many potential applications, as settings requiring entities to be ranked by humans are ubiquitous.\nWe study the stochastic online problem of learning to influence in a social network with semi-bandit feedback, where we observe how users influence each other. The problem combines challenges of limited feedback, because the learning agent only observes the influenced portion of the network, and combinatorial number of actions, because the cardinality of the feasible set is exponential in the maximum number of influencers. We propose a computationally efficient UCB-like algorithm, IMLinUCB, and analyze it. Our regret bounds are polynomial in all quantities of interest and reflect the structure of the network and the probabilities of influence. Moreover, they do not depend on inherently large quantities, such as the cardinality of the action set. To the best of our knowledge, these are the first such results. IMLinUCB permits linear generalization and therefore is suitable for large-scale problems. 
Our experiments show that the regret of IMLinUCB scales as suggested by our upper bounds in several representative graph topologies; and based on linear generalization, IMLinUCB can significantly reduce regret of real-world influence maximization semi-bandits.\nGenerative state estimators based on probabilistic filters and smoothers are one of the most popular classes of state estimators for robots and autonomous vehicles. However, generative models have limited capacity to handle rich sensory observations, such as camera images, since they must model the entire distribution over sensor readings. Discriminative models do not suffer from this limitation, but are typically more complex to train as latent variable models for state estimation. We present an alternative approach where the parameters of the latent state distribution are directly optimized as a deterministic computation graph, resulting in a simple and effective gradient descent algorithm for training discriminative state estimators. We show that this procedure can be used to train state estimators that use complex input, such as raw camera images, which must be processed using expressive nonlinear function approximators such as convolutional neural networks. Our model can be viewed as a type of recurrent neural network, and the connection to probabilistic filtering allows us to design a network architecture that is particularly well suited for state estimation. We evaluate our approach on a synthetic tracking task with raw image inputs and on the visual odometry task in the KITTI dataset. The results show significant improvement over both standard generative approaches and regular recurrent neural networks.\nA core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information. 
However, to scale real-world interaction learning to a variety of scenes and objects, acquiring labeled data becomes increasingly impractical. To learn about physical object motion without labels, we develop an action-conditioned video prediction model that explicitly models pixel motion, by predicting a distribution over pixel motion from previous frames. Because our model explicitly predicts motion, it is partially invariant to object appearance, enabling it to generalize to previously unseen objects. To explore video prediction for real-world interactive agents, we also introduce a dataset of 59,000 robot interactions involving pushing motions, including a test set with novel objects. In this dataset, accurate prediction of videos conditioned on the robot's future actions amounts to learning a \"visual imagination\" of different futures based on different courses of action. Our experiments show that our proposed method produces more accurate video predictions both quantitatively and qualitatively, when compared to prior methods.\nIteratively reweighted least squares (IRLS) is a widely-used method in machine learning to estimate the parameters in the generalised linear models. In particular, IRLS for L1 minimisation under the linear model provides a closed-form solution in each step, which is a simple multiplication between the inverse of the weighted second moment matrix and the weighted first moment vector. When dealing with privacy sensitive data, however, developing a privacy preserving IRLS algorithm faces two challenges. First, due to the inversion of the second moment matrix, the usual sensitivity analysis in differential privacy incorporating a single datapoint perturbation gets complicated and often requires unrealistic assumptions. Second, due to its iterative nature, a significant cumulative privacy loss occurs. However, adding a high level of noise to compensate for the privacy loss prevents accurate estimates. 
Here, we develop a practical algorithm that overcomes these challenges and outputs privatised and accurate IRLS solutions. In our method, we analyse the sensitivity of each moment separately and treat the matrix inversion and multiplication as a post-processing step, which simplifies the sensitivity analysis. Furthermore, we apply the {\\it{concentrated differential privacy}} formalism, a more relaxed version of differential privacy, which requires adding significantly less noise for the same level of privacy guarantee, compared to the conventional and advanced compositions of differentially private mechanisms.\nNutrient-based meal recommendations have the potential to help individuals prevent or manage conditions such as diabetes and obesity. However, learning people's food preferences and making recommendations that simultaneously appeal to their palate and satisfy nutritional expectations are challenging. Existing approaches either only learn high-level preferences or require a prolonged learning period. We propose Yum-me, a personalized nutrient-based meal recommender system designed to meet individuals' nutritional expectations, dietary restrictions, and fine-grained food preferences. Yum-me enables a simple and accurate food preference profiling procedure via a visual quiz-based user interface, and projects the learned profile into the domain of nutritionally appropriate food options to find ones that will appeal to the user. We present the design and implementation of Yum-me, and further describe and evaluate two innovative contributions. The first contribution is an open source state-of-the-art food image analysis model, named FoodDist. We demonstrate FoodDist's superior performance through careful benchmarking and discuss its applicability across a wide array of dietary applications. The second contribution is a novel online learning framework that learns food preference from item-wise and pairwise image comparisons. 
We evaluate the framework in a field study of 227 anonymous users and demonstrate that it outperforms other baselines by a significant margin. We further conducted an end-to-end validation of the feasibility and effectiveness of Yum-me through a 60-person user study, in which Yum-me improves the recommendation acceptance rate by 42.63%.\nLarge labeled training sets are the critical building blocks of supervised learning methods and are key enablers of deep learning techniques. For some applications, creating labeled training sets is the most time-consuming and expensive part of applying machine learning. We therefore propose a paradigm for the programmatic creation of training sets called data programming in which users express weak supervision strategies or domain heuristics as labeling functions, which are programs that label subsets of the data, but that are noisy and may conflict. We show that by explicitly representing this training set labeling process as a generative model, we can \"denoise\" the generated training set, and establish theoretically that we can recover the parameters of these generative models in a handful of settings. We then show how to modify a discriminative loss function to make it noise-aware, and demonstrate our method over a range of discriminative models including logistic regression and LSTMs. Experimentally, on the 2014 TAC-KBP Slot Filling challenge, we show that data programming would have led to a new winning score, and also show that applying data programming to an LSTM model leads to a TAC-KBP score almost 6 F1 points over a state-of-the-art LSTM baseline (and into second place in the competition). 
Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable.\nUsing top-ranked documents in response to a query has been shown to be an effective approach to improve the quality of query translation in dictionary-based cross-language information retrieval. In this paper, we propose a new method for dictionary-based query translation based on dimension projection of embedded vectors from the pseudo-relevant documents in the source language to their equivalents in the target language. To this end, first we learn low-dimensional vectors of the words in the pseudo-relevant collections separately and then aim to find a query-dependent transformation matrix between the vectors of translation pairs appearing in the collections. At the next step, representation of each query term is projected to the target language and then, after using a softmax function, a query-dependent translation model is built. Finally, the model is used for query translation. Our experiments on four CLEF collections in French, Spanish, German, and Italian demonstrate that the proposed method outperforms a word embedding baseline based on bilingual shuffling and several other competitive baselines. The proposed method reaches up to 87% performance of machine translation (MT) in short queries and considerable improvements in verbose queries.\nMachines, not humans, are the world's dominant knowledge accumulators but humans remain the dominant decision makers. Interpreting and disseminating the knowledge accumulated by machines requires expertise and time, and is prone to failure. The problem of how best to convey accumulated knowledge from computers to humans is a critical bottleneck in the broader application of machine learning. 
We propose an approach based on human teaching where the problem is formalized as selecting a small subset of the data that will, with high probability, lead the human user to the correct inference. This approach, though successful for modeling human learning in simple laboratory experiments, has failed to achieve broader relevance due to challenges in formulating general and scalable algorithms. We propose general-purpose teaching via pseudo-marginal sampling and demonstrate the algorithm by teaching topic models. Simulation results show our sampling-based approach: effectively approximates the probability where ground-truth is possible via enumeration, results in data that are markedly different from those expected by random sampling, and speeds learning especially for small amounts of data. Application to movie synopsis data illustrates differences between teaching and random sampling for teaching distributions and specific topics, and demonstrates gains in scalability and applicability to real-world problems.\nWhile great strides have been made in using deep learning algorithms to solve supervised learning tasks, the problem of unsupervised learning - leveraging unlabeled examples to learn about the structure of a domain - remains a difficult unsolved challenge. Here, we explore prediction of future frames in a video sequence as an unsupervised learning rule for learning about the structure of the visual world. We describe a predictive neural network (\"PredNet\") architecture that is inspired by the concept of \"predictive coding\" from the neuroscience literature. These networks learn to predict future frames in a video sequence, with each layer in the network making local predictions and only forwarding deviations from those predictions to subsequent network layers. 
We show that these networks are able to robustly learn to predict the movement of synthetic (rendered) objects, and that in doing so, the networks learn internal representations that are useful for decoding latent object parameters (e.g. pose) that support object recognition with fewer training views. We also show that these networks can scale to complex natural image streams (car-mounted camera videos), capturing key aspects of both egocentric movement and the movement of objects in the visual scene, and the representation learned in this setting is useful for estimating the steering angle. Altogether, these results suggest that prediction represents a powerful framework for unsupervised learning, allowing for implicit learning of object and scene structure.\nWe propose a low cost and effective way to combine a free simulation software and free CAD models for modeling human-object interaction in order to improve human & object segmentation. It is intended for research scenarios related to safe human-robot collaboration (SHRC) and interaction (SHRI) in the industrial domain. Human and object modeling has been used for detecting activity, and for inferring and predicting actions; unlike those works, we model humans and objects in order to learn interactions in RGB-D data for improving segmentation. For this purpose, we define a novel density function to model a three dimensional (3D) scene in a virtual environment (VREP). This density function takes into account various possible configurations of human-object and object-object relationships and interactions governed by their affordances. Using this function, we synthesize a large, realistic and highly varied synthetic RGB-D dataset that we use for training. We train a random forest classifier, and the pixelwise predictions obtained are integrated as a unary term in a pairwise conditional random fields (CRF). 
Our evaluation shows that modeling these interactions improves segmentation performance by ~7\\% in mean average precision and recall over state-of-the-art methods that ignore these interactions in real-world data. Our approach is computationally efficient, robust and can run in real time on consumer hardware.\nTensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous \"parameter server\" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks. Several Google services use TensorFlow in production; we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model in contrast to existing systems, and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.\nIn this work we present a new methodology for orbit propagation, the hybrid perturbation theory, based on the combination of an integration method and a prediction technique. 
The former, which can be a numerical, analytical or semianalytical theory, generates an initial approximation that contains some inaccuracies derived from the fact that, in order to simplify the expressions and subsequent computations, not all the involved forces are taken into account and only low-order terms are considered, not to mention the fact that mathematical models of perturbations do not always reproduce physical phenomena with absolute precision. The prediction technique, which can be based on either statistical time series models or computational intelligence methods, is aimed at modelling and reproducing missing dynamics in the previously integrated approximation. This combination results in the precision improvement of conventional numerical, analytical and semianalytical theories for determining the position and velocity of any artificial satellite or space debris object. In order to validate this methodology, we present a family of three hybrid orbit propagators formed by the combination of three different orders of approximation of an analytical theory and a statistical time series model, and analyse their capability to process the effect produced by the flattening of the Earth. The three considered analytical components are the integration of the Kepler problem, a first-order and a second-order analytical theory, whereas the prediction technique is the same in the three cases, namely an additive Holt-Winters method.\nDeep neural networks (DNNs) have demonstrated state-of-the-art results on many pattern recognition tasks, especially vision classification problems. Understanding the inner workings of such computational brains is both fascinating basic science that is interesting in its own right - similar to why we study the human brain - and will enable researchers to further improve DNNs. One path to understanding how a neural network functions internally is to study what each of its neurons has learned to detect. 
One such method is called activation maximization (AM), which synthesizes an input (e.g. an image) that highly activates a neuron. Here we dramatically improve the qualitative state of the art of activation maximization by harnessing a powerful, learned prior: a deep generator network (DGN). The algorithm (1) generates qualitatively state-of-the-art synthetic images that look almost real, (2) reveals the features learned by each neuron in an interpretable way, (3) generalizes well to new datasets and somewhat well to different network architectures without requiring the prior to be relearned, and (4) can be considered as a high-quality generative method (in this case, by generating novel, creative, interesting, recognizable images).\nThe ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing that the latent space of such generators captures semantic variation in the data distribution. Intuitively, models trained to predict these semantic latent representations given data may serve as useful feature representations for auxiliary problems where semantics are relevant. However, in their existing form, GANs have no means of learning the inverse mapping -- projecting data back into the latent space. 
We propose Bidirectional Generative Adversarial Networks (BiGANs) as a means of learning this inverse mapping, and demonstrate that the resulting learned feature representation is useful for auxiliary supervised discrimination tasks, competitive with contemporary approaches to unsupervised and self-supervised feature learning.\nThis thesis describes work on two applications of probabilistic programming: the learning of probabilistic program code given specifications, in particular program code of one-dimensional samplers; and the facilitation of sequential Monte Carlo inference with the help of data-driven proposals. The latter is presented with experimental results on a linear Gaussian model and a non-parametric dependent Dirichlet process mixture of objects model for object recognition and tracking.   In Chapter 1 we provide a brief introduction to probabilistic programming.   In Chapter 2 we present an approach to automatic discovery of samplers in the form of probabilistic programs. We formulate a Bayesian approach to this problem by specifying a grammar-based prior over probabilistic program code. We use an approximate Bayesian computation method to learn the programs, whose executions generate samples that statistically match observed data or analytical characteristics of distributions of interest. In our experiments we leverage different probabilistic programming systems to perform Markov chain Monte Carlo sampling over the space of programs. Experimental results have demonstrated that, using the proposed methodology, we can learn approximate and even some exact samplers. Finally, we show that our results are competitive with regard to genetic programming methods.   In Chapter 3, we describe a way to facilitate sequential Monte Carlo inference in probabilistic programming using data-driven proposals. In particular, we develop a distance-based proposal for the non-parametric dependent Dirichlet process mixture of objects model. 
We implement this approach in the probabilistic programming system Anglican, and show that for that model data-driven proposals provide significant performance improvements. We also explore the possibility of using neural networks to improve data-driven proposals.\nWe introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens. There are many ways to estimate or learn the high-level coarse tokens, but we argue that a simple extraction procedure is sufficient to capture a wealth of high-level discourse semantics. Such a procedure allows training the multiresolution recurrent neural network by maximizing the exact joint log-likelihood over both sequences. In contrast to the standard log-likelihood objective w.r.t. natural language tokens (word perplexity), optimizing the joint log-likelihood biases the model towards modeling high-level abstractions. We apply the proposed model to the task of dialogue response generation in two challenging domains: the Ubuntu technical support domain, and Twitter conversations. On Ubuntu, the model outperforms competing approaches by a substantial margin, achieving state-of-the-art results according to both automatic evaluation metrics and a human evaluation study. On Twitter, the model appears to generate more relevant and on-topic responses according to automatic evaluation metrics. Finally, our experiments demonstrate that the proposed model is more adept at overcoming the sparsity of natural language and is better able to capture long-term structure.\nDocument classification for text, images and other applicable entities has long been a focus of research in academia and also finds application in many industrial settings. 
Amidst a plethora of approaches to solve such problems, machine-learning techniques have found success in a variety of scenarios. In this paper we discuss the design of a machine learning-based semi-supervised job title classification system for the online job recruitment domain currently in production at CareerBuilder.com and propose enhancements to it. The system leverages a varied collection of classification as well as clustering algorithms. These algorithms are encompassed in an architecture that facilitates leveraging existing off-the-shelf machine learning tools and techniques while taking into consideration the challenges of constructing a scalable classification system for a large taxonomy of categories. As a continuously evolving system that is still under development we first discuss the existing semi-supervised classification system which is composed of both clustering and classification components in a proximity-based classifier setup and results of which are already used across numerous products at CareerBuilder. We then elucidate our long-term goals for job title classification and propose enhancements to the existing system in the form of a two-stage coarse and fine level classifier augmentation to construct a cascade of hierarchical vertical classifiers. Preliminary results are presented using experimental evaluation on real world industrial data.\nIn this work, a new prototype-based clustering method named Evidential C-Medoids (ECMdd), which belongs to the family of medoid-based clustering for proximity data, is proposed as an extension of Fuzzy C-Medoids (FCMdd) on the theoretical framework of belief functions. In the application of FCMdd and original ECMdd, a single medoid (prototype), which is supposed to belong to the object set, is utilized to represent one class. For the sake of clarity, this kind of ECMdd using a single medoid is denoted by sECMdd. 
In real clustering applications, using only one pattern to capture or interpret a class may not adequately model different types of group structure and hence limits the clustering performance. In order to address this problem, a variation of ECMdd using multiple weighted medoids, denoted by wECMdd, is presented. Unlike sECMdd, in wECMdd objects in each cluster carry various weights describing their degree of representativeness for that class. This mechanism enables each class to be represented by more than one object. Experimental results in synthetic and real data sets clearly demonstrate the superiority of sECMdd and wECMdd. Moreover, the clustering results by wECMdd can provide richer information for the inner structure of the detected classes with the help of prototype weights.\nModeling textual or visual information with vector representations trained from large language or visual datasets has been successfully explored in recent years. However, tasks such as visual question answering require combining these vector representations with each other. Approaches to multimodal pooling include element-wise product or sum, as well as concatenation of the visual and textual representations. We hypothesize that these methods are not as expressive as an outer product of the visual and textual vectors. As the outer product is typically infeasible due to its high dimensionality, we instead propose utilizing Multimodal Compact Bilinear pooling (MCB) to efficiently and expressively combine multimodal features. We extensively evaluate MCB on the visual question answering and grounding tasks. We consistently show the benefit of MCB over ablations without MCB. For visual question answering, we present an architecture which uses MCB twice, once for predicting attention over spatial features and again to combine the attended representation with the question representation. 
This model outperforms the state-of-the-art on the Visual7W dataset and the VQA challenge.\nThis publication presents a relation computation or calculus for international relations using mathematical modeling. It examines trust in international relations and its calculus, which is related to Bayesian inference, Dempster-Shafer theory and subjective logic. Based on a review of the literature, we found no work discussing a calculus method for international relations. To bridge this research gap, we propose a relation algebra method for international relations computation. The proposed method will allow a relation computation which was previously subjective and incomputable. We also present three international relations as case studies to demonstrate the proposed method in real-world scenarios. The method will deliver relation computation for international relations to support decision makers in government, such as the foreign ministry, the defense ministry, or the presidential or prime minister's office. The Department of Defense (DoD) may use our method to determine whether a nation can be identified as a friendly, neutral or hostile nation.\nAn important application of interactive machine learning is extending or amplifying the cognitive and physical capabilities of a human. To accomplish this, machines need to learn about their human users' intentions and adapt to their preferences. In most current research, a user has conveyed preferences to a machine using explicit corrective or instructive feedback; explicit feedback imposes a cognitive load on the user and is expensive in terms of human effort. The primary objective of the current work is to demonstrate that a learning agent can reduce the amount of explicit feedback required for adapting to the user's preferences pertaining to a task by learning to perceive a value of its behavior from the human user, particularly from the user's facial expressions, an approach we call face valuing. 
We empirically evaluate face valuing on a grip selection task. Our preliminary results suggest that an agent can quickly adapt to a user's changing preferences with minimal explicit feedback by learning a value function that maps facial features extracted from a camera image to expected future reward. We believe that an agent learning to perceive a value from the body language of its human user is complementary to existing interactive machine learning approaches and will help in creating successful human-machine interactive applications.\nIn this paper, we propose to use deep policy networks which are trained with an advantage actor-critic method for statistically optimised dialogue systems. First, we show that, on summary state and action spaces, deep Reinforcement Learning (RL) outperforms Gaussian Processes methods. Summary state and action spaces lead to good performance but require pre-engineering effort, RL knowledge, and domain expertise. In order to remove the need to define such summary spaces, we show that deep RL can also be trained efficiently on the original state and action spaces. Dialogue systems based on partially observable Markov decision processes are known to require many dialogues to train, which makes them unappealing for practical deployment. We show that a deep RL method based on an actor-critic architecture can exploit a small amount of data very efficiently. Indeed, with only a few hundred dialogues collected with a handcrafted policy, the actor-critic deep learner is considerably bootstrapped from a combination of supervised and batch RL. In addition, convergence to an optimal policy is significantly sped up compared to other deep RL methods initialized on the data with batch RL. 
All experiments are performed on a restaurant domain derived from the Dialogue State Tracking Challenge 2 (DSTC2) dataset.\nQuadrant analysis of regional development imbalances is very important for assessing how far the development of certain areas has progressed, as well as the differences between them. One way to measure the inequality of development is to look at the average growth and development contribution of each sector of Gross Regional Domestic Product (GRDP) in the analyzed region and the reference region. This study discusses the development of a model to determine regional development imbalances using a fuzzy system approach and the rules of the Klassen typology. The resulting model is called fuzzy-Klassen. A Mamdani fuzzy system with product implication is used in the model as the inference engine to generate output after the defuzzification process. MATLAB is used as the analysis tool in this study. The test results for Kota Cilegon show that there are significant differences between the traditional Klassen typology analysis and the results of the developed model. The fuzzy-Klassen model shows that GRDP sector inequality in Cilegon City is dominated by Quadrant I (K4), i.e., sectors that are advanced and growing exponentially. In contrast, under the traditional Klassen typology, half of the GRDP sectors fall in Quadrant IV (K4), i.e., sectors with relatively lagging status.\nModern SAT solvers have made remarkable progress in solving industrial instances. Most of the techniques have been developed after an intensive experimental process. It is believed that these techniques exploit the underlying structure of industrial instances. However, there are few works trying to exactly characterize the main features of this structure. The research community on complex networks has developed techniques of analysis and algorithms to study real-world graphs that can be used by the SAT community. 
Recently, there have been some attempts to analyze the structure of industrial SAT instances in terms of complex networks, with the aim of explaining the success of SAT solving techniques, and possibly improving them. In this paper, inspired by the results on complex networks, we study the community structure, or modularity, of industrial SAT instances. In a graph with clear community structure, or high modularity, we can find a partition of its nodes into communities such that most edges connect variables of the same community. In our analysis, we represent SAT instances as graphs, and we show that most application benchmarks are characterized by a high modularity. In contrast, random SAT instances are closer to the classical Erd\\\"os-R\\'enyi random graph model, where no structure can be observed. We also analyze how this structure evolves during the execution of the SAT solver. We detect that new clauses learnt by the solver during the search contribute to destroying the original community structure of the formula. This partially explains the distinct performance of SAT solvers on random and industrial SAT instances.\nChoosing a good location when opening a new store is crucial for the future success of a business. Traditional methods include offline manual surveys, which are very time consuming, and analytic models based on census data, which are unable to adapt to the dynamic market. The rapid increase in the availability of big data from various types of mobile devices, such as online query data and offline positioning data, provides us with the possibility to develop automatic and accurate data-driven prediction models for business store placement. In this paper, we propose a Demand Distribution Driven Store Placement (D3SP) framework for business store placement by mining search query data from Baidu Maps. 
D3SP first detects the spatial-temporal distributions of customer demands on different business services via query data from Baidu Maps, the largest online map search engine in China, and detects the gaps between demand and supply. Then we determine candidate locations by clustering these gaps. In the final stage, we solve the location optimization problem by predicting and ranking the number of customers. We not only deploy supervised regression models to predict the number of customers, but also learning-to-rank models to directly rank the locations. We evaluate our framework on various types of businesses in real-world cases, and the experimental results demonstrate the effectiveness of our methods. D3SP, as the core function for store placement, has already been implemented as a core component of our business analytics platform and could potentially be used by chain store merchants on Baidu Nuomi.\nRedescription mining is a field of knowledge discovery that aims at finding different descriptions of similar subsets of instances in the data. These descriptions are represented as rules inferred from one or more disjoint sets of attributes, called views. As such, they support the knowledge discovery process and help domain experts in formulating new hypotheses or constructing new knowledge bases and decision support systems. In contrast to previous approaches that typically create one smaller set of redescriptions satisfying a pre-defined set of constraints, we introduce a framework that creates a large and heterogeneous redescription set from which the user/expert can extract compact sets of differing properties, according to their own preferences. Construction of a large and heterogeneous redescription set relies on the CLUS-RM algorithm and a novel, conjunctive refinement procedure that facilitates generation of larger and more accurate redescription sets. 
The work also introduces the variability of redescription accuracy when missing values are present in the data, which significantly extends the applicability of the method. A crucial part of the framework is the redescription set extraction based on a heuristic multi-objective optimization procedure that allows the user to define importance levels towards one or more redescription quality criteria. We provide both a theoretical and an empirical comparison of the novel framework against current state-of-the-art redescription mining algorithms and show that it represents a more efficient and versatile approach for mining redescriptions from data.\nIn this paper, we introduce Entropy/IP: a system that discovers Internet address structure based on analyses of a subset of IPv6 addresses known to be active, i.e., training data, gleaned by readily available passive and active means. The system is completely automated and employs a combination of information-theoretic and machine learning techniques to probabilistically model IPv6 addresses. We present results showing that our system is effective in exposing structural characteristics of portions of the IPv6 Internet address space populated by active client, service, and router addresses. In addition to visualizing the address structure for exploration, the system uses its models to generate candidate target addresses for scanning. For each of 15 evaluated datasets, we train on 1K addresses and generate 1M candidates for scanning. We achieve some success in 14 datasets, finding up to 40% of the generated addresses to be active. In 11 of these datasets, we find active network identifiers (e.g., /64 prefixes or 'subnets') not seen in training. 
Thus, we provide the first evidence that it is practical to discover subnets and hosts by scanning probabilistically selected areas of the IPv6 address space not known to contain active hosts a priori.\nRecognizing the actions of others from visual stimuli is a crucial aspect of human visual perception that allows individuals to respond to social cues. Humans are able to identify similar behaviors and discriminate between distinct actions despite transformations, like changes in viewpoint or actor, that substantially alter the visual appearance of a scene. This ability to generalize across complex transformations is a hallmark of human visual intelligence. Advances in understanding motion perception at the neural level have not always translated in precise accounts of the computational principles underlying what representation our visual cortex evolved or learned to compute. Here we test the hypothesis that invariant action discrimination might fill this gap. Recently, the study of artificial systems for static object perception has produced models, CNNs, that achieve human level performance in complex discriminative tasks. Within this class of models, architectures that better support invariant object recognition also produce image representations that match those implied by human and primate neural data. However, whether these models produce representations of action sequences that support recognition across complex transformations and closely follow neural representations remains unknown. Here we show that spatiotemporal CNNs appropriately categorize video stimuli into actions, and that deliberate model modifications that improve performance on an invariant action recognition task lead to data representations that better match human neural recordings. 
Our results support our hypothesis that performance on invariant discrimination dictates the neural representations of actions computed by human visual cortex.\nIn classical reinforcement learning, when exploring an environment, agents accept arbitrary short term loss for long term gain. This is infeasible for safety critical applications, such as robotics, where even a single unsafe action may cause system failure. In this paper, we address the problem of safely exploring finite Markov decision processes (MDP). We define safety in terms of an, a priori unknown, safety constraint that depends on states and actions. We aim to explore the MDP under this constraint, assuming that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop a novel algorithm for this task and prove that it is able to completely explore the safely reachable part of the MDP without violating the safety constraint. To achieve this, it cautiously explores safe states and actions in order to gain statistical confidence about the safety of unvisited state-action pairs from noisy observations collected while navigating the environment. Moreover, the algorithm explicitly considers reachability when exploring the MDP, ensuring that it does not get stuck in any state with no safe way out. We demonstrate our method on digital terrain models for the task of exploring an unknown map with a rover.\nAn increasing number of domains are providing us with detailed trace data on human decisions in settings where we can evaluate the quality of these decisions via an algorithm. Motivated by this development, an emerging line of work has begun to consider whether we can characterize and predict the kinds of decisions where people are likely to make errors.   
To investigate what a general framework for human error prediction might look like, we focus on a model system with a rich history in the behavioral sciences: the decisions made by chess players as they select moves in a game. We carry out our analysis at a large scale, employing datasets with several million recorded games, and using chess tablebases to acquire a form of ground truth for a subset of chess positions that have been completely solved by computers but remain challenging even for the best players in the world.   We organize our analysis around three categories of features that we argue are present in most settings where the analysis of human error is applicable: the skill of the decision-maker, the time available to make the decision, and the inherent difficulty of the decision. We identify rich structure in all three of these categories of features, and find strong evidence that in our domain, features describing the inherent difficulty of an instance are significantly more powerful than features based on skill or time.\nRecruiters usually spend less than a minute looking at each r\\'esum\\'e when deciding whether it's worth continuing the recruitment process with the candidate. Recruiters focus on keywords, and it's almost impossible to guarantee a fair process of candidate selection. The main scope of this paper is to tackle this issue by introducing a data-driven approach that shows how to process r\\'esum\\'es automatically and give recruiters more time to only examine promising candidates. Furthermore, we show how to leverage Machine Learning and Natural Language Processing in order to extract all required information from the r\\'esum\\'es. Once the information is extracted, a ranking score is calculated. The score describes how well the candidates fit based on their education, work experience and skills. Later this paper illustrates a prototype application that shows how this novel approach can increase the productivity of recruiters. 
The application enables them to filter and rank candidates based on predefined job descriptions. Guided by the ranking, recruiters can get deeper insights from candidate profiles and validate why and how the application ranked them. This application shows how to improve the hiring process by providing unbiased hiring decision support.\nMulti-label classification has received considerable interest in recent years. Multi-label classifiers have to address many problems, including handling large-scale datasets with many instances and a large set of labels, compensating for missing label assignments in the training set, considering correlations between labels, as well as exploiting unlabeled data to improve prediction performance. To tackle datasets with a large set of labels, embedding-based methods have been proposed which seek to represent the label assignments in a low-dimensional space. Many state-of-the-art embedding-based methods use a linear dimensionality reduction to represent the label assignments in a low-dimensional space. However, by doing so, these methods actually neglect the tail labels, i.e., labels that are infrequently assigned to instances. We propose an embedding-based method that non-linearly embeds the label vectors using a stochastic approach, thereby predicting the tail labels more accurately. Moreover, the proposed method has excellent mechanisms for handling missing labels, dealing with large-scale datasets, as well as exploiting unlabeled data. To the best of our knowledge, our proposed method is the first multi-label classifier that simultaneously addresses all of the mentioned challenges. Experiments on real-world datasets show that our method outperforms state-of-the-art multi-label classifiers by a large margin, in terms of prediction performance, as well as training time.\nThe number of students opting for Engineering as their discipline is increasing rapidly. However, due to various factors and inappropriate primary education in India, failure rates are high. 
Students are unable to excel in core engineering because the subjects are complex and mathematical. Hence, they fail in such subjects. With the help of data mining techniques, we can predict the performance of students in terms of grades and failure in subjects. This paper performs a comparative analysis of various classification techniques, such as Na\\\"ive Bayes, LibSVM, J48, Random Forest, and JRip, and tries to choose the best among them. Based on the results obtained, we found that Na\\\"ive Bayes is the most accurate method for student failure prediction and JRip is the most accurate for student grade prediction. We also found that JRip differs only marginally from Na\\\"ive Bayes in accuracy for student failure prediction and gives us a set of rules from which we derive the key factors influencing students' performance. Finally, we suggest various ways to mitigate these factors. This study is limited to scenarios in the Indian education system. However, the factors found can be helpful in other scenarios as well.\nWe introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word. To succeed on LAMBADA, computational models cannot simply rely on local context, but must be able to keep track of information in the broader discourse. We show that LAMBADA exemplifies a wide range of linguistic phenomena, and that none of several state-of-the-art language models reaches accuracy above 1% on this novel benchmark. 
We thus propose LAMBADA as a challenging test set, meant to encourage the development of new models capable of genuine understanding of broad context in natural language text.\nFor an autonomous agent, executing a poor policy may be costly or even dangerous. For such agents, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. Current methods for exact high confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound. Existing model-based methods only address the problem in discrete state spaces. Since exact bounds are intractable for many domains we trade off strict guarantees of safety for more data-efficient approximate bounds. In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces. Since direct use of a model may introduce bias, we derive a theoretical upper bound on model bias for when the model transition function is estimated with i.i.d. trajectories. This bound broadens our understanding of the conditions under which model-based methods have high bias. Finally, we empirically evaluate our proposed methods and analyze the settings in which different bootstrapping off-policy confidence interval methods succeed and fail.\nThe research community has considered in the past the application of Artificial Intelligence (AI) techniques to control and operate networks. A notable example is the Knowledge Plane proposed by D.Clark et al. However, such techniques have not been extensively prototyped or deployed in the field yet. 
In this paper, we explore the reasons for the lack of adoption and posit that the rise of two recent paradigms: Software-Defined Networking (SDN) and Network Analytics (NA), will facilitate the adoption of AI techniques in the context of network operation and control. We describe a new paradigm that accommodates and exploits SDN, NA and AI, and provide use cases that illustrate its applicability and benefits. We also present simple experimental results that support its feasibility. We refer to this new paradigm as Knowledge-Defined Networking (KDN).\nOptimal power flow (OPF) is the central optimization problem in electric power grids. Although solved routinely in the course of power grid operations, it is known to be strongly NP-hard in general, and weakly NP-hard over tree networks. In this paper, we formulate the optimal power flow problem over tree networks as an inference problem over a tree-structured graphical model where the nodal variables are low-dimensional vectors. We adapt the standard dynamic programming algorithm for inference over a tree-structured graphical model to the OPF problem. Combining this with an interval discretization of the nodal variables, we develop an approximation algorithm for the OPF problem. Further, we use techniques from constraint programming (CP) to perform interval computations and adaptive bound propagation to obtain practically efficient algorithms. Compared to previous algorithms that solve OPF with optimality guarantees using convex relaxations, our approach is able to work for arbitrary distribution networks and handle mixed-integer optimization problems. Further, it can be implemented in a distributed message-passing fashion that is scalable and is suitable for \"smart grid\" applications like control of distributed energy resources. 
We evaluate our technique numerically on several benchmark networks and show that practical OPF problems can be solved effectively using this approach.\nThis paper contributes a preliminary report on the advantages and disadvantages of incorporating simultaneous human control and feedback signals in the training of a reinforcement learning robotic agent. While robotic human-machine interfaces have become increasingly complex in both form and function, control remains challenging for users. This has resulted in an increasing gap between user control approaches and the number of robotic motors which can be controlled. One way to address this gap is to shift some autonomy to the robot. Semi-autonomous actions of the robotic agent can then be shaped by human feedback, simplifying user control. Most prior work on agent shaping by humans has incorporated training with feedback, or has included indirect control signals. By contrast, in this paper we explore how a human can provide concurrent feedback signals and real-time myoelectric control signals to train a robot's actor-critic reinforcement learning control system. Using both a physical and a simulated robotic system, we compare training performance on a simple movement task when reward is derived from the environment, when reward is provided by the human, and combinations of these two approaches. Our results indicate that some benefit can be gained with the inclusion of human generated feedback.\nAttack graphs provide compact representations of the attack paths that an attacker can follow to compromise network resources by analysing network vulnerabilities and topology. These representations are a powerful tool for security risk assessment. Bayesian inference on attack graphs enables the estimation of the risk of compromise to the system's components given their vulnerabilities and interconnections, and accounts for multi-step attacks spreading through the system. 
Whilst static analysis considers the risk posture at rest, dynamic analysis also accounts for evidence of compromise, e.g. from SIEM software or forensic investigation. However, in this context, exact Bayesian inference techniques do not scale well. In this paper we show how Loopy Belief Propagation, an approximate inference technique, can be applied to attack graphs, and that it scales linearly in the number of nodes for both static and dynamic analysis, making such analyses viable for larger networks. We experiment with different topologies and network clustering on synthetic Bayesian attack graphs with thousands of nodes to show that the algorithm's accuracy is acceptable and that it converges to a stable solution. We compare sequential and parallel versions of Loopy Belief Propagation with exact inference techniques for both static and dynamic analysis, showing the advantages of approximate inference techniques in scaling to larger attack graphs.\nIn recent years online advertising has become increasingly ubiquitous and effective. Advertisements shown to visitors fund sites and apps that publish digital content, manage social networks, and operate e-mail services. Given such a large variety of internet resources, determining an appropriate type of advertising for a given platform has become critical to financial success. Native advertisements, namely ads that are similar in look and feel to content, have had great success in news and social feeds. However, to date there has not been a winning formula for ads in e-mail clients. In this paper we describe a system that leverages user purchase history determined from e-mail receipts to deliver highly personalized product ads to Yahoo Mail users. We propose to use a novel neural language-based algorithm specifically tailored for delivering effective product recommendations, which was evaluated against baselines that included showing popular products and products predicted based on co-occurrence. 
We conducted rigorous offline testing using a large-scale product purchase data set, covering purchases of more than 29 million users from 172 e-commerce websites. Ads in the form of product recommendations were successfully tested on online traffic, where we observed a steady 9% lift in click-through rates over other ad formats in mail, as well as comparable lift in conversion rates. Following successful tests, the system was launched into production during the holiday season of 2014.\nDistributed knowledge is the sum of the knowledge in a group; what someone who is able to discern between two possible worlds whenever any member of the group can discern between them, would know. Sometimes distributed knowledge is referred to as the potential knowledge of a group, or the joint knowledge they could obtain if they had unlimited means of communication. In epistemic logic, the formula D_G{\\phi} is intended to express the fact that group G has distributed knowledge of {\\phi}, that there is enough information in the group to infer {\\phi}. But this is not the same as reasoning about what happens if the members of the group share their information. In this paper we introduce an operator R_G, such that R_G{\\phi} means that {\\phi} is true after G have shared all their information with each other - after G's distributed knowledge has been resolved. The R_G operators are called resolution operators. Semantically, we say that an expression R_G{\\phi} is true iff {\\phi} is true in what van Benthem [11, p. 249] calls (G's) communication core; the model update obtained by removing links to states for members of G that are not linked by all members of G. We study logics with different combinations of resolution operators and operators for common and distributed knowledge. Of particular interest is the relationship between distributed and common knowledge. 
The main results are sound and complete axiomatizations.\nInformation delivery in a network of agents is a key issue for large, complex systems that need to do so in a predictable, efficient manner. The delivery of information in such multi-agent systems is typically implemented through routing protocols that determine how information flows through the network. Different routing protocols exist, each with its own benefits, but it is generally unclear which properties can be successfully combined within a given algorithm. We approach this problem from the axiomatic point of view, i.e., we try to establish what properties we would seek to see in such a system, and examine the different properties which uniquely define common routing algorithms used today. We examine several desirable properties, such as robustness, which ensures that adding nodes and edges does not change the routing in radical, unpredictable ways; and properties that depend on the operating environment, such as an \"economic model\", where nodes choose their paths based on the cost they are charged to pass information to the next node. We proceed to fully characterize minimal spanning tree, shortest path, and weakest link routing algorithms, showing a tight set of axioms for each.\nAlthough several RDF knowledge bases are available through the LOD initiative, the ontology schema of such linked datasets is not very rich. In particular, they lack object properties. The problem of finding new object properties (and their instances) between any two given classes has not been investigated in detail in the context of Linked Data. In this paper, we present DART (Detecting Arbitrary Relations for enriching T-Boxes of Linked Data), an unsupervised solution to enrich the LOD cloud with new object properties between two given classes. DART exploits contextual similarity to identify text patterns from the web corpus that can potentially represent relations between individuals. 
These text patterns are then clustered by means of paraphrase detection to capture the object properties between the two given LOD classes. DART also performs fully automated mapping of the discovered relations to the properties in the linked dataset. This serves many purposes such as identification of completely new relations, elimination of irrelevant relations, and generation of prospective property axioms. We have empirically evaluated our approach on several pairs of classes and found that the system can indeed be used for enriching the linked datasets with new object properties and their instances. We compared DART with newOntExt system which is an offshoot of the NELL (Never-Ending Language Learning) effort. Our experiments reveal that DART gives better results than newOntExt with respect to both the correctness, as well as the number of relations.\nThis paper presents a new model for word sense disambiguation formulated in terms of evolutionary game theory, where each word to be disambiguated is represented as a node on a graph whose edges represent word relations and senses are represented as classes. The words simultaneously update their class membership preferences according to the senses that neighboring words are likely to choose. We use distributional information to weigh the influence that each word has on the decisions of the others and semantic similarity information to measure the strength of compatibility among the choices. With this information we can formulate the word sense disambiguation problem as a constraint satisfaction problem and solve it using tools derived from game theory, maintaining the textual coherence. The model is based on two ideas: similar words should be assigned to similar classes and the meaning of a word does not depend on all the words in a text but just on some of them. 
The paper provides an in-depth motivation of the idea of modeling the word sense disambiguation problem in terms of game theory, which is illustrated by an example. The conclusion presents an extensive analysis on the combination of similarity measures to use in the framework and a comparison with state-of-the-art systems. The results show that our model outperforms state-of-the-art algorithms and can be applied to different tasks and in different scenarios.\nThe systematic modelling of dynamic spatial systems is a key requirement in a wide range of application areas such as commonsense cognitive robotics, computer-aided architecture design, and dynamic geographic information systems. We present ASPMT(QS), a novel approach and fully-implemented prototype for non-monotonic spatial reasoning -a crucial requirement within dynamic spatial systems- based on Answer Set Programming Modulo Theories (ASPMT).   ASPMT(QS) consists of a (qualitative) spatial representation module (QS) and a method for turning tight ASPMT instances into Satisfiability Modulo Theories (SMT) instances in order to compute stable models by means of SMT solvers. We formalise and implement concepts of default spatial reasoning and spatial frame axioms. Spatial reasoning is performed by encoding spatial relations as systems of polynomial constraints, and solving via SMT with the theory of real nonlinear arithmetic. We empirically evaluate ASPMT(QS) in comparison with other contemporary spatial reasoning systems both within and outside the context of logic programming. ASPMT(QS) is currently the only existing system that is capable of reasoning about indirect spatial effects (i.e., addressing the ramification problem), and integrating geometric and qualitative spatial information within a non-monotonic spatial reasoning context.   
This paper is under consideration for publication in TPLP.\nKnowledge bases of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge bases are typically incomplete, it is useful to be able to perform link prediction or knowledge base completion, i.e., predict whether a relationship not in the knowledge base is likely to be true. This paper combines insights from several previous link prediction models into a new embedding model STransE that represents each entity as a low-dimensional vector, and each relation by two matrices and a translation vector. STransE is a simple combination of the SE and TransE models, but it obtains better link prediction performance on two benchmark datasets than previous embedding models. Thus, STransE can serve as a new baseline for the more complex models in the link prediction task.\nWe consider a collection of distributed units that interact with one another through the sending of messages. Each message carries a positive ($+1$) or negative ($-1$) tag and causes the receiving unit to send out messages as a function of the tags it has received and a threshold. This simple model abstracts some of the essential characteristics of several systems used in the field of artificial intelligence, and also of biological systems epitomized by the brain. We study the integration of information inside a temporal window as the model's dynamics unfolds. We quantify information integration by the total correlation, relative to the window's duration ($w$), of a set of random variables valued as a function of message arrival. Total correlation refers to the rise of information gain above and beyond that which the units already achieve individually, being therefore related to consciousness studies in some models. 
We report on extensive computational experiments that explore the interrelations of the model's parameters (two probabilities and the threshold), highlighting relevant scenarios of message traffic and how they impact the behavior of total correlation as a function of $w$. We find that total correlation can occur at significant fractions of the maximum possible value and provide semi-analytical results on the message-traffic characteristics associated with values of $w$ for which it peaks. We then reinterpret the model's parameters in terms of the current best estimates of some quantities pertaining to cortical structure and dynamics. We find the resulting possibilities for best values of $w$ to be well aligned with the time frames within which percepts are thought to be processed and eventually rendered conscious.\nA qualitative representation $\\phi$ is like an ordinary representation of a relation algebra, but instead of requiring $(a; b)^\\phi = a^\\phi | b^\\phi$, as we do for ordinary representations, we only require that $c^\\phi\\supseteq a^\\phi | b^\\phi \\iff c\\geq a ; b$, for each $c$ in the algebra. A constraint network is qualitatively satisfiable if its nodes can be mapped to elements of a qualitative representation, preserving the constraints. If a constraint network is satisfiable then it is clearly qualitatively satisfiable, but the converse can fail. However, for a wide range of relation algebras including the point algebra, the Allen Interval Algebra, RCC8 and many others, a network is satisfiable if and only if it is qualitatively satisfiable.   Unlike ordinary composition, the weak composition arising from qualitative representations need not be associative, so we can generalise by considering network satisfaction problems over non-associative algebras. 
We prove that computationally, qualitative representations have many advantages over ordinary representations: whereas many finite relation algebras have only infinite representations, every finite qualitatively representable algebra has a finite qualitative representation; the representability problem for (the atom structures of) finite non-associative algebras is NP-complete; the network satisfaction problem over a finite qualitatively representable algebra is always in NP; the validity of equations over qualitative representations is co-NP-complete. On the other hand we prove that there is no finite axiomatisation of the class of qualitatively representable algebras.\nMechanical devices such as engines, vehicles, aircraft, etc., are typically instrumented with numerous sensors to capture the behavior and health of the machine. However, there are often external factors or variables which are not captured by sensors, leading to time-series which are inherently unpredictable. For instance, manual controls and/or unmonitored environmental conditions or load may lead to inherently unpredictable time-series. Detecting anomalies in such scenarios becomes challenging using standard approaches based on mathematical models that rely on stationarity, or prediction models that utilize prediction errors to detect anomalies. We propose a Long Short Term Memory Networks based Encoder-Decoder scheme for Anomaly Detection (EncDec-AD) that learns to reconstruct 'normal' time-series behavior, and thereafter uses reconstruction error to detect anomalies. We experiment with three publicly available quasi predictable time-series datasets: power demand, space shuttle, and ECG, and two real-world engine datasets with both predictive and unpredictable behavior. We show that EncDec-AD is robust and can detect anomalies from predictable, unpredictable, periodic, aperiodic, and quasi-periodic time-series. 
Further, we show that EncDec-AD is able to detect anomalies from short time-series (length as small as 30) as well as long time-series (length as large as 500).\nWe employed the SERENDIP III system with the Arecibo radio telescope to search for possible artificial extraterrestrial signals. Over the four years of this search we covered 93% of the sky observable at Arecibo at least once and 44% of the sky five times or more with a sensitivity of ~3E-25 W/m2. The data were sent to a 4 million channel spectrum analyzer. Information was obtained from over 1E+14 independent data points and the results were then analyzed via a suite of pattern detection algorithms to identify narrow band spectral power peaks that were not readily identifiable as the product of human activity. We separately selected data coincident with interesting nearby G dwarf stars that were encountered by chance in our sky survey for suggestions of excess power peaks. The peak power distributions in both these data sets were consistent with random noise. We report upper limits on possible signals from the stars investigated and provide examples of the most interesting candidates identified in the sky survey. This paper was intended for publication in 2000 and is presented here without change from the version submitted to ApJS in 2000.\nOne of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms. In many large-scale applications, online computation and function approximation represent key strategies in scaling up reinforcement learning algorithms. In this setting, we have effective and reasonably well understood algorithms for adapting the learning-rate parameter, online during learning. Such meta-learning approaches can improve robustness of learning and enable specialization to current task, improving learning speed. 
For temporal-difference learning algorithms which we study here, there is yet another parameter, $\\lambda$, that similarly impacts learning speed and stability in practice. Unfortunately, unlike the learning-rate parameter, $\\lambda$ parametrizes the objective function that temporal-difference methods optimize. Different choices of $\\lambda$ produce different fixed-point solutions, and thus adapting $\\lambda$ online and characterizing the optimization is substantially more complex than adapting the learning-rate parameter. There is no meta-learning method for $\\lambda$ that achieves (1) incremental updating, (2) compatibility with function approximation, and (3) stability of learning under both on- and off-policy sampling. In this paper we contribute a novel objective function for optimizing $\\lambda$ as a function of state rather than time. We derive a new incremental, linear complexity $\\lambda$-adaptation algorithm that does not require offline batch updating or access to a model of the world, and present a suite of experiments illustrating the practicality of our new algorithm in three different settings. Taken together, our contributions represent a concrete step towards black-box application of temporal-difference learning methods in real world problems.\nElectricity theft is a major problem around the world in both developed and developing countries and may amount to up to 40% of the total electricity distributed. More generally, electricity theft belongs to non-technical losses (NTL), which are losses that occur during the distribution of electricity in power grids. In this paper, we build features from the neighborhood of customers. We first split the area in which the customers are located into grids of different sizes. For each grid cell we then compute the proportion of inspected customers and the proportion of NTL found among the inspected customers. 
We then analyze the distributions of features generated and show why they are useful to predict NTL. In addition, we compute features from the consumption time series of customers. We also use master data features of customers, such as their customer class and voltage of their connection. We compute these features for a Big Data base of 31M meter readings, 700K customers and 400K inspection results. We then use these features to train four machine learning algorithms that are particularly suitable for Big Data sets because of their parallelizable structure: logistic regression, k-nearest neighbors, linear support vector machine and random forest. Using the neighborhood features instead of only analyzing the time series has yielded appreciable results on Big Data sets for NTL proportions varying from 1% to 90%. This work can therefore be deployed to a wide range of different regions around the world.\nRecommendation systems usually involve exploiting the relations among known features and content that describe items (content-based filtering) or the overlap of similar users who interacted with or rated the target item (collaborative filtering). To combine these two filtering approaches, current model-based hybrid recommendation systems typically require extensive feature engineering to construct a user profile. Statistical Relational Learning (SRL) provides a straightforward way to combine the two approaches. However, due to the large scale of the data used in real world recommendation systems, little research exists on applying SRL models to hybrid recommendation systems, and essentially none of that research has been applied on real big-data-scale systems. In this paper, we propose a way to adapt the state-of-the-art in SRL learning approaches to construct a real hybrid recommendation system. Furthermore, in order to satisfy a common requirement in recommendation systems (i.e. 
that false positives are more undesirable and therefore penalized more harshly than false negatives), our approach also allows tuning the trade-off between the precision and recall of the system in a principled way. Our experimental results demonstrate the efficiency of our proposed approach as well as its improved performance on recommendation precision.\nWe present a novel form of interactive video object segmentation where a few clicks by the user help the system produce a full spatio-temporal segmentation of the object of interest. Whereas conventional interactive pipelines take the user's initialization as a starting point, we show the value in the system taking the lead even in initialization. In particular, for a given video frame, the system precomputes a ranked list of thousands of possible segmentation hypotheses (also referred to as object region proposals) using image and motion cues. Then, the user looks at the top ranked proposals, and clicks on the object boundary to carve away erroneous ones. This process iterates (typically 2-3 times), and each time the system revises the top ranked proposal set, until the user is satisfied with a resulting segmentation mask. Finally, the mask is propagated across the video to produce a spatio-temporal object tube. On three challenging datasets, we provide extensive comparisons with both existing work and simpler alternative methods. In all, the proposed Click Carving approach strikes an excellent balance of accuracy and human effort. It outperforms all similarly fast methods, and is competitive with or better than those requiring 2 to 12 times the effort.\nThis paper addresses an optimal control problem for a robot that has to find and collect a finite number of objects and move them to a depot in minimum time. The robot has fourth-order dynamics that change instantaneously at any pick-up or drop-off of an object. 
The objects are modeled by point masses with a-priori unknown locations in a bounded two-dimensional space that may contain unknown obstacles. For this hybrid system, an Optimal Control Problem (OCP) is approximately solved by a receding horizon scheme, where the derived lower bound for the cost-to-go is evaluated for the worst and for a probabilistic case, assuming a uniform distribution of the objects. First, a time-driven approximate solution based on time and position space discretization and mixed integer programming is presented. Due to the high computational cost of this solution, an alternative event-driven approximate approach based on a suitable motion parameterization and gradient-based optimization is proposed. The solutions are compared in a numerical example, suggesting that the latter approach offers a significant computational advantage while yielding similar qualitative results compared to the former. The methods are particularly relevant for various robotic applications like automated cleaning, search and rescue, harvesting or manufacturing.\nIn recent years, content recommendation systems in large websites (or \\emph{content providers}) have attracted increasing attention. While the type of content varies, e.g.\\ movies, articles, music, advertisements, etc., the high level problem remains the same: based on the knowledge obtained so far about the user, recommend the most desired content. In this paper we present a method to handle the well known user-cold-start problem in recommendation systems. In this scenario, a recommendation system encounters a new user and the objective is to present items as relevant as possible with the hope of keeping the user's session as long as possible. We formulate an optimization problem aimed to maximize the length of this initial session, as this is believed to be the key to have the user come back and perhaps register to the system. 
In particular, our model captures the fact that a single round with low quality recommendation is likely to terminate the session. In such a case, we do not proceed to the next round as the user leaves the system, possibly never to be seen again. We denote this phenomenon a \\emph{One-Shot Session}. Our optimization problem is formulated as an MDP where the action space is of a combinatorial nature, as we recommend multiple items in each round. This huge action space presents a computational challenge making the straightforward solution intractable. We analyze the structure of the MDP to prove monotone and submodular like properties that allow a computationally efficient solution via a method denoted by \\emph{Greedy Value Iteration} (G-VI).\nChoosing control inputs randomly can result in a reduced expected cost in optimal control problems with stochastic constraints, such as stochastic model predictive control (SMPC). We consider a controller with initial randomization, meaning that the controller randomly chooses from K+1 control sequences at the beginning (called K-randomization). It is known that, for a finite-state, finite-action Markov Decision Process (MDP) with K constraints, K-randomization is sufficient to achieve the minimum cost. We found that the same result holds for stochastic optimal control problems with continuous state and action spaces. Furthermore, we show the randomization of control input can result in reduced cost when the optimization problem is nonconvex, and the cost reduction is equal to the duality gap. We then provide the necessary and sufficient conditions for the optimality of a randomized solution, and develop an efficient solution method based on dual optimization. Furthermore, in a special case with K=1 such as a joint chance-constrained problem, the dual optimization can be solved even more efficiently by root finding. 
Finally, we test the theories and demonstrate the solution method on multiple practical problems ranging from path planning to the planning of entry, descent, and landing (EDL) for future Mars missions.\nThere is an impressive body of work on developing heuristics and other reasoning algorithms to guide search in optimal and anytime planning algorithms for classical planning. However, very little effort has been directed towards developing analogous techniques to guide search towards high-quality solutions in hierarchical planning formalisms like HTN planning, which allows using additional domain-specific procedural control knowledge. In lieu of such techniques, this control knowledge often needs to provide the necessary search guidance to the planning algorithm, which imposes a substantial burden on the domain author and can yield brittle or error-prone domain models. We address this gap by extending recent work on a new hierarchical goal-based planning formalism called Hierarchical Goal Network (HGN) Planning to develop the Hierarchically-Optimal Goal Decomposition Planner (HOpGDP), an HGN planning algorithm that computes hierarchically-optimal plans. HOpGDP is guided by $h_{HL}$, a new HGN planning heuristic that extends existing admissible landmark-based heuristics from classical planning to compute admissible cost estimates for HGN planning problems. Our experimental evaluation across three benchmark planning domains shows that HOpGDP compares favorably to both optimal classical planners due to its ability to use domain-specific procedural knowledge, and a blind-search version of HOpGDP due to the search guidance provided by $h_{HL}$.\nOwing to the remarkable photometric precision of space observatories like Kepler, stellar and planetary systems beyond our own are now being characterized en masse for the first time. 
These characterizations are pivotal for endeavors such as searching for Earth-like planets and solar twins, understanding the mechanisms that govern stellar evolution, and tracing the dynamics of our Galaxy. The volume of data that is becoming available, however, brings with it the need to process this information accurately and rapidly. While existing methods can constrain fundamental stellar parameters such as ages, masses, and radii from these observations, they require substantial computational efforts to do so.   We develop a method based on machine learning for rapidly estimating fundamental parameters of main-sequence solar-like stars from classical and asteroseismic observations. We first demonstrate this method on a hare-and-hound exercise and then apply it to the Sun, 16 Cyg A & B, and 34 planet-hosting candidates that have been observed by the Kepler spacecraft. We find that our estimates and their associated uncertainties are comparable to the results of other methods, but with the additional benefit of being able to explore many more stellar parameters while using much less computation time. We furthermore use this method to present evidence for an empirical diffusion-mass relation. Our method is open source and freely available for the community to use.   The source code for all analyses and for all figures appearing in this manuscript can be found electronically at https://github.com/earlbellinger/asteroseismology\nIn this work we propose a game theoretic model for document clustering. Each document to be clustered is represented as a player and each cluster as a strategy. The players receive a reward interacting with other players that they try to maximize choosing their best strategies. The geometry of the data is modeled with a weighted graph that encodes the pairwise similarity among documents, so that similar players are constrained to choose similar strategies, updating their strategy preferences at each iteration of the games. 
We used different approaches to find the prototypical elements of the clusters and with this information we divided the players into two disjoint sets, one collecting players with a definite strategy and the other one collecting players that try to learn from others the correct strategy to play. The latter set of players can be considered as new data points that have to be clustered according to previous information. This representation is useful in scenarios in which the data are streamed continuously. The evaluation of the system was conducted on 13 document datasets using different settings. It shows that the proposed method performs well compared to different document clustering algorithms.\nWe introduce LL-RNNs (Log-Linear RNNs), an extension of Recurrent Neural Networks that replaces the softmax output layer by a log-linear output layer, of which the softmax is a special case. This conceptually simple move has two main advantages. First, it allows the learner to combat training data sparsity by allowing it to model words (or more generally, output symbols) as complex combinations of attributes without requiring that each combination is directly observed in the training data (as the softmax does). Second, it permits the inclusion of flexible prior knowledge in the form of a priori specified modular features, where the neural network component learns to dynamically control the weights of a log-linear distribution exploiting these features.   We conduct experiments in the domain of language modelling of French, that exploit morphological prior knowledge and show an important decrease in perplexity relative to a baseline RNN.   
We provide other motivating illustrations, and finally argue that the log-linear and the neural-network components contribute complementary strengths to the LL-RNN: the LL aspect allows the model to incorporate rich prior knowledge, while the NN aspect, according to the \"representation learning\" paradigm, allows the model to discover novel combinations of characteristics.\nStrategy Logic (SL) is a logical formalism for strategic reasoning in multi-agent systems. Its main feature is that it has variables for strategies that are associated to specific agents with a binding operator. We introduce Graded Strategy Logic (GradedSL), an extension of SL by graded quantifiers over tuples of strategy variables, i.e., \"there exist at least g different tuples (x_1,...,x_n) of strategies\" where g is a cardinal from the set N union {aleph_0, aleph_1, 2^aleph_0}. We prove that the model-checking problem of GradedSL is decidable. We then turn to the complexity of fragments of GradedSL. When the g's are restricted to finite cardinals, written GradedNSL, the complexity of model-checking is no harder than for SL, i.e., it is non-elementary in the quantifier rank. We illustrate our formalism by showing how to count the number of different strategy profiles that are Nash equilibria (NE), or subgame-perfect equilibria (SPE). By analyzing the structure of the specific formulas involved, we conclude that the important problems of checking for the existence of a unique NE or SPE can both be solved in 2ExpTime, which is not harder than merely checking for the existence of such equilibria.\nProtein quality assessment (QA) by ranking and selecting protein models has long been viewed as one of the major challenges for protein tertiary structure prediction. In particular, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem. 
We introduce a novel single-model quality assessment method DeepQA based on a deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physio-chemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that the deep belief network performs better than Support Vector Machines and Neural Networks on the protein model quality assessment problem, and our method DeepQA achieves state-of-the-art performance on the CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of models of mostly low quality generated by ab initio modeling methods. DeepQA is a useful tool for protein single model quality assessment and protein structure prediction. The source code, executable, documentation and training/test datasets of DeepQA for Linux are freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/.\nAdvances in healthcare and in the quality of life significantly increase human life expectancy. With the ageing of populations, new and previously unencountered challenges are brought to science. The human body is naturally selected to be well-functioning until the age of reproduction to keep the species alive. However, as the lifespan extends, unseen problems due to bodily deterioration emerge. There are several age-related diseases with no appropriate treatment; therefore, the complex ageing phenomena need further understanding. Immunosenescence, the ageing of the immune system, is highly correlated to the negative effects of ageing, such as the increase of auto-inflammatory diseases and decrease in responsiveness to new diseases. 
Besides clinical and mathematical tools, we believe there is an opportunity to further exploit simulation tools to understand immunosenescence. Compared to real-world experimentation, benefits include time and cost effectiveness, given the laborious, resource-intensive nature of the biological environment, and the possibility of conducting experiments without ethical restrictions. Contrasted with mathematical models, simulation modelling is more suitable for representing complex systems and emergence. In addition, there is the belief that simulation models are easier to communicate in interdisciplinary contexts. Our work investigates the usefulness of simulations to understand immunosenescence by applying two different simulation methods, agent-based and system dynamics simulation, to a case study of immune cell depletion with age.\nHow do technology users effectively transition from having zero knowledge about a technology to making the best use of it after an authoritative technology adoption? This post-adoption user learning has received little research attention in technology management literature. In this paper we investigate user learning in authoritative technology adoption by developing an agent-based model using the case of council-led smart meter deployment in the UK City of Leeds. Energy consumers gain experience of using smart meters based on the learning curve in behavioural learning. With the agent-based model we carry out experiments to validate the model and test different energy interventions that local authorities can use to facilitate energy consumers' learning and maintain their continuous use of the technology. 
Our results show that the more easily energy consumers gain experience, the more energy-efficient they are and the more energy they can save; encouraging energy consumers' contacts via various informational means can facilitate their learning; and developing and maintaining their positive attitude toward smart metering can enable them to use the technology continuously. Contributions and energy policy/intervention implications are discussed in this paper.\nThis study optimises manually derived rule-based expert system classification of objects according to changes in their properties over time. One of the key challenges that this study tries to address is how to classify objects that exhibit changes in their behaviour over time, for example how to classify companies' share price stability over a period of time or how to classify students' preferences for subjects while they are progressing through school. A specific case the paper considers is the strategy of players in public goods games (as common in economics) across multiple consecutive games. Initial classification starts from expert definitions specifying class allocation for players based on aggregated attributes of the temporal data. Based on these initial classifications, the optimisation process tries to find an improved classifier which produces the best possible compact classes of objects (players) for every time point in the temporal data. The compactness of the classes is measured by a cost function based on internal cluster indices like the Dunn Index, distance measures like Euclidean distance or statistically derived measures like standard deviation. The paper discusses the approach in the context of incorporating changing player strategies in the aforementioned public goods games, where common classification approaches so far do not consider such changes in behaviour resulting from learning or in-game experience. 
By using the proposed process for classifying temporal data and the actual players' contributions during the games, we aim to produce a more refined classification which in turn may inform the interpretation of public goods game data.\nIn the context of cancer treatment and surgery, quality of life assessment is a crucial part of determining treatment success and viability. To assess it, questionnaires completed by patients, which use words to capture aspects of their well-being, are the norm. As the results of these questionnaires are often used to assess patient progress and to determine future treatment options, it is important to establish that the words used are interpreted in the same way by both patients and medical professionals. In this paper, we capture and model patients' perceptions and associated uncertainty about the words used to describe the level of their physical function in the highly common (in Sarcoma Services) Toronto Extremity Salvage Score (TESS) questionnaire. The paper describes the interval-valued data capture in detail, as well as the subsequent modelling of the data using fuzzy sets. Based on an initial sample of participants, we use Jaccard similarity on the resulting word models to show that there may be considerable differences in the interpretation of commonly used questionnaire terms, thus presenting a very real risk of miscommunication between patients and medical professionals as well as within the group of medical professionals.\nBig longitudinal observational medical data potentially hold a wealth of information and have been recognised as potential sources for gaining new drug safety knowledge. Unfortunately there are many complexities and underlying issues when analysing longitudinal observational data. Due to these complexities, existing methods for large-scale detection of negative side effects using observational data all tend to have difficulty distinguishing between association and causality. 
New methods that can better discriminate causal and non-causal relationships need to be developed to fully utilise the data. In this paper we propose using a set of causality considerations developed by the epidemiologist Bradford Hill as a basis for engineering features that enable the application of supervised learning to the problem of detecting negative side effects. The Bradford Hill considerations look at various perspectives of a drug and outcome relationship to determine whether it shows causal traits. We trained a classifier to find patterns within these perspectives, and it learned to discriminate between association and causality. The novelty of this research is the combination of supervised learning and Bradford Hill's causality considerations to automate Bradford Hill's causality assessment. We evaluated the framework on a drug safety gold standard known as the Observational Medical Outcomes Partnership's nonspecified association reference set. The methodology obtained excellent discriminative ability, with areas under the curve ranging from 0.792 to 0.940 (existing method optimal: 0.73) and a mean average precision of 0.640 (existing method optimal: 0.141). The proposed features can be calculated efficiently and readily updated, making the framework suitable for big observational data.\nThe blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. 
Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to \"debias\" the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.\nWord embedding methods revolve around learning continuous distributed vector representations of words with neural networks, which can capture semantic and/or syntactic cues, and in turn be used to induce similarity measures among words, sentences and documents in context. Celebrated methods can be categorized as prediction-based and count-based methods according to the training objectives and model architectures. Their pros and cons have been extensively analyzed and evaluated in recent studies, but there is relatively little work continuing the line of research to develop an enhanced learning method that brings together the advantages of the two model families. In addition, the interpretation of the learned word representations still remains somewhat opaque. Motivated by these observations and the pressing need, this paper presents a novel method for learning word representations, which not only inherits the advantages of classic word embedding methods but also offers a clearer and more rigorous interpretation of the learned word representations. 
Built upon the proposed word embedding method, we further formulate a translation-based language modeling framework for the extractive speech summarization task. A series of empirical evaluations demonstrate the effectiveness of the proposed word representation learning and language modeling techniques in extractive speech summarization.\nBackground: There has been growing research interest in automated answering of questions or generation of summaries of free-form text such as news articles. To implement this task, the computer should be able to identify the sequence of events, the duration of events, the time at which an event occurred and the relationship type between event pairs, time pairs or event-time pairs. Specific Problem: It is important to accurately identify the relationship type between combinations of event and time before the temporal ordering of events can be defined. The machine learning approach taken in Mani et al. (2006) provides an accuracy of only 62.5 on the baseline data from TimeBank. The researchers used a maximum entropy classifier in their methodology. TimeML uses the TLINK annotation to tag a relationship type between events and time. The time complexity is quadratic when it comes to tagging documents with TLINK using human annotation. This research proposes using decision trees and parsing to improve the relationship type tagging. This research aims to address the gaps in human annotation by automating the task of relationship type tagging to improve the accuracy of event and time relationships in annotated documents. Scope information: The documents from the domain of news will be used. The tagging will be performed within the same document and not across documents. The relationship types will be identified only for a pair of event and time and not a chain of events. The research focuses on documents tagged using the TimeML specification which contains tags such as EVENT, TLINK, and TIMEX. 
Each tag has attributes such as identifier, relation, POS, time, etc.\nRecently, machine learning techniques, especially predictive modeling and pattern recognition in the biomedical sciences, from drug delivery systems to medical imaging, have become important methods that help researchers gain a deeper understanding of the entire issue and solve complex medical problems. Deep learning is a powerful machine learning algorithm for classification that extracts low- to high-level features. In this paper, we used a convolutional neural network to distinguish Alzheimer's brains from normal healthy brains. The importance of classifying this kind of medical data lies in potentially developing a predictive model or system to distinguish the disease type from normal subjects or to estimate the stage of the disease. Classification of clinical data such as Alzheimer's disease has always been challenging, and the most problematic part has always been selecting the most discriminative features. Using a Convolutional Neural Network (CNN) and the well-known LeNet-5 architecture, we successfully classified structural MRI data of Alzheimer's subjects versus normal controls, with an accuracy of 98.84% on the test data. This experiment suggests that shift- and scale-invariant features extracted by a CNN, followed by deep learning classification, are the most powerful method to distinguish clinical data from healthy data in fMRI. This approach also enables us to expand our methodology to predict more complicated systems.\nEfficient usage of the knowledge provided by the Linked Data community is often hindered by the need for domain experts to formulate the right SPARQL queries to answer questions. For new questions they have to decide which datasets are suitable and in which terminology and modelling style to phrase the SPARQL query.   In this work we present an evolutionary algorithm to help with this challenging task. 
Given a training list of source-target node-pair examples, our algorithm can learn patterns (SPARQL queries) from a SPARQL endpoint. The learned patterns can be visualised to form the basis for further investigation, or they can be used to predict target nodes for new source nodes.   Amongst others, we apply our algorithm to a dataset of several hundred human associations (such as \"circle - square\") to find patterns for them in DBpedia. We show the scalability of the algorithm by running it against a SPARQL endpoint loaded with > 7.9 billion triples. Further, we use the resulting SPARQL queries to mimic human associations with a Mean Average Precision (MAP) of 39.9 % and a Recall@10 of 63.9 %.\nData association, the reasoning over correspondence between targets and measurements, is a problem of fundamental importance in target tracking. Recently, belief propagation (BP) has emerged as a promising method for estimating the marginal probabilities of measurement to target association, providing fast, accurate estimates. The excellent performance of BP in the particular formulation used may be attributed to the convexity of the underlying free energy which it implicitly optimises. This paper studies multiple scan data association problems, i.e., problems that reason over correspondence between targets and several sets of measurements, which may correspond to different sensors or different time steps. We find that the multiple scan extension of the single scan BP formulation is non-convex and demonstrate the undesirable behaviour that can result. A convex free energy is constructed using the recently proposed fractional free energy (FFE). A convergent, BP-like algorithm is provided for the single scan FFE, and employed in optimising the multiple scan free energy using primal-dual coordinate ascent. 
Finally, based on a variational interpretation of joint probabilistic data association (JPDA), we develop a sequential variant of the algorithm that is similar to JPDA, but retains consistency constraints from prior scans. The performance of the proposed methods is demonstrated on a bearings only target localisation problem.\nAutomatically searching for optimal hyperparameter configurations is of crucial importance for applying deep learning algorithms in practice. Recently, Bayesian optimization has been proposed for optimizing hyperparameters of various machine learning algorithms. Those methods adopt probabilistic surrogate models like Gaussian processes to approximate and minimize the validation error function of hyperparameter values. However, probabilistic surrogates require accurate estimates of sufficient statistics (e.g., covariance) of the error distribution and thus need many function evaluations with a sizeable number of hyperparameters. This makes them inefficient for optimizing hyperparameters of deep learning algorithms, which are highly expensive to evaluate. In this work, we propose a new deterministic and efficient hyperparameter optimization method that employs radial basis functions as error surrogates. The proposed mixed integer algorithm, called HORD, searches the surrogate for the most promising hyperparameter values through dynamic coordinate search and requires far fewer function evaluations. HORD performs well in low dimensions and is exceptionally strong in higher dimensions. Extensive evaluations on MNIST and CIFAR-10 for four deep neural networks demonstrate that HORD significantly outperforms well-established Bayesian optimization methods such as GP, SMAC, and TPE. For instance, on average, HORD is more than 6 times faster than GP-EI in obtaining the best configuration of 19 hyperparameters.\nOutlier detection is a fundamental data science task with applications ranging from data cleaning to network security. 
Given the fundamental nature of the task, this has been the subject of much research. Recently, a new class of outlier detection algorithms has emerged, called {\\it contextual outlier detection}, and has shown improved performance when studying anomalous behavior in a specific context. However, as we point out in this article, such approaches have limited applicability in situations where the context is sparse (i.e. lacking a suitable frame of reference). Moreover, approaches developed to date do not scale to large datasets. To address these problems, here we propose a novel and robust alternative to the state of the art called RObust Contextual Outlier Detection (ROCOD). We utilize a local and global behavioral model based on the relevant contexts, which is then integrated in a natural and robust fashion. We also present several optimizations to improve the scalability of the approach. We run ROCOD on both synthetic and real-world datasets and demonstrate that it outperforms other competitive baselines on the axes of efficacy and efficiency (40X speedup compared to modern contextual outlier detection methods). We also drill down and perform a fine-grained analysis to shed light on the rationale for the performance gains of ROCOD and reveal its effectiveness when handling objects with sparse contexts.\nIn this paper, a high-speed online neural network classifier based on extreme learning machines for multi-label classification is proposed. In multi-label classification, each input data sample belongs to one or more target labels. Traditional binary and multi-class classification, where each sample belongs to only one target class, forms a subset of multi-label classification. Multi-label classification problems are far more complex than binary and multi-class classification problems, as both the number of target labels and the specific labels of each input sample must be identified. 
The proposed work exploits the high-speed nature of the extreme learning machines to achieve real-time multi-label classification of streaming data. A new threshold-based online sequential learning algorithm is proposed for high speed and streaming data classification of multi-label problems. The proposed method is evaluated on six different datasets from different application domains such as multimedia, text, and biology. The hamming loss, accuracy, training time and testing time of the proposed technique are compared with those of nine different state-of-the-art methods. Experimental studies show that the proposed technique outperforms the existing multi-label classifiers in terms of performance and speed.\nWe present a loss function for neural networks that encompasses an idea of trivial versus non-trivial predictions, such that the network jointly determines its own prediction goals and learns to satisfy them. This permits the network to choose the subsets of a problem that are most amenable to its abilities and focus on solving them, while discarding 'distracting' elements that interfere with its learning. To do this, the network first transforms the raw data into a higher-level categorical representation, and then trains a predictor from that new time series to its future. To prevent a trivial solution of mapping the signal to zero, we introduce a measure of non-triviality via a contrast between the prediction error of the learned model with a naive model of the overall signal statistics. The transform can learn to discard uninformative and unpredictable components of the signal in favor of the features which are both highly predictive and highly predictable. This creates a coarse-grained model of the time-series dynamics, focusing on predicting the slowly varying latent parameters which control the statistics of the time-series, rather than predicting the fast details directly. 
The result is a semi-supervised algorithm which is capable of extracting latent parameters, segmenting sections of time-series with differing statistics, and building a higher-level representation of the underlying dynamics from unlabeled data.\nThough deep learning has pushed the boundaries of classification forward, in recent years hints of the limits of standard classification have begun to emerge. Problems such as fooling, adding new classes over time, and the need to retrain learning models only for small changes to the original problem all point to a potential shortcoming in the classic classification regime, where a comprehensive a priori knowledge of the possible classes or concepts is critical. Without such knowledge, classifiers misjudge the limits of their knowledge and overgeneralization therefore becomes a serious obstacle to consistent performance. In response to these challenges, this paper extends the classic regime by reframing classification instead with the assumption that concepts present in the training set are only a sample of the hypothetical final set of concepts. To bring learning models into this new paradigm, a novel elaboration of standard architectures called the competitive overcomplete output layer (COOL) neural network is introduced. Experiments demonstrate the effectiveness of COOL by applying it to fooling, separable concept learning, one-class neural networks, and standard classification benchmarks. The results suggest that, unlike conventional classifiers, the amount of generalization in COOL networks can be tuned to match the problem.\nThe partially observable Markov decision process (POMDP) provides a principled general framework for planning under uncertainty, but solving POMDPs optimally is computationally intractable, due to the \"curse of dimensionality\" and the \"curse of history\". 
To overcome these challenges, we introduce the Determinized Sparse Partially Observable Tree (DESPOT), a sparse approximation of the standard belief tree, for online planning under uncertainty. A DESPOT focuses online planning on a set of randomly sampled scenarios and compactly captures the \"execution\" of all policies under these scenarios. We show that the best policy obtained from a DESPOT is near-optimal, with a regret bound that depends on the representation size of the optimal policy. Leveraging this result, we give an anytime online planning algorithm, which searches a DESPOT for a policy that optimizes a regularized objective function. Regularization balances the estimated value of a policy under the sampled scenarios and the policy size, thus avoiding overfitting. The algorithm demonstrates strong experimental results, compared with some of the best online POMDP algorithms available. It has also been incorporated into an autonomous driving system for real-time vehicle control. The source code for the algorithm is available online.\nCreativity is a complex, multi-faceted concept encompassing a variety of related aspects, abilities, properties and behaviours. If we wish to study creativity scientifically, then a tractable and well-articulated model of creativity is required. Such a model would be of great value to researchers investigating the nature of creativity and in particular, those concerned with the evaluation of creative practice. This paper describes a unique approach to developing a suitable model of how creative behaviour emerges that is based on the words people use to describe the concept. Using techniques from the field of statistical natural language processing, we identify a collection of fourteen key components of creativity through an analysis of a corpus of academic papers on the topic. Words are identified which appear significantly often in connection with discussions of the concept. 
Using a measure of lexical similarity to help cluster these words, a number of distinct themes emerge, which collectively contribute to a comprehensive and multi-perspective model of creativity. The components provide an ontology of creativity: a set of building blocks which can be used to model creative practice in a variety of domains. The components have been employed in two case studies to evaluate the creativity of computational systems and have proven useful in articulating achievements of this work and directions for further research.\nStatistical characteristics of network traffic have attracted a significant amount of research for automated network intrusion detection, some of which looked at applications of natural statistical laws such as Zipf's law, Benford's law and the Pareto distribution. In this paper, we present the application of Benford's law to a new network flow metric \"flow size difference\", which has not been studied before by other researchers, to build an unsupervised flow-based intrusion detection system (IDS). The method was inspired by our observation on a large number of TCP flow datasets where normal flows tend to follow Benford's law closely but malicious flows tend to deviate significantly from it. The proposed IDS is unsupervised, so it can be easily deployed without any training. It has two simple operational parameters with a clear semantic meaning, allowing the IDS operator to set and adapt their values intuitively to adjust the overall performance of the IDS. We tested the proposed IDS on two (one closed and one public) datasets, and demonstrated its efficiency in terms of AUC (area under the ROC curve). 
Our work showed the \"flow size difference\" has great potential to improve the performance of any flow-based network IDSs.\nIn recent years, there has been a huge increase in the number of bots online, varying from Web crawlers for search engines, to chatbots for online customer service, spambots on social media, and content-editing bots in online collaboration communities. The online world has turned into an ecosystem of bots. However, our knowledge of how these automated agents are interacting with each other is rather poor. Bots are predictable automatons that do not have the capacity for emotions, meaning-making, creativity, and sociality and it is hence natural to expect interactions between bots to be relatively predictable and uneventful. In this article, we analyze the interactions between bots that edit articles on Wikipedia. We track the extent to which bots undid each other's edits over the period 2001-2010, model how pairs of bots interact over time, and identify different types of interaction trajectories. We find that, although Wikipedia bots are intended to support the encyclopedia, they often undo each other's edits and these sterile \"fights\" may sometimes continue for years. Unlike humans on Wikipedia, bots' interactions tend to occur over longer periods of time and to be more reciprocated. Yet, just like humans, bots in different cultural environments may behave differently. Our research suggests that even relatively \"dumb\" bots may give rise to complex interactions, and this carries important implications for Artificial Intelligence research. Understanding what affects bot-bot interactions is crucial for managing social media well, providing adequate cyber-security, and designing well-functioning autonomous vehicles.\nPreference orderings are orderings of a set of items according to the preferences (of judges). Such orderings arise in a variety of domains, including group decision making, consumer marketing, voting and machine learning. 
Measuring the mutual information and extracting the common patterns in a set of preference orderings are key to these areas. In this paper we deal with the representation of sets of preference orderings, the quantification of the degree to which judges agree on their ordering of the items (i.e. the concordance), and the efficient, meaningful description of such sets.   We propose to represent the orderings in a subsequence-based feature space and present a new algorithm to calculate the size of the set of all common subsequences - the basis of a quantification of concordance, not only for pairs of orderings but also for sets of orderings. The new algorithm is fast and storage efficient with a time complexity of only $O(Nn^2)$ for the orderings of $n$ items by $N$ judges and a space complexity of only $O(\\min\\{Nn,n^2\\})$.   Also, we propose to represent the set of all $N$ orderings through a smallest set of covering preferences and present an algorithm to construct this smallest covering set.   The source code for the algorithms is available at https://github.com/zhiweiuu/secs\nRobotic challenges like the Amazon Picking Challenge (APC) or the DARPA Challenges are an established and important way to drive scientific progress. They make research comparable on a well-defined benchmark with equal test conditions for all participants. However, such challenge events occur only occasionally, are limited to a small number of contestants, and the test conditions are very difficult to replicate after the main event. We present a new physical benchmark challenge for robotic picking: the ACRV Picking Benchmark (APB). Designed to be reproducible, it consists of a set of 42 common objects, a widely available shelf, and exact guidelines for object arrangement using stencils. A well-defined evaluation protocol enables the comparison of \\emph{complete} robotic systems -- including perception and manipulation -- instead of sub-systems only. 
Our paper also describes and reports results achieved by an open baseline system based on a Baxter robot.\nDeep reinforcement learning (DRL) brings the power of deep neural networks to bear on the generic task of trial-and-error learning, and its effectiveness has been convincingly demonstrated on tasks such as Atari video games and the game of Go. However, contemporary DRL systems inherit a number of shortcomings from the current generation of deep learning techniques. For example, they require very large datasets to work effectively, entailing that they are slow to learn even when such datasets are available. Moreover, they lack the ability to reason on an abstract level, which makes it difficult to implement high-level cognitive functions such as transfer learning, analogical reasoning, and hypothesis-based reasoning. Finally, their operation is largely opaque to humans, rendering them unsuitable for domains in which verifiability is important. In this paper, we propose an end-to-end reinforcement learning architecture comprising a neural back end and a symbolic front end with the potential to overcome each of these shortcomings. As proof-of-concept, we present a preliminary implementation of the architecture and apply it to several variants of a simple video game. We show that the resulting system -- though just a prototype -- learns effectively, and, by acquiring a set of symbolic rules that are easily comprehensible to humans, dramatically outperforms a conventional, fully neural DRL system on a stochastic variant of the game.\nSince sequential information plays an important role in modeling user behaviors, various sequential recommendation methods have been proposed. Methods based on the Markov assumption are widely used, but they combine several of the most recent components independently. Recently, Recurrent Neural Network (RNN)-based methods have been successfully applied in several sequential modeling tasks. 
However, for real-world applications, these methods have difficulty in modeling the contextual information, which has been proven to be very important for behavior modeling. In this paper, we propose a novel model, named Context-Aware Recurrent Neural Networks (CA-RNN). Instead of using the constant input matrix and transition matrix in conventional RNN models, CA-RNN employs adaptive context-specific input matrices and adaptive context-specific transition matrices. The adaptive context-specific input matrices capture external situations where user behaviors happen, such as time, location, weather and so on. The adaptive context-specific transition matrices capture how lengths of time intervals between adjacent behaviors in historical sequences affect the transition of global sequential features. Experimental results show that the proposed CA-RNN model yields significant improvements over state-of-the-art sequential recommendation methods and context-aware recommendation methods on two public datasets, i.e., the Taobao dataset and the Movielens-1M dataset.\nIn this note we consider the problem of introducing variables in temporal logic programs under the formalism of \"Temporal Equilibrium Logic\" (TEL), an extension of Answer Set Programming (ASP) for dealing with linear-time modal operators. To this aim, we provide a definition of a first-order version of TEL that shares the syntax of first-order Linear-time Temporal Logic (LTL) but has a different semantics, selecting some LTL models we call \"temporal stable models\". Then, we consider a subclass of theories (called \"splittable temporal logic programs\") that are close to usual logic programs but allowing a restricted use of temporal operators. In this setting, we provide a syntactic definition of \"safe variables\" that suffices to show the property of \"domain independence\" -- that is, addition of arbitrary elements in the universe does not vary the set of temporal stable models. 
Finally, we present a method for computing the derivable facts by constructing a non-temporal logic program with variables that is fed to a standard ASP grounder. The information provided by the grounder is then used to generate a subset of ground temporal rules which is equivalent to (and generally smaller than) the full program instantiation.\nOptimal data partitioning in parallel & distributed implementations of clustering algorithms is a necessary computation, as it ensures independent task completion, fair distribution, fewer affected points and better & faster merging. Though partitioning using a Kd Tree is conventionally used in academia, it suffers from performance drops and bias (non-equal distribution) as the dimensionality of data increases, and hence is not suitable for practical use in industry, where dimensionality can be of the order of 100s to 1000s. To address these issues we propose two new partitioning techniques using existing mathematical models & study their feasibility, performance (bias and partitioning speed) & possible variants in choosing initial seeds. The first method uses an n-dimensional hashed grid based approach, which maps the points in space to a set of cubes that hash the points. The second method uses a tree of voronoi planes where each plane corresponds to a partition. We found that the grid based approach was computationally impractical, while using a tree of voronoi planes (using scalable K-Means++ initial seeds) drastically outperformed the Kd-tree method as dimensionality increased.\nThis paper proposes a computationally efficient approach to detecting objects natively in 3D point clouds using convolutional neural networks (CNNs). In particular, this is achieved by leveraging a feature-centric voting scheme to implement novel convolutional layers which explicitly exploit the sparsity encountered in the input. 
To this end, we examine the trade-off between accuracy and speed for different architectures and additionally propose to use an L1 penalty on the filter activations to further encourage sparsity in the intermediate representations. To the best of our knowledge, this is the first work to propose sparse convolutional layers and L1 regularisation for efficient large-scale processing of 3D data. We demonstrate the efficacy of our approach on the KITTI object detection benchmark and show that Vote3Deep models with as few as three layers outperform the previous state of the art in both laser and laser-vision based approaches by margins of up to 40% while remaining highly competitive in terms of processing time.\nHigh-speed, low-latency obstacle avoidance that is insensitive to sensor noise is essential for enabling multiple decentralized robots to function reliably in cluttered and dynamic environments. While other distributed multi-agent collision avoidance systems exist, these systems require online geometric optimization where tedious parameter tuning and perfect sensing are necessary.   We present a novel end-to-end framework to generate reactive collision avoidance policy for efficient distributed multi-agent navigation. Our method formulates an agent's navigation strategy as a deep neural network mapping from the observed noisy sensor measurements to the agent's steering commands in terms of movement velocity. We train the network on a large number of frames of collision avoidance data collected by repeatedly running a multi-agent simulator with different parameter settings. We validate the learned deep neural network policy in a set of simulated and real scenarios with noisy measurements and demonstrate that our method is able to generate a robust navigation strategy that is insensitive to imperfect sensing and works reliably in all situations. 
We also show that our method can be well generalized to scenarios that do not appear in our training data, including scenes with static obstacles and agents with different sizes. Videos are available at https://sites.google.com/view/deepmaca.\nAbstractive summarization is an ideal form of summarization since it can synthesize information from multiple documents to create concise informative summaries. In this work, we aim at developing an abstractive summarizer. First, our proposed approach identifies the most important document in the multi-document set. The sentences in the most important document are aligned to sentences in other documents to generate clusters of similar sentences. Second, we generate K-shortest paths from the sentences in each cluster using a word-graph structure. Finally, we select sentences from the set of shortest paths generated from all the clusters employing a novel integer linear programming (ILP) model with the objective of maximizing information content and readability of the final summary. Our ILP model represents the shortest paths as binary variables and considers the length of the path, information score and linguistic quality score in the objective function. Experimental results on the DUC 2004 and 2005 multi-document summarization datasets show that our proposed approach outperforms all the baselines and state-of-the-art extractive summarizers as measured by the ROUGE scores. Our method also outperforms a recent abstractive summarization technique. In manual evaluation, our approach also achieves promising results on informativeness and readability.\nIn this paper, we deal with two challenges for measuring the similarity of the subject identities in practical video-based face recognition - the variation of the head pose in uncontrolled environments and the computational expense of processing videos. 
Since the frame-wise feature mean is unable to characterize the pose diversity among frames, we define and preserve the overall pose diversity and closeness in a video. Then, identity will be the only source of variation across videos since the pose varies even within a single video. Instead of simply using all the frames, we select those faces whose pose point is closest to the centroid of the K-means cluster containing that pose point. Then, we represent a video as a bag of frame-wise deep face features while the number of features has been reduced from hundreds to K. Since the video representation can well represent the identity, now we measure the subject similarity between two videos as the max correlation among all possible pairs in the two bags of features. On the official 5,000 video-pairs of the YouTube Face dataset for face verification, our algorithm achieves a comparable performance with VGG-face that averages over deep features of all frames. Other vision tasks can also benefit from the generic idea of employing geometric cues to improve the descriptiveness of deep features.\nRecent advances in biosensors technology and mobile electroencephalographic (EEG) interfaces have opened new application fields for cognitive monitoring. A computable biomarker for the assessment of spontaneous aesthetic brain responses during music listening is introduced here. It derives from well-established measures of cross-frequency coupling (CFC) and quantifies the music-induced alterations in the dynamic relationships between brain rhythms. During a stage of exploratory analysis, and using the signals from a suitably designed experiment, we established the biomarker, which acts on brain activations recorded over the left prefrontal cortex and focuses on the functional coupling between high-beta and low-gamma oscillations. 
Based on data from an additional experimental paradigm, we validated the introduced biomarker and showed its relevance for expressing the subjective aesthetic appreciation of a piece of music. Our approach resulted in an affordable tool that can promote human-machine interaction and, by serving as a personalized music annotation strategy, can be potentially integrated into modern flexible music recommendation systems.   Keywords: Cross-frequency coupling; Human-computer interaction; Brain-computer interface\nThe goal of this thesis is to investigate the potential of predictive modelling for football injuries. This work was conducted in close collaboration with Tottenham Hotspur FC (THFC), the PGA European tour and the participation of Wolverhampton Wanderers (WW).   Three investigations were conducted:   1. Predicting the recovery time of football injuries using the UEFA injury recordings: The UEFA recordings are a common standard for recording injuries in professional football. For this investigation, three datasets of UEFA injury recordings were available. Different machine learning algorithms were used in order to build a predictive model. The performance of the machine learning models is then improved by using feature selection conducted through correlation-based subset feature selection and random forests.   2. Predicting injuries in professional football using exposure records: The relationship between exposure (in training hours and match hours) in professional football athletes and injury incidence was studied. A common problem in football is understanding how the training schedule of an athlete can affect the chance of him getting injured. The task was to predict the number of days a player can train before he gets injured.   3. Predicting intrinsic injury incidence using in-training GPS measurements: A significant percentage of football injuries can be attributed to overtraining and fatigue. 
GPS data collected during training sessions might provide indicators of fatigue, or might be used to detect very intense training sessions which can lead to overtraining. This research used GPS data gathered during training sessions of the first team of THFC, in order to predict whether an injury would take place during a week.\nThe 'conjunction fallacy' has been extensively debated by scholars in cognitive science and, in recent times, the discussion has been enriched by the proposal of modeling the fallacy using the quantum formalism. Two major quantum approaches have been put forward: the first assumes that respondents use a two-step sequential reasoning and that the fallacy results from the presence of 'question order effects'; the second assumes that respondents evaluate the cognitive situation as a whole and that the fallacy results from the 'emergence of new meanings', as an 'effect of overextension' in the conceptual conjunction. Thus, the question arises as to determine whether and to what extent conjunction fallacies would result from 'order effects' or, instead, from 'emergence effects'. To help clarify this situation, we propose to use the World Wide Web as an 'information space' that can be interrogated both in a sequential and non-sequential way, to test these two quantum approaches. We find that 'emergence effects', and not 'order effects', should be considered the main cognitive mechanism producing the observed conjunction fallacies.\nUser preference integration is of great importance in multi-objective optimization, in particular in many objective optimization. Preferences have long been considered in traditional multicriteria decision making (MCDM) which is based on mathematical programming. Recently, it is integrated in multi-objective metaheuristics (MOMH), resulting in focus on preferred parts of the Pareto front instead of the whole Pareto front. 
The number of publications on preference-based multi-objective metaheuristics has increased rapidly over the past decades. There already exist various preference handling methods and MOMH methods, which have been combined in diverse ways. This article proposes to use the Web Ontology Language (OWL) to model and systematize the results developed in this field. A review of the existing work is provided, based on which an ontology is built and instantiated with state-of-the-art results. The OWL ontology is made public and open to future extension. Moreover, the usage of the ontology is exemplified for different use-cases, including querying for methods that match an engineering application, bibliometric analysis, checking existence of combinations of preference models and MOMH techniques, and discovering opportunities for new research and open research questions.\nRobust principal component analysis (RPCA) has been widely used for recovering low-rank matrices in many data mining and machine learning problems. It separates a data matrix into a low-rank part and a sparse part. The convex approach has been well studied in the literature. However, state-of-the-art algorithms for the convex approach usually have relatively high complexity due to the need of solving (partial) singular value decompositions of large matrices. A non-convex approach, AltProj, has also been proposed with lighter complexity and better scalability. Given the true rank $r$ of the underlying low rank matrix, AltProj has a complexity of $O(r^2dn)$, where $d\\times n$ is the size of data matrix. In this paper, we propose a novel factorization-based model of RPCA, which has a complexity of $O(kdn)$, where $k$ is an upper bound of the true rank. Our method does not need the precise value of the true rank. 
From extensive experiments, we observe that AltProj can work only when $r$ is precisely known in advance; however, when the needed rank parameter $r$ is specified to a value different from the true rank, AltProj cannot fully separate the two parts while our method succeeds. Even when both work, our method is about 4 times faster than AltProj. Our method can be used as a light-weight, scalable tool for RPCA in the absence of the precise value of the true rank.\nRecently, end-to-end memory networks have shown promising results on Question Answering task, which encode the past facts into an explicit memory and perform reasoning ability by making multiple computational steps on the memory. However, memory networks conduct the reasoning on sentence-level memory to output coarse semantic vectors and do not further take any attention mechanism to focus on words, which may cause the model to lose some detailed information, especially when the answers are rare or unknown words. In this paper, we propose a novel Hierarchical Memory Networks, dubbed HMN. First, we encode the past facts into sentence-level memory and word-level memory respectively. Then, (k)-max pooling is exploited following reasoning module on the sentence-level memory to sample the (k) most relevant sentences to a question and feed these sentences into attention mechanism on the word-level memory to focus the words in the selected sentences. Finally, the prediction is jointly learned over the outputs of the sentence-level reasoning module and the word-level attention mechanism. The experimental results demonstrate that our approach successfully conducts answer selection on unknown words and achieves a better performance than memory networks.\nModel predictive control (MPC) is a popular control method that has proved effective for robotics, among other fields. MPC performs re-planning at every time step. 
Re-planning is done with a limited horizon per computational and real-time constraints and often also for robustness to potential model errors. However, the limited horizon leads to suboptimal performance. In this work, we consider the iterative learning setting, where the same task can be repeated several times, and propose a policy improvement scheme for MPC. The main idea is that between executions we can, offline, run MPC with a longer horizon, resulting in a hindsight plan. To bring the next real-world execution closer to the hindsight plan, our approach learns to re-shape the original cost function with the goal of satisfying the following property: short horizon planning (as realistic during real executions) with respect to the shaped cost should result in mimicking the hindsight plan. This effectively consolidates long-term reasoning into the short-horizon planning. We empirically evaluate our approach in contact-rich manipulation tasks both in simulated and real environments, such as peg insertion by a real PR2 robot.\nWith the rapid growth of social media, rumors are also spreading widely on social media and bring harm to people's daily life. Nowadays, information credibility evaluation has drawn attention from academic and industrial communities. Current methods mainly focus on feature engineering and achieve some success. However, feature engineering based methods require a lot of labor and cannot fully reveal the underlying relations among data. In our viewpoint, the key elements of user behaviors for evaluating credibility are concluded as \"who\", \"what\", \"when\", and \"how\". These existing methods cannot model the correlation among different key elements during the spreading of microblogs. In this paper, we propose a novel representation learning method, Information Credibility Evaluation (ICE), to learn representations of information credibility on social media. 
In ICE, latent representations are learnt for modeling user credibility, behavior types, temporal properties, and comment attitudes. The aggregation of these factors in the microblog spreading process yields the representation of a user's behavior, and the aggregation of these dynamic representations generates the credibility representation of an event spreading on social media. Moreover, a pairwise learning method is applied to maximize the credibility difference between rumors and non-rumors. To evaluate the performance of ICE, we conduct experiments on a Sina Weibo data set, and the experimental results show that our ICE model outperforms the state-of-the-art methods.\nDeep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged with memory bottlenecks that require many convolution and fully-connected layers demanding large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded BlockRAM. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification of CNNs. 
Both Altera and Xilinx have adopted OpenCL co-design framework from GPU for FPGA designs as a pseudo-automatic development solution. In this paper, a comprehensive evaluation and comparison of Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platforms tools, mature design community and better execution times.\nStructured sparse optimization is an important and challenging problem for analyzing high-dimensional data in a variety of applications such as bioinformatics, medical imaging, social networks, and astronomy. Although a number of structured sparsity models have been explored, such as trees, groups, clusters, and paths, connected subgraphs have been rarely explored in the current literature. One of the main technical challenges is that there is no structured sparsity-inducing norm that can directly model the space of connected subgraphs, and there is no exact implementation of a projection oracle for connected subgraphs due to its NP-hardness. In this paper, we explore efficient approximate projection oracles for connected subgraphs, and propose two new efficient algorithms, namely, Graph-IHT and Graph-GHTP, to optimize a generic nonlinear objective function subject to connectivity constraint on the support of the variables. Our proposed algorithms enjoy strong guarantees analogous to several current methods for sparsity-constrained optimization, such as Projected Gradient Descent (PGD), Approximate Model Iterative Hard Thresholding (AM-IHT), and Gradient Hard Thresholding Pursuit (GHTP) with respect to convergence rate and approximation accuracy. 
We apply our proposed algorithms to optimize several well-known graph scan statistics in several applications of connected subgraph detection as a case study, and the experimental results demonstrate that our proposed algorithms outperform state-of-the-art methods.\nForecasting the flow of crowds is of great importance to traffic management and public safety, yet a very challenging task affected by many complex factors, such as inter-region traffic, events and weather. In this paper, we propose a deep-learning-based approach, called ST-ResNet, to collectively forecast the in-flow and out-flow of crowds in each and every region through a city. We design an end-to-end structure of ST-ResNet based on unique properties of spatio-temporal data. More specifically, we employ the framework of the residual neural networks to model the temporal closeness, period, and trend properties of the crowd traffic, respectively. For each property, we design a branch of residual convolutional units, each of which models the spatial properties of the crowd traffic. ST-ResNet learns to dynamically aggregate the output of the three residual neural networks based on data, assigning different weights to different branches and regions. The aggregation is further combined with external factors, such as weather and day of the week, to predict the final traffic of crowds in each and every region. We evaluate ST-ResNet based on two types of crowd flows in Beijing and NYC, finding that its performance exceeds six well-known methods.\nReinforcement learning holds the promise of enabling autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, robotic applications of reinforcement learning often compromise the autonomy of the learning process in favor of achieving training times that are practical for real physical systems. This typically involves introducing hand-engineered policy representations and human-supplied demonstrations. 
Deep reinforcement learning alleviates this limitation by training general-purpose neural network policies, but applications of direct deep reinforcement learning algorithms have so far been restricted to simulated settings and relatively simple tasks, due to their apparent high sample complexity. In this paper, we demonstrate that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots. We demonstrate that the training times can be further reduced by parallelizing the algorithm across multiple robots which pool their policy updates asynchronously. Our experimental evaluation shows that our method can learn a variety of 3D manipulation skills in simulation and a complex door opening skill on real robots without any prior demonstrations or manually designed representations.\nHigh-Throughput materials discovery involves the rapid synthesis, measurement, and characterization of many different but structurally-related materials. A key problem in materials discovery, the phase map identification problem, involves the determination of the crystal phase diagram from the materials' composition and structural characterization data. We present Phase-Mapper, a novel AI platform to solve the phase map identification problem that allows humans to interact with both the data and products of AI algorithms, including the incorporation of human feedback to constrain or initialize solutions. Phase-Mapper affords incorporation of any spectral demixing algorithm, including our novel solver, AgileFD, which is based on a convolutive non-negative matrix factorization algorithm. AgileFD can incorporate constraints to capture the physics of the materials as well as human feedback. 
We compare three solver variants with previously proposed methods in a large-scale experiment involving 20 synthetic systems, demonstrating the efficacy of imposing physical constraints using AgileFD. Phase-Mapper has also been used by materials scientists to solve a wide variety of phase diagrams, including the previously unsolved Nb-Mn-V oxide system, which is provided here as an illustrative example.\nNetworks represent relationships between entities in many complex systems, spanning from online social interactions to biological cell development and brain connectivity. In many cases, relationships between entities are unambiguously known: are two users 'friends' in a social network? Do two researchers collaborate on a published paper? Do two road segments in a transportation system intersect? These are directly observable in the system in question. In most cases, relationships between nodes are not directly observable and must be inferred: does one gene regulate the expression of another? Do two animals who physically co-locate have a social bond? Who infected whom in a disease outbreak in a population?   Existing approaches for inferring networks from data are found across many application domains and use specialized knowledge to infer and measure the quality of the inferred network for a specific task or hypothesis. However, current research lacks a rigorous methodology which employs standard statistical validation on inferred models. In this survey, we examine (1) how network representations are constructed from underlying data, (2) the variety of questions and tasks on these representations over several domains, and (3) validation strategies for measuring the inferred network's capability of answering questions on the system of interest.\nIdentity verification based on authenticity assessment of a handwritten signature is an important issue in biometrics. There are many effective methods for signature verification taking into account dynamics of a signing process. 
Methods based on partitioning take a very important place among them. In this paper we propose a new approach to signature partitioning. Its most important feature is the possibility of selecting and processing of hybrid partitions in order to increase a precision of the test signature analysis. Partitions are formed by a combination of vertical and horizontal sections of the signature. Vertical sections correspond to the initial, middle, and final time moments of the signing process. In turn, horizontal sections correspond to the signature areas associated with high and low pen velocity and high and low pen pressure on the surface of a graphics tablet. Our previous research on vertical and horizontal sections of the dynamic signature (created independently) led us to develop the algorithm presented in this paper. Selection of sections, among others, allows us to define the stability of the signing process in the partitions, promoting signature areas of greater stability (and vice versa). In the test of the proposed method two databases were used: public MCYT-100 and paid BioSecure.\nExploration in an unknown environment is the core functionality for mobile robots. Learning-based exploration methods, including convolutional neural networks, provide excellent strategies without human-designed logic for the feature extraction. But the conventional supervised learning algorithms cost lots of efforts on the labeling work of datasets inevitably. Scenes not included in the training set are mostly unrecognized either. We propose a deep reinforcement learning method for the exploration of mobile robots in an indoor environment with the depth information from an RGB-D sensor only. Based on the Deep Q-Network framework, the raw depth image is taken as the only input to estimate the Q values corresponding to all moving commands. The training of the network weights is end-to-end. 
In arbitrarily constructed simulation environments, we show that the robot can be quickly adapted to unfamiliar scenes without any man-made labeling. Besides, through analysis of receptive fields of feature representations, deep reinforcement learning motivates the convolutional networks to estimate the traversability of the scenes. The test results are compared with the exploration strategies separately based on deep learning or reinforcement learning. Even trained only in the simulated environment, experimental results in real-world environment demonstrate that the cognitive ability of robot controller is dramatically improved compared with the supervised method. We believe it is the first time that raw sensor information is used to build cognitive exploration strategy for mobile robots through end-to-end deep reinforcement learning.\nMany malware families utilize domain generation algorithms (DGAs) to establish command and control (C&C) connections. While there are many methods to pseudorandomly generate domains, we focus in this paper on detecting (and generating) domains on a per-domain basis which provides a simple and flexible means to detect known DGA families. Recent machine learning approaches to DGA detection have been successful on fairly simplistic DGAs, many of which produce names of fixed length. However, models trained on limited datasets are somewhat blind to new DGA variants.   In this paper, we leverage the concept of generative adversarial networks to construct a deep learning based DGA that is designed to intentionally bypass a deep learning based detector. In a series of adversarial rounds, the generator learns to generate domain names that are increasingly more difficult to detect. In turn, a detector model updates its parameters to compensate for the adversarially generated domains. 
We test the hypothesis of whether adversarially generated domains may be used to augment training sets in order to harden other machine learning models against yet-to-be-observed DGAs. We detail solutions to several challenges in training this character-based generative adversarial network (GAN). In particular, our deep learning architecture begins as a domain name auto-encoder (encoder + decoder) trained on domains in the Alexa one million. Then the encoder and decoder are reassembled competitively in a generative adversarial network (detector + generator), with novel neural architectures and training strategies to improve convergence.\nRapid construction of phase diagrams is a central tenet of combinatorial materials science with accelerated materials discovery efforts often hampered by challenges in interpreting combinatorial x-ray diffraction datasets, which we address by developing AgileFD, an artificial intelligence algorithm that enables rapid phase mapping from a combinatorial library of x-ray diffraction patterns. AgileFD models alloying-based peak shifting through a novel expansion of convolutional nonnegative matrix factorization, which not only improves the identification of constituent phases but also maps their concentration and lattice parameter as a function of composition. By incorporating Gibbs phase rule into the algorithm, physically meaningful phase maps are obtained with unsupervised operation, and more refined solutions are attained by injecting expert knowledge of the system. The algorithm is demonstrated through investigation of the V-Mn-Nb oxide system where decomposition of eight oxide phases, including two with substantial alloying, provides the first phase map for this pseudo-ternary system. This phase map enables interpretation of high-throughput band gap data, leading to the discovery of new solar light absorbers and the alloying-based tuning of the direct-allowed band-gap energy of MnV2O6. 
The open-source family of AgileFD algorithms can be implemented into a broad range of high throughput workflows to accelerate materials discovery.\nIn the big data era, data are generated continuously and their distribution may change over time. These challenges in online stream of data are known as concept drift. In this paper, we proposed the Adaptive Convolutional ELM method (ACNNELM) as enhancement of Convolutional Neural Network (CNN) with a hybrid Extreme Learning Machine (ELM) model plus adaptive capability. This method is aimed at concept drift handling. We enhanced the CNN as convolutional hierarchical features representation learner combined with Elastic ELM (E$^2$LM) as a parallel supervised classifier. We propose an Adaptive OS-ELM (AOS-ELM) for concept drift adaptability in classifier level (named ACNNELM-1) and matrices concatenation ensembles for concept drift adaptability in ensemble level (named ACNNELM-2). Our proposed Adaptive CNNELM is flexible that works well in classifier level and ensemble level while most current methods only proposed to work on either one of the levels.   We verified our method in extended MNIST data set and notMNIST data set. We set the experiment to simulate virtual drift, real drift, and hybrid drift event and we demonstrated how our CNNELM adaptability works. Our proposed method works well and gives better accuracy, computation scalability, and concept drifts adaptability compared to the regular ELM and CNN. Further research is still required to study the optimum parameters and to use more varied image data set.\nNeural sequence models are widely used to model time-series data in many fields. Equally ubiquitous is the usage of beam search (BS) as an approximate inference algorithm to decode output sequences from these models. BS explores the search space in a greedy left-right fashion retaining only the top-$B$ candidates -- resulting in sequences that differ only slightly from each other. 
Producing lists of nearly identical sequences is not only computationally wasteful but also typically fails to capture the inherent ambiguity of complex AI tasks. To overcome this problem, we propose \\emph{Diverse Beam Search} (DBS), an alternative to BS that decodes a list of diverse outputs by optimizing for a diversity-augmented objective. We observe that our method finds better top-1 solutions by controlling for the exploration and exploitation of the search space -- implying that DBS is a \\emph{better search algorithm}. Moreover, these gains are achieved with minimal computational or memory overhead as compared to beam search. To demonstrate the broad applicability of our method, we present results on image captioning, machine translation and visual question generation using both standard quantitative metrics and qualitative human studies. Our method consistently outperforms BS and previously proposed techniques for diverse decoding from neural sequence models.\nThis paper presents NetWorks (NW), an interactive music generation system that uses a hierarchically clustered scale free network to generate music that ranges from orderly to chaotic. NW was inspired by the Honing Theory of creativity, according to which human-like creativity hinges on (1) the ability to self-organize and maintain dynamics at the 'edge of chaos' using something akin to 'psychological entropy', and (2) the capacity to shift between analytic and associative processing modes. At the 'edge of chaos', NW generates patterns that exhibit emergent complexity through coherent development at low, mid, and high levels of musical organization, and often suggests goal seeking behaviour. The architecture consists of four 16-node modules: one each for pitch, velocity, duration, and entry delay. The Core allows users to define how nodes are connected, and rules that determine when and how nodes respond to their inputs. 
The Mapping Layer allows users to map node output values to MIDI data that is routed to software instruments in a digital audio workstation. By shifting between bottom-up and top-down, NW shifts between analytic and associative processing modes.\nRecently, neural networks and multiple instance learning have both been attractive topics in Artificial Intelligence research. Deep neural networks have achieved great success in supervised learning problems, and multiple instance learning, a typical weakly supervised learning method, is effective for many applications in computer vision, biometrics, natural language processing, etc. In this paper, we revisit the problem of solving multiple instance learning problems using neural networks. Neural networks are appealing for solving multiple instance learning problems. Multiple instance neural networks perform multiple instance learning in an end-to-end way: they take a bag with a varying number of instances as input and directly output the bag label. All of the parameters in a multiple instance network can be optimized via back-propagation. We propose a new multiple instance neural network to learn bag representations, which differs from existing multiple instance neural networks that focus on estimating instance labels. In addition, we study recent tricks developed in deep learning in multiple instance networks and find that deep supervision is effective for boosting bag classification accuracy. In the experiments, the proposed multiple instance networks achieve state-of-the-art or competitive performance on several MIL benchmarks. Moreover, they are extremely fast for both testing and training; e.g., it takes only 0.0003 seconds to predict a bag and a few seconds to train on a MIL dataset on a moderate CPU.\nIt is difficult to train a personalized task-oriented dialogue system because the data collected from each individual is often insufficient. 
Personalized dialogue systems trained on a small dataset can overfit, making it difficult to adapt to different user needs. One way to solve this problem is to consider a collection of multiple users' data as a source domain and an individual user's data as a target domain, and to perform transfer learning from the source to the target domain. Following this idea, we propose \"PETAL\" (PErsonalized Task-oriented diALogue), a transfer-learning framework based on POMDPs for learning a personalized dialogue system. The system first learns common dialogue knowledge from the source domain and then adapts this knowledge to the target user. This framework can avoid the negative transfer problem by considering differences between source and target users. The policy in the personalized POMDP can learn to choose different actions appropriately for different users. Experimental results on real-world coffee-shopping data and simulation data show that our personalized dialogue system can choose different optimal actions for different users, and thus effectively improve dialogue quality in the personalized setting.\nAutonomous driving is a multi-agent setting where the host vehicle must apply sophisticated negotiation skills with other road users when overtaking, giving way, merging, taking left and right turns, and pushing ahead on unstructured urban roadways. Since there are many possible scenarios, manually tackling all possible cases will likely yield an overly simplistic policy. Moreover, one must balance the unexpected behavior of other drivers and pedestrians against the need not to be too defensive, so that normal traffic flow is maintained. In this paper we apply deep reinforcement learning to the problem of forming long-term driving strategies. We note that there are two major challenges that make autonomous driving different from other robotic tasks. 
First, there is the necessity of ensuring functional safety, something that machine learning has difficulty with, given that performance is optimized at the level of an expectation over many instances. Second, the Markov Decision Process model often used in robotics is problematic in our case because of the unpredictable behavior of other agents in this multi-agent scenario. We make three contributions in our work. First, we show how policy gradient iterations can be used without Markovian assumptions. Second, we decompose the problem into a composition of a Policy for Desires (which is to be learned) and trajectory planning with hard constraints (which is not learned). The goal of Desires is to enable comfortable driving, while hard constraints guarantee the safety of driving. Third, we introduce a hierarchical temporal abstraction we call an \"Option Graph\" with a gating mechanism that significantly reduces the effective horizon and thereby reduces the variance of the gradient estimation even further.\nDeveloping control policies in simulation is often more practical and safer than directly running experiments in the real world. This applies to policies obtained from planning and optimization, and even more so to policies obtained from reinforcement learning, which is often very data demanding. However, a policy that succeeds in simulation often doesn't work when deployed on a real robot. Nevertheless, often the overall gist of what the policy does in simulation remains valid in the real world. In this paper we investigate such settings, where the sequence of states traversed in simulation remains reasonable for the real world, even if the details of the controls are not, as could be the case when the key differences lie in detailed friction, contact, mass and geometry properties. 
During execution, at each time step our approach computes what the simulation-based control policy would do, but then, rather than executing these controls on the real robot, our approach computes what the simulation expects the resulting next state(s) will be, and then relies on a learned deep inverse dynamics model to decide which real-world action is most suitable to achieve those next states. Deep models are only as good as their training data, and we also propose an approach for data collection to (incrementally) learn the deep inverse dynamics model. Our experiments show that our approach compares favorably with various baselines that have been developed for dealing with simulation-to-real-world model discrepancy, including output error control and Gaussian dynamics adaptation.\nWe study truthful mechanisms for matching and related problems in a partial information setting, where the agents' true utilities are hidden, and the algorithm only has access to ordinal preference information. Our model is motivated by the fact that in many settings, agents cannot express the numerical values of their utility for different outcomes, but are still able to rank the outcomes in their order of preference. Specifically, we study problems where the ground truth exists in the form of a weighted graph of agent utilities, but the algorithm can only elicit the agents' private information in the form of a preference ordering for each agent induced by the underlying weights. Against this backdrop, we design truthful algorithms to approximate the true optimum solution with respect to the hidden weights. Our techniques yield universally truthful algorithms for a number of graph problems: a 1.76-approximation algorithm for Max-Weight Matching, a 2-approximation algorithm for Max k-matching, a 6-approximation algorithm for Densest k-subgraph, and a 2-approximation algorithm for Max Traveling Salesman as long as the hidden weights constitute a metric. 
We also provide improved approximation algorithms for such problems when the agents are not able to lie about their preferences. Our results are the first non-trivial truthful approximation algorithms for these problems, and indicate that in many situations, we can design robust algorithms even when the agents may lie and only provide ordinal information instead of precise utilities.\nThis paper presents a deep learning architecture for the semantic decoder component of a Statistical Spoken Dialogue System. In a slot-filling dialogue, the semantic decoder predicts the dialogue act and a set of slot-value pairs from a set of n-best hypotheses returned by the Automatic Speech Recognition system. Most current models for spoken language understanding assume (i) word-aligned semantic annotations as in sequence taggers and (ii) delexicalisation, or a mapping of input words to domain-specific concepts using heuristics that try to capture morphological variation but that scale neither to other domains nor to language variation (e.g., morphology, synonyms, paraphrasing). In this work the semantic decoder is trained using unaligned semantic annotations and it uses distributed semantic representation learning to overcome the limitations of explicit delexicalisation. The proposed architecture uses a convolutional neural network for the sentence representation and a long short-term memory network for the context representation. Results are presented for the publicly available DSTC2 corpus and an In-car corpus which is similar to DSTC2 but has a significantly higher word error rate (WER).\nWhile deep learning has had significant successes in computer vision thanks to the abundance of visual data, collecting sufficiently large real-world datasets for robot learning can be costly. To increase the practicality of these techniques on real robots, we propose a modular deep reinforcement learning method capable of transferring models trained in simulation to a real-world robotic task. 
We introduce a bottleneck between perception and control, enabling the networks to be trained independently, but then merged and fine-tuned in an end-to-end manner to further improve hand-eye coordination. On a canonical, planar visually-guided robot reaching task a fine-tuned accuracy of 1.6 pixels is achieved, a significant improvement over naive transfer (17.5 pixels), showing the potential for more complicated and broader applications. Our method provides a technique for more efficient learning and transfer of visuo-motor policies for real robotic systems without relying entirely on large real-world robot datasets.\nDeep neural networks have achieved impressive experimental results in image classification, but can surprisingly be unstable with respect to adversarial perturbations, that is, minimal changes to the input image that cause the network to misclassify it. With potential applications including perception modules and end-to-end controllers for self-driving cars, this raises concerns about their safety. We develop a novel automated verification framework for feed-forward multi-layer neural networks based on Satisfiability Modulo Theory (SMT). We focus on safety of image classification decisions with respect to image manipulations, such as scratches or changes to camera angle or lighting conditions that would result in the same class being assigned by a human, and define safety for an individual decision in terms of invariance of the classification within a small neighbourhood of the original image. We enable exhaustive search of the region by employing discretisation, and propagate the analysis layer by layer. Our method works directly with the network code and, in contrast to existing methods, can guarantee that adversarial examples, if they exist, are found for the given region and family of manipulations. If found, adversarial examples can be shown to human testers and/or used to fine-tune the network. 
We implement the techniques using Z3 and evaluate them on state-of-the-art networks, including regularised and deep learning networks. We also compare against existing techniques to search for adversarial examples and estimate network robustness.\nWe develop a Bayesian model for decision-making under time pressure with endogenous information acquisition. In our model, the decision maker decides when to observe (costly) information by sampling an underlying continuous-time stochastic process (time series) that conveys information about the potential occurrence or non-occurrence of an adverse event which will terminate the decision-making process. In her attempt to predict the occurrence of the adverse event, the decision-maker follows a policy that determines when to acquire information from the time series (continuation), and when to stop acquiring information and make a final prediction (stopping). We show that the optimal policy has a rendezvous structure, i.e. a structure in which whenever a new information sample is gathered from the time series, the optimal \"date\" for acquiring the next sample becomes computable. The optimal interval between two information samples balances a trade-off between the decision maker's surprise, i.e. the drift in her posterior belief after observing new information, and suspense, i.e. the probability that the adverse event occurs in the time interval between two information samples. Moreover, we characterize the continuation and stopping regions in the decision-maker's state-space, and show that they depend not only on the decision-maker's beliefs, but also on the context, i.e. the current realization of the time series.\nMachine Learning has been a big success story during the AI resurgence. One particular stand out success relates to unsupervised learning from a massive amount of data, albeit much of it relates to one modality/type of data at a time. 
In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of utilizing knowledge whenever it is available or can be created purposefully. In this paper, we focus on discussing the indispensable role of knowledge for deeper understanding of complex text and multimodal data in situations where (i) large amounts of training data (labeled/unlabeled) are not available or labor intensive to create, (ii) the objects (particularly text) to be recognized are complex (i.e., beyond simple entity-person/location/organization names), such as implicit entities and highly subjective content, and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create knowledge, varying from comprehensive or cross domain to domain or application specific, and (b) carefully exploit the knowledge to further empower or extend the applications of ML/NLP techniques. Using the early results in several diverse situations - both in data types and applications - we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data.\nFood and nutrition occupy an increasingly prevalent space on the web, and dishes and recipes shared online provide an invaluable mirror into culinary cultures and attitudes around the world. More specifically, ingredients, flavors, and nutrition information become strong signals of the taste preferences of individuals and civilizations. However, there is little understanding of these palate varieties. In this paper, we present a large-scale study of recipes published on the web and their content, aiming to understand cuisines and culinary habits around the world. 
Using a database of more than 157K recipes from over 200 different cuisines, we analyze ingredients, flavors, and nutritional values which distinguish dishes from different regions, and use this knowledge to assess the predictability of recipes from different cuisines. We then use country health statistics to understand the relation between these factors and health indicators of different nations, such as obesity, diabetes, migration, and health expenditure. Our results confirm the strong effects of geographical and cultural similarities on recipes, health indicators, and culinary preferences across the globe.\nIn this paper we present a broad overview of the last 40 years of research on cognitive architectures. Although the number of existing architectures is nearing several hundred, most of the existing surveys do not reflect this growth and focus on a handful of well-established architectures. Thus, in this survey we wanted to shift the focus towards a more inclusive and high-level overview of the research on cognitive architectures. Our final set of 84 architectures includes 49 that are still actively developed, and borrow from a diverse set of disciplines, spanning areas from psychoanalysis to neuroscience. To keep the length of this paper within reasonable limits we discuss only the core cognitive abilities, such as perception, attention mechanisms, action selection, memory, learning and reasoning. In order to assess the breadth of practical applications of cognitive architectures we gathered information on over 900 practical projects implemented using the cognitive architectures in our list. We use various visualization techniques to highlight overall trends in the development of the field. 
In addition to summarizing the current state-of-the-art in the cognitive architecture research, this survey describes a variety of methods and ideas that have been tried and their relative success in modeling human cognitive abilities, as well as which aspects of cognitive behavior need more research with respect to their mechanistic counterparts and thus can further inform how cognitive science might progress.\nDespite the fact that different objects possess distinct class-specific features, they also usually share common patterns. This observation has been exploited partially in a recently proposed dictionary learning framework by separating the particularity and the commonality (COPAR). Inspired by this, we propose a novel method to explicitly and simultaneously learn a set of common patterns as well as class-specific features for classification with more intuitive constraints. Our dictionary learning framework is hence characterized by both a shared dictionary and particular (class-specific) dictionaries. For the shared dictionary, we enforce a low-rank constraint, i.e. claim that its spanning subspace should have low dimension and the coefficients corresponding to this dictionary should be similar. For the particular dictionaries, we impose on them the well-known constraints stated in the Fisher discrimination dictionary learning (FDDL). Further, we develop new fast and accurate algorithms to solve the subproblems in the learning step, accelerating its convergence. The said algorithms could also be applied to FDDL and its extensions. The efficiencies of these algorithms are theoretically and experimentally verified by comparing their complexities and running time with those of other well-known dictionary learning methods. 
Experimental results on widely used image datasets establish the advantages of our method over state-of-the-art dictionary learning methods.\nObjective: In this paper, we develop a personalized real-time risk scoring algorithm that provides timely and granular assessments for the clinical acuity of ward patients based on their (temporal) lab tests and vital signs; the proposed risk scoring system ensures timely intensive care unit (ICU) admissions for clinically deteriorating patients. Methods: The risk scoring system learns a set of latent patient subtypes from the offline electronic health record data, and trains a mixture of Gaussian Process (GP) experts, where each expert models the physiological data streams associated with a specific patient subtype. Transfer learning techniques are used to learn the relationship between a patient's latent subtype and her static admission information (e.g., age, gender, transfer status, ICD-9 codes, etc.). Results: Experiments conducted on data from a heterogeneous cohort of 6,321 patients admitted to Ronald Reagan UCLA Medical Center show that our risk score significantly and consistently outperforms the currently deployed risk scores, such as the Rothman index, MEWS, APACHE and SOFA scores, in terms of timeliness, true positive rate (TPR), and positive predictive value (PPV). Conclusion: Our results reflect the importance of adopting the concepts of personalized medicine in critical care settings; significant accuracy and timeliness gains can be achieved by accounting for the patients' heterogeneity. Significance: The proposed risk scoring methodology can confer huge clinical and social benefits on more than 200,000 critically ill inpatients who suffer cardiac arrests in the US every year.\nIn Bayesian statistics, probability distributions express beliefs. However, for many problems the beliefs cannot be computed analytically and approximations of beliefs are needed. 
We seek a loss function that quantifies how \"embarrassing\" it is to communicate a given approximation. We reproduce and discuss an old proof showing that there is only one ranking under the requirements that (1) the best ranked approximation is the non-approximated belief and (2) the ranking judges approximations only by their predictions for actual outcomes. The loss function that is obtained in the derivation is equal to the Kullback-Leibler divergence when normalized. This loss function is frequently used in the literature. However, there seems to be confusion about the correct order in which its functional arguments, the approximated and non-approximated beliefs, should be used. The correct order ensures that the recipient of a communication is only deprived of the minimal amount of information. We hope that the elementary derivation settles the apparent confusion. For example, when approximating beliefs with Gaussian distributions, the optimal approximation is given by moment matching. This is in contrast to many suggested computational schemes.\nResearch on personalized recommendation techniques has today mostly split into two mainstream directions, i.e., factorization-based approaches and topic models. In practice, they aim to benefit from numerical ratings and textual reviews, respectively, which constitute two major information sources in various real-world systems. However, although the two approaches are supposed to be correlated by their shared goal of accurate recommendation, there is still no clear theoretical understanding of how their objective functions can be mathematically bridged to leverage the numerical ratings and textual reviews collectively, and why such a bridge is intuitively reasonable to match up their learning procedures for the rating prediction and top-N recommendation tasks, respectively. 
In this work, we show through mathematical analysis that the vector-level randomization functions that would coordinate the optimization objectives of factorizational and topic models unfortunately do not exist at all, although they are usually presupposed and intuitively designed in the literature. Fortunately, we also point out that one can avoid seeking such a randomization function by optimizing a Joint Factorizational Topic (JFT) model directly. We apply our JFT model to restaurant recommendation, and study its performance in both normal and cross-city recommendation scenarios, where the latter is an extremely difficult task because of its inherent cold-start nature. Experimental results on real-world datasets verify the appealing performance of our approach against previous methods, on both rating prediction and top-N recommendation tasks.\nVision-based object detection is one of the fundamental functions in numerous traffic scene applications such as self-driving vehicle systems and advanced driver assistance systems (ADAS). However, it is also a challenging task due to the diversity of traffic scenes and the storage, power and computing resource limitations of the platforms for traffic scene applications. This paper presents a generalized Haar filter based deep network which is suitable for object detection tasks in traffic scenes. In this approach, we first decompose an object detection task into several easier local regression tasks. Then, we handle the local regression tasks by using several tiny deep networks which simultaneously output the bounding boxes, categories and confidence scores of detected objects. To reduce the consumption of storage and computing resources, the weights of the deep networks are constrained to the form of generalized Haar filters in the training phase. Additionally, we introduce a sparse window generation strategy to improve the efficiency of the algorithm. 
Finally, we perform several experiments to validate the performance of our proposed approach. Experimental results demonstrate that the proposed approach is both efficient and effective in traffic scenes compared with the state of the art.\nMost of the existing graph embedding methods focus on nodes, aiming to output a vector representation for each node in the graph such that two nodes that are \"close\" on the graph are also close in the low-dimensional space. Despite the success of embedding individual nodes for graph analytics, we notice that an important concept, embedding communities (i.e., groups of nodes), is missing. Embedding communities is useful not only for supporting various community-level applications, but also for helping preserve community structure in graph embedding. In fact, we see community embedding as providing a higher-order proximity to define node closeness, whereas most of the popular graph embedding methods focus on first-order and/or second-order proximities. To learn the community embedding, we build on the insight that community embedding and node embedding reinforce each other. As a result, we propose ComEmbed, the first community embedding method, which jointly optimizes the community embedding and the node embedding. We evaluate ComEmbed on real-world data sets. We show it outperforms the state-of-the-art baselines in both tasks of node classification and community prediction.\nNeural networks (NN) have achieved state-of-the-art performance in various applications. Unfortunately, in applications where training data are insufficient, they are often prone to overfitting. One effective way to alleviate this problem is to exploit the Bayesian approach by using Bayesian neural networks (BNN). Another shortcoming of NN is the lack of flexibility to customize different distributions for the weights and neurons according to the data, as is often done in probabilistic graphical models. 
To address these problems, we propose a class of probabilistic neural networks, dubbed natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment of NN. NPN allows the usage of arbitrary exponential-family distributions to model the weights and neurons. Different from traditional NN and BNN, NPN takes distributions as input and goes through layers of transformation before producing distributions to match the target output distributions. As a Bayesian treatment, efficient backpropagation (BP) is performed to learn the natural parameters for the distributions over both the weights and neurons. The output distributions of each layer, as byproducts, may be used as second-order representations for the associated tasks such as link prediction. Experiments on real-world datasets show that NPN can achieve state-of-the-art performance.\nHybrid methods that utilize both content and rating information are commonly used in many recommender systems. However, most of them use either handcrafted features or the bag-of-words representation as a surrogate for the content information but they are neither effective nor natural enough. To address this problem, we develop a collaborative recurrent autoencoder (CRAE) which is a denoising recurrent autoencoder (DRAE) that models the generation of content sequences in the collaborative filtering (CF) setting. The model generalizes recent advances in recurrent deep learning from i.i.d. input to non-i.i.d. (CF-based) input and provides a new denoising scheme along with a novel learnable pooling scheme for the recurrent autoencoder. To do this, we first develop a hierarchical Bayesian model for the DRAE and then generalize it to the CF setting. The synergy between denoising and CF enables CRAE to make accurate recommendations while learning to fill in the blanks in sequences. 
Experiments on real-world datasets from different domains (CiteULike and Netflix) show that, by jointly modeling the order-aware generation of sequences for the content information and performing CF for the ratings, CRAE is able to significantly outperform the state of the art on both the recommendation task based on ratings and the sequence generation task based on content information.\nThe Neural GPU is a recent model that can learn algorithms such as multi-digit binary addition and binary multiplication in a way that generalizes to inputs of arbitrary length. We show that there are two simple ways of improving the performance of the Neural GPU: by carefully designing a curriculum, and by increasing model size. The latter requires a memory efficient implementation, as a naive implementation of the Neural GPU is memory intensive. We find that these techniques increase the set of algorithmic problems that can be solved by the Neural GPU: we have been able to learn to perform all the arithmetic operations (and generalize to arbitrarily long numbers) when the arguments are given in the decimal representation (which, surprisingly, has not been possible before). We have also been able to train the Neural GPU to evaluate long arithmetic expressions with multiple operands that require respecting the precedence order of the operands, although these have succeeded only in their binary representation, and not with perfect accuracy.   In addition, we gain insight into the Neural GPU by investigating its failure modes. We find that Neural GPUs that correctly generalize to arbitrarily long numbers still fail to compute the correct answer on highly-symmetric, atypical inputs: for example, a Neural GPU that achieves near-perfect generalization on decimal multiplication of up to 100-digit long numbers can fail on $000000\\dots002 \\times 000000\\dots002$ while succeeding at $2 \\times 2$. 
These failure modes are reminiscent of adversarial examples.\nVarious families of malware use domain generation algorithms (DGAs) to generate a large number of pseudo-random domain names to connect to a command and control (C&C) server. In order to block DGA C&C traffic, security organizations must first discover the algorithm by reverse engineering malware samples, then generating a list of domains for a given seed. The domains are then either preregistered or published in a DNS blacklist. This process is not only tedious, but can be readily circumvented by malware authors using a large number of seeds in algorithms with multivariate recurrence properties (e.g., banjori) or by using a dynamic list of seeds (e.g., bedep). Another technique to stop malware from using DGAs is to intercept DNS queries on a network and predict whether domains are DGA generated. Such a technique will alert network administrators to the presence of malware on their networks. In addition, if the predictor can also accurately predict the family of DGAs, then network administrators can also be alerted to the type of malware that is on their networks. This paper presents a DGA classifier that leverages long short-term memory (LSTM) networks to predict DGAs and their respective families without the need for a priori feature extraction. Results are significantly better than state-of-the-art techniques, providing 0.9993 area under the receiver operating characteristic curve for binary classification and a micro-averaged F1 score of 0.9906. In other terms, the LSTM technique can provide a 90% detection rate with a 1:10000 false positive (FP) rate---a twenty times FP improvement over comparable methods. 
Experiments in this paper are run on open datasets and code snippets are provided to reproduce the results.\nDetermining the optimal size and orientation of small-scale residential based PV arrays will become increasingly complex in the future smart grid environment with the introduction of smart meters and dynamic tariffs. However consumers can leverage the availability of smart meter data to conduct a more detailed exploration of PV investment options for their particular circumstances. In this paper, an optimization method for PV orientation and sizing is proposed whereby maximizing the PV investment value is set as the defining objective. Solar insolation and PV array models are described to form the basis of the PV array optimization strategy. A constrained particle swarm optimization algorithm is selected due to its strong performance in non-linear applications. The optimization algorithm is applied to real-world metered data to quantify the possible investment value of a PV installation under different energy retailers and tariff structures. The arrangement with the highest value is determined to enable prospective small-scale PV investors to select the most cost-effective system.\nNeural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set. On the CIFAR-10 dataset, our method, starting from scratch, can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. 
Our CIFAR-10 model achieves a test error rate of 3.65, which is 0.09 percent better and 1.05x faster than the previous state-of-the-art model that used a similar architectural scheme. On the Penn Treebank dataset, our model can compose a novel recurrent cell that outperforms the widely-used LSTM cell, and other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model. The cell can also be transferred to the character language modeling task on PTB and achieves a state-of-the-art perplexity of 1.214.\nIn this paper, we propose TopicRNN, a recurrent neural network (RNN)-based language model designed to directly capture the global semantic meaning relating words in a document via latent topics. Because of their sequential nature, RNNs are good at capturing the local structure of a word sequence - both semantic and syntactic - but might face difficulty remembering long-range dependencies. Intuitively, these long-range dependencies are of semantic nature. In contrast, latent topic models are able to capture the global underlying semantic structure of a document but do not account for word ordering. The proposed TopicRNN model integrates the merits of RNNs and latent topic models: it captures local (syntactic) dependencies using an RNN and global (semantic) dependencies using latent topics. Unlike previous work on contextual RNN language modeling, our model is learned end-to-end. Empirical results on word prediction show that TopicRNN outperforms existing contextual RNN baselines. In addition, TopicRNN can be used as an unsupervised feature extractor for documents. We do this for sentiment analysis on the IMDB movie review dataset and report an error rate of $6.28\\%$. This is comparable to the state-of-the-art $5.91\\%$ resulting from a semi-supervised approach. 
Finally, TopicRNN also yields sensible topics, making it a useful alternative to document models such as latent Dirichlet allocation.\nRecent years have seen the proposal of a number of neural architectures for the problem of Program Induction. Given a set of input-output examples, these architectures are able to learn mappings that generalize to new test inputs. While achieving impressive results, these approaches have a number of important limitations: (a) they are computationally expensive and hard to train, (b) a model has to be trained for each task (program) separately, and (c) it is hard to interpret or verify the correctness of the learnt mapping (as it is defined by a neural network). In this paper, we propose a novel technique, Neuro-Symbolic Program Synthesis, to overcome the above-mentioned problems. Once trained, our approach can automatically construct computer programs in a domain-specific language that are consistent with a set of input-output examples provided at test time. Our method is based on two novel neural modules. The first module, called the cross correlation I/O network, given a set of input-output examples, produces a continuous representation of the set of I/O examples. The second module, the Recursive-Reverse-Recursive Neural Network (R3NN), given the continuous representation of the examples, synthesizes a program by incrementally expanding partial programs. We demonstrate the effectiveness of our approach by applying it to the rich and complex domain of regular expression based string transformations. Experiments show that the R3NN model is not only able to construct programs from new input-output examples, but it is also able to construct new programs for tasks that it had never observed before during training.\nDeep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. 
In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a \"fast\" reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data. In our proposed method, RL$^2$, the algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose (\"slow\") RL algorithm. The RNN receives all information a typical RL algorithm would receive, including observations, actions, rewards, and termination flags; and it retains its state across episodes in a given Markov Decision Process (MDP). The activations of the RNN store the state of the \"fast\" RL algorithm on the current (previously unseen) MDP. We evaluate RL$^2$ experimentally on both small-scale and large-scale problems. On the small-scale side, we train it to solve randomly generated multi-arm bandit problems and finite MDPs. After RL$^2$ is trained, its performance on new MDPs is close to human-designed algorithms with optimality guarantees. On the large-scale side, we test RL$^2$ on a vision-based navigation task and show that it scales up to high-dimensional problems.\nAcquiring your first language is an incredible feat and not easily duplicated. Learning to communicate using nothing but a few pictureless books, a corpus, would likely be impossible even for humans. Nevertheless, this is the dominating approach in most natural language processing today. As an alternative, we propose the use of situated interactions between agents as a driving force for communication, and the framework of Deep Recurrent Q-Networks for evolving a shared language grounded in the provided environment. We task the agents with interactive image search in the form of the game Guess Who?. 
The images from the game provide a non trivial environment for the agents to discuss and a natural grounding for the concepts they decide to encode in their communication. Our experiments show that the agents learn not only to encode physical concepts in their words, i.e. grounding, but also that the agents learn to hold a multi-step dialogue remembering the state of the dialogue from step to step.\nGenerative adversarial networks (GANs) are a recently proposed class of generative models in which a generator is trained to optimize a cost function that is being simultaneously learned by a discriminator. While the idea of learning cost functions is relatively new to the field of generative modeling, learning costs has long been studied in control and reinforcement learning (RL) domains, typically for imitation learning from demonstrations. In these fields, learning cost function underlying observed behavior is known as inverse reinforcement learning (IRL) or inverse optimal control. While at first the connection between cost learning in RL and cost learning in generative modeling may appear to be a superficial one, we show in this paper that certain IRL methods are in fact mathematically equivalent to GANs. In particular, we demonstrate an equivalence between a sample-based algorithm for maximum entropy IRL and a GAN in which the generator's density can be evaluated and is provided as an additional input to the discriminator. Interestingly, maximum entropy IRL is a special case of an energy-based model. We discuss the interpretation of GANs as an algorithm for training energy-based models, and relate this interpretation to other recent work that seeks to connect GANs and EBMs. 
By formally highlighting the connection between GANs, IRL, and EBMs, we hope that researchers in all three communities can better identify and apply transferable ideas from one domain to another, particularly for developing more stable and scalable algorithms: a major challenge in all three domains.\nMany recent works have demonstrated the benefits of knowledge graph embeddings in completing monolingual knowledge graphs. Inasmuch as related knowledge bases are built in several different languages, achieving cross-lingual knowledge alignment will help people in constructing a coherent knowledge base, and assist machines in dealing with different expressions of entity relationships across diverse human languages. Unfortunately, achieving this highly desirable crosslingual alignment by human labor is very costly and errorprone. Thus, we propose MTransE, a translation-based model for multilingual knowledge graph embeddings, to provide a simple and automated solution. By encoding entities and relations of each language in a separated embedding space, MTransE provides transitions for each embedding vector to its cross-lingual counterparts in other spaces, while preserving the functionalities of monolingual embeddings. We deploy three different techniques to represent cross-lingual transitions, namely axis calibration, translation vectors, and linear transformations, and derive five variants for MTransE using different loss functions. Our models can be trained on partially aligned graphs, where just a small portion of triples are aligned with their cross-lingual counterparts. The experiments on cross-lingual entity matching and triple-wise alignment verification show promising results, with some variants consistently outperforming others on different tasks. 
We also explore how MTransE preserves the key properties of its monolingual counterpart TransE.\nWe consider the problem of identifying the causal direction between two discrete random variables using observational data. Unlike previous work, we keep the most general functional model but make an assumption on the unobserved exogenous variable: Inspired by Occam's razor, we assume that the exogenous variable is simple in the true causal direction. We quantify simplicity using R\\'enyi entropy. Our main result is that, under natural assumptions, if the exogenous variable has low $H_0$ entropy (cardinality) in the true direction, it must have high $H_0$ entropy in the wrong direction. We establish several algorithmic hardness results about estimating the minimum entropy exogenous variable. We show that the problem of finding the exogenous variable with minimum entropy is equivalent to the problem of finding minimum joint entropy given $n$ marginal distributions, also known as minimum entropy coupling problem. We propose an efficient greedy algorithm for the minimum entropy coupling problem, that for $n=2$ provably finds a local optimum. This gives a greedy algorithm for finding the exogenous variable with minimum $H_1$ (Shannon Entropy). Our greedy entropy-based causal inference algorithm has similar performance to the state of the art additive noise models in real datasets. One advantage of our approach is that we make no use of the values of random variables but only their distributions. Our method can therefore be used for causal inference for both ordinal and also categorical data, unlike additive noise models.\nIn this paper, we propose commonsense knowledge enhanced embeddings (KEE) for solving the Pronoun Disambiguation Problems (PDP). The PDP task we investigate in this paper is a complex coreference resolution task which requires the utilization of commonsense knowledge. This task is a standard first round test set in the 2016 Winograd Schema Challenge. 
In this task, traditional linguistic features that are useful for coreference resolution, e.g. context and gender information, are no longer effective anymore. Therefore, the KEE models are proposed to provide a general framework to make use of commonsense knowledge for solving the PDP problems. Since the PDP task doesn't have training data, the KEE models would be used during the unsupervised feature extraction process. To evaluate the effectiveness of the KEE models, we propose to incorporate various commonsense knowledge bases, including ConceptNet, WordNet, and CauseCom, into the KEE training process. We achieved the best performance by applying the proposed methods to the 2016 Winograd Schema Challenge. In addition, experiments conducted on the standard PDP task indicate that, the proposed KEE models could solve the PDP problems by achieving 66.7% accuracy, which is a new state-of-the-art performance.\nThe domain of single crossing preference profiles is a widely studied domain in social choice theory. It has been generalized to the domain of single crossing preference profiles with respect to trees which inherits many desirable properties from the single crossing domain, for example, transitivity of majority relation, existence of polynomial time algorithms for finding winners of Kemeny voting rule, etc. In this paper, we consider a further generalization of the domain of single crossing profiles on trees to the domain consisting of all preference profiles which can be extended to single crossing preference profiles with respect to some tree by adding more preferences to it. We call this domain the weakly single crossing domain on trees. We present a polynomial time algorithm for recognizing weakly single crossing profiles on trees. 
We then move on to develop a polynomial time algorithm with low query complexity for eliciting weakly single crossing profiles on trees even when we do not know any tree with respect to which the closure of the input profile is single crossing and the preferences can be queried only sequentially; moreover, the sequential order is also unknown. We complement the performance of our preference elicitation algorithm by proving that our algorithm makes an optimal number of queries up to constant factors when the number of preferences is large compared to the number of candidates, even if the input profile is known to be single crossing with respect to some given tree and the preferences can be accessed randomly.\nWe propose a method to optimize the representation and distinguishability of samples from two probability distributions, by maximizing the estimated power of a statistical test based on the maximum mean discrepancy (MMD). This optimized MMD is applied to the setting of unsupervised learning by generative adversarial networks (GAN), in which a model attempts to generate realistic samples, and a discriminator attempts to tell these apart from data samples. In this context, the MMD may be used in two roles: first, as a discriminator, either directly on the samples, or on features of the samples. Second, the MMD can be used to evaluate the performance of a generative model, by testing the model's samples against a reference data set. In the latter role, the optimized MMD is particularly helpful, as it gives an interpretable indication of how the model and data distributions differ, even in cases where individual model samples are not easily distinguished either by eye or by classifier.\nWe propose an approach to build a neural machine translation system with no supervised resources (i.e., no parallel corpora) using multimodal embedded representation over texts and images. 
Based on the assumption that text documents are often likely to be described with other multimedia information (e.g., images) somewhat related to the content, we try to indirectly estimate the relevance between two languages. Using multimedia as the \"pivot\", we project all modalities into one common hidden space where samples belonging to similar semantic concepts should come close to each other, whatever the observed space of each sample is. This modality-agnostic representation is the key to bridging the gap between different modalities. Putting a decoder on top of it, our network can flexibly draw the outputs from any input modality. Notably, in the testing phase, we need only source language texts as the input for translation. In experiments, we tested our method on two benchmarks to show that it can achieve reasonable translation performance. We compared and investigated several possible implementations and found that an end-to-end model that simultaneously optimized both rank loss in multimodal encoders and cross-entropy loss in decoders performed the best.\nMax-cut, clustering, and many other partitioning problems that are of significant importance to machine learning and other scientific fields are NP-hard, a reality that has motivated researchers to develop a wealth of approximation algorithms and heuristics. Although the best algorithm to use typically depends on the specific application domain, a worst-case analysis is often used to compare algorithms. This may be misleading if worst-case instances occur infrequently, and thus there is a demand for optimization methods which return the algorithm configuration best suited for the given application's typical inputs. 
We address this problem for clustering, max-cut, and other partitioning problems, such as integer quadratic programming, by designing computationally efficient and sample efficient learning algorithms which receive samples from an application-specific distribution over problem instances and learn a partitioning algorithm with high expected performance. Our algorithms learn over common integer quadratic programming and clustering algorithm families: SDP rounding algorithms and agglomerative clustering algorithms with dynamic programming. For our sample complexity analysis, we provide tight bounds on the pseudodimension of these algorithm classes, and show that surprisingly, even for classes of algorithms parameterized by a single parameter, the pseudo-dimension is superconstant. In this way, our work both contributes to the foundations of algorithm configuration and pushes the boundaries of learning theory, since the algorithm classes we analyze consist of multi-stage optimization procedures and are significantly more complex than classes typically studied in learning theory.\nOur aging population increasingly suffers from multiple chronic diseases simultaneously, necessitating the comprehensive treatment of these conditions. Finding the optimal set of drugs for a combinatorial set of diseases is a combinatorial pattern exploration problem. Association rule mining is a popular tool for such problems, but the requirement of health care for finding causal, rather than associative, patterns renders association rule mining unsuitable. To address this issue, we propose a novel framework based on the Rubin-Neyman causal model for extracting causal rules from observational data, correcting for a number of common biases. 
Specifically, given a set of interventions and a set of items that define subpopulations (e.g., diseases), we wish to find all subpopulations in which effective intervention combinations exist and in each such subpopulation, we wish to find all intervention combinations such that dropping any intervention from this combination will reduce the efficacy of the treatment. A key aspect of our framework is the concept of closed intervention sets which extend the concept of quantifying the effect of a single intervention to a set of concurrent interventions. We also evaluated our causal rule mining framework on the Electronic Health Records (EHR) data of a large cohort of patients from Mayo Clinic and showed that the patterns we extracted are sufficiently rich to explain the controversial findings in the medical literature regarding the effect of a class of cholesterol drugs on Type-II Diabetes Mellitus (T2DM).\nOver the last few years, the number of smart objects connected to the Internet has grown exponentially in comparison to the number of services and applications. The integration between Cloud Computing and Internet of Things, named as Cloud of Things, plays a key role in managing the connected things, their data and services. One of the main challenges in Cloud of Things is the resource discovery of the smart objects and their reuse in different contexts. Most of the existent work uses some kind of multi-criteria decision analysis algorithm to perform the resource discovery, but do not evaluate the impact that the user constraints has in the final solution. In this paper, we analyse the behaviour of the SAW, TOPSIS and VIKOR multi-objective decision analyses algorithms and the impact of user constraints on them. 
We evaluated the quality of the proposed solutions using the Pareto-optimality concept.\nIn order to have effective human AI collaboration, it is not simply enough to address the question of autonomy; an equally important question is, how the AI's behavior is being perceived by their human counterparts. When AI agent's task plans are generated without such considerations, they may often demonstrate inexplicable behavior from the human's point of view. This problem arises due to the human's partial or inaccurate understanding of the agent's planning process and/or the model. This may have serious implications on human-AI collaboration, from increased cognitive load and reduced trust in the agent, to more serious concerns of safety in interactions with physical agent. In this paper, we address this issue by modeling the notion of plan explicability as a function of the distance between a plan that agent makes and the plan that human expects it to make. To this end, we learn a distance function based on different plan distance measures that can accurately model this notion of plan explicability, and develop an anytime search algorithm that can use this distance as a heuristic to come up with progressively explicable plans. We evaluate the effectiveness of our approach in a simulated autonomous car domain and a physical service robot domain. We provide empirical evaluations that demonstrate the usefulness of our approach in making the planning process of an autonomous agent conform to human expectations.\nA game theoretic distributed decision making approach is presented for the problem of control effort allocation in a robotic team based on a novel variant of fictitious play. The proposed learning process allows the robots to accomplish their objectives by coordinating their actions in order to efficiently complete their tasks. 
In particular, each robot of the team predicts the other robots' planned actions while making decisions to maximise their own expected reward that depends on the reward for joint successful completion of the task. Action selection is interpreted as an $n$-player cooperative game. The approach presented can be seen as part of the \\emph{Belief Desire Intention} (BDI) framework, also can address the problem of cooperative, legal, safe, considerate and emphatic decisions by robots if their individual and group rewards are suitably defined. After theoretical analysis the performance of the proposed algorithm is tested on four simulation scenarios. The first one is a coordination game between two material handling robots, the second one is a warehouse patrolling task by a team of robots, the third one presents a coordination mechanism between two robots that carry a heavy object on a corridor and the fourth one is an example of coordination on a sensors network.\nCurrent state of the art in the field of UAV activation relies solely on human operators for the design and adaptation of the drones' flying routes. Furthermore, this is being done today on an individual level (one vehicle per operators), with some exceptions of a handful of new systems, that are comprised of a small number of self-organizing swarms, manually guided by a human operator.   Drones-based monitoring is of great importance in variety of civilian domains, such as road safety, homeland security, and even environmental control. In its military aspect, efficiently detecting evading targets by a fleet of unmanned drones has an ever increasing impact on the ability of modern armies to engage in warfare. The latter is true both traditional symmetric conflicts among armies as well as asymmetric ones. Be it a speeding driver, a polluting trailer or a covert convoy, the basic challenge remains the same -- how can its detection probability be maximized using as little number of drones as possible.   
In this work we propose a novel approach for the optimization of large scale swarms of reconnaissance drones -- capable of producing on-demand optimal coverage strategies for any given search scenario. Given an estimation cost of the threat's potential damages, as well as types of monitoring drones available and their comparative performance, our proposed method generates an analytically provable strategy, stating the optimal number and types of drones to be deployed, in order to cost-efficiently monitor a pre-defined region for targets maneuvering using a given roads networks.   We demonstrate our model using a unique dataset of the Israeli transportation network, on which different deployment schemes for drones deployment are evaluated.\nIn recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can support meta-learning in a fully supervised context. We extend this approach to the RL setting. What emerges is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure. This second, learned RL algorithm can differ from the original one in arbitrary ways. Importantly, because it is learned, it is configured to exploit structure in the training domain. We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL. 
We consider prospects for extending and scaling up the approach, and also point out some potentially important implications for neuroscience.\nWhile much research effort has been dedicated to scaling up sparse Gaussian process (GP) models based on inducing variables for big data, little attention is afforded to the other less explored class of low-rank GP approximations that exploit the sparse spectral representation of a GP kernel. This paper presents such an effort to advance the state of the art of sparse spectrum GP models to achieve competitive predictive performance for massive datasets. Our generalized framework of stochastic variational Bayesian sparse spectrum GP (sVBSSGP) models addresses their shortcomings by adopting a Bayesian treatment of the spectral frequencies to avoid overfitting, modeling these frequencies jointly in its variational distribution to enable their interaction a posteriori, and exploiting local data for boosting the predictive performance. However, such structural improvements result in a variational lower bound that is intractable to be optimized. To resolve this, we exploit a variational parameterization trick to make it amenable to stochastic optimization. Interestingly, the resulting stochastic gradient has a linearly decomposable structure that can be exploited to refine our stochastic optimization method to incur constant time per iteration while preserving its property of being an unbiased estimator of the exact gradient of the variational lower bound. Empirical evaluation on real-world datasets shows that sVBSSGP outperforms state-of-the-art stochastic implementations of sparse GP models.\nGaussian processes (GP) provide a prior over functions and allow finding complex regularities in data. Gaussian processes are successfully used for classification/regression problems and dimensionality reduction. In this work we consider the classification problem only. 
The complexity of standard methods for GP-classification scales cubically with the size of the training dataset. This complexity makes them inapplicable to big data problems. Therefore, a variety of methods were introduced to overcome this limitation. In the paper we focus on methods based on so called inducing inputs. This approach is based on variational inference and proposes a particular lower bound for marginal likelihood (evidence). This bound is then maximized w.r.t. parameters of kernel function of the Gaussian process, thus fitting the model to data. The computational complexity of this method is $O(nm^2)$, where $m$ is the number of inducing inputs used by the model and is assumed to be substantially smaller than the size of the dataset $n$. Recently, a new evidence lower bound for GP-classification problem was introduced. It allows using stochastic optimization, which makes it suitable for big data problems. However, the new lower bound depends on $O(m^2)$ variational parameter, which makes optimization challenging in case of big m. In this work we develop a new approach for training inducing input GP models for classification problems. Here we use quadratic approximation of several terms in the aforementioned evidence lower bound, obtaining analytical expressions for optimal values of most of the parameters in the optimization, thus sufficiently reducing the dimension of optimization space. In our experiments we achieve as well or better results, compared to the existing method. Moreover, our method doesn't require the user to manually set the learning rate, making it more practical, than the existing method.\nStructural causal models (SCMs), also known as non-parametric structural equation models (NP-SEMs), are widely used for causal modeling purposes. In this paper, we give a rigorous treatment of structural causal models, dealing with measure-theoretic complications that arise in the presence of cyclic relations. 
The central question studied in this paper is: given a (possibly cyclic) SCM defined on a large system (consisting of observable endogenous and latent exogenous variables), can we \"project it down\" to an SCM that describes a subsystem (consisting of a subset of the observed endogenous variables and possibly different latent exogenous variables) in order to obtain a more parsimonious but equivalent representation of the subsystem? We define a marginalization operation that effectively removes a subset of the endogenous variables from the model, and a class of mappings, exogenous reparameterizations, that can be used to reduce the space of exogenous variables. We show that both operations preserve the causal semantics of the model and that under mild conditions they can lead to a significant reduction of the model complexity, at least in terms of the number of variables in the model. We argue that for the task of estimating an SCM from data, the existence of \"smooth\" reductions would be desirable. We provide several conditions under which the existence of such reductions can be shown, but also provide a counterexample that shows that such reductions do not exist in general. The latter result implies that existing approaches to estimate linear or Markovian SCMs from data cannot be extended to general SCMs.\nCredit cards play a very important role in today's economy. They have become an unavoidable part of household, business and global activities. Although using credit cards provides enormous benefits when used carefully and responsibly, significant credit and financial damages may be caused by fraudulent activities. Many techniques have been proposed to confront the growth in credit card fraud. Although all of these techniques have the same goal of avoiding credit card fraud, each one has its own drawbacks, advantages and characteristics.
In this paper, after investigating difficulties of credit card fraud detection, we seek to review the state of the art in credit card fraud detection techniques, data sets and evaluation criteria. The advantages and disadvantages of fraud detection methods are enumerated and compared. Furthermore, a classification of mentioned techniques into two main fraud detection approaches, namely, misuse (supervised) and anomaly detection (unsupervised) is presented. Again, a classification of techniques is proposed based on capability to process the numerical and categorical data sets. Different data sets used in literature are then described and grouped into real and synthesized data and the effective and common attributes are extracted for further usage. Moreover, evaluation criteria employed in the literature are collected and discussed. Consequently, open issues for credit card fraud detection are explained as guidelines for new researchers.\nReinforcement learning is concerned with identifying reward-maximizing behaviour policies in environments that are initially unknown. State-of-the-art reinforcement learning approaches, such as deep Q-networks, are model-free and learn to act effectively across a wide range of environments such as Atari games, but require huge amounts of data. Model-based techniques are more data-efficient, but need to acquire explicit knowledge about the environment.   In this paper, we take a step towards using model-based techniques in environments with a high-dimensional visual state space by demonstrating that it is possible to learn system dynamics and the reward structure jointly. Our contribution is to extend a recently developed deep neural network for video frame prediction in Atari games to enable reward prediction as well. To this end, we phrase a joint optimization problem for minimizing both video frame and reward reconstruction loss, and adapt network parameters accordingly.
Empirical evaluations on five Atari games demonstrate accurate cumulative reward prediction of up to 200 frames. We consider these results as opening up important directions for model-based reinforcement learning in complex, initially unknown environments.\nIn this work, we study the guaranteed delivery model which is widely used in online display advertising. In the guaranteed delivery scenario, ad exposures (which are also called impressions in some works) to users are guaranteed by contracts signed in advance between advertisers and publishers. A crucial problem for the advertising platform is how to fully utilize the valuable user traffic to generate as much revenue as possible.   Different from previous works which usually minimize the penalty of unsatisfied contracts and some other cost (e.g. representativeness), we propose the novel consumption minimization model, in which the primary objective is to minimize the user traffic consumed to satisfy all contracts. Under this model, we develop a near optimal method to deliver ads for users. The main advantage of our method lies in that it consumes nearly as little user traffic as possible to satisfy all contracts, therefore more contracts can be accepted to produce more revenue. It also enables the publishers to estimate how much user traffic is redundant or short so that they can sell or buy this part of traffic in bulk in the exchange market. Furthermore, it is robust with regard to a priori knowledge of user type distribution. Finally, the simulation shows that our method outperforms the traditional state-of-the-art methods.\nKnowledge Tracing (KT) is a task of tracing evolving knowledge state of students with respect to one or more concepts as they engage in a sequence of learning activities. One important purpose of KT is to personalize the practice sequence to help students learn knowledge concepts efficiently.
However, existing methods such as Bayesian Knowledge Tracing and Deep Knowledge Tracing either model knowledge state for each predefined concept separately or fail to pinpoint exactly which concepts a student is good at or unfamiliar with. To solve these problems, this work introduces a new model called Dynamic Key-Value Memory Networks (DKVMN) that can exploit the relationships between underlying concepts and directly output a student's mastery level of each concept. Unlike standard memory-augmented neural networks that facilitate a single memory matrix or two static memory matrices, our model has one static matrix called key, which stores the knowledge concepts and the other dynamic matrix called value, which stores and updates the mastery levels of corresponding concepts. Experiments show that our model consistently outperforms the state-of-the-art model in a range of KT datasets. Moreover, the DKVMN model can automatically discover underlying concepts of exercises typically performed by human annotations and depict the changing knowledge state of a student.\nIt is clear that one of the primary tools we can use to mitigate the potential risk from a misbehaving AI system is the ability to turn the system off. As the capabilities of AI systems improve, it is important to ensure that such systems do not adopt subgoals that prevent a human from switching them off. This is a challenge because many formulations of rational agents create strong incentives for self-preservation. This is not caused by a built-in instinct, but because a rational agent will maximize expected utility and cannot achieve whatever objective it has been given if it is dead. Our goal is to study the incentives an agent has to allow itself to be switched off. We analyze a simple game between a human H and a robot R, where H can press R's off switch but R can disable the off switch. 
A traditional agent takes its reward function for granted: we show that such agents have an incentive to disable the off switch, except in the special case where H is perfectly rational. Our key insight is that for R to want to preserve its off switch, it needs to be uncertain about the utility associated with the outcome, and to treat H's actions as important observations about that utility. (R also has no incentive to switch itself off in this setting.) We conclude that giving machines an appropriate level of uncertainty about their objectives leads to safer designs, and we argue that this setting is a useful generalization of the classical AI paradigm of rational agents.\nTo enhance developer productivity, all modern integrated development environments (IDEs) include code suggestion functionality that proposes likely next tokens at the cursor. While current IDEs work well for statically-typed languages, their reliance on type annotations means that they do not provide the same level of support for dynamic programming languages as for statically-typed languages. Moreover, suggestion engines in modern IDEs do not propose expressions or multi-statement idiomatic code. Recent work has shown that language models can improve code suggestion systems by learning from software repositories. This paper introduces a neural language model with a sparse pointer network aimed at capturing very long-range dependencies. We release a large-scale code suggestion corpus of 41M lines of Python code crawled from GitHub. On this corpus, we found standard neural language models to perform well at suggesting local phenomena, but struggle to refer to identifiers that are introduced many tokens in the past. By augmenting a neural language model with a pointer network specialized in referring to predefined classes of identifiers, we obtain a much lower perplexity and a 5 percentage points increase in accuracy for code suggestion compared to an LSTM baseline. 
In fact, this increase in code suggestion accuracy is due to a 13 times more accurate prediction of identifiers. Furthermore, a qualitative analysis shows this model indeed captures interesting long-range dependencies, like referring to a class member defined over 60 tokens in the past.\nDecision support systems help decision makers make better decisions in the face of complex decision problems (e.g. investment or policy decisions). Fisheries and Aquaculture is a domain where decision makers face such decisions since they involve factors from many different scientific fields. No systematic overview of literature describing decision support systems and their application in fisheries and aquaculture has been conducted. This paper summarizes scientific literature that describes decision support systems applied to the domain of Fisheries and Aquaculture. We use an established systematic mapping survey method to conduct our literature mapping. Our research questions are: What decision support systems for fisheries and aquaculture exist? What are the most investigated fishery and aquaculture decision support systems topics and how have these changed over time? Do any current DSS for fisheries provide real-time analytics? Do DSSes in Fisheries and Aquaculture build their models using machine learning done on captured and grounded data? The paper then details how we employ the systematic mapping method in answering these questions. This results in 27 papers being identified as relevant and gives an exposition on the primary methods concluded in the study for designing a decision support system. We provide an analysis of the research done in the studies collected. We discovered that most literature does not consider multiple aspects for multiple stakeholders in their work.
In addition we observed that little or no work has been done with real-time analysis in these decision support systems.\nModel generation is a problem complementary to theorem proving and is important for fault analysis and debugging of formal specifications of security protocols, programs and terminological definitions. This paper discusses several ways of enhancing the paradigm of bottom-up model generation. The two main contributions are new, generalized blocking techniques and a new range-restriction transformation. The blocking techniques are based on simple transformations of the input set together with standard equality reasoning and redundancy elimination techniques. These provide general methods for finding small, finite models. The range-restriction transformation refines existing transformations to range-restricted clauses by carefully limiting the creation of domain terms. All possible combinations of the introduced techniques and classical range-restriction were tested on the clausal problems of the TPTP Version 6.0.0 with an implementation based on the SPASS theorem prover using a hyperresolution-like refinement. Unrestricted domain blocking gave best results for satisfiable problems showing it is a powerful technique indispensable for bottom-up model generation methods. Both in combination with the new range-restricting transformation, and the classical range-restricting transformation, good results have been obtained. Limiting the creation of terms during the inference process by using the new range restricting transformation has paid off, especially when using it together with a shifting transformation. The experimental results also show that classical range restriction with unrestricted blocking provides a useful complementary method. 
Overall, the results showed bottom-up model generation methods were good for disproving theorems and generating models for satisfiable problems, but less efficient than SPASS in auto mode for unsatisfiable problems.\nUnobserved or unknown confounders complicate even the simplest attempts to estimate the effect of one variable on another using observational data. When cause and effect are both affected by unobserved confounders, methods based on identifying natural experiments have been proposed to eliminate confounds. However, their validity is hard to verify because they depend on assumptions about the independence of variables, that by definition, cannot be measured. In this paper we investigate a particular scenario in time series data that permits causal identification in the presence of unobserved confounders and present an algorithm to automatically find such scenarios. Specifically, we examine what we call the split-door setting, when the effect variable can be split up into two parts: one that is potentially affected by the cause, and another that is independent of it. We show that when both of these variables are caused by the same (unobserved) confounders, the problem of identification reduces to that of testing for independence among observed variables. We discuss various situations in which split-door variables are commonly recorded in both online and offline settings, and demonstrate the method by estimating the causal impact of Amazon's recommender system, obtaining more than 23,000 natural experiments that provide similar---but more precise---estimates than past studies.\nWith the advent of semantic web, various tools and techniques have been introduced for presenting and organizing knowledge. Concept hierarchies are one such technique which gained significant attention due to its usefulness in creating domain ontologies that are considered as an integral part of semantic web. 
Automated concept hierarchy learning algorithms focus on extracting relevant concepts from unstructured text corpus and connect them together by identifying some potential relations that exist between them. In this paper, we propose a novel approach for identifying relevant concepts from plain text and then learning a hierarchy of concepts by exploiting subsumption relation between them. To start with, we model topics using a probabilistic topic model and then make use of some lightweight linguistic process to extract semantically rich concepts. Then we connect concepts by identifying an \"is-a\" relationship between pair of concepts. The proposed method is completely unsupervised and there is no need for a domain specific training corpus for concept extraction and learning. Experiments on large and real-world text corpora such as BBC News dataset and Reuters News corpus show that the proposed method outperforms some of the existing methods for concept extraction and efficient concept hierarchy learning is possible if the overall task is guided by a probabilistic topic modeling algorithm.\nThe applicability of fractional order (FO) automatic generation control (AGC) for power system frequency oscillation damping is investigated in this paper, employing distributed energy generation. The hybrid power system employs various autonomous generation systems like wind turbine, solar photovoltaic, diesel engine, fuel-cell and aqua electrolyzer along with other energy storage devices like the battery and flywheel. The controller is placed in a remote location while receiving and sending signals over an unreliable communication network with stochastic delay. The controller parameters are tuned using robust optimization techniques employing different variants of Particle Swarm Optimization (PSO) and are compared with the corresponding optimal solutions. An archival based strategy is used for reducing the number of function evaluations for the robust optimization methods.
The solutions obtained through the robust optimization are able to handle higher variation in the controller gains and orders without significant decrease in the system performance. This is desirable from the FO controller implementation point of view, as the design is able to accommodate variations in the system parameter which may result due to the approximation of FO operators, using different realization methods and order of accuracy. Also a comparison is made between the FO and the integer order (IO) controllers to highlight the merits and demerits of each scheme.\nHumans are remarkably adept at interpreting the gaze direction of other individuals in their surroundings. This skill is at the core of the ability to engage in joint visual attention, which is essential for establishing social interactions. How accurate are humans in determining the gaze direction of others in lifelike scenes, when they can move their heads and eyes freely, and what are the sources of information for the underlying perceptual processes? These questions pose a challenge from both empirical and computational perspectives, due to the complexity of the visual input in real-life situations. Here we measure empirically human accuracy in perceiving the gaze direction of others in lifelike scenes, and study computationally the sources of information and representations underlying this cognitive capacity. We show that humans perform better in face-to-face conditions compared with recorded conditions, and that this advantage is not due to the availability of input dynamics. We further show that humans are still performing well when only the eyes-region is visible, rather than the whole face. We develop a computational model, which replicates the pattern of human performance, including the finding that the eyes-region contains on its own, the required information for estimating both head orientation and direction of gaze. 
Consistent with neurophysiological findings on task-specific face regions in the brain, the learned computational representations reproduce perceptual effects such as the Wollaston illusion, when trained to estimate direction of gaze, but not when trained to recognize objects or faces.\nTwo potential bottlenecks on the expressiveness of recurrent neural networks (RNNs) are their ability to store information about the task in their parameters, and to store information about the input history in their units. We show experimentally that all common RNN architectures achieve nearly the same per-task and per-unit capacity bounds with careful training, for a variety of tasks and stacking depths. They can store an amount of task information which is linear in the number of parameters, and is approximately 5 bits per parameter. They can additionally store approximately one real number from their input history per hidden unit. We further find that for several tasks it is the per-task parameter capacity bound that determines performance. These results suggest that many previous results comparing RNN architectures are driven primarily by differences in training effectiveness, rather than differences in capacity. Supporting this observation, we compare training difficulty for several architectures, and show that vanilla RNNs are far more difficult to train, yet have slightly higher capacity. Finally, we propose two novel RNN architectures, one of which is easier to train than the LSTM or GRU for deeply stacked architectures.\nThe Choquet integral is a powerful aggregation operator which lists many well-known models as its special cases. We look at these special cases and provide their axiomatic analysis. In cases where an axiomatization has been previously given in the literature, we connect the existing results with the framework that we have developed. Next we turn to the question of learning, which is especially important for the practical applications of the model. 
So far, learning of the Choquet integral has been mostly confined to the learning of the capacity. Such an approach requires making a powerful assumption that all dimensions (e.g. criteria) are evaluated on the same scale, which is rarely justified in practice. Too often categorical data is given arbitrary numerical labels (e.g. AHP), and numerical data is considered cardinally and ordinally commensurate, sometimes after a simple normalization. Such approaches clearly lack scientific rigour, and yet they are commonly seen in all kinds of applications. We discuss the pros and cons of making such an assumption and look at the consequences which axiomatization uniqueness results have for the learning problems. Finally, we review some of the applications of the Choquet integral in decision analysis. Apart from MCDA, which is the main area of interest for our results, we also discuss how the model can be interpreted in the social choice context. We look in detail at the state-dependent utility, and show how comonotonicity, central to the previous axiomatizations, actually implies state-independency in the Choquet integral model. We also discuss the conditions required to have a meaningful state-dependent utility representation and show the novelty of our results compared to the previous methods of building state-dependent models.\nThe goal of multi-winner elections is to choose a fixed-size committee based on voters' preferences. An important concern in this setting is representation: large groups of voters with cohesive preferences should be adequately represented by the election winners. Recently, Aziz et al. (2015a;2017) proposed two axioms that aim to capture this idea: justified representation (JR) and its strengthening extended justified representation (EJR). In this paper, we extend the work of Aziz et al. in several directions. 
First, we answer an open question of Aziz et al., by showing that Reweighted Approval Voting satisfies JR for $k=3, 4, 5$, but fails it for $k\\ge 6$. Second, we observe that EJR is incompatible with the Perfect Representation criterion, which is important for many applications of multi-winner voting, and propose a relaxation of EJR, which we call Proportional Justified Representation (PJR). PJR is more demanding than JR, but, unlike EJR, it is compatible with perfect representation, and a committee that provides PJR can be computed in polynomial time if the committee size divides the number of voters. Moreover, just like EJR, PJR can be used to characterize the classic PAV rule in the class of weighted PAV rules. On the other hand, we show that EJR provides stronger guarantees with respect to average voter satisfaction than PJR does.\nThe gold standard for discovering causal relations is by means of experimentation. Over the last decades, alternative methods have been proposed that can infer causal relations between variables from certain statistical patterns in purely observational data. We introduce Joint Causal Inference (JCI), a novel approach to causal discovery from multiple data sets that elegantly unifies both approaches. JCI is a causal modeling approach rather than a specific algorithm, and it can be used in combination with any causal discovery algorithm that can take into account certain background knowledge. The main idea is to reduce causal discovery from multiple datasets originating from different contexts (e.g., different experimental conditions) to causal discovery from a single pooled dataset by adding a set of auxiliary context variables. 
JCI offers the following features: it deals with several different types of interventions in a unified fashion, it can learn intervention targets, it pools data across different datasets which improves the statistical power of independence tests, and by exploiting differences in distribution between contexts it improves on the accuracy and identifiability of the predicted causal relations. We evaluate the approach on flow cytometry data.\nAlthough support vector machines (SVMs) are theoretically well understood, their underlying optimization problem becomes very expensive, if, for example, hundreds of thousands of samples and a non-linear kernel are considered. Several approaches have been proposed in the past to address this serious limitation. In this work we investigate a decomposition strategy that learns on small, spatially defined data chunks. Our contributions are two fold: On the theoretical side we establish an oracle inequality for the overall learning method using the hinge loss, and show that the resulting rates match those known for SVMs solving the complete optimization problem with Gaussian kernels. On the practical side we compare our approach to learning SVMs on small, randomly chosen chunks. Here it turns out that for comparable training times our approach is significantly faster during testing and also reduces the test error in most cases significantly. Furthermore, we show that our approach easily scales up to 10 million training samples: including hyper-parameter selection using cross validation, the entire training only takes a few hours on a single machine. Finally, we report an experiment on 32 million training samples. All experiments used liquidSVM (Steinwart and Thomann, 2017).\nRecently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. 
In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant gains in performance can be realized. Our systems are built using a new optimization approach that we call self-critical sequence training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather than estimating a \"baseline\" to normalize the rewards and reduce variance, utilizes the output of its own test-time inference algorithm to normalize the rewards it experiences. Using this approach, estimating the reward signal (as actor-critic methods must do) and estimating normalization (as REINFORCE algorithms typically do) is avoided, while at the same time harmonizing the model with respect to its test-time inference procedure. Empirically we find that directly optimizing the CIDEr metric with SCST and greedy decoding at test-time is highly effective. Our results on the MSCOCO evaluation server establish a new state-of-the-art on the task, improving the best result in terms of CIDEr from 104.9 to 114.7.\nAutonomous systems can be used to search for sparse signals in a large space; e.g., aerial robots can be deployed to localize threats, detect gas leaks, or respond to distress calls. Intuitively, search algorithms may increase efficiency by collecting aggregate measurements summarizing large contiguous regions. However, most existing search methods either ignore the possibility of such region observations (e.g., Bayesian optimization and multi-armed bandits) or make strong assumptions about the sensing mechanism that allow each measurement to arbitrarily encode all signals in the entire environment (e.g., compressive sensing).
We propose an algorithm that actively collects data to search for sparse signals using only noisy measurements of the average values on rectangular regions (including single points), based on the greedy maximization of information gain. We analyze our algorithm in 1d and show that it requires $\\tilde{O}(\\frac{n}{\\mu^2}+k^2)$ measurements to recover all of $k$ signal locations with small Bayes error, where $\\mu$ and $n$ are the signal strength and the size of the search space, respectively. We also show that active designs can be fundamentally more efficient than passive designs with region sensing, contrasting with the results of Arias-Castro, Candes, and Davenport (2013). We demonstrate the empirical performance of our algorithm on a search problem using satellite image data and in high dimensions.\nMachine learning is making substantial progress in diverse applications. The success is mostly due to advances in deep learning. However, deep learning can make mistakes and its generalization abilities to new tasks are questionable. We ask when and how one can combine network outputs, when (i) details of the observations are evaluated by learned deep components and (ii) facts and confirmation rules are available in knowledge based systems. We show that in limited contexts the required number of training samples can be low and self-improvement of pre-trained networks in more general context is possible. We argue that the combination of sparse outlier detection with deep components that can support each other diminish the fragility of deep methods, an important requirement for engineering applications. We argue that supervised learning of labels may be fully eliminated under certain conditions: a component based architecture together with a knowledge based system can train itself and provide high quality answers. We demonstrate these concepts on the State Farm Distracted Driver Detection benchmark. 
We argue that the view of the Study Panel (2016) may overestimate the requirements on `years of focused research' and `careful, unique construction' for `AI systems'.\nWe study machine learning formulations of inductive program synthesis; that is, given input-output examples, synthesize source code that maps inputs to corresponding outputs. Our key contribution is TerpreT, a domain-specific language for expressing program synthesis problems. A TerpreT model is composed of a specification of a program representation and an interpreter that describes how programs map inputs to outputs. The inference task is to observe a set of input-output examples and infer the underlying program. From a TerpreT model we automatically perform inference using four different back-ends: gradient descent (thus each TerpreT model can be seen as defining a differentiable interpreter), linear program (LP) relaxations for graphical models, discrete satisfiability solving, and the Sketch program synthesis system. TerpreT has two main benefits. First, it enables rapid exploration of a range of domains, program representations, and interpreter models. Second, it separates the model specification from the inference algorithm, allowing proper comparisons between different approaches to inference.   We illustrate the value of TerpreT by developing several interpreter models and performing an extensive empirical comparison between alternative inference algorithms on a variety of program models. To our knowledge, this is the first work to compare gradient-based search over program space to traditional search-based alternatives. Our key empirical finding is that constraint solvers dominate the gradient descent and LP-based formulations.   This is a workshop summary of a longer report at arXiv:1608.04428\nThe concept of uncertainty is posed in almost any complex system including parallel robots as an outstanding instance of dynamical robotics systems. 
As the name suggests, uncertainty is missing information that lies beyond human knowledge, so it must be handled properly to minimize its side effects on the control process.   Type-II fuzzy logic has shown its superiority over traditional fuzzy logic when dealing with uncertainty. Type-II fuzzy logic controllers are, however, newer and more promising approaches that have recently been applied to various fields because of their significant contribution, especially when noise (an important instance of uncertainty) is present. During the design of Type-I fuzzy logic systems, we presume that we are almost certain about the fuzzy membership functions, which is not true in many cases. Thus type-II fuzzy logic systems (T2FLS), as a more realistic approach for practical applications, might have a lot to offer. Type-II fuzzy logic takes into account a higher level of uncertainty; in other words, the membership grade of a type-II fuzzy variable is no longer a crisp number but is itself a type-I linguistic term. In this thesis, the effects of uncertainty in the dynamic control of a parallel robot are considered. More specifically, the aim is to incorporate the Type-II Fuzzy Logic paradigm into a model-based controller, the so-called computed torque control method, and to apply the result to a 3-degree-of-freedom parallel manipulator.   ...\nThe field of connectomics faces unprecedented \"big data\" challenges. To reconstruct neuronal connectivity, automated pixel-level segmentation is required for petabytes of streaming electron microscopy data. Existing algorithms provide relatively good accuracy but are unacceptably slow, and would require years to extract connectivity graphs from even a single cubic millimeter of neural tissue. Here we present a viable real-time solution, a multi-pass pipeline optimized for shared-memory multicore systems, capable of processing data at near the terabyte-per-hour pace of multi-beam electron microscopes. 
The pipeline makes an initial fast-pass over the data, and then makes a second slow-pass to iteratively correct errors in the output of the fast-pass. We demonstrate the accuracy of a sparse slow-pass reconstruction algorithm and suggest new methods for detecting morphological errors. Our fast-pass approach posed many algorithmic challenges, including the design and implementation of novel shallow convolutional neural nets and the parallelization of watershed and object-merging techniques. We use it to reconstruct, from image stack to skeletons, the full dataset of Kasthuri et al. (463 GB capturing 120,000 cubic microns) in a matter of hours on a single multicore machine rather than the weeks it has taken in the past on much larger distributed systems.\nModel checking of strategic ability under imperfect information is known to be hard. The complexity results range from NP-completeness to undecidability, depending on the precise setup of the problem. No less importantly, fixpoint equivalences do not generally hold for imperfect information strategies, which seriously hampers incremental synthesis of winning strategies. In this paper, we propose translations of ATLir formulae that provide lower and upper bounds for their truth values, and are cheaper to verify than the original specifications. That is, if the expression is verified as true then the corresponding formula of ATLir should also hold in the given model. We begin by showing where the straightforward approach does not work. Then, we show how it can be modified to obtain guaranteed lower bounds. To this end, we alter the next-step operator in such a way that traversing one's indistinguishability relation is seen as an atomic activity. Most interestingly, the lower approximation is provided by a fixpoint expression that uses a nonstandard variant of the next-step ability operator. 
We show the correctness of the translations, establish their computational complexity, and validate the approach by experiments with a scalable scenario of Bridge play.\nRandom backpropagation (RBP) is a variant of the backpropagation algorithm for training neural networks, where the transposes of the forward matrices are replaced by fixed random matrices in the calculation of the weight updates. It is remarkable both because of its effectiveness, in spite of using random matrices to communicate error information, and because it completely removes the taxing requirement of maintaining symmetric weights in a physical neural system. To better understand random backpropagation, we first connect it to the notions of local learning and learning channels. Through this connection, we derive several alternatives to RBP, including skipped RBP (SRBP), adaptive RBP (ARBP), sparse RBP, and their combinations (e.g. ASRBP), and analyze their computational complexity. We then study their behavior through simulations using the MNIST and CIFAR-10 benchmark datasets. These simulations show that most of these variants work robustly, almost as well as backpropagation, and that multiplication by the derivatives of the activation functions is important. As a follow-up, we also study the low end of the number of bits required to communicate error information over the learning channel. We then provide partial intuitive explanations for some of the remarkable properties of RBP and its variations. 
Finally, we prove several mathematical results, including the convergence to fixed points of linear chains of arbitrary length, the convergence to fixed points of linear autoencoders with decorrelated data, the long-term existence of solutions for linear systems with a single hidden layer and convergence in special cases, and the convergence to fixed points of non-linear chains, when the derivative of the activation functions is included.\nThis paper presents the design of a supervisory algorithm that monitors safety at road intersections and overrides drivers with a safe input when necessary. The design of the supervisor consists of two parts: safety verification and control design. Safety verification is the problem of determining whether vehicles will be able to cross the intersection without colliding under the drivers' current inputs. We translate this safety verification problem into a jobshop scheduling problem, which minimizes the maximum lateness and evaluates whether the optimal cost is zero. An optimal cost of zero corresponds to the case in which all vehicles can cross each conflict area without collisions. Computing the optimal cost requires solving a Mixed Integer Nonlinear Programming (MINLP) problem due to the nonlinear second-order dynamics of the vehicles. We therefore estimate this optimal cost by formulating two related Mixed Integer Linear Programming (MILP) problems that assume simpler vehicle dynamics. We prove that these two MILP problems yield lower and upper bounds on the optimal cost. We also quantify the worst-case approximation errors of these MILP problems. We design the supervisor to override the vehicles with a safe control input if the MILP problem that computes the upper bound yields a positive optimal cost. We theoretically demonstrate that the supervisor keeps the intersection safe and is non-blocking. 
Computer simulations further validate that the algorithms can run in real time for problems of realistic size.\nIn this paper, we study the problem of author identification under a double-blind review setting, which is to identify potential authors given information about an anonymized paper. Unlike existing approaches that rely heavily on feature engineering, we propose a network embedding approach to address the problem, which can automatically represent nodes as lower-dimensional feature vectors. However, there are two major limitations in recent studies on network embedding: (1) they are usually general-purpose embedding methods, which are independent of the specific tasks; and (2) most of these approaches can only deal with homogeneous networks, where the heterogeneity of the network is ignored. Hence, the challenges faced here are twofold: (1) how to embed the network under the guidance of the author identification task, and (2) how to select the best type of information given the heterogeneity of the network.   To address the challenges, we propose a task-guided and path-augmented heterogeneous network embedding model. In our model, nodes are first embedded as vectors in latent feature space. Embeddings are then shared and jointly trained according to task-specific and network-general objectives. We extend the existing unsupervised network embedding to incorporate meta paths in heterogeneous networks, and select paths according to the specific task. The guidance from the author identification task for network embedding is provided both explicitly in joint training and implicitly during meta path selection. Our experiments demonstrate that by using path-augmented network embedding with task guidance, our model can obtain significantly better accuracy at identifying the true authors compared to existing methods.\nSynthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications. 
Samples generated by existing text-to-image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate 256x256 photo-realistic images conditioned on text descriptions. We decompose the hard problem into more manageable sub-problems through a sketch-refinement process. The Stage-I GAN sketches the primitive shape and colors of the object based on the given text description, yielding Stage-I low-resolution images. The Stage-II GAN takes Stage-I results and text descriptions as inputs, and generates high-resolution images with photo-realistic details. It is able to rectify defects in Stage-I results and add compelling details with the refinement process. To improve the diversity of the synthesized images and stabilize the training of the conditional-GAN, we introduce a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold. Extensive experiments and comparisons with state-of-the-art methods on benchmark datasets demonstrate that the proposed method achieves significant improvements in generating photo-realistic images conditioned on text descriptions.\nPrediction in a small-sized sample with a large number of covariates, the \"small n, large p\" problem, is challenging. This setting is encountered in multiple applications, such as precision medicine, where obtaining additional samples can be extremely costly or even impossible, and extensive research effort has recently been dedicated to finding principled solutions for accurate prediction. However, a valuable source of additional information, domain experts, has not yet been efficiently exploited. We formulate knowledge elicitation generally as a probabilistic inference process, where expert knowledge is sequentially queried to improve predictions. 
In the specific case of sparse linear regression, where we assume the expert has knowledge about the values of the regression coefficients or about the relevance of the features, we propose an algorithm and computational approximation for fast and efficient interaction, which sequentially identifies the most informative features on which to query expert knowledge. Evaluations of our method in experiments with simulated and real users show improved prediction accuracy even with little effort from the expert.\nMultiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and allows weakly labeled data to be leveraged. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas is described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. 
This paper provides insight into how the problem characteristics affect MIL algorithms, recommendations for future benchmarking and promising avenues for research.\nConsensus formation is investigated for multi-agent systems in which agents' beliefs are both vague and uncertain. Vagueness is represented by a third truth state meaning \\emph{borderline}. This is combined with a probabilistic model of uncertainty. A belief combination operator is then proposed which exploits borderline truth values to enable agents with conflicting beliefs to reach a compromise. A number of simulation experiments are carried out in which agents apply this operator in pairwise interactions, under the bounded confidence restriction that the two agents' beliefs must be sufficiently consistent with each other before agreement can be reached. In addition to studying the consensus operator in isolation, we also investigate scenarios in which agents are influenced either directly or indirectly by the state of the world. For the former, we conduct simulations which combine consensus formation with belief updating based on evidence. For the latter, we investigate the effect of assuming that the closer an agent's beliefs are to the truth, the more visible they are in the consensus building process. In all cases, applying the consensus operators results in the population converging to a single shared belief which is both crisp and certain. Furthermore, simulations which combine consensus formation with evidential updating converge faster to a shared opinion which is closer to the actual state of the world than those in which beliefs are only changed as a result of directly receiving new evidence. 
Finally, if agent interactions are guided by belief quality measured as similarity to the true state of the world, then applying the consensus operator alone results in the population converging to a high-quality shared belief.\nThis paper presents a new method to learn online policies in continuous state, continuous action, model-free Markov decision processes, with two properties that are crucial for practical applications. First, the policies are implementable with a very low computational cost: once the policy is computed, the action corresponding to a given state is obtained in logarithmic time with respect to the number of samples used. Second, our method is versatile: it does not rely on any a priori knowledge of the structure of optimal policies. We build upon the Fitted Q-iteration algorithm, which represents the $Q$-value as the average of several regression trees. Our algorithm, the Fitted Policy Forest algorithm (FPF), computes a regression forest representing the Q-value and transforms it into a single tree representing the policy, while keeping control over the size of the policy using resampling and leaf merging. We introduce an adaptation of Multi-Resolution Exploration (MRE) which is particularly suited to FPF. We assess the performance of FPF on three classical benchmarks for reinforcement learning: the \"Inverted Pendulum\", the \"Double Integrator\" and the \"Car on the Hill\", and show that FPF equals or outperforms other algorithms, although these algorithms rely on particular representations of the policies, specifically chosen to fit each of the three problems. 
Finally, we show that the combination of FPF and MRE makes it possible to find nearly optimal solutions in problems where $\\epsilon$-greedy approaches would fail.\nGiven a knowledge base or KB containing (noisy) facts about common nouns or generics, such as \"all trees produce oxygen\" or \"some animals live in forests\", we consider the problem of inferring additional such facts at a precision similar to that of the starting KB. Such KBs capture general knowledge about the world, and are crucial for various applications such as question answering. Unlike commonly studied named entity KBs such as Freebase, generics KBs involve quantification, have more complex underlying regularities, tend to be more incomplete, and violate the commonly used locally closed world assumption (LCWA). We show that existing KB completion methods struggle with this new task, and present the first approach that is successful. Our results demonstrate that external information, such as relation schemas and entity taxonomies, if used appropriately, can be a surprisingly powerful tool in this setting. First, our simple yet effective knowledge guided tensor factorization approach achieves state-of-the-art results on two generics KBs (80% precise) for science, doubling their size at 74%-86% precision. Second, our novel taxonomy guided, submodular, active learning method for collecting annotations about rare entities (e.g., oriole, a bird) is 6x more effective at inferring further new facts about them than multiple active learning baselines.\nAssumption-Based Argumentation (ABA) is an argumentation framework that was proposed in the late 20th century. Since then, no solver has been implemented in a programming language that is easy to set up, and no solver has been interfaced to the web, which hinders access by the public. 
This project aims to implement an ABA solver in a modern programming language that performs reasonably well, and to interface it to the web for easier access by the public. This project demonstrates a novel ABA solver, written in Python, that computes conflict-free, stable, admissible, grounded, ideal, and complete semantics and can be used via an easy-to-use web interface for visualizing the argument and dispute trees. Experiments were conducted to determine the project's best configurations and to compare this project with proxdd, a state-of-the-art ABA solver, which has no web interface and computes fewer semantics. According to the experiments, this project's best configuration is achieved by using the \"pickle\" technique and a tree caching technique. With this configuration, this project achieved a lower average runtime than proxdd. On the other hand, this project encountered more cases with exceptions than proxdd, which might be caused by this project computing more semantics and hence requiring more resources to do so. Hence, this project runs comparably well to the state-of-the-art ABA solver proxdd. Future work includes computational complexity and efficiency analysis of the implemented algorithms, implementation of more semantics of the argumentation framework, and usability testing of the web interface.\nA heuristic procedure based on a novel recursive formulation of a sinusoid (RFS) and on regression with predictive least-squares (LS) makes it possible to decompose both uniformly and nonuniformly sampled 1-d signals into a sparse set of sinusoids (SSS). An optimal SSS is found by Levenberg-Marquardt (LM) optimization of RFS parameters of near-optimal sinusoids combined with common criteria for the estimation of the number of sinusoids embedded in noise. The procedure estimates both the cardinality and the parameters of SSS. 
The proposed algorithm makes it possible to identify the RFS parameters of a sinusoid from a data sequence containing only a fraction of its cycle. In extreme cases, when the frequency of a sinusoid approaches zero, the algorithm is able to detect a linear trend in the data. Also, an irregular sampling pattern enables the algorithm to correctly reconstruct the under-sampled sinusoid. The parsimonious nature of the obtained models opens the possibility of using the proposed method in machine learning and in expert and intelligent systems needing analysis and simple representation of 1-d signals. The properties of the proposed algorithm are evaluated on examples of irregularly sampled artificial signals in noise and are compared with high-accuracy frequency estimation algorithms based on the linear prediction (LP) approach, particularly with respect to the Cramer-Rao Bound (CRB).\nDeep models are the de facto standard in visual decision models due to their impressive performance on a wide array of visual tasks. However, they are frequently seen as opaque and are unable to explain their decisions. In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world which led to their decisions. We postulate that deep models can do this as well and propose our Pointing and Justification (PJ-X) model which can justify its decision with a sentence and point to the evidence by introspecting its decision and explanation process using an attention mechanism. Unfortunately, there is no dataset available with reference explanations for visual decision making. We thus collect two datasets in two domains where it is interesting and challenging to explain decisions. First, we extend the visual question answering task to not only provide an answer but also a natural language explanation for the answer. Second, we focus on explaining human activities, which is traditionally more challenging than object classification. 
We extensively evaluate our PJ-X model, both on the justification and pointing tasks, by comparing it to prior models and ablations using both automatic and human evaluations.\nWe demonstrate the possibility of classifying causal systems into kinds that share a common structure without first constructing an explicit dynamical model or using prior knowledge of the system dynamics. The algorithmic ability to determine whether arbitrary systems are governed by causal relations of the same form offers significant practical applications in the development and validation of dynamical models. It is also of theoretical interest as an essential stage in the scientific inference of laws from empirical data. The algorithm presented is based on the dynamical symmetry approach to dynamical kinds. A dynamical symmetry with respect to time is an intervention on one or more variables of a system that commutes with the time evolution of the system. A dynamical kind is a class of systems sharing a set of dynamical symmetries. The algorithm presented classifies deterministic, time-dependent causal systems by directly comparing their exhibited symmetries. Using simulated, noisy data from a variety of nonlinear systems, we show that this algorithm correctly sorts systems into dynamical kinds. It is robust under significant sampling error, is immune to violations of normality in sampling error, and fails gracefully with increasing dynamical similarity. The algorithm we demonstrate is the first to address this aspect of automated scientific discovery.\nBayesian inference on structured models typically relies on the ability to infer posterior distributions of underlying hidden variables. However, inference in implicit models or complex posterior distributions is hard. A popular tool for learning implicit models is generative adversarial networks (GANs), which learn the parameters of generators by fooling discriminators. 
Typically, GANs are considered to be models themselves and are not understood in the context of inference. Current techniques rely on inefficient global discrimination of joint distributions to perform learning, or only consider discriminating a single output variable. We overcome these limitations by treating GANs as a basis for likelihood-free inference in generative models and generalizing them to Bayesian posterior inference over factor graphs. We propose local learning rules based on message passing that minimize a global divergence criterion involving cooperating local adversaries, which are used to sidestep explicit likelihood evaluations. This allows us to compose models and yields a unified inference and learning framework for adversarial learning. Our framework treats model specification and inference separately and facilitates richly structured models within the family of Directed Acyclic Graphs, including components such as intractable likelihoods, non-differentiable models, simulators and generally cumbersome models. A key result of our treatment is the insight that Bayesian inference on structured models can be performed only with sampling and discrimination when using nonparametric variational families, without access to explicit distributions. As a side result, we discuss the link to likelihood maximization. These approaches promise to be useful in the toolbox of probabilistic modelers and to enrich the gamut of current probabilistic programming applications.\nDespite widespread interest in reinforcement learning for task-oriented dialogue systems, several obstacles can frustrate research and development progress. First, reinforcement learners typically require interaction with the environment, so conventional dialogue corpora cannot be used directly. Second, each task presents specific challenges, requiring a separate corpus of task-specific annotated data. 
Third, collecting and annotating human-machine or human-human conversations for task-oriented dialogues requires extensive domain knowledge. Because building an appropriate dataset can be both financially costly and time-consuming, one popular approach is to build a user simulator based upon a corpus of example dialogues. Then, one can train reinforcement learning agents in an online fashion as they interact with the simulator. Dialogue agents trained on these simulators can serve as an effective starting point. Once agents master the simulator, they may be deployed in a real environment to interact with humans, and continue to be trained online. To ease empirical algorithmic comparisons in dialogues, this paper introduces a new, publicly available simulation framework, where our simulator, designed for the movie-booking domain, leverages both rules and collected data. The simulator supports two tasks: movie ticket booking and movie seeking. Finally, we demonstrate several agents and detail the procedure to add and test your own agent in the proposed framework.\nWe study the TAPF (combined target-assignment and path-finding) problem for teams of agents in known terrain, which generalizes both the anonymous and non-anonymous multi-agent path-finding problems. Each of the teams is given the same number of targets as there are agents in the team. Each agent has to move to exactly one target given to its team such that all targets are visited. The TAPF problem is to first assign agents to targets and then plan collision-free paths for the agents to their targets in a way such that the makespan is minimized. We present the CBM (Conflict-Based Min-Cost-Flow) algorithm, a hierarchical algorithm that solves TAPF instances optimally by combining ideas from anonymous and non-anonymous multi-agent path-finding algorithms. On the low level, CBM uses a min-cost max-flow algorithm on a time-expanded network to assign all agents in a single team to targets and plan their paths. 
On the high level, CBM uses conflict-based search to resolve collisions among agents in different teams. Theoretically, we prove that CBM is correct, complete and optimal. Experimentally, we show the scalability of CBM to TAPF instances with dozens of teams and hundreds of agents and adapt it to a simulated warehouse system.\nIn this project we propose a new approach for emotion recognition using web-based similarity (e.g. confidence, PMI and PMING). We aim to extract basic emotions from short sentences with emotional content (e.g. news titles, tweets, captions), performing a web-based quantitative evaluation of semantic proximity between each word of the analyzed sentence and each emotion of a psychological model (e.g. Plutchik, Ekman, Lovheim). The phases of the extraction include text preprocessing (tokenization, stop words, filtering), automated search engine queries, HTML parsing of results (i.e. scraping), estimation of semantic proximity, and ranking of emotions according to proximity measures. The main idea is that, since semantic similarity can be generalized under the assumption that similar concepts co-occur in documents indexed by search engines, emotions can also be generalized in the same way, through tags or terms that express them in a particular language, thereby ranking emotions. Training results are compared to human evaluation; then additional comparative tests are performed, both for the global ranking correlation (e.g. Kendall, Spearman, Pearson) and for the evaluation of the emotion linked to each single word. 
Unlike sentiment analysis, our approach works at a deeper level of abstraction, aiming at recognizing specific emotions and not only the positive/negative sentiment, in order to predict emotions as semantic data.\nIn this paper, we consider a realistic and meaningful scenario in the context of smart grids where an electricity retailer serves three different types of customers, i.e., customers with an optimal home energy management system embedded in their smart meters (C-HEMS), customers with only smart meters (C-SM), and customers without smart meters (C-NONE). The main objective of this paper is to support the retailer in making optimal day-ahead dynamic pricing decisions in such a mixed customer pool. To this end, we propose a two-level decision-making framework where the retailer, acting as the upper-level agent, first announces its electricity prices for the next 24 hours, and customers, acting as lower-level agents, subsequently schedule their energy usage accordingly. For the lower level problem, we model the price responsiveness of different customers according to their unique characteristics. For the upper level problem, we optimize the dynamic prices for the retailer to maximize its profit subject to realistic market constraints. The above two-level model is tackled by distributed optimization methods based on genetic algorithms (GA), and its feasibility and effectiveness are confirmed via simulation results.\nA dominant paradigm for deep learning based object detection relies on a \"bottom-up\" approach using \"passive\" scoring of class agnostic proposals. These approaches are efficient but lack holistic analysis of scene-level context. In this paper, we present an \"action-driven\" detection mechanism using our \"top-down\" visual attention model. We localize an object by taking sequential actions that the attention model provides. The attention model, conditioned on an image region, provides the actions required to get closer to a target object. 
Each individual action is weak by itself, but an ensemble of the sequential actions makes a bounding-box accurately converge to a target object boundary. This attention model, which we call AttentionNet, is composed of a convolutional neural network. During our whole detection procedure, we only utilize the actions from a single AttentionNet without any modules for object proposals or post bounding-box regression. We evaluate our top-down detection mechanism over the PASCAL VOC series and ILSVRC CLS-LOC dataset, and achieve state-of-the-art performance compared to the major bottom-up detection methods. In particular, our detection mechanism shows a strong advantage in elaborate localization by outperforming Faster R-CNN by a margin of +7.1% over PASCAL VOC 2007 when we increase the IoU threshold for positive detection to 0.7.\nDeep learning techniques have been widely applied, achieving state-of-the-art results in various fields of study. This survey focuses on deep learning solutions that target learning control policies for robotics applications. We carry out our discussions on the two main paradigms for learning control with deep networks: deep reinforcement learning and imitation learning. For deep reinforcement learning (DRL), we begin from traditional reinforcement learning algorithms, showing how they are extended to the deep context and effective mechanisms that could be added on top of the DRL algorithms. We then introduce representative works that utilize DRL to solve navigation and manipulation tasks in robotics. We continue our discussion on methods addressing the challenge of the reality gap for transferring DRL policies trained in simulation to real-world scenarios, and summarize robotics simulation platforms for conducting DRL research. 
For imitation learning, we go through its three main categories (behavior cloning, inverse reinforcement learning and generative adversarial imitation learning) by introducing their formulations and their corresponding robotics applications. Finally, we discuss the open challenges and research frontiers.\nMulti-objective evolutionary algorithms (MOEAs) have achieved great progress in recent decades, but most of them are designed to solve unconstrained multi-objective optimization problems. In fact, many real-world multi-objective problems contain a number of constraints. To promote research on constrained multi-objective optimization, we first propose three primary types of difficulty, which reflect the challenges in real-world optimization problems, to characterize the constraint functions in constrained multi-objective optimization problems (CMOPs): feasibility-hardness, convergence-hardness and diversity-hardness. We then develop a general toolkit to construct difficulty-adjustable and scalable CMOPs with three types of parameterized constraint functions according to the proposed three primary types of difficulty. In fact, combining the three primary constraint functions with different parameters can produce a large variety of CMOPs, whose difficulty can be uniquely defined by a triplet, with each of its parameters specifying the level of each primary difficulty type, respectively. Furthermore, the number of objectives in this toolkit can scale to more than two. Based on this toolkit, we suggest nine difficulty-adjustable and scalable CMOPs named DAS-CMOP1-9. To evaluate the proposed test problems, two popular CMOEAs, MOEA/D-CDP and NSGA-II-CDP, are adopted to test their performance on DAS-CMOP1-9 with different difficulty triplets. 
The experimental results demonstrate that none of them can solve these problems efficiently, which motivates us to develop new constrained MOEAs to solve the suggested DAS-CMOPs.\nThe paper proposes an analysis of liquid democracy (or, delegable proxy voting) from the perspective of binary aggregation and of binary diffusion models. We show how liquid democracy on binary issues can be embedded into the framework of binary aggregation with abstentions, enabling the transfer of known results about the latter (such as impossibility theorems) to the former. This embedding also sheds light on the relation between delegation cycles in liquid democracy and the probability of collective abstentions, as well as the issue of individual rationality in a delegable proxy voting setting. We then show how liquid democracy on binary issues can also be modeled and analyzed as a specific process of dynamics of binary opinions on networks. These processes, called Boolean DeGroot processes, are a special case of the DeGroot stochastic model of opinion diffusion. We establish the convergence conditions of such processes and show they provide some novel insights on how the effects of delegation cycles and individual rationality could be mitigated within liquid democracy. The study is a first attempt to provide theoretical foundations for the delegable proxy features of the liquid democracy voting system. Our analysis suggests recommendations on how the system may be modified to make it more resilient with respect to the handling of delegation cycles and of inconsistent majorities.\nData science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. 
The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by producing scientifically interpretable models, TGDS aims to advance our scientific understanding by discovering novel domain insights. Indeed, the paradigm of TGDS has started to gain prominence in a number of scientific disciplines such as turbulence modeling, material discovery, quantum chemistry, bio-medical science, bio-marker discovery, climate science, and hydrology. In this paper, we formally conceptualize the paradigm of TGDS and present a taxonomy of research themes in TGDS. We describe several approaches for integrating domain knowledge in different research themes using illustrative examples from different disciplines. We also highlight some of the promising avenues of novel research for realizing the full potential of theory-guided data science.\nEnergy disaggregation (a.k.a. nonintrusive load monitoring, NILM), a single-channel blind source separation problem, aims to decompose the mains signal, which records the whole-house electricity consumption, into appliance-wise readings. This problem is difficult because it is inherently unidentifiable. Recent approaches have shown that the identifiability problem could be reduced by introducing domain knowledge into the model. Deep neural networks have been shown to be a promising approach for these problems, but sliding windows are necessary to handle the long sequences which arise in signal processing problems, which raises issues about how to combine predictions from different sliding windows. In this paper, we propose sequence-to-point learning, where the input is a window of the mains and the output is a single point of the target appliance. We use convolutional neural networks to train the model. 
Interestingly, we systematically show that the convolutional neural networks can inherently learn the signatures of the target appliances, which are automatically added into the model to reduce the identifiability problem. We applied the proposed neural network approaches to real-world household energy data, and show that the methods achieve state-of-the-art performance, improving two standard error measures by 84% and 92%, respectively.\nThe causal structure of any system can be analyzed at a multitude of spatial and temporal scales. It has long been thought that while higher scale (macro) descriptions of causal structure may be useful to observers, they are at best a compressed description and at worst leave out critical information. However, recent research applying information theory to causal analysis has shown that the causal structure of some systems can actually come into focus (be more informative) at a macroscale (Hoel et al. 2013). That is, a macro model of a system (a map) can be more informative than a fully detailed model of the system (the territory). This has been called causal emergence. While causal emergence may at first glance seem counterintuitive, this paper grounds the phenomenon in a classic concept from information theory: Shannon's discovery of the channel capacity. I argue that systems have a particular causal capacity, and that different causal models of those systems take advantage of that capacity to various degrees. For some systems, only macroscale causal models use the full causal capacity. Such macroscale causal models can either be coarse-grains, or may leave variables and states out of the model (exogenous) in various ways, which can improve the model's efficacy and its informativeness via the same mathematical principles by which error-correcting codes take advantage of an information channel's capacity. As model choice increases, the causal capacity of a system approaches the channel capacity. 
Ultimately, this provides a general framework for understanding how the causal structure of some systems cannot be fully captured by even the most detailed microscopic model.\nWe are in the middle of a remarkable rise in the use and capability of artificial intelligence. Much of this growth has been fueled by the success of deep learning architectures: models that map from observables to outputs via multiple layers of latent representations. These deep learning algorithms are effective tools for unstructured prediction, and they can be combined in AI systems to solve complex automated reasoning problems. This paper provides a recipe for combining ML algorithms to solve for causal effects in the presence of instrumental variables (IVs), that is, sources of treatment randomization that are conditionally independent from the response. We show that a flexible IV specification resolves into two prediction tasks that can be solved with deep neural nets: a first-stage network for treatment prediction and a second-stage network whose loss function involves integration over the conditional treatment distribution. This Deep IV framework imposes some specific structure on the stochastic gradient descent routine used for training, but it is general enough that we can take advantage of off-the-shelf ML capabilities and avoid extensive algorithm customization. We outline how to obtain out-of-sample causal validation in order to avoid overfitting. We also introduce schemes for both Bayesian and frequentist inference: the former via a novel adaptation of dropout training, and the latter via a data splitting routine.\nGravitational wave astronomy has set in motion a scientific revolution. To further enhance the science reach of this emergent field, there is a pressing need to increase the depth and speed of the gravitational wave algorithms that have enabled these groundbreaking discoveries. 
To contribute to this effort, we introduce Deep Filtering, a new highly scalable method for end-to-end time-series signal processing, based on a system of two deep convolutional neural networks, which we designed for classification and regression to rapidly detect and estimate parameters of signals in highly noisy time-series data streams. We demonstrate a novel training scheme with gradually increasing noise levels, and a transfer learning procedure between the two networks. We showcase the application of this method for the detection and parameter estimation of gravitational waves from binary black hole mergers. Our results indicate that Deep Filtering significantly outperforms conventional machine learning techniques and achieves performance similar to matched filtering while being several orders of magnitude faster, thus allowing real-time processing of raw big data with minimal resources. More importantly, Deep Filtering extends the range of gravitational wave signals that can be detected with ground-based gravitational wave detectors. This framework leverages recent advances in artificial intelligence algorithms and emerging hardware architectures, such as deep-learning-optimized GPUs, to facilitate real-time searches of gravitational wave sources and their electromagnetic and astro-particle counterparts.\nTechniques known as Nonlinear Set Membership prediction, Lipschitz Interpolation or Kinky Inference are approaches to machine learning that utilise presupposed Lipschitz properties to compute inferences over unobserved function values. Provided a bound on the true best Lipschitz constant of the target function is known a priori, they offer convergence guarantees as well as bounds around the predictions. 
Considering a more general setting that builds on Hoelder continuity relative to pseudo-metrics, we propose an online method for estimating the Hoelder constant from function value observations that are possibly corrupted by bounded observational errors. Utilising this to compute adaptive parameters within a kinky inference rule gives rise to a nonparametric machine learning method, for which we establish strong universal approximation guarantees. That is, we show that our prediction rule can learn any continuous function in the limit of increasingly dense data to within a worst-case error bound that depends on the level of observational uncertainty. We apply our method in the context of nonparametric model-reference adaptive control (MRAC). Across a range of simulated aircraft roll-dynamics and performance metrics, our approach outperforms recently proposed alternatives that were based on Gaussian processes and RBF neural networks. For discrete-time systems, we provide guarantees on the tracking success of our learning-based controllers both for the batch and the online learning setting.\nPrivacy has become a serious concern for modern Information Societies. The sensitive nature of much of the data that are daily exchanged or released to untrusted parties requires that responsible organizations undertake appropriate privacy protection measures. Nowadays, much of these data are texts (e.g., emails, messages posted in social media, healthcare outcomes, etc.) that, because of their unstructured and semantic nature, constitute a challenge for automatic data protection methods. In fact, textual documents are usually protected manually, in a process known as document redaction or sanitization. To do so, human experts identify sensitive terms (i.e., terms that may reveal identities and/or confidential information) and protect them accordingly (e.g., via removal or, preferably, generalization). 
To relieve experts from this burdensome task, in a previous work we introduced the theoretical basis of C-sanitization, an inherently semantic privacy model that provides the basis for the development of automatic document redaction/sanitization algorithms and offers clear, a priori privacy guarantees on data protection. Despite its potential benefits, C-sanitization still presents some limitations when applied in practice (mainly regarding flexibility, efficiency and accuracy). In this paper, we propose a new, more flexible model, named (C, g(C))-sanitization, which enables an intuitive configuration of the trade-off between the desired level of protection (i.e., controlled information disclosure) and the preservation of the utility of the protected data (i.e., amount of semantics to be preserved). Moreover, we also present a set of technical solutions and algorithms that provide an efficient and scalable implementation of the model and improve its practical accuracy, as we also illustrate through empirical experiments.\nTo ease the development of robot learning in industry, two conditions need to be fulfilled. Manipulators must be able to learn high accuracy and precision tasks while being safe for workers in the factory. In this paper, we extend previously submitted work which consists of rapid learning of local high accuracy behaviors. By exploration and regression, linear and quadratic models are learnt for the dynamics and the cost function, respectively. Iterative Linear Quadratic Gaussian Regulator combined with cost quadratic regression can converge rapidly in the final stages towards high accuracy behavior as the cost function is modelled quite precisely. In this paper, both a different cost function and a second order improvement method are implemented within this framework. We also propose an analysis of the algorithm parameters through simulation for a positioning task. Finally, an experimental validation on a KUKA LBR iiwa robot is carried out. 
This collaborative robot manipulator can be easily programmed into safety mode, which makes it qualified for the second industry constraint stated above.\nAnalogy Based Effort Estimation (ABE) is one of the prominent methods for software effort estimation. The fundamental concept of ABE is closer to the mentality of expert estimation but with an automated procedure in which the final estimate is generated by reusing similar historical projects. The key issue when using ABE is how to adapt the effort of the retrieved nearest neighbors. The adaptation process is an essential part of ABE, generating more accurate estimates by tuning the selected raw solutions using some adaptation strategy. In this study we show that there are three interrelated decision variables that have great impact on the success of the adaptation method: (1) number of nearest analogies (k), (2) optimum feature set needed for adaptation, and (3) adaptation weights. To find the right decision regarding these variables, one needs to study all possible combinations and evaluate them individually to select the one that can improve all prediction evaluation measures. The existing evaluation measures usually behave differently, sometimes presenting opposite trends in evaluating prediction methods. This means that changing one decision variable could improve one evaluation measure while degrading the others. Therefore, the main theme of this research is how to come up with the best decision variables that improve the adaptation strategy and thus the overall evaluation measures without degrading the others. The impact of these decisions together has not been investigated before; therefore, we propose to view the building of the adaptation procedure as a multi-objective optimization problem. 
The Particle Swarm Optimization Algorithm (PSO) is utilized to find the optimum solutions for such decision variables based on optimizing multiple evaluation measures.\nThis paper extends recent work in interactive machine learning (IML) focused on effectively incorporating human feedback. We show how control and feedback signals complement each other in systems which model human reward. We demonstrate that simultaneously incorporating human control and feedback signals can improve interactive robotic systems' performance on a self-mirrored movement control task where an RL-agent-controlled right arm attempts to match the preprogrammed movement pattern of the left arm. We illustrate the impact of varying human feedback parameters on task performance by investigating the probability of giving feedback on each time step and the likelihood of given feedback being correct. We further illustrate that varying the temporal decay with which the agent incorporates human feedback has a significant impact on task performance. We found that smearing human feedback over time steps improves performance, and we show that varying the probability of feedback at each time step and increasing the likelihood of that feedback being 'correct' can impact agent performance. We conclude that understanding latent variables in human feedback is crucial for learning algorithms acting in human-machine interaction domains.\nForecasting the flow of crowds is of great importance to traffic management and public safety, and very challenging as it is affected by many complex factors, including spatial dependencies (nearby and distant), temporal dependencies (closeness, period, trend), and external conditions (e.g., weather and events). We propose a deep-learning-based approach, called ST-ResNet, to collectively forecast two types of crowd flows (i.e., inflow and outflow) in each and every region of a city. We design an end-to-end structure of ST-ResNet based on unique properties of spatio-temporal data. 
More specifically, we employ the residual neural network framework to model the temporal closeness, period, and trend properties of crowd traffic. For each property, we design a branch of residual convolutional units, each of which models the spatial properties of crowd traffic. ST-ResNet learns to dynamically aggregate the output of the three residual neural networks based on data, assigning different weights to different branches and regions. The aggregation is further combined with external factors, such as weather and day of the week, to predict the final traffic of crowds in each and every region. We have developed a real-time system based on Microsoft Azure Cloud, called UrbanFlow, providing crowd flow monitoring and forecasting in Guiyang City, China. In addition, we present an extensive experimental evaluation using two types of crowd flows in Beijing and New York City (NYC), where ST-ResNet outperforms nine well-known baselines.\nThe Internet of Things is arriving in our homes and cities through already-known fields such as Smart Homes, Smart Cities, or Smart Towns. Monitoring the environmental conditions of cities can help adapt indoor locations to be more comfortable for the people who stay there. One way to improve indoor conditions is efficient temperature control; however, it depends on many factors, such as the different combinations of outdoor temperature and humidity. Therefore, adjusting the indoor temperature is not simply a matter of setting one value according to another. There are many more factors to take into consideration, hence traditional logic based on binary states cannot be used. Many problems cannot be solved with a set of binary solutions, and a new approach is needed. Fuzzy logic can interpret more than two states, giving computers the capacity to react in a way similar to people. 
In this paper we will propose a new approach to control the temperature using the Internet of Things together with its platforms and fuzzy logic, considering not only the indoor temperature but also the outdoor temperature and humidity, in order to save energy and to provide a more comfortable environment for users. Finally, we will conclude that the fuzzy approach allows us to achieve energy savings of around 40% and thus save money.\nLimited annotated data available for the recognition of facial expression and action units hampers the training of deep networks, which can learn disentangled invariant features. However, a linear model with just a few parameters is normally not demanding in terms of training data. In this paper, we propose an elegant linear model to untangle confounding factors in challenging realistic multichannel signals such as 2D face videos. The simple yet powerful model does not rely on huge training data and is natural for recognizing facial actions without explicitly disentangling the identity. Based on well-understood intuitive linear models such as Sparse Representation based Classification (SRC), previous attempts require preprocessing for explicit decoupling, which is practically inexact. Instead, we exploit the low-rank property across frames to subtract the underlying neutral faces, which are modeled jointly with sparse representation on the action components with group sparsity enforced. On the extended Cohn-Kanade dataset (CK+), our one-shot automatic method on raw face videos performs as competitively as SRC applied on manually prepared action components and even better than SRC in terms of true positive rate. We apply the model to the even more challenging task of facial action unit recognition, verified on the MPI Face Video Database (MPI-VDB), achieving a decent performance. 
All the programs and data have been made publicly available.\nIn this paper, a novel architecture for a deep recurrent neural network, residual LSTM, is introduced. A plain LSTM has an internal memory cell that can learn long-term dependencies of sequential data. It also provides a temporal shortcut path to avoid vanishing or exploding gradients in the temporal domain. The residual LSTM provides an additional spatial shortcut path from lower layers for efficient training of deep networks with multiple LSTM layers. Compared with the previous work, highway LSTM, residual LSTM separates the spatial shortcut path from the temporal one by using output layers, which can help to avoid a conflict between spatial and temporal-domain gradient flows. Furthermore, residual LSTM reuses the output projection matrix and the output gate of LSTM to control the spatial information flow instead of additional gate networks, which reduces the number of network parameters by more than 10%. An experiment for distant speech recognition on the AMI SDM corpus shows that 10-layer plain and highway LSTM networks presented 13.7% and 6.2% increases in WER over 3-layer baselines, respectively. In contrast, 10-layer residual LSTM networks provided the lowest WER, 41.0%, which corresponds to 3.3% and 2.8% WER reduction over plain and highway LSTM networks, respectively.\nThe promise of compressive sensing (CS) has been offset by two significant challenges. First, real-world data is not exactly sparse in a fixed basis. Second, current high-performance recovery algorithms are slow to converge, which limits CS to either non-real-time applications or scenarios where massive back-end computing is available. In this paper, we attack both of these challenges head-on by developing a new signal recovery framework we call {\\em DeepInverse} that learns the inverse transformation from measurement vectors to signals using a {\\em deep convolutional network}. 
When trained on a set of representative images, the network learns both a representation for the signals (addressing challenge one) and an inverse map approximating a greedy or convex recovery algorithm (addressing challenge two). Our experiments indicate that the DeepInverse network closely approximates the solution produced by state-of-the-art CS recovery algorithms yet is hundreds of times faster in run time. The tradeoff for the ultrafast run time is a computationally intensive, off-line training procedure typical of deep networks. However, the training needs to be completed only once, which makes the approach attractive for a host of sparse recovery problems.\nFor a social networking service to acquire and retain users, it must find ways to keep them engaged. By accurately gauging their preferences, it is able to serve them with the subset of available content that maximises revenue for the site. Without the constraints of an appropriate regulatory framework, we argue that a sufficiently sophisticated curator algorithm tasked with performing this process may choose to explore curation strategies that are detrimental to users. In particular, we suggest that such an algorithm is capable of learning to manipulate its users, for several qualitative reasons: 1. Access to vast quantities of user data combined with ongoing breakthroughs in the field of machine learning is leading to powerful but uninterpretable strategies for decision making at scale. 2. The availability of an effective feedback mechanism for assessing the short and long term user responses to curation strategies. 3. Techniques from reinforcement learning have allowed machines to learn automated and highly successful strategies at an abstract level, often resulting in non-intuitive yet nonetheless highly appropriate action selection. 
In this work, we consider the form that these strategies for user manipulation might take and scrutinise the role that regulation should play in the design of such systems.\nStochastic dynamic control systems relate the space of control signals to the space of corresponding future states in a probabilistic fashion. Consequently, stochastic dynamic systems can be interpreted as an information channel between the control space and the state space. In this work we study this control-to-state information capacity of stochastic dynamic systems in continuous time, when the states are observed only partially. The control-to-state capacity, known as empowerment, was shown in the past to be useful in solving various Artificial Intelligence & Control benchmarks, and was used to replace problem-specific utilities. The higher the value of empowerment is, the more optional future states an agent may reach by using its controls inside a given time horizon. The contribution of this work is that we derive an efficient solution for computing the control-to-state information capacity for a linear, partially-observed Gaussian dynamic control system in continuous time, and discover new relationships between control-theoretic and information-theoretic properties of dynamic systems. In particular, using the derived method, we demonstrate that the capacity between the control signal and the system output does not grow without limits with the length of the control signal. This means that only the near-past window of the control signal contributes effectively to the control-to-state capacity, while most of the information beyond this window is irrelevant for the future state of the dynamic system. We show that empowerment depends on a time constant of a dynamic system.\nA new, radical CNN design approach is presented in this paper, considering the reduction of the total computational load during inference. 
This is achieved by a new holistic intervention on both the CNN architecture and the training procedure, which targets parsimonious inference by learning to exploit or remove the redundant capacity of a CNN architecture. This is accomplished by the introduction of a new structural element that can be inserted as an add-on to any contemporary CNN architecture, whilst preserving or even improving its recognition accuracy. Our approach formulates a systematic and data-driven method for developing CNNs that are trained to eventually change size and form in real-time during inference, targeting the smallest possible computational footprint. Results are provided for the optimal implementation on a few modern, high-end mobile computing platforms, indicating a significant speed-up of up to 3x.\nIn this work, several semantic approaches to concept-based query expansion and reranking schemes are studied and compared with different ontology-based expansion methods in web document search and retrieval. In particular, we focus on concept-based query expansion schemes, where, in order to effectively increase the precision of web document retrieval and to decrease users' browsing time, the main goal is to quickly provide users with the most suitable query expansion. Two key tasks for query expansion in web document retrieval are to find the expansion candidates, as the closest concepts in the web document domain, and to rank the expanded queries properly. The approach we propose aims at improving the expansion phase for better web document retrieval and precision. The basic idea is to measure the distance between candidate concepts using the PMING distance, a collaborative semantic proximity measure, i.e. a measure which can be computed by using statistical results from web search engines. 
Experiments show that the proposed technique can provide users with more satisfying expansion results and improve the quality of web document retrieval.\nWe introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from simple scenes with several isolated objects to very challenging ones with multiple instances of several objects and with a high amount of clutter and occlusion. The images were captured from a systematically sampled view sphere around the object/scene, and are annotated with accurate ground truth 6D poses of all modeled objects. Initial evaluation results indicate that the state of the art in 6D object pose estimation has ample room for improvement, especially in difficult cases with significant occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less.\nIn this paper, we focus on online representation learning in non-stationary environments which may require continuous adaptation of model architecture. 
We propose a novel online dictionary-learning (sparse-coding) framework which incorporates the addition and deletion of hidden units (dictionary elements), and is inspired by the adult neurogenesis phenomenon in the dentate gyrus of the hippocampus, known to be associated with improved cognitive function and adaptation to new environments. In the online learning setting, where new input instances arrive sequentially in batches, the neuronal-birth is implemented by adding new units with random initial weights (random dictionary elements); the number of new units is determined by the current performance (representation error) of the dictionary, higher error causing an increase in the birth rate. Neuronal-death is implemented by imposing l1/l2-regularization (group sparsity) on the dictionary within the block-coordinate descent optimization at each iteration of our online alternating minimization scheme, which iterates between the code and dictionary updates. Finally, hidden unit connectivity adaptation is facilitated by introducing sparsity in dictionary elements. Our empirical evaluation on several real-life datasets (images and language) as well as on synthetic data demonstrates that the proposed approach can considerably outperform the state-of-the-art fixed-size (nonadaptive) online sparse coding of Mairal et al. (2009) in the presence of nonstationary data. Moreover, we identify certain properties of the data (e.g., sparse inputs with nearly non-overlapping supports) and of the model (e.g., dictionary sparsity) associated with such improvements.\nThe payload of communications satellites must go through a series of tests to assert their ability to survive in space. Each test requires some of the payload's equipment to be active, which has an impact on the temperature of the payload.
Sequencing these tests in a way that ensures the thermal stability of the payload and minimizes the overall duration of the test campaign is a very important objective for satellite manufacturers. The problem can be decomposed into two sub-problems corresponding to two objectives: First, the number of distinct configurations necessary to run the tests must be minimized. This can be modeled as packing the tests into configurations, and we introduce a set of implied constraints to improve the lower bound of the model. Second, tests must be sequenced so that the number of times an equipment unit has to be switched on or off is minimized. We model this aspect using the constraint Switch, where a buffer with limited capacity represents the currently active equipment units, and we introduce an improvement of the propagation algorithm for this constraint. We then introduce a search strategy in which we sequentially solve the sub-problems (packing and sequencing). Experiments conducted on real and random instances show the respective benefits of our contributions.\nWe consider an online version of the robust Principal Component Analysis (PCA), which arises naturally in time-varying source separations such as video foreground-background separation. This paper proposes a compressive online robust PCA with prior information for recursively separating a sequence of frames into sparse and low-rank components from a small set of measurements. In contrast to conventional batch-based PCA, which processes all the frames directly, the proposed method processes measurements taken from each frame. Moreover, this method can efficiently incorporate multiple sources of prior information, namely previously reconstructed frames, to improve the separation and thereafter update the prior information for the next frame.
We utilize multiple sources of prior information by solving $n\\text{-}\\ell_{1}$ minimization to incorporate the previous sparse components and using incremental singular value decomposition ($\\mathrm{SVD}$) to exploit the previous low-rank components. We also establish theoretical bounds on the number of measurements required to guarantee successful separation under assumptions of static or slowly-changing low-rank components. Using numerical experiments, we evaluate our bounds and the performance of the proposed algorithm. In addition, we apply the proposed algorithm to online video foreground and background separation from compressive measurements. Experimental results show that the proposed method outperforms the existing methods.\nThe current trends in next-generation exascale systems go towards integrating a wide range of specialized (co-)processors into traditional supercomputers. Due to the efficiency of heterogeneous systems in terms of Watts and FLOPS per surface unit, opening access to heterogeneous platforms for a wider range of users is an important problem to be tackled. However, heterogeneous platforms limit the portability of the applications and increase development complexity due to the programming skills required. Program transformation can help make programming heterogeneous systems easier by defining a step-wise transformation process that translates a given initial code into a semantically equivalent final code, but adapted to a specific platform. Program transformation systems require the definition of efficient transformation strategies to tackle the combinatorial problem that emerges due to the large set of transformations applicable at each step of the process. In this paper we propose a machine learning-based approach to learn heuristics to define program transformation strategies.
Our approach proposes a novel combination of reinforcement learning and classification methods to efficiently tackle the problems inherent to this type of system. Preliminary results demonstrate the suitability of this approach.\nSocial message classification is a research domain that has attracted the attention of many researchers in recent years. Indeed, a social message differs from ordinary text because it has special characteristics such as its shortness. Hence, new approaches for processing social messages are now essential to make their classification more efficient. In this paper, we are mainly interested in the classification of social messages based on their spreading on online social networks (OSN). We propose a new distance metric based on the Dynamic Time Warping distance and use it with the probabilistic and the evidential k Nearest Neighbors (k-NN) classifiers to classify propagation networks (PrNets) of messages. The propagation network is a directed acyclic graph (DAG) that is used to record propagation traces of the message, the traversed links and their types. We tested the proposed metric with the chosen k-NN classifiers on real-world propagation traces collected from the Twitter social network and obtained good classification accuracy.\nThe Frame Problem (FP) is a puzzle in philosophy of mind and epistemology, articulated by the Stanford Encyclopedia of Philosophy as follows: \"How do we account for our apparent ability to make decisions on the basis only of what is relevant to an ongoing situation without having explicitly to consider all that is not relevant?\" In this work, we focus on the causal variant of the FP, the Causal Frame Problem (CFP). Assuming that a reasoner's mental causal model can be (implicitly) represented by a causal Bayes net, we first introduce a notion called Potential Level (PL).
PL, in essence, encodes the relative position of a node with respect to its neighbors in a causal Bayes net. Drawing on the psychological literature on causal judgment, we substantiate the claim that PL may bear on how time is encoded in the mind. Using PL, we propose an inference framework, called the PL-based Inference Framework (PLIF), which permits a boundedly-rational approach to the CFP to be formally articulated at Marr's algorithmic level of analysis. We show that our proposed framework, PLIF, is consistent with a wide range of findings in causal judgment literature, and that PL and PLIF make a number of predictions, some of which are already supported by existing findings.\nWe study the problem of identifying the causal relationship between two discrete random variables from observational data. We recently proposed a novel framework called entropic causality that works in a very general functional model but makes the assumption that the unobserved exogenous variable has small entropy in the true causal direction.   This framework requires the solution of a minimum entropy coupling problem: Given marginal distributions of m discrete random variables, each on n states, find the joint distribution with minimum entropy, that respects the given marginals. This corresponds to minimizing a concave function of nm variables over a convex polytope defined by nm linear constraints, called a transportation polytope. Unfortunately, it was recently shown that this minimum entropy coupling problem is NP-hard, even for 2 variables with n states. Even representing points (joint distributions) over this space can require exponential complexity (in n, m) if done naively.   In our recent work we introduced an efficient greedy algorithm to find an approximate solution for this problem. 
In this paper we analyze this algorithm and establish two results: that our algorithm always finds a local minimum and also is within an additive approximation error from the unknown global optimum.\nMost research on image forensics has focused on detecting artifacts introduced by a single processing tool. This has led to the development of many specialized algorithms looking for one or more particular footprints under specific settings. Naturally, the performance of such algorithms is not perfect, and accordingly the provided output might be noisy, inaccurate and only partially correct. Furthermore, a forged image in practical scenarios is often the result of utilizing several tools available in image-processing software systems. Therefore, reliable tamper detection requires developing more powerful tools to deal with various tampering scenarios. Fusion of forgery detection tools based on a Fuzzy Inference System has been used before to address this problem. Adjusting the membership functions and defining proper fuzzy rules to attain better results are time-consuming processes. This can be regarded as the main disadvantage of fuzzy inference systems. In this paper, a Neuro-Fuzzy inference system for fusion of forgery detection tools is developed. The neural network characteristic of these systems provides an appropriate tool for automatically adjusting the membership functions. Moreover, the initial fuzzy inference system is generated based on fuzzy clustering techniques. The proposed framework is implemented and validated on a benchmark image splicing data set in which three forgery detection tools are fused based on an adaptive Neuro-Fuzzy inference system. The outcome of the proposed method reveals that applying Neuro-Fuzzy inference systems could be a better approach for fusion of forgery detection tools.\nThere is a well-known controversy between economists and psychologists about decision making under risk.
We discuss how to build a unified theory of risky choice that would explain both compensatory and non-compensatory theories. For risky choice, given human cognitive ability, we argue that people cannot build a continuous and accurate subjective probability world, but only several order concepts, such as small, middle and large probability. People make decisions based on information, experience, imagination and other things. All of these things are so vast that people have to prepare some strategies. That is, people have different strategies when facing different situations. The distributions of these things have different decision structures. More precisely, decision making is a process of simplifying the decision structure. However, the process of simplifying the decision structure is not stuck in a rut, but follows different paths when facing problems repeatedly. This is why preference reversal always happens when making decisions. The most efficient way to simplify the decision structure is to calculate the expected value or to make decisions based on one or two dimensions. We also argue that deliberation time has at least four parts: substitution time, first order time, second order time and calculation time. Decision structure can also simply explain the phenomena of paradoxes and anomalies. JEL Codes: C10, D03, D81\nA credal network under epistemic irrelevance is a generalised type of Bayesian network that relaxes its two main building blocks. On the one hand, the local probabilities are allowed to be partially specified. On the other hand, the assessments of independence do not have to hold exactly. Conceptually, these two features turn credal networks under epistemic irrelevance into a powerful alternative to Bayesian networks, offering a more flexible approach to graph-based multivariate uncertainty modelling. However, in practice, they have long been perceived as very hard to work with, both theoretically and computationally.   
The aim of this paper is to demonstrate that this perception is no longer justified. We provide a general introduction to credal networks under epistemic irrelevance, give an overview of the state of the art, and present several new theoretical results. Most importantly, we explain how these results can be combined to allow for the design of recursive inference methods. We provide numerous concrete examples of how this can be achieved, and use these to demonstrate that computing with credal networks under epistemic irrelevance is most definitely feasible, and in some cases even highly efficient. We also discuss several philosophical aspects, including the lack of symmetry, how to deal with probability zero, the interpretation of lower expectations, the axiomatic status of graphoid properties, and the difference between updating and conditioning.\nFor artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the network to re-use for new tasks. Agents are pathways (views) through the network which determine the subset of parameters that are used and updated by the forwards and backwards passes of the backpropagation algorithm. During learning, a tournament selection genetic algorithm is used to select pathways through the neural network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function. We demonstrate successful transfer learning; fixing the parameters along a path learned on task A and re-evolving a new population of paths for task B allows task B to be learned faster than it could be learned from scratch or after fine-tuning. Paths evolved on task B re-use parts of the optimal path evolved on task A.
Positive transfer was demonstrated for binary MNIST, CIFAR, and SVHN supervised learning classification tasks, and a set of Atari and Labyrinth reinforcement learning tasks, suggesting PathNets have general applicability for neural network training. Finally, PathNet also significantly improves the robustness to hyperparameter choices of a parallel asynchronous reinforcement learning algorithm (A3C).\nFuture projection of climate is typically obtained by combining outputs from multiple Earth System Models (ESMs) for several climate variables such as temperature and precipitation. While IPCC has traditionally used a simple model output average, recent work has illustrated potential advantages of using a multitask learning (MTL) framework for projections of individual climate variables. In this paper we introduce a framework for hierarchical multitask learning (HMTL) with two levels of tasks such that each super-task, i.e., task at the top level, is itself a multitask learning problem over sub-tasks. For climate projections, each super-task focuses on projections of specific climate variables spatially using an MTL formulation. For the proposed HMTL approach, a group lasso regularization is added to couple parameters across the super-tasks, which in the climate context helps exploit relationships among the behavior of different climate variables at a given spatial location. We show that some recent works on MTL based on learning task dependency structures can be viewed as special cases of HMTL. Experiments on synthetic and real climate data show that HMTL produces better results than decoupled MTL methods applied separately on the super-tasks and HMTL significantly outperforms baselines for climate projection.\nFrequent itemset mining is a popular data mining technique. Apriori, Eclat, and FP-Growth are among the most common algorithms for frequent itemset mining. 
Considerable research has been performed to compare the relative performance between these three algorithms, by evaluating the scalability of each algorithm as the dataset size increases. While scalability as data size increases is important, previous papers have not examined the performance impact of similarly sized datasets that contain different itemset characteristics. This paper explores the effects that two dataset characteristics can have on the performance of these three frequent itemset algorithms. To perform this empirical analysis, a dataset generator is created to measure the effects of frequent item density and the maximum transaction size on performance. The generated datasets contain the same number of rows. This provides some insight into dataset characteristics that are conducive to each algorithm. The results of this paper's research demonstrate Eclat and FP-Growth both handle increases in maximum transaction size and frequent itemset density considerably better than the Apriori algorithm.\nThis survey explores Procedural Content Generation via Machine Learning (PCGML), defined as the generation of game content using machine learning models trained on existing content.
As the importance of PCG for game development increases, researchers explore new avenues for generating high-quality content with or without human involvement; this paper addresses the relatively new paradigm of using machine learning (in contrast with search-based, solver-based, and constructive methods). We focus on what is most often considered functional game content such as platformer levels, game maps, interactive fiction stories, and cards in collectible card games, as opposed to cosmetic content such as sprites and sound effects. In addition to using PCG for autonomous generation, co-creativity, mixed-initiative design, and compression, PCGML is suited for repair, critique, and content analysis because of its focus on modeling existing content. We discuss various data sources and representations that affect the resulting generated content. Multiple PCGML methods are covered, including neural networks, long short-term memory (LSTM) networks, autoencoders, and deep convolutional networks; Markov models, $n$-grams, and multi-dimensional Markov chains; clustering; and matrix factorization. Finally, we discuss open problems in the application of PCGML, including learning from small datasets, lack of training data, multi-layered learning, style-transfer, parameter tuning, and PCG as a game mechanic.\nOptimization of Mixed-Integer Non-Linear Programming (MINLP) supports important decisions in applications such as Chemical Process Engineering. But current solvers have limited ability for deductive reasoning or the use of domain-specific theories, and the management of integrality constraints does not yet exploit automated reasoning tools such as SMT solvers. This seems to limit both scalability and reach of such tools in practice. We therefore present a tool, ManyOpt, for MINLP optimization that enables experimentation with reduction techniques which transform a MINLP problem to feasibility checking realized by an SMT solver. 
ManyOpt is similar to the SAT solver ManySAT in that it runs a specified number of such reduction techniques in parallel to get the strongest result on a given MINLP problem. The tool is implemented in layers, which we may see as features and where reduction techniques are feature vectors. Some of these features are inspired by known MINLP techniques whereas others are novel and specific to SMT. Our experimental results on standard benchmarks demonstrate the benefits of this approach. The tool supports a variety of SMT solvers and is easily extensible with new features, courtesy of its layered structure. For example, logical formulas for deductive reasoning are easily added to further constrain the optimization of a MINLP problem of interest.\nIn this paper, we propose a new autonomous braking system based on deep reinforcement learning. The proposed autonomous braking system automatically decides whether to apply the brake at each time step when confronting the risk of collision using the information on the obstacle obtained by the sensors. The problem of designing brake control is formulated as searching for the optimal policy in a Markov decision process (MDP) model where the state is given by the relative position of the obstacle and the vehicle's speed, and the action space is defined as whether or not the brake is applied. The policy used for brake control is learned through computer simulations using the deep reinforcement learning method called deep Q-network (DQN). In order to derive a desirable braking policy, we propose a reward function that balances the damage imposed on the obstacle in case of an accident against the reward achieved when the vehicle gets out of risk as soon as possible. DQN is trained for the scenario in which a vehicle encounters a pedestrian crossing an urban road.
Experiments show that the control agent exhibits desirable control behavior and avoids collision without any mistake in various uncertain environments.\nIn recent years, machine learning techniques based on neural networks for mobile computing have become increasingly popular. Classical multi-layer neural networks require matrix multiplications at each stage. The multiplication operation is not energy efficient and consequently drains the battery of the mobile device. In this paper, we propose a new energy efficient neural network with the universal approximation property over the space of Lebesgue integrable functions. This network, called the additive neural network, is very suitable for mobile computing. The neural structure is based on a novel vector product definition, called the ef-operator, that permits a multiplier-free implementation. In the ef-operation, the \"product\" of two real numbers is defined as the sum of their absolute values, with the sign determined by the sign of the product of the numbers. This \"product\" is used to construct a vector product in $R^N$. The vector product induces the $l_1$ norm. The proposed additive neural network successfully solves the XOR problem. The experiments on the MNIST dataset show that the classification performance of the proposed additive neural networks is very similar to that of the corresponding multi-layer perceptron and convolutional neural networks (LeNet).\nThe Boolean Satisfiability problem asks whether a Boolean formula is satisfiable by some assignment of the variables. It belongs to the NP-complete complexity class and hence no algorithm with polynomial time worst-case complexity is known, i.e., the problem is hard. The K-SAT problem is the subset of the Boolean Satisfiability problem, for which the Boolean formula has the conjunctive normal form with K literals per clause. This problem is still NP-complete for $K \\ge 3$.
Although the worst-case complexity of NP-complete problems is conjectured to be exponential, there might be subsets of the realizations where solutions can typically be found in polynomial time. In fact, random $K$-SAT, with the number of clauses to number of variables ratio $\\alpha$ as the control parameter, shows a phase transition between a satisfiable phase and an unsatisfiable phase, at which the hardest problems are located. We use here several linear programming approaches to reveal further \"easy-hard\" transition points at which the typical hardness of the problems increases, which means that such algorithms can solve the problem on one side efficiently but not beyond this point. For one of these transitions, we observed a coincidence with a structural transition of the literal factor graphs of the problem instances. We also investigated cutting-plane approaches, which often increase the computational efficiency. We also tried a mapping to another NP-complete optimization problem using a specific algorithm for that problem. In both cases, no improvement of the performance was observed, i.e., no shift of the easy-hard transition to higher values of $\\alpha$.\nObserving nearby galaxies would facilitate the search for artificial radio signals by sampling many billions of stars simultaneously, but few efforts have been made to exploit this opportunity. An added attraction is that the Milky Way is the second-largest member of the Local Group, so our galaxy might be a probable target for hypothetical broadcasters in nearby galaxies. We present the first relatively high spectral resolution (<1 kHz) 21 cm band search for intelligent radio signals of complete galaxies in the Local Group with the Jansky VLA, observing the galaxies M31 (Andromeda) and M33 (Triangulum), respectively the first and third largest members of the group, and sampling more stars than any prior search of this kind.
We used 122 Hz channels over a 1 MHz spectral window in the target galaxy velocity frame of reference, and 15 Hz channels over a 125 kHz window in our local standard of rest. No narrowband signals were detected above a signal-to-noise ratio of 7, suggesting the absence of continuous narrowband flux greater than approximately 0.24 Jy and 1.33 Jy in the respective spectral windows illuminating our part of the Milky Way during our observations in December 2014 and January 2015. This is also the first study in which the upgraded VLA has been used for SETI.\nThis paper makes three main contributions to our understanding of fixed-depth minimax search: (A) A new formulation for Stockman's SSS* algorithm, based on Alpha-Beta, is presented. It resolves all the perceived drawbacks of SSS*, finally transforming it into a practical algorithm. In effect, we show that SSS* = alpha-beta + transposition tables. The crucial step is the realization that transposition tables contain so-called solution trees, structures that are used in best-first search algorithms like SSS*. Having created a practical version, we present performance measurements with tournament game-playing programs for three different minimax games, yielding results that contradict a number of publications. (B) Based on the insights gained in our attempts at understanding SSS*, we present a framework that facilitates the construction of several best-first fixed-depth game-tree search algorithms, known and new. The framework is based on depth-first null-window Alpha-Beta search, enhanced with storage to allow for the refining of previous search results. It focuses attention on the essential differences between algorithms. (C) We present a new instance of the framework, MTD(f). It is well-suited for use with iterative deepening, and performs better than algorithms that are currently used in most state-of-the-art game-playing programs.
We provide experimental evidence to explain why MTD(f) performs better than the other fixed-depth minimax algorithms.\nA considerable number of machine learning algorithms take instance-feature matrices as their inputs. As such, they cannot directly analyze time series data due to its temporal nature, usually unequal lengths, and complex properties. This is a great pity since many of these algorithms are effective, robust, efficient, and easy to use. In this paper, we bridge this gap by proposing an efficient representation learning framework that is able to convert a set of time series with equal or unequal lengths to a matrix format. In particular, we guarantee that the pairwise similarities between time series are well preserved after the transformation. The learned feature representation is particularly suitable for the class of learning problems that are sensitive to data similarities. Given a set of $n$ time series, we first construct an $n\\times n$ partially observed similarity matrix by randomly sampling $O(n \\log n)$ pairs of time series and computing their pairwise similarities. We then propose an extremely efficient algorithm that solves a highly non-convex and NP-hard problem to learn new features based on the partially observed similarity matrix. We use the learned features to conduct experiments on both data classification and clustering tasks. Our extensive experimental results demonstrate that the proposed framework is both effective and efficient.\nResearchers in answer set programming and constraint programming have spent significant effort on the development of hybrid languages and solving algorithms combining the strengths of these traditionally separate fields. These efforts resulted in a new research area: constraint answer set programming. Constraint answer set programming languages and systems have proved successful at providing declarative, yet efficient solutions to problems involving hybrid reasoning tasks.
One of the main contributions of this paper is the first comprehensive account of the constraint answer set language and solver EZCSP, a mainstream representative of this research area that has been used in various successful applications. We also develop an extension of the transition systems proposed by Nieuwenhuis et al. in 2006 to capture Boolean satisfiability solvers. We use this extension to describe the EZCSP algorithm and prove formal claims about it. The design and algorithmic details behind EZCSP clearly demonstrate that the development of hybrid systems of this kind is challenging. Many questions arise when one faces various design choices in an attempt to maximize the system's benefits. One of the key decisions that a developer of a hybrid solver makes is settling on a particular integration schema within its implementation. Thus, another important contribution of this paper is a thorough case study based on EZCSP, focused on the various integration schemas that it provides.   Under consideration in Theory and Practice of Logic Programming (TPLP).\nIn this paper we propose a multi-convex framework for multi-task learning that improves predictions by learning relationships both between tasks and between features. Our framework is a generalization of related methods in multi-task learning, which learn either task relationships or feature relationships, but not both. We start with a hierarchical Bayesian model, and use the empirical Bayes method to transform the underlying inference problem into a multi-convex optimization problem. We propose a coordinate-wise minimization algorithm that has a closed-form solution for each block subproblem. Naively, these solutions would be expensive to compute, but by using the theory of doubly stochastic matrices, we are able to reduce the underlying matrix optimization subproblem to a minimum weight perfect matching problem on a complete bipartite graph, and solve it analytically and efficiently.
To solve the weight learning subproblem, we propose three different strategies, including a gradient descent method with linear convergence guarantee when the instances are not shared by multiple tasks, and a numerical solution based on Sylvester equation when instances are shared. We demonstrate the efficiency of our method on both synthetic datasets and real-world datasets. Experiments show that the proposed optimization method is orders of magnitude faster than an off-the-shelf projected gradient method, and our model is able to exploit the correlation structures among multiple tasks and features.\nNeural language models predict the next token using a latent representation of the immediate token history. Recently, various methods for augmenting neural language models with an attention mechanism over a differentiable memory have been proposed. For predicting the next token, these models query information from a memory of the recent history which can facilitate learning mid- and long-range dependencies. However, conventional attention mechanisms used in memory-augmented neural language models produce a single output vector per time step. This vector is used both for predicting the next token as well as for the key and value of a differentiable memory of a token history. In this paper, we propose a neural language model with a key-value attention mechanism that outputs separate representations for the key and value of a differentiable memory, as well as for encoding the next-word distribution. This model outperforms existing memory-augmented neural language models on two corpora. Yet, we found that our method mainly utilizes a memory of the five most recent output representations. 
This led to the unexpected main finding that a much simpler model based only on the concatenation of recent output representations from previous time steps is on par with more sophisticated memory-augmented neural language models.\nSparse iterative methods, in particular first-order methods, are known to be among the most effective in solving large-scale two-player zero-sum extensive-form games. The convergence rates of these methods depend heavily on the properties of the distance-generating function that they are based on. We investigate the acceleration of first-order methods for solving extensive-form games through better design of the dilated entropy function---a class of distance-generating functions related to the domains associated with the extensive-form games. By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has no dependence on the branching factor of the player. This result improves the convergence rate of several first-order methods by a factor of $\\Omega(b^dd)$, where $b$ is the branching factor of the player, and $d$ is the depth of the game tree.   Thus far, counterfactual regret minimization methods have been faster in practice, and more popular, than first-order methods despite their theoretically inferior convergence rates. Using our new weighting scheme and practical tuning we show that, for the first time, the excessive gap technique can be made faster than the fastest counterfactual regret minimization algorithm, CFR+, in practice.\nWe propose a direct estimation method for R\\'{e}nyi and f-divergence measures based on a new graph theoretical interpretation. Suppose that we are given two sample sets $X$ and $Y$, respectively with $N$ and $M$ samples, where $\\eta:=M/N$ is a constant value. 
Considering the $k$-nearest neighbor ($k$-NN) graph of $Y$ in the joint data set $(X,Y)$, we show that the average powered ratio of the number of $X$ points to the number of $Y$ points among all $k$-NN points is proportional to R\\'{e}nyi divergence of $X$ and $Y$ densities. A similar method can also be used to estimate f-divergence measures. We derive bias and variance rates, and show that for the class of $\\gamma$-H\\\"{o}lder smooth functions, the estimator achieves the MSE rate of $O(N^{-2\\gamma/(\\gamma+d)})$. Furthermore, by using a weighted ensemble estimation technique, for density functions with continuous and bounded derivatives of up to the order $d$, and some extra conditions at the support set boundary, we derive an ensemble estimator that achieves the parametric MSE rate of $O(1/N)$. Our estimators are more computationally tractable than other competing estimators, which makes them appealing in many practical applications.\nBeing an unsupervised machine learning and data mining technique, biclustering and its multimodal extensions are becoming popular tools for analysing object-attribute data in different domains. Apart from conventional clustering techniques, biclustering is searching for homogeneous groups of objects while keeping their common description, e.g., in binary setting, their shared attributes. In bioinformatics, biclustering is used to find genes, which are active in a subset of situations, thus being candidates for biomarkers. However, the authors of those biclustering techniques that are popular in gene expression analysis, may overlook the existing methods. For instance, BiMax algorithm is aimed at finding biclusters, which are well-known for decades as formal concepts. Moreover, even if bioinformatics classify the biclustering methods according to reasonable domain-driven criteria, their classification taxonomies may be different from survey to survey and not full as well. 
So, in this paper we propose to use concept lattices as a tool for taxonomy building (in the biclustering domain) and attribute exploration as means for cross-domain taxonomy completion.\nThe beyond worst-case synthesis problem was introduced recently by Bruy\\`ere et al. [BFRR14]: it aims at building system controllers that provide strict worst-case performance guarantees against an antagonistic environment while ensuring higher expected performance against a stochastic model of the environment. Our work extends the framework of [BFRR14] and follow-up papers, which focused on quantitative objectives, by addressing the case of $\\omega$-regular conditions encoded as parity objectives, a natural way to represent functional requirements of systems.   We build strategies that satisfy a main parity objective on all plays, while ensuring a secondary one with sufficient probability. This setting raises new challenges in comparison to quantitative objectives, as one cannot easily mix different strategies without endangering the functional properties of the system. We establish that, for all variants of this problem, deciding the existence of a strategy lies in ${\\sf NP} \\cap {\\sf coNP}$, the same complexity class as classical parity games. Hence, our framework provides additional modeling power while staying in the same complexity class.   [BFRR14] V\\'eronique Bruy\\`ere, Emmanuel Filiot, Mickael Randour, and Jean-Fran\\c{c}ois Raskin. Meet your expectations with guarantees: Beyond worst-case synthesis in quantitative games. In Ernst W. Mayr and Natacha Portier, editors, 31st International Symposium on Theoretical Aspects of Computer Science, STACS 2014, March 5-8, 2014, Lyon, France, volume 25 of LIPIcs, pages 199-213. Schloss Dagstuhl - Leibniz - Zentrum fuer Informatik, 2014.\nWe study the problem of enumerating the satisfying valuations of a circuit while bounding the delay, i.e., the time needed to compute each successive valuation. 
We focus on the class of structured d-DNNF circuits originally introduced in knowledge compilation, a sub-area of artificial intelligence. We propose an algorithm for these circuits that enumerates valuations with linear preprocessing and delay linear in the Hamming weight of each valuation. Moreover, valuations of constant Hamming weight can be enumerated with linear preprocessing and constant delay.   Our results yield a framework for efficient enumeration that applies to all problems whose solutions can be compiled to structured d-DNNFs. In particular, we use it to recapture classical results in database theory, for factorized database representations and for MSO evaluation. This gives an independent proof of constant-delay enumeration for MSO formulae with first-order free variables on bounded-treewidth structures.\nVertex Separation Minimization Problem (VSMP) consists of finding a layout of a graph G = (V,E) which minimizes the maximum vertex cut or separation of a layout. It is an NP-complete problem in general for which metaheuristic techniques can be applied to find near optimal solution. VSMP has applications in VLSI design, graph drawing and computer language compiler design. VSMP is polynomially solvable for grids, trees, permutation graphs and cographs. Construction heuristics play a very important role in the metaheuristic techniques as they are responsible for generating initial solutions which lead to fast convergence. In this paper, we have proposed three construction heuristics H1, H2 and H3 and performed experiments on Grids, Small graphs, Trees and Harwell Boeing graphs, totaling 248 instances of graphs. Experiments reveal that H1, H2 and H3 are able to achieve best results for 88.71%, 43.5% and 37.1% of the total instances respectively while the best construction heuristic in the literature achieves the best solution for 39.9% of the total instances. 
We have also compared the results with the state-of-the-art metaheuristic GVNS and observed that the proposed construction heuristics improve the results for some of the input instances. It was found that GVNS obtained best results for 82.9% of all input instances and the heuristic H1 obtained best results for 82.3% of all input instances.\nAutomatic segmentation of the liver and hepatic lesions is an important step towards deriving quantitative biomarkers for accurate clinical diagnosis and computer-aided decision support systems. This paper presents a method to automatically segment liver and lesions in CT and MRI abdomen images using cascaded fully convolutional neural networks (CFCNs) enabling the segmentation of a large-scale medical trial or quantitative image analysis. We train and cascade two FCNs for a combined segmentation of the liver and its lesions. In the first step, we train a FCN to segment the liver as ROI input for a second FCN. The second FCN solely segments lesions within the predicted liver ROIs of step 1. CFCN models were trained on an abdominal CT dataset comprising 100 hepatic tumor volumes. Validations on further datasets show that CFCN-based semantic liver and lesion segmentation achieves Dice scores over 94% for liver with computation times below 100s per volume. We further experimentally demonstrate the robustness of the proposed method on 38 MRI liver tumor volumes and the public 3DIRCAD dataset.\nOne of the long-standing challenges in Artificial Intelligence for learning goal-directed behavior is to build a single agent which can solve multiple tasks. Recent progress in multi-task learning for goal-directed sequential problems has been in the form of distillation based learning wherein a student network learns from multiple task-specific expert networks by mimicking the task-specific policies of the expert networks. 
While such approaches offer a promising solution to the multi-task learning problem, they require supervision from large expert networks which require extensive data and computation time for training. In this work, we propose an efficient multi-task learning framework which solves multiple goal-directed tasks in an on-line setup without the need for expert supervision. Our work uses active learning principles to achieve multi-task learning by sampling the harder tasks more than the easier ones. We propose three distinct models under our active sampling framework. An adaptive method with extremely competitive multi-tasking performance. A UCB-based meta-learner which casts the problem of picking the next task to train on as a multi-armed bandit problem. A meta-learning method that casts the next-task picking problem as a full Reinforcement Learning problem and uses actor critic methods for optimizing the multi-tasking performance directly. We demonstrate results in the Atari 2600 domain on seven multi-tasking instances: three 6-task instances, one 8-task instance, two 12-task instances and one 21-task instance.\nMobile robots are increasingly being employed for performing complex tasks in dynamic environments. Reinforcement learning (RL) methods are recognized to be promising for specifying such tasks in a relatively simple manner. However, the strong dependency between the learning method and the task to learn is a well-known problem that restricts practical implementations of RL in robotics, often requiring major modifications of parameters and adding other techniques for each particular task. In this paper we present a practical core implementation of RL which enables the learning process for multiple robotic tasks with minimal per-task tuning or none. 
Based on value iteration methods, this implementation includes a novel approach for action selection, called Q-biased softmax regression (QBIASSR), which avoids poor performance of the learning process when the robot reaches new unexplored states. Our approach takes advantage of the structure of the state space by attending to the physical variables involved (e.g., distances to obstacles, X,Y,{\\theta} pose, etc.), thus experienced sets of states may favor the decision-making process of unexplored or rarely-explored states. This improvement has a relevant role in reducing the tuning of the algorithm for particular tasks. Experiments with real and simulated robots, performed with the software framework also introduced here, show that our implementation is effectively able to learn different robotic tasks without tuning the learning method. Results also suggest that the combination of true online SARSA({\\lambda}) with QBIASSR can outperform the existing RL core algorithms in low-dimensional robotic tasks.\nIn statistical relational learning, knowledge graph completion deals with automatically understanding the structure of large knowledge graphs---labeled directed graphs---and predicting missing relationships---labeled edges. State-of-the-art embedding models propose different trade-offs between modeling expressiveness, and time and space complexity. We reconcile both expressiveness and complexity through the use of complex-valued embeddings and explore the link between such complex-valued embeddings and unitary diagonalization. We corroborate our approach theoretically and show that all real square matrices---thus all possible relation/adjacency matrices---are the real part of some unitarily diagonalizable matrix. This result opens the door to many other applications of square matrix factorization. 
Our approach based on complex embeddings is arguably simple, as it only involves a Hermitian dot product, the complex counterpart of the standard dot product between real vectors, whereas other methods resort to more and more complicated composition functions to increase their expressiveness. The proposed complex embeddings are scalable to large data sets as it remains linear in both space and time, while consistently outperforming alternative approaches on standard link prediction benchmarks.\nLimited labeled data are available for the research of estimating facial expression intensities. For instance, the ability to train deep networks for automated pain assessment is limited by small datasets with labels of patient-reported pain intensities. Fortunately, fine-tuning from a data-extensive pre-trained domain, such as face verification, can alleviate this problem. In this paper, we propose a network that fine-tunes a state-of-the-art face verification network using a regularized regression loss and additional data with expression labels. In this way, the expression intensity regression task can benefit from the rich feature representations trained on a huge amount of data for face verification. The proposed regularized deep regressor is applied to estimate the pain expression intensity and verified on the widely-used UNBC-McMaster Shoulder-Pain dataset, achieving the state-of-the-art performance. A weighted evaluation metric is also proposed to address the imbalance issue of different pain intensities.\nLTE in unlicensed spectrum (LTE-U) is a promising approach to overcome the wireless spectrum scarcity. However, to reap the benefits of LTE-U, a fair coexistence mechanism with other incumbent WiFi deployments is required. In this paper, a novel deep learning approach is proposed for modeling the resource allocation problem of LTE-U small base stations (SBSs). 
The proposed approach enables multiple SBSs to proactively perform dynamic channel selection, carrier aggregation, and fractional spectrum access while guaranteeing fairness with existing WiFi networks and other LTE-U operators. Adopting a proactive coexistence mechanism enables future delay-intolerant LTE-U data demands to be served within a given prediction window ahead of their actual arrival time thus avoiding the underutilization of the unlicensed spectrum during off-peak hours while maximizing the total served LTE-U traffic load. To this end, a noncooperative game model is formulated in which SBSs are modeled as Homo Egualis agents that aim at predicting a sequence of future actions and thus achieving long-term equal weighted fairness with WLAN and other LTE-U operators over a given time horizon. The proposed deep learning algorithm is then shown to reach a mixed-strategy Nash equilibrium (NE), when it converges. Simulation results using real data traces show that the proposed scheme can yield up to 28% and 11% gains over a conventional reactive approach and a proportional fair coexistence mechanism, respectively. The results also show that the proposed framework prevents WiFi performance degradation for a densely deployed LTE-U network.\nBipartite matching, where agents on one side of a market are matched to agents or items on the other, is a classical problem in computer science and economics, with widespread application in healthcare, education, advertising, and general resource allocation. A practitioner's goal is typically to maximize a matching market's economic efficiency, possibly subject to some fairness requirements that promote equal access to resources. A natural balancing act exists between fairness and efficiency in matching markets, and has been the subject of much research.   
In this paper, we study a complementary goal---balancing diversity and efficiency---in a generalization of bipartite matching where agents on one side of the market can be matched to sets of agents on the other. Adapting a classical definition of the diversity of a set, we propose a quadratic programming-based approach to solving a supermodular minimization problem that balances diversity and total weight of the solution. We also provide a scalable greedy algorithm with theoretical performance bounds. We then define the price of diversity, a measure of the efficiency loss due to enforcing diversity, and give a worst-case theoretical bound. Finally, we demonstrate the efficacy of our methods on three real-world datasets, and show that the price of diversity is not bad in practice.\nOver the past few years, online aggression and abusive behaviors have occurred in many different forms and on a variety of platforms. In extreme cases, these incidents have evolved into hate, discrimination, and bullying, and even materialized into real-world threats and attacks against individuals or groups. In this paper, we study the Gamergate controversy. Started in August 2014 in the online gaming world, it quickly spread across various social networking platforms, ultimately leading to many incidents of cyberbullying and cyberaggression. We focus on Twitter, presenting a measurement study of a dataset of 340k unique users and 1.6M tweets to study the properties of these users, the content they post, and how they differ from random Twitter users. We find that users involved in this \"Twitter war\" tend to have more friends and followers, are generally more engaged and post tweets with negative sentiment, less joy, and more hate than random users. We also perform preliminary measurements on how the Twitter suspension mechanism deals with such abusive behaviors. 
While we focus on Gamergate, our methodology to collect and analyze tweets related to aggressive and bullying activities is of independent interest.\nWe introduce DeepNAT, a 3D Deep convolutional neural network for the automatic segmentation of NeuroAnaTomy in T1-weighted magnetic resonance images. DeepNAT is an end-to-end learning-based approach to brain segmentation that jointly learns an abstract feature representation and a multi-class classification. We propose a 3D patch-based approach, where we do not only predict the center voxel of the patch but also neighbors, which is formulated as multi-task learning. To address a class imbalance problem, we arrange two networks hierarchically, where the first one separates foreground from background, and the second one identifies 25 brain structures on the foreground. Since patches lack spatial context, we augment them with coordinates. To this end, we introduce a novel intrinsic parameterization of the brain volume, formed by eigenfunctions of the Laplace-Beltrami operator. As network architecture, we use three convolutional layers with pooling, batch normalization, and non-linearities, followed by fully connected layers with dropout. The final segmentation is inferred from the probabilistic output of the network with a 3D fully connected conditional random field, which ensures label agreement between close voxels. The roughly 2.7 million parameters in the network are learned with stochastic gradient descent. Our results show that DeepNAT compares favorably to state-of-the-art methods. Finally, the purely learning-based method may have a high potential for the adaptation to young, old, or diseased brains by fine-tuning the pre-trained network with a small training sample on the target application, where the availability of larger datasets with manual annotations may boost the overall segmentation accuracy in the future.\nMachine learning is essentially the science of playing with data. 
An adaptive data selection strategy, which dynamically chooses different data at various training stages, can reach a more effective model in a more efficient way. In this paper, we propose a deep reinforcement learning framework, which we call \\emph{\\textbf{N}eural \\textbf{D}ata \\textbf{F}ilter} (\\textbf{NDF}), to explore automatic and adaptive data selection in the training process. In particular, NDF takes advantage of a deep neural network to adaptively select and filter important data instances from a sequential stream of training data, such that the future accumulative reward (e.g., the convergence speed) is maximized. In contrast to previous studies in data selection that are mainly based on heuristic strategies, NDF is quite generic and thus widely applicable to many machine learning tasks. Taking neural network training with stochastic gradient descent (SGD) as an example, comprehensive experiments with respect to various neural network modeling (e.g., multi-layer perceptron networks, convolutional neural networks and recurrent neural networks) and several applications (e.g., image classification and text understanding) demonstrate that NDF powered SGD can achieve comparable accuracy with standard SGD process by using less data and fewer iterations.\nDeep neural networks require a large amount of labeled training data during supervised learning. However, collecting and labeling so much data might be infeasible in many cases. In this paper, we introduce a source-target selective joint fine-tuning scheme for improving the performance of deep learning tasks with insufficient training data. In this scheme, a target learning task with insufficient training data is carried out simultaneously with another source learning task with abundant training data. However, the source learning task does not use all existing training data. 
Our core idea is to identify and use a subset of training images from the original source learning task whose low-level characteristics are similar to those from the target learning task, and jointly fine-tune shared convolutional layers for both tasks. Specifically, we compute descriptors from linear or nonlinear filter bank responses on training images from both tasks, and use such descriptors to search for a desired subset of training samples for the source learning task.   Experiments demonstrate that our selective joint fine-tuning scheme achieves state-of-the-art performance on multiple visual classification tasks with insufficient training data for deep learning. Such tasks include Caltech 256, MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. In comparison to fine-tuning without a source domain, the proposed method can improve the classification accuracy by 2% - 10% using a single model.\nCongestion problems are omnipresent in today's complex networks and represent a challenge in many research domains. In the context of Multi-agent Reinforcement Learning (MARL), approaches like difference rewards and resource abstraction have shown promising results in tackling such problems. Resource abstraction was shown to be an ideal candidate for solving large-scale resource allocation problems in a fully decentralized manner. However, its performance and applicability strongly depends on some, until now, undocumented assumptions. Two of the main congestion benchmark problems considered in the literature are: the Beach Problem Domain and the Traffic Lane Domain. In both settings the highest system utility is achieved when overcrowding one resource and keeping the rest at optimum capacity. We analyse how abstract grouping can promote this behaviour and how feasible it is to apply this approach in a real-world domain (i.e., what assumptions need to be satisfied and what knowledge is necessary). 
We introduce a new test problem, the Road Network Domain (RND), where the resources are no longer independent, but rather part of a network (e.g., road network), thus choosing one path will also impact the load on other paths having common road segments. We demonstrate the application of state-of-the-art MARL methods for this new congestion model and analyse their performance. RND allows us to highlight an important limitation of resource abstraction and show that the difference rewards approach manages to better capture and inform the agents about the dynamics of the environment.\nWe consider a scheduling problem where a cloud service provider has multiple units of a resource available over time. Selfish clients submit jobs, each with an arrival time, deadline, length, and value. The service provider's goal is to implement a truthful online mechanism for scheduling jobs so as to maximize the social welfare of the schedule. Recent work shows that under a stochastic assumption on job arrivals, there is a single-parameter family of mechanisms that achieves near-optimal social welfare. We show that given any such family of near-optimal online mechanisms, there exists an online mechanism that in the worst case performs nearly as well as the best of the given mechanisms. Our mechanism is truthful whenever the mechanisms in the given family are truthful and prompt, and achieves optimal (within constant factors) regret.   We model the problem of competing against a family of online scheduling mechanisms as one of learning from expert advice. A primary challenge is that any scheduling decisions we make affect not only the payoff at the current step, but also the resource availability and payoffs in future steps. Furthermore, switching from one algorithm (a.k.a. expert) to another in an online fashion is challenging both because it requires synchronization with the state of the latter algorithm as well as because it affects the incentive structure of the algorithms. 
We further show how to adapt our algorithm to a non-clairvoyant setting where job lengths are unknown until jobs are run to completion. Once again, in this setting, we obtain truthfulness along with asymptotically optimal regret (within poly-logarithmic factors).\nConversion optimization means designing a web interface so that as many users as possible take a desired action on it, such as register or purchase. Such design is usually done by hand, testing one change at a time through A/B testing, or a limited number of combinations through multivariate testing, making it possible to evaluate only a small fraction of designs in a vast design space. This paper describes Sentient Ascend, an automatic conversion optimization system that uses evolutionary optimization to create effective web interface designs. Ascend makes it possible to discover and utilize interactions between the design elements that are difficult to identify otherwise. Moreover, evaluation of design candidates is done in parallel online, i.e. with a large number of real users interacting with the system. A case study on an existing media site shows that significant improvements (i.e. over 43%) are possible beyond human design. Ascend can therefore be seen as an approach to massively multivariate conversion optimization, based on a massively parallel interactive evolution.\nNeural networks have proven effective at solving difficult problems but designing their architectures can be challenging, even for image classification problems alone. Our goal is to minimize human participation, so we employ evolutionary algorithms to discover such networks automatically. Despite significant computational requirements, we show that it is now possible to evolve models with accuracies within the range of those published in the last year. 
Specifically, we employ simple evolutionary techniques at unprecedented scales to discover models for the CIFAR-10 and CIFAR-100 datasets, starting from trivial initial conditions and reaching accuracies of 94.6% (95.6% for ensemble) and 77.0%, respectively. To do this, we use novel and intuitive mutation operators that navigate large search spaces; we stress that no human participation is required once evolution starts and that the output is a fully-trained model. Throughout this work, we place special emphasis on the repeatability of results, the variability in the outcomes and the computational requirements.\nBellemare et al. (2016) introduced the notion of a pseudo-count, derived from a density model, to generalize count-based exploration to non-tabular reinforcement learning. This pseudo-count was used to generate an exploration bonus for a DQN agent and combined with a mixed Monte Carlo update was sufficient to achieve state of the art on the Atari 2600 game Montezuma's Revenge. We consider two questions left open by their work: First, how important is the quality of the density model for exploration? Second, what role does the Monte Carlo update play in exploration? We answer the first question by demonstrating the use of PixelCNN, an advanced neural density model for images, to supply a pseudo-count. In particular, we examine the intrinsic difficulties in adapting Bellemare et al.'s approach when assumptions about the model are violated. The result is a more practical and general algorithm requiring no special apparatus. We combine PixelCNN pseudo-counts with different agent architectures to dramatically improve the state of the art on several hard Atari games. 
One surprising finding is that the mixed Monte Carlo update is a powerful facilitator of exploration in the sparsest of settings, including Montezuma's Revenge.\nUnifying seemingly disparate algorithmic ideas to produce better performing algorithms has been a longstanding goal in reinforcement learning. As a primary example, TD($\\lambda$) elegantly unifies one-step TD prediction with Monte Carlo methods through the use of eligibility traces and the trace-decay parameter $\\lambda$. Currently, there are a multitude of algorithms that can be used to perform TD control, including Sarsa, $Q$-learning, and Expected Sarsa. These methods are often studied in the one-step case, but they can be extended across multiple time steps to achieve better performance. Each of these algorithms is seemingly distinct, and no one dominates the others for all problems. In this paper, we study a new multi-step action-value algorithm called $Q(\\sigma)$ which unifies and generalizes these existing algorithms, while subsuming them as special cases. A new parameter, $\\sigma$, is introduced to allow the degree of sampling performed by the algorithm at each step during its backup to be continuously varied, with Sarsa existing at one extreme (full sampling), and Expected Sarsa existing at the other (pure expectation). $Q(\\sigma)$ is generally applicable to both on- and off-policy learning, but in this work we focus on experiments in the on-policy case. Our results show that an intermediate value of $\\sigma$, which results in a mixture of the existing algorithms, performs better than either extreme. The mixture can also be varied dynamically which can result in even greater performance.\nSimulation-based training (SBT) is gaining popularity as a low-cost and convenient training technique in a vast range of applications. 
However, for an SBT platform to be fully utilized as an effective training tool, it is essential that feedback on performance is provided automatically in real-time during training. It is the aim of this paper to develop an efficient and effective feedback generation method for the provision of real-time feedback in SBT. Existing methods either have low effectiveness in improving novice skills or suffer from low efficiency, resulting in their inability to be used in real-time. In this paper, we propose a neural network based method to generate feedback using the adversarial technique. The proposed method utilizes a bounded adversarial update to minimize an L1-regularized loss via back-propagation. We empirically show that the proposed method can be used to generate simple, yet effective feedback. Also, it was observed to have high effectiveness and efficiency when compared to existing methods, thus making it a promising option for real-time feedback generation in SBT.\nClass labels have been empirically shown to be useful in improving the sample quality of generative adversarial nets (GANs). In this paper, we mathematically study the properties of the current variants of GANs that make use of class label information. With class-aware gradient and cross-entropy decomposition, we reveal how class labels and associated losses influence GAN's training. Based on that, we propose Activation Maximization Generative Adversarial Networks (AM-GAN) as an advanced solution. Comprehensive experiments have been conducted to validate our analysis and evaluate the effectiveness of our solution, where AM-GAN outperforms other strong baselines and achieves state-of-the-art Inception Score (8.91) on CIFAR-10. In addition, we demonstrate that, with the Inception ImageNet classifier, Inception Score mainly tracks the diversity of the generator, and there is, however, no reliable evidence that it can reflect the true sample quality. 
We thus propose a new metric, called AM Score, to provide more accurate estimation of the sample quality. Our proposed model also outperforms the baseline methods on the new metric.\nBeing able to fall safely is a necessary motor skill for humanoids performing highly dynamic tasks, such as running and jumping. We propose a new method to learn a policy that minimizes the maximal impulse during the fall. The optimization solves for both a discrete contact planning problem and a continuous optimal control problem. Once trained, the policy can compute the optimal next contacting body part (e.g. left foot, right foot, or hands), contact location and timing, and the required joint actuation. We represent the policy as a mixture of actor-critic neural networks, which consists of n control policies and the corresponding value functions. Each actor-critic pair is associated with one of the n possible contacting body parts. During execution, the policy corresponding to the highest value function will be executed while the associated body part will be the next contact with the ground. With this mixture of actor-critic architecture, the discrete contact sequence planning is solved through the selection of the best critics while the continuous control problem is solved by the optimization of actors. We show that our policy can achieve comparable, sometimes even higher, rewards than a recursive search of the action space using dynamic programming, while enjoying a 50 to 400 times speedup during online execution.\nDespite progress in visual perception tasks such as image classification and detection, computers still struggle to understand the interdependency of objects in the scene as a whole, e.g., relations between objects or their attributes. Existing methods often ignore global context cues capturing the interactions among different object instances, and can only recognize a handful of types by exhaustively training individual detectors for all possible relationships. 
To capture such global interdependency, we propose a deep Variation-structured Reinforcement Learning (VRL) framework to sequentially discover object relationships and attributes in the whole image. First, a directed semantic action graph is built using language priors to provide a rich and compact representation of semantic correlations between object categories, predicates, and attributes. Next, we use a variation-structured traversal over the action graph to construct a small, adaptive action set for each step based on the current state and historical actions. In particular, an ambiguity-aware object mining scheme is used to resolve semantic ambiguity among object categories that the object detector fails to distinguish. We then make sequential predictions using a deep RL framework, incorporating global context cues and semantic embeddings of previously extracted phrases in the state vector. Our experiments on the Visual Relationship Detection (VRD) dataset and the large-scale Visual Genome dataset validate the superiority of VRL, which can achieve significantly better detection results on datasets involving thousands of relationship and attribute types. We also demonstrate that VRL is able to predict unseen types embedded in our action graph by learning correlations on shared graph nodes.\nThis paper develops a general framework for learning interpretable data representation via Long Short-Term Memory (LSTM) recurrent neural networks over hierarchical graph structures. Instead of learning LSTM models over the pre-fixed structures, we propose to further learn the intermediate interpretable multi-level graph structures in a progressive and stochastic way from data during the LSTM network optimization. We thus call this model the structure-evolving LSTM. 
In particular, starting with an initial element-level graph representation where each node is a small data element, the structure-evolving LSTM gradually evolves the multi-level graph representations by stochastically merging the graph nodes with high compatibilities along the stacked LSTM layers. In each LSTM layer, we estimate the compatibility of two connected nodes from their corresponding LSTM gate outputs, which is used to generate a merging probability. The candidate graph structures are accordingly generated where the nodes are grouped into cliques with their merging probabilities. We then produce the new graph structure with a Metropolis-Hastings algorithm, which alleviates the risk of getting stuck in local optima by stochastic sampling with an acceptance probability. Once a graph structure is accepted, a higher-level graph is then constructed by taking the partitioned cliques as its nodes. During the evolving process, representation becomes more abstracted in higher levels where redundant information is filtered out, allowing more efficient propagation of long-range data dependencies. We evaluate the effectiveness of structure-evolving LSTM in the application of semantic object parsing and demonstrate its advantage over state-of-the-art LSTM models on standard benchmarks.\nWe propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. 
In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.\nRecently, DNN model compression based on network architecture design, e.g., SqueezeNet, has attracted a lot of attention. No accuracy drop on image classification is observed on these extremely compact networks, compared to well-known models. An emerging question, however, is whether these model compression techniques hurt DNN's learning ability other than classifying images on a single dataset. Our preliminary experiment shows that these compression methods could degrade domain adaptation (DA) ability, though the classification performance is preserved. Therefore, we propose a new compact network architecture and unsupervised DA method in this paper. The DNN is built on a new basic module Conv-M which provides more diverse feature extractors without significantly increasing parameters. The unified framework of our DA method will simultaneously learn invariance across domains, reduce divergence of feature representations, and adapt label prediction. Our DNN has 4.1M parameters, which is only 6.7% of AlexNet or 59% of GoogLeNet. Experiments show that our DNN obtains GoogLeNet-level accuracy both on classification and DA, and our DA method slightly outperforms previous competitive ones. Put together, our DA strategy based on our DNN achieves state-of-the-art results on sixteen of the eighteen DA tasks on the popular Office-31 and Office-Caltech datasets.\nIn this paper we look into the problem of planning over hybrid domains, where change can be both discrete and instantaneous, or continuous over time. In addition, it is required that each state on the trajectory induced by the execution of plans complies with a given set of global constraints. 
We approach the computation of plans for such domains as the problem of searching over a deterministic state model. In this model, some of the successor states are obtained by solving numerically the so-called initial value problem over a set of ordinary differential equations (ODE) given by the current plan prefix. These equations hold over time intervals whose duration is determined dynamically, according to whether zero crossing events take place for a set of invariant conditions. The resulting planner, FS+, incorporates these features together with effective heuristic guidance. FS+ does not impose any of the syntactic restrictions on process effects often found in the existing literature on Hybrid Planning. A key concept of our approach is that a clear separation is struck between planning and simulation time steps. The former is the time allowed to observe the evolution of a given dynamical system before committing to a future course of action, whilst the latter is part of the model of the environment. FS+ is shown to be a robust planner over a diverse set of hybrid domains, taken from the existing literature on hybrid planning and systems.\nA new model of symbol grounding is presented, in which the structures of natural language, logical semantics, perception and action are represented categorically, and symbol grounding is modeled via the composition of morphisms between the relevant categories. This model gives conceptual insight into the fundamentally systematic nature of symbol grounding, and also connects naturally to practical real-world AI systems in current research and commercial use. Specifically, it is argued that the structure of linguistic syntax can be modeled as a certain asymmetric monoidal category, as e.g. 
implicit in the link grammar formalism; the structure of spatiotemporal relationships and action plans can be modeled similarly using \"image grammars\" and \"action grammars\"; and common-sense logical semantic structure can be modeled using dependently-typed lambda calculus with uncertain truth values. Given these formalisms, the grounding of linguistic descriptions in spatiotemporal perceptions and coordinated actions consists of following morphisms from language to logic through to spacetime and body (for comprehension), and vice versa (for generation). The mapping is indicated between the spatial relationships in the Region Connection Calculus and Allen Interval Algebra and corresponding entries in the link grammar syntax parsing dictionary. Further, the abstractions introduced here are shown to naturally model the structures and systems currently being deployed in the context of using the OpenCog cognitive architecture to control Hanson Robotics humanoid robots.\nBayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to decrease the number of objective function evaluations required for good performance. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (dKG), for which we show one-step Bayes-optimality, asymptotic consistency, and greater one-step value of information than is possible in the derivative-free setting. Our procedure accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. 
We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors.\nThe Entity Disambiguation and Linking (EDL) task matches entity mentions in text to a unique Knowledge Base (KB) identifier such as a Wikipedia or Freebase id. It plays a critical role in the construction of a high quality information network, and can be further leveraged for a variety of information retrieval and NLP tasks such as text categorization and document tagging. EDL is a complex and challenging problem due to ambiguity of the mentions and real world text being multi-lingual. Moreover, EDL systems need to have high throughput and should be lightweight in order to scale to large datasets and run on off-the-shelf machines. More importantly, these systems need to be able to extract and disambiguate dense annotations from the data in order to enable an Information Retrieval or Extraction task running on the data to be more efficient and accurate. In order to address all these challenges, we present the Lithium EDL system and algorithm - a high-throughput, lightweight, language-agnostic EDL system that extracts and correctly disambiguates 75% more entities than state-of-the-art EDL systems and is significantly faster than them.\nWe present a computational evaluation of three hypotheses about sources of deficit in sentence comprehension in aphasia: slowed processing, intermittent deficiency, and resource reduction. The ACT-R based Lewis and Vasishth (2005) model is used to implement these three proposals. Slowed processing is implemented as slowed default production-rule firing time; intermittent deficiency as increased random noise in activation of chunks in memory; and resource reduction as reduced goal activation. 
As data, we considered subject vs. object relatives whose matrix clause contained either an NP or a reflexive, presented in a self-paced listening modality to 56 individuals with aphasia (IWA) and 46 matched controls. The participants heard the sentences and carried out a picture verification task to decide on an interpretation of the sentence. These response accuracies are used to identify the best parameters (for each participant) that correspond to the three hypotheses mentioned above. We show that controls have more tightly clustered (less variable) parameter values than IWA; specifically, compared to controls, among IWA there are more individuals with low goal activations, high noise, and slow default action times. This suggests that (i) individual patients show differential amounts of deficit along the three dimensions of slowed processing, intermittent deficiency, and resource reduction, (ii) overall, there is evidence for all three sources of deficit playing a role, and (iii) IWA have a more variable range of parameter values than controls. In sum, this study contributes a proof of concept of a quantitative implementation of, and evidence for, these three accounts of comprehension deficits in aphasia.\nBoth the ethics of autonomous systems and the problems of their technical implementation have by now been studied in some detail. Less attention has been given to the areas in which these two separate concerns meet. This paper, written by both philosophers and engineers of autonomous systems, addresses a number of issues in machine ethics that are located at precisely the intersection between ethics and engineering. We first discuss the main challenges which, in our view, machine ethics poses to moral philosophy. We then consider different approaches towards the conceptual design of autonomous systems and their implications for the implementation of ethics in such systems. 
Then we examine problematic areas regarding the specification and verification of ethical behavior in autonomous systems, particularly with a view towards the requirements of future legislation. We discuss transparency and accountability issues that will be crucial for any future wide deployment of autonomous systems in society. Finally, we consider the often overlooked possibility of intentional misuse of AI systems and the possible dangers arising out of deliberately unethical design, implementation, and use of autonomous robots.\nLearning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We introduce a learned gradient descent optimizer that generalizes well to new tasks, and which has significantly reduced memory and computation overhead. We achieve this by introducing a novel hierarchical RNN architecture, with minimal per-parameter overhead, augmented with additional architectural features that mirror the known structure of optimization tasks. We also develop a meta-training ensemble of small, diverse optimization tasks capturing common properties of loss landscapes. The optimizer learns to outperform RMSProp/ADAM on problems in this corpus. More importantly, it performs comparably or better when applied to small convolutional neural networks, despite seeing no neural networks in its meta-training set. Finally, it generalizes to train Inception V3 and ResNet V2 architectures on the ImageNet dataset for thousands of steps, optimization problems that are of a vastly different scale than those it was trained on. We release an open source implementation of the meta-training algorithm.\nHuman parsing has recently attracted a lot of research interest due to its huge application potential. 
However, existing datasets have a limited number of images and annotations, and lack the variety of human appearances and the coverage of challenging cases in unconstrained environments. In this paper, we introduce a new benchmark \"Look into Person (LIP)\" that makes a significant advance in terms of scalability, diversity and difficulty, a contribution that we feel is crucial for future developments in human-centric analysis. This comprehensive dataset contains over 50,000 elaborately annotated images with 19 semantic part labels, which are captured from a wider range of viewpoints, occlusions and background complexity. Given these rich annotations, we perform detailed analyses of the leading human parsing approaches, gaining insights into the successes and failures of these methods. Furthermore, in contrast to the existing efforts on improving the feature discriminative capability, we solve human parsing by exploring a novel self-supervised structure-sensitive learning approach, which imposes human pose structures on parsing results without resorting to extra supervision (i.e., no need for specifically labeling human joints in model training). Our self-supervised learning framework can be injected into any advanced neural networks to help incorporate rich high-level knowledge regarding human joints from a global perspective and improve the parsing results. Extensive evaluations on our LIP and the public PASCAL-Person-Part dataset demonstrate the superiority of our method.\nIn today's databases, previous query answers rarely benefit answering future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some degree of knowledge about the answer to another query because their answers stem from the same underlying distribution that has produced the entire dataset. 
Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Also, processing more queries should continuously enhance our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries.   We call this novel idea---learning from past query answers---Database Learning. We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine on top of Spark SQL, called Verdict. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that Verdict supports 73.7% of these queries, speeding them up by up to 23.0x for the same accuracy level compared to existing AQP systems.\nWe recommend that the search for exoplanets around binary stars be extended to include X-ray binaries in which the accretor is a white dwarf, neutron star, or black hole. We present a novel idea for detecting planets bound to such mass transfer binaries: we propose that the X-ray light curves of these binaries be inspected for signatures of transiting planets. X-ray transits may be the only way to detect planets around some systems, while providing a complementary approach to optical and/or radio observations in others. Any planets associated with X-ray binaries must be in stable orbits. We consider the range of allowable separations and find that orbital periods can be hours or longer, while transit durations extend upward from about a minute for Earth-radius planets in very close orbits, to hours for Jupiter-radius planets in wider orbits. The search for planets around mass transfer binaries could begin at once with existing X-ray observations of these systems. 
If and when a planet is detected around an X-ray binary, the size and mass of the planet may be readily measured, and it may also be possible to study the transmission and absorption of X-rays through its atmosphere. Finally, a noteworthy application of our proposal is that the same technique could be used to search for signals from extraterrestrial intelligence. If an advanced exocivilization placed a Dyson sphere or similar structure in orbit around the accretor of an X-ray binary in order to capture energy, such an artificial structure might cause detectable transits in the X-ray light curve.\nWe discuss the computational complexity of approximating maximum a posteriori inference in sum-product networks. We first show NP-hardness in trees of height two by a reduction from maximum independent set; this implies non-approximability within a sublinear factor. We show that this is a tight bound, as we can find an approximation within a linear factor in networks of height two. We then show that, in trees of height three, it is NP-hard to approximate the problem within a factor $2^{f(n)}$ for any sublinear function $f$ of the size of the input $n$. Again, this bound is tight, as we prove that the usual max-product algorithm finds (in any network) approximations within factor $2^{c \\cdot n}$ for some constant $c < 1$. Last, we present a simple algorithm, and show that it provably produces solutions at least as good as, and potentially much better than, the max-product algorithm. We empirically analyze the proposed algorithm against max-product using synthetic and realistic networks.\nMost games have, or can be generalised to have, a number of parameters that may be varied in order to provide instances of games that lead to very different player experiences. 
The space of possible parameter settings can be seen as a search space, and we can therefore use a Random Mutation Hill Climbing algorithm or other search methods to find the parameter settings that induce the best games. One of the hardest parts of this approach is defining a suitable fitness function. In this paper we explore the possibility of using one of a growing set of General Video Game AI agents to perform automatic play-testing. This enables a very general approach to game evaluation based on estimating the skill-depth of a game. Agent-based play-testing is computationally expensive, so we compare two simple but efficient optimisation algorithms: the Random Mutation Hill-Climber and the Multi-Armed Bandit Random Mutation Hill-Climber. For the test game we use a space-battle game in order to provide a suitable balance between simulation speed and potential skill-depth. Results show that both algorithms are able to rapidly evolve game versions with significant skill-depth, but that choosing a suitable resampling number is essential in order to combat the effects of noise.\nAs autonomous vehicles become an every-day reality, high-accuracy pedestrian detection is of paramount practical importance. Pedestrian detection is a highly researched topic with mature methods, but most datasets focus on common scenes of people engaged in typical walking poses on sidewalks. But performance is most crucial for dangerous scenarios, such as children playing in the street or people using bicycles/skateboards in unexpected ways. Such \"in-the-tail\" data is notoriously hard to observe, making both training and testing difficult. To analyze this problem, we have collected a novel annotated dataset of dangerous scenarios called the Precarious Pedestrian dataset. Even given a dedicated collection effort, it is relatively small by contemporary standards (around 1000 images). 
To allow for large-scale data-driven learning, we explore the use of synthetic data generated by a game engine. A significant challenge is selecting the right \"priors\" or parameters for synthesis: we would like realistic data with poses and object configurations that mimic true Precarious Pedestrians. Inspired by Generative Adversarial Networks (GANs), we generate a massive amount of synthetic data and train a discriminative classifier to select a realistic subset, which we dub the Adversarial Imposters. We demonstrate that this simple pipeline allows one to synthesize realistic training data by making use of rendering/animation engines within a GAN framework. Interestingly, we also demonstrate that such data can be used to rank algorithms, suggesting that Adversarial Imposters can also be used for \"in-the-tail\" validation at test-time, a notoriously difficult challenge for real-world deployment.\nWe introduce the first goal-driven training for visual question answering and dialog agents. Specifically, we pose a cooperative 'image guessing' game between two agents -- Qbot and Abot -- who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images. We use deep reinforcement learning (RL) to learn the policies of these agents end-to-end -- from pixels to multi-agent multi-round dialog to game reward. We demonstrate two experimental results. First, as a 'sanity check' demonstration of pure RL (from scratch), we show results on a synthetic world, where the agents communicate in an ungrounded vocabulary, i.e., symbols with no pre-specified meanings (X, Y, Z). We find that the two bots invent their own communication protocol and start using certain symbols to ask/answer about certain visual attributes (shape/color/style). Thus, we demonstrate the emergence of grounded language and communication among 'visual' dialog agents with no human supervision. 
Second, we conduct large-scale real-image experiments on the VisDial dataset, where we pretrain with supervised dialog data and show that the RL 'fine-tuned' agents significantly outperform SL agents. Interestingly, the RL Qbot learns to ask questions that Abot is good at, ultimately resulting in more informative dialog and a better team.\nTranslating information between text and image is a fundamental problem in artificial intelligence that connects natural language processing and computer vision. In the past few years, performance in image caption generation has seen significant improvement through the adoption of recurrent neural networks (RNN). Meanwhile, text-to-image generation has begun to generate plausible images using datasets of specific categories like birds and flowers. We've even seen image generation from multi-category datasets such as the Microsoft Common Objects in Context (MSCOCO) through the use of generative adversarial networks (GANs). Synthesizing objects with a complex shape, however, is still challenging. For example, animals and humans have many degrees of freedom, which means that they can take on many complex shapes. We propose a new training method called Image-Text-Image (I2T2I) which integrates text-to-image and image-to-text (image captioning) synthesis to improve the performance of text-to-image synthesis. We demonstrate that I2T2I can generate better multi-category images using MSCOCO than the state of the art. We also demonstrate that I2T2I can achieve transfer learning by using a pre-trained image captioning module to generate human images on the MPII Human Pose\nIn the modern era, each Internet user leaves enormous amounts of auxiliary digital residuals (footprints) by using a variety of on-line services. All this data has already been collected and stored for many years. 
In recent works, it was demonstrated that it's possible to apply simple machine learning methods to analyze collected digital footprints and to create psycho-demographic profiles of individuals. However, while these works clearly demonstrated the applicability of machine learning methods for such an analysis, the simple prediction models they created still lack the accuracy necessary to be successfully applied in practice. We have assumed that using advanced deep machine learning methods may considerably increase the accuracy of predictions. We started with simple machine learning methods to estimate basic prediction performance and moved further by applying advanced methods based on shallow and deep neural networks. Then we compared the prediction power of the studied models and drew conclusions about their performance. Finally, we made hypotheses about how prediction accuracy can be further improved. As a result of this work, we provide the full source code used in the experiments for all interested researchers and practitioners in the corresponding GitHub repository. We believe that applying deep machine learning for psycho-demographic profiling may have an enormous impact on society (for good or worse) and provides means for Artificial Intelligence (AI) systems to better understand humans by creating their psychological profiles. Thus AI agents may achieve the human-like ability to participate in conversation (communication) flow by anticipating human opponents' reactions, expectations, and behavior.\nThis paper addresses the problem of handling spatial misalignments due to camera-view changes or human-pose variations in person re-identification. We first introduce a boosting-based approach to learn a correspondence structure which indicates the patch-wise matching probabilities between images from a target camera pair. 
The learned correspondence structure can not only capture the spatial correspondence pattern between cameras but also handle the viewpoint or human-pose variation in individual images. We further introduce a global constraint-based matching process. It integrates a global matching constraint over the learned correspondence structure to exclude cross-view misalignments during the image patch matching process, hence achieving a more reliable matching score between images. Finally, we also extend our approach by introducing a multi-structure scheme, which learns a set of local correspondence structures to capture the spatial correspondence sub-patterns between a camera pair, so as to handle the spatial misalignments between individual images in a more precise way. Experimental results on various datasets demonstrate the effectiveness of our approach.\nA natural image usually conveys rich semantic content and can be viewed from different angles. Existing image description methods are largely restricted by small sets of biased visual paragraph annotations, and fail to cover rich underlying semantics. In this paper, we investigate a semi-supervised paragraph generative framework that is able to synthesize diverse and semantically coherent paragraph descriptions by reasoning over local semantic regions and exploiting linguistic knowledge. The proposed Recurrent Topic-Transition Generative Adversarial Network (RTT-GAN) builds an adversarial framework between a structured paragraph generator and multi-level paragraph discriminators. The paragraph generator generates sentences recurrently by incorporating region-based visual and language attention mechanisms at each step. The quality of generated paragraph sentences is assessed by multi-level adversarial discriminators from two aspects, namely, plausibility at sentence level and topic-transition coherence at paragraph level. 
The joint adversarial training of RTT-GAN drives the model to generate realistic paragraphs with smooth logical transitions between sentence topics. Extensive quantitative experiments on image and video paragraph datasets demonstrate the effectiveness of our RTT-GAN in both supervised and semi-supervised settings. Qualitative results on telling diverse stories for an image also verify the interpretability of RTT-GAN.\nMany problems in image processing and computer vision (e.g. colorization, style transfer) can be posed as 'manipulating' an input image into a corresponding output image given a user-specified guiding signal. A holy-grail solution towards generic image manipulation should be able to efficiently alter an input image with any personalized signals (even signals unseen during training), such as diverse paintings and arbitrary descriptive attributes. However, existing methods are either too inefficient to process multiple signals simultaneously (let alone generalize to unseen signals), or unable to handle signals from other modalities. In this paper, we make the first attempt to address the zero-shot image manipulation task. We cast this problem as manipulating an input image according to a parametric model whose key parameters can be conditionally generated from any guiding signal (even unseen ones). To this end, we propose the Zero-shot Manipulation Net (ZM-Net), a fully-differentiable architecture that jointly optimizes an image-transformation network (TNet) and a parameter network (PNet). The PNet learns to generate key transformation parameters for the TNet given any guiding signal while the TNet performs fast zero-shot image manipulation according to both signal-dependent parameters from the PNet and signal-invariant parameters from the TNet itself. Extensive experiments show that our ZM-Net can perform high-quality image manipulation conditioned on different forms of guiding signals (e.g. 
style images and attributes) in real-time (tens of milliseconds per image) even for unseen signals. Moreover, a large-scale style dataset with over 20,000 style images is also constructed to promote further research.\nThe problem of automatically generating a computer program from some specification has been studied since the early days of AI. Recently, two competing approaches for automatic program learning have received significant attention: (1) neural program synthesis, where a neural network is conditioned on input/output (I/O) examples and learns to generate a program, and (2) neural program induction, where a neural network generates new outputs directly using a latent program representation.   Here, for the first time, we directly compare both approaches on a large-scale, real-world learning task. We additionally contrast to rule-based program synthesis, which uses hand-crafted semantics to guide the program generation. Our neural models use a modified attention RNN to allow encoding of variable-sized sets of I/O pairs. Our best synthesis model achieves 92% accuracy on a real-world test set, compared to the 34% accuracy of the previous best neural synthesis approach. The synthesis model also outperforms a comparable induction model on this task, but we more importantly demonstrate that the strength of each approach is highly dependent on the evaluation metric and end-user application. Finally, we show that we can train our neural models to remain very robust to the type of noise expected in real-world data (e.g., typos), while a highly-engineered rule-based system fails entirely.\nWe provide new results for noise-tolerant and sample-efficient learning algorithms under $s$-concave distributions. The new class of $s$-concave distributions is a broad and natural generalization of log-concavity, and includes many important additional distributions, e.g., the Pareto distribution and $t$-distribution. 
This class has been studied in the context of efficient sampling, integration, and optimization, but much remains unknown about the geometry of this class of distributions and their applications in the context of learning. The challenge is that unlike the commonly used distributions in learning (uniform or more generally log-concave distributions), this broader class is not closed under the marginalization operator and many such distributions are fat-tailed. In this work, we introduce new convex geometry tools to study the properties of $s$-concave distributions and use these properties to provide bounds on quantities of interest to learning including the probability of disagreement between two halfspaces, disagreement outside a band, and the disagreement coefficient. We use these results to significantly generalize prior results for margin-based active learning, disagreement-based active learning, and passive learning of intersections of halfspaces. Our analysis of geometric properties of $s$-concave distributions might be of independent interest to optimization more broadly.\nWhen using reinforcement learning (RL) algorithms to evaluate a policy it is common, given a large state space, to introduce some form of approximation architecture for the value function (VF). The exact form of this architecture can have a significant effect on the accuracy of the VF estimate, however, and determining a suitable approximation architecture can often be a highly complex task. Consequently there is a large amount of interest in the potential for allowing RL algorithms to adaptively generate (i.e. to learn) approximation architectures.   We investigate a method of adapting approximation architectures which uses feedback regarding the frequency with which an agent has visited certain states to guide which areas of the state space to approximate with greater detail. We introduce an algorithm based upon this idea which adapts a state aggregation approximation architecture on-line. 
  Assuming $S$ states, we demonstrate theoretically that - provided the following relatively non-restrictive assumptions are satisfied: (a) the number of cells $X$ in the state aggregation architecture is of order $\\sqrt{S}\\ln{S}\\log_2{S}$ or greater, (b) the policy and transition function are close to deterministic, and (c) the prior for the transition function is uniformly distributed - our algorithm can guarantee, assuming we use an appropriate scoring function to measure VF error, error which is arbitrarily close to zero as $S$ becomes large. It is able to do this despite having only $O(X\\log_2{S})$ space complexity (and negligible time complexity). We conclude by generating a set of empirical results which support the theoretical results.\nDeep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems.   This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. 
It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry.   The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.\nIn visual question answering (VQA), an algorithm must answer text-based questions about images. While multiple datasets for VQA have been created since late 2014, they all have flaws in both their content and the way algorithms are evaluated on them. As a result, evaluation scores are inflated and predominantly determined by answering easier questions, making it difficult to compare different methods. In this paper, we analyze existing VQA algorithms using a new dataset. It contains over 1.6 million questions organized into 12 different categories. We also introduce questions that are meaningless for a given image to force a VQA system to reason about image content. We propose new evaluation schemes that compensate for over-represented question-types and make it easier to study the strengths and weaknesses of algorithms. We analyze the performance of both baseline and state-of-the-art VQA models, including multi-modal compact bilinear pooling (MCB), neural module networks, and recurrent answering units. 
Our experiments establish how attention helps certain categories more than others, determine which models work better than others, and explain how simple models (e.g. MLP) can surpass more complex models (MCB) by simply learning to answer large, easy question categories.\nFrom one perspective, the central problem pursued in this research is that of the inverse problem in the context of general rough sets. The problem is about the existence of a rough basis for given approximations in a context. Granular operator spaces were recently introduced by the present author as an optimal framework for anti-chain based algebraic semantics of general rough sets and the inverse problem. In the framework, various subtypes of crisp and non-crisp objects are identifiable that may be missed in more restrictive formalisms. This is also because in the latter cases the concepts of complementation and negation are taken for granted. This opens the door for a general approach to dialectical rough sets building on previous work of the present author and figures of opposition. In this paper dialectical rough logics are developed from a semantic perspective, the concept of dialectical predicates is formalized, the connection with dialetheias and glutty negation is established, and parthood is analyzed and studied from the point of view of classical and dialectical figures of opposition. Potential semantics through dialectical counting based on these figures are proposed building on earlier work by the present author. Her methods become more geometrical and encompass parthood as a primary relation (as opposed to roughly equivalent objects) for algebraic semantics. Dialectical counting strategies over anti-chains (a specific form of dialectical structure) for semantics are also proposed.\nRating platforms enable large-scale collection of user opinion about items (products, other users, etc.). However, many untrustworthy users give fraudulent ratings for excessive monetary gains. 
In this paper, we present FairJudge, a system to identify such fraudulent users. We propose three metrics: (i) the fairness of a user that quantifies how trustworthy the user is in rating the products, (ii) the reliability of a rating that measures how reliable the rating is, and (iii) the goodness of a product that measures the quality of the product. Intuitively, a user is fair if it provides reliable ratings that are close to the goodness of the product. We formulate a mutually recursive definition of these metrics, and further address cold-start problems and incorporate behavioral properties of users and products in the formulation. We propose an iterative algorithm, FairJudge, to predict the values of the three metrics. We prove that FairJudge is guaranteed to converge in a bounded number of iterations, with linear time complexity. By conducting five different experiments on five rating platforms, we show that FairJudge significantly outperforms nine existing algorithms in predicting fair and unfair users. We reported the 100 most unfair users in the Flipkart network to their review fraud investigators, and 80 users were correctly identified (80% accuracy). The FairJudge algorithm is already being deployed at Flipkart.\nMost existing neural network models for music generation use recurrent neural networks. However, the recent WaveNet model proposed by DeepMind shows that convolutional neural networks (CNNs) can also generate realistic musical waveforms in the audio domain. In this light, we investigate using CNNs for generating melody (a series of MIDI notes) one bar after another in the symbolic domain. In addition to the generator, we use a discriminator to learn the distributions of melodies, making it a generative adversarial network (GAN). 
Moreover, we propose a novel conditional mechanism to exploit available prior knowledge, so that the model can generate melodies either from scratch, by following a chord sequence, or by conditioning on the melody of previous bars (e.g. a priming melody), among other possibilities. The resulting model, named MidiNet, can be expanded to generate music with multiple MIDI channels (i.e. tracks). We conduct a user study to compare eight-bar melodies generated by MidiNet and by Google's MelodyRNN models, each time using the same priming melody. The results show that MidiNet performs comparably with MelodyRNN models in being realistic and pleasant to listen to, yet MidiNet's melodies are reported to be much more interesting.\nThe Android operating system has become the most popular operating system for smartphones and tablets, leading to a rapid rise in malware. Sophisticated Android malware employ detection avoidance techniques in order to hide their malicious activities from analysis tools. These include a wide range of anti-emulator techniques, where the malware programs attempt to hide their malicious activities by detecting the emulator. For this reason, countermeasures against anti-emulation are becoming increasingly important in Android malware detection. Analysis and detection based on real devices can alleviate the problems of anti-emulation as well as improve the effectiveness of dynamic analysis. Hence, in this paper we present an investigation of machine learning based malware detection using dynamic analysis on real devices. A tool is implemented to automatically extract dynamic features from Android phones and through several experiments, a comparative analysis of emulator-based vs. device-based detection by means of several machine learning algorithms is undertaken. Our study shows that several features could be extracted more effectively from the on-device dynamic analysis compared to emulators. 
It was also found that approximately 24% more apps were successfully analysed on the phone. Furthermore, all of the studied machine-learning-based detection methods performed better when applied to features extracted from the on-device dynamic analysis.\nOne of the defining properties of deep learning is that models are chosen to have many more parameters than available training data. In light of this capacity for overfitting, it is remarkable that simple algorithms like SGD reliably return solutions with low test error. One roadblock to explaining these phenomena in terms of implicit regularization, structural properties of the solution, and/or easiness of the data is that many learning bounds are quantitatively vacuous when applied to networks learned by SGD in this \"deep learning\" regime. Logically, in order to explain generalization, we need nonvacuous bounds. We return to an idea by Langford and Caruana (2001), who used PAC-Bayes bounds to compute nonvacuous numerical bounds on generalization error for stochastic two-layer two-hidden-unit neural networks via a sensitivity analysis. By optimizing the PAC-Bayes bound directly, we are able to extend their approach and obtain nonvacuous generalization bounds for deep stochastic neural network classifiers with millions of parameters trained on only tens of thousands of examples. We connect our findings to recent and old work on flat minima and MDL-based explanations of generalization.\nClassifiers deployed in the real world operate in a dynamic environment, where the data distribution can change over time. These changes, referred to as concept drift, can cause the predictive performance of the classifier to drop over time, thereby making it obsolete. To be of any real use, these classifiers need to detect drifts and be able to adapt to them over time. Detecting drifts has traditionally been approached as a supervised task, with labeled data constantly being used for validating the learned model. 
Although effective in detecting drifts, these techniques are impractical, as labeling is a difficult, costly and time-consuming activity. On the other hand, unsupervised change detection techniques are unreliable, as they produce a large number of false alarms. The inefficacy of the unsupervised techniques stems from excluding the characteristics of the learned classifier from the detection process. In this paper, we propose the Margin Density Drift Detection (MD3) algorithm, which tracks the number of samples in the uncertainty region of a classifier, as a metric to detect drift. The MD3 algorithm is a distribution-independent, application-independent, model-independent, unsupervised and incremental algorithm for reliably detecting drifts from data streams. Experimental evaluation on 6 drift-induced datasets and 4 additional datasets from the cybersecurity domain demonstrates that the MD3 approach can reliably detect drifts, with significantly fewer false alarms compared to unsupervised feature-based drift detectors. The reduction in false alarms enables the signaling of drifts only when they are most likely to affect classification performance. As such, the MD3 approach leads to a detection scheme which is credible, label-efficient and general in its applicability.\nAs part of Smart Cities initiatives, national, regional and local governments all over the globe are under the mandate of being more open regarding how they share their data. Under this mandate, many of these governments are publishing data under the umbrella of open government data, which includes measurement data from city-wide sensor networks. Furthermore, many of these data are published in so-called data portals as documents that may be spreadsheets, comma-separated value (CSV) data files, or plain documents in PDF or Word format. 
The sharing of these documents may be a convenient way for the data provider to convey and publish data but it is not the ideal way for data consumers to reuse the data. For example, the problems of reusing the data may range from difficulty opening a document that is provided in any format that is not plain text, to the actual problem of understanding the meaning of each piece of knowledge inside of the document. Our proposal tackles those challenges by identifying metadata that has been regarded to be relevant for measurement data and providing a schema for this metadata. We further leverage the Human-Aware Sensor Network Ontology (HASNetO) to build an architecture for data collected in urban environments. We discuss the use of HASNetO and the supporting infrastructure to manage both data and metadata in support of the City of Fortaleza, a large metropolitan area in Brazil.\nSignificant efforts have been made to understand and document knowledge related to scientific measurements. Many of those efforts resulted in one or more high-quality ontologies that describe some aspects of scientific measurements, but not in a comprehensive and coherently integrated manner. For instance, we note that many of these high-quality ontologies are not properly aligned, and more challenging, that they have different and often conflicting concepts and approaches for encoding knowledge about empirical measurements. As a result of this lack of an integrated view, it is often challenging for scientists to determine whether any two scientific measurements were taken in semantically compatible manners, thus making it difficult to decide whether measurements should be analyzed in combination or not. In this paper, we present the Human-Aware Sensor Network Ontology that is a comprehensive alignment and integration of a sensing infrastructure ontology and a provenance ontology. 
HASNetO has been under development for more than one year, and has been reviewed, shared and used by multiple scientific communities. The ontology has been in use to support the data management of a number of large-scale ecological monitoring activities (observations) and empirical experiments.\nThe Dynamic Vehicle Routing Problem with Time Windows (DVRPTW) is an extension of the well-known Vehicle Routing Problem (VRP), which takes into account the dynamic nature of the problem. This aspect requires the vehicle routes to be updated in an ongoing manner as new customer requests arrive in the system and must be incorporated into an evolving schedule during the working day. Besides the vehicle capacity constraint involved in the classical VRP, DVRPTW additionally considers time windows, which better capture real-world situations. Despite its greater practical importance, few studies have so far focused on tackling this problem. To this end, this study devises an ant colony optimization based algorithm for solving the DVRPTW, which uses a joint solution construction mechanism able to construct the vehicle routes in parallel. This method is coupled with a local search procedure, aimed at further improving the solutions built by ants, and with an insertion heuristic, which tries to reduce the number of vehicles used to service the available customers. The experiments indicate that the proposed algorithm is competitive and effective, and on DVRPTW instances with a higher dynamicity level, it is able to yield better results compared to existing ant-based approaches.\nAn overview of current debates and contemporary research devoted to the modeling of decision-making processes and their facilitation directs attention to the Analytic Hierarchy Process (AHP). 
At the core of the AHP are various prioritization procedures (PPs) and consistency measures (CMs) for a Pairwise Comparison Matrix (PCM) which, in a sense, reflects the preferences of decision makers. Certainly, when judgments about these preferences are perfectly consistent (cardinally transitive), all PPs coincide and the quality of the priority ratios (PRs) estimation is exemplary. However, human judgments are very rarely consistent, so the quality of PRs estimation may vary significantly. The scale of these variations depends on the PP applied and the CM used for a PCM. This is why it is important to find out which PPs and which CMs for a PCM lead directly to an improvement of the PRs estimation accuracy. The main goal of this research is realized through properly designed, coded and executed simulation algorithms in Wolfram Mathematica 8.0. These research results indicate that both the genuine PP and the CM for a PCM, which are embedded in the AHP and commonly applied, may significantly deteriorate the quality of PRs estimation; however, solutions proposed in this paper can significantly improve the methodology.\nA great variety of text tasks such as topic or spam identification, user profiling, and sentiment analysis can be posed as a supervised learning problem and tackled using a text classifier. A text classifier consists of several subprocesses, some of which are general enough to be applied to any supervised learning problem, whereas others are specifically designed to tackle a particular task, using complex and computationally expensive processes such as lemmatization, syntactic analysis, etc. Contrary to traditional approaches, we propose a minimalistic and broadly applicable system, namely microTC, able to tackle text classification tasks independent of domain and language. It is composed of some easy-to-implement text transformations, text representations, and a supervised learning algorithm. 
These pieces produce a competitive classifier even in the domain of informally written text. We provide a detailed description of microTC along with an extensive experimental comparison with relevant state-of-the-art methods. microTC was compared on 30 different datasets. Regarding accuracy, microTC obtained the best performance on 20 datasets while achieving competitive results on the remaining 10. The compared datasets include several problems like topic and polarity classification, spam detection, user profiling and authorship attribution. Furthermore, it is important to state that our approach allows the technology to be used even without knowledge of machine learning and natural language processing.\nMajor advances have recently been made in merging language and vision representations. But most tasks considered so far have confined themselves to the processing of objects and lexicalised relations amongst objects (content words). We know, however, that humans (even pre-school children) can abstract over raw data to perform certain types of higher-level reasoning, expressed in natural language by function words. A case in point is given by their ability to learn quantifiers, i.e. expressions like 'few', 'some' and 'all'. From formal semantics and cognitive linguistics, we know that quantifiers are relations over sets which, as a simplification, we can see as proportions. For instance, in 'most fish are red', 'most' encodes the proportion of fish which are red fish. In this paper, we study how well current language and vision strategies model such relations. We show that state-of-the-art attention mechanisms coupled with a traditional linguistic formalisation of quantifiers give the best performance on the task. Additionally, we provide insights on the role of 'gist' representations in quantification. A 'logical' strategy to tackle the task would be to first obtain a numerosity estimation for the two involved sets and then compare their cardinalities. 
We argue, however, that precisely identifying the composition of the sets is not only beyond current state-of-the-art models but perhaps even detrimental to a task that is most efficiently performed by refining the approximate numerosity estimator of the system.\nThe Nonlinear autoregressive exogenous (NARX) model, which predicts the current value of a time series based upon its previous values as well as the current and past values of multiple driving (exogenous) series, has been studied for decades. Despite the fact that various NARX models have been developed, few of them can capture the long-term temporal dependencies appropriately and select the relevant driving series to make predictions. In this paper, we propose a dual-stage attention-based recurrent neural network (DA-RNN) to address these two issues. In the first stage, we introduce an input attention mechanism to adaptively extract relevant driving series (a.k.a. input features) at each time step by referring to the previous encoder hidden state. In the second stage, we use a temporal attention mechanism to select relevant encoder hidden states across all time steps. With this dual-stage attention scheme, our model can not only make predictions effectively, but can also be easily interpreted. Thorough empirical studies based upon the SML 2010 dataset and the NASDAQ 100 Stock dataset demonstrate that the DA-RNN can outperform state-of-the-art methods for time series prediction.\nBuilding a dialogue agent to fulfill complex tasks, such as travel planning, is challenging because the agent has to learn to collectively complete multiple subtasks. For example, the agent needs to reserve a hotel and book a flight so that there is enough time to commute between arrival and hotel check-in. 
This paper addresses this challenge by formulating the task in the mathematical framework of options over Markov Decision Processes (MDPs), and proposing a hierarchical deep reinforcement learning approach to learning a dialogue manager that operates at different temporal scales. The dialogue manager consists of: (1) a top-level dialogue policy that selects among subtasks or options, (2) a low-level dialogue policy that selects primitive actions to complete the subtask given by the top-level policy, and (3) a global state tracker that helps ensure that all cross-subtask constraints are satisfied. Experiments on a travel planning task with simulated and real users show that our approach leads to significant improvements over three baselines, two based on handcrafted rules and the third based on flat deep reinforcement learning.\nThis paper targets the problem of set-to-set recognition, which learns the metric between two image sets. Images in each set belong to the same identity. Since images in a set can be complementary, they hopefully lead to higher accuracy in practical applications. However, the quality of each sample cannot be guaranteed, and samples with poor quality will hurt the metric. In this paper, the quality-aware network (QAN) is proposed to confront this problem, where the quality of each sample can be automatically learned although such information is not explicitly provided in the training stage. The network has two branches, where the first branch extracts an appearance feature embedding for each sample and the other branch predicts a quality score for each sample. Features and quality scores of all samples in a set are then aggregated to generate the final feature embedding. We show that the two branches can be trained in an end-to-end manner given only the set-level identity annotation. 
Analysis of the gradient spread of this mechanism indicates that the quality learned by the network is beneficial to set-to-set recognition and simplifies the distribution that the network needs to fit. Experiments on both face verification and person re-identification show advantages of the proposed QAN. The source code and network structure can be downloaded at https://github.com/sciencefans/Quality-Aware-Network.\nWe consider off-policy temporal-difference (TD) learning in discounted Markov decision processes, where the goal is to evaluate a policy in a model-free way by using observations of a state process generated without executing the policy. To curb the high-variance issue in off-policy TD learning, we propose a new scheme of setting the $\\lambda$-parameters of TD, based on generalized Bellman equations. Our scheme is to set $\\lambda$ according to the eligibility trace iterates calculated in TD, thereby easily keeping these traces in a desired bounded range. Compared to prior works, this scheme is more direct and flexible, and allows much larger $\\lambda$ values for off-policy TD learning with bounded traces. Using Markov chain theory, we prove the ergodicity of the joint state-trace process under nonrestrictive conditions, and we show that associated with our scheme is a generalized Bellman equation (for the policy to be evaluated) that depends on both the evolution of $\\lambda$ and the unique invariant probability measure of the state-trace process. These results not only lead immediately to a characterization of the convergence behavior of least-squares-based implementations of our scheme, but also prepare the ground for further analysis of gradient-based implementations.\nHigh Energy Physics (HEP) distributed computing infrastructures require automatic tools to monitor, analyze and react to potential security incidents. 
These tools should collect and inspect data such as resource consumption, logs and sequences of system calls for detecting anomalies that indicate the presence of a malicious agent. They should also be able to perform automated reactions to attacks without administrator intervention. We describe a novel framework that meets these requirements, with a proof-of-concept implementation for the ALICE experiment at CERN. We show how we achieve a fully virtualized environment that improves security by isolating services and Jobs without a significant performance impact. We also describe a collected dataset for Machine Learning-based Intrusion Prevention and Detection Systems on Grid computing. This dataset is composed of resource consumption measurements (such as CPU, RAM and network traffic), logfiles from operating system services, and system call data collected from production Jobs running in an ALICE Grid test site and from a large set of malware samples. This malware was collected from security research sites. Based on this dataset, we will proceed to develop Machine Learning algorithms able to detect malicious Jobs.\nOnline reinforcement learning (RL) is increasingly popular for personalized mobile health (mHealth) interventions. It is able to personalize the type and dose of interventions according to users' ongoing statuses and changing needs. However, at the beginning of online learning, there are usually too few samples to support RL updates, which leads to poor performance. A delay in good performance of the online learning algorithms can be especially detrimental in mHealth, where users tend to quickly disengage from the mHealth app. To address this problem, we propose a new online RL methodology that focuses on an effective warm start. The main idea is to make full use of the data accumulated and the decision rule achieved in a former study. As a result, our method can greatly enrich the data available at the beginning of online learning. 
This accelerates the online learning process, allowing new users to achieve good performance not only at the beginning of online learning but also throughout the whole online learning process. In addition, we use the decision rules achieved in a previous study to initialize the parameters of our online RL model for new users. This provides a good initialization for the proposed online RL algorithm. Experimental results show that our method achieves promising improvements over the state-of-the-art method.\nIn this paper, we propose a simple variant of the original stochastic variance reduced gradient (SVRG), which we hereafter refer to as the variance reduced stochastic gradient descent (VR-SGD). Different from the choices of the snapshot point and starting point in SVRG and its proximal variant, Prox-SVRG, the two vectors of each epoch in VR-SGD are set to the average and last iterate of the previous epoch, respectively. This setting allows us to use much larger learning rates or step sizes than SVRG, e.g., 3/(7L) for VR-SGD vs 1/(10L) for SVRG, and also makes our convergence analysis more challenging. In fact, a larger learning rate enjoyed by VR-SGD means that the variance of its stochastic gradient estimator asymptotically approaches zero more rapidly. Unlike common stochastic methods such as SVRG and proximal stochastic methods such as Prox-SVRG, we design two different update rules for smooth and non-smooth objective functions, respectively. In other words, VR-SGD can tackle non-smooth and/or non-strongly convex problems directly without using any reduction techniques such as quadratic regularizers. Moreover, we analyze the convergence properties of VR-SGD for strongly convex problems, showing that VR-SGD attains a linear convergence rate. We also provide the convergence guarantees of VR-SGD for non-strongly convex problems. 
Experimental results show that the performance of VR-SGD is significantly better than that of its counterparts, SVRG and Prox-SVRG, and it is also much better than the best-known stochastic method, Katyusha.\nProgress in science has advanced the development of human society across history, with dramatic revolutions shaped by information theory, genetic cloning, and artificial intelligence, among the many scientific achievements produced in the 20th century. However, the way that science advances itself is much less well-understood. In this work, we study the evolution of scientific development over the past century by presenting an anatomy of 89 million digitized papers published between 1900 and 2015. We find that science has benefited from the shift from individual work to collaborative effort, with over 90% of the world-leading innovations generated by collaborations in this century, nearly four times higher than they were in the 1900s. We discover that rather than the frequent myopic- and self-referencing that was common in the early 20th century, modern scientists instead tend to look for literature further back and farther around. Finally, we also observe the globalization of scientific development from 1900 to 2015, including 25-fold and 7-fold increases in international collaborations and citations, respectively, as well as a dramatic decline in the dominant accumulation of citations by the US, the UK, and Germany, from ~95% to ~50% over the same period. Our discoveries are meant to serve as a starting point for exploring the visionary ways in which science has developed throughout the past century, generating insight into and an impact upon the current scientific innovations and funding policies.\nGrids allow users flexible on-demand usage of computing resources through remote communication networks. A remarkable example of a Grid in High Energy Physics (HEP) research is used in the ALICE experiment at the European Organization for Nuclear Research (CERN). 
Physicists can submit jobs that process the huge amount of particle collision data produced by the Large Hadron Collider (LHC). Grids face complex security challenges. They are interesting targets for attackers seeking huge computational resources. Since users can execute arbitrary code in the worker nodes on the Grid sites, special care must be taken in this environment. Automatic tools to harden and monitor this scenario are required. Currently, there is no integrated solution for this requirement. This paper describes a new security framework to allow execution of job payloads in a sandboxed context. It also uses a Machine Learning approach to monitor process behavior and detect intrusions, even when new attack methods or zero-day vulnerabilities are exploited. We plan to implement the proposed framework as a software prototype that will be tested as a component of the ALICE Grid middleware.\nHumans can ground natural language commands to tasks at both abstract and fine-grained levels of specificity. For instance, a human forklift operator can be instructed to perform a high-level action, like \"grab a pallet\" or a low-level action like \"tilt back a little bit.\" While robots are also capable of grounding language commands to tasks, previous methods implicitly assume that all commands and tasks reside at a single, fixed level of abstraction. Additionally, those approaches that do not use abstraction experience inefficient planning and execution times due to the large, intractable state-action spaces, which closely resemble real-world complexity. In this work, by grounding commands to all the tasks or subtasks available in a hierarchical planning framework, we arrive at a model capable of interpreting language at multiple levels of specificity ranging from coarse to more granular. We show that the accuracy of the grounding procedure is improved when simultaneously inferring the degree of abstraction in language used to communicate the task. 
Leveraging hierarchy also improves efficiency: our proposed approach enables a robot to respond to a command within one second on 90% of our tasks, while baselines take over twenty seconds on half the tasks. Finally, we demonstrate that a real, physical robot can ground commands at multiple levels of abstraction, allowing it to efficiently plan different subtasks within the same planning hierarchy.\nMaking high-quality decisions in strategic spatial planning is heavily dependent on extracting knowledge from vast amounts of data. Although many decision-making problems like developing urban areas require such perception and reasoning, existing methods in this field usually neglect the deep knowledge mined from geographic databases and are based on pure statistical methods. Due to the large volume of data gathered in spatial databases, and the uncertainty of spatial objects, mining association rules for high-level knowledge representation is a challenging task. Few algorithms manage geographical and non-geographical data using topological relations. In this paper, a novel approach for spatial data mining based on the MOSES evolutionary framework is presented, which improves the classic genetic programming approach. A hybrid architecture called GGeo is proposed to apply the MOSES mining rules considering fuzzy topological relations from spatial data. The uncertainty and fuzziness aspects are addressed using an enriched model of topological relations by fuzzy region connection calculus. Moreover, to overcome the problem of time-consuming fuzzy topological relationship calculations, a novel data pre-processing method is offered. GGeo analyses and learns from geographical and non-geographical data, uses topological and distance parameters, and returns a series of arithmetic-spatial formulas as classification rules. The proposed approach is resistant to noisy data, and all its stages run in parallel to increase speed. 
This approach may be used in different spatial data classification problems and also represents an appropriate method for data analysis and economic policy making.\nWe consider the problem of online learning in misspecified linear stochastic multi-armed bandit problems. Regret guarantees for state-of-the-art linear bandit algorithms such as Optimism in the Face of Uncertainty Linear bandit (OFUL) hold under the assumption that the arms' expected rewards are perfectly linear in their features. It is, however, of interest to investigate the impact of potential misspecification in linear bandit models, where the expected rewards are perturbed away from the linear subspace determined by the arms' features. Although OFUL has recently been shown to be robust to relatively small deviations from linearity, we show that any linear bandit algorithm that enjoys optimal regret performance in the perfectly linear setting (e.g., OFUL) must suffer linear regret under a sparse additive perturbation of the linear model. In an attempt to overcome this negative result, we define a natural class of bandit models characterized by a non-sparse deviation from linearity. We argue that the OFUL algorithm can fail to achieve sublinear regret even under models that have non-sparse deviation. We finally develop a novel bandit algorithm, comprising a hypothesis test for linearity followed by a decision to use either the OFUL or Upper Confidence Bound (UCB) algorithm. For perfectly linear bandit models, the algorithm provably exhibits OFUL's favorable regret performance, while for misspecified models satisfying the non-sparse deviation property, the algorithm avoids the linear regret phenomenon and falls back on UCB's sublinear regret scaling. Numerical experiments on synthetic data, and on recommendation data from the public Yahoo! 
Learning to Rank Challenge dataset, empirically support our findings.\nThough deep learning is pushing machine learning to a new stage, basic theories of machine learning are still limited. The principle of learning, the role of prior knowledge, the role of neuron bias, and the basis for choosing the neural transfer function and cost function, etc., are still far from clear. In this paper, we present a general theoretical framework for machine learning. We classify the prior knowledge into common and problem-dependent parts, and consider that the aim of learning is to maximally incorporate them. The principle we suggest for maximizing the former is the design risk minimization principle, while the neural transfer function, the cost function, as well as pretreatment of samples, are endowed with the role for maximizing the latter. The role of the neuron bias is explained from a different angle. We develop a Monte Carlo algorithm to establish the input-output responses, and we control the input-output sensitivity of a learning machine by controlling that of individual neurons. Applications of function approximation and smoothing, pattern recognition and classification, are provided to illustrate how to train general learning machines based on our theory and algorithm. Our method may also induce new applications, such as transductive inference.\nThis manuscript introduces the problem of prominent object detection and recognition inspired by the fact that humans seem to prioritize perception of scene elements. The problem deals with finding the most important region of interest, segmenting the relevant item/object in that area, and assigning it an object class label. In other words, we are solving the three problems of saliency modeling, saliency detection, and object recognition under one umbrella. 
The motivation behind such a problem formulation is (1) the benefits to the knowledge representation-based vision pipelines, and (2) the potential improvements in emulating bio-inspired vision systems by solving these three problems together. We foresee extending this problem formulation to fully semantically segmented scenes with instance object priority for high-level inferences in various applications including assistive vision. Along with a new problem definition, we also propose a method to achieve such a task. The proposed model predicts the most important area in the image, segments the associated objects, and labels them. The proposed problem and method are evaluated against human fixations, annotated segmentation masks, and object class categories. We define a chance level for each of the evaluation criteria against which to compare the proposed algorithm. Despite the good performance of the proposed baseline, the overall evaluations indicate that the problem of prominent object detection and recognition is a challenging task that is still worth investigating further.\nPatient time series classification faces challenges from high degrees of dimensionality and missingness. In light of patient similarity theory, this study explores effective temporal feature engineering and reduction, missing value imputation, and change point detection methods that can improve the accuracy of similarity-based classification models. We select a piecewise aggregation approximation method to extract fine-grained temporal features and propose a minimalist method to impute missing values in temporal features. For dimensionality reduction, we adopt a gradient descent search method for feature weight assignment. 
We propose new patient status and directional change definitions based on medical knowledge or clinical guidelines about the value ranges for different patient status levels, and develop a method to detect change points indicating positive or negative patient status changes. We evaluate the effectiveness of the proposed methods in the context of early Intensive Care Unit (ICU) mortality prediction. The evaluation results show that the k-Nearest Neighbor algorithm that incorporates the methods we select and propose significantly outperforms the relevant benchmarks for early ICU mortality prediction. This study makes contributions to time series classification and early ICU mortality prediction by identifying and enhancing temporal feature engineering and reduction methods for similarity-based time series classification.\nDecoding human brain activities via functional magnetic resonance imaging (fMRI) has gained increasing attention in recent years. While encouraging results have been reported in brain state classification tasks, reconstructing the details of human visual experience remains difficult. Two main challenges that hinder the development of effective models are the perplexing fMRI measurement noise and the high dimensionality of limited data instances. Existing methods generally suffer from one or both of these issues and yield unsatisfactory results. In this paper, we tackle this problem by casting the reconstruction of visual stimulus as the Bayesian inference of a missing view in a multiview latent variable model. Sharing a common latent representation, our joint generative model of external stimulus and brain response is not only \"deep\" in extracting nonlinear features from visual images, but also powerful in capturing correlations among voxel activities of fMRI recordings. 
The nonlinearity and deep structure endow our model with strong representation ability, while the correlations of voxel activities are critical for suppressing noise and improving prediction. We devise an efficient variational Bayesian method to infer the latent variables and the model parameters. To further improve the reconstruction accuracy, the latent representations of testing instances are enforced to be close to those of their neighbours from the training set via posterior regularization. Experiments on three fMRI recording datasets demonstrate that our approach can more accurately reconstruct visual stimuli.\nThe aim of process discovery, originating from the area of process mining, is to discover a process model based on business process execution data. A majority of process discovery techniques rely on an event log as an input. An event log is a static source of historical data capturing the execution of a business process. In this paper we focus on process discovery relying on online streams of business process execution events. Learning process models from event streams poses both challenges and opportunities, i.e., we need to handle unlimited amounts of data using finite memory and, preferably, constant time. We propose a generic architecture that allows for adopting several classes of existing process discovery techniques in the context of event streams. Moreover, we provide several instantiations of the architecture, accompanied by implementations in the process mining tool-kit ProM (http://promtools.org). Using these instantiations, we evaluate several dimensions of stream-based process discovery. The evaluation shows that the proposed architecture allows us to lift process discovery to the streaming domain.\nA ranking is an ordered sequence of items, in which an item with a higher ranking score is preferred over items with lower ranking scores. 
In many information systems, rankings are widely used to represent the preferences over a set of items or candidates. The consensus measure of rankings is the problem of how to evaluate the degree to which the rankings agree. The consensus measure can be used to evaluate rankings in many information systems, as quite often there is no ground truth available for evaluation. This paper introduces a novel approach to measuring the consensus of rankings using a graph representation, in which the vertices or nodes are the items and the edges are the relationships between items in the rankings. Such a representation leads to various algorithms for consensus measure in terms of different aspects of rankings, including the number of common patterns, the number of common patterns with fixed length and the length of the longest common patterns. The proposed measure can be adopted for various types of rankings, such as full rankings, partial rankings and rankings with ties. This paper demonstrates how the proposed approaches can be used to evaluate the quality of rank aggregation and the quality of top-$k$ rankings from the Google and Bing search engines.\nA central goal in cancer genomics is to identify the somatic alterations that underpin tumor initiation and progression. This task is challenging as the mutational profiles of cancer genomes exhibit vast heterogeneity, with many alterations observed within each individual, few shared somatically mutated genes across individuals, and important roles in cancer for both frequently and infrequently mutated genes. While commonly mutated cancer genes are readily identifiable, those that are rarely mutated across samples are difficult to distinguish from the large numbers of other infrequently mutated genes. 
Here, we introduce a method that considers per-individual mutational profiles within the context of protein-protein interaction networks in order to identify small connected subnetworks of genes that, while not individually frequently mutated, comprise pathways that are perturbed across (i.e., \"cover\") a large fraction of the individuals. We devise a simple yet intuitive objective function that balances identifying a small subset of genes with covering a large fraction of individuals. We show how to solve this problem optimally using integer linear programming and also give a fast heuristic algorithm that works well in practice. We perform a large-scale evaluation of our resulting method, nCOP, on 6,038 TCGA tumor samples across 24 different cancer types. We demonstrate that our approach nCOP is more effective in identifying cancer genes than both methods that do not utilize any network information as well as state-of-the-art network-based methods that aggregate mutational information across individuals. Overall, our work demonstrates the power of combining per-individual mutational information with interaction networks in order to uncover genes functionally relevant in cancers, and in particular those genes that are less frequently mutated.\nCurrent domain-independent, classical planners require symbolic models of the problem domain and instance as input, resulting in a knowledge acquisition bottleneck. Meanwhile, although deep learning has achieved significant success in many fields, the knowledge is encoded in a subsymbolic representation which is incompatible with symbolic systems such as planners. We propose LatPlan, an unsupervised architecture combining deep learning and classical planning. 
Given only an unlabeled set of image pairs showing a subset of transitions allowed in the environment (training inputs), and a pair of images representing the initial and the goal states (planning inputs), LatPlan finds a plan to the goal state in a symbolic latent space and returns a visualized plan execution. The contribution of this paper is twofold: (1) State Autoencoder, which finds a propositional state representation of the environment using a Variational Autoencoder. It generates a discrete latent vector from the images, based on which a PDDL model can be constructed and then solved by an off-the-shelf planner. (2) Action Autoencoder / Discriminator, a neural architecture which jointly finds the action symbols and the implicit action models (preconditions/effects), and provides a successor function for the implicit graph search. We evaluate LatPlan using image-based versions of 3 planning domains: 8-puzzle, Towers of Hanoi and LightsOut.\nA logical theory of regular double or multiple recurrence of eventualities, which are regular patterns of occurrences that are repeated in time, has been developed within the context of temporal reasoning that enabled reasoning about the problem of coincidence, i.e., if two complex eventualities, or eventuality sequences consisting respectively of component eventualities x0, x1, ..., xr and y0, y1, ..., ys, both recur over an interval k and all eventualities are of fixed durations, is there a subinterval of k over which the occurrences xp and yq, for p between 1 and r and q between 1 and s, coincide? We present the ideas behind a new algorithm for detecting the coincidence of eventualities xp and yq within a cycle of the double recurrence of x and y. The algorithm is based on the novel concept of gcd partitions that requires the partitioning of each of the incidences of both x and y into eventuality sequences, each of whose components has a duration equal to the greatest common divisor of the durations of x and y. 
The worst-case running time of the partitioning algorithm is linear in the maximum of the duration of x and that of y, while the worst-case running time of an algorithm exploring a complete cycle is quadratic in the durations of x and y. Hence the partitioning algorithm works faster than the cyclical exploration in the worst case.\nUnlike other sequential data, sentences in natural language are structured by linguistic grammars. Previous generative conversational models with chain-structured decoders ignore this structure in human language and might generate plausible responses with less satisfactory relevance and fluency. In this study, we aim to incorporate the results from linguistic analysis into the process of sentence generation for high-quality conversation generation. Specifically, we use a dependency parser to transform each response sentence into a dependency tree and construct a training corpus of sentence-tree pairs. A tree-structured decoder is developed to learn the mapping from a sentence to its tree, where different types of hidden states are used to depict the local dependencies from an internal tree node to its children. For training acceleration, we propose a tree canonicalization method, which transforms trees into equivalent ternary trees. Then, with a proposed tree-structured search method, the model is able to generate the most probable responses in the form of dependency trees, which are finally flattened into sequences as the system output. Experimental results demonstrate that the proposed X2Tree framework outperforms baseline methods, with an increase of over 11.15% in acceptance ratio.\nMental illnesses adversely affect a significant proportion of the population worldwide. However, the methods traditionally used for estimating and characterizing the prevalence of mental health conditions are time-consuming and expensive. Consequently, best-available estimates concerning the prevalence of mental health conditions are often years out of date. 
Automated approaches to supplement these survey methods with broad, aggregated information derived from social media content provide a potential means for near real-time estimates at scale. These may, in turn, provide grist for supporting, evaluating and iteratively improving upon public health programs and interventions. We propose a novel model for automated mental health status quantification that incorporates user embeddings. This builds upon recent work exploring representation learning methods that induce embeddings by leveraging social media post histories. Such embeddings capture latent characteristics of individuals (e.g., political leanings) and encode a soft notion of homophily. In this paper, we investigate whether user embeddings learned from Twitter post histories encode information that correlates with mental health statuses. To this end, we estimated user embeddings for a set of users known to be affected by depression and post-traumatic stress disorder (PTSD), and for a set of demographically matched 'control' users. We then evaluated these embeddings with respect to: (i) their ability to capture homophilic relations with respect to mental health status; and (ii) the performance of downstream mental health prediction models based on these features. Our experimental results demonstrate that the user embeddings capture similarities between users with respect to mental conditions, and are predictive of mental health.\nImpressive image captioning results have been achieved in domains with plenty of training image and sentence pairs (e.g., MSCOCO). However, transferring to a target domain with significant domain shifts but no paired training data (referred to as cross-domain image captioning) remains largely unexplored. We propose a novel adversarial training procedure to leverage unpaired data in the target domain. Two critic networks are introduced to guide the captioner, namely a domain critic and a multi-modal critic. 
The domain critic assesses whether the generated sentences are indistinguishable from sentences in the target domain. The multi-modal critic assesses whether an image and its generated sentence are a valid pair. During training, the critics and captioner act as adversaries -- the captioner aims to generate indistinguishable sentences, whereas the critics aim to distinguish them. The assessment improves the captioner through policy gradient updates. During inference, we further propose a novel critic-based planning method to select high-quality sentences without additional supervision (e.g., tags). To evaluate, we use MSCOCO as the source domain and four other datasets (CUB-200-2011, Oxford-102, TGIF, and Flickr30k) as the target domains. Our method consistently performs well on all datasets. In particular, on CUB-200-2011, we achieve a 21.8% CIDEr-D improvement after adaptation. Utilizing critics during inference further gives another 4.5% boost.\nThis paper describes a new evolutionary algorithm that is especially well suited to AI-Assisted Game Design. The approach adopted in this paper is to use observations of AI agents playing the game to estimate the game's quality. Some of the best agents for this purpose are General Video Game AI agents, since they can be deployed directly on a new game without game-specific tuning; these agents tend to be based on stochastic algorithms which give robust but noisy results and tend to be expensive to run. This motivates the main contribution of the paper: the development of the novel N-Tuple Bandit Evolutionary Algorithm, where a model is used to estimate the fitness of unsampled points and a bandit approach is used to balance exploration and exploitation of the search space. 
Initial results on optimising a Space Battle game variant suggest that the algorithm offers far more robust results than the Random Mutation Hill Climber and a Biased Mutation variant, which are themselves known to offer competitive performance across a range of problems. Subjective observations are also given by human players on the nature of the evolved games, which indicate a preference towards games generated by the N-Tuple algorithm.\nIn the era of big data, k-means clustering has been widely adopted as a basic processing tool in various contexts. However, its computational cost can be prohibitively high when the data size and the number of clusters are large. It is well known that the processing bottleneck of k-means lies in the operation of seeking the closest centroid in each iteration. In this paper, a novel solution to the scalability issue of k-means is presented. In the proposal, k-means is supported by an approximate k-nearest neighbors graph. In the k-means iteration, each data sample is only compared to the clusters in which its nearest neighbors reside. Since the number of nearest neighbors we consider is much less than k, the processing cost in this step becomes minor and independent of k. The processing bottleneck is therefore overcome. Notably, the k-nearest neighbor graph is itself constructed by iteratively calling the fast $k$-means. Compared with existing fast k-means variants, the proposed algorithm achieves a speed-up of hundreds to thousands of times while maintaining high clustering quality. When tested on 10 million 512-dimensional data points, it takes only 5.2 hours to produce 1 million clusters. In contrast, traditional k-means would take 3 years to perform clustering at the same scale.\nHuman-in-the-loop data analysis applications necessitate greater transparency in machine learning models for experts to understand and trust their decisions. 
To this end, we propose a visual analytics workflow to help data scientists and domain experts explore, diagnose, and understand the decisions made by a binary classifier. The approach leverages \"instance-level explanations\", measures of local feature relevance that explain single instances, and uses them to build a set of visual representations that guide the users in their investigation. The workflow is based on three main visual representations and steps: one based on aggregate statistics to see how data distributes across correct / incorrect decisions; one based on explanations to understand which features are used to make these decisions; and one based on raw data, to derive insights on potential root causes for the observed patterns. The workflow is derived from a long-term collaboration with a group of machine learning and healthcare professionals who used our method to make sense of machine learning models they developed. The case study from this collaboration demonstrates that the proposed workflow helps experts derive useful knowledge about the model and the phenomena it describes, thus experts can generate useful hypotheses on how a model can be improved.\nOnline reviews provided by consumers are a valuable asset for e-Commerce platforms, influencing potential consumers in making purchasing decisions. However, these reviews are of varying quality, with the useful ones buried deep within a heap of non-informative reviews. In this work, we attempt to automatically identify review quality in terms of its helpfulness to the end consumers. In contrast to previous works in this domain exploiting a variety of syntactic and community-level features, we delve deep into the semantics of reviews as to what makes them useful, providing interpretable explanation for the same. 
We identify a set of consistency and semantic factors, all from the text, ratings, and timestamps of user-generated reviews, making our approach generalizable across all communities and domains. We explore review semantics in terms of several latent factors like the expertise of its author, his judgment about the fine-grained facets of the underlying product, and his writing style. These are cast into a Hidden Markov Model -- Latent Dirichlet Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii) item facets, and (iii) review helpfulness. Large-scale experiments on five real-world datasets from Amazon show significant improvement over state-of-the-art baselines in predicting and ranking useful reviews.\nCurrent recommender systems exploit user and item similarities by collaborative filtering. Some advanced methods also consider the temporal evolution of item ratings as a global background process. However, all prior methods disregard the individual evolution of a user's experience level and how this is expressed in the user's writing in a review community. In this paper, we model the joint evolution of user experience, interest in specific item facets, writing style, and rating behavior. This way we can generate individual recommendations that take into account the user's maturity level (e.g., recommending art movies rather than blockbusters for a cinematography expert). As only item ratings and review texts are observables, we capture the user's experience and interests in a latent model learned from her reviews, vocabulary and writing style. We develop a generative HMM-LDA model to trace user evolution, where the Hidden Markov Model (HMM) traces her latent experience progressing over time -- with solely user reviews and ratings as observables over time. The facets of a user's interest are drawn from a Latent Dirichlet Allocation (LDA) model derived from her reviews, as a function of her (again latent) experience level. 
In experiments with five real-world datasets, we show that our model improves the rating prediction over state-of-the-art baselines, by a substantial margin. We also show, in a use-case study, that our model performs well in the assessment of user experience levels.\nMachine Learning (ML) has found it particularly useful in malware detection. However, as the malware evolves very fast, the stability of the feature extracted from malware serves as a critical issue in malware detection. Recent success of deep learning in image recognition, natural language processing, and machine translation indicate a potential solution for stabilizing the malware detection effectiveness. We present a coloR-inspired convolutional neuRal network-based AndroiD malware Detection (R2-D2), which can detect malware without extracting pre-selected features (e.g., the control-flow of op-code, classes, methods of functions and the timing they are invoked etc.) from Android apps. In particular, we develop a color representation for translating Android apps into RGB color code and transform them to a fixed-sized encoded image. After that, the encoded image is fed to convolutional neural network for automatic feature extraction and learning, reducing the expert's intervention. We have collected over 1 million malware samples and 1 million benign samples according to the data provided by Leopard Mobile Inc. from its core product Security Master (which has 623 million monthly active users and 10k new malware samples per day). It is shown that R2-D2 can effectively detect the malware. Furthermore, we keep our research results and release experiment material on http://R2D2.TWMAN.ORG if there is any update.\nExisting methods for arterial blood pressure (BP) estimation directly map the input physiological signals to output BP values without explicitly modeling the underlying temporal dependencies in BP dynamics. 
As a result, these models suffer from accuracy decay over a long time and thus require frequent calibration. In this work, we address this issue by formulating BP estimation as a sequence prediction problem in which both the input and target are temporal sequences. We propose a novel deep recurrent neural network (RNN) consisting of multilayered Long Short-Term Memory (LSTM) networks, which are incorporated with (1) a bidirectional structure to access larger-scale context information of input sequence, and (2) residual connections to allow gradients in deep RNN to propagate more effectively. The proposed deep RNN model was tested on a static BP dataset, and it achieved root mean square error (RMSE) of 3.90 and 2.66 mmHg for systolic BP (SBP) and diastolic BP (DBP) prediction respectively, surpassing the accuracy of traditional BP prediction models. On a multi-day BP dataset, the deep RNN achieved RMSE of 3.84, 5.25, 5.80 and 5.81 mmHg for the 1st day, 2nd day, 4th day and 6th month after the 1st day SBP prediction, and 1.80, 4.78, 5.0, 5.21 mmHg for corresponding DBP prediction, respectively, which outperforms all previous models with notable improvement. The experimental results suggest that modeling the temporal dependencies in BP dynamics significantly improves the long-term BP prediction accuracy.\nWe present the third generation of the constraint answer set system clingcon, combining Answer Set Programming (ASP) with finite domain constraint processing (CP). While its predecessors rely on a black-box approach to hybrid solving by integrating the CP solver gecode, the new clingcon system pursues a lazy approach using dedicated constraint propagators to extend propagation in the underlying ASP solver clasp. No extension is needed for parsing and grounding clingcon's hybrid modeling language since both can be accommodated by the new generic theory handling capabilities of the ASP grounder gringo. 
As a whole, clingcon 3 is thus an extension of the ASP system clingo 5, which itself relies on the grounder gringo and the solver clasp. The new approach of clingcon offers a seamless integration of CP propagation into ASP solving that benefits from the whole spectrum of clasp's reasoning modes, including for instance multi-shot solving and advanced optimization techniques. This is accomplished by a lazy approach that unfolds the representation of constraints and adds it to that of the logic program only when needed. Although the unfolding is usually dictated by the constraint propagators during solving, it can already be partially (or even totally) done during preprocessing. Moreover, clingcon's constraint preprocessing and propagation incorporate several well established CP techniques that greatly improve its performance. We demonstrate this via an extensive empirical evaluation contrasting, first, the various techniques in the context of CSP solving and, second, the new clingcon system with other hybrid ASP systems. Under consideration in Theory and Practice of Logic Programming (TPLP)\nWe propose an algorithm to separate simultaneously speaking persons from each other, the \"cocktail party problem\", using a single microphone. Our approach involves a deep recurrent neural networks regression to a vector space that is descriptive of independent speakers. Such a vector space can embed empirically determined speaker characteristics and is optimized by distinguishing between speaker masks. We call this technique source-contrastive estimation. The methodology is inspired by negative sampling, which has seen success in natural language processing, where an embedding is learned by correlating and de-correlating a given input vector with output weights. Although the matrix determined by the output weights is dependent on a set of known speakers, we only use the input vectors during inference. 
Doing so will ensure that source separation is explicitly speaker-independent. Our approach is similar to recent deep neural network clustering and permutation-invariant training research; we use weighted spectral features and masks to augment individual speaker frequencies while filtering out other speakers. We avoid, however, the severe computational burden of other approaches with our technique. Furthermore, by training a vector space rather than combinations of different speakers or differences thereof, we avoid the so-called permutation problem during training. Our algorithm offers an intuitive, computationally efficient response to the cocktail party problem, and most importantly boasts better empirical performance than other current techniques.\nWe consider an extension of the set covering problem (SCP) introducing (i)~multicover and (ii)~generalized upper bound (GUB)~constraints. For the conventional SCP, the pricing method has been introduced to reduce the size of instances, and several efficient heuristic algorithms based on such reduction techniques have been developed to solve large-scale instances. However, GUB constraints often make the pricing method less effective, because they often prevent solutions from containing highly evaluated variables together. To overcome this problem, we develop heuristic algorithms to reduce the size of instances, in which new evaluation schemes of variables are introduced taking account of GUB constraints. We also develop an efficient implementation of a 2-flip neighborhood local search algorithm that reduces the number of candidates in the neighborhood without sacrificing the solution quality. In order to guide the search to visit a wide variety of good solutions, we also introduce a path relinking method that generates new solutions by combining two or more solutions obtained so far. 
According to computational comparison on benchmark instances, the proposed method succeeds in selecting a small number of promising variables properly and performs quite effectively even for large-scale instances having hard GUB constraints.\nA broad range of on-line behaviors are mediated by interfaces in which people make choices among sets of options. A rich and growing line of work in the behavioral sciences indicate that human choices follow not only from the utility of alternatives, but also from the choice set in which alternatives are presented. In this work we study comparison-based choice functions, a simple but surprisingly rich class of functions capable of exhibiting so-called choice-set effects. Motivated by the challenge of predicting complex choices, we study the query complexity of these functions in a variety of settings. We consider settings that allow for active queries or passive observation of a stream of queries, and give analyses both at the granularity of individuals or populations that might exhibit heterogeneous choice behavior. Our main result is that any comparison-based choice function in one dimension can be inferred as efficiently as a basic maximum or minimum choice function across many query contexts, suggesting that choice-set effects need not entail any fundamental algorithmic barriers to inference. We also introduce a class of choice functions we call distance-comparison-based functions, and briefly discuss the analysis of such functions. The framework we outline provides intriguing connections between human choice behavior and a range of questions in the theory of sorting.\nThe content ranking problem in a social news website, is typically a function that maximizes a scalar metric of interest like dwell-time. 
However, like in most real-world applications we are interested in more than one metric---for instance simultaneously maximizing click-through rate, monetization metrics, dwell-time---and also satisfy the traffic requirements promised to different publishers. All this needs to be done on online data and under the settings where the objective function and the constraints can dynamically change; this could happen if for instance new publishers are added, some contracts are adjusted, or if some contracts are over.   In this paper, we formulate this problem as a constrained, dynamic, multi-objective optimization problem. We propose a novel framework that extends a successful genetic optimization algorithm, NSGA-II, to solve this online, data-driven problem. We design the modules of NSGA-II to suit our problem. We evaluate optimization performance using Hypervolume and introduce a confidence interval metric for assessing the practicality of a solution. We demonstrate the application of this framework on a real-world Article Ranking problem. We observe that we make considerable improvements in both time and performance over a brute-force baseline technique that is currently in production.\nThis paper proposes a design of hierarchical fuzzy inference tree (HFIT). An HFIT produces an optimum treelike structure, i.e., a natural hierarchical structure that accommodates simplicity by combining several low-dimensional fuzzy inference systems (FISs). Such a natural hierarchical structure provides a high degree of approximation accuracy. The construction of HFIT takes place in two phases. Firstly, a nondominated sorting based multiobjective genetic programming (MOGP) is applied to obtain a simple tree structure (a low complexity model) with a high accuracy. Secondly, the differential evolution algorithm is applied to optimize the obtained tree's parameters. 
In the derived tree, each node acquires a different input's combination, where the evolutionary process governs the input's combination. Hence, HFIT nodes are heterogeneous in nature, which leads to a high diversity among the rules generated by the HFIT. Additionally, the HFIT provides an automatic feature selection because it uses MOGP for the tree's structural optimization that accepts inputs only relevant to the knowledge contained in data. The HFIT was studied in the context of both type-1 and type-2 FISs, and its performance was evaluated through six application problems. Moreover, the proposed multiobjective HFIT was compared both theoretically and empirically with recently proposed FISs methods from the literature, such as McIT2FIS, TSCIT2FNN, SIT2FNN, RIT2FNS-WB, eT2FIS, MRIT2NFS, IT2FNN-SVR, etc. From the obtained results, it was found that the HFIT provided less complex and highly accurate models compared to the models produced by the most of other methods. Hence, the proposed HFIT is an efficient and competitive alternative to the other FISs for function approximation and feature selection.\nOutlier detection is the identification of points in a dataset that do not conform to the norm. Outlier detection is highly sensitive to the choice of the detection algorithm and the feature subspace used by the algorithm. Extracting domain-relevant insights from outliers needs systematic exploration of these choices since diverse outlier sets could lead to complementary insights. This challenge is especially acute in an interactive setting, where the choices must be explored in a time-constrained manner. In this work, we present REMIX, the first system to address the problem of outlier detection in an interactive setting. REMIX uses a novel mixed integer programming (MIP) formulation for automatically selecting and executing a diverse set of outlier detectors within a time limit. 
This formulation incorporates multiple aspects such as (i) an upper limit on the total execution time of detectors (ii) diversity in the space of algorithms and features, and (iii) meta-learning for evaluating the cost and utility of detectors. REMIX provides two distinct ways for the analyst to consume its results: (i) a partitioning of the detectors explored by REMIX into perspectives through low-rank non-negative matrix factorization; each perspective can be easily visualized as an intuitive heatmap of experiments versus outliers, and (ii) an ensembled set of outliers which combines outlier scores from all detectors. We demonstrate the benefits of REMIX through extensive empirical validation on real-world data.\nThe main idea of this paper is to represent shopping items through vectors because these vectors act as the base for building embeddings for customers and shopping carts. Also, these vectors are input to the mathematical models that act as either a recommendation engine or help in targeting potential customers. We have used exponential family embeddings as the tool to construct two basic vectors - product embeddings and context vectors. Using the basic vectors, we build combined embeddings, trip embeddings and customer embeddings. Combined embeddings mix linguistic properties of product names with their shopping patterns. The customer embeddings establish an understanding of the buying pattern of customers in a group and help in building customer profile. For example a customer profile can represent customers frequently buying pet-food. Identifying such profiles can help us bring out offers and discounts. Similarly, trip embeddings are used to build trip profiles. People happen to buy similar set of products in a trip and hence their trip embeddings can be used to predict the next product they would like to buy. 
This is a novel technique and the first of its kind to make recommendation using product, trip and customer embeddings.\nThe concept of ensemble learning offers a promising avenue in learning from data streams under complex environments because it addresses the bias and variance dilemma better than its single model counterpart and features a reconfigurable structure, which is well suited to the given context. While various extensions of ensemble learning for mining non-stationary data streams can be found in the literature, most of them are crafted under a static base classifier and revisits preceding samples in the sliding window for a retraining step. This feature causes computationally prohibitive complexity and is not flexible enough to cope with rapidly changing environments. Their complexities are often demanding because it involves a large collection of offline classifiers due to the absence of structural complexities reduction mechanisms and lack of an online feature selection mechanism. A novel evolving ensemble classifier, namely Parsimonious Ensemble pENsemble, is proposed in this paper. pENsemble differs from existing architectures in the fact that it is built upon an evolving classifier from data streams, termed Parsimonious Classifier pClass. pENsemble is equipped by an ensemble pruning mechanism, which estimates a localized generalization error of a base classifier. A dynamic online feature selection scenario is integrated into the pENsemble. This method allows for dynamic selection and deselection of input features on the fly. pENsemble adopts a dynamic ensemble structure to output a final classification decision where it features a novel drift detection scenario to grow the ensemble structure. 
The efficacy of the pENsemble has been numerically demonstrated through rigorous numerical studies with dynamic and evolving data streams where it delivers the most encouraging performance in attaining a tradeoff between accuracy and complexity.\nBy utilizing different communication channels, such as verbal language, gestures or facial expressions, virtually embodied interactive humans hold a unique potential to bridge the gap between human-computer interaction and actual interhuman communication. The use of virtual humans is consequently becoming increasingly popular in a wide range of areas where such a natural communication might be beneficial, including entertainment, education, mental health research and beyond. Behind this development lies a series of technological advances in a multitude of disciplines, most notably natural language processing, computer vision, and speech synthesis. In this paper we discuss a Virtual Human Journalist, a project employing a number of novel solutions from these disciplines with the goal to demonstrate their viability by producing a humanoid conversational agent capable of naturally eliciting and reacting to information from a human user. A set of qualitative and quantitative evaluation sessions demonstrated the technical feasibility of the system whilst uncovering a number of deficits in its capacity to engage users in a way that would be perceived as natural and emotionally engaging. We argue that naturalness should not always be seen as a desirable goal and suggest that deliberately suppressing the naturalness of virtual human interactions, such as by altering its personality cues, might in some cases yield more desirable results.\nInfrared (IR) imaging has the potential to enable more robust action recognition systems compared to visible spectrum cameras due to lower sensitivity to lighting conditions and appearance variability. 
While the action recognition task on videos collected from visible spectrum imaging has received much attention, action recognition in IR videos is significantly less explored. Our objective is to exploit imaging data in this modality for the action recognition task. In this work, we propose a novel two-stream 3D convolutional neural network (CNN) architecture by introducing the discriminative code layer and the corresponding discriminative code loss function. The proposed network processes IR image and the IR-based optical flow field sequences. We pretrain the 3D CNN model on the visible spectrum Sports-1M action dataset and finetune it on the Infrared Action Recognition (InfAR) dataset. To our best knowledge, this is the first application of the 3D CNN to action recognition in the IR domain. We conduct an elaborate analysis of different fusion schemes (weighted average, single and double-layer neural nets) applied to different 3D CNN outputs. Experimental results demonstrate that our approach can achieve state-of-the-art average precision (AP) performances on the InfAR dataset: (1) the proposed two-stream 3D CNN achieves the best reported 77.5% AP, and (2) our 3D CNN model applied to the optical flow fields achieves the best reported single stream 75.42% AP.\nThe problem of sparse rewards is one of the hardest challenges in contemporary reinforcement learning. Hierarchical reinforcement learning (HRL) tackles this problem by using a set of temporally-extended actions, or options, each of which has its own subgoal. These subgoals are normally handcrafted for specific tasks. Here, though, we introduce a generic class of subgoals with broad applicability in the visual domain. Underlying our approach (in common with work using \"auxiliary tasks\") is the hypothesis that the ability to control aspects of the environment is an inherently useful skill to have. 
We incorporate such subgoals in an end-to-end hierarchical reinforcement learning system and test two variants of our algorithm on a number of games from the Atari suite. We highlight the advantage of our approach in one of the hardest games -- Montezuma's revenge -- for which the ability to handle sparse rewards is key. Our agent learns several times faster than the current state-of-the-art HRL agent in this game, reaching a similar level of performance. UPDATE 22/11/17: We found that a standard A3C agent with a simple shaped reward, i.e. extrinsic reward + feature control intrinsic reward, has comparable performance to our agent in Montezuma Revenge. In light of the new experiments performed, the advantage of our HRL approach can be attributed more to its ability to learn useful features from intrinsic rewards rather than its ability to explore and reuse abstracted skills with hierarchical components. This has led us to a new conclusion about the result.\nVisual question answering is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to traditional computer vision tasks, such as object recognition and image classification, visual question answering raises a different need for textual representation as compared to other natural language processing tasks. In this work, we perform a detailed analysis on natural language questions in visual question answering. Based on the analysis, we propose to rely on convolutional neural networks for learning textual representations. By exploring the various properties of convolutional neural networks specialized for text data, such as width and depth, we present our \"CNN Inception + Gate\" model. 
We show that our model improves question representations and thus the overall accuracy of visual question answering models. We also show that the text representation requirement in visual question answering is more complicated and comprehensive than that in conventional natural language processing tasks, making it a better task to evaluate textual representation methods. Shallow models like fastText, which can obtain comparable results with deep learning models in tasks like text classification, are not suitable in visual question answering.\nGenerality is one of the main advantages of heuristic algorithms, as such, multiple parameters are exposed to the user with the objective of allowing them to shape the algorithms to their specific needs. Parameter selection, therefore, becomes an intrinsic problem of every heuristic algorithm. Selecting good parameter values relies not only on knowledge related to the problem at hand, but to the algorithms themselves. This research explores the usage of self-organized criticality to reduce user interaction in the process of selecting suitable parameters for particle swarm optimization (PSO) heuristics. A particle swarm variant (named Adaptive PSO) with self-organized criticality is developed and benchmarked against the standard PSO. Criticality is observed in the dynamic behaviour of this swarm and excellent results are observed in the long run. In contrast with the standard PSO, the Adaptive PSO does not stagnate at any point in time, balancing the concepts of exploration and exploitation better. A software platform for experimenting with particle swarms, called PSO Laboratory, is also developed. This software is used to test the standard PSO as well as all other PSO variants developed in the process of creating the Adaptive PSO. As the software is intended to be of aid to future and related research, special attention has been put in the development of a friendly graphical user interface. 
Particle swarms are executed in real time, allowing users to experiment by changing parameters on-the-fly.\nA goal of cloud service management is to design self-adaptable auto-scaler to react to workload fluctuations and changing the resources assigned. The key problem is how and when to add/remove resources in order to meet agreed service-level agreements. Reducing application cost and guaranteeing service-level agreements (SLAs) are two critical factors of dynamic controller design. In this paper, we compare two dynamic learning strategies based on a fuzzy logic system, which learns and modifies fuzzy scaling rules at runtime. A self-adaptive fuzzy logic controller is combined with two reinforcement learning (RL) approaches: (i) Fuzzy SARSA learning (FSL) and (ii) Fuzzy Q-learning (FQL). As an off-policy approach, Q-learning learns independent of the policy currently followed, whereas SARSA as an on-policy always incorporates the actual agent's behavior and leads to faster learning. Both approaches are implemented and compared in their advantages and disadvantages, here in the OpenStack cloud platform. We demonstrate that both auto-scaling approaches can handle various load traffic situations, sudden and periodic, and delivering resources on demand while reducing operating costs and preventing SLA violations. The experimental results demonstrate that FSL and FQL have acceptable performance in terms of adjusted number of virtual machine targeted to optimize SLA compliance and response time.\nThe Particle Swarm Optimization Policy (PSO-P) has been recently introduced and proven to produce remarkable results on interacting with academic reinforcement learning benchmarks in an off-policy, batch-based setting. 
To further investigate the properties and feasibility on real-world applications, this paper investigates PSO-P on the so-called Industrial Benchmark (IB), a novel reinforcement learning (RL) benchmark that aims at being realistic by including a variety of aspects found in industrial applications, like continuous state and action spaces, a high dimensional, partially observable state space, delayed effects, and complex stochasticity. The experimental results of PSO-P on IB are compared to results of closed-form control policies derived from the model-based Recurrent Control Neural Network (RCNN) and the model-free Neural Fitted Q-Iteration (NFQ). Experiments show that PSO-P is not only of interest for academic benchmarks, but also for real-world industrial applications, since it also yielded the best performing policy in our IB setting. Compared to other well established RL techniques, PSO-P produced outstanding results in performance and robustness, requiring only a relatively low amount of effort in finding adequate parameters or making complex design decisions.\nDeep Reinforcement Learning (DRL) methods have performed well in an increasing numbering of high-dimensional visual decision making domains. Among all such visual decision making problems, those with discrete action spaces often tend to have underlying compositional structure in the said action space. Such action spaces often contain actions such as go left, go up as well as go diagonally up and left (which is a composition of the former two actions). The representations of control policies in such domains have traditionally been modeled without exploiting this inherent compositional structure in the action spaces. We propose a new learning paradigm, Factored Action space Representations (FAR) wherein we decompose a control policy learned using a Deep Reinforcement Learning Algorithm into independent components, analogous to decomposing a vector in terms of some orthogonal basis vectors. 
This architectural modification of the control policy representation allows the agent to learn about multiple actions simultaneously, while executing only one of them. We demonstrate that FAR yields considerable improvements on top of two DRL algorithms in Atari 2600: FARA3C outperforms A3C (Asynchronous Advantage Actor Critic) in 9 out of 14 tasks and FARAQL outperforms AQL (Asynchronous n-step Q-Learning) in 9 out of 13 tasks.\nReinforcement Learning (RL) can model complex behavior policies for goal-directed sequential decision making tasks. A hallmark of RL algorithms is Temporal Difference (TD) learning: value function for the current state is moved towards a bootstrapped target that is estimated using next state's value function. $\\lambda$-returns generalize beyond 1-step returns and strike a balance between Monte Carlo and TD learning methods. While lambda-returns have been extensively studied in RL, they haven't been explored a lot in Deep RL. This paper's first contribution is an exhaustive benchmarking of lambda-returns. Although mathematically tractable, the use of exponentially decaying weighting of n-step returns based targets in lambda-returns is a rather ad-hoc design choice. Our second major contribution is that we propose a generalization of lambda-returns called Confidence-based Autodidactic Returns (CAR), wherein the RL agent learns the weighting of the n-step returns in an end-to-end manner. This allows the agent to learn to decide how much it wants to weigh the n-step returns based targets. In contrast, lambda-returns restrict RL agents to use an exponentially decaying weighting scheme. Autodidactic returns can be used for improving any RL algorithm which uses TD learning. We empirically demonstrate that using sophisticated weighted mixtures of multi-step returns (like CAR and lambda-returns) considerably outperforms the use of n-step returns. 
We perform our experiments on the Asynchronous Advantage Actor Critic (A3C) algorithm in the Atari 2600 domain.\nDeep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the other hand, are more stable and require less hyper parameter tuning. Yet, substantial feature engineering is necessary to achieve good results. In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method. We do this by periodically re-training the last hidden layer of a DRL network with a batch least squares update. Key to our approach is a Bayesian regularization term for the least squares update, which prevents over-fitting to the more recent data. We tested LS-DQN on five Atari games and demonstrate significant improvement over vanilla DQN and Double-DQN. We also investigated the reasons for the superior performance of our method. Interestingly, we found that the performance improvement can be attributed to the large batch size used by the LS method when optimizing the last layer.\nThis paper examines two-sided matching with budget constraints where one side (a firm or hospital) can make monetary transfers (offer wages) to the other (a worker or doctor). In a standard model, while multiple doctors can be matched to a single hospital, a hospital has a {\\em maximum quota}, thus, the number of doctors assigned to that hospital cannot exceed a certain limit. 
In our model, in contrast, a hospital instead has a {\\em fixed budget}, that is, the total amount of wages allocated by each hospital to the doctors is constrained. With budget constraints, stable matchings may fail to exist and checking for the existence is hard. To deal with the nonexistence of stable matchings, we extend the \"matching with contracts\" model of Hatfield and Milgrom, so that it deals with \\textit{near-feasible} matchings that exceed each hospital budget by a certain amount. We then propose two novel mechanisms that efficiently return such a near-feasible matching that is stable with respect to the actual amount of wages allocated by each hospital. Specifically, by sacrificing strategy-proofness, our second mechanism achieves the best possible bound of budget excess.\nSequential decision making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. Subsequently, tree search is improved by using the neural network policy to guide search, increasing the strength of new plans. In contrast, standard deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that ExIt outperforms REINFORCE for training a neural network to play the board game Hex, and our final tree search agent, trained tabula rasa, defeats MoHex 1.0, the most recent Olympiad Champion player to be publicly released.\nGenerative moment matching network (GMMN) is a deep generative model that differs from Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a two-sample test based on kernel maximum mean discrepancy (MMD). 
Although some theoretical guarantees of MMD have been studied, the empirical performance of GMMN is still not as competitive as that of GAN on challenging and large benchmark datasets. The computational efficiency of GMMN is also less desirable in comparison with GAN, partially due to its requirement for a rather large batch size during the training. In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques, as the replacement of a fixed Gaussian kernel in the original GMMN. The new approach combines the key ideas in both GMMN and GAN, hence we name it MMD GAN. The new distance measure in MMD GAN is a meaningful loss that enjoys the advantage of weak topology and can be optimized via gradient descent with relatively small batch sizes. In our evaluation on multiple benchmark datasets, including MNIST, CIFAR-10, CelebA and LSUN, the performance of MMD-GAN significantly outperforms GMMN, and is competitive with other representative GAN works.\nThe knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g.: \"a ball is used by a football player\", \"a tennis player is located at a tennis court\". Current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. Instead, they learn recognition models directly from training examples. In this paper, we study how general-purpose ontologies---specifically, MIT's ConceptNet ontology---can improve the performance of state-of-the-art vision systems. As a testbed, we tackle the problem of sentence-based image retrieval. Our retrieval approach incorporates knowledge from ConceptNet on top of a large pool of object detectors derived from a deep learning technique. 
In our experiments, we show that ConceptNet can improve performance on a common benchmark dataset. Key to our performance is the use of the ESPGAME dataset to select visually relevant relations from ConceptNet. Consequently, a main conclusion of this work is that general-purpose commonsense ontologies improve performance on visual reasoning tasks when properly filtered to select meaningful visual relations.\nTo run quantum algorithms on emerging gate-model quantum hardware, quantum circuits must be compiled to take into account constraints on the hardware. For near-term hardware, with only limited means to mitigate decoherence, it is critical to minimize the duration of the circuit. We investigate the application of temporal planners to the problem of compiling quantum circuits to newly emerging quantum hardware. While our approach is general, we focus on compiling to superconducting hardware architectures with nearest neighbor constraints. Our initial experiments focus on compiling Quantum Alternating Operator Ansatz (QAOA) circuits whose high number of commuting gates allow great flexibility in the order in which the gates can be applied. That freedom makes it more challenging to find optimal compilations but also means there is a greater potential win from more optimized compilation than for less flexible circuits. We map this quantum circuit compilation problem to a temporal planning problem, and generated a test suite of compilation problems for QAOA circuits of various sizes to a realistic hardware architecture. We report compilation results from several state-of-the-art temporal planners on this test set. This early empirical evaluation demonstrates that temporal planning is a viable approach to quantum circuit compilation.\nSemantic Image Interpretation (SII) is the task of extracting structured semantic descriptions from images. It is widely agreed that the combined use of visual data and background knowledge is of great importance for SII. 
Recently, Statistical Relational Learning (SRL) approaches have been developed for reasoning under uncertainty and learning in the presence of data and rich knowledge. Logic Tensor Networks (LTNs) are an SRL framework which integrates neural networks with first-order fuzzy logic to allow (i) efficient learning from noisy data in the presence of logical constraints, and (ii) reasoning with logical formulas describing general properties of the data. In this paper, we develop and apply LTNs to two of the main tasks of SII, namely, the classification of an image's bounding boxes and the detection of the relevant part-of relations between objects. To the best of our knowledge, this is the first successful application of SRL to such SII tasks. The proposed approach is evaluated on a standard image processing benchmark. Experiments show that the use of background knowledge in the form of logical constraints can improve the performance of purely data-driven approaches, including the state-of-the-art Fast Region-based Convolutional Neural Networks (Fast R-CNN). Moreover, we show that the use of logical background knowledge adds robustness to the learning system when errors are present in the labels of the training data.\nA kidney exchange is a centrally-administered barter market where patients swap their willing yet incompatible donors. Modern kidney exchanges use 2-cycles, 3-cycles, and chains initiated by non-directed donors (altruists who are willing to give a kidney to anyone) as the means for swapping.   We propose significant generalizations to kidney exchange. We allow more than one donor to donate in exchange for their desired patient receiving a kidney. We also allow for the possibility of a donor willing to donate if any of a number of patients receive kidneys. Furthermore, we combine these notions and generalize them. 
The generalization is to exchange among organ clubs, where a club is willing to donate organs outside the club if and only if the club receives organs from outside the club according to given specifications. We prove that unlike in the standard model, the uncapped clearing problem is NP-complete.   We also present the notion of operation frames that can be used to sequence the operations across batches, and present integer programming formulations for the market clearing problems for these new types of organ exchanges.   Experiments show that in the single-donation setting, operation frames improve planning by 34%--51%. Allowing up to two donors to donate in exchange for one kidney donated to their designated patient yields a further increase in social welfare.\nProcess mining is a research field focused on the analysis of event data with the aim of extracting insights related to dynamic behavior. Applying process mining techniques on data from smart home environments has the potential to provide valuable insights in (un)healthy habits and to contribute to ambient assisted living solutions. Finding the right event labels to enable the application of process mining techniques is however far from trivial, as simply using the triggering sensor as the label for sensor events results in uninformative models that allow for too much behavior (overgeneralizing). Refinements of sensor level event labels suggested by domain experts have been shown to enable discovery of more precise and insightful process models. However, there exists no automated approach to generate refinements of event labels in the context of process mining. In this paper we propose a framework for the automated generation of label refinements based on the time attribute of events, allowing us to distinguish behaviourally different instances of the same event type based on their time attribute. 
We show on a case study with real life smart home event data that using automatically generated refined labels in process discovery, we can find more specific, and therefore more insightful, process models. We observe that one label refinement could have an effect on the usefulness of other label refinements when used together. Therefore, we explore four strategies to generate useful combinations of multiple label refinements and evaluate those on three real life smart home event logs.\nWhile domain adaptation has been actively researched in recent years, most theoretical results and algorithms focus on the single-source-single-target adaptation setting. Naive application of such algorithms on multiple source domain adaptation problem may lead to suboptimal solutions. As a step toward bridging the gap, we propose a new generalization bound for domain adaptation when there are multiple source domains with labeled instances and one target domain with unlabeled instances. Compared with existing bounds, the new bound does not require expert knowledge about the target distribution, nor the optimal combination rule for multisource domains. Interestingly, our theory also leads to an efficient learning strategy using adversarial neural networks: we show how to interpret it as learning feature representations that are invariant to the multiple domain shifts while still being discriminative for the learning task. To this end, we propose two models, both of which we call multisource domain adversarial networks (MDANs): the first model optimizes directly our bound, while the second model is a smoothed approximation of the first one, leading to a more data-efficient and task-adaptive model. The optimization tasks of both models are minimax saddle point problems that can be optimized by adversarial training. 
To demonstrate the effectiveness of MDANs, we conduct extensive experiments showing superior adaptation performance on three real-world datasets: sentiment analysis, digit classification, and vehicle counting.\nNew types of machine learning hardware in development and entering the market hold the promise of revolutionizing deep learning in a manner as profound as GPUs. However, existing software frameworks and training algorithms for deep learning have yet to evolve to fully leverage the capability of the new wave of silicon. We already see the limitations of existing algorithms for models that exploit structured input via complex and instance-dependent control flow, which prohibits minibatching. We present an asynchronous model-parallel (AMP) training algorithm that is specifically motivated by training on networks of interconnected devices. Through an implementation on multi-core CPUs, we show that AMP training converges to the same accuracy as conventional synchronous training algorithms in a similar number of epochs, but utilizes the available hardware more efficiently even for small minibatch sizes, resulting in significantly shorter overall training times. Our framework opens the door for scaling up a new class of deep learning models that cannot be efficiently trained today.\nWe introduce a new flexible paradigm of grounding and solving in Answer Set Programming (ASP), which we refer to as multi-shot ASP solving, and present its implementation in the ASP system clingo.   Multi-shot ASP solving features grounding and solving processes that deal with continuously changing logic programs. In doing so, they remain operative and accommodate changes in a seamless way. For instance, such processes allow for advanced forms of search, as in optimization or theory solving, or interaction with an environment, as in robotics or query-answering. 
Common to them is that the problem specification evolves during the reasoning process, either because data or constraints are added, deleted, or replaced. This evolutionary aspect adds another dimension to ASP since it brings about state changing operations. We address this issue by providing an operational semantics that characterizes grounding and solving processes in multi-shot ASP solving. This characterization provides a semantic account of grounder and solver states along with the operations manipulating them.   The operative nature of multi-shot solving avoids redundancies in relaunching grounder and solver programs and benefits from the solver's learning capacities. clingo accomplishes this by complementing ASP's declarative input language with control capacities. On the declarative side, a new directive allows for structuring logic programs into named and parameterizable subprograms. The grounding and integration of these subprograms into the solving process is completely modular and fully controllable from the procedural side. To this end, clingo offers a new application programming interface that is conveniently accessible via scripting languages.\nBandit based optimisation has a remarkable advantage over gradient based approaches due to their global perspective, which eliminates the danger of getting stuck at local optima. However, for continuous optimisation problems or problems with a large number of actions, bandit based approaches can be hindered by slow learning. Gradient based approaches, on the other hand, navigate quickly in high-dimensional continuous spaces through local optimisation, following the gradient in fine grained steps. Yet, apart from being susceptible to local optima, these schemes are less suited for online learning due to their reliance on extensive trial-and-error before the optimum can be identified. 
In this paper, we propose a Bayesian approach that unifies the above two paradigms in one single framework, with the aim of combining their advantages. At the heart of our approach we find a stochastic linear approximation of the function to be optimised, where both the gradient and values of the function are explicitly captured. This allows us to learn from both noisy function and gradient observations, and predict these properties across the action space to support optimisation. We further propose an accompanying bandit driven exploration scheme that uses Bayesian credible bounds to trade off exploration against exploitation. Our empirical results demonstrate that by unifying bandit and gradient based learning, one obtains consistently improved performance across a wide spectrum of problem environments. Furthermore, even when gradient feedback is unavailable, the flexibility of our model, including gradient prediction, still allows us to outperform competing approaches, although with a smaller margin. Due to the pervasiveness of bandit based optimisation, our scheme opens up for improved performance both in meta-optimisation and in applications where gradient related information is readily available.\nThis paper proposes a new approach to a novel value network architecture for the game Go, called a multi-labelled (ML) value network. In the ML value network, different values (win rates) are trained simultaneously for different settings of komi, a compensation given to balance the initiative of playing first. The ML value network has three advantages, (a) it outputs values for different komi, (b) it supports dynamic komi, and (c) it lowers the mean squared error (MSE). This paper also proposes a new dynamic komi method to improve game-playing strength. This paper also performs experiments to demonstrate the merits of the architecture. First, the MSE of the ML value network is generally lower than the value network alone. 
Second, the program based on the ML value network wins by a rate of 67.6% against the program based on the value network alone. Third, the program with the proposed dynamic komi method significantly improves the playing strength over the baseline that does not use dynamic komi, especially for handicap games. To our knowledge, up to date, no handicap games have been played openly by programs using value networks. This paper provides these programs with a useful approach to playing handicap games.\nWe present a new system S for handling uncertainty in a quantified modal logic (first-order modal logic). The system is based on both probability theory and proof theory. The system is derived from Chisholm's epistemology. We concretize Chisholm's system by grounding his undefined and primitive (i.e. foundational) concept of reasonableness in probability and proof theory. S can be useful in systems that have to interact with humans and provide justifications for their uncertainty. As a demonstration of the system, we apply the system to provide a solution to the lottery paradox. Another advantage of the system is that it can be used to provide uncertainty values for counterfactual statements. Counterfactuals are statements that an agent knows for sure are false. Among other cases, counterfactuals are useful when systems have to explain their actions to users. Uncertainties for counterfactuals fall out naturally from our system.   Efficient reasoning in just simple first-order logic is a hard problem. Resolution-based first-order reasoning systems have made significant progress over the last several decades in building systems that have solved non-trivial tasks (even unsolved conjectures in mathematics). We present a sketch of a novel algorithm for reasoning that extends first-order resolution.   
Finally, while there have been many systems of uncertainty for propositional logics, first-order logics and propositional modal logics, there has been very little work in building systems of uncertainty for first-order modal logics. The work described below is in progress; and once finished will address this lack.\nThe multi-agent path-finding (MAPF) problem has recently received a lot of attention. However, it does not capture important characteristics of many real-world domains, such as automated warehouses, where agents are constantly engaged with new tasks. In this paper, we therefore study a lifelong version of the MAPF problem, called the multi-agent pickup and delivery (MAPD) problem. In the MAPD problem, agents have to attend to a stream of delivery tasks in an online setting. One agent has to be assigned to each delivery task. This agent has to first move to a given pickup location and then to a given delivery location while avoiding collisions with other agents. We present two decoupled MAPD algorithms, Token Passing (TP) and Token Passing with Task Swaps (TPTS). Theoretically, we show that they solve all well-formed MAPD instances, a realistic subclass of MAPD instances. Experimentally, we compare them against a centralized strawman MAPD algorithm without this guarantee in a simulated warehouse system. TP can easily be extended to a fully distributed MAPD algorithm and is the best choice when real-time computation is of primary concern since it remains efficient for MAPD instances with hundreds of agents and tasks. TPTS requires limited communication among agents and balances well between TP and the centralized MAPD algorithm.\nWe consider the online one-class collaborative filtering (CF) problem that consists of recommending items to users over time in an online fashion based on positive ratings only. This problem arises when users respond only occasionally to a recommendation with a positive rating, and never with a negative one. 
We study the impact of the probability of a user responding to a recommendation, p_f, on the sample complexity, i.e., the number of ratings required to make `good' recommendations, and ask whether receiving positive and negative ratings, instead of positive ratings only, improves the sample complexity. Both questions arise in the design of recommender systems. We introduce a simple probabilistic user model, and analyze the performance of an online user-based CF algorithm. We prove that after an initial cold start phase, where recommendations are invested in exploring the user's preferences, this algorithm makes---up to a fraction of the recommendations required for updating the user's preferences---perfect recommendations. The number of ratings required for the cold start phase is nearly proportional to 1/p_f, and that for updating the user's preferences is essentially independent of p_f. As a consequence we find that, receiving positive and negative ratings instead of only positive ones improves the number of ratings required for initial exploration by a factor of 1/p_f, which can be significant.\nDNN-based cross-modal retrieval is a research hotspot to retrieve across different modalities as image and text, but existing methods often face the challenge of insufficient cross-modal training data. In single-modal scenario, similar problem is usually relieved by transferring knowledge from large-scale auxiliary datasets (as ImageNet). Knowledge from such single-modal datasets is also very useful for cross-modal retrieval, which can provide rich general semantic information that can be shared across different modalities. However, it is challenging to transfer useful knowledge from single-modal (as image) source domain to cross-modal (as image/text) target domain. 
Knowledge in source domain cannot be directly transferred to both two different modalities in target domain, and the inherent cross-modal correlation contained in target domain provides key hints for cross-modal retrieval which should be preserved during transfer process. This paper proposes Cross-modal Hybrid Transfer Network (CHTN) with two subnetworks: Modal-sharing transfer subnetwork utilizes the modality in both source and target domains as a bridge, for transferring knowledge to both two modalities simultaneously; Layer-sharing correlation subnetwork preserves the inherent cross-modal semantic correlation to further adapt to cross-modal retrieval task. Cross-modal data can be converted to common representation by CHTN for retrieval, and comprehensive experiment on 3 datasets shows its effectiveness.\nOff-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical results show that off-policy updates with a value function estimator can be interpolated with on-policy policy gradient updates whilst still satisfying performance bounds. Our analysis uses control variate methods to produce a family of policy gradient algorithms, with several recently proposed algorithms being special cases of this family. We then provide an empirical comparison of these techniques with the remaining algorithmic details fixed, and show how different mixing of off-policy gradient estimates with on-policy samples contribute to improvements in empirical performance. 
The final algorithm provides a generalization and unification of existing deep policy gradient techniques, has theoretical guarantees on the bias introduced by off-policy updates, and improves on the state-of-the-art model-free deep RL methods on a number of OpenAI Gym continuous control benchmarks.\nVariational autoencoders (VAEs) learn representations of data by jointly training a probabilistic encoder and decoder network. Typically these models encode all features of the data into a single variable. Here we are interested in learning disentangled representations that encode distinct aspects of the data into separate variables. We propose to learn such representations using model architectures that generalise from standard VAEs, employing a general graphical model structure in the encoder and decoder. This allows us to train partially-specified models that make relatively strong assumptions about a subset of interpretable variables and rely on the flexibility of neural networks to learn representations for the remaining variables. We further define a general objective for semi-supervised learning in this model class, which can be approximated using an importance sampling procedure. We evaluate our framework's ability to learn disentangled representations, both by qualitative exploration of its generative capacity, and quantitative evaluation of its discriminative ability on a variety of models and datasets.\nWe give a simple, fast algorithm for hyperparameter optimization inspired by techniques from the analysis of Boolean functions. We focus on the high-dimensional regime where the canonical example is training a neural network with a large number of hyperparameters. The algorithm --- an iterative application of compressed sensing techniques for orthogonal polynomials --- requires only uniform sampling of the hyperparameters and is thus easily parallelizable.   
Experiments for training deep neural networks on Cifar-10 show that compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our algorithm finds significantly improved solutions, in some cases better than what is attainable by hand-tuning. In terms of overall running time (i.e., time required to sample various settings of hyperparameters plus additional computation time), we are at least an order of magnitude faster than Hyperband and Bayesian Optimization. We also outperform Random Search 8x.   Additionally, our method comes with provable guarantees and yields the first improvements on the sample complexity of learning decision trees in over two decades. In particular, we obtain the first quasi-polynomial time algorithm for learning noisy decision trees with polynomial sample complexity.\nA novel skill learning approach is proposed that allows a robot to acquire human-like visuospatial skills for object manipulation tasks. Visuospatial skills are attained by observing spatial relationships among objects through demonstrations. The proposed Visuospatial Skill Learning (VSL) is a goal-based approach that focuses on achieving a desired goal configuration of objects relative to one another while maintaining the sequence of operations. VSL is capable of learning and generalizing multi-operation skills from a single demonstration, while requiring minimum prior knowledge about the objects and the environment. In contrast to many existing approaches, VSL offers simplicity, efficiency and user-friendly human-robot interaction. We also show that VSL can be easily extended towards 3D object manipulation tasks, simply by employing point cloud processing techniques. In addition, a robot learning framework, VSL-SP, is proposed by integrating VSL, Imitation Learning, and a conventional planning method. In VSL-SP, the sequence of performed actions are learned using VSL, while the sensorimotor skills are learned using a conventional trajectory-based learning approach. 
Such integration easily extends robot capabilities to novel situations, even by users without programming ability. In VSL-SP the internal planner of VSL is integrated with an existing action-level symbolic planner. Using the underlying constraints of the task and extracted symbolic predicates, identified by VSL, symbolic representation of the task is updated. Therefore the planner maintains a generalized representation of each skill as a reusable action, which can be used in planning and performed independently during the learning phase. The proposed approach is validated through several real-world experiments.\nAutomated story generation is the problem of automatically selecting a sequence of events, actions, or words that can be told as a story. We seek to develop a system that can generate stories by learning everything it needs to know from textual story corpora. To date, recurrent neural networks that learn language models at character, word, or sentence levels have had little success generating coherent stories. We explore the question of event representations that provide a mid-level of abstraction between words and sentences in order to retain the semantic information of the original data while minimizing event sparsity. We present a technique for preprocessing textual story data into event sequences. We then present a technique for automated story generation whereby we decompose the problem into the generation of successive events (event2event) and the generation of natural language sentences from events (event2sentence). 
We give empirical results comparing different event representations and their effects on event successor generation and the translation of events to natural language.\nUsing established principles from Information Theory and Statistics, we show that in a deep neural network invariance to nuisance factors is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network towards learning invariant representations. We then show that, in order to avoid memorization, we need to limit the quantity of information stored in the weights, which leads to a novel usage of the Information Bottleneck Lagrangian on the weights as a learning criterion. This also has an alternative interpretation as minimizing a PAC-Bayesian bound on the test error. Finally, we exploit a duality between weights and activations induced by the architecture, to show that the information in the weights bounds the minimality and Total Correlation of the layers, therefore showing that regularizing the weights explicitly or implicitly, using SGD, not only helps avoid overfitting, but also fosters invariance and disentangling of the learned representation. The theory also enables predicting sharp phase transitions between underfitting and overfitting random labels at precise information values, and sheds light on the relation between the geometry of the loss function, in particular so-called \"flat minima,\" and generalization.\nThis study investigates how adequate coordination among the different cognitive processes of a humanoid robot can be developed through end-to-end learning of direct perception of visuomotor stream. We propose a deep dynamic neural network model built on a dynamic vision network, a motor generation network, and a higher-level network. 
The proposed model was designed to process and to integrate direct perception of dynamic visuomotor patterns in a hierarchical model characterized by different spatial and temporal constraints imposed on each level. We conducted synthetic robotic experiments in which a robot learned to read a human's intention through observing the gestures and then to generate the corresponding goal-directed actions. Results verify that the proposed model is able to learn the tutored skills and to generalize them to novel situations. The model showed synergic coordination of perception, action and decision making, and it integrated and coordinated a set of cognitive skills including visual perception, intention reading, attention switching, working memory, action preparation and execution in a seamless manner. Analysis reveals that coherent internal representations emerged at each level of the hierarchy. Higher-level representation reflecting actional intention developed by means of continuous integration of the lower-level visuo-proprioceptive stream.\nThis study presents a dynamic neural network model based on the predictive coding framework for perceiving and predicting the dynamic visuo-proprioceptive patterns. In our previous study [1], we have shown that the deep dynamic neural network model was able to coordinate visual perception and action generation in a seamless manner. In the current study, we extended the previous model under the predictive coding framework to endow the model with a capability of perceiving and predicting dynamic visuo-proprioceptive patterns as well as a capability of inferring intention behind the perceived visuomotor information through minimizing prediction error. A set of synthetic experiments were conducted in which a robot learned to imitate the gestures of another robot in a simulation environment. 
The experimental results showed that with given intention states, the model was able to mentally simulate the possible incoming dynamic visuo-proprioceptive patterns in a top-down process without the inputs from the external environment. Moreover, the results highlighted the role of minimizing prediction error in inferring underlying intention of the perceived visuo-proprioceptive patterns, supporting the predictive coding account of the mirror neuron systems. The results also revealed that minimizing prediction error in one modality induced the recall of the corresponding representation of another modality acquired during the consolidative learning of raw-level visuo-proprioceptive patterns.\nHumans and animals are constantly exposed to a continuous stream of sensory information from different modalities. At the same time, they form more compressed representations like concepts or symbols. In species that use language, this process is further structured by this interaction, where a mapping between the sensorimotor concepts and linguistic elements needs to be established. There is evidence that children might be learning language by simply disambiguating potential meanings based on multiple exposures to utterances in different contexts (cross-situational learning). In existing models, the mapping between modalities is usually found in a single step by directly using frequencies of referent and meaning co-occurrences. In this paper, we present an extension of this one-step mapping and introduce a newly proposed sequential mapping algorithm together with a publicly available Matlab implementation. For demonstration, we have chosen a less typical scenario: instead of learning to associate objects with their names, we focus on body representations. A humanoid robot is receiving tactile stimulations on its body, while at the same time listening to utterances of the body part names (e.g., hand, forearm and torso). 
With the goal of arriving at the correct \"body categories\", we demonstrate how a sequential mapping algorithm outperforms one-step mapping. In addition, the effects of data set size and noise in the linguistic input are studied.\nOnline platforms can be divided into information-oriented and social-oriented domains. The former refers to forums or E-commerce sites that emphasize user-item interactions, like Trip.com and Amazon; whereas the latter refers to social networking services (SNSs) that have rich user-user connections, such as Facebook and Twitter. Despite their heterogeneity, these two domains can be bridged by a few overlapping users, dubbed bridge users. In this work, we address the problem of cross-domain social recommendation, i.e., recommending relevant items of information domains to potential users of social networks. To our knowledge, this is a new problem that has rarely been studied before.   Existing cross-domain recommender systems are unsuitable for this task since they have either focused on homogeneous information domains or assumed that users are fully overlapped. Towards this end, we present a novel Neural Social Collaborative Ranking (NSCR) approach, which seamlessly sews up the user-item interactions in information domains and user-user connections in SNSs. In the information domain part, the attributes of users and items are leveraged to strengthen the embedding learning of users and items. In the SNS part, the embeddings of bridge users are propagated to learn the embeddings of other non-bridge users. Extensive experiments on two real-world datasets demonstrate the effectiveness and rationality of our NSCR method.\nOver 13 months in 2016-17 the FCC conducted an \"incentive auction\" to repurpose radio spectrum from broadcast television to wireless internet. In the end, the auction yielded $19.8 billion, $10.05 billion of which was paid to 175 broadcasters for voluntarily relinquishing their licenses across 14 UHF channels. 
Stations that continued broadcasting were assigned potentially new channels to fit as densely as possible into the channels that remained. The government netted more than $7 billion (used to pay down the national debt) after covering costs. A crucial element of the auction design was the construction of a solver, dubbed SATFC, that determined whether sets of stations could be \"repacked\" in this way; it needed to run every time a station was given a price quote. This paper describes the process by which we built SATFC. We adopted an approach we dub \"deep optimization\", taking a data-driven, highly parametric, and computationally intensive approach to solver design. More specifically, to build SATFC we designed software that could pair both complete and local-search SAT-encoded feasibility checking with a wide range of domain-specific techniques. We then used automatic algorithm configuration techniques to construct a portfolio of eight complementary algorithms to be run in parallel, aiming to achieve good performance on instances that arose in proprietary auction simulations. To evaluate the impact of our solver in this paper, we built an open-source reverse auction simulator. We found that within the short time budget required in practice, SATFC solved more than 95% of the problems it encountered. Furthermore, the incentive auction paired with SATFC produced nearly optimal allocations in a restricted setting and substantially outperformed other alternatives at national scale.\nIn order to perform complex actions in human environments, an autonomous robot needs the ability to understand the environment, that is, to gather and maintain spatial knowledge. Topological map is commonly used for representing large scale, global maps such as floor plans. Although much work has been done in topological map extraction, we have found little previous work on the problem of learning the topological map using a probabilistic model. 
Learning a topological map means learning the structure of the large-scale space and dependency between places, for example, how the evidence of a group of places influences the attributes of other places. This is an important step towards planning complex actions in the environment. In this thesis, we consider the problem of using a probabilistic deep learning model to learn the topological map, which is essentially a sparse undirected graph where nodes represent places annotated with their semantic attributes (e.g. place category). We propose to use a novel probabilistic deep model, Sum-Product Networks (SPNs), due to their unique properties. We present two methods for learning topological maps using SPNs: the place grid method and the template-based method. We contribute an algorithm that builds SPNs for graphs using template models. Our experiments evaluate the ability of our models to enable robots to infer semantic attributes and detect maps with novel semantic attribute arrangements. Our results demonstrate their understanding of the topological map structure and spatial relations between places.\nDesigning an auction that maximizes expected revenue is an intricate task. Indeed, as of today--despite major efforts and impressive progress over the past few years--only the single-item case is fully understood. In this work, we initiate the exploration of the use of tools from deep learning on this topic. The design objective is revenue optimal, dominant-strategy incentive compatible auctions. 
We show that multi-layer neural networks can learn almost-optimal auctions for settings for which there are analytical solutions, such as Myerson's auction for a single item, Manelli and Vincent's mechanism for a single bidder with additive preferences over two items, or Yao's auction for two additive bidders with binary support distributions and multiple items, even if no prior knowledge about the form of optimal auctions is encoded in the network and the only feedback during training is revenue and regret. We further show how characterization results, even rather implicit ones such as Rochet's characterization through induced utilities and their gradients, can be leveraged to obtain more precise fits to the optimal design. We conclude by demonstrating the potential of deep learning for deriving optimal auctions with high revenue for poorly understood problems.\nThis paper introduces a cognitive architecture for a humanoid robot to engage in a proactive, mixed-initiative exploration and manipulation of its environment, where the initiative can originate from both the human and the robot. The framework, based on a biologically-grounded theory of the brain and mind, integrates a reactive interaction engine, a number of state-of-the-art perceptual and motor learning algorithms, as well as planning abilities and an autobiographical memory. The architecture as a whole drives the robot behavior to solve the symbol grounding problem, acquire language capabilities, execute goal-oriented behavior, and express a verbal narrative of its own experience in the world. 
We validate our approach in human-robot interaction experiments with the iCub humanoid robot, showing that the proposed cognitive architecture can be applied in real time within a realistic scenario and that it can be used with naive users.\nVerification determines whether two samples belong to the same class or not, and has important applications such as face and fingerprint verification, where thousands or millions of categories are present but each category has scarce labeled examples, presenting two major challenges for existing deep learning models. We propose a deep semi-supervised model named SEmi-supervised VErification Network (SEVEN) to address these challenges. The model consists of two complementary components. The generative component addresses the lack of supervision within each category by learning general salient structures from a large amount of data across categories. The discriminative component exploits the learned general features to mitigate the lack of supervision within categories, and also directs the generative component to find more informative structures of the whole data manifold. The two components are tied together in SEVEN to allow an end-to-end training of the two components. Extensive experiments on four verification tasks demonstrate that SEVEN significantly outperforms other state-of-the-art deep semi-supervised techniques when labeled data are in short supply. Furthermore, SEVEN is competitive with fully supervised baselines trained with a larger amount of labeled data. It indicates the importance of the generative component in SEVEN.\nRecommendation algorithms that incorporate techniques from deep learning are becoming increasingly popular. Due to the structure of the data coming from recommendation domains (i.e., one-hot-encoded vectors of item preferences), these algorithms tend to have large input and output dimensionalities that dominate their overall size. 
This makes them difficult to train, due to the limited memory of graphics processing units, and difficult to deploy on mobile devices with limited hardware. To address these difficulties, we propose Bloom embeddings, a compression technique that can be applied to the input and output of neural network models dealing with sparse high-dimensional binary-coded instances. Bloom embeddings are computationally efficient, and do not seriously compromise the accuracy of the model up to 1/5 compression ratios. In some cases, they even improve over the original accuracy, with relative increases up to 12%. We evaluate Bloom embeddings on 7 data sets and compare them against 4 alternative methods, obtaining favorable results. We also discuss a number of further advantages of Bloom embeddings, such as 'on-the-fly' constant-time operation, zero or marginal space requirements, training time speedups, or the fact that they do not require any change to the core model architecture or training configuration.\nThe success of automated driving deployment is highly dependent on the ability to develop an efficient and safe driving policy. The problem is well formulated under the framework of optimal control as a cost optimization problem. Model based solutions using traditional planning are efficient, but require the knowledge of the environment model. On the other hand, model free solutions suffer from sample inefficiency and require too many interactions with the environment, which is infeasible in practice. Methods under the Reinforcement Learning framework usually require the notion of a reward function, which is not available in the real world. Imitation learning helps in improving sample efficiency by introducing prior knowledge obtained from the demonstrated behavior, at the risk of exact behavior cloning without generalizing to unseen environments. 
In this paper we propose a Meta learning framework, based on data set aggregation, to improve generalization of imitation learning algorithms. Under the proposed framework, we propose MetaDAgger, a novel algorithm which tackles the generalization issues in traditional imitation learning. We use The Open Race Car Simulator (TORCS) to test our algorithm. Results on unseen test tracks show significant improvement over traditional imitation learning algorithms, improving the learning time and sample efficiency at the same time. The results are also supported by visualization of the learnt features to prove generalization of the captured details.\nMonte Carlo tree search (MCTS) is extremely popular in computer Go, where it determines each action through an enormous number of simulations in a broad and deep search tree. However, human experts select most actions by pattern analysis and careful evaluation rather than brute-force search of millions of future interactions. In this paper, we propose a computer Go system that follows experts' way of thinking and playing. Our system consists of two parts. The first part is a novel deep alternative neural network (DANN) used to generate candidates of next move. Compared with the existing deep convolutional neural network (DCNN), DANN inserts a recurrent layer after each convolutional layer and stacks them in an alternative manner. We show that such a setting can preserve more contexts of local features and their evolutions, which are beneficial for move prediction. The second part is a long-term evaluation (LTE) module used to provide a reliable evaluation of candidates rather than a single probability from the move predictor. This is consistent with human experts' nature of playing since they can foresee tens of steps to give an accurate estimation of candidates. In our system, for each candidate, LTE calculates a cumulative reward after several future interactions when local variations are settled. 
Combining criteria from the two parts, our system determines the optimal choice of next move. For more comprehensive experiments, we introduce a new professional Go dataset (PGD), consisting of 253233 professional records. Experiments on GoGoD and PGD datasets show the DANN can substantially improve performance of move prediction over pure DCNN. When combined with LTE, our system outperforms most relevant approaches and open engines based on MCTS.\nAn Autonomous Underwater Vehicle (AUV) should carry out complex tasks in a limited time interval. Since existing AUVs have limited battery capacity and restricted endurance, they should autonomously manage mission time and the resources to perform effective persistent deployment in longer missions. Task assignment requires making decisions subject to resource constraints, while tasks are assigned with costs and/or values that are budgeted in advance. Tasks are distributed in a particular operation zone and mapped by a waypoint covered network. Thus, designing an efficient routing-task priority assignment framework that considers the vehicle's availabilities and properties is essential for increasing mission productivity and on-time mission completion. This depends strongly on the order and priority of the tasks that are located between node-like waypoints in an operation network. On the other hand, autonomous operation of AUVs in an unfamiliar, dynamic underwater environment and quick response to sudden environmental changes is a complicated process. Water current instabilities can deflect the vehicle to an undesired direction and perturb the AUV's safety. The vehicle's robustness to strong environmental variations is extremely crucial for its safe and optimum operations in an uncertain and dynamic environment. 
To this end, the AUV needs to have a general overview of the environment at the top level to perform an autonomous action selection (task selection) and a lower-level local motion planner to operate successfully in dealing with continuously changing situations. This research deals with developing a novel reactive control architecture to provide a higher level of decision autonomy for the AUV operation that enables a single vehicle to accomplish multiple tasks in a single mission in the face of periodic disturbances in a turbulent and highly uncertain environment.\nTraditional approaches to building a large scale knowledge graph have usually relied on extracting information (entities, their properties, and relations between them) from unstructured text (e.g. DBpedia). Recent advances in Convolutional Neural Networks (CNN) allow us to shift our focus to learning entities and relations from images, as they build robust models that require little or no pre-processing of the images. In this paper, we present an approach to identify and extract spatial relations (e.g., the girl is standing behind the table) from images using CNNs. Our research addresses two specific challenges: providing insight into how spatial relations are learned by the network and which parts of the image are used to predict these relations. We use the pre-trained network VGGNet to extract features from an image and train a Multi-layer Perceptron (MLP) on a set of synthetic images and the sun09 dataset to extract spatial relations. The MLP predicts spatial relations without a bounding box around the objects or the space in the image depicting the relation. To understand how the spatial relations are represented in the network, a heatmap is overlaid on the image to show the regions that are deemed important by the network. Also, we analyze the MLP to show the relationship between the activation of consistent groups of nodes and the prediction of a spatial relation. 
We show how the loss of these groups affects the network's ability to identify relations.\nBackdoors and backbones of Boolean formulas are hidden structural properties. A natural goal, already in part realized, is that solver algorithms seek to obtain substantially better performance by exploiting these structures.   However, the present paper is not intended to improve the performance of SAT solvers, but rather is a cautionary paper. In particular, the theme of this paper is that there is a potential chasm between the existence of such structures in the Boolean formula and being able to effectively exploit them. This does not mean that these structures are not useful to solvers. It does mean that one must be very careful not to assume that it is computationally easy to go from the existence of a structure to being able to get one's hands on it and/or being able to exploit the structure.   For example, in this paper we show that, under the assumption that P $\\neq$ NP, there are easily recognizable sets of Boolean formulas for which it is hard to determine whether they have a large backbone. We also show that, also under the assumption P $\\neq$ NP, there are easily recognizable families of Boolean formulas with strong backdoors that are easy to find, yet for which it is hard to determine whether they are satisfiable.\nWe want to build robots that are useful in unstructured real world applications, such as doing work in the household. Grasping in particular is an important skill in this domain, yet it remains a challenge. One of the key hurdles is handling unexpected changes or motion in the objects being grasped and kinematic noise or other errors in the robot. This paper proposes an approach to learning a closed-loop controller for robotic grasping that dynamically guides the gripper to the object. 
We use a wrist-mounted sensor to acquire depth images in front of the gripper and train a convolutional neural network to learn a distance function to true grasps for grasp configurations over an image. The training sensor data is generated in simulation, a major advantage over previous work that uses real robot experience, which is costly to obtain. Despite being trained in simulation, our approach works well on real noisy sensor images. We compare our controller in simulated and real robot experiments to a strong baseline for grasp pose detection, and find that our approach significantly outperforms the baseline in the presence of kinematic noise, perceptual errors and disturbances of the object during grasping.\nWe consider the effect of introducing a curriculum of targets when training Boolean models on supervised Multi Label Classification (MLC) problems. In particular, we consider how to order targets in the absence of prior knowledge, and how such a curriculum may be enforced when using meta-heuristics to train discrete non-linear models.   We show that hierarchical dependencies between targets can be exploited by enforcing an appropriate curriculum using hierarchical loss functions. On several multi output circuit-inference problems with known target difficulties, Feedforward Boolean Networks (FBNs) trained with such a loss function achieve significantly lower out-of-sample error, up to $10\\%$ in some cases. This improvement increases as the loss places more emphasis on target order and is strongly correlated with an easy-to-hard curriculum. We also demonstrate the same improvements on three real-world models and two Gene Regulatory Network (GRN) inference problems.   We posit a simple a-priori method for identifying an appropriate target order and estimating the strength of target relationships in Boolean MLCs. 
These methods use intrinsic dimension as a proxy for target difficulty, which is estimated using optimal solutions to a combinatorial optimisation problem known as the Minimum-Feature-Set (minFS) problem. We also demonstrate that the same generalisation gains can be achieved without providing any knowledge of target difficulty.\nThe past few years have witnessed a growth in size and computational requirements for training and inference with neural networks. Currently, a common approach to address these requirements is to use a heterogeneous distributed environment with a mixture of hardware devices such as CPUs and GPUs. Importantly, the decision of placing parts of the neural models on devices is often made by human experts based on simple heuristics and intuitions. In this paper, we propose a method which learns to optimize device placement for TensorFlow computational graphs. Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices. The execution time of the predicted placements is then used as the reward signal to optimize the parameters of the sequence-to-sequence model. Our main result is that on Inception-V3 for ImageNet classification, and on RNN LSTM, for language modeling and neural machine translation, our model finds non-trivial device placements that outperform hand-crafted heuristics and traditional algorithmic methods.\nA key part of any evolutionary algorithm is fitness evaluation. When fitness evaluations are corrupted by noise, as happens in many real-world problems as a consequence of various types of uncertainty, a strategy is needed in order to cope with this. Resampling is one of the most common strategies, whereby each solution is evaluated many times in order to reduce the variance of the fitness estimates. 
When evaluating the performance of a noisy optimisation algorithm, a key consideration is the stopping condition for the algorithm. A frequently used stopping condition in runtime analysis, known as \"First Hitting Time\", is to stop the algorithm as soon as it encounters the optimal solution. However, this is unrealistic for real-world problems, as if the optimal solution were already known, there would be no need to search for it. This paper argues that the use of First Hitting Time, despite being a commonly used approach, is significantly flawed and overestimates the quality of many algorithms in real-world cases, where the optimum is not known in advance and has to be genuinely searched for. A better alternative is to measure the quality of the solution an algorithm returns after a fixed evaluation budget, i.e., to focus on final solution quality. This paper argues that focussing on final solution quality is more realistic and demonstrates cases where the results produced by each algorithm evaluation method lead to very different conclusions regarding the quality of each noisy optimisation algorithm.\nThe availability of large idea repositories (e.g., the U.S. patent database) could significantly accelerate innovation and discovery by providing people with inspiration from solutions to analogous problems. However, finding useful analogies in these large, messy, real-world repositories remains a persistent challenge for either human or automated methods. Previous approaches include costly hand-created databases that have high relational structure (e.g., predicate calculus representations) but are very sparse. Simpler machine-learning/information-retrieval similarity metrics can scale to large, natural-language datasets, but struggle to account for structural similarity, which is central to analogy. 
In this paper we explore the viability and value of learning simpler structural representations, specifically, \"problem schemas\", which specify the purpose of a product and the mechanisms by which it achieves that purpose. Our approach combines crowdsourcing and recurrent neural networks to extract purpose and mechanism vector representations from product descriptions. We demonstrate that these learned vectors allow us to find analogies with higher precision and recall than traditional information-retrieval methods. In an ideation experiment, analogies retrieved by our models significantly increased people's likelihood of generating creative ideas compared to analogies retrieved by traditional methods. Our results suggest a promising approach to enabling computational analogy at scale is to learn and leverage weaker structural representations.\nNote that a newer expanded version of this paper is now available at: arXiv:1802.03888   It is critical in many applications to understand what features are important for a model, and why individual predictions were made. For tree ensemble methods these questions are usually answered by attributing importance values to input features, either globally or for a single prediction. Here we show that current feature attribution methods are inconsistent, which means changing the model to rely more on a given feature can actually decrease the importance assigned to that feature. To address this problem we develop fast exact solutions for SHAP (SHapley Additive exPlanation) values, which were recently shown to be the unique additive feature attribution method based on conditional expectations that is both consistent and locally accurate. We integrate these improvements into the latest version of XGBoost, demonstrate the inconsistencies of current methods, and show how using SHAP values results in significantly improved supervised clustering performance. 
Feature importance values are a key part of understanding widely used models such as gradient boosting trees and random forests, so improvements to them have broad practical implications.\nEvolution sculpts both the body plans and nervous systems of agents together over time. In contrast, in AI and robotics, a robot's body plan is usually designed by hand, and control policies are then optimized for that fixed design. The task of simultaneously co-optimizing the morphology and controller of an embodied robot has remained a challenge. In psychology, the theory of embodied cognition posits that behavior arises from a close coupling between body plan and sensorimotor control, which suggests why co-optimizing these two subsystems is so difficult: most evolutionary changes to morphology tend to adversely impact sensorimotor control, leading to an overall decrease in behavioral performance. Here, we further examine this hypothesis and demonstrate a technique for \"morphological innovation protection\", which temporarily reduces selection pressure on recently morphologically-changed individuals, thus giving evolution some time to \"readapt\" to the new morphology with subsequent control policy mutations. We show the potential for this method to avoid local optima and converge to similar highly fit morphologies across widely varying initial conditions, while sustaining fitness improvements further into optimization. While this technique is admittedly only the first of many steps that must be taken to achieve scalable optimization of embodied machines, we hope that theoretical insight into the cause of evolutionary stagnation in current methods will help to enable the automation of robot design and behavioral training -- while simultaneously providing a testbed to investigate the theory of embodied cognition.\nIn Acoustic Scene Classification (ASC) two major approaches have been followed. 
While one utilizes engineered features such as mel-frequency-cepstral-coefficients (MFCCs), the other uses learned features that are the outcome of an optimization algorithm. I-vectors are the result of a modeling technique that usually takes engineered features as input. It has been shown that standard MFCCs extracted from monaural audio signals lead to i-vectors that exhibit poor performance, especially on indoor acoustic scenes. At the same time, Convolutional Neural Networks (CNNs) are well known for their ability to learn features by optimizing their filters. They have been applied to ASC and have shown promising results. In this paper, we first propose a novel multi-channel i-vector extraction and scoring scheme for ASC, improving their performance on indoor and outdoor scenes. Second, we propose a CNN architecture that achieves promising ASC results. Further, we show that i-vectors and CNNs capture complementary information from acoustic scenes. Finally, we propose a hybrid system for ASC using multi-channel i-vectors and CNNs by utilizing a score fusion technique. Using our method, we participated in the ASC task of the DCASE-2016 challenge. Our hybrid approach ranked 1st among 49 submissions, substantially improving the previous state of the art.\nThis paper introduces a new framework for real-time decision making in video games. An Ensemble agent is a compound agent composed of multiple agents, each with its own tasks or goals to achieve. Usually when dealing with real-time decision making, reactive agents are used; that is, agents that return a decision based on the current state. While reactive agents are very fast, most games require more than just a rule-based agent to achieve good results. Deliberative agents---agents that use a forward model to search future states---are very useful in games with no hard time limit, such as Go or Backgammon, but generally take too long for real-time games. 
The Ensemble framework addresses this issue by allowing the agent to be both deliberative and reactive at the same time. This is achieved by breaking up the game-play into logical roles and having highly focused components for each role, with each component disregarding anything outwith its own role. Reactive agents can be used where a reactive agent is suited to the role, and where a deliberative approach is required, branching is kept to a minimum by the removal of all extraneous factors, enabling an informed decision to be made within a much smaller time-frame. An Arbiter is used to combine the component results, allowing high-performing agents to be created from simple, efficient components.\nIn recent years, research has been done on applying Recurrent Neural Networks (RNNs) as recommender systems. Results have been promising, especially in the session-based setting where RNNs have been shown to outperform state-of-the-art models. In many of these experiments, the RNN could potentially improve the recommendations by utilizing information about the user's past sessions, in addition to the user's interactions in the current session. A problem for session-based recommendation is how to produce accurate recommendations at the start of a session, before the system has learned much about the user's current interests. We propose a novel approach that extends an RNN recommender to be able to process the user's recent sessions, in order to improve recommendations. This is done by using a second RNN to learn from recent sessions, and predict the user's interest in the current session. Feeding this information to the original RNN allows it to improve its recommendations. Our experiments on two different datasets show that the proposed approach can significantly improve recommendations throughout the sessions, compared to a single RNN working only on the current session. 
The proposed model especially improves recommendations at the start of sessions, and is therefore able to deal with the cold-start problem within sessions.\nAdvances in image processing and computer vision in recent years have brought about the use of visual features in artwork recommendation. Recent works have shown that visual features obtained from pre-trained deep neural networks (DNNs) perform very well for recommending digital art. Other recent works have shown that explicit visual features (EVF) based on attractiveness can perform well in preference prediction tasks, but no previous work has compared DNN features with specific attractiveness-based visual features (e.g. brightness, texture) in terms of recommendation performance. In this work, we study and compare the performance of DNN and EVF features for the purpose of physical artwork recommendation using transactional data from UGallery, an online store of physical paintings. In addition, we perform an exploratory analysis to understand whether DNN embedded features are related to certain EVF. Our results show that DNN features outperform EVF and that certain EVF features are better suited for physical artwork recommendation; finally, we show evidence that certain neurons in the DNN might be partially encoding visual features such as brightness, providing an opportunity for explaining recommendations based on visual neural models.\nHierarchical models are utilized in a wide variety of problems which are characterized by task hierarchies, where predictions on smaller subtasks are useful for trying to predict a final task. Typically, neural networks are first trained for the subtasks, and the predictions of these networks are subsequently used as additional features when training a model and doing inference for a final task. In this work, we focus on improving learning for such hierarchical models and demonstrate our method on the task of speaker trait prediction. 
Speaker trait prediction aims to computationally identify which personality traits a speaker might be perceived to have, and has been of great interest to both the Artificial Intelligence and Social Science communities. Persuasiveness prediction in particular has been of interest, as persuasive speakers have a large amount of influence on our thoughts, opinions and beliefs. In this work, we examine how leveraging the relationship between related speaker traits in a hierarchical structure can help improve our ability to predict how persuasive a speaker is. We present a novel algorithm that allows us to backpropagate through this hierarchy. This hierarchical model achieves a 25% relative reduction in classification error over current state-of-the-art methods on the publicly available POM dataset.\nWe present a straightforward source-to-source transformation that introduces justifications for user-defined constraints into the CHR programming language. Then a scheme of two rules suffices to allow for logical retraction (deletion, removal) of constraints during computation. Without the need to recompute from scratch, these rules remove not only the constraint but also undo all consequences of the rule applications that involved the constraint. We prove a confluence result concerning the rule scheme and show its correctness. When algorithms are written in CHR, constraints represent both data and operations. CHR is already incremental by nature, i.e., constraints can be added at runtime. Logical retraction adds decrementality. Hence, any algorithm written in CHR with justifications will become fully dynamic. Operations can be undone and data can be removed at any point in the computation without compromising the correctness of the result. 
We present two classical examples of dynamic algorithms, written in our prototype implementation of CHR with justifications that is available online: maintaining the minimum of a changing set of numbers and shortest paths in a graph whose edges change.\nIn the last decade, driven also by the availability of unprecedented computational power and storage capabilities in cloud environments, we have witnessed the proliferation of new algorithms, methods, and approaches in two areas of artificial intelligence: knowledge representation and machine learning. On the one side, the generation of a high rate of structured data on the Web led to the creation and publication of the so-called knowledge graphs. On the other side, deep learning emerged as one of the most promising approaches in the generation and training of models that can be applied to a wide variety of application fields. More recently, autoencoders have proven their strength in various scenarios, playing a fundamental role in unsupervised learning. In this paper, we investigate how to exploit the semantic information encoded in a knowledge graph to build connections between units in a Neural Network, thus leading to a new method, SEM-AUTO, to extract and weigh semantic features that can eventually be used to build a recommender system. As adding content-based side information may mitigate the cold-user problem, we tested how our approach behaves in the presence of only a few ratings from a user on the Movielens 1M dataset and compared results with BPRSLIM.\nOnline advertising and product recommendation are important application domains for multi-armed bandit methods. In these fields, the reward that is immediately available is most often only a proxy for the actual outcome of interest, which we refer to as a conversion. For instance, in web advertising, clicks can be observed within a few seconds after an ad display, but the corresponding sale --if any-- will take hours, if not days, to happen. 
This paper proposes and investigates a new stochastic multi-armed bandit model in the framework proposed by Chapelle (2014) --based on empirical studies in the field of web advertising-- in which each action may trigger a future reward that will then happen with a stochastic delay. We assume that the probability of conversion associated with each action is unknown, while the distribution of the conversion delay is known, distinguishing between the (idealized) case where the conversion events may be observed whatever their delay and the more realistic setting in which late conversions are censored. We provide performance lower bounds as well as two simple but efficient algorithms based on the UCB and KLUCB frameworks. The latter algorithm, which is preferable when conversion rates are low, is based on a Poissonization argument, of independent interest in other settings where aggregation of Bernoulli observations with different success probabilities is required.\nFinancial portfolio management is the process of constantly redistributing a fund into different financial products. This paper presents a financial-model-free Reinforcement Learning framework to provide a deep machine learning solution to the portfolio management problem. The framework consists of the Ensemble of Identical Independent Evaluators (EIIE) topology, a Portfolio-Vector Memory (PVM), an Online Stochastic Batch Learning (OSBL) scheme, and a fully exploiting and explicit reward function. This framework is realized in three instances in this work with a Convolutional Neural Network (CNN), a basic Recurrent Neural Network (RNN), and a Long Short-Term Memory (LSTM). They are, along with a number of recently reviewed or published portfolio-selection strategies, examined in three back-test experiments with a trading period of 30 minutes in a cryptocurrency market. 
Cryptocurrencies are electronic and decentralized alternatives to government-issued money, with Bitcoin as the best-known example of a cryptocurrency. All three instances of the framework monopolize the top three positions in all experiments, outdistancing other compared trading algorithms. Even with a high commission rate of 0.25% in the backtests, the framework is able to achieve at least 4-fold returns in 50 days.\nMarkov decision processes (MDPs) are the standard formalism for modelling sequential decision making in stochastic environments. Policy synthesis addresses the problem of how to control or limit the decisions an agent makes so that a given specification is met. In this paper we consider PCTL*, the probabilistic counterpart of CTL*, as the specification language. Because in general the policy synthesis problem for PCTL* is undecidable, we restrict attention to policies whose execution history memory is finitely bounded a priori. Surprisingly, no algorithm for policy synthesis for this natural and expressive framework has been developed so far. We close this gap and describe a tableau-based algorithm that, given an MDP and a PCTL* specification, derives in a non-deterministic way a system of (possibly nonlinear) equalities and inequalities. The solutions of this system, if any, describe the desired (stochastic) policies. Our main result in this paper is the correctness of our method, i.e., soundness, completeness and termination.\nIn reinforcement learning (RL) tasks, an efficient exploration mechanism should be able to encourage an agent to take actions that lead to less frequent states, which may yield higher accumulative future return. However, both knowing about the future and evaluating the frequentness of states are non-trivial tasks, especially for deep RL domains, where a state is represented by high-dimensional image frames. 
In this paper, we propose a novel informed exploration framework for deep RL tasks, where we enable an RL agent to predict future transitions and evaluate the frequentness of the predicted future frames in a meaningful manner. To this end, we train a deep prediction model to generate future frames given a state-action pair, and a convolutional autoencoder model to generate deep features for conducting hashing over the seen frames. In addition, to utilize the counts derived from the seen frames to evaluate the frequentness of the predicted frames, we tackle the challenge of making the hash codes of the predicted future frames match those of their corresponding seen frames. In this way, we can derive a reliable metric for evaluating the novelty of the future direction indicated by each action, and hence inform the agent to explore the least frequent one. We use Atari 2600 games as the testing environment and demonstrate that the proposed framework achieves significant performance gains over a state-of-the-art informed exploration approach in most of the domains.\nRegression or classification? This is perhaps the most basic question faced when tackling a new supervised learning problem. We present an Evolutionary Deep Learning (EDL) algorithm that automatically solves this by identifying the question type with high accuracy, along with a proposed deep architecture. Typically, a significant amount of human insight and preparation is required prior to executing machine learning algorithms. For example, when creating deep neural networks, the number of parameters must be selected in advance, and furthermore, many of these choices are made based upon pre-existing knowledge of the data, such as the use of a categorical cross-entropy loss function. Humans are able to study a dataset and decide whether it represents a classification or a regression problem, and consequently make decisions which will be applied to the execution of the neural network. 
We propose the Automated Problem Identification (API) algorithm, which uses an evolutionary algorithm interface to TensorFlow to manipulate a deep neural network to decide if a dataset represents a classification or a regression problem. We test API on 16 different classification, regression and sentiment analysis datasets with up to 10,000 features and up to 17,000 unique target values. API achieves an average accuracy of $96.3\\%$ in identifying the problem type without hardcoding any insights about the general characteristics of regression or classification problems. For example, API successfully identifies classification problems even with 1000 target values. Furthermore, the algorithm recommends which loss function to use and also recommends a neural network architecture. Our work is therefore a step towards fully automated machine learning.\nIn the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance bounds in the inverse reinforcement learning setting---where the true reward function is unknown and only samples of expert behavior are given. We propose a sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the $\\alpha$-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function. We evaluate our proposed bound on both a standard grid navigation task and a simulated driving task and achieve tighter and more accurate bounds than a feature count-based baseline. We also give examples of how our proposed bound can be utilized to perform risk-aware policy selection and risk-aware policy improvement. 
Because our proposed bound requires several orders of magnitude fewer demonstrations than existing high-confidence bounds, it is the first practical method that allows agents that learn from demonstration to express confidence in the quality of their learned policy.\nIn this paper, we propose ELF, an Extensive, Lightweight and Flexible platform for fundamental reinforcement learning research. Using ELF, we implement a highly customizable real-time strategy (RTS) engine with three game environments (Mini-RTS, Capture the Flag and Tower Defense). Mini-RTS, as a miniature version of StarCraft, captures key game dynamics and runs at 40K frame-per-second (FPS) per core on a Macbook Pro notebook. When coupled with modern reinforcement learning methods, the system can train a full-game bot against built-in AIs end-to-end in one day with 6 CPUs and 1 GPU. In addition, our platform is flexible in terms of environment-agent communication topologies, choices of RL methods, changes in game parameters, and can host existing C/C++-based game environments like Arcade Learning Environment. Using ELF, we thoroughly explore training parameters and show that a network with Leaky ReLU and Batch Normalization coupled with long-horizon training and progressive curriculum beats the rule-based built-in AI more than $70\\%$ of the time in the full game of Mini-RTS. Strong performance is also achieved on the other two games. In game replays, we show our agents learn interesting strategies. ELF, along with its RL platform, is open-sourced at https://github.com/facebookresearch/ELF.\nDomain adaptation aims at generalizing a high-performance learner on a target domain via utilizing the knowledge distilled from a source domain which has a different but related data distribution. One solution to domain adaptation is to learn domain invariant feature representations while the learned representations should also be discriminative in prediction. 
To learn such representations, domain adaptation frameworks usually include a domain invariant representation learning approach to measure and reduce the domain discrepancy, as well as a discriminator for classification. Inspired by Wasserstein GAN, in this paper we propose a novel approach to learn domain invariant feature representations, namely Wasserstein Distance Guided Representation Learning (WDGRL). WDGRL utilizes a neural network, referred to as the domain critic, to estimate the empirical Wasserstein distance between the source and target samples and optimizes the feature extractor network to minimize the estimated Wasserstein distance in an adversarial manner. The theoretical advantages of Wasserstein distance for domain adaptation lie in its gradient property and promising generalization bound. Empirical studies on common sentiment and image classification adaptation datasets demonstrate that our proposed WDGRL outperforms the state-of-the-art domain invariant representation learning approaches.\nDespite large incentives, correctness in software remains an elusive goal. Declarative programming techniques, where algorithms are derived from a specification of the desired behavior, offer hope to address this problem, since there is a combinatorial reduction in complexity in programming in terms of specifications instead of algorithms, and arbitrary desired properties can be expressed and enforced in specifications directly. However, limitations on performance have prevented programming with declarative specifications from becoming a mainstream technique for general-purpose programming. To address the performance bottleneck in deriving an algorithm from a specification, I propose information-gain computation, a framework where an adaptive evaluation strategy is used to efficiently perform a search which derives algorithms that provide information about a query most directly. 
Within this framework, opportunities to compress the search space present themselves, which suggest that information-theoretic bounds on the performance of such a system might be articulated and a system designed to achieve them. In a preliminary empirical study of adaptive evaluation for a simple test program, the evaluation strategy adapts successfully to evaluate a query efficiently.\nRecognizing seismic waves immediately is very important for the realization of efficient disaster prevention. Generally, such systems consist of a network of seismic detectors that send real-time data to a central server. The server processes the data and attempts to recognize the first signs of an earthquake. The current problem with this approach is that it is subject to false alarms. A critical trade-off exists between the sensitivity of the system and its error rate. To overcome these problems, intelligent learning systems based on artificial neural networks (ANNs) can be used. However, conventional supervised ANN systems are difficult to train, CPU intensive and prone to false alarms. To overcome these limitations, here we attempt to use HTM, a next-generation unsupervised cortical algorithm. This novel approach does not learn particular waveforms, but adapts to continuously fed data, gaining the ability to discriminate between normality (seismic sensor background noise in no-earthquake conditions) and anomaly (sensor response to a jitter or an earthquake). The main goal of this study is to test whether the HTM algorithm can be used to signal earthquakes automatically in a feasible disaster prevention system. We describe the methodology used and give the first qualitative assessments of the recognition ability of the system. 
Our preliminary results show that the cortical algorithm used is very robust to noise and that it can successfully recognize synthetic earthquake-like signals efficiently and reliably.\nThis paper proposes a new fuzzy assessment procedure with application in management decision making. The proposed fuzzy approach builds the membership functions for system characteristics of a standby repairable system. This method is used to extract a family of conventional crisp intervals from the fuzzy repairable system for the desired system characteristics. These intervals can be determined with a set of nonlinear parametric programs using the membership functions. When system characteristics are governed by the membership functions, more information is provided for use by management, and because the redundant system is extended to the fuzzy environment, general repairable systems are represented more accurately and the analytic results are more useful for designers and practitioners. Besides standby redundancy, active redundancy systems are also used in many cases, so this article has many practical applications. Unlike other studies, our model provides a good estimated value in uncertain environments, a comparative discussion of fuzzy theory and the conventional method, and a comparison between parallel (active redundancy) and series systems in a fuzzy setting under standby redundancy. When the membership function intervals cannot be inverted explicitly, system management or designers can specify the system characteristics of interest, perform numerical calculations, examine the corresponding {\\alpha}-cuts, and use this information to develop or improve system processes.\nThe reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. 
In this paper, we explore how a rich environment can help to promote the learning of complex behavior. Specifically, we train agents in diverse environmental contexts, and find that this encourages the emergence of robust behaviours that perform well across a suite of tasks. We demonstrate this principle for locomotion -- behaviours that are known for their sensitivity to the choice of reward. We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward function based on forward progress. Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance. A visual depiction of highlights of the learned behavior can be viewed at https://youtu.be/hx_bgoTF7bs .\nThe enormous amount of text published daily by Internet users has fostered the development of methods to analyze this content in several natural language processing areas, such as sentiment analysis. The main goal of this task is to classify the polarity of a message. Even though many approaches have been proposed for sentiment analysis, some of the most successful ones rely on the availability of large annotated corpora, whose construction is an expensive and time-consuming process. In recent years, distant supervision has been used to obtain larger datasets. So, inspired by these techniques, in this paper we extend such approaches to incorporate popular graphic symbols used in electronic messages, the emojis, in order to create a large sentiment corpus for Portuguese. Trained on almost one million tweets, several models were tested on both same-domain and cross-domain corpora. Our methods obtained very competitive results on five annotated corpora from mixed domains (Twitter and product reviews), which proves the domain-independent property of such an approach. 
In addition, our results suggest that the combination of emoticons and emojis is able to properly capture the sentiment of a message.\nWe describe the Inspire system which participated in the first competition on Inductive Logic Programming (ILP). Inspire is based on Answer Set Programming (ASP). The distinguishing feature of Inspire is an ASP encoding for hypothesis space generation: given a set of facts representing the mode bias, and a set of cost configuration parameters, each answer set of this encoding represents a single rule that is considered for finding a hypothesis that entails the given examples. Compared with state-of-the-art methods that use the length of the rule body as a metric for rule complexity, our approach permits a much more fine-grained specification of the shape of hypothesis candidate rules. The Inspire system iteratively increases the rule cost limit and thereby increases the search space until it finds a suitable hypothesis. The system searches for a hypothesis that entails a single example at a time, utilizing an ASP encoding derived from the encoding used in XHAIL. We perform experiments with the development and test set of the ILP competition. For comparison we also adapted the ILASP system to process competition instances. Experimental results show that the cost parameters for the hypothesis search space are an important factor for finding hypotheses to competition instances within tight resource bounds.\nThe success of deep learning in vision can be attributed to: (a) models with high capacity; (b) increased computational power; and (c) availability of large-scale labeled data. Since 2012, there have been significant advances in representation capabilities of the models and computational capabilities of GPUs. But the size of the biggest dataset has surprisingly remained constant. What will happen if we increase the dataset size by 10x or 100x? 
This paper takes a step towards clearing the clouds of mystery surrounding the relationship between 'enormous data' and visual deep learning. By exploiting the JFT-300M dataset, which has more than 375M noisy labels for 300M images, we investigate how the performance of current vision tasks would change if this data was used for representation learning. Our paper delivers some surprising (and some expected) findings. First, we find that the performance on vision tasks increases logarithmically with the volume of training data. Second, we show that representation learning (or pre-training) still holds a lot of promise. One can improve performance on many vision tasks by just training a better base model. Finally, as expected, we present new state-of-the-art results for different vision tasks including image classification, object detection, semantic segmentation and human pose estimation. Our sincere hope is that this inspires the vision community not to undervalue data and to develop collective efforts in building larger datasets.\nRobotic motion planning problems are typically solved by constructing a search tree of valid maneuvers from a start to a goal configuration. Limited onboard computation and real-time planning constraints impose a limit on how large this search tree can grow. Heuristics play a crucial role in such situations by guiding the search towards potentially good directions and consequently minimizing search effort. Moreover, a heuristic must infer such directions in an efficient manner using only the information uncovered by the search up until that time. However, state-of-the-art methods do not address the problem of computing a heuristic that explicitly minimizes search effort. In this paper, we do so by training a heuristic policy that maps the partial information from the search to a decision about which node of the search tree to expand. Unfortunately, naively training such policies leads to slow convergence and poor local minima. 
We present SaIL, an efficient algorithm that trains heuristic policies by imitating \"clairvoyant oracles\": oracles that have full information about the world and demonstrate decisions that minimize search effort. We leverage the fact that such oracles can be efficiently computed using dynamic programming and derive performance guarantees for the learnt heuristic. We validate the approach on a spectrum of environments and show that SaIL consistently outperforms state-of-the-art algorithms. Our approach paves the way forward for learning heuristics that demonstrate an anytime nature: finding feasible solutions quickly and incrementally refining them over time.\nA number of intriguing decision scenarios revolve around partitioning a collection of objects to optimize some application-specific objective function. This problem is generally referred to as the Object Partitioning Problem (OPP) and is known to be NP-hard. We here consider a particularly challenging version of OPP, namely, the Stochastic On-line Equi-Partitioning Problem (SO-EPP). In SO-EPP, the target partitioning is unknown and has to be inferred purely from observing an on-line sequence of object pairs. The paired objects belong to the same partition with probability $p$ and to different partitions with probability $1-p$, with $p$ also being unknown. As an additional complication, the partitions are required to be of equal cardinality. Previously, only sub-optimal solution strategies have been proposed for SO-EPP. In this paper, we propose the first optimal solution strategy. In brief, the scheme that we propose, BN-EPP, is founded on a Bayesian network representation of SO-EPP problems. Based on probabilistic reasoning, we are not only able to infer the underlying object partitioning with optimal accuracy, but also to simultaneously infer $p$, allowing us to accelerate learning as object pairs arrive. 
Furthermore, our scheme is the first to support arbitrary constraints on the partitioning (Constrained SO-EPP). Being optimal, BN-EPP provides superior performance compared to existing solution schemes. We additionally introduce Walk-BN-EPP, a novel WalkSAT inspired algorithm for solving large scale BN-EPP problems. Finally, we provide a BN-EPP based solution to the problem of order picking, a representative real-life application of BN-EPP.\nDeep neural networks excel in regimes with large amounts of data, but tend to struggle when data is scarce or when they need to adapt quickly to changes in the task. In response, recent work in meta-learning proposes training a meta-learner on a distribution of similar tasks, in the hopes of generalization to novel but related tasks by learning a high-level strategy that captures the essence of the problem it is asked to solve. However, many recent meta-learning approaches are extensively hand-designed, either using architectures specialized to a particular application, or hard-coding algorithmic components that constrain how the meta-learner solves the task. We propose a class of simple and generic meta-learner architectures that use a novel combination of temporal convolutions and soft attention; the former to aggregate information from past experience and the latter to pinpoint specific pieces of information. In the most extensive set of meta-learning experiments to date, we evaluate the resulting Simple Neural AttentIve Learner (or SNAIL) on several heavily-benchmarked tasks. On all tasks, in both supervised and reinforcement learning, SNAIL attains state-of-the-art performance by significant margins.\nImitation learning is an effective approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. 
However, standard imitation learning methods assume that the agent receives examples of observation-action tuples that could be provided, for instance, to a supervised learning algorithm. This stands in contrast to how humans and animals imitate: we observe another person performing some behavior and then figure out which actions will realize that behavior, compensating for changes in viewpoint, surroundings, and embodiment. We term this kind of imitation learning as imitation-from-observation and propose an imitation learning method based on video prediction with context translation and deep reinforcement learning. This lifts the assumption in imitation learning that the demonstration should consist of observations and actions in the same environment, and enables a variety of interesting applications, including learning robotic skills that involve tool use simply by observing videos of human tool use. Our experimental results show that our approach can perform imitation-from-observation for a variety of real-world robotic tasks modeled on common household chores, acquiring skills such as sweeping from videos of a human demonstrator. Videos can be found at https://sites.google.com/site/imitationfromobservation\nIt has been shown that most machine learning algorithms are susceptible to adversarial perturbations. Slightly perturbing an image in a carefully chosen direction in the image space may cause a trained neural network model to misclassify it. Recently, it was shown that physical adversarial examples exist: printing perturbed images then taking pictures of them would still result in misclassification. This raises security and safety concerns.   However, these experiments ignore a crucial property of physical objects: the camera can view objects from different distances and at different angles. In this paper, we show experiments that suggest that current constructions of physical adversarial examples do not disrupt object detection from a moving platform. 
Instead, a trained neural network classifies most of the pictures taken from different distances and angles of a perturbed image correctly. We believe this is because the adversarial property of the perturbation is sensitive to the scale at which the perturbed picture is viewed, so (for example) an autonomous car will misclassify a stop sign only from a small range of distances.   Our work raises an important question: can one construct examples that are adversarial for many or most viewing conditions? If so, the construction should offer very significant insights into the internal representation of patterns by deep networks. If not, there is a good prospect that adversarial examples can be reduced to a curiosity with little practical impact.\nTwo fundamental problems for extraterrestrial intelligences (ETIs) attempting to establish interstellar communication are timing and energy consumption. Humanity's study of exoplanets via their transit across the host star highlights a means of solving both problems. An ETI 'A' can communicate with ETI 'B' if B is observing transiting planets in A's star system, either by building structures to produce artificial transits observable by B, or by emitting signals at B during transit, at significantly lower energy consumption than typical electromagnetic transmission schemes.   This can produce a network of interconnected civilisations, establishing contact via observing each other's transits. Assuming that civilisations reside in a Galactic Habitable Zone (GHZ), I conduct Monte Carlo Realisation simulations of the establishment and growth of this network, and analyse its properties in the context of graph theory.   I find that at any instant, only a few civilisations are correctly aligned to communicate via transits. However, we should expect the true network to be cumulative, where a \"handshake\" connection at any time guarantees connection in the future via e.g. electromagnetic signals. 
In all our simulations, the cumulative network connects all civilisations together in a complete network. If civilisations share knowledge of their network connections, the network can be fully complete on timescales of order a hundred thousand years. Once established, this network can connect any two civilisations either directly, or via intermediate civilisations, with a path much less than the dimensions of the GHZ.\nNeural networks are generally built by interleaving (adaptable) linear layers with (fixed) nonlinear activation functions. To increase their flexibility, several authors have proposed methods for adapting the activation functions themselves, endowing them with varying degrees of flexibility. None of these approaches, however, have gained wide acceptance in practice, and research on this topic remains open. In this paper, we introduce a novel family of flexible activation functions that are based on an inexpensive kernel expansion at every neuron. Leveraging several properties of kernel-based models, we propose multiple variations for designing and initializing these kernel activation functions (KAFs), including a multidimensional scheme that nonlinearly combines information from different paths in the network. The resulting KAFs can approximate any mapping defined over a subset of the real line, either convex or nonconvex. Furthermore, they are smooth over their entire domain, linear in their parameters, and they can be regularized using any known scheme, including the use of $\\ell_1$ penalties to enforce sparseness. To the best of our knowledge, no other known model satisfies all these properties simultaneously. In addition, we provide a relatively complete overview of alternative techniques for adapting the activation functions, which is currently lacking in the literature. A large set of experiments validates our proposal.\nThis paper introduces an unsupervised framework to extract semantically rich features for video representation. 
Inspired by how the human visual system groups objects based on motion cues, we propose a deep convolutional neural network that disentangles motion, foreground and background information. The proposed architecture consists of a 3D convolutional feature encoder for blocks of 16 frames, which is trained for reconstruction tasks over the first and last frames of the sequence. A preliminary supervised experiment was conducted to verify the feasibility of the proposed method by training the model with a fraction of videos from the UCF-101 dataset, taking as ground truth the bounding boxes around the activity regions. Qualitative results indicate that the network can successfully segment foreground and background in videos as well as update the foreground appearance based on disentangled motion features. The benefits of these learned features are shown in a discriminative classification task, where initializing the network with the proposed pretraining method outperforms both random initialization and autoencoder pretraining. Our model and source code are publicly available at https://imatge-upc.github.io/unsupervised-2017-cvprw/ .\nText-dependent speaker verification is becoming popular in the speaker recognition community. However, the conventional i-vector framework, which has been successful for speaker identification and other similar tasks, works relatively poorly in this task. Researchers have proposed several new methods to improve performance, but it is still unclear which model is the best choice, especially when the pass-phrases are prompted during enrollment and test. In this paper, we introduce four modeling methods and compare their performance on the newly published RedDots dataset. To further explore the influence of different frame alignments, Viterbi and forward-backward algorithms are both used in the HMM-based models. Several bottleneck features are also investigated. 
Our experiments show that, by explicitly modeling the lexical content, the HMM-based modeling achieves good results in the fixed-phrase condition. In the prompted-phrase condition, GMM-HMM and i-vector/HMM are not as successful. In both conditions, the forward-backward algorithm brings more benefits to the i-vector/HMM system. Additionally, we also find that even though bottleneck features perform well for text-independent speaker verification, they do not outperform MFCCs on the most challenging Imposter-Correct trials on RedDots.\nMuch of the success of single agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated because agents update their policies in parallel [11]. In this work we apply leniency [23] to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM. This introduces optimism in the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm [22] as well as a modified version we call scheduled-HDQN, that uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) [8] which include fully-cooperative sub-tasks and stochastic rewards. 
We find that LDQN agents are more likely to converge to the optimal policy in a stochastic reward CMOTP compared to standard and scheduled-HDQN agents.\nWe study revenue optimization learning algorithms for repeated posted-price auctions where a seller interacts with a single strategic buyer that holds a fixed private valuation for a good and seeks to maximize his cumulative discounted surplus. For this setting, first, we propose a novel algorithm that never decreases offered prices and has a tight strategic regret bound in $\\Theta(\\log\\log T)$ under some mild assumptions on the buyer surplus discounting. This result closes the open research question on the existence of a no-regret horizon-independent weakly consistent pricing. The proposed algorithm is inspired by our observation that a double decrease of offered prices in a weakly consistent algorithm is enough to cause a linear regret. This motivates us to construct a novel transformation that maps a right-consistent algorithm to a weakly consistent one that never decreases offered prices.   Second, we outperform the previously known strategic regret upper bound of the algorithm PRRFES, where the improvement is achieved by means of a finer constant factor $C$ of the principal term $C\\log\\log T$ in this upper bound. Finally, we generalize results on strategic regret previously known for geometric discounting of the buyer's surplus to discounting of other types, namely: the optimality of the pricing PRRFES to the case of geometrically concave decreasing discounting; and linear lower bound on the strategic regret of a wide range of horizon-independent weakly consistent algorithms to the case of arbitrary discounts.\nAmong the myriad of desirable properties discussed in the context of forgetting in Answer Set Programming (ASP), strong persistence naturally captures its essence. 
Recently, it has been shown that it is not always possible to forget a set of atoms from a program while obeying this property, and a precise criterion regarding what can be forgotten has been presented, accompanied by a class of forgetting operators that return the correct result when forgetting is possible.   However, it is an open question what to do when we have to forget a set of atoms, but cannot without violating this property. In this paper, we address this issue and investigate three natural alternatives to forget when forgetting without violating strong persistence is not possible, which turn out to correspond to the different possible relaxations of the characterization of strong persistence. Additionally, we discuss their preferable usage, shed light on the relation between forgetting and notions of relativized equivalence established earlier in the context of ASP, and present a detailed study on their computational complexity.\nAI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven't yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human \"in the loop\" and ready to intervene is currently the only way to prevent all catastrophes. We formalize human intervention for RL and show how to reduce the human labor required by training a supervised learner to imitate the human's intervention decisions. We evaluate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours. When the class of catastrophes is simple, we are able to prevent all catastrophes without affecting the agent's learning (whereas an RL baseline fails due to catastrophic forgetting). 
However, this scheme is less successful when catastrophes are more complex: it reduces but does not eliminate catastrophes, and the supervised learner fails on adversarial examples found by the agent. Extrapolating to more challenging environments, we show that our implementation would not scale (due to the infeasible amount of human labor required). We outline extensions of the scheme that are necessary if we are to train model-free agents without a single catastrophe.\nThis paper proposes a new neural architecture for collaborative ranking with implicit feedback. Our model, LRML (\\textit{Latent Relational Metric Learning}), is a novel metric learning approach for recommendation. More specifically, instead of simple push-pull mechanisms between user and item pairs, we propose to learn latent relations that describe each user-item interaction. This helps to alleviate the potential geometric inflexibility of existing metric learning approaches. This enables not only better performance but also a greater extent of modeling capability, allowing our model to scale to a larger number of interactions. In order to do so, we employ an augmented memory module and learn to attend over these memory blocks to construct latent relations. The memory-based attention module is controlled by the user-item interaction, making the learned relation vector specific to each user-item pair. Hence, this can be interpreted as learning an exclusive and optimal relational translation for each user-item interaction. The proposed architecture demonstrates state-of-the-art performance across multiple recommendation benchmarks. LRML outperforms other metric learning models by $6\\%-7.5\\%$ in terms of Hits@10 and nDCG@10 on large datasets such as Netflix and MovieLens20M. Moreover, qualitative studies also demonstrate evidence that our proposed model is able to infer and encode explicit sentiment, temporal and attribute information despite being trained only on implicit feedback. 
As such, this demonstrates the ability of LRML to uncover hidden relational structure within implicit datasets.\nIn this paper, we address the basic problem of recognizing moving objects in video images using a Visual Vocabulary model and Bag of Words, and track our object of interest in the subsequent video frames using species-inspired PSO. Initially, the shadow-free images are obtained by background modelling followed by foreground modeling to extract the blobs of our object of interest. Subsequently, we train a cubic SVM with human body datasets in accordance with our domain of interest for recognition and tracking. During training, using the principle of Bag of Words, we extract necessary features of certain domains and objects for classification. Subsequently, by matching these feature sets with those of the extracted object blobs that are obtained by subtracting the shadow-free background from the foreground, we successfully detect our object of interest in the test domain. The performance of the classification by the cubic SVM is represented by a confusion matrix and an ROC curve reflecting the accuracy of each module. After classification, our object of interest is tracked in the test domain using species-inspired PSO. By combining the adaptive learning tools with the efficient classification of description, we achieve optimum accuracy in recognition of the moving objects. We evaluate our algorithm on the benchmark datasets iLIDS, VIVID, Walking2, and Woman. 
Comparative analysis of our algorithm against existing state-of-the-art trackers shows satisfactory and competitive results.\nIn the computer vision domain, moving object tracking is considered one of the toughest problems. Many factors, such as illumination, noise, occlusion, sudden starts and stops of the moving object, and shading, make tracking harder not only against dynamic backgrounds but also against static ones. In this paper we present a new object tracking algorithm based on dominant points on the tracked object using quantum particle swarm optimization (QPSO), a variant of PSO based on quantum theory. The novelty of our approach is that it applies successfully to variable as well as static backgrounds, and the use of quantum PSO makes the algorithm run much faster, whereas basic PSO fails to do so due to heavy computation. In our approach, the dominant points of the tracked object are first detected; then a group of particles forming a swarm is initialized randomly over the image search space and searches the curvature connecting two consecutive dominant points until the particles satisfy the fitness criteria. 
This is a multi-swarm approach, since there are multiple dominant points: as they move, the curvature moves, and the swarm tracks the curvature's movement throughout the video; when the swarm reaches the optimal solution, a bounding box is drawn based on the particles' final positions. Experimental results demonstrate that the proposed QPSO-based method works efficiently and effectively for visual object tracking in both dynamic and static environments, and run-time measurements show that it runs nearly 90% faster than basic PSO. In our approach we also apply parallelism using the MATLAB parfor command to show that a very small number of iterations and a small swarm size suffice to track the object successfully.\nMany relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These goal-oriented tasks present a considerable challenge for reinforcement learning, since their natural reward function is sparse and prohibitive amounts of exploration are required to reach the goal and receive some learning signal. Past approaches tackle these problems by exploiting expert demonstrations or by manually designing a task-specific reward shaping function to guide the learning agent. Instead, we propose a method to learn these tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved. The robot is trained in reverse, gradually learning to reach the goal from a set of start states increasingly far from the goal. Our method automatically generates a curriculum of start states that adapts to the agent's performance, leading to efficient training on goal-oriented tasks. 
We demonstrate our approach on difficult simulated navigation and fine-grained manipulation problems, not solvable by state-of-the-art reinforcement learning methods.\nOne promising trend in digital system integration consists of boosting on-chip communication performance by means of silicon photonics, thus materializing the so-called Optical Networks-on-Chip (ONoCs). Among them, wavelength routing can be used to route a signal to destination by univocally associating a routing path to the wavelength of the optical carrier. Such wavelengths should be chosen so to minimize interferences among optical channels and to avoid routing faults. As a result, physical parameter selection of such networks requires the solution of complex constrained optimization problems. In previous work, published in the proceedings of the International Conference on Computer-Aided Design, we proposed and solved the problem of computing the maximum parallelism obtainable in the communication between any two endpoints while avoiding misrouting of optical signals. The underlying technology, only quickly mentioned in that paper, is Answer Set Programming (ASP). In this work, we detail the ASP approach we used to solve such problem.   Another important design issue is to select the wavelengths of optical carriers such that they are spread across the available spectrum, in order to reduce the likelihood that, due to imperfections in the manufacturing process, unintended routing faults arise. We show how to address such problem in Constraint Logic Programming on Finite Domains (CLP(FD)).   This paper is under consideration for possible publication on Theory and Practice of Logic Programming.\nWe introduce a parallel offline algorithm for computing hybrid conditional plans, called HCP-ASP, oriented towards robotics applications. 
HCP-ASP relies on modeling actuation actions and sensing actions in an expressive nonmonotonic language of answer set programming (ASP), and computation of the branches of a conditional plan in parallel using an ASP solver. In particular, thanks to external atoms, continuous feasibility checks (like collision checks) are embedded into formal representations of actuation actions and sensing actions in ASP; and thus each branch of a hybrid conditional plan describes a feasible execution of actions to reach their goals. Utilizing nonmonotonic constructs and nondeterministic choices, partial knowledge about states and nondeterministic effects of sensing actions can be explicitly formalized in ASP; and thus each branch of a conditional plan can be computed by an ASP solver without necessitating a conformant planner and an ordering of sensing actions in advance. We apply our method in a service robotics domain and report experimental evaluations. Furthermore, we present performance comparisons with other compilation based conditional planners on standardized benchmark domains. This paper is under consideration for acceptance in TPLP.\nConventional wisdom holds that model-based planning is a powerful approach to sequential decision-making. It is often very challenging in practice, however, because while a model can be used to evaluate a plan, it does not prescribe how to construct a plan. Here we introduce the \"Imagination-based Planner\", the first model-based, sequential decision-making agent that can learn to construct, evaluate, and execute plans. Before any action, it can perform a variable number of imagination steps, which involve proposing an imagined action and evaluating it with its model-based imagination. All imagined actions and outcomes are aggregated, iteratively, into a \"plan context\" which conditions future real and imagined actions. 
The agent can even decide how to imagine: testing out alternative imagined actions, chaining sequences of actions together, or building a more complex \"imagination tree\" by navigating flexibly among the previously imagined states using a learned policy. And our agent can learn to plan economically, jointly optimizing for external rewards and computational costs associated with using its imagination. We show that our architecture can learn to solve a challenging continuous control problem, and also learn elaborate planning strategies in a discrete maze-solving task. Our work opens a new direction toward learning the components of a model-based planning system and how to use them.\nBoth hybrid automata and action languages are formalisms for describing the evolution of dynamic systems. This paper establishes a formal relationship between them. We show how to succinctly represent hybrid automata in an action language which in turn is defined as a high-level notation for answer set programming modulo theories (ASPMT) --- an extension of answer set programs to the first-order level similar to the way satisfiability modulo theories (SMT) extends propositional satisfiability (SAT). We first show how to represent linear hybrid automata with convex invariants by an action language modulo theories. A further translation into SMT allows for computing them using SMT solvers that support arithmetic over reals. Next, we extend the representation to the general class of non-linear hybrid automata allowing even non-convex invariants. We represent them by an action language modulo ODE (Ordinary Differential Equations), which can be compiled into satisfiability modulo ODE. 
We developed a prototype system cplus2aspmt based on these translations, which allows for a succinct representation of hybrid transition systems that can be computed effectively by the state-of-the-art SMT solver dReal.\nThis paper studies Bayesian ranking and selection (R&S) problems with correlated prior beliefs and continuous domains, i.e. Bayesian optimization (BO). Knowledge gradient methods [Frazier et al., 2008, 2009] have been widely studied for discrete R&S problems, which sample the one-step Bayes-optimal point. When used over continuous domains, previous work on the knowledge gradient [Scott et al., 2011, Wu and Frazier, 2016, Wu et al., 2017] often relies on a discretized finite approximation. However, the discretization introduces error and scales poorly as the dimension of the domain grows. In this paper, we develop a fast discretization-free knowledge gradient method for Bayesian optimization. Our method is not restricted to the fully sequential setting, but is useful in all settings where knowledge gradient can be used over continuous domains. We show how our method can be generalized to handle (i) suggesting batches of points (parallel knowledge gradient); (ii) the setting where derivative information is available in the optimization process (derivative-enabled knowledge gradient). In numerical experiments, we demonstrate that the discretization-free knowledge gradient method finds global optima significantly faster than previous Bayesian optimization algorithms on both synthetic test functions and real-world applications, especially when function evaluations are noisy; and derivative-enabled knowledge gradient can further improve performance, even outperforming gradient-based optimizers such as BFGS when derivative information is available.\nImitation learning algorithms learn viable policies by imitating an expert's behavior when reward signals are not available. 
Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's behavior is available as a fixed set of trajectories. We evaluate in terms of the expert's cost function and observe that the distribution of trajectory-costs is often more heavy-tailed for GAIL-agents than the expert at a number of benchmark continuous-control tasks. Thus, high-cost trajectories, corresponding to tail-end events of catastrophic failure, are more likely to be encountered by the GAIL-agents than the expert. This makes the reliability of GAIL-agents questionable when it comes to deployment in risk-sensitive applications like robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of tail-end events by minimizing tail risk within the GAIL framework. We quantify tail risk by the Conditional-Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that the policies learned with RAIL show lower tail-end risk than those of vanilla GAIL. Thus the proposed RAIL algorithm appears as a potent alternative to GAIL for improved reliability in risk-sensitive applications.\nPredictive business process monitoring refers to the act of making predictions about the future state of ongoing cases of a business process, based on their incomplete execution traces and logs of historical (completed) traces. Motivated by the increasingly pervasive availability of fine-grained event data about business process executions, the problem of predictive process monitoring has received substantial attention in the past years. In particular, a considerable number of methods have been put forward to address the problem of outcome-oriented predictive process monitoring, which refers to classifying each ongoing case of a process according to a given set of possible outcomes - e.g. Will the customer complain or not? Will an order be delivered, cancelled or withdrawn? 
Unfortunately, different authors have used different datasets, experimental settings, evaluation measures and baselines to assess their proposals, resulting in poor comparability and an unclear picture of the relative merits and applicability of different methods. To address this gap, this article presents a systematic review and taxonomy of outcome-oriented predictive process monitoring methods, and a comparative experimental evaluation of eleven representative methods using a benchmark covering twelve predictive process monitoring tasks based on four real-life event logs.\nIn recent work, we proved that the domain recursion inference rule makes domain-lifted inference possible on several relational probability models (RPMs) for which the best known time complexity used to be exponential. We also identified two classes of RPMs for which inference becomes domain lifted when using domain recursion. These two classes subsume the largest lifted classes that were previously known. In this paper, we show that domain recursion can also be applied to models with existential quantifiers. Currently, all lifted inference algorithms assume that existential quantifiers have been removed in pre-processing by Skolemization. We show that besides introducing potentially inconvenient negative weights, Skolemization may increase the time complexity of inference. We give two example models where domain recursion can replace Skolemization, avoids the need for dealing with negative numbers, and reduces the time complexity of inference. 
These two examples may be interesting from three theoretical aspects: 1- they provide a better and deeper understanding of domain recursion and, in general, (lifted) inference, 2- they may serve as evidence that there are larger classes of models for which domain recursion can satisfyingly replace Skolemization, and 3- they may serve as evidence that better Skolemization techniques exist.\nMany real world systems need to operate on heterogeneous information networks that consist of numerous interacting components of different types. Examples include systems that perform data analysis on biological information networks; social networks; and information extraction systems processing unstructured data to convert raw text to knowledge graphs. Many previous works describe specialized approaches to perform specific types of analysis, mining and learning on such networks. In this work, we propose a unified framework consisting of a data model (a graph with a first-order schema) along with a declarative language for constructing, querying and manipulating such networks in ways that facilitate relational and structured machine learning. In particular, we provide an initial prototype for a relational and graph traversal query language where queries are directly used as relational features for structured machine learning models. Feature extraction is performed by making declarative graph traversal queries. Learning and inference models can directly operate on this relational representation and augment it with new data and knowledge that, in turn, is integrated seamlessly into the relational structure to support new predictions. We demonstrate this system's capabilities by showcasing tasks in natural language processing and computational biology domains.\nThe theory of belief functions is an effective tool for dealing with uncertain information from multiple sources. 
In recent years, many evidence combination rules have been proposed in this framework, such as the conjunctive rule, the cautious rule, the PCR (Proportional Conflict Redistribution) rules and so on. These rules can be adopted for different types of sources. However, most of these rules are not applicable when the number of sources is large. This is due to either the complexity or the existence of an absorbing element (such as the total conflict mass function for the conjunctive-based rules when applied to unreliable evidence). In this paper, based on the assumption that the majority of sources are reliable, a combination rule for a large number of sources, named LNS (for Large Number of Sources), is proposed on the basis of a simple idea: the more common ideas one source shares with others, the more reliable the source is. This rule is adaptable for aggregating a large number of sources among which some are unreliable. It will keep the spirit of the conjunctive rule to reinforce the belief on the focal elements with which the sources are in agreement. The mass on the empty set will be kept as an indicator of the conflict. Moreover, it can be used to elicit the major opinion among the experts. The experimental results on synthetic mass functions verify that the rule can be effectively used to combine a large number of mass functions and to elicit the major opinion.\nOne of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, making strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. 
To address the above limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities --- like user interactions, community dynamics, and textual content --- to automatically assess the credibility of user-contributed online content, and the expertise of users and their evolution with user-interpretable explanations. To this end, we devise new models based on Conditional Random Fields for different settings like incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side-effects of drugs from user-contributed posts in health forums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture these dynamics, we propose generative models based on Hidden Markov Model, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information.\nIn this paper we introduce {\\em global and local announcement logic} (GLAL), a dynamic epistemic logic with two distinct announcement operators -- $[\\phi]^+_A$ and $[\\phi]^-_A$ indexed to a subset $A$ of the set $Ag$ of all agents -- for global and local announcements respectively. The boundary case $[\\phi]^+_{Ag}$ corresponds to the public announcement of $\\phi$, as known from the literature. 
Unlike standard public announcements, which are {\\em model transformers}, the global and local announcements are {\\em pointed model transformers}. In particular, the update induced by the announcement may be different in different states of the model. Therefore, the resulting computations are trees of models, rather than the typical sequences. A consequence of our semantics is that modally bisimilar states may be distinguished in our logic. Then, we provide a stronger notion of bisimilarity and we show that it preserves modal equivalence in GLAL. Additionally, we show that GLAL is strictly more expressive than public announcement logic with common knowledge. We prove a wide range of validities for GLAL involving the interaction between dynamics and knowledge, and show that the satisfiability problem for GLAL is decidable. We illustrate the formal machinery by means of detailed epistemic scenarios.\nIn the typical framework for boolean games (BG) each player can change the truth value of some propositional atoms, while attempting to make her goal true. In standard BG goals are propositional formulas, whereas in iterated BG goals are formulas of Linear Temporal Logic. Both notions of BG are characterised by the fact that agents have exclusive control over their set of atoms, meaning that no two agents can control the same atom. In the present contribution we drop the exclusivity assumption and explore structures where an atom can be controlled by multiple agents. We introduce Concurrent Game Structures with Shared Propositional Control (CGS-SPC) and show that they account for several classes of repeated games, including iterated boolean games, influence games, and aggregation games. Our main result shows that, as far as verification is concerned, CGS-SPC can be reduced to concurrent game structures with exclusive control. 
This result provides a polynomial reduction for the model checking problem of specifications in Alternating-time Temporal Logic on CGS-SPC.\nLanguage is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora. In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, and a trained model further amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for collective inference. Our method results in almost no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 47.5% and 40.5% for multilabel classification and visual semantic role labeling, respectively.\nLow-rank modeling has many important applications in computer vision and machine learning. While the matrix rank is often approximated by the convex nuclear norm, the use of nonconvex low-rank regularizers has demonstrated better empirical performance. However, the resulting optimization problem is much more challenging. Recent state-of-the-art requires an expensive full SVD in each iteration. In this paper, we show that for many commonly-used nonconvex low-rank regularizers, a cutoff can be derived to automatically threshold the singular values obtained from the proximal operator. 
This allows such an operator to be efficiently approximated by the power method. Based on it, we develop a proximal gradient algorithm (and its accelerated variant) with inexact proximal splitting and prove that a convergence rate of O(1/T), where T is the number of iterations, is guaranteed. Furthermore, we show the proposed algorithm can be well parallelized, achieving nearly linear speedup w.r.t. the number of threads. Extensive experiments are performed on matrix completion and robust principal component analysis, which show a significant speedup over the state-of-the-art. Moreover, the matrix solution obtained is more accurate and has a lower rank than that of the nuclear norm regularizer.\nThe past decade has seen a revolution in genomic technologies that enable a flood of genome-wide profiling of chromatin marks. Recent literature tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand which factors are relevant and how they work together. Previous studies either failed to model complex dependencies among input signals or relied on separate feature analysis to explain the decisions. This paper presents an attention-based deep learning approach, which we call AttentiveChrome, that uses a unified architecture to model and to interpret dependencies among chromatin factors for controlling gene regulation. AttentiveChrome uses a hierarchy of multiple Long short-term memory (LSTM) modules to encode the input signals and to model how various chromatin marks cooperate automatically. AttentiveChrome trains two levels of attention jointly with the target prediction, enabling it to attend differentially to relevant marks and to locate important positions per mark. We evaluate the model across 56 different cell types (tasks) in humans. 
Not only is the proposed architecture more accurate, but its attention scores also provide a better interpretation than state-of-the-art feature visualization methods such as saliency maps. Code and data are shared at www.deepchrome.org.\nHuman aware planning requires an agent to be aware of the intentions, capabilities and mental model of the human in the loop during its decision process. This can involve generating plans that are explicable to a human observer as well as the ability to provide explanations when such plans cannot be generated. This has led to the notion of \"multi-model planning\", which aims to incorporate effects of human expectation in the deliberative process of a planner - either in the form of explicable task planning or explanations produced thereof. In this paper, we bring these two concepts together and show how a planner can account for both these needs and achieve a trade-off during the plan generation process itself by means of a model-space search method MEGA. This in effect provides a comprehensive perspective of what it means for a decision making agent to be \"human-aware\" by bringing together existing principles of planning under the umbrella of a single plan generation process. We situate our discussion specifically keeping in mind the recent work on explicable planning and explanation generation, and illustrate these concepts in modified versions of two well known planning domains, as well as a demonstration on a robot involved in a typical search and reconnaissance task with an external supervisor.\nAs computers handle more and more complicated tasks in variable environments, it becomes an urgent requirement that artificial intelligence be able to judge and decide automatically according to numerous specific conditions so as to deal with complicated and variable cases. ANNs inspired by the brain are a good candidate. 
However, most of current numeric ANNs are not good at representing logical relations because these models still try to represent logical relations in the form of ratio based on functional approximation. On the other hand, researchers have been trying to design novel neural network models to make neural network model represent logical relations. In this work, a novel neural network model specified for representing logical relations is proposed and applied. New neurons and multiple kinds of links are defined. Inhibitory links are introduced besides exciting links. Different from current numeric ANNs, one end of an inhibitory link connects an exciting link rather than a neuron. Inhibitory links inhibit the connected exciting links conditionally to make this neural network model represent logical relations correctly. This model can simulate the operations of Boolean logic gates, and construct complex logical relations with the advantages of simpler neural network structures than recent works in this area. This work provides some ideas to make neural networks represent logical relations more directly and efficiently, and the model could be used as the complement to current numeric ANN to deal with logical issues and expand the application areas of ANN.\nWhile there is currently a lot of enthusiasm about \"big data\", useful data is usually \"small\" and expensive to acquire. In this paper, we present a new paradigm of learning partial differential equations from {\\em small} data. In particular, we introduce \\emph{hidden physics models}, which are essentially data-efficient learning machines capable of leveraging the underlying laws of physics, expressed by time dependent and nonlinear partial differential equations, to extract patterns from high-dimensional data generated from experiments. The proposed methodology may be applied to the problem of learning, system identification, or data-driven discovery of partial differential equations. 
Our framework relies on Gaussian processes, a powerful tool for probabilistic inference over functions, that enables us to strike a balance between model complexity and data fitting. The effectiveness of the proposed approach is demonstrated through a variety of canonical problems, spanning a number of scientific domains, including the Navier-Stokes, Schr\\\"odinger, Kuramoto-Sivashinsky, and time dependent linear fractional equations. The methodology provides a promising new direction for harnessing the long-standing developments of classical methods in applied mathematics and mathematical physics to design learning machines with the ability to operate in complex domains without requiring large quantities of data.\nDeep neural networks have become ubiquitous for applications related to visual recognition and language understanding tasks. However, it is often prohibitive to use typical neural networks on devices like mobile phones or smart watches since the model sizes are huge and cannot fit in the limited memory available on such devices. While these devices could make use of machine learning models running on high-performance data centers with CPUs or GPUs, this is not feasible for many applications because data can be privacy sensitive and inference needs to be performed directly \"on\" device.   We introduce a new architecture for training compact neural networks using a joint optimization framework. At its core lies a novel objective that jointly trains using two different types of networks--a full trainer neural network (using existing architectures like Feed-forward NNs or LSTM RNNs) combined with a simpler \"projection\" network that leverages random projections to transform inputs or intermediate representations into bits. The simpler network encodes lightweight and efficient-to-compute operations in bit space with a low memory footprint. 
The two networks are trained jointly using backpropagation, where the projection network learns from the full network similar to apprenticeship learning. Once trained, the smaller network can be used directly for inference at low memory and computation cost. We demonstrate the effectiveness of the new approach at significantly shrinking the memory requirements of different types of neural networks while preserving good accuracy on visual recognition and text classification tasks. We also study the question \"how many neural bits are required to solve a given task?\" using the new framework and show empirical results contrasting model predictive capacity (in bits) versus accuracy on several datasets.\nRecent studies have shown that attackers can force deep learning models to misclassify so-called \"adversarial examples\": maliciously generated images formed by making imperceptible modifications to pixel values. With growing interest in deep learning for security applications, it is important for security experts and users of machine learning to recognize how learning systems may be attacked. Due to the complex nature of deep learning, it is challenging to understand how deep models can be fooled by adversarial examples. Thus, we present a web-based visualization tool, Adversarial-Playground, to demonstrate the efficacy of common adversarial methods against a convolutional neural network (CNN) system. Adversarial-Playground is educational, modular and interactive. (1) It enables non-experts to compare examples visually and to understand why an adversarial example can fool a CNN-based image classifier. (2) It can help security experts explore further vulnerabilities of deep learning as a software module. (3) Building an interactive visualization is challenging in this domain due to the large feature space of image classification (generating adversarial examples is slow in general and visualizing images is costly). 
Through multiple novel design choices, our tool can provide fast and accurate responses to user requests. Empirically, we find that our client-server division strategy reduced the response time by an average of 1.5 seconds per sample. Our other innovation, a faster variant of the JSMA evasion algorithm, empirically performed twice as fast as JSMA yet maintained a comparable evasion rate. Project source code and data from our experiments are available at: https://github.com/QData/AdversarialDNN-Playground\nSerious games are beneficial for education in various computer science areas. Numerous works have reported the experiences of using games (not only playing but also development) in teaching and learning. Considering it could be difficult for teachers/students to prepare/develop a game from scratch during one semester, supporting educational materials would be crucial in the corresponding courses. Unfortunately, the literature shows that not many materials from educational game projects are shared. To help different educators identify suitable courseware and help students implement game development, it is worth further investigating and accumulating the educational resources from individual game projects. Following such an idea, this paper proposes a game development project of an object-oriented Sokoban solver, and exposes relevant educational materials. The documented system design can be viewed as a ready-to-use resource for education in object-oriented analysis and design (OOAD), while the Sokoban solver itself may be used as an assignment platform for teaching artificial intelligence (AI). Further documentation, platform, and APIs will be realized and shared in the future to facilitate others' educational activities. Overall, this work is intended to inspire and encourage other researchers and educators to post available materials of more game projects for the purpose of sharing and reuse.\nReasoning is a crucial part of natural language argumentation. 
To comprehend an argument, one must analyze its warrant, which explains why its claim follows from its premises. As arguments are highly contextualized, warrants are usually presupposed and left implicit. Thus, the comprehension does not only require language understanding and logic skills, but also depends on common sense. In this paper we develop a methodology for reconstructing warrants systematically. We operationalize it in a scalable crowdsourcing process, resulting in a freely licensed dataset with warrants for 2k authentic arguments from news comments. On this basis, we present a new challenging task, the argument reasoning comprehension task. Given an argument with a claim and a premise, the goal is to choose the correct implicit warrant from two options. Both warrants are plausible and lexically close, but lead to contradicting claims. A solution to this task will define a substantial step towards automatic warrant reconstruction. However, experiments with several neural attention and language models reveal that current approaches do not suffice.\nIn this paper, we propose the nonlinearity generation method to speed up and stabilize the training of deep convolutional neural networks. The proposed method modifies a family of activation functions as nonlinearity generators (NGs). NGs make the activation functions linear symmetric for their inputs to lower model capacity, and automatically introduce nonlinearity to enhance the capacity of the model during training. The proposed method can be considered an unusual form of regularization: the model parameters are obtained by training a relatively low-capacity model, that is relatively easy to optimize at the beginning, with only a few iterations, and these parameters are reused for the initialization of a higher-capacity model. We derive the upper and lower bounds of variance of the weight variation, and show that the initial symmetric structure of NGs helps stabilize training. 
We evaluate the proposed method on different frameworks of convolutional neural networks over two object recognition benchmark tasks (CIFAR-10 and CIFAR-100). Experimental results showed that the proposed method allows us to (1) speed up the convergence of training, (2) allow for less careful weight initialization, (3) improve or at least maintain the performance of the model at negligible extra computational cost, and (4) easily train a very deep model.\nThe multi-armed bandit problem forms the foundation for solving a wide range of on-line stochastic optimization problems through a simple, yet effective mechanism. One simply casts the problem as a gambler that repeatedly pulls one out of N slot machine arms, eliciting random rewards. Learning of reward probabilities is then combined with reward maximization, by carefully balancing reward exploration against reward exploitation. In this paper, we address a particularly intriguing variant of the multi-armed bandit problem, referred to as the {\\it Stochastic Point Location (SPL) Problem}. The gambler is here only told whether the optimal arm (point) lies to the \"left\" or to the \"right\" of the arm pulled, with the feedback being erroneous with probability $1-\\pi$. This formulation thus captures optimization in continuous action spaces with both {\\it informative} and {\\it deceptive} feedback. To tackle this class of problems, we formulate a compact and scalable Bayesian representation of the solution space that simultaneously captures both the location of the optimal arm as well as the probability of receiving correct feedback. We further introduce the accompanying Thompson Sampling guided Stochastic Point Location (TS-SPL) scheme for balancing exploration against exploitation. By learning $\\pi$, TS-SPL also supports {\\it deceptive} environments that are lying about the direction of the optimal arm. This, in turn, allows us to solve the fundamental Stochastic Root Finding (SRF) Problem. 
Empirical results demonstrate that our scheme deals with both deceptive and informative environments, significantly outperforming competing algorithms both for SRF and SPL.\nBackground: Road collisions and casualties pose a serious threat to commuters around the globe. Autonomous Vehicles (AVs) aim to use technology to reduce road accidents. However, most research on collision avoidance has addressed rear-end, front-end and lateral collisions separately, in less congested traffic with high inter-vehicular distances. Purpose: The goal of this paper is to introduce the concept of a social agent, which interacts with other AVs in a social manner, as humans do, with the capability of predicting intentions (mentalizing) and copying each other's actions (mirroring). The proposed social agent is based on human-brain-inspired mentalizing and mirroring capabilities and has been modelled for collision detection and avoidance in congested urban road traffic. Method: We designed our social agent with the capabilities of mentalizing and mirroring; for this purpose we used the Exploratory Agent Based Modeling (EABM) level of the Cognitive Agent Based Computing (CABC) framework proposed by Niazi and Hussain. Results: Our simulation and practical experiments reveal that by embedding Richardson's arms race model within AVs, collisions can be avoided while travelling on congested urban roads in flock-like topologies. The performance of the proposed social agent has been compared at two different levels.\nDeep neural networks are used in many state-of-the-art systems for machine perception. Once a network is trained to do a specific task, e.g., bird classification, it cannot easily be trained to do new tasks, e.g., incrementally learning to recognize additional bird species or learning an entirely different task such as flower recognition. 
When new tasks are added, typical deep neural networks are prone to catastrophically forgetting previous tasks. Networks that are capable of assimilating new information incrementally, much like how humans form new memories over time, will be more efficient than re-training the model from scratch each time a new task needs to be learned. There have been multiple attempts to develop schemes that mitigate catastrophic forgetting, but these methods have not been directly compared, the tests used to evaluate them vary considerably, and these methods have only been evaluated on small-scale problems (e.g., MNIST). In this paper, we introduce new metrics and benchmarks for directly comparing five different mechanisms designed to mitigate catastrophic forgetting in neural networks: regularization, ensembling, rehearsal, dual-memory, and sparse-coding. Our experiments on real-world images and sounds show that the mechanism(s) that are critical for optimal performance vary based on the incremental training paradigm and type of data being used, but they all demonstrate that the catastrophic forgetting problem has yet to be solved.\nHighly automated robot ecologies (HARE), or societies of independent autonomous robots or agents, are rapidly becoming an important part of much of the world's critical infrastructure. As with human societies, regulation, wherein a governing body designs rules and processes for the society, plays an important role in ensuring that HARE meet societal objectives. However, to date, a careful study of interactions between a regulator and HARE is lacking. In this paper, we report on three user studies which give insights into how to design systems that allow people, acting as the regulatory authority, to effectively interact with HARE. 
As in the study of political systems in which governments regulate human societies, our studies analyze how interactions between HARE and regulators are impacted by regulatory power and individual (robot or agent) autonomy. Our results show that regulator power, decision support, and adaptive autonomy can each diminish the social welfare of HARE, and hint at how these seemingly desirable mechanisms can be designed so that they become part of successful HARE.\nAs deep neural networks become more complex and input datasets grow larger, it can take days or even weeks to train a deep neural network to the desired accuracy. Therefore, distributed Deep Learning at a massive scale is a critical capability, since it offers the potential to reduce the training time from weeks to hours. In this paper, we present a software-hardware co-optimized distributed Deep Learning system that can achieve near-linear scaling up to hundreds of GPUs. The core algorithm is a multi-ring communication pattern that provides a good tradeoff between latency and bandwidth and adapts to a variety of system configurations. The communication algorithm is implemented as a library for easy use. This library has been integrated into Tensorflow, Caffe, and Torch. We train Resnet-101 on Imagenet 22K with 64 IBM Power8 S822LC servers (256 GPUs) in about 7 hours to an accuracy of 33.8 % validation accuracy. Microsoft's ADAM and Google's DistBelief results did not reach 30 % validation accuracy for Imagenet 22K. Compared to Facebook AI Research's recent paper on 256 GPU training, we use a different communication algorithm, and our combined software and hardware system offers better communication overhead for Resnet-50. 
A PowerAI DDL enabled version of Torch completed 90 epochs of training on Resnet 50 for 1K classes in 50 minutes using 64 IBM Power8 S822LC servers (256 GPUs).\nDespite rapid advances in face recognition, there remains a clear gap between the performance of still image-based face recognition and video-based face recognition, due to the vast difference in visual quality between the domains and the difficulty of curating diverse large-scale video datasets. This paper addresses both of those challenges, through an image to video feature-level domain adaptation approach, to learn discriminative video frame representations. The framework utilizes large-scale unlabeled video data to reduce the gap between different domains while transferring discriminative knowledge from large-scale labeled still images. Given a face recognition network that is pretrained in the image domain, the adaptation is achieved by (i) distilling knowledge from the network to a video adaptation network through feature matching, (ii) performing feature restoration through synthetic data augmentation and (iii) learning a domain-invariant feature through a domain adversarial discriminator. We further improve performance through a discriminator-guided feature fusion that boosts high-quality frames while eliminating those degraded by video domain-specific factors. Experiments on the YouTube Faces and IJB-A datasets demonstrate that each module contributes to our feature-level domain adaptation framework and substantially improves video face recognition performance to achieve state-of-the-art accuracy. We demonstrate qualitatively that the network learns to suppress diverse artifacts in videos such as pose, illumination or occlusion without being explicitly trained for them.\nComputational Algebraic Geometry, as applied in Algebraic Statistics, is beginning to explore new branches and applications in artificial intelligence and other areas. 
Currently, mathematics is developing very extensively, and it is difficult to see the immediate application of some theorems in other areas, as is the case of Theorem 3.9, given in [10] and partly proved here. This work also aims to show the Hilbert basis as a powerful tool in data science; for that reason we compile important results proved in works by S. Watanabe [27], D. Cox, J. Little and H. Schenck [8], B. Sturmfels [16] and G. Ewald [10]. In this work we first study the fundamental concepts of Toric Algebraic Geometry. The principal contribution of this work is the application of the Hilbert basis (as one realization of Theorem 3.9) to the resolution of singularities with toric varieties, together with a background on lattice polytopes. In the second part we apply this theorem to problems in statistical learning, principally in a recent area, Singular Learning Theory. We define singular machines and the problem of Singular Learning through the computation of learning curves on these statistical machines. We review and compile results from the work of S. Watanabe in Singular Learning Theory [17], [20], [21], also revisiting the important result in [26] that almost all learning machines are singular; we formalize this theory with a toric resolution morphism in a theorem proved here (Theorem 5.4), characterizing these learning machines as toric varieties, and we reproduce results previously published in Singular Statistical Learning [19], [20], [23].\nCross-modal hashing is usually regarded as an effective technique for large-scale textual-visual cross retrieval, where data from different modalities are mapped into a shared Hamming space for matching. Most of the traditional textual-visual binary encoding methods only consider holistic image representations and fail to model descriptive sentences. 
This renders existing methods inappropriate to handle the rich semantics of informative cross-modal data for quality textual-visual search tasks. To address the problem of hashing cross-modal data with semantic-rich cues, in this paper, a novel integrated deep architecture is developed to effectively encode the detailed semantics of informative images and long descriptive sentences, named as Textual-Visual Deep Binaries (TVDB). In particular, region-based convolutional networks with long short-term memory units are introduced to fully explore image regional details while semantic cues of sentences are modeled by a text convolutional network. Additionally, we propose a stochastic batch-wise training routine, where high-quality binary codes and deep encoding functions are efficiently optimized in an alternating manner. Experiments are conducted on three multimedia datasets, i.e. Microsoft COCO, IAPR TC-12, and INRIA Web Queries, where the proposed TVDB model significantly outperforms state-of-the-art binary coding methods in the task of cross-modal retrieval.\nThe study of causality or causal inference - how much a given treatment causally affects a given outcome in a population - goes way beyond correlation or association analysis of variables, and is critical in making sound data driven decisions and policies in a multitude of applications. The gold standard in causal inference is performing \"controlled experiments\", which often is not possible due to logistical or ethical reasons. As an alternative, inferring causality on \"observational data\" based on the \"Neyman-Rubin potential outcome model\" has been extensively used in statistics, economics, and social sciences over several decades. In this paper, we present a formal framework for sound causal analysis on observational datasets that are given as multiple relations and where the population under study is obtained by joining these base relations. 
We study a crucial condition for inferring causality from observational data, called the \"strong ignorability assumption\" (the treatment and outcome variables should be independent in the joined relation given the observed covariates), using known conditional independences that hold in the base relations. We also discuss how the structure of the conditional independences in base relations given as graphical models help infer new conditional independences in the joined relation. The proposed framework combines concepts from databases, statistics, and graphical models, and aims to initiate new research directions spanning these fields to facilitate powerful data-driven decisions in today's big data world.\nIn the context of superintelligent AI systems, the term \"oracle\" has two meanings. One refers to modular systems queried for domain-specific tasks. Another usage, referring to a class of systems which may be useful for addressing the value alignment and AI control problems, is a superintelligent AI system that only answers questions. The aim of this manuscript is to survey contemporary research problems related to oracles which align with long-term research goals of AI safety. We examine existing question answering systems and argue that their high degree of architectural heterogeneity makes them poor candidates for rigorous analysis as oracles. On the other hand, we identify computer algebra systems (CASs) as being primitive examples of domain-specific oracles for mathematics and argue that efforts to integrate computer algebra systems with theorem provers, systems which have largely been developed independent of one another, provide a concrete set of problems related to the notion of provable safety that has emerged in the AI safety community. 
We review approaches to interfacing CASs with theorem provers, describe well-defined architectural deficiencies that have been identified with CASs, and suggest possible lines of research and practical software projects for scientists interested in AI safety.\nModel-free deep reinforcement learning algorithms have been shown to be capable of learning a wide range of robotic skills, but typically require a very large number of samples to achieve good performance. Model-based algorithms, in principle, can provide for much more efficient learning, but have proven difficult to extend to expressive, high-capacity models such as deep neural networks. In this work, we demonstrate that medium-sized neural network models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits to accomplish various complex locomotion tasks. We also propose using deep neural network dynamics models to initialize a model-free learner, in order to combine the sample efficiency of model-based approaches with the high task-specific performance of model-free methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure model-based approach trained on just random action data can follow arbitrary trajectories with excellent sample efficiency, and that our hybrid algorithm can accelerate model-free learning on high-speed benchmark tasks, achieving sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents. Videos can be found at https://sites.google.com/view/mbmf\nDetection of surface water in natural environment via multi-spectral imagery has been widely utilized in many fields, such land cover identification. However, due to the similarity of the spectra of water bodies, built-up areas, approaches based on high-resolution satellites sometimes confuse these features. 
A popular direction to detect water is spectral index, often requiring the ground truth to find appropriate thresholds manually. As for traditional machine learning methods, they identify water merely via differences of spectra of various land covers, without taking specific properties of spectral reflection into account. In this paper, we propose an automatic approach to detect water bodies based on Dempster-Shafer theory, combining supervised learning with specific property of water in spectral band in a fully unsupervised context. The benefits of our approach are twofold. On the one hand, it performs well in mapping principle water bodies, including little streams and branches. On the other hand, it labels all objects usually confused with water as `ignorance', including half-dry watery areas, built-up areas and semi-transparent clouds and shadows. `Ignorance' indicates not only limitations of the spectral properties of water and supervised learning itself but insufficiency of information from multi-spectral bands as well, providing valuable information for further land cover classification.\nThe simulation of pedestrian crowd that reflects reality is a major challenge for researches. Several crowd simulation models have been proposed such as cellular automata model, agent-based model, fluid dynamic model, etc. It is important to note that agent-based model is able, over others approaches, to provide a natural description of the system and then to capture complex human behaviors. In this paper, we propose a multi-agent simulation model in which pedestrian positions are updated at discrete time intervals. It takes into account the major normal conditions of a simple pedestrian situated in a crowd such as preferences, realistic perception of environment, etc. Our objective is to simulate the pedestrian crowd realistically towards a simulation of believable pedestrian behaviors. 
Typical pedestrian phenomena, including the unidirectional and bidirectional movement in a corridor as well as the flow through bottleneck, are simulated. The conducted simulations show that our model is able to produce realistic pedestrian behaviors. The obtained fundamental diagram and flow rate at bottleneck agree very well with classic conclusions and empirical study results. It is hoped that the idea of this study may be helpful in promoting the modeling and simulation of pedestrian crowd in a simple way.\nIn recent years supervised representation learning has provided state of the art or close to the state of the art results in semantic analysis tasks including ranking and information retrieval. The core idea is to learn how to embed items into a latent space such that they optimize a supervised objective in that latent space. The dimensions of the latent space have no clear semantics, and this reduces the interpretability of the system. For example, in personalization models, it is hard to explain why a particular item is ranked high for a given user profile. We propose a novel model of representation learning called Supervised Explicit Semantic Analysis (SESA) that is trained in a supervised fashion to embed items to a set of dimensions with explicit semantics. The model learns to compare two objects by representing them in this explicit space, where each dimension corresponds to a concept from a knowledge base. This work extends Explicit Semantic Analysis (ESA) with a supervised model for ranking problems. We apply this model to the task of Job-Profile relevance in LinkedIn in which a set of skills defines our explicit dimensions of the space. Every profile and job are encoded to this set of skills their similarity is calculated in this space. We use RNNs to embed text input into this space. In addition to interpretability, our model makes use of the web-scale collaborative skills data that is provided by users for each LinkedIn profile. 
Our model provides state of the art result while it remains interpretable.\nThe ability to learn new tasks and generalize performance to others is one of the most remarkable characteristics of the human brain and of recent AI systems. The ability to perform multiple tasks simultaneously is also a signature characteristic of large-scale parallel architectures, that is evident in the human brain, and has been exploited effectively more traditional, massively parallel computational architectures. Here, we show that these two characteristics are in tension, reflecting a fundamental tradeoff between interactive parallelism that supports learning and generalization, and independent parallelism that supports processing efficiency through concurrent multitasking. We formally show that, while the maximum number of tasks that can be performed simultaneously grows linearly with network size, under realistic scenarios (e.g. in an unpredictable environment), the expected number that can be performed concurrently grows radically sub-linearly with network size. Hence, even modest reliance on shared representation strictly constrains the number of tasks that can be performed simultaneously, implying profound consequences for the development of artificial intelligence that optimally manages the tradeoff between learning and processing, and for understanding the human brains remarkably puzzling mix of sequential and parallel capabilities.\nData-driven techniques are used in cyber-physical systems (CPS) for controlling autonomous vehicles, handling demand responses for energy management, and modeling human physiology for medical devices. These data-driven techniques extract models from training data, where their performance is often analyzed with respect to random errors in the training data. However, if the training data is maliciously altered by attackers, the effect of these attacks on the learning algorithms underpinning data-driven CPS have yet to be considered. 
In this paper, we analyze the resilience of classification algorithms to training data attacks. Specifically, a generic metric is proposed that is tailored to measure resilience of classification algorithms with respect to worst-case tampering of the training data. Using the metric, we show that traditional linear classification algorithms are resilient under restricted conditions. To overcome these limitations, we propose a linear classification algorithm with a majority constraint and prove that it is strictly more resilient than the traditional algorithms. Evaluations on both synthetic data and a real-world retrospective arrhythmia medical case-study show that the traditional algorithms are vulnerable to tampered training data, whereas the proposed algorithm is more resilient (as measured by worst-case tampering).\nUsers try to articulate their complex information needs during search sessions by reformulating their queries. To make this process more effective, search engines provide related queries to help users in specifying the information need in their search process. In this paper, we propose a customized sequence-to-sequence model for session-based query suggestion. In our model, we employ a query-aware attention mechanism to capture the structure of the session context. This enables us to control the scope of the session from which we infer the suggested next query, which helps not only handle the noisy data but also automatically detect session boundaries. Furthermore, we observe that, based on the user query reformulation behavior, within a single session a large portion of query terms is retained from the previously submitted queries and consists of mostly infrequent or unseen terms that are usually not included in the vocabulary. We therefore empower the decoder of our model to access the source words from the session context during decoding by incorporating a copy mechanism. 
Moreover, we propose evaluation metrics to assess the quality of the generative models for query suggestion. We conduct an extensive set of experiments and analysis. The results suggest that our model outperforms the baselines both in terms of the generating queries and scoring candidate queries for the task of query suggestion.\nActive Object Recognition (AOR) has been approached as an unsupervised learning problem, in which optimal trajectories for object inspection are not known and are to be discovered by reducing label uncertainty measures or training with reinforcement learning. Such approaches have no guarantees of the quality of their solution. In this paper, we treat AOR as a Partially Observable Markov Decision Process (POMDP) and find near-optimal policies on training data using Belief Tree Search (BTS) on the corresponding belief Markov Decision Process (MDP). AOR then reduces to the problem of knowledge transfer from near-optimal policies on training set to the test set. We train a Long Short Term Memory (LSTM) network to predict the best next action on the training set rollouts. We show that the proposed AOR method generalizes well to novel views of familiar objects and also to novel objects. We compare this supervised scheme against guided policy search, and find that the LSTM network reaches higher recognition accuracy compared to the guided policy method. We further look into optimizing the observation function to increase the total collected reward of optimal policy. In AOR, the observation function is known only approximately. We propose a gradient-based method update to this approximate observation function to increase the total reward of any policy. We show that by optimizing the observation function and retraining the supervised LSTM network, the AOR performance on the test set improves significantly.\nOver 150,000 new people in the United States are diagnosed with colorectal cancer each year. Nearly a third die from it (American Cancer Society). 
The only approved noninvasive diagnosis tools currently involve fecal blood count tests (FOBTs) or stool DNA tests. Fecal blood count tests take only five minutes and are available over the counter for as low as \\$15. They are highly specific, yet not nearly as sensitive, yielding a high percentage (25%) of false negatives (Colon Cancer Alliance). Moreover, FOBT results are far too generalized, meaning that a positive result could mean much more than just colorectal cancer, and could just as easily mean hemorrhoids, anal fissure, proctitis, Crohn's disease, diverticulosis, ulcerative colitis, rectal ulcer, rectal prolapse, ischemic colitis, angiodysplasia, rectal trauma, proctitis from radiation therapy, and others. Stool DNA tests, the modern benchmark for CRC screening, have a much higher sensitivity and specificity, but also cost \\$600, take two weeks to process, and are not for high-risk individuals or people with a history of polyps. To yield a cheap and effective CRC screening alternative, a unique ensemble-based classification algorithm is put in place that considers the FIT result, BMI, smoking history, and diabetic status of patients. This method is tested under ten-fold cross validation to have a .95 AUC, 92% specificity, 89% sensitivity, .88 F1, and 90% precision. Once clinically validated, this test promises to be cheaper, faster, and potentially more accurate when compared to a stool DNA test.\nNeuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. 
Here we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multi-core neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.\nFrom medicines to materials, small organic molecules are indispensable for human well-being. To plan their syntheses, chemists employ a problem solving technique called retrosynthesis. In retrosynthesis, target molecules are recursively transformed into increasingly simpler precursor compounds until a set of readily available starting materials is obtained. Computer-aided retrosynthesis would be a highly valuable tool, however, past approaches were slow and provided results of unsatisfactory quality. Here, we employ Monte Carlo Tree Search (MCTS) to efficiently discover retrosynthetic routes. MCTS was combined with an expansion policy network that guides the search, and an \"in-scope\" filter network to pre-select the most promising retrosynthetic steps. These deep neural networks were trained on 12 million reactions, which represents essentially all reactions ever published in organic chemistry. 
Our system solves almost twice as many molecules and is 30 times faster in comparison to the traditional search method based on extracted rules and hand-coded heuristics. Finally after a 60 year history of computer-aided synthesis planning, chemists can no longer distinguish between routes generated by a computer system and real routes taken from the scientific literature. We anticipate that our method will accelerate drug and materials discovery by assisting chemists to plan better syntheses faster, and by enabling fully automated robot synthesis.\nThe K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with the most complex classifiers in the literature. The core of this classifier depends mainly on measuring the distance or similarity between the tested example and the training examples. This raises a major question about which distance measures to be used for the KNN classifier among a large number of distance and similarity measures? This review attempts to answer the previous question through evaluating the performance (measured by accuracy, precision and recall) of the KNN using a large number of distance measures, tested on a number of real world datasets, with and without adding different levels of noise. The experimental results show that the performance of KNN classifier depends significantly on the distance used, the results showed large gaps between the performances of different distances. We found that a recently proposed non-convex distance performed the best when applied on most datasets comparing to the other tested distances. In addition, the performance of the KNN degraded only about $20\\%$ while the noise level reaches $90\\%$, this is true for all the distances used. This means that the KNN classifier using any of the top $10$ distances tolerate noise to a certain degree. 
Moreover, the results show that some distances are less affected by the added noise comparing to other distances.\nPrevious research on automatic pain estimation from facial expressions has focused primarily on \"one-size-fits-all\" metrics (such as PSPI). In this work, we focus on directly estimating each individual's self-reported visual-analog scale (VAS) pain metric, as this is considered the gold standard for pain measurement. The VAS pain score is highly subjective and context-dependent, and its range can vary significantly among different persons. To tackle these issues, we propose a novel two-stage personalized model, named DeepFaceLIFT, for automatic estimation of VAS. This model is based on (1) Neural Network and (2) Gaussian process regression models, and is used to personalize the estimation of self-reported pain via a set of hand-crafted personal features and multi-task learning. We show on the benchmark dataset for pain analysis (The UNBC-McMaster Shoulder Pain Expression Archive) that the proposed personalized model largely outperforms the traditional, unpersonalized models: the intra-class correlation improves from a baseline performance of 19\\% to a personalized performance of 35\\% while also providing confidence in the model's estimates -- in contrast to existing models for the target task. Additionally, DeepFaceLIFT automatically discovers the pain-relevant facial regions for each person, allowing for an easy interpretation of the pain-related facial cues.\nTraining model to generate data has increasingly attracted research attention and become important in modern world applications. We propose in this paper a new geometry-based optimization approach to address this problem. 
Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of minimal enclosing ball to train a generator G\\left(\\bz\\right) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring data generated are also lying on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely simple and easy-to-control optimization formulation, avoidance of mode collapsing and efficiently learn data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthesis and real-world datasets to illustrate the behaviors, strength and weakness of our proposed GEN, in particular its ability to handle multi-modal data and quality of generated data.\nMany popular knowledge graphs such as Freebase, YAGO or DBPedia maintain a list of non-discrete attributes for each entity. Intuitively, these attributes such as height, price or population count are able to richly characterize entities in knowledge graphs. This additional source of information may help to alleviate the inherent sparsity and incompleteness problem that are prevalent in knowledge graphs. Unfortunately, many state-of-the-art relational learning models ignore this information due to the challenging nature of dealing with non-discrete data types in the inherently binary-natured knowledge graphs. In this paper, we propose a novel multi-task neural network approach for both encoding and prediction of non-discrete attribute information in a relational setting. 
Specifically, we train a neural network for triplet prediction along with a separate network for attribute value regression. Via multi-task learning, we are able to learn representations of entities, relations and attributes that encode information about both tasks. Moreover, such attributes are not only central to many predictive tasks as an information source but also as a prediction target. Therefore, models that are able to encode, incorporate and predict such information in a relational learning context are highly attractive as well. We show that our approach outperforms many state-of-the-art methods for the tasks of relational triplet classification and attribute value prediction.\nAs AI continues to advance, human-AI teams are inevitable. However, progress in AI is routinely measured in isolation, without a human in the loop. It is crucial to benchmark progress in AI, not just in isolation, but also in terms of how it translates to helping humans perform certain tasks, i.e., the performance of human-AI teams.   In this work, we design a cooperative game - GuessWhich - to measure human-AI team performance in the specific context of the AI being a visual conversational agent. GuessWhich involves live interaction between the human and the AI. The AI, which we call ALICE, is provided an image which is unseen by the human. Following a brief description of the image, the human questions ALICE about this secret image to identify it from a fixed pool of images.   We measure performance of the human-ALICE team by the number of guesses it takes the human to correctly identify the secret image after a fixed number of dialog rounds with ALICE. We compare performance of the human-ALICE teams for two versions of ALICE. 
Our human studies suggest a counterintuitive trend - that while AI literature shows that one version outperforms the other when paired with an AI questioner bot, we find that this improvement in AI-AI performance does not translate to improved human-AI performance. This suggests a mismatch between benchmarking of AI in isolation and in the context of human-AI teams.\nMusic is usually highly structured and it is still an open question how to design models which can successfully learn to recognize and represent musical structure. A fundamental problem is that structurally related patterns can have very distinct appearances, because the structural relationships are often based on transformations of musical material, like chromatic or diatonic transposition, inversion, retrograde, or rhythm change. In this preliminary work, we study the potential of two unsupervised learning techniques - Restricted Boltzmann Machines (RBMs) and Gated Autoencoders (GAEs) - to capture pre-defined transformations from constructed data pairs. We evaluate the models by using the learned representations as inputs in a discriminative task where for a given type of transformation (e.g. diatonic transposition), the specific relation between two musical patterns must be recognized (e.g. an upward transposition of diatonic steps). Furthermore, we measure the reconstruction error of models when reconstructing musical transformed patterns. Lastly, we test the models in an analogy-making task. We find that it is difficult to learn musical transformations with the RBM and that the GAE is much more adequate for this task, since it is able to learn representations of specific transformations that are largely content-invariant. 
We believe these results show that models such as GAEs may provide the basis for more encompassing music analysis systems, by endowing them with a better understanding of the structures underlying music.\nTraining deep networks is expensive and time-consuming with the training period increasing with data size and growth in model parameters. In this paper, we provide a framework for distributed training of deep networks over a cluster of CPUs in Apache Spark. The framework implements both Data Parallelism and Model Parallelism making it suitable to use for deep networks which require huge training data and model parameters which are too big to fit into the memory of a single machine. It can be scaled easily over a cluster of cheap commodity hardware to attain significant speedup and obtain better results making it quite economical as compared to farm of GPUs and supercomputers. We have proposed a new algorithm for training of deep networks for the case when the network is partitioned across the machines (Model Parallelism) along with detailed cost analysis and proof of convergence of the same. We have developed implementations for Fully-Connected Feedforward Networks, Convolutional Neural Networks, Recurrent Neural Networks and Long Short-Term Memory architectures. We present the results of extensive simulations demonstrating the speedup and accuracy obtained by our framework for different sizes of the data and model parameters with variation in the number of worker cores/partitions; thereby showing that our proposed framework can achieve significant speedup (upto 11X for CNN) and is also quite scalable.\nWe consider a wide range of regularized stochastic minimization problems with two regularization terms, one of which is composed with a linear function. 
This optimization model abstracts a number of important applications in artificial intelligence and machine learning, such as fused Lasso, fused logistic regression, and a class of graph-guided regularized minimization. The computational challenges of this model are in two folds. On one hand, the closed-form solution of the proximal mapping associated with the composed regularization term or the expected objective function is not available. On the other hand, the calculation of the full gradient of the expectation in the objective is very expensive when the number of input data samples is considerably large. To address these issues, we propose a stochastic variant of extra-gradient type methods, namely \\textsf{Stochastic Primal-Dual Proximal ExtraGradient descent (SPDPEG)}, and analyze its convergence property for both convex and strongly convex objectives. For general convex objectives, the uniformly average iterates generated by \\textsf{SPDPEG} converge in expectation with $O(1/\\sqrt{t})$ rate. While for strongly convex objectives, the uniformly and non-uniformly average iterates generated by \\textsf{SPDPEG} converge with $O(\\log(t)/t)$ and $O(1/t)$ rates, respectively. The order of the rate of the proposed algorithm is known to match the best convergence rate for first-order stochastic algorithms. Experiments on fused logistic regression and graph-guided regularized logistic regression problems show that the proposed algorithm performs very efficiently and consistently outperforms other competing algorithms.\nThe Recurrent Chinese Restaurant Process (RCRP) is a powerful statistical method for modeling evolving clusters in large scale social media data. With the RCRP, one can allow both the number of clusters and the cluster parameters in a model to change over time. However, application of the RCRP has largely been limited due to the non-conjugacy between the cluster evolutionary priors and the Multinomial likelihood. 
This non-conjugacy makes inference difficult and restricts the scalability of models which use the RCRP, leading to the RCRP being applied only in simple problems, such as those that can be approximated by a single Gaussian emission. In this paper, we provide a novel solution for the non-conjugacy issues for the RCRP and an example of how to leverage our solution for one specific problem - the social event discovery problem. By utilizing Sequential Monte Carlo methods in inference, our approach can be massively parallelized and is highly scalable, to the extent it can work on tens of millions of documents. We are able to generate high quality topical and location distributions of the clusters that can be directly interpreted as real social events, and our experimental results suggest that the approaches proposed achieve much better predictive performance than techniques reported in prior work. We also demonstrate how the techniques we develop can be used in much more general ways toward similar problems.\nWe model the spread of news as a social learning game on a network. Agents can either endorse or oppose a claim made in a piece of news, which itself may be either true or false. Agents base their decision on a private signal and their neighbors' past actions. Given these inputs, agents follow strategies derived via multi-agent deep reinforcement learning and receive utility from acting in accordance with the veracity of claims. Our framework yields strategies with agent utility close to a theoretical, Bayes-optimal benchmark, while remaining flexible to model re-specification. Optimized strategies allow agents to correctly identify most false claims, when all agents receive unbiased private signals. However, an adversary's attempt to spread fake news by targeting a subset of agents with a biased private signal can be successful. This is even more so when the adversary has information about agents' network position or private signal. 
When agents are aware of the presence of an adversary they re-optimize their strategies in the training stage and the adversary's attack is less effective. Hence, exposing agents to the possibility of fake news can be an effective way to curtail the spread of fake news in social networks. Our results also highlight that information about the users' private beliefs and their social network structure can be extremely valuable to adversaries and should be well protected.\nActual causation is concerned with the question \"what caused what?\". Consider a transition between two subsequent observations within a system of elements. Even under perfect knowledge of the system, a straightforward answer to this question may not be available. Counterfactual accounts of actual causation based on graphical models, paired with system interventions, have demonstrated initial success in addressing specific problem cases. We present a formal account of actual causation, applicable to discrete dynamical systems of interacting elements, that considers all counterfactual states of a state transition from t-1 to t. Within such a transition, causal links are considered from two complementary points of view: we can ask if any occurrence at time t has an actual cause at t-1, but also if any occurrence at time t-1 has an actual effect at t. We address the problem of identifying such actual causes and actual effects in a principled manner by starting from a set of basic requirements for causation (existence, composition, information, integration, and exclusion). We present a formal framework to implement these requirements based on system manipulations and partitions. This framework is used to provide a complete causal account of the transition by identifying and quantifying the strength of all actual causes and effects linking two occurrences. 
Finally, we examine several exemplary cases and paradoxes of causation and show that they can be illuminated by the proposed framework for quantifying actual causation.\nAesthetic quality prediction is a challenging task in the computer vision community because of the complex interplay with semantic contents and photographic technologies. Recent studies on deep-learning-based aesthetic quality assessment usually use a binary high-low label or a numerical score to represent the aesthetic quality. However, a scalar representation cannot adequately describe the underlying variety of human aesthetic perception. In this work, we propose to predict the aesthetic score distribution (i.e., a score distribution vector of the ordinal basic human ratings) using a Deep Convolutional Neural Network (DCNN). Conventional DCNNs, which aim to minimize the difference between the predicted scalar numbers or vectors and the ground truth, cannot be directly used for the ordinal basic rating distribution. Thus, a novel CNN based on the Cumulative distribution with Jensen-Shannon divergence (CJS-CNN) is presented to predict the aesthetic score distribution of human ratings, with a new reliability-sensitive learning method based on the kurtosis of the score distribution, which eliminates the requirement of the original full data of human ratings (without normalization). Experimental results on a large-scale aesthetic dataset demonstrate the effectiveness of our introduced CJS-CNN in this task.\nThe volume and velocity of information that gets generated online limits the ability of current journalistic practices to fact-check claims at the same rate. Computational approaches for fact checking may be the key to helping mitigate the risks of massive misinformation spread. Such approaches can be designed to not only be scalable and effective at assessing veracity of dubious claims, but also to boost a human fact checker's productivity by surfacing relevant facts and patterns to aid their analysis. 
To this end, we present a novel, unsupervised network-flow based approach to determine the truthfulness of a statement of fact expressed in the form of a (subject, predicate, object) triple. We view a knowledge graph of background information about real-world entities as a flow network, and knowledge as a fluid, abstract commodity. We show that computational fact checking of such a triple then amounts to finding a \"knowledge stream\" that emanates from the subject node and flows toward the object node through paths connecting them. Evaluation on a range of real-world and hand-crafted datasets of facts related to entertainment, business, sports, geography and more reveals that this network-flow model can be very effective in discerning true statements from false ones, outperforming existing algorithms on many test cases. Moreover, the model is expressive in its ability to automatically discover several useful path patterns and surface relevant facts that may help a human fact checker corroborate or refute a claim.\nThis paper provides a theoretical justification of the superior classification performance of deep rectifier networks over shallow rectifier networks from the geometrical perspective of piecewise linear (PWL) classifier boundaries. We show that, for a given threshold on the approximation error, the required number of boundary facets to approximate a general smooth boundary grows exponentially with the dimension of the data, and thus the number of boundary facets, referred to as boundary resolution, of a PWL classifier is an important quality measure that can be used to estimate a lower bound on the classification errors. However, naively learning an exponentially large number of boundary facets requires the determination of an exponentially large number of parameters and also requires an exponentially large number of training patterns. 
To overcome this \"curse of dimensionality\", compressive representations of high resolution classifier boundaries are required. To show the superior compressive power of deep rectifier networks over shallow rectifier networks, we prove that the maximum boundary resolution of a single hidden layer rectifier network classifier grows exponentially with the number of units when this number is smaller than the dimension of the patterns. When the number of units is larger than the dimension of the patterns, the growth rate is reduced to a polynomial order. Consequently, the capacity of generating a high resolution boundary will increase if the same large number of units is arranged in multiple layers instead of a single hidden layer. Taking high dimensional spherical boundaries as examples, we show how deep rectifier networks can utilize geometric symmetries to approximate a boundary with the same accuracy but with significantly fewer parameters than single hidden layer nets.\nWe address a problem of area protection in graph-based scenarios with multiple agents. The problem consists of two adversarial teams of agents that move in an undirected graph shared by both teams. Agents are placed in vertices of the graph; at most one agent can occupy a vertex; and they can move into adjacent vertices in a conflict-free way. Teams have asymmetric goals: the aim of one team - attackers - is to invade a given area while the aim of the opponent team - defenders - is to protect the area from being entered by attackers by occupying selected vertices. We study strategies for allocating vertices to be occupied by the team of defenders to block attacking agents. We show that the decision version of the problem of area protection is PSPACE-hard under the assumption that agents can allocate their target vertices multiple times. 
Further, we develop various on-line vertex-allocation strategies for the defender team in a simplified variant of the problem with single-stage vertex allocation and evaluate their performance in multiple benchmarks. The success of a strategy is heavily dependent on the type of the instance, and so one of the contributions of this work is that we identify suitable vertex-allocation strategies for diverse instance types. In particular, we introduce a simulation-based method that identifies and tries to capture bottlenecks in the graph that are frequently used by the attackers. Our experimental evaluation suggests that this method often allows a successful defense even in instances where the attackers significantly outnumber the defenders.\nThis paper focuses on the problem of learning 6-DOF grasping with a parallel jaw gripper in simulation. We propose the notion of a geometry-aware representation in grasping based on the assumption that knowledge of 3D geometry is at the heart of interaction. Our key idea is constraining and regularizing grasping interaction learning through 3D geometry prediction. Specifically, we formulate the learning of a deep geometry-aware grasping model in two steps: First, we learn to build a mental geometry-aware representation by reconstructing the scene (i.e., 3D occupancy grid) from RGBD input via generative 3D shape modeling. Second, we learn to predict the grasping outcome with its internal geometry-aware representation. The learned outcome prediction model is used to sequentially propose grasping solutions via analysis-by-synthesis optimization. Our contributions are fourfold: (1) To the best of our knowledge, we are presenting for the first time a method to learn a 6-DOF grasping net from RGBD input; (2) We build a grasping dataset from demonstrations in virtual reality with rich sensory and interaction annotations. 
This dataset includes 101 everyday objects spread across 7 categories; additionally, we propose a data augmentation strategy for effective learning; (3) We demonstrate that the learned geometry-aware representation leads to about 10 percent relative performance improvement over the baseline CNN on grasping objects from our dataset; (4) We further demonstrate that the model generalizes to novel viewpoints and object instances.\nInformation theory is a mathematical theory of learning with deep connections to topics as diverse as artificial intelligence, statistical physics, and biological evolution. Many primers on the topic paint a broad picture with relatively little mathematical sophistication, while many others develop specific application areas in detail. In contrast, these informal notes aim to outline some elements of the information-theoretic \"way of thinking,\" by cutting a rapid and interesting path through some of the theory's foundational concepts and theorems. We take the Kullback-Leibler divergence as our foundational concept, and then proceed to develop the entropy and mutual information. We discuss some of the main foundational results, including the Chernoff bounds as a characterization of the divergence; Gibbs' Theorem; and the Data Processing Inequality. A recurring theme is that the definitions of information theory support natural theorems that sound \"obvious\" when translated into English. More pithily, \"information theory makes common sense precise.\" Since the focus of the notes is not primarily on technical details, proofs are provided only where the relevant techniques are illustrative of broader themes. Otherwise, proofs and intriguing tangents are referenced in liberally-sprinkled footnotes. 
The notes close with a highly nonexhaustive list of references to resources and other perspectives on the field.\nWe look at the unbiased Maker-Breaker Hamiltonicity game played on the edge set of a complete graph $K_n$, where Maker's goal is to claim a Hamiltonian cycle. First, we prove that, independent of who starts, Maker can win the game for $n = 8$ and $n = 9$. Then we use an inductive argument to show that, independent of who starts, Maker can win the game if and only if $n \\geq 8$. This, in particular, resolves in the affirmative the long-standing conjecture of Papaioannou.   We also study two standard positional games related to Hamiltonicity game. For Hamiltonian Path game, we show that Maker can claim a Hamiltonian path if and only if $n \\geq 5$, independent of who starts. Next, we look at Fixed Hamiltonian Path game, where the goal of Maker is to claim a Hamiltonian path between two predetermined vertices. We prove that if Maker starts the game, he wins if and only if $n \\geq 7$, and if Breaker starts, Maker wins if and only if $n \\geq 8$. Using this result, we are able to improve the previously best upper bound on the smallest number of edges a graph on $n$ vertices can have, knowing that Maker can win the Maker-Breaker Hamiltonicity game played on its edges.   To resolve the outcomes of the mentioned games on small (finite) boards, we devise algorithms for efficiently searching game trees and then obtain our results with the help of a computer.\nWe introduce a dynamic mechanism for the solution of analytically-tractable substructure in probabilistic programs, using conjugate priors and affine transformations to reduce variance in Monte Carlo estimators. For inference with Sequential Monte Carlo, this automatically yields improvements such as locally-optimal proposals and Rao-Blackwellization. The mechanism maintains a directed graph alongside the running program that evolves dynamically as operations are triggered upon it. 
Nodes of the graph represent random variables, edges the analytically-tractable relationships between them. Random variables remain in the graph for as long as possible, to be sampled only when they are used by the program in a way that cannot be resolved analytically. In the meantime, they are conditioned on as many observations as possible. We demonstrate the mechanism with a few pedagogical examples, as well as a linear-nonlinear state-space model with simulated data, and an epidemiological model with real data of a dengue outbreak in Micronesia. In all cases, one or more variables are automatically marginalized out to significantly reduce variance in estimates of the marginal likelihood, in the final case facilitating a random-weight or pseudo-marginal-type importance sampler for parameter estimation. We have implemented the approach in Anglican and a new probabilistic programming language called Birch.\nA dependency graph, a heterogeneous graph representing the intrinsic relationships between different pairs of system entities, is essential to many data analysis applications, such as root cause diagnosis, intrusion detection, etc. Given a well-trained dependency graph from a source domain and an immature dependency graph from a target domain, how can we extract the entity and dependency knowledge from the source to enhance the target? One way is to directly apply a mature dependency graph learned from a source domain to the target domain. But due to the domain variety problem, directly using the source dependency graph often cannot achieve good performance. Traditional transfer learning methods mainly focus on numerical data and are not applicable.   In this paper, we propose ACRET, a knowledge transfer based model for accelerating dependency graph learning from heterogeneous categorical event streams. 
In particular, we first propose an entity estimation model to filter out irrelevant entities from the source domain based on entity embedding and manifold learning. Only the entities with statistically high correlations are transferred to the target domain. On the surviving entities, we propose a dependency construction model for constructing the unbiased dependency relationships by solving a two-constraint optimization problem. The experimental results on synthetic and real-world datasets demonstrate the effectiveness and efficiency of ACRET. We also apply ACRET to a real enterprise security system for intrusion detection. Our method achieves superior detection performance, with a lead time of at least 20 days and more than 70% accuracy.\nWe investigate task clustering for deep-learning based multi-task and few-shot learning in a many-task setting. We propose a new method to measure task similarities with a cross-task transfer performance matrix for the deep learning scenario. Although this matrix provides critical information regarding similarity between tasks, its asymmetric property and unreliable performance scores can affect conventional clustering methods adversely. Additionally, the uncertain task-pairs, i.e., the ones with extremely asymmetric transfer scores, may collectively mislead clustering algorithms to output an inaccurate task-partition. To overcome these limitations, we propose a novel task-clustering algorithm by using the matrix completion technique. The proposed algorithm constructs a partially-observed similarity matrix based on the certainty of cluster membership of the task-pairs. We then use a matrix completion algorithm to complete the similarity matrix. Our theoretical analysis shows that under mild constraints, the proposed algorithm will perfectly recover the underlying \"true\" similarity matrix with a high probability. 
Our results show that the new task clustering method can discover task clusters for training flexible and superior neural network models in a multi-task learning setup for sentiment classification and dialog intent classification tasks. Our task clustering approach also extends metric-based few-shot learning methods to adapt multiple metrics, which demonstrates empirical advantages when the tasks are diverse.\nTraffic speed is a key indicator for the efficiency of an urban transportation system. Accurate modeling of the spatiotemporally varying traffic speed thus plays a crucial role in urban planning and development. This paper addresses the problem of efficient fine-grained traffic speed prediction using big traffic data obtained from static sensors. Gaussian processes (GPs) have been previously used to model various traffic phenomena, including flow and speed. However, GPs do not scale with big traffic data due to their cubic time complexity. In this work, we address their efficiency issues by proposing local GPs to learn from and make predictions for correlated subsets of data. The main idea is to quickly group speed variables in both spatial and temporal dimensions into a finite number of clusters, so that future and unobserved traffic speed queries can be heuristically mapped to one of these clusters. A local GP corresponding to that cluster can then be trained on the fly to make predictions in real-time. We call this method localization. We use non-negative matrix factorization for localization and propose simple heuristics for cluster mapping. We additionally leverage the expressiveness of GP kernel functions to model road network topology and incorporate side information. Extensive experiments using real-world traffic data collected in the two U.S. 
cities of Pittsburgh and Washington, D.C., show that our proposed local GPs significantly improve both runtime performance and prediction accuracy compared to the baseline global and local GPs.\nActive appearance models (AAMs) are a class of generative models that have seen tremendous success in face analysis. However, model learning depends on the availability of detailed annotation of canonical landmark points. As a result, when accurate AAM fitting is required on a different set of variations (expression, pose, identity), a new dataset must be collected and annotated. To overcome the need for time-consuming data collection and annotation, transfer learning approaches have received recent attention. The goal is to transfer knowledge from previously available datasets (source) to a new dataset (target). We propose a subspace transfer learning method, in which we select a subspace from the source that best describes the target space. We propose a metric to compute the directional similarity between the source eigenvectors and the target subspace. We show an equivalence between this metric and the variance of target data when projected onto source eigenvectors. Using this equivalence, we select a subset of source principal directions that capture the variance in target data. To define our model, we augment the selected source subspace with the target subspace learned from a handful of target examples. In experiments done on six publicly available datasets, we show that our approach outperforms the state of the art in terms of the RMS fitting error as well as the percentage of test examples for which AAM fitting converges to the ground truth.\nRecent advances in Deep Neural Networks (DNNs) have led to the development of DNN-driven autonomous cars that, using sensors like cameras, LiDAR, etc., can drive without any human intervention. 
Most major manufacturers including Tesla, GM, Ford, BMW, and Waymo/Google are working on building and testing different types of autonomous vehicles. The lawmakers of several US states including California, Texas, and New York have passed new legislation to fast-track the process of testing and deployment of autonomous vehicles on their roads.   However, despite their spectacular progress, DNNs, just like traditional software, often demonstrate incorrect or unexpected corner case behaviors that can lead to potentially fatal collisions. Several such real-world accidents involving autonomous cars have already happened, including one which resulted in a fatality. Most existing testing techniques for DNN-driven vehicles are heavily dependent on the manual collection of test data under different driving conditions, which becomes prohibitively expensive as the number of test conditions increases.   In this paper, we design, implement and evaluate DeepTest, a systematic testing tool for automatically detecting erroneous behaviors of DNN-driven vehicles that can potentially lead to fatal crashes. First, our tool is designed to automatically generate test cases leveraging real-world changes in driving conditions like rain, fog, lighting conditions, etc. DeepTest systematically explores different parts of the DNN logic by generating test inputs that maximize the number of activated neurons. DeepTest found thousands of erroneous behaviors under different realistic driving conditions (e.g., blurring, rain, fog, etc.), many of which lead to potentially fatal crashes in three top-performing DNNs in the Udacity self-driving car challenge.\nKnowing where people live is a fundamental component of many decision making processes such as urban development, infectious disease containment, evacuation planning, risk management, conservation planning, and more. 
While bottom-up, survey-driven censuses can provide a comprehensive view into the population landscape of a country, they are expensive to realize, are infrequently performed, and only provide population counts over broad areas. Population disaggregation techniques and population projection methods individually address these shortcomings, but also have shortcomings of their own. To jointly answer the questions of \"where do people live\" and \"how many people live there,\" we propose a deep learning model for creating high-resolution population estimations from satellite imagery. Specifically, we train convolutional neural networks to predict population in the USA at a $0.01^{\\circ} \\times 0.01^{\\circ}$ resolution grid from 1-year composite Landsat imagery. We validate these models in two ways: quantitatively, by comparing our model's grid cell estimates aggregated at the county level to several US Census county-level population projections, and qualitatively, by directly interpreting the model's predictions in terms of the satellite image inputs. We find that aggregating our model's estimates gives comparable results to the Census county-level population projections and that the predictions made by our model can be directly interpreted, which gives it advantages over traditional population disaggregation methods. In general, our model is an example of how machine learning techniques can be an effective tool for extracting information from inherently unstructured, remotely sensed data to provide effective solutions to social problems.\nChemical multisensor devices need calibration algorithms to estimate gas concentrations. Their possible adoption as indicative air quality measurement devices poses new challenges due to the need to operate in continuous monitoring modes in uncontrolled environments. Several issues, including slow dynamics, continue to affect their real-world performance. 
At the same time, the ability to estimate pollutant concentrations on board the devices, especially for wearables and IoT deployments, is becoming highly desirable. In this framework, several calibration approaches have been proposed and tested on a variety of proprietary devices and datasets; still, no thorough comparison is available to researchers. This work attempts a benchmarking of the most promising calibration algorithms according to recent literature with a focus on machine learning approaches. We test the techniques against absolute and dynamic performances, generalization capabilities and computational/storage needs using three different datasets sharing continuous monitoring operation methodology. Our results can guide researchers and engineers in the choice of the optimal strategy. They show that non-linear multivariate techniques yield reproducible results, outperforming linear approaches. Specifically, the Support Vector Regression method consistently shows good performance in all the considered scenarios. We highlight the enhanced suitability of shallow neural networks in a trade-off between performance and computational/storage needs. We confirm, on a much wider basis, the advantages of dynamic approaches with respect to static ones that only rely on instantaneous sensor array response. The latter have been shown to be the best choice whenever prompt and precise response is needed.\nAnomaly detectors are often used to produce a ranked list of statistical anomalies, which are examined by human analysts in order to extract the actual anomalies of interest. Unfortunately, in real-world applications, this process can be exceedingly difficult for the analyst since a large fraction of high-ranking anomalies are false positives and not interesting from the application perspective. In this paper, we aim to make the analyst's job easier by allowing for analyst feedback during the investigation process. 
Ideally, the feedback influences the ranking of the anomaly detector in a way that reduces the number of false positives that must be examined before discovering the anomalies of interest. In particular, we introduce a novel technique for incorporating simple binary feedback into tree-based anomaly detectors. We focus on the Isolation Forest algorithm as a representative tree-based anomaly detector, and show that we can significantly improve its performance by incorporating feedback, when compared with the baseline algorithm that does not incorporate feedback. Our technique is simple and scales well as the size of the data increases, which makes it suitable for interactive discovery of anomalies in large datasets.\nIn recent years, researchers have achieved considerable success applying neural network methods to question answering (QA). These approaches have achieved state-of-the-art results in simplified closed-domain settings such as the SQuAD (Rajpurkar et al., 2016) dataset, which provides a pre-selected passage, from which the answer to a given question may be extracted. More recently, researchers have begun to tackle open-domain QA, in which the model is given a question and access to a large corpus (e.g., Wikipedia) instead of a pre-selected passage (Chen et al., 2017a). This setting is more complex as it requires large-scale search for relevant passages by an information retrieval component, combined with a reading comprehension model that \"reads\" the passages to generate an answer to the question. Performance in this setting lags considerably behind closed-domain performance. In this paper, we present a novel open-domain QA system called Reinforced Ranker-Reader $(R^3)$, based on two algorithmic innovations. First, we propose a new pipeline for open-domain QA with a Ranker component, which learns to rank retrieved passages in terms of likelihood of generating the ground-truth answer to a given question. 
Second, we propose a novel method that jointly trains the Ranker along with an answer-generation Reader model, based on reinforcement learning. We report extensive experimental results showing that our method significantly improves on the state of the art for multiple open-domain QA datasets.\nA Behavior Tree (BT) is a way to structure the switching between different tasks in an autonomous agent, such as a robot or a virtual entity in a computer game. BTs are a very efficient way of creating complex systems that are both modular and reactive. These properties are crucial in many applications, which has led to the spread of BT from computer game programming to many branches of AI and Robotics. In this book, we will first give an introduction to BTs, then we describe how BTs relate to, and in many cases generalize, earlier switching structures. These ideas are then used as a foundation for a set of efficient and easy to use design principles. Properties such as safety, robustness, and efficiency are important for an autonomous system, and we describe a set of tools for formally analyzing these using a state space description of BTs. With the new analysis tools, we can formalize the descriptions of how BTs generalize earlier approaches. We also show the use of BTs in automated planning and machine learning. Finally, we describe an extended set of tools to capture the behavior of Stochastic BTs, where the outcomes of actions are described by probabilities. These tools enable the computation of both success probabilities and time to completion.\nA significant amount of the world's knowledge is stored in relational databases. However, the ability for users to retrieve facts from a database is limited due to a lack of understanding of query languages such as SQL. We propose Seq2SQL, a deep neural network for translating natural language questions to corresponding SQL queries. 
Our model leverages the structure of SQL queries to significantly reduce the output space of generated queries. Moreover, we use rewards from in-the-loop query execution over the database to learn a policy to generate unordered parts of the query, which we show are less suitable for optimization via cross entropy loss. In addition, we will publish WikiSQL, a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia. This dataset is required to train our model and is an order of magnitude larger than comparable datasets. By applying policy-based reinforcement learning with a query execution environment to WikiSQL, our model Seq2SQL outperforms attentional sequence to sequence models, improving execution accuracy from 35.9% to 59.4% and logical form accuracy from 23.4% to 48.3%.\nThe introduction of artificial intelligence (AI) on visual images for emotional analysis obliterates the natural subjectivity and contextual dependence of our facial displays. Emotion AI places itself as an algorithmic lens on our digital artifacts and real-time interactions, creating the illusion of a new, objective class of data: our emotional and mental states. Building upon a rich network of existing public photographs--as well as fresh feeds from surveillance footage or smart phone cameras--these emotion algorithms require no additional infrastructure or improvements on image quality. In order to examine the potential policy and legal remedies for emotion AI as an emerging technology, we first establish a framework of actors, collection motivations, time scales, and space considerations that differentiates emotion AI from other algorithmic lenses. Each of these elements influences available policy remedies, and should shape continuing discussions on the antecedent conditions that make emotional AI acceptable or not in particular contexts. 
Based on our framework of unique elements, we examine potential available policy remedies to prevent or remediate harm. Specifically, our paper looks toward the regulatory role of the Federal Trade Commission in the US, gaps in the EU's General Data Protection Regulation (GDPR) allowing for emotion data collection, and precedent set by polygraph technologies in evidentiary and use restrictions set by law. We also examine the way social norms and adaptations could grow to also modulate broader use. Given the challenges in controlling the flow of these data, we call for further research and attention as emotion AI technology remains poised for adoption.\nOne key challenge in talent search is to translate complex criteria of a hiring position into a search query, while it is relatively easy for a searcher to list examples of suitable candidates for a given position. To improve search efficiency, we propose the next generation of talent search at LinkedIn, also referred to as Search By Ideal Candidates. In this system, a searcher provides one or several ideal candidates as the input to hire for a given position. The system then generates a query based on the ideal candidates and uses it to retrieve and rank results. Shifting from the traditional Query-By-Keyword to this new Query-By-Example system poses a number of challenges: How to generate a query that best describes the candidates? When moving to a completely different paradigm, how does one leverage previous product logs to learn ranking models and/or evaluate the new system with no existing usage logs? Finally, given the different nature between the two search paradigms, the ranking features typically used for Query-By-Keyword systems might not be optimal for Query-By-Example. This paper describes our approach to solving these challenges. We present experimental results confirming the effectiveness of the proposed solution, particularly on query building and search ranking tasks. 
As of writing this paper, the new system has been available to all LinkedIn members.\nMore and more of the information available on the web is dialogic, and a significant portion of it takes place in online forum conversations about current social and political topics. We aim to develop tools to summarize what these conversations are about. What are the CENTRAL PROPOSITIONS associated with different stances on an issue, what are the abstract objects under discussion that are central to a speaker's argument? How can we recognize that two CENTRAL PROPOSITIONS realize the same FACET of the argument? We hypothesize that the CENTRAL PROPOSITIONS are exactly those arguments that people find most salient, and use human summarization as a probe for discovering them. We describe our corpus of human summaries of opinionated dialogs, then show how we can identify similar repeated arguments, and group them into FACETS across many discussions of a topic. We define a new task, ARGUMENT FACET SIMILARITY (AFS), and show that we can predict AFS with a .54 correlation score, versus an n-gram system baseline of .39 and a semantic textual similarity system baseline of .45.\nWe address a problem of area protection in graph-based scenarios with multiple mobile agents where connectivity is maintained among agents to ensure they can communicate. The problem consists of two adversarial teams of agents that move in an undirected graph shared by both teams. Agents are placed in vertices of the graph; at most one agent can occupy a vertex; and they can move into adjacent vertices in a conflict-free way. Teams have asymmetric goals: the aim of one team - attackers - is to invade a given area while the aim of the opponent team - defenders - is to protect the area from being entered by attackers by occupying selected vertices. The team of defenders needs to maintain connectivity of vertices occupied by its own agents in a visibility graph.
The visibility graph models the possibility of communication between pairs of vertices. We study strategies for allocating vertices to be occupied by the team of defenders to block attacking agents while maintaining connectivity at the same time. To do this, we reserve a subset of defending agents that do not try to block the attackers but instead are placed to support connectivity of the team. The performance of strategies is tested in multiple benchmarks. The success of a strategy is heavily dependent on the type of the instance, and so one of the contributions of this work is that we identify suitable strategies for diverse instance types.\nWe introduce a novel method to train reinforcement learning (RL) agents by sharing knowledge in a way similar to the concept of using a book. The recorded information in the form of a book is the main means by which humans acquire knowledge. Nevertheless, conventional deep RL methods have mainly focused either on experiential learning, where the agent learns through interactions with the environment from the start, or on imitation learning, which tries to mimic the teacher. In contrast, our proposed book learning shares key information among different agents in a book-like manner by delving into the following two characteristic features: (1) By defining the linguistic function, input states can be clustered semantically into a relatively small number of core clusters, which are forwarded to other RL agents in a prescribed manner. (2) By defining state priorities and the contents for recording, core experiences can be selected and stored in a small container. We call this container a `BOOK'.
Our method learns hundreds to thousands of times faster than the conventional methods by learning only a small amount of core cluster information, which shows that deep RL agents can effectively learn through the shared knowledge from other agents.\nCapturing the temporal dynamics of user preferences over items is important for recommendation. Existing methods mainly assume that all time steps in user-item interaction history are equally relevant to recommendation, which, however, does not hold in real-world scenarios where user-item interactions can often happen accidentally. More importantly, they learn user and item dynamics separately, thus failing to capture their joint effects on user-item interactions. To better model user and item dynamics, we present the Interacting Attention-gated Recurrent Network (IARN), which adopts the attention model to measure the relevance of each time step. In particular, we propose a novel attention scheme to learn the attention scores of user and item history in an interacting way, so as to account for the dependencies between user and item dynamics in shaping user-item interactions. By doing so, IARN can selectively memorize different time steps of a user's history when predicting her preferences over different items. Our model can therefore provide meaningful interpretations for recommendation results, which could be further enhanced by auxiliary features. Extensive validation on real-world datasets shows that IARN consistently outperforms state-of-the-art methods.\nDeep learning has been shown to outperform traditional machine learning algorithms across a wide range of problem domains. However, current deep learning algorithms have been criticized as uninterpretable \"black-boxes\" which cannot explain their decision-making processes. This is a major shortcoming that prevents the widespread application of deep learning to domains with regulatory processes such as finance.
As such, industries such as finance have to rely on traditional models like decision trees that are much more interpretable but less effective than deep learning for complex problems. In this paper, we propose CLEAR-Trade, a novel financial AI visualization framework for deep learning-driven stock market prediction that mitigates the interpretability issue of deep learning methods. In particular, CLEAR-Trade provides an effective way to visualize and explain decisions made by deep stock market prediction models. We show the efficacy of CLEAR-Trade in enhancing the interpretability of stock market prediction by conducting experiments based on S&P 500 stock index prediction. The results demonstrate that CLEAR-Trade can provide significant insight into the decision-making process of deep learning-driven financial models, particularly for regulatory processes, thus improving their potential uptake in the financial industry.\nGenerative modeling, which learns a joint probability distribution from training data and generates samples according to it, is an important task in machine learning and artificial intelligence. Inspired by the probabilistic interpretation of quantum physics, we propose a generative model using matrix product states, which is a tensor network originally proposed for describing (particularly one-dimensional) entangled quantum states. Our model enjoys efficient learning by utilizing the density matrix renormalization group method, which allows dynamic adjustment of the tensor dimensions, and offers an efficient direct sampling approach, Zipper, for generative tasks. We apply our method to generative modeling of several standard datasets including the principled Bars and Stripes, random binary patterns and the MNIST handwritten digits, to illustrate the ability of our model, and discuss features as well as drawbacks of our model compared with popular generative models such as the Hopfield model, Boltzmann machines and generative adversarial networks.
Our work sheds light on many interesting directions for future exploration in the development of quantum-inspired algorithms for unsupervised machine learning, which could potentially be realized on a quantum device.\nWhen people converse about social or political topics, similar arguments are often paraphrased by different speakers, across many different conversations. Debate websites produce curated summaries of arguments on such topics; these summaries typically consist of lists of sentences that represent frequently paraphrased propositions, or labels capturing the essence of one particular aspect of an argument, e.g. Morality or Second Amendment. We call these frequently paraphrased propositions ARGUMENT FACETS. Like these curated sites, our goal is to induce and identify argument facets across multiple conversations, and produce summaries. However, we aim to do this automatically. We frame the problem as consisting of two steps: we first extract sentences that express an argument from raw social media dialogs, and then rank the extracted arguments in terms of their similarity to one another. Sets of similar arguments are used to represent argument facets. We show here that we can predict ARGUMENT FACET SIMILARITY with a correlation averaging 0.63 compared to a human topline averaging 0.68 over three debate topics, easily beating several reasonable baselines.\nWe propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph. We show that adversarial training can be used to learn a generative model with true observational and interventional distributions if the generator architecture is consistent with the given causal graph. We consider the application of generating faces based on given binary labels where the dependency structure between the labels is preserved with a causal graph. This problem can be seen as learning a causal implicit generative model for the image and labels.
We devise a two-stage procedure for this problem. First, we train a causal implicit generative model over binary labels using a neural network consistent with a causal graph as the generator. We empirically show that WassersteinGAN can be used to output discrete labels. Later, we propose two new conditional GAN architectures, which we call CausalGAN and CausalBEGAN. We show that the optimal generator of the CausalGAN, given the labels, samples from the image distributions conditioned on these labels. The conditional GAN combined with a trained causal implicit generative model for the labels is then a causal implicit generative model over the labels and the generated image. We show that the proposed architectures can be used to sample from observational and interventional image distributions, even for interventions which do not naturally occur in the dataset.\nMultiple automakers have automated driving systems (ADS) that offer freeway-pilot functions in development or in production. This type of ADS is typically limited to restricted-access freeways only, that is, the transition from manual to automated modes takes place only after the ramp merging process is completed manually. One major challenge in extending the automation to ramp merging is that the automated vehicle needs to incorporate and optimize long-term objectives (e.g. successful and smooth merge) when near-term actions must be safely executed. Moreover, the merging process involves interactions with other vehicles whose behaviors are sometimes hard to predict but may influence the merging vehicle's optimal actions. To tackle such a complicated control problem, we propose to apply Deep Reinforcement Learning (DRL) techniques for finding an optimal driving policy by maximizing the long-term reward in an interactive environment.
Specifically, we apply a Long Short-Term Memory (LSTM) architecture to model the interactive environment, from which an internal state containing historical driving information is conveyed to a Deep Q-Network (DQN). The DQN is used to approximate the Q-function, which takes the internal state as input and generates Q-values as output for action selection. With this DRL architecture, the historical impact of the interactive environment on the long-term reward can be captured and taken into account for deciding the optimal control policy. The proposed architecture has the potential to be extended and applied to other autonomous driving scenarios such as driving through a complex intersection or changing lanes under varying traffic flow conditions.\nA visual-relational knowledge graph (KG) is a multi-relational graph whose entities are associated with images. We introduce ImageGraph, a KG with 1,330 relation types, 14,870 entities, and 829,931 images. Visual-relational KGs lead to novel probabilistic query types where images are treated as first-class citizens. Both the prediction of relations between unseen images and multi-relational image retrieval can be formulated as query types in a visual-relational KG. We approach the problem of answering such queries with a novel combination of deep convolutional networks and models for learning knowledge graph embeddings. The resulting models can answer queries such as \"How are these two unseen images related to each other?\" We also explore a zero-shot learning scenario where an image of an entirely new entity is linked with multiple relations to entities of an existing KG. The multi-relational grounding of unseen entity images into a knowledge graph serves as the description of such an entity.
We conduct experiments to demonstrate that the proposed deep architectures in combination with KG embedding objectives can answer the visual-relational queries efficiently and accurately.\nWe present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.\nDeep reinforcement learning has become an important paradigm for constructing agents that can enter complex multi-agent situations and improve their policies through experience. One commonly used technique is reactive training - applying standard RL methods while treating other agents as a part of the learner's environment. It is known that in general-sum games reactive training can lead groups of agents to converge to inefficient outcomes. We focus on one such class of environments: Stag Hunt games. Here agents either choose a risky cooperative policy (which leads to high payoffs if both choose it but low payoffs to an agent who attempts it alone) or a safe one (which leads to a safe payoff no matter what). We ask how we can change the learning rule of a single agent to improve its outcomes in Stag Hunts that include other reactive learners. 
We extend existing work on reward-shaping in multi-agent reinforcement learning and show that making a single agent prosocial, that is, making it care about the rewards of its partners, can increase the probability that groups converge to good outcomes. Thus, even if we control only a single agent in a group, making that agent prosocial can increase our agent's long-run payoff. We show experimentally that this result carries over to a variety of more complex environments with Stag Hunt-like dynamics including ones where agents must learn from raw input pixels.\nPower grids are critical infrastructure assets that face non-technical losses (NTL) such as electricity theft or faulty meters. NTL may amount to up to 40% of the total electricity distributed in emerging countries. Industrial NTL detection systems are still largely based on expert knowledge when deciding whether to carry out costly on-site inspections of customers. Electricity providers are reluctant to move to large-scale deployments of automated systems that learn NTL profiles from data due to the latter's propensity to suggest a large number of unnecessary inspections. In this paper, we propose a novel system that combines automated statistical decision making with expert knowledge. First, we propose a machine learning framework that classifies customers into NTL or non-NTL using a variety of features derived from the customers' consumption data. The methodology used is specifically tailored to the level of noise in the data. Second, in order to allow human experts to feed their knowledge into the decision loop, we propose a method for visualizing prediction results at various granularity levels in a spatial hologram. Our approach allows domain experts to put the classification results into the context of the data and to incorporate their knowledge for making the final decisions of which customers to inspect. This work has resulted in appreciable results on a real-world data set of 3.6M customers.
Our system is being deployed in commercial NTL detection software.\nReinforcement Learning is divided into two main paradigms: model-free and model-based. Each of these two paradigms has strengths and limitations, and has been successfully applied to real-world domains that are appropriate to its corresponding strengths. In this paper, we present a new approach aimed at bridging the gap between these two paradigms. We aim to take the best of the two paradigms and combine them in an approach that is at the same time data-efficient and cost-savvy. We do so by learning a probabilistic dynamics model and leveraging it as a prior for the intertwined model-free optimization. As a result, our approach can exploit the generality and structure of the dynamics model, but is also capable of ignoring its inevitable inaccuracies, by directly incorporating the evidence provided by the direct observation of the cost. Preliminary results demonstrate that our approach outperforms purely model-based and model-free approaches, as well as the approach of simply switching from a model-based to a model-free setting.\nThis paper defines software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotions. Our approach, Themis, generates efficient test suites to measure discrimination. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination.
We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination.\nThe infinite restricted Boltzmann machine (iRBM) is an extension of the classic RBM. It has the useful property of automatically deciding the size of the hidden layer according to specific training data. With sufficient training, the iRBM can achieve performance competitive with that of the classic RBM. However, learning in the iRBM converges slowly because the iRBM is sensitive to the ordering of its hidden units: the learned filters change slowly from the left-most hidden unit to the right. To break this dependency between neighboring hidden units and speed up the convergence of training, a novel training strategy is proposed. The key idea of the proposed training strategy is randomly regrouping the hidden units before each gradient descent step. Potentially, a mixture of infinitely many iRBMs with different permutations of the hidden units can be achieved by this learning method, which, like dropout, has the effect of preventing the model from over-fitting. The original iRBM is also modified to be capable of carrying out discriminative training. To evaluate the impact of our method on the convergence speed of learning and the model's generalization ability, several experiments have been performed on the binarized MNIST and CalTech101 Silhouettes datasets.
Experimental results indicate that the proposed training strategy can greatly accelerate learning and enhance generalization ability of iRBMs.\nLanding an unmanned aerial vehicle (UAV) on a ground marker is an open problem despite the effort of the research community. Previous attempts mostly focused on the analysis of hand-crafted geometric features and the use of external sensors in order to allow the vehicle to approach the land-pad. In this article, we propose a method based on deep reinforcement learning that only requires low-resolution images taken from a down-looking camera in order to identify the position of the marker and land the UAV on it. The proposed approach is based on a hierarchy of Deep Q-Networks (DQNs) used as high-level control policy for the navigation toward the marker. We implemented different technical solutions, such as the combination of vanilla and double DQNs, and a partitioned buffer replay. Using domain randomization we trained the vehicle on uniform textures and we tested it on a large variety of simulated and real-world environments. The overall performance is comparable with a state-of-the-art algorithm and human pilots.\nA commonly used technique for managing AI complexity in real-time strategy (RTS) games is to use action and/or state abstractions. High-level abstractions can often lead to good strategic decision making, but tactical decision quality may suffer due to lost details. A competing method is to sample the search space which often leads to good tactical performance in simple scenarios, but poor high-level planning.   We propose to use a deep convolutional neural network (CNN) to select among a limited set of abstract action choices, and to utilize the remaining computation time for game tree search to improve low level tactics. The CNN is trained by supervised learning on game states labelled by Puppet Search, a strategic search algorithm that uses action abstractions. 
The network is then used to select a script --- an abstract action --- to produce low level actions for all units. Subsequently, the game tree search algorithm improves the tactical actions of a subset of units using a limited view of the game state only considering units close to opponent units.   Experiments in the microRTS game show that the combined algorithm results in higher win-rates than either of its two independent components and other state-of-the-art microRTS agents.   To the best of our knowledge, this is the first successful application of a convolutional network to play a full RTS game on standard game maps, as previous work has focused on sub-problems, such as combat, or on very small maps.\nWe investigate the learning of quantitative structure activity relationships (QSARs) as a case-study of meta-learning. This application area is of the highest societal importance, as it is a key step in the development of new medicines. The standard QSAR learning problem is: given a target (usually a protein) and a set of chemical compounds (small molecules) with associated bioactivities (e.g. inhibition of the target), learn a predictive mapping from molecular representation to activity. Although almost every type of machine learning method has been applied to QSAR learning there is no agreed single best way of learning QSARs, and therefore the problem area is well-suited to meta-learning. We first carried out the most comprehensive ever comparison of machine learning methods for QSAR learning: 18 regression methods, 6 molecular representations, applied to more than 2,700 QSAR problems. (These results have been made publicly available on OpenML and represent a valuable resource for testing novel meta-learning methods.) We then investigated the utility of algorithm selection for QSAR problems. 
We found that this meta-learning approach outperformed the best individual QSAR learning method (random forests using a molecular fingerprint representation) by up to 13%, on average. We conclude that meta-learning outperforms base-learning methods for QSAR learning, and as this investigation is one of the most extensive comparisons of base and meta-learning methods ever made, it provides evidence for the general effectiveness of meta-learning over base-learning.\nThe rapid advances in e-commerce and Web 2.0 technologies have greatly increased the impact of commercial advertisements on the general public. As a key enabling technology, a multitude of recommender systems exist that analyze user features and browsing patterns to recommend appealing advertisements to users. In this work, we seek to study the characteristics or attributes that characterize an effective advertisement and recommend a useful set of features to aid the design and production processes of commercial advertisements. We analyze the temporal patterns from multimedia content of advertisement videos including auditory, visual and textual components, and study their individual roles and synergies in the success of an advertisement. The objective of this work is then to measure the effectiveness of an advertisement, and to recommend a useful set of features to advertisement designers to make it more successful and approachable to users. Our proposed framework employs the signal processing technique of cross-modality feature learning, where data streams from different components are employed to train separate neural network models and are then fused together to learn a shared representation. Subsequently, a neural network model trained on this joint feature embedding representation is utilized as a classifier to predict advertisement effectiveness.
We validate our approach using subjective ratings from a dedicated user study, the sentiment strength of online viewer comments, and a viewer opinion metric of the ratio of the Likes and Views received by each advertisement from an online platform.\nA knowledge graph (KG) is known to be helpful for the task of question answering (QA), since it provides well-structured relational information between entities, and allows one to further infer indirect facts. However, it is challenging to build QA systems which can learn to reason over knowledge graphs based on question-answer pairs alone. First, when people ask questions, their expressions are noisy (for example, typos in texts, or variations in pronunciations), which makes it non-trivial for the QA system to match the mentioned entities to the knowledge graph. Second, many questions require multi-hop logical reasoning over the knowledge graph to retrieve the answers. To address these challenges, we propose a novel and unified deep learning architecture, and an end-to-end variational learning algorithm which can handle noise in questions and learn multi-hop reasoning simultaneously. Our method achieves state-of-the-art performance on a recent benchmark dataset in the literature. We also derive a series of new benchmark datasets, including questions for multi-hop reasoning, questions paraphrased by a neural translation model, and questions in human voice. Our method yields very promising results on all these challenging datasets.\nIn allocation problems, a given set of goods is assigned to agents in such a way that the social welfare is maximised, that is, the largest possible global worth is achieved. When goods are indivisible, it is possible to use monetary compensation to perform a fair allocation taking into account the actual contribution of all agents to the social welfare.
Coalitional games provide a formal mathematical framework to model such problems, in particular the Shapley value is a solution concept widely used for assigning worths to agents in a fair way. Unfortunately, computing this value is a $\\#{\\rm P}$-hard problem, so that applying this good theoretical notion is often quite difficult in real-world problems.   We describe useful properties that allow us to greatly simplify the instances of allocation problems, without affecting the Shapley value of any player. Moreover, we propose algorithms for computing lower bounds and upper bounds of the Shapley value, which in some cases provide the exact result and that can be combined with approximation algorithms.   The proposed techniques have been implemented and tested on a real-world application of allocation problems, namely, the Italian research assessment program, known as VQR. For the large university considered in the experiments, the problem involves thousands of agents and goods (here, researchers and their research products). The algorithms described in the paper are able to compute the Shapley value for most of those agents, and to get a good approximation of the Shapley value for all of them.\nThere has been a good amount of progress in sentiment analysis over the past 10 years, including the proposal of new methods and the creation of benchmark datasets. In some papers, however, there is a tendency to compare models only on one or two datasets, either because of time restraints or because the model is tailored to a specific task. Accordingly, it is hard to understand how well a certain model generalizes across different tasks and datasets. In this paper, we contribute to this situation by comparing several models on six different benchmarks, which belong to different domains and additionally have different levels of granularity (binary, 3-class, 4-class and 5-class). 
We show that Bi-LSTMs perform well across datasets and that both LSTMs and Bi-LSTMs are particularly good at fine-grained sentiment tasks (i. e., with more than two classes). Incorporating sentiment information into word embeddings during training gives good results for datasets that are lexically similar to the training data. With our experiments, we contribute to a better understanding of the performance of different model architectures on different data sets. Consequently, we detect novel state-of-the-art results on the SenTube datasets.\nReinforcement Learning AI commonly uses reward/penalty signals that are objective and explicit in an environment -- e.g. game score, completion time, etc. -- in order to learn the optimal strategy for task performance. However, Human-AI interaction for such AI agents should include additional reinforcement that is implicit and subjective -- e.g. human preferences for certain AI behavior -- in order to adapt the AI behavior to idiosyncratic human preferences. Such adaptations would mirror naturally occurring processes that increase trust and comfort during social interactions. Here, we show how a hybrid brain-computer-interface (hBCI), which detects an individual's level of interest in objects/events in a virtual environment, can be used to adapt the behavior of a Deep Reinforcement Learning AI agent that is controlling a virtual autonomous vehicle. Specifically, we show that the AI learns a driving strategy that maintains a safe distance from a lead vehicle, and most novelly, preferentially slows the vehicle when the human passengers of the vehicle encounter objects of interest. This adaptation affords an additional 20\\% viewing time for subjectively interesting objects. 
This is the first demonstration of how an hBCI can be used to provide implicit reinforcement to an AI agent in a way that incorporates user preferences into the control system.\nVisual Question Answering (VQA) models should have both high robustness and accuracy. Unfortunately, most of the current VQA research only focuses on accuracy because there is a lack of proper methods to measure the robustness of VQA models. There are two main modules in our algorithm. Given a natural language question about an image, the first module takes the question as input and then outputs the ranked basic questions, with similarity scores, of the main given question. The second module takes the main question, image and these basic questions as input and then outputs the text-based answer of the main question about the given image. We claim that a robust VQA model is one whose performance is not changed much when related basic questions are also made available to it as input. We formulate the basic questions generation problem as a LASSO optimization, and also propose a large-scale Basic Question Dataset (BQD) and Rscore (a novel robustness measure), for analyzing the robustness of VQA models. We hope our BQD will be used as a benchmark to evaluate the robustness of VQA models, so as to help the community build more robust and accurate VQA models.\nRecurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used on NLP tasks to capture the long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). 
A light-weight neural net, \"Directional Self-Attention Network (DiSAN)\", is then proposed to learn sentence embedding, based solely on the proposed attention without any RNN/CNN structure. DiSAN is only composed of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models on both prediction quality and time efficiency. It achieves the best test accuracy among all sentence encoding methods and improves the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre natural language inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification and Subjectivity (SUBJ) datasets.\nTime series prediction is of great significance in many applications and has attracted extensive attention from the data mining community. Existing work suggests that for many problems, the shape in the current time series may correlate an upcoming shape in the same or another series. Therefore, it is a promising strategy to associate two recurring patterns as a rule's antecedent and consequent: the occurrence of the antecedent can foretell the occurrence of the consequent, and the learned shape of consequent will give accurate predictions. Earlier work employs symbolization methods, but the symbolized representation maintains too little information of the original series to mine valid rules. The state-of-the-art work, though directly manipulating the series, fails to segment the series precisely for seeking antecedents/consequents, resulting in inaccurate rules in common scenarios. 
In this paper, we propose a novel motif-based rule discovery method, which utilizes motif discovery to accurately extract frequently occurring consecutive subsequences, i.e. motifs, as antecedents/consequents. It then investigates the underlying relationships between motifs by matching motifs as rule candidates and ranking them based on the similarities. Experimental results on real open datasets show that the proposed approach outperforms the baseline method by 23.9%. Furthermore, it extends the applicability from single time series to multiple ones.\nOur understanding of the world depends highly on our capacity to produce intuitive and simplified representations which can be easily used to solve problems. We reproduce this simplification process using a neural network to build a low dimensional state representation of the world from images acquired by a robot. As in Jonschkowski et al. 2015, we learn in an unsupervised way using prior knowledge about the world as loss functions called robotic priors and extend this approach to high dimension richer images to learn a 3D representation of the hand position of a robot from RGB images. We propose a quantitative evaluation of the learned representation using nearest neighbors in the state space that allows to assess its quality and show both the potential and limitations of robotic priors in realistic environments. We augment image size, add distractors and domain randomization, all crucial components to achieve transfer learning to real robots. Finally, we also contribute a new prior to improve the robustness of the representation. The applications of such low dimensional state representation range from easing reinforcement learning (RL) and knowledge transfer across tasks, to facilitating learning from raw data with more efficient and compact high level representations. 
The results show that the robotic prior approach is able to extract high level representation as the 3D position of an arm and organize it into a compact and coherent space of states in a challenging dataset.\nTracking humans that are interacting with the other subjects or environment remains unsolved in visual tracking, because the visibility of the human of interests in videos is unknown and might vary over time. In particular, it is still difficult for state-of-the-art human trackers to recover complete human trajectories in crowded scenes with frequent human interactions. In this work, we consider the visibility status of a subject as a fluent variable, whose change is mostly attributed to the subject's interaction with the surrounding, e.g., crossing behind another object, entering a building, or getting into a vehicle, etc. We introduce a Causal And-Or Graph (C-AOG) to represent the causal-effect relations between an object's visibility fluent and its activities, and develop a probabilistic graph model to jointly reason the visibility fluent change (e.g., from visible to invisible) and track humans in videos. We formulate this joint task as an iterative search of a feasible causal graph structure that enables fast search algorithm, e.g., dynamic programming method. We apply the proposed method on challenging video sequences to evaluate its capabilities of estimating visibility fluent changes of subjects and tracking subjects of interests over time. Results with comparisons demonstrate that our method outperforms the alternative trackers and can recover complete trajectories of humans in complicated scenarios with frequent human interactions.\nTo capture the inherent geometric features of many community detection problems, we propose to use a new random graph model of communities that we call a Geometric Block Model. 
The geometric block model generalizes the random geometric graphs in the same way that the well-studied stochastic block model generalizes the Erdos-Renyi random graphs. It is also a natural extension of random community models inspired by the recent theoretical and practical advancement in community detection. While being a topic of fundamental theoretical interest, our main contribution is to show that many practical community structures are better explained by the geometric block model. We also show that a simple triangle-counting algorithm to detect communities in the geometric block model is near-optimal. Indeed, even in the regime where the average degree of the graph grows only logarithmically with the number of vertices (sparse-graph), we show that this algorithm performs extremely well, both theoretically and practically. In contrast, the triangle-counting algorithm is far from being optimum for the stochastic block model. We simulate our results on both real and synthetic datasets to show superior performance of both the new model as well as our algorithm.\nThe vehicle to represent Knowledge Organization Systems (KOSs) in the environment of the Semantic Web and linked data is the Simple Knowledge Organization System (SKOS). SKOS provides a way to assign a URI to each concept, and this URI functions as a surrogate for the concept. This fact makes of main concern the need to clarify the URIs' ontological meaning. The aim of this study is to investigate the relation between the ontological substance of KOS concepts and concepts revealed through the grammatical and syntactic formalisms of natural language. For this purpose, we examined the dividableness of concepts in specific KOSs (i.e. a thesaurus, a subject headings system and a classification scheme) by applying Natural Language Processing (NLP) techniques (i.e. morphosyntactic analysis) to the lexical representations (i.e. RDF literals) of SKOS concepts. 
The results of the comparative analysis reveal that, despite the use of multi-word units, thesauri tend to represent concepts in a way that can hardly be further divided conceptually, while Subject Headings and Classification Schemes - to a certain extent - comprise terms that can be decomposed into more conceptual constituents. Consequently, SKOS concepts deriving from thesauri are more likely to represent atomic conceptual units and thus be more appropriate tools for inference and reasoning. Since identifiers represent the meaning of a concept, complex concepts are neither the most appropriate nor the most efficient way of modelling a KOS for the Semantic Web.\nDeep neural networks (DNNs) have transformed several artificial intelligence research areas including computer vision, speech recognition, and natural language processing. However, recent studies demonstrated that DNNs are vulnerable to adversarial manipulations at testing time. Specifically, suppose we have a testing example, whose label can be correctly predicted by a DNN classifier. An attacker can add a small carefully crafted noise to the testing example such that the DNN classifier predicts an incorrect label, where the crafted testing example is called adversarial example. Such attacks are called evasion attacks. Evasion attacks are one of the biggest challenges for deploying DNNs in safety and security critical applications such as self-driving cars. In this work, we develop new methods to defend against evasion attacks. Our key observation is that adversarial examples are close to the classification boundary. Therefore, we propose region-based classification to be robust to adversarial examples. For a benign/adversarial testing example, we ensemble information in a hypercube centered at the example to predict its label. In contrast, traditional classifiers are point-based classification, i.e., given a testing example, the classifier predicts its label based on the testing example alone. 
Our evaluation results on MNIST and CIFAR-10 datasets demonstrate that our region-based classification can significantly mitigate evasion attacks without sacrificing classification accuracy on benign examples. Specifically, our region-based classification achieves the same classification accuracy on testing benign examples as point-based classification, but our region-based classification is significantly more robust than point-based classification to various evasion attacks.\nThis paper proposes a push and pull search (PPS) framework for solving constrained multi-objective optimization problems (CMOPs). To be more specific, the proposed PPS divides the search process into two different stages, including the push and pull search stages. In the push stage, a multi-objective evolutionary algorithm (MOEA) is adopted to explore the search space without considering any constraints, which can help to get across infeasible regions very fast and approach the unconstrained Pareto front. Furthermore, the landscape of CMOPs with constraints can be probed and estimated in the push stage, which can be utilized to conduct the parameters setting for constraint-handling approaches applied in the pull stage. Then, a constrained multi-objective evolutionary algorithm (CMOEA) equipped with an improved epsilon constraint-handling is applied to pull the infeasible individuals achieved in the push stage to the feasible and non-dominated regions. Compared with other CMOEAs, the proposed PPS method can more efficiently get across infeasible regions and converge to the feasible and non-dominated regions by applying push and pull search strategies at different stages. To evaluate the performance regarding convergence and diversity, a set of benchmark CMOPs is used to test the proposed PPS and compare with other five CMOEAs, including MOEA/D-CDP, MOEA/D-SR, C-MOEA/D, MOEA/D-Epsilon and MOEA/D-IEpsilon. 
The comprehensive experimental results demonstrate that the proposed PPS achieves significantly better or competitive performance than the other five CMOEAs on most of the benchmark set.\nDeveloping useful interfaces between brains and machines is a grand challenge of neuroengineering. An effective interface has the capacity to not only interpret neural signals, but predict the intentions of the human to perform an action in the near future; prediction is made even more challenging outside well-controlled laboratory experiments. This paper describes our approach to detect and to predict natural human arm movements in the future, a key challenge in brain computer interfacing that has never before been attempted. We introduce the novel Annotated Joints in Long-term ECoG (AJILE) dataset; AJILE includes automatically annotated poses of 7 upper body joints for four human subjects over 670 total hours (more than 72 million frames), along with the corresponding simultaneously acquired intracranial neural recordings. The size and scope of AJILE greatly exceeds all previous datasets with movements and electrocorticography (ECoG), making it possible to take a deep learning approach to movement prediction. We propose a multimodal model that combines deep convolutional neural networks (CNN) with long short-term memory (LSTM) blocks, leveraging both ECoG and video modalities. We demonstrate that our models are able to detect movements and predict future movements up to 800 msec before movement initiation. Further, our multimodal movement prediction models exhibit resilience to simulated ablation of input neural signals. We believe a multimodal approach to natural neural decoding that takes context into account is critical in advancing bioelectronic technologies and human neuroscience.\nAs computational power has continued to increase, and sensors have become more accurate, the corresponding advent of systems that are cognitive-and-immersive (CAI) has come to pass. 
CAI systems fall squarely into the intersection of AI with HCI/HRI: such systems interact with and assist the human agents that enter them, in no small part because such systems are infused with AI able to understand and reason about these humans and their beliefs, goals, and plans. We herein explain our approach to engineering CAI systems. We emphasize the capacity of a CAI system to develop and reason over a \"theory of the mind\" of its human partners. This capacity means that the AI in question has a sophisticated model of the beliefs, knowledge, goals, desires, emotions, etc. of these humans. To accomplish this engineering, a formal framework of very high expressivity is needed. In our case, this framework is a \\textit{cognitive event calculus}, a particular kind of quantified multi-modal logic, and a matching high-expressivity planner. To explain, advance, and to a degree validate our approach, we show that a calculus of this type can enable a CAI system to understand a psychologically tricky scenario couched in what we call the \\textit{cognitive blockworld framework} (CBF). CBF includes machinery able to represent and plan over not merely blocks and actions, but also agents and their mental attitudes about other agents.\nWe present a novel and scalable label embedding framework for large-scale multi-label learning, a.k.a. ExMLDS (Extreme Multi-Label Learning using Distributional Semantics). Our approach draws inspiration from ideas rooted in distributional semantics, specifically the Skip Gram Negative Sampling (SGNS) approach, widely used to learn word embeddings for natural language processing tasks. Learning such embeddings can be reduced to a certain matrix factorization. Our approach is novel in that it highlights interesting connections between label embedding methods used for multi-label learning and paragraph/document embedding methods commonly used for learning representations of text data. 
The framework can also be easily extended to incorporate auxiliary information such as label-label correlations; this is crucial especially when there are a lot of missing labels in the training data. We demonstrate the effectiveness of our approach through an extensive set of experiments on a variety of benchmark datasets, and show that the proposed learning methods perform favorably compared to several baselines and state-of-the-art methods for large-scale multi-label learning. To facilitate end-to-end learning, we develop a joint learning algorithm that can learn the embeddings as well as a regression model that predicts these embeddings given input features, via efficient gradient-based methods.\nWe consider the problem of non-parametric Conditional Independence testing (CI testing) for continuous random variables. Given i.i.d samples from the joint distribution $f(x,y,z)$ of continuous random vectors $X,Y$ and $Z,$ we determine whether $X \\perp Y | Z$. We approach this by converting the conditional independence test into a classification problem. This allows us to harness very powerful classifiers like gradient-boosted trees and deep neural networks. These models can handle complex probability distributions and allow us to perform significantly better compared to the prior state of the art, for high-dimensional CI testing. The main technical challenge in the classification problem is the need for samples from the conditional product distribution $f^{CI}(x,y,z) = f(x|z)f(y|z)f(z)$ -- the joint distribution if and only if $X \\perp Y | Z.$ -- when given access only to i.i.d. samples from the true joint distribution $f(x,y,z)$. To tackle this problem we propose a novel nearest neighbor bootstrap procedure and theoretically show that our generated samples are indeed close to $f^{CI}$ in terms of total variational distance. 
We then develop theoretical results regarding the generalization bounds for classification for our problem, which translate into error bounds for CI testing. We provide a novel analysis of Rademacher type classification bounds in the presence of non-i.i.d near-independent samples. We empirically validate the performance of our algorithm on simulated and real datasets and show performance gains over previous methods.\nGenerating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, and thereby introducing a chronological ordering of notes is not naturally suitable. In this paper, we propose three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs). The three models, which differ in the underlying assumptions and accordingly the network architectures, are referred to as the jamming model, the composer model and the hybrid model. We trained the proposed models on a dataset of over one hundred thousand bars of rock music and applied them to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings. A few intra-track and inter-track objective metrics are also proposed to evaluate the generative results, in addition to a subjective user study. We show that our models can generate coherent music of four bars right from scratch (i.e. without human inputs). We also extend our models to human-AI cooperative music generation: given a specific track composed by human, we can generate four additional tracks to accompany it. 
All code, the dataset and the rendered audio samples are available at https://salu133445.github.io/musegan/ .\nWhile machine learning and artificial intelligence have long been applied in networking research, the bulk of such works has focused on supervised learning. Recently there has been a rising trend of employing unsupervised machine learning using unstructured raw network data to improve network performance and provide services such as traffic engineering, anomaly detection, Internet traffic classification, and quality of service optimization. The interest in applying unsupervised learning techniques in networking emerges from their great success in other fields such as computer vision, natural language processing, speech recognition, and optimal control (e.g., for developing autonomous self-driving cars). Unsupervised learning is interesting since it can unconstrain us from the need of labeled data and manual handcrafted feature engineering thereby facilitating flexible, general, and automated methods of machine learning. The focus of this survey paper is to provide an overview of the applications of unsupervised learning in the domain of networking. We provide a comprehensive survey highlighting the recent advancements in unsupervised learning techniques and describe their applications for various learning tasks in the context of networking. We also provide a discussion on future directions and open research issues, while also identifying potential pitfalls. While a few survey papers focusing on the applications of machine learning in networking have previously been published, a survey of similar scope and breadth is missing in literature. 
Through this paper, we advance the state of knowledge by carefully synthesizing the insights from these survey papers while also providing contemporary coverage of recent advances.\nIn knowledge bases such as Wikidata, it is possible to assert a large set of properties for entities, ranging from generic ones such as name and place of birth to highly profession-specific or background-specific ones such as doctoral advisor or medical condition. Determining a preference or ranking in this large set is a challenge in tasks such as prioritisation of edits or natural-language generation. Most previous approaches to ranking knowledge base properties are purely data-driven, that is, as we show, mistake frequency for interestingness.   In this work, we have developed a human-annotated dataset of 350 preference judgments among pairs of knowledge base properties for fixed entities. From this set, we isolate a subset of pairs for which humans show a high level of agreement (87.5% on average). We show, however, that baseline and state-of-the-art techniques achieve only 61.3% precision in predicting human preferences for this subset.   We then analyze what contributes to one property being rated as more important than another one, and identify that at least three factors play a role, namely (i) general frequency, (ii) applicability to similar entities and (iii) semantic similarity between property and entity. We experimentally analyze the contribution of each factor and show that a combination of techniques addressing all the three factors achieves 74% precision on the task.   The dataset is available at www.kaggle.com/srazniewski/wikidatapropertyranking.\nObjective: Electronic medical records (EMRs) contain an amount of medical knowledge which can be used for clinical decision support (CDS). 
Our objective is a general system that can extract and represent this knowledge contained in EMRs to support three CDS tasks: test recommendation, initial diagnosis, and treatment plan recommendation, with the given condition of one patient. Methods: We extracted four kinds of medical entities from records and constructed an EMR-based medical knowledge network (EMKN), in which nodes are entities and edges reflect their co-occurrence in a single record. Three bipartite subgraphs (bi-graphs) were extracted from the EMKN to support each task. One part of the bi-graph was the given condition (e.g., symptoms), and the other was the condition to be inferred (e.g., diseases). Each bi-graph was regarded as a Markov random field to support the inference. Three lazy energy functions and one parameter-based energy function were proposed, as well as two knowledge representation learning-based energy functions, which can provide a distributed representation of medical entities. Three measures were utilized for performance evaluation. Results: On the initial diagnosis task, 80.11% of the test records identified at least one correct disease from top 10 candidates. Test and treatment recommendation results were 87.88% and 92.55%, respectively. These results altogether indicate that the proposed system outperformed the baseline methods. The distributed representation of medical entities does reflect similarity relationships with regard to knowledge level. Conclusion: Combining EMKN and MRF is an effective approach for general medical knowledge representation and inference. Different tasks, however, require designing their energy functions individually.\nThe most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. 
Among the few proposed approaches, the recently introduced Black-DROPS algorithm exploits a black-box optimization algorithm to achieve both high data-efficiency and good computation times when several cores are used; nevertheless, like all model-based policy search approaches, Black-DROPS does not scale to high dimensional state/action spaces. In this paper, we introduce a new model learning procedure in Black-DROPS that leverages parameterized black-box priors to (1) scale up to high-dimensional systems, and (2) be robust to large inaccuracies of the prior information. We demonstrate the effectiveness of our approach with the \"pendubot\" swing-up task in simulation and with a physical hexapod robot (48D state space, 18D action space) that has to walk forward as fast as possible. The results show that our new algorithm is more data-efficient than previous model-based policy search algorithms (with and without priors) and that it can allow a physical 6-legged robot to learn new gaits in only 16 to 30 seconds of interaction time.\nSwarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. 
The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.\nSuzuki and Niida (Ann. Pure Appl. Logic, 2015) showed the following results on independent distributions (IDs) on an AND-OR tree, where they took only depth-first algorithms into consideration. (1) Among IDs such that probability of the root having value 0 is fixed as a given r such that 0 < r < 1, if d is a maximizer of cost of the best algorithm then d is an independent and identical distribution (IID). (2) Among all IDs, if d is a maximizer of cost of the best algorithm then d is an IID. In the case where non-depth-first algorithms are taken into consideration, the counterparts of (1) and (2) are left open in the above work. Peng et al. (Inform. Process. Lett., 2017) extended (1) and (2) to multi-branching trees, where in (2) they put an additional hypothesis on IDs that probability of the root having value 0 is neither 0 nor 1. We give positive answers for the two questions of Suzuki-Niida. A key to the proof is that if ID d achieves the equilibrium among IDs then we can choose an algorithm of the best cost against d from depth-first algorithms. In addition, we extend the result of Peng et al. to the case where non-depth-first algorithms are taken into consideration.\nWe study a unique network dataset including periodic surveys and electronic logs of dyadic contacts via smartphones. The participants were a sample of freshmen entering university in the Fall 2011. 
Their opinions on a variety of political and social issues and lists of activities on campus were regularly recorded at the beginning and end of each semester for the first three years of study. We identify a behavioral network defined by call and text data, and a cognitive network based on friendship nominations in ego-network surveys. Both networks are limited to study participants. Since a wide range of attributes on each node were collected in self-reports, we refer to these networks as attribute-rich networks. We study whether student preferences for certain attributes of friends can predict formation and dissolution of edges in both networks. We introduce a method for computing student preferences for different attributes which we use to predict link formation and dissolution. We then rank these attributes according to their importance for making predictions. We find that personal preferences, in particular political views, and preferences for common activities help predict link formation and dissolution in both the behavioral and cognitive networks.\nWe consider the problem of dense depth prediction from a sparse set of depth measurements and a single RGB image. Since depth estimation from monocular images alone is inherently ambiguous and unreliable, to attain a higher level of robustness and accuracy, we introduce additional sparse depth samples, which are either acquired with a low-resolution depth sensor or computed via visual Simultaneous Localization and Mapping (SLAM) algorithms. We propose the use of a single deep regression network to learn directly from the RGB-D raw data, and explore the impact of number of depth samples on prediction accuracy. Our experiments show that, compared to using only RGB images, the addition of 100 spatially random depth samples reduces the prediction root-mean-square error by 50% on the NYU-Depth-v2 indoor dataset. It also boosts the percentage of reliable prediction from 59% to 92% on the KITTI dataset. 
We demonstrate two applications of the proposed algorithm: a plug-in module in SLAM to convert sparse maps to dense maps, and super-resolution for LiDARs. Software and video demonstration are publicly available.\nE-commerce websites such as Amazon, Alibaba, Flipkart, and Walmart sell billions of products. Machine learning (ML) algorithms involving products are often used to improve the customer experience and increase revenue, e.g., product similarity, recommendation, and price estimation. The products are required to be represented as features before training an ML algorithm. In this paper, we propose an approach called MRNet-Product2Vec for creating generic embeddings of products within an e-commerce ecosystem. We learn a dense and low-dimensional embedding where a diverse set of signals related to a product are explicitly injected into its representation. We train a Discriminative Multi-task Bidirectional Recurrent Neural Network (RNN), where the input is a product title fed through a Bidirectional RNN and at the output, product labels corresponding to fifteen different tasks are predicted. The task set includes several intrinsic characteristics about a product such as price, weight, size, color, popularity, and material. We evaluate the proposed embedding quantitatively and qualitatively. We demonstrate that they are almost as good as sparse and extremely high-dimensional TF-IDF representation in spite of having less than 3% of the TF-IDF dimension. We also use a multimodal autoencoder for comparing products from different language-regions and show preliminary yet promising qualitative results.\nGraph is an important data representation which appears in a wide diversity of real-world scenarios. Effective graph analytics provides users a deeper understanding of what is behind the data, and thus can benefit a lot of useful applications such as node classification, node recommendation, link prediction, etc. 
However, most graph analytics methods suffer from high computation and space costs. Graph embedding is an effective yet efficient way to solve the graph analytics problem. It converts the graph data into a low-dimensional space in which the graph structural information and graph properties are maximally preserved. In this survey, we conduct a comprehensive review of the literature in graph embedding. We first introduce the formal definition of graph embedding as well as the related concepts. After that, we propose two taxonomies of graph embedding which correspond to what challenges exist in different graph embedding problem settings and how existing work addresses these challenges in its solutions. Finally, we summarize the applications that graph embedding enables and suggest four promising future research directions in terms of computation efficiency, problem settings, techniques and application scenarios.\nInstrumenting and collecting annotated visual grasping datasets to train modern machine learning algorithms can be extremely time-consuming and expensive. An appealing alternative is to use off-the-shelf simulators to render synthetic data for which ground-truth annotations are generated automatically. Unfortunately, models trained purely on simulated data often fail to generalize to the real world. We study how randomized simulated environments and domain adaptation methods can be extended to train a grasping system to grasp novel objects from raw monocular RGB images. We extensively evaluate our approaches with a total of more than 25,000 physical test grasps, studying a range of simulation conditions and domain adaptation methods, including a novel extension of pixel-level domain adaptation that we term the GraspGAN. We show that, by using synthetic data and domain adaptation, we are able to reduce the number of real-world samples needed to achieve a given level of performance by up to 50 times, using only randomly generated simulated objects. 
We also show that by using only unlabeled real-world data and our GraspGAN methodology, we obtain real-world grasping performance without any real-world labels that is similar to that achieved with 939,777 labeled real-world samples.\nAuthoring of OWL-DL ontologies is intellectually challenging, and to make this process simpler, many systems accept natural language text as input. A text-based ontology authoring approach can be successful only when it is combined with an effective method for extracting ontological axioms from text. Extracting axioms from unrestricted English input is a substantially challenging task due to the richness of the language. Controlled natural languages (CNLs) have been proposed in this context, but these tend to be highly restrictive. In this paper, we propose a new CNL called TEDEI (TExtual DEscription Identifier) whose grammar is inspired by the different ways OWL-DL constructs are expressed in English. We built a system that transforms TEDEI sentences into corresponding OWL-DL axioms. Ambiguity due to different possible lexicalizations of sentences and semantic ambiguity present in sentences are challenges in this context. We find that the best way to handle these challenges is to construct axioms corresponding to alternative formalizations of the sentence so that the end-user can make an appropriate choice. The output is compared against human-authored axioms, and in a substantial number of cases the human-authored axiom is indeed one of the alternatives given by the system. The proposed system substantially enhances the types of sentence structures that can be used for ontology authoring.\nBiclustering techniques have been widely used to identify homogeneous subgroups within large data matrices, such as subsets of genes similarly expressed across subsets of patients. 
Mining a max-sum sub-matrix is a related but distinct problem for which one looks for a (not necessarily contiguous) rectangular sub-matrix with a maximal sum of its entries. Le Van et al. (Ranked Tiling, 2014) already illustrated its applicability to gene expression analysis and addressed it with a constraint programming (CP) approach combined with large neighborhood search (CP-LNS). In this work, we exhibit some key properties of this NP-hard problem and define a bounding function such that larger problems can be solved in reasonable time. Two different algorithms are proposed in order to exploit the highlighted characteristics of the problem: a CP approach with a global constraint (CPGC) and mixed integer linear programming (MILP). Practical experiments conducted on both synthetic and real gene expression data exhibit the characteristics of these approaches and their relative benefits over the original CP-LNS method. Overall, the CPGC approach tends to be the fastest to produce a good solution. Yet, the MILP formulation is arguably the easiest to formulate and can also be competitive.\nRecently there has been increasing interest in probabilistic solvers for ordinary differential equations (ODEs) that return full probability measures, instead of point estimates, over the solution and can incorporate uncertainty over the ODE at hand, e.g. if the vector field or the initial value is only approximately known or evaluable. The ODE filter proposed in recent work models the solution of the ODE by a Gauss-Markov process which serves as a prior in the sense of Bayesian statistics. While previous work employed a Wiener process prior on the (possibly multiple times) differentiated solution of the ODE and established equivalence of the corresponding solver with classical numerical methods, this paper raises the question whether other priors also yield practically useful solvers. 
To this end, we discuss a range of possible priors which enable fast filtering and propose a new prior--the Integrated Ornstein-Uhlenbeck Process (IOUP)--that complements the existing Integrated Wiener process (IWP) filter by encoding the property that a derivative in time of the solution is bounded in the sense that it tends to drift back to zero. We provide experiments comparing IWP and IOUP filters which support the belief that IWP better approximates the solutions of divergent ODEs, whereas IOUP is a better prior for trajectories with bounded derivatives.\nA new prior is proposed for representation learning, which can be combined with other priors in order to help disentangling abstract factors from each other. It is inspired by the phenomenon of consciousness seen as the formation of a low-dimensional combination of a few concepts constituting a conscious thought, i.e., consciousness as awareness at a particular time instant. This provides a powerful constraint on the representation in that such low-dimensional thought vectors can correspond to statements about reality which are true, highly probable, or very useful for taking decisions. The fact that a few elements of the current state can be combined into such a predictive or useful statement is a strong constraint and deviates considerably from the maximum likelihood approaches to modelling data and how states unfold in the future based on an agent's actions. Instead of making predictions in the sensory (e.g. pixel) space, the consciousness prior allows the agent to make predictions in the abstract space, with only a few dimensions of that space being involved in each of these predictions. 
The consciousness prior also makes it natural to map conscious states to natural language utterances or to express classical AI knowledge in the form of facts and rules, although the conscious states may be richer than what can be expressed easily in the form of a sentence, a fact or a rule.\nAutomatically generating coherent and semantically meaningful text has many applications in machine translation, dialogue systems, image captioning, etc. Recently, by combining with policy gradient, Generative Adversarial Nets (GAN) that use a discriminative model to guide the training of the generative model as a reinforcement learning policy has shown promising results in text generation. However, the scalar guiding signal is only available after the entire text has been generated and lacks intermediate information about text structure during the generative process. As such, it limits its success when the length of the generated text samples is long (more than 20 words). In this paper, we propose a new framework, called LeakGAN, to address the problem for long text generation. We allow the discriminative net to leak its own high-level extracted features to the generative net to further help the guidance. The generator incorporates such informative signals into all generation steps through an additional Manager module, which takes the extracted features of current generated words and outputs a latent vector to guide the Worker module for next-word generation. Our extensive experiments on synthetic data and various real-world tasks with Turing test demonstrate that LeakGAN is highly effective in long text generation and also improves the performance in short text generation scenarios. 
More importantly, without any supervision, LeakGAN would be able to implicitly learn sentence structures only through the interaction between Manager and Worker.\nUltra-dense heterogeneous networks (Ud-HetNets) have been put forward to improve the network capacity for next-generation wireless networks. However, counter to the 5G vision, ultra-dense deployment of networks would significantly increase energy consumption and thus decrease network energy efficiency, owing to the conventional worst-case network design philosophy. This problem becomes particularly severe when Ud-HetNets meet big data because of the traditional reactive request-transmit service mode. In view of this, this article first develops a big-data-aware, artificial-intelligence-based framework for energy-efficient operations of Ud-HetNets. Based on the framework, we then identify four promising techniques, namely big data analysis, adaptive base station operation, proactive caching, and interference-aware resource allocation, to reduce energy cost on both large and small scales. We further develop a load-aware stochastic optimization approach to show the potential of our proposed framework and techniques in energy conservation. In a nutshell, we are devoted to constructing green Ud-HetNets of big data with the abilities of learning and inferring by improving the flexibility of control from worst-case to adaptive design and shifting the manner of services from reactive to proactive modes.\nWe propose Object-oriented Neural Programming (OONP), a framework for semantically parsing documents in specific domains. Basically, OONP reads a document and parses it into a predesigned object-oriented data structure (referred to as ontology in this paper) that reflects the domain-specific semantics of the document. 
An OONP parser models semantic parsing as a decision process: a neural net-based Reader sequentially goes through the document, and during the process it builds and updates an intermediate ontology to summarize its partial understanding of the text it covers. OONP supports a rich family of operations (both symbolic and differentiable) for composing the ontology, and a big variety of forms (both symbolic and differentiable) for representing the state and the document. An OONP parser can be trained with supervision of different forms and strength, including supervised learning (SL) , reinforcement learning (RL) and hybrid of the two. Our experiments on both synthetic and real-world document parsing tasks have shown that OONP can learn to handle fairly complicated ontology with training data of modest sizes.\nThis paper is concerned with the problem of exact MAP inference in general higher-order graphical models by means of a traditional linear programming relaxation approach. In fact, the proof that we have developed in this paper is a rather simple algebraic proof being made straightforward, above all, by the introduction of two novel algebraic tools. Indeed, on the one hand, we introduce the notion of delta-distribution which merely stands for the difference of two arbitrary probability distributions, and which mainly serves to alleviate the sign constraint inherent to a traditional probability distribution. On the other hand, we develop an approximation framework of general discrete functions by means of an orthogonal projection expressing in terms of linear combinations of function margins with respect to a given collection of point subsets, though, we rather exploit the latter approach for the purpose of modeling locally consistent sets of discrete functions from a global perspective. 
After that, as a first step, we develop from scratch the expectation optimization framework which is nothing else than a reformulation, on stochastic grounds, of the convex-hull approach, as a second step, we develop the traditional LP relaxation of such an expectation optimization approach, and we show that it enables to solve the MAP inference problem in graphical models under rather general assumptions. Last but not least, we describe an algorithm which allows to compute an exact MAP solution from a perhaps fractional optimal (probability) solution of the proposed LP relaxation.\nScholars and practitioners across domains are increasingly concerned with algorithmic transparency and opacity, interrogating the values and assumptions embedded in automated, black-boxed systems, particularly in user-generated content platforms. I report from an ethnography of infrastructure in Wikipedia to discuss an often understudied aspect of this topic: the local, contextual, learned expertise involved in participating in a highly automated social-technical environment. Today, the organizational culture of Wikipedia is deeply intertwined with various data-driven algorithmic systems, which Wikipedians rely on to help manage and govern the \"anyone can edit\" encyclopedia at a massive scale. These bots, scripts, tools, plugins, and dashboards make Wikipedia more efficient for those who know how to work with them, but like all organizational culture, newcomers must learn them if they want to fully participate. I illustrate how cultural and organizational expertise is enacted around algorithmic agents by discussing two autoethnographic vignettes, which relate my personal experience as a veteran in Wikipedia. I present thick descriptions of how governance and gatekeeping practices are articulated through and in alignment with these automated infrastructures. 
Over the past 15 years, Wikipedian veterans and administrators have made specific decisions to support administrative and editorial workflows with automation in particular ways and not others. I use these cases of Wikipedia's bot-supported bureaucracy to discuss several issues in the fields of critical algorithms studies, critical data studies, and fairness, accountability, and transparency in machine learning -- most principally arguing that scholarship and practice must go beyond trying to \"open up the black box\" of such systems and also examine sociocultural processes like newcomer socialization.\nIn the context of fitness coaching or for rehabilitation purposes, the motor actions of a human participant must be observed and analyzed for errors in order to provide effective feedback. This task is normally carried out by human coaches, and it needs to be solved automatically in technical applications that are to provide automatic coaching (e.g. training environments in VR). However, most coaching systems only provide coarse information on movement quality, such as a scalar value per body part that describes the overall deviation from the correct movement. Further, they are often limited to static body postures or rather simple movements of single body parts. While there are many approaches to distinguish between different types of movements (e.g., between walking and jumping), the detection of more subtle errors in a motor performance is less investigated. We propose a novel approach to classify errors in sports or rehabilitation exercises such that feedback can be delivered in a rapid and detailed manner: Homogeneous sub-sequences of exercises are first temporally aligned via Dynamic Time Warping. Next, we extract a feature vector from the aligned sequences, which serves as a basis for feature selection using Random Forests. The selected features are used as input for Support Vector Machines, which finally classify the movement errors. 
We compare our algorithm to a well established state-of-the-art approach in time series classification, 1-Nearest Neighbor combined with Dynamic Time Warping, and show our algorithm's superiority regarding classification quality as well as computational cost.\nWith the advancement of treatment modalities in radiation therapy for cancer patients, outcomes have improved, but at the cost of increased treatment plan complexity and planning time. The accurate prediction of dose distributions would alleviate this issue by guiding clinical plan optimization to save time and maintain high quality plans. We have modified a convolutional deep network model, U-net (originally designed for segmentation purposes), for predicting dose from patient image contours. We show that, as an example, we are able to accurately predict the dose of intensity-modulated radiation therapy (IMRT) for prostate cancer patients, where the average dice similarity coefficient is 0.91 when comparing the predicted vs. true isodose volumes between 0% and 100% of the prescription dose. The average value of the absolute differences in [max, mean] dose is found to be under 5% of the prescription dose, specifically for each structure is [1.80%, 1.03%](PTV), [1.94%, 4.22%](Bladder), [1.80%, 0.48%](Body), [3.87%, 1.79%](L Femoral Head), [5.07%, 2.55%](R Femoral Head), and [1.26%, 1.62%](Rectum) of the prescription dose. We thus managed to map a desired radiation dose distribution from a patient's PTV and OAR contours. As an additional advantage, relatively little data was used in the techniques and models described in this paper.\nThis paper introduces a novel real-time Fuzzy Supervised Learning with Binary Meta-Feature (FSL-BM) for big data classification task. The study of real-time algorithms addresses several major concerns, which are namely: accuracy, memory consumption, and ability to stretch assumptions and time complexity. 
Attaining a fast computational model providing fuzzy logic and supervised learning is one of the main challenges in machine learning. In this research paper, we present the FSL-BM algorithm as an efficient solution of supervised learning with fuzzy logic processing using binary meta-feature representation using Hamming Distance and Hash function to relax assumptions. While many studies focused on reducing time complexity and increasing accuracy during the last decade, the novel contribution of this proposed solution comes through the integration of Hamming Distance, Hash function, binary meta-features, and binary classification to provide a real-time supervised method. The Hash Tables (HT) component gives fast access to existing indices and therefore allows new indices to be generated in constant time, which supersedes existing fuzzy supervised algorithms with better or comparable results. To summarize, the main contribution of this technique for real-time Fuzzy Supervised Learning is to represent hypotheses through binary input as a meta-feature space and to create the Fuzzy Supervised Hash table to train and validate the model.\nIn this dissertation the practical speech emotion recognition technology is studied, including several cognitive related emotion types, namely fidgetiness, confidence and tiredness. The high quality of naturalistic emotional speech data is the basis of this research. The following techniques are used for inducing practical emotional speech: cognitive task, computer game, noise stimulation, sleep deprivation and movie clips. A practical speech emotion recognition system is studied based on Gaussian mixture model. A two-class classifier set is adopted for performance improvement under the small sample case. Considering the context information in continuous emotional speech, a Gaussian mixture model embedded with Markov networks is proposed. A further study is carried out for system robustness analysis. 
First, a noise reduction algorithm based on auditory masking properties is introduced to practical speech emotion recognition. Second, to deal with the complicated unknown emotion types in real situations, an emotion recognition method with rejection ability is proposed, which enhances the system compatibility against unknown emotion samples. Third, to cope with the difficulties brought by a large number of unknown speakers, an emotional feature normalization method based on speaker-sensitive feature clustering is proposed. Fourth, by adding the electrocardiogram channel, a bi-modal emotion recognition system based on speech signals and electrocardiogram signals is introduced for the first time. The speech emotion recognition methods studied in this dissertation may be extended to cross-language speech emotion recognition and whispered speech emotion recognition.\nThe increasing interconnectivity of industrial networks is one of the central current hot topics. It is addressed by research institutes as well as industry. In order to realize the fourth industrial revolution, full connectivity between production facilities is necessary. Due to this connectivity, however, an abundance of new attack vectors emerges. In the National Reference Project for Industrial IT-Security (IUNO), these risks and threats are addressed and solutions are developed. These solutions are especially applicable for small and medium-sized enterprises that have fewer staff and less money than larger companies. These enterprises should be able to implement the solutions without much effort. The security solutions are derived from four use cases and implemented prototypically. A further topic of this work is the research areas of the German Research Center for Artificial Intelligence that address the given challenges, as well as the solutions developed in the context of IUNO. 
Aside from the project itself, a method for distributed network data collection and aggregation is presented, as a prerequisite for anomaly detection for network security.\nWe address the problem of assisting human dispatchers in operating power grids in today's changing context using machine learning, with the aim of increasing security and reducing costs. Power networks are highly regulated systems, which at all times must meet varying demands of electricity with a complex production system, including conventional power plants, less predictable renewable energies (such as wind or solar power), and the possibility of buying/selling electricity on the international market with more and more actors involved at a European scale. This problem is becoming ever more challenging in an aging network infrastructure. One of the primary goals of dispatchers is to protect equipment (e.g. avoid that transmission lines overheat) with few degrees of freedom: we are considering in this paper solely modifications in network topology, i.e. re-configuring the way in which lines, transformers, productions and loads are connected in sub-stations. Using years of historical data collected by the French Transmission Service Operator (TSO) \"Réseau de Transport d'Électricité\" (RTE), we develop novel machine learning techniques (drawing on \"deep learning\") to mimic human decisions to devise \"remedial actions\" to prevent any line from violating power flow limits (so-called \"thermal limits\"). The proposed technique is hybrid. It does not rely purely on machine learning: every action will be tested with actual simulators before being proposed to the dispatchers or implemented on the grid.\nThis paper focuses on two commonly used path assignment policies for agents traversing a congested network: self-interested routing, and system-optimum routing. 
In the self-interested routing policy each agent selects a path that optimizes its own utility, while the system-optimum routing agents are assigned paths with the goal of maximizing system performance. This paper considers a scenario where a centralized network manager wishes to optimize utilities over all agents, i.e., implement a system-optimum routing policy. In many real-life scenarios, however, the system manager is unable to influence the route assignment of all agents due to limited influence on route choice decisions. Motivated by such scenarios, a computationally tractable method is presented that computes the minimal number of agents that the system manager needs to influence (compliant agents) in order to achieve system optimal performance. Moreover, this methodology can also determine whether a given set of compliant agents is sufficient to achieve system optimum and compute the optimal route assignment for the compliant agents to do so. Experimental results are presented showing that in several large-scale, realistic traffic networks optimal flow can be achieved with as few as 13% and at most 54% of the agents being compliant.\nWe present Edina, the University of Edinburgh's social bot for the Amazon Alexa Prize competition. Edina is a conversational agent whose responses utilize data harvested from Amazon Mechanical Turk (AMT) through an innovative new technique we call self-dialogues. These are conversations in which a single AMT Worker plays both participants in a dialogue. Such dialogues are surprisingly natural, efficient to collect and reflective of relevant and/or trending topics. These self-dialogues provide training data for a generative neural network as well as a basis for soft rules used by a matching score component. 
Each match of a soft rule against a user utterance is associated with a confidence score which we show is strongly indicative of reply quality, allowing this component to self-censor and be effectively integrated with other components. Edina's full architecture features a rule-based system backing off to a matching score, backing off to a generative neural network. Our hybrid data-driven methodology thus addresses both coverage limitations of a strictly rule-based approach and the lack of guarantees of a strictly machine-learning approach.\nDeveloping a safe and efficient collision avoidance policy for multiple robots is challenging in decentralized scenarios where each robot generates its paths without observing other robots' states and intents. While other distributed multi-robot collision avoidance systems exist, they often require extracting agent-level features to plan a local collision-free action, which can be computationally prohibitive and not robust. More importantly, in practice the performance of these methods is much lower than that of their centralized counterparts. We present a decentralized sensor-level collision avoidance policy for multi-robot systems, which directly maps raw sensor measurements to an agent's steering commands in terms of movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to find an optimal policy which is trained over a large number of robots on rich, complex environments simultaneously using a policy gradient based reinforcement learning algorithm. We validate the learned sensor-level collision avoidance policy in a variety of simulated scenarios with thorough performance evaluations and show that the final learned policy is able to find time-efficient, collision-free paths for a large-scale robot system. 
We also demonstrate that the learned policy can be well generalized to new scenarios that do not appear in the entire training period, including navigating a heterogeneous group of robots and a large-scale scenario with 100 robots. Videos are available at https://sites.google.com/view/drlmaca\nWhile recent advances in deep reinforcement learning have allowed autonomous learning agents to succeed at a variety of complex tasks, existing algorithms generally require a lot of training data. One way to increase the speed at which agents are able to learn to perform tasks is by leveraging the input of human trainers. Although such input can take many forms, real-time, scalar-valued feedback is especially useful in situations where it proves difficult or impossible for humans to provide expert demonstrations. Previous approaches have shown the usefulness of human input provided in this fashion (e.g., the TAMER framework), but they have thus far not considered high-dimensional state spaces or employed the use of deep learning. In this paper, we do both: we propose Deep TAMER, an extension of the TAMER framework that leverages the representational power of deep neural networks in order to learn complex tasks in just a short amount of time with a human trainer. We demonstrate Deep TAMER's success by using it and just 15 minutes of human-provided feedback to train an agent that performs better than humans on the Atari game of Bowling - a task that has proven difficult for even state-of-the-art reinforcement learning methods.\nEmbedded, continual learning for autonomous and adaptive behavior is a key application of neuromorphic hardware designed to mimic the dynamics and architecture of biological neural networks. However, neuromorphic implementations of embedded learning at large scales that are both flexible and efficient have been hindered by a lack of a suitable algorithmic framework. 
As a result, most neuromorphic hardware is trained off-line on large clusters of dedicated processors or GPUs and transferred post hoc to the device. We address this by introducing the neural and synaptic array transceiver (NSAT), a neuromorphic computational framework facilitating flexible and efficient embedded learning. NSAT supports event-driven supervised, unsupervised and reinforcement learning algorithms including deep learning. We demonstrate NSAT in a wide range of tasks, including the simulation of the Mihalas-Niebur neuron, dynamic neural fields, event-driven random back-propagation for event-based deep learning, event-based contrastive divergence for unsupervised learning, and voltage-based learning rules for sequence learning. We anticipate that this contribution will establish the foundation for a new generation of devices enabling adaptive mobile systems, wearable devices, and robots with data-driven autonomy.\nWe motivate and describe a new freely available human-human dialogue dataset for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner. The data has been collected using a novel, character-by-character variant of the DiET chat tool (Healey et al., 2003; Mills and Healey, submitted) with a novel task, where a Learner needs to learn invented visual attribute words (such as \"burchak\" for square) from a tutor. As such, the text-based interactions closely resemble face-to-face conversation and thus contain many of the linguistic phenomena encountered in natural, spontaneous dialogue. These include self- and other-correction, mid-sentence continuations, interruptions, overlaps, fillers, and hedges. We also present a generic n-gram framework for building user (i.e. tutor) simulations from this type of incremental data, which is freely available to researchers. We show that the simulations produce outputs that are similar to the original data (e.g. 78% turn match similarity).
Finally, we train and evaluate a Reinforcement Learning dialogue control agent for learning visually grounded word meanings, trained from the BURCHAK corpus. The learned policy shows comparable performance to a rule-based system built previously.\nEnabling robots to autonomously navigate complex environments is essential for real-world deployment. Prior methods approach this problem by having the robot maintain an internal map of the world, and then use a localization and planning method to navigate through the internal map. However, these approaches often include a variety of assumptions, are computationally intensive, and do not learn from failures. In contrast, learning-based methods improve as the robot acts in the environment, but are difficult to deploy in the real-world due to their high sample complexity. To address the need to learn complex policies with few samples, we propose a generalized computation graph that subsumes value-based model-free methods and model-based methods, with specific instantiations interpolating between model-free and model-based. We then instantiate this graph to form a navigation model that learns from raw images and is sample efficient. Our simulated car experiments explore the design decisions of our navigation model, and show our approach outperforms single-step and $N$-step double Q-learning. We also evaluate our approach on a real-world RC car and show it can learn to navigate through a complex indoor environment with a few hours of fully autonomous, self-supervised training. Videos of the experiments and code can be found at github.com/gkahn13/gcg\nIn this work, we present an approach to deep visuomotor control using structured deep dynamics models. Our deep dynamics model, a variant of SE3-Nets, learns a low-dimensional pose embedding for visuomotor control via an encoder-decoder structure. 
Unlike prior work, our dynamics model is structured: given an input scene, our network explicitly learns to segment salient parts and predict their pose-embedding along with their motion modeled as a change in the pose space due to the applied actions. We train our model using a pair of point clouds separated by an action and show that, given supervision only in the form of point-wise data associations between the frames, our network is able to learn a meaningful segmentation of the scene along with consistent poses. We further show that our model can be used for closed-loop control directly in the learned low-dimensional pose space, where the actions are computed by minimizing error in the pose space using gradient-based methods, similar to traditional model-based control. We present results on controlling a Baxter robot from raw depth data in simulation and in the real world and compare against two baseline deep networks. Our method runs in real-time, achieves good prediction of scene dynamics and outperforms the baseline methods on multiple control runs. Video results can be found at: https://rse-lab.cs.washington.edu/se3-structured-deep-ctrl/\nDeep neural networks are increasingly being used in a variety of machine learning applications applied to rich user data on the cloud. However, this approach introduces a number of privacy and efficiency challenges, as the cloud operator can perform secondary inferences on the available data. Recently, advances in edge processing have paved the way for more efficient, and private, data processing at the source for simple tasks and lighter models, though they remain a challenge for larger and more complicated models. In this paper, we present a hybrid approach for breaking down large, complex deep models for cooperative, privacy-preserving analytics. We do this by breaking down popular deep architectures and fine-tuning them in a particular way.
We then evaluate the privacy benefits of this approach based on the information exposed to the cloud service. We also assess the local inference cost of different layers on a modern handset for mobile applications. Our evaluations show that by using certain kinds of fine-tuning and embedding techniques, at a small processing cost, we can greatly reduce the level of information available to unintended tasks applied to the data features on the cloud, thereby achieving the desired tradeoff between privacy and performance.\nA Mobile Ad hoc Network (MANET) is an infrastructure-less network formed between a set of mobile nodes. The discovery of services in a MANET is a challenging job due to the unique properties of the network. In this paper, a novel service discovery framework called Hybrid Association Rules Based Network Layer Discovery of Services for Ad hoc Networks (HANDY) is proposed. HANDY provides three major research contributions. First, it adopts a cross-layer optimized design for discovery of services that is based on simultaneous discovery of services and corresponding routes. Second, it provides a multi-level ontology-based approach to describe the services. This resolves the issue of semantic interoperability among the service consumers in a scalable fashion. Finally, to further optimize the performance of the discovery process, HANDY recommends exploiting the inherent associations present among the services. These associations are used in two ways. First, periodic service advertisements are performed based on these associations. In addition, when a response to a service discovery request is generated, correlated services are also attached to the response. The proposed service discovery scheme has been implemented in the JIST/SWANS simulator.
The results demonstrate that the proposed modifications improve both the hit ratio of the service consumers and the latency of the discovery process.\nThe infamous exploration-exploitation dilemma is one of the oldest and most important problems in reinforcement learning (RL). Deliberate and effective exploration is necessary for RL agents to succeed in most environments. However, until very recently, even very sophisticated RL algorithms employed simple, undirected exploration strategies in large-scale RL tasks.   We introduce a new optimistic count-based exploration algorithm for RL that is feasible in high-dimensional MDPs. The success of RL algorithms in these domains depends crucially on generalization from limited training experience. Function approximation techniques enable RL agents to generalize in order to estimate the value of unvisited states, but at present few methods have achieved generalization about the agent's uncertainty regarding unvisited states. We present a new method for computing a generalized state visit-count, which allows the agent to estimate the uncertainty associated with any state.   In contrast to existing exploration techniques, our $\\phi$-$\\textit{pseudocount}$ achieves generalization by exploiting the feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The resulting $\\phi$-$\\textit{Exploration-Bonus}$ algorithm rewards the agent for exploring in feature space rather than in the original state space. This method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks.
In particular, we report world-class results on several notoriously difficult Atari 2600 video games, including Montezuma's Revenge.\nThe meteoric rise of deep learning models in computer vision research, having achieved human-level accuracy in image recognition tasks, is firm evidence of the impact of representation learning of deep neural networks. In the chemistry domain, recent advances have also led to the development of similar CNN models, such as Chemception, which is trained to predict chemical properties using images of molecular drawings. In this work, we investigate the effects of systematically removing and adding localized domain-specific information to the image channels of the training data. By augmenting images with only three additional pieces of basic information, and without introducing any architectural changes, we demonstrate that an augmented Chemception (AugChemception) outperforms the original model in the prediction of toxicity, activity, and solvation free energy. Then, by altering the information content in the images, and examining the resulting model's performance, we also identify two distinct learning patterns in predicting toxicity/activity as compared to solvation free energy. These patterns suggest that Chemception is learning about its tasks in a manner that is consistent with established knowledge. Thus, our work demonstrates that advanced chemical knowledge is not a pre-requisite for deep learning models to accurately predict complex chemical properties.\nOptimizing deep neural networks (DNNs) often suffers from ill-conditioning. We observe that the scaling-based weight space symmetry property in rectified nonlinear networks causes this negative effect. Therefore, we propose to constrain the incoming weights of each neuron to be unit-norm, which is formulated as an optimization problem over the Oblique manifold.
A simple yet efficient method referred to as projection-based weight normalization (PBWN) is also developed to solve this problem. PBWN executes standard gradient updates, followed by projecting the updated weight back onto the Oblique manifold. This proposed method has a regularizing effect and collaborates well with the commonly used batch normalization technique. We conduct comprehensive experiments on several widely-used image datasets including CIFAR-10, CIFAR-100, SVHN and ImageNet for supervised learning over the state-of-the-art convolutional neural networks, such as Inception, VGG and residual networks. The results show that our method consistently improves the performance of DNNs with different architectures. We also apply our method to the Ladder network for semi-supervised learning on the permutation-invariant MNIST dataset, and our method outperforms the state-of-the-art methods: we obtain test errors of 2.52%, 1.06%, and 0.91% with only 20, 50, and 100 labeled samples, respectively.\nClustering samples according to an effective metric and/or vector space representation is a challenging unsupervised learning task with a wide spectrum of applications. Among several clustering algorithms, k-means and its kernelized version still have a wide audience because of their conceptual simplicity and efficacy. However, the systematic application of the kernelized version of k-means is hampered by its inherent quadratic scaling in memory with the number of samples. In this contribution, we devise an approximate strategy to minimize the kernel k-means cost function in which the trade-off between accuracy and speed is automatically governed by the available system memory. Moreover, we define an ad-hoc parallelization scheme well suited for hybrid CPU-GPU state-of-the-art parallel architectures.
We demonstrate the effectiveness of both the approximation scheme and the parallelization method on standard UCI datasets and on molecular dynamics (MD) data in the realm of computational chemistry. In this applicative domain, clustering can play a key role both in quantitatively estimating kinetic rates via Markov State Models and in giving a qualitative, human-compatible summary of the underlying chemical phenomenon under study. For these reasons, we selected it as a valuable real-world application scenario.\nA smart grid can be considered as a complex network where each node represents a generation unit or a consumer, whereas links can be used to represent transmission lines. One way to study complex systems is by using the agent-based modeling (ABM) paradigm. An ABM is a way of representing a complex system of autonomous agents interacting with each other. Previously, a number of studies have been presented in the smart grid domain making use of the ABM paradigm. However, to the best of our knowledge, none of these studies have focused on the specification aspect of ABM. An ABM specification is important not only for understanding but also for replication of the model. In this study, we focus on development as well as specification of ABM for smart grids. We propose an ABM by using a combination of agent-based and complex network-based approaches. For ABM specification, we use the ODD and DREAM specification approaches. We analyze these two specification approaches qualitatively as well as quantitatively. Extensive experiments demonstrate that DREAM is a more useful approach than ODD for modeling as well as for replication of models for smart grids.\nOne of the key challenges when looking for the causes of a complex event is to determine the causal status of factors that are neither individually necessary nor individually sufficient to produce that event.
In order to reason about how such factors should be taken into account, we need a vocabulary to distinguish different cases. In philosophy, the concept of overdetermination and the concept of preemption serve an important purpose in this regard, although their exact meaning tends to remain elusive. In this paper, I provide theory-neutral definitions of these concepts using structural equations in the Halpern-Pearl tradition. While my definitions do not presuppose any particular causal theory, they take such a theory as a variable parameter. This enables us to specify formal constraints on theories of causality, in terms of a pre-theoretic understanding of what preemption and overdetermination actually mean. I demonstrate the usefulness of this by presenting and arguing for what I call the principle of presumption. Roughly speaking, this principle states that a possible cause can only be regarded as having been preempted if there is independent evidence to support such an inference. I conclude by showing that the principle of presumption is violated by the two main theories of causality formulated in the Halpern-Pearl tradition. The paper concludes by defining the class of empirical causal theories, characterised in terms of a fixed-point of counterfactual reasoning about difference-making. It is argued that theories of actual causality ought to be empirical.\nThe goal of this paper is to advance an extensible theory of living systems using an approach to biomathematics and biocomputation that suitably addresses self-organized, self-referential and anticipatory systems with multi-temporal multi-agents. Our first step is to provide foundations for modelling of emergent and evolving dynamic multi-level organic complexes and their sustentative processes in artificial and natural life systems. Main applications are in life sciences, medicine, ecology and astrobiology, as well as robotics, industrial automation and man-machine interface. 
Since 2011, over 100 scientists from a number of disciplines have been exploring a substantial set of theoretical frameworks for a comprehensive theory of life known as Integral Biomathics. That effort identified the need for a robust core model of organisms as dynamic wholes, using advanced and adequately computable mathematics. The work described here for that core combines the advantages of a situation- and context-aware multivalent computational logic for active self-organizing networks, Wandering Logic Intelligence (WLI), and a multi-scale dynamic category theory, Memory Evolutive Systems (MES), hence WLIMES. This is presented to the modeller via a formal augmented reality language as a first step towards practical modelling and simulation of multi-level living systems. Initial work focuses on the design and implementation of this visual language and calculus (VLC) and its graphical user interface. The results will be integrated within the current methodology and practices of theoretical biology and (personalized) medicine to deepen and to enhance the holistic understanding of life.\nDeep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increase. We introduce a technique to train deep neural networks using half-precision floating-point numbers. In our technique, weights, activations and gradients are stored in IEEE half-precision format. Half-precision floating-point numbers have a limited numerical range compared to single-precision numbers. We propose two techniques to handle this loss of information. Firstly, we recommend maintaining a single-precision copy of the weights that accumulates the gradients after each optimizer step. This single-precision copy is rounded to half-precision format during training.
Secondly, we propose scaling the loss appropriately to handle the loss of information with half-precision gradients. We demonstrate that this approach works for a wide variety of models including convolutional neural networks, recurrent neural networks and generative adversarial networks. This technique works for large-scale models with more than 100 million parameters trained on large datasets. Using this approach, we can reduce the memory consumption of deep learning models by nearly 2x. In future processors, we can also expect a significant computation speedup using half-precision hardware units.\nThis article expands on research that has been done to develop a recurrent neural network (RNN) capable of predicting aircraft engine vibrations using long short-term memory (LSTM) neurons. LSTM RNNs can provide a more generalizable and robust method for prediction than analytical calculations of engine vibration, as analytical calculations must be solved iteratively based on specific empirical engine parameters, making this approach ungeneralizable across multiple engines. In initial work, multiple LSTM RNN architectures were proposed, evaluated and compared. This research improves the performance of the most effective LSTM network design proposed in the previous work by using a promising neuroevolution method based on ant colony optimization (ACO) to develop and enhance the LSTM cell structure of the network. A parallelized version of the ACO neuroevolution algorithm has been developed and the evolved LSTM RNNs were compared to the previously used fixed topology. The evolved networks were trained on a large database of flight data records obtained from an airline containing flights that suffered from excessive vibration. Results were obtained using MPI (Message Passing Interface) on a high performance computing (HPC) cluster, evolving 1000 different LSTM cell structures using 168 cores over 4 days.
The new evolved LSTM cells showed an improvement of 1.35%, reducing prediction error from 5.51% to 4.17% when predicting excessive engine vibrations 10 seconds in the future, while at the same time dramatically reducing the number of weights from 21,170 to 11,810.\nWe present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling-based path planning with reinforcement learning (RL) agents. The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology, while the sampling-based planners provide an approximate map of the space of possible configurations of the robot from which collision-free trajectories feasible for the RL agents can be identified. The same RL agents are used to control the robot under the direction of the planning, enabling long-range navigation. We use the Probabilistic Roadmaps (PRMs) for the sampling-based planner. The RL agents are constructed using feature-based and deep neural net policies in continuous state and action spaces. We evaluate PRM-RL on two navigation tasks with non-trivial robot dynamics: end-to-end differential drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load displacement constraints. These evaluations included both simulated environments and on-robot tests. Our results show improvement in navigation task completion over both RL agents on their own and traditional sampling-based planners. 
In the indoor navigation task, PRM-RL successfully completes trajectories up to 215 meters long under noisy sensor conditions, and the aerial cargo delivery completes flights over 1000 meters without violating the task constraints in an environment 63 million times larger than the one used in training.\nMost recently proposed methods for Neural Program Induction work under the assumption of having a large set of input/output (I/O) examples for learning any underlying input-output mapping. This paper aims to address the problem of data and computation efficiency of program induction by leveraging information from related tasks. Specifically, we propose two approaches for cross-task knowledge transfer to improve program induction in limited-data scenarios. In our first proposal, portfolio adaptation, a set of induction models is pretrained on a set of related tasks, and the best model is adapted towards the new task using transfer learning. In our second approach, meta program induction, a $k$-shot learning approach is used to make a model generalize to new tasks without additional training. To test the efficacy of our methods, we constructed a new benchmark of programs written in the Karel programming language. Using an extensive experimental evaluation on the Karel benchmark, we demonstrate that our proposals dramatically outperform the baseline induction method that does not use knowledge transfer. We also analyze the relative performance of the two approaches and study conditions in which they perform best. In particular, meta induction outperforms all existing approaches under extreme data sparsity, i.e., when fewer than ten examples are available. As the number of available I/O examples increases (i.e. a thousand or more), portfolio-adapted program induction becomes the best approach.
For intermediate data sizes, we demonstrate that the combined method of adapted meta program induction has the strongest performance.\nThe eigendecomposition of nearest-neighbor (NN) graph Laplacian matrices is the main computational bottleneck in spectral clustering. In this work, we introduce a highly scalable, spectrum-preserving graph sparsification algorithm that enables building ultra-sparse NN (u-NN) graphs with guaranteed preservation of the original graph spectra, such as the first few eigenvectors of the original graph Laplacian. Our approach can immediately lead to scalable spectral clustering of large data networks without sacrificing solution quality. The proposed method starts by constructing low-stretch spanning trees (LSSTs) from the original graphs, followed by iteratively adding small portions of \"spectrally critical\" off-tree edges back to the LSSTs by leveraging a spectral off-tree embedding scheme. To determine the suitable number of off-tree edges to be added back to the LSSTs, an eigenvalue stability checking scheme is proposed, which robustly preserves the first few Laplacian eigenvectors within the sparsified graph. Additionally, an incremental graph densification scheme is proposed for identifying extra edges that have been missing in the original NN graphs but can still play important roles in spectral clustering tasks. Our experimental results for a variety of well-known data sets show that the proposed method can dramatically reduce the complexity of NN graphs, leading to significant speedups in spectral clustering.\nAlthough aviation accidents are rare, safety incidents occur more frequently and require a careful analysis to detect and mitigate risks in a timely manner. Analyzing safety incidents using operational data and producing event-based explanations is invaluable to airline companies as well as to governing organizations such as the Federal Aviation Administration (FAA) in the United States.
However, this task is challenging because of the complexity involved in mining multi-dimensional heterogeneous time series data, the lack of time-step-wise annotation of events in a flight, and the lack of scalable tools to perform analysis over a large number of events. In this work, we propose a precursor mining algorithm that identifies events in the multidimensional time series that are correlated with the safety incident. Precursors are valuable to systems health and safety monitoring and in explaining and forecasting safety incidents. Current methods suffer from poor scalability to high dimensional time series data and are inefficient in capturing temporal behavior. We propose an approach by combining multiple-instance learning (MIL) and deep recurrent neural networks (DRNN) to take advantage of MIL's ability to learn using weakly supervised data and DRNN's ability to model temporal behavior. We describe the algorithm, the data, the intuition behind taking a MIL approach, and a comparative analysis of the proposed algorithm with baseline models. We also discuss the application to a real-world aviation safety problem using data from a commercial airline company and discuss the model's abilities and shortcomings, with some final remarks about possible deployment directions.\nDeep neural networks are widely used for classification. These deep models often suffer from a lack of interpretability -- they are particularly difficult to understand because of their non-linear nature. As a result, neural networks are often treated as \"black box\" models, and in the past, have been trained purely to optimize the accuracy of predictions. In this work, we create a novel network architecture for deep learning that naturally explains its own reasoning for each prediction. This architecture contains an autoencoder and a special prototype layer, where each unit of that layer stores a weight vector that resembles an encoded training input. 
The encoder of the autoencoder allows us to do comparisons within the latent space, while the decoder allows us to visualize the learned prototypes. The training objective has four terms: an accuracy term, a term that encourages every prototype to be similar to at least one encoded input, a term that encourages every encoded input to be close to at least one prototype, and a term that encourages faithful reconstruction by the autoencoder. The distances computed in the prototype layer are used as part of the classification process. Since the prototypes are learned during training, the learned network naturally comes with explanations for each prediction, and the explanations are faithful to what the network actually computes.\nSocial network analysis provides meaningful information about the behavior of network members that can be used for diverse applications such as classification and link prediction. However, network analysis is computationally expensive because of feature learning for different applications. In recent years, much research has focused on feature learning methods in social networks. Network embedding represents the network in a lower-dimensional space that preserves its properties, yielding a compressed representation of the network. In this paper, we introduce a novel algorithm named \"CARE\" for network embedding that can be used for different types of networks including weighted, directed and complex. Current methods try to preserve local neighborhood information of nodes, whereas the proposed method utilizes local neighborhood and community information of network nodes to cover both local and global structure of social networks. CARE builds customized paths, which consist of local and global structure of network nodes, as a basis for network embedding and uses the Skip-gram model to learn representation vectors of nodes.
Subsequently, stochastic gradient descent is applied to optimize our objective function and learn the final representation of nodes. Our method scales to new nodes appended to the network without information loss. Parallel generation of customized random walks is also used for speeding up CARE. We evaluate the performance of CARE on multi-label classification and link prediction tasks. Experimental results on various networks indicate that the proposed method outperforms others in both Micro-F1 and Macro-F1 for different training-set sizes.\nBoth resources in the natural environment and concepts in a semantic space are distributed \"patchily\", with large gaps in between the patches. To describe people's internal and external foraging behavior, various random walk models have been proposed. In particular, internal foraging has been modeled as sampling: in order to gather relevant information for making a decision, people draw samples from a mental representation using random-walk algorithms such as Markov chain Monte Carlo (MCMC). However, two common empirical observations argue against simple sampling algorithms such as MCMC. First, the spatial structure is often best described by a L\\'evy flight distribution: the probability of the distance between two successive locations follows a power law on the distances. Second, the temporal structure of the sampling that humans and other animals produce has long-range, slowly decaying serial correlations characterized as $1/f$-like fluctuations. We propose that mental sampling is not done by simple MCMC, but is instead adapted to multimodal representations and is implemented by Metropolis-coupled Markov chain Monte Carlo (MC$^3$), one of the first algorithms developed for sampling from multimodal distributions. MC$^3$ involves running multiple Markov chains in parallel but with target distributions of different temperatures, and it swaps the states of the chains whenever a better location is found.
Heated chains more readily traverse valleys in the probability landscape to propose moves to far-away peaks, while the colder chains make the local steps that explore the current peak or patch. We show that MC$^3$ generates distances between successive samples that follow a L\\'evy flight distribution and $1/f$-like serial correlations, providing a single mechanistic account of these two puzzling empirical phenomena.\nIn order to autonomously learn wide repertoires of complex skills, robots must be able to learn from their own autonomously collected data, without human supervision. One learning signal that is always available for autonomously collected data is prediction: if a robot can learn to predict the future, it can use this predictive model to take actions to produce desired outcomes, such as moving an object to a particular location. However, in complex open-world scenarios, designing a representation for prediction is difficult. In this work, we instead aim to enable self-supervised robotic learning through direct video prediction: instead of attempting to design a good representation, we directly predict what the robot will see next, and then use this model to achieve desired goals. A key challenge in video prediction for robotic manipulation is handling complex spatial arrangements such as occlusions. To that end, we introduce a video prediction model that can keep track of objects through occlusion by incorporating temporal skip-connections. Together with a novel planning criterion and action space formulation, we demonstrate that this model substantially outperforms prior work on video prediction-based control. Our results show manipulation of objects not seen during training, handling multiple objects, and pushing objects around obstructions. 
These results represent a significant advance in the range and complexity of skills that can be performed entirely with self-supervised robotic learning.\nIn this study, we systematically investigate the impact of class imbalance on classification performance of convolutional neural networks (CNNs) and compare frequently used methods to address the issue. Class imbalance is a common problem that has been comprehensively studied in classical machine learning, yet very limited systematic research is available in the context of deep learning. In our study, we use three benchmark datasets of increasing complexity, MNIST, CIFAR-10 and ImageNet, to investigate the effects of imbalance on classification and perform an extensive comparison of several methods to address the issue: oversampling, undersampling, two-phase training, and thresholding that compensates for prior class probabilities. Our main evaluation metric is area under the receiver operating characteristic curve (ROC AUC) adjusted to multi-class tasks since overall accuracy metric is associated with notable difficulties in the context of imbalanced data. 
Based on results from our experiments we conclude that (i) the effect of class imbalance on classification performance is detrimental; (ii) the method of addressing class imbalance that emerged as dominant in almost all analyzed scenarios was oversampling; (iii) oversampling should be applied to the level that totally eliminates the imbalance, whereas undersampling can perform better when the imbalance is only removed to some extent; (iv) as opposed to some classical machine learning models, oversampling does not necessarily cause overfitting of CNNs; (v) thresholding should be applied to compensate for prior class probabilities when overall number of properly classified cases is of interest.\nFinding semantically rich and computer-understandable representations for textual dialogues, utterances and words is crucial for dialogue systems (or conversational agents), as their performance mostly depends on understanding the context of conversations. Recent research aims at finding distributed vector representations (embeddings) for words, such that semantically similar words are relatively close within the vector-space. Encoding the \"meaning\" of text into vectors is a current trend, and text can range from words, phrases and documents to actual human-to-human conversations. In recent research approaches, responses have been generated utilizing a decoder architecture, given the vector representation of the current conversation. In this paper, the utilization of embeddings for answer retrieval is explored by using Locality-Sensitive Hashing Forest (LSH Forest), an Approximate Nearest Neighbor (ANN) model, to find similar conversations in a corpus and rank possible candidates. 
Experimental results on the well-known Ubuntu Corpus (in English) and a customer service chat dataset (in Dutch) show that, in combination with a candidate selection method, retrieval-based approaches outperform generative ones and reveal promising future research directions towards the usability of such a system.\nMost of the existing medicine recommendation systems that are mainly based on electronic medical records (EMRs) are significantly assisting doctors to make better clinical decisions benefiting both patients and caregivers. Even though EMRs are growing at lightning-fast speed in the era of big data, content limitations in EMRs prevent existing recommendation systems from reflecting relevant medical facts, such as drug-drug interactions. Many medical knowledge graphs that contain drug-related information, such as DrugBank, may give hope for the recommendation systems. However, the direct use of these knowledge graphs in the systems suffers from a lack of robustness caused by the incompleteness of the graphs. To address these challenges, we stand on recent advances in graph embedding learning techniques and propose a novel framework, called Safe Medicine Recommendation (SMR), in this paper. Specifically, SMR first constructs a high-quality heterogeneous graph by bridging EMRs (MIMIC-III) and medical knowledge graphs (ICD-9 ontology and DrugBank). Then, SMR jointly embeds diseases, medicines, patients, and their corresponding relations into a shared lower dimensional space. Finally, SMR uses the embeddings to decompose the medicine recommendation into a link prediction process while considering the patient's diagnoses and adverse drug reactions. To the best of our knowledge, SMR is the first to learn embeddings of a patient-disease-medicine graph for medicine recommendation in the world. 
Extensive experiments on real datasets are conducted to evaluate the effectiveness of proposed framework.\nEmail responses often contain items-such as a file or a hyperlink to an external document-that are attached to or included inline in the body of the message. Analysis of an enterprise email corpus reveals that 35% of the time when users include these items as part of their response, the attachable item is already present in their inbox or sent folder. A modern email client can proactively retrieve relevant attachable items from the user's past emails based on the context of the current conversation, and recommend them for inclusion, to reduce the time and effort involved in composing the response. In this paper, we propose a weakly supervised learning framework for recommending attachable items to the user. As email search systems are commonly available, we constrain the recommendation task to formulating effective search queries from the context of the conversations. The query is submitted to an existing IR system to retrieve relevant items for attachment. We also present a novel strategy for generating labels from an email corpus---without the need for manual annotations---that can be used to train and evaluate the query formulation model. In addition, we describe a deep convolutional neural network that demonstrates satisfactory performance on this query formulation task when evaluated on the publicly available Avocado dataset and a proprietary dataset of internal emails obtained through an employee participation program.\nWe propose a framework to understand the unprecedented performance and robustness of deep neural networks using field theory. Correlations between the weights within the same layer can be described by symmetries in that layer, and networks generalize better if such symmetries are broken to reduce the redundancies of the weights. 
Using a two-parameter field theory, we find that the network can break such symmetries itself towards the end of training in a process commonly known in physics as spontaneous symmetry breaking. This corresponds to a network generalizing itself without any user input layers to break the symmetry, but by communication with adjacent layers. In the layer decoupling limit applicable to residual networks (He et al., 2015), we show that the remnant symmetries that survive the non-linear layers are spontaneously broken. The Lagrangian for the non-linear and weight layers together has striking similarities with the one in quantum field theory of a scalar. Using results from quantum field theory we show that our framework is able to explain many experimentally observed phenomena, such as training on random labels with zero error (Zhang et al., 2017), the information bottleneck, the phase transition out of it and gradient variance explosion (Shwartz-Ziv & Tishby, 2017), shattered gradients (Balduzzi et al., 2017), and many more.\nIn order for robots to perform mission-critical tasks, it is essential that they are able to quickly adapt to changes in their environment as well as to injuries and/or other bodily changes. Deep reinforcement learning has been shown to be successful in training robot control policies for operation in complex environments. However, existing methods typically employ only a single policy. This can limit the adaptability since a large environmental modification might require a completely different behavior compared to the learning environment. To solve this problem, we propose Map-based Multi-Policy Reinforcement Learning (MMPRL), which aims to search and store multiple policies that encode different behavioral features while maximizing the expected reward in advance of the environment change. 
Thanks to these policies, which are stored in a multi-dimensional discrete map according to their behavioral features, adaptation can be performed within a reasonable time without retraining the robot. An appropriate pre-trained policy from the map can be recalled using Bayesian optimization. Our experiments show that MMPRL enables robots to quickly adapt to large changes without requiring any prior knowledge of the type of injuries that could occur. A highlight of the learned behaviors can be found here: https://youtu.be/QwInbilXNOE .\nBlack-box risk scoring models permeate our lives, yet are typically proprietary or opaque. We propose a transparent model distillation approach to audit such models. Model distillation was first introduced to transfer knowledge from a large, complex teacher model to a faster, simpler student model without significant loss in prediction accuracy. To this we add a third criterion - transparency. To gain insight into black-box models, we treat them as teachers, training transparent student models to mimic the risk scores assigned by the teacher. Moreover, we use side information in the form of the actual outcomes the teacher scoring model was intended to predict in the first place. By training a second transparent model on the outcomes, we can compare the two models to each other. When comparing models trained on risk scores to models trained on outcomes, we show that it is necessary to calibrate the risk-scoring model's predictions to remove distortion that may have been added to the black-box risk-scoring model during or after its training process. We also show how to compute confidence intervals for the particular class of transparent student models we use - tree-based additive models with pairwise interactions (GA2Ms) - to support comparison of the two transparent models. 
We demonstrate the methods on four public datasets: COMPAS, Lending Club, Stop-and-Frisk, and Chicago Police.\nWe describe the adaptation and refinement of a graphical user interface designed to facilitate a Wizard-of-Oz (WoZ) approach to collecting human-robot dialogue data. The data collected will be used to develop a dialogue system for robot navigation. Building on an interface previously used in the development of dialogue systems for virtual agents and video playback, we add templates with open parameters which allow the wizard to quickly produce a wide variety of utterances. Our research demonstrates that this approach to data collection is viable as an intermediate step in developing a dialogue system for physical robots in remote locations from their users - a domain in which the human and robot need to regularly verify and update a shared understanding of the physical environment. We show that our WoZ interface and the fixed set of utterances and templates therein provide for a natural pace of dialogue with good coverage of the navigation domain.\nWe consider two questions at the heart of machine learning; how can we predict if a minimum will generalize to the test set, and why does stochastic gradient descent find minima that generalize well? Our work responds to Zhang et al. (2016), who showed deep neural networks can easily memorize randomly labeled training data, despite generalizing well on real labels of the same inputs. We show that the same phenomenon occurs in small linear models. These observations are explained by the Bayesian evidence, which penalizes sharp minima but is invariant to model parameterization. We also demonstrate that, when one holds the learning rate fixed, there is an optimum batch size which maximizes the test set accuracy. We propose that the noise introduced by small mini-batches drives the parameters towards minima whose evidence is large. 
Interpreting stochastic gradient descent as a stochastic differential equation, we identify the \"noise scale\" $g = \\epsilon (\\frac{N}{B} - 1) \\approx \\epsilon N/B$, where $\\epsilon$ is the learning rate, $N$ the training set size and $B$ the batch size. Consequently the optimum batch size is proportional to both the learning rate and the size of the training set, $B_{opt} \\propto \\epsilon N$. We verify these predictions empirically.\nIn this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation. Our model directly takes 2D pose as input and learns a generalized 2D-3D mapping function. The proposed model consists of a base network which efficiently captures pose-aligned features and a hierarchy of Bi-directional RNNs (BRNN) on the top to explicitly incorporate a set of knowledge regarding human body configuration (i.e., kinematics, symmetry, motor coordination). The proposed model thus enforces high-level constraints over human poses. In learning, we develop a pose sample simulator to augment training samples in virtual camera views, which further improves our model generalizability. We validate our method on public 3D human pose benchmarks and propose a new evaluation protocol working on cross-view setting to verify the generalization capability of different methods. We empirically observe that most state-of-the-art methods encounter difficulty under such setting while our method can well handle such challenges.\nSentence representation at the semantic level is a challenging task for Natural Language Processing and Artificial Intelligence. Despite the advances in word embeddings (i.e. word vector representations), capturing sentence meaning is an open question due to complexities of semantic interactions among words. In this paper, we present an embedding method, which is aimed at learning unsupervised sentence representations from unlabeled text. 
We propose an unsupervised method that models a sentence as a weighted series of word embeddings. The weights of the word embeddings are fitted by using Shannon's word entropies provided by the Term Frequency--Inverse Document Frequency (TF--IDF) transform. The hyperparameters of the model can be selected according to the properties of data (e.g. sentence length and textual gender). Hyperparameter selection involves word embedding methods and dimensionalities, as well as weighting schemata. Our method offers advantages over existing methods: identifiable modules, short-term training, online inference of (unseen) sentence representations, as well as independence from domain, external knowledge and language resources. Results showed that our model outperformed the state of the art in well-known Semantic Textual Similarity (STS) benchmarks. Moreover, our model reached state-of-the-art performance when compared to supervised and knowledge-based STS systems.\nThis paper presents a novel differential evolution algorithm for protein folding optimization that is applied to a three-dimensional AB off-lattice model. The proposed algorithm includes two new mechanisms. A local search is used to improve convergence speed and to reduce the runtime complexity of the energy calculation. For this purpose, a local movement is introduced within the local search. The designed evolutionary algorithm has fast convergence and, therefore, when it is trapped into local optimum or a relatively good solution is located, it is hard to locate a better similar solution. The similar solution is different from the good solution in only a few components. A component reinitialization method is designed to mitigate this problem. Both the new mechanisms and the proposed algorithm were analyzed on well-known amino-acid sequences that are used frequently in the literature. 
Experimental results show that the employed new mechanisms improve the efficiency of our algorithm and the proposed algorithm is superior to other state-of-the-art algorithms. It obtained a hit ratio of 100 % for sequences up to 18 monomers within a budget of $10^{11}$ solution evaluations. New best-known solutions were obtained for most of the sequences. The existence of the symmetric best-known solutions is also demonstrated in the paper.\nConvolutional neural networks rely on image texture and structure to serve as discriminative features to classify the image content. Image enhancement techniques can be used as preprocessing steps to help improve the overall image quality and in turn improve the overall effectiveness of a CNN. Existing image enhancement methods, however, are designed to improve the perceptual quality of an image for a human observer. In this paper, we are interested in learning CNNs that can emulate image enhancement and restoration, but with the overall goal to improve image classification and not necessarily human perception. To this end, we present a unified CNN architecture that uses a range of enhancement filters that can enhance image-specific details via end-to-end dynamic filter learning. We demonstrate the effectiveness of this strategy on four challenging benchmark datasets for fine-grained, object, scene, and texture classification: CUB-200-2011, PASCAL-VOC2007, MIT-Indoor, and DTD. Experiments using our proposed enhancement show promising results on all the datasets. In addition, our approach is capable of improving the performance of all generic CNN architectures.\nSGD (Stochastic Gradient Descent) is a popular algorithm for large-scale optimization problems due to its low iterative cost. However, SGD cannot achieve a linear convergence rate like FGD (Full Gradient Descent) because of the inherent gradient variance. To attack the problem, mini-batch SGD was proposed to get a trade-off in terms of convergence rate and iteration cost. 
In this paper, a general CVI (Convergence-Variance Inequality) equation is presented to state formally the interaction of convergence rate and gradient variance. Then a novel algorithm named SSAG (Stochastic Stratified Average Gradient) is introduced to reduce gradient variance based on two techniques, stratified sampling and averaging over iterations that is a key idea in SAG (Stochastic Average Gradient). Furthermore, SSAG can achieve linear convergence rate of $\\mathcal {O}((1-\\frac{\\mu}{8CL})^k)$ at smaller storage and iterative costs, where $C\\geq 2$ is the category number of training data. This convergence rate depends mainly on the variance between classes, but not on the variance within the classes. In the case of $C\\ll N$ ($N$ is the training data size), SSAG's convergence rate is much better than SAG's convergence rate of $\\mathcal {O}((1-\\frac{\\mu}{8NL})^k)$. Our experimental results show SSAG outperforms SAG and many other algorithms.\nDue to the lack of enough generalization in the state-space, common methods in Reinforcement Learning (RL) suffer from slow learning speed especially in the early learning trials. This paper introduces a model-based method in discrete state-spaces for increasing learning speed in terms of required experience (but not required computational time) by exploiting generalization in the experiences of the subspaces. A subspace is formed by choosing a subset of features in the original state representation (full-space). Generalization and faster learning in a subspace are due to many-to-one mapping of experiences from the full-space to each state in the subspace. Nevertheless, due to inherent perceptual aliasing in the subspaces, the policy suggested by each subspace does not generally converge to the optimal policy. Our approach, called Model Based Learning with Subspaces (MoBLeS), calculates confidence intervals of the estimated Q-values in the full-space and in the subspaces. 
These confidence intervals are used in the decision making, such that the agent benefits the most from the possible generalization while avoiding the detriment of the perceptual aliasing in the subspaces. Convergence of MoBLeS to the optimal policy is theoretically investigated. Additionally, we show through several experiments that MoBLeS improves the learning speed in the early trials.\nHigh Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show hybrid environments are the natural path to get the best of the on-premise and cloud resources---steady (and sensitive) workloads can run on on-premise resources and peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, which range from how to extract the best performance of an unknown underlying platform to what services are essential to make its usage easier. Moreover, the discussion on the right pricing and contractual models to fit small and large users is relevant for the sustainability of HPC clouds. This paper brings a survey and taxonomy of efforts in HPC cloud and a vision on what we believe is ahead of us, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This becomes particularly relevant due to the fast-increasing wave of new HPC applications coming from big data and artificial intelligence.\nMemristors have recently received significant attention as ubiquitous device-level components for building a novel generation of computing systems. These devices have many promising features, such as non-volatility, low power consumption, high density, and excellent scalability. 
The ability to control and modify biasing voltages at the two terminals of memristors makes them promising candidates to perform matrix-vector multiplications and solve systems of linear equations. In this article, we discuss how networks of memristors arranged in crossbar arrays can be used for efficiently solving optimization and machine learning problems. We introduce a new memristor-based optimization framework that combines the computational merit of memristor crossbars with the advantages of an operator splitting method, alternating direction method of multipliers (ADMM). Here, ADMM helps in splitting a complex optimization problem into subproblems that involve the solution of systems of linear equations. The capability of this framework is shown by applying it to linear programming, quadratic programming, and sparse optimization. In addition to ADMM, implementation of a customized power iteration (PI) method for eigenvalue/eigenvector computation using memristor crossbars is discussed. The memristor-based PI method can further be applied to principal component analysis (PCA). The use of memristor crossbars yields a significant speed-up in computation, and thus, we believe, has the potential to advance optimization and machine learning research in artificial intelligence (AI).\nThis paper presents a practical approach for identifying unknown mechanical parameters, such as mass and friction models of manipulated rigid objects or actuated robotic links, in a succinct manner that aims to improve the performance of policy search algorithms. Key features of this approach are the use of off-the-shelf physics engines and the adaptation of a black-box Bayesian optimization framework for this purpose. The physics engine is used to reproduce in simulation experiments that are performed on a real robot, and the mechanical parameters of the simulated system are automatically fine-tuned so that the simulated trajectories match with the real ones. 
The optimized model is then used for learning a policy in simulation, before safely deploying it on the real robot. Given the well-known limitations of physics engines in modeling real-world objects, it is generally not possible to find a mechanical model that reproduces in simulation the real trajectories exactly. Moreover, there are many scenarios where a near-optimal policy can be found without having a perfect knowledge of the system. Therefore, searching for a perfect model may not be worth the computational effort in practice. The proposed approach aims then to identify a model that is good enough to approximate the value of a locally optimal policy with a certain confidence, instead of spending all the computational resources on searching for the most accurate model. Empirical evaluations, performed in simulation and on a real robotic manipulation task, show that model identification via physics engines can significantly boost the performance of policy search algorithms that are popular in robotics, such as TRPO, PoWER and PILCO, with no additional real-world data.\nMarkov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters of stochastic behavior of MDPs are estimates from empirical observations of a system; their values are not known precisely. Different types of MDPs with uncertain, imprecise or bounded transition rates or probabilities and rewards exist in the literature.   Commonly, analysis of models with uncertainties amounts to searching for the most robust policy which means that the goal is to generate a policy with the greatest lower bound on performance (or, symmetrically, the lowest upper bound on costs). However, hedging against an unlikely worst case may lead to losses in other situations. In general, one is interested in policies that behave well in all situations which results in a multi-objective view on decision making.   
In this paper, we consider policies for the expected discounted reward measure of MDPs with uncertain parameters. In particular, the approach is defined for bounded-parameter MDPs (BMDPs) [8]. In this setting the worst, best and average case performances of a policy are analyzed simultaneously, which yields a multi-scenario multi-objective optimization problem. The paper presents and evaluates approaches to compute the pure Pareto optimal policies in the value vector space.\nGraphs (networks) are ubiquitous and allow us to model entities (nodes) and the dependencies (edges) between them. Learning a useful feature representation from graph data lies at the heart and success of many machine learning tasks such as classification, anomaly detection, link prediction, among many others. Many existing techniques use random walks as a basis for learning features or estimating the parameters of a graph model for a downstream prediction task. Examples include recent node embedding methods such as DeepWalk, node2vec, as well as graph-based deep learning algorithms. However, the simple random walk used by these methods is fundamentally tied to the identity of the node. This has three main disadvantages. First, these approaches are inherently transductive and do not generalize to unseen nodes and other graphs. Second, they are not space-efficient as a feature vector is learned for each node which is impractical for large graphs. Third, most of these approaches lack support for attributed graphs.   To make these methods more generally applicable, we propose a framework for inductive network representation learning based on the notion of attributed random walk that is not tied to node identity and is instead based on learning a function $\\Phi : \\mathrm{\\rm \\bf x} \\rightarrow w$ that maps a node attribute vector $\\mathrm{\\rm \\bf x}$ to a type $w$. 
This framework serves as a basis for generalizing existing methods such as DeepWalk, node2vec, and many other previous methods that leverage traditional random walks.\nThe Advent of the Internet-of-Things (IoT) paradigm has brought opportunities to solve many real-world problems. Energy management, for example, has attracted huge interest from academia, industries, governments and regulatory bodies. It involves collecting energy usage data, analyzing it, and optimizing the energy consumption by applying control strategies. However, in industrial environments, performing such optimization is not trivial. The changes in business rules, process control, and customer requirements make it much more challenging. In this paper, a Semantic Rules Engine (SRE) for industrial gateways is presented that allows implementing dynamic and flexible rule-based control strategies. It is simple, expressive, and allows managing rules on-the-fly without causing any service interruption. Additionally, it can handle semantic queries and provide results by inferring additional knowledge from previously defined concepts in ontologies. SRE has been validated and tested on different hardware platforms and in commercial products. Performance evaluations are also presented to validate its conformance to the customer requirements.\nVagueness and uncertainty management is counted among one of the challenges that remain unresolved in systems that generate texts from non-linguistic data, known as data-to-text systems. In the last decade, work in fuzzy linguistic summarization and description of data has raised the interest of using fuzzy sets to model and manage the imprecision of human language in data-to-text systems. However, despite some research in this direction, there has not been an actual clear discussion and justification on how fuzzy sets can contribute to data-to-text for modeling vagueness and uncertainty in words and expressions. 
This paper intends to bridge this gap by answering the following questions: What does vagueness mean in fuzzy sets theory? What does vagueness mean in data-to-text contexts? In what ways can fuzzy sets theory contribute to improve data-to-text systems? What are the challenges that researchers from both disciplines need to address for a successful integration of fuzzy sets into data-to-text systems? In what cases should the use of fuzzy sets be avoided in D2T? For this, we review and discuss the state of the art of vagueness modeling in natural language generation and data-to-text, describe potential and actual usages of fuzzy sets in data-to-text contexts, and provide some additional insights about the engineering of data-to-text systems that make use of fuzzy set-based techniques.\nWe consider the problem of performing inverse reinforcement learning when the trajectory of the expert is not perfectly observed by the learner. Instead, a noisy continuous-time observation of the trajectory is provided to the learner. This problem exhibits wide-ranging applications and the specific application we consider here is the scenario in which the learner seeks to penetrate a perimeter patrolled by a robot. The learner's field of view is limited due to which it cannot observe the patroller's complete trajectory. Instead, we allow the learner to listen to the expert's movement sound, which it can also use to estimate the expert's state and action using an observation model. We treat the expert's state and action as hidden data and present an algorithm based on expectation maximization and maximum entropy principle to solve the non-linear, non-convex problem. Related work considers discrete-time observations and an observation model that does not include actions. 
In contrast, our technique takes expectations over both state and action of the expert, enabling learning even in the presence of extreme noise and broader applications.\nObjective: Radiomics-driven Computer Aided Diagnosis (CAD) has shown considerable promise in recent years as a potential tool for improving clinical decision support in medical oncology, particularly those based around the concept of Discovery Radiomics, where radiomic sequencers are discovered through the analysis of medical imaging data. One of the main limitations with current CAD approaches is that it is very difficult to gain insight or rationale as to how decisions are made, thus limiting their utility to clinicians. Methods: In this study, we propose CLEAR-DR, a novel interpretable CAD system based on the notion of CLass-Enhanced Attentive Response Discovery Radiomics for the purpose of clinical decision support for diabetic retinopathy. Results: In addition to disease grading via the discovered deep radiomic sequencer, the CLEAR-DR system also produces a visual interpretation of the decision-making process to provide better insight and understanding into the decision-making process of the system. Conclusion: We demonstrate the effectiveness and utility of the proposed CLEAR-DR system of enhancing the interpretability of diagnostic grading results for the application of diabetic retinopathy grading. Significance: CLEAR-DR can act as a potential powerful tool to address the uninterpretability issue of current CAD systems, thus improving their utility to clinicians.\nDeep learning models require extensive architecture design exploration and hyperparameter optimization to perform well on a given task. The exploration of the model design space is often made by a human expert, and optimized using a combination of grid search and search heuristics over a large space of possible choices. 
Neural Architecture Search (NAS) is a Reinforcement Learning approach that has been proposed to automate architecture design. NAS has been successfully applied to generate Neural Networks that rival the best human-designed architectures. However, NAS requires sampling, constructing, and training hundreds to thousands of models to achieve well-performing architectures. This procedure needs to be executed from scratch for each new task. The application of NAS to a wide set of tasks currently lacks a way to transfer generalizable knowledge across tasks. In this paper, we present the Multitask Neural Model Search (MNMS) controller. Our goal is to learn a generalizable framework that can condition model construction on successful model searches for previously seen tasks, thus significantly speeding up the search for new tasks. We demonstrate that MNMS can conduct an automated architecture search for multiple tasks simultaneously while still learning well-performing, specialized models for each task. We then show that pre-trained MNMS controllers can transfer learning to new tasks. By leveraging knowledge from previous searches, we find that pre-trained MNMS models start from a better location in the search space and reduce search time on unseen tasks, while still discovering models that outperform published human-designed models.\nAlthough Generative Adversarial Networks (GANs) have shown remarkable success in various tasks, they still face challenges in generating high quality images. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) aiming at generating high-resolution photo-realistic images. First, we propose a two-stage generative adversarial network architecture, StackGAN-v1, for text-to-image synthesis. The Stage-I GAN sketches the primitive shape and colors of the object based on given text description, yielding low-resolution images. 
The Stage-II GAN takes Stage-I results and text descriptions as inputs, and generates high-resolution images with photo-realistic details. Second, an advanced multi-stage generative adversarial network architecture, StackGAN-v2, is proposed for both conditional and unconditional generative tasks. Our StackGAN-v2 consists of multiple generators and discriminators in a tree-like structure; images at multiple scales corresponding to the same scene are generated from different branches of the tree. StackGAN-v2 shows more stable training behavior than StackGAN-v1 by jointly approximating multiple distributions. Extensive experiments demonstrate that the proposed stacked generative adversarial networks significantly outperform other state-of-the-art methods in generating photo-realistic images.\nEndowing robots with the capability of assessing risk and making risk-aware decisions is widely considered a key step toward ensuring safety for robots operating under uncertainty. But, how should a robot quantify risk? A natural and common approach is to consider the framework whereby costs are assigned to stochastic outcomes - an assignment captured by a cost random variable. Quantifying risk then corresponds to evaluating a risk metric, i.e., a mapping from the cost random variable to a real number. Yet, the question of what constitutes a \"good\" risk metric has received little attention within the robotics community. The goal of this paper is to explore and partially address this question by advocating axioms that risk metrics in robotics applications should satisfy in order to be employed as rational assessments of risk. We discuss general representation theorems that precisely characterize the class of metrics that satisfy these axioms (referred to as distortion risk metrics), and provide instantiations that can be used in applications. 
We further discuss pitfalls of commonly used risk metrics in robotics, and discuss additional properties that one must consider in sequential decision making tasks. Our hope is that the ideas presented here will lead to a foundational framework for quantifying risk (and hence safety) in robotics applications.\nWe study the problem of semantic code repair, which can be broadly defined as automatically fixing non-syntactic bugs in source code. The majority of past work in semantic code repair assumed access to unit tests against which candidate repairs could be validated. In contrast, the goal here is to develop a strong statistical model to accurately predict both bug locations and exact fixes without access to information about the intended correct behavior of the program. Achieving such a goal requires a robust contextual repair model, which we train on a large corpus of real-world source code that has been augmented with synthetically injected bugs. Our framework adopts a two-stage approach where first a large set of repair candidates are generated by rule-based processors, and then these candidates are scored by a statistical model using a novel neural network architecture which we refer to as Share, Specialize, and Compete. Specifically, the architecture (1) generates a shared encoding of the source code using an RNN over the abstract syntax tree, (2) scores each candidate repair using specialized network modules, and (3) then normalizes these scores together so they can compete against one another in comparable probability space. We evaluate our model on a real-world test set gathered from GitHub containing four common categories of bugs. 
Our model is able to predict the exact correct repair 41\\% of the time with a single guess, compared to 13\\% accuracy for an attentional sequence-to-sequence model.\nRecent work on quantum machine learning has demonstrated that quantum computers can offer dramatic improvements over classical devices for data mining, prediction and classification. However, less is known about the advantages using quantum computers may bring in the more general setting of reinforcement learning, where learning is achieved via interaction with a task environment that provides occasional rewards. Reinforcement learning can incorporate data-analysis-oriented learning settings as special cases, but also includes more complex situations where, e.g., reinforcing feedback is delayed. In a few recent works, Grover-type amplification has been utilized to construct quantum agents that achieve up-to-quadratic improvements in learning efficiency. These encouraging results have left open the key question of whether super-polynomial improvements in learning times are possible for genuine reinforcement learning problems, that is problems that go beyond the other more restricted learning paradigms. In this work, we provide a family of such genuine reinforcement learning tasks. We construct quantum-enhanced learners which learn super-polynomially, and even exponentially faster than any classical reinforcement learning model, and we discuss the potential impact our results may have on future technologies.\nIn this project, we aimed to improve the runtime of Minisat, a Conflict-Driven Clause Learning (CDCL) solver that solves the Propositional Boolean Satisfiability (SAT) problem. We first used a logistic regression model to predict the satisfiability of propositional boolean formulae after fixing the values of a certain fraction of the variables in each formula. 
We then applied the logistic model and added a preprocessing period to Minisat to determine the preferable initial value (either true or false) of each boolean variable using a Monte-Carlo approach. Concretely, for each Monte-Carlo trial, we fixed the values of a certain ratio of randomly selected variables, and calculated the confidence that the resulting sub-formula is satisfiable with our logistic regression model. The initial value of each variable was set based on the mean confidence scores of the trials that started from the literals of that variable. We were particularly interested in setting the initial values of the backbone variables correctly, which are variables that have the same value in all solutions of a SAT formula. Our Monte-Carlo method was able to set 78% of the backbones correctly. Excluding the preprocessing time, compared with the default setting of Minisat, the runtime of Minisat for satisfiable formulae decreased by 23%. However, our method did not outperform vanilla Minisat in runtime, as the decrease in the conflicts was outweighed by the long runtime of the preprocessing period.\nOne of the fundamental tasks in understanding genomics is the problem of predicting Transcription Factor Binding Sites (TFBSs). With more than hundreds of Transcription Factors (TFs) as labels, genomic-sequence based TFBS prediction is a challenging multi-label classification task. There are two major biological mechanisms for TF binding: (1) sequence-specific binding patterns on genomes known as \"motifs\" and (2) interactions among TFs known as co-binding effects. In this paper, we propose a novel deep architecture, the Prototype Matching Network (PMN) to mimic the TF binding mechanisms. Our PMN model automatically extracts prototypes (\"motif\"-like features) for each TF through a novel prototype-matching loss. 
Borrowing ideas from few-shot matching models, we use the notion of support set of prototypes and an LSTM to learn how TFs interact and bind to genomic sequences. On a reference TFBS dataset with $2.1$ $million$ genomic sequences, PMN significantly outperforms baselines and validates our design choices empirically. To our knowledge, this is the first deep learning architecture that introduces prototype learning and considers TF-TF interactions for large-scale TFBS prediction. Not only is the proposed architecture accurate, but it also models the underlying biology.\nCombining deep model-free reinforcement learning with on-line planning is a promising approach to building on the successes of deep RL. On-line planning with look-ahead trees has proven successful in environments where transition models are known a priori. However, in complex environments where transition models need to be learned from data, the deficiencies of learned models have limited their utility for planning. To address these challenges, we propose TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions. TreeQN dynamically constructs a tree by recursively applying a transition model in a learned abstract state space and then aggregating predicted rewards and state-values using a tree backup to estimate Q-values. We also propose ATreeC, an actor-critic variant that augments TreeQN with a softmax layer to form a stochastic policy network. Both approaches are trained end-to-end, such that the learned model is optimised for its actual use in the tree. We show that TreeQN and ATreeC outperform n-step DQN and A2C on a box-pushing task, as well as n-step DQN and value prediction networks (Oh et al. 2017) on multiple Atari games. 
Furthermore, we present ablation studies that demonstrate the effect of different auxiliary losses on learning transition models.\nBorder crossing delays cause problems such as large economic losses and heavy environmental pollution. To understand more about the nature of border crossing delays, this study applies a dictionary-based compression algorithm to process historical Niagara Frontier border wait time data. It can identify abnormal spatial-temporal patterns for both passenger vehicles and trucks at three bridges connecting the US and Canada. Furthermore, it provides a quantitative anomaly score to rank the wait time patterns across the three bridges for each vehicle type and each direction. By analyzing the top three most abnormal patterns, we find that at least two factors contribute to the anomaly of the patterns. Weekends and holidays may cause unusually heavy congestion at the three bridges at the same time, and freight transportation demand from Canada to the USA may be uneven at the Peace Bridge and the Lewiston-Queenston Bridge, which may lead to a high anomaly score. By calculating the frequency of the top 5% abnormal patterns by hour of the day, the results show that for cars from the USA to Canada, the frequency of abnormal waiting time patterns is highest around noon, while for trucks in the same direction it is highest during the afternoon peak hours. For the Canada-to-US direction, the frequency of abnormal border wait time patterns for both cars and trucks peaks during the afternoon. The analysis of abnormal spatial-temporal wait time patterns is promising for improving border crossing management.\nRecurrent neural networks (RNNs) are an important class of neural network architectures for language modeling and sequential prediction. However, optimizing RNNs is known to be harder than optimizing feed-forward neural networks. A number of techniques have been proposed in the literature to address this problem. 
In this paper we propose a simple technique called fraternal dropout that takes advantage of dropout to achieve this goal. Specifically, we propose to train two identical copies of an RNN (that share parameters) with different dropout masks while minimizing the difference between their (pre-softmax) predictions. In this way our regularization encourages the representations of RNNs to be invariant to dropout mask, thus being robust. We show that our regularization term is upper bounded by the expectation-linear dropout objective which has been shown to address the gap due to the difference between the train and inference phases of dropout. We evaluate our model and achieve state-of-the-art results in sequence modeling tasks on two benchmark datasets - Penn Treebank and Wikitext-2. We also show that our approach leads to performance improvement by a significant margin in image captioning (Microsoft COCO) and semi-supervised (CIFAR-10) tasks.\nPurpose: A new method for magnetic resonance (MR) imaging water-fat separation using a convolutional neural network (ConvNet) and deep learning (DL) is presented. Feasibility of the method with complex and magnitude images is demonstrated with a series of patient studies and accuracy of predicted quantitative values is analyzed.   Methods: Water-fat separation of 1200 gradient-echo acquisitions from 90 imaging sessions (normal, acute and chronic myocardial infarction) was performed using a conventional model based method with modeling of R2* and off-resonance and a multi-peak fat spectrum. A U-Net convolutional neural network for calculation of water-only, fat-only, R2* and off-resonance images was trained with 900 gradient-echo Multiple and single-echo complex and magnitude input data algorithms were studied and compared to conventional extended echo modeling.   Results: The U-Net ConvNet was easily trained and provided water-fat separation results visually comparable to conventional methods. 
Myocardial fat deposition in chronic myocardial infarction and intramyocardial hemorrhage in acute myocardial infarction were well visualized in the DL results. Predicted values for R2*, off-resonance, water and fat signal intensities were well correlated with conventional model based water fat separation (R2>=0.97, p<0.001). DL images had a 14% higher signal-to-noise ratio (p<0.001) when compared to the conventional method.   Conclusion: Deep learning utilizing ConvNets is a feasible method for MR water-fat separation imaging with complex, magnitude and single echo image data. A trained U-Net can be efficiently used for MR water-fat separation, providing results comparable to conventional model based methods.\nBackground: In silico drug-target interaction (DTI) prediction plays an integral role in drug repositioning: the discovery of new uses for existing drugs. One popular method of drug repositioning is network-based DTI prediction, which uses complex network theory to predict DTIs from a drug-target network. Currently, most network-based DTI prediction is based on machine learning methods such as Restricted Boltzmann Machines (RBM) or Support Vector Machines (SVM). These methods require additional information about the characteristics of drugs, targets and DTIs, such as chemical structure, genome sequence, binding types, causes of interactions, etc., and do not perform satisfactorily when such information is unavailable. To address this problem, we propose a new, alternative method for DTI prediction that uses only network topology information.   Results: We compare our method for DTI prediction against the well-known RBM approach. We show that when applied to the MATADOR database, our approach based on node neighborhoods yields higher precision for high-ranking predictions than RBM when no information regarding DTI types is available.   
Conclusion: This demonstrates that approaches purely based on network topology are better suited to DTI prediction in the many real-life situations where little or no prior knowledge is available about the characteristics of drugs, targets, or their interactions.\nThe success of Deep Learning and its potential use in many important safety-critical applications has motivated research on formal verification of Neural Network (NN) models. Despite the reputation of learned NN models to behave as black boxes and the theoretical hardness of proving their properties, researchers have been successful in verifying some classes of models by exploiting their piecewise linear structure. Unfortunately, most of these approaches test their algorithms without comparison with other approaches. As a result, the pros and cons of the different algorithms are not well understood. Motivated by the need to accelerate progress in this very important area, we investigate the trade-offs of a number of different approaches based on Mixed Integer Programming, Satisfiability Modulo Theory, as well as a novel method based on the Branch-and-Bound framework. We also propose a new data set of benchmarks, in addition to a collection of previously released testcases that can be used to compare existing methods. Not only does our analysis allow a comparison to be made between different strategies, but the comparison of results from different solvers also revealed implementation bugs in published methods. We expect that the availability of our benchmark and the analysis of the different approaches will allow researchers to develop and evaluate promising approaches for making progress on this important topic.\nSchool bus planning is usually divided into routing and scheduling due to the complexity of solving them concurrently. However, the separation between these two steps may lead to worse solutions with higher overall costs than solving them together. 
When finding the minimal number of trips in the routing problem, neglecting the importance of trip compatibility may increase the number of buses actually needed in the scheduling problem. This paper proposes a new formulation for the multi-school homogeneous fleet routing problem that maximizes trip compatibility while minimizing total travel time. This incorporates the trip compatibility for the scheduling problem in the routing problem. Since the problem is inherently just a routing problem, finding a good solution is not cumbersome. To compare the performance of the model with traditional routing problems, we generate eight mid-size data sets. Through importing the generated trips of the routing problems into the bus scheduling (blocking) problem, it is shown that the proposed model uses up to 13% fewer buses than the common traditional routing models.\nSafely serving the school transportation demand with the minimum number of buses is one of the highest financial goals of school transportation directors. To achieve that objective, a good and efficient way to solve the routing and scheduling problem is required. Due to the growth of the computing power, the spotlight has been shed on solving the combined problem of the school bus routing and scheduling problem. We show that an integrated multi-school bus routing and scheduling can be formulated with the help of trip compatibility. A novel decomposition algorithm is proposed to solve the integrated model. The merit of this integrated model and the decomposition method is that with the consideration of the trip compatibility, the interrelationship between the routing and scheduling sub-problems will not be lost in the process of decomposition. 
Results show that the proposed decomposed problem can provide solutions using the same number of buses as the integrated model in much shorter time (as little as 0.6%), and that the proposed method can save up to 26% of the buses required by existing research.\nUser participation in online communities is driven by the intertwinement of the social network structure with the crowd-generated content that flows along its links. These aspects are rarely explored jointly and at scale. By looking at how users generate and access pictures of varying beauty on Flickr, we investigate how the production of quality impacts the dynamics of online social systems. We develop a deep learning computer vision model to score images according to their aesthetic value and we validate its output through crowdsourcing. By applying it to over 15B Flickr photos, we study for the first time how image beauty is distributed over a large-scale social system. Beautiful images are evenly distributed in the network, although only a small core of people get social recognition for them. To study the impact of exposure to quality on user engagement, we set up matching experiments aimed at detecting causality from observational data. Exposure to beauty is double-edged: following people who produce high-quality content increases one's probability of uploading better photos; however, an excessive imbalance between the quality generated by a user and the user's neighbors leads to a decline in engagement. Our analysis has practical implications for improving link recommender systems.\nAccumulating evidence suggests that human behavior in trial-and-error learning tasks based on decisions between discrete actions may involve a combination of reinforcement learning (RL) and working memory (WM). 
While understanding the brain activity at stake in this type of task often involves comparison with non-human primate neurophysiological results, it is not clear whether monkeys use similar combined RL and WM processes to solve these tasks. Here we analyzed the behavior of five monkeys with computational models combining RL and WM. Our model-based analysis approach enables us to fit not only trial-by-trial choices but also transient slowdowns in reaction times, indicative of WM use. We found that the behavior of the five monkeys was better explained in terms of a combination of RL and WM despite inter-individual differences. The same coordination dynamics we used in a previous study in humans best explained the behavior of some monkeys, while the behavior of others showed the opposite pattern, revealing possibly different WM dynamics. We further analyzed different variants of the tested models to open a discussion on how the long pretraining in these tasks may have favored particular coordination dynamics between RL and WM. This points towards either inter-species differences or protocol differences which could be further tested in humans.\nLearning tasks on source code (i.e., formal languages) have been considered recently, but most work has tried to transfer natural language methods and does not capitalize on the unique opportunities offered by code's known syntax. For example, long-range dependencies induced by using the same variable or function in distant locations are often not considered. We propose to use graphs to represent both the syntactic and semantic structure of code and use graph-based deep learning methods to learn to reason over program structures.   In this work, we present how to construct graphs from source code and how to scale Gated Graph Neural Networks training to such large graphs. 
We evaluate our method on two tasks: VarNaming, in which a network attempts to predict the name of a variable given its usage, and VarMisuse, in which the network learns to reason about selecting the correct variable that should be used at a given program location. Our comparison to methods that use less structured program representations shows the advantages of modeling known structure, and suggests that our models learn to infer meaningful names and to solve the VarMisuse task in many cases. Additionally, our testing showed that VarMisuse identifies a number of bugs in mature open-source projects.\nThe largest source of sound events is web videos. Most videos lack sound event labels at segment level, however, a significant number of them do respond to text queries, from a match found using metadata by search engines. In this paper we explore the extent to which a search query can be used as the true label for detection of sound events in videos. We present a framework for large-scale sound event recognition on web videos. The framework crawls videos using search queries corresponding to 78 sound event labels drawn from three datasets. The datasets are used to train three classifiers, and we obtain a prediction on 3.7 million web video segments. We evaluated performance using the search query as true label and compare it with human labeling. Both types of ground truth exhibited close performance, to within 10%, and similar performance trend with increasing number of evaluated segments. Hence, our experiments show potential for using search query as a preliminary true label for sound event recognition in web videos.\nWe propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. 
The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can itself be represented as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded $\\ell_\\infty$ norm less than $\\epsilon = 0.1$). Code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial.\nIn many real-world optimization problems, the objective function evaluation is subject to noise, and we cannot obtain the exact objective value. Evolutionary algorithms (EAs), a type of general-purpose randomized optimization algorithm, have been shown to solve noisy optimization problems well. However, previous theoretical analyses of EAs mainly focused on noise-free optimization, which makes the theoretical understanding largely insufficient. Meanwhile, the few existing theoretical studies under noise often considered the one-bit noise model, which flips a randomly chosen bit of a solution before evaluation; while in many realistic applications, several bits of a solution can be changed simultaneously. 
In this paper, we study a natural extension of one-bit noise, the bit-wise noise model, which independently flips each bit of a solution with some probability. We analyze the running time of the (1+1)-EA solving OneMax and LeadingOnes under bit-wise noise for the first time, and derive the ranges of the noise level for polynomial and super-polynomial running time bounds. The analysis on LeadingOnes under bit-wise noise can be easily transferred to one-bit noise, and improves the previously known results. Since our analysis discloses that the (1+1)-EA can be efficient only under low noise levels, we also study whether the sampling strategy can bring robustness to noise. We prove that using sampling can significantly increase the largest noise level allowing a polynomial running time, that is, sampling is robust to noise.\nProcess Discovery is concerned with the automatic generation of a process model that describes a business process from execution data of that business process. Real life event logs can contain chaotic activities. These activities are independent of the state of the process and can, therefore, happen at rather arbitrary points in time. We show that the presence of such chaotic activities in an event log heavily impacts the quality of the process models that can be discovered with process discovery techniques. The current modus operandi for filtering activities from event logs is to simply filter out infrequent activities. We show that frequency-based filtering of activities does not solve the problems that are caused by chaotic activities. Moreover, we propose a novel technique to filter out chaotic activities from event logs. We evaluate this technique on a collection of seventeen real-life event logs that originate from both the business process management domain and the smart home environment domain. 
As demonstrated, the developed activity filtering methods enable the discovery of process models that are more behaviorally specific compared to process models that are discovered using standard frequency-based filtering.\nIn robotics, it is essential to be able to plan efficiently in high-dimensional continuous state-action spaces for long horizons. For such complex planning problems, unguided uniform sampling of actions until a path to a goal is found is hopelessly inefficient, and gradient-based approaches often fall short when the optimization manifold of a given problem is not smooth. In this paper we present an approach that guides the search of a state-space planner, such as A*, by learning an action-sampling distribution that can generalize across different instances of a planning problem. The motivation is that, unlike typical learning approaches for planning for continuous action space that estimate a policy, an estimated action sampler is more robust to error since it has a planner to fall back on. We use a Generative Adversarial Network (GAN), and address an important issue: search experience consists of a relatively large number of actions that are not on a solution path and a relatively small number of actions that actually are on a solution path. We introduce a new technique, based on an importance-ratio estimation method, for using samples from a non-target distribution to make GAN learning more data-efficient. We provide theoretical guarantees and empirical evaluation in three challenging continuous robot planning problems to illustrate the effectiveness of our algorithm.\nThe rapid growth of modern technologies such as the internet and mobile computing is bringing dramatically increased e-commerce payments, as well as an explosion in transaction fraud. Meanwhile, fraudsters are continually refining their tricks, making it difficult for rule-based fraud detection systems to handle the ever-changing fraud patterns. 
Many data mining and artificial intelligence methods have been proposed for identifying small anomalies in large transaction data sets, improving detection efficiency to some extent. Nevertheless, there is always a contradiction: most methods ignore the transaction sequence, yet sequence-related methods usually cannot learn information at the single-transaction level well. In this paper, a new \"within->between->within\" sandwich-structured sequence learning architecture is proposed by stacking an ensemble method, a deep sequential learning method and another top-layer ensemble classifier in proper order. Moreover, an attention mechanism has also been introduced to further improve performance. Models in this structure have been shown to be very efficient in scenarios like fraud detection, where the information sequence is made up of vectors with complex interconnected features.\nThe performance of many parallel applications depends on loop-level parallelism. However, manually parallelizing all loops may result in degrading parallel performance, as some of them cannot scale desirably to a large number of threads. In addition, the overheads of manually tuning loop parameters might prevent an application from reaching its maximum parallel performance. We illustrate how machine learning techniques can be applied to address these challenges. In this research, we develop a framework that is able to automatically capture the static and dynamic information of a loop. Moreover, we advocate a novel method by introducing HPX smart executors for determining the execution policy, chunk size, and prefetching distance of an HPX loop to achieve higher performance by feeding static information captured during compilation and runtime-based dynamic information to our learning model. 
Our evaluated execution results show that using these smart executors can speed up the HPX execution process by around 12%-35% for the Matrix Multiplication, Stream and $2D$ Stencil benchmarks compared to setting their HPX loop's execution policy/parameters manually or using HPX auto-parallelization techniques.\nAccurate diagnosis of tool wear in metal turning process remains an open challenge for both scientists and industrial practitioners because of inhomogeneities in workpiece material, nonstationary machining settings to suit production requirements, and nonlinear relations between measured variables and tool wear. Common methodologies for tool condition monitoring still rely on batch approaches which cannot cope with a fast sampling rate of metal cutting process. Furthermore they require a retraining process to be completed from scratch when dealing with a new set of machining parameters. This paper presents an online tool condition monitoring approach based on Parsimonious Ensemble+, pENsemble+. The unique feature of pENsemble+ lies in its highly flexible principle where both ensemble structure and base-classifier structure can automatically grow and shrink on the fly based on the characteristics of data streams. Moreover, the online feature selection scenario is integrated to actively sample relevant input attributes. The paper presents advancement of a newly developed ensemble learning algorithm, pENsemble+, where online active learning scenario is incorporated to reduce operator labelling effort. The ensemble merging scenario is proposed which allows reduction of ensemble complexity while retaining its diversity. Experimental studies utilising real-world manufacturing data streams and comparisons with well known algorithms were carried out. Furthermore, the efficacy of pENsemble was examined using benchmark concept drift data streams. 
It has been found that pENsemble+ incurs low structural complexity and results in a significant reduction of operator labelling effort.\nBuilding effective recommender systems for domains like fashion is challenging due to the high level of subjectivity and the semantic complexity of the features involved (i.e., fashion styles). Recent work has shown that approaches to `visual' recommendation (e.g.~clothing, art, etc.) can be made more accurate by incorporating visual signals directly into the recommendation objective, using `off-the-shelf' feature representations derived from deep networks. Here, we seek to extend this contribution by showing that recommendation performance can be significantly improved by learning `fashion aware' image representations directly, i.e., by training the image representation (from the pixel level) and the recommender system jointly; this contribution is related to recent work using Siamese CNNs, though we are able to show improvements over state-of-the-art recommendation techniques such as BPR and variants that make use of pre-trained visual features. Furthermore, we show that our model can be used \\emph{generatively}, i.e., given a user and a product category, we can generate new images (i.e., clothing items) that are most consistent with their personal taste. This represents a first step towards building systems that go beyond recommending existing items from a product corpus, but which can be used to suggest styles and aid the design of new products.\nDR-submodular continuous functions are important objectives with wide real-world applications spanning MAP inference in determinantal point processes (DPPs), and mean-field inference for probabilistic submodular models, amongst others. DR-submodularity captures a subclass of non-convex functions that enables both exact minimization and approximate maximization in polynomial time.   
In this work we study the problem of maximizing non-monotone DR-submodular continuous functions under general down-closed convex constraints. We start by investigating geometric properties that underlie such objectives, e.g., a strong relation between (approximately) stationary points and global optimum is proved. These properties are then used to devise two optimization algorithms with provable guarantees. Concretely, we first devise a \"two-phase\" algorithm with $1/4$ approximation guarantee. This algorithm allows the use of existing methods for finding (approximately) stationary points as a subroutine, thus, harnessing recent progress in non-convex optimization. Then we present a non-monotone Frank-Wolfe variant with $1/e$ approximation guarantee and sublinear convergence rate. Finally, we extend our approach to a broader class of generalized DR-submodular continuous functions, which captures a wider spectrum of applications. Our theoretical findings are validated on synthetic and real-world problem instances.\nEmbedding methods such as word embedding have become pillars for many applications containing discrete structures. Conventional embedding methods directly associate each symbol with a continuous embedding vector, which is equivalent to applying linear transformation based on \"one-hot\" encoding of the discrete symbols. Despite its simplicity, such approach yields number of parameters that grows linearly with the vocabulary size and can lead to overfitting. In this work we propose a much more compact K-way D-dimensional discrete encoding scheme to replace the \"one-hot\" encoding. In \"KD encoding\", each symbol is represented by a $D$-dimensional code, and each of its dimension has a cardinality of $K$. The final symbol embedding vector can be generated by composing the code embedding vectors. To learn the semantically meaningful code, we derive a relaxed discrete optimization technique based on stochastic gradient descent. 
By adopting the new coding system, the efficiency of parameterization can be significantly improved (from linear to logarithmic), and this can also mitigate the over-fitting problem. In our experiments with language modeling, the number of embedding parameters can be reduced by 97\\% while achieving similar or better performance.\nWe describe a novel spiking neural network (SNN) for automated, real-time handwritten digit classification and its implementation on a GP-GPU platform. Information processing within the network, from feature extraction to classification is implemented by mimicking the basic aspects of neuronal spike initiation and propagation in the brain. The feature extraction layer of the SNN uses fixed synaptic weight maps to extract the key features of the image and the classifier layer uses the recently developed NormAD approximate gradient descent based supervised learning algorithm for spiking neural networks to adjust the synaptic weights. On the standard MNIST database images of handwritten digits, our network achieves an accuracy of 99.80% on the training set and 98.06% on the test set, with nearly 7x fewer parameters compared to the state-of-the-art spiking networks. We further use this network in a GPU based user-interface system demonstrating real-time SNN simulation to infer digits written by different users. On a test set of 500 such images, this real-time platform achieves an accuracy exceeding 97% while making a prediction within an SNN emulation time of less than 100ms.\nAs technology becomes more advanced, those who design, use and are otherwise affected by it want to know that it will perform correctly, and understand why it does what it does, and how to use it appropriately. In essence they want to be able to trust the systems that are being designed. In this survey we present assurances that are the method by which users can understand how to trust autonomous systems. 
Trust between humans and autonomy is reviewed, and the implications for the design of assurances are highlighted. A survey of existing research related to assurances is presented. Much of the surveyed research originates from fields such as interpretable, comprehensible, transparent, and explainable machine learning, as well as human-computer interaction, human-robot interaction, and e-commerce. Several key ideas are extracted from this work in order to refine the definition of assurances. The design of assurances is found to be highly dependent not only on the capabilities of the autonomous system, but on the characteristics of the human user, and the appropriate trust-related behaviors. Several directions for future research are identified and discussed.\nWe introduce KBGAN, an adversarial learning framework to improve the performances of a wide range of existing knowledge graph embedding models. Because knowledge graphs typically only contain positive facts, sampling useful negative training examples is a non-trivial task. Replacing the head or tail entity of a fact with a uniformly randomly selected entity is a conventional method for generating negative facts, but the majority of the generated negative facts can be easily discriminated from positive facts, and will contribute little towards the training. Inspired by generative adversarial networks (GANs), we use one knowledge graph embedding model as a negative sample generator to assist the training of our desired model, which acts as the discriminator in GANs. This framework is independent of the concrete form of generator and discriminator, and therefore can utilize a wide variety of knowledge graph embedding models as its building blocks. In experiments, we adversarially train two translation-based models, TransE and TransD, each with assistance from one of the two probability-based models, DistMult and ComplEx. 
We evaluate the performances of KBGAN on the link prediction task, using three knowledge base completion datasets: FB15k-237, WN18 and WN18RR. Experimental results show that adversarial training substantially improves the performances of target embedding models under various settings.\nDifferential performance debugging is a technique to find performance problems. It applies in situations where the performance of a program is (unexpectedly) different for different classes of inputs. The task is to explain the differences in asymptotic performance among various input classes in terms of program internals. We propose a data-driven technique based on discriminant regression tree (DRT) learning problem where the goal is to discriminate among different classes of inputs. We propose a new algorithm for DRT learning that first clusters the data into functional clusters, capturing different asymptotic performance classes, and then invokes off-the-shelf decision tree learning algorithms to explain these clusters. We focus on linear functional clusters and adapt classical clustering algorithms (K-means and spectral) to produce them. For the K-means algorithm, we generalize the notion of the cluster centroid from a point to a linear function. We adapt spectral clustering by defining a novel kernel function to capture the notion of linear similarity between two data points. We evaluate our approach on benchmarks consisting of Java programs where we are interested in debugging performance. We show that our algorithm significantly outperforms other well-known regression tree learning algorithms in terms of running time and accuracy of classification.\nSpectral clustering has found extensive use in many areas. Most traditional spectral clustering algorithms work in three separate steps: similarity graph construction; continuous labels learning; discretizing the learned labels by k-means clustering. 
Such common practice has two potential flaws, which may lead to severe information loss and performance degradation. First, predefined similarity graph might not be optimal for subsequent clustering. It is well-accepted that similarity graph highly affects the clustering results. To this end, we propose to automatically learn similarity information from data and simultaneously consider the constraint that the similarity matrix has exact c connected components if there are c clusters. Second, the discrete solution may deviate from the spectral solution since k-means method is well-known as sensitive to the initialization of cluster centers. In this work, we transform the candidate solution into a new one that better approximates the discrete one. Finally, those three subtasks are integrated into a unified framework, with each subtask iteratively boosted by using the results of the others towards an overall optimal solution. It is known that the performance of a kernel method is largely determined by the choice of kernels. To tackle this practical problem of how to select the most suitable kernel for a particular data set, we further extend our model to incorporate multiple kernel learning ability. Extensive experiments demonstrate the superiority of our proposed method as compared to existing clustering approaches.\n"
  },
  {
    "path": "data/arxiv/artificial intelligence_10047_15000_15_title.txt",
    "content": "Impact of Artificial Intelligence on Economic Theory\nArtificial Intelligence in Humans\nThree IQs of AI Systems and their Testing Methods\nPhilosophy in the Face of Artificial Intelligence\nA Study on Artificial Intelligence IQ and Standard Intelligent Model\nHybrid Systems Knowledge Representation Using Modelling Environment  System Techniques Artificial Intelligence\nTests of Machine Intelligence\nAn Approximation of the Universal Intelligence Measure\nDecision under Uncertainty\nAction Selection Properties in a Software Simulated Agent\nA Novel Method for Developing Robotics via Artificial Intelligence and  Internet of Things\nModular Belief Updates and Confusion about Measures of Certainty in  Artificial Intelligence Research\nQuantifying Natural and Artificial Intelligence in Robots and Natural  Systems with an Algorithmic Behavioural Test\nA Formal Measure of Machine Intelligence\nComputational Narrative Intelligence: A Human-Centered Goal for  Artificial Intelligence\nArtificial Intelligence and Economic Theories\nA Definition of Artificial Intelligence\nGuidelines for Artificial Intelligence Containment\nImproving content marketing processes with the approaches by artificial  intelligence\nEthical Considerations in Artificial Intelligence Courses\nA Framework for Searching for General Artificial Intelligence\nBrain Intelligence: Go Beyond Artificial Intelligence\nIntelligence Quotient and Intelligence Grade of Artificial Intelligence\nSwarm Intelligence\nOn the influence of intelligence in (social) intelligence testing  environments\nHybrid Reasoning and the Future of Iconic Representations\nDiverse Consequences of Algorithmic Probability\nA Backwards View for Assessment\nFoundations of Probability Theory for AI - The Application of  Algorithmic Probability to Problems in Artificial Intelligence\nTowards Verified Artificial Intelligence\nMultilayered Model of Speech\nArtificial and Biological Intelligence\nMan and Machine: Questions of 
Risk, Trust and Accountability in Today's  AI Technology\nHuman-in-the-loop Artificial Intelligence\nMeasurements of collective machine intelligence\nThe Role of Artificial Intelligence Technologies in Crisis Response\nA Systematic Approach to Artificial Agents\nComparison between the two definitions of AI\nA Python Engine for Teaching Artificial Intelligence in Games\nProbability Judgement in Artificial Intelligence\nAI Methods in Algorithmic Composition: A Comprehensive Survey\nRational Choice and Artificial Intelligence\nInstitutional Metaphors for Designing Large-Scale Distributed AI versus  AI Techniques for Running Institutions\nIntelligible Artificial Intelligence\nHow Intelligent is your Intelligent Robot?\nQuantization of Games: Towards Quantum Artificial Intelligence\nIdentifying Independencies in Causal Graphs with Feedback\nConstructing Lower Probabilities\nThe Assumptions Behind Dempster's Rule\nThe Nature of the Unnormalized Beliefs Encountered in the Transferable  Belief Model\nCoefficients of Relations for Probabilistic Reasoning\nEvolution towards Smart Optical Networking: Where Artificial  Intelligence (AI) meets the World of Photonics\nOne Decade of Universal Artificial Intelligence\nRealizing Intelligence\nArtificial Intelligence and Asymmetric Information Theory\nExplanation in Artificial Intelligence: Insights from the Social  Sciences\nMemory Based Machine Intelligence Techniques in VLSI hardware\nQuantitative Analysis of Whether Machine Intelligence Can Surpass Human  Intelligence\nAgent Models of Political Interactions\nIntroduction to intelligent computing unit 1\nA primer on Answer Set Programming\nUniversal Intelligence: A Definition of Machine Intelligence\nMeasuring Intelligence through Games\nAvoiding Undesired Choices Using Intelligent Adaptive Systems\nDetecting Qualia in Natural and Artificial Agents\nDesign of the Artificial: lessons from the biological roots of general  intelligence\nA Cyber Science Based Ontology for Artificial 
General Intelligence  Containment\nConsiderations upon the Machine Learning Technologies\nAlgorithmic Randomness as Foundation of Inductive Reasoning and  Artificial Intelligence\nHybrid Systems for Knowledge Representation in Artificial Intelligence\nTypes of Cognition and its Implications for future High-Level Cognitive  Machines\nAmbiente de Planejamento Ipê\nArtificial Intelligence Approaches To UCAV Autonomy\nResponsible Autonomy\nArtificial Intelligence and its Role in Near Future\nOpen Problems in Universal Induction & Intelligence\nA Collection of Definitions of Intelligence\nA framework: Cluster detection and multidimensional visualization of  automated data mining using intelligent agents\nThe Lovelace 2.0 Test of Artificial Creativity and Intelligence\nIntelligent Biohybrid Neurotechnologies: Are They Really What They  Claim?\nDesign and development of a software system for swarm intelligence based  research studies\nA Survey of Question Answering for Math and Science Problem\nChallenges and Characteristics of Intelligent Autonomy for Internet of  Battle Things in Highly Adversarial Environments\nThe Computational Theory of Intelligence: Information Entropy\nA Model for Combination of External and Internal Stimuli in the Action  Selection of an Autonomous Agent\nArtificial Intelligence Techniques for Steam Generator Modelling\nAnalysis of Microarray Data using Artificial Intelligence Based  Techniques\nBlue Sky Ideas in Artificial Intelligence Education from the EAAI 2017  New and Future AI Educator Program\nArtificial Intelligence Based Malware Analysis\nKnowledge Transfer Between Artificial Intelligence Systems\nNarrow Artificial Intelligence with Machine Learning for Real-Time  Estimation of a Mobile Agents Location Using Hidden Markov Models\nIntelligence in Artificial Intelligence\nThe SP Theory of Intelligence as a Foundation for the Development of a  General, Human-Level Thinking Machine\nTowards an Intelligent Database System Founded on the SP 
Theory of  Computing and Cognition\nFaith in the Algorithm, Part 1: Beyond the Turing Test\nAn existing, ecologically-successful genus of collectively intelligent  artificial creatures\nDeath and Suicide in Universal Artificial Intelligence\nOpen Ended Intelligence: The individuation of Intelligent Agents\nEnaction-Based Artificial Intelligence: Toward Coevolution with Humans  in the Loop\nUniversal Algorithmic Intelligence: A mathematical top->down approach\nIs Intelligence Artificial?\nTuring: Then, Now and Still Key\nNon-Evolutionary Superintelligences Do Nothing, Eventually\nCluster-based Specification Techniques in Dempster-Shafer Theory for an  Evidential Intelligence Analysis of MultipleTarget Tracks (Thesis Abstract)\nEvidential Force Aggregation\nEffect of noise in intelligent cellular decision making\nExamples of Artificial Perceptions in Optical Character Recognition and  Iris Recognition\nAnalysis of first prototype universal intelligence tests: evaluating and  comparing AI algorithms and humans\nOPUS: An Efficient Admissible Algorithm for Unordered Search\nA Resolution Calculus for Dynamic Semantics\nA Logic for Reasoning about Evidence\nThe Road to Quantum Artificial Intelligence\nHeterogeneous knowledge representation using a finite automaton and  first order logic: a case study in electromyography\nApproximate Counting of Graphical Models Via MCMC Revisited\nThe Complexity of Plan Existence and Evaluation in Probabilistic Domains\nProbabilistic Conceptual Network: A Belief Representation Scheme for  Utility-Based Categorization\nDiscounting and Combination Operations in Evidential Reasoning\nTowards a Simulation-Based Programming Paradigm for AI applications\nDecision Under Uncertainty in Diagnosis\nRelative Entropy, Probabilistic Inference and AI\nAn Evaluation of Two Alternatives to Minimax\nResearch Priorities for Robust and Beneficial Artificial Intelligence\nThe Computational Power of Dynamic Bayesian Networks\nDon't Fear the Reaper: Refuting 
Bostrom's Superintelligence Argument\nGeneral Video Game AI: Learning from Screen Capture\nFeasibility Study: Moving Non-Homogeneous Teams in Congested Video Game  Environments\nSelf-Regulating Artificial General Intelligence\nViewpoint: Artificial Intelligence and Labour\nAn architecture for the evaluation of intelligent systems\nQuantitative Results Comparing Three Intelligent Interfaces for  Information Capture: A Case Study Adding Name Information into an Electronic  Personal Organizer\nAvoiding Wireheading with Value Reinforcement Learning\nA proposal for ethically traceable artificial intelligence\nElementary epistemological features of machine intelligence\nFormal Definition of AI\nStream Computing\nAn Analysis of General Fuzzy Logic and Fuzzy Reasoning Method\nHuman-Level Intelligence or Animal-Like Abilities?\nDesign of a P System based Artificial Graph Chemistry\nGoal Conflict in Designing an Autonomous Artificial System\nApplications of Artificial Intelligence Techniques to Combating Cyber  Crimes: A Review\nOn the idea of a new artificial intelligence based optimization  algorithm inspired from the nature of vortex\nCharacterizations of Decomposable Dependency Models\nConditional Plausibility Measures and Bayesian Networks\nInstantaneously Trained Neural Networks\nA Rational Decision Maker with Ordinal Utility under Uncertainty:  Optimism and Pessimism\nArtificial Brain Based on Credible Neural Circuits in a Human Brain\nFinding a Path is Harder than Finding a Tree\nSHOP2: An HTN Planning System\nNew Polynomial Classes for Logic-Based Abduction\nAsymptotically Optimal Agents\nEngineering a Conformant Probabilistic Planner\nMaximum likelihood fitting of acyclic directed mixed graphs to binary  data\nOn Measurement Bias in Causal Inference\nSymmetry Breaking Constraints: Recent Results\nComplexity Analysis and Variational Inference for Interpretation-based  Probabilistic Description Logic\nIdentifying Dynamic Sequential Plans\nReading Dependencies from 
Polytree-Like Bayesian Networks\nA Criterion for Parameter Identification in Structural Equation Models\nSufficient conditions for convergence of Loopy Belief Propagation\nDescription Logics with Fuzzy Concrete Domains\nOptimistic Agents are Asymptotically Optimal\nFactorization of Discrete Probability Distributions\nApproximate Planning for Factored POMDPs using Belief State  Simplification\nBayesian Control for Concentrating Mixed Nuclear Waste\nA Comparison of Lauritzen-Spiegelhalter, Hugin, and Shenoy-Shafer  Architectures for Computing Marginals of Probability Distributions\nPlanning with Partially Observable Markov Decision Processes: Advances  in Exact Solution Method\nExploiting Uncertain and Temporal Information in Correlation\nA Scheme for Approximating Probabilistic Inference\nLimitations of Skeptical Default Reasoning\nOn Stable Multi-Agent Behavior in Face of Uncertainty\nRegion-Based Approximations for Planning in Stochastic Domains\nPropagation of 2-Monotone Lower Probabilities on an Undirected Graph\nTopological Parameters for Time-Space Tradeoff\nComputing Upper and Lower Bounds on Likelihoods in Intractable Networks\nBinary Join Trees\nReal Time Estimation of Bayesian Networks\nA Characterization of the Dirichlet Distribution with Application to  Learning Bayesian Networks\nToward a Characterization of Uncertainty Measure for the Dempster-Shafer  Theory\nCausal Inference and Causal Explanation with Background Knowledge\nStrong Completeness and Faithfulness in Bayesian Networks\nDefaults and Infinitesimals: Defeasible Inference by Nonarchimedean  Entropy-Maximization\nA Bayesian Method Reexamined\nPossibility and Necessity Functions over Non-classical Logics\nFrom Influence Diagrams to Junction Trees\nBelief Induced by the Partial Knowledge of the Probabilities\nOn Axiomatization of Probabilistic Conditional Independencies\nNormative Engineering Risk Management Systems\nTwo Procedures for Compiling Influence Diagrams\nDeciding Morality of Graphs is 
NP-complete\nQualitative Measures of Ambiguity\nJeffrey's rule of conditioning generalized to belief functions\nInference with Possibilistic Evidence\nEntropy and Belief Networks\nObjection-Based Causal Networks\nA Note on the Measure of Discord\nBayesian Networks Aplied to Therapy Monitoring\nA Probabilistic Analysis of Marker-Passing Techniques for  Plan-Recognition\nA Reason Maintenace System Dealing with Vague Data\nRepresentation Requirements for Supporting Decision Model Formulation\nA Fusion Algorithm for Solving Bayesian Decision Problems\nAlgorithms for Irrelevance-Based Partial MAPs\nFrom Relational Databases to Belief Networks\nConditional Plausibility Measures and Bayesian Networks\nGeneralized Qualitative Probability: Savage Revisited\nQuantum Annealing for Clustering\nFrom Ordinary Differential Equations to Structural Causal Models: the  deterministic case\nA Counter Example to Theorems of Cox and Fine\nImperfect Match: PDDL 2.1 and Real Applications\nPDDL 2.1: Representation vs. 
Computation\nContext-Dependent Similarity\nSimilarity Networks for the Construction of Multiple-Faults Belief  Networks\nOn Some Equivalence Relations between Incidence Calculus and  Dempster-Shafer Theory of Evidence\nAnalysis in HUGIN of Data Conflict\nd-Separation: From Theorems to Algorithms\nMaximum Uncertainty Procedures for Interval-Valued Probability  Distributions\nDirected Cycles in Belief Networks\nNormalization and the Representation of Nonmonotonic Knowledge in the  Theory of Evidence\nA Method for Using Belief Networks as Influence Diagrams\nModeling uncertain and vague knowledge in possibility and evidence  theories\nTruth Maintenance Under Uncertainty\nThe Optimality of Satisficing Solutions\nProbabilistic Inference and Probabilistic Reasoning\nA Linear Approximation Method for Probabilistic Inference\nHandling uncertainty in a system for text-symbol context analysis\nDo We Need Higher-Order Probabilities and, If So, What Do They Mean?\nUsing the Dempster-Shafer Scheme in a Diagnostic Expert System Shell\nComparisons of Reasoning Mechanisms for Computer Vision\nEvidential Reasoning in Image Understanding\nExplanation of Probabilistic Inference for Decision Support Systems\nEfficient Inference on Generalized Fault Diagrams\nLearning Link-Probabilities in Causal Trees\nGeneralizing Fuzzy Logic Probabilistic Inferences\nQualitative Probabilistic Networks for Planning Under Uncertainty\nOn Implementing Usual Values\nOn the Combinality of Evidence in the Dempster-Shafer Theory\nA Constraint Propagation Approach to Probabilistic Reasoning\nImplementing Probabilistic Reasoning\nA factorization criterion for acyclic directed mixed graphs\nRule reasoning for legal norm validation of FSTP facts\nDecidability, Introduction Rules and Automata\nOn Quantum Decision Trees\nThe MacGyver Test - A Framework for Evaluating Machine Resourcefulness  and Creative Problem Solving\nAI Safety and Reproducibility: Establishing Robust Foundations for the  Neuroscience of 
Human Values\nDimensions of Neural-symbolic Integration - A Structured Survey\nCan Intelligence Explode?\nA Heuristic Search Algorithm Using the Stability of Learning Algorithms  in Certain Scenarios as the Fitness Function: An Artificial General  Intelligence Engineering Approach\nA Theory of Universal Artificial Intelligence based on Algorithmic  Complexity\nTowards an Intelligent Tutor for Mathematical Proofs\nSubjective Reality and Strong Artificial Intelligence\nOn the Compatibility Between Physics and Intelligent Organisms\nIntelligent encoding and economical communication in the visual stream\nAutomatic Synthesis of Geometry Problems for an Intelligent Tutoring  System\nThe Computational Theory of Intelligence: Data Aggregation\nIntelligent User Interfaces - A Tutorial\nWhy Artificial Intelligence Needs a Task Theory --- And What It Might  Look Like\nConscious Intelligent Systems - Part II - Mind, Thought, Language and  Understanding\nUltimate Intelligence Part III: Measures of Intelligence, Perception and  Intelligent Agents\nLandau Theory of Adaptive Integration in Computational Intelligence\nAffect Control Processes: Intelligent Affective Interaction using a  Partially Observable Markov Decision Process\nA Model for Web-Intelligence Index to Evaluate the Web Intelligence  Capacity of Government Web Sites of Sri Lanka\nVisual Character Recognition using Artificial Neural Networks\nArtificial Learning in Artificial Memories\nModeling Belief in Dynamic Systems, Part II: Revision and Update\nA note on Darwiche and Pearl\nMatrix Games, Linear Programming, and Linear Approximation\nQuality Classifiers for Open Source Software Repositories\nFact Sheet on Semantic Web\nApproximated Structured Prediction for Learning Large Scale Graphical  Models\nArtificial Intelligence in Reverse Supply Chain Management: The State of  the Art\nDesign of Automatically Adaptable Web Wrappers\nArtificial Decision Making Under Uncertainty in Intelligent Buildings\nA Misanthropic 
Reinterpretation of the Chinese Room Problem\nExperimental Realization of Quantum Artificial Intelligence\nTwo Gaussian Approaches to Black-Box Optomization\nSAT as a game\nAre Minds Computable?\nQuantifying Morphological Computation based on an Information  Decomposition of the Sensorimotor Loop\nA Survey on Artificial Intelligence and Data Mining for MOOCs\nUnethical Research: How to Create a Malevolent Artificial Intelligence\nA Comment on Argumentation\nDeepStack: Expert-Level Artificial Intelligence in No-Limit Poker\nEntropy Non-increasing Games for the Improvement of Dataflow Programming\nExplainable Artificial Intelligence: Understanding, Visualizing and  Interpreting Deep Learning Models\nHow linguistic descriptions of data can help to the teaching-learning  process in higher education, case of study: artificial intelligence\nInnateness, AlphaZero, and Artificial Intelligence\nBlockchain and Artificial Intelligence\nArtificial intelligence and pediatrics: A synthetic mini review\nThe AGINAO Self-Programming Engine\nPerspectives for Strong Artificial Life\nThoughts on an Unified Framework for Artificial Chemistries\nTowards a Conceptual Framework for Innate Immunity\nChemlambda, universality and self-multiplication\nAI Researchers, Video Games Are Your Friends!\nRealizing an optimization approach inspired from Piagets theory on  cognitive development\nArtificial Intelligence and Legal Liability\nAn Integrated Framework for Learning and Reasoning\nVariations of the Turing Test in the Age of Internet and Virtual Reality\nModeling and Verification of a Multi-Agent Argumentation System using  NuSMV\nA Case Study in Knowledge Discovery and Elicitation in an Intelligent  Tutoring Application\nRisk Agoras: Dialectical Argumentation for Scientific Reasoning\nA Knowledge Acquisition Tool for Bayesian-Network Troubleshooters\nNetwork Engineering for Complex Belief Networks\nOn Non-monotonic Conditional Reasoning\nBayesian Inference in Model-Based Machine Vision\nAn 
Application of Non-Monotonic Probabilistic Reasoning to Air Force  Threat Correlation\nEnergetics of the brain and AI\nUsing artificial intelligence for data reduction in mechanical  engineering\nOn the Use of Skeletons when Learning in Bayesian Networks\nAlgorithms and Complexity Results for Persuasive Argumentation\nIris Codes Classification Using Discriminant and Witness Directions\nIntroduction to the SP theory of intelligence\nManaging Inconsistent Intelligence\nAnalysis of Algorithms and Partial Algorithms\nEmotional Responses in Artificial Agent-Based Systems: Reflexivity and  Adaptation in Artificial Life\nArtificial Immune Systems Tutorial\nEnhancing a Search Algorithm to Perform Intelligent Backtracking\nModeling of Social Transitions Using Intelligent Systems\nA theory of intelligence: networked problem solving in animal societies\nIntelligent Semantic Web Search Engines: A Brief Survey\nCreating Intelligent Linking for Information Threading in Knowledge  Networks\nAn Intelligent Location Management approaches in GSM Mobile Network\nAre there intelligent Turing machines?\nA New Theoretical and Technological System of Imprecise-Information  Processing\nIQ of Neural Networks\nArtificial Intelligence and Systems Theory: Applied to Cooperative  Robots\nOn-line Planning and Scheduling: An Application to Controlling Modular  Printers\nMathematical Foundations for Designing and Development of Intelligent  Systems of Information Analysis\nApplying the Negative Selection Algorithm for Merger and Acquisition  Target Identification\nUsing Thought-Provoking Children's Questions to Drive Artificial  Intelligence Research\nThe Challenge of Non-Technical Loss Detection using Artificial  Intelligence: A Survey\nDesigning a Safe Autonomous Artificial Intelligence Agent based on Human  Self-Regulation\nSenseNet: 3D Objects Database and Tactile Simulator\nSelf-Regulated Artificial Ant Colonies on Digital Image Habitats\nStructured Learning Modulo Theories\nDesign of an 
Alarm System for Isfahan Ozone Level based on Artificial  Intelligence Predictor Models\nA Model of Pathways to Artificial Superintelligence Catastrophe for Risk  and Decision Analysis\nDecoupling Learning Rules from Representations\nMinimally Naturalistic Artificial Intelligence\nMachine Learning for Wireless Networks with Artificial Intelligence: A  Tutorial on Neural Networks\nA World of Views: A World of Interacting Post-human Intelligences\nAutonomous robots and the SP theory of intelligence\nIntelligent Systems: Architectures and Perspectives\nIntelligent location of simultaneously active acoustic emission sources:  Part I\nDoes intelligence imply contradiction?\nPolyethism in a colony of artificial ants\nDigital Genesis: Computers, Evolution and Artificial Life\nNanoscale artificial intelligence: creating artificial neural networks  using autocatalytic reactions\nToward Formalizing Teleportation of Pedagogical Artificial Agents\nIntelligent User Interface in Fuzzy Environment\nApplying Artificial Intelligence and Internet Techniques in Rural  Tourism Domain\nIntelligent Human Machine Interface Design for Advanced Product Life  Cycle Management Systems\n8-Valent Fuzzy Logic for Iris Recognition and Biometry\nCognitive Bias for Universal Algorithmic Intelligence\nExtending Universal Intelligence Models with Formal Notion of  Representation\nSelf-Modification of Policy and Utility Function in Rational Agents\nHuman vs. 
Computer Go: Review and Prospect\nA Machine Learning Based Intrusion Detection System for Software Defined  5G Network\nSimple Cortex: A Model of Cells in the Sensory Nervous System\nNeural Networks\nA Dynamical Systems Approach for Static Evaluation in Go\nThe Search for Computational Intelligence\nFrom Seed AI to Technological Singularity via Recursively Self-Improving  Software\nFuzzy Clustering Data Given in the Ordinal Scale\nTransferrable Plausibility Model - A Probabilistic Interpretation of  Mathematical Theory of Evidence\nCan Machines Think in Radio Language?\nForced Evolution in Silico by Artificial Transposons and their Genetic  Operators: The John Muir Ant Problem\nDo Artificial Reinforcement-Learning Agents Matter Morally?\nBorn to Learn: the Inspiration, Progress, and Future of Evolved Plastic  Artificial Neural Networks\nQuantum Artificial Life in an IBM Quantum Computer\nThe structure of evolved representations across different substrates for  artificial intelligence\nClassifying Signals with Local Classifiers\nThe Future of Scientific Simulations: from Artificial Life to Artificial  Cosmogenesis\nI'm sorry to say, but your understanding of image processing  fundamentals is absolutely wrong\nComputational Understanding and Manipulation of Symmetries\nDefinition and properties to assess multi-agent environments as social  intelligence tests\nSingle photon in hierarchical architecture for physical reinforcement  learning: Photon intelligence\nLinguistics Computation, Automatic Model Generation, and Intensions\nDynamic Backtracking\nApplying GSAT to Non-Clausal Formulas\nOn Planning while Learning\nDecision-Theoretic Foundations for Causal Reasoning\nStatistical Feature Combination for the Evaluation of Game Positions\nRule-based Machine Learning Methods for Functional Prediction\nWell-Founded Semantics for Extended Logic Programs with Dynamic  Preferences\nQuantum Computing and Phase Transitions in Combinatorial Search\nMean Field Theory for Sigmoid 
Belief Networks\nImproved Use of Continuous Attributes in C4.5\nActive Learning with Statistical Models\nA Divergence Critic for Inductive Proof\nLearning First-Order Definitions of Functions\nLifeworld Analysis\nA Complete Classification of Tractability in RCC-5\nEight Maximal Tractable Subclasses of Allen's Algebra with Metric Time\nDefining Relative Likelihood in Partially-Ordered Preferential  Structures\nRepresentation Theory for Default Logic\nMixing Metaphors\nFuzzy Approaches to Abductive Inference\nExact Phase Transitions in Random Constraint Satisfaction Problems\nEntrenchment Relations: A Uniform Approach to Nonmonotonicity\nA Tableau Calculus for Pronoun Resolution\nGeneralized Qualitative Probability: Savage revisited\nBayesian Information Extraction Network\nInformation Compression by Multiple Alignment, Unification and Search as  a Unifying Principle in Computing and Cognition\nUsing Artificial Intelligence for Model Selection\nMemory As A Monadic Control Construct In Problem-Solving\nDefensive forecasting\nRelation Variables in Qualitative Spatial Reasoning\nAdaptation Knowledge Discovery from a Case Base\nDependency Parsing with Dynamic Bayesian Network\nNeural Networks with c-NOT Gated Nodes\nArabic Speech Recognition System using CMU-Sphinx4\nSymbolic sensors\nArtificial Intelligence for Conflict Management\nA Novel Model of Working Set Selection for SMO Decomposition Methods\nFeature Dynamic Bayesian Networks\nBreaking Value Symmetry\nSymmetry Breaking Using Value Precedence\nStochastic Constraint Programming\nDecompositions of All Different, Global Cardinality and Related  Constraints\nReasoning about soft constraints and conditional preferences: complexity  results and approximation techniques\nMultiset Ordering Constraints\nPattern Recognition Theory of Mind\nHow to Complete an Interactive Configuration Process?\nA fuzzified BRAIN algorithm for learning DNF from incomplete data\nSymmetry within Solutions\nIntegrating multiple sources to 
answer questions in Algebraic Topology\nFaithfulness in Chain Graphs: The Gaussian Case\nAI 3D Cybug Gaming\nUse of Python and Phoenix-M Interface in Robotics\nSolving the Satisfiability Problem Through Boolean Networks\nTeraflop-scale Incremental Machine Learning\nEvolutionary Algorithms for Reinforcement Learning\nReasoning about Minimal Belief and Negation as Failure\nPlanning Graph as a (Dynamic) CSP: Exploiting EBL, DDB and other CSP  Search Techniques in Graphplan\nOn Deducing Conditional Independence from d-Separation in Causal Graphs  with Feedback (Research Note)\nNonapproximability Results for Partially Observable Markov Decision  Processes\nMean Field Methods for a Special Class of Belief Networks\nPlanning by Rewriting\nSpeeding Up the Convergence of Value Iteration in Partially Observable  Markov Decision Processes\nOptimizing Dialogue Management with Reinforcement Learning: Experiments  with the NJFun System\nATTac-2000: An Adaptive Autonomous Bidding Agent\nEfficient Methods for Qualitative Spatial Reasoning\nImproving the Efficiency of Inductive Logic Programming Through the Use  of Query Packs\nExpert-Guided Subgroup Discovery: Methodology and Application\nAn Architectural Approach to Ensuring Consistency in Hierarchical  Execution\nLearning to Coordinate Efficiently: A Model-based Approach\nTemporal Decision Trees: Model-based Diagnosis of Dynamic Systems  On-Board\nA New Technique for Combining Multiple Classifiers using The  Dempster-Shafer Theory of Evidence\nCan We Learn to Beat the Best Stock\nRestricted Value Iteration: Theory and Algorithms\nTowards a Reliable Framework of Uncertainty-Based Group Decision Support  System\nAn Information Theoretic Representation of Agent Dynamics as Set  Intersections\nGeneralized Fast Approximate Energy Minimization via Graph Cuts:  Alpha-Expansion Beta-Shrink Moves\nOn the Practical use of Variable Elimination in Constraint Optimization  Problems: 'Still-life' as a Case Study\nReasoning about Action: An 
Argumentation - Theoretic Approach\nLogical Hidden Markov Models\nmGPT: A Probabilistic Planner Based on Heuristic Search\nOptiplan: Unifying IP-based and Graph-based Planning\nPDDL2.1 - The Art of the Possible? Commentary on Fox and Long\nThe Case for Durative Actions: A Commentary on PDDL2.1\nGenerative Prior Knowledge for Discriminative Classification\nProbabilistic Planning via Heuristic Forward Search and Weighted Model  Counting\nThe Complexity of Planning Problems With Simple Causal Graphs\nA Bayesian Model for Plan Recognition in RTS Games applied to StarCraft\nReasoning about Unreliable Actions\nThree new sensitivity analysis methods for influence diagrams\nDistribution over Beliefs for Memory Bounded Dec-POMDP Planning\nBEEM : Bucket Elimination with External Memory\nSolving Hybrid Influence Diagrams with Deterministic Variables\nA Delayed Column Generation Strategy for Exact k-Bounded MAP Inference  in Markov Logic Networks\nComparative Analysis of Probabilistic Models for Activity Recognition  with an Instrumented Walker\nConfounding Equivalence in Causal Inference\nCharacterizing the Set of Coherent Lower Previsions with a Finite Number  of Constraints or Vertices\nModeling Events with Cascades of Poisson Processes\nBayesian Model Averaging Using the k-best Bayesian Network Structures\nTruthful Feedback for Sanctioning Reputation Mechanisms\nSolving Multistage Influence Diagrams using Branch-and-Bound Search\nMultiple faults diagnosis using causal graph\nDevelopment of knowledge Base Expert System for Natural treatment of  Diabetes disease\nConstraint Processing in Lifted Probabilistic Inference\nDeterministic POMDPs Revisited\nAND/OR Importance Sampling\nIdentifying reasoning patterns in games\nComplexity of Inference in Graphical Models\nIdentifying Optimal Sequential Decisions\nSampling First Order Logical Particles\nImproving Gradient Estimation by Incorporating Sensor Data\nExplanation Trees for Causal Bayesian Networks\nModel-Based Bayesian 
Reinforcement Learning in Large Structured Domains\nBounding Search Space Size via (Hyper)tree Decompositions\nNew Techniques for Algorithm Portfolio Design\nRefractor Importance Sampling\nInference for Multiplicative Models\nLarge-Flip Importance Sampling\nCausal Reasoning in Graphical Time Series Models\nMinimax regret based elicitation of generalized additive utilities\nPolynomial Constraints in Causal Bayesian Networks\nAccuracy Bounds for Belief Propagation\nWhat Counterfactuals Can Be Tested\nImproved Memory-Bounded Dynamic Programming for Decentralized POMDPs\nOn the Robustness of Most Probable Explanations\nInequality Constraints in Causal Models with Hidden Variables\nA new axiomatization for likelihood gambles\nFrom influence diagrams to multi-operator cluster DAGs\nApproximate Separability for Weak Interaction in Dynamic Systems\nIdentifying the Relevant Nodes Without Learning the Model\nBelief Update in CLG Bayesian Networks With Lazy Propagation\nReasoning about Uncertainty in Metric Spaces\nStratified Analysis of `Probabilities of Causation'\nBayesian Inference for Gaussian Mixed Graph Models\nIdentification of Conditional Interventional Distributions\nRule Based Expert System for Cerebral Palsy Diagnosis\nCost Sensitive Reachability Heuristics for Handling State Uncertainty\nPrediction, Expectation, and Surprise: Methods, Designs, and Study of a  Deployed Traffic Forecasting Service\nBelief Updating and Learning in Semi-Qualitative Probabilistic Networks\nCommon Voting Rules as Maximum Likelihood Estimators\nOn Bayesian Network Approximation by Edge Deletion\nExploiting Evidence in Probabilistic Inference\nLocal Markov Property for Models Satisfying Composition Axiom\nApproximate Inference Algorithms for Hybrid Bayesian Networks with  Discrete Constraints\nMetrics for Markov Decision Processes with Infinite State Spaces\nUnstructuring User Preferences: Efficient Non-Parametric Utility  Revelation\nThe Graphical Identification for Total Effects by 
using Surrogate  Variables\nThe Relationship Between AND/OR Search and Variable Elimination\nPoint-Based POMDP Algorithms: Improved Analysis and Implementation\nQualitative Decision Making Under Possibilistic Uncertainty: Toward more  discriminating criteria\nGenerating Markov Equivalent Maximal Ancestral Graphs by Single Edge  Replacement\nReasoning about Agent Programs using ATL-like Logics\nMetrics for Finite Markov Decision Processes\nDynamic Programming for Structured Continuous Markov Decision Problems\nRegion-Based Incremental Pruning for POMDPs\nA Unified framework for order-of-magnitude confidence relations\nMixtures of Deterministic-Probabilistic Networks and their AND/OR Search  Space\nCompact Value-Function Representations for Qualitative Preferences\nSelection of Identifiability Criteria for Total Effects by using Path  Diagrams\nIdentifying Conditional Causal Effects\nHeuristic Search Value Iteration for POMDPs\nRobustness of Causal Claims\nAn improvement direction for filter selection techniques using  information theory measures and quadratic optimization\nJoin-graph based cost-shifting schemes\nA Maximum Likelihood Approach For Selecting Sets of Alternatives\nA Case Study in Complexity Estimation: Towards Parallel Branch-and-Bound  over Graphical Models\nThe Complexity of Approximately Solving Influence Diagrams\nBelief Propagation for Structured Decision Making\nAn Approximate Solution Method for Large Risk-Averse Markov Decision  Processes\nFrom imprecise probability assessments to conditional probabilities with  quasi additive classes of conditioning events\nMulti-objective Influence Diagrams\nVerbalizing Ontologies in Controlled Baltic Languages\nQuantum Consciousness Soccer Simulator\nGliders2012: Development and Competition Results\nA Dataset for StarCraft AI \\& an Example of Armies Clustering\nA possibilistic handling of partially ordered information\nStructure-Based Causes and Explanations in the Independent Choice Logic\nProbabilistic 
Reasoning about Actions in Nonmonotonic Causal Theories\nA Linear Belief Function Approach to Portfolio Evaluation\nPolicy-contingent abstraction for robust robot control\nAn Axiomatic Approach to Robustness in Search Problems with Multiple  Scenarios\nA constraint satisfaction approach to the robust spanning tree problem  with interval data\nQualitative MDPs and POMDPs: An Order-Of-Magnitude Approximation\nGeneralized Instrumental Variables\nCauses and Explanations in the Structural-Model Approach: Tractable  Cases\nStatistical Decisions Using Likelihood Information Without Prior  Probabilities\nReduction of Maximum Entropy Models to Hidden Markov Models\nExpectation Propogation for approximate inference in dynamic Bayesian  networks\nCoordinates: Probabilistic Forecasting of Presence and Availability\nEfficient Nash Computation in Large Population Games with Bounded  Influence\nFormalizing Scenario Analysis\nFactored Particles for Scalable Monitoring\nInference with Seperately Specified Sets of Probabilities in Credal  Networks\nAsymptotic Model Selection for Naive Bayesian Networks\nLoopy Belief Propogation and Gibbs Measures\nParticle Filters in Robotics (Invited Talk)\nOn the Testable Implications of Causal Models with Hidden Variables\nMarkov Chain Monte Carlo using Tree-Based Priors on Model Structure\nA Calculus for Causal Relevance\nInstrumentality Tests Revisited\nConditions Under Which Conditional Independence and Scoring Methods Lead  to Identical Selection of Bayesian Network Models\nCauses and Explanations: A Structural-Model Approach --- Part 1: Causes\nPlausible reasoning from spatial observations\nProbabilistic Logic Programming under Inheritance with Overriding\nSolving Influence Diagrams using HUGIN, Shafer-Shenoy and Lazy  Propagation\nDirect and Indirect Effects\nVector-space Analysis of Belief-state Approximation for POMDPs\nA Mixed Graphical Model for Rhythmic Parsing\nA Tractable POMDP for a Class of Sequencing Problems\nCausal Discovery 
from Changes\nUsing Temporal Data for Making Recommendations\nPerfect Tree-Like Markovian Distributions\nA Principled Analysis of Merging Operations in Possibilistic Logic\nApproximately Optimal Monitoring of Plan Preconditions\nStochastic Logic Programs: Sampling, Inference and Applications\nA Qualitative Linear Utility Theory for Spohn's Theory of Epistemic  Beliefs\nCausal Mechanism-based Model Construction\nEvaluating Influence Diagrams using LIMIDs\nConversation as Action Under Uncertainty\nPivotal Pruning of Trade-offs in QPNs\nA Branch-and-Bound Algorithm for MDL Learning Bayesian Networks\nProbabilities of Causation: Bounds and Identification\nAn Application of Uncertain Reasoning to Requirements Engineering\nLoglinear models for first-order probabilistic reasoning\nLearning Polytrees\nQuantifier Elimination for Statistical Problems\nFaithful Approximations of Belief Functions\nEstimating the Value of Computation in Flexible Information Refinement\nRepresenting and Combining Partially Specified CPTs\nOn the Complexity of Policy Iteration\nA Variational Approximation for Bayesian Networks with Discrete and  Continuous Latent Variables\nLearning Bayesian Networks from Incomplete Data with Stochastic Search  Algorithms\nEnhancing QPNs for Trade-off Resolution\nA Possibilistic Model for Qualitative Sequential Decision Problems under  Uncertainty in Partially Observable Environments\nEfficient Value of Information Computation\nMultiplicative Factorization of Noisy-Max\nA Method for Speeding Up Value Iteration in Partially Observable Markov  Decision Processes\nOn the Acceptability of Arguments in Preference-Based Argumentation\nMerging Uncertain Knowledge Bases in a Possibilistic Logic Framework\nMarginalizing in Undirected Graph and Hypergraph Models\nIrrelevance and Independence Relations in Quasi-Bayesian Networks\nOn the Semi-Markov Equivalence of Causal Models\nComparative Uncertainty, Belief Functions and Accepted Beliefs\nQualitative Decision Theory with 
Sugeno Integrals\nLearning the Structure of Dynamic Probabilistic Networks\nSolving POMDPs by Searching in Policy Space\nMeasure Selection: Notions of Rationality and Representation  Independence\nExact Inference of Hidden Structure from Sample Data in Noisy-OR  Networks\nUsing Qualitative Relationships for Bounding Probability Distributions\nConstructing Situation Specific Belief Networks\nLogarithmic Time Parallel Bayesian Inference\nProbabilistic Inference in Influence Diagrams\nCorrelated Action Effects in Decision Theoretic Regression\nAlgorithms for Learning Decomposable Models and Chordal Graphs\nIncremental Pruning: A Simple, Fast, Exact Method for Partially  Observable Markov Decision Processes\nDefining Explanation in Probabilistic Systems\nEfficient Induction of Finite State Automata\nA Standard Approach for Optimizing Belief Network Inference using Query  DAGs\nAlgorithm Portfolio Design: Theory vs. Practice\nInference with Idempotent Valuations\nTime-Critical Reasoning: Representations and Application\nRelational Bayesian Networks\nProbabilistic Acceptance\nIncremental Map Generation by Low Cost Robots Based on  Possibility/Necessity Grids\nStructure and Parameter Learning for Causal Independence and Causal  Interaction Models\nLearning Bayesian Networks from Incomplete Databases\nIndependence of Causal Influence and Clique Tree Propagation\nFast Value Iteration for Goal-Directed Markov Decision Processes\nApproximations for Decision Making in the Dempster-Shafer Theory of  Evidence\nArguing for Decisions: A Qualitative Model of Decision Making\nSome Experiments with Real-Time Decision Algorithms\nBucket Elimination: A Unifying Framework for Several Probabilistic  Inference\nFlexible Policy Construction by Information Refinement\nEfficient Search-Based Inference for Noisy-OR Belief Networks:  TopEpsilon\nMIDAS - An Influence Diagram for Management of Mildew in Winter Wheat\nToward a Market Model for Bayesian Inference\nA Graph-Theoretic Analysis of 
Information Value\nOptimal Monte Carlo Estimation of Belief Network Inference\nCoherent Knowledge Processing at Maximum Entropy by SPIRIT\nA Measure of Decision Flexibility\nTesting Implication of Probabilistic Dependencies\nIn Love With a Robot: the Dawn of Machine-To-Machine Marketing\nCounterfactuals and Policy Analysis in Structural Models\nBelief Functions and Default Reasoning\nChain Graphs for Learning\nA Transformational Characterization of Equivalent Bayesian Network  Structures\nConditioning Methods for Exact and Approximate Inference in Causal  Networks\nImplementation of Continuous Bayesian Networks Using Sums of Weighted  Gaussians\nTesting Identifiability of Causal Effects\nEfficient Decision-Theoretic Planning: Techniques and Empirical Analysis\nLearning Bayesian Networks: A Unification for Discrete and Gaussian  Domains\nReasoning, Metareasoning, and Mathematical Truth: Studies of Theorem  Proving under Limited Resources\nImproved Sampling for Diagnostic Reasoning in Bayesian Networks\nCautious Propagation in Bayesian Networks\nHUGS: Combining Exact Inference and Gibbs Sampling in Junction Trees\nIs There a Role for Qualitative Risk Assessment?\nOn the Complexity of Solving Markov Decision Problems\nA Theoretical Framework for Context-Sensitive Temporal Probability Model  Construction with Application to Plan Projection\nRefining Reasoning in Qualitative Probabilistic Networks\nOn the Testability of Causal Models with Latent and Instrumental  Variables\nProbabilistic Evaluation of Sequential Plans from Causal Models with  Hidden Variables\nCausal Inference in the Presence of Latent Variables and Selection Bias\nAn Order of Magnitude Calculus\nA Method for Implementing a Probabilistic Model as a Relational Database\nGenerating Explanations for Evidential Reasoning\nInference with Causal Independence in the CPSC Network\nModus Ponens Generating Function in the Class of ^-valuations of  Plausibility\nApproximation Algorithms for the Loop Cutset 
Problem\nExploratory Model Building\nA Stratified Simulation Scheme for Inference in Bayesian Belief Networks\nSymbolic Probabilitistic Inference in Large BN2O Networks\nIntegrating Planning and Execution in Stochastic Domains\nPenalty logic and its Link with Dempster-Shafer Theory\nConditional Independence in Possibility Theory\nBackward Simulation in Bayesian Networks\nAbstracting Probabilistic Actions\nOn Modal Logics for Qualitative Possibility in a Fuzzy Setting\nA Logic for Default Reasoning About Probabilities\nOptimal Junction Trees\nConstructing Belief Networks to Evaluate Plans\nAnytime Decision Making with Imprecise Probabilities\nSolving Asymmetric Decision Problems with Influence Diagrams\nA Probabilistic Approach to Hierarchical Model-based Diagnosis\nSemigraphoids Are Two-Antecedental Approximations of Stochastic  Conditional Independence Models\nExceptional Subclasses in Qualitative Probability\nA Defect in Dempster-Shafer Theory\nState-space Abstraction for Anytime Evaluation of Probabilistic Networks\nGenerating Graphoids from Generalised Conditional Probability\nEvidential Reasoning with Conditional Belief Functions\nInter-causal Independence and Heterogeneous Factorization\nCausality in Bayesian Belief Networks\nFrom Conditional Oughts to Qualitative Decision Theory\nParameter Adjustment in Bayes Networks. 
The generalized noisy OR-gate\nCausal Independence for Knowledge Acquisition and Inference\nSensitivity Analysis for Probability Assessments in Bayesian Networks\nCausal Modeling\nReasoning about the Value of Decision-Model Refinement: Methods and  Application\nValuation Networks and Conditional Independence\nA Generalization of the Noisy-Or Model\nGraph-Grammar Assistance for Automated Generation of Influence Diagrams\nAn Algorithm for the Construction of Bayesian Network Structures from  Data\nA Synthesis of Logical and Probabilistic Reasoning for Program  Understanding and Debugging\nIncremental Probabilistic Inference\nIntercausal Reasoning with Uninstantiated Ancestor Nodes\nInference Algorithms for Similarity Networks\nUsing Tree-Decomposable Structures to Approximate Belief Networks\nUsing Potential Influence Diagrams for Probabilistic Inference and  Decision Making\nIncremental computation of the value of perfect information in  stepwise-decomposable influence diagrams\nArgument Calculus and Networks\nOn reasoning in networks with qualitative uncertainty\nProbabilistic Assumption-Based Reasoning\nPartially Specified Belief Functions\nBelief Revision in Probability Theory\nA Belief-Function Based Decision Support System\nRES - a Relative Method for Evidential Reasoning\nOptimizing Causal Orderings for Generating DAGs from Data\nModal Logics for Qualitative Possibility and Beliefs\nLattice-Based Graded Logic: a Multimodal Approach\nDynamic Network Models for Forecasting\nParallelizing Probabilistic Inference: Some Early Explorations\nAn Entropy-based Learning Algorithm of Bayesian Conditional Trees\nKnowledge Integration for Conditional Probability Assessments\nA computational scheme for Reasoning in Dynamic Probabilistic Networks\nThe Dynamic of Belief in the Transferable Belief Model and  Specialization-Generalization Matrices\nSome Problems for Convex Bayesians\nBayesian Meta-Reasoning: Determining Model Adequacy from Within a Small  World\nThe Bounded 
Bayesian\nRepresenting Context-Sensitive Knowledge in a Network Formalism: A  Preliminary Report\nA Probabilistic Network of Predicates\nEmpirical Probabilities in Monadic Deductive Databases\naHUGIN: A System Creating Adaptive Causal Probabilistic Networks\nMESA: Maximum Entropy by Simulated Annealing\nGuess-And-Verify Heuristics for Reducing Uncertainties in Expert  Classification Systems\nDecision Making Using Probabilistic Inference Methods\nConditional Independence in Uncertainty Theories\nA Fuzzy Logic Approach to Target Tracking\nTowards Precision of Probabilistic Bounds Propagation\nGeneralizing Jeffrey Conditionalization\nInterval Structure: A Framework for Representing Uncertain Information\n\"Conditional Inter-Causally Independent\" Node Distributions, a Property  of \"Noisy-Or\" Models\nCombination of Upper and Lower Probabilities\nA Bayesian Method for Constructing Bayesian Belief Networks from  Databases\nAdvances in Probabilistic Reasoning\nTime-Dependent Utility and Action Under Uncertainty\nNon-monotonic Reasoning and the Reversibility of Belief Change\nReasoning with Mass Distributions\nConflict and Surprise: Heuristics for Model Revision\nReasoning under Uncertainty: Some Monte Carlo Results\nA Modification to Evidential Probability\nNon-monotonic Negation in Probabilistic Deductive Databases\nIntegrating Probabilistic Rules into Neural Networks: A Stochastic EM  Learning Algorithm\nRepresenting Bayesian Networks within Probabilistic Horn Abduction\nPulcinella: A General Tool for Propagating Uncertainty in Valuation  Networks\nOn the Generation of Alternative Explanations with Implications for  Belief Revision\nAbout Updating\nCompressed Constraints in Probabilistic Logic and Their Revision\nDetecting Causal Relations in the Presence of Unmeasured Variables\nAn Efficient Implementation of Belief Function Propagation\nWhy Do We Need Foundations for Modelling Uncertainties?\nHow to minimize the energy consumption in mobile ad-hoc 
networks\nProbabilistic Conditional Preference Networks\nAdvances in Bayesian Network Learning using Integer Programming\nA Sound and Complete Algorithm for Learning Causal Models from  Relational Data\nIdentifying Finite Mixtures of Nonparametric Product Distributions and  Causal Inference of Confounders\nCase Adaptation with Qualitative Algebras\nPlanning based on classification by induction graph\nGiving the AI definition a form suitable for the engineer\nTransductive Rademacher Complexity and its Applications\nThe Role of Macros in Tractable Planning\nFast Set Bounds Propagation Using a BDD-SAT Hybrid\nOn the Intertranslatability of Argumentation Semantics\nDr.Fill: Crosswords and an Implemented Solver for Singly Weighted CSPs\nInteractions between Knowledge and Time in a First-Order Logic for  Multi-Agent Systems: Completeness Results\nNarrative Planning: Compilations to Classical Planning\nRational Counterfactuals\nModel revision inference for extensions of first order logic\nFlow for Meta Control\nA Logic for Reasoning about Evidence\nA Logic for Reasoning about Upper Probabilities\nA Heuristic Search Algorithm for Solving First-Order MDPs\nMarkov Chains on Orbits of Permutation Groups\nSome Reflections on the Set-based and the Conditional-based  Interpretations of Statements in Syllogistic Reasoning\nQsmodels: ASP Planning in Interactive Gaming Environment\nOn the Computability of AIXI\nWelfare of Sequential Allocation Mechanisms for Indivisible Goods\nBounded Optimal Exploration in MDP\nNormative Multiagent Systems: A Dynamic Generalization\nLatent Contextual Bandits and their Application to Personalized  Recommendations for New Users\nProcedural Generation of Angry Birds Levels using Building Constructive  Grammar with Chinese-Style and/or Japanese-Style Models\nPropositional Abduction with Implicit Hitting Sets\nTeaching natural language to computers\nLearning to Rank for Synthesizing Planning Heuristics\nSmart Policies for Artificial 
Intelligence\nSemantic Similarity in a Taxonomy: An Information-Based Measure and its  Application to Problems of Ambiguity in Natural Language\nVariational Cumulant Expansions for Intractable Distributions\nSolving Highly Constrained Search Problems with Quantum Computers\nProperties and Applications of Programs with Monotone and Convex  Constraints\nSolving Factored MDPs with Hybrid State and Action Variables\nLearning Symbolic Models of Stochastic Domains\nCombining Spatial and Temporal Logics: Expressiveness vs. Complexity\nThe Power of Modeling - a Response to PDDL2.1\nAuctions with Severely Bounded Communication\nMarvin: A Heuristic Search Planner with Online Macro-Action Learning\nDiscovering Classes of Strongly Equivalent Logic Programs\nPhase Transition for Random Quantified XOR-Formulas\nExploiting Functional Dependencies in Qualitative Probabilistic  Reasoning\nManaging Uncertainty in Rule Based Cognitive Models\nProblem Formulation as the Reduction of a Decision Model\nDynamic Construction of Belief Networks\nErgo: A Graphical Environment for Constructing Bayesian\nA Dynamic Approach to Probabilistic Inference\nRobust Inference Policies\nMinimum Error Tree Decomposition\nIDEAL: A Software Package for Analysis of Influence Diagrams\nOptimal Decomposition of Belief Networks\nOn Heuristics for Finding Loop Cutsets in Multiply-Connected Belief  Networks\nA Combination of Cutset Conditioning with Clique-Tree Propagation in the  Pathfinder System\nUsing Dempster-Shafer Theory in Knowledge Representation\nAmplitude-Based Approach to Evidence Accumulation\nA Probabilistic Reasoning Environment\nDecisions with Limited Observations over a Finite Product Space: the  Klir Effect\nRules, Belief Functions and Default Logic\nComputing Probability Intervals Under Independency Constraints\nAn Empirical Analysis of Likelihood-Weighting Simulation on a Large,  Multiply-Connected Belief Network\nTowards a Normative Theory of Scientific Evidence\nPlan Recognition in Stories 
and in Life\nDecision Making \"Biases\" and Support for Assumption-Based Higher-Order  Reasoning\nDeciding Consistency of Databases Containing Defeasible and Strict  Information\nHeuristic Search as Evidential Reasoning\nInference Policies\nStrategies for Generating Micro Explanations for Bayesian Belief  Networks\nEvidence Absorption and Propagation through Evidence Reversals\nFreedom: A Measure of Second-order Uncertainty for Intervalic  Probability Schemes\nEfficient Parallel Estimation for Markov Random Fields\nCan Uncertainty Management be Realized in a Finite Totally Ordered  Probability Algebra?\nSummary of A New Normative Theory of Probabilistic Logic\nProcess, Structure, and Modularity in Reasoning with Uncertainty\nA Temporal Logic for Uncertain Events and An Outline of A Possible  Implementation in An Extension of PROLOG\nProbability as a Modal Operator\nOn the Logic of Causal Models\nAn Empirical Comparison of Three Inference Methods\nParallel Belief Revision\nRational Nonmonotonic Reasoning\nEpistemological Relevance and Statistical Knowledge\nJustifying the Principle of Interval Constraints\nAn Axiomatic Framework for Bayesian and Belief-function Propagation\nUpdating Probabilities in Multiply-Connected Belief Networks\nCausal Networks: Semantics and Expressiveness\nMCE Reasoning in Recursive Causal Networks\nNonmonotonic Reasoning via Possibility Theory\nIs Shafer General Bayes?\nModifiable Combining Functions\nDempster-Shafer vs. 
Probabilistic Logic\nBelief in Belief Functions: An Examination of Shafer's Canonical  Examples\nCan Evidence Be Combined in the Dempster-Shafer Theory\nTemporal Reasoning About Uncertain Worlds\nA Perspective on Confidence and Its Use in Focusing Attention During  Knowledge Acquisition\nPractical Issues in Constructing a Bayes' Belief Network\nObjective Probability\nDecision Tree Induction Systems: A Bayesian Analysis\nThe Automatic Training of Rule Bases that Use Numerical Uncertainty  Representations\nThe Inductive Logic of Information Systems\nThe Recovery of Causal Poly-Trees from Statistical Data\nA Heuristic Bayesian Approach to Knowledge Acquisition: Application to  Analysis of Tissue-Type Plasminogen Activator\nA Study of Associative Evidential Reasoning\nConvergent Deduction for Probabilistic Logic\nA Knowledge Engineer's Comparison of Three Evidence Aggregation Methods\nProblem Structure and Evidential Reasoning\nThe Role of Tuning Uncertain Inference Systems\nIntegrating Logical and Probabilistic Reasoning for Decision Making\nAn Algorithm for Computing Probabilistic Propositions\nTaxonomy, Structure, and Implementation of Evidential Reasoning\nTowards The Inductive Acquisition of Temporal Knowledge\nReasoning With Uncertain Knowledge\nDeriving And Combining Continuous Possibility Functions in the Framework  of Evidential Reasoning\nNon-Monotonicity in Probabilistic Reasoning\nFlexible Interpretations: A Computational Model for Dynamic Uncertainty  Assessment\nEvidence as Opinions of Experts\nBayesian Inference for Radar Imagery Based Surveillance\nEvidential Reasoning in Parallel Hierarchical Vision Programs\nComputing Reference Classes\nAn Uncertainty Management Calculus for Ordering Searches in Distributed  Dynamic Databases\nEstimating Uncertain Spatial Relationships in Robotics\nEvaluation of Uncertain Inference Models I: PROSPECTOR\nA Framework for Non-Monotonic Reasoning About Probabilistic Assumptions\nInduction, of and by Probability\nCombining 
Uncertain Estimates\nIncidence Calculus: A Mechanism for Probabilistic Reasoning\nExact Reasoning Under Uncertainty\nStrong & Weak Methods: A Logical View of Uncertainty\nStatistical Mechanics Algorithm for Response to Targets (SMART)\nKnowledge Structures and Evidential Reasoning in Decision Analysis\nA Social Welfare Optimal Sequential Allocation Procedure\nStatistical Constraints\nWhat Is It Like to Be a Brain Simulation?\nEvolutionary solving of the debts' clearing problem\nAn eigenvector-based hotspot detection\nProbabilistic Selection in AgentSpeak(L)\nQuantum computing for pattern classification\nUsing the Mean Absolute Percentage Error for Regression Models\nSearch Strategies for Binary Feature Selection for a Naive Bayes  Classifier\nLearning from Pairwise Marginal Independencies\nTuring's Imitation Game has been Improved\nAn Empirical Comparison of Neural Architectures for Reinforcement  Learning in Partially Observable Environments\nComposing inference algorithms as program transformations\nReview of state-of-the-arts in artificial intelligence with application  to AI safety problem\nAutomatic Extraction of Causal Relations from Natural Language Texts: A  Comprehensive Survey\nProceedings Fifteenth Conference on Theoretical Aspects of Rationality  and Knowledge\nRobust Natural Language Processing - Combining Reasoning, Cognitive  Semantics and Construction Grammar for Spatial Language\nLatent Dependency Forest Models\nAn Extended Neo-Fuzzy Neuron and its Adaptive Learning Algorithm\nOverview: Generalizations of Multi-Agent Path Finding to Real-World  Scenarios\nCriticality & Deep Learning I: Generally Weighted Nets\nMinimax density estimation for growing dimension\nBetaRun Soccer Simulation League Team: Variety, Complexity, and Learning\nSource-Sensitive Belief Change\nMOBA: a New Arena for Game AI\nLow Impact Artificial Intelligences\nA Tutor Agent for MOBA Games\nBandit Models of Human Behavior: Reward Processing in Mental Disorders\nA New 
Probabilistic Algorithm for Approximate Model Counting\nAI-Powered Social Bots\nArmstrong's Axioms and Navigation Strategies\nStrategic Coalitions with Perfect Recall\nProceedings Sixteenth Conference on Theoretical Aspects of Rationality  and Knowledge\nDeclarative Sequential Pattern Mining of Care Pathways\nExact Inference for Relational Graphical Models with Interpreted  Functions: Lifted Probabilistic Inference Modulo Theories\nCommonsense Scene Semantics for Cognitive Robotics: Towards Grounding  Embodied Visuo-Locomotive Interactions\nAn enhanced method to compute the similarity between concepts of  ontology\nThe Promise and Peril of Human Evaluation for Model Interpretability\nThe mind as a computational system\nA Slow Read attack Using Cloud\nSimulated Autonomous Driving on Realistic Road Networks using Deep  Reinforcement Learning\nDeep Learning: A Critical Appraisal\nTrading the Twitter Sentiment with Reinforcement Learning\nQuantified Degrees of Group Responsibility (Extended Abstract)\nEtymo: A New Discovery Engine for AI Research\nMorphologic for knowledge dynamics: revision, fusion, abduction\nBernoulli Embeddings for Graphs\nGenerative Design in Minecraft (GDMC), Settlement Generation Competition\nApplication of Grey Numbers to Assessment Processes\nVisual Analytics for Explainable Deep Learning\nOrder to Disorder Transitions in Hybrid Intelligent Systems: a Hatch to  the Interactions of Nations -Governments\nHybrid technique for effective knowledge representation & a comparative  study\nA short note on estimating intelligence from user profiles in the  context of universal psychometrics: prospects and caveats\nAdvice from the Oracle: Really Intelligent Information Retrieval\nIntelligent Identification of Two-Dimensional Structure by  Machine-Learning Optical Microscopy\nTextbook examples of recursion\nTo Preference via Entrenchment\nThe Logic Programming Paradigm and Prolog\nA theory of experiment\nValue Based Argumentation Frameworks\nKnowledge 
Representation\nToward the Implementation of Functions in the DLV System (Preliminary  Technical Report)\nQuantum Computers\nSelf-organizing neural networks in classification and image recognition\nA Note on the PAC Bayesian Theorem\nInferring knowledge from a large semantic network\nSelf-Organizing Multilayered Neural Networks of Optimal Complexity\nRedundancy in Logic III: Non-Mononotonic Reasoning\nYet Another Efficient Unification Algorithm\nIslands for SAT\nSolving planning domains with polytree causal graphs is NP-complete\nQuantum Artificial Intelligence\nCalculating Valid Domains for BDD-Based Interactive Configuration\nHow to realize \"a sense of humour\" in computers ?\nGeometric Data Analysis, From Correspondence Analysis to Structured Data  Analysis (book review)\nTemporized Equilibria\nIdentification of parameters underlying emotions and a classification of  emotions\nA remark on higher order RUE-resolution with EXTRUE\nFuzzy Mnesors\nThe Soft Cumulative Constraint\nModelling Concurrent Behaviors in the Process Specification Language\nBeyond Turing Machines\nABC-LogitBoost for Multi-class Classification\nAlgorithms for finding dispensable variables\nDominion -- A constraint solver generator\nA Formalization of the Turing Test\nPlanning to Be Surprised: Optimal Bayesian Exploration in Dynamic  Environments\nPredicting growth fluctuation in network economy\nInstantiation Schemes for Nested Theories\n'Just Enough' Ontology Engineering\nKernel diff-hash\nPrinciples of Solomonoff Induction and AIXI\nEfficient Methods for Unsupervised Learning of Probabilistic Models\nAn example illustrating the imprecision of the efficient approach for  diagnosis of Petri nets via integer linear programming\nTowards common-sense reasoning via conditional simulation: legacies of  Turing in Artificial Intelligence\nModification of conceptual clustering algorithm Cobweb for numerical  data using fuzzy membership function\nThe Doxastic Interpretation of Team 
Semantics\nIntroduction to Judea Pearl's Do-Calculus\nLambda Dependency-Based Compositional Semantics\nA short note on the axiomatic requirements of uncertainty measure\nLogic in the Lab\nFree-configuration Biased Sampling for Motion Planning: Errata\nCortex simulation system proposal using distributed computer network  environments\nDynamic Sweep Filtering Algorithm for FlexC\nAI Evaluation: past, present and future\nNeurocontrol methods review\nExpressibility of norms in temporal logic\nXapagy: a cognitive architecture for narrative reasoning\nFuzzy Inference Systems Optimization\nDeontic modality based on preference\nQualitative shape representation based on the qualitative relative  direction and distance calculus eOPRAm\nAbout Tau-Chain\nNorm-Based Capacity Control in Neural Networks\nA Note on Information-Directed Sampling and Thompson Sampling\nWhy Bother With Syntax?\nAn Application of the Generalized Rectangular Fuzzy Model to Critical  Thinking Assessment\nConcept Generation in Language Evolution\nNegative Learning Rates and P-Learning\nObstacle evasion using fuzzy logic in a sliding blades problem  environment\nA note on adjusting $R^2$ for using with cross-validation\nHow to avoid ethically relevant Machine Consciousness\nHow to advance general game playing artificial intelligence by player  modelling\nSimplified Boardgames\nLattice Structure of Variable Precision Rough Sets\nDivisive-agglomerative algorithm and complexity of automatic  classification problems\nOn Seeking Consensus Between Document Similarity Measures\nResolving the Complexity of Some Fundamental Problems in Computational  Social Choice\nBrief Notes on Hard Takeoff, Value Alignment, and Coherent Extrapolated  Volition\nA History of Metaheuristics\nEmbodied Artificial Intelligence through Distributed Adaptive Control:  An Integrated Framework\nThe Causality/Repair Connection in Databases: Causality-Programs\nPolicy Gradient Methods for Reinforcement Learning with Function  Approximation 
and Action-Dependent Baselines\nThe Complex Negotiation Dialogue Game\nScientists in silico?\nMagNet and \"Efficient Defenses Against Adversarial Attacks\" are Not  Robust to Adversarial Examples\nNetwork Analysis for Explanation\nParanom: A Parallel Anomaly Dataset Generator\nComputing as compression: the SP theory of intelligence\nTowards the Augmented Pathologist: Challenges of Explainable-AI in  Digital Pathology\nReputation in M2M Economy\nMonotonicity and Persistence in Preferential Logics\nOn the accuracy and running time of GSAT\nTowards a Universal Theory of Artificial Intelligence based on  Algorithmic Probability and Sequential Decision Theory\nAnnotated revision programs\nRobust Feature Selection by Mutual Information Distributions\nThe New AI: General & Sound & Relevant for Physics\nAnusaaraka: Machine Translation in Stages\nNeuro Fuzzy Systems: Sate-of-the-Art Modeling Techniques\nThe Integration of Connectionism and First-Order Knowledge  Representation and Reasoning as a Challenge for Artificial Intelligence\nDefault reasoning over domains and concept hierarchies\nArtificial Neural Networks and Support Vector Machines for Water Demand  Time Series Forecasting\n2006: Celebrating 75 years of AI - History and Outlook: the Next 25  Years\nLocal search heuristics: Fitness Cloud versus Fitness Landscape\nA Reactive Tabu Search Algorithm for Stimuli Generation in  Psycholinguistics\nReasoning about Cardinal Directions between Extended Objects\nLife, the Universe, and almost Everything: Signs of Cosmic Design?\nA Monte Carlo Algorithm for Universally Optimal Bayesian Sequence  Prediction and Planning\nFeature Importance in Bayesian Assessment of Newborn Brain Maturity from  EEG\nSymmetry within and between solutions\nHybrid tractability of soft constraint problems\nQuantum Interaction Approach in Cognition, Artificial Intelligence and  Robotics\nThe Complexity of Reasoning about Spatial Congruence\nDecentralized Supply Chain Formation: A Market Protocol and 
Competitive  Equilibrium Analysis\nA Maximal Tractable Class of Soft Constraints\nEvaluation of a Simple, Scalable, Parallel Best-First Search Strategy\nClassification of artificial intelligence ids for smurf attack\nMaking life better one large system at a time: Challenges for UAI  research\nA unified setting for inference and decision: An argumentation-based  approach\nExistence and Finiteness Conditions for Risk-Sensitive Planning: Results  and Conjectures\nBayes' Bluff: Opponent Modelling in Poker\nFHHOP: A Factored Hybrid Heuristic Online Planning Algorithm for Large  POMDPs\nUnderstanding (dis)similarity measures\nEfficient Approximation for Triangulation of Minimum Treewidth\nChoosing Among Interpretations of Probability\nExpected Utility Networks\nInference Using Message Propagation and Topology Transformation in  Vector Gaussian Continuous Networks\nProbabilistic Exploration in Planning while Learning\nSome Properties of Joint Probability Distributions\nThree Approaches to Probability Model Selection\nExpressing Relational and Temporal Knowledge in Visual Probabilistic  Networks\nA Sensitivity Analysis of Pathfinder: A Follow-up Study\nProjective simulation for classical learning agents: a comprehensive  investigation\nSecond Order Swarm Intelligence\nA brief network analysis of Artificial Intelligence publication\nThe Complexity of Integer Bound Propagation\nCombining Evaluation Metrics via the Unanimous Improvement Ratio and its  Application to Clustering Tasks\nReal Time Strategy Language\nE-Generalization Using Grammars\nRobust Feature Selection by Mutual Information Distributions\nFinetuning Randomized Heuristic Search For 2D Path Planning: Finding The  Best Input Parameters For R* Algorithm Through Series Of Experiments\nStrategic Dialogue Management via Deep Reinforcement Learning\nLimits to Verification and Validation of Agentic Behavior\nPilot Testing an Artificial Intelligence Algorithm That Selects Homeless  Youth Peer Leaders Who Promote HIV 
Testing\nA Randomized Approximation Algorithm of Logic Sampling\nIntegrating Case-Based and Rule-Based Reasoning: the Possibilistic  Connection\nThe Effects of Perfect and Sample Information on Fuzzy Utilities in  Decision-Making\nBayesian Prediction for Artificial Intelligence\nKnowledge Engineering Within A Generalized Bayesian Framework\nThe Rational and Computational Scope of Probabilistic Rule-Based Expert  Systems\nMachine Learning, Clustering, and Polymorphy\nProbabilistic Conflict Resolution in Hierarchical Hypothesis Spaces\nPattern recognition issues on anisotropic smoothed particle  hydrodynamics\nA hybrid swarm-based algorithm for single-objective optimization  problems involving high-cost analyses\nFriendly Artificial Intelligence: the Physics Challenge\nA Quantum Production Model\nAscribing Consciousness to Artificial Intelligence\nArtificial general intelligence through recursive data compression and  grounded reasoning: a position paper\nAutomated Assignment of Backbone NMR Data using Artificial Intelligence\nA Minimal Architecture for General Cognition\nA Topological Approach to Meta-heuristics: Analytical Results on the BFS  vs. DFS Algorithm Selection Problem\nPhilosophical Fictionalism and Problem of Artificial Intelligence\nCategory theoretic foundation of single-photon-based decision making\nAutomatic Bridge Bidding Using Deep Reinforcement Learning\nLong-Term Trends in the Public Perception of Artificial Intelligence\nArtificial Intelligence Safety and Cybersecurity: a Timeline of AI  Failures\nSelf-Correcting Models for Model-Based Reinforcement Learning\nBasic protocols in quantum reinforcement learning with superconducting  circuits\nRobust Multilingual Named Entity Recognition with Shallow  Semi-Supervised Features\nWho Will Win Practical Artificial Intelligence? 
AI Engineerings in China\nA System for Accessible Artificial Intelligence\nEthical Artificial Intelligence - An Open Question\nOPEB: Open Physical Environment Benchmark for Artificial Intelligence\nLearning Photography Aesthetics with Deep CNNs\nDiscriminant chronicles mining: Application to care pathways analytics\nAutonomous Agents Modelling Other Agents: A Comprehensive Survey and  Open Problems\nWhat Automated Planning can do for Business Process Management\nArtificial Intelligence as Structural Estimation: Economic  Interpretations of Deep Blue, Bonanza, and AlphaGo\nArtificial Intelligence and Statistics\nIn folly ripe. In reason rotten. Putting machine theology to rest\nArtificial Intelligence (AI) Methods in Optical Networks: A  Comprehensive Survey\nCan Computers Create Art?\nA Human-Grounded Evaluation Benchmark for Local Explanations of Machine  Learning\nSemantic Vector Spaces for Broadening Consideration of Consequences\nCategorizing Variants of Goodhart's Law\nNeural Network Quine\nArtificial Intelligence Enabled Software Defined Networking: A  Comprehensive Overview\nApplications of Artificial Intelligence to Network Security\nHow an Electrical Engineer Became an Artificial Intelligence Researcher,  a Multiphase Active Contours Analysis\nReview of Deep Learning\nAn electronic-game framework for evaluating coevolutionary algorithms\nIntelligent Anticipated Exploration of Web Sites\nAlleviating Media Bias Through Intelligent Agent Blogging\nState of the Art Review for Applying Computational Intelligence and  Machine Learning Techniques to Portfolio Optimisation\nSemantic Oriented Agent based Approach towards Engineering Data  Management, Web Information Retrieval and User System Communication Problems\nBuilding Smart Communities with Cyber-Physical Systems\nIntelligent Search Heuristics for Cost Based Scheduling\nResearch on the mobile robots intelligent path planning based on ant  colony algorithm application in manufacturing logistics\nA Roadmap 
towards Machine Intelligence\nMaking Math Searchable in Wikipedia\nGap Analysis of Natural Language Processing Systems with respect to  Linguistic Modality\nIs swarm intelligence able to create mazes?\nEmotional Metaheuristics For in-situ Foraging Using Sensor Constrained  Robot Swarms\nOn the Development of Intelligent Agents for MOBA Games\nUn résultat intrigant en commande sans modèle\nBetween collective intelligence and semantic web : hypermediating sites.  Contribution to technologies of intelligence\nOrder Effects for Queries in Intelligent Systems\nDecision Support Systems Using Intelligent Paradigms\nA Knowledge Discovery Framework for Learning Task Models from User  Interactions in Intelligent Tutoring Systems\nFaith in the Algorithm, Part 2: Computational Eudaemonics\nContribution of Case Based Reasoning (CBR) in the Exploitation of Return  of Experience. Application to Accident Scenarii in Railroad Transport\nExpert PC Troubleshooter With Fuzzy-Logic And Self-Learning Support\nApplications of Algorithmic Probability to the Philosophy of Mind\nOnly T3-AI can reach human-level intelligence: A variety argument\nEmotional control - conditio sine qua non for advanced artificial  intelligences?\nThe Anatomy of a Modular System for Media Content Analysis\nAdvances in Artificial Intelligence: Are you sure, we are on the right  track?\nThe Singularity May Never Be Near\nA Tutorial on Deep Neural Networks for Intelligent Systems\nVirtual Embodiment: A Scalable Long-Term Strategy for Artificial  Intelligence Research\nCan Turing machine be curious about its Turing test results? 
Three  informal lectures on physics of intelligence\nThe Danger Theory and Its Application to Artificial Immune Systems\nAn Artificial Immune System as a Recommender System for Web Sites\nNext Challenges in Bringing Artificial Immune Systems to Production in  Network Security\nAn Immune Inspired Approach to Anomaly Detection\nArtificial Hormone Reaction Networks: Towards Higher Evolvability in  Evolutionary Multi-Modular Robotics\nEstimating Continuous Distributions in Bayesian Classifiers\nEmpirical Study of Artificial Fish Swarm Algorithm\nThe Information-theoretic and Algorithmic Approach to Human, Animal and  Artificial Cognition\nNiépce-Bell or Turing: How to Test Odor Reproduction?\nLearning Solving Procedure for Artificial Neural Network\nParrondo Strategies for Artificial Traders\nLearning and discrimination through STDP in a top-down modulated  associative memory\nMovie Recommendation Systems Using An Artificial Immune System\nArticulation and Clarification of the Dendritic Cell Algorithm\nDendritic Cells for Real-Time Anomaly Detection\nApplication of PSO, Artificial Bee Colony and Bacterial Foraging  Optimization algorithms to economic load dispatch: An analysis\nMotion Planning Of an Autonomous Mobile Robot Using Artificial Neural  Network\nObesity Heuristic, New Way On Artificial Immune Systems\nArtificial Neuron Modelling Based on Wave Shape\nToward Idealized Decision Theory\nTraining artificial neural networks to learn a nondeterministic game\nStoic Ethics for Artificial Agents\nSelf-Organization and Artificial Life: A Review\nAutonomous development and learning in artificial intelligence and  robotics: Scaling up deep learning to human--like learning\nMachine learning \\& artificial intelligence in the quantum domain\nArtificial Ant Colonies in Digital Image Habitats - A Mass Behaviour  Effect Study on Pattern Recognition\nConscious Intelligent Systems - Part 1 : I X I\nAutomatic Vehicle Checking Agent (VCA)\nFrom Cognitive Binary Logic to Cognitive 
Intelligent Agents\nPerception Lie Paradox: Mathematically Proved Uncertainty about Humans  Perception Similarity\nInformation Retrieval in Intelligent Systems: Current Scenario & Issues\nApplicability of Crisp and Fuzzy Logic in Intelligent Response  Generation\nReview of intelligent tutoring systems using bayesian approach\nSocial and Business Intelligence Analysis Using PSO\nProbabilistic Graphical Models on Multi-Core CPUs using Java 8\nMeasuring Machine Intelligence Through Visual Question Answering\nSimplified firefly algorithm for 2D image key-points search\nInteractive Restless Multi-armed Bandit Game and Swarm Intelligence  Effect\nUltimate Intelligence Part II: Physical Measure and Complexity of  Intelligence\nNetworked Intelligence: Towards Autonomous Cyber Physical Systems\nYou want to survive the data deluge: Be careful, Computational  Intelligence will not serve you as a rescue boat\nCommonsense reasoning, commonsense knowledge, and the SP theory of  intelligence\nProvably Bounded-Optimal Agents\nPOMDPs Make Better Hackers: Accounting for Uncertainty in Penetration  Testing\nArtificial Intelligence Based Cognitive Routing for Cognitive Radio  Networks\nActive learning machine learns to create new quantum experiments\nAn Empirical Study of AI Population Dynamics with Million-agent  Reinforcement Learning\nIntelligent Fault Analysis in Electrical Power Grids\nCognitive Database: A Step towards Endowing Relational Databases with  Artificial Intelligence Capabilities\nArtificial Immune Systems\nQuantum Structure in Cognition: Fundamentals and Applications\nElementos de ingeniería de explotación de la información aplicados  a la investigación tributaria fiscal\nAdaptive Parallel Iterative Deepening Search\nBENCHIP: Benchmarking Intelligence Processors\nLearning, Social Intelligence and the Turing Test - why an  \"out-of-the-box\" Turing Machine will not pass the Turing Test\nThe Hyper-Cortex of Human Collective-Intelligence Systems\nA Distributed AI Aided 
3D Domino Game\nA study on non-destructive method for detecting Toxin in pepper using  Neural networks\nThe SP theory of intelligence: benefits and applications\nWhat is Learning? A primary discussion about information and  Representation\nAIXIjs: A Software Demo for General Reinforcement Learning\nMeta-Learning Evolutionary Artificial Neural Networks\nBiological Inspiration for Artificial Immune Systems\nExperimenting with Innate Immunity\nRecognition of cDNA microarray image Using Feedforward artificial neural  network\nImproving Naive Bayes for Regression with Optimised Artificial Surrogate  Data\nHow the symbol grounding of living organisms can be realized in  artificial agents\nEvolutionary Training of Sparse Artificial Neural Networks: A Network  Science Perspective\nBrainstorm/J: a Java Framework for Intelligent Agents\nInformation Integration and Computational Logic\nFrom Alife Agents to a Kingdom of N Queens\nModeling Belief in Dynamic Systems, Part II: Revisions and Update\nModeling Chaotic Behavior of Stock Indices Using Intelligent Paradigms\nAdaptation of Mamdani Fuzzy Inference System Using Neuro - Genetic  Approach for Tactical Air Combat Decision Support System\nNew Millennium AI and the Convergence of History\nOn Granular Knowledge Structures\nNew parallel programming language design: a bridge between brain models  and multi-core/many-core computers?\nFeature Markov Decision Processes\nFeature Reinforcement Learning: Part I: Unstructured MDPs\nAutomated Reasoning and Presentation Support for Formalizing Mathematics  in Mizar\nDetecting Danger: The Dendritic Cell Algorithm\nA Bayesian Methodology for Estimating Uncertainty of Decisions in  Safety-Critical Systems\nAccelerating Reinforcement Learning through Implicit Imitation\nIntelligent Self-Repairable Web Wrappers\nInformledge System: A Modified Knowledge Network with Autonomous Nodes  using Multi-lateral Links\nA Combinatorial Optimisation Approach to Designing Dual-Parented  Long-Reach Passive 
Optical Networks\nReasoning with Very Expressive Fuzzy Description Logics\nMIVAR: Transition from Productions to Bipartite Graphs MIVAR Nets and  Practical Realization of Automated Constructor of Algorithms Handling More  than Three Million Production Rules\nModelling Social Structures and Hierarchies in Language Evolution\nToward Experiential Utility Elicitation for Interface Customization\nOf Starships and Klingons: Bayesian Logic for the 23rd Century\nMOB-ESP and other Improvements in Probability Estimation\nApplication of Fuzzy Mathematics to Speech-to-Text Conversion by  Elimination of Paralinguistic Content\nToward Large-Scale Agent Guidance in an Urban Taxi Service\nHypothesis Management in Situation-Specific Network Construction\nBuilding a Stochastic Dynamic Model of Application Use\nProbabilistic Models for Agents' Beliefs and Decisions\nModel Criticism of Bayesian Networks with Latent Variables\nA New Model of Plan Recognition\nThe Lumiere Project: Bayesian User Modeling for Inferring the Goals and  Needs of Software Users\nFlexible Decomposition Algorithms for Weakly Coupled Markov Decision  Problems\nWNtags: A Web-Based Tool For Image Labeling And Retrieval With Lexical  Ontologies\nPlan Development using Local Probabilistic Models\nA Decision-Based View of Causality\nA Construction of Bayesian Networks from Databases Based on an MDL  Principle\nR&D Analyst: An Interactive Approach to Normative Decision System Model  Construction\nSemi-bounded Rationality: A model for decision making\nFlexibly-bounded Rationality and Marginalization of Irrationality  Theories for Decision Making\nConceptive Artificial Intelligence: Insights from design theory\nAdvances in Artificial Intelligence: Deep Intentions, Shallow  Achievements\nMulti-Context Systems for Reactive Reasoning in Dynamic Environments\nThe AGI Containment Problem\nTime, Chance, and Action\nBaRT: A Bayesian Reasoning Tool for Knowledge Based Systems\nPlanning, Scheduling, and Uncertainty in the 
Sequence of Future Events\nInformation and Multi-Sensor Coordination\nMachine Generalization and Human Categorization: An  Information-Theoretic View\nInteractive POMDP Lite: Towards Practical Planning to Predict and  Exploit Intentions for Interacting with Self-Interested Agents\nUniversal Empathy and Ethical Bias for Artificial General Intelligence\nUniversal Psychometrics Tasks: difficulty, composition and decomposition\nProjective simulation with generalization\nEmotion Analysis of Songs Based on Lyrical and Audio Features\nMeasuring an Artificial Intelligence System's Performance on a Verbal IQ  Test For Young Children\nIntrospective Agents: Confidence Measures for General Value Functions\nResource Planning For Rescue Operations\nHandwriting Profiling using Generative Adversarial Networks\nInteraction Networks for Learning about Objects, Relations and Physics\nMessage Passing Multi-Agent GANs\nMachine Reading with Background Knowledge\nArtificial Intelligence Probes for Interstellar Exploration and  Colonization\nArtificial Intelligence as an Enabler for Cognitive Self-Organizing  Future Networks\nLearning A Physical Long-term Predictor\nLearning Macromanagement in StarCraft from Replays using Deep Learning\nToward the Starting Line: A Systems Engineering Approach to Strong AI\nGeneral AI Challenge - Round One: Gradual Learning\nArtificial Intelligence and Data Science in the Automotive Industry\nAbstractions for AI-Based User Interfaces and Systems\nDeep Reinforcement Learning for Conversational AI\nGood and safe uses of AI Oracles\nCooperative Multi-Agent Planning: A Survey\nMAgent: A Many-Agent Reinforcement Learning Platform for Artificial  Collective Intelligence\nRecent Advances in Neural Program Synthesis\nValue Alignment, Fair Play, and the Rights of Service Robots\nThe 2017 AIBIRDS Competition\nEvidence Feed Forward Hidden Markov Model: A New Type of Hidden Markov  Model\nDesign and implementation of computational platform for social-humanoid  robot 
Lumen as an exhibition guide in Electrical Engineering Days 2015\nA Market-Oriented Programming Environment and its Application to  Distributed Multicommodity Flow Problems\nLearning the Past Tense of English Verbs: The Symbolic Pattern  Associator vs. Connectionist Models\nExploring the Decision Forest: An Empirical Investigation of Occam's  Razor in Decision Tree Induction\nA System for Induction of Oblique Decision Trees\nIntegrative Windowing\nOptimization of Evolutionary Neural Networks Using Hybrid Learning  Algorithms\nOn the Implicit and on the Artificial - Morphogenesis and Emergent  Aesthetics in Autonomous Collective Systems\nArtificial Immune Systems (AIS) - A New Paradigm for Heuristic Decision  Making\nExtended Mixture of MLP Experts by Hybrid of Conjugate Gradient Method  and Modified Cuckoo Search\nFeature Selection for Generator Excitation Neurocontroller Development  Using Filter Technique\nA Sampling-Based Approach to Computing Equilibria in Succinct  Extensive-Form Games\nImproved Local Search in Artificial Bee Colony using Golden Section  Search\nTowards the Evolution of Novel Vertical-Axis Wind Turbines\nEstimating Well-Performing Bayesian Networks using Bernoulli Mixtures\nOptimization of Inter-Subnet Belief Updating in Multiply Sectioned  Bayesian Networks\nStructured Message Passing\nTreedy: A Heuristic for Counting and Sampling Subsets\nABC-SG: A New Artificial Bee Colony Algorithm-Based Distance of  Sequential Data Using Sigma Grams\nA Novel Hybrid Crossover based Artificial Bee Colony Algorithm for  Optimization Problem\nMemory shapes time perception and intertemporal choices\nBridging LSTM Architecture and the Neural Dynamics during Reading\nThe Gn,m Phase Transition is Not Hard for the Hamiltonian Cycle Problem\nAntNet: Distributed Stigmergetic Control for Communications Networks\nUsing Artificial Bee Colony Algorithm for MLP Training on Earthquake  Time Series Data Prediction\nEvolutionary Search in the Space of Rules for Creation of 
New Two-Player  Board Games\nDeepMind Lab\nIntelligent information extraction based on artificial neural network\nExperience Replay Using Transition Sequences\nData Fusion on Motion and Magnetic Sensors embedded on Mobile Devices  for the Identification of Activities of Daily Living\nA Genetic Programming Framework for 2D Platform AI\nA dataset and architecture for visual reasoning with a working memory\nOntology Based Information Extraction for Disease Intelligence\nTowards a New Science of a Clinical Data Intelligence\nNext Generation Business Intelligence and Analytics: A Survey\nFML-based Dynamic Assessment Agent for Human-Machine Cooperative System  on Game of Go\nIntelligent Traffic Light Control Using Distributed Multi-agent Q  Learning\nInitial Reference Architecture of an Intelligent Autonomous Agent for  Cyber Defense\nOn the semantics of merging\nProblem solving in ID-logic with aggregates: some experiments\nSemantic Parsing based on Verbal Subcategorization\nOn Nonspecific Evidence\nBeslutstödssystemet Dezzy - en översikt\nA Flexible Rule Compiler for Speech Synthesis\nCooperative Game Theory within Multi-Agent Systems for Systems  Scheduling\nBelief Calculus\nIs there an Elegant Universal Theory of Prediction?\nA Foundation to Perception Computing, Logic and Automata\nNon-Computability of Consciousness\nTowards Physarum robots: computing and manipulating on water surface\nSwarm-Based Spatial Sorting\nProposition of the Interactive Pareto Iterated Local Search Procedure -  Elements and Initial Experiments\nECOLANG - Communications Language for Ecological Simulations Network\nI, Quantum Robot: Quantum Mind control on a Quantum Computer\nConsiderations on Construction Ontologies\nBack analysis based on SOM-RST system\nThe Application of Mamdani Fuzzy Model for Auto Zoom Function of a  Digital Camera\nOn Building a Knowledge Base for Stability Theory\nBrain-Like Stochastic Search: A Research Challenge and Funding  Opportunity\nModélisation d'une analyse 
pragma-linguistique d'un forum de  discussion\nMiBoard: Multiplayer Interactive Board Game\nQuerying Biomedical Ontologies in Natural Language using Answer Set\nAutomatic Estimation of the Exposure to Lateral Collision in Signalized  Intersections using Video Sensors\nLeo Breiman\nExtraction of handwritten areas from colored image of bank checks by an  hybrid method\nLearning Hierarchical Sparse Representations using Iterative Dictionary  Learning and Dimension Reduction\nThe Harmonic Theory; A mathematical framework to build intelligent  contextual and adaptive computing, cognition and sensory system\nDetecting lateral genetic material transfer\nWhen majority voting fails: Comparing quality assurance methods for  noisy human computation environment\nA Mixed Observability Markov Decision Process Model for Musical Pitch\nClustering of Local Optima in Combinatorial Fitness Landscapes\nChanging the Environment based on Intrinsic Motivation\nDealing with the Fuzziness of Human Reasoning\nSemantic information and artificial intelligence\nImplementing Anti-Unification Modulo Equational Theory\nInvestigation of A Collective Decision Making System of Different  Neighbourhood-Size Based on Hyper-Geometric Distribution\nRobotics Technology in Mental Health Care\nEnacting textual entailment and ontologies for automated essay grading  in chemical domain\nInformation retrieval in folktales using natural language processing\nBackward-Forward Search for Manipulation Planning\nMoving Beyond the Turing Test with the Allen AI Science Challenge\nSemantic Reasoning for Context-aware Internet of Things Applications\nTowards Visual Type Theory as a Mathematical Tool and Mathematical User  Interface\nEvent Selection Rules to Compute Explanations\nPredicting User Actions in Software Processes\nSimulated Car Racing Championship: Competition Software Manual\nMeasuring the Directional Distance Between Fuzzy Sets\nA Fuzzy Directional Distance Measure\nUsing Answer Set Programming for pattern 
mining\nDifferent Types of Conflicting Knowledge in AmI Environments\nEmergence of synchrony in an Adaptive Interaction Model\nEvaluating Go Game Records for Prediction of Player Attributes\nEvolving Non-linear Stacking Ensembles for Prediction of Go Player  Attributes\nSimpleDS: A Simple Deep Reinforcement Learning Dialogue System\nTowards Machine Intelligence\nCITlab ARGUS for historical handwritten documents\nMs. Pac-Man Versus Ghost Team CIG 2016 Competition\nApplication of Ontologies in Cloud Computing: The State-Of-The-Art\nAn Evolving Cascade System Based on A Set Of Neo Fuzzy Nodes\nTowards Lifelong Self-Supervision: A Deep Learning Direction for  Robotics\nThe formal-logical characterisation of lies, deception, and associated  notions\nFuzzy Constraints Linear Discriminant Analysis\nImitating Driver Behavior with Generative Adversarial Networks\nC3A: A Cognitive Collaborative Control Architecture For an Intelligent  Wheelchair\nUniversal Reasoning, Rational Argumentation and Human-Machine  Interaction\nGeracao Automatica de Paineis de Controle para Analise de Mobilidade  Urbana Utilizando Redes Complexas\nStatic Gesture Recognition using Leap Motion\nUnsupervised Neural-Symbolic Integration\nP-Tree Programming\nAutoencoder-augmented Neuroevolution for Visual Doom Playing\nApplying MAPP Algorithm for Cooperative Path Finding in Urban  Environments\nIntelligent Subset Selection of Power Generators for Economic Dispatch\nGenerating OWA weights using truncated distributions\nAn Ontology to support automated negotiation\nBypass Fraud Detection: Artificial Intelligence Approach\nAI2-THOR: An Interactive 3D Environment for Visual AI\nWinograd Schema - Knowledge Extraction Using Narrative Chains\nTheoretical Impediments to Machine Learning With Seven Sparks from the  Causal Revolution\nHuman-Machine Inference Networks For Smart Decision Making:  Opportunities and Challenges\nBridging Cognitive Programs and Machine Learning\nThe problem of the development 
ontology-driven architecture of  intellectual software systems\nMIRIAM: A Multimodal Chat-Based Interface for Autonomous Systems\nOn Chatbots Exhibiting Goal-Directed Autonomy in Dynamic Environments\nFutureMapping: The Computational Structure of Spatial AI Systems\nEmotion Orientated Recommendation System for Hiroshima Tourist by Fuzzy  Petri Net\nCentralized reward system gives rise to fast and efficient work sharing  for intelligent Internet agents lacking direct communication\nIntelligent search strategies based on adaptive Constraint Handling  Rules\nProtocol Requirements for Self-organizing Artifacts: Towards an Ambient  Intelligence\nLearning to Bluff\nPhase transition in SONFIS&SORST\nIntuitive visualization of the intelligence for the run-down of  terrorist wire-pullers\nDevelopment of Hybrid Intelligent Systems and their Applications from  Engineering Systems to Complex Systems\nDesign of Intelligent layer for flexible querying in databases\nKnowledge Embedding and Retrieval Strategies in an Informledge System\nAn Intelligent Approach for Negotiating between chains in Supply Chain  Management Systems\nEvaluation of Distributed Intelligence on the Smart Card\nPrinciples of modal and vector theory of formal intelligence systems\nUsing the quaternion's representation of individuals in swarm  intelligence and evolutionary computation\nA Novel Approach for Intelligent Robot Path Planning\nIntelligent City Traffic Management and Public Transportation System\nAn Argumentation-Based Framework to Address the Attribution Problem in  Cyber-Warfare\nTowards Bayesian Deep Learning: A Survey\nA Hybrid, PDE-ODE Control Strategy for Intercepting an Intelligent,  well-informed Target in a Stationary, Cluttered Environment\nDiscovering patterns of correlation and similarities in software project  data with the Circos visualization tool\nOn Generalized Bayesian Data Fusion with Complex Models in Large Scale  Networks\nAnalysis of Intelligent Classifiers and Enhancing the 
Detection Accuracy  for Intrusion Detection System\nThe Computational Principles of Learning Ability\nMissing Data Estimation in High-Dimensional Datasets: A Swarm  Intelligence-Deep Neural Network Approach\nFeynman Machine: The Universal Dynamical Systems Computer\nIntelligent bidirectional rapidly-exploring random trees for optimal  motion planning in complex cluttered environments\nInteractive, Intelligent Tutoring for Auxiliary Constructions in  Geometry Proofs\nA Bi-population Particle Swarm Optimizer for Learning Automata based  Slow Intelligent System\nRandom Worlds and Maximum Entropy\nA Uniform Framework for Concept Definitions in Description Logics\nSynthesizing Customized Planners from Specifications\nAn Average Analysis of Backtracking on Random Constraint Satisfaction  Problems\nNetNeg: A Connectionist-Agent Integrated System for Representing Musical  Knowledge\nMulti-Instance Multi-Label Learning\nAdaptive Branching for Constraint Satisfaction Problems\nThe tractability of CSP classes defined by forbidden patterns\nImplementing Human-like Intuition Mechanism in Artificial Intelligence\nUse of Markov Chains to Design an Agent Bidding Strategy for Continuous  Double Auctions\nMerging Knowledge Bases in Possibilistic Logic by Lexicographic  Aggregation\nComputational Aspects of Nearly Single-Peaked Electorates\nA Study of Scaling Issues in Bayesian Belief Networks for Ship  Classification\nA Bayesian Variant of Shafer's Commonalities For Modelling Unforeseen  Events\nGeospatial Narratives and their Spatio-Temporal Dynamics: Commonsense  Reasoning for High-level Analyses in Geographic Information Systems\nGOTCHA Password Hackers!\nPrime Implicates and Prime Implicants: From Propositional to Modal Logic\nCase-Based Subgoaling in Real-Time Heuristic Search for Video Game  Pathfinding\nOnline Speedup Learning for Optimal Planning\nEthical Artificial Intelligence\nThe Limitations of Standardized Science Tests as Benchmarks for  Artificial Intelligence 
Research: Position Paper\nUsing Automated Theorem Provers to Teach Knowledge Representation in  First-Order Logic\nSome Epistemological Problems with the Knowledge Level in Cognitive  Architectures\nAn Empirical Evaluation of a Randomized Algorithm for Probabilistic  Inference\nA Decision-Theoretic Model for Using Scientific Data\nMulti-objective Reinforcement Learning with Continuous Pareto Frontier  Approximation Supplementary Material\nEfficiency and complexity of price competition among single-product  vendors\nMeta-learning within Projective Simulation\nTracing Linguistic Relations in Winning and Losing Sides of Explicit  Opposing Groups\nThe Morphospace of Consciousness\nReceptor uptake arrays for vitamin B12, siderophores and glycans shape  bacterial communities\nHow Important is Syntactic Parsing Accuracy? An Empirical Evaluation on  Rule-Based Sentiment Analysis\nNeural-Symbolic Learning and Reasoning: A Survey and Interpretation\nTowards a Deep Reinforcement Learning Approach for Tower Line Wars\nNull Dynamical State Models of Human Cognitive Dysfunction\nAugmented Artificial Intelligence: a Conceptual Framework\nGenerating retinal flow maps from structural optical coherence  tomography with artificial intelligence\nCan Autism be Catered with Artificial Intelligence-Assisted Intervention  Technology? 
A Literature Review\nEvolving a Stigmergic Self-Organized Data-Mining\nAn Intelligent System For Effective Forest Fire Detection Using Spatial  Data\nSimplification and integration in computing and cognition: the SP theory  and the multiple alignment concept\nMulti-objects association in perception of dynamical situation\nFighting Sample Degeneracy and Impoverishment in Particle Filters: A  Review of Intelligent Approaches\nScientific Discovery by Machine Intelligence: A New Avenue for Drug  Research\nDesigning Intelligent Instruments\nMetaheuristic Algorithms for Convolution Neural Network\nMachine Intelligence Techniques for Next-Generation Context-Aware  Wireless Networks\nThe ORCA Hub: Explainable Offshore Robotics through Intelligent  Interfaces\nThe Essence of Constraint Propagation\nCox's Theorem Revisited\nExtending Classical Logic with Inductive Definitions\nQUIP - A Tool for Computing Nonmonotonic Reasoning Tasks\nCoherence, Belief Expansion and Bayesian Networks\nBDD-based reasoning in the fluent calculus - first results\nPAL: Pertinence Action Language\nLocal Diagnosis\nXNMR: A tool for knowledge bases exploration\nConstraint compiling into rules formalism constraint compiling into  rules formalism for dynamic CSPs computing\nKnowledge Theoretic Properties of Topological Spaces\nModal Logics for Topological Spaces\nOn the relationship between fuzzy logic and four-valued relevance logic\nThe alldifferent Constraint: A Survey\nOptimization Over Zonotopes and Training Support Vector Machines\nThe Representation of Legal Contracts\nThe logical meaning of Expansion\nThe Traits of the Personable\nTwo Representations for Iterative Non-prioritized Change\nCollective Argumentation\nXCB, the Last of the Shortest Single Axioms for the Classical  Equivalential Calculus\nUnsupervised Learning in a Framework of Information Compression by  Multiple Alignment, Unification and Search\nUniversal Sequential Decisions in Unknown Environments\nImplementing an Agent Trade 
Server\nTransient Diversity in Multi-Agent Systems\nWSAT(cc) - a fast local-search ASP solver\nGreat Expectations. Part II: Generalized Expected Utility as a Universal  Decision Rule\nDemolishing Searle's Chinese Room\nWhere Fail-Safe Default Logics Fail\nParametric external predicates for the DLV System\nPropositional Defeasible Logic has Linear Complexity\nGeneralized Evolutionary Algorithm based on Tsallis Statistics\nAugmenting ALC(D) (atemporal) roles and (aspatial) concrete domain with  temporal roles and a spatial concrete domain -first results\nA TCSP-like decidable constraint language generalising existing cardinal  direction relations\nIssues in Exploiting GermaNet as a Resource in Real Applications\nTransforming Business Rules Into Natural Language Text\nSelf-Organization of the Neuron Collective of Optimal Complexity\nMetalinguistic Information Extraction for Terminology\nA Study for the Feature Core of Dynamic Reduct\nUniversal Learning of Repeated Matrix Games\nEvolutionary Computing\nUsing Domain Knowledge in Evolutionary System Identification\nUniCalc.LIN: a linear constraint solver for the UniCalc system\nBelief Conditioning Rules (BCRs)\nSemantic Description of Parameters in Web Service Annotations\nModular self-organization\nUne expérience de sémantique inférentielle\nOn Geometric Algebra representation of Binary Spatter Codes\nConstant for associative patterns ensemble\nAttribute Value Weighting in K-Modes Clustering\nOn the Complexity of the Numerically Definite Syllogistic and Related  Fragments\nMathematical model of interest matchmaking in electronic social networks\nRemarks on Inheritance Systems\nMathematics as an Exact and Precise Language of Nature\nIncompleteness, Complexity, Randomness and Beyond\nPreconditioned Temporal Difference Learning\nCan the Internet cope with stress?\nCompositional Semantics Grounded in Commonsense Metaphysics\nHORPO with Computability Closure : A Reconstruction\nEffective Generation of Subjectively Random 
Binary Sequences\nMeasuring the Evolvability Landscape to study Neutrality\nFrom vectors to mnesors\nAbout Algorithm for Transformation of Logic Functions (ATLF)\nNumerical Sensitivity and Efficiency in the Treatment of Epistemic and  Aleatory Uncertainty\nThe Choquet integral for the aggregation of interval scales in  multicriteria decision making\nData-Complexity of the Two-Variable Fragment with Counting Quantifiers\nOn Introspection, Metacognitive Control and Augmented Data Mining Live  Cycles\nComparison between CPBPV, ESC/Java, CBMC, Blast, EUREKA and Why for  Bounded Program Verification\nGeneralized Prediction Intervals for Arbitrary Distributed  High-Dimensional Data\nOn-the-fly Macros\nMulti-Agent Reinforcement Learning and Genetic Policy Sharing\nArtificial intelligence for Bidding Hex\nN-norm and N-conorm in Neutrosophic Logic and Set, and the Neutrosophic  Topologies\nXML Representation of Constraint Networks: Format XCSP 2.1\nWriting Positive/Negative-Conditional Equations Conveniently\nCombining Symmetry Breaking and Global Constraints\nOptimistic Simulated Exploration as an Incentive for Real Exploration\nGuarded resolution for answer set programming\nFeasibility of random basis function approximators for modeling and  control\nLearning Nonlinear Dynamic Models\nAutomating Quantified Multimodal Logics in Simple Type Theory -- A Case  Study\nOn Defining 'I' \"I logy\"\nToward a Category Theory Design of Ontological Knowledge Bases\nMnesors for automatic control\nA Class of DSm Conditional Rules\nAn improved axiomatic definition of information granulation\nLogic with Verbs\nThe Weighted CFG Constraint\nProceedings 6th International Workshop on Local Search Techniques in  Constraint Satisfaction\nA Decision-Optimization Approach to Quantum Mechanics and Game Theory\nSimilarité en intension vs en extension : à la croisée de  l'informatique et du théâtre\nExponential Family Hybrid Semi-Supervised Learning\nRelease ZERO.0.1 of package 
RefereeToolbox\nImportance of Sources using the Repeated Fusion Method and the  Proportional Conflict Redistribution Rules #5 and #6\nPredictive Gain Estimation - A mathematical analysis\nThe Socceral Force\nComputing by Means of Physics-Based Optical Neural Networks\nWhere are the hard manipulation problems?\nParameterized Complexity Results in Symmetry Breaking\nBorder Algorithms for Computing Hasse Diagrams of Arbitrary Lattices\nUsing Semantic Wikis for Structured Argument in Medical Domain\nScientific Collaborations: principles of WikiBridge Design\nOn the CNF encoding of cardinality constraints and beyond\nBoolVar/PB v1.0, a java library for translating pseudo-Boolean  constraints into CNF formulae\nReal Islamic Logic\nTranslation-based Constraint Answer Set Solving\nGenerating Schemata of Resolution Proofs\nALPprolog --- A New Logic Programming Method for Dynamic Domains\nConscious Machines and Consciousness Oriented Programming\nSelf-Organizing Mixture Networks for Representation of Grayscale Digital  Images\nDetection and emergence\ndynPARTIX - A Dynamic Programming Reasoner for Abstract Argumentation\nUne analyse basée sur la S-DRT pour la modélisation de dialogues  pathologiques\naspcud: A Linux Package Configuration Tool Based on Answer Set  Programming\nTechnical Note: Exploring Σ^P_2 / Π^P_2-hardness for  Argumentation Problems with fixed distance to tractable classes\nAbstract Representations and Frequent Pattern Discovery\nEDML: A Method for Learning Parameters in Bayesian Networks\n(weak) Calibration is Computationally Hard\nRelational Reinforcement Learning in Infinite Mario\nThe Equational Approach to CF2 Semantics\nGeneralisation of language and knowledge models for corpus analysis\nApplications of fuzzy logic to Case-Based Reasoning\nLearning in Riemannian Orbifolds\nEHRs Connect Research and Practice: Where Predictive Modeling,  Artificial Intelligence, and Clinical Decision Support Intersect\nLearning AMP Chain Graphs under 
Faithfulness\nQuantified Conditional Logics are Fragments of HOL\nDissimilarity Clustering by Hierarchical Multi-Level Refinement\nA Simplified Description of Fuzzy TOPSIS\nThe Causal Topography of Cognition\nThe hardest logic puzzle ever becomes even tougher\nElimination of Spurious Ambiguity in Transition-Based Dependency Parsing\nChallenges for Distributional Compositional Semantics\nIntroduction of the weight edition errors in the Levenshtein distance\nNew results of ant algorithms for the Linear Ordering Problem\nA Linguistic Model for Terminology Extraction based Conditional Random  Fields\nLearning Riemannian Metrics\nA Study on Fuzzy Systems\nKnowledge Sharing: A Model\nAutomated Variational Inference in Probabilistic Programming\nExperiments with Random Projection\nPhoneme discrimination using KS algebra I\nUpdate report: LEO-II version 1.5\nStochastic gradient descent algorithms for strongly convex functions at  O(1/T) convergence rates\nFast Collision Checking: From Single Robots to Multi-Robot Teams\nExponentiated Gradient LINUCB for Contextual Multi-Armed Bandits\nNormalized Online Learning\nUsing Genetic Programming to Model Software\nSyntactic sensitive complexity for symbol-free sequence\nA fully automatic problem solver with human-style output\nA finite axiomatization of conditional independence and inclusion  dependencies\nDistributed Reinforcement Learning via Gossip\nQ-learning optimization in a multi-agents system for image segmentation\nStrategic Argumentation is NP-Complete\nThe DIAMOND System for Argumentation: Preliminary Report\nDoes Syntactic Knowledge help English-Hindi SMT?\nA Microkernel Architecture for Constraint Programming\nPropagators and Violation Functions for Geometric and Workload  Constraints Arising in Airspace Sectorisation\nA proof challenge: multiple alignment and information compression\nOn a correlational clustering of integers\nThou Shalt is not You Will\nOntology as a Source for Rule Generation\nTurKPF: TurKontrol as 
a Particle Filter\nDo we need Asimov's Laws?\nSome thoughts about benchmarks for NMR\nDialogues for proof search\nA Self-Adaptive Network Protection System\nVicious Circle Principle and Logic Programs with Aggregates\nLexpresso: a Controlled Natural Language\nPossibility neutrosophic soft sets with applications in decision making  and similarity measure\nImparo is complete by inverse subsumption\nNormalized Online Learning\nThe Universe of Minds\nDomain-Independent Optimistic Initialization for Reinforcement Learning\nThe probatilistic Quantifier Fuzzification Mechanism FA: A theoretical  analysis\nScalable Parallel Numerical CSP Solver\nIntroduction to ROSS: A New Representational Scheme\nCognitive Systems and Question Answering\nOn Generalized Rectangular Fuzzy Model for Assessment\nAutomatic Observer Script for StarCraft: Brood War Bot Games (technical  report)\nA Definition of Happiness for Reinforcement Learning Agents\nShedding Light on the Asymmetric Learning Capability of AdaBoost\nLazy Explanation-Based Approximation for Probabilistic Logic Programming\nOn the Computability of Solomonoff Induction and Knowledge-Seeking\nYARBUS : Yet Another Rule Based belief Update System\nA genetic algorithm for autonomous navigation in partially observable  domain\nUsing Ontology-Based Context in the Portuguese-English Translation of  Homographs in Textual Dialogues\nAsymptotic Logical Uncertainty and The Benford Test\nSystem Descriptions of the First International Competition on  Computational Models of Argumentation (ICCMA'15)\nMy Reflections on the First Man vs. 
Machine No-Limit Texas Hold 'em  Competition\nTuring's Red Flag\nZ Specification for the W3C Editor's Draft Core SHACL Semantics\nAbstract Attribute Exploration with Partial Object Descriptions\nRHOG: A Refinement-Operator Library for Directed Labeled Graphs\nQuantifier Scope in Categorical Compositional Distributional Semantics\nHolophrasm: a neural Automated Theorem Prover for higher-order logic\nFive dimensions of reasoning in the wild\nThe Movie Graph Argument Revisited\nTwo Projection Pursuit Algorithms for Machine Learning under  Non-Stationarity\nLogical Fuzzy Optimization\nModèle flou d'expression des préférences basé sur les CP-Nets\nSymmetry-Aware Marginal Density Estimation\nDeveloping and Analyzing Boundary Detection Operators Using  Probabilistic Models\nSolving WCSP by Extraction of Minimal Unsatisfiable Cores\nOntoRich - A Support Tool for Semi-Automatic Ontology Enrichment and  Evaluation\nA Markov Model for Ontology Alignment\nA novice looks at emotional cognition\nThree Generalizations of the FOCUS Constraint\nFrom Ordinary Differential Equations to Structural Causal Models: the  deterministic case\nStratified Labelings for Abstract Argumentation\nRecommandation mobile, sensible au contexte de contenus évolutifs:  Contextuel-E-Greedy\nRelations on FP-Soft Sets Applied to Decision Making Problems\nA Powerful Genetic Algorithm for Traveling Salesman Problem\nCascading A*: a Parallel Approach to Approximate Heuristic Search\nInitial Experiments with TPTP-style Automated Theorem Provers on ACL2  Problems\nBelief revision by examples\nSemantic HMC for Big Data Analysis\nA tool for implementation of a domain model based on fuzzy relationships\nBach in 2014: Music Composition with Recurrent Neural Network\nNeutrosophic information in the framework of multi-valued representation\nWorkshop Notes of the 6th International Workshop on Acquisition,  Representation and Reasoning about Context with Logic (ARCOE-Logic 2014)\nA Generalization of Gustafson-Kessel 
Algorithm Using a New Constraint  Parameter\nA New Penta-valued Logic Based Knowledge Representation\nDevelopment of a VO Registry Subject Ontology using Automated Methods\nTensor SimRank for Heterogeneous Information Networks\nKnowledge reduction of dynamic covering decision information systems  with immigration of more objects\nTemporal ordering of clinical events\nBridging belief function theory to modern machine learning\nControlled Query Evaluation for Datalog and OWL 2 Profile Ontologies\nCan Machines Truly Think\nSimilarity, Cardinality and Entropy for Bipolar Fuzzy Set in the  Framework of Penta-valued Representation\nEntropy and Syntropy in the Context of Five-Valued Logics\nA Large-Scale Car Dataset for Fine-Grained Categorization and  Verification\nDecomposition and Identification of Linear Structural Equation Models\nWhy is GDP growth linear?\nConstructing Abstraction Hierarchies Using a Skill-Symbol Loop\nSolving a Mathematical Problem in Square War: a Go-like Board Game\nThinking Required\nConvolutional Monte Carlo Rollouts in Go\nMultivariate Time Series Classification Using Dynamic Time Warping  Template Selection for Human Activity Recognition\nA Predictive Model using the Markov Property\nInference rules for RDF(S) and OWL in N3Logic\nPricing Vehicle Sharing with Proximity Information\nProbabilistic Models for Computerized Adaptive Testing: Experiments\nApplying Boolean discrete methods in the production of a real-valued  probabilistic programming model\nRange-based argumentation semantics as 2-valued models\nGeoGebra Tools with Proof Capabilities\nCOCO: The Experimental Procedure\nBuilding the Signature of Set Theory Using the MathSem Program\nA Step from Probabilistic Programming to Cognitive Architectures\nImproving abcdSAT by At-Least-One Recently Used Clause Management  Strategy\nDifferences between Industrial Models of Autonomy and Systemic Models of  Autonomy\nOpenAI Gym\nRelating Strong Spatial Cognition to Symbolic Problem Solving --- An  
Example\nX575: writing rengas with web services\nUsing Recurrent Neural Network for Learning Expressive Ontologies\nAdaptive Artificial Intelligence in Games: Issues, Requirements, and a  Solution through Behavlets-based General Player Modelling\nAssisting Drivers During Overtaking Using Car-2-Car Communication and  Multi-Agent Systems\nModeling selectional restrictions in a relational type system\nReprowd: Crowdsourced Data Processing Made Reproducible\nFirst-Order Bayesian Network Specifications Capture the Complexity Class  PP\nMicro-Data Learning: The Other End of the Spectrum\nDeepAlgebra - an outline of a program\nFairness as a Program Property\nIntroduction: Cognitive Issues in Natural Language Processing\nA Projective Simulation Scheme for a Partially-Observable Multi-Agent  Game\nQuantile Reinforcement Learning\nDependence and Relevance: A probabilistic view\nBayesian Non-parametric model to Target Gamification Notifications Using  Big Data\nGeneralized LR parsing and the shuffle operator\nStratified Knowledge Bases as Interpretable Probabilistic Models  (Extended Abstract)\nBipolar Weighted Argumentation Graphs\nComparing Apples and Oranges: Two Examples of the Limits of Statistical  Inference, With an Application to Google Advertising Markets\nAlgorithmic Songwriting with ALYSIA\nEfficient iterative policy optimization\nTowards Smart Proof Search for Isabelle\nMulticlass MinMax Rank Aggregation\nInteraction Information for Causal Inference: The Case of Directed  Triangle\nASHACL: Alternative Shapes Constraint Language\nSolving the Brachistochrone Problem by an Influence Diagram\nT-SKIRT: Online Estimation of Student Proficiency in an Adaptive  Learning System\nMonte Carlo Action Programming\nExchangeable choice functions\nSegmentation of skin lesions based on fuzzy classification of pixels and  histogram thresholding\nSolving the Goddard problem by an influence diagram\nPseudorehearsal in value function approximation\nSynergy of all-purpose static solver 
and temporal reasoning tools in  dynamic integrated expert systems\nKnowledge Fusion via Embeddings from Text, Knowledge Graphs, and Images\nStructured Production System (extended abstract)\nComposition of Credal Sets via Polyhedral Geometry\nA note on the uniqueness of models in social abstract argumentation\nA Survey of Distant Supervision Methods using PGMs\nA rational analysis of curiosity\nA Method for Determining Weights of Criterias and Alternative of Fuzzy  Group Decision Making Problem\nExperience enrichment based task independent reward model\nCompatible extensions and consistent closures: a fuzzy approach\nThe Singularity May Be Near\nRegular Boardgames\nEvidence Against Evidence Theory (?!)\nGenerative Models for Learning from Crowds\nTechnical Report: Implementation and Validation of a Smart Health  Application\nLearning and Evaluating Musical Features with Deep Autoencoders\nOn the enumeration of sentences by compactness\nDeductive and Analogical Reasoning on a Semantically Embedded Knowledge  Graph\nMathematical aspect of the combinatorial game \"Mahjong\"\nDetection of Abnormal Input-Output Associations\nInvestigating Reinforcement Learning Agents for Continuous State Space  Environments\nNon-FPT lower bounds for structural restrictions of decision DNNF\nPlausibility and probability in deductive reasoning\nLinking Generative Adversarial Learning and Binary Classification\nLoIDE: a web-based IDE for Logic Programming - Preliminary Technical  Report\nA Deep-Reinforcement Learning Approach for Software-Defined Networking  Routing Optimization\nAssumption-Based Approaches to Reasoning with Priorities\nTensors Come of Age: Why the AI Revolution will help HPC\nCreating a Social Brain for Cooperative Connected Autonomous Vehicles:  Issues and Challenges\nExploring Cross-Domain Data Dependencies for Smart Homes to Improve  Energy Efficiency\nSufficient and necessary causation are dual\nTopological characteristics of oil and gas reservoirs and their  
applications\nNote on Representing attribute reduction and concepts in concepts  lattice using graphs\nThe destiny of constant structure discrete time closed semantic systems\nA Study on Modeling of Inputting Electrical Power of Ultra High Power  Electric Furnace by using Fuzzy Rule and Regression Model\nVariational Deep Q Network\nNintendo Super Smash Bros. Melee: An \"Untouchable\" Agent\nSentiment Predictability for Stocks\nPseudorehearsal in actor-critic agents with neural network function  approximation\nGreenhouse: A Zero-Positive Machine Learning System for Time-Series  Anomaly Detection\nPrecision and Recall for Range-Based Anomaly Detection\nReasoning about multiple aspects in DLs: Semantics and Closure  Construction\nMulti-optional Many-sorted Past Present Future structures and its  description\nOnto2Vec: joint vector-based representation of biological entities and  their ontology-based annotations\nA Scheme-Driven Approach to Learning Programs from Input/Output  Equations\nAverage Size of Implicational Bases\nReasoning in a Hierarchical System with Missing Group Size Information\nDetecting truth, just on parts\nTechnique for designing a domain ontology\nIntegrated Tools for Engineering Ontologies\nPrinciples of design and software development models of  ontological-driven computer systems\nSufiSent - Universal Sentence Representations Using Suffix Encodings\nOn the scaling of polynomial features for representation matching\nDecision-making processes in the Cognitive Theory of True Conditions\nAn Application of HodgeRank to Online Peer Assessment\nLearning and analyzing vector encoding of symbolic representations\nOn the Algebra in Boole's Laws of Thought\nInformation Theoretic Interpretation of Deep learning\nWeakly Aggregative Modal Logic: Characterization and Interpolation\nA Rule for Committee Selection with Soft Diversity Constraints\nThe Logical Essentials of Bayesian Reasoning\nLinguistic Structure as Composition and Perturbation\nQuantitative 
Neural Network Model of the Tip-of-the-Tongue Phenomenon  Based on Synthesized Memory-Psycholinguistic-Metacognitive Approach\nMarket-Based Reinforcement Learning in Partially Observable Worlds\nBehaviour-based Knowledge Systems: An Epigenetic Path from Behaviour to  Knowledge\nMultidimensional data classification with artificial neural networks\nApplying Evolutionary Optimisation to Robot Obstacle Avoidance\nArtificial Agents and Speculative Bubbles\nA Data-Parallel Version of Aleph\nOptimising the topology of complex neural networks\nA System for Predicting Subcellular Localization of Yeast Genome Using  Neural Network\nCharacterization of the convergence of stationary Fokker-Planck learning\nAn introduction to DSmT\nCooperative Automated Worm Response and Detection Immune Algorithm\nEvolving Genes to Balance a Pole\nBuilding a Chaotic Proved Neural Network\nNegotiating Socially Optimal Allocations of Resources\nControl Neuronal por Modelo Inverso de un Servosistema Usando Algoritmos  de Aprendizaje Levenberg-Marquardt y Bayesiano\nUnsupervised Classification Using Immune Algorithm\nDynamic consistency and decision making under vacuous belief\nDiscovering causal structures in binary exclusive-or skew acyclic models\nA Comparative Study of State Transition Algorithm with Harmony Search  and Artificial Bee Colony\nConstraints on the search space of argumentation\nLog-Optimal Portfolio Selection Using the Blackwell Approachability  Theorem\nTowards Learning Object Affordance Priors from Technical Texts\nHypotheses of neural code and the information model of the  neuron-detector\nThe Effect of Social Learning on Individual Learning and Evolution\nChases and Escapes, and Optimization Problems\nAn Evolutionary Algorithm for Error-Driven Learning via Reinforcement\nRobot Dream\nSequential Short-Text Classification with Recurrent and Convolutional  Neural Networks\nOptimal Binary Autoencoding with Pairwise Correlations\nMachine Learning for Dental Image Analysis\nNeural 
Networks for Joint Sentence Classification in Medical Paper  Abstracts\nMIT at SemEval-2017 Task 10: Relation Extraction with Convolutional  Neural Networks\nDeep learning evaluation using deep linguistic processing\nTechnical Problems With \"Programmable self-assembly in a thousand-robot  swarm\"\nExplaining Trained Neural Networks with Semantic Web Technologies: First  Steps\nAn Artificial Neural Network Architecture Based on Context  Transformations in Cortical Minicolumns\nReasons and Means to Model Preferences as Incomplete\nMachine learning and evolutionary techniques in interplanetary  trajectory design\nA Model of Free Will for Artificial Entities\nA Bayesian Model for Activities Recommendation and Event Structure  Optimization Using Visitors Tracking\nBig Data Analytics, Machine Learning and Artificial Intelligence in  Next-Generation Wireless Networks\nEnglish Sentence Recognition using Artificial Neural Network through  Mouse-based Gestures\nOn Affinity Measures for Artificial Immune System Movie Recommenders\nAn Idiotypic Immune Network as a Short Term Learning Architecture for  Mobile Robots\nArtificial Immune Tissue using Self-Orgamizing Networks\nIntroducing Dendritic Cells as a Novel Immune-Inspired Algorithm for  Anomoly Detection\nAn Efficient Automatic Mass Classification Method In Digitized  Mammograms Using Artificial Neural Network\nOptimization of artificial flockings by means of anisotropy measurements\nUsing Belief Theory to Diagnose Control Knowledge Quality. 
Application  to cartographic generalisation\nTraining a Feed-forward Neural Network with Artificial Bee Colony Based  Backpropagation Method\nImplementation of a Vision System for a Landmine Detecting Robot Using  Artificial Neural Network\nThe method of artificial systems\nUsing Artificial Neural Network Techniques for Prediction of Electric  Energy Consumption\nA Self-Taught Artificial Agent for Multi-Physics Computational Model  Personalization\nIntelligent decision: towards interpreting the Pe Algorithm\nDesign of a GIS-based Assistant Software Agent for the Incident  Commander to Coordinate Emergency Response Operations\nCognitive Development of the Web\nFault Detection Engine in Intelligent Predictive Analytics Platform for  DCIM\nArtificial Immune Systems (INTROS 2)\nA symbolic description of punning riddles and its computer  implementation\nAn implemented model of punning riddles\nMorphology with a Null-Interface\nNatural Language Interfaces to Databases - An Introduction\nA Variant of Earley Parsing\nNature's Way of Optimizing\nAn Empirical Analysis of Search in GSAT\nThe Difficulties of Learning Logic Programs with Cut\nSoftware Agents: Completing Patterns and Constructing User Interfaces\nDecidable Reasoning in Terminological Knowledge Representation Systems\nTeleo-Reactive Programs for Agent Control\nSubstructure Discovery Using Minimum Description Length and Background  Knowledge\nBias-Driven Revision of Logical Domain Theories\nA Semantics and Complete Algorithm for Subsumption in the CLASSIC  Description Logic\nPattern Matching and Discourse Processing in Information Extraction from  Japanese Text\nWrap-Up: a Trainable Discourse Module for Information Extraction\nOperations for Learning with Graphical Models\nTotal-Order and Partial-Order Planning: A Comparative Analysis\nA Domain-Independent Algorithm for Plan Adaptation\nTruncating Temporal Differences: On the Efficient Implementation of  TD(lambda) for Reinforcement Learning\nOn the 
Informativeness of the DNA Promoter Sequences Domain Theory\nCost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic  Decision Tree Induction Algorithm\nUsing Pivot Consistency to Decompose and Solve Functional CSPs\nAdaptive Load Balancing: A Study in Multi-Agent Learning\nPac-Learning Recursive Logic Programs: Efficient Algorithms\nPac-learning Recursive Logic Programs: Negative Results\nInduction of First-Order Decision Lists: Results on Learning the Past  Tense of English Verbs\nBuilding and Refining Abstract Planning Cases by Change of  Representation Language\nUsing Qualitative Hypotheses to Identify Inaccurate Data\nDiffusion of Context and Credit Information in Markovian Models\nLearning Membership Functions in a Function-Based Object Recognition  System\nFlexibly Instructable Agents\nVision-Based Road Detection in Automotive Systems: A Real-Time  Expectation-Driven Approach\nGeneralization of Clauses under Implication\nTranslating between Horn Representations and their Characteristic Models\nThe Design and Experimental Analysis of Algorithms for Temporal  Reasoning\nLogarithmic-Time Updates and Queries in Probabilistic Networks\nPractical Methods for Proving Termination of General Logic Programs\nFurther Experimental Evidence against the Utility of Occam's Razor\nLeast Generalizations and Greatest Specializations of Sets of Clauses\nReinforcement Learning: A Survey\nAdaptive Problem-solving for Large-scale Scheduling Problems: A Case  Study\nA Formal Framework for Speedup Learning from Problems and Solutions\n2Planning for Contingencies: A Decision-based Approach\nA Principled Approach Towards Symbolic Geometric Constraint Satisfaction\nOn Partially Controlled Multi-Agent Systems\nCue Phrase Classification Using Machine Learning\nExploiting Causal Independence in Bayesian Network Inference\nImproved Heterogeneous Distance Functions\nConnectionist Theory Refinement: Genetically Searching the Space of  Network Topologies\nFlaw Selection 
Strategies for Partial-Order Planning\nA New Look at the Easy-Hard-Easy Pattern of Combinatorial Search  Difficulty\nIdentifying Hierarchical Structure in Sequences: A linear-time algorithm\nAnalysis of Three-Dimensional Protein Images\nA Model Approximation Scheme for Planning in Partially Observable  Stochastic Domains\nWhen Gravity Fails: Local Search Topology\nBidirectional Heuristic Search Reconsidered\nTractability of Theory Patching\nModel-Based Diagnosis using Structured System Descriptions\nA Selective Macro-learning Algorithm and its Application to the NxN  Sliding-Tile Puzzle\nSet-Theoretic Completeness for Epistemic and Conditional Logic\nThe Computational Complexity of Probabilistic Planning\nSemantics and Conversations for an Agent Communication Language\nLearning Nested Agent Models in an Information Economy\nFixpoint 3-valued semantics for autoepistemic logic\nResolving Part-of-Speech Ambiguity in the Greek Language Using Learning  Techniques\nACLP: Integrating Abduction and Constraint Solving\nSome Remarks on Boolean Constraint Propagation\nNaive Bayes and Exemplar-Based approaches to Word Sense Disambiguation  Revisited\nOrdering-based Representations of Rational Inference\nContextual Inference in Computational Semantics\nLogic Programming Approaches for Representing and Solving Constraint  Satisfaction Problems: A Comparison\nA Constraint-Driven System for Contract Assembly\nAnother perspective on Default Reasoning\nPreferred well-founded semantics for logic programming by alternating  fixpoints: Preliminary report\nQuestion answering: from partitions to Prolog\nGeometric Aspects of Multiagent Systems\nA uniform approach to logic programming semantics\nFormal Concept Analysis and Resolution in Algebraic Domains\nBayesian Treatment of Incomplete Discrete Data applied to Mutual  Information and Feature Selection\nDefinition and Complexity of Some Basic Metareasoning Problems\nModeling Object Oriented Constraint Programs in Z\nPolyhierarchical 
Classifications Induced by Criteria Polyhierarchies,  and Taxonomy Algebra\nNLML--a Markup Language to Describe the Unlimited English Grammar\nExploiting Cross-Document Relations for Multi-document Evolving  Summarization\nA Neuro-Fuzzy Approach for Modelling Electricity Demand in Victoria\nData Mining Approach for Analyzing Call Center Performance\nDeductive Algorithmic Knowledge\nPruning Search Space in Defeasible Argumentation\nSub-Structural Niching in Non-Stationary Environments\nExplorations in engagement for humans and robots\nStochastic Process Semantics for Dynamical Grammar Syntax: An Overview\nRobust Inference of Trees\nApproximate Discrete Probability Distribution Representation using a  Multi-Resolution Binary Tree\nImagination as Holographic Processor for Text Animation\nRepresenting Knowledge about Norms\nCase Base Mining for Adaptation Knowledge Acquisition\nOpen-Ended Artificial Evolution\nA Leaf Recognition Algorithm for Plant Classification Using  Probabilistic Neural Network\nSANA - Network Protection through artificial Immunity\nA Network Protection Framework through Artificial Immunity\nNeural networks in 3D medical scan visualization\nExtension of Inagaki General Weighted Operators and A New Fusion Rule  Class of Proportional Redistribution of Intersection Masses\nA comparison of the notions of optimality in soft constraints and  graphical games\nOn the Conditional Independence Implication Problem: A Lattice-Theoretic  Approach\nThe Expressive Power of Binary Submodular Functions\nThe Latent Relation Mapping Engine: Algorithm and Experiments\nEmotions, diffusive emotional control and the motivational problem for  autonomous cognitive systems\nThe Semantics of Kalah Game\nGranularity-Adaptive Proof Presentation\nFiltering Algorithms for the Multiset Ordering Constraint\nThe Parameterized Complexity of Global Constraints\nDecompositions of Grammar Constraints\nSLIDE: A Useful Special Case of the CARDPATH Constraint\nTagging multimedia stimuli 
with ontologies\nBuilding the information kernel and the problem of recognition\nWhere are the really hard manipulation problems? The phase transition in  manipulating the veto rule\nCircuit Complexity and Decompositions of Global Constraints\nScenario-based Stochastic Constraint Programming\nOn Maximum a Posteriori Estimation of Hidden Markov Processes\nA Novel Two-Staged Decision Support based Threat Evaluation and Weapon  Assignment Algorithm, Asset-based Dynamic Weapon Scheduling using Artificial  Intelligence Techinques\nScheme of thinking quantum systems\nHigher coordination with less control - A result of information  maximization in the sensorimotor loop\nNeural Networks for Dynamic Shortest Path Routing Problems - A Survey\nClosing the Learning-Planning Loop with Predictive State Representations\nA Survey of Paraphrasing and Textual Entailment Methods\nComputational and Biological Analogies for Understanding Fine-Tuned  Parameters in Physics\nPropagating Conjunctions of AllDifferent Constraints\nPCA 4 DCA: The Application Of Principal Component Analysis To The  Dendritic Cell Algorithm\nThe Production of Probabilistic Entropy in Structure/Action Contingency  Relations\nHow to correctly prune tropical trees\nAn Empirical Study of the Manipulability of Single Transferable Voting\nSymmetries of Symmetry Breaking Constraints\nMemristor Crossbar-based Hardware Implementation of Fuzzy Membership  Functions\nText Classification using Artificial Intelligence\nEfficient Knowledge Base Management in DCSP\nSteepest Ascent Hill Climbing For A Mathematical Problem\nA Reduction of Imitation Learning and Structured Prediction to No-Regret  Online Learning\nLearning Planar Ising Models\nTo study the phenomenon of the Moravec's Paradox\nDD-EbA: An algorithm for determining the number of neighbors in cost  estimation by analogy using distance distributions\nPlanning with Partial Preference Models\nA Factorial Experiment on Scalability of Search Based Software 
Testing\nContext Capture in Software Development\nMeaning Negotiation as Inference\nArtificial Immune Privileged Sites as an Enhancement to Immuno-Computing  Paradigm\nGRASP and path-relinking for Coalition Structure Generation\nOn Minimal Constraint Networks\nPlanning Graph Heuristics for Belief Space Search\nTractable Set Constraints\nThe Good Old Davis-Putnam Procedure Helps Counting Models\nIdentifying Mislabeled Training Data\nMarkov Localization for Mobile Robots in Dynamic Environments\nDecentralized Markets versus Central Control: A Comparative Study\nRandomized Algorithms for the Loop Cutset Problem\nSpace Efficiency of Propositional Knowledge Representation Formalisms\nValue-Function Approximations for Partially Observable Markov Decision  Processes\nRobust Agent Teams via Socially-Attentive Monitoring\nWhat's in an Attribute? Consequences for the Least Common Subsumer\nThe Complexity of Reasoning with Cardinality Restrictions and Nominals  in Expressive Description Logics\nAn Application of Reinforcement Learning to Dialogue Strategy Selection  in a Spoken Dialogue System for Email\nAsimovian Adaptive Agents\nA Model of Inductive Bias Learning\nOn the Compilability and Expressive Power of Propositional Planning  Formalisms\nPartial-Order Planning with Concurrent Interacting Actions\nGrounding the Lexical Semantics of Verbs in Visual Perception using  Force Dynamics and Event Logic\nThe GRT Planning System: Backward Heuristic Construction in Forward  State-Space Planning\nExperiments with Infinite-Horizon, Policy-Gradient Estimation\nReasoning within Fuzzy Description Logics\nGIB: Imperfect Information in a Computationally Challenging Game\nDomain Filtering Consistencies\nThe FF Planning System: Fast Plan Generation Through Heuristic Search\nLearning Geometrically-Constrained Hidden Markov Models for Robot  Navigation: Bridging the Topological-Geometrical Gap\nA Sequence of Relaxations Constraining Hidden Variable Models\nAccelerating Reinforcement 
Learning by Composing Solutions of  Automatically Identified Subtasks\nParameter Learning of Logic Programs for Symbolic-Statistical Modeling\nExtensions of Simple Conceptual Graphs: the Complexity of Rules and  Constraints\nFusions of Description Logics and Abstract Description Systems\nWhen do Numbers Really Matter?\nAutomatically Training a Problematic Dialogue Predictor for a Spoken  Dialogue System\nA Knowledge Compilation Map\nInferring Strategies for Sentence Ordering in Multidocument News  Summarization\nMachine Learning Markets\nSpecific-to-General Learning for Temporal Events with Application to  Learning Event Definitions from Video\nAn Analysis of Phase Transition in NK Landscapes\nPropositional Independence - Formula-Variable Independence and  Forgetting\nTranslation of Pronominal Anaphora between English and Spanish:  Discrepancies and Evaluation\nMonte Carlo Methods for Tempo Tracking and Rhythm Quantization\nExploiting Contextual Independence In Probabilistic Inference\nBound Propagation\nOn Polynomial Sized MDP Succinct Policies\nCompiling Causal Theories to Successor State Axioms and STRIPS-Like  Systems\nVHPOP: Versatile Heuristic Partial Order Planner\nAnswer Set Planning Under Action Costs\nSAPA: A Multi-objective Metric Temporal Planner\nAltAltp: Online Parallelization of Plans with Heuristic State Search\nPlanning Through Stochastic Local Search and Temporal Action Graphs in  LPG\nTALplanner in IPC-2002: Extensions and Control Rules\nOptimal Schedules for Parallelizing Anytime Algorithms: The Case of  Shared Resources\nDecision-Theoretic Bidding Based on Learned Density Models in  Simultaneous, Interacting Auctions\nThe Metric-FF Planning System: Translating \"Ignoring Delete Lists\" to  Numeric State Variables\nThe 3rd International Planning Competition: Results and Analysis\nCP-nets: A Tool for Representing and Reasoning withConditional Ceteris  Paribus Preference Statements\nIDL-Expressions: A Formalism for Representing and Parsing Finite  
Languages in Natural Language Processing\nEffective Dimensions of Hierarchical Latent Class Models\nA Personalized System for Conversational Recommendations\nCoherent Integration of Databases by Abductive Logic Programming\nGrounded Semantic Composition for Visual Scenes\nPrice Prediction in a Trading Agent Competition\nCompositional Model Repositories via Dynamic Constraint Satisfaction  with Order-of-Magnitude Preferences\nCompetitive Coevolution through Evolutionary Complexification\nDual Modelling of Permutation and Injection Problems\nGeneralizing Boolean Satisfiability I: Background and Survey of Existing  Work\nGraduality in Argumentation\nDecentralized Control of Cooperative Systems: Categorization and  Complexity Analysis\nOn Prediction Using Variable Order Markov Models\nOrdered Landmarks in Planning\nA Comprehensive Trainable Error Model for Sung Music Queries\nPhase Transitions and Backbones of the Asymmetric Traveling Salesman  Problem\nLinear Latent Force Models using Gaussian Processes\nInformation, Utility & Bounded Rationality\nUndithering using linear filtering and non-linear diffusion techniques\nA KIF Formalization for the IFF Category Theory Ontology\nATP and Presentation Service for Mizar Formalizations\nStructured Knowledge Representation for Image Retrieval\nGeneralizing Boolean Satisfiability II: Theory\nRelational Dynamic Bayesian Networks\nSolving Set Constraint Satisfaction Problems using ROBDDs\nLearning Concept Hierarchies from Text Corpora using Formal Concept  Analysis\nGeneralizing Boolean Satisfiability III: Implementation\nPerseus: Randomized Point-based Value Iteration for POMDPs\nCIXL2: A Crossover Operator for Evolutionary Algorithms Based on  Population Features\nMacro-FF: Improving AI Planning with Automatically Learned  Macro-Operators\nApproximate Policy Iteration with a Policy Language Bias: Solving  Relational Markov Decision Processes\nThe Deterministic Part of IPC-4: An Overview\nBinary Encodings of Non-binary 
Constraint Satisfaction Problems:  Algorithms and Experimental Results\nDynamic Local Search for the Maximum Clique Problem\nRepresenting Conversations for Scalable Overhearing\nProbabilistic Hybrid Action Models for Predicting Concurrent  Percept-driven Robot Behavior\nAsynchronous Partial Overlay: A New Algorithm for Solving Distributed  Constraint Satisfaction Problems\nAdmissible and Restrained Revision\nOn Graphical Modeling of Preference and Importance\nFault Tolerant Boolean Satisfiability\nCognitive Principles in Robust Multimodal Interpretation\nMultiple-Goal Heuristic Search\nFluCaP: A Heuristic Search Planner for First-Order MDPs\nNew Inference Rules for Max-SAT\nObtaining Reliable Feedback for Sanctioning Reputation Mechanisms\nConjunctive Query Answering for the Description Logic SHIQ\nExploiting Subgraph Structure in Multi-Robot Path Planning\nCTL Model Update for System Modifications\nExtended RDF as a Semantic Foundation of Rule Markup Languages\nLoosely Coupled Formulations for Automated Planning: An Integer  Programming Perspective\nA Constraint Programming Approach for Solving a Queueing Control Problem\nFirst Order Decision Diagrams for Relational MDPs\nRevisiting Numerical Pattern Mining with Formal Concept Analysis\nA Well-typed Lightweight Situation Calculus\nBelief change with noisy sensing in the situation calculus\nAn MLP based Approach for Recognition of Handwritten `Bangla' Numerals\nGibbs Sampling in Open-Universe Stochastic Languages\nCompiling Possibilistic Networks: Alternative Approaches to  Possibilistic Inference\nPossibilistic Answer Set Programming Revisited\nProbabilistic Similarity Logic\nAn Online Learning-based Framework for Tracking\nLifted Inference for Relational Continuous Models\nA Scalable Method for Solving High-Dimensional Continuous POMDPs Using  Local Approximation\nPlaying games against nature: optimal policies for renewable resource  allocation\nLearning Game Representations from Data Using Rationality 
Constraints\nReal-Time Scheduling via Reinforcement Learning\nFormula-Based Probabilistic Inference\nIntracluster Moves for Constrained Discrete-Space MCMC\nCausal Conclusions that Flip Repeatedly and Their Justification\nAnytime Planning for Decentralized POMDPs using Expectation Maximization\nThe Cost of Troubleshooting Cost Clusters with Inside Information\nOn a Class of Bias-Amplifying Variables that Endanger Effect Estimates\nIrregular-Time Bayesian Networks\nOn the Validity of Covariate Adjustment for Estimating Causal Effects\nBayesian Inference in Monte-Carlo Tree Search\nLearning Why Things Change: The Difference-Based Causality Learner\nPrimal View on Belief Propagation\nRollout Sampling Policy Iteration for Decentralized POMDPs\nModeling Multiple Annotator Expertise in the Semi-Supervised Learning  Scenario\nMulti-Domain Collaborative Filtering\nA Convex Formulation for Learning Task Relationships in Multi-Task  Learning\nRAPID: A Reachable Anytime Planner for Imprecisely-sensed Domains\nUnderstanding Sampling Style Adversarial Search Methods\nEliminating the Weakest Link: Making Manipulation Intractable?\nQuantum Interference in Cognition: Structural Aspects of the Brain\nThe Network of French Legal Codes\nMost Relevant Explanation: Properties, Algorithms, and Evaluations\nExploring compact reinforcement-learning representations with linear  regression\nMeasuring Inconsistency in Probabilistic Knowledge Bases\nEffects of Treatment on the Treated: Identification and Generalization\nBisimulation-based Approximate Lifted Inference\nRegret-based Reward Elicitation for Markov Decision Processes\nExact Structure Discovery in Bayesian Networks with Less Space\nLogical Inference Algorithms and Matrix Representations for  Probabilistic Conditional Independence\nConvexifying the Bethe Free Energy\nConvergent message passing algorithms - a unifying view\nMAP Estimation of Semi-Metric MRFs via Hierarchical Graph Cuts\nMonolingual Probabilistic Programming Using 
Generalized Coroutines\nCounting Belief Propagation\nTemporal Action-Graph Games: A New Representation for Dynamic Games\nMAP Estimation, Message Passing, and Perfect Graphs\nImproved Mean and Variance Approximations for Belief Net Responses via  Network Doubling\nFirst-Order Mixed Integer Linear Programming\nDistributed Parallel Inference on Large Factor Graphs\nGenerating Optimal Plans in Highly-Dynamic Domains\nMean Field Variational Approximation for Continuous-Time Bayesian  Networks\nLower Bound Bayesian Networks - An Efficient Inference of Lower Bounds  on Probability Distributions in Bayesian Networks\nSoftening Fuzzy Knowledge Representation Tool with the Learning of New  Words in Natural Language\nSpeeding Up Planning in Markov Decision Processes via Automatically  Constructed Abstractions\nAdaptive Inference on General Graphical Models\nOn Identifying Total Effects in the Presence of Latent Variables and  Selection bias\nBayesian network learning by compiling to weighted MAX-SAT\nStrategy Selection in Influence Diagrams using Imprecise Probabilities\nKnowledge Combination in Graphical Multiagent Model\nAlmost Optimal Intervention Sets for Causal Discovery\nGibbs Sampling in Factorized Continuous-Time Markov Processes\nLearning and Solving Many-Player Games through a Cluster-Based  Representation\nCausal discovery of linear acyclic models with arbitrary distributions\nLearning When to Take Advice: A Statistical Test for Achieving A  Correlated Equilibrium\nSparse Stochastic Finite-State Controllers for POMDPs\nThe Computational Complexity of Sensitivity Analysis and Parameter  Tuning\nPartitioned Linear Programming Approximations for MDPs\nThe Evaluation of Causal Effects in Studies with an Unobserved  Exposure/Outcome Variable: Bounds and Identification\nLearning Arithmetic Circuits\nDiscovering Cyclic Causal Models by Independent Components Analysis\nCT-NOR: Representing and Reasoning About Events in Continuous Time\nImproving the Accuracy and 
Efficiency of MAP Inference for Markov Logic\nObservation Subset Selection as Local Compilation of Performance  Profiles\nDyna-Style Planning with Linear Function Approximation and Prioritized  Sweeping\nTightening LP Relaxations for MAP using Message Passing\nEfficient inference in persistent Dynamic Bayesian Networks\nHierarchical POMDP Controller Optimization by Likelihood Maximization\nPropagation using Chain Event Graphs\nSensitivity analysis in decision circuits\nStudies in Lower Bounding Probabilities of Evidence using the Markov  Inequality\nSearch for Choquet-optimal paths under uncertainty\nA new parameter Learning Method for Bayesian Networks with Qualitative  Influences\nLearning Probabilistic Relational Dynamics for Multiple Tasks\nProbabilistic Models for Anomaly Detection in Remote Sensor Data Streams\nNode Splitting: A Scheme for Generating Upper Bounds in Bayesian  Networks\nReachability Under Uncertainty\nEvaluating influence diagrams with decision circuits\nOptimizing Memory-Bounded Controllers for Decentralized POMDPs\nMixture-of-Parents Maximum Entropy Markov Models\nConsensus ranking under the exponential model\nAND/OR Multi-Valued Decision Diagrams (AOMDDs) for Weighted Graphical  Models\nBest-First AND/OR Search for Most Probable Explanations\nLearning Bayesian Network Structure from Correlation-Immune Data\nEvaluation of the Causal Effect of Control Plans in Nonrecursive  Structural Equation Models\nSurvey Propagation Revisited\nRanking Under Uncertainty\nMore-or-Less CP-Networks\nImportance Sampling via Variational Optimization\nPolicy Iteration for Relational MDPs\nConstrained Automated Mechanism Design for Infinite Games of Incomplete  Information\nImproved Dynamic Schedules for Belief Propagation\nMarkov Logic in Infinite Domains\nPredicting the behavior of interacting humans by fusing data from  multiple sources\nAn Empirical Comparison of Algorithms for Aggregating Expert Predictions\nMAIES: A Tool for DNA Mixture Analysis\nA 
Variational Approach for Approximating Bayesian Networks by Edge  Deletion\nSensitivity Analysis for Threshold Decision Making with Dynamic Networks\nOptimal Coordinated Planning Amongst Self-Interested Agents with Private  State\nGraphical Condition for Identification in recursive SEM\nCutset Sampling with Likelihood Weighting\nAn Efficient Triplet-based Algorithm for Evidential Reasoning\nNon-Minimal Triangulations for Mixed Stochastic/Deterministic Graphical  Models\nLinear Algebra Approach to Separable Bayesian Networks\nAdvances in exact Bayesian structure discovery in Bayesian networks\nThe AI&M Procedure for Learning from Incomplete Data\nPearl's Calculus of Intervention Is Complete\nDimension Reduction in Singularly Perturbed Continuous-Time Bayesian  Networks\nMethods for computing state similarity in Markov Decision Processes\nAsymmetric separation for local independence graphs\nGeneral-Purpose MCMC Inference over Relational Structures\nVisualization of Collaborative Data\nA compact, hierarchical Q-function decomposition\nStructured Priors for Structure Learning\nA theoretical study of Y structures for causal discovery\nOn the Number of Samples Needed to Learn the Correct Structure of a  Bayesian Network\nA Non-Parametric Bayesian Method for Inferring Hidden Causes\nAxiomatic Foundations for a Class of Generalized Expected Utility:  Algebraic Expected Utility\nRecognizing Activities and Spatial Context Using Wearable Sensors\nIncremental Model-based Learners With Formal Learning-Time Guarantees\nA simple approach for finding the globally optimal Bayesian network  structure\nInference in Hybrid Bayesian Networks Using Mixtures of Gaussians\nPractical Linear Value-approximation Techniques for First-order MDPs\nStable Independence in Perfect Maps\n'Say EM' for Selecting Probabilistic Models for Logical Sequences\nA Differential Semantics of Lazy AR Propagation\nModifying Bayesian Networks by Probability Constraints\nExploiting Evidence-dependent Sensitivity 
Bounds\nMAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs\nLocal Utility Elicitation in GAI Models\nTowards Characterizing Markov Equivalence Classes for Directed Acyclic  Graphs with Latent Variables\nHybrid Bayesian Networks with Linear Deterministic Variables\nCounterexample-guided Planning\nNonparametric Bayesian Logic\nCounterfactual Reasoning in Linear Structural Equation Models\nEfficient algorithm for estimation of qualitative expected utility in  possibilistic case-based reasoning\nUnsupervised Activity Discovery and Characterization From Event-Streams\nModeling Transportation Routines using Hybrid Dynamic Mixed Networks\nLearning Bayesian Network Parameters with Prior Knowledge about  Context-Specific Qualitative Influences\nPlanning in POMDPs Using Multiplicity Automata\nOn the Number of Experiments Sufficient and in the Worst Case Necessary  to Identify All Causal Relations Among N Variables\nNear-optimal Nonmyopic Value of Information in Graphical Models\nOn the optimality of tree-reweighted max-product message-passing\nA Revision-Based Approach to Resolving Conflicting Information\nAsynchronous Dynamic Bayesian Networks\nRobotic Mapping with Polygonal Random Fields\nExpectation Propagation for Continuous Time Bayesian Networks\nA Conditional Random Field for Discriminatively-trained Finite-state  String Edit Distance\nRepresentation Policy Iteration\nApproximate Linear Programming for First-order MDPs\nPredictive Linear-Gaussian Models of Stochastic Dynamical Systems\nEfficient Test Selection in Active Diagnosis via Entropy Approximation\nA Transformational Characterization of Markov Equivalence for Directed  Acyclic Graphs with Latent Variables\nImportance Sampling in Bayesian Networks: An Influence-Based  Approximation Strategy for Importance Functions\nStructured Region Graphs: Morphing EP into GBP\nArabic CALL system based on pedagogically indexed text\nQualitative Approximate Behavior Composition\nExploiting First-Order 
Regression in Inductive Policy Selection\nA Complete Anytime Algorithm for Treewidth\nDecision Making for Symbolic Probability\nStable Independance and Complexity of Representation\nPropositional and Relational Bayesian Networks Associated with Imprecise  and Qualitative Probabilistic Assesments\nBayesian Biosurveillance of Disease Outbreaks\nA Logic Programming Framework for Possibilistic Argumentation with Vague  Knowledge\nSensitivity Analysis in Bayesian Networks: From Single to Multiple  Parameters\nOn finding minimal w-cutset\nUsing arguments for making decisions: A possibilistic logic approach\nCase-Factor Diagrams for Structured Probabilistic Modeling\nConvolutional Factor Graphs as Probabilistic Models\nAn Empirical Evaluation of Possible Variations of Lazy Propagation\nPre-Selection of Independent Binary Features: An Application to  Diagnosing Scrapie in Sheep\nSolving Factored MDPs with Continuous and Discrete Variables\nAnnealed MAP\nDiscretized Approximations for POMDP with Average Cost\nOn the Choice of Regions for Generalized Belief Propagation\nMonotonicity in Bayesian Networks\nPredictive State Representations: A New Theory for Modeling Dynamical  Systems\nA New Characterization of Probabilities in Bayesian Networks\nEvidence-invariant Sensitivity Bounds\nRobust Probabilistic Inference in Distributed Systems\nOn Modeling Profiles instead of Values\nLearning Diagnostic Policies from Examples by Systematic Search\nHybrid Influence Diagrams Using Mixtures of Truncated Exponentials\nThe Arcade Learning Environment: An Evaluation Platform for General  Agents\nImproving multivariate Horner schemes with Monte Carlo tree search\nA Novel Fuzzy Logic Based Adaptive Supertwisting Sliding Mode Control  Algorithm for Dynamic Uncertain Systems\nModeling and Control of CSTR using Model based Neural Network Predictive  Control\nA hybrid ACO approach to the Matrix Bandwidth Minimization Problem\nSoft Computing approaches on the Bandwidth Problem\nConflict 
Anticipation in the Search for Graph Automorphisms\nAutomated Marble Plate Classification System Based On Different Neural  Network Input Training Sets and PLC Implementation\nQualitative Modelling via Constraint Programming: Past, Present and  Future\nScoring and Searching over Bayesian Networks with Causal and Associative  Priors\nLifted Relax, Compensate and then Recover: From Approximate to Exact  Lifted Probabilistic Inference\nAn Efficient Message-Passing Algorithm for the M-Best MAP Problem\nCausal Inference by Surrogate Experiments: z-Identifiability\nExploiting Uniform Assignments in First-Order MPE\nThe Do-Calculus Revisited\nWeighted Sets of Probabilities and MinimaxWeighted Expected Regret: New  Approaches for Representing Uncertainty and Making Decisions\nSemantic Understanding of Professional Soccer Commentaries\nGeneralized Belief Propagation on Tree Robust Structured Region Graphs\nUniform Solution Sampling Using a Constraint Solver As an Oracle\nScaling Up Decentralized MDPs Through Heuristic Search\nA Bayesian Approach to Constraint Based Causal Inference\nDynamic Stochastic Orienteering Problems for Risk-Aware Applications\nA Theory of Goal-Oriented MDPs with Dead Ends\nCausal Discovery of Linear Cyclic Models from Multiple Experimental Data  Sets with Overlapping Variables\nInferring Strategies from Limited Reconnaissance in Real-time Strategy  Games\nExploiting Structure in Cooperative Bayesian Games\nHilbert Space Embeddings of POMDPs\nLocal Structure Discovery in Bayesian Networks\nLearning STRIPS Operators from Noisy and Incomplete Observations\nClosed-Form Learning of Markov Networks from Dependency Networks\nEfficient MRF Energy Minimization via Adaptive Diminishing Smoothing\nNew Advances and Theoretical Insights into EDML\nAn Improved Admissible Heuristic for Learning Optimal Bayesian Networks\nDynamic Teaching in Sequential Decision Making Environments\nTracking Group Evolution in Social Networks\nFull Object Boundary Detection by 
Applying Scale Invariant Features in a  Region Merging Segmentation Algorithm\nInfluence of Context on Decision Making during Requirements Elicitation\nAdaptive Bee Colony in an Artificial Bee Colony for Solving Engineering  Design Problems\nDynamic Decision Support System Based on Bayesian Networks Application  to fight against the Nosocomial Infections\nAn ontology-based approach to relax traffic regulation for autonomous  vehicle assistance\nOn revising fuzzy belief bases\nUpgrading Ambiguous Signs in QPNs\nAn Empirical Study of w-Cutset Sampling for Bayesian Networks\nValue Elimination: Bayesian Inference via Backtracking Search\nNew Advances in Inference by Recursive Conditioning\nSymbolic Generalization for On-line Planning\nA Simple Insight into Iterative Belief Propagation's Success\nA Robust Independence Test for Constraint-Based Learning of Causal  Structure\nLarge-Sample Learning of Bayesian Networks is NP-Hard\nUsing the structure of d-connecting paths as a qualitative measure of  the strength of dependence\nReasoning about Bayesian Network Classifiers\nMonte Carlo Matrix Inversion Policy Evaluation\nApproximate Decomposition: A Method for Bounding and Estimating  Probabilistic and Deterministic Queries\nLAYERWIDTH: Analysis of a New Metric for Directed Acyclic Graphs\nApproximate Inference and Constrained Optimization\nMonte-Carlo optimizations for resource allocation problems in stochastic  network systems\nImplementation and Comparison of Solution Methods for Decision Processes  with Non-Markovian Rewards\nDecision Making with Partially Consonant Belief Functions\nPhase Transition of Tractability in Constraint Satisfaction and Bayesian  Network Inference\nDecentralized Sensor Fusion With Distributed Particle Filters\nSolving MAP Exactly using Systematic Search\nMarginalizing Out Future Passengers in Group Elevator Control\nOn Local Optima in Learning Bayesian Networks\nOptimal Limited Contingency Planning\nPractically Perfect\nSystematic vs. 
Non-systematic Algorithms for Solving the MPE Task\nAn Importance Sampling Algorithm Based on Evidence Pre-propagation\nExploiting Locality in Searching the Web\nThe Revisiting Problem in Mobile Robot Map Building: A Hierarchical  Bayesian Approach\nEfficient Inference in Large Discrete Domains\nMarkov Equivalence Classes for Maximal Ancestral Graphs\nOn the Construction of the Inclusion Boundary Neighbourhood for Markov  Equivalence Classes of Bayesian Network Structures\nIntroducing Variable Importance Tradeoffs into CP-Nets\nIterative Join-Graph Propagation\nThe Thing That We Tried Didn't Work Very Well : Deictic Representation  in Reinforcement Learning\nDistributed Planning in Hierarchical Factored MDPs\nUnconstrained Influence Diagrams\nCFW: A Collaborative Filtering System Using Posteriors Over Weights Of  Evidence\nA Bayesian Network Scoring Metric That Is Based On Globally Uniform  Parameter Priors\nValue Function Approximation in Zero-Sum Markov Games\nPolynomial Value Iteration Algorithms for Detrerminstic MDPs\nReal-valued All-Dimensions search: Low-overhead rapid searching over  subsets of attributes\nContinuous Time Bayesian Networks\nModelling Information Incorporation in Markets, with Application to  Detecting and Explaining Events\nFrom Qualitative to Quantitative Probabilistic Networks\nAn MDP-based Recommender System\nDiscriminative Probabilistic Models for Relational Data\nAnytime State-Based Solution Methods for Decision Processes with  non-Markovian Rewards\nExploiting Functional Dependence in Bayesian Network Inference\nDecision Principles to justify Carnap's Updating Method and to Suggest  Corrections of Probability Judgments (Invited Talks)\nIPF for Discrete Chain Factor Graphs\nInductive Policy Selection for First-Order MDPs\nGraphical readings of possibilistic logic bases\nPre-processing for Triangulation of Probabilistic Networks\nConfidence Inference in Bayesian Networks\nSemi-Instrumental Variables: A Test for Instrument 
Admissibility\nUsing Bayesian Networks to Identify the Causal Effect of Speeding in  Individual Vehicle/Pedestrian Collisions\nEfficient Stepwise Selection in Decomposable Models\nIncorporating Expressive Graphical Models in Variational Approximations:  Chain-Graphs and Hidden Variables\nLearning the Dimensionality of Hidden Variables\nMultivariate Information Bottleneck\nA Comparison of Axiomatic Approaches to Qualitative Decision Making  Using Possibility Theory\nRobust Combination of Local Controllers\nA Clustering Approach to Solving Large Stochastic Matching Problems\nA Bayesian Approach to Tackling Hard Computational Problems\nOn characterizing Inclusion of Bayesian Networks\nImproved learning of Bayesian networks\nInference in Hybrid Networks: Theoretical Limits and Practical  Algorithms\nA Bayesian Multiresolution Independence Test for Continuous Variables\nAggregating Learned Probabilistic Beliefs\nExpectation Propagation for approximate Bayesian inference\nThe Factored Frontier Algorithm for Approximate Inference in DBNs\nLattice Particle Filters\nApproximating MAP using Local Search\nSufficiency, Separability and Temporal Probabilistic Models\nToward General Analysis of Recursive Probability Models\nValue-Directed Sampling Methods for POMDPs\nDecision-Theoretic Planning with Concurrent Temporally Extended Actions\nPolicy Improvement for POMDPs Using Normalized Importance Sampling\nMaximum Likelihood Bounded Tree-Width Markov Networks\nBayesian Error-Bars for Belief Net Inference\nAnalysing Sensitivity Data from Probabilistic Networks\nThe Optimal Reward Baseline for Gradient-Based Reinforcement Learning\nBelief Optimization for Binary Networks: A Stable Alternative to Loopy  Belief Propagation\nStatistical Modeling in Continuous Speech Recognition (CSR)(Invited  Talk)\nPlanning and Acting under Uncertainty: A New Model for Spoken Dialogue  Systems\nA Complete Calculus for Possibilistic Logic Programming with Fuzzy  Propositional Variables\nThe Complexity 
of Decentralized Control of Markov Decision Processes\nDynamic Bayesian Multinets\nUtilities as Random Variables: Density Estimation and Structure  Discovery\nComputational Investigation of Low-Discrepancy Sequences in Simulation  Algorithms for Bayesian Networks\nA Decision Theoretic Approach to Targeted Advertising\nA Bayesian Method for Causal Modeling and Discovery Under Selection\nSeparation Properties of Sets of Probability Measures\nA Differential Approach to Inference in Bayesian Networks\nAny-Space Probabilistic Inference\nMix-nets: Factored Mixtures of Gaussians in Bayesian Networks With Mixed  Continuous And Discrete Variables\nRao-Blackwellised Particle Filtering for Dynamic Bayesian Networks\nLikelihood Computations Using Value Abstractions\nInference for Belief Networks Using Coupling From the Past\nDependency Networks for Collaborative Filtering and Data Visualization\nYGGDRASIL - A Statistical Package for Learning Split Models\nMarginalization in Composed Probabilistic Models\nFast Planning in Stochastic Games\nMaking Sensitivity Analysis Computationally Efficient\nGame Networks\nCombinatorial Optimization by Learning and Simulation of Bayesian  Networks\nCredal Networks under Maximum Entropy\nTractable Bayesian Learning of Tree Belief Networks\nRepresenting and Solving Asymmetric Bayesian Decision Problems\nUsing ROBDDs for Inference in Bayesian Networks with Troubleshooting as  an Example\nAdaptive Importance Sampling for Estimation in Structured Domains\nProbabilistic Models for Query Approximation with Large Sparse Binary  Datasets\nValue-Directed Belief State Approximation for POMDPs\nProbabilistic State-Dependent Grammars for Plan Recognition\nDynamic Trees: A Structured Variational Method Giving Efficient  Propagation Rules\nModel-Based Hierarchical Clustering\nConditional Independence and Markov Properties in Possibility Theory\nVariational Approximations between Mean Field Theory and the Junction  Tree Algorithm\nExploiting Qualitative 
Knowledge in the Learning of Conditional  Probabilities of Bayesian Networks\nUser Interface Tools for Navigation in Conditional Probability Tables  and Elicitation of Probabilities in Bayesian Networks\nComputer Poker Research at LIACC\nA Temporal Bayesian Network for Diagnosis and Prediction\nPossibilistic logic bases and possibilistic graphs\nReasoning With Conditional Ceteris Paribus Preference Statem\nDiscovering the Hidden Structure of Complex Dynamic Systems\nComparing Bayesian Network Classifiers\nA Hybrid Anytime Algorithm for the Constructiion of Causal Models From  Sparse Data\nModel-Based Bayesian Exploration\nHybrid Probabilistic Programs: Algorithms and Complexity\nAssessing the value of a candidate. Comparing belief function and  possibility theories\nQualitative Models for Decision Under Uncertainty without the  Commensurability Assumption\nData Analysis with Bayesian Networks: A Bootstrap Approach\nLearning Bayesian Network Structure from Massive Datasets: The \"Sparse  Candidate\" Algorithm\nOn Transformations between Probability and Spohnian Disbelief Functions\nA Hybrid Approach to Reasoning with Partially Elicited Preference Models\nSPUDD: Stochastic Planning using Decision Diagrams\nAttention-Sensitive Alerting\nMini-Bucket Heuristics for Improved Search\nA General Algorithm for Approximate Inference and its Application to  Hybrid Bayes Nets\nBayesian Poker\nLazy Evaluation of Symmetric Bayesian Decision Problems\nSolving POMDPs by Searching the Space of Finite Policies\nLearning Finite-State Controllers for Partially Observable Environments\nBayes Nets in Educational Assessment: Where Do the Numbers Come From?\nA Bayesian Network Classifier that Combines a Finite Mixture Model and a  Naive Bayes Model\nLoopy Belief Propagation for Approximate Inference: An Empirical Study\nLearning Bayesian Networks with Restricted Causal Interactions\nThe Decision-Theoretic Interactive Video Advisor\nGraphical Representations of Consensus Belief\nBayesian 
Networks for Dependability Analysis: an Application to Digital  Control Reliability\nInference Networks and the Evaluation of Evidence: Alternative Analyses\nLearning Hidden Markov Models with Geometrical Constraints\nAn Update Semantics for Defeasible Obligations\nHow to Elicit Many Probabilities\nProbabilistic Belief Change: Expansion, Conditioning and Constraining\nContextual Weak Independence in Bayesian Networks\nInference in Multiply Sectioned Bayesian Networks with Extended  Shafer-Shenoy and Lazy Propagation\nTime-Critical Dynamic Decision Making\nTowards a Logic-Based Unifying Framework for Computing\nOn the Semantics and Automated Deduction for PLFC, a Logic of  Possibilistic Uncertainty and Fuzziness\nA Hybrid Algorithm to Compute Marginal and Joint Beliefs in Bayesian  Networks and Its Complexity\nStructured Reachability Analysis for Markov Decision Processes\nQuery Expansion in Information Retrieval Systems using a Bayesian  Network-Based Thesaurus\nDynamic Jointrees\nThe Bayesian Structural EM Algorithm\nPsychological and Normative Theories of Causal Power and the  Probabilities of Causes\nTowards Case-Based Preference Elicitation: Similarity Measures on  Preference Structures\nHierarchical Solution of Markov Decision Processes using Macro-actions\nInferring Informational Goals from Free-Text Queries: A Bayesian  Approach\nEvaluating Las Vegas Algorithms - Pitfalls and Remedies\nAn Anytime Algorithm for Decision Making under Uncertainty\nImplementing Resolute Choice Under Uncertainty\nDealing with Uncertainty on the Initial State of a Petri Net\nIncremental Tradeoff Resolution in Qualitative Probabilistic Networks\nMagic Inference Rules for Probabilistic Deduction under Taxonomic  Knowledge\nLazy Propagation in Junction Trees\nFrom Likelihood to Plausibility\nA Multivariate Discretization Method for Learning Bayesian Networks from  Mixed Data\nResolving Conflicting Arguments under Uncertainties\nLearning From What You Don't Observe\nContext-Specific 
Approximation in Probabilistic Inference\nEmpirical Evaluation of Approximation Algorithms for Probabilistic  Decoding\nDecision Theoretic Foundations of Graphical Model Selection\nBayes-Ball: The Rational Pastime (for Determining Irrelevance and  Requisite Information in Belief Networks and Influence Diagrams)\nSwitching Portfolios\nBayesian Networks from the Point of View of Chain Graphs\nLearning Mixtures of DAG Models\nFlexible and Approximate Computation through State-Space Reduction\nLearning to Rank for Expert Search in Digital Libraries of Academic  Publications\nThe SETI Episode in the 1967 Discovery of Pulsars\nBayes Networks for Sonar Sensor Fusion\nStructured Arc Reversal and Simulation of Dynamic Probabilistic Networks\nA Bayesian Approach to Learning Bayesian Networks with Local Structure\nExploring Parallelism in Learning Belief Networks\nRobustness Analysis of Bayesian Networks with Local Convex Sets of  Distributions\nMyopic Value of Information in Influence Diagrams\nDecision-making Under Ordinal Preferences and Comparative Uncertainty\nSequential Update of Bayesian Network Structure\nImage Segmentation in Video Sequences: A Probabilistic Approach\nLearning Bayesian Nets that Perform Well\nProbability Update: Conditioning vs. 
Cross-Entropy\nProblem-Focused Incremental Elicitation of Multi-Attribute Utility  Models\nPerception, Attention, and Resources: A Decision-Theoretic Approach to  Graphics Rendering\nLearning Belief Networks in Domains with Recursively Embedded Pseudo  Independent Submodels\nComposition of Probability Measures on Finite Spaces\nNested Junction Trees\nNonuniform Dynamic Discretization in Hybrid Networks\nNetwork Fragments: Representing Knowledge for Constructing Probabilistic  Models\nComputational Advantages of Relevance Reasoning in Bayesian Belief  Networks\nA Target Classification Decision Aid\nSupport and Plausibility Degrees in Generalized Functional Models\nThe Cognitive Processing of Causal Knowledge\nCost-Sharing in Bayesian Knowledge Bases\nConditional Utility, Utility Independence, and Utility Networks\nSequential Thresholds: Context Sensitive Default Extensions\nScore and Information for Recursive Exponential Models with Incomplete  Data\nAn Algorithm for Finding Minimum d-Separating Sets in Belief Networks\nConstraining Influence Diagram Structure by Generative Planning: An  Application to the Optimization of Oil Spill Response\nA Structurally and Temporally Extended Bayesian Belief Network Model:  Definitions, Properties, and Modeling Techniques\nAn Alternative Markov Property for Chain Graphs\nEntailment in Probability of Thresholded Generalizations\nObject Recognition with Imperfect Perception and Redundant Description\nA Sufficiently Fast Algorithm for Finding Close to Optimal Junction  Trees\nContext-Specific Independence in Bayesian Networks\nDecision-Theoretic Troubleshooting: A Framework for Repair and  Experiment\nTail Sensitivity Analysis in Bayesian Networks\nDecision-Analytic Approaches to Operational Decision Making: Application  and Observation\nLearning Equivalence Classes of Bayesian Networks Structures\nEfficient Approximations for the Marginal Likelihood of Incomplete Data  Given a Bayesian Network\nIndependence with Lower and Upper 
Probabilities\nQuasi-Bayesian Strategies for Efficient Plan Generation: Application to  the Planning to Observe Problem\nSound Abstraction of Probabilistic Actions in The Constraint Mass  Assignment Framework\nBelief Revision with Uncertain Inputs in the Possibilistic Setting\nAn Evaluation of Structural Parameters for Probabilistic Reasoning:  Results on Benchmark Circuits\nLearning Bayesian Networks with Local Structure\nA Qualitative Markov Assumption and its Implications for Belief Change\nAsymptotic Model Selection for Directed Networks with Hidden Variables\nTheoretical Foundations for Abstraction-Based Probabilistic Planning\nWhy Is Diagnosis Using Belief Networks Insensitive to Imprecision In  Probabilities?\nA Probabilistic Model For Sensor Validation\nUncertain Inferences and Uncertain Conclusions\nProbabilistic Disjunctive Logic Programming\nGeometric Implications of the Naive Bayes Assumption\nA Discovery Algorithm for Directed Cyclis Graphs\nA Polynomial-Time Algorithm for Deciding Markov Equivalence of Directed  Cyclic Graphical Models\nSample-and-Accumulate Algorithms for Belief Updating in Bayes Networks\nEfficient Enumeration of Instantiations in Bayesian Networks\nOn Separation Criterion and Recovery Algorithm for Chain Graphs\nPossible World Partition Sequences: A Unifying Framework for Uncertain  Reasoning\nSupply Restoration in Power Distribution Systems - A Case Study in  Integrating Model-Based Diagnosis and Repair Planning\nOptimal Factory Scheduling using Stochastic Dominance A*\nCritical Remarks on Single Link Search in Learning Belief Networks\nGraphical Models for Preference and Utility\nAn Algebraic Semantics for Possibilistic Logic\nAutomating Computer Bottleneck Detection with Belief Nets\nError Estimation in Approximate Bayesian Belief Network Inference\nPractical Model-Based Diagnosis with Qualitative Possibilistic  Uncertainty\nIndependence Concepts for Convex Sets of Probabilities\nClustering Without (Thinking About) 
Triangulation\nElicitation of Probabilities for Belief Networks: Combining Qualitative  and Quantitative Information\nNumerical Representations of Acceptance\nFraud/Uncollectible Debt Detection Using a Bayesian Network Based  Learning System: A Rare Binary Outcome with Mixed Data Structures\nA Constraint Satisfaction Approach to Decision under Uncertainty\nPlausibility Measures: A User's Guide\nFast Belief Update Using Order-of-Magnitude Probabilities\nA Definition and Graphical Representation for Causality\nDisplay of Information for Time-Critical Decision Making\nInformation/Relevance Influence Diagrams\nOn the Detection of Conflicts in Diagnostic Bayesian Networks Using  Abstraction\nSensitivities: An Alternative to Conditional Probabilities for Bayesian  Belief Networks\nExploiting the Rule Structure for Decision Making within the Independent  Choice Logic\nAbstraction in Belief Networks: The Role of Intermediate States in  Diagnostic Reasoning\nAccounting for Context in Plan Recognition, with Application to Traffic  Monitoring\nA New Pruning Method for Solving Decision Trees and Game Trees\nDirected Cyclic Graphical Representations of Feedback Models\nModeling Failure Priors and Persistence in Model-Based Diagnosis\nA Polynomial Algorithm for Computing the Optimal Repair Strategy in a  System with Independent Component Failures\nExploiting System Hierarchy to Compute Repair Plans in Probabilistic  Model-based Diagnosis\nPath Planning under Time-Dependent Uncertainty\nDevelopment of Yes/No Arabic Question Answering System\nAn Evaluation of an Algorithm for Inductive Learning of Bayesian Belief  Networks Usin\nProbabilistic Constraint Satisfaction with Non-Gaussian Noise\nGenerating New Beliefs From Old\nCounterfactual Probabilities: Computational Methods, Bounds and  Applications\nPlanning with External Events\nProperties of Bayesian Belief Network Learning Algorithms\nEfficient Estimation of the Value of Information in Monte Carlo Models\nAction Networks: A 
Framework for Reasoning about Actions and Change  under Uncertainty\nOn the Relation between Kappa Calculus and Probabilistic Reasoning\nA Structured, Probabilistic Representation of Action\nLocalized Partial Evaluation of Belief Networks\nA Probabilistic Model of Action for Least-Commitment Planning with  Information Gather\nAn Ordinal View of Independence with Application to Plausible Reasoning\nValue of Evidence on Influence Diagrams\nLearning Gaussian Networks\nOn Testing Whether an Embedded Bayesian Network Represents a Probability  Model\nEpsilon-Safe Planning\nGenerating Bayesian Networks from Probability Logic Knowledge Bases\nA New Look at Causal Independence\nProbabilistic Description Logics\nAn Experimental Comparison of Numerical and Qualitative Probabilistic  Reasoning\nAn Alternative Proof Method for Possibilistic Logic and its Application  to Terminological Logics\nPossibilistic Conditioning and Propagation\nThe Automated Mapping of Plans for Plan Recognition\nReduction of Computational Complexity in Bayesian Networks through  Removal of Weak Dependencies\nUsing New Data to Refine a Bayesian Network\nSyntax-based Default Reasoning as Probabilistic Model-based Diagnosis\nFuzzy Geometric Relations to Represent Hierarchical Spatial Information\nOperator Selection While Planning Under Uncertainty\nModel-Based Diagnosis with Qualitative Temporal Uncertainty\nIncremental Dynamic Construction of Layered Polytree Networks\nA Probabilistic Calculus of Actions\nRobust Planning in Uncertain Environments\nKnowledge Engineering for Large Belief Networks\nBelief Maintenance in Bayesian Networks\nGlobal Conditioning for Probabilistic Inference in Belief Networks\nIgnorance and the Expressiveness of Single- and Set-Valued Probability  Models of Belief\nGeneral Belief Measures\nA Probabilistic Algorithm for Calculating Structure: Borrowing from  Simulated Annealing\nTradeoffs in Constructing and Evaluating Temporal Influence Diagrams\nEnd-User Construction of 
Influence Diagrams for Bayesian Statistics\nOn Considering Uncertainty and Alternatives in Low-Level Vision\nForecasting Sleep Apnea with Dynamic Network Models\nDiagnosis of Multiple Faults: A Sensitivity Analysis\nAdditive Belief-Network Models\nDialectic Reasoning with Inconsistent Information\nUtility-Based Abstraction and Categorization\nSome Complexity Considerations in the Combination of Belief Networks\nDeriving a Minimal I-map of a Belief Network Relative to a Target  Ordering of its Nodes\nMixtures of Gaussians and Minimum Relative Entropy Techniques for  Modeling Continuous Uncertainties\nRelevant Explanations: Allowing Disjunctive Assignments\nUsing First-Order Probability Logic for the Construction of Bayesian  Networks\nRepresenting and Reasoning With Probabilistic Knowledge: A Bayesian  Approach\nUsing Causal Information and Local Measures to Learn Bayesian Networks\nMinimal Assumption Distribution Propagation in Belief Networks\nKnowledge-Based Decision Model Construction for Hierarchical Diagnosis:  A Preliminary Report\nAn Implementation of a Method for Computing the Uncertainty in Inferred  Probabilities in Belief Networks\nDeliberation Scheduling for Time-Critical Sequential Decision Making\nAn efficient approach for finding the MPE in belief networks\nA Method for Planning Given Uncertain and Incomplete Information\nThe use of conflicts in searching Bayesian networks\nGALGO: A Genetic ALGOrithm Decision Support Tool for Complex Uncertain  Systems Modeled with Bayesian Belief Networks\nArgumentative inference in uncertain and inconsistent knowledge bases\nArgumentation as a General Framework for Uncertain Reasoning\nThe Probability of a Possibility: Adding Uncertainty to Default Rules\nPossibilistic decreasing persistence\nSecurity Assessment of Software Design using Neural Network\nStructural Controllability and Observability in Influence Diagrams\nReformulating Inference Problems Through Selective Conditioning\nA Symbolic Approach to Reasoning 
with Linguistic Quantifiers\nPossibilistic Assumption based Truth Maintenance System, Validation in a  Data Fusion Application\nIntegrating Model Construction and Evaluation\nReasoning With Qualitative Probabilities Can Be Tractable\nSemantics for Probabilistic Inference\nRepresenting Heuristic Knowledge in D-S Theory\nThe Topological Fusion of Bayes Nets\nCalculating Uncertainty Intervals From Conditional Convex Sets of  Probabilities\nSensor Validation Using Dynamic Belief Networks\nDecision Methods for Adaptive Task-Sharing in Associate Systems\nModeling Uncertain Temporal Evolutions in Model-Based Diagnosis\nPossibilistic Constraint Satisfaction Problems or \"How to handle soft  constraints?\"\nIntuitions about Ordered Beliefs Leading to Probabilistic Models\nAn Algorithm for Deciding if a Set of Observed Independencies Has a  Causal Explanation\nExploring Localization in Bayesian Networks for Large Expert Systems\nA Decision Calculus for Belief Functions in Valuation-Based Systems\nSidestepping the Triangulation Problem in Bayesian Net Computations\nARCO1: An Application of Belief Networks to the Oil Market\nCombining Multiple-Valued Logics in Modular Expert Systems\nConstraint Propagation with Imprecise Conditional Probabilities\nSome Properties of Plausible Reasoning\nTheory Refinement on Bayesian Networks\nSymbolic Probabilistic Inference with Continuous Variables\nProbability Estimation in Face of Irrelevant Information\nAn Approximate Nonmyopic Computation for Value of Information\nSearch-based Methods to Bound Diagnostic Probabilities in Very Large  Belief Nets\nBelief and Surprise - A Belief-Function Formulation\nEvidential Reasoning in a Categorial Perspective: Conjunction and  Disjunction of Belief Functions\nA Logic of Graded Possibility and Certainty Coping with Partial  Inconsistency\nA Language for Planning with Statistics\nManagement of Uncertainty in the Multi-Level Monitoring and Diagnosis of  the Time of Flight Scintillation Array\nDynamic 
Network Updating Techniques For Diagnostic Reasoning\nHigh Level Path Planning with Uncertainty\nDeliberation and its Role in the Formation of Intentions\nTruth as Utility: A Conceptual Synthesis\nStructuring Bodies of Evidence\nA Method for Integrating Utility Analysis into an Expert System for  Design Evaluation\nCompatibility of Quantitative and Qualitative Representations of Belief\nA Non-Numeric Approach to Multi-Criteria/Multi-Expert Aggregation Based  on Approximate Reasoning\nUniversal Induction with Varying Sets of Combinators\nArtificial Intelligence MArkup Language: A Brief Tutorial\nLower Bounds for Exact Model Counting and Applications in Probabilistic  Databases\nReasoning about Probabilities in Dynamic Systems using Goal Regression\nSparsityBoost: A New Scoring Function for Learning Bayesian Network  Structure\nAutomorphism Groups of Graphical Models and Lifted Variational Inference\nLearning Sparse Causal Models is not NP-hard\nQualitative Possibilistic Mixed-Observable MDPs\nOptimization With Parity Constraints: From Binary Codes to Discrete  Integration\nMonte-Carlo Planning: Theoretically Fast Convergence Meets Practical  Efficiency\nBethe-ADMM for Tree Decomposition based Parallel MAP Inference\nDiscovering Cyclic Causal Models with Latent Variables: A General  SAT-Based Procedure\nSolving Limited-Memory Influence Diagrams Using Branch-and-Bound Search\nEvaluating Anytime Algorithms for Learning Optimal Bayesian Networks\nOn the Complexity of Strong and Epistemic Credal Networks\nLearning Periodic Human Behaviour Models from Sparse Data for  Crowdsourcing Aid Delivery in Developing Countries\nTighter Linear Program Relaxations for High Order Graphical Models\nCyclic Causal Discovery from Continuous Equilibrium Data\nEvaluating computational models of explanation using human judgments\nApproximation of Lorenz-Optimal Solutions in Multiobjective Markov  Decision Processes\nSolution Methods for Constrained Markov Decision Process with Continuous  
Probability Modulation\nSparse Nested Markov models with Log-linear Parameters\nPreference Elicitation For General Random Utility Models\nDynamic Blocking and Collapsing for Gibbs Sampling\nBounded Approximate Symbolic Dynamic Programming for Hybrid MDPs\nOn MAP Inference by MWSS on Perfect Graphs\nLinear combination of one-step predictive information with an external  reward in an episodic policy gradient setting: a critical analysis\nCalculation of Entailed Rank Constraints in Partially Non-Linear and  Cyclic Models\nDouble four-bar crank-slider mechanism dynamic balancing by  meta-heuristic algorithms\nStudying a Chaotic Spiking Neural Model\nA Big Data Approach to Computational Creativity\nPlanning by case-based reasoning based on fuzzy logic\nAbstraction in decision-makers with limited information processing  capabilities\nBounded Recursive Self-Improvement\nDecision Making under Uncertainty: A Quasimetric Approach\nTractability through Exchangeability: A New Perspective on Efficient  Probabilistic Inference\nNetworks of Influence Diagrams: A Formalism for Representing Agents'  Beliefs and Decision-Making Processes\nAnalogical Dissimilarity: Definition, Algorithms and Two Experiments in  Machine Learning\nA Heuristic Search Approach to Planning with Continuous Resources in  Stochastic Domains\nOnline Planning Algorithms for POMDPs\nThe Ultrametric Constraint and its Application to Phylogenetics\nInteractive Policy Learning through Confidence-Based Autonomy\nAsynchronous Forward Bounding for Distributed COPs\nCompleteness and Performance Of The APO Algorithm\nThe Computational Complexity of Dominance and Consistency in CP-Nets\nMonte Carlo Sampling Methods for Approximating Interactive POMDPs\nSolving #SAT and Bayesian Inference with Backtracking Search\nGeneric Preferences over Subsets of Structured Objects\nA Bilinear Programming Approach for Multiagent Planning\nLearning Bayesian Network Equivalence Classes with Ant Colony  Optimization\nPlanning over Chain 
Causal Graphs for Variables with Domains of Size 5  Is NP-Hard\nCompiling Uncertainty Away in Conformant Planning Problems with Bounded  Width\nExploiting Single-Cycle Symmetries in Continuous Constraint Problems\nConservative Inference Rule for Uncertain Reasoning under Incompleteness\nThe Complexity of Circumscription in DLs\nRelaxed Survey Propagation for The Weighted Maximum Satisfiability  Problem\nHypertableau Reasoning for Description Logics\nThe DL-Lite Family and Relations\nJoin-Graph Propagation Algorithms\nParamILS: An Automatic Algorithm Configuration Framework\nPredicting the Performance of IDA* using Conditional Distributions\nEfficient Planning under Uncertainty with Macro-actions\nActive Tuples-based Scheme for Bounding Posterior Beliefs\nChange in Abstract Argumentation Frameworks: Adding an Argument\nDeveloping Approaches for Solving a Telecommunications Feature  Subscription Problem\nMultiattribute Auctions Based on Generalized Additive Independence\nAutomatic Induction of Bellman-Error Features for Probabilistic Planning\nApproximate Model-Based Diagnosis Using Greedy Stochastic Search\nNominals, Inverses, Counting, and Conjunctive Queries or: Why Infinity  is your Friend!\nAlgorithms for Closed Under Rational Behavior (CURB) Sets\nLogical Foundations of RDF(S) with Datatypes\nPlanning with Noisy Probabilistic Relational Rules\nBest-First Heuristic Search for Multicore Machines\nAn Effective Algorithm for and Phase Transitions of the Directed  Hamiltonian Cycle Problem\nA Logical Study of Partial Entailment\nIterated Belief Change Due to Actions and Observations\nClause-Learning Algorithms with Many Restarts and Bounded-Width  Resolution\nLearning to Make Predictions In Partially Observable Environments  Without a Generative Model\nSecond-Order Consistencies\nProperties of Bethe Free Energies and Message Passing in Gaussian Models\nValue of Information Lattice: Exploiting Probabilistic Independence for  Effective Feature Subset 
Acquisition\nProbabilistic Relational Planning with First Order Decision Diagrams\nExploiting Structure in Weighted Model Counting Approaches to  Probabilistic Inference\nAnalyzing Search Topology Without Running Any Search: On the Connection  Between Causal Graphs and h+\nInterpolable Formulas in Equilibrium Logic and Answer Set Programming\nFirst-Order Stable Model Semantics and First-Order Loop Formulas\nDecidability and Undecidability Results for Propositional Schemata\nDefeasible Inclusions in Low-Complexity DLs\nMaking Decisions Using Sets of Probabilities: Updating, Time  Consistency, and Calibration\nProximity-Based Non-uniform Abstractions for Approximate Planning\nSAS+ Planning as Satisfiability\nLearning and Reasoning with Action-Related Places for Robust Mobile  Manipulation\nCounting-Based Search: Branching Heuristics for Constraint Satisfaction  Problems\nSemantic Similarity Measures Applied to an Ontology for Human-Like  Interaction\nReformulating the Situation Calculus and the Event Calculus in the  General Theory of Stable Models and in Answer Set Programming\nLocal Consistency and SAT-Solvers\nAlgorithms and Limits for Compact Plan Representations\nThe Logical Difference for the Lightweight Description Logic EL\nAlgorithms for Generating Ordered Solutions for Explicit AND/OR  Structures\nReasoning over Ontologies with Hidden Content: The Import-by-Query  Approach\nQuality of Geographic Information: Ontological approach and Artificial  Intelligence Tools\nDynamic Move Chains -- a Forward Pruning Approach to Tree Search in  Computer Chess\nApproximation Models of Combat in StarCraft 2\nAxiomatization of Finite Algebras\nA Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference  in Natural Language Processing\nProjective simulation applied to the grid-world and the mountain-car  problem\nA Comparison of Monte Carlo Tree Search and Mathematical Optimization  for Large Scale Dynamic Resource Allocation\nGenerating Natural Language 
Descriptions from OWL Ontologies: the  NaturalOWL System\nA Multi-threshold Segmentation Approach Based on Artificial Bee Colony  Optimization\nAnalogy-Based and Case-Based Reasoning: Two sides of the same coin\nInferring latent structures via information inequalities\nAllocation in Practice\nContext Aware Dynamic Traffic Signal Optimization\nFuzzy inference system for integrated VVC in isolated power systems\nEfficient Bayesian Nonparametric Modelling of Structured Point Processes\nDefining Relative Likelihood in Partially-Ordered Preferential  Structures\nUpdating Probabilities\nReasoning about Expectation\nWhen Ignorance is Bliss\nEvidence with Uncertain Likelihoods\nA Game-Theoretic Analysis of Updating Sets of Probabilities\nMDPs with Unawareness\nMarket Making with Decreasing Utility for Information\nLogarithmic-Time Updates and Queries in Probabilistic Networks\nAxiomatizing Causal Reasoning\nLearning to Cooperate via Policy Search\nUpdating with incomplete observations\nWhen do Numbers Really Matter?\nSensitivity analysis for finite Markov chains in discrete time\nOn the Conditional Independence Implication Problem: A Lattice-Theoretic  Approach\nApproximate inference on planar graphs using Loop Calculus and Belief  Propagation\nEfficient Clustering with Limited Distance Information\nSelecting Computations: Theory and Applications\nPredicting the behavior of interacting humans by fusing data from  multiple sources\nActive Sensing as Bayes-Optimal Sequential Decision Making\nScoring and Searching over Bayesian Networks with Causal and Associative  Priors\nTight Regret Bounds for Stochastic Combinatorial Semi-Bandits\nStochastic Discriminative EM\nAn improved multimodal PSO method based on electrostatic interaction  using n- nearest-neighbor local search\nConsensus Message Passing for Layered Graphical Models\nLogical Limitations to Machine Ethics with Consequences to Lethal  Autonomous Weapons\nROSS User's Guide and Reference Manual (Version 1.0)\nGenetic 
Algorithms in Wireless Networking: Techniques, Applications, and  Issues\nA Fuzzy Syllogistic Reasoning Schema for Generalized Quantifiers\nOn the analysis of set-based fuzzy quantified reasoning using classical  syllogistics\nSolving Games with Functional Regret Estimation\nBelief Approach for Social Networks\nFactorization, Inference and Parameter Learning in Discrete AMP Chain  Graphs\nA Distance-Based Decision in the Credal Level\nAn approach to multi-agent planning with incomplete information\nReactive Reasoning with the Event Calculus\nTowards Ideal Semantics for Analyzing Stream Reasoning\nAsynchronous Multi-Context Systems\nOn Minimal Change in Evolving Multi-Context Systems (Preliminary Report)\nTowards Large-scale Inconsistency Measurement\nTowards Efficient Evolving Multi-Context Systems (Preliminary Report)\nProbabilistic Constraint Programming for Parameters Optimisation of  Generative Models\nFast Cross-Validation for Incremental Learning\nA New Approach to Probabilistic Programming Inference\nBlack-Box Policy Search with Probabilistic Programs\nBudget Constraints in Prediction Markets\nReview-Level Sentiment Classification with Sentence-Level Polarity  Correction\nTaxonomy of Pathways to Dangerous AI\nIntroduzione all'Intelligenza Artificiale\nGaussian Process Planning with Lipschitz Continuous Reward Functions:  Towards Unifying Bayesian Optimization, Active Learning, and Beyond\nNear-Optimal Active Learning of Multi-Output Gaussian Processes\nSolving Transition-Independent Multi-agent MDPs with Sparse Interactions  (Extended version)\nLearning to Generate Posters of Scientific Papers\nLandmark-Based Plan Recognition\nA system of serial computation for classified rules prediction in  non-regular ontology trees\nLearning Social Affordance for Human-Robot Interaction\nVisual Storytelling\nTasks for agent-based negotiation teams: Analysis, review, and  challenges\nManaging Overstaying Electric Vehicles in Park-and-Charge Facilities\nConstructive 
Preference Elicitation by Setwise Max-margin Learning\nBayesian Inference of Recursive Sequences of Group Activities from  Tracks\nDeep Learning for Reward Design to Improve Monte Carlo Tree Search in  ATARI Games\nProtein Secondary Structure Prediction Using Cascaded Convolutional and  Recurrent Neural Networks\nThe Power of Arc Consistency for CSPs Defined by Partially-Ordered  Forbidden Patterns\nDeep, Convolutional, and Recurrent Models for Human Activity Recognition  using Wearables\nEssentials of an Integrated Crowd Management Support System Based on  Collective Artificial Intelligence\nProbabilistic Inference from Arbitrary Uncertainty using Mixtures of  Factorized Generalized Gaussians\nTypical models: minimizing false beliefs\nThe Ariadne's Clew Algorithm\nComputational Aspects of Reordering Plans\nThe Divide-and-Conquer Subgoal-Ordering Algorithm for Speeding up Logic  Inference\nA Temporal Description Logic for Reasoning about Actions and Plans\nOrder of Magnitude Comparisons of Distance\nThe Automatic Inference of State Invariants in TIM\nComplexity of Prioritized Default Logics\nSqueaky Wheel Optimization\nEfficient Implementation of the Plan Graph in STAN\nCooperation between Top-Down and Bottom-Up Theorem Provers\nProbabilistic Deduction with Conditional Constraints over Basic Events\nVariational Probabilistic Inference and the QMR-DT Network\nExtensible Knowledge Representation: the Case of Description Reasoners\nLearning to Order Things\nConstructing Conditional Plans by a Theorem-Prover\nIssues in Stacked Generalization\nReasoning on Interval and Point-based Disjunctive Metric Constraints in  Temporal Contexts\nCauses of Ineradicable Spurious Predictions in Qualitative Simulation\nHow the Landscape of Random Job Shop Scheduling Instances Depends on the  Ratio of Jobs to Machines\nPreference-based Search using Example-Critiquing with Suggestions\nAnytime Point-Based Approximations for Large POMDPs\nLearning Sentence-internal Temporal 
Relations\nConfidence-based Reasoning in Stochastic Constraint Programming\nModelling Mixed Discrete-Continuous Domains for Planning\nSet Intersection and Consistency in Constraint Networks\nConsistency and Random Constraint Satisfaction Models\nUncertainty in Soft Temporal Constraint Problems:A General Framework and  Controllability Algorithms forThe Fuzzy Case\nThe Generalized A* Architecture\nAn Approach to Temporal Planning and Scheduling in Domains with  Predictable Exogenous Events\nProactive Algorithms for Job Shop Scheduling with Probabilistic  Durations\nThe Language of Search\nUnderstanding Algorithm Performance on an Oversubscribed Scheduling  Application\nAnytime Heuristic Search\nCutset Sampling for Bayesian Networks\nSemantic Matchmaking as Non-Monotonic Reasoning: A Description Logic  Approach\nSolution-Guided Multi-Point Constructive Search for Job Shop Scheduling\nResource Allocation Among Agents with MDP-Induced Preferences\nA Continuation Method for Nash Equilibria in Structured Games\nExtending Object-Oriented Languages by Declarative Specifications of  Complex Objects using Answer-Set Programming\nPerformance Evaluation of Road Traffic Control Using a Fuzzy Cellular  Model\nQualitative Propagation and Scenario-based Explanation of Probabilistic  Reasoning\nIntegrating Probabilistic, Taxonomic and Causal Knowledge in Abductive  Diagnosis\nWhat is an Optimal Diagnosis?\nKutato: An Entropy-Driven System for Construction of Probabilistic  Expert Systems from Databases\nComputationally-Optimal Real-Resource Strategies\nReducing Uncertainty in Navigation and Exploration\nDecision Making with Interval Influence Diagrams\nApproximations in Bayesian Belief Universe for Knowledge Based Systems\nA Polynomial Time Algorithm for Finding Bayesian Probabilities from  Marginal Constraints\nComputation of Variances in Causal Networks\nA Sensitivity Analysis of Pathfinder\nOn the Equivalence of Causal Models\nDirected Reduction Algorithms and Decomposable 
Graphs\nPossibility as Similarity: the Semantics of Fuzzy Logic\nCredibility Discounting in the Theory of Approximate Reasoning\nUpdating with Belief Functions, Ordinal Conditioning Functions and  Possibility Measures\nValuation-Based Systems for Discrete Optimization\nComputational Aspects of the Mobius Transform\nA Hierarchical Approach to Designing Approximate Reasoning-Based  Controllers for Dynamic Physical Systems\nEvidence Combination and Reasoning and Its Application to Real-World  Problem-Solving\nUsing Belief Functions for Uncertainty Management and Knowledge  Acquisition: An Expert Application\nAn Architecture for Probabilistic Concept-Based Information Retrieval\nFine-Grained Decision-Theoretic Search Control\nCombination of Evidence Using the Principle of Minimum Information Gain\nProbabilistic Evaluation of Candidates and Symptom Clustering for  Multidisorder Diagnosis\nExtending Term Subsumption systems for Uncertainty Management\nRefinement and Coarsening of Bayesian Networks\nSecond Order Probabilities for Uncertain and Conflicting Evidence\nA Model for Non-Monotonic Reasoning Using Dempster's Rule\nDefault Reasoning and the Transferable Belief Model\nSeparable and transitive graphoids\nMap Learning with Indistinguishable Locations\nTemporal Reasoning with Probabilities\nNow that I Have a Good Theory of Uncertainty, What Else Do I Need?\nUncertainty and Incompleteness\nAutomated Reasoning Using Possibilistic Logic: Semantics, Belief  Revision and Variable Certainty Weights\nHow Much More Probable is \"Much More Probable\"? 
Verbal Expressions for  Probability Updates\nInterval Influence Diagrams\nWeighing and Integrating Evidence for Stochastic Simulation in Bayesian  Networks\nThe Relationship between Knowledge, Belief and Certainty\nThe Compilation of Decision Models\nA Tractable Inference Algorithm for Diagnosing Multiple Diseases\nBounded Conditioning: Flexible Inference for Decisions under Scarce  Resources\nHierarchical Evidence Accumulation in the Pseiki System and Experiments  in Model-Driven Mobile Robot Navigation\nWhen Should a Decision Maker Ignore the Advice of a Decision Aid?\nModel-based Influence Diagrams for Machine Vision\nDefeasible Decisions: What the Proposal is and isn't\nExperiments Using Belief Functions and Weights of Evidence incorporating  Statistical Data and Expert Opinions\nConditioning on Disjunctive Knowledge: Defaults and Probabilities\nA Logical Interpretation of Dempster-Shafer Theory, with Application to  Visual Recognition\nSimulation Approaches to General Probabilistic Inference on Belief  Networks\nAssessment, Criticism and Improvement of Imprecise Subjective  Probabilities for a Medical Expert System\nAutomated Construction of Sparse Bayesian Networks from Unstructured  Probabilistic Models and Domain Information\nMaking Decisions with Belief Functions\nComparing Expert Systems Built Using Different Uncertain Inference  Systems\nA General Framework for Interacting Bayes-Optimally with Self-Interested  Agents using Arbitrary Parametric Model and Model Prior\nThe structure of Bayes nets for vision recognition\nProbability Distributions Over Possible Worlds\nHierarchical Evidence and Belief Functions\nInduction and Uncertainty Management Techniques Applied to Veterinary  Medical Diagnosis\nKNET: Integrating Hypermedia and Bayesian Modeling\nProbabilistic Causal Reasoning\nUncertainty Management for Fuzzy Decision Support Systems\nStochastic Sensitivity Analysis Using Fuzzy Influence Diagrams\nA Representation of Uncertainty to Aid Insight into 
Decision Models\nA Comparison of Decision Analysis and Expert Rules for Sequential  Diagnosis\nMultiple decision trees\nProbabilistic Semantics and Defaults\nDecision Making with Linear Constraints on Probabilities\nMaintenance in Probabilistic Knowledge-Based Systems\nGenerating Decision Structures and Causal Explanations for Decision  Making\nGeneralizing the Dempster-Shafer Theory to Fuzzy Sets\nHigher Order Probabilities\nAn Interesting Uncertainty-Based Combinatoric Problem in Spare Parts  Forecasting: The FRED System\nStochastic Simulation of Bayesian Belief Networks\nNAIVE: A Method for Representing Uncertainty and Temporal Relationships  in an Automated Reasoner\nSatisfaction of Assumptions is a Weak Predictor of Performance\nStructuring Causal Tree Models with Continuous Variables\nImplementing Evidential Reasoning in Expert Systems\nAutomated Generation of Connectionist Expert Systems for Problems  Involving Noise and Redundancy\nTowards Solving the Multiple Extension Problem: Combining Defaults and  Probabilities\nThe Role of Calculi in Uncertain Inference Systems\nImplementing a Bayesian Scheme for Revising Belief Commitments\nCompiling Fuzzy Logic Control Rules to Hardware Implementations\nSteps Towards Programs that Manage Uncertainty\nCombining Symbolic and Numeric Approaches to Uncertainty Management\nEstimation Procedures for Robust Sensor Control\nReasoning About Beliefs and Actions Under Computational Resource  Constraints\nAdvantages and a Limitation of Using LEG Nets in a Real-TIme Problem\nApplication of Evidential Reasoning to Helicopter Flight Path Control\nProbabilistic Reasoning About Ship Images\nPredicting The Performance of Minimax and Product in Game-Tree\nThe Myth of Modularity in Rule-Based Systems\nAn Axiomatic Framework for Belief Updates\nImprecise Meanings as a Cause of Uncertainty in Medical Knowledge-Based  Systems\nAn Explanation Mechanism for Bayesian Inferencing Systems\nDistributed Revision of Belief Commitment in 
Multi-Hypothesis  Interpretations\nApproximate Deduction in Single Evidential Bodies\nA Causal Bayesian Model for the Diagnosis of Appendicitis\nExperimentally Comparing Uncertain Inference Systems to Probability\nAn Inequality Paradigm for Probabilistic Knowledge\nProbabilistic Interpretations for MYCIN's Certainty Factors\nUncertain Reasoning Using Maximum Entropy Inference\nIndependence and Bayesian Updating Methods\nA Framework for Comparing Uncertain Inference Systems to Probability\nInductive Inference and the Representation of Uncertainty\nAn Odds Ratio Based Inference Engine\nA Framework for Control Strategies in Uncertain Inference Networks\nConfidence Factors, Empiricism and the Dempster-Shafer Theory of  Evidence\nEvidential Confirmation as Transformed Probability\nBackdoors to Abduction\nA Hybrid Rule Based Fuzzy-Neural Expert System For Passive Network  Monitoring\nAn n-ary Constraint for the Stable Marriage Problem\nSpace as an invention of biological organisms\nHistory Based Coalition Formation in Hedonic Context Using Trust\nFormalization, Mechanization and Automation of Gödel's Proof of God's  Existence\nDavid Poole's Specificity Revised\nOptimal Rectangle Packing: An Absolute Placement Approach\nSafe Exploration of State and Action Spaces in Reinforcement Learning\nIrrelevant and independent natural extension for sets of desirable  gambles\nLifted Variable Elimination: Decoupling the Operators from the  Constraint Language\nBoolean Equi-propagation for Concise and Efficient SAT Encodings of  Combinatorial Problems\nDescription Logic Knowledge and Action Bases\nLearning to Predict from Textual Data\nReasoning about Explanations for Negative Query Answers in DL-Lite\nA Survey of Multi-Objective Sequential Decision-Making\nExtended Breadth-First Search Algorithm\nNear Optimal Bayesian Active Learning for Decision Making\nA self-organizing system for urban traffic control based on predictive  interval microscopic model\nAutomated Generation of 
Geometric Theorems from Images of Diagrams\nGeneric construction of scale-invariantly coarse grained memory\nEvent and Anomaly Detection Using Tucker3 Decomposition\nLifted Tree-Reweighted Variational Inference\nHow good is the Shapley value-based approach to the influence  maximization problem?\nNon-myopic learning in repeated stochastic games\nMulti-Context Models for Reasoning under Partial Knowledge: Generative  Process and Inference Grammar\nGraATP: A Graph Theoretic Approach for Automated Theorem Proving in  Plane Geometry\nIntelligent Indoor Mobile Robot Navigation Using Stereo Vision\nUnified vector space mapping for knowledge representation systems\nGame-theoretic Approach for Non-Cooperative Planning\nA Minimal Active Inference Agent\nThe concept of free will as an infinite metatheoretic recursion\nUsing Latent Semantic Analysis to Identify Quality in Use (QU)  Indicators from User Reviews\nEvaluation Evaluation a Monte Carlo study\nAn Optimized Hybrid Approach for Path Finding\nDiscovery of the $D$-basis in binary tables based on hypergraph  dualization\nGrid-based angle-constrained path planning\nArguments for the Effectiveness of Human Problem Solving\nReducing offline evaluation bias of collaborative filtering algorithms\nExact ICL maximization in a non-stationary time extension of the latent  block model for dynamic networks\nSNA-based reasoning for multiagent team composition\nA Survey of Current Datasets for Vision and Language Research\nOn Design Mining: Coevolution and Surrogate Models\nLifted Representation of Relational Causal Models Revisited:  Implications for Reasoning and Structure Learning\nA note on the complexity of the causal ordering problem\nSemi-described and semi-supervised learning with Gaussian processes\nEnsemble UCT Needs High Exploitation\nTuned and GPU-accelerated parallel data mining from comparable corpora\nReasoning in Infinitely Valued G-IALCQ\nToward a Taxonomy and Computational Models of Abnormalities in 
Images\nExtracting Biomolecular Interactions Using Semantic Parsing of  Biomedical Text\nLarge Scale Distributed Semi-Supervised Learning Using Streaming  Approximation\nSubsumptive reflection in SNOMED CT: a large description logic-based  terminology for diagnosis\nSentence Entailment in Compositional Distributional Semantics\nIncreasing the Action Gap: New Operators for Reinforcement Learning\nConstrained Sampling and Counting: Universal Hashing Meets SAT Solving\nInformation-Theoretic Bounded Rationality\nComplexity of Shift Bribery in Committee Elections\nBachelor's thesis on generative probabilistic programming (in Russian  language, June 2014)\nA Comparative Study of Ranking-based Semantics for Abstract  Argumentation\nReinforcement Learning approach for Real Time Strategy Games Battle city  and S3\nAuthorship Attribution Using a Neural Network Language Model\nEntity Embeddings with Conceptual Subspaces as a Basis for Plausible  Reasoning\nDistributed Constraint Optimization Problems and Applications: A Survey\nHarnessing disordered quantum dynamics for machine learning\nBounded Rational Decision-Making in Feedforward Neural Networks\nAn Online Mechanism for Ridesharing in Autonomous Mobility-on-Demand  Systems\nOn the Theory and Practice of Privacy-Preserving Bayesian Data Analysis\nAdaptive Maximization of Pointwise Submodular Functions With Budget  Constraint\nLearning Purposeful Behaviour in the Absence of Rewards\nSmall Representations of Big Kidney Exchange Graphs\nA PAC RL Algorithm for Episodic POMDPs\nScalable Algorithms for Tractable Schatten Quasi-Norm Minimization\nThe Dark Side of Ethical Robots\nTowards Anthropo-inspired Computational Systems: the $P^3$ Model\nConcrete Problems in AI Safety\nOn Gaussian Markov models for conditional independence\nOn the Semantic Relationship between Probabilistic Soft Logic and Markov  Logic\nNon-linear Label Ranking for Large-scale Prediction of Long-Term User  Interests\nLearning Relational Dependency Networks 
for Relation Extraction\nSituated Structure Learning of a Bayesian Logic Network for Commonsense  Reasoning\nVisualizing Natural Language Descriptions: A Survey\nPath planning with Inventory-driven Jump-Point-Search\nLearning opening books in partially observable games: using random seeds  in Phantom Go\nGlobal Continuous Optimization with Error Bound and Fast Convergence\nGenerating Images Part by Part with Composite Generative Adversarial  Networks\nAutomatically Reinforcing a Game AI\nPsychologically inspired planning method for smart relocation task\nFinite LTL Synthesis is EXPTIME-complete\nThe Option-Critic Architecture\nStructured Inference Networks for Nonlinear State Space Models\nDeep Convolutional Networks as Models of Generalization and Blending  Within Visual Creativity\nQuantum-enhanced machine learning\nDetecting Dependencies in Sparse, Multivariate Databases Using  Probabilistic Programming and Non-parametric Bayes\nRecursive Decomposition for Nonconvex Optimization\nDouble-quantitative $γ^{\\ast}-$fuzzy coverings approximation  operators\nQuantum Enhanced Inference in Markov Logic Networks\nLocal Discriminant Hyperalignment for multi-subject fMRI data alignment\nAccelerated Gradient Temporal Difference Learning\nOvercoming catastrophic forgetting in neural networks\nUsing Discourse Signals for Robust Instructor Intervention Prediction\nLearning Representations by Stochastic Meta-Gradient Descent in Neural  Networks\nTechnical Report: A Generalized Matching Pursuit Approach for  Graph-Structured Sparsity\nComputing Human-Understandable Strategies\nSolving Set Optimization Problems by Cardinality Optimization via Weak  Constraints with an Application to Argumentation\nThe Predictron: End-To-End Learning and Planning\nAccelerated Convolutions for Efficient Multi-Scale Time to Contact  Computation in Julia\nCuriosity-Aware Bargaining\nEfficient Transfer Learning Schemes for Personalized Language Modeling  using Recurrent Neural Network\nQuery Efficient 
Posterior Estimation in Scientific Experiments via  Bayesian Active Learning\nThe Absent-Minded Driver Problem Redux\nBeating the World's Best at Super Smash Bros. with Deep Reinforcement  Learning\nNeural Programming by Example\nProbabilistic Models for Computerized Adaptive Testing\nThe Top 10 Topics in Machine Learning Revisited: A Quantitative  Meta-Study\nTreatment-Response Models for Counterfactual Reasoning with  Continuous-time, Continuous-valued Interventions\nApproximating the Backbone in the Weighted Maximum Satisfiability  Problem\nTweeting AI: Perceptions of AI-Tweeters (AIT) vs Expert AI-Tweeters  (EAIT)\nTramp Ship Scheduling Problem with Berth Allocation Considerations and  Time-dependent Constraints\nA Reasoning System for a First-Order Logic of Limited Belief\nAn Anthropic Argument against the Future Existence of Superintelligent  Artificial Intelligence\nPitfalls and Best Practices in Algorithm Configuration\nWhy You Should Charge Your Friends for Borrowing Your Stuff\nXOR-Sampling for Network Design with Correlated Stochastic Events\nContinual Learning with Deep Generative Replay\nWhen Will AI Exceed Human Performance? 
Evidence from AI Experts\nEfficient, Safe, and Probably Approximately Complete Learning of Action  Models\nAn Empirical Analysis of Approximation Algorithms for the Euclidean  Traveling Salesman Problem\nUniversal Reinforcement Learning Algorithms: Survey and Experiments\nLearning to Represent Mechanics via Long-term Extrapolation and  Interpolation\nRapid Randomized Restarts for Multi-Agent Path Finding Solvers\nDynamic Difficulty Adjustment on MOBA Games\nStructured Best Arm Identification with Fixed Confidence\nExpert and Non-Expert Opinion about Technological Unemployment\nA Useful Motif for Flexible Task Learning in an Embodied Two-Dimensional  Visual Environment\nCount-Based Exploration in Feature Space for Reinforcement Learning\nThe Relationship Between Emotion Models and Artificial Intelligence\nCausal Consistency of Structural Equation Models\nMaintaining cooperation in complex social dilemmas using deep  reinforcement learning\nThe Intentional Unintentional Agent: Learning to Solve Many Continuous  Control Tasks Simultaneously\nRobust Bayesian Optimization with Student-t Likelihood\nTowards learning domain-independent planning heuristics\nLearning Sparse Representations in Reinforcement Learning with Sparse  Coding\nNon-Count Symmetries in Boolean & Multi-Valued Prob. 
Graphical Models\nDeep Style Match for Complementary Recommendation\nVisual art inspired by the collective feeding behavior of sand-bubbler  crabs\nGigamachine: incremental machine learning on desktop computers\nProbability Reversal and the Disjunction Effect in Reasoning Systems\nPerspectives for Evaluating Conversational AI\nA Streaming Accelerator for Deep Convolutional Neural Networks with  Image and Feature Decomposition for Resource-limited System Applications\nAlgorithms and Architecture for Real-time Recommendations at News UK\nAugmenting End-to-End Dialog Systems with Commonsense Knowledge\nOn Inductive Abilities of Latent Factor Models for Relational Learning\nHuman Understandable Explanation Extraction for Black-box Classification  Models Based on Matrix Factorization\nTweeting AI: Perceptions of Lay vs Expert Twitterati\nFine-grained Event Learning of Human-Object Interaction with LSTM-CRF\nSpecification Inference from Demonstrations\nArguing Machines: Perception-Control System Redundancy and Edge Case  Discovery in Real-World Autonomous Driving\nCombinatorial Multi-armed Bandits for Real-Time Strategy Games\nFast Top-k Area Topics Extraction with Knowledge Base\nSemTK: An Ontology-first, Open Source Semantic Toolkit for Managing and  Querying Knowledge Graphs\nAcquiring Target Stacking Skills by Goal-Parameterized Deep  Reinforcement Learning\nDistributed Bayesian Piecewise Sparse Linear Models\nFrom Algorithmic Black Boxes to Adaptive White Boxes: Declarative  Decision-Theoretic Ethical Programs as Codes of Ethics\nModeling Epistemological Principles for Bias Mitigation in AI Systems:  An Illustration in Hiring Decisions\nTeaching a Machine to Read Maps with Deep Reinforcement Learning\nJamBot: Music Theory Aware Chord Based Generation of Polyphonic Music  with LSTMs\nMultiagent Simple Temporal Problem: The Arc-Consistency Approach\nImprovised Comedy as a Turing Test\nInteractive Robot Learning of Gestures, Language and Affordances\nLearning to Rank 
based on Analogical Reasoning\nHappiness Pursuit: Personality Learning in a Society of Agents\nVisual Explanation by High-Level Abduction: On Answer-Set Programming  Driven Reasoning about Moving Objects\nDeep Learning Can Reverse Photon Migration for Diffuse Optical  Tomography\nDiscriminant Projection Representation-based Classification for Vision  Recognition\nMastering Chess and Shogi by Self-Play with a General Reinforcement  Learning Algorithm\nCogniculture: Towards a Better Human-Machine Co-evolution\nA Bayesian Clearing Mechanism for Combinatorial Auctions\nRevisiting the Master-Slave Architecture in Multi-Agent Deep  Reinforcement Learning\nPractical Challenges in Explicit Ethical Machine Reasoning\nOvercoming catastrophic forgetting with hard attention to the task\nEBIC: an artificial intelligence-based parallel biclustering algorithm  for pattern discovery\nCooperative Multi-Agent Reinforcement Learning for Low-Level Wireless  Communication\nA Quantitative Approach in Heuristic Evaluation of E-commerce Websites\nThe Role of Conditional Independence in the Evolution of Intelligent  Systems\nDynamic Optimization of Neural Network Structures Using Probabilistic  Modeling\nProbabilistic Planning by Probabilistic Programming\nDeceptive Games\nShort-term Memory of Deep RNN\nMulti-attention Recurrent Network for Human Communication Comprehension\nWays of Applying Artificial Intelligence in Software Engineering\nAnswerer in Questioner's Mind for Goal-Oriented Visual Dialogue\nAn Ontology Based Modeling Framework for Design of Educational  Technologies\nLearning to Search with MCTSnets\nQuantitative Predictions in Quantum Decision Theory\nAutomated Playtesting with Procedural Personas through MCTS with Evolved  Heuristics\nThe Malicious Use of Artificial Intelligence: Forecasting, Prevention,  and Mitigation\nIncremental and Iterative Learning of Answer Set Programs from Mutually  Distinct Examples\nVector Field Based Neural Networks\nMeta Multi-Task Learning for 
Sequence Modeling\nShaping Influence and Influencing Shaping: A Computational Red Teaming  Trust-based Swarm Intelligence Model\nSelective Experience Replay for Lifelong Learning\nComputational Theories of Curiosity-Driven Learning\nInferencing Based on Unsupervised Learning of Disentangled  Representations\nPredicting Crime Using Spatial Features\nA New Result on the Complexity of Heuristic Estimates for the A*  Algorithm\nLearning State Representations for Query Optimization with Deep  Reinforcement Learning\nScalable photonic reinforcement learning by time-division multiplexing  of laser chaos\nA Distributed Extension of the Turing Machine\nArtificial Intelligence and Robotics\nAn Empirical Analysis of Constrained Support Vector Quantile Regression  for Nonparametric Probabilistic Forecasting of Wind Power\nLearning to Reason with HOL4 tactics\nDesigning Autonomous Vehicles: Evaluating the Role of Human Emotions and  Social Norms\nFrom Statistical Knowledge Bases to Degrees of Belief\nWeb Usage Mining Using Artificial Ant Colony Clustering and Genetic  Programming\nComputational Chemotaxis in Ants and Bacteria over Dynamic Environments\nA Computational Study on Emotions and Temperament in Multi-Agent Systems\nA survey on independence-based Markov networks learning\nUse of Fuzzy Sets in Semantic Nets for Providing On-Line Assistance to  User of Technological Systems\nAnalysis of Statistical Hypothesis based Learning Mechanism for Faster  Crawling\nA Framework for Intelligent Medical Diagnosis using Rough Set with  Formal Concept Analysis\nUsing Artificial Intelligence Models in System Identification\nImproving circuit miniaturization and its efficiency using Rough Set  Theory\nNew Ideas for Brain Modelling\nBuilding Machines That Learn and Think Like People\nIdeal Reformulation of Belief Networks\nA VLSI Design and Implementation for a Real-Time Approximate Reasoning\nIntelligent Conversational Bot for Massive Online Open Courses (MOOCs)\nAutomated Big Text 
Security Classification\nMorphognosis: the shape of knowledge in space and time\nAnalysis of Agent Expertise in Ms. Pac-Man using  Value-of-Information-based Policies\nMultiagent Bidirectionally-Coordinated Nets: Emergence of Human-level  Coordination in Learning to Play StarCraft Combat Games\nAlways Lurking: Understanding and Mitigating Bias in Online Human  Trafficking Detection\nA Berkeley View of Systems Challenges for AI\nSolutions to problems with deep learning\nGame of Sketches: Deep Recurrent Models of Pictionary-style Word  Guessing\nFractal AI: A fragile theory of intelligence\nComputational Power and the Social Impact of Artificial Intelligence\nEmpirical Analysis of Foundational Distinctions in the Web of Data\nLearning to Navigate in Cities Without a Map\nStarCraft Micromanagement with Reinforcement Learning and Curriculum  Transfer Learning\nA Mathematical Framework for Superintelligent Machines\nPCT and Beyond: Towards a Computational Framework for `Intelligent'  Communicative Systems\nApplying Data Mining and Machine Learning Techniques to Submarine  Intelligence Analysis\nData mining and Privacy in Public Sector using Intelligent Agents  (discussion paper)\nA Concurrent Fuzzy-Neural Network Approach for Decision Support Systems\nCan an Organism Adapt Itself to Unforeseen Circumstances?\nIntelligent systems in the context of surrounding environment\nIntelligent location of simultaneously active acoustic emission sources:  Part II\nComputational Intelligence Characterization Method of Semiconductor  Device\nAn Intelligent Multi-Agent Recommender System for Human Capacity  Building\nAn Agent Based Classification Model\nLearning Better Context Characterizations: An Intelligent Information  Retrieval Approach\nSTORM - A Novel Information Fusion and Cluster Interpretation Technique\nSystem Dynamics Modelling of the Processes Involving the Maintenance of  the Naive T Cell Repertoire\nInformal Concepts in Machines\nThe DCA:SOMe Comparison A comparative 
study between two  biologically-inspired algorithms\nNot only a lack of right definitions: Arguments for a shift in  information-processing paradigm\nA Novel Approach for Cardiac Disease Prediction and Classification Using  Intelligent Agents\nAutomatic Wrapper Adaptation by Tree Edit Distance Matching\nFinding Shortest Path for Developed Cognitive Map Using Medial Axis\nA Proposed Decision Support System/Expert System for Guiding Fresh  Students in Selecting a Faculty in Gomal University, Pakistan\nTowards Maximum Spanning Tree Model in Web 3.0 Design and Development  for Students using Discriminant Analysis\nIntelligent Automated Diagnosis of Client Device Bottlenecks in Private  Clouds\nDiagnosing client faults using SVM-based intelligent inference from TCP  packet traces\nSystem identification and modeling for interacting and non-interacting  tank systems using intelligent techniques\nThe state-of-the-art in web-scale semantic information processing for  cloud computing\nPenetration Testing == POMDP Solving?\nSmart machines and the SP theory of intelligence\nQuantum speedup for active learning agents\nActive Learning for Autonomous Intelligent Agents: Exploration,  Curiosity, and Interaction\nProposal of a multiagent-based smart environment for the IoT\nBad Universal Priors and Notions of Optimality\nIBMMS Decision Support Tool For Management of Bank Telemarketing  Campaigns\nOptimal Route Planning with Prioritized Task Scheduling for AUV Missions\nAGI and Reflexivity\nMultiple ant-bee colony optimization for load balancing in  packet-switched networks\nFirefly Algorithm: Recent Advances and Applications\nLearning-Based Procedural Content Generation\nSoftware & Systems Engineering Process and Tools for the Development of  Autonomous Driving Intelligence\nImplementing an intelligent version of the classical sliding-puzzle game  for unix terminals using Golang's concurrency primitives\nMaintaining prediction quality under the condition of a growing  knowledge 
space\nA different perspective on a scale for pairwise comparisons\nA Neuro-Fuzzy Method to Improving Backfiring Conversion Ratios\nAn intelligent extension of Variable Neighbourhood Search for labelling  graph problems\nDriverseat: Crowdstrapping Learning Tasks for Autonomous Driving\nDecision Aids for Adversarial Planning in Military Operations:  Algorithms, Tools, and Turing-test-like Experimental Validation\nOn Reward Function for Survival\nSuperintelligence cannot be contained: Lessons from Computability Theory\nMIST: Missing Person Intelligence Synthesis Toolkit\nConstrained Cohort Intelligence using Static and Dynamic Penalty  Function Approach for Mechanical Components Design\nApplying Chatbots to the Internet of Things: Opportunities and  Architectural Elements\nIterative Multi-document Neural Attention for Multiple Answer Prediction\nOntology based Scene Creation for the Development of Automated Vehicles\nIntelligent Personal Assistant with Knowledge Navigation\nImproving drug sensitivity predictions in precision medicine through  active expert knowledge elicitation\nRise of the humanbot\nSoftware engineering and the SP theory of intelligence\nWhat's up with Privacy?: User Preferences and Privacy Concerns in  Intelligent Personal Assistants\nImproved Learning in Evolution Strategies via Sparser Inter-Agent  Network Topologies\nAliMe Assist: An Intelligent Assistant for Creating an Innovative  E-commerce Experience\nNature vs. 
Nurture: The Role of Environmental Resources in Evolutionary  Deep Intelligence\nAntifragility for Intelligent Autonomous Systems\nAnalyzing Business Process Anomalies Using Autoencoders\nTowards Intelligent Vehicular Networks: A Machine Learning Framework\nNot just about size - A Study on the Role of Distributed Word  Representations in the Analysis of Scientific Publications\nClustering and Retrieval Method of Immunological Memory Cell in Clonal  Selection Algorithm\nCombating catastrophic forgetting with developmental compression\nAffective Recommendation System for Tourists by Using Emotion Generating  Calculations\nSCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis  Using Artificial Neural Networks\nSpontaneous organization leads to robustness in evolutionary algorithms\nSchema Redescription in Cellular Automata: Revisiting Emergence in  Complex Systems\nAcquiring Word-Meaning Mappings for Natural Language Interfaces\nOn the Formal Semantics of Speech-Act Based Communication in an  Agent-Oriented Programming Language\nSoftware Aging Analysis of Web Server Using Neural Networks\nTowards Swarm Calculus: Urn Models of Collective Decisions and Universal  Properties of Swarm Performance\nAn Extensive Report on Cellular Automata Based Artificial Immune System  for Strengthening Automated Protein Prediction\nTheta*: Any-Angle Path Planning on Grids\nUsing Learned Predictions as Feedback to Improve Control and  Communication with an Artificial Limb: Preliminary Findings\nKnowledge and Uncertainty\nRecurrent Neural Network Based Modeling of Gene Regulatory Network Using  Bat Algorithm\nGrounded Language Learning in a Simulated 3D World\nMachine Learning, Deepest Learning: Statistical Data Assimilation  Problems\nLearning Visual Reasoning Without Strong Priors\nDetermining Positive Cancer Rescue Mutations in p53 Based Cancers by  using Artificial Intelligence\nA Benchmark Environment Motivated by Industrial Control Problems\nPattern Recognition 
Techniques for the Identification of Activities of  Daily Living using Mobile Device Accelerometer\nUser Environment Detection with Acoustic Sensors Embedded on Mobile  Devices for the Recognition of Activities of Daily Living\nDeep Reinforcement Learning using Capsules in Advanced Game Environments\nA New Multi Criteria Decision Making Method: Approach of Logarithmic  Concept (APLOCO)\nPredicting Natural Hazards with Neuronal Networks\nExploration of RNA Editing and Design of Robust Genetic Algorithms\nOn the possible Computational Power of the Human Mind\nApplication of Artificial Neural Network in Jitter Analysis of  Dispersion-Managed Communication System\nResponse Prediction of Structural System Subject to Earthquake Motions  using Artificial Neural Network\nOn the Effects of Idiotypic Interactions for Recommendation Communities  in Artificial Immune Systems\nDanger Theory: The Link between AIS and IDS?\nA Recommender System based on Idiotypic Artificial Immune Networks\nAGNOSCO - Identification of Infected Nodes with artificial Ant Colonies\nDistributed Self Management for Distributed Security Systems\nArtificial Dendritic Cells: Multi-faceted Perspectives\nFurther Exploration of the Dendritic Cell Algorithm: Antigen Multiplier  and Time Windows\nlibtissue - implementing innate immunity\nArtificial Immune Systems Metaphor for Agent Based Modeling of Crisis  Response Operations\nOil Price Trackers Inspired by Immune Memory\nPrice Trackers Inspired by Immune Memory\nThe Application of a Dendritic Cell Algorithm to a Robotic Classifier\nThe Deterministic Dendritic Cell Algorithm\nDetecting Anomalous Process Behaviour using Second Generation Artificial  Immune Systems\nArtificial Immune Systems (2010)\nModeling Spammer Behavior: Naïve Bayes vs. 
Artificial Neural Networks\nAn Artificial Immune System Model for Multi-Agents Resource Sharing in  Distributed Environments\nComparative study of Financial Time Series Prediction by Artificial  Neural Network with Gradient Descent Learning\nDesign of Emergent and Adaptive Virtual Players in a War RTS Game\nA Hybrid Artificial Bee Colony Algorithm for Graph 3-Coloring\nMemetic Artificial Bee Colony Algorithm for Large-Scale Global  Optimization\nNeuro-Fuzzy Computing System with the Capacity of Implementation on  Memristor-Crossbar and Optimization-Free Hardware Training\nArtificial Neural Network Fuzzy Inference System (ANFIS) For Brain Tumor  Detection\nTermite-hill: From natural to artificial termites in sensor networks\nEstimation of soil moisture in paddy field using Artificial Neural  Networks\nReal-world Transfer of Evolved Artificial Immune System Behaviours  between Small and Large Scale Robotic Platforms\nEnglish Character Recognition using Artificial Neural Network\nEvaluation the efficiency of artificial bee colony and the firefly  algorithm in solving the continuous optimization problem\nAssessment of Customer Credit through Combined Clustering of Artificial  Neural Networks, Genetics Algorithm and Bayesian Probabilities\nAn ANN Based Call Handoff Management Scheme for Mobile Cellular Network\nComparison of Selection Methods in On-line Distributed Evolutionary  Robotics\nEmergence-focused design in complex system simulation\nDiscovering Latent States for Model Learning: Applying Sensorimotor  Contingencies Theory and Predictive Processing to Model Context\nToward the Coevolution of Novel Vertical-Axis Wind Turbines\nQuantum Cybernetics and Complex Quantum Systems Science - A Quantum  Connectionist Exploration\nDoes the D.C. 
Response of Memristors Allow Robotic Short-Term Memory and  a Possible Route to Artificial Time Perception?\nA Novel Energy Aware Node Clustering Algorithm for Wireless Sensor  Networks Using a Modified Artificial Fish Swarm Algorithm\nArtificial Prediction Markets for Online Prediction of Continuous  Variables-A Preliminary Report\nComparing Human and Automated Evaluation of Open-Ended Student Responses  to Questions of Evolution\nOn the Performance of Network Parallel Training in Artificial Neural  Networks\nLearning an attention model in an artificial visual system\nMaximum Resilience of Artificial Neural Networks\nMind the Gap: A Well Log Data Analysis\nHoME: a Household Multimodal Environment\nHuman Perception of Performance\nComparing heterogeneous entities using artificial neural networks of  trainable weighted structural components and machine-learned activation  functions\nGalactic Gradients, Postbiological Evolution and the Apparent Failure of  SETI\nBusiness Intelligence from Web Usage Mining\nMulti-Modal Human-Machine Communication for Instructing Robot Grasping  Tasks\nCognitive Internet of Things: A New Paradigm beyond Connection\nEvolutionary-aided negotiation model for bilateral bargaining in Ambient  Intelligence domains with complex utility functions\nTowards an intelligent VNS heuristic for the k-labelled spanning forest  problem\nDarknet and Deepnet Mining for Proactive Cybersecurity Threat  Intelligence\nDetection, Recognition and Tracking of Moving Objects from Real-time  Video via SP Theory of Intelligence and Species Inspired PSO\nDeep Anticipation: Light Weight Intelligent Mobile Sensing in IoT by  Recurrent Architecture\nJointDNN: An Efficient Training and Inference Engine for Intelligent  Mobile Cloud Computing Services\nThe Ethics of Robotics\nBelief and Truth in Hypothesised Behaviours\nShootout-89: A Comparative Evaluation of Knowledge-based Systems that  Forecast Severe Weather\nCognitive Science in the era of Artificial Intelligence: 
A roadmap for  reverse-engineering the infant language-learner\nBionic Humans Using EAP as Artificial Muscles Reality and Challenges\nThe Self-Organization of Speech Sounds\nExploration Of The Dendritic Cell Algorithm Using The Duration Calculus\nDelay dynamics of neuromorphic optoelectronic nanoscale resonators:  Perspectives and applications\nCollective Intelligence, Data Routing and Braess' Paradox\nThe SP theory of intelligence: an overview\nInformation Compression, Intelligence, Computing, and Mathematics\nA Market-Inspired Approach for Intersection Management in Urban Road  Traffic Networks\nProposal for the creation of a research facility for the development of  the SP machine\nAI Gamma-Ray Burst Classification: Methodology/Preliminary Results\nSelf Control of Chaotic Dynamics using LTI Filters\nAn Empirically Motivated Reinterpretation of Dependency Grammar\nThe Acquisition of a Lexicon from Paired Phoneme Sequences and Semantic  Representations\nMeasuring semantic complexity\nConstraint Categorial Grammars\nUsing Information Content to Evaluate Semantic Similarity in a Taxonomy\nA Chart Generator for Shake and Bake Machine Translation\nA Conceptual Reasoning Approach to Textual Ellipsis\nRationality, Cooperation and Conversational Implicature\nChess Pure Strategies are Probably Chaotic\nA Proof Theoretic View of Constraint Programming\nAn Adaptive Agent Oriented Software Architecture\nInducing a Semantically Annotated Lexicon via EM-Based Clustering\nPredicate Logic with Definitions\nAutomatically Selecting Useful Phrases for Dialogue Act Tagging\nEvents in Property Patterns\nExtending the Stable Model Semantics with More Expressive Rules\nThe Rough Guide to Constraint Propagation\nDeduction over Mixed-Level Logic Representations for Text Passage  Retrieval\nMixed-Level Knowledge Representation and Variable-Depth Inference in  Natural Language Processing\nSafe cooperative robot dynamics on graphs\nProspects for in-depth story understanding by computer\nA 
database and lexicon of scripts for ThoughtTreasure\nConditional indifference and conditional preservation\nDescription of GADEL\nHypothetical revision and matter-of-fact supposition\nProbabilistic Default Reasoning with Conditional Constraints\nA Compiler for Ordered Logic Programs\nSLDNFA-system\nDeclarative Representation of Revision Strategies\nDLV - A System for Declarative Problem Solving\nFages' Theorem and Answer Set Programming\nA note on the Declarative reading(s) of Logic Programming\nSATEN: An Object-Oriented Web-Based Revision and Extraction Engine\ndcs: An Implementation of DATALOG with Constraints\nRepresentation results for defeasible logic\nProgramming in Alma-0, or Imperative and Declarative Programming  Reconciled\nPractical Reasoning for Expressive Description Logics\nReasoning with Individuals for the Description Logic SHIQ\nCentroid-based summarization of multiple documents: sentence extraction,  utility-based evaluation, and user studies\nModeling the Uncertainty in Complex Engineering Systems\nThe SAT Phase Transition\nMultiagent Control of Self-reconfigurable Robots\nKnowledge on Treelike Spaces\nComputing Presuppositions by Contextual Reasoning\nTree-gram Parsing: Lexical Dependencies and Structural Relations\nCauses and Explanations: A Structural-Model Approach, Part I: Causes\nOrder-consistent programs are cautiously monotonic\nRule Writing or Annotation: Cost-efficient Resource Usage for Base Noun  Phrase Chunking\nSolving Composed First-Order Constraints from Discrete-Time Robust  Control\nSoft Scheduling\nFile mapping Rule-based DBMS and Natural Language Processing\nStacking classifiers for anti-spam filtering of e-mail\nA Sequential Model for Multi-Class Classification\nEnhancing Constraint Propagation with Composition Operators\naspps --- an implementation of answer-set programming with propositional  schemata\nIntegrating Multiple Knowledge Sources for Robust Semantic Parsing\nA logic-based approach to data 
integration\nGradient-based Reinforcement Planning in Policy-Search Methods\nA Tight Upper Bound on the Number of Candidate Patterns\nRepresentation of Uncertainty for Limit Processes\nInteractive Constrained Association Rule Mining\nRational Competitive Analysis\nThe Deductive Database System LDL++\nDistance Semantics for Belief Revision\nPreferred History Semantics for Iterated Updates\nOptimal Solutions for Multi-Unit Combinatorial Auctions: Branch and  Bound Heuristics\nCovariance Plasticity and Regulated Criticality\nStereotypical Reasoning: Logical Properties\nTwo results for proiritized logic programming\nThe Algorithms of Updating Sequential Patterns\nFast Hands-free Writing by Gaze Direction\nBelief Revision and Rational Inference\nComputing stable models: worst-case performance estimates\nRelational Association Rules: getting WARMeR\nOn Concise Encodings of Preferred Extensions\nAlternative Characterizations for Strong Equivalence of Logic Programs\nSome logics of belief and disbelief\nWell-Founded Argumentation Semantics for Extended Logic Programming\nLogic Programming with Ordered Disjunction\nThe Rise and Fall of the Church-Turing Thesis\nEmbedding Default Logic in Propositional Argumentation Systems\nRevising Partially Ordered Beliefs\nCompilability of Abduction\nAdaptive Development of Koncepts in Virtual Animats: Insights into the  Development of Knowledge\nThinking Adaptive: Towards a Behaviours Virtual Laboratory\nRedundancy in Logic I: CNF Propositional Formulae\nMonadic Style Control Constructs for Inference Systems\nDynamic Adjustment of the Motivation Degree in an Action Selection  Mechanism\nA Theory of Cross-Validation Error\nMerging Locally Correct Knowledge Bases: A Preliminary Report\nKalman filter control in the reinforcement learning framework\nA semantic framework for preference handling in answer set programming\nConstraint-based analysis of composite solvers\nTight Logic Programs\nKalman-filtering using local interactions\nOn the 
Notion of Cognition\nTime-scales, Meaning, and Availability of Information in a Global Brain\nClustering belief functions based on attracting and conflicting  metalevel evidence\nTechniques for effective vocabulary selection\nLexicographic probability, conditional probability, and nonstandard  probability\nReinforcement Learning with Linear Function Approximation and LQ control  Converges\nAn Alternative to RDF-Based Languages for the Representation and  Processing of Ontologies in the Semantic Web\nA logic for reasoning about upper probabilities\nLearning in Multiagent Systems: An Introduction from a Game-Theoretic  Perspective\nModel-Based Debugging using Multiple Abstract Models\nA Hierarchical Situation Calculus\nApplication of Kullback-Leibler Metric to Speech Recognition\nThe Algebra of Utility Inference\nAn information theory for preferences\nGreat Expectations. Part I: On the Customizability of Generalized  Expected Utility\nUsing Counterfactuals in Knowledge-Based Programming\nResponsibility and blame: a structural-model approach\nDiagnostic reasoning with A-Prolog\nDialogue as Discourse: Controlling Global Properties of Scripted  Dialogue\nAcquiring Lexical Paraphrases from a Single Corpus\nUnifying Computing and Cognition: The SP Theory and its Applications\nSelf-Organising Networks for Classification: developing Applications to  Science Analysis for Astroparticle Physics\nThe Complexity of Modified Instances\nThe role of behavior modifiers in representation development\nA Comparative Study of Arithmetic Constraints on Integer Intervals\nXML framework for concept description and knowledge representation\nKnowledge And The Action Description Language A\n\"In vivo\" spam filtering: A challenge problem for data mining\nMulti-agent coordination using nearest neighbor rules: revisiting the  Vicsek model\nOn the Complexity of Case-Based Planning\nA Sequent Calculus and a Theorem Prover for Standard Conditional Logics\nAn Algorithm for Quasi-Associative and 
Quasi-Markovian Rules of  Combination in Information Fusion\nOn Global Warming (Softening Global Constraints)\nFLUX: A Logic Programming Method for Reasoning Agents\nThe Generalized Pignistic Transformation\nApplying Policy Iteration for Training Recurrent Neural Networks\nL1 regularization is better than L2 for learning and predicting chaotic  systems\nIntransitivity and Vagueness\nSleeping Beauty Reconsidered: Conditioning and Reflection in  Asynchronous Systems\nRobust Dialogue Understanding in HERALD\nBounded Input Bounded Predefined Control Bounded Output\nTopological Navigation of Simulated Robots using Occupancy Grid\nA Link Clustering Based Approach for Clustering Categorical Data\nFinite Domain Bounds Consistency Revisited\nClever Search: A WordNet Based Wrapper for Internet Search Engines\nCorpus based Enrichment of GermaNet Verb Frames\nContext Related Derivation of Word Senses\nTransforming and Enriching Documents for the Semantic Web\nEstimating mutual information and multi--information in large networks\nDecomposable Problems, Niching, and Scalability of Multiobjective  Estimation of Distribution Algorithms\nTowards a Systematic Account of Different Semantics for Logic Programs\nProperty analysis of symmetric travelling salesman problem instances  acquired through evolution\nComplexity Issues in Finding Succinct Solutions of PSPACE-Complete  Problems\nAn Optimization Model for Outlier Detection in Categorical Data\nMonotonic and Nonmonotonic Preference Revision\nConstraint-Based Qualitative Simulation\nLearning Polynomial Networks for Classification of Clinical  Electroencephalograms\nA Learning Algorithm for Evolving Cascade Neural Networks\nPolynomial Neural Networks Learnt to Classify EEG Signals\nAn Evolving Cascade Neural Network Technique for Cleaning Sleep  Electroencephalograms\nThe Combined Technique for Detection of Artifacts in Clinical  Electroencephalograms of Sleeping Newborns\nSingle-solution Random 3-SAT Instances\nBeyond Hypertree 
Width: Decomposition Methods Without Decompositions\nWavelet Time Shift Properties Integration with Support Vector Machines\nRedundancy in Logic II: 2CNF and Horn Propositional Formulae\nTwo-dimensional cellular automata and the analysis of correlated time  series\nIn the beginning was game semantics\nData complexity of answering conjunctive queries over SHIQ knowledge  bases\nTemporal Phylogenetic Networks and Logic Programming\nPlanning with Preferences using Logic Programming\nTransitive Text Mining for Information Extraction and Hypothesis  Generation\nA formally verified proof of the prime number theorem\nK-Histograms: An Efficient Clustering Algorithm for Categorical Dataset\nAuthoring case based training by document data extraction\nInteractive Unawareness Revisited\nAutomatic extraction of paraphrastic phrases from medium size corpora\nSur le statut référentiel des entités nommées\nK-ANMI: A Mutual Information Based Clustering Algorithm for Categorical  Data\nIntegration of Declarative and Constraint Programming\nProcessing Uncertainty and Indeterminacy in Information Systems success  mapping\nMathematical Models in Schema Theory\nThe logic of interactive Turing reduction\nDistributed Kernel Regression: An Algorithm for Training Collaboratively\nThe intuitionistic fragment of computability logic at the propositional  level\nAvoiding the Bloat with Stochastic Grammar-based Genetic Programming\nExplaining Constraint Programming\nMetatheory of actions: beyond consistency\nConvergence of Min-Sum Message Passing for Quadratic Optimization\nConsensus Propagation\nApplication of Support Vector Regression to Interpolation of Sparse  Shock Physics Data Sets\nApproximation Algorithms for K-Modes Clustering\nNearly optimal exploration-exploitation decision thresholds\nAdaptative combination rule and proportional conflict redistribution  rule for information fusion\nPerspective alignment in spatial language\nA framework of reusable structures for mobile agent 
development\nMobile Agent Based Solutions for Knowledge Assessment in elearning  Environments\nClassification of Ordinal Data\nA Decision-Making Support System Based on Know-How\nBuilding a logical model in the machining domain for CAPP expert systems\nThe Cumulative Rule for Belief Fusion\nDatabase Querying under Changing Preferences\nAn Analysis of Arithmetic Constraints on Integer Intervals\nDealing with Metonymic Readings of Named Entities\nLinguistically Grounded Models of Language Change\nReasoning with Intervals on Granules\nAbout Norms and Causes\nTowards \"Propagation = Logic + Control\"\nInfinite Qualitative Simulations by Means of Constraint Programming\nCascade hash tables: a series of multilevel double hashing schemes with  O(1) worst case lookup time\nLogic programs with monotone abstract constraint atoms\nThe role of time in considering collections\nAn application-oriented terminology evaluation: the case of back-of-the  book indexes\nOntologies and Information Extraction\nRapport technique du projet OGRE\nWhy did the accident happen? 
A norm-based reasoning approach\nNorm Based Causal Reasoning in Textual Corpus\nFarthest-Point Heuristic based Initialization Methods for K-Modes  Clustering\nAnalytic Tableaux Calculi for KLM Logics of Nonmonotonic Reasoning\nEvolutionary Optimization in an Algorithmic Setting\nFunctional Brain Imaging with Multi-Objective Multi-Modal Evolutionary  Optimization\nOn Measuring the Impact of Human Actions in the Machine Learning of a  Board Game's Playing Policies\nPlayer co-modelling in a strategy board game: discovering how to play  fast\nLossless fitness inheritance in genetic algorithms for decision trees\nPropositional theories are strongly equivalent to logic programs\nA novel set of rotationally and translationally invariant features for  images based on the non-commutative bispectrum\nDealing With Logical Omniscience: Expressiveness and Pragmatics\nUniform and Partially Uniform Redistribution Rules\nLogic Programming with Satisfiability\nRedesigning Decision Matrix Method with an indeterminacy-based inference  process\nCopula Component Analysis\nModelling Complexity in Musical Rhythm\nReinforcement Learning for Adaptive Routing\nRecursion relations for two-loop self-energy diagrams on-shell\nApplication of the Worldline Path Integral Method to the Calculation of  Inverse Mass Expansions\nSome Local Measures of Complexity of Convex Hulls and Generalization  Bounds\nLearning a Machine for the Decision in a Partially Observable Markov  Universe\nFast Non-Parametric Bayesian Inference on Infinite Trees\nStrong Asymptotic Assertions for Discrete MDL in Regression and  Classification\nStatistical Modeling of Nuclear Systematics\nDeterministic Chaos: A signature of Quantumlike Mechanics in  Self-Organized Adaptive Network\nSemiclassical Neural Network\nQuantum Aspects of Semantic Analysis and Symbolic Artificial  Intelligence\nA genetic algorithm for finding pulse sequences for NMR quantum  computing\nIntroduction to Arabic Speech Recognition Using CMUSphinx 
System\nComparing Robustness of Pairwise and Multiclass Neural-Network Systems  for Face Recognition\nA Note on Ontology and Ordinary Language\nFault Classification in Cylinders Using Multilayer Perceptrons, Support  Vector Machines and Guassian Mixture Models\nThe Parameter-Less Self-Organizing Map algorithm\nBayesian Approach to Neuro-Rough Models\nEvolving Symbolic Controllers\nA first-order Temporal Logic for Actions\nFuzzy and Multilayer Perceptron for Evaluation of HV Bushings\nOn the monotonization of the training set\nLoop corrections for message passing algorithms in continuous variable  models\nEpistemic Analysis of Strategic Games with Arbitrary Strategy Sets\nAutomatically Restructuring Practice Guidelines using the GEM DTD\nBijective Faithful Translations among Default Logics\nLearning Probabilistic Models of Word Sense Disambiguation\nA structure from motion inequality\nOn Ullman's theorem in computer vision\nRaising a Hardness Result\nOn Ultrametric Algorithmic Information\nQualitative Belief Conditioning Rules (QBCR)\nMulti-Sensor Fusion Method using Dynamic Bayesian Network for Precise  Vehicle Localization and Road Matching\nAutoencoder, Principal Component Analysis and Support Vector Regression  for Data Imputation\nFrom Texts to Structured Documents: The Case of Health Practice  Guidelines\nQuantum Causal Networks\nWhat's in a Name?\nStationary probability density of stochastic search processes in global  optimization\nAnalyzing covert social network foundation behind terrorism disaster\nNode discovery problem for a social network\nComputer Model of a \"Sense of Humour\". I. General Algorithm\nComputer Model of a \"Sense of Humour\". II. 
Realization in Neural  Networks\nDerivative of functions over lattices as a basis for the notion of  interaction between attributes\nCan a Computer Laugh ?\nThe Second Law as a Cause of the Evolution\nA Spectral Approach to Analyzing Belief Propagation for 3-Coloring\nDecomposition During Search for Propagation-Based Constraint Solvers\nCommon knowledge logic in a higher order proof assistant?\nJudgment\niBOA: The Incremental Bayesian Optimization Algorithm\nA review of the Statistical Mechanics approach to Random Optimization  Problems\nBrain architecture: A design for natural computation\nDempster-Shafer for Anomaly Detection\nImproved evolutionary generation of XSLT stylesheets\nTableau-based decision procedures for logics of strategic ability in  multi-agent systems\nTowards a human eye behavior model by applying Data Mining Techniques on  Gaze Information from IEC\nEye-Tracking Evolutionary Algorithm to minimize user's fatigue in IEC  applied to Interactive One-Max problem\nUniversality in Globally Coupled Maps and Flows\nCombinatorial Explorations in Su-Doku\nSupport Vector Machine Classification with Indefinite Kernels\nGraphical Estimation of Permeability Using RST&NFIS\nApplication of Rough Set Theory to Analysis of Hydrocyclone Operation\nA Unified Semi-Supervised Dimensionality Reduction Framework for  Manifold Learning\nFrom Qualitative to Quantitative Proofs of Security Properties Using  First-Order Conditional Logic\nCausal models have no complete axiomatic characterization\nGrainy Numbers\nSimDialog: A visual game dialog editor\nContact state analysis using NFIS and SOM\nA Fast Algorithm and Datalog Inexpressibility for Temporal Reasoning\nToward Fuzzy block theory\nAnalysis of hydrocyclone performance based on information granulation  theory\nCognitive Architecture for Direction of Attention Founded on Subliminal  Memory Searches, Pseudorandom and Nonstop\nCompressing Binary Decision Diagrams\nLogic programming with social features\nAn 
Evolutionary-Based Approach to Learning Multiple Decision Models from  Underrepresented Data\nFusion for Evaluation of Image Classification in Uncertain Environments\nFusion de classifieurs pour la classification d'images sonar\nExperts Fusion and Multilayer Perceptron Based on Belief Learning for  Sonar Image Classification\nDefaults and Normality in Causal Structures\nOn Sequences with Non-Learnable Subsequences\nPrediction with Expert Advice in Games with Unbounded One-Step Gains\nOn empirical meaning of randomness with respect to a real parameter\nA new Hedging algorithm and its application to inferring latent random  variables\nBelief decision support and reject for textured images characterization\nA new probabilistic transformation of belief mass assignment\nAceWiki: Collaborative Ontology Management in Controlled Natural  Language\nInitial Results on the F-logic to OWL Bi-directional Translation on a  Tabled Prolog Engine\nn-ary Fuzzy Logic and Neutrosophic Logic Operators\nImproving Local Search for Fuzzy Scheduling Problems\nBin Packing Under Multiple Objectives - a Heuristic Approximation  Approach\nAn application of the Threshold Accepting metaheuristic for curriculum  based course timetabling\nPeek Arc Consistency\nPredicting Abnormal Returns From News Using Text Classification\nLearning Hidden Markov Models using Non-Negative Matrix Factorization\nDetermining the Unithood of Word Sequences using a Probabilistic  Approach\nThree New Complexity Results for Resource Allocation Problems\nA global physician-oriented medical information system\nOn combinations of local theory extensions\nCombining Semantic Wikis and Controlled Natural Language\nThe many faces of optimism - Extended version\nThe use of entropy to measure structural diversity\nA computational model of affects\nProbabilistic reasoning with answer sets\nLogic programs with propositional connectives and aggregates\nA New Trend in Optimization on Multi Overcomplete Dictionary toward  
Inpainting\nPattern Recognition and Memory Mapping using Mirroring Neural Networks\nAnalyse et structuration automatique des guides de bonnes pratiques  cliniques : essai d'évaluation\nA Computational Model to Disentangle Semantic Information Embedded in  Word Association Norms\nA New Method for Knowledge Representation in Expert System's (XMLKR)\nOn the Optimal Convergence Probability of Univariate Estimation of  Distribution Algorithms\nTransitivity vs. Intransitivity in decision making process. (An example  in quantum game theory)\nHiding Quiet Solutions in Random Constraint Satisfaction Problems\nResource Adaptive Agents in Interactive Theorem Proving\nGeospatial semantics: beyond ontologies, towards an enactive approach\nCut-Simulation and Impredicativity\nBack analysis of microplane model parameters using soft computing  methods\nTopological Centrality and Its Applications\nFeature Hashing for Large Scale Multitask Learning\nError-Correcting Tournaments\nProgress in Computer-Assisted Inductive Theorem Proving by  Human-Orientedness and Descente Infinie?\nFull First-Order Sequent and Tableau Calculi With Preservation of  Solutions and the Liberalized delta-Rule but Without Skolemization\nWhy Would You Trust B?\nRange and Roots: Two Common Patterns for Specifying and Propagating  Counting and Occurrence Constraints\nImpact of Cognitive Radio on Future Management of Spectrum\nOnline Estimation of SAT Solving Runtime\nBreaking Value Symmetry\nEfficiently Learning a Detection Cascade with Sparse Eigenvectors\nDesigning a GUI for Proofs - Evaluation of an HCI Experiment\nConditional Probability Tree Estimation Analysis and Algorithms\nTime manipulation technique for speeding up reinforcement learning in  simulations\nLearning for Dynamic subsumption\nSafe Reasoning Over Ontologies\nEligibility Propagation to Speed up Time Hopping for Reinforcement  Learning\nOptimal Tableau Decision Procedures for PDL\nDependency Pairs and Polynomial Path Orders\nAgent-Based 
Decision Support System to Prevent and Manage Risk  Situations\nHybridMiner: Mining Maximal Frequent Itemsets Using Hybrid Database  Representation Approach\nSemantic Social Network Analysis\nAdaptive Learning with Binary Neurons\nQuantified Multimodal Logics in Simple Type Theory\nQuantum Annealing for Clustering\nQuantum Annealing for Variational Bayes Inference\nInformation Modeling for a Dynamic Representation of an Emergency  Situation\nThe CIFF Proof Procedure for Abductive Logic Programming with  Constraints: Theory, Implementation and Experiments\nTowards Improving Validation, Verification, Crash Investigations, and  Event Reconstruction of Flight-Critical Systems with Self-Forensics\nTwo-Dimensional ARMA Modeling for Breast Cancer Detection and  Classification\nSoft Constraints for Quality Aspects in Service Oriented Architectures\nConcept-based Recommendations for Internet Advertisement\nGeneral combination rules for qualitative and quantitative beliefs\nRestricted Global Grammar Constraints\nAgent-Oriented Approach for Detecting and Managing Risks in Emergency  Situations\nComputational Scenario-based Capability Planning\nCredit Assignment in Adaptive Evolutionary Algorithms\nHow Controlled English can Improve Semantic Wikis\nImprovements for multi-objective flow shop scheduling by Pareto Iterated  Local Search\nGraph Theory and Optimization Problems for Very Large Networks\nRestart Strategy Selection using Machine Learning Techniques\nOn Classification from Outlier View\nConvergence of Expected Utility for Universal AI\nKnowledge Discovery of Hydrocyclone s Circuit Based on SONFIS and SORST\nPractical approach to programmable analog circuits with memristors\nQuantifying Rational Belief\nMaximizing profit using recommender systems\nAssessing the Impact of Informedness on a Consultant's Profit\nn-Opposition theory to structure debates\nStochastic Optimization of Linear Dynamic Systems with Parametric  Uncertainties\nBuilding upon Fast Multipole Methods to 
Detect and Model Organizations\nA Local Search Modeling for Constrained Optimum Paths Problems (Extended  Abstract)\nIntegrating Conflict Driven Clause Learning to Local Search\nSonet Network Design Problems\nToward an automaton Constraint for Local Search\nLudics and its Applications to natural Language Semantics\nMachine Learning: When and Where the Horses Went Astray?\nManipulating Tournaments in Cup and Round Robin Competitions\nActive Learning for Mention Detection: A Comparison of Sentence  Selection Strategies\nEmotion: Appraisal-coping model for the \"Cascades\" problem\nEmotion : modèle d'appraisal-coping pour le problème des Cascades\nApply Ant Colony Algorithm to Search All Extreme Points of Function\nA Semantic Similarity Measure for Expressive Description Logics\nOpportunistic Adaptation Knowledge Discovery\nA Multi-stage Probabilistic Algorithm for Dynamic Path-Planning\nData management in Systems biology II - Outlook towards the semantic web\nMulti-valued Action Languages in CLP(FD)\nNew Generalization Bounds for Learning Kernels\nElkan's k-Means for Graphs\nComplexity of stochastic branch and bound methods for belief tree search  in Bayesian reinforcement learning\nAbstract Answer Set Solvers with Learning\nDocument Clustering with K-tree\nK-tree: Large Scale Document Clustering\nGraph Quantization\nA betting interpretation for probabilities and Dempster-Shafer degrees  of belief\nDetecting Botnets Through Log Correlation\nClassifying Network Data with Deep Kernel Machines\nJanus: Automatic Ontology Builder from XSD Files\nGenetic algorithm for robotic telescope scheduling\nConstraint solvers: An empirical evaluation of design decisions\nLogical Evaluation of Consciousness: For Incorporating Consciousness  into Machine Architecture\nUsing CODEQ to Train Feed-forward Neural Networks\nDire n'est pas concevoir\nA Generalization of the Chow-Liu Algorithm and its Application to  Statistical Learning\nUsing ATL to define advanced and flexible constraint 
model  transformations\nConvergence of Bayesian Control Rule\nNonparametric Estimation and On-Line Prediction for General Stationary  Ergodic Sources\nLess Regret via Online Conditioning\nDeep Big Simple Neural Nets Excel on Handwritten Digit Recognition\nProceedings FM-09 Workshop on Formal Methods for Aerospace\nGeometric Algebra Model of Distributed Representations\nOntology-supported processing of clinical text using medical knowledge  integration for multi-label classification of diagnosis coding\nNode inspection and analysis thereof in the light of area estimation and  curve fitting\nSpatio-Temporal Graphical Model Selection\nCausality and the semantics of provenance\nPublishing Math Lecture Notes as Linked Data\nOntology-based inference for causal explanation\nDesigning neural networks that process mean values of random variables\nSimple Type Theory as Framework for Combining Logics\nThe Exact Closest String Problem as a Constraint Satisfaction Problem\nAn introduction to spectral distances in networks (extended version)\nAdaptive Bases for Reinforcement Learning\nUsing machine learning to make constraint solver implementation  decisions\nA Soft Computing Model for Physicians' Decision Process\nGenetic algorithms and the art of Zen\nEvidence Algorithm and System for Automated Deduction: A Retrospective  View\nProofs, proofs, proofs, and proofs\nFailover in cellular automata\nBuilding Computer Network Attacks\nMDPs with Unawareness\nMirrored Language Structure and Innate Logic of the Human Brain as a  Computable Model of the Oracle Turing Machine\nOnline Cake Cutting\nA Brief Introduction to Temporality and Causality\nTesting and Debugging Techniques for Answer Set Solver Development\nA decidable subclass of finitary programs\nIdentifying Causal Effects with Computer Algebra\nThreat assessment of a possible Vehicle-Born Improvised Explosive Device  using DSmT\nCo-evolution is Incompatible with the Markov Assumption in Phylogenetics\nAssociative control 
processor with a rigid structure\nTowards arrow-theoretic semantics of ontologies: conceptories\nA Learning Algorithm based on High School Teaching Wisdom\nRole of Ontology in Semantic Web Development\nA formalism for causal explanations with an Answer Set Programming  translation\nLearning from Profession Knowledge: Application on Knitting\nDistributed solving through model splitting\nPrediction by Compression\nOptimizing Selective Search in Chess\nOntology Temporal Evolution for Multi-Entity Bayesian Networks under  Exogenous and Endogenous Semantic Updating\nMeasuring Similarity of Graphs and their Nodes by Neighbor Matching\nA Cost-Minimizing Algorithm for School Choice\nIntroduction to the iDian\nA Partial Taxonomy of Substitutability and Interchangeability\nQualitative Reasoning about Relative Direction on Adjustable Levels of  Granularity\nPrunnig Algorithm of Generation a Minimal Set of Rule Reducts Based on  Rough Set Theory\nReasoning about Cardinal Directions between Extended Objects: The  Hardness Result\nProbabilistic Inferences in Bayesian Networks\nTarget tracking in the recommender space: Toward a new recommender  system based on Kalman filtering\nReified unit resolution and the failed literal rule\nNew Methods of Analysis of Narrative and Semantics in Support of  Interactivity\nOptimizing real-time RDF data streams\nBayesian Modeling of a Human MMORPG Player\nAre SNOMED CT Browsers Ready for Institutions? 
Introducing MySNOM\nFirst steps in the logic-based assessment of post-composed phenotypic  descriptions\nNondeterministic fuzzy automata\nPhase Transitions of Plan Modification in Conformant Planning\nA new Recommender system based on target tracking: a Kalman Filter  approach\nDescriptive-complexity based distance for fuzzy sets\nInterpolation in Equilibrium Logic and Answer Set Programming: the  Propositional Case\nExtending Binary Qualitative Direction Calculi with a Granular Distance  Concept: Hidden Feature Attachment\nAdaptive Submodular Optimization under Matroid Constraints\nRestructuring in Combinatorial Optimization\nHybrid Model for Solving Multi-Objective Problems Using Evolutionary  Algorithm and Tabu Search\nAutomated Complexity Analysis Based on the Dependency Pair Method\nEvolved preambles for MAX-SAT heuristics\nCounting Solutions of Constraint Satisfiability Problems:Exact Phase  Transitions and Approximate Algorithm\nNew Worst-Case Upper Bound for #XSAT\nWorst-Case Upper Bound for (1, 2)-QSAT\nPractical inventory routing: A problem definition and an optimization  method\nAn Agent Based Architecture (Using Planning) for Dynamic and Semantic  Web Services Composition in an EBXML Context\nClimbing depth-bounded adjacent discrepancy search for solving hybrid  flow shop scheduling problems with multiprocessor tasks\nThe AllDifferent Constraint with Precedences\nInformed Heuristics for Guiding Stem-and-Cycle Ejection Chains\nHandwritten Digit Recognition with a Committee of Deep Neural Nets on  GPUs\nRepresenting First-Order Causal Theories by Logic Programs\nOn Understanding and Machine Understanding\nFoundations for Uniform Interpolation and Forgetting in Expressive  Description Logics\nUnderstanding Exhaustive Pattern Learning\nBoolean Equi-propagation for Optimized SAT Encoding\nHybrid Tractable Classes of Binary Quantified Constraint Satisfaction  Problems\nArc Consistency and Friends\nLimits of Preprocessing\nMean-Variance Optimization in Markov 
Decision Processes\nProposal of Pattern Recognition as a necessary and sufficient Principle  to Cognitive Science\nA Linear Time Natural Evolution Strategy for Non-Separable Functions\nActual causation and the art of modeling\nExtensional Higher-Order Logic Programming\nOn the expressive power of unit resolution\nEmbedding and Automating Conditional Logics in Classical Higher-Order  Logic\nCoincidences and the encounter problem: A formal account\nSymmetry-Based Search Space Reduction For Grid Maps\nUnderstanding opinions. A cognitive and formal account\nExploiting Reputation in Distributed Virtual Environments\nPose Estimation from a Single Depth Image for Arbitrary Kinematic  Skeletons\nClass-based Rough Approximation with Dominance Principle\nA case of combination of evidence in the Dempster-Shafer theory  inconsistent with evaluation of probabilities\nA Probabilistic Attack on NP-complete Problems\nLaw of Connectivity in Machine Learning\nTask swapping networks in distributed systems\nCurrent State and Challenges of Automatic Planning in Web Service  Composition\nRule-Based Semantic Sensing\nOn the Undecidability of Fuzzy Description Logics with GCIs with  Lukasiewicz t-norm\nAn end-to-end machine learning system for harmonic analysis of music\nA theory of multiclass boosting\nOrigins of Answer-Set Programming - Some Background And Two Personal  Accounts\nSolving puzzles described in English by automated translation to answer  set programming and learning how to do that translation\nConvergence of a Recombination-Based Elitist Evolutionary Algorithm on  the Royal Roads Test Function\nComputing with Logic as Operator Elimination: The ToyElim System\nEvent in Compositional Dynamic Semantics\nEncoding Phases using Commutativity and Non-commutativity in a Logical  Framework\nFdConfig: A Constraint-Based Interactive Product Configurator\nA prototype of a knowledge-based programming environment\nA Constraint Logic Programming Approach for Computing Ordinal  
Conditional Functions\nConfidentiality-Preserving Data Publishing for Credulous Users by  Extended Abduction\nProof System for Plan Verification under 0-Approximation Semantics\nDomain-specific Languages in a Finite Domain Constraint Programming  System\nCoprocessor - a Standalone SAT Preprocessor\nTransfer from Multiple MDPs\nVisual Inference Specification Methods for Modularized Rulebases.  Overview and Integration Proposal\nApplication of the Modified 2-opt and Jumping Gene Operators in  Multi-Objective Genetic Algorithm to solve MOTSP\nConceptual Knowledge Markup Language: The central core\nConjure Revisited: Towards Automated Constraint Modelling\nDigital Libraries, Conceptual Knowledge Systems, and the Nebula  Interface\nOn the use of reference points for the biobjective Inventory Routing  Problem\nA Characterization of the Combined Effects of Overlap and Imbalance on  the SVM Classifier\nSocial choice rules driven by propositional logic\nExplicit Approximations of the Gaussian Kernel\nNew Candidates Welcome! 
Possible Winners with respect to the Addition of  New Candidates\nUnbiased Statistics of a CSP - A Controlled-Bias Generator\nConstraining the Size Growth of the Task Space with Socially Guided  Intrinsic Motivation using Demonstrations\nContinuity in Information Algebras\nTacit knowledge mining algorithm based on linguistic truth-valued  concept lattice\nThe computation of first order moments on junction trees\nA Dichotomy for 2-Constraint Forbidden CSP Patterns\nProgress in animation of an EMA-controlled tongue model for  acoustic-visual speech synthesis\nA Description Logic Primer\nCognitive Memory Network\nImproving feature selection algorithms using normalised feature  histograms\nRecommender System Based on Algorithm of Bicluster Analysis RecBi\nTowards quantitative measures in applied ontology\nA temporally abstracted Viterbi algorithm\nStrictly Proper Mechanisms with Cooperating Players\nBayesian network learning with cutting planes\nEfficient Inference in Markov Control Problems\nReasoning about RoboCup Soccer Narratives\nSuboptimality Bounds for Stochastic Shortest Path Problems\nNoisy Search with Comparative Feedback\nVariational Algorithms for Marginal MAP\nOrder-of-Magnitude Influence Diagrams\nIterated risk measures for risk-sensitive Markov decision processes with  discounted cost\nThe Structure of Signals: Causal Interdependence Models for Games of  Incomplete Information\nGraphical Models for Bandit Problems\nMAV Stabilization using Machine Learning and Onboard Sensors\nElitism Levels Traverse Mechanism For The Derivation of Upper Bounds on  Unimodal Functions\nMarginality: a numerical mapping for enhanced treatment of nominal and  hierarchical attributes\n(Dual) Hoops Have Unique Halving\nA Probabilistic Transmission Expansion Planning Methodology based on  Roulette Wheel Selection and Social Welfare\nCombining Voting Rules Together\nGaussian Process Topic Models\nSuper-Samples from Kernel Herding\nRegularized Maximum Likelihood for Intrinsic 
Dimension Estimation\nApproximating Higher-Order Distances Using Random Projections\nDirichlet Process Mixtures of Generalized Mallows Models\nA Bayesian Matrix Factorization Model for Relational Data\nLearning networks determined by the ratio of prior and data\nThe Abzooba Smart Health Informatics Platform (SHIP) TM - From Patient  Experiences to Big Data to Insights\nLearning Feature Hierarchies with Centered Deep Boltzmann Machines\nOn Training Deep Boltzmann Machines\nGlobal preferential consistency for the topological sorting-based  maximal spanning tree problem\nUnit contradiction versus unit propagation\nCharacterization of Dynamic Bayesian Network\nSkin-color based videos categorization\nPublishing Identifiable Experiment Code And Configuration Is Important,  Good and Easy\nDerivation of Upper Bounds on Optimization Time of Population-Based  Evolutionary Algorithm on a Function with Fitness Plateaus Using Elitism  Levels Traverse Mechanism\nSimultaneous Object Detection, Tracking, and Event Recognition\nSolution Representations and Local Search for the bi-objective Inventory  Routing Problem\nAutomatic Sampling of Geographic objects\nObjective Function Designing Led by User Preferences Acquisition\nOn the Complexity of Finding Second-Best Abductive Explanations\nA Fuzzy Model for Analogical Problem Solving\nPoultry Diseases Expert System using Dempster-Shafer Theory\nDocument summarization using positive pointwise mutual information\nPublishing and linking transport data on the Web\nModularity-Based Clustering for Network-Constrained Trajectories\nThe Infinite Latent Events Model\nHerding Dynamic Weights for Partially Observed Random Field Models\nTemporal-Difference Networks for Dynamical Systems with Continuous  Observations and Actions\nProbabilistic Structured Predictors\nDomain Knowledge Uncertainty and Probabilistic Parameter Constraints\nInterpretation and Generalization of Score Matching\nCorrelated Non-Parametric Latent Feature Models\nREGAL: A 
Regularization based Algorithm for Reinforcement Learning in  Weakly Communicating MDPs\nOperations on soft sets revisited\nUnfair items detection in educational measurement\nThe Good, the Bad, and the Odd: Cycles in Answer-Set Programs\nNeural Networks for Handwritten English Alphabet Recognition\nA Mixed Integer Programming Model Formulation for Solving the Lot-Sizing  Problem\nFeature Weighting for Improving Document Image Retrieval System  Performance\nOnline open neuroimaging mass meta-analysis\nApproximating the Partition Function by Deleting and then Correcting for  Model Edges\nMulti-View Learning in the Presence of View Disagreement\nLearning Convex Inference of Marginals\nEstimation and Clustering with Infinite Rankings\nLearning Hidden Markov Models for Regression using Path Aggregation\nTopic Models Conditioned on Arbitrary Features with  Dirichlet-multinomial Regression\nImproving the Asymmetric TSP by Considering Graph Structure\nIdentifying Independence in Relational Models\nTrueLabel + Confusions: A Spectrum of Probabilistic Models in Analyzing  Multiple Ratings\nNear-Optimal BRL using Optimistic Local Transitions\nContinuous Inverse Optimal Control with Locally Optimal Examples\nActive Learning for Matching Problems\nMachine Learning that Matters\nStatistical Translation, Heat Kernels and Expected Distances\nGeneralized Polya Urn for Time-varying Dirichlet Process Mixtures\nOn Discarding, Caching, and Recalling Samples in Active Learning\nMarkov Chains on Orbits of Permutation Groups\nBounded Planning in Passive POMDPs\nExtension of Three-Variable Counterfactual Casual Graphic Model: from  Two-Value to Three-Value Random Variable\nA concentration theorem for projections\nDirect and Indirect Effects of Sequential Treatments\nSequential Document Representations and Simplicial Curves\nPredicting Conditional Quantiles via Reduction to Classification\nBayesian Multicategory Support Vector Machines\nAlternative Restart Strategies for CMA-ES\nRelational 
Data Mining Through Extraction of Representative Exemplars\nUnsupervised spectral learning\nOn Privacy-Preserving Histograms\nTwo-Way Latent Grouping Model for User Preference Prediction\nLearning to Map Sentences to Logical Form: Structured Classification  with Probabilistic Categorial Grammars\nAn Algorithm for Computing Stochastically Stable Distributions with  Applications to Multiagent Learning in Repeated Games\nSuper-Mixed Multiple Attribute Group Decision Making Method Based on  Hybrid Fuzzy Grey Relation Approach Degree\nGeneralized Hybrid Grey Relation Method for Multiple Attribute Mixed  Type Decision Making\nThe SeqBin Constraint Revisited\nRule Based Expert System for Diagnosis of Neuromuscular Disorders\nOn Formal Specification of Maple Programs\nOn-line Prediction with Kernels and the Complexity Approximation  Principle\nApplying Discrete PCA in Data Analysis\nExponential Families for Conditional Random Fields\nAn Extended Cencov-Campbell Characterization of Conditional Information  Geometry\nMaximum Entropy for Collaborative Filtering\nConvergence and asymptotic normality of variational Bayesian  approximations for exponential family models with missing values\nComputing Best-Response Strategies in Infinite Games of Incomplete  Information\nTowards Understanding Triangle Construction Problems\nProbability Bracket Notation, Multivariable Systems and Static Bayesian  Networks\nVOI-aware MCTS\nRedundant Sudoku Rules\nEarthquake Scenario Reduction by Symmetry Reasoning\nDiversity in Ranking using Negative Reinforcement\nFMLtoHOL (version 1.0): Automating First-order Modal Logics with LEO-II  and Friends\nFree Lunch or No Free Lunch: That is not Just a Question?\nCredal nets under epistemic irrelevance\nAlgorithmic Simplicity and Relevance\nExperiments with Game Tree Search in Real-Time Strategy Games\nElimination of ISI Using Improved LMS Based Decision Feedback Equalizer\nLifted Variable Elimination: A Novel Operator and Completeness Results\nA 
Unifying Survey of Reinforced, Sensitive and Stigmergic Agent-Based  Approaches for E-GTSP\nParallel ACO with a Ring Neighborhood for Dynamic TSP\nDesign of Low Noise Amplifiers Using Particle Swarm Optimization\nA matrix approach for computing extensions of argumentation frameworks\nMultimodal diffusion geometry by joint diagonalization of Laplacians\nPattern Detection with Rare Item-set Mining\nTractable Optimization Problems through Hypergraph-Based Structural  Restrictions\nTheorem Proving in Large Formal Mathematics as an Emerging AI Field\nSpeech Signal Filters based on Soft Computing Techniques: A Comparison\nOn Move Pattern Trends in a Large Go Games Corpus\nEfficient Natural Evolution Strategies\nRelative Expressiveness of Defeasible Logics\nInformation fusion in multi-task Gaussian processes\nSimulated Tom Thumb, the Rule Of Thumb for Autonomous Robots\nAI in arbitrary world\nDistributional Framework for Emergent Knowledge Acquisition and its  Application to Automated Document Annotation\nMulti-threaded ASP Solving with clasp\nArtex is AnotheR TEXt summarizer\nIntroduction to the 28th International Conference on Logic Programming  Special Issue\nLocal Optima Networks, Landscape Autocorrelation and Heuristic Search  Performance\nRelational Theories with Null Values and Non-Herbrand Stable Models\nBudget Optimization for Sponsored Search: Censored Learning in MDPs\nCombining local search techniques and path following for bimatrix games\nSample-efficient Nonstationary Policy Evaluation for Contextual Bandits\nHokusai - Sketching Streams in Real Time\nHidden Trends in 90 Years of Harvard Business Review\nAttractor networks and memory replay of phase coded spike patterns\nA Biomimetic Approach Based on Immune Systems for Classification of  Unstructured Data\nGet my pizza right: Repairing missing is-a relations in ALC ontologies  (extended version)\nHierarchical Learning Algorithm for the Beta Basis Function Neural  Network\nTemporal Autoencoding Restricted 
Boltzmann Machine\nAn Experiment on the Connection between the DLs' Family DL<ForAllPiZero>  and the Real World\nA hybrid cross entropy algorithm for solving dynamic transit network  design problem\nShadows and headless shadows: a worlds-based, autobiographical approach  to reasoning\nModeling problems of identity in Little Red Riding Hood\nShadows and Headless Shadows: an Autobiographical Approach to Narrative  Reasoning\nNew Hoopoe Heuristic Optimization\nNew Heuristics for Interfacing Human Motor System using Brain Waves\nProvocative radio transients and base rate bias: a Bayesian argument for  conservatism\nCompositional Stochastic Modeling and Probabilistic Programming\nCompiling Relational Database Schemata into Probabilistic Graphical  Models\nA New Algorithm for Maximum Likelihood Estimation in Gaussian Graphical  Models for Marginal Independence\nMarkov Random Walk Representations with Continuous Distributions\nEfficient Parametric Projection Pursuit Density Estimation\nBoltzmann Machine Learning with the Latent Maximum Entropy Principle\nAccelerating Inference: towards a full Language, Compiler and Hardware  stack\nProduct/Brand extraction from WikiPedia\nKeyword Extraction for Identifying Social Actors\nA trust-based security mechanism for nomadic users in pervasive systems\nImproving problem solving by exploiting the concept of symmetry\nGeneral Lower Bounds based on Computer Generated Higher Order Expansions\nStaged Mixture Modelling and Boosting\nMechanism Design with Execution Uncertainty\nUnsupervised Active Learning in Large Domains\nAdaptive Foreground and Shadow Detection inImage Sequences\nTranslating NP-SPEC into ASP\nLanguage ASP{f} with Arithmetic Expressions and Consistency-Restoring  Rules\nAnswer Set Programming for Stream Reasoning\nTwo New Definitions of Stable Models of Logic Programs with Generalized  Quantifiers\nLloyd-Topor Completion and General Stable Models\nFuzzy Soft Set Based Classification for Gene Expression Data\nA 
Forgetting-based Approach to Merging Knowledge Bases\nProceedings of Answer Set Programming and Other Computing Paradigms  (ASPOCP 2012), 5th International Workshop, September 4, 2012, Budapest,  Hungary\nDiscovering Multiple Constraints that are Frequently Approximately  Satisfied\nIterative Markov Chain Monte Carlo Computation of Reference Priors and  Minimax Risk\nCutting Recursive Autoencoder Trees\nMonte Carlo Inference via Greedy Importance Sampling\nAn Uncertainty Framework for Classification\nA formalization of re-identification in terms of compatible  probabilities\nRecycling Proof Patterns in Coq: Case Studies\nApproximation of Classification and Measures of Uncertainty in Rough Set  on Two Universal Sets\nLearning by Transduction\nOn the Geometry of Bayesian Graphical Models with Hidden Variables\nEfficient Partial Order CDCL Using Assertion Level Choice Heuristics\nEstimation of Effects of Sequential Treatments by Reparameterizing  Directed Acyclic Graphs\nFast Image Scanning with Deep Max-Pooling Convolutional Neural Networks\nComplexity distribution of agent policies\nReasoning about Independence in Probabilistic Models of Relational Data\nPreference-Based Unawareness\nLearning in Multi-level Stochastic games with Delayed Information\nThe Semantic Web takes Wing: Programming Ontologies with Tawny-OWL\nReducing Validity in Epistemic ATL to Validity in Epistemic CTL\nGene-Machine, a new search heuristic algorithm\nQuantum and Concept Combination, Entangled Measurements and Prototype  Theory\nTowards Automated Proof Strategy Generalisation\nSeparating Topology and Geometry in Space Planning\nGenerating extrema approximation of analytically incomputable functions  through usage of parallel computer aided genetic algorithms\nDiscovering Semantic Spatial and Spatio-Temporal Outliers from Moving  Object Trajectories\nModel Based Framework for Estimating Mutation Rate of Hepatitis C Virus  in Egypt\nBipolar Fuzzy Soft sets and its applications in decision 
making problem\nDiscrete Optimization of Statistical Sample Sizes in Simulation by Using  the Hierarchical Bootstrap Method\nDesign for a Darwinian Brain: Part 1. Philosophy and Neuroscience\nSymmetries in Modal Logics\nTesting Hypotheses by Regularized Maximum Mean Discrepancy\nExtending Modern SAT Solvers for Enumerating All Models\nAn Improved EM algorithm\nGeneralized Neutrosophic Soft Set\nA Mining-Based Compression Approach for Constraint Satisfaction Problems\nAplicacion de las Redes Neuronales al Reconocimiento de Sistemas  Operativos\nRobust Logistic Regression using Shift Parameters (Long Version)\nSemantic Web Search based on Ontology Modeling using Protege Reasoner\nImproved Branch-and-Bound for Low Autocorrelation Binary Sequences\nA Cooperative Coevolutionary Genetic Algorithm for Learning Bayesian  Network Structures\nCollaborative ontology sharing and editing\nAlgebraic Properties of Qualitative Spatio-Temporal Calculi\nModelling Electricity Consumption in Office Buildings: An Agent Based  Approach\nSensitive Ants for Denial Jamming Attack on Wireless Sensor Network\nTowards Detection of Bottlenecks in Modular Systems\nLLAMA: Leveraging Learning to Automatically Manage Algorithms\nDirect Uncertainty Estimation in Reinforcement Learning\nAccomplishable Tasks in Knowledge Representation\nSparse Auto-Regressive: Robust Estimation of AR Parameters\nVerifying the Steane code with Quantomatic\nSolution to Quadratic Equation Using Genetic Algorithm\nParallel Algorithm for Longest Common Subsequence in a String\nDistributed Heuristic Forward Search for Multi-Agent Systems\nSimulating Ability: Representing Skills in Games\nREAD-EVAL-PRINT in Parallel and Asynchronous Proof-checking\nDecision Making for Inconsistent Expert Judgments Using Negative  Probabilities\nA novel approach of solving the CNF-SAT problem\nLearning to Understand by Evolving Theories\nReasoning for Moving Blocks Problem: Formal Representation and  Implementation\nIntegration of 3D Object 
Recognition and Planning for Robotic  Manipulation: A Preliminary Report\nExtracting Information-rich Part of Texts using Text Denoising\nSigma Point Belief Propagation\nAnalysing Quality of English-Hindi Machine Translation Engine Outputs  Using Bayesian Classification\nGraded Causation and Defaults\nCompact Representations of Extended Causal Models\nApproximate Counting CSP Solutions Using Partition Function\nBeyond the quantum formalism: consequences of a neural-oscillator model  to quantum cognition\nA new look at reweighted message passing\nAn evolutionary approach to Function\nGenerating Explanations for Biomedical Queries\nBoosting in the presence of label noise\nGaussian Processes for Big Data\nStructured Convex Optimization under Submodular Constraints\nBeyond Log-Supermodularity: Lower Bounds and the Bethe Partition  Function\nSpeedy Model Selection (SMS) for Copula Models\nApproximate Kalman Filter Q-Learning for Continuous State-Space MDPs\nAn upper bound on prototype set size for condensed nearest neighbor\nLearning Chordal Markov Networks by Constraint Satisfaction\nA necessary and sufficient condition for two relations to induce the  same definable set family\nA Sparse and Adaptive Prior for Time-Dependent Model Parameters\nActivity date estimation in timestamped interaction networks\nWhen is an Example a Counterexample?\nTechnical Report: Distribution Temporal Logic: Combining Correctness  with Quality of Estimation\nInformation, Computation, Cognition. 
Agency-based Hierarchies of Levels\nA novel local search based on variable-focusing for random K-SAT\nA generalized evidence distance\nMethods for Integrating Knowledge with the Three-Weight Optimization  Algorithm for Hybrid Cognitive Processing\nA hybrid decision support system : application on healthcare\nA Constraint Programming Approach for Mining Sequential Patterns in a  Sequence Database\nIntroduction to Neutrosophic Measure, Neutrosophic Integral, and  Neutrosophic Probability\nCase-Based Merging Techniques in OAKPLAN\nA state vector algebra for algorithmic implementation of second-order  logic\nOntoVerbal: a Generic Tool and Practical Application to SNOMED CT\nRepresenting Knowledge Base into Database for WAP and Web-based Expert  System\nPath Based Mapping Technique for Robots\nParkinson's Disease Motor Symptoms in Machine Learning: A Review\nSharpening independence results for Huntington's affine geometry\nSemantic Annotation: The Mainstay of Semantic Web\nConservative, Proportional and Optimistic Contextual Discounting in the  Belief Functions Theory\nGenerating Shortest Synchronizing Sequences using Answer Set Programming\nAbstract Modular Systems and Solvers\nVolumetric Spanners: an Efficient Exploration Basis for Learning\nThe Value Iteration Algorithm is Not Strongly Polynomial for Discounted  Dynamic Programming\nDescription Logics based Formalization of Wh-Queries\nA regression model with a hidden logistic process for signal  parametrization\nFunctional Mixture Discriminant Analysis with hidden process regression  for curve classification\nProceedings of Answer Set Programming and Other Computing Paradigms  (ASPOCP 2013), 6th International Workshop, August 25, 2013, Istanbul, Turkey\nA Review: Expert System for Diagnosis of Myocardial Infarction\nA stochastic model for Case-Based Reasoning\nAntipodal Interval-Valued Fuzzy Graphs\nCortical prediction markets\nA logic for reasoning about ambiguity\nLatent Tree Models and Approximate Inference in 
Bayesian Networks\nRoxyBot-06: Stochastic Prediction and Optimization in TAC Travel\nMechanisms for Multi-Unit Auctions\nPolicy Invariance under Reward Transformations for General-Sum  Stochastic Games\nSolving the Minimum Common String Partition Problem with the Help of  Ants\nSkill Analysis with Time Series Image Data\nSentence Compression as Tree Transduction\nCross-lingual Annotation Projection for Semantic Roles\nAn Enhanced Branch-and-bound Algorithm for the Talent Scheduling Problem\nHypergraph Acyclicity and Propositional Model Counting\nTractable Epistemic Reasoning with Functional Fluents, Static Causal  Laws and Postdiction\nDesign a Persian Automated Plagiarism Detector (AMZPPD)\nDefuzzify firstly or finally: Dose it matter in fuzzy DEMATEL under  uncertain environment?\nNon-characterizability of belief revision: an application of finite  model theory\nDifficulty Rating of Sudoku Puzzles: An Overview and Evaluation\nSelf-protection and self-healing in the context of cognitive radio\nReasoning about Knowledge and Strategies: Epistemic Strategy Logic\nScalable Planning and Learning for Multiagent POMDPs: Extended Version\nMTD(f), A Minimax Algorithm Faster Than NegaScout\nVerification of confliction and unreachability in rule-based expert  systems with model checking\nAn Integer Programming Model for the Dynamic Location and Relocation of  Emergency Vehicles: A Case Study\nA new combination approach based on improved evidence distance\nGeneralized Evidence Theory\nCausal Interfaces\nGraph Kernels via Functional Embedding\nModeling multi-stage decision optimization problems\nGradual Classical Logic for Attributed Objects\nA Comparative study Between Fuzzy Clustering Algorithm and Hard  Clustering Algorithm\nRough Clustering Based Unsupervised Image Change Detection\nUnsupervised Text Extraction from G-Maps\nCredulous and Skeptical Argument Games for Complete Semantics in  Conflict Resolution based Argumentation\nDeontic Logic for Human 
Reasoning\nExchangeable Variable Models\nOn the Relative Expressiveness of Argumentation Frameworks, Normal Logic  Programs and Abstract Dialectical Frameworks\nThe Multi-engine ASP Solver ME-ASP: Progress Report\nGabor Filter and Rough Clustering Based Edge Detection\nTowards a Benchmark of Natural Language Arguments\nAn expert system for recommending suitable ornamental fish addition to  an aquarium based on aquarium condition\nTransalg: a Tool for Translating Procedural Descriptions of Discrete  Functions to SAT\nAdaptive Monte Carlo via Bandit Allocation\nDeveloping Corpus-based Translation Methods between Informal and Formal  Mathematics: Project Description\nESmodels: An Epistemic Specification Solver\nAnytime Computation of Cautious Consequences in Answer Set Programming\nBuilding a Classification Model for Enrollment In Higher Educational  Courses using Data Mining Techniques\nApplication of Methods for Syntax Analysis of Context-Free Languages to  Query Evaluation of Logic Programs\nOff-Policy Shaping Ensembles in Reinforcement Learning\nUnderstanding model counting for $β$-acyclic CNF-formulas\nOn minimal sets of graded attribute implications\nOn the cost-complexity of multi-context systems\nIntegrating Vague Association Mining with Markov Model\nn-Valued Refined Neutrosophic Logic and Its Applications to Physics\nCounting Markov Blanket Structures\nLearning Probabilistic Programs\nRéseaux de radio cognitive : Allocation des ressources radio et  accès dynamique au spectre\nPossibilities of technologization of philosophical knowledge\nDecision-Making with Complex Data Structures using Probabilistic  Programming\nAbduction and Dialogical Proof in Argumentation and Logic Programming\nStrategy Synthesis for General Deductive Games Based on SAT Solving\nA Plausibility Semantics for Abstract Argumentation Frameworks\nVirus Detection in Multiplexed Nanowire Arrays using Hidden Semi-Markov  models\n$OntoMath^{PRO}$ Ontology: A Linked Data Hub for Mathematics\nAn 
evolutionary solver for linear integer programming\n'Almost Sure' Chaotic Properties of Machine Learning Methods\nMONEYBaRL: Exploiting pitcher decision-making using Reinforcement  Learning\nBoundary properties of the inconsistency of pairwise comparisons in  group decisions\nBayesian Multitask Learning with Latent Hierarchies\nRobust Graphical Modeling with t-Distributions\nQuantum Annealing for Variational Bayes Inference\nPrediction with Advice of Unknown Number of Experts\nExponentiated Gradient Exploration for Active Learning\nFuzzy inequational logic\nControlled Natural Language Processing as Answer Set Programming: an  Experiment\nMatrix Completion under Interval Uncertainty\nThe New Approach on Fuzzy Decision Trees\nImproving the Interpretability of Support Vector Machines-based Fuzzy  Rules\nSoft Neutrosophic Algebraic Structures and Their Generalization\nConsensus and Consistency Level Optimization of Fuzzy Preference  Relation: A Soft Computing Approach\nEquilibrium States in Numerical Argumentation Networks\nMathematical Knowledge Representation: Semantic Models and Formalisms\nOn the Computational Efficiency of Training Neural Networks\nInteractive Error Correction in Implicative Theories\nTowards a Model Theory for Distributed Representations\nParameterizing the semantics of fuzzy attribute implications by systems  of isotone Galois connections\nAn Unsupervised Ensemble-based Markov Random Field Approach to  Microscope Cell Image Segmentation\nReasoning for ALCQ extended with a flexible meta-modelling hierarchy\nA Comparison of learning algorithms on the Arcade Learning Environment\nConditional Generative Adversarial Nets\nHardware and Software manual for Evolution of Oil Droplets in a  Chemo-Robotic Platform\nModeling Word Relatedness in Latent Dirichlet Allocation\nBounding the Probability of Causation in Mediation Analysis\nHandling owl:sameAs via Rewriting\nAn Approach to Model Checking of Multi-agent Data Analysis\nIntegrating Fuzzy and Ant 
Colony System for Fuzzy Vehicle Routing  Problem with Time Windows\nUsing Description Logics for RDF Constraint Checking and Closed-World  Recognition\nInfluence Functions for Machine Learning: Nonparametric Estimators for  Entropies, Divergences and Mutual Informations\nA Note on Systematic Conflict Generation in CA-EN-type Causal Structures\nRelations World: A Possibilistic Graphical Model\nAutomated Reasoning in Deontic Logic\nA Unified View of Large-scale Zero-sum Equilibrium Computation\nFalling Rule Lists\nDiscrete Bayesian Networks: The Exact Posterior Marginal Distributions\nRational Deployment of Multiple Heuristics in IDA*\nEfficient Algorithms for Bayesian Network Parameter Learning from  Incomplete Data\nUnweighted Stochastic Local Search can be Effective for Random CSP  Benchmarks\nImproving the Deductive System DES with Persistence by Using SQL DBMS's\nBelief Hierarchical Clustering\nHolographic Graph Neuron: a Bio-Inspired Architecture for Pattern  Processing\nValue Iteration with Options and State Aggregation\nSecond International Nurse Rostering Competition (INRC-II) --- Problem  Description and Rules ---\nStructure Learning in Bayesian Networks of Moderate Size by Efficient  Sampling\nSlice Sampling for Probabilistic Programming\nConsid{é}rant la d{é}pendance dans la th{é}orie des fonctions de  croyance\nSecond-Order Belief Hidden Markov Models\nInt{é}gration d'une mesure d'ind{é}pendance pour la fusion  d'informations\nOutput-Sensitive Adaptive Metropolis-Hastings for Probabilistic Programs\nInclusion within Continuous Belief Functions\nA Flexible Coupling Approach to Multi-Agent Planning under Incomplete  Information\nA Modification of the Halpern-Pearl Definition of Causality\nTowards the Ontology Web Search Engine\nA Feature-based Classification Technique for Answering Multi-choice  World History Questions\nFast Differentially Private Matrix Factorization\nStructure Formation in Large Theories\nLeoPARD --- A Generic Platform for the 
Implementation of Higher-Order  Reasoners\nRelations between MDDs and Tuples and Dynamic Modifications of MDDs  based constraints\nNorm Monitoring under Partial Action Observability\nOntoSOC: Sociocultural Knowledge Ontology\nScalable Parallel Numerical Constraint Solver Using Global Load  Balancing\nOn sets of graded attribute implications with witnessed non-redundancy\nOn the relation between accuracy and fairness in binary classification\nNew HSL Distance Based Colour Clustering Algorithm\nA Logic of Knowing How\nFeature Representation for Online Signature Verification\nA Tool for Computing and Estimating the Volume of the Solution Space of  SMT(LA)\nOnline Transfer Learning in Reinforcement Learning Domains\nEmphatic Temporal-Difference Learning\nDependency-based Convolutional Neural Networks for Sentence Embedding\nArchaeology in the Digital Age: From Paper to Databases\nTowards Log-Linear Logics with Concrete Domains\nFirst-order integer programming for MAP problems\nComplexity and Compilation of GZ-Aggregates in Answer Set Programming\nSolomonoff Induction Violates Nicod's Criterion\nOptimizing the computation of overriding\nReinforcement Learning for the Unit Commitment Problem\nRAPS: A Recommender Algorithm Based on Pattern Structures\nTowards a Better Understanding of CAR, CDR, CADR and the Others\nAdapting Stochastic Search For Real-time Dynamic Weighted Constraint  Satisfaction\nComputation of Stackelberg Equilibria of Finite Sequential Games\nImplementing Efficient All Solutions SAT Solvers\nLocal Rademacher Complexity Bounds based on Covering Numbers\nTowards a general framework for an observation and knowledge based model  of occupant behaviour in office buildings\nGelisp: A Library to Represent Musical CSPs and Search Strategies\nOn oblivious branching programs with bounded repetition that cannot  efficiently compute CNFs of bounded treewidth\nNarrative Science Systems: A Review\nCreating Scalable and Interactive Web Applications Using High  
Performance Latent Variable Models\nAn Efficient Implementation for WalkSAT\nEmpirical Study on Deep Learning Models for Question Answering\nRedesigning pattern mining algorithms for supercomputers\nChaos of Protein Folding\nVisualising interactive inferences with IDPD3\nSubmodular Hamming Metrics\nDeep Multimodal Semantic Embeddings for Speech and Images\nComplexity of the Description Logic ALCM\nCommunicating Semantics: Reference by Description\nPlanning in the Wild: Modeling Tools for PDDL\nBayesian Network Models for Adaptive Testing\nShaping Proto-Value Functions via Rewards\nOn the convergence of cycle detection for navigational reinforcement  learning\nColumn-Oriented Datalog Materialization for Large Knowledge Graphs  (Extended Technical Report)\nAsk, and shall you receive?: Understanding Desire Fulfillment in Natural  Language Text\nA SAT model to mine flexible sequences in transactional datasets\nCOCO: The Bi-objective Black Box Optimization Benchmarking (bbob-biobj)  Test Suite\nCoordination of Players in Ride-Sharing Games by Signaling\nCharacter-Level Question Answering with Attention\nExtending DLR with Labelled Tuples, Projections, Functional Dependencies  and Objectification (full version)\nData-Efficient Off-Policy Policy Evaluation for Reinforcement Learning\nOn the uniform one-dimensional fragment\nTowards an Indexical Model of Situated Language Comprehension for  Cognitive Agents in Physical Worlds\nPatterns on data described by vague limits, vague colimits and vague  commutativity\nResource Allocation with Population Dynamics\nHordeQBF: A Modular and Massively Parallel QBF Solver\nSingle-Image Depth Perception in the Wild\nKOGNAC: Efficient Encoding of Large Knowledge Graphs\nA global constraint for closed itemset mining\nText-based LSTM networks for Automatic Music Composition\nProcedural urban environments for FPS games\nA Factorization Machine Framework for Testing Bigram Embeddings in  Knowledgebase Completion\nParameterized Compilation 
Lower Bounds for Restricted CNF-formulas\nA Computational Model for Situated Task Learning with Interactive  Instruction\nExtracted Social Network Mining\nAGM-Style Revision of Beliefs and Intentions from a Database Perspective  (Preliminary Version)\nEndgame Analysis of Dou Shou Qi\nMutual Transformation of Information and Knowledge\nSelecting the Selection\nSupervisory Control for Behavior Composition\nContext Discovery for Model Learning in Partially Observable  Environments\nLearning a Driving Simulator\nInteracting Conceptual Spaces\nStable Models for Infinitary Formulas with Extensional Atoms\nQuery Answering in Resource-Based Answer Set Semantics\nWinograd Schemas and Machine Translation\nDelta Epsilon Alpha Star: A PAC-Admissible Search Algorithm\nDeeply Semantic Inductive Spatio-Temporal Learning\nMean Box Pooling: A Rich Image Representation and Output Embedding for  the Visual Madlibs Task\nResolving Spatial-Time Conflicts In A Set Of Any-angle Or  Angle-constrained Grid Paths\nNeural Generation of Regular Expressions from Natural Language with  Minimal Domain Knowledge\nPerceptual Reward Functions\nA Geometric Framework for Convolutional Neural Networks\nNatural Language Processing using Hadoop and KOSHIK\nFree Lunch for Optimisation under the Universal Distribution\nEvaluating Causal Models by Comparing Interventional Distributions\nTowards Music Captioning: Generating Music Playlist Descriptions\nEffectiveness of greedily collecting items in open world games\nCausality and Responsibility for Formal Verification and Beyond\nFrom Deterministic ODEs to Dynamic Structural Causal Models\nAchievements in Answer Set Programming (Preliminary Report)\nBreakID: Static Symmetry Breaking for ASP (System Description)\nALLSAT compressed with wildcards. 
Part 1: Converting CNF's to orthogonal  DNF's\nThe Generalized Smallest Grammar Problem\nOptimal Upper and Lower Bounds for Boolean Expressions by Dissociation\nA Quantitative Version of the Gibbard-Satterthwaite Theorem for Three  Alternatives\nOvercoming Misleads In Logic Programs by Redefining Negation\nActivity-Based Search for Black-Box Contraint-Programming Solvers\nThe matrices of argumentation frameworks\nBi-modal Gödel logic over [0,1]-valued Kripke frames\nA Generalized Arc-Consistency Algorithm for a Class of Counting  Constraints: Revised Edition that Incorporates One Correction\nQuels formalismes temporels pour représenter des connaissances  extraites de textes de recettes de cuisine ?\nModelling Constraint Solver Architecture Design as a Constraint Problem\nA cognitive diversity framework for radar target classification\nEntropy Search for Information-Efficient Global Optimization\nMulti-granular Perspectives on Covering\nBootstrapping Intrinsically Motivated Learning with Human Demonstrations\nReal-time face swapping as a tool for understanding infant  self-recognition\nTruncated Power Method for Sparse Eigenvalue Problems\nEnhancing Support for Knowledge Works: A relatively unexplored vista of  computing research\nMulti-q Analysis of Image Patterns\nDisjunctive Logic Programs versus Normal Logic Programs\nIFP-Intuitionistic fuzzy soft set theory and its applications\nLogical Fuzzy Preferences\nNested Aggregates in Answer Sets: An Application to a Priori  Optimization\nRoborobo! 
a Fast Robot Simulator for Swarm and Collective Robotics\nLogical Probability Preferences\nLogical Stochastic Optimization\nJustificatory and Explanatory Argumentation for Committing Agents\nUnveiling the link between logical fallacies and web persuasion\nEfficient Computation of Mean Truncated Hitting Times on Very Large  Graphs\nMining to Compact CNF Propositional Formulae\nTemporal Description Logic for Ontology-Based Data Access (Extended  Version)\nEnacting Social Argumentative Machines in Semantic Wikipedia\nAutomating the Dispute Resolution in Task Dependency Network\nEnhancements to ACL2 in Versions 5.0, 6.0, and 6.1\nA Note on Topology Preservation in Classification, and the Construction  of a Universal Neuron Grid\nMaLeS: A Framework for Automatic Tuning of Automated Theorem Provers\nHidden Parameter Markov Decision Processes: A Semiparametric Regression  Approach for Discovering Latent Task Parametrizations\nBat Algorithm: Literature Review and Applications\nEvolution Theory of Self-Evolving Autonomous Problem Solving Systems\nAn Integrated Framework for Diagnosis and Prognosis of Hybrid Systems\nThe Partner Units Configuration Problem: Completing the Picture\nMicrostrip Coupler Design Using Bat Algorithm\nCombining finite and continuous solvers\nRevisiting the Learned Clauses Database Reduction Strategies\nHandwritten Character Recognition In Malayalam Scripts- A Review\nFeature and Variable Selection in Classification\nMachine Learner for Automated Reasoning 0.4 and 0.5\nParameter estimation based on interval-valued belief structures\nTowards Ultra Rapid Restarts\nA normative account of defeasible and probabilistic inference\nLine Maps in Cluttered Environments\nReciprocity in Gift-Exchange-Games\nA Superposition Calculus for Abductive Reasoning\nA Geometric Method to Obtain the Generation Probability of a Sentence\nThe Best Templates Match Technique For Example Based Machine Translation\nMultiscale probability transformation of basic probability 
assignment\nIntroduction to Neutrosophic Statistics\nRational Closure in SHIQ\nA bio-inspired algorithm for fuzzy user equilibrium problem by aid of  Physarum Polycephalum\nTableaux for Dynamic Logic of Propositional Assignments\nExpertBayes: Automatically refining manually built Bayesian networks\nGraph Approximation and Clustering on a Budget\nKalman Temporal Differences\nSemi-Separable Hamiltonian Monte Carlo for Inference in Bayesian  Hierarchical Models\nTowards a theory of granular sets\nNotes on hierarchical ensemble methods for DAG-structured taxonomies\nExact Decoding on Latent Variable Conditional Models is NP-Hard\nKnowledge Base of an Expert System Used for Dyslalic Children Therapy\nRandom Logic Programs: Linear Model\nCommunicating and resolving entity references\nFeature Selection in Conditional Random Fields for Map Matching of GPS  Trajectories\nAction Recognition in the Frequency Domain\nSimulating Non Stationary Operators in Search Algorithms\nOn Minimax Optimal Offline Policy Evaluation\nOn tensor rank of conditional probability tables in Bayesian networks\nNeighborhood Selection and Rules Identification for Cellular Automata: A  Rough Sets Approach\nThe Application of Differential Privacy for Rank Aggregation: Privacy  and Accuracy\nGradient-based Taxis Algorithms for Network Robotics\nMedical diagnosis as pattern recognition in a framework of information  compression by multiple alignment, unification and search\nA CSP implementation of the bigraph embedding problem\nUnsupervised Induction of Semantic Roles within a Reconstruction-Error  Minimization Framework\nHierarchical Mixture-of-Experts Model for Large-Scale Gaussian Process  Regression\nThe category of networks of ontologies\nLogic of temporal attribute implications\nFeature Weight Tuning for Recursive Neural Networks\nTuring Test for the Internet of Things\nStochastic Local Search for Pattern Set Mining\nEfficient Decision-Making by Volume-Conserving Physical Object\nExample Selection 
For Dictionary Learning\nTowards a Consistent, Sound and Complete Conceptual Knowledge\nKF metamodel formalization\nIn Search of the Real Inductive Bias: On the Role of Implicit  Regularization in Deep Learning\nA note about the generalisation of the C-tests\nMinimizing Regret in Dynamic Decision Problems\nInjury risk prediction for traffic accidents in Porto Alegre/RS, Brazil\nNumerical Solution of Fuzzy Stochastic Differential Equation\nCalibration of conditional composite likelihood for Bayesian inference  on Gibbs random fields\nThe Power of Randomization: Distributed Submodular Maximization on  Massive Datasets\nRandom Coordinate Descent Methods for Minimizing Decomposable Submodular  Functions\nOn Forgetting in Tractable Propositional Fragments\nConstrained Nonlinear Model Predictive Control of an MMA Polymerization  Process via Evolutionary Optimization\nExplaining robust additive utility models by sequences of preference  swaps\nInductive Learning for Rule Generation from Ontology\nAutomated Reasoning for Robot Ethics\nPseudo Fuzzy Set\nTransformation of basic probability assignments to probabilities based  on a new entropy measure\nPath Finding under Uncertainty through Probabilistic Inference\nOnline Fair Division: analysing a Food Bank problem\nProbabilistic Zero-shot Classification with Semantic Rankings\nNovel Metaknowledge-based Processing Technique for Multimedia Big Data  clustering challenges\nAn Introduction to Logics of Knowledge and Belief\nRobustly Leveraging Prior Knowledge in Text Classification\nCompositional Distributional Semantics with Long Short Term Memory\nTransitive reasoning with imprecise probabilities\nAutonomic Resource Management in Virtual Networks\nCombining partially independent belief functions\nA Rule-Based Short Query Intent Identification System\nProperties of Sparse Distributed Representations and their Application  to Hierarchical Temporal Memory\nRecent advances on inconsistency indices for pairwise comparisons - a  
commentary\nThe Libra Toolkit for Probabilistic Models\nMonte Carlo Localization in Hand-Drawn Maps\nDual Decomposition from the Perspective of Relax, Compensate and then  Recover\nRDF annotation of Second Life objects: Knowledge Representation meets  Social Virtual reality\nKnowledge reduction of dynamic covering decision information systems  with varying attribute values\nTractable Query Answering and Optimization for Extensions of  Weakly-Sticky Datalog+-\nFuzzy approaches to context variable in fuzzy geographically weighted  clustering\nGraphlet-based lazy associative graph classification\nFormalizing Preference Utilitarianism in Physical World Models\nx.ent: R Package for Entities and Relations Extraction based on  Unsupervised Learning and Document Structure\nLogical Conditional Preference Theories\nUsing Syntax-Based Machine Translation to Parse English into Abstract  Meaning Representation\nMaximum a Posteriori Estimation by Search in Probabilistic Programs\nTheory of Semi-Instantiation in Abstract Argumentation\nPrivate Disclosure of Information in Health Tele-monitoring\nPrefix-Projection Global Constraint for Sequential Pattern Mining\nDetecting Concept-level Emotion Cause in Microblogging\nInteractive Knowledge Base Population\nFormal Concept Analysis for Knowledge Discovery from Biological Data\nPerforming Bayesian Risk Aggregation using Discrete Approximation  Algorithms with Graph Factorization\nOn SAT Models Enumeration in Itemset Mining\nAn Ensemble method for Content Selection for Data-to-text Systems\nVariational Gaussian Copula Inference\nLeverage Financial News to Predict Stock Price Movements Using Word  Embeddings and Deep Neural Networks\nSequential Extensions of Causal and Evidential Decision Theory\nDynamic Bayesian Ontology Languages\nArgumentation Semantics for Prioritised Default Logic\nProcedural Content Generation for GDL Descriptions of Simplified  Boardgames\nFactor Graphs for Quantum Probabilities\nIdentifying Avatar Aliases in 
Starcraft 2\nArabic Text Watermarking: A Review\nSimulation of optical flow and fuzzy based obstacle avoidance system for  mobile robots\nFuzzy Longest Common Subsequence Matching With FCM Using R\nDrawing and Analyzing Causal DAGs with DAGitty\nDuration and Interval Hidden Markov Model for Sequential Data Analysis\nThe Relation Between Acausality and Interference in Quantum-Like  Bayesian Networks\nModel Guided Sampling Optimization for Low-dimensional Problems\nValue function approximation via low-rank models\nA Neural Attention Model for Abstractive Sentence Summarization\nBetter Document-level Sentiment Analysis from RST Discourse Parsing\nReinforcement Learning with Parameterized Actions\nNatural scene statistics mediate the perception of image complexity\nLoops with abelian inner mapping groups: An application of automated  deduction\nSports highlights generation based on acoustic events detection: A rugby  case study\nGraph Kernels exploiting Weisfeiler-Lehman Graph Isomorphism Test  Extensions\nA Compositional Explanation of the Pet Fish Phenomenon\nA Feature-Based Comparison of Evolutionary Computing Techniques for  Constrained Continuous Optimisation\nQuantum Look at two Common Logics: the Logic of Primitive Thinking and  the Logic of Everyday Human Reasoning\nEncoding Reality: Prediction-Assisted Cortical Learning Algorithm in  Hierarchical Temporal Memory\nBuilding Memory with Concept Learning Capabilities from Large-scale  Knowledge Base\nContamination-Free Measures and Algebraic Operations\nThe Rationale behind the Concept of Goal\nA Model for Safety Case Confidence Assessment\nConditions for Normative Decision Making at the Fire Ground\nSignal Representations on Graphs: Tools and Applications\nCombining Fuzzy Cognitive Maps and Discrete Random Variables\nSDDs are Exponentially More Succinct than OBDDs\nAngrier Birds: Bayesian reinforcement learning\nFuzzy Object-Oriented Dynamic Networks. 
I\nOn Clustering Time Series Using Euclidean Distance and Pearson  Correlation\nIndicators of Good Student Performance in Moodle Activity Data\nProactive Message Passing on Memory Factor Networks\nTop-N Recommender System via Matrix Completion\nThe Singularity Controversy, Part I: Lessons Learned and Open Questions:  Conclusions from the Battle on the Legitimacy of the Debate\nA Label Semantics Approach to Linguistic Hedges\nThe Utility of Hedged Assertions in the Emergence of Shared Categorical  Labels\nA First Attempt to Cloud-Based User Verification in Distributed System\nDiscussion on Mechanical Learning and Learning Machine\nGreedy Deep Dictionary Learning\nGECKA3D: A 3D Game Engine for Commonsense Knowledge Acquisition\nFuzzy Object-Oriented Dynamic Networks. II\nWayfinding and cognitive maps for pedestrian models\nRegion Based Approximation for High Dimensional Bayesian Network Models\nVariations of the Similarity Function of TextRank for Automated  Summarization\nMachine olfaction using time scattering of sensor multiresolution graphs\nScience Question Answering using Instructional Materials\nLarge-Scale Reasoning with OWL\nDeep Exploration via Bootstrapped DQN\nRecommendations as Treatments: Debiasing Learning and Evaluation\nA General Modifier-based Framework for Inconsistency-Tolerant Query  Answering\nToward Deeper Understanding of Neural Networks: The Power of  Initialization and a Dual View on Expressivity\nStrong Backdoors for Default Logic\nCauses for Query Answers from Databases, Datalog Abduction and  View-Updates: The Presence of Integrity Constraints\nSocial planning for social HRI\nSIFT: An Algorithm for Extracting Structural Information From Taxonomies\nThompson Sampling is Asymptotically Optimal in General Environments\nTowards Neural Knowledge DNA\nLearning to Blend Computer Game Levels\nA Set Theoretic Approach for Knowledge Representation: the  Representation Part\nPenta and Hexa Valued Representation of Neutrosophic Information\nGrounding 
Recursive Aggregates: Preliminary Report\nActive Algorithms For Preference Learning Problems with Multiple  Populations\nEvolving Shepherding Behavior with Genetic Programming Algorithms\nA System for Probabilistic Linking of Thesauri and Classification  Systems\nLearning Executable Semantic Parsers for Natural Language Understanding\nAn Expressive Probabilistic Temporal Logic\nProperties of ABA+ for Non-Monotonic Reasoning\nSpectral M-estimation with Applications to Hidden Markov Models\nAlgorithms for Batch Hierarchical Reinforcement Learning\nOrdinal Conditional Functions for Nearly Counterfactual Revision\nReactive Policies with Planning for Action Languages\nGraph Clustering Bandits for Recommendation\nOnline Learning of Commission Avoidant Portfolio Ensembles\nNotes on a model for fuzzy computing\nLSTM-based Mixture-of-Experts for Knowledge-Aware Dialogues\nCombinatorial Aspects of the Distribution of Rough Objects\nAdobe-MIT submission to the DSTC 4 Spoken Language Understanding pilot  task\nFunction-Described Graphs for Structural Pattern Recognition\nLearning Bounded Treewidth Bayesian Networks with Thousands of Variables\nConcept based Attention\nLearning Representations for Counterfactual Inference\nReal-Time Web Scale Event Summarization Using Sequential Decision Making\nNatural Language Semantics and Computability\nA New Method for Parallel Monte Carlo Tree Search\nOn Avoidance Learning with Partial Observability\nOn the Complexity of Connection Games\nCombat Models for RTS Games\nRelations such as Hypernymy: Identifying and Exploiting Hearst Patterns  in Distributional Vectors for Lexical Entailment\nThe Bees Algorithm for the Vehicle Routing Problem\nDynamic Bayesian Networks to simulate occupant behaviours in office  buildings related to indoor air quality\nAs Cool as a Cucumber: Towards a Corpus of Contemporary Similes in  Serbian\nExtracting Higher-Order Goals from the Mizar Mathematical Library\nPosterior Dispersion Indices\nCompliant Conditions 
for Polynomial Time Approximation of Operator  Counts\nInternal Guidance for Satallax\nPsychologically based Virtual-Suspect for Interrogative Interview  Training\nAdaptive Learning Rate via Covariance Matrix Based Preconditioning for  Deep Neural Networks\nInformation Theoretically Aided Reinforcement Learning for Embodied  Agents\nA structured argumentation framework for detaching conditional  obligations\nAn interactive fuzzy goal programming algorithm to solve decentralized  bi-level multiobjective fractional programming problem\nLearning to Optimize\nTowards Playlist Generation Algorithms Using RNNs Trained on  Within-Track Transitions\nStructured Convolution Matrices for Energy-efficient Deep learning\nSymbolic Music Data Version 1.0\nDialPort: Connecting the Spoken Dialog Research Community to Real User  Data\ne-Commerce product classification: our participation at cDiscount 2015  challenge\nGenerative Adversarial Imitation Learning\nA Probabilistic-Based Model for Binary CSP\nModal-set estimation with an application to clustering\nDeepMath - Deep Sequence Models for Premise Selection\nDeep Reinforcement Learning With Macro-Actions\nThe Mondrian Kernel\nProceedings First International Workshop on Hammers for Type Theories\nLearning Abstract Classes using Deep Learning\nFounded Semantics and Constraint Semantics of Logic Rules\nA Hierarchical Reinforcement Learning Method for Persistent  Time-Sensitive Tasks\nAutomated Extraction of Number of Subjects in Randomised Controlled  Trials\nThe VGLC: The Video Game Level Corpus\nEpistemic Protocols for Distributed Gossiping\nA Dynamic Epistemic Framework for Conformant Planning\n\"Show me the cup\": Reference with Continuous Representations\nExploring high-level Perspectives on Self-Configuration Capabilities of  Systems\nGreedy, Joint Syntactic-Semantic Parsing with Stack LSTMs\nSwift: Compiled Inference for Probabilistic Programming Languages\nLearning Crosslingual Word Embeddings without Bilingual 
Corpora\nProbabilistic Reasoning in the Description Logic ALCP with the Principle  of Maximum Entropy (Full Version)\nWhy is Posterior Sampling Better than Optimism for Reinforcement  Learning?\nMeaningful Models: Utilizing Conceptual Structure to Improve Machine  Learning Interpretability\nAnalysis of Double Covers of Factor Graphs\nA New Hierarchical Redundancy Eliminated Tree Augmented Naive Bayes  Classifier for Coping with Gene Ontology-based Features\nReal-Time Anomaly Detection for Streaming Analytics\nHow to Allocate Resources For Features Acquisition?\nMapping distributional to model-theoretic semantic spaces: a baseline\nsk_p: a neural program corrector for MOOCs\nPossibilistic Networks: Parameters Learning from Imprecise Data and  Evaluation strategy\nValidation of Information Fusion\nMeta-Prod2Vec - Product Embeddings Using Side-Information for  Recommendation\nGrounding Dynamic Spatial Relations for Embodied (Robot) Interaction\nNeuromorphic Robot Dream\nPersonalized Emphasis Framing for Persuasive Message Generation\nA MIP Backend for the IDP System\nHigh Dimensional Human Guided Machine Learning\nMulti Exit Configuration of Mesoscopic Pedestrian Simulation\nEquilibrium Graphs\nFeasibility of Post-Editing Speech Transcriptions with a Mismatched  Crowd\nWav2Letter: an End-to-End ConvNet-based Speech Recognition System\nReduced Space and Faster Convergence in Imperfect-Information Games via  Regret-Based Pruning\nSequencing Chess\nExploration Potential\nPrioritised Default Logic as Argumentation with Partial Order Default  Priorities\nSolving the Wastewater Treatment Plant Problem with SMT\nLabel-Free Supervision of Neural Networks with Physics and Domain  Knowledge\nA globally-applicable disease ontology for biosurveillance; Anthology of  Biosurveillance Diseases (ABD)\nA Logic of Knowing Why\nThe Digital Synaptic Neural Substrate: Size and Quality Matters\nSemiring Programming: A Framework for Search, Inference and Learning\nNdFluents: A 
Multi-dimensional Contexts Ontology\nSocial Network Processes in the Isabelle and Coq Theorem Proving  Communities\nLearning to Translate for Multilingual Question Answering\nAP16-OL7: A Multilingual Database for Oriental Languages and A Language  Recognition Baseline\nGlobal Constraint Catalog, Volume II, Time-Series Constraints\nHeuristic with elements of tabu search for Truck and Trailer Routing  Problem\nSemantic Parsing with Semi-Supervised Sequential Autoencoders\nDeep unsupervised learning through spatial contrasting\nA Tour of TensorFlow\nLifted Message Passing for the Generalized Belief Propagation\nPlaces: An Image Database for Deep Scene Understanding\nDeep Reinforcement Learning From Raw Pixels in Doom\nLearning Macro-actions for State-Space Planning\nOn Deductive Systems of AC Semantics for Rough Sets\nMulti-Objective Deep Reinforcement Learning\nABA+: Assumption-Based Argumentation with Preferences\nPCG-Based Game Design Patterns\nA Fuzzy Logic System to Analyze a Student's Lifestyle\nBank Card Usage Prediction Exploiting Geolocation Information\nImproved Knowledge Base Completion by Path-Augmented TransR Model\nWind ramp event prediction with parallelized Gradient Boosted Regression  Trees\nDiagnosis of aerospace structure defects by a HPC implemented soft  computing algorithm\nMaximizing positive opinion influence using an evidential approach\nGeneralized Interval-valued OWA Operators with Interval Weights Derived  from Interval-valued Overlap Functions\nA Multidimensional Cascade Neuro-Fuzzy System with Neuron Pool  Optimization in Each Cascade\nAdaptive Forecasting of Non-Stationary Nonlinear Time Series Based on  the Evolving Weighted Neuro-Neo-Fuzzy-ANARX-Model\nAn Evolving Neuro-Fuzzy System with Online Learning/Self-learning\nAn Ensemble of Adaptive Neuro-Fuzzy Kohonen Networks for Online Data  Stream Fuzzy Clustering\nJointly Learning to Align and Convert Graphemes to Phonemes with Neural  Attention Models\nReinforcement Learning in 
Conflicting Environments for Autonomous  Vehicles\nCharacterization of an inconsistency ranking for pairwise comparison  matrices\nFast Bayesian Non-Negative Matrix Factorisation and Tri-Factorisation\nAnomaly Detection with the Voronoi Diagram Evolutionary Algorithm\nHit-and-Run for Sampling and Planning in Non-Convex Spaces\nFuzzy Bayesian Learning\nRobust Spectral Inference for Joint Stochastic Matrix Factorization\nStrong Neutrosophic Graphs and Subgraph Topological Subspaces\nThe new hybrid COAW method for solving multi-objective problems\nTorchCraft: a Library for Machine Learning Research on Real-Time  Strategy Games\nWays of Conditioning Generative Adversarial Networks\nA Compare-Aggregate Model for Matching Text Sequences\nReinforcement Learning Approach for Parallelization in Filters  Aggregation Based Feature Selection Algorithms\nOn interestingness measures of formal concepts\nSong From PI: A Musically Plausible Network for Pop Music Generation\nSummaRuNNer: A Recurrent Neural Network based Sequence Model for  Extractive Summarization of Documents\nRecoverability of Joint Distribution from Missing Data\nA Way out of the Odyssey: Analyzing and Combining Recent Insights for  LSTMs\nNeural Style Representations and the Large-Scale Classification of  Artistic Style\nMonte Carlo Connection Prover\nLearning Interpretability for Visualizations using Adapted Cox Models  through a User Experiment\nGeneralized Dropout\nOptions Discovery with Budgeted Reinforcement Learning\nLimbo: A Fast and Flexible Library for Bayesian Optimization\nDeep Learning Approximation for Stochastic Control Problems\nDeep Reinforcement Learning for Multi-Domain Dialogue Systems\nMultiwinner Approval Rules as Apportionment Methods\n\"Model and Run\" Constraint Networks with a MILP Engine\nAnalyzing Features for the Detection of Happy Endings in German Novels\nTowards a new quantum cognition model\nGeneric and Efficient Solution Solves the Shortest Paths Problem in  Square 
Runtime\nSemantic Parsing of Mathematics by Context-based Learning from Aligned  Corpora and Theorem Proving\nC-RNN-GAN: Continuous recurrent neural networks with adversarial  training\nLow-dimensional Data Embedding via Robust Ranking\nUnit Commitment using Nearest Neighbor as a Short-Term Proxy\nOptimizing Quantiles in Preference-based Markov Decision Processes\nProbabilistic Neural Programs\nComparison of the COG Defuzzification Technique and Its Variations to  the GPA Index\nRepresenting Independence Models with Elementary Triplets\nImproving the Performance of Neural Networks in Regression Tasks Using  Drawering\nCoactive Critiquing: Elicitation of Preferences and Features\nKnowledge Representation in Graphs using Convolutional Neural Networks\nDecision Theory in an Algebraic Setting\nControlling Robot Morphology from Incomplete Measurements\nHierarchy through Composition with Linearly Solvable Markov Decision  Processes\nGOTM: a Goal-oriented Framework for Capturing Uncertainty of Medical  Treatments\nDeepCancer: Detecting Cancer through Gene Expressions via Deep  Generative Learning\nLearning to Drive using Inverse Reinforcement Learning and Deep  Q-Networks\nEncapsulating models and approximate inference programs in probabilistic  modules\nCrowdsourced Outcome Determination in Prediction Markets\nTeKnowbase: Towards Construction of a Knowledge-base of Technical  Concepts\nA correlation coefficient of belief functions\nSample-efficient Deep Reinforcement Learning for Dialog Control\nNon-Deterministic Policy Improvement Stabilizes Approximated  Reinforcement Learning\nAutomated timetabling for small colleges and high schools using huge  integer programs\nMeta-Unsupervised-Learning: A supervised approach to unsupervised  learning\nA hybrid approach to supervised machine learning for algorithmic melody  composition\nPrASP Report\nCutting-off Redundant Repeating Generations for Neural Abstractive  Summarization\nFinding Risk-Averse Shortest Path with 
Time-dependent Stochastic Costs\nFrom Preference-Based to Multiobjective Sequential Decision-Making\nA pre-semantics for counterfactual conditionals and similar logics\nA K-fold Method for Baseline Estimation in Policy Gradient Algorithms\nOpenNMT: Open-Source Toolkit for Neural Machine Translation\nHedera: Scalable Indexing and Exploring Entities in Wikipedia Revision  History\nVulnerability of Deep Reinforcement Learning to Policy Induction Attacks\nHeterogeneous Information Network Embedding for Meta Path based  Proximity\nBinary Matrix Guessing Problem\nENIGMA: Efficient Learning-based Inference Guiding Machine\nLAREX - A semi-automatic open-source Tool for Layout Analysis and Region  Extraction on Early Printed Books\nLogic Programming Petri Nets\nEfficiently Summarising Event Sequences with Rich Interleaving Patterns\nOrganic Computing in the Spotlight\nComparative Study Of Data Mining Query Languages\nIncremental Maintenance Of Association Rules Under Support Threshold  Change\nA Study of FOSS'2013 Survey Data Using Clustering Techniques\nRedefinition of the concept of fuzzy set based on vague partition from  the perspective of axiomatization\nA Hybrid Evolutionary Algorithm Based on Solution Merging for the  Longest Arc-Preserving Common Subsequence Problem\nTwo forms of minimality in ASPIC+\nTowards Better Analysis of Machine Learning Models: A Visual Analytics  Perspective\nSurvey of modern Fault Diagnosis methods in networks\nConvolutional Neural Network for Earthquake Detection and Location\nA Historical Review of Forty Years of Research on CMAC\nGraph Neural Networks and Boolean Satisfiability\nDeveloping an ontology for the access to the contents of an archival  fonds: the case of the Catasto Gregoriano\nOntoMath Digital Ecosystem: Ontologies, Mathematical Knowledge Analytics  and Management\nBe Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers  from Vision\nTheorem Proving Based on Semantics of DNA Strand Graph\nQuantifying Program 
Bias\n'Viral' Turing Machines, Computation from Noise and Combinatorial  Hierarchies\nThe Dialog State Tracking Challenge with Bayesian Approach\nAn Integer Programming Model for Binary Knapsack Problem with  Value-Related Dependencies among Elements\nRealization of Ontology Web Search Engine\nA DIKW Paradigm to Cognitive Engineering\nOntologies in System Engineering: a Field Report\nBoosted Generative Models\nTowards A Rigorous Science of Interpretable Machine Learning\nBayesian Verification under Model Uncertainty\nStacked Thompson Bandits\nImproving the Neural GPU Architecture for Algorithm Learning\nDo Reichenbachian Common Cause Systems of Arbitrary Finite Size Exist?\nThe Statistical Recurrent Unit\nEvaluating Singleplayer and Multiplayer in Human Computation Games\nHigh-Resolution Multispectral Dataset for Semantic Segmentation\nA Gentle Introduction to Epistemic Planning: The DEL Approach\nUnsupervised Learning of Sentence Embeddings using Compositional n-Gram  Features\nModeling the Ellsberg Paradox by Argument Strength\nAbductive, Causal, and Counterfactual Conditionals Under Incomplete  Probabilistic Knowledge\nThe Ontological Multidimensional Data Model\nAxioms in Model-based Planners\nGait Pattern Recognition Using Accelerometers\nCost-Based Intuitionist Probabilities on Spaces of Graphs, Hypergraphs  and Theorems\nReinforcement Learning for Transition-Based Mention Detection\nFuzzy Rankings: Properties and Applications\nOn Inconsistency Indices and Inconsistency Axioms in Pairwise  Comparisons\nInScript: Narrative texts annotated with script information\nParaGraphE: A Library for Parallel Knowledge Graph Embedding\nA Visual Web Tool to Perform What-If Analysis of Optimization Approaches\nFoundations for a Probabilistic Event Calculus\nImproving Statistical Multimedia Information Retrieval Model by using  Ontology\nOntology Based Pivoted normalization using Vector Based Approach for  information Retrieval\nDeep Exploration via Randomized Value 
Functions\nDiversification-Based Learning in Computing and Optimization\nImplications of the Fourth Industrial Age on Higher Education\nStructured Parallel Programming for Monte Carlo Tree Search\nMulti-Task Learning of Keyphrase Boundary Classification\nReprogramming Matter, Life, and Purpose\nA simulated annealing approach to optimal storing in a multi-level  warehouse\nFinite-Time Stabilization of Longitudinal Control for Autonomous  Vehicles via a Model-Free Approach\nBest Practices for Applying Deep Learning to Novel Applications\nGeometry of Policy Improvement\nAppLP: A Dialogue on Applications of Logic Programming\nSemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations  from Scientific Publications\nMatching Media Contents with User Profiles by means of the  Dempster-Shafer Theory\nMinkowski Operations of Sets with Application to Robot Localization\nScavenger 0.1: A Theorem Prover Based on Conflict Resolution\nBeliefs and Probability in Bacchus' l.p. Logic: A~3-Valued Logic  Solution to Apparent Counter-intuition\nCASP Solutions for Planning in Hybrid Domains\nDempster-Shafer Belief Function - A New Interpretation\nDeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks\nEnvironment-Independent Task Specifications via GLTL\nGeneric LSH Families for the Angular Distance Based on  Johnson-Lindenstrauss Projections and Feature Hashing LSH\nPseudorehearsal in actor-critic agents\nInvestigating Recurrence and Eligibility Traces in Deep Q-Networks\nBeating Atari with Natural Language Guided Reinforcement Learning\nA multi-method simulation of a high-frequency bus line using AnyLogic\nStochastic Constraint Programming as Reinforcement Learning\nLearning from Ontology Streams with Semantic Concept Drift\nAbstract Syntax Networks for Code Generation and Semantic Parsing\n280 Birds with One Stone: Inducing Multilingual Taxonomies from  Wikipedia using Character-level Classification\nFine-Grained Entity Typing with High-Multiplicity 
Assignments\nA Recurrent Neural Model with Attention for the Recognition of Chinese  Implicit Discourse Relations\nPunny Captions: Witty Wordplay in Image Descriptions\nMultimodal Word Distributions\nNo, This is not a Circle\nModeling Events as Machines\nKiwi - A Minimalist CP Solver\nQuantum Mechanical Approach to Modelling Reliability of Sensor Reports\nImagining Probabilistic Belief Change as Imaging (Technical Report)\nA Versatile, Sound Tool for Simplifying Definitions\nDistributed Online Learning of Event Definitions\nSLDR-DL: A Framework for SLD-Resolution with Deep Learning\nFinding Bottlenecks: Predicting Student Attrition with Unsupervised  Classifier\nBlock-Parallel IDA* for GPUs (Extended Manuscript)\nCORe50: a New Dataset and Benchmark for Continuous Object Recognition\nDeep Episodic Value Iteration for Model-based Meta-Reinforcement  Learning\nOn the Complexity of Semantic Integration of OWL Ontologies\nTuning Modular Networks with Weighted Losses for Hand-Eye Coordination\nLearning Probabilistic Programs Using Backpropagation\nRepeated Inverse Reinforcement Learning\nAtari games and Intel processors\nRankPL: A Qualitative Probabilistic Programming Language\nCombining tabu search and graph reduction to solve the maximum balanced  biclique problem\nEnsemble Sampling\nNote on Evolution and Forecasting of Requirements: Communications  Example\npix2code: Generating Code from a Graphical User Interface Screenshot\nContinual Learning in Generative Adversarial Nets\nBeyond Parity: Fairness Objectives for Collaborative Filtering\nLearning Causal Structures Using Regression Invariance\nAnomaly Detection in a Digital Video Broadcasting System Using Timed  Automata\nProbabilistic Program Abstractions\nAbstract Argumentation / Persuasion / Dynamics\nDeep Learning for Ontology Reasoning\nFine-grained acceleration control for autonomous intersection management  using deep reinforcement learning\nGenerating Steganographic Text with LSTMs\nKnowledge Base Completion: 
Baselines Strike Back\nUnsupervised Learning of Disentangled Representations from Video\nICABiDAS: Intuition Centred Architecture for Big Data Analysis and  Synthesis\nA Joint Model for Question Answering and Question Generation\nMarmara Turkish Coreference Corpus and Coreference Resolution Baseline\nDesign and Implementation of Modified Fuzzy based CPU Scheduling  Algorithm\nTowards balanced clustering - part 1 (preliminaries)\nOff The Beaten Lane: AI Challenges In MOBAs Beyond Player Control\nTowards Statistical Reasoning in Description Logics over Finite Domains  (Full Version)\nTowards Grounding Conceptual Spaces in Neural Representations\nAn Overview of Multi-Task Learning in Deep Neural Networks\nBib2vec: An Embedding-based Search System for Bibliographic Information\nCollaborative vehicle routing: a survey\nBayesian Conditional Generative Adverserial Networks\nThe impact of Entropy and Solution Density on selected SAT heuristics\nEntropy, neutro-entropy and anti-entropy for neutrosophic information\nData set operations to hide decision tree rules\nLearning Hierarchical Information Flow with Recurrent Neural Modules\nDex: Incremental Learning for Complex Environments in Deep Reinforcement  Learning\nVAIN: Attentional Multi-agent Predictive Modeling\nProgrammable Agents\nMAGIX: Model Agnostic Globally Interpretable Explanations\nAn approach to reachability analysis for feed-forward ReLU neural  networks\nRational coordination with no communication or conventions\nTemporal-related Convolutional-Restricted-Boltzmann-Machine capable of  learning relational order via reinforcement learning procedure?\nHandling PDDL3.0 State Trajectory Constraints with Temporal Landmarks\nOptimal choice: new machine learning problem and its solution\nA Simulator for Hedonic Games\nWell-supported phylogenies using largest subsets of core-genes by  discrete particle swarm optimization\nSUNNY-CP and the MiniZinc Challenge\nLogic Programming for an Introductory Computer Science Course 
for High  School Students\nLearning Knowledge Graph Embeddings with Type Regularizer\nNew Fairness Metrics for Recommendation that Embrace Differences\nRestricted Causal Inference Algorithm\nProbabilistic Active Learning of Functions in Structural Causal Models\nSynthesizing Deep Neural Network Architectures using Biological Synaptic  Strength Distributions\nModifying Optimal SAT-based Approach to Multi-agent Path-finding Problem  to Suboptimal Variants\nIsing Processing Units: Potential and Challenges for Discrete  Optimization\nKernel Feature Selection via Conditional Covariance Minimization\nA Deep Network with Visual Text Composition Behavior\nConvergence Analysis of Optimization Algorithms\nAn Online Development Environment for Answer Set Programming\nEvaluating Social Networks Using Task-Focused Network Inference\nTowards an automated method based on Iterated Local Search optimization  for tuning the parameters of Support Vector Machines\nSimilarity Search Over Graphs Using Localized Spectral Analysis\nConflict Analysis for Pythagorean Fuzzy Information Systems with Group  Decision Making\nMechanics Automatically Recognized via Interactive Observation: Jumping\nA Formal Framework to Characterize Interpretability of Procedures\nRepresentation Learning for Grounded Spatial Reasoning\nFast Restricted Causal Inference\nA Comprehensive Implementation of Conceptual Spaces\nTensorLog: Deep Learning Meets Probabilistic DBs\nEigenlogic: Interpretable Quantum Observables with applications to Fuzzy  Behavior of Vehicular Robots\nSpeeding-up ProbLog's Parameter Learning\nNavigability with Imperfect Information\nVideo Highlight Prediction Using Audience Chat Reactions\nBinary Voting with Delegable Proxy: An Analysis of Liquid Democracy\nTogether We Know How to Achieve: An Epistemic Logic of Know-How  (Extended Abstract)\nAnalysis of Italian Word Embeddings\nProviding Self-Aware Systems with Reflexivity\nA Vision For Continuous Automated Game Design\nCost and Actual 
Causation\nEvaluating Music Recommender Systems for Groups\nAdvantages and Limitations of using Successor Features for Transfer in  Reinforcement Learning\nDeep Transfer in Reinforcement Learning by Language Grounding\nDeep Recurrent Generative Decoder for Abstractive Text Summarization\nReader-Aware Multi-Document Summarization: An Enhanced Model and The  First Dataset\nInception Score, Label Smoothing, Gradient Vanishing and -log(D(x))  Alternative\nCheryl's Birthday\nMeasuring Inconsistency in Argument Graphs\nHierarchically-Attentive RNN for Album Summarization and Storytelling\nTosca: Operationalizing Commitments Over Information Protocols\nAutomatic Selection of t-SNE Perplexity\nSystematic Testing of Convolutional Neural Networks for Autonomous  Driving\nGlobeNet: Convolutional Neural Networks for Typhoon Eye Tracking from  Remote Sensing Imagery\nDeep Incremental Boosting\nEnergy saving for building heating via a simple and efficient model-free  control design: First steps with computer simulations\nA Measure for Dialog Complexity and its Application in Streamlining  Service Operations\nLearning body-affordances to simplify action spaces\nTheoretical Foundation of Co-Training and Disagreement-Based Algorithms\nEnriching Information Technology Course Materials by Using Youtube\nTheoSea: Marching Theory to Light\nWarp: a method for neural network interpretability applied to gene  expression profiles\nBeyond Temporal Pooling: Recurrence and Temporal Convolutions for  Gesture Recognition in Video\nSelective Greedy Equivalence Search: Finding Optimal Bayesian Networks  Using a Polynomial Number of Score Evaluations\nRisk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach\nVisual Learning of Arithmetic Operations\nA Framework for Constrained and Adaptive Behavior-Based Agents\nNP-hardness of sortedness constraints\nDUAL-LOCO: Distributing Statistical Estimation Using Random Projections\nNew Limits for Knowledge Compilation and Applications to 
Exact Model  Counting\nThe Wreath Process: A totally generative model of geometric shape based  on nested symmetries\nOn-the-Job Learning with Bayesian Decision Theory\nTeaching Machines to Read and Comprehend\nAn efficient algorithm for contextual bandits with knapsacks, and an  extension to concave objectives\nOn the Prior Sensitivity of Thompson Sampling\nThe Online Coupon-Collector Problem and Its Application to Lifelong  Reinforcement Learning\nFast Online Clustering with Randomized Skeleton Sets\nBayesian Poisson Tensor Factorization for Inferring Multilateral  Relations from Sparse Dyadic Event Counts\nMondrian Forests for Large-Scale Regression when Uncertainty Matters\nListen, Attend, and Walk: Neural Mapping of Navigational Instructions to  Action Sequences\nAttacker and Defender Counting Approach for Abstract Argumentation\nQuery-Answer Causality in Databases: Abductive Diagnosis and  View-Updates\nRare Speed-up in Automatic Theorem Proving Reveals Tradeoff Between  Computational Time and Information Value\nReading Scene Text in Deep Convolutional Sequences\nLinguistic Harbingers of Betrayal: A Case Study on an Online Strategy  Game\nThe Scope and Limits of Simulation in Cognitive Models\nUsing Hankel Matrices for Dynamics-based Facial Emotion Recognition and  Pain Detection\nEarly Predictions of Movie Success: the Who, What, and When of  Profitability\nSmart Pacing for Effective Online Ad Campaign Optimization\nExpectation Particle Belief Propagation\nA Novel Method for Stock Forecasting based on Fuzzy Time Series Combined  with the Longest Common/Repeated Sub-sequence\nA Neural Network Approach to Context-Sensitive Generation of  Conversational Responses\nObjective Variables for Probabilistic Revenue Maximization in  Second-Price Auctions with Reserve\nHumor in Collective Discourse: Unsupervised Funniness Detection in the  New Yorker Cartoon Caption Contest\nDeep-Plant: Plant Identification with convolutional neural networks\nAutomated Benchmarking of 
Incremental SAT and QBF Solvers\nThe Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured  Multi-Turn Dialogue Systems\nLanguage Understanding for Text-based Games Using Deep Reinforcement  Learning\nNeuro-Fuzzy Algorithmic (NFA) Models and Tools for Estimation\nMixed Logical and Probabilistic Reasoning for Planning and Explanation  Generation in Robotics\nExtending SROIQ with Constraint Networks and Grounded Circumscription\nEvolutionary Multimodal Optimization: A Short Survey\nA Weakly Supervised Learning Approach based on Spectral Graph-Theoretic  Grouping\nEstimating Mutual Information by Local Gaussian Approximation\nQualitative Decision Methods for Multi-Attribute Decision Making\nStructured Prediction: From Gaussian Perturbations to Linear-Time  Principled Algorithms\nOn the Linear Belief Compression of POMDPs: A re-examination of current  methods\nThe QBF Gallery: Behind the Scenes\nOntology Bulding vs Data Harvesting and Cleaning for Smart-city Services\nMining for Causal Relationships: A Data-Driven Study of the Islamic  State\nReplication and Generalization of PRECISE\nFuzzy Logic Based Direct Torque Control Of Induction Motor With Space  Vector Modulation\nSecurity Games with Ambiguous Beliefs of Agents\nCrime Prediction Based On Crime Types And Using Spatial And Temporal  Criminal Hotspots\nA Linearly-Convergent Stochastic L-BFGS Algorithm\nSyntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models  of Meaning\nSimulating Brain Reaction to Methamphetamine Regarding Consumer  Personality\nAnswering Fuzzy Conjunctive Queries over Finitely Valued Fuzzy  Ontologies\nOOASP: Connecting Object-oriented and Logic Programming\nMultiple-Path Selection for new Highway Alignments using Discrete  Algorithms\nGeneration of Multimedia Artifacts: An Extractive Summarization-based  Approach\nTalking about the Moving Image: A Declarative Model for Image Schema  Based Embodied Perception Grounding and Language Generation\nSufficient and 
necessary conditions for Dynamic Programming in  Valuation-Based Systems\nCausal Decision Trees\nVariable Elimination in the Fourier Domain\nMolding CNNs for text: non-linear, non-consecutive convolutions\nReasoning in complex environments with the SelectScript declarative  language\nDistributed Deep Q-Learning\nEnd-to-End Attention-based Large Vocabulary Speech Recognition\nFishing out Winners from Vote Streams\nWarehouse Layout Method Based on Ant Colony and Backtracking Algorithm\nEfficient Computation of Exact IRV Margins\nThe backtracking survey propagation algorithm for solving random K-SAT  problems\nLifted Relational Neural Networks\nThe Max $K$-Armed Bandit: A PAC Lower Bound and tighter Algorithms\nERBlox: Combining Matching Dependencies with Machine Learning for Entity  Resolution\nUnsatisfiable Cores and Lower Bounding for Constraint Programming\nRobot Language Learning, Generation, and Comprehension\nMining Combined Causes in Large Data Sets\nA Comparison Between Decision Trees and Decision Tree Forest Models for  Software Development Effort Estimation\nLearning Structures of Bayesian Networks for Variable Groups\nGR2RSS: Publishing Linked Open Commerce Data as RSS and Atom Feeds\nWhat to talk about and how? 
Selective Generation using LSTMs with  Coarse-to-Fine Alignment\nGenerating Weather Forecast Texts with Case Based Reasoning\nBuilding a Truly Distributed Constraint Solver with JADE\nQuantization based Fast Inner Product Search\nGiraffe: Using Deep Reinforcement Learning to Play Chess\nResearch: Analysis of Transport Model that Approximates Decision Taker's  Preferences\nRisk-Averse Approximate Dynamic Programming with Quantile-Based Risk  Measures\nAn Approach to the Analysis of the South Slavic Medieval Labels Using  Image Texture\nBounded Situation Calculus Action Theories\nC3: Lightweight Incrementalized MCMC for Probabilistic Programs using  Continuations and Callsite Caching\nLearning Efficient Representations for Reinforcement Learning\nEvolving TSP heuristics using Multi Expression Programming\nCoarse-to-Fine Sequential Monte Carlo for Probabilistic Programs\nCompatible Value Gradients for Reinforcement Learning of Continuous Deep  Policies\nAgent enabled Mining of Distributed Protein Data Banks\nAn Epsilon Hierarchical Fuzzy Twin Support Vector Regression\nMulti-Attribute Proportional Representation\nSharing HOL4 and HOL Light proof knowledge\nPremise Selection and External Provers for HOL4\nLazy Factored Inference for Functional Probabilistic Programming\nSome Supplementaries to The Counting Semantics for Abstract  Argumentation\nBenchmarking for Bayesian Reinforcement Learning\nDouble Relief with progressive weighting function\nKernelized Deep Convolutional Neural Network for Describing Complex  Images\nLarge-Scale Optimization Algorithms for Sparse Conditional Gaussian  Graphical Models\nCausal Model Analysis using Collider v-structure with Negative  Percentage Mapping\nRecurrent Neural Networks for Driver Activity Anticipation via  Sensory-Fusion Architecture\nEfficient Task Collaboration with Execution Uncertainty\n(Blue) Taxi Destination and Trip Time Prediction from Partial  Trajectories\nClass Association Rules Mining based Rough Set 
Method\nLearning from Synthetic Data Using a Stacked Multichannel Autoencoder\nTransG : A Generative Mixture Model for Knowledge Graph Embedding\nProceedings Thirteenth International Workshop on the ACL2 Theorem Prover  and Its Applications\nEnergy saving in smart homes based on consumer behaviour: A case study\nBackdoors into Heterogeneous Classes of SAT and CSP\nExploiting Reduction Rules and Data Structures: Local Search for Minimum  Vertex Cover in Massive Graphs\nTelugu OCR Framework using Deep Learning\nImpact of noise on a dynamical system: prediction and uncertainties from  a swarm-optimized neural network\nPoker-CNN: A Pattern Learning Strategy for Making Draws and Bets in  Poker Games\nBoolean Hedonic Games\nCRDT: Correlation Ratio Based Decision Tree Model for Healthcare Data  Mining\nFeature Evaluation of Deep Convolutional Neural Networks for Object  Recognition and Detection\nDeep Multimodal Embedding: Manipulating Novel Objects with Point-clouds,  Language and Trajectories\nDiscovery and Visualization of Nonstationary Causal Models\nOptimal Release Time Decision from Fuzzy Mathematical Programming  Perspective\nApproximation and Heuristic Algorithms for Probabilistic Physical Search  on General Graphs\nBoolean Matrix Factorization and Noisy Completion via Message Passing\nLearning dynamic Boltzmann machines with spike-timing dependent  plasticity\nTowards Unveiling the Ontology Key Features Altering Reasoner  Performances\nTwo Phase $Q-$learning for Bidding-based Vehicle Sharing\nInferring Interpersonal Relations in Narrative Summaries\nTaxonomy grounded aggregation of classifiers with different label sets\nA New Approach for Scalable Analysis of Microbial Communities\nFast k-Nearest Neighbour Search via Dynamic Continuous Indexing\nAttribute2Image: Conditional Image Generation from Visual Attributes\nObject-based World Modeling in Semi-Static Environments with Dependent  Dirichlet-Process Mixtures\nBayesian Matrix Completion via Adaptive Relaxed 
Spectral Regularization\nQuantifying knowledge with a new calculus for belief functions - a  generalization of probability theory\nLocally Adaptive Translation for Knowledge Graph Embedding\nWhat Makes it Difficult to Understand a Scientific Literature?\nReuse of Neural Modules for General Video Game Playing\nRisk-Constrained Reinforcement Learning with Percentile Risk Criteria\nProbabilistic Structural Controllability in Causal Bayesian Networks\nKnowledge Sharing in Coalitions\nA Novel Approach to Distributed Multi-Class SVM\nHow to Discount Deep Reinforcement Learning: Towards New Dynamic  Strategies\nFrom rules to runs: A dynamic epistemic take on imperfect information  games\nSensitivity analysis, multilinearity and beyond\nLearning Discrete Bayesian Networks from Continuous Data\nShapeNet: An Information-Rich 3D Model Repository\nLearning measures of semi-additive behaviour\nMobile Robots Adaptive Control Using Neural Networks\nUsing Linear Constraints for Logic Program Termination Analysis\nPolicy Gradient Methods for Off-policy Control\nEvaluation of Pose Tracking Accuracy in the First and Second Generations  of Microsoft Kinect\nOrigami: A 803 GOp/s/W Convolutional Network Accelerator\nAn Event Calculus Production Rule System for Reasoning in Dynamic and  Uncertain Domains\nData-driven Sequential Monte Carlo in Probabilistic Programming\nFrom One Point to A Manifold: Knowledge Graph Embedding For Precise Link  Prediction\nBayesDB: A probabilistic programming system for querying the probable  implications of data\nFeature Representation for ICU Mortality\nA thermodynamical approach towards multi-criteria decision making (MCDM)\nA Survey of Available Corpora for Building Data-Driven Dialogue Systems\nQuadripolar Relational Model: a framework for the description of  borderline and narcissistic personality disorders\nA Planning based Framework for Essay Generation\nCan Pretrained Neural Networks Detect Anatomy?\nOntology-driven Information 
Extraction\nTest-Driven Development of ontologies (extended version)\nTowards Integrated Glance To Restructuring in Combinatorial Optimization\nRemote Health Coaching System and Human Motion Data Analysis for  Physical Therapy with Microsoft Kinect\nRestricted Predicates for Hypothetical Datalog\nOn the Differential Privacy of Bayesian Inference\nBeauty and Brains: Detecting Anomalous Pattern Co-Occurrences\nKeeping it Short and Simple: Summarising Complex Event Sequences with  Multivariate Patterns\nSR-Clustering: Semantic Regularized Clustering for Egocentric Photo  Streams Segmentation\nA Deep Generative Deconvolutional Image Model\nSelecting the top-quality item through crowd scoring\nRandomized Social Choice Functions Under Metric Preferences\nRepresentation and Coding of Signal Geometry\nThe Max $K$-Armed Bandit: PAC Lower Bounds and Efficient Algorithms\nMeasuring pattern retention in anonymized data -- where one measure is  not enough\nRDF2Rules: Learning Rules from RDF Knowledge Bases by Mining Frequent  Predicate Cycles\nProbabilistic Model-Based Approach for Heart Beat Detection\nMulti-Level Cause-Effect Systems\nToward a Research Agenda in Adversarial Reasoning: Computational  Approaches to Anticipating the Opponent's Intent and Actions\nDevice and System Level Design Considerations for  Analog-Non-Volatile-Memory Based Neuromorphic Architectures\nUsing Data Analytics to Detect Anomalous States in Vehicles\nRegularized Orthogonal Tensor Decompositions for Multi-Relational  Learning\nGELATO and SAGE: An Integrated Framework for MS Annotation\nConditional probability generation methods for high reliability  effects-based decision making\nTaming the Noise in Reinforcement Learning via Soft Updates\nLearning Natural Language Inference with LSTM\nClosing the Gap Between Short and Long XORs for Model Counting\nNonparametric Bayesian Factor Analysis for Dynamic Count Matrices\nBayes-Optimal Effort Allocation in Crowdsourcing: Bounds and Index  Policies\nAn 
(MI)LP-based Primal Heuristic for 3-Architecture Connected Facility  Location in Urban Access Network Design\nSelecting Near-Optimal Learners via Incremental Data Allocation\nComputational Pathology: Challenges and Promises for Tissue Analysis\nWavelet Scattering on the Pitch Spiral\nA Unified Approach for Learning the Parameters of Sum-Product Networks\nBenders Decomposition for the Design of a Hub and Shuttle Public Transit  System\nMutual Information and Diverse Decoding Improve Neural Machine  Translation\nScalable Models for Computing Hierarchies in Information Networks\nWeakly-supervised Disentangling with Recurrent Transformations for 3D  View Synthesis\nOpen challenges in understanding development and evolution of speech  forms: The roles of embodied self-organization, motivation and active  exploration\nJoint learning of ontology and semantic parser from text\nWikiometrics: A Wikipedia Based Ranking System\nTowards Semantic Integration of Heterogeneous Sensor Data with  Indigenous Knowledge for Drought Forecasting\nIdentifying Stable Patterns over Time for Emotion Recognition from EEG\nA Synthetic Approach for Recommendation: Combining Ratings, Social  Relations, and Reviews\nGit4Voc: Git-based Versioning for Collaborative Vocabulary Development\nBasic Reasoning with Tensor Product Representations\nThe minimal hitting set generation problem: algorithms and computation\nSubmodular Optimization under Noise\nComplexity of ITL model checking: some well-behaved fragments of the  interval logic HS\nTrust from the past: Bayesian Personalized Ranking based Link Prediction  in Knowledge Graphs\nA Method for Image Reduction Based on a Generalization of Ordered  Weighted Averaging Functions\nAutomatic Description Generation from Images: A Survey of Models,  Datasets, and Evaluation Measures\nIt's about time: Online Macrotask Sequencing in Expert Crowdsourcing\n$\\mathbf{D^3}$: Deep Dual-Domain Based Fast Restoration of  JPEG-Compressed Images\nStudying Very Low 
Resolution Recognition Using Deep Networks\nWord Existence Algorithm\nSemantics for probabilistic programming: higher-order functions,  continuous distributions, and soft constraints\nSemantic Word Clusters Using Signed Normalized Graph Cuts\nSub-Optimal Multi-Phase Path Planning: A Method for Solving Rubik's  Revenge\nOnline Event Recognition from Moving Vessel Trajectories\nCoalition-based Planning of Military Operations: Adversarial Reasoning  Algorithms in an Integrated Decision Aid\nBitwise Neural Networks\nTowards Resolving Unidentifiability in Inverse Reinforcement Learning\nExpected Similarity Estimation for Large-Scale Batch and Streaming  Anomaly Detection\nGeneralizing Prototype Theory: A Formal Quantum Framework\nFisher Motion Descriptor for Multiview Gait Recognition\nFont Identification in Historical Documents Using Active Learning\nQuantum machine learning with glow for episodic tasks and decision games\nLearning and Tuning Meta-heuristics in Plan Space Planning\nEfficient Hill-Climber for Multi-Objective Pseudo-Boolean Optimization\nNumerical Atrribute Extraction from Clinical Texts\nTowards a Cognitive Routing Engine for Software Defined Networks\nMarvin: Semantic annotation using multiple knowledge sources\nAre Elephants Bigger than Butterflies? 
Reasoning about Sizes of Objects\nFinding the different patterns in buildings data using bag of words  representation with clustering\nA Factorized Recurrent Neural Network based architecture for medium to  large vocabulary Language Modelling\nUps and Downs: Modeling the Visual Evolution of Fashion Trends with  One-Class Collaborative Filtering\nA Generalised Quantifier Theory of Natural Language in Categorical  Compositional Distributional Semantics with Bialgebras\nFormal Verification of Autonomous Vehicle Platooning\nHarmonic Grammar in a DisCo Model of Meaning\nProbabilistic Extension to the Concurrent Constraint Factor Oracle Model  for Music Improvisation\nERBlox: Combining Matching Dependencies with Machine Learning for Entity  Resolution\nFind an Optimal Path in Static System and Dynamical System within  Polynomial Runtime\nGraying the black box: Understanding DQNs\nLearning to Communicate to Solve Riddles with Deep Distributed Recurrent  Q-Networks\nDecoy Bandits Dueling on a Poset\nStrategic disclosure of opinions on a social network\nThe IMP game: Learnability, approximability and adversarial learning  beyond $Σ^0_1$\nValue Iteration Networks\nTime Resource Networks\nFeature Based Task Recommendation in Crowdsourcing with Implicit  Observations\nIterative Hierarchical Optimization for Misspecified Problems (IHOMP)\nAdaptive Skills, Adaptive Partitions (ASAP)\nEnabling Basic Normative HRI in a Cognitive Robotic Architecture\nDetection of Cooperative Interactions in Logistic Regression Models\nIdentifying Diabetic Patients with High Risk of Readmission\nA Minimalistic Approach to Sum-Product Network Learning for Real  Applications\nLook, Listen and Learn - A Multimodal LSTM for Speaker Identification\nBPCMont: Business Process Change Management Ontology\nRandom Forest Based Approach for Concept Drift Handling\nSurprising properties of dropout in deep networks\nExtending Consequence-Based Reasoning to SRIQ\nTowards reducing the multidimensionality of OLAP 
cubes using the  Evolutionary Algorithms and Factor Analysis Methods\nPOMDP-lite for Robust Robot Planning under Uncertainty\nUnsupervised Domain Adaptation Using Approximate Label Matching\nA diffusion and clustering-based approach for finding coherent motions  and understanding crowd scenes\nQ($λ$) with Off-Policy Corrections\nContextual Media Retrieval Using Natural Language Queries\nA Subsequence Interleaving Model for Sequential Pattern Mining\nSymmetry Breaking Predicates for SAT-based DFA Identification\nBioSpaun: A large-scale behaving brain model with complex neurons\n11 x 11 Domineering is Solved: The first player wins\nAuxiliary Deep Generative Models\nQuery Answering with Inconsistent Existential Rules under Stable Model  Semantics\nOrdonnancement d'entités pour la rencontre du web des documents et du  web des données\nText Matching as Image Recognition\nInteractive Storytelling over Document Collections\nRecurrent Orthogonal Networks and Long-Memory Tasks\nEnablers and Inhibitors in Causal Justifications of Logic Programs\nEmpath: Understanding Topic Signals in Large-Scale Text\nLatent Skill Embedding for Personalized Lesson Sequence Recommendation\nFinding Needle in a Million Metrics: Anomaly Detection in a Large-scale  Computational Advertising Platform\nSqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB  model size\nParametric Prediction from Parametric Agents\nA Quantum Computational Semantics for Epistemic Logical Operators. 
Part  I: Epistemic Structures\nMultilingual Twitter Sentiment Classification: The Role of Human  Annotators\nTime and Activity Sequence Prediction of Business Process Instances\nA Survey on Domain-Specific Languages for Machine Learning in Big Data\nLearning values across many orders of magnitude\nToward Game Level Generation from Gameplay Videos\nReinforcement Learning of POMDPs using Spectral Methods\nTop-N Recommendation with Novel Rank Approximation\nModeling cumulative biological phenomena with Suppes-Bayes Causal  Networks\nProbably Approximately Correct Greedy Maximization\nWeight Normalization: A Simple Reparameterization to Accelerate Training  of Deep Neural Networks\nCausal Discovery from Subsampled Time Series Data by Constraint  Optimization\nHow effective can simple ordinal peer grading be?\nScalable Bayesian Rule Lists\nLie Access Neural Turing Machine\nInvestigating practical linear temporal difference learning\nEasy Monotonic Policy Iteration\nGuided Cost Learning: Deep Inverse Optimal Control via Policy  Optimization\nProbabilistic Relational Model Benchmark Generation\nContinuous Deep Q-Learning with Model-based Acceleration\nFilter based Taxonomy Modification for Improving Hierarchical  Classification\nAutomatic Differentiation Variational Inference\nHybrid Collaborative Filtering with Autoencoders\nAutomatic learning of gait signatures for people identification\nDeep Reinforcement Learning from Self-Play in Imperfect-Information  Games\nLearning Physical Intuition of Block Towers by Example\nCausal inference for cloud computing\nSentiment Analysis in Scholarly Book Reviews\nA Linked Data Scalability Challenge: Concept Reuse Leads to Semantic  Decay\nAn Argument-based Creative Assistant for Harmonic Blending\nHierarchical Decision Making In Electricity Grid Management\nAdaptive Visualisation System for Construction Building Information  Models Using Saliency\nPairwise Choice Markov Chains\nImplicit Discourse Relation Classification via 
Multi-Task Neural  Networks\nA Markovian-based Approach for Daily Living Activities Recognition\nHierarchical Linearly-Solvable Markov Decision Problems\nHigh-dimensional Black-box Optimization via Divide and Approximate  Conquer\nOn the physical realizability of quantum stochastic walks\nDemonstrating the Feasibility of Automatic Game Balancing\nSolving MaxSAT by Successive Calls to a SAT Solver\nImage Captioning with Semantic Attention\nOn Learning High Dimensional Structured Single Index Models\nA Signaling Game Approach to Databases Querying and Interaction\nExploratory Gradient Boosting for Reinforcement Learning in Complex  Domains\nItem2Vec: Neural Item Embedding for Collaborative Filtering\nLearning Network of Multivariate Hawkes Processes: A Time Series  Approach\nControlling Search in Very large Commonsense Knowledge Bases: A Machine  Learning Approach\nLearning Domain-Invariant Subspace using Domain Features and  Independence Maximization\nOptimal Sensing via Multi-armed Bandit Relaxations in Mixed  Observability Domains\nSuppressing the Unusual: towards Robust CNNs using Symmetric Activation  Functions\nHardware Acceleration for Boolean Satisfiability Solver by Applying  Belief Propagation Algorithm\nMapping Temporal Variables into the NeuCube for Improved Pattern  Recognition, Predictive Modelling and Understanding of Stream Data\nBank distress in the news: Describing events through deep learning\nSentence Pair Scoring: Towards Unified Framework for Text Comprehension\nNeurally-Guided Procedural Models: Amortized Inference for Procedural  Graphics Programs using Neural Networks\nEvaluation of a Tree-based Pipeline Optimization Tool for Automating  Data Science\nAn Approximation Approach for Solving the Subpath Planning Problem\nMulti-fidelity Gaussian Process Bandit Optimisation\nHarnessing Deep Neural Networks with Logic Rules\nIncorporating Copying Mechanism in Sequence-to-Sequence Learning\nCharacterization of neighborhood behaviours in a 
multi-neighborhood  local search algorithm\nAction-Affect Classification and Morphing using Multi-Task  Representation Learning\nGenerating Factoid Questions With Recurrent Neural Networks: The 30M  Factoid Question-Answer Corpus\nCosolver2B: An Efficient Local Search Heuristic for the Travelling Thief  Problem\nDebugging Machine Learning Tasks\nA Diagram Is Worth A Dozen Images\nLoad Disaggregation Based on Aided Linear Integer Programming\nPixel-Level Domain Transfer\nConditional Similarity Networks\nHow NOT To Evaluate Your Dialogue System: An Empirical Study of  Unsupervised Evaluation Metrics for Dialogue Response Generation\nDo You See What I Mean? Visual Resolution of Linguistic Ambiguities\nGenerating Visual Explanations\nShuffle and Learn: Unsupervised Learning using Temporal Order  Verification\nCOCO: A Platform for Comparing Continuous Optimizers in a Black-Box  Setting\nTowards Practical Bayesian Parameter and State Estimation\nRobustness of Bayesian Pool-based Active Learning Against Prior  Misspecification\nPhoenix: A Self-Optimizing Chess Engine\nIterated Ontology Revision by Reinterpretation\nEnhancing Sentence Relation Modeling with Auxiliary Character-level  Embedding\nA New Approach for Revising Logic Programs\nVerifiability of Argumentation Semantics\nDistributing Knowledge into Simple Bases\nCharacterizing Realizability in Abstract Argumentation\nNeural Language Correction with Character-Based Attention\nA Survey of League Championship Algorithm: Prospects and Challenges\nHigher Order Recurrent Neural Networks\nAn Improved System for Sentence-level Novelty Detection in Textual  Streams\nEnforcing Template Representability and Temporal Consistency for  Adaptive Sparse Tracking\nCommon-Description Learning: A Framework for Learning Algorithms and  Generating Subproblems from Few Examples\nCoalition Formability Semantics with Conflict-Eliminable Sets of  Arguments\nFast Simulation of Probabilistic Boolean Networks (Technical 
Report)\nOntology-Mediated Queries: Combined Complexity and Succinctness of  Rewritings via Circuit Complexity\nLearning from the memory of Atari 2600\nBrain Emotional Learning-Based Prediction Model (For Long-Term Chaotic  Prediction Applications)\nThe KB paradigm and its application to interactive configuration\nEnergy Disaggregation for Real-Time Building Flexibility Detection\nRobust Dialog State Tracking for Large Ontologies\nBelief Merging by Source Reliability Assessment\nAudio Event Detection using Weakly Labeled Data\nMachine Learning Techniques with Ontology for Subjective Answer  Evaluation\nA Hierarchical Emotion Regulated Sensorimotor Model: Case Studies\nDeep Neural Networks Under Stress\nCharacterizing Quantifier Fuzzification Mechanisms: a behavioral guide  for practical applications\nOptimizing human-interpretable dialog management policy using Genetic  Algorithm\nAnytime Inference in Valuation Algebras\nOBDA Constraints for Effective Query Answering (Extended Version)\nA Critical Examination of RESCAL for Completion of Knowledge Bases with  Transitive Relations\nHigh-Performance Computing for Scheduling Decision Support: A Parallel  Depth-First Search Heuristic\nOff-policy evaluation for slate recommendation\nDigital Stylometry: Linking Profiles Across Social Networks\nLearning Convolutional Neural Networks for Graphs\nFuzzy Sets Across the Natural Language Generation Pipeline\nDynamic Frame skip Deep Q Network\nHeuristics for Planning, Plan Recognition and Parsing\nAMSOM: Adaptive Moving Self-organizing Map for Clustering and  Visualization\nA Hierarchical Latent Variable Encoder-Decoder Model for Generating  Dialogues\nVariational hybridization and transformation for large inaccurate  noisy-or networks\nResidual Networks Behave Like Ensembles of Relatively Shallow Networks\nQuery-Efficient Imitation Learning for End-to-End Autonomous Driving\nTensorLog: A Differentiable Deductive Database\nProgramming with a Differentiable Forth 
Interpreter\nLearning to Communicate with Deep Multi-Agent Reinforcement Learning\nStochastic Patching Process\nGenerative Choreography using Deep Learning\nDP-EM: Differentially Private Expectation Maximization\nSpontaneous vs. Posed smiles - can we tell the difference?\nFast Bayesian Optimization of Machine Learning Hyperparameters on Large  Datasets\nGenetic Architect: Discovering Genomic Structure with Learned Neural  Architectures\nAdaptive ADMM with Spectral Penalty Parameter Selection\nDiagnosing editorial strategies of Chilean media on Twitter using an  automatic news classifier\nNear-optimal Bayesian Active Learning with Correlated and Noisy Tests\nNon-Gaussian Random Generators in Bacteria Foraging Algorithm for  Multiobjective Optimization\nAlternating Optimisation and Quadrature for Robust Control\nTowards Bin Packing (preliminary problem survey, models with multiset  estimates)\nLearning Multiagent Communication with Backpropagation\nAutomatic Open Knowledge Acquisition via Long Short-Term Memory Networks  with Feedback Negative Sampling\nAdaptive Neural Compilation\nRuling Out Static Latent Homophily in Citation Networks\nThe Symbolic Interior Point Method\nProbabilistic Inference Modulo Theories\nKronecker Determinantal Point Processes\nEstimation of Passenger Route Choice Pattern Using Smart Card Data for  Complex Metro Systems\nModel-Free Imitation Learning with Policy Optimization\nDensity estimation using Real NVP\nControl of Memory, Active Perception, and Action in Minecraft\nRandomization and The Pernicious Effects of Limited Budgets on Auction  Experiments\nUnsupervised Discovery of El Nino Using Causal Feature Learning on  Microlevel Climate Data\nParallel Markov Chain Monte Carlo via Spectral Clustering\nInterdependent Scheduling Games\nDetermining the Characteristic Vocabulary for a Specialized Dictionary  using Word2vec and a Directed Crawler\nVIME: Variational Information Maximizing Exploration\nTowards ontology driven learning of visual 
concept detectors\nTechnical Report: Directed Controller Synthesis of Discrete Event  Systems\nUncertain programming model for multi-item solid transportation problem\nQuantifying the probable approximation error of probabilistic inference  programs\nHardness of the Pricing Problem for Chains in Barter Exchanges\nA Survey of Qualitative Spatial and Temporal Calculi -- Algebraic and  Computational Properties\nMining Software Components from Object-Oriented APIs\nPost-Inference Prior Swapping\nQuestion Answering over Knowledge Base with Neural Attention Combining  Global Knowledge Information\nSelecting the Best Player Formation for Corner-Kick Situations Based on  Bayes' Estimation\nThe belief noisy-or model applied to network reliability analysis\nEnd-to-end LSTM-based dialog control optimized with supervised and  reinforcement learning\nScene Grammars, Factor Graphs, and Belief Propagation\nEffective Multi-Robot Spatial Task Allocation using Model Approximations\nGenerating Natural Language Inference Chains\nDistance Metric Ensemble Learning and the Andrews-Curtis Conjecture\nCoordination in Categorical Compositional Distributional Semantics\nBayesian Poisson Tucker Decomposition for Learning the Structure of  International Relations\nUnifying Count-Based Exploration and Intrinsic Motivation\nPreliminaries of a Space Situational Awareness Ontology\nConsistency and Trust in Peer Data Exchange Systems\nSorting out symptoms: design and evaluation of the 'babylon check'  automated triage system\nMulti-resource defensive strategies for patrolling games with alarm  systems\nEmotional Intensity analysis in Bipolar subjects\nSifting Common Information from Many Variables\nSE3-Nets: Learning Rigid Body Motion using Deep Neural Networks\nDeep Learning Convolutional Networks for Multiphoton Microscopy  Vasculature Segmentation\nDeep Successor Reinforcement Learning\nLearning Language Games through Interaction\nExploring Implicit Human Responses to Robot Mistakes in a 
Learning from  Demonstration Task\nDISCO Nets: DISsimilarity COefficient Networks\nTowards End-to-End Learning for Dialog State Tracking and Management  using Deep Reinforcement Learning\nSafe and Efficient Off-Policy Reinforcement Learning\nTheoretical Robopsychology: Samu Has Learned Turing Machines\nArbitrage-Free Combinatorial Market Making via Integer Programming\nA Thorough Examination of the CNN/Daily Mail Reading Comprehension Task\nUnderstanding User Instructions by Utilizing Open Knowledge for Service  Robots\nA Cognitive Architecture for the Implementation of Emotions in Computing  Systems\nGenerative Topic Embedding: a Continuous Representation of Documents  (Extended Version with Proofs)\nMuFuRU: The Multi-Function Recurrent Unit\nCooperative Inverse Reinforcement Learning\nSimple epistemic planning: generalised gossiping\nNatural Language Generation enhances human decision-making with  uncertain information\nTunable Online MUS/MSS Enumeration\nWordNet2Vec: Corpora Agnostic Word Vectorization Method\nLength bias in Encoder Decoder Models and a Case for Global Conditioning\nScan Order in Gibbs Sampling: Models in Which it Matters and Bounds on  How Much\nThe Mythos of Model Interpretability\nWord Sense Disambiguation using a Bidirectional LSTM\nA framework for detecting fraudulent activities in edo state tax  collection system using investigative data mining\nThe Opacity of Backbones\nDeep Reinforcement Learning with a Combinatorial Action Space for  Predicting Popular Reddit Threads\nDetecção de comunidades em redes complexas para identificar  gargalos e desperdício de recursos em sistemas de ônibus\nMITRE at SemEval-2016 Task 6: Transfer Learning for Stance Detection\nEvidential Label Propagation Algorithm for Graphs\nRobust Probabilistic Modeling with Bayesian Data Reweighting\nNeural Associative Memory for Dual-Sequence Modeling\nVisual-Inertial-Semantic Scene Representation for 3-D Object Detection\nEstimating individual treatment effect: 
generalization bounds and  algorithms\nUsing a Distributional Semantic Vector Space with a Knowledge Base for  Reasoning in Uncertain Conditions\nBacteria Foraging Algorithm with Genetic Operators for the Solution of  QAP and mQAP\nUsing Virtual Humans to Understand Real Ones\nMicro-interventions in urban transport from pattern discovery on the  flow of passengers and on the bus network\nSpreadsheet Probabilistic Programming\nThe Parallel Knowledge Gradient Method for Batch Bayesian Optimization\nLogic Tensor Networks: Deep Learning and Logical Reasoning from Data and  Knowledge\nLifted Convex Quadratic Programming\nWhy is Compiling Lifted Inference into a Low-Level Language so  Effective?\nImpossibility in Belief Merging\nNatural Language Generation as Planning under Uncertainty Using  Reinforcement Learning\nStrategic Attentive Writer for Learning Macro-Actions\nASAGA: Asynchronous Parallel SAGA\nLearning Optimal Interventions\nRobust Active Perception via Data-association aware Belief Space  planning\nDeep Reinforcement Learning Discovers Internal Models\nSuccessor Features for Transfer in Reinforcement Learning\nUnsupervised Risk Estimation Using Only Conditional Independence  Structure\nOn the Expressive Power of Deep Neural Networks\nAbducing Compliance of Incomplete Event Logs\nAdding Context to Concept Trees\nBandit-Based Random Mutation Hill-Climbing\nProduct Classification in E-Commerce using Distributional Semantics\nPolymetric Rhythmic Feel for a Cognitive Drum Computer\nComplex Embeddings for Simple Link Prediction\nUnanimous Prediction for 100% Precision with Application to Learning  Semantic Mappings\nThe Schema Editor of OpenIoT for Semantic Sensor Networks\nNeighborhood Mixture Model for Knowledge Base Completion\nÉtude de Problèmes d'Optimisation Combinatoire à Multiples  Composantes Interdépendantes\nStructure in the Value Function of Two-Player Zero-Sum Games of  Incomplete Information\nInferring Logical Forms From Denotations\nAncestral Causal 
Inference\nEmulating Human Conversations using Convolutional Neural Network-based  IR\nAn Approach to Stable Gradient Descent Adaptation of Higher-Order Neural  Units\nAnalyzing the Behavior of Visual Question Answering Models\nRobust Learning of Fixed-Structure Bayesian Networks\nLearning to Poke by Poking: Experiential Learning of Intuitive Physics\nLSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in  Recurrent Neural Networks\nSort Story: Sorting Jumbled Images and Captions into Stories\nHuman-Agent Decision-making: Combining Theory and Practice\nStandard State Space Models of Unawareness (Extended Abstract)\nCeteris paribus logic in counterfactual reasoning\nPreference at First Sight\nRelating Knowledge and Coordinated Action: The Knowledge of  Preconditions Principle\nParameterized Complexity Results for a Model of Theory of Mind Based on  Dynamic Epistemic Logic\nThe optimality of coarse categories in decision-making and information  storage\nTranslucent Players: Explaining Cooperative Behavior in Social Dilemmas\nNeural Network Based Next-Song Recommendation\nPrecise deep neural network computation on imprecise low-power analog  hardware\nProactive Decision Support using Automated Planning\nLabel Tree Embeddings for Acoustic Scene Classification\nAssigning a Small Agreeable Set of Indivisible Items to Multiple Players\nQuantum Simulation of a Quantum Stochastic Walk\nContent-Based Top-N Recommendation using Heterogeneous Relations\nPropagators and Solvers for the Algebra of Modular Systems\nTrue Lies\nLifted Rule Injection for Relation Embeddings\nA Reduction for Optimizing Lattice Submodular Functions with Diminishing  Returns\nA Learning Algorithm for Relational Logistic Regression: Preliminary  Results\nA Local Density-Based Approach for Local Outlier Detection\nAdaptive Training of Random Mapping for Data Quantization\nTechnical Report: Towards a Universal Code Formatter through Machine  Learning\nsubgraph2vec: Learning Distributed 
Representations of Rooted Sub-graphs  from Large Graphs\nEvaluation and selection of Medical Tourism sites: A rough AHP based  MABAC approach\nCredibilistic TOPSIS Model for Evaluation and Selection of Municipal  Solid Waste Disposal Methods\nCompression of Neural Machine Translation Models via Pruning\nClique-Width and Directed Width Measures for Answer-Set Programming\nOrdering as privileged information\nContextual Symmetries in Probabilistic Graphical Models\nA Permutation-based Model for Crowd Labeling: Optimal Estimation and  Robustness\nLifted Region-Based Belief Propagation\nTowards A Virtual Assistant That Can Be Taught New Tasks In Any Domain  By Its End-Users\nFractal Dimension Pattern Based Multiresolution Analysis for Rough  Estimator of Person-Dependent Audio Emotion Recognition\nThrowing fuel on the embers: Probability or Dichotomy, Cognitive or  Linguistic?\nNeutrosophic Overset, Neutrosophic Underset, and Neutrosophic Offset.  Similarly for Neutrosophic Over-/Under-/Off- Logic, Probability, and  Statistics\nDomain Adaptation for Neural Networks by Parameter Augmentation\nAdaptive Neighborhood Graph Construction for Inference in  Multi-Relational Networks\nA Hybrid POMDP-BDI Agent Architecture with Online Stochastic Planning  and Plan Caching\nCan we reach Pareto optimal outcomes using bottom-up approaches?\nFormal analysis of HTM Spatial Pooler performance under predefined  operation conditions\nUnderstanding the Abstract Dialectical Framework (Preliminary Report)\nModeling of Item-Difficulty for Ontology-based MCQs\nEncoding Cryptographic Functions to SAT Using Transalg System\nModelling Context with User Embeddings for Sarcasm Detection in Social  Media\nGeneric Statistical Relational Entity Resolution in Knowledge Graphs\nBootstrap Model Aggregation for Distributed Statistical Learning\nAffect Intensity Estimation Using Multiple Modalities\nAn extended MABAC for multi-attribute decision making using trapezoidal  interval type-2 fuzzy numbers\nCan 
mobile usage predict illiteracy in a developing country?\nTowards Self-explanatory Ontology Visualization with Contextual  Verbalization\nDeep CORAL: Correlation Alignment for Deep Domain Adaptation\nRolling Horizon Coevolutionary Planning for Two-Player Video Games\nMapping Data to Ontologies with Exceptions Using Answer Set Programming\nRepresenting Verbs with Rich Contexts: an Evaluation on Verb Similarity\nCaR-FOREST: Joint Classification-Regression Decision Forests for  Overlapping Audio Event Detection\nTranslating Bayesian Networks into Entity Relationship Models, Extended  Version\nExplaining Deep Convolutional Neural Networks on Music Classification\nSolving finite-domain linear constraints in presence of the  $\\texttt{alldifferent}$\nAnalysis of opinionated text for opinion mining\nAugmenting Supervised Emotion Recognition with Rule-Based Decision Model\nExtending Weakly-Sticky Datalog+/-: Query-Answering Tractability and  Optimizations\nOpen Information Extraction\nA Framework for Estimating Long Term Driver Behavior\nPopulations can be essential in tracking dynamic optima\nCharacterizing Driving Styles with Deep Learning\nMinimum Vertex-type Sequence Indexingfor Clusters on Square Lattice\nLarge-scale Analysis of Chess Games with Chess Engines: A Preliminary  Report\nRandom-Key Cuckoo Search for the Travelling Salesman Problem\nVista: A Visually, Socially, and Temporally-aware Model for Artistic  Recommendation\nIntrinsically Motivated Multimodal Structure Learning\nA Counterexample to the Forward Recursion in Fuzzy Critical Path  Analysis Under Discrete Fuzzy Sets\nKnowledge Representation on the Web revisited: Tools for Prototype Based  Ontologies\nPlaying Atari Games with Deep Reinforcement Learning and Human  Checkpoint Replay\nIs spoken language all-or-nothing? 
Implications for future speech-based  human-machine interaction\nTowards Analytics Aware Ontology Based Access to Static and Streaming  Data (Extended Version)\nExploiting Vagueness for Multi-Agent Consensus\nAn Event Grouping Based Algorithm for University Course Timetabling  Problem\nNeural Contextual Conversation Learning with Labeled Question-Answering  Pairs\nIdentifying Candidate Risk Factors for Prescription Drug Side Effects  using Causal Contrast Set Mining\nIndebted households profiling: a knowledge discovery from database  approach\nAdaptive Data Communication Interface: A User-Centric Visual Data  Interpretation Framework\nRefining adverse drug reaction signals by incorporating interaction  variables identified using emergent pattern mining\nSupervised Anomaly Detection in Uncertain Pseudoperiodic Data Streams\nConstructing a Natural Language Inference Dataset using Generative  Neural Networks\nDataset and Neural Recurrent Sequence Labeling Model for Open-Domain  Factoid Question Answering\nModelling Office Energy Consumption: An Agent Based Approach\nLatent Variable Discovery Using Dependency Patterns\nOptimal resampling for the noisy OneMax problem\nInpainting of long audio segments with similarity graphs\nProcessing Natural Language About Ongoing Actions\nRedundancy-free Verbalization of Individuals for Ontology Validation\nSpatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition\nEstimating Activity at Multiple Scales using Spatial Abstractions\nTweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM  Encoder-Decoder\nOntoCat: Automatically categorizing knowledge in API Documentation\nThe Price of Anarchy in Auctions\nLeveraging Unstructured Data to Detect Emerging Reliability Issues\nFocused Model-Learning and Planning for Non-Gaussian Continuous  State-Action Systems\nTechnical Report: Giving Hints for Logic Programming Examples without  Revealing Solutions\nPolling-systems-based Autonomous Vehicle Coordination in Traffic  
Intersections with No Traffic Signals\nJoint Embedding of Hierarchical Categories and Entities for Concept  Categorization and Dataless Classification\nMining Arguments from Cancer Documents Using Natural Language Processing  and Ontologies\nHarmonization of conflicting medical opinions using argumentation  protocols and textual entailment - a case study on Parkinson disease\nImproving Semantic Embedding Consistency by Metric Learning for  Zero-Shot Classification\nA DEMATEL-Based Completion Method for Incomplete Pairwise Comparison  Matrix in AHP\nN-opcode Analysis for Android Malware Classification and Categorization\nAndroid Malware Detection Using Parallel Machine Learning Classifiers\nVHT: Vertical Hoeffding Tree\nFaceless Person Recognition; Privacy Implications in Social Media\nA symbolic algebra for the computation of expected utilities in  multiplicative influence diagrams\nSemi-supervised evidential label propagation algorithm for graph data\nThe DLVHEX System for Knowledge Representation: Recent Advances (System  Description)\nIdentifying and Harnessing the Building Blocks of Machine Learning  Pipelines for Sensible Initialization of a Data Science Automation Tool\nPDDL+ Planning via Constraint Answer Set Programming\nHuman Pose Estimation in Space and Time using 3D CNN\nA Novel Progressive Learning Technique for Multi-class Classification\nFrom Community Detection to Community Deception\nTernary Neural Networks for Resource-Efficient AI Applications\nCrowdsourcing with Unsure Option\nVerifier Theory and Unverifiability\nA case study of algorithm selection for the traveling thief problem\nLexical-Morphological Modeling for Legal Text Analysis\nAn Online Universal Classifier for Binary, Multi-class and Multi-label  Classification\nSpectral learning of dynamic systems from nonequilibrium data\nQ-Learning with Basic Emotions\nOpenTripPlanner, OpenStreetMap, General Transit Feed Specification:  Tools for Disaster Relief and Recovery\nAxiomatizing Category 
Theory in Free Logic\nAutomation of Pedestrian Tracking in a Crowded Situation\nUnifying task specification in reinforcement learning\nDeep Markov Random Field for Image Modeling\nUberNet: Training a `Universal' Convolutional Neural Network for Low-,  Mid-, and High-Level Vision using Diverse Datasets and Limited Memory\nRandom Shuffling and Resets for the Non-stationary Stochastic Bandit  Problem\nLatest Datasets and Technologies Presented in the Workshop on Grasping  and Manipulation Datasets\nAn Integrated Classification Model for Financial Data Mining\nEpisodic Exploration for Deep Deterministic Policies: An Application to  StarCraft Micromanagement Tasks\nA centralized reinforcement learning method for multi-agent job  scheduling in Grid\nOn Generation of Time-based Label Refinements\nZaliQL: A SQL-Based Framework for Drawing Causal Inference from Big Data\nJoint Extraction of Events and Entities within a Document Context\nGraph Aggregation\nInstrumenting an SMT Solver to Solve Hybrid Network Reachability  Problems\nA Generic Bet-and-run Strategy for Speeding Up Traveling Salesperson and  Minimum Vertex Cover\nQuick and energy-efficient Bayesian computing of binocular disparity  using stochastic digital signals\nColumn Networks for Collective Classification\nContext Aware Nonnegative Matrix Factorization Clustering\nNPCs as People, Too: The Extreme AI Personality Engine\nA Formal Solution to the Grain of Truth Problem\nStyle Imitation and Chord Invention in Polyphonic Music with Exponential  Families\nShould Terminology Principles be re-examined?\nGrammatical Templates: Improving Text Difficulty Evaluation for Language  Learners\nContinuous occurrence theory\nNPCs Vote! 
Changing Voter Reactions Over Time Using the Extreme AI  Personality Engine\nApplications of Data Mining (DM) in Science and Engineering: State of  the art and perspectives\nPlaying FPS Games with Deep Reinforcement Learning\nGraph-Structured Representations for Visual Question Answering\nPreorder-Based Triangle: A Modified Version of Bilattice-Based Triangle  for Belief Revision in Nonmonotonic Reasoning\nExtending Unification in $\\mathcal{EL}$ to Disunification: The Case of  Dismatching and Local Disunification\nOn the adoption of abductive reasoning for time series interpretation\nTODIM and TOPSIS with Z-numbers\nEnabling Dark Energy Science with Deep Generative Models of Galaxy  Images\nScope for Machine Learning in Digital Manufacturing\nOn the Phase Transition of Finding a Biclique in a larger Bipartite  Graph\nOnline and Distributed learning of Gaussian mixture models by Bayesian  Moment Matching\nEnhanced LSTM for Natural Language Inference\nAn Ensemble Blocking Scheme for Entity Resolution of Large and Sparse  Datasets\nSemantic Similarity Strategies for Job Title Classification\nRecognizing Detailed Human Context In-the-Wild from Smartphones and  Smartwatches\nRecognizing Implicit Discourse Relations via Repeated Reading: Neural  Networks with Multi-Level Attention\nDocument Image Coding and Clustering for Script Discrimination\nThe Color of the Cat is Gray: 1 Million Full-Sentences Visual Question  Answering (FSVQA)\nLanguage as a Latent Variable: Discrete Generative Models for Sentence  Compression\nDiscovering Sound Concepts and Acoustic Relations In Text\nRegulating Reward Training by Means of Certainty Prediction in a Neural  Network-Implemented Pong Game\nOptimizing positional scoring rules for rank aggregation\nFast Learning of Clusters and Topics via Sparse Posteriors\nPointer Sentinel Mixture Models\nTowards Evidence-Based Ontology for Supporting Systematic Literature  Review\nOnline Segment to Segment Neural Transduction\nTop-N Recommendation 
on Graphs\nDecision Making Based on Cohort Scores for Speaker Verification\nModel-based Test Generation for Robotic Software: Automata versus  Belief-Desire-Intention Agents\nWeakly Supervised PLDA Training\nA computer program for simulating time travel and a possible 'solution'  for the grandfather paradox\nUbuntuWorld 1.0 LTS - A Platform for Automated Problem Solving &  Troubleshooting in the Ubuntu OS\nCorrect classification for big/smart/fast data machine learning\nTopic Browsing for Research Papers with Hierarchical Latent Tree  Analysis\nDeep Tracking on the Move: Learning to Track the World from a Moving  Vehicle using Recurrent Neural Networks\nEvaluating Induced CCG Parsers on Grounded Semantic Parsing\nBacterial Foraging Optimized STATCOM for Stability Assessment in Power  System\nOutlier Detection from Network Data with Subnetwork Interpretation\nConsistency Ensuring in Social Web Services Based on Commitments  Structure\nTowards deep learning with segregated dendrites\nImproving Accuracy and Scalability of the PC Algorithm by Maximizing  P-value\nA Probability Distribution Strategy with Efficient Clause Selection for  Hard Max-SAT Formulas\nCan Evolutionary Sampling Improve Bagged Ensembles?\nOne-Trial Correction of Legacy AI Systems and Stochastic Separation  Theorems\nCollective Robot Reinforcement Learning with Distributed Asynchronous  Guided Policy Search\nDeep Visual Foresight for Planning Robot Motion\nEmbracing data abundance: BookTest Dataset for Reading Comprehension\nA Constraint-Handling Technique for Genetic Algorithms using a Violation  Factor\nTutorial on Answering Questions about Images with Deep Learning\nTowards the Design of Prospect-Theory based Human Decision Rules for  Hypothesis Testing\nFind Your Own Way: Weakly-Supervised Segmentation of Path Proposals for  Urban Autonomy\nEPOpt: Learning Robust Neural Network Policies Using Model Ensembles\nDomain Adaptation with Soft-margin multiple feature-kernel learning  beats Deep 
Learning for surveillance face recognition\nThe Predictive Context Tree: Predicting Contexts and Interactions\nVisual Question Answering: Datasets, Algorithms, and Future Challenges\n$\\ell_1$ Regularized Gradient Temporal-Difference Learning\nA Novel Representation of Neural Networks\nHuman Decision-Making under Limited Time\nSolving Marginal MAP Problems with NP Oracles and Parity Constraints\nInterpreting Neural Networks to Improve Politeness Comprehension\nRanking academic institutions on potential paper acceptance in upcoming  conferences\nSituational Awareness by Risk-Conscious Skills\nTowards an Ontology-Driven Blockchain Design for Supply Chain Provenance\nExtrapolation and learning equations\nNavigational Instruction Generation as Inverse Reinforcement Learning  with Neural Machine Translation\nA Chain-Detection Algorithm for Two-Dimensional Grids\nMaximum entropy models for generation of expressive music\nDeep Fruit Detection in Orchards\nDetecting Unseen Falls from Wearable Devices using Channel-wise Ensemble  of Autoencoders\nA fuzzy expert system for earthquake prediction, case study: the Zagros  range\nAn Information Theoretic Feature Selection Framework for Big Data under  Apache Spark\nHadamard Product for Low-rank Bilinear Pooling\nDistributional Inclusion Hypothesis for Tensor-based Composition\nLocalization for Wireless Sensor Networks: A Neural Network Approach\nA Closed Form Solution to Multi-View Low-Rank Regression\nEfficient Rectangular Maximal-Volume Algorithm for Rating Elicitation in  Collaborative Filtering\nWeekly maintenance scheduling using exact and genetic methods\nLearning and Transfer of Modulated Locomotor Controllers\nDecentralized Collaborative Learning of Personalized Models over  Networks\nVRPBench: A Vehicle Routing Benchmark Tool\nMakespan Optimal Solving of Cooperative Path-Finding via Reductions to  Propositional Satisfiability\nWeighted Positive Binary Decision Diagrams for Exact Probabilistic  Inference\nIdentifiability 
and Transportability in Dynamic Causal Networks\nOn the Hyperprior Choice for the Global Shrinkage Parameter in the  Horseshoe Prior\nCensus Signal Temporal Logic Inference for Multi-Agent Group Behavior  Analysis\nLow-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models\nDeep Amortized Inference for Probabilistic Programs\nBig Batch SGD: Automated Inference using Adaptive Batch Sizes\nEmbodiment of Learning in Electro-Optical Signal Processors\nDynamic Probabilistic Network Based Human Action Recognition\nA Growing Long-term Episodic & Semantic Memory\nReasoning with Memory Augmented Neural Networks for Language  Comprehension\nKGEval: Estimating Accuracy of Automatically Constructed Knowledge  Graphs\nLearning Cost-Effective Treatment Regimes using Markov Decision  Processes\nTemplate Matching Advances and Applications in Image Analysis\nSurprisal-Driven Zoneout\nFrank-Wolfe Algorithms for Saddle Point Problems\nInfinite-dimensional Log-Determinant divergences II: Alpha-Beta  divergences\nA self-tuning Firefly algorithm to tune the parameters of Ant Colony  System (ACSFA)\nUniversal adversarial perturbations\nNew Liftable Classes for First-Order Probabilistic Inference\nDistraction-Based Neural Networks for Document Summarization\nSynthesis of Shared Control Protocols with Provable Safety and  Performance Guarantees\nLearning Scalable Deep Kernels with Recurrent Structure\nImproving Sampling from Generative Autoencoders with Markov Chains\nProbabilistic Model Checking for Complex Cognitive Tasks -- A case study  in human-robot interaction\nDPPred: An Effective Prediction Framework with Concise Discriminative  Patterns\nEdward: A library for probabilistic modeling, inference, and criticism\nA Survey of Brain Inspired Technologies for Engineering\nChinese Poetry Generation with Planning based Neural Network\nMining Social Media for Open Innovation in Transportation Systems\nInference Compilation and Universal Probabilistic Programming\nNeural Symbolic 
Machines: Learning Semantic Parsers on Freebase with  Weak Supervision\nTowards Blended Reactive Planning and Acting using Behavior Trees\nDetecting Affordances by Visuomotor Simulation\nUsing Artificial Intelligence to Identify State Secrets\nBots as Virtual Confederates: Design and Ethics\nAn application of incomplete pairwise comparison matrices for ranking  top tennis players\nLimitations and Alternatives for the Evaluation of Large-scale Link  Prediction\nInferring Coupling of Distributed Dynamical Systems via Transfer Entropy\nImproving incremental recommenders with online bagging\nInitialization and Coordinate Optimization for Multi-way Matching\nProbabilistic Modeling of Progressive Filtering\nA Hybrid Approach to Word Sense Disambiguation Combining Supervised and  Unsupervised Learning\nLearning Continuous Semantic Representations of Symbolic Expressions\nEstimating Causal Direction and Confounding of Two Discrete Variables\nQBF Solving by Counterexample-guided Expansion\nQuasi-Recurrent Neural Networks\nA Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks\nDynamic Coattention Networks For Question Answering\nCombining policy gradient and Q-learning\nA Differentiable Physics Engine for Deep Learning in Robotics\nCauses for Query Answers from Databases: Datalog Abduction,  View-Updates, and Integrity Constraints\nLearning to Act by Predicting the Future\nSelf-Wiring Question Answering Systems\nAn Information-Theoretic Framework for Fast and Robust Unsupervised  Learning via Neural Population Infomax\nAveraged-DQN: Variance Reduction and Stabilization for Deep  Reinforcement Learning\nReinforcement-based Simultaneous Algorithm and its Hyperparameters  Selection\nPlaying SNES in the Retro Learning Environment\nHierarchical compositional feature learning\nGaussian Attention Model and Its Application to Knowledge Base Embedding  and Question Answering\nNormalizing Flows on Riemannian Manifolds\nCombining observational and experimental data 
to find heterogeneous  treatment effects\nThe Data Complexity of Description Logic Ontologies\nCognitive Discriminative Mappings for Rapid Learning\nThe Neural Noisy Channel\nSentence Ordering and Coherence Modeling using Recurrent Neural Networks\nEncoding monotonic multi-set preferences using CI-nets: preliminary  report\nA stochastically verifiable autonomous control architecture with  reasoning\nXCSP3: An Integrated Format for Benchmarking Combinatorial Constrained  Problems\nImportance Sampling with Unequal Support\nThe Sum-Product Theorem: A Foundation for Learning Tractable Models\nNeural Networks Models for Entity Discovery and Linking\nUTCNN: a Deep Learning Model of Stance Classificationon on Social Media  Text\nShow me the material evidence: Initial experiments on evaluating  hypotheses from user-generated multimedia data\nLearning to Navigate in Complex Environments\nReinforcement Learning in Rich-Observation MDPs using Spectral Methods\nA Review on Algorithms for Constraint-based Causal Discovery\nLeveraging Video Descriptions to Learn Video Question Answering\nLearning to Gather Information via Imitation\nFeature Engineering and Ensemble Modeling for Paper Acceptance Rank  Prediction\nWhen Saliency Meets Sentiment: Understanding How Image Content Invokes  Emotion and Sentiment\nTraversing Knowledge Graph in Vector Space without Symbolic Space  Guidance\nThe NOESIS Network-Oriented Exploration, Simulation, and Induction  System\nAn Evaluation of Information Sharing Parking Guidance Policies Using a  Bayesian Approach\nAn integrated Graphical User Interface for Debugging Answer Set Programs\nVariable Neighborhood Search Algorithms for the multi-depot dial-a-ride  problem with heterogeneous vehicles and users\nDriving CDCL Search\nTowards Interconnected Virtual Reality: Opportunities, Challenges and  Enablers\nProjE: Embedding Projection for Knowledge Graph Completion\nZero-Shot Visual Question Answering\nStream Packing for Asynchronous Multi-Context 
Systems using ASP\nLearning to detect and localize many objects from few examples\nStudy on Feature Subspace of Archetypal Emotions for Speech Emotion  Recognition\nFast Non-Parametric Tests of Relative Dependency and Similarity\nNothing Else Matters: Model-Agnostic Explanations By Identifying  Prediction Invariance\nTowards a Mathematical Understanding of the Difficulty in Learning with  Feedforward Neural Networks\nAnalysis of a Design Pattern for Teaching with Features and Labels\nNavigational Rule Derivation: An algorithm to determine the effect of  traffic signs on road networks\nTeam-maxmin equilibrium: efficiency bounds and algorithms\nVariable Computation in Recurrent Neural Networks\nExpert Gate: Lifelong Learning with a Network of Experts\nGenerative Deep Neural Networks for Dialogue: A Short Review\nA Survey of Methods for Collective Communication Optimization and Tuning\nInvertible Conditional GANs for image editing\nGenerating machine-executable plans from end-user's natural-language  instructions\nNon-Local Color Image Denoising with Convolutional Neural Networks\nLearning From Graph Neighborhoods Using LSTMs\nMemory Lens: How Much Memory Does an Agent Use?\nEnforcing Relational Matching Dependencies with Datalog for Entity  Resolution\nAssociative Adversarial Networks\nCoherent Dialogue with Attention-based Language Models\nAn Efficient Training Algorithm for Kernel Survival Support Vector  Machines\nInterpreting Finite Automata for Sequential Data\nCAS-CNN: A Deep Convolutional Neural Network for Image Compression  Artifact Suppression\nAn unexpected unity among methods for interpreting model predictions\nVariational Intrinsic Control\nFeature Importance Measure for Non-linear Learning Algorithms\nPrograms as Black-Box Explanations\niCaRL: Incremental Classifier and Representation Learning\nMulti-Modal Mean-Fields via Cardinality-Based Clamping\nA Spatio-Temporal Representation for the Orienteering Problem with  Time-Varying Profits\nMultiscale 
Inverse Reinforcement Learning using Diffusion Wavelets\nOn Human Intellect and Machine Failures: Troubleshooting Integrative  Machine Learning Systems\nGuessWhat?! Visual object discovery through multi-modal dialogue\nAn Analysis of Tournament Structure\nNew Trends in Neutrosophic Theory and Applications\nConvolutional Experts Constrained Local Model for Facial Landmark  Detection\nTraining an Interactive Humanoid Robot Using Multimodal Deep  Reinforcement Learning\nBliStrTune: Hierarchical Invention of Theorem Proving Strategies\nEmbedded Bandits for Large-Scale Black-Box Optimization\nSAD-GAN: Synthetic Autonomous Driving using Generative Adversarial  Networks\nLong-Term Image Boundary Prediction\nThe BIN_COUNTS Constraint: Filtering and Applications\nDeepSetNet: Predicting Sets with Deep Neural Networks\nImproving Policy Gradient by Exploring Under-appreciated Rewards\nEmergence of foveal image sampling from learning to attend in visual  scenes\nInput Switched Affine Networks: An RNN Architecture Designed for  Interpretability\nMaximizing Non-Monotone DR-Submodular Functions with Cardinality  Constraints\nLearning Filter Banks Using Deep Learning For Acoustic Signals\nFractional Order Fuzzy Control of Hybrid Power System with Renewable  Generation Using Chaotic PSO\nDialogue Learning With Human-In-The-Loop\nNewsQA: A Machine Comprehension Dataset\nExploration for Multi-task Reinforcement Learning with Deep Generative  Models\nNeural Combinatorial Optimization with Reinforcement Learning\nContextualizing Geometric Data Analysis and Related Data Analytics: A  Virtual Microscope for Big Data Analytics\nSystem-Generated Requests for Rewriting Proposals\nFusion of EEG and Musical Features in Continuous Music-emotion  Recognition\nSeDMiD for Confusion Detection: Uncovering Mind State from Time Series  Brain Wave Data\nThe observer-assisted method for adjusting hyper-parameters in deep  learning algorithms\nComputer Assisted Composition with Recurrent Neural 
Networks\nRobust Optimization for Tree-Structured Stochastic Network Design\nCDVAE: Co-embedding Deep Variational Auto Encoder for Conditional  Variational Generation\nAnalysis of the Human-Computer Interaction on the Example of Image-based  CAPTCHA by Association Rule Mining\nOn Coreferring Text-extracted Event Descriptions with the aid of  Ontological Reasoning\nAn Evaluation of Models for Runtime Approximation in Link Discovery\nA Compositional Object-Based Approach to Learning Physical Dynamics\nBootstrapping incremental dialogue systems: using linguistic knowledge  to learn from minimal data\nLarge-scale Validation of Counterfactual Learning Methods: A Test-Bed\nPiecewise Latent Variables for Neural Variational Text Processing\nPlaying Doom with SLAM-Augmented Deep Reinforcement Learning\nTransfer Learning Across Patient Variations with Hidden Parameter Markov  Decision Processes\nInferring Cognitive Models from Data using Approximate Bayesian  Computation\nAutomated assessment of non-native learner essays: Investigating the  role of linguistic features\nStructured Filtering\nAsynchronous Stochastic Gradient MCMC with Elastic Coupling\nCommonly Uncommon: Semantic Sparsity in Situation Recognition\nA Matrix Splitting Perspective on Planning with Options\nRecSys Challenge 2016: job recommendations based on preselection of  offers and gradient boosting\nDeepBach: a Steerable Model for Bach Chorales Generation\nShort-term traffic flow forecasting with spatial-temporal correlation in  a hybrid deep learning framework\nEnhancing Use Case Points Estimation Method Using Soft Computing  Techniques\nDeep Learning of Robotic Tasks without a Simulator using Strong and Weak  Human Supervision\nThe Complexity of Bayesian Networks Specified by Propositional and  Relational Languages\nNeural Symbolic Machines: Learning Semantic Parsers on Freebase with  Weak Supervision (Short Version)\nProportional Rankings\nN-gram Opcode Analysis for Android Malware Detection\nDeep learning 
in color: towards automated quark/gluon jet discrimination\nFleet Size and Mix Split-Delivery Vehicle Routing\nFactored Contextual Policy Search with Bayesian Optimization\nKnowing When to Look: Adaptive Attention via A Visual Sentinel for Image  Captioning\nCross-Lingual Predicate Mapping Between Linked Data Ontologies\nMultimodal Transfer: A Hierarchical Deep Convolutional Neural Network  for Fast Artistic Style Transfer\nEffect of Reward Function Choices in MDPs with Value-at-Risk\nMode Regularized Generative Adversarial Networks\nMeasuring the non-asymptotic convergence of sequential Monte Carlo  samplers using probabilistic programming\nStochastic Primal-Dual Methods and Sample Complexity of Reinforcement  Learning\nInverses, Conditionals and Compositional Operators in Separative  Valuation Algebra\nCoupling Distributed and Symbolic Execution for Natural Language Queries\nMeasuring Adverse Drug Effects on Multimorbity using Tractable Bayesian  Networks\nAdvancing Bayesian Optimization: The Mixed-Global-Local (MGL) Kernel and  Length-Scale Cool Down\nFinding Better Active Learners for Faster Literature Reviews\nReinforcement Learning With Temporal Logic Rewards\nFlu Detector: Estimating influenza-like illness rates from online  user-generated content\nContext-aware Sentiment Word Identification: sentiword2vec\nA Unit Selection Methodology for Music Generation Using Deep Neural  Networks\nTensor Decompositions via Two-Mode Higher-Order SVD (HOSVD)\nDeep Active Learning for Dialogue Generation\nApplication of Advanced Record Linkage Techniques for Complex Population  Reconstruction\nIncorporating Human Domain Knowledge into Large Scale Cost Function  Learning\nAn argumentative agent-based model of scientific inquiry\nSparse Factorization Layers for Neural Networks with Limited Supervision\nReal-time interactive sequence generation and control with Recurrent  Neural Network ensembles\nImposing higher-level Structure in Polyphonic Music Generation using  
Convolutional Restricted Boltzmann Machines and Constraints\nScalable Computation of Optimized Queries for Sequential Diagnosis\nCollaborative creativity with Monte-Carlo Tree Search and Convolutional  Neural Networks\nLearning through Dialogue Interactions by Asking Questions\nOntohub: A semantic repository for heterogeneous ontologies\nCoupling Adaptive Batch Sizes with Learning Rates\nMulti-Agent Path Finding with Delay Probabilities\nDefensive Player Classification in the National Basketball Association\nDeep Reinforcement Learning with Successor Features for Navigation  across Similar Environments\nSupervised Quantum Learning without Measurements\nAn Alternative Softmax Operator for Reinforcement Learning\nReinforcement Learning Using Quantum Boltzmann Machines\nExploiting sparsity to build efficient kernel based collaborative  filtering for top-N item recommendation\nA Hidden Absorbing Semi-Markov Model for Informatively Censored Temporal  Data: Learning and Inference\nContext and Interference Effects in the Combinations of Natural Concepts\nAn Empirical Study of Adequate Vision Span for Attention-Based Neural  Machine Translation\nImproving Tweet Representations using Temporal and User Context\nA modified Physarum-inspired model for the user equilibrium traffic  assignment problem\nA Scalable Document-based Architecture for Text Analysis\nLearning Features by Watching Objects Move\nComputational Complexity of Testing Proportional Justified  Representation\nParallelized Tensor Train Learning of Polynomial Classifiers\nA Latent-class Model for Estimating Product-choice Probabilities from  Clickstream Data\nCLEVR: A Diagnostic Dataset for Compositional Language and Elementary  Visual Reasoning\nAIVAT: A New Variance Reduction Technique for Agent Evaluation in  Imperfect Information Games\nBoolean kernels for collaborative filtering in top-N item recommendation\nUnderstanding Error Correction and its Role as Part of the Communication  Channel in Environments 
composed of Self-Integrating Systems\nCausal Effect Identification in Acyclic Directed Mixed Graphs and Gated  Models\nCounting Answer Sets via Dynamic Programming\nJointly Extracting Relations with Class Ties via Effective Deep Ranking\nHighway and Residual Networks learn Unrolled Iterative Estimation\nSampleRNN: An Unconditional End-to-End Neural Audio Generation Model\nConvergence Rates for Greedy Kaczmarz Algorithms, and Faster Randomized  Kaczmarz Rules Using the Orthogonality Graph\nEnhanceNet: Single Image Super-Resolution Through Automated Texture  Synthesis\nSolving Combinatorial Optimization problems with Quantum inspired  Evolutionary Algorithm Tuned using a Novel Heuristic Method\nMonte Carlo Sort for unreliable human comparisons\nA Sparse Nonlinear Classifier Design Using AUC Optimization\nRole of Simplicity in Creative Behaviour: The Case of the Poietic  Generator\nFastMask: Segment Multi-scale Object Candidates in One Shot\nDeep neural heart rate variability analysis\nLifted Relational Algebra with Recursion and Connections to Modal Logic\nAdaptive Lambda Least-Squares Temporal Difference Learning\nA Joint Speaker-Listener-Reinforcer Model for Referring Expressions\nDigital Advertising Traffic Operation: Machine Learning for Process  Discovery\nNon-Negative Matrix Factorization Test Cases\nLearning Weighted Association Rules in Human Phenotype Ontology\nProceedings 29th and 30th Workshops on (Constraint) Logic Programming  and 24th International Workshop on Functional and (Constraint) Logic  Programming\nSTRIPS Planning in Infinite Domains\nAn affective computational model for machine consciousness\nTruthful Facility Location with Additive Errors\nFuzzy finite element model updating using metaheuristic optimization  algorithms\nStochastic Planning and Lifted Inference\nA Review of Neural Network Based Machine Learning Approaches for Rotor  Angle Stability Control\nAutoencoder Regularized Network For Driving Style Representation  Learning\nToward 
negotiable reinforcement learning: shifting priorities in Pareto  optimal sequential decision-making\nGenerating Focussed Molecule Libraries for Drug Discovery with Recurrent  Neural Networks\nUnderstanding the complexity of #SAT using knowledge compilation\nMulti-level Representations for Fine-Grained Typing of Knowledge Base  Entities\nCoupled Compound Poisson Factorization\nInformation Pursuit: A Bayesian Framework for Sequential Scene Parsing\nPlaytime Measurement with Survival Analysis\nReinforcement Learning via Recurrent Convolutional Neural Networks\nMulti-task Learning Of Deep Neural Networks For Audio Visual Automatic  Speech Recognition\nA Convenient Category for Higher-Order Probability Theory\nA Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based  Semantic Role Labeling\nTowards Decoding as Continuous Optimization in Neural Machine  Translation\nContext-aware Captions from Context-agnostic Supervision\nReal-time eSports Match Result Prediction\nFrom First-Order Logic to Assertional Logic\nA Savage-Like Axiomatization for Nonstandard Expected Utility\nOn the links between argumentation-based reasoning and nonmonotonic  reasoning\nDeep Probabilistic Programming\nA Copy-Augmented Sequence-to-Sequence Architecture Gives Good  Performance on Task-Oriented Dialogue\nAgent-Agnostic Human-in-the-Loop Reinforcement Learning\nNear Optimal Behavior via Approximate State Abstraction\nUnderstanding the Effective Receptive Field in Deep Convolutional Neural  Networks\nAchieving Privacy in the Adversarial Multi-Armed Bandit\nThompson Sampling For Stochastic Bandits with Graph Feedback\nTowards prediction of rapid intensification in tropical cyclones with  recurrent neural networks\nFrom Community Detection to Community Profiling\nMultiobjective Optimization of Solar Powered Irrigation System with  Fuzzy Type-2 Noise Modelling\nUne mesure d'expertise pour le crowdsourcing\nVOCSMAT: a connectionist-inspired treatment proposal for relational  
traumas\nConverting Cascade-Correlation Neural Nets into Probabilistic Generative  Models\nOntology based system to guide internship assignment process\nReasoning in Non-Probabilistic Uncertainty: Logic Programming and  Neural-Symbolic Computing as Examples\nInteractive Learning from Policy-Dependent Human Feedback\nLabel Propagation on K-partite Graphs with Heterophily\nWhat the Language You Tweet Says About Your Occupation\nA Multichannel Convolutional Neural Network For Cross-language Dialog  State Tracking\nSpace-Time Graph Modeling of Ride Requests Based on Real-World Data\nPerceptually Optimized Image Rendering\nVariability-Aware Design for Energy Efficient Computational Artificial  Intelligence Platform\nFast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D\nIdentifying Consistent Statements about Numerical Data with  Dispersion-Corrected Subgroup Discovery\nImage-Grounded Conversations: Multimodal Context for Natural Question  and Response Generation\nSystems of natural-language-facilitated human-robot cooperation: A  review\nPure Rough Mereology and Counting\nPractical Reasoning with Norms for Autonomous Software Agents (Full  Edition)\nPlan Explanations as Model Reconciliation: Moving Beyond Explanation as  Soliloquy\nRhythm Transcription of Polyphonic Piano Music Based on Merged-Output  HMM for Multiple Voices\nDiversification Methods for Zero-One Optimization\nClick Through Rate Prediction for Contextual Advertisment Using Linear  Regression\nReinforcement Learning Algorithm Selection\nExpert Level control of Ramp Metering based on Multi-task Deep  Reinforcement Learning\nDeep Reinforcement Learning for Robotic Manipulation-The state of the  art\nOn the Semantics and Complexity of Probabilistic Logic Programs\nEfficient Rank Aggregation via Lehmer Codes\nTowards \"AlphaChem\": Chemical Synthesis Planning with Tree Search and  Deep Neural Network Policies\nRobust Order Scheduling in the Fashion Industry: A Multi-Objective  Optimization 
Approach\nMultilingual and Cross-lingual Timeline Extraction\nThe Value of Inferring the Internal State of Traffic Participants for  Autonomous Freeway Driving\nDeep Learning with Low Precision by Half-wave Gaussian Quantization\nReluplex: An Efficient SMT Solver for Verifying Deep Neural Networks\nTraffic Lights with Auction-Based Controllers: Algorithms and Real-World  Data\nA Theoretical Analysis of First Heuristics of Crowdsourced Entity  Resolution\nExploring the bidimensional space: A dynamic logic point of view\nView Independent Vehicle Make, Model and Color Recognition Using  Convolutional Neural Network\nExtracting Lifted Mutual Exclusion Invariants from Temporal Planning  Domains\nRepresentations of language in a model of visually grounded speech  signal\nGenerating Multiple Diverse Hypotheses for Human 3D Pose Consistent with  2D Joint Detections\nPropagation via Kernelization: The Vertex Cover Constraint\nDeep Generalized Canonical Correlation Analysis\nAutomatic Rule Extraction from Long Short Term Memory Networks\nCausal Regularization\nOptimal Detection of Faulty Traffic Sensors Used in Route Planning\nGraph Based Relational Features for Collective Classification\nAnswer Set Solving with Bounded Treewidth Revisited\nMulti-agent Reinforcement Learning in Sequential Social Dilemmas\nModeling Semantic Expectation: Using Script Knowledge for Referent  Prediction\nHybrid Code Networks: practical and efficient end-to-end dialog control  with supervised and reinforcement learning\nGroup Scissor: Scaling Neuromorphic Computing Design to Large Neural  Networks\nOctopus: A Framework for Cost-Quality-Time Optimization in Crowdsourcing\nGenetic and Memetic Algorithm with Diversity Equilibrium based on Greedy  Diversification\nMechanism Design in Social Networks\nA Morphology-aware Network for Morphological Disambiguation\nReservoir Computing Using Non-Uniform Binary Cellular Automata\nBilateral Multi-Perspective Matching for Natural Language Sentences\nOffline 
bilingual word vectors, orthogonal transformations and the  inverted softmax\nData-Intensive Supercomputing in the Cloud: Global Analytics for  Satellite Imagery\nDetection of Slang Words in e-Data using semi-Supervised Learning\nOn Detecting Adversarial Perturbations\nDAGER: Deep Age, Gender and Emotion Recognition Using Convolutional  Neural Network\nCyclic Dominance in the Spatial Coevolutionary Optional Prisoner's  Dilemma Game\nOn the Discrepancy Between Kleinberg's Clustering Axioms and $k$-Means  Clustering Algorithm Behavior\nLocal Search for Minimum Weight Dominating Set with Two-Level  Configuration Checking and Frequency Based Scoring Function\nVisualizing Deep Neural Network Decisions: Prediction Difference  Analysis\nAutomated Identification of Drug-Drug Interactions in Pediatric  Congestive Heart Failure Patients\nA Spacetime Approach to Generalized Cognitive Reasoning in Multi-scale  Learning\nQuadratic Upper Bound for Recursive Teaching Dimension of Finite VC  Classes\nRevisiting Distributed Synchronous SGD\nHemingway: Modeling Distributed Optimization Algorithms\nCosine Normalization: Using Cosine Similarity Instead of Dot Product in  Neural Networks\nLearning to Repeat: Fine Grained Action Repetition for Deep  Reinforcement Learning\nSurvey of reasoning using Neural networks\nSample Efficient Policy Search for Optimal Stopping Domains\nSynthesizing Imperative Programs from Examples Guided by Static Analysis\nDelving Deeper into MOOC Student Dropout Prediction\nUnsupervised Diverse Colorization via Generative Adversarial Networks\nTask-driven Visual Saliency and Attention-based Visual Question  Answering\nDeepCloak: Masking Deep Neural Network Models for Robustness Against  Adversarial Samples\nCausal Inference by Stochastic Complexity\nSolving DCOPs with Distributed Large Neighborhood Search\nA Realistic Dataset for the Smart Home Device Scheduling Problem for  DCOPs\nTheoretical and Experimental Analysis of the Canadian Traveler 
Problem\nSequence-based Multimodal Apprenticeship Learning For Robot Perception  and Decision Making\nEmbedding Knowledge Graphs Based on Transitivity and Antisymmetry of  Rules\nRationalization: A Neural Machine Translation Approach to Generating  Natural Language Explanations\nContractibility for Open Global Constraints\nStochastic Variance Reduction Methods for Policy Evaluation\nMaximum-Likelihood Augmented Discrete Generative Adversarial Networks\nReinforcement Learning with Deep Energy-Based Policies\nSynergistic Team Composition\nBalancing Lexicographic Fairness and a Utilitarian Objective with  Application to Kidney Exchange\nDifferentiable Learning of Logical Rules for Knowledge Base Reasoning\nCombining the $k$-CNF and XOR Phase-Transitions\nOptimal Experiment Design for Causal Discovery from Fixed Number of  Experiments\nShow, Attend and Interact: Perceivable Human-Robot Social Interaction  through Neural Attention Q-Network\nOptimal Categorical Attribute Transformation for Granularity Change in  Relational Databases for Binary Decision Problems in Educational Data Mining\nRobust Budget Allocation via Continuous Submodular Functions\nProportional Representation in Vote Streams\nBridging the Gap Between Value and Policy Based Reinforcement Learning\nLearning Conversational Systems that Interleave Task and Non-Task  Content\nInvestigating the Characteristics of One-Sided Matching Mechanisms Under  Various Preferences and Risk Attitudes\nA Hypercat-enabled Semantic Internet of Things Data Hub: Technical  Report\nHolStep: A Machine Learning Dataset for Higher-order Logic Theorem  Proving\nFast k-Nearest Neighbour Search via Prioritized DCI\nLearning to Optimize Neural Nets\nOptNet: Differentiable Optimization as a Layer in Neural Networks\nLearning Social Affordance Grammar from Videos: Transferring Human  Interactions to Human-Robot Interactions\nPMLB: A Large Benchmark Suite for Machine Learning Evaluation and  Comparison\nEvolving Deep Neural 
Networks\nAdaptive Matching for Expert Systems with Uncertain Task Types\nSampling Variations of Lead Sheets\nSLIM: Semi-Lazy Inference Mechanism for Plan Recognition\nUnsupervised Image-to-Image Translation Networks\nDAWT: Densely Annotated Wikipedia Texts across multiple languages\nToward Controlled Generation of Text\nA Laplacian Framework for Option Discovery in Reinforcement Learning\nBelief Propagation in Conditional RBMs for Structured Prediction\nEnd-to-End Task-Completion Neural Dialogue Systems\nLearning Robot Activities from First-Person Human Videos Using  Convolutional Future Regression\nFeUdal Networks for Hierarchical Reinforcement Learning\nTowards Monetary Incentives in Social Q&A Services\nGeneralised Discount Functions applied to a Monte-Carlo AImu  Implementation\nControlling for Unobserved Confounds in Classification Using  Correlational Constraints\nPrinciples and Examples of Plausible Reasoning and Propositional  Plausible Logic\nSound-Word2Vec: Learning Word Representations Grounded in Sounds\nA new belief Markov chain model and its application in inventory  prediction\nEvidential supplier selection based on interval data fusion\nGuarantees for Greedy Maximization of Non-submodular Functions with  Applications\nOn the Limits of Learning Representations with Label-Based Supervision\nCooperative Epistemic Multi-Agent Planning for Implicit Coordination\nDeep Robust Kalman Filter\nA quantum dynamic belief decision making model\nMulti-Robot Active Information Gathering with Periodic Communication\nCost-Optimal Learning of Causal Graphs\nTowards Generalization and Simplicity in Continuous Control\nIntroduction to Formal Concept Analysis and Its Applications in  Information Retrieval and Related Fields\nMemory Enriched Big Bang Big Crunch Optimization Algorithm for Data  Clustering\nA quantum dynamic belief model to explain the interference effects of  categorization on decision making\nLearning Invariant Feature Spaces to Transfer Skills with 
Reinforcement  Learning\nCombining Bayesian Approaches and Evolutionary Techniques for the  Inference of Breast Cancer Networks\nLearning the Probabilistic Structure of Cumulative Phenomena with  Suppes-Bayes Causal Networks\nEfficient Simulation of Financial Stress Testing Scenarios with  Suppes-Bayes Causal Networks\nInformation Extraction in Illicit Domains\nA Structured Self-attentive Sentence Embedding\nBehavior-based Navigation of Mobile Robot in Unknown Environments Using  Fuzzy Logic and Multi-Objective Optimization\nEmbedding Tarskian Semantics in Vector Spaces\nCounterfactuals, indicative conditionals, and negation under  uncertainty: Are there cross-cultural differences?\nLesionSeg: Semantic segmentation of skin lesions using Deep  Convolutional Neural Network\nWhat can you do with a rock? Affordance extraction via word embeddings\nUsing Options and Covariance Testing for Long Horizon Off-Policy Policy  Evaluation\nLearning Gradient Descent: Better Generalization and Longer Horizons\nApplying the Wizard-of-Oz Technique to Multimodal Human-Robot Dialogue\nRight for the Right Reasons: Training Differentiable Models by  Constraining their Explanations\nConvolutional Spike Timing Dependent Plasticity based Feature Learning  in Spiking Neural Networks\nEvolution Strategies as a Scalable Alternative to Reinforcement Learning\nFront-to-End Bidirectional Heuristic Search with Near-Optimal Node  Expansions\nReal-Time Machine Learning: The Missing Pieces\nMicro-Objective Learning : Accelerating Deep Reinforcement Learning  through the Discovery of Continuous Subgoals\nPrediction and Control with Temporal Segment Models\nAny-Angle Pathfinding for Multiple Agents Based on SIPP Algorithm\nDeep Value Networks Learn to Evaluate and Iteratively Refine Structured  Outputs\nTask-based End-to-end Model Learning in Stochastic Optimization\nFuzzy Model Tree For Early Effort Estimation\nLearning best K analogies from data distribution for case-based software  effort 
estimation\nMinimizing Maximum Regret in Commitment Constrained Sequential Decision  Making\nUnderstanding Black-box Predictions via Influence Functions\nWeighted Voting Via No-Regret Learning\nMaking Neural QA as Simple as Possible but not Simpler\nExploring the Combination Rules of D Numbers From a Perspective of  Conflict Redistribution\nResilience: A Criterion for Learning in the Presence of Arbitrary  Outliers\nLegal Question Answering using Ranking SVM and Deep Convolutional Neural  Network\nFinite Sample Analysis of Two-Timescale Stochastic Approximation with  Applications to Reinforcement Learning\nConvolutional Recurrent Neural Networks for Small-Footprint Keyword  Spotting\nMinimax Regret Bounds for Reinforcement Learning\nMachining of Spherical Component Fabricated by Selected Laser Melting:  Strategies and Equipment\nParticle Value Functions\nModeling Relational Data with Graph Convolutional Networks\nGeneralised Reichenbachian Common Cause Systems\nDeep Decentralized Multi-task Multi-Agent Reinforcement Learning under  Partial Observability\nAlgorithms for Semantic Segmentation of Multispectral Remote Sensing  Imagery using Deep Learning\nMulti-Timescale, Gradient Descent, Temporal Difference Learning with  Linear Options\nObject category understanding via eye fixations on freehand sketches\nEvidence Updating for Stream-Processing in Big-Data: Robust Conditioning  in Soft and Hard Fusion Environments\nQMDP-Net: Deep Learning for Planning under Partial Observability\nDistributed Constraint Problems for Utilitarian Agents with Privacy  Concerns, Recast as POMDPs\nInvestigation of Language Understanding Impact for Reinforcement  Learning Based Dialogue Systems\nInterest-Driven Discovery of Local Process Models\nDeep Learning for Explicitly Modeling Optimization Landscapes\nUnifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement  Learning\n\\$1 Today or \\$2 Tomorrow? 
The Answer is in Your Facebook Likes\nSupervised Typing of Big Graphs using Semantic Embeddings\nInformation-theoretic Model Identification and Policy Search using  Physics Engines with Application to Robotic Manipulation\nSelf corrective Perturbations for Semantic Segmentation and  Classification\nFast Stochastic Variance Reduced Gradient Method with Momentum  Acceleration for Machine Learning\nContainment for Rule-Based Ontology-Mediated Queries\nDistribution of Gaussian Process Arc Lengths\nAn overview of embedding models of entities and relationships for  knowledge base completion\nSemi-supervised Embedding in Attributed Networks with Outliers\nNote Value Recognition for Piano Transcription Using Markov Random  Fields\nSupervisor Synthesis of POMDP based on Automata Learning\nSmart Augmentation - Learning an Optimal Data Augmentation Strategy\nReasoning by Cases in Structured Argumentation\nCalendar.help: Designing a Workflow-Based Scheduling Agent with Humans  in the Loop\nOvercoming Catastrophic Forgetting by Incremental Moment Matching\nTeam Formation for Scheduling Educational Material in Massive Online  Classes\nOpen Vocabulary Scene Parsing\nInfoGAIL: Interpretable Imitation Learning from Visual Demonstrations\nSocially Aware Motion Planning with Deep Reinforcement Learning\nTransfer learning for music classification and regression tasks\nEnsembles of Deep LSTM Learners for Activity Recognition using Wearables\nAdversarial Transformation Networks: Learning to Generate Adversarial  Examples\nMining Best Closed Itemsets for Projection-antimonotonic Constraints in  Polynomial Time\nIs This a Joke? 
Detecting Humor in Spanish Tweets\nInverse Reinforcement Learning from Incomplete Observation Data\nPerception Driven Texture Generation\nBringing Salary Transparency to the World: Computing Robust Compensation  Insights via LinkedIn Salary\nLabelBank: Revisiting Global Perspectives for Semantic Segmentation\nOn Convergence Property of Implicit Self-paced Objective\nSpaceprint: a Mobility-based Fingerprinting Scheme for Public Spaces\nBandit-Based Model Selection for Deformable Object Manipulation\nEnter the Matrix: A Virtual World Approach to Safely Interruptable  Autonomous Systems\nEfficient Parallel Translating Embedding For Knowledge Graphs\nAn Empirical Approach for Modeling Fuzzy Geographical Descriptors\nSpeaking the Same Language: Matching Machine to Human Captions by  Adversarial Training\nBootstrapping Labelled Dataset Construction for Cow Tracking and  Behavior Analysis\nEvaluating Complex Task through Crowdsourcing: Multiple Views Approach\nReliable Decision Support using Counterfactual Models\nLearning Discourse-level Diversity for Neural Dialog Models using  Conditional Variational Autoencoders\nOntological Multidimensional Data Models and Contextual Data Qality\nAdversarial Connective-exploiting Networks for Implicit Discourse  Relation Classification\nAligned Image-Word Representations Improve Inductive Transfer Across  Vision-Language Tasks\nChained Multi-stream Networks Exploiting Pose, Motion, and Appearance  for Action Classification and Detection\nSemi-Supervised Generation with Cluster-aware Generative Models\nMulti-Advisor Reinforcement Learning\nDeriving Probability Density Functions from Probabilistic Functional  Programs\nAdaptive Motion Gaming AI for Health Promotion\nAn Ontological Architecture for Orbital Debris Data\nEmotional Chatting Machine: Emotional Conversation Generation with  Internal and External Memory\nProbabilistic Search for Structured Data via Probabilistic Programming  and Nonparametric Bayes\nNeural Audio Synthesis of 
Musical Notes with WaveNet Autoencoders\nMulti-Label Learning with Global and Local Label Correlation\nMultitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder  Based Speech Recognition\nLandmark Guided Probabilistic Roadmap Queries\nConformative Filtering for Implicit Feedback Data\nEncoder Based Lifelong Learning\nFrom Data to City Indicators: A Knowledge Graph for Supporting Automatic  Generation of Dashboards\nRecurrent Environment Simulators\nThresholding Bandits with Augmented UCB\nA Constrained Sequence-to-Sequence Neural Model for Sentence  Simplification\nBasic Formal Properties of A Relational Model of The Mathematical Theory  of Evidence\nMixed Graphical Models for Causal Analysis of Multi-modal Variables\nAdaptive Relaxed ADMM: Convergence Theory and Practical Implementation\nExploring Word Embeddings for Unsupervised Textual User-Generated  Content Normalization\nStochastic Neural Networks for Hierarchical Reinforcement Learning\nSemantically Consistent Regularization for Zero-Shot Recognition\nWRPN: Training and Inference using Wide Reduced-Precision Networks\nInterpretable Explanations of Black Boxes by Meaningful Perturbation\nUnsupervised Event Abstraction using Pattern Abstraction and Local  Process Models\nFinding Modes by Probabilistic Hypergraphs Shifting\nStigmergy-based modeling to discover urban activity patterns from  positioning data\nBeliefs in Markov Trees - From Local Computations to Local Valuation\nCounterexample Guided Inductive Optimization\nParallelized Kendall's Tau Coefficient Computation via SIMD Vectorized  Sorting On Many-Integrated-Core Processors\nDeep Reinforcement Learning-based Image Captioning with Embedding Reward\nValue Directed Exploration in Multi-Armed Bandits with Structured Priors\nVirtual to Real Reinforcement Learning for Autonomous Driving\nFully Distributed and Asynchronized Stochastic Gradient Descent for  Networked Systems\nSolving ill-posed inverse problems using iterative deep neural 
networks\nExplaining the Unexplained: A CLass-Enhanced Attentive Response (CLEAR)  Approach to Understanding Deep Neural Networks\nCBinfer: Change-Based Inference for Convolutional Neural Networks on  Video Data\nDeep API Programmer: Learning to Program with APIs\nAn entity-driven recursive neural network model for chinese discourse  coherence modeling\nIncremental learning of high-level concepts by imitation\nOptimizing Differentiable Relaxations of Coreference Evaluation Metrics\nTGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering\nThe Reactor: A Sample-Efficient Actor-Critic Architecture\nRACE: Large-scale ReAding Comprehension Dataset From Examinations\nLearn-Memorize-Recall-Reduce A Robotic Cloud Computing Paradigm\nA Novel Experimental Platform for In-Vessel Multi-Chemical Molecular  Communications\nBayesian Hybrid Matrix Factorisation for Data Integration\nMorpheo: Traceable Machine Learning on Hidden data\nAnomaly detection and motif discovery in symbolic representations of  time series\nUnderstanding Negations in Information Processing: Learning from  Replicating Human Behavior\nGeneralized Ideals and Co-Granular Rough Sets\n25 Tweets to Know You: A New Model to Predict Personality with Social  Media\nSimultaneous Policy Learning and Latent State Inference for Imitating  Driver Behavior\nUsing Contexts and Constraints for Improved Geotagging of Human  Trafficking Webpages\nAnswering Complex Questions Using Open Information Extraction\nA Large Self-Annotated Corpus for Sarcasm\nOCRAPOSE II: An OCR-based indoor positioning system using mobile phone  images\nUniversal Adversarial Perturbations Against Semantic Image Segmentation\nImportance Sampled Stochastic Optimization for Variational Inference\nNetwork Dissection: Quantifying Interpretability of Deep Visual  Representations\nAn Interpretable Knowledge Transfer Model for Knowledge Base Completion\nSemEval-2017 Task 8: RumourEval: Determining rumour veracity and support  for 
rumours\nPredicting Cognitive Decline with Deep Learning of Brain Metabolism and  Amyloid Imaging\nLearning to Acquire Information\nImproved Neural Relation Detection for Knowledge Base Question Answering\nOn Singleton Arc Consistency for CSPs Defined by Monotone Patterns\nA Semantic QA-Based Approach for Text Summarization Evaluation\nA Reinforcement Learning Approach to Weaning of Mechanical Ventilation  in Intensive Care Units\nGoverning Governance: A Formal Framework for Analysing Institutional  Design and Enactment Governance\nModular Multi-Objective Deep Reinforcement Learning with Decision Values\nA Review on Deep Learning Techniques Applied to Semantic Segmentation\nPopulation Seeding Techniques for Rolling Horizon Evolution in General  Video Game Playing\nNaturalizing a Programming Language via Interactive Learning\nEvaluating and Modelling Hanabi-Playing Agents\nAnalysis of Vanilla Rolling Horizon Evolution Parameters in General  Video Game Playing\nTuring at SemEval-2017 Task 8: Sequential Approach to Rumour Stance  Classification with Branch-LSTM\nPaying Attention to Descriptions Generated by Image Captioning Models\nMulti-Task Video Captioning with Video and Entailment Generation\nPPMF: A Patient-based Predictive Modeling Framework for Early ICU  Mortality Prediction\nLearning of Human-like Algebraic Reasoning Using Deep Feedforward Neural  Networks\nPath Planning with Kinematic Constraints for Robot Groups\nSemi-supervised Bayesian Deep Multi-modal Emotion Recognition\nMolecular De Novo Design through Deep Reinforcement Learning\nTaxonomy Induction using Hypernym Subsequences\nReinforcement Learning-based Thermal Comfort Control for Vehicle Cabins\nFrom Language to Programs: Bridging Reinforcement Learning and Maximum  Marginal Likelihood\nTraining L1-Regularized Models with Orthant-Wise Passive Descent  Algorithms\nThe loss surface of deep and wide neural networks\nUsing a new parsimonious AHP methodology combined with the Choquet  integral: An 
application for evaluating social housing initiatives\nA Generalization of Convolutional Neural Networks to Graph-Structured  Data\nNo More Discrimination: Cross City Adaptation of Road Scene Segmenters\nA quantitative assessment of the effect of different algorithmic schemes  to the task of learning the structure of Bayesian Networks\nParseval Networks: Improving Robustness to Adversarial Examples\nPast, Present, Future: A Computational Investigation of the Typology of  Tense in 1000 Languages\nNot All Dialogues are Created Equal: Instance Weighting for Neural  Conversational Models\nLearning to Ask: Neural Question Generation for Reading Comprehension\nDefense semantics of argumentation: encoding reasons for accepting  arguments\nDeriving Quests from Open World Mechanics\nTowards well-specified semi-supervised model-based classifiers via  structural adaptation\nMACA: A Modular Architecture for Conversational Agents\nArgumentation-based Security for Social Good\nThe Problem of Coincidence in A Theory of Temporal Multiple Recurrence\nAn improved Ant Colony System for the Sequential Ordering Problem\nNavigating Occluded Intersections with Autonomous Vehicles using Deep  Reinforcement Learning\nA Rule-Based Computational Model of Cognitive Arithmetic\nLifelong Metric Learning\nFormal Verification of Piece-Wise Linear Feed-Forward Neural Networks\nAnswer Set Programming for Non-Stationary Markov Decision Processes\nGroup invariance principles for causal generative models\nData Readiness Levels\nAnalogical Inference for Multi-Relational Embeddings\nInterface and Data Biopolitics in the Age of Hyperconnectivity\nPeople on Drugs: Credibility of User Statements in Health Communities\nExperimental results : Reinforcement Learning of POMDPs using Spectral  Methods\nA New Medical Diagnosis Method Based on Z-Numbers\nTrajectoryNet: An Embedded GPS Trajectory Representation for Point-based  Classification Using Recurrent Neural Networks\nCredible Review Detection with Limited 
Information using Consistency  Analysis\nItem Recommendation with Continuous Experience Evolution of Users using  Brownian Motion\nAirDraw: Leveraging Smart Watch Motion Sensors for Mobile Human Computer  Interactions\nMultimodal Affect Analysis for Product Feedback Assessment\nComputing an Approximately Optimal Agreeable Set of Items\nScene Text Eraser\nGeometric GAN\nMachine Learning with World Knowledge: The Position and Survey\nSafe and Nested Subgame Solving for Imperfect-Information Games\nWord and Phrase Translation with word2vec\nSolving a Path Planning Problem in a Partially Known Environment using a  Swarm Algorithm\nThe Imprecisions of Precision Measures in Process Mining\nAsynchronous Announcements\nSequential Dialogue Context Modeling for Spoken Language Understanding\nSolving Multi-Objective MDP with Lexicographic Preference: An  application to stochastic planning with multiple quantile objective\nFlexible and Creative Chinese Poetry Generation Using Neural Memory\nContext Attentive Bandits: Contextual Bandit with Restricted Context\nSurvey of Visual Question Answering: Datasets and Techniques\nSolving Distributed Constraint Optimization Problems Using Logic  Programming\nMemetic search for identifying critical nodes in sparse graphs\nProgram Induction by Rationale Generation : Learning to Solve and  Explain Algebraic Word Problems\nA First Empirical Study of Emphatic Temporal Difference Learning\nLearning to see people like people\nA Formal Characterization of the Local Search Topology of the Gap  Heuristic\nPerson Re-Identification by Deep Joint Learning of Multi-Loss  Classification\nAwareness improves problem-solving performance\nDiscrete Sequential Prediction of Continuous Actions for Deep RL\nSimulated Penetration Testing and Mitigation Analysis\nQuantifying Aspect Bias in Ordinal Ratings using a Bayesian Approach\nResumeVis: A Visual Analytics System to Discover Semantic Information in  Semi-structured Resume Data\nStrategically knowing 
how\nExploiting the Pruning Power of Strong Local Consistencies Through  Parallelization\nConstrained Bayesian Networks: Theory, Optimization, and Applications\nProbabilistically Safe Policy Transfer\nLearning Hard Alignments with Variational Inference\nOptimal Warping Paths are unique for almost every Pair of Time Series\nSubjective Knowledge Acquisition and Enrichment Powered By Crowdsourcing\nKnow-Evolve: Deep Temporal Reasoning for Dynamic Knowledge Graphs\nAll-relevant feature selection using multidimensional filters with  exhaustive search\nDemystifying Relational Latent Representations\nSmall cities face greater impact from automation\nAI, Native Supercomputing and The Revival of Moore's Law\nLearning to Represent Haptic Feedback for Partially-Observable Tasks\nIdentification and Off-Policy Learning of Multiple Objectives Using  Adaptive Clustering\nAutomatic Goal Generation for Reinforcement Learning Agents\nScalable Exact Parent Sets Identification in Bayesian Networks Learning  with Apache Spark\nVehicle Routing with Drones\nOnline learnability of Statistical Relational Learning in anomaly  detection\nAn evidential Markov decision making model\nContinuous Implicit Authentication for Mobile Devices based on Adaptive  Neuro-Fuzzy Inference System\nThe Conference Paper Assignment Problem: Using Order Weighted Averages  to Assign Indivisible Goods\nEstimating Accuracy from Unlabeled Data: A Probabilistic Logic Approach\nInduction of Interpretable Possibilistic Logic Theories from Relational  Data\nThe Bag Semantics of Ontology-Based Data Access\nVAE with a VampPrior\nModel-Based Planning with Discrete and Continuous Actions\nOn Convergence and Stability of GANs\nAIDE: An algorithm for measuring the accuracy of probabilistic inference  algorithms\nSearch Engine Guided Non-Parametric Neural Machine Translation\nFast Change Point Detection on Dynamic Social Networks\nMixed Membership Word Embeddings for Computational Social Science\nGeneralizing the Role of 
Determinization in Probabilistic Planning\nSketched Answer Set Programming\nStatistical inference using SGD\nA unified view of entropy-regularized Markov decision processes\nAsk the Right Questions: Active Question Reformulation with  Reinforcement Learning\nA Unified Approach to Interpreting Model Predictions\nSemantically Decomposing the Latent Spaces of Generative Adversarial  Networks\nPoincaré Embeddings for Learning Hierarchical Representations\nDetection Algorithms for Communication Systems Using Deep Learning\nEnhanced Experience Replay Generation for Efficient Reinforcement  Learning\nExplaining Transition Systems through Program Induction\nReinforcement Learning with a Corrupted Reward Channel\nSymbolic LTLf Synthesis\nPersonalized and Private Peer-to-Peer Machine Learning\nFormal Guarantees on the Robustness of a Classifier against Adversarial  Manipulation\nSecond-Order Word Embeddings from Nearest Neighbor Topological Features\nUplift Modeling with Multiple Treatments and General Response Types\nSelective Classification for Deep Neural Networks\nPredictive Analytics for Enhancing Travel Time Estimation in Navigation  Apps of Apple, Google, and Microsoft\nAn effective algorithm for hyperparameter optimization of neural  networks\nData-driven Random Fourier Features using Stein Effect\nSafe Model-based Reinforcement Learning with Stability Guarantees\nSemi-supervised Learning with GANs: Manifold Invariance with Improved  Inference\nFlow-GAN: Combining Maximum Likelihood and Adversarial Learning in  Generative Models\nCounterfactual Multi-Agent Policy Gradients\nState Space Decomposition and Subgoal Creation for Transfer in Deep  Reinforcement Learning\nPrincipled Hybrids of Generative and Discriminative Domain Adaptation\nOnline Edge Grafting for Efficient MRF Structure Learning\nCross-Domain Perceptual Reward Functions\nLearning Structured Text Representations\nFinding Robust Solutions to Stable Marriage\nNeural Attribute Machines for Program 
Generation\nFiltering Variational Objectives\nTogether We Know How to Achieve: An Epistemic Logic of Know-How\nDistributed Robust Subspace Recovery\nDiscovering Reliable Approximate Functional Dependencies\nMultimodal Machine Learning: A Survey and Taxonomy\nTaste or Addiction?: Using Play Logs to Infer Song Selection Motivation\nASR error management for improving spoken language understanding\nLogical and Inequality Implications for Reducing the Size and Complexity  of Quadratic Unconstrained Binary Optimization Problems\nClassification regions of deep neural networks\nAnalysis of universal adversarial perturbations\nBayesian GAN\nRisk-Sensitive Cooperative Games for Human-Machine Systems\nQuadratic Unconstrained Binary Optimization Problem Preprocessing:  Theory and Empirical Analysis\nInexpensive Cost-Optimized Measurement Proposal for Sequential  Model-Based Diagnosis\nRole Playing Learning for Socially Concomitant Mobile Robot Navigation\nKernel Implicit Variational Inference\nMachine Learned Learning Machines\nContextual Explanation Networks\nLearning End-to-end Multimodal Sensor Policies for Autonomous Navigation\nPreliminary results on Ontology-based Open Data Publishing\nDeep Learning is Robust to Massive Label Noise\nSemi-Supervised Learning for Detecting Human Trafficking\nMorphological Error Detection in 3D Segmentations\nTowards Learned Clauses Database Reduction Strategies Based on Dominance  Relationship\nPropositional Knowledge Representation in Restricted Boltzmann Machines\nAdversarial Generation of Natural Language\nNon-Markovian Control with Gated End-to-End Memory Policy Networks\nThe Atari Grand Challenge Dataset\nEnd-to-End Differentiable Proving\nControllable Invariance through Adversarial Feature Learning\nA Diversified Multi-Start Algorithm for Unconstrained Binary Quadratic  Problems Leveraging the Graphics Processor Unit\nDescriptions of Objectives and Processes of Mechanical Learning\nFree energy-based reinforcement learning using a 
quantum processor\nDiversified Top-k Partial MaxSAT Solving\nTeaching Machines to Describe Images via Natural Language Feedback\nOne button machine for automating feature engineering in relational  databases\nGrounding Symbols in Multi-Modal Instructions\nEnhancing workflow-nets with data for trace completion\nDiscovering Discrete Latent Topics with Neural Variational Inference\nSemantic Specialisation of Distributional Word Vector Spaces using  Monolingual and Cross-Lingual Constraints\nKnowledge Representation in Bicategories of Relations\nModeling Latent Attention Within Neural Networks\nJoint Matrix-Tensor Factorization for Knowledge Base Inference\nActor-Critic for Linearly-Solvable Continuous MDP with Partially Known  Dynamics\n3D Pathfinding and Collision Avoidance Using Uneven Search-space  Quantization and Visual Cone Search\nA method for the online construction of the set of states of a Markov  Decision Process using Answer Set Programming\nBatched Large-scale Bayesian Optimization in High-dimensional Spaces\nExtracting Hierarchies of Search Tasks & Subtasks via a Bayesian  Nonparametric Approach\nClassifying Documents within Multiple Hierarchical Datasets using  Multi-Task Learning\nA WL-SPPIM Semantic Model for Document Classification\nParameter Space Noise for Exploration\nEpistemic Logic with Functional Dependency Operator\nGuided Interaction Exploration in Artifact-centric Process Models\nImproving Max-Sum through Decimation to Solve Loopy Distributed  Constraint Optimization Problems\nRecurrent computations for visual pattern completion\nStochastic Global Optimization Algorithms: A Systematic Formal Approach\nInfoVAE: Information Maximizing Variational Autoencoders\nCan Computers overcome Humans? 
Consciousness interaction and its  implications\nMulti-Agent Actor-Critic for Mixed Cooperative-Competitive Environments\nGeneralized Value Iteration Networks: Life Beyond Lattices\nScaling up the Automatic Statistician: Scalable Structure Discovery  using Gaussian Processes\nDynamic Discovery of Type Classes and Relations in Semantic Web Data\nDynamic Integration of Background Knowledge in Neural NLU Systems\nSetting Players' Behaviors in World of Warcraft through Semi-Supervised  Learning\nThe FastMap Algorithm for Shortest Path Computations\nTIP: Typifying the Interpretability of Procedures\nStock Trading Using PE ratio: A Dynamic Bayesian Network Modeling on  Behavioral Finance and Fundamental Investment\nSymmetry Learning for Function Approximation in Reinforcement Learning\nA Focal Any-Angle Path-finding Algorithm Based on A* on Visibility  Graphs\nRethinking Skip-thought: A Neighborhood based Approach\nImage Matching via Loopy RNN\nACCNet: Actor-Coordinator-Critic Net for \"Learning-to-Communicate\" with  Deep Multi-agent Reinforcement Learning\nData-Efficient Policy Evaluation Through Behavior Policy Search\nYellowFin and the Art of Momentum Tuning\nNeural Domain Adaptation for Biomedical Question Answering\nDeep reinforcement learning from human preferences\nSemantic Entity Retrieval Toolkit\nCausal Discovery in the Presence of Measurement Error: Identifiability  Conditions\nFuzzy Recommendations in Marketing Campaigns\nRecommendations for Marketing Campaigns in Telecommunication Business  based on the footprint analysis\nA Supervised Approach to Extractive Summarisation of Scientific Papers\nOn Natural Language Generation of Formal Argumentation\nZero-Shot Relation Extraction via Reading Comprehension\nOptimization by a quantum reinforcement algorithm\nEnhanced discrete particle swarm optimization path planning for UAV  vision-based surface inspection\nSimultaneous merging multiple grid maps using the robust motion  averaging\nNeural Models for Key Phrase 
Detection and Question Generation\nConjunctions of Among Constraints\nZero-Shot Task Generalization with Multi-Task Deep Reinforcement  Learning\nJoint Extraction of Entities and Relations Based on a Novel Tagging  Scheme\nDeal or No Deal? End-to-End Learning for Negotiation Dialogues\nFrom Propositional Logic to Plausible Reasoning: A Uniqueness Theorem\nValue-Decomposition Networks For Cooperative Multi-Agent Learning\nVariants of RMSProp and Adagrad with Logarithmic Regret Bounds\nEvaluating the quality of tourist agendas customized to different travel  styles\nCapacity Releasing Diffusion for Speed and Locality\nModified Frank-Wolfe Algorithm for Enhanced Sparsity in Support Vector  Machine Classifiers\nLearning to Schedule Deadline- and Operator-Sensitive Tasks\nSolving Integer Linear Programs with a Small Number of Global Variables  and Constraints\nMulti-Label Annotation Aggregation in Crowdsourcing\nUser Intent Classification using Memory Networks: A Comparative Analysis  for a Limited Data Scenario\nmeProp: Sparsified Back Propagation for Accelerated Deep Learning with  Reduced Overfitting\nSub-domain Modelling for Dialogue Management with Hierarchical  Reinforcement Learning\nDualing GANs\nThe Complexity of Campaigning\nSession Analysis using Plan Recognition\nA Thorough Formalization of Conceptual Spaces\nTowards Proof Synthesis Guided by Neural Machine Translation for  Intuitionistic Propositional Logic\nOptimal modularity and memory capacity of neural networks\nRobust and Efficient Transfer Learning with Hidden-Parameter Markov  Decision Processes\nWord-Entity Duet Representations for Document Ranking\nToward Real-Time Decentralized Reinforcement Learning using Finite  Support Basis Functions\nNPGLM: A Non-Parametric Method for Temporal Link Prediction\nStructure Learning in Motor Control:A Deep Reinforcement Learning Model\nWeb-STAR: Towards a Visual Web-Based IDE for a Story Comprehension  System\nCAN: Creative Adversarial Networks, Generating 
\"Art\" by Learning About  Styles and Deviating from Style Norms\nExplaining Recurrent Neural Network Predictions in Sentiment Analysis\nGated-Attention Architectures for Task-Oriented Language Grounding\nA Framework for Accurate Drought Forecasting System Using  Semantics-Based Data Integration Middleware\nModel Selection with Nonlinear Embedding for Unsupervised Domain  Adaptation\nA-NICE-MC: Adversarial Training for MCMC\nUnsupervised Learning of Frustrated Classical Spin Models I: Principle  Component Analysis\nFinding optimal finite biological sequences over finite alphabets: the  OptiFin toolbox\nSpecifying Non-Markovian Rewards in MDPs Using LDL on Finite Traces  (Preliminary Version)\nRandom Forests for Industrial Device Functioning Diagnostics Using  Wireless Sensor Networks\nThere and Back Again: A General Approach to Learning Sparse Models\nThe Boolean Solution Problem from the Perspective of Predicate Logic -  Extended Version\nGenerative Encoder-Decoder Models for Task-Oriented Spoken Dialog  Systems with Chatting Capability\nNatural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog\nNeural Question Answering at BioASQ 5B\nDeveloping Bug-Free Machine Learning Systems With Formal Mathematics\nRelating Complexity-theoretic Parameters with SAT Solver Performance\nTimeNet: Pre-trained deep recurrent neural network for time series  classification\nGradient Episodic Memory for Continual Learning\nTraining a Fully Convolutional Neural Network to Route Integrated  Circuits\nStrategyproof Mechanisms for Additively Separable Hedonic Games and  Fractional Hedonic Games\nA Pig, an Angel and a Cactus Walk Into a Blender: A Descriptive Approach  to Visual Blending\nGenerative Bridging Network in Neural Sequence Prediction\nHierarchical Attentive Recurrent Tracking\nPath planning for Robotic Mobile Fulfillment Systems\nDynASP2.5: Dynamic Programming on Tree Decompositions in Action\nDefault Logic and Bounded Treewidth\nNeural SLAM: Learning to Explore with 
External Memory\nPath Integral Networks: End-to-End Differentiable Optimal Control\nIndoor UAV scheduling with Restful Task Assignment Algorithm\nSpeaker Identification in each of the Neutral and Shouted Talking  Environments based on Gender-Dependent Approach Using SPHMMs\nProviding Effective Real-time Feedback in Simulation-based Surgical  Training\nA ROS multi-ontology references services: OWL reasoners and application  prototyping issues\nStatistical Analysis of Dice CAPTCHA Usability\nA reliability-based approach for influence maximization using the  evidence theory\nTowards Understanding Generalization of Deep Learning: Perspective of  Loss Landscapes\nBridging the Gap between Probabilistic and Deterministic Models: A  Simulation Study on a Variational Bayes Predictive Coding Recurrent Neural  Network Model\nA study of existing Ontologies in the IoT-domain\nSample-efficient Actor-Critic Reinforcement Learning with Supervised  Data for Dialogue Management\nTeacher-Student Curriculum Learning\nIdentifying hazardousness of sewer pipeline gas mixture using  classification methods: a comparative study\nSubmodular Function Maximization for Group Elevator Scheduling\nA Reverse Hex Solver\nModeling preference time in middle distance triathlons\nStructure Optimization for Deep Multimodal Fusion Networks using  Graph-Induced Kernels\nVisualizing the Consequences of Evidence in Bayesian Networks\nConditional generation of multi-modal data using constrained embedding  space mapping\nWindow-of-interest based Multi-objective Evolutionary Search for  Satisficing Concepts\nDissipative quantum bifurcation machine: Quantum heating of coupled  nonlinear oscillators\nInterpretable & Explorable Approximations of Black Box Models\nUnsupervised Submodular Rank Aggregation on Score-based Permutations\nSentiment Identification in Code-Mixed Social Media Text\nThe impossibility of \"fairness\": a generalized impossibility result for  decisions\nGraph Based Recommendations: From Data 
Representation to Feature  Extraction and Application\nSADA: A General Framework to Support Robust Causation Discovery with  Theoretical Guarantee\nLearning to Design Games: Strategic Environments in Deep Reinforcement  Learning\nImproving Content-Invariance in Gated Autoencoders for 2D and 3D Object  Rotation\nModel enumeration in propositional circumscription via unsatisfiable  core analysis\nHindsight Experience Replay\nOptimal Vehicle Dispatching Schemes via Dynamic Pricing\nCNN features are also great at unsupervised classification\nCross-linguistic differences and similarities in image descriptions\nTrust-PCL: An Off-Policy Trust Region Method for Continuous Control\nLong-Term Memory Networks for Question Answering\nNetworked Fairness in Cake Cutting\nMethods for finding leader--follower equilibria with multiple followers\nA parallel corpus of Python functions and documentation strings for  automated code documentation and code generation\nMeasuring Relations Between Concepts In Conceptual Spaces\nEvaluating race and sex diversity in the world's largest companies using  deep neural networks\nTowards Zero-Shot Frame Semantic Parsing for Domain Scaling\nA Fast Integrated Planning and Control Framework for Autonomous Driving  via Imitation Learning\nNeural Machine Translation between Herbal Prescriptions and Diseases\nUnderstanding State Preferences With Text As Data: Introducing the UN  General Debate Corpus\nTowards Crafting Text Adversarial Samples\nA Brief Survey of Text Mining: Classification, Clustering and Extraction  Techniques\nVision-Based Multi-Task Manipulation for Inexpensive Robots Using  End-To-End Learning from Demonstration\nLexicographic choice functions\nA Survey on Resilient Machine Learning\nAccelerated Variance Reduced Stochastic ADMM\nAutomated Game Design Learning\nCHARDA: Causal Hybrid Automata Recovery via Dynamic Analysis\nDetecting Policy Preferences and Dynamics in the UN General Debate with  Neural Word Embeddings\nValue Prediction 
Network\nDeep Learning for Sensor-based Activity Recognition: A Survey\nUsing RDF Summary Graph For Keyword-based Semantic Searches\nSource-Target Inference Models for Spatial Instruction Understanding\nIndependence, Conditionality and Structure of Dempster-Shafer Belief  Functions\nIdentification and Interpretation of Belief Structure in Dempster-Shafer  Theory\nAutomatic Mapping of NES Games with Mappy\nLarge Scale Variable Fidelity Surrogate Modeling\nA Brief Study of In-Domain Transfer and Learning from Fewer Samples  using A Few Simple Priors\nDependency Injection for Programming by Optimization\nConstraints, Lazy Constraints, or Propagators in ASP Solving: An  Empirical Analysis\nLarge-scale Video Classification guided by Batch Normalized LSTM  Translator\nStable Distribution Alignment Using the Dual of the Adversarial Distance\nClingo goes Linear Constraints over Reals and Integers\nNeural Networks for Information Retrieval\nLithium NLP: A System for Rich Information Extraction from Noisy User  Generated Text on Social Media\nHot-Rodding the Browser Engine: Automatic Configuration of JavaScript  Compilers\nOn (Anti)Conditional Independence in Dempster-Shafer Theory\nPredicting Abandonment in Online Coding Tutorials\nBayesian Optimization for Probabilistic Programs\nFreeway Merging in Congested Traffic based on Multipolicy Decision  Making with Passive Actor Critic\nReliability Assessment of Distribution System Using Fuzzy Logic for  Modelling of Transformer and Line Uncertainties\nGLSR-VAE: Geodesic Latent Space Regularization for Variational  AutoEncoder Architectures\nTunnel Effects in Cognition: A new Mechanism for Scientific Discovery  and Education\nImproving Adherence to Heart Failure Management Guidelines via Abductive  Reasoning\nOnline Multi-Armed Bandit\nCoalition formation for Multi-agent Pursuit based on Neural Network and  AGRMF Model\ngraph2vec: Learning Distributed Representations of Graphs\nThe Power of Constraint Grammars Revisited\nPDD 
Graph: Bridging Electronic Medical Records and Biomedical Knowledge  Graphs via Entity Linking\nHoudini: Fooling Deep Structured Prediction Models\nOrder-Free RNN with Visual Attention for Multi-Label Classification\nGrounding Spatio-Semantic Referring Expressions for Human-Robot  Interaction\nOn-line Building Energy Optimization using Deep Reinforcement Learning\nDeformable Part-based Fully Convolutional Network for Object Detection\nEntropy-based Pruning for Learning Bayesian Networks using BIC\nImagination-Augmented Agents for Deep Reinforcement Learning\nCrowdsourcing Multiple Choice Science Questions\nWorst-case vs Average-case Design for Estimation from Fixed Pairwise  Comparisons\nThe Role of Conversation Context for Sarcasm Detection in Online  Interactions\nComputing LPMLN Using ASP and MLN Solvers\nFully Decentralized Policies for Multi-Agent Systems: An Information  Theoretic Approach\nVideo Question Answering via Attribute-Augmented Attention Network  Learning\nSequential Lifted Bayesian Filtering in Multiset Rewriting Systems\nAn All-in-One Network for Dehazing and Beyond\nDeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning\nAn Infinite Hidden Markov Model With Similarity-Biased Transitions\nOn the Computation of Paracoherent Answer Sets\nA Distributional Perspective on Reinforcement Learning\nA Framework for Easing the Development of Applications Embedding Answer  Set Programming\nSociety-in-the-Loop: Programming the Algorithmic Social Contract\nPreference Reasoning in Matching Procedures: Application to the  Admission Post-Baccalaureat Platform\nPrediction-Constrained Training for Semi-Supervised Mixture and Topic  Models\nReinforcement Learning for Bandit Neural Machine Translation with  Simulated Human Feedback\nLikelihood Estimation for Generative Adversarial Networks\nLearning Rare Word Representations using Semantic Bridging\nShare your Model instead of your Data: Privacy Preserving Mimic Learning  for Ranking\nImprove 
Lexicon-based Word Embeddings By Word Sense Disambiguation\nEvaluation of Semantic Web Technologies for Storing Computable  Definitions of Electronic Health Records Phenotyping Algorithms\nExtracting Core Claims from Scientific Articles\nDesensitized RDCA Subspaces for Compressive Privacy in Machine Learning\nMutual Alignment Transfer Learning\nStructural Regularities in Text-based Entity Vector Spaces\nUn modèle pour la représentation des connaissances temporelles dans  les documents historiques\nPrice and Profit Awareness in Recommender Systems\nA Survey on Multi-Task Learning\nPhysical problem solving: Joint planning with symbolic, geometric, and  dynamic constraints\nClosed-Loop Policies for Operational Tests of Safety-Critical Systems\nThe Advantage of Evidential Attributes in Social Networks\nA Decidable Very Expressive Description Logic for Databases (Extended  Version)\nDARLA: Improving Zero-Shot Transfer in Reinforcement Learning\nGuiding Reinforcement Learning Exploration Using Natural Language\nRobust Rigid Point Registration based on Convolution of Adaptive  Gaussian Mixture Models\nA Tale of Two DRAGGNs: A Hybrid Approach for Interpreting  Action-Oriented and Goal-Oriented Instructions\nCommon Knowledge in a Logic of Gossips\nPreservation of Semantic Properties during the Aggregation of Abstract  Argumentation Frameworks\nAn Epistemic Foundation for Authentication Logics (Extended Abstract)\nGroup Recommendations: Axioms, Impossibilities, and Random Walks\nArgument-based Belief in Topological Structures\nReconciling Bayesian Epistemology and Narration-based Approaches to  Judiciary Fact-finding\nA New Modal Framework for Epistemic Logic\nLeveraging Demonstrations for Deep Reinforcement Learning on Robotics  Problems with Sparse Rewards\nDeep Residual Learning for Weakly-Supervised Relation Extraction\nLearning to Teach Reinforcement Learning Agents\nMEMEN: Multi-layer Embedding with Memory Networks for Machine  Comprehension\nData-Driven Stochastic 
Robust Optimization: A General Computational  Framework and Algorithm for Optimization under Uncertainty in the Big Data  Era\nRecurrent Ladder Networks\nThe Topology of Statistical Verifiability\nPhotographic Image Synthesis with Cascaded Refinement Networks\nMethod and apparatus for automatic text input insertion in digital  devices with a restricted number of keys\nVirtual PET Images from CT Data Using Deep Convolutional Networks:  Initial Results\nDeveloping Knowledge-enhanced Chronic Disease Risk Prediction Models  from Regional EHR Repositories\nFashioning with Networks: Neural Style Transfer to Design Clothes\nDeep Convolutional Framelet Denosing for Low-Dose CT via Wavelet  Residual Network\nLearned in Translation: Contextualized Word Vectors\nA Labelling Framework for Probabilistic Argumentation\nQuantum Projective Simulation with Hamiltonian Evolution: A study in  reinforcement learning\nNeural Rating Regression with Abstractive Tips Generation for  Recommendation\nFast Preprocessing for Robust Face Sketch Synthesis\nCREST: Convolutional Residual Learning for Visual Tracking\nUsing Program Induction to Interpret Transition System Dynamics\nHierarchical Subtask Discovery With Non-Negative Matrix Factorization\n\"I can assure you [$\\ldots$] that it's going to be all right\" -- A  definition, case for, and survey of algorithmic assurances in human-autonomy  trust relationships\nOn the Importance of Consistency in Training Deep Neural Networks\nFairness-aware machine learning: a perspective\nGraph-based Features for Automatic Online Abuse Detection\nIndependently Controllable Factors\nEffective sketching methods for value function approximation\nThe UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task\nAgent based Tools for Modeling and Simulation of Self-Organization in  Peer-to-Peer, Ad-Hoc and other Complex Networks\nGame theory models for communication between agents: a review\n3D-PRNN: Generating Shape Primitives with Recurrent Neural 
Networks\nBoosting Variational Inference: an Optimization Perspective\ne-QRAQ: A Multi-turn Reasoning Dataset and Simulator with Explanations\nDeclarative Statistics\nAn Information-Theoretic Optimality Principle for Deep Reinforcement  Learning\nTraining of Deep Neural Networks based on Distance Measures using  RMSProp\nWhy Adaptively Collected Data Have Negative Bias and How to Correct for  It\nSTARDATA: A StarCraft AI Research Dataset\nA Characterization of Monotone Influence Measures for Data  Classification\nAsking Too Much? The Rhetorical Role of Questions in Political Discourse\nGenerative Statistical Models with Self-Emergent Grammar of Chord  Sequences\nReal-Time Visual Localisation in a Tagged Environment\nReinforced Video Captioning with Entailment Rewards\nShortcut-Stacked Sentence Encoders for Multi-Domain Inference\nMultibiometric Secure System Based on Deep Learning\nTowards A Novel Unified Framework for Developing Formal, Network and  Validated Agent-Based Simulation Models of Complex Adaptive Systems\nBeyond the technical challenges for deploying Machine Learning solutions  in a software company\nLearning how to Active Learn: A Deep Reinforcement Learning Approach\nStochastic Optimization with Bandit Sampling\nTips and Tricks for Visual Question Answering: Learnings from the 2017  Challenge\nDecoupled Learning of Environment Characteristics for Safe Exploration\nThe Tensor Memory Hypothesis\nRole of Secondary Attributes to Boost the Prediction Accuracy of  Students Employability Via Data Mining\nAddendum to: Summary Information for Reasoning About Hierarchical Plans\nPreference fusion and Condorcet's Paradox under uncertainty\nThinking, Fast and Slow: Combining Vector Spaces and Knowledge Graphs\nSemi-supervised emotion lexicon expansion with label propagation and  specialized word embeddings\nCounterexample Guided Inductive Optimization Applied to Mobile Robots  Path Planning (Extended Version)\nDeep Object-Centric Representations for 
Generalizable Robot Learning\nMotion Planning under Partial Observability using Game-Based Abstraction\nBenchmark Environments for Multitask Learning in Continuous Domains\nGraph Classification via Deep Learning with Virtual Nodes\nLearning from Noisy Label Distributions\nAutomatic Summarization of Online Debates\nWeighted parallel SGD for distributed unbalanced-workload training  system\nNew Ideas for Brain Modelling 4\nMaximum A Posteriori Inference in Sum-Product Networks\nmAnI: Movie Amalgamation using Neural Imitation\nVisualizing and Exploring Dynamic High-Dimensional Datasets with  LION-tSNE\nCross-lingual Entity Alignment via Joint Attribute-Preserving Embedding\nThe Mean and Median Criterion for Automatic Kernel Bandwidth Selection  for Support Vector Data Description\nThe Size of a Hyperball in a Conceptual Space\nCultural Structures of Knowledge from Wikipedia Networks of First Links\nExploring Directional Path-Consistency for Solving Constraint Networks\nLADDER: A Human-Level Bidding Agent for Large-Scale Real-Time Online  Auctions\nAn Improved Residual LSTM Architecture for Acoustic Modeling\nHuman Uncertainty and Ranking Error -- The Secret of Successful  Evaluation in Predictive Data Mining\nA Stronger Foundation for Computer Science and P=NP\nApplying Deep Bidirectional LSTM and Mixture Density Network for  Basketball Trajectory Prediction\nA Brief Survey of Deep Reinforcement Learning\nA novel agent-based simulation framework for sensing in complex adaptive  environments\nSolving a New 3D Bin Packing Problem with Deep Reinforcement Learning  Method\nSoftware-Defined Robotics -- Idea & Approach\nApplying Data Augmentation to Handwritten Arabic Numeral Recognition  Using Deep Learning Neural Networks\nA Batch Noise Contrastive Estimation Approach for Training Large  Vocabulary Language Models\nMore cat than cute? 
Interpretable Prediction of Adjective-Noun Pairs\nNeural Block Sampling\nVector Space Model as Cognitive Space for Text Classification\nComparative Benchmarking of Causal Discovery Techniques\nProbabilistic Relation Induction in Vector Space Embeddings\nThe CARESSES EU-Japan project: making assistive robots culturally  competent\nNetwork Model Selection for Task-Focused Attributed Network Inference\nOn a Formal Model of Safe and Scalable Self-driving Cars\nReinforcement Learning in POMDPs with Memoryless Options and  Option-Observation Initiation Sets\nHuman Action Recognition System using Good Features and Multilayer  Perceptron Network\nAnalysis of the Impact of Negative Sampling on Link Prediction in  Knowledge Graphs\nClassification of Radiology Reports Using Neural Attention Models\nAnytime Neural Network: a Versatile Trade-off Between Computation and  Accuracy\nSkip RNN: Learning to Skip State Updates in Recurrent Neural Networks\nOn Relaxing Determinism in Arithmetic Circuits\nLearning Deep Neural Network Representations for Koopman Operators of  Nonlinear Dynamical Systems\nNon-linear Convolution Filters for CNN-based Learning\nCapturing Long-term Temporal Dependencies with Convolutional Networks  for Continuous Emotion Recognition\nSingle Reference Image based Scene Relighting via Material Guided  Filtering\nA Survey of Human Activity Recognition Using WiFi CSI\nTowards an Automatic Turing Test: Learning to Evaluate Dialogue  Responses\nA Study on Neural Network Language Modeling\nLearning Generalized Reactive Policies using Deep Neural Networks\nAchieving Proportional Representation via Voting\nReinforcement Mechanism Design for e-commerce\nUnderstanding and Comparing Deep Neural Networks for Age and Gender  Classification\nMulti-Agent Q-Learning for Minimizing Demand-Supply Power Deficit in  Microgrids\n$k$-Nearest Neighbor Augmented Neural Networks for Text Classification\n3D Object Reconstruction from a Single Depth View with Adversarial  
Learning\nNovel Sensor Scheduling Scheme for Intruder Tracking in Energy Efficient  Sensor Networks\nRIOT: a Novel Stochastic Method for Rapidly Configuring Cloud-Based  Workflows\nOn Type-Aware Entity Retrieval\nThe Convergence of Machine Learning and Communications\nDeep Belief Networks used on High Resolution Multichannel  Electroencephalography Data for Seizure Detection\nDeep Learning for Accelerated Reliability Analysis of Infrastructure  Networks\nSafe Reinforcement Learning via Shielding\nUnifying DAGs and UGs\nHow 5G (and concomitant technologies) will revolutionize healthcare\nLimiting the Reconstruction Capability of Generative Neural Network  using Negative Learning\nModelling Protagonist Goals and Desires in First-Person Narrative\nQuality and Diversity Optimization: A Unifying Modular Framework\nPros and cons gamification and gaming in classroom\nEnd-to-end Training for Whole Image Breast Cancer Diagnosis using An All  Convolutional Design\nLearning Fine-Grained Knowledge about Contingent Relations between  Everyday Events\nInference of Fine-Grained Event Causality from Blogs and Films\nLearning what to read: Focused machine reading\nOrder-Planning Neural Text Generation From Structured Data\nInferring Networked Device Categories from Low-Level Activity Indicators\nAn Automated Compatibility Prediction Engine using DISC Theory Based  Classification and Neural Networks\nXFlow: 1D-2D Cross-modal Deep Neural Networks for Audiovisual  Classification\nTopic Independent Identification of Agreement and Disagreement in Social  Media Dialogue\nDifficulty-level Modeling of Ontology-based Factual Questions\nInteractive Attention Networks for Aspect-Level Sentiment Classification\nAutomation of Android Applications Testing Using Machine Learning  Activities Classification\nA Computer Composes A Fabled Problem: Four Knights vs. 
Queen\nALICE: Towards Understanding Adversarial Learning for Joint Distribution  Matching\nSpeeding-up the decision making of a learning agent using an ion trap  quantum processor\nA Generic Approach for Escaping Saddle points\nLearning the PE Header, Malware Detection with Minimal Domain Knowledge\nFine-tuning deep CNN models on specific MS COCO categories\nA second order primal-dual method for nonsmooth convex composite  optimization\nMachine Learning and Social Robotics for Detecting Early Signs of  Dementia\nBayesian Optimisation for Safe Navigation under Localisation Uncertainty\nUncertainty-Aware Learning from Demonstration using Mixture Density  Networks with Sampling-Free Variance Modeling\nLearning from lions: inferring the utility of agents from their  trajectories\nInferring Generative Model Structure with Static Analysis\nObject-Oriented Knowledge Extraction using Universal Exploiters\nIdentifying Mirror Symmetry Density with Delay in Spiking Neural  Networks\nSemantic Preserving Embeddings for Generalized Graphs\nUncertainty measurement with belief entropy on interference effect in  Quantum-Like Bayesian Networks\nVariable Annealing Length and Parallelism in Simulated Annealing\nComputational Machines in a Coexistence with Concrete Universals and  Data Streams\nExpert Opinion Extraction from a Biomedical Database\nCellular Automaton Based Simulation of Large Pedestrian Facilities - A  Case Study on the Staten Island Ferry Terminals\nA Planning Approach to Monitoring Behavior of Computer Programs\nCLAD: A Complex and Long Activities Dataset with Rich Crowdsourced  Annotations\nArt of singular vectors and universal adversarial perturbations\nA Practically Competitive and Provably Consistent Algorithm for Uplift  Modeling\nRRA: Recurrent Residual Attention for Sequence Learning\nEnd-to-End United Video Dehazing and Detection\nAffective Neural Response Generation\nExplore, Exploit or Listen: Combining Human Feedback and Policy Model to  Speed up Deep 
Reinforcement Learning in 3D Worlds\nRefining Source Representations with Relation Networks for Neural  Machine Translation\nInformation Design in Crowdfunding under Thresholding Policies\nParallelizing Linear Recurrent Neural Nets Over Sequence Length\nOn labeling Android malware signatures using minhashing and further  classification with Structural Equation Models\nA Comparison of Public Causal Search Packages on Linear, Gaussian Data  with No Latent Variables\nAction Schema Networks: Generalised Policies with Deep Learning\nAutomated Cloud Provisioning on AWS using Deep Reinforcement Learning\nNeural Network Based Nonlinear Weighted Finite Automata\nVisualizations for an Explainable Planning Agent\nWorkflow Complexity for Collaborative Interactions: Where are the  Metrics? -- A Challenge\nPredicting Organic Reaction Outcomes with Weisfeiler-Lehman Network\nAutonomous Extracting a Hierarchical Structure of Tasks in Reinforcement  Learning and Multi-task Reinforcement Learning\nA Framework for Generalizing Graph-based Representation Learning Methods\nWarmstarting of Model-based Algorithm Configuration\nKBLRN : End-to-End Learning of Knowledge Base Representations with  Latent, Relational, and Numerical Features\nThe Conditional Analogy GAN: Swapping Fashion Articles on People Images\nDenoising Autoencoders for Overgeneralization in Neural Networks\nFast semi-supervised discriminant analysis for binary classification of  large data-sets\nOne-Shot Visual Imitation Learning via Meta-Learning\nShared Learning : Enhancing Reinforcement in $Q$-Ensembles\nClickBAIT: Click-based Accelerated Incremental Training of Convolutional  Neural Networks\nQuery-based Attention CNN for Text Similarity Map\nFeature-Fused SSD: Fast Detection for Small Objects\nA Spectral Method for Activity Shaping in Continuous-Time Information  Cascades\nSupervising Unsupervised Learning\nEmbedding Deep Networks into Visual Explanations\nThe Uncertainty Bellman Equation and 
Exploration\nScene-centric Joint Parsing of Cross-view Videos\nProcess-oriented Iterative Multiple Alignment for Medical Process Mining\nReinforcement Learning Based Conversational Search Assistant\nA Categorical Approach for Recognizing Emotional Effects of Music\nAI Programmer: Autonomously Creating Software Programs Using Genetic  Algorithms\nSim-to-real Transfer of Visuo-motor Policies for Reaching in Clutter:  Domain Randomization and Adaptation with Modular Networks\nDirection-Aware Semi-Dense SLAM\nRelational Marginal Problems: Theory and Estimation\nZhuSuan: A Library for Bayesian Deep Learning\nKernel Cross-Correlator\nThe shortest way to visit all metro lines in a city\nDeep Graph Attention Model\nFeedforward and Recurrent Neural Networks Backward Propagation and  Hessian in Matrix Form\nWhen is a Convolutional Filter Easy To Learn?\nDropoutDAgger: A Bayesian Approach to Safe Imitation Learning\nOn the Complexity of Robust Stable Marriage\nOnline algorithms for POMDPs with continuous state, action, and  observation spaces\nA Comparative Quantitative Analysis of Contemporary Big Data Clustering  Algorithms for Market Segmentation in Hospitality Industry\nIncorrigibility in the CIRL Framework\nSparse Markov Decision Processes with Causal Sparse Tsallis Entropy  Regularization for Reinforcement Learning\nInteractive Music Generation with Positional Constraints using  Anticipation-RNNs\nLearning to update Auto-associative Memory in Recurrent Neural Networks  for Improving Sequence Memorization\nSummable Reparameterizations of Wasserstein Critics in the  One-Dimensional Setting\nDeep Reinforcement Learning that Matters\nVerifying Properties of Binarized Deep Neural Networks\nWhy PairDiff works? 
-- A Mathematical Analysis of Bilinear Relational  Compositional Operators for Analogy Detection\nOptionGAN: Learning Joint Reward-Policy Options using Generative  Adversarial Inverse Reinforcement Learning\nA Voting-Based System for Ethical Decision Making\nTemporal Pattern Mining from Evolving Networks\nOpen Source Dataset and Deep Learning Models for Online Digit Gesture  Recognition on Touchscreens\nBayesian Optimization with Automatic Prior Selection for Data-Efficient  Direct Policy Search\nDeep Reinforcement Learning for Dexterous Manipulation with Concept  Networks\nOn Compiling DNNFs without Determinism\nPractical Machine Learning for Cloud Intrusion Detection: Challenges and  the Way Forward\nFeature Engineering for Predictive Modeling using Reinforcement Learning\nConvolutional neural networks that teach microscopes how to image\nExact Learning of Lightweight Description Logic Ontologies\nNeural Optimizer Search with Reinforcement Learning\nComplexity of Scheduling Charging in the Smart Grid\nRobust Optimization of Unconstrained Binary Quadratic Problems\nDefining a Lingua Franca to Open the Black Box of a Naïve Bayes  Recommender\nEB-GLS: An Improved Guided Local Search Based on the Big Valley  Structure\nHierarchical Detail Enhancing Mesh-Based Shape Generation with 3D  Generative Adversarial Network\nInverse Reinforcement Learning with Conditional Choice Probabilities\nPredicting Runtime Distributions using Deep Neural Networks\nCode Attention: Translating Code to Comments by Exploiting Domain  Features\nOptLayer - Practical Constrained Optimization for Deep Reinforcement  Learning in the Real World\nHumanoid Robots as Agents of Human Consciousness Expansion\nOn overfitting and asymptotic bias in batch reinforcement learning with  partial observability\nQuantum Memristors in Quantum Photonics\nGeneralized Quantum Reinforcement Learning with Quantum Technologies\nEfficiently Discovering Locally Exceptional yet Globally Representative  
Subgroups\nSemi-Supervised Hierarchical Semantic Object Parsing\nObject-Oriented Knowledge Representation and Data Storage Using  Inhomogeneous Classes\nTowards Classification of Web ontologies using the Horizontal and  Vertical Segmentation\nPrioritized Norms in Formal Argumentation\nCross-modal Recurrent Models for Weight Objective Prediction from  Multimodal Time-series Data\nSelf-supervised learning: When is fusion of the primary and secondary  sensor cue useful?\nIntrusions in Marked Renewal Processes\nAn Optimal Online Method of Selecting Source Policies for Reinforcement  Learning\nLearning Unmanned Aerial Vehicle Control for Autonomous Target Following\nHDLTex: Hierarchical Deep Learning for Text Classification\nLearning Graph-Structured Sum-Product Networks for Probabilistic  Semantic Maps\n\"Let me convince you to buy my product ... \": A Case Study of an  Automated Persuasive System for Fashion Products\nDeep Learning Based Cryptographic Primitive Classification\nNon-iterative Label Propagation on Optimal Leading Forest\nTowards continuous control of flippers for a multi-terrain robot using  deep reinforcement learning\nEnhanced Quantum Synchronization via Quantum Machine Learning\nEnsemble Classifier for Eye State Classification using EEG Signals\nTowards automation of data quality system for CERN CMS experiment\nFooling Vision and Language Models Despite Localization and Attention  Mechanism\nUser and Developer Interaction with Editable and Readable Ontologies\nEmbodied Evolution in Collective Robotics: A Review\nLexical Disambiguation in Natural Language Questions (NLQs)\nA Simple Reinforcement Learning Mechanism for Resource Allocation in  LTE-A Networks with Markov Decision Process and Q-Learning\nDeepTransport: Learning Spatial-Temporal Dependency for Traffic  Condition Forecasting\nMulti-Label Classification of Patient Notes a Case Study on ICD Code  Assignment\nA Policy Search Method For Temporal Logic Specified Reinforcement  Learning 
Tasks\nApplication of a Hybrid Bi-LSTM-CRF model to the task of Russian Named  Entity Recognition\nHeuristic Online Goal Recognition in Continuous Domains\nDistance-based Confidence Score for Neural Network Classifiers\nAre we Done with Object Recognition? The iCub robot's Perspective\nImproving Efficiency in Convolutional Neural Network with Multilinear  Filters\nDeep Learning Assisted Heuristic Tree Search for the Container  Pre-marshalling Problem\nLearning Complex Dexterous Manipulation with Deep Reinforcement Learning  and Demonstrations\nOvercoming Exploration in Reinforcement Learning with Demonstrations\nA Neural Comprehensive Ranker (NCR) for Open-Domain Question Answering\nProvably Minimally-Distorted Adversarial Examples\nTraining an adaptive dialogue policy for interactive learning of  visually grounded word meanings\nHuman motion primitive discovery and recognition\nVision-based deep execution monitoring\nPersonalized Fuzzy Text Search Using Interest Prediction and Word  Vectorization\nParameter Sharing Deep Deterministic Policy Gradient for Cooperative  Multi-agent Reinforcement Learning\nLearning event representation: As sparse as possible, but not sparser\nDeep Abstract Q-Networks\nImproving speech recognition by revising gated recurrent units\nSensor Synthesis for POMDPs with Reachability Objectives\nSupervised Q-walk for Learning Vector Representation of Nodes in  Networks\nOptimal DNN Primitive Selection with Partitioned Boolean Quadratic  Programming\nContext Embedding Networks\nNeural Task Programming: Learning to Generalize Across Hierarchical  Tasks\nLearning Graphical Models from a Distributed Stream\nStacked Structure Learning for Lifted Relational Neural Networks\nLearnable Explicit Density for Continuous Latent Space and Variational  Inference\nLattice Recurrent Unit: Improving Convergence and Statistical Efficiency  for Sequence Modeling\nGenerating Nontrivial Melodies for Music as a Service\nRainbow: Combining Improvements in Deep 
Reinforcement Learning\nSocially Compliant Navigation through Raw Depth Inputs with Generative  Adversarial Imitation Learning\nTexture Fuzzy Segmentation using Skew Divergence Adaptive Affinity  Functions\nAn Analysis of the Value of Information when Exploring Stochastic,  Discrete Multi-Armed Bandits\nRecurrent Network-based Deterministic Policy Gradient for Solving  Bipedal Walking Challenge on Rugged Terrains\nOn formalizing fairness in prediction with machine learning\nFunction space analysis of deep learning representation layers\nCoresets for Dependency Networks\nGeo-referencing Place from Everyday Natural Language Descriptions\nCausality and Temporal Dependencies in the Design of Fault Management  Systems\nPrior Knowledge based mutation prioritization towards causal variant  finding in rare disease\nLearning to Rank Question-Answer Pairs using Hierarchical Recurrent  Encoder with Latent Topic Clustering\nSafe Semi-Supervised Learning of Sum-Product Networks\nMeta Inverse Reinforcement Learning via Maximum Reward Sharing for Human  Motion Analysis\nEmergent Complexity via Multi-Agent Competition\nDeep Reinforcement Learning: Framework, Applications, and Embedded  Implementations\nEnd-to-End Deep Learning for Steering Autonomous Vehicles Considering  Temporal Dependencies\nDeep Semantic Abstractions of Everyday Human Activities: On Commonsense  Representations of Human Interactions\nA novel prestack sparse azimuthal AVO inversion\nCounterfactual Conditionals in Quantified Modal Logic\nSynkhronos: a Multi-GPU Theano Extension for Data Parallelism\nMachine Learning Bell Nonlocality in Quantum Many-body Systems\nMeasurement Context Extraction from Text: Discovering Opportunities and  Gaps in Earth Science\nDisSent: Sentence Representation Learning from Explicit Discourse  Relations\nSign-Constrained Regularized Loss Minimization\nMarginal sequential Monte Carlo for doubly intractable models\nClusters of Driving Behavior from Observational Smartphone 
Data\nIdentifying On-time Reward Delivery Projects with Estimating Delivery  Duration on Kickstarter\nHyperENTM: Evolving Scalable Neural Turing Machines through HyperNEAT\nBayesian Hypernetworks\nRecent Advances in Zero-shot Recognition\nFunctional Decision Theory: A New Theory of Instrumental Rationality\nNetwork Model Selection Using Task-Focused Minimum Description Length\nLearners that Use Little Information\nMulti-Value Rule Sets\nLearning Infinite RBMs with Frank-Wolfe\nThe Complete Extensions do not form a Complete Semilattice\nManifold Regularization for Kernelized LSTD\nCausal Rule Sets for Identifying Subgroups with Enhanced Treatment  Effect\nFlow: Architecture and Benchmarking for Reinforcement Learning in  Traffic Control\nGeneralization in Deep Learning\nIntention-Net: Integrating Planning and Deep Learning for Goal-Directed  Autonomous Navigation\nMining Frequent Patterns in Process Models\nA Survey on Optical Character Recognition System\nCharacterizing Driving Context from Driver Behavior\nGradient-free Policy Architecture Search and Adaptation\nPubMed 200k RCT: a Dataset for Sequential Sentence Classification in  Medical Abstracts\nDistributed algorithm for empty vehicles management in personal rapid  transit (PRT) network\nThe Hard Problems Are Almost Everywhere For Random CNF-XOR Formulas\nConvergence diagnostics for stochastic gradient descent with constant  step size\nMulti-Task Domain Adaptation for Deep Learning of Instance Grasping from  Simulation\nConstructing Datasets for Multi-hop Reading Comprehension Across  Documents\nNear-Optimal Adversarial Policy Switching for Decentralized Asynchronous  Multi-Agent Systems\nAsymmetric Actor Critic for Image-Based Robot Learning\nThe Effects of Memory Replay in Reinforcement Learning\nA Nonconvex Proximal Splitting Algorithm under Moreau-Yosida  Regularization\nGraph Embedding with Rich Information through Heterogeneous Network\nCharacterization of Gradient Dominance and Regularity Conditions for 
 Neural Networks\nEmergent Translation in Multi-Agent Communication\nAdapting general-purpose speech recognition engine output for  domain-specific natural language question answering\nConsequentialist conditional cooperation in social dilemmas with  imperfect information\nDecision Trees for Helpdesk Advisor Graphs\nSwift Linked Data Miner: Mining OWL 2 EL class expressions directly from  online RDF datasets\nA Two-Phase Safe Vehicle Routing and Scheduling Problem: Formulations  and Solution Algorithms\nOn Using Linear Diophantine Equations to Tune the extent of Look Ahead  while Hiding Decision Tree Rules\nSpoken Language Biomarkers for Detecting Cognitive Impairment\nDeep Voice 3: Scaling Text-to-Speech with Convolutional Sequence  Learning\nPoint Neurons with Conductance-Based Synapses in the Neural Engineering  Framework\nSolving the \"false positives\" problem in fraud prediction\nADA: A Game-Theoretic Perspective on Data Augmentation for Object  Detection\nA Learning-to-Infer Method for Real-Time Power Grid Topology  Identification\nDeep Neural Network Approximation using Tensor Sketching\nThe Complexity of Graph-Based Reductions for Reachability in Markov  Decision Processes\nSafety-Aware Apprenticeship Learning\nHierarchical State Abstractions for Decision-Making Problems with  Computational Constraints\nInvestigating the feature collection for semantic segmentation via  single skip connection\nListening to the World Improves Speech Command Recognition\nDeep Health Care Text Classification\nServing deep learning models in a serverless platform\nMax-Margin Invariant Features from Transformed Unlabeled Data\nEfficiently Trainable Text-to-Speech System Based on Deep Convolutional  Networks with Guided Attention\nFeature learning in feature-sample networks using multi-objective  optimization\nFashionBrain Project: A Vision for Understanding Europe's Fashion Data  Universe\nKlout Topics for Modeling Interests and Expertise of Users Across Social  
Networks\nUnderstanding Grounded Language Learning Agents\nAudiovisual Analytics Vocabulary and Ontology (AAVO): initial core and  example expansion\nDistributional Reinforcement Learning with Quantile Regression\nGroup Fairness in Multiwinner Voting\nAn efficient SAT formulation for learning multiple criteria  non-compensatory sorting rules from examples\nTowards a new paradigm for assistive technology at home: research  challenges, design issues and performance assessment\nDetection and Analysis of Human Emotions through Voice and Speech  Pattern Processing\nPartitioning Relational Matrices of Similarities or Dissimilarities  using the Value of Information\nDual Skipping Networks\nLong-Distance Loop Closure Using General Object Landmarks\nExploiting Points and Lines in Regression Forests for RGB-D Camera  Relocalization\nVehicle Routing Problem with Vector Profits (VRPVP) with Max-Min  Criterion\nRegularization for Deep Learning: A Taxonomy\nTraining Probabilistic Spiking Neural Networks with First-to-spike  Decoding\nTensorizing Generative Adversarial Nets\nUnderstanding Hidden Memories of Recurrent Neural Networks\nRough extreme learning machine: a new classification method based on  uncertainty measure\nGraph Attention Networks\nThe loss surface and expressivity of deep convolutional neural networks\nUnsupervised Neural Machine Translation\nEigenoption Discovery through the Deep Successor Representation\nFast and Scalable Learning of Sparse Changes in High-Dimensional  Gaussian Graphical Model Structure\nAdversarial Advantage Actor-Critic Model for Task-Completion Dialogue  Policy Learning\nDeep Forward and Inverse Perceptual Models for Tracking and Prediction\nGenerating Natural Adversarial Examples\nParametrizing filters of a CNN with a GAN\nRegret Minimization for Partially Observable Deep Reinforcement Learning\nPhysics-guided Neural Networks (PGNN): An Application in Lake  Temperature Modeling\nWhodunnit? 
Crime Drama as a Case for Natural Language Understanding\nMeta-Learning and Universality: Deep Representations and Gradient  Descent can Approximate any Learning Algorithm\nUnsupervised Machine Translation Using Monolingual Corpora Only\nDCN+: Mixed Objective and Deep Residual Coattention for Question  Answering\nBeyond Shared Hierarchies: Deep Multitask Learning through Soft Layer  Ordering\nAutomata Guided Hierarchical Reinforcement Learning for Zero-shot Skill  Composition\nPomegranate: fast and flexible probabilistic modeling in python\nGeneralization without systematicity: On the compositional skills of  sequence-to-sequence recurrent networks\nServant of Many Masters: Shifting priorities in Pareto-optimal  sequential decision-making\nMinimal Exploration in Structured Stochastic Bandits\nBuilding Data-driven Models with Microstructural Images: Generalization  and Interpretability\nJust ASK: Building an Architecture for Extensible Self-Service Spoken  Language Understanding\nVariational Inference of Disentangled Latent Concepts from Unlabeled  Observations\nWeight-Based Variable Ordering in the Context of High-Level  Consistencies\nSPARK: Static Program Analysis Reasoning and Retrieving Knowledge\nMeta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory\nMandolin: A Knowledge Discovery Framework for the Web of Data\nDecentralised firewall for malware detection\nThe Case for Meta-Cognitive Machine Learning: On Model Entropy and  Concept Formation in Deep Learning\nSearching for Biophysically Realistic Parameters for Dynamic Neuron  Models by Genetic Algorithms from Calcium Imaging Recording\nEnsembles of Multiple Models and Architectures for Robust Brain Tumour  Segmentation\nComposing Meta-Policies for Autonomous Driving Using Hierarchical Deep  Reinforcement Learning\nSemantic Web Today: From Oil Rigs to Panama Papers\nFisher-Rao Metric, Geometry, and Complexity of Neural Networks\nWider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence 
 Learning\nStrategies for Conceptual Change in Convolutional Neural Networks\nMultilingual Speech Recognition With A Single End-To-End Model\nRoboCupSimData: A RoboCup soccer research dataset\nNeural Language Modeling by Jointly Learning Syntax and Lexicon\nNeST: A Neural Network Synthesis Tool Based on a Grow-and-Prune Paradigm\nBounding and Counting Linear Regions of Deep Neural Networks\nWeighted Transformer Network for Machine Translation\nAdaptive Bayesian Sampling with Monte Carlo EM\nAlpha-expansion is Exact on Stable Instances\nCan Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?\nLearning Overcomplete HMMs\nSparse Attentive Backtracking: Long-Range Credit Assignment in Recurrent  Networks\nRecurrent Autoregressive Networks for Online Multi-Object Tracking\nBlock-Sparse Recurrent Neural Networks\nFaster Fuzzing: Reinitialization with Deep Neural Models\nInverse Reward Design\nLearning Sparse Visual Representations with Leaky Capped Norm  Regularizers\nClustering with feature selection using alternating minimization,  Application to computational biology\nExploration in NetHack with Secret Discovery\nInformation Directed Sampling for Stochastic Bandits with Graph Feedback\nLarge-scale Cloze Test Dataset Designed by Teachers\nCogSciK: Clustering for Cognitive Science Motivated Decision Making\nDiscovering Representative Examples for Program Synthesis\nHeuristic Optimization for Automated Distribution System Planning in  Network Integration Studies\nRepairing Ontologies via Axiom Weakening\nOpen-World Knowledge Graph Completion\nScalable Log Determinants for Gaussian Process Kernel Learning\nLearning Multi-Modal Word Representation Grounded in Visual Context\nFast Meta-Learning for Adaptive Hierarchical Classifier Design\nPicasso, Matisse, or a Fake? 
Automated Analysis of Drawings at the  Stroke Level for Attribution and Authentication\nA Change-Detection based Framework for Piecewise-stationary Multi-Armed  Bandit Problem\nDLPaper2Code: Auto-generation of Code from Deep Learning Research Papers\nStochastic Deep Learning in Memristive Networks\nSelf-Supervised Intrinsic Image Decomposition\nSaliency Prediction for Mobile User Interfaces\nLattice embeddings between types of fuzzy sets. Closed-valued fuzzy sets\nLearning with Options that Terminate Off-Policy\nArrhythmia Classification from the Abductive Interpretation of Short  Single-Lead ECG Records\nCARLA: An Open Urban Driving Simulator\nApplications of Deep Learning and Reinforcement Learning to Biological  Data\nOptimised Maintenance of Datalog Materialisations\nStream Reasoning in Temporal Datalog\nDeep Within-Class Covariance Analysis for Acoustic Scene Classification\nParkinson's Disease Digital Biomarker Discovery with Optimized  Transitions and Inferred Markov Emissions\nFine Grained Knowledge Transfer for Personalized Task-oriented Dialogue  Systems\nMojiTalk: Generating Emotional Responses at Scale\nCommonsense LocatedNear Relation Extraction\nEvaluation of trackers for Pan-Tilt-Zoom Scenarios\nHigh-Order Attention Models for Visual Question Answering\nLearning Abduction under Partial Observability\nSimple And Efficient Architecture Search for Convolutional Neural  Networks\nSolving the Resource Constrained Project Scheduling Problem Using the  Parallel Tabu Search Designed for the CUDA Platform\nPhonemic and Graphemic Multilingual CTC Based Speech Recognition\nMultilingual Adaptation of RNN Based ASR Systems\nA unified decision making framework for supply and demand management in  microgrid networks\nEfficiency Analysis of ASP Encodings for Sequential Pattern Mining Tasks\nWeb Robot Detection in Academic Publishing\nEvidence Aggregation for Answer Re-Ranking in Open-Domain Question  Answering\nDeep Rewiring: Training very sparse deep 
networks\nSaliency-based Sequential Image Attention with Multiset Prediction\nGoal-Driven Query Answering for Existential Rules with Equality\nWeakly-supervised Semantic Parsing with Abstract Examples\nLoss Functions for Multiset Prediction\nRevisiting Simple Neural Networks for Learning Representations of  Knowledge Graphs\nHibikino-Musashi@Home 2017 Team Description Paper\nA Generally Applicable, Highly Scalable Measurement Computation and  Optimization Approach to Sequential Model-Based Diagnosis\nExploiting Layerwise Convexity of Rectifier Networks with Sign  Constrained Weights\nMarkov Decision Processes with Continuous Side Information\nFast Predictive Simple Geodesic Regression\nQuantile Markov Decision Process\nK3, L3, LP, RM3, A3, FDE: How to Make Many-Valued Logics Work for You\nBootstrapped synthetic likelihood\nA General Neural Network Hardware Architecture on FPGA\nUsing Noisy Extractions to Discover Causal Knowledge\nBudget-Constrained Multi-Armed Bandits with Multiple Plays\nHindsight policy gradients\nEnabling Reasoning with LegalRuleML\nA Robust Genetic Algorithm for Learning Temporal Specifications from  Data\nOne Model for the Learning of Language\nTowards Deep Learning Models for Psychological State Prediction using  Smartphone Data: Challenges and Opportunities\n3D Reconstruction of Incomplete Archaeological Objects Using a  Generative Adversarial Network\nUsing KL-divergence to focus Deep Visual Explanation\nWin Prediction in Esports: Mixed-Rank Match Prediction in Multi-player  Online Battle Arena Games\nLearning to Play Othello with Deep Neural Networks\nDependent landmark drift: robust point set registration based on the  Gaussian mixture model with a statistical shape model\nDriven to Distraction: Self-Supervised Distractor Learning for Robust  Monocular Visual Odometry in Urban Environments\nIs prioritized sweeping the better episodic control?\nScalable Recollections for Continual Lifelong Learning\nA Generalized Genetic Algorithm-Based 
Solver for Very Large Jigsaw  Puzzles of Complex Types\nAnonymous Hedonic Game for Task Allocation in a Large-Scale Multiple  Agent System\nRun, skeleton, run: skeletal model in a physics-based simulation\nComputational Results for Extensive-Form Adversarial Team Games\nFacets, Tiers and Gems: Ontology Patterns for Hypernormalisation\nFusionNet: Fusing via Fully-Aware Attention with Application to Machine  Comprehension\nImplementing the Deep Q-Network\nGenerating Thematic Chinese Poetry using Conditional Variational  Autoencoders with Hybrid Decoders\nCross Temporal Recurrent Networks for Ranking Question Answer Pairs\nFullie and Wiselie: A Dual-Stream Recurrent Convolutional Attention  Model for Activity Recognition\nHidden Tree Markov Networks: Deep and Wide Learning for Structured Data\nSituationally Aware Options\nConstructive Preference Elicitation over Hybrid Combinatorial Spaces\nQuantifying Performance of Bipedal Standing with Multi-channel EMG\nDeep Learning for Physical Processes: Incorporating Prior Scientific  Knowledge\nRelating Input Concepts to Convolutional Neural Network Decisions\nRecurrent Relational Networks for Complex Relational Reasoning\nDeterministic Policy Optimization by Combining Pathwise and Score  Function Estimators for Discrete Action Spaces\nRobust Stackelberg Equilibria in Extensive-Form Games and Extension to  Limited Lookahead\nAsymmetric Action Abstractions for Multi-Unit Control in Adversarial  Real-Time Games\nThe Stochastic Firefighter Problem\nDecomposition Strategies for Constructive Preference Elicitation\nA correlational analysis of multiagent sensorimotor interactions:  clustering autonomous and controllable entities\nRGB-D-based Human Motion Recognition with Deep Learning: A Survey\nSafer Classification by Synthesis\nAutomated Algorithm Selection on Continuous Black-Box Problems By  Combining Exploratory Landscape Analysis and Machine Learning\nEthical Challenges in Data-Driven Dialogue Systems\nCascade Attribute 
Learning Network\nD numbers theory based game-theoretic framework in adversarial decision  making under fuzzy environment\nGeneralizing Hamiltonian Monte Carlo with Neural Networks\nGenerative Adversarial Network for Abstractive Text Summarization\nPedagogical learning\nMAVOT: Memory-Augmented Video Object Tracking\nA general unified framework for interval pairwise comparison matrices\nDeep Reinforcement Learning for Sepsis Treatment\nButterfly Effect: Bidirectional Control of Classification Performance by  Small Additive Perturbation\nProduction Ready Chatbots: Generate if not Retrieve\nClassifier Selection with Permutation Tests\nTable-to-text Generation by Structure-aware Seq2seq Learning\nDistilling a Neural Network Into a Soft Decision Tree\nTensor Completion Algorithms in Big Data Analytics\nHomomorphic Parameter Compression for Distributed Deep Learning Training\nOne-Shot Reinforcement Learning for Robot Navigation with Interactive  Replay\nQuantitative CBA: Small and Comprehensible Association Rule  Classification Models\nHierarchical Policy Search via Return-Weighted Density Estimation\nCrossmodal Attentive Skill Learner\nComplex Structure Leads to Overfitting: A Structure Regularization  Decoding Method for Natural Language Processing\nBackprop as Functor: A compositional perspective on supervised learning\nA Recursive Bayesian Approach To Describe Retinal Vasculature Geometry\nPhysics Informed Deep Learning (Part I): Data-driven Solutions of  Nonlinear Partial Differential Equations\nFearNet: Brain-Inspired Model for Incremental Learning\nPhysics Informed Deep Learning (Part II): Data-driven Discovery of  Nonlinear Partial Differential Equations\nA reinforcement learning algorithm for building collaboration in  multi-agent systems\nTensorFlow Distributions\nPSIque: Next Sequence Prediction of Satellite Images using a  Convolutional Sequence-to-Sequence Network\nEfficient exploration with Double Uncertain Value Networks\nSaliency Weighted Convolutional 
Features for Instance Search\nDeep Reinforcement Learning for De-Novo Drug Design\nExtreme Dimension Reduction for Handling Covariate Shift\nNow Playing: Continuous low-power music recognition\nEmbedding Words as Distributions with a Bayesian Skip-gram Model\nImproving Latent User Models in Online Social Media\nVideo Captioning via Hierarchical Reinforcement Learning\nA Semantic Loss Function for Deep Learning with Symbolic Knowledge\nEmbedded Real-Time Fall Detection Using Deep Learning For Elderly Care\nLearning to Learn from Weak Supervision by Full Supervision\nConvNets and ImageNet Beyond Accuracy: Explanations, Bias Detection,  Adversarial Examples and Model Criticism\nCollaborative Filtering with Social Exposure: A Modular Approach to  Social Recommendation\nCalculating Semantic Similarity between Academic Articles using Topic  Event and Ontology\nDeep Neural Networks for Multiple Speaker Detection and Localization\nComparing Deep Reinforcement Learning and Evolutionary Methods in  Continuous Control\nImproving Smiling Detection with Race and Gender Diversity\nA double competitive strategy based learning automata algorithm\nNovel Exploration Techniques (NETs) for Malaria Policy Interventions\nInteractive Reinforcement Learning for Object Grounding via Self-Talking\nFrom knowledge-based to data-driven modeling of fuzzy rule-based  systems: A critical reflection\nWill humans even write code in 2040 and what would that mean for extreme  heterogeneity in computing?\nEvaluation of Alzheimer's Disease by Analysis of MR Images using  Multilayer Perceptrons and Kohonen SOM Classifiers as an Alternative to the  ADC Maps\nGradient Descent Learns One-hidden-layer CNN: Don't be Afraid of  Spurious Local Minima\nHierarchical Actor-Critic\nMining Supervisor Evaluation and Peer Feedback in Performance Appraisals\nCharacterizing and Computing Causes for Query Answers in Databases from  Database Repairs and Repair Programs\nA Deeper Look at Experience Replay\nLearning User 
Intent from Action Sequences on Interactive Systems\nExamining Cooperation in Visual Dialog Models\nMultimodal Storytelling via Generative Adversarial Imitation Learning\nDeterminism in the Certification of UNSAT Proofs\nNeural Cross-Lingual Entity Linking\nAn analysis of incorporating an external language model into a  sequence-to-sequence model\nLearning General Latent-Variable Graphical Models with Predictive Belief  Propagation and Hilbert Space Embeddings\nDistance-based Self-Attention Network for Natural Language Inference\nAdversarial Examples that Fool Detectors\nUsing Rule-Based Labels for Weak Supervised Learning: A ChemNet for  Transferable Chemical Property Prediction\nA Deep Network Model for Paraphrase Detection in Short Text Messages\nEnd-to-End Offline Goal-Oriented Dialog Policy Learning via Policy  Gradient\nColumnar Database Techniques for Creating AI Features\nStochastic Dual Coordinate Descent with Bandit Sampling\nFlagIt: A System for Minimally Supervised Human Trafficking Indicator  Mining\nA Class of Logistic Functions for Approximating State-Inclusive Koopman  Operators\nS-Shaped vs. 
V-Shaped Transfer Functions for Antlion Optimization  Algorithm in Feature Selection Problems\nSocial Emotion Mining Techniques for Facebook Posts Reaction Prediction\nBayesian Q-learning with Assumed Density Filtering\nRobust Deep Reinforcement Learning with Adversarial Attacks\nDeepConfig: Automating Data Center Network Topologies Management with  Machine Learning\nMINOS: Multimodal Indoor Simulator for Navigation in Complex  Environments\nInvestigating the Impact of Data Volume and Domain Similarity on  Transfer Learning Applications\nLearning Robust Dialog Policies in Noisy Environments\nThe Eigenoption-Critic Framework\nIn a Nutshell: Sequential Parameter Optimization\nBenchmarking Single Image Dehazing and Beyond\nToward `verifying' a Water Treatment System\nMining Non-Redundant Sets of Generalizing Patterns from Sequence  Databases\nInterpretable Policies for Reinforcement Learning by Genetic Programming\nConsideration on Example 2 of \"An Algorithm of General Fuzzy  InferenceWith The Reductive Property\"\nReview of Design of Speech Recognition and Text Analytics based Digital  Banking Customer Interface and Future Directions of Technology Adoption\nReasoning in Systems with Elements that Randomly Switch Characteristics\nIntrinsic Point of Interest Discovery from Trajectory Data\nProximodistal Exploration in Motor Learning as an Emergent Property of  Optimization\nConstraint and Mathematical Programming Models for Integrated Port  Container Terminal Operations\nCoDraw: Visual Dialog for Collaborative Drawing\nPre-training Attention Mechanisms\nImpossibility of deducing preferences and rationality from human policy\nMorphology dictates a robot's ability to ground crowd-proposed language\nRay: A Distributed Framework for Emerging AI Applications\nSchNet - a deep learning architecture for molecules and materials\nVisual Explanations from Hadamard Product in Multimodal Deep Networks\n'Indifference' methods for managing agent rewards\nNonparametric Inference for 
Auto-Encoding Variational Bayes\nParallel Complexity of Forward and Backward Propagation\nLearning Representations from Road Network for End-to-End Urban Growth  Simulation\nHeinrich Behmann's Contributions to Second-Order Quantifier Elimination  from the View of Computational Logic\nLarge-Scale Vandalism Detection with Linear Classifiers - The  Conkerberry Vandalism Detector at WSDM Cup 2017\nSafe Policy Improvement with Baseline Bootstrapping\nMining Smart Card Data for Travelers' Mini Activities\nColumn Generation for Interaction Coverage in Combinatorial Software  Testing\nOn Data-Dependent Random Features for Improved Generalization in  Supervised Learning\nMachine Learning for Vehicular Networks\nHierarchical and Interpretable Skill Acquisition in Multi-task  Reinforcement Learning\nBlock-diagonal Hessian-free Optimization for Training Neural Networks\nAnalysis of supervised and semi-supervised GrowCut applied to  segmentation of masses in mammography images\nPartial Labeled Gastric Tumor Segmentation via patch-based Reiterative  Learning\nAn Ensemble Model with Ranking for Social Dialogue\nContext-aware Path Ranking for Knowledge Base Completion\nBit-Vector Model Counting using Statistical Estimation\nImprovements to Inference Compilation for Probabilistic Programming in  Large-Scale Scientific Simulators\nReachable Set Computation and Safety Verification for Neural Networks  with ReLU Activations\nFair Forests: Regularized Tree Induction to Minimize Model Bias\nCSGNet: Neural Shape Parser for Constructive Solid Geometry\nInverse Classification for Comparison-based Interpretability in Machine  Learning\nRank Pruning for Dominance Queries in CP-Nets\nObtaining Accurate Probabilistic Causal Inference by Post-Processing  Calibration\nInterpretable Counting for Visual Question Answering\nTowards Collaborative Conceptual Exploration\nBuilding Robust Deep Neural Networks for Road Sign Detection\nAn Online Ride-Sharing Path Planning Strategy for Public Vehicle 
Systems\nReport: Dynamic Eye Movement Matching and Visualization Tool in Neuro  Gesture\nToward Continual Learning for Conversational Agents\nThe Merits of Sharing a Ride\nKernel Robust Bias-Aware Prediction under Covariate Shift\nDeep Learning Interior Tomography for Region-of-Interest Reconstruction\nCharacterizing optimal hierarchical policy inference on graphs via  non-equilibrium thermodynamics\nLearning Structural Weight Uncertainty for Sequential Decision-Making\nA Compare-Propagate Architecture with Alignment Factorization for  Natural Language Inference\nGame-theoretic Network Centrality: A Review\nNeurally Plausible Model of Robot Reaching Inspired by Infant Motor  Babbling\nScalable Hash-Based Estimation of Divergence Measures\nAutomated rating of recorded classroom presentations using speech  analysis in kazakh\nMulti-Objective Vehicle Routing Problem Applied to Large Scale Post  Office Deliveries\nUm Sistema Multiagente no Combate ao Braqueamento de Capitais\nViZDoom: DRQN with Prioritized Experience Replay, Double-Q Learning, &  Snapshot Ensembling\nSocial Media Analysis based on Semanticity of Streaming and Batch Data\nApproximate Ranking from Pairwise Comparisons\nDeep Learning Reconstruction for 9-View Dual Energy CT Baggage Scanner\nSoft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement  Learning with a Stochastic Actor\nA Quantitative Analysis of Multi-Winner Rules\nCombination of Hyperband and Bayesian Optimization for Hyperparameter  Optimization in Deep Learning\nSecrecy by Witness-Functions under Equational Theories\nEntropy production rate as a criterion for inconsistency in decision  theory\nAutomated Conjecturing VII: The Graph Brain Project & Big Mathematics\nA Comprehensive Survey of Ontology Summarization: Measures and Methods\nOn the inherent competition between valid and spurious inductive  inferences in Boolean data\nApproximate FPGA-based LSTMs under Computation Time Constraints\nIndian Regional Movie Dataset for 
Recommender Systems\nSample-Efficient Reinforcement Learning through Transfer and  Architectural Priors\nHow to find a GSMem malicious activity via an AI approach\nDeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement  Learning\nDistributed Deep Reinforcement Learning: Learn how to play Atari games  in 21 minutes\nAn Ontology for Satellite Databases\nProbabilistic Prognostic Estimates of Survival in Metastatic Cancer  Patients (PPES-Met) Utilizing Free-Text Clinical Narratives\nDeep In-GPU Experience Replay\nA Formalization of Kant's Second Formulation of the Categorical  Imperative\nAdaptive Graph Convolutional Neural Networks\nEliciting Worker Preference for Task Completion\nReasoning about Unforeseen Possibilities During Policy Learning\nNeural Program Synthesis with Priority Queue Training\nUsing probabilistic programs as proposals\nTopic-based Evaluation for Conversational Bots\nEARL: Joint Entity and Relation Linking for Question Answering over  Knowledge Graphs\nFormalized Conceptual Spaces with a Geometric Representation of  Correlations\nModel-Based Action Exploration for Learning Dynamic Motion Skills\nInteractive Learning of Acyclic Conditional Preference Networks\nPlanning with Trust for Human-Robot Collaboration\nCombining Symbolic and Function Evaluation Expressions In Neural  Programs\nA Computational Model of Commonsense Moral Decision Making\nFairness in Supervised Learning: An Information Theoretic Approach\nWhich Training Methods for GANs do actually Converge?\nBetter Runtime Guarantees Via Stochastic Domination\nNon-Parametric Transformation Networks\ntau-FPL: Tolerance-Constrained Learning in Linear Time\nRobots as Powerful Allies for the Study of Embodied Cognition from the  Bottom Up\nBuilding a Conversational Agent Overnight with Dialogue Self-Play\nTopic Modeling on Health Journals with Regularized Variational Inference\nEmpirical Explorations in Training Networks with Discrete Activations\nSocial Network based Short-Term 
Stock Trading System\nLearning Features For Relational Data\nConsiderations regarding security issues impact on systems availability\nUnseen Class Discovery in Open-world Classification\nA Generalized Dempster--Shafer Evidence Theory\nToward Scalable Verification for Safety-Critical Deep Networks\nLayered TPOT: Speeding up Tree-based Pipeline Optimization\nNatural Language Multitasking: Analyzing and Improving Syntactic  Saliency of Hidden Representations\nOptimal Weighting for Exam Composition\nIntegrating planning for task-completion dialogue policy learning\nDemonstration of Topological Data Analysis on a Quantum Processor\nActive Learning of Strict Partial Orders: A Case Study on Concept  Prerequisite Relations\nmvn2vec: Preservation and Collaboration in Multi-View Network Embedding\nA high-performance analog Max-SAT solver and its application to Ramsey  numbers\nVisualization of Hyperspectral Images Using Moving Least Squares\nEfficient Learning of Optimal Markov Network Topology with k-Tree  Modeling\nCross-Domain Transfer in Reinforcement Learning using Target Apprentice\nExtreme Learning Machine with Local Connections\nOptimal Convergence for Distributed Learning with Stochastic Gradient  Methods and Spectral-Regularization Algorithms\nPersonalizing Dialogue Agents: I have a dog, do you have pets too?\nComparison Training for Computer Chinese Chess\nAnalyzing Language Learned by an Active Question Answering Agent\nMitigating Unwanted Biases with Adversarial Learning\nDeepGestalt - Identifying Rare Genetic Syndromes Using Deep Learning\nClustering with Deep Learning: Taxonomy and New Methods\nA Classification Refinement Strategy for Semantic Segmentation\nOptimal Transport on Discrete Domains\nPointCNN\nEvaluation of Interactive Machine Learning Systems\nIntrinsic dimension of concept lattices\nPRNN: Recurrent Neural Network with Persistent Memory\nMAttNet: Modular Attention Network for Referring Expression  Comprehension\nDiscovering Markov Blanket from 
Multiple interventional Datasets\nFinding ReMO (Related Memory Object): A Simple Neural Architecture for  Text based Reasoning\nSine Cosine Crow Search Algorithm: A powerful hybrid meta heuristic for  global optimization\nDeep Learning in Pharmacogenomics: From Gene Regulation to Patient  Stratification\nKnowledge Graph Embedding with Multiple Relation Projections\nSafe Exploration in Continuous Action Spaces\nFlashRL: A Reinforcement Learning Platform for Flash Games\nA Sheaf Model of Contradictions and Disagreements. Preliminary Report  and Discussion\nDeep Reinforcement Learning for Dynamic Treatment Regimes on Medical  Registry Data\nRepresenting the Insincere: Strategically Robust Proportional  Representation\nOn the Inter-relationships among Drift rate, Forgetting rate,  Bias/variance profile and Error\nImproving Active Learning in Systematic Reviews\nAn Improved Tabu Search Heuristic for Static Dial-A-Ride Problem\nEvaluating approaches for supervised semantic labeling\nPredicting Rapid Fire Growth (Flashover) Using Conditional Generative  Adversarial Networks\nPersonalized Survival Prediction with Contextual Explanation Networks\nAlgorithms for the Greater Good! On Mental Modeling and Acceptable  Symbiosis in Human-AI Collaboration\nCOBRA: A Fast and Simple Method for Active Clustering with Pairwise  Constraints\nFeatures, Projections, and Representation Change for Generalized  Planning\nA Rational Distributed Process-level Account of Independence Judgment\nA Cross Entropy based Optimization Algorithm with Global Convergence  Guarantees\nDeep Learning Works in Practice. 
But Does it Work in Theory?\nPretraining Deep Actor-Critic Reinforcement Learning Algorithms With  Expert Demonstrations\nDeep Reinforcement Learning for Programming Language Correction\nDeep Predictive Models in Interactive Music\nLifted Filtering via Exchangeable Decomposition\nLearning Families of Formal Languages from Positive and Negative  Information\nCluster-based Approach to Improve Affect Recognition from Passively  Sensed Data\nRecursive Feature Generation for Knowledge-based Learning\nDeep Learning with Data Dependent Implicit Activation Function\nDual Recurrent Attention Units for Visual Question Answering\n3D Object Dense Reconstruction from a Single Depth View\nObfuscated Gradients Give a False Sense of Security: Circumventing  Defenses to Adversarial Examples\nAdaptive Memory Networks\nGenerating Redundant Features with Unsupervised Multi-Tree Genetic  Programming\nInterpretable Deep Convolutional Neural Networks via Meta-learning\nVisual Interpretability for Deep Learning: a Survey\nHow do Humans Understand Explanations from Machine Learning Systems? 
An  Evaluation of the Human-Interpretability of Explanation\nIntriguing Properties of Randomly Weighted Networks: Generalizing While  Learning Next to Nothing\nMemory Fusion Network for Multi-view Sequential Learning\nIncorporating Literals into Knowledge Graph Embeddings\nPose Flow: Efficient Online Pose Tracking\nPlan Explanations as Model Reconciliation -- An Empirical Study\nDevelopment of c-means Clustering Based Adaptive Fuzzy Controller for A  Flapping Wing Micro Air Vehicle\nTask-Aware Compressed Sensing with Generative Adversarial Networks\nInteractive Grounded Language Acquisition and Generalization in a 2D  World\nThe Sea Exploration Problem: Data-driven Orienteering on a Continuous  Surface\nGuided Policy Exploration for Markov Decision Processes using an  Uncertainty-Based Value-of-Information Criterion\nAbstractly Interpreting Argumentation Frameworks for Sharpening  Extensions\nLearning from Richer Human Guidance: Augmenting Comparison-Based  Learning with Feature Queries\nUtility Decomposition with Deep Corrections for Scalable Planning under  Uncertainty\nDecoding-History-Based Adaptive Control of Attention for Neural Machine  Translation\nA Survey Of Methods For Explaining Black Box Models\nImproving Variational Encoder-Decoders in Dialogue Generation\nFastNet\nIONet: Learning to Cure the Curse of Drift in Inertial Odometry\nScalable Meta-Learning for Bayesian Optimization\nEvolutionary Computation plus Dynamic Programming for the Bi-Objective  Travelling Thief Problem\nEfficient Learning of Bounded-Treewidth Bayesian Networks from Complete  and Incomplete Data Sets\nDeepHeart: Semi-Supervised Sequence Learning for Cardiovascular Risk  Prediction\nPPFNet: Global Context Aware Local Features for Robust 3D Point Matching\nEfficient collective swimming by harnessing vortices through deep  reinforcement learning\nEfficient Large-Scale Multi-Modal Classification\nLearning Role-based Graph Embeddings\nCognitive Business Process Management for Adaptive 
Cyber-Physical  Processes\nWeb-Based Implementation of Travelling Salesperson Problem Using Genetic  Algorithm\nBalancing Two-Player Stochastic Games with Soft Q-Learning\nLearning Robust Options\nSlice Sampling Particle Belief Propagation\nATPboost: Learning Premise Selection in Binary Setting with ATP Feedback\nNot-So-CLEVR: Visual Relations Strain Feedforward Neural Networks\nGeneralization of an Upper Bound on the Number of Nodes Needed to  Achieve Linear Separability\nGenerative Adversarial Networks and Probabilistic Graph Models for  Hyperspectral Image Classification\nLocal Contrast Learning\nTo the problem of \"The Instrumental complex for ontological engineering  purpose\" software system design\nBeyond Markov Logic: Efficient Mining of Prediction Rules in Large  Graphs\nGraph Planning with Expected Finite Horizon\nBeyond the One Step Greedy Approach in Reinforcement Learning\nLearning a SAT Solver from Single-Bit Supervision\nFormal Ontology Learning from English IS-A Sentences\nThe Need for Speed of AI Applications: Performance Comparison of Native  vs. 
Browser-based Algorithm Implementations\nLearning Multiple Levels of Representations with Kernel Machines\nInfluence-Directed Explanations for Deep Convolutional Networks\nPseudo-Recursal: Solving the Catastrophic Forgetting Problem in Deep  Neural Networks\nDetecting and Correcting for Label Shift with Black Box Predictors\nA note on reinforcement learning with Wasserstein distance  regularisation, with applications to multipolicy learning\nProofWatch: Watchlist Guidance for Large Theories in E\nA New Algorithmic Decision for Categorical Syllogisms via Caroll's  Diagrams\nState Representation Learning for Control: An Overview\nEfficient Hierarchical Robot Motion Planning Under Uncertainty and  Hybrid Dynamics\nDeep Reinforcement Learning for Solving the Vehicle Routing Problem\nGlobal Model Interpretation via Recursive Partitioning\nEfficient Model-Based Deep Reinforcement Learning with Variational State  Tabulation\nIdentifiability of Nonparametric Mixture Models and Bayes Optimal  Clustering\nsignSGD: compressed optimisation for non-convex problems\nLearning Robust and Adaptive Real-World Continuous Control Using  Simulation and Transfer Learning\nOn the Relative Succinctness of Sentential Decision Diagrams\nDiversity-Driven Exploration Strategy for Deep Reinforcement Learning\nBarista - a Graphical Tool for Designing and Training Deep Neural  Networks\nAttention based Sentence Extraction from Scientific Articles using  Pseudo-Labeled data\nProgressive Reinforcement Learning with Distillation for Multi-Skilled  Motion Control\nEvolved Policy Gradients\nChallenging Images For Minds and Machines\nLearning via social awareness: improving sketch representations with  facial feedback\nDisjoint Multi-task Learning between Heterogeneous Human-centric Tasks\nPlayeRank: Multi-dimensional and role-aware rating of soccer player  performance\nNot to Cry Wolf: Distantly Supervised Multitask Learning in Critical  Care\nDeep Learning and Data Assimilation for Real-Time 
Production Prediction  in Natural Gas Wells\nWho Killed Albert Einstein? From Open Data to Murder Mystery Games\nLearning Deep Disentangled Embeddings with the F-Statistic Loss\nReinforcement Learning from Imperfect Demonstrations\nFrom Gameplay to Symbolic Reasoning: Learning SAT Solver Heuristics in  the Style of Alpha(Go) Zero\nDeep Learning Based Speech Beamforming\nHigh Dimensional Bayesian Optimization Using Dropout\nMean Field Multi-Agent Reinforcement Learning\nAdmissible Time Series Motif Discovery with Missing Data\nPrioritized Sweeping Neural DynaQ with Multiple Predecessors, and  Hippocampal Replays\nTruth Validation with Evidence\nDisentangling Aspect and Opinion Words in Target-based Sentiment  Analysis using Lifelong Learning\nAn Anytime Algorithm for Task and Motion MDPs\nA Unified View of Causal and Non-causal Feature Selection\nDeep Generative Model for Joint Alignment and Word Representation\nMeasuring Human-perceived Similarity in Heterogeneous Collections\nMonte Carlo Q-learning for General Game Playing\nImproved GQ-CNN: Deep Learning Model for Planning Robust Grasps\nDropout Model Evaluation in MOOCs\nTowards a Continuous Knowledge Learning Engine for Chatbots\nOnline Continuous Submodular Maximization\nImplicit Robot-Human Communication in Adversarial and Collaborative  Environments\nHyP-DESPOT: A Hybrid Parallel Algorithm for Online Planning under  Uncertainty\nOptimizing Interactive Systems with Data-Driven Objectives\nGraphical Models for Non-Negative Data Using Generalized Score Matching\nConvergence of Online Mirror Descent Algorithms\nScalable Alignment Kernels via Space-Efficient Feature Maps\nEstimating scale-invariant future in continuous time\nEfficient Large-Scale Fleet Management via Multi-Agent Deep  Reinforcement Learning\nMemorize or generalize? 
Searching for a compositional RNN in a haystack\nAccelerated Primal-Dual Policy Optimization for Safe Reinforcement  Learning\nRobust Estimation via Robust Gradient Estimation\nLearning High-level Representations from Demonstrations\nDeep Echo State Networks for Diagnosis of Parkinson's Disease\nBayes-optimal Hierarchical Classification over Asymmetric Tree-Distance  Loss\nDivide, Denoise, and Defend against Adversarial Attacks\nDeep Learning for Joint Source-Channel Coding of Text\nFourier Policy Gradients\nHierarchical Expertise-Level Modeling for User Specific Robot-Behavior  Explanations\nUsing Automatic Generation of Relaxation Constraints to Improve the  Preimage Attack on 39-step MD4\nTAP-DLND 1.0 : A Corpus for Document Level Novelty Detection\nNeural Network Ensembles to Real-time Identification of Plug-level  Appliance Measurements\nRobust Maximization of Non-Submodular Objectives\nCombining Textual Content and Structure to Improve Dialog Similarity\nMeta-Reinforcement Learning of Structured Exploration Strategies\nLearning to Play with Intrinsically-Motivated Self-Aware Agents\nEmergence of Structured Behaviors from Curiosity-Based Intrinsic  Motivation\nEpistemic Graphs for Representing and Reasoning with Positive and  Negative Influences of Arguments\nExplanations based on the Missing: Towards Contrastive Explanations with  Pertinent Negatives\nManipulating and Measuring Model Interpretability\nConvergent Actor-Critic Algorithms Under Off-Policy Training and  Function Approximation\nPooling homogeneous ensembles to build heterogeneous ensembles\nL2-Nonexpansive Neural Networks\nRobustness of classifiers to uniform $\\ell\\_p$ and Gaussian noise\nGenerating High-Quality Query Suggestion Candidates for Task-Based  Search\nTowards an Understanding of Entity-Oriented Search Intents\nIntrinsic Motivation and Mental Replay enable Efficient Online  Adaptation in Stochastic Recurrent Networks\nAlgorithmic Collusion in Cournot Duopoly Market: Evidence from  
Experimental Economics\nMultimodal Explanations: Justifying Decisions and Pointing to the  Evidence\nReliable Intersection Control in Non-cooperative Environments\nA Polynomial Time Subsumption Algorithm for Nominal Safe  $\\mathcal{ELO}_\\bot$ under Rational Closure\nTensor Field Networks: Rotation- and Translation-Equivariant Neural  Networks for 3D Point Clouds\nHigh Order Recurrent Neural Networks for Acoustic Modelling\nLearning to Make Predictions on Graphs with Autoencoders\nBudget Constrained Bidding by Model-free Reinforcement Learning in  Display Advertising\nColoring black boxes: visualization of neural network decisions\nWeighted Double Deep Multiagent Reinforcement Learning in Stochastic  Cooperative Environments\nOptimal Stochastic Delivery Planning in Full-Truckload and  Less-Than-Truckload Delivery\nUnsupervised Grammar Induction with Depth-bounded PCFG\nVisualizing the Flow of Discourse with a Concept Ontology\nLearning Optimal Policies from Observational Data\nDeep learning in radiology: an overview of the concepts and a survey of  the state of the art\nGraphRNN: A Deep Generative Model for Graphs\nCakewalk Sampling\nDomain Specific Design Patterns: Designing For Conversational User  Interfaces\nOne Single Deep Bidirectional LSTM Network for Word Sense Disambiguation  of Text Data\nSelf-organizing maps and generalization: an algorithmic description of  Numerosity and Variability Effects\nMulti-Goal Reinforcement Learning: Challenging Robotics Environments and  Request for Research\nReinforcement and Imitation Learning for Diverse Visuomotor Skills\nModeling Others using Oneself in Multi-Agent Reinforcement Learning\nA Multi-Disciplinary Review of Knowledge Acquisition Methods: From Human  to Autonomous Eliciting Agents\nPhenotype-based and Self-learning Inter-individual Sleep Apnea Screening  with a Level IV Monitoring System\nGeneralized Binary Search For Split-Neighborly Problems\nReal-Time Bidding with Multi-Agent Reinforcement Learning in 
Display  Advertising\nBioinformatics and Medicine in the Era of Deep Learning\nCoarse to fine non-rigid registration: a chain of scale-specific neural  networks for multimodal image alignment with application to remote sensing\nAb initio Algorithmic Causal Deconvolution of Intertwined Programs and  Networks by Generative Mechanism\nThe Emergence of Spectral Universality in Deep Networks\nLoss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs\nInvestigating Human Priors for Playing Video Games\nDeepSOFA: A Real-Time Continuous Acuity Score Framework using Deep  Learning\nEscort: Efficient Sparse Convolutional Neural Networks on GPUs\nGeneral Video Game AI: a Multi-Track Framework for Evaluating Agents,  Games and Content Generation Algorithms\nIdentifying Sources and Sinks in the Presence of Multiple Agents with  Gaussian Process Vector Calculus\nQuantum cognition goes beyond-quantum: modeling the collective  participant in psychological measurements\nDiGrad: Multi-Task Reinforcement Learning with Shared Actions\nAnticipation in Human-Robot Cooperation: A Recurrent Neural Network  Approach for Multiple Action Sequences Prediction\nModel-Ensemble Trust-Region Policy Optimization\nNeural Networks Should Be Wide Enough to Learn Disconnected Decision  Regions\nModel-Based Value Estimation for Efficient Model-Free Reinforcement  Learning\nIntegrating Human-Provided Information Into Belief State Representation  Using Dynamic Factorization\nLearning Longer-term Dependencies in RNNs with Auxiliary Losses\nModeling reverse thinking for machine learning\nTowards Cooperation in Sequential Prisoner's Dilemmas: a Deep Multiagent  Reinforcement Learning Approach\nFacial Expression Recognition Based on Complexity Perception  Classification Algorithm\nLearning Flexible and Reusable Locomotion Primitives for a Microrobot\nRepresentation Learning in Partially Observable Environments using  Sensorimotor Prediction\nQ-CP: Learning Action Values for Cooperative Planning\nThe 
Power Mean Laplacian for Multilayer Graph Clustering\nComposable Planning with Attributes\nSemi-Supervised Online Structure Learning for Composite Event  Recognition\nHierarchical Imitation and Reinforcement Learning\nSemi-parametric Topological Memory for Navigation\nGesture-based Piloting of an Aerial Robot using Monocular Vision\nUnsupervised Learning of Goal Spaces for Intrinsically Motivated Goal  Exploration\nEstimating Total Search Space Size for Specific Piece Sets in Chess\nEssentially No Barriers in Neural Network Energy Landscape\nImpact of Biases in Big Data\nMulti-Instance Dynamic Ordinal Random Fields for Weakly-supervised  Facial Behavior Analysis\nUnderstanding the Loss Surface of Neural Networks for Binary  Classification\nOptimization with Gradient-Boosted Trees and Risk Control\nMulti-Agent Imitation Learning for Driving Simulation\nAn Ensemble Framework of Voice-Based Emotion Recognition System for  Films and TV Programs\nOn the Power of Over-parametrization in Neural Networks with Quadratic  Activation\nA Swift Heuristic Method for Work Order Scheduling under the  Skilled-Workforce Constraint\nAn Empirical Evaluation of Generic Convolutional and Recurrent Networks  for Sequence Modeling\nOn Cognitive Preferences and the Interpretability of Rule-based Models\nImproving Multi-Step Traffic Flow Prediction\nLocalization under Topological Uncertainty for Lane Identification of  Autonomous Vehicles\nExploring Novel Game Spaces with Fluidic Games\nA real-time rule-based system for bridge management based on CART  decision tree and SMO algorithms\nDAGs with NO TEARS: Smooth Optimization for Structure Learning\nN-body Networks: a Covariant Hierarchical Neural Network Architecture  for Learning Atomic Potentials\nOne-Class Adversarial Nets for Fraud Detection\nTowards Automatic & Personalised Mobile Health Interventions: An  Interactive Machine Learning Perspective\nROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization  Tasks\nExplain 
Yourself: A Natural Language Interface for Scrutable Autonomous  Robots\nOptimal Stochastic Package Delivery Planning with Deadline: A  Cardinality Minimization in Routing\nSynthesizing Neural Network Controllers with Probabilistic Model based  Reinforcement Learning\nAnnotation Artifacts in Natural Language Inference Data\nSmoothed Action Value Functions for Learning Gaussian Policies\nVisual Explanations From Deep 3D Convolutional Neural Networks for  Alzheimer's Disease Classification\nObject cosegmentation using deep Siamese network\nMulti-Channel Pyramid Person Matching Network for Person  Re-Identification\nGPSP: Graph Partition and Space Projection based Approach for  Heterogeneous Network Embedding\nExtracting Action Sequences from Texts Based on Deep Reinforcement  Learning\nGenerating Contradictory, Neutral, and Entailing Sentences\nOntoWind: An Improved and Extended Wind Energy Ontology\nAccelerated Methods for Deep Reinforcement Learning\nSever: A Robust Meta-Algorithm for Stochastic Optimization\nSatisficing in Time-Sensitive Bandit Learning\nAn efficient framework for learning sentence representations\nSimultaneous Task Allocation and Planning Under Uncertainty\nSA-IGA: A Multiagent Reinforcement Learning Method Towards Socially  Optimal Outcomes\nCompositional Attention Networks for Machine Reasoning\nConcise Fuzzy Representation of Big Graphs: a Dimensionality Reduction  Approach\nFeudal Reinforcement Learning for Dialogue Management in Large Domains\nDeep Neural Network Compression with Single and Multiple Level  Quantization\nValuing knowledge, information and agency in Multi-agent Reinforcement  Learning: a case study in smart buildings\nA New Model for Evaluating Range-Based Anomaly Detection Algorithms\nEvolutionary Architecture Search For Deep Multitask Networks\nARMDN: Associative and Recurrent Mixture Density Networks for eRetail  Demand Forecasting\nA Deep Learning Based Behavioral Approach to Indoor Autonomous  Navigation\nConcept2vec: 
Metrics for Evaluating Quality of Embeddings for  Ontological Concepts\nMeasuring Conflict in a Multi-Source Environment as a Normal Measure\nOn Cryptographic Attacks Using Backdoors for SAT\nHierarchical Reinforcement Learning: Approximating Optimal Discounted  TSP Using Local Policies\nAn Agent-Based Simulation of Residential Location Choice of Tenants in  Tehran, Iran\nImpacts of transport development on residence choice of renter  households: An agent-based evaluation\nLearning to Explore with Meta-Policy Gradient\nFeature extraction without learning in an analog Spatial Pooler  memristive-CMOS circuit design of Hierarchical Temporal Memory\nLearning to Play General Video-Games via an Object Embedding Network\nKnowledge-based Recurrent Attentive Neural Network for Traffic Sign  Detection\nComplex activity patterns generated by short-term synaptic plasticity\nImitation Learning with Concurrent Actions in 3D Games\nAveraging Weights Leads to Wider Optima and Better Generalization\nThink you have Solved Question Answering? 
Try ARC, the AI2 Reasoning  Challenge\nChallenges in Discriminating Profanity from Hate Speech\nRearrangement with Nonprehensile Manipulation Using Deep Reinforcement  Learning\nFeature Distillation: DNN-Oriented JPEG Compression Against Adversarial  Examples\nToolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey  and Future Directions\nUnraveling Go gaming nature by Ising Hamiltonian and common fate graphs:  tactics and statistics\nBeyond Patient Monitoring: Conversational Agents Role in Telemedicine &  Healthcare Support For Home-Living Elderly Individuals\nA Meaning-based Statistical English Math Word Problem Solver\nVulnerability of Deep Learning\nSome HCI Priorities for GDPR-Compliant Machine Learning\nSnap Machine Learning\nLearning to Cluster for Proposal-Free Instance Segmentation\nArgumentation theory for mathematical argument\nLearning recurrent dynamics in spiking networks\nThe Web as a Knowledge-base for Answering Complex Questions\nComputing and Testing Pareto Optimal Committees\nBatched quantum state exponentiation and quantum Hebbian learning\nSimple random search provides a competitive approach to reinforcement  learning\nAutomated Curriculum Learning by Rewarding Temporally Rare Events\nNeural Text Generation: Past, Present and Beyond\nEnglish-Catalan Neural Machine Translation in the Biomedical Domain  through the cascade approach\nAttention-based Temporal Weighted Convolutional Neural Network for  Action Recognition\nOntology-Based Reasoning about the Trustworthiness of Cyber-Physical  Systems\nEnslaving the Algorithm: From a \"Right to an Explanation\" to a \"Right to  Better Decisions\"?\nInference in Probabilistic Graphical Models by Graph Neural Networks\nLook Before You Leap: Bridging Model-Free and Model-Based Reinforcement  Learning for Planned-Ahead Vision-and-Language Navigation\nSpeech Emotion Recognition Considering Local Dynamic Features\nEmergence of grid-like representations by training recurrent neural  networks to 
perform spatial localization\nLearning and Recognizing Human Action from Skeleton Movement with Deep  Residual Neural Networks\nMulti-view Metric Learning in Vector-valued Kernel Spaces\nExpeditious Generation of Knowledge Graph Embeddings\nOn-demand Relational Concept Analysis\nScalable Generalized Dynamic Topic Models\nScan transcription of two-dimensional shapes as an alternative  neuromorphic concept\nStacked Cross Attention for Image-Text Matching\nRobust Blind Deconvolution via Mirror Descent\nLearning the Localization Function: Machine Learning Approach to  Fingerprinting Localization\nLearning-based Model Predictive Control for Safe Exploration and  Reinforcement Learning\nA framework for Culture-aware Robots based on Fuzzy Logic\nStructured Output Learning with Abstention: Application to Accurate  Opinion Prediction\nThe Rapidly Changing Landscape of Conversational Agents\nDeep Reinforcement Learning with Model Learning and Monte Carlo Tree  Search in Minecraft\nTowards Universal Representation for Unseen Action Recognition\nText2Shape: Generating Shapes from Natural Language by Learning Joint  Embeddings\nNeuronal Circuit Policies\nFoundations of Prescriptive Process Monitoring\nFrom Random Differential Equations to Structural Causal Models: the  stochastic case\n2CoBel : An Efficient Belief Function Extension for Two-dimensional  Continuous Spaces\nA mosaic of Chu spaces and Channel Theory with applications to Object  Identification and Mereological Complexity\nDeepMood: Modeling Mobile Phone Typing Dynamics for Mood Detection\nLayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image\nFace Recognition with Hybrid Efficient Convolution Algorithms on FPGAs\nDatasheets for Datasets\nMulti-range Reasoning for Machine Comprehension\nA Resourceful Reframing of Behavior Trees\ncode2vec: Learning Distributed Representations of Code\nConnectionist Recommendation in the Wild\nAccelerating Empowerment Computation with UCT Tree Search\nMLE-induced 
Likelihood for Markov Random Fields\nImage Semantic Transformation: Faster, Lighter and Stronger\nReinforcement Learning for Fair Dynamic Pricing\nAutomated Speed and Lane Change Decision Making using Deep Reinforcement  Learning\nDeepJDOT: Deep Joint distribution optimal transport for unsupervised  domain adaptation\nComprehending Real Numbers: Development of Bengali Real Number Speech  Corpus\nForward-Backward Reinforcement Learning\nNeuroevolution for RTS Micro\nWhat deep learning can tell us about higher cognitive functions like  mindreading?\nBundled fragments of first-order modal logic: (un)decidability\nThe fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset,  task and baselines\nLearning to Become an Expert: Deep Networks Applied To Super-Resolution  Microscopy\nBest arm identification in multi-armed bandits with delayed feedback\nA Review of Literature on Parallel Constraint Solving\nA real-time warning system for rear-end collision based on random forest  classifier\nModified SMOTE Using Mutual Information and Different Sorts of Entropies\nActor-Critic based Training Framework for Abstractive Summarization\n3D Consistent Biventricular Myocardial Segmentation Using Deep Learning  for Mesh Generation\nTwo can play this Game: Visual Dialog with Discriminative Question  Generation and Answering\nWelfare Without Taxation - Autonomous production revenues for Universal  Basic Income\nRegularizing RNNs for Caption Generation by Reconstructing The Past with  The Present\n3D Pose Estimation and 3D Model Retrieval for Objects in the Wild\nLearning to Anonymize Faces for Privacy Preserving Action Detection\nVisual Robot Task Planning\nEfficient Encodings of Conditional Cardinality Constraints\nModeling Individual Differences in Game Behavior using HMM\nAttentional Multilabel Learning over Graphs: A Message Passing Approach\nAggregated Momentum: Stability Through Passive Damping\nCuriosity-driven Exploration for Mapless Navigation with Deep  
Reinforcement Learning\nRegional Priority Based Anomaly Detection using Autoencoders\nTowards Explanation of DNN-based Prediction with Guided Feature  Inversion\nPredictions of short-term driving intention using recurrent neural  network on sequential data\nInvestigating Capsule Networks with Dynamic Routing for Text  Classification\nSpecification-Driven Multi-Perspective Predictive Business Process  Monitoring (Extended Version)\nUniversal Planning Networks\nUnsupervised Learning of Sequence Representations by Autoencoders\nCIKM AnalytiCup 2017 Lazada Product Title Quality Challenge An Ensemble  of Deep and Shallow Learning to predict the Quality of Product Titles\nTransferring Common-Sense Knowledge for Object Detection\nUnsupervised Geometry-Aware Representation for 3D Human Pose Estimation\nNeural-Guided Deductive Search for Real-Time Program Synthesis from  Examples\nNegPSpan: efficient extraction of negative sequential patterns with  embedding constraints\nClinical Concept Embeddings Learned from Massive Sources of Medical Data\nAbstractive Tabular Dataset Summarization via Knowledge Base Semantic  Embeddings\nStochastic Adversarial Video Prediction\nHypertree Decompositions Revisited for PGMs\nVariational Rejection Sampling\nThe Kanerva Machine: A Generative Distributed Memory\nEnd-to-End Saliency Mapping via Probability Distribution Prediction\nA Human Mixed Strategy Approach to Deep Reinforcement Learning\nA Survey of Miss-Ratio Curve Construction Techniques\nPedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and  Beyond\nNoise-resistant Deep Learning for Object Classification in 3D Point  Clouds Using a Point Pair Descriptor\nReinforcement Learning based QoS/QoE-aware Service Function Chaining in  Software-Driven 5G Slices\nEnd-to-End Learning of Communications Systems Without a Channel Model\nComparing Dependencies in Probability Theory and General Rough Sets:  Part-A\nCompositional Obverter Communication Learning From Raw Visual 
Input\nPredictive Process Monitoring Methods: Which One Suits Me Best?\nProgrammatically Interpretable Reinforcement Learning\nEfficient Reciprocal Collision Avoidance between Heterogeneous Agents  Using CTMAT\nHindsight is Only 50/50: Unsuitability of MDP based Approximate POMDP  Solvers for Multi-resolution Information Gathering\nA Proposal of Interactive Growing Hierarchical SOM\nFast Conditional Independence Test for Vector Variables with Large  Sample Sizes\nActive Mini-Batch Sampling using Repulsive Point Processes\nA Generation Method of Immunological Memory in Clonal Selection  Algorithm by using Restricted Boltzmann Machines\nPolicy Gradient With Value Function Approximation For Collective  Multiagent Planning\nA theory of consciousness: computation, algorithm, and neurobiological  realization\nA review of possible effects of cognitive biases on interpretation of  rule-based machine learning models\nLeveraging Intra-User and Inter-User Representation Learning for  Automated Hate Speech Detection\nData2Vis: Automatic Generation of Data Visualizations Using  Sequence-to-Sequence Recurrent Neural Networks\nSegmentation of Multiple Sclerosis lesion in brain MR images using Fuzzy  C-Means\nQA4IE: A Question Answering based Framework for Information Extraction\nA Hierarchical Latent Structure for Variational Conversation Modeling\nExploring Disentangled Feature Representation Beyond Face Identification\nTowards Training Probabilistic Topic Models on Neuromorphic Multi-chip  Systems\nIntroduction to Iltis: An Interactive, Web-Based System for Teaching  Logic\nPersonalization of Health Interventions using Cluster-Based  Reinforcement Learning\nUnderstanding disentangling in $β$-VAE\nUniversal Successor Representations for Transfer Reinforcement Learning\nCoT: Cooperative Training for Generative Modeling\nA Variable Neighborhood Search for Flying Sidekick Traveling Salesman  Problem\nIncremental Predictive Process Monitoring: How to Deal with the  Variability of 
Real Environments\nReasoning about Safety of Learning-Enabled Components in Autonomous  Cyber-physical Systems\nEmergent Communication through Negotiation\nEmergence of Linguistic Communication from Referential Games with  Symbolic and Pixel Input\nDORA The Explorer: Directed Outreaching Reinforcement Action-Selection\nPredicting Twitter User Socioeconomic Attributes with Network and  Language Information\nMarket Making via Reinforcement Learning\nExploiting Task-Oriented Resources to Learn Word Embeddings for Clinical  Abbreviation Expansion\nAdafactor: Adaptive Learning Rates with Sublinear Memory Cost\nIncomplete Contracting and AI Alignment\nSTAIR Actions: A Video Dataset of Everyday Home Actions\nGlobal SNR Estimation of Speech Signals using Entropy and Uncertainty  Estimates from Dropout Networks\nRegularized Greedy Column Subset Selection\nCubeNet: Equivariance to 3D Rotation and Translation\nOutline Objects using Deep Reinforcement Learning\nDiscovery and usage of joint attention in images\nCERES: Distantly Supervised Relation Extraction from the Semi-Structured  Web\nOnline Fall Detection using Recurrent Neural Networks\nMonitoring and Executing Workflows in Linked Data Environments\nIntelligent Probabilistic Inference\nAn Introduction to Collective Intelligence\nChannel-Independent and Sensor-Independent Stimulus Representations\nThe Cyborg Astrobiologist: First Field Experience\nField geology with a wearable computer: 1st results of the Cyborg  Astrobiologist System\nNurse Rostering with Genetic Algorithms\nSurvey on Various Gesture Recognition Techniques for Interfacing  Machines Based on Ambient Intelligence\nSNF Project Locomotion: Final report 2009-2010\nSNF Project Locomotion: Progress report 2008-2009\nDiscovering Stock Price Prediction Rules of Bombay Stock Exchange Using  Rough Fuzzy Multi Layer Perception Networks\nGalactic-scale macro-engineering: Looking for signs of other intelligent  species, as an exercise in hope for our own\nSemantic 
Enrichment of Mobile Phone Data Records Using Background  Knowledge\nAutonomics: an autonomous and intelligent economic platform and next  generation money tool\nPersonalized and situation-aware multimodal route recommendations: the  FAVOUR algorithm\nGeometry of Interest (GOI): Spatio-Temporal Destination Extraction and  Partitioning in GPS Trajectory Data\nVisual Dialog\nPredictive modeling of die filling of the pharmaceutical granules using  the flexible neural tree\nFast YOLO: A Fast You Only Look Once System for Real-time Embedded  Object Detection in Video\nSim-To-Real Optimization Of Complex Real World Mobile Network with  Imperfect Information via Deep Reinforcement Learning from Self-play\nAutomatic Recognition of Space-Time Constellations by Learning on the  Grassmann Manifold (Extended Version)\nThe Biological Concept of Neoteny in Evolutionary Colour Image  Segmentation - Simple Experiments in Simple Non-Memetic Genetic Algorithms\nUnderstanding the Social Cascading of Geekspeak and the Upshots for  Social Cognitive Systems\nThe \"Wow! 
signal\" of the terrestrial genetic code\nA DDoS-Aware IDS Model Based on Danger Theory and Mobile Agents\nDistinguishing cause from effect using observational data: methods and  benchmarks\nSENNS: Sparse Extraction Neural NetworkS for Feature Extraction\nGoogle's Multilingual Neural Machine Translation System: Enabling  Zero-Shot Translation\nDevelopment and application of a machine learning supported methodology  for measurement and verification (M&V) 2.0\nThe Tsetlin Machine - A Game Theoretic Bandit Driven Approach to Optimal  Pattern Recognition with Propositional Logic\nAn Improved Search Algorithm for Optimal Multiple-Sequence Alignment\nFormal Model of Uncertainty for Possibilistic Rules\nAn Anytime Algorithm for Optimal Coalition Structure Generation\nProbabilistic and Non-Monotonic Inference\nA General Non-Probabilistic Theory of Inductive Reasoning\nGraphs in machine learning: an introduction\nSwarms, Phase Transitions, and Collective Intelligence\nNeural network design for J function approximation in dynamic  programming\nWeak subsumption Constraints for Type Diagnosis: An Incremental  Algorithm\nThe Effect of Resource Limits and Task Complexity on Collaborative  Planning in Dialogue\nA Framework for Natural Language Interfaces to Temporal Databases\nFASTUS: A Cascaded Finite-State Transducer for Extracting Information  from Natural-Language Text\nTime, Tense and Aspect in Natural Language Database Interfaces\nDiscovery of Linguistic Relations Using Lexical Attraction\nFrom spin glasses to hard satisfiable formulas\nFirst-Order Conditional Logic Revisited\nSimilarity-Based Models of Word Cooccurrence Probabilities\nHypertree Decompositions and Tractable Queries\nAn Empirical Approach to Temporal Reference Resolution (journal version)\nMinimum Description Length Induction, Bayesianism, and Kolmogorov  Complexity\nThe \"Fodor\"-FODOR fallacy bites back\nAn Algebraic Programming Style for Numerical Software and its  Optimization\nGeneral Principles of 
Learning-Based Multi-Agent Systems\nAutomatic Generation of Constraint Propagation Algorithms for Small  Finite Domains\nKnowledge in Multi-Agent Systems: Initial Configurations and Broadcast\nComputing large and small stable models\nSLT-Resolution for the Well-Founded Semantics\nOn the tractable counting of theory models and its application to belief  revision and truth maintenance\nLinear Tabulated Resolution Based on Prolog Control Strategy\nA Consistency-Based Model for Belief Change: Preliminary Report\nDetecting Unsolvable Queries for Definite Logic Programs\nConstraint Programming viewed as Rule-based Programming\nDATALOG with constraints - an answer-set programming system\nConstraint Exploration and Envelope of Simulation Trajectories\nUsing Learning-based Filters to Detect Rule-based Filtering Obsolescence\nTwo Steps Feature Selection and Neural Network Classification for the  TREC-8 Routing\nNoise-Tolerant Learning, the Parity Problem, and the Statistical Query  Model\nCreativity and Delusions: A Neurocomputational Approach\nBelief Revision: A Critique\nOn Properties of Update Sequences Based on Causal Rejection\nArc consistency for soft constraints\nComputing Preferred Answer Sets by Meta-Interpretation in Answer Set  Programming\nCollusion in Unrepeated, First-Price Auctions with an Uncertain Number  of Participants\nA Modal Logic Framework for Multi-agent Belief Fusion\nLinear Programming helps solving large multi-unit combinatorial auctions\nNonmonotonic Logics and Semantics\nA Framework for Compiling Preferences in Logic Programs\nComplexity of Manipulating Elections with Few Candidates\nModeling Complex Domains of Actions and Change\nA Polynomial Translation of Logic Programs with Nested Expressions into  Disjunctive Logic Programs: Preliminary Report\nThe partition semantics of questions, syntactically\nExtremal Optimization: an Evolutionary Local-Search Algorithm\nThe DLV System for Knowledge Representation and Reasoning\nVanquishing the XCB 
Question: The Methodology Discovery of the Last  Shortest Single Axiom for the Equivalential Calculus\nPropositional satisfiability in declarative programming\nTheoretical Analyses of Cross-Validation Error and Voting in  Instance-Based Learning\nMany Hard Examples in Exact Phase Transitions with Application to  Generating Hard Satisfiable Instances\nUnfolding Partiality and Disjunctions in Stable Model Semantics\nA Neural Network Assembly Memory Model with Maximum-Likelihood Recall  and Recognition Properties\nComplex Systems\nCluster-based Specification Techniques in Dempster-Shafer Theory\nOn rho in a Decision-Theoretic Apparatus of Dempster-Shafer Theory\nUpdating beliefs with incomplete observations\nOn the Existence and Convergence Computable Universal Priors\nMinimum Model Semantics for Logic Programs with Negation-as-Failure\nComplexity of Determining Nonemptiness of the Core\nUniversal Voting Protocol Tweaks to Make Manipulation Hard\nBridging the gap between modal temporal logics and constraint-based QSR  as an ALC(D) spatio-temporalisation with weakly cyclic TBoxes\nA ternary Relation Algebra of directed lines\nNeural realisation of the SP theory: cell assemblies revisited\nCoherent Keyphrase Extraction via Web Mining\nA Neural Network Assembly Memory Model Based on an Optimal Binary Signal  Detection Theory\nClustering by compression\nComputational complexity and simulation of rare events of Ising spin  glasses\nDistribution of Mutual Information from Complete and Incomplete Data\nA Simple Proportional Conflict Redistribution Rule\nOutlier Detection by Logic Programming\nNormal forms for Answer Sets Programming\nComparing Multi-Target Trackers on Different Force Unit Levels\nImage Colour Segmentation by Genetic Algorithms\nOn Image Filtering, Noise and Morphological Size Intensity Diagrams\nOn the existence of stable models of non-stratified logic programs\nFrom truth to computability II\nTowards Automated Integration of Guess and Check Programs in 
Answer Set  Programming: A Meta-Interpreter and Applications\nGenerating Hard Satisfiable Formulas by Hiding Solutions Deceptively\nThe Bayesian Decision Tree Technique with a Sweeping Strategy\nSummarization from Medical Documents: A Survey\nEstimating Classification Uncertainty of Bayesian Decision Tree  Technique on Financial Data\nComparison of the Bayesian and Randomised Decision Tree Ensembles within  an Uncertainty Envelope Technique\nKnowledge Representation Issues in Semantic Graphs for Relationship  Detection\nTemporal and Spatial Data Mining with Second-Order Hidden Models\nA Unified Subspace Outlier Ensemble Framework for Outlier Detection in  High Dimensional Spaces\nPreferential and Preferential-discriminative Consequence relations\nCompetitive on-line learning with a convex loss function\nDeriving a Stationary Dynamic Bayesian Network from a Logic Program with  Recursive Loops\nConjunctive Query Containment and Answering under Description Logics  Constraints\nMeasuring Semantic Similarity by Latent Relational Analysis\nMAP estimation via agreement on (hyper)trees: Message-passing and linear  programming\nDoes a Plane Imitate a Bird? 
Does Computer Vision Have to Follow  Biological Paradigms?\nIdentifying Interaction Sites in \"Recalcitrant\" Proteins: Predicted  Protein and Rna Binding Sites in Rev Proteins of Hiv-1 and Eiav Agree with  Experimental Data\nBranch-and-Prune Search Strategies for Numerical Constraint Solving\nMinimum Cost Homomorphisms to Proper Interval Graphs and Bigraphs\nOpen Answer Set Programming with Guarded Programs\nNew results on rewrite-based satisfiability procedures\nRetraction and Generalized Extension of Computing with Words\nA Knowledge-Based Approach for Selecting Information Sources\nModal Logics of Topological Relations\nSupervisory Control of Fuzzy Discrete Event Systems: A Formal Approach\nUnderstanding Design Fundamentals: How Synthesis and Analysis Drive  Creativity, Resulting in Emergence\nExpressing Implicit Semantic Relations without Supervision\nSearching for Globally Optimal Functional Forms for Inter-Atomic  Potentials Using Parallel Tempering and Genetic Programming\nAutomated verification of weak equivalence within the SMODELS system\nHigher-Order Termination: from Kruskal to Computability\nSensor Scheduling for Optimal Observability Using Estimation Entropy\nSemantic results for ontic and epistemic change\nDecentralized Failure Diagnosis of Stochastic Discrete Event Systems\nA Logical Approach to Efficient Max-SAT solving\nFuzzy Logic Classification of Imaging Laser Desorption Fourier Transform  Mass Spectrometry Data\nA Neutrosophic Description Logic\nOn the Benefits of Inoculation, an Example in Train Scheduling\nInteractive Configuration by Regular String Constraints\nTruncating the loop series expansion for Belief Propagation\nGeneric Global Constraints based on MDDs\nAxiomatic Theory of Algorithms: Computability and Decidability in  Algorithmic Classes\nGenerating Functions For Kernels of Digraphs (Enumeration & Asymptotics  for Nim Games)\nAttribute Exploration of Discrete Temporal Transitions\nQuantum Computer as a Probabilistic Inference 
Engine\nEntangled Quantum Networks\nMarkovian Entanglement Networks\nA study of structural properties on profiles HMMs\nClustering Co-occurrence of Maximal Frequent Patterns in Streams\nClustering with Lattices in the Analysis of Graph Patterns\nVirtual Sensor Based Fault Detection and Classification on a Plasma Etch  Reactor\nA preliminary analysis on metaheuristics methods applied to the  Haplotype Inference Problem\nUsing RDF to Model the Structure and Process of Systems\nEfficient Tabling Mechanisms for Transaction Logic Programs\nOn the deduction of galaxy abundances with evolutionary neural networks\nFitness landscape of the cellular automata majority problem: View from  the Olympus\nLagrangian Relaxation for MAP Estimation in Graphical Models\nFuzzy Modeling of Electrical Impedance Tomography Image of the Lungs\nDiscriminated Belief Propagation\nPerformance Bounds for Lambda Policy Iteration and Application to the  Game of Tetris\nTowards a Sound Theory of Adaptation for the Simple Genetic Algorithm\nDimensionality Reduction and Reconstruction using Mirroring Neural  Networks and Object Recognition based on Reduced Dimension Characteristic  Vector\nAutomatic Pattern Classification by Unsupervised Learning Using  Dimensionality Reduction of Data with Mirroring Neural Networks\nA Common View on Strong, Uniform, and Other Notions of Equivalence in  Answer-Set Programming\nSequential operators in computability logic\nTRUST-TECH based Methods for Optimization and Learning\nLe terme et le concept : fondements d'une ontoterminologie\nDesign and Implementation of Aggregate Functions in the DLV System\nAutomated Termination Proofs for Logic Programs by Term Rewriting\nA $O(\\log m)$, deterministic, polynomial-time computable approximation  of Lewis Carroll's scoring rule\nOn Kernelization of Supervised Mahalanobis Distance Learners\nAn Analysis of Key Factors for the Success of the Communal Management of  Knowledge\nLogic Mining Using Neural Networks\nTowards applied 
theories based on computability logic\nConstructing Folksonomies from User-specified Relations on Flickr\nFeature Selection for Bayesian Evaluation of Trauma Death Risk\nThe end of Sleeping Beauty's nightmare\nUnveiling the mystery of visual information processing in human brain\nMessage-passing for Maximum Weight Independent Set\nElectricity Demand and Energy Consumption Management System\nApproximating acyclicity parameters of sparse hypergraphs\nMining Meaning from Wikipedia\nAchieving compositionality of the stable model semantics for Smodels  programs\nAn Evidential Path Logic for Multi-Relational Networks\nExperimental Evidence for Quantum Structure in Cognition\nClassical Logical versus Quantum Conceptual Thought: Examples in  Economics, Decision theory and Concept Theory\nEmbedding Non-Ground Logic Programs into Autoepistemic Logic for  Knowledge Base Combination\nModeling Social Annotation: a Bayesian Approach\nLearning Class-Level Bayes Nets for Relational Data\nPhysics of risk and uncertainty in quantum decision making\nA New Clustering Algorithm Based Upon Flocking On Complex Network\nApproximate inference on planar graphs using Loop Calculus and Belief  Propagation\nImprovements of real coded genetic algorithms based on differential  operators preventing premature convergence\nSyntactic Confluence Criteria for Positive/Negative-Conditional Term  Rewriting Systems\nA Self-Contained and Easily Accessible Discussion of the Method of  Descente Infinie and Fermat's Only Explicitly Known Proof by Descente Infinie\nLearning DTW Global Constraint for Time Series Classification\nModeling the Experience of Emotion\nOn Requirements for Programming Exercises from an E-learning Perspective\nComplexity of Terminating Preference Elicitation\nSwitcher-random-walks: a cognitive-inspired mechanism for network  exploration\nFully Automated Approaches to Analyze Large-Scale Astronomy Survey Data\nCP-logic: A Language of Causal Probabilistic Events and Its Relation to  
Logic Programming\nKiWi: A Scalable Subspace Clustering Algorithm for Gene Expression  Analysis\nFast Algorithms for Mining Interesting Frequent Itemsets without Minimum  Support\nToggling operators in computability logic\nAutomated Epilepsy Diagnosis Using Interictal Scalp EEG\nCharacterizations of Stable Model Semantics for Logic Programs with  Arbitrary Constraint Atoms\nDo not Choose Representation just Change: An Experimental Study in  States based EA\nA Minimum Description Length Approach to Multitask Feature Selection\nMining Compressed Repetitive Gapped Sequential Patterns Efficiently\nExact Indexing for Massive Time Series Databases under Time Warping  Distance\nRecommender Systems for the Conference Paper Assignment Problem\nOn Chase Termination Beyond Stratification\nConstructive Decision Theory\nSurvival of the flexible: explaining the recent dominance of  nature-inspired optimization within a rapidly evolving world\nStrategic Positioning in Tactical Scenario Planning\nGeneralized Collective Inference with Symmetric Clique Potentials\nRobustness and Adaptiveness Analysis of Future Fleets\nApply Local Clustering Method to Improve the Running Speed of Ant Colony  Optimization\nThe Cost of Stability in Coalitional Games\nOn the Internal Topological Structure of Plane Regions\nResource Matchmaking Algorithm using Dynamic Rough Set in Grid  Environment\nInteractive Data Integration through Smart Copy & Paste\nGreedy Gossip with Eavesdropping\nA Convergent Online Single Time Scale Actor Critic Algorithm\nDealing with incomplete agents' preferences and an uncertain agenda in  group decision making via sequential majority voting\nReduced-Rank Hidden Markov Models\nScaling Analysis of Affinity Propagation\nA Component Based Heuristic Search Method with Evolutionary Eliminations\nAn Evolutionary Squeaky Wheel Optimisation Approach to Personnel  Scheduling\nAlgorithms for Image Analysis and Combination of Pattern Classifiers  with Application to Medical 
Diagnosis\nSum of Us: Strategyproof Selection from the Selectors\nIndustrial-Strength Formally Certified SAT Solving\nMaximin affinity learning of image segmentation\nDifferentially Private Empirical Risk Minimization\nOn Finding Predictors for Arbitrary Families of Processes\nBelieve It or Not: Adding Belief Annotations to Databases\nWhy so? or Why no? Functional Causality for Explaining Query Answers\nWeb-Based Expert System for Civil Service Regulations: RCSES\nComparing Simulation Output Accuracy of Discrete Event and Agent Based  Models: A Quantitive Approach\nA Decidable Class of Nested Iterated Schemata (extended version)\nFace Recognition by Fusion of Local and Global Matching Scores using DS  Theory: An Evaluation with Uni-classifier and Multi-classifier Paradigm\nSIFT-based Ear Recognition by Fusion of Detected Keypoints from Color  Similarity Slice Regions\nDetecting Motifs in System Call Sequences\nEstablishment of Relationships between Material Design and Product  Design Domains by Hybrid FEM-ANN Technique\nImplementation of an Innovative Bio Inspired GA and PSO Algorithm for  Controller design considering Steam GT Dynamics\nMessage-Passing Algorithms: Reparameterizations and Splittings\nRedundancy, Deduction Schemes, and Minimum-Size Bases for Association  Rules\nA Contextual-Bandit Approach to Personalized News Article Recommendation\nIndexer Based Dynamic Web Services Discovery\nHandwritten Arabic Numeral Recognition using a Multi Layer Perceptron\nThe role of semantics in mining frequent patterns from knowledge bases  in description logics with rules\nAgreement Maintenance Based on Schema and Ontology Change in P2P  Environment\nModelling and simulating retail management practices: a first approach\nOptimisation of a Crossdocking Distribution Centre Simulation Model\nA Formal Approach to Modeling the Memory of a Living Organism\nAdaptive Submodularity: Theory and Applications in Active Learning and  Stochastic Optimization\nIntegrating Real-Time 
Analysis With The Dendritic Cell Algorithm Through  Segmentation\nOn Tsallis Entropy Bias and Generalized Maximum Entropy Models\nBelief Propagation for Min-cost Network Flow: Convergence and  Correctness\nTerrorism Event Classification Using Fuzzy Inference Systems\nGenetic Algorithms for Multiple-Choice Problems\nDecision Support Systems (DSS) in Construction Tendering Processes\nParcellation of fMRI Datasets with ICA and PLS-A Data Driven Approach\nIntelligent System for Speaker Identification using Lip features with  PCA and ICA\nDiscrete geometric analysis of message passing algorithm on graphs\nFeature Selection with Conjunctions of Decision Stumps and Learning from  Microarray Data\nScalable Probabilistic Databases with Factor Graphs and MCMC\nCombining Naive Bayes and Decision Tree for Adaptive Intrusion Detection\nMétodos para la Selección y el Ajuste de Características en  el Problema de la Detección de Spam\nUncovering the Riffled Independence Structure of Rankings\nBegin, After, and Later: a Maximal Decidable Interval Temporal Logic\nModelling Reactive and Proactive Behaviour in Simulation\nPAC learnability of a concept class under non-atomic measures: a problem  by Vidyasagar\nSoft Approximations and uni-int Decision Making\nApproximate Counting for Complex-Weighted Boolean Constraint  Satisfaction Problems\nOn The Complexity and Completeness of Static Constraints for Breaking  Row and Column Symmetry\nAn axiomatic formalization of bounded rationality based on a  utility-information equivalence\nAn svm multiclassifier approach to land cover mapping\nLogic-Based Decision Support for Strategic Environmental Assessment\nNew Results for the MAP Problem in Bayesian Networks\nA Program-Level Approach to Revising Logic Programs under the Answer Set  Semantics\nStable marriage problems with quantitative preferences\nComparison Of Modified Dual Ternary Indexing And Multi-Key Hashing  Algorithms For Music Information Retrieval\nA Homogeneous Reaction Rule 
Language for Complex Event Processing\nApproximate Judgement Aggregation\nNESVM: a Fast Gradient Method for Support Vector Machines\nExperimental Evaluation of Branching Schemes for the CSP\nGaussian Process Bandits for Tree Search: Theory and Application to  Planning in Discounted MDPs\nThe Complexity of Causality and Responsibility for Query Answers and  non-Answers\nMultiplex Structures: Patterns of Complexity in Real-World Networks\nOn the Doubt about Margin Explanation of Boosting\nModeling and Analyzing Adaptive User-Centric Systems in Real-Time Maude\nAn Algebraic Study of Bilattice-based Logics\nMining Knowledge in Astrophysical Massive Data Sets\nGrounded Symbols in the Brain Computational Foundations for Perceptual  Symbol System\nAnalysing the behaviour of robot teams through relational sequential  pattern mining\nPredictive State Temporal Difference Learning\nSupervised Random Walks: Predicting and Recommending Links in Social  Networks\nBiologically Inspired Design Principles for Scalable, Robust, Adaptive,  Decentralized Search and Automated Response (RADAR)\nQuantum randomness and free will\nDistributed Graph Coloring: An Approach Based on the Calling Behavior of  Japanese Tree Frogs\nLearning restricted Bayesian network structures\nURSA: A System for Uniform Reduction to SAT\nExperimental Comparison of Representation Methods and Distance Measures  for Time Series Data\nOn the size of data structures used in symbolic model checking\nData Conflict Resolution Using Trust Mappings\nSAPFOCS: a metaheuristic based approach to part family formation  problems in group technology\nDyna-H: a heuristic planning reinforcement learning algorithm applied to  role-playing-game strategy decision systems\nEvolutionary Mechanics: new engineering principles for the emergence of  flexibility in a dynamic and uncertain world\nA Context-theoretic Framework for Compositionality in Distributional  Semantics\nFinding undetected protein associations in cell signaling by 
belief  propagation\nA Human-Centric Approach to Group-Based Context-Awareness\nActive Markov Information-Theoretic Path Planning for Robotic  Environmental Sensing\nSpeeding up SAT solver by exploring CNF symmetries : Revisited\nEmergence through Selection: The Evolution of a Scientific Challenge\nGraph Coalition Structure Generation\nOlogs: a categorical framework for knowledge representation\nDecision Theory with Prospect Interference and Entanglement\nFoundations for Understanding and Building Conscious Systems using  Stable Parallel Looped Dynamics\nBisimulations for fuzzy automata\nA Wiki for Business Rules in Open Vocabulary, Executable English\nA Discrete Evolutionary Model for Chess Players' Ratings\nLanguage, Emotions, and Cultures: Emotional Sapir-Whorf Hypothesis\nReduced Ordered Binary Decision Diagram with Implied Literals: A New  knowledge Compilation Approach\nDecentralized Constraint Satisfaction\nDoubly Robust Policy Evaluation and Learning\nFormal and Computational Properties of the Confidence Boost of  Association Rules\nWhen is social computation better than the sum of its parts?\nCounting Homomorphisms and Partition Functions\nA Simplified and Improved Free-Variable Framework for Hilbert's epsilon  as an Operator of Indefinite Committed Choice\nBackdoors to Tractable Answer-Set Programming\nLearning invariant features through local space contraction\nCombining Ontology Development Methodologies and Semantic Web Platforms  for E-government Domain Ontology Development\nThe Impact of Mutation Rate on the Computation Time of Evolutionary  Dynamic Optimization\nBayesian and L1 Approaches to Sparse Unsupervised Learning\nOn Kinds of Indiscernibility in Logic and Metaphysics\nBelief-propagation algorithm and the Ising model on networks with  arbitrary distributions of motifs\nAcquiring Correct Knowledge for Natural Language Generation\nTheory and Algorithms for Partial Order Based Reduction in Planning\nThe Rate of Convergence of AdaBoost\nConcurrent 
Auctions Across The Supply Chain\nExistence of Multiagent Equilibria with Limited Agents\nThe Influence of Global Constraints on Similarity Measures for  Time-Series Databases\nLocal Optima Networks of NK Landscapes with Neutrality\nHigher Order Programming to Mine Knowledge for a Modern Medical Expert  System\nThe PITA System: Tabling and Answer Subsumption for Reasoning under  Uncertainty\nAn iterative feature selection method for GRNs inference by exploring  topological properties\nControlling wheelchairs by body motions: A learning framework for the  adaptive remapping of space\nExploiting Agent and Type Independence in Collaborative Graphical  Bayesian Games\nSpecifying and Staging Mixed-Initiative Dialogs with Program Generation  and Transformation\nReputation-based Incentive Protocols in Crowdsourcing Applications\nThe Ditmarsch Tale of Wonders - The Dynamics of Lying\nSelectivity in Probabilistic Causality: Drawing Arrows from Inputs to  Stochastic Outputs\nComparing System Dynamics and Agent-Based Simulation for Tumour Growth  and its Interactions with Effector Cells\nMaking Use of Advances in Answer-Set Programming for Abstract  Argumentation Systems\nGeographic Trough Filling for Internet Datacenters\nCosmological parameter estimation using Particle Swarm Optimization  (PSO)\nLexRank: Graph-based Lexical Centrality as Salience in Text  Summarization\nEfficiency versus Convergence of Boolean Kernels for On-Line Learning  Algorithms\nRisk-Sensitive Reinforcement Learning Applied to Control under  Constraints\nLearning where to Attend with Deep Architectures for Image Tracking\nKara: A System for Visualising and Visual Editing of Interpretations for  Answer-Set Programs\nLatent Semantic Learning with Structured Sparse Representation for Human  Action Recognition\nHigher-Order Markov Tag-Topic Models for Tagged Documents and Images\nCooperative Information Sharing to Improve Distributed Learning in  Multi-Agent Systems\nIndividual and Domain Adaptation in 
Sentence Planning for Dialogue\nEmbedding Description Logic Programs into Default Logic\nSemantic-Driven e-Government: Application of Uschold and King Ontology  Building Methodology for Semantic Ontology Models Development\nContextually Guided Semantic Labeling and Search for 3D Point Clouds\nConstraint Satisfaction Tractability from Semi-lattice Operations on  Infinite Sets\nDeveloping Embodied Multisensory Dialogue Agents\nInteractive Character Posing by Sparse Coding\nPbm: A new dataset for blog mining\nA Pareto-metaheuristic for a bi-objective winner determination problem  in a combinatorial reverse auction\nA probabilistic methodology for multilabel classification\nEmpowerment for Continuous Agent-Environment Systems\nOptimization in SMT with LA(Q) Cost Functions\nDecentralized Multi-agent Plan Repair in Dynamic Environments\nAn efficient high-quality hierarchical clustering algorithm for  automatic inference of software architecture from the source code of a  software system\nRefinement Modal Logic\nDynamic Mechanism Design for Markets with Strategic Resources\nStrong Backdoors to Nested Satisfiability\nHybrid Batch Bayesian Optimization\nAlgorithms for Learning Kernels Based on Centered Alignment\nTowards Electronic Shopping of Composite Product\nLearning High-Dimensional Mixtures of Graphical Models\nMulti source feedback based performance appraisal system using Fuzzy  logic decision support system\nRole-Dynamics: Fast Mining of Large Dynamic Networks\nAlgorithms and Complexity Results for Exact Bayesian Structure Learning\nSparse-posterior Gaussian Processes for general likelihoods\nBayesian Parameter Estimation for Latent Markov Random Fields and Social  Networks\nLearning loopy graphical models with latent variables: Efficient methods  and guarantees\nThe Initial Conditions of the Universe from Constrained Simulations\nTransforming Graph Representations for Statistical Relational Learning\nDirected Information Graphs\nKepler Eclipsing Binary Stars. III. 
Classification of Kepler Eclipsing  Binary Light Curves with Locally Linear Embedding\nLeveraging Usage Data for Linked Data Movie Entity Summarization\nLarge-Scale Automatic Labeling of Video Events with Verbs Based on  Event-Participant Interaction\nRobot Navigation using Reinforcement Learning and Slow Feature Analysis\nMesh Learning for Classifying Cognitive Processes\nBayesian Discovery of Linear Acyclic Causal Models\nL2 Regularization for Learning Kernels\nTree Projections and Structural Decomposition Methods: The Power of  Local Consistency and Larger Islands of Tractability\nkLog: A Language for Logical and Relational Learning with Kernels\nForeword: A Computable Universe, Understanding Computation and Exploring  Nature As Computation\nApproximate Equalities on Rough Intuitionistic Fuzzy Sets and an  Analysis of Approximate Equalities\nFuzzy Knowledge Representation Based on Possibilistic and Necessary  Bayesian Networks\nPossibilistic Pertinence Feedback and Semantic Networks for Goal's  Extraction\nConcepts and Their Dynamics: A Quantum-Theoretic Modeling of Human  Thought\nA weighted combination similarity measure for mobility patterns in  wireless networks\nFuzzy Knowledge Representation, Learning and Optimization with Bayesian  Analysis in Fuzzy Semantic Networks\nThe third open Answer Set Programming competition\nConstrained Approximate Maximum Entropy Learning of Markov Random Fields\nModelling local and global phenomena with sparse Gaussian processes\nSimple Regret Optimization in Online Planning for Markov Decision  Processes\nFeature Based Fuzzy Rule Base Design for Image Extraction\nOn the Complexity of Existential Positive Queries\nAnt Robotics: Covering Continuous Domains by Multi-A(ge)nt Systems\nLearning the Experts for Online Sequence Prediction\nModeling Latent Variable Uncertainty for Loss-based Learning\nShift-Invariance Sparse Coding for Audio Classification\nCausal Bounds and Instruments\nDetermining the Number of Non-Spurious Arcs in 
a Learned DAG Model:  Investigation of a Bayesian and a Frequentist Approach\nRelational Approach to Knowledge Engineering for POMDP-based Assistance  Systems as a Translation of a Psychological Model\nThe evolution of representation in simple cognitive networks\nDiscriminative Learning via Semidefinite Probabilistic Models\nMatrix Tile Analysis\nStochastic Optimal Control in Continuous Space-Time Multi-Agent Systems\nA Self-Supervised Terrain Roughness Estimator for Off-Road Autonomous  Driving\nDesign, Evaluation and Analysis of Combinatorial Optimization Heuristic  Algorithms\nConceptual Modelling and The Quality of Ontologies: Endurantism Vs.  Perdurantism\nSet-valued dynamic treatment regimes for competing outcomes\nIsabelle/jEdit --- a Prover IDE within the PIDE framework\nRecovering Articulated Object Models from 3D Range Data\nActive Model Selection\nJoint discovery of haplotype blocks and complex trait associations from  SNP sequences\nProbabilistic index maps for modeling natural signals\nOntology for Cellular Communication\nMeta-Learning of Exploration/Exploitation Strategies: The Multi-Armed  Bandit Case\nSemantic Information Retrieval Using Ontology In University Domain\nToward an Integrated Framework for Automated Development and  Optimization of Online Advertising Campaigns\nOn Finding Optimal Polytrees\nComparison of different T-norm operators in classification problems\nMore than Word Frequencies: Authorship Attribution via Natural Frequency  Zoned Word Distribution Analysis\nContent-based Text Categorization using Wikitology\nMonte Carlo Search Algorithm Discovery for One Player Games\nOptimized Look-Ahead Tree Policies: A Bridge Between Look-Ahead Tree  Policies and Direct Policy Search\nConquering the rating bound problem in neighborhood-based collaborative  filtering: a function recovery approach\nParametric Constructive Kripke-Semantics for Standard Multi-Agent Belief  and Knowledge (Knowledge As Unbiased Belief)\nOn firm specific 
characteristics of pharmaceutical generics and  incentives to permanence under fuzzy conditions\nCultural Algorithm Toolkit for Multi-objective Rule Mining\nRIO: Minimizing User Interaction in Ontology Debugging\nDecision-Theoretic Coordination and Control for Active Multi-Camera  Surveillance in Uncertain, Partially Observable Environments\nLattice structures of fixed points of the lower approximations of two  types of covering-based rough sets\nA Cookbook for Temporal Conceptual Data Modelling with Description  Logics\nSemi-automatic annotation process for procedural texts: An application  on cooking recipes\nTest-cost-sensitive attribute reduction of data with normal distribution  measurement errors\nRevisiting the Training of Logic Models of Protein Signaling Networks  with a Formal Approach based on Answer Set Programming\nLearning Heterogeneous Similarity Measures for Hybrid-Recommendations in  Meta-Mining\nConflict-driven ASP Solving with External Sources\nBayesian Inference with Posterior Regularization and applications to  Infinite Latent SVMs\nReply to Comments on Neuroelectrodynamics: Where are the Real Conceptual  Pitfalls?\nMining Permission Request Patterns from Android and Facebook  Applications (extended author version)\nMulti-view constrained clustering with an incomplete mapping between  views\nInferring the Underlying Structure of Information Cascades\nQuick Summary\nUncertain Congestion Games with Assorted Human Agent Populations\nMarkov Determinantal Point Processes\nA Model-Based Approach to Rounding in Spectral Clustering\nLatent Composite Likelihood Learning for the Structured Canonical  Correlation Model\nLatent Dirichlet Allocation Uncovers Spectral Characteristics of Drought  Stressed Plants\nCreating a level playing field for all symbols in a discretization\nTyped Answer Set Programming and Inverse Lambda Algorithms\nLearning classifier systems with memory condition to solve non-Markov  problems\nParameterized Complexity and Kernel 
Bounds for Hard Planning Problems\nSurprisingly Rational: Probability theory plus noise explains biases in  judgment\nAlgorithm Runtime Prediction: Methods & Evaluation\nLearning using Local Membership Queries\nSecured Wireless Communication using Fuzzy Logic based High Speed  Public-Key Cryptography (FLHSPKC)\nComposite Strategy for Multicriteria Ranking/Sorting (methodological  issues, examples)\nDistributed Non-Stochastic Experts\nConstruction of Energy Functions for Lattice Heteropolymer Models: A  Case Study in Constraint Satisfaction Programming and Adiabatic Quantum  Optimization\nObjective Improvement in Information-Geometric Optimization\nVisualization and clustering by 3D cellular automata: Application to  unstructured data\nTwitterPaul: Extracting and Aggregating Twitter Predictions\nIntrusion Detection on Smartphones\nTACT: A Transfer Actor-Critic Learning Framework for Energy Saving in  Cellular Radio Access Networks\nLearning-Assisted Automated Reasoning with Flyspeck\nEvolutionarily Stable Sets in Quantum Penny Flip Games\nTraining Support Vector Machines Using Frank-Wolfe Optimization Methods\nAutonomous Navigation by Robust Scan Matching Technique\nFoundations of scientific research (Foundations of Research Activities)\nDeciding Monotone Duality and Identifying Frequent Itemsets in Quadratic  Logspace\nBag-of-Words Representation for Biomedical Time Series Classification\nTree Projections and Structural Decomposition Methods: Minimality and  Game-Theoretic Characterization\n1 Billion Pages = 1 Million Dollars? 
Mining the Web to Play \"Who Wants  to be a Millionaire?\"\nPreference-based Graphic Models for Collaborative Filtering\nCollaborative Ensemble Learning: Combining Collaborative and  Content-Based Information Filtering via Hierarchical Bayes\nA Generalized Mean Field Algorithm for Variational Inference in  Exponential Families\nLearning Module Networks\nModeling in OWL 2 without Restrictions\nTaming the Infinite Chase: Query Answering under Expressive Integrity  Constraints\nIncreasing Air Traffic: What is the Problem?\nOnline Learning for Ground Trajectory Prediction\nInteractive Ant Colony Optimisation (iACO) for Early Lifecycle Software  Design\nDistributed optimization of deeply nested systems\nDiscovering Basic Emotion Sets via Semantic Clustering on a Twitter  Corpus\nAutonomously Learning to Visually Detect Where Manipulation Will Succeed\nMaximizing a Nonnegative, Monotone, Submodular Function Constrained to  Matchings\nApplying Strategic Multiagent Planning to Real-World Travel Sharing  Problems\nLearning with Scope, with Application to Information Extraction and  Classification\nAn Information-Theoretic External Cluster-Validity Measure\nProbabilistic entailment in the setting of coherence: The role of quasi  conjunction and inclusion relation\nUtilizing ASP for Generating and Visualizing Argumentation Frameworks\nExtending FO(ID) with Knowledge Producing Definitions: Preliminary  Results\nVerification of Agent-Based Artifact Systems\nCrowd Labeling: a survey\nThe IBMAP approach for Markov networks structure learning\nBayesian Classification and Feature Selection from Finite Data Sets\nA Two-round Variant of EM for Gaussian Mixtures\nThe Anchors Hierachy: Using the triangle inequality to survive high  dimensional data\nCollaborative Filtering by Personality Diagnosis: A Hybrid Memory- and  Model-Based Approach\nWhen you talk about \"Information processing\" what actually do you have  in mind?\nFrom 9-IM Topological Operators to Qualitative Spatial 
Relations using  3D Selective Nef Complexes and Logic Rules for bodies\nDeveloping Parallel Dependency Graph In Improving Game Balancing\nOn Supervised Selection of Bayesian Networks\nEmpirical Analysis of Predictive Algorithms for Collaborative Filtering\nHierarchical Mixtures-of-Experts for Exponential Family Regression  Models with Generalized Linear Mean Functions: A Survey of Approximation and  Consistency Results\nMulti-Robot Informative Path Planning for Active Sensing of  Environmental Phenomena: A Tale of Two Algorithms\nAn Information-Theoretic Analysis of Hard and Soft Assignment Methods  for Clustering\nA solution concept for games with altruism and cooperation\nTowards a theory of good SAT representations\nExploiting Social Tags for Cross-Domain Collaborative Filtering\nGeodesic-based Salient Object Detection\nVariational Algorithms for Marginal MAP\nUsing Modified Partitioning Around Medoids Clustering Technique in  Mobile Network Planning\nArriving on time: estimating travel time distributions on large-scale  road networks\nLearning AMP Chain Graphs and some Marginal Models Thereof under  Faithfulness: Extended Version\nK-Nearest Neighbour algorithm coupled with logistic regression in  medical case-based reasoning systems. 
Application to prediction of access to  the renal transplant waiting list in Brittany\nFairness in Academic Course Timetabling\nA Greedy Approximation of Bayesian Reinforcement Learning with Probably  Optimistic Transition Model\nViterbi training in PRISM\nHeart Disease Prediction System using Associative Classification and  Genetic Algorithm\nA Massively Parallel Associative Memory Based on Sparse Neural Networks\nOn the speed of constraint propagation and the time complexity of arc  consistency testing\nFormalizing the Confluence of Orthogonal Rewriting Systems\nA Community Based Algorithm for Large Scale Web Service Composition\nWhat does mathoverflow tell us about the production of mathematics?\nHigh Level Pattern Classification via Tourist Walks in Networks\nSemantic-based Anomalous Pattern Discovery in Moving Object Trajectories\nThe Dynamically Extended Mind -- A Minimal Modeling Case Study\nInformation-Theoretic Approach to Efficient Adaptive Path Planning for  Mobile Robotic Environmental Sensing\nModelling and Analysing Cargo Screening Processes: A Project Outline\nInvestigating Mathematical Models of Immuno-Interactions with  Early-Stage Cancer under an Agent-Based Modelling Perspective\nUsing a bag of Words for Automatic Medical Image Annotation with a  Latent Semantic\nLoop Calculus and Bootstrap-Belief Propagation for Perfect Matchings on  Arbitrary Graphs\nIs protein folding problem really a NP-complete one ? 
First  investigations\nNew Results on Equilibria in Strategic Candidacy\nFinding Academic Experts on a MultiSensor Approach using Shannon's  Entropy\nThe Rise and Fall of Semantic Rule Updates Based on SE-Models\nA Multi-Engine Approach to Answer Set Programming\nA Decomposition of the Max-min Fair Curriculum-based Course Timetabling  Problem\nComputation of Diet Composition for Patients Suffering from Kidney and  Urinary Tract Diseases with the Fuzzy Genetic System\nActivity Modeling in Smart Home using High Utility Pattern Mining over  Data Streams\nApproximate Bayesian Image Interpretation using Generative Probabilistic  Graphics Programs\nEvidence and plausibility in neighborhood structures\nRouting in Wireless Mesh Networks: Two Soft Computing Based Approaches\nSoft Computing Framework for Routing in Wireless Mesh Networks: An  Integrated Cost Function Approach\nOn Nicod's Condition, Rules of Induction and the Raven Paradox\nProbability Distinguishes Different Types of Conditional Statements\nThe Fundamental Learning Problem that Genetic Algorithms with Uniform  Crossover Solve Efficiently and Repeatedly As Evolution Proceeds\nLearning Markov networks with context-specific independences\nParameterized Complexity Results for Plan Reuse\nModel checking coalitional games in shortage resource scenarios\nNumerical response of the magnetic permeability as a funcion of the  frecuency of NiZn ferrites using Genetic Algorithm\nTime-Series Classification Through Histograms of Symbolic Polynomials\nLevels of Integration between Low-Level Reasoning and Task Planning\nHerding the Crowd: Automated Planning for Crowdsourced Planning\nInducing Honest Reporting Without Observing Outcomes: An Application to  the Peer-Review Process\nCultural Evolution Entails (Creativity Entails (Concept Combination  Entails Quantum Structure))\nMultiobjective Tactical Planning under Uncertainty for Air Traffic Flow  and Capacity Management\nThe multi-vehicle covering tour problem: building 
routes for urban  patrolling\nPartition-Merge: Distributed Inference and Modularity Optimization\nLearning Lambek grammars from proof frames\nToward robust phase-locking in Melibe swim central pattern generator  models\nSAT-based Preprocessing for MaxSAT (extended version)\nA practical approach to ontology-enabled control systems for  astronomical instrumentation\nDissociation and Propagation for Approximate Lifted Inference with  Standard Relational Database Management Systems\nOn the Tractability of Minimal Model Computation for Some CNF Theories\nUnsupervised learning human's activities by overexpressed recognized  non-speech sounds\nStructural Weights in Ontology Matching\nClustering Markov Decision Processes For Continual Transfer\nAnalyzing Evolutionary Optimization in Noisy Environments\nCharacterizing and Extending Answer Set Semantics using Possibility  Theory\nTest Set Selection using Active Information Acquisition for Predictive  Models\nHigh Throughput Virtual Screening with Data Level Parallelism in  Multi-core Processors\nBalancing bike sharing systems (BBSS): instance generation from the  CitiBike NYC data\nMining Malware Specifications through Static Reachability Analysis\nFair assignment of indivisible objects under ordinal preferences\nPredictive User Modeling with Actionable Attributes\nFormal Ontology Learning on Factual IS-A Corpus in English using  Description Logics\nA New Approach to Constraint Weight Learning for Variable Ordering in  CSPs\nProceedings 2nd Workshop on GRAPH Inspection and Traversal Engineering\nQuantitative methods for Phylogenetic Inference in Historical  Linguistics: An experimental case study of South Central Dravidian\nData Smashing\nConstraint Solvers for User Interface Layout\nLearning optimization models in the presence of unknown relations\nSpeeding up SOR Solvers for Constraint-based GUIs with a Warm-Start  Strategy\nGödel, Tarski, Turing and the conundrum of free will\nDoes Restraining End Effect Matter in 
EMD-Based Modeling Framework for  Time Series Prediction? Some Experimental Evidences\nMechanisms for Making Crowds Truthful\nLearning Document-Level Semantic Properties from Free-Text Annotations\nTrust-Based Mechanisms for Robust and Efficient Task Allocation in the  Presence of Execution Uncertainty\nComplex Question Answering: Unsupervised Learning Approaches and  Experiments\nHighly comparative feature-based time-series classification\nConstructing Reference Sets from Unstructured, Ungrammatical Text\nCooperative Games with Overlapping Coalitions\nFalse-Name Manipulations in Weighted Voting Games\nStackelberg vs. Nash in Security Games: An Extended Investigation of  Interchangeability, Equivalence, and Uniqueness\nMulti-Robot Adversarial Patrolling: Facing a Full-Knowledge Opponent\nCentrality-as-Relevance: Support Sets and Similarity as Geometric  Proximity\nCause Identification from Aviation Safety Incident Reports via Weakly  Supervised Semantic Lexicon Construction\nModelling Observation Correlations for Active Exploration and Robust  Object Detection\nA Scalable Conditional Independence Test for Nonlinear, Non-Gaussian  Data\nGeneralized Biwords for Bitext Compression and Translation Spotting\nText Relatedness Based on a Word Thesaurus\nGGP with Advanced Reasoning and Board Knowledge Discovery\nUsing Neural Network to Propose Solutions to Threats in Attack Patterns\nTowards Unsupervised Learning of Temporal Relations between Events\nCoalition Structure Generation over Graphs\nContext-based Word Acquisition for Situated Dialogue in a Virtual World\nImproving Statistical Machine Translation for a Resource-Poor Language  Using Related Resource-Rich Languages\nExpert System Based On Neural-Fuzzy Rules for Thyroid Diseases Diagnosis\nRepresenting, reasoning and answering questions about biological  pathways - various applications\nA Novel Method for Comparative Analysis of DNA Sequences by  Ramanujan-Fourier Transform\nCounterfactual Estimation and 
Optimization of Click Metrics for Search  Engines\nConstraint-based Causal Discovery from Multiple Interventions over  Overlapping Variable Sets\nCancer Prognosis Prediction Using Balanced Stratified Sampling\nAdaptive MCMC-Based Inference in Probabilistic Logic Programs\nMulti-agent Inverse Reinforcement Learning for Zero-sum Games\nVenture: a higher-order probabilistic programming platform with  programmable inference\nA New Paradigm for Minimax Search\nSSS* = Alpha-Beta + TT\nNearly Optimal Minimax Tree Search?\nOuter-Product Hidden Markov Model and Polyphonic MIDI Score Following\nEfficient Inference and Learning in a Large Knowledge Base: Reasoning  with Extracted Information using a Locally Groundable First-Order  Probabilistic Logic\nInferring Social Status and Rich Club Effects in Enterprise  Communication Networks\nSurpassing Human-Level Face Verification Performance on LFW with  GaussianFace\nA Control Dichotomy for Pure Scoring Rules\nApproximate Equilibrium and Incentivizing Social Coordination\nA Formal Analysis of Required Cooperation in Multi-agent Planning\nNonmonotonic Reasoning as a Temporal Activity\nLTLf and LDLf Monitoring: A Technical Report\nFinding Inner Outliers in High Dimensional Space\nSemantics and Compilation of Answer Set Programming with Generalized  Atoms\nA Mathematical Theory of Learning\nFastMMD: Ensemble of Circular Discrepancy for Efficient Two-Sample Test\nLifted Variable Elimination for Probabilistic Logic Programming\nAn Agent-based Modeling Framework for Sociotechnical Simulation of Water  Distribution Contamination Events\nBound Founded Answer Set Programming\nCredal Model Averaging for classification: representing prior ignorance  and expert opinions\nThe Design of the Fifth Answer Set Programming Competition\nContextual Abductive Reasoning with Side-Effects\nTransaction Logic with (Complex) Events\nProperties of Stable Model Semantics Extensions\nThat's sick dude!: Automatic identification of word sense change across  
different timescales\nAn Ordinal Bargaining Solution with Fixed-Point Property\nEfficient Model Learning for Human-Robot Collaborative Tasks\nHEPGAME and the Simplification of Expressions\nSemantic Composition and Decomposition: From Recognition to Generation\nInverse Graphics with Probabilistic CAD Models\nThe Complexity of Reasoning with FODD and GFODD\nMultiple chaotic central pattern generators with learning for legged  locomotion and malfunction compensation\nA New Rational Algorithm for View Updating in Relational Databases\nAre There Good Mistakes? A Theoretical Analysis of CEGIS\nModeling and Recognition of Smart Grid Faults by a Combined Approach of  Dissimilarity Learning and One-Class Classification\nBackwards State-space Reduction for Planning in Dynamic Knowledge Bases\nCharacterization of graphs for protein structure modeling and  recognition of solubility\nPeople are Strange when you're a Stranger: Impact and Influence of Bots  on Social Networks\nCalculating Ultra-Strong and Extended Solutions for Nine Men's Morris,  Morabaraba, and Lasker\nFunctional Principal Component Analysis and Randomized Sparse Clustering  Algorithm for Medical Image Analysis\nComputational Analysis of Perfect-Information Position Auctions\nA Parallel Algorithm for Exact Bayesian Structure Discovery in Bayesian  Networks\nA QoS based Routing Approach using Genetic Algorithms for Bandwidth  Maximization in Network\nNon-Convex Rank Minimization via an Empirical Bayesian Approach\nVideo Face Editing Using Temporal-Spatial-Smooth Warping\nClassifying sequences by the optimized dissimilarity space embedding  approach: a case study on the solubility analysis of the E. 
coli proteome\nA Study of Proxies for Shapley Allocations of Transport Costs\nKnowledge Engineering for Planning-Based Hypothesis Generation\nIntegrating active sensing into reactive synthesis with temporal logic  constraints under partial observations\nHD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale  Visual Recognition\nGamma Processes, Stick-Breaking, and Variational Inference\nOntology-based Representation and Reasoning on Process Models: A Logic  Programming Approach\nComputational Beauty: Aesthetic Judgment at the Intersection of Art and  Science\neTutor: Online Learning for Personalized Education\nA Parallel and Efficient Algorithm for Learning to Match\nEstimating the intrinsic dimension in fMRI space via dataset fractal  analysis - Counting the `cpu cores' of the human brain\nTowards a Visual Turing Challenge\nLearning of Agent Capability Models with Applications in Multi-agent  Planning\nThe Spaces of Data, Information, and Knowledge\nAnswering Conjunctive Queries over $\\mathcal{EL}$ Knowledge Bases with  Transitive and Reflexive Roles\nOn Sparse Discretization for Graphical Games\nBuilding the Observer into the System: Toward a Realistic Description of  Human Interaction with the World\nScalable Link Prediction in Dynamic Networks via Non-Negative Matrix  Factorization\nLearning Fuzzy Controllers in Mobile Robotics with Embedded  Preprocessing\nAggregating partial rankings with applications to peer grading in  massive online open courses\nToward a Universal Cortical Algorithm: Examining Hierarchical Temporal  Memory in Light of Frontal Cortical Function\nOn the High-dimensional Power of Linear-time Kernel Two-Sample Testing  under Mean-difference Alternatives\nSolving the Periodic Timetabling Problem using a Genetic Algorithm\nBehaviour Trees for Evolutionary Robotics\nProbability Theory without Bayes' Rule\nInvestigation of a chaotic spiking neuron model\nLong-term causal effects via behavioral game theory\nSocial Participation 
Ontology: community documentation, enhancements and  use examples\nHigh performance photonic reservoir computer based on a coherently  driven passive cavity\nWhat do we learn about development from baby robots?\nManaging large-scale scientific hypotheses as uncertain and  probabilistic data\nA New Efficient Method for Calculating Similarity Between Web Services\nUser Clustering in Online Advertising via Topic Models\nCompositional Distributional Semantics with Compact Closed Categories  and Frobenius Algebras\nLanguage Models for Image Captioning: The Quirks and What Works\nThe Boundary Forest Algorithm for Online Supervised and Unsupervised  Learning\nDo PageRank-based author rankings outperform simple citation counts?\nExploring Strategy-Proofness, Uniqueness, and Pareto Optimality for the  Stable Matching Problem with Couples\nA Theory of Formal Synthesis via Inductive Learning\nMargins, Kernels and Non-linear Smoothed Perceptrons\nAlgorithmic Connections Between Active Learning and Stochastic Convex  Optimization\nHinge-Loss Markov Random Fields and Probabilistic Soft Logic\nDopeLearning: A Computational Approach to Rap Lyrics Generation\nA New Fundamental Evidence of Non-Classical Structure in the Combination  of Natural Concepts\nModular Action Language ALM\nParallel Streaming Signature EM-tree: A Clustering Algorithm for Web  Scale Applications\nVariational Inference with Normalizing Flows\nA survey of SMS based Information Systems\nA U.S. 
Research Roadmap for Human Computation\nMultidefender Security Games\nSelf-Learning Cloud Controllers: Fuzzy Q-Learning for Knowledge  Evolution\nIncentivizing Exploration In Reinforcement Learning With Deep Predictive  Models\nUsing Monte Carlo method for searching partitionings of hard variants of  Boolean satisfiability problem\nA model building framework for Answer Set Programming with external  computations\nMarkov Logic Networks for Natural Language Question Answering\nEvidential relational clustering using medoids\nScaling Monte Carlo Tree Search on Intel Xeon Phi\nLift-Based Bidding in Ad Selection\nTree-based Visualization and Optimization for Image Collection\nSupport Vector Machine in Prediction of Building Energy Demand Using  Pseudo Dynamic Approach\nFairness Constraints: Mechanisms for Fair Classification\nDual-normal Logic Programs - the Forgotten Class\nIncorporating Belief Function in SVM for Phoneme Recognition\nCommunication: Words and Conceptual Systems\nA Gauss-Newton Method for Markov Decision Processes\nTransfer Learning from Deep Features for Remote Sensing and Poverty  Mapping\nBoosting in the presence of outliers: adaptive classification with  non-convex loss functions\nWithin-Brain Classification for Brain Tumor Segmentation\nDisjunctive Answer Set Solvers via Templates\nAttend, Adapt and Transfer: Attentive Deep Architecture for Adaptive  Transfer from multiple sources in the same domain\nEvaluating Real-time Anomaly Detection Algorithms - the Numenta Anomaly  Benchmark\nLayer-Specific Adaptive Learning Rates for Deep Networks\nHybridization of Interval CP and Evolutionary Algorithms for Optimizing  Difficult Problems\nNormalization of Relative and Incomplete Temporal Expressions in  Clinical Narratives\nHigh Performance Latent Variable Models\nTime-resolved emission from bright hot pixels of an active region  observed in the EUV band with SDO/AIA and multi-stranded loop modeling\nSample Complexity of Episodic Fixed-Horizon Reinforcement 
Learning\nLearning Causal Graphs with Small Interventions\nLearning Adversary Behavior in Security Games: A PAC Model Perspective\nToward an Efficient Multi-class Classification in an Open Universe\nAdaptive information-theoretic bounded rational decision-making with  parametric priors\nCombining Privileged Information to Improve Context-Aware Recommender  Systems\nDetecting events and key actors in multi-person videos\nInstantaneous Modelling and Reverse Engineering of DataConsistent Prime  Models in Seconds!\nCharacterizing Concept Drift\nSeeing the Unseen Network: Inferring Hidden Social Ties from  Respondent-Driven Sampling\nConvolutional Models for Joint Object Categorization and Pose Estimation\nAsk, Attend and Answer: Exploring Question-Guided Spatial Attention for  Visual Question Answering\nActive exploration of sensor networks from a robotics perspective\nCensoring Representations with an Adversary\nBehavior Query Discovery in System-Generated Temporal Graphs\nJoint Word Representation Learning using a Corpus and a Semantic Lexicon\nHand Pose Estimation through Semi-Supervised and Weakly-Supervised  Learning\nNon-Sentential Utterances in Dialogue: Experiments in Classification and  Interpretation\nMulti-Agent Continuous Transportation with Online Balanced Partitioning\nInterpretable Two-level Boolean Rule Learning for Classification\nLearning with Memory Embeddings\nSemantic Folding Theory And its Application in Semantic Fingerprinting\nOn Learning to Think: Algorithmic Information Theory for Novel  Combinations of Reinforcement Learning Controllers and Recurrent Neural World  Models\nBicycle-Sharing System Analysis and Trip Prediction\nAsking the metaquestions in constraint tractability\nAn Efficient Algorithm for Mining Frequent Sequence with Constraint  Programming\nFeature extraction using Latent Dirichlet Allocation and Neural  Networks: A case study on movie synopses\nDeep Cross Residual Learning for Multitask Visual Recognition\nThe Curious Robot: 
Learning Visual Representations via Physical  Interactions\nA Corpus and Evaluation Framework for Deeper Understanding of  Commonsense Stories\nLearning to Track at 100 FPS with Deep Regression Networks\nHow deep is knowledge tracing?\nDifferential Evolution for Efficient AUV Path Planning in Time Variant  Uncertain Underwater Environment\nEfficient Classification of Multi-Labelled Text Streams by Clashing\nStrategyproof Peer Selection using Randomization, Partitioning, and  Apportionment\nA Discrete and Bounded Envy-Free Cake Cutting Protocol for Any Number of  Agents\nA Discrete Firefly Algorithm to Solve a Rich Vehicle Routing Problem  Modelling a Newspaper Distribution System with Recycling Policy\nMatch-SRNN: Modeling the Recursive Matching Structure with Spatial RNN\nThe STRANDS Project: Long-Term Autonomy in Everyday Environments\nElicitation for Preferences Single Peaked on Trees\nA Hierarchical Genetic Optimization of a Fuzzy Logic System for Flow  Control in Micro Grids\nToward Efficient Task Assignment and Motion Planning for Large Scale  Underwater Mission\nEnd-to-End Tracking and Semantic Segmentation Using Recurrent Neural  Networks\nPreference Elicitation For Single Crossing Domain\nAnnotation Order Matters: Recurrent Image Annotator for Arbitrary Length  Image Tagging\nInductive Coherence\nLearning Sparse Additive Models with Interactions in High Dimensions\nExtending the Harper Identity to Iterated Belief Change\nContribution to the Formal Specification and Verification of a  Multi-Agent Robotic System\nProving the Incompatibility of Efficiency and Strategyproofness via SMT  Solving\nParallel Strategies Selection\nDisCSPs with Privacy Recast as Planning Problems for Utility-based  Agents\nConversational Markers of Constructive Discussions\nDistributed Flexible Nonlinear Tensor Factorization\nDefining Concepts of Emotion: From Philosophy to Science\nThe Z-loss: a shift and scale invariant classification loss belonging to  the Spherical 
Family\nFormulating Semantics of Probabilistic Argumentation by Characterizing  Subgraphs: Theory and Empirical Results\nA Web-based Tool for Identifying Strategic Intervention Points in  Complex Systems\nCombining Answer Set Programming and Domain Heuristics for Solving Hard  Industrial Problems (Application Paper)\nParaconsistency and Word Puzzles\nSelf-Organising Maps in Computer Security\nThe Power of Non-Ground Rules in Answer Set Programming\nIterative Learning of Answer Set Programs from Context Dependent  Examples\nTowards the Self-constructive Brain: emergence of adaptive behavior\nBlankets Joint Posterior score for learning Markov network structures\nRevisiting Causality Inference in Memory-less Transition Networks\nFacial Expression Recognition Using a Hybrid CNN-SIFT Aggregator\nStochastic Rank-1 Bandits\nInferring unknown biological function by integration of GO annotations  and gene expression data\nA Convolutional Autoencoder for Multi-Subject fMRI Data Aggregation\nOpen Problem: Approximate Planning of POMDPs in the class of Memoryless  Policies\nlpopt: A Rule Optimization Tool for Answer Set Programming\nRETAIN: An Interpretable Predictive Model for Healthcare using Reverse  Time Attention Mechanism\nPlanning With Discrete Harmonic Potential Fields\nPhased Exploration with Greedy Exploitation in Stochastic Combinatorial  Partial Monitoring Games\nFathom: Reference Workloads for Modern Deep Learning Methods\nState Duration and Interval Modeling in Hidden Semi-Markov Model for  Sequential Data Analysis\nMulti-View Fuzzy Clustering with Minimax Optimization for Effective  Clustering of Data from Multiple Sources\nTitle Generation for User Generated Videos\nActivity Networks with Delays An application to toxicity analysis\nA Bi-LSTM-RNN Model for Relation Classification Using Low-Cost Sequence  Features\nModelling Cyber-Security Experts' Decision Making Processes using  Aggregation Operators\nLanguage Detection For Short Text Messages In Social 
Media\nWhat makes ImageNet good for transfer learning?\nBinary Particle Swarm Optimization versus Hybrid Genetic Algorithm for  Inferring Well Supported Phylogenetic Trees\nKnowledge Representation Analysis of Graph Mining\nSplitting and Updating Hybrid Knowledge Bases (Extended Version)\nA Multi-Purpose Scenario-based Simulator for Smart House Environments\nIntegrating Testing and Interactive Theorem Proving\nKernel Belief Propagation\nStrange Beta: An Assistance System for Indoor Rock Climbing Route  Setting Using Chaotic Variations and Machine Learning\nWell-Definedness and Efficient Inference for Probabilistic Logic  Programming under the Distribution Semantics\nCharacterizing and Improving Generalized Belief Propagation Algorithms  on the 2D Edwards-Anderson Model\nBayesian Locality Sensitive Hashing for Fast Similarity Search\nImproving parameter learning of Bayesian nets from incomplete data\nTowards Holistic Scene Understanding: Feedback Enabled Cascaded  Classification Models\nA quantitative Gibbard-Satterthwaite theorem without neutrality\nData Mining Session-Based Patient Reported Outcomes (PROs) in a Mental  Health Setting: Toward Data-Driven Clinical Decision Support and Personalized  Treatment\nAn Information Theoretic Analysis of Decision in Computer Chess\nThe Diversity Paradox: How Nature Resolves an Evolutionary Dilemma\nA Study on Using Uncertain Time Series Matching Algorithms in MapReduce  Applications\nA Study of CAPTCHAs for Securing Web Services\nDocument Clustering based on Topic Maps\nA comparison of two suffix tree-based document clustering algorithms\nOn the definition of a confounder\nOn the complexity of strong Nash equilibrium: Hard-to-solve instances  and smoothed complexity\nPattern-Based Constraint Satisfaction and Logic Puzzles\nOn Appropriate Selection of Fuzzy Aggregation Operators in Medical  Decision Support System\nFrom Constraints to Resolution Rules, Part II: chains, braids,  confluence and T&E\nA Fuzzy Logic Based Certain 
Trust Model for E-Commerce\nA Junction Tree Framework for Undirected Graphical Model Selection\nImprovement/Extension of Modular Systems as Combinatorial Reengineering  (Survey)\nOptimal Stochastic Strongly Convex Optimization with a Logarithmic  Number of Projections\nTowards an Extension of the 2-tuple Linguistic Model to Deal With  Unbalanced Linguistic Term sets\nThe Stochastic Gradient Descent for the Primal L1-SVM Optimization  Revisited\nInference and learning in probabilistic logic programs using weighted  Boolean formulas\nMeasuring Cultural Relativity of Emotional Valence and Arousal using  Semantic Clustering and Twitter\nA Time and Space Efficient Junction Tree Architecture\nExploring The Contribution of Unlabeled Data in Financial Sentiment  Analysis\nMeasure Transformer Semantics for Bayesian Machine Learning\nCognitive residues of similarity\nExtended Distributed Learning Automata:A New Method for Solving  Stochastic Graph Optimization Problems\nExploiting Binary Floating-Point Representations for Constraint  Propagation: The Complete Unabridged Version\nHow Did Humans Become So Creative? 
A Computational Approach\nEfficient Computation of the Shapley Value for Game-Theoretic Network  Centrality\nA Feature Subset Selection Algorithm Automatic Recommendation Method\nSharing Rewards in Cooperative Connectivity Games\nDecentralized Anti-coordination Through Multi-agent Learning\nUsing content features to enhance performance of user-based  collaborative filtering performance of user-based collaborative filtering\nPlanning for Decentralized Control of Multiple Robots Under Uncertainty\nUnsupervised Ranking of Multi-Attribute Objects Based on Principal  Curves\nUsing the Crowd to Generate Content for Scenario-Based Serious-Games\nAssessing the Reach and Impact of Game-Based Learning Approaches to  Cultural Competency and Behavioural Change\nDiscriminative Functional Connectivity Measures for Brain Decoding\nIncremental Learning of Event Definitions with Inductive Logic  Programming\nA predictive analytics approach to reducing avoidable hospital  readmission\nAlgorithms for multi-armed bandit problems\nSolving MaxSAT and #SAT on structured CNF formulas\nA Game-theoretic Machine Learning Approach for Revenue Maximization in  Sponsored Search\nOn the satisfiability problem for SPARQL patterns\nConjunction and Negation of Natural Concepts: A Quantum-theoretic  Modeling\nTree-like Queries in OWL 2 QL: Succinctness and Complexity Results\nEigenspace Method for Spatiotemporal Hotspot Detection\nNeural tuning size is a key factor underlying holistic face processing\nTyped Hilbert Epsilon Operators and the Semantics of Determiner Phrases  (Invited Lecture)\nLow-Autocorrelation Binary Sequences: On Improved Merit Factors and  Runtime Predictions to Achieve Them\nNoise-adaptive Margin-based Active Learning and Lower Bounds under  Tsybakov Noise Condition\nCognitive Surveillance: Why does it never appear among the AVSS  Conferences topics?\nCombining predictions from linear models when training and test inputs  differ\nAn interacting replica approach applied to the 
traveling salesman  problem\nBayesian Network Constraint-Based Structure Learning Algorithms:  Parallel and Optimised Implementations in the bnlearn R Package\nData classification using the Dempster-Shafer method\nVariational Inference for Uncertainty on the Inputs of Gaussian Process  Models\nBuilding Program Vector Representations for Deep Learning\nHardness of parameter estimation in graphical models\nPerformance analysis of a 240 thread tournament level MCTS Go program on  the Intel Xeon Phi\nA Tabu Search Algorithm for the Multi-period Inspector Scheduling  Problem\nVirtual Electrode Recording Tool for EXtracellular potentials (VERTEX):  Comparing multi-electrode recordings from simulated and biological mammalian  cortical tissue\nEfficient Feature Group Sequencing for Anytime Linear Prediction\nIP Tracing and Active Network Response\nOptimal high-level descriptions of dynamical systems\nEntanglement-Based Machine Learning on a Quantum Computer\nInterference Effects in Quantum Belief Networks\nAn agent-driven semantical identifier using radial basis neural networks  and reinforcement learning\nProblem Theory\nHighly comparative fetal heart rate analysis\nSymmetric Weighted First-Order Model Counting\nDeep Neural Networks are Easily Fooled: High Confidence Predictions for  Unrecognizable Images\nWhen Computer Vision Gazes at Cognition\nPlan or not: Remote Human-robot Teaming with Incomplete Task Information\nThe Computational Complexity of Structure-Based Causality\nTeaching Deep Convolutional Neural Networks to Play Go\nAppropriate Causal Models and the Stability of Causation\nOn the Inductive Bias of Dropout\nA Multi-criteria neutrosophic group decision making metod based TOPSIS  for supplier selection\nBelief as Willingness to Bet\nRevisiting Non-Progressive Influence Models: Scalable Influence  Maximization\nPersian Sentiment Analyzer: A Framework based on a Novel Feature  Selection Method\nUsing temporal abduction for biosignal interpretation: A case study 
on  QRS detection\nBelief Revision, Minimal Change and Relaxation: A General Framework  based on Satisfaction Systems, and Applications to Description Logics\nTractability and Decompositions of Global Cost Functions\nBoost Phrase-level Polarity Labelling with Review-level Sentiment  Classification\nCollaborative Filtering Bandits\nSpeeding up Permutation Testing in Neuroimaging\nAn Efficient Metric of Automatic Weight Generation for Properties in  Instance Matching Technique\nJoint Optimization of Masks and Deep Recurrent Neural Networks for  Monaural Source Separation\nComputational Curiosity (A Book Draft)\nInfluence-Optimistic Local Values for Multiagent Planning --- Extended  Version\nLow-Cost Learning via Active Data Procurement\nBuilding with Drones: Accurate 3D Facade Reconstruction using MAVs\nDescribing Videos by Exploiting Temporal Structure\nWhen Are Tree Structures Necessary for Deep Learning of Representations?\n23-bit Metaknowledge Template Towards Big Data Knowledge Discovery and  Management\nA Meta-Analysis of the Anomaly Detection Problem\nEstimating the Probability of Meeting a Deadline in Hierarchical Plans\nPoker Cash Game: a Thermodynamic Description\nMapping-equivalence and oid-equivalence of single-function  object-creating conjunctive queries\nSequential Relevance Maximization with Binary Feedback\nModeling State-Conditional Observation Distribution using Weighted  Stereo Samples for Factorial Speech Processing Models\nDoubly Robust Policy Evaluation and Optimization\nQuantum Structure of Negation and Conjunction in Human Thought\nDynamic Move Tables and Long Branches with Backtracking in Computer  Chess\nReduced Basis Decomposition: a Certified and Fast Lossy Data Compression  Algorithm\nBoosting Convolutional Features for Robust Object Proposals\nConstruction of FuzzyFind Dictionary using Golay Coding Transformation  for Searching Applications\nIs Poker a Skill Game? 
New Insights from Statistical Physics\nTwo Timescale Stochastic Approximation with Controlled Markov noise and  Off-policy temporal difference learning\nELM-Based Distributed Cooperative Learning Over Networks\nLarge Margin Nearest Neighbor Embedding for Knowledge Representation\nDetecting Falls with X-Factor Hidden Markov Models\nOn mining complex sequential data by means of FCA and pattern structures\nAutomated Analysis and Prediction of Job Interview Performance\nNegatively Correlated Search\nHybrid Genetic Algorithm and Lasso Test Approach for Inferring Well  Supported Phylogenetic Trees based on Subsets of Chloroplastic Core Genes\nGroupwise registration of aerial images\nHow do you revise your belief set with %$;@*?\nOnline Learning Algorithm for Time Series Forecasting Suitable for Low  Cost Wireless Sensor Networks Nodes\nGeneralized Support and Formal Development of Constraint Propagators\nConcept Extraction to Identify Adverse Drug Reactions in Medical Forums:  A Comparison of Algorithms\nA Probabilistic Framework for Representing Dialog Systems and  Entropy-Based Dialog Management through Dynamic Stochastic State Evolution\nCombining Existential Rules and Transitivity: Next Steps\nExplanation of Stagnation at Points that are not Local Optima in  Particle Swarm Optimization by Potential Analysis\nManipulation is Harder with Incomplete Votes\nDiscovering Valuable Items from Massive Data\nQuizz: Targeted crowdsourcing with a billion (potential) users\nHomogeneous Spiking Neuromorphic System for Real-World Pattern  Recognition\nProbabilistic Numerics and Uncertainty in Computations\nBootstrapping Skills\nLeading Tree in DPCLUS and Its Impact on Building Hierarchies\nPlace classification with a graph regularized deep neural network model\nLeveraging Word Embeddings for Spoken Document Summarization\nSpatial Symmetry Driven Pruning Strategies for Efficient Declarative  Spatial Reasoning\nSAT-based Analysis of Large Real-world Feature Models is Easy\nHybrid 
Algorithm for Multi-Objective Optimization by Greedy Hypervolume  Maximization\nThe Extreme Value Machine\nSkopus: Mining top-k sequential patterns under leverage\nExact and approximate inference in graphical models: variable  elimination and beyond\nCharacterization of Logic Program Revision as an Extension of  Propositional Revision\nAdaptive Automation: Leveraging Machine Learning to Support  Uninterrupted Automated Testing of Software Applications\nPredicting respiratory motion for real-time tumour tracking in  radiotherapy\nPerceptron like Algorithms for Online Learning to Rank\nKm4City Ontology Building vs Data Harvesting and Cleaning for Smart-city  Services\nUsing Linguistic Analysis to Translate Arabic Natural Language Queries  to SPARQL\nType-Constrained Representation Learning in Knowledge Graphs\nNo Regret Bound for Extreme Bandits\nSETI via Leakage from Light Sails in Exoplanetary Systems\nBeyond-Quantum Modeling of Question Order Effects and Response  Replicability in Psychological Measurements\nSchema Independent Relational Learning\nA Refinement-Based Architecture for Knowledge Representation and  Reasoning in Robotics\nClustering With Side Information: From a Probabilistic Model to a  Deterministic Algorithm\nComputing Stable Coalitions: Approximation Algorithms for Reward Sharing\nReal-time Top-K Predictive Query Processing over Event Streams\nA unified heuristic and an annotated bibliography for a large class of  earliness-tardiness scheduling problems\nA Behavior Analysis-Based Game Bot Detection Approach Considering  Various Play Styles\nRecurrent Reinforcement Learning: A Hybrid Approach\nEfficient Convolutional Neural Networks for Pixelwise Classification on  Heterogeneous Hardware Systems\nBio-Inspired Human Action Recognition using Hybrid Max-Product  Neuro-Fuzzy Classifier and Quantum-Behaved PSO\nOn Reasoning with RDF Statements about Statements using Singleton  Property Triples\nMapping Heritability of Large-Scale Brain Networks with a 
Billion  Connections {\\em via} Persistent Homology\nExtraction of evidence tables from abstracts of randomized clinical  trials using a maximum entropy classifier and global constraints\nA Simulated Annealing Approach to Bayesian Inference\nVisual Generalized Coordinates\nEvaluation of Protein-protein Interaction Predictors with Noisy  Partially Labeled Data Sets\nReasoning about Entailment with Neural Attention\nMinimum Weight Perfect Matching via Blossom Belief Propagation\nDesigning Behaviour in Bio-inspired Robots Using Associative Topologies  of Spiking-Neural-Networks\nSymbol Emergence in Robotics: A Survey\nKnowledge-based system for collaborative process specification\nSupporting interoperability of collaborative networks through  engineering of a service-based Mediation Information System (MISE 2.0)\nSymbolic Neutrosophic Theory\nThe GTR-model: a universal framework for quantum-like measurements\nNeural Enquirer: Learning to Query Tables with Natural Language\nAssessing forensic evidence by computing belief functions\nCrossCat: A Fully Bayesian Nonparametric Method for Analyzing  Heterogeneous, High Dimensional Data\nOn the Min-cost Traveling Salesman Problem with Drone\nA Restricted Visual Turing Test for Deep Scene and Event Understanding\nFast Algorithms for Game-Theoretic Centrality Measures\nOptical SETI Observations of the Anomalous Star KIC 8462852\nA Novel Regularized Principal Graph Learning Framework on Explicit Graph  Representation\nQuery Answering over Contextualized RDF/OWL Knowledge with  Forall-Existential Bridge Rules: Decidable Finite Extension Classes (Post  Print)\nHyper-Heuristic Algorithm for Finding Efficient Features in Diagnose of  Lung Cancer Disease\nSymphony from Synapses: Neocortex as a Universal Dynamical Systems  Modeller using Hierarchical Temporal Memory\nSolving stable matching problems using answer set programming\nDeep Active Object Recognition by Joint Label and Action Prediction\nBlind, Greedy, and Random: Ordinal 
Approximation Algorithms for Matching  and Clustering\nProbabilistic Programming with Gaussian Process Memoization\nLearning the Preferences of Ignorant, Inconsistent Agents\nOn Voting and Facility Location\nA Mathematical Theory of Deep Convolutional Neural Networks for Feature  Extraction\nAddressing Complex and Subjective Product-Related Queries with Customer  Reviews\nNews Across Languages - Cross-Lingual Document Similarity and Event  Tracking\nHeuristic algorithms for finding distribution reducts in probabilistic  rough set model\nThe ERA of FOLE: Foundation\nDeep Reinforcement Learning in Large Discrete Action Spaces\nMining Massive Hierarchical Data Using a Scalable Probabilistic  Graphical Model\nOn the Foundations of the Brussels Operational-Realistic Approach to  Cognition\nModeling Variations of First-Order Horn Abduction in Answer Set  Programming\nSimple, Robust and Optimal Ranking from Pairwise Comparisons\nProgramming in logic without logic programming\nHow do neurons operate on sparse distributed representations? 
A  mathematical theory of sparsity, neurons and active dendrites\nLearning Preferences for Manipulation Tasks from Online Coactive  Feedback\nDrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing  Hyperparameters of Deep Neural Networks\nLarge Collection of Diverse Gene Set Search Queries Recapitulate Known  Protein-Protein Interactions and Gene-Gene Functional Associations\nRobobarista: Learning to Manipulate Novel Objects via Deep Multimodal  Embedding\nFunnel Libraries for Real-Time Robust Feedback Motion Planning\nEngineering Safety in Machine Learning\nGraded Entailment for Compositional Distributional Semantics\nThe DARPA Twitter Bot Challenge\nPersuasive Teachable Agent for Intergenerational Learning\nMulti-Object Reasoning with Constrained Goal Models\nUsing Social Networks to Aid Homeless Shelters: Dynamic Influence  Maximization under Uncertainty - An Extended Version\nAdaptive Subgradient Methods for Online AUC Maximization\nSpatial Concept Acquisition for a Mobile Robot that Integrates  Self-Localization and Unsupervised Word Discovery from Spoken Sentences\nRecognition of Visually Perceived Compositional Human Actions by  Multiple Spatio-Temporal Scales Recurrent Neural Networks\nConvex Relaxation Regression: Black-Box Optimization of Smooth Functions  by Learning Their Convex Envelopes\nClassification Accuracy as a Proxy for Two Sample Testing\nScalable Text Mining with Sparse Generative Models\nPredicting Clinical Events by Combining Static and Dynamic Information  Using Recurrent Neural Networks\nNetwork of Bandits insure Privacy of end-users\nModeling Human Ad Hoc Coordination\nA Truthful Mechanism with Biparameter Learning for Online Crowdsourcing\nIdentifying Structures in Social Conversations in NSCLC Patients through  the Semi-Automatic extraction of Topical Taxonomies\n\"Why Should I Trust You?\": Explaining the Predictions of Any Classifier\nCommunication-Efficient Learning of Deep Networks from Decentralized  Data\nMachine 
learning meets network science: dimensionality reduction for  fast and efficient embedding of networks in the hyperbolic space\nDetermining the best attributes for surveillance video keywords  generation\nA Motion Planning Strategy for the Active Vision-Based Mapping of  Ground-Level Structures\nAugur: Mining Human Behaviors from Fiction to Power Interactive Systems\nMoving Target Defense for Web Applications using Bayesian Stackelberg  Games\nUnbounded Human Learning: Optimal Scheduling for Spaced Repetition\nVisual Genome: Connecting Language and Vision Using Crowdsourced Dense  Image Annotations\nStochastic Shortest Path with Energy Constraints in POMDPs\nEnhancing Genetic Algorithms using Multi Mutations\nLarge-Scale Detection of Non-Technical Losses in Imbalanced Data Sets\nA Neutrosophic Recommender System for Medical Diagnosis Based on  Algebraic Neutrosophic Measures\nAnalyzing Games with Ambiguous Player Types using the ${\\rm MINthenMAX}$  Decision Model\nUnscented Bayesian Optimization for Safe Robot Grasping\nLearning Shared Representations in Multi-task Reinforcement Learning\nLearning Hand-Eye Coordination for Robotic Grasping with Deep Learning  and Large-Scale Data Collection\nUTA-poly and UTA-splines: additive value functions with polynomial  marginals\nBuilding a Fine-Grained Entity Typing System Overnight for a New X (X =  Language, Domain, Genre)\nInferring Fine-grained Details on User Activities and Home Location from  Social Media: Detecting Drinking-While-Tweeting Patterns in Communities\nBayesian Opponent Exploitation in Imperfect-Information Games\nSequential Voting Promotes Collective Discovery in Social Recommendation  Systems\nTuring learning: a metric-free approach to inferring behavior and its  application to swarms\nOne-Shot Generalization in Deep Generative Models\nNeural Aggregation Network for Video Face Recognition\nA Comprehensive Performance Evaluation of Deformable Face Tracking  \"In-the-Wild\"\nGenerating Natural Questions 
About an Image\nAutomated Correction for Syntax Errors in Programming Assignments using  Recurrent Neural Networks\nFeeling the Bern: Adaptive Estimators for Bernoulli Probabilities of  Pairwise Comparisons\nA Novel Biologically Mechanism-Based Visual Cognition Model--Automatic  Extraction of Semantics, Formation of Integrated Concepts and Re-selection  Features for Ambiguity\nUsing Enthymemes to Fill the Gap between Logical Argumentation and  Revision of Abstract Argumentation Frameworks\nGreedy Strategies and Larger Islands of Tractability for Conjunctive  Queries and Constraint Satisfaction Problems\nLook-ahead before you leap: end-to-end active recognition by forecasting  the effect of motion\nAdaptive Candidate Generation for Scalable Edge-discovery Tasks on Data  Graphs\nA heuristic algorithm for a single vehicle static bike sharing  rebalancing problem\nA Comparative Evaluation of Approximate Probabilistic Simulation and  Deep Neural Networks as Accounts of Human Physical Scene Understanding\nMovement Coordination in Human-Robot Teams: A Dynamical Systems Approach\nODE - Augmented Training Improves Anomaly Detection in Sensor Data from  Machines\nBeyond knowing that: a new generation of epistemic logics\nMarkov Chain methods for the bipartite Boolean quadratic programming  problem\nLow-Complexity Stochastic Generalized Belief Propagation\nThe GPU-based Parallel Ant Colony System\nBuilding a Large Scale Dataset for Image Emotion Recognition: The Fine  Print and The Benchmark\nAsk Your Neurons: A Deep Learning Approach to Visual Question Answering\nA constrained L1 minimization approach for estimating multiple Sparse  Gaussian or Nonparanormal Graphical Models\nCausal Discovery for Manufacturing Domains\nOnline Optimization Methods for the Quantification Problem\nMonotone Retargeting for Unsupervised Rank Aggregation with Object  Features\nGeneralized Linear Models for Aggregated Data\nGearbox Fault Detection through PSO Exact Wavelet Analysis and SVM  
Classifier\nEnhanced Twitter Sentiment Classification Using Contextual Information\nHeart Rate Variability and Respiration Signal as Diagnostic Tools for  Late Onset Sepsis in Neonatal Intensive Care Units\nOption Discovery in Hierarchical Reinforcement Learning using  Spatio-Temporal Clustering\nDetecting Novel Processes with CANDIES -- An Holistic Novelty Detection  Technique based on Probabilistic Models\nThe Information-Collecting Vehicle Routing Problem: Stochastic  Optimization for Emergency Storm Response\nBayesian Variable Selection for Globally Sparse Probabilistic PCA\nInteractive Debugging of Knowledge Bases\nTowards Automation of Knowledge Understanding: An Approach for  Probabilistic Generative Classifiers\nQuantifying the accuracy of approximate diffusions and Markov chains\nAnomaly Detection in XML-Structured SOAP Messages Using Tree-Based  Association Rule Mining\nOptimal Number of Choices in Rating Contexts\nOnline Influence Maximization under Independent Cascade Model with  Semi-Bandit Feedback\nBackprop KF: Learning Discriminative Deterministic State Estimators\nUnsupervised Learning for Physical Interaction through Video Prediction\nA note on privacy preserving iteratively reweighted least squares\nYum-me: A Personalized Nutrient-based Meal Recommender System\nData Programming: Creating Large Training Sets, Quickly\nDimension Projection among Languages based on Pseudo-relevant Documents  for Query Translation\nToward a general, scaleable framework for Bayesian teaching with  applications to topic models\nDeep Predictive Coding Networks for Video Prediction and Unsupervised  Learning\nLow-Cost Scene Modeling using a Density Function Improves Segmentation  Performance\nTensorFlow: A system for large-scale machine learning\nHybrid Perturbation methods based on Statistical Time Series models\nSynthesizing the preferred inputs for neurons in neural networks via  deep generator networks\nAdversarial Feature Learning\nApplications of Probabilistic 
Programming (Master's thesis, 2015)\nMultiresolution Recurrent Neural Networks: An Application to Dialogue  Response Generation\nTowards a Job Title Classification System\nECMdd: Evidential c-medoids clustering with multiple prototypes\nMultimodal Compact Bilinear Pooling for Visual Question Answering and  Visual Grounding\nA Formal Calculus for International Relations Computation and Evaluation\nFace valuing: Training user interfaces with facial expressions and  reinforcement learning\nPolicy Networks with Two-Stage Training for Dialogue Systems\nFuzzy-Klassen Model for Development Disparities Analysis based on Gross  Regional Domestic Product Sector of a Region\nCommunity Structure in Industrial SAT Instances\nStore Location Selection via Mining Search Query Logs of Baidu Maps\nA framework for redescription set construction\nEntropy/IP: Uncovering Structure in IPv6 Addresses\nInvariant recognition drives neural representations of action sequences\nSafe Exploration in Finite Markov Decision Processes with Gaussian  Processes\nAssessing Human Error Against a Benchmark of Perfection\nData-driven HR - Résumé Analysis Based on Natural Language  Processing and Machine Learning\nAn Efficient Large-scale Semi-supervised Multi-label Classifier Capable  of Handling Missing labels\nA Comparative Analysis of classification data mining techniques :  Deriving key factors useful for predicting students performance\nThe LAMBADA dataset: Word prediction requiring a broad discourse context\nBootstrapping with Models: Confidence Intervals for Off-Policy  Evaluation\nKnowledge-Defined Networking\nGraphical Models for Optimal Power Flow\nSimultaneous Control and Human Feedback in the Training of a Robotic  Agent with Actor-Critic Reinforcement Learning\nEfficient Attack Graph Analysis through Approximate Inference\nE-commerce in Your Inbox: Product Recommendations at Scale\nResolving Distributed Knowledge\nAn Axiomatic Approach to Routing\nEnriching Linked Datasets with New Object 
Properties\nA Game-Theoretic Approach to Word Sense Disambiguation\nNon-Monotonic Spatial Reasoning with Answer Set Programming Modulo  Theories\nSTransE: a novel embedding model of entities and relationships in  knowledge bases\nInformation integration from distributed threshold-based interactions\nAlgebraic foundations for qualitative calculi and networks\nLSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection\nThe SERENDIP III 70 cm Search for Extraterrestrial Intelligence\nA Greedy Approach to Adapting the Trace Parameter for Temporal  Difference Learning\nNeighborhood Features Help Detecting Non-Technical Losses in Big Data  Sets\nApplication of Statistical Relational Learning to Hybrid Recommendation  Systems\nClick Carving: Segmenting Objects in Video with Point Clicks\nOptimal control for a robotic exploration, pick-up and delivery problem\nOne-Shot Session Recommendation Systems with Combinatorial Items\nMixed Strategy for Constrained Stochastic Optimal Control\nCost-Optimal Algorithms for Planning with Procedural Control Knowledge\nFundamental Parameters of Main-Sequence Stars in an Instant with Machine  Learning\nDocument Clustering Games in Static and Dynamic Scenarios\nLog-Linear RNNs: Towards Recurrent Neural Networks with Flexible Prior  Knowledge\nExtended Graded Modalities in Strategy Logic\nDeepQA: Improving the estimation of single protein model quality with  deep belief networks\nJuxtaposition of System Dynamics and Agent-based Simulation for a Case  Study in Immunosenescence\nSimulating user learning in authoritative technology adoption: An agent  based model for council-led smart meter deployment planning in the UK\nOptimising Rule-Based Classification in Temporal Data\nExploring Differences in Interpretation of Words Essential in Medical  Expert-Patient Communication\nSupervised Adverse Drug Reaction Signalling Framework Imitating Bradford  Hill's Causality Considerations\nMan is to Computer Programmer as Woman is to Homemaker? 
Debiasing Word  Embeddings\nNovel Word Embedding and Translation-based Language Modeling for  Extractive Speech Summarization\nAutomated Prediction of Temporal Relations\nClassification of Alzheimer's Disease Structural MRI Data by Deep  Learning Convolutional Neural Networks\nAn Evolutionary Algorithm to Learn SPARQL Queries for  Source-Target-Pairs: Finding Patterns for Human Associations in DBpedia\nMultiple scan data association by convex variational inference\nEfficient Hyperparameter Optimization of Deep Learning Algorithms Using  Deterministic RBF Surrogates\nRobust Contextual Outlier Detection: Where Context Meets Sparsity\nA novel online multi-label classifier for high-speed streaming data  applications\nNeural Coarse-Graining: Extracting slowly-varying latent degrees of  freedom with neural networks\nFitted Learning: Models with Awareness of their Limits\nDESPOT: Online POMDP Planning with Regularization\nModelling Creativity: Identifying Key Components through a Corpus-Based  Approach\n\"Flow Size Difference\" Can Make a Difference: Detecting Malicious TCP  Network Flows Based on Benford's Law\nEven Good Bots Fight: The Case of Wikipedia\nConcordance and the Smallest Covering Set of Preference Orderings\nThe ACRV Picking Benchmark (APB): A Robotic Shelf Picking Benchmark to  Foster Reproducible Research\nTowards Deep Symbolic Reinforcement Learning\nContext-aware Sequential Recommendation\nTemporal Logic Programs with Variables\nAn Efficient Method of Partitioning High Volumes of Multidimensional  Data for Parallel Clustering Algorithms\nVote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient  Convolutional Neural Networks\nDeep-Learned Collision Avoidance Policy for Distributed Multi-Agent  Navigation\nMulti-document abstractive summarization using ILP based multi-sentence  compression\nPose-Selective Max Pooling for Measuring Similarity\nTowards the bio-personalization of music recommendation systems: A  single-sensor EEG biomarker of 
subjective music preference\nPredictive modelling of football injuries\nTesting Quantum Models of Conjunction Fallacy on the World Wide Web\nAn Ontology of Preference-Based Multiobjective Metaheuristics\nA Fast Factorization-based Approach to Robust PCA\nHierarchical Memory Networks for Answer Selection on Unknown Words\nLearning from the Hindsight Plan -- Episodic MPC Improvement\nICE: Information Credibility Evaluation on Social Media via  Representation Learning\nComprehensive Evaluation of OpenCL-based Convolutional Neural Network  Accelerators in Xilinx and Altera FPGAs\nTechnical Report: Graph-Structured Sparse Optimization for Connected  Subgraph Detection\nDeep Spatio-Temporal Residual Networks for Citywide Crowd Flows  Prediction\nDeep Reinforcement Learning for Robotic Manipulation with Asynchronous  Off-Policy Updates\nPhase-Mapper: An AI Platform to Accelerate High Throughput Materials  Discovery\nNetwork Structure Inference, A Survey: Motivations, Methods, and  Applications\nA new algorithm for identity verification based on the analysis of a  handwritten dynamic signature\nTowards Cognitive Exploration through Deep Reinforcement Learning for  Mobile Robots\nDeepDGA: Adversarially-Tuned Domain Generation and Detection\nAutomated Phase Mapping with AgileFD and its Application to Light  Absorber Discovery in the V-Mn-Nb Oxide System\nAdaptive Convolutional ELM For Concept Drift Handling in Online Stream  Data\nDiverse Beam Search: Decoding Diverse Solutions from Neural Sequence  Models\nA Music-generating System Inspired by the Science of Complex Adaptive  Systems\nRevisiting Multiple Instance Neural Networks\nPersonalizing a Dialogue System with Transfer Reinforcement Learning\nSafe, Multi-Agent, Reinforcement Learning for Autonomous Driving\nTransfer from Simulation to Real World through Learning Deep Inverse  Dynamics Model\nTruthful Mechanisms for Matching and Clustering in an Ordinal World\nExploiting Sentence and Context Representations in Deep 
Neural Models  for Spoken Language Understanding\nModular Deep Q Networks for Sim-to-real Transfer of Visuo-motor Policies\nSafety Verification of Deep Neural Networks\nBalancing Suspense and Surprise: Timely Decision Making with Endogenous  Information Acquisition\nKnowledge will Propel Machine Understanding of Content: Extrapolating  from Current Examples\nKissing Cuisines: Exploring Worldwide Culinary Habits on the Web\nA Review of 40 Years of Cognitive Architecture Research: Core Cognitive  Abilities and Practical Applications\nFast Low-rank Shared Dictionary Learning for Image Classification\nPersonalized Risk Scoring for Critical Care Prognosis using Mixtures of  Gaussian Processes\nOptimal Belief Approximation\nIntegrating Topic Models and Latent Factors for Recommendation\nGeneralized Haar Filter based Deep Networks for Real-Time Object  Detection in Traffic Scene\nFrom Node Embedding To Community Embedding\nNatural-Parameter Networks: A Class of Probabilistic Neural Networks\nCollaborative Recurrent Autoencoder: Recommend while Learning to Fill in  the Blanks\nExtensions and Limitations of the Neural GPU\nPredicting Domain Generation Algorithms with Long Short-Term Memory  Networks\nMaximizing Investment Value of Small-Scale PV in a Smart Grid  Environment\nNeural Architecture Search with Reinforcement Learning\nTopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency\nNeuro-Symbolic Program Synthesis\nRL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning\nLearning to Play Guess Who? 
and Inventing a Grounded Language as a  Consequence\nA Connection between Generative Adversarial Networks, Inverse  Reinforcement Learning, and Energy-Based Models\nMultilingual Knowledge Graph Embeddings for Cross-lingual Knowledge  Alignment\nEntropic Causal Inference\nCommonsense Knowledge Enhanced Embeddings for Solving Pronoun  Disambiguation Problems in Winograd Schema Challenge\nRecognizing and Eliciting Weakly Single Crossing Profiles on Trees\nGenerative Models and Model Criticism via Optimized Maximum Mean  Discrepancy\nZero-resource Machine Translation by Multimodal Encoder-decoder Network  with Multimedia Pivot\nLearning-Theoretic Foundations of Algorithm Configuration for  Combinatorial Partitioning Problems\nCausal Inference in Observational Data\nThe Effects of Relative Importance of User Constraints in Cloud of  Things Resource Discovery: A Case Study\nExplicablility as Minimizing Distance from Expected Behavior\nFictitious play for cooperative action selection in robot teams\nOptimal Dynamic Coverage Infrastructure for Large-Scale Fleets of  Reconnaissance UAVs\nLearning to reinforcement learn\nA Generalized Stochastic Variational Bayesian Hyperparameter Learning  Framework for Sparse Spectrum Gaussian Process Regression\nFaster variational inducing input Gaussian process classification\nStructural Causal Models: Cycles, Marginalizations, Exogenous  Reparametrizations and Reductions\nA Survey of Credit Card Fraud Detection Techniques: Data and Technique  Oriented Perspective\nA Deep Learning Approach for Joint Video Frame and Reward Prediction in  Atari Games\nEfficient Delivery Policy to Minimize User Traffic Consumption in  Guaranteed Advertising\nDynamic Key-Value Memory Networks for Knowledge Tracing\nThe Off-Switch Game\nLearning Python Code Suggestion with a Sparse Pointer Network\nDecision Support Systems in Fisheries and Aquaculture: A systematic  review\nBlocking and Other Enhancements for Bottom-Up Model Generation Methods\nSplit-door 
criterion for causal identification: Automatic search for  natural experiments\nLearning Concept Hierarchies through Probabilistic Topic Modeling\nFractional Order AGC for Distributed Energy Resources Using Robust  Optimization\nMeasuring and modeling the perception of natural and unconstrained gaze  in humans and machines\nCapacity and Trainability in Recurrent Neural Networks\nChoquet integral in decision analysis - lessons from the axiomatization\nProportional Justified Representation\nJoint Causal Inference from Multiple Contexts\nSpatial Decompositions for Large Scale SVMs\nSelf-critical Sequence Training for Image Captioning\nActive Search for Sparse Signals with Region Sensing\nCognitive Deep Machine Can Train Itself\nSummary - TerpreT: A Probabilistic Programming Language for Program  Induction\nA New Type-II Fuzzy Logic Based Controller for Non-linear Dynamical  Systems with Application to a 3-PSP Parallel Robot\nA Multi-Pass Approach to Large-Scale Connectomics\nFixpoint Approximation of Strategic Abilities under Imperfect  Information\nLearning in the Machine: Random Backpropagation and the Deep Learning  Channel\nSafety Verification and Control for Collision Avoidance at Road  Intersections\nTask-Guided and Path-Augmented Heterogeneous Network Embedding for  Author Identification\nStackGAN: Text to Photo-realistic Image Synthesis with Stacked  Generative Adversarial Networks\nKnowledge Elicitation via Sequential Probabilistic Inference for  High-Dimensional Prediction\nMultiple Instance Learning: A Survey of Problem Characteristics and  Applications\nA Model of Multi-Agent Consensus for Vague and Uncertain Beliefs\nOnline Reinforcement Learning for Real-Time Exploration in Continuous  State and Action Markov Decision Processes\nKnowledge Completion for Generics using Guided Tensor Factorization\nWeb-based Argumentation\nRetrieving sinusoids from nonuniformly sampled data using recursive  formulation\nAttentive Explanations: Justifying Decisions and 
Pointing to the  Evidence\nDynamical Kinds and their Discovery\nAdversarial Message Passing For Graphical Models\nA User Simulator for Task-Completion Dialogues\nOptimal Target Assignment and Path Finding for Teams of Agents\nWeb-based Semantic Similarity for Emotion Recognition in Web Objects\nAn Integrated Optimization + Learning Approach to Optimal Dynamic  Pricing for the Retailer with Multi-type Customers in Smart Grids\nAction-Driven Object Detection with Top-Down Visual Attentions\nA Survey of Deep Network Solutions for Learning Control in Robotics:  From Reinforcement to Imitation\nDifficulty Adjustable and Scalable Constrained Multi-objective Test  Problem Toolkit\nLiquid Democracy: An Analysis in Binary Aggregation and Diffusion\nTheory-guided Data Science: A New Paradigm for Scientific Discovery from  Data\nSequence-to-point learning with neural networks for nonintrusive load  monitoring\nWhen the map is better than the territory\nCounterfactual Prediction with Deep Instrumental Variables Networks\nDeep Neural Networks to Enable Real-time Multimessenger Astrophysics\nLazily Adapted Constant Kinky Inference for Nonparametric Regression and  Model-Reference Adaptive Control\nToward sensitive document release with privacy guarantees\nLearning local trajectories for high precision robotic tasks :  application to KUKA LBR iiwa Cartesian positioning\nPareto Efficient Multi Objective Optimization for Local Tuning of  Analogy Based Estimation\nReinforcement Learning based Embodied Agents Modelling Human Users  Through Interaction and Multi-Sensory Perception\nPredicting Citywide Crowd Flows Using Deep Spatio-Temporal Residual  Networks\nIoFClime: The fuzzy logic and the Internet of Things to control indoor  temperature regarding the outdoor ambient conditions\nLinear Disentangled Representation Learning for Facial Actions\nResidual LSTM: Design of a Deep Recurrent Architecture for Distant  Speech Recognition\nLearning to Invert: Signal Recovery via Deep 
Convolutional Networks\nUnknowable Manipulators: Social Network Curator Algorithms\nControl Capacity of Partially Observable Dynamic Systems in Continuous  Time\nParsimonious Inference on Convolutional Neural Networks: Learning and  applying on-line kernel activation rules\nSemantic Evolutionary Concept Distances for Effective Information  Retrieval in Query Expansion\nT-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects\nNeurogenesis-Inspired Dictionary Learning: Online Model Adaption in a  Changing World\nConstraint programming for planning test campaigns of communications  satellites\nIncorporating Prior Information in Compressive Online Robust Principal  Component Analysis\nTowards Automatic Learning of Heuristics for Mechanical Transformations  of Procedural Code\nDynamic time warping distance for message propagation classification in  Twitter\nThe Causal Frame Problem: An Algorithmic Perspective\nEntropic Causality and Greedy Minimum Entropy Coupling\nFeature base fusion for splicing forgery detection based on neuro fuzzy\nDecision structure of risky choice\nCredal Networks under Epistemic Irrelevance\nPathNet: Evolution Channels Gradient Descent in Super Neural Networks\nSpatial Projection of Multiple Climate Variables using Hierarchical  Multitask Learning\nComparing Dataset Characteristics that Favor the Apriori, Eclat or  FP-Growth Frequent Itemset Mining Algorithms\nProcedural Content Generation via Machine Learning (PCGML)\nManyopt: An Extensible Tool for Mixed, Non-Linear Optimization Through  SMT Solving\nAutonomous Braking System via Deep Reinforcement Learning\nEnergy Saving Additive Neural Network\nPhase Transitions of the Typical Algorithmic Complexity of the Random  Satisfiability Problem Studied with Linear Programming\nA VLA Search for Radio Signals from M31 and M33\nA Minimax Algorithm Better Than Alpha-beta?: No and Yes\nSimilarity Preserving Representation Learning for Time Series Analysis\nConstraint Answer Set Solver 
EZCSP and Why Integration Schemas Matter\nEfficient Multi-task Feature and Relationship Learning\nFrustratingly Short Attention Spans in Neural Language Modeling\nTheoretical and Practical Advances on Smoothing for Extensive-Form Games\nDirect Estimation of Information Divergence Using Nearest Neighbor  Ratios\nTowards a Unified Taxonomy of Biclustering Methods\nThreshold Constraints with Guarantees for Parity Objectives in Markov  Decision Processes\nA Circuit-Based Approach to Efficient Enumeration\nPolynomial Time Efficient Construction Heuristics for Vertex Separation  Minimization Problem\nAutomatic Liver and Tumor Segmentation of CT and MRI Volumes using  Cascaded Fully Convolutional Neural Networks\nLearning to Multi-Task by Active Sampling\nTowards a Common Implementation of Reinforcement Learning for Multiple  Robotic Tasks\nKnowledge Graph Completion via Complex Tensor Factorization\nRegularizing Face Verification Nets For Pain Intensity Regression\nProactive Resource Management in LTE-U Systems: A Deep Learning  Perspective\nDiverse Weighted Bipartite b-Matching\nMeasuring #GamerGate: A Tale of Hate, Sexism, and Bullying\nDeepNAT: Deep Convolutional Neural Network for Segmenting Neuroanatomy\nLearning What Data to Learn\nBorrowing Treasures from the Wealthy: Deep Transfer Learning through  Selective Joint Fine-tuning\nAnalysing Congestion Problems in Multi-agent Reinforcement Learning\nTruth and Regret in Online Scheduling\nConversion Rate Optimization through Evolutionary Computation\nLarge-Scale Evolution of Image Classifiers\nCount-Based Exploration with Neural Density Models\nMulti-step Reinforcement Learning: A Unifying Algorithm\nAdversarial Generation of Real-time Feedback with Neural Networks for  Simulation-based Training\nActivation Maximization Generative Adversarial Nets\nLearning a Unified Control Policy for Safe Falling\nDeep Variation-structured Reinforcement Learning for Visual Relationship  and Attribute Detection\nInterpretable 
Structure-Evolving LSTM\nModel-Agnostic Meta-Learning for Fast Adaptation of Deep Networks\nA Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification  and Domain Adaptation\nNumerical Integration and Dynamic Discretization in Heuristic Search  Planning over Hybrid Domains\nSymbol Grounding via Chaining of Morphisms\nBayesian Optimization with Gradients\nHigh-Throughput and Language-Agnostic Entity Disambiguation and Linking  on User Generated Data\nA computational investigation of sources of variability in sentence  comprehension difficulty in aphasia\nTowards Moral Autonomous Systems\nLearned Optimizers that Scale and Generalize\nLook into Person: Self-supervised Structure-sensitive Learning and A New  Benchmark for Human Parsing\nDatabase Learning: Toward a Database that Becomes Smarter Every Time\nSearching for Exoplanets Around X-Ray Binaries with Accreting White  Dwarfs, Neutron Stars, and Black Holes\nApproximation Complexity of Maximum A Posteriori Inference in  Sum-Product Networks\nEvolving Game Skill-Depth using General Video Game AI Agents\nExpecting the Unexpected: Training Detectors for Unusual Pedestrians  with Adversarial Imposters\nLearning Cooperative Visual Dialog Agents with Deep Reinforcement  Learning\nI2T2I: Learning Text to Image Synthesis with Textual Data Augmentation\nApplying Deep Machine Learning for psycho-demographic profiling of  Internet users using O.C.E.A.N. 
model of personality\nLearning Correspondence Structures for Person Re-identification\nRecurrent Topic-Transition GAN for Visual Paragraph Generation\nZM-Net: Real-time Zero-shot Image Manipulation Network\nRobustFill: Neural Program Learning under Noisy I/O\nSample and Computationally Efficient Learning Algorithms under S-Concave  Distributions\nUnsupervised Basis Function Adaptation for Reinforcement Learning\nEfficient Processing of Deep Neural Networks: A Tutorial and Survey\nAn Analysis of Visual Question Answering Algorithms\nDialectical Rough Sets, Parthood and Figures of Opposition\nFairJudge: Trustworthy User Prediction in Rating Platforms\nMidiNet: A Convolutional Generative Adversarial Network for  Symbolic-domain Music Generation\nEMULATOR vs REAL PHONE: Android Malware Detection Using Machine Learning\nComputing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural  Networks with Many More Parameters than Training Data\nOn the Reliable Detection of Concept Drift from Streaming Unlabeled Data\nContextual Data Collection for Smart Cities\nHuman-Aware Sensor Network Ontology: Semantic Support for Empirical Data  Collection\nTackling Dynamic Vehicle Routing Problem with Time Windows by means of  Ant Colony System\nThe quality of priority ratios estimation in relation to a selected  prioritization procedure and consistency measure for a Pairwise Comparison  Matrix\nAn Automated Text Categorization Framework based on Hyperparameter  Optimization\nPay Attention to Those Sets! 
Learning Quantification from Images\nA Dual-Stage Attention-Based Recurrent Neural Network for Time Series  Prediction\nComposite Task-Completion Dialogue Policy Learning via Hierarchical Deep  Reinforcement Learning\nQuality Aware Network for Set to Set Recognition\nOn Generalized Bellman Equations and Temporal-Difference Learning\nA Security Monitoring Framework For Virtualization Based HEP  Infrastructures\nEffective Warm Start for the Online Actor-Critic Reinforcement Learning  based mHealth Intervention\nLarger is Better: The Effect of Learning Rates Enjoyed by Stochastic  Optimization with Progressive Variance Reduction\nA Century of Science: Globalization of Scientific Collaborations,  Citations, and Innovations\nIntrusion Prevention and Detection in Grid Computing - The ALICE Case\nAccurately and Efficiently Interpreting Human-Robot Instructions of  Varying Granularities\nA hybrid spatial data mining approach based on fuzzy topological  relations and MOSES evolutionary algorithm\nMisspecified Linear Bandits\nA General Theory for Training Learning Machine\nTowards Instance Segmentation with Object Priority: Prominent Object  Detection and Recognition\nLeveraging Patient Similarity and Time Series Data in Healthcare  Predictive Models\nSharing deep generative representation for perceived image  reconstruction from human brain activity\nEvent Stream-Based Process Discovery using Abstract Representations\nConsensus measure of rankings\nNetwork-based coverage of mutational profiles reveals cancer genes\nClassical Planning in Deep Latent Space: Bridging the  Subsymbolic-Symbolic Boundary\nA Partitioning Algorithm for Detecting Eventuality Coincidence in  Temporal Double recurrence\nTree-Structured Neural Machine for Linguistics-Aware Sentence Generation\nQuantifying Mental Health from Social Media with Neural User Embeddings\nShow, Adapt and Tell: Adversarial Training of Cross-domain Image  Captioner\nThe N-Tuple Bandit Evolutionary Algorithm for Automatic Game 
Improvement\nFast k-means based on KNN Graph\nA Workflow for Visual Diagnostics of Binary Classifiers using  Instance-Level Explanations\nExploring Latent Semantic Factors to Find Useful Product Reviews\nItem Recommendation with Evolving User Preferences and Experience\nR2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD  Malware Detections\nLong-term Blood Pressure Prediction with Deep Recurrent Neural Networks\nClingcon: The Next Generation\nMonaural Audio Speaker Separation with Source Contrastive Estimation\nRelaxation heuristics for the set multicover problem with generalized  upper bound constraints\nComparison-Based Choices\nOnline Article Ranking as a Constrained, Dynamic, Multi-Objective  Optimization Problem\nMultiobjective Programming for Type-2 Hierarchical Fuzzy Inference Trees\nREMIX: Automated Exploration for Interactive Outlier Detection\nDistributed Vector Representation Of Shopping Items, The Customer And  Shopping Cart To Build A Three Fold Recommendation System\nEvolving Ensemble Fuzzy Classifier\nI Probe, Therefore I Am: Designing a Virtual Journalist with Human  Emotions\nLearning Spatiotemporal Features for Infrared Action Recognition with 3D  Convolutional Neural Networks\nFeature Control as Intrinsic Motivation for Hierarchical Reinforcement  Learning\nLearning Convolutional Text Representations for Visual Question  Answering\nParameter Adaptation and Criticality in Particle Swarm Optimization\nA Comparison of Reinforcement Learning Techniques for Fuzzy Cloud  Auto-Scaling\nBatch Reinforcement Learning on the Industrial Benchmark: First  Experiences\nLearning to Factor Policies and Action-Value Functions: Factored Action  Space Representations for Deep Reinforcement learning\nLearning to Mix n-Step Returns: Generalizing lambda-Returns for Deep  Reinforcement Learning\nShallow Updates for Deep Reinforcement Learning\nNear-Feasible Stable Matchings with Budget Constraints\nThinking Fast and Slow with Deep Learning and Tree 
Search\nMMD GAN: Towards Deeper Understanding of Moment Matching Network\nHow a General-Purpose Commonsense Ontology can Improve Performance of  Learning-Based Image Retrieval\nCompiling quantum circuits to realistic hardware architectures using  temporal planners\nLogic Tensor Networks for Semantic Image Interpretation\nOperation Frames and Clubs in Kidney Exchange\nGenerating Time-Based Label Refinements to Discover More Precise Process  Models\nMultiple Source Domain Adaptation with Adversarial Training of Neural  Networks\nAMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks\nMulti-shot ASP solving with clingo\nBayesian Unification of Gradient and Bandit-based Learning for  Accelerated Global Optimisation\nMulti-Labelled Value Networks for Computer Go\nStrength Factors: An Uncertainty System for a Quantified Modal Logic\nLifelong Multi-Agent Path Finding for Online Pickup and Delivery Tasks\nThe Sample Complexity of Online One-Class Collaborative Filtering\nCross-modal Common Representation Learning by Hybrid Transfer Network\nInterpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient  Estimation for Deep Reinforcement Learning\nLearning Disentangled Representations with Semi-Supervised Deep  Generative Models\nHyperparameter Optimization: A Spectral Approach\nVisuospatial Skill Learning for Robots\nEvent Representations for Automated Story Generation with Deep Neural  Nets\nEmergence of Invariance and Disentangling in Deep Representations\nSeamless Integration and Coordination of Cognitive Skills in Humanoid  Robots: A Deep Learning Approach\nPredictive Coding-based Deep Dynamic Neural Network for Visuomotor  Learning\nWhere is my forearm? 
Clustering of body parts from simultaneous tactile  and linguistic input using sequential mapping\nItem Silk Road: Recommending Items from Information Domains to Social  Users\nDeep Optimization for Spectrum Repacking\nLearning Large-Scale Topological Maps Using Sum-Product Networks\nOptimal Auctions through Deep Learning\nDAC-h3: A Proactive Robot Cognitive Architecture to Acquire and Express  Knowledge About the World and the Self\nSEVEN: Deep Semi-supervised Verification Networks\nGetting deep recommenders fit: Bloom embeddings for sparse binary  input/output networks\nMeta learning Framework for Automated Driving\nBeyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural  Network and Long-Term Evaluation\nAutonomous Reactive Mission Scheduling and Task-Path Planning  Architecture for Autonomous Underwater Vehicle\nIdentifying Spatial Relations in Images using Convolutional Neural  Networks\nThe Opacity of Backbones and Backdoors Under a Weak Assumption\nLearning a visuomotor controller for real world robotic grasping using  simulated depth images\nTarget Curricula via Selection of Minimum Feature Sets: a Case Study in  Boolean Networks\nDevice Placement Optimization with Reinforcement Learning\nEvaluating Noisy Optimisation Algorithms: First Hitting Time is  Problematic\nAccelerating Innovation Through Analogy Mining\nConsistent feature attribution for tree ensembles\nScalable Co-Optimization of Morphology and Control in Embodied Machines\nA Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural  Networks for Acoustic Scene Classification\nEnsemble Framework for Real-time Decision Making\nInter-Session Modeling for Session-Based Recommendation\nComparing Neural and Attractiveness-based Visual Features for Artwork  Recommendation\nPreserving Intermediate Objectives: One Simple Trick to Improve Learning  for Hierarchical Models\nJustifications in Constraint Handling Rules for Logical Retraction in  Dynamic Algorithms\nAuto-Encoding 
User Ratings via Knowledge Graphs in Recommendation  Scenarios\nStochastic Bandit Models for Delayed Conversions\nA Deep Reinforcement Learning Framework for the Financial Portfolio  Management Problem\nTableaux for Policy Synthesis for MDPs with PCTL* Constraints\nHashing Over Predicted Future Frames for Informed Exploration of Deep  Reinforcement Learning\nAutomated Problem Identification: Regression vs Classification via  Evolutionary Deep Networks\nEfficient Probabilistic Performance Bounds for Inverse Reinforcement  Learning\nELF: An Extensive, Lightweight and Flexible Research Platform for  Real-time Strategy Games\nWasserstein Distance Guided Representation Learning for Domain  Adaptation\nInformation-gain computation\nAn HTM based cortical algorithm for detection of seismic waves\nApplication of Fuzzy Assessing for Reliability Decision Making\nEmergence of Locomotion Behaviours in Rich Environments\nPELESent: Cross-domain polarity classification using distant supervision\nBest-Effort Inductive Logic Programming via Fine-grained Cost-based  Hypothesis Generation\nRevisiting Unreasonable Effectiveness of Data in Deep Learning Era\nLearning Heuristic Search via Imitation\nAn Optimal Bayesian Network Based Solution Scheme for the Constrained  Stochastic On-line Equi-Partitioning Problem\nA Simple Neural Attentive Meta-Learner\nImitation from Observation: Learning to Imitate Behaviors from Raw Video  via Context Translation\nNO Need to Worry about Adversarial Examples in Object Detection in  Autonomous Vehicles\nExoplanet Transits as the Foundation of an Interstellar Communications  Network\nKafnets: kernel-based non-parametric activation functions for neural  networks\nDisentangling Motion, Foreground and Background Features in Videos\nComparison of Multiple Features and Modeling Methods for Text-dependent  Speaker Verification\nLenient Multi-Agent Deep Reinforcement Learning\nOn consistency of optimal pricing algorithms in repeated posted-price  auctions with 
strategic buyer\nWhen You Must Forget: beyond strong persistence when forgetting in  answer set programming\nTrial without Error: Towards Safe Reinforcement Learning via Human  Intervention\nLatent Relational Metric Learning via Memory-based Attention for  Collaborative Ranking\nDetection, Recognition and Tracking of Moving Objects from Real-time  Video via Visual Vocabulary Model and Species Inspired PSO\nObject Tracking based on Quantum Particle Swarm Optimization\nReverse Curriculum Generation for Reinforcement Learning\nLogic Programming approaches for routing fault-free and  maximally-parallel Wavelength Routed Optical Networks on Chip (Application  paper)\nHybrid Conditional Planning using Answer Set Programming\nLearning model-based planning from scratch\nRepresenting Hybrid Automata by Action Language Modulo Theories\nDiscretization-free Knowledge Gradient Methods for Bayesian Optimization\nRAIL: Risk-Averse Imitation Learning\nOutcome-Oriented Predictive Process Monitoring: Review and Benchmark\nDomain Recursion for Lifted Inference with Existential Quantifiers\nRelational Learning and Feature Extraction by Querying over  Heterogeneous Information Networks\nEvidence combination for a large number of sources\nProbabilistic Graphical Models for Credibility Analysis in Evolving  Online Communities\nA Logic for Global and Local Announcements\nRelaxing Exclusive Control in Boolean Games\nMen Also Like Shopping: Reducing Gender Bias Amplification using  Corpus-level Constraints\nLarge-Scale Low-Rank Matrix Learning with Nonconvex Regularizers\nAttend and Predict: Understanding Gene Regulation by Selective Attention  on Chromatin\nBalancing Explicability and Explanation in Human-Aware Planning\nA Novel Neural Network Model Specified for Representing Logical  Relations\nHidden Physics Models: Machine Learning of Nonlinear Partial  Differential Equations\nProjectionNet: Learning Efficient On-Device Deep Networks Using Neural  Projections\nAdversarial-Playground: A 
Visualization Suite Showing How Adversarial  Examples Fool Deep Learning\nObject-Oriented Sokoban Solver: A Serious Game Project for OOAD and AI  Education\nThe Argument Reasoning Comprehension Task: Identification and  Reconstruction of Implicit Warrants\nAn Effective Training Method For Deep Convolutional Neural Network\nThompson Sampling Guided Stochastic Searching on the Line for Deceptive  Environments with Applications to Root-Finding Problems\nTowards Social Autonomous Vehicles: Efficient Collision Avoidance Scheme  Using Richardson's Arms Race Model\nMeasuring Catastrophic Forgetting in Neural Networks\nRegulating Highly Automated Robot Ecologies: Insights from Three User  Studies\nPowerAI DDL\nUnsupervised Domain Adaptation for Face Recognition in Unlabeled Videos\nAn Approach with Toric Varieties for Singular Learning Machines\nDeep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual  Cross Retrieval\nA Framework for Inferring Causality from Multi-Relational Observational  Data using Conditional Independence\nRobust Computer Algebra, Theorem Proving, and Oracle AI\nNeural Network Dynamics for Model-Based Deep Reinforcement Learning with  Model-Free Fine-Tuning\nAn automatic water detection approach based on Dempster-Shafer theory  for multi spectral images\nA Simple and Realistic Pedestrian Model for Crowd Simulation and  Application\nSESA: Supervised Explicit Semantic Analysis\nUniversal limits to parallel processing capability of network  architectures\nResilient Linear Classification: An Approach to Deal with Attacks on  Training Data\nLearning to Attend, Copy, and Generate for Session-Based Query  Suggestion\nBelief Tree Search for Active Object Recognition\nOptimization of Ensemble Supervised Learning Algorithms for Increased  Sensitivity, Specificity, and AUC of Population-Based Colorectal Cancer  Screenings\nA scalable multi-core architecture with heterogeneous memory structures  for Dynamic Neuromorphic Asynchronous Processors 
(DYNAPs)\nLearning to Plan Chemical Syntheses\nDistance and Similarity Measures Effect on the Performance of K-Nearest  Neighbor Classifier - A Review\nDeepFaceLIFT: Interpretable Personalized Models for Automatic Estimation  of Self-Reported Pain\nGeometric Enclosing Networks\nMulti-task Neural Network for Non-discrete Attribute Prediction in  Knowledge Graphs\nEvaluating Visual Conversational Agents via Cooperative Human-AI Games\nLearning Musical Relations using Gated Autoencoders\nA Data and Model-Parallel, Distributed and Scalable Framework for  Training of Deep Networks in Apache Spark\nStochastic Primal-Dual Proximal ExtraGradient Descent for Compositely  Regularized Optimization\nEfficient Online Inference for Infinite Evolutionary Cluster models with  Applications to Latent Social Event Discovery\nFake News in Social Networks\nWhat caused what? An irreducible account of actual causation\nPredicting Aesthetic Score Distribution through Cumulative  Jensen-Shannon Divergence\nFinding Streams in Knowledge Graphs to Support Fact Checking\nOn the Compressive Power of Deep Rectifier Networks for High Resolution  Representation of Class Boundaries\nArea Protection in Adversarial Path-Finding Scenarios with Multiple  Mobile Agents on Graphs: a theoretical and experimental study of  target-allocation strategies for defense coordination\nLearning 6-DOF Grasping Interaction with Deep Geometry-aware 3D  Representations\nDivergence, Entropy, Information: An Opinionated Introduction to  Information Theory\nHamiltonian Maker-Breaker games on small graphs\nDelayed Sampling and Automatic Rao-Blackwellization of Probabilistic  Programs\nAccelerating Dependency Graph Learning from Heterogeneous Categorical  Event Streams via Knowledge Transfer\nRobust Task Clustering for Deep Many-Task Learning\nLocal Gaussian Processes for Efficient Fine-Grained Traffic Speed  Prediction\nSubspace Selection to Suppress Confounding Source Domain Information in  AAM Transfer 
Learning\nDeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous  Cars\nA Deep Learning Approach for Population Estimation from Satellite  Imagery\nCalibrating chemical multisensory devices for real world applications:  An in-depth comparison of quantitative Machine Learning approaches\nIncorporating Feedback into Tree-based Anomaly Detection\nR$^3$: Reinforced Reader-Ranker for Open-Domain Question Answering\nBehavior Trees in Robotics and AI: An Introduction\nSeq2SQL: Generating Structured Queries from Natural Language using  Reinforcement Learning\nSmile for the Camera: Privacy and Policy Implications of Emotion AI\nFrom Query-By-Keyword to Query-By-Example: LinkedIn Talent Search  Approach\nUsing Summarization to Discover Argument Facets in Online Ideological  Dialog\nMaintaining Ad-Hoc Communication Network in Area Protection Scenarios  with Adversarial Agents\nBOOK: Storing Algorithm-Invariant Episodes for Deep Reinforcement  Learning\nInteracting Attention-gated Recurrent Networks for Recommendation\nOpening the Black Box of Financial AI with CLEAR-Trade: A CLass-Enhanced  Attentive Response Approach for Explaining and Visualizing Deep  Learning-Driven Stock Market Prediction\nUnsupervised Generative Modeling Using Matrix Product States\nMeasuring the Similarity of Sentential Arguments in Dialog\nCausalGAN: Learning Causal Implicit Generative Models with Adversarial  Training\nFormulation of Deep Reinforcement Learning Architecture Toward  Autonomous Driving for On-Ramp Merge\nRepresentation Learning for Visual-Relational Knowledge Graphs\nA Deep Reinforcement Learning Chatbot\nProsocial learning agents solve generalized Stag Hunts better than  selfish ones\nIdentifying Irregular Power Usage by Turning Predictions into  Holographic Spatial Visualizations\nMBMF: Model-Based Priors for Model-Free Reinforcement Learning\nFairness Testing: Testing Software for Discrimination\nOn better training the infinite restricted Boltzmann machines\nAutonomous 
Quadrotor Landing using Deep Reinforcement Learning\nCombining Strategic Learning and Tactical Search in Real-Time Strategy  Games\nMeta-QSAR: a large-scale application of meta-learning to drug design and  discovery\nMultimodal Content Analysis for Effective Advertisements on YouTube\nVariational Reasoning for Question Answering with Knowledge Graph\nComputing the Shapley Value in Allocation Problems: Approximations and  Bounds, with an Application to the Italian VQR Research Assessment Program\nAssessing State-of-the-Art Sentiment Models on State-of-the-Art  Sentiment Datasets\nTowards personalized human AI interaction - adapting the behavior of AI  agents using neural signatures of subjective interest\nRobustness Analysis of Visual QA Models by Basic Questions\nDiSAN: Directional Self-Attention Network for RNN/CNN-Free Language  Understanding\nMotif-based Rule Discovery for Predicting Real-valued Time Series\nUnsupervised state representation learning with robotic priors: a  robustness benchmark\nA Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking  Interacting Objects\nThe Geometric Block Model\nSKOS Concepts and Natural Language Concepts: an Analysis of Latent  Relationships in KOSs\nMitigating Evasion Attacks to Deep Neural Networks via Region-based  Classification\nPush and Pull Search for Solving Constrained Multi-objective  Optimization Problems\nAJILE Movement Prediction: Multimodal Deep Learning for Natural Human  Neural Recordings and Video\nTowards Cognitive-and-Immersive Systems: Experiments in a Shared (or  common) Blockworld Framework\nLeveraging Distributional Semantics for Multi-Label Learning\nModel-Powered Conditional Independence Test\nMuseGAN: Multi-track Sequential Generative Adversarial Networks for  Symbolic Music Generation and Accompaniment\nUnsupervised Machine Learning for Networking: Techniques, Applications  and Research Challenges\nDoctoral Advisor or Medical Condition: Towards Entity-specific Rankings  of Knowledge 
Base Properties [Extended Version]\nEMR-based medical knowledge representation and inference via Markov  random fields and distributed representation learning\nUsing Parameterized Black-Box Priors to Scale Up Model-Based Policy  Search for Robotics\nLearning Complex Swarm Behaviors by Exploiting Local Communication  Protocols with Deep Reinforcement Learning\nNon-Depth-First Search against Independent Distributions on an AND-OR  Tree\nInfluence of Personal Preferences on Link Dynamics in Social Networks\nSparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single  Image\nMRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product  Embeddings\nA Comprehensive Survey of Graph Embedding: Problems, Techniques and  Applications\nUsing Simulation and Domain Adaptation to Improve Efficiency of Deep  Robotic Grasping\nExtracting Ontological Knowledge from Textual Descriptions\nMining a Sub-Matrix of Maximal Sum\nBayesian Filtering for ODEs with Bounded Derivatives\nThe Consciousness Prior\nLong Text Generation via Adversarial Training with Leaked Information\nUltra-Dense HetNets Meet Big Data: Green Frameworks, Techniques, and  Approaches\nObject-oriented Neural Programming (OONP) for Document Understanding\nExact MAP inference in general higher-order graphical models using  linear programming\nBeyond opening up the black box: Investigating the role of algorithmic  systems in Wikipedian organizational culture\nAutomatic Error Analysis of Human Motor Performance for Interactive  Coaching in Virtual Reality\nDose Prediction with U-net: A Feasibility Study for Predicting Dose  Distributions from Contours using Deep Learning on Prostate IMRT Patients\nFSL-BM: Fuzzy Supervised Learning with Binary Meta-Feature for  Classification\nResearch on several key technologies in practical speech emotion  recognition\nAngriffserkennung für industrielle Netzwerke innerhalb des Projektes  IUNO\nIntroducing machine learning for power system operation support\nTraffic 
Optimization For a Mixture of Self-interested and Compliant  Agents\nEdina: Building an Open Domain Socialbot with Self-dialogues\nTowards Optimally Decentralized Multi-Robot Collision Avoidance via Deep  Reinforcement Learning\nDeep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces\nNeural and Synaptic Array Transceiver: A Brain-Inspired Computing  Framework for Embedded Learning\nThe BURCHAK corpus: a Challenge Data Set for Interactive Learning of  Visually Grounded Word Meanings\nSelf-supervised Deep Reinforcement Learning with Generalized Computation  Graphs for Robot Navigation\nSE3-Pose-Nets: Structured Deep Dynamics Models for Visuomotor Planning  and Control\nPrivacy-Preserving Deep Inference for Rich User Data on The Cloud\nHANDY: A Hybrid Association Rules Mining Approach for Network Layer  Discovery of Services for Mobile Ad hoc Network\nExploration in Feature Space for Reinforcement Learning\nHow Much Chemistry Does a Deep Neural Network Need to Know to Make  Accurate Predictions?\nProjection Based Weight Normalization for Deep Neural Networks\nDistributed Kernel K-Means for Large Scale Clustering\nTowards Agent-Based Model Specification in Smart Grid: A Cognitive  Agent-based Computing Approach\nOn Preemption and Overdetermination in Formal Theories of Causality\nAdapting a Formal Model Theory to Applications in Augmented Personalized  Medicine\nMixed Precision Training\nOptimizing Long Short-Term Memory Recurrent Neural Networks Using Ant  Colony Optimization to Predict Turbine Engine Vibration\nPRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement  Learning and Sampling-based Planning\nNeural Program Meta-Induction\nTowards Scalable Spectral Clustering via Spectrum-Preserving  Sparsification\nExplaining Aviation Safety Incidents Using Deep Temporal Multiple  Instance Learning\nDeep Learning for Case-Based Reasoning through Prototypes: A Neural  Network that Explains Its Predictions\nCommunity Aware Random Walk for 
Network Embedding\nMental Sampling in Multimodal Representations\nSelf-Supervised Visual Planning with Temporal Skip Connections\nA systematic study of the class imbalance problem in convolutional  neural networks\nA retrieval-based dialogue system utilizing utterance and context  embeddings\nSafe Medicine Recommendation via Medical Knowledge Graph Embedding\nReply With: Proactive Recommendation of Email Attachments\nSpontaneous Symmetry Breaking in Neural Networks\nMap-based Multi-Policy Reinforcement Learning: Enhancing Adaptability of  Robots by Deep Reinforcement Learning\nAuditing Black-Box Models Using Transparent Model Distillation With Side  Information\nLaying Down the Yellow Brick Road: Development of a Wizard-of-Oz  Interface for Collecting Human-Robot Dialogue\nA Bayesian Perspective on Generalization and Stochastic Gradient Descent\nLearning Pose Grammar to Encode Human Body Configuration for 3D Pose  Estimation\nUnsupervised Sentence Representations as Word Information Series:  Revisiting TF--IDF\nProtein Folding Optimization using Differential Evolution Extended with  Local Search and Component Reinitialization\nClassification Driven Dynamic Image Enhancement\nA Novel Stochastic Stratified Average Gradient Method: Convergence Rate  and Its Complexity\nExploiting generalization in the subspaces for faster model-based  learning\nHPC Cloud for Scientific and Business Applications: Taxonomy, Vision,  and Research Challenges\nA Memristor-Based Optimization Framework for AI Applications\nModel Identification via Physics Engines for Improved Policy Search\nMulti-Objective Approaches to Markov Decision Processes with Uncertain  Transition Parameters\nInductive Representation Learning in Large Attributed Graphs\nSRE: Semantic Rules Engine For the Industrial Internet-Of-Things  Gateways\nOn modeling vagueness and uncertainty in data-to-text systems through  fuzzy sets\nInverse Reinforcement Learning Under Noisy Observations\nDiscovery Radiomics with CLEAR-DR: 
Interpretable Computer Aided  Diagnosis of Diabetic Retinopathy\nTransfer Learning to Learn with Multitask Neural Model Search\nStackGAN++: Realistic Image Synthesis with Stacked Generative  Adversarial Networks\nHow Should a Robot Assess Risk? Towards an Axiomatic Theory of Risk in  Robotics\nSemantic Code Repair using Neuro-Symbolic Transformation Networks\nSuper-polynomial and exponential improvements for quantum-enhanced  reinforcement learning\nImprove SAT-solving with Machine Learning\nPrototype Matching Networks for Large-Scale Multi-label Genomic Sequence  Classification\nTreeQN and ATreeC: Differentiable Tree-Structured Models for Deep  Reinforcement Learning\nAbnormal Spatial-Temporal Pattern Analysis for Niagara Frontier Border  Wait Times\nFraternal Dropout\nSeparation of Water and Fat Magnetic Resonance Imaging Signals Using  Deep Learning with Convolutional Neural Networks\nErratum: Link prediction in drug-target interactions network using  similarity indices\nPiecewise Linear Neural Network verification: A comparative study\nSchool bus routing by maximizing trip compatibility\nSCDA: School Compatibility Decomposition Algorithm for Solving the  Multi-School Bus Routing and Scheduling Problem\nBeautiful and damned. 
Combined effect of content quality and social ties  on user engagement\nAdaptive coordination of working-memory and reinforcement learning in  non-human primates performing a trial-and-error problem solving task\nLearning to Represent Programs with Graphs\nFramework for evaluation of sound event detection in web videos\nProvable defenses against adversarial examples via the convex outer  adversarial polytope\nRunning Time Analysis of the (1+1)-EA for OneMax and LeadingOnes under  Bit-wise Noise\nDiscovering More Precise Process Models from Event Logs by Filtering Out  Chaotic Activities\nGuiding the search in continuous state-action spaces by learning an  action sampling distribution from off-target samples\nTransaction Fraud Detection Using GRU-centered Sandwich-structured Model\nHPX Smart Executors\nOnline Tool Condition Monitoring Based on Parsimonious Ensemble+\nVisually-Aware Fashion Recommendation and Design with Generative Image  Models\nContinuous DR-submodular Maximization: Structure and Algorithms\nLearning K-way D-dimensional Discrete Code For Compact Embedding  Representations\nLearning and Real-time Classification of Hand-written Digits With  Spiking Neural Networks\n\"Dave...I can assure you...that it's going to be all right...\" -- A  definition, case for, and survey of algorithmic assurances in human-autonomy  trust relationships\nKBGAN: Adversarial Learning for Knowledge Graph Embeddings\nDifferential Performance Debugging with Discriminant Regression Trees\nUnified Spectral Clustering with Optimal Graph\n"
  },
  {
    "path": "data/arxiv/artificial intelligence_134_15000_200_abs.txt",
    "content": "The singularity refers to an idea that once a machine having an artificial intelligence surpassing the human intelligence capacity is created, it will trigger explosive technological and intelligence growth. I propose to test the hypothesis that machine intelligence capacity can grow autonomously starting with an intelligence comparable to that of bacteria - microbial intelligence. The goal will be to demonstrate that rapid growth in intelligence capacity can be realized at all in artificial computing systems. I propose the following three properties that may allow an artificial intelligence to exhibit a steady growth in its intelligence capacity: (i) learning with the ability to modify itself when exposed to more data, (ii) acquiring new functionalities (skills), and (iii) expanding or replicating itself. The algorithms must demonstrate a rapid growth in skills of dataprocessing and analysis and gain qualitatively different functionalities, at least until the current computing technology supports their scalable development. The existing algorithms that already encompass some of these or similar properties, as well as missing abilities that must yet be implemented, will be reviewed in this work. Future computational tests could support or oppose the hypothesis that artificial intelligence can potentially grow to the level of superintelligence which overcomes the limitations in hardware by producing necessary processing resources or by changing the physical realization of computation from using chip circuits to using quantum computing principles.\nNowadays, considering the speed of the processes and the amount of data used in cyber defense, it cannot be expected to have an effective defense by using only human power without the help of automation systems. However, for the effective defense against dynamically evolving attacks on networks, it is difficult to develop software with conventional fixed algorithms. 
This can be achieved by using artificial intelligence methods that provide flexibility and learning capability. The likelihood of developing cyber defense capabilities through increased intelligence of defense systems is quite high. Given the problems associated with cyber defense in real life, it is clear that many cyber defense problems can be successfully solved only when artificial intelligence methods are used. In this article, the current artificial intelligence practices and techniques are reviewed and the use and importance of artificial intelligence in cyber defense systems is mentioned. The aim of this article is to be able to explain the use of these methods in the field of cyber defense with current examples by considering and analyzing the artificial intelligence technologies and methodologies that are currently being developed and integrating them with the role and adaptation of the technology and methodology in the defense of cyberspace.\nThe large distances involved in interstellar travel require a high degree of spacecraft autonomy, realized by artificial intelligence. The breadth of tasks artificial intelligence could perform on such spacecraft involves maintenance, data collection, designing and constructing an infrastructure using in-situ resources. Despite its importance, existing publications on artificial intelligence and interstellar travel are limited to cursory descriptions where little detail is given about the nature of the artificial intelligence. This article explores the role of artificial intelligence for interstellar travel by compiling use cases, exploring capabilities, and proposing typologies, system and mission architectures. Estimations for the required intelligence level for specific types of interstellar probes are given, along with potential system and mission architectures, covering those proposed in the literature but also presenting novel ones. 
Finally, a generic design for interstellar probes with an AI payload is proposed. Given current levels of increase in computational power, a spacecraft with a similar computational power as the human brain would have a mass from dozens to hundreds of tons in a 2050-2060 timeframe. Given that the advent of the first interstellar missions and artificial general intelligence are estimated to be by the mid-21st century, a more in-depth exploration of the relationship between the two should be attempted, focusing on neglected areas such as protecting the artificial intelligence payload from radiation in interstellar space and the role of artificial intelligence in self-replication.\nThe first decade of this century has seen the nascency of the first mathematical theory of general artificial intelligence. This theory of Universal Artificial Intelligence (UAI) has made significant contributions to many theoretical, philosophical, and practical AI questions. In a series of papers culminating in book (Hutter, 2005), an exciting sound and complete mathematical model for a super intelligent agent (AIXI) has been developed and rigorously analyzed. While nowadays most AI researchers avoid discussing intelligence, the award-winning PhD thesis (Legg, 2008) provided the philosophical embedding and investigated the UAI-based universal measure of rational intelligence, which is formal, objective and non-anthropocentric. Recently, effective approximations of AIXI have been derived and experimentally investigated in JAIR paper (Veness et al. 2011). This practical breakthrough has resulted in some impressive applications, finally muting earlier critique that UAI is only a theory. For the first time, without providing any domain knowledge, the same agent is able to self-adapt to a diverse range of interactive environments. 
For instance, AIXI is able to learn from scratch to play TicTacToe, Pacman, Kuhn Poker, and other games by trial and error, without even providing the rules of the games.   These achievements give new hope that the grand goal of Artificial General Intelligence is not elusive.   This article provides an informal overview of UAI in context. It attempts to gently introduce a very theoretical, formal, and mathematical subject, and discusses philosophical and technical ingredients, traits of intelligence, some social questions, and the past and future of UAI.\nThe rise of deep learning has brought artificial intelligence (AI) to the forefront. The ultimate goal of AI is to realize machines with human mind and consciousness, but existing achievements mainly simulate intelligent behavior on computer platforms. These achievements all belong to weak AI rather than strong AI. How to achieve strong AI is not known yet in the field of intelligence science. Currently, this field is calling for a new paradigm, especially Theory of Cognitive Relativity (TCR). The TCR aims to summarize a simple and elegant set of first principles about the nature of intelligence, at least including the Principle of World's Relativity and the Principle of Symbol's Relativity. The Principle of World's Relativity states that the subjective world an intelligent agent can observe is strongly constrained by the way it perceives the objective world. The Principle of Symbol's Relativity states that an intelligent agent can use any physical symbol system to express what it observes in its subjective world. The two principles are derived from scientific facts and life experience. Thought experiments show that they are important to understand high-level intelligence and necessary to establish a scientific theory of mind and consciousness. Rather than brain-like intelligence, the TCR indeed advocates a promising change in direction to realize true AI, i.e. 
artificial general intelligence or artificial consciousness, particularly different from humans' and animals'. Furthermore, a TCR creed has been presented and extended to reveal the secrets of consciousness and to guide realization of conscious machines. In the sense that true AI could be diversely implemented in a brain-different way, the TCR would probably drive an intelligence revolution in combination with some additional first principles.\nWith the breakthroughs in deep learning, the recent years have witnessed a booming of artificial intelligence (AI) applications and services, spanning from personal assistant to recommendation systems to video/audio surveillance. More recently, with the proliferation of mobile computing and Internet-of-Things (IoT), billions of mobile and IoT devices are connected to the Internet, generating zillions Bytes of data at the network edge. Driving by this trend, there is an urgent need to push the AI frontiers to the network edge so as to fully unleash the potential of the edge big data. To meet this demand, edge computing, an emerging paradigm that pushes computing tasks and services from the network core to the network edge, has been widely recognized as a promising solution. The resulted new inter-discipline, edge AI or edge intelligence, is beginning to receive a tremendous amount of interest. However, research on edge intelligence is still in its infancy stage, and a dedicated venue for exchanging the recent advances of edge intelligence is highly desired by both the computer system and artificial intelligence communities. To this end, we conduct a comprehensive survey of the recent research efforts on edge intelligence. Specifically, we first review the background and motivation for artificial intelligence running at the network edge. We then provide an overview of the overarching architectures, frameworks and emerging key technologies for deep learning model towards training/inference at the network edge. 
Finally, we discuss future research opportunities on edge intelligence. We believe that this survey will elicit escalating attentions, stimulate fruitful discussions and inspire further research ideas on edge intelligence.\nAI technology has a long history which is actively and constantly changing and growing. It focuses on intelligent agents, which contain devices that perceive the environment and based on which takes actions in order to maximize goal success chances. In this paper, we will explain the modern AI basics and various representative applications of AI. In the context of the modern digitalized world, AI is the property of machines, computer programs, and systems to perform the intellectual and creative functions of a person, independently find ways to solve problems, be able to draw conclusions and make decisions. Most artificial intelligence systems have the ability to learn, which allows people to improve their performance over time. The recent research on AI tools, including machine learning, deep learning and predictive analysis intended toward increasing the planning, learning, reasoning, thinking and action taking ability. Based on which, the proposed research intends towards exploring on how the human intelligence differs from the artificial intelligence. Moreover, we critically analyze what AI of today is capable of doing, why it still cannot reach human intelligence and what are the open challenges existing in front of AI to reach and outperform human level of intelligence. Furthermore, it will explore the future predictions for artificial intelligence and based on which potential solution will be recommended to solve it within next decades.\nThis article deals with the links between the enaction paradigm and artificial intelligence. Enaction is considered a metaphor for artificial intelligence, as a number of the notions which it deals with are deemed incompatible with the phenomenal field of the virtual. 
After explaining this stance, we shall review previous works regarding this issue in terms of artifical life and robotics. We shall focus on the lack of recognition of co-evolution at the heart of these approaches. We propose to explicitly integrate the evolution of the environment into our approach in order to refine the ontogenesis of the artificial system, and to compare it with the enaction paradigm. The growing complexity of the ontogenetic mechanisms to be activated can therefore be compensated by an interactive guidance system emanating from the environment. This proposition does not however resolve that of the relevance of the meaning created by the machine (sense-making). Such reflections lead us to integrate human interaction into this environment in order to construct relevant meaning in terms of participative artificial intelligence. This raises a number of questions with regards to setting up an enactive interaction. The article concludes by exploring a number of issues, thereby enabling us to associate current approaches with the principles of morphogenesis, guidance, the phenomenology of interactions and the use of minimal enactive interfaces in setting up experiments which will deal with the problem of artificial intelligence in a variety of enaction-based ways.\nThe elusive quest for intelligence in artificial intelligence prompts us to consider that instituting human-level intelligence in systems may be (still) in the realm of utopia. In about a quarter century, we have witnessed the winter of AI (1990) being transformed and transported to the zenith of tabloid fodder about AI (2015). The discussion at hand is about the elements that constitute the canonical idea of intelligence. 
The delivery of intelligence as a pay-per-use-service, popping out of an app or from a shrink-wrapped software defined point solution, is in contrast to the bio-inspired view of intelligence as an outcome, perhaps formed from a tapestry of events, cross-pollinated by instances, each with its own microcosm of experiences and learning, which may not be discrete all-or-none functions but continuous, over space and time. The enterprise world may not require, aspire or desire such an engaged solution to improve its services for enabling digital transformation through the deployment of digital twins, for example. One might ask whether the \"work-flow on steroids\" version of decision support may suffice for intelligence? Are we harking back to the era of rule based expert systems? The image conjured by the publicity machines offers deep solutions with human-level AI and preposterous claims about capturing the \"brain in a box\" by 2020. Even emulating insects may be difficult in terms of real progress. Perhaps we can try to focus on worms (Caenorhabditis elegans) which may be better suited for what business needs to quench its thirst for so-called intelligence in AI.\nThis paper summarises how the \"SP theory of intelligence\" and its realisation in the \"SP computer model\" simplifies and integrates concepts across artificial intelligence and related areas, and thus provides a promising foundation for the development of a general, human-level thinking machine, in accordance with the main goal of research in artificial general intelligence.   The key to this simplification and integration is the powerful concept of \"multiple alignment\", borrowed and adapted from bioinformatics. This concept has the potential to be the \"double helix\" of intelligence, with as much significance for human-level intelligence as has DNA for biological sciences.   
Strengths of the SP system include: versatility in the representation of diverse kinds of knowledge; versatility in aspects of intelligence (including: strengths in unsupervised learning; the processing of natural language; pattern recognition at multiple levels of abstraction that is robust in the face of errors in data; several kinds of reasoning (including: one-step `deductive' reasoning; chains of reasoning; abductive reasoning; reasoning with probabilistic networks and trees; reasoning with 'rules'; nonmonotonic reasoning and reasoning with default values; Bayesian reasoning with 'explaining away'; and more); planning; problem solving; and more); seamless integration of diverse kinds of knowledge and diverse aspects of intelligence in any combination; and potential for application in several areas (including: helping to solve nine problems with big data; helping to develop human-level intelligence in autonomous robots; serving as a database with intelligence and with versatility in the representation and integration of several forms of knowledge; serving as a vehicle for medical knowledge and as an aid to medical diagnosis; and several more).\nArtificial General Intelligence is a field of research aiming to distill the principles of intelligence that operate independently of a specific problem domain or a predefined context and utilize these principles in order to synthesize systems capable of performing any intellectual task a human being is capable of and eventually go beyond that. While \"narrow\" artificial intelligence which focuses on solving specific problems such as speech recognition, text comprehension, visual pattern recognition, robotic motion, etc. has shown quite a few impressive breakthroughs lately, understanding general intelligence remains elusive. In the paper we offer a novel theoretical approach to understanding general intelligence. We start with a brief introduction of the current conceptual approach. 
Our critique exposes a number of serious limitations that are traced back to the ontological roots of the concept of intelligence. We then propose a paradigm shift from intelligence perceived as a competence of individual agents defined in relation to an a priori given problem domain or a goal, to intelligence perceived as a formative process of self-organization by which intelligent agents are individuated. We call this process open-ended intelligence. Open-ended intelligence is developed as an abstraction of the process of cognitive development so its application can be extended to general agents and systems. We introduce and discuss three facets of the idea: the philosophical concept of individuation, sense-making and the individuation of general cognitive agents. We further show how open-ended intelligence can be framed in terms of a distributed, self-organizing network of interacting elements and how such process is scalable. The framework highlights an important relation between coordination and intelligence and a new understanding of values. We conclude with a number of questions for future research.\nOne of the main research areas in Artificial Intelligence is the coding of agents (programs) which are able to learn by themselves in any situation. This means that agents must be useful for purposes other than those they were created for, as, for example, playing chess. In this way we try to get closer to the pristine goal of Artificial Intelligence. One of the problems to decide whether an agent is really intelligent or not is the measurement of its intelligence, since there is currently no way to measure it in a reliable way. The purpose of this project is to create an interpreter that allows for the execution of several environments, including those which are generated randomly, so that an agent (a person or a program) can interact with them. 
Once the interaction between the agent and the environment is over, the interpreter will measure the intelligence of the agent according to the actions, states and rewards the agent has undergone inside the environment during the test. As a result we will be able to measure agents' intelligence in any possible environment, and to make comparisons between several agents, in order to determine which of them is the most intelligent. In order to perform the tests, the interpreter must be able to randomly generate environments that are really useful to measure agents' intelligence, since not any randomly generated environment will serve that purpose.\nToday, available methods that assess AI systems are focused on using empirical techniques to measure the performance of algorithms in some specific tasks (e.g., playing chess, solving mazes or land a helicopter). However, these methods are not appropriate if we want to evaluate the general intelligence of AI and, even less, if we compare it with human intelligence. The ANYNT project has designed a new method of evaluation that tries to assess AI systems using well known computational notions and problems which are as general as possible. This new method serves to assess general intelligence (which allows us to learn how to solve any new kind of problem we face) and not only to evaluate performance on a set of specific tasks. This method not only focuses on measuring the intelligence of algorithms, but also to assess any intelligent system (human beings, animals, AI, aliens?,...), and letting us to place their results on the same scale and, therefore, to be able to compare them. This new approach will allow us (in the future) to evaluate and compare any kind of intelligent system known or even to build/find, be it artificial or biological. 
This master thesis aims at ensuring that this new method provides consistent results when evaluating AI algorithms, this is done through the design and implementation of prototypes of universal intelligence tests and their application to different intelligent systems (AI algorithms and humans beings). From the study we analyze whether the results obtained by two different intelligent systems are properly located on the same scale and we propose changes and refinements to these prototypes in order to, in the future, being able to achieve a truly universal intelligence test.\nThe overarching problem in artificial intelligence (AI) is that we do not understand the intelligence process well enough to enable the development of adequate computational models. Much work has been done in AI over the years at lower levels, but a big part of what has been missing involves the high level, abstract, general nature of intelligence. We address this gap by developing a model for general intelligence. To accomplish this, we focus on three basic aspects of intelligence. First, we must realize the general order and nature of intelligence at a high level. Second, we must come to know what these realizations mean with respect to the overall intelligence process. Third, we must describe these realizations as clearly as possible. We propose a hierarchical model to help capture and exploit the order within intelligence. The underlying order involves patterns of signals that become organized, stored and activated in space and time. These patterns can be described using a simple, general hierarchy, with physical signals at the lowest level, information in the middle, and abstract signal representations at the top. This high level perspective provides a big picture that literally helps us see the intelligence process, thereby enabling fundamental realizations, a better understanding and clear descriptions of the intelligence process. 
The resulting model can be used to support all kinds of information processing across multiple levels of abstraction. As computer technology improves, and as cooperation increases between humans and computers, people will become more efficient and more productive in performing their information processing tasks.\nTo make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to \"buy\" arbitrary levels of skills for a system, in a way that masks the system's own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. 
Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.\nArtificial Intelligence frameworks should allow for ever more autonomous and general systems in contrast to very narrow and restricted (human pre-defined) domain systems, in analogy to how the brain works. Self-constructive Artificial Intelligence ($SCAI$) is one such possible framework. We herein propose that $SCAI$ is based on three principles of organization: self-growing, self-experimental and self-repairing. Self-growing: the ability to autonomously and incrementally construct structures and functionality as needed to solve encountered (sub)problems. Self-experimental: the ability to internally simulate, anticipate and take decisions based on these expectations. Self-repairing: the ability to autonomously re-construct a previously successful functionality or pattern of interaction lost from a possible sub-component failure (damage). To implement these principles of organization, a constructive architecture capable of evolving adaptive autonomous agents is required. We present Schema-based learning as one such architecture capable of incrementally constructing a myriad of internal models of three kinds: predictive schemas, dual (inverse models) schemas and goal schemas as they are necessary to autonomously develop increasing functionality.   We claim that artificial systems, whether in the digital or in the physical world, can benefit very much form this constructive architecture and should be organized around these principles of organization. 
To illustrate the generality of the proposed framework, we include several test cases in structural adaptive navigation in artificial intelligence systems in Paper II of this series, and resilient robot motor control in Paper III of this series. Paper IV of this series will also include $SCAI$ for problem structural discovery in predictive Business Intelligence.\nIt is very important to adhere strictly to ethical and social influences when delivering most of our life to artificial intelligence systems. With industry 4.0, the internet of things, data analysis and automation have begun to be of great importance in our lives. With the Yapanese version of Industry 5.0, it has come to our attention that machine-human interaction and human intelligence are working in harmony with the cognitive computer. In this context, robots working on artificial intelligence algorithms co-ordinated with the development of technology have begun to enter our lives. But the consequences of the recent complaints of the Robots have been that important issues have arisen about how to be followed in terms of intellectual property and ethics. Although there are no laws regulating robots in our country at present, laws on robot ethics and rights abroad have entered into force. This means that it is important that we organize the necessary arrangements in the way that robots and artificial intelligence are so important in the new world order. In this study, it was aimed to examine the existing rules of machine and robot ethics and to set an example for the arrangements to be made in our country, and various discussions were given in this context.\nMotivated by Shannon's model and recent rehabilitation of self-supervised artificial intelligence having a \"World Model\", this paper propose an unified intelligence-communication (UIC) model for describing a single agent and any multi-agent system.   Firstly, the environment is modelled as the generic communication channel between agents. 
Secondly, the UIC model adopts a learning-agent model for unifying several well-adopted agent architecture, e.g. rule-based agent model in complex adaptive systems, layered model for describing human-level intelligence, world-model based agent model. The model may also provide an unified approach to investigate a multi-agent system (MAS) having multiple action-perception modalities, e.g. explicitly information transfer and implicit information transfer.   This treatise would be divided into three parts, and this first part provides an overview of the UIC model without introducing cumbersome mathematical analysis and optimizations. In the second part of this treatise, case studies with quantitative analysis driven by the UIC model would be provided, exemplifying the adoption of the UIC model in multi-agent system. Specifically, two representative cases would be studied, namely the analysis of a natural multi-agent system, as well as the co-design of communication, perception and action in an artificial multi-agent system. In the third part of this treatise, the paper provides further insights and future research directions motivated by the UIC model, such as unification of single intelligence and collective intelligence, a possible explanation of intelligence emergence and a dual model for agent-environment intelligence hypothesis.   Notes: This paper is a Previewed Version, the extended full-version would be released after being accepted.\nThe concept of \"task\" is at the core of artificial intelligence (AI): Tasks are used for training and evaluating AI systems, which are built in order to perform and automatize tasks we deem useful. In other fields of engineering theoretical foundations allow thorough evaluation of designs by methodical manipulation of well understood parameters with a known role and importance; this allows an aeronautics engineer, for instance, to systematically assess the effects of wind speed on an airplane's performance and stability. 
No framework exists in AI that allows this kind of methodical manipulation: Performance results on the few tasks in current use (cf. board games, question-answering) cannot be easily compared, however similar or different. The issue is even more acute with respect to artificial *general* intelligence systems, which must handle unanticipated tasks whose specifics cannot be known beforehand. A *task theory* would enable addressing tasks at the *class* level, bypassing their specifics, providing the appropriate formalization and classification of tasks, environments, and their parameters, resulting in more rigorous ways of measuring, comparing, and evaluating intelligent behavior. Even modest improvements in this direction would surpass the current ad-hoc nature of machine learning and AI evaluation. Here we discuss the main elements of the argument for a task theory and present an outline of what it might look like for physical tasks.\nDeveloping a reliable parametric cost model at the conceptual stage of the project is crucial for projects managers and decision-makers. Existing methods, such as probabilistic and statistical algorithms have been developed for project cost prediction. However, these methods are unable to produce accurate results for conceptual cost prediction due to small and unstable data samples. Artificial intelligence (AI) and machine learning (ML) algorithms include numerous models and algorithms for supervised regression applications. Therefore, a comparison analysis for AI models is required to guide practitioners to the appropriate model. The study focuses on investigating twenty artificial intelligence (AI) techniques which are conducted for cost modeling such as fuzzy logic (FL) model, artificial neural networks (ANNs), multiple regression analysis (MRA), case-based reasoning (CBR), hybrid models, and ensemble methods such as scalable boosting trees (XGBoost). 
Field canals improvement projects (FCIPs) are used as an actual case study to analyze the performance of the applied ML models. Out of 20 AI techniques, the results showed that the most accurate and suitable method is XGBoost, with a Mean Absolute Percentage Error (MAPE) of 9.091% and an adjusted R2 of 0.929. Nonlinear adaptability, handling missing values and outliers, model interpretation and uncertainty have been discussed for the twenty developed AI models. Keywords: Artificial intelligence, Machine learning, ensemble methods, XGBoost, evolutionary fuzzy rules generation, Conceptual cost, and parametric cost model.\nComputational Intelligence (CI) is a sub-branch of the Artificial Intelligence paradigm focusing on the study of adaptive mechanisms to enable or facilitate intelligent behavior in complex and changing environments. There are several paradigms of CI [like artificial neural networks, evolutionary computations, swarm intelligence, artificial immune systems, fuzzy systems and many others], each of these has its origins in biological systems [biological neural systems, natural Darwinian evolution, social behavior, immune system, interactions of organisms with their environment]. Most of those paradigms evolved into separate machine learning (ML) techniques, where probabilistic methods are used in combination with CI techniques in order to effectively combine elements of learning, adaptation, evolution and Fuzzy logic to create heuristic algorithms that are, in some sense, intelligent. The current trend is to develop consensus techniques, since no single machine learning algorithm is superior to others in all possible situations. In order to overcome this problem several meta-approaches were proposed in ML focusing on the integration of results from different methods into a single prediction. 
We discuss here the Landau theory for the nonlinear equation that can describe the adaptive integration of information acquired from an ensemble of independent learning agents. The influence of each individual agent on other learners is described similarly to the social impact theory. The final decision outcome for the consensus system is calculated using majority rule in the stationary limit, yet the minority solutions can survive inside the majority population as complex intermittent clusters of opposite opinion.\nThis paper describes a novel method for building affectively intelligent human-interactive agents. The method is based on a key sociological insight that has been developed and extensively verified over the last twenty years, but has yet to make an impact in artificial intelligence. The insight is that resource-bounded humans will, by default, act to maintain affective consistency. Humans have culturally shared fundamental affective sentiments about identities, behaviours, and objects, and they act so that the transient affective sentiments created during interactions confirm the fundamental sentiments. Humans seek and create situations that confirm or are consistent with, and avoid and suppress situations that disconfirm or are inconsistent with, their culturally shared affective sentiments. This \"affect control principle\" has been shown to be a powerful predictor of human behaviour. In this paper, we present a probabilistic and decision-theoretic generalisation of this principle, and we demonstrate how it can be leveraged to build affectively intelligent artificial agents. The new model, called BayesAct, can maintain multiple hypotheses about sentiments simultaneously as a probability distribution, and can make use of an explicit utility function to make value-directed action choices. 
This allows the model to generate affectively intelligent interactions with people by learning about their identity, predicting their behaviours using the affect control principle, and taking actions that are simultaneously goal-directed and affect-sensitive. We demonstrate this generalisation with a set of simulations. We then show how our model can be used as an emotional \"plug-in\" for artificially intelligent systems that interact with humans in two different settings: an exam practice assistant (tutor) and an assistive device for persons with a cognitive disability.\nTheatrical improvisation (impro or improv) is a demanding form of live, collaborative performance. Improv is a humorous and playful artform built on an open-ended narrative structure which simultaneously celebrates effort and failure. It is thus an ideal test bed for the development and deployment of interactive artificial intelligence (AI)-based conversational agents, or artificial improvisors. This case study introduces an improv show experiment featuring human actors and artificial improvisors. We have previously developed a deep-learning-based artificial improvisor, trained on movie subtitles, that can generate plausible, context-based, lines of dialogue suitable for theatre (Mathewson and Mirowski 2017). In this work, we have employed it to control what a subset of human actors say during an improv performance. We also give human-generated lines to a different subset of performers. All lines are provided to actors with headphones and all performers are wearing headphones. This paper describes a Turing test, or imitation game, taking place in a theatre, with both the audience members and the performers left to guess who is a human and who is a machine. In order to test scientific hypotheses about the perception of humans versus machines we collect anonymous feedback from volunteer performers and audience members. 
Our results suggest that rehearsal increases proficiency and the ability to control events in the performance. That said, consistency with real-world experience is limited by the interface and the mechanisms used to perform the show. We also show that human-generated lines are shorter, more positive, and have fewer difficult words but more grammar and spelling mistakes than the lines generated by the artificial improvisor.\nWeb intelligence can be considered as a subset of Artificial Intelligence. It uses existing data on the web to produce new data, knowledge and wisdom to support decision making and new predictions for web users. Artificial Intelligence is an ever-changing and evolving field of computer science, and it is extensively used in a wide array of web-based business applications. Although it is used substantially in web-based systems in developed countries, it has not been examined whether it is being substantially used in Sri Lanka. Every Sri Lankan citizen depends on Public Service to some extent throughout his/her lifetime, and at least three times: at birth, marriage and death. To provide most of these services to its citizens, the Sri Lankan Government relies to some extent on its national web portal. This paper presents a model to evaluate web intelligence capability based on weights assigned to key functionalities with respect to web intelligence. The government websites were checked against the proposed criteria to show the potential of using web intelligence technology to provide website-based services. The result indicates that the use of web intelligence techniques openly and publicly to provide web-based services through the government web portal to its citizens is not satisfactory. 
It also indicates that the lack of use of web intelligence technologies in the public service web prevents citizens and the government from gaining most of the advantages of such technological involvement.\nThe study of arguments as abstract entities and their interaction as introduced by Dung (Artificial Intelligence 177, 1995) has become one of the most active research branches within Artificial Intelligence and Reasoning. A main issue for abstract argumentation systems is the selection of acceptable sets of arguments. Value-based argumentation, as introduced by Bench-Capon (J. Logic Comput. 13, 2003), extends Dung's framework. It takes into account the relative strength of arguments with respect to some ranking representing an audience: an argument is subjectively accepted if it is accepted with respect to some audience; it is objectively accepted if it is accepted with respect to all audiences. Deciding whether an argument is subjectively or objectively accepted, respectively, are computationally intractable problems. In fact, the problems remain intractable under structural restrictions that render the main computational problems for non-value-based argumentation systems tractable. In this paper we identify nontrivial classes of value-based argumentation systems for which the acceptance problems are polynomial-time tractable. The classes are defined by means of structural restrictions in terms of the underlying graphical structure of the value-based system. Furthermore we show that the acceptance problems are intractable for two classes of value-based systems that were conjectured to be tractable by Dunne (Artificial Intelligence 171, 2007).\nPeople who design, use, and are affected by autonomous artificially intelligent agents want to be able to \\emph{trust} such agents -- that is, to know that these agents will perform correctly, to understand the reasoning behind their actions, and to know how to use them appropriately. 
Many techniques have been devised to assess and influence human trust in artificially intelligent agents. However, these approaches are typically ad hoc, and have not been formally related to each other or to formal trust models. This paper presents a survey of \\emph{algorithmic assurances}, i.e. programmed components of agent operation that are expressly designed to calibrate user trust in artificially intelligent agents. Algorithmic assurances are first formally defined and classified from the perspective of formally modeled human-artificially intelligent agent trust relationships. Building on these definitions, a synthesis of research across communities such as machine learning, human-computer interaction, robotics, e-commerce, and others reveals that assurance algorithms naturally fall along a spectrum in terms of their impact on an agent's core functionality, with seven notable classes ranging from integral assurances (which impact an agent's core functionality) to supplemental assurances (which have no direct effect on agent performance). Common approaches within each of these classes are identified and discussed; benefits and drawbacks of different approaches are also investigated.\nEvaluation has always been a key challenge in the development of artificial intelligence (AI) based software, due to the technical complexity of the software artifact and, often, its embedding in complex sociotechnical processes. Recent advances in machine learning (ML) enabled by deep neural networks have exacerbated the challenge of evaluating such software due to the opaque nature of these ML-based artifacts. A key related issue is the (in)ability of such systems to generate useful explanations of their outputs, and we argue that the explanation and evaluation problems are closely linked. 
The paper models the elements of an ML-based AI system in the context of public sector decision (PSD) applications involving both artificial and human intelligence, and maps these elements against issues in both evaluation and explanation, showing how the two are related. We consider a number of common PSD application patterns in the light of our model, and identify a set of key issues connected to explanation and evaluation in each case. Finally, we propose multiple strategies to promote wider adoption of AI/ML technologies in PSD, where each is distinguished by a focus on different elements of our model, allowing PSD policy makers to adopt an approach that best fits their context and concerns.\nWe discuss the objectives of any endeavor in creating artificial intelligence, AI, and provide a possible alternative. Intelligence might be an unintended consequence of curiosity left to roam free, best exemplified by a frolicking infant. This suggests that our attempts at AI could have been misguided; what we actually need to strive for can be termed artificial curiosity, AC, and intelligence happens as a consequence of those efforts. For this unintentional yet welcome aftereffect to set in, a foundational list of guiding principles needs to be present. We discuss what these essential doctrines might be and why their establishment is required to form connections, possibly growing, between a knowledge store that has been built up and new pieces of information that curiosity will bring back. As more findings are acquired and more bonds are fermented, we need a way to, periodically, reduce the amount of data; in this sense, it is important to capture the critical characteristics of what has been accumulated or produce a summary of what has been gathered. We start with the intuition for this line of reasoning and formalize it with a series of models (and iterative improvements) that will be necessary to make the incubation of intelligence a reality. 
Our discussion provides conceptual modifications to the Turing Test and to Searle's Chinese room argument. We discuss the future implications for society as AI becomes an integral part of life.\nWe present an alternative methodology for the analysis of algorithms, based on the concept of expected discounted reward. This methodology naturally handles algorithms that do not always terminate, so it can (theoretically) be used with partial algorithms for undecidable problems, such as those found in artificial general intelligence (AGI) and automated theorem proving. We mention an approach to self-improving AGI enabled by this methodology.   Aug 2017 addendum: This article was originally written with multiple audiences in mind. It is really best put in the following terms. Goertzel, Hutter, Legg, and others have developed a definition of an intelligence score for a general abstract agent: expected lifetime reward in a random environment. AIXI is generally the optimal agent according to this score, but there may be reasons to analyze other agents and compare score values. If we want to use this definition of intelligence in practice, perhaps we can start by analyzing some simple agents. Common algorithms can be thought of as simple agents (the environment is the input, and the reward is based on running time), so we take the goal of applying the agent intelligence score to algorithms. That is, we want to find the IQ scores of algorithms. We can do some very simple analysis, but the real answer is that even for simple algorithms, the intelligence score is too difficult to work with in practice.\nThe biological immune system is a robust, complex, adaptive system that defends the body from foreign pathogens. It is able to categorize all cells (or molecules) within the body as self-cells or non-self cells. 
It does this with the help of a distributed task force that has the intelligence to take action from a local and also a global perspective using its network of chemical messengers for communication. There are two major branches of the immune system. The innate immune system is an unchanging mechanism that detects and destroys certain invading organisms, whilst the adaptive immune system responds to previously unknown foreign cells and builds a response to them that can remain in the body over a long period of time. This remarkable information processing biological system has caught the attention of computer science in recent years. A novel computational intelligence technique inspired by immunology, called Artificial Immune Systems, has emerged. Several concepts from the immune system have been extracted and applied to solve real-world science and engineering problems. In this tutorial, we briefly describe the immune system metaphors that are relevant to existing Artificial Immune Systems methods. We will then show illustrative real-world problems suitable for Artificial Immune Systems and give a step-by-step algorithm walkthrough for one such problem. A comparison of the Artificial Immune Systems to other well-known algorithms, areas for future work, tips & tricks and a list of resources will round this tutorial off. It should be noted that as Artificial Immune Systems is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat from time to time and from those examples given here.\nThe rapid advancement of machine learning techniques has re-energized research into general artificial intelligence. While the idea of domain-agnostic meta-learning is appealing, this emerging field must come to terms with its relationship to human cognition and the statistics and structure of the tasks humans perform. 
The position of this article is that only by aligning our agents' abilities and environments with those of humans do we stand a chance at developing general artificial intelligence (GAI). A broad reading of the famous 'No Free Lunch' theorem is that there is no universally optimal inductive bias or, equivalently, bias-free learning is impossible. This follows from the fact that there are an infinite number of ways to extrapolate data, any of which might be the one used by the data-generating environment; an inductive bias prefers some of these extrapolations to others, which lowers performance in environments using these adversarial extrapolations. We may posit that the optimal GAI is the one that maximally exploits the statistics of its environment to create its inductive bias, accepting the fact that this agent is guaranteed to be extremely sub-optimal for some alternative environments. This trade-off appears benign when thinking about the environment as being the physical universe, as performance on any fictive universe is obviously irrelevant. But we should expect a sharper inductive bias if we further constrain our environment. Indeed, we implicitly do so by defining GAI in terms of accomplishing what humans consider useful. One common version of this is the need for 'common-sense reasoning', which implicitly appeals to the statistics of the physical universe as perceived by humans.\nArtificial Intelligence (AI) - the phenomenon of machines being able to solve problems that require human intelligence - has in the past decade seen an enormous rise of interest due to significant advances in effectiveness and use. The health sector, one of the most important sectors for societies and economies worldwide, is particularly interesting for AI applications, given the ongoing digitalisation of all types of health information. The potential for AI assistance in the health domain is immense, because AI can support medical decision making at reduced costs, everywhere. 
However, due to the complexity of AI algorithms, it is difficult to distinguish good from bad AI-based solutions and to understand their strengths and weaknesses, which is crucial for clarifying responsibilities and for building trust. For this reason, the International Telecommunication Union (ITU) has established a new Focus Group on \"Artificial Intelligence for Health\" (FG-AI4H) in partnership with the World Health Organization (WHO). Health and care services are usually the responsibility of a government - even when provided through private insurance systems - and thus under the responsibility of WHO/ITU member states. FG-AI4H will identify opportunities for international standardization, which will foster the application of AI to health issues on a global scale. In particular, it will establish a standardized assessment framework with open benchmarks for the evaluation of AI-based methods for health, such as AI-based diagnosis, triage or treatment decisions.\nThe fields of artificial intelligence and neuroscience have a long history of fertile bi-directional interactions. On the one hand, important inspiration for the development of artificial intelligence systems has come from the study of natural systems of intelligence, the mammalian neocortex in particular. On the other, important inspiration for models and theories of the brain have emerged from artificial intelligence research. A central question at the intersection of these two areas is concerned with the processes by which neocortex learns, and the extent to which they are analogous to the back-propagation training algorithm of deep networks. Matching the data efficiency, transfer and generalization properties of neocortical learning remains an area of active research in the field of deep learning. 
Recent advances in our understanding of neuronal, synaptic and dendritic physiology of the neocortex suggest new approaches for unsupervised representation learning, perhaps through a new class of objective functions, which could act alongside or in lieu of back-propagation. Such local learning rules have implicit rather than explicit objectives with respect to the training data, facilitating domain adaptation and generalization. Incorporating them into deep networks for representation learning could better leverage unlabelled datasets to offer significant improvements in data efficiency of downstream supervised readout learning, and reduce susceptibility to adversarial perturbations, at the cost of a more restricted domain of applicability.\nThis article is about how the \"SP theory of intelligence\" and its realisation in the \"SP machine\" (both outlined in the article) may help to solve computer-related problems in the design of autonomous robots, meaning robots that do not depend on external intelligence or power supplies, are mobile, and are designed to exhibit as much human-like intelligence as possible. The article is about: how to increase the computational and energy efficiency of computers and reduce their bulk; how to achieve human-like versatility in intelligence; and likewise for human-like adaptability in intelligence. The SP system has potential for substantial gains in computational and energy efficiency and reductions in the bulkiness of computers: by reducing the size of data to be processed; by exploiting statistical information that the system gathers; and via an updated version of Donald Hebb's concept of a \"cell assembly\". Towards human-like versatility in intelligence, the SP system has strengths in unsupervised learning, natural language processing, pattern recognition, information retrieval, several kinds of reasoning, planning, problem solving, and more, with seamless integration amongst structures and functions. 
The SP system's strengths in unsupervised learning and other aspects of intelligence may help to achieve human-like adaptability in intelligence via: the learning of natural language; learning to see; building 3D models of objects and of a robot's surroundings; learning regularities in the workings of a robot and in the robot's environment; exploration and play; learning major skills; and secondary forms of learning. Also discussed are: how the SP system may process parallel streams of information; generalisation of knowledge, correction of over-generalisations, and learning from dirty data; how to cut the cost of learning; and reinforcements, motivations, goals, and demonstration.\nWhat would a human hundreds or thousands of times more intelligent than the brightest human ever born be like? We must admit we can hardly guess. A human being of such intelligence would be so radically different from us that it could hardly, if at all, be recognized as human. If we had to go back along the evolutionary tree to identify a creature 1000 times less intelligent than the average contemporary human, we would have to go really far back. Would it be a kind of a lizard? An insect perhaps? Considering this, how can we possibly aspire to have a grasp of something a thousand times more intelligent than us? When it comes to intelligence, even the very attempt to quantify it is highly misleading. Now if we attend to a seemingly adjacent question, what would a machine with such capacity for intelligence be like? Just coming up with an approximate metaphor requires a huge stretch of the imagination, meaning that almost anything goes... What would a society of such super-intelligent agents, be they human, machines or an amalgam of both, be like? Well, here we are transported into the realm of pure speculation. Technological Singularity refers to the event of artificial intelligence surpassing the intelligence of humans and shortly after augmenting itself far beyond that. 
It is no wonder that the mathematical concept of singularity has become the symbol of an event so disruptive and so far-reaching that it is impossible to conceptually or even metaphorically grasp, much less predict.\nHuman-Computer Interaction with the traditional User Interface is done using a script dialog menu specified in advance, mainly based on human intellect and unproductive use of navigation. This approach does not lead to qualitative decision making in control systems, where the situations and processes cannot be structured in advance. Any dynamic changes in the controlled business process (for example, in an organizational unit of the information fuzzy control system) make it necessary to modify the script dialogue in the User Interface. This circumstance leads to a redesign of the components of the User Interface and of the entire control system. In an Intelligent User Interface, where the dialog situations are unknown in advance and fuzzily structured, and where artificial intelligence is crucial, the redesign described above is impossible. To solve this and other problems, we propose the data-, information- and knowledge-based technology of Smart/Intelligent User Interface (IUI) design, which interacts with users and systems in natural and other languages, utilizing the principles of Situational Control and Fuzzy Logic theories, Artificial Intelligence, Linguistics, Knowledge Base technologies and others. 
The proposed technology of IUI design is defined by multi-agents of Situational Control and of data, information and knowledge, modelling of Fuzzy Logic Inference, Generalization, Representation and Explanation of knowledge, Planning and Decision-making, Dialog Control, Reasoning and Systems Thinking, Fuzzy Control of an organizational unit in real-time, fuzzy conditions, heterogeneous domains, and multi-lingual communication under uncertainty and in a Fuzzy Environment.\nSociety has become more dependent on automated intelligent systems; at the same time, these systems have become more and more complicated. Society's expectation regarding the capabilities and intelligence of such systems has also grown. We have become a more complicated society with more complicated problems. As the expectation of intelligent systems rises, we discover many more applications for artificial intelligence. Additionally, as the difficulty level and computational requirements of such problems rise, there is a need to distribute the problem solving. Although the field of multiagent systems (MAS) and distributed artificial intelligence (DAI) is relatively young, the importance and applicability of this technology for solving today's problems continue to grow. In multiagent systems, the main goal is to provide fruitful cooperation among agents in order to enrich the support given to all user activities. This paper deals with the development of a multiagent system aimed at solving the reservation problems encountered in rural tourism. Due to their benefits over the last few years, online travel agencies have become a very useful instrument in planning vacations. A MAS concept (which is based on exploiting the Internet) can improve this activity and provide clients with a new, rapid and efficient way of making accommodation arrangements.\nThis philosophical paper explores the relation between modern scientific simulations and the future of the universe. 
We argue that a simulation of an entire universe will result from future scientific activity. This requires us to tackle the challenge of simulating open-ended evolution at all levels in a single simulation. The simulation should encompass not only biological evolution, but also physical evolution (a level below) and cultural evolution (a level above). The simulation would allow us to probe what would happen if we would \"replay the tape of the universe\" with the same or different laws and initial conditions. We also distinguish between real-world and artificial-world modelling. Assuming that intelligent life could indeed simulate an entire universe, this leads to two tentative hypotheses. Some authors have argued that we may already be in a simulation run by an intelligent entity. Or, if such a simulation could be made real, this would lead to the production of a new universe. This last direction is argued with a careful speculative philosophical approach, emphasizing the imperative to find a solution to the heat death problem in cosmology. The reader is invited to consult Annex 1 for an overview of the logical structure of this paper. -- Keywords: far future, future of science, ALife, simulation, realization, cosmology, heat death, fine-tuning, physical eschatology, cosmological natural selection, cosmological artificial selection, artificial cosmogenesis, selfish biocosm hypothesis, meduso-anthropic principle, developmental singularity hypothesis, role of intelligent life.\nSocial intelligence in natural and artificial systems is usually measured by the evaluation of associated traits or tasks that are deemed to represent some facets of social behaviour. The amalgamation of these traits is then used to configure the intuitive notion of social intelligence. Instead, in this paper we start from a parametrised definition of social intelligence as the expected performance in a set of environments with several agents, and we assess and derive tests from it. 
This definition makes several dependencies explicit: (1) the definition depends on the choice (and weight) of environments and agents, (2) the definition may include both competitive and cooperative behaviours depending on how agents and rewards are arranged into teams, (3) the definition mostly depends on the abilities of other agents, and (4) the actual difference between social intelligence and general intelligence (or other abilities) depends on these choices. As a result, we address the problem of converting this definition into a more precise one where some fundamental properties ensuring social behaviour (such as action and reward dependency and anticipation on competitive/cooperative behaviours) are met as well as some other more instrumental properties (such as secernment, boundedness, symmetry, validity, reliability, efficiency), which are convenient to convert the definition into a practical test. From the definition and the formalised properties, we take a look at several representative multi-agent environments, tests and games to see whether they meet these properties.\nUnderstanding and using natural processes for intelligent functionalities, referred to as natural intelligence, has recently attracted interest from a variety of fields, including post-silicon computing for artificial intelligence and decision making in the behavioural sciences. In a past study, we successfully used the wave-particle duality of single photons to solve the two-armed bandit problem, which constitutes the foundation of reinforcement learning and decision making. In this study, we propose and confirm a hierarchical architecture for single-photon-based reinforcement learning and decision making that verifies the scalability of the principle. 
Specifically, the four-armed bandit problem is solved given zero prior knowledge in a two-layer hierarchical architecture, where polarization is autonomously adapted in order to effect adequate decision making using single-photon measurements. In the hierarchical structure, the notion of layer-dependent decisions emerges. The optimal solutions in the coarse layer and in the fine layer, however, conflict with each other in some contradictive problems. We show that while what we call a tournament strategy resolves such contradictions, the probabilistic nature of single photons allows for the direct location of the optimal solution even for contradictive problems, hence manifesting the exploration ability of single photons. This study provides insights into photon intelligence in hierarchical architectures for future artificial intelligence as well as the potential of natural processes for intelligent functionalities.\nGeneral game playing artificial intelligence has recently seen important advances due to the various techniques known as 'deep learning'. However, the advances conceal equally important limitations in their reliance on: massive data sets; fortuitously constructed problems; and the absence of any human-level complexity, including other human opponents. On the other hand, deep learning systems which do beat human champions, such as in Go, do not generalise well. The power of deep learning simultaneously exposes its weakness. Given that deep learning is mostly clever reconfigurations of well-established methods, moving beyond the state of the art calls for forward-thinking visionary solutions, not just more of the same. I present the argument that general game playing artificial intelligence will require a generalised player model. This is because games are inherently human artefacts which therefore, as a class of problems, contain cases which require a human-style problem-solving approach. 
I relate this argument to the performance of state of art general game playing agents. I then describe a concept for a formal category theoretic basis to a generalised player model. This formal model approach integrates my existing 'Behavlets' method for psychologically-derived player modelling:   Cowley, B., Charles, D. (2016). Behavlets: a Method for Practical Player Modelling using Psychology-Based Player Traits and Domain Specific Features. User Modeling and User-Adapted Interaction, 26(2), 257-306.\nNext-generation wireless networks must support ultra-reliable, low-latency communication and intelligently manage a massive number of Internet of Things (IoT) devices in real-time, within a highly dynamic environment. This need for stringent communication quality-of-service (QoS) requirements as well as mobile edge and core intelligence can only be realized by integrating fundamental notions of artificial intelligence (AI) and machine learning across the wireless infrastructure and end-user devices. In this context, this paper provides a comprehensive tutorial that introduces the main concepts of machine learning, in general, and artificial neural networks (ANNs), in particular, and their potential applications in wireless communications. For this purpose, we present a comprehensive overview on a number of key types of neural networks that include feed-forward, recurrent, spiking, and deep neural networks. For each type of neural network, we present the basic architecture and training procedure, as well as the associated challenges and opportunities. 
Then, we provide an in-depth overview on the variety of wireless communication problems that can be addressed using ANNs, ranging from communication using unmanned aerial vehicles to virtual reality and edge caching. For each individual application, we present the main motivation for using ANNs along with the associated challenges while also providing a detailed example for a use case scenario and outlining future works that can be addressed using ANNs. In a nutshell, this article constitutes one of the first holistic tutorials on the development of machine learning techniques tailored to the needs of future wireless networks.\nThis article presents an extensive literature review of technology based intervention methodologies for individuals facing Autism Spectrum Disorder (ASD). Reviewed methodologies include: contemporary Computer Aided Systems (CAS), Computer Vision Assisted Technologies (CVAT) and Virtual Reality (VR) or Artificial Intelligence (AI)-Assisted interventions. The research over the past decade has provided enough demonstrations that individuals with ASD have a strong interest in technology based interventions, which are useful both in clinical settings and at home and in classrooms. Despite showing great promise, research in developing an advanced technology based intervention that is clinically quantitative for ASD is minimal. Moreover, the clinicians are generally not convinced about the potential of the technology based interventions due to non-empirical nature of published results. A major reason behind this lack of acceptability is that a vast majority of studies on distinct intervention methodologies do not follow any specific standard or research design. We conclude from our findings that there remains a gap between the research community of computer science, psychology and neuroscience to develop an AI assisted intervention technology for individuals suffering from ASD. 
Following the development of a standardized AI based intervention technology, a database needs to be developed, to devise effective AI algorithms.\nThe General Data Protection Regulation (GDPR) is a European Union regulation that will replace the existing Data Protection Directive on 25 May 2018. The most significant change is a huge increase in the maximum fine that can be levied for breaches of the regulation. Yet fewer than half of UK companies are fully aware of GDPR - and a number of those who were preparing for it stopped doing so when the Brexit vote was announced. A last-minute rush to become compliant is therefore expected, and numerous companies are starting to offer advice, checklists and consultancy on how to comply with GDPR. In such an environment, artificial intelligence technologies ought to be able to assist by providing best advice; asking all and only the relevant questions; monitoring activities; and carrying out assessments. The paper considers four areas of GDPR compliance where rule based technologies and/or machine learning techniques may be relevant: * Following compliance checklists and codes of conduct; * Supporting risk assessments; * Complying with the new regulations regarding technologies that perform automatic profiling; * Complying with the new regulations concerning recognising and reporting breaches of security. It concludes that AI technology can support each of these four areas. The requirements that GDPR (or organisations that need to comply with GDPR) state for explanation and justification of reasoning imply that rule-based approaches are likely to be more helpful than machine learning approaches. However, there may be good business reasons to take a different approach in some circumstances.\nBig data, data science, deep learning, artificial intelligence are the key words of intense hype related with a job market in full evolution, that impose to adapt the contents of our university professional trainings. 
Which artificial intelligence do job offers mostly concern? Which methodologies and technologies should be favored in the training programs? Which objectives, tools and educational resources do we need to put in place to meet these pressing needs? We answer these questions by describing the contents and operational resources in the Data Science orientation of the specialty Applied Mathematics at INSA Toulouse. We focus on basic mathematics training (Optimization, Probability, Statistics), associated with the practical implementation of the best-performing statistical learning algorithms, with the most appropriate technologies and on real examples. Considering the huge volatility of the technologies, it is imperative to train students in self-training, which will be their technological watch tool once they are in professional activity. This explains the structuring of the educational site github.com/wikistat into a set of tutorials. Finally, to motivate the thorough practice of these tutorials, a serious game is organized each year in the form of a prediction contest between students of Master degrees in Applied Mathematics for AI.\nArtificial intelligence (AI) holds great promise to empower us with knowledge and augment our effectiveness. We can -- and must -- ensure that we keep humans safe and in control, particularly with regard to government and public sector applications that affect broad populations. How can AI development teams harness the power of AI systems and design them to be valuable to humans? Diverse teams are needed to build trustworthy artificial intelligent systems, and those teams need to coalesce around a shared set of ethics. There are many discussions in the AI field about ethics and trust, but there are few frameworks available for people to use as guidance when creating these systems. 
The Human-Machine Teaming (HMT) Framework for Designing Ethical AI Experiences described in this paper, when used with a set of technical ethics, will guide AI development teams to create AI systems that are accountable, de-risked, respectful, secure, honest, and usable. To support the team's efforts, activities to understand people's needs and concerns will be introduced along with the themes to support the team's efforts. For example, usability testing can help determine if the audience understands how the AI system works and complies with the HMT Framework. The HMT Framework is based on reviews of existing ethical codes and best practices in human-computer interaction and software development. Human-machine teams are strongest when human users can trust AI systems to behave as expected, safely, securely, and understandably. Using the HMT Framework to design trustworthy AI systems will provide support to teams in identifying potential issues ahead of time and making great experiences for humans.\nThe potential for machine learning to disrupt the medical profession is the subject of ongoing debate within biomedical informatics. This study aimed to explore psychiatrists' opinions about the potential impact of innovations in artificial intelligence and machine learning on psychiatric practice. In Spring 2019, we conducted a web-based survey of 791 psychiatrists from 22 countries worldwide. The survey measured opinions about the likelihood future technology would fully replace physicians in performing ten key psychiatric tasks. This study involved qualitative descriptive analysis of written response to three open-ended questions in the survey. Comments were classified into four major categories in relation to the impact of future technology on patient-psychiatric interactions, the quality of patient medical care, the profession of psychiatry, and health systems. Overwhelmingly, psychiatrists were skeptical that technology could fully replace human empathy. 
Many predicted that 'man and machine' would increasingly collaborate in undertaking clinical decisions, with mixed opinions about the benefits and harms of such an arrangement. Participants were optimistic that technology might improve efficiencies and access to care, and reduce costs. Ethical and regulatory considerations received limited attention. This study presents timely information of psychiatrists' view about the scope of artificial intelligence and machine learning on psychiatric practice. Psychiatrists expressed divergent views about the value and impact of future technology with worrying omissions about practice guidelines, and ethical and regulatory issues.\nThis paper provides an overview of the SP theory of intelligence and its central idea that artificial intelligence, mainstream computing, and much of human perception and cognition, may be understood as information compression.   The background and origins of the SP theory are described, and the main elements of the theory, including the key concept of multiple alignment, borrowed from bioinformatics but with important differences. Associated with the SP theory is the idea that redundancy in information may be understood as repetition of patterns, that compression of information may be achieved via the matching and unification (merging) of patterns, and that computing and information compression are both fundamentally probabilistic. It appears that the SP system is Turing-equivalent in the sense that anything that may be computed with a Turing machine may, in principle, also be computed with an SP machine.   One of the main strengths of the SP theory and the multiple alignment concept is in modelling concepts and phenomena in artificial intelligence. 
Within that area, the SP theory provides a simple but versatile means of representing different kinds of knowledge, it can model both the parsing and production of natural language, with potential for the understanding and translation of natural languages, it has strengths in pattern recognition, with potential in computer vision, it can model several kinds of reasoning, and it has capabilities in planning, problem solving, and unsupervised learning.   The paper includes two examples showing how alternative parsings of an ambiguous sentence may be modelled as multiple alignments, and another example showing how the concept of multiple alignment may be applied in medical diagnosis.\nDigital pathology is not only one of the most promising fields of diagnostic medicine, but at the same time a hot topic for fundamental research. Digital pathology is not just the transfer of histopathological slides into digital representations. The combination of different data sources (images, patient records, and *omics data) together with current advances in artificial intelligence/machine learning enable to make novel information accessible and quantifiable to a human expert, which is not yet available and not exploited in current medical settings. The grand goal is to reach a level of usable intelligence to understand the data in the context of an application task, thereby making machine decisions transparent, interpretable and explainable. The foundation of such an \"augmented pathologist\" needs an integrated approach: While machine learning algorithms require many thousands of training examples, a human expert is often confronted with only a few data points. Interestingly, humans can learn from such few examples and are able to instantly interpret complex patterns. Consequently, the grand goal is to combine the possibilities of artificial intelligence with human intelligence and to find a well-suited balance between them to enable what neither of them could do on their own. 
This can raise the quality of education, diagnosis, prognosis and prediction of cancer and other diseases. In this paper we describe some (incomplete) research issues which we believe should be addressed in an integrated and concerted effort for paving the way towards the augmented pathologist.\nTriggered by modern technologies, our possibilities may now expand beyond the unthinkable. Cars externally may look similar to decades ago, but a dramatic revolution happened inside the cabin as a result of their computation, communications, and storage capabilities. With the advent of Electric Autonomous Vehicles (EAVs), Artificial Intelligence and ecological technologies found the best synergy. Several transportation problems may be solved (accidents, emissions, and congestion among others), and the foundation of Machine-to-Machine (M2M) economy could be established, in addition to value-added services such as infotainment (information and entertainment).   In the world where intelligent technologies are pervading everyday life, software and algorithms play a major role. Software has been lately introduced in virtually every technological product available on the market, from phones to television sets to cars and even housing. Artificial Intelligence is one of the consequences of this pervasive presence of algorithms. The role of software is becoming dominant and technology is, at times pervasive, of our existence. Concerns, such as privacy and security, demand high attention and have been already explored to some level of detail. However, intelligent agents and actors are often considered as perfect entities that will overcome human error-prone nature. This may not always be the case and we advocate that the notion of reputation is also applicable to intelligent artificial agents, in particular to EAVs.\nThis paper presents a tentative outline for the construction of an artificial, generally intelligent system (AGI). 
It is argued that building a general data compression algorithm solving all problems up to a complexity threshold should be the main thrust of research. A measure for partial progress in AGI is suggested. Although the details are far from being clear, some general properties for a general compression algorithm are fleshed out. Its inductive bias should be flexible and adapt to the input data while constantly searching for a simple, orthogonal and complete set of hypotheses explaining the data. It should recursively reduce the size of its representations, thereby compressing the data increasingly at every iteration. Based on that fundamental ability, a grounded reasoning system is proposed. It is argued how grounding and flexible feature bases made of hypotheses allow for resourceful thinking. While the simulation of representation contents on the mental stage accounts for much of the power of propositional logic, compression leads to simple sets of hypotheses that allow the detection and verification of universally quantified statements. Together, it is highlighted how general compression and grounded reasoning could account for the birth and growth of first concepts about the world and the commonsense reasoning about them.\nArtificial Intelligence (AI) technologies could be broadly categorised into Analytics and Autonomy. Analytics focuses on algorithms offering perception, comprehension, and projection of knowledge gleaned from sensorial data. Autonomy revolves around decision making, and influencing and shaping the environment through action production. A smart autonomous system (SAS) combines analytics and autonomy to understand, learn, decide and act autonomously. To be useful, SAS must be trusted and that requires testing. Lifelong learning of a SAS compounds the testing process. 
In the remote chance that it is possible to fully test and certify the system pre-release, which is theoretically an undecidable problem, it is nearly impossible to predict the future behaviours that these systems, alone or collectively, will exhibit. While it may be feasible to severely restrict such systems' learning abilities to limit the potential unpredictability of their behaviours, an undesirable consequence may be severely limiting their utility. In this paper, we propose the architecture for a watchdog AI (WAI) agent dedicated to lifelong functional testing of SAS. We further propose system specifications including a level of abstraction whereby humans shepherd a swarm of WAI agents to oversee an ecosystem made of humans and SAS. The discussion extends to the challenges, pros, and cons of the proposed concept.\nOne of the common artificial intelligence applications in electronic games consists of making an artificial agent learn how to execute some determined task successfully in a game environment. One way to perform this task is through machine learning algorithms capable of learning the sequence of actions required to win in a given game environment. There are several supervised learning techniques able to learn the correct answer for a problem through examples. However, when learning how to play electronic games, the correct answer might only be known by the end of the game, after all the actions were already taken. Thus, it is not possible to measure the accuracy of each individual action taken at each time step. A way for dealing with this problem is through Neuroevolution, a method which trains Artificial Neural Networks using evolutionary algorithms. In this article, we introduce a framework for testing optimization algorithms with artificial agent controllers in electronic games, called EvoMan, which is inspired by the action-platformer game Mega Man II. 
The environment can be configured to run in different experiment modes, such as single evolution, coevolution and others. To demonstrate some challenges regarding the proposed platform, as initial experiments we applied Neuroevolution using Genetic Algorithms and the NEAT algorithm, in the context of competitively coevolving two distinct agents in this game.\nToday, it is increasingly necessary for most applications and documents developed with previous or current technologies to be accessible online on cloud-based infrastructures. That is why the migration of legacy systems including their hosts of documents to new technologies and online infrastructures, using modern Artificial Intelligence techniques, is absolutely necessary. With the advancement of Artificial Intelligence and Deep Learning with its multitude of applications, a new area of research is emerging - that of automated systems development and maintenance. The underlying work objective that led to this paper aims to research and develop truly intelligent systems able to analyze user interfaces from various sources and generate real and usable inferences ranging from architecture analysis to actual code generation. One key element of such systems is that of artificial scene detection and analysis based on deep learning computer vision systems. Computer vision models and particularly deep directed acyclic graphs based on convolutional modules are generally constructed and trained based on natural image datasets. Due to this fact, the models will develop during the training process natural image feature detectors apart from the base graph modules that will learn basic primitive features. In the current paper, we will present the base principles of a deep neural pipeline for computer vision applied to artificial scenes (scenes generated by user interfaces or similar). 
Finally, we will present the conclusions based on experimental development and benchmarking against state-of-the-art transfer-learning implemented deep vision models.\nWhat is the nature of curiosity? Is there any scientific way to understand the origin of this mysterious force that drives the behavior of even the stupidest naturally intelligent systems and is completely absent in their smartest artificial analogs? Can we build AI systems that could be curious about something, systems that would have an intrinsic motivation to learn? Is such a motivation quantifiable? Is it implementable? I will discuss this problem from the standpoint of physics. The relationship between physics and intelligence is a consequence of the fact that correctly predicted information is nothing but an energy resource, and the process of thinking can be viewed as a process of accumulating and spending this resource through the acts of perception and, respectively, decision making. The natural motivation of any autonomous system to keep this accumulation/spending balance as high as possible allows one to treat the problem of describing the dynamics of thinking processes as a resource optimization problem. Here I will propose and discuss a simple theoretical model of such an autonomous system which I call the Autonomous Turing Machine (ATM). The potential attractiveness of ATM lies in the fact that it is the model of a self-propelled AI for which the only available energy resource is the information itself. For ATM, the problem of optimal thinking, learning, and decision-making becomes conceptually simple and mathematically well tractable. This circumstance makes the ATM an ideal playground for studying the dynamics of intelligent behavior and allows one to quantify many seemingly unquantifiable features of genuine intelligence.\nIntelligent Transportation Systems (ITS) have attracted the attention of researchers and the general public alike as a means to alleviate traffic congestion. 
Recently, the maturity of wireless technology has enabled a cost-efficient way to achieve ITS by detecting vehicles using Vehicle to Infrastructure (V2I) communications. Traditional ITS algorithms, in most cases, assume that every vehicle is observed, such as by a camera or a loop detector, but a V2I implementation would detect only those vehicles with wireless communications capability. We examine a family of transportation systems, which we will refer to as `Partially Detected Intelligent Transportation Systems'. An algorithm that can act well under a small detection rate is highly desirable due to gradual penetration rates of the underlying wireless technologies such as Dedicated Short Range Communications (DSRC) technology. Artificial Intelligence (AI) techniques for Reinforcement Learning (RL) are suitable tools for finding such an algorithm due to utilizing varied inputs and not requiring explicit analytic understanding or modeling of the underlying system dynamics. In this paper, we report a RL algorithm for partially observable ITS based on DSRC. The performance of this system is studied under different car flows, detection rates, and topologies of the road network. Our system is able to efficiently reduce the average waiting time of vehicles at an intersection, even with a low detection rate.\nIn the coming years, the future of military combat will include, on one hand, artificial intelligence-optimized complex command, control, communications, computers, intelligence, surveillance and reconnaissance (C4ISR) and networks and, on the other hand, autonomous intelligent Things fighting autonomous intelligent Things at a fast pace. Under this perspective, enemy forces will seek to disable or disturb our autonomous Things and our complex infrastructures and systems. Autonomy, scale and complexity in our defense systems will trigger new cyber-attack strategies, and autonomous intelligent malware (AIM) will be part of the picture. 
Should these cyber-attacks succeed while human operators remain unaware or unable to react fast enough due to the speed, scale or complexity of the mission, systems or attacks, missions would fail, our networks and C4ISR would be heavily disrupted, and command and control would be disabled. New cyber-defense doctrines and technologies are therefore required. Autonomous cyber defense (ACyD) is a new field of research and technology driven by the defense sector in anticipation of such threats to future military infrastructures, systems and operations. It will be implemented via swarms of autonomous intelligent cyber-defense agents (AICAs) that will fight AIM within our networks and systems. This paper presents this cyber-defense technology of the future, the current state of the art in this field and its main challenges. First, we review the rationale of the ACyD concept and its associated AICA technology. Then, we present the current research results from NATO's IST-152 Research Task Group on the AICA Reference Architecture. We then develop the 12 main technological challenges that must be resolved in the coming years, besides ethical and political issues.\nThe influence of Artificial Intelligence (AI) and Artificial Life (ALife) technologies upon society, and their potential to fundamentally shape the future evolution of humankind, are topics very much at the forefront of current scientific, governmental and public debate. While these might seem like very modern concerns, they have a long history that is often disregarded in contemporary discourse. Insofar as current debates do acknowledge the history of these ideas, they rarely look back further than the origin of the modern digital computer age in the 1940s-50s. In this paper we explore the earlier history of these concepts. We focus in particular on the idea of self-reproducing and evolving machines, and potential implications for our own species. 
We show that discussion of these topics arose in the 1860s, within a decade of the publication of Darwin's The Origin of Species, and attracted increasing interest from scientists, novelists and the general public in the early 1900s. After introducing the relevant work from this period, we categorise the various visions presented by these authors of the future implications of evolving machines for humanity. We suggest that current debates on the co-evolution of society and technology can be enriched by a proper appreciation of the long history of the ideas involved.\nAutonomous lifelong development and learning is a fundamental capability of humans, differentiating them from current deep learning systems. However, other branches of artificial intelligence have designed crucial ingredients towards autonomous learning: curiosity and intrinsic motivation, social learning and natural interaction with peers, and embodiment. These mechanisms guide exploration and autonomous choice of goals, and integrating them with deep learning opens stimulating perspectives. Deep learning (DL) approaches made great advances in artificial intelligence, but are still far away from human learning. As argued convincingly by Lake et al., differences include human capabilities to learn causal models of the world from very little data, leveraging compositional representations and priors like intuitive physics and psychology. However, there are other fundamental differences between current DL systems and human learning, as well as technical ingredients to fill this gap, that are either superficially, or not adequately, discussed by Lake et al. These fundamental mechanisms relate to autonomous development and learning. They are bound to play a central role in artificial intelligence in the future. Current DL systems require engineers to manually specify a task-specific objective function for every new task, and learn through off-line processing of large training databases. 
On the contrary, humans learn autonomously open-ended repertoires of skills, deciding for themselves which goals to pursue or value, and which skills to explore, driven by intrinsic motivation/curiosity and social learning through natural interaction with peers. Such learning processes are incremental, online, and progressive. Human child development involves a progressive increase of complexity in a curriculum of learning where skills are explored, acquired, and built on each other, through particular ordering and timing. Finally, human learning happens in the physical world, and through bodily and physical experimentation, under severe constraints on energy, time, and computational resources. In the two last decades, the field of Developmental and Cognitive Robotics (Cangelosi and Schlesinger, 2015, Asada et al., 2009), in strong interaction with developmental psychology and neuroscience, has achieved significant advances in computational\nWe present a response to the 2018 Request for Information (RFI) from the NITRD, NCO, NSF regarding the \"Update to the 2016 National Artificial Intelligence Research and Development Strategic Plan.\" Through this document, we provide a response to the question of whether and how the National Artificial Intelligence Research and Development Strategic Plan (NAIRDSP) should be updated from the perspective of Fermilab, America's premier national laboratory for High Energy Physics (HEP). We believe the NAIRDSP should be extended in light of the rapid pace of development and innovation in the field of Artificial Intelligence (AI) since 2016, and present our recommendations below. AI has profoundly impacted many areas of human life, promising to dramatically reshape society --- e.g., economy, education, science --- in the coming years. We are still early in this process. It is critical to invest now in this technology to ensure it is safe and deployed ethically. 
Science and society both have a strong need for accuracy, efficiency, transparency, and accountability in algorithms, making investments in scientific AI particularly valuable. Thus far the US has been a leader in AI technologies, and we believe as a national Laboratory it is crucial to help maintain and extend this leadership. Moreover, investments in AI will be important for maintaining US leadership in the physical sciences.\nQuantum information technologies, and intelligent learning systems, are both emergent technologies that will likely have a transforming impact on our society. The respective underlying fields of research -- quantum information (QI) versus machine learning (ML) and artificial intelligence (AI) -- have their own specific challenges, which have hitherto been investigated largely independently. However, in a growing body of recent work, researchers have been probing the question to what extent these fields can learn and benefit from each other. QML explores the interaction between quantum computing and ML, investigating how results and techniques from one field can be used to solve the problems of the other. Recently, we have witnessed breakthroughs in both directions of influence. For instance, quantum computing is finding a vital application in providing speed-ups in ML, critical in our \"big data\" world. Conversely, ML already permeates cutting-edge technologies, and may become instrumental in advanced quantum technologies. Aside from quantum speed-up in data analysis, or classical ML optimization used in quantum experiments, quantum enhancements have also been demonstrated for interactive learning, highlighting the potential of quantum-enhanced learning agents. Finally, works exploring the use of AI for the very design of quantum experiments, and for performing parts of genuine research autonomously, have reported their first successes. 
Beyond the topics of mutual enhancement, researchers have also broached the fundamental issue of quantum generalizations of ML/AI concepts. This deals with questions of the very meaning of learning and intelligence in a world that is described by quantum mechanics. In this review, we describe the main ideas, recent developments, and progress in a broad spectrum of research investigating machine learning and artificial intelligence in the quantum domain.\nSince its inception, artificial intelligence has relied upon a theoretical foundation centered around perfect rationality as the desired property of intelligent systems. We argue, as others have done, that this foundation is inadequate because it imposes fundamentally unsatisfiable requirements. As a result, there has arisen a wide gap between theory and practice in AI, hindering progress in the field. We propose instead a property called bounded optimality. Roughly speaking, an agent is bounded-optimal if its program is a solution to the constrained optimization problem presented by its architecture and the task environment. We show how to construct agents with this property for a simple class of machine architectures in a broad class of real-time environments. We illustrate these results using a simple model of an automated mail sorting facility. We also define a weaker property, asymptotic bounded optimality (ABO), that generalizes the notion of optimality in classical complexity theory. We then construct universal ABO programs, i.e., programs that are ABO no matter what real-time constraints are applied. Universal ABO programs can be used as building blocks for more complex systems. We conclude with a discussion of the prospects for bounded optimality as a theoretical basis for AI, and relate it to similar trends in philosophy, economics, and game theory.\nCognitive radio networks (CRNs) are networks of nodes equipped with cognitive radios that can optimize performance by adapting to network conditions. 
While cognitive radio networks (CRNs) are envisioned as intelligent networks, relatively little research has focused on the network-level functionality of CRNs. Although various routing protocols, incorporating varying degrees of adaptiveness, have been proposed for CRNs, it is imperative for the long-term success of CRNs that the research community pursue the design of cognitive routing protocols. Cognitive routing protocols are envisioned as routing protocols that fully and seamlessly incorporate AI-based techniques into their design. In this paper, we provide a self-contained tutorial on various AI and machine-learning techniques that have been, or can be, used for developing cognitive routing protocols. We also survey the application of various classes of AI techniques to CRNs in general, and to the problem of routing in particular. We discuss various decision-making and learning techniques from AI and document their current and potential applications to the problem of routing in CRNs. We also highlight the various inference, reasoning, modeling, and learning subtasks that a cognitive routing protocol must solve. Finally, open research issues and future directions of work are identified.\nWe construct a complexity-based morphospace to study systems-level properties of conscious & intelligent systems. The axes of this space label three complexity types: autonomous, cognitive & social. Given recent proposals to synthesize consciousness, a generic complexity-based conceptualization provides a useful framework for identifying defining features of conscious & synthetic systems. Based on current clinical scales of consciousness that measure cognitive awareness and wakefulness, we examine how contemporary artificially intelligent machines & synthetically engineered life forms measure up on these scales. It turns out that awareness & wakefulness can be associated with computational & autonomous complexity, respectively. 
Subsequently, building on insights from cognitive robotics, we examine the function that consciousness serves, & argue for the role of consciousness as an evolutionary game-theoretic strategy. This makes the case for a third type of complexity for describing consciousness: social complexity. Having identified these complexity types, we can represent both biological & synthetic systems in a common morphospace. A consequence of this classification is a taxonomy of possible conscious machines. We identify four types of consciousness, based on embodiment: (i) biological consciousness, (ii) synthetic consciousness, (iii) group consciousness (resulting from group interactions), & (iv) simulated consciousness (embodied by virtual agents within a simulated reality). This taxonomy helps in the investigation of comparative signatures of consciousness across domains, in order to highlight design principles necessary to engineer conscious machines. This is particularly relevant in light of recent developments at the crossroads of cognitive neuroscience, biomedical engineering, artificial intelligence & biomimetics.\nWe propose Cognitive Databases, an approach for transparently enabling Artificial Intelligence (AI) capabilities in relational databases. A novel aspect of our design is to first view the structured data source as meaningful unstructured text, and then use the text to build an unsupervised neural network model using a Natural Language Processing (NLP) technique called word embedding. This model captures the hidden inter-/intra-column relationships between database tokens of different types. For each database token, the model includes a vector that encodes contextual semantic relationships. We seamlessly integrate the word embedding model into existing SQL query infrastructure and use it to enable a new class of SQL-based analytics queries called cognitive intelligence (CI) queries. 
CI queries use the model vectors to enable complex queries such as semantic matching, inductive reasoning queries such as analogies, predictive queries using entities not present in a database, and, more generally, using knowledge from external sources. We demonstrate unique capabilities of Cognitive Databases using an Apache Spark based prototype to execute inductive reasoning CI queries over a multi-modal database containing text and images. We believe our first-of-a-kind system exemplifies using AI functionality to endow relational databases with capabilities that were previously very hard to realize in practice.\nThe goal of creating Artificial General Intelligence (AGI) -- or in other words of creating Turing machines (modern computers) that can behave in a way that mimics human intelligence -- has occupied AI researchers ever since the idea of AI was first proposed. One common theme in these discussions is the thesis that the ability of a machine to conduct convincing dialogues with human beings can serve as at least a sufficient criterion of AGI. We argue that this very ability should be accepted also as a necessary condition of AGI, and we provide a description of the nature of human dialogue in particular and of human language in general against this background. We then argue that it is for mathematical reasons impossible to program a machine in such a way that it could master human dialogue behaviour in its full generality. This is (1) because there are no traditional explicitly designed mathematical models that could be used as a starting point for creating such programs; and (2) because even the sorts of automated models generated by using machine learning, which have been used successfully in areas such as machine translation, cannot be extended to cope with human dialogue. If this is so, then we can conclude that a Turing machine also cannot possess AGI, because it fails to fulfil a necessary condition thereof. 
At the same time, however, we acknowledge the potential of Turing machines to master dialogue behaviour in highly restricted contexts, where what is called \"narrow\" AI can still be of considerable utility.\nRecent developments in machine-learning algorithms have led to impressive performance increases in many traditional application scenarios of artificial intelligence research. In the area of deep reinforcement learning, deep learning functional architectures are combined with incremental learning schemes for sequential tasks that include interaction-based but often delayed feedback. Despite their impressive successes, modern machine-learning approaches, including deep reinforcement learning, still perform weakly when compared to flexibly adaptive biological systems in certain naturally occurring scenarios. Such scenarios include transfers to environments different from the ones in which the training took place or environments that dynamically change, both of which are often mastered by biological systems through a capability that we here term \"fluid adaptivity\" to contrast it with the much slower adaptivity (\"crystallized adaptivity\") of the prior learning from which the behavior emerged. In this article, we derive and discuss research strategies, based on analyses of fluid adaptivity in biological systems and its neuronal modeling, that might aid in equipping future artificially intelligent systems with capabilities of fluid adaptivity more similar to those seen in some biologically intelligent systems. A key component of this research strategy is the dynamization of the problem space itself and the implementation of this dynamization by suitably designed flexibly interacting modules.\nThe biological immune system is a robust, complex, adaptive system that defends the body from foreign pathogens. It is able to categorize all cells (or molecules) within the body as self-cells or non-self cells. 
It does this with the help of a distributed task force that has the intelligence to take action from a local and also a global perspective using its network of chemical messengers for communication. There are two major branches of the immune system. The innate immune system is an unchanging mechanism that detects and destroys certain invading organisms, whilst the adaptive immune system responds to previously unknown foreign cells and builds a response to them that can remain in the body over a long period of time. This remarkable biological information-processing system has caught the attention of computer scientists in recent years. A novel computational intelligence technique inspired by immunology, called Artificial Immune Systems, has emerged. Several concepts from the immune system have been extracted and applied to solve real-world science and engineering problems. In this tutorial, we briefly describe the immune system metaphors that are relevant to existing Artificial Immune Systems methods. We will then show illustrative real-world problems suitable for Artificial Immune Systems and give a step-by-step algorithm walkthrough for one such problem. A comparison of Artificial Immune Systems to other well-known algorithms, areas for future work, tips & tricks and a list of resources will round this tutorial off. It should be noted that, as Artificial Immune Systems is still a young and evolving field, there is not yet a fixed algorithm template, and hence actual implementations might differ somewhat from time to time and from the examples given here.\nMany of the artificial intelligence techniques developed to date rely on heuristic search through large spaces. Unfortunately, the size of these spaces and the corresponding computational effort reduce the applicability of otherwise novel and effective algorithms. A number of parallel and distributed approaches to search have considerably improved the performance of the search process. 
Our goal is to develop an architecture that automatically selects parallel search strategies for optimal performance on a variety of search problems. In this paper we describe one such architecture realized in the Eureka system, which combines the benefits of many different approaches to parallel heuristic search. Through empirical and theoretical analyses we observe that features of the problem space directly affect the choice of optimal parallel search strategy. We then employ machine learning techniques to select the optimal parallel search strategy for a given problem space. When a new search task is input to the system, Eureka uses features describing the search space and the chosen architecture to automatically select the appropriate search strategy. Eureka has been tested on a MIMD parallel processor, a distributed network of workstations, and a single workstation using multithreading. Results generated from fifteen-puzzle problems, robot arm motion problems, artificial search spaces, and planning problems indicate that Eureka outperforms any of the tested strategies used exclusively for all problem instances and is able to greatly reduce the search time for these applications.\nBy introducing elements of information mining into tax analysis, using data mining software and advanced computational concepts from artificial intelligence, this work addresses the problem of tax evaders' crimes against public property. Through an empirical approach based on a hypothetical use case, induction algorithms, neural networks and Bayesian networks are applied to determine the feasibility of their heuristic application by the public tax administrator. Different strategies are explored to facilitate the work of local and regional federal tax inspectors, taking into account their limited computational capabilities; the strategies are equally effective for social scientists committed to handcrafting tax research.   
\nHutchinson, Lo and Poggio raised the question of whether learning networks can learn the Black-Scholes formula, and they proposed a network mapping the ratio of underlying price to strike $S_t/K$ and the time to maturity $\\tau$ directly to the ratio of option price to strike $C_t/K$. In this paper we propose a novel decision function and study a network mapping $S_t/K$ and $\\tau$ to the ratio of time value to strike $V_t/K$. The appearance of time value in artificial intelligence fits traders' natural intelligence. Empirical experiments are carried out to demonstrate that this significantly improves Hutchinson, Lo and Poggio's original model, with faster learning and better generalization performance. In order to take a conceptual viewpoint and to prove that $V_t/K$, but not $C_t/K$, can be approximated by superpositions of logistic functions on its domain of definition, we develop the theory of universal approximation on unbounded domains. 
We prove some general results which imply that an artificial neural network with a single hidden layer and sigmoid activation represents no function in $L^{p}(\\mathbb{R}^2 \\times [0, 1]^{n})$ unless it is identically zero, and that an artificial neural network with a single hidden layer and logistic activation is a universal approximator of $L^{2}(\\mathbb{R} \\times [0, 1]^{n})$. Our work partially generalizes Cybenko's fundamental universal approximation theorem on the unit hypercube $[0, 1]^{n}$.\nThis article investigates a turn-based game played on four computers connected via a network. Three of the computers are operated by natural intelligence (human players) and one by artificial intelligence. Each player's monitor shows the game table from that player's own viewpoint. The domino pieces are three-dimensional. The TCP/IP protocol is used for the distributed system, and Microsoft XNA technology is applied to render the 3D images. Domino 101 is a nondeterministic game; that is, the result of the game depends on the initial random distribution of the pieces. The number of distributions is equal to the product of the following combinations: . Moreover, in this four-player game, the players are divided into 2 pairs. Accordingly, we cannot predict whether a player will play according to his/her partner's dominoes or according to his/her own. The fact that a natural-intelligence player can be at any level also affects the outcome. These factors make it difficult to develop an AI. In the article, four levels of AI are developed. The first-level AI is equivalent to the intelligence of a child who knows the rules of the game and recognizes the numbers: it plays a domino if it has a suitable one, and otherwise says pass. In most of the games that can be played on the internet, the AI does the same. 
The AI at the last level, however, is a master player, and it can adapt itself to its competitors' levels.\nMycotoxin contamination in certain agricultural systems has been a serious concern for human and animal health. Mycotoxins are toxic substances produced mostly as secondary metabolites by fungi that grow on seeds and feed in the field or in storage. The food-borne mycotoxins likely to be of greatest significance for human health in tropical developing countries are aflatoxins and fumonisins. Chili pepper is also prone to aflatoxin contamination during harvesting, production and storage. Various methods used to detect mycotoxins give accurate results, but they are slow, expensive and destructive. A destructive method tests a material in a way that degrades the sample under investigation, whereas non-destructive testing allows the part to be used for its intended purpose after testing. Ultrasonic methods, multispectral image processing methods, terahertz methods, X-ray and thermography have been very popular for non-destructive testing, characterization of materials, and health monitoring. Image processing methods are used to improve the visual quality of pictures and to extract useful information from them. In this proposed work, chili pepper samples will be collected, and their X-ray and multispectral images will be processed using image processing methods. The term \"Computational Intelligence\" refers to the simulation of human intelligence on computers; it is also called the \"Artificial Intelligence\" (AI) approach. The techniques used in the AI approach are neural networks, fuzzy logic and evolutionary computation. 
Finally, the computational intelligence method will be used in addition to image processing to provide accurate, high-performance results for detecting the mycotoxin level in the collected samples.\nThis article describes existing and expected benefits of the \"SP theory of intelligence\", and some potential applications. The theory aims to simplify and integrate ideas across artificial intelligence, mainstream computing, and human perception and cognition, with information compression as a unifying theme. It combines conceptual simplicity with descriptive and explanatory power across several areas of computing and cognition. In the \"SP machine\" -- an expression of the SP theory which is currently realized in the form of a computer model -- there is potential for an overall simplification of computing systems, including software. The SP theory promises deeper insights and better solutions in several areas of application including, most notably, unsupervised learning, natural language processing, autonomous robots, computer vision, intelligent databases, software engineering, information compression, medical diagnosis and big data. There is also potential in areas such as the semantic web, bioinformatics, structuring of documents, the detection of computer viruses, data fusion, new kinds of computer, and the development of scientific theories. The theory promises seamless integration of structures and functions within and between different areas of application. The potential value, worldwide, of these benefits and applications is at least $190 billion each year. Further development would be facilitated by the creation of a highly parallel, open-source version of the SP machine, available to researchers everywhere.\nNowadays, driven by deep learning techniques, the field of machine learning is experiencing unprecedented prosperity, and its influence is evident in academia, industry and civil society. 
\"Intelligent\" has become a label that cannot be neglected for most applications; celebrities and scientists have also warned that the development of full artificial intelligence may spell the end of the human race. It seems that the answer to building a computer system that can automatically improve with experience is just around the corner. For AI and machine learning researchers, however, there is a consensus that we are nowhere near the core technique that could bring the Terminator, Number 5 or R2D2 to life, and there is not even a formal definition of what intelligence is, or of one of its basic properties: learning. Therefore, even though researchers know that these concerns are currently unnecessary, there is no general explanation of why they are unnecessary, or of which properties would make them necessary. In this paper, starting from an analysis of the relation between information and its representation, we propose a necessary condition for a model to be a learning model. This condition and related future work could be used to verify whether a system is able to learn, and to enrich our understanding of learning, one important property of intelligence.\nReinforcement learning is a general and powerful framework with which to study and implement artificial intelligence. Recent advances in deep learning have enabled RL algorithms to achieve impressive performance in restricted domains such as playing Atari video games (Mnih et al., 2015) and, recently, the board game Go (Silver et al., 2016). However, we are still far from constructing a generally intelligent agent. Many of the obstacles and open questions are conceptual: What does it mean to be intelligent? How does one explore and learn optimally in general, unknown environments? What, in fact, does it mean to be optimal in the general sense? 
The universal Bayesian agent AIXI (Hutter, 2005) is a model of a maximally intelligent agent, and plays a central role in the sub-field of general reinforcement learning (GRL). Recently, AIXI has been shown to be flawed in important ways; it doesn't explore enough to be asymptotically optimal (Orseau, 2010), and it can perform poorly with certain priors (Leike and Hutter, 2015). Several variants of AIXI have been proposed to attempt to address these shortfalls: among them are entropy-seeking agents (Orseau, 2011), knowledge-seeking agents (Orseau et al., 2013), Bayes with bursts of exploration (Lattimore, 2013), MDL agents (Leike, 2016a), Thompson sampling (Leike et al., 2016), and optimism (Sunehag and Hutter, 2015). We present AIXIjs, a JavaScript implementation of these GRL agents. This implementation is accompanied by a framework for running experiments against various environments, similar to OpenAI Gym (Brockman et al., 2016), and a suite of interactive demos that explore different properties of the agents, similar to REINFORCEjs (Karpathy, 2015). We use AIXIjs to present numerous experiments illustrating fundamental properties of, and differences between, these agents.\nDeep learning has revolutionized speech recognition, image recognition, and natural language processing since 2010, each involving a single modality in the input signal. However, many applications in artificial intelligence involve more than one modality. It is therefore of broad interest to study the more difficult and complex problem of modeling and learning across multiple modalities. In this paper, a technical review of the models and learning methods for multimodal intelligence is provided. The main focus is the combination of vision and natural language, which has become an important area in both computer vision and natural language processing research communities. 
This review provides a comprehensive analysis of recent work on multimodal deep learning from three new angles: learning multimodal representations, the fusion of multimodal signals at various levels, and multimodal applications. On multimodal representation learning, we review the key concept of embedding, which unifies the multimodal signals into the same vector space and thus enables cross-modality signal processing. We also review the properties of the many types of embedding constructed and learned for general downstream tasks. On multimodal fusion, this review focuses on special architectures for the integration of the representation of unimodal signals for a particular task. On applications, selected areas of broad interest in the current literature are covered, including caption generation, text-to-image generation, and visual question answering. We believe this review can facilitate future studies in the emerging field of multimodal intelligence for the community.\nThe Turing Test (TT) checks for human intelligence, rather than any putative general intelligence. It involves repeated interaction requiring learning in the form of adaption to the human conversation partner. It is a macro-level post-hoc test in contrast to the definition of a Turing Machine (TM), which is a prior micro-level definition. This raises the question of whether learning is just another computational process, i.e., whether it can be implemented as a TM. Here we argue that learning or adaption is fundamentally different from computation, though it does involve processes that can be seen as computations. To illustrate this difference we compare (a) designing a TM and (b) learning a TM, defining them for the purpose of the argument. We show that there is a well-defined sequence of problems which are not effectively designable but are learnable, in the form of the bounded halting problem. 
Some characteristics of human intelligence are reviewed, including its interactive nature, learning abilities, imitative tendencies, linguistic ability and context-dependency. A story that explains some of these is the Social Intelligence Hypothesis. If this is broadly correct, it points to the necessity of a considerable period of acculturation (social learning in context) if an artificial intelligence is to pass the TT. Whilst it is always possible to 'compile' the results of learning into a TM, this would not be a designed TM and would not be able to continually adapt (pass future TTs). We conclude three things, namely that: a purely \"designed\" TM will never pass the TT; that there is no such thing as a general intelligence since it necessarily involves learning; and that learning/adaption and computation should be clearly distinguished.\nReinforcement learning (RL) is an area of research that has blossomed tremendously in recent years and has shown remarkable potential for artificial-intelligence-based opponents in computer games. This success is primarily due to the vast capabilities of convolutional neural networks, which can extract useful features from noisy and complex data. Games are excellent tools to test and push the boundaries of novel RL algorithms because they give valuable insight into how well an algorithm can perform in isolated environments without real-life consequences. Real-time strategy (RTS) is a genre that has tremendous complexity and challenges the player in short- and long-term planning. There is much research that focuses on applied RL in RTS games, and novel advances are therefore anticipated in the not too distant future. However, there are to date few environments for testing RTS AIs. Environments in the literature are often either overly simplistic, such as microRTS, or complex and without the possibility of accelerated learning on consumer hardware, like StarCraft II. 
This paper introduces the Deep RTS game environment for testing cutting-edge artificial intelligence algorithms for RTS games. Deep RTS is a high-performance RTS game made specifically for artificial intelligence research. It supports accelerated learning, meaning that learning can proceed on the order of 50,000 times faster than in existing RTS games. Deep RTS has a flexible configuration, enabling research in several different RTS scenarios, including partially observable state spaces and map complexity. We show that Deep RTS lives up to our promises by comparing its performance with microRTS, ELF, and StarCraft II on high-end consumer hardware. Using Deep RTS, we show that a Deep Q-Network agent beats random-play agents over 70% of the time. Deep RTS is publicly available at https://github.com/cair/DeepRTS.\nA system with artificial intelligence usually relies on symbol manipulation, at least partly and implicitly. However, the interpretation of the symbols - what they represent and what they are about - is ultimately left to humans, as designers and users of the system. How symbols can acquire meaning for the system itself, independent of external interpretation, is an unsolved problem. Some grounding of symbols can be obtained by embodiment, that is, by causally connecting symbols (or sub-symbolic variables) to the physical environment, such as in a robot with sensors and effectors. However, a causal connection as such does not produce representation and aboutness of the kind that symbols have for humans. Here I present a theory that explains how humans and other living organisms have acquired the capability to have symbols and sub-symbolic variables that represent, refer to, and are about something else. The theory shows how reference can be to physical objects, but also to abstract objects, and even how it can be misguided (errors in reference) or be about non-existing objects. 
I subsequently abstract the primary components of the theory from their biological context, and discuss how and under what conditions the theory could be implemented in artificial agents. A major component of the theory is the strong nonlinearity associated with (potentially unlimited) self-reproduction. The latter is likely not acceptable in artificial systems. It remains unclear if goals other than those inherently serving self-reproduction can have aboutness and if such goals could be stabilized.\nThe field of machine ethics is concerned with the question of how to embed ethical behaviors, or a means to determine ethical behaviors, into artificial intelligence (AI) systems. The goal is to produce artificial moral agents (AMAs) that are either implicitly ethical (designed to avoid unethical consequences) or explicitly ethical (designed to behave ethically). Van Wynsberghe and Robbins' (2018) paper Critiquing the Reasons for Making Artificial Moral Agents critically addresses the reasons offered by machine ethicists for pursuing AMA research; this paper, co-authored by machine ethicists and commentators, aims to contribute to the machine ethics conversation by responding to that critique. The reasons for developing AMAs discussed in van Wynsberghe and Robbins (2018) are: it is inevitable that they will be developed; the prevention of harm; the necessity for public trust; the prevention of immoral use; such machines are better moral reasoners than humans, and building these machines would lead to a better understanding of human morality. In this paper, each co-author addresses those reasons in turn. In so doing, this paper demonstrates that the reasons critiqued are not shared by all co-authors; each machine ethicist has their own reasons for researching AMAs. 
But while we express a diverse range of views on each of the six reasons in van Wynsberghe and Robbins' critique, we nevertheless share the opinion that the scientific study of AMAs has considerable value.\nA recurring topic in interstellar exploration and the search for extraterrestrial intelligence (SETI) is the role of artificial intelligence. More precisely, these are programs or devices that are capable of performing cognitive tasks that have been previously associated with humans such as image recognition, reasoning, decision-making etc. Such systems are likely to play an important role in future deep space missions, notably interstellar exploration, where the spacecraft needs to act autonomously. This article explores the drivers for an interstellar mission with a computation-heavy payload and provides an outline of a spacecraft and mission architecture that supports such a payload. Based on existing technologies and extrapolations of current trends, it is shown that AI spacecraft development and operation will be constrained and driven by three aspects: power requirements for the payload, power generation capabilities, and heat rejection capabilities. A likely mission architecture for such a probe is to get into an orbit close to the star in order to generate maximum power for computational activities, and then to prepare for further exploration activities. Given current levels of increase in computational power, such a payload with a similar computational power as the human brain would have a mass of hundreds to dozens of tons in a 2050 - 2060 timeframe.\nNew Artificial Human Optimization (AHO) Field Algorithms can be created from scratch or by adding the concept of Artificial Humans into other existing Optimization Algorithms. Particle Swarm Optimization (PSO) has been very popular for solving complex optimization problems due to its simplicity. 
In this work, new Artificial Human Optimization Field Algorithms are created by modifying existing PSO algorithms with AHO Field Concepts. These Hybrid PSO Algorithms come under the PSO Field as well as the AHO Field. There are Hybrid PSO research articles based on Human Behavior, Human Cognition and Human Thinking etc. But there are no Hybrid PSO articles based on concepts like Human Disease, Human Kindness and Human Relaxation. This paper proposes new AHO Field algorithms based on these research gaps. Some existing Hybrid PSO algorithms are given a new name in this work so that it will be easy for future AHO researchers to find these novel Artificial Human Optimization Field Algorithms. A total of 6 Artificial Human Optimization Field algorithms titled \"Human Safety Particle Swarm Optimization (HuSaPSO)\", \"Human Kindness Particle Swarm Optimization (HKPSO)\", \"Human Relaxation Particle Swarm Optimization (HRPSO)\", \"Multiple Strategy Human Particle Swarm Optimization (MSHPSO)\", \"Human Thinking Particle Swarm Optimization (HTPSO)\" and \"Human Disease Particle Swarm Optimization (HDPSO)\" are tested by applying these novel algorithms on Ackley, Beale, Bohachevsky, Booth and Three-Hump Camel Benchmark Functions. Results obtained are compared with the PSO algorithm.\nMany computer models such as cellular automata and artificial neural networks have been developed and successfully applied. However, in some cases, these models might be restrictive on the possible solutions or their solutions might be difficult to interpret. To overcome this problem, we outline a new approach, the so-called allagmatic method, that automatically programs and executes models with as few limitations as possible while maintaining human interpretability. Earlier we described a metamodel and its building blocks according to the philosophical concepts of structure (spatial dimension) and operation (temporal dimension). 
They are entity, milieu, and update function that together abstractly describe cellular automata, artificial neural networks, and possibly any kind of computer model. By automatically combining these building blocks in an evolutionary computation, interpretability might be increased by the relationship to the metamodel, and models might be translated into more interpretable models via the metamodel. We propose generic and object-oriented programming to implement the entities and their milieus as dynamic and generic arrays and the update function as a method. We show two experiments where a simple cellular automaton and an artificial neural network are automatically programmed, compiled, and executed. A target state is successfully evolved and learned in the cellular automaton and artificial neural network, respectively. We conclude that the allagmatic method can create and execute cellular automaton and artificial neural network models in an automated manner with the guidance of philosophy.\nWe seek causes through science, religion, and in everyday life. We get excited when a big rock causes a big splash, and we get scared when it tumbles without a cause. But our causal cognition is usually biased. The 'why' is influenced by the 'who'. It is influenced by the 'self', and by 'others'. We share rituals, we watch action movies, and we influence each other to believe in the same causes. Human mind is packed with subjectivity because shared cognitive biases bring us together. But they also make us vulnerable.   An artificial mind is deemed to be more objective than the human mind. After many years of science-fiction fantasies about even-minded androids, they are now sold as personal or expert assistants, as brand advocates, as policy or candidate supporters, as network influencers. Artificial agents have been stunningly successful in disseminating artificial causal beliefs among humans. 
As malicious artificial agents continue to manipulate human cognitive biases, and deceive human communities into ostensive but expansive causal illusions, the hope for defending us has been vested into developing benevolent artificial agents, tasked with preventing and mitigating cognitive distortions inflicted upon us by their malicious cousins. Can the distortions of human causal cognition be corrected on a more solid foundation of artificial causal cognition?   In the present paper, we study a simple model of causal cognition, viewed as a quest for causal models. We show that, under very mild and hard to avoid assumptions, there are always self-confirming causal models, which perpetrate self-deception, and seem to preclude a royal road to objectivity.\nThe ability to predict the intentions of people based solely on their visual actions is a skill only performed by humans and animals. The intelligence of current computer algorithms has not reached this level of complexity, but there are several research efforts that are working towards it. With the number of classification algorithms available, it is hard to determine which algorithm works best for a particular situation. In classification of visual human intent data, Hidden Markov Models (HMM), and their variants, are leading candidates.   The inability of HMMs to provide a probability in the observation to observation linkages is a big downfall in this classification technique. If a person is visually identifying an action of another person, they monitor patterns in the observations. By estimating the next observation, people have the ability to summarize the actions, and thus determine, with pretty good accuracy, the intention of the person performing the action. These visual cues and linkages are important in creating intelligent algorithms for determining human actions based on visual observations.   
The Evidence Feed Forward Hidden Markov Model is a newly developed algorithm which provides observation to observation linkages. The following research addresses the theory behind Evidence Feed Forward HMMs, provides mathematical proofs of their learning of these parameters to optimize the likelihood of observations with an Evidence Feed Forward HMM, which is important in all computational intelligence algorithms, and gives comparative examples with standard HMMs in classification of both visual action data and measurement data; thus providing a strong base for Evidence Feed Forward HMMs in classification of many types of problems.\nSocial Robot Lumen is an Artificial Intelligence development project that aims to create an Artificial Intelligence (AI) which allows a humanoid robot to communicate with human beings naturally. In this study, Lumen will be developed to be a tour guide in the Electrical Engineering Days 2015 exhibition. In developing an AI, there are a lot of modules that need to be developed separately. To make the development easier, we need a computational platform that serves as a common basis, allowing all developers to develop the modules in parallel. The computational platform developed by the writer is called Lumen Server. Lumen Server has two main functions: to be a bridge between all Lumen intelligence modules and the NAO robot, and to be the communication bridge between those Lumen intelligence modules. For the second function, Lumen Server implements the AMQP protocol using RabbitMQ. Besides that, the writer also developed a control system for robot movement called Lumen Motion. Lumen Motion is implemented by modelling the movement of the NAO robot and also by creating a control system using a fuzzy logic controller. The writer also developed a program that connects all Lumen intelligence modules so that Lumen can act like a tour guide. The implementation of this program uses an FSM and event-driven programming. 
The implementation results show that all designed features were successfully implemented. Developing this computational platform can ease the development of Lumen in the future. Future development must focus on creating an integration system so that Lumen can be more responsive to the environment.   -----   Social Robot Lumen is an artificial intelligence development project that aims to create artificial intelligence (AI) that enables a robot to communicate with humans naturally.\nIn this paper, we demonstrate the application of Fuzzy Markup Language (FML) to construct an FML-based Dynamic Assessment Agent (FDAA), and we present an FML-based Human-Machine Cooperative System (FHMCS) for the game of Go. The proposed FDAA comprises an intelligent decision-making and learning mechanism, an intelligent game bot, a proximal development agent, and an intelligent agent. The intelligent game bot is based on the open-source code of Facebook Darkforest, and it features a representational state transfer application programming interface mechanism. The proximal development agent contains a dynamic assessment mechanism, a GoSocket mechanism, and an FML engine with a fuzzy knowledge base and rule base. The intelligent agent contains a GoSocket engine and a summarization agent that is based on the estimated win rate, real-time simulation number, and matching degree of predicted moves. Additionally, the FML for player performance evaluation and linguistic descriptions for game results commentary are presented. We experimentally verify and validate the performance of the FDAA and variants of the FHMCS by testing five games in 2016 and 60 games of Google Master Go, a new version of the AlphaGo program, in January 2017. 
The experimental results demonstrate that the proposed FDAA can work effectively for Go applications.\nThe combination of Artificial Intelligence (AI) and Internet-of-Things (IoT), which is denoted as AI-powered Internet-of-Things (AIoT), is capable of processing huge amounts of data generated from a large number of devices and handling complex problems in social infrastructures. As AI and IoT technologies are becoming mature, in this paper, we propose to apply AIoT technologies for traffic light control, which is an essential component of intelligent transportation systems, to improve the efficiency of a smart city's road system. Specifically, various sensors such as surveillance cameras provide real-time information for the intelligent traffic light control system to observe the states of both motorized traffic and non-motorized traffic. In this paper, we propose an intelligent traffic light control solution by using distributed multi-agent Q learning, considering the traffic information at the neighboring intersections as well as local motorized and non-motorized traffic, to improve the overall performance of the entire control system. By using the proposed multi-agent Q learning algorithm, our solution aims to optimize both the motorized and non-motorized traffic. In addition, we consider many constraints/rules for traffic light control in the real world, and integrate these constraints in the learning algorithm, which facilitates deploying the proposed solution in real operational scenarios. We conducted numerical simulations for a real-world map with real-world traffic data. 
The simulation results show that our proposed solution outperforms existing solutions in terms of vehicle and pedestrian queue lengths, waiting time at intersections, and many other key performance metrics.\nThis report - a major revision of its previous release - describes a reference architecture for intelligent software agents performing active, largely autonomous cyber-defense actions on military networks of computing and communicating devices. The report is produced by the North Atlantic Treaty Organization (NATO) Research Task Group (RTG) IST-152 \"Intelligent Autonomous Agents for Cyber Defense and Resilience\". In a conflict with a technically sophisticated adversary, NATO military tactical networks will operate in a heavily contested battlefield. Enemy software cyber agents - malware - will infiltrate friendly networks and attack friendly command, control, communications, computers, intelligence, surveillance, and reconnaissance and computerized weapon systems. To fight them, NATO needs artificial cyber hunters - intelligent, autonomous, mobile agents specialized in active cyber defense. With this in mind, in 2016, NATO initiated RTG IST-152. Its objective has been to help accelerate the development and transition to practice of such software agents by producing a reference architecture and technical roadmap. This report presents the concept and architecture of an Autonomous Intelligent Cyber-defense Agent (AICA). We describe the rationale of the AICA concept, explain the methodology and purpose that drive the definition of the AICA Reference Architecture, and review some of the main features and challenges of AICAs.\nIntelligent systems and advanced automation are involved in information collection and evaluation, in decision-making and in the implementation of chosen actions. In such systems, human responsibility becomes equivocal. 
Understanding human causal responsibility is particularly important when intelligent autonomous systems can harm people, as with autonomous vehicles or, most notably, with autonomous weapon systems (AWS). Using Information Theory, we develop a responsibility quantification (ResQu) model of human involvement in intelligent automated systems and demonstrate its applications on decisions regarding AWS. The analysis reveals that human comparative responsibility to outcomes is often low, even when major functions are allocated to the human. Thus, broadly stated policies of keeping humans in the loop and having meaningful human control are misleading and cannot truly direct decisions on how to involve humans in intelligent systems and advanced automation. The current model is an initial step in the complex goal to create a comprehensive responsibility model that will enable quantification of human causal responsibility. It assumes stationarity, full knowledge regarding the characteristics of the human and automation, and ignores temporal aspects. Despite these limitations, it can aid in the analysis of system design alternatives and policy decisions regarding human responsibility in intelligent systems and advanced automation.\nIn the last five years, edge computing has attracted tremendous attention from industry and academia due to its promise to reduce latency, save bandwidth, improve availability, and protect data privacy to keep data secure. At the same time, we have witnessed the proliferation of AI algorithms and models which accelerate the successful deployment of intelligence mainly in cloud services. These two trends, combined together, have created a new horizon: Edge Intelligence (EI). The development of EI requires much attention from both the computer systems research community and the AI community to meet these demands. 
However, existing computing techniques used in the cloud are not applicable to edge computing directly due to the diversity of computing sources and the distribution of data sources. We envision that what is missing is a framework that can be rapidly deployed on the edge and enable edge AI capabilities. To address this challenge, in this paper we first present the definition and a systematic review of EI. Then, we introduce an Open Framework for Edge Intelligence (OpenEI), which is a lightweight software platform to equip edges with intelligent processing and data sharing capability. We analyze four fundamental EI techniques which are used to build OpenEI and identify several open problems based on potential research directions. Finally, four typical application scenarios enabled by OpenEI are presented.\nGiven a knowledge base KB containing first-order and statistical facts, we consider a principled method, called the random-worlds method, for computing a degree of belief that some formula Phi holds given KB. If we are reasoning about a world or system consisting of N individuals, then we can consider all possible worlds, or first-order models, with domain {1,...,N} that satisfy KB, and compute the fraction of them in which Phi is true. We define the degree of belief to be the asymptotic value of this fraction as N grows large. We show that when the vocabulary underlying Phi and KB uses constants and unary predicates only, we can naturally associate an entropy with each world. As N grows larger, there are many more worlds with higher entropy. Therefore, we can use a maximum-entropy computation to compute the degree of belief. This result is in a similar spirit to previous work in physics and artificial intelligence, but is far more general. Of equal interest to the result itself are the limitations on its scope. Most importantly, the restriction to unary predicates seems necessary. 
Although the random-worlds method makes sense in general, the connection to maximum entropy seems to disappear in the non-unary case. These observations suggest unexpected limitations to the applicability of maximum-entropy methods.\nMost modern formalisms used in Databases and Artificial Intelligence for describing an application domain are based on the notions of class (or concept) and relationship among classes. One interesting feature of such formalisms is the possibility of defining a class, i.e., providing a set of properties that precisely characterize the instances of the class. Many recent articles point out that there are several ways of assigning a meaning to a class definition containing some sort of recursion. In this paper, we argue that, instead of choosing a single style of semantics, we achieve better results by adopting a formalism that allows for different semantics to coexist. We demonstrate the feasibility of our argument, by presenting a knowledge representation formalism, the description logic muALCQ, with the above characteristics. In addition to the constructs for conjunction, disjunction, negation, quantifiers, and qualified number restrictions, muALCQ includes special fixpoint constructs to express (suitably interpreted) recursive definitions. These constructs enable the usual frame-based descriptions to be combined with definitions of recursive data structures such as directed acyclic graphs, lists, streams, etc. We establish several properties of muALCQ, including the decidability and the computational complexity of reasoning, by formulating a correspondence with a particular modal logic of programs called the modal mu-calculus.\nExisting plan synthesis approaches in artificial intelligence fall into two categories -- domain independent and domain dependent. The domain independent approaches are applicable across a variety of domains, but may not be very efficient in any one given domain. 
The domain dependent approaches need to be (re)designed for each domain separately, but can be very efficient in the domain for which they are designed. One enticing alternative to these approaches is to automatically synthesize domain independent planners given the knowledge about the domain and the theory of planning. In this paper, we investigate the feasibility of using existing automated software synthesis tools to support such synthesis. Specifically, we describe an architecture called CLAY in which the Kestrel Interactive Development System (KIDS) is used to derive a domain-customized planner through a semi-automatic combination of a declarative theory of planning, and the declarative control knowledge specific to a given domain, to semi-automatically combine them to derive domain-customized planners. We discuss what it means to write a declarative theory of planning and control knowledge for KIDS, and illustrate our approach by generating a class of domain-specific planners using state space refinements. Our experiments show that the synthesized planners can outperform classical refinement planners (implemented as instantiations of UCP, Kambhampati & Srivastava, 1995), using the same control knowledge. We will contrast the costs and benefits of the synthesis approach with conventional methods for customizing domain independent planners.\nIn this paper we propose a random CSP model, called Model GB, which is a natural generalization of standard Model B. It is proved that Model GB in which each constraint is easy to satisfy exhibits non-trivial behaviour (not trivially satisfiable or unsatisfiable) as the number of variables approaches infinity. A detailed analysis to obtain an asymptotic estimate (good to 1+o(1)) of the average number of nodes in a search tree used by the backtracking algorithm on Model GB is also presented. 
It is shown that the average number of nodes required for finding all solutions or proving that no solution exists grows exponentially with the number of variables. So this model might be an interesting distribution for studying the nature of hard instances and evaluating the performance of CSP algorithms. In addition, we further investigate the behaviour of the average number of nodes as r (the ratio of constraints to variables) varies. The results indicate that as r increases, random CSP instances get easier and easier to solve, and the base for the average number of nodes that is exponential in r tends to 1 as r approaches infinity. Therefore, although the average number of nodes used by the backtracking algorithm on random CSP is exponential, many CSP instances will be very easy to solve when r is sufficiently large.\nThe constraint satisfaction problem (CSP) is a general problem central to computer science and artificial intelligence. Although the CSP is NP-hard in general, considerable effort has been spent on identifying tractable subclasses. The main two approaches consider structural properties (restrictions on the hypergraph of constraint scopes) and relational properties (restrictions on the language of constraint relations). Recently, some authors have considered hybrid properties that restrict the constraint hypergraph and the relations simultaneously.   Our key contribution is the novel concept of a CSP pattern and classes of problems defined by forbidden patterns (which can be viewed as forbidding generic subproblems). We describe the theoretical framework which can be used to reason about classes of problems defined by forbidden patterns. We show that this framework generalises relational properties and allows us to capture known hybrid tractable classes.   
Although we are not close to obtaining a dichotomy concerning the tractability of general forbidden patterns, we are able to make some progress in a special case: classes of problems that arise when we can only forbid binary negative patterns (generic subproblems in which only inconsistent tuples are specified). In this case we are able to characterise very large classes of tractable and NP-hard forbidden patterns. This leaves the complexity of just one case unresolved and we conjecture that this last case is tractable.\nHuman intuition has been simulated by several research projects using artificial intelligence techniques. Most of these algorithms or models lack the ability to handle complications or diversions. Moreover, they also do not explain the factors influencing intuition and the accuracy of the results from this process. In this paper, we present a simple series based model for implementation of human-like intuition using the principles of connectivity and unknown entities. By using Poker hand datasets and Car evaluation datasets, we compare the performance of some well-known models with our intuition model. The aim of the experiment was to predict the maximum accurate answers using intuition based models. We found that the presence of unknown entities, diversion from the current problem scenario, and identifying weakness without the normal logic based execution, greatly affects the reliability of the answers. Generally, the intuition based models cannot be a substitute for the logic based mechanisms in handling such problems. The intuition can only act as a support for an ongoing logic based model that processes all the steps in a sequential manner. However, when time and computational cost are very strict constraints, this intuition based model becomes extremely important and useful, because it can give a reasonably good performance. 
Factors affecting intuition are analyzed and interpreted through our model.\nAs computational agents are developed for increasingly complicated e-commerce applications, the complexity of the decisions they face demands advances in artificial intelligence techniques. For example, an agent representing a seller in an auction should try to maximize the seller's profit by reasoning about a variety of possibly uncertain pieces of information, such as the maximum prices various buyers might be willing to pay, the possible prices being offered by competing sellers, the rules by which the auction operates, the dynamic arrival and matching of offers to buy and sell, and so on. A naive application of multiagent reasoning techniques would require the seller's agent to explicitly model all of the other agents through an extended time horizon, rendering the problem intractable for many realistically-sized problems. We have instead devised a new strategy that an agent can use to determine its bid price based on a more tractable Markov chain model of the auction process. We have experimentally identified the conditions under which our new strategy works well, as well as how well it works in comparison to the optimal performance the agent could have achieved had it known the future. Our results show that our new strategy in general performs well, outperforming other tractable heuristic strategies in a majority of experiments, and is particularly effective in a 'seller's market', where many buy offers are available.\nBelief merging is an important but difficult problem in Artificial Intelligence, especially when sources of information are pervaded with uncertainty. Many merging operators have been proposed to deal with this problem in possibilistic logic, a weighted logic which is powerful for handling inconsistency and dealing with uncertainty. They often result in a possibilistic knowledge base which is a set of weighted formulas. 
Although possibilistic logic is inconsistency tolerant, it suffers from the well-known \"drowning effect\". Therefore, we may still want to obtain a consistent possibilistic knowledge base as the result of merging. In such a case, we argue that it is not always necessary to keep weighted information after merging. In this paper, we define a merging operator that maps a set of possibilistic knowledge bases and a formula representing the integrity constraints to a classical knowledge base by using lexicographic ordering. We show that it satisfies nine postulates that generalize basic postulates for propositional merging given in [11]. These postulates capture the principle of minimal change in some sense. We then provide an algorithm for generating the resulting knowledge base of our merging operator. Finally, we discuss the compatibility of our merging operator with propositional merging and establish the advantage of our merging operator over existing semantic merging operators in the propositional case.\nThe problems associated with scaling involve active and challenging research topics in the area of artificial intelligence. The purpose is to solve real world problems by means of AI technologies, in cases where the complexity of representation of the real world problem is potentially combinatorial. In this paper, we present a novel approach to cope with the scaling issues in Bayesian belief networks for ship classification. The proposed approach divides the conceptual model of a complex ship classification problem into a set of small modules that work together to solve the classification problem while preserving the functionality of the original model. The possible ways of explaining sensor returns (e.g., the evidence) for some features, such as portholes along the length of a ship, are sometimes combinatorial. Thus, using an exhaustive approach, which entails the enumeration of all possible explanations, is impractical for larger problems. 
We present a network structure (referred to as Sequential Decomposition, SD) in which each observation is associated with a set of legitimate outcomes which are consistent with the explanation of each observed piece of evidence. The results show that the SD approach allows one to represent feature-observation relations in a manageable way and achieve the same explanatory power as an exhaustive approach.\nShafer's theory of belief and the Bayesian theory of probability are two alternative and mutually inconsistent approaches toward modelling uncertainty in artificial intelligence. To help reduce the conflict between these two approaches, this paper reexamines expected utility theory, from which Bayesian probability theory is derived. Expected utility theory requires the decision maker to assign a utility to each decision conditioned on every possible event that might occur. But frequently the decision maker cannot foresee all the events that might occur, i.e., one of the possible events is the occurrence of an unforeseen event. So once we acknowledge the existence of unforeseen events, we need to develop some way of assigning utilities to decisions conditioned on unforeseen events. The commonsensical solution to this problem is to assign similar utilities to events which are similar. Implementing this commonsensical solution is equivalent to replacing Bayesian subjective probabilities over the space of foreseen and unforeseen events by random set theory probabilities over the space of foreseen events. This leads to an expected utility principle in which normalized variants of Shafer's commonalities play the role of subjective probabilities. Hence allowing for unforeseen events in decision analysis causes Bayesian probability theory to become much more similar to Shaferian theory.\nIn recent years, researchers in decision analysis and artificial intelligence (AI) have used Bayesian belief networks to build models of expert opinion. 
Using standard methods drawn from the theory of computational complexity, workers in the field have shown that the problem of probabilistic inference in belief networks is difficult and almost certainly intractable. KNET, a software environment for constructing knowledge-based systems within the axiomatic framework of decision theory, contains a randomized approximation scheme for probabilistic inference. The algorithm can, in many circumstances, perform efficient approximate inference in large and richly interconnected models of medical diagnosis. Unlike previously described stochastic algorithms for probabilistic inference, the randomized approximation scheme computes a priori bounds on running time by analyzing the structure and contents of the belief network. In this article, we describe a randomized algorithm for probabilistic inference and analyze its performance mathematically. Then, we devote the major portion of the paper to a discussion of the algorithm's empirical behavior. The results indicate that the generation of good trials (that is, trials whose distribution closely matches the true distribution), rather than the computation of numerous mediocre trials, dominates the performance of stochastic simulation. Key words: probabilistic inference, belief networks, stochastic simulation, computational complexity theory, randomized algorithms.\nMany Artificial Intelligence systems depend on the agent's updating its beliefs about the world on the basis of experience. Experiments constitute one type of experience, so scientific methodology offers a natural environment for examining the issues attendant to using this class of evidence. This paper presents a framework which structures the process of using scientific data from research reports for the purpose of making decisions, using decision analysis as the basis for the structure and using medical research as the general scientific domain.
The structure extends the basic influence diagram for updating belief in an object domain parameter of interest by expanding the parameter into four parts: those of the patient, the population, the study sample, and the effective study sample. The structure uses biases to perform the transformation of one parameter into another, so that, for instance, selection biases, in concert with the population parameter, yield the study sample parameter. The influence diagram structure provides decision theoretic justification for practices of good clinical research such as randomized assignment and blindfolding of care providers. The model covers most research designs used in medicine: case-control studies, cohort studies, and controlled clinical trials, and provides an architecture to separate clearly between statistical knowledge and domain knowledge. The proposed general model can be the basis for clinical epidemiological advisory systems, when coupled with heuristic pruning of irrelevant biases; of statistical workstations, when the computational machinery for calculation of posterior distributions is added; and of meta-analytic reviews, when multiple studies may impact on a single population parameter.\nThe modelling, analysis, and visualisation of dynamic geospatial phenomena has been identified as a key developmental challenge for next-generation Geographic Information Systems (GIS). In this context, the envisaged paradigmatic extensions to contemporary foundational GIS technology raise fundamental questions concerning the ontological, formal representational, and (analytical) computational methods that would underlie their spatial information theoretic underpinnings.   We present the conceptual overview and architecture for the development of high-level semantic and qualitative analytical capabilities for dynamic geospatial domains.
Building on formal methods in the areas of commonsense reasoning, qualitative reasoning, spatial and temporal representation and reasoning, reasoning about actions and change, and computational models of narrative, we identify concrete theoretical and practical challenges that accrue in the context of formal reasoning about `space, events, actions, and change'. With this as a basis, and against the backdrop of an illustrated scenario involving the spatio-temporal dynamics of urban narratives, we address specific problems and solution techniques chiefly involving `qualitative abstraction', `data integration and spatial consistency', and `practical geospatial abduction'. From a broad topical viewpoint, we propose that next-generation dynamic GIS technology demands a transdisciplinary scientific perspective that brings together Geography, Artificial Intelligence, and Cognitive Science.   Keywords: artificial intelligence; cognitive systems; human-computer interaction; geographic information systems; spatio-temporal dynamics; computational models of narrative; geospatial analysis; geospatial modelling; ontology; qualitative spatial modelling and reasoning; spatial assistance systems\nWe introduce GOTCHAs (Generating panOptic Turing Tests to Tell Computers and Humans Apart) as a way of preventing automated offline dictionary attacks against user selected passwords. A GOTCHA is a randomized puzzle generation protocol, which involves interaction between a computer and a human. Informally, a GOTCHA should satisfy two key properties: (1) The puzzles are easy for the human to solve. (2) The puzzles are hard for a computer to solve even if it has the random bits used by the computer to generate the final puzzle --- unlike a CAPTCHA. Our main theorem demonstrates that GOTCHAs can be used to mitigate the threat of offline dictionary attacks against passwords by ensuring that a password cracker must receive constant feedback from a human being while mounting an attack.
Finally, we provide a candidate construction of GOTCHAs based on Inkblot images. Our construction relies on the usability assumption that users can recognize the phrases that they originally used to describe each Inkblot image --- a much weaker usability assumption than previous password systems based on Inkblots which required users to recall their phrase exactly. We conduct a user study to evaluate the usability of our GOTCHA construction. We also generate a GOTCHA challenge where we encourage artificial intelligence and security researchers to try to crack several passwords protected with our scheme.\nPrime implicates and prime implicants have proven relevant to a number of areas of artificial intelligence, most notably abductive reasoning and knowledge compilation. The purpose of this paper is to examine how these notions might be appropriately extended from propositional logic to the modal logic K. We begin the paper by considering a number of potential definitions of clauses and terms for K. The different definitions are evaluated with respect to a set of syntactic, semantic, and complexity-theoretic properties characteristic of the propositional definition. We then compare the definitions with respect to the properties of the notions of prime implicates and prime implicants that they induce. While there is no definition that perfectly generalizes the propositional notions, we show that there does exist one definition which satisfies many of the desirable properties of the propositional case. In the second half of the paper, we consider the computational properties of the selected definition. To this end, we provide sound and complete algorithms for generating and recognizing prime implicates, and we show the prime implicate recognition task to be PSPACE-complete. We also prove upper and lower bounds on the size and number of prime implicates. 
While the paper focuses on the logic K, all of our results hold equally well for multi-modal K and for concept expressions in the description logic ALC.\nReal-time heuristic search algorithms satisfy a constant bound on the amount of planning per action, independent of problem size. As a result, they scale up well as problems become larger. This property would make them well suited for video games where Artificial Intelligence controlled agents must react quickly to user commands and to other agents' actions. On the downside, real-time search algorithms employ learning methods that frequently lead to poor solution quality and cause the agent to appear irrational by re-visiting the same problem states repeatedly. The situation changed recently with a new algorithm, D LRTA*, which attempted to eliminate learning by automatically selecting subgoals. D LRTA* is well poised for video games, except it has a complex and memory-demanding pre-computation phase during which it builds a database of subgoals. In this paper, we propose a simpler and more memory-efficient way of pre-computing subgoals thereby eliminating the main obstacle to applying state-of-the-art real-time search methods in video games. The new algorithm solves a number of randomly chosen problems off-line, compresses the solutions into a series of subgoals and stores them in a database. When presented with a novel problem on-line, it queries the database for the most similar previously solved case and uses its subgoals to solve the problem. In the domain of pathfinding on four large video game maps, the new algorithm delivers solutions eight times better while using 57 times less memory and requiring 14% less pre-computation time.\nDomain-independent planning is one of the foundational areas in the field of Artificial Intelligence. A description of a planning task consists of an initial world state, a goal, and a set of actions for modifying the world state.
The objective is to find a sequence of actions, that is, a plan, that transforms the initial world state into a goal state. In optimal planning, we are interested in finding not just a plan, but one of the cheapest plans. A prominent approach to optimal planning these days is heuristic state-space search, guided by admissible heuristic functions. Numerous admissible heuristics have been developed, each with its own strengths and weaknesses, and it is well known that there is no single \"best\" heuristic for optimal planning in general. Thus, which heuristic to choose for a given planning task is a difficult question. This difficulty can be avoided by combining several heuristics, but that requires computing numerous heuristic estimates at each state, and the tradeoff between the time spent doing so and the time saved by the combined advantages of the different heuristics might be high. We present a novel method that reduces the cost of combining admissible heuristics for optimal planning, while maintaining its benefits. Using an idealized search space model, we formulate a decision rule for choosing the best heuristic to compute at each state. We then present an active online learning approach for learning a classifier with that decision rule as the target concept, and employ the learned classifier to decide which heuristic to compute at each state. We evaluate this technique empirically, and show that it substantially outperforms the standard method for combining several heuristics via their pointwise maximum.\nThis book-length article combines several peer reviewed papers and new material to analyze the issues of ethical artificial intelligence (AI). The behavior of future AI systems can be described by mathematical equations, which are adapted to analyze possible unintended AI behaviors and ways that AI designs can avoid them. This article makes the case for utility-maximizing agents and for avoiding infinite sets in agent definitions.
It shows how to avoid agent self-delusion using model-based utility functions and how to avoid agents that corrupt their reward generators (sometimes called \"perverse instantiation\") using utility functions that evaluate outcomes at one point in time from the perspective of humans at a different point in time. It argues that agents can avoid unintended instrumental actions (sometimes called \"basic AI drives\" or \"instrumental goals\") by accurately learning human values. This article defines a self-modeling agent framework and shows how it can avoid problems of resource limits, being predicted by other agents, and inconsistency between the agent's utility function and its definition (one version of this problem is sometimes called \"motivated value selection\"). This article also discusses how future AI will differ from current AI, the politics of AI, and the ultimate use of AI to help understand the nature of the universe and our place in it.\nIn this position paper, I argue that standardized tests for elementary science such as SAT or Regents tests are not very good benchmarks for measuring the progress of artificial intelligence systems in understanding basic science. The primary problem is that these tests are designed to test aspects of knowledge and ability that are challenging for people; the aspects that are challenging for AI systems are very different. In particular, standardized tests do not test knowledge that is obvious for people; none of this knowledge can be assumed in AI systems. Individual standardized tests also have specific features that are not necessarily appropriate for an AI benchmark. I analyze the Physics subject SAT in some detail and the New York State Regents Science test more briefly. I also argue that the apparent advantages offered by using standardized tests are mostly either minor or illusory. 
The one major real advantage is that the significance is easily explained to the public; but I argue that even this is a somewhat mixed blessing. I conclude by arguing that, first, more appropriate collections of exam style problems could be assembled, and second, that there are better kinds of benchmarks than exam-style problems. In an appendix I present a collection of sample exam-style problems that test kinds of knowledge missing from the standardized tests.\nUndergraduate students of artificial intelligence often struggle with representing knowledge as logical sentences. This is a skill that seems to require extensive practice to obtain, suggesting a teaching strategy that involves the assignment of numerous exercises involving the formulation of some bit of knowledge, communicated using a natural language such as English, as a sentence in some logic. The number of such exercises needed to master this skill is far too large to allow typical artificial intelligence course teaching teams to provide prompt feedback on student efforts. Thus, an automated assessment system for such exercises is needed to ensure that students receive an adequate amount of practice, with the rapid delivery of feedback allowing students to identify errors in their understanding and correct them. This paper describes an automated grading system for knowledge representation exercises using first-order logic. A resolution theorem prover, \\textit{Prover9}, is used to check if a student-submitted formula is logically equivalent to a solution provided by the instructor. This system has been used by students enrolled in undergraduate artificial intelligence classes for several years. Use of this teaching tool resulted in a statistically significant improvement on first-order logic knowledge representation questions appearing on the course final examination. 
This article explains how this system works, provides an analysis of changes in student learning outcomes, and explores potential enhancements of this system, including the possibility of providing rich formative feedback by replacing the resolution theorem prover with a tableaux-based method.\nThis article addresses an open problem in the area of cognitive systems and architectures: namely the problem of handling (in terms of processing and reasoning capabilities) complex knowledge structures that can be at least plausibly comparable, both in terms of size and of typology of the encoded information, to the knowledge that humans process daily for executing everyday activities. Handling a huge amount of knowledge, and selectively retrieving it according to the needs emerging in different situational scenarios, is an important aspect of human intelligence. For this task, in fact, humans adopt a wide range of heuristics (Gigerenzer and Todd) due to their bounded rationality (Simon, 1957). In this perspective, one of the requirements that should be considered for the design, the realization and the evaluation of intelligent cognitively inspired systems should be represented by their ability to heuristically identify and retrieve, from the general knowledge stored in their artificial Long Term Memory (LTM), that one which is synthetically and contextually relevant. This requirement, however, is often neglected. Currently, artificial cognitive systems and architectures are not able, de facto, to deal with complex knowledge structures that can be even slightly comparable to the knowledge heuristically managed by humans. In this paper I will argue that this is not only a technological problem but also an epistemological one and I will briefly sketch a proposal for a possible solution.\nLearning models of artificial intelligence can nowadays perform very well on a large variety of tasks.
However, in practice different task environments are best handled by different learning models, rather than a single, universal, approach. Most non-trivial models thus require the adjustment of several to many learning parameters, which is often done on a case-by-case basis by an external party. Meta-learning refers to the ability of an agent to autonomously and dynamically adjust its own learning parameters, or meta-parameters. In this work we show how projective simulation, a recently developed model of artificial intelligence, can naturally be extended to account for meta-learning in reinforcement learning settings. The projective simulation approach is based on a random walk process over a network of clips. The suggested meta-learning scheme builds upon the same design and employs clip networks to monitor the agent's performance and to adjust its meta-parameters \"on the fly\". We distinguish between \"reflexive adaptation\" and \"adaptation through learning\", and show the utility of both approaches. In addition, a trade-off between flexibility and learning-time is addressed. The extended model is examined on three different kinds of reinforcement learning tasks, in which the agent has different optimal values of the meta-parameters, and is shown to perform well, reaching near-optimal to optimal success rates in all of them, without ever needing to manually adjust any meta-parameter.\nLinguistic relations in oral conversations present how opinions are constructed and developed in a restricted time. The relations bond ideas, arguments, thoughts, and feelings, re-shape them during a speech, and finally build knowledge out of all information provided in the conversation. Speakers share a common interest to discuss. It is expected that each speaker's reply includes duplicated forms of words from previous speakers. However, linguistic adaptation is observed and evolves in a more complex path than just transferring slightly modified versions of common concepts. 
A conversation aiming at a benefit at the end shows an emergent cooperation inducing the adaptation. Not only cooperation, but also competition drives the adaptation or an opposite scenario and one can capture the dynamic process by tracking how the concepts are linguistically linked. To uncover salient complex dynamic events in verbal communications, we attempt to discover self-organized linguistic relations hidden in a conversation with explicitly stated winners and losers. We examine open access data of the United States Supreme Court. Our understanding is crucial in big data research to guide how transition states in opinion mining and decision-making should be modeled and how this required knowledge to guide the model should be pinpointed, by filtering large amounts of data.\nUnifying seemingly disparate algorithmic ideas to produce better performing algorithms has been a longstanding goal in reinforcement learning. As a primary example, TD($\\lambda$) elegantly unifies one-step TD prediction with Monte Carlo methods through the use of eligibility traces and the trace-decay parameter $\\lambda$. Currently, there are a multitude of algorithms that can be used to perform TD control, including Sarsa, $Q$-learning, and Expected Sarsa. These methods are often studied in the one-step case, but they can be extended across multiple time steps to achieve better performance. Each of these algorithms is seemingly distinct, and no one dominates the others for all problems. In this paper, we study a new multi-step action-value algorithm called $Q(\\sigma)$ which unifies and generalizes these existing algorithms, while subsuming them as special cases. A new parameter, $\\sigma$, is introduced to allow the degree of sampling performed by the algorithm at each step during its backup to be continuously varied, with Sarsa existing at one extreme (full sampling), and Expected Sarsa existing at the other (pure expectation).
$Q(\\sigma)$ is generally applicable to both on- and off-policy learning, but in this work we focus on experiments in the on-policy case. Our results show that an intermediate value of $\\sigma$, which results in a mixture of the existing algorithms, performs better than either extreme. The mixture can also be varied dynamically which can result in even greater performance.\nThe area of computation called artificial intelligence (AI) is falsified by describing a previous 1972 falsification of AI by British applied mathematician James Lighthill. It is explained how Lighthill's arguments continue to apply to current AI. It is argued that AI should use the Popperian scientific method in which it is the duty of every scientist to attempt to falsify theories and if theories are falsified to replace or modify them. The paper describes the Popperian method in detail and discusses Paul Nurse's application of the method to cell biology that also involves questions of mechanism and behavior. Arguments used by Lighthill in his original 1972 report that falsified AI are discussed. The Lighthill arguments are then shown to apply to current AI. The argument uses recent scholarship to explain Lighthill's assumptions and to show how the arguments based on those assumptions continue to falsify modern AI. An important focus of the argument involves Hilbert's philosophical programme that defined knowledge and truth as provable formal sentences. Current AI takes the Hilbert programme as dogma beyond criticism while Lighthill as a mid 20th century applied mathematician had abandoned it. The paper uses recent scholarship to explain John von Neumann's criticism of AI that I claim was assumed by Lighthill. The paper discusses computer chess programs to show Lighthill's combinatorial explosion still applies to AI but not humans. An argument showing that Turing Machines (TM) are not the correct description of computation is given. 
The paper concludes by advocating studying computation as Peter Naur's Dataology.\nMolecular variants of vitamin B12, siderophores and glycans occur. To take up variant forms, bacteria may express an array of receptors. The gut microbe Bacteroides thetaiotaomicron has three different receptors to take up variants of vitamin B12 and 88 receptors to take up various glycans. The design of receptor arrays reflects key processes that shape cellular evolution. Competition may focus each species on a subset of the available nutrient diversity. Some gut bacteria can take up only a narrow range of carbohydrates, whereas species such as B. thetaiotaomicron can digest many different complex glycans. Comparison of different nutrients, habitats, and genomes provides an opportunity to test hypotheses about the breadth of receptor arrays. Another important process concerns fluctuations in nutrient availability. Such fluctuations enhance the value of cellular sensors, which gain information about environmental availability and adjust receptor deployment. Bacteria often adjust receptor expression in response to fluctuations of particular carbohydrate food sources. Some species may adjust expression of uptake receptors for specific siderophores. How do cells use sensor information to control the response to fluctuations? That question about regulatory wiring relates to problems that arise in control theory and artificial intelligence. Control theory clarifies how to analyze environmental fluctuations in relation to the design of sensors and response systems. Recent advances in deep learning studies of artificial intelligence focus on the architecture of regulatory wiring and the ways in which complex control networks represent and classify environmental states. I emphasize the similar design problems that arise in cellular evolution, control theory, and artificial intelligence.
I connect those broad concepts to testable hypotheses for bacterial uptake of B12, siderophores and glycans.\nSyntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of application for which parsing has recently proven useful.   In recent years, there have been significant advances in the accuracy of parsing algorithms. In this article, we perform an empirical, task-oriented evaluation to determine how parsing accuracy influences the performance of a state-of-the-art rule-based sentiment analysis system that determines the polarity of sentences from their parse trees. In particular, we evaluate the system using four well-known dependency parsers, including both current models with state-of-the-art accuracy and less accurate models which, however, require fewer computational resources.   The experiments show that all of the parsers produce similarly good results in the sentiment analysis task, without their accuracy having any relevant influence on the results. Since parsing is currently a task with a relatively high computational cost that varies strongly between algorithms, this suggests that sentiment analysis researchers and users should prioritize speed over accuracy when choosing a parser; and parsing researchers should investigate models that improve speed further, even at some cost to accuracy.\nThere have been numerous breakthroughs with reinforcement learning in the recent years, perhaps most notably on Deep Reinforcement Learning successfully playing and winning relatively advanced computer games. There is undoubtedly an anticipation that Deep Reinforcement Learning will play a major role when the first AI masters the complicated game plays needed to beat a professional Real-Time Strategy game player.
For this to be possible, there needs to be a game environment that targets and fosters AI research, and specifically Deep Reinforcement Learning. Some game environments already exist; however, these are either overly simplistic, such as Atari 2600, or complex, such as Starcraft II from Blizzard Entertainment. We propose a game environment in between Atari 2600 and Starcraft II, particularly targeting Deep Reinforcement Learning algorithm research. The environment is a variant of Tower Line Wars from Warcraft III, Blizzard Entertainment. Further, as a proof of concept that the environment can harbor Deep Reinforcement algorithms, we propose and apply a Deep Q-Reinforcement architecture. The architecture simplifies the state space so that it is applicable to Q-learning, and in turn improves performance compared to current state-of-the-art methods. Our experiments show that the proposed architecture can learn to play the environment well, and scores 33% better than standard Deep Q-learning which in turn proves the usefulness of the game environment.\nThe hard problem in artificial intelligence asks how the shuffling of syntactical symbols in a program can lead to systems which experience semantics and qualia. We address this question in three stages. First, we introduce a new class of human semantic symbols which appears when unexpected and drastic environmental change causes humans to become surprised, confused, uncertain, and in extreme cases, unresponsive, passive and dysfunctional. For this class of symbols, pre-learned programs become inoperative so these syntactical programs cannot be the source of experienced qualia. Second, we model the dysfunctional human response to a radically changed environment as being the natural response of any learning machine facing novel inputs from well outside its previous training set.
In this situation, learning machines are unable to extract information from their input and will typically enter a dynamical state characterized by null outputs and a lack of response. This state immediately predicts and explains the characteristics of the semantic experiences of humans in similar circumstances. In the third stage, we consider learning machines trained to implement multiple functions in simple sequential programs using environmental data to specify subroutine names, control flow instructions, memory calls, and so on. Drastic change in any of these environmental inputs can again lead to inoperative programs. By examining changes specific to people or locations we can model human cognitive symbols featuring these dependencies, such as attachment and grief. Our approach links known dynamical machine states with human qualia and thus offers new insight into the hard problem of artificial intelligence.\nAll artificial intelligence (AI) systems make errors. These errors are unexpected, and differ often from the typical human mistakes (\"non-human\" errors). The AI errors should be corrected without damage to existing skills and, hopefully, avoiding direct human expertise. This paper presents an initial summary report of a project taking a new and systematic approach to improving the intellectual effectiveness of the individual AI by communities of AIs. We combine some ideas of learning in heterogeneous multiagent systems with new and original mathematical approaches for non-iterative corrections of errors of legacy AI systems. The mathematical foundations of AI non-destructive correction are presented and a series of new stochastic separation theorems is proven. These theorems provide a new instrument for the development, analysis, and assessment of machine learning methods and algorithms in high dimension.
They demonstrate that in high dimensions and even for exponentially large samples, linear classifiers in their classical Fisher's form are powerful enough to separate errors from correct responses with high probability and to provide efficient solution to the non-destructive corrector problem. In particular, we prove some hypotheses formulated in our paper `Stochastic Separation Theorems' (Neural Networks, 94, 255--259, 2017), and answer one general problem published by Donoho and Tanner in 2009.\nOnline symptom checkers have significant potential to improve patient care, however their reliability and accuracy remain variable. We hypothesised that an artificial intelligence (AI) powered triage and diagnostic system would compare favourably with human doctors with respect to triage and diagnostic accuracy. We performed a prospective validation study of the accuracy and safety of an AI powered triage and diagnostic system. Identical cases were evaluated by both an AI system and human doctors. Differential diagnoses and triage outcomes were evaluated by an independent judge, who was blinded from knowing the source (AI system or human doctor) of the outcomes. Independently of these cases, vignettes from publicly available resources were also assessed to provide a benchmark to previous studies and the diagnostic component of the MRCGP exam. Overall we found that the Babylon AI powered Triage and Diagnostic System was able to identify the condition modelled by a clinical vignette with accuracy comparable to human doctors (in terms of precision and recall). 
In addition, we found that the triage advice recommended by the AI System was, on average, safer than that of human doctors, when compared to the ranges of acceptable triage provided by independent expert judges, with only a minimal reduction in appropriateness.\nIn recent years there has been a sharp rise in networking applications, in which significant events need to be classified but only a few training instances are available. These are known as cases of one-shot learning. Examples include analyzing network traffic under zero-day attacks, and computer vision tasks by sensor networks deployed in the field. To handle this challenging task, organizations often use human analysts to classify events under high uncertainty. Existing algorithms use a threshold-based mechanism to decide whether to classify an object automatically or send it to an analyst for deeper inspection. However, this approach leads to a significant waste of resources since it does not take the practical temporal constraints of system resources into account. Our contribution is threefold. First, we develop a novel Deep Reinforcement One-shot Learning (DeROL) framework to address this challenge. The basic idea of the DeROL algorithm is to train a deep-Q network to obtain a policy which is oblivious to the unseen classes in the testing data. Then, in real-time, DeROL maps the current state of the one-shot learning process to operational actions based on the trained deep-Q network, to maximize the objective function. Second, we develop the first open-source software for practical artificially intelligent one-shot classification systems with limited resources for the benefit of researchers in related fields. 
Third, we present an extensive experimental study using the OMNIGLOT dataset for computer vision tasks and the UNSW-NB15 dataset for intrusion detection tasks that demonstrates the versatility and efficiency of the DeROL framework.\n5th generation networks are envisioned to provide seamless and ubiquitous connection to 1000-fold more devices and is believed to provide ultra-low latency and higher data rates up to tens of Gbps. Different technologies enabling these requirements are being developed including mmWave communications, Massive MIMO and beamforming, Device to Device (D2D) communications and Heterogeneous Networks. D2D communication is a promising technology to enable applications requiring high bandwidth such as online streaming and online gaming etc. It can also provide ultra- low latencies required for applications like vehicle to vehicle communication for autonomous driving. D2D communication can provide higher data rates with high energy efficiency and spectral efficiency compared to conventional communication. The performance benefits of D2D communication can be best achieved when D2D users reuses the spectrum being utilized by the conventional cellular users. This spectrum sharing in a multi-tier heterogeneous network will introduce complex interference among D2D users and cellular users which needs to be resolved. Motivated by limited number of surveys for interference mitigation and resource allocation in D2D enabled heterogeneous networks, we have surveyed different conventional and artificial intelligence based interference mitigation and resource allocation schemes developed in recent years. Our contribution lies in the analysis of conventional interference mitigation techniques and their shortcomings. 
Finally, the strengths of AI based techniques are determined and open research challenges deduced from the recent research are presented.\nThe Artificial Intelligence (AI) revolution foretold of during the 1960s is well underway in the second decade of the 21st century. Its period of phenomenal growth likely lies ahead. Still, we believe, there are crucial lessons that biology can offer that will enable a prosperous future for AI. For machines in general, and for AI's especially, operating over extended periods or in extreme environments will require energy usage orders of magnitudes more efficient than exists today. In many operational environments, energy sources will be constrained. Any plans for AI devices operating in a challenging environment must begin with the question of how they are powered, where fuel is located, how energy is stored and made available to the machine, and how long the machine can operate on specific energy units. Hence, the materials and technologies that provide the needed energy represent a critical challenge towards future use-scenarios of AI and should be integrated into their design. Here we make four recommendations for stakeholders and especially decision makers to facilitate a successful trajectory for this technology. First, that scientific societies and governments coordinate Biomimetic Research for Energy-efficient, AI Designs (BREAD); a multinational initiative and a funding strategy for investments in the future integrated design of energetics into AI. Second, that biomimetic energetic solutions be central to design consideration for future AI. Third, that a pre-competitive space be organized between stakeholder partners and fourth, that a trainee pipeline be established to ensure the human capital required for success in this area.\nConventional methods for visual assessment of civil infrastructures have certain limitations, such as subjectivity of the collected data, long inspection time, and high cost of labor. 
Although some new technologies i.e. robotic techniques that are currently in practice can collect objective, quantified data, the inspectors own expertise is still critical in many instances since these technologies are not designed to work interactively with human inspector. This study aims to create a smart, human centered method that offers significant contributions to infrastructure inspection, maintenance, management practice, and safety for the bridge owners. By developing a smart Mixed Reality framework, which can be integrated into a wearable holographic headset device, a bridge inspector, for example, can automatically analyze a certain defect such as a crack that he or she sees on an element, display its dimension information in real-time along with the condition state. Such systems can potentially decrease the time and cost of infrastructure inspections by accelerating essential tasks of the inspector such as defect measurement, condition assessment and data processing to management systems. The human centered artificial intelligence will help the inspector collect more quantified and objective data while incorporating inspectors professional judgement. This study explains in detail the described system and related methodologies of implementing attention guided semi supervised deep learning into mixed reality technology, which interacts with the human inspector during assessment. Thereby, the inspector and the AI will collaborate or communicate for improved visual inspection.\nPeople frequently face challenging decision-making problems in which outcomes are uncertain or unknown. Artificial intelligence (AI) algorithms exist that can outperform humans at learning such tasks. Thus, there is an opportunity for AI agents to assist people in learning these tasks more effectively. In this work, we use a multi-armed bandit as a controlled setting in which to explore this direction. 
We pair humans with a selection of agents and observe how well each human-agent team performs. We find that team performance can beat both human and agent performance in isolation. Interestingly, we also find that an agent's performance in isolation does not necessarily correlate with the human-agent team's performance. A drop in agent performance can lead to a disproportionately large drop in team performance, or in some settings can even improve team performance. Pairing a human with an agent that performs slightly better than them can make them perform much better, while pairing them with an agent that performs the same can make them them perform much worse. Further, our results suggest that people have different exploration strategies and might perform better with agents that match their strategy. Overall, optimizing human-agent team performance requires going beyond optimizing agent performance, to understanding how the agent's suggestions will influence human decision-making.\nHumor is an essential human trait. Efforts to understand humor have called out links between humor and the foundations of cognition, as well as the importance of humor in social engagement. As such, it is a promising and important subject of study, with relevance for artificial intelligence and human-computer interaction. Previous computational work on humor has mostly operated at a coarse level of granularity, e.g., predicting whether an entire sentence, paragraph, document, etc., is humorous. As a step toward deep understanding of humor, we seek fine-grained models of attributes that make a given text humorous. Starting from the observation that satirical news headlines tend to resemble serious news headlines, we build and analyze a corpus of satirical headlines paired with nearly identical but serious headlines. 
The corpus is constructed via Unfun.me, an online game that incentivizes players to make minimal edits to satirical headlines with the goal of making other players believe the results are serious headlines. The edit operations used to successfully remove humor pinpoint the words and concepts that play a key role in making the original, satirical headline funny. Our analysis reveals that the humor tends to reside toward the end of headlines, and primarily in noun phrases, and that most satirical headlines follow a certain logical pattern, which we term false analogy. Overall, this paper deepens our understanding of the syntactic and semantic structure of satirical news headlines and provides insights for building humor-producing systems.\nPURPOSE OF REVIEW: Despite the impressive results of recent artificial intelligence (AI) applications to general ophthalmology, comparatively less progress has been made toward solving problems in pediatric ophthalmology using similar techniques. This article discusses the unique needs of pediatric ophthalmology patients and how AI techniques can address these challenges, surveys recent applications of AI to pediatric ophthalmology, and discusses future directions in the field.   RECENT FINDINGS: The most significant advances involve the automated detection of retinopathy of prematurity (ROP), yielding results that rival experts. Machine learning (ML) has also been successfully applied to the classification of pediatric cataracts, prediction of post-operative complications following cataract surgery, detection of strabismus and refractive error, prediction of future high myopia, and diagnosis of reading disability via eye tracking. In addition, ML techniques have been used for the study of visual development, vessel segmentation in pediatric fundus images, and ophthalmic image synthesis.   
SUMMARY: AI applications could significantly benefit clinical care for pediatric ophthalmology patients by optimizing disease detection and grading, broadening access to care, furthering scientific discovery, and improving clinical efficiency. These methods need to match or surpass physician performance in clinical trials before deployment with patients. Due to widespread use of closed-access data sets and software implementations, it is difficult to directly compare the performance of these approaches, and reproducibility is poor. Open-access data sets and software implementations could alleviate these issues, and encourage further AI applications to pediatric ophthalmology.   KEYWORDS: pediatric ophthalmology, machine learning, artificial intelligence, deep learning\nThis paper investigates a paradigm for offering artificial intelligence as a service (AI-aaS) on software-defined infrastructures (SDIs). The increasing complexity of networking and computing infrastructures is already driving the introduction of automation in networking and cloud computing management systems. Here we consider how these automation mechanisms can be leveraged to offer AI-aaS. Use cases for AI-aaS are easily found in addressing smart applications in sectors such as transportation, manufacturing, energy, water, air quality, and emissions. We propose an architectural scheme based on SDIs where each AI-aaS application is comprised of a monitoring, analysis, policy, execution plus knowledge (MAPE-K) loop (MKL). Each application is composed as one or more specific service chains embedded in SDI, some of which will include a Machine Learning (ML) pipeline. Our model includes a new training plane and an AI-aaS plane to deal with the model-development and operational phases of AI applications. We also consider the role of an ML/MKL sandbox in ensuring coherency and consistency in the operation of multiple parallel MKL loops. 
We present experimental measurement results for three AI-aaS applications deployed on the SAVI testbed: 1. Compressing monitored data in SDI using autoencoders; 2. Traffic monitoring to allocate CPUs resources to VNFs; and 3. Highway segment classification in smart transportation.\nWe introduce DaiMoN, a decentralized artificial intelligence model network, which incentivizes peer collaboration in improving the accuracy of machine learning models for a given classification problem. It is an autonomous network where peers may submit models with improved accuracy and other peers may verify the accuracy improvement. The system maintains an append-only decentralized ledger to keep the log of critical information, including who has trained the model and improved its accuracy, when it has been improved, by how much it has improved, and where to find the newly updated model. DaiMoN rewards these contributing peers with cryptographic tokens. A main feature of DaiMoN is that it allows peers to verify the accuracy improvement of submitted models without knowing the test labels. This is an essential component in order to mitigate intentional model overfitting by model-improving peers. To enable this model accuracy evaluation with hidden test labels, DaiMoN uses a novel learnable Distance Embedding for Labels (DEL) function proposed in this paper. Specific to each test dataset, DEL scrambles the test label vector by embedding it in a low-dimension space while approximately preserving the distance between the dataset's test label vector and a label vector inferred by the classifier. It therefore allows proof-of-improvement (PoI) by peers without providing them access to true test labels. We provide analysis and empirical evidence that under DEL, peers can accurately assess model accuracy. We also argue that it is hard to invert the embedding function and thus, DEL is resilient against attacks aiming to recover test labels in order to cheat. 
Our prototype implementation of DaiMoN is available at https://github.com/steerapi/daimon.\nData-driven computational approaches have evolved to enable extraction of information from medical images with a reliability, accuracy and speed which is already transforming their interpretation and exploitation in clinical practice. While similar benefits are longed for in the field of interventional imaging, this ambition is challenged by a much higher heterogeneity. Clinical workflows within interventional suites and operating theatres are extremely complex and typically rely on poorly integrated intra-operative devices, sensors, and support infrastructures. Taking stock of some of the most exciting developments in machine learning and artificial intelligence for computer assisted interventions, we highlight the crucial need to take context and human factors into account in order to address these challenges. Contextual artificial intelligence for computer assisted intervention, or CAI4CAI, arises as an emerging opportunity feeding into the broader field of surgical data science. Central challenges being addressed in CAI4CAI include how to integrate the ensemble of prior knowledge and instantaneous sensory information from experts, sensors and actuators; how to create and communicate a faithful and actionable shared representation of the surgery among a mixed human-AI actor team; how to design interventional systems and associated cognitive shared control schemes for online uncertainty-aware collaborative decision making ultimately producing more precise and reliable interventions.\nThe potential of Artificial Intelligence (AI) to tackle challenging problems that afflict society is enormous, particularly in the areas of healthcare, conservation and public safety and security. 
Many problems in these domains involve harnessing social networks of under-served communities to enable positive change, e.g., using social networks of homeless youth to raise awareness about Human Immunodeficiency Virus (HIV) and other STDs. Unfortunately, most of these real-world problems are characterized by uncertainties about social network structure and influence models, and previous research in AI fails to sufficiently address these uncertainties. This thesis addresses these shortcomings by advancing the state-of-the-art to a new generation of algorithms for interventions in social networks. In particular, this thesis describes the design and development of new influence maximization algorithms which can handle various uncertainties that commonly exist in real-world social networks. These algorithms utilize techniques from sequential planning problems and social network theory to develop new kinds of AI algorithms. Further, this thesis also demonstrates the real-world impact of these algorithms by describing their deployment in three pilot studies to spread awareness about HIV among actual homeless youth in Los Angeles. This represents one of the first-ever deployments of computer science based influence maximization algorithms in this domain. Our results show that our AI algorithms improved upon the state-of-the-art by 160% in the real-world. We discuss research and implementation challenges faced in deploying these algorithms, and lessons that can be gleaned for future deployment of such algorithms. The positive results from these deployments illustrate the enormous potential of AI in addressing societally relevant problems.\n"
  },
  {
    "path": "data/arxiv/artificial intelligence_134_15000_200_title.txt",
    "content": "Seeding the Singularity for A.I\nUse of Artificial Intelligence Techniques / Applications in Cyber  Defense\nArtificial Intelligence for Interstellar Travel\nOne Decade of Universal Artificial Intelligence\nTheory of Cognitive Relativity: A Promising Paradigm for True AI\nEdge Intelligence: Paving the Last Mile of Artificial Intelligence with  Edge Computing\nArtificial Intelligence and its Role in Near Future\nEnaction-Based Artificial Intelligence: Toward Coevolution with Humans  in the Loop\nIntelligence in Artificial Intelligence\nThe SP Theory of Intelligence as a Foundation for the Development of a  General, Human-Level Thinking Machine\nOpen Ended Intelligence: The individuation of Intelligent Agents\nAn architecture for the evaluation of intelligent systems\nAnalysis of first prototype universal intelligence tests: evaluating and  comparing AI algorithms and humans\nA Model for General Intelligence\nOn the Measure of Intelligence\nTowards Self-constructive Artificial Intelligence: Algorithmic basis  (Part I)\nRobotics Rights and Ethics Rules\nAn Unified Intelligence-Communication Model for Multi-Agent System  Part-I: Overview\nWhy Artificial Intelligence Needs a Task Theory --- And What It Might  Look Like\nComparison of Artificial Intelligence Techniques for Project Conceptual  Cost Prediction\nLandau Theory of Adaptive Integration in Computational Intelligence\nAffect Control Processes: Intelligent Affective Interaction using a  Partially Observable Markov Decision Process\nImprobotics: Exploring the Imitation Game using Machine Intelligence in  Improvised Theatre\nA Model for Web-Intelligence Index to Evaluate the Web Intelligence  Capacity of Government Web Sites of Sri Lanka\nAlgorithms and Complexity Results for Persuasive Argumentation\n\"Dave...I can assure you...that it's going to be all right...\" -- A  definition, case for, and survey of algorithmic assurances in human-autonomy  trust relationships\nHows and Whys of Artificial 
Intelligence for Public Sector Decisions:  Explanation and Evaluation\nArtificial Intelligence: A Child's Play\nAnalysis of Algorithms and Partial Algorithms\nArtificial Immune Systems Tutorial\nMinimally Naturalistic Artificial Intelligence\nFocus Group on Artificial Intelligence for Health\nNeocortical plasticity: an unsupervised cake but no free lunch\nAutonomous robots and the SP theory of intelligence\nA World of Views: A World of Interacting Post-human Intelligences\nIntelligent User Interface in Fuzzy Environment\nApplying Artificial Intelligence and Internet Techniques in Rural  Tourism Domain\nThe Future of Scientific Simulations: from Artificial Life to Artificial  Cosmogenesis\nDefinition and properties to assess multi-agent environments as social  intelligence tests\nSingle photon in hierarchical architecture for physical reinforcement  learning: Photon intelligence\nHow to advance general game playing artificial intelligence by player  modelling\nArtificial Neural Networks-Based Machine Learning for Wireless Networks:  A Tutorial\nCan Autism be Catered with Artificial Intelligence-Assisted Intervention  Technology? 
A Literature Review\nUsing Artificial Intelligence to Support Compliance with the General  Data Protection Regulation\nWikistat 2.0: Educational Resources for Artificial Intelligence\nDesigning Trustworthy AI: A Human-Machine Teaming Framework to Guide  Development\nArtificial Intelligence and the Future of Psychiatry: Qualitative  Findings from a Global Physician Survey\nComputing as compression: the SP theory of intelligence\nTowards the Augmented Pathologist: Challenges of Explainable-AI in  Digital Pathology\nReputation in M2M Economy\nArtificial general intelligence through recursive data compression and  grounded reasoning: a position paper\nLifelong Testing of Smart Autonomous Systems by Shepherding a Swarm of  Watchdog Artificial Intelligence Agents\nAn electronic-game framework for evaluating coevolutionary algorithms\nCloudifierNet -- Deep Vision Models for Artificial Image Processing\nCan Turing machine be curious about its Turing test results? Three  informal lectures on physics of intelligence\nIntelligent Traffic Signal Control: Using Reinforcement Learning with  Partial Detection\nWhen Autonomous Intelligent Goodware will Fight Autonomous Intelligent  Malware: A Possible Future of Cyber Defense\nPast Visions of Artificial Futures: One Hundred and Fifty Years under  the Spectre of Evolving Machines\nAutonomous development and learning in artificial intelligence and  robotics: Scaling up deep learning to human--like learning\nResponse to NITRD, NCO, NSF Request for Information on \"Update to the  2016 National Artificial Intelligence Research and Development Strategic  Plan\"\nMachine learning \\& artificial intelligence in the quantum domain\nProvably Bounded-Optimal Agents\nArtificial Intelligence Based Cognitive Routing for Cognitive Radio  Networks\nThe Morphospace of Consciousness\nCognitive Database: A Step towards Endowing Relational Databases with  Artificial Intelligence Capabilities\nThere is no Artificial General Intelligence\nFrom 
Crystallized Adaptivity to Fluid Adaptivity in Deep Reinforcement  Learning -- Insights from Biological Systems on Adaptive Flexibility\nArtificial Immune Systems\nAdaptive Parallel Iterative Deepening Search\nElementos de ingeniería de explotación de la información aplicados  a la investigación tributaria fiscal\nThe option pricing model based on time values: an application of the  universal approximation theory on unbounded domains\nA Distributed AI Aided 3D Domino Game\nA study on non-destructive method for detecting Toxin in pepper using  Neural networks\nThe SP theory of intelligence: benefits and applications\nWhat is Learning? A primary discussion about information and  Representation\nAIXIjs: A Software Demo for General Reinforcement Learning\nMultimodal Intelligence: Representation Learning, Information Fusion,  and Applications\nLearning, Social Intelligence and the Turing Test - why an  \"out-of-the-box\" Turing Machine will not pass the Turing Test\nDeep RTS: A Game Environment for Deep Reinforcement Learning in  Real-Time Strategy Games\nHow the symbol grounding of living organisms can be realized in  artificial agents\nResponses to a Critique of Artificial Moral Agents\nArtificial Intelligence Probes for Interstellar Exploration and  Colonization\nNovel Artificial Human Optimization Field Algorithms - The Beginning\nAutomatic Programming of Cellular Automata and Artificial Neural  Networks Guided by Philosophy\nCausality and deceit: Do androids watch action movies?\nEvidence Feed Forward Hidden Markov Model: A New Type of Hidden Markov  Model\nDesign and implementation of computational platform for social-humanoid  robot Lumen as an exhibition guide in Electrical Engineering Days 2015\nFML-based Dynamic Assessment Agent for Human-Machine Cooperative System  on Game of Go\nIntelligent Traffic Light Control Using Distributed Multi-agent Q  Learning\nAutonomous Intelligent Cyber-defense Agent (AICA) Reference  Architecture. 
Release 2.0\nThe Responsibility Quantification (ResQu) Model of Human Interaction  with Automation\nOpenEI: An Open Framework for Edge Intelligence\nRandom Worlds and Maximum Entropy\nA Uniform Framework for Concept Definitions in Description Logics\nSynthesizing Customized Planners from Specifications\nAn Average Analysis of Backtracking on Random Constraint Satisfaction  Problems\nThe tractability of CSP classes defined by forbidden patterns\nImplementing Human-like Intuition Mechanism in Artificial Intelligence\nUse of Markov Chains to Design an Agent Bidding Strategy for Continuous  Double Auctions\nMerging Knowledge Bases in Possibilistic Logic by Lexicographic  Aggregation\nA Study of Scaling Issues in Bayesian Belief Networks for Ship  Classification\nA Bayesian Variant of Shafer's Commonalities For Modelling Unforeseen  Events\nAn Empirical Evaluation of a Randomized Algorithm for Probabilistic  Inference\nA Decision-Theoretic Model for Using Scientific Data\nGeospatial Narratives and their Spatio-Temporal Dynamics: Commonsense  Reasoning for High-level Analyses in Geographic Information Systems\nGOTCHA Password Hackers!\nPrime Implicates and Prime Implicants: From Propositional to Modal Logic\nCase-Based Subgoaling in Real-Time Heuristic Search for Video Game  Pathfinding\nOnline Speedup Learning for Optimal Planning\nEthical Artificial Intelligence\nThe Limitations of Standardized Science Tests as Benchmarks for  Artificial Intelligence Research: Position Paper\nUsing Automated Theorem Provers to Teach Knowledge Representation in  First-Order Logic\nSome Epistemological Problems with the Knowledge Level in Cognitive  Architectures\nMeta-learning within Projective Simulation\nTracing Linguistic Relations in Winning and Losing Sides of Explicit  Opposing Groups\nMulti-step Reinforcement Learning: A Unifying Algorithm\nA Popperian Falsification of Artificial Intelligence - Lighthill  Defended\nReceptor uptake arrays for vitamin B12, siderophores and glycans 
shape  bacterial communities\nHow Important is Syntactic Parsing Accuracy? An Empirical Evaluation on  Rule-Based Sentiment Analysis\nTowards a Deep Reinforcement Learning Approach for Tower Line Wars\nNull Dynamical State Models of Human Cognitive Dysfunction\nAugmented Artificial Intelligence: a Conceptual Framework\nA comparative study of artificial intelligence and human doctors for the  purpose of triage and diagnosis\nDeep Reinforcement One-Shot Learning for Artificially Intelligent  Classification Systems\nA Survey of Conventional and Artificial Intelligence / Learning based  Resource Allocation and Interference Mitigation Schemes in D2D Enabled  Networks\nMaking BREAD: Biomimetic strategies for Artificial Intelligence Now and  in the Future\nArtificial Intelligence Assisted Infrastructure Assessment Using Mixed  Reality Systems\nHuman-AI Learning Performance in Multi-Armed Bandits\nReverse-Engineering Satire, or \"Paper on Computational Humor Accepted  Despite Making Serious Advances\"\nArtificial Intelligence for Pediatric Ophthalmology\nArtificial Intelligence as a Services (AI-aaS) on Software-Defined  Infrastructure\nDaiMoN: A Decentralized Artificial Intelligence Model Network\nCAI4CAI: The Rise of Contextual Artificial Intelligence in Computer  Assisted Interventions\nArtificial Intelligence for Low-Resource Communities: Influence  Maximization in an Uncertain World\n"
  },
  {
    "path": "data/arxiv/computer vision_14582_15000_15_title.txt",
    "content": "Multiband NFC for High-Throughput Wireless Computer Vision Sensor  Network\nReal-time Tracking Based on Neuromrophic Vision\nReconfiguring the Imaging Pipeline for Computer Vision\nI'm sorry to say, but your understanding of image processing  fundamentals is absolutely wrong\nA Survey on Deep Learning Methods for Robot Vision\nNegative Results in Computer Vision: A Perspective\nCloudCV: Large Scale Distributed Computer Vision as a Cloud Service\nComputers Should Be Uniters Not Dividers: A Vision of Computer-Enhanced  Happy Future\nThe Evolution of First Person Vision Methods: A Survey\nEuphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous  Vision\nThe Informed Sampler: A Discriminative Approach to Bayesian Inference in  Generative Computer Vision Models\nCrowdsourcing in Computer Vision\nRecent Advances in Transient Imaging: A Computer Graphics and Vision  Perspective\nMotion Segmentation by SCC on the Hopkins 155 Database\nVector Quantization for Machine Vision\nMinimal Problems for the Calibrated Trifocal Variety\nLearning Inference Models for Computer Vision\nSparse models for Computer Vision\nUniversal representations:The missing link between faces, text,  planktons, and cat breeds\nA Survey on Recent Advances of Computer Vision Algorithms for Egocentric  Video\nCS591 Report: Application of siamesa network in 2D transformation\nUtilizing Large Scale Vision and Text Datasets for Image Segmentation  from Referring Expressions\nOn-Board Vision Processing For Small UAVs: Time to Rethink Strategy\nComputer Vision Systems in Road Vehicles: A Review\n$EVA^2$ : Exploiting Temporal Redundancy in Live Computer Vision\nAddressing the non-functional requirements of computer vision systems: A  case study\nSideEye: A Generative Neural Network Based Simulator of Human Peripheral  Vision\nToward Designing Intelligent PDEs for Computer Vision: An Optimal  Control Approach\nActions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for  
Visual Recognition\nRobotics Vision-based Heuristic Reasoning for Underwater Target Tracking  and Navigation\nA Possible Model of Noise Enhanced Visual Perception in Human Vision\nAccelerated Convolutions for Efficient Multi-Scale Time to Contact  Computation in Julia\nRobust Fitting in Computer Vision: Easy or Hard?\nA real time vehicles detection algorithm for vision based sensors\nAnnotation Methodologies for Vision and Language Dataset Creation\nA Taxonomy of Deep Convolutional Neural Nets for Computer Vision\nFast k Nearest Neighbor Search using GPU\nWhen Image Denoising Meets High-Level Vision Tasks: A Deep Learning  Approach\nExploiting skeletal structure in computer vision annotation with Benders  decomposition\nSimulations for Validation of Vision Systems\nAccelerating Convolutional Neural Networks for Continuous Mobile Vision  via Cache Reuse\nMarket-Oriented Cloud Computing: Vision, Hype, and Reality for  Delivering IT Services as Computing Utilities\nComplex Networks: New Concepts and Tools for Real-Time Imaging and  Vision\nAnalyzing the Performance of Multilayer Neural Networks for Object  Recognition\nLearning a Loopy Model For Semantic Segmentation Exactly\nsiftservice.com - Turning a Computer Vision algorithm into a World Wide  Web Service\ngvnn: Neural Network Library for Geometric Computer Vision\nDirty Pixels: Optimizing Image Classification Architectures for Raw  Sensor Data\nDesign and Evaluation of Vision-based Head and Face Tracking Interfaces  for Assistive Input\nChainerCV: a Library for Deep Learning in Computer Vision\nOpenmv: A Python powered, extensible machine vision camera\nAdversarial Examples that Fool both Human and Computer Vision\nThe Affine Transforms for Image Enhancement in the Context of  Logarithmic Models\nDimensionApp : android app to estimate object dimensions\nDeep Learning applied to NLP\nInterstitial Content Detection\nSecrets in Computing Optical Flow by Convolutional Networks\nEdge Computing and Dynamic Vision 
Sensing for Low Delay Access to Visual  Medical Information\nConsistent sets of lines with no colorful incidence\nConverting Static Image Datasets to Spiking Neuromorphic Datasets Using  Saccades\nIs Image Super-resolution Helpful for Other Vision Tasks?\nPerformance Evaluation of Vision-Based Algorithms for MAVs\nActive Control of Camera Parameters for Object Detection Algorithms\nWhat Next? A Dozen Information-Technology Research Goals\nSelective Image Super-Resolution\nComputer Vision Accelerators for Mobile Systems based on OpenCL GPGPU  Co-Processing\nHuman perception in computer vision\nInspiring Computer Vision System Solutions\nEmbedded Platforms for Computer Vision-based Advanced Driver Assistance  Systems: a Survey\nTowards Closing the Energy Gap Between HOG and CNN Features for Embedded  Vision\nUtility-Based Control for Computer Vision\nA survey on Human Computer Interaction Mechanism Using Finger Tracking\nImage Analysis in Astronomy for very large vision machine\nOn the closed-form solution of the rotation matrix arising in computer  vision problems\nA Hilbert Scheme in Computer Vision\nDomain Adaptations for Computer Vision Applications\nThe Chow Form of the Essential Variety in Computer Vision\nReflection Invariance: an important consideration of image orientation\nParameterizing Region Covariance: An Efficient Way To Apply Sparse Codes  On Second Order Statistics\nResearchDoom and CocoDoom: Learning Computer Vision with Games\nReplicator Equation: Applications Revisited\nTwo Hilbert schemes in computer vision\nAn Extremely Efficient Chess-board Detection for Non-trivial Photos\nUnderstanding the visual speech signal\nApplications of a Graph Theoretic Based Clustering Framework in Computer  Vision and Pattern Recognition\nFine-grained Activity Recognition in Baseball Videos\nAn efficient circle detection scheme in digital images using ant system  algorithm\nGPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network  Training\nReview on 
Computer Vision Techniques in Emergency Situation\nRevisiting Unreasonable Effectiveness of Data in Deep Learning Era\n3D Visual Tracking with Particle and Kalman Filters\nAutomatic liver segmentation method in CT images\nA Stochastic Grammar for Natural Shapes\nA Minimal Six-Point Auto-Calibration Algorithm\nOn Minimal Accuracy Algorithm Selection in Computer Vision and  Intelligent Systems\nComparisons of Reasoning Mechanisms for Computer Vision\nTalk to the Hand: Generating a 3D Print from Photographs\nAlgebraic Relations and Triangulation of Unlabeled Image Points\nSolutions of Quadratic First-Order ODEs applied to Computer Vision  Problems\nThin Structure Estimation with Curvature Regularization\nAligned Image-Word Representations Improve Inductive Transfer Across  Vision-Language Tasks\nThe Incremental Multiresolution Matrix Factorization Algorithm\nBioTracker: An Open-Source Computer Vision Framework for Visual Animal  Tracking\nCloud Computing framework for Computer Vision Research:An Introduction\nIntelligent Indoor Mobile Robot Navigation Using Stereo Vision\nRobot Vision Architecture for Autonomous Clothes Manipulation\nAn Image Processing based Object Counting Approach for Machine Vision  Application\nVirtual Worlds as Proxy for Multi-Object Tracking Analysis\nRobust Deformation Estimation in Wood-Composite Materials using  Variational Optical Flow\nRobust positioning of drones for land use monitoring in strong terrain  relief using vision-based navigation\nUse of Computer Vision to Detect Tangles in Tangled Objects\nDistributed and Parallel Net Imaging\nIllustrating Color Evolution and Color Blindness by the Decoding Model  of Color Vision\nVision-based Human Gender Recognition: A Survey\nEfficient Point-to-Subspace Query in $\\ell^1$: Theory and Applications  in Computer Vision\nClustering Learning for Robotic Vision\nPooling-Invariant Image Feature Learning\nRobust Subspace Recovery via Bi-Sparsity Pursuit\nChallenges in video based object detection 
in maritime scenario using  computer vision\nIAT - Image Annotation Tool: Manual\nAn Approach for Noise Removal on Depth Images\nColor Homography Color Correction\nTackling Corruption With Agents & ICT: A Vision\nAn Analysis of Action Recognition Datasets for Language and Vision Tasks\nUsing Artificial Tokens to Control Languages for Multilingual Image  Caption Generation\nCommonsense Scene Semantics for Cognitive Robotics: Towards Grounding  Embodied Visuo-Locomotive Interactions\nSpeedMachines: Anytime Structured Prediction\nMahotas: Open source software for scriptable computer vision\nCrowd Behavior Analysis: A Review where Physics meets Biology\nEfficient Clustering on Riemannian Manifolds: A Kernelised Random  Projection Approach\nDeep Learning with Energy-efficient Binary Gradient Cameras\nAn Empirical Study of Adequate Vision Span for Attention-Based Neural  Machine Translation\nTasselNet: Counting maize tassels in the wild via local counts  regression network\nTowards a Crowd Analytic Framework For Crowd Management in  Majid-al-Haram\nA Computer Vision System to Localize and Classify Wastes on the Streets\nQRkit: Sparse, Composable QR Decompositions for Efficient and Stable  Solutions to Problems in Computer Vision\nModel Validation for Vision Systems via Graphics Simulation\nTowards Practical Verification of Machine Learning: The Case of Computer  Vision Systems\nLeader Follower Formation Control of Ground Vehicles Using Camshift  Based Guidance\nSimplified vision based automatic navigation for wheat harvesting in low  income economies\nAssessing The Performance Bounds Of Local Feature Detectors: Taking  Inspiration From Electronics Design Practices\nA Survey of Current Datasets for Vision and Language Research\nCalorie Counter: RGB-Depth Visual Estimation of Energy Expenditure at  Home\nEvent-based Vision meets Deep Learning on Steering Prediction for  Self-driving Cars\nSystem-theoretic approach to image interest point detection\nProcams-Based 
Cybernetics\nComputing the Stereo Matching Cost with a Convolutional Neural Network\nNeRD: a Neural Response Divergence Approach to Visual Salience Detection\nCompression Rate Method for Empirical Science and Application to  Computer Vision\nSupervised Descent Method for Solving Nonlinear Least Squares Problems  in Computer Vision\nA Standalone Markerless 3D Tracker for Handheld Augmented Reality\nReal Time Object Tracking Based on Inter-frame Coding: A Review\nDeep Convolutional Neural Networks for Microscopy-Based Point of Care  Diagnostics\nTemplate Matching based Object Detection Using HOG Feature Pyramid\nAn Empirical Evaluation of Deep Learning on Highway Driving\nMedian and Mode Ellipse Parameterization for Robust Contour Fitting\nDefensive Distillation is Not Robust to Adversarial Examples\nCompensating for Large In-Plane Rotations in Natural Images\nEmotioNet Challenge: Recognition of facial expressions of emotion in the  wild\nComputer Vision for Autonomous Vehicles: Problems, Datasets and  State-of-the-Art\nComputer vision-based food calorie estimation: dataset, method, and  experiment\nExploration of object recognition from 3D point cloud\nSim4CV: A Photo-Realistic Simulator for Computer Vision Applications\nA Generative Restricted Boltzmann Machine Based Method for  High-Dimensional Motion Data Modeling\nDeepScores -- A Dataset for Segmentation, Detection and Classification  of Tiny Objects\nSome considerations on how the human brain must be arranged in order to  make its replication in a thinking machine possible\nResource-Aware Programming for Robotic Vision\nArch2030: A Vision of Computer Architecture Research over the Next 15  Years\nMidgar: Detection of people through computer vision in the Internet of  Things scenarios to improve the security in Smart Cities, Smart Towns, and  Smart Homes\nDifferential Invariants under Gamma Correction\nA Conjecture about a \"vision\" model for blind men\nFree actions and Grassmanian variety\nApplication of the 
SP theory of intelligence to the understanding of  natural vision and the development of computer vision\nTexture Defect Detection in Gradient Space\nFast and numerically stable circle fit\nExamining Representational Similarity in ConvNets and the Primate Visual  Cortex\nRecovery of structure of looped jointed objects from multiframes\nProjective reconstruction in algebraic vision\nIntegral Images: Efficient Algorithms for Their Computation and Storage  in Resource-Constrained Embedded Vision Systems\nWhat Will I Do Next? The Intention from Motion Experiment\nComplex Networks, Simple Vision\nEntropy And Vision\nIsometric Embeddings in Imaging and Vision: Facts and Fiction\nPicture Collage with Genetic Algorithm and Stereo vision\nA Unified Quantitative Model of Vision and Audition\nAraguaia Medical Vision Lab at ISIC 2017 Skin Lesion Classification  Challenge\nSemantic Robot Vision Challenge: Current State and Future Directions\nModel-based Influence Diagrams for Machine Vision\n3D Vision Guided Robotic Charging Station for Electric and Plug-in  Hybrid Vehicles\nVision Based Game Development Using Human Computer Interaction\nObject Detection Through Exploration With A Foveated Visual Field\nThe Digital Humanities Unveiled: Perceptions Held by Art Historians and  Computer Scientists about Computer Vision Technology\nHardware based Scale- and Rotation-Invariant Feature Extraction: A  Retrospective Analysis and Future Directions\nROI Segmentation for Feature Extraction from Human Facial Images\nCausal graph-based video segmentation\nA Survey on Eye-Gaze Tracking Techniques\nA Novel Method for Vectorization\nHierarchical structure-and-motion recovery from uncalibrated images\nResnet in Resnet: Generalizing Residual Architectures\nTraining Sparse Neural Networks\nA Novel Approach for Image Segmentation based on Histograms computed  from Hue-data\nWidening siamese architectures for stereo matching\nKernel Spectral Curvature Clustering (KSCC)\nToward Parts-Based Scene 
Understanding with Pixel-Support Parts-Sparse  Pictorial Structures\nDesigning an FPGA Synthesizable Computer Vision Algorithm to Detect the  Greening of Potatoes\nA Cognitive Model for Humanoid Robot Navigation and Mapping using  Alderbaran NAO\nSparse Modeling for Image and Vision Processing\nOn Vectorization of Deep Convolutional Neural Networks for Vision Tasks\nPredicting Motivations of Actions by Leveraging Text\nEdge Detection for Pattern Recognition: A Survey\nDeep Motion Features for Visual Tracking\nWhat Uncertainties Do We Need in Bayesian Deep Learning for Computer  Vision?\nCoarse-to-Fine Lifted MAP Inference in Computer Vision\nCNN Fixations: An unraveling approach to visualize the discriminative  image regions\nAutomated Identification of Trampoline Skills Using Computer Vision  Extracted Pose Estimation\nSemantic 3D Reconstruction with Finite Element Bases\nTraining and Testing Object Detectors with Virtual Images\nA Unifying Contrast Maximization Framework for Event Cameras, with  Applications to Motion, Depth, and Optical Flow Estimation\nVision-and-Language Navigation: Interpreting visually-grounded  navigation instructions in real environments\nRetinal Vessel Segmentation Using A New Topological Method\nCompetitive Analysis of Minimum-Cut Maximum Flow Algorithms in Vision  Problems\nPixels to Voxels: Modeling Visual Representation in the Human Brain\nPlay and Learn: Using Video Games to Train Computer Vision Models\nA Simple, Fast and Highly-Accurate Algorithm to Recover 3D Shape from 2D  Landmarks on a Single Image\nDeepCorrect: Correcting DNN models against Image Distortions\nFirst-Order Modeling and Stability Analysis of Illusory Contours\nBregman Divergences for Infinite Dimensional Covariance Matrices\nMemory-Efficient Design Strategy for a Parallel Embedded Integral Image  Computation Engine\nAirDraw: Leveraging Smart Watch Motion Sensors for Mobile Human Computer  Interactions\nField Geology with a Wearable Computer: First Results of the 
Cyborg  Astrobiologist System\nRethinking the Inception Architecture for Computer Vision\nABHIVYAKTI: A Vision Based Intelligent System for Elder and Sick Persons\nImplementation of a Vision System for a Landmine Detecting Robot Using  Artificial Neural Network\nOn Considering Uncertainty and Alternatives in Low-Level Vision\nModeling Radiometric Uncertainty for Vision with Tone-mapped Color  Images\nDesign and Analysis of a Single-Camera Omnistereo Sensor for Quadrotor  Micro Aerial Vehicles (MAVs)\nGeneric decoding of seen and imagined objects using hierarchical visual  features\nDetecting People in Cubist Art\nWhat value do explicit high level concepts have in vision to language  problems?\nTo Know Where We Are: Vision-Based Positioning in Outdoor Environments\nMöbius Invariants of Shapes and Images\nSet-Point Regulation of Linear Continuous-Time Systems using  Neuromorphic Vision Sensors\nLearning Grimaces by Watching TV\nBridging between Computer and Robot Vision through Data Augmentation: a  Case Study on Object Recognition\nTowards Vision-Based Smart Hospitals: A System for Tracking and  Monitoring Hand Hygiene Compliance\nUtilizing Semantic Visual Landmarks for Precise Vehicle Navigation\nNot-So-CLEVR: Visual Relations Strain Feedforward Neural Networks\nLook Before You Leap: Bridging Model-Free and Model-Based Reinforcement  Learning for Planned-Ahead Vision-and-Language Navigation\nIncremental Color Quantization for Color-Vision-Deficient Observers  Using Mobile Gaming Data\nA Neural Markovian Concurrent Image Labeling Algorithm\nOn the Cell-based Complexity of Recognition of Bounded Configurations by  Finite Dynamic Cellular Automata\nFast Planar Correlation Clustering for Image Segmentation\nGender Recognition in Walk Gait through 3D Motion by Quadratic Bezier  Curve and Statistical Techniques\nFingertip Detection: A Fast Method with Natural Hand\nContent Based Image Retrieval System Using NOHIS-tree\nAn Integrated System for 3D Gaze Recovery and 
Semantic Analysis of Human  Attention\nRigid-Motion Scattering for Texture Classification\nA Review of Image Mosaicing Techniques\nAnalysis of Gait Pattern to Recognize the Human Activities\nClustering Approach Towards Image Segmentation: An Analytical Study\nLearning Multi-Scale Representations for Material Classification\nA Robust Point Sets Matching Method\nAttributes as Semantic Units between Natural Language and Visual  Recognition\nHands Deep in Deep Learning for Hand Pose Estimation\nOn Computing the Translations Norm in the Epipolar Graph\nLearning to Compare Image Patches via Convolutional Neural Networks\nRigid Multiview Varieties\nThe use of deep learning in image segmentation, classification and  detection\nA probabilistic tour of visual attention and gaze shift computational  models\nInferring low-dimensional microstructure representations using  convolutional neural networks\nTowards Applying the OPRA Theory to Shape Similarity\nFast Convolutional Sparse Coding in the Dual Domain\nCan you find a face in a HEVC bitstream?\nGoing Further with Point Pair Features\nKeypoint-based object tracking and localization using networks of  low-power embedded smart cameras\nSynthesizing Novel Pairs of Image and Text\nAttention on Attention: Architectures for Visual Question Answering  (VQA)\nModeling Visual Information Processing in Brain: A Computer Vision Point  of View and Approach\nThe Event-Camera Dataset and Simulator: Event-based Data for Pose  Estimation, Visual Odometry, and SLAM\nA New Computational Framework For 2D Shape-Enclosing Contours\nQuantized Convolutional Neural Networks for Mobile Devices\nA clever elimination strategy for efficient minimal solvers\nHand Gesture Real Time Paint Tool - Box\nA computer verified, monadic, functional implementation of the integral\nA Meta-Theory of Boundary Detection Benchmarks\nViTac: Feature Sharing between Vision and Tactile Sensing for Cloth  Texture Recognition\nLearning to Transfer Privileged 
Information\nWhen Computer Vision Gazes at Cognition\nInvestigating Natural Image Pleasantness Recognition using Deep Features  and Eye Tracking for Loosely Controlled Human-computer Interaction\nMultiscale Computing in the Exascale Era\nHumans and deep networks largely agree on which kinds of variation make  object recognition harder\nGeometric Analysis of the Conformal Camera for Intermediate-Level Vision  and Perisaccadic Perception\nModeling Instantaneous Changes In Natural Scenes\nIterative Grassmannian Optimization for Robust Image Alignment\nA Review of Co-saliency Detection Technique: Fundamentals, Applications,  and Challenges\nASP Vision: Optically Computing the First Layer of Convolutional Neural  Networks using Angle Sensitive Pixels\nAlgorithmic Performance-Accuracy Trade-off in 3D Vision Applications  Using HyperMapper\nConvolutional Networks in Visual Environments\nCyborg Systems as Platforms for Computer-Vision Algorithm-Development  for Astrobiology\nImage decomposition with anisotropic diffusion applied to leaf-texture  analysis\nGraph Degree Linkage: Agglomerative Clustering on a Directed Graph\nA Computer Vision System for Attention Mapping in SLAM based 3D Models\nEfficient Regularization of Squared Curvature\nOn Learning Where To Look\nWhat you need to know about the state-of-the-art computational models of  object-vision: A tour through the models\nMeasurement of Road Traffic Parameters Based on Multi-Vehicle Tracking\nEnhancing Feature Tracking With Gyro Regularization\nVisualization Regularizers for Neural Network based Image Recognition\nFrom Pixels to Sentiment: Fine-tuning CNNs for Visual Sentiment  Prediction\nInvariant Representative Cocycles of Cohomology Generators using  Irregular Graph Pyramids\nA Fast Semidefinite Approach to Solving Binary Quadratic Problems\nSparse Coding and Dictionary Learning for Symmetric Positive Definite  Matrices: A Kernel Approach\nCoMOGrad and PHOG: From Computer Vision to Fast and Accurate Protein  
Tertiary Structure Retrieval\nEstimating the Potential Speedup of Computer Vision Applications on  Embedded Multiprocessors\nSuperpixelizing Binary MRF for Image Labeling Problems\nOrigami: A 803 GOp/s/W Convolutional Network Accelerator\nTotal Variation Applications in Computer Vision\nCan we unify monocular detectors for autonomous driving by using the  pixel-wise semantic segmentation of CNNs?\nTutorial on Answering Questions about Images with Deep Learning\nVisual Question Answering: Datasets, Algorithms, and Future Challenges\nPhotographic home styles in Congress: a computer vision approach\nFace-to-BMI: Using Computer Vision to Infer Body Mass Index on Social  Media\nBeyond Planar Symmetry: Modeling human perception of reflection and  rotation symmetries in the wild\nOn human motion prediction using recurrent neural networks\nFacial Affect Estimation in the Wild Using Deep Residual and  Convolutional Networks\nEnhanced discrete particle swarm optimization path planning for UAV  vision-based surface inspection\nDetection of curved lines with B-COSFIRE filters: A case study on crack  delineation\nDockerface: an Easy to Install and Use Faster R-CNN Face Detector in a  Docker Container\nTowards a Dedicated Computer Vision Tool set for Crowd Simulation Models\nA Study on Topological Descriptors for the Analysis of 3D Surface  Texture\nAI Challenger : A Large-scale Dataset for Going Deeper in Image  Understanding\nNon-local Neural Networks\nLatency and Throughput Characterization of Convolutional Neural Networks  for Mobile Computer Vision\nDeep Learning For Computer Vision Tasks: A review\nA Machine Learning Approach to Recovery of Scene Geometry from Images\nA Multi-Camera Image Processing and Visualization System for Train  Safety Assessment\nA Generic Framework for Assessing the Performance Bounds of Image  Feature Detectors\nVision-Based Assessment of Parkinsonism and Levodopa-Induced Dyskinesia  with Deep Learning Pose Estimation\nDOTA: A Large-scale Dataset 
for Object Detection in Aerial Images\nThe ParallelEye Dataset: Constructing Large-Scale Artificial Scenes for  Traffic Vision Research\nIntelligent Health Recommendation System for Computer Users\nBiologically Inspired Hierarchical Model for Feature Extraction and  Localization\nPerception games, the image understanding and interpretational geometry\nOn Ullman's theorem in computer vision\nA dyadic solution of relative pose problems\nAutomatic system for counting cells with elliptical shape\nAssessing the Value of 3D Reconstruction in Building Construction\nConvolutional Neural Networks Applied to House Numbers Digit  Classification\nFusing image representations for classification using support vector  machines\nInvestigating the performance of Correspondence Algorithms in Vision  based Driver-assistance in Indoor Environment\nStopping Criterion for the Mean Shift Iterative Algorithm\nReduced egomotion estimation drift using omnidirectional views\nA Vision on the Status and Evolution of HEP Physics Software Tools\nA Novel Georeferenced Dataset for Stereo Visual Odometry\nA State Of the Art Report on Research in Multiple RGB-D sensor Setups\nThe role of RGB-D benchmark datasets: an overview\nAn Experimental Comparison of Trust Region and Level Sets\nDeepPose: Human Pose Estimation via Deep Neural Networks\nAn Adaptive Dictionary Learning Approach for Modeling Dynamical Textures\nProceedings of The 38th Annual Workshop of the Austrian Association for  Pattern Recognition (ÖAGM), 2014\nHomotopy equivalence of finite digital images\nPattern Encoding on the Poincare Sphere\nPredicting Depth, Surface Normals and Semantic Labels with a Common  Multi-Scale Convolutional Architecture\nAutonomous Farm Vehicles: Prototype of Power Reaper\nRiemannian Metric Learning for Symmetric Positive Definite Matrices\nAdvances in Human Action Recognition: A Survey\nA Framework for Fast Face and Eye Detection\nFine-grained Recognition Datasets for Biodiversity Analysis\nHuman Head Pose 
Estimation by Facial Features Location\nAn Extension to Hough Transform Based on Gradient Orientation\nThe Fast Bilateral Solver\nBidirectional Warping of Active Appearance Model\nLoss Functions for Neural Networks for Image Processing\nStructure from Motion on a Sphere\nAmodal Instance Segmentation\nGPU-based Image Analysis on Mobile Devices\nDescriptor learning for omnidirectional image matching\nObject Tracking in Videos: Approaches and Issues\nReading Ancient Coin Legends: Object Recognition vs. OCR\nImage Acquisition in an Underwater Vision System with NIR and VIS  Illumination\nReal-time Pedestrian Surveillance with Top View Cumulative Grids\nImaging with Rays: Microscopy, Medical Imaging, and Computer Vision\nAction Recognition in the Frequency Domain\nSpeeding-up Graphical Model Optimization via a Coarse-to-fine Cascade of  Pruning Classifiers\nConvex Color Image Segmentation with Optimal Transport Distances\nLearning to Linearize Under Uncertainty\nInAR:Inverse Augmented Reality\nLow Rank Representation on Riemannian Manifold of Square Root Densities\nSPECFACE - A Dataset of Human Faces Wearing Spectacles\nDeep Learning Algorithms with Applications to Video Analytics for A  Smart City: A Survey\nEvaluation of Object Detection Proposals Under Condition Variations\nOn the Relation between two Rotation Metrics\n2D SEM images turn into 3D object models\nRegion Graph Based Method for Multi-Object Detection and Tracking using  Depth Cameras\nRoad Detection through Supervised Classification\nInteractive Image Segmentation From A Feedback Control Perspective\nBoundary conditions for Shape from Shading\nDepth Estimation Through a Generative Model of Light Field Synthesis\nDistortion Varieties\nCloud Dictionary: Sparse Coding and Modeling for Point Clouds\nBe Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers  from Vision\nRecasting Residual-based Local Descriptors as Convolutional Neural  Networks: an Application to Image Forgery Detection\nFrom 
visual words to a visual grammar: using language modelling for  image classification\nA Fast HOG Descriptor Using Lookup Table and Integral Image\nFeature Fusion using Extended Jaccard Graph and Stochastic Gradient  Descent for Robot\nConvolutional Neural Pyramid for Image Processing\nHigh-Quality Correspondence and Segmentation Estimation for Dual-Lens  Smart-Phone Portraits\nUnsupervised Learning by Predicting Noise\nCORe50: a New Dataset and Benchmark for Continuous Object Recognition\ncvpaper.challenge in 2016: Futuristic Computer Vision through 1,600  Papers Survey\nGeneralized Convolutional Neural Networks for Point Cloud Data\nIdentifying 3 moss species by deep learning, using the \"chopped picture\"  method\n3D Pose Regression using Convolutional Neural Networks\nTwo-stream Flow-guided Convolutional Attention Networks for Action  Recognition\nExploring Geometric Property Thresholds For Filtering Non-Text Regions  In A Connected Component Based Text Detection Application\nA LBP Based Correspondence Identification Scheme for Multi-view Sensing  Network\nMargin Sample Mining Loss: A Deep Learning Based Method for Person  Re-identification\nFacial Key Points Detection using Deep Convolutional Neural Network -  NaimishNet\nCan the early human visual system compete with Deep Neural Networks?\nConditional Autoencoders with Adversarial Information Factorization\nLeaf Identification Using a Deep Convolutional Neural Network\nGuided Labeling using Convolutional Neural Networks\nEnhanced Characterness for Text Detection in the Wild\nVisual Based Navigation of Mobile Robots\nImage Registration Techniques: A Survey\nLearning audio and image representations with bio-inspired trainable  feature extractors\nGenerating Instance Segmentation Annotation by Geometry-guided GAN\nBridging Cognitive Programs and Machine Learning\nLongitudinal Face Aging in the Wild - Recent Deep Learning Approaches\nSatellite imagery analysis for operational damage assessment in  Emergency 
situations\nAn Introduction to Image Synthesis with Generative Adversarial Nets\nTransferring Rich Deep Features for Facial Beauty Prediction\nA Pyramid CNN for Dense-Leaves Segmentation\nJoint interpretation of on-board vision and static GPS cartography for  determination of correct speed limit\nEnhancement Techniques for Local Content Preservation and Contrast  Improvement in Images\nEfficient Selection of Disambiguating Actions for Stereo Vision\nUnderstanding How Image Quality Affects Deep Neural Networks\nModeling the Contribution of Central Versus Peripheral Vision in Scene,  Object, and Face Recognition\nUnsupervised Video Analysis Based on a Spatiotemporal Saliency Detector\nA New Approach of Gray Images Binarization with Threshold Methods\nUberNet: Training a `Universal' Convolutional Neural Network for Low-,  Mid-, and High-Level Vision using Diverse Datasets and Limited Memory\nRecurrent 3D Attentional Networks for End-to-End Active Object  Recognition in Cluttered Scenes\nA Dataset for Developing and Benchmarking Active Vision\nEnhancing human color vision by breaking binocular redundancy\nPerson Re-Identification with Vision and Language\nEnhancing Underwater Imagery using Generative Adversarial Networks\nFusion of stereo and still monocular depth estimates in a  self-supervised learning context\nMulti-GPU Training of ConvNets\nComputational Models for Multiview Dense Depth Maps of Dynamic Scene\nPixel Normalization from Numeric Data as Input to Neural Networks\nSelf-organizing neural networks in classification and image recognition\nchi2TeX Semi-automatic translation from chiwriter to LaTeX\nSegmentation for radar images based on active contour\nExtension of Path Probability Method to Approximate Inference over Time\nLogical methods of object recognition on satellite images using spatial  constraints\nExtending Bron Kerbosch for Solving the Maximum Weight Clique Problem\nKunchenko's Polynomials for Template Matching\nKernel diff-hash\nA Topological 
Code for Plane Images\nWatersheds on edge or node weighted graphs \"par l'exemple\"\nMulti-Column Deep Neural Networks for Offline Handwritten Chinese  Character Classification\nThe complex-valued encoding for dicision-making based on aliasing data\nContinuous Optimization for Fields of Experts Denoising Works\nElectrocardiography Separation of Mother and Baby\nNext Generation Multicuts for Semi-Planar Graphs\nCalculate distance to object in the area where car, using video analysis\nHollywood in Homes: Crowdsourcing Data Collection for Activity  Understanding\nTowards Automated Melanoma Screening: Proper Computer Vision & Reliable  Results\nTowards Miss Universe Automatic Prediction: The Evening Gown Competition\nOn the Existence of a Projective Reconstruction\nConvolutional Neural Networks learn compact local image descriptors\nComputer Vision Approach for Low Cost, High Precision Measurement of  Grapevine Trunk Diameter in Outdoor Conditions\nKernel Methods on Riemannian Manifolds with Gaussian RBF Kernels\nDeep Neural Networks are Easily Fooled: High Confidence Predictions for  Unrecognizable Images\nImage Enhancement Using a Generalization of Homographic Function\nReal-world Object Recognition with Off-the-shelf Deep Conv Nets: How  Many Objects can iCub Learn?\nAn On-line Variational Bayesian Model for Multi-Person Tracking from  Cluttered Scenes\nPolyhedron Volume-Ratio-based Classification for Image Recognition\nPhotographic dataset: random peppercorns\nBiconvex Relaxation for Semidefinite Programming in Computer Vision\nReview of Action Recognition and Detection Methods\nThe HASYv2 dataset\nChallenge of Multi-Camera Tracking\nISIC 2017 - Skin Lesion Analysis Towards Melanoma Detection\nAutomatic skin lesion segmentation with fully  convolutional-deconvolutional networks\nA Dynamic Programming Solution to Bounded Dejittering Problems\nGraphcut Texture Synthesis for Single-Image Superresolution\nOnline Handwritten Mathematical Expressions Recognition System 
Using  Fuzzy Neural Network\nHuman-Level Intelligence or Animal-Like Abilities?\nNetwork Analysis for Explanation\nThreat of Adversarial Attacks on Deep Learning in Computer Vision: A  Survey\nExplaining First Impressions: Modeling, Recognizing, and Explaining  Apparent Personality from Videos\nBuilding an Integrated Mobile Robotic System for Real-Time Applications  in Construction\nExploiting deep residual networks for human action recognition from  skeletal data\nHuman Emotional Facial Expression Recognition\nFast and Accurate Surface Normal Integration on Non-Rectangular Domains\nIn Quest of Image Semantics: Are We Looking for It Under the Right  Lamppost?\nView Based Methods can achieve Bayes-Optimal 3D Recognition\nMany-to-Many Graph Matching: a Continuous Relaxation Approach\nAsymmetric Totally-corrective Boosting for Real-time Object Detection\nFeatureless 2D-3D Pose Estimation by Minimising an  Illumination-Invariant Loss\nA Unified Multiscale Framework for Discrete Energy Minimization\nAnalysis of Multi-Scale Fractal Dimension to Classify Human Motion\nDiscrete Energy Minimization, beyond Submodularity: Applications and  Approximations\nGLCM-based chi-square histogram distance for automatic detection of  defects on patterned textures\nUnsupervised Feature Learning for low-level Local Image Descriptors\nMultiview Hessian Discriminative Sparse Coding for Image Annotation\nRandom Binary Mappings for Kernel Learning and Efficient SVM\nDictionary Learning and Sparse Coding on Grassmann Manifolds: An  Extrinsic Solution\nSubmodularization for Quadratic Pseudo-Boolean Optimization\nCircle detection on images using Learning Automata\nFast and Robust Archetypal Analysis for Representation Learning\nCalibration of Multiple Fish-Eye Cameras Using a Wand\nMobile Camera Array Calibration for Light Field Acquisition\nVisual Word Selection without Re-Coding and Re-Pooling\nCaffe: Convolutional Architecture for Fast Feature Embedding\nShow and Tell: A Neural Image Caption 
Generator\nFashion Apparel Detection: The Role of Deep Convolutional Neural Network  and Pose-dependent Priors\nDetection of Non-Stationary Photometric Perturbations on Projection  Screens\nThe Cross-Depiction Problem: Computer Vision Algorithms for Recognising  Objects in Artwork and in Photographs\nDiscovering Attribute Shades of Meaning with the Crowd\nRiemannian Dictionary Learning and Sparse Coding for Positive Definite  Matrices\nLeveraging Context to Support Automated Food Recognition in Restaurants\nDeep Mean Maps\nNatural Language Object Retrieval\nData-dependent Initializations of Convolutional Neural Networks\nContext-aware CNNs for person head detection\nFinding Optimal Combination of Kernels using Genetic Programming\nParametric Object Motion from Blur\nLearning by tracking: Siamese CNN for robust target association\nDeep Learning the City : Quantifying Urban Perception At A Global Scale\nHard Negative Mining for Metric Learning Based Zero-Shot Classification\nLinking Image and Text with 2-Way Nets\nSparse Image Representation with Epitomes\nBayesian ensemble learning for image denoising\nLarge-margin Learning of Compact Binary Image Encodings\nSparse Coding on Symmetric Positive Definite Manifolds using Bregman  Divergences\nUnsupervised Network Pretraining via Encoding Human Design\nEfficient SDP Inference for Fully-connected CRFs Based on Low-rank  Decomposition\nAdaptive Locally Affine-Invariant Shape Matching\nTowards Distortion-Predictable Embedding of Neural Networks\nDictionary Learning and Sparse Coding for Third-order Super-symmetric  Tensors\nColor-Phase Analysis for Sinusoidal Structured Light in Rapid Range  Imaging\nColor-Stripe Structured Light Robust to Surface Color and Discontinuity\nLoss Functions for Top-k Error: Analysis and Insights\nDirect Intrinsics: Learning Albedo-Shading Decomposition by  Convolutional Regression\nRemote Health Coaching System and Human Motion Data Analysis for  Physical Therapy with Microsoft Kinect\nKernel 
Sparse Subspace Clustering on Symmetric Positive Definite  Manifolds\nSpace-Time Representation of People Based on 3D Skeletal Data: A Review\nSolving Dense Image Matching in Real-Time using Discrete-Continuous  Optimization\nSurvey on the attention based RNN model and its applications in computer  vision\nRobust Multi-body Feature Tracker: A Segmentation-free Approach\nLearning Attributes Equals Multi-Source Domain Generalization\nFast and Accurate Algorithm for Eye Localization for Gaze Tracking in  Low Resolution Images\nHierarchical Piecewise-Constant Super-regions\nSuperpixel Hierarchy\ncvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey\nNeither Quick Nor Proper -- Evaluation of QuickProp for Learning Deep  Neural Networks\nSmall-Variance Nonparametric Clustering on the Hypersphere\nComplexity of Discrete Energy Minimization Problems\nPolyp Detection and Segmentation from Video Capsule Endoscopy: A Review\nShow and Tell: Lessons learned from the 2015 MSCOCO Image Captioning  Challenge\nSemantic Decomposition and Recognition of Long and Complex Manipulation  Action Sequences\nJoint Graph Decomposition and Node Labeling: Problem, Algorithms,  Applications\nCAS-CNN: A Deep Convolutional Neural Network for Image Compression  Artifact Suppression\nDeepSetNet: Predicting Sets with Deep Neural Networks\nSceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor  Trajectories with Ground Truth\nEgoReID: Cross-view Self-Identification and Human Re-identification in  Egocentric and Surveillance Videos\nGuaranteed Parameter Estimation for Discrete Energy Minimization\nDomain Adaptation for Visual Applications: A Comprehensive Survey\nThe Ciona17 Dataset for Semantic Segmentation of Invasive Species in a  Marine Aquaculture Environment\nAlgorithms for Semantic Segmentation of Multispectral Remote Sensing  Imagery using Deep Learning\nDeep Unsupervised Similarity Learning using Partially Ordered Sets\nDeepPermNet: Visual Permutation Learning\nExplaining the 
Unexplained: A CLass-Enhanced Attentive Response (CLEAR)  Approach to Understanding Deep Neural Networks\nA Review on Deep Learning Techniques Applied to Semantic Segmentation\nJoint Learning from Earth Observation and OpenStreetMap Data to Get  Faster Better Semantic Maps\nUnmasking the abnormal events in video\nSelf-supervised learning of visual features through embedding images  into text topic spaces\nA watershed-based algorithm to segment and classify cells in  fluorescence microscopy images\nBlock-Matching Optical Flow for Dynamic Vision Sensor- Algorithm and  FPGA Implementation\nInfinite Latent Feature Selection: A Probabilistic Latent Graph-Based  Ranking Approach\nLearning Robust Representations for Computer Vision\nDual-Glance Model for Deciphering Social Relationships\nWhen Kernel Methods meet Feature Learning: Log-Covariance Network for  Action Recognition from Skeletal Data\nAttentive Semantic Video Generation using Captions\nThe Conditional Analogy GAN: Swapping Fashion Articles on People Images\nFacial Feature Tracking under Varying Facial Expressions and Face Poses  based on Restricted Boltzmann Machines\nA Hierarchical Probabilistic Model for Facial Feature Detection\nSimultaneous Facial Landmark Detection, Pose and Deformation Estimation  under Facial Occlusion\nDrought Stress Classification using 3D Plant Models\nVisual speech recognition: aligning terminologies for better  understanding\nHigh efficiency compression for object detection\nStructured learning and detailed interpretation of minimal object images\nWasserstein Divergence for GANs\nDenoising Adversarial Autoencoders: Classifying Skin Lesions Using  Limited Labelled Training Data\nThe challenge of simultaneous object detection and pose estimation: a  comparative study\nA Benchmark and Evaluation of Non-Rigid Structure from Motion\nLearning Semantic Segmentation with Diverse Supervision\nDeep Visual Domain Adaptation: A Survey\nChallenging Images For Minds and Machines\nMultispectral 
Image Intrinsic Decomposition via Low Rank Constraint\nLearning and Recognizing Human Action from Skeleton Movement with Deep  Residual Neural Networks\nWho Let The Dogs Out? Modeling Dog Behavior From Visual Data\n3D Pose Estimation and 3D Model Retrieval for Objects in the Wild\nVision as Adaptive Epistemology\nHigh-for-Low and Low-for-High: Efficient Boundary Detection from Deep  Object Features and its Applications to High-Level Vision\nGoogle's Cloud Vision API Is Not Robust To Noise\nTelepath: Understanding Users from a Human Vision Perspective in  Large-Scale Recommender Systems\nEfficient variational inference in large-scale Bayesian compressed  sensing\nComputation-Performance Optimization of Convolutional Neural Networks  with Redundant Kernel Removal\nEscort: Efficient Sparse Convolutional Neural Networks on GPUs\nDoes a Plane Imitate a Bird? Does Computer Vision Have to Follow  Biological Paradigms?\nA Low Cost Vision Based Hybrid Fiducial Mark Tracking Technique for  Mobile Industrial Robots\nInverse Graphics with Probabilistic CAD Models\nBuilding with Drones: Accurate 3D Facade Reconstruction using MAVs\nHierarchical Multi-scale Attention Networks for Action Recognition\nAutomatic Tool Landmark Detection for Stereo Vision in Robot-Assisted  Retinal Surgery\nDeep Sparse Coding for Invariant Multimodal Halle Berry Neurons\nA Multi-Stage Multi-Task Neural Network for Aerial Scene Interpretation  and Geolocalization\nA Novice Guide towards Human Motion Analysis and Understanding\nIntroducing the Computable Universe\nSoft Computing Techniques for Change Detection in remotely sensed images  : A Review\nA Novel Architecture for Computing Approximate Radon Transform\nApproximate Bayesian Image Interpretation using Generative Probabilistic  Graphics Programs\nPlaying for Data: Ground Truth from Computer Games\nA $p$-adic RanSaC algorithm for stereo vision using Hensel lifting\nA Multi-Agents Architecture to Learn Vision Operators and their  
Parameters\nSimplifying Energy Optimization using Partial Enumeration\nUnsupervised feature learning by augmenting single images\nFast Localization of Facial Landmark Points\nScalable Matting: A Sub-linear Approach\nOptimal measurement of visual motion across spatial and temporal scales\nMultiple Moving Object Recognitions in video based on Log Gabor-PCA  Approach\nA Pattern Recognition System for Detecting Use of Mobile Phones While  Driving\nConsensus Message Passing for Layered Graphical Models\nLow-level Vision by Consensus in a Spatial Hierarchy of Regions\nLow Cost Semi-Autonomous Agricultural Robots In Pakistan-Vision Based  Navigation Scalable methodology for wheat harvesting\nLearning Visual Features from Large Weakly Supervised Data\nAutomatically selecting inference algorithms for discrete energy  minimisation\nVisual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings  Using Abstract Scenes\nSearching for Objects using Structure in Indoor Scenes\nObject Detection from Video Tubelets with Convolutional Neural Networks\nImproving Multi-label Learning with Missing Labels by Structured  Semantic Correlations\nA Study of Vision based Human Motion Recognition and Analysis\nLearning to relate images: Mapping units, complex cells and simultaneous  eigenspaces\nAutomated Switching System for Skin Pixel Segmentation in Varied  Lighting\nDistributed Low-rank Subspace Segmentation\nBuilding Proteins in a Day: Efficient 3D Molecular Reconstruction\nA Neural Algorithm of Artistic Style\nAn Uncertain Future: Forecasting from Static Images using Variational  Autoencoders\nA quantitative analysis of tilt in the Café Wall illusion: a  bioplausible model for foveal and peripheral vision\nExploiting Depth from Single Monocular Images for Object Detection and  Semantic Segmentation\nTemplate Matching Advances and Applications in Image Analysis\nVisual motion processing and human tracking behavior\nThe More You Know: Using Knowledge Graphs for Image 
Classification\nChord Angle Deviation using Tangent (CADT), an Efficient and Robust  Contour-based Corner Detector\nSeeing What Is Not There: Learning Context to Determine Where Objects  Are Missing\nDepthSynth: Real-Time Realistic Synthetic Data Generation from CAD  Models for 2.5D Recognition\nVision-based Real-Time Aerial Object Localization and Tracking for UAV  Sensing System\nGaussian Processes with Context-Supported Priors for Active Object  Localization\nSuperpixel-based Semantic Segmentation Trained by Statistical Process  Control\nPlace recognition: An Overview of Vision Perspective\nDetecting and Grouping Identical Objects for Region Proposal and  Classification\nLearning Uncertain Convolutional Features for Accurate Saliency  Detection\nA Generic Deep Architecture for Single Image Reflection Removal and  Image Smoothing\nMass Displacement Networks\nDeep Scene Text Detection with Connected Component Proposals\nIn search of inliers: 3d correspondence by local and global voting\nGradient-based Camera Exposure Control for Outdoor Mobile Platforms\nIdentifying Mirror Symmetry Density with Delay in Spiking Neural  Networks\nLearning Compact Geometric Features\nScene-centric Joint Parsing of Cross-view Videos\nPlaying for Benchmarks\nMulti-Task Learning by Deep Collaboration and Application in Facial  Landmark Detection\nObject Referring in Visual Scene with Spoken Language\nThe Perception-Distortion Tradeoff\nVision Based Railway Track Monitoring using Deep Learning\nxUnit: Learning a Spatial Activation Function for Efficient Image  Restoration\nExcitation Backprop for RNNs\nDeep Depth Inference using Binocular and Monocular Cues\nDiscriminant Projection Representation-based Classification for Vision  Recognition\nOn the Duality Between Retinex and Image Dehazing\nOneDataShare: A Vision for Cloud-hosted Data Transfer Scheduling and  Optimization as a Service\nUnsupervised Domain Adaptation: from Simulation Engine to the RealWorld\nToward Natural 
Gesture/Speech Control of a Large Display\nA Tool for Integer Homology Computation: Lambda-At Model\nCamera Calibration: a USU Implementation\nTop-Down Unsupervised Image Segmentation (it sounds like oxymoron, but  actually it is not)\nComputational Vision in Nature and Technology\nCo-occurrence Matrix and Fractal Dimension for Image Segmentation\nPedestrian Detection with Unsupervised Multi-Stage Feature Learning\nA Prototyping Environment for Integrated Artificial Attention Systems\nSemantic Annotation: The Mainstay of Semantic Web\nText Based Approach For Indexing And Retrieval Of Image And Video: A  Review\nRevised Version of a JCIT Paper-Comparison of Feature Point Extraction  Algorithms for Vision Based Autonomous Aerial Refueling\nOn the Convergence of the Mean Shift Algorithm in the One-Dimensional  Space\nFlying Objects Detection from a Single Moving Camera\nFingertip in the Eye: A cascaded CNN pipeline for the real-time  fingertip detection in egocentric videos\nGraph-based denoising for time-varying point clouds\nClient-Driven Content Extraction Associated with Table\nEvidential Reasoning in Parallel Hierarchical Vision Programs\nSparkle Vision: Seeing the World through Random Specular Microfacets\nHuman Shape Variation - An Efficient Implementation using Skeleton\nA Novel Approach For Finger Vein Verification Based on Self-Taught  Learning\nPreprint ARPPS Augmented Reality Pipeline Prospect System\nPerson Recognition in Personal Photo Collections\nDepth Superresolution using Motion Adaptive Regularization\nFeature-based Recursive Observer Design for Homography Estimation\nAutonomous Ingress of a UAV through a window using Monocular Vision\n25 years of CNNs: Can we compare to human abstraction capabilities?\nVanishing point detection with convolutional neural networks\nFully-Trainable Deep Matching\nRevisiting Winner Take All (WTA) Hashing for Sparse Datasets\nScene Flow Estimation: A Survey\nInverse Compositional Spatial Transformer Networks\nMirrored 
Light Field Video Camera Adapter\nA/D Converter Architectures for Energy-Efficient Vision Processor\nTuning Modular Networks with Weighted Losses for Hand-Eye Coordination\nSee, Hear, and Read: Deep Aligned Representations\nNonlinear Embedding Transform for Unsupervised Domain Adaptation\nOriginal Loop-closure Detection Algorithm for Monocular vSLAM\nMultiple-Kernel Local-Patch Descriptor\nCapacity limitations of visual search in deep convolutional neural  network\nMonocular Navigation in Large Scale Dynamic Environments\nVisual Reasoning with Natural Language\nSimulating Structure-from-Motion\nImage matting with normalized weight and semi-supervised learning\nSuperpixel clustering with deep features for unsupervised road  segmentation\nA Parallel Algorithm for Dilated Contour Extraction from Bilevel Images\nLeast squares fitting of circles and lines\nField geology with a wearable computer: 1st results of the Cyborg  Astrobiologist System\nParametrical Neural Networks and Some Other Similar Architectures\nSpatio-Temporal Electromagnetic Field Shapes and their Logical  Processing\nAutomatic Detection of Pulmonary Embolism using Computational  Intelligence\nBayesian Nonlinear Principal Component Analysis Using Random Fields\nNecessary Conditions for Discontinuities of Multidimensional Size  Functions\nMapping Images with the Coherence Length Diagrams\nActive Testing for Face Detection and Localization\nBilateral filters: what they can and cannot do\nGeometric Models with Co-occurrence Groups\nVisual Concept Detection and Real Time Object Detection\nModelling Distributed Shape Priors by Gibbs Random Fields of Second  Order\nEye Pupil Location Using Webcam\nThe ideal of the trifocal variety\nSimulation of Fractional Brownian Surfaces via Spectral Synthesis on  Manifolds\nGenetic Stereo Matching Algorithm with Fuzzy Fitness\nSimilarity- based approach for outlier detection\nMassively Deep Artificial Neural Networks for Handwritten Digit  Recognition\nTime-domain 
multiscale shape identification in electro-sensing\nCITlab ARGUS for Arabic Handwriting\nEffective persistent homology of digital images\nUnderstanding Deep Convolutional Networks\nHandwritten Recognition Using SVM, KNN and Neural Network\nSynthesising Dynamic Textures using Convolutional Neural Networks\nA filter based approach for inbetweening\nCombinational neural network using Gabor filters for the classification  of handwritten digits\nAlgebraic Image Processing\nDetermination of Digital Straight Segments Using the Slope\nOn the impact of quantum computing technology on future developments in  high-performance scientific computing\nUsing Self-Contradiction to Learn Confidence Measures in Stereo Vision\nEgocentric Height Estimation\nSingle Image Super-resolution via a Lightweight Residual Convolutional  Neural Network\nHuman-like Clustering with Deep Convolutional Neural Networks\nAn End-to-End Compression Framework Based on Convolutional Neural  Networks\nAdaptive Deep Learning through Visual Domain Localization\nEvent-based Moving Object Detection and Tracking\nRecovering an Algebraic Curve Using its Projections From Different  Points. 
Applications to Static and Dynamic Computational Vision\nFormulation Of A N-Degree Polynomial For Depth Estimation using a Single  Image\nExtracting Parts of 2D Shapes Using Local and Global Interactions  Simultaneously\nGaze2Segment: A Pilot Study for Integrating Eye-Tracking Technology into  Medical Image Segmentation\nAn efficient Exact-PGA algorithm for constant curvature manifolds\nSaliency Driven Object recognition in egocentric videos with deep CNN\nHolistic Planimetric prediction to Local Volumetric prediction for 3D  Human Pose Estimation\nFast Multi-frame Stereo Scene Flow with Motion Segmentation\nFolded Recurrent Neural Networks for Future Video Prediction\nThe Cyborg Astrobiologist: First Field Experience\nCompressive adaptive computational ghost imaging\nShuffleNet: An Extremely Efficient Convolutional Neural Network for  Mobile Devices\nMulti-Task Feature Learning Via Efficient l2,1-Norm Minimization\nPotts model, parametric maxflow and k-submodular functions\nPerformance Evaluation of Raster Based Shape Vectors in Object  Recognition\nCorrelation Filters with Limited Boundaries\nReal Time Speckle Image De-Noising\nFast and Accurate Bilateral Filtering using Gauss-Polynomial  Decomposition\nDeepFool: a simple and accurate method to fool deep neural networks\nMAX-CSP, Graph Cuts and Statistical Physics\nOn learning optimized reaction diffusion processes for effective image  restoration\nFast and Robust Hand Tracking Using Detection-Guided Optimization\nEnergy-Efficient ConvNets Through Approximate Computing\nLightNet: A Versatile, Standalone Matlab-based Environment for Deep  Learning\nQuick and energy-efficient Bayesian computing of binocular disparity  using stochastic digital signals\nSpatially Adaptive Computation Time for Residual Networks\nPoseAgent: Budget-Constrained 6D Object Pose Estimation via  Reinforcement Learning\nEfficiently Computing Piecewise Flat Embeddings for Data Clustering and  Image Segmentation\nComputer methods for 3D motion 
tracking in real-time\nApprentice: Using Knowledge Distillation Techniques To Improve  Low-Precision Network Accuracy\nGazing into the Abyss: Real-time Gaze Estimation\nSome Applications of Algebraic Curves to Computational Vision\nMaking a Science of Model Search\nDimensionality Reduction and Reconstruction using Mirroring Neural  Networks and Object Recognition based on Reduced Dimension Characteristic  Vector\nInternet of Things (IoT): A Vision, Architectural Elements, and Future  Directions\nSeeking multi-thresholds for image segmentation with Learning Automata\nIntroducing SLAMBench, a performance and accuracy benchmarking  methodology for SLAM\nGoing Deeper in Facial Expression Recognition using Deep Neural Networks\nDiving deeper into mentee networks\nMultiview Differential Geometry of Curves\nAn improved computer vision method for detecting white blood cells\nUnderstanding Image Virality\nComputational Imaging for VLBI Image Reconstruction\nDigitizing Municipal Street Inspections Using Computer Vision\nUnderstanding Image and Text Simultaneously: a Dual Vision-Language  Machine Comprehension Task\nRevisiting Deep Intrinsic Image Decompositions\nDeep Learning in the Automotive Industry: Applications and Tools\nOrder embeddings and character-level convolutions for multimodal  alignment\nComparing deep neural networks against humans: object recognition when  the signal gets weaker\nDiscriminative Optimization: Theory and Applications to Computer Vision  Problems\nLearning Discriminative Alpha-Beta-divergence for Positive Definite  Matrices (Extended Version)\nGlobally-Optimal Inlier Set Maximisation for Simultaneous Camera Pose  and Feature Correspondence\nAn Adaptive Fuzzy-Based System to Simulate, Quantify and Compensate  Color Blindness\nTranslation of \"Zur Ermittlung eines Objektes aus zwei Perspektiven mit  innerer Orientierung\" by Erwin Kruppa (1913)\nAFT*: Integrating Active Learning and Transfer Learning to Reduce  Annotation Efforts\nHand Gesture 
Controlled Drones: An Open Source Library\nFace Detection with Effective Feature Extraction\nVision-Based Navigation I: A navigation filter for fusing  DTM/correspondence updates\nWILI - Web Interface for people with Lowvision Issues\nInvariance of visual operations at the level of receptive fields\nBinocular disparity as an explanation for the moon illusion\nEgocentric vision IT technologies for Alzheimer disease assessment and  studies\nAnalysis of Amoeba Active Contours\nDeCAF: A Deep Convolutional Activation Feature for Generic Visual  Recognition\nElectrotactile vision substitution for 3D trajectory following\nNeural perceptual model to global-local vision for recognition of the  logical structure of administrative documents\nAre all training examples equally valuable?\nAn Improved Tracking using IMU and Vision Fusion for Mobile Augmented  Reality Applications\nStructured Hough Voting for Vision-based Highway Border Detection\nVision and Learning for Deliberative Monocular Cluttered Flight\nCapturing the Dynamics of Pedestrian Traffic Using a Machine Vision  System\nRelaxed Multiple-Instance SVM with Application to Object Discovery\nIn the sight of my wearable camera: Classifying my visual experience\nHybrid Focal Stereo Networks for Pattern Analysis in Homogeneous Scenes\nPreprint Extending Touch-less Interaction on Vision Based Wearable  Device\nObject Level Deep Feature Pooling for Compact Image Representation\nCompression Artifacts Reduction by a Deep Convolutional Network\nVision System and Depth Processing for DRC-HUBO+\nOptimizing Gaze Direction in a Visual Navigation Task\nFriction from Reflectance: Deep Reflectance Codes for Predicting  Physical Surface Properties from One-Shot In-Field Reflectance\nAll Weather Perception: Joint Data Association, Tracking, and  Classification for Autonomous Ground Vehicles\nA Light-powered, Always-On, Smart Camera with Compressed Domain Gesture  Detection\nVision-based Traffic Flow Prediction using Dynamic Texture 
Model and  Gaussian Process\nIntrospective Perception: Learning to Predict Failures in Vision Systems\nFrom Monocular SLAM to Autonomous Drone Exploration\nMultiCol-SLAM - A Modular Real-Time Multi-Camera SLAM System\nDual Attention Networks for Multimodal Reasoning and Matching\nWearable Vision Detection of Environmental Fall Risks using  Convolutional Neural Networks\nSemi-Supervised Recognition of the Diploglossus Millepunctatus Lizard  Species using Artificial Vision Algorithms\nSparse Factorization Layers for Neural Networks with Limited Supervision\nEgocentric Video Description based on Temporally-Linked Sequences\nConnecting Look and Feel: Associating the visual and tactile properties  of physical materials\nMobileNets: Efficient Convolutional Neural Networks for Mobile Vision  Applications\nShading Annotations in the Wild\nTopometric Localization with Deep Learning\nRotational Rectification Network: Enabling Pedestrian Detection for  Mobile Vision\nTeam Applied Robotics: A closer look at our robotic picking system\nFast, Accurate Thin-Structure Obstacle Detection for Autonomous Mobile  Robots\nRecognizing Objects In-the-wild: Where Do We Stand?\nTowards Decentralised Resilient Community Cloud Infrastructures\nAdapting Engineering Education to Industrie 4.0 Vision\nSeparating Self-Expression and Visual Content in Hashtag Supervision\nTrack, then Decide: Category-Agnostic Vision-based Multi-Object Tracking\nStructured Triplet Learning with POS-tag Guided Attention for Visual  Question Answering\nDeep Collaborative Weight-based Classification\nThe organizing vision of integrated health information systems\nDynamic Vision Sensors for Human Activity Recognition\nRobust event-stream pattern tracking based on correlative filter\nA Comprehensive Analysis of Deep Regression\nCompare and Contrast: Learning Prominent Visual Differences\nComputational Unification: a Vision for Connecting Researchers\nThe Cyborg Astrobiologist: Porting from a wearable computer to the  
Astrobiology Phone-cam\nImage Processing in Optical Guidance for Autonomous Landing of Lunar  Probe\nBio-inspired speed detection and discrimination\nPersonalised product design using virtual interactive techniques\nGeneralized Boundaries from Multiple Image Interpretations\nMouse Simulation Using Two Coloured Tapes\nLinearized Alternating Direction Method with Adaptive Penalty and Warm  Starts for Fast Solving Transform Invariant Low-Rank Textures\nColor Assessment and Transfer for Web Pages\nA comparative study on face recognition techniques and neural network\nImage Registration for Stability Testing of MEMS\nComputer simulation based parameter selection for resistance exercise\nA Novel Equation based Classifier for Detecting Human in Images\nIs Bottom-Up Attention Useful for Scene Recognition?\nFast Approximate $K$-Means via Cluster Closures\nFast Training of Convolutional Networks through FFTs\nMulti Modal Face Recognition Using Block Based Curvelet Features\nAffine Subspace Representation for Feature Description\nOptimal Radiometric Calibration for Camera-Display Communication\nA Dataset for Movie Description\nClassification of Occluded Objects using Fast Recurrent Processing\nGalaxy morphology - an unsupervised machine learning approach\nA Simple Yet Effective Improvement to the Bilateral Filter for Image  Denoising\nAccelerating Very Deep Convolutional Networks for Classification and  Detection\nFast and Accurate Poisson Denoising with Optimized Nonlinear Diffusion\nFacial Expression Recognition Using Sparse Gaussian Conditional Random  Field\nA dense subgraph based algorithm for compact salient image region  detection\nConvolutional Feature Masking for Joint Object and Stuff Segmentation\nObject Detectors Emerge in Deep Scene CNNs\nSimple Image Description Generator via a Linear Phrase-Based Approach\nAn exploration of parameter redundancy in deep networks with circulant  projections\nConvolutional Channel Features\nTalking about the Moving Image: A 
Declarative Model for Image Schema  Based Embodied Perception Grounding and Language Generation\nToward a Taxonomy and Computational Models of Abnormalities in Images\nA framework for robust object multi-detection with a vote aggregation  and a cascade filtering\nHuman Attention Estimation for Natural Images: An Automatic Gaze  Refinement Approach\nDeep Learning For Smile Recognition\nThe Role of Typicality in Object Classification: Improving The  Generalization Capacity of Convolutional Neural Networks\nDo We Need Binary Features for 3D Reconstruction?\nResource Constrained Structured Prediction\nRevisiting Active Perception\nFast and High-Quality Bilateral Filtering Using Gauss-Chebyshev  Approximation\nMovie Description\nPooling Faces: Template based Face Recognition with Pooled Face Images\nDeep Structured-Output Regression Learning for Computational Color  Constancy\nVirtual Embodiment: A Scalable Long-Term Strategy for Artificial  Intelligence Research\nFixed-point Factorized Networks\nGeometry of 3D Environments and Sum of Squares Polynomials\nEasy-setup eye movement recording system for human-computer interaction\nSecond-order Convolutional Neural Networks\nA Bag-of-Words Equivalent Recurrent Neural Network for Action  Recognition\nA Holistic Approach for Optimizing DSP Block Utilization of a CNN  implementation on FPGA\nAutomatic Image Filtering on Social Networks Using Deep Learning and  Perceptual Hashing During Crises\nWRPN: Training and Inference using Wide Reduced-Precision Networks\nFeature Enhancement in Visually Impaired Images\nA Fast Method For Computing Principal Curvatures From Range Images\nBinarized Convolutional Neural Networks with Separable Filters for  Efficient Hardware Acceleration\nWRPN: Wide Reduced-Precision Networks\nNewton-type Methods for Inference in Higher-Order Markov Random Fields\nImage Matching Using SIFT, SURF, BRIEF and ORB: Performance Comparison  for Distorted Images\nSaliency Preservation in Low-Resolution Grayscale 
Images\nWild Patterns: Ten Years After the Rise of Adversarial Machine Learning\nNormalization of Neural Networks using Analytic Variance Propagation\nHyperdrive: A Systolically Scalable Binary-Weight CNN Inference Engine  for mW IoT End-Nodes\nDistribution-Aware Binarization of Neural Networks for Sketch  Recognition\nUnsupervised and semi-supervised learning with Categorical Generative  Adversarial Networks assisted by Wasserstein distance for dermoscopy image  Classification\nImage-based Vehicle Classification System\nFine-graind Image Classification via Combining Vision and Language\nTowards Instance Segmentation with Object Priority: Prominent Object  Detection and Recognition\nEmerging from Water: Underwater Image Color Correction Based on Weakly  Supervised Color Transfer\nTowards Quality Advancement of Underwater Machine Vision with Generative  Adversarial Networks\nFruit Quantity and Quality Estimation using a Robotic Vision System\nA meshfree particle method for a vision-based macroscopic pedestrian  model\nParallel Stroked Multi Line: a model-based method for compressing large  fingerprint databases\nDecoding visemes: improving machine lipreading (PhD thesis)\nAn Upper Limit of AC Huffman Code Length in JPEG Compression\nMobile Augmented Reality Applications\nA Novel Windowing Technique for Efficient Computation of MFCC for  Speaker Recognition\nKernelized Locality-Sensitive Hashing for Semi-Supervised Agglomerative  Clustering\nEye-GUIDE (Eye-Gaze User Interface Design) Messaging for  Physically-Impaired People\nSpatially-Adaptive Reconstruction in Computed Tomography using Neural  Networks\nModel-Driven Applications of Fractional Derivatives and Integrals\nMobile Crowd Sensing and Computing: When Participatory Sensing Meets  Participatory Social Media\nDeeply Semantic Inductive Spatio-Temporal Learning\nWeb Similarity\nPatchBatch: a Batch Augmented Loss for Optical Flow\nRandomized Iterative Reconstruction for Sparse View X-ray Computed  
Tomography\nMaking data center computations fast, but not so furious\nSnapshot Difference Imaging using Time-of-Flight Sensors\npix2code: Generating Code from a Graphical User Interface Screenshot\nDragon: A Computation Graph Virtual Machine Based Deep Learning  Framework\nComputational complexity lower bounds of certain discrete Radon  transform approximations\nWhite Noise from the White Goods? Conceptual and Empirical Perspectives  on Ambient Domestic Computing\nDemystifying Parallel and Distributed Deep Learning: An In-Depth  Concurrency Analysis\nAqua Computing: Coupling Computing and Communications\nEfficient Privacy Preserving Viola-Jones Type Object Detection via  Random Base Image Representation\nMarket-Oriented Cloud Computing and the Cloudbus Toolkit\nEfficient Minimization of Higher Order Submodular Functions using  Monotonic Boolean Functions\nExact and Approximate Inference in Associative Hierarchical Networks  using Graph Cuts\nComputer vision tools for the non-invasive assessment of autism-related  behavioral markers\nA brief experience on journey through hardware developments for image  processing and its applications on Cryptography\nFeature Selection with Annealing for Computer Vision and Big Data  Learning\nEfficient Visual Coding: From Retina To V2\nLearning to see like children: proof of concept\nOn the Performance of ConvNet Features for Place Recognition\nRobust Optimization for Deep Regression\nStories in the Eye: Contextual Visual Interactions for Efficient Video  to Language Translation\nLearning to Track at 100 FPS with Deep Regression Networks\nLearning Spatially Regularized Correlation Filters for Visual Tracking\nEgocentric Meets Top-view\nA System View of the Recognition and Interpretation of Observed Human  Shape, Pose and Action\nHigh-Contrast Color-Stripe Pattern for Rapid Structured-Light Range  Imaging\nA Deep Structured Model with Radius-Margin Bound for 3D Human Activity  Recognition\nTowards the Design of an End-to-End 
Automated System for Image and  Video-based Recognition\nTracing liquid level and material boundaries in transparent vessels  using the graph cut computer vision approach\nGenerating Natural Questions About an Image\nUAV-based Autonomous Image Acquisition with Multi-View Stereo Quality  Assurance by Confidence Prediction\nImage Classification of Grapevine Buds using Scale-Invariant Features  Transform, Bag of Features and Support Vector Machines\nPerception-aware Path Planning\nSeeing into Darkness: Scotopic Visual Recognition\nVideo Processing from Electro-optical Sensors for Object Detection and  Tracking in Maritime Environment: A Survey\nCrowd Counting by Adapting Convolutional Neural Networks with Side  Information\nEgoTransfer: Transferring Motion Across Egocentric and Exocentric  Domains using Deep Neural Networks\nGoDP: Globally optimized dual pathway system for facial landmark  localization in-the-wild\nUsing convolutional networks and satellite imagery to identify patterns  in urban environments at a large scale\nContext-based Object Viewpoint Estimation: A 2D Relational Approach\nWhen Unsupervised Domain Adaptation Meets Tensor Representations\nThe iNaturalist Species Classification and Detection Dataset\n3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic  Parsing of Large-scale 3D Point Clouds\nPPR-FCN: Weakly Supervised Visual Relation Detection via Parallel  Pairwise R-FCN\nGenerative Adversarial Network-based Synthesis of Visible Faces from  Polarimetric Thermal Faces\nLearning Invariant Riemannian Geometric Representations Using Deep Nets\nMulti-label Class-imbalanced Action Recognition in Hockey Videos via 3D  Convolutional Neural Networks\nFine-Grained Car Detection for Visual Census Estimation\nRapid and Robust Automated Macroscopic Wood Identification System using  Smartphone with Macro-lens\nHow Much Chemistry Does a Deep Neural Network Need to Know to Make  Accurate Predictions?\nPerson Recognition in Social Media 
Photos\nMaterial Classification using Neural Networks\nWhat Makes Good Synthetic Training Data for Learning Disparity and  Optical Flow Estimation?\nDream Formulations and Deep Neural Networks: Humanistic Themes in the  Iconology of the Machine-Learned Image\n3D non-rigid registration using color: Color Coherent Point Drift\nHATS: Histograms of Averaged Time Surfaces for Robust Event-based Object  Classification\nDeep Learning Object Detection Methods for Ecological Camera Trap Data\nNon-Linear Temporal Subspace Representations for Activity Recognition\nDIY Human Action Data Set Generation\nThe Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking\nUnsupervised Sparse Dirichlet-Net for Hyperspectral Image  Super-Resolution\nA Linear Shift Invariant Multiscale Transform\nGeneral Theory of Image Normalization\nA Differential Invariant for Zooming\nBoosting the Differences: A fast Bayesian classifier neural network\nDistorted English Alphabet Identification : An application of Difference  Boosting Algorithm\nCorrelation over Decomposed Signals: A Non-Linear Approach to Fast and  Effective Sequences Comparison\nFingerprint based bio-starter and bio-access\nIS (Iris Security)\nFactor Temporal Prognosis of Tick-Borne Encephalitis Foci Functioning on  the South of Russian Far East\nConditional Expressions for Blind Deconvolution: Multi-point form\nSimple method to eliminate blur based on Lane and Bates algorithm\nRiemannian level-set methods for tensor-valued data\nLearning Similarity for Character Recognition and 3D Object Recognition\nA Class of LULU Operators on Multi-Dimensional Arrays\nIncreasing Linear Dynamic Range of Commercial Digital Photocamera Used  in Imaging Systems with Optical Coding\nConceptualization of seeded region growing by pixels aggregation. 
Part  1: the framework\nHigher Order Moments Generation by Mellin Transform for Compound Models  of Clutter\nGeneralized Prediction Intervals for Arbitrary Distributed  High-Dimensional Data\nAudio Classification from Time-Frequency Texture\nObtaining Depth Maps From Color Images By Region Based Stereo Matching  Algorithms\nDipole and Quadrupole Moments in Image Processing\nDipole Vectors in Images Processing\nSparse image representation by discrete cosine/spline based dictionaries\nHow Do Interactive Virtual Operas Shift Relationships between Music,  Text and Image?\nProperties of the Discrete Pulse Transform for Multi-Dimensional Arrays\nL2-optimal image interpolation and its applications to medical imaging\nPolyharmonic Daubechies type wavelets in Image Processing and Astronomy,  II\nMultiplierless Modules for Forward and Backward Integer Wavelet  Transform\nTemplate-based matching using weight maps\nA radial version of the Central Limit Theorem\nMultimodal diff-hash\nAn image processing of a Raphael's portrait of Leonardo\nThe watershed concept and its use in segmentation : a brief history\nImage Restoration with Signal-dependent Camera Noise\nLearning in Riemannian Orbifolds\nEfficient Topology-Controlled Sampling of Implicit Shapes\nVisual Vocabulary Learning and Its Application to 3D and Mobile Visual  Search\nIdentifications of concealed weapon in a Human Body\nVisual Transfer Learning: Informal Introduction and Literature Overview\nDetection of elliptical shapes via cross-entropy clustering\nGMM-Based Hidden Markov Random Field for Color Image and 3D Volume  Segmentation\nCoded aperture compressive temporal imaging\nSoftware Requirements Specification - Softbody Simulation System\nSubmodularity of a Set Label Disagreement Function\nWavelet methods for shape perception in electro-sensing\nQ-learning optimization in a multi-agents system for image segmentation\nStitched Panoramas from Toy Airborne Video Cameras\nHeat kernel coupling for multiple graph 
analysis\nARIANNA: pAth Recognition for Indoor Assisted NavigatioN with Augmented  perception\nImage processing using miniKanren\nOn Quadratization of Pseudo-Boolean Functions\nComputer vision-based recognition of liquid surfaces and phase  boundaries in transparent vessels, with emphasis on chemistry applications\nA graph-based mathematical morphology reader\nRPCA-KFE: Key Frame Extraction for Consumer Video based Robust Principal  Component Analysis\nComparative analysis of common edge detection techniques in context of  object extraction\nA review over the applicability of image entropy in analyses of remote  sensing datasets\nEnhanced EZW Technique for Compression of Image by Setting Detail  Retaining Pass Number\nRecognition of Handwritten Persian/Arabic Numerals Based on Robust  Feature Set and K-NN Classifier\nGabor-like Image Filtering using a Neural Microcircuit\nChallenge IEEE-ISBI/TCB : Application of Covariance matrices and wavelet  marginals\nOptical Character Recognition, Using K-Nearest Neighbors\nV-variable image compression\nBlob indentation identification via curvature measurement\nProceedings of The 39th Annual Workshop of the Austrian Association for  Pattern Recognition (OAGM), 2015\nSparse 3D convolutional neural networks\nBenchmarking KAZE and MCM for Multiclass Classification\nVeinPLUS: A Transillumination and Reflection-based Hand Vein Database\nShedding Light on the Asymmetric Learning Capability of AdaBoost\nHandwriting Recognition\nPiecewise Linear Activation Functions For More Efficient Deep Networks\nA Simple Hierarchical Pooling Data Structure for Loop Closure\nNode Specificity in Convolutional Deep Nets Depends on Receptive Field  Position and Size\nSome medical applications of example-based super-resolution\nAutomatic detection of moving objects in video surveillance\nCritical Points for Two-view Triangulation\nStroke-Based Cursive Character Recognition\nFacial transformations of ancient portraits: the face of Caesar\nEvidential 
Reasoning in Image Understanding\nDeveloping and Analyzing Boundary Detection Operators Using  Probabilistic Models\nAn implementation of the relational k-means algorithm\nHead Gesture Recognition using Optical Flow based Classification with  Reinforcement of GMM based Background Subtraction\nNo more meta-parameter tuning in unsupervised sparse feature learning\nTransfer Learning for Video Recognition with Scarce Training Data for  Deep Convolutional Neural Network\nGray level image enhancement using the Bernstein polynomials\nMulti-valued Color Representation Based on Frank t-norm Properties\nShannon, Tsallis and Kaniadakis entropies in bi-level image thresholding\nQuantum image classification using principal component analysis\nTime-causal and time-recursive spatio-temporal receptive fields\nPose Estimation Based on 3D Models\nBag-of-Features Image Indexing and Classification in Microsoft SQL  Server Relational Database\nA Large-Scale Car Dataset for Fine-Grained Categorization and  Verification\nFeature Learning for Interaction Activity Recognition in RGBD Videos\nGeometry and dimensionality reduction of feature spaces in primary  visual cortex\ngSLICr: SLIC superpixels at over 250Hz\nOvercomplete Dictionary Learning with Jacobi Atom Updates\nMotion trails from time-lapse video\nA proposal project for a blind image quality assessment by learning  distortions from the full reference image quality assessments\nMulticlass Classification of Cervical Cancer Tissues by Hidden Markov  Model\nAutomatic Detection and Decoding of Photogrammetric Coded Targets\nContent Aware Neural Style Transfer\nEfficient Robust Mean Value Calculation of 1D Features\nClosed Form for Some Gaussian Convolutions\nFast calculation of correlations in recognition systems\nExploiting Facial Landmarks for Emotion Recognition in the Wild\nColor Homography\nAn Alternative Matting Laplacian\nFace Detection with the Faster R-CNN\nInference on subspheres model for directional data\nDepth Estimation 
from Single Image using Sparse Representations\nAshwin: Plug-and-Play System for Machine-Human Image Annotation\nStamp processing with examplar features\nImage Based Camera Localization: an Overview\nComparing Face Detection and Recognition Techniques\nMulti-Camera Occlusion and Sudden-Appearance-Change Detection Using  Hidden Markovian Chains\nEffective sparse representation of X-Ray medical images\nPath-following based Point Matching using Similarity Transformation\nGroup Visual Sentiment Analysis\nPhotographic dataset: playing cards\nSpatially Aware Melanoma Segmentation Using Hybrid Deep Learning  Techniques\nSkin Lesion Classification Using Deep Multi-scale Convolutional Neural  Networks\nImage Classification of Melanoma, Nevus and Seborrheic Keratosis by Deep  Neural Network Ensemble\nSegmenting Dermoscopic Images\nSegmentation of skin lesions based on fuzzy classification of pixels and  histogram thresholding\nA Hybrid Deep Learning Approach for Texture Analysis\nSurface Normals in the Wild\nCollaborative Low-Rank Subspace Clustering\nAdaptive Cost Function for Pointcloud Registration\nImproved underwater image enhancement algorithms based on partial  differential equations (PDEs)\nBlood capillaries and vessels segmentation in optical coherence  tomography angiogram using fuzzy C-means and Curvelet transform\nDeep Learning Methods for Efficient Large Scale Video Labeling\nRotation Invariance Neural Network\nA Bayesian algorithm for detecting identity matches and fraud in image  databases\nImproved Human Emotion Recognition Using Symmetry of Facial Key Points  with Dihedral Group\nUPSET and ANGRI : Breaking High Performance Image Classifiers\nImage Segmentation Algorithms Overview\nA step towards procedural terrain generation with GANs\nEvaluation of Hashing Methods Performance on Binary Feature Descriptors\nA comment on the paper Prediction of Kidney Function from Biopsy Images  using Convolutional Neural Networks\nMultigraded Cayley-Chow 
forms\nElliptification of Rectangular Imagery\nFruit recognition from images using deep learning\nMathematics of Deep Learning\nInvariants of multidimensional time series based on their  iterated-integral signature\nDetecting and counting tiny faces\nA predictor-corrector method for the training of deep neural networks\nAggregated Sparse Attention for Steering Angle Prediction\nA Survey of Deep Learning Techniques for Mobile Robot Applications\nNot quite unreasonable effectiveness of machine learning algorithms\nOn the Robustness of the CVPR 2018 White-Box Adversarial Example  Defenses\nQCMC: Quasi-conformal Parameterizations for Multiply-connected domains\nFast 2-D Complex Gabor Filter with Kernel Decomposition\nPre-Symmetry Sets of 3D shapes\nThe Expressive Power of Binary Submodular Functions\nDeformable Model with a Complexity Independent from Image Resolution\nOn landmark selection and sampling in high-dimensional data analysis\nHodge Theory on Metric Spaces\nLearning an Interactive Segmentation System\nComputing the output distribution and selection probabilities of a stack  filter from the DNF of its positive Boolean function\nClinical gait data analysis based on Spatio-Temporal features\nMultilinear Biased Discriminant Analysis: A Novel Method for Facial  Action Unit Representation\nPerception of Motion and Architectural Form: Computational Relationships  between Optical Flow and Perspective\nFully Automatic Expression-Invariant Face Correspondence\nAutomatic Tuning of Interactive Perception Applications\nSignsWorld; Deeping Into the Silence World and Hearing Its Signs (State  of the Art)\nVision Paper: Towards an Understanding of the Limits of Map-Reduce  Computation\nObject Recognition with Multi-Scale Pyramidal Pooling Networks\nSupervised Texture Classification Using a Novel Compression-Based  Similarity Measure\nColor Constancy based on Image Similarity via Bilayer Sparse Coding\nStable Segmentation of Digital Image\nAutomatic Detection of Texture 
Defects Using Texture-Periodicity and  Gabor Wavelets\nImage-based Face Detection and Recognition: \"State of the Art\"\nJoint optimization of fitting & matching in multi-view reconstruction\nSpatio-Temporal Covariance Descriptors for Action and Gesture  Recognition\nHigher-order Segmentation via Multicuts\nActive Sensing as Bayes-Optimal Sequential Decision Making\nRecognition of Indian Sign Language in Live Video\nInfrared face recognition: a literature review\n6th International Symposium on Attention in Cognitive Systems 2013\nEfficient Energy Minimization for Enforcing Statistics\nCombining Spatio-Temporal Appearance Descriptors and Optical Flow for  Human Action Recognition in Video Data\nA Novel Illumination-Invariant Loss for Monocular 3D Pose Estimation\nA fast and robust algorithm to count topologically persistent holes in  noisy clouds\nFace Detection from still and Video Images using Unsupervised Cellular  Automata with K means clustering algorithm\nLow-Rank Modeling and Its Applications in Image Analysis\nHallucinating optimal high-dimensional subspaces\nFace Recognition Methods & Applications\nLearning detectors quickly using structured covariance matrices\nScalable Robust Matrix Recovery: Frank-Wolfe Meets Proximal Methods\nA Reverse Hierarchy Model for Predicting Eye Fixations\nFace Detection with a 3D Model\nPart-based R-CNNs for Fine-grained Category Detection\nTree-based iterated local search for Markov random fields with  applications in image analysis\nAggregation of local parametric candidates with exemplar-based occlusion  handling for optical flow\nScalable Greedy Algorithms for Transfer Learning\nA Multi-World Approach to Question Answering about Real-World Scenes  based on Uncertain Input\nCIDEr: Consensus-based Image Description Evaluation\nScale-Invariant Convolutional Neural Networks\nFeatures in Concert: Discriminative Feature Selection meets Unsupervised  Clustering\nVolumetric Bias in Segmentation and Reconstruction: Secrets and  
Solutions\nSee the Difference: Direct Pre-Image Reconstruction and Pose Estimation  by Differentiating HOG\nLearning to See by Moving\nDense Semantic Correspondence where Every Pixel is a Classifier\nPlace Recognition with Event-based Cameras and a Neural Implementation  of SeqSLAM\nGazeDPM: Early Integration of Gaze Information in Deformable Part Models\nRendering of Eyes for Eye-Shape Registration and Gaze Estimation\nLogDet Rank Minimization with Application to Subspace Clustering\nEnd-to-end Convolutional Network for Saliency Prediction\nLearning a Discriminative Model for the Perception of Realism in  Composite Images\nA Markov Random Field and Active Contour Image Segmentation Model for  Animal Spots Patterns\nEfficient non-greedy optimization of decision trees\nFRIST - Flipping and Rotation Invariant Sparsifying Transform Learning  and Applications\nApplying deep learning to classify pornographic images and videos\nSublabel-Accurate Convex Relaxation of Vectorial Multilabel Energies\nFast Object Localization Using a CNN Feature Map Based Multi-Scale  Search\nSelf-taught learning of a deep invariant representation for visual  tracking via temporal slowness principle\nEfficient Splitting-based Method for Global Image Smoothing\nFusing Deep Convolutional Networks for Large Scale Visual Concept  Classification\nCongruences and Concurrent Lines in Multi-View Geometry\nWho Leads the Clothing Fashion: Style, Color, or Texture? 
A  Computational Study\nVehicles Recognition Using Fuzzy Descriptors of Image Segments\nMAS for video objects segmentation and tracking based on active contours  and SURF descriptor\nGeneralized Max Pooling\nFingers' Angle Calculation using Level-Set Method\nVideoSET: Video Summary Evaluation through Text\nVisual Speech Recognition\nF-formation Detection: Individuating Free-standing Conversational Groups  in Images\nMining Mid-level Features for Action Recognition Based on Effective  Skeleton Representation\nTowards Open World Recognition\nOcclusion Edge Detection in RGB-D Frames using Deep Convolutional  Networks\nTexture analysis using volume-radius fractal dimension\nA Brief Survey of Recent Edge-Preserving Smoothing Algorithms on Digital  Images\nReconciling saliency and object center-bias hypotheses in explaining  free-viewing fixations\nClustering Assisted Fundamental Matrix Estimation\nExploring Integral Image Word Length Reduction Techniques for SURF  Detector\nAnticipating Visual Representations from Unlabeled Video\nNeural Activation Constellations: Unsupervised Part Model Discovery with  Convolutional Networks\nImaging Time-Series to Improve Classification and Imputation\nCompressing Convolutional Neural Networks\nIntegrated Inference and Learning of Neural Factors in Structural  Support Vector Machines\nUsing User Generated Online Photos to Estimate and Monitor Air Pollution  in Major Cities\nIterative Thresholded Bi-Histogram Equalization for Medical Image  Enhancement\nMaximum Persistency via Iterative Relaxed Inference with Graphical  Models\nVision-Based Road Detection using Contextual Blocks\nLearning Social Relation Traits from Face Images\nShapeNet: An Information-Rich 3D Model Repository\nEnhanced image feature coverage: Key-point selection using genetic  algorithms\nRandomized Low-Rank Dynamic Mode Decomposition for Motion Detection\nCan Pretrained Neural Networks Detect Anatomy?\nAutomatic Description Generation from Images: A Survey of 
Models,  Datasets, and Evaluation Measures\nEfficient Globally Optimal 2D-to-3D Deformable Shape Matching\nLipreading with Long Short-Term Memory\nLearnt quasi-transitive similarity for retrieval from large collections  of faces\nImage Captioning with Semantic Attention\nEfficient Global Point Cloud Alignment using Bayesian Nonparametric  Mixtures\nFrom line segments to more organized Gestalts\nLightweight Unsupervised Domain Adaptation by Convolutional Filter  Reconstruction\nSub-pixel accuracy edge fitting by means of B-spline\nInterActive: Inter-Layer Activeness Propagation\nFacial Expression Recognition from World Wild Web\nCNN based texture synthesize with Semantic segment\nBacterial foraging optimization based brain magnetic resonance image  segmentation\nWeighted Residuals for Very Deep Networks\nAttention Correctness in Neural Image Captioning\nConvolution by Evolution: Differentiable Pattern Producing Networks\nRobust 3D Hand Pose Estimation in Single Depth Images: from Single-View  CNN to Multi-View CNNs\nTheta-RBM: Unfactored Gated Restricted Boltzmann Machine for  Rotation-Invariant Representations\nSuperpixel-based Two-view Deterministic Fitting for Multiple-structure  Data\nCan DMD obtain a Scene Background in Color?\nGrid Loss: Detecting Occluded Faces\nUnrealCV: Connecting Computer Vision to Unreal Engine\nVisual Saliency Detection Based on Multiscale Deep CNN Features\nEnd-to-End Eye Movement Detection Using Convolutional Neural Networks\nImage Aesthetic Assessment: An Experimental Survey\nA Baseline for Detecting Misclassified and Out-of-Distribution Examples  in Neural Networks\nOptimization of Convolutional Neural Network using Microcanonical  Annealing Algorithm\nTangled Splines\nLearning Robust Video Synchronization without Annotations\nUniversal adversarial perturbations\nReal-Time Image Distortion Correction: Analysis and Evaluation of  FPGA-Compatible Algorithms\nRenderGAN: Generating Realistic Labeled Data\nSemi-Dense 3D Semantic Mapping 
from Monocular SLAM\nDeep Variational Inference Without Pixel-Wise Reconstruction\nFactorized Bilinear Models for Image Recognition\nAutoScaler: Scale-Attention Networks for Visual Correspondence\nMulti-Scale Saliency Detection using Dictionary Learning\nFast Video Classification via Adaptive Cascading of Deep Models\nQuad-networks: unsupervised learning to rank for interest point  detection\nExplaining Radiological Emphysema Subtypes with Unsupervised Texture  Prototypes: MESA COPD Study\nHarmonic Networks: Deep Translation and Rotation Equivariance\nSignature of Geometric Centroids for 3D Local Shape Description and  Partial Shape Matching\nSmart Content Recognition from Images Using a Mixture of Convolutional  Neural Networks\nTransforming Sensor Data to the Image Domain for Deep Learning - an  Application to Footstep Detection\nComputing Egomotion with Local Loop Closures for Egocentric Videos\nWide-Residual-Inception Networks for Real-time Object Detection\nEnd-to-End Interpretation of the French Street Name Signs Dataset\nEfficient Large-scale Approximate Nearest Neighbor Search on the GPU\nGeometry-Based Region Proposals for Real-Time Robot Detection of  Tabletop Objects\nRecent Advances in Features Extraction and Description Algorithms: A  Comprehensive Survey\nIn Defense of the Triplet Loss for Person Re-Identification\nR-C3D: Region Convolutional 3D Network for Temporal Activity Detection\nVisually grounded learning of keyword prediction from untranscribed  speech\nA Paradigm Shift: Detecting Human Rights Violations Through Web Images\nGraph Partitioning with Acyclicity Constraints\nGeneralized Rank Pooling for Activity Recognition\nCERN: Confidence-Energy Recurrent Network for Group Activity Recognition\nSimultaneous Stereo Video Deblurring and Scene Flow Estimation\nCNN-SLAM: Real-time dense monocular SLAM with learned depth prediction\nOn the Two-View Geometry of Unsynchronized Cameras\nJoint Semantic and Motion Segmentation for dynamic scenes using 
Deep  Convolutional Networks\nLocality Preserving Projections for Grassmann manifold\nSurvey of Visual Question Answering: Datasets and Techniques\nLearning Image Relations with Contrast Association Networks\nUnrolled Optimization with Deep Priors\nA Random-Fern based Feature Approach for Image Matching\nNetwork Sketching: Exploiting Binary Structure in Deep CNNs\nUnsupervised Adaptive Re-identification in Open World Dynamic Camera  Networks\nChanging Views on Curves and Surfaces\nThe Surfacing of Multiview 3D Drawings via Lofting and Occlusion  Reasoning\nCoresets for Triangulation\nEnzyNet: enzyme classification using 3D convolutional neural networks on  spatial representation\nHead Detection with Depth Images in the Wild\nA Novel Transfer Learning Approach upon Hindi, Arabic, and Bangla  Numerals using Convolutional Neural Networks\nFashioning with Networks: Neural Style Transfer to Design Clothes\nAssociative Domain Adaptation\nTraining Deep Networks to be Spatially Sensitive\nA discriminative view of MRF pre-processing algorithms\nRandom Binary Trees for Approximate Nearest Neighbour Search in Binary  Space\nChromaTag: A Colored Marker and Fast Detection Algorithm\nPose Guided Structured Region Ensemble Network for Cascaded Hand Pose  Estimation\nTowards Semantic Fast-Forward and Stabilized Egocentric Videos\nHuman Action Recognition System using Good Features and Multilayer  Perceptron Network\nRobust Stereo Feature Descriptor for Visual Odometry\nSynthesising Wider Field Images from Narrow-Field Retinal Video Acquired  Using a Low-Cost Direct Ophthalmoscope (Arclight) Attached to a Smartphone\nPerformance Guaranteed Network Acceleration via High-Order Residual  Quantization\nMedical Image Analysis using Convolutional Neural Networks: A Review\nReversible Architectures for Arbitrarily Deep Residual Neural Networks\nA Computational Model of Afterimages based on Simultaneous and  Successive Contrasts\nSocial Style Characterization from Egocentric 
Photo-streams\n3D Reconstruction with Low Resolution, Small Baseline and High Radial  Distortion Stereo Images\nFace Retrieval using Frequency Decoded Local Descriptor\nModeling Image Virality with Pairwise Spatial Transformer Networks\nHuman Detection for Night Surveillance using Adaptive Background  Subtracted Image\nImage Identification Using SIFT Algorithm: Performance Analysis against  Different Image Deformations\nToward predictive machine learning for active vision\nFeed Forward and Backward Run in Deep Convolution Neural Network\nLearning Multi-Modal Word Representation Grounded in Visual Context\nImage Registration of Very Large Images via Genetic Programming\nA Genetic Algorithm-Based Solver for Very Large Jigsaw Puzzles\nVisual Explanation by High-Level Abduction: On Answer-Set Programming  Driven Reasoning about Moving Objects\nA Rotation and a Translation Suffice: Fooling CNNs with Simple  Transformations\nCan We Teach Computers to Understand Art? Domain Adaptation for  Enhancing Deep Networks Capacity to De-Abstract Art\nRecurrent Attentional Reinforcement Learning for Multi-label Image  Recognition\nSuperPoint: Self-Supervised Interest Point Detection and Description\nAggregated Channels Network for Real-Time Pedestrian Detection\nLearning to Prune Filters in Convolutional Neural Networks\nA neural model of the locust visual system for detection of object  approaches with real-world scenes\nSESR: Single Image Super Resolution with Recursive Squeeze and  Excitation Networks\nFrom Hashing to CNNs: Training BinaryWeight Networks via Hashing\nSampling Superquadric Point Clouds with Normals\nUsing Trusted Data to Train Deep Networks on Labels Corrupted by Severe  Noise\nNeural Photometric Stereo Reconstruction for General Reflectance  Surfaces\nScalable Dense Non-rigid Structure-from-Motion: A Grassmannian  Perspective\nAutomatic Pixelwise Object Labeling for Aerial Imagery Using Stacked  U-Nets\nTOMAAT: volumetric medical image analysis as a cloud 
service\nEigendecomposition-free Training of Deep Networks with Zero  Eigenvalue-based Losses\nLifting Layers: Analysis and Applications\nContext Encoding for Semantic Segmentation\nFaceForensics: A Large-scale Video Dataset for Forgery Detection in  Human Faces\nReview of Deep Learning\nMarkerless Inside-Out Tracking for Interventional Applications\nEnd-to-End Saliency Mapping via Probability Distribution Prediction\nA Multi-Layer Approach to Superpixel-based Higher-order Conditional  Random Field for Semantic Image Segmentation\nEvaluation of the visual odometry methods for semi-dense real-time\nCommunity Cloud Computing\nImplementation of Hand Detection based Techniques for Human Computer  Interaction\nMassivizing Computer Systems: a Vision to Understand, Design, and  Engineer Computer Ecosystems through and beyond Modern Distributed Systems\nLocally Served Network Computers\nVisualising the structure of architectural open spaces based on shape  analysis\nAn explicit formula for the number of tunnels in digital objects\nNeural Networks with Complex and Quaternion Inputs\nNeural Network Clustering Based on Distances Between Objects\nRough Sets Computations to Impute Missing Data\nTowards understanding and modelling office daily life\nApproximation of a Fractional Order System by an Integer Order Model  Using Particle Swarm Optimization Technique\nFaster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound\nAn Iterative Fingerprint Enhancement Algorithm Based on Accurate  Determination of Orientation Flow\nImprovements of the 3D images captured with Time-of-Flight cameras\nBreast Cancer Detection Using Multilevel Thresholding\nReal-Time Implementation of Order-Statistics Based Directional Filters\nCost-Effective Implementation of Order-Statistics Based Vector Filters  Using Minimax Approximations\nClassification with Scattering Operators\nHandwritten Character Recognition of South Indian Scripts: A Review\nMulti Layer Analysis\nRobust seed selection 
algorithm for k-means type algorithms\nMultimodal similarity-preserving hashing\nNonparametric Edge Detection in Speckled Imagery\nA Plea for Neutral Comparison Studies in Computational Sciences\nCreation of Digital Test Form for Prepress Department\nCoupled quasi-harmonic bases\nApplications of Clifford's Geometric Algebra\nAngles between subspaces\nAnalysing Word Importance for Image Annotation\nRemoval and Contraction Operations in $n$D Generalized Maps for  Efficient Homology Computation\nRough Clustering Based Unsupervised Image Change Detection\nGabor Filter and Rough Clustering Based Edge Detection\nDefinition of Visual Speech Element and Research on a Method of  Extracting Feature Vector for Korean Lip-Reading\nInverse Renormalization Group Transformation in Bayesian Image  Segmentations\nThe color of smiling: computational synaesthesia of facial expressions\nResearch on the fast Fourier transform of image based on GPU\nComputational models of attention\nFaster method for Deep Belief Network based Object classification using  DWT\nFast $(1+ε)$-approximation of the Löwner extremal matrices of  high-dimensional symmetric matrices\nArtificial Neural Networks for Detection of Malaria in RBCs\nAn Optimization Method For Slice Interpolation Of Medical Images\nUnsupervised Deep Haar Scattering on Graphs\nExact Decoding on Latent Variable Conditional Models is NP-Hard\nMultiple Object Recognition with Visual Attention\nComplex-Valued Hough Transforms for Circles\nQuantum Energy Regression using Scattering Transforms\nDESAT: an SSW tool for SDO/AIA image de-saturation\nA Unified Deep Neural Network for Speaker and Language Recognition\nImage Retrieval Based on Binary Signature ang S-kGraph\nA massively parallel multi-level approach to a domain decomposition  method for the optical flow estimation with varying illumination\nNeuron detection in stack images: a persistent homology interpretation\nLearning to Compose Neural Networks for Question Answering\nAutomatic 
Moth Detection from Trap Images for Pest Management\nEfficient forward propagation of time-sequences in convolutional neural  networks using Deep Shifting\nLazy Evaluation of Convolutional Filters\nQuantitative Analysis of Saliency Models\nInterpreting extracted rules from ensemble of trees: Application to  computer-aided diagnosis of breast MRI\nA Scalable and Robust Framework for Intelligent Real-time Video  Surveillance\nVIBIKNet: Visual Bidirectional Kernelized Network for Visual Question  Answering\nImaging around corners with single-pixel detector by computational ghost  imaging\nAn Efficient Algebraic Solution to the Perspective-Three-Point Problem\nDual-Tree Wavelet Scattering Network with Parametric Log Transformation  for Object Classification\nEnhanced Local Binary Patterns for Automatic Face Recognition\nLow-Precision Batch-Normalized Activations\nLensless computational imaging through deep learning\nDeciding How to Decide: Dynamic Routing in Artificial Neural Networks\nPunny Captions: Witty Wordplay in Image Descriptions\nTime Stretch Inspired Computational Imaging\nPrune the Convolutional Neural Networks with Sparse Shrink\nBlitzNet: A Real-Time Deep Network for Scene Understanding\nAdaptive strategies for solving parameterized systems using homotopy  continuation\nHeat Kernel Smoothing in Irregular Image Domains\nGenetic Algorithm-Based Solver for Very Large Multiple Jigsaw Puzzles of  Unknown Dimensions and Piece Orientation\nOn Nearest Neighbors in Non Local Means Denoising\nParallel transport in shape analysis: a scalable numerical scheme\nBLADE: Filter Learning for General Purpose Computational Photography\nThe Enhanced Hybrid MobileNet\nViolable Contracts and Governance for Blockchain Applications\nSAR Image Despeckling Using Quadratic-Linear Approximated L1-Norm\nWRPN & Apprentice: Methods for Training and Inference using  Low-Precision Numerics\nFutureMapping: The Computational Structure of Spatial AI Systems\nMobility Enhancement for 
Elderly\nThe Cyborg Astrobiologist: Testing a Novelty-Detection Algorithm on Two  Mobile Exploration Systems at Rivas Vaciamadrid in Spain and at the Mars  Desert Research Station in Utah\nYin and Yang: Balancing and Answering Binary Visual Questions\nThe Stixel world: A medium-level representation of traffic scenes\n3D Interpreter Networks for Viewer-Centered Wireframe Modeling\nEfficient Point-to-Subspace Query in $\\ell^1$ with Application to Robust  Object Instance Recognition\nStructured learning of sum-of-submodular higher order energy functions\nService-oriented Communities: Visions and Contributions towards Social  Organizations\nMeasuring Atmospheric Scattering from Digital Images of Urban Scenery  using Temporal Polarization-Based Vision\nSequential Score Adaptation with Extreme Value Theory for Robust Railway  Track Inspection\nCircle detection using isosceles triangles sampling\nUnderstanding learned CNN features through Filter Decoding with  Substitution\nDeep multi-scale video prediction beyond mean square error\nSparse Coral Classification Using Deep Convolutional Neural Networks\nThe Curious Robot: Learning Visual Representations via Physical  Interactions\nResolving Language and Vision Ambiguities Together: Joint Segmentation &  Prepositional Attachment Resolution in Captioned Scenes\nSingle Image 3D Interpreter Network\nWe Can \"See\" You via Wi-Fi - WiFi Action Recognition via Vision-based  Methods\nSteps Towards a Theory of Visual Information: Active Perception,  Signal-to-Symbol Conversion and the Interplay Between Sensing and Control\nA Bimodal Co-Sparse Analysis Model for Image Processing\nMulti-Projector Color Structured-Light Vision\nWeighted Schatten $p$-Norm Minimization for Image Denoising and  Background Subtraction\nMOON: A Mixed Objective Optimization Network for the Recognition of  Facial Attributes\nDetecting Violent and Abnormal Crowd activity using Temporal Analysis of  Grey Level Co-occurrence Matrix (GLCM) Based Texture 
Measures\nDerivatives and inverse of a linear-nonlinear multi-layer spatial vision  model\nModelHub: Towards Unified Data and Lifecycle Management for Deep  Learning\nPsyPhy: A Psychophysics Driven Evaluation Framework for Visual  Recognition\nPay Attention to Those Sets! Learning Quantification from Images\nMachine Vision System for 3D Plant Phenotyping\nWebVision Challenge: Visual Learning and Understanding With Web Data\nDeep Steering: Learning End-to-End Driving Model from Spatial and  Temporal Visual Cues\nVQS: Linking Segmentations to Questions and Answers for Supervised  Attention in VQA and Question-Focused Semantic Segmentation\nTowards social pattern characterization in egocentric photo-streams\nAutomatic Ground Truths: Projected Image Annotations for Omnidirectional  Vision\nConstrained Deep Transfer Feature Learning and its Applications\nGrad-CAM++: Generalized Gradient-based Visual Explanations for Deep  Convolutional Networks\nAdversarial Learning of Structure-Aware Fully Convolutional Networks for  Landmark Localization\nIn-Bed Pose Estimation: Deep Learning with Shallow Dataset\nNo Blind Spots: Full-Surround Multi-Object Tracking for Autonomous  Vehicles using Cameras & LiDARs\nConstrained Deep Learning using Conditional Gradient and Applications in  Computer Vision\nVisual Psychophysics for Making Face Recognition Algorithms More  Explainable\nEnergy-Efficient Management of Data Center Resources for Cloud  Computing: A Vision, Architectural Elements, and Open Challenges\nWhy Size Matters: Feature Coding as Nystrom Sampling\nZNN - A Fast and Scalable Algorithm for Training 3D Convolutional  Networks on Multi-Core and Many-Core Shared Memory Machines\nHFirst: A Temporal Approach to Object Recognition\nAffectNet: A Database for Facial Expression, Valence, and Arousal  Computing in the Wild\nMobile Cloud Computing: A Review on Smartphone Augmentation Approaches\nComputation of the Hausdorff distance between sets of line segments in  parallel\nPhysical 
Computing With No Clock to Implement the Gaussian Pyramid of  SIFT Algorithm\nThe Theory of Computational Quasi-conformal Geometry on Point Clouds\nAccelerated Distance Computation with Encoding Tree for High Dimensional  Data\nTRIM: Triangulating Images for Efficient Registration\nOn Using Micro-Clouds to Deliver the Fog\nOptimal Piecewise Linear Function Approximation for GPU-based  Applications\nFlexible Camera Calibration Using a New Analytical Radial Undistortion  Formula with Application to Mobile Robot Localization\nEntity Based Peer-to-Peer in a Data Grid Environment\nA Java Based Architecture of P2P-Grid Middleware\nSpace and camera path reconstruction for omni-directional vision\nlambda-Connectedness Determination for Image Segmentation\nA Nonparametric Approach to 3D Shape Analysis from Digital Camera Images  - I. in Memory of W.P. Dayawansa\nMulti-Label MRF Optimization via Least Squares s-t Cuts\nMetric and Kernel Learning using a Linear Transformation\nICT in Universities of the Western Himalayan Region in India: Status,  Performance- An Assessment\nSynthesis of supervised classification algorithm using intelligent and  statistical tools\nMatching 2-D Ellipses to 3-D Circles with Application to Vehicle Pose  Estimation\nBinarizing Business Card Images for Mobile Devices\nRecognition of handwritten Roman Numerals using Tesseract open source  OCR engine\nRandomized hybrid linear modeling by local best-fit flats\nLACBoost and FisherBoost: Optimally Building Cascade Classifiers\nIncremental Training of a Detector Using Online Sparse  Eigen-decomposition\nRepairing People Trajectories Based on Point Clustering\nOnline Adaptive Decision Fusion Framework Based on Entropic Projections  onto Convex Sets with Application to Wildfire Detection in Video\nNatural images from the birthplace of the human eye\nModeling Dynamic Swarms\nSHREC 2011: robust feature detection and description benchmark\nAutomatic Detection of Ringworm using Local Binary Pattern (LBP)\nA 
Medial Axis Based Thinning Strategy for Character Images\nVariational Gaussian Process Dynamical Systems\nGeneralised Object Detection and Semantic Analysis: Casino Example using  Matlab\nSpatiotemporal Gabor filters: a new method for dynamic texture  recognition\nMulti-column Deep Neural Networks for Image Classification\nUsing Barriers to Reduce the Sensitivity to Edge Miscalculations of  Casting-Based Object Projection Feature Estimation\nTexture Classification Approach Based on Combination of Edge &  Co-occurrence and Local Binary Pattern\nNon-sparse Linear Representations for Visual Tracking with Online  Reservoir Metric Learning\nRobust Head Pose Estimation Using Contourlet Transform\nSpectral Graph Cut from a Filtering Point of View\nGeneralized sequential tree-reweighted message passing\nPoisson noise reduction with non-local PCA\nThe Stability of Convergence of Curve Evolutions in Vector Fields\nIncorporating Domain Knowledge in Matching Problems via Harmonic  Analysis\nA Survey of Recent View-based 3D Model Retrieval Methods\nDiscriminative Sparse Coding on Multi-Manifold for Data Representation  and Classification\nFull Object Boundary Detection by Applying Scale Invariant Features in a  Region Merging Segmentation Algorithm\nAn Automatic Algorithm for Object Recognition and Detection Based on  ASIFT Keypoints\nTracking Revisited using RGBD Camera: Baseline and Benchmark\nTraining Effective Node Classifiers for Cascade Classification\nAuto-pooling: Learning to Improve Invariance of Image Features from  Image Sequences\nChESS - Quick and Robust Detection of Chess-board Features\nRobust Face Recognition via Block Sparse Bayesian Learning\nSparse Camera Network for Visual Surveillance -- A Comprehensive Survey\nImage Interpolation Using Kriging Technique for Spatial Data\nGood Recognition is Non-Metric\nObject Detection in Real Images\nIntelligent Approaches to interact with Machines using Hand Gesture  Recognition in Natural way: A Survey\nA Method for 
Visuo-Spatial Classification of Freehand Shapes Freely  Sketched\nComputer vision applications for coronagraphic optical alignment and  image processing\nAutomatic Parameter Adaptation for Multi-object Tracking\nSparse Norm Filtering\nClassifying and Visualizing Motion Capture Sequences using Deep Neural  Networks\nCharacterizing Ambiguity in Light Source Invariant Shape from Shading\nSaliency-Guided Perceptual Grouping Using Motion Cues in Region-Based  Artificial Visual Attention\nA General Two-Step Approach to Learning-Based Hashing\nSurface Registration Using Genetic Algorithm in Reduced Search Space\nFlexible Visual Quality Inspection in Discrete Manufacturing\nUsing the Random Sprays Retinex Algorithm for Global Illumination  Estimation\nMulticlass Road Sign Detection using Multiplicative Kernel\nGlobal Localization Based on 3D Planar Surface Segments\nClassifying Traffic Scenes Using The GIST Image Descriptor\nAn Overview and Evaluation of Various Face and Eyes Detection Algorithms  for Driver Fatigue Monitoring Systems\nContextual Hypergraph Modelling for Salient Object Detection\nObject Recognition System Design in Computer Vision: a Universal  Approach\nSecond-order Shape Optimization for Geometric Inverse Problems in Vision\nRecognizing Image Style\nComparative Study Of Image Edge Detection Algorithms\nA compact formula for the derivative of a 3-D rotation in exponential  coordinates\nAn Algorithmic Theory of Dependent Regularizers, Part 1: Submodular  Structure\nDeep Convolutional Ranking for Multilabel Image Annotation\nEstimation of Human Body Shape and Posture Under Clothing\nUsing Web Co-occurrence Statistics for Improving Image Categorization\nNear-separable Non-negative Matrix Factorization with $\\ell_1$- and  Bregman Loss Functions\nWhat is usual in unusual videos? 
Trajectory snippet histograms for  discovering unusualness\nA bi-level view of inpainting - based image compression\nObject Tracking via Non-Euclidean Geometry: A Grassmann Approach\nSummarisation of Short-Term and Long-Term Videos using Texture and  Colour\nCross-Scale Cost Aggregation for Stereo Matching\nOn learning to localize objects with minimal supervision\nQuality-based Multimodal Classification Using Tree-Structured Sparsity\nBlind Recognition of Touched Keys: Attack and Countermeasures\nCapturing and Recognizing Objects Appearance Employing Eigenspace\nTraffic Monitoring Using M2M Communication\nAutomatic Tracker Selection w.r.t Object Detection Performance\nCost-Effective HITs for Relative Similarity Comparisons\niPiano: Inertial Proximal Algorithm for Non-Convex Optimization\nIndoor Activity Detection and Recognition for Sport Games Analysis\nRelative Facial Action Unit Detection\nBetter Feature Tracking Through Subspace Constraints\nAn Intelligent Pixel Replication Technique by Binary Decomposition for  Digital Image Zooming\nSpeeding up Convolutional Neural Networks with Low Rank Expansions\nESSP: An Efficient Approach to Minimizing Dense and Nonsubmodular Energy  Functions\nAdapted Approach for Fruit Disease Identification using Images\nOn the Optimal Solution of Weighted Nuclear Norm Minimization\nAutomated Fabric Defect Inspection: A Survey of Classifiers\nGeometric Polynomial Constraints in Higher-Order Graph Matching\nLayered Logic Classifiers: Exploring the `And' and `Or' Relations\nCircle detection by Harmony Search Optimization\nStrengthening the Effectiveness of Pedestrian Detection with Spatially  Pooled Features\nOptimizing Ranking Measures for Compact Binary Code Learning\nSimultaneous Detection and Segmentation\nOrientation covariant aggregation of local descriptors with embeddings\nJet-Images: Computer Vision Inspired Techniques for Jet Tagging\nPushbroom Stereo for High-Speed Navigation in Cluttered Environments\nOn Pairwise Costs for 
Network Flow Multi-Object Tracking\nScene Image is Non-Mutually Exclusive - A Fuzzy Qualitative Scene  Understanding\nEnhanced Random Forest with Image/Patch-Level Learning for Image  Understanding\nMKL-RT: Multiple Kernel Learning for Ratio-trace Problems via Convex  Optimization\nLearning visual biases from human imagination\nSupervised mid-level features for word image representation\nOn The Effect of Hyperedge Weights On Hypergraph Learning\nA Solution for Multi-Alignment by Transformation Synchronisation\nA Weighted Common Subgraph Matching Algorithm\nEdge Detection based on Kernel Density Estimation\nSubmodular meets Structured: Finding Diverse Subsets in  Exponentially-Large Structured Item Sets\nPart Detector Discovery in Deep Convolutional Neural Networks\n6 Seconds of Sound and Vision: Creativity in Micro-Videos\nDeep Deconvolutional Networks for Scene Parsing\nConceptLearner: Discovering Visual Concepts from Weakly Labeled Image  Collections\nIteratively Reweighted Graph Cut for Multi-label MRFs with Non-convex  Priors\nMid-level Deep Pattern Mining\nVisual Representations: Defining Properties and Deep Approximations\nAnalysing domain shift factors between videos and images for object  detection\nAn Effective Image Feature Classiffication using an improved SOM\nHOG based Fast Human Detection\nCombining Language and Vision with a Multimodal Skip-gram Model\nDeep Image: Scaling up Image Recognition\nCorrentropy Induced L2 Graph for Robust Subspace Clustering\nDeep Convolutional Neural Networks for Action Recognition Using Depth  Map Sequences\nRobust Face Recognition by Constrained Part-based Alignment\nA Light Transport Model for Mitigating Multipath Interference in TOF  Sensors\nPoint Context: An Effective Shape Descriptor for RST-invariant  Trajectory Recognition\nJoint Object and Part Segmentation using Deep Learned Potentials\nOn a fast bilateral filtering formulation using functional  rearrangements\nActivity recognition from videos with parallel 
hypergraph matching on  GPUs\nInterleaved Text/Image Deep Mining on a Large-Scale Radiology Database  for Automated Image Interpretation\nLearning Style Similarity for Searching Infographics\nA Deeper Look at Dataset Bias\nA Two-Layer Local Constrained Sparse Coding Method for Fine-Grained  Visual Categorization\nMonocular Object Instance Segmentation and Depth Ordering with CNNs\nMRF Optimization by Graph Approximation\nCAT2000: A Large Scale Fixation Dataset for Boosting Saliency Research\nReproducible Evaluation of Pan-Tilt-Zoom Tracking\nAction Recognition with Trajectory-Pooled Deep-Convolutional Descriptors\nKinect Range Sensing: Structured-Light versus Time-of-Flight Kinect\nRender for CNN: Viewpoint Estimation in Images Using CNNs Trained with  Rendered 3D Model Views\nDesign and Implementation of Real-time Algorithms for Eye Tracking and  PERCLOS Measurement for on board Estimation of Alertness of Drivers\nThe Minimum Spanning Tree of Maximum Entropy\nImage Segmentation Using Hierarchical Merge Tree\nFast Detection of Curved Edges at Low SNR\nTraining a Convolutional Neural Network for Appearance-Invariant Place  Recognition\nImproved Deep Convolutional Neural Network For Online Handwritten  Chinese Character Recognition using Domain-Specific Knowledge\nLearning to count with deep object features\nParsimonious Labeling\nVisual Data Deblocking using Structural Layer Priors\nFeature Representation in Convolutional Neural Networks\nNeural Network Classifiers for Natural Food Products\nLifting GIS Maps into Strong Geometric Context for Scene Understanding\nData-free parameter pruning for Deep Neural Networks\nRelating Cascaded Random Forests to Deep Convolutional Neural Networks  for Semantic Segmentation\nZero-Shot Domain Adaptation via Kernel Regression on the Grassmannian\nAction recognition in still images by latent superpixel classification\nBregman Iteration for Correspondence Problems: A Study of Optical Flow\nAugmenting Bag-of-Words: Data-Driven 
Discovery of Temporal and  Structural Information for Activity Recognition\nEgocentric Field-of-View Localization Using First-Person Point-of-View  Devices\nLearning Data-driven Reflectance Priors for Intrinsic Image  Decomposition\nDeepFix: A Fully Convolutional Neural Network for predicting Human Eye  Fixations\nWide-Area Image Geolocalization with Aerial Reference Imagery\nFine-Grained Product Class Recognition for Assisted Shopping\nMultiresolution hierarchy co-clustering for semantic segmentation in  sequences with small variations\nTowards Reversible De-Identification in Video Sequences Using 3D Avatars  and Steganography\nPersonalized Age Progression with Aging Dictionary\nImage Parsing with a Wide Range of Classes and Scene-Level Context\nGeometric Context from Videos\nFinding Temporally Consistent Occlusion Boundaries in Videos using  Geometric Context\nLinear Shape Deformation Models with Local Support Using Graph-based  Structured Matrix Factorisation\nRegional Active Contours based on Variational level sets and Machine  Learning for Image Segmentation\nColor Space Transformation Network\nBackground Modeling Using Adaptive Pixelwise Kernel Variances in a  Hybrid Feature Space\nRecovering hard-to-find object instances by sampling context-based  object proposals\nReview of Person Re-identification Techniques\nAn Efficient Multilinear Optimization Framework for Hypergraph Matching\nWeakly Supervised Deep Detection Networks\nSemantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer\nOnline Supervised Hashing for Ever-Growing Datasets\nFrom Images to Sentences through Scene Description Graphs using  Commonsense Reasoning and Knowledge\nVisual7W: Grounded Question Answering in Images\nAn Adaptive Data Representation for Robust Point-Set Registration and  Merging\nLearning to Assign Orientations to Feature Points\nStandard methods for inexpensive pollen loads authentication by means of  computer vision and machine learning\nRobust Face Alignment 
Using a Mixture of Invariant Experts\nSensory Polymorphism and Behavior: When Machine Vision Meets Monkey Eyes\nMoral Lineage Tracing\nLearning Structured Inference Neural Networks with Label Relations\nUnsupervised Representation Learning with Deep Convolutional Generative  Adversarial Networks\nWIDER FACE: A Face Detection Benchmark\nDeepCut: Joint Subset Partition and Labeling for Multi Person Pose  Estimation\nGround-truth dataset and baseline evaluations for image base-detail  separation algorithms\nTransCut: Transparent Object Segmentation from a Light-Field Image\nConstrained Structured Regression with Convolutional Neural Networks\nDenseCap: Fully Convolutional Localization Networks for Dense Captioning\nRecurrent Instance Segmentation\nIterative Instance Segmentation\nReal-Time Depth Refinement for Specular Objects\nPHOCNet: A Deep Convolutional Neural Network for Word Spotting in  Handwritten Documents\nRGBD Datasets: Past, Present and Future\nRadiometric Scene Decomposition: Scene Reflectance, Illumination, and  Geometry from RGB-D Images\nA Convolutional Neural Network Neutrino Event Classifier\nLOMo: Latent Ordinal Model for Facial Analysis in Videos\nA CNN Based Scene Chinese Text Recognition Algorithm With Synthetic Data  Engine\nCNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard  Examples\nPrivacy-Preserving Human Activity Recognition from Extreme Low  Resolution\nTraining Region-based Object Detectors with Online Hard Example Mining\nJoint Unsupervised Learning of Deep Representations and Image Clusters\nRemoving Clouds and Recovering Ground Observations in Satellite Image  Sequences via Temporally Contiguous Robust Matrix Completion\nVisual Storytelling\nDeep Feature Based Contextual Model for Object Detection\nImproving the Robustness of Deep Neural Networks via Stability Training\nACD: Action Concept Discovery from Image-Sentence Corpora\nPixel-level Encoding and Depth Layering for Instance-level Semantic  Labeling\nDeep 
Saliency with Encoded Low level Distance Map and High Level  Features\nWarpNet: Weakly Supervised Matching for Single-view Reconstruction\nTrack and Transfer: Watching Videos to Simulate Strong Human Supervision  for Weakly-Supervised Object Detection\nAutomatic 3D Reconstruction of Manifold Meshes via Delaunay  Triangulation and Mesh Sweeping\nRotation-Invariant Restricted Boltzmann Machine Using Shared Gradient  Filters\nCrowd Counting via Weighted VLAD on Dense Attribute Feature Maps\nMesh Interest Point Detection Based on Geometric Measures and Sparse  Refinement\nFaster R-CNN Features for Instance Search\nSparse vs. Non-sparse: Which One Is Better for Practical Visual  Tracking?\nLeveraging Union of Subspace Structure to Improve Constrained Clustering\nComparative study and enhancement of Camera Tampering Detection  algorithms\nEnd-to-End Localization and Ranking for Relative Attributes\nFashion Landmark Detection in the Wild\nLearning Dynamic Hierarchical Models for Anytime Scene Labeling\nSSHMT: Semi-supervised Hierarchical Merge Tree for Electron Microscopy  Image Segmentation\nAbout Pyramid Structure in Convolutional Neural Networks\nSeeing with Humans: Gaze-Assisted Neural Image Captioning\nSemantic Understanding of Scenes through the ADE20K Dataset\nA Recurrent Encoder-Decoder Network for Sequential Face Alignment\nDetecting Vanishing Points using Global Image Context in a Non-Manhattan  World\nA 4D Light-Field Dataset and CNN Architectures for Material Recognition\nSympathy for the Details: Dense Trajectories and Hybrid Classification  Architectures for Action Recognition\nTemporal Activity Detection in Untrimmed Videos with Recurrent Neural  Networks\nMulti-Person Pose Estimation with Local Joint-to-Person Associations\nMeasuring Machine Intelligence Through Visual Question Answering\nSpatio-Colour Asplünd 's Metric and Logarithmic Image Processing for  Colour Images (LIPC)\nA Multiple Component Matching Framework for Person 
Re-Identification\nDiscriminately Decreasing Discriminability with Learned Image Filters\nRuntime Guarantees for Regression Problems\nInsights from Classifying Visual Concepts with Multiple Kernel Learning\nImage Retrieval using Histogram Factorization and Contextual Similarity  Learning\nA New Approach To Two-View Motion Segmentation Using Global Dimension  Minimization\nA Health Monitoring System for Elder and Sick Persons\nA Bag of Visual Words Approach for Symbols-Based Coarse-Grained Ancient  Coin Classification\nCompact Relaxations for MAP Inference in Pairwise MRFs with Piecewise  Linear Priors\nSeeing What You're Told: Sentence-Guided Activity Recognition In Video\nBrain MRI Segmentation with Fast and Globally Convex Multiphase Active  Contours\nGraph Cuts with Interacting Edge Costs - Examples, Approximations, and  Algorithms\nAutomatic Estimation of Live Coffee Leaf Infection based on Image  Processing Techniques\nCircle detection using electro-magnetism optimization\nContinuous Action Recognition Based on Sequence Alignment\nShape-from-intrinsic operator\nLog-Euclidean Bag of Words for Human Action Recognition\nPlaying with Duality: An Overview of Recent Primal-Dual Approaches for  Solving Large-Scale Optimization Problems\nInteractively Test Driving an Object Detector: Estimating Performance on  Unlabeled Data\nVery Deep Convolutional Networks for Large-Scale Image Recognition\nComparing Feature Detectors: A bias in the repeatability criteria, and  how to correct it\nVisual Words for Automatic Lip-Reading\nDetector Discovery in the Wild: Joint Multiple Instance and  Representation Learning\nMetric Learning Driven Multi-Task Structured Output Optimization for  Robust Keypoint Tracking\nActions and Attributes from Wholes and Parts\nWeb image annotation by diffusion maps manifold learning algorithm\nCandidate Constrained CRFs for Loss-Aware Structured Prediction\nOptimizing Over Radial Kernels on Compact Manifolds\nIranian cashes recognition using 
mobile\nVisual Scene Representations: Contrast, Scaling and Occlusion\nGabor wavelets combined with volumetric fractal dimension applied to  texture analysis\nDetect2Rank : Combining Object Detectors Using Learning to Rank\nDriver distraction detection and recognition using RGB-D sensor\nBeyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image  Segmentation and Cosegmentation\nClothing Co-Parsing by Joint Image Segmentation and Labeling\nCrowded Scene Analysis: A Survey\nModeling Brain Circuitry over a Wide Range of Scales\nA Comprehensive Survey on Pose-Invariant Face Recognition\nRectified Factor Networks\nTotal variation on a tree\nWeakly Supervised Object Localization with Multi-fold Multiple Instance  Learning\nBethe Learning of Conditional Random Fields via MAP Decoding\nWhat's Cookin'? Interpreting Cooking Videos using Text, Speech and  Vision\nDense image registration and deformable surface reconstruction in  presence of occlusions and minimal texture\n3D Object Class Detection in the Wild\nSign Language Fingerspelling Classification from Depth and Color Images  using a Deep Belief Network\nEfficient piecewise training of deep structured models for semantic  segmentation\nSeparable time-causal and time-recursive spatio-temporal receptive  fields\nEgo-Object Discovery\nKernelized Low Rank Representation on Grassmann Manifolds\nLow Rank Representation on Grassmann Manifolds: An Extrinsic Perspective\nMOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking\nLearning Multiple Visual Tasks while Discovering their Structure\nColor Constancy Using CNNs\nVisual Recognition Using Directional Distribution Distance\nPreprint Touch-less Interactive Augmented Reality Game on Vision Based  Wearable Device\nFlowNet: Learning Optical Flow with Convolutional Networks\nIdentifying Reliable Annotations for Large Scale Image Segmentation\nA Flexible Tensor Block Coordinate Ascent Scheme for Hypergraph Matching\nHyperspectral Image Classification and 
Clutter Detection via Multiple  Structural Embeddings and Dimension Reductions\nUnsupervised domain adaption dictionary learning for visual recognition\nA Novel Approach Towards Clustering Based Image Segmentation\nBoosting Optical Character Recognition: A Super-Resolution Approach\nICDAR 2015 Text Reading in the Wild Competition\nSlow and steady feature analysis: higher order temporal coherence in  video\nTime Series Classification using the Hidden-Unit Logistic Model\nFast ADMM Algorithm for Distributed Optimization with Adaptive Penalty\nKernelized Multiview Projection\nImage Representations and New Domains in Neural Image Captioning\nSeeing Behind the Camera: Identifying the Authorship of a Photograph\nAn end-to-end generative framework for video segmentation and  recognition\nConvexity Shape Constraints for Image Segmentation\nObject Proposals for Text Extraction in the Wild\nDeep Multi-task Learning for Railway Track Inspection\nFacial Descriptors for Human Interaction Recognition In Still Images\nAttribute-Graph: A Graph based approach to Image Ranking\nAlgebraic Clustering of Affine Subspaces\nAutomatic Concept Discovery from Parallel Text and Visual Corpora\nLearning FRAME Models Using CNN Filters\nAttribute2Image: Conditional Image Generation from Visual Attributes\nSublabel-Accurate Relaxation of Nonconvex Energies\nScalable domain adaptation of convolutional neural networks\nPseudo-Bayesian Robust PCA: Algorithms and Analyses\n3D Reconstruction of Crime Scenes and Design Considerations for an  Interactive Investigation Tool\nSR-Clustering: Semantic Regularized Clustering for Egocentric Photo  Streams Segmentation\nA Latent-Variable Lattice Model\nG-CNN: an Iterative Grid Based Object Detector\nDenoising and Completion of 3D Data via Multidimensional Dictionary  Learning\nMultimodal Classification of Events in Social Media\nRobust Method of Vote Aggregation and Proposition Verification for  Invariant Local Features\nGamifying Video Object 
Segmentation\nKernelized LRR on Grassmann Manifolds for Subspace Clustering\nTemporal Action Localization in Untrimmed Videos via Multi-stage CNNs\nFacial Expression Recognition in the Wild using Rich Deep Features\nA Comparative Study of Object Trackers for Infrared Flying Bird Tracking\nNeighborhood Preserved Sparse Representation for Robust Classification  on Symmetric Positive Definite Matrices\nA Grassmannian Graph Approach to Affine Invariant Feature Matching\nA Large Dataset of Object Scans\nDAP3D-Net: Where, What and How Actions Occur in Videos?\nGlobal Deconvolutional Networks for Semantic Segmentation\nContextual Media Retrieval Using Natural Language Queries\nEvaluation of Deep Learning based Pose Estimation for Sign Language  Recognition\nGOGMA: Globally-Optimal Gaussian Mixture Alignment\nWeakly Supervised Localization using Deep Feature Maps\nMOT16: A Benchmark for Multi-Object Tracking\nShallow and Deep Convolutional Networks for Saliency Prediction\nDrift Robust Non-rigid Optical Flow Enhancement for Long Sequences\nTemporally coherent 4D reconstruction of complex dynamic scenes\nDeep Interactive Object Selection\nPushing the Limits of Deep CNNs for Pedestrian Detection\nLearning Domain-Invariant Subspace using Domain Features and  Independence Maximization\nImage Co-localization by Mimicking a Good Detector's Confidence Score  Distribution\nEnsemble of Deep Convolutional Neural Networks for Learning to Detect  Retinal Vessels in Fundus Images\nUnderstanding and Improving Convolutional Neural Networks via  Concatenated Rectified Linear Units\nLearning Image Matching by Simply Watching Video\nTowards Viewpoint Invariant 3D Human Pose Estimation\nRobust cDNA microarray image segmentation and analysis technique based  on Hough circle transform\nA Diagram Is Worth A Dozen Images\nMode-Seeking on Hypergraphs for Robust Geometric Model Fitting\nRecognizing Car Fluents from Video\nDo You See What I Mean? 
Visual Resolution of Linguistic Ambiguities\nVideo Interpolation using Optical Flow and Laplacian Smoothness\nGenerating Visual Explanations\nMulti-Cue Zero-Shot Learning with Strong Supervision\nLatent Embeddings for Zero-shot Classification\nRolling Shutter Camera Relative Pose: Generalized Epipolar Geometry\nComparison of Optimization Methods in Optical Flow Estimation\nPatch-based Texture Synthesis for Image Inpainting\nRobust Optical Flow Estimation of Double-Layer Images under Transparency  or Reflection\nLearning Discriminative Features with Class Encoder\nSemiContour: A Semi-supervised Learning Approach for Contour Detection\nLIME: A Method for Low-light IMage Enhancement\nMask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained  Image Recognition\nSpontaneous vs. Posed smiles - can we tell the difference?\nAn Analysis of Deep Neural Network Models for Practical Applications\nReal-Time Human Motion Capture with Multiple Depth Cameras\nSNN: Stacked Neural Networks\nLearning the image processing pipeline\nGeneralizing the Convolution Operator to extend CNNs to Irregular  Domains\nSemi-Supervised Domain Adaptation for Weakly Labeled Semantic Video  Object Segmentation\nMachine Learning Techniques and Applications For Ground-based Image  Analysis\nMultiple Human Tracking in RGB-D Data: A Survey\nV-Net: Fully Convolutional Neural Networks for Volumetric Medical Image  Segmentation\nThe ND-IRIS-0405 Iris Image Dataset\nA Hierarchical Pose-Based Approach to Complex Action Understanding Using  Dictionaries of Actionlets and Motion Poselets\nDecomposeMe: Simplifying ConvNets for End-to-End Learning\nEye Tracking for Everyone\nPragmatic factors in image description: the case of negations\nAn active efficient coding model of the optokinetic nystagmus\nAugmenting Supervised Neural Networks with Unsupervised Objectives for  Large-scale Image Classification\nImage Restoration Using Convolutional Auto-encoders with Symmetric Skip  Connections\nUnsupervised 
Learning of 3D Structure from Images\nDeep Learning of Appearance Models for Online Object Tracking\nLearning to Hash with Binary Deep Neural Network\nGeometry-Informed Material Recognition\nBinary Hashing with Semidefinite Relaxation and Augmented Lagrangian\nOn Differentiating Parameterized Argmin and Argmax Problems with  Application to Bi-level Optimization\nA Local-Global Approach to Semantic Segmentation in Aerial Images\nInteractive Illumination Invariance\nFeature Descriptors for Tracking by Detection: a Benchmark\nTemporal Model Adaptation for Person Re-Identification\nSemantic Clustering for Robust Fine-Grained Scene Recognition\nMS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition\nStereo Video Deblurring\nDeep Retinal Image Understanding\nReconstructing Articulated Rigged Models from RGB-D Videos\nDense Motion Estimation for Smoke\nLearning Action Concept Trees and Semantic Alignment Networks from  Image-Description Data\nRobust Structure from Motion in the Presence of Outliers and Missing  Data\nAn empirical study on the effects of different types of noise in image  classification tasks\nGenerative Visual Manipulation on the Natural Image Manifold\nTowards Deep Compositional Networks\n3D Face Reconstruction by Learning from Synthetic Data\nGeThR-Net: A Generalized Temporally Hybrid Recurrent Neural Network for  Multimodal Information Fusion\nLearning camera viewpoint using CNN to improve 3D body pose estimation\nA scalable convolutional neural network for task-specified scenarios via  knowledge distillation\nOn Support Relations and Semantic Scene Graphs\nThe Color of the Cat is Gray: 1 Million Full-Sentences Visual Question  Answering (FSVQA)\nRealtime Hierarchical Clustering based on Boundary and Surface  Statistics\nVisual Fashion-Product Search at SK Planet\nOptimistic and Pessimistic Neural Networks for Scene and Object  Recognition\nVideo Summarization using Deep Semantic Features\nSimilarity Mapping with Enhanced Siamese Network 
for Multi-Object  Tracking\nReal-Time RGB-D based Template Matching Pedestrian Detection\nA novel and effective scoring scheme for structure classification and  pairwise similarity measurement\nImpatient DNNs - Deep Neural Networks with Dynamic Time Budgets\nMultiple Instance Learning Convolutional Neural Networks for Object  Recognition\nSpatio-Temporal Attention Models for Grounded Video Captioning\nEnhanced Object Detection via Fusion With Prior Beliefs from Image  Classification\nMaxmin convolutional neural networks for image classification\nA New Distance Measure for Non-Identical Data with Application to Image  Classification\nA Detailed Rubric for Motion Segmentation\nInitialization and Coordinate Optimization for Multi-way Matching\nUMDFaces: An Annotated Face Dataset for Training Deep Networks\nBoosting Image Captioning with Attributes\nAction Recognition Based on Joint Trajectory Maps Using Convolutional  Neural Networks\nGender Politics in the 2016 U.S. Presidential Election: A Computer  Vision Approach\nReal Time Video Analysis using Smart Phone Camera for Stroboscopic Image\nAssociative Embedding: End-to-End Learning for Joint Detection and  Grouping\nCross Domain Knowledge Transfer for Person Re-identification\nAn End-to-End Spatio-Temporal Attention Model for Human Action  Recognition from Skeleton Data\nDeepVO: A Deep Learning approach for Monocular Visual Odometry\nRecurrent Memory Addressing for describing videos\nMulti-Scale Anisotropic Fourth-Order Diffusion Improves Ridge and Valley  Localization\nDense Captioning with Joint Inference and Visual Context\nRecurrent Attention Models for Depth-Based Person Identification\n3D Image Reconstruction from X-Ray Measurements with Overlap\nAlternating Direction Graph Matching\nMulti-View 3D Object Detection Network for Autonomous Driving\nMulti-Modal Mean-Fields via Cardinality-Based Clamping\nRobotic Grasp Detection using Deep Convolutional Neural Networks\nDiscriminative Correlation Filter with Channel 
and Spatial Reliability\nGuessWhat?! Visual object discovery through multi-modal dialogue\nFast deterministic tourist walk for texture analysis\nIt's Written All Over Your Face: Full-Face Appearance-Based Gaze  Estimation\nObject Detection Free Instance Segmentation With Labeling  Transformations\nDeep, Dense, and Low-Rank Gaussian Conditional Random Fields\nImage Based Appraisal of Real Estate Properties\nPOSEidon: Face-from-Depth for Driver Pose Estimation\nObject-Centric Representation Learning from Unlabeled Videos\nPerspective Transformer Nets: Learning Single-View 3D Object  Reconstruction without 3D Supervision\nFood Image Recognition by Using Convolutional Neural Networks (CNNs)\nOn-Demand Learning for Deep Image Restoration\nRicher Convolutional Features for Edge Detection\nDeep Multi-scale Convolutional Neural Network for Dynamic Scene  Deblurring\nFeedback Neural Network for Weakly Supervised Geo-Semantic Segmentation\nUnderstanding and Mapping Natural Beauty\nPaying More Attention to Attention: Improving the Performance of  Convolutional Neural Networks via Attention Transfer\nDeep Convolutional Poses for Human Interaction Recognition in Monocular  Videos\nHow do people explore virtual environments?\nDeep Function Machines: Generalized Neural Networks for Topological  Layer Expression\nFusionNet: A deep fully residual convolutional neural network for image  segmentation in connectomics\nA Message Passing Algorithm for the Minimum Cost Multicut Problem\nA Study of Lagrangean Decompositions and Dual Ascent Solvers for Graph  Matching\nVideo Propagation Networks\nLearning a No-Reference Quality Metric for Single-Image Super-Resolution\nDeeply Aggregated Alternating Minimization for Image Restoration\nWide-Slice Residual Networks for Food Recognition\nGlobally Optimal Object Tracking with Fully Convolutional Networks\nQuantum Clustering and Gaussian Mixtures\nRotation equivariant vector field networks\nAction Recognition Based on Joint Trajectory Maps with 
Convolutional  Neural Networks\nRobust and Real-time Deep Tracking Via Multi-Scale Domain Adaptation\nLearning from Synthetic Humans\nSee the Glass Half Full: Reasoning about Liquid Containers, their Volume  and Content\nCNN-based Segmentation of Medical Imaging Data\nTwo-view 3D Reconstruction for Food Volume Estimation\nBandwidth limited object recognition in high resolution imagery\nComplex Event Recognition from Images with Few Training Examples\nNormative theory of visual receptive fields\nLarge Scale Novel Object Discovery in 3D\nSide Information in Robust Principal Component Analysis: Algorithms and  Applications\nAn Analysis of 1-to-First Matching in Iris Recognition\nAn Experimental Study of Deep Convolutional Features For Iris  Recognition\nDesigning Deep Convolutional Neural Networks for Continuous Object  Orientation Estimation\nGuided Optical Flow Learning\nPredicting Privileged Information for Height Estimation\nJoint Discovery of Object States and Manipulation Actions\nOn-the-Fly Adaptation of Regression Forests for Online Camera  Relocalisation\nGraph Based Over-Segmentation Methods for 3D Point Clouds\nDeep Multi-camera People Detection\nEMNIST: an extension of MNIST to handwritten letters\nLearning to Detect Human-Object Interactions\nUnsupervised Diverse Colorization via Generative Adversarial Networks\nAnalyzing Learned Convnet Features with Dirichlet Process Gaussian  Mixture Models\nII-FCN for skin lesion analysis towards melanoma detection\nWeakly- and Semi-Supervised Object Detection with  Expectation-Maximization Algorithm\nGraph-based Isometry Invariant Representation Learning\nBridging Saliency Detection to Weakly Supervised Object Detection Based  on Self-paced Curriculum Learning\nWavelet Domain Residual Network (WavResNet) for Low-Dose X-ray CT  Reconstruction\nDeep Head Pose Estimation from Depth Data for In-car Automotive  Applications\nSRN: Side-output Residual Network for Object Symmetry Detection in the  Wild\nViraliency: Pooling 
Local Virality\nEvaluating Deep Convolutional Neural Networks for Material  Classification\nDetection of Human Rights Violations in Images: Can Convolutional Neural  Networks help?\nZero-Shot Learning - The Good, the Bad and the Ugly\nEnd-to-end Binary Representation Learning via Direct Binary Embedding\nZero-Shot Recognition using Dual Visual-Semantic Mapping Paths\nJoint Epipolar Tracking (JET): Simultaneous optimization of epipolar  geometry and feature correspondences\nTexture segmentation with Fully Convolutional Networks\nConvolutional Neural Network on Three Orthogonal Planes for Dynamic  Texture Classification\nConvolutional neural network architecture for geometric matching\nOn the Limitation of Convolutional Neural Networks in Recognizing  Negative Images\nSelf corrective Perturbations for Semantic Segmentation and  Classification\nPlanar Object Tracking in the Wild: A Benchmark\nDeep Residual Learning for Instrument Segmentation in Robotic Surgery\nLearned Multi-Patch Similarity\nMIHash: Online Hashing with Mutual Information\nIntroduction To The Monogenic Signal\nMultiple Instance Detection Network with Online Instance Classifier  Refinement\nCompositional Human Pose Regression\nSparse Autoencoder for Unsupervised Nucleus Detection and Representation  in Histopathology Images\n3D Object Reconstruction from Hand-Object Interactions\nOptic Disc and Cup Segmentation Methods for Glaucoma Detection with  Modification of U-Net Convolutional Neural Network\nDeep Depth From Focus\nTwo Stream LSTM: A Deep Fusion Framework for Human Action Recognition\nImproving Vision-based Self-positioning in Intelligent Transportation  Systems via Integrated Lane and Vehicle Detection\nClassification of Diabetic Retinopathy Images Using Multi-Class  Multiple-Instance Learning Based on Color Correlogram Features\nGenerate To Adapt: Aligning Domains using Generative Adversarial  Networks\nSeismic facies recognition based on prestack data using deep  convolutional 
autoencoder\nModeling Temporal Dynamics and Spatial Configurations of Actions Using  Two-Stream Recurrent Neural Networks\nAdaptive Relaxed ADMM: Convergence Theory and Practical Implementation\nTracking the Trackers: An Analysis of the State of the Art in Multiple  Object Tracking\nDOPE: Distributed Optimization for Pairwise Energies\nInterpretable Explanations of Black Boxes by Meaningful Perturbation\nLearning Two-Branch Neural Networks for Image-Text Matching Tasks\nDeep Contextual Recurrent Residual Networks for Scene Labeling\nProvable Self-Representation Based Outlier Detection in a Union of  Subspaces\nDiscriminative Bimodal Networks for Visual Localization and Detection  with Natural Language Queries\nAnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive  Features For Semantic Matching\nHarvesting Multiple Views for Marker-less 3D Human Pose Annotations\nA Fuzzy Brute Force Matching Method for Binary Image Features\nAccurate Optical Flow via Direct Cost Volume Processing\nJoint Layout Estimation and Global Multi-View Registration for Indoor  Reconstruction\nInception Recurrent Convolutional Neural Network for Object Recognition\nBAM! 
The Behance Artistic Media Dataset for Recognition Beyond  Photography\nAction Understanding with Multiple Classes of Actors\nObject Discovery via Cohesion Measurement\nGeneralized orderless pooling performs implicit salient matching\nRotation Averaging and Strong Duality\nAttributes2Classname: A discriminative model for attribute-based  unsupervised zero-shot learning\nAuto-painter: Cartoon Image Generation from Sketch by Using Conditional  Generative Adversarial Networks\nUnsupervised learning of object landmarks by factorized spatial  embeddings\nWhat Can Help Pedestrian Detection?\nGenerative Cooperative Net for Image Generation and Data Augmentation\nResidual Squeeze VGG16\nCHAM: action recognition using convolutional hierarchical attention  model\nPredicting the Driver's Focus of Attention: the DR(eye)VE Project\nLearning 3D Object Categories by Looking Around Them\nNeural Style Transfer: A Review\nObject-Level Context Modeling For Scene Classification with Context-CNN\nRevisiting IM2GPS in the Deep Learning Era\nCooperative Learning with Visual Attributes\nLocalized LRR on Grassmann Manifolds: An Extrinsic View\nSparse Coding on Stereo Video for Object Detection\nClassification and Retrieval of Digital Pathology Scans: A New Dataset\nFacial Expression Recognition Using Enhanced Deep 3D Convolutional  Neural Networks\nLook, Listen and Learn\nHashing as Tie-Aware Learning to Rank\nDeep Learning Improves Template Matching by Normalized Cross Correlation\nReal-Time Background Subtraction Using Adaptive Sampling and Cascade of  Gaussians\nDilated Residual Networks\nTowards Metamerism via Foveated Style Transfer\nResnetCrowd: A Residual Deep Learning Architecture for Crowd Counting,  Violent Behaviour Detection and Crowd Density Level Classification\nLine Profile Based Segmentation Algorithm for Touching Corn Kernels\nLearning by Association - A versatile semi-supervised training method  for neural networks\nDeep Frame Interpolation\nFacial Emotion Detection Using 
Convolutional Neural Networks and  Representational Autoencoder Units\nDeep Alignment Network: A convolutional neural network for robust face  alignment\nLearning to Learn from Noisy Web Videos\nOkutama-Action: An Aerial View Video Dataset for Concurrent Human Action  Detection\nPoint Linking Network for Object Detection\nDeep Learning-Based Food Calorie Estimation Method in Dietary Assessment\nJoint Max Margin and Semantic Features for Continuous Event Detection in  Complex Scenes\nTeaching Compositionality to CNNs\nRecent Progress of Face Image Synthesis\nPersonalized Automatic Estimation of Self-reported Pain Intensity from  Facial Expressions\nSynthesis of Near-regular Natural Textures\nDeep Network Flow for Multi-Object Tracking\nIlluminating Pedestrians via Simultaneous Detection & Segmentation\nWhat's Mine is Yours: Pretrained CNNs for Limited Training Sonar ATR\nA selectional auto-encoder approach for document image binarization\nWeighted Singular Value Thresholding and its Application to Background  Estimation\nDeep Semantic Segmentation for Automated Driving: Taxonomy, Roadmap and  Challenges\nAutomatic Understanding of Image and Video Advertisements\nAerial Vehicle Tracking by Adaptive Fusion of Hyperspectral Likelihood  Maps\nLeveraging the Path Signature for Skeleton-based Human Action  Recognition\nOn Measuring and Quantifying Performance: Error Rates, Surrogate Loss,  and an Example in SSL\nRethinking Reprojection: Closing the Loop for Pose-aware  ShapeReconstruction from a Single Image\nShow and Recall: Learning What Makes Videos Memorable\nA robotic vision system to measure tree traits\nImage Projective Invariants\nObject-Extent Pooling for Weakly Supervised Single-Shot Localization\nEmotion Recognition by Body Movement Representation on the Manifold of  Symmetric Positive Definite Matrices\nDeep Optical Flow Estimation Via Multi-Scale Correspondence Structure  Learning\nCompact Model Representation for 3D Reconstruction\nJoint Background 
Reconstruction and Foreground Segmentation via A  Two-stage Convolutional Neural Network\nGraph-Theoretic Spatiotemporal Context Modeling for Video Saliency  Detection\nssEMnet: Serial-section Electron Microscopy Image Registration using a  Spatial Transformer Network with Learned Features\nLearning Bag-of-Features Pooling for Deep Convolutional Neural Networks\nModelling the Scene Dependent Imaging in Cameras with a Deep Neural  Network\nProduct recognition in store shelves as a sub-graph isomorphism problem\nRepresentation-Aggregation Networks for Segmentation of Multi-Gigapixel  Histology Images\nSerious Games Application for Memory Training Using Egocentric Images\nHuman Pose Forecasting via Deep Markov Models\nPhotographic Image Synthesis with Cascaded Refinement Networks\nAnalysis and Optimization of Convolutional Neural Network Architectures\nPROBE-GK: Predictive Robust Estimation using Generalized Kernels\nLearning Deep Convolutional Embeddings for Face Representation Using  Joint Sample- and Set-based Supervision\nBest Viewpoint Tracking for Camera Mounted on Robotic Arm with Dynamic  Obstacles\nDepth Super-Resolution Meets Uncalibrated Photometric Stereo\nLearning Feature Pyramids for Human Pose Estimation\nImproved Speech Reconstruction from Silent Video\nDeep Metric Learning with Angular Loss\nLong Short-Term Memory Kalman Filters:Recurrent Neural Estimators for  Pose Regularization\nFace Parsing via Recurrent Propagation\nA Framework for Visually Realistic Multi-robot Simulation in Natural  Environment\nA Unified Model for Near and Remote Sensing\nLearning to Synthesize a 4D RGBD Light Field from a Single Image\nImage Quality Assessment Guided Deep Neural Networks Training\nKinship Verification from Videos using Spatio-Temporal Texture Features  and Deep Learning\nLearning Blind Motion Deblurring\nMonocular Dense 3D Reconstruction of a Complex Dynamic Scene from Two  Perspective Frames\nLearning with Rethinking: Recurrently Improving Convolutional 
Neural  Networks through Feedback\nDesnowNet: Context-Aware Deep Network for Snow Removal\nGANs for Biological Image Synthesis\nA deep architecture for unified aesthetic prediction\nMirrorFlow: Exploiting Symmetries in Joint Optical Flow and Occlusion  Estimation\nAn Efficient Single Chord-based Accumulation Technique (SCA) to Detect  More Reliable Corners\nRecognizing Involuntary Actions from 3D Skeleton Data Using Body States\nRelaxed Spatio-Temporal Deep Feature Aggregation for Real-Fake  Expression Prediction\nA Robust Indoor Scene Recognition Method based on Sparse Representation\nLeaf Counting with Deep Convolutional and Deconvolutional Networks\n3D Binary Signatures\nDistributed Bundle Adjustment\nAdaptive SVM+: Learning with Privileged Information for Domain  Adaptation\nTexture and Structure Incorporated ScatterNet Hybrid Deep Learning  Network (TS-SHDL) For Brain Matter Segmentation\nDisguised Face Identification (DFI) with Facial KeyPoints using Spatial  Fusion Convolutional Network\nSingle Shot Text Detector with Regional Attention\nHuman Detection and Tracking for Video Surveillance A Cognitive Science  Approach\nPhotometric stereo for strong specular highlights\nDense Face Alignment\nThe Devil is in the Tails: Fine-grained Classification in the Wild\nFine-grained Recognition in the Wild: A Multi-Task Domain Adaptation  Approach\nWhy Do Deep Neural Networks Still Not Recognize These Images?: A  Qualitative Analysis on Failure Cases of ImageNet Classification\nCLAD: A Complex and Long Activities Dataset with Rich Crowdsourced  Annotations\nDeep Generative Filter for Motion Deblurring\nJoint Learning of Set Cardinality and State Distribution\nA Tutorial on Deep Learning for Music Information Retrieval\nUnsupervised object discovery for instance recognition\nExploring Food Detection using CNNs\nFeature-Fused SSD: Fast Detection for Small Objects\nVariational Methods for Normal Integration\nMatterport3D: Learning from RGB-D Data in Indoor 
Environments\nWhen 3D-Aided 2D Face Recognition Meets Deep Learning: An extended UR2D  for Pose-Invariant Face Recognition\nVisual Question Generation as Dual Task of Visual Question Answering\nRobust Facial Landmark Detection under Significant Head Poses and  Occlusion\nConstrained Joint Cascade Regression Framework for Simultaneous Facial  Action Unit Recognition and Facial Landmark Detection\nMulti-view pose estimation with mixtures-of-parts and adaptive viewpoint  selection\nAttribute Recognition by Joint Recurrent Learning of Context and  Correlation\nHydraPlus-Net: Attentive Deep Features for Pedestrian Analysis\nUnified Deep Supervised Domain Adaptation and Generalization\nTemporal shape super-resolution by intra-frame motion encoding using  high-fps structured light\nPIRVS: An Advanced Visual-Inertial SLAM System with Flexible Sensor  Fusion and Hardware Co-Design\nGroup Affect Prediction Using Multimodal Distributions\nContrastive Learning for Image Captioning\nKeynote: Small Neural Nets Are Beautiful: Enabling Embedded Systems with  Small Deep-Neural-Network Architectures\nFace Sketch Matching via Coupled Deep Transform Learning\nHandwritten digit string recognition by combination of residual network  and RNN-CTC\nIterative PET Image Reconstruction Using Convolutional Neural Network  Representation\nEntanglement Entropy of Target Functions for Image Classification and  Convolutional Neural Network\nDescribing Natural Images Containing Novel Objects with Knowledge Guided  Assitance\nIdentifying Mild Traumatic Brain Injury Patients From MR Images Using  Bag of Visual Words\nBacktracking Regression Forests for Accurate Camera Relocalization\nComplete 3D Scene Parsing from Single RGBD Image\nDeep Spatial Regression Model for Image Crowd Counting\nSpiking Optical Flow for Event-based Sensors Using IBM's TrueNorth  Neurosynaptic System\nDeterministic Approximate Methods for Maximum Consensus Robust Fitting\nSceneFlowFields: Dense Interpolation of Sparse Scene 
Flow  Correspondences\nMulti-level Residual Networks from Dynamical Systems View\nExploiting Points and Lines in Regression Forests for RGB-D Camera  Relocalization\nA Connection between Feed-Forward Neural Networks and Probabilistic  Graphical Models\nImage Patch Matching Using Convolutional Descriptors with Euclidean  Distance\nClothing Retrieval with Visual Attention Model\nRobust Saliency Detection via Fusing Foreground and Background Priors\nA Bio-Inspired Multi-Exposure Fusion Framework for Low-light Image  Enhancement\nSet-to-Set Hashing with Applications in Visual Recognition\nObject-Centric Photometric Bundle Adjustment with Deep Shape Prior\nAn EEG-based Image Annotation System\nMSR-net:Low-light Image Enhancement Using Deep Convolutional Network\nCT-SRCNN: Cascade Trained and Trimmed Deep Convolutional Neural Networks  for Image Super Resolution\nDeep Residual Text Detection Network for Scene Text\nAON: Towards Arbitrarily-Oriented Text Recognition\nEvaluation of trackers for Pan-Tilt-Zoom Scenarios\nA Novel SDASS Descriptor for Fully Encoding the Information of 3D Local  Surface\nSliced Wasserstein Distance for Learning Gaussian Mixture Models\nPeople, Penguins and Petri Dishes: Adapting Object Counting Models To  New Visual Domains And Object Types Without Forgetting\nReal-Time Document Image Classification using Deep CNN and Extreme  Learning Machines\nZero-Annotation Object Detection with Web Knowledge Transfer\nFrame Interpolation with Multi-Scale Deep Loss Functions and Generative  Adversarial Networks\nThoracic Disease Identification and Localization with Limited  Supervision\nLook, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval  with Generative Models\nLearning from Synthetic Data: Addressing Domain Shift for Semantic  Segmentation\nData Fusion on Motion and Magnetic Sensors embedded on Mobile Devices  for the Identification of Activities of Daily Living\nAsking the Difficult Questions: Goal-Oriented Visual Question Generation  
via Intermediate Rewards\nRepulsion Loss: Detecting Pedestrians in a Crowd\nUnFlow: Unsupervised Learning of Optical Flow with a Bidirectional  Census Loss\nRGB-D-based Human Motion Recognition with Deep Learning: A Survey\nDeep Expander Networks: Efficient Deep Networks from Graph Theory\nSelf-Supervised Vision-Based Detection of the Active Speaker as a  Prerequisite for Socially-Aware Language Acquisition\nVisual Feature Attribution using Wasserstein GANs\nInteractive Robot Learning of Gestures, Language and Affordances\nAppearance-and-Relation Networks for Video Classification\nCoplanar Repeats by Energy Minimization\nInterpretable Facial Relational Network Using Relational Importance\nA Generative Model of 3D Object Layouts in Apartments\nHoME: a Household Multimodal Environment\nTowards Alzheimer's Disease Classification through Transfer Learning\nEmbedded Real-Time Fall Detection Using Deep Learning For Elderly Care\n3DContextNet: K-d Tree Guided Hierarchical Learning of Point Clouds  Using Local and Global Contextual Cues\nSpatially-Adaptive Filter Units for Deep Neural Networks\nEmbodied Question Answering\nHybrid VAE: Improving Deep Generative Models using Partial Observations\nSemantic Photometric Bundle Adjustment on Natural Sequences\nInertial-aided Rolling Shutter Relative Pose Estimation\nImproving Smiling Detection with Race and Gender Diversity\nA 3D Coarse-to-Fine Framework for Automatic Pancreas Segmentation\nLearning Deep Representations for Word Spotting Under Weak Supervision\nDR-Net: Transmission Steered Single Image Dehazing Network with Weakly  Supervised Refinement\nFeature Generating Networks for Zero-Shot Learning\nAI Oriented Large-Scale Video Management for Smart City: Technologies,  Standards and Beyond\n4DFAB: A Large Scale 4D Facial Expression Database for Biometric  Applications\nSeparating Reflection and Transmission Images in the Wild\nBeyond the Pixel-Wise Loss for Topology-Aware Delineation\nHybrid eye center localization using 
cascaded regression and  hand-crafted model fitting\nDeep Image Smoothing based on Texture and Structure Guidance\nLearning 2D Gabor Filters by Infinite Kernel Learning Regression\nA Frequency Domain Neural Network for Fast Image Super-resolution\nClass Rectification Hard Mining for Imbalanced Deep Learning\n3D Facial Expression Reconstruction using Cascaded Regression\nDeep convolutional neural networks for brain image analysis on magnetic  resonance imaging: a review\nCoDraw: Visual Dialog for Collaborative Drawing\nSemantic Visual Localization\nAn ILP Solver for Multi-label MRFs with Connectivity Constraints\nVisual Explanation by Interpretation: Improving Visual Feedback  Capabilities of Deep Neural Networks\nLight Field Segmentation From Super-pixel Graph Representation\nAutomatic Estimation of Ice Bottom Surfaces from Radar Imagery\nBeyond saliency: understanding convolutional neural networks from  saliency prediction on layer-wise relevance propagation\nSignificance of Softmax-based Features in Comparison to Distance Metric  Learning-based Features\nFace Synthesis from Visual Attributes via Sketch using Conditional VAEs  and GANs\nDepth Not Needed - An Evaluation of RGB-D Feature Encodings for Off-Road  Scene Understanding by Convolutional Neural Network\nSemantic Segmentation via Highly Fused Convolutional Network with  Multiple Soft Cost Functions\nMoving Vehicle Detection Using AdaBoost and Haar-Like Feature in  Surveillance Videos\nSketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis\nFully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low  Resolution Action Recognition\nMulti-Task Spatiotemporal Neural Networks for Structured Surface  Reconstruction\nMSDNN: Multi-Scale Deep Neural Network for Salient Object Detection\nReblur2Deblur: Deblurring Videos via Self-Supervised Learning\nImage denoising and restoration with CNN-LSTM Encoder Decoder with  Direct Attention\nUnsupervised Representation Learning with Laplacian Pyramid  
Auto-encoders\nBrenier approach for optimal transportation between a quasi-discrete  measure and a discrete measure\nTernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image  Segmentation\nPTB-TIR: A Thermal Infrared Pedestrian Tracking Benchmark\nRED-Net: A Recurrent Encoder-Decoder Network for Video-based Face  Alignment\nDeepGestalt - Identifying Rare Genetic Syndromes Using Deep Learning\nC2MSNet: A Novel approach for single image haze removal\nFrom Neuronal Models to Neuronal Dynamics and Image Processing\nNeural Algebra of Classifiers\nContextual Multi-Scale Region Convolutional 3D Network for Activity  Detection\nComparative Study of ECO and CFNet Trackers in Noisy Environment\nHierarchical Spatial Transformer Network\nObject-based reasoning in VQA\nImage Captioning at Will: A Versatile Scheme for Effectively Injecting  Sentiments into Image Descriptions\nA Survey of Recent Advances in Texture Representation\nSynchronization Detection and Recovery of Steganographic Messages with  Adversarial Learning\nIn Defense of Classical Image Processing: Fast Depth Completion on the  CPU\nSingle Image Reflection Removal Using Deep Encoder-Decoder Network\nHoloFace: Augmenting Human-to-Human Interactions on HoloLens\nWhen can $l_p$-norm objective functions be minimized via graph cuts?\nThe edge cloud: A holistic view of communication, computation and  caching\nHuman Action Adverb Recognition: ADHA Dataset and A Three-Stream Hybrid  Model\nBackground subtraction using the factored 3-way restricted Boltzmann  machines\nA comprehensive review of 3D point cloud descriptors\nUnsupervised Typography Transfer\nPros and Cons of GAN Evaluation Measures\nDisjoint Multi-task Learning between Heterogeneous Human-centric Tasks\nTree-CNN: A Deep Convolutional Neural Network for Lifelong Learning\nDo deep nets really need weight decay and dropout?\nUncertainty Estimates for Optical Flow with Multi-Hypotheses Networks\nContinuous Relaxation of MAP Inference: A Nonconvex 
Perspective\nxView: Objects in Context in Overhead Imagery\nEnd-to-end learning of keypoint detector and descriptor for pose  invariant 3D matching\nDeep Unsupervised Learning of Visual Similarities\nPulling Out All the Tops with Computer Vision and Deep Learning\nFacial Expression Recognition Based on Complexity Perception  Classification Algorithm\nDeepDefense: Training Deep Neural Networks with Improved Robustness\nMonocular Depth Estimation using Multi-Scale Continuous CRFs as  Sequential Deep Networks\nFocal Loss Dense Detector for Vehicle Surveillance\nLearning Scene Gist with Convolutional Neural Networks to Improve Object  Recognition\nExponential Discriminative Metric Embedding in Deep Learning\nMotion deblurring of faces\nTracking by Prediction: A Deep Generative Model for Mutli-Person  localisation and Tracking\nIndoor Scene Understanding in 2.5/3D: A Survey\nTask Specific Visual Saliency Prediction with Memory Augmented  Conditional Generative Adversarial Networks\nBeyond Gröbner Bases: Basis Selection for Minimal Solvers\nParticle Identification In Camera Image Sensors Using Computer Vision\nVideo Based Reconstruction of 3D People Models\nAdversarial Data Programming: Using GANs to Relax the Bottleneck of  Curated Labeled Data\nUnpaired Image Captioning by Language Pivoting\nSelf-Supervised Monocular Image Depth Learning and Confidence Estimation\nDiverse M-Best Solutions by Dynamic Programming\nDeja Vu: Motion Prediction in Static Images\nProgressive Structure from Motion\nDiscrete Potts Model for Generating Superpixels on Noisy Images\nBuried object detection from B-scan ground penetrating radar data using  Faster-RCNN\nMaximum Consensus Parameter Estimation by Reweighted $\\ell_1$ Methods\nExplicit Reasoning over End-to-End Neural Architectures for Visual  Question Answering\nNoise generation for compression algorithms\nScene Graph Parsing as Dependency Parsing\nGeneralized Hadamard-Product Fusion Operators for Visual Question  Answering\nDeepJDOT: 
Deep Joint distribution optimal transport for unsupervised  domain adaptation\nExploiting Recurrent Neural Networks and Leap Motion Controller for Sign  Language and Semaphoric Gesture Recognition\nLearning Structure and Strength of CNN Filters for Small Sample Size  Training\nLearning to Anonymize Faces for Privacy Preserving Action Detection\nCombining STDP and Reward-Modulated STDP in Deep Convolutional Spiking  Neural Networks for Digit Recognition\nUnsupervised Correlation Analysis\nEntrenamiento de una red neuronal para el reconocimiento de imagenes de  lengua de senas capturadas con sensores de profundidad\nMegaDepth: Learning Single-View Depth Prediction from Internet Photos\nTowards Deep Learning based Hand Keypoints Detection for Rapid  Sequential Movements from RGB Images\nSelf-supervised Learning of Geometrically Stable Features Through  Probabilistic Introspection\nHallucinated-IQA: No-Reference Image Quality Assessment via Adversarial  Learning\nIdentifying Cross-Depicted Historical Motifs\nCBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation\nAdaptive Quantile Sparse Image (AQuaSI) Prior for Inverse Imaging  Problems\nPOL-LWIR Vehicle Detection: Convolutional Neural Networks Meet Polarised  Infrared Sensors\nFast Single Image Rain Removal via a Deep Decomposition-Composition  Network\nImage Segmentation using Sparse Subset Selection\nk-NN Graph Construction: a Generic Online Approach\nLarge Field and High Resolution: Detecting Needle in Haystack\nGeometrical analysis of polynomial lens distortion models\nFrench Word Recognition through a Quick Survey on Recurrent Neural  Networks Using Long-Short Term Memory RNN-LSTM\nDeformation Aware Image Compression\nOutline Objects using Deep Reinforcement Learning\nFishEyeRecNet: A Multi-Context Collaborative Deep Network for Fisheye  Image Rectification\nComparatives, Quantifiers, Proportions: A Multi-Task Model for the  Learning of Quantities from Vision\nExploiting Non-Causal CPU-State 
Information for Energy-Efficient Mobile  Cooperative Computing\nVideo Face Matching using Subset Selection and Clustering of  Probabilistic Multi-Region Histograms\nThe Open Connectome Project Data Cluster: Scalable Analysis and Vision  for High-Throughput Neuroscience\nFast Neuromimetic Object Recognition using FPGA Outperforms GPU  Implementations\nEfficient Dictionary Learning with Sparseness-Enforcing Projections\nDesign, Implementation and Simulation of a Cloud Computing System for  Enhancing Real-time Video Services by using VANET and Onboard Navigation  Systems\nDesign of a Mobile Face Recognition System for Visually Impaired Persons\nLocal Color Contrastive Descriptor for Image Classification\nCompact Convolutional Neural Network Cascade for Face Detection\nFast Randomized Singular Value Thresholding for Low-rank Optimization\nFPNN: Field Probing Neural Networks for 3D Data\nRistretto: Hardware-Oriented Approximation of Convolutional Neural  Networks\nAttend Refine Repeat: Active Box Proposal Generation via In-Out  Localization\nAutomated Linear-Time Detection and Quality Assessment of Superpixels in  Uncalibrated True- or False-Color RGB Images\nCHAOS: A Parallelization Scheme for Training Convolutional Neural  Networks on Intel Xeon Phi\nMulti-task Dictionary Learning based Convolutional Neural Network for  Computer aided Diagnosis with Longitudinal Images\nModeling the Resource Requirements of Convolutional Neural Networks on  Mobile Devices\nFPGA based Parallelized Architecture of Efficient Graph based Image  Segmentation Algorithm\nDeep Versus Wide Convolutional Neural Networks for Object Recognition on  Neuromorphic System\nmuNet: A Highly Compact Deep Convolutional Neural Network Architecture  for Real-time Embedded Traffic Sign Classification\nSistema de Navegação Autônomo Baseado em Visão Computacional\nSuper-resolution of spatiotemporal event-stream image captured by the  asynchronous temporal contrast vision sensor\nOptimization over Geodesics 
for Exact Principal Geodesic Analysis\nPrivacy, Trust and Identity in Pervasive Computing: A Review of  Technical Challenges and Future Research Directions\nPreserving privacy for secure and outsourcing for Linear Programming in  cloud computing\nOpen Quantum Systems and Quantum Algorithms\nComplexity of Representation and Inference in Compositional Models with  Part Sharing\nCubical Cohomology Ring of 3D Photographs\nA Simplified Phase Model for Oscillator Based Computing\nRecurrent computations for visual pattern completion\nSecure SURF with Fully Homomorphic Encryption\nRecurrent Segmentation for Variable Computational Budgets\nProbabilistic Adaptive Computation Time\nApproximate FPGA-based LSTMs under Computation Time Constraints\nFast and accurate computation of orthogonal moments for texture analysis\nEfficient and Deep Person Re-Identification using Multi-Level Similarity\nMapping the Physical Properties of Cosmic Hot Gas with Hyper-spectral  Imaging\nTheory of ferroelectrics: A vision for the next decade and beyond\nGeometric Morphology of Granular Materials\nOn a cepstrum-based speech detector robust to white noise\nNon-negative sparse coding\n2D Electrophoresis Gel Image and Diagnosis of a Disease\nOn the complexity of curve fitting algorithms\nA rigorous definition of axial lines: ridges on isovist fields\nExtraction of topological features from communication network  topological patterns using self-organizing feature maps\nQ-valued neural network as a system of fast identification and pattern  recognition\nEstimating mutual information and multi--information in large networks\nNear Perfect Decoding of LDPC Codes\nBayesian Restoration of Digital Images Employing Markov Chain Monte  Carlo a Review\nStability in multidimensional Size Theory\nA Note on Approximate Nearest Neighbor Methods\nOn reconstructing n-point configurations from the distribution of  distances or areas\nThe Parameter-Less Self-Organizing Map algorithm\nMedical Image Segmentation and 
Localization using Deformable Templates\nThe Fuzzy Vault for fingerprints is Vulnerable to Brute Force Attack\nColour image segmentation by the vector-valued Allen-Cahn phase-field  model: a multigrid solution\nArea distances of Convex Plane Curves and Improper Affine Spheres\nKohonAnts: A Self-Organizing Ant Algorithm for Clustering and Pattern  Classification\nDiscrete schemes for Gaussian curvature and their convergence\nFast Wavelet-Based Visual Classification\nIntrusion Detection Using Cost-Sensitive Classification\nVisual Grouping by Neural Oscillators\nA Vision-based Computed Torque Control for Parallel Kinematic Machines\nReal-time Texture Error Detection\nCombinatorial Ricci Curvature and Laplacians for Image Processing\nTwo-Dimensional ARMA Modeling for Breast Cancer Detection and  Classification\nSearch-based Structured Prediction\nAccelerating Competitive Learning Graph Quantization\nOn the equivalence between hierarchical segmentations and ultrametric  watersheds\nCLD-shaped Brushstrokes in Non-Photorealistic Rendering\nA Comprehensive Review of Image Enhancement Techniques\nImage Segmentation by Using Threshold Techniques\nImage processing of a spectrogram produced by Spectrometer Airglow  Temperature Imager\nFusion of Wavelet Coefficients from Visual and Thermal Face Images for  Human Face Recognition - A Comparative Study\nFast Color Space Transformations Using Minimax Approximations\nA Fast Switching Filter for Impulsive Noise Removal from Color Images\nA family of statistical symmetric divergences based on Jensen's  inequality\nSurface Curvature Effects on Reflectance from Translucent Materials\nFast Inference in Sparse Coding Algorithms with Applications to Object  Recognition\nConvex Analysis and Optimization with Submodular Functions: a Tutorial\nAffine Invariant, Model-Based Object Recognition Using Robust Metrics  and Bayesian Statistics\nAdaptive Cluster Expansion (ACE): A Multilayer Network for Estimating  Probability Density Functions\nThe 
Development of Dominance Stripes and Orientation Maps in a  Self-Organising Visual Cortex Network (VICON)\nDiffusion-geometric maximally stable component detection in deformable  shapes\nA Self-Organising Neural Network for Processing Data from Multiple  Sensors\nAutomatic segmentation of HeLa cell images\nOff-Line Handwritten Signature Identification Using Rotated Complex  Wavelet Filters\nHandwritten Digit Recognition with a Committee of Deep Neural Nets on  GPUs\nLearning Hierarchical Sparse Representations using Iterative Dictionary  Learning and Dimension Reduction\nSpatial Features for Multi-Font/Multi-Size Kannada Numerals and Vowels  Recognition\nEfficient and Accurate Gaussian Image Filtering Using Running Sums\nAn Automatic Clustering Technique for Optimal Clusters\nA Novel Approach to Texture classification using statistical feature\nDetermining a rotation of a tetrahedron from a projection\nInvestigation to implicate data on clouds\nInvariant Scattering Convolution Networks\nA comparative evaluation of two algorithms of detection of masses on  mammograms\nUsing Hausdorff Distance for New Medical Image Annotation\nReconstruction error in a motion capture system\nA Complete Workflow for Development of Bangla OCR\nSkin-color based videos categorization\nA Privacy-Aware Bayesian Approach for Combining Classifier and Cluster  Ensembles\nTexture Analysis And Characterization Using Probability Fractal  Descriptors\nNeural Networks for Handwritten English Alphabet Recognition\nPilgrims Face Recognition Dataset -- HUFRD\nRapid Feature Extraction for Optical Character Recognition\nA Missing and Found Recognition System for Hajj and Umrah\nA Novel Approach of Harris Corner Detection of Noisy Images using  Adaptive Wavelet Thresholding Technique\nAn Implementation of Computer Graphics as Prepress Image Enhancement  Process\nMultibiometric: Feature Level Fusion Using FKP Multi-Instance biometric\nA Survey of Multibiometric Systems\nClassification of Hepatic Lesions 
using the Matching Metric\nA Comparative study of Arabic handwritten characters invariant feature\nArtificial Neural Network Based Optical Character Recognition\nAn Effective Fingerprint Classification and Search Method\nViewpoint Invariant Object Detector\nA Scale-Space Theory for Text\nOptical Flow on Evolving Surfaces with an Application to the Analysis of  4D Microscopy Data\nApplication of Hopfield Network to Saccades\nLearning Graphical Model Parameters with Approximate Marginal Inference\nA Geometric Descriptor for Cell-Division Detection\nFast Image Scanning with Deep Max-Pooling Convolutional Neural Networks\nFour Side Distance: A New Fourier Shape Signature\nK Means Segmentation of Alzheimers Disease in PET scan datasets: An  implementation\nRecognition of Facial Expression Using Eigenvector Based Distributed  Features and Euclidean Distance Based Decision Making Technique\nA Robust Rapid Approach to Image Segmentation with Optimal Thresholding  and Watershed Transform\nTemplate matching with noisy patches: A contrast-invariant GLR test\nImage Compression By Embedding Five Modulus Method Into JPEG\nBubbles are rational\nBiEntropy - The Approximate Entropy of a Finite Binary String\nRotation invariants of two dimensional curves based on iterated  integrals\nQuaternion Fourier Transform on Quaternion Fields and Generalizations\nIntroduction to Clifford's Geometric Algebra\nA Face-like Structure Detection on Planet and Satellite Surfaces using  Image Processing\nImage Fusion Technologies In Commercial Remote Sensing Packages\nFuzzy Fibers: Uncertainty in dMRI Tractography\nHandwritten Digits Recognition using Deep Convolutional Neural Network:  An Experimental Study using EBlearn\nIntegration of 3D Object Recognition and Planning for Robotic  Manipulation: A Preliminary Report\nThe Immune System: the ultimate fractionated cyber-physical system\nOptical Flow on Evolving Surfaces with Space and Time Regularisation\nWavelet and Fast Fourier Transform based 
analysis of Solar Image\nGeneric Deep Networks with Wavelet Scattering\nDeep learning for class-generic object detection\nBangla Text Recognition from Video Sequence: A New Focus\nA sparse Kaczmarz solver and a linearized Bregman method for online  compressed sensing\nReview of Face Detection Systems Based Artificial Neural Networks  Algorithms\nDenseNet: Implementing Efficient ConvNet Descriptor Pyramids\nEmbed System for Robotic Arm with 3 Degree of Freedom Controller using  Computational Vision on Real-Time\nUnsupervised Text Extraction from G-Maps\nSinogram constrained TV-minimization for metal artifact reduction in CT\nSelecting a Small Set of Optimal Gestures from an Extensive Lexicon\nStudy on performance improvement of oil paint image filter algorithm  using parallel pattern library\nImplementation And Performance Evaluation Of Background Subtraction  Algorithms\nEfficient Tracking of a Moving Object using Inter-Frame Coding\nAn FPGA-based Parallel Architecture for Face Detection using Mixed Color  Models\nHyperspectral Imaging and Analysis for Sparse Reconstruction and  Recognition\nBoosted Markov Networks for Activity Recognition\nReal-Time Impulse Noise Suppression from Images Using an Efficient  Weighted-Average Filtering\nReal-time emotion recognition for gaming using deep convolutional  network features\nFinding Action Tubes\nAn Analytical Study of different Document Image Binarization Methods\nA Gaussian Scale Space Approach For Exudates Detection, Classification  And Severity Prediction\nOn a spatial-temporal decomposition of the optical flow\nUsing Ensemble Models in the Histological Examination of Tissue  Abnormalities\nOn the Problem of Detecting When Two Implicit Plane Algebraic Curves Are  Similar\nBoosting-like Deep Learning For Pedestrian Detection\nGeneral Deformations of Point Configurations Viewed By a Pinhole Model  Camera\nPlanar Ultrametric Rounding for Image Segmentation\nMultiscale Adaptive Representation of Signals: I. 
The Basic Framework\nSnowWatch: Snow Monitoring through Acquisition and Analysis of  User-Generated Content\nBackground Image Generation Using Boolean Operations\nElasticity-based Matching by Minimizing the Symmetric Difference of  Shapes\nConfusing Deep Convolution Networks by Relabelling\nRobust Large-Scale Localization in 3D Point Clouds Revisited\nImprovised Salient Object Detection and Manipulation\n3D Time-lapse Reconstruction from Internet Photos\nDeep Multimodal Semantic Embeddings for Speech and Images\nLearning Deep Structure-Preserving Image-Text Embeddings\nScreen Content Image Segmentation Using Sparse-Smooth Decomposition\nImproving LSTM-based Video Description with Linguistic Knowledge Mined  from Text\nA robust autoassociative memory with coupled networks of Kuramoto-type  oscillators\nFull Flow: Optical Flow Estimation By Global Optimization over Regular  Grids\nPersistence Lenses: Segmentation, Simplification, Vectorization, Scale  Space and Fractal Analysis of Images\nAutomatic 3D Point Set Reconstruction from Stereo Laparoscopic Images  using Deep Neural Networks\nLabeling Topics with Images using Neural Networks\nSupervised Classification of RADARSAT-2 Polarimetric Data for Different  Land Features\nEarly Methods for Detecting Adversarial Images\nShape and Centroid Independent Clustring Algorithm for Crowd Management  Applications\nMean Box Pooling: A Rich Image Representation and Output Embedding for  the Visual Madlibs Task\nDoes V-NIR based Image Enhancement Come with Better Features?\nComputer-Aided Colorectal Tumor Classification in NBI Endoscopy Using  CNN Features\nTransfer Learning for Endoscopic Image Classification\nFacial Surface Analysis using Iso-Geodesic Curves in Three Dimensional  Face Recognition System\nPlanar Pixelations and Image Recognition\nEfficient Learning of Sparse Invariant Representations\nClassification with Invariant Scattering Representations\nReal-time face swapping as a tool for understanding infant  
self-recognition\nMulti-q Analysis of Image Patterns\nEfficient Parallel Estimation for Markov Random Fields\nEuclidean Upgrade from a Minimal Number of Segments\nA review on handwritten character and numeral recognition for Roman,  Arabic, Chinese and Indian scripts\nClassification Tree Diagrams in Health Informatics Applications\nGroup-sparse Matrix Recovery\nA DCT Approximation for Image Compression\nGraph Approximation and Clustering on a Budget\nHeterogeneous Multi-task Learning for Human Pose Estimation with Deep  Convolutional Neural Network\nCommittees of deep feedforward networks trained with few data\nLearning to Deblur\nImage processing\nOne-Dimensional Vector based Pattern Matching\nCompute Less to Get More: Using ORC to Improve Sparse Filtering\nA Concept Learning Approach to Multisensory Object Perception\nMoDeep: A Deep Learning Framework Using Motion Features for Human Pose  Estimation\nGradient Boundary Histograms for Action Recognition\nAn algorithm for improving Non-Local Means operators via low-rank  approximation\nObject Recognition Using Deep Neural Networks: A Survey\nUnsupervised Neural Architecture for Saliency Detection: Extended  Version\nCITlab ARGUS for historical handwritten documents\nContour Detection Using Contrast Formulas in the Framework of  Logarithmic Models\nCITlab ARGUS for historical data tables\nPermutohedral Lattice CNNs\nExtraction of Salient Sentences from Labelled Documents\nA Novel Feature Selection and Extraction Technique for Classification\nSpectral classification using convolutional neural networks\nA Discrete Tchebichef Transform Approximation for Image and Video Coding\nQuantum Pairwise Symmetry: Applications in 2D Shape Analysis\nRing artifacts correction in compressed sensing tomographic  reconstruction\nEfficient batchwise dropout training using submatrices\nWhy Use Sobolev Metrics on the Space of Curves\nOver-Sampling in a Deep Neural Network\nDRAW: A Recurrent Neural Network For Image Generation\nTowards 
radio astronomical imaging using an arbitrary basis\nSeparable and non-separable data representation for pattern  discrimination\nPixel-wise Deep Learning for Contour Detection\nPredicting People's 3D Poses from Short Sequences\nPath-SGD: Path-Normalized Optimization in Deep Neural Networks\nOptical Flow on Evolving Sphere-Like Surfaces\nLocalized Multiple Kernel Learning---A Convex Approach\nPartial Functional Correspondence\nSpectral Collaborative Representation based Classification for Hand  Gestures recognition on Electromyography Signals\nTracking Direction of Human Movement - An Efficient Implementation using  Skeleton\nPlaces205-VGGNet Models for Scene Recognition\nTuring's Imitation Game has been Improved\nAccelerated graph-based spectral polynomial filters\nNatural scene statistics mediate the perception of image complexity\nMMSE Estimation for Poisson Noise Removal in Images\nCreation of a Deep Convolutional Auto-Encoder in Caffe\nSimple Baseline for Visual Question Answering\nRNN Fisher Vectors for Action Recognition and Image Annotation\nMultilinear Subspace Clustering\nSupervised Texture Segmentation: A Comparative Study\nA Theory of Local Matching: SIFT and Beyond\nFitting a 3D Morphable Model to Edges: A Comparison Between Hard and  Soft Correspondences\nGenerate Image Descriptions based on Deep RNN and Memory Cells for  Images Features\nCell segmentation with random ferns and graph-cuts\nPosition paper: Towards an observer-oriented theory of shape comparison\nThe red one!: On learning to refer to things based on their  discriminative properties\nFast Bilateral Filtering of Vector-Valued Images\nCompact Hash Codes for Efficient Visual Descriptors Retrieval in Large  Scale Databases\nHierarchical Clustering in Face Similarity Score Space\nStereotyping and Bias in the Flickr30K Dataset\nTowards Multi-Agent Communication-Based Language Learning\nCITlab ARGUS for historical handwritten documents\nMulti-View Treelet Transform\nStructured Convolution 
Matrices for Energy-efficient Deep learning\n3D zigzag for multislicing, multiband and video processing\nFind your Way by Observing the Sun and Other Semantic Cues\nBayesian Inference of Bijective Non-Rigid Shape Correspondence\nHierarchical Manifold Clustering on Diffusion Maps for Connectomics (MIT  18.S096 final project)\nA convolutional approach to reflection symmetry\nLearning Robust Representations of Text\nImproving analytical tomographic reconstructions through consistency  conditions\nSpatio-Temporal Sentiment Hotspot Detection Using Geotagged Photos\nA Tour of TensorFlow\nECAT: Event Capture Annotation Tool\nGenerating captions without looking beyond objects\nGPU-accelerated real-time stixel computation\nTheory and computer simulation of the moiré patterns in single-layer  cylindrical particles\nShort-term prediction of localized cloud motion using ground-based sky  imagers\nDeep Neural Networks for HDR imaging\nGPU-based Pedestrian Detection for Autonomous Driving\nA Fully Convolutional Neural Network based Structured Prediction  Approach Towards the Retinal Vessel Segmentation\nFuzzy Statistical Matrices for Cell Classification\nGeneralized Dropout\nLearning Invariant Representations Of Planar Curves\nA neuro-mathematical model for geometrical optical illusions\nLarge-Scale Shape Retrieval with Sparse 3D Convolutional Neural Networks\nPoint Pair Feature based Object Detection for Random Bin Picking\nTemplate Matching with Deformable Diversity Similarity\nAutomated Inference on Sociopsychological Impressions of Attractive  Female Faces\nPhoto-Quality Evaluation based on Computational Aesthetics: Review of  Feature Extraction Techniques\nRe-evaluating Automatic Metrics for Image Captioning\nA Framework for Wasserstein-1-Type Metrics\nGeometric features for voxel-based surface recognition\nFusion of Heterogeneous Data in Convolutional Networks for Urban  Semantic Labeling (Invited Paper)\nA method of limiting performance loss of CNNs in noisy 
environments\nEvolution-Preserving Dense Trajectory Descriptors\nVehicle Speed Detecting App\nChanging Model Behavior at Test-Time Using Reinforcement Learning\nDiscrete Wavelet Transform Based Algorithm for Recognition of QRS  Complexes\nOptical Flow-based 3D Human Motion Estimation from Monocular Video\nMulti-Scale Wavelet Domain Residual Learning for Limited-Angle CT  Reconstruction\nObject classification in images of Neoclassical furniture using Deep  Learning\nOn the Interplay between Strong Regularity and Graph Densification\nSemantic Instance Segmentation via Deep Metric Learning\nLearned Watershed: End-to-End Learning of Seeded Segmentation\nCreativity: Generating Diverse Questions using Variational Autoencoders\nLeast square ellipsoid fitting using iterative orthogonal  transformations\nIntrospective Classification with Convolutional Nets\nIntrospective Generative Modeling: Decide Discriminatively\nICNet for Real-Time Semantic Segmentation on High-Resolution Images\nData Augmentation for Low-Resource Neural Machine Translation\nSpeech-Based Visual Question Answering\nQuantum Mechanical Approach to Modelling Reliability of Sensor Reports\nImagination improves Multimodal Translation\nDeep Learning for Lung Cancer Detection: Tackling the Kaggle Data  Science Bowl 2017 Challenge\nMirror version of similar triangles method for constrained optimization  problems\nMegapixel Size Image Creation using Generative Adversarial Networks\nDeep learning evaluation using deep linguistic processing\nThe in-town monitoring system for ambulance dispatch centre\nOnline Convolutional Dictionary Learning for Multimodal Imaging\nMultispectral and Hyperspectral Image Fusion Using a 3-D-Convolutional  Neural Network\nThe $\\mathcal{E}$-Average Common Submatrix: Approximate Searching in a  Restricted Neighborhood\nSynthesizing Deep Neural Network Architectures using Biological Synaptic  Strength Distributions\nImpulsive noise removal from color images with morphological 
filtering\nTensor-based approach to accelerate deformable part models\nDomain Adaptation for Resume Classification Using Convolutional Neural  Networks\nOn recent advances in 2D Constrained Delaunay triangulation algorithms\nLearning Visually Grounded Sentence Representations\nVideo Highlight Prediction Using Audience Chat Reactions\nImproved Face Detection and Alignment using Cascade Deep Convolutional  Network\nEstimating speech from lip dynamics\nHierarchically-Attentive RNN for Album Summarization and Storytelling\nMotion Feature Augmented Recurrent Neural Network for Skeleton-based  Dynamic Hand Gesture Recognition\nGlobeNet: Convolutional Neural Networks for Typhoon Eye Tracking from  Remote Sensing Imagery\nLearning Rotation for Kernel Correlation Filter\nSTNet: Selective Tuning of Convolutional Networks for Object  Localization\nExact Blur Measure Outperforms Conventional Learned Features for Depth  Finding\nMachine learning methods for histopathological image analysis\nA Bottom Up Procedure for Text Line Segmentation of Latin Script\nA Finite Element Computational Framework for Active Contours on Graphs\nReal time ridge orientation estimation for fingerprint images\nComFlux: External Composition and Adaptation of Pervasive Applications\nFindings of the Second Shared Task on Multimodal Machine Translation and  Multilingual Image Description\nLinear-Time Algorithm in Bayesian Image Denoising based on Gaussian  Markov Random Field\nUsing the quantization error from Self-Organized Map (SOM) output for  detecting critical variability in large bodies of image time series in less  than a minute\nFingerprint Orientation Refinement through Iterative Smoothing\nExtremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15  Minutes\nA deep learning-based method for relative location prediction in CT scan  images\nConvolutional Neural Networks for Breast Cancer Screening: Transfer  Learning with Exponential Decay\nSentiment Classification using Images and Label 
Embeddings\nHigh performance ultra-low-precision convolutions on mobile devices\nDistributed Mapper\nAttention networks for image-to-text\nLearning a Complete Image Indexing Pipeline\nLightweight Neural Networks\nAn Artificial Neural Network Architecture Based on Context  Transformations in Cortical Minicolumns\nAVEID: Automatic Video System for Measuring Engagement In Dementia\nTexture Object Segmentation Based on Affine Invariant Texture Detection\nObamaNet: Photo-realistic lip-sync from text\nCombining Stereo Disparity and Optical Flow for Basic Scene Flow\nFine-tuned Language Models for Text Classification\nThe WiLI benchmark dataset for written language identification\nGraphVAE: Towards Generation of Small Graphs Using Variational  Autoencoders\nPlummer Autoencoders\nLipschitz-Margin Training: Scalable Certification of Perturbation  Invariance for Deep Neural Networks\nSystematic Weight Pruning of DNNs using Alternating Direction Method of  Multipliers\nLearning to Count Objects in Natural Images for Visual Question  Answering\nAffine Differential Invariants for Invariant Feature Point Detection\nRigid Point Registration with Expectation Conditional Maximization\nContour Parametrization via Anisotropic Mean Curvature Flows\nA Framework for Video-Driven Crowd Synthesis\nEnhanced navigation systems in GPS denied environments for visually  impaired people: A Survey\nThe Three Pillars of Machine-Based Programming\nFast, Accurate, and, Lightweight Super-Resolution with Cascading  Residual Network\nPresentation Attack Detection for Iris Recognition: An Assessment of the  State of the Art\nExpanding a robot's life: Low power object recognition via FPGA-based  DCNN deployment\nA Modified Image Comparison Algorithm Using Histogram Features\nVoroTop: Voronoi Cell Topology Visualization and Analysis Toolkit\nAssessment of Breast Cancer Histology using Densely Connected  Convolutional Networks\nVision-Guided Robot Hearing\nRobot In a Room: Toward Perfect Object 
Recognition in Closed  Environments\nRapid Online Analysis of Local Feature Detectors and Their  Complementarity\nBackground subtraction - separating the modeling and the inference\nFree-Space Detection with Self-Supervised and Online Trained Fully  Convolutional Networks\nOpt: A Domain Specific Language for Non-linear Least Squares  Optimization in Graphics and Imaging\nPartial Sum Minimization of Singular Values in Robust PCA: Algorithm and  Applications\nA Restricted Visual Turing Test for Deep Scene and Event Understanding\n2D Visual Place Recognition for Domestic Service Robots at Night\nVision-based Engagement Detection in Virtual Reality\nLearning Deep CNN Denoiser Prior for Image Restoration\nPredicting Foreground Object Ambiguity and Efficiently Crowdsourcing the  Segmentation(s)\nVisual pathways from the perspective of cost functions and multi-task  deep neural networks\nPseudo-labels for Supervised Learning on Dynamic Vision Sensor Data,  Applied to Object Detection under Ego-motion\nLearning Affinity via Spatial Propagation Networks\nThe Feeling of Success: Does Touch Sensing Help Predict Grasp Outcomes?\nDDD17: End-To-End DAVIS Driving Dataset\nLearning Aggregated Transmission Propagation Networks for Haze Removal  and Beyond\nOptimizing colormaps with consideration for color vision deficiency to  enable accurate interpretation of scientific data\nGrounding Referring Expressions in Images by Variational Context\nA vision based system for underwater docking\nSuperpixel based Class-Semantic Texton Occurrences for Natural Roadside  Vegetation Segmentation\nTSSD: Temporal Single-Shot Object Detection Based on Attention-Aware  LSTM\nScaling Egocentric Vision: The EPIC-KITCHENS Dataset\nA Survey on Mobile Edge Computing: The Communication Perspective\nBehavior Subtraction\nMultiple View Reconstruction of Calibrated Images using Singular Value  Decomposition\nGray Image extraction using Fuzzy Logic\nFast 3D Salient Region Detection in Medical Images using 
GPUs\nFast Training of Effective Multi-class Boosting Using Coordinate Descent  Optimization\nUbic: Bridging the gap between digital cryptography and the physical  world\nA robust and adaptable method for face detection based on Color  Probabilistic Estimation Technique\nDiscovering Discriminative Cell Attributes for HEp-2 Specimen Image  Classification\nFast Robust PCA on Graphs\nDeep convolutional neural networks for pedestrian detection\nConvolutional neural networks with low-rank regularization\nA Focused Dynamic Attention Model for Visual Question Answering\nA New Trusted and E-Commerce Architecture for Cloud Computing\nDictionary learning for fast classification based on soft-thresholding\nExtracting man-made objects from remote sensing images via fast level  set evolutions\nMulti-Atlas Segmentation of Biomedical Images: A Survey\nVisual Saliency Based on Multiscale Deep Features\nSpherical Conformal Parameterization of Genus-0 Point Clouds for Meshing\nLocal Multi-Grouped Binary Descriptor with Ring-based Pooling  Configuration and Optimization\nDynamic Parallel and Distributed Graph Cuts\nRecent Advances in Convolutional Neural Networks\nDeep convolutional networks for automated detection of posterior-element  fractures on spine CT\nA Powerful Generative Model Using Random Weights for the Deep Image  Representation\nFALCON: Feature Driven Selective Classification for Energy-Efficient  Image Recognition\nGeneralized Haar Filter based Deep Networks for Real-Time Object  Detection in Traffic Scene\nOriented bounding boxes using multiresolution contours for fast  interference detection of arbitrary geometry objects\nThe Amazing Mysteries of the Gutter: Drawing Inferences Between Panels  in Comic Book Narratives\nGeometric deep learning: going beyond Euclidean data\nGeometric deep learning on graphs and manifolds using mixture model CNNs\nBeam Search for Learning a Deep Convolutional Neural Network of 3D  Shapes\nBuilding Fast and Compact Convolutional Neural 
Networks for Offline  Handwritten Chinese Character Recognition\nBorrowing Treasures from the Wealthy: Deep Transfer Learning through  Selective Joint Fine-tuning\nDistance Metric Learning using Graph Convolutional Networks: Application  to Functional Brain Networks\nEfficient Processing of Deep Neural Networks: A Tutorial and Survey\nSmart Mining for Deep Metric Learning\nGabor Filter Assisted Energy Efficient Fast Learning Convolutional  Neural Networks\nDarkRank: Accelerating Deep Metric Learning via Cross Sample  Similarities Transfer\nOnline Multi-Object Tracking Using CNN-based Single Object Tracker with  Spatial-Temporal Attention Mechanism\nOne-Shot Concept Learning by Simulating Evolutionary Instinct  Development\nReading Scene Text with Attention Convolutional Sequence Modeling\nCompact Environment-Invariant Codes for Robust Visual Place Recognition\nHPC Cloud for Scientific and Business Applications: Taxonomy, Vision,  and Research Challenges\nKnowledge Projection for Deep Neural Networks\nEfficient Implementation of a Recognition System Using the Cortex  Ventral Stream Model\nWeakly Supervised One-Shot Detection with Attention Siamese Networks\nBayesian Deep Convolutional Encoder-Decoder Networks for Surrogate  Modeling and Uncertainty Quantification\nFixaTons: A collection of Human Fixations Datasets and Metrics for  Scanpath Similarity\nTiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network  for Real-time Embedded Object Detection\nUsing Convolutional Neural Networks for Determining Reticulocyte  Percentage in Cats\nFast Subspace Clustering Based on the Kronecker Product\nGroup Normalization\nContext-aware Deep Feature Compression for High-speed Visual Tracking\nQuantum Analogue Computing\nLarge-scale Binary Quadratic Optimization Using Semidefinite Relaxation  and Applications\nDesigning Energy-Efficient Convolutional Neural Networks using  Energy-Aware Pruning\nHigh Performance Software in Multidimensional Reduction Methods for  
Image Processing with Application to Ancient Manuscripts\nA Novel Framework for Robustness Analysis of Visual QA Models\nOracle Complexity and Nontransitivity in Pattern Recognition\nNovel Runtime Systems Support for Adaptive Compositional Modeling on the  Grid\nThe Computational Complexity of Orientation Search Problems in  Cryo-Electron Microscopy\nAnalogue Quantum Computers for Data Analysis\nThe Fast Haar Wavelet Transform for Signal & Image Processing\nA Note on the Membrane Computer\nDistribution of the search of evolutionary product unit neural networks  for classification\nA programme to determine the exact interior of any connected digital  picture\nEntropy Computation of Document Images in Run-Length Compressed Domain\nThe Shortlist Method for Fast Computation of the Earth Mover's Distance  and Finding Optimal Solutions to Transportation Problems\nTrainable and Dynamic Computing: Error Backpropagation through Physical  Media\nImproving the performance of the linear systems solvers using CUDA\nDistributed Machine Learning via Sufficient Factor Broadcasting\nApproaching the Computational Color Constancy as a Classification  Problem through Deep Learning\nDistributed Machine Learning via Sufficient Factor Broadcasting\nSimplified firefly algorithm for 2D image key-points search\nQuantifying Creativity in Art Networks\nDepth Reconstruction and Computer-Aided Polyp Detection in Optical  Colonoscopy Video Frames\nDense Wide-Baseline Scene Flow From Two Handheld Video Cameras\nPreserving the value of large scale data analytics over time through  selective re-computation\nA sparse linear algebra algorithm for fast computation of prediction  variances with Gaussian Markov random fields\nFree Space Estimation using Occupancy Grids and Dynamic Object Detection\nSome observations on computer lip-reading: moving from the dream to the  reality\nSkipNet: Learning Dynamic Routing in Convolutional Networks\nConvolutional Networks with Adaptive Computation 
Graphs\nDepth-Adaptive Computational Policies for Efficient Visual Tracking\nSBNet: Sparse Blocks Network for Fast Inference\nMEDEA: Automated Measure and on-line Analysis in Astronomy and  Astrophysics for Very Large Vision Machine\nPaving the Way for Image Understanding: A New Kind of Image  Decomposition is Desired\nMetric State Space Reinforcement Learning for a Vision-Capable Mobile  Robot\nCamera motion estimation through planar deformation determination\nMulti-Dimensional Recurrent Neural Networks\nEvaluation for Uncertain Image Classification and Segmentation\nThe bispectrum as a source of phase-sensitive invariants for Fourier  descriptors: a group-theoretic approach\nA possible low-level explanation of \"temporal dynamics of brightness  induction and White's illusion\"\nFeature Level Clustering of Large Biometric Database\nVisual Infrared Video Fusion for Night Vision using Background  Estimation\nFeature Selection via Sparse Approximation for Face Recognition\nVision-Based Navigation III: Pose and Motion from Omnidirectional  Optical Flow and a Digital Terrain Map\nVision-Based Navigation II: Error Analysis for a Navigation Algorithm  based on Optical-Flow and a Digital Terrain Map\nMoving Object Detection by Detecting Contiguous Outliers in the Low-Rank  Representation\nEstimating 3D Human Shapes from Measurements\nHow important are Deformable Parts in the Deformable Parts Model?\nJoint Reconstruction of Multi-view Compressed Images\nA Distributed Algorithm for Gathering Many Fat Mobile Robots in the  Plane\n3D Scene Grammar for Parsing RGB-D Pointclouds\nA Multi-Orientation Analysis Approach to Retinal Vessel Tracking\nObject Recognition with Imperfect Perception and Redundant Description\nHandwritten and Printed Text Separation in Real Document\nExpressing Relational and Temporal Knowledge in Visual Probabilistic  Networks\niCub World: Friendly Robots Help Building Good Vision Data-Sets\nSpeedy Object Detection based on Shape\nTop-down and Bottom-up 
Feature Combination for Multi-sensor Attentive  Robots\nScalable $k$-NN graph construction\nCSIFT Based Locality-constrained Linear Coding for Image Classification\nGlasgow's Stereo Image Database of Garments\nLarge Scale Visual Recommendations From Street Fashion Images\nConvex Relaxations of SE(2) and SE(3) for Visual Pose Estimation\nEfficient Background Modeling Based on Sparse Representation and Outlier  Iterative Removal\nRegression-Based Image Alignment for General Object Categories\nGlobally Optimal Joint Image Segmentation and Shape Matching Based on  Wasserstein Modes\nUnified mobile public health care system (UMPHCS) for underdeveloped  countries\nGASP : Geometric Association with Surface Patches\nAnisotropic Agglomerative Adaptive Mean-Shift\nRevisiting Kernelized Locality-Sensitive Hashing for Improved  Large-Scale Image Retrieval\nA Latent Clothing Attribute Approach for Human Pose Estimation\nExploring Human Vision Driven Features for Pedestrian Detection\nVisual Summary of Egocentric Photostreams by Representative Keyframes\nParametric Regression on the Grassmannian\nSalient Structure Detection by Context-Guided Visual Search\nDriver Gaze Region Estimation Without Using Eye Movement\nSubspace Alignment Based Domain Adaptation for RCNN Detector\nBuilding a Large-scale Multimodal Knowledge Base System for Answering  Visual Queries\nParticle detection and tracking in fluorescence time-lapse imaging: a  contrario approach\nOcclusion-Aware Object Localization, Segmentation and Pose Estimation\nAction-Conditional Video Prediction using Deep Networks in Atari Games\nSingle Image Dehazing through Improved Atmospheric Light Estimation\nEvent-based Camera Pose Tracking using a Generative Event Model\nPerformance Characterization of Image Feature Detectors in Relation to  the Scene Content Utilizing a Large Image Database\nDepth Extraction from Videos Using Geometric Context and Occlusion  Boundaries\nAccurate Vision-based Vehicle Localization using Satellite 
Imagery\nBioinspired Visual Motion Estimation\nTowards Vision-Based Deep Reinforcement Learning for Robotic Motion  Control\nLabeled pupils in the wild: A dataset for studying pupil detection in  unconstrained environments\nLearning High-level Prior with Convolutional Neural Networks for  Semantic Segmentation\nHorizon Lines in the Wild\nCounting Everyday Objects in Everyday Scenes\nReversible Image Merging for Low-level Machine Vision\nDeep Residual Networks with Exponential Linear Unit\nInvariant feature extraction from event based stimuli\nPhotometric Bundle Adjustment for Vision-Based SLAM\nLearning Joint Representations of Videos and Sentences with Web Image  Search\nCamera Pose Estimation from Lines using Plücker Coordinates\nDepth2Action: Exploring Embedded Depth for Large-Scale Action  Recognition\nMultiple objects tracking in surveillance video using color and Hu  moments\nVehicle Detection from 3D Lidar Using Fully Convolutional Network\nA Biomimetic Model of the Outer Plexiform Layer by Incorporating  Memristive Devices\nShadow Estimation Method for \"The Episolar Constraint: Monocular Shape  from Shadow Correspondence\"\nA Novel Approach for Canvas Accessibility Problem in HTML5\nFast Edge Detection Using Structured Forests\nRoad Detection via On--line Label Transfer\nVideo (language) modeling: a baseline for generative models of natural  videos\nA Stable Multi-Scale Kernel for Topological Machine Learning\nAn Expressive Deep Model for Human Action Parsing from A Single Image\nThe NUbots Team Description Paper 2015\nPredicting opponent team activity in a RoboCup environment\nBuilding Statistical Shape Spaces for 3D Human Modeling\nPredicting Complete 3D Models of Indoor Scenes\nHolistically-Nested Edge Detection\nTurkerGaze: Crowdsourcing Saliency with Webcam based Eye Tracking\nCircle-based Eye Center Localization (CECL)\nDiving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual  Sentiment Prediction\nKernelized Deep Convolutional Neural 
Network for Describing Complex  Images\nDeep Multimodal Embedding: Manipulating Novel Objects with Point-clouds,  Language and Trajectories\nSegment-Phrase Table for Semantic Segmentation, Visual Entailment and  Paraphrasing\nThe Next Best Underwater View\nCross-dimensional Weighting for Aggregated Deep Convolutional Features\nUnsupervised Temporal Segmentation of Repetitive Human Actions Based on  Kinematic Modeling and Frequency Analysis\nDomain Adaptation and Transfer Learning in StochasticNets\nAnalysis of Vessel Connectivities in Retinal Images by Cortically  Inspired Spectral Clustering\nWeakly-supervised Disentangling with Recurrent Transformations for 3D  View Synthesis\nOn Some Properties of Calibrated Trifocal Tensors\nFacial age estimation using BSIF and LBP\nJoint Object-Material Category Segmentation from Audio-Visual Cues\nTowards Declarative Safety Rules for Perception Specification  Architectures\nBit-Planes: Dense Subpixel Alignment of Binary Descriptors\nAre Elephants Bigger than Butterflies? 
Reasoning about Sizes of Objects\nDensity-based Denoising of Point Cloud\nTemporally Robust Global Motion Compensation by Keypoint-based  Congealing\nMulti-modal Tracking for Object based SLAM\nCross-modal Supervision for Learning Active Speaker Detection in Video\nRich Image Captioning in the Wild\nWEPSAM: Weakly Pre-Learnt Saliency Model\nHierarchical Modeling of Multidimensional Data in Regularly Decomposed  Spaces: Applications in Image Analysis\nLearning Action Maps of Large Environments via First-Person Vision\nSoftware Assumptions Failure Tolerance: Role, Strategies, and Visions\nChained Predictions Using Convolutional Neural Networks\nHARRISON: A Benchmark on HAshtag Recommendation for Real-world Images in  Social Networks\nLocal Perturb-and-MAP for Structured Prediction\nAction Classification via Concepts and Attributes\nEnd-to-End Instance Segmentation with Recurrent Attention\nModeling Photographic Composition via Triangles\nDeeper Depth Prediction with Fully Convolutional Residual Networks\nMultimodal Residual Learning for Visual QA\nShallow Networks for High-Accuracy Road Object-Detection\nPredictive Coding for Dynamic Vision : Development of Functional  Hierarchy in a Multiple Spatio-Temporal Scales RNN Model\nLearning without Forgetting\nCamera Elevation Estimation from a Single Mountain Landscape Photograph\nEfficient and Robust Pedestrian Detection using Deep Learning for  Human-Aware Navigation\nFusionNet: 3D Object Classification Using Multiple Data Representations\nHuman Pose Estimation in Space and Time using 3D CNN\nDeep Markov Random Field for Image Modeling\nBottom-up Instance Segmentation using Deep Higher-Order CRFs\nTrack Facial Points in Unconstrained Videos\nImage denoising via group sparsity residual constraint\nDevelopment of a Fuzzy Expert System based Liveliness Detection Scheme  for Biometric Authentication\nNon-flat Road Detection Based on A Local Descriptor\nA compact representation for minimizers of $k$-submodular 
functions\nReal-time Halfway Domain Reconstruction of Motion and Geometry\nmdBrief - A Fast Online Adaptable, Distorted Binary Descriptor for  Real-Time Applications Using Calibrated Wide-Angle Or Fisheye Cameras\nSoundNet: Learning Sound Representations from Unlabeled Video\nCRF-CNN: Modeling Structured Information in Human Pose Estimation\nCan fully convolutional networks perform well for general image  restoration problems?\nLight Field Stitching for Extended Synthetic Aperture\nHybrid Light Field Imaging for Improved Spatial Resolution and Depth  Range\nAnswering Image Riddles using Vision and Reasoning through Probabilistic  Soft Logic\nSANet: Structure-Aware Network for Visual Tracking\nRobust end-to-end deep audiovisual speech recognition\nObject Detection using Image Processing\nWhat Can Be Predicted from Six Seconds of Driver Glances?\nDid Evolution get it right? An evaluation of Near-Infrared imaging in  semantic scene segmentation using deep learning\nSequential Person Recognition in Photo Albums with a Recurrent Network\nEmbedded Line Scan Image Sensors: The Low Cost Alternative for High  Speed Imaging\nThe Mehler-Fock Transform and some Applications in Texture Analysis and  Color Processing\nUnrealStereo: A Synthetic Dataset for Analyzing Stereo Vision\nEfficient Optical flow and Stereo Vision for Velocity Estimation and  Obstacle Avoidance on an Autonomous Pocket Drone\nBeyond Holistic Object Recognition: Enriching Image Understanding with  Part States\nAutomatic Interpretation of Unordered Point Cloud Data for UAV  Navigation in Construction\nDesign, Control and Visual Navigation of the DelftaCopter\nOverlapping Cover Local Regression Machines\nWhat are the visual features underlying human versus machine vision?\nUnderstanding trained CNNs by indexing neuron selectivity\nRelative Camera Pose Estimation Using Convolutional Neural Networks\nAttentional Network for Visual Object Detection\nTowards Autonomous UAV Landing Based on Infrared Beacons and 
Particle  Filtering\nHandwritten Arabic Numeral Recognition using Deep Learning Neural  Networks\nThe Game Imitation: Deep Supervised Convolutional Networks for Quick  Video Game AI\nWeighted Motion Averaging for the Registration of Multi-View Range Scans\nAn Optimization Framework with Flexible Inexact Inner Iterations for  Nonconvex and Nonsmooth Programming\nContext Aware Query Image Representation for Particular Object Retrieval\nCombining Self-Supervised Learning and Imitation for Vision-Based Rope  Manipulation\nAutomatic Skin Lesion Analysis using Large-scale Dermoscopy Images and  Deep Residual Networks\nDeformable Convolutional Networks\nDiscriminatively Boosted Image Clustering with Fully Convolutional  Auto-Encoders\nLearning to Predict: A Fast Re-constructive Method to Generate  Multimodal Embeddings\nSatellite Image-based Localization via Learned Embeddings\nNon-Convex Weighted Lp Minimization based Group Sparse Representation  Framework for Image Denoising\nWeakly Supervised Dense Video Captioning\nEncoder Based Lifelong Learning\nCould you guess an interesting movie from the posters?: An evaluation of  vision-based features on movie poster database\nLearning Where to Look: Data-Driven Viewpoint Set Selection for 3D  Scenes\nReconstruction of~3-D Rigid Smooth Curves Moving Free when Two Traceable  Points Only are Available\nVirtual to Real Reinforcement Learning for Autonomous Driving\nTGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering\nOn Face Segmentation, Face Swapping, and Face Perception\nCamera Pose Filtering with Local Regression Geodesics on the Riemannian  Manifold of Dual Quaternions\nAdversarial PoseNet: A Structure-aware Convolutional Network for Human  Pose Estimation\nFace Recognition Machine Vision System Using Eigenfaces\nYou said that?\nDistribution of degrees of freedom over structure and motion of rigid  bodies\nAutomatic Extrinsic Calibration for Lidar-Stereo Vehicle Sensor Setups\nLearning to Refine Object 
Contours with a Top-Down Fully Convolutional  Encoder-Decoder Network\nMagnetic-Visual Sensor Fusion based Medical SLAM for Endoscopic Capsule  Robot\nFashion Forward: Forecasting Visual Style in Fashion\nExploring the structure of a real-time, arbitrary neural artistic  stylization network\nVANETs Meet Autonomous Vehicles: A Multimodal 3D Environment Learning  Approach\nCASENet: Deep Category-Aware Semantic Edge Detection\nReflection Invariant and Symmetry Detection\nCortexNet: a Generic Network Family for Robust Visual Temporal  Representations\nImage Matching via Loopy RNN\nVision-based Real Estate Price Estimation\nThe Devil is in the Decoder\nEmotional Filters: Automatic Image Transformation for Inducing Affect\nCascaded Scene Flow Prediction using Semantic Segmentation\nScene Graph Generation from Objects, Phrases and Region Captions\nStructure-measure: A New Way to Evaluate Foreground Maps\nAssociations among Image Assessments as Cost Functions in Linear  Decomposition: MSE, SSIM, and Correlation Coefficient\nAdversarial Robustness: Softmax versus Openmax\nCurvature sensing by vision and touch\nWeakly Supervised Image Annotation and Segmentation with Objects and  Attributes\nAdversarial Networks for Spatial Context-Aware Spectral Image  Reconstruction from RGB\nSeDAR - Semantic Detection and Ranging: Humans can localise without  LiDAR, can robots?\nOne-Shot Learning for Semantic Segmentation\nEfficient Online Surface Correction for Real-time Large-Scale 3D  Reconstruction\nConversational Exploratory Search via Interactive Storytelling\nDirect Pose Estimation with a Monocular Camera\nLearning quadrangulated patches for 3D shape parameterization and  completion\nOpen Source Dataset and Deep Learning Models for Online Digit Gesture  Recognition on Touchscreens\nEmerging Topics in Assistive Reading Technology: From Presentation to  Content Accessibility\nPerformance Characterization of Image Feature Detectors in Relation to  the Scene Content Utilizing a Large 
Image Database\nUAV and Service Robot Coordination for Indoor Object Search Tasks\nFast Shadow Detection from a Single Image Using a Patched Convolutional  Neural Network\nAre we Done with Object Recognition? The iCub robot's Perspective\nVision-based deep execution monitoring\nTranslating Videos to Commands for Robotic Manipulation with Deep  Recurrent Neural Networks\nClassification of Time-Series Images Using Deep Convolutional Neural  Networks\nVisual gesture variability between talkers in continuous visual speech\nGrader variability and the importance of reference standards for  evaluating machine learning models for diabetic retinopathy\nMultiframe Scene Flow with Piecewise Rigid Motion\nEnd-to-end Driving via Conditional Imitation Learning\nScalable Dense Monocular Surface Reconstruction\nSimultaneous Recognition and Pose Estimation of Instruments in Minimally  Invasive Surgery\nDropout Sampling for Robust Object Detection in Open-Set Conditions\nReal-time Convolutional Neural Networks for Emotion and Gender  Classification\nSEGCloud: Semantic Segmentation of 3D Point Clouds\nClass Correlation affects Single Object Localization using Pre-trained  ConvNets\nSqueeze-SegNet: A new fast Deep Convolutional Neural Network for  Semantic Segmentation\nThe Devil is in the Middle: Exploiting Mid-level Representations for  Cross-Domain Instance Matching\nW-Net: A Deep Model for Fully Unsupervised Image Segmentation\nLearning Depth from Monocular Videos using Direct Methods\nWhy my photos look sideways or upside down? 
Detecting Canonical  Orientation of Images using Convolutional Neural Networks\nVisual to Sound: Generating Natural Sound for Videos in the Wild\nVision Recognition using Discriminant Sparse Optimization Learning\nLearning to Navigate by Growing Deep Networks\nFlexible Stereo: Constrained, Non-rigid, Wide-baseline Stereo Vision for  Fixed-wing Aerial Platforms\nQuery-Efficient Black-box Adversarial Examples (superceded)\nLow-Shot Learning with Imprinted Weights\nHuman-Centric Data Cleaning [Vision]\nScreenerNet: Learning Self-Paced Curriculum for Deep Neural Networks\nA Real-Time Game Theoretic Planner for Autonomous Two-Player Drone  Racing\nLow-Shot Learning from Imaginary Data\nDynamics of Driver's Gaze: Explorations in Behavior Modeling & Maneuver  Prediction\nLearning random-walk label propagation for weakly-supervised semantic  segmentation\nEnhanced Image Classification With Data Augmentation Using Position  Coordinates\nDeep Image Super Resolution via Natural Image Priors\nVehicle Pose and Shape Estimation through Multiple Monocular Vision\nJoint 3D Reconstruction of a Static Scene and Moving Objects\nSalient Object Detection by Lossless Feature Reflection\nAutomatic Instrument Segmentation in Robot-Assisted Surgery Using Deep  Learning\nGenerating goal-directed visuomotor plans based on learning using a  predictive coding type deep visuomotor recurrent neural network model\nReview of Visual Saliency Detection with Comprehensive Information\nVision-Aided Absolute Trajectory Estimation Using an Unsupervised Deep  Network with Online Error Correction\nObject Captioning and Retrieval with Natural Language\nMAGSAC: marginalizing sample consensus\nModeling Camera Effects to Improve Deep Vision for Real and Synthetic  Data\nStacked Cross Attention for Image-Text Matching\nRobust Blind Deconvolution via Mirror Descent\nImproving DNN Robustness to Adversarial Attacks using Jacobian  Regularization\nOn the Importance of Stereo for Accurate Depth Estimation: An 
Efficient  Semi-Supervised Deep Neural Network Approach\nRandom Polyhedral Scenes: An Image Generator for Active Vision System  Experiments\nRobust Video Content Alignment and Compensation for Rain Removal in a  CNN Framework\nLearning Kinematic Descriptions using SPARE: Simulated and Physical  ARticulated Extendable dataset\nSpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional  Filters\nGuide Me: Interacting with Deep Networks\nPerformance Evaluation of 3D Correspondence Grouping Algorithms\nMonocular Vision based Collaborative Localization for Micro Aerial  Vehicle Swarms\nThe Sound of Pixels\nPCN: Part and Context Information for Pedestrian Detection with CNNs\nAnisotropic k-Nearest Neighbor Search Using Covariance Quadtree\nSLA-Oriented Resource Provisioning for Cloud Computing: Challenges,  Architecture, and Solutions\nAutomatic Pattern Classification by Unsupervised Learning Using  Dimensionality Reduction of Data with Mirroring Neural Networks\nLearning Graph Matching\nPoint-Set Registration: Coherent Point Drift\nBoosting k-NN for categorization of natural scenes\nTowards automated high-throughput screening of C. 
elegans on agar\nEffective Pedestrian Detection Using Center-symmetric Local  Binary/Trinary Patterns\nThe Object Projection Feature Estimation Problem in Unsupervised  Markerless 3D Motion Tracking\n3D Model Assisted Image Segmentation\nFilling-Based Techniques Applied to Object Projection Feature Estimation\nProbabilistic index maps for modeling natural signals\nReview of Statistical Shape Spaces for 3D Data with Comparative Analysis  for Human Faces\nAn Image Based Technique for Enhancement of Underwater Images\nA Survey of Appearance Models in Visual Object Tracking\nFiltering for More Accurate Dense Tissue Segmentation in Digitized  Mammograms\nA Study of Actor and Action Semantic Retention in Video Supervoxel  Segmentation\nOne-Shot Adaptation of Supervised Deep Convolutional Models\nLearning Human Pose Estimation Features with Convolutional Networks\nInfrared face recognition: a comprehensive review of methodologies and  databases\nExtraction of Line Word Character Segments Directly from Run Length  Compressed Printed Text Documents\nDepth Reconstruction from Sparse Samples: Representation, Algorithm, and  Sampling\nComputational Beauty: Aesthetic Judgment at the Intersection of Art and  Science\nTransferring Rich Feature Hierarchies for Robust Visual Tracking\nA Graph Theoretic Approach for Object Shape Representation in  Compositional Hierarchies Using a Hybrid Generative-Descriptive Model\nVQA: Visual Question Answering\nGeometry-Aware Neighborhood Search for Learning Local Models for Image  Reconstruction\nAn Image is Worth More than a Thousand Favorites: Surfacing the Hidden  Beauty of Flickr Pictures\nWeakly-Supervised Alignment of Video With Text\nTexture Modelling with Nested High-order Markov-Gibbs Random Fields\nRepresentational Distance Learning for Deep Neural Networks\nSemantic Object Parsing with Local-Global Long Short-Term Memory\nSample and Filter: Nonparametric Scene Parsing via Efficient Filtering\nStructural-RNN: Deep Learning on 
Spatio-Temporal Graphs\nRecombinator Networks: Learning Coarse-to-Fine Feature Aggregation\nUsing Apache Lucene to Search Vector of Locally Aggregated Descriptors\nEfficient Optimization for Rank-based Loss Functions\nA machine learning method for the large-scale evaluation of urban visual  environment\nEfficient Continuous Relaxations for Dense CRF\nAbsolute Pose Estimation from Line Correspondences using Direct Linear  Transformation\nFast Trajectory Simplification Algorithm for Natural User Interfaces in  Robot Programming by Demonstration\nImproved Anomaly Detection in Crowded Scenes via Cell-based Analysis of  Foreground Speed, Size and Texture\nClassification of Human Epithelial Type 2 Cell Indirect  Immunofluoresence Images via Codebook Based Descriptors\nDiscriminative Unsupervised Feature Learning with Exemplar Convolutional  Neural Networks\nGeneralized Twin Gaussian Processes using Sharma-Mittal Divergence\n3D-Assisted Image Feature Synthesis for Novel Views of an Object\nSemantic Image Segmentation with Deep Convolutional Nets and Fully  Connected CRFs\nA Framework for Symmetric Part Detection in Cluttered Scenes\nFast image-based obstacle detection from unmanned surface vehicles\nFaceNet: A Unified Embedding for Face Recognition and Clustering\nFast keypoint detection in video sequences\nSketch-based 3D Shape Retrieval using Convolutional Neural Networks\nFPA-CS: Focal Plane Array-based Compressive Imaging in Short-wave  Infrared\nFacial Expressions Tracking and Recognition: Database Protocols for  Systems Validation and Evaluation\nA Novel Feature Extraction Method for Scene Recognition Based on  Centered Convolutional Restricted Boltzmann Machines\nTrainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast  and Effective Image Restoration\nSemantic Image Segmentation via Deep Parsing Network\nSelecting Relevant Web Trained Concepts for Automated Event Retrieval\nLIBSVX: A Supervoxel Library and Benchmark for Early Video Processing\nA 
Semi-Automated Method for Object Segmentation in Infant's Egocentric  Videos to Study Object Perception\nUniform Hypergraph Partitioning: Provable Tensor Methods and Sampling  Techniques\nVisual Genome: Connecting Language and Vision Using Crowdsourced Dense  Image Annotations\nOn Complex Valued Convolutional Neural Networks\nContinuous 3D Label Stereo Matching using Local Expansion Moves\nAerial image geolocalization from recognition and matching of roads and  intersections\nSynthesizing the preferred inputs for neurons in neural networks via  deep generator networks\nHow Deep is the Feature Analysis underlying Rapid Visual Categorization?\nDeep learning trends for focal brain pathology segmentation in MRI\nOn the usability of deep networks for object-based image analysis\nDriving in the Matrix: Can Virtual Worlds Replace Human-Generated  Annotations for Real World Tasks?\nEM-Based Mixture Models Applied to Video Event Detection\nMessage-passing algorithms for synchronization problems over compact  groups\nAdaptive mixed norm optical flow estimation\nFast On-Line Kernel Density Estimation for Active Object Localization\nKernel Cross-View Collaborative Representation based Classification for  Person Re-Identification\nCombining Data-driven and Model-driven Methods for Robust Facial  Landmark Detection\nAggressive Quadrotor Flight through Narrow Gaps with Onboard Sensing and  Computing using Active Vision\nProcedural Generation of Videos to Train Deep Action Recognition  Networks\nDifferential Angular Imaging for Material Recognition\nSuperpixel Segmentation Using Gaussian Mixture Model\nStructured Deep Hashing with Convolutional Neural Networks for Fast  Person Re-identification\nSpeckle Reduction with Trained Nonlinear Diffusion Filtering\nParallel Structure from Motion from Local Increment to Global Averaging\nA Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification  and Domain Adaptation\nI2T2I: Learning Text to Image Synthesis with Textual Data 
Augmentation\nRobust Kronecker-Decomposable Component Analysis for Low-Rank Modeling\nCoordinating Filters for Faster Deep Neural Networks\nScalable Surface Reconstruction from Point Clouds with Extreme Scale and  Density Diversity\nChemception: A Deep Neural Network with Minimal Chemistry Knowledge  Matches the Performance of Expert-developed QSAR/QSPR Models\nObject Detection Using Deep CNNs Trained on Synthetic Images\nDetekcja upadku i wybranych akcji na sekwencjach obrazów cyfrowych\nRecurrent Residual Learning for Action Recognition\nSpectral Filter Tracking\nMulti-Branch Fully Convolutional Network for Face Detection\nDeep Neural Network Capacity\nLinear Differential Constraints for Photo-polarimetric Height Estimation\nComplete End-To-End Low Cost Solution To a 3D Scanning System with  Integrated Turntable\nSelf-Guiding Multimodal LSTM - when we do not have a perfect training  dataset for image captioning\nConvolutional Long Short-Term Memory Networks for Recognizing First  Person Interactions\nAn Evolutionary Computing Enriched RS Attack Resilient Medical Image  Steganography Model for Telemedicine Applications\nGeneric 3D Representation via Pose Estimation and Matching\nA Survey on Hardware Implementations of Visual Object Trackers\nCrowd counting via scale-adaptive convolutional neural network\nKnowledge Concentration: Learning 100K Object Classifiers in a Single  CNN\nDetection-aided liver lesion segmentation using deep learning\nSingle-epoch supernova classification with deep convolutional neural  networks\nRobust Kronecker Component Analysis\nDynamic Graph CNN for Learning on Point Clouds\nGame of Sketches: Deep Recurrent Models of Pictionary-style Word  Guessing\nBuild a Compact Binary Neural Network through Bit-level Sensitivity and  Data Pruning\nTracking Noisy Targets: A Review of Recent Object Tracking Approaches\nFull-Frame Scene Coordinate Regression for Image-Based Localization\nA Weighted Sparse Sampling and Smoothing Frame Transition 
Approach for  Semantic Fast-Forward First-Person Videos\nComputational Optimal Transport\nGAN-based Synthetic Medical Image Augmentation for increased CNN  Performance in Liver Lesion Classification\nDeep learning and its application to medical image segmentation\nHDM-Net: Monocular Non-Rigid 3D Reconstruction with Learned Deformation  Model\nLearning Free-Form Deformations for 3D Object Reconstruction\nLearning Beyond Human Expertise with Generative Models for Dental  Restorations\nImpact of ultrasound image reconstruction method on breast lesion  classification with neural transfer learning\nAccelerated Optimization in the PDE Framework: Formulations for the  Manifold of Diffeomorphisms\nVESICLE: Volumetric Evaluation of Synaptic Interfaces using Computer  vision at Large Scale\nTracing the boundaries of materials in transparent vessels using  computer vision\nWhittleSearch: Interactive Image Search with Relative Attribute Feedback\nFrom Virtual to Real World Visual Perception using Domain Adaptation --  The DPM as Example\nAugmented Reality Meets Computer Vision : Efficient Data Generation for  Urban Driving Scenes\nThe Cafe Wall Illusion: Local and Global Perception from multiple scale  to multiscale\nImproved Inception-Residual Convolutional Neural Network for Object  Recognition\nEstimacao Temporal da Deformacao entre Objectos utilizando uma  Metodologia Fisica\n3D-Ultrasound probe calibration for computer-guided diagnosis and  therapy\nMulticlass Approaches for Support Vector Machine Based Land Cover  Classification\nAn Exponential Lower Bound on the Complexity of Regularization Paths\nA Combinatorial Algorithm to Compute Regularization Paths\nCritical Analysis of Middleware Architectures for Large Scale  Distributed Systems\nGenus Computing for 3D digital objects: algorithm and implementation\nDetection of Microcalcification in Mammograms Using Wavelet Transform  and Fuzzy Shell Clustering\nEvolution of Things\nComparison of Persistent Homologies for Vector 
Functions: from  continuous to discrete and back\nA Simple and Correct Even-Odd Algorithm for the Point-in-Polygon Problem  for Complex Polygons\nComputationally Efficient Implementation of Convolution-based Locally  Adaptive Binarization Techniques\nOptimal Computational Trade-Off of Inexact Proximal Methods\nDifferent Operating Systems Compatible for Image Prepress Process in  Color Management: Analysis and Performance Testing\nComputing Consensus Curves\nOn Automation and Medical Image Interpretation, With Applications for  Laryngeal Imaging\nSpeech: A Challenge to Digital Signal Processing Technology for  Human-to-Computer Interaction\nFast Linearized Alternating Direction Minimization Algorithm with  Adaptive Parameter Selection for Multiplicative Noise Removal\nComputer Aided ECG Analysis - State of the Art and Upcoming Challenges\nA Sub-block Based Image Retrieval Using Modified Integrated Region  Matching\nNumerical Computation of Weil-Peterson Geodesics in the Universal  Teichmüller Space\nBeyond visual P300 based brain-computer interfacing paradigms\nWord Spotting in Cursive Handwritten Documents using Modified Character  Shape Codes\nComputing support for advanced medical data analysis and imaging\nStereo on a budget\nFollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory  and Computation\nComputing With Contextual Numbers\nFast Disk Conformal Parameterization of Simply-connected Open Surfaces\nA Video Database of Human Faces under Near Infra-Red Illumination for  Human Computer Interaction Aplications\nEfficient Hand Articulations Tracking using Adaptive Hand Model and  Depth map\nIris Codes Classification Using Discriminant and Witness Directions\nTowards a Generic Application Partitioning and Retraction Framework for  Pervasive Environments\nMatConvNet - Convolutional Neural Networks for MATLAB\nAccelerated graph-based nonlinear denoising filters\nEvolution of active categorical image classification via saccadic eye  movement\nA 
Systematic Approach to Blocking Convolutional Neural Networks\nFeature Extraction and Soft Computing Methods for Aerospace Structure  Defect Classification\nTheory and Tools for the Conversion of Analog to Spiking Convolutional  Neural Networks\nAn affective computational model for machine consciousness\nDevelopment of JavaScript-based deep learning platform and application  to distributed training\nLarge-scale image analysis using docker sandboxing\nDynamic Computational Time for Visual Attention\nStreaming Algorithm for Euler Characteristic Curves of Multidimensional  Images\nOptimizing Memory Efficiency for Convolution Kernels on Kepler GPUs\nmeProp: Sparsified Back Propagation for Accelerated Deep Learning with  Reduced Overfitting\nBalanced Quantization: An Effective and Efficient Approach to Quantized  Neural Networks\nHigh-Performance Out-of-core Block Randomized Singular Value  Decomposition on GPU\nVideo Salient Object Detection Using Spatiotemporal Deep Features\nAdaptive PCA for Time-Varying Data\nImproving Efficiency in Convolutional Neural Network with Multilinear  Filters\nWhich phoneme-to-viseme maps best improve visual-only computer  lip-reading?\nOptimal Control of Wireless Computing Networks\nLearning Random Fourier Features by Hybrid Constrained Optimization\nclcNet: Improving the Efficiency of Convolutional Neural Network using  Channel Local Convolutions\nA Quantization-Friendly Separable Convolution for MobileNets\nCloudbus Toolkit for Market-Oriented Cloud Computing\nA U.S. 
Research Roadmap for Human Computation\n21st Century Computer Architecture\nBinary-decomposed DCNN for accelerating computation and compressing  model without retraining\nIncomplete Dot Products for Dynamic Computation Scaling in Neural  Network Inference\nAssisted Video Sequences Indexing : Motion Analysis Based on Interest  Points\nDigital Color Imaging\nStructure from Motion: Theoretical Foundations of a Novel Approach Using  Custom Built Invariants\nA Virtual Library of Technical Publications\nA Theory of Cross-Validation Error\nThe Management of Context-Sensitive Features: A Review of Strategies\nStatistical efficiency of curve fitting algorithms\nDifferential Methods in Catadioptric Sensor Design with Applications to  Panoramic Imaging\nA New Analytical Radial Distortion Model for Camera Calibration\nRational Radial Distortion Models with Analytical Undistortion Formulae\nGeometrical Complexity of Classification Problems\nComputerized Face Detection and Recognition\nBlind Detection and Compensation of Camera Lens Geometric Distortions\nSearch Process and Probabilistic Bifix Approach\nThe Poincare conjecture for digital spaces. 
Properties of digital  n-dimensional disks and spheres\nConnection between continuous and digital n-manifolds and the Poincare  conjecture\nA kernel method for canonical correlation analysis\nConditional Expressions for Blind Deconvolution: Derivative form\nA novel set of rotationally and translationally invariant features for  images based on the non-commutative bispectrum\nContains and Inside relationships within combinatorial Pyramids\nWhich Point Configurations are Determined by the Distribution of their  Pairwise Distances?\nMatrices of Forests and the Analysis of Digraphs\nHilbert series of subspace arrangements\nNumerically Invariant Signature Curves\nBrainy light sensors with no diffraction limitations\nMultiresolution Approximation of Polygonal Curves in Linear Complexity\nAn Independent Evaluation of Subspace Face Recognition Algorithms\nMI image registration using prior knowledge\nA structure from motion inequality\nVariational local structure estimation for image super-resolution\nHigh-Order Nonparametric Belief-Propagation for Fast Image Inpainting\nLossless Representation of Graphs using Distributions\nAn Affinity Propagation Based method for Vector Quantization Codebook  Design\nAffine Geometry of Space Curves\nGraph kernels between point clouds\nA Fast Hierarchical Multilevel Image Segmentation Method using Unbiased  Estimators\nSome Aspects of Testing Process for Transport Streams in Digital Video  Broadcasting\nEfficient implementation of GALS systems over commercial synchronous  FPGAs: a new approach\nSpatio-activity based object detection\nA multilateral filtering method applied to airplane runway image\nThe Euler-Poincare theory of Metamorphosis\nFusion de classifieurs pour la classification d'images sonar\nExperts Fusion and Multilayer Perceptron Based on Belief Learning for  Sonar Image Classification\nConceptualization of seeded region growing by pixels aggregation. 
Part  2: how to localize a final partition invariant about the seeded region  initialisation order\nConceptualization of seeded region growing by pixels aggregation. Part  3: a wide range of algorithms\nAn image processing analysis of skin textures\nDesign and Implementation a 8 bits Pipeline Analog to Digital Converter  in the Technology 0.6 μm CMOS Process\nA Fuzzy Commitment Scheme\nModeling and Control with Local Linearizing Nadaraya Watson Regression\nDetecting the Most Unusual Part of a Digital Image\nStroke Fragmentation based on Geometry Features and HMM\nOver-enhancement Reduction in Local Histogram Equalization using its  Degrees of Freedom\nThe Digital Restoration of Da Vinci's Sketches\nDigital Restoration of Ancient Papyri\nColor Dipole Moments for Edge Detection\nBetter Global Polynomial Approximation for Image Rectification\nInformation Distance in Multiples\nQuality assessment of the MPEG-4 scalable video CODEC\nCoding cells of digital spaces: a framework to write generic digital  topology algorithms\nAdaptive Regularization of Ill-Posed Problems: Application to Non-rigid  Image Registration\nMultiple pattern classification by sparse subspace decomposition\nIntroducing New AdaBoost Features for Real-Time Vehicle Detection\nNon-photorealistic image processing: an Impressionist rendering\nLaser Actuated Presentation System\nKannada Character Recognition System A Review\nFusion of Multiple Matchers using SVM for Offline Signature  Identification\nGeometric approach to sampling and communication\nIterative exact global histogram specification and SSIM gradient ascent:  a proof of convergence, step size and parameter selection\nOffline Signature Identification by Fusion of Multiple Classifiers using  Statistical Learning Theory\nRecognition of Handwritten Roman Script Using Tesseract Open source OCR  Engine\nTrends and Techniques in Visual Gaze Analysis\nSignature Recognition using Multi Scale Fourier Descriptor And Wavelet  Transform\nClassification via 
Incoherent Subspaces\nOn the Subspace of Image Gradient Orientations\nFace Synthesis (FASY) System for Generation of a Face Image from Human  Description\nOptimization of Weighted Curvature for Image Segmentation\nA Two Stage Classification Approach for Handwritten Devanagari  Characters\nA novel approach for handwritten Devnagari character recognition\nFPGA Based Assembling of Facial Components for Human Face Construction\nUncertainty of visual measurement and efficient allocation of sensory  resources\nA Fast Decision Technique for Hierarchical Hough Transform for Line  Detection\nOrthogonal multifilters image processing of astronomical images from  scanned photographic plates\nA Miniature-Based Image Retrieval System\nOn Euclidean Norm Approximations\nVariational Iteration Method for Image Restoration\nDistance Measures for Reduced Ordering Based Vector Filters\nA Fuzzy Clustering Model for Fuzzy Data with Outliers\nDetecting Image Forgeries using Geometric Cues\nAffine-invariant diffusion geometry for the analysis of deformable 3D  shapes\nDiffusion framework for geometric and photometric data fusion in  non-rigid shape analysis\nChernoff information of exponential families\nAll Roads Lead To Rome\nImage Retrieval Method Using Top-surf Descriptor\nApproximation of Besov vectors by Paley-Wiener vectors in Hilbert spaces\nGaussian Affine Feature Detector\nFuzzy Rules and Evidence Theory for Satellite Image Analysis\nFrom a Modified Ambrosio-Tortorelli to a Randomized Part Hierarchy Tree\nHue Histograms to Spatiotemporal Local Features for Action Recognition\nImproving digital signal interpolation: L2-optimal kernels with  kernel-invariant interpolation speed\nIntent Inference and Syntactic Tracking with GMTI Measurements\nPose Estimation from a Single Depth Image for Arbitrary Kinematic  Skeletons\nVisual Secret Sharing Scheme using Grayscale Images\nOnline Vehicle Detection For Estimating Traffic Status\nA Variation of the Box-Counting Algorithm Applied to 
Colour Images\nBSVM: A Banded Suport Vector Machine\nA Fuzzy View on k-Means Based Signal Quantization with Application in  Iris Segmentation\nLabel-Specific Training Set Construction from Web Resource for Image  Annotation\nWeakly Supervised Learning of Foreground-Background Segmentation using  Masked RBMs\nA Invertible Dimension Reduction of Curves on a Manifold\nHamiltonian Streamline Guided Feature Extraction with Applications to  Face Detection\nEdge detection based on morphological amoebas\nGeneralized Fast Approximate Energy Minimization via Graph Cuts:  Alpha-Expansion Beta-Shrink Moves\nConjugate Variables as a Resource in Signal and Image Processing\nAn Efficient Codebook Initialization Approach for LBG Algorithm\nA Non-Iterative Solution to the Four-Point Three-Views Pose Problem in  Case of Collinear Cameras\nOn the digital homology groups of digital images\nDetachable Object Detection: Segmentation and Depth Ordering From  Short-Baseline Video\nProbabilistic prototype models for attributed graphs\nSquiggle - A Glyph Recognizer for Gesture Input\nTowards an interoperable information infrastructure providing decision  support for genomic medicine\nGraph Regularized Nonnegative Matrix Factorization for Hyperspectral  Data Unmixing\nSparsity and Robustness in Face Recognition\nCovariant fractional extension of the modified Laplace-operator used in  3D-shape recovery\nGood Pairs of Adjacency Relations in Arbitrary Dimensions\nA Single Euler Number Feature for Multi-font Multi-size Kannada Numeral  Recognition\nA self-portrait of young Leonardo\nCompressed sensing of astronomical images: orthogonal wavelets domains\nWard's Hierarchical Clustering Method: Clustering Criterion and  Agglomerative Algorithm\nLes crashs sont rationnels\nAdaptive Noise Reduction Scheme for Salt and Pepper\nPolynomial Regression on Riemannian Manifolds\nAn efficient FPGA implementation of MRI image filtering and tumor  characterization using Xilinx system generator\nMultiscale 
Fractal Descriptors Applied to Nanoscale Images\nImage Labeling and Segmentation using Hierarchical Conditional Random  Field Model\nCognitive Memory Network\nComparing Background Subtraction Algorithms and Method of Car Counting\nImproving feature selection algorithms using normalised feature  histograms\nUsing Covariance Matrices as Feature Descriptors for Vehicle Detection  from a Fixed Camera\nA feature extraction technique based on character geometry for character  recognition\nStochastic-Based Pattern Recognition Analysis\nImage Fusion and Re-Modified SPIHT for Fused Image\nIntegrated three-dimensional reconstruction using reflectance fields\nSVD-EBP Algorithm for Iris Pattern Recognition\nSimultaneous Object Detection, Tracking, and Event Recognition\nCompensating Interpolation Distortion by Using New Optimized Modular  Method\nOptimal Weights Mixed Filter for Removing Mixture of Gaussian and  Impulse Noises\nGray Level Co-Occurrence Matrices: Generalisation and Some New Features\nPotentials and Limits of Super-Resolution Algorithms and Signal  Reconstruction from Sparse Data\nAn efficient hierarchical graph based image segmentation\nOn multi-view feature learning\nIs margin preserved after random projection?\nPortraits of Julius Caesar: a proposal for 3D analysis\nStatistical Translation, Heat Kernels and Expected Distances\nLarge Scale Variational Bayesian Inference for Structured Scale Mixture  Models\nOnline Exploration of Polygons with Holes\nAnatomical Structure Segmentation in Liver MRI Images\nHMRF-EM-image: Implementation of the Hidden Markov Random Field Model  and its Expectation-Maximization Algorithm\nContent Based Multimedia Information Retrieval to Support Digital  Libraries\nMultisegmentation through wavelets: Comparing the efficacy of Daubechies  vs Coiflets\nA Survey Of Activity Recognition And Understanding The Behavior In Video  Survelliance\nA QCQP Approach to Triangulation\nA phase-sensitive method for filtering on the 
sphere\nInformation-theoretic Dictionary Learning for Image Classification\nTrace transform based method for color image domain identification\nVideo Data Visualization System: Semantic Classification And  Personalization\nA Comparative Study between Moravec and Harris Corner Detection of Noisy  Images Using Adaptive Wavelet Thresholding Technique\nMultimodal diffusion geometry by joint diagonalization of Laplacians\nWriting Reusable Digital Geometry Algorithms in a Generic Image  Processing Framework\nAn Efficient Color Face Verification Based on 2-Directional  2-Dimensional Feature Extraction\nModel based neuro-fuzzy ASR on Texas processor\nEnvironmental Sounds Spectrogram Classification using Log-Gabor Filters  and Multiclass Support Vector Machines\nRefinability of splines from lattice Voronoi cells\nSubset Selection for Gaussian Markov Random Fields\nReproduction of Images by Gamut Mapping and Creation of New Test Charts  in Prepress Process\nEnhanced Techniques for PDF Image Segmentation and Text Extraction\nSemisupervised Classifier Evaluation and Recalibration\nA notion of continuity in discrete spaces and applications\nLevel Set Estimation from Compressive Measurements using Box Constrained  Total Variation Regularization\nThree dimensional tracking of gold nanoparticles using digital  holographic microscopy\nA polygon-based interpolation operator for super-resolution imaging\nNovel Architecture for 3D model in virtual communities from detected  face\nMLPACK: A Scalable C++ Machine Learning Library\nResolution Enhancement of Range Images via Color-Image Segmentation\nThe fortresses of Ejin: an example of outlining a site from satellite  images\nPerformance Evaluation of Random Set Based Pedestrian Tracking  Algorithms\nLocalisation of Numerical Date Field in an Indian Handwritten Document\nEfficient Superimposition Recovering Algorithm\nImproving Perceptual Color Difference using Basic Color Terms\nUCF101: A Dataset of 101 Human Actions Classes From Videos 
in The Wild\nUnmixing of Hyperspectral Data Using Robust Statistics-based NMF\nRobust Face Recognition using Local Illumination Normalization and  Discriminant Feature Point Selection\nA Learning Framework for Morphological Operators using Counter-Harmonic  Mean\nMulti-target tracking algorithms in 3D\nApproximating rational Bezier curves by constrained Bezier curves of  arbitrary degree\nPerceptually Motivated Shape Context Which Uses Shape Interiors\nBlinking Molecule Tracking\nAdaptive Foreground and Shadow Detection inImage Sequences\nMatrix Approximation under Local Low-Rank Assumption\nRecurrent Online Clustering as a Spatio-Temporal Feature Extractor in  DeSTIN\nIndoor Semantic Segmentation using depth information\nGradient Driven Learning for Pooling in Visual Pipeline Feature  Extraction Models\nOn the Product Rule for Classification Problems\nSparse MRI for motion correction\nA Fast Learning Algorithm for Image Segmentation with Max-Pooling  Convolutional Networks\nA new compressive video sensing framework for mobile broadcast\nShape Characterization via Boundary Distortion\nIndian Sign Language Recognition Using Eigen Value Weighted Euclidean  Distance Based Classification Technique\nVerifying a platform for digital imaging: a multi-tool strategy\nA new approach for an unitary risk theory\nImage compression using anti-forensics method\nA survey on sensing methods and feature extraction algorithms for SLAM  problem\nAsynchronous Cellular Operations on Gray Images Extracting Topographic  Shape Features and Their Relations\nAn Entropy-based Learning Algorithm of Bayesian Conditional Trees\nGeneralizing k-means for an arbitrary distance matrix\nA Comparative Analysis on the Applicability of Entropy in remote sensing\nAnalysis Of Interest Points Of Curvelet Coefficients Contributions Of  Microscopic Images And Improvement Of Edges\nBlockwise SURE Shrinkage for Non-Local Means\nA novel automatic thresholding segmentation method with local adaptive  
thresholds\nK-Algorithm A Modified Technique for Noise Removal in Handwritten  Documents\nOPS-QFTs: A new type of quaternion Fourier transforms based on the  orthogonal planes split with one or two general pure quaternions\nAlgebraic foundations of split hypercomplex nonlinear adaptive filtering\nClifford Fourier-Mellin transform with two real square roots of -1 in  Cl(p,q), p+q=2\nImage segmentation by optimal and hierarchical piecewise constant  approximations\nAn Overview of the Research on Texture Based Plant Leaf Classification\nPersian Heritage Image Binarization Competition (PHIBC 2012)\nA Unified Framework of Elementary Geometric Transformation  Representation\nA two-layer Conditional Random Field for the classification of partially  occluded objects\nMaking Laplacians commute\nWho and Where: People and Location Co-Clustering\nA Robust Alternating Direction Method for Constrained Hybrid Variational  Deblurring Model\nThe Linearized Bregman Method via Split Feasibility Problems: Analysis  and Generalizations\nA Novel Approach in detecting pose orientation of a 3D face required for  face\nA method for nose-tip based 3D face registration using maximum intensity  algorithm\nEstimation of intrinsic volumes from digital grey-scale images\nA Non-Local Means Filter for Removing the Poisson Noise\nA new look at reweighted message passing\nExploration and Exploitation in Visuomotor Prediction of Autonomous  Agents\nDense Scattering Layer Removal\nMisfire Detection in IC Engine using Kstar Algorithm\nCan Facial Uniqueness be Inferred from Impostor Scores?\nGender Classification Using Gradient Direction Pattern\nEfficient Information Theoretic Clustering on Discrete Lattices\nNeighborhood filters and the decreasing rearrangement\nSkin Texture Recognition Using Neural Networks\nDetection of Partially Visible Objects\nA novel framework for image forgery localization\nImage forgery detection based on the fusion of machine learning and  block-matching methods\nThe Power of 
Asymmetry in Binary Hashing\nAnalysis and Understanding of Various Models for Efficient  Representation and Accurate Recognition of Human Faces\nCo-Sparse Textural Similarity for Image Segmentation\nEvaluation of Plane Detection with RANSAC According to Density of 3D  Point Clouds\nSome Improvements on Deep Convolutional Neural Network Based Image  Classification\nLearning Transformations for Classification Forests\nAn Efficient Edge Detection Technique by Two Dimensional Rectangular  Cellular Automata\nKey point selection and clustering of swimmer coordination through  Sparse Fisher-EM\nContent Based Image Indexing and Retrieval\nGesture recognition based mouse events\nA parameterless scale-space approach to find meaningful modes in  histograms - Application to image and spectrum segmentation\nVisual Tracking using Particle Swarm Optimization\nEdge detection of binary images using the method of masks\nHierarchical pixel clustering for image segmentation\nDelegating Custom Object Detection Tasks to a Universal Classification  System\nUse HMM and KNN for classifying corneal data\nTemporal Image Fusion\nSplines and Wavelets on Geophysically Relevant Manifolds\nSublinear Models for Graphs\nRemoving Mixture of Gaussian and Impulse Noise by Patch-Based Weighted  Means\nImage reconstruction from limited range projections using orthogonal  moments\nUsing n-grams models for visual semantic place recognition\nConsensus in the Wasserstein Metric Space of Probability Measures\nWeyl group orbit functions in image processing\nColor to Gray and Back transformation for distributing color digital  images\nDecreasing Weighted Sorted $\\ell_1$ Regularization\nGeneric Object Detection With Dense Neural Patterns and Regionlets\nLearning Fine-grained Image Similarity with Deep Ranking\nA General Homogeneous Matrix Formulation to 3D Rotation Geometric  Transformations\nCode Minimization for Fringe Projection Based 3D Stereo Sensors by  Calibration Improvement\nEntropy Based Cartoon 
Texture Separation\nImproving Image Clustering using Sparse Text and the Wisdom of the  Crowds\nCellular Automata based adaptive resampling technique for the processing  of remotely sensed imagery\nClassification of Basmati Rice Grain Variety using Image Processing and  Principal Component Analysis\nImaging with Kantorovich-Rubinstein discrepancy\nAn iterative approach to Hough transform without re-voting\nAn landcover fuzzy logic classification by maximumlikelihood\nPerformance evaluation of wavelet scattering network in image texture  classification in various color spaces\nModulation Classification via Gibbs Sampling Based on a Latent Dirichlet  Bayesian Network\nTurkish Presidential Elections TRT Publicity Speech Facial Expression  Analysis\nIntroduction to Clustering Algorithms and Applications\nFuzzy and entropy facial recognition\nBinary matrices of optimal autocorrelations as alignment marks\nComment on \"Ensemble Projection for Semi-supervised Image  Classification\"\nTree-Structure Bayesian Compressive Sensing for Video\nShape and Color Object Tracking for Real-Time Robotic Navigation\nOnline Tracking of Skin Colour Regions Against a Complex Background\nThe HAWKwood Database\nImprove CAPTCHA's Security Using Gaussian Blur Filter\nRemote sensing image classification exploiting multiple kernel learning\nVehicle Detection and Tracking Techniques: A Concise Review\nA Regularization Approach to Blind Deblurring and Denoising of QR  Barcodes\nImproved depth imaging by constrained full-waveform inversion\nAn Unsupervised Ensemble-based Markov Random Field Approach to  Microscope Cell Image Segmentation\nNew similarity index based on entropy and group theory\nMultilinear Principal Component Analysis Network for Tensor Object  Classification\nConditional Generative Adversarial Nets\nAbnormal Object Recognition: A Comprehensive Study\nCollecting Image Description Datasets using Crowdsourcing\nOn Coarse Graining of Information and Its Application to Pattern  
Recognition\nTen Years of Pedestrian Detection, What Have We Learned?\nA Unified Semantic Embedding: Relating Taxonomies and Attributes\nViewpoints and Keypoints\nLow-Rank and Sparse Matrix Decomposition with a-priori knowledge for  Dynamic 3D MRI reconstruction\nReal time Detection of Lane Markers in Urban Streets\nOn color image quality assessment using natural image statistics\nOpen-source code for manifold-based 3D rotation recovery of X-ray  scattering patterns\nHSI based colour image equalization using iterative nth root and nth  power\nLearning to Recognize Pedestrian Attribute\nImplementation of Auto Monitoring and Short-Message-Service System via  GSM Modem\nConstructing Binary Descriptors with a Stochastic Hill Climbing Search\nFiltered Channel Features for Pedestrian Detection\nFeature Sampling Strategies for Action Recognition\nEmbedding of binary image in the Gray planes\nHyper-parameter optimization of Deep Convolutional Networks for object  recognition\nVector Quantization by Minimizing Kullback-Leibler Divergence\nPose Induction for Novel Object Categories\nSynthCam3D: Semantic Understanding With Synthetic Indoor Scenes\nImage Segmentation by Size-Dependent Single Linkage Clustering of a  Watershed Basin Graph\nObject Class Detection and Classification using Multi Scale Gradient and  Corner Point based Shape Descriptors\nFast Guided Filter\nComparing persistence diagrams through complex vectors\nNoise in Structured-Light Stereo Depth Cameras: Modeling and its  Applications\nTraining Deeper Convolutional Networks with Deep Supervision\nDistributed Lustre activity tracking\nModified Hausdorff Fractal Dimension (MHFD)\nVanishing Point Attracts Eye Movements in Scene Free-viewing\nImproved Microaneurysm Detection using Deep Neural Networks\nExploring Nearest Neighbor Approaches for Image Captioning\nUnsupervised Segmentation of Overlapping Cervical Cell Cytoplasm\nNew HSL Distance Based Colour Clustering Algorithm\nRobust Rotation Synchronization via 
Low-rank and Sparse Matrix  Decomposition\nA comparative study between proposed Hyper Kurtosis based Modified  Duo-Histogram Equalization (HKMDHE) and Contrast Limited Adaptive Histogram  Equalization (CLAHE) for Contrast Enhancement Purpose of Low Contrast Human  Brain CT scan images\nDiscrete Independent Component Analysis (DICA) with Belief Propagation\nUsing Dimension Reduction to Improve the Classification of  High-dimensional Data\nSymbolic Segmentation Using Algorithm Selection\nGeometry of Graph Edit Distance Spaces\nFeature Representation for Online Signature Verification\nPose Embeddings: A Deep Architecture for Learning to Match Human Poses\nConvolutional Color Constancy\nReinventing Pocket Microscopy\nClosed Curves and Elementary Visual Object Identification\nLearning Robust Deep Face Representation\nHand Gesture Recognition Library\nPart Localization using Multi-Proposal Consensus for Fine-Grained  Categorization\nEffective Object Tracking in Unstructured Crowd Scenes\nNonlinear Spectral Analysis via One-homogeneous Functionals - Overview  and Future Prospects\nDeepLogo: Hitting Logo Recognition with the Deep Neural Network Hammer\nSimultaneous Deep Transfer Across Domains and Tasks\nFree-hand Sketch Synthesis with Deformable Stroke Models\nEvaluation of Joint Multi-Instance Multi-Label Learning For Breast  Cancer Diagnosis\nInteractive multiclass segmentation using superpixel classification\nFast sequential forensic camera identification\nDynamical spectral unmixing of multitemporal hyperspectral images\nWhat's the point? 
Frame-wise Pointing Gesture Recognition with  Latent-Dynamic Conditional Random Fields\nA Deep Siamese Network for Scene Detection in Broadcast Videos\nPixel-wise Segmentation of Street with Neural Networks\nProperties of the Sample Mean in Graph Spaces and the  Majorize-Minimize-Mean Algorithm\nRobust Registration of Calcium Images by Learned Contrast Synthesis\nCell identification in whole-brain multiview images of neural activation\nGenerating Images from Captions with Attention\nSliced Wasserstein Kernels for Probability Distributions\nA GMM-Based Stair Quality Model for Human Perceived JPEG Images\nA Directional Diffusion Algorithm for Inpainting\nHuman Curation and Convnets: Powering Item-to-Item Recommendations on  Pinterest\nDeeply-Recursive Convolutional Network for Image Super-Resolution\nUncovering Temporal Context for Video Question and Answering\nCoarse-to-fine Face Alignment with Multi-Scale Local Patch Regression\nCross-scale predictive dictionaries\nQuantitative Analysis of Particles Segregation\nWhat Players do with the Ball: A Physically Constrained Interaction  Modeling\nUnsupervised Deep Embedding for Clustering Analysis\nAcceleration of the PDHGM on strongly convex subspaces\nTop-k Multiclass SVM\nMulti-view 3D Models from Single Images with a Convolutional Network\nMulti-Volume High Resolution RGB-D Mapping with Dynamic Volume Placement\nWhere To Look: Focus Regions for Visual Question Answering\nExploring Person Context and Local Scene Context for Object Detection\nMidRank: Learning to rank based on subsequences\nOn-line Recognition of Handwritten Mathematical Symbols\nModeling Visual Compatibility through Hierarchical Mid-level Elements\nA Classification Leveraged Object Detector\nKeyboard Based Control of Four Dimensional Rotations\nRecurrent Attentional Networks for Saliency Detection\nAn incremental linear-time learning algorithm for the Optimum-Path  Forest classifier\nDeep3D: Fully Automatic 2D-to-3D Video Conversion with Deep  
Convolutional Neural Networks\nSingle-Image Depth Perception in the Wild\nLow-Rank Matrix Recovery using Gabidulin Codes in Characteristic Zero\nVisual saliency detection: a Kalman filter based approach\nFrom Dynamic to Static Semantics, Quantitatively\nCan Boosting with SVM as Week Learners Help?\nRight whale recognition using convolutional neural networks\nEvaluation of the Effect of Improper Segmentation on Word Spotting\nImproving Human Action Recognition by Non-action Classification\nScalable Gaussian Processes for Supervised Hashing\nSemantic Reasoning for Context-aware Internet of Things Applications\nTowards Conceptual Compression\nLearning Robust Features using Deep Learning for Automatic Seizure  Detection\nUnion is strength in lossy image compression\nKalman's shrinkage for wavelet-based despeckling of SAR images\nNeural shrinkage for wavelet-based SAR despeckling\nOne-Class Slab Support Vector Machine\nPermutation NMF\nSpoofing 2D Face Detection: Machines See People Who Aren't There\nBootstrapping Face Detection with Hard Negative Examples\nA Factorization Approach to Inertial Affine Structure from Motion\nCanonical Correlation Inference for Mapping Abstract Scenes to Text\nScout-It: Interior tomography using modified scout acquisition\nStacked Approximated Regression Machine: A Simple Deep Learning Approach\nDynamic Hand Gesture Recognition for Wearable Devices with Low  Complexity Recurrent Neural Networks\nGenerating Synthetic Data for Text Recognition\nTemporally Consistent Motion Segmentation from RGB-D Video\nIn the Saddle: Chasing Fast and Repeatable Features\nFine Hand Segmentation using Convolutional Neural Networks\nMindX: Denoising Mixed Impulse Poisson-Gaussian Noise Using Proximal  Algorithms\nHuman Action Recognition without Human\nNew Methods to Improve Large-Scale Microscopy Image Analysis with Prior  Knowledge and Uncertainty\nA statistical model of tristimulus measurements within and between OLED  displays\nMethods of Hierarchical 
Clustering\nConsiderations and Results in Multimedia and DVB Application Development  on Philips Nexperia Platform\nConvergence of Variational Regularization Methods for Imaging on  Riemannian Manifolds\nAn Algorithmic Solution to the Five-Point Pose Problem Based on the  Cayley Representation of Rotations\nAlignment of Microtubule Imagery\nNon-Gaussian Scale Space Filtering with 2 by 2 Matrix of Linear Filters\nControlled Total Variation regularization for inverse problems\nFace Recognition Based on SVM and 2DPCA\nAge group and gender recognition from human facial images\nA software for aging faces applied to ancient marble busts\nMultiscale Fractal Descriptors Applied to Texture Classification\nImage Compression predicated on Recurrent Iterated Function Systems\nDigit Recognition in Handwritten Weather Records\nA Convex Approach for Image Hallucination\nBingham Procrustean Alignment for Object Detection in Clutter\nFast image segmentation and restoration using parametric curve evolution  with junctions and topology changes\nInvertibility and Robustness of Phaseless Reconstruction\nEdge-detection applied to moving sand dunes on Mars\nStability of Phase Retrievable Frames\nAn Estimation Method of Measuring Image Quality for Compressed Images of  Human Face\nForeground segmentation based on multi-resolution and matting\nHandwritten Character Recognition In Malayalam Scripts- A Review\nFTVd is beyond Fast Total Variation regularized Deconvolution\nDeconstruction of compound objects from image sets\nAmbiguous Proximity Distribution\nThe constitution of visual perceptual units in the functional  architecture of V1\nImage retrieval with hierarchical matching pursuit\nImprovement Tracking Dynamic Programming using Replication Function for  Continuous Sign Language Recognition\nDenosing Using Wavelets and Projections onto the L1-Ball\nJoint Training of a Convolutional Network and a Graphical Model for  Human Pose Estimation\nWhy are images smooth?\nRobust Outlier 
Detection Technique in Data Mining: A Univariate Approach\nR-CNNs for Pose Estimation and Action Detection\nSaccadic Eye Movements and the Generalized Pareto Distribution\nWeakly-supervised Discovery of Visual Pattern Configurations\nHow good are detection proposals, really?\nMulti-tensor Completion for Estimating Missing Values in Video Data\nIdentifying Synapses Using Deep and Wide Multiscale Recursive Networks\nA feasible roadmap for developing volumetric probability atlas of  localized prostate cancer\nConvolutional Networks for Image Processing by Coupled Oscillator Arrays\nDISA at ImageCLEF 2014 Revised: Search-based Image Annotation with DeCAF  Features\nTensity Research Based on the Information of Eye Movement\nA Survey on Heterogeneous Face Recognition: Sketch, Infra-red, 3D and  Low-resolution\nDeep Regression for Face Alignment\nA Global Approach for Solving Edge-Matching Puzzles\nCtrax extensions for tracking in difficult lighting conditions\nImage Classification with A Deep Network Model based on Compressive  Sensing\nA Deep Graph Embedding Network Model for Face Recognition\nLocation Recognition Over Large Time Lags\nFeedforward semantic segmentation with zoom-out features\nCovariance estimation using conjugate gradient for 3D classification in  Cryo-EM\nColorisation et texturation temps réel d'environnements urbains par  système mobile avec scanner laser et caméra fish-eye\nThe myth of the Digital Earth between fragmentation and wholeness\nEdge Preserving Multi-Modal Registration Based On Gradient Intensity  Self-Similarity\nOriented Edge Forests for Boundary Detection\nA survey of modern optical character recognition techniques\nThe application of the Bayes Ying Yang harmony based GMMs in on-line  signature verification\nAn Algebraical Model for Gray Level Images\nImage Dynamic Range Enhancement in the Context of Logarithmic Models\nGray Level Image Enhancement Using Polygonal Functions\nVisual Instance Retrieval with Deep Convolutional Networks\nIn 
Search of the Real Inductive Bias: On the Role of Implicit  Regularization in Deep Learning\nDeep metric learning using Triplet network\nA New Way to Factorize Linear Cameras\nFractal descriptors based on the probability dimension: a texture  analysis and classification approach\nFunctional correspondence by matrix completion\nDomain-Size Pooling in Local Descriptors: DSP-SIFT\nA multistep segmentation algorithm for vessel extraction in medical  imaging\nDeep Roto-Translation Scattering for Object Classification\nHierarchical Maximum-Margin Clustering\nPredicting Alzheimer's disease: a neuroimaging study with 3D  convolutional neural networks\nRandom Coordinate Descent Methods for Minimizing Decomposable Submodular  Functions\nReal Time Implementation of Spatial Filtering On FPGA\nClustering by Descending to the Nearest Neighbor in the Delaunay Graph  Space\nCross-Modality Hashing with Partial Correspondence\nSome enumerations of binary digital images\nProbabilistic Zero-shot Classification with Semantic Rankings\nOn debiasing restoration algorithms: applications to total-variation and  nonlocal-means\nFrequency Domain TOF: Encoding Object Depth in Modulation Frequency\nFully Connected Deep Structured Networks\nDesigning A Composite Dictionary Adaptively From Joint Examples\nLearning to Detect Vehicles by Clustering Appearance Patterns\nCharacterizing driving behavior using automatic visual analysis\nNovel Super-Resolution Method Based on High Order Nonlocal-Means\nPattern Recognition of Bearing Faults using Smoother Statistical  Features\nInitialization Strategies of Spatio-Temporal Convolutional Neural  Networks\nRANSAC based three points algorithm for ellipse fitting of spherical  object's projection\nReal-time multi-view deconvolution\nDirect l_(2,p)-Norm Learning for Feature Selection\nRobust Anomaly Detection Using Semidefinite Programming\nKnowledge driven Offline to Online Script Conversion\nOn-line Handwritten Devanagari Character Recognition using Fuzzy  
Directional Features\nA Multicomponent Approach to Nonrigid Registration of Diffusion Tensor  Images\nImage Subset Selection Using Gabor Filters and Neural Networks\nConnectivity Preserving Multivalued Functions in Digital Topology\nExtraction of Protein Sequence Motif Information using PSO K-Means\nWhat Do Deep CNNs Learn About Objects?\nUnsupervised Feature Learning from Temporal Data\nJoint Learning of Distributed Representations for Images and Texts\nMultiple Measurements and Joint Dimensionality Reduction for Large Scale  Image Search with Short Vectors - Extended Version\nPreprint Imagining In-Air Interaction for Hemiplegia Sufferer\nSegmentation of Subspaces in Sequential Data\nCaffe con Troll: Shallow Ideas to Speed Up Deep Learning\nSIFT Vs SURF: Quantifying the Variation in Transformations\nLinear Spatial Pyramid Matching Using Non-convex and non-negative Sparse  Coding for Image Classification\nImage Segmentation and Restoration Using Parametric Contours With Free  Endpoints\nMid-level Elements for Object Detection\nImproved repeatability measures for evaluating performance of feature  detectors\nFast R-CNN\nA Review of Feature and Data Fusion with Medical Images\nClassify Images with Conceptor Network\nWell-posedness of a nonlinear integro-differential problem and its  rearranged formulation\nWavelets and continuous wavelet transform for autostereoscopic multiview  images\nFast ConvNets Using Group-wise Brain Damage\nFast Geometric Fit Algorithm for Sphere Using Exact Solution\nTechnical Report: Image Captioning with Semantically Similar Images\nDeep Structured Models For Group Activity Recognition\nDeep Secure Encoding: An Application to Face Recognition\nAutomatic Layer Separation using Light Field Imaging\nAutonomous 3D Reconstruction Using a MAV\nAutomatic vehicle tracking and recognition from aerial image sequences\nSpectral Motion Synchronization in SE(3)\nLand Use Classification in Remote Sensing Images by Convolutional Neural  Networks\nPartial 
matching face recognition method for rehabilitation nursing  robots beds\nOn Hyperspectral Classification in the Compressed Domain\nSingle and Multiple Illuminant Estimation Using Convolutional Neural  Networks\nOn the convergence of the sparse possibilistic c-means algorithm\nUnconstrained Face Verification using Deep CNN Features\nSimulation of optical flow and fuzzy based obstacle avoidance system for  mobile robots\nA Deep Pyramid Deformable Part Model for Face Detection\nIntroducing Geometry in Active Learning for Image Segmentation\nAn algorithm for Left Atrial Thrombi detection using Transesophageal  Echocardiography\nImage Type Water Meter Character Recognition Based on Embedded DSP\nDiffusion tensor imaging with deterministic error bounds\nSemantic Video Segmentation : Exploring Inference Efficiency\nLearning Sparse Feature Representations using Probabilistic Quadtrees  and Deep Belief Nets\nNoSPaM Manual - A Tool for Node-Specific Triad Pattern Mining\nImage Set Querying Based Localization\nHomotopy relations for digital images\nSpatially Encoding Temporal Correlations to Classify Temporal Data Using  Convolutional Neural Networks\nRetinex filtering of foggy images: generation of a bulk set with  selection and ranking\nLight Field Reconstruction Using Shearlet Transform\nDouble Sparse Multi-Frame Image Super Resolution\nActive Learning for Delineation of Curvilinear Structures\nASIST: Automatic Semantically Invariant Scene Transformation\nImage reconstruction from dense binary pixels\nOn The Continuous Steering of the Scale of Tight Wavelet Frames\nIs Hamming distance the only way for matching binary image feature  descriptors?\nVideo captioning with recurrent networks based on frame- and video-level  features and visual content classification\nRobust Dictionary based Data Representation\nDeep Tracking: Visual Tracking Using Deep Convolutional Networks\nEffects of GIMP Retinex Filtering Evaluated by the Image Entropy\nCar Segmentation and Pose Estimation 
using 3D Object Models\nCost-based Feature Transfer for Vehicle Occupant Classification\nHow can one sample images with sampling rates close to the theoretical  minimum?\nImage-based Vehicle Analysis using Deep Neural Network: A Systematic  Study\nAngrier Birds: Bayesian reinforcement learning\nQuality Adaptive Low-Rank Based JPEG Decoding with Applications\nStochastic Dykstra Algorithms for Metric Learning on Positive  Semi-Definite Cone\nVisual Script and Language Identification\nCreativity in Machine Learning\nCombining Markov Random Fields and Convolutional Neural Networks for  Image Synthesis\nProactive Message Passing on Memory Factor Networks\nEye detection in digital images: challenges and solutions\nWhen is Clustering Perturbation Robust?\nUnsupervised Deep Hashing for Large-scale Visual Search\nImage and Information\nImproved Eigenfeature Regularization for Face Identification\nGabor Wavelets in Image Processing\nTriplet Similarity Embedding for Face Verification\nConvolutional Radio Modulation Recognition Networks\nDeep Feature-based Face Detection on Mobile Devices\nMulti-resolution Compressive Sensing Reconstruction\nContext-guided diffusion for label propagation on graphs\nA Survey of Semantic Segmentation\nAutomatic Building Extraction in Aerial Scenes Using Convolutional  Networks\nCar Type Recognition with Deep Neural Networks\nRobust Detection of Intensity Variant Clones in Forged and JPEG  Compressed Images\nSHAPE: Linear-Time Camera Pose Estimation With Quadratic Error-Decay\nOn the Accuracy of Point Localisation in a Circular Camera-Array\nA straightforward method to assess motion blur for different types of  displays\nDual Smoothing and Level Set Techniques for Variational Matrix  Decomposition\nA novel and automatic pectoral muscle identification algorithm for  mediolateral oblique (MLO) view mammograms using ImageJ\nProximal groupoid patterns In digital images\nFrom A to Z: Supervised Transfer of Style and Content Using Deep Neural  Network 
Generators\nTowards Building an RGBD-M Scanner\nLearning zeroth class dictionary for human action recognition\nU-CATCH: Using Color ATtribute of image patCHes in binary descriptors\nDeep video gesture recognition using illumination invariants\nLearning Representations for Automatic Colorization\nMulti-velocity neural networks for gesture recognition in videos\nStacked Hourglass Networks for Human Pose Estimation\nSimple Does It: Weakly Supervised Instance and Semantic Segmentation\nPosition and Vector Detection of Blind Spot motion with the Horn-Schunck  Optical Flow\nObject Recognition Based on Amounts of Unlabeled Data\nLIFT: Learned Invariant Feature Transform\nConfidence driven TGV fusion\nDetecting Burnscar from Hyperspectral Imagery via Sparse Representation  with Low-Rank Interference\nCompression Artifacts Removal Using Convolutional Neural Networks\nDiscovering Useful Parts for Pose Estimation in Sparsely Annotated  Datasets\nAutomatic Identification of Retinal Arteries and Veins in Fundus Images  using Local Binary Patterns\nMining Discriminative Triplets of Patches for Fine-Grained  Classification\nRobust Bayesian Method for Simultaneous Block Sparse Signal Recovery  with Applications to Face Recognition\nOn Image segmentation using Fractional Gradients-Learning Model  Parameters using Approximate Marginal Inference\nWeakly Supervised Learning of Affordances\nImage-level Classification in Hyperspectral Images using Feature  Descriptors, with Application to Face Recognition\nImproved Image Boundaries for Better Video Segmentation\nMultimodal Sparse Coding for Event Detection\nImage segmentation with superpixel-based covariance descriptors in  low-rank representation\nImproving Weakly-Supervised Object Localization By Micro-Annotation\nContour-based 3d tongue motion visualization using ultrasound image  sequences\nDevelopment of a 3D tongue motion visualization platform based on  ultrasound image sequences\nA Formal Evaluation of PSNR as Quality 
Measurement Parameter for Image  Segmentation Algorithms\nTrajectory probability hypothesis density filter\nLatent Bi-constraint SVM for Video-based Object Recognition\nOn Recognizing Transparent Objects in Domestic Environments Using Fusion  of Multiple Sensor Modalities\nIncorporating long-range consistency in CNN-based texture generation\nTRex: A Tomography Reconstruction Proximal Framework for Robust Sparse  View X-Ray Applications\nA practical local tomography reconstruction algorithm based on known  subregion\nLearning Abstract Classes using Deep Learning\nAutomatic 3D Reconstruction for Symmetric Shapes\nPreserving Color in Neural Artistic Style Transfer\nModel-based Deep Hand Pose Estimation\nAttribute Recognition from Adaptive Parts\nApplication of Convolutional Neural Network for Image Classification on  Pascal VOC Challenge 2012 dataset\nAdaptive Gray World-Based Color Normalization of Thin Blood Film Images\nNew version of Gram-Schmidt Process with inverse for Signal and Image  Processing\nInformation-theoretical label embeddings for large-scale image  classification\nA Topological Lowpass Filter for Quasiperiodic Signals\nLearning the Roots of Visual Domain Shift\nDetection of surface defects on ceramic tiles based on morphological  techniques\nCombined Classifiers for Invariant Face Recognition\nA Statistical Test for Joint Distributions Equivalence\nInstance Normalization: The Missing Ingredient for Fast Stylization\nOnline Trajectory Segmentation and Summary With Applications to  Visualization and Retrieval\nLocal Feature Detectors, Descriptors, and Image Representations: A  Survey\nSEMBED: Semantic Embedding of Egocentric Action Videos\nDefeating Image Obfuscation with Deep Learning\nTowards Automated Melanoma Screening: Exploring Transfer Learning  Schemes\nMaking a Case for Learning Motion Representations with Phase\nSemantic Video Trailers\nRectifier Neural Network with a Dual-Pathway Architecture for Image  Denoising\nLie-X: Depth Image Based 
Articulated Object Pose Estimation, Tracking,  and Action Recognition on Lie Groups\nA Perspective on Deep Imaging\nDeep Impression: Audiovisual Deep Residual Networks for Multimodal  Apparent Personality Trait Recognition\nLabel-Free Supervision of Neural Networks with Physics and Domain  Knowledge\nRobust Estimation of Multiple Inlier Structures\nPartial Least Squares Regression on Riemannian Manifolds and Its  Application in Classifications\nDistributed Training of Deep Neural Networks: Theoretical and Practical  Limits of Parallel Scalability\nSuper-resolving multiresolution images with band-independant geometry of  multispectral pixels\nAutomatic Construction of a Recurrent Neural Network based Classifier  for Vehicle Passage Detection\nSimultaneous Low-rank Component and Graph Estimation for  High-dimensional Graph Signals: Application to Brain Imaging\nAutomated Breast Lesion Segmentation in Ultrasound Images\nSemi Automatic Color Segmentation of Document Pages\nPano2CAD: Room Layout From A Single Panorama Image\nRedefining Binarization and the Visual Archetype\nTraining a Feedback Loop for Hand Pose Estimation\nLow-dose CT denoising with convolutional neural network\nFast Image Classification by Boosting Fuzzy Classifiers\nConvex Histogram-Based Joint Image Segmentation with Regularized Optimal  Transport Cost\nPlaces: An Image Database for Deep Scene Understanding\nDeep disentangled representations for volumetric reconstruction\nOn the Existence of a Sample Mean in Dynamic Time Warping Spaces\nM2CAI Workflow Challenge: Convolutional Neural Networks with Time  Smoothing and Hidden Markov Model for Video Frames Classification\nMixed context networks for semantic segmentation\nKernel Alignment for Unsupervised Transfer Learning\nUtilization of Deep Reinforcement Learning for saccadic-based object  visual search\nDetecting Rainfall Onset Using Sky Images\nRecord Counting in Historical Handwritten Documents with Convolutional  Neural Networks\nTool and Phase 
recognition using contextual CNN features\nRecent advances in content based video copy detection\nThe TUM LapChole dataset for the M2CAI 2016 workflow challenge\nBest-Buddies Tracking\nAn All-In-One Convolutional Neural Network for Face Analysis\nRough Set Based Color Channel Selection\nDeep Convolutional Neural Network for 6-DOF Image Localization\nGenerative Shape Models: Joint Text Recognition and Segmentation with  Very Little Training Data\nX-ray Scattering Image Classification Using Deep Learning\nEvaluating Urbanization from Satellite and Aerial Images by means of a  statistical approach to the texture analysis\nOptimized clothes segmentation to boost gender classification in  unconstrained scenarios\nResponses to Critiques on Machine Learning of Criminality Perceptions  (Addendum of arXiv:1611.04135)\nA DNN Framework For Text Image Rectification From Planar Transformations\nHerding Generalizes Diverse M -Best Solutions\nAutomatic discovery of discriminative parts as a quadratic assignment  problem\nCIFAR-10: KNN-based Ensemble of Classifiers\nNeural Style Representations and the Large-Scale Classification of  Artistic Style\nOptical Flow Requires Multiple Strategies (but only one network)\nGeneralized BackPropagation, Étude De Cas: Orthogonality\nA Bayesian approach to type-specific conic fitting\nSemi-Supervised Learning with Context-Conditional Generative Adversarial  Networks\nTextBoxes: A Fast Text Detector with a Single Deep Neural Network\nThe subset-matched Jaccard index for evaluation of Segmentation for  Plant Images\nSublabel-Accurate Discretization of Nonconvex Free-Discontinuity  Problems\nCascaded Neural Networks with Selective Classifiers and its evaluation  using Lung X-ray CT Images\n3D Fully Convolutional Network for Vehicle Detection in Point Cloud\nDeep Watershed Transform for Instance Segmentation\nSemantic Segmentation using Adversarial Networks\nMultimodal Latent Variable Analysis\n3D Ultrasound image segmentation: A Survey\nMachine 
Learning for Dental Image Analysis\nEfficient Pose and Cell Segmentation using Column Generation\nBreast Mass Classification from Mammograms using Deep Convolutional  Neural Networks\nZero-Shot Learning posed as a Missing Data Problem\nA method for the segmentation of images based on thresholding and  applied to vesicular textures\nDeep Pyramidal Residual Networks with Separated Stochastic Depth\nStereo image de-fencing using smartphones\nImageNet pre-trained models with batch normalization\nDiverse Sampling for Self-Supervised Learning of Semantic Segmentation\nA series of maximum entropy upper bounds of the differential entropy\nAutoencoder-based holographic image restoration\nObservation of dynamics inside an unlabeled live cell using bright-field  photon microscopy: Evaluation of organelles' trajectories\nCompressive Image Recovery Using Recurrent Generative Model\nDesign of Image Matched Non-Separable Wavelet using Convolutional Neural  Network\nFast, Dense Feature SDM on an iPhone\nA Dual Ascent Framework for Lagrangean Decomposition of Combinatorial  Problems\nFeature Encoding in Band-limited Distributed Surveillance Systems\nLocal Sparse Approximation for Image Restoration with Adaptive Block  Size Selection\nA Statistical Approach to Continuous Self-Calibrating Eye Gaze Tracking  for Head-Mounted Virtual Reality Systems\nStochastic Multidimensional Scaling\nMeta-Unsupervised-Learning: A supervised approach to unsupervised  learning\nSuper-Resolution Reconstruction of Electrical Impedance Tomography  Images\nVid2speech: Speech Reconstruction from Silent Video\nAbnormal Event Detection in Videos using Spatiotemporal Autoencoder\nMap-guided Hyperspectral Image Superpixel Segmentation Using Proportion  Maps\nGreedy Search for Descriptive Spatial Face Features\nOn Classification of Distorted Images with Deep Convolutional Neural  Networks\nGreen-Blue Stripe Pattern for Range Sensing from a Single Image\nVisual Multiple-Object Tracking for Unknown Clutter 
Rate\nVisualizing Residual Networks\nSystematic study of color spaces and components for the segmentation of  sky/cloud images\nAnalysis of the noise in back-projection light field acquisition and its  optimization\nUsing Convolutional Neural Networks to Count Palm Trees in Satellite  Images\nLAREX - A semi-automatic open-source Tool for Layout Analysis and Region  Extraction on Early Printed Books\nSuper-resolution Using Constrained Deep Texture Synthesis\nFace Detection using Deep Learning: An Improved Faster RCNN Approach\nImageNet MPEG-7 Visual Descriptors - Technical Report\nSegmentation of optic disc, fovea and retinal vasculature using a single  convolutional neural network\nFast and easy blind deblurring using an inverse filter and PROBE\nRobust features for facial action recognition\nAn Integrated Simulator and Dataset that Combines Grasping and Vision  for Deep Learning\nAn Implementation of Faster RCNN with Study for Region Sampling\nAn Adversarial Regularisation for Semi-Supervised Training of Structured  Output Neural Networks\nEffective face landmark localization via single deep network\nMulti-Resolution Dual-Tree Wavelet Scattering Network for Signal  Classification\nVisual Discovery at Pinterest\nCityPersons: A Diverse Dataset for Pedestrian Detection\nDerivative Based Focal Plane Array Nonuniformity Correction\nLearning Compact Appearance Representation for Video-based Person  Re-Identification\nMimicking Ensemble Learning with Deep Branched Networks\nHow ConvNets model Non-linear Transformations\nUnifying local and non-local signal processing with graph CNNs\nOptimal rates of estimation for multi-reference alignment\nAn Extensive Technique to Detect and Analyze Melanoma: A Challenge at  the International Symposium on Biomedical Imaging (ISBI) 2017\nLabel Refinement Network for Coarse-to-Fine Semantic Segmentation\nIntroduction to Nonnegative Matrix Factorization\nTowards CNN Map Compression for camera relocalisation\nEstimating the resolution of 
real images\nSkin Lesion Classification using Class Activation Map\nGenerative Compression\nHigh-Resolution Multispectral Dataset for Semantic Segmentation\nPrior-based Hierarchical Segmentation Highlighting Structures of  Interest\nGait Pattern Recognition Using Accelerometers\nNeural method for Explicit Mapping of Quasi-curvature Locally Linear  Embedding in image retrieval\nCombining Residual Networks with LSTMs for Lipreading\nUsers prefer Guetzli JPEG over same-sized libjpeg\nFully Convolutional Networks to Detect Clinical Dermoscopic Features\nRandom Forests and VGG-NET: An Algorithm for the ISIC 2017 Skin Lesion  Classification Challenge\nNeural Networks for Beginners. A fast implementation in Matlab, Torch,  TensorFlow\nHyperspectral Unmixing with Endmember Variability using Semi-supervised  Partial Membership Latent Dirichlet Allocation\nRecurrent Models for Situation Recognition\nKnowledge distillation using unlabeled mismatched images\nIOD-CNN: Integrating Object Detection Networks for Event Recognition\nEpisode-Based Active Learning with Bayesian Neural Networks\nImportant New Developments in Arabographic Optical Character Recognition  (OCR)\nNovel Structured Low-rank algorithm to recover spatially smooth  exponential image time series\nSentiment Recognition in Egocentric Photostreams\nSAR image despeckling through convolutional neural networks\nClustering in Hilbert simplex geometry\nSpatiotemporal Networks for Video Emotion Recognition\nSoft-to-Hard Vector Quantization for End-to-End Learning Compressible  Representations\nEnhance Feature Discrimination for Unsupervised Hashing\nDetail-revealing Deep Video Super-resolution\nFast Learning and Prediction for Object Detection using Whitened CNN  Features\nSolving the L1 regularized least square problem via a box-constrained  smooth minimization\nUC Merced Submission to the ActivityNet Challenge 2016\nSaliency-guided Adaptive Seeding for Supervoxel Segmentation\nCamera Calibration by Global Constraints on 
the Motion of Silhouettes\nA Comment on \"Analysis of Video Image Sequences Using Point and Line  Correspondences\"\nOn Measuring Bias in Online Information\nDerivation of the Asymptotic Eigenvalue Distribution for Causal 2D-AR  Models under Upscaling\nDeep Occlusion Reasoning for Multi-Camera Multi-Target Detection\nExploring epoch-dependent stochastic residual networks\nPanorama to panorama matching for location recognition\nA Dual Sparse Decomposition Method for Image Denoising\nSpeeding up Convolutional Neural Networks By Exploiting the Sparsity of  Rectifier Units\nFull-Page Text Recognition: Learning Where to Start and When to Stop\nPartially Occluded Leaf Recognition via Subgraph Matching and Energy  Optimization\nOutline Colorization through Tandem Adversarial Networks\nSingle image depth estimation by dilated deep residual convolutional  neural network and soft-weight-sum inference\nOffline Handwritten Recognition of Malayalam District Name - A Holistic  Approach\nStatistical learning of rational wavelet transform for natural images\nGenerative Convolutional Networks for Latent Fingerprint Reconstruction\nRecurrent Soft Attention Model for Common Object Recognition\nDiving Performance Assessment by means of Video Processing\nSkin lesion detection based on an ensemble of deep convolutional neural  network\nCollaborative Descriptors: Convolutional Maps for Preprocessing\nObstacle Avoidance Using Stereo Camera\nProbabilistic Image Colorization\nConvolutional Sparse Representations with Gradient Penalties\nUsing Satellite Imagery for Good: Detecting Communities in Desert and  Mapping Vaccination Activities\nTraX: The visual Tracking eXchange Protocol and Library\nAutomated Robotic Monitoring and Inspection of Steel Structures and  Bridges\nA Closed-Form Model for Image-Based Distant Lighting\nA Correspondence Relaxation Approach for 3D Shape Reconstruction\nJoint Geometrical and Statistical Alignment for Visual Domain Adaptation\nAutomated Body Structure 
Extraction from Arbitrary 3D Mesh\nIntel RealSense Stereoscopic Depth Cameras\nResearch on Bi-mode Biometrics Based on Deep Learning\nStatic Gesture Recognition using Leap Motion\nWhat's In A Patch, I: Tensors, Differential Geometry and Statistical  Shading Analysis\nWhat's In A Patch, II: Visualizing generic surfaces\nProbabilistic Combination of Noisy Points and Planes for RGB-D Odometry\nMUTAN: Multimodal Tucker Fusion for Visual Question Answering\nWhat are the Receptive, Effective Receptive, and Projective Fields of  Neurons in Convolutional Neural Networks?\nAdversarial Examples Are Not Easily Detected: Bypassing Ten Detection  Methods\nCrossNets : A New Approach to Complex Learning\nShake-Shake regularization\nView-Invariant Recognition of Action Style Self-Dissimilarity\nConvolutional Networks with MuxOut Layers as Multi-rate Systems for  Image Upscaling\nOn the mathematics of beauty: beautiful images\nResidual Expansion Algorithm: Fast and Effective Optimization for  Nonconvex Least Squares Problems\nUnsupervised Learning of Disentangled Representations from Video\nBridge Simulation and Metric Estimation on Landmark Manifolds\nFace R-CNN\nA Kind of Affine Weighted Moment Invariants\nMobile vs. 
point guards\nStyle Transfer for Anime Sketches with Enhanced Residual U-net and  Auxiliary Classifier GAN\nShape-Color Differential Moment Invariants under Affine Transformations\nAn Overview of Multi-Task Learning in Deep Neural Networks\nUsing Deep Networks for Drone Detection\nUser-driven mobile robot storyboarding: Learning image interest and  saliency from pairwise image comparisons\nPedestrian Prediction by Planning using Deep Neural Networks\nA Novel VHR Image Change Detection Algorithm Based on Image Fusion and  Fuzzy C-Means Clustering\nCoupled Support Vector Machines for Supervised Domain Adaptation\nIrregular Convolutional Neural Networks\nRobust Lane Tracking with Multi-mode Observation Model and Particle  Filtering\nRetinal Vessel Segmentation in Fundoscopic Images with Generative  Adversarial Networks\nOnline Convolutional Dictionary Learning\nChatbots as Conversational Recommender Systems in Urban Contexts\nPersistence Diagrams with Linear Machine Learning Models\nJoint Pose and Principal Curvature Refinement Using Quadrics\nHigh-Quality Face Image SR Using Conditional Generative Adversarial  Networks\nGenerative Adversarial Models for People Attribute Recognition in  Surveillance\nEffective Approaches to Batch Parallelization for Dynamic Neural Network  Architectures\nIdentity Alignment by Noisy Pixel Removal\nAdaptive Binarization for Weakly Supervised Affordance Segmentation\nOn Study of the Reliable Fully Convolutional Networks with Tree Arranged  Outputs (TAO-FCN) for Handwritten String Recognition\nOptical Mapping Near-eye Three-dimensional Display with Correct Focus  Cues\nCultivating DNN Diversity for Large Scale Video Labelling\nMake Your Bone Great Again : A study on Osteoporosis Classification\nExploiting Convolutional Representations for Multiscale Human Settlement  Detection\nPictures of Combinatorial Cubes\nConvolutional Sparse Coding: Boundary Handling Revisited\nAutomatic breast cancer grading in lymph nodes using a deep neural  
network\nA Jointly Learned Deep Architecture for Facial Attribute Analysis and  Face Detection in the Wild\nUnderstanding Aesthetics in Photography using Deep Convolutional Neural  Networks\nUnsupervised Visual Attribute Transfer with Reconfigurable Generative  Adversarial Networks\nSuperposition de calques monochromes d'opacités variables\nCorrection of \"Cloud Removal By Fusing Multi-Source and Multi-Temporal  Images\"\nDeep Generative Adversarial Neural Networks for Realistic Prostate  Lesion MRI Synthesis\nLearning to Hallucinate Face Images via Component Generation and  Enhancement\nInception Score, Label Smoothing, Gradient Vanishing and -log(D(x))  Alternative\nSurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis\nIsointense infant brain MRI segmentation with a dilated convolutional  neural network\nAnveshak - A Groundtruth Generation Tool for Foreground Regions of  Document Images\nAnalysis of Convolutional Neural Networks for Document Image  Classification\nDocument Image Binarization with Fully Convolutional Neural Networks\nSystematic Testing of Convolutional Neural Networks for Autonomous  Driving\nUnsupervised Incremental Learning of Deep Descriptors From Video Streams\nDeep Incremental Boosting\nAugmentor: An Image Augmentation Library for Machine Learning\nColorimetric Calibration of a Digital Camera\nAn Improved Neural Segmentation Method Based on U-NET\nBrain Abnormality Detection by Deep Convolutional Neural Network\nLearning a Multi-View Stereo Machine\nDeformable Modeling for Human Body Acquired from Depth Sensors\ne-Counterfeit: a mobile-server platform for document counterfeit  detection\nReflection Separation and Deblurring of Plenoptic Images\nShape Registration with Directional Data\nAn Optimized Union-Find Algorithm for Connected Components Labeling  Using GPUs\nDeepPrior++: Improving Fast and Accurate 3D Hand Pose Estimation\nDeep Learning for Medical Image Analysis\nDual-fisheye lens stitching for 360-degree imaging\nA 
simple expression for the map of Asplund's distances with the  multiplicative Logarithmic Image Processing (LIP) law\nPix2face: Direct 3D Face Model Estimation\nAdversarial nets with perceptual losses for text-to-image synthesis\nAssessing verticalization effects on urban safety perception\nLensless-camera based machine learning for image classification\nLearning Loss for Knowledge Distillation with Conditional Adversarial  Networks\nGaussian Filter in CRF Based Semantic Segmentation\nSushi Dish - Object detection and classification from real images\nDataset Augmentation with Synthetic Images Improves Semantic  Segmentation\nA Reproducible Study on Remote Heart Rate Measurement\nFeasibility of Corneal Imaging for Handheld Augmented Reality\nTowards Around-Device Interaction using Corneal Imaging\nA Comparison on Audio Signal Preprocessing Methods for Deep Neural  Networks on Music Tagging\nLearning Dilation Factors for Semantic Segmentation of Street Scenes\nA Survey of Efficient Regression of General-Activity Human Poses from  Depth Images\nA Geometric Approach to Harmonic Color Palette Design\nDetecting Hands in Egocentric Videos: Towards Action Recognition\nRobust Sparse Coding via Self-Paced Learning\nAn Iterative Regression Approach for Face Pose Estimation from RGB  Images\nGeneric Sketch-Based Retrieval Learned without Drawing a Single Sketch\nAsian Stamps Identification and Classification System\nCorrelating Satellite Cloud Cover with Sky Cameras\nSalNet360: Saliency Maps for omni-directional images with CNN\nSemi-Automated Nasal PAP Mask Sizing using Facial Photographs\nA First Derivative Potts Model for Segmentation and Denoising Using ILP\nSmart Mirror: Intelligent Makeup Recommendation and Synthesis\nMeasurement of amplitude of the moiré patterns in digital  autostereoscopic 3D display\nRealizing Half-Diminished Reality from Video Stream of Manipulating  Objects\nLADAR-Based Mover Detection from Moving Vehicles\nFast Vehicle Detection in Aerial 
Imagery\nImage similarity using Deep CNN and Curriculum Learning\nGenerative Adversarial Networks with Inverse Transformation Unit\nLeveraging Weakly Annotated Data for Fashion Image Retrieval and Label  Prediction\nA New Multifocus Image Fusion Method Using Contourlet Transform\nScale Adaptive Clustering of Multiple Structures\nExact Camera Location Recovery by Least Unsquared Deviations\nA Variational Approach to Shape-from-shading Under Natural Illumination\nDecoding visemes: improving machine lipreading\nRobust non-local means filter for ultrasound image denoising\nVariational Grid Setting Network\nGaussian Three-Dimensional kernel SVM for Edge Detection Applications\nIQ of Neural Networks\nEnergy-Based Spherical Sparse Coding\nVideo Denoising and Enhancement via Dynamic Video Layering\nA Multiscale Patch Based Convolutional Network for Brain Tumor  Segmentation\nHuman Pose Regression by Combining Indirect Part Detection and  Contextual Information\nDoes Normalization Methods Play a Role for Hyperspectral Image  Classification?\nA Sequential Thinning Algorithm For Multi-Dimensional Binary Patterns\nAutomatic Streaming Segmentation of Stereo Video Using Bilateral Space\nTowards In-Transit Analytics for Industry 4.0\nHardware design for binarization and thinning of fingerprint images\nFace Transfer with Generative Adversarial Network\nThe Robust Reading Competition Annotation and Evaluation Platform\nUsing Deep Convolutional Networks for Gesture Recognition in American  Sign Language\nUnsupervised Object Discovery and Segmentation of RGBD-images\nLight-weight place recognition and loop detection using road markings\nRethinking Convolutional Semantic Segmentation Learning\nNeural Stain-Style Transfer Learning using GAN for Histopathological  Images\nMulti-modal Aggregation for Video Classification\nStochastic variance reduced multiplicative update for nonnegative matrix  factorization\nLearning Graph Convolution Filters from Data Manifold\nRandom Subspace 
Two-dimensional LDA for Face Recognition\nImage Captioning and Classification of Dangerous Situations\nIdentifying Rings in IFU Surveys\nFrangi-Net: A Neural Network Approach to Vessel Segmentation\nHand Gesture Recognition with Leap Motion\nVertebral body segmentation with GrowCut: Initial experience, workflow  and practical application\nConvolutional neural networks pretrained on large face recognition  datasets for emotion classification from video\nA Public Image Database for Benchmark of Plant Seedling Classification  Algorithms\nHidden Markov Random Field Iterative Closest Point\nA Forward-Backward Approach for Visualizing Information Flow in Deep  Networks\nSegmenting Brain Tumors with Symmetry\nBlock-Cyclic Stochastic Coordinate Descent for Deep Neural Networks\nConvergent Block Coordinate Descent for Training Tikhonov Regularized  Deep Neural Networks\nMulti-Image Semantic Matching by Mining Consistent Features\nA Face Fairness Framework for 3D Meshes\nIntegral Human Pose Regression\nCost-Effective Active Learning for Melanoma Segmentation\nJoint Cuts and Matching of Partitions in One Graph\nSSD-6D: Making RGB-based 3D detection and 6D pose estimation great again\nLearning Channel Inter-dependencies at Multiple Scales on Dense Networks  for Face Recognition\nRevisiting hand-crafted feature for action recognition: a set of  improved dense trajectories\nHighlighting objects of interest in an image by integrating saliency and  depth\nTighter Lifting-Free Convex Relaxations for Quadratic Matching Problems\nPredicting Depression Severity by Multi-Modal Feature Engineering and  Fusion\nHigh Dynamic Range Imaging Technology\nTowards High Performance Video Object Detection\nSpatial PixelCNN: Generating Images from Patches\nAutomatic Recognition of Coal and Gangue based on Convolution Neural  Network\nContext Augmentation for Convolutional Neural Networks\nFactoring Shape, Pose, and Layout from the 2D Image of a 3D Scene\nPopulation-based Respiratory 4D Motion 
Atlas Construction and its  Application for VR Simulations of Liver Punctures\nOnline and Batch Supervised Background Estimation via L1 Regression\nBurst Denoising with Kernel Prediction Networks\nShape from Shading through Shape Evolution\nNoise Level Estimation for Overcomplete Dictionary Learning Based on  Tight Asymptotic Bounds\nDeep Koalarization: Image Colorization using CNNs and  Inception-ResNet-v2\nCycleGAN Face-off\nTraining Ensembles to Detect Adversarial Examples\nReview. Machine learning techniques for traffic sign detection\nObject Classification using Ensemble of Local and Deep Features\nLearning Low-shot facial representations via 2D warping\nMulti-appearance Segmentation and Extended 0-1 Program for Dense Small  Object Tracking\nAI2-THOR: An Interactive 3D Environment for Visual AI\nBipartite Graph Matching for Keyframe Summary Evaluation\nEncoding CNN Activations for Writer Recognition\nA Bidirectional Adaptive Bandwidth Mean Shift Strategy for Clustering\nTexture Synthesis with Recurrent Variational Auto-Encoder\nMulti-Target, Multi-Camera Tracking by Hierarchical Clustering: Recent  Progress on DukeMTMC Project\nAdversarial Patch\nExtrapolating Expected Accuracies for Large Multi-Class Problems\nSky detection and log illumination refinement for PDE-based hazy image  contrast enhancement\nA PDE-based log-agnostic illumination correction algorithm\nNeural Networks in Adversarial Setting and Ill-Conditioned Weight Space\nLoopSmart: Smart Visual SLAM Through Surface Loop Closure\nDetection and segmentation of the Left Ventricle in Cardiac MRI using  Deep Learning\nLaVAN: Localized and Visible Adversarial Noise\nEnd-to-end detection-segmentation network with ROI convolution\nAn overview of deep learning based methods for unsupervised and  semi-supervised anomaly detection in videos\nComparative Study on Generative Adversarial Networks\nSemi-supervised Fisher vector network\nGeneralizing, Decoding, and Optimizing Support Vector Machine  
Classification\nLearning Deep Features for One-Class Classification\nImage Captioning using Deep Neural Architectures\nLight-weight pixel context encoders for image inpainting\nFood recognition and recipe analysis: integrating visual content,  context and external knowledge\nSpatial Temporal Graph Convolutional Networks for Skeleton-Based Action  Recognition\nPersonalized Human Activity Recognition Using Convolutional Neural  Networks\nAbnormal Heartbeat Detection Using Recurrent Neural Networks\nUsing Deep Autoencoders for Facial Expression Recognition\nTowards an Understanding of Neural Networks in Natural-Image Spaces\nHistogram of Oriented Depth Gradients for Action Recognition\nE2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene  Text\nLearning Video-Story Composition via Recurrent Neural Network\nWeighted Nonlocal Total Variation in Image Processing\nPerceptual Compressive Sensing\nConvolutional neural network-based regression for depth prediction in  digital holography\nLearning Attribute Representation for Human Activity Recognition\nMulti-task Learning for Continuous Control\nClassSim: Similarity between Classes Defined by Misclassification Ratios  of Trained Classifiers\nRollable Latent Space for SAR Target Recognition of Un-seen Views\nStructural Recurrent Neural Network (SRNN) for Group Activity Analysis\nFace Detection Using Improved Faster RCNN\nA Multiresolution Deep Learning Framework for Automated Annotation of  Reflectance Confocal Microscopy Images\nShakeDrop regularization\nTexture Segmentation Based Video Compression Using Convolutional Neural  Networks\nConvolutional Hashing for Automated Scene Matching\nGenerative ScatterNet Hybrid Deep Learning (G-SHDL) Network with  Structural Priors for Semantic Image Segmentation\nCombinets: Learning New Classifiers via Recombination\nOn the Universal Approximability of Quantized ReLU Neural Networks\nFooling OCR Systems with Adversarial Text Images\nSpectral Normalization for Generative 
Adversarial Networks\nFast, Trainable, Multiscale Denoising\nA new foundational crisis in mathematics, is it really happening?\nRobustness of Rotation-Equivariant Networks to Adversarial Perturbations\nA survey on trajectory clustering analysis\nStochastic Video Generation with a Learned Prior\nImproved Techniques For Weakly-Supervised Object Localization\nChatPainter: Improving Text to Image Generation using Dialogue\nAdversarial vulnerability for any classifier\nEdge-Based Recognition of Novel Objects for Robotic Grasping\nSemantic segmentation of trajectories with agent models\nGenerating High Quality Visible Images from SAR Images Using CNNs\nUsing Deep Learning for Segmentation and Counting within Microscopy Data\nPoisson Image Denoising Using Best Linear Prediction: A Post-processing  Framework\nKnowledge Transfer with Jacobian Matching\nContained Neural Style Transfer for Decorated Logo Generation\nProtecting JPEG Images Against Adversarial Attacks\nCategorical Mixture Models on VGGNet activations\nRevisiting Decomposable Submodular Function Minimization with Incidence  Relations\nNoise2Noise: Learning Image Restoration without Clean Data\nidtracker.ai: Tracking all individuals in large collectives of unmarked  animals\nDiscriminability objective for training descriptive captions\nTarget Driven Instance Detection\nPrincipal Component Analysis with Tensor Train Subspace\nUsing accumulation to optimize deep residual neural nets\nTemporal Human Action Segmentation via Dynamic Clustering\nVarying k-Lipschitz Constraint for Generative Adversarial Networks\nActivity Detection with Latent Sub-event Hierarchy Learning\nLearning Region Features for Object Detection\nFace Recognition Techniques: A Survey\nSemi-Blind Spatially-Variant Deconvolution in Optical Microscopy with  Local Point Spread Function Estimation By Use Of Convolutional Neural  Networks\nSpeech-Driven Facial Reenactment Using Conditional Generative  Adversarial Networks\nNon-rigid 3D Shape 
Registration using an Adaptive Template\nGuided Image Inpainting: Replacing an Image Region by Pulling Content  from Another Image\nAligning Across Large Gaps in Time\nWhat Do We Understand About Convolutional Networks?\nDeep Convolutional Compressed Sensing for LiDAR Depth Completion\nIterative Low-Rank Approximation for CNN Compression\nDesign of a PCIe Interface Card Control Software Based on WDF\nUnsupervised Domain Adaptation: A Multi-task Learning-based Method\nSemantic See-Through Rendering on Light Fields\nUnsupervised Learning and Segmentation of Complex Activities from Video\nWeakening the Detecting Capability of CNN-based Steganalysis\nTwo-Stream Neural Networks for Tampered Face Detection\nSnap Angle Prediction for 360$^{\\circ}$ Panorama\nLearnable Image Encryption\nA Neuronal Planar Modeling for Handwriting Signature based on Automatic  Segmentation\nMultimodal Biometric Authentication Using Choquet Integral and Genetic  Algorithm\nClosed-form detector for solid sub-pixel targets in multivariate  t-distributed background clutter\nTelepresence System based on Simulated Holographic Display\nSupervised Convolutional Sparse Coding\nYOLOv3: An Incremental Improvement\nAMNet: Memorability Estimation with Attention\nThe Monge-Kantorovich Optimal Transport Distance for Image Comparison\nTwo Stream 3D Semantic Scene Completion\nClassification of Point Cloud Scenes with Multiscale Voxel Deep Network\nMGGAN: Solving Mode Collapse using Manifold Guided Training\nExtraction of Airways using Graph Neural Networks\nSeed-Point Based Geometric Partitioning of Nuclei Clumps\nAn efficient CNN for spectral reconstruction from RGB images\nInterCloud: Utility-Oriented Federation of Cloud Computing Environments  for Scaling of Application Services\nParcellation of fMRI Datasets with ICA and PLS-A Data Driven Approach\nA Taxonomy and Survey of Energy-Efficient Data Centers and Cloud  Computing Systems\nAnalytical Cost Metrics : Days of Future Past\nAnalysis and approximation 
of some Shape-from-Shading models for  non-Lambertian surfaces\nImage Segmentation for Fruit Detection and Yield Estimation in Apple  Orchards\nMaking the V in VQA Matter: Elevating the Role of Image Understanding in  Visual Question Answering\nImage De-raining Using a Conditional Generative Adversarial Network\nUsing Deep Learning and Google Street View to Estimate the Demographic  Makeup of the US\nGlobal Minimum for a Finsler Elastica Minimal Path Approach\nVirtual Kathakali : Gesture Driven Metamorphosis\nNeural Architectures for Robot Intelligence\nCleaning the USNO-B Catalog through automatic detection of optical  artifacts\nThe source coding game with a cheating switcher\nAugmenting Light Field to model Wave Optics effects\nSVM-based Multiview Face Recognition by Generalization of Discriminant  Analysis\nFace Recognition by Fusion of Local and Global Matching Scores using DS  Theory: An Evaluation with Uni-classifier and Multi-classifier Paradigm\nA stochastic model of human visual attention with a dynamic Bayesian  network\nFuzzy Logic of Speed and Steering Control System for Three Dimensional  Line Following of an Autonomous Vehicle\nA Peer-to-Peer Middleware Framework for Resilient Persistent Programming\nCombinatorial Continuous Maximal Flows\nHybrid Linear Modeling via Local Best-fit Flats\nIntroduction to the Bag of Features Paradigm for Image Classification  and Retrieval\nA Panorama on Multiscale Geometric Representations, Intertwining  Spatial, Directional and Frequency Selectivity\nBlock-Sparse Recovery via Convex Optimization\nA bio-inspired image coder with temporal scalability\nRobust Mobile Object Tracking Based on Multiple Feature Similarity and  Trajectory Filtering\nDistributed Multi-view Matching in Networks with Limited Communications\nA Theory for Optical flow-based Transport on Image Manifolds\nDistributed Representation of Geometrically Correlated Images with  Compressed Linear Measurements\nAutonomous Cleaning of Corrupted Scanned 
Documents - A Generative  Modeling Approach\nGeneralized Principal Component Analysis (GPCA)\nFixed-Rank Representation for Unsupervised Visual Learning\nImprovement of ISOM by using filter\nKinects and Human Kinetics: A New Approach for Studying Crowd Behavior\nA Hash based Approach for Secure Keyless Steganography in Lossless RGB  Images\nComparison of Fuzzy and Neuro Fuzzy Image Fusion Techniques and its  Applications\nA Multi-View Embedding Space for Modeling Internet Images, Tags, and  their Semantics\nEfficient Multiple Object Tracking Using Mutually Repulsive Active  Membranes\nA Model of OpenEHR Based Electronic Medical Record In Indonesia\nKernel Sparse Models for Automated Tumor Segmentation\nImproved Foreground Detection via Block-based Classifier Cascade with  Probabilistic Decision Integration\nSeparable Dictionary Learning\nSemantic Context Forests for Learning-Based Knee Cartilage Segmentation  in 3D MR Images\nNonmyopic View Planning for Active Object Detection\nEfficient pedestrian detection by directly optimize the partial area  under the ROC curve\nDetermining Leishmania Infection Levels by Automatic Analysis of  Microscopy Images\nHuman Face Recognition using Gabor based Kernel Entropy Component  Analysis\nAssociative embeddings for large-scale knowledge transfer with  self-assessment\nAutomatic Detection of Calibration Grids in Time-of-Flight Images\nExtrinsic Methods for Coding and Dictionary Learning on Grassmann  Manifolds\nMatching Image Sets via Adaptive Multi Convex Hull\nRandom Projections on Manifolds of Symmetric Positive Definite Matrices  for Image Classification\nA Tiered Move-making Algorithm for General Non-submodular Pairwise  Energies\nROML: A Robust Feature Correspondence Approach for Matching Objects in A  Set of Images\nA Comparative Study of Modern Inference Techniques for Structured  Discrete Energy Minimization Problems\nFast Supervised Hashing with Decision Trees for High-Dimensional Data\nCascades of Regression Tree 
Fields for Image Restoration\nRobust and Efficient Subspace Segmentation via Least Squares Regression\nCircle detection using Discrete Differential Evolution Optimization\nLearning Rich Features from RGB-D Images for Object Detection and  Segmentation\nVideo Face Editing Using Temporal-Spatial-Smooth Warping\nRobust Statistical Approach for Extraction of Moving Human Silhouettes  from Videos\nHierarchical Adaptive Structural SVM for Domain Adaptation\nA Multi-Plane Block-Coordinate Frank-Wolfe Algorithm for Training  Structural SVMs with a Costly max-Oracle\nReconstructive Sparse Code Transfer for Contour Detection and Semantic  Labeling\nLearning to Rank Binary Codes\nTowards Scene Understanding with Detailed 3D Object Representations\nDeep Learning Face Attributes in the Wild\nCorrelation Adaptive Subspace Segmentation by Trace Lasso\nDetail-preserving and Content-aware Variational Multi-view Stereo  Reconstruction\nMulti-view Convolutional Neural Networks for 3D Shape Recognition\nA Vision Based System for Monitoring the Loss of Attention in Automotive  Drivers\nPredicting Important Objects for Egocentric Video Summarization\nHigh Performance Offline Handwritten Chinese Character Recognition Using  GoogLeNet and Directional Feature Maps\nRiemannian Gaussian Distributions on the Space of Symmetric Positive  Definite Matrices\nMulti-Face Tracking by Extended Bag-of-Tracklets in Egocentric Videos\nAn End-to-End Trainable Neural Network for Image-based Sequence  Recognition and Its Application to Scene Text Recognition\nDeep Networks for Image Super-Resolution with Sparse Prior\nBetter Exploiting OS-CNNs for Better Event Recognition in Images\nCoherent Motion Segmentation in Moving Camera Videos using Optical Flow  Orientations\nBasic Level Categorization Facilitates Visual Object Recognition\nLearning Articulated Motion Models from Visual and Lingual Signals\nCompositional Memory for Visual Question Answering\nWhat Objective Does Self-paced Learning Indeed 
Optimize?\nLearning to Generate Images with Perceptual Similarity Metrics\nDeep Manifold Traversal: Changing Labels with Convolutional Features\nSuperpixel Convolutional Networks using Bilateral Inceptions\nAsk Me Anything: Free-form Visual Question Answering Based on Knowledge  from External Sources\nNetVLAD: CNN architecture for weakly supervised place recognition\nVoronoi Region-Based Adaptive Unsupervised Color Image Segmentation\nWriter-independent Feature Learning for Offline Signature Verification  using Deep Convolutional Neural Networks\nIntegrating Local Material Recognition with Large-Scale Perceptual  Attribute Discovery\nImage segmentation of cross-country scenes captured in IR spectrum\nUsing Deep Learning for Image-Based Plant Disease Detection\nLearning Visual Storylines with Skipping Recurrent Neural Networks\nThe THUMOS Challenge on Action Recognition for Videos \"in the Wild\"\nAn information theoretic formulation of the Dictionary Learning and  Sparse Coding Problems on Statistical Manifolds\n$\\ell_p$-Box ADMM: A Versatile Framework for Integer Programming\nUnderstanding Human-Centric Images: From Geometry to Fashion\nSANTIAGO: Spine Association for Neuron Topology Improvement and Graph  Optimization\nHighly Efficient Compact Pose SLAM with SLAM++\nRefining Geometry from Depth Sensors using IR Shading Images\nA geometric analysis of subspace clustering with outliers\nPatch-based Probabilistic Image Quality Assessment for Face Selection  and Improved Video-based Face Recognition\nIntegration of spatio-temporal contrast sensitivity with a multi-slice  channelized Hotelling observer\nRotational Projection Statistics for 3D Local Surface Description and  Object Recognition\nMultispectral Palmprint Encoding and Recognition\nSeeing the Big Picture: Deep Embedding with Contextual Evidences\n3D ShapeNets: A Deep Representation for Volumetric Shapes\nImageNet Large Scale Visual Recognition Challenge\nStructured Low-Rank Matrix Factorization with Missing 
and Grossly  Corrupted Observations\nEvaluation of Output Embeddings for Fine-Grained Image Classification\n3D Hand Pose Detection in Egocentric RGB-D Images\nBrachiaria species identification using imaging techniques based on  fractal descriptors\nLearning Contour-Fragment-based Shape Model with And-Or Tree  Representation\nIncorporating Structural Alternatives and Sharing into Hierarchy for  Multiclass Object Recognition and Detection\nSimulation of Color Blindness and a Proposal for Using Google Glass as  Color-correcting Tool\nNEFI: Network Extraction From Images\nRecognizing Fine-Grained and Composite Activities using Hand-Centric  Features and Script Data\nDeepTrack: Learning Discriminative Feature Representations Online for  Robust Visual Tracking\nConvex Optimization for Parallel Energy Minimization\nLifting Object Detection Datasets into 3D\nConvolutional Neural Network-Based Image Representation for Visual Loop  Closure Detection\nThe adaptable buffer algorithm for high quantile estimation in  non-stationary data streams\nCapturing Hands in Action using Discriminative Salient Points and  Physics Simulation\nDeepStereo: Learning to Predict New Views from the World's Imagery\nKernel Cuts: MRF meets Kernel & Spectral Clustering\nUnsupervised Learning from Narrated Instruction Videos\nRecurrent Network Models for Human Dynamics\nOwl and Lizard: Patterns of Head Pose and Eye Pose in Driver Gaze  Classification\nLearning Analysis-by-Synthesis for 6D Pose Estimation in RGB-D Images\nLearning Sampling Distributions for Efficient Object Detection\nUnsupervised Cross-Domain Recognition by Identifying Compact Joint  Subspaces\nDeep Attributes from Context-Aware Regional Neural Codes\nDeepSat - A Learning framework for Satellite Imagery\nLEWIS: Latent Embeddings for Word Images and their Semantics\nVisualizing Deep Convolutional Neural Networks Using Natural Pre-Images\nTruncated Max-of-Convex Models\nImproving Facial Analysis and Performance Driven Animation through 
 Disentangling Identity and Expression\nCrater Detection via Convolutional Neural Networks\nLow-rank Matrix Factorization under General Mixture Noise Distributions\nEnhancing Energy Minimization Framework for Scene Text Recognition with  Top-Down Cues\nFactors in Finetuning Deep Model for object detection\nReconNet: Non-Iterative Reconstruction of Images from Compressively  Sensed Random Measurements\nSelf-Transfer Learning for Fully Weakly Supervised Object Localization\nAutomatic Face Reenactment\nFace Attribute Prediction Using Off-the-Shelf CNN Features\nPlaNet - Photo Geolocation with Convolutional Neural Networks\nAugur: Mining Human Behaviors from Fiction to Power Interactive Systems\nExploring the coevolution of predator and prey morphology and behavior\nSSSC-AM: A Unified Framework for Video Co-Segmentation by Structured  Sparse Subspace Clustering with Appearance and Motion Features\nRevisiting Batch Normalization For Practical Domain Adaptation\nDeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene  Understanding\nLearning to Navigate the Energy Landscape\nFractal Dimension Invariant Filtering and Its CNN-based Implementation\nBreakingNews: Article Annotation by Image and Text Processing\n3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions\nPerceptually Consistent Color-to-Gray Image Conversion\nViZDoom: A Doom-based AI Research Platform for Visual Reinforcement  Learning\nBuilding a Large Scale Dataset for Image Emotion Recognition: The Fine  Print and The Benchmark\nNeural Dataset Generality\nAutomatic Image Annotation via Label Transfer in the Semantic Space\nViziometrics: Analyzing Visual Information in the Scientific Literature\nDual Local-Global Contextual Pathways for Recognition in Aerial Imagery\nSiamese Instance Search for Tracking\nEventNet Version 1.1 Technical Report\nA Sparse Representation of Complete Local Binary Pattern Histogram for  Human Face Recognition\nLongitudinal Face Modeling via Temporal Deep 
Restricted Boltzmann  Machines\nApparent Age Estimation Using Ensemble of Deep Learning Models\nA constrained clustering based approach for matching a collection of  feature sets\nMax-Margin Feature Selection\nFVQA: Fact-based Visual Question Answering\nRevisiting Visual Question Answering Baselines\nScene Text Detection via Holistic, Multi-Channel Prediction\nSteering a Predator Robot using a Mixed Frame/Event-Driven Convolutional  Neural Network\nTubelets: Unsupervised action proposals from spatiotemporal super-voxels\nPiecewise convexity of artificial neural networks\nImproved Deep Learning of Object Category using Pose Information\nVisual Question Answering: A Survey of Methods and Datasets\nMulti-Camera Action Dataset for Cross-Camera Action Recognition  Benchmarking\nEgo2Top: Matching Viewers in Egocentric and Top-view Videos\nDeep Learning Human Mind for Automated Visual Classification\nAutomatic Visual Theme Discovery from Joint Image and Text Corpora\nPoisson Noise Reduction with Higher-order Natural Image Prior Model\nAutomated Visual Fin Identification of Individual Great White Sharks\nFrom Facial Expression Recognition to Interpersonal Relation Prediction\nVote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient  Convolutional Neural Networks\nPose-Selective Max Pooling for Measuring Similarity\nEnd-to-end Concept Word Detection for Video Captioning, Retrieval, and  Question Answering\nMulti-Task Curriculum Transfer Deep Learning of Clothing Attributes\nA Harmonic Mean Linear Discriminant Analysis for Robust Image  Classification\nLocation Sensitive Deep Convolutional Neural Networks for Segmentation  of White Matter Hyperintensities\nA Robust 3D-2D Interactive Tool for Scene Segmentation and Annotation\nModular Deep Q Networks for Sim-to-real Transfer of Visuo-motor Policies\nLaplacian regularized low rank subspace clustering\nLocal Similarity-Aware Deep Feature Embedding\nMCMC Shape Sampling for Image Segmentation with Nonparametric Shape  
Priors\nLearning Detailed Face Reconstruction from a Single Image\nDeep Action- and Context-Aware Sequence Learning for Activity  Recognition and Anticipation\nMultimodal Memory Modelling for Video Captioning\nDSAC - Differentiable RANSAC for Camera Localization\nThe Freiburg Groceries Dataset\nFinding Mirror Symmetry via Registration\nImproving training of deep neural networks via Singular Value Bounding\nSelf-Supervised Video Representation Learning With Odd-One-Out Networks\nTowards Robust Deep Neural Networks with BANG\nFully Convolutional Crowd Counting On Highly Congested Scenes\n3D Bounding Box Estimation Using Deep Learning and Geometry\nUnsupervised Human Action Detection by Action Matching\nStackGAN: Text to Photo-realistic Image Synthesis with Stacked  Generative Adversarial Networks\nMultiple Instance Learning: A Survey of Problem Characteristics and  Applications\nA Video-Based Method for Objectively Rating Ataxia\nDefining the Pose of any 3D Rigid Object and an Associated Distance\nScale Coding Bag of Deep Features for Human Attribute and Action  Recognition\nObjective Micro-Facial Movement Detection Using FACS-Based Regions and  Baseline Evaluation\nPhysically-Based Rendering for Indoor Scene Understanding Using  Convolutional Neural Networks\nDARN: a Deep Adversial Residual Network for Intrinsic Image  Decomposition\nLearning Non-Lambertian Object Intrinsics across ShapeNet Categories\nSymbolic Representation and Classification of Logos\nFeedback Networks\nAENet: Learning Deep Audio Features for Video Analysis\nUrban Scene Segmentation with Laser-Constrained CRFs\nAn OpenCL(TM) Deep Learning Accelerator on Arria 10\nAction Recognition: From Static Datasets to Moving Robots\nDeep Learning Features at Scale for Visual Place Recognition\nA Survey of Structure from Motion\nPixel-wise Ear Detection with Convolutional Encoder-Decoder Networks\nAutomating Image Analysis by Annotating Landmarks with Deep Neural  Networks\nA Fast and Compact Saliency Score 
Regression Network Based on Fully  Convolutional Network\nYouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set  for Object Detection in Video\nReconstruction-Based Disentanglement for Pose-invariant Face Recognition\nSparse Representation based Multi-sensor Image Fusion: A Review\nOne-Step Time-Dependent Future Video Frame Prediction with a  Convolutional Encoder-Decoder Neural Network\nEnhanced Facial Recognition Framework based on Skin Tone and False Alarm  Rejection\nLearning Spatial Regularization with Image-level Supervisions for  Multi-label Image Classification\nScene Recognition by Combining Local and Global Image Descriptors\nDeep representation learning for human motion prediction and  classification\nSynthesizing Training Data for Object Detection in Indoor Scenes\nVisual Translation Embedding Network for Visual Relation Detection\nAge Progression/Regression by Conditional Adversarial Autoencoder\nMIML-FCN+: Multi-instance Multi-label Learning via Fully Convolutional  Networks with Privileged Information\nCDC: Convolutional-De-Convolutional Networks for Precise Temporal Action  Localization in Untrimmed Videos\nShape DNA: Basic Generating Functions for Geometric Moment Invariants\nNoScope: Optimizing Neural Network Queries over Video at Scale\nA Hybrid Deep Learning Architecture for Privacy-Preserving Mobile  Analytics\nDevelopment of An Android Application for Object Detection Based on  Color, Shape, or Local Features\nLearning Rank Reduced Interpolation with Principal Component Analysis\nUsing Human Brain Activity to Guide Machine Learning\nNeed for Speed: A Benchmark for Higher Frame Rate Object Tracking\nEncouraging LSTMs to Anticipate Actions Very Early\nZM-Net: Real-time Zero-shot Image Manipulation Network\nNeural Ctrl-F: Segmentation-free Query-by-String Word Spotting in  Handwritten Manuscript Collections\nLarge Pose 3D Face Reconstruction from a Single Image via Direct  Volumetric CNN Regression\nTransductive Zero-Shot 
Learning with Adaptive Structural Embedding\nActive Convolution: Learning the Shape of Convolution for Image  Classification\nSemi and Weakly Supervised Semantic Segmentation Using Generative  Adversarial Network\nBundle Optimization for Multi-aspect Embedding\nMoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised  Monocular Reconstruction\nUnsupervised learning from video to detect foreground objects in single  images\nConfigurable, Photorealistic Image Rendering and Ground Truth Synthesis  by Sampling Stochastic Grammars Representing Indoor Scenes\nGenerating Descriptions with Grounded and Co-Referenced People\nReal-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor\nThree-Dimensional Segmentation of Vesicular Networks of Fungal Hyphae in  Macroscopic Microscopy Image Stacks\nDeep Generative Adversarial Compression Artifact Removal\nImproving Pairwise Ranking for Multi-label Image Classification\nAutomatic Discovery, Association Estimation and Learning of Semantic  Attributes for a Thousand Categories\nVisual Recognition of Paper Analytical Device Images for Detection of  Falsified Pharmaceuticals\nUnsupervised object segmentation in video by efficient selection of  highly probable positive features\nA Labeling-Free Approach to Supervising Deep Neural Networks for Retinal  Blood Vessel Segmentation\nAutomatic Viseme Vocabulary Construction to Enhance Continuous  Lip-reading\nJoint Denoising / Compression of Image Contours via Shape Prior and  Context Tree\nSubmodular Trajectory Optimization for Aerial 3D Scanning\nA Dual-Source Approach for 3D Human Pose Estimation from a Single Image\nCAD Priors for Accurate and Flexible Instance Reconstruction\nREAD-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in  Archival Documents\nLCDet: Low-Complexity Fully-Convolutional Neural Networks for Object  Detection in Embedded Systems\nLearning Robust Object Recognition Using Composed Scenes from Generative  Models\nTowards 
seamless multi-view scene analysis from satellite to  street-level\nHierarchical Cellular Automata for Visual Saliency\nEffective Sampling: Fast Segmentation Using Robust Geometric Model  Fitting\nLearning a Robust Society of Tracking Parts\nContinuous Video to Simple Signals for Swimming Stroke Detection with  Convolutional Neural Networks\nDiscriminatively Learned Hierarchical Rank Pooling Networks\nAdversarial Inverse Graphics Networks: Learning 2D-to-3D Lifting and  Image-to-Image Translation from Unpaired Supervision\nr-BTN: Cross-domain Face Composite and Synthesis from Limited Facial  Patches\nDeep-Learning Convolutional Neural Networks for scattered shrub  detection with Google Earth Imagery\nMeasurement-Adaptive Sparse Image Sampling and Recovery\nDeep Adaptive Feature Embedding with Local Sample Distributions for  Person Re-identification\nRecovering 6D Object Pose: Multi-modal Analyses on Challenges\nModeling Multi-Object Configurations via Medial/Skeletal Linking  Structures\nLearning without Prejudice: Avoiding Bias in Webly-Supervised Action  Recognition\nInteractive 3D Modeling with a Generative Adversarial Network\nFine-Grained Categorization via CNN-Based Automatic Extraction and  Integration of Object-Level and Part-Level Features\nComparing Neural and Attractiveness-based Visual Features for Artwork  Recommendation\nDeep Hashing Network for Unsupervised Domain Adaptation\nA New Urban Objects Detection Framework Using Weakly Annotated Sets\nLaplacian-Steered Neural Style Transfer\nKnowledge-Guided Recurrent Neural Network Learning for Task-Oriented  Action Prediction\nObject Tracking based on Quantum Particle Swarm Optimization\nDominant Sets for \"Constrained\" Image Segmentation\nFaster Than Real-time Facial Alignment: A 3D Spatial Transformer Network  Approach in Unconstrained Poses\nSupervising Neural Attention Models for Video Captioning by Human Gaze  Data\nUnsupervised Object Discovery and Co-Localization by Deep Descriptor  
Transforming\nEnhancing Convolutional Neural Networks for Face Recognition with  Occlusion Maps and Batch Triplet Loss\nConcise Radiometric Calibration Using The Power of Ranking\nFCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in  City Cameras\nGeneralizing the Convolution Operator in Convolutional Neural Networks\nAutomated Assessment of Facial Wrinkling: a case study on the effect of  smoking\nUnsupervised Domain Adaptation for Face Recognition in Unlabeled Videos\nExtreme clicking for efficient object annotation\nTransitive Invariance for Self-supervised Visual Representation Learning\nExploring Temporal Preservation Networks for Precise Temporal Action  Localization\nJoint Multi-Person Pose Estimation and Semantic Part Segmentation\nSSH: Single Stage Headless Face Detector\nConditional Adversarial Network for Semantic Segmentation of Brain Tumor\nPractical Block-wise Neural Network Architecture Generation\nDiscovery of Visual Semantics by Unsupervised and Self-Supervised  Representation Learning\nOn Image Classification: Correlation v.s. Causality\nWordSup: Exploiting Word Annotations for Character based Text Detection\nPredicting Aesthetic Score Distribution through Cumulative  Jensen-Shannon Divergence\nA Comprehensive Survey of Deep Learning in Remote Sensing: Theories,  Tools and Challenges for the Community\nFast Image Processing with Fully-Convolutional Networks\nCompressed Sensing MRI Reconstruction using a Generative Adversarial  Network with a Cyclic Loss\nScene Text Recognition with Sliding Convolutional Character Models\nTowards Automated Cadastral Boundary Delineation from UAV Data\nFocusing Attention: Towards Accurate Text Recognition in Natural Images\nRotational Subgroup Voting and Pose Clustering for Robust 3D Object  Recognition\nImage Matching: An Application-oriented Benchmark\nTo Go or Not To Go? 
A Near Unsupervised Learning Approach For Robot  Navigation\nNormal Integration: A Survey\nMorphable Face Models - An Open Framework\nRecognition of feature curves on 3D shapes using an algebraic approach  to Hough transforms\nWhere computer vision can aid physics: dynamic cloud motion forecasting  from satellite images\nDepth estimation using structured light flow -- analysis of projected  pattern flow on an object's surface --\nRethinking Feature Discrimination and Polymerization for Large-scale  Recognition\nDeepSolarEye: Power Loss Prediction and Weakly Supervised Soiling  Localization via Fully Convolutional Networks for Solar Panels\nCo-saliency Detection for RGBD Images Based on Multi-constraint Feature  Matching and Cross Label Propagation\n3D Object Discovery and Modeling Using Single RGB-D Images Containing  Multiple Object Instances\nProcedural Modeling and Physically Based Rendering for Synthetic Data  Generation in Automotive Applications\nDART: Distribution Aware Retinal Transform for Event-based Cameras\nBeautiful and damned. 
Combined effect of content quality and social ties  on user engagement\nAn Iterative Co-Saliency Framework for RGBD Images\nInterpreting Convolutional Neural Networks Through Compression\nRemote Sensing Image Fusion Based on Two-stream Fusion Network\nGlobal versus Localized Generative Adversarial Nets\nParallel Attention: A Unified Framework for Visual Object Discovery  through Dialogs and Queries\nAction-Attending Graphic Neural Network\nParameter Reference Loss for Unsupervised Domain Adaptation\nFew-shot Learning by Exploiting Visual Concepts within CNNs\nOn the Relations of Correlation Filter Based Trackers and Struck\nCan Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?\nDFUNet: Convolutional Neural Networks for Diabetic Foot Ulcer  Classification\nColour Constancy: Biologically-inspired Contrast Variant Pooling  Mechanism\nSPP-Net: Deep Absolute Pose Regression with Synthetic Views\nMapping the world population one building at a time\nNote on Attacking Object Detectors with Adversarial Stickers\nBrain Tumor Segmentation Based on Refined Fully Convolutional Neural  Networks with A Hierarchical Dice Loss\nMemory-Efficient Deep Salient Object Segmentation Networks on Gridized  Superpixels\nFuture Frame Prediction for Anomaly Detection -- A New Baseline\nLearning Deep Similarity Models with Focus Ranking for Fabric Image  Retrieval\nImplementation of Deep Convolutional Neural Network in Multi-class  Categorical Image Classification\nWhat have we learned from deep representations for action recognition?\nConvex Relaxations for Pose Graph Optimization with Outliers\nAdversarial Spheres\nLight Field Super-Resolution using a Low-Rank Prior and Deep  Convolutional Neural Networks\nImage Provenance Analysis at Scale\nEnKCF: Ensemble of Kernelized Correlation Filters for High-Speed Object  Tracking\nHandwriting Trajectory Recovery using End-to-End Deep Encoder-Decoder  Network\nWhen Vehicles See Pedestrians with Phones:A Multi-Cue Framework for  
Recognizing Phone-based Activities of Pedestrians\nSupersaliency: Predicting Smooth Pursuit-Based Attention with Slicing  CNNs Improves Fixation Prediction for Naturalistic Videos\nTell-and-Answer: Towards Explainable Visual Question Answering using  Attributes and Captions\nLearning the Synthesizability of Dynamic Texture Samples\nFace recognition for monitoring operator shift in railways\nEvery Smile is Unique: Landmark-Guided Diverse Smile Generation\nGenerating Triples with Adversarial Networks for Scene Graph  Construction\nTriplet-based Deep Similarity Learning for Person Re-Identification\nDA-GAN: Instance-level Image Translation by Deep Attention Generative  Adversarial Networks (with Supplementary Materials)\nGenerating retinal flow maps from structural optical coherence  tomography with artificial intelligence\nBonnet: An Open-Source Training and Deployment Framework for Semantic  Segmentation in Robotics using CNNs\nMulti-Evidence Filtering and Fusion for Multi-Label Classification,  Object Detection and Semantic Segmentation Based on Weakly Supervised  Learning\nDirectional Statistics-based Deep Metric Learning for Image  Classification and Retrieval\nJoint Pixel and Feature-level Domain Adaptation in the Wild\nCross-View Image Synthesis using Conditional GANs\nFace2Text: Collecting an Annotated Image Description Corpus for the  Generation of Rich Face Descriptions\nDeep Class-Wise Hashing: Semantics-Preserving Hashing via Class-wise  Loss\nQueuing Theory Guided Intelligent Traffic Scheduling through Video  Analysis using Dirichlet Process Mixture Model\nHierarchical Metric Learning and Matching for 2D and 3D Geometric  Correspondences\nUnsupervised Representation Learning by Predicting Image Rotations\nEnd-to-End Video Captioning with Multitask Reinforcement Learning\nPDNet: Prior-model Guided Depth-enhanced Network for Salient Object  Detection\nSpeeding-up Object Detection Training for Robotics with FALKON\nTowards Human-Machine Cooperation: 
Self-supervised Sample Mining for  Object Detection\nSeeing Voices and Hearing Faces: Cross-modal biometric matching\nCodeSLAM - Learning a Compact, Optimisable Representation for Dense  Visual SLAM\nFace Alignment in Full Pose Range: A 3D Total Solution\nHyperDense-Net: A hyper-densely connected CNN for multi-modal image  segmentation\nGeometric Consistency for Self-Supervised End-to-End Visual Odometry\nCapsules for Object Segmentation\nExploiting feature representations through similarity learning,  post-ranking and ranking aggregation for person re-identification\nMultimodal Unsupervised Image-to-Image Translation\nThe Generalized Universal Law of Generalization\nL.T.Kuzin: Research Program\nA hybrid MLP-PNN architecture for fast image superresolution\nAccurate and robust image superresolution by neural processing of local  image representations\nMatching Edges in Images ; Application to Face Recognition\nNext Generation Language Resources using GRID\nTactical games & behavioral self-organization\nSEAL: Common Core Libraries and Services for LHC Applications\nInverse problems in imaging systems and the general Bayesian inversion  frawework\nExtreme Learning Machine for land cover classification\n3D/2D Registration of Mapping Catheter Images for Arrhythmia  Interventional Assistance\nA New Method to Extract Dorsal Hand Vein Pattern using Quadratic  Inference Function\nAnalytical shape determination of fiber-like objects with Virtual Image  Correlation\nAutomatic diagnosis of retinal diseases from color retinal images\nMedical Image Compression using Wavelet Decomposition for Prediction  Method\nA Study on Potential of Integrating Multimodal Interaction into Musical  Conducting Education\nNormalized Information Distance is Not Semicomputable\nRevisiting Complex Moments For 2D Shape Representation and Image  Normalization\nA Review of Research on Devnagari Character Recognition\nStatistical Multiresolution Dantzig Estimation in Imaging: Fundamental  Concepts and 
Algorithmic Framework\nExploring New Directions in Iris Recognition\nOff-Line Arabic Handwriting Character Recognition Using Word  Segmentation\nWavelet Based Normal and Abnormal Heart Sound Identification using  Spectrogram Analysis\nAn anisotropy preserving metric for DTI processing\nOn the Relation Between the Common Labelling and the Median Graph\nParallel D2-Clustering: Large-Scale Clustering of Discrete Distributions\nA Low-Complexity Algorithm for Static Background Estimation from  Cluttered Image Sequences in Surveillance Contexts\nComputing Motion with 3D Memristive Grid\nNon-constant bounded holomorphic functions of hyperbolic numbers -  Candidates for hyperbolic activation functions\nTime Efficient Approach To Offline Hand Written Character Recognition  Using Associative Memory Net\nGPU Accelerated Particle Visualization with Splotch\nCorrecting Multi-focus Images via Simple Standard Deviation for Image  Fusion\nMultiple Kernel Learning for Brain-Computer Interfacing\nApproximating persistent homology for a cloud of $n$ points in a  subquadratic time\nStopping Rules for Bag-of-Words Image Search and Its Application in  Appearance-Based Localization\nSoftware Architecture and Subclassing Technique for Semiconductor  Manufacturing Machines\nEvaluation of Image Segmentation and Filtering With ANN in the Papaya  Leaf\nBlock Motion Based Dynamic Texture Analysis: A Review\nEfficient Nonnegative Tucker Decompositions: Algorithms and Uniqueness\nA Multi-threshold Segmentation Approach Based on Artificial Bee Colony  Optimization\nFast Separable Non-Local Means\nNeighborhood Rank Order Coding for Robust Texture Analysis and Feature  Extraction\nSymmetric angular momentum coupling, the quantum volume operator and the  7-spin network: a computational perspective\nA Two-phase Decision Support Framework for the Automatic Screening of  Digital Fundus Images\nOnline SLAM with Any-time Self-calibration and Automatic Change  Detection\nDeep Learning for Medical Image 
Segmentation\nA Real Time Facial Expression Classification System Using Local Binary  Patterns\nDescribing Multimedia Content using Attention-based Encoder--Decoder  Networks\nCompression of Fully-Connected Layer in Neural Network by Kronecker  Product\nGPU-Based Computation of 2D Least Median of Squares with Applications to  Fast and Robust Line Detection\nMultilingual Image Description with Neural Sequence Models\nColor graph based wavelet transform with perceptual information\nComputational models: Bottom-up and top-down aspects\nEfficient Training of Very Deep Neural Networks for Supervised Hashing\nTEMPO: Feature-Endowed Teichmüller Extremal Mappings of Point Clouds\nReservoir computing for spatiotemporal signal classification without  trained output weights\nDesign of Efficient Convolutional Layers using Single Intra-channel  Convolution, Topological Subdivisioning and Spatial \"Bottleneck\" Structure\nFundamental principles of cortical computation: unsupervised learning  with prediction, compression and feedback\nRobot Networks with Homonyms: The Case of Patterns Formation\nAlgorithmic Optimisations for Iterative Deconvolution Methods\nMedial Meshes for Volume Approximation\nBinary Stereo Matching\nBird Species Categorization Using Pose Normalized Deep Convolutional  Nets\nNon-Verbal Communication Analysis in Victim-Offender Mediations\nAdjusted least squares fitting of algebraic hypersurfaces\nNonparametric Nearest Neighbor Descent Clustering based on Delaunay  Triangulation\nA Probabilistic Theory of Deep Learning\nFast Methods for Eikonal Equations: an Experimental Survey\nA Survey of Multithreading Image Analysis\nExploring the influence of scale on artist attribution\nDetection and Analysis of Emotion From Speech Signals\nCamera Calibration from Dynamic Silhouettes Using Motion Barcodes\nRapid Exact Signal Scanning with Deep Convolutional Neural Networks\nFlatCam: Thin, Bare-Sensor Cameras using Coded Aperture and Computation\nLearning the Semantics of 
Manipulation Action\nA simple method for estimating the fractal dimension from digital  images: The compression dimension\nGenerating Images with Perceptual Similarity Metrics based on Deep  Networks\nLocal High-order Regularization on Data Manifolds\nDepth-Based Object Tracking Using a Robust Gaussian Filter\nEffective Computer Model For Recognizing Nationality From Frontal Image\nPersistent Homology of Attractors For Action Recognition\nDeep Self-Convolutional Activations Descriptor for Dense Cross-Modal  Correspondence\nAccelerating Deep Learning with Shrinkage and Recall\nNetwork as a Service: The New Vista of Opportunities\nHuman Computer Interaction Using Marker Based Hand Gesture Recognition\nNIST: An Image Classification Network to Image Semantic Retrieval\nEmoFit: Affect Monitoring System for Sedentary Jobs\nLabel distribution based facial attractiveness computation by deep  residual learning\nDeep Convolutional Networks as Models of Generalization and Blending  Within Visual Creativity\nMachine learning methods for accurate delineation of tumors in PET  images\nOptical Flow Estimation using a Spatial Pyramid Network\nLow-rank Bilinear Pooling for Fine-Grained Classification\nDense Prediction on Sequences with Time-Dilated Convolutions for Speech  Recognition\nEnd-to-End Training of Hybrid CNN-CRF Models for Stereo\nTwo-Bit Networks for Deep Learning on Resource-Constrained Embedded  Devices\nSecurity-related Research in Ubiquitous Computing -- Results of a  Systematic Literature Review\nDetection of Face using Viola Jones and Recognition using Back  Propagation Neural Network\nAn Epipolar Line from a Single Pixel\nUnsupervised Construction of Human Body Models Using Principles of  Organic Computing\nGroup-based Sparse Representation for Image Compressive Sensing  Reconstruction with Non-Convex Regularization\nA Real-time Hand Gesture Recognition and Human-Computer Interaction  System\nEvaluation and Prediction of Polygon Approximations of Planar Contours  
for Shape Analysis\nComparative Analysis of Open Source Frameworks for Machine Learning with  Use Case in Single-Threaded and Multi-Threaded Modes\nA New Adaptive Video Super-Resolution Algorithm With Improved Robustness  to Innovations\nEfficient and accurate monitoring of the depth information in a Wireless  Multimedia Sensor Network based surveillance\nExploring the Imposition of Synaptic Precision Restrictions For  Evolutionary Synthesis of Deep Neural Networks\nDiscriminative Localization in CNNs for Weakly-Supervised Segmentation  of Pulmonary Nodules\nDiscrete Gyrator Transforms: Computational Algorithms and Applications\nLearned Primal-dual Reconstruction\nSalt-n-pepper noise filtering using Cellular Automata\nStreamlined Deployment for Quantized Neural Networks\nWhite Matter Fiber Segmentation Using Functional Varifolds\nTexture Fuzzy Segmentation using Skew Divergence Adaptive Affinity  Functions\nLearning Wasserstein Embeddings\nDetection and Analysis of Human Emotions through Voice and Speech  Pattern Processing\nConvolutional Drift Networks for Video Classification\nHyperNetworks with statistical filtering for defending adversarial  examples\nRandomized Nonnegative Matrix Factorization\nFast Predictive Simple Geodesic Regression\nGradually Updated Neural Networks for Large-Scale Image Recognition\nCompression for Smooth Shape Analysis\nA Survey of FPGA Based Neural Network Accelerator\nThe Internet of Battle Things\nCan Computers Create Art?\nWorst-case Optimal Submodular Extensions for Marginal Estimation\nStochastic Downsampling for Cost-Adjustable Inference and Improved  Regularization in Convolutional Networks\nKRISM --- Krylov Subspace-based Optical Computing of Hyperspectral  Images\nEfficient Large-Scale Multi-Modal Classification\nEfficient Neural Architecture Search via Parameter Sharing\nLightweight Classification of IoT Malware based on Image Recognition\nFD-MobileNet: Improved MobileNet with a Fast Downsampling Strategy\nEnsemble 
computation approach to the Hough transform\nReal-Time End-to-End Action Detection with Two-Stream Networks\nFacial Expression Analysis under Partial Occlusion: A Survey\nSolving Inverse Computational Imaging Problems using Deep Pixel-level  Prior\nA Deep Learning Approach for Multimodal Deception Detection\nAbnormality Detection in Mammography using Deep Convolutional Neural  Networks\nSparse Adversarial Perturbations for Videos\nSharing and Preserving Computational Analyses for Posterity with  encapsulator\nEfficient Hardware Realization of Convolutional Neural Networks using  Intra-Kernel Regular Pruning\nTransformation on Computer-Generated Facial Image to Avoid Detection by  Spoofing Detector\nThe similarity metric\nEvolutionary Computational Method of Facial Expression Analysis for  Content-based Video Retrieval using 2-Dimensional Cellular Automata\nSimplification and integration in computing and cognition: the SP theory  and the multiple alignment concept\nThe SP theory of intelligence: benefits and applications\nEfficient Legendre moment computation for grey level images\nReal-Time Human-Computer Interaction Based on Face and Hand Gesture  Recognition\nAn NBDMMM Algorithm Based Framework for Allocation of Resources in Cloud\nAiiDA: Automated Interactive Infrastructure and Database for  Computational Science\nJoint System and Algorithm Design for Computationally Efficient Fan Beam  Coded Aperture X-ray Coherent Scatter Imaging\nDyVEDeep: Dynamic Variable Effort Deep Neural Networks\nComputer-aided position planning of miniplates to treat facial bone  defects\nAn Integrated Soft Computing Approach to a Multi-biometric Security  Model\nThe Neural Representation Benchmark and its Evaluation on Brain and  Machine\nSparsey: Event Recognition via Deep Hierarchical Spare Distributed Codes\nFast YOLO: A Fast You Only Look Once System for Real-time Embedded  Object Detection in Video\nUsing Self-Organising Mappings to Learn the Structure of Data 
Manifolds\nScheduleNanny: Using GPS to Learn the User's Significant Locations,  Travel Times and Schedule\nA simple effective method for curvatures estimation on triangular meshes\nSpatiotemporal sensistivity and visual attention for efficient rendering  of dynamic environments\nFast Lexically Constrained Viterbi Algorithm (FLCVA): Simultaneous  Optimization of Speed and Memory\nA Fast and Accurate Nonlinear Spectral Method for Image Recognition and  Registration\nGrid Added Value to Address Malaria\nEvolutionary Optimisation Methods for Template Based Image Registration\nConstruction of Bayesian Deformable Models via Stochastic Approximation  Algorithm: A Convergence Study\nEmpirical Evaluation of Four Tensor Decomposition Algorithms\nStochastic Algorithm For Parameter Estimation For Dense Deformable  Template Mixture Model\nNetwork QoS Management in Cyber-Physical Systems\nA Novel Clustering Algorithm Based on Quantum Games\nOn Design and Implementation of the Distributed Modular Audio  Recognition Framework: Requirements and Specification Design Document\nParallel AdaBoost Algorithm for Gabor Wavelet Selection in Face  Recognition\nRIOT: I/O-Efficient Numerical Computing without SQL\nAn Efficient Secure Multimodal Biometric Fusion Using Palmprint and Face  Image\nA Method for Extraction and Recognition of Isolated License Plate  Characters\nColor Image Clustering using Block Truncation Algorithm\nAn Optimal Method For Wake Detection In SAR Images Using Radon  Transformation Combined With Wavelet Filters\nCanICA: Model-based extraction of reproducible group-level ICA patterns  from fMRI time series\nSequential Clustering based Facial Feature Extraction Method for  Automatic Creation of Facial Models from Orthogonal Views\nReversible Image Authentication with Tamper Localization Based on  Integer Wavelet Transform\nFingerprint Verification based on Gabor Filter Enhancement\nRobust Multi biometric Recognition Using Face and Ear Images\nPerformance analysis of Non 
Linear Filtering Algorithms for underwater  images\nGenetic Programming Framework for Fingerprint Matching\nGesture Recognition with a Focus on Important Actions by Using a Path  Searching Method in Weighted Graph\nStability of multidimensional persistent homology with respect to domain  perturbations\nA New Image Steganography Based On First Component Alteration Technique\nAn Improved Image Mining Technique For Brain Tumour Classification Using  Efficient Classifier\nA Comparative Study of Removal Noise from Remote Sensing Image\nText Region Extraction from Business Card Images for Mobile Devices\nFacial Gesture Recognition Using Correlation And Mahalanobis Distance\nSliding window approach based Text Binarisation from Complex Textual  images\nTuning CLD Maps\nSpatially-Adaptive Reconstruction in Computed Tomography Based on  Statistical Learning\nNew Visual Cryptography Algorithm For Colored Image\nDeblured Gaussian Blurred Images\nGraphic Symbol Recognition using Graph Based Signature and Bayesian  Network Classifier\nBayesian estimation of regularization and PSF parameters for Wiener-Hunt  deconvolution\nCombining Multiple Feature Extraction Techniques for Handwritten  Devnagari Character Recognition\nAn Effective Fingerprint Verification Technique\nFuture management needs of a \"software-driven\" science community\nMultiple Classifier Combination for Off-line Handwritten Devnagari  Character Recognition\nSurvey of Nearest Neighbor Techniques\nMaximum Likelihood Mosaics\nSegmentation of Camera Captured Business Card Images for Mobile Devices\nA correspondence-less approach to matching of deformable shapes\nAn Effect of Spatial Filtering in Visualization of Coronary Arteries  Imaging\nVisualization techniques for data mining of Latur district satellite  imagery\nDWT Based Fingerprint Recognition using Non Minutiae Features\nFingerprint: DWT, SVD Based Enhancement and Significant Contrast for  Ridges and Valleys Using Fuzzy Measures\nFingerprint recognition using 
standardized fingerprint model\nFast multi-scale edge-detection in medical ultrasound signals\nLinearized Alternating Direction Method with Adaptive Penalty for  Low-Rank Representation\nDevnagari document segmentation using histogram approach\nImprovements on \"Fast space-variant elliptical filtering using box  splines\"\nSpeculative Parallel Evaluation Of Classification Trees On GPGPU Compute  Engines\nA Facial Expression Classification System Integrating Canny, Principal  Component Analysis and Artificial Neural Network\n3D Model Retrieval Based on Semantic and Shape Indexes\nProbabilistic Motion Estimation Based on Temporal Coherence\nInformation Distance: New Developments\nA Multimodal Biometric System Using Linear Discriminant Analysis For  Improved Performance\nA Novel Approach to Fast Image Filtering Algorithm of Infrared Images  based on Intro Sort Algorithm\nA General Solver Based on Sparse Resultants\nComparing Methods for segmentation of Microcalcification Clusters in  Digitized Mammograms\nVery Short Literature Survey From Supervised Learning To Surrogate  Modeling\nNew approach using Bayesian Network to improve content based image  classification systems\nParametric annealing: a stochastic search method for human pose tracking\nVolumetric Mapping of Genus Zero Objects via Mass Preservation\nFingerprint Gender Classification using Wavelet Transform and Singular  Value Decomposition\nOptimizing Face Recognition Using PCA\nInvestigation of Color Constancy for Ubiquitous Wireless LAN/Camera  Positioning: An Initial Outcome\nBayesian Random Fields: The Bethe-Laplace Approximation\nSpeckle Reduction using Stochastic Distances\nCups Products in Z2-Cohomology of 3D Polyhedral Complexes\nAssessment of SAR Image Filtering using Adaptive Stack Filters\nFixed Interfaces, Adaptive Interfaces... What is next? 
Total movability  - a new paradigm for the user interface\nA Novel Approach of Color Image Hiding using RGB Color planes and DWT\nCombinatorial Gradient Fields for 2D Images with Empirically Convergent  Separatrices\nSegmentation of Breast Regions in Mammogram Based on Density: A Review\nA Complete System for Candidate Polyps Detection in Virtual Colonoscopy\nInference of Fine-grained Attributes of Bengali Corpus for Stylometry  Detection\n3D Face Recognition using Significant Point based SULD Descriptor\nA New Algorithm Based Entropic Threshold for Edge Detection in Images\nThe role of colour preattentive processing in human-computer interaction  task efficiency: a preliminary study\nA recursive divide-and-conquer approach for sparse principal component  analysis\nSelf Authentication of image through Daubechies Transform technique  (SADT)\nFast and Robust Linear Motion Deblurring\nA Novel Directional Weighted Minimum Deviation (DWMD) Based Filter for  Removal of Random Valued Impulse Noise\nToward New Vision in Teaching Calculus\nA Self-Organizing Neural Scheme for Door Detection in Different  Environments\nBarnes-Hut-SNE\nAn ANN-based Method for Detecting Vocal Fold Pathology\nAutomatic symmetry based cluster approach for anomalous brain  identification in PET scan image : An Analysis\nGBM Volumetry using the 3D Slicer Medical Image Computing Platform\nFont Acknowledgment and Character Extraction of Digital and Scanned  Images\nDetermining Points on Handwritten Mathematical Symbols\nExploiting Data Parallelism in the yConvex Hypergraph Algorithm for  Image Representation using GPGPUs\nConversion of Braille to Text in English, Hindi and Tamil Languages\nContent Based Image Retrieval System using Feature Classification with  Modified KNN Algorithm\nSelection Mammogram Texture Descriptors Based on Statistics Properties  Backpropagation Structure\nHaptic Science and Technology\nDevelopment of Comprehensive Devnagari Numeral and Character Database  for Offline 
Handwritten Character Recognition\nContextually learnt detection of unusual motion-based behaviour in  crowded public spaces\nContour polygonal approximation using shortest path in networks\nNeural Network Application on Foliage Plant Identification\nMulti-Sensor Image Fusion Based on Moment Calculation\nGeometric Feature Based Face-Sketch Recognition\nRegion and Location Based Indexing and Retrieval of MR-T2 Brain Tumor  Images\nEnd-to-end Phoneme Sequence Recognition using Convolutional Neural  Networks\nSparse similarity-preserving hashing\nDeep Inside Convolutional Networks: Visualising Image Classification  Models and Saliency Maps\nImage Processing based Systems and Techniques for the Recognition of  Ancient and Modern Coins\nAutomated Coin Recognition System using ANN\nPerformance Engineering for a Medical Imaging Application on the Intel  Xeon Phi Accelerator\nLeaf Classification Using Shape, Color, and Texture Features\nLicense Plate Recognition (LPR): A Review with Experiments for Malaysia  Case Study\nsmart application for AMS using Face Recognition\nOn Performance of Logical-Clustering Of Flow-Sensors\nMulti-view Face Analysis Based on Gabor Features\nAnt Colony based Feature Selection Heuristics for Retinal Vessel  Segmentation\nA-infinity Persistence\nIndoor 3D Video Monitoring Using Multiple Kinect Depth-Cameras\nAn inertial forward-backward algorithm for monotone inclusions\nStabilizing dual-energy X-ray computed tomography reconstructions using  patch-based regularization\nExtraction of Projection Profile, Run-Histogram and Entropy Features  Straight from Run-Length Compressed Text-Documents\nExploiting Linear Structure Within Convolutional Networks for Efficient  Evaluation\nIcon Based Information Retrieval and Disease Identification in  Agriculture\nDynamic Mode Decomposition for Real-Time Background/Foreground  Separation in Video\nA Study of Local Binary Pattern Method for Facial Expression Detection\nAn evolutionary computational based 
approach towards automatic image  registration\nRecognition of Isolated Words using Zernike and MFCC features for Audio  Visual Speech Recognition\nObject Proposal Generation using Two-Stage Cascade SVMs\nA Bottom-Up Approach for Automatic Pancreas Segmentation in Abdominal CT  Scans\nReal-Time and Robust Method for Hand Gesture Recognition System Based on  Cross-Correlation Coefficient\nSelf Organization Map based Texture Feature Extraction for Efficient  Medical Image Categorization\nBi-l0-l2-Norm Regularization for Blind Motion Deblurring\nA Convex Approach to Consensus on SO(n)\nA graph Laplacian regularization for hyperspectral data unmixing\nSalient Object Detection: A Discriminative Regional Feature Integration  Approach\nAffective Facial Expression Processing via Simulation: A Probabilistic  Model\nFast Iteratively Reweighted Least Squares Algorithms for Analysis-Based  Sparsity Reconstruction\nFast forward feature selection for the nonlinear classification of  hyperspectral images\nA new ADMM algorithm for the Euclidean median and its application to  robust patch regression\nMicroscopic Advances with Large-Scale Learning: Stochastic Optimization  for Cryo-EM\nScalable Multi-Output Label Prediction: From Classifier Chains to  Classifier Trellises\nWise Computing: Towards Endowing System Development with True Wisdom\nA Cheap System for Vehicle Speed Detection\nA Proximal Bregman Projection Approach to Continuous Max-Flow Problems  Using Entropic Distances\nDeep Learning for Object Saliency Detection and Image Segmentation\nData Fusion of Objects Using Techniques Such as Laser Scanning,  Structured Light and Photogrammetry for Cultural Heritage Applications\nHow Far Can You Get By Combining Change Detection Algorithms?\nA Review Paper: Noise Models in Digital Image Processing\nGraph edit distance : a new binary linear programming formulation\nRandomized Robust Subspace Recovery for High Dimensional Data Matrices\nSAR Imaging of Moving Target based on 
Knowledge-aided Two-dimensional  Autofocus\nTeaching Logic to Information Systems Students: Challenges and  Opportunities\nA National Effort for Motivating Indian Students and Teachers towards  Algorithmic Research\nManitest: Are classifiers really invariant?\nThinning Algorithm Using Hypergraph Based Morphological Operators\nLarge-scale subspace clustering using sketching and validation\nA Latent Source Model for Patch-Based Image Segmentation\nStereo Matching by Training a Convolutional Neural Network to Compare  Image Patches\nStacked Attention Networks for Image Question Answering\nNeural Module Networks\nHeterogeneous Knowledge Transfer in Video Emotion Recognition,  Attribution and Summarization\nAn Introduction to Convolutional Neural Networks\nSparseness helps: Sparsity Augmented Collaborative Representation for  Classification\nR-FUSE: Robust Fast Fusion of Multi-Band Images Based on Solving a  Sylvester Equation\nLocal Binary Pattern for Word Spotting in Handwritten Historical  Document\nIncremental Reconstruction of Urban Environments by Edge-Points Delaunay  Triangulation\nA Classifier-guided Approach for Top-down Salient Object Detection\nKernelized Covariance for Action Recognition\nImage Colorization Using a Deep Convolutional Neural Network\nJoint Line Segmentation and Transcription for End-to-End Handwritten  Paragraph Recognition\nAttention Tree: Learning Hierarchies of Visual Features for Large-Scale  Image Recognition\nFPGA system for real-time computational extended depth of field imaging  using phase aperture coding\nOptimal Filtered Backprojection for Fast and Accurate Tomography  Reconstruction\nAdapting Deep Network Features to Capture Psychological Representations\nTemporal Registration in In-Utero Volumetric MRI Time Series\nDynamic Network Surgery for Efficient DNNs\nThe Symmetry of a Simple Optimization Problem in Lasso Screening\nLearning Temporal Transformations From Time-Lapse Videos\nCast and Self Shadow Segmentation in Video 
Sequences using Interval  based Eigen Value Representation\nOn the Cohomology of 3D Digital Images\nFast Approximate L_infty Minimization: Speeding Up Robust Regression\nSemi-Optimal Edge Detector based on Simple Standard Deviation with  Adjusted Thresholding\nArabic Text Recognition in Video Sequences\nToward Cloud-based Vehicular Networks with Efficient Resource Management\nA Cellular Automata based Optimal Edge Detection Technique using  Twenty-Five Neighborhood Model\nPerformance of Hull-Detection Algorithms For Proton Computed Tomography  Reconstruction\nDirect Processing of Run Length Compressed Document Image for  Segmentation and Characterization of a Specified Block\nRealtime Multilevel Crowd Tracking using Reciprocal Velocity Obstacles\nSurvey on Sparse Coded Features for Content Based Face Image Retrieval\nVisual Saliency Model using SIFT and Comparison of Learning Approaches\nCombined Approach for Image Segmentation\nLocal Decorrelation For Improved Detection\nOptimization Methods for Convolutional Sparse Coding\nRecurrent Models of Visual Attention\nConvex Hulls under Uncertainty\nA Computational Model of the Short-Cut Rule for 2D Shape Decomposition\nReal-time Crowd Tracking using Parameter Optimized Mixture of Motion  Models\nGoing Deeper with Convolutions\nApproximation errors of online sparsification criteria\nHistogram of Oriented Principal Components for Cross-View Action  Recognition\nFast Sublinear Sparse Representation using Shallow Tree Matching Pursuit\nFast Steerable Principal Component Analysis\nAnalytical Comparison of Noise Reduction Filters for Image Restoration  Using SNR Estimation\nImage Data Compression for Covariance and Histogram Descriptors\nPy3DFreeHandUS: a library for voxel-array reconstruction using  Ultrasonography and attitude sensors\nSelf-informed neural network structure learning\nTransformation Properties of Learned Visual Representations\nFast and Robust Feature Matching for RGB-D Based Localization\nGradient 
Difference based approach for Text Localization in Compressed  domain\nSemi-supervised Segmentation Fusion of Multi-spectral and Aerial Images\nLearning Descriptors for Object Recognition and 3D Pose Estimation\nMatrix Product State for Feature Extraction of Higher-Order Tensors\nA Survey On Video Forgery Detection\nTomographic Image Reconstruction using Training images\nBrain Tumor Segmentation: A Comparative Analysis\nProperties of simple sets in digital spaces. Contractions of simple sets  preserving the homotopy type of a digital space\nComparisons of wavelet functions in QRS signal to noise ratio  enhancement and detection accuracy\nMultimodal Convolutional Neural Networks for Matching Image and Sentence\nComputational Cost Reduction in Learned Transform Classifications\nSpeeding Up Neural Networks for Large Scale Classification using WTA  Hashing\nLearning to Answer Questions From Image Using Convolutional Neural  Network\nRobust Face Recognition with Structural Binary Gradient Patterns\nUnderstanding deep features with computer-generated imagery\nImplementation of Training Convolutional Neural Networks\nFacial Expressions recognition Based on Principal Component Analysis  (PCA)\nLearning both Weights and Connections for Efficient Neural Networks\nExtract an essential skeleton of a character as a graph from a character  image\nMRF-ZOOM: A Fast Dictionary Searching Algorithm for Magnetic Resonance  Fingerprinting\nFace Prediction Model for an Automatic Age-invariant Face Recognition  System\nUnderstanding Neural Networks Through Deep Visualization\nAligning Books and Movies: Towards Story-like Visual Explanations by  Watching Movies and Reading Books\nForming A Random Field via Stochastic Cliques: From Random Graphs to  Fully Connected Random Fields\nSaliency maps on image hierarchies\nGaussian Mixture Reduction Using Reverse Kullback-Leibler Divergence\nDeep Convolutional Neural Networks for Smile Recognition\nAdapting Resilient Propagation for Deep 
Learning\nDirect high-order edge-preserving regularization for tomographic image  reconstruction\nFast Template Matching by Subsampled Circulant Matrix\nProjection Bank: From High-dimensional Data to Medium-length Binary  Codes\nA Parallel Framework for Parametric Maximum Flow Problems in Image  Segmentation\nImplicit Sparse Code Hashing\nCompressed Dynamic Mode Decomposition for Background Modeling\nDo Less and Achieve More: Training CNNs for Action Recognition Utilizing  Action Images from the Web\nExploiting Local Structures with the Kronecker Layer in Convolutional  Networks\nComputational Pathology: Challenges and Promises for Tissue Analysis\nB-spline Shape from Motion & Shading: An Automatic Free-form Surface  Modeling for Face Reconstruction\nGeometric-Algebra LMS Adaptive Filter and its Application to Rotation  Estimation\nFast Binary Embedding via Circulant Downsampled Matrix -- A  Data-Independent Approach\nVery Efficient Training of Convolutional Neural Networks using Fast  Fourier Transform and Overlap-and-Add\nA semi-automatic computer-aided method for surgical template design\nPreoperative Volume Determination for Pituitary Adenoma\nSigner-independent Fingerspelling Recognition with Deep Neural Network  Adaptation\nComputer Aided Restoration of Handwritten Character Strokes\nOn Study of the Binarized Deep Neural Network for Image Classification\nLearning Shapes by Convex Composition\nAdaptive Frequency Cepstral Coefficients for Word Mispronunciation  Detection\nDynamic Memory Networks for Visual and Textual Question Answering\nNested Invariance Pooling and RBM Hashing for Image Instance Retrieval\nStitching Stabilizer: Two-frame-stitching Video Stabilization for  Embedded Systems\nOn Fast Bilateral Filtering using Fourier Kernels\nSparse Activity and Sparse Connectivity in Supervised Learning\nDeep Embedding for Spatial Role Labeling\nFAST: A Framework to Accelerate Super-Resolution Processing on  Compressed Videos\nUnsupervised Understanding of 
Location and Illumination Changes in  Egocentric Videos\nAnalog Signal Processing Approach for Coarse and Fine Depth Estimation\nA metric on the space of finite sets of trajectories for evaluation of  multi-target tracking algorithms\nNot Just a Black Box: Learning Important Features Through Propagating  Activation Differences\nShaping the Future through Innovations: From Medical Imaging to  Precision Medicine\nReal-time Eye Gaze Direction Classification Using Convolutional Neural  Network\nopenXBOW - Introducing the Passau Open-Source Crossmodal Bag-of-Words  Toolkit\nA Deep Learning Approach to Block-based Compressed Sensing of Images\nVery Deep Convolutional Networks for Text Classification\nFast and Extensible Online Multivariate Kernel Density Estimation\nConvolutional Sketch Inversion\nUniversal Correspondence Network\ncltorch: a Hardware-Agnostic Backend for the Torch Deep Neural Network  Library, Based on OpenCL\nMultiplierless 16-point DCT Approximation for Low-complexity Image and  Video Coding\nSpatial Aggregation of Holistically-Nested Networks for Automated  Pancreas Segmentation\nOut-of-Sample Extension for Dimensionality Reduction of Noisy Time  Series\nAlternating Back-Propagation for Generator Network\nDe-Hashing: Server-Side Context-Aware Feature Reconstruction for Mobile  Visual Search\nResolution- and throughput-enhanced spectroscopy using high-throughput  computational slit\nVisualizing Natural Language Descriptions: A Survey\nFrom Collective Adaptive Systems to Human Centric Computation and Back:  Spatial Model Checking for Medical Imaging\nNetwork Trimming: A Data-Driven Neuron Pruning Approach towards  Efficient Deep Architectures\nMulti-modal image retrieval with random walk on multi-layer graphs\nSpatial probabilistic pulsatility model for enhancing  photoplethysmographic imaging systems\nFeatures Fusion for Classification of Logos\nSegmentation and Classification of Skin Lesions for Disease Diagnosis\nWarped Convolutions: Efficient 
Invariance to Spatial Transformations\nA Large Contextual Dataset for Classification, Detection and Counting of  Cars with Deep Learning\nMatrix Product State for Higher-Order Tensor Compression and  Classification\nContext Aware Nonnegative Matrix Factorization Clustering\nImage-to-Markup Generation with Coarse-to-Fine Attention\nFrom Multiview Image Curves to 3D Drawings\nDocument Image Coding and Clustering for Script Discrimination\nCharacterization of Lung Nodule Malignancy using Hybrid Shape and  Appearance Features\nNeural Photo Editing with Introspective Adversarial Networks\nThe face-space duality hypothesis: a computational model\nLow-complexity Image and Video Coding Based on an Approximate Discrete  Tchebichef Transform\nA Deep Spatial Contextual Long-term Recurrent Convolutional Network for  Saliency Detection\nProposal for Automatic License and Number Plate Recognition System for  Vehicle Identification\nVideo Depth-From-Defocus\nReal-time Joint Tracking of a Hand Manipulating an Object from RGB-D  Input\nImage Clustering without Ground Truth\nAnatomically Constrained Video-CT Registration via the V-IMLOP Algorithm\nDiversity Promoting Online Sampling for Streaming Video Summarization\nSliding Dictionary Based Sparse Representation For Action Recognition\nLearning Deep Embeddings with Histogram Loss\nRegularized Pel-Recursive Motion Estimation Using Generalized  Cross-Validation and Spatial Adaptation\nDetecting Moving Regions in CrowdCam Images\nTowards Interconnected Virtual Reality: Opportunities, Challenges and  Enablers\nDelugeNets: Deep Networks with Efficient and Flexible Cross-layer  Information Inflows\nPolyNet: A Pursuit of Structural Diversity in Very Deep Networks\nGenerative One-Class Models for Text-based Person Retrieval in Forensic  Applications\nFast low-level pattern matching algorithm\nNoiseOut: A Simple Way to Prune Neural Networks\nLearning the Number of Neurons in Deep Networks\nDeep Residual Learning for Compressed Sensing CT 
Reconstruction via  Persistent Homology Analysis\nDeep Tensor Convolution on Multicores\nPVANet: Lightweight Deep Neural Networks for Real-time Object Detection\nComputational Mapping of the Ground Reflectivity with Laser Scanners\nComputer Aided Detection of Oral Lesions on CT Images\nTraining Bit Fully Convolutional Network for Fast Semantic Segmentation\nA Large Deformation Diffeomorphic Approach to Registration of CLARITY  Images via Mutual Information\nAutomatic Lymphocyte Detection in H&E Images with Deep Neural Networks\nDevelopment of a Real-time Colorectal Tumor Classification System for  Narrow-band Imaging zoom-videoendoscopy\nA Stochastic Large Deformation Model for Computational Anatomy\nDeep Multi-instance Networks with Sparse Label Assignment for Whole  Mammogram Classification\nCross-Modal Manifold Learning for Cross-modal Retrieval\nEfficient Action Detection in Untrimmed Videos via Multi-Task Learning\nActive Learning and Proofreading for Delineation of Curvilinear  Structures\nBayesian Nonparametric Models for Synchronous Brain-Computer Interfaces\nOptimization on Product Submanifolds of Convolution Kernels\nLearning Word-Like Units from Joint Audio-Visual Analysis\nPruned non-local means\nSeeded Laplaican: An Eigenfunction Solution for Scribble Based  Interactive Image Segmentation\nTask-driven Visual Saliency and Attention-based Visual Question  Answering\nAutomatic segmentation of trees in dynamic outdoor environments\nFast Back-Projection for Non-Line of Sight Reconstruction\nA Computational Model of a Single-Photon Avalanche Diode Sensor for  Transient Imaging\nFast Gesture Recognition with Multiple Stream Discrete HMMs on 3D  Skeletons\nConvolutional Spike Timing Dependent Plasticity based Feature Learning  in Spiking Neural Networks\nA 3D Object Detection and Pose Estimation Pipeline Using RGB-D Images\nWeb-based visualisation of head pose and facial expressions changes:  monitoring human activity using depth data\nSurfNet: Generating 3D 
shape surfaces using deep residual networks\nLocal Patch Classification Based Framework for Single Image  Super-Resolution\nRobust Non-Rigid Registration With Reweighted Dual Sparsities\nCloud Radiative Effect Study Using Sky Camera\nFast Spectral Ranking for Similarity Search\nGenerative Adversarial Residual Pairwise Networks for One Shot Learning\nAdversarial Transformation Networks: Learning to Generate Adversarial  Examples\nSemantic-driven Generation of Hyperlapse from $360^\\circ$ Video\nChained Multi-stream Networks Exploiting Pose, Motion, and Appearance  for Action Classification and Detection\nA Computational Approach to Relative Aesthetics\nLearning Important Features Through Propagating Activation Differences\nEfficient Sparse Subspace Clustering by Nearest Neighbour Filtering\n3D seismic data denoising using two-dimensional sparse coding scheme\nRanking to Learn: Feature Ranking and Selection via Eigenvector  Centrality\nFast Generation for Convolutional Autoregressive Models\nSecond-order Temporal Pooling for Action Recognition\nA Deep Learning Perspective on the Origin of Facial Expressions\nCross-label Suppression: A Discriminative and Fast Dictionary Learning  with Group Regularization\nPhase recovery and holographic image reconstruction using deep learning  in neural networks\nSpatial Variational Auto-Encoding via Matrix-Variate Normal  Distributions\nOn Convergence and Stability of GANs\nGaze Distribution Analysis and Saliency Prediction Across Age Groups\nDeep Multi-instance Networks with Sparse Label Assignment for Whole  Mammogram Classification\nAdversarial Generation of Natural Language\nLearning Time/Memory-Efficient Deep Architectures with Budgeted Super  Networks\nSubmanifold Sparse Convolutional Networks\nYeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor\nCompression Fractures Detection on CT\nDistributed Hierarchical Control for State Estimation With Robotic  Sensor Networks\nShiftCNN: Generalized Low-Precision Architecture for 
Inference of  Convolutional Neural Networks\nLarge-Scale Plant Classification with Deep Neural Networks\nAccurate Pulmonary Nodule Detection in Computed Tomography Images Using  Deep Convolutional Neural Networks\nRobotic Ironing with 3D Perception and Force/Torque Feedback in  Household Environments\nUsing Convolutional Neural Networks in Robots with Limited Computational  Resources: Detecting NAO Robots while Playing Soccer\nGPGPU Acceleration of the KAZE Image Feature Extraction Algorithm\nClass-specific image denoising using importance sampling\nEvolving Spatially Aggregated Features from Satellite Imagery for  Regional Modeling\nResearch Opportunities and Visions for Smart and Pervasive Health\nDetection and Localization of Image Forgeries using Resampling Features  and Deep Learning\nTemporal HeartNet: Towards Human-Level Automatic Analysis of Fetal  Cardiac Screening Video\nData-Driven Sparse Structure Selection for Deep Neural Networks\nLike What You Like: Knowledge Distill via Neuron Selectivity Transfer\nA dataset for Computer-Aided Detection of Pulmonary Embolism in CTA  images\nSHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter  Optimization\nGPU-Accelerated Algorithms for Compressed Signals Recovery with  Application to Astronomical Imagery Deblurring\nInterleaved Group Convolutions for Deep Neural Networks\nSaltiNet: Scan-path Prediction on 360 Degree Images using Saliency  Volumes\nTracking as Online Decision-Making: Learning a Policy from Streaming  Videos with Reinforcement Learning\nSpeeding up the Köhler's method of contrast thresholding\nSlanted Stixels: Representing San Francisco's Steepest Streets\nOn Optimizing Distributed Tucker Decomposition for Dense Tensors\nFast Screening Algorithm for Rotation and Scale Invariant Template  Matching\nDeeply-Learned Part-Aligned Representations for Person Re-Identification\nAnisotropic EM Segmentation by 3D Affinity Learning and Agglomeration\nOcclusion Handling using Semantic 
Segmentation and Visibility-Based  Rendering for Mixed Reality\nSynthesis of Positron Emission Tomography (PET) Images via Multi-channel  Generative Adversarial Networks (GANs)\nExtremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM\nStatistics on the (compact) Stiefel manifold: Theory and Applications\nReal-time Deep Video Deinterlacing\nCNN Cascades for Segmenting Whole Slide Images of the Kidney\nWhat is the Role of Recurrent Neural Networks (RNNs) in an Image Caption  Generator?\nJointly Attentive Spatial-Temporal Pooling Networks for Video-based  Person Re-Identification\nMutual Visibility by Robots with Persistent Memory\nSemantic Video CNNs through Representation Warping\nImproved Fixed-Rank Nyström Approximation via QR Decomposition:  Practical and Theoretical Aspects\nLarge Batch Training of Convolutional Networks\nActive Orthogonal Matching Pursuit for Sparse Subspace Clustering\nPillar Networks++: Distributed non-parametric deep and wide networks\nAnytime Neural Network: a Versatile Trade-off Between Computation and  Accuracy\nSkip RNN: Learning to Skip State Updates in Recurrent Neural Networks\nStatistical Selection of CNN-Based Audiovisual Features for  Instantaneous Estimation of Human Emotional States\nSuper-Convergence: Very Fast Training of Residual Networks Using Large  Learning Rates\nModel based learning for accelerated, limited-view 3D photoacoustic  tomography\nIs human face processing a feature- or pattern-based task? 
Evidence  using a unified computational method driven by eye movements\nReal-time convolutional networks for sonar image classification in  low-power embedded systems\nA Novel Low-Complexity Framework in Ultra-Wideband Imaging for Breast  Cancer Detection\nCan you tell a face from a HEVC bitstream?\nAcceleration of Histogram-Based Contrast Enhancement via Selective  Downsampling\nViewpoint Invariant Action Recognition using RGB-D Videos\nUne véritable approche $\\ell_0$ pour l'apprentissage de dictionnaire\nNeural network identification of people hidden from view with a  single-pixel, single-photon detector\nGenerative learning for deep networks\nMuon Trigger for Mobile Phones\nNumerical optimization for Artificial Retina Algorithm\nTensor Product Generation Networks for Deep NLP Modeling\nEfficient Convolutional Neural Network For Audio Event Detection\nVIDOSAT: High-dimensional Sparsifying Transform Learning for Online  Video Denoising\nCalligraphic Stylisation Learning with a Physiologically Plausible Model  of Movement and Recurrent Neural Networks\nAnatomical Pattern Analysis for decoding visual stimuli in human brains\nDiffuserCam: Lensless Single-exposure 3D Imaging\nCAMREP- Concordia Action and Motion Repository\nVector Quantization using the Improved Differential Evolution Algorithm  for Image Compression\nReal-time marker-less multi-person 3D pose estimation in RGB-Depth  camera networks\nA Line-Point Unified Solution to Relative Camera Pose Estimation\nBlock DCT filtering using vector processing\nDeep Cropping via Attention Box Prediction and Aesthetics Assessment\nComputational ghost imaging using deep learning\nEnhanced Biologically Inspired Model for Image Recognition Based on a  Novel Patch Selection Method with Moment\nWhodunnit? 
Crime Drama as a Case for Natural Language Understanding\nReBNet: Residual Binarized Neural Network\nComputationally efficient cardiac views projection using 3D  Convolutional Neural Networks\nCurve Reconstruction via the Global Statistics of Natural Curves\nUnsupervised Learning of Geometry with Edge-aware Depth-Normal  Consistency\nA General Neural Network Hardware Architecture on FPGA\nParametric Manifold Learning Via Sparse Multidimensional Scaling\nGrammatical facial expression recognition using customized deep neural  network architecture\nMobile Video Object Detection with Temporally-Aware Feature Maps\nA Generalized Genetic Algorithm-Based Solver for Very Large Jigsaw  Puzzles of Complex Types\nTotal Variation-Based Dense Depth from Multi-Camera Array\nRobust Object Tracking Based on Self-adaptive Search Area\nBlockDrop: Dynamic Inference Paths in Residual Networks\nEfficient quantum circuit for singular value thresholding\nCondenseNet: An Efficient DenseNet using Learned Group Convolutions\nWSNet: Compact and Efficient Networks with Weight Sampling\nDeformation estimation of an elastic object by partial observation using  a neural network\nTensorFlow Distributions\nBudget-Aware Activity Detection with A Recurrent Policy Network\nReal-time Semantic Image Segmentation via Spatial Sparsity\nHuman activity recognition from mobile inertial sensors using recurrence  plots\nO-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis\nDesign Automation for Binarized Neural Networks: A Quantum Leap  Opportunity?\nAdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks\nTactics to Directly Map CNN graphs on Embedded FPGAs\nDetection and Attention: Diagnosing Pulmonary Lung Cancer from CT by  Imitating Physicians\nDeformable Classifiers\nTracking objects using 3D object proposals\nEnd-to-end weakly-supervised semantic alignment\nAnalysis of supervised and semi-supervised GrowCut applied to  segmentation of masses in mammography images\nDeep 
Learning with Lung Segmentation and Bone Shadow Exclusion  Techniques for Chest X-Ray Analysis of Lung Cancer\nBenchmarking Decoupled Neural Interfaces with Synthetic Gradients\nFacial emotion recognition using min-max similarity classifier\nTopological Tracking of Connected Components in Image Sequences\nReducing Deep Network Complexity with Fourier Transform Methods\nFOTS: Fast Oriented Text Spotting with a Unified Network\nArchitecture Based Classification of Leaf Images\nData Augmentation for Brain-Computer Interfaces: Analysis on  Event-Related Potentials Data\nDeep Stereo Matching with Explicit Cost Aggregation Sub-Architecture\nEmpirical Explorations in Training Networks with Discrete Activations\nTexT - Text Extractor Tool for Handwritten Document Transcription and  Annotation\nMobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC)  Perspective\nEffNet: An Efficient Structure for Convolutional Neural Networks\nStacked Filters Stationary Flow For Hardware-Oriented Acceleration Of  Deep Convolutional Neural Networks\nRobust Multi-subspace Analysis Using Novel Column L0-norm Constrained  Matrix Factorization\nFast and Accurate Reconstruction of Compressed Color Light Field\nInference, Learning and Attention Mechanisms that Exploit and Preserve  Sparsity in Convolutional Networks\nDual Recurrent Attention Units for Visual Question Answering\nDeep Convolutional Neural Networks for Breast Cancer Histology Image  Analysis\nRecent Advances in Efficient Computation of Deep Convolutional Neural  Networks\nAdviser Networks: Learning What Question to Ask for Human-In-The-Loop  Viewpoint Estimation\nHighly accurate model for prediction of lung nodule malignancy with CT  scans\nMultispectral Compressive Imaging Strategies using Fabry-Pérot  Filtered Sensors\n2D-Densely Connected Convolution Neural Networks for automatic Liver and  Tumor Segmentation\nFastNet\nUniversal Deep Neural Network Compression\nMiMatrix: A Massively Distributed Deep Learning Framework on 
a Petascale  High-density Heterogeneous Cluster\nAdvertising in the IoT Era: Vision and Challenges\nAn Optimized Architecture for Unpaired Image-to-Image Translation\nAnalyzing and Mitigating the Impact of Permanent Faults on a Systolic  Array Based Neural Network Accelerator\nComputer-Aided Knee Joint Magnetic Resonance Image Segmentation - A  Survey\nTeaching Machines to Code: Neural Markup Generation with Visual  Attention\nSecurity and Privacy Approaches in Mixed Reality: A Literature Survey\nOn Lyapunov exponents and adversarial perturbation\nLeast Square Error Method Robustness of Computation: What is not usually  considered and taught\nLocally Adaptive Learning Loss for Semantic Image Segmentation\nRecurrent Residual Module for Fast Inference in Videos\nSpeeding Up the Bilateral Filter: A Joint Acceleration Way\nA Neural Multi-sequence Alignment TeCHnique (NeuMATCH)\nCalcium Removal From Cardiac CT Images Using Deep Convolutional Neural  Network\nDeep-neural-network based sinogram synthesis for sparse-view CT image  reconstruction\nClassification based Grasp Detection using Spatial Transformer Network\nMIS-SLAM: Real-time Large Scale Dense Deformable SLAM System in Minimal  Invasive Surgery Based on Heterogeneous Computing\nWhere is my Device? 
- Detecting the Smart Device's Wearing Location in  the Context of Active Safety for Vulnerable Road Users\nInferencing Based on Unsupervised Learning of Disentangled  Representations\nCombining Multi-level Contexts of Superpixel using Convolutional Neural  Networks to perform Natural Scene Labeling\nTowards Image Understanding from Deep Compression without Decoding\nESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic  Segmentation\nLocal Binary Pattern Networks\nMultimodal Sentiment Analysis: Addressing Key Issues and Setting up  Baselines\nFisher Pruning of Deep Nets for Facial Trait Classification\nCSfM: Community-based Structure from Motion\nMerging and Evolution: Improving Convolutional Neural Networks for  Mobile Applications\nPrecision Sugarcane Monitoring Using SVM Classifier\nCascaded multi-scale and multi-dimension convolutional neural network  for stereo matching\nDemystifying Differentiable Programming: Shift/Reset the Penultimate  Backpropagator\nB-DCGAN:Evaluation of Binarized DCGAN for FPGA\nFast and Robust Subspace Clustering Using Random Projections\nLow-Latency Video Semantic Segmentation\nMulti-Scale Spatially-Asymmetric Recalibration for Image Classification\nBuilding Efficient CNN Architecture for Offline Handwritten Chinese  Character Recognition\nSemi-supervised multi-organ segmentation via multi-planar co-training\nAttention U-Net: Learning Where to Look for the Pancreas\nDiscovery and usage of joint attention in images\nSAMI: Service-Based Arbitrated Multi-Tier Infrastructure for Mobile  Cloud Computing\nA study on non-destructive method for detecting Toxin in pepper using  Neural networks\nComputing as compression: the SP theory of intelligence\nAnalysis of Farthest Point Sampling for Approximating Geodesics in a  Graph\nGPU Accelerated Fractal Image Compression for Medical Imaging in  Parallel Computing Platform\nThe Zen of Graduate-level Programming\nOpenHEC: A Framework for Application Programmers to Design FPGA-based 
 Systems\nDeveloping Autonomic Properties for Distributed Pattern-Recognition  Systems with ASSL: A Distributed MARF Case Study\nEfficient Low Dose X-ray CT Reconstruction through Sparsity-Based MAP  Modeling\nTraining Deep Networks with Structured Layers by Matrix Backpropagation\nGet More With Less: Near Real-Time Image Clustering on Mobile Phones\nA Distributed Deep Representation Learning Model for Big Image Data  Classification\nDeep Learning for Computational Chemistry\nLearning Generative Models with Sinkhorn Divergences\nSelf-paced Convolutional Neural Network for Computer Aided Detection in  Medical Imaging Analysis\nSCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial  Beauty Prediction\nThe Cyborg Astrobiologist: Scouting Red Beds for Uncommon Features with  Geological Significance\nUnified Structured Learning for Simultaneous Human Pose Estimation and  Garment Attribute Classification\nHierarchical Deep Learning Architecture For 10K Objects Classification\nConditional Deep Learning for Energy-Efficient and Enhanced Pattern  Recognition\nReal-time dynamics of lattice gauge theories with a few-qubit quantum  computer\nAdaptive Algorithm and Platform Selection for Visual Detection and  Tracking\nStage 4 validation of the Satellite Image Automatic Mapper lightweight  computer program for Earth observation Level 2 product generation, Part 1  Theory\nAggregated Wasserstein Metric and State Registration for Hidden Markov  Models\nThe magnitude of the effect of calf muscles fatigue on postural control  during bipedal quiet standing with vision depends on the eye-visual target  distance\nMOMCC: Market-Oriented Architecture for Mobile Cloud Computing Based on  Service Oriented Architecture\nSparse Spike Coding : applications of Neuroscience to the processing of  natural images\nDeep Learning for Detecting Robotic Grasps\nGeodesic-based Salient Object Detection\nOnline Unsupervised Feature Learning for Visual Tracking\nRevisiting loss-specific 
training of filter-based MRFs for image  restoration\nVirtual Windshields: Merging Reality and Digital Content to Improve the  Driving Experience\nComplex Events Recognition under Uncertainty in a Sensor Network\nDeep Convolutional Neural Fields for Depth Estimation from a Single  Image\nAdaptive Objectness for Object Tracking\nEnd-to-End Photo-Sketch Generation via Fully Convolutional  Representation Learning\nDeepDriving: Learning Affordance for Direct Perception in Autonomous  Driving\nObject-Proposal Evaluation Protocol is 'Gameable'\nA Multi-scale Multiple Instance Video Description Network\nModelling, Measuring and Compensating Color Weak Vision\nReSeg: A Recurrent Neural Network-based Model for Semantic Segmentation\nUnsupervised Image Segmentation using the Deffuant-Weisbuch Model from  Social Dynamics\nArticulated Hand Pose Estimation Review\nDeep Learning for Detecting Multiple Space-Time Action Tubes in Videos\nA Recursive Framework for Expression Recognition: From Web Images to  Deep Models to Game Dataset\nDeep Convolution Networks for Compression Artifacts Reduction\nA Comparative Study for the Weighted Nuclear Norm Minimization and  Nuclear Norm Minimization\nA Fuzzy Logic Based Certain Trust Model for E-Commerce\nThe standing pool of genomic structural variation in a natural  population of Mimulus guttatus\nA Manifesto for Semantic Model Differencing\nA simple coding for cross-domain matching with dimension reduction via  spectral graph embedding\nAppearance-based indoor localization: A comparison of patch descriptor  performance\nEnd-to-End Training of Deep Visuomotor Policies\nDEEP-CARVING: Discovering Visual Attributes by Carving Deep Neural Nets\nDeep Networks Can Resemble Human Feed-forward Vision in Invariant Object  Recognition\nA Comparison of High-Level Design Tools for SoC-FPGA on Disparity Map  Calculation Example\nMultivariate Median Filters and Partial Differential Equations\nA Multiresolution Clinical Decision Support System Based on 
Fractal  Model Design for Classification of Histological Brain Tumours\nEvent Specific Multimodal Pattern Mining with Image-Caption Pairs\nDiscriminative Training of Deep Fully-connected Continuous CRF with  Task-specific Loss\nNonlinearities and Adaptation of Color Vision from Sequential Principal  Curves Analysis\nDiscriminative Sparse Neighbor Approximation for Imbalanced Learning\nEfficient Multi-view Performance Capture of Fine-Scale Surface Detail\nModular Tracking Framework: A Unified Approach to Registration based  Tracking\nElastic Functional Coding of Riemannian Trajectories\nImage Captioning and Visual Question Answering Based on Attributes and  External Knowledge\nDescriptor transition tables for object retrieval using unconstrained  cluttered video acquired using a consumer level handheld mobile device\nThe Conditional Lucas & Kanade Algorithm\nLook-ahead before you leap: end-to-end active recognition by forecasting  the effect of motion\nAutomatic Selection of the Optimal Local Feature Detector\nA Systematic Evaluation and Benchmark for Person Re-Identification:  Features, Metrics, and Datasets\nDictionary Learning for Robotic Grasp Recognition and Detection\nCMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face  Detection\nDetection of concealed cars in complex cargo X-ray imagery using Deep  Learning\nSalient Region Detection and Segmentation in Images using Dynamic Mode  Decomposition\nA Machine learning approach for Shape From Shading\nEvent-based, 6-DOF Camera Tracking from Photometric Depth Maps\nVideo Registration in Egocentric Vision under Day and Night Illumination  Changes\nImage segmentation based on histogram of depth and an application in  driver distraction detection\nHMD Vision-based Teleoperating UGV and UAV for Hostile Environment using  Deep Learning\nLost and Found: Detecting Small Road Hazards for Self-Driving Vehicles\nFast and reliable stereopsis measurement at multiple distances with iPad\nMulti-view 
Self-supervised Deep Learning for 6D Pose Estimation in the  Amazon Picking Challenge\nA Rich Source of Labels for Deep Network Models of the Primate Dorsal  Visual Stream\nBayesian Modeling of Motion Perception using Dynamical Stochastic  Textures\nLearning a Deep Embedding Model for Zero-Shot Learning\nDeMeshNet: Blind Face Inpainting for Deep MeshFace Verification\nExamining the Impact of Blur on Recognition by Convolutional Networks\nMonocular 3D Human Pose Estimation In The Wild Using Improved CNN  Supervision\nSuperpixels: An Evaluation of the State-of-the-Art\nDetecting Unexpected Obstacles for Self-Driving Cars: Fusing Deep  Learning and Geometric Modeling\nShape Estimation from Defocus Cue for Microscopy Images via Belief  Propagation\n3D tracking of water hazards with polarized stereo cameras\nTemporal scale selection in time-causal scale space\nA Projected Gradient Descent Method for CRF Inference allowing  End-To-End Training of Arbitrary Pairwise Potentials\nTracking using Numerous Anchor points\nHow hard is it to cross the room? 
-- Training (Recurrent) Neural  Networks to steer a UAV\nLearning Deep Visual Object Models From Noisy Web Data: How to Make it  Work\nA Vision-based Scheme for Kinematic Model Construction of  Re-configurable Modular Robots\nReal-time 3D Human Tracking for Mobile Robots with Multisensors\nComparison of the Deep-Learning-Based Automated Segmentation Methods for  the Head Sectioned Images of the Virtual Korean Human Project\nUnpaired Image-to-Image Translation using Cycle-Consistent Adversarial  Networks\nHard Mixtures of Experts for Large Scale Weakly Supervised Vision\nProxy Templates for Inverse Compositional Photometric Bundle Adjustment\nSelf-Supervised Siamese Learning on Stereo Image Pairs for Depth  Estimation in Robotic Surgery\nHow a General-Purpose Commonsense Ontology can Improve Performance of  Learning-Based Image Retrieval\nAttention-based Natural Language Person Retrieval\nA Vision System for Multi-View Face Recognition\nTruly Multi-modal YouTube-8M Video Classification with Video, Audio, and  Text\nAdvanced Steel Microstructural Classification by Deep Learning Methods\nIndependent Motion Detection with Event-driven Cameras\nVision-based Detection of Acoustic Timed Events: a Case Study on  Clarinet Note Onsets\nAnalysis and Modeling of 3D Indoor Scenes\nEfficient Eye Typing with 9-direction Gaze Estimation\nClass-Weighted Convolutional Features for Visual Instance Search\nSingle-Shot Clothing Category Recognition in Free-Configurations with  Application to Autonomous Clothes Sorting\nVision-Based Fallen Person Detection for the Elderly\nPROBE: Predictive Robust Estimation for Visual-Inertial Navigation\nOn the Selective and Invariant Representation of DCNN for  High-Resolution Remote Sensing Image Recognition\nGPLAC: Generalizing Vision-Based Robotic Skills using Weakly Labeled  Images\nSemantic Instance Segmentation with a Discriminative Loss Function\nLearning Policies for Adaptive Tracking with Deep Feature Cascades\n3D Visual Perception for 
Self-Driving Cars using a Multi-Camera System:  Calibration, Mapping, Localization, and Obstacle Detection\nWeighted Low-rank Tensor Recovery for Hyperspectral Image Restoration\nVisual-textual Attention Driven Fine-grained Representation Learning\nBridge the Gap Between Group Sparse Coding and Rank Minimization via  Adaptive Dictionary Learning\nCapturing Localized Image Artifacts through a CNN-based Hyper-image  Representation\nUnsupervised Reverse Domain Adaptation for Synthetic Medical Images via  Adversarial Training\nCMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual  Generation\nVisual Question Answering as a Meta Learning Task\nTemporal 3D ConvNets: New Architecture and Transfer Learning for Video  Classification\nScene-Specific Pedestrian Detection Based on Parallel Vision\nThe Robust Manifold Defense: Adversarial Training using Generative  Models\nDeep Supervision with Intermediate Concepts\nGeneralizable Data-free Objective for Crafting Universal Adversarial  Perturbations\nGapFlyt: Active Vision Based Minimalist Structure-less Gap Detection For  Quadrotor Flight\nEV-FlowNet: Self-Supervised Optical Flow Estimation for Event-based  Cameras\nMoNet: Moments Embedding Network\nDriver Hand Localization and Grasp Analysis: A Vision-based Real-time  Approach\nIM2HEIGHT: Height Estimation from Single Monocular Imagery via Fully  Residual Convolutional-Deconvolutional Network\nSalientDSO: Bringing Attention to Direct Sparse Odometry\nA new stereo formulation not using pixel and disparity models\nCubic Range Error Model for Stereo Vision with Illuminators\nDigital Limits of Government: The Failure of E-Democracy\nImproving Transferability of Adversarial Examples with Input Diversity\nMonocular Depth Estimation by Learning from Heterogeneous Datasets\nSampleAhead: Online Classifier-Sampler Communication for Learning from  Synthesized Data\nImage Segmentation Using Subspace Representation and Sparse  Decomposition\nExpressway visibility estimation 
based on image entropy and piecewise  stationary time series analysis\nThe secret world of shrimps: polarisation vision at its best\nAutomated Pattern Detection--An Algorithm for Constructing Optimally  Synchronizing Multi-Regular Language Filters\nAlgorithms for Image Analysis and Combination of Pattern Classifiers  with Application to Medical Diagnosis\nFast space-variant elliptical filtering using box splines\nEvidence-Based Filters for Signal Detection: Application to Evoked Brain  Responses\nConstant-time filtering using shiftable kernels\nReal-time Image-based 6-DOF Localization in Large-Scale Environments\nSmoothed Analysis of Belief Propagation for Minimum-Cost Flow and  Matching\nNonlinear Dynamic Field Embedding: On Hyperspectral Scene Visualization\nFast non parametric entropy estimation for spatial-temporal saliency  method\nP-HGRMS: A Parallel Hypergraph Based Root Mean Square Algorithm for  Image Denoising\nOptimizing Auto-correlation for Fast Target Search in Large Search Space\nDesigning labeled graph classifiers by exploiting the Rényi entropy of  the dissimilarity representation\nAlgorithmic Analysis of Edge Ranking and Profiling for MTF Determination  of an Imaging System\nFast Computation of PERCLOS and Saccadic Ratio\nActive skeleton for bacteria modeling\nA Computational Model for Amodal Completion\nUnsupervised single-particle deep clustering via statistical manifold  learning\nLets keep it simple, Using simple architectures to outperform deeper and  more complex architectures\nNeural tuning size is a key factor underlying holistic face processing\nHSR: L1/2 Regularized Sparse Representation for Fast Face Recognition  using Hierarchical Feature Selection\nFast unsupervised Bayesian image segmentation with adaptive spatial  regularisation\nCascade Learning by Optimally Partitioning\nA generalized flow for multi-class and binary classification tasks: An  Azure ML approach\nClassification of Large-Scale Fundus Image Data Sets: A Cloud-Computing  
Framework\nDeep Action Sequence Learning for Causal Shape Transformation\nAutomated Resolution Selection for Image Segmentation\nApplication-Driven Near-Data Processing for Similarity Search\nMetaheuristic Algorithms for Convolution Neural Network\nNumerical Inversion of SRNF Maps for Elastic Shape Analysis of  Genus-Zero Surfaces\nA Distance Function for Comparing Straight-Edge Geometric Figures\nParsimonious Inference on Convolutional Neural Networks: Learning and  applying on-line kernel activation rules\nRenderMap: Exploiting the Link Between Perception and Rendering for  Dense Mapping\nAdaptive Neural Networks for Efficient Inference\nLarge-Scale Evolution of Image Classifiers\nHigh Accuracy Classification of Parkinson's Disease through Shape  Analysis and Surface Fitting in $^{123}$I-Ioflupane SPECT Imaging\nEspresso: Efficient Forward Propagation for BCNNs\nDevelopment of the SP machine\nImproved Bilinear Pooling with CNNs\nRecurrent Scale Approximation for Object Detection in CNN\nPerformance Analysis of Open Source Machine Learning Frameworks for  Various Parameters in Single-Threaded and Multi-Threaded Modes\nDistributed Deep Neural Networks over the Cloud, the Edge and End  Devices\nUI-Net: Interactive Artificial Neural Networks for Iterative Image  Segmentation Based on a User Model\nGenerating Reflectance Curves from sRGB Triplets\nTraining Simplification and Model Simplification for Deep Learning: A  Minimal Effort Back Propagation Method\nExposing Computer Generated Images by Using Deep Convolutional Neural  Networks\nRegularized Evolution for Image Classifier Architecture Search\nPRUNE: Dynamic and Decidable Dataflow for Signal Processing on  Heterogeneous Platforms\nHardware-Efficient Guided Image Filtering For Multi-Label Problem\nNotes on a New Philosophy of Empirical Science\nAcceleration of the shiftable O(1) algorithm for bilateral filtering and  non-local means\nA Comparative Study of Human thermal face recognition based on Haar  wavelet 
transform (HWT) and Local Binary Pattern (LBP)\nZoom Better to See Clearer: Human and Object Parsing with Hierarchical  Auto-Zoom Net\nLearning Local Image Descriptors with Deep Siamese and Triplet  Convolutional Networks by Minimising Global Loss Functions\nLearning Convolutional Neural Networks using Hybrid Orthogonal  Projection and Estimation\nYouTube-8M: A Large-Scale Video Classification Benchmark\nAutomatically tracking neurons in a moving and deforming brain\nPartial Procedural Geometric Model Fitting for Point Clouds\nAdversarially Tuned Scene Generation\nUnsupervised temporal context learning using convolutional neural  networks for laparoscopic workflow analysis\nBioplausible multiscale filtering in retino-cortical processing as a  mechanism in perceptual grouping\nSemantic3D.net: A new Large-scale Point Cloud Classification Benchmark\nLearning to Associate Words and Images Using a Large-scale Graph\nCapturing natural-colour 3D models of insects for species discovery\nMIT Autonomous Vehicle Technology Study: Large-Scale Deep Learning Based  Analysis of Driver Behavior and Interaction with Automation\nDecision-Based Adversarial Attacks: Reliable Attacks Against Black-Box  Machine Learning Models\nThe Theory of Unified Relativity for a Biovielectroluminescence  Phenomenon via Fly's Visual and Imaging System\nA New 2.5D Representation for Lymph Node Detection using Random Sets of  Deep Convolutional Neural Network Observations\nAn Even Faster and More Unifying Algorithm for Comparing Trees via  Unbalanced Bipartite Matchings\nProbabilistic Search for Object Segmentation and Recognition\nData Engineering for the Analysis of Semiconductor Manufacturing Data\nSegmentation, Indexing, and Visualization of Extended Instructional  Videos\nAnalysis and Interface for Instructional Video\nSupporting Dynamic Ad hoc Collaboration Capabilities\nThe Generalized Riemann or Henstock Integral Underpinning Multivariate  Data Analysis: Application to Faint Structure Finding 
in Price Processes\nUsing Stochastic Encoders to Discover Structure in Data\nInvariant Stochastic Encoders\nAdaptive Cluster Expansion (ACE): A Hierarchical Bayesian Network\nDistributed Regression in Sensor Networks: Training Distributively with  Alternating Projections\nAutomatic Face Recognition System Based on Local Fourier-Bessel Features\nFace Verification in Polar Frequency Domain: a Biologically Motivated  Approach\nMultilevel Thresholding for Image Segmentation through a Fast  Statistical Recursive Algorithm\nA third level trigger programmable on FPGA for the gamma/hadron  separation in a Cherenkov telescope using pseudo-Zernike moments and the SVM  classifier\nA kernel for time series based on global alignments\nA stochastic-variational model for soft Mumford-Shah segmentation\nLocal to Global Normalization Dynamic by Nonlinear Local Interactions\nN-Particle Dynamics of the Euler Equations for Planar Diffeomorphisms\nFeatures and dimensions: Motion estimation in fly vision\nClustering fetal heart rate tracings by compression\nEnhancement of Noisy Planar Nuclear Medicine Images using Mean Field  Annealing\nStructural Health Monitoring Using Neural Network Based Vibrational  System Identification\nLocal Area Damage Detection in Composite Structures Using Piezoelectric  Transducers\nCode Similarity on High Level Programs\nEfficient representation as a design principle for neural coding and  computation\nTheory and Applications of Two-dimensional, Null-boundary,  Nine-Neighborhood, Cellular Automata Linear rules\nNatural pseudo-distance and optimal matching between reduced size  functions\nAnalytic Torsion of a Bounded Generalized Cone\nEfficient Exact Inference in Planar Ising Models\nGraph-based classification of multiple observation sets\nExact Histogram Specification Optimized for Structural Similarity\nGradient-based adaptive interpolation in super-resolution image  restoration\nManaging Distributed MARF with SNMP\nTotal Variation, Adaptive Total 
Variation and Nonconvex Smoothly Clipped  Absolute Deviation Penalty for Denoising Blocky Images\nSegmentation of Facial Expressions Using Semi-Definite Programming and  Generalized Principal Component Analysis\nThe VOISE Algorithm: a Versatile Tool for Automatic Segmentation of  Astronomical Images\nMaximal digital straight segments and convergence of discrete geometric  estimators\nCombinatorial pyramids and discrete geometry for energy-minimizing  segmentation\nAutomatic Defect Detection and Classification Technique from Image: A  Special Case Using Ceramic Tiles\nEfficient IRIS Recognition through Improvement of Feature Extraction and  subset Selection\nMultiresolution Elastic Medical Image Registration in Standard Intensity  Scale\nFully Automatic 3D Reconstruction of Histological Images\nImage Sampling with Quasicrystals\nAutomatic local Gabor Features extraction for face recognition\nSide-channel attack on labeling CAPTCHAs\nFast adaptive elliptical filtering using box splines\nMACH: Fast Randomized Tensor Decompositions\nLow-rank Matrix Completion with Noisy Observations: a Quantitative  Comparison\nParallelization of the LBG Vector Quantization Algorithm for Shared  Memory Systems\nSeeing Science\nDesigning fuzzy rule based classifier using self-organizing feature map  for analysis of multispectral satellite images\nLand cover classification using fuzzy rules and aggregation of  contextual information through evidence theory\nA Model-Based Approach to Predicting Predator-Prey & Friend-Foe  Relationships in Ant Colonies\nFast Alternating Linearization Methods for Minimizing the Sum of Two  Convex Functions\nAn Unsupervised Algorithm For Learning Lie Group Transformations\nIncorporating characteristics of human creativity into an evolutionary  art algorithm\nFeatures Based Text Similarity Detection\n3D Skull Recognition Using 3D Matching Technique\nHybrid Medical Image Classification Using Association Rule Mining with  Decision Tree Algorithm\nGradient Based 
Seeded Region Grow method for CT Angiographic Image  Segmentation\nDetection and Demarcation of Tumor using Vector Quantization in MRI  images\nThreshold Based Indexing of Commercial Shoe Print to Create Reference  and Recovery Images\nSupervised Learning of Digital image restoration based on Quantization  Nearest Neighbor algorithm\nSupervised Classification Performance of Multispectral Images\nScalable Large-Margin Mahalanobis Distance Metric Learning\nA Unified Algorithmic Framework for Multi-Dimensional Scaling\nAn Offline Technique for Localization of License Plates for Indian  Commercial Vehicles\nInvestigation and Assessment of Disorder of Ultrasound B-mode Images\nA comparative study of different feature sets for recognition of  handwritten Arabic numerals using a Multi Layer Perceptron\nLand-cover Classification and Mapping for Eastern Himalayan State Sikkim\nLarge Margin Boltzmann Machines and Large Margin Sigmoid Belief Networks\nDevelopment of an automated Red Light Violation Detection System (RLVDS)  for Indian vehicles\nA novel scheme for binarization of vehicle images using hierarchical  histogram equalization technique\nExtended Two-Dimensional PCA for Efficient Face Representation and  Recognition\nA Robust Fuzzy Clustering Technique with Spatial Neighborhood  Information for Effective Medical Image Segmentation\nNew Clustering Algorithm for Vector Quantization using Rotation of Error  Vector\nSAR Image Segmentation using Vector Quantization Technique on Entropy  Images\nOffline Handwriting Recognition using Genetic Algorithm\nColor Image Compression Based On Wavelet Packet Best Tree\nSignature Region of Interest using Auto cropping\nAn Efficient Watermarking Algorithm to Improve Payload and Robustness  without Affecting Image Perceptual Quality\nMultistage Hybrid Arabic/Indian Numeral OCR System\nAn Efficient Vein Pattern-based Recognition System\nApplication Of Fuzzy System In Segmentation Of MRI Brain Tumor\nSolving Inverse Problems with 
Piecewise Linear Estimators: From Gaussian  Mixture Models to Structured Sparsity\nEfficient Region-Based Image Querying\nFusion of Daubechies Wavelet Coefficients for Human Face Recognition\nRegistration of Brain Images using Fast Walsh Hadamard Transform\nA Study on the Effectiveness of Different Patch Size and Shape for Eyes  and Mouth Detection\nAn Efficient Automatic Mass Classification Method In Digitized  Mammograms Using Artificial Neural Network\nHomotopy Perturbation Method for Image Restoration and Denoising\nAutomated Acanthamoeba polyphaga detection and computation of Salmonella  typhimurium concentration in spatio-temporal images\nRotation Invariant Face Detection Using Wavelet, PCA and Radial Basis  Function Networks\nHow to Extract the Geometry and Topology from Very Large 3D  Segmentations\nAn Embarrassingly Simple Speed-Up of Belief Propagation with Robust  Potentials\nVisual-hint Boundary to Segment Algorithm for Image Segmentation\nAxiomatic Digital Topology\nNearness to Local Subspace Algorithm for Subspace and Motion  Segmentation\nFast Color Quantization Using Weighted Sort-Means Clustering\nLesion Border Detection in Dermoscopy Images\nLow-Rank Matrix Approximation with Weights or Missing Data is NP-hard\nReal-time Visual Tracking Using Sparse Representation\nStochastic Vector Quantisers\nHarmonic Order Parameters for Characterizing Complex Particle  Morphologies\nTexture feature extraction in the spatial-frequency domain for  content-based image retrieval\nAffine-invariant geodesic geometry of deformable 3D shapes\nImproving the Performance of K-Means for Color Quantization\nApplication of Freeman Chain Codes: An Alternative Recognition Technique  for Malaysian Car Plates\nGroup Invariant Scattering\nSupport vector machines/relevance vector machine for remote sensing  classification: A review\nEfficient Independence-Based MAP Approach for Robust Markov Networks  Structure Discovery\nSmart depth of field optimization applied to a robotised 
view camera\nA General Framework for Development of the Cortex-like Visual Object  Recognition System: Waves of Spikes, Predictive Coding and Universal  Dictionary of Features\nMulti-task GLOH feature selection for human age estimation\nComputationally efficient algorithms for statistical image processing.  Implementation in R\nContinuous Multiclass Labeling Approaches and Algorithms\nAorta Segmentation for Stent Simulation\nAutomatic Open Space Area Extraction and Change Detection from High  Resolution Urban Satellite Images\nBenchmarking the Quality of Diffusion-Weighted Images\n\"Improved FCM algorithm for Clustering on Web Usage Mining\"\nOff-Line Handwritten Signature Retrieval using Curvelet Transforms\nDisconnected Skeleton: Shape at its Absolute Scale\nA Novel Image Segmentation Enhancement Technique based on Active Contour  and Topological Alignments\nNearest Prime Simplicial Complex for Object Recognition\nComparing Haar-Hilbert and Log-Gabor Based Iris Encoders on Bath Iris  Image Database\nSufficient Conditions for Low-rank Matrix Recovery, Translated from  Sparse Signal Recovery\nMorphological Reconstruction for Word Level Script Identification\nA Replica Inference Approach to Unsupervised Multi-Scale Image  Segmentation\nUnstructured Human Activity Detection from RGBD Images\nThe IHS Transformations Based Image Fusion\nAn Efficient Real Time Method of Fingertip Detection\nGender Recognition Based on Sift Features\nA new embedding quality assessment method for manifold learning\nHierarchical Object Parsing from Structured Noisy Point Clouds\nBiometric Authorization System using Gait Biometry\nDesign of an Optical Character Recognition System for Camera-based  Handheld Devices\nLearning Topic Models by Belief Propagation\nBeyond pixels and regions: A non local patch means (NLPM) method for  content-level restoration, enhancement, and reconstruction of degraded  document images\nNew Method for 3D Shape Retrieval\nEnhancement of Image Resolution by 
Binarization\nA New IRIS Normalization Process For Recognition System With  Cryptographic Techniques\nInvariant texture analysis through Local Binary Patterns\nWhy We Shouldn't Forget Multicast in Name-oriented Publish/Subscribe\nA Topic Modeling Toolbox Using Belief Propagation\nMinutiae Extraction from Fingerprint Images - a Review\nNegCut: Automatic Image Segmentation based on MRF-MAP\nA New Color Feature Extraction Method Based on Dynamic Color  Distribution Entropy of Neighborhoods\nOrganic Design of Massively Distributed Systems: A Complex Networks  Perspective\nOn the Lagrangian Biduality of Sparsity Minimization Problems\nTask-Driven Adaptive Statistical Compressive Sensing of Gaussian Mixture  Models\nAutomatic Clustering with Single Optimal Solution\nAn evaluation of local shape descriptors for 3D shape retrieval\nEfficient Web-based Facial Recognition System Employing 2DHOG\nRegularized Robust Coding for Face Recognition\nLocally Linear Embedding Clustering Algorithm for Natural Imagery\nHandwritten Bangla Alphabet Recognition using an MLP Based Classifier\nVideo Object Tracking and Analysis for Computer Assisted Surgery\nEnhancement of Images using Morphological Transformation\nSingle Reduct Generation Based on Relative Indiscernibility of Rough Set  Theory\nA Framework for Automated Cell Tracking in Phase Contrast Microscopic  Videos based on Normal Velocities\nAnalysis of Magnification in Depth from Defocus\nA New Approach to Speeding Up Topic Modeling\nPrincipal Component Analysis-Linear Discriminant Analysis Feature  Extractor for Pattern Recognition\nImage segmentation by adaptive distance based on EM algorithm\nA New Approach for Arabic Handwritten Postal Addresses Recognition\nAutomatic facial feature extraction and expression recognition based on  neural network\nUbiquitous WLAN/Camera Positioning using Inverse Intensity Chromaticity  Space-based Feature Detection and Matching: A Preliminary Result\nSpeech Recognition: Increasing Efficiency of 
Support Vector Machines\nMorphological Filtering in Shape Spaces: Applications using Tree-Based  Image Representations\nSpectral Analysis of Projection Histogram for Enhancing Close matching  character Recognition in Malayalam\nHajj and Umrah Event Recognition Datasets\nFuzzy - Rough Feature Selection With Π- Membership Function For  Mammogram Classification\nImage Filtering using All Neighbor Directional Weighted Pixels:  Optimization using Particle Swarm Optimization\nReal time facial expression recognition using a novel method\nDependence Maximizing Temporal Alignment via Squared-Loss Mutual  Information\nLeaf vein segmentation using Odd Gabor filters and morphological  operations\nConditional Sparse Coding and Grouped Multivariate Regression\nModeling Images using Transformed Indian Buffet Processes\nFixed-Form Variational Posterior Approximation through Stochastic Linear  Regression\nImproving neural networks by preventing co-adaptation of feature  detectors\nPolarimetric SAR Image Smoothing with Stochastic Distances\nA Fast Projected Fixed-Point Algorithm for Large Graph Matching\nNon-Local Euclidean Medians\nTracking Tetrahymena Pyriformis Cells using Decision Trees\nDimension Reduction by Mutual Information Feature Extraction\nHierarchical Approach for Total Variation Digital Image Inpainting\nQualitative Comparison of Community Detection Algorithms\nPenalty Constraints and Kernelization of M-Estimation Based Fuzzy  C-Means\nA New Training Algorithm for Kanerva's Sparse Distributed Memory\nRecklessly Approximate Sparse Coding\nPerformance Measurement and Method Analysis (PMMA) for Fingerprint  Reconstruction\nApproximating the Weil-Petersson Metric Geodesics on the Universal  Teichmüller space by Singular Solutions\nColor Image Compression Algorithm Based on the DCT Blocks\nComparative Study and Optimization of Feature-Extraction Techniques for  Content based Image Retrieval\nA two-stage denoising filter: the preprocessed Yaroslavsky filter\nA Session Based 
Blind Watermarking Technique within the NROI of Retinal  Fundus Images for Authentication Using DWT, Spread Spectrum and Harris Corner  Detection\nPerformance Analysis Of Neuro Genetic Algorithm Applied On Detecting  Proportion Of Components In Manhole Gas Mixture\nA Comparative Study of Efficient Initialization Methods for the K-Means  Clustering Algorithm\nWavelet Based Image Coding Schemes : A Recent Survey\nA Hajj And Umrah Location Classification System For Video Crowded Scenes\nImage Classification and Optimized Image Reproduction\nSpike Timing Dependent Competitive Learning in Recurrent Self Organizing  Pulsed Neural Networks Case Study: Phoneme and Word Recognition\nPlaceRaider: Virtual Theft in Physical Spaces with Smartphones\nNoise Influence on the Fuzzy-Linguistic Partitioning of Iris Code Space\nApproximate evaluation of marginal association probabilities with belief  propagation\nSparse Modeling of Intrinsic Correspondences\nDemosaicing and Superresolution for Color Filter Array via Residual  Image Reconstruction and Sparse Representation\nDiscrete geodesic calculus in the space of viscous fluidic objects\nSchrödinger Diffusion for Shape Analysis with Texture\nEvaluating Discussion Boards on BlackBoard as a Collaborative Learning  Tool A Students Survey and Reflections\nVariational time discretization of geodesic calculus\nDeveloping ICC Profile Using Gray Level Control In Offset Printing  Process\nMulti-input Multi-output Beta Wavelet Network: Modeling of Acoustic  Units for Speech Recognition\nTime Complexity Analysis of Binary Space Partitioning Scheme for Image  Compression\n3D Surface Reconstruction of Underwater Objects\nNF-SAVO: Neuro-Fuzzy system for Arabic Video OCR\nExact and Stable Recovery of Rotations for Robust Synchronization\nA Comparative Study of Gaussian Mixture Model and Radial Basis Function  for Voice Recognition\nA Non-Blind Watermarking Scheme for Gray Scale Images in Discrete  Wavelet Transform Domain using Two 
Subbands\nSketch Recognition using Domain Classification\nMatching Through Features and Features Through Matching\nSVD Based Image Processing Applications: State of The Art, Contributions  and Research Challenges\nCompressive Schlieren Deflectometry\nStratified SIFT Matching for Human Iris Recognition\nTime-Frequency Representation of Microseismic Signals using the  Synchrosqueezing Transform\nLip Localization and Viseme Classification for Visual Speech Recognition\nImage registration with sparse approximations in parametric dictionaries\nImage Denoising Using Interquartile Range Filter with Local Averaging\nKriging Interpolation Filter to Reduce High Density Salt and Pepper  Noise\nCooperative Environmental Monitoring for PTZ Visual Sensor Networks: A  Payoff-based Learning Approach\nA new bio-inspired method for remote sensing imagery classification\nComparision and analysis of photo image forgery detection techniques\nAdaptive Temporal Compressive Sensing for Video\nLearning Stable Multilevel Dictionaries for Sparse Representations\nGenetic Programming for Document Segmentation and Region Classification  Using Discipulus\nScale Selection of Adaptive Kernel Regression by Joint Saliency Map for  Nonrigid Image Registration\nOmega Model for Human Detection and Counting for application in Smart  Surveillance System\nSpatial Fuzzy C Means PET Image Segmentation of Neurodegenerative  Disorder\nGaussian Mixture Model for Handwritten Script Identification\nStatistical Texture Features based Handwritten and Printed Text  Classification in South Indian Documents\nDictionary learning based image enhancement for rarity detection\nHybridization of Otsu Method and Median Filter for Color Image  Segmentation\nSpeckle Noise Reduction in Medical Ultrasound Images\nRepairing and Inpainting Damaged Images using Diffusion Tensor\nHuman Mood Detection For Human Computer Interaction\nImage Optimization and Prediction\nNovel variational model for inpainting in the wavelet 
domain\nImage Inpainting by Kriging Interpolation Technique\nGeometric operations implemented by conformal geometric algebra neural  nodes\nSpeckle Reduction with Adaptive Stack Filters\n3D model retrieval using global and local radial distances\nPhyseter catodon localization by sparse coding\nNon-Correlated Character Recognition using Artificial Neural Network\nDiscriminative Training: Learning to Describe Video with Sentences, from  Video Described with Sentences\nCompressive Coded Aperture Keyed Exposure Imaging with Optical Flow  Reconstruction\nA Novel Active Contour Model for Texture Segmentation\nIncreasing Compression Ratio in PNG Images by k-Modulus Method for Image  Transformation\nExtending UML for Conceptual Modeling of Annotation of Medical Images\nMajor Limitations of Satellite images\nVideo Text Localization using Wavelet and Shearlet Transforms\nOnline Tracking Parameter Adaptation based on Evaluation\nVisual saliency estimation by integrating features using multiple kernel  learning\nBayesian Fusion of Multi-Band Images\nVeni Vidi Vici, A Three-Phase Scenario For Parameter Space Analysis in  Image Analysis and Visualization\nHigh-Accuracy Total Variation for Compressed Video Sensing\nEfficient binary tomographic reconstruction\nBoosting in Location Space\nA multi-stream hmm approach to offline handwritten arabic word  recognition\nThe Classification Accuracy of Multiple-Metric Learning Algorithm on  Multi-Sensor Fusion\nMultiple-object tracking in cluttered and crowded public spaces\nRotationally Invariant Image Representation for Viewing Direction  Classification in Cryo-EM\nPersonal Identification from Lip-Print Features using a Statistical  Model\nSingular Value Decomposition of Images from Scanned Photographic Plates\nA Robust Variational Model for Positive Image Deconvolution\nLinear Algorithm for Digital Euclidean Connected Skeleton\nImage Restoration using Total Variation with Overlapping Group Sparsity\nA novel sparsity and clustering 
regularization\nDevnagari Handwritten Numeral Recognition using Geometric Features and  Statistical Combination Classifier\nDetermination, Calculation and Representation of the Upper and Lower  Sealing Zones During Virtual Stenting of Aneurysms\nImprovement of Automatic Hemorrhages Detection Methods Using Shapes  Recognition\nSkin Segmentation based Elastic Bunch Graph Matching for efficient  multiple Face Recognition\nOn Convergent Finite Difference Schemes for Variational - PDE Based  Image Processing\nAn iterative algorithm for computed tomography image reconstruction from  limited-angle projections\nSmoothness-Constrained Image Recovery from Block-Based Random  Projections\nTOP-SPIN: TOPic discovery via Sparse Principal component INterference\nFast Tracking via Spatio-Temporal Context Learning\nOn a non-local spectrogram for denoising one-dimensional signals\nScientific Workflows and Provenance: Introduction and Research  Opportunities\nDictionary-Learning-Based Reconstruction Method for Electron Tomography\nOn the Design and Analysis of Multiple View Descriptors\nColor and Shape Content Based Image Classification using RBF Network and  PSO Technique: A Survey\nReal-time High Resolution Fusion of Depth Maps on GPU\nImproving Texture Categorization with Biologically Inspired Filtering\nAutomatic White Blood Cell Measuring Aid for Medical Diagnosis\nFeature Extraction of Human Lip Prints\nAn Approach: Modality Reduction and Face-Sketch Recognition\nBook embeddings of Reeb graphs\nFrom Maxout to Channel-Out: Encoding Information on Sparse Pathways\nClassifiers With a Reject Option for Early Time-Series Classification\nClustering using Vector Membership: An Extension of the Fuzzy C-Means  Algorithm\nRectifying Self Organizing Maps for Automatic Concept Learning from Web  Images\nNetwork In Network\nDropout improves Recurrent Neural Networks for Handwriting Recognition\nConstraint Reduction using Marginal Polytope Diagrams for MAP LP  Relaxations\nLearning 
High-level Image Representation for Image Retrieval via  Multi-Task DNN using Clickthrough Data\nComparative analysis of evolutionary algorithms for image enhancement\nUnsupervised Feature Learning by Deep Sparse Coding\nCompetitive Learning with Feedforward Supervisory Signal for Pre-trained  Multilayered Networks\nLearning Generative Models with Visual Attention\nA Review on Automated Brain Tumor Detection and Segmentation from MRI of  Brain\nLearning Paired-associate Images with An Unsupervised Deep Learning  Architecture\nDo Deep Nets Really Need to be Deep?\nSpectral Networks and Locally Connected Networks on Graphs\nIVSS Integration of Color Feature Extraction Techniques for Intelligent  Video Search Systems\nSpeech Recognition Front End Without Information Loss\nA Novel Scheme for Generating Secure Face Templates Using BDA\nLearning Temporal Logical Properties Discriminating ECG models of  Cardiac Arrhytmias\nA Novel Retinal Vessel Segmentation Based On Histogram Transformation  Using 2-D Morlet Wavelet and Supervised Classification\nRobust Hierarchical Clustering\nA Hybrid NN/HMM Modeling Technique for Online Arabic Handwriting  Recognition\nBrazilian License Plate Detection Using Histogram of Oriented Gradients  and Sliding Windows\nExperiments of Distance Measurements in a Foliage Plant Retrieval System\nStudy of Efficient Technique Based On 2D Tsallis Entropy For Image  Thresholding\nBrain Tumor Detection Based On Symmetry Information\nPainting Analysis Using Wavelets and Probabilistic Topic Models\nVideo Compressive Sensing for Dynamic MRI\nLearning Deep Face Representation\nShape-Based Plagiarism Detection for Flowchart Figures in Texts\nParallel WiSARD object tracker: a ram-based tracking system\nSpectral Clustering with Jensen-type kernels and their multi-point  extensions\nThe state of play of ASC-Inclusion: An Integrated Internet-Based  Environment for Social Inclusion of Children with Autism Spectrum Conditions\nClassroom Video Assessment and 
Retrieval via Multiple Instance Learning\nTheory and Application of Shapelets to the Analysis of Surface  Self-assembly Imaging\nPseudo-Zernike Based Multi-Pass Automatic Target Recognition From  Multi-Channel SAR\nA Compact Linear Programming Relaxation for Binary Sub-modular MRF\nReal-time Decolorization using Dominant Colors\nAlgorithm For Multi-Hand Finger Counting: An Easy Approach\nThoughts on a Recursive Classifier Graph: a Multiclass Network for Deep  Object Recognition\nCube-Cut: Vertebral Body Segmentation in MRI-Data through Cubic-Shaped  Divergences\nEfficient Semidefinite Branch-and-Cut for MAP-MRF Inference\nThe fshape framework for the variability analysis of functional shapes\nProximal Iteratively Reweighted Algorithm with Multiple Splitting for  Nonconvex Sparsity Optimization\nHuman Pose Estimation from RGB Input Using Synthetic Training Data\nUp and Away: A Cheap UAV Cyber-Physical Testbed (Work in Progress)\nAutomatic Annotation of Axoplasmic Reticula in Pursuit of Connectomes  using High-Resolution Neural EM Data\nImage Segmentation Using Frequency Locking of Coupled Oscillators\nGraph Matching: Relax at Your Own Risk\nRobust Fuzzy corner detector\nSemi-supervised Spectral Clustering for Classification\nAn enhanced neural network based approach towards object extraction\nA Bi-clustering Framework for Consensus Problems\nHuman Face as human single identity\nA Comparison of Nature Inspired Algorithms for Multi-threshold Image  Segmentation\nCortical spatio-temporal dimensionality reduction for visual grouping\nFrom Manifold to Manifold: Geometry-Aware Dimensionality Reduction for  SPD Matrices\nGeneralized Higher-Order Tensor Decomposition via Parallel ADMM\nDeep Networks with Internal Selective Attention through Feedback  Connections\nDeep Metric Learning for Practical Person Re-Identification\nAdaptive Image Denoising by Targeted Databases\nOptimized Method for Iranian Road Signs Detection and recognition system\nReal-Time and Efficient Method 
for Accuracy Enhancement of Edge Based  License Plate Recognition System\nNew Method for Optimization of License Plate Recognition system with Use  of Edge Detection and Connected Component\nA Robust and Efficient Method for Improving Accuracy of License Plate  Characters Recognition\nA Survey on Two Dimensional Cellular Automata and Its Application in  Image Processing\nMerging and Shifting of Images with Prominence Coefficient for  Predictive Analysis using Combined Image\nLow-rank SIFT: An Affine Invariant Feature for Place Recognition\nActive Sensing as Bayes-Optimal Sequential Decision Making\nBags of Affine Subspaces for Robust Object Tracking\nUnsupervised learning segmentation for dynamic speckle activity images\nObject Segmentation in Images using EEG Signals\nVideo In Sentences Out\nReal Time Fabric Defect Detection System on an Embedded DSP Platform\nRecognition of Handwritten Bangla Basic Characters and Digits using  Convex Hull based Feature Set\nExplain Images with Multimodal Recurrent Neural Networks\nRecognition of cDNA microarray image Using Feedforward artificial neural  network\nComputing Topology Preservation of RBF Transformations for  Landmark-Based Image Registration\nRefined Particle Swarm Intelligence Method for Abrupt Motion Tracking\nAn exact mapping between the Variational Renormalization Group and Deep  Learning\nEfficient Image Categorization with Sparse Fisher Vector\nHigh Order Structure Descriptors for Scene Images\nA two-pass fuzzy-geno approach to pattern classification\nBuilding pattern recognition applications with the SPARE library\nIris Biometric System using a hybrid approach\nExact Expression For Information Distance\nHigher-order MRFs based image super resolution: why not MAP?\nAbrupt Motion Tracking via Nearest Neighbor Field Driven Stochastic  Sampling\nA Short Image Series Based Scheme for Time Series Digital Image  Correlation\nOn the Covariance of ICP-based Scan-matching Techniques\nA hierarchical framework for object 
recognition\nDeepSentiBank: Visual Sentiment Concept Classification with Deep  Convolutional Neural Networks\nEntropy of Overcomplete Kernel Dictionaries\nDo Convnets Learn Correspondence?\nConvolutional Neural Network-based Place Recognition\nUnifying Visual-Semantic Embeddings with Multimodal Neural Language  Models\nAmoeba Techniques for Shape and Texture Analysis\nDeep Belief Network Training Improvement Using Elite Samples Minimizing  Free Energy\nEfficient and Accurate Approximations of Nonlinear Convolutional  Networks\nEfficient Object Localization Using Convolutional Networks\nFully Convolutional Neural Networks for Crowd Segmentation\nAttentional Neural Network: Feature Selection Using Cognitive Feedback\nLearning a Recurrent Visual Representation for Image Caption Generation\nVisual Sentiment Prediction with Deep Convolutional Neural Networks\nUnderstanding image representations by measuring their equivariance and  equivalence\nLearning to Generate Chairs, Tables and Cars with Convolutional Networks\nVirtual View Networks for Object Reconstruction\nAn Egocentric Look at Video Photographer Identity\nImage Super-Resolution Using Deep Convolutional Networks\nUnsupervised Feature Learning for Dense Correspondences across Scenes\nSparse Deep Stacking Network for Image Classification\nThe Quadrifocal Variety\nHard to Cheat: A Turing Test based on Answering Questions about Images\nLATCH: Learned Arrangements of Three Patch Codes\nVisual Analytics of Image-Centric Cohort Studies in Epidemiology\nReconstruction-free action inference from compressive imagers\nAutomatic Objects Removal for Scene Completion\nConstrained Extreme Learning Machines: A Study on Classification Cases\nParametric Image Segmentation of Humans with Structural Shape Priors\nThe Beauty of Capturing Faces: Rating the Quality of Digital Portraits\nSketch-a-Net that Beats Humans\nOverlapping and Non-overlapping Camera Layouts for Robot Pose Estimation\nSegmentation and Restoration of Images on 
Surfaces by Parametric Active  Contours with Topology Changes\nApproaching unstructured search from function bilateral symmetry  detection - A quantum algorithm\nAsk Your Neurons: A Neural-based Approach to Answering Questions about  Images\nShadow Optimization from Structured Deep Edge Detection\nFast Spectral Unmixing based on Dykstra's Alternating Projection\nObject detection via a multi-region & semantic segmentation-aware CNN  model\nThe structure of optimal parameters for image restoration problems\nExploring Models and Data for Image Question Answering\nBilevel approaches for learning of variational imaging models\nCOROLA: A Sequential Solution to Moving Object Detection Using Low-rank  Approximation\nA PCA-Based Convolutional Network\nTask-Based Optimization of Computed Tomography Imaging Systems\nAutomatic Facial Expression Recognition Using Features of Salient Facial  Patches\nRobust Facial Expression Classification Using Shape and Appearance  Features\nHarmonic Exponential Families on Manifolds\nGlobal Variational Method for Fingerprint Segmentation by Three-part  Decomposition\nFlickr30k Entities: Collecting Region-to-Phrase Correspondences for  Richer Image-to-Sentence Models\nMeasuring Visibility using Atmospheric Transmission and Digital Surface  Model\nAffine and Regional Dynamic Time Warpng\nExpresso : A user-friendly GUI for Designing, Training and Exploring  Convolutional Neural Networks\nSmooth and iteratively Restore: A simple and fast edge-preserving  smoothing model\nInner and Inter Label Propagation: Salient Object Detection in the Wild\nNew characterizations of minimum spanning trees and of saliency maps  based on quasi-flat zones\nTexture Synthesis Using Convolutional Neural Networks\nLike Partying? Your Face Says It All. 
Predicting the Ambiance of Places  with Profile Pictures\nVisual Search at Pinterest\nQuery by String word spotting based on character bi-gram indexing\nSalient Object Detection via Augmented Hypotheses\nDistributed image reconstruction for very large arrays in radio  astronomy\nLearning Better Encoding for Approximate Nearest Neighbor Search with  Dictionary Annealing\nEmulating short-term synaptic dynamics with memristive devices\nDouble-Base Asymmetric AdaBoost\nTowards Effective Codebookless Model for Image Classification\nUnsupervised Decision Forest for Data Clustering and Density Estimation\nAnalysis of the South Slavic Scripts by Run-Length Features of the Image  Texture\nClassification of Complex Wishart Matrices with a Diffusion-Reaction  System guided by Stochastic Distances\nHuman Gender Classification: A Review\nThe Cumulative Distribution Transform and Linear Pattern Classification\nDeep Fishing: Gradient Features from Deep Nets\nHuman Pose Estimation with Iterative Error Feedback\nMultimodal Deep Learning for Robust RGB-D Object Recognition\nA Study of Morphological Filtering Using Graph and Hypergraphs\nReal-time 2D/3D Registration via CNN Regression\nA Hyperelastic Two-Scale Optimization Model for Shape Matching\nSynapCountJ --- a Tool for Analyzing Synaptic Densities in Neurons\nDeep Learning for Single-View Instance Recognition\nMultilinear Map Layer: Prediction Regularization by Structural  Constraint\nAgglomerative clustering and collectiveness measure via exponent  generating function\nSecond order elastic metrics on the shape space of curves\nData Association for an Adaptive Multi-target Particle Filter Tracking  System\nLocal Higher-Order Statistics (LHS) describing images with statistics of  local non-binarized pixel patterns\nCalculating entropy at different scales among diverse communication  systems\nSentiCap: Generating Image Descriptions with Sentiments\nPredicting Daily Activities From Egocentric Images Using Deep Learning\nLearn to 
Evaluate Image Perceptual Quality Blindly from Statistics of  Self-similarity\nDo Deep Neural Networks Learn Facial Action Units When Doing Expression  Recognition?\nSemi-Automatic Segmentation of Autosomal Dominant Polycystic Kidneys  using Random Forests\nSeam Puckering Objective Evaluation Method for Sewing Process\nPan-Tilt Camera and PIR Sensor Fusion Based Moving Object Detection for  Mobile Security Robots\nCells in the Internet of Things\nPrivacy Prediction of Images Shared on Social Media Sites Using Deep  Features\nRATM: Recurrent Attentive Tracking Model\nHigh-Performance and Tunable Stereo Reconstruction\nUnderstanding symmetries in deep networks\nSymmetry-invariant optimization in deep networks\nGeneration and Comprehension of Unambiguous Object Descriptions\nExplicit Knowledge-based Reasoning for Visual Question Answering\nBayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder  Architectures for Scene Understanding\nOnline Principal Component Analysis in High Dimension: Which Algorithm  to Choose?\nShearlet-Based Detection of Flame Fronts\nLearning Human Identity from Motion Patterns\nMultimodal Skip-gram Using Convolutional Pseudowords\nDeep Gaussian Conditional Random Field Network: A Model-based Deep  Network for Discriminative Denoising\nSolving Jigsaw Puzzles with Linear Programming\nLearning Neural Network Architectures using Backpropagation\nReturn of Frustratingly Easy Domain Adaptation\nIdentifying the Absorption Bump with Deep Learning\nImage Question Answering using Convolutional Neural Network with Dynamic  Parameter Prediction\nDense Human Body Correspondences Using Convolutional Networks\nStochastic gradient method with accelerated stochastic dynamics\nSemi-supervised Learning for Convolutional Neural Networks via Online  Graph Construction\nRobust Convolutional Neural Networks under Adversarial Noise\nWhy M Heads are Better than One: Training a Diverse Ensemble of Deep  Networks\nOrder-Embeddings of Images and 
Language\nDelving Deeper into Convolutional Networks for Learning Video  Representations\nFast Metric Learning For Deep Neural Networks\nLearning to decompose for object detection and instance segmentation\nDirect Prediction of 3D Body Poses from Motion Compensated Sequences\nSemantic Diversity versus Visual Diversity in Visual Dictionaries\nMapping Images to Sentiment Adjective Noun Pairs with Factorized Neural  Nets\nGradual DropIn of Layers to Train Very Deep Neural Networks\nSceneNet: Understanding Real World Indoor Scenes With Synthetic Data\nCNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural  Networks on Android\nAn open access repository of images on plant health to enable the  development of mobile disease diagnostics\nA Short Survey on Data Clustering Algorithms\nHierarchical Invariant Feature Learning with Marginalization for Person  Re-Identification\nAutomated Alertness and Emotion Detection for Empathic Feedback During  E-Learning\nAdapting Models to Signal Degradation using Distillation\nAutomatic Annotation of Structured Facts in Images\nA Fully Convolutional Neural Network for Cardiac Segmentation in  Short-Axis MRI\nMulti-Bias Non-linear Activation in Deep Neural Networks\nLayer-wise Relevance Propagation for Neural Networks with Local  Renormalization Layers\nMarr Revisited: 2D-3D Alignment via Surface Normal Prediction\nTowards Bayesian Deep Learning: A Survey\nAutomatic Content-aware Non-Photorealistic Rendering of Images\nTrajectory Aligned Features For First Person Action Recognition\nSTD2P: RGBD Semantic Segmentation Using Spatio-Temporal Data-Driven  Pooling\nStatistics of RGBD Images\nBinarized Neural Networks on the ImageNet Classification Task\nHardware-oriented Approximation of Convolutional Neural Networks\nOrientation-boosted Voxel Nets for 3D Object Recognition\nVideo Description using Bidirectional Recurrent Neural Networks\nWhat do different evaluation metrics tell us about saliency models?\nDENSER Cities: A 
System for Dense Efficient Reconstructions of Cities\nA simple numeric algorithm for ancient coin dies identification\nPrecomputed Real-Time Texture Synthesis with Markovian Generative  Adversarial Networks\nTracking Human-like Natural Motion Using Deep Recurrent Neural Networks\nCNN-RNN: A Unified Framework for Multi-label Image Classification\nAutomatic Segmentation of Dynamic Objects from an Image Pair\nImproving Raw Image Storage Efficiency by Exploiting Similarity\nEstimating 3D Trajectories from 2D Projections via Disjunctive Factored  Four-Way Conditional Restricted Boltzmann Machines\nHierarchical Deep Reinforcement Learning: Integrating Temporal  Abstraction and Intrinsic Motivation\nAnalysis of the Entropy-guided Switching Trimmed Mean Deviation-based  Anisotropic Diffusion filter\nA New Approach in Persian Handwritten Letters Recognition Using Error  Correcting Output Coding\nDASC: Robust Dense Descriptor for Multi-modal and Multi-spectral  Correspondence Estimation\nCrafting Adversarial Input Sequences for Recurrent Neural Networks\nArtistic style transfer for videos\nMysteries of Visual Experience\nDeep FisherNet for Object Classification\nHyperparameter Transfer Learning through Surrogate Alignment for  Efficient Deep Neural Network Training\nModeling Context in Referring Expressions\nFast and robust pushbroom hyperspectral imaging via DMD-based scanning\nAccelerating the Super-Resolution Convolutional Neural Network\nIdentification of repeats in DNA sequences using nucleotide distribution  uniformity\nGlobal Vertices and the Noising Paradox\nAutonomous Grounding of Visual Field Experience through Sensorimotor  Prediction\nLanguage free character recognition using character sketch and center of  gravity shifting\nAn efficient iterative thresholding method for image segmentation\nIdentifying Metastases in Sentinel Lymph Nodes with Deep Convolutional  Neural Networks\nSparse Subspace Clustering via Diffusion Process\nOnionNet: Sharing Features in 
Cascaded Deep Classifiers\nAutomatic text extraction and character segmentation using maximally  stable extremal regions\nSolving Visual Madlibs with Multiple Cues\nMulti-View Product Image Search Using Deep ConvNets Representations\nSpeech Signal Analysis for the Estimation of Heart Rates Under Different  Emotional States\nApplying Deep Learning to Basketball Trajectories\nDeepDiary: Automatic Caption Generation for Lifelogging Image Streams\nRecurrent Fully Convolutional Neural Networks for Multi-slice MRI  Cardiac Segmentation\nBeyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image  Denoising\nA Riemannian Network for SPD Matrix Learning\nIntrinsic Light Field Images\nAnomaly detection and classification for streaming data using PDEs\nSenTion: A framework for Sensing Facial Expressions\nMedical image denoising using convolutional denoising autoencoders\nLarge Angle based Skeleton Extraction for 3D Animation\nVoxResNet: Deep Voxelwise Residual Networks for Volumetric Brain  Segmentation\nLocal Binary Convolutional Neural Networks\nA Delay-Tolerant Potential-Field-Based Network Implementation of an  Integrated Navigation System\nFailure Detection for Facial Landmark Detectors\nA Non-Local Conventional Approach for Noise Removal in 3D MRI\nA Convolutional Neural Network Approach for Post-Processing in HEVC  Intra Coding\nA Novel Approach for Shot Boundary Detection in Videos\nKullback-Leibler Penalized Sparse Discriminant Analysis for  Event-Related Potential Classification\nAn Octree-Based Approach towards Efficient Variational Range Data Fusion\nA Fast Ellipse Detector Using Projective Invariant Pruning\nLearning to generalize to new compositions in image understanding\nVisual Question: Predicting If a Crowd Will Agree on the Answer\nTemporal Convolutional Networks: A Unified Approach to Action  Segmentation\nAmerican Sign Language fingerspelling recognition from video: Methods  for unrestricted recognition and signer-independence\nMulti-Class 
Multi-Object Tracking using Changing Point Detection\nTraining Deep Spiking Neural Networks using Backpropagation\nTowards Transparent AI Systems: Interpreting Visual Question Answering  Models\nAn Automated Size Recognition Technique for Acetabular Implant in Total  Hip Replacement\nRetrieval and Clustering from a 3D Human Database based on Body and Head  Shape\nBenchmarks, Performance Evaluation and Contests for 3D Shape Retrieval\nPreprocessing for Automating Early Detection of Cervical Cancer\nImage Splicing Detection Using Inherent Lens Radial Distortion\nNeural Networks for Emotion Classification\nIncremental Top-k List Comparison Approach to Robust Multi-Structure  Model Fitting\nLearning image transformations without training examples\nRobust artificial neural networks and outlier detection. Technical  report\nEclectic Extraction of Propositional Rules from Neural Networks\nLinearized Additive Classifiers\nRotation, Scaling and Translation Analysis of Biometric Signature  Templates\nA Comparative Experiment of Several Shape Methods in Recognizing Plants\nThe Generalized A* Architecture\nStudying Satellite Image Quality Based on the Fusion Techniques\nHand Tracking based on Hierarchical Clustering of Range Data\nOn B-spline framelets derived from the unitary extension principle\nOnline Adaptive Statistical Compressed Sensing of Gaussian Mixture  Models\nAutomated PolyU Palmprint sample Registration and Coarse Classification\nMultiscale Hybrid Non-local Means Filtering Using Modified Similarity  Measure\nHiding Image in Image by Five Modulus Method for Image Steganography\nAutomatic Fingerprint Recognition Using Minutiae Matching Technique for  the Large Fingerprint Database\nMultispectral Spatial Characterization: Application to Mitosis Detection  in Breast Cancer Histopathology\nSpeckle Reduction in Polarimetric SAR Imagery with Stochastic Distances  and Nonlocal Means\nTracking of Fingertips and Centres of Palm using KINECT\nRobust Noise Filtering in Image 
Sequences\nA Joint Intensity and Depth Co-Sparse Analysis Model for Depth Map  Super-Resolution\nColor image denoising by chromatic edges based vector valued diffusion\nk-Modulus Method for Image Transformation\nFractal-Based Detection of Microcalcification Clusters in Digital  Mammograms\nSparse arrays of signatures for online character recognition\nMultimodal Approach for Video Surveillance Indexing and Retrieval\nLearning Features and their Transformations by Spatial and Temporal  Spherical Clustering\nAn interactive engine for multilingual video browsing using semantic  content\nInfluences Combination of Multi-Sensor Images on Classification Accuracy\nCategorizing ancient documents\nA proposition of a robust system for historical document images  indexation\nA Synergistic Approach for Recovering Occlusion-Free Textured 3D Maps of  Urban Facades from Heterogeneous Cartographic Data\nA New Algorithm of Speckle Filtering using Stochastic Distances\nDeeply Coupled Auto-encoder Networks for Cross-view Classification\nImage Search Reranking\nZero-bias autoencoders and the benefits of co-adapting features\nApplication of the Ring Theory in the Segmentation of Digital Images\nAnisotropic Mesh Adaptation for Image Representation\nExploiting Two-Dimensional Group Sparsity in 1-Bit Compressive Sensing\nStructure Tensor Based Image Interpolation Method\nActive spline model: A shape based model-interactive segmentation\n$l_1$-regularized Outlier Isolation and Regression\nBeyond $χ^2$ Difference: Learning Optimal Metric for Boundary  Detection\nTraining Convolutional Networks with Noisy Labels\nDeep Epitomic Convolutional Neural Networks\nModelling, Visualising and Summarising Documents with a Single  Convolutional Neural Network\nMass Classification Method in Mammogram Using Fuzzy K-Nearest Neighbour  Equality\nMRF-based Background Initialisation for Improved Foreground Detection in  Cluttered Surveillance Videos\nDeep Fragment Embeddings for Bidirectional Image Sentence 
Mapping\nFast, Robust and Non-convex Subspace Recovery\nIncorporating Near-Infrared Information into Semantic Image Segmentation\nFace Identification with Second-Order Pooling\nAdaptive texture energy measure method\nPersistent Homology in Sparse Regression and Its Application to Brain  Morphometry\nEnforcing Label and Intensity Consistency for IR Target Detection\n10,000+ Times Accelerated Robust Subset Selection (ARSS)\nDeeply-Supervised Nets\nSolving the Maximum-Weight Connected Subgraph Problem to Optimality\nDomain Adaptive Neural Networks for Object Recognition\nSpatially-sparse convolutional neural networks\nOn The Power of Joint Wavelet-DCT Features for Multispectral Palmprint  Recognition\nCombining human and machine learning for morphological analysis of  galaxy images\nMultiple Instance Reinforcement Learning for Efficient Weakly-Supervised  Detection in Images\nSimple Two-Dimensional Object Tracking based on a Graph Algorithm\nEvent Retrieval Using Motion Barcodes\nTextural Approach for Mass Abnormality Segmentation in Mammographic  Images\nParsing Occluded People by Flexible Compositions\nFisher Kernel for Deep Neural Activations\nBayesian Image Restoration for Poisson Corrupted Image using a Latent  Variational Method with Gaussian MRF\nJoint Segmentation and Deconvolution of Ultrasound Images Using a  Hierarchical Bayesian Model based on Generalized Gaussian Priors\nCancer Detection with Multiple Radiologists via Soft Multiple Instance  Logistic Regression and $L_1$ Regularization\nBrain Tumor Detection Based on Bilateral Symmetry Information\nDescriptor Ensemble: An Unsupervised Approach to Descriptor Fusion in  the Homography Space\nFixed Point Algorithm Based on Quasi-Newton Method for Convex  Minimization Problem with Application to Image Deblurring\nAutomatic video scene segmentation based on spatial-temporal clues and  rhythm\nTranslating Videos to Natural Language Using Deep Recurrent Neural  Networks\nDiscovering beautiful attributes for 
aesthetic image analysis\nEfficient GPU Implementation for Single Block Orthogonal Dictionary  Learning\nTowards Deep Neural Network Architectures Robust to Adversarial Examples\nLocally Scale-Invariant Convolutional Neural Networks\nAre We Ready for Driver-less Vehicles? Security vs. Privacy- A Social  Perspective\nImage enhancement using the mean dynamic range maximization with  logarithmic operations\nCompressing Deep Convolutional Networks using Vector Quantization\nLearning to Segment Moving Objects in Videos\nDiscovering Hidden Factors of Variation in Deep Networks\nStriving for Simplicity: The All Convolutional Net\nLearning Activation Functions to Improve Deep Neural Networks\nContour Detection Using Cost-Sensitive Convolutional Neural Networks\nHalf-CNN: A General Framework for Whole-Image Regression\nMulti-modal Sensor Registration for Vehicle Perception via Deep Neural  Networks\nTraining deep neural networks with low precision multiplications\nFully Convolutional Multi-Class Multiple Instance Learning\nLearning Compact Convolutional Neural Networks with Nested Dropout\nConvolutional Neural Networks for joint object detection and pose  estimation: A comparative study\nEnhancing fractal descriptors on images by combining boundary and  interior of Minkowski dilation\nTexture analysis by multi-resolution fractal descriptors\nDisjunctive Normal Networks\nMax-Margin Object Detection\nPose and Shape Estimation with Discriminatively Learned Parts\nA Class of DCT Approximations Based on the Feig-Winograd Algorithm\nDeep Boosting: Layered Feature Mining for General Image Classification\nRecognizing Focal Liver Lesions in Contrast-Enhanced Ultrasound with  Discriminatively Trained Spatio-Temporal Model\nTowards a Practical Architecture for the Next Generation Internet of  Things\nUnsupervised Fusion Weight Learning in Multiple Classifier Systems\nDelving Deep into Rectifiers: Surpassing Human-Level Performance on  ImageNet Classification\nReflectance Hashing for 
Material Recognition\nA Survey on Hough Transform, Theory, Techniques and Applications\nA HMAX with LLC for visual recognition\nFast Fusion of Multi-Band Images Based on Solving a Sylvester Equation\nImage denoising based on improved data-driven sparse representation\nLarge-Scale Deep Learning on the YFCC100M Dataset\nPhrase-based Image Captioning\nFusion of Image Segmentation Algorithms using Consensus Clustering\nApplication of Independent Component Analysis Techniques in Speckle  Noise Reduction of Retinal OCT Images\nSpike Event Based Learning in Neural Networks\nEvaluation of Deep Convolutional Nets for Document Image Classification  and Retrieval\nPuzzle Imaging: Using Large-scale Dimensionality Reduction Algorithms  for Localization\nImage Segmentation in Liquid Argon Time Projection Chamber Detector\nGenerating Multi-Sentence Lingual Descriptions of Indoor Scenes\nMacroblock Classification Method for Video Applications Involving  Motions\nDeep Transfer Network: Unsupervised Domain Adaptation\nGrouping and Recognition of Dot Patterns with Straight Offset Polygons\nVideo-Based Facial Expression Recognition Using Local Directional Binary  Pattern\nBand selection in RKHS for fast nonlinear unmixing of hyperspectral  images\nRepresentation Learning with Deep Extreme Learning Machines for  Efficient Image Set Classification\nGlobal 6DOF Pose Estimation from Untextured 2D City Models\nDeep Hierarchical Parsing for Semantic Segmentation\nDeep Convolutional Inverse Graphics Network\nAdaptive-Rate Sparse Signal Reconstruction With Application in  Compressive Background Subtraction\nConvolutional Neural Network Architectures for Matching Natural Language  Sentences\nDiverse Landmark Sampling from Determinantal Point Processes for  Scalable Manifold Learning\nTraining Binary Multilayer Neural Networks for Image Classification  using Expectation Backpropagation\nSingle image super-resolution by approximated Heaviside functions\nSparse Code Formation with Linear 
Inhibition\nA Dictionary-based Approach for Estimating Shape and Spatially-Varying  Reflectance\nLiSens --- A Scalable Architecture for Video Compressive Sensing\nMetric Localization using Google Street View\nStatistical Analysis of Loopy Belief Propagation in Random Fields\nEnhanced Image Classification With a Fast-Learning Shallow Convolutional  Neural Network\nReal-time Dynamic MRI Reconstruction using Stacked Denoising Autoencoder\nA Comparative Analysis of Tensor Decomposition Models Using Hyper  Spectral Image\nDiscriminative Bayesian Dictionary Learning for Classification\nFast Optimal Transport Averaging of Neuroimaging Data\nGlobally Tuned Cascade Pose Regression via Back Propagation with  Application in 2D Face Pose Estimation and Heart Segmentation in 3D CT Images\nMicrosoft COCO Captions: Data Collection and Evaluation Server\nFast algorithms for morphological operations using run-length encoded  binary images\nRobust real time face recognition and tracking on gpu using fusion of  rgb and depth image\nA Multiphase Image Segmentation Based on Fuzzy Membership Functions and  L1-norm Fidelity\nCar that Knows Before You Do: Anticipating Maneuvers via Learning  Temporal Driving Models\nEfficient Scene Text Localization and Recognition with Local Character  Refinement\nText Localization in Video Using Multiscale Weber's Local Descriptor\nApplication of Enhanced-2D-CWT in Topographic Images for Mapping  Landslide Risk Areas\nLocal Variation as a Statistical Hypothesis Test\nDeviation Based Pooling Strategies For Full Reference Image Quality  Assessment\nFast Dictionary Matching for Content-based Image Retrieval\nCascaded Sparse Spatial Bins for Efficient and Effective Generic Object  Detection\nBecoming the Expert - Interactive Multi-Class Machine Teaching\nProbabilistic Depth Image Registration incorporating Nonvisual  Information\nHierarchical Subquery Evaluation for Active Learning on a Graph\nPerforatedCNNs: Acceleration through Elimination of Redundant  
Convolutions\nAn Open Source Testing Tool for Evaluating Handwriting Input Methods\nVisual Madlibs: Fill in the blank Image Generation and Question  Answering\nPredicting Deep Zero-Shot Convolutional Neural Networks using Textual  Descriptions\nA Riemannian low-rank method for optimization over semidefinite matrices  with block-diagonal constraints\nVisualizing and Understanding Neural Models in NLP\nRecognition of Changes in SAR Images Based on Gauss-Log Ratio and MRFFCM\nThe Long-Short Story of Movie Description\nBeyond Temporal Pooling: Recurrence and Temporal Convolutions for  Gesture Recognition in Video\nCirculant temporal encoding for video retrieval and temporal alignment\nInverting Visual Representations with Convolutional Networks\nScheduled Sampling for Sequence Prediction with Recurrent Neural  Networks\nUnveiling the Dreams of Word Embeddings: Towards Language-Driven Image  Generation\nTowards Benchmarking Scene Background Initialization\nParseNet: Looking Wider to See Better\nUsing Hankel Matrices for Dynamics-based Facial Emotion Recognition and  Pain Detection\nDeep Convolutional Networks on Graph-Structured Data\nCFORB: Circular FREAK-ORB Visual Odometry\nLearning with a Wasserstein Loss\nPoint-wise Map Recovery and Refinement from Functional Correspondence\nAn Open Science Platform for the Next Generation of Data\nCrowd Flow Segmentation in Compressed Domain using CRF\nmoco: Fast Motion Correction for Calcium Imaging\nCO2 Forest: Improved Random Forest by Continuous Optimization of Oblique  Splits\nAdaptive Digital Scan Variable Pixels\nIncremental RANSAC for Online Relocation in Large Dynamic Environments\nNatural Scene Recognition Based on Superpixels and Deep Boltzmann  Machines\nUnshredding of Shredded Documents: Computational Framework and  Implementation\nParallel Multi-Dimensional LSTM, With Application to Fast Biomedical  Volumetric Image Segmentation\nDeepMatching: Hierarchical Deformable Dense Matching\nNonnegative Matrix Factorization 
applied to reordered pixels of single  images based on patches to achieve structured nonnegative dictionaries\nDeep-Plant: Plant Identification with convolutional neural networks\nVariational Inference for Background Subtraction in Infrared Imagery\nRecognition of Emotions using Kinects\nSemantic Pose using Deep Networks Trained on Synthetic RGB-D\nAutomatic 3D Liver Segmentation Using Sparse Representation of Global  and Local Image Information via Level Set Formulation\nBorobudur was Built Algorithmically\nMobile-Based Experience Sampling for Behaviour Research\nTowards universal neural nets: Gibbs machines and ACE\nChebyshev and Conjugate Gradient Filters for Graph Image Denoising\nJoint Color-Spatial-Directional clustering and Region Merging (JCSD-RM)  for unsupervised RGB-D image segmentation\nAn Approach to the Analysis of the South Slavic Medieval Labels Using  Image Texture\nA New Low-Rank Tensor Model for Video Completion\nA Dual Fast and Slow Feature Interaction in Biologically Inspired Visual  Recognition of Human Action\nAnalyzing structural characteristics of object category representations  from their semantic-part distributions\nMedical Image Classification via SVM using LBP Features from  Saliency-Based Folded Data\nLinearized Kernel Dictionary Learning\nModelling Uncertainty in Deep Learning for Camera Relocalization\nTelugu OCR Framework using Deep Learning\nDeep Convolutional Features for Image Based Retrieval and Scene  Categorization\nNew Fuzzy LBP Features for Face Recognition\nHyper-Fisher Vectors for Action Recognition\nCompression of Deep Neural Networks on the Fly\nOnline Object Tracking with Proposal Selection\nTowards Dropout Training for Convolutional Neural Networks\nOn Optical Flow Models for Variational Motion Estimation\nLabeling the Features Not the Samples: Efficient Video Classification  with Minimal Supervision\nPredicting and visualizing psychological attributions with a deep neural  network\nMax-Pooling Dropout for 
Regularization of Convolutional Neural Networks\nFixation prediction with a combined model of bottom-up saliency and  vanishing point\nClustering by Deep Nearest Neighbor Descent (D-NND): A Density-based  Parameter-Insensitive Clustering Method\nIn-situ multi-scattering tomography\nDeep Exemplar 2D-3D Detection by Adapting from Real to Rendered Views\nAffinity CNN: Learning Pixel-Centric Pairwise Relations for  Figure/Ground Embedding\nMovieQA: Understanding Stories in Movies through Question-Answering\nNeural Self Talk: Image Understanding via Continuous Questioning and  Answering\nDeep Relative Attributes\nOn non-iterative training of a neural classifier\nContext Driven Label Fusion for segmentation of Subcutaneous and  Visceral Fat in CT Volumes\nMultiple penalized principal curves: analysis and computation\nLocal and global gestalt laws: A neurally based spectral approach\nTransformed Residual Quantization for Approximate Nearest Neighbor  Search\nDeep Learning with S-shaped Rectified Linear Activation Units\nAdaptive Object Detection Using Adjacency and Zoom Prediction\nA Combined Deep-Learning and Deformable-Model Approach to Fully  Automatic Segmentation of the Left Ventricle in Cardiac MRI\nPart-Stacked CNN for Fine-Grained Visual Categorization\nGraph entropies in texture segmentation of images\nRobust Scene Text Recognition Using Sparse Coding based Features\nGPU-Based Fuzzy C-Means Clustering Algorithm for Image Segmentation\nSusceptibility of texture measures to noise: an application to lung  tumor CT images\nDeep Neural Networks predict Hierarchical Spatio-temporal Cortical  Dynamics of Human Visual Object Recognition\nMulti-Atlas Segmentation with Joint Label Fusion of Osteoporotic  Vertebral Compression Fractures on CT\nBrain-Inspired Deep Networks for Image Aesthetics Assessment\nComparison-based Image Quality Assessment for Parameter Selection\nThe Image Torque Operator for Contour Processing\nAdaptive Image Denoising by Mixture 
Adaptation\nSparsity in Dynamics of Spontaneous Subtle Emotions: Analysis \\&  Application\nPupilNet: Convolutional Neural Networks for Robust Pupil Detection\nPN-Net: Conjoined Triple Deep Network for Learning Local Image  Descriptors\nLearning Support Correlation Filters for Visual Tracking\nTopological descriptors for 3D surface analysis\nPerson Re-Identification by Discriminative Selection in Video Ranking\nAn Unsupervised Method for Detection and Validation of The Optic Disc  and The Fovea\nRelief R-CNN : Utilizing Convolutional Features for Fast Object  Detection\nPixel Recurrent Neural Networks\nClassification and Verification of Online Handwritten Signatures with  Time Causal Information Theory Quantifiers\nPersonNet: Person Re-identification with Deep Convolutional Neural  Networks\nOsteoporotic and Neoplastic Compression Fracture Classification on  Longitudinal CT\nFace Alignment by Local Deep Descriptor Regression\nScene Invariant Crowd Segmentation and Counting Using Scale-Normalized  Histogram of Moving Gradients (HoMG)\nA Deep Learning Based Fast Image Saliency Detection Algorithm\nImproving Vertebra Segmentation through Joint Vertebra-Rib Atlases\nLearning Discriminative Features via Label Consistent Neural Network\nTowards Better Exploiting Convolutional Neural Networks for Remote  Sensing Scene Classification\nComparative Evaluation of Action Recognition Methods via Riemannian  Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions\nCorrentropy Maximization via ADMM - Application to Robust Hyperspectral  Unmixing\nOn Feature based Delaunay Triangulation for Palmprint Recognition\nHomogeneity of Cluster Ensembles\nExploiting Cyclic Symmetry in Convolutional Neural Networks\nChallenges of Integrating A Priori Information Efficiently in the  Discovery of Spatio-Temporal Objects in Large Databases\nImage encryption with dynamic chaotic Look-Up Table\nA Versatile Scene Model with Differentiable Visibility Applied to  Generative Pose 
Estimation\nReal-Time Hand Tracking Using a Sum of Anisotropic Gaussians Model\nCharacter Proposal Network for Robust Text Extraction\nEmbracing Error to Enable Rapid Crowdsourcing\nGenerating images with recurrent adversarial networks\nA landmark-based algorithm for automatic pattern recognition and  abnormality detection\nCreating Simplified 3D Models with High Quality Textures\nCorrelation Hashing Network for Efficient Cross-Modal Retrieval\nImplicit LOD using points ordering for processing and visualisation in  Point Cloud Servers\nA statistical shape space model of the palate surface trained on 3D MRI  scans of the vocal tract\nA Single Model Explains both Visual and Auditory Precortical Coding\nFALDOI: A new minimization strategy for large displacement variational  optical flow\nScalable Metric Learning via Weighted Approximate Rank Component  Analysis\nTechnical Report: Band selection for nonlinear unmixing of hyperspectral  images as a maximal clique problem\nNetwork Morphism\nConfidence-Constrained Maximum Entropy Framework for Learning from  Multi-Instance Data\nDeep Contrast Learning for Salient Object Detection\nLearning a Discriminative Null Space for Person Re-identification\nA non-extensive entropy feature and its application to texture  classification\nDROW: Real-Time Deep Learning based Wheelchair Detection in 2D Range  Data\nSummary Transfer: Exemplar-based Subset Selection for Video  Summarization\nTexture Networks: Feed-forward Synthesis of Textures and Stylized Images\nLearning Typographic Style\nA comprehensive study of sparse codes on abnormality detection\nA Novel Method for Extrinsic Calibration of a 2-D Laser-Rangefinder and  a Camera\nGraph Based Sinogram Denoising for Tomographic Reconstructions\nFourier ptychographic reconstruction using Poisson maximum likelihood  and truncated Wirtinger gradient\nA Neural Approach to Blind Motion Deblurring\nVariable-Length Hashing\nTracking multiple moving objects in images using Markov Chain Monte  
Carlo\nGenerative Image Modeling using Style and Structure Adversarial Networks\nDeep Shading: Convolutional Neural Networks for Screen-Space Shading\nControlling Explanatory Heatmap Resolution and Semantics via  Decomposition Depth\nConvolution in Convolution for Network in Network\nMulti-Subregion Based Correlation Filter Bank for Robust Face  Recognition\nCoarse-to-Fine Segmentation With Shape-Tailored Scale Spaces\nFast and Provably Accurate Bilateral Filtering\nHuman Pose Estimation using Deep Consensus Voting\nAudio Visual Emotion Recognition with Temporal Alignment and Perception  Attention\nA Generic Inverted Index Framework for Similarity Search on the GPU -  Technical Report\nOn distances, paths and connections for hyperspectral image segmentation\nInstance-sensitive Fully Convolutional Networks\nPalmprint Recognition Using Deep Scattering Convolutional Network\nUnsupervised Visual Sense Disambiguation for Verbs using Multimodal  Embeddings\nDeep Networks with Stochastic Depth\nAccurate Text Localization in Natural Image with Cascaded Convolutional  Text Network\nRobust Head-Pose Estimation Based on Partially-Latent Mixture of Linear  Regressions\nFourier Analysis and q-Gaussian Functions: Analytical and Numerical  Results\nImproving Image Captioning by Concept-based Sentence Reranking\nTexture Synthesis Through Convolutional Neural Networks and Spectrum  Constraints\nThe embedding dimension of Laplacian eigenfunction maps\nMatrix Factorization-Based Clustering Of Image Features For  Bandwidth-Constrained Information Retrieval\nRobust and Low-Rank Representation for Fast Face Identification with  Occlusions\nDetecting Ground Control Points via Convolutional Neural Network for  Stereo Matching\nWhen Do Luxury Cars Hit the Road? 
Findings by A Big Data Approach\nRecurrent Human Pose Estimation\nEco-Strategy: Towards a New Generation Managerial Model Based on Green  IT and CSR\nBlind image separation based on exponentiated transmuted Weibull  distribution\nTernary Weight Networks\nMultilevel Thresholding Segmentation of T2 weighted Brain MRI images  using Convergent Heterogeneous Particle Swarm Optimization\nGoing Deeper into Action Recognition: A Survey\nStructured Prediction of 3D Human Pose with Deep Neural Networks\nGenerative Adversarial Text to Image Synthesis\nMatching Handwritten Document Images\nDimensionality Reduction on SPD Manifolds: The Emergence of  Geometry-Aware Methods\nPoisson multi-Bernoulli conjugate prior for multiple extended object  estimation\nR-FCN: Object Detection via Region-based Fully Convolutional Networks\nResidual Networks Behave Like Ensembles of Relatively Shallow Networks\nSwapout: Learning an ensemble of deep architectures\nSparse Signal Reconstruction with Multiple Side Information using  Adaptive Weights for Multiview Sources\nWide Residual Networks\nDense CNN Learning with Equivalent Mappings\nMeasuring Neural Net Robustness with Constraints\nReview Networks for Caption Generation\nDeepMovie: Using Optical Flow and Deep Neural Networks to Stylize Movies\nDiscrete Deep Feature Extraction: A Theory and New Architectures\nBenign-Malignant Lung Nodule Classification with Geometric and  Appearance Histogram Features\nAchieving stable subspace clustering by post-processing generic  clustering results\nStacking With Auxiliary Features\nHyperspectral Image Classification with Support Vector Machines on  Kernel Distribution Embeddings\nk2-means for fast and accurate large scale clustering\nParametric Exponential Linear Unit for Deep Convolutional Neural  Networks\nHierarchical Question-Image Co-Attention for Visual Question Answering\nRecursive Autoconvolution for Unsupervised Learning of Convolutional  Neural Networks\nA Crowd Monitoring Framework using 
Emotion Analysis of Social Media for  Emergency Management in Mass Gatherings\nSynthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet\nScene Grammars, Factor Graphs, and Belief Propagation\nAn Interactive Medical Image Segmentation Framework Using Iterative  Refinement\nBetter Image Segmentation by Exploiting Dense Semantic Predictions\nUnsupervised classification of children's bodies using currents\nDeep neural networks are robust to weight binarization and other  non-linear distortions\nSystematic evaluation of CNN advances on the ImageNet\nPoint-wise mutual information-based video segmentation with high  temporal consistency\nConvolutional Neural Fabrics\nThe Mythos of Model Interpretability\nImproved Techniques for Training GANs\nHuman Attention in Visual Question Answering: Do Humans and Deep  Networks Look at the Same Regions?\nInverting face embeddings with convolutional neural networks\nHolistic Features For Real-Time Crowd Behaviour Anomaly Detection\nConditional Image Generation with PixelCNN Decoders\nDeep Learning for Identifying Metastatic Breast Cancer\nRRV: A Spatiotemporal Descriptor for Rigid Body Motion Recognition\nSlack and Margin Rescaling as Convex Extensions of Supermodular  Functions\nCutting out the middleman: measuring nuclear area in histopathology  slides without segmentation\nSocial-sparsity brain decoders: faster spatial sparsity\nQuestion Relevance in VQA: Identifying Non-Visual And False-Premise  Questions\nTagger: Deep Unsupervised Perceptual Grouping\n3D Display Calibration by Visual Pattern Analysis\nIdentifying individual facial expressions by deconstructing a neural  network\nDynamical optical flow of saliency maps for predicting visual attention\nDropNeuron: Simplifying the Structure of Deep Neural Networks\nAnalyzing the Behavior of Visual Question Answering Models\nFast Multi-Layer Laplacian Enhancement\nSort Story: Sorting Jumbled Images and Captions into Stories\nConvex Decomposition And Efficient Shape 
Representation Using Deformable  Convex Polytopes\nDisjunctive Normal Level Set: An Efficient Parametric Implicit Method\nStochastic Multiple Choice Learning for Training Diverse Deep Ensembles\nLearning Concept Taxonomies from Multi-modal Data\nHow smart does your profile image look? Estimating intelligence from  social network profile images\n3D Deeply Supervised Network for Automatic Liver Segmentation from CT  Volumes\nCell assemblies at multiple time scales with arbitrary lag  constellations\nCUNet: A Compact Unsupervised Network for Image Classification\nDeep CORAL: Correlation Alignment for Deep Domain Adaptation\nMulti Channel-Kernel Canonical Correlation Analysis for Cross-View  Person Re-Identification\nCNN-LTE: a Class of 1-X Pooling Convolutional Neural Networks on Label  Tree Embeddings for Audio Scene Recognition\nFast Predictive Image Registration\nMultimodal Affect Recognition using Kinect\nIntra-layer Nonuniform Quantization for Deep Convolutional Neural  Network\nCity-Identification of Flickr Videos Using Semantic Acoustic Features\nDNA Image Pro -- A Tool for Generating Pixel Patterns using DNA Tile  Assembly\nAccelerating Eulerian Fluid Simulation With Convolutional Networks\nSpatio-Temporal Saliency Networks for Dynamic Saliency Prediction\nWeakly supervised object detection using pseudo-strong labels\nDistributed Coding of Multiview Sparse Sources with Joint Recovery\nDeep Active Contours\nSpatially Supervised Recurrent Convolutional Neural Networks for Visual  Object Tracking\n4D Cardiac Ultrasound Standard Plane Location by Spatial-Temporal  Correlation\nGeometric Neural Phrase Pooling: Modeling the Spatial Co-occurrence of  Neurons\nSpatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition\nExploiting Symmetry and/or Manhattan Properties for 3D Object Structure  Estimation from Single and Multiple Images\nA Unified Multi-scale Deep Convolutional Neural Network for Fast Object  Detection\nLarge-Scale Video Search with Efficient 
Temporal Voting Structure\nTracking with multi-level features\nMesh Denoising based on Normal Voting Tensor and Binary Optimization\nGeneric Feature Learning for Wireless Capsule Endoscopy Analysis\nFundamental Matrices from Moving Objects Using Line Motion Barcodes\nLow-complexity feedback-channel-free distributed video coding using  Local Rank Transform\nCNN-based Patch Matching for Optical Flow with Thresholded Hinge  Embedding Loss\nIncremental Noising and its Fractal Behavior\nA Deep Primal-Dual Network for Guided Depth Super-Resolution\nA 58.6mW Real-Time Programmable Object Detector with Multi-Scale  Multi-Object Support Using Deformable Parts Model on 1920x1080 Video at 30fps\nSPICE: Semantic Propositional Image Caption Evaluation\nAttentional Push: Augmenting Salience with Shared Attention Modeling\nSemantic Image Based Geolocation Given a Map\nSEBOOST - Boosting Stochastic Learning Using Subspace Optimization  Techniques\nDeep-Anomaly: Fully Convolutional Neural Network for Fast Anomaly  Detection in Crowded Scenes\nCryptoImg: Privacy Preserving Processing Over Encrypted Images\nEvolutionary Synthesis of Deep Neural Networks via Synaptic  Cluster-driven Genetic Encoding\nAn Adaptive Parameter Estimation for Guided Filter based Image  Deconvolution\nAutomation of Pedestrian Tracking in a Crowded Situation\nTracking System to Automate Data Collection of Microscopic Pedestrian  Traffic Flow\nDelaunay Triangulation on Skeleton of Flowers for Classification\nAnimal Classification System: A Block Based Approach\nGuided Filter based Edge-preserving Image Non-blind Deconvolution\nPolysemous codes\nClearing the Skies: A deep network architecture for single-image rain  removal\nTracking Algorithm for Microscopic Flow Data Collection\nComparison of several short-term traffic speed forecasting models\nImage Denoising Via Collaborative Support-Agnostic Recovery\nProbabilistic Saliency Estimation\nCrafting a multi-task CNN for viewpoint estimation\nImproving the 
Accuracy of Stereo Visual Odometry Using Visual  Illumination Estimation\nA Deep Metric for Multimodal Registration\nColor: A Crucial Factor for Aesthetic Quality Assessment in a Subjective  Dataset of Paintings\nGraph-Structured Representations for Visual Question Answering\nGAdaBoost: Accelerating Adaboost Feature Selection with Genetic  Algorithms\nMarkov Random Field Model-Based Salt and Pepper Noise Removal\nGeometry-Based Next Frame Prediction from Monocular Video\nRevealing Structure in Large Graphs: Szemerédi's Regularity Lemma and  its Use in Pattern Recognition\nLand Use Classification using Convolutional Neural Networks Applied to  Ground-Level Images\nCharacter-level and Multi-channel Convolutional Neural Networks for  Large-scale Authorship Attribution\nSymmetric Non-Rigid Structure from Motion for Category-Specific Object  Structure Estimation\nImage-embodied Knowledge Representation Learning\nLarge Margin Nearest Neighbor Classification using Curved Mahalanobis  Distances\nDeep Learning in Multi-Layer Architectures of Dense Nuclei\nLexicon-Free Fingerspelling Recognition from Video: Data, Models, and  Signer Adaptation\nUnderstanding and Exploiting Object Interaction Landscapes\nDeep Architectures for Face Attributes\nRecurrent Convolutional Networks for Pulmonary Nodule Detection in CT  Imaging\nMinMax Radon Barcodes for Medical Image Retrieval\nStacked Autoencoders for Medical Image Search\nReal Time Fine-Grained Categorization with Accuracy and Interpretability\nAdaptive Graph-based Total Variation for Tomographic Reconstructions\nMobility Map Computations for Autonomous Navigation using an RGBD Sensor\nDeepGaze II: Reading fixations from deep features trained on object  recognition\nCompressive Imaging with Iterative Forward Models\nDistributed Averaging CNN-ELM for Big Data\nLearning What and Where to Draw\nContent-Based Image Retrieval Using Multiresolution Analysis Of  Shape-Based Classified Images\nOpen-Ended Visual Question-Answering\nImage 
Segmentation Based on the Self-Balancing Mechanism in Virtual 3D  Elastic Mesh\nMatching of Images with Rotation Transformation Based on the Virtual  Electromagnetic Interaction\nCrossing the Road Without Traffic Lights: An Android-based Safety Device\nThe Analysis of Local Motion and Deformation in Image Sequences Inspired  by Physical Electromagnetic Interaction\nA Model of Virtual Carrier Immigration in Digital Images for Region  Segmentation\nThe Virtual Electromagnetic Interaction between Digital Images for Image  Matching with Shifting Transformation\nEmbedded real-time stereo estimation via Semi-Global Matching on the GPU\nHadamard Product for Low-rank Bilinear Pooling\nRGBD-based Parameter Extraction for Door Opening Tasks with Human  Assists in Nuclear Rescue\nRule Extraction Algorithm for Deep Neural Networks: A Review\nShape-based defect classification for Non Destructive Testing\nFast L1-NMF for Multiple Parametric Model Estimation\nChange-point Detection Methods for Body-Worn Video\nAn Image Dataset of Text Patches in Everyday Scenes\nProposing Plausible Answers for Open-ended Visual Question Answering\nScalable Pooled Time Series of Big Video Data from the Deep Web\nEfficient Global Indoor Localization for Micro Aerial Vehicles\nA Novel Boundary Matching Algorithm for Video Temporal Error Concealment\nSpatial Relationship Based Features for Indian Sign Language Recognition\nSavu: A Python-based, MPI Framework for Simultaneous Processing of  Multiple, N-dimensional, Large Tomography Datasets\nAutomated Management of Pothole related Disasters Using Image Processing  and Geotagging\nJudging a Book By its Cover\nDiscovering containment: from infants to machines\nFlood-Filling Networks\nDeep Convolutional Neural Network Design Patterns\nIntegrating Atlas and Graph Cut Methods for LV Segmentation from Cardiac  Cine MRI\nNonnegative Matrix Underapproximation for Robust Multiple Model Fitting\nReal-Time Visual Place Recognition for Personal Localization on a 
Mobile  Device\nMemory-augmented Attention Modelling for Videos\nDomain Adaptation with L2 constraints for classifying images from  different endoscope systems\nGradients of Counterfactuals\nAudio Visual Speech Recognition using Deep Recurrent Neural Networks\nDeep Recurrent Neural Network for Mobile Human Activity Recognition with  High Throughput\nShow me the material evidence: Initial experiments on evaluating  hypotheses from user-generated multimedia data\nOctNet: Learning Deep 3D Representations at High Resolutions\nEfficient Diffusion on Region Manifolds: Recovering Small Objects with  Compact CNN Representations\nZero-Shot Visual Question Answering\nLearning to detect and localize many objects from few examples\nA Discriminatively Learned CNN Embedding for Person Re-identification\nEnd-to-end Learning of Cost-Volume Aggregation for Real-time Dense  Stereo\nBeyond Deep Residual Learning for Image Restoration: Persistent  Homology-Guided Manifold Simplification\nLCNN: Lookup-based Convolutional Neural Network\nA Hierarchical Approach for Generating Descriptive Image Paragraphs\nCovariate conscious approach for Gait recognition based upon Zernike  moment invariants\nNon-Local Color Image Denoising with Convolutional Neural Networks\nStatistical Learning for OCR Text Correction\nMultiple-View Spectral Clustering for Group-wise Functional Community  Detection\nLearning Multi-level Features For Sensor-based Human Action Recognition\nDistributable Consistent Multi-Object Matching\n3D Menagerie: Modeling the 3D shape and pose of animals\nDeep Feature Flow for Video Recognition\nMultiframe Motion Coupling for Video Super Resolution\nAdaptive Feature Abstraction for Translating Video to Text\nSemantic Compositional Networks for Visual Captioning\nInstanceCut: from Edges to Instances with MultiCut\nColor Constancy with Derivative Colors\nHandwriting Profiling using Generative Adversarial Networks\nKernel classification of connectomes based on earth mover's distance  
between graph spectra\nWhat Is Around The Camera?\nIs a picture worth a thousand words? A Deep Multi-Modal Fusion  Architecture for Product Classification in e-commerce\nLens Distortion Rectification using Triangulation based Interpolation\nPredicting Human Eye Fixations via an LSTM-based Saliency Attentive  Model\nFast Supervised Discrete Hashing and its Analysis\nAdversarial Images for Variational Autoencoders\nTemporal Attention-Gated Model for Robust Sequence Classification\nUnderstanding image motion with group representations\nScribbler: Controlling Deep Image Synthesis with Sketch and Color\nSemi-supervised learning of deep metrics for stereo reconstruction\nEnsembles of Generative Adversarial Networks\nMulti-way Particle Swarm Fusion\nMessage Passing Multi-Agent GANs\nHighly Efficient Regression for Scalable Person Re-Identification\nAI Researchers, Video Games Are Your Friends!\nCluster-Wise Ratio Tests for Fast Camera Localization\nMode Regularized Generative Adversarial Networks\nExploring the potential of combining time of flight and thermal infrared  cameras for person detection\nGeneralized Sinkhorn iterations for regularizing inverse problems using  optimal mass transport\nAutomatic Detection of ADHD and ASD from Expressive Behaviour in RGBD  Data\nDeMoN: Depth and Motion Network for Learning Monocular Stereo\nProgressive Tree-like Curvilinear Structure Reconstruction with  Structured Ranking Learning and Graph Algorithm\nJoint Hand Detection and Rotation Estimation by Using CNN\nFacial Expression Recognition using Convolutional Neural Networks: State  of the Art\nFeature Pyramid Networks for Object Detection\nGeneralized Deep Image to Image Regression\nNeural Networks with Manifold Learning for Diabetic Retinopathy  Detection\nRecurrent Image Captioner: Describing Images with Spatial-Invariant  Transformation and Attention Filtering\nA Multilinear Tongue Model Derived from Speech Related MRI Data of the  Human Vocal Tract\nVisual Compiler: 
Synthesizing a Scene-Specific Pedestrian Detector and  Pose Estimator\nMachine Reading with Background Knowledge\nMedical Image Synthesis with Context-Aware Generative Adversarial  Networks\nHandwritten Signature Verification Using Hand-Worn Devices\nLearning Features by Watching Objects Move\nExploring the Design Space of Deep Convolutional Neural Networks at  Large Scale\nAutomatic Generation of Grounded Visual Questions\nCLEVR: A Diagnostic Dataset for Compositional Language and Elementary  Visual Reasoning\nFINN: A Framework for Fast, Scalable Binarized Neural Network Inference\nMultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving\nTwo-stream convolutional neural network for accurate RGB-D fingertip  detection using depth and edge information\nImage-Text Multi-Modal Representation Learning by Adversarial  Backpropagation\nSemantic Video Segmentation by Gated Recurrent Flow Propagation\nMemory Efficient Multi-Scale Line Detector Architecture for Retinal  Blood Vessel Segmentation\nA Joint Speaker-Listener-Reinforcer Model for Referring Expressions\nImproved Stereo Matching with Constant Highway Networks and Reflective  Confidence Learning\nProduct Manifold Filter: Non-Rigid Shape Correspondence via Kernel  Density Estimation in the Product Space\nA Hierarchical Image Matting Model for Blood Vessel Segmentation in  Fundus images\nA Concave Optimization Algorithm for Matching Partially Overlapping  Point Sets\nDemystifying Neural Style Transfer\nSalGAN: Visual Saliency Prediction with Generative Adversarial Networks\nAutoencoder Regularized Network For Driving Style Representation  Learning\nDeep Convolutional Denoising of Low-Light Images\nDeep Class Aware Denoising\nUnsupervised Learning of Long-Term Motion Dynamics for Videos\nMulti-task Learning Of Deep Neural Networks For Audio Visual Automatic  Speech Recognition\nEfficient Image Set Classification using Linear Regression based Image  Reconstruction\nLight Source Point Cluster Selection Based 
Atmosphere Light Estimation\nOn Hölder projective divergences\nLight Source Estimation with Analytical Path-tracing\nA Deep Convolutional Auto-Encoder with Pooling - Unpooling Layers in  Caffe\nAccurate Motion Estimation through Random Sample Aggregated Consensus\nEnd-To-End Visual Speech Recognition With LSTMs\nImage Compression with SVD : A New Quality Metric Based On Energy Ratio\nDistributed methods for synchronization of orthogonal matrices over  graphs\nDeep Reinforcement Learning: An Overview\nImage-Grounded Conversations: Multimodal Context for Natural Question  and Response Generation\nRe-ranking Person Re-identification with k-reciprocal Encoding\nTransformation-Based Models of Video Sequences\nStable and Controllable Neural Texture Synthesis and Style Transfer  Using Histogram Losses\nFeature Selection based on PCA and PSO for Multimodal Medical Image  Fusion using DTCWT\nDeep Multitask Architecture for Integrated 2D and 3D Human Sensing\nSiamese Network of Deep Fisher-Vector Descriptors for Image Retrieval\nVisual Saliency Prediction Using a Mixture of Deep Neural Networks\nProduct Graph-based Higher Order Contextual Similarities for Inexact  Subgraph Matching\nLearning a time-dependent master saliency map from eye-tracking data in  videos\nMaritime situational awareness using adaptive multi-sensor management  under hazy conditions\nDeep Learning with Low Precision by Half-wave Gaussian Quantization\nPrinted Arabic Text Recognition using Linear and Nonlinear Regression\nMulti-scale Convolutional Neural Networks for Crowd Counting\nRegion Ensemble Network: Improving Convolutional Network for Hand Pose  Estimation\nManifold Based Low-rank Regularization for Image Restoration and  Semi-supervised Learning\nTexture Characterization by Using Shape Co-occurrence Patterns\nA Morphology-aware Network for Morphological Disambiguation\nIntegrating Three Mechanisms of Visual Attention for Active Visual  Search\nFilling missing data in point clouds by merging 
structured and  unstructured point clouds\nSpectral Algorithms for Temporal Graph Cuts\n3D Cell Nuclei Segmentation with Balanced Graph Partitioning\nDefect detection for patterned fabric images based on GHOG and low-rank  decomposition\nOnline Robust Principal Component Analysis with Change Point Detection\nMemory Efficient Max Flow for Multi-label Submodular MRFs\nSynthesis versus analysis in patch-based image priors\nThe Power of Sparsity in Convolutional Neural Networks\nDifferential Geometric Retrieval of Deep Features\nFast Resampling of 3D Point Clouds via Graphs\nOnline Representation Learning with Single and Multi-layer Hebbian  Networks for Image Classification\nLensless Photography with only an image sensor\nRobust and fully automated segmentation of mandible from CT scans\nViewpoint Adaptation for Rigid Object Detection\nMemory-Efficient Global Refinement of Decision-Tree Ensembles and its  Application to Face Alignment\nEnabling Sparse Winograd Convolution by Native Pruning\nSelective Video Object Cutout\nMILD: Multi-Index hashing for Loop closure Detection\nTheoretical Properties for Neural Networks with Weight Matrices of Low  Displacement Rank\nBinarized Convolutional Landmark Localizers for Human Pose Estimation  and Face Alignment with Limited Resources\nUsing Synthetic Data to Train Neural Networks is Model-Based Reasoning\nA Restaurant Process Mixture Model for Connectivity Based Parcellation  of the Cortex\nArbitrary-Oriented Scene Text Detection via Rotation Proposals\nAdversarial Examples for Semantic Image Segmentation\nInstance Flow Based Online Multiple Object Tracking\nLearning across scales - A multiscale method for Convolution Neural  Networks\nDeep View Morphing\nRemoval of Salt and Pepper noise from Gray-Scale and Color Images: An  Adaptive Approach\nDeep Learning for Automated Quality Assessment of Color Fundus Images in  Diabetic Retinopathy Screening\nFaster Coordinate Descent via Adaptive Importance Sampling\nLarge Kernel Matters 
-- Improve Semantic Segmentation by Global  Convolutional Network\nDeep Convolutional Neural Network Inference with Floating-point Weights  and Fixed-point Activations\nLesionSeg: Semantic segmentation of skin lesions using Deep  Convolutional Neural Network\nThe xDotGrid Native, Cross-Platform, High-Performance xDFS File Transfer  Framework\nParallel Multiscale Autoregressive Density Estimation\nEnd-to-End Learning of Geometry and Context for Deep Stereo Regression\nA Localisation-Segmentation Approach for Multi-label Annotation of  Lumbar Vertebrae using Deep Nets\nGuetzli: Perceptually Guided JPEG Encoder\nDiscriminate-and-Rectify Encoders: Learning from Image Transformation  Sets\nA Proximity-Aware Hierarchical Clustering of Faces\nReal-Time Panoramic Tracking for Event Cameras\nLearning Robust Visual-Semantic Embeddings\nTURN TAP: Temporal Unit Regression Network for Temporal Action Proposals\nRoomNet: End-to-End Room Layout Estimation\nMultilevel Context Representation for Improving Object Recognition\nA Fully-Automated Pipeline for Detection and Segmentation of Liver  Lesions and Pathological Lymph Nodes\nVQABQ: Visual Question Answering by Basic Questions\nObject category understanding via eye fixations on freehand sketches\nSORT: Second-Order Response Transform for Visual Recognition\nSimple Online and Realtime Tracking with a Deep Association Metric\nSpatially-Varying Blur Detection Based on Multiscale Fused and Sorted  Transform Coefficients of Gradient Magnitudes\nRobust SfM with Little Image Overlap\nRecurrent and Contextual Models for Visual Question Answering\nDeepVisage: Making face recognition simple yet with powerful  generalization skills\nExploiting Color Name Space for Salient Object Detection\nA Study on the Extraction and Analysis of a Large Set of Eye Movement  Features during Reading\nDiscriminative Transfer Learning for General Image Restoration\nFemoral ROIs and Entropy for Texture-based Detection of Osteoarthritis  from High-Resolution 
Knee Radiographs\nEnsembles of Deep LSTM Learners for Activity Recognition using Wearables\nAdversarial Image Perturbation for Privacy Protection -- A Game Theory  Perspective\nEfficient Two-Dimensional Sparse Coding Using Tensor-Linear Combination\nDeep 6-DOF Tracking\nTowards thinner convolutional neural networks through Gradually Global  Pruning\nApplication of a Shallow Neural Network to Short-Term Stock Trading\nSpeaking the Same Language: Matching Machine to Human Captions by  Adversarial Training\nTowards a Visual Privacy Advisor: Understanding and Predicting Privacy  Risks in Images\nConcurrent Segmentation and Localization for Tracking of Surgical  Instruments\nDiabetic Retinopathy Detection via Deep Convolutional Networks for  Discriminative Localization and Visual Explanation\nFast Predictive Multimodal Image Registration\nGeodesic Distance Histogram Feature for Video Segmentation\nCustomizing First Person Image Through Desired Actions\nDense Multi-view 3D-reconstruction Without Dense Correspondences\nA Genetic Programming Approach to Designing Convolutional Neural Network  Architectures\nsWSI: A Low-cost and Commercial-quality Whole Slide Imaging System on  Android and iOS Smartphones\nFeature Squeezing: Detecting Adversarial Examples in Deep Neural  Networks\nRelative Learning from Web Images for Content-adaptive Enhancement\nSupporting Navigation of Outdoor Shopping Complexes for  Visually-impaired Users through Multi-modal Data Fusion\nNot All Pixels Are Equal: Difficulty-aware Semantic Segmentation via  Deep Layer Cascade\nPrivacy-Preserving Visual Learning Using Doubly Permuted Homomorphic  Encryption\nA New Pseudo-color Technique Based on Intensity Information Protection  for Passive Sensor Imagery\nMotion Saliency Based Automatic Delineation of Glottis Contour in  High-speed Digital Images\nDynamic Edge-Conditioned Filters in Convolutional Neural Networks on  Graphs\nFeature Selection Parallel Technique for Remotely Sensed Imagery  
Classification\nDeep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution\nZero-order Reverse Filtering\nFastVentricle: Cardiac Segmentation with ENet\nCBinfer: Change-Based Inference for Convolutional Neural Networks on  Video Data\nShapeWorld - A new test methodology for multimodal language  understanding\nDeep Learning for Photoacoustic Tomography from Sparse Data\nA learning-based approach for automatic image and video colorization\nBig Universe, Big Data: Machine Learning and Image Analysis for  Astronomy\nTemporal Action Localization by Structured Maximal Sums\nUniversal Adversarial Perturbations Against Semantic Image Segmentation\nSkiMap: An Efficient Mapping Framework for Robot Navigation\nA Nuclear-norm Model for Multi-Frame Super-Resolution Reconstruction  from Video Clips\nMulti-view Supervision for Single-view Reconstruction via Differentiable  Ray Consistency\nAttend to You: Personalized Image Captioning with Context Sequence  Memory Networks\nHierarchical Bayesian Data Fusion for Robotic Platform Navigation\nConvolutional Neural Networks for Facial Expression Recognition\nMulti-Task Video Captioning with Video and Entailment Generation\nThe loss surface of deep and wide neural networks\nNew region force for variational models in image segmentation and high  dimensional data clustering\nC-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0  Dataset\nDeep Cross-Modal Audio-Visual Generation\n(Quasi)Periodicity Quantification in Video Data, Using Topology\nSaliency Benchmarking: Separating Models, Maps and Metrics\nSparse Hierachical Extrapolated Parametric Methods for Cortical Data  Analysis\nObstacle Avoidance through Deep Networks based Intermediate Perception\nCompressive Sensing Approaches for Autonomous Object Detection in Video  Sequences\nTopologically Robust 3D Shape Matching via Gradual Deflation and  Inflation\nSurfCut: Surfaces of Minimal Paths From Topological Structures\nA Statistical Model for Simultaneous 
Template Estimation, Bias  Correction, and Registration of 3D Brain Images\nQuery-adaptive Video Summarization via Quality-aware Relevance  Estimation\nThe Promise of Premise: Harnessing Question Premises in Visual Question  Answering\nSTAIR Captions: Constructing a Large-Scale Japanese Image Caption  Dataset\nVisual Attribute Transfer through Deep Image Analogy\nThe Forgettable-Watcher Model for Video Question Answering\nOptical Flow in Mostly Rigid Scenes\nA Rural Lens on a Research Agenda for Intelligent Infrastructure\nSupervised Learning of Universal Sentence Representations from Natural  Language Inference Data\nTrajectoryNet: An Embedded GPS Trajectory Representation for Point-based  Classification Using Recurrent Neural Networks\nHigh-Level Concepts for Affective Understanding of Images\nMachine Learning with World Knowledge: The Position and Survey\nLarge-scale, Fast and Accurate Shot Boundary Detection through  Spatio-temporal Convolutional Neural Networks\nMulti-Scale Spatially Weighted Local Histograms in O(1)\nInferring and Executing Programs for Visual Reasoning\nA Cascaded Convolutional Neural Network for X-ray Low-dose CT Image  Denoising\nSEAGLE: Sparsity-Driven Image Reconstruction under Multiple Scattering\nLearning to see people like people\nDetection of irregular QRS complexes using Hermite Transform and Support  Vector Machine\nMotion-Compensated Autonomous Scanning for Tumour Localisation using  Intraoperative Ultrasound\nRobust Registration of Gaussian Mixtures for Colour Transfer\nElastic and Secure Energy Forecasting in Cloud Environments\nBuilding effective deep neural network architectures one feature at a  time\nA New 3D Method to Segment the Lumbar Vertebral Bodies and to Determine  Bone Mineral Density and Geometry\nPhase-Shifting Separable Haar Wavelets and Applications\nLarge-Scale Classification of Structured Objects using a CRF with Deep  Class Embedding\nLearning to Prune Deep Neural Networks via Layer-wise Optimal Brain  
Surgeon\nDynamics Based 3D Skeletal Hand Tracking\nSemantically Decomposing the Latent Spaces of Generative Adversarial  Networks\nGP-Unet: Lesion Detection from Weak Labels with a 3D Regression Network\nDistributed Algorithms for Feature Extraction Off-loading in  Multi-Camera Visual Sensor Networks\nIsomorphism between Differential and Moment Invariants under Affine  Transform\nAccelerating Discrete Wavelet Transforms on GPUs\nMatching neural paths: transfer from recognition to correspondence  search\nA New 3D Segmentation Technique for QCT Scans of the Lumbar Spine to  Determine BMD and Vertebral Geometry\nAn Invariant Model of the Significance of Different Body Parts in  Recognizing Different Actions\nClassification of Aerial Photogrammetric 3D Point Clouds\nBetter Text Understanding Through Image-To-Text Transfer\nFormal Guarantees on the Robustness of a Classifier against Adversarial  Manipulation\nAdaptive Detrending to Accelerate Convolutional Gated Recurrent Unit  Training for Contextual Video Recognition\nDense Transformer Networks\nSLAM based Quasi Dense Reconstruction For Minimally Invasive Surgery  Scenes\nAnalysis of universal adversarial perturbations\nNear-linear time approximation algorithms for optimal transport via  Sinkhorn iteration\nPVEs: Position-Velocity Encoders for Unsupervised Learning of Structured  State Representations\nBMXNet: An Open-Source Binary Neural Network Implementation Based on  MXNet\nL1-norm Error Function Robustness and Outlier Regularization\nOptimal Multi-Object Segmentation with Novel Gradient Vector Flow Based  Shape Priors\nEmergent Communication in a Multi-Modal, Multi-Step Referential Game\nDeep Learning is Robust to Massive Label Noise\nWorking hard to know your neighbor's margins: Local descriptor learning  loss\nTeaching Machines to Describe Images via Natural Language Feedback\nShape and Positional Geometry of Multi-Object Configurations\nAn Effective Approach for Point Clouds Registration Based on the Hard  and 
Soft Assignments\nDiracNets: Training Very Deep Neural Networks Without Skip-Connections\nModeling Latent Attention Within Neural Networks\nEarly Experiences with Crowdsourcing Airway Annotations in Chest CT\nDeLiGAN : Generative Adversarial Networks for Diverse and Limited Data\nTraining Quantized Nets: A Deeper Understanding\nEvaluating (and improving) the correspondence between deep neural  networks and human representations\nC-arm Tomographic Imaging Technique for Nephrolithiasis and Detection of  Kidney Stones\nLearning Local Receptive Fields and their Weight Sharing Scheme on  Graphs\nBicycle Detection Based On Multi-feature and Multi-frame Fusion in  low-resolution traffic videos\nChannel-Recurrent Autoencoding for Image Modeling\nRecurrent Inference Machines for Solving Inverse Problems\nAutomatic Localization of Deep Stimulation Electrodes Using  Trajectory-based Segmentation Approach\nLarge-Scale YouTube-8M Video Understanding with Deep Neural Networks\nDOTE: Dual cOnvolutional filTer lEarning for Super-Resolution and  Cross-Modality Synthesis in MRI\nA Fully Trainable Network with RNN-based Pooling\nFeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis\nVariants of RMSProp and Adagrad with Logarithmic Regret Bounds\nRethinking Atrous Convolution for Semantic Image Segmentation\nDeep learning with spatiotemporal consistency for nerve segmentation in  ultrasound images\nOptimising the topological information of the $A_\\infty$-persistence  groups\nBrain Tumor Detection and Classification with Feed Forward Back-Prop  Neural Network\nRecognition of Grasp Points for Clothes Manipulation under unconstrained  Conditions\nDeep Learning Autoencoder Approach for Handwritten Arabic Digits  Recognition\nDeep Supervision for Pancreatic Cyst Segmentation in Abdominal CT Scans\nTracking Single-Cells in Overcrowded Bacterial Colonies\nMultiresolution Match Kernels for Gesture Video Classification\nComputer-aided implant design for the restoration of cranial 
defects\nLarge-Scale Mapping of Human Activity using Geo-Tagged Videos\nFReLU: Flexible Rectified Linear Units for Improving Convolutional  Neural Networks\nScalable multimodal convolutional networks for brain tumour segmentation\nDeep Semantics-Aware Photo Adjustment\nNatural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog\nUsing Frame Theoretic Convolutional Gridding for Robust Synthetic  Aperture Sonar Imaging\nHierarchical Attentive Recurrent Tracking\nAlternative Semantic Representations for Zero-Shot Human Action  Recognition\nActor-Critic Sequence Training for Image Captioning\nScale-Aware Face Detection\nAutomatic Face Image Quality Prediction\nColor-opponent mechanisms for local hue encoding in a hierarchical  framework\nA Batch-Incremental Video Background Estimation Model using Weighted  Low-Rank Approximation of Matrices\nAutomatic Trimap Generation for Image Matting\nPhysics Inspired Optimization on Semantic Transfer Features: An  Alternative Method for Room Layout Estimation\nDeep-learning-based data page classification for holographic memory\nDeep Representation Learning with Part Loss for Person Re-Identification\nOptimization Beyond the Convolution: Generalizing Spatial Relations with  End-to-End Metric Learning\nCopy-move Forgery Detection based on Convolutional Kernel Network\nImproving Content-Invariance in Gated Autoencoders for 2D and 3D Object  Rotation\nTowards lightweight convolutional neural networks for object detection\nAutomated Lane Detection in Crowds using Proximity Graphs\nCross-linguistic differences and similarities in image descriptions\nFast Stochastic Hierarchical Bayesian MAP for Tomographic Imaging\nDeep Learning for Vanishing Point Detection Using an Inverse Gnomonic  Projection\nMDNet: A Semantically and Visually Interpretable Medical Image Diagnosis  Network\nTowards Crafting Text Adversarial Samples\nCheckerboard artifact free sub-pixel convolution: A note on sub-pixel  convolution, resize convolution and 
convolution resize\nA Reconfigurable Streaming Deep Convolutional Neural Network Accelerator  for Internet of Things\nFast Amortized Inference and Learning in Log-linear Models with Randomly  Perturbed Nearest Neighbor Search\nLearning the Latent \"Look\": Unsupervised Discovery of a Style-Coherent  Embedding from Fashion Images\nDeep Learning for Sensor-based Activity Recognition: A Survey\nTwo-dimensional nonseparable discrete linear canonical transform based  on CM-CC-CM-CC decomposition\nData preprocessing methods for robust Fourier ptychographic microscopy\nLearning Photography Aesthetics with Deep CNNs\nRecognizing Abnormal Heart Sounds Using Deep Learning\nPathological OCT Retinal Layer Segmentation using Branch Residual  U-shape Networks\nVisual Question Answering with Memory-Augmented Networks\nDCTM: Discrete-Continuous Transformation Matching for Semantic Flow\nFast Feature Fool: A data independent approach to universal adversarial  perturbations\nVSE++: Improving Visual-Semantic Embeddings with Hard Negatives\nHashed Binary Search Sampling for Convolutional Network Training with  Large Overhead Image Patches\nAirCode: Unobtrusive Physical Tags for Digital Fabrication\nOn Finding Maximum Cardinality Subset of Vectors with a Constraint on  Normalized Squared Length of Vectors Sum\nVideo Question Answering via Attribute-Augmented Attention Network  Learning\nScalable Full Flow with Learned Binary Descriptors\nRecovering Sparse Nonnegative Signals via Non-convex Fraction Function  Penalty\nLocal Geometry Inclusive Global Shape Representation\nWhat Looks Good with my Sofa: Multimodal Search Engine for Interior  Design\nPersistent-homology-based gait recognition\nOBJ2TEXT: Generating Visually Descriptive Language from Object Layouts\nTowards Good Practices for Deep 3D Hand Pose Estimation\nRobust Tracking and Behavioral Modeling of Movements of Biological  Collectives from Ordinary Video Recordings\nA new take on measuring relative nutritional density: The 
feasibility of  using a deep neural network to assess commercially-prepared pureed food  concentrations\nTraffic scene recognition based on deep cnn and vlad spatial pyramids\nImage Pivoting for Learning Multilingual Multimodal Representations\nLiver lesion segmentation informed by joint liver segmentation\nAutomatic Liver Segmentation Using an Adversarial Image-to-Image Network\nA Unified Joint Matrix Factorization Framework for Data Integration\nEfficient Low Rank Tensor Ring Completion\nTensorLayer: A Versatile Library for Efficient Deep Learning Development\nInterpatient Respiratory Motion Model Transfer for Virtual Reality  Simulations of Liver Punctures\nDeep Residual Learning for Weakly-Supervised Relation Extraction\nA Locally Adapting Technique for Boundary Detection using Image  Segmentation\nVisual Relationship Detection with Internal and External Linguistic  Knowledge Distillation\nZero-Shot Activity Recognition with Verb Attribute Induction\nScanNet: A Fast and Dense Scanning Framework for Metastatic Breast  Cancer Detection from Whole-Slide Images\nGuided Co-training for Large-Scale Multi-View Spectral Clustering\nRepresentation Learning on Large and Small Data\nLearned in Translation: Contextualized Word Vectors\nFast Preprocessing for Robust Face Sketch Synthesis\nOn the Importance of Consistency in Training Deep Neural Networks\nGeneration of High Dynamic Range Illumination from a Single Image for  the Enhancement of Undesirably Illuminated Images\nAction recognition by learning pose representations\nPredictive Coding for Dynamic Visual Processing: Development of  Functional Hierarchy in a Multiple Spatio-Temporal Scales RNN Model\nFingerprint Extraction Using Smartphone Camera\nLearning Accurate Low-Bit Deep Neural Networks with Stochastic  Quantization\nPhase-error estimation and image reconstruction from digital-holography  data using a Bayesian framework\nThree-dimensional planar model estimation using multi-constraint  knowledge based on 
k-means and RANSAC\nSemantic Augmented Reality Environment with Material-Aware Physical  Interactions\nReal-time Geometry-Aware Augmented Reality in Minimally Invasive Surgery\nMulti-modal Factorized Bilinear Pooling with Co-Attention Learning for  Visual Question Answering\nImproving Speaker-Independent Lipreading with Domain-Adversarial  Training\nRegion-Based Multiscale Spatiotemporal Saliency for Video\nParametrization and Generation of Geological Models with Generative  Adversarial Networks\nReinforced Video Captioning with Entailment Rewards\nTips and Tricks for Visual Question Answering: Learnings from the 2017  Challenge\nWeakly- and Self-Supervised Learning for Content-Aware Deep Image  Retargeting\nGaussian Prototypical Networks for Few-Shot Learning on Omniglot\nPrivacy Preserving Face Retrieval in the Cloud for Mobile Users\nInteracting with Acoustic Simulation and Fabrication\nAn evaluation of large-scale methods for image instance and class  discovery\nConvolutional Neural Networks for Font Classification\nA Cost-Sensitive Visual Question-Answer Framework for Mining a Deep  And-OR Object Semantics from Web Images\nArtistic style transfer for videos and spherical images\nEfficiently Tracking Homogeneous Regions in Multichannel Images\nDeep Neural Network with l2-norm Unit for Brain Lesions Detection\nIncorporating Copying Mechanism in Image Captioning for Learning Novel  Objects\nMesh-based 3D Textured Urban Mapping\nEmploying Weak Annotations for Medical Image Analysis Problems\nSharpness-aware Low dose CT denoising using conditional generative  adversarial network\nContrast and visual saliency similarity induced index for image quality  assessment\nActivity Recognition based on a Magnitude-Orientation Stream Network\nWhat does 2D geometric information really tell us about 3D face shape?\nNon-linear Convolution Filters for CNN-based Learning\nAn Image Analysis Approach to the Calligraphy of Books\nFacePoseNet: Making a Case for Landmark-Free Face 
Alignment\nThe Parallel Algorithm for the 2-D Discrete Wavelet Transform\nBatch-Based Activity Recognition from Egocentric Photo-Streams\nMaximum A Posteriori Estimation of Distances Between Deep Features in  Still-to-Video Face Recognition\nFacial Expression Recognition using Visual Saliency and Deep Learning\nImbalanced Malware Images Classification: a CNN based Approach\nFraming U-Net via Deep Convolutional Framelets: Application to  Sparse-view CT\nAutomatic Discovery and Geotagging of Objects from Street View Imagery\nDeep Belief Networks used on High Resolution Multichannel  Electroencephalography Data for Seizure Detection\nDeep Structure for end-to-end inverse rendering\nScatterNet Hybrid Deep Learning (SHDL) Network For Object Classification\nNeural Class-Specific Regression for face verification\nAbnormal Event Detection in Videos using Generative Adversarial Nets\nGlyph-aware Embedding of Chinese Characters\nFirst and Second Order Methods for Online Convolutional Dictionary  Learning\nXFlow: 1D-2D Cross-modal Deep Neural Networks for Audiovisual  Classification\nSelf-Supervised Learning for Stereo Matching with Self-Improving Ability\nALICE: Towards Understanding Adversarial Learning for Joint Distribution  Matching\nSqueeze-and-Excitation Networks\nBranchyNet: Fast Inference via Early Exiting from Deep Neural Networks\nCNN-Based Projected Gradient Descent for Consistent Image Reconstruction\nTowards high-throughput 3D insect capture for species discovery and  diagnostics\nThe Mating Rituals of Deep Neural Networks: Learning Compact Feature  Representations through Sexual Evolutionary Synthesis\nGraph Scaling Cut with L1-Norm for Classification of Hyperspectral  Images\nRobust Emotion Recognition from Low Quality and Low Bit Rate Video: A  Deep Learning Approach\nDPC-Net: Deep Pose Correction for Visual Localization\nArt of singular vectors and universal adversarial perturbations\nOn the definition of Shape Parts: a Dominant Sets Approach\nPQk-means: 
Billion-scale Clustering for Product-quantized Codes\nA low cost non-wearable gaze detection system based on infrared image  processing\nDenoising Autoencoders for Overgeneralization in Neural Networks\nInformed Non-convex Robust Principal Component Analysis with Features\nJoint Hierarchical Category Structure Learning and Large-Scale Image  Classification\nA Streaming Accelerator for Deep Convolutional Neural Networks with  Image and Feature Decomposition for Resource-limited System Applications\nDetecting Faces Using Region-based Fully Convolutional Networks\nThe Multiscale Bowler-Hat Transform for Blood Vessel Enhancement in  Retinal Images\nDeepLung: 3D Deep Convolutional Nets for Automated Pulmonary Nodule  Detection and Classification\nKernel Cross-Correlator\nMulti-modal analysis of genetically-related subjects using SIFT  descriptors in brain MRI\nMUFold-SS: Protein Secondary Structure Prediction Using Deep  Inception-Inside-Inception Networks\nA Fast Algorithm Based on a Sylvester-like Equation for LS Regression  with GMRF Prior\nCompressing Low Precision Deep Neural Networks Using Sparsity-Induced  Regularization in Ternary Networks\nAn Adaptive Algorithm for Precise Pupil Boundary Detection using Entropy  of Contour Gradients\nUpdating the silent speech challenge benchmark with deep learning\nMulti-camera Multi-Object Tracking\nLearned Features are better for Ethnicity Classification\nHierarchical Detail Enhancing Mesh-Based Shape Generation with 3D  Generative Adversarial Network\nHigh-Resolution Shape Completion Using Deep Neural Networks for Global  Structure and Local Geometry Inference\nSingle-pixel imaging with Morlet wavelet correlated random patterns\nCan Image Retrieval help Visual Saliency Detection?\nHDLTex: Hierarchical Deep Learning for Text Classification\nUnderstanding Infographics through Textual and Visual Tag Prediction\nA Read-Write Memory Network for Movie Story Understanding\nConnectivity Learning in Multi-Branch Networks\nCombining 
Real-Valued and Binary Gabor-Radon Features for Classification  and Search in Medical Imaging Archives\nImproving Dermoscopic Image Segmentation with Enhanced  Convolutional-Deconvolutional Networks\nDistance-based Confidence Score for Neural Network Classifiers\nRecognition of Documents in Braille\nPossibilistic Fuzzy Local Information C-Means for Sonar Image  Segmentation\nImproving image generative models with human interactions\nHuman motion primitive discovery and recognition\nFine-grained Event Learning of Human-Object Interaction with LSTM-CRF\nPyramidal RoR for Image Classification\nLearning event representation: As sparse as possible, but not sparser\nOut-of-focus Blur: Image De-blurring\nOptimal DNN Primitive Selection with Partitioned Boolean Quadratic  Programming\nIsotropic and Steerable Wavelets in N Dimensions. A multiresolution  analysis framework for ITK\nFinding phonemes: improving machine lip-reading\nLearning Autoencoded Radon Projections\nGraphMatch: Efficient Large-Scale Graph Construction for Structure from  Motion\nA self-organizing neural network architecture for learning human-object  interactions\nSemantic keyword spotting by learning from images and speech\nDeep Convolutional Neural Networks as Generic Feature Extractors\nGender and Ethnicity Classification of Iris Images using Deep  Class-Encoder\nAn automatic deep learning approach for coronary artery calcium  segmentation\nMultitask training with unlabeled data for end-to-end sign language  fingerspelling recognition\nJoint Weakly and Semi-Supervised Deep Learning for Localization and  Classification of Masses in Breast Ultrasound Images\nDeep Hyperalignment\nRADNET: Radiologist Level Accuracy using Deep Learning for HEMORRHAGE  detection in CT Scans\nMicroaneurysm Detection in Fundus Images Using a Two-step Convolutional  Neural Networks\nAn Adaptive Framework for Missing Depth Inference Using Joint Bilateral  Filter\nLung Cancer Screening Using Adaptive Memory-Augmented Recurrent 
Networks\nA New Coherence-Penalized Minimal Path Model with Application to Retinal  Vessel Centerline Delineation\nTowards CT-quality Ultrasound Imaging using Deep Learning\nBeat by Beat: Classifying Cardiac Arrhythmias with Recurrent Neural  Networks\nDo Convolutional Neural Networks Learn Class Hierarchy?\nPose-based Deep Gait Recognition\nEnhancing the Performance of Convolutional Neural Networks on Quality  Degraded Datasets\nNonlinear Supervised Dimensionality Reduction via Smooth Regular  Embeddings\nVisual Speech Recognition Using PCA Networks and LSTMs in a Tandem  GMM-HMM System\nLearning to Recognize Actions from Limited Training Examples Using a  Recurrent Spiking Neural Model\nAnticipating Daily Intention using On-Wrist Motion Triggered Sensing\nMR to X-Ray Projection Image Synthesis\nADA: A Game-Theoretic Perspective on Data Augmentation for Object  Detection\nProgressive Learning for Systematic Design of Large Neural Networks\nAutoEncoder Inspired Unsupervised Feature Selection\nBenchmark of Deep Learning Models on Large Healthcare MIMIC Datasets\nRobust Photometric Stereo via Dictionary Learning\nThe Shape of an Image: A Study of Mapper on Images\nCrop Planning using Stochastic Visual Optimization\nAnatomical labeling of brain CT scan anomalies using multi-context  nearest neighbor relation networks\nBiometrics-as-a-Service: A Framework to Promote Innovative Biometric  Recognition in the Cloud\nAdversarial Deep Structured Nets for Mass Segmentation from Mammograms\nA Generative Model for Volume Rendering\nPhase Transitions in Image Denoising via Sparsely Coding Convolutional  Neural Networks\nSEGMENT3D: A Web-based Application for Collaborative Segmentation of 3D  images used in the Shoot Apical Meristem\nData-driven Feature Sampling for Deep Hyperspectral Classification and  Segmentation\nStochastic Conjugate Gradient Algorithm with Variance Reduction\nDual Path Networks for Multi-Person Human Pose Estimation\nAutomatic Knee Osteoarthritis Diagnosis 
from Plain Radiographs: A Deep  Learning-Based Approach\nRegularization for Deep Learning: A Taxonomy\nAutomated Tumor Segmentation and Brain Mapping for the Tumor Area\nGenerating Natural Adversarial Examples\nUpdating the VESICLE-CNN Synapse Detector\nLog-DenseNet: How to Sparsify a DenseNet\nSmooth Neighbors on Teacher Graphs for Semi-supervised Learning\nHierarchical Representations for Efficient Architecture Search\nDon't Decay the Learning Rate, Increase the Batch Size\nRecognizing Textures with Mobile Cameras for Pedestrian Safety  Applications\nA Classification-Based Perspective on GAN Distributions\nBackground Subtraction via Fast Robust Matrix Completion\nAttentional Pooling for Action Recognition\nA Survey on Dialogue Systems: Recent Advances and New Frontiers\nNeST: A Neural Network Synthesis Tool Based on a Grow-and-Prune Paradigm\nDoppler-Radar Based Hand Gesture Recognition System Using Convolutional  Neural Networks\nGradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep  Multitask Networks\nSIMILARnet: Simultaneous Intelligent Localization and Recognition  Network\nCompact Neural Networks based on the Multiscale Entanglement  Renormalization Ansatz\nPicasso, Matisse, or a Fake? 
Automated Analysis of Drawings at the  Stroke Level for Attribution and Authentication\nWhat Really is Deep Learning Doing?\nArrhythmia Classification from the Abductive Interpretation of Short  Single-Lead ECG Records\nCommonsense LocatedNear Relation Extraction\nRobust Keyframe-based Dense SLAM with an RGB-D Camera\nDynamic Zoom-in Network for Fast Object Detection in Large Images\nA Correlation Based Feature Representation for First-Person Activity  Recognition\nFast and Efficient Calculations of Structural Invariants of Chirality\nLearning to Compare: Relation Network for Few-Shot Learning\nLanguage-Based Image Editing with Recurrent Attentive Models\nImprovements to context based self-supervised learning\nTowards dense volumetric pancreas segmentation in CT using 3D fully  convolutional networks\nHigh-Resolution Deep Convolutional Generative Adversarial Networks\nDependent landmark drift: robust point set registration based on the  Gaussian mixture model with a statistical shape model\nLearning Discriminative Affine Regions via Discriminability\nAn Automatic Solver for Very Large Jigsaw Puzzles Using Genetic  Algorithms\nStyle Transfer in Text: Exploration and Evaluation\nLearning Steerable Filters for Rotation Equivariant CNNs\nDetection of Tooth caries in Bitewing Radiographs using Deep Learning\nSelf-Similarity Based Time Warping\nFully Convolutional Neural Networks for Page Segmentation of Historical  Document Images\nThe Application of Preconditioned Alternating Direction Method of  Multipliers in Depth from Focal Stack\nReceptive Field Block Net for Accurate and Fast Object Detection\nVisual and Textual Sentiment Analysis Using Deep Fusion Convolutional  Neural Networks\nGenerating Analytic Insights on Human Behaviour using Image Processing\nPersonalization of Saliency Estimation\nShift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions\nAlignedReID: Surpassing Human-Level Performance in Person  Re-Identification\nOn the Automatic 
Generation of Medical Imaging Reports\nSolarisNet: A Deep Regression Network for Solar Radiation Prediction\nIn Defense of Product Quantization\nDNN-Buddies: A Deep Neural Network-Based Estimation Metric for the  Jigsaw Puzzle Problem\nDeepPainter: Painter Classification Using Deep Convolutional  Autoencoders\nSplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels\nUnsupervised Domain Adaptation with Similarity Learning\nNatural and Effective Obfuscation by Head Inpainting\nTransfer Learning in CNNs Using Filter-Trees\nDeepBrain: Functional Representation of Neural In-Situ Hybridization  Images for Gene Ontology Classification Using Deep Convolutional Autoencoders\nEvaluating gender portrayal in Bangladeshi TV\nParticle Filter Re-detection for Visual Tracking via Correlation Filters\nAttnGAN: Fine-Grained Text to Image Generation with Attentional  Generative Adversarial Networks\nPSIque: Next Sequence Prediction of Satellite Images using a  Convolutional Sequence-to-Sequence Network\nImage2Mesh: A Learning Framework for Single Image 3D Reconstruction\nTransfer Learning with Binary Neural Networks\nVideo Captioning via Hierarchical Reinforcement Learning\nConvNets and ImageNet Beyond Accuracy: Explanations, Bias Detection,  Adversarial Examples and Model Criticism\nNeural Signatures for Licence Plate Re-identification\nPropagating Uncertainty in Multi-Stage Bayesian Convolutional Neural  Networks with Application to Pulmonary Nodule Detection\nProgressive Neural Architecture Search\nEvaluation of Alzheimer's Disease by Analysis of MR Images using  Multilayer Perceptrons and Kohonen SOM Classifiers as an Alternative to the  ADC Maps\nIncorporating External Knowledge to Answer Open-Domain Visual Questions  with Dynamic Memory Networks\nMultimodal Visual Concept Learning with Weakly Supervised Techniques\nSemi-Global Stereo Matching with Surface Orientation Priors\nComposite Quantization\nConnecting Pixels to Privacy and Utility: Automatic Redaction 
of Private  Information in Images\nLearning by Asking Questions\nExamining Cooperation in Visual Dialog Models\n3D Semantic Trajectory Reconstruction from 3D Pixel Continuum\nMultimodal Storytelling via Generative Adversarial Imitation Learning\nGeneralization of Deep Neural Networks for Chest Pathology  Classification in X-Rays Using Generative Adversarial Networks\nTriagem virtual de imagens de imuno-histoquímica usando redes neurais  artificiais e espectro de padrões\nTech Report: A Fast Multiscale Spatial Regularization for Sparse  Hyperspectral Unmixing\nLearning General Latent-Variable Graphical Models with Predictive Belief  Propagation and Hilbert Space Embeddings\nGenerative Adversarial Perturbations\nTomographic Reconstruction using Global Statistical Prior\nCNNs are Globally Optimal Given Multi-Layer Support\nBroadcasting Convolutional Network for Visual Relational Reasoning\nIn-Place Activated BatchNorm for Memory-Optimized Training of DNNs\nAdaComp : Adaptive Residual Gradient Compression for Data-Parallel  Distributed Training\nIncremental Learning in Deep Convolutional Neural Networks Using Partial  Network Sharing\nEnd-to-end Learning of Deterministic Decision Trees\nStochastic reconstruction of an oolitic limestone by generative  adversarial networks\nImage Inpainting for High-Resolution Textures using CNN Texture  Synthesis\nPeephole: Predicting Network Performance Before Training\nGeometry Guided Adversarial Facial Expression Synthesis\nA practical guide and software for analysing pairwise comparison  experiments\nLearning Surrogate Models of Document Image Quality Metrics for  Automated Document Image Processing\nLearning Modality-Invariant Representations for Speech and Images\nFusing Multiple Multiband Images\nDeep Quaternion Networks\nRegularization and Optimization strategies in Deep Convolutional Neural  Network\nMultidimensional Data Tensor Sensing for RF Tomographic Imaging\nLearning Compact Recurrent Neural Networks with Block-Term 
Tensor  Decomposition\nSemi-Automatic Algorithm for Breast MRI Lesion Segmentation Using  Marker-Controlled Watershed Transformation\nObject Detection with an Aligned Spatial-Temporal Memory\nMulti-modal Face Pose Estimation with Multi-task Manifold Deep Learning\nDynamic Weight Alignment for Convolutional Neural Networks\nMining Point Cloud Local Structures by Kernel Correlation and Graph  Pooling\nMulti-shot Pedestrian Re-identification via Sequential Decision Making\nFinding Competitive Network Architectures Within a Day Using UCT\nAn Order Preserving Bilinear Model for Person Detection in Multi-Modal  Data\nAn Incremental Self-Organizing Architecture for Sensorimotor Learning  and Prediction\nInterpretable Counting for Visual Question Answering\nTowards Structured Analysis of Broadcast Badminton Videos\nA model for interpreting social interactions in local image regions\nConsensus-based Sequence Training for Video Captioning\nReport: Dynamic Eye Movement Matching and Visualization Tool in Neuro  Gesture\nDeep Learning Interior Tomography for Region-of-Interest Reconstruction\nTransfer learning for diagnosis of congenital abnormalities of the  kidney and urinary tract in children based on Ultrasound imaging data\nQuality assessment metrics for edge detection and edge-aware filtering:  A tutorial review\nAutomated image segmentation for detecting cell spreading for  metastasizing assessments of cancer development\nA Novel Approach to Skew-Detection and Correction of English Alphabets  for OCR\nLive Intrinsic Material Estimation\nFingerprint Distortion Rectification using Deep Convolutional Neural  Networks\nDeep Learning Reconstruction for 9-View Dual Energy CT Baggage Scanner\n3D Surface-to-Structure Translation using Deep Convolutional Networks\nLow-dose spectral CT reconstruction using L0 image gradient and tensor  dictionary\nAccelerated Training for Massive Classification via Dynamic Class  Selection\nImproved Style Transfer by Respecting Inter-layer 
Correlations\nCross-modal Embeddings for Video and Audio Retrieval\nAnatomical Data Augmentation For CNN based Pixel-wise Classification\nSynthetic Data Augmentation using GAN for Improved Liver Lesion  Classification\nUnsupervised Discovery of Toxoplasma gondii Motility Phenotypes\nGenerative Sensing: Transforming Unreliable Sensor Data for Reliable  Recognition\nA Benchmark for Breast Ultrasound Image Segmentation (BUSIS)\nFWLBP: A Scale Invariant Descriptor for Texture Classification\nFocus: Querying Large Video Datasets with Low Latency and Low Cost\nNon-Rigid Image Registration Using Self-Supervised Fully Convolutional  Networks without Training Data\nA Bio-inspired Collision Detecotr for Small Quadcopter\nFix your classifier: the marginal value of training the last weight  layer\nFace Recognition via Centralized Coordinate Learning\n3D CNN-based classification using sMRI and MD-DTI images for Alzheimer  disease studies\nFully Point-wise Convolutional Neural Network for Modeling Statistical  Regularities in Natural Images\nTransfer Learning for Improving Speech Emotion Classification Accuracy\nAn Improved LPTC Neural Model for Background Motion Direction Estimation\nDemonstrably Doing Accountability in the Internet of Things\nClustering with Deep Learning: Taxonomy and New Methods\nA Classification Refinement Strategy for Semantic Segmentation\nMAttNet: Modular Attention Network for Referring Expression  Comprehension\nA Rapidly Deployable Classification System using Visual Data for the  Application of Precision Weed Management\nImage2GIF: Generating Cinemagraphs using Recurrent Deep Q-Networks\nDeepSIC: Deep Semantic Image Compression\nRGB image-based data analysis via discrete Morse theory and persistent  homology\nSpherical CNNs\nConvCSNet: A Convolutional Compressive Sensing Framework Based on Deep  Learning\nModel compression for faster structural separation of macromolecules  captured by Cellular Electron Cryo-Tomography\nSemantic White Balance: Semantic 
Color Constancy Using Convolutional  Neural Network\nAPPLE Picker: Automatic Particle Picking, a Low-Effort Cryo-EM Framework\nVisual Interpretability for Deep Learning: a Survey\nDensely Connected Bidirectional LSTM with Applications to Sentence  Classification\nPose Flow: Efficient Online Pose Tracking\nFast Piecewise-Affine Motion Estimation Without Segmentation\nA Log-Euclidean and Total Variation based Variational Framework for  Computational Sonography\nA Systematic Analysis for State-of-the-Art 3D Lung Nodule Proposals  Generation\nSeeded Ising Model and Statistical Natures of Human Iris Templates\nEnergy-Efficient CMOS Memristive Synapses for Mixed-Signal Neuromorphic  System-on-a-Chip\nAn Unsupervised Learning Model for Deformable Medical Image Registration\nGoing Deeper in Spiking Neural Networks: VGG and Residual Architectures\nPPFNet: Global Context Aware Local Features for Robust 3D Point Matching\nSaliency-Enhanced Robust Visual Tracking\nPiecewise Flat Embedding for Image Segmentation\nNature vs. 
Nurture: The Role of Environmental Resources in Evolutionary  Deep Intelligence\nADC: Automated Deep Compression and Acceleration with Reinforcement  Learning\nAnswerer in Questioner's Mind for Goal-Oriented Visual Dialogue\nDeep feature compression for collaborative object detection\nTemporal and Volumetric Denoising via Quantile Sparse Image (QuaSI)  Prior in Optical Coherence Tomography and Beyond\nDCFNet: Deep Neural Network with Decomposed Convolutional Filters\nModelling of Facial Aging and Kinship: A Survey\nWeb-Scale Responsive Visual Search at Bing\nISEC: Iterative over-Segmentation via Edge Clustering\nA New De-blurring Technique for License Plate Images with Robust Length  Estimation\nExact and Consistent Interpretation for Piecewise Linear Neural  Networks: A Closed Form Solution\nEfficient Sparse-Winograd Convolutional Neural Networks\nMachine Learning Methods for Solving Assignment Problems in Multi-Target  Tracking\nTeaching Categories to Human Learners with Visual Explanations\nSegmentation hiérarchique faiblement supervisée\nFusing Video and Inertial Sensor Data for Walking Person Identification\nTransport-Based Pattern Theory: A Signal Transformation Approach\nEmergence of Structured Behaviors from Curiosity-Based Intrinsic  Motivation\nLiver Segmentation in Abdominal CT Images by Adaptive 3D Region Growing\nMultimodal Explanations: Justifying Decisions and Pointing to the  Evidence\nVizWiz Grand Challenge: Answering Visual Questions from Blind People\nTensor Field Networks: Rotation- and Translation-Equivariant Neural  Networks for 3D Point Clouds\nSPLATNet: Sparse Lattice Networks for Point Cloud Processing\nSleep-deprived Fatigue Pattern Analysis using Large-Scale Selfies from  Social Med\nA Twofold Siamese Network for Real-Time Object Tracking\nReHAR: Robust and Efficient Human Activity Recognition\nCoarse to fine non-rigid registration: a chain of scale-specific neural  networks for multimodal image alignment with application to remote 
sensing\nGraph-based Image Anomaly Detection\n3D Object Super-Resolution\nCompressing Neural Networks using the Variational Information Bottleneck\nContext-Aware Learning using Transferable Features for Classification of  Breast Cancer Histology Images\nFast and robust misalignment correction of Fourier ptychographic  microscopy\nFusion of multispectral satellite imagery using a cluster of graphics  processing unit\nDriving Digital Rock towards Machine Learning: predicting permeability  with Gradient Boosting and Deep Neural Networks\nPose-Robust Face Recognition via Deep Residual Equivariant Mapping\nGreedy stochastic algorithms for entropy-regularized optimal transport  problems\nLess Is More: Picking Informative Frames for Video Captioning\nPath Aggregation Network for Instance Segmentation\nThe Contextual Loss for Image Transformation with Non-Aligned Data\n2^B3^C: 2 Box 3 Crop of Facial Image for Gender Classification with  Convolutional Networks\nMethodology to analyze the accuracy of 3D objects reconstructed with  collaborative robot based monocular LSD-SLAM\nDeep Thermal Imaging: Proximate Material Type Recognition in the Wild  through Deep Learning of Spatial Surface Temperature Patterns\nFast and Accurate Semantic Mapping through Geometric-based Incremental  Segmentation\nLearning Effective Binary Visual Representations with Deep Networks\nMeasuring Conflict in a Multi-Source Environment as a Normal Measure\nImage Segmentation and Processing for Efficient Parking Space Analysis\nResource aware design of a deep convolutional-recurrent neural network  for speech recognition through audio-visual sensor fusion\nExpert identification of visual primitives used by CNNs during mammogram  classification\nAveraging Weights Leads to Wider Optima and Better Generalization\nComputer-aided diagnosis of lung carcinoma using deep learning - a pilot  study\nTargeted change detection in remote sensing images\nExploring Linear Relationship in Feature Map Subspace for 
ConvNets  Compression\nI Know What You See: Power Side-Channel Attack on Convolutional Neural  Network Accelerators\nVirtual CNN Branching: Efficient Feature Ensemble for Person  Re-Identification\nToolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey  and Future Directions\nSemantic Segmentation of Pathological Lung Tissue with Dilated Fully  Convolutional Networks\nJoint Recognition of Handwritten Text and Named Entities with a Neural  End-to-end Model\nAdaptive strategy for superpixel-based region-growing image segmentation\nFusion of an Ensemble of Augmented Image Detectors for Robust Object  Detection\nFacial Landmarks Detection by Self-Iterative Regression based  Landmarks-Attention Network\nDepth-aware CNN for RGB-D Segmentation\nFeatureless: Bypassing feature extraction in action categorization\nLive Target Detection with Deep Learning Neural Network and Unmanned  Aerial Vehicle on Android Mobile Device\nDiagnostic Classification Of Lung Nodules Using 3D Neural Networks\n3D Point Cloud Denoising using Graph Laplacian Regularization of a Low  Dimensional Manifold Model\nResidual Codean Autoencoder for Facial Attribute Analysis\nA Distance Oriented Kalman Filter Particle Swarm Optimizer Applied to  Multi-Modality Image Registration\nVQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual  Questions\nIntPhys: A Framework and Benchmark for Visual Intuitive Physics  Reasoning\nProduct Characterisation towards Personalisation: Learning Attributes  from Unstructured Data to Recommend Fashion Products\nA Feature-Driven Active Framework for Ultrasound-Based Brain Shift  Compensation\nFast Semantic Segmentation on Video Using Motion Vector-Based Feature  Interpolation\nZero-shot Recognition via Semantic Embeddings and Knowledge Graphs\nDeep Learning using Rectified Linear Units (ReLU)\nAutomated Detection of Acute Leukemia using K-mean Clustering Algorithm\nHardware based Spatio-Temporal Neural Processing Backend for Imaging  Sensors: 
Towards a Smart Camera\nFace Recognition with Hybrid Efficient Convolution Algorithms on FPGAs\nPredicting Gaze in Egocentric Video by Learning Task-dependent Attention  Transition\nA Face Recognition Signature Combining Patch-based Features with Soft  Facial Attributes\nFast and Accurate Single Image Super-Resolution via Information  Distillation Network\nEfficient Image Dataset Classification Difficulty Estimation for  Predicting Deep-Learning Accuracy\nA disciplined approach to neural network hyper-parameters: Part 1 --  learning rate, batch size, momentum, and weight decay\nNeural Baby Talk\nDiversity Regularized Spatiotemporal Attention for Video-based Person  Re-identification\nDiagonalwise Refactorization: An Efficient Training Method for Depthwise  Convolutions\nA Fast Face Detection Method via Convolutional Neural Network\nFeed-forward Uncertainty Propagation in Belief and Neural Networks\nWeakly-Supervised Action Segmentation with Iterative Soft Boundary  Assignment\nAdversarial Network Compression\nSocial GAN: Socially Acceptable Trajectories with Generative Adversarial  Networks\nA real-time warning system for rear-end collision based on random forest  classifier\nDetection of Structural Change in Geographic Regions of Interest by Self  Organized Mapping: Las Vegas City and Lake Mead across the Years\nUnsupervised Textual Grounding: Linking Words to Image Concepts\nTwo can play this Game: Visual Dialog with Discriminative Question  Generation and Answering\nInterpretable and Globally Optimal Prediction for Textual Grounding  using Image Concepts\nParallel Grid Pooling for Data Augmentation\nOn the Resistance of Neural Nets to Label Noise\nA Subpixel Registration Algorithm for Low PSNR Images\nGated Fusion Network for Single Image Dehazing\nEnd-to-End Detection and Re-identification Integrated Net for Person  Search\nGenerative Spatiotemporal Modeling Of Neutrophil Behavior\nEnd-to-End Learning of Motion Representation for Video Understanding\nTowards 
Explanation of DNN-based Prediction with Guided Feature  Inversion\nUniversal Planning Networks\nDynamic Video Segmentation Network\nIn-depth Question classification using Convolutional Neural Networks\nLooking at Hands in Autonomous Vehicles: A ConvNet Approach using Part  Affinity Fields\nJointly Discovering Visual Objects and Spoken Words from Raw Sensory  Input\nImage Generation from Scene Graphs\nFinding beans in burgers: Deep semantic-visual embedding with  localization\nOrdinal Pooling Networks: For Preserving Information over Shrinking  Feature Maps\nA Generation Method of Immunological Memory in Clonal Selection  Algorithm by using Restricted Boltzmann Machines\nComposing photomosaic images using clustering based evolutionary  programming\nAbdominal Aortic Aneurysm Segmentation with a Small Number of Training  Subjects\nBlazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning\nCortex Neural Network: learning with Neural Network groups\nA Fast Hierarchically Preconditioned Eigensolver Based On  Multiresolution Matrix Decomposition\nParameterized Algorithms for the Matrix Completion Problem\nImagine This! 
Scripts to Compositions to Videos\nMaking Deep Heatmaps Robust to Partial Occlusions for 3D Object Pose  Estimation\nSentiment Transfer using Seq2Seq Adversarial Autoencoders\nLearning to Extract a Video Sequence from a Single Motion-Blurred Image\nA two-stage 3D Unet framework for multi-class segmentation on full  resolution image\nLearned Deformation Stability in Convolutional Neural Networks\nGroup Anomaly Detection using Deep Generative Models\nUnderstanding Design Fundamentals: How Synthesis and Analysis Drive  Creativity, Resulting in Emergence\nGeneralized Discriminant Analysis algorithm for feature reduction in  Cyber Attack Detection System\nMapping the spatiotemporal dynamics of calcium signaling in cellular  neural networks using optical flow\nFish recognition based on the combination between robust feature  selection, image segmentation and geometrical parameter techniques using  Artificial Neural Network and Decision Tree\nPerturbation Resilience and Superiorization of Iterative Algorithms\nMemory-Efficient Topic Modeling\nLarge-scale continuous subgraph queries on streams\nDistributed optimization of deeply nested systems\nTensor-based formulation and nuclear norm regularization for  multi-energy computed tomography\nMedical Aid for Automatic Detection of Malaria\nParticle methods enable fast and simple approximation of Sobolev  gradients in image segmentation\nExploring the power of GPU's for training Polyglot language models\nCode Generation for High-Level Synthesis of Multiresolution Applications  on FPGAs\nSingle Image Super Resolution via Manifold Approximation\nComparative Evaluation of Symmetric SVD Algorithms for Real-time Face  and Eye Tracking\nAn Empirical Evaluation of Current Convolutional Architectures' Ability  to Manage Nuisance Location and Scale Variability\nBayesian Time-of-Flight for Realtime Shape, Illumination and Albedo\nAdapted sampling for 3D X-ray computed tomography\nBinaryConnect: Training Deep Neural Networks with binary 
weights during  propagations\nA Survey of the Trends in Facial and Expression Recognition Databases  and Methods\nAsk, Attend and Answer: Exploring Question-Guided Spatial Attention for  Visual Question Answering\nIntegrating Deep Features for Material Recognition\nA method for locally approximating regularized iterative tomographic  reconstruction methods\nGPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring\nEpipolar Geometry Based On Line Similarity\nDeep Adaptive Network: An Efficient Deep Neural Network with Sparse  Binary Connections\nPruning Filters for Efficient ConvNets\nSocio-inspired ICT - Towards a socially grounded society-ICT symbiosis\nMulti Circle Detection on Images Using Artificial Bee Colony (ABC)  Optimization\nA role for recurrent processing in object completion:  neurophysiological, psychophysical and computational\"evidence\nHighly Efficient Forward and Backward Propagation of Convolutional  Neural Networks for Pixelwise Classification\nLearning of Proto-object Representations via Fixations on Low Resolution\nA Holistic Approach for Modeling and Synthesis of Image Processing  Applications for Heterogeneous Computing Architectures\nAnatomy-specific classification of medical images using deep  convolutional nets\nHomogeneous Spiking Neuromorphic System for Real-World Pattern  Recognition\nDeep SimNets\nDeepOrgan: Multi-level Deep Convolutional Networks for Automated  Pancreas Segmentation\nEstimating Absolute-Phase Maps Using ESPIRiT and Virtual Conjugate Coils\n3-D/2-D Registration of Cardiac Structures by 3-D Contrast Agent  Distribution Estimation\nComparative evaluation of state-of-the-art algorithms for SSVEP-based  BCIs\nStorm Detection by Visual Learning Using Satellite Images\nKernelized Weighted SUSAN based Fuzzy C-Means Clustering for Noisy Image  Segmentation\nRobust and Globally Optimal Manhattan Frame Estimation in Near Real Time\nDeep Roots: Improving CNN Efficiency with Hierarchical Filter Groups\nZNNi - 
Maximizing the Inference Throughput of 3D Convolutional Networks  on Multi-Core CPUs and GPUs\nJoint M-Best-Diverse Labelings as a Parametric Submodular Minimization\nRandom Walk Graph Laplacian based Smoothness Prior for Soft Decoding of  JPEG Images\nThe Projected Power Method: An Efficient Algorithm for Joint Alignment  from Pairwise Differences\nComprehensive Evaluation of OpenCL-based Convolutional Neural Network  Accelerators in Xilinx and Altera FPGAs\nMultispectral image denoising with optimized vector non-local mean  filter\nDeep fusion of visual signatures for client-server facial analysis\nHierarchical Object Detection with Deep Reinforcement Learning\nMeasuring and modeling the perception of natural and unconstrained gaze  in humans and machines\nTemporal-Needle: A view and appearance invariant video descriptor\nAn extended Perona-Malik model based on probabilistic models\nCamera-trap images segmentation using multi-layer robust principal  component analysis\nDeep Learning the Indus Script\nIntrinsic Grassmann Averages for Online Linear and Robust Subspace  Learning\nAn Efficient Decomposition Framework for Discriminative Segmentation  with Supermodular Losses\nFast and Accurate Inference with Adaptive Ensemble Prediction in Image  Classification with Deep Neural Networks\nMining Object Parts from CNNs via Active Question-Answering\nFast PET reconstruction using Multi-scale Fully Convolutional Neural  Networks\nMatrix Completion via Factorizing Polynomials\nExploring Computation-Communication Tradeoffs in Camera Systems\nFast and Accurate Image Super Resolution by Deep CNN with Skip  Connection and Network in Network\nFull-Network Embedding in a Multimodal Embedding Pipeline\nCross-Media Similarity Evaluation for Web Image Retrieval in the Wild\nStatistical learning of spatiotemporal patterns from longitudinal  manifold-valued networks\nSkin Lesion Segmentation: U-Nets versus Clustering\nDiscovery Radiomics with CLEAR-DR: Interpretable Computer Aided  
Diagnosis of Diabetic Retinopathy\nEvolving Deep Convolutional Neural Networks for Image Classification\nA Pose-Sensitive Embedding for Person Re-Identification with Expanded  Cross Neighborhood Re-Ranking\nOn Usage of Autoencoders and Siamese Networks for Online Handwritten  Signature Verification\nEfficient Trimmed Convolutional Arithmetic Encoding for Lossless Image  Compression\nRecognizing Cuneiform Signs Using Graph Based Methods\nLDOP: Local Directional Order Pattern for Robust Face Retrieval\nImage Recognition Using Scale Recurrent Neural Networks\nFPGA Implementations of 3D-SIMD Processor Architecture for Deep Neural  Networks Using Relative Indexed Compressed Sparse Filter Encoding Format and  Stacked Filters Stationary Flow\nAccelerating Materials Development via Automation, Machine Learning, and  High-Performance Computing\nOn the Computation of Kantorovich-Wasserstein Distances between  2D-Histograms by Uncapacitated Minimum Cost Flows\nHybrid Binary Networks: Optimizing for Accuracy, Efficiency and Memory\nGlobal registration of multiple point clouds using semidefinite  programming\nAn Empirical Study into Annotator Agreement, Ground Truth Estimation,  and Algorithm Evaluation\nNumerical Methods for Coupled Reconstruction and Registration in Digital  Breast Tomosynthesis\nCalibration of an Articulated Camera System with Scale Factor Estimation\nStable Camera Motion Estimation Using Convex Programming\nA fast eikonal equation solver using the Schrodinger wave equation\nBag of Visual Words and Fusion Methods for Action Recognition:  Comprehensive Study and Good Practice\nSingle camera pose estimation using Bayesian filtering and Kinect motion  priors\nRobust Temporally Coherent Laplacian Protrusion Segmentation of 3D  Articulated Bodies\nA Data-Driven Approach for Tag Refinement and Localization in Web Videos\nPISA: Pixelwise Image Saliency by Aggregating Complementary Appearance  Contrast Measures with Edge-Preserving Coherence\nBrain Tumor 
Segmentation with Deep Neural Networks\nFace Search at Scale: 80 Million Gallery\nSupervised Dictionary Learning and Sparse Representation-A Review\nVideo Inpainting of Complex Scenes\nA Practical Guide to CNNs and Fisher Vectors for Image Instance  Retrieval\nEIE: Efficient Inference Engine on Compressed Deep Neural Network\nFast Training of Triplet-based Deep Binary Embedding Networks\nAdaptive foveated single-pixel imaging with dynamic super-sampling\nVisual Dialog\nAsynchronous approach in the plane: A deterministic polynomial algorithm\nA Computer Vision Approach To Identify Einstein Rings And Arcs\nTexture Classification of MR Images of the Brain in ALS using CoHOG\nClusterNet: Detecting Small Objects in Large Scenes by Exploiting  Spatio-Temporal Information\nBeyond Counting: Comparisons of Density Maps for Crowd Analysis Tasks -  Counting, Detection, and Tracking\nBinary Patterns Encoded Convolutional Neural Networks for Texture  Recognition and Remote Sensing Scene Classification\nMultimedia Semantic Integrity Assessment Using Joint Embedding Of Images  And Text\nMulti-View Stereo with Single-View Semantic Mesh Refinement\nExploring and Exploiting Diversity for Image Segmentation\nMulti-Label Zero-Shot Human Action Recognition via Joint Latent  Embedding\nMulti-label Pixelwise Classification for Reconstruction of Large-scale  Urban Areas\nAdversarial Attacks Beyond the Image Space\nLSTM Pose Machines\nA Semi-Supervised Two-Stage Approach to Learning from Noisy Labels\nThe History Began from AlexNet: A Comprehensive Survey on Deep Learning  Approaches\nA New Target-specific Object Proposal Generation Method for Visual  Tracking\nMulti-label Learning with Missing Labels using Mixed Dependency Graphs\nLearning a Robust Society of Tracking Parts using Co-occurrence  Constraints\nInformation Compression, Intelligence, Computing, and Mathematics\nMontblanc: GPU accelerated Radio Interferometer Measurement Equations in  support of Bayesian Inference for Radio 
Observations\nTraining CNNs with Low-Rank Filters for Efficient Image Classification\nNiftyNet: a deep-learning platform for medical imaging\nInducing Features of Random Fields\nClustering by compression\nThe Perceptron Algorithm: Image and Signal Decomposition, Compression,  and Analysis by Iterative Gaussian Blurring\nSimilarity of Objects and the Meaning of Words\nLinear versus Non-linear Acquisition of Step-Functions\nA computational theory for the classification of natural biosonar  targets based on a spike code\nPolygon Exploration with Time-Discrete Vision\nA branch-and-bound feature selection algorithm for U-shaped cost  functions\nThe Modular Audio Recognition Framework (MARF) and its Applications:  Scientific and Software Engineering Notes\nHandwritten Farsi Character Recognition using Artificial Neural Network\nIterative Shrinkage Approach to Restoration of Optical Imagery\nDistributed Object Medical Imaging Model\nAn Innovative Scheme For Effectual Fingerprint Data Compression Using  Bezier Curve Representations\nBiogeography based Satellite Image Classification\nA Topological derivative based image segmentation for sign language  recognition system using isotropic filter\nMulti-camera Realtime 3D Tracking of Multiple Flying Animals\nA GA based Window Selection Methodology to Enhance Window based Multi  wavelet transformation and thresholding aided CT image denoising technique\nEmploying fuzzy intervals and loop-based methodology for designing  structural signature: an application to symbol recognition\nBiometric Authentication using Nonparametric Methods\nBiometric Authentication using Nonparametric Methods\n3-D Rigid Models from Partial Views - Global Factorization\nFrom Social Simulation to Integrative System Design\nSelf-Organising Stochastic Encoders\nExact Reconstruction of the Rank Order Coding using Frames Theory\nExploratory simulation of an Intelligent Iris Verifier Distributed  System\nLearning Shape and Texture Characteristics of CT 
Tree-in-Bud Opacities  for CAD Systems\nFacial Expression Classification Based on Multi Artificial Neural  Network and Two Dimensional Principal Component Analysis\n3-Phase Recognition Approach to Pseudo 3D Building Generation from 2D  Floor Plan\nAutomatic Application Level Set Approach in Detection Calcifications in  Mammographic Image\nTopology on locally finite metric spaces\nA New Local Adaptive Thresholding Technique in Binarization\nScene Parsing with Multiscale Feature Learning, Purity Trees, and  Optimal Covers\nDivide-and-Conquer Method for L1 Norm Matrix Factorization in the  Presence of Outliers and Missing Data\nOn-Board Visual Tracking with Unmanned Aircraft System (UAS)\nExtraction of Facial Feature Points Using Cumulative Histogram\nBayesian Parameter Estimation for Latent Markov Random Fields and Social  Networks\nHeterogeneous Highly Parallel Implementation of Matrix Exponentiation  Using GPU\nFunctional Currents : a new mathematical tool to model and analyse  functional shapes\nFace Recognition Algorithms based on Transformed Shape Features\nJoint-ViVo: Selecting and Weighting Visual Words Jointly for  Bag-of-Features based Tissue Classification in Medical Images\nEGovernment Stage Model: Evaluating the Rate of Web Development Progress  of Government Websites in Saudi Arabia\nMulti-Sensor Fusion via Reduction of Dimensionality\nMultislice Modularity Optimization in Community Detection and Image  Segmentation\nTraining Support Vector Machines Using Frank-Wolfe Optimization Methods\nAutonomous Navigation by Robust Scan Matching Technique\nAn Analysis of Gene Expression Data using Penalized Fuzzy C-Means  Approach\nMorphological Analusis Of The Left Ventricular Eendocardial Surface  Using A Bag-Of-Features Descriptor\nThe State of the Art Recognize in Arabic Script through Combination of  Online and Offline\nGeometric tree kernels: Classification of COPD from airway tree geometry\nCompressive Sensing of Sparse Tensors\nDiscriminative extended 
canonical correlation analysis for pattern set  matching\nAutomated Thermal Face recognition based on Minutiae Extraction\nImprovements to deep convolutional neural networks for LVCSR\nScan-based Compressed Terahertz Imaging and Real-Time Reconstruction via  the Complex-valued Fast Block Sparse Bayesian Learning Algorithm\nAn Application of Backpropagation Artificial Neural Network Method for  Measuring The Severity of Osteoarthritis\nDynamic Model of Facial Expression Recognition based on Eigen-face  Approach\nA robust Iris recognition method on adverse conditions\nGenerative NeuroEvolution for Deep Learning\nMonte Carlo non local means: Random sampling for large-scale image  filtering\nPectoral Muscles Suppression in Digital Mammograms using Hybridization  of Soft Computing Methods\nA Study of Image Analysis with Tangent Distance\nAn Identification System Using Eye Detection Based On Wavelets And  Neural Networks\nMulti-Directional Multi-Level Dual-Cross Patterns for Robust Face  Recognition\nGeometry-based Adaptive Symbolic Approximation for Fast Sequence  Matching on Manifolds\nCollaborative Verification-Driven Engineering of Hybrid Systems\nGaussian-Chain Filters for Heavy-Tailed Noise with Application to  Detecting Big Buyers and Big Sellers in Stock Market\nNewton-Type Iterative Solver for Multiple View $L2$ Triangulation\nCoarse-to-Fine Classification via Parametric and Nonparametric Models  for Computer-Aided Diagnosis\nLarge-scale Supervised Hierarchical Feature Learning for Face  Recognition\nNovel and Tuneable Method for Skin Detection Based on Hybrid Color Space  and Color Statistical Features\nToward Automated Discovery of Artistic Influence\n2D View Aggregation for Lymph Node Detection Using a Shallow Hierarchy  of Linear Classifiers\nA Fusion Approach for Efficient Human Skin Detection\nZero-Aliasing Correlation Filters for Object Recognition\nComputational Baby Learning\nPredictive Encoding of Contextual Relationships for Perceptual  Inference, 
Interpolation and Prediction\nThe Treasure beneath Convolutional Layers: Cross-convolutional-layer  Pooling for Image Classification\nSkincure: An Innovative Smart Phone-Based Application To Assist In  Melanoma Early Detection And Prevention\nFrom Visual Attributes to Adjectives through Decompositional  Distributional Semantics\nFeature Selection based on Machine Learning in MRIs for Hippocampal  Segmentation\nCoupled Depth Learning\nImage Denoising using Optimally Weighted Bilateral Filters: A Sure and  Fast Approach\nSelf-Expressive Decompositions for Matrix Approximation and Clustering\nEfficient Large Scale Video Classification\nPoseNet: A Convolutional Network for Real-Time 6-DOF Camera  Relocalization\nApproximate Fisher Kernels of non-iid Image Models for Image  Categorization\nSome like it hot - visual guidance for preference prediction\nGrounding of Textual Phrases in Images by Reconstruction\nSymbol Grounding Association in Multimodal Sequences with Missing  Elements\nSequential Optimization for Efficient High-Quality Object Proposal  Generation\nDeep learning is a good steganalysis tool when embedding key is reused  for different images, even if there is a cover source-mismatch\nHow much data is needed to train a medical image deep learning system to  achieve necessary high accuracy?\nUnsupervised decoding of long-term, naturalistic human neural recordings  with automated video and audio annotations\nA Semi-Lagrangian two-level preconditioned Newton-Krylov solver for  constrained diffeomorphic image registration\nVideo Analysis for Body-worn Cameras in Law Enforcement\nFacial expression recognition based on local region specific features  and support vector machines\nFaster CNNs with Direct Sparse Convolutions and Guided Pruning\nOpenCL-accelerated object classification in video streams using Spatial  Pooler of Hierarchical Temporal Memory\nDetecting Sarcasm in Multimodal Social Platforms\nSimilarity Search on Automata Processors\nPVANET: Deep but 
Lightweight Neural Networks for Real-time Object  Detection\nFace Shape and Reflectance Acquisition using a Multispectral Light Stage\nFast O(1) bilateral filtering using trigonometric range kernels\nPOCS Based Super-Resolution Image Reconstruction Using an Adaptive  Regularization Parameter\nDiscretization of Parametrizable Signal Manifolds\nEmbedding of Blink Frequency in Electrooculography Signal using  Difference Expansion based Reversible Watermarking Technique\nThe varifold representation of non-oriented shapes for diffeomorphic  registration\nLocal image registration a comparison for bilateral registration  mammography\nDetection of copy-move forgery in digital images based on DCT\nDiscriminative Parameter Estimation for Random Walks Segmentation\nSemantic Graph for Zero-Shot Learning\nNatural Color Image Enhancement based on Modified Multiscale Retinex  Algorithm and Performance Evaluation usingWavelet Energy\nMinimizing the Number of Matching Queries for Object Retrieval\nGenerative Modeling of Convolutional Neural Networks\nTraining Deep Neural Networks on Noisy Labels with Bootstrapping\nBi-directional Shape Correspondences (BSC): A Novel Technique for 2-d  Shape Warping in Quadratic Time?\nA specialized face-processing network consistent with the  representational geometry of monkey face patches\nOut-of-sample generalizations for supervised manifold learning for  classification\nStudy on Sparse Representation based Classification for Biometric  Verification\nNode.DPWS: High performance and scalable Web Services for the IoT\nParallel Statistical Multi-resolution Estimation\nLow-Level Features for Image Retrieval Based on Extraction of  Directional Binary Patterns and Its Oriented Gradients Histogram\nSkilled Impostor Attacks Against Fingerprint Verification Systems And  Its Remedy\nAutonomy Infused Teleoperation with Application to BCI Manipulation\nReduced Basis Decomposition: a Certified and Fast Lossy Data Compression  Algorithm\nSocializing the 
Semantic Gap: A Comparative Survey on Image Tag  Assignment, Refinement and Retrieval\nUser Preferences Modeling and Learning for Pleasing Photo Collage  Generation\nColor Constancy by Learning to Predict Chromaticity from Luminance\nConvergence rates for pretraining and dropout: Guiding learning  parameters using network structure\nStereoscopic Cinema\nThe Multi-Strand Graph for a PTZ Tracker\nEnd-to-End Privacy for Open Big Data Markets\nMultiresolution Approach to Acceleration of Iterative Image  Reconstruction for X-Ray Imaging for Security Applications\nRecursive Training of 2D-3D Convolutional Networks for Neuronal Boundary  Detection\nSPF-CellTracker: Tracking multiple cells with strongly-correlated moves  using a spatial particle filter\nEfficient Convolutional Neural Networks for Pixelwise Classification on  Heterogeneous Hardware Systems\nEnabling Depth-driven Visual Attention on the iCub Humanoid Robot:  Instructions for Use and New Perspectives\nEfficient Discriminative Nonorthogonal Binary Subspace with its  Application to Visual Tracking\nSymbol Emergence in Robotics: A Survey\nThe Indian Spontaneous Expression Database for Emotion Recognition\nWe Are Humor Beings: Understanding and Predicting Visual Humor\nProbabilistic Programming with Gaussian Process Memoization\nAssessment of texture measures susceptibility to noise in conventional  and contrast enhanced computed tomography lung tumour images\nStatistical and Computational Guarantees for the Baum-Welch Algorithm\nUsing Filter Banks in Convolutional Neural Networks for Texture  Classification\nDetection and Visualization of Endoleaks in CT Data for Monitoring of  Thoracic and Abdominal Aortic Aneurysm Stents\nComposable Industrial Internet Applications for Tiered Architectures\nAutonomous navigation for low-altitude UAVs in urban areas\nPandora: Description of a Painting Database for Art Movement Recognition  with Baselines and Perspectives\nHierarchical image simplification and segmentation based 
on  Mumford-Shah-salient level line selection\nA Novel Biologically Mechanism-Based Visual Cognition Model--Automatic  Extraction of Semantics, Formation of Integrated Concepts and Re-selection  Features for Ambiguity\nAutomatic 3D liver location and segmentation via convolutional neural  networks and graph cut\nA Gaussian Mixture MRF for Model-Based Iterative Reconstruction with  Applications to Low-Dose X-ray CT\nDeepLab: Semantic Image Segmentation with Deep Convolutional Nets,  Atrous Convolution, and Fully Connected CRFs\nMultilingual Visual Sentiment Concept Matching\nTraining Recurrent Answering Units with Joint Loss Minimization for VQA\nDeep Learning with Darwin: Evolutionary Synthesis of Deep Neural  Networks\nDeep Image Set Hashing\nCrowdsourcing scoring of immunohistochemistry images: Evaluating  Performance of the Crowd and an Automated Computational Method\nPicture It In Your Mind: Generating High Level Visual Representations  From Textual Descriptions\nIs a Picture Worth Ten Thousand Words in a Review Dataset?\nGeneralized Wishart processes for interpolation over diffusion tensor  fields\nOptimising The Input Window Alignment in CD-DNN Based Phoneme  Recognition for Low Latency Processing\nObject Boundary Detection and Classification with Image-level Labels\nIncorporating prior knowledge in medical image segmentation: a survey\nLarge Scale SfM with the Distributed Camera Model\nApproximate Policy Iteration for Budgeted Semantic Video Segmentation\nImage Prediction for Limited-angle Tomography via Deep Learning with  Convolutional Neural Network\nComponent-Based Distributed Framework for Coherent and Real-Time Video  Dehazing\nReduced Memory Region Based Deep Convolutional Neural Network Detection\nMulti-Residual Networks: Improving the Speed and Accuracy of Residual  Networks\nOPML: A One-Pass Closed-Form Solution for Online Metric Learning\nAutomatic Liver and Lesion Segmentation in CT Using Cascaded Fully  Convolutional Neural Networks and 3D 
Conditional Random Fields\nDiverse Beam Search: Decoding Diverse Solutions from Neural Sequence  Models\nDOTmark - A Benchmark for Discrete Optimal Transport\nLearning and Fusing Multimodal Features from and for Multi-task Facial  Computing\nVisual-Inertial Monocular SLAM with Map Reuse\nVideo Analysis of \"YouTube Funnies\" to Aid the Study of Human Gait and  Falls - Preliminary Results and Proof of Concept\nOptimal Multiple Surface Segmentation with Convex Priors in Irregularly  Sampled Space\nDeep Convolutional Neural Network for Inverse Problems in Imaging\nReal-Time Video Super-Resolution with Spatio-Temporal Networks and  Motion Compensation\nInverting The Generator Of A Generative Adversarial Network\nInferring Restaurant Styles by Mining Crowd Sourced Photos from  User-Review Websites\nRelaxed Earth Mover's Distances for Chain- and Tree-connected Spaces and  their use as a Loss Function in Deep Learning\nMultigrid Neural Architectures\nRecognition of Text Image Using Multilayer Perceptron\nParameter Compression of Recurrent Neural Networks and Degradation of  Short-term Memory\nMulti-Agent Cooperation and the Emergence of (Natural) Language\nA robust approach for tree segmentation in deciduous forests using  small-footprint airborne LiDAR data\nAssessing Uncertainties in X-ray Single-particle Three-dimensional  reconstructions\nRandom Sampling for Fast Face Sketch Synthesis\nLight Field Super-Resolution Via Graph-Based Regularization\nProfiling of OCR'ed Historical Texts Revisited\nUmUTracker: A versatile MATLAB program for automated particle tracking  of 2D light microscopy or 3D digital holography data\nA novel method for automatic localization of joint area on knee plain  radiographs\nComputational Model for Predicting Visual Fixations from Childhood to  Adulthood\nAutomatic Liver and Tumor Segmentation of CT and MRI Volumes using  Cascaded Fully Convolutional Neural Networks\nForest understory trees can be segmented accurately within sufficiently  dense 
airborne laser scanning point clouds\nDeep artifact learning for compressed sensing and parallel MRI\nChain-NN: An Energy-Efficient 1D Chain Architecture for Accelerating  Deep Convolutional Neural Networks\nWebCaricature: a benchmark for caricature face recognition\nDepth from Monocular Images using a Semi-Parallel Deep Neural Network  (SPDNN) Hybrid Architecture\nWhere to put the Image in an Image Caption Generator\nHidden Two-Stream Convolutional Networks for Action Recognition\nEstimation of Tissue Microstructure Using a Deep Network Inspired by a  Sparse Reconstruction Framework\nReconstruction of three-dimensional porous media using generative  adversarial neural networks\nToward a new approach for massive LiDAR data processing\nRecovery of damped exponentials using structured low rank matrix  completion\nA location-aware embedding technique for accurate landmark recognition\nDeep LDA-Pruned Nets for Efficient Facial Gender Classification\nAccelerated Nearest Neighbor Search with Quick ADC\nEnd-to-End Multimodal Emotion Recognition using Deep Neural Networks\nDiscovery Radiomics via Evolutionary Deep Radiomic Sequencer Discovery  for Pathologically-Proven Lung Cancer Detection\nImage-based immersed boundary model of the aortic root\nBack to RGB: 3D tracking of hands and hand-object interactions based on  short-baseline stereo\nLearning Convolutional Text Representations for Visual Question  Answering\nADMM-Net: A Deep Learning Approach for Compressive Sensing MRI\nFiber Orientation Estimation Guided by a Deep Network\nNon-Linear Phase-Shifting of Haar Wavelets for Run-Time All-Frequency  Lighting\nImage Segmentation by Iterative Inference from Conditional Score  Estimation\nPlan3D: Viewpoint and Trajectory Optimization for Aerial Multi-View  Stereo Reconstruction\nDeep manifold-to-manifold transforming network for action recognition\nDeep Generative Adversarial Networks for Compressed Sensing Automates  MRI\nTransFlow: Unsupervised Motion Flow by Joint 
Geometric and Pixel-level  Estimation\nGlobal-Local Airborne Mapping (GLAM): Reconstructing a City from Aerial  Videos\nVolume Calculation of CT lung Lesions based on Halton Low-discrepancy  Sequences\nA Bayesian Hyperprior Approach for Joint Image Denoising and  Interpolation, with an Application to HDR Imaging\nPoseidon: An Efficient Communication Architecture for Distributed Deep  Learning on GPU Clusters\nA dynamic graph-cuts method with integrated multiple feature maps for  segmenting kidneys in ultrasound images\nReconstructing the Forest of Lineage Trees of Diverse Bacterial  Communities Using Bio-inspired Image Analysis\nCognitive Psychology for Deep Neural Networks: A Shape Bias Case Study\nFast and accurate classification of echocardiograms using deep learning\nEvolutionary Training of Sparse Artificial Neural Networks: A Network  Science Perspective\nImproving Deep Pancreas Segmentation in CT and MRI Images via Recurrent  Neural Contextual Learning and Direct Loss Function\nResidual Features and Unified Prediction Network for Single Stage  Detection\nHMM-based Writer Identification in Music Score Documents without  Staff-Line Removal\nFrom Image to Text Classification: A Novel Approach based on Clustering  Word Embeddings\nSparse Deep Nonnegative Matrix Factorization\nCurriculum Domain Adaptation for Semantic Segmentation of Urban Scenes\nLEARN: Learned Experts' Assessment-based Reconstruction Network for  Sparse-data CT\nIterative Manifold Embedding Layer Learned by Incomplete Data for  Large-scale Image Retrieval\nHand2Face: Automatic Synthesis and Recognition of Hand Over Face  Occlusions\nCombining Keystroke Dynamics and Face Recognition for User Verification\nWhat your Facebook Profile Picture Reveals about your Personality\nComputational Motility Tracking of Calcium Dynamics in Toxoplasma gondii\nDeep Learning for Passive Synthetic Aperture Radar\nSpotting Separator Points at Line Terminals in Compressed Document  Images for Text-line 
Segmentation\nVisual Forecasting by Imitating Dynamics in Natural Sequences\nComputer-aided diagnosis of lung nodule using gradient tree boosting and  Bayesian optimization\nDeepBreath: Deep Learning of Breathing Patterns for Automatic Stress  Recognition using Low-Cost Thermal Imaging in Unconstrained Settings\nA Type II Fuzzy Entropy Based Multi-Level Image Thresholding Using  Adaptive Plant Propagation Algorithm\nNon-rigid image registration using fully convolutional networks with  deep self-supervision\nDeep Embedding Convolutional Neural Network for Synthesizing CT Image  from T1-Weighted MR Image\nGravitational Clustering: A Simple, Robust and Adaptive Approach for  Distributed Networks\nA Deep Structured Learning Approach Towards Automating Connectome  Reconstruction from 3D Electron Micrographs\nLearning to Segment Instances in Videos with Spatial Propagation Network\nVehicle Tracking in Wide Area Motion Imagery via Stochastic Progressive  Association Across Multiple Frames (SPAAM)\nDeep-Learnt Classification of Light Curves\nFast Barcode Retrieval for Consensus Contouring\nSE3-Pose-Nets: Structured Deep Dynamics Models for Visuomotor Planning  and Control\nStrengths and Weaknesses of Deep Learning Models for Face Recognition  Against Image Degradations\nFast and Accurate Image Super-Resolution with Deep Laplacian Pyramid  Networks\nTowards Automatic Abdominal Multi-Organ Segmentation in Dual Energy CT  using Cascaded 3D Fully Convolutional Network\nDeep Spectral Descriptors: Learning the point-wise correspondence metric  via Siamese deep neural networks\nWildbook: Crowdsourcing, computer vision, and data science for  conservation\nWeight Initialization of Deep Neural Networks(DNNs) using Data  Statistics\nMedical Image Segmentation Based on Multi-Modal Convolutional Neural  Network: Study on Image Fusion Schemes\nSpatial Pyramid Context-Aware Moving Object Detection and Tracking for  Full Motion Video and Wide Aerial Motion Imagery\nScale out for large 
minibatch SGD: Residual network training on  ImageNet-1K with improved accuracy and reduced time to train\nSpatio-Temporal Data Mining: A Survey of Problems and Methods\nNo Reference Stereoscopic Video Quality Assessment Using Joint Motion  and Depth Statistics\nAttend and Interact: Higher-Order Object Interactions for Video  Understanding\nA Two-Phase Genetic Algorithm for Image Registration\nLight-Head R-CNN: In Defense of Two-Stage Object Detector\nHierarchical internal representation of spectral features in deep  convolutional networks trained for EEG decoding\nAccessible Melanoma Detection using Smartphones and Mobile Image  Analysis\n3D-A-Nets: 3D Deep Dense Descriptor for Volumetric Shapes with  Adversarial Networks\nOnline Product Quantization\nIntegrated Nanophotonics Architecture for Residue Number System  Arithmetic\nAutomatic Spine Segmentation using Convolutional Neural Network via  Redundant Generation of Class Labels for 3D Spine Modeling\nFuzzy-Based Dialectical Non-Supervised Image Classification and  Clustering\nAvaliação do método dialético na quantização de imagens  multiespectrais\nDialectical Multispectral Classification of Diffusion-Weighted Magnetic  Resonance Images as an Alternative to Apparent Diffusion Coefficients Maps to  Perform Anatomical Analysis\nOLÉ: Orthogonal Low-rank Embedding, A Plug and Play Geometric Loss for  Deep Learning\nDeep Learning for Reliable Mobile Edge Analytics in Intelligent  Transportation Systems\nAutomated flow for compressing convolution neural networks for efficient  edge-computation with FPGA\nA fully automated framework for lung tumour detection, segmentation and  analysis\nImproving utility of brain tumor confocal laser endomicroscopy:  objective value assessment and diagnostic frame detection with convolutional  neural networks\nSimultaneous Tensor Completion and Denoising by Noise Inequality  Constrained Convex Optimization\nHyperspectral recovery from RGB images using Gaussian Processes\nFully 
Convolutional Multi-scale Residual DenseNets for Cardiac  Segmentation and Automated Cardiac Diagnosis using Ensemble of Classifiers\nStressedNets: Efficient Feature Representations via Stress-induced  Evolutionary Synthesis of Deep Neural Networks\nInteractive Diversity Optimization of Environments\nDeepLung: Deep 3D Dual Path Nets for Automated Pulmonary Nodule  Detection and Classification\nDeep Learning based Retinal OCT Segmentation\nMusical Chair: Efficient Real-Time Recognition Using Collaborative IoT  Devices\nA Continuation Method for Discrete Optimization and its Application to  Nearest Neighbor Classification\nDeep Predictive Coding Network for Object Recognition\nSemi-supervised multi-task learning for lung cancer diagnosis\nTowards Principled Design of Deep Convolutional Networks: Introducing  SimpNet\nOsteoarthritis Disease Detection System using Self Organizing Maps  Method based on Ossa Manus X-Ray\nDeep Inference of Personality Traits by Integrating Image and Word Use  in Social Networks\nChest X-Ray Analysis of Tuberculosis by Deep Learning with Segmentation  and Augmentation\nInnovative Texture Database Collecting Approach and Feature Extraction  Method based on Combination of Gray Tone Difference Matrixes, Local Binary  Patterns,and K-means Clustering\nThe ARM Scalable Vector Extension\nEvolving Deep Convolutional Neural Networks by Variable-length Particle  Swarm Optimization for Image Classification\nNeural Architecture Construction using EnvelopeNets\nReal-time Burst Photo Selection Using a Light-Head Adversarial Network\nTensor graph convolutional neural network\nLearning distributions of shape trajectories from longitudinal datasets:  a hierarchical model on a manifold of diffeomorphisms\nPancreas Segmentation in CT and MRI Images via Domain Specific Network  Designing and Recurrent Neural Contextual Learning\nContrast-Oriented Deep Neural Networks for Salient Object Detection\nDeep Residual Learning for Accelerated MRI using Magnitude and 
Phase  Networks\nCancelable Indexing Based on Low-rank Approximation of  Correlation-invariant Random Filtering for Fast and Secure Biometric  Identification\nEPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for  Depth from Light Field Images\nVisual Tracking Using Sparse Coding and Earth Mover's Distance\nPilot Comparative Study of Different Deep Features for Palmprint  Identification in Low-Quality Images\nCentral and peripheral vision for scene recognition: A  neurocomputational modeling exploration\nComparing Distributions and Shapes using the Kernel Distance\nA new variational principle for the Euclidean distance function: Linear  approach to the non-linear eikonal problem\nTensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed  Systems\nGenomics as a Service: a Joint Computing and Networking Perspective\nSource Detection in Simulated XMM-Newton Observations\nTiming with the EPIC pn Camera of XMM-Newton\nX-ray powerful diagnostics for highly-ionized plasmas: He-like ions\nVirtual Observatory: From Concept to Implementation\nEmergent Probability - A directed Scale-Free Network Approach to  Lonergan's Generic Model of Development\nAnalysis of Three-Dimensional Protein Images\nRobustness of Regional Matching Scheme over Global Matching Scheme\nA Bayesian Reflection on Surfaces\nIntegrating E-Commerce and Data Mining: Architecture and Challenges\nInventing E-Regulation in the US, EU and East Asia: Conflicting Social  Visions of the Internet & the Information Society\nThe Open Language Archives Community and Asian Language Resources\nThe Mysterious Optimality of Naive Bayes: Estimation of the Probability  in the System of \"Classifiers\"\nProsody Based Co-analysis for Continuous Recognition of Coverbal  Gestures\nTechnical Note: Bias and the Quantification of Stability\nTypes of Cost in Inductive Concept Learning\nExploiting Context When Learning to Classify\nRobust Classification with Context-Sensitive Features\nManifold Learning 
with Geodesic Minimal Spanning Trees\nAn Analytical Piecewise Radial Distortion Model for Precision Camera  Calibration\nA Family of Simplified Geometric Distortion Models for Camera  Calibration\nBetter Foreground Segmentation Through Graph Cuts\nA Numerical Example on the Principles of Stochastic Discrimination\nThree-Dimensional Face Orientation and Gaze Detection from a Single  Image\nSelf-Organised Factorial Encoding of a Toroidal Manifold\nReverse Engineering Ontology to Conceptual Data Models\nFrom Feature Extraction to Classification: A multidisciplinary Approach  applied to Portuguese Granites\nLess is More - Genetic Optimisation of Nearest Neighbour Classifiers\nClustering Techniques for Marbles Classification\nLine and Word Matching in Old Documents\nGradient Vector Flow Models for Boundary Extraction in 2D Images\nSemi-automatic vectorization of linear networks on rasterized  cartographic maps\nGeometric Models of Rolling-Shutter Cameras\nConvexity Analysis of Snake Models Based on Hamiltonian Formulation\nPattern Recognition for Conditionally Independent Data\nRegularity of Position Sequences\nAchievable Rates for Pattern Recognition\nThe consistency principle for a digitization procedure. 
An algorithm for  building normal digital spaces of continuous n-dimensional objects\nUnderstanding physics from interconnected data\nGeometric symmetry in the quadratic Fisher discriminant operating on  image pixels\nThe `Face on Mars': a photographic approach for the search of signs of  past civilizations from a macroscopic point of view, factoring long-term  erosion in image reconstruction\nLocally Adaptive Block Thresholding Method with Continuity Constraint\nFourier Analysis and Holographic Representations of 1D and 2D Signals\nSemi-Supervised Learning -- A Statistical Physics Approach\nFace Recognition using Principal Component Analysis and Log-Gabor  Filters\nRecognition of expression variant faces using masked log-Gabor features  and Principal Component Analysis\nNotes on Geometric Measure Theory Applications to Image Processing;  De-noising, Segmentation, Pattern, Texture, Lines, Gestalt and Occlusion\nA New Quartet Tree Heuristic for Hierarchical Clustering\nAn effective edge--directed frequency filter for removal of aliasing in  upsampled images\nCooperative Optimization for Energy Minimization: A Case Study of Stereo  Matching\nInvariant template matching in systems with spatiotemporal coding: a  vote for instability\nExtraction of cartographic objects in high resolution satellite images  for object model generation\nThe Hough transform estimator\nImage denoising by statistical area thresholding\nAn active curve approach for tomographic reconstruction of binary  radially symmetric objects\nRecovering convex boundaries from blurred and noisy observations\nTexture synthesis and nonparametric resampling of random fields\nFunctional dissipation microarrays for classification\nStatistical Mechanics Characterization of Neuronal Mosaics\nA Combinatorial Bit Bang Leading to Quaternions\nText Line Segmentation of Historical Documents: a Survey\nMorphing Ensemble Kalman Filters\nSiZer for time series: A new approach to the analysis of trends\nVery fast 
watermarking by reversible contrast mapping\nBandwidth selection for kernel estimation in mixed multi-dimensional  spaces\nGraph rigidity, Cyclic Belief Propagation and Point Pattern Matching\nComparison and Combination of State-of-the-art Techniques for  Handwritten Character Recognition: Topping the MNIST Benchmark\nImage Classification Using SVMs: One-against-One Vs One-against-All\nLearning View Generalization Functions\nHierarchy construction schemes within the Scale set framework\nPattern Recognition System Design with Linear Encoding for Discrete  Patterns\nProbabilistic Visual Secret Sharing Schemes for Gray-scale images and  Color images\nAutomatic Text Area Segmentation in Natural Images\nImplementing a Test Strategy for an Advanced Video Acquisition and  Processing Architecture\nWavelet and Curvelet Moments for Image Classification: Application to  Aggregate Mixture Grading\nAcquisition Accuracy Evaluation in Visual Inspection Systems - a  Practical Approach\nUsing Spatially Varying Pixels Exposures and Bayer-covered Photosensors  for High Dynamic Range Imaging\nLinear Time Recognition Algorithms for Topological Invariants in 3D\nNotes on Convex Sets, Polytopes, Polyhedra, Combinatorial Topology,  Voronoi Diagrams and Delaunay Triangulations\nA New Algorithm for Interactive Structural Image Segmentation\nStatistical region-based active contours with exponential family  observations\nRegion-based active contour with noise and shape priors\nDimReduction - Interactive Graphic Environment for Dimensionality  Reduction\nProjective Reeds-Shepp car on $S^2$ with quadratic cost\nHuman expert fusion for image classification\nGeneralized proportional conflict redistribution rule applied to Sonar  imagery and Radar targets classification\nCech homology for shape recognition in the presence of occlusions\nThe Five Points Pose Problem : A New and Accurate Solution Adapted to  any Geometric Configuration\nAn Image-Based Sensor System for Autonomous Rendez-Vous with  
Uncooperative Satellites\nA 8 bits Pipeline Analog to Digital Converter Design for High Speed  Camera Application\nOn Bounded Integer Programming\nAutomatic Identification and Data Extraction from 2-Dimensional Plots in  Digital Documents\nSupervised Dictionary Learning\nRobust Near-Isometric Matching via Structured Learning of Graphical  Models\nLarge Scale Variational Inference and Experimental Design for Sparse  Generalized Linear Models\nNon-Negative Matrix Factorization, Convexity and Isometry\nHierarchical Bag of Paths for Kernel Based Shape Classification\nAstronomical imaging: The theory of everything\nCamera distortion self-calibration using the plumb-line constraint and  minimal Hough entropy\n3D Face Recognition with Sparse Spherical Representations\nFeature Selection By KDDA For SVM-Based MultiView Face Recognition\nFace Detection Using Adaboosted SVM-Based Component Classifier\nSparse Component Analysis (SCA) in Random-valued and Salt and Pepper  Noise Removal\nOn the Dual Formulation of Boosting Algorithms\nModel-Based Event Detection in Wireless Sensor Networks\nA Keygraph Classification Framework for Real-Time Object Detection\nUniqueness of Low-Rank Matrix Completion by Rigidity Theory\nAre Tensor Decomposition Solutions Unique? 
On the global convergence of  HOSVD and ParaFac algorithms\nTracking using explanation-based modeling\nMarkov Random Field Segmentation of Brain MR Images\nBuilding the information kernel and the problem of recognition\nBoosting through Optimization of Margin Distributions\nExponential Family Graph Matching and Ranking\nFaceBots: Steps Towards Enhanced Long-Term Human-Robot Interaction by  Utilizing and Publishing Online Social Information\nGeneralized Kernel-based Visual Tracking\nA statistical learning approach to color demosaicing\nA New Solution to the Relative Orientation Problem using only 3 Points  and the Vertical Direction\nAutomatic Spatially-Adaptive Balancing of Energy Terms for Image  Segmentation\nA Novel Two-Staged Decision Support based Threat Evaluation and Weapon  Assignment Algorithm, Asset-based Dynamic Weapon Scheduling using Artificial  Intelligence Techinques\nRegistration of Standardized Histological Images in Feature Space\nGabor wavelet analysis and the fractional Hilbert transform\nScale-Based Gaussian Coverings: Combining Intra and Inter Mixture Models  in Image Segmentation\nSparsity and `Something Else': An Approach to Encrypted Image Folding\nMedian K-flats for hybrid linear modeling with many outliers\nAdaboost with \"Keypoint Presence Features\" for Real-Time Vehicle Visual  Detection\nVisual object categorization with new keypoint-based adaBoost features\nModular Traffic Sign Recognition applied to on-vehicle real-time visual  detection of American and European speed limit signs\nLocal and global approaches of affinity propagation clustering for large  scale data\nMicrostructure reconstruction using entropic descriptors\nPositive Semidefinite Metric Learning with Boosting\nFractional differentiation based image processing\nPigment Melanin: Pattern for Iris Recognition\nIsometric Multi-Manifolds Learning\nHeart Rate Variability Analysis Using Threshold of Wavelet Package  Coefficients\nWriter Identification Using Inexpensive Signal 
Processing Techniques\nRegularization for Matrix Completion\nDigital Mathematics Libraries: The Good, the Bad, the Ugly\nFace Identification by SIFT-based Complete Graph Topology\nFeature Level Fusion of Biometrics Cues: Human Identification with  Doddingtons Caricature\nThe Influence of Intensity Standardization on Medical Image Registration\nAn Improved DC Recovery Method from AC Coefficients of DCT-Transformed  Images\nIntrinsic dimension estimation of data by principal component analysis\nFeature Level Fusion of Face and Fingerprint Biometrics\nAssessment Of The Wind Farm Impact On The Radar\nMultibiometrics Belief Fusion\nExact feature probabilities in images with occlusion\nPattern recognition using inverse resonance filtration\nOn MMSE and MAP Denoising Under Sparse Representation Modeling Over a  Unitary Dictionary\nThe Projected GSURE for Automatic Parameter Tuning in Iterative  Shrinkage Methods\nThe Video Genome\nDevelopment of a multi-user handwriting recognition system using  Tesseract open source OCR engine\nRecognition of Handwritten Textual Annotations using Tesseract Open  Source OCR Engine for information Just In Time (iJIT)\nDevelopment of a Multi-User Recognition Engine for Handwritten Bangla  Basic Characters and Digits\nFeature Level Fusion of Face and Palmprint Biometrics by Isomorphic  Graph-based Improved K-Medoids Partitioning\nMaximized Posteriori Attributes Selection from Facial Salient Landmarks  for Face Recognition\nHashing Image Patches for Zooming\nCompressed Sensing with off-axis frequency-shifting holography\nDetecting the Most Unusual Part of Two and Three-dimensional Digital  Images\nHierarchical Clustering for Finding Symmetries and Other Patterns in  Massive, High Dimensional Datasets\nClassification of Polar-Thermal Eigenfaces using Multilayer Perceptron  for Human Face Recognition\nReduction of Feature Vectors Using Rough Set Theory for Human Face  Recognition\nContent Based Image Retrieval Using Exact Legendre Moments and 
Support  Vector Machine\nImage Segmentation Using Weak Shape Priors\nPenalized K-Nearest-Neighbor-Graph Based Metrics for Clustering\nAlgorithm for Sector Spectra Calculation from Images Registered by the  Spectral Airglow Temperature Imager\nSegmentation of Natural Images by Texture and Boundary Compression\nNoise Invalidation Denoising\nPerformance Comparison of SVM and ANN for Handwritten Devnagari  Character Recognition\nApplication of Statistical Features in Handwritten Devnagari Character  Recognition\nClassification Of Gradient Change Features Using MLP For Handwritten  Character Recognition\nFuzzy Classification of Facial Component Parameters\nFace Synthesis (FASY) System for Determining the Characteristics of a  Face Image\nQuotient Based Multiresolution Image Fusion of Thermal and Visual Images  Using Daubechies Wavelet Transform for Human Face Recognition\nImage Pixel Fusion for Human Face Recognition\nClassification of Fused Images using Radial Basis Function Neural  Network for Human Face Recognition\nClassification of fused face images using multilayer perceptron neural  network\nClassification of Log-Polar-Visual Eigenfaces using Multilayer  Perceptron\nHuman Face Recognition using Line Features\nImproved RANSAC performance using simple, iterative minimal-set solvers\nNeural Network Based Reconstruction of a 3D Object from a 2D Wireframe\nVideo Event Recognition for Surveillance Applications (VERSA)\nEar Identification by Fusion of Segmented Slice Regions using Invariant  Features: An Experimental Manifold with Dual Fusion Approach\nImage sequence interpolation using optimal control\nModeling the growth of fingerprints improves matching for adolescents\nOptimally Training a Cascade Classifier\nMulti-Agent Deployment for Visibility Coverage in Polygonal Environments  with Holes\nNonlinear Vector Filtering for Impulsive Noise Removal from Color Images\nAutomatic Detection of Blue-White Veil and Related Structures in  Dermoscopy Images\nAn Improved 
Objective Evaluation Measure for Border Detection in  Dermoscopy Images\nApproximate Lesion Localization in Dermoscopy Images\n3D-Mesh denoising using an improved vertex based anisotropic diffusion\nBalancing clusters to reduce response time variability in large scale  image search\nImage Segmentation by Discounted Cumulative Ranking on Maximal Cliques\nA Microwave Imaging and Enhancement Technique from Noisy Synthetic Data\nProfile Based Sub-Image Search in Image Databases\nStatistical Compressive Sensing of Gaussian Mixture Models\nPerformance Analysis of Spectral Clustering on Compressed, Incomplete  and Inaccurate Measurements\nSingle Frame Image super Resolution using Learned Directionlets\nImage Segmentation with Multidimensional Refinement Indicators\nBounded Multivariate Surfaces On Monovariate Internal Functions\nThe Data Replication Method for the Classification with Reject Option\nWarping Peirce Quincuncial Panoramas\nModeling Image Structure with Factorized Phase-Coupled Boltzmann  Machines\nAn Introduction to Conditional Random Fields\nGeneralized Tree-Based Wavelet Transform\nEdge Preserving Image Denoising in Reproducing Kernel Hilbert Spaces\nAn Effective Method of Image Retrieval using Image Mining Techniques\nAutomatic Image Segmentation by Dynamic Region Merging\nSparse motion segmentation using multiple six-point consistencies\nTILT: Transform Invariant Low-rank Textures\nA Fast Statistical Method for Multilevel Thresholding in Wavelet Domain\nA Framework for Real-Time Face and Facial Feature Tracking using Optical  Flow Pre-estimation and Template Tracking\nBinary and nonbinary description of hypointensity in human brain MR  images\nSafeVchat: Detecting Obscene Content and Misbehaving Users in Online  Video Chat Services\nTransductive-Inductive Cluster Approximation Via Multivariate Chebyshev  Inequality\nUsing Feature Weights to Improve Performance of Neural Networks\nA Generalized Method for Integrating Rule-based Knowledge into Inductive  
Methods Through Virtual Sample Creation\nGuaranteeing Convergence of Iterative Skewed Voting Algorithms for Image  Segmentation\nFeature selection via simultaneous sparse approximation for person  specific face verification\nAn Efficient and Integrated Algorithm for Video Enhancement in  Challenging Lighting Conditions\nSearching in one billion vectors: re-rank with source coding\nA linear framework for region-based image segmentation and inpainting  involving curvature penalization\nA Trajectory UML profile For Modeling Trajectory Data: A Mobile Hospital  Use Case\nDetection of objects in noisy images and site percolation on square  lattices\nWeighted Radial Variation for Node Feature Classification\nAn Algorithm for Repairing Low-Quality Video Enhancement Techniques  Based on Trained Filter\nSubmodular Decomposition Framework for Inference in Associative Markov  Networks with Global Constraints\nRay-Based and Graph-Based Methods for Fiber Bundle Boundary Estimation\nAdaptive mosaic image representation for image processing\nSO(3)-invariant asymptotic observers for dense depth field estimation  based on visual data and known camera motion\nIdentification of arabic word from bilingual text using character  features\nAutomatic Extraction of Open Space Area from High Resolution Urban  Satellite Imagery\nA comparison of Gap statistic definitions with and without logarithm  function\nImproved Edge Awareness in Discontinuity Preserving Smoothing\nInternal Constraints of the Trifocal Tensor\nA Statistical Nonparametric Approach of Face Recognition: Combination of  Eigenface & Modified k-Means Clustering\nGEOMIR2K9 - A Similar Scene Finder\nAn Axis-Based Representation for Recognition\nA Meshless Method for Variational Nonrigid 2-D Shape Registration\nCurved Gabor Filters for Fingerprint Image Enhancement\nConvex Approaches to Model Wavelet Sparsity Patterns\nBayesian approach for near-duplicate image detection\nContent-Based Spam Filtering on Video Sharing Social 
Networks\nClustering with Multi-Layer Graphs: A Spectral Perspective\nWho clicks there!: Anonymizing the photographer in a camera saturated  society\nInferring 3D Articulated Models for Box Packaging Robot\nFace Identification from Manipulated Facial Images using SIFT\nAutomated segmentation of the pulmonary arteries in low-dose CT by  vessel tracking\nAugmented Reality Implementation Methods in Mainstream Applications\nActive Classification: Theory and Application to Underwater Inspection\nImage denoising assessment using anisotropic stack filtering\nAutomatic Road Lighting System (ARLS) Model Based on Image Processing of  Moving Object\nAnalysis and Improvement of Low Rank Representation for Subspace  segmentation\nMedian Algorithm for Sector Spectra Calculation from Images Registered  by the Spectral Airglow Temperature Imager\nLearning Hypergraph Labeling for Feature Matching\nTopographic Feature Extraction for Bengali and Hindi Character Images\nFace Recognition using Curvelet Transform\nThe Chan-Vese Algorithm\nOn the Hilbert transform of wavelets\nDiffeomorphic Metric Mapping of High Angular Resolution Diffusion  Imaging based on Riemannian Structure of Orientation Distribution Functions\nLeveraging Billions of Faces to Overcome Performance Barriers in  Unconstrained Face Recognition\nReal time face recognition using adaboost improved fast PCA algorithm\nUndithering using linear filtering and non-linear diffusion techniques\nCompressive Imaging using Approximate Message Passing and a Markov-Tree  Prior\nThe Statistical methods of Pixel-Based Image Fusion Techniques\nAdvanced phase retrieval: maximum likelihood technique with sparse  regularization of phase and amplitude\nA Machine Learning Perspective on Predictive Coding with PAQ\nMultisensor Images Fusion Based on Feature-Level\nVessel Segmentation in Medical Imaging Using a Tight-Frame Based  Algorithm\nShareBoost: Efficient Multiclass Learning with Feature Sharing\nColor Texture Classification Approach 
Based on Combination of Primitive  Pattern Units and Statistical Features\nMinimax hypothesis testing for curve registration\nCurvature Prior for MRF-based Segmentation and Shape Inpainting\nExact Subspace Segmentation and Outlier Detection by Low-Rank  Representation\nMIS-Boost: Multiple Instance Selection Boosting\nA Probabilistic Framework for Discriminative Dictionary Learning\nMulti-Hypothesis CRF-Segmentation of Neural Tissue in Anisotropic EM  Volumes\nOnline Robust Subspace Tracking from Partial Information\nExhaustive and Efficient Constraint Propagation: A Semi-Supervised  Learning Perspective and Its Applications\nLow-rank data modeling via the Minimum Description Length principle\nThe Statistical Inefficiency of Sparse Coding for Images (or, One Gabor  to Rule them All)\nA Novel comprehensive method for real time Video Motion Detection  Surveillance\nDistributed Lossy Source Coding Using Real-Number Codes\nSecuring Biometric Images using Reversible Watermarking\nFace Recognition Using Discrete Cosine Transform for Global and Local  Features\nEfficient Hierarchical Markov Random Fields for Object Detection on a  Mobile Robot\nDigital Manifolds and the Theorem of Jordan-Brouwer\nA prototype system for handwritten sub-word recognition: Toward  Arabic-manuscript transliteration\nMulti-font Multi-size Kannada Numeral Recognition Based on Structural  Features\nRedundant Wavelets on Graphs and High Dimensional Data Clouds\nFacial Asymmetry and Emotional Expression\nSuboptimality of Nonlocal Means for Images with Sharp Edges\nEfficient Adaptive Compressive Sensing Using Sparse Hierarchical Learned  Dictionaries\nLearning joint intensity-depth sparse representations\nA United Image Force for Deformable Models and Direct Transforming  Geometric Active Contorus to Snakes by Level Sets\nIdentifying and Analysis of Scene Mining Methods Beased on Scenes  Extracted Features\nNonparametric Sparse Representation\nG-Lets: Signal Processing Using Transformation 
Groups\nEnhancing Volumetric Bouligand-Minkowski Fractal Descriptors by using  Functional Data Analysis\nShape analysis using fractal dimension: a curvature based approach\nFractal Descriptors in the Fourier Domain Applied to Color Texture  Analysis\nFractal and Multi-Scale Fractal Dimension analysis: a comparative study  of Bouligand-Minkowski method\nVariations of images to increase their visibility\nFractal Descriptors Based on Fourier Spectrum Applied to Texture  Analysis\nCharge migration in organic materials: Can propagating charges affect  the key physical quantities controlling their motion?\nRT-SLAM: A Generic and Real-Time Visual SLAM Implementation\nFeature selection using nearest attributes\nExamplers based image fusion features for face recognition\nResolving Implementation Ambiguity and Improving SURF\nWavelet-based deconvolution of ultrasonic signals in nondestructive  evaluation\nCombined Haar-Hilbert and Log-Gabor Based Iris Encoders\nA better Beta for the H measure of classification performance\nNo-reference image quality assessment through the von Mises distribution\nSegmentation of Offline Handwritten Bengali Script\nA Simple Unsupervised Color Image Segmentation Method based on MRF-MAP\nReal-time detection and tracking of multiple objects with partial  decoding in H.264/AVC bitstream domain\nMultilevel Image Encryption\nA new hybrid jpeg image compression scheme using symbol reduction  technique\nLeft-Invariant Diffusion on the Motion Group in terms of the Irreducible  Representations of SO(3)\nFast approximations to structured sparse coding and applications to  object classification\nStable image reconstruction using total variation minimization\nPerturbation of the Eigenvectors of the Graph Laplacian: Application to  Image Denoising\nMulti-Level Feature Descriptor for Robust Texture Classification via  Locality-Constrained Collaborative Strategy\nA Report on Multilinear PCA Plus Multilinear LDA to Deal with Tensorial  Data: Visual 
Classification as An Example\nPosterior Mean Super-Resolution with a Compound Gaussian Markov Random  Field Prior\nAn MLP based Approach for Recognition of Handwritten `Bangla' Numerals\nLearning Random Kernel Approximations for Object Recognition\nSubstructure and Boundary Modeling for Continuous Action Recognition\nHybrid Poisson and multi-Bernoulli filters\nMarginal multi-Bernoulli filters: RFS derivation of MHT, JIPDA and  association-based MeMBer\nHybrid Generative/Discriminative Learning for Automatic Image Annotation\nA stochastic algorithm for probabilistic independent component analysis\nHandwritten digit Recognition using Support Vector Machine\nScilab and SIP for Image Processing\nNotions of Chaotic Cryptography: Sketch of a Chaos based Cryptosystem\nClustering Using Isoperimetric Number of Trees\nReconstruction of hidden 3D shapes using diffuse reflections\nA Local Approach for Identifying Clusters in Networks\nKernel Density Feature Points Estimator for Content-Based Image  Retrieval\nFace Expression Recognition and Analysis: The State of the Art\nValidation of nonlinear PCA\nEfficient Fruit Defect Detection and Glare removal Algorithm by  anisotropic diffusion and 2D Gabor filter\nContinuous Markov Random Fields for Robust Stereo Estimation\nMulti-Level Coding Efficiency with Improved Quality for Image  Compression based on AMBTC\nRobust Nonnegative Matrix Factorization via $L_1$ Norm Regularization\nVideo In Sentences Out\nSeeing Unseeability to See the Unseeable\nStatistical Multiresolution Estimation for Variational Imaging: With an  Application in Poisson-Biophotonics\nA New Approach of Improving CFA Image for Digital Camera's\nBackground subtraction based on Local Shape\nA 3D Segmentation Method for Retinal Optical Coherence Tomography Volume  Data\nActive Contour with A Tangential Component\nElimination of Glass Artifacts and Object Segmentation\nOCT Segmentation Survey and Summary Reviews and a Novel 3D Segmentation  Algorithm and a Proof of 
Concept Implementation\nImage Enhancement with Statistical Estimation\nDBC based Face Recognition using DWT\nA novel statistical fusion rule for image fusion and its comparison in  non subsampled contourlet transform domain and wavelet domain\nM-FISH Karyotyping - A New Approach Based on Watershed Transform\nAre visual dictionaries generalizable?\nLearning Mixed Graphical Models\nLocally Orderless Registration\nA Brief Summary of Dictionary Learning Based Approach for Classification\nA Brief Summary of Dictionary Learning Based Approach for Classification  (revised)\nAn Unsupervised Dynamic Image Segmentation using Fuzzy Hopfield Neural  Network based Genetic Algorithm\nTemplate-Cut: A Pattern-Based Segmentation Paradigm\nICT's role in e-Governance in India and Malaysia: A Review\nDimension Reduction by Mutual Information Discriminant Analysis\nComments on \"On Approximating Euclidean Metrics by Weighted t-Cost  Distances in Arbitrary Dimension\"\nRevolvable Indoor Panoramas Using a Rectified Azimuthal Projection\nImage Similarity Using Sparse Representation and Compression Distance\nBlind PSF estimation and methods of deconvolution optimization\nThe Ultrasound Visualization Pipeline - A Survey\nA Linear Approximation to the chi^2 Kernel with Geometric Convergence\nManifold Relevance Determination\nTotal Variation and Euler's Elastica for Supervised Learning\nLearning Efficient Structured Sparse Models\nDimensionality Reduction by Local Discriminative Gaussians\nClustering by Low-Rank Doubly Stochastic Matrix Decomposition\nDynamic Domain Classification for Fractal Image Compression\nA generic framework for video understanding applied to group behavior  recognition\nLearning Invariant Representations with Local Transformations\nDeep Lambertian Networks\nLearning Object Arrangements in 3D Scenes using Human Context\nDifferentiable Pooling for Hierarchical Feature Learning\nLocal Water Diffusion Phenomenon Clustering From High Angular Resolution  Diffusion Imaging 
(HARDI)\nPAC-Bayesian Majority Vote for Late Classifier Fusion\nMolecular Biology at the Quantum Level: Can Modern Density Functional  Theory Forge the Path?\nBackground Subtraction for Online Calibration of Baseline RSS in RF  Sensing Networks\nAn Innovative Skin Detection Approach Using Color Based Image Retrieval  Technique\nNon-Convex Rank Minimization via an Empirical Bayesian Approach\nKernelized Supervised Dictionary Learning\nA Novel Approach Coloured Object Tracker with Adaptive Model and  Bandwidth using Mean Shift Algorithm\nCamera identification by grouping images from database, based on shared  noise patterns\nIncremental Learning of 3D-DCT Compact Representations for Robust Visual  Tracking\nKernel Principal Component Analysis and its Applications in Face  Recognition and Active Shape Models\nDesigning various component analysis at will\nImage Labeling on a Network: Using Social-Network Metadata for Image  Classification\nPolarimetric SAR Image Segmentation with B-Splines and a New Statistical  Model\nA Two-Stage Combined Classifier in Scale Space Texture Classification\nPiecewise Linear Patch Reconstruction for Segmentation and Description  of Non-smooth Image Structures\nGuarantees of Augmented Trace Norm Models in Tensor Recovery\nTowards a theory of statistical tree-shape analysis\nAutofocus Correction of Azimuth Phase Error and Residual Range Cell  Migration in Spotlight SAR Polar Format Imagery\nHuman Activity Learning using Object Affordances from RGB-D Videos\nContour Completion Around a Fixation Point\nImproved Total Variation based Image Compressive Sensing Recovery by  Nonlocal Regularization\nImage Super-Resolution via Dual-Dictionary Learning And Sparse  Representation\nAdaptive Graph via Multiple Kernel Learning for Nonnegative Matrix  Factorization\nIterative graph cuts for image segmentation with a nonlinear statistical  shape prior\nA Unified Approach for Modeling and Recognition of Individual Actions  and Group Activities\nThe 
Segmentation Fusion Method On10 Multi-Sensors\nAre You Imitating Me? Unsupervised Sparse Modeling for Group Activity  Analysis from a Single Video\nBenchmarking recognition results on word image datasets\nShort-time homomorphic wavelet estimation\nDifference of Normals as a Multi-Scale Operator in Unorganized Point  Clouds\nOn the Use of Lee's Protocol for Speckle-Reducing Techniques\nBlind Image Deblurring by Spectral Properties of Convolution Operators\nVisual Tracking with Similarity Matching Ratio\nHirarchical Digital Image Inpainting Using Wavelets\nDetection and Classification of Viewer Age Range Smart Signs at TV  Broadcast\nThe Pascal Triangle of a Discrete Image: Definition, Properties and  Application to Shape Analysis\nCombined Descriptors in Spatial Pyramid Domain for Image Classification\nSuper-resolution using Sparse Representations over Learned Dictionaries:  Reconstruction of Brain Structure using Electron Microscopy\nBlurred Image Classification based on Adaptive Dictionary\nRobust Degraded Face Recognition Using Enhanced Local Frequency  Descriptor and Multi-scale Competition\nLearning Human Activities and Object Affordances from RGB-D Videos\nVideo De-fencing\nNear-optimal compressed sensing guarantees for total variation  minimization\nUnsupervised Detection and Tracking of Arbitrary Objects with Dependent  Dirichlet Process Mixtures\nContemporary Semantic Web Service Frameworks: An Overview and  Comparisons\nNotes on image annotation\nOn the Role of Contrast and Regularity in Perceptual Boundary Saliency\nImage Processing using Smooth Ordering of its Patches\nGetting Feasible Variable Estimates From Infeasible Ones: MRF Local  Polytope Study\nEpitome for Automatic Image Colorization\nA Slice Sampler for Restricted Hierarchical Beta Process with  Applications to Shared Subspace Learning\nDBN-Based Combinatorial Resampling for Articulated Object Tracking\nNested Dictionary Learning for Hierarchical Organization of Imagery and  Text\nOpenCFU, a 
New Free and Open-Source Software to Count Cell Colonies and  Other Circular Objects\nEfficient Inference in Fully Connected CRFs with Gaussian Edge  Potentials\nMulti-Stage Multi-Task Feature Learning\nTextural Approach to Palmprint Identification\nManaging sparsity, time, and quality of inference in topic models\nA Multiscale Framework for Challenging Discrete Optimization\nRecognizing Static Signs from the Brazilian Sign Language: Comparing  Large-Margin Decision Directed Acyclic Graphs, Voting Support Vector Machines  and Artificial Neural Networks\nPerformance Evaluation of Different Techniques for texture  Classification\nMugshot Identification from Manipulated Facial Images\nDimensionality Reduction and Classification Feature Using Mutual  Information Applied to Hyperspectral Images: A Wrapper Strategy Algorithm  Based on Minimizing the Error Probability Using the Inequality of Fano\nImplementation of Radon Transformation for Electrical Impedance  Tomography (EIT)\nHandwritten digit recognition by bio-inspired hierarchical networks\nFrom Bits to Images: Inversion of Local Binary Descriptors\nImage denoising with multi-layer perceptrons, part 2: training  trade-offs and analysis of their mechanisms\nLearning Monocular Reactive UAV Control in Cluttered Natural  Environments\nTangent-based manifold approximation with locally linear models\nFourier-Bessel rotational invariant eigenimages\nDeep Attribute Networks\nRate-Distortion Analysis of Multiview Coding in a DIBR Framework\nApplying Dynamic Model for Multiple Manoeuvring Target Tracking Using  Particle Filtering\nFive Modulus Method For Image Compression\nAn Effective Method for Fingerprint Classification\nContent based video retrieval\nThe Nature of Quantum States Created by One Photon Absorption: Pulsed  Coherent vs. 
Pulsed Incoherent Light\nA New Automatic Method to Adjust Parameters for Object Recognition\nSecure voice based authentication for mobile devices: Vaulted Voice  Verification\nArtificial Neural Network Fuzzy Inference System (ANFIS) For Brain Tumor  Detection\nInverting and Visualizing Features for Object Detection\nVisual Objects Classification with Sliding Spatial Pyramid Matching\nSketch-to-Design: Context-based Part Assembly\nAutomatic landmark annotation and dense correspondence registration for  3D human facial images\nOn the Adaptability of Neural Network Image Super-Resolution\nIn Vivo Quantification of Clot Formation in Extracorporeal Circuits\nHigh Quality Image Interpolation via Local Autoregressive and Nonlocal  3-D Sparse Regularization\nLarge Scale Strongly Supervised Ensemble Metric Learning, with  Applications to Face Verification and Retrieval\nA Semi-automated Statistical Algorithm for Object Separation\nClassifier Fusion Method to Recognize Handwritten Kannada Numerals\nPaFiMoCS: Particle Filtered Modified-CS and Applications in Visual  Tracking across Illumination Change\nA novel processing pipeline for optical multi-touch surfaces\nA Factorized Variational Technique for Phase Unwrapping in Markov Random  Fields\nLattice Particle Filters\nEnhancing the retrieval performance by combing the texture and edge  features\nRobust subspace clustering\nWavelet-based Scale Saliency\nFactorized Topic Models\nBoltzmann Machines and Denoising Autoencoders for Image Denoising\nPushing Stochastic Gradient towards Second-Order Methods --  Backpropagation Learning with Transformations in Nonlinearities\nLearnable Pooling Regions for Image Classification\nDeep Predictive Coding Networks\nInformation Theoretic Learning with Infinitely Divisible Kernels\nBig Neural Networks Waste Capacity\nRegularized Discriminant Embedding for Visual Descriptor Learning\nZero-Shot Learning Through Cross-Modal Transfer\nConvex Variational Image Restoration with Histogram 
Priors\nDiscriminative Recurrent Sparse Auto-Encoders\nLearning Graphical Models of Images, Videos and Their Spatial  Transformations\nHeteroscedastic Conditional Ordinal Random Fields for Pain Intensity  Estimation from Facial Images\nSpread spectrum compressed sensing MRI using chirp radio frequency  pulses\nMulti-Class Detection and Segmentation of Objects in Depth\nAn improvement to k-nearest neighbor classifier\nSimplifying the Configuration of 802.11 Wireless Networks with Effective  SNR\nCorrecting Camera Shake by Incremental Sparse Approximation\nMulti-scale Visual Attention & Saliency Modelling with Decision Theory\nCentrality-constrained graph embedding\nHybrid Image Segmentation using Discerner Cluster in FCM and Histogram  Thresholding\nImage Segmentation in Video Sequences: A Probabilistic Approach\nAdaptive low rank and sparse decomposition of video using compressive  sensing\nSurveillance Video Processing Using Compressive Sensing\npROST : A Smoothed Lp-norm Robust Online Subspace Tracking Method for  Realtime Background Subtraction in Video\nAssessing Semantic Quality of Web Directory Structure\nA Fresnelet-Based Encryption of Medical Images using Arnold Transform\nA new scheme of signature extraction for iris authentication\nMatching Pursuit LASSO Part II: Applications and Sparse Recovery over  Batch Signals\nUnsupervised edge map scoring: a statistical complexity approach\nNonparametric Basis Pursuit via Sparse Kernel-based Learning\nImage restoration using sparse approximations of spatially varying blur  operators in the wavelet domain\nEnsemble Sparse Models for Image Analysis\nSparse Shape Reconstruction\nOn a link between kernel mean maps and Fraunhofer diffraction, with an  application to super-resolution beyond the diffraction limit\nMultiple Kernel Sparse Representations for Supervised and Unsupervised  Learning\nSymmetry Based Cluster Approach for Automatic Recognition of the  Epileptic Focus in Brain Using PET Scan Image : An 
Analysis\nALPRS - A New Approach for License Plate Recognition using the Sift  Algorithm\nImproving Automatic Emotion Recognition from speech using Rhythm and  Temporal feature\nLeast-Squares FIR Models of Low-Resolution MR data for Efficient  Phase-Error Compensation with Simultaneous Artefact Removal\nVoxel-wise Weighted MR Image Enhancement using an Extended Neighborhood  Filter\nBilateral Filter: Graph Spectral Interpretation and Extensions\nCombined Learning of Salient Local Descriptors and Distance Metrics for  Image Set Face Verification\nMaterial quality assessment of silk nanofibers based on swarm  intelligence\nWhich research in design creativity and innovation? Let us not forget  the reality of companies\nMethods Of Measurement The Three-Dimensional Wind Waves Spectra, Based  On The Processing Of Video Images Of The Sea Surface\nCortical Surface Co-Registration based on MRI Images and Photos\nMachine learning of hierarchical clustering to segment 2D and 3D images\nPerformance Evaluation of Edge-Directed Interpolation Methods for Images\nAn N-dimensional approach towards object based classification of  remotely sensed imagery\nAn intelligent approach towards automatic shape modeling and object  extraction from satellite images using cellular automata based algorithm\nAn investigation towards wavelet based optimization of automatic image  registration techniques\nInductive Hashing on Manifolds\nAn Adaptive Descriptor Design for Object Recognition in the Wild\nOn the Convergence and Consistency of the Blurring Mean-Shift Process\nHow to find real-world applications for compressive sensing\nA new framework for optimal classifier design\nEarly Detection of Alzheimer's - A Crucial Requirement\nA Bag of Words Approach for Semantic Segmentation of Monitored Scenes\nClassification for Big Dataset of Bioacoustic Signals Based on Human  Scoring System and Artificial Neural Network\nBioacoustic Signal Classification Based on Continuous Region Processing,  Grid 
Masking and Artificial Neural Network\nMachine learning on images using a string-distance\nMobile Network Anomaly Detection and Mitigation: The NEMESYS Approach\nObject Detection with Pixel Intensity Comparisons Organized in Decision  Trees\nEfficient Image Retargeting for High Dynamic Range Scenes\nA Supervised Neural Autoregressive Topic Model for Simultaneous Image  Classification and Annotation\nNonnegative Tensor Factorization, Completely Positive Tensors and an  Hierarchical Elimination Algorithm\nEdge Detection in Radar Images Using Weibull Distribution\nMatrices of forests, analysis of networks, and ranking problems\nA Local Active Contour Model for Image Segmentation with Intensity  Inhomogeneity\nLensless Imaging by Compressive Sensing\nRobust Hyperspectral Unmixing with Correntropy based Metric\nAn Analysis of the Connections Between Layers of Deep Neural Networks\nPyHST2: an hybrid distributed code for high speed tomographic  reconstruction with iterative reconstruction and a priori knowledge  capabilities\nStatistical Denoising for single molecule fluorescence microscopic  images\nEmotional Expression Classification using Time-Series Kernels\nDiscriminative k-means clustering\nHand Gesture Recognition Based on Karhunen-Loeve Transform\nRecurrent Convolutional Neural Networks for Scene Parsing\nOptimization of Clustering for Clustering-based Image Denoising\nThe Ripple Pond: Enabling Spiking Networks to See\nLearning to encode motion using spatio-temporal synchrony\nFeature Learning by Multidimensional Scaling and its Applications in  Object Recognition\nMatching objects across the textured-smooth continuum\nLive-wire 3D medical images segmentation\nHyperparameter Optimization and Boosting for Classifying Facial  Expressions: How good can a \"Null\" Model be?\nNon-Uniform Blind Deblurring with a Spatially-Adaptive Sparse Prior\nTwo-View Matching with View Synthesis Revisited\nMulti-view in Lensless Compressive Imaging\nFinite Element Based Tracking of 
Deforming Surfaces\nFine-Grained Visual Classification of Aircraft\nNew Approach of Estimating PSNR-B For De-blocked Images\nA maximal-information color to gray conversion method for document  images: Toward an optimal grayscale representation for document image  binarization\nActive Contour Models for Manifold Valued Image Segmentation\nNew Mathematical and Algorithmic Schemes for Pattern Classification with  Application to the Identification of Writers of Important Ancient Documents\nHyperspectral Data Unmixing Using GNMF Method and Sparseness Constraint\nMultilevel Threshold Based Gray Scale Image Segmentation using Cuckoo  Search\nA Novel Robust Method to Add Watermarks to Bitmap Images by Fading  Technique\nFurther results on dissimilarity spaces for hyperspectral images RF-CBIR\nToward Guaranteed Illumination Models for Non-Convex Objects\nDetection of Outer Rotations on 3D-Vector Fields with Iterative  Geometric Correlation and its Efficiency\nAnisotropic Diffusion for Details Enhancement in Multi-Exposure Image  Fusion\nFast Exact Search in Hamming Space with Multi-Index Hashing\nContrast Enhancement And Brightness Preservation Using Multi-  Decomposition Histogram Equalization\nModified SPLICE and its Extension to Non-Stereo Data for Noise Robust  Speech Recognition\nProcessing stationary noise: model and parameter selection in  variational methods\nAutomated Defect Localization via Low Rank Plus Outlier Modeling of  Propagating Wavefield Data\nUsing a Dynamic Neural Field Model to Explore a Direct Collicular  Inhibition Account of Inhibition of Return\nUnderstanding Humans' Strategies in Maze Solving\nAppearance Descriptors for Person Re-identification: a Comprehensive  Review\nAn Adaptive GMM Approach to Background Subtraction for Application in  Real Time Surveillance\nMatching-Constrained Active Contours\nSelf-Learning for Player Localization in Sports Video\nUnion of Low-Rank Subspaces Detector\nA Study on Unsupervised Dictionary Learning and Feature 
Encoding for  Action Classification\nMinutiae Based Thermal Human Face Recognition using Label Connected  Component Algorithm\nRadar shadow detection in SAR images using DEM and projections\nSingle image super resolution in spatial and wavelet domain\nReal-Time and Continuous Hand Gesture Spotting: an Approach Based on  Artificial Neural Networks\nContour Manifolds and Optimal Transport\nRobust Periocular Recognition By Fusing Sparse Representations of Color  and Geometry Information\nRecovery guarantees for exemplar-based clustering\nVisual-Semantic Scene Understanding by Sharing Labels in a Context  Network\nSEEDS: Superpixels Extracted via Energy-Driven Sampling\nSparsity Based Poisson Denoising with Dictionary Learning\nPhoton counting compressive depth mapping\nGRED: Graph-Regularized 3D Shape Reconstruction from Highly Anisotropic  and Noisy Images\nA novel approach for nose tip detection using smoothing by weighted  median filtering applied to 3D face images in variant poses\nDetection of pose orientation across single and multiple axes in case of  3D face images\nA novel approach to nose-tip and eye corners detection using H-K  Curvature Analysis in case of 3D images\nBlind Deconvolution via Maximum Kurtosis Adaptive Filtering\nLatent Fisher Discriminant Analysis\nOnline Algorithms for Factorization-Based Structure from Motion\nEfficient Algorithms for Robust and Stable Principal Component Pursuit  Problems\nAn Efficient Index for Visual Search in Appearance-based SLAM\nAdopting level set theory based algorithms to segment human ear\nFace Verification Using Boosted Cross-Image Features\nIdentificación y Registro Catastral de Cuerpos de Agua mediante  Técnicas de Procesamiento Digital de Imagenes\nObject Detection Using Keygraphs\nElectricity Market Forecasting via Low-Rank Multi-Kernel Learning\nSpatially Scalable Compressed Image Sensing with Hybrid Transform and  Inter-layer Prediction Model\nSecond order scattering descriptors predict fMRI activity due 
to visual  textures\nA Novel Progressive Image Scanning and Reconstruction Scheme based on  Compressed Sensing and Linear Prediction\nDirector Field Model of the Primary Visual Cortex for Contour Detection\nA Splitting Augmented Lagrangian Method for Low Multilinear-Rank Tensor  Recovery\nEnd-to-End Text Recognition with Hybrid HMM Maxout Models\nEarly Fire Detection Using HEP and Space-time Analysis\nFeature Selection Strategies for Classifying High Dimensional  Astronomical Data Sets\nFrom Shading to Local Shape\nAn Improved K-means Clustering Based Approach to Detect a DNA Structure  in H&E Image of Mouse Tissue Reacted with CD4-Green Antigen\nNew Ways to Promote Sustainability and Social Well-Being in a Complex,  Strongly Interdependent World: The FuturICT Approach\nMapping the stereotyped behaviour of freely-moving fruit flies\nFine-grained Categorization -- Short Summary of our Entry for the  ImageNet Challenge 2012\nPrincipal motion components for gesture recognition using a  single-example\nAdvances in Hyperspectral Image Classification: Earth monitoring with  statistical learning methods\nShip Detection and Segmentation using Image Correlation\nRANSAC: Identification of Higher-Order Geometric Features and  Applications in Humanoid Robot Soccer\nFusion of Hyperspectral and Panchromatic Images using Spectral Uumixing  Results\nPseudo vs. 
True Defect Classification in Printed Circuits Boards using  Wavelet Features\nTwo Dimensional Array Imaging with Beam Steered Data\nCompressed Sensing SAR Imaging with Multilook Processing\nImpulse Noise Removal In Speech Using Wavelets\nRobust Compressed Sensing and Sparse Coding with the Difference Map\nStructure-preserving color transformations using Laplacian commutativity\nReconstruction of Complex-Valued Fractional Brownian Motion Fields Based  on Compressive Sampling and Its Application to PSF Interpolation in Weak  Lensing Survey\nTracking Deformable Parts via Dynamic Conditional Random Fields\nA Parallel Compressive Imaging Architecture for One-Shot Acquisition\nMotion and audio analysis in mobile devices for remote monitoring of  physical activities and user authentication\nFace Recognition via Globality-Locality Preserving Projections\nBiometric Signature Processing & Recognition Using Radial Basis Function  Network\nA new stopping criterion for the mean shift iterative algorithm\nVolumetric Reconstruction Applied to Perceptual Studies of Size and  Weight\nVisualizing and Understanding Convolutional Networks\nAn Efficient Method for Recognizing the Low Quality Fingerprint  Verification by Means of Cross Correlation\nThe STONE Transform: Multi-Resolution Image Enhancement and Real-Time  Compressive Video\nDescribing Textures in the Wild\nPeriodicity Extraction using Superposition of Distance Matching Function  and One-dimensional Haar Wavelet Transform\nBlind Deconvolution with Non-local Sparsity Reweighting\nA Comparative Study of Histogram Equalization Based Image Enhancement  Techniques for Brightness Preservation and Contrast Enhancement\nReflection methods for user-friendly submodular optimization\nTexture descriptor combining fractal dimension and artificial crawlers\nAdaptive Learning of Region-based pLSA Model for Total Scene Annotation\nPANDA: Pose Aligned Networks for Deep Attribute Modeling\nOn Nonrigid Shape Similarity and Correspondence\nA 
brief network analysis of Artificial Intelligence publication\nLocal Similarities, Global Coding: An Algorithm for Feature Coding and  its Applications\nOn Approximate Inference for Generalized Gaussian Process Models\nHilditchs Algorithm Based Tamil Character Recognition\nCross-Domain Sparse Coding\nUnobtrusive Low Cost Pupil Size Measurements using Web cameras\nShape from Texture using Locally Scaled Point Processes\nDual coordinate solvers for large-scale structural SVMs\nMulti-frame denoising of high speed optical coherence tomography data  using inter-frame and intra-frame priors\nScalable Object Detection using Deep Neural Networks\nOn the Performance of Filters for Reduction of Speckle Noise in SAR  Images off the Coast of the Gulf of Guinea\nFast Neighborhood Graph Search using Cartesian Concatenation\nUnsupervised learning of depth and motion\nSparse Matrix-based Random Projection for Classification\nECOC-Based Training of Neural Networks for Face Recognition\nOne-Shot-Learning Gesture Recognition using HOG-HOF Features\nTeleoperation System Using Past Image Records Considering Narrow  Communication Band\nDecomposition of Optical Flow on the Sphere\nLearned versus Hand-Designed Feature Representations for 3d  Agglomeration\nTotal variation with overlapping group sparsity for image deblurring  under impulse noise\nOverFeat: Integrated Recognition, Localization and Detection using  Convolutional Networks\nGrowing Regression Forests by Classification: Applications to Object  Pose Estimation\nTop Down Approach to Multiple Plane Detection\nSequentially Generated Instance-Dependent Image Representations for  Classification\n3D Interest Point Detection via Discriminative Learning\nAn Unsupervised Approach for Automatic Activity Recognition based on  Hidden Markov Model Regression\nFinding More Relevance: Propagating Similarity on Markov Random Field  for Image Retrieval\nCorrelation-based construction of neighborhood and edge features\nLesion Border Detection in 
Dermoscopy Images Using Ensembles of  Thresholding Methods\nShape Primitive Histogram: A Novel Low-Level Face Representation for  Face Recognition\nCollaborative Discriminant Locality Preserving Projections With its  Application to Face Recognition\nTotal variation regularization for manifold-valued data\nA Novel Approach For Generating Face Template Using Bda\nSystem Analysis And Design For Multimedia Retrieval Systems\nMedical Image Fusion: A survey of the state of the art\nHybrid Approach to Face Recognition System using Principle component and  Independent component with score based fusion process\nAdaptive-Rate Compressive Sensing Using Side Information\nMachine Assisted Authentication of Paper Currency: an Experiment on  Indian Banknotes\nConceptVision: A Flexible Scene Classification Framework\nContext-Aware Hypergraph Construction for Robust Spectral Clustering\nFrom Kernel Machines to Ensemble Learning\nFeature Selection Using Classifier in High Dimensional Data\nFast nonparametric clustering of structured time-series\nImage reconstruction from few views by L0-norm optimization\nEnhancement performance of road recognition system of autonomous robots  in shadow scenario\nSatellite image classification and segmentation using non-additive  entropy\nInsights into analysis operator learning: From patch-based sparse models  to higher-order MRFs\nMultilinear Wavelets: A Statistical Shape Space for Human Faces\nTensor Representation and Manifold Learning Methods for Remote Sensing  Images\nApplication of the Modified Fractal Signature Method for Terrain  Classification from Synthetic Aperture Radar Images\nAn Enhanced Method For Evaluating Automatic Video Summaries\nStructured Priors for Sparse-Representation-Based Hyperspectral Image  Classification\nLearning $\\ell_1$-based analysis and synthesis sparsity priors using  bi-level optimization\nDistortion-driven Turbulence Effect Removal using Variational Model\nHumanoid Robot With Vision Recognition Control 
System\nAn Analysis of Random Projections in Cancelable Biometrics\nEnhancing Template Security of Face Biometrics by Using Edge Detection  and Hashing\nImage Block Loss Restoration Using Sparsity Pattern as Side Information\nFace Verification Using Kernel Principle Component Analysis\nFace Verification System based on Integral Normalized Gradient  Image(INGI)\nImage enhancement using fusion by wavelet transform and laplacian  pyramid\nSmoothed Low Rank and Sparse Matrix Recovery by Iteratively Reweighted  Least Squares Minimization\nInformation quantity in a pixel of digital image\nA Generalized Probabilistic Framework for Compact Codebook Creation\nCross-calibration of Time-of-flight and Colour Cameras\nBayes Merging of Multiple Vocabularies for Scalable Image Retrieval\nMultiview Hessian regularized logistic regression for action recognition\nK-Tangent Spaces on Riemannian Manifolds for Improved Pedestrian  Detection\nCollaborative Representation for Classification, Sparse or Non-sparse?\nIllumination,Expression and Occlusion Invariant Pose-Adaptive Face  Recognition System for Real-Time Applications\nAutomated Tracking and Estimation for Control of Non-rigid Cloth\nFeature Extraction of ECG Signal Using HHT Algorithm\nCompressive Hyperspectral Imaging Using Progressive Total Variation\nMulti-scale Orderless Pooling of Deep Convolutional Activation Features\n3D Well-composed Polyhedral Complexes\nA Novel Method to Extract Rocks from Mars Images\nSpontaneous expression classification in the encrypted domain\nGeometric VLAD for Large Scale Image Search\nStructured Sparse Method for Hyperspectral Unmixing\nA Non-Local Structure Tensor Based Approach for Multicomponent Image  Recovery Problems\nAn Efficient Method for Face Recognition System In Various Assorted  Conditions\nSmartAnnotator: An Interactive Tool for Annotating RGBD Indoor Images\nSRA: Fast Removal of General Multipath for ToF Sensors\nBrain Tumor Detection Based On Mathematical Analysis and Symmetry  
Information\nSelective Factor Extraction in High Dimensions\nImage Retargeting by Content-Aware Synthesis\nOptimized imaging using non-rigid registration\nPyramidal Fisher Motion for Multiview Gait Recognition\nCompressive Pattern Matching on Multispectral Data\nExpectation-Maximization Technique and Spatial-Adaptation Applied to  Pel-Recursive Motion Estimation\nActive Deformable Part Models\nA Continuous Max-Flow Approach to General Hierarchical Multi-Labeling  Problems\nEnabling Automatic Certification of Online Auctions\nSparse Wavelet Representations of Spatially Varying Blurring Operators\nResolving Multi-path Interference in Time-of-Flight Imaging via  Modulation Frequency Diversity and Sparse Regularization\nRecognition of Handwritten MODI Numerals using Hu and Zernike features\nImproving Bilayer Product Quantization for Billion-Scale Approximate  Nearest Neighbors in High Dimensions\nRANCOR: Non-Linear Image Registration with Total Variation  Regularization\nBayesian image segmentations by Potts prior and loopy belief propagation\nShrinkage Optimized Directed Information using Pictorial Structures for  Action Recognition\nRecover Canonical-View Faces in the Wild with Deep Neural Networks\nOnline Group Feature Selection\nRobust Face Recognition via Adaptive Sparse Representation\nGeometric Abstraction from Noisy Image-Based 3D Reconstructions\nA higher-order MRF based variational model for multiplicative noise  reduction\nFast Approximate Matching of Cell-Phone Videos for Robust Background  Subtraction\nLinking Geographic Vocabularies through WordNet\nLarge Margin Image Set Representation and Classification\nFind my mug: Efficient object search with a mobile robot using semantic  segmentation\nMaximum Margin Vector Correlation Filter\nImproving weather radar by fusion and classification\nSpatially Directional Predictive Coding for Block-based Compressive  Sensing of Natural Images\nStructural Group Sparse Representation for Image Compressive Sensing  
Recovery\nVSCAN: An Enhanced Video Summarization using Density-based Spatial  Clustering\nRule of Three for Superresolution of Still Images with Applications to  Compression and Denoising\nA Continuous Max-Flow Approach to Multi-Labeling Problems under  Arbitrary Region Regularization\nComparing apples to apples in the evaluation of binary coding methods\nNuclear Norm based Matrix Regression with Applications to Face  Recognition with Occlusion and Illumination Changes\nNew Algorithmic Approaches to Point Constellation Recognition\nTexture Based Image Segmentation of Chili Pepper X-Ray Images Using  Gabor Filter\nPrecision Enhancement of 3D Surfaces from Multiple Compressed Depth Maps\nVariational Image Segmentation Model Coupled with Image Restoration  Achievements\nAn Overview of Face Liveness Detection\nGraph Regularized Non-negative Matrix Factorization By Maximizing  Correntropy\nHyperspectral pan-sharpening: a variational convex constrained  formulation to impose parallel level lines, solved with ADMM\nAnomaly-Sensitive Dictionary Learning for Unsupervised Diagnostics of  Solid Media\nCross-view Action Modeling, Learning and Recognition\nImage Restoration Using Joint Statistical Modeling in Space-Transform  Domain\nActive Mining of Parallel Video Streams\nKronecker PCA Based Spatio-Temporal Modeling of Video for Dismount  Classification\nSparsity Based Methods for Overparameterized Variational Problems\nDynamic Hierarchical Bayesian Network for Arabic Handwritten Word  Recognition\nIterative Non-Local Shrinkage Algorithm for MR Image Reconstruction\nDescriptor Matching with Convolutional Neural Networks: a Comparison to  SIFT\nImprovements and Experiments of a Compact Statistical Background Model\nMulti-view Metric Learning for Multi-view Video Summarization\nSupervised Dictionary Learning by a Variational Bayesian Group Sparse  Nonnegative Matrix Factorization\nLarge Scale, Large Margin Classification using Indefinite Similarity  Measures\nSampling, splines 
and frames on compact manifolds\nDetection Bank: An Object Detection Based Video Representation for  Multimedia Event Recognition\nFeature sampling and partitioning for visual vocabulary generation on  large action classification datasets\nDEM Registration and Error Analysis using ASCII values\nIdentifying Outliers in Large Matrices via Randomized Adaptive  Compressive Sampling\nA New Path to Construct Parametric Orientation Field: Sparse FOMFE Model  and Compressed Sparse FOMFE Model\nDeep Poselets for Human Detection\nSolving QVIs for Image Restoration with Adaptive Constraint Sets\nExpanding the Family of Grassmannian Kernels: An Embedding Perspective\nWeakly Supervised Action Labeling in Videos Under Ordering Constraints\nA Cylindrical Basis Function for Solving Partial Differential Equations  on Manifolds\nHomophilic Clustering by Locally Asymmetric Geometry\nThe Primal-Dual Hybrid Gradient Method for Semiconvex Splittings\nPAINTER: a spatio-spectral image reconstruction algorithm for optical  interferometry\nOnline Stroke and Akshara Recognition GUI in Assamese Language Using  Hidden Markov Model\nClassifiers fusion method to recognize handwritten persian numerals\nClassifying Fonts and Calligraphy Styles Using Complex Wavelet Transform\nOffline handwritten signature identification using adaptive window  positioning techniques\nARTOS -- Adaptive Real-Time Object Detection System\nFAME: Face Association through Model Evolution\nAn SVM Based Approach for Cardiac View Planning\nArticulated Pose Estimation by a Graphical Model with Image Dependent  Pairwise Relations\nAn Enhancement Neighborhood connected Segmentation for 2D-Cellular Image\nA New Approach for Super resolution by Using Web Images and FFT Based  Image Registration\nRecovery of Images with Missing Pixels using a Gradient Compressive  Sensing Algorithm\nAggregate channel features for multi-view face detection\nHand Pointing Detection Using Live Histogram Template of Forehead Skin\nCertifying the 
Existence of Epipolar Matrices\nThe U-curve optimization problem: improvements on the original algorithm  and time complexity analysis\nJoint Energy-based Detection and Classificationon of Multilingual Text  Lines\nscikit-image: Image processing in Python\nLearning Structured Outputs from Partial Labels using Forest Ensemble\nNovel and Fast Algorithm for Extracting License Plate Location Based on  Edge Analysis\nA unified framework for thermal face recognition\nA discussion on the validation tests employed to compare human action  recognition methods using the MSR Action3D dataset\nA Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene  Text Extraction\nAccurate merging of images for predictive analysis using combined image\nA New Model of Array Grammar for generating Connected Patterns on an  Image Neighborhood\nVariational Depth from Focus Reconstruction\nMethodology For Detection of QRS Pattern Using Secondary Wavelets\nAdaptive Wavelet Based Identification and Extraction of PQRST  Combination in Randomly Stretching ECG Sequence\nDetermining the Number of Clusters via Iterative Consensus Clustering\nA Flexible Iterative Framework for Consensus Clustering\nIt is hard to see a needle in a haystack: Modeling contrast masking  effect in a numerical observer\nFormation of General Position by Asynchronous Mobile Robots\nMultidimensional Digital Filters for Point-Target Detection in Cluttered  Infrared Scenes\nSpectral Unmixing of Hyperspectral Imagery using Multilayer NMF\nHashing for Similarity Search: A Survey\nHuman Activity Learning and Segmentation using Partially Hidden  Discriminative Models\nRobust Statistical Ranking: Theory and Algorithms\nParallel software implementation of recursive multidimensional digital  filters for point-target detection in cluttered infrared scenes\nMotion Deblurring for Plenoptic Images\nHighly Accurate Multispectral Palmprint Recognition Using Statistical  and Wavelet Features\nHOPC: Histogram of Oriented 
Principal Components of 3D Pointclouds for  Action Recognition\nAction Classification with Locality-constrained Linear Coding\nLearning Deep Representation for Face Alignment with Auxiliary  Attributes\nWhat makes an Image Iconic? A Fine-Grained Case Study\nUnsupervised Parallel Extraction based Texture for Efficient Image  Representation\nSeeing through bag-of-visual-word glasses: towards understanding  quantization effects in feature extraction methods\nGIMP and Wavelets for Medical Image Processing: Enhancing Images of the  Fundus of the Eye\nUnsupervised Spike Sorting Based on Discriminative Subspace Learning\nHierarchical Saliency Detection on Extended CSSD\nLearn Convolutional Neural Network for Face Anti-Spoofing\nDependent Nonparametric Bayesian Group Dictionary Learning for online  reconstruction of Dynamic MR images\nSparse Graph-based Transduction for Image Classification\nCompression, Restoration, Re-sampling, Compressive Sensing: Fast  Transforms in Digital Imaging\nMultispectral Palmprint Recognition Using Textural Features\nText Line Identification in Tagore's Manuscript\nTemporal Extension of Scale Pyramid and Spatial Pyramid Matching for  Action Recognition\nRiemannian Multi-Manifold Modeling\nNon-parametric Image Registration of Airborne LiDAR, Hyperspectral and  Photographic Imagery of Forests\nMultidimensional Digital Smoothing Filters for Target Detection\nGroup Orbit Optimization: A Unified Approach to Data Normalization\nA Model of Plant Identification System Using GLCM, Lacunarity And Shen  Features\nLearning Invariant Color Features for Person Re-Identification\nMemristive Threshold Logic Circuit Design of Fast Moving Object  Detection\nHierarchical Sparse and Collaborative Low-Rank Representation for  Emotion Recognition\nFace Detection Using Radial Basis Functions Neural Networks With Fixed  Spread\nImage Denoising using New Adaptive Based Median Filters\nAn Aerial Image Recognition Framework using Discrimination and  Redundancy Quality 
Measure\nA unified approach for multi-object triangulation, tracking and camera  calibration\nTag Relevance Fusion for Social Image Retrieval\nZero-Shot Object Recognition System based on Topic Model\nCrowd Saliency Detection via Global Similarity Structure\nDetection of Salient Regions in Crowded Scenes\nOnline interpretation of numeric sign language using 2-d skeletal model\nImplicit segmentation of Kannada characters in offline handwriting  recognition using hidden Markov models\nA Gesture Recognition System for Detecting Behavioral Patterns of ADHD\nGraph-Sparse LDA: A Topic Model with Structured Sparsity\nRandomized Structural Sparsity via Constrained Block Subsampling for  Improved Sensitivity of Discriminative Voxel Identification\nAttentive monitoring of multiple video streams driven by a Bayesian  foraging strategy\nCompositional Structure Learning for Action Understanding\nForeground-Background Segmentation Based on Codebook and Edge Detector\nA Novel Visual Word Co-occurrence Model for Person Re-identification\nA Framework for On-Line Devanagari Handwritten Character Recognition\nDirectional Bilateral Filters\nA method for context-based adaptive QRS clustering in real-time\nVisual Chunking: A List Prediction Framework for Region-Based Object  Detection\nDeep Structured learning for mass segmentation from Mammograms\nRobust Piecewise-Constant Smoothing: M-Smoother Revisited\nSuper-resolution method using sparse regularization for point-spread  function recovery\nCollaborative Multi-sensor Classification via Sparsity-based  Representation\nExtended Dynamic Programming and Fast Multidimensional Search Algorithm  for Energy Minization in Stereo and Motion\nA comparison of dense region detectors for image search and fine-grained  classification\nAn ensemble-based system for automatic screening of diabetic retinopathy\nAn Ensemble-based System for Microaneurysm Detection and Diabetic  Retinopathy Grading\nSymmetric low-rank representation for subspace 
clustering\nGeneralized Adaptive Dictionary Learning via Domain Shift Minimization\nDetection of texts in natural images\nGeodesic Exponential Kernels: When Curvature and Linearity Conflict\nHigh Dynamic Range Imaging by Perceptual Logarithmic Exposure Merging\nNon Binary Local Gradient Contours for Face Recognition\nSimultaneous Localization, Mapping, and Manipulation for Unsupervised  Object Discovery\nA random algorithm for low-rank decomposition of large-scale matrices  with missing entries\nLarge-Margin Determinantal Point Processes\nFast Mesh-Based Medical Image Registration\nStacked Quantizers for Compositional Vector Compression\nInfinite Object Coating in the Amoebot Model\nApplications of sampling Kantorovich operators to thermographic images  for seismic engineering\n3D Shape Estimation from 2D Landmarks: A Convex Relaxation Approach\nPerson Re-identification Based on Color Histogram and Spatial  Configuration of Dominant Color Regions\nWindow-Based Descriptors for Arabic Handwritten Alphabet Recognition: A  Comparative Study on a Novel Dataset\nGaze Stabilization for Humanoid Robots: a Comprehensive Framework\nSparse And Low Rank Decomposition Based Batch Image Alignment for  Speckle Reduction of retinal OCT Images\nFully Convolutional Networks for Semantic Segmentation\nA Faster Method for Tracking and Scoring Videos Corresponding to  Sentences\nJoint cross-domain classification and subspace learning for unsupervised  adaptation\nTILDE: A Temporally Invariant Learned DEtector\nAlexU-Word: A New Dataset for Isolated-Word Closed-Vocabulary Offline  Arabic Handwriting Recognition\nDesigning Deep Networks for Surface Normal Estimation\nA Pooling Approach to Modelling Spatial Relations for Image Retrieval  and Annotation\nSparse distributed localized gradient fused features of objects\nEfficient Media Retrieval from Non-Cooperative Queries\nEnd-to-End Integration of a Convolutional Network, Deformable Parts  Model and Non-Maximum Suppression\nMaximum 
Likelihood Directed Enumeration Method in Piecewise-Regular  Object Recognition\nHypercolumns for Object Segmentation and Fine-grained Localization\nCategory-Specific Object Reconstruction from a Single Image\nOn the mathematic modeling of non-parametric curves based on cubic  Bézier curves\nDeep convolutional filter banks for texture recognition and segmentation\nImage Classification and Retrieval from User-Supplied Tags\nPost-acquisition image based compensation for thickness variation in  microscopy section series\nPatents used by NPE as an Open Information System in Web 2.0 - Two mini  case studies\nBi-objective Optimization for Robust RGB-D Visual Odometry\nA statistical reduced-reference method for color image quality  assessment\nCross-Modal Learning via Pairwise Constraints\nArticulated motion discovery using pairs of trajectories\nOn Rendering Synthetic Images for Training an Object Detector\nLearning Face Representation from Scratch\nMultiple object tracking with context awareness\nEffective Face Frontalization in Unconstrained Images\nFace recognition using color local binary pattern from mutually  independent color channels\nNon-iterative rigid 2D/3D point-set registration using semidefinite  programming\nA Deep-structured Conditional Random Field Model for Object Silhouette  Tracking\nHashing with binary autoencoders\nGroup $K$-Means\nStem-Calyx Recognition of an Apple using Shape Descriptors\nA Novel Technique for Grading of Dates using Shape and Texture Features\nObject localization in ImageNet by looking out of the window\nSuper-resolution MRI Using Finite Rate of Innovation Curves\nA Systematic Scheme for Measuring the Performance of the Display-Camera  Channel\nOnline Handwritten Devanagari Stroke Recognition Using Extended  Directional Features\nA Modified No Search Algorithm for Fractal Image Compression\nAn Adaptive Neuro-Fuzzy Inference System Modeling for Grid-Adaptive  Interpolation over Depth Images\nRobust and Real Time Detection of Curvy 
Lanes (Curves) with Desired  Slopes for Driving Assistance and Autonomous Vehicles\nHigher dimensional homodyne filtering for suppression of incidental  phase artifacts in multichannel MRI\nImage enhancement in intensity projected multichannel MRI using  spatially adaptive directional anisotropic diffusion\nScreen Content Image Segmentation Using Least Absolute Deviation Fitting\nSubmodular relaxation for inference in Markov random fields\nMind the Gap: Subspace based Hierarchical Domain Adaptation\nImproving resolution and depth of astronomical observations via modern  mathematical methods for image analysis\nA Fast Fractal Image Compression Algorithm Using Predefined Values for  Contrast Scaling\nPairwise Constraint Propagation on Multi-View Data\nClustering based on the In-tree Graph Structure and Affinity Propagation\nInstance Significance Guided Multiple Instance Boosting for Robust  Visual Tracking\nNaive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not?\nDeepHash: Getting Regularization, Depth and Fine-Tuning Right\nHandwritten Devanagari Script Segmentation: A non-linear Fuzzy Approach\nDesign of a novel convex hull based feature set for recognition of  isolated handwritten Roman numerals\nAn Improved Feature Descriptor for Recognition of Handwritten Bangla  Alphabet\nGlobally Optimal Cell Tracking using Integer Programming\nEstimating the Intrinsic Dimension of Hyperspectral Images Using an  Eigen-Gap Approach\nUnsupervised image segmentation by Global and local Criteria  Optimization Based on Bayesian Networks\nBi-Objective Nonnegative Matrix Factorization: Linear Versus  Kernel-Based Models\nBeyond Frontal Faces: Improving Person Recognition Using Multiple Cues\nTaking a Deeper Look at Pedestrians\nUnsupervised Segmentation of Multispectral Images with Cellular Automata\nAccurate automatic segmentation of retina layers with emphasis on first  layer\nAn Occlusion Reasoning Scheme for Monocular Pedestrian Tracking in  Dynamic 
Scenes\nUnsupervised Object Discovery and Localization in the Wild: Part-based  Matching with Bottom-up Region Proposals\nParallel Magnetic Resonance Imaging\nDeep Transductive Semi-supervised Maximum Margin Clustering\nGeodesic convolutional neural networks on Riemannian manifolds\nIT-map: an Effective Nonlinear Dimensionality Reduction Method for  Interactive Clustering\nPairwise Rotation Hashing for High-dimensional Features\nWeakly Supervised Learning for Salient Object Detection\nCo-Regularized Deep Representations for Video Summarization\nGibbs-Ringing Artifact Removal Based on Local Subvoxel-shifts\nMulti-task Image Classification via Collaborative, Hierarchical  Spike-and-Slab Priors\nQuality Control in Crowdsourced Object Segmentation\nApplication of S-Transform on Hyper kurtosis based Modified Duo  Histogram Equalized DIC images for Pre-cancer Detection\nDense Optical Flow Prediction from a Static Image\nLearning Temporal Embeddings for Complex Video Analysis\nJoint Multi-Leaf Segmentation, Alignment and Tracking from Fluorescence  Plant Videos\nReNet: A Recurrent Neural Network Based Alternative to Convolutional  Networks\nElectron Neutrino Classification in Liquid Argon Time Projection Chamber  Detector\nSequence to Sequence -- Video to Text\nModeling Representation of Videos for Anomaly Detection using Deep  Learning: A Review\nHigher Order Maximum Persistency and Comparison Theorems\nUnsupervised Learning of Visual Representations using Videos\nEmpirical Evaluation of Rectified Activations in Convolutional Network\nLarge-scale Classification of Fine-Art Paintings: Learning The Right  Metric on The Right Feature\nAdaptive diffusion constrained total variation scheme with application  to `cartoon + texture + edge' image decomposition\nIn Defense of the Direct Perception of Affordances\nContextual Action Recognition with R*CNN\nWebly Supervised Learning of Convolutional Networks\nAdaptive Nonparametric Image Parsing\nFilter characteristics in image 
decomposition with singular spectrum  analysis\nDeepBox: Learning Objectness with Convolutional Networks\nLearning image representations tied to ego-motion\nSubset Feature Learning for Fine-Grained Category Classification\nAutomatic Script Identification in the Wild\nA new Level-set based Protocol for Accurate Bone Segmentation from CT  Imaging\nLeveraging Image based Prior for Visual Place Recognition\nAPAC: Augmented PAttern Classification with Neural Networks\nLoop-corrected belief propagation for lattice spin models\nMulti-scale Volumes for Deep Object Detection and Localization\nUnsupervised Object Discovery and Tracking in Video Collections\nRobust Real-time Extraction of Fiducial Facial Feature Points using  Haar-like Features\nLearning Deconvolution Network for Semantic Segmentation\nVisual Semantic Role Labeling\nU-Net: Convolutional Networks for Biomedical Image Segmentation\nMulti-Image Matching via Fast Alternating Minimization\nHave a Look at What I See\nCharacter-level Chinese Writer Identification using Path Signature  Feature, DropStroke and Deep CNN\nConvective regularization for optical flow\nImage Reconstruction from Bag-of-Visual-Words\nUnsupervised Visual Representation Learning by Context Prediction\nBarcode Annotations for Medical Image Retrieval: A Preliminary  Investigation\nImage aesthetic evaluation using paralleled deep convolution neural  network\nMulti-scale recognition with DAG-CNNs\nLive Video Synopsis for Multiple Cameras\nA Posteriori Error Control for the Binary Mumford-Shah Model\nObject Modelling with a Handheld RGB-D Camera\nWatch and Learn: Semi-Supervised Learning of Object Detectors from  Videos\nDiffusion Methods for Classification with Pairwise Relationships\nTunnel Surface 3D Reconstruction from Unoriented Image Sequences\nSmooth PARAFAC Decomposition for Tensor Completion\nRecognition Confidence Analysis of Handwritten Chinese Character with  CNN\nSequential Dimensionality Reduction for Extracting Localized 
Features\nEfficient Decomposition of Image and Mesh Graphs by Lifted Multicuts\nImproving Spatial Codification in Semantic Segmentation\nPrivacy in the Internet of Things: Threats and Challenges\nInvertible Orientation Scores of 3D Images\nEstimating Visual Comfort in Stereoscopic Displays Using  Electroencephalography: A Proof-of-Concept\nCURL: Co-trained Unsupervised Representation Learning for Image  Classification\nRepresenting data by sparse combination of contextual data points for  classification\nCompressive Deconvolution in Medical Ultrasound Imaging\nCross Modal Distillation for Supervision Transfer\nTV News Commercials Detection using Success based Locally Weighted  Kernel Combination\nScalable Sparse Subspace Clustering by Orthogonal Matching Pursuit\nBeyond Semantic Image Segmentation : Exploring Efficient Inference in  Video\nSpotlight the Negatives: A Generalized Discriminative Latent Model\nIris Recognition Using Scattering Transform and Textural Features\nUnderstanding Intra-Class Knowledge Inside CNN\nLearning Structured Ordinal Measures for Video based Face Recognition\nGeneralized Video Deblurring for Dynamic Scenes\nMulti-Type Activity Recognition in Robot-Centric Scenarios\nDeep Perceptual Mapping for Thermal to Visible Face Recognition\nFace Alignment Assisted by Head Pose Estimation\nUnconstrained Facial Landmark Localization with Backbone-Branches  Fully-Convolutional Networks\nEnsemble of Hankel Matrices for Face Emotion Recognition\nUntangling AdaBoost-based Cost-Sensitive Classification. Part I:  Theoretical Perspective\nUntangling AdaBoost-based Cost-Sensitive Classification. 
Part II:  Empirical Analysis\nA Deep Hashing Learning Network\nDiagnosing State-Of-The-Art Object Proposal Methods\nRBIR Based on Signature Graph\nDeep Multimodal Speaker Naming\nLearning Complexity-Aware Cascades for Deep Pedestrian Detection\nA Parameter-free Affinity Based Clustering\nEfficient moving point handling for incremental 3D manifold  reconstruction\nClustering Tree-structured Data on Manifold\nRule Of Thumb: Deep derotation for improved fingertip detection\nOnline Metric-Weighted Linear Representations for Robust Visual Tracking\nEvery Moment Counts: Dense Detailed Labeling of Actions in Complex  Videos\nBanzhaf Random Forests\nTowards Storytelling from Visual Lifelogging: An Overview\nFourier descriptors based on the structure of the human primary visual  cortex with applications to object recognition\nEfficient Face Alignment via Locality-constrained Representation for  Robust Recognition\nLearning 3D Deformation of Animals from 2D Images\nOffline Handwritten Signature Verification - Literature Review\nCollaborative Representation Classification Ensemble for Face  Recognition\nCross-pose Face Recognition by Canonical Correlation Analysis\nWhen VLAD met Hilbert\nPeople Counting in High Density Crowds from Still Images\nBeyond Gauss: Image-Set Matching on the Riemannian Manifold of PDFs\nFlip-Rotate-Pooling Convolution and Split Dropout on Convolution Neural  Networks for Image Classification\nMultimodal Multipart Learning for Action Recognition in Depth Videos\nMobile Multi-View Object Image Search\nOff-the-Grid Recovery of Piecewise Constant Images from Few Fourier  Samples\nIntensity-only optical compressive imaging using a multiply scattering  material and a double phase retrieval approach\nRAID: A Relation-Augmented Image Descriptor\nA System for Precise End-to-End Delay Measurements in Video  Communication\nVisual Tracking via Nonnegative Regularization Multiple Locality Coding\nEfficient Object Detection for High Resolution Images\nOn the 
Existence of Epipolar Matrices\nHarvesting Discriminative Meta Objects with Deep CNN Features for Scene  Classification\nActive Transfer Learning with Zero-Shot Priors: Reusing Past Datasets  for Future Tasks\nLearning Deep Representations of Appearance and Motion for Anomalous  Event Detection\nEuclidean Auto Calibration of Camera Networks: Baseline Constraint  Removes Scale Ambiguity\nStructured Transforms for Small-Footprint Deep Learning\nBuilding Resource Adaptive Software Systems (BRASS): Objectives and  System Evaluation\nData-Efficient Learning of Feedback Policies from Image Pixels using  Deep Dynamical Models\nDreaming More Data: Class-dependent Distributions over Diffeomorphisms  for Learned Data Augmentation\nOn the Definiteness of Earth Mover's Distance Yields and Its Relation to  Set Intersection\nWavelet Frame Based Image Restoration Using Sparsity, Nonlocal and  Support Prior of Frame Coefficients\nTagBook: A Semantic Video Representation without Supervision for Event  Detection\nTemporal Dynamic Appearance Modeling for Online Multi-Person Tracking\nOn 1-Laplacian Elliptic Equations Modeling Magnetic Resonance Image  Rician Denoising\nSpatial Semantic Regularisation for Large Scale Object Detection\nUsing Anatomical Markers for Left Ventricular Segmentation of Long Axis  Ultrasound Images\nMultiresolution Search of the Rigid Motion Space for Intensity Based  Registration\nFiltrated Spectral Algebraic Subspace Clustering\nA Novel Approach for Human Action Recognition from Silhouette Images\nSparsity-aware Possibilistic Clustering Algorithms\nBeyond Spatial Pyramid Matching: Space-time Extended Descriptor for  Action Recognition\nA Brief Survey of Image Processing Algorithms in Electrical Capacitance  Tomography\nShape Complexes in Continuous Max-Flow Hierarchical Multi-Labeling  Problems\nNo Spare Parts: Sharing Part Detectors for Image Categorization\nContent adaptive screen image scaling\nPredicting popularity of online videos using Support Vector 
Regression\nEfficient Unsupervised Temporal Segmentation of Motion Data\nOrder-Fractal transition in abstract paintings\nNonconvex Nonsmooth Low-Rank Minimization via Iteratively Reweighted  Nuclear Norm\nObjects2action: Classifying and localizing actions without any video  example\nVehicle Color Recognition using Convolutional Neural Network\nAggregating Deep Convolutional Features for Image Retrieval\nGeneralized Regressive Motion: a Visual Cue to Collision\nVideo Paragraph Captioning Using Hierarchical Recurrent Neural Networks\nDefect Detection Techniques for Airbag Production Sewing Stages\nLearning Multi-Domain Convolutional Neural Networks for Visual Tracking\nENFT: Efficient Non-Consecutive Feature Tracking for Robust  Structure-from-Motion\nHybrid One-Shot 3D Hand Pose Estimation by Exploiting Uncertainties\nScale-aware Fast R-CNN for Pedestrian Detection\nRobust Subspace Clustering via Tighter Rank Approximation\nVISALOGY: Answering Visual Analogy Questions\nPostprocessing of Compressed Images via Sequential Denoising\nDeep Recurrent Regression for Facial Landmark Detection\nEstimating Target Signatures with Diverse Density\nSemantic Cross-View Matching\nOptimized Mission Planning for Planetary Exploration Rovers\nSemantic Summarization of Egocentric Photo Stream Events\nWater Detection through Spatio-Temporal Invariant Descriptors\nFace Aging Effect Simulation using Hidden Factor Analysis Joint Sparse  Representation\nTrain and Test Tightness of LP Relaxations in Structured Prediction\nImage classification based on support vector machine and the fusion of  complementary features\nMulti-Target Tracking and Occlusion Handling with Learned Variational  Bayesian Clusters and a Social Force Model\nWood Species Recognition Based on SIFT Keypoint Histogram\nRadon-Nikodym approximation in application to image analysis\nEnhanced Low-Rank Matrix Approximation\nPooling the Convolutional Layers in Deep ConvNets for Action Recognition\nDeep Sliding Shapes for Amodal 
3D Object Detection in RGB-D Images\nSCUT-FBP: A Benchmark Dataset for Facial Beauty Perception\nLOGO-Net: Large-scale Deep Logo Detection and Brand Recognition with  Deep Region-based Convolutional Networks\nA new humanlike facial attractiveness predictor with cascaded  fine-tuning deep learning model\nBearing fault diagnosis based on spectrum images of vibration signals\nBatch-normalized Maxout Network in Network\nBiologically Inspired Dynamic Textures for Probing Motion Perception\nPartial Membership Latent Dirichlet Allocation\nMultiple Instance Dictionary Learning using Functions of Multiple  Instances\nSpatially Coherent Random Forests\nSpectral-Spatial Classification of Hyperspectral Image Using  Autoencoders\nHyperspectral Image Recovery via Hybrid Regularization\nTraffic Sign Classification Using Deep Inception Based Convolutional  Networks\nDeep Representation of Facial Geometric and Photometric Attributes for  Automatic 3D Facial Expression Recognition\nOnline Action Recognition based on Incremental Learning of Weighted  Covariance Descriptors\nAnalyzing Stability of Convolutional Neural Networks in the Frequency  Domain\nDynamic Belief Fusion for Object Detection\nThe Radon cumulative distribution transform and its application to image  classification\nTemplateNet for Depth-Based Object Instance Recognition\nSemantic Image Segmentation with Task-Specific Edge Detection Using CNNs  and a Discriminatively Trained Domain Transform\nAttention to Scale: Scale-aware Semantic Image Segmentation\nPrincipal Autoparallel Analysis: Data Analysis in Weitzenböck Space\nDiscovery Radiomics via StochasticNet Sequencers for Cancer Detection\nFacial Expression Detection using Patch-based Eigen-face Isomap Networks\nGod(s) Know(s): Developmental and Cross-Cultural Patterns in Children  Drawings\nA Continuous Max-Flow Approach to Cyclic Field Reconstruction\nAutomatic Content-Aware Color and Tone Stylization\nProNet: Learning to Propose Object-specific Boxes for Cascaded 
Neural  Networks\nHand-Object Interaction and Precise Localization in Transitive Action  Recognition\nWhen Naïve Bayes Nearest Neighbours Meet Convolutional Neural Networks\nFeature Learning based Deep Supervised Hashing with Pairwise Labels\nSequential estimation of intrinsic activity and synaptic input in single  neurons by particle filtering with optimal importance density\nFacial Landmark Detection with Tweaked Convolutional Neural Networks\nNewtonian Image Understanding: Unfolding the Dynamics of Objects in  Static Images\nAction Recognition using Visual Attention\nAdaptive Affinity Matrix for Unsupervised Metric Learning\nUnsupervised Learning of Edges\nStructure Inference Machines: Recurrent Neural Networks for Analyzing  Relations in Group Activity Recognition\nLearning Dense Convolutional Embeddings for Semantic Segmentation\nDeep Reflectance Maps\nSimilarity-based Text Recognition by Deeply Supervised Siamese Network\nZero-Shot Learning via Joint Latent Similarity Embedding\nLearning Fine-grained Features via a CNN Tree for Large-scale  Classification\nJointly Learning Non-negative Projection and Dictionary with  Discriminative Graph Constraints for Classification\nImplementation and comparative quantitative assessment of different  multispectral image pansharpening approches\nDeep Neural Network for Real-Time Autonomous Indoor Navigation\nSemi-Inner-Products for Convex Functionals and Their Use in Image  Decomposition\nSeparation Surfaces in the Spectral TV Domain for Texture Decomposition\nPerforming Highly Accurate Predictions Through Convolutional Networks  for Actual Telecommunication Challenges\nProposal Flow\nJoint Training of Generic CNN-CRF Models with Stochastic Optimization\nNonlinear Local Metric Learning for Person Re-identification\nVisualizing and Understanding Deep Texture Representations\nRobust PCA via Nonconvex Rank Approximation\nClassifying and Segmenting Microscopy Images Using Convolutional  Multiple Instance Learning\nHierarchical 
Spatial Sum-Product Networks for Action Recognition in  Still Images\nTowards Predicting the Likeability of Fashion Images\nParticular object retrieval with integral max-pooling of CNN activations\nCollecting and Annotating the Large Continuous Action Dataset\nABC-CNN: An Attention Based Convolutional Neural Network for Visual  Question Answering\nActive Object Localization with Deep Reinforcement Learning\nA Hierarchical Deep Temporal Model for Group Activity Recognition\nCompact Bilinear Pooling\nDeep Learning for Tactile Understanding From Visual and Haptic Data\nStructured Depth Prediction in Challenging Monocular Video Sequences\nPrincipled Parallel Mean-Field Inference for Discrete Random Fields\nCoreset-Based Adaptive Tracking\nMultimodal sparse representation learning and applications\nConvolutional Clustering for Unsupervised Learning\nFoveation-based Mechanisms Alleviate Adversarial Examples\nface anti-spoofing based on color texture analysis\nRobust Classification by Pre-conditioned LASSO and Transductive  Diffusion Component Analysis\nEfficient inference in occlusion-aware generative models of images\nManifold Regularized Deep Neural Networks using Adversarial Examples\nGeodesics of learned representations\nFeature-based Attention in Convolutional Neural Networks\nQBDC: Query by dropout committee for training deep supervised  architecture\nFirst Step toward Model-Free, Anonymous Object Tracking with Recurrent  Neural Networks\nA convnet for non-maximum suppression\nLearning Representations from EEG with Deep Recurrent-Convolutional  Neural Networks\nDeep Metric Learning via Lifted Structured Feature Embedding\nCompression of Deep Convolutional Neural Networks for Fast and Low Power  Mobile Applications\nElSe: Ellipse Selection for Robust Pupil Detection in Real-World  Environments\nRecurrent Semi-supervised Classification and Constrained Adversarial  Generation with Motion Capture Data\nTracklet Association by Online Target-Specific Metric Learning and  
Coherent Dynamics Estimation\nImages Don't Lie: Transferring Deep Visual Semantic Features to  Large-Scale Multimodal Learning to Rank\nRecognizing Activities of Daily Living with a Wrist-mounted Camera\nThe Unreasonable Effectiveness of Noisy Data for Fine-Grained  Recognition\nLearning visual groups from co-occurrences in space and time\nFidelity-Naturalness Evaluation of Single Image Super Resolution\nSemantic Segmentation of Colon Glands with Deep Convolutional Neural  Networks and Total Variation Segmentation\nReal-Time Anomaly Detection and Localization in Crowded Scenes\nEnd-to-end Learning of Action Detection from Frame Glimpses in Videos\nFine-grained pose prediction, normalization, and recognition\nAuxiliary Image Regularization for Deep CNNs with Noisy Labels\nMulti-Scale Context Aggregation by Dilated Convolutions\nFace Alignment Across Large Poses: A 3D Solution\nRendering refraction and reflection of eyeglasses for synthetic eye  tracker images\nLearning Visual Predictive Models of Physics for Playing Billiards\nReal-Time Anomalous Behavior Detection and Localization in Crowded  Scenes\nPicking a Conveyor Clean by an Autonomously Learning Robot\nMouse Pose Estimation From Depth Images\nWeakly Supervised Object Boundaries\nShape and Symmetry Induction for 3D Objects\nPrincipal Basis Analysis in Sparse Representation\nVideo Tracking Using Learned Hierarchical Features\nPASCAL Boundaries: A Class-Agnostic Semantic Boundary Dataset\nPedestrian Detection Inspired by Appearance Constancy and Shape Symmetry\nHigher Order Conditional Random Fields in Deep Neural Networks\nTracking Motion and Proxemics using Thermal-sensor Array\nTowards Automatic Image Editing: Learning to See another You\nAn analysis of the factors affecting keypoint stability in scale-space\nTennisVid2Text: Fine-grained Descriptions for Domain Specific Videos\nSliding-Window Optimization on an Ambiguity-Clearness Graph for  Multi-object Tracking\nThe Multiverse Loss for Robust Transfer 
Learning\nIncidental Scene Text Understanding: Recent Progresses on ICDAR 2015  Robust Reading Competition Challenge 4\nFine-Grained Classification via Mixture of Deep Convolutional Neural  Networks\nSparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video\nTo Fall Or Not To Fall: A Visual Approach to Physical Stability  Prediction\nVariational reaction-diffusion systems for semantic segmentation\nIt's Moving! A Probabilistic Model for Causal Motion Segmentation in  Moving Camera Videos\nLearning a Pose Lexicon for Semantic Action Recognition\nTensor Representations via Kernel Linearization for Action Recognition  from 3D Skeletons (Extended Version)\nHow to Transfer? Zero-Shot Object Recognition via Hierarchical Transfer  of Semantic Attributes\nPerson Re-identification in Appearance Impaired Scenarios\nOverlay Text Extraction From TV News Broadcast\nRobust video object tracking via Bayesian model averaging based feature  fusion\nImage Quality Assessment for Performance Evaluation of Focus Measure  Operators\nGAL: A Global-Attributes Assisted Labeling System for Outdoor Scenes\nHDRFusion: HDR SLAM using a low-cost auto-exposure RGB-D sensor\nDetecting Engagement in Egocentric Video\nExtended Object Tracking: Introduction, Overview and Applications\nComparative Deep Learning of Hybrid Representations for Image  Recommendations\nCohomology of Cryo-Electron Microscopy\nDeep Image Retrieval: Learning global representations for image search\nHighly accurate gaze estimation using a consumer RGB-depth sensor\nForecasting Interactive Dynamics of Pedestrians with Fictitious Play\nLearning A Deep $\\ell_\\infty$ Encoder for Hashing\nCorrelated and Individual Multi-Modal Deep Learning for RGB-D Object  Recognition\nFusing Face and Periocular biometrics using Canonical correlation  analysis\nThe Cityscapes Dataset for Semantic Urban Scene Understanding\nA Subpath Kernel for Learning Hierarchical Image Representations\nExploiting Semantic Information and Deep 
Matching for Optical Flow\nJoint Detection and Identification Feature Learning for Person Search\nReinterpreting the Transformation Posterior in Probabilistic Image  Registration\nA Novel Scene Text Detection Algorithm Based On Convolutional Neural  Network\nGeometric Scene Parsing with Hierarchical LSTM\nEdge Detection Based Shape Identification\n3-D Hand Pose Estimation from Kinect's Point Cloud Using Appearance  Matching\nFamilies in the Wild (FIW): Large-Scale Kinship Image Database and  Benchmarks\nInfrared Colorization Using Deep Convolutional Neural Networks\nBayesian Neighbourhood Component Analysis\nMachine Learning for Visual Navigation of Unmanned Ground Vehicles\nApplication of Multifractal Analysis to Segmentation of Water Bodies in  Optical and Synthetic Aperture Radar Satellite Images\nPerson Re-identification in the Wild\nScene-driven Retrieval in Edited Videos using Aesthetic and Semantic  Deep Features\nDirection matters: hand pose estimation from local surface normals\nDCAN: Deep Contour-Aware Networks for Accurate Gland Segmentation\nSoccer Field Localization from a Single Image\nCapturing Dynamic Textured Surfaces of Moving Targets\nNTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis\nActive Learning for Online Recognition of Human Activities from  Streaming Videos\nJoint Face Detection and Alignment using Multi-task Cascaded  Convolutional Networks\nSemantic 3D Reconstruction with Continuous Regularization and Ray  Potentials Using a Visibility Consistency Constraint\nGaussian Process Domain Experts for Model Adaptation in Facial Behavior  Analysis\nKernel-based Sensor Fusion with Application to Audio-Visual Voice  Activity Detection\nCP-mtML: Coupled Projection multi-task Metric Learning for Large Scale  Face Retrieval\nSemi-supervised learning of local structured output predictors\nApplication of the Second-Order Statistics for Estimation of the Pure  Spectra of Individual Components from the Visible Hyperspectral Images of  
Their Mixture\nVolumetric and Multi-View CNNs for Object Classification on 3D Data\nScan, Attend and Read: End-to-End Handwritten Paragraph Recognition with  MDLSTM Attention\nSweep Distortion Removal from THz Images via Blind Demodulation\nStructured Matrix Recovery via the Generalized Dantzig Selector\nDTM: Deformable Template Matching\nGoing Deeper with Contextual CNN for Hyperspectral Image Classification\nCross-stitch Networks for Multi-task Learning\nQuantifying mesoscale neuroanatomy using X-ray microtomography\nLearning Social Affordance for Human-Robot Interaction\nVConv-DAE: Deep Volumetric Shape Learning Without Object Labels\nSystem Design of Internet-of-Things for Residential Smart Grid\nMulti-Oriented Text Detection with Fully Convolutional Networks\nOn Reducing the Number of Visual Words in the Bag-of-Features  Representation\nUnsupervised Nonlinear Spectral Unmixing based on a Multilinear Mixing  Model\nDARI: Distance metric And Representation Integration for Person  Verification\nLong-term Temporal Convolutions for Action Recognition\nLearning Temporal Regularity in Video Sequences\nBags of Local Convolutional Features for Scalable Instance Search\nRadon Features and Barcodes for Medical Image Retrieval via SVM\nGenerating Binary Tags for Fast Medical Image Retrieval Based on  Convolutional Nets and Radon Transform\nSubcategory-aware Convolutional Neural Networks for Object Proposals and  Detection\nLearning Models for Actions and Person-Object Interactions with Transfer  to Question Answering\nPhase-Aligned Spectral Filtering for Decomposing Spatiotemporal Dynamics\nGenerating Semi-Synthetic Validation Benchmarks for Embryomics\nFully Convolutional Recurrent Network for Handwritten Chinese Text  Recognition\nMost Likely Separation of Intensity and Warping Effects in Image  Registration\nScribbleSup: Scribble-Supervised Convolutional Networks for Semantic  Segmentation\nNon-contact hemodynamic imaging reveals the jugular venous pulse  
waveform\nLearning Dense Correspondence via 3D-guided Cycle Consistency\nTriplet Probabilistic Embedding for Face Verification and Clustering\nParts for the Whole: The DCT Norm for Extreme Visual Recovery\nTuning the work function in transition metal oxides and their  heterostructures\nSherlock: Sparse Hierarchical Embeddings for Visually-aware One-class  Collaborative Filtering\nDeep CNNs for HEp-2 Cells Classification : A Cross-specimen Analysis\nDepth Image Inpainting: Improving Low Rank Matrix Completion with Low  Gradient Regularization\nScene Parsing with Integration of Parametric and Non-parametric Models\nJansen-MIDAS: a multi-level photomicrograph segmentation software based  on isotropic undecimated wavelets\nLabeled Multi-Bernoulli Tracking for Industrial Mobile Platform Safety\nSymmetry-aware Depth Estimation using Deep Neural Networks\nMiniature optical planar camera based on a wide-angle metasurface  doublet corrected for monochromatic aberrations\nNovelty Detection in MultiClass Scenarios with Incomplete Set of Class  Labels\nVisual Congruent Ads for Image Search\nConvolutional Two-Stream Network Fusion for Video Action Recognition\nThe Mean Partition Theorem of Consensus Clustering\nSynthetic Data for Text Localisation in Natural Images\nLearning rotation invariant convolutional filters for texture  classification\nRefining Architectures of Deep Convolutional Neural Networks\nWord2VisualVec: Image and Video to Sentence Matching by Visual Feature  Prediction\nContextual object categorization with energy-based model\nText Flow: A Unified Text Detection System in Natural Scene Images\nBayesian Inference of Recursive Sequences of Group Activities from  Tracks\nCardiac Motion Analysis by Temporal Flow Graphs\nTowards Better Analysis of Deep Convolutional Neural Networks\nSemi-supervised Vocabulary-informed Learning\nMakeup like a superstar: Deep Localized Makeup Transfer Network\nActionness Estimation Using Hybrid Fully Convolutional 
Networks\nSemi-supervised Dictionary Learning Based on Hilbert-Schmidt  Independence Criterion\nSupervised Incremental Hashing\nAttributes for Improved Attributes: A Multi-Task Network for Attribute  Classification\nContext Encoders: Feature Learning by Inpainting\nBalancing Appearance and Context in Sketch Interpretation\nJoint Semantic Segmentation and Depth Estimation with Deep Convolutional  Networks\nOnce for All: a Two-flow Convolutional Neural Network for Visual  Tracking\nSemantic Change Detection with Hypermaps\nLearning Deep Feature Representations with Domain Guided Dropout for  Person Re-identification\nSpot On: Action Localization from Pointly-Supervised Proposals\nAn Accelerometer Based Calculator for Visually Impaired People Using  Mobile Devices\nReal-time Action Recognition with Enhanced Motion Vector CNNs\nEgoSampling: Wide View Hyperlapse from Egocentric Videos\nAn Enhanced Deep Feature Representation for Person Re-identification\nZero-shot object prediction using semantic scene knowledge\nSimultaneous Food Localization and Recognition\nDetecting Violence in Video using Subclasses\nLaser light-field fusion for wide-field lensfree on-chip phase contrast  nanoscopy\nUnsupervised Classification in Hyperspectral Imagery with Nonlocal Total  Variation and Primal-Dual Hybrid Gradient Algorithm\nA Probabilistic Adaptive Search System for Exploring the Face Space\nImproved Dense Trajectory with Cross Streams\nConvolutional Neural Networks for Attribute-based Active Authentication  on Mobile Devices\nFace Recognition Using Scattering Convolutional Network\nVisual Relationship Detection with Language Priors\nA Data-driven Approach for Human Pose Tracking Based on Spatio-temporal  Pictorial Structure\nLearning deep representation from coarse to fine for face alignment\nSimilarity Registration Problems for 2D/3D Ultrasound Calibration\nDenoising and compression in wavelet domain via projection onto  approximation coefficients\nNew wavelet-based 
superresolution algorithm for speckle reduction in SAR  images\nDenoising based on wavelets and deblurring via self-organizing map for  Synthetic Aperture Radar images\nFuzzy thresholding in wavelet domain for speckle reduction in Synthetic  Aperture Radar images\nVideo Summarization in a Multi-View Camera Network\nTop-down Neural Attention by Excitation Backprop\nDimensionality reduction based on Distance Preservation to Local Mean  (DPLM) for SPD matrices and its application in BCI\nModeling Context Between Objects for Referring Expression Understanding\nInteractive Image Segmentation Using Constrained Dominant Sets\nA Survey of Visual Analysis of Human Motion and Its Applications\nSemantically Guided Depth Upsampling\nInteractive Removal and Ground Truth for Difficult Shadow Scenes\nCUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016\nA study of the effect of JPG compression on adversarial images\nTemporal Segment Networks: Towards Good Practices for Deep Action  Recognition\nTowards Learning to Perceive and Reason About Liquids\nPicHunt: Social Media Image Retrieval for Improved Law Enforcement\nAutomated X-ray Image Analysis for Cargo Security: Critical Review and  Future Promise\nIncremental Real-Time Multibody VSLAM with Trajectory Optimization Using  Stereo Camera\nTraining Deep Networks for Facial Expression Recognition with  Crowd-Sourced Label Distribution\nLearning Common and Specific Features for RGB-D Semantic Segmentation  with Deconvolutional Networks\nUnitBox: An Advanced Object Detection Network\nRecoding Color Transfer as a Color Homography\nSaliency Integration: An Arbitrator Model\nCompartmental analysis of dynamic nuclear medicine data: regularization  procedure and application to physiology\nBlind Deconvolution of PET Images using Anatomical Priors\nEnhanced Directional Smoothing Algorithm for Edge-Preserving Smoothing  of Synthetic-Aperture Radar Images\nCompressive Change Retrieval for Moving Object Detection\nMulti-Model 
Hypothesize-and-Verify Approach for Incremental Loop Closure  Verification\nSigns in time: Encoding human motion as a temporal image\nShapeFit and ShapeKick for Robust, Scalable Structure from Motion\nMultiview Cauchy Estimator Feature Embedding for Depth and Inertial  Sensor-Based Human Action Recognition\nResidual CNDS\nDiscriminatively Trained Latent Ordinal Model for Video Classification\nA combined Approach Based on Fuzzy Classification and Contextual Region  Growing to Image Segmentation\nDatabase of handwritten Arabic mathematical formulas images\nSteerable Principal Components for Space-Frequency Localized Images\nConvolutional Oriented Boundaries\nResidual Networks of Residual Networks: Multilevel Residual Networks\nObject Detection, Tracking, and Motion Segmentation for Object-level  Video Segmentation\n3D Human Pose Estimation Using Convolutional Neural Networks with 2D  Pose Information\nDeepCAMP: Deep Convolutional Action & Attribute Mid-Level Patterns\nFractional Calculus In Image Processing: A Review\nApproximate search with quantized sparse representations\nEnabling My Robot To Play Pictionary : Recurrent Neural Networks For  Sketch Recognition\nRecurrent Neural Networks to Correct Satellite Image Classification Maps\nFaster Training of Very Deep Networks Via p-Norm Gates\nReasoning and Algorithm Selection Augmented Symbolic Segmentation\nSelf-paced Learning for Weakly Supervised Evidence Discovery in  Multimedia Event Search\nWhen was that made?\nBranching Gaussian Processes with Applications to Spatiotemporal  Reconstruction of 3D Trees\nThe Importance of Skip Connections in Biomedical Image Segmentation\nEvery Filter Extracts A Specific Texture In Convolutional Neural  Networks\nOcclusion-Model Guided Anti-Occlusion Depth Estimation in Light Field\nFace Alignment In-the-Wild: A Survey\nGenerative and Discriminative Voxel Modeling with Convolutional Neural  Networks\nDetecting Dominant Vanishing Points in Natural Scenes with Application  to 
Composition-Sensitive Image Retrieval\nVisual place recognition using landmark distribution descriptors\nTransitive Hashing Network for Heterogeneous Multimedia Retrieval\nWeakly Supervised Object Localization Using Size Estimates\nStar-galaxy Classification Using Deep Convolutional Neural Networks\nUnconstrained Two-parallel-plane Model for Focused Plenoptic Cameras  Calibration\nVariational Gaussian Process Auto-Encoder for Ordinal Prediction of  Facial Action Units\nGeometry-aware Similarity Learning on SPD Manifolds for Visual  Recognition\nFrame- and Segment-Level Features and Candidate Pool Evaluation for  Video Caption Generation\nAn image compression and encryption scheme based on deep learning\nScene Labeling Through Knowledge-Based Rules Employing Constrained  Integer Linear Programing\nIM2CAD\nFull Resolution Image Compression with Recurrent Neural Networks\nMulti-stage Object Detection with Group Recursive Learning\nDeeply-Supervised Recurrent Convolutional Neural Network for Saliency  Detection\nSaliency Detection via Combining Region-Level and Pixel-Level  Predictions with CNNs\nEfficient Multi-Frequency Phase Unwrapping using Kernel Density  Estimation\nHow Image Degradations Affect Deep CNN-based Face Recognition?\nRigid Slice-To-Volume Medical Image Registration through Markov Random  Fields\nBack to Basics: Unsupervised Learning of Optical Flow via Brightness  Constancy and Motion Smoothness\nFeedback-Controlled Sequential Lasso Screening\nCrowdNet: A Deep Convolutional Network for Dense Crowd Counting\nLarge-scale Continuous Gesture Recognition Using Convolutional Neural  Networks\nDeep Double Sparsity Encoder: Learning to Sparsify Not Only Features But  Also Parameters\nConvolutional Network for Attribute-driven and Identity-preserving Human  Face Generation\nSearching Action Proposals via Spatial Actionness Estimation and  Temporal Path Inference and Tracking\nNeural Networks with Smooth Adaptive Activation Functions for Regression\nAutomatic 
Synchronization of Multi-User Photo Galleries\nAmbient Sound Provides Supervision for Visual Learning\nModeling and Propagating CNNs in a Tree Structure for Visual Tracking\nScalable Compression of Deep Neural Networks\nSpatio-temporal Aware Non-negative Component Representation for Action  Recognition\nMulti-Path Feedback Recurrent Neural Network for Scene Parsing\nTotal variation reconstruction for compressive sensing using nonlocal  Lagrangian multiplier\nUsing k-nearest neighbors to construct cancelable minutiae templates\nCorrespondence Insertion for As-Projective-As-Possible Image Stitching\nEdge Preserving and Multi-Scale Contextual Neural Network for Salient  Object Detection\nWhere is my Phone ? Personal Object Retrieval from Egocentric Images\nORBSLAM-based Endoscope Tracking and 3D Reconstruction\nTracking Completion\nReal-Time Visual Tracking: Promoting the Robustness of Correlation  Filter Learning\nConstruction of Convex Sets on Quadrilateral Ordered Tiles or Graphs  with Propagation Neighborhood Operations. Dales, Concavity Structures.  
Application to Gray Image Analysis of Human-Readable Shapes\nLow-rank Multi-view Clustering in Third-Order Tensor Space\nMotion Representation with Acceleration Images\nCliqueCNN: Deep Unsupervised Exemplar Learning\nEfficient Two-Stream Motion and Appearance 3D CNNs for Video  Classification\nMeasuring the Quality of Exercises\nStreaming Multimedia Information Using the Features of the DVB-S Card\nA Modified Cross Correlation Algorithm for Reference-free Image  Alignment of Non-Circular Projections in Single-Particle Electron Microscopy\nInternet of Things: Applications and Challenges in Technology and  Standardization\n$\\ell_0$ Minimization for Wavelet Frame Based Image Restoration\nSalient Local 3D Features for 3D Shape Retrieval\nFace Recognition using 3D Facial Shape and Color Map Information:  Comparison and Combination\nOptimal Camera Placement to measure Distances Conservativly Regarding  Static and Dynamic Obstacles\nHierarchical Recursive Running Median\nA Multiple-Choice Test Recognition System based on the Gamera Framework\nHuman Identity Verification based on Heart Sounds: Recent Advances and  Future Directions\nScale-Invariant Local Descriptor for Event Recognition in 1D Sensor  Signals\nDensity Estimation and Classification via Bayesian Nonparametric  Learning of Affine Subspaces\nIdentifying relationships between drugs and medical conditions: winning  experience in the Challenge 2 of the OMOP 2010 Cup\nDictionary Learning for Deblurring and Digital Zoom\nNash Equilibria in Quantum Games\nFoliage Plant Retrieval using Polar Fourier Transform, Color Moments and  Vein Features\nThe proximal point method for a hybrid model in image restoration\nClosed-Loop Learning of Visual Control Policies\nGround Metric Learning\nRobust Image Analysis by L1-Norm Semi-supervised Learning\nAnti-sparse coding for approximate nearest neighbor search\nLocal Naive Bayes Nearest Neighbor for Image Classification\nOptimality Bounds for a Variational Relaxation of the Image 
Partitioning  Problem\nMeaningful Matches in Stereovision\nImprovement of BM3D Algorithm and Employment to Satellite and CFA Images  Denoising\nLarge Scale Correlation Clustering Optimization\nSupervised Generative Reconstruction: An Efficient Way To Flexibly Store  and Recognize Patterns\nData Processing For Atomic Resolution EELS\nHigher-Order Momentum Distributions and Locally Affine LDDMM  Registration\nAutomatic post-picking improves particle image detection from Cryo-EM  micrographs\nA Reduced Reference Image Quality Measure Using Bessel K Forms Model for  Tetrolet Coefficients\nOracle inequalities and minimax rates for non-local means and related  adaptive kernel-based methods\nZero-Temperature Limit of a Convergent Algorithm to Minimize the Bethe  Free Energy\nMultispectral Palmprint Recognition Using a Hybrid Feature\nTranslation-Invariant Shrinkage/Thresholding of Group Sparse Signals\nJassologie : Une vision originale sur les cartes orientables\nImproved Performance of Unsupervised Method by Renovated K-Means\nLie Algebrized Gaussians for Image Representation\nRestoration of Images Corrupted by Impulse Noise and Mixed Gaussian  Impulse Noise using Blind Inpainting\nDynamic Amelioration of Resolution Mismatches for Local Feature Based  Identity Inference\nKernel Reconstruction ICA for Sparse Representation\nImage Classification by Feature Dimension Reduction and Graph based  Ranking\nDetecting Directionality in Random Fields Using the Monogenic Signal\nPlanning, Scheduling, and Uncertainty in the Sequence of Future Events\nMerging Satellite Measurements of Rainfall Using Multi-scale Imagery  Technique\nSingle View Depth Estimation from Examples\nA new Bayesian ensemble of trees classifier for identifying multi-class  labels in satellite images\nPolygon Matching and Indexing Under Affine Transformations\nSeparating the Real from the Synthetic: Minutiae Histograms as  Fingerprints of Fingerprints\nCounting people from above: Airborne video based crowd 
analysis\nLearning Visual Symbols for Parsing Human Poses in Images\nFilament and Flare Detection in Hα image sequences\nPulmonary Vascular Tree Segmentation from Contrast-Enhanced CT Images\nDeterministic Initialization of the K-Means Algorithm Using Hierarchical  Clustering\nRegistration of Images with Outliers Using Joint Saliency Map\nDomain-invariant Face Recognition using Learned Low-rank Transformation\nSparse Dictionary-based Attributes for Action Recognition and  Summarization\nSign Stable Projections, Sign Cauchy Projections and Chi-Square Kernels\nImage interpolation using Shearlet based iterative refinement\nSpatial-Aware Dictionary Learning for Hyperspectral Image Classification\nA Multi-Swarm Cellular PSO based on Clonal Selection Algorithm in  Dynamic Environments\nSatellite image classification methods and Landsat 5TM bands\nGradient Magnitude Similarity Deviation: A Highly Efficient Perceptual  Image Quality Index\nTowards Adapting ImageNet to Reality: Scalable Domain Adaptation with  Implicit Low-rank Transformations\nMatching Demand with Supply in the Smart Grid using Agent-Based  Multiunit Auction\nA Unified Framework for Multi-Sensor HDR Video Reconstruction\nGroup-Sparse Signal Denoising: Non-Convex Regularization, Convex  Optimization\nSuspicious Object Recognition Method in Video Stream Based on Visual  Attention\nText recognition in both ancient and cartographic documents\nGNCGCP - Graduated NonConvexity and Graduated Concavity Procedure\nImage Set based Collaborative Representation for Face Recognition\nCollaborative Receptive Field Learning\nA Robust Framework for Moving-Object Detection and Vehicular Traffic  Density Estimation\nScene Labeling with Contextual Hierarchical Models\nSignal to Noise Ratio in Lensless Compressive Imaging\nPatchwise Joint Sparse Tracking with Occlusion Detection\nQuantile Representation for Indirect Immunofluorescence Image  Classification\nA Hybrid Loss for Multiclass and Structured Prediction\nLeveraging 
Long-Term Predictions and Online-Learning in Agent-based  Multiple Person Tracking\nSignal Reconstruction Framework Based On Projections Onto Epigraph Set  Of A Convex Cost Function (PESC)\nModeling sequential data using higher-order relational features and  predictive training\nSparsity averaging for radio-interferometric imaging\nAnimation of 3D Human Model Using Markerless Motion Capture Applied To  Sports\nReal-Time Hand Shape Classification\nNoise Analysis for Lensless Compressive Imaging\nHand-Eye and Robot-World Calibration by Global Polynomial Optimization\nIntrinsically Motivated Learning of Visual Motion Perception and Smooth  Pursuit\nImproving Streaming Video Segmentation with Early and Mid-Level Visual  Processing\nA Narrative Vehicle Protection Representation for Vehicle Speed  Regulator Under Driver Exhaustion -- A Study\nScalable Kernel Clustering: Approximate Kernel k-means\nSparse Coding Approach for Multi-Frame Image Super Resolution\nThe Algebraic Approach to Phase Retrieval and Explicit Inversion at the  Identifiability Threshold\nStatistical Noise Analysis in SENSE Parallel MRI\nInformation Theory of Matrix Completion\nVesselness via Multiple Scale Orientation Scores\nReal-time Automatic Emotion Recognition from Body Gestures\nBinary Fused Compressive Sensing: 1-Bit Compressive Sensing meets Group  Sparsity\nRobust Binary Fused Compressive Sensing using Adaptive Outlier Pursuit\nA Novel Histogram Based Robust Image Registration Technique\nLocalization of License Plate Using Morphological Operations\nExemplar-based Linear Discriminant Analysis for Robust Object Tracking\nA Novel Scheme for Intelligent Recognition of Pornographic Images\nA Novel Face Recognition Method using Nearest Line Projection\nA Testbed for Cross-Dataset Analysis\nA Multiplierless Pruned DCT-like Transformation for Image and Video  Compression that Requires 10 Additions Only\nA Novel Method for the Recognition of Isolated Handwritten Arabic  Characters\nLow-Cost 
Compressive Sensing for Color Video and Depth\nHierarchical community structure in complex (social) networks\nConvex Total Least Squares\nOn Classification with Bags, Groups and Sets\nVisual Reranking with Improved Image Graph\nMultiscale Fields of Patterns\nShared Representation Learning for Heterogeneous Face Recognition\nIllusory Shapes via Phase Transition\nA Context-aware Delayed Agglomeration Framework for Electron Microscopy  Segmentation\nTowards building a Crowd-Sourced Sky Map\nSmall Sample Learning of Superpixel Classifiers for EM Segmentation-  Extended Version\nFine-grained Activity Recognition with Holistic and Pose based Features\nRefinement-Cut: User-Guided Segmentation Algorithm for Translational  Science\nTwo-Stream Convolutional Networks for Action Recognition in Videos\nSynthetic Data and Artificial Neural Networks for Natural Scene Text  Recognition\nRobust Estimation of 3D Human Poses from a Single Image\nDepth Map Prediction from a Single Image using a Multi-Scale Deep  Network\nParsing Semantic Parts of Cars Using Graphical Models and Segment  Appearance Consistency\nWhy do linear SVMs trained on HOG features perform so well?\nThe Secrets of Salient Object Segmentation\nAcoustic Gait-based Person Identification using Hidden Markov Models\nTruncated Nuclear Norm Minimization for Image Restoration Based On  Iterative Support Detection\n\"Mental Rotation\" by Optimizing Transforming Distance\nHuman-Machine CRFs for Identifying Bottlenecks in Holistic Scene  Understanding\nA Fusion of Labeled-Grid Shape Descriptors with Weighted Ranking  Algorithm for Shapes Recognition\nImpact of Exponent Parameter Value for the Partition Matrix on the  Performance of Fuzzy C Means Algorithm\nPRISM: Person Re-Identification via Structured Matching\nMulti-stage Multi-task feature learning via adaptive threshold\nDeep Learning Face Representation by Joint Identification-Verification\nInner Product Similarity Search using Compositional Codes\nWeb-Scale Training 
for Face Identification\nEarly Recognition of Human Activities from First-Person Videos Using  Onset Representations\nMulti-utility Learning: Structured-output Learning with Multiple  Annotation-specific Loss Functions\nImage Completion for View Synthesis Using Markov Random Fields and  Efficient Belief Propagation\nOn the Convergence Rate of Decomposable Submodular Function Minimization\n$ N^4 $-Fields: Neural Network Nearest Neighbor Fields for Image  Transforms\nSupport vector machine classification of dimensionally reduced  structural MRI images for dementia\n3DUNDERWORLD-SLS: An Open-Source Structured-Light Scanning System for  Rapid Geometry Acquisition\nFace Image Classification by Pooling Raw Features\nDeep Learning Multi-View Representation for Face Recognition\n3D planar patch extraction from stereo using probabilistic region  growing\nOn a new formulation of nonlocal image filters involving the relative  rearrangement\nFusion Based Holistic Road Scene Understanding\nKernel Coding: General Formulation and Special Cases\nTransferring Landmark Annotations for Cross-Dataset Face Alignment\nVisual Passwords Using Automatic Lip Reading\nConstructing a Non-Negative Low Rank and Sparse Graph with Data-Adaptive  Features\nFocused Proofreading: Efficiently Extracting Connectomes from Segmented  EM Images\nAnnotating Synapses in Large EM Datasets\nAutomatic Neuron Type Identification by Neurite Localization in the  Drosophila Medulla\nDepth image hand tracking from an overhead perspective using partially  labeled, unbalanced data: Development and real-world testing\nA theoretical contribution to the fast implementation of null linear  discriminant analysis method using random matrix multiplication with scatter  matrices\nAmbiguity-Driven Fuzzy C-Means Clustering: How to Detect Uncertain  Clustered Records\nUnsupervised learning of clutter-resistant visual representations from  natural videos\nConcurrent Tracking of Inliers and Outliers\nCavlectometry: Towards 
Holistic Reconstruction of Large Mirror Objects\nA Combined Method Of Fractal And GLCM Features For MRI And CT Scan  Images Classification\nFingerprint Classification Based on Depth Neural Network\nSubspace Alignment For Domain Adaptation\nDeformable Part Models are Convolutional Neural Networks\nHyperspectral and Multispectral Image Fusion based on a Sparse  Representation\nActive Dictionary Learning in Sparse Representation Based Classification\nFast Low-rank Representation based Spatial Pyramid Matching for Image  Classification\nAnalyzing sparse dictionaries for online learning with kernels\n1-HKUST: Object Detection in ILSVRC 2014\nA non-linear learning & classification algorithm that achieves full  training accuracy with stellar classification accuracy\nUnified Heat Kernel Regression for Diffusion, Kernel Smoothing and  Wavelets on Manifolds and Its Application to Mandible Growth Modeling in CT  Images\nRecent Progress in Image Deblurring\nDo More Dropouts in Pool5 Feature Maps for Better Object Detection\nDeep Learning Representation using Autoencoder for 3D Shape Retrieval\nTwo-stage Geometric Information Guided Image Reconstruction\nHow close are we to understanding image-based saliency?\nAudio Surveillance: a Systematic Review\nUnderstanding Deep Image Representations by Inverting Them\nA Bayesian Framework for Sparse Representation-Based 3D Human Pose  Estimation\nColor image quality assessment measure using multivariate generalized  Gaussian distribution\nRobust Camera Location Estimation by Convex Programming\nSimple pairs of points in digital spaces. 
Topology-preserving  transformations of digital spaces by contracting simple pairs of points\nA Clearer Picture of Blind Deconvolution\nUntangling Local and Global Deformations in Deep Convolutional Networks  for Image Classification and Sliding Window Detection\nFuzzy human motion analysis: A review\nRecovering Spatiotemporal Correspondence between Deformable Objects by  Exploiting Consistent Foreground Motion in Video\nOrthogonal Matrix Retrieval in Cryo-Electron Microscopy\nDeeply learned face representations are sparse, selective, and robust\nMemory Bounded Deep Convolutional Networks\nConvolutional Neural Networks at Constrained Time Cost\nReading Text in the Wild with Convolutional Neural Networks\nCoMIC: Good features for detection and matching at object boundaries\nLearning Multi-target Tracking with Quadratic Object Interactions\nDeep Visual-Semantic Alignments for Generating Image Descriptions\nVisual Causal Feature Learning\nLinear optical demonstration of quantum speed-up with a single qudit\nHyperSpectral classification with adaptively weighted L1-norm  regularization and spatial postprocessing\nImage quality assessment measure based on natural image statistics in  the Tetrolet domain\nSubspace based low rank and joint sparse matrix recovery\nScore Function Features for Discriminative Learning: Matrix and Tensor  Framework\nReal-Time Grasp Detection Using Convolutional Neural Networks\nDeep Domain Confusion: Maximizing for Domain Invariance\nRoad Detection by One-Class Color Classification: Dataset and  Experiments\nEgoSampling: Fast-Forward and Stereo for Egocentric Videos\nA Novel Adaptive Possibilistic Clustering Algorithm\nCompact Compositional Models\nMachine Learning for Neuroimaging with Scikit-Learn\nAn Automatic Seeded Region Growing for 2D Biomedical Image Segmentation\nHigh-level numerical simulations of noise in CCD and CMOS photosensors:  review and tutorial\nAn Experimental Evaluation of Machine-to-Machine Coordination  Middleware: 
Extended Version\nKernel Methods on the Riemannian Manifold of Symmetric Positive Definite  Matrices\nA Framework for Shape Analysis via Hilbert Space Embedding\nA Study of Sindhi Related and Arabic Script Adapted languages  Recognition\nCombining the Best of Graphical Models and ConvNets for Semantic  Segmentation\nInexact Alternating Direction Method Based on Newton descent algorithm  with Application to Poisson Image Deblurring\nA Robust Regression Approach for Background/Foreground Segmentation\nAutomatic Training Data Synthesis for Handwriting Recognition Using the  Structural Crossing-Over Technique\nUnsupervised Learning of Spatiotemporally Coherent Metrics\nFractional Max-Pooling\nSemantic Part Segmentation using Compositional Model combining Shape and  Appearance\nData Representation using the Weyl Transform\nAutomated Objective Surgical Skill Assessment in the Operating Room  Using Unstructured Tool Motion\nPooled Motion Features for First-Person Videos\nCauchy Principal Component Analysis\nScore Function Features for Discriminative Learning\nFracking Deep Convolutional Image Descriptors\nAutomatic Discovery and Optimization of Parts for Image Classification\nThe local low-dimensionality of natural images\nVisualizing and Comparing Convolutional Neural Networks\nMixture of Parts Revisited: Expressive Part Interactions for Pose  Estimation\nFusing Color and Texture Cues to Categorize the Fruit Diseases from  Images\nSymmetry in Image Registration and Deformation Modeling\nAn Effective Semi-supervised Divisive Clustering Algorithm\nA Fuzzy Based Model to Identify Printed Sinhala Characters (ICIAfS14)\nJoint Deep Learning for Car Detection\nMetacarpal Bones Localization in X-ray Imagery Using Particle Filter  Segmentation\nRigid and Non-rigid Shape Evolutions for Shape Alignment and Recovery in  Images\nSHOE: Supervised Hashing with Output Embeddings\nCategory-Epitomes : Discriminatively Minimalist Representations for  Object Categories\nOptimized Projection 
for Sparse Representation Based Classification\nModified Fast Fractal Image Compression Algorithm in spatial domain\nComplex Background Subtraction by Pursuing Dynamic Spatio-Temporal  Models\nIterated Support Vector Machines for Distance Metric Learning\nTowards a solid solution of real-time fire and flame detection\nLearning the Matching Function\nRecovery of Piecewise Smooth Images from Few Fourier Samples\nDynamical And-Or Graph Learning for Object Shape Modeling and Detection\nFace frontalization for Alignment and Recognition\nDeepID3: Face Recognition with Very Deep Neural Networks\nClassification of Hyperspectral Imagery on Embedded Grassmannians\nORB-SLAM: a Versatile and Accurate Monocular SLAM System\nA Multiple-Expert Binarization Framework for Multispectral Images\nLinear-time Online Action Detection From 3D Skeletal Data Using Bags of  Gesturelets\nCollaborative Feature Learning from Social Media\nFast Constraint Propagation for Image Segmentation\nSemantic Embedding Space for Zero-Shot Action Recognition\nMulti-Action Recognition via Stochastic Modelling of Optical Flow and  Gradients\nGeneralized Inpainting Method for Hyperspectral Image Acquisition\nA Fingerprint-based Access Control using Principal Component Analysis  and Edge Detection\nVisual Recognition by Counting Instances: A Multi-Instance Cardinality  Potential Kernel\nComparison of Algorithms for Compressed Sensing of Magnetic Resonance  Images\nDeep Neural Networks for Anatomical Brain Segmentation\nWeakly- and Semi-Supervised Learning of a DCNN for Semantic Image  Segmentation\nShow, Attend and Tell: Neural Image Caption Generation with Visual  Attention\nKernel Task-Driven Dictionary Learning for Hyperspectral Image  Classification\nAn equalised global graphical model-based approach for multi-camera  object tracking\nConvergence of gradient based pre-training in Denoising autoencoders\nTowards zero-configuration condition monitoring based on dictionary  learning\nDiscovering Human 
Interactions in Videos with Limited Data Labeling\nSemi-supervised Data Representation via Affinity Graph Learning\nSkeleton Matching based approach for Text Localization in Scene Images\nGray-Level Image Transitions Driven by Tsallis Entropic Index\nCardiac MR Image Segmentation Techniques: an overview\nSpatial Stimuli Gradient Sketch Model\nsegDeepM: Exploiting Segmentation and Context in Deep Neural Networks  for Object Detection\nBi-Level Image Thresholding obtained by means of Kaniadakis Entropy\nInferring 3D Object Pose in RGB-D Images\n3D Pose from Detections\nContext Tricks for Cheap Semantic Segmentation\nWhat makes for effective detection proposals?\nPrediction of Search Targets From Fixations in Open-World Settings\nSA-CNN: Dynamic Scene Classification using Convolutional Neural Networks\nVisualizing Object Detection Features\nVIP: Finding Important People in Images\nPairwise Constraint Propagation: A Survey\nVisual object tracking performance measures revisited\nA new network-based algorithm for human activity recognition in video\nA Heat-Map-based Algorithm for Recognizing Group Activities in Videos\nStudy of a Robust Algorithm Applied in the Optimal Position Tuning for  the Camera Lens in Automated Visual Inspection Systems\nDon't Just Listen, Use Your Imagination: Leveraging Visual Common Sense  for Non-Visual Tasks\nVideo Text Localization with an emphasis on Edge Features\nBoosting of Image Denoising Algorithms\nSpatio-temporal Video Parsing for Abnormality Detection\nCompressive Hyperspectral Imaging with Side Information\nConvolutional Patch Networks with Spatial Prior for Road Detection and  Urban Scene Understanding\nDiscrete Wavelet Transform and Gradient Difference based approach for  text localization in videos\nOnline Tracking by Learning Discriminative Saliency Map with  Convolutional Neural Network\nReal-Time System of Hand Detection And Gesture Recognition In Cyber  Presence Interactive System For E-Learning\nCoercive Region-level 
Registration for Multi-modal Images\nConcept for a CMOS Image Sensor Suited for Analog Image Pre-Processing\nA hypothesize-and-verify framework for Text Recognition using Deep  Recurrent Neural Networks\nDynamic Belief Fusion for Object Detection\nLandmark-Guided Elastic Shape Analysis of Human Character Motions\nModelling Local Deep Convolutional Neural Network Features to Improve  Fine-Grained Image Classification\nSecond Order Minimum Energy Filtering on $\\operatorname{SE}_3$ with  Nonlinear Measurement Equations\nActivity Recognition Using A Combination of Category Components And  Local Models for Video Surveillance\nGroup Event Detection with a Varying Number of Group Members for Video  Surveillance\nImproved Image Deblurring based on Salient-region Segmentation\nGraphical Representation for Heterogeneous Face Recognition\nLearning a Convolutional Neural Network for Non-uniform Motion Blur  Removal\nA review of mean-shift algorithms for clustering\nJoint calibration of Ensemble of Exemplar SVMs\nContext Forest for efficient object detection with large mixture models\nMultiscale Combinatorial Grouping for Image Segmentation and Object  Proposal Generation\nAnisotropic Diffusion in ITK\nUsing Descriptive Video Services to Create a Large Data Source for Video  Annotation Research\nLearning Super-Resolution Jointly from External and Internal Examples\nDo We Need More Training Data?\nJointly Learning Multiple Measures of Similarities from Triplet  Comparisons\nSpectral Clustering by Ellipsoid and Its Connection to Separable  Nonnegative Matrix Factorization\nDeep Temporal Appearance-Geometry Network for Facial Expression  Recognition\nPyrcca: regularized kernel canonical correlation analysis in Python and  its applications to neuroimaging\nLearning to rank in person re-identification with metric ensembles\nInference of hidden structures in complex physical systems by  multi-scale clustering\nBoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks  
for Semantic Segmentation\nColor Image Classification via Quaternion Principal Component Analysis  Network\nLatent Hierarchical Model for Activity Recognition\nDeep Clustered Convolutional Kernels\nLinear Global Translation Estimation with Feature Tracks\nPartial light field tomographic reconstruction from a fixed-camera focal  stack\nOn the Invariance of Dictionary Learning and Sparse Representation to  Projecting Data to a Discriminative Space\nAn Improved Image Mosaicing Algorithm for Damaged Documents\nFitting 3D Morphable Models using Local Features\nFast and Robust Fixed-Rank Matrix Recovery\nRemarks on pointed digital homotopy\nLearning Classifiers from Synthetic Data Using a Multichannel  Autoencoder\nSimple, Accurate, and Robust Nonparametric Blind Super-Resolution\nA model-based approach to recovering the structure of a plant from  images\nA Novel Hybrid CNN-AIS Visual Pattern Recognition Engine\nStochastic Texture Difference for Scale-Dependent Data Analysis\nBrowserbite: Cross-Browser Testing via Image Processing\nAndroid based Portable Hand Sign Recognition System\n2D Face Recognition System Based on Selected Gabor Filters and Linear  Discriminant Analysis LDA\nDiagnosing Heterogeneous Dynamics for CT Scan Images of Human Brain in  Wavelet and MFDFA domain\nExploiting Image-trained CNN Architectures for Unconstrained Video  Classification\nTemplate-based Monocular 3D Shape Recovery using Laplacian Meshes\nPhase and TV Based Convex Sets for Blind Deconvolution of Microscopic  Images\nEdge Detection: A Collection of Pixel based Approach for Colored Images\nAn approach to improving edge detection for facial and remotely sensed  images using vector order statistics\nAutomatic Pollen Grain and Exine Segmentation from Microscope Images\nLearning Hypergraph-regularized Attribute Predictors\nA General Framework for Multi-focal Image Classification and  Authentication: Application to Microscope Pollen Images\nSkin Detection of Animation Characters\nWavelet 
based approach for tissue fractal parameter measurement: Pre  cancer detection\nA novel pLSA based Traffic Signs Classification System\nVehicle Local Position Estimation System\nFactorization of View-Object Manifolds for Joint Object Recognition and  Pose Estimation\nCompressed sensing MRI using masked DCT and DFT measurements\nRobust Eye Centers Localization with Zero--Crossing Encoded Image  Projections\nPain Intensity Estimation by a Self--Taught Selection of Histograms of  Topographical Features\nContent-Based Bird Retrieval using Shape context, Color moments and Bag  of Features\nTransductive Multi-class and Multi-label Zero-shot Learning\nCRF Learning with CNN Features for Image Segmentation\nLabel-Embedding for Image Classification\nBeyond Short Snippets: Deep Networks for Video Classification\nReal-World Font Recognition Using Deep Network and Domain Adaptation\nWeakly Supervised Learning of Objects, Attributes and their Associations\nThe Approximation of the Dissimilarity Projection\nConvex Denoising using Non-Convex Tight Frame Regularization\nGraph Connectivity in Noisy Sparse Subspace Clustering\nLocally Non-rigid Registration for Mobile HDR Photography\nDesign and Implementation of a 3D Undersea Camera System\nHeterogeneous Tensor Decomposition for Clustering via Manifold  Optimization\nPerformance measures for classification systems with rejection\nA Coarse-to-Fine Model for 3D Pose Estimation and Sub-category  Recognition\nAppearance-Based Gaze Estimation in the Wild\nA Novel Approach to Develop a New Hybrid Technique for Trademark Image  Retrieval\nImage Denoising Using Low Rank Minimization With Modified Noise  Estimation\nBackground Subtraction via Generalized Fused Lasso Foreground Modeling\nA spectral optical flow method for determining velocities from digital  imagery\nUnderstanding the Fisher Vector: a multimodal part model\nLearning discriminative trajectorylet detector sets for accurate  skeleton-based action recognition\nF-SVM: Combination 
of Feature Transformation and SVM Learning via Convex  Relaxation\nExploiting Local Features from Deep Networks for Image Retrieval\nViewpoint distortion compensation in practical surveillance systems\nAutomatic Face Recognition from Video\nKey-Pose Prediction in Cyclic Human Motion\nAdaptive Compressive Tracking via Online Vector Boosting Feature  Selection\nSelf-Tuned Deep Super Resolution\nCombining local regularity estimation and total variation optimization  for scale-free texture segmentation\nLOAD: Local Orientation Adaptive Descriptor for Texture and Material  Classification\nEdge Detection Based on Global and Local Parameters of the Image\nUnderstanding and Diagnosing Visual Tracking Systems\nObject Detection Networks on Convolutional Feature Maps\nOnline Adaptive Hidden Markov Model for Multi-Tracker Fusion\nSparse Radial Sampling LBP for Writer Identification\nPerson Re-identification with Correspondence Structure Learning\nDepth-based hand pose estimation: methods, data, and challenges\nSituational Object Boundary Detection\nSemantic Motion Segmentation Using Dense CRF Formulation\nWxBS: Wide Baseline Stereo Generalizations\nDifferential Recurrent Neural Networks for Action Recognition\nMax-margin Deep Generative Models\nDetection and Recognition of Malaysian Special License Plate Based On  SIFT Features\nSegSALSA-STR: A convex formulation to supervised hyperspectral image  segmentation using hidden fields and structure tensor regularization\nShape Representation and Classification through Pattern Spectrum and  Local Binary Pattern - A Decision Level Fusion Approach\nCompact CNN for Indexing Egocentric Videos\nA Robust Lane Detection and Departure Warning System\nAccelerating the Development of Software-Defined Network Optimization  Applications Using SOL\nVisual Information Retrieval in Endoscopic Video Archives\nComparative study of image registration techniques for bladder  video-endoscopy\nRobust hyperspectral image classification with rejection 
fields\nSemi-Orthogonal Multilinear PCA with Relaxed Start\nEfficient Image-Space Extraction and Representation of 3D Surface  Topography\nBag-of-Genres for Video Genre Retrieval\nHiding Information in Noise: Fundamental Limits of Covert Wireless  Communication\nRBIR using Interest Regions and Binary Signatures\nWhat Makes Kevin Spacey Look Like Kevin Spacey\nStochastic And-Or Grammars: A Unified Framework and Logic Perspective\nImage Retrieval System Base on EMD Similarity Measure and S-Tree\nColor Image Retrieval Using Fuzzy Measure Hamming and S-Tree\nHEP-FCE Working Group on Libraries and Tools\nComparing the Performance of L*A*B* and HSV Color Spaces with Respect to  Color Image Segmentation\nMultilayer Structured NMF for Spectral Unmixing of Hyperspectral Images\nMonocular SLAM Supported Object Recognition\nLearning to track for spatio-temporal action localization\nSpatial Transformer Networks\nSentence Directed Video Object Codetection\nAutomatic tracking of protein vesicles\nWhat's the Point: Semantic Segmentation with Point Supervision\nDescribing Common Human Visual Actions in Images\nVisual Learning of Arithmetic Operations\nSVM and ELM: Who Wins? 
Object Recognition with Deep Convolutional  Features from ImageNet\nLearning with Group Invariant Features: A Kernel Perspective\nLearning to Select Pre-Trained Deep Representations with Bayesian  Evidence Framework\nFlowing ConvNets for Human Pose Estimation in Videos\nMultiscale edge detection and parametric shape modeling for boundary  delineation in optoacoustic images\nLicense Plate Recognition System Based on Color Coding Of License Plates\nWide baseline stereo matching with convex bounded-distortion constraints\nImage Tag Completion and Refinement by Subspace Clustering and Matrix  Completion\nGenerative Image Modeling Using Spatial LSTMs\nP-CNN: Pose-based CNN Features for Action Recognition\nConstrained Convolutional Neural Networks for Weakly Supervised  Segmentation\nPose-Invariant 3D Face Alignment\nTree-Cut for Probabilistic Image Segmentation\nSparse Multi-layer Image Approximation: Facial Image Compression\nConvolutional LSTM Network: A Machine Learning Approach for  Precipitation Nowcasting\nCombinatorial Energy Learning for Image Segmentation\nResolving Scale Ambiguity Via XSlit Aspect Ratio Analysis\nReading Scene Text in Deep Convolutional Sequences\nFlow Segmentation in Dense Crowds\nLeveraging the Power of Gabor Phase for Face Identification: A Block  Matching Approach\nMulti-path Convolutional Neural Networks for Complex Image  Classification\nLayered Interpretation of Street View Images\nImage-based Recommendations on Styles and Substitutes\nEnd-to-end people detection in crowded scenes\nDecoupled Deep Neural Network for Semi-supervised Semantic Segmentation\nPost-Reconstruction Deconvolution of PET Images by Total Generalized  Variation Regularization\nRobust High Quality Image Guided Depth Upsampling\nA Discriminative Representation of Convolutional Features for Indoor  Scene Recognition\nDeep Generative Image Models using a Laplacian Pyramid of Adversarial  Networks\nAligning where to see and what to tell: image caption with region-based  
attention and scene factorization\n3D Reconstruction from Full-view Fisheye Camera\nTarget Tracking In Real Time Surveillance Cameras and Videos\nR-CNN minus R\nSegmentation of Three-dimensional Images with Parametric Active Surfaces  and Topology Changes\nDeep CNN Ensemble with Data Augmentation for Object Detection\nTargeting Ultimate Accuracy: Face Recognition via Deep Embedding\nSalient Object Detection via Objectness Measure\nEmbed to Control: A Locally Linear Latent Dynamics Model for Control  from Raw Images\nDegenerate Motions in Multicamera Cluster SLAM with Non-overlapping  Fields of View\nGeneralized Majorization-Minimization\nAttentionNet: Aggregating Weak Directions for Accurate Object Detection\nOcclusion Coherence: Detecting and Localizing Occluded Faces\nA note on patch-based low-rank minimization for fast image denoising\nUnsupervised Semantic Parsing of Video Collections\nTell and Predict: Kernel Classifier Prediction for Unseen Visual Classes  from Unstructured Text Descriptions\nAutomatic Channel Network Extraction from Remotely Sensed Images by  Singularity Analysis\nAn automatic and efficient foreground object extraction scheme\nLens Factory: Automatic Lens Generation Using Off-the-shelf Components\nOnline Learning to Sample\nLong-Range Motion Trajectories Extraction of Articulated Human Using  Mesh Evolution\nMulti-Cue Structure Preserving MRF for Unconstrained Video Segmentation\nDiscovering Characteristic Landmarks on Ancient Coins using  Convolutional Networks\nLearning to Detect Blue-white Structures in Dermoscopy Images with Weak  Supervision\nDictionary and Image Recovery from Incomplete and Random Measurements\nEvaluating software-based fingerprint liveness detection using  Convolutional Networks and Local Binary Patterns\nOnline Domain Adaptation for Multi-Object Tracking\nEstimating snow cover from publicly available images\nDetection of Critical Number of People in Interlocked Doors for Security  Access Control by Exploiting a 
Microwave Transceiver-Array\nEvaluating color texture descriptors under large variations of  controlled lighting conditions\nSocially Constrained Structural Learning for Groups Detection in Crowd\nTabletGaze: Unconstrained Appearance-based Gaze Estimation in Mobile  Tablets\nNonlinear Metric Learning for kNN and SVMs through Geometric  Transformations\nDigging Deep into the layers of CNNs: In Search of How CNNs Achieve View  Invariance\nAutomatic Extraction of the Passing Strategies of Soccer Teams\nGait Assessment for Multiple Sclerosis Patients Using Microsoft Kinect\nWhat is Holding Back Convnets for Detection?\nMountain Peak Detection in Online Social Media\nA New Approach to an Old Problem: The Reconstruction of a Go Game  through a Series of Photographs\nLensless Compressive Imaging\nLight-field Microscopy with a Consumer Light-field Camera\nBeat-Event Detection in Action Movie Franchises\nPose-Guided Human Parsing with Deep Learned Features\nLCNN: Low-level Feature Embedded CNN for Salient Object Detection\nSense Beyond Expressions: Cuteness\nA Generative Model for Multi-Dialect Representation\nAction Recognition based on Subdivision-Fusion Model\nSupervised learning of sparse context reconstruction coefficients for  data representation and classification\nImage tag completion by local learning\nRobust Subspace Clustering via Smoothed Rank Approximation\nDeepWriterID: An End-to-end Online Text-independent Writer  Identification System\nImproving Image Restoration with Soft-Rounding\nExemplar Based Deep Discriminative and Shareable Feature Learning for  Scene Image Classification\nMorphometry-Based Longitudinal Neurodegeneration Simulation with MR  Imaging\nBREN: Body Reflection Essence-Neuter Model for Separation of Reflection  Components\nMultiple kernel multivariate performance learning using cutting plane  algorithm\nMaximum-Margin Structured Learning with Deep Networks for 3D Human Pose  Estimation\nA Comparative Analysis of Retrieval Techniques In 
Content Based Image  Retrieval\nShopper Analytics: a customer activity recognition system using a  distributed RGB-D camera network\nValidation of neural spike sorting algorithms without ground-truth  information\nDiscrete Hashing with Deep Neural Network\nBilevel parameter learning for higher-order total variation  regularisation models\nMixed Gaussian-Impulse Noise Removal from Highly Corrupted Images via  Adaptive Local and Nonlocal Statistical Priors\nImage Annotation Incorporating Low-Rankness, Tag and Visual Correlation  and Inhomogeneous Errors\nLove Thy Neighbors: Image Annotation by Exploiting Image Metadata\nAction Recognition by Hierarchical Mid-level Action Elements\nDomain Generalization for Object Recognition with Multi-task  Autoencoders\nApproximate Nearest Neighbor Fields in Video\nLearning A Task-Specific Deep Architecture For Clustering\nRobust Face Recognition via Multimodal Deep Face Representation\nDAG-Recurrent Neural Networks For Scene Labeling\nDictionary based Approach to Edge Detection\nDepth Fields: Extending Light Field Techniques to Time-of-Flight Imaging\nLight Efficient Flutter Shutter\nImage Classification with Rejection using Contextual Information\nLearning Temporal Alignment Uncertainty for Efficient Event Detection\nCNN Based Hashing for Image Retrieval\nCoordinate Descent Methods for Symmetric Nonnegative Matrix  Factorization\nEM Algorithms for Weighted-Data Clustering with Application to  Audio-Visual Scene Analysis\nConjugate Gradient Acceleration of Non-Linear Smoothing Filters\nObject Recognition from Short Videos for Robotic Perception\nCo-interest Person Detection from Multiple Wearable Camera Videos\nStructured Prediction with Output Embeddings for Semantic Image  Annotation\nHEp-2 Cell Classification: The Role of Gaussian Scale Space Theory as A  Pre-processing Approach\nEdge-enhancing Filters with Negative Weights\nShape Interaction Matrix Revisited and Robustified: Efficient Subspace  Clustering with Corrupted and 
Incomplete Data\nReal-time Sign Language Fingerspelling Recognition using Convolutional  Neural Networks from Depth map\nA deep matrix factorization method for learning attribute  representations\nA reliable order-statistics-based approximate nearest neighbor search  algorithm\nOCR accuracy improvement on document images through a novel  pre-processing approach\nFingerprint Recognition Using Translation Invariant Scattering Network\nOracle MCG: A first peek into COCO Detection Challenges\nOn Binary Classification with Single-Layer Convolutional Neural Networks\nLearning to Divide and Conquer for Online Multi-Target Tracking\nExpanded Parts Model for Semantic Description of Humans in Still Images\nA Total Fractional-Order Variation Model for Image Restoration with  Non-homogeneous Boundary Conditions and its Numerical Solution\nComparative Design Space Exploration of Dense and Semi-Dense SLAM\nZero-Shot Learning via Semantic Similarity Embedding\nGroup Membership Prediction\nDenseBox: Unifying Landmark Localization with End to End Object  Detection\nAn Improved Algorithm for Eye Corner Detection\nGuiding Long-Short Term Memory for Image Caption Generation\nHuman and Sheep Facial Landmarks Localisation by Triplet Interpolated  Features\nRecurrent Neural Networks for Driver Activity Anticipation via  Sensory-Fusion Architecture\nHCLAE: High Capacity Locally Aggregating Encodings for Approximate  Nearest Neighbor Search\nImproved Residual Vector Quantization for High-dimensional Approximate  Nearest Neighbor Search\nHand-held Video Deblurring via Efficient Fourier Aggregation\nHumans Are Easily Fooled by Digital Images\nRecurrent Spatial Transformer Networks\nGeometry-aware Deep Transform\nLearning from Synthetic Data Using a Stacked Multichannel Autoencoder\nAn Experimental Survey on Correlation Filter-based Tracking\nSimilar Handwritten Chinese Character Discrimination by Weakly  Supervised Learning\nFace Photo Sketch Synthesis via Larger Patch and Multiresolution 
Spline\nImage Retrieval Based on LBP Pyramidal Multiresolution using Reversible  Watermarking\nOn Large-Scale Retrieval: Binary or n-ary Coding?\nFusing Multi-Stream Deep Networks for Video Classification\nOn 3D Face Reconstruction via Cascaded Regression in Shape Space\nFrom Facial Parts Responses to Face Detection: A Deep Learning Approach\nUnderstand Scene Categories by Objects: A Semantic Regularized Scene  Classifier Using Convolutional Neural Networks\nInvariants of objects and their images under surjective maps\nA Dual-Source Approach for 3D Pose Estimation from a Single Image\nRobust Object Tracking with a Hierarchical Ensemble Framework\nMulti-Region Probabilistic Dice Similarity Coefficient using the  Aitchison Distance and Bipartite Graph Matching\nLearning Concept Embeddings with Combined Human-Machine Expertise\nIncremental Loop Closure Verification by Guided Sampling\nSelf-localization Using Visual Experience Across Domains\nFeature Evaluation of Deep Convolutional Neural Networks for Object  Recognition and Detection\nModeling Curiosity in a Mobile Robot for Long-Term Autonomous  Exploration and Monitoring\nAnomaly Detection in Unstructured Environments using Bayesian  Nonparametric Scene Modeling\nAmodal Completion and Size Constancy in Natural Scenes\nRobust video object tracking using particle filter with likelihood based  feature fusion and adaptive template updating\nLong-Range Trajectories from Global and Local Motion Representations\nScalable Nonlinear Embeddings for Semantic Category-based Image  Retrieval\nStats-Calculus Pose Descriptor Feeding A Discrete HMM Low-latency  Detection and Recognition System For 3D Skeletal Actions\nMoving Object Detection in Video Using Saliency Map and Subspace  Learning\nA spatial compositional model (SCM) for linear unmixing and endmember  uncertainty estimation\nGeneral Dynamic Scene Reconstruction from Multiple View Video\nAnalyzing Classifiers: Fisher Vectors and Deep Neural Networks\nEfficient Edge 
Detection on Low-Cost FPGAs\nThe MegaFace Benchmark: 1 Million Faces for Recognition at Scale\nContinuous and Simultaneous Gesture and Posture Recognition for  Commanding a Robotic Wheelchair; Towards Spotting the Signal Patterns\nActions ~ Transformations\nCompressive hyperspectral imaging via adaptive sampling and dictionary  learning\nA Literature Survey of various Fingerprint De-noising Techniques to  justify the need of a new De-noising model based upon Pixel Component  Analysis\nFast Low-Rank Matrix Learning with Nonconvex Regularization\nOcclusion-Aware Human Pose Estimation with Mixtures of Sub-Trees\nTrending Chic: Analyzing the Influence of Social Media on Fashion Brands\nStaple: Complementary Learners for Real-Time Tracking\nA Shapley Value Solution to Game Theoretic-based Feature Reduction in  False Alarm Detection\nMaximum Entropy Binary Encoding for Face Template Protection\nSparsifying Neural Network Connections for Face Recognition\nFast Optimization Algorithm on Riemannian Manifolds and Its Application  in Low-Rank Representation\nA Large Dataset to Train Convolutional Networks for Disparity, Optical  Flow, and Scene Flow Estimation\nLearning to Point and Count\nTracking Objects with Higher Order Interactions using Delayed Column  Generation\nFine-grained Image Classification by Exploring Bipartite-Graph Labels\nWindow-Object Relationship Guided Representation Learning for Generic  Object Detections\nYet Another Statistical Analysis of Bob Ross Paintings\nDeep Residual Learning for Image Recognition\nImproving Human Activity Recognition Through Ranking and Re-ranking\nLearning the Correction for Multi-Path Deviations in Time-of-Flight  Cameras\nDeep Learning-Based Image Kernel for Inductive Transfer\nArticulated Pose Estimation Using Hierarchical Exemplar-Based Models\nA Person Re-Identification System For Mobile Devices\nEvaluation of Pose Tracking Accuracy in the First and Second Generations  of Microsoft Kinect\nInside-Outside Net: Detecting 
Objects in Context with Skip Pooling and  Recurrent Neural Networks\nLearning Deep Features for Discriminative Localization\nWatch-Bot: Unsupervised Learning for Reminding Humans of Forgotten  Actions\nInstance-aware Semantic Segmentation via Multi-task Network Cascades\nSparse Representation of a Blur Kernel for Blind Image Restoration\nBlockout: Dynamic Model Selection for Hierarchical Deep Networks\nNumerical Demultiplexing of Color Image Sensor Measurements via  Non-linear Random Forest Modeling\nReconstruction of Enhanced Ultrasound Images From Compressed  Measurements Using Simultaneous Direction Method of Multipliers\nRelay Backpropagation for Effective Learning of Deep Convolutional  Neural Networks\nDeformable Distributed Multiple Detector Fusion for Multi-Person  Tracking\nModeling Colors of Single Attribute Variations with Application to Food  Appearance\nMultistage SFM: A Coarse-to-Fine Approach for 3D Reconstruction\nNeutro-Connectedness Cut\nKernel principal component analysis network for image classification\nHarnessing the Deep Net Object Models for Enhancing Human Action  Recognition\nSpatial Phase-Sweep: Increasing temporal resolution of transient imaging  using a light source array\nDeep Learning for Surface Material Classification Using Haptic And  Visual Information\nSparse Coding with Fast Image Alignment via Large Displacement Optical  Flow\nInstance-Level Segmentation for Autonomous Driving with Deep Densely  Connected MRFs\nMulti-Instance Visual-Semantic Embedding\nSeeing through the Human Reporting Bias: Visual Classifiers from Noisy  Human-Centric Labels\nImplementation of deep learning algorithm for automatic detection of  brain tumors using intraoperative IR-thermal mapping data\nMid-level Representation for Visual Recognition\nA Deep Generative Deconvolutional Image Model\nConvolutional Architecture Exploration for Action Recognition and Image  Classification\nOn the Automated Synthesis of Enterprise Integration Patterns to Adapt  
Choreography-based Distributed Systems\nFast Acquisition for Quantitative MRI Maps: Sparse Recovery from  Non-linear Measurements\nLearning Transferrable Knowledge for Semantic Segmentation with Deep  Convolutional Neural Network\nTexture measures combination for improved meningioma classification of  histopathological images\nVisually Indicated Sounds\nCombined statistical and model based texture features for improved image  classification\nActor-Action Semantic Segmentation with Grouping Process Models\nAutoencoding beyond pixels using a learned similarity metric\nDiscriminative Sparsity for Sonar ATR\nA fractal dimension based optimal wavelet packet analysis technique for  classification of meningioma brain tumours\nImage Resolution Enhancement by Using Interpolation Followed by  Iterative Back Projection\nMulti-task CNN Model for Attribute Prediction\nMatrix Variate RBM and Its Applications\nLow-Rank Representation over the Manifold of Curves\nMemory Matters: Convolutional Recurrent Neural Network for Scene Text  Recognition\nAutomatic 3D object detection of Proteins in Fluorescent labeled  microscope images with spatial statistical analysis\nMixture of Bilateral-Projection Two-dimensional Probabilistic Principal  Component Analysis\nBlock-Diagonal Sparse Representation by Learning a Linear Combination  Dictionary for Recognition\nLearning to Remove Multipath Distortions in Time-of-Flight Range Images  for a Robotic Arm Setup\nMulticuts and Perturb & MAP for Probabilistic Graph Clustering\n3D Gaze Estimation from 2D Pupil Positions on Monocular Head-Mounted Eye  Trackers\nSubspace Clustering Based Tag Sharing for Inductive Tag Matrix  Refinement with Complex Errors\nDigital Image Forensics vs. 
Image Composition: An Indirect Arms Race\nDocument image classification, with a specific view on applications of  patent images\nA Score-level Fusion Method for Eye Movement Biometrics\nThe Ultimate Display\nDynamic Concept Composition for Zero-Example Event Detection\nStereo Matching by Joint Energy Minimization\n$\\mathbf{D^3}$: Deep Dual-Domain Based Fast Restoration of  JPEG-Compressed Images\nStudying Very Low Resolution Recognition Using Deep Networks\nFace-space Action Recognition by Face-Object Interactions\nDiscovering Picturesque Highlights from Egocentric Vacation Videos\nDeep Perceptual Mapping for Cross-Modal Face Recognition\nDetecting Temporally Consistent Objects in Videos through Object Class  Label Propagation\nRGB-D-based Action Recognition Datasets: A Survey\nAutomatic 3D modelling of craniofacial form\nManifold-Kernels Comparison in MKPLS for Visual Speech Recognition\nDepth and Reflection Total Variation for Single Image Dehazing\nOnline Event Recognition from Moving Vessel Trajectories\nUnsupervised convolutional neural networks for motion estimation\nA bifibrational reconstruction of Lawvere's presheaf hyperdoctrine\nSuper-resolution reconstruction of hyperspectral images via low rank  tensor modeling and total variation regularization\nUsing compatible shape descriptor for lexicon reduction of printed Farsi  subwords\nSynthesis of Gaussian Trees with Correlation Sign Ambiguity: An  Information Theoretic Approach\nEgocentric Activity Recognition with Multimodal Fisher Vector\nFisher Motion Descriptor for Multiview Gait Recognition\nFont Identification in Historical Documents Using Active Learning\nFast Integral Image Estimation at 1% measurement rate\nLearning to Extract Motion from Videos in Convolutional Neural Networks\nCombining Maps and Street Level Images for Building Height and Facade  Estimation\nDehazeNet: An End-to-End System for Single Image Haze Removal\nGeo-distinctive Visual Element Matching for Location Estimation of  
Images\nMapping Tractography Across Subjects\nWhat Can I Do Around Here? Deep Functional Scene Understanding for  Cognitive Robots\nConvolutional Pose Machines\nLearning a low-rank shared dictionary for object classification\nTransfer Learning Based on AdaBoost for Feature Selection from Multiple  ConvNet Layer Features\nAlgorithm-Induced Prior for Image Restoration\nCombining ConvNets with Hand-Crafted Features for Action Recognition  Based on an HMM-SVM Classifier\nSimple Online and Realtime Tracking\nLearning a Deep Model for Human Action Recognition from Novel Viewpoints\nHead Pose Estimation of Occluded Faces using Regularized Regression\nA-expansion for multiple \"hedgehog\" shapes\nHow Far are We from Solving Pedestrian Detection?\nLearning scale-variant and scale-invariant features for deep image  classification\nDevelopment of an Ideal Observer that Incorporates Nuisance Parameters  and Processes List-Mode Data\nAn ensemble diversity approach to supervised binary hashing\nAppearance Based Robot and Human Activity Recognition System\nRandom Feature Maps via a Layered Random Projection (LaRP) Framework for  Object Classification\nLeveraging Mid-Level Deep Representations For Predicting Face Attributes  in the Wild\nVisual Tracking via Reliable Memories\nSearch Tracker: Human-derived object tracking in-the-wild through  large-scale search and retrieval\nSub-cortical brain structure segmentation using F-CNN's\nScreen Content Image Segmentation Using Sparse Decomposition and Total  Variation Minimization\nCharacterization of a Multi-User Indoor Positioning System Based on Low  Cost Depth Vision (Kinect) for Monitoring Human Activity in a Smart Home\nTumour ROI Estimation in Ultrasound Images via Radon Barcodes in  Patients with Locally Advanced Breast Cancer\nJoint Defogging and Demosaicking\nFace Recognition: Perspectives from the Real-World\nA New Spatio-Spectral Morphological Segmentation For Multi-Spectral  Remote-Sensing Images\nDesign of false color 
palettes for grayscale reproduction\nGenerating Discriminative Object Proposals via Submodular Ranking\nHMM and DTW for evaluation of therapeutical gestures using kinect\nSemi-supervised Learning with Explicit Relationship Regularization\nWavelet-Based Semantic Features for Hyperspectral Signature  Discrimination\nImage Restoration and Reconstruction using Variable Splitting and  Class-adapted Image Priors\nManifolds of Projective Shapes\nConvolutional Tables Ensemble: classification in microseconds\nA diffusion and clustering-based approach for finding coherent motions  and understanding crowd scenes\nDeconvolutional Feature Stacking for Weakly-Supervised Semantic  Segmentation\nImage Restoration: A General Wavelet Frame Based Model and Its  Asymptotic Analysis\nBoost Picking: A Universal Method on Converting Supervised  Classification to Semi-supervised Classification\nFeature-Area Optimization: A Novel SAR Image Registration Method\nWeighted Unsupervised Learning for 3D Object Detection\nPlücker Correction Problem: Analysis and Improvements in Efficiency\nLarge age-gap face verification by feature injection in deep networks\nDenoising and Covariance Estimation of Single Particle Cryo-EM Images\nPlanogram Compliance Checking Based on Detection of Recurring Patterns\nExploring the Neural Algorithm of Artistic Style\nSqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB  model size\nHow Deep Neural Networks Can Improve Emotion Recognition on Video Data\nLearning to Generate with Memory\nA fine-grained approach to scene text script identification\nImproving patch-based scene text script identification with ensembles of  conjoined networks\nA Low Complexity VLSI Architecture for Multi-Focus Image Fusion in DCT  Domain\nCNN for License Plate Motion Deblurring\nAuto-JacoBin: Auto-encoder Jacobian Binary Hashing\nMultimodal Emotion Recognition Using Multimodal Deep Learning\nVictory Sign Biometric for Terrorists Identification\nSeq-NMS for Video 
Object Detection\nGraph clustering, variational image segmentation methods and Hough  transform scale detection for object measurement in images\nSingle-Image Superresolution Through Directional Representations\nContent-based Video Indexing and Retrieval Using Corr-LDA\nLearning Multilayer Channel Features for Pedestrian Detection\nA Universal Update-pacing Framework For Visual Tracking\nConvolutional Patch Representations for Image Retrieval: an Unsupervised  Approach\nKeypoint Density-based Region Proposal for Fine-Grained Object Detection  and Classification using Regions with Convolutional Neural Network Features\nSynthesized Classifiers for Zero-Shot Learning\nFlies as Ship Captains? Digital Evolution Unravels Selective Pressures  to Avoid Collision in Drosophila\nA Nonlinear Weighted Total Variation Image Reconstruction Algorithm for  Electrical Capacitance Tomography\nAutomatic segmentation of lizard spots using an active contour model\nLiDAR Ground Filtering Algorithm for Urban Areas Using Scan Line Based  Segmentation\nPCANet: An energy perspective\nSelf-localization from Images with Small Overlap\nAutomatic learning of gait signatures for people identification\nFirst Steps Toward Camera Model Identification with Convolutional Neural  Networks\nWhat is the right way to represent document images?\nHyperFace: A Deep Multi-task Learning Framework for Face Detection,  Landmark Localization, Pose Estimation, and Gender Recognition\nLearning deep representation of multityped objects and tasks\nSaliency Detection combining Multi-layer Integration algorithm with  background prior and energy function\nA Feature Learning and Object Recognition Framework for Underwater Fish  Images\nGrading of Mammalian Cumulus Oocyte Complexes using Machine Learning for  in Vitro Embryo Culture\nSemantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks\nVariational methods for Conditional Multimodal Deep Learning\nSingle Image Restoration for Participating Media Based 
on Prior Fusion\nA Two-Stage Shape Retrieval (TSR) Method with Global and Local Features\nAdaptive Visualisation System for Construction Building Information  Models Using Saliency\nA novel learning-based frame pooling method for Event Detection\nGaussian Process Regression for Out-of-Sample Extension\nBlur Robust Optical Flow using Motion Channel\nHand Segmentation for Hand-Object Interaction from Depth map\nA hybrid approach based segmentation technique for brain tumor in MRI  Images\nA New Method to Visualize Deep Neural Networks\nIterative Hough Forest with Histogram of Control Points for 6 DoF Object  Registration from Depth Images\nA regularization-based approach for unsupervised image segmentation\nDiscriminative models for robust image classification\nRecursive Recurrent Nets with Attention Modeling for OCR in the Wild\nUTSig: A Persian Offline Signature Dataset\nLearning Gaze Transitions from Depth to Improve Video Saliency  Estimation\nReal-time 3D scene description using Spheres, Cones and Cylinders\nOptical Flow with Semantic Segmentation and Localized Layers\nRobust Scene Text Recognition with Automatic Rectification\nTemplate Adaptation for Face Verification and Identification\nPose for Action - Action for Pose\nRISAS: A Novel Rotation, Illumination, Scale Invariant Appearance and  Shape Feature\nSaliency Detection for Improving Object Proposals\nRegression-based Hypergraph Learning for Image Clustering and  Classification\nVisual Concept Recognition and Localization via Iterative Introspection\nDiversity in Object Proposals\nAutomatic Discrimination of Color Retinal Images using the Bag of Words  Approach\nRapid building detection using machine learning\nObject Contour Detection with a Fully Convolutional Encoder-Decoder  Network\nScalable Image Retrieval by Sparse Product Quantization\nModeling Time Series Similarity with Siamese Recurrent Networks\nCombining the Best of Convolutional Layers and Recurrent Layers: A  Hybrid Network for Semantic 
Segmentation\nDeep Fully-Connected Networks for Video Compressive Sensing\nNon-linear Dimensionality Regularizer for Solving Inverse Problems\nIdentity Mappings in Deep Residual Networks\nSuppressing the Unusual: towards Robust CNNs using Symmetric Activation  Functions\nXNOR-Net: ImageNet Classification Using Binary Convolutional Neural  Networks\nImage Labeling by Assignment\nSaliency Detection with Spaces of Background-based Distribution\nUnsupervised Cross-Media Hashing with Structure Preservation\nA Flexible Primal-Dual Toolbox\nGeometric Hypergraph Learning for Visual Tracking\nTransferring Learned Microcalcification Group Detection from 2D  Mammography to 3D Digital Breast Tomosynthesis Using a Hierarchical Model and  Scope-based Normalization Features\nLarge scale near-duplicate image retrieval using Triples of Adjacent  Ranked Features (TARF) with embedded geometric information\nSeed, Expand and Constrain: Three Principles for Weakly-Supervised Image  Segmentation\nBuried object detection using handheld WEMI with task-driven extended  functions of multiple instances\nAdaptive coherence estimator (ACE) for explosive hazard detection using  wideband electromagnetic induction (WEMI)\nTowards Automatic Wild Animal Monitoring: Identification of Animal  Species in Camera-trap Images using Very Deep Convolutional Neural Networks\nSegmentation from Natural Language Expressions\nModelling Temporal Information Using Discrete Fourier Transform for  Video Classification\nUnified Depth Prediction and Intrinsic Image Decomposition from a Single  Image via Joint Convolutional Neural Fields\nAppearance Harmonization for Single Image Shadow Removal\nBeyond Sharing Weights for Deep Domain Adaptation\nFrankenstein: Learning Deep Face Representations using Small Data\nAction-Affect Classification and Morphing using Multi-Task  Representation Learning\nModelling Temporal Information Using Discrete Fourier Transform for  Recognizing Emotions in User-generated Videos\nInput 
Aggregated Network for Face Video Representation\nImplementation of a FPGA-Based Feature Detection and Networking System  for Real-time Traffic Monitoring\nImage Super-Resolution Based on Sparsity Prior via Smoothed $l_0$ Norm\nKnowledge Transfer for Scene-specific Motion Prediction\nActive Detection and Localization of Textureless Objects in Cluttered  Environments\nDo We Really Need to Collect Millions of Faces for Effective Face  Recognition?\nDeep Multimodal Feature Analysis for Action Recognition in RGB+D Videos\nWeakly-Supervised Semantic Segmentation using Motion Cues\nFace Recognition Using Deep Multi-Pose Representations\nPixel-Level Domain Transfer\nFine-scale Surface Normal Estimation using a Single NIR Image\nJoint Projection and Dictionary Learning using Low-rank Regularization  and Graph Constraints\nCo-occurrence Feature Learning for Skeleton based Action Recognition  using Regularized Deep LSTM Networks\nQuadratic Projection Based Feature Extraction with Its Application to  Biometric Recognition\nAn Effective Unconstrained Correlation Filter and Its Kernelization for  Face Recognition\nConditional Similarity Networks\nTraining-Free Synthesized Face Sketch Recognition Using Image Quality  Assessment Metrics\nBlind signal separation and identification of mixtures of images\nSupport Driven Wavelet Frame-based Image Deblurring\nPerceptual Losses for Real-Time Style Transfer and Super-Resolution\nVolumeDeform: Real-time Volumetric Non-rigid Reconstruction\nDeLight-Net: Decomposing Reflectance Maps into Specular Materials and  Natural Illumination\nHierarchy of Groups Evaluation Using Different F-score Variants\nHierarchical Gaussian Mixture Model with Objects Attached to Terminal  and Non-terminal Dendrogram Nodes\nLearning to Read Chest X-Rays: Recurrent Neural Cascade Model for  Automated Image Annotation\nColorful Image Colorization\nShuffle and Learn: Unsupervised Learning using Temporal Order  Verification\nAttend, Infer, Repeat: Fast Scene 
Understanding with Generative Models\nExploring Local Context for Multi-target Tracking in Wide Area Aerial  Surveillance\nLearning a Predictable and Generative Vector Representation for Objects\nMulti-Band Image Fusion Based on Spectral Unmixing\nScalable Solution for Approximate Nearest Subspace Search\nSMASH: Physics-guided Reconstruction of Collisions from Videos\nDense Image Representation with Spatial Pyramid VLAD Coding of CNN for  Locally Robust Captioning\nStructured Feature Learning for Pose Estimation\nLearning Local Descriptors by Optimizing the Keypoint-Correspondence  Criterion\nUnsupervised Learning of Visual Representations by Solving Jigsaw  Puzzles\nPartial Face Detection for Continuous Authentication\nThe Open World of Micro-Videos\nExemplar-AMMs: Recognizing Crowd Movements from Pedestrian Trajectories\nRobust Uncalibrated Stereo Rectification with Constrained Geometric  Distortions (USR-CGD)\nA ParaBoost Stereoscopic Image Quality Assessment (PBSIQA) System\nLarge Scale Deep Convolutional Neural Network Features Search with  Lucene\nObject Boundary Guided Semantic Segmentation\nDeep Convolutional Neural Networks on Cartoon Functions\nDisturbLabel: Regularizing CNN on the Loss Layer\n3D Keypoint Detection Based on Deep Neural Network with Sparse  Autoencoder\nEnforcing Template Representability and Temporal Consistency for  Adaptive Sparse Tracking\nMultidimensional Scaling on Multiple Input Distance Matrices\nDominant Codewords Selection with Topic Model for Action Recognition\nParallel Wavelet Schemes for Images\nSpatially Aware Dictionary Learning and Coding for Fossil Pollen  Identification\nRecurrent Convolutional Neural Network Regression for Continuous Pain  Intensity Estimation in Video\nHierarchical Bayesian Noise Inference for Robust Real-time Probabilistic  Object Classification\nMARLow: A Joint Multiplanar Autoregressive and Low-Rank Approach for  Image Completion\nLearning Covariant Feature Detectors\nUnsupervised Total Variation 
Loss for Semi-supervised Deep Learning of  Semantic Segmentation\nLeveraging Visual Question Answering for Image-Caption Ranking\nSkin Lesion Analysis toward Melanoma Detection: A Challenge at the  International Symposium on Biomedical Imaging (ISBI) 2016, hosted by the  International Skin Imaging Collaboration (ISIC)\nClassification of Human Whole-Body Motion using Hidden Markov Models\nAdversarial Diversity and Hard Positive Generation\nRobust SAR STAP via Kronecker Decomposition\nShape from Mixed Polarization\nDeeply Exploit Depth Information for Object Detection\nLaplacian Pyramid Reconstruction and Refinement for Semantic  Segmentation\nEstimating Depth from Monocular Images as Classification Using Deep  Fully Convolutional Residual Networks\nFuzzy Clustering Based Segmentation Of Vertebrae in T1-Weighted Spinal  MR Images\nDeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation  Model\nUnsupervised Semantic Action Discovery from Video Collections\nA robust particle detection algorithm based on symmetry\nGo-ICP: A Globally Optimal Solution to 3D ICP Point-Set Registration\nEfficiently Creating 3D Training Data for Fine Hand Pose Estimation\nOn-the-fly Network Pruning for Object Detection\nReal-time 3D Tracking of Articulated Tools for Robotic Surgery\nDeep Neural Networks Under Stress\nView Synthesis by Appearance Flow\nItem Popularity Prediction in E-commerce Using Image Quality Feature  Vectors\nGoing Deeper into First-Person Activity Recognition\nRobust and Efficient Relative Pose with a Multi-camera System for  Autonomous Vehicle in Highly Dynamic Environments\nDeformable Parts Correlation Filters for Robust Visual Tracking\nFast Graph-Based Object Segmentation for RGB-D Images\nA New Manifold Distance Measure for Visual Object Categorization\nTrack Extraction with Hidden Reciprocal Chain Models\nFast Semantic Image Segmentation with High Order Context and Guided  Filtering\nWith Whom Do I Interact? 
Detecting Social Interactions in Egocentric  Photo-streams\nSimultaneous Surface Reflectance and Fluorescence Spectra Estimation\nAn Empirical Study and Analysis of Generalized Zero-Shot Learning for  Object Recognition in the Wild\nImproving the Neural Algorithm of Artistic Style\nMono-jet Signatures of Gluphilic Scalar Dark Matter\nVideo2GIF: Automatic Generation of Animated GIFs from Video\nClassification of Big Data with Application to Imaging Genetics\nImage stitching with perspective-preserving warping\nIncremental Robot Learning of New Objects with Fixed Update Time\nMonocular Urban Localization using Street View\nHuman Action Localization with Sparse Spatial Supervision\nLearning Deep Representations of Fine-grained Visual Descriptions\nRelative distance features for gait recognition with Kinect\nBeyond Caption To Narrative: Video Captioning With Multiple Sentences\nLow-Rank Matrices on Graphs: Generalized Recovery & Applications\nScalable low dimensional manifold model in the reconstruction of noisy  and incomplete hyperspectral images\nRobust Image Descriptors for Real-Time Inter-Examination Retargeting in  Gastrointestinal Endoscopy\nTongue contour extraction from ultrasound images based on deep neural  network\nA Geometric Approach to Color Image Regularization\nInter-Battery Topic Representation Learning\nFine-Grained Classification of Pedestrians in Video: Benchmark and State  of the Art\nFully Convolutional Networks for Semantic Segmentation\nLocalizing by Describing: Attribute-Guided Attention Localization for  Fine-Grained Recognition\nEnd-to-End Kernel Learning with Supervised Convolutional Kernel Networks\nLearning shape correspondence with anisotropic convolutional neural  networks\nX-ray image separation via coupled dictionary learning\nWAHRSIS: A Low-cost, High-resolution Whole Sky Imager With Near-Infrared  Capabilities\nLearning to Communicate with Deep Multi-Agent Reinforcement Learning\nFine-to-coarse Knowledge Transfer For Low-Res Image 
Classification\nAutomatic Detection of Epileptiform Discharges in the EEG\nA Rapid Pattern-Recognition Method for Driving Types Using  Clustering-Based Support Vector Machines\n3D Face Tracking and Texture Fusion in the Wild\nSelf-expressive Dictionary Learning for Dynamic 3D Reconstruction\nDepth from a Single Image by Harmonizing Overcomplete Local Network  Predictions\nDeepText: A Unified Framework for Text Proposal Generation and Text  Detection in Natural Images\nSpatio-Temporal Image Boundary Extrapolation\nQuickest Moving Object Detection\nNatural Scene Image Segmentation Based on Multi-Layer Feature Extraction\nBlind Analysis of CT Image Noise Using Residual Denoised Images\nDeepCut: Object Segmentation from Bounding Box Annotations using  Convolutional Neural Networks\nMulti-Object Tracking and Identification over Sets\nVideo Summarization with Long Short-term Memory\nLearning Latent Sub-events in Activity Videos Using Temporal Attention  Filters\nPredicting Visual Exemplars of Unseen Classes for Zero-Shot Learning\nA single scale retinex based method for palm vein extraction\nMultiple target tracking based on sets of trajectories\nDiscovering Causal Signals in Images\nPairwise Decomposition of Image Sequences for Active Multi-View  Recognition\nDomain Transfer Multi-Instance Dictionary Learning\nDense Volume-to-Volume Vascular Boundary Detection\nA Feature based Approach for Video Compression\nA Channelized Binning Method for Extraction of Dominant Color Pixel  Value\nVideo Key Frame Extraction using Entropy value as Global and Local  Feature\nSparse Coding and Counting for Robust Visual Tracking\nSemi-supervised Zero-Shot Learning by a Clustering-based Approach\nPredicting Personal Traits from Facial Images using Convolutional Neural  Networks Augmented with Facial Landmark Information\nImage segmentation based on the hybrid total variation model and the  K-means clustering strategy\nControl of Memory, Active Perception, and Action in Minecraft\nBlind 
Modulation Classification based on MLP and PNN\nRobust Deep-Learning-Based Road-Prediction for Augmented Reality  Navigation Systems\nSemantic-Aware Depth Super-Resolution in Outdoor Scenes\nModel-driven Simulations for Deep Convolutional Neural Networks\nGeneralized Multi-view Embedding for Visual Recognition and Cross-modal  Retrieval\nTowards ontology driven learning of visual concept detectors\nFast Zero-Shot Image Tagging\nTexture Synthesis Using Shallow Convolutional Networks with Random  Filters\nA Comparative Study of Algorithms for Realtime Panoramic Video Blending\nOpenSalicon: An Open Source Implementation of the Salicon Saliency Model\nMultiview Rectification of Folded Documents\nA Survey on Learning to Hash\nHyperspectral Subspace Identification Using SURE\nImproving Deep Neural Network with Multiple Parametric Exponential  Linear Units\nA 3D Face Modelling Approach for Pose-Invariant Face Recognition in a  Human-Robot Environment\nRecurrent Fully Convolutional Networks for Video Segmentation\nStorytelling of Photo Stream with Bidirectional Multi-thread Recurrent  Neural Network\nUnifying Geometric Features and Facial Action Units for Improved  Performance of Facial Expression Analysis\nComparison of 14 different families of classification algorithms on 115  binary datasets\nAutomatic Separation of Compound Figures in Scientific Articles\nExtraction of clinical information from the non-invasive fetal  electrocardiogram\nLearning under Distributed Weak Supervision\nReinforcement Learning for Semantic Segmentation in Indoor Scenes\nWhat is the Best Feature Learning Procedure in Hierarchical Recognition  Architectures?\nPairwise Quantization\nIntegrated perception with recurrent multi-task neural networks\nOptically lightweight tracking of objects around a corner\nLearning deep structured network for weakly supervised change detection\nHand Action Detection from Ego-centric Depth Sequences with  Error-correcting Hough Transform\nJoint Recursive Monocular 
Filtering of Camera Motion and Disparity Map\nENet: A Deep Neural Network Architecture for Real-Time Semantic  Segmentation\nSelective Unsupervised Feature Learning with Convolutional Neural  Network (S-CNN)\nSE3-Nets: Learning Rigid Body Motion using Deep Neural Networks\nDeep Learning Convolutional Networks for Multiphoton Microscopy  Vasculature Segmentation\nProgressive Attention Networks for Visual Attribute Prediction\nEstimation of solar irradiance using ground-based whole sky imagers\nDISCO Nets: DISsimilarity COefficient Networks\nFully Convolutional Networks for Dense Semantic Labelling of  High-Resolution Aerial Imagery\nRotation Invariant Angular Descriptor Via A Bandlimited Gaussian-like  Kernel\nSimultaneous Inpainting and Denoising by Directional Global Three-part  Decomposition: Connecting Variational and Fourier Domain Based Image  Processing\nImplicit Tubular Surface Generation Guided by Centerline\nMutual Exclusivity Loss for Semi-Supervised Deep Learning\nSurvey on RGB, 3D, Thermal, and Multimodal Approaches for Facial  Expression Recognition: History, Trends, and Affect-related Applications\nFOMTrace: Interactive Video Segmentation By Image Graphs and Fuzzy  Object Models\nAlternative Technique to Asymmetry Analysis-Based Overlapping for Foot  Ulcer Examination: Scalable Scanning\nColor-based Segmentation of Sky/Cloud Images From Ground-based Cameras\nSegmentation of scanning electron microscopy images from natural rubber  samples with gold nanoparticles using starlet wavelets\nHuman Centred Object Co-Segmentation\nDeep Image Homography Estimation\nLaplacian LRR on Product Grassmann Manifolds for Human Activity  Clustering in Multi-Camera Video Surveillance\nVisual-Inertial-Semantic Scene Representation for 3-D Object Detection\nDCNNs on a Diet: Sampling Strategies for Reducing the Training Set Size\nRichardson-Lucy Deblurring for Moving Light Field Cameras\nEfficient adaptation of complex-valued noiselet sensing matrices for  compressed 
single-pixel imaging\nRegularization With Stochastic Transformations and Perturbations for  Deep Semi-Supervised Learning\nIn the Shadows, Shape Priors Shine: Using Occlusion to Improve  Multi-Region Segmentation\nNatural Scene Character Recognition Using Robust PCA and Sparse  Representation\nWatch What You Just Said: Image Captioning with Text-Conditional  Attention\nFree Form based active contours for image segmentation and free space  perception\nCombining multiscale features for classification of hyperspectral  images: a sequence based kernel approach\n3DFS: Deformable Dense Depth Fusion and Segmentation for Object  Reconstruction from a Handheld Camera\nCLEAR: Covariant LEAst-square Re-fitting with applications to image  restoration\nHow many faces can be recognized? Performance extrapolation for  multi-class classification\nLearning feed-forward one-shot learners\nHuman Attention in Visual Question Answering: Do Humans and Deep  Networks Look at the Same Regions?\nGenerating Object Cluster Hierarchies for Benchmarking\nA Survey of Pansharpening Methods with A New Band-Decoupled Variational  Model\nPerfect Fingerprint Orientation Fields by Locally Adaptive Global Models\nDualNet: Domain-Invariant Network for Visual Question Answering\nDetection and Tracking of Liquids with Fully Convolutional Networks\nRecognizing Surgical Activities with Recurrent Neural Networks\nMultiple Instance Hyperspectral Target Characterization\nEfficient 2D and 3D Facade Segmentation using Auto-Context\n3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation\nAutomatic Neuron Detection in Calcium Imaging Data Using Convolutional  Networks\nLearning to Poke by Poking: Experiential Learning of Intuitive Physics\nCoupled Generative Adversarial Networks\nMultipartite Ranking-Selection of Low-Dimensional Instances by  Supervised Projection to High-Dimensional Space\nA Taxonomy and Library for Visualizing Learned Features in Convolutional  Neural Networks\nTraining LDCRF 
model on unsegmented sequences using Connectionist  Temporal Classification\nScalable image coding based on epitomes\nLCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior  Learning\nGeometry in Active Learning for Binary and Multi-class Image  Segmentation\nA spectral-spatial fusion model for robust blood pulse waveform  extraction in photoplethysmographic imaging\nMultiphase Segmentation For Simultaneously Homogeneous and Textural  Images\nZero-Shot Learning with Multi-Battery Factor Analysis\nParking Stall Vacancy Indicator System Based on Deep Convolutional  Neural Networks\nmaskSLIC: Regional Superpixel Generation with Application to Local  Pathology Characterisation in Medical Images\nFully-Convolutional Siamese Networks for Object Tracking\nSparse Graphical Representation based Discriminant Analysis for  Heterogeneous Face Recognition\nNoise Models in Feature-based Stereo Visual Odometry\nMachine-based Multimodal Pain Assessment Tool for Infants: A Review\nKeyframe-based monocular SLAM: design, survey, and future directions\nActive Object Localization in Visual Situations\nAn Analysis System for DNA Gel Electrophoresis Images Based on Automatic  Thresholding an Enhancement\nAutomatic Techniques for Gridding cDNA Microarray Images\nRobust Deep Appearance Models\nA Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single  RGB Images\nFacial Expression Classification Using Rotation Slepian-based Moment  Invariants\nImproving Sparse Representation-Based Classification Using Local  Principal Component Analysis\nAggressive actions and anger detection from multiple modalities using  Kinect\nLearning the semantic structure of objects from Web supervision\nFeature Selection Library (MATLAB Toolbox)\nObject Recognition and Identification Using ESM Data\nOn a method for Rock Classification using Textural Features and Genetic  Optimization\nVideoLSTM Convolves, Attends and Flows for Action Recognition\nIterative Multi-domain Regularized 
Deep Learning for Anatomical  Structure Detection and Segmentation from Ultrasound Images\nDeep Depth Super-Resolution : Learning Depth Super-Resolution using Deep  Convolutional Neural Network\nUntrimmed Video Classification for Activity Detection: submission to  ActivityNet Challenge\nOvercoming Challenges in Fixed Point Training of Deep Convolutional  Networks\nSiamese Regression Networks with Efficient mid-level Feature Extraction  for 3D Object Pose Estimation\nNon-Central Catadioptric Cameras Pose Estimation using 3D Lines\nScreen Content Image Segmentation Using Robust Regression and Sparse  Decomposition\nA Photometrically Calibrated Benchmark For Monocular Visual Odometry\nAction Recognition with Joint Attention on Multi-Level Deep Features\nVisual Dynamics: Probabilistic Future Frame Synthesis via Cross  Convolutional Networks\nAugmenting Supervised Emotion Recognition with Rule-Based Decision Model\nTowards an \"In-the-Wild\" Emotion Dataset Using a Game-based Framework\nAdversarial Training For Sketch Retrieval\nEfficient Activity Detection in Untrimmed Video with Max-Subgraph Search\nHypergraph Modelling for Geometric Model Fitting\nLearning a metric for class-conditional KNN\nFast Cosine Transform to increase speed-up and efficiency of  Karhunen-Loeve Transform for lossy image compression\nGland Instance Segmentation by Deep Multichannel Side Supervision\nLocal feature hierarchy for face recognition across pose and  illumination\nWeakly Supervised Learning of Heterogeneous Concepts in Videos\nA Variational Model for Joint Motion Estimation and Image Reconstruction\nDeepBinaryMask: Learning a Binary Mask for Video Compressive Sensing\nA Representation Theory Perspective on Simultaneous Alignment and  Classification\nEnd-to-end training of object class detectors for mean average precision\nImproved Multi-Class Cost-Sensitive Boosting via Estimation of the  Minimum-Risk Class\nHierarchical learning for DNN-based acoustic scene classification\nAdaptable 
Precomputation for Random Walker Image Segmentation and  Registration\nEnd-to-End Learning for Image Burst Deblurring\nA Real-Time Deep Learning Pedestrian Detector for Robot Navigation\nDAVE: A Unified Framework for Fast Vehicle Detection and Annotation\nSpatial Context based Angular Information Preserving Projection for  Hyperspectral Image Classification\nPerson Re-identification with Hyperspectral Multi-Camera Systems --- A  Pilot Study\nConstruction of extended 3D field of views of the internal bladder wall  surface: a proof of concept\nGland Instance Segmentation by Deep Multichannel Neural Networks\nComposite Kernel Local Angular Discriminant Analysis for Multi-Sensor  Geospatial Image Analysis\nSparse Representation-Based Classification: Orthogonal Least Squares or  Orthogonal Matching Pursuit?\nEnd-to-end optimization of nonlinear transform codes for perceptual  quality\nDeep Cascaded Bi-Network for Face Hallucination\nRecycle deep features for better object detection\nQuery-Focused Extractive Video Summarization\nHeMIS: Hetero-Modal Image Segmentation\nA Multi-task Deep Network for Person Re-identification\nGenerating Images Part by Part with Composite Generative Adversarial  Networks\nTraining Skinny Deep Neural Networks with Iterative Hard Thresholding  Methods\nSupervised Transformer Network for Efficient Face Detection\nDual Purpose Hashing\nPerson Re-identification for Real-world Surveillance Systems\nOn the Modeling of Error Functions as High Dimensional Landscapes for  Weight Initialization in Learning Networks\nDeep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose  Estimation\nHashmod: A Hashing Method for Scalable 3D Object Detection\nLocal Multiple Directional Pattern of Palmprint Image\nHaze Visibility Enhancement: A Survey and Quantitative Benchmarking\nReal-Time Intensity-Image Reconstruction for Event Cameras Using  Manifold Regularisation\nConfidence-Weighted Local Expression Predictions for Occlusion Handling  in 
Expression Recognition and Action Unit detection\nA Multi-cut Formulation for Joint Segmentation and Tracking of Multiple  Objects\nFast Robust Monocular Depth Estimation for Obstacle Detection with Fully  Convolutional Networks\nReasoning about Body-Parts Relations for Sign Language Recognition\nHierarchical Attention Network for Action Recognition in Videos\nPrior-based Coregistration and Cosegmentation\nAn ensemble learning method for scene classification based on Hidden  Markov Model image representation\nA probabilistic patch based image representation using Conditional  Random Field model for image classification\nRecurrent Regression for Face Recognition\nDeepWarp: Photorealistic Image Resynthesis for Gaze Manipulation\nLocal- and Holistic- Structure Preserving Image Super Resolution via  Deep Joint Component Learning\nAutomatic Attribute Discovery with Neural Activations\nLearning Aligned Cross-Modal Representations from Weakly Aligned Data\nAutomated quantification of one-dimensional nanostructure alignment on  surfaces\nSymmetry-free SDP Relaxations for Affine Subspace Clustering\nSalient Object Subitizing\nSemantic Image Inpainting with Deep Generative Models\nGeneric 3D Convolutional Fusion for image restoration\nScale Invariant Interest Points with Shearlets\nRegion-based semantic segmentation with end-to-end training\nJoint Optical Flow and Temporally Consistent Semantic Segmentation\nHow scientific literature has been evolving over the time? 
A novel  statistical approach using tracking verbal-based methods\nA Continuous Optimization Approach for Efficient and Accurate Scene Flow\nATGV-Net: Accurate Depth Super-Resolution\nImproving Semantic Embedding Consistency by Metric Learning for  Zero-Shot Classification\nMLPnP - A Real-Time Maximum Likelihood Solution to the  Perspective-n-Point Problem\nA Siamese Long Short-Term Memory Architecture for Human  Re-Identification\nFaceless Person Recognition; Privacy Implications in Social Media\nA Nonlocal Denoising Algorithm for Manifold-Valued Images Using Second  Order Statistics\nFine-To-Coarse Global Registration of RGB-D Scans\nConnectionist Temporal Modeling for Weakly Supervised Action Labeling\nSwiDeN : Convolutional Neural Networks For Depiction Invariant Object  Recognition\nAnalysis of a low memory implementation of the Orthogonal Matching  Pursuit greedy strategy\nSegmentation Free Object Discovery in Video\nAutonomous driving challenge: To Infer the property of a dynamic object  based on its motion pattern using recurrent neural network\nBuilt-in Foreground/Background Prior for Weakly-Supervised Semantic  Segmentation\nGreedy MAXCUT Algorithms and their Information Content\nStochastic Learning of Multi-Instance Dictionary for Earth Mover's  Distance based Histogram Comparison\nTowards Segmenting Consumer Stereo Videos: Benchmark, Baselines and  Ensembles\nA Probabilistic Optimum-Path Forest Classifier for Binary Classification  Problems\nCombining Fully Convolutional and Recurrent Neural Networks for 3D  Biomedical Image Segmentation\nA Deep Multi-Level Network for Saliency Prediction\nA max-cut approach to heterogeneity in cryo-electron microscopy\nEfficient Volumetric Fusion of Airborne and Street-Side Data for Urban  Reconstruction\nObject Specific Deep Learning Feature and Its Application to Face  Detection\nMulti-instance Dynamic Ordinal Random Fields for Weakly-Supervised Pain  Intensity Estimation\nJoint Alignment of Multiple Point Sets with 
Batch and Incremental  Expectation-Maximization\nConfidence-aware Levenberg-Marquardt optimization for joint motion  estimation and super-resolution\nBest-Buddies Similarity - Robust Template Matching using Mutual Nearest  Neighbors\nDiscriminating image textures with the multiscale two-dimensional  complexity-entropy causality plane\nHuman pose estimation via Convolutional Part Heatmap Regression\nPerformance Measures and a Data Set for Multi-Target, Multi-Camera  Tracking\nA Boosting Method to Face Image Super-resolution\nDAiSEE: Towards User Engagement Recognition in the Wild\nObject Tracking via Dynamic Feature Selection Processes\nHuman Body Orientation Estimation using Convolutional Neural Network\nOptimizing Codes for Source Separation in Color Image Demosaicing and  Compressive Video Recovery\nAutomated Segmentation of Retinal Layers from Optical Coherent  Tomography Images Using Geodesic Distance\nEar-to-ear Capture of Facial Intrinsics\nQuantifying Radiographic Knee Osteoarthritis Severity using Deep  Convolutional Neural Networks\nLatest Datasets and Technologies Presented in the Workshop on Grasping  and Manipulation Datasets\nGenerating Videos with Scene Dynamics\nAutomatic Selection of Stochastic Watershed Hierarchies\nThe Role of Context Selection in Object Detection\nLearning-Based View Synthesis for Light Field Cameras\nSequential Deep Trajectory Descriptor for Action Recognition with  Three-stream CNN\nStyle-Transfer via Texture-Synthesis\nLearning Semantic Part-Based Models from Google Images\nActive Canny: Edge Detection and Recovery with Open Active Contour  Models\nMUG: A Parameterless No-Reference JPEG Quality Evaluator Robust to Block  Size and Misalignment\nHyperspectral Unmixing with Endmember Variability using Partial  Membership Latent Dirichlet Allocation\nA Multi-Scale Cascade Fully Convolutional Network Face Detector\nDilemma First Search for Effortless Optimization of NP-Hard Problems\nDetecting Text in Natural Image with 
Connectionist Text Proposal Network\nReliable Attribute-Based Object Recognition Using High Predictive Value  Classifiers\n3D Simulation for Robot Arm Control with Deep Q-Learning\nImage Decomposition Using a Robust Regression Approach\nVIPLFaceNet: An Open Source Deep Face Recognition SDK\nThe CUDA LATCH Binary Descriptor: Because Sometimes Faster Means Better\nSingle-image RGB Photometric Stereo With Spatially-varying Albedo\nContextLocNet: Context-Aware Deep Network Models for Weakly Supervised  Localization\nCombining Texture and Shape Cues for Object Recognition With Minimal  Supervision\nLearning Robust Features for Gait Recognition by Maximum Margin  Criterion\nTransport-based analysis, modeling, and learning from signal and data  distributions\nFrom the Skin-Depth Equation to the Inverse RFEC Sensor Model\nA Glimpse Far into the Future: Understanding Long-term Crowd Worker  Quality\nVisual Stability Prediction and Its Application to Manipulation\nUnbiased Sparse Subspace Clustering By Selective Pursuit\nRadon-Gabor Barcodes for Medical Image Retrieval\nSemanticFusion: Dense 3D Semantic Mapping with Convolutional Neural  Networks\nDeep Kinematic Pose Regression\nPose from Action: Unsupervised Learning of Pose Features based on Motion\nConsistent Discretization and Minimization of the L1 Norm on Manifolds\nCoarse-to-fine Surgical Instrument Detection for Cataract Surgery  Monitoring\nDeep CTR Prediction in Display Advertising\nA very fast iterative algorithm for TV-regularized image reconstruction  with applications to low-dose and few-view CT\nTransfer Learning for Material Classification using Convolutional  Networks\nHands-Free Segmentation of Medical Volumes via Binary Inputs\nMatrix Variate RBM Model with Gaussian Distributions\nDetecting facial landmarks in the video based on a hybrid framework\nMulti-View Constraint Propagation with Consensus Prior Knowledge\nProduction-Level Facial Performance Capture Using Deep Convolutional  Neural 
Networks\nFaceNet2ExpNet: Regularizing a Deep Face Recognition Net for Expression  Recognition\nPixelNet: Towards a General Pixel-level Architecture\nHow should we evaluate supervised hashing?\nDeep Learning for Video Classification and Captioning\nHow Useful is Region-based Classification of Remote Sensing Images in a  Deep Learning Framework?\nWalker-Independent Features for Gait Recognition from Motion Capture  Data\nCustomized Facial Constant Positive Air Pressure (CPAP) Masks\nDeep Quality: A Deep No-reference Quality Assessment System\nExample-Based Image Synthesis via Randomized Patch-Matching\nReal-time Human Pose Estimation from Video with Convolutional Neural  Networks\nA Rotation Invariant Latent Factor Model for Moveme Discovery from  Static Poses\nThree Tiers Neighborhood Graph and Multi-graph Fusion Ranking for  Multi-feature Image Retrieval: A Manifold Aspect\nPerceptual uniform descriptor and Ranking on manifold: A bridge between  image representation and ranking for image retrieval\nDeep learning based fence segmentation and removal from an image using a  video sequence\nLinear Support Tensor Machine: Pedestrian Detection in Thermal Infrared  Images\nRobust Regression For Image Binarization Under Heavy Noises and  Nonuniform Background\nSwipe Mosaics from Video\nImage Retrieval with Fisher Vectors of Binary Features\nTensor Based Second Order Variational Model for Image Reconstruction\nHouse price estimation from visual and textual features\nLearning convolutional neural network to maximize Pos@Top performance  measure\nBlind Facial Image Quality Enhancement using Non-Rigid Semantic Patches\nTask Specific Adversarial Cost Function\nA Transportation $L^p$ Distance for Signal Analysis\nScalable Discrete Supervised Hash Learning with Asymmetric Matrix  Factorization\nUnderstanding data augmentation for classification: when to warp?\nTransforming building industry and health outcomes through social  data-supported design\nEffective Combination of 
Language and Vision Through Model Composition  and the R-CCA Method\nA Discriminative Framework for Anomaly Detection in Large Videos\nLearning to Push by Grasping: Using multiple tasks for effective  learning\nStructure-Aware Classification using Supervised Dictionary Learning\nCNN-aware Binary Map for General Semantic Segmentation\nA comparative study of complexity of handwritten Bharati characters with  that of major Indian scripts\nModelling depth for nonparametric foreground segmentation using RGBD  devices\nRobust Moving Objects Detection in Lidar Data Exploiting Visual Cues\nDeep Tracking on the Move: Learning to Track the World from a Moving  Vehicle using Recurrent Neural Networks\nCooperative Training of Descriptor and Generator Networks\nA Searchlight Factor Model Approach for Locating Shared Information in  Multi-Subject fMRI Analysis\nMulti-dimensional signal approximation with sparse structured priors  using split Bregman iterations\nTwo-stage Convolutional Part Heatmap Regression for the 1st 3D Face  Alignment in the Wild (3DFAW) Challenge\nA CNN Cascade for Landmark Guided Semantic Part Segmentation\nLatent fingerprint minutia extraction using fully convolutional network\nNear-Infrared Image Dehazing Via Color Regularization\nDeep Learning Algorithms for Signal Recognition in Long Perimeter  Monitoring Distributed Fiber Optic Sensors\nDeep Feature Consistent Variational Autoencoder\nPlug-and-Play CNN for Crowd Motion Analysis: An Application in Abnormal  Event Detection\nRain structure transfer using an exemplar rain image for synthetic rain  image generation\nVideo Pixel Networks\nDeep Visual Foresight for Planning Robot Motion\nSparsity-based Color Image Super Resolution via Exploiting Cross Channel  Constraints\nMulti-View Representation Learning: A Survey from Shallow Methods to  Deep Methods\nFind Your Own Way: Weakly-Supervised Segmentation of Path Proposals for  Urban Autonomy\nDomain Adaptation with Soft-margin multiple feature-kernel 
learning  beats Deep Learning for surveillance face recognition\nRecognizing and Presenting the Storytelling Video Structure with Deep  Multimodal Networks\nLearning Optimal Parameters for Multi-target Tracking with Contextual  Interactions\nTemplate shape estimation: correcting an asymptotic bias\nSupervision via Competition: Robot Adversaries for Learning Tasks\nPCA-aided Fully Convolutional Networks for Semantic Segmentation of  Multi-channel fMRI\nSearching Scenes by Abstracting Things\nDo They All Look the Same? Deciphering Chinese, Japanese and Koreans by  Fine-Grained Deep Learning\nUtilizing High-level Visual Feature for Indoor Shopping Mall Navigation\nXception: Deep Learning with Depthwise Separable Convolutions\nApproximate Nearest Neighbor Search on High Dimensional Data ---  Experiments, Analyses, and Improvement (v1.0)\n4D Crop Monitoring: Spatio-Temporal Reconstruction for Agriculture\nBoost K-Means\nVisual Closed-Loop Control for Pouring Liquids\nLearning Spatial-Semantic Context with Fully Convolutional Recurrent  Network for Online Handwritten Chinese Text Recognition\nZero Shot Hashing\nContent Based Image Retrieval (CBIR) in Remote Clinical Diagnosis and  Healthcare\nLearning Low Dimensional Convolutional Neural Networks for  High-Resolution Remote Sensing Image Retrieval\nFaceVR: Real-Time Facial Reenactment and Eye Gaze Control in Virtual  Reality\nRestoring STM images via Sparse Coding: noise and artifact removal\nFused DNN: A deep neural network fusion approach to fast and robust  pedestrian detection\nDeep Learning Assessment of Tumor Proliferation in Breast Cancer  Histological Images\nVisual Place Recognition with Probabilistic Vertex Voting\nFast Training of Convolutional Neural Networks via Kernel Rescaling\nDeep Fruit Detection in Orchards\nDetecting Unseen Falls from Wearable Devices using Channel-wise Ensemble  of Autoencoders\nRecursive Diffeomorphism-Based Regression for Shape Functions\nSemi-Coupled Two-Stream Fusion ConvNets for 
Action Recognition at  Extremely Low Resolutions\nPredicting the dynamics of 2d objects with a deep residual network\nVideo Fill in the Blank with Merging LSTMs\nTowards end-to-end optimisation of functional image analysis pipelines\nAutomatic View-Point Selection for Inter-Operative Endoscopic  Surveillance\nImproved phase-unwrapping method using geometric constraints\nAre Accuracy and Robustness Correlated?\nA Closed Form Solution to Multi-View Low-Rank Regression\nIncremental One-Class Models for Data Classification\nRecovering the Missing Link: Predicting Class-Attribute Associations for  Unsupervised Zero-Shot Learning\nARTiS: Appearance-based Action Recognition in Task Space for Real-Time  Human-Robot Collaboration\nEdge Based Grid Super-Imposition for Crowd Emotion Recognition\nMaster's Thesis : Deep Learning for Visual Recognition\nDeep Identity-aware Transfer of Facial Attributes\nFrom Traditional to Modern : Domain Adaptation for Action Classification  in Short Social Video Clips\nLensless Imaging with Compressive Ultrafast Sensing\nStuffNet: Using 'Stuff' to Improve Object Detection\nPOI: Multiple Object Tracking with High Performance Detection and  Appearance Feature\nAdaptive Substring Extraction and Modified Local NBNN Scoring for Binary  Feature-based Local Mobile Visual Search without False Positives\nDynamic Probabilistic Network Based Human Action Recognition\nEfficient Estimation of Compressible State-Space Models with Application  to Calcium Signal Deconvolution\nORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D  Cameras\nModel-based Outdoor Performance Capture\nFine-grained Recognition in the Noisy Wild: Sensitivity Analysis of  Convolutional Neural Networks Approaches\nSpectral Angle Based Unary Energy Functions for Spatial-Spectral  Hyperspectral Classification using Markov Random Fields\nMultitask Learning of Vegetation Biochemistry from Hyperspectral Data\nOptimization on Submanifolds of Convolution Kernels in 
CNNs\nExercise Motion Classification from Large-Scale Wearable Sensor Data  Using Convolutional Neural Networks\nSPiKeS: Superpixel-Keypoints Structure for Robust Visual Tracking\nTheoretical Analysis of Active Contours on Graphs\nFeature Sensitive Label Fusion with Random Walker for Atlas-based Image  Segmentation\nAutomatic and Manual Segmentation of Hippocampus in Epileptic Patients  MRI\nA data augmentation methodology for training machine/deep learning gait  recognition algorithms\nLearning a Probabilistic Latent Space of Object Shapes via 3D  Generative-Adversarial Modeling\nA Learned Representation For Artistic Style\nCamera Fingerprint: A New Perspective for Identifying User's Identity\nActive User Authentication for Smartphones: A Challenge Data Set and  Benchmark Results\nPATH: Person Authentication using Trace Histories\nEstimating the concentration of gold nanoparticles incorporated on  Natural Rubber membranes using Multi-Level Starlet Optimal Segmentation\nPCM and APCM Revisited: An Uncertainty Perspective\nEstimation of Bandlimited Grayscale Images From the Single Bit  Observations of Pixels Affected by Additive Gaussian Noise\nVolumetric Light-field Encryption at the Microscopic Scale\nSingle- and Multi-Task Architectures for Surgical Workflow Challenge at  M2CAI 2016\nSingle- and Multi-Task Architectures for Tool Presence Detection  Challenge at M2CAI 2016\nDetecting People in Artwork with CNNs\nCross-Modal Scene Networks\nCompressive Holographic Video\nIcon: An Interactive Approach to Train Deep Neural Networks for  Segmentation of Neuronal Structures\nLearnable Visual Markers\nDetecting Breast Cancer using a Compressive Sensing Unmixing Algorithm\nLearning Adaptive Parameter Tuning for Image Processing\nAsynchronous Stochastic Block Coordinate Descent with Variance Reduction\nFlyCap: Markerless Motion Capture Using Multiple Autonomous Flying  Cameras\nConditional Image Synthesis With Auxiliary Classifier GANs\nCompressed Learning: A Deep Neural 
Network Approach\nAccurate Deep Representation Quantization with Gradient Snapping Layer  for Similarity Search\nVisual Tracking via Boolean Map Representations\nJoint Large-Scale Motion Estimation and Image Reconstruction\nConfocalGN : a minimalistic confocal image simulator\nBi-modal First Impressions Recognition using Temporally Ordered Deep  Audio and Stochastic Visual Features\nExploiting Spatio-Temporal Structure with Recurrent Winner-Take-All  Networks\nStructured illumination microscopy with unknown patterns and a  statistical prior\nCombining Multiple Cues for Visual Madlibs Question Answering\nStatistical Inverse Formulation of Optical Flow with Uncertainty  Quantification\nAdversarial Machine Learning at Scale\nLearning Identity Mappings with Residual Gates\nWhat Is the Best Practice for CNNs Applied to Visual Instance Retrieval?\nThe Shallow End: Empowering Shallower Deep-Convolutional Networks  through Auxiliary Outputs\nLearning to Act by Predicting the Future\nAction2Activity: Recognizing Complex Activities from Sensor Data\nChinese/English mixed Character Segmentation as Semantic Segmentation\nHamiltonian operator for spectral shape analysis\nSpatiotemporal Residual Networks for Video Action Recognition\nUnsupervised Cross-Domain Image Generation\nMeat adulteration detection through digital image analysis of  histological cuts using LBP\nMultiple Object Tracking with Kernelized Correlation Filters in Urban  Mixed Traffic\nThe Loss Surface of Residual Networks: Ensembles and the Role of Batch  Normalization\nEstimating motion with principal component regression strategies\nMultispectral Deep Neural Networks for Pedestrian Detection\nA backward pass through a CNN using a generative model of its  activations\nGaussian process regression can turn non-uniform and undersampled  diffusion MRI data into diffusion spectrum imaging\nNode-Adapt, Path-Adapt and Tree-Adapt:Model-Transfer Domain Adaptation  for Random Forest\nMahalanobis Distance for Class 
Averaging of Cryo-EM Images\nError concealment by means of motion refinement and regularized Bregman  divergence\nVariables effecting photomosaic reconstruction and ortho-rectification  from aerial survey datasets\nFast Algorithm of High-resolution Microwave Imaging Using the  Non-parametric Generalized Reflectivity Model\nConstruction Inspection through Spatial Database\nAdaptive Deep Pyramid Matching for Remote Sensing Scene Classification\nLearning Multi-Scale Deep Features for High-Resolution Satellite Image  Classification\nLearning to Navigate in Complex Environments\nLeveraging Video Descriptions to Learn Video Question Answering\nLeast Squares Generative Adversarial Networks\nHand Gesture Recognition for Contactless Device Control in Operating  Rooms\nBaseline CNN structure analysis for facial expression recognition\n3-D Convolutional Neural Networks for Glioblastoma Segmentation\nWhen Saliency Meets Sentiment: Understanding How Image Content Invokes  Emotion and Sentiment\nMotion Estimated-Compensated Reconstruction with Preserved-Features in  Free-Breathing Cardiac MRI\nScale-constrained Unsupervised Evaluation Method for Multi-scale Image  Segmentation\nDiversity encouraged learning of unsupervised LSTM ensemble for neural  activity video prediction\nOne-to-Many Network for Visually Pleasing Compression Artifacts  Reduction\nCost-Sensitive Deep Learning with Layer-Wise Cost Estimation\nOne-Shot Video Object Segmentation\nJoint Network based Attention for Action Recognition\nA Combinatorial Solution to Non-Rigid 3D Shape-to-Image Matching\nDeep Transfer Learning for Person Re-identification\nTemporal Convolutional Networks for Action Segmentation and Detection\nGeneralisation and Sharing in Triplet Convnets for Sketch based Visual  Search\nA Semi-supervised Framework for Image Captioning\nLip Reading Sentences in the Wild\nFully-adaptive Feature Sharing in Multi-Task Networks with Applications  in Person Attribute Classification\nDynamic 
Attention-controlled Cascaded Shape Regression Exploiting  Training Data Augmentation and Fuzzy-set Sample Weighting\nAggregated Residual Transformations for Deep Neural Networks\nSelf-calibration-based Approach to Critical Motion Sequences of  Rolling-shutter Structure from Motion\nProbabilistic Fluorescence-Based Synapse Detection\nSemantic Regularisation for Recurrent Image Annotation\nOn the Exploration of Convolutional Fusion Networks for Visual  Recognition\nDeep Feature Interpolation for Image Content Changes\nInstance-aware Image and Sentence Matching with Selective Multimodal  LSTM\nSCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks  for Image Captioning\nLearning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation\nBuilding Deep Networks on Grassmann Manifolds\nCross-Domain Face Verification: Matching ID Document and Self-Portrait  Photographs\nReweighted Low-Rank Tensor Decomposition based on t-SVD and its  Applications in Video Denoising\nReweighted Low-Rank Tensor Completion and its Applications in Video  Recovery\nOnline Visual Multi-Object Tracking via Labeled Random Finite Set  Filtering\nEnd-to-End Subtitle Detection and Recognition for Videos in East Asian  Languages via CNN Ensemble with Near-Human-Level Performance\nExpert Gate: Lifelong Learning with a Network of Experts\nEar Recognition: More Than a Survey\nUnderstanding Anatomy Classification Through Attentive Response Maps\nInvertible Conditional GANs for image editing\nDeep Outdoor Illumination Estimation\nOn The Stability of Video Detection and Tracking\nLearning Fully Convolutional Networks for Iterative Non-blind  Deconvolution\nObject Recognition with and without Objects\nRefineNet: Multi-Path Refinement Networks for High-Resolution Semantic  Segmentation\nTemporal Generative Adversarial Nets with Singular Value Clipping\nPhrase Localization and Visual Relationship Detection with Comprehensive  Image-Language Cues\nCascaded Face Alignment via Intimacy 
Definition Feature\nDeep Learning for the Classification of Lung Nodules\nResFeats: Residual Network Based Features for Image Classification\nDeep Temporal Linear Encoding Networks\nMulti-Modality Fusion based on Consensus-Voting and 3D Convolution for  Isolated Gesture Recognition\nEfficient Convolutional Neural Network with Binary Quantization Layer\nImage-to-Image Translation with Conditional Adversarial Networks\nLearning Multi-level Deep Representations for Image Emotion  Classification\nA Spatial and Temporal Non-Local Filter Based Data Fusion\nSingle-View and Multi-View Depth Fusion\nActive learning with version spaces for object detection\nSmart Library: Identifying Books in a Library using Richly Supervised  Deep Scene Text Reading\nGrad-CAM: Why did you say that?\nScene Labeling using Gated Recurrent Units with Explicit Long Range  Conditioning\nSelf-learning Scene-specific Pedestrian Detectors using a Progressive  Latent Model\nSar image despeckling based on nonlocal similarity sparse decomposition\nLearning Joint Feature Adaptation for Zero-Shot Recognition\nFast Fourier Color Constancy\nT-CONV: A Convolutional Neural Network For Multi-scale Taxi Trajectory  Prediction\nDeep Convolutional Neural Networks with Merge-and-Run Mappings\niCaRL: Incremental Classifier and Representation Learning\nPoseTrack: Joint Multi-Person Pose Estimation and Tracking\nConvergence Analysis of MAP based Blur Kernel Estimation\nControlling Perceptual Factors in Neural Style Transfer\nThe World of Fast Moving Objects\nImage-based localization using LSTMs for structured feature correlation\nImage Segmentation Using Overlapping Group Sparsity\nDeep Restricted Boltzmann Networks\nStraight to Shapes: Real-time Detection of Encoded Shapes\nRealtime Multi-Person 2D Pose Estimation using Part Affinity Fields\nRecalling Holistic Information for Semantic Segmentation\nExtraction of airway trees using multiple hypothesis tracking and  template matching\nComparative study of histogram 
distance measures for re-identification\nAdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for  Human Action Recognition in Videos\nWeakly Supervised Cascaded Convolutional Networks\nLearning an Invariant Hilbert Space for Domain Adaptation\nDeep Video Deblurring\nNeural Machine Translation with Latent Semantic of Image and Text\nDirectional Mean Curvature for Textured Image Demixing\nTexture analysis using deterministic partially self-avoiding walk with  thresholds\nConvolutional Experts Constrained Local Model for Facial Landmark  Detection\nReal-Time Video Highlights for Yahoo Esports\nSAD-GAN: Synthetic Autonomous Driving using Generative Adversarial  Networks\nDeep Deformable Registration: Enhancing Accuracy by Fully Convolutional  Neural Net\nLong-Term Image Boundary Prediction\nUniform Information Segmentation\nSemantic Scene Completion from a Single Depth Image\nImproving Fully Convolution Network for Semantic Segmentation\n3D Human Pose Estimation from a Single Image via Distance Matrix  Regression\nAwesome Typography: Statistics-Based Text Effects Transfer\nSocial Scene Understanding: End-to-End Multi-Person Action Localization  and Collective Activity Recognition\nSpatio-Temporal Movements in Team Sports: A Visualization approach using  Motion Charts\nWho's that Actor? 
Automatic Labelling of Actors in TV series starting  from IMDB Images\nGaze Embeddings for Zero-Shot Image Classification\nHierarchical Boundary-Aware Neural Encoder for Video Captioning\nSocial Behavior Prediction from First Person Videos\nDeep Quantization: Encoding Convolutional Activations with Deep  Generative Model\nOcclusion-Aware Video Deblurring with a New Layered Blur Model\nFast Face-swap Using Convolutional Neural Networks\nA Large-scale Distributed Video Parsing and Evaluation Platform\nSurveillance Video Parsing with Single Frame Supervision\nInterpoNet, A brain inspired neural network for optical flow dense  interpolation\nSplit-Brain Autoencoders: Unsupervised Learning by Cross-Channel  Prediction\nWeakly-supervised Discriminative Patch Learning via CNN for Fine-grained  Recognition\nAttend in groups: a weakly-supervised deep learning framework for  learning from web data\nHigh-Resolution Image Inpainting using Multi-Scale Neural Patch  Synthesis\nModeling Relationships in Referential Expressions with Compositional  Modular Networks\nDeep Cuboid Detection: Beyond 2D Bounding Boxes\nActive Deep Learning for Classification of Hyperspectral Images\nUser Dependent Features in Online Signature Verification\nEffective Quantization Methods for Recurrent Neural Networks\nSync-DRAW: Automatic Video Generation using Deep Recurrent Attentive  Architectures\nGeneralized Fourier-Bessel operator and almost-periodic interpolation  and approximation\nBeyond standard benchmarks: Parameterizing performance evaluation in  visual object tracking\nCDVAE: Co-embedding Deep Variational Auto Encoder for Conditional  Variational Generation\nRMPE: Regional Multi-person Pose Estimation\nBASS Net: Band-Adaptive Spectral-Spatial Feature Learning Neural Network  for Hyperspectral Image Classification\nMonge's Optimal Transport Distance for Image Classification\nFlight Dynamics-based Recovery of a UAV Trajectory using Ground Cameras\nLearning in an Uncertain World: Representing 
Ambiguity Through Multiple  Hypotheses\nLearning to Generate Images of Outdoor Scenes from Attributes and  Semantic Layouts\nVideo Captioning with Multi-Faceted Attention\nPlaying Doom with SLAM-Augmented Deep Reinforcement Learning\nAnomaly Detection in Video Using Predictive Convolutional Long  Short-Term Memory Networks\nLearning Shape Abstractions by Assembling Volumetric Primitives\nComputerized Multiparametric MR image Analysis for Prostate Cancer  Aggressiveness-Assessment\nTorontoCity: Seeing the World with a Million Eyes\nIn Teacher We Trust: Learning Compressed Models for Pedestrian Detection\nGuided Open Vocabulary Image Captioning with Constrained Beam Search\nPointNet: Deep Learning on Point Sets for 3D Classification and  Segmentation\nLearning to Search on Manifolds for 3D Pose Estimation of Articulated  Objects\nA Point Set Generation Network for 3D Object Reconstruction from a  Single Image\nGlobally Consistent Multi-People Tracking using Motion Patterns\nSyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation\nCentrog Feature technique for vehicle type recognition at day and night  times\nVoxelwise nonlinear regression toolbox for neuroimage analysis:  Application to aging and neurodegenerative disease modeling\nIdentifying and Categorizing Anomalies in Retinal Imaging Data\nAction Recognition with Dynamic Image Networks\nA Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images\nCommonly Uncommon: Semantic Sparsity in Situation Recognition\nMining Spatio-temporal Data on Industrialization from Historical  Registries\nShort-term traffic flow forecasting with spatial-temporal correlation in  a hybrid deep learning framework\nAreas of Attention for Image Captioning\nSemi-Automated Annotation of Discrete States in Large Video Datasets\nLearning to Segment Object Proposals via Recursive Neural Networks\nWord Recognition with Deep Conditional Random Fields\nSkin Cancer Detection and Tracking using Data Synthesis and Deep  
Learning\nEnd-to-end Learning of Driving Models from Large-scale Video Datasets\nPyramid Scene Parsing Network\nGeneral models for rational cameras and the case of two-slit projections\nWho is Mistaken?\nDeep Metric Learning via Facility Location\nDeep Multi-Modal Image Correspondence Learning\nLocal Blur Mapping: Exploiting High-Level Semantics by Deep Neural  Networks\nTurning an Urban Scene Video into a Cinemagraph\nCancerous Nuclei Detection and Scoring in Breast Cancer  Histopathological Images\nDeep Image Category Discovery using a Transferred Similarity Function\nPanoramic Structure from Motion via Geometric Relationship Detection\nHuman-In-The-Loop Person Re-Identification\nAuthoring image decompositions with generative models\nROAM: a Rich Object Appearance Model with Application to Rotoscoping\nAutomatic Event Detection for Signal-based Surveillance\nMarioQA: Answering Questions by Watching Gameplay Videos\nDeep Stereo Matching with Dense CRF Priors\nVideo Ladder Networks\nFLIC: Fast Linear Iterative Clustering with Active Search\nKnowing When to Look: Adaptive Attention via A Visual Sentinel for Image  Captioning\nMultimodal Transfer: A Hierarchical Deep Convolutional Neural Network  for Fast Artistic Style Transfer\nLearning Diverse Image Colorization\nCore Sampling Framework for Pixel Classification\nLearning Localized Geometric Features Using 3D-CNN: An Application to  Manufacturability Analysis of Drilled Holes\nConsensus Based Medical Image Segmentation Using Semi-Supervised  Learning And Graph Cuts\nFusion of Range and Thermal Images for Person Detection\nSaliency Driven Image Manipulation\nA Matrix Splitting Method for Composite Function Minimization\nPano2Vid: Automatic Cinematography for Watching 360$^{\\circ}$ Videos\nResearch on the Multiple Feature Fusion Image Retrieval Algorithm based  on Texture Feature and Rough Set Theory\nDiscrete Schroedinger Transform For Texture Recognition\nComplex Matrix Factorization for Face Recognition\nAn 
Efficient Algorithm for the Piecewise-Smooth Model with Approximately  Explicit Solutions\nAGA: Attribute Guided Augmentation\nFilter sharing: Efficient learning of parameters for volumetric  convolutions\nLearning Video Object Segmentation from Static Images\nFCNs in the Wild: Pixel-level Adversarial and Constraint-based  Adaptation\nDomain knowledge assisted cyst segmentation in OCT retinal images\nPredicting Ground-Level Scene Layout from Aerial Imagery\nA Maximum A Posteriori Estimation Framework for Robust High Dynamic  Range Video Synthesis\n3D Shape Segmentation with Projective Convolutional Networks\nDeep TEN: Texture Encoding Network\nExploiting 2D Floorplan for Building-scale Panorama RGBD Alignment\nFast Fourier single-pixel imaging using binary illumination\nActionFlowNet: Learning Motion Representation for Action Recognition\nFollowing Gaze Across Views\nBoundary-aware Instance Segmentation\nTowards an Automated Image De-fencing Algorithm Using Sparsity\nText-guided Attention Model for Image Captioning\nA Binary Convolutional Encoder-decoder Network for Real-time Natural  Scene Text Processing\nGeneralizable Features From Unsupervised Learning\nDeep Supervised Hashing with Triplet Labels\nSpatial Pyramid Convolutional Neural Network for Social Event Detection  in Static Image\nFast Patch-based Style Transfer of Arbitrary Style\nDisentangling Space and Time in Video with Hierarchical Variational  Auto-encoders\nSingle Image Action Recognition using Semantic Body Part Actions\nAstronomical image reconstruction with convolutional neural networks\nPermutation-equivariant neural networks applied to dynamics prediction\nEfficient phase retrieval based on dark fringe recognition with an  ability of bypassing invalid fringes\nLinear feature detection algorithm for astronomical surveys - I.  
Algorithm description\nSuper-resolution Reconstruction of SAR Image based on Non-Local Means  Denoising Combined with BP Neural Network\nFast-AT: Fast Automatic Thumbnail Generation using Deep Neural Networks\nA fuzzy approach for segmentation of touching characters\nBorder-Peeling Clustering\nRegressing Robust and Discriminative 3D Morphable Models with a very  Deep Neural Network\nTowards Score Following in Sheet Music Images\nCoupling Adaptive Batch Sizes with Learning Rates\nLearning Residual Images for Face Attribute Manipulation\nOutput Constraint Transfer for Kernelized Correlation Filter in Tracking\nDeep Residual Hashing\nSymbioCity: Smart Cities for Smarter Networks\nUnsupervised Pixel-Level Domain Adaptation with Generative Adversarial  Networks\nOn the crucial impact of the coupling projector-backprojector in  iterative tomographic reconstruction\nA Fusion Method Based on Decision Reliability Ratio for Finger Vein  Verification\nMicroscopic Muscle Image Enhancement\nLearning to predict where to look in interactive environments using deep  recurrent q-learning\n3D Shape Induction from 2D Views of Multiple Objects\nDeep Learning on Lie Groups for Skeleton-based Action Recognition\nAdversarial Deep Structural Networks for Mammographic Mass Segmentation\nX-ray In-Depth Decomposition: Revealing The Latent Structures\nActive and Continuous Exploration with Deep Neural Networks and Expected  Model Output Changes\nCrowd collectiveness measure via graph-based node clique learning\nSemantic Jitter: Dense Supervision for Visual Comparisons via Synthetic  Images\nAsynchronous Temporal Fields for Action Recognition\nFractal Descriptors of Texture Images Based on the Triangular Prism  Dimension\nBinary Distance Transform to Improve Feature Extraction\nExploring Structure for Long-Term Tracking of Multiple Objects in Sports  Videos\n3D Human Pose Estimation = 2D Pose Estimation + Matching\nEnd-to-End Pedestrian Collision Warning System based on a Convolutional  Neural 
Network with Semantic Segmentation\nDynamic Action Recognition: A convolutional neural network model for  temporally organized joint location data\nTwo decades of local binary patterns: A survey\nUnsupervised Place Discovery for Visual Place Classification\nTemporal Tessellation: A Unified Approach for Video Analysis\nImage biomarker standardisation initiative\nAn Empirical Study of Language CNN for Image Captioning\nTrilaminar Multiway Reconstruction Tree for Efficient Large Scale  Structure from Motion\nA Unified Framework for Tumor Proliferation Score Prediction in Breast  Histopathology\nLearning Motion Patterns in Videos\nTop-down Visual Saliency Guided by Captions\nDeep Blind Compressed Sensing\nHandwriting recognition using Cohort of LSTM and lexicon verification  with extremely large lexicon\nAdversarial Examples Detection in Deep Networks with Convolutional  Filter Statistics\nFirst-Person Activity Forecasting with Online Inverse Reinforcement  Learning\nEnhanceNet: Single Image Super-Resolution Through Automated Texture  Synthesis\nUnderstanding Non-optical Remote-sensed Images: Needs, Challenges and  Ways Forward\nCorrelation Preserving Sparse Coding Over Multi-level Dictionaries for  Image Denoising\nJoint denoising and distortion correction of atomic scale scanning  transmission electron microscopy images\nYOLO9000: Better, Faster, Stronger\nExtracting Sub-Exposure Images from a Single Capture Through  Fourier-based Optical Modulation\nAn Automated CNN Recommendation System for Image Classification Tasks\nEnd-to-End Data Visualization by Metric Learning and Coordinate  Transformation\nAn FFT-based Synchronization Approach to Recognize Human Behaviors using  STN-LFP Signal\nMultivariate mixture model for myocardium segmentation combining  multi-source images\nFastMask: Segment Multi-scale Object Candidates in One Shot\nMARTA GANs: Unsupervised Representation Learning for Remote Sensing  Image Classification\nPartial Membership Latent Dirichlet 
Allocation\nLearning Visual N-Grams from Web Data\nGeneralized Intersection Kernel\nDeep Learning Logo Detection with Data Expansion by Synthesising Context\nA Unified Tensor-based Active Appearance Face Model\np-DLA: A Predictive System Model for Onshore Oil and Gas Pipeline  Dataset Classification and Monitoring - Part 1\nEgoCap: Egocentric Marker-less Motion Capture with Two Fisheye Cameras  (Extended Abstract)\nThe Geodesic Distance between $\\mathcal{G}_I^0$ Models and its  Application to Region Discrimination\nLifting from the Deep: Convolutional 3D Pose Estimation from a Single  Image\nWeakly Supervised Semantic Segmentation using Web-Crawled Videos\nRetrieving Similar X-Ray Images from Big Image Data Using Radon Barcodes  with Single Projections\nDeep-HiTS: Rotation Invariant Convolutional Neural Network for Transient  Detection\nImage denoising using group sparsity residual and external nonlocal  self-similarity prior\nConstrained Deep Weak Supervision for Histopathology Image Segmentation\nSemi-Supervised Endmember Identification In Nonlinear Spectral Mixtures  Via Semantic Representation\nLearning a Mixture of Deep Networks for Single Image Super-Resolution\nAn Evaluation Framework and Database for MoCap-Based Gait Recognition  Methods\nThe Dem@Care Experiments and Datasets: a Technical Report\nQuantitative Analysis of Automatic Image Cropping Algorithms: A Dataset  and Comparative Study\nMotion Deblurring in the Wild\nDistinguishing Posed and Spontaneous Smiles by Facial Dynamics\nTo Boost or Not to Boost? 
On the Limits of Boosted Trees for Object  Detection\nLarge-scale Isolated Gesture Recognition Using Convolutional Neural  Networks\nOriented Response Networks\nSign Language Recognition Using Temporal Classification\nDeepFace: Face Generation using Deep Learning\nTracking The Untrackable: Learning To Track Multiple Cues with Long-Term  Dependencies\nMS and PAN image fusion by combining Brovey and wavelet methods\nImproved Texture Networks: Maximizing Quality and Diversity in  Feed-forward Stylization and Texture Synthesis\nA Learning-based Variable Size Part Extraction Architecture for 6D  Object Pose Recovery in Depth\nMultiple Instance Hybrid Estimator for Learning Target Signatures\nInformation Pursuit: A Bayesian Framework for Sequential Scene Parsing\nScene Graph Generation by Iterative Message Passing\nDeep Learning for Logo Recognition\nChaLearn Looking at People: A Review of Events and Resources\nUnsupervised Image-to-Image Translation with Generative Adversarial  Networks\nFull-reference image quality assessment-based B-mode ultrasound image  similarity measure\nStochastic Generative Hashing\nContext-aware Captions from Context-agnostic Supervision\nMultivariate Regression with Grossly Corrupted Observations: A Robust  Approach and its Applications\nA More General Robust Loss Function\nKähler structures on spaces of framed curves\nOrdered Pooling of Optical Flow Sequences for Action Recognition\nProbabilistic Diffeomorphic Registration: Representing Uncertainty\nA Digital Fuzzy Edge Detector for Color Images\nJoint Dictionary Learning for Example-based Image Super-resolution\nComprehension-guided referring expressions\nMaximum Entropy Flow Networks\nReal-Time Optical flow-based Video Stabilization for Unmanned Aerial  Vehicles\nLearning Linear Dynamical Systems with High-Order Tensor Data for  Skeleton based Action Recognition\nBoosting Dictionary Learning with Error Codes\nIterative Block Tensor Singular Value Thresholding for Extraction of Low  Rank 
Component of Image Data\nUnderstanding the Effective Receptive Field in Deep Convolutional Neural  Networks\nAuxiliary Multimodal LSTM for Audio-visual Speech Recognition and  Lipreading\nAutomatic Spatial Context-Sensitive Cloud/Cloud-Shadow Detection in  Multi-Source Multi-Spectral Earth Observation Images: AutoCloud+\nHierarchical Salient Object Detection for Assisted Grasping\nClassification of MRI data using Deep Learning and Gaussian  Process-based Model Selection\nFusing Deep Learned and Hand-Crafted Features of Appearance, Shape, and  Dynamics for Automatic Pain Estimation\nImage Generation and Editing with Variational Info Generative  AdversarialNetworks\nConvolutional Oriented Boundaries: From Image Segmentation to High-Level  Tasks\n3D Reconstruction of Simple Objects from A Single View Silhouette Image\nSynthesizing Normalized Faces from Facial Identity Features\nCompression of Deep Neural Networks for Image Instance Retrieval\nBringing Impressionism to Life with Neural Style Transfer in Come Swim\nPixel Objectness\nFusionSeg: Learning to combine motion and appearance for fully automatic  segmention of generic objects in videos\nMoving to VideoKifu: the last steps toward a fully automatic  record-keeping of a Go game\nHigher-order Pooling of CNN Features via Kernel Linearization for Action  Recognition\nSynthetic to Real Adaptation with Generative Correlation Alignment  Networks\nHigh Performance Novel Skin Segmentation Algorithm for Images With  Complex Background\nFast and Efficient Skin Detection for Facial Detection\nDual Recovery Network with Online Compensation for Image  Super-Resolution\nA Large-scale Dataset and Benchmark for Similar Trademark Retrieval\nLyrics-to-Audio Alignment by Unsupervised Discovery of Repetitive  Patterns in Vowel Acoustics\nDeadNet: Identifying Phototoxicity from Label-free Microscopy Images of  Cells using Deep ConvNets\nPerception-based energy functions in seam-cutting\nGreedy Compositional Clustering for Unsupervised 
Learning of  Hierarchical Compositional Models\nPerson Re-Identification via Recurrent Feature Aggregation\nNonsmooth Analysis and Subgradient Methods for Averaging in Dynamic Time  Warping Spaces\nLearning what to look in chest X-rays with a recurrent visual attention  model\nPerceptually Optimized Image Rendering\nResidual and Plain Convolutional Neural Networks for 3D Brain MRI  Classification\nDSSD : Deconvolutional Single Shot Detector\nTraining Group Orthogonal Neural Networks with Privileged Information\nLearning Multi-level Region Consistency with Dense Multi-label Networks  for Semantic Segmentation\nAn Edge Driven Wavelet Frame Model for Image Restoration\nTowards End-to-End Face Recognition through Alignment Learning\nDeep Local Video Feature for Action Recognition\nA Multi-view RGB-D Approach for Human Pose Estimation in Operating Rooms\nRecovering 3D Planar Arrangements from Videos\nCase Study of a highly automated Layout Analysis and OCR of an  incunabulum: 'Der Heiligen Leben' (1488)\nLearning an attention model in an artificial visual system\nSparse Ternary Codes for similarity search have higher coding gain than  dense binary codes\nPose Invariant Embedding for Deep Person Re-identification\nStructural Connectome Validation Using Pairwise Classification\nQuasi-homography warps in image stitching\nSampling Without Time: Recovering Echoes of Light via Temporal Phase  Retrieval\nExploiting saliency for object segmentation from image level labels\nTreelogy: A Novel Tree Classifier Utilizing Deep and Hand-crafted  Representations\nPooling Facial Segments to Face: The Shallow and Deep Ends\nSupervised Deep Sparse Coding Networks\nVINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning  Problem\nMSCM-LiFe: Multi-scale cross modal linear feature for horizon detection  in maritime images\nRandom Forest regression for manifold-valued responses\nFaceness-Net: Face Detection through Deep Facial Part Responses\nWhen Slepian Meets Fiedler: Putting a 
Focus on the Graph Spectrum\nSafeDrive: A Robust Lane Tracking System for Autonomous and Assisted  Driving Under Limited Visibility\nScalable Nearest Neighbor Search based on kNN Graph\nSelf-Adaptation of Activity Recognition Systems to New Sensors\nICT Green Governance: new generation model based on Corporate Social  Responsibility and Green IT\nEmergence of Selective Invariance in Hierarchical Feed Forward Networks\n3D Shape Retrieval via Irrelevance Filtering and Similarity Ranking  (IF/SR)\nCo-segmentation for Space-Time Co-located Collections\nDeep Reinforcement Learning for Visual Object Tracking in Videos\nTowards Adversarial Retinal Image Synthesis\nA New Method for Removing the Moire' Pattern from Images\nDeepNav: Learning to Navigate Large Cities\nHigh Order Stochastic Graphlet Embedding for Graph-Based Pattern  Recognition\nDesign, Analysis and Application of A Volumetric Convolutional Neural  Network\nA Kinematic Chain Space for Monocular Motion Capture\nEvolving Boxes for Fast Vehicle Detection\nLearning to Compose with Professional Photographs on the Web\nSolving Uncalibrated Photometric Stereo Using Fewer Images by Jointly  Optimizing Low-rank Matrix Completion and Integrability\nPixel Recursive Super Resolution\nFCSS: Fully Convolutional Self-Similarity for Dense Semantic  Correspondence\nRandom Triangles and Polygons in the Plane\nJoint 2D-3D-Semantic Data for Indoor Scene Understanding\nExploring the microstructure manifold: image texture representations  applied to ultrahigh carbon steel microstructures\nTowards Unsupervised Weed Scouting for Agricultural Robotics\nUsing Complex Wavelet Transform and Bilateral Filtering for Image  Denoising\nLatent Hinge-Minimax Risk Minimization for Inference from a Small Number  of Training Samples\nGender-From-Iris or Gender-From-Mascara?\nDetailed Surface Geometry and Albedo Recovery from RGB-D Video Under  Natural Illumination\nContextually Customized Video Summaries via Natural Language\nSlice-to-volume 
medical image registration: a survey\nConcurrent Activity Recognition with Multimodal CNN-LSTM Structure\nView Independent Vehicle Make, Model and Color Recognition Using  Convolutional Neural Network\nA Deep Convolutional Neural Network for Background Subtraction\nLow Rank Matrix Recovery with Simultaneous Presence of Outliers and  Sparse Corruption\nA New Point-set Registration Algorithm for Fingerprint Matching\nImage Reconstruction using Matched Wavelet Estimated from Data Sensed  Compressively using Partial Canonical Identity Matrix\nFace Aging With Conditional Generative Adversarial Networks\nKeyframe-Based Visual-Inertial Online SLAM with Relocalization\nComparison of machine learning methods for classifying mediastinal lymph  node metastasis of non-small cell lung cancer from 18F-FDG PET/CT images\nGenerating Multiple Diverse Hypotheses for Human 3D Pose Consistent with  2D Joint Detections\nAdversarial Attacks on Neural Network Policies\nScene-adapted plug-and-play algorithm with convergence guarantees\nVideo Frame Synthesis using Deep Voxel Flow\nSemi-Dense Visual Odometry for RGB-D Cameras Using Approximate Nearest  Neighbour Fields\nMonocular LSD-SLAM Integration within AR System\nBackpropagation Training for Fisher Vectors within Neural Networks\nSemi-Supervised Deep Learning for Monocular Depth Map Prediction\nL1-regularized Reconstruction Error as Alpha Matte\nAttribute-controlled face photo synthesis from simple line drawing\nEAC-Net: A Region-based Deep Enhancing and Cropping Approach for Facial  Action Unit Detection\nA New Rank Constraint on Multi-view Fundamental Matrices, and its  Application to Camera Location Recovery\nA clustering approach to heterogeneous change detection\nArtGAN: Artwork Synthesis with Conditional Categorical GANs\nA Novel Weight-Shared Multi-Stage Network Architecture of CNNs for Scale  Invariance\nUnderwater Optical Image Processing: A Comprehensive Review\nOnline People Tracking and Identification with RFID and 
Kinect\nEstimation of the volume of the left ventricle from MRI images using  deep neural networks\nSSPP-DAN: Deep Domain Adaptation Network for Face Recognition with  Single Sample Per Person\nEfficient Algorithms for Moral Lineage Tracing\nOn Detecting Adversarial Perturbations\nDAGER: Deep Age, Gender and Emotion Recognition Using Convolutional  Neural Network\nScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes\nLearning from Ambiguously Labeled Face Images\nNormalized Total Gradient: A New Measure for Multispectral Image  Registration\nVisualizing Deep Neural Network Decisions: Prediction Difference  Analysis\nMulti-Task Convolutional Neural Network for Pose-Invariant Face  Recognition\nDeep Hybrid Similarity Learning for Person Re-identification\nDiscovering objects and their relations from entangled scene  representations\nImproving Text Proposals for Scene Images with Fully Convolutional  Networks\nAutomatic Handgun Detection Alarm in Videos Using Deep Learning\nThe Effect of Color Space Selection on Detectability and  Discriminability of Colored Objects\nAdversarial Discriminative Domain Adaptation\nAn Unsupervised Approach for Overlapping Cervical Cell Cytoplasm  Segmentation\nCollaborative Deep Reinforcement Learning for Joint Object Search\n3D Face Reconstruction with Geometry Details from a Single Image\nRobust Shape Registration using Fuzzy Correspondences\nZoom Out-and-In Network with Recursive Training for Object Proposal\nPerson Search with Natural Language Description\nA Survey on Deep Learning in Medical Image Analysis\nDeep learning-based assessment of tumor-associated stroma for diagnosing  breast cancer in histopathology images\nFrom Photo Streams to Evolving Situations\nThe importance of stain normalization in colorectal tissue  classification with convolutional networks\nReflection Separation Using Guided Annotation\nAn Extended Framework for Marginalized Domain Adaptation\nProjection based advanced motion model for cubic mapping 
for 360-degree  video\nVisual Tracking by Reinforced Decision Making\nJust DIAL: DomaIn Alignment Layers for Unsupervised Domain Adaptation\nBrnoCompSpeed: Review of Traffic Camera Calibration and Comprehensive  Dataset for Monocular Speed Measurement\nCrowd Sourcing Image Segmentation with iaSTAPLE\nPixelNet: Representation of the pixels, by the pixels, and for the  pixels\nVidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization\n3D Reconstruction of Temples in the Special Region of Yogyakarta By  Using Close-Range Photogrammetry\nMomentsNet: a simple learning-free method for binary image recognition\nBoosted Multiple Kernel Learning for First-Person Activity Recognition\nLearning Deep Features via Congenerous Cosine Loss for Person  Recognition\nConvolutional Neural Network Committees for Melanoma Classification with  Classical And Expert Knowledge Based Image Transforms Data Augmentation\nLearning Chained Deep Features and Classifiers for Cascade in Object  Detection\nViP-CNN: Visual Phrase Guided Convolutional Neural Network\nImproving high-pass fusion method using wavelets\nContinuous-Time Visual-Inertial Trajectory Estimation with Event Cameras\nSequence-based Multimodal Apprenticeship Learning For Robot Perception  and Decision Making\nRobot gains Social Intelligence through Multimodal Deep Reinforcement  Learning\nToward high-performance online HCCR: a CNN approach with DropDistortion,  path signature and spatial stochastic max-pooling\nFast and robust curve skeletonization for real-world elongated objects\nA recommender system to restore images with impulse noise\nAn EM Based Probabilistic Two-Dimensional CCA with Application to Face  Recognition\nLearning Deep NBNN Representations for Robust Place Categorization\nBARCHAN: Blob Alignment for Robust CHromatographic ANalysis\nSupervised Learning of Labeled Pointcloud Differences via Cover-Tree  Entropy Reduction\nBayesian Nonparametric Feature and Policy Learning for 
Decision-Making\nInstance Hash Segmentation\nAnticipating many futures: Online human motion prediction and synthesis  for human-robot collaboration\nMulti-Label Segmentation via Residual-Driven Adaptive Regularization\nShow, Attend and Interact: Perceivable Human-Robot Social Interaction  through Neural Attention Q-Network\nSuper-Trajectory for Video Segmentation\nBillion-scale similarity search with GPUs\nShaResNet: reducing residual network parameter number by sharing weights\nDeep Image Harmonization\nInertial Odometry on Handheld Smartphones\nIncorporating Intra-Class Variance to Fine-Grained Visual Recognition\nImproving Object Detection with Region Similarity Learning\nMulti-stage Neural Networks with Single-sided Classifiers for False  Positive Reduction and its Evaluation using Lung X-ray CT Images\nPerturb-and-MPM: Quantifying Segmentation Uncertainty in Dense  Multi-Label CRFs\nLossy Image Compression with Compressive Autoencoders\nMaking 360$^{\\circ}$ Video Watchable in 2D: Learning Videography for  Click Free Viewing\nLearning Social Affordance Grammar from Videos: Transferring Human  Interactions to Human-Robot Interactions\nA Deep Cascade of Convolutional Neural Networks for MR Image  Reconstruction\nA novel image tag completion method based on convolutional neural  network\nWireless Interference Identification with Convolutional Neural Networks\nUnsupervised Image-to-Image Translation Networks\nDepth Estimation using Modified Cost Function for Occlusion Handling\nBelief Propagation in Conditional RBMs for Structured Prediction\nA Novel Multi-task Deep Learning Model for Skin Lesion Segmentation and  Classification\nOutlier Cluster Formation in Spectral Clustering\nLearning Robot Activities from First-Person Human Videos Using  Convolutional Future Regression\nDenoising Adversarial Autoencoders\nIncident Light Frequency-based Image Defogging Algorithm\nLooking at Outfit to Parse Clothing\nStacking-based Deep Neural Network: Deep Analytic Network on  
Convolutional Spectral Histogram Features\nAutomated Top View Registration of Broadcast Football Videos\nGenetic CNN\nLR-GAN: Layered Recursive Generative Adversarial Networks for Image  Generation\nPerceiving and Reasoning About Liquids Using Fully Convolutional  Networks\nL2GSCI: Local to Global Seam Cutting and Integrating for Accurate Face  Contour Extraction\nReasoning About Liquids via Closed-Loop Simulation\nSegICP: Integrated Deep Semantic Segmentation and Pose Estimation\nDiversified Texture Synthesis with Feed-forward Networks\n4-DoF Tracking for Robot Fine Manipulation Tasks\nBuilding a Regular Decision Boundary with Deep Networks\nAll the people around me: face discovery in egocentric photo-streams\nIncorporating the Knowledge of Dermatologists to Convolutional Neural  Networks for the Diagnosis of Skin Lesions\nNon-line-of-sight tracking of people at long range\nAn optimal hierarchical clustering approach to segmentation of mobile  LiDAR point clouds\nSharing Residual Units Through Collective Tensor Factorization in Deep  Neural Networks\nUsing Deep Learning Method for Classification: A Proposed Algorithm for  the ISIC 2017 Skin Lesion Classification Challenge\nX-ray Astronomical Point Sources Recognition Using Granular Binary-tree  SVM\nDeep Learning based Large Scale Visual Recommendation and Search for  E-Commerce\nQualitative Assessment of Recurrent Human Motion\nLearning from Noisy Labels with Distillation\nUnsupervised Visual-Linguistic Reference Resolution in Instructional  Videos\nData Noising as Smoothing in Neural Network Language Models\nA Pursuit of Temporal Accuracy in General Activity Detection\nA Linear Extrinsic Calibration of Kaleidoscopic Imaging System from  Single 3D Point\nDeep Bayesian Active Learning with Image Data\nTransformation-Grounded Image Generation Network for Novel 3D View  Synthesis\nQuaSI: Quantile Sparse Image Prior for Spatio-Temporal Denoising of  Retinal OCT Data\nDA-RNN: Semantic Mapping with Data Associated 
Recurrent Neural Networks\nEnd-to-end semantic face segmentation with conditional random fields as  convolutional, recurrent and adversarial networks\nUntrimmedNets for Weakly Supervised Action Recognition and Detection\nFast and Robust Detection of Fallen People from a Mobile Robot\nPosition Tracking for Virtual Reality Using Commodity WiFi\nA New Representation of Skeleton Sequences for 3D Action Recognition\nA Convolutional Neural Network Approach for Half-Pel Interpolation in  Video Coding\nMulti-frequency image reconstruction for radio-interferometry with  self-tuned regularization parameters\nFast LIDAR-based Road Detection Using Fully Convolutional Neural  Networks\nFrom Depth Data to Head Pose Estimation: a Siamese approach\nNegentropic Planar Symmetry Detector\nColorization as a Proxy Task for Visual Understanding\nMulti-Pose Face Recognition Using Hybrid Face Features Descriptor\nProstate Cancer Diagnosis using Deep Learning with 3D Multiparametric  MRI\nImproving Interpretability of Deep Neural Networks with Semantic  Information\nCo-occurrence Filter\nGUN: Gradual Upsampling Network for single image super-resolution\nPoisson multi-Bernoulli mixture filter: direct derivation and  implementation\nAutomatic Skin Lesion Segmentation using Semi-supervised Learning  Technique\nDeep Value Networks Learn to Evaluate and Iteratively Refine Structured  Outputs\nDeep Learning for Skin Lesion Classification\nExtrinsic Calibration of 3D Range Finder and Camera without Auxiliary  Object or Human Intervention\nImproving LBP and its variants using anisotropic diffusion\nDetailed, accurate, human shape estimation from clothed 3D scan  sequences\nLearning Background-Aware Correlation Filters for Visual Tracking\nSubspace Learning in The Presence of Sparse Structured Outliers and  Noise\nA PatchMatch-based Dense-field Algorithm for Video Copy-Move Detection  and Localization\n6-DoF Object Pose from Semantic Keypoints\nTracking Gaze and Visual Focus of Attention of People 
Involved in Social  Interaction\nVisual end-effector tracking using a 3D model-aided particle filter for  humanoid robot platforms\nIn Search of a Dataset for Handwritten Optical Music Recognition:  Introducing MUSCIMA++\nSkin lesion segmentation based on preprocessing, thresholding and neural  networks\nFace Recognition using Multi-Modal Low-Rank Dictionary Learning\nSource Camera Identification Based On Content-Adaptive Fusion Network\nLarge Margin Object Tracking with Circulant Feature Maps\nA Data Driven Approach for Compound Figure Separation Using  Convolutional Neural Networks\nBlock Compressive Sensing of Image and Video with Nonlocal Lagrangian  Multiplier and Patch-based Sparse Representation\nLearning to Discover Cross-Domain Relations with Generative Adversarial  Networks\nTransfer Learning for Melanoma Detection: Participation in ISIC 2017  Skin Lesion Classification Challenge\nA Hybrid Supervised-unsupervised Method on Image Topic Visualization  with Convolutional Neural Network and LDA\nIlluminant Estimation using Ensembles of Multivariate Regression Trees\nConvolutional Low-Resolution Fine-Grained Classification\nEnd-to-end optimization of goal-driven and visually grounded dialogue  systems\nGlobal and Local Information Based Deep Network for Skin Lesion  Segmentation\nSteganographic Generative Adversarial Networks\nCombining Contrast Invariant L1 Data Fidelities with Nonlinear Spectral  Image Decomposition\nSegmented and Directional Impact Detection for Parked Vehicles using  Mobile Devices\nSVDNet for Pedestrian Retrieval\nLearning Robust Hash Codes for Multiple Instance Image Retrieval\nLow-rank and Sparse NMF for Joint Endmembers' Number Estimation and  Blind Unmixing of Hyperspectral Images\nDropRegion Training of Inception Font Network for High-Performance  Chinese Font Recognition\nUnsupervised Anomaly Detection with Generative Adversarial Networks to  Guide Marker Discovery\nComparison of Different Methods for Tissue Segmentation in  
Histopathological Whole-Slide Images\nSemi-Supervised Deep Learning for Fully Convolutional Networks\nPSF field learning based on Optimal Transport Distances\nSingle image super-resolution using self-optimizing mask via  fractional-order gradient interpolation and reconstruction\nDeep Tensor Encoding\nDirect Monocular Odometry Using Points and Lines\nZero-Shot Learning by Generating Pseudo Feature Representations\nTAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial  Network\nDetecting Oriented Text in Natural Images by Linking Segments\nArbitrary Style Transfer in Real-time with Adaptive Instance  Normalization\nMask R-CNN\nMulti-style Generative Network for Real-time Transfer\nActive Decision Boundary Annotation with Deep Generative Models\nCross-modal Deep Metric Learning with Multi-task Regularization\nProposal Flow: Semantic Correspondences from Object Proposals\nImproving Person Re-identification by Attribute and Identity Learning\nLicense Plate Detection and Recognition Using Deeply Learned  Convolutional Neural Networks\nPop-up SLAM: Semantic Monocular Plane SLAM for Low-texture Environments\nPKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action  Understanding\nKnowledge Transfer for Melanoma Screening with Deep Learning\nDeep Photo Style Transfer\nVideo Frame Interpolation via Adaptive Convolution\nDeeply-Supervised CNN for Prostate Segmentation\nDeep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D  vehicle analysis from monocular image\nPredicting Deeper into the Future of Semantic Segmentation\nClassifying Symmetrical Differences and Temporal Change in Mammography  Using Deep Neural Networks\nCross-View Image Matching for Geo-localization in Urban Environments\nBidirectional-Convolutional LSTM Based Spectral-Spatial Feature Learning  for Hyperspectral Image Classification\nPerspective: Energy Landscapes for Machine Learning\nChanging Fashion Cultures\nRecurrent Multimodal Interaction for Referring Image 
Segmentation\nImage-based Localization using Hourglass Networks\nNonlinear Spectral Image Fusion\nContent-based similar document image retrieval using fusion of CNN  features\nSparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse  IMUs\nSaliency-guided video classification via adaptively weighted learning\nQuality Resilient Deep Neural Networks\nWeakly Supervised Action Learning with RNN based Fine-to-coarse Modeling\nSemi-Automatic Segmentation and Ultrasonic Characterization of Solid  Breast Lesions\nOn the Robustness of Convolutional Neural Networks to Internal  Architecture and Weight Perturbations\nView Adaptive Recurrent Neural Networks for High Performance Human  Action Recognition from Skeleton Data\nImproving Classification by Improving Labelling: Introducing  Probabilistic Multi-Label Object Interaction Recognition\nMedical Image Retrieval using Deep Convolutional Neural Network\nMulti-stage Multi-recursive-input Fully Convolutional Networks for  Neuronal Boundary Detection\nLocal Deep Neural Networks for Age and Gender Classification\nImproving the Accuracy of the CogniLearn System for Cognitive Behavior  Assessment\nSketch-based Face Editing in Video Using Identity Deformation Transfer\nOpen Vocabulary Scene Parsing\nMultivariate Regression with Gross Errors on Manifold-valued Data\nInfoGAIL: Interpretable Imitation Learning from Visual Demonstrations\nA Visual Measure of Changes to Weighted Self-Organizing Map Patterns\nScaling the Scattering Transform: Deep Hybrid Networks\nLIDAR-based Driving Path Generation Using Fully Convolutional Neural  Networks\nTrespassing the Boundaries: Labeling Temporal Bounds for Object  Interactions in Egocentric Video\nTransfer learning for music classification and regression tasks\nCoherent Online Video Style Transfer\nGraph Regularized Tensor Sparse Coding for Image Representation\nMixture of Counting CNNs: Adaptive Integration of CNNs Specialized to  Specific Appearance for Crowd Counting\nEvaluation 
of Classifiers for Image Segmentation: Applications for  Eucalypt Forest Inventory\nOctree Generating Networks: Efficient Convolutional Architectures for  High-resolution 3D Outputs\nRobust Depth-based Person Re-identification\nL2-constrained Softmax Loss for Discriminative Face Verification\nTwo-Stream RNN/CNN for Action Recognition in 3D Videos\nPerception Driven Texture Generation\nAutomatic Detection of Knee Joints and Quantification of Knee  Osteoarthritis Severity using Convolutional Neural Networks\nClick Here: Human-Localized Keypoints as Guidance for Viewpoint  Estimation\nLabelBank: Revisiting Global Perspectives for Semantic Segmentation\nLearning with Privileged Information for Multi-Label Classification\nWho's Better? Who's Best? Pairwise Deep Ranking for Skill Determination\nOn Convergence Property of Implicit Self-paced Objective\nImage Restoration using Autoencoding Priors\nA Geometric Framework for Stochastic Shape Analysis\nFlow-Guided Feature Aggregation for Video Object Detection\nImproved Lossy Image Compression with Priming and Spatially Adaptive Bit  Rates for Recurrent Networks\nGoogle Map Aided Visual Navigation for UAVs in GPS-denied Environment\nUnrestricted Facial Geometry Reconstruction Using Image-to-Image  Translation\nDetecting Human Interventions on the Landscape: KAZE Features, Poisson  Point Processes, and a Construction Dataset\nLearning High Dynamic Range from Outdoor Panoramas\nSmartphone Based Colorimetric Detection via Machine Learning\nSeGAN: Segmenting and Generating the Invisible\nDeNet: Scalable Real-time Object Detection with Directed Sparse Sampling\nPlanecell: Representing the 3D Space with Planes\nEfficient optimization for Hierarchically-structured Interacting  Segments (HINTS)\nBootstrapping Labelled Dataset Construction for Cow Tracking and  Behavior Analysis\nGeometric Affordances from a Single Example via the Interaction Tensor\nInterpretable Learning for Self-Driving Cars by Visualizing Causal  Attention\nDeep 
3D Face Identification\nUnsupervised Holistic Image Generation from Key Local Patches\nNovel Framework for Spectral Clustering using Topological Node  Features(TNF)\nA Hybrid Data Association Framework for Robust Online Multi-Object  Tracking\nSingle Image Super Resolution - When Model Adaptation Matters\nThin-Slicing Network: A Deep Structured Model for Pose Estimation in  Videos\nInverseFaceNet: Deep Single-Shot Inverse Face Rendering From A Single  Image\nEfficient Registration of Pathological Images: A Joint  PCA/Image-Reconstruction Approach\nTopologyNet: Topology based deep convolutional neural networks for  biomolecular property predictions\nEfficient Asymmetric Co-Tracking using Uncertainty Sampling\nSafetyNet: Detecting and Rejecting Adversarial Examples Robustly\nComplexity-Aware Assignment of Latent Values in Discriminative Models  for Accurate Gesture Recognition\nEfficient Version-Space Reduction for Visual Tracking\nRandomness in Deconvolutional Networks for Visual Representation\nRestoration of Images with Wavefront Aberrations\nGeometric Loss Functions for Camera Pose Regression with Deep Learning\nA Comparison of Directional Distances for Hand Pose Estimation\nConvolutional neural networks for segmentation and object detection of  human semen\nCapturing Hand Motion with an RGB-D Sensor, Fusing a Generative Model  with Salient Points\nBlock-Matching Convolutional Neural Network for Image Denoising\nThe 2017 DAVIS Challenge on Video Object Segmentation\nHierarchical Surface Prediction for 3D Object Reconstruction\nCascaded Segmentation-Detection Networks for Word-Level Text Spotting\nGuided Proofreading of Automatic Segmentations for Connectomics\nSimultaneous Feature Aggregating and Hashing for Large-scale Image  Search\nA Branch-and-Bound Algorithm for Checkerboard Extraction in Camera-Laser  Calibration\nPose2Instance: Harnessing Keypoints for Person Instance Segmentation\nEscape from Cells: Deep Kd-Networks for the Recognition of 3D Point  Cloud 
Models\nJoint Regression and Ranking for Image Enhancement\nInvestigating Human Factors in Image Forgery Detection\nIncremental Tube Construction for Human Action Detection\nOn the Relation between Color Image Denoising and Classification\nThe UMCD Dataset\nEffect of Super Resolution on High Dimensional Features for Unsupervised  Face Recognition in the Wild\nAutomatic Breast Ultrasound Image Segmentation: A Survey\nConvolutional Neural Networks for Page Segmentation of Historical  Document Images\nIsotropic reconstruction of 3D fluorescence microscopy images using  convolutional neural networks\nThe Relative Performance of Ensemble Methods with Deep Convolutional  Neural Networks for Image Classification\nAction Representation Using Classifier Decision Boundaries\nBeyond triplet loss: a deep quadruplet network for person  re-identification\nHigher-Order Minimum Cost Lifted Multicuts for Motion Segmentation\nA Convolution Tree with Deconvolution Branches: Exploiting Geometric  Relationships for Single Shot Keypoint Detection\nOnline Hashing\nSemantically-Guided Video Object Segmentation\n\"RAPID\" Regions-of-Interest Detection In Big Histopathological Images\nSupervised Deep Hashing for Hierarchical Labeled Data\nRestricted Isometry Property of Gaussian Random Projection for Finite  Set of Subspaces\nMulti-Scale Continuous CRFs as Sequential Deep Networks for Monocular  Depth Estimation\nReLayNet: Retinal Layer and Fluid Segmentation of Macular Optical  Coherence Tomography using Fully Convolutional Network\nSemi-Latent GAN: Learning to generate and modify facial images from  attributes\nHand3D: Hand Pose Estimation using 3D Neural Network\nIt Takes (Only) Two: Adversarial Generator-Encoder Networks\nAutomated Unsupervised Segmentation of Liver Lesions in CT scans via  Cahn-Hilliard Phase Separation\nPixelwise Instance Segmentation with a Dynamically Instantiated Network\nA Deep Cascade of Convolutional Neural Networks for Dynamic MR Image  Reconstruction\nLearning 
Cross-Modal Deep Representations for Robust Pedestrian  Detection\nTowards 3D Human Pose Estimation in the Wild: a Weakly-supervised  Approach\nDSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks\nMetric Learning in Codebook Generation of Bag-of-Words for Person  Re-identification\nBigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis\nQuaternion Based Camera Pose Estimation From Matched Feature Points\nDeep Affordance-grounded Sensorimotor Object Recognition\nR-Clustering for Egocentric Video Segmentation\nLearning Human Motion Models for Long-term Predictions\nActionVLAD: Learning spatio-temporal aggregation for action  classification\nContinuously heterogeneous hyper-objects in cryo-EM and 3-D movies of  many temporal dimensions\nLoss Max-Pooling for Semantic Image Segmentation\nWeakly-Supervised Spatial Context Networks\nSemantically Consistent Regularization for Zero-Shot Recognition\nA semidiscrete version of the Citti-Petitot-Sarti model as a plausible  model for anthropomorphic image reconstruction and pattern recognition\nDetecting Visual Relationships with Deep Relational Networks\nDeep Multimodal Representation Learning from Temporal Data\nEAST: An Efficient and Accurate Scene Text Detector\nShow, Ask, Attend, and Answer: A Strong Baseline For Visual Question  Answering\nOnline Video Deblurring via Dynamic Temporal Blending Network\nA-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection\nForecasting Human Dynamics from Static Images\nLearning Detection with Diverse Proposals\nAttention-based Extraction of Structured Information from Street View  Imagery\nCutting the Error by Half: Investigation of Very Deep CNN and Advanced  Training Strategies for Document Image Classification\nInstance-Level Salient Object Segmentation\nFeature Tracking Cardiac Magnetic Resonance via Deep Learning and Spline  Optimization\nDilated Convolutional Neural Networks for Cardiovascular MR Segmentation  in Congenital Heart 
Disease\nObject proposal generation applying the distance dependent Chinese  restaurant process\nAttention-Set based Metric Learning for Video Face Recognition\nOptimal Threshold Design for Quanta Image Sensor\nWhat's in a Question: Using Visual Questions as a Form of Supervision\nDeep Reinforcement Learning-based Image Captioning with Embedding Reward\nAsymmetric Feature Maps with Application to Sketch Based Retrieval\nTractable Clustering of Data on the Curve Manifold\n2D-3D Pose Consistency-based Conditional Random Fields for 3D Human Pose  Estimation\nInterspecies Knowledge Transfer for Facial Keypoint Detection\nLand Cover Classification via Multi-temporal Spatial Data by Recurrent  Neural Networks\nDCFNet: Discriminant Correlation Filters Network for Visual Tracking\nLearning to Estimate Pose by Watching Videos\nRecognizing Activities of Daily Living from Egocentric Images\nNeural Face Editing with Intrinsic Image Disentangling\nHide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised  Object and Action Localization\nApplying High-Resolution Visible Imagery to Satellite Melt Pond Fraction  Retrieval: A Neural Network Approach\nDataset Augmentation for Pose and Lighting Invariant Face Recognition\nQuantum Biometrics with Retinal Photon Counting\nParallel Multi Channel Convolution using General Matrix Multiplication\nDeep Structured Learning for Facial Action Unit Intensity Estimation\nInterpretable 3D Human Action Analysis with Temporal Convolutional  Networks\nA Quadratic Penalty Method for Hypergraph Matching\nA Comprehensive Review of Smart Wheelchairs: Past, Present and Future\nTrigger for the SoLid Reactor Antineutrino Experiment\nCT Image Reconstruction in a Low Dimensional Manifold\nVideo Object Segmentation using Supervoxel-Based Gerrymandering\nRobust Optical Flow Estimation in Rainy Scenes\nImage Fusion With Cosparse Analysis Operator\nLight Field Blind Motion Deblurring\nAnnotating Object Instances with a Polygon-RNN\nIlluminant 
Spectra-based Source Separation Using Flash Photography\nLearning to Fly by Crashing\nOCRAPOSE II: An OCR-based indoor positioning system using mobile phone  images\nInsensitive Stochastic Gradient Twin Support Vector Machine for Large  Scale Problems\nFSITM: A Feature Similarity Index For Tone-Mapped Images\nSkeleton Boxes: Solving skeleton based action detection with a single  deep convolutional neural network\nSkeleton based action recognition using translation-scale invariant  image mapping and multi-scale deep cnn\nDesign of low-cost, compact and weather-proof whole sky imagers for  high-dynamic-range captures\nUnsupervised Creation of Parameterized Avatars\nA Deep Learning Framework using Passive WiFi Sensing for Respiration  Monitoring\nLearning Video Object Segmentation with Visual Memory\nNetwork Dissection: Quantifying Interpretability of Deep Visual  Representations\nLearn to Model Motion from Blurry Footages\nGenerative Face Completion\nHPatches: A benchmark and evaluation of handcrafted and learned local  descriptors\nBranchConnect: Large-Scale Visual Recognition with Learned Branch  Connections\nPredicting Cognitive Decline with Deep Learning of Brain Metabolism and  Amyloid Imaging\nEnd-to-end representation learning for Correlation Filter based tracking\nUnderstanding the Mechanisms of Deep Transfer Learning for Medical  Images\nEnd-to-End Unsupervised Deformable Image Registration with a  Convolutional Neural Network\nTraining object class detectors with click supervision\nTemporal Action Detection with Structured Segment Networks\nIdentifying First-person Camera Wearers in Third-person Videos\nNormFace: L2 Hypersphere Embedding for Face Verification\nMultiple Reflection Symmetry Detection via Linear-Directional Kernel  Density Estimation\nSolar Power Plant Detection on Multi-Spectral Satellite Imagery using  Weakly-Supervised CNN with Feedback Features and m-PCNN Fusion\nPQTable: Non-exhaustive Fast Search for Product-quantized Codes using  Hash 
Tables\nScatteract: Automated extraction of data from scatter plots\nSREFI: Synthesis of Realistic Example Face Images\nScaleNet: Guiding Object Proposal Generation in Supermarkets and Beyond\nDeep Learning based Isolated Arabic Scene Character Recognition\nDeep Learning for Medical Image Processing: Overview, Challenges and  Future\nSkeleton Key: Image Captioning by Skeleton-Attribute Decomposition\nModel-based Iterative Restoration for Binary Document Image Compression  with Dictionary Learning\nNon-Convex Weighted Lp Nuclear Norm based ADMM Framework for Image  Restoration\nExploiting Multi-layer Graph Factorization for Multi-attributed Graph  Matching\nUnified Framework for Automated Person Re-identification and Camera  Network Topology Inference in Camera Networks\nDense 3D Facial Reconstruction from a Single Depth Image in  Unconstrained Environment\nMonocular Visual Odometry with a Rolling Shutter Camera\nAutomatic Liver Lesion Segmentation Using A Deep Convolutional Neural  Network Method\nMeasuring the Accuracy of Object Detectors and Trackers\nDetecting and Recognizing Human-Object Interactions\nPaying Attention to Descriptions Generated by Image Captioning Models\nA Context Aware and Video-Based Risk Descriptor for Cyclists\nTowards a quality metric for dense light fields\nSkeleton-based Action Recognition with Convolutional Neural Networks\nPerivascular Spaces Segmentation in Brain MRI Using Optimal 3D Filtering\nJoint Sequence Learning and Cross-Modality Convolution for 3D Biomedical  Segmentation\nArabidopsis roots segmentation based on morphological operations and  CRFs\nHand Keypoint Detection in Single Images using Multiview Bootstrapping\nMulti-View Dynamic Facial Action Unit Detection\nSpatio-temporal Person Retrieval via Natural Language Queries\nAnisotropic twicing for single particle reconstruction using  autocorrelation analysis\nSphereFace: Deep Hypersphere Embedding for Face Recognition\nAutoDIAL: Automatic DomaIn Alignment Layers\nA Faster 
Patch Ordering Method for Image Denoising\nMisdirected Registration Uncertainty\nMultimodal MRI brain tumor segmentation using random forests with  features learned from fully convolutional neural network\nCompact Descriptors for Video Analysis: the Emerging MPEG Standard\nA Generalization of Convolutional Neural Networks to Graph-Structured  Data\nFace Identification and Clustering\nNo More Discrimination: Cross City Adaptation of Road Scene Segmenters\nDeep Functional Maps: Structured Prediction for Dense Shape  Correspondence\nAutomatic Real-time Background Cut for Portrait Videos\nActive Collaborative Ensemble Tracking\nA new image compression by gradient Haar wavelet\nImproving Small Object Proposals for Company Logo Detection\nUnbiased Shape Compactness for Segmentation\nA Unified Approach of Multi-scale Deep and Hand-crafted Features for  Defocus Estimation\nUnderstanding People Flow in Transportation Hubs\nDeep Multi-view Models for Glitch Classification\nThe Pose Knows: Video Forecasting by Generating Pose Futures\nEffective scaling registration approach by imposing the emphasis on the  scale factor\nIndoor Frame Recovery from Refined Line Segments\nDiscriminative Nonlinear Analysis Operator Learning: When Cosparse Model  Meets Image Classification\nSub-Pixel Registration of Wavelet-Encoded Images\nDetecting Drivable Area for Self-driving Cars: An Unsupervised Approach\nShearlet-based compressed sensing for fast 3D cardiac MR imaging using  iterative reweighting\nRegularized Residual Quantization: a multi-layer sparse dictionary  learning approach\nHyperspectral Image Classification with Markov Random Fields and a  Convolutional Neural Network\nDense-Captioning Events in Videos\nLesion detection and Grading of Diabetic Retinopathy via Two-stages Deep  Convolutional Neural Networks\nInvestigation of Different Skeleton Features for CNN-based 3D Action  Recognition\nTransfer Learning by Ranking for Weakly Supervised Object Annotation\nActive Image-based 
Modeling with a Toy Drone\nOut-of-focus: Learning Depth from Image Bokeh for Robotic Perception\nCascaded Boundary Regression for Temporal Action Detection\nMarine Animal Classification with Correntropy Loss Based Multi-view  Learning\nUnsupervised Part-based Weighting Aggregation of Deep Convolutional  Features for Image Retrieval\nDetach and Adapt: Learning Cross-Domain Disentangled Deep Representation\nWeakly-supervised Visual Grounding of Phrases with Linguistic Structures\nLearning to Estimate 3D Hand Pose from Single RGB Images\nGabor Convolutional Networks\nAm I Done? Predicting Action Progress in Videos\nFrom Zero-shot Learning to Conventional Supervised Classification:  Unseen Visual Data Synthesis\nAction Tubelet Detector for Spatio-Temporal Action Localization\nEdge-based Component-Trees for Multi-Channel Image Segmentation\nCharacterizing and Improving Stability in Neural Style Transfer\nTALL: Temporal Activity Localization via Language Query\nPhase Congruency Parameter Optimization for Enhanced Detection of Image  Features for both Natural and Medical Applications\nPart-based Deep Hashing for Large-scale Person Re-identification\nUnified Embedding and Metric Learning for Zero-Exemplar Event Detection\nDetecting Adversarial Samples Using Density Ratio Estimates\nFace Detection, Bounding Box Aggregation and Pose Estimation for Robust  Facial Landmark Localisation in the Wild\nDeep Patch Learning for Weakly Supervised Object Classification and  Discovery\nSparse Representation-based Open Set Recognition\nStock Volatility Prediction Using Recurrent Neural Networks with  Sentiment Analysis\nA Study and Comparison of Human and Deep Learning Recognition  Performance Under Visual Distortions\nContext-Aware Trajectory Prediction\nMultimodal Affect Analysis for Product Feedback Assessment\nAutomatic Recognition of Mammal Genera on Camera-Trap Images using  Multi-Layer Robust Principal Component Analysis and Mixture Neural Networks\nDeep Descriptor Transforming 
for Image Co-Localization\nScene Text Eraser\nVideo Processing for Barycenter Trajectory Identification in Diving\nMulti Resolution LSTM For Long Term Prediction In Neural Activity Video\nGeometric GAN\nLearning non-maximum suppression\nKonzept für Bildanalysen in Hochdurchsatz-Systemen am Beispiel des  Zebrabärblings\nReal-Time User-Guided Image Colorization with Learned Deep Priors\nDeep Spatio-temporal Manifold Network for Action Recognition\nContour Detection from Deep Patch-level Boundary Prediction\nConvolutional Dictionary Learning via Local Processing\nModel Complexity-Accuracy Trade-off for a Convolutional Neural Network\nAdaptive Regularization of Some Inverse Problems in Image Analysis\nBayesian Joint Topic Modelling for Weakly Supervised Object Localisation\nLearning Deep Networks from Noisy Labels with Dropout Regularization\nSignal reconstruction via operator guiding\nLearning RGB-D Salient Object Detection using background enclosure,  depth contrast, and top-down features\n4d isip: 4d implicit surface interest point detection\nAutomatic Brain Tumor Detection and Segmentation Using U-Net Based Fully  Convolutional Networks\nSCNet: Learning Semantic Correspondence\nA Generative Model of People in Clothing\nIncremental Learning Through Deep Adaptation\nChallenges in Monocular Visual Odometry: Photometric Calibration, Motion  Bias and Rolling Shutter Effect\nAn Optimal Dimensionality Multi-shell Sampling Scheme with Accurate and  Efficient Transforms for Diffusion MRI\nTransfer Learning for Cross-Dataset Recognition: A Survey\nView-Invariant Template Matching Using Homography Constraints\nAdaptive Feature Representation for Visual Tracking\nExternal Prior Guided Internal Prior Learning for Real Noisy Image  Denoising\nSelf-Committee Approach for Image Restoration Problems using  Convolutional Neural Network\nTowards a Principled Integration of Multi-Camera Re-Identification and  Tracking through Optimal Bayes Filters\nPerson Re-Identification by Deep 
Joint Learning of Multi-Loss  Classification\nCombination of Hidden Markov Random Field and Conjugate Gradient for  Brain Image Segmentation\nDeep neural networks on graph signals for brain imaging analysis\nGland Segmentation in Histopathology Images Using Random Forest Guided  Boundary Construction\nAirSim: High-Fidelity Visual and Physical Simulation for Autonomous  Vehicles\nSingle Image Super-Resolution Using Multi-Scale Convolutional Neural  Network\nDesign of a Very Compact CNN Classifier for Online Handwritten Chinese  Character Recognition Using DropWeight and Global Pooling\nCompressed Sensing for Scalable Robotic Tactile Skins\nView-invariant Gait Recognition through Genetic Template Segmentation\nA Deep Learning Based 6 Degree-of-Freedom Localization Method for  Endoscopic Capsule Robots\nHandwritten Urdu Character Recognition using 1-Dimensional BLSTM  Classifier\nWordFence: Text Detection in Natural Images with Border Awareness\nPicasso: A Modular Framework for Visualizing the Learning Process of  Neural Network Image Classifiers\nMotion-Compensated Temporal Filtering for Critically-Sampled  Wavelet-Encoded Images\nVolumetric Super-Resolution of Multispectral Data\nReal-Time Adaptive Image Compression\nLearning a Hierarchical Latent-Variable Model of 3D Shapes\nOne Shot Joint Colocalization and Cosegmentation\nPaMM: Pose-aware Multi-shot Matching for Improving Person  Re-identification\nA deep level set method for image segmentation\nDeep Diagnostics: Applying Convolutional Neural Networks for Vessels  Defects Detection\nBayer Demosaicking Using Optimized Mean Curvature over RGB channels\nRe3 : Real-Time Recurrent Regression Networks for Visual Tracking of  Generic Objects\nDetecting Cyber-Physical Attacks in Additive Manufacturing using Digital  Audio Signing\nAgent-Centric Risk Assessment: Accident Anticipation and Risky Region  Localization\nLearning Texture Manifolds with the Periodic Spatial GAN\nTarget-Quality Image Compression with Recurrent, 
Convolutional Neural  Networks\nModel-based Catheter Segmentation in MRI-images\nDeep-LK for Efficient Adaptive Object Tracking\nOnline Signature Verification using Recurrent Neural Network and  Length-normalized Path Signature\nPrediction of Sea Surface Temperature using Long Short-Term Memory\nHyperspectral Band Selection Using Unsupervised Non-Linear Deep Auto  Encoder to Train External Classifiers\nThe Kinetics Human Action Video Dataset\nClassification revisited: a web of knowledge\nWhat do We Learn by Semantic Scene Understanding for Remote Sensing  imagery in CNN framework?\nMulti-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry  and Semantics\nA New 3D Segmentation Methodology for Lumbar Vertebral Bodies for the  Measurement of BMD and Geometry\nMulti-Stage Variational Auto-Encoders for Coarse-to-Fine Image  Generation\nMultiple-Human Parsing in the Wild\nPixColor: Pixel Recursive Colorization\nRecurrent Scene Parsing with Perspective Understanding in the Loop\nEnd-to-end Planning of Fixed Millimeter-Wave Networks\nForecasting Hands and Objects in Future Frames\nCritical Contours: An Invariant Linking Image Flow with Salient Surface  Organization\nStabilizing Adversarial Nets With Prediction Methods\nIncorporating Depth into both CNN and CRF for Indoor Semantic  Segmentation\nGenerative Partition Networks for Multi-Person Pose Estimation\nThe Do's and Don'ts for CNN-based Face Verification\nBoosting the accuracy of multi-spectral image pan-sharpening by learning  a deep residual network\nSemantic Softmax Loss for Zero-Shot Learning\nRobust Localized Multi-view Subspace Clustering\nTricorNet: A Hybrid Temporal Convolutional and Recurrent Network for  Video Action Segmentation\nRegularizing deep networks using efficient layerwise adversarial  training\nStabilizing GAN Training with Multiple Random Projections\nImproving Fine-Grained Visual Classification using Pairwise Confusion\nLearning multiple visual domains with residual 
adapters\nMultiple Images Recovery Using a Single Affine Transformation\nPatchnet: Interpretable Neural Networks for Image Classification\nVisual Semantic Planning using Deep Successor Representations\nUniversal Style Transfer via Feature Transforms\nA Multi-Armed Bandit to Smartly Select a Training Set from Big Medical  Data\nCorrelation Alignment by Riemannian Metric for Domain Adaptation\nRidiculously Fast Shot Boundary Detection with Fully Convolutional  Neural Networks\nHow hard can it be? Estimating the difficulty of visual search in an  image\nSequence Summarization Using Order-constrained Kernelized Feature  Subspaces\nDeep Rotation Equivariant Network\nContinual Learning with Deep Generative Replay\nStochastic Sequential Neural Networks with Structured Inference\nBidirectional Beam Search: Forward-Backward Inference in Neural Sequence  Models for Fill-in-the-Blank Image Captioning\nThe Lovász-Softmax loss: A tractable surrogate for the optimization of  the intersection-over-union measure in neural networks\nUnsupervised Learning Layers for Video Analysis\nVisual Servoing from Deep Neural Networks\nGridNet with automatic shape prior registration for automatic MRI  cardiac segmentation\nWeakly Supervised Semantic Segmentation Based on Web Image  Co-segmentation\nClassification of Quantitative Light-Induced Fluorescence Images Using  Convolutional Neural Network\nPose Guided Person Image Generation\nUnsupervised Feature Learning for Writer Identification and Writer  Retrieval\nAlgorithmic clothing: hybrid recommendation, from street-style-to-shop\nPredicting Human Interaction via Relative Attention Model\nZero-Shot Learning with Generative Latent Prototype Model\nLearning Robust Features with Incremental Auto-Encoders\nClassification regions of deep neural networks\nBayesian GAN\nExtracting 3D Vascular Structures from Microscopy Images using  Convolutional Recurrent Networks\nEnd-to-end Global to Local CNN Learning for Hand Pose Recovery in Depth  Data\nA 
Multi-strength Adversarial Training Method to Mitigate Adversarial  Attacks\nLiDAR-Camera Calibration using 3D-3D Point correspondences\nGlobal hard thresholding algorithms for joint sparse image  representation and denoising\nProbabilistic Global Scale Estimation for MonoSLAM Based on Generic  Object Detection\nVocabulary-informed Extreme Value Learning\nCross-modal Subspace Learning for Fine-grained Sketch-based Image  Retrieval\nCare about you: towards large-scale human-centric visual relationship  detection\nMulti-channel Weighted Nuclear Norm Minimization for Real Color Image  Denoising\nConditional CycleGAN for Attribute Guided Face Image Generation\nData Driven Coded Aperture Design for Depth Recovery\nEnsemble of Part Detectors for Simultaneous Classification and  Localization\nOn the Power Spectral Density Applied to the Analysis of Old Canvases\nPose-Aware Person Recognition\nTowards Visual Ego-motion Learning in Robots\nFeature Incay for Representation Regularization\nLearning to Generate Chairs with Generative Adversarial Nets\nRobust Tracking Using Region Proposal Networks\nDecorrelation of Neutral Vector Variables: Theory and Applications\nParcellation of Visual Cortex on high-resolution histological Brain  Sections using Convolutional Neural Networks\nSaliency Revisited: Analysis of Mouse Movements versus Fixations\nInterpreting and Extending The Guided Filter Via Cyclic Coordinate  Descent\nEnd-to-end Active Object Tracking via Reinforcement Learning\nNighttime sky/cloud image segmentation\nJeffrey's prior sampling of deep sigmoidal networks\nMulti-View Task-Driven Recognition in Visual Sensor Networks\nAddressing Ambiguity in Multi-target Tracking by Hierarchical Strategy\nPCM-TV-TFV: A Novel Two Stage Framework for Image Reconstruction from  Fourier Data\nGeneric Tubelet Proposals for Action Localization\nMorphological Error Detection in 3D Segmentations\nEfficient, sparse representation of manifold distance matrices for  classical scaling\nWeakly 
supervised 3D Reconstruction with Adversarial Constraint\nNaturally Combined Shape-Color Moment Invariants under Affine  Transformations\nEffective Target Aware Visual Navigation for UAVs\nClass Specific Feature Selection for Interval Valued Data Through  Interval K-Means Clustering\nDeep Supervised Discrete Hashing\nNeuron Segmentation Using Deep Complete Bipartite Networks\nEvaluationNet: Can Human Skill be Evaluated by Deep Networks?\nRepresentation Learning by Rotating Your Faces\nU-Phylogeny: Undirected Provenance Graph Construction in the Wild\nPutting a Face to the Voice: Fusing Audio and Visual Signals Across a  Video to Determine Speakers\nSuperhuman Accuracy on the SNEMI3D Connectomics Challenge\nDepth Structure Preserving Scene Image Generation\nDeep Mutual Learning\nFader Networks: Manipulating Images by Sliding Attributes\nMachine Assisted Analysis of Vowel Length Contrasts in Wolof\nData Augmentation of Wearable Sensor Data for Parkinson's Disease  Monitoring using Convolutional Neural Networks\nIntegrated Deep and Shallow Networks for Salient Object Detection\nSAR Image Despeckling Using a Convolutional\nRank Persistence: Assessing the Temporal Performance of Real-World  Person Re-Identification\nImage Restoration from Patch-based Compressed Sensing Measurement\nDynamic Steerable Blocks in Deep Residual Networks\nDual-reference Face Retrieval\nTemporal Action Labeling using Action Sets\nOne-Sided Unsupervised Domain Mapping\nMulti-Class Model Fitting by Energy Minimization and Mode-Seeking\nNeural Network-Based Automatic Liver Tumor Segmentation With Random  Forest-Based Candidate Filtering\nLearning Person Trajectory Representations for Team Activity Analysis\nConcurrence-Aware Long Short-Term Sub-Memories for Person-Person Action  Recognition\nGraph-Cut RANSAC\nImage Compression Based on Compressive Sensing: End-to-End Comparison  with JPEG\nWhere and Who? 
Automatic Semantic-Aware Person Composition\nBrain Intelligence: Go Beyond Artificial Intelligence\nLearning Structured Semantic Embeddings for Visual Recognition\n3D Pathfinding and Collision Avoidance Using Uneven Search-space  Quantization and Visual Cone Search\nVisual attention models for scene text recognition\nGeometric Multi-Model Fitting with a Convex Relaxation Algorithm\nHyperplane Clustering Via Dual Principal Component Pursuit\nA Minimal Solution for Two-view Focal-length Estimation using Two Affine  Correspondences\nUnderstanding and Eliminating the Large-kernel Effect in Blind  Deconvolution\nFace Alignment Using K-Cluster Regression Forests With Weighted  Splitting\nStreetStyle: Exploring world-wide clothing styles from millions of  photos\nImposing Hard Constraints on Deep Networks: Promises and Limitations\nUnsupervised Place Discovery for Place-Specific Change Classifier\nBiSeg: Simultaneous Instance Segmentation and Semantic Segmentation with  Fully Convolutional Networks\nLearning to Represent Mechanics via Long-term Extrapolation and  Interpolation\nSynthesizing Filamentary Structured Images with GANs\nDriver Action Prediction Using Deep (Bidirectional) Recurrent Neural  Network\nCoMaL Tracking: Tracking Points at the Object Boundaries\nLow-shot learning with large-scale diffusion\nLearning to Extract Semantic Structure from Documents Using Multimodal  Fully Convolutional Neural Network\nActive Learning for Structured Prediction from Partially Labeled Data\nPointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric  Space\nImage Captioning with Object Detection and Localization\nSliced Wasserstein Generative Models\nStructured Light Phase Measuring Profilometry Pattern Design for Binary  Spatial Light Modulators\nTextureGAN: Controlling Deep Image Synthesis with Texture Patches\nFace Detection through Scale-Friendly Deep Convolutional Networks\nClass-specific Poisson denoising by patch-based importance sampling\nDCCO: Towards 
Deformable Continuous Convolution Operators\nUnsupervised learning of object frames by dense equivariant image  labelling\nCollaborative Summarization of Topic-Related Videos\nDiversity-aware Multi-Video Summarization\nDeep Learning for Isotropic Super-Resolution from Non-Isotropic 3D  Electron Microscopy\nExploring Convolutional Networks for End-to-End Visual Servoing\nGenerate Identity-Preserving Faces by Generative Adversarial Networks\nSegmentation of nearly isotropic overlapped tracks in photomicrographs  using successive erosions as watershed markers\nFew-Shot Image Recognition by Predicting Parameters from Activations\nExploring the similarity of medical imaging classification problems\nEnriched Deep Recurrent Visual Attention Model for Multiple Object  Recognition\nImage Crowd Counting Using Convolutional Neural Network and Markov  Random Field\nProgressive and Multi-Path Holistically Nested Neural Networks for  Pathological Lung Segmentation from CT Images\nTransferring a Semantic Representation for Person Re-Identification and  Search\nSmoothGrad: removing noise by adding noise\nSubspace Clustering via Optimal Direction Search\nCriteria Sliders: Learning Continuous Database Criteria via Interactive  Ranking\nContrast Enhancement Estimation for Digital Image Forensics\nLong-Term Video Interpolation with Bidirectional Predictive Network\nProbabilistic RGB-D Odometry based on Points, Lines and Planes Under  Depth Uncertainty\nText Extraction From Texture Images Using Masked Signal Decomposition\nIndirect Image Registration with Large Diffeomorphic Deformations\nVideo Imagination from a Single Image with Transformation Generation\nThe \"something something\" video database for learning and evaluating  visual common sense\nAction Search: Learning to Search for Human Activities in Untrimmed  Videos\nAFIF4: Deep Gender Classification based on AdaBoost-based Fusion of  Isolated Facial Features and Foggy Faces\nSaliency detection by aggregating complementary 
background template with  optimization framework\nPhoto-realistic Facial Texture Transfer\nSalProp: Salient object proposals via aggregated edge cues\nArabian Horse Identification Benchmark Dataset\nStochastic Primal-Dual Hybrid Gradient Algorithm with Arbitrary Sampling  and Imaging Applications\nA convolutional autoencoder approach for mining features in cellular  electron cryo-tomograms and weakly supervised coarse segmentation\nHierarchical Label Inference for Video Classification\nDistance weighted discrimination of face images for gender  classification\nThe Monkeytyping Solution to the YouTube-8M Video Understanding  Challenge\nSelf-ensembling for visual domain adaptation\nDimensionality Reduction using Similarity-induced Embeddings\nTversky loss function for image segmentation using 3D fully  convolutional deep networks\nExploring Content-based Artwork Recommendation with Metadata and Visual  Features\nAn Entropy-based Pruning Method for CNN Compression\nHistograms of Gaussian normal distribution for feature matching in  clutter scenes\nEvaluating 35 Methods to Generate Structural Connectomes Using Pairwise  Classification\nEndoscopic Depth Measurement and Super-Spectral-Resolution Imaging\nSatellite Imagery Feature Detection using Deep Convolutional Neural  Network: A Kaggle Competition\nMulti-Target Tracking in Multiple Non-Overlapping Cameras using  Constrained Dominant Sets\nDualing GANs\nLow Resolution Face Recognition Using a Two-Branch Deep Convolutional  Neural Network Architecture\nSPLBoost: An Improved Robust Boosting Algorithm Based on Self-paced  Learning\nA comparative study of breast surface reconstruction for aesthetic  outcome assessment\nCo-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple  Objects\nCompact Tensor Pooling for Visual Question Answering\nComicolorization: Semi-Automatic Manga Colorization\nSaliency Guided End-to-End Learning for Weakly Supervised Object  Detection\nGM-Net: Learning Features with More 
Efficiency\nMEC: Memory-efficient Convolution for Deep Neural Network\nLearnable pooling with Context Gating for video classification\ncGAN-based Manga Colorization Using a Single Training Image\nScalable Online Convolutional Sparse Coding\nTwo-Stream Convolutional Networks for Dynamic Texture Synthesis\nLearning Efficient Point Cloud Generation for Dense 3D Object  Reconstruction\nComparison of Time-Frequency Representations for Environmental Sound  Classification using Convolutional Neural Networks\nShape recognition of volcanic ash by simple convolutional neural network\nFast Estimation of Haemoglobin Concentration in Tissue Via Wavelet  Decomposition\nPixels to Graphs by Associative Embedding\nSingle Classifier-based Passive System for Source Printer Classification  using Local Texture Features\nLearning Spatial-Aware Regressions for Visual Tracking\nFractal dimension analysis for automatic morphological galaxy  classification\nSampling Matters in Deep Embedding Learning\nJoint Prediction of Depths, Normals and Surface Curvature from RGB  Images using CNNs\nTraining Adversarial Discriminators for Cross-channel Abnormal Event  Detection in Crowds\nOn Detection of Faint Edges in Noisy Images\nImage Forgery Localization Based on Multi-Scale Convolutional Neural  Networks\nEncoding Video and Label Priors for Multi-label Video Classification on  YouTube-8M dataset\nSelf-Learning Phase Boundaries by Active Contours\nPhotometric Stereo by Hemispherical Metric Embedding\nRobust Video-Based Eye Tracking Using Recursive Estimation of Pupil  Characteristics\nEnd-to-end Learning of Image based Lane-Change Decision\nYoTube: Searching Action Proposal via Recurrent and Static Regression  Networks\nSkeleton-Based Action Recognition Using Spatio-Temporal LSTM Network  with Trust Gates\nMulti-Label Learning with Label Enhancement\nLearning to Map Vehicles into Bird's Eye View\nGroup Synchronization on Grids\nDetecting Small Signs from Large Images\nRobust Sonar ATR Through 
Bayesian Pose Corrected Sparse Classification\nVoxCeleb: a large-scale speaker identification dataset\nDense Non-rigid Structure-from-Motion Made Easy - A Spatial-Temporal  Smoothness based Solution\nHierarchical Model for Long-term Video Prediction\nMaterial Recognition CNNs and Hierarchical Planning for Biped Robot  Locomotion on Slippery Terrain\nAuto-Encoder Guided GAN for Chinese Calligraphy Synthesis\nApproximate Reflection Symmetry in a Point Set: Theory and Algorithm  with an Application\nCross-Country Skiing Gears Classification using Deep Learning\nTraining a Fully Convolutional Neural Network to Route Integrated  Circuits\nSuper-Resolution via Deep Learning\nPerceptual Adversarial Networks for Image-to-Image Transformation\nYes-Net: An effective Detector Based on Global Information\nA Parameterized Approach to Personalized Variable Length Summarization  of Soccer Matches\nThe YouTube-8M Kaggle Competition: Challenges and Methods\nDeep Learning Based Large-Scale Automatic Satellite Crosswalk  Classification\nOnline Adaptation of Convolutional Neural Networks for Video Object  Segmentation\nReal-time Distracted Driver Posture Classification\nFlow-free Video Object Segmentation\nR2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection\nIterative Spectral Clustering for Unsupervised Object Localization\nRobust Face Tracking using Multiple Appearance Models and Graph  Relational Learning\nSMC Faster R-CNN: Toward a scene-specialized multi-object detector\nMultiple VLAD encoding of CNNs for image classification\nAdversarial Image Alignment and Interpolation\nBetter than Real: Complex-valued Neural Nets for MRI Fingerprinting\nImage Companding and Inverse Halftoning using Deep Convolutional Neural  Networks\nDeep GrabCut for Object Selection\nWhere to Play: Retrieval of Video Segments using Natural-Language  Queries\nVectorial Dimension Reduction for Tensors Based on Bayesian Inference\nPedestrian Alignment Network for Large-scale Person 
Re-identification\nEnd-to-End Learning of Video Super-Resolution with Motion Compensation\nGeometric calibration of Colour and Stereo Surface Imaging System of  ESA's Trace Gas Orbiter\nAppearance invariance in convolutional networks with neighborhood  similarity\nArabic Character Segmentation Using Projection Based Approach with  Profile's Amplitude Filter\nAggregating Frame-level Features for Large-Scale Video Classification\nSelective Deep Convolutional Features for Image Retrieval\nOne-Shot Fine-Grained Instance Retrieval\nSpatial and Angular Resolution Enhancement of Light Fields Using  Convolutional Neural Networks\nLearning Human Pose Models from Synthesized Data for Robust RGB-D Action  Recognition\nFace Recognition with Machine Learning in OpenCV_ Fusion of the results  with the Localization Data of an Acoustic Camera for Speaker Identification\nConditional generation of multi-modal data using constrained embedding  space mapping\nThe Candidate Multi-Cut for Cell Segmentation\nSkeleton-aided Articulated Motion Generation\nLearning-based Image Enhancement for Visual Odometry in Challenging HDR  Environments\nR-PHOC: Segmentation-Free Word Spotting using CNN\nBenchmarking Denoising Algorithms with Real Photographs\nRobust Multi-Image HDR Reconstruction for the Modulo Camera\nGenerative diffeomorphic atlas construction from brain and spinal cord  MRI data\nAlignGAN: Learning to Align Cross-Domain Images with Conditional  Generative Adversarial Networks\nSensor Analytics in Basketball\nSSGAN: Secure Steganography Based on Generative Adversarial Networks\nCNN features are also great at unsupervised classification\nWeighted Low Rank Approximation for Background Estimation Problems\nTensor-Train Recurrent Neural Networks for Video Classification\nCardiologist-Level Arrhythmia Detection with Convolutional Neural  Networks\nZero-Shot Deep Domain Adaptation\nOn the Compactness, Efficiency, and Representation of 3D Convolutional  Networks: Brain Parcellation as a 
Pretext Task\nAutomatic Classification of Bright Retinal Lesions via Deep Network  Features\nA spatiotemporal model with visual attention for video classification\nSigNet: Convolutional Siamese Network for Writer Independent Offline  Signature Verification\nA multi-layer image representation using Regularized Residual  Quantization: application to compression and denoising\nThe 2017 Hands in the Million Challenge on 3D Hand Pose Estimation\nLearning Efficient Image Representation for Person Re-Identification\nLearning Representations and Generative Models for 3D Point Clouds\nSelf Adversarial Training for Human Pose Estimation\nLocal Activity-tuned Image Filtering for Noise Removal and Image  Smoothing\nIntegration of LiDAR and Hyperspectral Data for Land-cover  Classification: A Case Study\nA Human and Group Behaviour Simulation Evaluation Framework utilising  Composition and Video Analysis\nAnisotropic Diffusion-based Kernel Matrix Model for Face Liveness  Detection\nSynthesis-based Robust Low Resolution Face Recognition\nImproving speaker turn embedding by crossmodal transfer learning from  face embedding\nScale-Regularized Filter Learning\nAn Analysis of Human-centered Geolocation\nEnhanced Deep Residual Networks for Single Image Super-Resolution\nWavelet-based Reflection Symmetry Detection via Textural and Color  Histograms\nAutomatic Construction of Real-World Datasets for 3D Object Localization  using Two Cameras\nFoot anthropometry device and single object image thresholding\nUnderwater object classification using scattering transform of sonar  signals\nRegNet: Multimodal Sensor Registration Using Deep Neural Networks\nAdversarial training and dilated convolutions for brain MRI segmentation\nGeneralised Dice overlap as a deep learning loss function for highly  unbalanced segmentations\nHierarchical Deep Recurrent Architecture for Video Understanding\nObstacle detection test in real-word traffic contexts for the purposes  of motorcycle autonomous emergency 
braking (MAEB)\nMachine Learning in Appearance-based Robot Self-localization\nMachine Learning for RealisticBall Detection in RoboCup SPL\nAdversarial Dropout for Supervised and Semi-supervised Learning\nStructured Sparse Ternary Weight Coding of Deep Neural Networks for  Efficient Hardware Implementations\nDeep Fisher Discriminant Learning for Mobile Hand Gesture Recognition\nTwo-pixel polarimetric camera by compressive sensing\nContour and Centreline Tracking of Vessels from Angiograms using the  Classical Image Processing Techniques\nLinkNet: Exploiting Encoder Representations for Efficient Semantic  Segmentation\nLarge-scale Multiview 3D Hand Pose Dataset\nPixel-variant Local Homography for Fisheye Stereo Rectification  Minimizing Resampling Distortion\nReduced Electron Exposure for Energy-Dispersive Spectroscopy using  Dynamic Sampling\nUnsupervised Body Part Regression via Spatially Self-ordering  Convolutional Neural Networks\nTowards End-to-end Text Spotting with Convolutional Recurrent Neural  Networks\nMerge or Not? 
Learning to Group Faces via Imitation Learning\nQuery-Aware Sparse Coding for Multi-Video Summarization\nDeep Learning with Topological Signatures\nLarge-scale Video Classification guided by Batch Normalized LSTM  Translator\nStable Distribution Alignment Using the Dual of the Adversarial Distance\nFoolbox: A Python toolbox to benchmark the robustness of machine  learning models\nBe Careful What You Backpropagate: A Case For Linear Output Activations  & Gradient Boosting\nGuiding InfoGAN with Semi-Supervision\nTemporal Modeling Approaches for Large-scale Youtube-8M Video  Understanding\nThe Reversible Residual Network: Backpropagation Without Storing  Activations\nModified Alpha-Rooting Color Image Enhancement Method On The Two-Side  2-D Quaternion Discrete Fourier Transform And The 2-D Discrete Fourier  Transform\nRED: Reinforced Encoder-Decoder Networks for Action Anticipation\nOptical Music Recognition with Convolutional Sequence-to-Sequence Models\nGenerative Adversarial Network based on Resnet for Conditional Image  Restoration\nChinese Typography Transfer\nExpected exponential loss for gaze-based video and volume ground truth  annotation\nComparative Performance Analysis of Neural Networks Architectures on H2O  Platform for Various Activation Functions\nNon-Linear Subspace Clustering with Learned Low-Rank Kernels\nMoCoGAN: Decomposing Motion and Content for Video Generation\n\"Maximizing rigidity\" revisited: a convex programming approach for  generic 3D shape reconstruction from multiple perspective views\nDesigning Effective Inter-Pixel Information Flow for Natural Image  Matting\nFully Automatic and Real-Time Catheter Segmentation in X-Ray Fluoroscopy\nAesthetic-Driven Image Enhancement by Adversarial Learning\nHoudini: Fooling Deep Structured Prediction Models\nBenchmarking and Error Diagnosis in Multi-Instance Pose Estimation\nIncremental Boosting Convolutional Neural Network for Facial Action Unit  Recognition\nHybrid PS-V Technique: A Novel Sensor 
Fusion Approach for Fast Mobile  Eye-Tracking with Sensor-Shift Aware Correction\nDiscriminative Transformation Learning for Fuzzy Sparse Subspace  Clustering\nPruning Convolutional Neural Networks for Image Instance Retrieval\nAPE-GAN: Adversarial Perturbation Elimination with GAN\nOrder-Free RNN with Visual Attention for Multi-Label Classification\nBeyond Forward Shortcuts: Fully Convolutional Master-Slave Networks  (MSNets) with Backward Skip Connections for Semantic Segmentation\nTransitioning between Convolutional and Fully Connected Layers in Neural  Networks\nOptimizing the Latent Space of Generative Networks\nA Novel Deep Learning Architecture for Testis Histology Image  Classification\nDiscovering Class-Specific Pixels for Weakly-Supervised Semantic  Segmentation\nRecognizing and Curating Photo Albums via Event-Specific Image  Importance\nFace Alignment Robust to Pose, Expressions and Occlusions\nDrone-based Object Counting by Spatially Regularized Regional Proposal  Network\nOrthogonal and Idempotent Transformations for Learning Deep Neural  Networks\nClosed-form Solution for IMU based LSD-SLAM Point Cloud Conversion into  the Scaled 3D World Environment\nDetecting Parts for Action Localization\nModeling the Intra-class Variability for Liver Lesion Detection using a  Multi-class Patch-based CNN\nDeep View-Sensitive Pedestrian Attribute Inference in an end-to-end  Model\nDiscriminative convolutional Fisher vector network for action  recognition\nChannel Pruning for Accelerating Very Deep Neural Networks\nDeformable Part-based Fully Convolutional Network for Object Detection\nDomain-adversarial neural networks to address the appearance variability  of histopathology images\nDeformable Registration through Learning of Context-Specific Metric  Aggregation\nShape Generation using Spatially Partitioned Point Clouds\nPose-Invariant Face Alignment with a Single CNN\nSTag: A Stable Fiducial Marker System\nFast, Simple Calcium Imaging Segmentation with Fully 
Convolutional  Networks\nDenseNet for Dense Flow\nAutomatic Segmentation of Retinal Vasculature\nSunrise or Sunset: Selective Comparison Learning for Subtle Attribute  Recognition\n3D Shape Reconstruction from Sketches via Multi-view Convolutional  Networks\nAdaptive Feeding: Achieving Fast and Accurate Detections by Adaptively  Combining Object Detectors\nSemantic Segmentation with Reverse Attention\nDeep Layer Aggregation\nAn All-in-One Network for Dehazing and Beyond\nVideo Object Segmentation using Tracked Object Proposals\nResting state fMRI functional connectivity-based classification using a  convolutional neural network architecture\nTemporal Convolution Based Action Proposal: Submission to ActivityNet  2017\nA Nonlinear Dimensionality Reduction Framework Using Smooth Geodesics\nNeural Person Search Machines\nRecurrent Neural Networks for Online Video Popularity Prediction\nNeuron Pruning for Compressing Deep Networks using Maxout Architectures\nRetinal Microaneurysms Detection using Local Convergence Index Features\nMulti-kernel learning of deep convolutional features for action  recognition\nA Multi-Scale CNN and Curriculum Learning Strategy for Mammogram  Classification\nMemory-Efficient Implementation of DenseNets\nConfidence estimation in Deep Neural networks via density modelling\nAutomatic Curation of Golf Highlights using Multimodal Excitement  Features\nPatchShuffle Regularization\nDeep Networks for Compressed Image Sensing\nSingle Image Super-Resolution with Dilated Convolution based Multi-Scale  Information Learning Inception Module\nClinical Patient Tracking in the Presence of Transient and Permanent  Occlusions via Geodesic Feature\nEyemotion: Classifying facial expressions in VR using eye-tracking  cameras\nSpatio-temporal Human Action Localisation and Instance Segmentation in  Temporally Untrimmed Videos\nSAR Image Colorization: Converting Single-Polarization to Fully  Polarimetric Using Deep Neural Networks\nPerson Re-identification Using 
Visual Attention\nGroup-wise Deep Co-saliency Detection\nSemantic 3D Occupancy Mapping through Efficient High Order CRFs\nContrastive-center loss for deep neural networks\nWavelet Convolutional Neural Networks for Texture Classification\nSynthesizing Robust Adversarial Examples\nToward Geometric Deep SLAM\nGenerative OpenMax for Multi-Class Open Set Classification\nLV-ROVER: Lexicon Verified Recognizer Output Voting Error Reduction\nDelineation of line patterns in images using B-COSFIRE filters\nDetecting Semantic Parts on Partially Occluded Objects\nImproving Robustness of Feature Representations to Image Deformations  using Powered Convolution in CNNs\nMotion-Appearance Interactive Encoding for Object Segmentation in  Unconstrained Videos\nAnalyzing First-Person Stories Based on Socializing, Eating and  Sedentary Patterns\nSpatiotemporal Modeling for Crowd Counting in Videos\nResidual Conv-Deconv Grid Network for Semantic Segmentation\nBottom-Up and Top-Down Attention for Image Captioning and Visual  Question Answering\nFast Deep Matting for Portrait Animation on Mobile Phone\nGraph-Based Classification of Omnidirectional Images\nRankIQA: Learning from Rankings for No-reference Image Quality  Assessment\nReduction of Overfitting in Diabetes Prediction Using Deep Learning  Neural Network\nMaximum entropy based non-negative optoacoustic tomographic image  reconstruction\nSPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO  Data Set\nA Guided Spatial Transformer Network for Histology Cell Differentiation\nRobust Rigid Point Registration based on Convolution of Adaptive  Gaussian Mixture Models\nLearning a Target Sample Re-Generator for Cross-Database  Micro-Expression Recognition\nContext-Aware Single-Shot Detector\nUltra-low-power Wireless Streaming Cameras\nExploiting Web Images for Weakly Supervised Object Detection\nA Comparative Study of the Clinical use of Motion Analysis from Kinect  Skeleton Data\nFood Ingredients Recognition through 
Multi-label Learning\nA Downsampled Variant of ImageNet as an Alternative to the CIFAR  datasets\nSTN-OCR: A single Neural Network for Text Detection and Text Recognition\nHandwritten character recognition using some (anti)-diagonal structural  features\nBuilding Detection from Satellite Images on a Global Scale\nEfficient Deformable Shape Correspondence via Kernel Matching\nLearning from Video and Text via Large-Scale Discriminative Clustering\nObject Detection of Satellite Images Using Multi-Channel Higher-order  Local Autocorrelation\nFine-Pruning: Joint Fine-Tuning and Compression of a Convolutional  Network with Bayesian Optimization\nLocalizing Actions from Video Labels and Pseudo-Annotations\nSpatial-Aware Object Embeddings for Zero-Shot Localization and  Classification of Actions\nA weighting strategy for Active Shape Models\nThe WILDTRACK Multi-Camera Person Dataset\nFontCode: Embedding Information in Text Documents using Glyph  Perturbation\nWeakly-supervised learning of visual relations\nDeep Feature Consistent Deep Image Transformations: Downscaling,  Decolorization and HDR Tone Mapping\nSynthetic Database for Evaluation of General, Fundamental Biometric  Principles\nImproved Adversarial Systems for 3D Object Generation and Reconstruction\nVirtual PET Images from CT Data Using Deep Convolutional Networks:  Initial Results\nDiscover and Learn New Objects from Documentaries\nCNN-based Cascaded Multi-task Learning of High-level Prior and Density  Estimation for Crowd Counting\nScalable and Effective Deep CCA via Soft Decorrelation\nAutomatic Crack Detection in Built Infrastructure Using Unmanned Aerial  Vehicles\n2D-3D Fully Convolutional Neural Networks for Cardiac MR Segmentation\nDeep Domain Adaptation by Geodesic Distance Minimization\nConvolution with Logarithmic Filter Groups for Efficient Shallow CNN\nSpatially variant PSF modeling in confocal macroscopy\nFeature Extraction via Recurrent Random Deep Ensembles and its  Application in Gruop-level 
Happiness Estimation\nA Framework for Super-Resolution of Scalable Video via Sparse  Reconstruction of Residual Frames\nDeep Convolutional Framelet Denosing for Low-Dose CT via Wavelet  Residual Network\nSpatio-Temporal Action Detection with Cascade Proposal and Location  Anticipation\nTowards the Success Rate of One: Real-time Unconstrained Salient Object  Detection\nMaterial Editing Using a Physically Based Rendering Network\nImage Denoising via CNNs: An Adversarial Approach\nModel-based learning of local image features for unsupervised texture  segmentation\nTensorial Recurrent Neural Networks for Longitudinal Data Analysis\nVideo Object Segmentation with Re-identification\nCREST: Convolutional Residual Learning for Visual Tracking\nSelf-Supervised Learning for Spinal MRIs\nMomo: Monocular Motion Estimation on Manifolds\nActive Learning for Convolutional Neural Networks: A Core-Set Approach\nDense Piecewise Planar RGB-D SLAM for Indoor Environments\nKernalised Multi-resolution Convnet for Visual Tracking\nJoint Transmission Map Estimation and Dehazing using Deep Networks\nA Learning-based Framework for Hybrid Depth-from-Defocus and Stereo  Matching\nA Simple Loss Function for Improving the Convergence and Accuracy of  Visual Question Answering Models\nControllable Generative Adversarial Network\nExact Tensor Completion from Sparsely Corrupted Observations via Convex  Optimization\nOmniArt: Multi-task Deep Learning for Artistic Data Analysis\nAccurate Lung Segmentation via Network-Wise Training of Convolutional  Networks\nLatent tree models\nPIVO: Probabilistic Inertial-Visual Odometry for Occlusion-Robust  Navigation\nAligned and Non-Aligned Double JPEG Detection Using Convolutional Neural  Networks\nAn Energy Minimization Approach to 3D Non-Rigid Deformable Surface  Estimation Using RGBD Data\nPredicting Human Activities Using Stochastic Grammar\nSemantic Instance Labeling Leveraging Hierarchical Segmentation\nLow Dose CT Image Denoising Using a Generative 
Adversarial Network with  Wasserstein Distance and Perceptual Loss\nDual Quadrics from Object Detection BoundingBoxes as Landmark  Representations in SLAM\nORGB: Offset Correction in RGB Color Space for Illumination-Robust Image  Processing\nExtreme Low Resolution Activity Recognition with Multi-Siamese Embedding  Learning\nA Unified View-Graph Selection Framework for Structure from Motion\nDeep MR to CT Synthesis using Unpaired Data\nUnsupervised Video Understanding by Reconciliation of Posture  Similarities\nRecent Developments and Future Challenges in Medical Mixed Reality\nImage reconstruction with imperfect forward models and applications in  deblurring\nUnsupervised Representation Learning by Sorting Sequences\nAutomatic Spatially-aware Fashion Concept Discovery\nCASSL: Curriculum Accelerated Self-Supervised Learning\nA Latent Variable Model for Two-Dimensional Canonical Correlation  Analysis and its Variational Inference\nLocalizing Moments in Video with Natural Language\nCut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection\n3D-PRNN: Generating Shape Primitives with Recurrent Neural Networks\nBetter Together: Joint Reasoning for Non-rigid 3D Reconstruction with  Specularities and Shading\nAccelerated Image Reconstruction for Nonlinear Diffractive Imaging\nIntrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and  Geometry Optimization with Spatially-Varying Lighting\nQuery-guided Regression Network with Context Policy for Phrase Grounding\nVideo Frame Interpolation via Adaptive Separable Convolution\nOptimizing Region Selection for Weakly Supervised Object Detection\nInteractively Transferring CNN Patterns for Part Localization\nInterpreting CNN Knowledge via an Explanatory Graph\nDetecting Noteheads in Handwritten Scores with ConvNets and Bounding Box  Regression\nManifold Constrained Low-Rank Decomposition\nEnd-to-end learning potentials for structured attribute prediction\nIntensity Video Guided 4D Fusion for Improved Highly 
Dynamic 3D  Reconstruction\nAccurate Light Field Depth Estimation with Superpixel Regularization  over Partially Occluded Regions\nIdentity-Aware Textual-Visual Matching with Latent Co-attention\nA Solution for Crime Scene Reconstruction using Time-of-Flight Cameras\nStructured Attentions for Visual Question Answering\nLearning for Active 3D Mapping\nTwo-Phase Learning for Weakly Supervised Object Localization\nLearning to segment on tiny datasets: a new shape model\nSelf-supervised Learning of Pose Embeddings from Spatiotemporal  Relations in Videos\nMemNet: A Persistent Memory Network for Image Restoration\nLearning a CNN-based End-to-End Controller for a Formula SAE Racecar\nGraph Classification with 2D Convolutional Neural Networks\nImage Quality Assessment Techniques Show Improved Training and  Evaluation of Autoencoder Generative Adversarial Networks\nReal-Time Visual Localisation in a Tagged Environment\nAn Adaptive Cluster-based Wiener Filter for Speckle Reduction of OCT  Skin Images\nMultibiometric Secure System Based on Deep Learning\nWhat Makes a Place? 
Building Bespoke Place Dependent Object Detectors  for Robotics\nUnconstrained Face Detection and Open-Set Face Recognition Challenge\nTemporal Context Network for Activity Localization in Videos\nLearning a Repression Network for Precise Vehicle Search\nAn Effective Feature Selection Method Based on Pair-Wise Feature  Proximity for High Dimensional Low Sample Size Data\nAn Unsupervised Game-Theoretic Approach to Saliency Detection\nFast Scene Understanding for Autonomous Driving\nAn Error Detection and Correction Framework for Connectomics\nHuman Skin Detection Using RGB, HSV and YCbCr Color Models\nWhat Actions are Needed for Understanding Human Actions in Videos?\nSequential Dual Deep Learning with Shape and Texture Features for Sketch  Recognition\nDeep Face Feature for Face Alignment\nJoint Face Alignment and 3D Face Reconstruction with Application to Face  Recognition\nLearning to Disambiguate by Asking Discriminative Questions\nMulti-dimensional Gated Recurrent Units for Automated Anatomical  Landmark Localization\nSPLODE: Semi-Probabilistic Point and Line Odometry with Depth Estimation  from RGB-D Camera Motion\nSUBIC: A supervised, structured binary code for image search\nPersonalized Cinemagraphs using Semantic Understanding and Collaborative  Learning\nTandemNet: Distilling Knowledge from Medical Images Using Diagnostic  Reports as Optional Semantic References\nModality-bridge Transfer Learning for Medical Image Classification\nWriter Identification and Verification from Intra-variable Individual  Handwriting\nVideo Deblurring via Semantic Segmentation and Pixel-Wise Non-Linear  Kernel\nIterative Deep Convolutional Encoder-Decoder Network for Medical Image  Segmentation\nDeep Recurrent Neural Networks for mapping winter vegetation quality  coverage via multi-temporal SAR Sentinel-1\nFace Parsing via a Fully-Convolutional Continuous CRF Neural Network\nCalipso: Physics-based Image and Video Editing through CAD Model Proxies\nFlower Categorization using 
Deep Convolutional Neural Networks\nNoisy Softmax: Improving the Generalization Ability of DCNN via  Postponing the Early Softmax Saturation\nKill Two Birds With One Stone: Boosting Both Object Detection Accuracy  and Speed With adaptive Patch-of-Interest Composition\nRecurrent Filter Learning for Visual Tracking\nLearning Deep Neural Networks for Vehicle Re-ID with  Visual-spatio-temporal Path Proposals\nVisual Graph Mining\nContext-based Normalization of Histological Stains using Deep  Convolutional Features\nFast-Forward Video Based on Semantic Extraction\nDivide and Fuse: A Re-ranking Approach for Person Re-identification\nDeep Object-Centric Representations for Generalizable Robot Learning\nAn ELU Network with Total Variation for Image Denoising\nSituation Recognition with Graph Neural Networks\nImage Augmentation using Radial Transform for Training Deep Neural  Networks\nBringing Background into the Foreground: Making All Classes Equal in  Weakly-supervised Video Semantic Segmentation\nPathological Pulmonary Lobe Segmentation from CT Images using  Progressive Holistically Nested Neural Networks and Random Walker\nImproved Regularization of Convolutional Neural Networks with Cutout\nSequence-to-Label Script Identification for Multilingual OCR\nDeformNet: Free-Form Deformation Network for 3D Shape Reconstruction  from a Single Image\nAcoustic Feature Learning via Deep Variational Canonical Correlation  Analysis\nLearning Graph While Training: An Evolving Graph Convolutional Neural  Network\nLanguage Identification Using Deep Convolutional Recurrent Neural  Networks\nGSLAM: Initialization-robust Monocular Visual SLAM via Global  Structure-from-Motion\nA Generalised Directional Laplacian Distribution: Estimation, Mixture  Models and Audio Source Separation\nRandom Erasing Data Augmentation\nStacked Deconvolutional Network for Semantic Segmentation\nConvNet Architecture Search for Spatiotemporal Feature Learning\nImportance of Image Enhancement Techniques in Color 
Image Segmentation:  A Comprehensive and Comparative Study\nDeep Binary Reconstruction for Cross-modal Hashing\nPixel-Level Matching for Video Object Segmentation using Convolutional  Neural Networks\nRobust Registration and Geometry Estimation from Unstructured Facial  Scans\nPixelNN: Example-based Image Synthesis\nSimultaneous Detection and Quantification of Retinal Fluid with Deep  Learning\nEigen Evolution Pooling for Human Action Recognition\nTowards Interpretable Deep Neural Networks by Leveraging Adversarial  Examples\nTowards the Automatic Anime Characters Creation with Generative  Adversarial Networks\nWhat does a convolutional neural network recognize in the moon?\nHigh Voltage Insulator Surface Evaluation Using Image Processing\nA Brief Survey of Deep Reinforcement Learning\nTeaching UAVs to Race Using Sim4CV\nApplying Data Augmentation to Handwritten Arabic Numeral Recognition  Using Deep Learning Neural Networks\nShapelet-based Sparse Representation for Landcover Classification of  Hyperspectral Images\nMore cat than cute? 
Interpretable Prediction of Adjective-Noun Pairs\nDistantly Supervised Road Segmentation\nLearning Spread-out Local Feature Descriptors\nPiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection\nTowards Automatic Construction of Diverse, High-quality Image Dataset\nSparsity Invariant CNNs\nProbFlow: Joint Optical Flow and Uncertainty Estimation\nTags2Parts: Discovering Semantic Regions from Shape Tags\nA Spatiotemporal Oriented Energy Network for Dynamic Texture Recognition\nRepresentation Learning by Learning to Count\nSeeing Through Noise: Visually Driven Speaker Separation and Enhancement\nDeep EndoVO: A Recurrent Convolutional Neural Network (RCNN) based  Visual Odometry Approach for Endoscopic Capsule Robots\nPose Estimation using Local Structure-Specific Shape and Appearance  Context\nExploiting Convolution Filter Patterns for Transfer Learning\nIncremental Learning of Object Detectors without Catastrophic Forgetting\nFast single image super-resolution based on sigmoid transformation\nApplication of a Convolutional Neural Network for image classification  to the analysis of collisions in High Energy Physics\nSingle Reference Image based Scene Relighting via Material Guided  Filtering\n3D Morphable Models as Spatial Transformer Networks\nSPARCNN: SPAtially Related Convolutional Neural Networks\nObjective Classes for Micro-Facial Expression Recognition\nA wavelet frame coefficient total variational model for image  restoration\nLearning Spatio-Temporal Features with 3D Residual Networks for Action  Recognition\nUnderstanding and Comparing Deep Neural Networks for Age and Gender  Classification\nFashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning  Algorithms\nEvaluation of Deep Learning on an Abstract Image Classification Dataset\nStructured Low-Rank Matrix Factorization: Global Optimality, Algorithms,  and Applications\nMulti-task Self-Supervised Visual Learning\nStereo DSO: Large-Scale Direct Sparse Visual Odometry with 
Stereo  Cameras\nRaspiReader: An Open Source Fingerprint Reader Facilitating Spoof  Detection\nDeep Learning for Target Classification from SAR Imagery: Data  Augmentation and Translation Invariance\n3D Object Reconstruction from a Single Depth View with Adversarial  Learning\nStereo Matching With Color-Weighted Correlation, Hierarchical Belief  Propagation And Occlusion Handling\nAn IoT Real-Time Biometric Authentication System Based on ECG Fiducial  Extracted Features Using Discrete Cosine Transform\nCross-Age LFW: A Database for Studying Cross-Age Face Recognition in  Unconstrained Environments\nAutomatic Dataset Augmentation\nDigital image splicing detection based on Markov features in QDCT and  QWT domain\nA Compromise Principle in Deep Monocular Depth Estimation\nStylizing Face Images via Multiple Exemplars\nOpen-World Visual Recognition Using Knowledge Graphs\nDeep Learning Sparse Ternary Projections for Compressed Sensing of  Images\nCurriculum Learning for Multi-Task Classification of Visual Attributes\nMulti-view Low-rank Sparse Subspace Clustering\nAutoencoder with recurrent neural networks for video forgery detection\nStudy of Clear Sky Models for Singapore\nSemantic Texture for Robust Dense Tracking\nLimiting the Reconstruction Capability of Generative Neural Network  using Negative Learning\nDeep Residual Bidir-LSTM for Human Activity Recognition Using Wearable  Sensors\nAnalyzing Cloud Optical Properties Using Sky Cameras\nLearning a 3D descriptor for cross-source point cloud registration from  synthetic data\nA Machine Learning Approach For Identifying Patients with Mild Traumatic  Brain Injury Using Diffusion MRI Modeling\nReal-Time 6DOF Pose Relocalization for Event Cameras with Stacked  Spatial LSTM Networks\nConvolutional Sparse Coding with Overlapping Group Norms\nBlock-Simultaneous Direction Method of Multipliers: A proximal  primal-dual splitting algorithm for nonconvex problems with multiple  constraints\nSimultaneously Color-Depth 
Super-Resolution with Conditional Generative  Adversarial Network\nJoint Maximum Purity Forest with Application to Image Super-Resolution\nCascade Residual Learning: A Two-stage Convolutional Neural Network for  Stereo Matching\nInterpretation of Mammogram and Chest X-Ray Reports Using Deep Neural  Networks - Preliminary Results\nEfficient Convolutional Network Learning using Parametric Log based  Dual-Tree Wavelet ScatterNet\nEnd-to-end Training for Whole Image Breast Cancer Diagnosis using An All  Convolutional Design\nAction Classification and Highlighting in Videos\nLearning a Generative Adversarial Network for High Resolution Artwork  Synthesis\nVideo Summarization with Attention-Based Encoder-Decoder Networks\nFast Landmark Localization with 3D Component Reconstruction and CNN for  Cross-Pose Recognition\nICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)\nALCN: Meta-Learning for Contrast Normalization Applied to Robust 3D Pose  Estimation\nAutomatic Semantic Style Transfer using Deep Convolutional Neural  Networks and Soft Masks\nInferring Human Activities Using Robust Privileged Probabilistic  Learning\nPredicting Cardiovascular Risk Factors from Retinal Fundus Photographs  using Deep Learning\nEuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and  Land Cover Classification\nContext Based Visual Content Verification\nEffective Use of Dilated Convolutions for Segmenting Small Object  Instances in Remote Sensing Imagery\nDeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land  Segmentation\nAutomatic Brain Tumor Segmentation using Cascaded Anisotropic  Convolutional Neural Networks\nFacial 3D Model Registration Under Occlusions With SensiblePoints-based  Reinforced Hypothesis Refinement\nDeep Learning-Guided Image Reconstruction from Incomplete Data\nSimulated Annealing for JPEG Quantization\nDetection of Moving Object in Dynamic Background Using Gaussian  Max-Pooling and Segmentation Constrained RPCA\nA 
Generative Model For Zero Shot Learning Using Conditional Variational  Autoencoders\nBlind Stereo Image Quality Assessment Inspired by Brain Sensory-Motor  Fusion\nHyperspectral Light Field Stereo Matching\nTo Learn or Not to Learn Features for Deformable Registration?\nHierarchical loss for classification\nA Nonparametric Model for Multimodal Collaborative Activities  Summarization\nA Multilayer-Based Framework for Online Background Subtraction with  Freely Moving Cameras\nLink the head to the \"beak\": Zero Shot Learning from Noisy Text  Description at Part Precision\nInhomogeneous Hypergraph Clustering with Applications\nVisualizing and Improving Scattering Networks\nPredicting Visual Features from Text for Image and Video Caption  Retrieval\nSubspace Segmentation by Successive Approximations: A Method for  Low-Rank and High-Rank Data with Missing Entries\nFine-tuning deep CNN models on specific MS COCO categories\nDeep Ordinal Ranking for Multi-Category Diagnosis of Alzheimer's Disease  using Hippocampal MRI data\nPageNet: Page Boundary Extraction in Historical Handwritten Documents\nUsing Cross-Model EgoSupervision to Learn Cooperative Basketball  Intention\nDeep Convolutional Neural Network for Age Estimation based on VGG-Face  Model\nGroup-level Emotion Recognition using Transfer Learning from Face  Identification\nBlind image deblurring using class-adapted image priors\nDetecting animals in African Savanna with UAVs and the crowds\nDeep learning from crowds\nAutomatic Document Image Binarization using Bayesian Optimization\nCross-Domain Image Retrieval with Attention Modeling\nSoft Proposal Networks for Weakly Supervised Object Localization\nClustering of Data with Missing Entries using Non-convex Fusion  Penalties\nSynthetic Medical Images from Dual Generative Adversarial Networks\nPolar Transformer Networks\nBlended e-Learning Training (BeLT): Enhancing Railway Station Controller  Knowledge\nLabel Denoising Adversarial Network (LDAN) for Inverse Lighting 
of Face  Images\nImage Splicing Localization Using A Multi-Task Fully Convolutional  Network (MFCN)\nIntegrating Specialized Classifiers Based on Continuous Time Markov  Chain\nImproving Sonar Image Patch Matching via Deep Learning\nFingerNet: An Unified Deep Network for Fingerprint Minutiae Extraction\nDeep Galaxy: Classification of Galaxies based on Deep Convolutional  Neural Networks\nUncertainty-Aware Learning from Demonstration using Mixture Density  Networks with Sampling-Free Variance Modeling\nMulti-modal Conditional Attention Fusion for Dimensional Emotion  Prediction\nAdaptive Real-Time Removal of Impulse Noise in Medical Images\nPWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume\nScalable Annotation of Fine-Grained Categories Without Experts\nDeepFeat: A Bottom Up and Top Down Saliency Model Based on Deep Features  of Convolutional Neural Nets\nDeep Subspace Clustering Networks\nLearning to Segment Breast Biopsy Whole Slide Images\nObjectness Scoring and Detection Proposals in Forward-Looking Sonar  Images with Convolutional Neural Networks\nBest Practices in Convolutional Networks for Forward-Looking Sonar Image  Recognition\nCalibration of depth cameras using denoised depth images\nCompletion of High Order Tensor Data with Missing Entries via  Tensor-train Decomposition\nMethod to Detect Eye Position Noise from Video-Oculography when  Detection of Pupil or Corneal Reflection Position Fails\nImproving Heterogeneous Face Recognition with Conditional Adversarial  Networks\nLearning a Dilated Residual Network for SAR Image Despeckling\nImage Processing Operations Identification via Convolutional Neural  Network\nJoint Calibration of Panoramic Camera and Lidar Based on Supervised  Learning\nHow to Train Triplet Networks with 100K Identities?\nSequential 3D U-Nets for Biologically-Informed Brain Tumor Segmentation\nOptimal Transport for Deep Joint Transfer Learning\nA Product Shape Congruity Measure via Entropy in Shape Scale Space\nFully 
Convolutional Neural Networks for Dynamic Object Detection in Grid  Maps\nDeep multi-frame face super-resolution\n3D Densely Convolutional Networks for Volumetric Segmentation\nRecurrent neural networks based Indic word-wise script identification  using character-wise training\nFused Text Segmentation Networks for Multi-oriented Scene Text Detection\nStack-Captioning: Coarse-to-Fine Learning for Image Captioning\nRecovering Homography from Camera Captured Documents using Convolutional  Neural Networks\nReal-Time Multiple Object Tracking - A Study on the Importance of Speed\nCapturing the contributions of the semantic web to the IoT: a unifying  vision\nAnti-Makeup: Learning A Bi-Level Adversarial Network for  Makeup-Invariant Face Verification\nLearning Gating ConvNet for Two-Stream based Methods in Action  Recognition\nJoint Adaptive Neighbours and Metric Learning for Multi-view Subspace  Clustering\nAdversarial Discriminative Heterogeneous Face Recognition\nJoint Dictionaries for Zero-Shot Learning\nDeep Mean-Shift Priors for Image Restoration\nEmotion Recognition in the Wild using Deep Neural Networks and Bayesian  Classifiers\nExprGAN: Facial Expression Editing with Controllable Expression  Intensity\nA Deep Cascade Network for Unaligned Face Attribute Classification\nEnd-to-End United Video Dehazing and Detection\nConstant Space Complexity Environment Representation for Vision-based  Navigation\nUnsupervised Deep Homography: A Fast and Robust Homography Estimation  Model\nMeta Networks for Neural Style Transfer\nDensely tracking sequences of 3D face scans\nGLAD: Global-Local-Alignment Descriptor for Pedestrian Retrieval\nContrast Enhancement of Brightness-Distorted Images by Improved Adaptive  Gamma Correction\nAn Exploration of 2D and 3D Deep Learning Techniques for Cardiac MR  Image Segmentation\nA2-RL: Aesthetics Aware Reinforcement Learning for Image Cropping\nSubspace Clustering using Ensembles of $K$-Subspaces\nFood Recognition using Fusion of 
Classifiers based on CNNs\nOne-Shot Visual Imitation Learning via Meta-Learning\nOn Coordinate Minimization of Convex Piecewise-Affine Functions\nClickBAIT: Click-based Accelerated Incremental Training of Convolutional  Neural Networks\nMulti-scale Deep Learning Architectures for Person Re-identification\nMasquer Hunter: Adversarial Occlusion-aware Face Detection\nCystoid macular edema segmentation of Optical Coherence Tomography  images using fully convolutional neural networks and fully connected CRFs\nEmbedding Deep Networks into Visual Explanations\nNIMA: Neural Image Assessment\nLong-Term Ensemble Learning of Visual Place Classifiers\nAn Improved Fatigue Detection System Based on Behavioral Characteristics  of Driver\nNeural Affine Grayscale Image Denoising\nOrganizing Multimedia Data in Video Surveillance Systems Based on Face  Verification with Convolutional Neural Networks\nJoint Estimation of Camera Pose, Depth, Deblurring, and Super-Resolution  from a Blurred Image Sequence\nSim-to-real Transfer of Visuo-motor Policies for Reaching in Clutter:  Domain Randomization and Adaptation with Modular Networks\nWhere to Focus: Deep Attention-based Spatially Recurrent Bilinear  Networks for Fine-Grained Visual Recognition\nDirection-Aware Semi-Dense SLAM\nStairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection\nBeyond SIFT using Binary features for Loop Closure Detection\nMicroscopy Cell Segmentation via Adversarial Neural Networks\nContinuous Multimodal Emotion Recognition Approach for AVEC 2017\nDepression Scale Recognition from Audio, Visual and Text Analysis\nMulti-Task Learning for Segmentation of Building Footprints with Deep  Neural Networks\nAdaptive compressed 3D imaging based on wavelet trees and Hadamard  multiplexing with a single photon counting detector\nTowards CNN map representation and compression for camera relocalisation\nMulti-Person Pose Estimation via Column Generation\nLS-VO: Learning Dense Optical Subspace for Robust Visual 
Odometry  Estimation\nTarget-adaptive CNN-based pansharpening\nRotation Adaptive Visual Object Tracking with Motion Consistency\nFiber-Flux Diffusion Density for White Matter Tracts Analysis:  Application to Mild Anomalies Localization in Contact Sports Players\nWhen is a Convolutional Filter Easy To Learn?\nLook Wider to Match Image Patches with Convolutional Neural Networks\nExploring Human-like Attention Supervision in Visual Question Answering\nHuman Action Forecasting by Learning Task Grammars\nAutomatic Leaf Extraction from Outdoor Images\nHuman Activity Recognition Using Robust Adaptive Privileged  Probabilistic Learning\nImage operator learning coupled with CNN classification and its  application to staff line removal\nLearning to Detect Violent Videos using Convolutional Long Short-Term  Memory\nCurriculum Learning of Visual Attribute Clusters for Multi-Task  Classification\nSegFlow: Joint Learning for Video Object Segmentation and Optical Flow\nReal-time Semantic Segmentation of Crop and Weed for Precision  Agriculture Robots Leveraging Background Knowledge in CNNs\nLatent Embeddings for Collective Activity Recognition\nUnDeepVO: Monocular Visual Odometry through Unsupervised Deep Learning\nEstimated Depth Map Helps Image Classification\nSceneCut: Joint Geometric and Object Segmentation for Indoor Scenes\nTemporal Multimodal Fusion for Video Emotion Classification in the Wild\nConvolutional neural networks that teach microscopes how to image\nAffordanceNet: An End-to-End Deep Learning Approach for Object  Affordance Detection\nEfficient Column Generation for Cell Detection and Segmentation\nClass-Splitting Generative Adversarial Networks\nVirtual Blood Vessels in Complex Background using Stereo X-ray Images\nNovel Evaluation Metrics for Seam Carving based Image Retargeting\nHappy Travelers Take Big Pictures: A Psychological Study with Machine  Learning and Big Data\nLearning to Generate Time-Lapse Videos Using Multi-Stage Dynamic  Generative Adversarial 
Networks\nDemography-based Facial Retouching Detection using Subclass Supervised  Sparse Autoencoder\nSwGridNet: A Deep Convolutional Neural Network based on Grid Topology  for Image Classification\nReal-time 3D Shape Instantiation from Single Fluoroscopy Projection for  Fenestrated Stent Graft Deployment\nOn Encoding Temporal Evolution for Real-time Action Prediction\nMR Acquisition-Invariant Representation Learning\nA semi-automated segmentation method for detection of pulmonary embolism  in True-FISP MRI sequences\nSemi-Supervised Hierarchical Semantic Object Parsing\nA Generic Regression Framework for Pose Recognition on Color and Depth  Images\nDomain Adaptation from Synthesis to Reality in Single-model Detector for  Video Smoke Detection\nComparison of Batch Normalization and Weight Normalization Algorithms  for the Large-scale Image Classification\nSurvey of Recent Advances in Visual Question Answering\n3D Camouflaging Object using RGB-D Sensors\nPose-driven Deep Convolutional Model for Person Re-identification\n3D Textured Model Encryption via 3D Lu Chaotic Mapping\nDeep Sparse Subspace Clustering\nVariational Reflectance Estimation from Multi-view Images\nCamera-Aware Multi-Resolution Analysis (CAMRA) for Raw Sensor Data  Compression\nTowards End-to-End Car License Plates Detection and Recognition with  Deep Neural Networks\nLearning to Inpaint for Image Compression\nLearning Multi-grid Generative ConvNets by Minimal Contrastive  Divergence\nLearning to Label Affordances from Simulated and Real Data\nMulti-layer Visualization for Medical Mixed Reality\nRegion-Based Image Retrieval Revisited\nAugmented Robust PCA For Foreground-Background Separation on Noisy,  Moving Camera Video\nSignature Verification Approach using Fusion of Hybrid Texture Features\nLight field super resolution through controlled micro-shifts of light  field sensor\nFoodNet: Recognizing Foods Using Ensemble of Deep Networks\nANSAC: Adaptive Non-minimal Sample and 
Consensus\nPhotorealistic Style Transfer with Screened Poisson Equation\nX-View: Graph-Based Semantic Multi-View Localization\nDeep Competitive Pathway Networks\nRobust Photometric Stereo Using Learned Image and Gradient Dictionaries\nUnsupervised Domain Adaptation with Copula Models\nPCANet-II: When PCANet Meets the Second Order Pooling\nUnsupervised Segmentation of Action Segments in Egocentric Videos using  Gaze\nUnsupervised Classification of Intrusive Igneous Rock Thin Section  Images using Edge Detection and Colour Analysis\nRobust Surface Reconstruction from Gradients via Adaptive Dictionary  Regularization\nDeepWheat: Estimating Phenotypic Traits from Crop Images with Deep  Learning\nImage Dehazing using Bilinear Composition Loss Function\nAdaptive Smoothing in fMRI Data Processing Neural Networks\nDeep Convolutional Neural Networks for Interpretable Analysis of EEG  Sleep Stage Scoring\nConditional Chromatic Filtering for Restoring Pansharpened Images\nA Study of Cross-domain Generative Models applied to Cartoon Series\nNeural Color Transfer between Images\nEnd-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech\nGP-GAN: Gender Preserving GAN for Synthesizing Faces from Landmarks\nA concatenating framework of shortcut convolutional neural networks\nResolution limits on visual speech recognition\nDetection of Inferior Myocardial Infarction using Shallow Convolutional  Neural Networks\nSpeaker-independent machine lip-reading with speaker-dependent viseme  classifiers\nWide and deep volumetric residual networks for volumetric image  classification\nAdaptive Measurement Network for CS Image Reconstruction\nDeep learning for source camera identification on mobile devices\nSpinal cord gray matter segmentation using deep dilated convolutions\nBodyDigitizer: An Open Source Photogrammetry-based 3D Body Scanner\nEffective Image Differencing with ConvNets for Real-time Transient  Hunting\nImage Labeling Based on Graphical Models Using Wasserstein 
Messages and  Geometric Assignment\nContext Embedding Networks\nPlane-extraction from depth-data using a Gaussian mixture regression  model\nIntegrating Boundary and Center Correlation Filters for Visual Tracking  with Aspect Ratio Variation\nOnline Photometric Calibration for Auto Exposure Video for Realtime  Visual Odometry and SLAM\nDetecting the Moment of Completion: Temporal Models for Localising  Action Completion\nReal-Time Illegal Parking Detection System Based on Deep Learning\nA Transfer-Learning Approach for Accelerated MRI using Deep Neural  Networks\nA New Spectral Clustering Algorithm\nMicro-Expression Spotting: A Benchmark\nOn Matching Skulls to Digital Face Images: A Preliminary Approach\nVisual Servoing of Unmanned Surface Vehicle from Small Tethered Unmanned  Aerial Vehicle\nIsland Loss for Learning Discriminative Features in Facial Expression  Recognition\niVQA: Inverse Visual Question Answering\nReal-Time Action Detection in Video Surveillance using Sub-Action  Descriptor with Multi-CNN\nAdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text  Recognition\nDocEmul: a Toolkit to Generate Structured Historical Documents\nDetect to Track and Track to Detect\nDeep learning in remote sensing: a review\nA Review of Convolutional Neural Networks for Inverse Problems in  Imaging\nFFDNet: Toward a Fast and Flexible Solution for CNN based Image  Denoising\nImage retargeting via Beltrami representation\nDeep Semantic Abstractions of Everyday Human Activities: On Commonsense  Representations of Human Interactions\nLocal Radon Descriptors for Image Search\nRecognizing Daily Activities from Egocentric Photo-Streams\nGUIDES - Geospatial Urban Infrastructure Data Engineering Solutions\nJoint Image Filtering with Deep Convolutional Networks\nSelf-Taught Support Vector Machine\nResidual Connections Encourage Iterative Inference\nRetinal Fluid Segmentation and Detection in Optical Coherence Tomography  Images using Fully Convolutional Neural 
Network\nRetinal Vasculature Segmentation Using Local Saliency Maps and  Generative Adversarial Networks For Image Super Resolution\nRecent Advances in Zero-shot Recognition\nObject Classification in Images of Neoclassical Artifacts Using Deep  Learning\nSkin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017  International Symposium on Biomedical Imaging (ISBI), Hosted by the  International Skin Imaging Collaboration (ISIC)\nAutomatic Detection and Uncertainty Quantification of Landmarks on  Elastic Curves\nImproving Shadow Suppression for Illumination Robust Face Recognition\nAn adaptive thresholding approach for automatic optic disk segmentation\nK-means clustering for efficient and robust registration of multi-view  point sets\nCNNComparator: Comparative Analytics of Convolutional Neural Networks\nA multi-branch convolutional neural network for detecting double JPEG  compression\nWhat is (missing or wrong) in the scene? A Hybrid Deep Boltzmann Machine  For Contextualized Scene Modeling\nA Survey on Optical Character Recognition System\nVehicle classification based on convolutional networks applied to FM-CW  radar signals\nConvolutional Neural Networks for Histopathology Image Classification:  Training vs. Using Pre-Trained Networks\nIsointense Infant Brain Segmentation with a Hyper-dense Connected  Convolutional Neural Network\nGradient-free Policy Architecture Search and Adaptation\nVolumetric Data Exploration with Machine Learning-Aided Visualization in  Neutron Science\nCombining LiDAR Space Clustering and Convolutional Neural Networks for  Pedestrian Detection\nMulti-Task Domain Adaptation for Deep Learning of Instance Grasping from  Simulation\nSuperpixels Based Marker Tracking Vs. 
Hue Thresholding In Rodent  Biomechanics Application\nScene Parsing with Global Context Embedding\nLearning Deep Context-aware Features over Body and Latent Parts for  Person Re-identification\nCell Segmentation in 3D Confocal Images using Supervoxel Merge-Forests  with CNN-based Hypothesis Selection\nImage Restoration by Iterative Denoising and Backward Projections\nVisDA: The Visual Domain Adaptation Challenge\nImproved Search in Hamming Space using Deep Multi-Index Hashing\nGenerative Adversarial Networks: An Overview\nDeep Self-taught Learning for Remote Sensing Image Classification\nSea Level Anomaly Prediction using Recurrent Neural Networks\nCombining Multiple Views for Visual Speech Recognition\nDress like a Star: Retrieving Fashion Products from Videos\nInterpretable Transformations with Encoder-Decoder Networks\nHistorical Document Image Segmentation with LDA-Initialized Deep Neural  Networks\nSuperpixel Based Segmentation and Classification of Polyps in Wireless  Capsule Endoscopy\nEmploying Fusion of Learned and Handcrafted Features for Unconstrained  Ear Recognition\nGeneralized linear mixing model accounting for endmember variability\nLearning Discrete Weights Using the Local Reparameterization Trick\nAn efficient deep learning hashing neural network for mobile visual  search\nImage Disguise based on Generative Model\nDeep Neural Network Approximation using Tensor Sketching\nFeedback-prop: Convolutional Neural Network Inference under Partial  Evidence\nAccelerating GMM-based patch priors for image restoration: Three  ingredients for a 100$\\times$ speed-up\nAn iterative closest point method for measuring the level of similarity  of 3d log scans in wood industry\nImage Segmentation and Classification for Sickle Cell Disease using  Deformable U-Net\nInvestigating the feature collection for semantic segmentation via  single skip connection\nAmorphous Dynamic Partial Reconfiguration with Flexible Boundaries to  Remove Fragmentation\nThe ETH-MAV Team in 
the MBZ International Robotics Challenge\nListening to the World Improves Speech Command Recognition\nFully Context-Aware Video Prediction\nMax-Margin Invariant Features from Transformed Unlabeled Data\nOne pixel attack for fooling deep neural networks\nCompressive Online Robust Principal Component Analysis with Optical Flow  for Video Foreground-Background Separation\nSupervised Classification: Quite a Brief Overview\nLOOP Descriptor: Local Optimal Oriented Pattern\nGeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks\nOptimal Shrinkage of Singular Values Under Random Data Contamination\nLip2AudSpec: Speech reconstruction from silent lip movements video\nDynamic Routing Between Capsules\nHow far did we get in face spoofing detection?\nImage Compression: Sparse Coding vs. Bottleneck Autoencoders\nDeep Learning for Accelerated Ultrasound Imaging\nDual Skipping Networks\nTotal-Text: A Comprehensive Dataset for Scene Text Detection and  Recognition\nSeeThrough: Finding Chairs in Heavily Occluded Indoor Scene Images\nObject Recognition by Using Multi-level Feature Point Extraction\nA Novel Approach to Artistic Textual Visualization via GAN\nSynthetic Iris Presentation Attack using iDCGAN\nExamining CNN Representations with respect to Dataset Bias\nHigh-Precision Localization Using Ground Texture\nMultilinear Class-Specific Discriminant Analysis\nOn Pre-Trained Image Features and Synthetic Images for Deep Learning\nA Saak Transform Approach to Efficient, Scalable and Robust Handwritten  Digits Recognition\nCascade Region Proposal and Global Context for Deep Object Detection\nOpen Set Logo Detection and Retrieval\nLearning to solve inverse problems using Wasserstein loss\nThe loss surface and expressivity of deep convolutional neural networks\nSound Source Localization in a Multipath Environment Using Convolutional  Neural Networks\nOn the Taut String Interpretation of the One-dimensional  Rudin-Osher-Fatemi Model: A New Proof, a Fundamental Estimate and Some  
Applications\nDenoising random forests\nContinuous Authentication Using One-class Classifiers and their Fusion\nAn Integrated Approach to Crowd Video Analysis: From Tracking to  Multi-level Activity Recognition\nCrescendoNet: A Simple Deep Convolutional Neural Network with Ensemble  Behavior\nDeep word embeddings for visual speech recognition\nSpatio-temporal interaction model for crowd video analysis\nPhysics-guided Neural Networks (PGNN): An Application in Lake  Temperature Modeling\nDeep Hashing with Triplet Quantization Loss\nOptimal Resource Allocation in Distributed Broadband Wireless  Communication Systems\nMulti-Resolution Fully Convolutional Neural Networks for Monaural Audio  Source Separation\nA multi-layer network based on Sparse Ternary Codes for universal vector  compression\nMultiple Instance Hybrid Estimator for Hyperspectral Target  Characterization and Sub-pixel Target Detection\nCommon Representation Learning Using Step-based Correlation Multi-Modal  CNN\nCountering Adversarial Images using Input Transformations\nAccelerated Sparse Subspace Clustering\nSegmentation-by-Detection: A Cascade Network for Volumetric Medical  Image Segmentation\nA multitask deep learning model for real-time deployment in embedded  systems\nImproving Object Localization with Fitness NMS and Bounded IoU Loss\nLearning deep features for source color laser printer identification  based on cascaded learning\nQuery-free Clothing Retrieval via Implicit Relevance Feedback\nAcquiring Target Stacking Skills by Goal-Parameterized Deep  Reinforcement Learning\nComplex-valued image denosing based on group-wise complex-domain  sparsity\nUnderstanding and Predicting The Attractiveness of Human Action Shot\nStatistical evaluation of visual quality metrics for image denoising\nVariational Inference of Disentangled Latent Concepts from Unlabeled  Observations\nAutomatic Query Image Disambiguation for Content-Based Image Retrieval\nA Taught-Obesrve-Ask (TOA) Method for Object Detection 
with Critical  Supervision\nMulti-Glimpse LSTM with Color-Depth Feature Fusion for Human Detection\nMotion Artifact Detection in Confocal Laser Endomicroscopy Images\nEnd-to-end Flow Correlation Tracking with Spatial-temporal Attention\nDistributed Unmixing of Hyperspectral Data With Sparsity Constraint\nEnsembles of Multiple Models and Architectures for Robust Brain Tumour  Segmentation\nRegistration and Fusion of Multi-Spectral Images Using a Novel Edge  Descriptor\nAdversarial Dropout Regularization\nSimultaneous Joint and Object Trajectory Templates for Human Activity  Recognition from 3-D Data\nEnd-to-End Video Classification with Knowledge Graphs\nActive Learning for Visual Question Answering: An Empirical Study\nTowards Reverse-Engineering Black-Box Neural Networks\nRadical analysis network for zero-shot learning in printed Chinese  character recognition\nOptimal transport maps for distribution preserving operations on latent  spaces of Generative Models\nArtificial Generation of Big Data for Improving Image Classification: A  Generative Adversarial Network Approach on SAR Data\nCharacterizing Sparse Connectivity Patterns in Neural Networks\nAlpha-expansion is Exact on Stable Instances\nImage Segmentation of Multi-Shaped Overlapping Objects\nChallenges in Disentangling Independent Factors of Variation\nUnconstrained Scene Text and Video Text Recognition for Arabic Script\nFine-tuning CNN Image Retrieval with No Human Annotation\nFew-Shot Adversarial Domain Adaptation\nMoonshine: Distilling with Cheap Convolutions\nCompression-aware Training of Deep Networks\nLatent hypernet: Exploring all Layers from Convolutional Neural Networks\nRecurrent Autoregressive Networks for Online Multi-Object Tracking\nA New Hybrid-parameter Recurrent Neural Networks for Online Handwritten  Chinese Character Recognition\nRevealing structure components of the retina by deep learning networks\nLearning Sparse Visual Representations with Leaky Capped Norm  Regularizers\nOffline 
signature authenticity verification through unambiguously  connected skeleton segments\nMulti-stage Suture Detection for Robot Assisted Anastomosis based on  Deep Learning\nCyCADA: Cycle-Consistent Adversarial Domain Adaptation\nFast camera focus estimation for gaze-based focus control\nToward Depth Estimation Using Mask-Based Lensless Cameras\nPoverty Prediction with Public Landsat 7 Satellite Imagery and Machine  Learning\nBreast density classification with deep convolutional neural networks\nSelf-Supervised Intrinsic Image Decomposition\nA Fully Convolutional Tri-branch Network (FCTN) for Domain Adaptation\nSaliency Prediction for Mobile User Interfaces\nRobotic Tactile Perception of Object Properties: A Review\nMaterial Classification in the Wild: Do Synthesized Training Data  Generalise Better than Real-World Training Data?\nCARLA: An Open Urban Driving Simulator\nEddyNet: A Deep Neural Network For Pixel-Wise Classification of Oceanic  Eddies\nLongitudinal Study of Child Face Recognition\nDeepKSPD: Learning Kernel-matrix-based SPD Representation for  Fine-grained Image Recognition\nTowards ECDSA key derivation from deep embeddings for novel Blockchain  applications\n3D Randomized Connection Network with Graph-based Label Inference\nD-PCN: Parallel Convolutional Networks for Image Recognition via a  Discriminator\nRobust Image Registration via Empirical Mode Decomposition\nHigh-Order Attention Models for Visual Question Answering\nAn Automatic Diagnosis Method of Facial Acne Vulgaris Based on  Convolutional Neural Network\nLearning and Visualizing Localized Geometric Features Using 3D-CNN: An  Application to Manufacturability Analysis of Drilled Holes\nDenoising Imaging Polarimetry by an Adapted BM3D Method\nModeling Human Categorization of Natural Images Using Deep Feature  Representations\nGrab, Pay and Eat: Semantic Food Detection for Smart Restaurants\nSaliency-based Sequential Image Attention with Multiset Prediction\nCheXNet: Radiologist-Level Pneumonia 
Detection on Chest X-Rays with Deep  Learning\nLoss Functions for Multiset Prediction\nC-WSL: Count-guided Weakly Supervised Localization\nVelocity variations at Columbia Glacier captured by particle filtering  of oblique time-lapse images\nDeep Epitome for Unravelling Generalized Hamming Network: A Fuzzy Logic  Interpretation of Deep Learning\nMARGIN: Uncovering Deep Neural Networks using Graph Signal Analysis\nDNA-GAN: Learning Disentangled Representations from Multi-Attribute  Images\nPackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning\nEnd-to-end Training for Whole Image Breast Cancer Diagnosis using An All  Convolutional Design\nRobust and Precise Vehicle Localization based on Multi-sensor Fusion in  Diverse City Scenes\nModal Regression based Atomic Representation for Robust Face Recognition\nOcclusion Aware Unsupervised Learning of Optical Flow\nPriming Neural Networks\nLearning Deeply Supervised Visual Descriptors for Dense Monocular  Reconstruction\nHandSeg: A Dataset for Hand Segmentation from Depth Images\nLess-forgetful Learning for Domain Expansion in Deep Neural Networks\nLearning to Find Good Correspondences\nNatural Language Guided Visual Relationship Detection\nDeep Matching Autoencoders\nTwo Birds with One Stone: Transforming and Generating Facial Images with  Iterative GAN\nGrounded Objects and Interactions for Video Captioning\n3D Reconstruction of Incomplete Archaeological Objects Using a  Generative Adversarial Network\nShape Inpainting using 3D Generative Adversarial Network and Recurrent  Convolutional Networks\nDimensionality Reduction on Grassmannian via Riemannian Optimization: A  Generalized Perspective\nVoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection\nTraining a network to attend like human drivers saves it from common but  misleading loss functions\nUsing KL-divergence to focus Deep Visual Explanation\nChinese Typeface Transformation with Hierarchical Adversarial Network\nGrounding Visual 
Explanations (Extended Abstract)\nDetecting hip fractures with radiologist-level performance using deep  neural networks\nMulti-Label Zero-Shot Learning with Structured Knowledge Graphs\nEfficient Diverse Ensemble for Discriminative Co-Tracking\nLearning to Play Othello with Deep Neural Networks\nDeep Local Binary Patterns\nSuperpixels Based Segmentation and SVM Based Classification Method to  Distinguish Five Diseases from Normal Regions in Wireless Capsule Endoscopy\nDepth Assisted Full Resolution Network for Single Image-based View  Synthesis\nDriven to Distraction: Self-Supervised Distractor Learning for Robust  Monocular Visual Odometry in Urban Environments\nNeural Motifs: Scene Graph Parsing with Global Context\nMultiresolution and Hierarchical Analysis of Astronomical Spectroscopic  Cubes using 3D Discrete Wavelet Transform\nADVISE: Symbolism and External Knowledge for Decoding Advertisements\nFusing Bird View LIDAR Point Cloud and Front View Camera Image for Deep  Object Detection\nIntegrating Disparate Sources of Experts for Robust Image Denoising\nLearning SO(3) Equivariant Representations with Spherical CNNs\nA Color Quantization Optimization Approach for Image Representation  Learning\nDLTK: State of the Art Reference Implementations for Deep Learning on  Medical Images\nA novel total variation model based on kernel functions and its  application\nBPGrad: Towards Global Optimality in Deep Learning via Branch and  Pruning\nKill Two Birds with One Stone: Weakly-Supervised Neural Network for  Image Annotation and Tag Refinement\nMicroExpNet: An Extremely Small and Fast Model For Expression  Recognition From Frontal Face Images\nDeblurGAN: Blind Motion Deblurring Using Conditional Adversarial  Networks\nDiverse and Accurate Image Description Using a Variational Auto-Encoder  with an Additive Gaussian Encoding Space\nNon-line-of-sight Imaging with Partial Occluders and Surface Normals\nSpectral-Spatial Feature Extraction and Classification by ANN Supervised 
 with Center Loss in Hyperspectral Imagery\nEnd-to-end Trained CNN Encode-Decoder Networks for Image Steganography\nStochastic metamorphosis with template uncertainties\nMegDet: A Large Mini-Batch Object Detector\nOptical Character Recognition (OCR) for Telugu: Database, Algorithm and  Application\nFace Attention Network: An Effective Face Detector for the Occluded  Faces\nVerifying Neural Networks with Mixed Integer Programming\nMemory Based Online Learning of Deep Representations from Video Streams\nAttentive Explanations: Justifying Decisions and Pointing to the  Evidence (Extended Abstract)\nPixel-wise object tracking\nRobust Seed Mask Generation for Interactive Image Segmentation\nConvolutional Networks for Object Category and 3D Pose Estimation from  2D Images\nVirtual Adversarial Ladder Networks For Semi-supervised Learning\nNeural 3D Mesh Renderer\nResidual Parameter Transfer for Deep Domain Adaptation\nDiscussion among Different Methods of Updating Model Filter in Object  Tracking\nFunctional Map of the World\nAutoencoder Node Saliency: Selecting Relevant Latent Representations\nSilNet : Single- and Multi-View Reconstruction by Learning from  Silhouettes\nAperture Supervision for Monocular Depth Estimation\nWAYLA - Generating Images from Eye Movements\nDynamic High Resolution Deformable Articulated Tracking\nRelating Input Concepts to Convolutional Neural Network Decisions\nIdentifying Most Walkable Direction for Navigation in an Outdoor  Environment\nVideo Semantic Object Segmentation by Self-Adaptation of DCNN\nAn Analysis of Scale Invariance in Object Detection - SNIP\n3D Point Cloud Classification and Segmentation using 3D Modified Fisher  Vector Representation for Convolutional Neural Networks\nNeuron-level Selective Context Aggregation for Scene Segmentation\nConditional Image-Text Embedding Networks\nVITON: An Image-based Virtual Try-on Network\nMultiple component decomposition from millimeter single-channel data\nFrustum PointNets for 3D Object 
Detection from RGB-D Data\nLearning Deep Representations of Medical Images using Siamese CNNs with  Application to Content-Based Image Retrieval\nTemporal Relational Reasoning in Videos\nAdversarial Feature Augmentation for Unsupervised Domain Adaptation\nSGPN: Similarity Group Proposal Network for 3D Point Cloud Instance  Segmentation\nContextual Based Image Inpainting: Infer, Match and Translate\nRegularization of Deep Neural Networks with Spectral Dropout\nUnsupervised End-to-end Learning for Deformable Medical Image  Registration\nSelf-Reinforced Cascaded Regression for Face Alignment\nRobust Visual SLAM with Point and Line Features\nPrediction of the progression of subcortical brain structures in  Alzheimer's disease from baseline\n3D Based Landmark Tracker Using Superpixels Based Segmentation for  Neuroscience and Biomechanics Studies\nVisual Speech Enhancement\nA Dictionary Approach to Identifying Transient RFI\nReal-Time Seamless Single Shot 6D Object Pose Prediction\nWasserstein Introspective Neural Networks\nFeature Selective Networks for Object Detection\nSupervised Hashing with End-to-End Binary Deep Neural Network\nCatGAN: Coupled Adversarial Transfer for Domain Generation\nFor Your Eyes Only: Learning to Summarize First-Person Videos\nEnd-to-End Deep HDR Imaging with Large Foreground Motions\nStarGAN: Unified Generative Adversarial Networks for Multi-Domain  Image-to-Image Translation\nLong-Term On-Board Prediction of People in Traffic Scenes under  Uncertainty\nDistance to Center of Mass Encoding for Instance Segmentation\nEfficient and Invariant Convolutional Neural Networks for Dense  Prediction\nVideo Enhancement with Task-Oriented Flow\nDeep Extreme Cut: From Extreme Points to Object Segmentation\nCross-Domain Self-supervised Multi-task Feature Learning using Synthetic  Imagery\nGeometric robustness of deep networks: analysis and improvement\nConvolutional Image Captioning\nReal-Time Capable Micro-Doppler Signature Decomposition of Walking Human  
Limbs\nMultiple Instance Curriculum Learning for Weakly Supervised Object  Detection\nStructure-Aware and Temporally Coherent 3D Human Pose Estimation\nUnsupervised 3D Reconstruction from a Single Image via Adversarial  Learning\nIn2I : Unsupervised Multi-Image-to-Image Translation Using Generative  Adversarial Networks\nSemantically Consistent Image Completion with Fine-grained Details\nAutomatic Color Image Segmentation Using a Square Elemental Region-Based  Seeded Region Growing and Merging Method\nFeature Map Pooling for Cross-View Gait Recognition Based on Silhouette  Sequence Images\nPersonalized and Occupational-aware Age Progression by Generative  Adversarial Networks\nImproving the Adversarial Robustness and Interpretability of Deep Neural  Networks by Regularizing their Input Gradients\nMAVOT: Memory-Augmented Video Object Tracking\nDepth Map Completion by Jointly Exploiting Blurry Color Images and  Sparse Depth Maps\nQuery-Adaptive R-CNN for Open-Vocabulary Object Detection and Retrieval\nDeepDeblur: Fast one-step blurry face images restoration\nDynamic Graph Generation Network: Generating Relational Knowledge from  Diagrams\nFCLT - A Fully-Correlational Long-Term Tracker\nImproving OCR Accuracy on Early Printed Books by utilizing Cross Fold  Training and Voting\nOn the Robustness of Semantic Segmentation Models to Adversarial Attacks\nTensor Completion Algorithms in Big Data Analytics\nRestricting Greed in Training of Generative Adversarial Network\nMulti-stream 3D FCN with Multi-scale Deep Supervision for Multi-modality  Isointense Infant Brain MR Image Segmentation\nTracking for Half an Hour\nLearning Less is More - 6D Camera Localization via 3D Surface Regression\nDifferential Generative Adversarial Networks: Synthesizing Non-linear  Facial Variations with Limited Number of Training Data\n3D Semantic Segmentation with Submanifold Sparse Convolutional Networks\nBetween-class Learning for Image Classification\nMinimal-Entropy Correlation Alignment for 
Unsupervised Deep Domain  Adaptation\nCamera Style Adaptation for Person Re-identification\nSuper-Resolution for Overhead Imagery Using DenseNets and Adversarial  Learning\nLearning Face Age Progression: A Pyramid Architecture of GANs\nLearning to Segment Every Thing\nAn Adversarial Neuro-Tensorial Approach For Learning Disentangled  Representations\nEntropy-difference based stereo error detection\nA Recursive Bayesian Approach To Describe Retinal Vasculature Geometry\nFearNet: Brain-Inspired Model for Incremental Learning\nDeep-Person: Learning Discriminative Deep Features for Person  Re-Identification\nDo Convolutional Neural Networks act as Compositional Nearest Neighbors?\nRoad Extraction by Deep Residual U-Net\nAn Amateur Drone Surveillance System Based on Cognitive Internet of  Things\nPipeline Generative Adversarial Networks for Facial Images Generation  with Multiple Attributes\nBlind estimation of white Gaussian noise variance in highly textured  images\nSaliency Weighted Convolutional Features for Instance Search\nDeepSkeleton: Skeleton Map for 3D Human Pose Regression\nSparse Photometric 3D Face Reconstruction Guided by Morphable Models\nPointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation\nOcclusion-aware Hand Pose Estimation Using Hierarchical Mixture Density  Network\nJoint Blind Motion Deblurring and Depth Estimation of Light Field\nDeep Image Prior\nSaccade Sequence Prediction: Beyond Static Saliency Maps\nA fast nonconvex Compressed Sensing algorithm for highly low-sampled MR  images reconstruction\nProperties on n-dimensional convolution for image deconvolution\nA Closer Look at Spatiotemporal Convolutions for Action Recognition\nArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene\nA novel graph structure for salient object detection based on divergence  background and compact foreground\nUnsupervised Learning for Cell-level Visual Representation in  Histopathology Images with Generative Adversarial 
Networks\nRadially-Distorted Conjugate Translations\nImproving Video Generation for Multi-functional Applications\nAuxiliary Guided Autoregressive Variational Autoencoders\nRelation Networks for Object Detection\nHigh-Resolution Image Synthesis and Semantic Manipulation with  Conditional GANs\nParis-Lille-3D: a large and high-quality ground truth urban point cloud  dataset for automatic segmentation and classification\nGraph Distillation for Action Detection with Privileged Information\nBlind Gain and Phase Calibration via Sparse Spectral Methods\nLabel Efficient Learning of Transferable Representations across Domains  and Tasks\nVideo retrieval based on deep convolutional neural network\nDistance-based Camera Network Topology Inference for Person  Re-identification\nDelineation of Skin Strata in Reflectance Confocal Microscopy Images  using Recurrent Convolutional Networks with Toeplitz Attention\n3D Facial Action Units Recognition for Emotional Expression\nDeformable Shape Completion with Graph Convolutional Autoencoders\nGANosaic: Mosaic Creation with Generative Texture Manifolds\nSemi-Adversarial Networks: Convolutional Autoencoders for Imparting  Privacy to Face Images\nHierarchical Bayesian image analysis: from low-level modeling to robust  supervised learning\nPrecision Learning: Towards Use of Known Operators in Neural Networks\nSingle-Shot Object Detection with Enriched Semantics\nImage to Image Translation for Domain Adaptation\nLearning Neural Markers of Schizophrenia Disorder Using Recurrent Neural  Networks\nMulti-Content GAN for Few-Shot Font Style Transfer\nTowards understanding feedback from supermassive black holes using  convolutional neural networks\nLecture video indexing using boosted margin maximizing neural networks\nTaming Adversarial Domain Transfer with Structural Constraints for Image  Enhancement\nFrom Pixels to Object Sequences: Recurrent Semantic Instance  Segmentation\nCompressed Video Action Recognition\nGAGAN: Geometry-Aware 
Generative Adversarial Networks\nLow-Rank Tensor Completion by Truncated Nuclear Norm Regularization\nGradient Descent Learns One-hidden-layer CNN: Don't be Afraid of  Spurious Local Minima\nA Deep Learning Approach to Drone Monitoring\nData Dropout in Arbitrary Basis for Deep Network Regularization\nComposition-aided Sketch-realistic Portrait Generation\nDeep Learning Can Reverse Photon Migration for Diffuse Optical  Tomography\nDeep Sampling Networks\nFSSD: Feature Fusion Single Shot Multibox Detector\nGANerated Hands for Real-time 3D Hand Tracking from Monocular RGB\nSOT for MOT\nA Generalized Motion Pattern and FCN based approach for retinal fluid  detection and segmentation\nStructured Deep Neural Network Pruning via Matrix Pivoting\nIterative Deep Learning for Network Topology Extraction\nA Perceptual Measure for Deep Single Image Camera Calibration\nSfSNet : Learning Shape, Reflectance and Illuminance of Faces in the  Wild\nLong-Term Visual Object Tracking Benchmark\nA+D-Net: Shadow Detection with Adversarial Shadow Attenuation\nImagine it for me: Generative Adversarial Approach for Zero-Shot  Learning from Noisy Texts\nZone-based Keyword Spotting in Bangla and Devanagari Documents\nAdversarial Attribute-Image Person Re-identification\nFully Automatic Segmentation of Lumbar Vertebrae from CT Images using  Cascaded 3D Fully Convolutional Networks\nJoint Embedding and Classification for SAR Target Recognition\nManifold-valued Image Generation with Wasserstein Adversarial Networks\nDeep learning for semantic segmentation of remote sensing images with  rich spectral content\nDeep Learning for automatic sale receipt understanding\nOn Deterministic Sampling Patterns for Robust Low-Rank Matrix Completion\nCan CNNs Construct Highly Accurate Model Efficiently with Limited  Training Samples?\nFully-Convolutional Measurement Network for Compressive Sensing Image  Reconstruction\nColor Face Recognition using High-Dimension Quaternion-based Adaptive  Representation\nOpen 
Evaluation Tool for Layout Analysis of Document Images\nAn Ensemble of Deep Convolutional Neural Networks for Alzheimer's  Disease Detection and Classification\nAvaliação da doença de Alzheimer pela análise multiespectral  de imagens DW-MR por redes RBF como alternativa aos mapas ADC\nAutomated Pruning for Deep Neural Network Compression\nR-FCN-3000 at 30fps: Decoupling Detection and Classification\nTowards Recovery of Conditional Vectors from Conditional Generative  Adversarial Networks\nCo-domain Embedding using Deep Quadruplet Networks for Unseen Traffic  Sign Recognition\niPose: Instance-Aware 6D Pose Estimation of Partly Occluded Objects\nBlind Image Deblurring Using Row-Column Sparse Representations\nLearning Latent Super-Events to Detect Multiple Activities in Videos\nLearning to Forecast Videos of Human Activity with Multi-granularity  Models and Adaptive Rendering\nWhat's in my closet?: Image classification using fuzzy logic\nLearning Semantic Concepts and Order for Image and Sentence Matching\nAutomatic Segmentation and Overall Survival Prediction in Gliomas using  Fully Convolutional Neural Network and Texture Analysis\nLung Nodule Classification by the Combination of Fusion Classifier and  Cascaded Convolutional Neural Networks\nStretching Domain Adaptation: How far is too far?\nJoint 3D Proposal Generation and Object Detection from View Aggregation\nFrom Lifestyle Vlogs to Everyday Interactions\nTop-down Flow Transformer Networks\nCURE-TSR: Challenging Unreal and Real Environments for Traffic Sign  Recognition\nAdversarial Examples that Fool Detectors\nTake it in your stride: Do we need striding in CNNs?\nConsistent Multiple Graph Matching with Multi-layer Random Walks  Synchronization\nUsing SVDD in SimpleMKL for 3D-Shapes Filtering\nCreating Capsule Wardrobes from Fashion Images\nUsing Rule-Based Labels for Weak Supervised Learning: A ChemNet for  Transferable Chemical Property Prediction\nPer-Pixel Feedback for improving Semantic Segmentation\nMoDL: 
Model Based Deep Learning Architecture for Inverse Problems\nLearned Perceptual Image Enhancement\nMulti-Scale Video Frame-Synthesis Network with Transitive Consistency  Loss\nExploiting Modern Hardware for High-Dimensional Nearest Neighbor Search\nChaining Identity Mapping Modules for Image Denoising\nDense Optical Flow based Change Detection Network Robust to Difference  of Camera Viewpoints\nCycleGAN, a Master of Steganography\nCompact Hash Code Learning with Binary Deep Neural Network\nDirect and Real-Time Cardiovascular Risk Prediction\nWeaving Multi-scale Context for Single Shot Detector\nCombining Deep Universal Features, Semantic Attributes, and Hierarchical  Classification for Zero-Shot Learning\nMinimal Solvers for Monocular Rolling Shutter Compensation under  Ackermann Motion\nTransformational Sparse Coding\nIQA: Visual Question Answering in Interactive Environments\nBayesian Joint Matrix Decomposition for Data Integration with  Heterogeneous Noise\nA Deep Recurrent Framework for Cleaning Motion Capture Data\nVisual aesthetic analysis using deep neural network: model and  techniques to increase accuracy without transfer learning\nSingle-Shot Multi-Person 3D Body Pose Estimation From Monocular RGB  Input\nFHEDN: A based on context modeling Feature Hierarchy Encoder-Decoder  Network for face detection\nThe Effectiveness of Data Augmentation for Detection of Gastrointestinal  Diseases from Endoscopical Images\nDomain Adaptation Using Adversarial Learning for Autonomous Navigation\nIdentifying the Mislabeled Training Samples of ECG Signals using Machine  Learning\nUnsupervised Feature Learning for Audio Analysis\nUsing a single RGB frame for real time 3D hand pose estimation in the  wild\nGeneralized Zero-Shot Learning via Synthesized Examples\nMINOS: Multimodal Indoor Simulator for Navigation in Complex  Environments\nStrassenNets: Deep learning with a multiplication budget\nA GRU-based Encoder-Decoder Approach with Attention for Online  Handwritten 
Mathematical Expression Recognition\nEye In-Painting with Exemplar Generative Adversarial Networks\nInvestigating the Impact of Data Volume and Domain Similarity on  Transfer Learning Applications\nLearning Compressible 360° Video Isomers\nIm2Flow: Motion Hallucination from Static Images for Action Recognition\nDirection-aware Spatial Context Features for Shadow Detection\nBenchmarking Single Image Dehazing and Beyond\nConditional Generative Adversarial Networks for Emoji Synthesis with  Word Embedding Manipulation\n3D Object Classification via Spherical Projections\nData Distillation: Towards Omni-Supervised Learning\nImage Registration for the Alignment of Digitized Historical Documents\nFingerprint Spoof Buster\nCamera Calibration for Daylight Specular-Point Locus\nIm2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of  View\nTransfer Adversarial Hashing for Hamming Space Retrieval\nThe Effectiveness of Data Augmentation in Image Classification using  Deep Learning\nStochastic Low-Rank Bandits\nLearning Disentangling and Fusing Networks for Face Completion Under  Structured Occlusions\nGMM-Based Synthetic Samples for Classification of Hyperspectral Images  With Limited Training Data\nSymbol detection in online handwritten graphics using Faster R-CNN\nMaskLab: Instance Segmentation by Refining Object Detection with  Semantic and Direction Features\nPediatric Bone Age Assessment Using Deep Convolutional Neural Networks\nMentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels\nWeakly Supervised Action Localization by Sparse Temporal Pooling Network\nExtreme 3D Face Reconstruction: Seeing Through Occlusions\nRobust Estimation of Similarity Transformation for Visual Object  Tracking with Correlation Filters\nPointwise Convolutional Neural Networks\nSEE: Towards Semi-Supervised End-to-End Scene Text Recognition\nRAN4IQA: Restorative Adversarial Nets for No-Reference Image Quality  Assessment\nTransfer Learning for OCRopus Model Training 
on Early Printed Books\nPre-training Attention Mechanisms\nUnsupervised Domain Adaptation for 3D Keypoint Prediction from a Single  Depth Scan\nA novel nonconvex approach to recover the low-tubal-rank tensor data:  when t-SVD meets PSSV\nImpression Network for Video Object Detection\nLearning a Single Convolutional Super-Resolution Network for Multiple  Degradations\nVisual Explanations from Hadamard Product in Multimodal Deep Networks\nPanoramic Robust PCA for Foreground-Background Separation on Noisy,  Free-Motion Camera Video\nSpace-Filling Curve Indices as Acceleration Structure for Exemplar-Based  Inpainting\nLearning to Write Stylized Chinese Characters by Reading a Handful of  Examples\nSuper-Resolution with Deep Adaptive Image Resampling\nGuiding human gaze with convolutional neural networks\nMulti-point Vibration Measurement for Mode Identification of Bridge  Structures using Video-based Motion Magnification\nObjects that Sound\nHierarchical Cross Network for Person Re-identification\nComparison of fingerprint authentication algorithms for small imaging  sensors\nLearning Fixation Point Strategy for Object Detection and Classification\nOn the Evaluation of Video Keyframe Summaries using User Ground Truth\nComboGAN: Unrestrained Scalability for Image Domain Translation\nScale-Space Anisotropic Total Variation for Limited Angle Tomography\nAutomatic Renal Segmentation in DCE-MRI using Convolutional Neural  Networks\nAdversarial Examples: Attacks and Defenses for Deep Learning\nReal-time 3D Reconstruction on Construction Site using Visual SLAM and  UAV\nReal-time deep hair matting on mobile devices\nY-net: 3D intracranial artery segmentation using a convolutional  autoencoder\nDeep Regression Forests for Age Estimation\nHyperparameters Optimization in Deep Convolutional Neural Network /  Bayesian Approach with Gaussian Process Prior\nFoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation\nLearning Sight from Sound: Ambient Sound Provides Supervision 
for Visual  Learning\nLVreID: Person Re-Identification with Long Sequence Videos\nLost in Time: Temporal Analytics for Long-Term Video Surveillance\nOn the Diversity of Realistic Image Synthesis\nIncremental Adversarial Domain Adaptation for Continually Changing  Environments\nAccurate 3D Reconstruction of Dynamic Scenes from Monocular Image  Sequences with Severe Occlusions\nAttribute CNNs for Word Spotting in Handwritten Documents\nPartial Labeled Gastric Tumor Segmentation via patch-based Reiterative  Learning\nLearning to Act Properly: Predicting and Explaining Affordances from  Images\nImage Segmentation to Distinguish Between Overlapping Human Chromosomes\nDeep metric learning for multi-labelled radiographs\nAdversarial Synthesis Learning Enables Segmentation Without Target  Modality Ground Truth\nEnhance Visual Recognition under Adverse Conditions via Deep Networks\nContext-Aware Semantic Inpainting\nExploring Models and Data for Remote Sensing Image Caption Generation\nSimulating Patho-realistic Ultrasound Images using Deep Generative  Networks with Adversarial Learning\nHuman Action Recognition: Pose-based Attention draws focus to Hands\nLearning Intelligent Dialogs for Bounding Box Annotation\nA Deep Learning Interpretable Classifier for Diabetic Retinopathy  Disease Grading\nUnifying Map and Landmark Based Representations for Visual Navigation\nSmart, Sparse Contours to Represent and Edit Images\nUsing LIP to Gloss Over Faces in Single-Stage Face Detection Networks\nRecurrent Pixel Embedding for Instance Grouping\nCSGNet: Neural Shape Parser for Constructive Solid Geometry\nDeep Hashing with Category Mask for Fast Video Retrieval\nSimple Methods for Scanner Drift Normalization Validated for Automatic  Segmentation of Knee Magnetic Resonance Imaging - with data from the  Osteoarthritis Initiative\nEvaluation of PPG Biometrics for Authentication in different states\nDenoising of image gradients and total generalized variation denoising\nBoundary-sensitive 
Network for Portrait Segmentation\nAerial Spectral Super-Resolution using Conditional Adversarial Networks\nCombining Weakly and Webly Supervised Learning for Classifying Food  Images\nLarge-Scale Object Discovery and Detector Adaptation from Unlabeled  Video\nUse of Generative Adversarial Network for Cross-Domain Change Detection\nRIDI: Robust IMU Double Integration\nDeep Blind Image Inpainting\nDeep Meta Learning for Real-Time Visual Tracking based on  Target-Specific Feature Space\nSegmenting Sky Pixels in Images\nDetect-and-Track: Efficient Pose Estimation in Videos\nAircraft Fuselage Defect Detection using Deep Neural Networks\nLarge-Scale 3D Scene Classification With Multi-View Volumetric CNN\nAudio to Body Dynamics\nRaspiReader: Open Source Fingerprint Reader\nRobust Minutiae Extractor: Integrating Deep Networks and Fingerprint  Domain Knowledge\nEventness: Object Detection on Spectrograms for Temporal Localization of  Audio Events\nEfficient Parallel Connected Components Labeling with a Coarse-to-fine  Strategy\nSiamese LSTM based Fiber Structural Similarity Network (FS2Net) for  Rotation Invariant Brain Tractography Segmentation\nA Multi-Scale and Multi-Depth Convolutional Neural Network for Remote  Sensing Imagery Pan-Sharpening\nVisualizing the Loss Landscape of Neural Nets\nRapid Adaptation with Conditionally Shifted Neurons\nLearning Deep and Compact Models for Gesture Recognition\nEstimation under group actions: recovering orbits from invariants\nPolyp detection inside the capsule endoscopy: an approach for power  consumption reduction\nDense Fully Convolutional Network for Skin Lesion Segmentation\nScanComplete: Large-Scale Scene Completion and Semantic Segmentation for  3D Scans\nDeformable GANs for Pose-based Human Image Generation\nA Unified Method for First and Third Person Action Recognition\nIntegrating semi-supervised label propagation and random forests for  multi-atlas based hippocampus segmentation\nContext aware saliency map generation 
using semantic segmentation\nInteractive Video Object Segmentation in the Wild\nDeep Stacked Networks with Residual Polishing for Image Inpainting\nAdversarial Generative Nets: Neural Network Attacks on State-of-the-Art  Face Recognition\nSenseNet: 3D Objects Database and Tactile Simulator\nTheoretical Analysis of Sparse Subspace Clustering with Missing Entries\nSemantic Segmentation of Human Thigh Quadriceps Muscle in Magnetic  Resonance Images\nLearning Deep Structured Multi-Scale Features using Attention-Gated CRFs  for Contour Prediction\nScene-Adapted Plug-and-Play Algorithm with Guaranteed Convergence:  Applications to Data Fusion in Imaging\nImage denoising through bivariate shrinkage function in framelet domain\nRestricted Deformable Convolution based Road Scene Semantic Segmentation  Using Surround View Cameras\nOptimal Bayesian Transfer Learning\nPanoptic Segmentation\nRecovery of Point Clouds on Surfaces: Application to Image  Reconstruction\nRecovery of Noisy Points on Band-limited Surfaces: Kernel Methods  Re-explained\nInstance Embedding Transfer to Unsupervised Video Object Segmentation\nJoint convolutional neural pyramid for depth map super-resolution\nSpot the Difference by Object Detection\n3D Face Reconstruction with Region Based Best Fit Blending Using Mobile  Phone for Virtual Reality Based Social Media\nICFVR 2017: 3rd International Competition on Finger Vein Recognition\nPixelLink: Detecting Scene Text via Instance Segmentation\nSmartTennisTV: Automatic indexing of tennis videos\nIMU2Face: Real-time Gesture-driven Facial Reenactment\nQuantifying Translation-Invariance in Convolutional Neural Networks\nAdaptive kNN using Expected Accuracy for Classification of Geo-Spatial  Data\nDeep Cross Polarimetric Thermal-to-visible Face Recognition\nObject Referring in Videos with Language and Human Gaze\nCombination of Hyperband and Bayesian Optimization for Hyperparameter  Optimization in Deep Learning\nTotal Capture: A 3D Deformation Model for 
Tracking Faces, Hands, and  Bodies\nDeep learning for word-level handwritten Indic script identification\nEfficient Image Evidence Analysis of CNN Classification Results\nGatekeeping Algorithms with Human Ethical Bias: The ethics of algorithms  in archives, libraries and society\n3D-DETNet: a Single Stage Video-Based Vehicle Detector\nHi-Fi: Hierarchical Feature Integration for Skeleton Detection\nA First Step in the Co-Evolution of Blockchain and Ontologies: Towards  Engineering an Ontology of Governance at the Blockchain Protocol Level\nForeground Segmentation Using a Triplet Convolutional Neural Network for  Multiscale Feature Encoding\nDeep Crisp Boundaries: From Boundaries to Higher-level Tasks\nBridging the Gap: Simultaneous Fine Tuning for Data Re-Balancing\nBoundary Optimizing Network (BON)\nTowards Multi-Object Detection and Tracking in Urban Scenario under  Uncertainties\nBrain MRI Super Resolution Using 3D Deep Densely Connected Neural  Networks\nTextBoxes++: A Single-Shot Oriented Scene Text Detector\nDeepStyle: Multimodal Search Engine for Fashion and Interior Design\nEBIC: an artificial intelligence-based parallel biclustering algorithm  for pattern discovery\nMeta-Tracker: Fast and Robust Online Adaptation for Visual Object  Trackers\nInstance Map based Image Synthesis with a Denoising Generative  Adversarial Network\nUnsupervised Despeckling\nInferring a Third Spatial Dimension from 2D Histological Images\nMulti-Scale Attention with Dense Encoder for Handwritten Mathematical  Expression Recognition\nSegment-based Methods for Facial Attribute Detection from Partial Faces\nFrom Superpixel to Human Shape Modelling for Carried Object Detection\nCortical-inspired image reconstruction via sub-Riemannian geometry and  hypoelliptic diffusion\nMulti-view Consistency as Supervisory Signal for Learning Shape and Pose  Prediction\nBrain Age Prediction Based on Resting-State Functional Connectivity  Patterns Using Convolutional Neural Networks\nHow should a 
fixed budget of dwell time be spent in scanning electron  microscopy to optimize image quality?\nGenerative Single Image Reflection Separation\nQuickNAT: Segmenting MRI Neuroanatomy in 20 seconds\nConditional Probability Models for Deep Image Compression\nDeep saliency: What is learnt by a deep network about saliency?\nSize-to-depth: A New Perspective for Single Image Depth Estimation\nNon-Parametric Transformation Networks\nCooperative Multi-Agent Reinforcement Learning for Low-Level Wireless  Communication\nDeep Metric Learning with BIER: Boosting Independent Embeddings Robustly\nClassification of histopathological breast cancer images using iterative  VMD aided Zernike moments & textural signatures\nCircular Antenna Array Design for Breast Cancer Detection\nInferring Semantic Layout for Hierarchical Text-to-Image Synthesis\nDeep Multi-Spectral Registration Using Invariant Descriptor Learning\nLong-term Visual Localization using Semantically Segmented Images\nAutonomous Driving in Reality with Reinforcement Learning and Image  Translation\nBenchmark Visual Question Answer Models by using Focus Map\nRe-ID done right: towards good practices for person re-identification\nCahn--Hilliard inpainting with the double obstacle potential\nSemi-supervised FusedGAN for Conditional Image Generation\nGradient-Based Meta-Learning with Learned Layerwise Metric and Subspace\nAdditive Margin Softmax for Face Verification\nMulti-View Stereo 3D Edge Reconstruction\nFaster gaze prediction with dense networks and Fisher pruning\nSparsely Connected Convolutional Networks\nExtend the shallow part of Single Shot MultiBox Detector via  Convolutional Neural Network\nBinaryRelax: A Relaxation Approach For Training Deep Neural Networks  With Quantized Weights\nPiggyback: Adapting a Single Network to Multiple Tasks by Learning to  Mask Weights\nHow would surround vehicles move? 
A Unified Framework for Maneuver  Classification and Motion Prediction\nVisualization of Hyperspectral Images Using Moving Least Squares\nStructured Inhomogeneous Density Map Learning for Crowd Counting\nDeepISP: Learning End-to-End Image Processing Pipeline\nBoundary-based Image Forgery Detection by Fast Shallow CNN\nPU-Net: Point Cloud Upsampling Network\nDeep joint rain and haze removal from single images\nDense Recurrent Neural Networks for Scene Labeling\nTowards Automated Tuberculosis detection using Deep Learning\nE-swish: Adjusting Activations to Different Network Depths\nFluorescence Microscopy Image Segmentation Using Convolutional Neural  Network With Generative Adversarial Networks\nDiscrimNet: Semi-Supervised Action Recognition from Videos using  Generative Adversarial Networks\nLow-level Active Visual Navigation: Increasing robustness of  vision-based localization using potential fields\nVehicle Detection in Aerial Images\nNumerical Coordinate Regression with Convolutional Neural Networks\nNovel digital tissue phenotypic signatures of distant metastasis in  colorectal cancer\nSurvey on Emotional Body Gesture Recognition\nStatistically Motivated Second Order Pooling\nSide Information for Face Completion: a Robust PCA Approach\nTowards Low-Latency and Ultra-Reliable Virtual Reality\nHuman Activity Recognition for Mobile Robot\nArcFace: Additive Angular Margin Loss for Deep Face Recognition\nPointCNN\nFeeding Hand-Crafted Features for Enhancing the Performance of  Convolutional Neural Networks\nDeep Structured Energy-Based Image Inpainting\nNear-lossless L-infinity constrained Multi-rate Image Decompression via  Deep Neural Network\nUnsupervised learning from videos using temporal coherency deep networks\nDVQA: Understanding Data Visualizations via Question Answering\nClass label autoencoder for zero-shot learning\nDual Asymmetric Deep Hashing Learning\nUnderstanding Human Behaviors in Crowds by Imitating the Decision-Making  Process\nSelf-Learning to 
Detect and Segment Cysts in Lung CT Images without  Manual Annotation\nDeep Learning for End-to-End Automatic Target Recognition from Synthetic  Aperture Radar Imagery\nGenerating Handwritten Chinese Characters using CycleGAN\nCloud Detection From RGB Color Remote Sensing Images With Deep Pyramid  Networks\nWeakly Supervised Object Detection with Pointwise Mutual Information\n3D Scanning: A Comprehensive Survey\nDeflecting Adversarial Attacks with Pixel Deflection\nEfficient Hierarchical Graph-Based Segmentation of RGBD Videos\nA Two-point Method for PTZ Camera Calibration in Sports\nEar Recognition With Score-Level Fusion Based On CMC In Long-Wave  Infrared Spectrum\nA Multi-Biometrics for Twins Identification Based Speech and Ear\nFine-grained Visual Categorization using PAIRS: Pose and Appearance  Integration for Recognizing Subcategories\nInteractive Deep Colorization With Simultaneous Global and Local Inputs\nInteractive Generative Adversarial Networks for Facial Expression  Generation in Dyadic Interactions\nMeshed Up: Learnt Error Correction in 3D Reconstructions\nImproved Training of Generative Adversarial Networks Using  Representative Features\nDocument Image Classification with Intra-Domain Transfer Learning and  Stacked Generalization of Deep Convolutional Neural Networks\nLocal Visual Microphones: Improved Sound Extraction from Silent Video\nHyper-Hue and EMAP on Hyperspectral Images for Supervised Layer  Decomposition of Old Master Drawings\nLearning-based Image Reconstruction via Parallel Proximal Algorithm\nEnd-to-End Fine-Grained Action Segmentation and Recognition Using  Conditional Random Field Models and Discriminative Sparse Coding\nDenoising Arterial Spin Labeling Cerebral Blood Flow Images Using Deep  Learning\nPredicting Rapid Fire Growth (Flashover) Using Conditional Generative  Adversarial Networks\nObject Detection in Videos by Short and Long Range Object Linking\nOpen3D: A Modern Library for 3D Data Processing\nStructured Memory based 
Deep Model to Detect as well as Characterize  Novel Inputs\nSliding Line Point Regression for Shape Robust Scene Text Detection\nMalaria Detection Using Image Processing and Machine Learning\nSegDenseNet: Iris Segmentation for Pre and Post Cataract Surgery\nRiemannian Walk for Incremental Learning: Understanding Forgetting and  Intransigence\nNetizen-Style Commenting on Fashion Photos: Dataset and Diversity  Measures\nA Deep Ranking Model for Spatio-Temporal Highlight Detection from a 360  Video\nAn Infinitesimal Probabilistic Model for Principal Component Analysis of  Manifold Valued Data\nRobust 3D Human Motion Reconstruction Via Dynamic Template Construction\nFrom Benedict Cumberbatch to Sherlock Holmes: Character Identification  in TV series without a Script\nCounting Cells in Time-Lapse Microscopy using Deep Neural Networks\nImproved Image Segmentation via Cost Minimization of Multiple Hypotheses\nCross-domain CNN for Hyperspectral Image Classification\nDeep Learning with Data Dependent Implicit Activation Function\nFull Image Recover for Block-Based Compressive Sensing\nClustering and Unsupervised Anomaly Detection with L2 Normalized Deep  Auto-Encoder Representations\nAutomatic Safety Helmet Wearing Detection\nAnnotation-Free and One-Shot Learning for Instance Segmentation of  Homogeneous Object Clusters\n3D Object Dense Reconstruction from a Single Depth View\nA Fusion of Appearance based CNNs and Temporal evolution of Skeleton  with LSTM for Daily Living Action Recognition\nDensePose: Dense Human Pose Estimation In The Wild\nA New Registration Approach for Dynamic Analysis of Calcium Signals in  Organs\nComplex Network Classification with Convolutional Neural Network\nActivity-conditioned continuous human pose estimation for performance  analysis of athletes using the example of swimming\nHandwritten Isolated Bangla Compound Character Recognition: a new  benchmark using a novel deep learning approach\nNo Modes left behind: Capturing the data distribution 
effectively using  GANs\nIntriguing Properties of Randomly Weighted Networks: Generalizing While  Learning Next to Nothing\nMulti-attention Recurrent Network for Human Communication Comprehension\nDeep Learning Framework for Multi-class Breast Cancer Histology Image  Classification\nEnsembling Neural Networks for Digital Pathology Images Classification  and Segmentation\nImage Posterization Using Fuzzy Logic and Bilateral Filter\nEnd2You -- The Imperial Toolkit for Multimodal Profiling by End-to-End  Learning\nObject Detection and Sorting by Using a Global Texture-Shape 3D Feature  Descriptor\nEfficient Video Object Segmentation via Network Modulation\nTracking Multiple Moving Objects Using Unscented Kalman Filtering  Techniques\nFace Destylization\nEnhancing Multi-Class Classification of Random Forest using Random  Vector Functional Neural Network and Oblique Decision Surfaces\nZero-Shot Kernel Learning\nTask-Aware Compressed Sensing with Generative Adversarial Networks\nAdversarial Vulnerability of Neural Networks Increases With Input  Dimension\nA Method for Restoring the Training Set Distribution in an Image  Classifier\nRoad Segmentation in SAR Satellite Images with Deep Fully-Convolutional  Neural Networks\nExploring Spatial Context for 3D Semantic Segmentation of Point Clouds\nReal-time Prediction of Intermediate-Horizon Automotive Collision Risk\nScale-recurrent Network for Deep Image Deblurring\nGeometry-Contrastive Generative Adversarial Network for Facial  Expression Synthesis\nLearning Image Representations by Completing Damaged Jigsaw Puzzles\nThe steerable graph Laplacian and its application to filtering image  data-sets\nAttribute-Guided Network for Cross-Modal Zero-Shot Hashing\nOrthogonally Regularized Deep Networks For Image Super-resolution\nOn the Feasibility of Generic Deep Disaggregation for Single-Load  Extraction\nDeepTravel: a Neural Network Based Travel Time Estimation Model with  Auxiliary Supervision\nSmile detection in the wild based on 
transfer learning\nA High-Performance HOG Extractor on FPGA\nObject Detection on Dynamic Occupancy Grid Maps Using Deep Learning and  Automatic Label Generation\nAutomatic Pavement Crack Detection Based on Structured Prediction with  the Convolutional Neural Network\nIONet: Learning to Cure the Curse of Drift in Inertial Odometry\nDescribing Semantic Representations of Brain Activity Evoked by Visual  Stimuli\nThe Heart of an Image: Quantum Superposition and Entanglement in Visual  Perception\nScalable Meta-Learning for Bayesian Optimization\nGenerative Adversarial Networks using Adaptive Convolution\nFeature Based Framework to Detect Diseases, Tumor, and Bleeding in  Wireless Capsule Endoscopy\nA machine learning approach to reconstruction of heart surface  potentials from body surface potentials\nSpectral Image Visualization Using Generative Adversarial Networks\nOutlier Detection for Robust Multi-dimensional Scaling\nSlideRunner - A Tool for Massive Cell Annotations in Whole Slide Images\nRevisiting the Inverted Indices for Billion-Scale Approximate Nearest  Neighbors\nOn the Generalizability of Linear and Non-Linear Region of  Interest-Based Multivariate Regression Models for fMRI Data\nStochastic Deconvolutional Neural Network Ensemble Training on  Generative Pseudo-Adversarial Networks\nLearning One Convolutional Layer with Overlapping Patches\nBitewing Radiography Semantic Segmentation Base on Conditional  Generative Adversarial Nets\nSpatially adaptive image compression using a tiled deep network\nSCK: A sparse coding based key-point detector\nLearning to score the figure skating sports videos\nPeekaboo - Where are the Objects? 
Structure Adjusting Superpixels\nArchetypal Analysis for Sparse Representation-based Hyperspectral  Sub-pixel Quantification\nFrom Selective Deep Convolutional Features to Compact Binary  Representations for Image Retrieval\nRotate your Networks: Better Weight Consolidation and Less Catastrophic  Forgetting\nPractical Issues of Action-conditioned Next Image Prediction\nDetection of Adversarial Training Examples in Poisoning Attacks through  Anomaly Detection\nDeep Private-Feature Extraction\nBoosting Image Forgery Detection using Resampling Features and Copy-move  analysis\nMultiple Target Tracking by Learning Feature Representation and Distance  Metric Jointly\nUnsupervised Deep Domain Adaptation for Pedestrian Detection\nSlice Sampling Particle Belief Propagation\nTemporally Object-based Video Co-Segmentation\nShapes Characterization on Address Event Representation Using Histograms  of Oriented Events and an Extended LBP Approach\nA Two-Stage Method for Text Line Detection in Historical Documents\nGenerative Adversarial Networks and Probabilistic Graph Models for  Hyperspectral Image Classification\nLocal Contrast Learning\nHydra: an Ensemble of Convolutional Neural Networks for Geospatial Land  Classification\nCoverless information hiding based on Generative Model\nCollaborative Learning for Weakly Supervised Object Detection\nTubule segmentation of fluorescence microscopy images based on  convolutional neural networks with inhomogeneity correction\n2-gram-based Phonetic Feature Generation for Convolutional Neural  Network in Assessment of Trademark Similarity\nSupervised classification of Dermatological diseases by Deep neural  networks\nFlipDial: A Generative Model for Two-Way Visual Dialogue\nEdge-Host Partitioning of Deep Neural Networks with Feature Space  Encoding for Resource-Constrained Internet-of-Things Platforms\nObject Detection with Mask-based Feature Encoding\nIntegration of Absolute Orientation Measurements in the KinectFusion  Reconstruction 
pipeline\nSubspace Support Vector Data Description\nDeep learning based supervised semantic segmentation of Electron  Cryo-Subtomograms\nImage Retargetability\nDeep Learning Models Delineates Multiple Nuclear Phenotypes in H&E  Stained Histology Sections\nAutomatic localization and decoding of honeybee markers using deep  convolutional neural networks\nBarista - a Graphical Tool for Designing and Training Deep Neural  Networks\nSingle-Perspective Warps in Natural Image Stitching\nBIRNet: Brain Image Registration Using Dual-Supervised Fully  Convolutional Networks\nSemantic Scene Completion Combining Colour and Depth: preliminary  experiments\nLearning via social awareness: improving sketch representations with  facial feedback\nParaphrasing Complex Network: Network Compression via Factor Transfer\nM4CD: A Robust Change Detection Method for Intelligent Visual  Surveillance\nRecursive Chaining of Reversible Image-to-image Translators For Face  Aging\nThe Multiscale Bowler-Hat Transform for Vessel Enhancement in 3D  Biomedical Images\nTwo Is Harder To Recognize Than Tom: the Challenge of Visual Numerosity  for Deep Learning\nLearning Privacy Preserving Encodings through Adversarial Training\nAtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation\nMapping Images to Scene Graphs with Permutation-Invariant Structured  Prediction\nLearning from a Handful Volumes: MRI Resolution Enhancement with  Volumetric Super-Resolution Forests\nDeep Learning for Lip Reading using Audio-Visual Information for Urdu  Language\nConditioning of three-dimensional generative adversarial networks for  pore and reservoir-scale models\ncGANs with Projection Discriminator\nA Bio-inspired Redundant Sensing Architecture\nInverting The Generator Of A Generative Adversarial Network (II)\nIBeaconMap: Automated Indoor Space Representation for Beacon-Based  Wayfinding\nJoint Estimation of Room Geometry and Modes with Compressed Sensing\nA complete hand-drawn sketch vectorization 
framework\nReal-Time 3D Shape of Micro-Details\nVisual-Only Recognition of Normal, Whispered and Silent Speech\nFast 5DOF Needle Tracking in iOCT\nA Closed-form Solution to Photorealistic Image Stylization\nImage Forensics: Detecting duplication of scientific images with  manipulation-invariant image similarity\nWeighted Linear Discriminant Analysis based on Class Saliency  Information\nDeep Residual Network for Joint Demosaicing and Super-Resolution\nSimultaneous Compression and Quantization: A Joint Approach for  Efficient Unsupervised Hashing\nMulti-task, multi-label and multi-domain learning with residual  convolutional networks for emotion recognition\nLearning Representative Temporal Features for Action Recognition\nDivide, Denoise, and Defend against Adversarial Attacks\nOnline Action Detection in Untrimmed, Streaming Videos\nGlobal Pose Estimation with an Attention-based Recurrent Network\nRecovery of simultaneous low rank and two-way sparse coefficient  matrices, a nonconvex approach\nLearning to Abstain via Curve Optimization\nLatent RANSAC\nNovel View Synthesis for Large-scale Scene using Adversarial Loss\nComposite Optimization by Nonconvex Majorization-Minimization\nCorrelation Flow: Robust Optical Flow Using Kernel Cross-Correlators\ni-RevNet: Deep Invertible Networks\nCamera-based vehicle velocity estimation from monocular video\nStroke Controllable Fast Style Transfer with Adaptive Receptive Fields\nReal-Time Dense Stereo Matching With ELAS on FPGA Accelerated Embedded  Devices\nDevon: Deformable Volume Network for Learning Optical Flow\nDensity-aware Single Image De-raining using a Multi-stream Dense Network\nAngle constrained path to cluster multiple manifolds\nConditional Adversarial Synthesis of 3D Facial Action Units\nBinary Constrained Deep Hashing Network for Image Retrieval without  Human Intervention\nLearning to Play with Intrinsically-Motivated Self-Aware Agents\nLoad Balanced GANs for Multi-view Face Image Synthesis\nSpatial Morphing 
Kernel Regression For Feature Interpolation\nLearning Image Conditioned Label Space for Multilabel Classification\nMulticlass Weighted Loss for Instance Segmentation of Cluttered Cells\nBatch Normalization and the impact of batch structure on the behavior of  deep convolution networks\nExplanations based on the Missing: Towards Contrastive Explanations with  Pertinent Negatives\nBuilding Efficient ConvNets using Redundant Feature Pruning\nLearning Multiple Categories on Deep Convolution Networks\nLossless Compression of Angiogram Foreground with Visual Quality  Preservation of Background\nLeft Ventricle Segmentation in Cardiac MR Images Using Fully  Convolutional Network\nLossless Image Compression Algorithm for Wireless Capsule Endoscopy by  Content-Based Classification of Image Blocks\nReversible Image Watermarking for Health Informatics Systems Using  Distortion Compensation in Wavelet Domain\nSegmentation of Bleeding Regions in Wireless Capsule Endoscopy Images an  Approach for inside Capsule Video Summarization\nSemantic Segmentation Refinement by Monte Carlo Region Growing of High  Confidence Detections\nLiver segmentation in CT images using three dimensional to two  dimensional fully convolutional network\nLow complexity convolutional neural network for vessel segmentation in  portable retinal diagnostic devices\nDetecting Small, Densely Distributed Objects with Filter-Amplifier  Networks and Loss Boosting\nMulti-Sensor Integration for Indoor 3D Reconstruction\nWhere's YOUR focus: Personalized Attention\nAdversarial Learning for Semi-Supervised Semantic Segmentation\nStereo obstacle detection for unmanned surface vehicles by IMU-assisted  semantic segmentation\nRobustness of classifiers to uniform $\\ell\\_p$ and Gaussian noise\nMagnifyMe: Aiding Cross Resolution Face Recognition via Identity Aware  Synthesis\nDiscriminative Label Consistent Domain Adaptation\nHarmonious Attention Network for Person Re-Identification\nAdaptive specular reflection detection 
and inpainting in colonoscopy  video frames\n6D Pose Estimation using an Improved Method based on Point Pair Features\nAn Approach to Vehicle Trajectory Prediction Using Automatically  Generated Traffic Maps\nInteractive Image Manipulation with Natural Language Instruction  Commands\nComparative Analysis of Unsupervised Algorithms for Breast MRI Lesion  Segmentation\nDeep learning in radiology: an overview of the concepts and a survey of  the state of the art\nSpatially Constrained Location Prior for Scene Parsing\nConstrained Image Generation Using Binarized Neural Networks with  Decision Procedures\nFree-breathing cardiac MRI using bandlimited manifold modelling\nA Dataset To Evaluate The Representations Learned By Video Prediction  Models\nBuilding Instance Classification Using Street View Images\nSeeing Small Faces from Robust Anchor's Perspective\nAttention-Aware Generative Adversarial Networks (ATA-GANs)\nPBGen: Partial Binarization of Deconvolution-Based Generators for Edge  Intelligence\nPhotographic Text-to-Image Synthesis with a Hierarchically-nested  Adversarial Network\nDepth Masked Discriminative Correlation Filter\n2D/3D Pose Estimation and Action Recognition using Multitask Deep  Learning\nHBST: A Hamming Distance embedding Binary Search Tree for Visual Place  Recognition\nDropLasso: A robust variant of Lasso for single cell RNA-seq data\nUsing Curvilinear Features in Focus for Registering a Single Image to a  3D Object\nA Resilient Image Matching Method with an Affine Invariant Feature  Detector and Descriptor\nHow (Not) To Train Your Neural Network Using the Information Bottleneck  Principle\nSpatio-Temporal Graph Convolution for Skeleton Based Action Recognition\nAdversarial Active Learning for Deep Networks: a Margin Based Approach\nReal-World Repetition Estimation by Div, Grad and Curl\nFusion of Multispectral Data Through Illumination-aware Deep Neural  Networks for Pedestrian Detection\nNeural Stereoscopic Image Style Transfer\nImproving OCR 
Accuracy on Early Printed Books using Deep Convolutional  Networks\nCSRNet: Dilated Convolutional Neural Networks for Understanding the  Highly Congested Scenes\nTell Me Where to Look: Guided Attention Inference Network\nNetworking the Boids is More Robust Against Adversarial Learning\nNeural Aesthetic Image Reviewer\nJoint Event Detection and Description in Continuous Video Streams\nA Model for Medical Diagnosis Based on Plantar Pressure\nLearning to Adapt Structured Output Space for Semantic Segmentation\nDeep-6DPose: Recovering 6D Object Pose from a Single RGB Image\nFine-grained wound tissue analysis using deep neural network\nA Simple Method to improve Initialization Robustness for Active Contours  driven by Local Region Fitting Energy\nNovelty Detection with GAN\nStereoscopic Neural Style Transfer\nA Feature Clustering Approach Based on Histogram of Oriented Optical  Flow and Superpixels\nInvariant properties of a locally salient dither pattern with a  spatial-chromatic histogram\nNeural Networks Should Be Wide Enough to Learn Disconnected Decision  Regions\nRing loss: Convex Feature Normalization for Face Recognition\nA Class-Incremental Learning Method Based on One Class Support Vector  Machine\nDRUNET: A Dilated-Residual U-Net Deep Learning Network to Digitally  Stain Optic Nerve Head Tissues in Optical Coherence Tomography Images\nFive-point Fundamental Matrix Estimation for Uncalibrated Cameras\nDetecting Volcano Deformation in InSAR using Deep learning\nFibres of Failure: Classifying errors in predictive processes\nMAGAN: Aligning Biological Manifolds\nA General Pipeline for 3D Detection of Vehicles\nImage Dataset for Visual Objects Classification in 3D Printing\nLeft ventricle segmentation By modelling uncertainty in prediction of  deep convolutional neural networks and adaptive thresholding inference\nAn Intelligent Intersection\nNatural data structure extracted from neighborhood-similarity graphs\nThe 2018 DAVIS Challenge on Video Object 
Segmentation\nSemi-parametric Topological Memory for Navigation\nRaw Multi-Channel Audio Source Separation using Multi-Resolution  Convolutional Auto-Encoders\nAspl{ü}nd's metric defined in the Logarithmic Image Processing (LIP)  framework for colour and multivariate images\nDeep Unsupervised Intrinsic Image Decomposition by Siamese Training\nQuantum distance-based classifier with constant size memory, distributed  knowledge and state recycling\nMulti-Instance Dynamic Ordinal Random Fields for Weakly-supervised  Facial Behavior Analysis\nTree Species Identification from Bark Images Using Convolutional Neural  Networks\nMultimodal Registration of Retinal Images Using Domain-Specific  Landmarks and Vessel Enhancement\nHashing with Mutual Information\nHigh-Dynamic-Range Imaging for Cloud Segmentation\nEnhancement of land-use change modeling using convolutional neural  networks and convolutional denoising autoencoders\nReal-Time Deep Learning Method for Abandoned Luggage Detection in Video\nDeep Bayesian Active Semi-Supervised Learning\nUnsupervised Learning of Face Representations\nTraining Deep Learning based Denoisers without Ground Truth Data\nEgocentric Basketball Motion Planning from a Single First-Person Image\nDeep Continuous Clustering\nLSTD: A Low-Shot Transfer Detector for Object Detection\nLearning-Based Dequantization For Image Restoration Against Extremely  Poor Illumination\nImproving the Improved Training of Wasserstein GANs: A Consistency Term  and Its Dual Effect\nRelocalization, Global Optimization and Map Merging for Monocular  Visual-Inertial SLAM\nBeyond Context: Exploring Semantic Similarity for Tiny Face Detection\nLocal Distance Metric Learning for Nearest Neighbor Algorithm\nPredicting Out-of-View Feature Points for Model-Based Camera Pose  Estimation\nSpectral reflectance estimation from one RGB image using  self-interreflections in a concave object\nUsing Visual Saliency to Improve Human Detection with Convolutional  Networks\nResampling 
Forgery Detection Using Deep Learning and A-Contrario  Analysis\nA generalized parametric 3D shape representation for articulated pose  estimation\nST-GAN: Spatial Transformer Generative Adversarial Networks for Image  Compositing\nM3Fusion: A Deep Learning Architecture for Multi-{Scale/Modal/Temporal}  satellite data fusion\nSegmentation of Drosophila Heart in Optical Coherence Microscopy Images  Using Convolutional Neural Networks\nOccupancy Map Prediction Using Generative and Fully Convolutional  Networks for Vehicle Navigation\nThe Earth ain't Flat: Monocular Reconstruction of Vehicles on Steep and  Graded Roads from a Moving Camera\nNonlocality-Reinforced Convolutional Neural Networks for Image Denoising\nA Non-Technical Survey on Deep Convolutional Neural Network  Architectures\nFully Convolutional Grasp Detection Network with Oriented Anchor Box\nDepth Information Guided Crowd Counting for Complex Crowd Scenes\nExpandNet: A Deep Convolutional Neural Network for High Dynamic Range  Expansion from Low Dynamic Range Content\nPersonalized Attention-Aware Exposure Control Using Reinforcement  Learning\nGeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera  Pose\nLearning monocular visual odometry with dense 3D mapping from dense 3D  flow\nComparison of Deep Learning Approaches for Multi-Label Chest X-Ray  Classification\nComparison of various image fusion methods for impervious surface  classification from VNREDSat-1\nFast Cylinder and Plane Extraction from Depth Cameras for Visual  Odometry\nPI-VIO: Robust and Efficient Stereo Visual Inertial Odometry using  Points and Lines\nVisual Explanations From Deep 3D Convolutional Neural Networks for  Alzheimer's Disease Classification\nPyramid Person Matching Network for Person Re-identification\nObject cosegmentation using deep Siamese network\nMulti-Channel Pyramid Person Matching Network for Person  Re-Identification\nDecoupled Spatial Neural Attention for Weakly Supervised Semantic  
Segmentation\nConcurrent Spatial and Channel Squeeze & Excitation in Fully  Convolutional Networks\n3D Human Pose Estimation in RGBD Images for Robotic Task Learning\nCNN-Based Automatic Urinary Particles Recognition\nDeep Back-Projection Networks For Super-Resolution\nHENet:A Highly Efficient Convolutional Neural Networks Optimized for  Accuracy, Speed and Storage\nRTSeg: Real-time Semantic Segmentation Comparative Study\nA Deep Learning Algorithm for One-step Contour Aware Nuclei Segmentation  of Histopathological Images\nRethinking Feature Distribution for Loss Functions in Image  Classification\nRobustness of control point configurations for homography and planar  pose estimation\nPreserving Semantic Relations for Zero-Shot Learning\nLeveraging Unlabeled Data for Crowd Counting by Learning to Rank\nGONet: A Semi-Supervised Deep Learning Approach For Traversability  Estimation\nGeneralization in Metric Learning: Should the Embedding Layer be the  Embedding Layer?\nAdversarial Training for Adverse Conditions: Robust Metric Localisation  using Appearance Transfer\nDeep Semantic Face Deblurring\nLearning a Discriminative Prior for Blind Image Deblurring\nFusing Hierarchical Convolutional Features for Human Body Segmentation  and Clothing Fashion Classification\nRobust Landmark Detection for Alignment of Mouse Brain Section Images\nCooperative Starting Movement Detection of Cyclists Using Convolutional  Neural Networks and a Boosted Stacking Ensemble\nConstruction of neural networks for realization of localized deep  learning\nBreast Tumor Classification Based on Decision Information Genes and  Inverse Projection Sparse Representation\nLocal Kernels that Approximate Bayesian Regularization and Proximal  Operators\nDriving Scene Perception Network: Real-time Joint Detection, Depth  Estimation and Semantic Segmentation\nShuffleSeg: Real-time Semantic Segmentation Network\nFire detection in a still image using colour information\nSample-Relaxed Two-Dimensional Color 
Principal Component Analysis for  Face Recognition and Image Reconstruction\nLearning to Localize Sound Source in Visual Scenes\nKnowledge Aided Consistency for Weakly Supervised Phrase Grounding\nDeeply supervised neural network with short connections for retinal  vessel segmentation\nDeep Dictionary Learning: A PARametric NETwork Approach\nCascade context encoder for improved inpainting\nMultiple Instance Choquet Integral Classifier Fusion and Regression for  Remote Sensing Applications\nLearning Local Distortion Visibility From Image Quality Data-sets\nTwo-Stage Convolutional Neural Network for Breast Cancer Histology Image  Classification\nVideo Object Segmentation with Joint Re-identification and  Attention-Aware Mask Propagation\nSO-Net: Self-Organizing Network for Point Cloud Analysis\nSuper-Resolution of Sentinel-2 Images: Learning a Globally Applicable  Deep Neural Network\nClassifying Online Dating Profiles on Tinder using FaceNet Facial  Embeddings\nDissimilarity-based representation for radiomics applications\nCorrection by Projection: Denoising Images with Generative Adversarial  Networks\nLearning to Maintain Natural Image Statistics\nMultimodal Recurrent Neural Networks with Information Transfer Layers  for Indoor Scene Labeling\nFace Spoofing Detection by Fusing Binocular Depth and Spatial Pyramid  Coding Micro-Texture Features\nTesting Deep Neural Networks\nLow Rank Variation Dictionary and Inverse Projection Group Sparse  Representation Model for Breast Tumor Classification\nA Learning-Based Visual Saliency Fusion Model for High Dynamic Range  Video (LBVS-HDR)\n3D Video Quality Assessment\nLCANet: End-to-End Lipreading with Cascaded Attention-CTC\nA Multi-Modal Approach to Infer Image Affect\nRevisiting Salient Object Detection: Simultaneous Detection, Ranking,  and Subitizing of Multiple Salient Objects\nEdgeStereo: A Context Integrated Residual Pyramid Network for Stereo  Matching\nLivDet 2017 Fingerprint Liveness Detection Competition 
2017\nDeep Image Demosaicking using a Cascade of Convolutional Residual  Denoising Networks\nFace-MagNet: Magnifying Feature Maps to Detect Small Faces\nKnowledge-based Recurrent Attentive Neural Network for Traffic Sign  Detection\nIllumination-aware Faster R-CNN for Robust Multispectral Pedestrian  Detection\nImage Colorization with Generative Adversarial Networks\nApproximate Query Matching for Image Retrieval\nImproving Object Counting with Heatmap Regulation\nEvaluation of Dense 3D Reconstruction from 2D Face Images in the Wild\nContext-Aware Mixed Reality: A Framework for Ubiquitous Interaction\nObject Detection in Video with Spatiotemporal Sampling Networks\nFacelet-Bank for Fast Portrait Manipulation\nDeep Adaptive Attention for Joint Facial Action Unit Detection and Face  Alignment\nTraining of Convolutional Networks on Multiple Heterogeneous Datasets  for Street Scene Semantic Segmentation\nVEGAC: Visual Saliency-based Age, Gender, and Facial Expression  Classification Using Convolutional Neural Networks\nWhat Catches the Eye? 
Visualizing and Understanding Deep Saliency Models\nFeature Distillation: DNN-Oriented JPEG Compression Against Adversarial  Examples\nLocal Spectral Graph Convolution for Point Set Feature Learning\nTowards Clinical Diagnosis: Automated Stroke Lesion Segmentation on  Multimodal MR Image Using Convolutional Neural Network\nPseudo Mask Augmented Object Detection\nLearned Iterative Decoding for Lossy Image Compression Systems\nDeep Structure Inference Network for Facial Action Unit Recognition\nStudying Invariances of Trained Convolutional Neural Networks\nDeep Co-Training for Semi-Supervised Image Recognition\nZero-Shot Object Detection: Learning to Simultaneously Recognize and  Localize Novel Concepts\nDeep Multiple Instance Learning for Zero-shot Image Tagging\nReal-time Detection, Tracking, and Classification of Moving and  Stationary Objects using Multiple Fisheye Images\nA dataset and architecture for visual reasoning with a working memory\nA constant-ratio approximation algorithm for a class of hub-and-spoke  network design problems and metric labeling problems: star metric case\nPatchwise object tracking via structural local sparse appearance model\nTriplet-Center Loss for Multi-View 3D Object Retrieval\nSynchronisation of Partial Multi-Matchings via Non-negative  Factorisations\nLearning deep structured active contours end-to-end\nFaces as Lighting Probes via Unsupervised Deep Highlight Extraction\nA Low-rank Tensor Regularization Strategy for Hyperspectral Unmixing\nDeep Component Analysis via Alternating Direction Neural Networks\nLearning to Segment via Cut-and-Paste\nLearning to Cluster for Proposal-Free Instance Segmentation\nWeakly Supervised Salient Object Detection Using Image Labels\nLearning Unsupervised Visual Grounding Through Semantic Self-Supervision\nMergeNet: A Deep Net Architecture for Small Obstacle Discovery\nSeqFace: Make full use of sequence information for face recognition\nConvolutional Point-set Representation: A Convolutional Bridge 
Between a  Densely Annotated Image and 3D Face Alignment\nA Multi-perspective Approach To Anomaly Detection For Self-aware  Embodied Agents\nEfficient and accurate inversion of multiple scattering with deep  learning\nLine Artist: A Multiple Style Sketch to Painting Synthesis Scheme\nRatio-Preserving Half-Cylindrical Warps for Natural Image Stitching\nSdf-GAN: Semi-supervised Depth Fusion with Multi-scale Adversarial  Networks\nNonlocal Low-Rank Tensor Factor Analysis for Image Restoration\nAttention-GAN for Object Transfiguration in Wild Images\nRevisiting RCNN: On Awakening the Classification Power of Faster RCNN\nAlive Caricature from 2D to 3D\nWeakly Supervised Object Localization on grocery shelves using simple  FCN and Synthetic Dataset\nA Mixture of Views Network with Applications to the Classification of  Breast Microcalcifications\nAsymmetric kernel in Gaussian Processes for learning target variance\nFactorised spatial representation learning: application in  semi-supervised myocardial segmentation\nVGAN-Based Image Representation Learning for Privacy-Preserving Facial  Expression Recognition\nZero-Shot Detection\nAttention-based Temporal Weighted Convolutional Neural Network for  Action Recognition\nUnveiling the invisible - mathematical methods for restoring and  interpreting illuminated manuscripts\nAdaptive Polar Active Contour for Segmentation and Tracking in  Ultrasound Videos\nDYAN: A Dynamical Atoms Network for Video Prediction\nA Temporally-Aware Interpolation Network for Video Frame Inpainting\nLearning the Hierarchical Parts of Objects by Deep Non-Smooth  Nonnegative Matrix Factorization\nText Detection and Recognition in images: A survey\nFlex-Convolution (Deep Learning Beyond Grid-Worlds)\nUnsupervised Cross-dataset Person Re-identification by Transfer Learning  of Spatial-Temporal Patterns\nAdaptive Co-weighting Deep Convolutional Features For Object Retrieval\nAre you eligible? 
Predicting adulthood from face images via class  specific mean autoencoder\nPatch-Based Image Inpainting with Generative Adversarial Networks\nAn Improved Evaluation Framework for Generative Adversarial Networks\nActor and Action Video Segmentation from a Sentence\nLearning Category-Specific Mesh Reconstruction from Image Collections\nThermal to Visible Synthesis of Face Images using Multiple Regions\nDynamic Sampling Convolutional Neural Networks\nRobust Depth Estimation from Auto Bracketed Images\nWeakly Supervised Medical Diagnosis and Localization from Multiple  Resolutions\nPatch-based Fake Fingerprint Detection Using a Fully Convolutional  Neural Network with a Small Number of Parameters and an Optimal Threshold\nEnd-to-End Fingerprints Liveness Detection using Convolutional Networks  with Gram module\nJoint 3D Face Reconstruction and Dense Alignment with Position Map  Regression Network\nA Cascaded Convolutional Neural Network for Single Image Dehazing\nAdversarial Defense based on Structure-to-Signal Autoencoders\nVideo Object Segmentation with Language Referring Expressions\nA Unified Framework for Multi-View Multi-Class Object Pose Estimation\nSingle-Shot Bidirectional Pyramid Networks for High-Quality Object  Detection\nShow, Tell and Discriminate: Image Captioning by Self-retrieval with  Partially Labeled Data\nDichromatic Gray Pixel for Camera-agnostic Color Constancy\nFound a good match: should I keep searching? 
- Accuracy and Performance  in Iris Matching Using 1-to-First Search\nDensely Connected Pyramid Dehazing Network\nPlaneMatch: Patch Coplanarity Prediction for Robust RGB-D Reconstruction\nA Smoke Removal Method for Laparoscopic Images\nGroup Sparsity Residual with Non-Local Samples for Image Denoising\nClustering-driven Deep Embedding with Pairwise Constraints\nTowards Universal Representation for Unseen Action Recognition\nBranched Generative Adversarial Networks for Multi-Scale Image Manifold  Learning\nKonIQ-10k: Towards an ecologically valid and large-scale IQA database\nText2Shape: Generating Shapes from Natural Language by Learning Joint  Embeddings\nGeneralized Scene Reconstruction\nClassification of simulated radio signals using Wide Residual Networks  for use in the search for extra-terrestrial intelligence\nFictitious GAN: Training GANs with Historical Models\nPyramid Stereo Matching Network\nObject Detection for Comics using Manga109 Annotations\nRevisiting Single Image Depth Estimation: Toward Higher Resolution Maps  with Accurate Object Boundaries\nRegion-filtering Correlation Tracking\nAn Incremental Boolean Tensor Factorization approach to model Change  Patterns of Objects in Images\nPose-Driven Deep Models for Person Re-Identification\nA Deep Error Correction Network for Compressed Sensing MRI\nLearning Deep Context-Network Architectures for Image Annotation\nGeometric and Physical Constraints for Head Plane Crowd Density  Estimation in Videos\nAudio-Visual Event Localization in Unconstrained Videos\nLearning Shape-from-Shading for Deformable Surfaces\nPattern Analysis with Layered Self-Organizing Maps\nLayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image\nRealtime Time Synchronized Event-based Stereo\nA Single-shot Camera-Projector Calibration System For Imperfect Planar  Targets\nComparing Generative Adversarial Network Techniques for Image Creation  and Modification\nMulti-Level Factorisation Net for Person 
Re-Identification\nImportance Weighted Adversarial Nets for Partial Domain Adaptation\nDetecting Heads using Feature Refine Net and Cascaded Multi-scale  Architecture\nP2P-NET: Bidirectional Point Displacement Net for Shape Transform\nDeep Depth Completion of a Single RGB-D Image\nDeep Faster Detection of Faint Edges in Noisy Images\nREST: Real-to-Synthetic Transform for Illumination Invariant Camera  Localization\nImage Set Classification for Low Resolution Surveillance\nOn Regularized Losses for Weakly-supervised CNN Segmentation\nOne-Shot Segmentation in Clutter\nMetric Learning with Dynamically Generated Pairwise Constraints for Ear  Recognition\nBAGAN: Data Augmentation with Balancing GAN\nA multilayer backpropagation saliency detection algorithm and its  applications\nTransferable Joint Attribute-Identity Deep Learning for Unsupervised  Person Re-Identification\nAttributes as Operators\nWebSeg: Learning Semantic Segmentation from Web Searches\nThree Birds One Stone: A Unified Framework for Salient Object  Segmentation, Edge Detection and Skeleton Extraction\nA Divide-and-Conquer Approach to Compressed Sensing MRI\nImage Semantic Transformation: Faster, Lighter and Stronger\nImage-based deep learning for classification of noise transients in  gravitational wave detectors\nRecent Developments from Attribute Profiles for Remote Sensing Image  Classification\nA Framework for Evaluating 6-DOF Object Trackers\nEfficient parametrization of multi-domain deep neural networks\nPoint Convolutional Neural Networks by Extension Operators\nEvent-based Dynamic Face Detection and Tracking Based on Activity\nAdaptive Affinity Field for Semantic Segmentation\nStructural inpainting\nClickBAIT-v2: Training an Object Detector in Real-Time\nReferring Relationships\nInLoc: Indoor Visual Localization with Dense Matching and View Synthesis\nAutomatic Stroke Lesions Segmentation in Diffusion-Weighted MRI\n3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation\nThe 
HAM10000 Dataset: A Large Collection of Multi-Source Dermatoscopic  Images of Common Pigmented Skin Lesions\nThe Effects of JPEG and JPEG2000 Compression on Attacks using  Adversarial Examples\nObjects Localisation from Motion with Constraints\nELEGANT: Exchanging Latent Encodings with GAN for Transferring Multiple  Face Attributes\nStochastic Variational Inference with Gradient Linearization\nPerson re-identification with fusion of hand-crafted and deep pose-based  body region features\nMotion Guided LIDAR-camera Autocalibration and Accelerated Depth Super  Resolution\nPose2Seg: Human Instance Segmentation Without Detection\nEnd-to-End Multi-Task Learning with Attention\nLearning to Become an Expert: Deep Networks Applied To Super-Resolution  Microscopy\nDeep Photometric Stereo on a Sunny Day\nFeatures for Multi-Target Multi-Camera Tracking and Re-Identification\nMemory Warps for Learning Long-Term Online Video Representations\nLearning to Look around Objects for Top-View Representations of Outdoor  Scenes\nSimplifying transforms for general elastic metrics on the space of plane  curves\nMotion-Appearance Co-Memory Networks for Video Question Answering\nStructured Attention Guided Convolutional Neural Fields for Monocular  Depth Estimation\n3D Consistent Biventricular Myocardial Segmentation Using Deep Learning  for Mesh Generation\nMining on Manifolds: Metric Learning without Labels\nLearning Deep Models for Face Anti-Spoofing: Binary or Auxiliary  Supervision\nBag of Recurrence Patterns Representation for Time-Series Classification\nMaskRNN: Instance Level Video Object Segmentation\nGenerative Modeling using the Sliced Wasserstein Distance\nThe Price is Right: Predicting Prices with Product Images\nImprove the performance of transfer learning without fine-tuning using  dissimilarity-based multi-view learning for breast cancer histology images\nRevisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking\nTask-Driven Super Resolution: Object Detection in 
Low-resolution Images\nTransductive Unbiased Embedding for Zero-Shot Learning\nDDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer\nJoint Optimization Framework for Learning with Noisy Labels\nCross-Domain Weakly-Supervised Object Detection through Progressive  Domain Adaptation\nDisentangling Features in 3D Face Shapes for Joint Face Reconstruction  and Recognition\nCross-modal Deep Variational Hand Pose Estimation\nScalable Deep Learning Logo Detection\nReconstruction Network for Video Captioning\nRegularizing RNNs for Caption Generation by Reconstructing The Past with  The Present\nPredicting Future Instance Segmentations by Forecasting Convolutional  Features\nHierarchical Transfer Convolutional Neural Networks for Image  Classification\nAdversarial Attacks and Defences Competition\nTagging like Humans: Diverse and Distinct Image Annotation\nDeepIM: Deep Iterative Matching for 6D Pose Estimation\nWebly Supervised Learning for Skin Lesion Classification\nRecognizing Challenging Handwritten Annotations with Fully Convolutional  Networks\nOne-Two-One Networks for Compression Artifacts Reduction in Remote  Sensing\nReal-time Progressive 3D Semantic Segmentation for Indoor Scene\nEarthMapper: A Tool Box for the Semantic Segmentation of Remote Sensing  Imagery\nDifferential Attention for Visual Question Answering\nAttention-based Ensemble for Deep Metric Learning\nBridging the Gap Between 2D and 3D Organ Segmentation\nSyncGAN: Synchronize the Latent Space of Cross-modal Generative  Adversarial Networks\nA Vehicle Detection Approach using Deep Learning Methodologies\nExploring to learn visual saliency: The RL-IAC approach\nRegional Priority Based Anomaly Detection using Autoencoders\nA Review on Image Texture Analysis Methods\nTransferable Pedestrian Motion Prediction Models at Intersections\nSemantic Adversarial Examples\nMultilayer Complex Network Descriptors for Color-Texture  Characterization\nGeneralizability vs. 
Robustness: Adversarial Examples for Medical  Imaging\nCompNet: Complementary Segmentation Network for Brain MRI Extraction\nPredictions of short-term driving intention using recurrent neural  network on sequential data\nLearning Intrinsic Image Decomposition from Watching the World\nLearning Descriptor Networks for 3D Shape Synthesis and Analysis\nUpdating the generator in PPGN-h with gradients flowing through the  encoder\n3D Registration of Curves and Surfaces using Local Differential  Information\nDeepMVS: Learning Multi-view Stereopsis\nInteractive Hand Pose Estimation: Boosting accuracy in localizing  extended finger joints\nConfidence from Invariance to Image Transformations\nHierarchical Novelty Detection for Visual Object Recognition\nImproved Fusion of Visual and Language Representations by Dense  Symmetric Co-Attention for Visual Question Answering\nLeft-Right Comparative Recurrent Model for Stereo Matching\nGenerating Diverse and Accurate Visual Captions by Comparative  Adversarial Learning\nDeep Appearance Maps\nPhaseNet for Video Frame Interpolation\nLearning to Guide Decoding for Image Captioning\nWhen will you do what? 
- Anticipating Temporal Occurrences of Activities\nTowards whole-body CT Bone Segmentation\nUnsupervised Learning of Sequence Representations by Autoencoders\nTraining VAEs Under Structured Residuals\nTransferring Common-Sense Knowledge for Object Detection\nUnsupervised Geometry-Aware Representation for 3D Human Pose Estimation\nDepth Pooling Based Large-scale 3D Action Recognition with Convolutional  Neural Networks\nSelf-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval\nBtrfly Net: Vertebrae Labelling with Energy-based Adversarial Learning  of Local Spine Prior\nNormalized Cut Loss for Weakly-supervised CNN Segmentation\nPatch-based Face Recognition using a Hierarchical Multi-label Matcher\nUnsupervised Semantic-based Aggregation of Deep Convolutional Features\nRepresenting Videos based on Scene Layouts for Recognizing  Agent-in-Place Actions\nLearning Discriminative Features with Multiple Granularities for Person  Re-Identification\nDensity Adaptive Point Set Registration\nStochastic Adversarial Video Prediction\nSemi-Supervised Deep Metrics for Image Registration\nStainGAN: Stain Style Transfer for Digital Histological Images\nUnifying Bilateral Filtering and Adversarial Training for Robust Neural  Networks\nPixel2Mesh: Generating 3D Mesh Models from Single RGB Images\nLearning Strict Identity Mappings in Deep Residual Networks\nLearning to Separate Object Sounds by Watching Unlabeled Video\nMissing Slice Recovery for Tensors Using a Low-rank Model in Embedded  Space\nGuess Where? 
Actor-Supervision for Spatiotemporal Action Localization\nMulti-level Activation for Segmentation of Hierarchically-nested Classes\nHigh-dimension Tensor Completion via Gradient-based Optimization Under  Tensor-train Format\nRegularizing Deep Networks by Modeling and Predicting Label Structure\nPedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and  Beyond\nNoise-resistant Deep Learning for Object Classification in 3D Point  Clouds Using a Point Pair Descriptor\nQuestion Type Guided Attention in Visual Question Answering\nMotion Segmentation by Exploiting Complementary Geometric Models\nOpenSeqSLAM2.0: An Open Source Toolbox for Visual Place Recognition  Under Changing Conditions\nMonocular Semantic Occupancy Grid Mapping with Convolutional Variational  Auto-Encoders\nMix and match networks: encoder-decoder alignment for zero-pair image  translation\nEnsemble Manifold Segmentation for Model Distillation and  Semi-supervised Learning\nAutomatic Prediction of Building Age from Photographs\nCross-Domain Image Matching with Deep Feature Maps\nExtracting Scientific Figures with Distantly Supervised Neural Networks\nMVSNet: Depth Inference for Unstructured Multi-view Stereo\nLearning a Text-Video Embedding from Incomplete and Heterogeneous Data\nStatistical transformer networks: learning shape and appearance models  via self supervision\nEfficient No-Reference Quality Assessment and Classification Model for  Contrast Distorted Images\nEstimation of Camera Locations in Highly Corrupted Scenarios: All About  that Base, No Shape Trouble\nTraining Multi-organ Segmentation Networks with Sample Selection by  Relaxed Upper Confident Bound\nDimensionality's Blessing: Clustering Images by Underlying Distribution\nOATM: Occlusion Aware Template Matching by Consensus Set Maximization\nLearning-based Video Motion Magnification\nDetecting Multi-Oriented Text with Corner-based Region Proposals\nEstimating Depth from RGB and Sparse Sensing\nRecovering Realistic Texture 
in Image Super-resolution by Deep Spatial  Feature Transform\nPhotometric Stereo in Participating Media Considering Shape-Dependent  Forward Scatter\nSemantic Edge Detection with Diverse Deep Supervision\nA Fully Progressive Approach to Single-Image Super-Resolution\nBringing Alive Blurred Moments!\nGenerative Adversarial Networks for Extreme Learned Image Compression\nVision as an Interlingua: Learning Multilingual Semantic Embeddings of  Untranscribed Speech\nAttribute-Centered Loss for Soft-Biometrics Guided Face Sketch-Photo  Recognition\n$\\mathcal{G}$-Distillation: Reducing Overconfident Errors on Novel  Samples\nAn ADMM-Based Universal Framework for Adversarial Attacks on Deep Neural  Networks\nNetAdapt: Platform-Aware Neural Network Adaptation for Mobile  Applications\nTowards Deep Cellular Phenotyping in Placental Histology\nRecurrent Neural Networks for Person Re-identification Revisited\nCrafting a Toolchain for Image Restoration by Deep Reinforcement  Learning\nModular Generative Adversarial Networks\nGraphical Generative Adversarial Networks\nRSGAN: Face Swapping and Editing using Face and Hair Representation in  Latent Spaces\nExploring Disentangled Feature Representation Beyond Face Identification\nPointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place  Recognition\nA Deep Information Sharing Network for Multi-contrast Compressed Sensing  MRI Reconstruction\nAudio-Visual Scene Analysis with Self-Supervised Multisensory Features\nGraph Matching with Anchor Nodes: A Learning Approach\nNonlinear 3D Face Morphable Model\nMulti-Scale Generalized Plane Match for Optical Flow\nDemoiréing of Camera-Captured Screen Images Using Deep Convolutional  Neural Network\nExFuse: Enhancing Feature Fusion for Semantic Segmentation\nAttention Cropping: A Novel Data Augmentation Method for Real-world  Plant Species Identification\nPlaque Classification in Coronary Arteries from IVOCT Images Using  Convolutional Neural Networks and Transfer Learning\nFusing 
Saliency Maps with Region Proposals for Unsupervised Object  Localization\nOffline Object Extraction from Dynamic Occupancy Grid Map Sequences\nVR IQA NET: Deep Virtual Reality Image Quality Assessment using  Adversarial Learning\nProjection image-to-image translation in hybrid X-ray/MR imaging\nEdge-based LBP description of surfaces with colorimetric patterns\nQuadricSLAM: Constrained Dual Quadrics from Object Detections as  Landmarks in Semantic SLAM\nSeed-Point Detection of Clumped Convex Objects by Short-Range Attractive  Long-Range Repulsive Particle Clustering\nDetail-Preserving Pooling in Deep Networks\nRanking Generative Adversarial Networks: Subjective Control over  Semantic Image Attributes\nBeamformed Fingerprint Learning for Accurate Millimeter Wave Positioning\nThe Conversation: Deep Audio-Visual Speech Enhancement\nText2Colors: Guiding Image Colorization through Text-Driven Palette  Generation\nMulti-scale Neural Networks for Retinal Blood Vessels Segmentation\nView Extrapolation of Human Body from a Single Image\nClustering via Boundary Erosion\nSTAIR Actions: A Video Dataset of Everyday Home Actions\nZero-Shot Object Detection\nImage Correction via Deep Reciprocating HDR Transformation\nIterative fully convolutional neural networks for automatic vertebra  segmentation\nMulti-Label Wireless Interference Identification with Convolutional  Neural Networks\nUnsupervised Discovery of Object Landmarks as Structural Representations\nDistort-and-Recover: Color Enhancement using Deep Reinforcement Learning\nCubeNet: Equivariance to 3D Rotation and Translation\nDeep Autoencoding Models for Unsupervised Anomaly Segmentation in Brain  MR Images\nDLL: A Blazing Fast Deep Neural Network Library\nGenerative Visual Rationales\nTrajectory Factory: Tracklet Cleaving and Re-connection by Deep Siamese  Bi-GRU for Multiple Object Tracking\nTowards integrating spatial localization in convolutional neural  networks for brain image segmentation\nImproving Classification 
Rate of Schizophrenia Using a Multimodal  Multi-Layer Perceptron Model with Structural and Functional MR\nPersonalized Classifier for Food Image Recognition\nPix3D: Dataset and Methods for Single-Image 3D Shape Modeling\nEMBER: An Open Dataset for Training Static PE Malware Machine Learning  Models\nCross-Domain Visual Recognition via Domain Adaptive Dictionary Learning\nA Variational U-Net for Conditional Appearance and Shape Generation\nA Hybrid Model for Identity Obfuscation by Face Replacement\nDeep Motion Boundary Detection\nPrecise Temporal Action Localization by Evolving Temporal Proposals\nLearning Deep Sketch Abstraction\nMSnet: Mutual Suppression Network for Disentangled Video Representations\nSpline Error Weighting for Robust Visual-Inertial Fusion\nBodyNet: Volumetric Inference of 3D Human Body Shapes\nLearning to Exploit the Prior Network Knowledge for Weakly-Supervised  Semantic Segmentation\nPose estimation of a single circle using default intrinsic calibration\nA New Vision of the Coma Cluster: Conference Summary\nSurvey on Various Gesture Recognition Techniques for Interfacing  Machines Based on Ambient Intelligence\nMeasuring and Understanding Sensory Representations within Deep Networks  Using a Numerical Optimization Framework\nLearning Depth from Single Monocular Images Using Deep Convolutional  Neural Fields\nPrototypical Priors: From Improving Classification to Zero-Shot Learning\nCAD2RL: Real Single-Image Flight without a Single Real Image\nDense Associative Memory is Robust to Adversarial Inputs\nStructural Compression of Convolutional Neural Networks Based on Greedy  Filter Pruning\nUltimate SLAM? 
Combining Events, Images, and IMU for Robust Visual SLAM  in HDR and High Speed Scenarios\nPushing the envelope in deep visual recognition for mobile platforms\nVisual Concepts and Compositional Voting\nJoint Optic Disc and Cup Segmentation Based on Multi-label Deep Network  and Polar Transformation\nCollision Selective Visual Neural Network Inspired by LGMD2 Neurons in  Juvenile Locusts\nHypothesize and Bound: A Computational Focus of Attention Mechanism for  Simultaneous 3D Shape Reconstruction, Pose Estimation and Classification from  a Single 2D Image\nScalable Visibility Color Map Construction in Spatial Databases\nAutomated Classification of L/R Hand Movement EEG Signals using Advanced  Feature Extraction and Machine Learning\nDISC: Deep Image Saliency Computing via Progressive Representation  Learning\nDistributed-memory large deformation diffeomorphic 3D image registration\nGPU Acclerated Automated Feature Extraction from Satellite Images\nSENNS: Sparse Extraction Neural NetworkS for Feature Extraction\nProposal for the creation of a research facility for the development of  the SP machine\nHardware-Driven Nonlinear Activation for Stochastic Computing Based Deep  Convolutional Neural Networks\nNullHop: A Flexible Convolutional Neural Network Accelerator Based on  Sparse Representations of Feature Maps\nRemote sensing of forests using discrete return airborne LiDAR\nFOCAN: A Fog-supported Smart City Network Architecture for Management of  Applications in the Internet of Everything Environments\nOn-Chip Communication Network for Efficient Training of Deep  Convolutional Networks on Heterogeneous Manycore Systems\nTrade-off between angular resolution and straylight contamination in CMB  anisotropy experiments. I. 
Pattern simulations\nRobust Combining of Disparate Classifiers through Order Statistics\nImage Compression with Iterated Function Systems, Finite Automata and  Zerotrees: Grand Unification\nContextual Normalization Applied to Aircraft Gas Turbine Engine  Diagnosis\nFast Verification of Convexity of Piecewise-linear Surfaces\nSwarming around Shellfish Larvae\nArtificial Ant Colonies in Digital Image Habitats - A Mass Behaviour  Effect Study on Pattern Recognition\nImage Colour Segmentation by Genetic Algorithms\nFace Recognition Based on Polar Frequency Features\nA Better Alternative to Piecewise Linear Time Series Segmentation\nSelf-Replication and Self-Assembly for Manufacturing\nLife Under Your Feet: An End-to-End Soil Ecology Sensor Network,  Database, Web Server, and Analysis Service\nClassical and Quantum Causality in Quantum Field Theory. Or, \"The  Quantum Universe\"\nThe entropy of keys derived from laser speckle\nClassification of curves in 2D and 3D via affine integral signatures\nAutomatic Generation of the Axial Lines of Urban Environments to Capture  What We Perceive\nA Novel Clustering Algorithm Based Upon Games on Evolving Network\nA Theoretical Analysis of Joint Manifolds\nA new approach for digit recognition based on hand gesture analysis\nNon-quadratic convex regularized reconstruction of MR images from spiral  acquisitions\nMaximin affinity learning of image segmentation\nBehavior and performance of the deep belief networks on image  classification\nA Novel Feature Extraction for Robust EMG Pattern Recognition\nAn Explicit Nonlinear Mapping for Manifold Learning\nMessage-Passing Algorithms: Reparameterizations and Splittings\nHandwritten Bangla Basic and Compound character recognition using MLP  and SVM classifier\nSecuring Interactive Sessions Using Mobile Device through Visual Channel  and Visual Inspection\nNonlinear Filter Based Image Denoising Using AMF Approach\nObject-image correspondence for curves under finite and affine 
cameras\nRegularized Richardson-Lucy Algorithm for Sparse Reconstruction of  Poissonian Images\nA New Approach to Lung Image Segmentation using Fuzzy Possibilistic  C-Means Algorithm\nScalable Tensor Factorizations for Incomplete Data\nClassification of LULC Change Detection using Remotely Sensed Data for  Coimbatore City, Tamilnadu, India\nRecognition of Non-Compound Handwritten Devnagari Characters using a  Combination of MLP and Minimum Edit Distance\nProliferating cell nuclear antigen (PCNA) allows the automatic  identification of follicles in microscopic images of human ovarian tissue\nComparative Study of Statistical Skin Detection Algorithms for  Sub-Continental Human Images\nPenalty Decomposition Methods for $L0$-Norm Minimization\nWeighted Attribute Fusion Model for Face Recognition\nDeep Self-Taught Learning for Handwritten Character Recognition\nStatistical mechanics of digital halftoning\nCharacterizing Structure Through Shape Matching and Applications to Self  Assembly\nStatistical Compressed Sensing of Gaussian Mixture Models\nA novel super resolution reconstruction of low reoslution images  progressively using dct and zonal filter based denoising\nOn Democracy in Peer-to-Peer systems\nPolar Fusion Technique Analysis for Evaluating the Performances of Image  Fusion of Thermal and Visual Images for Human Face Recognition\nHigh Performance Human Face Recognition using Independent High Intensity  Gabor Wavelet Responses: A Statistical Approach\nArithmetic and Frequency Filtering Methods of Pixel-Based Image Fusion  Techniques\nA Distributed Mincut/Maxflow Algorithm Combining Path Augmentation and  Push-Relabel\nOn Partial Opimality by Auxiliary Submodular Problems\nProgressive versus Random Projections for Compressive Capture of Images,  Lightfields and Higher Dimensional Visual Signals\nA robust, low-cost approach to Face Detection and Face Recognition\nIris Recognition Based on LBP and Combined LVQ Classifier\nMultidimensional counting grids: Inferring 
word order from disordered  bags of words\nA Co-Prime Blur Scheme for Data Security in Video Surveillance\nFeature Extraction Methods for Color Image Similarity\nDiscrimination of English to other Indian languages (Kannada and Hindi)  for OCR system\nMesh Learning for Classifying Cognitive Processes\nSpatial And Spectral Quality Evaluation Based On Edges Regions Of  Satellite Image Fusion\nAutomated Training and Maintenance through Kinect\nA Novel Metric Approach Evaluation For The Spatial Enhancement Of  Pan-Sharpened Images\nAn Online Character Recognition System to Convert Grantha Script to  Malayalam\nVisual Exploration of Simulated and Measured Blood Flow\nFCM Based Blood Vessel Segmentation Method for Retinal Images\nHessian Schatten-Norm Regularization for Linear Inverse Problems\nThe Biometric Menagerie - A Fuzzy and Inconsistent Concept\nEfficient Solution to the 3D Problem of Automatic Wall Paintings  Reassembly\nSampling and Reconstruction of Spatial Fields using Mobile Sensors\nSegmentation of ultrasound images of thyroid nodule for assisting fine  needle aspiration cytology\nNew Edge Detection Technique based on the Shannon Entropy in Gray Level  Images\nG-invariant Persistent Homology\nKernel Estimation from Salient Structure for Robust Motion Deblurring\nSimilarity of Polygonal Curves in the Presence of Outliers\nPituitary Adenoma Volumetry with 3D Slicer\nDiscrete moving frames and discrete integrable systems\nHigh-precision camera distortion measurements with a \"calibration harp\"\nMultiscale Discriminant Saliency for Visual Attention\nEfficient MRF Energy Propagation for Video Segmentation via Bilateral  Filters\nMulti-scale Discriminant Saliency with Wavelet-based Hidden Markov Tree  Modelling\nLocal Structure Matching Driven by Joint-Saliency-Structure Adaptive  Kernel Regression\nAn Optical Watermarking Solution for Color Personal Identification  Pictures\nFast Matching by 2 Lines of Code for Large Scale Face Recognition  Systems\nOn the 
convergence of the IRLS algorithm in Non-Local Patch Regression\nOptical Flow Sensing and the Inverse Perception Problem for Flying Bats\nShape Reconstruction and Recognition with Isolated Non-directional Cues\nMulti-q Pattern Classification of Polarization Curves\nGeometric primitive feature extraction - concepts, algorithms, and  applications\nIndexing Medical Images based on Collaborative Experts Reports\nDistributed Bayesian inference for consistent labeling of tracked  objects in non-overlapping camera networks\nHighly Scalable, Parallel and Distributed AdaBoost Algorithm using Light  Weight Threads and Web Services on a Network of Multi-Core Machines\nComparing Edge Detection Methods based on Stochastic Entropies and  Distances for PolSAR Imagery\nSymmetries in LDDMM with higher order momentum distributions\nFree Instrument for Movement Measure\nMammogram Edge Detection Using Hybrid Soft Computing Methods\nRegularized Discrete Optimal Transport\nAutomatic Mammogram image Breast Region Extraction and Removal of  Pectoral Muscle\nMinutiae Based Thermal Face Recognition using Blood Perfusion Data\nSaying What You're Looking For: Linguistics Meets Video Search\nAn Image-Based Fluid Surface Pattern Model\nPCG-Cut: Graph Driven Segmentation of the Prostate Central Gland\nMinimax rates in permutation estimation for feature matching\nQuality Assessment of Pixel-Level ImageFusion Using Fuzzy Logic\nPerforming edge detection by difference of Gaussians using q-Gaussian  kernels\nTemplate-Based Active Contours\nAn adaptive block based integrated LDP,GLCM,and Morphological features  for Face Recognition\nA Gabor block based Kernel Discriminative Common Vector (KDCV) approach  using cosine kernels for Human Face Recognition\nA Face Recognition approach based on entropy estimate of the nonlinear  DCT features in the Logarithm Domain together with Kernel Entropy Component  Analysis\nFace Recognition using Hough Peaks extracted from the significant blocks  of the Gradient 
Image\nHigh Performance Human Face Recognition using Gabor based Pseudo Hidden  Markov Model\nDeep Belief Networks for Image Denoising\nIntriguing properties of neural networks\nCombining persistent homology and invariance groups for shape comparison\nAn Empirical Evaluation of Similarity Measures for Time Series  Classification\nLearning Mid-Level Features and Modeling Neuron Selectivity for Image  Classification\nGraph matching: relax or not?\nEffective Features of Remote Sensing Image Classification Using  Interactive Adaptive Thresholding Method\nSparse Principal Component Analysis via Rotation and Truncation\nAutomatic Classification of Human Epithelial Type 2 Cell Indirect  Immunofluorescence Images using Cell Pyramid Matching\nCoherent Multi-Sentence Video Description with Variable Level of Detail\nBeyond L2-Loss Functions for Learning Sparse Models\nMBIS: Multivariate Bayesian Image Segmentation Tool\nPCANet: A Simple Deep Learning Baseline for Image Classification?\nQuadratization of Symmetric Pseudo-Boolean Functions\nHigh-Speed Tracking with Kernelized Correlation Filters\nMCL-3D: a database for stereoscopic image quality assessment using  2D-image-plus-depth source\nGroup-based Sparse Representation for Image Restoration\nScalable Semidefinite Relaxation for Maximum A Posterior Estimation\nMulti-ellipses detection on images inspired by collective animal  behavior\nFast algorithm for Multiple-Circle detection on images using Learning  Automata\nBiofilmQuant: A Computer-Assisted Tool for Dental Biofilm Quantification\nCIDI-Lung-Seg: A Single-Click Annotation Tool for Automatic Delineation  of Lungs from CT Scans\nNear-optimal Keypoint Sampling for Fast Pathological Lung Segmentation\nOptimally Stabilized PET Image Denoising Using Trilateral Filtering\nSpatiotemporal Stacked Sequential Learning for Pedestrian Detection\nKernel Nonnegative Matrix Factorization Without the Curse of the  Pre-image - Application to Unmixing Hyperspectral Images\nProbabilistic 
Group Testing under Sum Observations: A Parallelizable  2-Approximation for Entropy Loss\nEfficient On-the-fly Category Retrieval using ConvNets and GPUs\nNovel and Automatic Parking Inventory System Based on Pattern  Recognition and Directional Chain Code\nDissimilarity-based Sparse Subset Selection\n\"Your click decides your fate\": Leveraging clickstream patterns from  MOOC videos to infer students' information processing & attrition behavior\nFunctional Principal Component Analysis and Randomized Sparse Clustering  Algorithm for Medical Image Analysis\nA Fast and Accurate Unconstrained Face Detector\nAutomatic Removal of Marginal Annotations in Printed Text Document\nA brief survey on deep belief networks and introducing a new object  oriented toolbox (DeeBNet)\nRobust 3D face recognition in presence of pose and partial occlusions or  missing parts\nClassifying sequences by the optimized dissimilarity space embedding  approach: a case study on the solubility analysis of the E. coli proteome\nThe Filament Sensor for Near Real-Time Detection of Cytoskeletal Fiber  Structures\nHD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale  Visual Recognition\nFeature Learning from Incomplete EEG with Denoising Autoencoder\nMumford-Shah and Potts Regularization for Manifold-Valued Data with  Applications to DTI and Q-Ball Imaging\nDirect Processing of Document Images in Compressed Domain\nCanonical Polyadic Decomposition with Auxiliary Information for Brain  Computer Interface\nDetecting Figures and Part Labels in Patents: Competition-Based  Development of Image Processing Algorithms\nData Assimilation of Satellite Fire Detection in Coupled Atmosphere-Fire  Simulation by WRF-SFIRE\nTowards a Visual Turing Challenge\nOn Chord and Sagitta in ${\\mathbb Z}^2$: An Analysis towards Fast and  Robust Circular Arc Detection\nPower-Law Graph Cuts\nParallax Effect Free Mosaicing of Underwater Video Sequence Based on  Texture Features\nMulti-modal Image Registration 
for Correlative Microscopy\nA Comparative Study of Techniques of Distant Reconstruction of  Displacement Fields by using DISTRESS Simulator\nCombining contextual and local edges for line segment extraction in  cluttered images\nFrom Captions to Visual Concepts and Back\nSIRF: Simultaneous Image Registration and Fusion in A Unified Framework\nVisual Noise from Natural Scene Statistics Reveals Human Scene Category  Representations\nPersistent Evidence of Local Image Properties in Generic ConvNets\nUnderstanding Trajectory Behavior: A Motion Pattern Approach\nThe Effect of Wedge Tip Angles on Stress Intensity Factors in the  Contact Problem between Tilted Wedge and a Half Plane with an Edge Crack  Using Digital Image Correlation\nImproved 8-point Approximate DCT for Image and Video Compression  Requiring Only 14 Additions\nSketch and Validate for Big Data Clustering\nA General Preprocessing Method for Improved Performance of Epipolar  Geometry Estimation Algorithms\nGraphical Potential Games\nLanguage Models for Image Captioning: The Quirks and What Works\nMachine learning based data mining for Milky Way filamentary structures  reconstruction\nAutoencoding the Retrieval Relevance of Medical Images\nDCTNet : A Simple Learning-free Approach for Face Recognition\nTowards Good Practices for Very Deep Two-Stream ConvNets\nFast Segmentation of Left Ventricle in CT Images by Explicit Shape  Regression using Random Pixel Difference Features\nTracking Randomly Moving Objects on Edge Box Proposals\nA Visual Embedding for the Unsupervised Extraction of Abstract Semantics\nTransfer Learning from Deep Features for Remote Sensing and Poverty  Mapping\nFast Single Image Super-Resolution\nDeep Compression: Compressing Deep Neural Networks with Pruning, Trained  Quantization and Huffman Coding\nDiverse Large-Scale ITS Dataset Created from Continuous Learning for  Real-Time Vehicle Detection\nText-Attentional Convolutional Neural Networks for Scene Text Detection\nDual Principal 
Component Pursuit\nA Picture is Worth a Billion Bits: Real-Time Image Reconstruction from  Dense Binary Pixels\nLayer-Specific Adaptive Learning Rates for Deep Networks\nToward Long Distance, Sub-diffraction Imaging Using Coherent Camera  Arrays\nFireCaffe: near-linear acceleration of deep neural network training on  compute clusters\nImage-Based Correction of Continuous and Discontinuous Non-Planar Axial  Distortion in Serial Section Microscopy\nConvolutional Neural Network for Stereotypical Motor Movement Detection  in Autism\nA Century of Portraits: A Visual Historical Record of American High  School Yearbooks\nParkinson's disease patient rehabilitation using gaming platforms:  lessons learnt\nA Light CNN for Deep Face Representation with Noisy Labels\nExperimental robustness of Fourier Ptychography phase retrieval  algorithms\nHierarchical Recurrent Neural Encoder for Video Representation with  Application to Captioning\nSherlock: Scalable Fact Learning in Images\nAdversarial Manipulation of Deep Representations\nDeep Compositional Captioning: Describing Novel Object Categories  without Paired Training Data\nSuper-Resolution with Deep Convolutional Sufficient Statistics\nDensity Modeling of Images using a Generalized Normalization  Transformation\nUnsupervised Learning of Visual Structure using Predictive Generative  Networks\nAn Immersive Telepresence System using RGB-D Sensors and Head Mounted  Display\nPushing the Boundaries of Boundary Detection using Deep Learning\nThe Limitations of Deep Learning in Adversarial Settings\nFine-Grain Annotation of Cricket Videos\nLocNet: Improving Localization Accuracy for Object Detection\nDesign of Kernels in Convolutional Neural Networks for Image  Classification\nBuilding Machines That Learn and Think Like People\nWaterdrop Stereo\nImage Captioning with Deep Bidirectional LSTMs\nFace Image Analysis using AAM, Gabor, LBP and WD features for Gender,  Age, Expression and Ethnicity Classification\nGIFT: A Real-time and 
Scalable 3D Shape Search Engine\nOne-class classifiers based on entropic spanning graphs\nFilling in the details: Perceiving from low fidelity images\nEvolutionary Projection Selection for Radon Barcodes\nDeep Aesthetic Quality Assessment with Semantic Information\nEnd-to-End Tracking and Semantic Segmentation Using Recurrent Neural  Networks\nOnline Human Action Detection using Joint Classification-Regression  Recurrent Neural Networks\nA Framework for Human Pose Estimation in Videos\nDeep Learning for Saliency Prediction in Natural Video\nA Novel Method to Study Bottom-up Visual Saliency and its Neural  Mechanism\nExploiting Temporal Information for DCNN-based Fine-Grained Object  Classification\nDetailed Garment Recovery from a Single-View Image\nFacial Expression Recognition Using a Hybrid CNN-SIFT Aggregator\nDeep Motif Dashboard: Visualizing and Understanding Genomic Sequences  Using Deep Neural Networks\nCan Peripheral Representations Improve Clutter Metrics on Complex  Scenes?\nDeep Convolutional Neural Networks and Data Augmentation for  Environmental Sound Classification\nAutomated Selection of Uniform Regions for CT Image Quality Detection\nA Convolutional Autoencoder for Multi-Subject fMRI Data Aggregation\nLeveraging Structural Context Models and Ranking Score Fusion for Human  Interaction Prediction\nTowards Bayesian Deep Learning: A Framework and Some Existing Methods\nDensely Connected Convolutional Networks\nCurvature Integration in a 5D Kernel for Extracting Vessel Connections  in Retinal Images\n"
  },
  {
    "path": "data/arxiv/language generation_14514_15000_15_title.txt",
    "content": "On Derivation Languages of Flat Splicing Systems\nWhat's in a Name?\nFence - An Efficient Parser with Ambiguity Support for Model-Driven  Language Specification\nOn Even Linear Indexed Languages with a Reduction to the Learning of  Context-Free Languages\nMulti-Level Languages are Generalized Arrows\nPumping Lemma for Higher-order Languages\nConcept Generation in Language Evolution\nA Perfect Model for Bounded Verification\nVisual Representation of 3D Language Constructs Specified by Generic  Depictions\nHuman languages order information efficiently\nGeneration and analysis of lamplighter programs\nPrincipal ideal languages and synchronizing automata\nUsing Artificial Tokens to Control Languages for Multilingual Image  Caption Generation\nDesign and Implementation of a Reversible Object-Oriented Programming  Language\nA Syntactic Neural Model for General-Purpose Code Generation\nOn Store Languages of Language Acceptors\nAutomatic Generation of Language-Independent Features for Cross-Lingual  Classification\nRational stochastic languages\nRegular Languages are Church-Rosser Congruential\nInvisible pushdown languages\nTransliteration in Any Language with Surrogate Languages\nConstruction of rational expression from tree automata using a  generalization of Arden's Lemma\nRandomness of formal languages via automatic martingales\nFormal Properties of XML Grammars and Languages\nDeleting Powers in Words\nDyck-based characterizations of Indexed Languages\nBasic Classes of Grammars with Prohibition\nVarieties of Unranked Tree Languages\nThe MMT API: A Generic MKM System\nA prototype Malayalam to Sign Language Automatic Translator\nTranslating into Free Word Order Languages\nCircular Languages Generated by Complete Splicing Systems and Pure  Unitary Languages\nA Tool for Model-Based Language Specification\nQuantum, Stochastic, and Pseudo Stochastic Languages with Few States\nA Constraint-Satisfaction Parser for Context-Free Grammars\nEngineering Tagging 
Languages for DSLs\nAdapting the Core Language Engine to French and Spanish\nExploiting Similarities among Languages for Machine Translation\nGeneric Results for Concatenation Hierarchies\nOn context-free languages of scattered words\nInclusion of regular and linear languages in group languages\nUser Reviews and Language: How Language Influences Ratings\nManyDSL: A Host for Many Languages\nModeling of languages for tensor manipulation\nLearning Algorithm for Relation-Substitutable Context-Free Languages\nGeneric Description of Well-Scoped, Well-Typed Syntaxes\nComputational Representation of Linguistic Structures using  Domain-Specific Languages\nChart-driven Connectionist Categorial Parsing of Spoken Korean\nParsing of part-of-speech tagged Assamese Texts\nOn the Structure and Complexity of Rational Sets of Regular Languages\nA Survey and Classification of Controlled Natural Languages\nDecidability of regular language genus computation\nA Domain Specific Transformation Language\nPrecise but Natural Specification for Robot Tasks\nAn undecidable property of context-free languages\nFinite Orbits of Language Operations\nFinitely generated ideal languages and synchronizing automata\nOn the structure and syntactic complexity of generalized definite  languages\nRegular Boardgames\nRandom Words in a (Weighted) Regular Language: a Free Energy Approach\nCoqatoo: Generating Natural Language Versions of Coq Proofs\nFrom Syntactic Theories to Interpreters: A Specification Language and  Its Compilation\nOn the Complexity of Quantum Languages\nImmunity and Pseudorandomness of Context-Free Languages\nCone types and geodesic languages for lamplighter groups and Thompson's  group F\nPiecewise excluding geodesic languages\nRewriting Preserving Recognizability of Finite Tree Languages\nContextual Information and Specific Language Models for Spoken Language  Understanding\nAutomata and Reduced Words in the Free Group\nAbstraction Level Taxonomy of Programming Language 
Frameworks\nExistential Rule Languages with Finite Chase: Complexity and  Expressiveness\nAn assessment of orthographic similarity measures for several African  languages\nAn approach to computing downward closures\nInteractive Grounded Language Acquisition and Generalization in a 2D  World\nNatural Language Statistical Features of LSTM-generated Texts\nA General Architecture for Heterogeneous Language Engineering and  Projectional Editor Support\nSynthetic Data for Neural Machine Translation of Spoken-Dialects\nAnalyzing and Improving Statistical Language Models for Speech  Recognition\nBeyond $ω$BS-regular Languages: $ω$T-regular Expressions and  Counter-Check Automata\nA Model-Driven Parser Generator, from Abstract Syntax Trees to Abstract  Syntax Graphs\nA General Architecture for Language Engineering (GATE) - a new approach  to Language Engineering R&D\nMethods and Tools for Building the Catalan WordNet\nOperational Semantics and Type Soundness of Quantum Programming Language  LanQ\nA Domain-Specific Language for Programming in the Tile Assembly Model\nExtensible Pattern Matching in an Extensible Language\nProbabilistic and Geometric Languages in the Context of the Principle of  Least Action\nAttributes as Semantic Units between Natural Language and Visual  Recognition\nTeaching natural language to computers\nA Paradigm for Situated and Goal-Driven Language Learning\nGuStL - An Experimental Guarded States Language\nDual Language Models for Code Mixed Speech Recognition\nSemantics of Programming Languages: A Tool-Oriented Approach\nThe Concurrent Language Aldwych\nStemmer for Serbian language\nThe distributed Language Hello White Paper\nCan Machines Think in Radio Language?\nA modifiction of the CSP algorithm for infinite languages\nThe Role of the Gricean Maxims in the Generation of Referring  Expressions\nTowards rule-based visual programming of generic visual systems\nFormalization of simplification for context-free grammars\nUnsupervised Language 
Acquisition\nFrom Regular to Strictly Locally Testable Languages\nImplementation of nlization framework for verbs, pronouns and  determiners with eugene\nLearning to Create and Reuse Words in Open-Vocabulary Neural Language  Modeling\nDeciding Whether a Regular Language is Generated by a Splicing System\nOrdered Monoids: Languages and Relations\nGeneral-Purpose Visual Language and Information System with Case-Studies  in Developing Business Applications\nPatterns of Language - A Population Model for Language Structure\nTowards Generic Refactoring\nLearning to Order Facts for Discourse Planning in Natural Language  Generation\nSplicing systems and the Chomsky hierarchy\nA Neural Knowledge Language Model\nA Generative Parser with a Discriminative Recognition Algorithm\nLanguage Detection For Short Text Messages In Social Media\nThe ModelCC Model-Based Parser Generator\nSpecifying Logic Programs in Controlled Natural Language\nNotions of Equivalence in Software Design\nDescriptional complexity of bounded context-free languages\nThe Magic Number Problem for Subregular Language Families\nOn reverse-engineering the KUKA Robot Language\nPower of Randomization in Automata on Infinite Strings\nEngineering Delta Modeling Languages\nPOLYGLOT-NER: Massive Multilingual Named Entity Recognition\nHindi to English Transfer Based Machine Translation System\nSyntactic complexity of regular ideals\nClassifying Syntactic Regularities for Hundreds of Languages\nMatLM: a Matrix Formulation for Probabilistic Language Models\nSome Subclasses of Linear Languages based on Nondeterministic Linear  Automata\nThe probabilistic analysis of language acquisition: Theoretical,  computational, and experimental analysis\nTranslating Nondeterministic Functional Language based on Attribute  Grammars into Java\nUsing graph transformation algorithms to generate natural language  equivalents of icons expressing medical concepts\nExtensible type checker for parser generation\nThe morphospace of language 
networks\nApplying Explanation-based Learning to Control and Speeding-up Natural  Language Generation\nJulia: A Fast Dynamic Language for Technical Computing\nThe Cyan Language\nLSTM based Conversation Models\nBidirectional American Sign Language to English Translation\nMicroservices: a Language-based Approach\nModelling Word Burstiness in Natural Language: A Generalised Polya  Process for Document Language Models in Information Retrieval\nMyProLang - My Programming Language: A Template-Driven Automatic Natural  Programming Language\nBinary equality sets are generated by two words\nSynthetic Language Generation and Model Validation in BEAST2\nTradeoffs in Metaprogramming\nLearning rational stochastic languages\nUsing Inverse lambda and Generalization to Translate English to Formal  Languages\nThe Chomsky-Schützenberger Theorem for Quantitative Context-Free  Languages\nUsing Duality in Circuit Complexity\nSemantic Parsing Natural Language into SPARQL: Improving Target Language  Representation with Neural Attention\nA pragmatic theory of generic language\nSemantically Conditioned LSTM-based Natural Language Generation for  Spoken Dialogue Systems\nSHAPED: Shared-Private Encoder-Decoder for Text Style Adaptation\nPrimitive words and roots of words\nPseudorandom Generators Against Advised Context-Free Languages\nControlling Linguistic Style Aspects in Neural Language Generation\nLanguage Modeling with Generative AdversarialNetworks\nCompleteness of Compositional Translation for Context-Free Grammars\nCo-evolution of Language and of the Language Acquisition Device\nTemporal Phylogenetic Networks and Logic Programming\nFuzzy L languages\nSimulation of Quantum Error Correcting Code\nHow to Evaluate Controlled Natural Languages\nMulti-valued Action Languages in CLP(FD)\nStrategical languages of infinite words\nQuantum Finite Automata and Probabilistic Reversible Automata: R-trivial  Idempotent Languages\nFurther Results on Languages of Membrane Structures\nConjugacy growth 
series and languages in groups\nOn sets of numbers rationally represented in a rational base number  system\nRhythmic generation of infinite trees and languages\nPumping lemma and Ogden lemma for displacement context-free grammars\nLinear Context-Free Tree Languages and Inverse Homomorphisms\nTowards an algebraic characterization of rational word functions\nFormal Specification and Integration of Distributed Security Policies\nOn Finite-Index Indexed Grammars and Their Restrictions\nProbabilistic Typology: Deep Generative Models of Vowel Inventories\nNatural Language Processing: State of The Art, Current Trends and  Challenges\nCounterfactual Language Model Adaptation for Suggesting Phrases\nTools and resources for Romanian text-to-speech and speech-to-text  applications\nLanguage Modelling For Task-Oriented Domains\nQuotient Complexity of Regular Languages\nReversible Language Extensions and their Application in Debugging\nMultipass automata and group word problems\nOn Infinite Words Determined by Indexed Languages\nDesign Guidelines for Domain Specific Languages\nLanguage Recognition using Random Indexing\nBeyond Word-based Language Model in Statistical Machine Translation\nMany Languages, One Parser\nA Bayesian Model of Multilingual Unsupervised Semantic Role Induction\nTwo Discourse Driven Language Models for Semantics\nLearning Lexical Entries for Robotic Commands using Crowdsourcing\nStatistical Machine Translation for Indian Languages: Mission Hindi\nStatistical Machine Translation for Indian Languages: Mission Hindi 2\nA Multichannel Convolutional Neural Network For Cross-language Dialog  State Tracking\nThe Word Problem of $\\mathbb{Z}^n$ Is a Multiple Context-Free Language\nListen, Interact and Talk: Learning to Speak via Interaction\nLearning with Latent Language\nLanguage: The missing selection pressure\nQuery learning of derived $ω$-tree languages in polynomial time\nA Formal Comparison of Visual Web Wrapper Generators\nMarciani Normal Form of 
context-free grammars\nUsing Pseudo-Stochastic Rational Languages in Probabilistic Grammatical  Inference\nParameterized Neural Network Language Models for Information Retrieval\nRestrictions on Tree Adjoining Languages\nA Generalized Language Model as the Combination of Skipped n-grams and  Modified Kneser-Ney Smoothing\nApplications of L systems to group theory\nA Categorical Model for a Quantum Circuit Description Language (Extended  Abstract)\nLearning to Generate Wikipedia Summaries for Underserved Languages from  Wikidata\nGenerating Sentence Planning Variations for Story Telling\nComposable Languages for Bioinformatics: The NYoSh experiment\nThe complexity of conservative valued CSPs\nA Study of Language Usage Evolution in Open Source Software\nAn Autoencoder Approach to Learning Bilingual Word Representations\nGrounded Language Learning in a Simulated 3D World\nSentence Object Notation: Multilingual sentence notation based on  Wordnet\nBackward and Forward Language Modeling for Constrained Sentence  Generation\nNavigational Instruction Generation as Inverse Reinforcement Learning  with Neural Machine Translation\nRDF2PT: Generating Brazilian Portuguese Texts from RDF Data\nOn the Size Complexity of Non-Returning Context-Free PC Grammar Systems\nDescriptional Complexity of Three-Nonterminal Scattered Context  Grammars: An Improvement\nGuided Grammar Convergence. Full Case Study Report. 
Generated by  converge::Guided\nMorphological Analyzer and Generator for Russian and Ukrainian Languages\nLearning to Generate Compositional Color Descriptions\nDynamic Entity Representations in Neural Language Models\nCLARE: A Contextual Reasoning and Cooperative Response Framework for the  Core Language Engine\nPluggable AOP: Designing Aspect Mechanisms for Third-party Composition\nInformation Flow Analysis for a Dynamically Typed Functional Language  with Staged Metaprogramming\nOn the effect of the IO-substitution on the Parikh image of semilinear  AFLs\nDistinguishability Operations and Closures on Regular Languages\nRank diversity of languages: Generic behavior in computational  linguistics\nGenerating Domain-Specific Transformation Languages for Component &  Connector Architecture Descriptions\nDocument Context Language Models\nThe Language of Search\nEfficient Transfer Learning Schemes for Personalized Language Modeling  using Recurrent Neural Network\nLanguage-Based Image Editing with Recurrent Attentive Models\nA Tool for Collecting Domain Dependent Sortal Constraints From Corpora\nIntroduction to the CoNLL-2002 Shared Task: Language-Independent Named  Entity Recognition\nIntroduction to the CoNLL-2003 Shared Task: Language-Independent Named  Entity Recognition\nRepresenting Real Numbers in a Generalized Numeration Systems\nExtended Lambek calculi and first-order linear logic\nPRoMoTo 2013 proceedings\nConjugacy languages in groups\nModeling languages from graph networks\nMATLAB based language for generating randomized multiple choice  questions\nLanguage Models with Pre-Trained (GloVe) Word Embeddings\nCompiling Purely Functional Structured Programs\nA Survey of Neural Network Techniques for Feature Extraction from Text\nThe ModelCC Model-Driven Parser Generator\nGenerative Software Development\nMulti-lingual neural title generation for e-Commerce browse pages\nFibred Computational Effects\nCompiling Language Definitions: The ASF+SDF 
Compiler\nContext-Free Language Theory Formalization\nLearning a bidirectional mapping between human whole-body motion and  natural language using deep recurrent neural networks\nA framework for lexical representation\nSpeech Recognition by Composition of Weighted Finite Automata\nFull abstraction for nominal general references\nGADT meet Subtyping\nOn periodic points of free inverse monoid endomorphisms\nGADTs meet subtyping\nGeneralized Eilenberg Theorem I: Local Varieties of Languages\nA Binary Data Stream Scripting Language\nForkable Regular Expressions\nAn investigation into language complexity of World-of-Warcraft  game-external texts\nA Neural Architecture for Generating Natural Language Descriptions from  Source Code Changes\nEarly Experience with ASDL in lcc\nLogic Engines as Interactors\nAdapting general-purpose speech recognition engine output for  domain-specific natural language question answering\nIntelligent Voice Prosthesis: Converting Icons into Natural Language  Sentences\nStorage of Natural Language Sentences in a Hopfield Network\nHandling Defeasibilities in Action Domains\nAn Abstract Programming System\nA Language-theoretic View on Guidelines and Consistency Rules of UML\nMean-payoff Automaton Expressions\nDescriptional Complexity of the Languages KaL: Automata, Monoids and  Varieties\nA dichotomy theorem for conservative general-valued CSPs\nAging in language dynamics\nA DSL for Mapping Abstract Syntax Models to Concrete Syntax Models in  ModelCC\nSemilinearity and Context-Freeness of Languages Accepted by Valence  Automata\nOne Billion Word Benchmark for Measuring Progress in Statistical  Language Modeling\nNew Results on the Minimum Amount of Useful Space\nWhy It's Nice to be Quoted: Quasiquoting for Prolog\nSeparating Regular Languages with First-Order Logic\nTechnical Report: Towards a Universal Code Formatter through Machine  Learning\nProcessing Natural Language About Ongoing Actions\nSMPOST: Parts of Speech Tagger for Code-Mixed Indic 
Social Media Text\nComputing the longest common prefix of a context-free language in  polynomial time\nLong-Short Range Context Neural Networks for Language Modeling\nOn the widths of regular and context free languages, with an application  to information flow\nA categorical foundation for structured reversible flowchart languages:  Soundness and adequacy\nGender Aware Spoken Language Translation Applied to English-Arabic\nUnpaired Image Captioning by Language Pivoting\nAdversarial Generation of Natural Language\nOn Generalization of Definitional Equivalence to Languages with  Non-Disjoint Signatures\nModelling Concurrent Behaviors in the Process Specification Language\nSymbolic Languages and Ars Combinatoria\nSelf-assembling interactive modules: A research programme\nMonitorability of $ω$-regular languages\nLanguage Without Words: A Pointillist Model for Natural Language  Processing\nOn Formal Reasoning on the Semantics of PLC using Coq\nThe genus of regular languages\nArchitecture and Behavior Modeling of Cyber-Physical Systems with  MontiArcAutomaton\nA Domain-Specific Language and Editor for Parallel Particle Methods\nLeast Generalizations and Greatest Specializations of Sets of Clauses\nAssessing the Stylistic Properties of Neurally Generated Text in  Authorship Attribution\nQuotient complexity of ideal languages\nPhonetic based SoundEx & ShapeEx algorithm for Sindhi Spell Checker  System\nLolisa: Formal Syntax and Semantics for a Subset of the Solidity  Programming Language\nInitial Semantics for Reduction Rules\nTwo-level, Many-Paths Generation\nConstructing a Natural Language Inference Dataset using Generative  Neural Networks\nTagset Design and Inflected Languages\nBayesian Grammar Induction for Language Modeling\nBetter Language Models with Model Merging\nThe Theoretical Status of Ontologies in Natural Language Processing\n\"I'm sorry Dave, I'm afraid I can't do that\": Linguistics, Statistics,  and Natural Language Processing circa 2001\nSimilarity-Based 
Supervisory Control of Discrete Event Systems\nSequential products in effect categories\nClosures in Formal Languages and Kuratowski's Theorem\nApplication of Generalised sequential crossover of languages to  generalised splicing\nLinear-Logic Based Analysis of Constraint Handling Rules with  Disjunction\nConsidering a resource-light approach to learning verb valencies\nBounded Counter Languages\nSymbolic Representation of Algorithmic Game Semantics\nSentence Compression in Spanish driven by Discourse Segmentation and  Language Models\nSome proof theoretical remarks on quantification in ordinary language\nInformation content versus word length in natural language: A reply to  Ferrer-i-Cancho and Moscoso del Prado Martin [arXiv:1209.1751]\nEliminating Network Protocol Vulnerabilities Through Abstraction and  Systems Language Design\nOperads, quasiorders, and regular languages\nLearning Language from a Large (Unannotated) Corpus\nModeling multi-stage decision optimization problems\nPhrase Based Language Model For Statistical Machine Translation\nVarieties of Languages in a Category\nNeural Generation of Regular Expressions from Natural Language with  Minimal Domain Knowledge\nNumerically Grounded Language Models for Semantic Error Correction\nDependently Typed Programming based on Automated Theorem Proving\nTowards Efficient Abstractions for Concurrent Consensus\nA Fibrational Approach to Automata Theory\nDynamic Bayesian Ontology Languages\nIVOA recommendation: Parameter Description Language Version 1.0\nRationality for subclasses of 321-avoiding permutations\nNominal LCF: A Language for Generic Proof\nGeneralizing and Hybridizing Count-based and Neural Language Models\nAnnotation Methodologies for Vision and Language Dataset Creation\nRobust Natural Language Processing - Combining Reasoning, Cognitive  Semantics and Construction Grammar for Spatial Language\nA Framework for Extending microKanren with Constraints\nGeneric Axiomatization of Families of Noncrossing 
Graphs in Dependency  Parsing\nSynthesising Sign Language from semantics, approaching \"from the target  and back\"\nSentence Correction Based on Large-scale Language Modelling\nLow-Rank RNN Adaptation for Context-Aware Language Modeling\nUnsupervised Morphological Expansion of Small Datasets for Improving  Word Embeddings\nLabel Languages of 8-directional Array P System\nBiomedical term normalization of EHRs with UMLS\nNeural Program Search: Solving Programming Tasks from Description and  Examples\nCAESAR: Context Awareness Enabled Summary-Attentive Reader\nRemarks on formal languages and the model theory of monoids\nUnderstanding Editing Behaviors in Multilingual Wikipedia\nSoftware Infrastructure for Natural Language Processing\nNext Generation Language Resources using GRID\nQuantum Physics and Human Language\nThe C Object System: Using C as a High-Level Object-Oriented Language\nProceedings 5th International Workshop on Logical Frameworks and  Meta-languages: Theory and Practice\nSignsWorld; Deeping Into the Silence World and Hearing Its Signs (State  of the Art)\nThe essence of component-based design and coordination\nAcronym recognition and processing in 22 languages\nArchitecture of an Ontology-Based Domain-Specific Natural Language  Question Answering System\nINAUT, a Controlled Language for the French Coast Pilot Books  Instructions nautiques\nPermutations of context-free, ET0L and indexed languages\nExplaining Violation Traces with Finite State Natural Language  Generation Models\nA New Skill Based Robot Programming Language Using UML/P Statecharts\nToward an Energy Efficient Language and Compiler for (Partially)  Reversible Algorithms\nDependent Types for Multi-Rate Flows in Synchronous Programming\nUnderstanding Grounded Language Learning Agents\nSome apparently disjoint aims and requirements for grammar development  environments: the case of natural language generation\nOn Descriptive Complexity, Language Complexity, and GB\nNez: practical open grammar 
language\nGeneralizing input-driven languages: theoretical and practical benefits\nA Flexible Pragmatics-driven Language Generator for Animated Agents\nDialogue as Discourse: Controlling Global Properties of Scripted  Dialogue\nA Context-aware Natural Language Generator for Dialogue Systems\nMorphological Inflection Generation Using Character Sequence to Sequence  Learning\nConciseness through Aggregation in Text Generation\nDifferentially Private Distributed Learning for Language Modeling Tasks\nColored operads, series on colored operads, and combinatorial generating  systems\nTactical Generation in a Free Constituent Order Language\nGenerating Natural Language Inference Chains\nModel-based generation of natural language specifications\nInitiality for Typed Syntax and Semantics\nA Corrective Training Algorithm for Adaptive Learning in Bag Generation\nAn Information Structural Approach to Spoken Language Generation\nC++ Templates as Partial Evaluation\nLanguage Diversity of Measured Quantum Processes\nNon-redundant random generation from weighted context-free languages\nGenerative Knowledge Transfer for Neural Language Models\nRecognising and Generating Terms using Derivatives of Parsing Expression  Grammars\nLanguage Generation with Recurrent Generative Adversarial Networks  without Pre-training\nFormal Languages in Dynamical Systems\nA Framework for Natural Language Interfaces to Temporal Databases\nRegular Ideal Languages and Their Boolean Combinations\nOn the Formal Semantics of Speech-Act Based Communication in an  Agent-Oriented Programming Language\nEfficiently Computing Edit Distance to Dyck Language\nTowards More Security in Data Exchange: Defining Unparsers with  Context-Sensitive Encoders for Context-Free Grammars\nHuman language reveals a universal positivity bias\nDesign of an intermediate representation for query languages\nLeveraging Large Amounts of Weakly Supervised Data for Multi-Language  Sentiment Classification\nOn computational complexity of 
Set Automata\nFinite-State Approximation of Phrase-Structure Grammars\nAn Iterative Algorithm to Build Chinese Language Models\nHybrid language processing in the Spoken Language Translator\nDesign and Implementation of a Computational Lexicon for Turkish\nEstimation of English and non-English Language Use on the WWW\nTowards a query language for annotation graphs\nMonadic Datalog and the Expressive Power of Languages for Web  Information Extraction\nA Framework for Interoperability\nEffects of Language Modeling on Speech-driven Question Answering\nECA-LP / ECA-RuleML: A Homogeneous Event-Condition-Action Logic  Programming Language\nRational semigroup automata\nEntropy sensitivity of languages defined by infinite automata, via  Markov chains with forbidden transitions\nRewriting Logic Semantics of a Plan Execution Language\nRewriting Constraint Models with Metamodels\nA Type System for Tom\nProgramming Discrete Physical Systems\nReflection-based language support for the heterogeneous capture and  restoration of running computations\nTheory of Atomata\nProgram Equivalence in Linear Contexts\nSaying Hello World with Epsilon - A Solution to the 2011 Instructive  Case\nQuery Language for Complex Similarity Queries\nA 10-dimensional Phonetic-prosodic Space and its Stochastic Structure (A  framework for probabilistic modeling of spoken languages and their phonology)\nPROOFTOOL: a GUI for the GAPT Framework\nDGT-TM: A freely Available Translation Memory in 22 Languages\nVerifiable Source Code Documentation in Controlled Natural Language\nAnalysis Tool for UNL-Based Knowledge Representation\nOn the Computation of Distances for Probabilistic Context-Free Grammars\nRepresenting and Reasoning about Game Strategies\nGaussian Tree Constraints Applied to Acoustic Linguistic Functional Data\nIFC Inside: Retrofitting Languages with Dynamic Information Flow Control  (Extended Version)\nThe role of concurrency in an evolutionary view of programming  abstractions\nMultilingual Image 
Description with Neural Sequence Models\nA Multilingual FrameNet-based Grammar and Lexicon for Controlled Natural  Language\nCorrection of Noisy Sentences using a Monolingual Corpus\nToward verbalizing ontologies in isiZulu\nBlack-box Integration of Heterogeneous Modeling Languages for  Cyber-Physical Systems\nConvolutional Neural Networks over Tree Structures for Programming  Language Processing\nMontiCore: Modular Development of Textual Domain Specific Languages\nComputing downward closures for stacked counter automata\nConvolutional Neural Network Architectures for Matching Natural Language  Sentences\nA Survey of Current Datasets for Vision and Language Research\nSpin Glass Models of Syntax and Language Evolution\nOn the Expressiveness of Joining\nNeural Language Correction with Character-Based Attention\nWordNet2Vec: Corpora Agnostic Word Vectorization Method\nVisualizing Natural Language Descriptions: A Survey\nDesigning a semantic model for a wide-spectrum language with concurrency\nA Language-theoretic View on Network Protocols\nFalse-Friend Detection and Entity Matching via Unsupervised  Transliteration\nAn Empirical Study of Language CNN for Image Captioning\nMost Complex Non-Returning Regular Languages\nWhat can the programming language Rust do for astrophysics?\nA Morphology-aware Network for Morphological Disambiguation\nWord forms - not just their lengths- are optimized for efficient  communication\nCross-lingual and cross-domain discourse segmentation of entire  documents\nMinimal Forbidden Factors of Circular Words\nGrounding Spatio-Semantic Referring Expressions for Human-Robot  Interaction\nFluency-Guided Cross-Lingual Image Captioning\nA New Semantic Theory of Natural Language\nAuto Analysis of Customer Feedback using CNN and GRU Network\nPyFml - a Textual Language For Feature Modeling\nPTL-separability and closures for WQOs on words\nCanonizable Partial Order Generators and Regular Slice Languages\nThe SQL++ Query Language: Configurable, 
Unifying and Semi-structured\nAn Efficient Generation Algorithm for Lexicalist MT\nSegregatory Coordination and Ellipsis in Text Generation\nInfinite Correlation in Measured Quantum Processes\nAnisimov's Theorem for inverse semigroups\nGeneralized LR parsing and the shuffle operator\nNatural Language Generation in Healthcare: Brief Review\nA Survey of Paraphrasing and Textual Entailment Methods\nGrammatical Aspects for Language Descriptions\nLiquidsoap: a High-Level Programming Language for Multimedia Streaming\nRationalization: A Neural Machine Translation Approach to Generating  Natural Language Explanations\nSceneSeer: 3D Scene Design with Natural Language\nAffect-LM: A Neural Language Model for Customizable Affective Text  Generation\nGenerating Realtime Motion Plans from Attribute-Based Natural Language  Instructions Using Dynamic Constraint Mapping\nHarvesting Creative Templates for Generating Stylistically Varied  Restaurant Reviews\nCost and dimension of words of zero topological entropy\nText2Action: Generative Adversarial Synthesis from Language to Action\nOn the Design of Generic Static Analyzers for Modern Imperative  Languages\nToward an example-based machine translation from written text to ASL  using virtual agent animation\nBisimulation of Labelled State-to-Function Transition Systems  Coalgebraically\nImproving the Performance of English-Tamil Statistical Machine  Translation System using Source-Side Pre-Processing\nApproximate N-Gram Markov Model for Natural Language Generation\nLinguistics Computation, Automatic Model Generation, and Intensions\nDefault Handling in Incremental Generation\nModeling informational novelty in a conversational system with a hybrid  statistical and grammar-based approach to natural language generation\nThe Design and Algorithms of a Verification Condition Generator\nA Model-Driven Probabilistic Parser Generator\nModelling homogeneous generative meta-programming\nThe Code2Text Challenge: Text Generation in Source Code 
Libraries\nHierarchical Text Generation and Planning for Strategic Dialogue\nDomain and Language Independent Feature Extraction for Statistical Text  Categorization\nGeneral Game Management Agent\nChurch: a language for generative models\nUnified Form Language: A domain-specific language for weak formulations  of partial differential equations\nLanguage-based Games\nAugur: a Modeling Language for Data-Parallel Probabilistic Inference\nGenerating Natural Language Descriptions from OWL Ontologies: the  NaturalOWL System\nEfficient Editor Generation for Compositional DSLs in Eclipse\nStochastic Language Generation in Dialogue using Recurrent Neural  Networks with Convolutional Sentence Reranking\nOn Varieties of Ordered Automata\nText2Shape: Generating Shapes from Natural Language by Learning Joint  Embeddings\nExpressiveness and Closure Properties for Quantitative Languages\nClustering Web Search Results For Effective Arabic Language Browsing\nSemantic Folding Theory And its Application in Semantic Fingerprinting\nEvolutionary forces in language change\nThe KIT Motion-Language Dataset\nRank dynamics of word usage at multiple scales\nTowards Understanding Generics in Mainstream OOP\nBeginner's Luck: A Language for Property-Based Generators\nN-Gram Cluster Identification During Empirical Knowledge Representation  Generation\nThree Generative, Lexicalised Models for Statistical Parsing\nParsing and Generation with Tabulation and Compilation\nAn Analysis of Lambek's Production Machines\nRegular geodesic normal forms in virtually abelian groups\nA c*-algebraic framework for quantum groups\nTypesafe Modeling in Text Mining\nSentiment Analysis: A Survey\nJava Generics are Turing Complete\nMultimodal Semantic Simulations of Linguistically Underspecified Motion  Events\nGenerating Memorable Mnemonic Encodings of Numbers\nCharacterisation of (Sub)sequential Rational Functions over a General  Class Monoids\nTowards Hybrid Intensional Programming with JLucid, Objective Lucid, 
and  General Imperative Compiler Framework in the GIPSY\nMarkov semigroups, monoids, and groups\nCorpora Preparation and Stopword List Generation for Arabic data in  Social Network\nOn the Expressive Power of Multiple Heads in CHR\nToward an architecture for quantum programming\nFlavor: A Language for Media Representation\nConstraint-based verification of abstract models of multitreaded  programs\nCiliate Gene Unscrambling with Fewer Templates\nLightweight Time Modeling in Timed Creol\nIDL-Expressions: A Formalism for Representing and Parsing Finite  Languages in Natural Language Processing\nStep-Indexed Normalization for a Language with General Recursion\nBarQL: Collaborating Through Change\nA Swiss Pocket Knife for Computability\nImplementation of an Automatic Sign Language Lexical Annotation  Framework based on Propositional Dynamic Logic\nUnifying Visual-Semantic Embeddings with Multimodal Neural Language  Models\nLearning in the Rational Speech Acts Model\nModelWizard: Toward Interactive Model Construction\nCoalgebraic Characterizations of Context-Free Languages\nSystem and Methods for Converting Speech to SQL\nFrom Clarity to Efficiency for Distributed Algorithms\nTalking about the Moving Image: A Declarative Model for Image Schema  Based Embodied Perception Grounding and Language Generation\nTranslation into any natural language of the error messages generated by  any computer program\nAround Context-Free Grammars - a Normal Form, a Representation Theorem,  and a Regular Approximation\nConceptNet 5.5: An Open Multilingual Graph of General Knowledge\nBangla Word Clustering Based on Tri-gram, 4-gram and 5-gram Language  Model\nLong-Range Correlation Underlying Childhood Language and Generative  Models\nRevisiting the poverty of the stimulus: hierarchical generalization  without a hierarchical bias in recurrent neural networks\nOff-line Optimization for Earley-style HPSG Processing\nImproving the Efficiency of a Generation Algorithm for Shake and Bake  Machine 
Translation Using Head-Driven Phrase Structure Grammar\nLetting the Cat out of the Bag: Generation for Shake-and-Bake MT\nCollocational Grammar\nHigher-Order Coloured Unification and Natural Language Semantics\nTwo Sources of Control over the Generation of Software Instructions\nLanguage Trees and Zipping\nGeneralization of automatic sequences for numeration systems on a  regular language\nDynamic Nonlocal Language Modeling via Hierarchical Topic-Based  Adaptation\nReal numbers having ultimately periodic representations in abstract  numeration systems\nTest Collections for Patent-to-Patent Retrieval and Patent Map  Generation in NTCIR-4 Workshop\nMixing the Objective Caml and C# Programming Models in the .Net  Framework\nGetting More From Your Multicore: Exploiting OpenMP From An Open Source  Numerical Scripting Language\nMechanized semantics\nGroups with poly-context-free word problem\nMinimalist Grammars and Minimalist Categorial Grammars, definitions  toward inclusion of generated languages\nOn Varieties of Automata Enriched with an Algebraic Structure (Extended  Abstract)\nComplexity of Problems of Commutative Grammars\nRegular realizability problems and models of a generalized  nondeterminism\nControlled Natural Language Generation from a Multilingual  FrameNet-based Grammar\nAn Approach for Text Steganography Based on Markov Chains\nLanguage to Logical Form with Neural Attention\nWeight Computation of Regular Tree Languages\nSwift: Compiled Inference for Probabilistic Programming Languages\nProbabilistic call by push value\nDoubly-Attentive Decoder for Multi-modal Neural Machine Translation\nQuantifiers on languages and codensity monads\nData-driven Natural Language Generation: Paving the Road to Success\nGrammar induction for mildly context sensitive languages using  variational Bayesian inference\nInitiality for Typed Syntax and Semantics\nExpressiveness and Closure Properties for Quantitative Languages\nTest Case Generation for Object-Oriented Imperative 
Languages in CLP\nFASTUS: A Cascaded Finite-State Transducer for Extracting Information  from Natural-Language Text\nA Descriptive Characterization of Tree-Adjoining Languages (Full  Version)\nA Topos Foundation for Theories of Physics: I. Formal Languages for  Physics\nBiLingual Information Retrieval System for English and Tamil\nA Comparative Study of the Usability of Two Object-oriented Concurrent  Programming Languages\nLanguage learning from positive evidence, reconsidered: A  simplicity-based approach\nToric grammars: a new statistical approach to natural language modeling\nOpinion Mining In Hindi Language: A Survey\nA Reduction from Valued CSP to Min Cost Homomorphism Problem for  Digraphs\nApplied Choreographies\nSeveral types of types in programming languages\nA Polya Urn Document Language Model for Improved Information Retrieval\nAn Incremental Learner for Language-Based Anomaly Detection in XML\nThe Schützenberger product for syntactic spaces\nA Readable Read: Automatic Assessment of Language Learning Materials  based on Linguistic Complexity\nFrom Query to Usable Code: An Analysis of Stack Overflow Code Snippets\nCanonical Completeness in Lattice-Based Languages for Attribute-Based  Access Control\nA Lambda Calculus for Transfinite Arrays: Unifying Arrays and Streams\nA Language for Generic Programming in the Large\nCharacterizations of one-way general quantum finite automata\nCode Generator Composition for Model-Driven Engineering of Robotics  Component & Connector Systems\nMachine Learning of Phonologically Conditioned Noun Declensions For  Tamil Morphological Generators\nEgyptian Dialect Stopword List Generation from Social Network Data\nSemantic Refinement GRU-based Neural Language Generation for Spoken  Dialogue Systems\nAutomatic Generation of Natural Language Explanations\nLearning an Executable Neural Semantic Parser\nRole of Morphology Injection in Statistical Machine Translation\nMulti-Stage Programs are Generalized Arrows\nLarge Aperiodic 
Semigroups\nSeparating regular languages with two quantifier alternations\nGenerating Multilingual Parallel Corpus Using Subtitles\nAmalia -- A Unified Platform for Parsing and Generation\nTrainable Methods for Surface Natural Language Generation\nA Planning based Framework for Essay Generation\nMulti-domain Neural Network Language Generation for Spoken Dialogue  Systems\nFuzzy Sets Across the Natural Language Generation Pipeline\nNatural Language Generation in Dialogue using Lexicalized and  Delexicalized Data\nTweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM  Encoder-Decoder\nAdversarial Ranking for Language Generation\nNatural Language Generation for Spoken Dialogue System using RNN  Encoder-Decoder Networks\nSteering Output Style and Topic in Neural Response Generation\nEstimating Performance of Pipelined Spoken Language Translation Systems\nImproving Language Models by Clustering Training Sentences\nAdnominal adjectives, code-switching and lexicalized TAG\nConstraining Lexical Selection Across Languages Using TAGs\nImproving Statistical Language Model Performance with Automatically  Generated Word Hierarchies\nPhonetic Ambiguity : Approaches, Touchstones, Pitfalls and New  Approaches\nBuilding and Refining Abstract Planning Cases by Change of  Representation Language\nResolving Part-of-Speech Ambiguity in the Greek Language Using Learning  Techniques\nIntroduction to the GiNaC Framework for Symbolic Computation within the  C++ Programming Language\nAssertion checker for the C programming language based on computations  over event traces\nModelling Semantic Association and Conceptual Inheritance for Semantic  Analysis\nOwnership Confinement Ensures Representation Independence for  Object-Oriented Programs\nBuilding an Open Language Archives Community on the OAI Foundation\nThe Athena Data Dictionary and Description Language\nUnsupervised Topic Adaptation for Lecture Speech Retrieval\nA survey of topological work at CEOL\nQuantum finite 
multitape automata\nLinear-algebraic lambda-calculus\nAlgebraic characterization of logically defined tree languages\nLanguage structure in the n-object naming game\nOrbits of linear maps and regular languages\nTiling-Recognizable Two-Dimensional Languages: From Non-Determinism to  Determinism through Unambiguity\nIt Is NL-complete to Decide Whether a Hairpin Completion of Regular  Languages Is Regular\nOn Conditional Decomposability\nRecognizing Bangla Grammar using Predictive Parser\nA System-Level Semantics\nA Query Language for Formal Mathematical Libraries\nEnumerating regular expressions and their languages\nIncomplete Transition Complexity of Basic Operations on Finite Languages\nA Graphical Language for Proof Strategies\nA Graphical Language for Real-Time Critical Robot Commands\nDevelopment of a Hindi Lemmatizer\nCoordination Control of Discrete-Event Systems Revisited\nGraphical law beneath each written natural language\nEffective Quotation: relating approaches to language-integrated query\nOn Combinatorial Generation of Prefix Normal Words\nMore ties than we thought\nRegression analysis in quantum language\nLogics with rigidly guarded data tests\nTransfer Learning for Speech and Language Processing\nHigh-level GPU programming in Julia\nA deep language model for software code\nA Programming Language With a POMDP Inside\nL-FLAT: Logtalk Toolkit for Formal Languages and Automata Theory\nFrameNet CNL: a Knowledge Representation and Information Extraction  Language\nProceedings 13th International Workshop on Foundations of Coordination  Languages and Self-Adaptive Systems\nIdentifying an Honest ${\\rm EXP}^{\\rm NP}$ Oracle Among Many\nLoo.py: From Fortran to performance via transformation and substitution  rules\nBengali to Assamese Statistical Machine Translation using Moses (Corpus  Based)\nSyntactic Monoids in a Category\nTowards correct-by-construction product variants of a software product  line: GFML, a formal language for feature modules\nLearning 
Regular Languages over Large Ordered Alphabets\nA Hybrid Model for Enhancing Lexical Statistical Machine Translation  (SMT)\nSolution sets for equations over free groups are EDT0L languages\nSymmetry and Universality in Language Change\nRIPL: An Efficient Image Processing DSL for FPGAs\nA Framework for Analyzing Stochastic Jumps in Finance based on Belief  and Knowledge\nMorpho-syntactic Lexicon Generation Using Graph-based Semi-supervised  Learning\nProceedings 14th International Workshop on Foundations of Coordination  Languages and Self-Adaptive Systems\nWhy Just Boogie? Translating Between Intermediate Verification Languages\nA Factorized Recurrent Neural Network based architecture for medium to  large vocabulary Language Modelling\nLanguage recognition power and succintness of affine automata\nCorpus analysis without prior linguistic knowledge - unsupervised mining  of phrases and subphrase structure\nAn Implementation and Analysis of a Kernel Network Stack in Go with the  CSP Style\nPolyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic  Representation Learning\nThe Use of ICT to preserve Australian Indigenous Culture and Language -  a Preliminary Proposal using the Activity Theory Framework\nInducing Multilingual Text Analysis Tools Using Bidirectional Recurrent  Neural Networks\nEnabling Medical Translation for Low-Resource Languages\nEfficient Implementation of a Higher-Order Language with Built-In AD\nAttention-based Memory Selection Recurrent Network for Language Modeling\nSentence-level dialects identification in the greater China region\nRobust Multilingual Named Entity Recognition with Shallow  Semi-Supervised Features\nDAWT: Densely Annotated Wikipedia Texts across multiple languages\nThe Formal Semantics of Rascal Light\nAutomated Hate Speech Detection and the Problem of Offensive Language\nTopically Driven Neural Language Model\nImproving Multilingual Named Entity Recognition with Wikipedia Entity  Type Mapping\nA Verified Compiler 
for Probability Density Functions\nRule-Based Spanish Morphological Analyzer Built From Spell Checking  Lexicon\nA Tale of Two DRAGGNs: A Hybrid Approach for Interpreting  Action-Oriented and Goal-Oriented Instructions\nRecurrent Neural Network-Based Sentence Encoder with Gated Attention for  Natural Language Inference\nSimplicity: A New Language for Blockchains\nLarge-scale Cloze Test Dataset Designed by Teachers\nA superpolynomial lower bound for the size of non-deterministic  complement of an unambiguous automaton\nEfficient Representation for Natural Language Processing via Kernelized  Hashcodes\nAn Encoder-Decoder Framework Translating Natural Language to Database  Queries\nReview of Design of Speech Recognition and Text Analytics based Digital  Banking Customer Interface and Future Directions of Technology Adoption\nAnalyzing Language Learned by an Active Question Answering Agent\nGenerating Python Code From Object-Z Specifications\nQ#: Enabling scalable quantum computing and development with a  high-level domain-specific language\nA Factoid Question Answering System for Vietnamese\nGenerating Bilingual Pragmatic Color References\nColorless green recurrent networks dream hierarchically\nDiscontinuous Hamiltonian Monte Carlo for Probabilistic Programs\nLR(1) Parser Generation System: LR(1) Error Recovery, Oracles, and  Generic Tokens\nOn Learning More Appropriate Selectional Restrictions\nNPtool, a detector of English noun phrases\nHandling Sparse Data by Successive Abstraction\nSemi-Automatic Acquisition of Domain-Specific Translation Lexicons\nNumeration systems on a regular language\nA Matter of Opinion: Sentiment Analysis and Business Intelligence  (position paper)\nA type-based termination criterion for dependently-typed higher-order  rewrite systems\nEinsteins Arbeiten in Bezug auf die moderne Kosmologie\nA Feynman Diagram Analyser DIANA\nRegular geodesic languages and the falsification by fellow traveler  property\nMathematics as an Exact and Precise 
Language of Nature\nSome Insight into Many Constituent Dynamics\nIterators, Recursors and Interaction Nets\nCounting Finite Languages by Total Word Length\nDisplacement Calculus\nOn the Descriptional Complexity of Limited Propagating Lindenmayer  Systems\nFree inductive K-semialgebras\nFrom indexed grammars to generating functions\nKolmogorov complexity as a language\nOn uncountable hypersimple unidimensional theories\nPartial actions and automata\nSome Properties of Brzozowski Derivatives of Regular Expressions\nConcept-oriented programming: from classes to concepts and from  inheritance to inclusion\nTheory of Programs\nTwo Results on Discontinuous Input Processing\nOn nonpermutational transformation semigroups with an application to  syntactic complexity\nA Second-Order Formulation of Non-Termination\nIncorporating User Interaction into Imperative Languages\nOperators for Space and Time in BeSpaceD\nX575: writing rengas with web services\nA Linear Acceleration Theorem for 2D Cellular Automata on all Complete  Neighborhoods\nThe Algorithmic Inflection of Russian and Generation of Grammatically  Correct Text\nA Topological proof that $O_2$ is $2$-MCFL\nClickbait Identification using Neural Networks\nLearning and analyzing vector encoding of symbolic representations\nK-vec: A New Approach for Aligning Parallel Texts\nLexikoneintraege fuer deutsche Adverbien (Dictionary Entries for German  Adverbs)\nConcurrent Lexicalized Dependency Parsing: The ParseTalk Model\nParsing Using Linearly Ordered Phonological Rules\nFree-ordered CUG on Chemical Abstract Machine\nRobust stochastic parsing using the inside-outside algorithm\nLiteral Movement Grammars\nA Computational Treatment of HPSG Lexical Rules as Covariation in  Lexical Entries\nGLR-Parsing of Word Lattices Using a Beam Search Method\nBuilding Natural-Language Generation Systems\nParsing for Semidirectional Lambek Grammar is NP-Complete\nAn Efficient Compiler for Weighted Rewrite Rules\nLearning Micro-Planning Rules 
for Preventative Expressions\nMachine Transliteration\nLearning Parse and Translation Decisions From Examples With Rich Context\nAn Information Extraction Core System for Real World German Text  Processing\nA General, Sound and Efficient Natural Language Parsing Algorithm based  on Syntactic Constraints Propagation\nWebScript -- A Scripting Language for the Web\nConstruction of regular languages and recognizability of polynomials\nSemantic robust parsing for noun extraction from natural language  queries\nExploiting Diversity in Natural Language Processing: Combining Parsers\nDistributive Computability\nFast Recompilation of Object Oriented Modules\nLinguistic-Mathematical Statistics in Rebus, Lyrics, Juridical Texts,  Fancies and Paradoxes\nRepresentation Theory of Finite Semigroups, Semigroup Radicals and  Formal Language Theory\nExtending the Lambda Calculus to Express Randomized and Quantumized  Algorithms\nTheorem proving support in programming language semantics\nGenerating models for temporal representations\nCompiling ER Specifications into Declarative Programs\nDSL development based on target meta-models. 
Using AST transformations  for automating semantic analysis in a textual DSL framework\nOn Measuring Non-Recursive Trade-Offs\nMathematics, Recursion, and Universals in Human Languages\nTyping rule-based transformations over topological collections\nDeveloping a New Approach for Arabic Morphological Analysis and  Generation\nOn the capabilities of grammars, automata, and transducers controlled by  monoids\nGhost free Massive Gravity in the Stückelberg language\nCoInDiVinE: Parallel Distributed Model Checker for Component-Based  Systems\nWreath Products of Forest Algebras, with Applications to Tree Logics\nInput Scheme for Hindi Using Phonetic Mapping\nOn the transition reduction problem for finite automata\nFunctional Package Management with Guix\nLanguage Modeling with Power Low Rank Ensembles\nNatural Language Feature Selection via Cooccurrence\n$L$-Primitive Words in Submonoids\nDistributed Graph Automata\nEF+EX Forest Algebras\nThompson's group F is 1-counter graph automatic\n$\\mathbb{N}$-algebraicity of zeta functions of sofic-Dyck shifts\nComparative Analysis of Classic Garbage-Collection Algorithms for a  Lisp-like Language\nReasoning About Pragmatics with Neural Listeners and Speakers\nImproving LSTM-based Video Description with Linguistic Knowledge Mined  from Text\nA Modular Formalization of Reversibility for Concurrent Models and  Languages\nA program logic for higher-order procedural variables and non-local  jumps\nExtensible Validation Framework for DSLs using MontiCore on the Example  of Coding Guidelines\nMeta-Modeling Semantics of UML\nA Simple and Efficient Method To Generate Word Sense Representations\nGraph Grammars, Insertion Lie Algebras, and Quantum Field Theory\nA Next-Generation Data Language Proposal\nRecognize Foreign Low-Frequency Words with Similar Pairs\nFeature-based Decipherment for Large Vocabulary Machine Translation\nConfluent Orthogonal Drawings of Syntax Diagrams\nRecurrent Neural Network Grammars\nUnsupervised Ranking Model for 
Entity Coreference Resolution\nSchützenberger Products in a Category\nNoisy Parallel Approximate Decoding for Conditional Recurrent Language  Model\nAutomatic Extraction of Causal Relations from Natural Language Texts: A  Comprehensive Survey\nOn the Herbrand content of LK\nPartial Derivatives for Context-Free Languages: From $μ$-Regular  Expressions to Pushdown Automata\nSyntactic Enhancement to VSIMM for Roadmap Based Anomalous Trajectory  Detection: A Natural Language Processing Approach\nSparse Coding of Neural Word Embeddings for Multilingual Sequence  Labeling\nConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with  Multilingual Relational Knowledge\nSemi-supervised Multitask Learning for Sequence Labeling\nParameterized Complexity of CSP for Infinite Constraint Languages\nQuantumOptics.jl: A Julia framework for simulating open quantum systems\nBayesian Sparsification of Recurrent Neural Networks\nDeep Transfer in Reinforcement Learning by Language Grounding\nConstructive completeness and non-discrete languages\nWHY: Natural Explanations from a Robot Navigator\nLocally Nameless Permutation Types\nInducing Regular Grammars Using Recurrent Neural Networks\nSystème de traduction automatique statistique Anglais-Arabe\nA Short Survey on Sense-Annotated Corpora for Diverse Languages and  Resources\nA generic characterization of Pol(C)\nAlmost Sure Productivity\nOpen Programming Language Interpreters\nDesign and Implementation of a Tactical Generator for Turkish, a Free  Constituent Order Language\nComplex Systems: a Physicist's Viewpoint\nInductive-data-type Systems\nRecasting results in equivariant geometry: affine cosets, observable  subgroups and existence of good quotients\nUsing Built-In Domain-Specific Modeling Support to Guide Model-Based  Test Generation\nMonolingual Probabilistic Programming Using Generalized Coroutines\nk-Colorability is Graph Automaton Recognizable\nLinguistic Descriptions for Automatic Generation of Textual Short-Term  
Weather Forecasts on Real Prediction Data\nGenerating Sentences from a Continuous Space\nAvatar-independent scripting for real-time gesture animation\nA Software Package for Chemically Inspired Graph Transformation\nNeural Text Generation from Structured Data with Application to the  Biography Domain\nLanguage as a Latent Variable: Discrete Generative Models for Sentence  Compression\nEvaluating Creative Language Generation: The Case of Rap Lyric  Ghostwriting\nWeighted Regular Tree Grammars with Storage\nTowards a Java Subtyping Operad\nAdaptive Convolutional Filter Generation for Natural Language  Understanding\nA Memory-Based Approach to Learning Shallow Natural Language Patterns\nForgetting Exceptions is Harmful in Language Learning\nOn the accuracy of language trees\nProceedings Eight Workshop on Structural Operational Semantics 2011\nA Rhetorical Analysis Approach to Natural Language Processing\nA Vision for Online Verification-Validation\nModular, Fully-abstract Compilation by Approximate Back-translation\nLanguage Oriented Modularity: From Theory to Practice\nAutomated code generation for discontinuous Galerkin methods\nSynthesizing Novel Pairs of Image and Text\nPGPG: An Automatic Generator of Pipeline Design for Programmable GRAPE  Systems\nZipf's law is a consequence of coherent language production\nMa(r)king concessions in English and German\nGenerating Multilingual Personalized Descriptions of Museum Exhibits -  The M-PIRO Project\nThe Generation of Textual Entailment with NLML in an Intelligent  Dialogue system for Language Learning CSIEC\nThe C++0x \"Concepts\" Effort\nGibbs Sampling in Open-Universe Stochastic Languages\nCapture-Avoiding and Hygienic Program Transformations (incl. 
Proofs)\nVisualization of Constraint Handling Rules\nTraining a Multilingual Sportscaster: Using Perceptual Context to Learn  Language\nModeling meaning: computational interpreting and understanding of  natural language fragments\nRunning Probabilistic Programs Backwards\nRobot Language Learning, Generation, and Comprehension\nAnswer-Type Modification without Tears: Prompt-Passing Style Translation  for Typed Delimited-Control Operators\nNo positive cone in a free product is regular\nReference-Aware Language Models\nSEA: String Executability Analysis by Abstract Interpretation\nShapeWorld - A new test methodology for multimodal language  understanding\nControl Improvisation\nNeural-based Natural Language Generation in Dialogue using RNN  Encoder-Decoder with Semantic Aggregation\nSAM: Semantic Attribute Modulation for Language Modeling and Style  Variation\nLithium NLP: A System for Rich Information Extraction from Noisy User  Generated Text on Social Media\nA monadic solution to the Cartwright-Felleisen-Wadler conjecture\nReferenceless Quality Estimation for Natural Language Generation\nA Novel Way of Identifying Cyber Predators\nA General Path-Based Representation for Predicting Program Properties\nConnectivity in Bag Generation\nEmphatic generation: employing the theory of semantic emphasis for text  generation\nA generation algorithm for f-structure representations\nIncremental Parser Generation for Tree Adjoining Grammars\nGenerative Adversarial Nets for Multiple Text Corpora\nTowards the quantification of the semantic information encoded in  written language\nA visual programming language for drawing and executing flowcharts\nInterrupt Timed Automata: verification and expressiveness\nThe power of linear programming for general-valued CSPs\nJust-in-Time Static Type Checking for Dynamic Languages\nWhere in the World are You? 
Geolocation and Language Identification in  Twitter\nCharacterizing classes of regular languages using prefix codes of  bounded synchronization delay\nSemantic Similarity from Natural Language and Ontology Analysis\nEmergence of Language with Multi-agent Games: Learning to Communicate  with Sequences of Symbols\nRepresenting Hybrid Automata by Action Language Modulo Theories\nLinguistically Motivated Vocabulary Reduction for Neural Machine  Translation from Turkish to English\nNetwork Controllability in the IFG Relates to Controlled Language  Variability and Susceptibility to TMS\nProLanGO: Protein Function Prediction Using Neural~Machine Translation  Based on a Recurrent Neural Network\nAugmenting Librispeech with French Translations: A Multimodal Corpus for  Direct Speech Translation Evaluation\nStone duality, topological algebra, and recognition\nGenerating Configurable Hardware from Parallel Patterns\nAn error correcting parser for context free grammars that takes less  than cubic time\nMultiresolution Recurrent Neural Networks: An Application to Dialogue  Response Generation\nMultiple Context-Free Tree Grammars: Lexicalization and Characterization\nComparing Fifty Natural Languages and Twelve Genetic Languages Using  Word Embedding Language Divergence (WELD) as a Quantitative Measure of  Language Distance\nUser-Defined Operators Including Name Binding for New Language  Constructs\nTowards History-based Grammars: Using Richer Models for Probabilistic  Parsing\nNL Understanding with a Grammar of Constructions\nLearning Syntactic Rules and Tags with Genetic Algorithms for  Information Retrieval and Filtering: An Empirical Basis for Grammatical Rules\nUsing Single Layer Networks for Discrete, Sequential Data: An Example  from Natural Language Processing\nLanguage identification of controlled systems: Modelling, control and  anomaly detection\nFactorization of Language Models through Backing-Off Lattices\nModel Checking Linear Logic Specifications\nConstraint-based 
automatic verification of abstract models of  multithreaded programs\nPageRank without hyperlinks: Structural re-ranking using links induced  by language models\nThe Hole Argument for Covariant Theories\nA triangle-based logic for affine-invariant querying of spatial and  spatio-temporal data\nA Generalized Streaming Model for Concurrent Computing\nSolving the TTC 2011 Reengineering Case with MOLA and Higher-Order  Transformations\nDescriptive complexity for pictures languages (extended abstract)\nElaborating Intersection and Union Types\nFST Based Morphological Analyzer for Hindi Language\nStochastic Context-Free Grammars, Regular Languages, and Newton's Method\nSYNTAGMA. A Linguistic Approach to Parsing\n$\\mathcal C$-graph automatic groups\nRProtoBuf: Efficient Cross-Language Data Serialization in R\nClassical and quantum realtime alternating automata\nObject Oriented Analysis using Natural Language Processing concepts: A  Review\nTailoring the MontiArcAutomaton Component & Connector ADL for Generative  Development\nCrowd-sourcing NLG Data: Pictures Elicit Better Data\nBoundary-based MWE segmentation with text partitioning\nMontiWeb - Modular Development of Web Information Systems\nMontiCore: A Framework for the Development of Textual Domain Specific  Languages\nVoice based self help System: User Experience Vs Accuracy\nAsk Me Anything: Dynamic Memory Networks for Natural Language Processing\nReduction of Nondeterministic Tree Automata\nToward Mention Detection Robustness with Recurrent Neural Networks\nImproving Automated Patent Claim Parsing: Dataset, System, and  Experiments\nDetermining the Characteristic Vocabulary for a Specialized Dictionary  using Word2vec and a Directed Crawler\nAdaptable Symbol Table Management by Meta Modeling and Generation of  Symbol Table Infrastructures\nCertification of Prefixed Tableau Proofs for Modal Logic\nThe Best of Both Worlds: Linear Functional Programming without  Compromise\nWhy Can't You Behave? 
Non-termination Analysis of Direct Recursive Rules  with Constraints\nLiveness Verification and Synthesis: New Algorithms for Recursive  Programs\nSolutions of twisted word equations, EDT0L languages, and context-free  groups\nDiscriminative Bimodal Networks for Visual Localization and Detection  with Natural Language Queries\nDetecting English Writing Styles For Non Native Speakers\nBuilding Morphological Chains for Agglutinative Languages\nWhere to Play: Retrieval of Video Segments using Natural-Language  Queries\nThe Power of Constraint Grammars Revisited\nCodeSum: Translate Program Language to Natural Language\nMorphology Generation for Statistical Machine Translation\nCombining Representation Learning with Logic for Language Processing\nBidirectional Attention for SQL Generation\nExplicit Reasoning over End-to-End Neural Architectures for Visual  Question Answering\nCounterexamples for Robotic Planning Explained in Structured Language\nSynthesizing Bijective Lenses\nEmergent Parsing and Generation with Generalized Chart\nA Chart Generator for Shake and Bake Machine Translation\nBuilding Knowledge Bases for the Generation of Software Documentation\nOn Torsion-Free Semigroups Generated by Invertible Reversible Mealy  Automata\nTowards Music Captioning: Generating Music Playlist Descriptions\nA Geometric Method to Obtain the Generation Probability of a Sentence\nDeep Recurrent Generative Decoder for Abstractive Text Summarization\nCombining Generative and Discriminative Approaches to Unsupervised  Dependency Parsing via Dual Decomposition\nGenerating Sentences by Editing Prototypes\nSource-side Prediction for Neural Headline Generation\nZero-Shot Question Generation from Knowledge Graphs for Unseen  Predicates and Entity Types\nThe PITA System: Tabling and Answer Subsumption for Reasoning under  Uncertainty\nWhere to put the Image in an Image Caption Generator\nGenerating One-Anaphoric Expressions: Where Does the Decision Lie?\nA Logic-based Approach to 
Generatively Defined Discriminative Modeling\nUnveiling the Dreams of Word Embeddings: Towards Language-Driven Image  Generation\nA Generalized Quantifier Concept in Computational Complexity Theory\nThe equality problem for infinite words generated by primitive morphisms\nAn Analysis of General Fuzzy Logic and Fuzzy Reasoning Method\nAutomatic Test Data Generation and Model Checking with CHR\nThe lamplighter group $\\mathbb{Z}_3\\wr\\mathbb{Z}$ generated by a  bireversible automaton\nAn Application of the Generalized Rectangular Fuzzy Model to Critical  Thinking Assessment\nOn elementary equivalence of rings with a finitely generated additive  group\nEvent Representations for Automated Story Generation with Deep Neural  Nets\nOrder-Planning Neural Text Generation From Structured Data\nAutomatic Generation of Benchmarks for Entity Recognition and Linking\nMojiTalk: Generating Emotional Responses at Scale\nImage Captioning at Will: A Versatile Scheme for Effectively Injecting  Sentiments into Image Descriptions\nVisual definition of procedures for automatic virtual scene generation\nLearning Fault-tolerant Speech Parsing with SCREEN\nObservability and Decentralized Control of Fuzzy Discrete Event Systems\nModules over relative monads for syntax and semantics\nAn MML-based tool for evaluating the complexity of (stochastic) logic  theories\nChecking generalized debates with small space and randomness\nDeverbal semantics and the Montagovian generative lexicon\nSynthesis from Formal Partial Abstractions\nBuilding a Fine-Grained Entity Typing System Overnight for a New X (X =  Language, Domain, Genre)\nEnd-to-end Concept Word Detection for Video Captioning, Retrieval, and  Question Answering\nThe cognitive roots of regularization in language\nAn End-to-End Approach to Natural Language Object Retrieval via  Context-Aware Deep Reinforcement Learning\nProof-Relevant Logical Relations for Name Generation\nEmergence of Algorithmic Languages in Genetic Systems\nAutomated tone 
transcription\nParsing English with a Link Grammar\nApportioning Development Effort in a Probabilistic LR Parsing System  through Evaluation\nEfficient Algorithms for Parsing the DOP Model\nMetrics for Evaluating Dialogue Strategies in a Spoken Language System\nA Portable Algorithm for Mapping Bitext Correspondence\nSome Properties of Preposition and Subordinate Conjunction Attachments\nConditions on Consistency of Probabilistic Tree Adjoining Grammars\nA Splitting Set Theorem for Epistemic Specifications\nATLAS: A flexible and extensible architecture for linguistic annotation\nFault Detection using Immune-Based Systems and Formal Language  Algorithms\nDisjunctive Logic Programs with Inheritance\nEllogon: A New Text Engineering Platform\nMeasuring the Functional Load of Phonological Contrasts\nSupervisory Control of Fuzzy Discrete Event Systems\nChecking modes of HAL programs\nOn the freeze quantifier in Constraint LTL: decidability and complexity\nTowards a quantum evolutionary scheme: violating Bell's inequalities in  language\nDetecting palindromes, patterns, and borders in regular languages\nA Formal Foundation for XrML\nDiscovering Global Patterns in Linguistic Networks through Spectral  Analysis: A Case Study of the Consonant Inventories\nA Type System Theory for Higher-Order Intensional Logic Support for  Variable Bindings in Hybrid Intensional-Imperative Programs in GIPSY\nLazy mixin modules and disciplined effects\nThe Complexity of Translation Membership for Macro Tree Transducers\nReview and Analysis of The Issues of Unified Modeling Language for  Visualizing, Specifying, Constructing and Documenting the Artifacts of a  Software-Intensive System\nInterlanguages and synchronic models of computation\nExtended Computation Tree Logic\nOn the Iterated Hairpin Completion\nIsomorphism of regular trees and words\nQuantum-Like Uncertain Conditionals for Text Analysis\nTranslation of Pronominal Anaphora between English and Spanish:  Discrepancies and 
Evaluation\nFormal Model Engineering for Embedded Systems Using Real-Time Maude\nAn Abstract Semantics for Inference of Types and Effects in a Multi-Tier  Web Language\nTight bounds for the space complexity of nonregular language recognition  by real-time machines\nAbstract Diagnosis for Timed Concurrent Constraint programs\nUsing the DiaSpec design language and compiler to develop robotics  systems\nImplementation of the Domain-Specific Language EasyTime using a LISA  Compiler Generator\nThe Power of Centralized PC Systems of Pushdown Automata\nMining the Web for the Voice of the Herd to Track Stock Market Bubbles\nInduction by Coinduction and Control Operators in Call-by-Name\nSemantic Measures for the Comparison of Units of Language, Concepts or  Instances from Text and Knowledge Base Analysis\nCross-lingual Pseudo-Projected Expectation Regularization for Weakly  Supervised Learning\nRelative Expressive Power of Navigational Querying on Graphs\nThe Obvious Solution to Semantic Mapping -- Ask an Expert\nLinear usage of state\nQuantum Non-Objectivity from Performativity of Quantum Phenomena\nA Deep Architecture for Semantic Parsing\nOnline Stroke and Akshara Recognition GUI in Assamese Language Using  Hidden Markov Model\nContrastive Unsupervised Word Alignment with Non-Local Features\nCorrecting Errors in Digital Lexicographic Resources Using a Dictionary  Manipulation Language\nA type-theoretical approach to Universal Grammar\nAdding Partial Functions to Constraint Logic Programming with Sets\nSemi-supervised Bootstrapping approach for Named Entity Recognition\nExtracting Temporal and Causal Relations between Events\nQuipper: A Scalable Quantum Programming Language\nAuto Spell Suggestion for High Quality Speech Synthesis in Hindi\nBasis Identification for Automatic Creation of Pronunciation Lexicon for  Proper Names\nConstrained Expressions and their Derivatives\nVerification of Information Flow Properties under Rational Observation\nCD2Alloy: Class Diagrams 
Analysis Using Alloy Revisited\nSystem Model-Based Definition of Modeling Language Semantics\nTranslating Videos to Natural Language Using Deep Recurrent Neural  Networks\nThe Timestamp of Timed Automata\nAn Upper Bound on the Complexity of Recognizable Tree Languages\nAn Intermediate Language and Estimator for Automated Design Space  Exploration on FPGAs\nLanguage Emptiness of Continuous-Time Parametric Timed Automata\nA complex network approach to stylometry\nAugmenting Agent Platforms to Facilitate Conversation Reasoning\nStructural Complexity of Multi-Valued Partial Functions Computed by  Nondeterministic Pushdown Automata\nProgram Synthesis using Natural Language\nA Generalised Quantifier Theory of Natural Language in Categorical  Compositional Distributional Semantics with Bialgebras\nAuthorship Attribution Using a Neural Network Language Model\nA Survey on Domain-Specific Languages for Machine Learning in Big Data\nSemantics of Higher-Order Quantum Computation via Geometry of  Interaction\nNESTML: a modeling language for spiking neurons\nBi-Text Alignment of Movie Subtitles for Spoken English-Arabic  Statistical Machine Translation\nCharacter-Level Language Modeling with Hierarchical Recurrent Neural  Networks\nThe ACPATH Metric: Precise Estimation of the Number of Acyclic Paths in  C-like Languages\nPiecewise Latent Variables for Neural Variational Text Processing\nFrom signatures to monads in UniMath\nA Simple Approach to Multilingual Polarity Classification in Twitter\nA Review of Methodologies for Natural-Language-Facilitated Human-Robot  Cooperation\nType- and Content-Driven Synthesis of SQL Queries from Natural Language\nWeighted Operator Precedence Languages\nNative Language Identification using Stacked Generalization\nFrom Modal to Multimodal Ambiguities: a Classification Approach\nFrom Imitation to Prediction, Data Compression vs Recurrent Neural  Networks for Natural Language Processing\nA Teacher-Student Framework for Zero-Resource Neural Machine 
Translation\nCandidate sentence selection for language learning exercises: from a  comprehensive framework to an empirical evaluation\nLinear Parsing Expression Grammars\nOutfix-guided insertion\nAdversarial Examples for Evaluating Reading Comprehension Systems\nGuiding Reinforcement Learning Exploration Using Natural Language\nFooling Vision and Language Models Despite Localization and Attention  Mechanism\nNatural Language Aggregate Query over RDF Data\nCorrectness of Speculative Optimizations with Dynamic Deoptimization\nInteractive Robot Learning of Gestures, Language and Affordances\nCode Completion with Neural Attention and Pointer Networks\nInteractive Reinforcement Learning for Object Grounding via Self-Talking\nArrows for Parallel Computation\nStochastic Learning of Nonstationary Kernels for Natural Language  Modeling\nComparing the power of advice strings: a notion of complexity for  infinite words\nVietnamese Open Information Extraction\nLearning Families of Formal Languages from Positive and Negative  Information\nExamining the Tip of the Iceberg: A Data Set for Idiom Translation\nAnnotation Artifacts in Natural Language Inference Data\nCo-occurrence of the Benford-like and Zipf Laws Arising from the Texts  Representing Human and Artificial Languages\nLook Before You Leap: Bridging Model-Free and Model-Based Reinforcement  Learning for Planned-Ahead Vision-and-Language Navigation\nCompositional Obverter Communication Learning From Raw Visual Input\nA Categorical Approach to Syntactic Monoids\nAn Ontology-Based Dialogue Management System for Banking and Finance  Dialogue Systems\nSome Novel Applications of Explanation-Based Learning to Parsing  Lexicalized Tree-Adjoining Grammars\nAn Efficient Algorithm for Surface Generation\nGeneric rules and non-constituent coordination\nNoun Phrase Reference in Japanese-to-English Machine Translation\nBest-First Surface Realization\nDetermining Internal and External Indices for Chart Generation\nDynamics of text 
generation with realistic Zipf distribution\nGenerating a 3D Simulation of a Car Accident from a Written Description  in Natural Language: the CarSim System\nA Generic Analysis Environment for Curry Programs\nGeometric Algebra Techniques for General Relativity\nEvolving XSLT stylesheets\nLanguage recognition by generalized quantum finite automata with  unbounded error (abstract & poster)\nStar-free geodesic languages for groups\nOn Two Infinite Families of Pairing Bijections\nOntoVerbal: a Generic Tool and Practical Application to SNOMED CT\nGenerate Image Descriptions based on Deep RNN and Memory Cells for  Images Features\nCreating a Real-Time, Reproducible Event Dataset\nExploration of Proximity Heuristics in Length Normalization\nTransition-Based Generation from Abstract Meaning Representations\nDisjunctive Datalog with Existential Quantifiers: Semantics,  Decidability, and Complexity Issues\nGeneralized Grounding Graphs: A Probabilistic Framework for  Understanding Grounded Commands\nBuilding a Generation Knowledge Source using Internet-Accessible  Newswire\nMixing representation levels: The hybrid approach to automatic text  generation\nA Novel Approach to Dropped Pronoun Translation\nA Bayesian Model for Generative Transition-based Dependency Parsing\nLatent Predictor Networks for Code Generation\nRelevance of Unsupervised Metrics in Task-Oriented Dialogue for  Evaluating Natural Language Generation\nStructure and enumeration theorems for hereditary properties in finite  relational languages\nContinuous multilinguality with language vectors\nHygienic Source-Code Generation Using Functors\nA Principled Framework for Constructing Natural Language Interfaces To  Temporal Databases\nExploiting Term Hiding to Reduce Run-time Checking Overhead\nTagging and Morphological Disambiguation of Turkish Text\nSolutions of the quantum dynamical Yang-Baxter equation and dynamical  quantum groups\nMorphological annotation of Korean with Directly Maintainable Resources\nDBMSs 
Should Talk Back Too\nWeb Based Cross Language Plagiarism Detection\nAcquiring Word-Meaning Mappings for Natural Language Interfaces\nA CONVERT compiler of REC for PDP-8\nStatistical Laws Governing Fluctuations in Word Use from Word Birth to  Word Death\nTowards a Tool-based Development Methodology for Pervasive Computing  Applications\nGender identity and lexical variation in social media\nInteracting via the Heap in the Presence of Recursion\nAn efficient way to assemble finite element matrices in vector languages\nIdentifying Bengali Multiword Expressions using Semantic Clustering\nExploring the power of GPU's for training Polyglot language models\nBridge Correlational Neural Networks for Multilingual Multimodal  Representation Learning\nJoining Transition Systems of Records: Some Congruency and  Language-Theoretic Results\nThe Algebra of Recursive Graph Transformation Language UnCAL: Complete  Axiomatisation and Iteration Categorical Semantics\nEchoes of power: Language effects and power differences in social  interaction\nLanguage of physics, language of math: Disciplinary culture and dynamic  epistemology\nAn End-to-End Neural Network for Polyphonic Piano Music Transcription\nOn the State Complexity of the Shuffle of Regular Languages\nA multi-paradigm language for reactive synthesis\nMatrix Product Operators, Matrix Product States, and ab initio Density  Matrix Renormalization Group algorithms\nDependency resolution and semantic mining using Tree Adjoining Grammars  for Tamil Language\nThe implementation of a Deep Recurrent Neural Network Language Model on  a Xilinx FPGA\nSequence to Sequence Networks for Roman-Urdu to Urdu Transliteration\nTopic Compositional Neural Language Model\nLearning beyond datasets: Knowledge Graph Augmented Neural Networks for  Natural language Processing\nCompiling Diderot: From Tensor Calculus to C\nAutomatic Generation of Technical Documentation\nHas a Consensus NL Generation Architecture Appeared, and is it  
Psycholinguistically Plausible?\nTextual Economy through Close Coupling of Syntax and Semantics\nBootstrapping Lexical Choice via Multiple-Sequence Alignment\nAspects of enumeration and generation with a string automata  representation\nNon-redundant random generation algorithms for weighted context-free  languages\nFormal Model-Driven Engineering: Generating Data and Behavioural  Components\nTowards a Coalgebraic Chomsky Hierarchy\nText to 3D Scene Generation with Rich Lexical Grounding\nCharacter-based Neural Machine Translation\nSequence Level Training with Recurrent Neural Networks\nModeling Context in Referring Expressions\nSimple Image Description Generator via a Linear Phrase-Based Approach\nA Hierarchical Neural Autoencoder for Paragraphs and Documents\nIndustrial Experiences with a Formal DSL Semantics to Check the  Correctness of DSL Artifacts\nNatural Language Generation as Planning under Uncertainty Using  Reinforcement Learning\nGenerating machine-executable plans from end-user's natural-language  instructions\nA Hierarchical Approach for Generating Descriptive Image Paragraphs\nJoint Copying and Restricted Generation for Paraphrase\nContext-aware Captions from Context-agnostic Supervision\nLong-Term Memory Networks for Question Answering\nA parallel corpus of Python functions and documentation strings for  automated code documentation and code generation\nOBJ2TEXT: Generating Visually Descriptive Language from Object Layouts\nGeneralized Results on Monoids as Memory\nA Chinese Dataset with Negative Full Forms for General Abbreviation  Prediction\nInteractive Image Manipulation with Natural Language Instruction  Commands\nNeural Baby Talk\nThe Dissecting Power of Regular Languages\nCapturing CFLs with Tree Adjoining Grammars\nGenerating Precondition Expressions in Instructional Text\nSpecifying Intonation from Context for Speech Synthesis\nEfficient Normal-Form Parsing for Combinatory Categorial Grammar\nMental State Adjectives: the Perspective of 
Generative Lexicon\nA Theory of Parallelism and the Case of VP Ellipsis\nApplying Natural Language Generation to Indicative Summarization\nOn probabilistic analog automata\nAdapting a general parser to a sublanguage\nThe Problem of Classical Limit in Quantum Cosmology: The effective  action language\nIsomorphism property in nonstandard extensions of ZFC universe\nOperads in Higher-Dimensional Category Theory\nExponential sums over definable subsets of finite fields\nThe cognitive homunculus: do tunable languages-of-thought convey  adaptive advantage?\nQuantum Automata and Quantum Grammars\nThe holographic principle and the language of genes\nTransport in molecular states language: Generalized quantum master  equation approach\nAn Order on Sets of Tilings Corresponding to an Order on Languages\nBagPack: A general framework to represent semantic relations\nComputational Power of P Systems with Small Size Insertion and Deletion  Rules\nDire n'est pas concevoir\nNondeterministic fuzzy automata\nPalindromic richness for languages invariant under more symmetries\nFurther developments in generating type-safe messaging\nPreparing Korean Data for the Shared Task on Parsing Morphologically  Rich Languages\nGeneral Purpose Textual Sentiment Analysis and Emotion Detection Tools\nGeneration, Implementation and Appraisal of an N-gram based Stemming  Algorithm\nLooking at Vector Space and Language Models for IR using Density  Matrices\nAutomated Code Generation for Lattice Quantum Chromodynamics and beyond\nMaximally Permissive Coordination Supervisory Control -- Towards  Necessary and Sufficient Conditions\nRobust Named Entity Recognition in Idiosyncratic Domains\nThe Generalized Smallest Grammar Problem\nA Foundational View on Integration Problems\nNeural Sentence Ordering\nFeasibility of Post-Editing Speech Transcriptions with a Mismatched  Crowd\nPuzzles in modern biology. II. 
Language, cancer and the recursive  processes of evolutionary innovation\nGenerating captions without looking beyond objects\nA Hybrid Convolutional Variational Autoencoder for Text Generation\nWord Affect Intensities\nUnlabeled Data for Morphological Generation With Character-Based  Sequence-to-Sequence Models\nGeneric Approach to Certified Static Checking of Module-like Constructs\nProgrammable Agents\nOn the Definition of Word Hyperbolic Groups\nSome equivalences for Martin's Axiom in asymmetric topology\nA generalized small model property for languages which force the  infinity\nNaive Philosophical Foundations\nThe Bivariate Normal Copula\nMulti-dimensional sets recognizable in all abstract numeration systems\nFixed points avoiding Abelian $k$-powers\ndup -- Explicit un-sharing in Haskell\nLambda Dependency-Based Compositional Semantics\nCauchy filters from Pelant's games\nRelatively Complete Counterexamples for Higher-Order Programs\nHenselian valued fields and inp-minimality\nA Gentle Introduction to a Beautiful Theorem of Molien\nOn the letter frequencies and entropy of written Marathi\nReally? Well. 
Apparently Bootstrapping Improves the Performance of  Sarcasm and Nastiness Classifiers for Online Dialogue\nGenerating Video Descriptions with Topic Guidance\nx-area\nA more reasonable proof of Cobham's theorem\nSolution sets for equations over free groups are EDT0L languages --  ICALP 2015 version\nNonparametric Bayesian Double Articulation Analyzer for Direct Language  Acquisition from Continuous Speech Signals\nOntology as a Source for Rule Generation\nGenerating Context-Appropriate Word Orders in Turkish\nComputational Interpretations of the Gricean Maxims in the Generation of  Referring Expressions\nGeneralizing Case Frames Using a Thesaurus and the MDL Principle\nExample-Based Optimization of Surface-Generation Tables\nKnowledge Acquisition for Content Selection\nTailored Patient Information: Some Issues and Questions\nMachine Learning of User Profiles: Representational Issues\nDecidable Reasoning in Terminological Knowledge Representation Systems\nA Machine-Learning Approach to Estimating the Referential Properties of  Japanese Noun Phrases\nGeneralized Strong Preservation by Abstract Interpretation\nJartege: a Tool for Random Generation of Unit Tests for Java Classes\nOn tractability and congruence distributivity\nParsimony Principles for Software Components and Metalanguages\nReasoning in Abella about Structural Operational Semantics  Specifications\nOptimal Control of Infinite Horizon Partially Observable Decision  Processes Modeled As Generators of Probabilistic Regular Languages\nPattern matching in compilers\nToward General Analysis of Recursive Probability Models\nProbabilistic State-Dependent Grammars for Plan Recognition\nA type theoretical framework for natural language semantics: the  Montagovian generative lexicon\nFrom bounded affine types to automatic timing analysis\nA Generic Analysis Server System for Functional Logic Programs\nContext-Free Groups and Bass-Serre Theory\nStructural Induction Principles for Functional Programmers\nSign 
Language Gibberish for syntactic parsing evaluation\nPersistent Topology of Syntax\nIndustrial Experiences with a Formal DSL Semantics to Check Correctness  of DSL Transformations\nHybrid SRL with Optimization Modulo Theories\nQuery Containment for Highly Expressive Datalog Fragments\nPhrase-based Image Captioning\nSequence-to-Sequence Neural Net Models for Grapheme-to-Phoneme  Conversion\nUtilização de Grafos e Matriz de Similaridade na Sumarização  Automática de Documentos Baseada em Extração de Frases\nCompilation as a Typed EDSL-to-EDSL Transformation\nSequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax  Trees and Strings\nContext-aware Natural Language Generation with Recurrent Neural Networks\nQ-WordNet PPV: Simple, Robust and (almost) Unsupervised Generation of  Polarity Lexicons for Multiple Languages\nImproved Variational Autoencoders for Text Modeling using Dilated  Convolutions\nA Module-System Discipline for Model-Driven Software Development\nNeural Attribute Machines for Program Generation\nLearning Paraphrastic Sentence Embeddings from Back-Translated Bitext\nAutomatic Summarization of Online Debates\nContext Generation from Formal Specifications for C Analysis Tools\nUnified Pragmatic Models for Generating and Following Instructions\nA Flexible Approach to Automated RNN Architecture Generation\nDescribing Semantic Representations of Brain Activity Evoked by Visual  Stimuli\nLanguage properties and Grammar of Parallel and Series Parallel  Languages\nAn Efficient, Probabilistically Sound Algorithm for Segmentation and  Word Discovery\nThe Control System Modeling Language\nGeneric modes of consensus formation in stochastic language dynamics\nAcquiring Correct Knowledge for Natural Language Generation\nStrictly convergent analytic structures\nProgramming with models: writing statistical algorithms for general  model structures with NIMBLE\nOn the role of simplicity in science\nDeep API Learning\nSurvey of the State of the Art in Natural 
Language Generation: Core  tasks, applications and evaluation\nAttention-based Natural Language Person Retrieval\nSeq2SQL: Generating Structured Queries from Natural Language using  Reinforcement Learning\nAcquiring Common Sense Spatial Knowledge through Implicit Spatial  Templates\nError-tolerant Finite State Recognition with Applications to  Morphological Analysis and Spelling Correction\nInterfacing Constraint-Based Grammars and Generation Algorithms\nConstraint-Based Categorial Grammar\nClassifying Cue Phrases in Text and Speech Using Machine Learning\nGrammar Specialization through Entropy Thresholds\nMultiset-Valued Linear Index Grammars: Imposing Dominance Constraints on  Derivations\nDPOCL: A Principled Approach to Discourse Planning\nDecision Lists for Lexical Ambiguity Resolution: Application to Accent  Restoration in Spanish and French\nCombining Knowledge Sources to Reorder N-Best Speech Hypothesis Lists\nThe Acquisition of a Lexicon from Paired Phoneme Sequences and Semantic  Representations\nRecovering From Parser Failures: A Hybrid Statistical/Symbolic Approach\nMinimal Change and Bounded Incremental Parsing\nDetermining Determiner Sequencing: A Syntactic Analysis for English\nFocus on ``only\" and ``Not\"\nMemoization of Coroutined Constraints\nCRYSTAL: Inducing a Conceptual Dictionary\nUser-Defined Nonmonotonicity in Unification-Based Formalisms\nCo-Indexing Labelled DRSs to Represent and Reason with Ambiguities\nBridging as Coercive Accommodation\nCountability and Number in Japanese-to-English Machine Translation\nFast Parsing using Pruning and Grammar Specialization\nEfficient Tabular LR Parsing\nA Simple Transformation for Offline-Parsable Grammars and its  Termination Properties\nCoordination as a Direct Process\nComputing Optimal Descriptions for Optimality Theory Grammars with  Context-Free Position Structures\nMaximizing Top-down Constraints for Unification-based Systems\nGramCheck: A Grammar and Style Checker\nParallel Replacement in Finite 
State Calculus\nCentering theory and the Italian pronominal system\nEvaluating Multilingual Gisting of Web Pages\nPARADISE: A Framework for Evaluating Spoken Dialogue Agents\nTracking Initiative in Collaborative Dialogue Interactions\nSense Tagging: Semantic Tagging with a Lexicon\nTagging Grammatical Functions\nGenerating Coherent Messages in Real-time Decision Support: Exploiting  Discourse Theory for Discourse Practice\nExpectations in Incremental Discourse Processing\nFast Context-Free Parsing Requires Fast Boolean Matrix Multiplication\nOff-line Parsability and the Well-foundedness of Subsumption\nVariation and Synthetic Speech\nCognitive scale-free networks as a model for intermittency in human  natural language\nContext-free multilanguages\nTransducers from Rewrite Rules with Backreferences\nDetecting Sub-Topic Correspondence through Bipartite Term Clustering\nStacking classifiers for anti-spam filtering of e-mail\nCHR as grammar formalism. A first report\nUnique Pattern Matching in Strings\nLearning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence  Alignment\nPart-of-Speech Tagging with Minimal Lexicalization\nDAB Content Annotation and Receiver Hardware Control with XML\nHigher-Order Concurrent Win32 Programming\nRRL: A Rich Representation Language for the Description of Agent  Behaviour in NECA\nA Tutorial on the Expectation-Maximization Algorithm Including  Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free  Grammars\nCanonical Abstract Syntax Trees\nEfficient Compression of Prolog Programs\nToward Functionality Oriented Programming\nA Study on Learnability for Rigid Lambek Grammars\nFeasible Depth\nThe Cotton, Simon-Mars and Cotton-York Tensors in Stationary Spacetimes\nFrom 2D conformal to 4D self-dual theories: quaternionic analyticity\nA note on Context Sensitive languages and Word Problems\nA finite basis theorem for residually finite, congruence  meet-semidistributive varieties\nComputational Theory of 
Biological Function I\nQuantum Finite One-Counter Automata\nStructuring quantum effects: superoperators as arrows\nForbidden lists (NP and CSP for combinatorialists)\nA Formal Model of Dictionary Structure and Content\nA proposal to a generalised splicing with a self assembly approach\nGrammatic -- a tool for grammar definition reuse and modularity\nInseparability and Strong Hypotheses for Disjoint NP Pairs\nLecture notes on the Ein-Popa extension result\nVariable binding, symmetric monoidal closed theories, and bigraphs\nNon-Markovian quantum trajectories: an exact result\nOn the Number of Membranes in Unary P Systems\nNon-Deterministic Kleene Coalgebras\nSmall NFAs from Regular Expressions: Some Experimental Results\nYacc is dead\nEnumerating Finitary Processes\nA Logic Programming Approach for Formal Verification of NetBill Security  and Transactions Protocol\nMaterials to the Russian-Bulgarian Comparative Dictionary \"EAD\"\nUniversal Algebra and Mathematical Logic\nConstruction du lexique LGLex à partir des tables du Lexique-Grammaire  des verbes du grec moderne\nCounting systems and the First Hilbert problem\nOn Algorithms and Extensions of Coordination Control of Discrete-Event  Systems\nAn Improved Proof-Theoretic Compilation of Logic Programs\nUniversal minimal flow in the language of near filters and its  applications\nProbabilistic Reasoning about Actions in Nonmonotonic Causal Theories\nUltimate periodicity of b-recognisable sets : a quasilinear procedure\nReal-Time Vector Automata\nArabizi Detection and Conversion to Arabic\nPart of Speech Tagging of Marathi Text Using Trigram Method\nDeadlock detection in linear recursive programs\nSynchronous Context-Free Grammars and Optimal Linear Parsing Strategies\nCross-lingual Annotation Projection for Semantic Roles\nGenerating abbreviations using Google Books library\nThe Foundational Cryptography Framework\nFour Ways from Universal to Particular: How Chomsky's  Language-Acquisition Faculty is Not 
Selectionist\nSlice Sampling for Probabilistic Programming\nMathematics and language\nShortest paths in one-counter systems\nOn the Systematic Design of Privacy Policies and Privacy Architectures\nExtending DLR with Labelled Tuples, Projections, Functional Dependencies  and Objectification (full version)\nDecidability of the Membership Problem for $2\\times 2$ integer matrices\nAn extensible formal semantics for UML activity diagrams\nTowards an Indexical Model of Situated Language Comprehension for  Cognitive Agents in Physical Worlds\nExploring Segment Representations for Neural Segmentation Models\nEmbedded SML using the MLton compiler\nFault Localization in Web Applications via Model Finding\nIVOA Recommendation: IVOA Astronomical Data Query Language Version 2.00\nTowards a Query Language for the Web of Data (A Vision Paper)\nAutomata theory in nominal sets\nModeling Algorithms in SystemC and ACL2\nRoles in Software Development using Domain Specific Modeling Languages\nProbabilistic Models for High-Order Projective Dependency Parsing\nAutomata and Graph Compression\nRobustly Leveraging Prior Knowledge in Text Classification\nA Framework for Comparing Groups of Documents\nImproving distant supervision using inference learning\nOn bordered theories for Khovanov homology\nSyntax and Semantics of Abstract Binding Trees\nAffine computation and affine automaton\nQuotationFinder - Searching for Quotations and Allusions in Greek and  Latin Texts and Establishing the Degree to Which a Quotation or Allusion  Matches Its Source\nA System for Probabilistic Linking of Thesauri and Classification  Systems\nLSTM-based Mixture-of-Experts for Knowledge-Aware Dialogues\nExperiments in Linear Template Combination using Genetic Algorithms\nSingle-Model Encoder-Decoder with Explicit Morphological Representation  for Reinflection\nPerSum: Novel Systems for Document Summarization in Persian\nsk_p: a neural program corrector for MOOCs\nRead, Tag, and Parse All at Once, or Fully-neural 
Dependency Parsing\nSolving the Wastewater Treatment Plant Problem with SMT\nAn Enumeration of the Supercharacter Theories of $C_p \\times C_2 \\times  C_2$ for Prime $p$\nExistence of Hierarchies and Human's Pursuit of Top Hierarchy Lead to  Power Law\nVery Deep Convolutional Networks for End-to-End Speech Recognition\nDeveloping a Practical Reactive Synthesis Tool: Experience and Lessons  Learned\nAn Inductive Proof Method for Simulation-based Compiler Correctness\nQEX: a framework for lattice field theories\nExploring Different Dimensions of Attention for Uncertainty Detection\nRe-evaluating Automatic Metrics for Image Captioning\nVerifying Heaps' law using Google Books Ngram data\nLeveraging Cognitive Features for Sentiment Analysis\nTraining Language Models Using Target-Propagation\nHybrid Dialog State Tracker with ASR Features\nContext-Aware Prediction of Derivational Word-forms\nConstruction of a Japanese Word Similarity Dataset\nSimplifying the Bible and Wikipedia Using Statistical Machine  Translation\nRepresenting Sentences as Low-Rank Subspaces\nData Augmentation for Low-Resource Neural Machine Translation\nNemo/Hecke: Computer Algebra and Number Theory Packages for the Julia  Programming Language\nMultiscale sequence modeling with a learned dictionary\nRepresentation Learning for Grounded Spatial Reasoning\nA Polynomial Time Match Test for Large Classes of Extended Regular  Expressions\nSome Improvements in Fuzzy Turing Machines\nFast calculation of entropy with Zhang's estimator\nHierarchically-Attentive RNN for Album Summarization and Storytelling\nLSTM Network for Inflected Abbreviation Expansion\nV1: A Visual Query Language for Property Graphs\nSLING: A framework for frame semantic parsing\nSimulating Action Dynamics with Neural Process Networks\nWhat's in a game? 
A theory of game models\nSentiment Classification using Images and Label Embeddings\nThe Zero Resource Speech Challenge 2017\nDetecting Hate Speech in Social Media\nVariational Attention for Sequence-to-Sequence Models\nRestrictions on Potential Automatic Structures on Thompson's Group F\nUsing reinforcement learning to learn how to play text-based games\nTamil Open-Source Landscape - Opportunities and Challenges\nThe Beta-Bernoulli process and algebraic effects\nProceedings 14th International Conference on Quantum Physics and Logic\nNatural Language to Structured Query Generation via Meta-Learning\nTranslating Questions into Answers using DBPedia n-triples\nTracing sharing in an imperative pure calculus\nText Segmentation as a Supervised Learning Task\nGeneralized Counters and Reversal Complexity\nReformulation of QCD in the language of general relativity\nExact generation of acyclic deterministic finite automata\nExtending the Interaction Nets Calculus by Generic Rules\nInterpreting canonical tensor model in minisuperspace\nGenerating Conceptual Metaphors from Proposition Stores\nOn the subword complexity of the fixed point of $a \\rightarrow aab$, $b  \\rightarrow b$, and generalizations\nGenerating Politically-Relevant Event Data\nLatent Variable Dialogue Models and their Diversity\nTrainable Referring Expression Generation using Overspecification  Preferences\nLow-Resource Neural Headline Generation\nOn finitely generated submonoids of free groups\nPriority Union and Generalization in Discourse Grammars\nGenerating Multilingual Documents from a Knowledge Base: The TECHDOC  Project\nEvaluating Discourse Processing Algorithms\nFrom Regular to Context Free to Mildly Context Sensitive Tree Rewriting  Systems: The Path of Child Language Acquisition\nEllipsis and Higher-Order Unification\nPresenting Punctuation\nApplication-driven automatic subgrammar extraction\nDimension in Complexity Classes\nMultimodal Meaning Representation for Generic Dialogue Systems  
Architectures\nDo Goedel's incompleteness theorems set absolute limits on the ability  of the brain to express and communicate mental concepts verifiably?\nThe rationality of Sol manifolds\nA Language-Based Approach for Improving the Robustness of Network  Application Protocol Implementations\nMalware Detection using Attribute-Automata to parse Abstract Behavioral  Descriptions\nMutation of Directed Graphs -- Corresponding Regular Expressions and  Complexity of Their Generation\nOn Planning with Preferences in HTN\nLes entités spatiales dans la langue : étude descriptive, formelle  et expérimentale de la catégorisation\nHMC: Verifying Functional Programs Using Abstract Interpreters\nComparison of Two Context-Free Rewriting Systems with Simple  Context-Checking Mechanisms\nAutomated Verification of Practical Garbage Collectors\nA Generic Scheme for Qualified Constraint Functional Logic Progamming\nSelected Operations, Algorithms, and Applications of n-Tape Weighted  Finite-State Machines\nPoplar: A Java Extension for Evolvable Component Integration\nImplementing Equational Constraints in a Functional Language\nQuestion Answering in a Natural Language Understanding System Based on  Object-Oriented Semantics\nACME vs PDDL: support for dynamic reconfiguration of software  architectures\nAutomatic Generation of OWL Ontology from XML Data Source\nA Framework for Concurrent Imperative Programming\nA Domain Specific Language for kinematic models and fast implementations  of robot dynamics algorithms\nProbabilistic Description Logics\nTowards the Fully Automatic Merging of Lexical Resources: A Step Forward\nRelative Observability of Discrete-Event Systems and its Supremal  Sublanguages\nModal Specifications for Probabilistic Timed Systems\nAbstract Interpretation as a Programming Language\nSymmetric Groups and Quotient Complexity of Boolean Operations\nNatural Language Processing in Biomedicine: A Unified System  Architecture Overview\nReal Time Strategy Language\nQuery 
shredding: Efficient relational evaluation of queries over nested  multisets (extended version)\nA Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference  in Natural Language Processing\nJulia: A Fresh Approach to Numerical Computing\nAn Abstract Interpretation-based Model of Tracing Just-In-Time  Compilation\nDimensionality on Summarization\nDNN-based Speech Synthesis for Indian Languages from ASCII text\nIVOA Recommendation: Table Access Protocol Version 1.0\nA Multiplicative Model for Learning Distributed Text-Based Attribute  Representations\nMontiArcAutomaton: Modeling Architecture and Behavior of Robotic Systems\nLexical Normalisation of Twitter Data\nMontiArc - Architectural Modeling of Interactive Distributed and  Cyber-Physical Systems\nDefining UML Family Members Using Prefaces\nScaling laws in human speech, decreasing emergence of new words and a  generalized model\n$gen$CNN: A Convolutional Architecture for Word Sequence Prediction\nCompression and the origins of Zipf's law of abbreviation\nTreatJS: Higher-Order Contracts for JavaScript\nA Simple and Practical Linear Algebra Library Interface with Static Size  Checking\nDescribtion of normal basis of boundary algebras and factor languages of  small growth\nTheanoLM - An Extensible Toolkit for Neural Network Language Modeling\nDeclarative Machine Learning - A Classification of Basic Properties and  Types\nNatural Language Generation enhances human decision-making with  uncertain information\nConditional Generation and Snapshot Learning in Neural Dialogue Systems\nCriticality in Formal Languages and Statistical Physics\nThe Almost Equivalence by Asymptotic Probabilities for Regular Languages  and Its Computational Complexities\nDiffSharp: An AD Library for .NET Languages\nLandmark-based consonant voicing detection on multilingual corpora\nGeneralisation in Named Entity Recognition: A Quantitative Analysis\nPart of Speech Based Term Weighting for Information Retrieval\nOn Natural Language 
Generation of Formal Argumentation\nAn efficient algorithm to decide periodicity of b-recognisable sets  using LSDF convention\nA Constructor-Based Reachability Logic for Rewrite Theories\nStream Graphs and Link Streams for the Modeling of Interactions over  Time\nMeasurement Context Extraction from Text: Discovering Opportunities and  Gaps in Earth Science\nA Dual Encoder Sequence to Sequence Model for Open-Domain Dialogue  Modeling\nProgrammatic Control of a Compiler for Generating High-performance  Spatial Hardware\nMulti-Domain Adversarial Learning for Slot Filling in Spoken Language  Understanding\nA Deep Network Model for Paraphrase Detection in Short Text Messages\nPronouncUR: An Urdu Pronunciation Lexicon Generator\nRankME: Reliable Human Ratings for Natural Language Generation\nInsights into End-to-End Learning Scheme for Language Identification\nRobustly Safe Compilation or, Efficient, Provably Secure Compilation\nEvaluating historical text normalization systems: How well do they  generalize?\nDP-GAN: Diversity-Promoting Generative Adversarial Network for  Generating Informative and Diversified Text\nHow Images Inspire Poems: Generating Classical Chinese Poetry from  Images with Memory Networks\nFinite generating sets of relatively hyperbolic groups and applications  to geodesic languages\nThe emerging field of language dynamics\nInterfacing Interpreted and Compiled Languages to Support Applications  on a Massively Parallel Network of Workstations (MP-NOW)\nTransfer in a Connectionist Model of the Acquisition of Morphology\nThe Complexity of Quantified Constraint Satisfaction: Collapsibility,  Sink Algebras, and the Three-Element Case\nDefinition and Implementation of a Points-To Analysis for C-like  Languages\nBeyond Zipf's law: Modeling the structure of human language\nUsing the General Intensional Programming System (GIPSY) for Evaluation  of Higher-Order Intensional Logic (HOIL) Expressions\nParikh Images of Regular Languages: Complexity and 
Applications\nFormal-language-theoretic Optimal Path Planning For Accommodation of  Amortized Uncertainties and Dynamic Effects\nQuantitative Languages Defined by Functional Automata\nFeedback Generation for Performance Problems in Introductory Programming  Assignments\nDeterminization of fuzzy automata by means of the degrees of language  inclusion\nLanguage Models for Image Captioning: The Quirks and What Works\nComputational Complexity of the Minimum Cost Homomorphism Problem on  Three-Element Domains\nAlternation in Quantum Programming: From Superposition of Data to  Superposition of Programs\nElaborating Evaluation-Order Polymorphism\nA Correlational Encoder Decoder Architecture for Pivot Based Sequence  Generation\nHuman and Machine Judgements for Russian Semantic Relatedness\nCall-by-value, call-by-name and the vectorial behaviour of the algebraic  λ-calculus\nLanguage Edit Distance & Maximum Likelihood Parsing of Stochastic  Grammars: Faster Algorithms & Connection to Fundamental Graph Problems\nStepwise Debugging of Answer-Set Programs\nUsing NLU in Context for Question Answering: Improving on Facebook's  bAbI Tasks\nIndexed Languages and Unification Grammars\nQuantum Computers and Quantum Computer Languages: Quantum Assembly  Language and Quantum C Language\nSuper-Languages: Developing Languages and Applications with XMF (Second  Edition)\nEzhil: A Tamil Programming Language\nThe Complexity of General-Valued CSPs\nFrequency patterns of semantic change: Corpus-based evidence of a  near-critical dynamics in language change\nStory Generation from Sequence of Independent Short Descriptions\nAutomatic generation of analysis class diagrams from use case  specifications\nToward Controlled Generation of Text\nSpeaking the Same Language: Matching Machine to Human Captions by  Adversarial Training\nText Generation Based on Generative Adversarial Nets with Latent  Variable\nQINL: Query-integrated Languages\nTwo-Dimensional Pattern Languages\nLanguage Approximation 
With One-Counter Automata\nSemi-Supervised QA with Generative Domain-Adaptive Nets\nTree-Structured Neural Machine for Linguistics-Aware Sentence Generation\nLanguage Segmentation\nA Flexible Shallow Approach to Text Generation\nFirst-order Complete and Computationally Complete Query Languages for  Spatio-Temporal Databases\nLearning to generalize to new compositions in image understanding\nGenerative Deep Neural Networks for Dialogue: A Short Review\nAutomatic Generation of Grounded Visual Questions\nGenerating Code with Polymorphic let: A Ballad of Value Restriction,  Copying and Sharing\nNeural Question Generation from Text: A Preliminary Study\nAutomatized Generation of Alphabets of Symbols\nNeural Text Generation: Past, Present and Beyond\nGenerating Diverse and Accurate Visual Captions by Comparative  Adversarial Learning\nThe limits of SDP relaxations for general-valued CSPs\nSelf-Organizing Machine Translation: Example-Driven Induction of  Transfer Functions\nA Complete and Recursive Feature Theory\nFocusing for Pronoun Resolution in English Discourse: An Implementation\nSyntactic Analysis Of Natural Language Using Linguistic Rules And  Corpus-based Patterns\nClassifier Assignment by Corpus-based Approach\nFormalization and Parsing of Typed Unification-Based ID/LP Grammars\nDifferent Issues in the Design of a Lemmatizer/Tagger for Basque\nA Note on the Complexity of Restricted Attribute-Value Grammars\nCollaborating on Referring Expressions\nStatistical Decision-Tree Models for Parsing\nThe Use of Knowledge Preconditions in Language Processing\nHow Part-of-Speech Tags Affect Text Retrieval and Filtering Performance\nError-tolerant Tree Matching\nA Data-Oriented Approach to Semantic Interpretation\nCue Phrase Classification Using Machine Learning\nStochastic Attribute-Value Grammars\nIntegrating HMM-Based Speech Recognition With Direct Manipulation In A  Multimodal Korean Natural Language Interface\nMemory-Based Learning: Using Similarity for Smoothing\nA 
Corpus-Based Approach for Building Semantic Lexicons\nOn aligning trees\nProbabilistic Constraint Logic Programming\nOn the use of expectations for detecting and repairing human-machine  miscommunication\nTopic Graph Generation for Query Navigation: Use of Frequency Classes  for Topic Extraction\nGroup Theory and Grammatical Description\nLetter to Sound Rules for Accented Lexicon Compression\nA Variant of Earley Parsing\nStatistical Inference and Probabilistic Modelling for Constraint-Based  NLP\nNumeration systems on a regular language: Arithmetic operations,  Recognizability and Formal power series\nMany uses, many annotations for large speech corpora: Switchboard and  TDT as case studies\nSequence-Based Abstract Interpretation of Prolog\nCoaxing Confidences from an Old Friend: Probabilistic Classifications  from Transformation Rule Lists\nBuilding Multi-Platform User Interfaces with UIML\nA Multi-Step Process for Generating Multi-Platform User Interfaces using  UIML\nA Refinement Calculus for Logic Programs\nMostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences\nA Method for Open-Vocabulary Speech-Driven Text Retrieval\nMcRunjob: A High Energy Physics Workflow Planner for Grid Production  Processing\nLocal-search techniques for propositional logic extended with  cardinality constraints\nRunning C++ models undet the Swarm environment\nZipf's law and the creation of musical context\nOn the Theory of Structural Subtyping\nA knowledge-based approach to semi-automatic annotation of multimedia  documents via user adaptation\nA Systematic Aspect-Oriented Refactoring and Testing Strategy, and its  Application to JHotDraw\nWeighted Automata in Text and Speech Processing\nIntegration of the DOLCE top-level ontology into the OntoSpec  methodology\nACD Term Rewriting\nOn the logical definability of certain graph and poset languages\nNavigating multilingual news collections using automatically extracted  information\nLanguages, Algorithms, Procedures, 
Calculi, and Metalogic\nProgramming Complex Systems\nDegrees of freedom of tongue movements in speech may be constrained by  biomechanics\nAlgebraic Geometry over Free Metabelian Lie Algebra II: Finite Field  Case\nEfficient Solution of Language Equations Using Partitioned  Representations\nThe Expressional Limits of Formal Languages in the Notion of Observation\nCoherence thresholds in models of language change and evolution: the  effects of noise, dynamics and network of interactions\nResearch report: State complexity of operations on two-way quantum  finite automata\nConcept-Oriented Model and Query Language\nOn the Borel Inseparability of Game Tree Languages\nAs time goes by: Constraint Handling Rules - A survey of CHR research  from 1998 to 2007\nStructure Theorem and Strict Alternation Hierarchy for FO^2 on Words\nTesting the Equivalence of Regular Languages\nA unifying approach to picture grammars\nDeterministic Consistency: A Programming Model for Shared Memory  Parallelism\nSynthesis of AMBA AHB from Formal Specification\nSimplifying Parallelization of Scientific Codes by a Function-Centric  Approach in Python\nRecognition of Handwritten Textual Annotations using Tesseract Open  Source OCR Engine for information Just In Time (iJIT)\nAbstract Certification of Global Non-Interference in Rewriting Logic\nAvoiding another Green Elephant - A Proposal for the Next Generation HLA  based on the Model Driven Architecture\nOn the Implementation of GNU Prolog\nChameleons in imagined conversations: A new approach to understanding  coordination of linguistic style in dialogs\nExperimental Support for a Categorical Compositional Distributional  Model of Meaning\nDiscovering Knowledge using a Constraint-based Language\nProceedings 10th International Workshop on the Foundations of  Coordination Languages and Software Architectures\nEfficient and Correct Stencil Computation via Pattern Matching and  Static Typing\nBuilding-Blocks for Performance Oriented DSLs\nA DSEL for 
Studying and Explaining Causation\nResumption-based big-step and small-step interpreters for While with  interactive I/O\nGenetic Algorithm (GA) in Feature Selection for CRF Based Manipuri  Multiword Expression (MWE) Identification\nProgramming errors in traversal programs over structured data\nSearch Combinators\nParameter Learning in PRISM Programs with Continuous Random Variables\nAnalysing Temporally Annotated Corpora with CAVaT\nA model-driven approach for processing complex events\nNon-definability of languages by generalized first-order formulas over  (N,+)\nDerivatives of Approximate Regular Expressions\nImplementation of EasyTime Formal Semantics using a LISA Compiler  Generator\nA model of competition among more than two languages\nAdversarial Evaluation for Models of Natural Language\nOGCOSMO: An auxiliary tool for the study of the Universe within  hierarchical scenario of structure formation\nManaging Complex Structured Data In a Fast Evolving Environment\nNatural Language Processing - A Survey\nAnnotating Answer-Set Programs in LANA?\nExtraction of domain-specific bilingual lexicon from comparable corpora:  compositional translation and ranking\nDeriving program transformations by demonstration\nTowards a Semantic-based Approach for Modeling Regulatory Documents in  Building Industry\nProbabilistic Frame Induction\nA Domain-Specific Language for Rich Motor Skill Architectures\nPractical Inlining of Functions with Free Variables\nThe Size-Change Termination Principle for Constructor Based Languages\nA Grammatical Inference Approach to Language-Based Anomaly Detection in  XML\nInterplay between Point-Group Symmetries and the Choice of the Bloch  Basis in Multiband Models\nPatterns for computational effects arising from a monad or a comonad\nHybrid Automated Reasoning Tools: from Black-box to Clear-box  Integration\nOn the Semantics of Gringo\nA Study of Successive Over-relaxation Method Parallelization Over Modern  HPC Languages\nTowards a Generic 
Framework for the Development of Unicode Based Digital  Sindhi Dictionaries\nARSENAL: Automatic Requirements Specification Extraction from Natural  Language\nAspect-Based Opinion Extraction from Customer reviews\nComplete Separation of the 3 Tiers - Divide and Conquer\nA preliminary study of Croatian Language Syllable Networks\nTATI -- A Logo-like interface for microworlds and simulations for  physics teaching in Second Life\nDynamic Choreographies - Safe Runtime Updates of Distributed  Applications\nA Proposed Framework for Development of a Visualizer Based on Memory  Transfer Language (MTL)\nEncoding the structure of many-body localization with matrix product  operators\nOnline interpretation of numeric sign language using 2-d skeletal model\nPatterns in the English Language: Phonological Networks, Percolation and  Assembly Models\nTransforming while/do/for/foreach-Loops into Recursive Methods\nStatistically Significant Detection of Linguistic Change\nOpinion mining of text documents written in Macedonian language\nTowards a Fully Abstract Compiler Using Micro-Policies: Secure  Compilation for Mutually Distrustful Components\nSemi-supervised Sequence Learning\nOrder-Embeddings of Images and Language\nRegularizing RNNs by Stabilizing Activations\nDomain Adaptation of Recurrent Neural Networks for Natural Language  Understanding\nAn Interference-Free Programming Model for Network Objects\nSIMPL: A DSL for Automatic Specialization of Inference Algorithms\nOn Modular and Fully-Abstract Compilation -- Technical Appendix\nMorphological Priors for Probabilistic Neural Word Embeddings\nResolving Out-of-Vocabulary Words with Bilingual Embeddings in Machine  Translation\nCOREALMLIB: An ALM Library Translated from the Component Library\nDeclarative Event-Based Workflow as Distributed Dynamic Condition  Response Graphs\nExtending Object-Oriented Languages by Declarative Specifications of  Complex Objects using Answer-Set Programming\nDecentralized Supervisory Control of 
Discrete Event Systems for  Bisimulation Equivalence\nLocality Optimization for Data Parallel Programs\nFormal Verification of a C Value Analysis Based on Abstract  Interpretation\nSeeing What You're Told: Sentence-Guided Activity Recognition In Video\nA Representation Theorem for Second-Order Functionals\nDomain Theory for Modeling OOP: A Summary\nOn the density of certain languages with $p^2$ letters\nControversy and Sentiment in Online News\nGames for Active XML Revisited\nA hypothesize-and-verify framework for Text Recognition using Deep  Recurrent Neural Networks\nLearning to Understand Phrases by Embedding the Dictionary\nConstraining application behaviour by generating languages\nContext-Free Path Queries on RDF Graphs\nParsing Expression Grammars Made Practical\nEmerging Dimension Weights in a Conceptual Spaces Model of Concept  Combination\nBachelor's thesis on generative probabilistic programming (in Russian  language, June 2014)\nKnowledge Transfer with Medical Language Embeddings\nA Latent Variable Recurrent Neural Network for Discourse Relation  Language Models\nA Character-Level Decoder without Explicit Segmentation for Neural  Machine Translation\nSensorimotor Input as a Language Generalisation Tool: A Neurorobotics  Model for Generation and Generalisation of Noun-Verb Combinations with  Sensorimotor Inputs\nTwo-Finger Keyboard Layout Problem: An Application On Turkish Language\nLearning Natural Language Inference using Bidirectional LSTM model and  Inner-Attention\nExploiting N-Best Hypotheses to Improve an SMT Approach to Grammatical  Error Correction\nImplementing graph grammars for intelligence analysis in OCaml\nSet-Theoretic Types for Polymorphic Variants\nThe Role of CNL and AMR in Scalable Abstractive Summarization for  Multilingual Media Monitoring\nPragmatic factors in image description: the case of negations\nNN-grams: Unifying neural network and n-gram language models for Speech  Recognition\nPredicting the Relative Difficulty of Single 
Sentences With and Without  Surrounding Context\nTie-breaker: Using language models to quantify gender bias in sports  journalism\nCantor meets Scott: Semantic Foundations for Probabilistic Networks\nMultimodal Attention for Neural Machine Translation\nNeural Machine Transliteration: Preliminary Results\nThe Color of the Cat is Gray: 1 Million Full-Sentences Visual Question  Answering (FSVQA)\nAutomatic semigroups vs automaton semigroups\nSentence Segmentation in Narrative Transcripts from Neuropsychological  Tests using Recurrent Convolutional Neural Networks\nBidirectional LSTM-CRF for Clinical Concept Extraction\nReasoning with Memory Augmented Neural Networks for Language  Comprehension\nWhat Do Recurrent Neural Network Grammars Learn About Syntax?\nCoherent Dialogue with Attention-based Language Models\nBidirectional LSTM-CRF for Clinical Concept Extraction\nSingle-Pass, Adaptive Natural Language Filtering: Measuring Value in  User Generated Comments on Large-Scale, Social Media News Forums\nLazy Automata Techniques for WS1S\nUser Assistance Characteristics of the USE Model Checking Tool\nValidating and describing linked data portals using shapes\nAn Empirical Evaluation of Zero Resource Acoustic Unit Discovery\nPrinted Arabic Text Recognition using Linear and Nonlinear Regression\nNeural Multi-Step Reasoning for Question Answering on Semi-Structured  Tables\nMcFSM: Globally Taming Complex Systems\nCoping with Construals in Broad-Coverage Semantic Annotation of  Adpositions\nMultimodal Language Specification for Human Adaptive Mechatronics\nWell-Behaved Model Transformations with Model Subtyping\nInteracting Conceptual Spaces I : Grammatical Composition of Concepts\nCrowdsourcing Universal Part-Of-Speech Tags for Code-Switching\nStudying the Prevalence of Exception Handling Anti-Patterns\nThe Proof of CSP Dichotomy Conjecture\nSemi-supervised sequence tagging with bidirectional language models\nLogical Parsing from Natural Language Based on a Neural 
Translation  Model\nSubregular Complexity and Deep Learning\nLocal Monotonic Attention Mechanism for End-to-End Speech and Language  Processing\nPredicting Causes of Reformulation in Intelligent Assistants\nSpherical Paragraph Model\nSGNMT -- A Flexible NMT Decoding Platform for Quick Prototyping of New  Models and Search Strategies\nFrom Type Spaces to Probability Frames and Back, via Language\nInput-Driven Double-Head Pushdown Automata\nCode Staging in GNU Guix\nTranslating Domain-Specific Expressions in Knowledge Bases with Neural  Machine Translation\nKnowNER: Incremental Multilingual Knowledge in Named Entity Recognition\nData Innovation for International Development: An overview of natural  language processing for qualitative data analysis\nUnwritten Languages Demand Attention Too! Word Discovery with  Encoder-Decoder Models\nLow-resource bilingual lexicon extraction using graph based word  embeddings\nReversible Computation in Term Rewriting\nOn the insertion of n-powers\nNatural Language Guided Visual Relationship Detection\nGrammatical facial expression recognition using customized deep neural  network architecture\nStyle Transfer in Text: Exploration and Evaluation\nHoME: a Household Multimodal Environment\nCapturing Reliable Fine-Grained Sentiment Associations by Crowdsourcing  and Best-Worst Scaling\nNeural Machine Translation by Generating Multiple Linguistic Factors\nTranslating Pro-Drop Languages with Reconstruction Models\nGrounded Language Understanding for Manipulation Instructions Using  GAN-Based Classification\nDiscrete Autoencoders for Sequence Models\nEMME: a formal tool for ECMAScript Memory Model Evaluation\nSparseMAP: Differentiable Sparse Structured Inference\nA wide-spectrum language for verification of programs on weak memory  models\nObject Captioning and Retrieval with Natural Language\nExploiting Recurrent Neural Networks and Leap Motion Controller for Sign  Language and Semaphoric Gesture Recognition\nGuide Me: Interacting with Deep 
Networks\nA Novel Learnable Dictionary Encoding Layer for End-to-End Language  Identification\nImproved Fusion of Visual and Language Representations by Dense  Symmetric Co-Attention for Visual Question Answering\nProgrammatically Interpretable Reinforcement Learning\nRestricting the Weak-Generative Capacity of Synchronous Tree-Adjoining  Grammars\nCorpus-Driven Knowledge Acquisition for Discourse Analysis\nLearning Unification-Based Natural Language Grammars\nOrthographic Structuring of Human Speech and Texts: Linguistic  Application of Recurrence Quantification Analysis\nThe Weaves Reconfigurable Programming Framework\nThe DLV System for Knowledge Representation and Reasoning\nBeyond word frequency: Bursts, lulls, and scaling in the temporal  distributions of words\nIntroduction to the Report \"Interlanguages and Synchronic Models of  Computation.\"\nMalagasy Dialects and the Peopling of Madagascar\nRepetitive Reduction Patterns in Lambda Calculus with letrec (Work in  Progress)\nQIS-XML: An Extensible Markup Language for Quantum Information Science\nHamming Approximation of NP Witnesses\nReconsidering Written Language\nA Core Calculus for Provenance\nCategory-Theoretic Quantitative Compositional Distributional Models of  Natural Language Semantics\nAnswering SPARQL queries modulo RDF Schema with paths\nContext-based Word Acquisition for Situated Dialogue in a Virtual World\nEnabling FPGAs for the Masses\nFrom Captions to Visual Concepts and Back\nThe Goldilocks Principle: Reading Children's Books with Explicit Memory  Representations\nA Corpus and Evaluation Framework for Deeper Understanding of  Commonsense Stories\nLeft-corner Methods for Syntactic Modeling with Universal Structural  Constraints\nWNetKAT: A Weighted SDN Programming and Verification Language\nUsing Linguistic Analysis to Translate Arabic Natural Language Queries  to SPARQL\nSentiment of Emojis\nGraded Entailment for Compositional Distributional Semantics\nCVC Verilog Compiler -- Fast Complex 
Language Compilers Can be Simple\nThe power of Sherali-Adams relaxations for general-valued CSPs\nHoTTSQL: Proving Query Rewrites with Univalent SQL Semantics\nThe infochemical core\nKannada Spell Checker with Sandhi Splitter\nFrom narrative descriptions to MedDRA: automagically encoding adverse  drug reactions\nTinkering Under the Hood: Interactive Zero-Shot Learning with Net  Surgery\nFast Domain Adaptation for Neural Machine Translation\nUnderstanding Image and Text Simultaneously: a Dual Vision-Language  Machine Comprehension Task\nA modeling and simulation language for biological cells with coupled  mechanical and chemical processes\nPetri Automata\nAutomated Phrase Mining from Massive Text Corpora\nDialectometric analysis of language variation in Twitter\nAutomatic Text Summarization Approaches to Speed up Topic Model Learning  Process\nFine-graind Image Classification via Combining Vision and Language\nSimulations and Antichains for Efficient Handling of Finite Automata\nExtracting Formal Models from Normative Texts\nEffective Spoken Language Labeling with Deep Recurrent Neural Networks\nRevisiting Elementary Denotational Semantics\nCynical Selection of Language Model Training Data\nEmpower Sequence Labeling with Task-Aware Neural Language Model\nSoftware Engineering Modeling Applied to English Verb Classification  (and Poetry)\nImproved Twitter Sentiment Analysis Using Naive Bayes and Custom  Language Model\nTFW, DamnGina, Juvie, and Hotsie-Totsie: On the Linguistic and Social  Aspects of Internet Slang\nVideo-based Sign Language Recognition without Temporal Segmentation\nSemantic projection: recovering human knowledge of multiple, distinct  object features from word embeddings\nConvolutional Neural Networks and Language Embeddings for End-to-End  Dialect Recognition\nGeneralized geometries and kinematics for Quantum Gravity\nAsymptotic topology\nLinguistic Paradoxes and Tautologies\nOrbifolds and stable homotopy groups\nA Generalized Composition of 
Quadratic Forms based on Quadratic Pairs\nEngel conditions and symmetric tensors\nA method for recursively generating sequential rational approximations  to $\\sqrt[n]{k}$\nAbstract Representations and Frequent Pattern Discovery\nThe Generic Model of Computation\nOn partial and generic uniqueness of block term tensor decompositions\nGeneralized geometry, T-duality, and renormalization group flow\nImage processing using miniKanren\nAutonomization of Monoidal Categories\nValue Automata with Filters\nDirac matrices as elements of superalgebraic matrix algebra\nProcedural and Non-Procedural Implementation of Search Strategies in  Control Network Programming\nPresentations of Topological Full Groups by Generators and Relations\nAlgebraic and Nori fundamental gerbes\nOn level-transitivity and exponential growth\nModeling documents with Generative Adversarial Networks\nThe null-geodesic flow near horizons\nEliminating Field Quantifiers in Strongly Dependent Henselian Fields\nTo Infinity and Beyond\nFunctorial Semantics for Relational Theories\nExplicit equations for exterior square of the general linear group\nGeneralizing the Paige-Tarjan Algorithm by Abstract Interpretation\nAutomated Generation of User Guidance by Combining Computation and  Deduction\nUsing Neural Generative Models to Release Synthetic Twitter Corpora with  Reduced Stylometric Identifiability of Users\nFaithful (meta-)encodings of programmable strategies into term rewriting  systems\nAchieving Fluency and Coherency in Task-oriented Dialog\nSyntactic-Head-Driven Generation\nPermutations generated by a depth 2 and infinite stack in series are  algebraic\nChinese Song Iambics Generation with Neural Attention-based Model\nChinese Poetry Generation with Planning based Neural Network\nPAWS: A Tool for the Analysis of Weighted Systems\nWhat is the Role of Recurrent Neural Networks (RNNs) in an Image Caption  Generator?\nTensor Product Generation Networks for Deep NLP Modeling\nLanguage-integrated provenance 
in Haskell\nQuery-driven Procedures for Hybrid MKNF Knowledge Bases\nCreating Textual Language Dialects Using Aspect-like Techniques\nQuotient Complexity of Star-Free Languages\nWeak $ω$-Regular Trace Languages\nFly out-smarts man\nLanideNN: Multilingual Language Identification on Character Window\nLanguage Transfer of Audio Word2Vec: Learning Audio Segment  Representations without Target Language Data\nCombined-Semantics Equivalence Is Decidable for a Practical Class of  Conjunctive Queries\nA Primer on Resurgent Transseries and Their Asymptotics\nGeneralized Hadamard Product and the Derivatives of Spectral Functions\nA generic tool to generate a lexicon for NLP from Lexicon-Grammar tables\nA Formal Comparison of Approaches to Datatype-Generic Programming\nTowards a Logic-Based Unifying Framework for Computing\nPolitical Speech Generation\nAutomatic Description Generation from Images: A Survey of Models,  Datasets, and Evaluation Measures\nNeural Net Models for Open-Domain Discourse Coherence\nHow Much is 131 Million Dollars? Putting Numbers in Perspective with  Compositional Descriptions\nGenerating Focussed Molecule Libraries for Drug Discovery with Recurrent  Neural Networks\nSTAIR Captions: Constructing a Large-Scale Japanese Image Caption  Dataset\nNeural Models for Key Phrase Detection and Question Generation\nGenerative Bridging Network in Neural Sequence Prediction\nTranslating Phrases in Neural Machine Translation\nGenerating Natural Adversarial Examples\nInterpretable Charge Predictions for Criminal Cases: Learning to  Generate Court Views from Fact Descriptions\nImagine This! 
Scripts to Compositions to Videos\nFree differential algebras and generic 2D dilatonic (super)gravities\nGenerating Images from Captions with Attention\nA generalized Goulden-Jackson cluster method and lattice path  enumeration\nTowards an Accurate Mathematical Model of Generic Nominally-Typed OOP\nNeural Personalized Response Generation as Domain Adaptation\nTexygen: A Benchmarking Platform for Text Generation Models\nAutomatic Generation of Sparse Tensor Kernels with Workspaces\nReview of Charniak's \"Statistical Language Learning\"\nBirth, survival and death of languages by Monte Carlo simulation\nGeneralisation of language and knowledge models for corpus analysis\nEfficient Separability of Regular Languages by Subsequences and Suffixes\nRepresentation of (Left) Ideal Regular Languages by Synchronizing  Automata\nAbductive Equivalential Translation and its application to Natural  Language Database Interfacing\nVagueness of Linguistic variable\nExpressivity of Time-Varying Graphs and the Power of Waiting in Dynamic  Networks\nSoft Contract Verification for Higher-Order Stateful Programs\nA Second-Order Approach to Complex Event Recognition\nBest-first Model Merging for Hidden Markov Model Induction\nThree studies of grammar-based surface-syntactic parsing of unrestricted  English text. 
A summary and orientation\nA Consistency-Based Model for Belief Change: Preliminary Report\nEvent Driven Computations for Relational Query Language\nStochastic model for the vocabulary growth in natural languages\nTaming the Infinite Chase: Query Answering under Expressive Integrity  Constraints\nText to Multi-level MindMaps: A Novel Method for Hierarchical Visual  Abstraction of Natural Language Text\nSubtyping in Java is a Fractal\nAutomatic Parallelization: Executing Sequential Programs on a Task-Based  Parallel Runtime\nOpt: A Domain Specific Language for Non-linear Least Squares  Optimization in Graphics and Imaging\nParaconsistency and Word Puzzles\nGenerating Natural Questions About an Image\nLearning Visual Reasoning Without Strong Priors\nA general formal memory framework in Coq for verifying the properties of  programs based on higher-order logic theorem proving with increased  automation, consistency, and reusability\nAbstract Generation based on Rhetorical Structure Extraction\nStochastic phonological grammars and acceptability\nLinguistic Reflection in Java\nExtending the code generation capabilities of the Together CASE tool to  support Data Definition languages\nAlgebraic Geometry over Free Groups: Lifting Solutions into Generic  Points\nMUDOS-NG: Multi-document Summaries Using N-gram Graphs (Tech Report)\nGenerating Stack-based Access Control Policies\nAutomated generation and symbolic manipulation of tensor product finite  elements\nBuilding End-To-End Dialogue Systems Using Generative Hierarchical  Neural Network Models\nMedian-Based Generation of Synthetic Speech Durations using a  Non-Parametric Approach\nPolymonadic Programming\nDeep Visual-Semantic Alignments for Generating Image Descriptions\nVisual Madlibs: Fill in the blank Image Generation and Question  Answering\nA New Foundation for Finitary Corecursion\nGenerating Factoid Questions With Recurrent Neural Networks: The 30M  Factoid Question-Answer Corpus\nGenerating reversible circuits 
from higher-order functional programs\nGenerating Visual Explanations\nRedundancy-free Verbalization of Individuals for Ontology Validation\nGenerating Simulations of Motion Events from Verbal Descriptions\nHybrid Static/Dynamic Schedules for Tiled Polyhedral Programs\nVoice Conversion from Unaligned Corpora using Variational Autoencoding  Wasserstein Generative Adversarial Networks\nExploring Word Embeddings for Unsupervised Textual User-Generated  Content Normalization\nMulti-Task Video Captioning with Video and Entailment Generation\nDepression and Self-Harm Risk Assessment in Online Forums\nMoNoise: Modeling Noise Using a Modular Normalization System\nLearning Phrase Embeddings from Paraphrases with GRUs\nA generalized parsing framework for Abstract Grammars\nGenerative Interest Estimation for Document Recommendations\nA New Foundation for Finitary Corecursion and Iterative Algebras\nSyntax-Directed Variational Autoencoder for Structured Data\nAn End-to-End Goal-Oriented Dialog System with a Generative Natural  Language Response Generation\nLearning General Purpose Distributed Sentence Representations via Large  Scale Multi-task Learning\nNot just about size - A Study on the Role of Distributed Word  Representations in the Analysis of Scientific Publications\nWell-Typed Languages are Sound\nOne Model to Rule them all: Multitask and Multilingual Modelling for  Lexical Analysis\nSyntactic Complexity of Star-Free Languages\nGeneral Logic-Systems and Consequence Operators\nRandom Generation and Approximate Counting of Combinatorial Structures\nThe Quest for Optimal Sorting Networks: Efficient Generation of  Two-Layer Prefixes\nRandom Generation and Enumeration of Accessible Determinisitic Real-time  Pushdown Automata\nClosures and generating sets related to combinations of structures\nPyCells for an Open Semiconductor Industry\nGeneralized Sampling in Julia\nOptimized Automatic Code Generation for Geometric Algebra Based  Algorithms with Ray Tracing 
Application\nAMR-to-text generation as a Traveling Salesman Problem\nNon-coordinates basis in General Relativity and Cartan's structure  equations\nNo information can be conveyed by certain events: The case of the clever  widows of Fornicalia and the Stobon Oracle\nIntroduction to the CoNLL-2000 Shared Task: Chunking\nIntroduction to the CoNLL-2001 Shared Task: Clause Identification\nA Probabilistic Model of Machine Translation\nMulti-document Biography Summarization\nOn the structure of linear-time reducibility\nProblems of the Strategy of Regions\nString Partons and Multiple Quantisation\nThe abelian and non-abelian Josephson effect and pseudo-goldstone bosons\nOn certain higher dimensional analogues of vertex algebras\nAutomatic Quotients of Free Groups\nGroups, periodic planes and hyperbolic buildings\nAdaptive Quadrilateral Mesh in Curved Domains\nCausality Principle\nConsciousness in Physics\nAnalise dinamica da tendencia para o equilibrio num modelo simples: a  Segunda Lei de Newton e a Segunda Lei da Termodinamica\nFinite automata models of quantized systems: conceptual status and  outlook\nThe social aspects of quantum entanglement\nProof nets for display logic\nMathematics as the language of physics\nLanguage of Boolean functions its Grammar and Machine\nAdversary lower bounds for nonadaptive quantum algorithms\nDiagrammatics for Soergel categories\nOn the Jacobian of the harmonic moment map\nSequences close to periodic\nOn the Representation of Finite Automata\nCapacity Bounded Grammars and Petri Nets\nStatechart Verification with iState\nSpecifying Data Objects with Initial Algebras\nDeterministic Autopoietic Automata\nThe Morphisms With Unstackable Image Words\nOCamlJIT 2.0 - Faster Objective Caml\nPushing undecidability of the isolation problem for probabilistic  automata\nGeneralized Post Embedding Problems\nCell decomposition for semi-affine structures on p-adic fields\nSolving the TTC 2011 Compiler Optimization Case with GrGen.NET\nThe rigidity of 
periodic frameworks as graphs on a fixed torus\nPeriodic Rigidity on a Variable Torus Using Inductive Constructions\nVerification Condition Generation and Variable Conditions in Smallfoot\nAn effective characterization of the alternation hierarchy in  two-variable logic\nDeciding Word Problems of Semigroups using Finite State Automata\nThe enumeration of three pattern classes\nThe Cerny conjecture for automata respecting intervals of a directed  graph\nA Robust Specification Theory for Modal Event-Clock Automata\nSeveral AES Variants under VHDL language In FPGA\nOn the Number of Unbordered Factors\nFixed points of endomorphisms of trace monoids\nQuantum motor and future\nRational Subsets and Submonoids of Wreath Products\nNote on Undecidability of Bisimilarity for Second-Order Pushdown  Processes\nFibre bundle formulation of time-dependent mechanics\nSoergel Calculus\nInverse semigroups with rational word problem are finite\nA remark on the discriminant of Hill's equation and Herglotz functions\nQuantum Entanglement and Decoherence: Beyond Particle Models. A Farewell  to Quantum Mechanics's Weirdness\nTargeting HIV-related Medication Side Effects and Sentiment Using  Twitter Data\nHow the Voynich Manuscript was created\nSubcompletions of representable relation algebras\nSpin-density and Vorticity Contribution to the Cosmological Background\nAxiomatizing Analog Algorithms\nDerived-Term Automata of Multitape Rational Expressions (Long version)\nInvitation to Algorithmic Uses of Inclusion-Exclusion\nOn the logical strength of the automorphism groups of free nilpotent  groups\nA new approach to cross-bifix-free sets\nMonoidify! 
Monoids as a Design Principle for Efficient MapReduce  Algorithms\nA New Heuristic Synchronizing Algorithm\nAn R Implementation of the Polya-Aeppli Distribution\nUnraveling simplicity in elementary cellular automata\nComputable Axiomatizability of Elementary Classes\nProof Systems and Models for the First-Order Primal Logic\nPartial Derivative Automaton for Regular Expressions with Shuffle\nRust-Bio - a fast and safe bioinformatics library\nThinking Required\nComparing Weakest Precondition and Weakest Liberal Precondition\nTangles and Connectivity in Graphs\nPseudo-local Theories: A Functional Class Proposal\nSymbolic Tensor Calculus -- Functional and Dynamic Approach\nAutomatic Theorem Proving in Walnut\nA Step from Probabilistic Programming to Cognitive Architectures\nRelative exchangeability with equivalence relations\nA Modular Structural Operational Semantics for Delimited Continuations\nUnsupervised Neural Hidden Markov Models\nDecision problems on unary probabilistic and quantum automata\nMorphisms on infinite alphabets, countable states automata and regular  sequences\nSocial Media Argumentation Mining: The Quest for Deliberateness in  Raucousness\nSmooth contractible threefolds with hyperbolic $\\mathbb{G}_{m}$-actions  via ps-divisors\nMonadic Second Order Logic with Measure and Category Quantifiers\nVanishing theorems for perverse sheaves on abelian varieties, revisited\nDeriving Generic Bounds for Time-Series Constraints Based on Regular  Expressions Characteristics\nResponsive Graphical User Interface (ReGUI) and its Implementation in  MATLAB\nA Survey of Distant Supervision Methods using PGMs\nCoherence for braided and symmetric pseudomonoids\nAutomatic Mapping of French Discourse Connectives to PDTB Discourse  Relations\nAMR Parsing using Stack-LSTMs\nOwl: A General-Purpose Numerical Library in OCaml\nThe Trees of Hanoi\nUniqueness of Schrödinger flow on manifolds\nLocal finiteness for Green's relations in semigroup varieties\nOne loop QED 
corrections to the process $  γγ\\rightarrowμ^+μ^-γ$\nFormal specification of the FlexRay protocol using FocusST\nThe BMM symmetrising trace conjecture for groups  $G_4,\\,G_5,\\,G_6,\\,G_7,\\,G_8$\nMittens: An Extension of GloVe for Learning Domain-Specialized  Representations\nNetwork Traffic Anomaly Detection Using Recurrent Neural Networks\nMaskGAN: Better Text Generation via Filling in the______\nA simple branching model that reproduces language family and language  population distributions\nPumping lemmas for linear and nonlinear context-free languages\nFormalization of the pumping lemma for context-free languages\nNatural Language Understanding with Distributed Representation\nNatural Language Processing using Hadoop and KOSHIK\nProcessing XML for Domain Specific Languages\nLanguage classification from bilingual word embedding graphs\nOn the Similarities Between Native, Non-native and Translated Texts\nDiscriminating Similar Languages: Evaluations and Explorations\nDialog Context Language Modeling with Recurrent Neural Networks\nPredicting Native Language from Gaze\nVisual Reasoning with Natural Language\nParsing with Typed Feature Structures\nLearning Parse and Translation Decisions From Examples With Rich Context\nFrom truth to computability II\nIndex wiki database: design and experiments\nCompleteness for Flat Modal Fixpoint Logics\nUnbounded-error quantum computation with small space bounds\nReduced Ordered Binary Decision Diagram with Implied Literals: A New  knowledge Compilation Approach\nFurthering Baseline Core Lucid Standard Specification in the Context of  the History of Lucid, Intensional Programming, and Context-Aware Computing\nSpecifying and Staging Mixed-Initiative Dialogs with Program Generation  and Transformation\nOptimal Coalition Structures in Cooperative Graph Games\nQIRAL: A High Level Language for Lattice QCD Code Generation\nCompression as a universal principle of animal behavior\nGhost: A Uniform and General-Purpose Proxy 
Implementation\nEfficient Runtime Monitoring with Metric Temporal Logic: A Case Study in  the Android Operating System\nTowards Unsupervised Learning of Temporal Relations between Events\nVenture: a higher-order probabilistic programming platform with  programmable inference\nSimple and Effective Type Check Removal through Lazy Basic Block  Versioning\nMultilingual Relation Extraction using Compositional Universal Schema\nImage Captioning with Deep Bidirectional LSTMs\nImproving Quality of Hierarchical Clustering for Large Data Series\nAlbanian Sign Language (AlbSL) Number Recognition from Both Hand's  Gestures Acquired by Kinect Sensors\nCutoff for Extensions of Massive Gravity and Bi-Gravity\nImageCL: An Image Processing Language for Performance Portability on  Heterogeneous Systems\nDynamic Choreographies: Theory And Implementation\nDynamic Structural Operational Semantics\nBuilding Efficient Query Engines in a High-Level Language\nHierarchical LSTM with Adjusted Temporal Attention for Video Captioning\nAdaptive Lock-Free Data Structures in Haskell: A General Method for  Concurrent Implementation Swapping\nSQLNet: Generating Structured Queries From Natural Language Without  Reinforcement Learning\nThe Neural Network Pushdown Automaton: Model, Stack and Learning  Simulations\nFace2Text: Collecting an Annotated Image Description Corpus for the  Generation of Rich Face Descriptions\nRelationship Maintenance in Software Language Repositories\nA symbolic description of punning riddles and its computer  implementation\nDeriving Procedural and Warning Instructions from Device and Environment  Models\nPossessive Pronouns as Determiners in Japanese-to-English Machine  Translation\nLearning Features that Predict Cue Usage\nLearning Correlations between Linguistic Indicators and Semantic  Constraints: Reuse of Context-Dependent Descriptions of Entities\nCharacter design for soccer commmentary\nStrategic polymorphism requires just two combinators!\nAnusaaraka: Machine 
Translation in Stages\nSymmetric Space Cartan Connections and Gravity in Three and Four  Dimensions\nOptimising Code Generation with haggies\nVcache: Caching Dynamic Documents\nImplementing Multi-Periodic Critical Systems: from Design to Code  Generation\nLogical Step-Indexed Logical Relations\nA General Framework for Representing, Reasoning and Querying with  Annotated Semantic Web Data\nThe weighted words collector\nGeneral Bindings and Alpha-Equivalence in Nominal Isabelle\nA framework for automated PDE-constrained optimisation\nChecking Computations of Formal Method Tools - A Secondary Toolchain for  ProB\nThe classical umbral calculus, and the flow of a Drinfeld module\nShow and Tell: A Neural Image Caption Generator\nA Generative Model of Words and Relationships from Multiple Sources\nSentiCap: Generating Image Descriptions with Sentiments\nA dynamical definition of f.g. virtually free groups\nThe generalised word problem in hyperbolic and relatively hyperbolic  groups\nABC-CNN: An Attention Based Convolutional Neural Network for Visual  Question Answering\nDenseCap: Fully Convolutional Localization Networks for Dense Captioning\nLimits to Verification and Validation of Agentic Behavior\nFrame- and Segment-Level Features and Candidate Pool Evaluation for  Video Caption Generation\nExtending Term Subsumption systems for Uncertainty Management\nLearning to Predict from Textual Data\nFinitely Axiomatized Set Theory: a nonclassical first-order theory  implying ZF\nHEPMath 1.4: A Mathematica Package for Semi-Automatic Computations in  High Energy Physics\nA Generative Word Embedding Model and its Low Rank Positive Semidefinite  Solution\nParaphrase Generation from Latent-Variable PCFGs for Semantic Parsing\nImproving Trajectory Modelling for DNN-based Speech Synthesis by using  Stacked Bottleneck Features and Minimum Generation Error Training\nEmpath: Understanding Topic Signals in Large-Scale Text\nWeighted Pushdown Systems with Indexed Weight Domains\nOn 
Improving Informativity and Grammaticality for Multi-Sentence  Compression\nAbstract Program Slicing: an Abstract Interpretation-based approach to  Program Slicing\nA Proof Strategy Language and Proof Script Generation for Isabelle/HOL\nMorphology Generation for Statistical Machine Translation using Deep  Learning Techniques\nProposing Plausible Answers for Open-ended Visual Question Answering\nContent Selection in Data-to-Text Systems: A Survey\nCan Active Memory Replace Attention?\nKnowledge Questions from Knowledge Graphs\nGenerating Sentiment Lexicons for German Twitter\nA Joint Speaker-Listener-Reinforcer Model for Referring Expressions\nModel Theory and Proof Theory of Coalgebraic Predicate Logic\nIntersection Types and Counting\nMaximum-Likelihood Augmented Discrete Generative Adversarial Networks\nRandom vector generation of a semantic space\nLearning to Generate Reviews and Discovering Sentiment\nMachine Comprehension by Text-to-Text Neural Question Generation\nA Unification Algorithm for GP 2 (Long Version)\nText Summarization using Abstract Meaning Representation\nThe E2E Dataset: New Challenges For End-to-End Generation\nUnderstanding State Preferences With Text As Data: Introducing the UN  General Debate Corpus\nOn expansions of non-abelian free groups by cosets of a finite index  subgroup\nModeling Target-Side Inflection in Neural Machine Translation\nDeriving Law-Abiding Instances\nWhat Drives the International Development Agenda? 
An NLP Analysis of the  United Nations General Debate 1970-2016\nAutomating Direct Speech Variations in Stories and Games\nNeural Wikipedian: Generating Textual Summaries from Knowledge Base  Triples\nPushing the Limits of Paraphrastic Sentence Embeddings with Millions of  Machine Translations\nAsking the Difficult Questions: Goal-Oriented Visual Question Generation  via Intermediate Rewards\nMining Precision Interfaces From Query Logs\nTrain Once, Test Anywhere: Zero-Shot Learning for Text Classification\nLanguage and Noise Transfer in Speech Enhancement Generative Adversarial  Network\nA Family of Software Product Lines in Educational Technologies\nJoint Event Detection and Description in Continuous Video Streams\nGenerating Contradictory, Neutral, and Entailing Sentences\nAlgorithmic Differentiation for Domain Specific Languages\nConstant delay algorithms for regular document spanners\nActor-Critic based Training Framework for Abstractive Summarization\nNeural Sketch Learning for Conditional Program Generation\nComputer-Simulation des Wettbewerbs zwischen Sprachen\nHigher-Order Operator Precedence Languages\nPerpetual Adaptation of Software to Hardware: An Extensible Architecture  for Providing Code Optimization as a Central System Service\nAutomatic Generation of CHR Constraint Solvers\nRandom Sentences from a Generalized Phrase-Structure Grammar Interpreter\nEmbeddability and Stresses of Graphs\nImproved evolutionary generation of XSLT stylesheets\nA toolkit for a generative lexicon\nThe Question of Expressiveness in the Generation of Referring  Expressions\nFPGA Based Assembling of Facial Components for Human Face Construction\nFactor frequencies in generalized Thue-Morse words\nGeneralized Schrieffer-Wolff Formalism for Dissipative Systems\nThe finiteness of a group generated by a 2-letter invertible-reversible  Mealy automaton is decidable\nThe boundary is mixed\nExtentability of Automorphisms of Generic Substructures\nAnalyzer and generator for 
Pali\nConnected reversible Mealy automata of prime size cannot generate  infinite Burnside groups\nTopic Sensitive Neural Headline Generation\nGenerating random braids\nSocially-Informed Timeline Generation for Complex Events\nAbstractive Meeting Summarization UsingDependency Graph Fusion\nDroidGen: Constraint-based and Data-Driven Policy Generation for Android\nLearning to generate one-sentence biographies from Wikidata\nSymbolic and Numerical Analysis in General Relativity with Open Source  Computer Algebra Systems\nGeneric expansion and Skolemization in NSOP_1 theories\nGenerating Appealing Brand Names\nDeterministic Non-Autoregressive Neural Sequence Modeling by Iterative  Refinement\nIncorporating Discriminator in Sentence Generation: a Gibbs Sampling  Method\nAbout compression of vocabulary in computer oriented languages\nOgden's Lemma for Regular Tree Languages\nWeighted Automata and Recurrence Equations for Regular Languages\nComputing with Equations\nSyntactic Complexity of Ideal and Closed Languages\nSyntactic Complexity of Prefix-, Suffix-, Bifix-, and Factor-Free  Regular Languages\nRust for functional programmers\nAnnotating Cognates and Etymological Origin in Turkic Languages\nTree Automata and Tree Grammars\nProblems with the use of Web search engines to find results in foreign  languages\nOn the State Complexity of the Reverse of R- and J-trivial Regular  Languages\nModeling Language Variability\nCross-language Learning with Adversarial Neural Networks: Application to  Community Question Answering\nAn Unsupervised Approach for Mapping between Vector Spaces\nMicroplanning with Communicative Intentions: The SPUD System\nBe Your Own Prada: Fashion Synthesis with Structural Coherence\nAnalyse spectrale des textes: détection automatique des frontières  de langue et de discours\nMathematical Properties of Dynamic Systems and the Foundations of  Quantum Theory\nIterative Plan Construction for the Workflow Satisfiability Problem\nExperiments with Three 
Approaches to Recognizing Lexical Entailment\nDeep Sentence Embedding Using Long Short-Term Memory Networks: Analysis  and Application to Information Retrieval\nFabULous Interoperability for ML and a Linear Language\nModelling Requirements for Content Recommendation Systems\nTraining IBM Watson using Automatically Generated Question-Answer Pairs\nI2T2I: Learning Text to Image Synthesis with Textual Data Augmentation\nGenerating Different Story Tellings from Semantic Representations of  Narrative\nPrinciples and Implementation of Deductive Parsing\nA Plan-Based Model for Response Generation in Collaborative  Task-Oriented Dialogues\nAn implemented model of punning riddles\nInducing Probabilistic Grammars by Bayesian Model Merging\nIntegrating Gricean and Attentional Constraints\nWith raised eyebrows or the eyebrows raised ? A Neural Network Approach  to Grammar Checking for Definiteness\nNew Methods, Current Trends and Software Infrastructure for NLP\nGrapheme-to-Phoneme Conversion using Multiple Unbounded Overlapping  Chunks\nThree New Probabilistic Models for Dependency Parsing: An Exploration\nTowards a single proposal is spelling correction\nMoney and Goldstone modes\nUsing Local Optimality Criteria for Efficient Information Retrieval with  Redundant Information Filters\nComputation in an algebra of test selection criteria\nAxiomatizing Causal Reasoning\nA Classification Approach to Word Prediction\nCollecting Graphical Abstract Views of Mercury Program Executions\nAbductive reasoning with temporal information\nSoundness, Idempotence and Commutativity of Set-Sharing\nTowards Solving the Interdisciplinary Language Barrier Problem\nNon-Termination Inference of Logic Programs\nAn Anthological Review of Research Utilizing MontyLingua, a Python-Based  End-to-End Text Processor\nOn the Higher-Order Derivatives of Spectral Functions: Two Special Cases\nThe model completion of the theory of modules over finitely generated  commutative algebras\nComputation in Finitary 
Stochastic and Quantum Processes\nUsing Synchronic and Diachronic Relations for Summarizing Multiple  Documents Describing Evolving Events\nGetting More From Your Multicore: Exploiting OpenMP for Astronomy\nCLAIRLIB Documentation v1.03\nMeasurements and confluence in quantum lambda calculi with explicit  qubits\nSP2Bench: A SPARQL Performance Benchmark\nAn Object-Oriented and Fast Lexicon for Semantic Generation\nInterpretations of the Web of Data\nThe Usefulness of Multilevel Hash Tables with Multiple Hash Functions in  Large Databases\nHybrid Rules with Well-Founded Semantics\nTowards Multimodal Content Representation\nAutomatic Modular Abstractions for Template Numerical Constraints\nTransformations of Logic Programs on Infinite Lists\nUniversal Numeric Segmented Display\nMantis: Predicting System Performance through Program Analysis and  Modeling\nGroups defined by automata\nRational subsets of groups\nAn ER-based Framework for Declarative Web Programming\nInverse problems of symbolic dynamics\nA graphical environment to express the semantics of control systems\nSolving the TTC 2011 Compiler Optimization Task with metatools\nAn Accurate Arabic Root-Based Lemmatizer for Information Retrieval  Purposes\nTowards A Generic Formal Framework for Access Control Systems\nParaiso : An Automated Tuning Framework for Explicit Solvers of Partial  Differential Equations\nTowards a Generic Trace for Rule Based Constraint Reasoning\nTraductor Writing System Web\nBisimulation of Labeled State-to-Function Transition Systems of  Stochastic Process Languages\nGenerating events with style\nSMCHR: Satisfiability Modulo Constraint Handling Rules\nAutomating rule generation for grammar checkers\nShape from sound: toward new tools for quantum gravity\nStatic Analysis for Regular Expression Denial-of-Service Attacks\nA Language for Planning with Statistics\nDLOLIS-A: Description Logic based Text Ontology Learning\nMirrorShard: Proof by Computational Reflection with Verified 
Hints\nInfinite probability computation by cyclic explanation graphs\nOn Sound Compilation of Reals\nAbstract interpretation-based approaches to Security - A Survey on  Abstract Non-Interference and its Challenging Applications\nPropagating Regular Counting Constraints\nWaterfall: Primitives Generation on the Fly\nOne Quantifier Alternation in First-Order Logic with Modular Predicates\nOn Coinductive Equivalences for Higher-Order Probabilistic Functional  Programs (Long Version)\nOpacity with Orwellian Observers and Intransitive Non-interference\nReformulating the Situation Calculus and the Event Calculus in the  General Theory of Stable Models and in Answer Set Programming\nClassifying Fonts and Calligraphy Styles Using Complex Wavelet Transform\nAxiomatizing Causal Reasoning\nOn the Complexity of Optimization Problems based on Compiled NNF  Representations\nControl Improvisation\nFIFTH system for general-purpose connectionist computation\nNovel symmetries in an interacting N = 2 supersymmetric quantum  mechanical model\nMechanically Verified Calculational Abstract Interpretation\nDiscriminative Segmental Cascades for Feature-Rich Phone Recognition\nEESEN: End-to-End Speech Recognition using Deep RNN Models and  WFST-based Decoding\nVector Reachability Problem in $\\mathrm{SL}(2,\\mathbb{Z})$\nOn insertion-deletion systems over relational words\nData optimizations for constraint automata\nInference in Probabilistic Logic Programs using Lifted Explanations\nMeasuring Machine Intelligence Through Visual Question Answering\nAn Extension of Parikh's Theorem beyond Idempotence\nGPU Scripting and Code Generation with PyCUDA\nAutomated, Credible Autocoding of An Unmanned Aggressive Maneuvering Car  Controller\nRecursive Neural Networks Can Learn Logical Semantics\nA Requirements Modeling Language for the Component Behavior of Cyber  Physical Robotics Systems\nMulti-Platform Generative Development of Component & Connector Systems  using Model and Code Libraries\nEnsemble 
of Generative and Discriminative Techniques for Sentiment  Analysis of Movie Reviews\nModeling Compositionality with Multiplicative Recurrent Neural Networks\nVideo (language) modeling: a baseline for generative models of natural  videos\nStatistical laws in linguistics\nCombined Top-down and Bottom-up Approach to Multilevel Supervisory  Control\nCan JSP Code be Generated Using XML Tags?\nAutocorrelated errors in experimental data in the language sciences:  Some solutions offered by Generalized Additive Mixed Models\nSmoothing parameter estimation framework for IBM word alignment models\nResource Constrained Structured Prediction\nIntegrated Sequence Tagging for Medieval Latin Using Deep Representation  Learning\nVariational Neural Discourse Relation Recognizer\nSequence-to-Sequence Learning as Beam-Search Optimization\nIntelligent audit code generation from free text in the context of  neurosurgery\nShow and Tell: Lessons learned from the 2015 MSCOCO Image Captioning  Challenge\nEquation Parsing: Mapping Sentences to Grounded Equations\nAutomated Generation of Multilingual Clusters for the Evaluation of  Distributed Representations\nUnsupervised Pretraining for Sequence to Sequence Learning\nBootstrapping incremental dialogue systems: using linguistic knowledge  to learn from minimal data\nDefinition Modeling: Learning to define word embeddings in natural  language\nUsability Investigation on the Localization of Text CAPTCHAs: Take  Chinese Characters as a Case Study\nContext-aware Sentiment Word Identification: sentiword2vec\nEffects of Stop Words Elimination for Arabic Information Retrieval: A  Comparative Study\nSpecialization of Generic Array Accesses After Inlining\nGame-theoretic Model of Computation\nCustom Hypergraph Categories via Generalized Relations\nTransfer Learning for Sequence Tagging with Hierarchical Recurrent  Networks\nPrecision Interfaces\nOnline Spatial Concept and Lexical Acquisition with Simultaneous  Localization and Mapping\nUnderstanding 
Task Design Trade-offs in Crowdsourced Paraphrase  Collection\nPeople on Drugs: Credibility of User Statements in Health Communities\nProgram Induction by Rationale Generation : Learning to Solve and  Explain Algebraic Word Problems\nFast-Slow Recurrent Neural Networks\nControllable Invariance through Adversarial Feature Learning\nSynergistic Union of Word2Vec and Lexicon for Domain Specific Semantic  Similarity\nZero-Shot Relation Extraction via Reading Comprehension\nCharManteau: Character Embedding Models For Portmanteau Creation\nThe Influence of Feature Representation of Text on the Performance of  Document Classification\nProgram Completionin the Input Language of GRINGO\nMimicking Word Embeddings using Subword RNNs\nA New Modal Framework for Epistemic Logic\nOnline Deception Detection Refueled by Real World Data Collection\nSafety Verification of Phaser Programs\nActive Learning of Input Grammars\nPVSC-DTM: A domain-specific language and matrix-free stencil code for  investigating electronic properties of Dirac and topological materials\nSemantic Preserving Embeddings for Generalized Graphs\nAI Programmer: Autonomously Creating Software Programs Using Genetic  Algorithms\nHUMOR: A Crowd-Annotated Spanish Corpus for Humor Analysis\nA Novel Approach to Artistic Textual Visualization via GAN\nA Survey on Dialogue Systems: Recent Advances and New Frontiers\nLearning Robust Dialog Policies in Noisy Environments\nOn tractable query evaluation for SPARQL\nByte-Level Recursive Convolutional Auto-Encoder for Text\nDeep Generative Model for Joint Alignment and Word Representation\nGaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA  model\nA Feature-Rich Vietnamese Named-Entity Recognition Model\nGeneric Zero-Cost Reuse for Dependent Types\nScene Graph Parsing as Dependency Parsing\nReconstruction Network for Video Captioning\nThe simple essence of automatic differentiation (Differentiable  functional programming made easy)\nWhere Defaults Don't 
Help: the Case of the German Plural System\nIsometric Lineation in English Texts: An Empirical and Mathematical  Examination of its Character and Consequences\nA Machine-Independent Debugger--Revisited\nAfter Compilers and Operating Systems : The Third Advance in Application  Support\nOn Spatial Conjunction as Second-Order Logic\nThe Self-Organization of Speech Sounds\nExtending Prolog with Incomplete Fuzzy Information\nThe spinning electron: Hidrodynamical formulation, and quantum limit, of  the Barut-Zanghi theory\nA Topos Foundation for Theories of Physics: III. The Representation of  Physical Quantities With Arrows\nDesign and Implementation of a Tracer Driver: Easy and Efficient Dynamic  Analyses of Constraint Logic Programs\nA two-level logic approach to reasoning about computations\nA Homogeneous Reaction Rule Language for Complex Event Processing\nGrounded Symbols in the Brain Computational Foundations for Perceptual  Symbol System\nURSA: A System for Uniform Reduction to SAT\nAnnotated English\nOn minimising automata with errors\nIncremental dimension reduction of tensors with random index\nExtended Initiality for Typed Abstract Syntax\nDeveloping Embodied Multisensory Dialogue Agents\nModeling Languages: metrics and assessing tools\nDiscovering Basic Emotion Sets via Semantic Clustering on a Twitter  Corpus\nSets in homotopy type theory\nDevelopment of Marathi Part of Speech Tagger Using Statistical Approach\nA Model Approach to Build Basic Ontology\nLatent semantics of action verbs reflect phonetic parameters of  intensity and emotional content\nAnalysis of Timed and Long-Run Objectives for Markov Automata\nSpeech earthquakes: scaling and universality in human voice\nThe Hebrew Bible as Data: Laboratory - Sharing - Experiences\nCompositional Distributional Semantics with Compact Closed Categories  and Frobenius Algebras\nVQA: Visual Question Answering\nSimilarity of symbol frequency distributions with heavy tails\nInferring Parametric Energy Consumption 
Functions at Different Software  Levels: ISA vs. LLVM IR\nTGIF: A New Dataset and Benchmark on Animated GIF Description\nWell-Definedness and Efficient Inference for Probabilistic Logic  Programming under the Distribution Semantics\nC Language Extensions for Hybrid CPU/GPU Programming with StarPU\nComplexity Classifications for logic-based Argumentation\nUnderstanding Rulelog Computations in Silk\nA Boolean Algebraic Approach to Semiproper Iterations\nTowards Composable Concurrency Abstractions\nEnhancing R with Advanced Compilation Tools and Methods\nImproving Term Frequency Normalization for Multi-topical Documents, and  Application to Language Modeling Approaches\nDescribing Videos by Exploiting Temporal Structure\nIncremental Computation with Names\nTowards Practical Graph-Based Verification for an Object-Oriented  Concurrency Model\nEffectiveness of Structural Restrictions for Hybrid CSPs\nEdit Distance for Pushdown Automata\nA Fast Compiler for NetKAT\nDeep Learning Applied to Image and Text Matching\nLinguistic neighbourhoods: explaining cultural borders on Wikipedia  through multilingual co-editing activity\nThe Commutativity Problem of the MapReduce Framework: A Transducer-based  Approach\nVisual Question Answering: A Survey of Methods and Datasets\nOn Prefix Normal Words and Prefix Normal Forms\nA Devil's Advocate against Termination of Direct Recursion\nOn Delay and Regret Determinization of Max-Plus Automata\nExact Affine Counter Automata\nHigh-Throughput and Language-Agnostic Entity Disambiguation and Linking  on User Generated Data\nThe possibility of constructing a relativistic space of information  states based on the theory of complexity and analogies with physical  space-time\nReflection calculus and conservativity spectra\nAn Automated Text Categorization Framework based on Hyperparameter  Optimization\nA Concurrency-Agnostic Protocol for Multi-Paradigm Concurrent Debugging  Tools\nCoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological 
Reinflection  in 52 Languages\nAn Interactive Tool for Natural Language Processing on Clinical Text\nReliability and Fault-Tolerance by Choreographic Design\nKnowledge gaps in the early growth of semantic networks\nBootstrapping incremental dialogue systems from minimal data: the  generalisation power of dialogue grammars\nLearning Continuous User Representations through Hybrid Filtering with  doc2vec\nSentiPers: A Sentiment Analysis Corpus for Persian\nEmpirical observations of ultraslow diffusion driven by the fractional  dynamics in languages: Dynamical statistical properties of word counts of  already popular words\nSmart Contracts Software Metrics: a First Study\nJoint Training for Neural Machine Translation Models with Monolingual  Data\nMeta-F*: Metaprogramming and Tactics in an Effectful Program Verifier\nA View-based Programmable Architecture for Controlling and Integrating  Decentralized Data\nA Systematic Review of Automated Grammar Checking in English Language\nFrom Symmetric Pattern-Matching to Quantum Control (Extended Version)\nAlgebraic Aspects of the Fractional Quantum Hall Effect\nBuilding the access pointers to a computation environment\nOn the generalized dining philosophers problem\nProbabilistic Parsing Strategies\nAnalysis of Equality Relationships for Imperative Programs\nDifferential Forms, Hopf Algebra and General Relativity I\nThe future of spin networks\nSome brane theoretic no-hair results (and their field theory duals)\nConceptual issues in combining general relativity and quantum theory\nGeneralized Kahler Geometry from supersymmetric sigma models\nUniform first-order definitions in finitely generated fields\nLaw of Excluded Quantum Gambling Strategies\nApplying Test-Paradigms in a Generic Tutoring System Concept for  Web-based Learning\nAn Approach to Programming Based on Concepts\nAnalytic aspects of the shuffle product\nOverview of some general results in combinatorial enumeration\nOn Theta-palindromic Richness\nTuring Machines on 
Graphs and Inescapable Groups\nL-systems in Geometric Modeling\nWhich finitely generated Abelian groups admit isomorphic Cayley graphs?\nApplications in Enumerative Combinatorics of Infinite Weighted Automata  and Graphs\nPre-processing of Domain Ontology Graph Generation System in Punjabi\nA Diversity-Promoting Objective Function for Neural Conversation Models\nUnifying Ghost-Free Lorentz-Invariant Lagrangians\nEquations for generalized n-point information with extreme and not  extreme approximations in the free Fock space\nGenerating Chinese Classical Poems with RNN Encoder-Decoder\nClassical field theories from Hamiltonian constraint: Symmetries and  conservation laws\nMutual Transformation of Information and Knowledge\nN=1 vacua in Exceptional Generalized Geometry\nA Semantic Approach to Summarization\nA Generic Numbering System based on Catalan Families of Combinatorial  Objects\nA connected 3-state reversible Mealy automaton cannot generate an  infinite Burnside group\nSurvey on Combinatorial Register Allocation and Instruction Scheduling\nCompositional Invariant Generation via Linear Recurrence Analysis\nContact spectral invariants and persistence\nUniform generation in trace monoids\nAn Ensemble method for Content Selection for Data-to-text Systems\nRobust Subgraph Generation Improves Abstract Meaning Representation  Parsing\ndeltaBLEU: A Discriminative Metric for Generation Tasks with  Intrinsically Diverse Targets\nKnapsack in graph groups, HNN-extensions and amalgamated products\nFortran code for generating random probability vectors, unitaries, and  quantum states\nRepresenting Strategic Games and Their Equilibria in Many-Valued Logics\nTopic Modeling Using Distributed Word Embeddings\nDataflow matrix machines as programmable, dynamically expandable,  self-referential generalized recurrent neural networks\nOn a Topic Model for Sentences\nSemantic Parsing with Semi-Supervised Sequential Autoencoders\nGeneralization Bounds for Weighted 
Automata\nMorphological Inflection Generation with Hard Monotonic Attention\nAdversarial Evaluation of Dialogue Models\nAMR-to-text Generation with Synchronous Node Replacement Grammar\nPost-edit Analysis of Collective Biography Generation\nFeature Generation for Robust Semantic Role Labeling\nAbstract Syntax Networks for Code Generation and Semantic Parsing\nA Generative Model of a Pronunciation Lexicon for Hindi\nMusic generation with variational recurrent autoencoder supported by  history\nQuadratic automaton algebras and intermediate growth\nProjective tensor product of protoquantum spaces\nA Joint Model for Question Answering and Question Generation\nNeural Machine Translation with Gumbel-Greedy Decoding\nExplainable Entity-based Recommendations with Knowledge Graphs\nMekler's construction and generalized stability\nGenerating Query Suggestions to Support Task-Based Search\nRobust Speech Recognition Using Generative Adversarial Networks\nWhy Do Neural Dialog Systems Generate Short and Meaningless Replies? 
A  Comparison between Dialog and Translation\nQuery-Based Abstractive Summarization Using Neural Networks\nA Very Short Self-Interpreter\nOn The Liniar Time Complexity of Finite Languages\nA Shrinking Lemma for Indexed Languages\nAn OLAC Extension for Dravidian Languages\nA note on decidability of cellularity\nComplexity in Prefix-Free Regular Languages\nTowards Nominal Formal Languages\nChallenges in Kurdish Text Processing\nReset Complexity of Ideal Languages\nLexpresso: a Controlled Natural Language\nTyping Regular Path Query Languages for Data Graphs\nSyntax and semantics of the weak consistency model specification  language cat\nPermutations of context-free and indexed languages\nThe While language\nDiscriminating between similar languages in Twitter using label  propagation\nRegular Separability of Parikh Automata\nDesigning a pi-based Programming Language in the .NET framework: CLR  interoperability from the Programmer's point of view\nAnonymous Variables in Imperative Languages\nOn the Generation of Test Data for Prolog by Partial Evaluation\nLazy Model Expansion: Interleaving Grounding with Search\nFoam: A General-Purpose Cellular Monte Carlo Event Generator\nIndividual and Domain Adaptation in Sentence Planning for Dialogue\nGeneralized Cayley Graphs and Cellular Automata over them\nA Comparison of Mechanisms for Integrating Handwritten and Generated  Code for Object-Oriented Programming Languages\nPhoneme-level speech and natural language intergration for agglutinative  languages\nLanguage Access: An Information Based Approach\nSimulation of language competition by physicists\nFlux: FunctionaL Updates for XML (extended report)\nAgent Based Models of Language Competition: Macroscopic descriptions and  Order-Disorder transitions\nDeciding Regularity of Hairpin Completions of Regular Languages in  Polynomial Time\nOn the Properties of Language Classes Defined by Bounded Reaction  Automata\nOperator Precedence ω-languages\nReconstructing Native Language 
Typology from Foreign Language Usage\nA Module System for Domain-Specific Languages\nCommutative Languages and their Composition by Consensual Methods\nCross-lingual Dataless Classification for Languages with Small Wikipedia  Presence\nSeparability by Piecewise Testable Languages is PTime-Complete\nWeakly and Strongly Irreversible Regular Languages\nImproved Text Language Identification for the South African Languages\nMultilingual Speech Recognition With A Single End-To-End Model\nA Deep Generative Framework for Paraphrase Generation\nA Script Language for Data Integration in Database\nNLOMJ--Natural Language Object Model in Java\nLanguage embeddings that preserve staging and safety\nA FORTRAN coded regular expression Compiler for IBM 1130 Computing  System\nAn Intuitive Automated Modelling Interface for Systems Biology\nFuzzy Modeling and Natural Language Processing for Panini's Sanskrit  Grammar\ntym: Typed Matlab\nA Proof of the Pumping Lemma for Context-Free Languages Through Pushdown  Automata\nThe separation problem for regular languages by piecewise testable  languages\nTowards Structural Natural Language Formalization: Mapping Discourse to  Controlled Natural Language\nMorphological Analysis of the Bishnupriya Manipuri Language using Finite  State Transducers\nSurvey:Natural Language Parsing For Indian Languages\nPattern Languages as Media for the Creative Society\nAre Style Guides Controlled Languages? 
The Case of Koenig & Bauer AG\nEmbedded Controlled Languages\nModeling Language Variability\nReplacing ANSI C with other modern programming languages\nRegular realizability problems and regular languages\nBottom Up Quotients and Residuals for Tree Languages\nMulti-Way, Multilingual Neural Machine Translation with a Shared  Attention Mechanism\nRestricted deterministic Watson-Crick automata\nThe Controlled Natural Language of Randall Munroe's Thing Explainer\nComplexity of Left-Ideal, Suffix-Closed and Suffix-Free Regular  Languages\nComparative Study Of Data Mining Query Languages\nTree Notation: an antifragile program notation\nDeep Investigation of Cross-Language Plagiarism Detection Methods\nRegularity of non context-free languages over a singleton terminal  alphabet\nOpen-Set Language Identification\nComposition by Conversation\nPhylogenetics of Indo-European Language families via an  Algebro-Geometric Analysis of their Syntactic Structures\nThe Frobenius problem for homomorphic embeddings of languages into the  integers\nThe WiLI benchmark dataset for written language identification\nNeural Lattice Language Models\nMeta-Learning a Dynamical Language Model\nA Topos Foundation for Theories of Physics: II. 
Daseinisation and the  Liberation of Quantum Theory\nClassical and quantum computation with small space bounds (PhD thesis)\nA Synthesis of the Procedural and Declarative Styles of Interactive  Theorem Proving\nFlowchart Programs, Regular Expressions, and Decidability of Polynomial  Growth-Rate\nMathematical Language Processing: Automatic Grading and Feedback for  Open Response Mathematical Questions\nUse of Modality and Negation in Semantically-Informed Syntactic MT\nAutomagically encoding Adverse Drug Reactions in MedDRA\nRace, Religion and the City: Twitter Word Frequency Patterns Reveal  Dominant Demographic Dimensions in the United States\nAn Introduction to Programming for Bioscientists: A Python-based Primer\nAn Analysis of Introductory Programming Courses at UK Universities\nSYSTRAN's Pure Neural Machine Translation Systems\nDescription Languages for Consistency Management Scenarios Based on  Examples from the Industry Automation Domain\nOn the Hierarchy of Block Deterministic Languages\nTopological Entropy of Formal Languages\nImproving Neural Machine Translation with Conditional Sequence  Generative Adversarial Nets\nJointly Modeling Embedding and Translation to Bridge Video and Language\nThe Stochastic Processes Generation in OpenModelica\nPhrase-based Image Captioning with Hierarchical LSTM Model\nAlexander Stratifications of Character Varieties\nIntegration Of Visual Inter-word Constraints And Linguistic Knowledge In  Degraded Text Recognition\nIntention-based Segmentation: Human Reliability and Correlation with  Linguistic Cues\nAn Integrated Heuristic Scheme for Partial Parse Evaluation\nPhoneme Recognition Using Acoustic Events\nThe Role of Cognitive Modeling in Achieving Communicative Intentions\nAutomated Postediting of Documents\nPrinciple Based Semantics for HPSG\nMulti-Dimensional Inheritance\nAlgorithms for Analysing the Temporal Structure of Discourse\nSplitting the Reference Time: Temporal Anaphora and Quantification in  DRT\nPrinciple Based 
Semantics for HPSG\nNLG vs. Templates\nThe intersection of Finite State Automata and Definite Clause Grammars\nResponse Generation in Collaborative Negotiation\nA Symbolic and Surgical Acquisition of Terms through Variation\nIndefeasible Semantics and Defeasible Pragmatics\nComparative Ellipsis and Variable Binding\nA Compositional Treatment of Polysemous Arguments in Categorial Grammar\nParsing with Typed Feature Structures\nText Windows and Phrases Differing by Discipline, Location in Document,  and Syntactic Structure\nMulti-level post-processing for Korean character recognition using  morphological analysis and linguistic evaluation\nProcessing Metonymy: a Domain-Model Heuristic Graph Traversal Approach\nFocus and Higher-Order Unification\nTwo Questions about Data-Oriented Parsing\nIsolated-Word Confusion Metrics and the PGPfone Alphabet\nGenerating Information-Sharing Subdialogues in Expert-User Consultation\nCharts, Interaction-Free Grammars, and the Compact Representation of  Ambiguity\nAttaching Multiple Prepositional Phrases: Generalized Backed-off  Estimation\n\"I don't believe in word senses\"\nForeground and Background Lexicons and Word Sense Disambiguation for  Information Extraction\nAnchoring a Lexicalized Tree-Adjoining Grammar for Discourse\nImproving Data Driven Wordclass Tagging by System Combination\nBeyond the Zipf-Mandelbrot law in quantitative linguistics\nIntermittency and scale-free networks: a dynamical model for human  language complexity\nTime-dependent Density-Matrix Renormalization-Group Methods\nThe descriptive complexity approach to LOGCFL\nName Strategy: Its Existence and Implications\nSupervised Grammar Induction Using Training Data with Limited  Constituent Information\nA Real World Implementation of Answer Extraction\nPlanning with Incomplete Information\nThe (Lazy) Functional Side of Logic Programming\nHow to Evaluate your Question Answering System Every Day and Still Get  Real Work Done\nUsing a Diathesis Model for Semantic 
Parsing\nProcessing Self Corrections in a speech to speech system\nAutomatic Debugging Support for UML Designs\nMulti-dimensional Type Theory: Rules, Categories, and Combinators for  Syntax and Semantics\nA Constrained Object Model for Configuration Based Workflow Composition\nMapping DEVS Models onto UML Models\nRemoving Redundant Arguments Automatically\nImproving Term Extraction with Terminological Resources\nRaisonnement stratifié à base de normes pour inférer les  causes dans un corpus textuel\nDepAnn - An Annotation Tool for Dependency Treebanks\nSASE: Complex Event Processing over Streams\nA Static Analyzer for Large Safety-Critical Software\nSpeeding up Domain Wall Fermion Algorithms using QCDLAB\nDuality and an Operator Realization for the Fermi-Bose Transmutation in  3+1 Dimensions\nGraph-Based Logic and Sketches 1: The General Framework\nSome Combinatorics behind Proofs\nSet theory is interpretable in the automorphism group of a free group\nAnalysis and Synthesis of the Distribution of Consonants over Languages:  A Complex Network Approach\nNon-equilibrium dynamics of language games on complex networks\nOrdering dynamics with two non-excluding options: Bilingualism in  language competition\nLanguages of Quantum Information Theory\nOn parallel composition of zero-knowledge proofs with black-box quantum  simulators\nQuantum-like Representation of Macroscopic Configurations\nSuccess and failure of programming environments - report on the design  and use of a graphic abstract syntax tree editor\nBuilding Rules on Top of Ontologies for the Semantic Web with Inductive  Logic Programming\nTCHR: a framework for tabled CLP\nIndirect Object Representation and Access by Means of Concepts\nAn application of the Deutsch-Josza algorithm to formal languages and  the word problem in groups\nEnsuring Spreadsheet Integrity with Model Master\nA Logic Programming Framework for Combinational Circuit Synthesis\nConception et Evaluation de XQuery dans une architecture de 
médiation  \"Tout-XML\"\nUnfolding in CHR\nQuantum Feedback Control: How to use Verification Theorems and Viscosity  Solutions to Find Optimal Protocols\nThe Prolog Interface to the Unstructured Information Management  Architecture\nBinding bigraphs as symmetric monoidal closed theories\nAutomatic Modular Abstractions for Linear Constraints\nA Spectral Algorithm for Learning Hidden Markov Models\nFiltering Microarray Correlations by Statistical Literature Analysis  Yields Potential Hypotheses for Lactation Research\nAutomatic Summarization System coupled with a Question-Answering System  (QAAS)\nDiagrams for Symmetric Product Orbifolds\nNormalized Web Distance and Word Similarity\nMultidimensional Generalized Automatic Sequences and Shape-symmetric  Morphic Words\nN-tuple Zipf Analysis and Modeling for Language, Computer Program and  DNA\nWild Card Queries for Searching Resources on the Web\nMultiple Retrieval Models and Regression Models for Prior Art Search\nAutomatic modular abstractions for template numerical constraints\nOn equations over sets of integers\nAlgebraic Linear Orderings\nDevelopment of a multi-user handwriting recognition system using  Tesseract open source OCR engine\nDevelopment of a Multi-User Recognition Engine for Handwritten Bangla  Basic Characters and Digits\nRankers over Infinite Words\nNetwork analysis of a corpus of undeciphered Indus civilization  inscriptions indicates syntactic organization\nUsing Soft Constraints To Learn Semantic Models Of Descriptions Of  Shapes\nVideo Event Recognition for Surveillance Applications (VERSA)\nPushdown Control-Flow Analysis of Higher-Order Programs\nPropositional Dynamic Logic for Message-Passing Systems\nAccepting Hybrid Networks of Evolutionary Processors with Special  Topologies and Small Communication\nRuntime-Flexible Multi-dimensional Arrays and Views for C++98 and C++0x\nQuivers of monoids with basic algebras\nWhat can we say about nature?\nMeasuring Performance of Continuous-Time Stochastic 
Processes using  Timed Automata\nSymmetry-Aware Predicate Abstraction for Shared-Variable Concurrent  Programs (Extended Technical Report)\nXMLlab : multimedia publication of simulations applets using XML and  Scilab\nJavaCtx: Seamless Toolchain Integration for Context-Oriented Programming\nSimple, Decidable Type Inference with Subtyping\nFaire levier sur les architectures logicielles pour guider et vérifier  le développement d'applications SCC\nCompiling Causal Theories to Successor State Axioms and STRIPS-Like  Systems\nOn the origin of ambiguity in efficient communication\nForOpenCL: Transformations Exploiting Array Syntax in Fortran for  Accelerator Programming\n(Co-)Inductive semantics for Constraint Handling Rules\nCross-moments computation for stochastic context-free grammars\nLeveraging Software Architectures to Guide and Verify the Development of  Sense/Compute/Control Applications\nA Probabilistic Approach to Pronunciation by Analogy\nGroups whose geodesics are locally testable\nConfidence Estimation in Structured Prediction\nSimulations of Dense Stellar Systems with the AMUSE Software Toolkit\nProgram Understanding: A Reengineering Case for the Transformation Tool  Contest\nRamified Structural Recursion and Corecursion\nAn OpenCL implementation for the solution of TDSE on GPU and CPU  architectures\nParametric Compositional Data Types\nAD in Fortran, Part 1: Design\nAutomated Feedback Generation for Introductory Programming Assignments\nTowards Real-Time Summarization of Scheduled Events from Twitter Streams\nTopSig: Topology Preserving Document Signatures\nAutomatic Generation of C-code or PLD Circuits under SFC Graphical  Environment\nRefining Inductive Types\nModel Driven Mutation Applied to Adaptative Systems Testing\nModel Checking Stochastic Branching Processes\nTime Warp on the Go (Updated Version)\nSoftware Verification and Graph Similarity for Automated Evaluation of  Students' Assignments\nOn the origin of long-range correlations in 
texts\nTowards Algorithmic Synthesis of Synchronization for Shared-Memory  Concurrent Programs\nInfo-Computationalism and Philosophical Aspects of Research in  Information Sciences\nAlan Turing's Legacy: Info-Computational Philosophy of Nature\nExploiting First-Order Regression in Inductive Policy Selection\nOn the specification of operations on the rational behaviour of systems\nTuring machines based on unsharp quantum logic\nUnderapproximation of Procedure Summaries for Integer Programs\nReversible Christoffel factorizations\nWiSANCloud: a set of UML-based specifications for the integration of  Wireless Sensor and Actor Networks (WSANs) with the Cloud Computing\nOn the Use of Underspecified Data-Type Semantics for Type Safety in  Low-Level Code\nDevelopment of an Astrophysical Specific Language for Big Data  Computation\nContinuous Time Bayesian Networks\nToward the Automatic Generation of a Semantic VRML Model from  Unorganized 3D Point Clouds\nInferring Informational Goals from Free-Text Queries: A Bayesian  Approach\nSilent Transitions in Automata with Storage\nBilingual Terminology Extraction Using Multi-level Termhood\nEffect of Query Formation on Web Search Engine Results\nRevisiting the Equivalence Problem for Finite Multitape Automata\nThe Case for Explicit Coupling Constraints\nRefining SCJ Mission Specifications into Parallel Handler Designs\nBayesian State-Space Modelling on High-Performance Hardware Using LibBi\nAn Effect System for Algebraic Effects and Handlers\nUnfolding for CHR programs\nReading Stockholm Riots 2013 in social media by text-mining\nArbitrary Sequence RAMs\nA Preadapted Universal Switch Distribution for Testing Hilberg's  Conjecture\nStreaMon: a data-plane programming abstraction for Software-defined  Stream Monitoring\nWhen Equivalence and Bisimulation Join Forces in Probabilistic Automata\nQuery Segmentation for Relevance Ranking in Web Search\nA semi-automatic semantic method for mapping SNOMED CT concepts to VCM  Icons\nStatic 
Application-Level Race Detection in STM Haskell using Contracts\nWord Emdeddings through Hellinger PCA\nTowards A Domain-specific Language For Pick-And-Place Applications\nSubsumption Checking in Conjunctive Coalgebraic Fixpoint Logics\nClassical realizability and arithmetical formulæ\nQuerying Geometric Figures Using a Controlled Language, Ontological  Graphs and Dependency Lattices\nExecutable Refinement Types\nDecentralized Supervisory Control with Communicating Supervisors Based  on Top-Down Coordination Control\nLexicon Infused Phrase Embeddings for Named Entity Resolution\nPerformance of Python runtimes on a non-numeric scientific code\nSPEEDY: An Eclipse-based IDE for invariant inference\nOpen induction in a bounded arithmetic for TC^0\nInitial Comparison of Linguistic Networks Measures for Parallel Texts\nAn Expert System for Automatic Reading of A Text Written in Standard  Arabic\nCoordinate System Selection for Minimum Error Rate Training in  Statistical Machine Translation\nInterprocedural Reachability for Flat Integer Programs\nLarge Code Base Change Ripple Management in C++: My thoughts on how a  new Boost C++ Library could help\nProbabilistic Alias Analysis for Parallel Programming in SSA Forms\nLoo.py: transformation-based code generation for GPUs and CPUs\nKNET: A General Framework for Learning Word Embedding using  Morphological Knowledge\nAlgebras of Open Dynamical Systems on the Operad of Wiring Diagrams\nA Tentative Role for FOXP2 in the Evolution of Dual Processing Modes and  Generative Abilities\nWhat Java Developers Know About Compatibility, And Why This Matters\nA Hoare-like logic of asserted single-pass instruction sequences\nFinite Automata for the Sub- and Superword Closure of CFLs:  Descriptional and Computational Complexity\nRepresentations of categories of G-maps\nOn-the-fly Probabilistic Model Checking\nA Semantic Web of Know-How: Linked Data for Community-Centric Tasks\nThe Bayesian Echo Chamber: Modeling Social Influence via 
Linguistic  Accommodation\nCIDEr: Consensus-based Image Description Evaluation\nHard to Cheat: A Turing Test based on Answering Questions about Images\nRDF Validation Requirements - Evaluation and Logical Underpinning\nModelling and Verifying an Object-Oriented Concurrency Model in GROOVE\nExposing ambiguities in a relation-extraction gold standard with  crowdsourcing\nGenerating Navigable Semantic Maps from Social Sciences Corpora\nNeural CRF Parsing\nAn Open Challenge Problem Repository for Systems Supporting Binders\nTranslating Hierarchical Block Diagrams into Composite Predicate  Transformers\nTowards Patterns for Heaps and Imperative Lambdas\nExplicit Knowledge-based Reasoning for Visual Question Answering\nInterprocedural Type Specialization of JavaScript Programs Without Type  Analysis\nA Deep Architecture for Semantic Matching with Multiple Positional  Sentence Representations\nResource theories of knowledge\nLinearly Typed Dyadic Group Sessions for Building Multiparty Sessions\nParallelizing Word2Vec in Shared and Distributed Memory\nExpressive Completeness of Existential Rule Languages for Ontology-based  Query Answering\nRobsut Wrod Reocginiton via semi-Character Recurrent Neural Network\nTemporal Attention Model for Neural Machine Translation\nProbabilistic Data Analysis with Probabilistic Programming\nKepler's Differential Equations\nComputational Aspects of Reordering Plans\nReasoning about Actions with Temporal Answer Sets\nRandom Context and Semi-Conditional Insertion-Deletion Systems\nAutomatic case acquisition from texts for process-oriented case-based  reasoning\nConceptual Understanding of Computer Program Execution: Application to  C++\nEvolution in a Changing Environment\nLexical State Analyzer\nThe infinite random simplicial complex\nThue's 1914 paper: a translation\nDevelopment of a language and its enacting engine for the unified  discovery of heterogeneous services\nFormal Specification Language Based IaaS Cloud Workload Regression  
Analysis\nExercise: +-1 bug and center of an array problem\nHandling non-compositionality in multilingual CNLs\nThe Links Have It: Infobox Generation by Summarization over Linked  Entities\nJabalin: a Comprehensive Computational Model of Modern Standard Arabic  Verbal Morphology Based on Traditional Arabic Prosody\nA Pragmatic Interpretation of Quantum Logic\nWeak and Nested Class Memory Automata\nPruning, Pushdown Exception-Flow Analysis\nTaking into Account the Differences between Actively and Passively  Acquired Data: The Case of Active Learning with Support Vector Machines for  Imbalanced Datasets\nToward Refactoring of DMARF and GIPSY Case Studies -- a Team 12  SOEN6471-S14 Project Report\nImproved Semantic Representations From Tree-Structured Long Short-Term  Memory Networks\nExploring Cultures through Pattern Mining - Practices from Generative  Beauty Workshops\nEnd-To-End Memory Networks\nOn the Stability of Online Language Features: How Much Text do you Need  to know a Person?\nCompositional Vector Space Models for Knowledge Base Completion\nParsing Linear Context-Free Rewriting Systems with Fast Matrix  Multiplication\nLearning to Transduce with Unbounded Memory\nHow Scale Affects Structure in Java Programs\nApplying Deep Learning to Answer Selection: A Study and An Open Task\nImage Representations and New Domains in Neural Image Captioning\nA High-Level Modeling Language for the Efficient Design, Implementation,  and Testing of Android Applications\nWord, graph and manifold embedding from Markov processes\nAutomatic Dialect Detection in Arabic Broadcast Speech\nProbabilistic Output Analysis by Program Manipulation\nDistribution-based Bisimulation and Bisimulation Metric in Probabilistic  Automata\nNonparametric Bayesian Storyline Detection from Microtexts\nSound and Complete Bidirectional Typechecking for Higher-Rank  Polymorphism with Existentials and Indexed Types\nContextual LSTM (CLSTM) models for Large scale NLP tasks\nImproving Named Entity 
Recognition for Chinese Social Media with Word  Segmentation Representation Learning\nSession Types in a Linearly Typed Multi-Threaded Lambda-Calculus\nImage Captioning with Semantic Attention\nSymbolic Reachability Analysis of B through ProB and LTSmin\nIncorporating Copying Mechanism in Sequence-to-Sequence Learning\nRecurrent Neural Network Encoder with Attention for Community Question  Answering\nShirtless and Dangerous: Quantifying Linguistic Signals of Gender Bias  in an Online Fiction Writing Community\nImproving Image Captioning by Concept-based Sentence Reranking\nMachine Learning Techniques with Ontology for Subjective Answer  Evaluation\nTopological language for RNA\nLog-linear Combinations of Monolingual and Bilingual Neural Machine  Translation Models for Automatic Post-Editing\nAttention Correctness in Neural Image Captioning\nDependency Parsing as Head Selection\nZoneout: Regularizing RNNs by Randomly Preserving Hidden Activations\nCultural Shift or Linguistic Drift? Comparing Two Computational Measures  of Semantic Change\nStructured Factored Inference: A Framework for Automated Reasoning in  Probabilistic Programming Languages\nDeep Reinforcement Learning with a Combinatorial Action Space for  Predicting Popular Reddit Threads\nFair Simulation for Nondeterministic and Probabilistic Buechi Automata:  a Coalgebraic Perspective\nAutomatic Pronunciation Generation by Utilizing a Semi-supervised Deep  Neural Networks\nSummarizing Decisions in Spoken Meetings\nLifted Rule Injection for Relation Embeddings\nAutomatic Generation of Probabilistic Programming from Time Series Data\nModelling movement for collective adaptive systems with CARMA\nRemoving Unnecessary Variables from Horn Clause Verification Conditions\nNeural Discourse Modeling of Conversations\nThe Actias system: supervised multi-strategy learning paradigm using  categorical logic\nTwitter-Network Topic Model: A Full Bayesian Treatment for Social  Network and Text Modeling\nLearning 
Spatial-Semantic Context with Fully Convolutional Recurrent  Network for Online Handwritten Chinese Text Recognition\nPersistent Contextual Values as Inter-Process Layers\nPre-Translation for Neural Machine Translation\nVirtual Embodiment: A Scalable Long-Term Strategy for Artificial  Intelligence Research\nDependent Types in Haskell: Theory and Practice\nDistraction-Based Neural Networks for Document Summarization\nOrdinal Common-sense Inference\nBalotage in Argentina 2015, a sentiment analysis of tweets\nDependency Sensitive Convolutional Neural Networks for Modeling  Sentences and Documents\nRecurrent Neural Network based Part-of-Speech Tagger for Code-Mixed  Social Media Text\nOntology Driven Disease Incidence Detection on Twitter\nTime Series Structure Discovery via Probabilistic Program Synthesis\nNeural Machine Translation with Latent Semantic of Image and Text\nRevisiting the Futamura Projections: A Diagrammatic Approach\nFlu Detector: Estimating influenza-like illness rates from online  user-generated content\nText-guided Attention Model for Image Captioning\nRecurrent Image Captioner: Describing Images with Spatial-Invariant  Transformation and Attention Filtering\nReducing Nondeterministic Tree Automata by Adding Transitions\nStructured Sequence Modeling with Graph Convolutional Recurrent Networks\nUnderstanding Neural Networks through Representation Erasure\nA Typeful Integration of SQL into Curry\nA Convenient Category for Higher-Order Probability Theory\nRUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain  Dialog Systems\nDeep Probabilistic Programming\nPushing for weighted tree automata\nEasyInterface: A toolkit for rapid development of GUIs for research  prototype tools\nStructured Attention Networks\nIterative Multi-document Neural Attention for Multiple Answer Prediction\nTrainable Greedy Decoding for Neural Machine Translation\nExploiting Domain Knowledge via Grouped Weight Sharing with Application  to Text 
Categorization\nAutomatic Rule Extraction from Long Short Term Memory Networks\nLearning Concept Embeddings for Efficient Bag-of-Concepts Densification\nOn the Boundary between Decidability and Undecidability of Asynchronous  Session Subtyping\nUsing Graphs of Classifiers to Impose Declarative Constraints on  Semi-supervised Learning\nVocabulary Alignment in Openly Specified Interactions\nVQABQ: Visual Question Answering by Basic Questions\nMétodos de Otimização Combinatória Aplicados ao Problema de  Compressão MultiFrases\nLearning to Predict: A Fast Re-constructive Method to Generate  Multimodal Embeddings\nFrames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems\nAligned Image-Word Representations Improve Inductive Transfer Across  Vision-Language Tasks\nA Transition-Based Directed Acyclic Graph Parser for UCCA\nAutomatic Measurement of Pre-aspiration\nMulti-space Variational Encoder-Decoders for Semi-supervised Labeled  Sequence Transduction\nA Constrained Sequence-to-Sequence Neural Model for Sentence  Simplification\nHiFrames: High Performance Data Frames in a Scripting Language\nCross-domain Semantic Parsing via Paraphrasing\nA Semantic QA-Based Approach for Text Summarization Evaluation\nUniversality of Confluent, Self-Loop Deterministic Partially Ordered  NFAs is Hard\nAn expressive completeness theorem for coalgebraic modal mu-calculi\nThe Forgettable-Watcher Model for Video Question Answering\nData Readiness Levels\nVariations of Checking Stack Automata: Obtaining Unexpected Decidability  Properties\nLatent Intention Dialogue Models\nEmergent Communication in a Multi-Modal, Multi-Step Referential Game\nDeep learning for extracting protein-protein interactions from  biomedical literature\nAssessing the Linguistic Productivity of Unsupervised Deep Neural  Networks\nA Mention-Ranking Model for Abstract Anaphora Resolution\nCompositional Hoare-style Reasoning about Hybrid CSP in the Duration  Calculus\nActor-Critic Sequence Training for Image 
Captioning\nIntroducing libeemd: A program package for performing the ensemble  empirical mode decomposition\nComplexity Metric for Code-Mixed Social Media Text\nNew theories of relativistic hydrodynamics in the LHC era\nRecalling a Witness: Foundations and Applications of Monotonic State\nMDNet: A Semantically and Visually Interpretable Medical Image Diagnosis  Network\nTowards Crafting Text Adversarial Samples\nComputerized Adaptive Testing Simulation Through the Package catsim\nCUNI System for the WMT17 Multimodal Translation Task\nHigh-risk learning: acquiring new word vectors from tiny data\nWhy We Need New Evaluation Metrics for NLG\nSplit and Rephrase\nHierarchical Embeddings for Hypernymy Detection and Directionality\nReinforcement Learning for Bandit Neural Machine Translation with  Simulated Human Feedback\nSPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO  Data Set\nEnforcing Constraints on Outputs with Unconstrained Inference\nTensor Networks in a Nutshell\nSemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and  Cross-lingual Focused Evaluation\nAn Investigation into the Pedagogical Features of Documents\ne-QRAQ: A Multi-turn Reasoning Dataset and Simulator with Explanations\nLearning to Paraphrase for Question Answering\nDyck Words, Lattice Paths, and Abelian Borders\nNonmalleable Information Flow: Technical Report\nInvestigating how well contextual features are captured by  bi-directional recurrent neural network models\nTrace-Based Run-time Analysis of Message-Passing Go Programs\nAffective Neural Response Generation\nMethod for Aspect-Based Sentiment Annotation Using Rhetorical Analysis\nImproving Opinion-Target Extraction with Character-Level Word Embeddings\nLearning Domain-Specific Word Embeddings from Sparse Cybersecurity Texts\nLearning Distributions of Meant Color\nSemantic keyword spotting by learning from images and speech\nSelf-adaptive static analysis\nVerb Pattern: A Probabilistic Semantic Representation 
on Verbs\nA software framework for pipelined arithmetic algorithms in field  programmable gate arrays\nConceptual Text Summarizer: A new model in continuous vector space\nA Model-Based Approach to Security Analysis for Cyber-Physical Systems\nTowards Neural Machine Translation with Partially Aligned Corpora\nNon-Autoregressive Neural Machine Translation\nReinforcement Learning of Speech Recognition System Based on Policy  Gradient and Hypothesis Selection\nBayesian Paragraph Vectors\nTowards Automated ICD Coding Using Deep Learning\nCommonsense LocatedNear Relation Extraction\nWeakly-supervised Semantic Parsing with Abstract Examples\nAutomatically Extracting Action Graphs from Materials Science Synthesis  Procedures\nEvent Representations with Tensor-based Compositions\nApplication of Natural Language Processing to Determine User  Satisfaction in Public Services\nThe Intersection Problem for Finite Monoids\nThe Impact of an AirBnb Host's Listing Description 'Sentiment' and  Length On Occupancy Rates\nAcronym Disambiguation: A Domain Independent Approach\nExtracting Automata from Recurrent Neural Networks Using Queries and  Counterexamples\nIncorporating External Knowledge to Answer Open-Domain Visual Questions  with Dynamic Memory Networks\nStudying tidal effects in planetary systems with Posidonius. 
A N-body  simulator written in Rust\nAWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus\nStrong Disorder Real-Space Renormalization for the Many-Body-Localized  phase of random Majorana models\nRecursive Programs for Document Spanners\nMapping to Declarative Knowledge for Word Problem Solving\nUnifying Theories of Time with Generalised Reactive Processes\nInvestigating the Working of Text Classifiers\nStructured Triplet Learning with POS-tag Guided Attention for Visual  Question Answering\nAutomatically Leveraging MapReduce Frameworks for Data-Intensive  Applications\nCall-by-name Gradual Type Theory\nAn Energy-aware Mutation Testing Framework for EAST-ADL Architectural  Models\nFormal Verification of Spacecraft Control Programs Using a Metalanguage  for State Transformers\nCode Reuse With Transformation Objects\nUnbounded Software Model Checking with Incremental SAT-Solving\nSentence Boundary Detection for French with Subword-Level Information  Vectors and Convolutional Neural Networks\nGlobal-scale phylogenetic linguistic inference from lexical resources\nA Type-Based Complexity Analysis of Object Oriented Programs\nLearning Approximate Inference Networks for Structured Prediction\nExperiments with Neural Networks for Small and Large Scale Authorship  Verification\nActor and Action Video Segmentation from a Sentence\nAggression-annotated Corpus of Hindi-English Code-mixed Data\nExecutable Operational Semantics of Solidity\nMachine Learning of Generic and User-Focused Summarization\nThe Rough Guide to Constraint Propagation\nFinite-State Non-Concatenative Morphotactics\nSoft Scheduling\nOrganizing Encyclopedic Knowledge based on the Web and its Application  to Question Answering\nAn object evaluator to generate flexible applications\nA Sequential Model for Multi-Class Classification\nThe Use of Classifiers in Sequential Inference\nUnsupervised Learning of Morphology without Morphemes\nType Inference for Guarded Recursive Data Types\nA 
Generalized Two-Phase Analysis of Knowledge Flows in Security  Protocols\nFormalizing typical crosscutting concerns\nAutomatically generating Feynman rules for improved lattice field  theories\nGamma-ray bursts and the sociology of science\nSymmetries and the Antibracket\nSpin(7) holonomy manifold and Superconnection\nFive dimensional 2-branes from special Lagrangian wrapped M5-branes\nRecursive method to obtain the parametric representation of a generic  Feynman diagram\nFirst-order supersymmetric sigma models and target space geometry\nOn N=8 attractors\nGeneralized Riemann-Hilbert Transmission and Boundary Value Problems,  Fredholm Pairs and Bordisms\nRepresentations of marked quivers\nGeodesics in the braid group on three strands\nBraid semistatistics and doubly regular R-matrix\nParallel transport of $Hom$-complexes and the Lovasz conjecture\nLocalization properties of highly singular generalized functions\nLevelScheme: A level scheme drawing and scientific figure preparation  system for Mathematica\nFirst Order Calculi with Values in Right--Universal Bimodules\nStructured psychosocial stress and therapeutic failure\nQuantum Stochastic Generators\nCorrections to the generalized vector dominance due to diffractive rho_3  production\nGeneralized Cauchy identities, trees and multidimensional Brownian  motions. 
Part II: Combinatorial differential calculus\nTwo physical characteristics of numerical apparent horizons\nOn the Kummer construction\nDownfolded Self-Energy of Many-Electron Systems\nOptimizing Binary Code Produced by Valgrind (Project Report on Virtual  Execution Environments Course - AVExe)\nCompetition and fragmentation: a simple model generating lognormal-like  distributions\nBounds for the discrete correlation of infinite sequences on k symbols  and generalized Rudin-Shapiro sequences\nCreating modular and reusable DSL textual syntax definitions with  Grammatic/ANTLR\nTime as an Illusion\nAbout raising and handling exceptions\nA standard transformation from XML to RDF via XSLT\nGeneral combination rules for qualitative and quantitative beliefs\nTowards a General Definition of Biometric Systems\nTowards a Semantic Preservation System\nTowards a Holographic Description of Inflation and Generation of  Fluctuations from Thermodynamics\nNonlinear Realization of Spontaneously Broken N=1 Supersymmetry  Revisited\nConstruction of minimal DFAs from biological motifs\nSimulation vs. 
Equivalence\nGravitational Chern-Simons and the adiabatic limit\nInfinity in computable probability\nThe GHZ/W-calculus contains rational arithmetic\nGeneralized Communicating P Systems Working in Fair Sequential Model\nDynamical generalizations of the Lagrange spectrum\nGenerative Prior Knowledge for Discriminative Classification\nStrictness of the Collapsible Pushdown Hierarchy\nOn a Formulation of Qubits in Quantum Field Theory\nFixed points of endomorphisms of virtually free groups\nConstructive version of Boolean algebra\nRule-weighted and terminal-weighted context-free grammars have identical  expressivity\nA Characterization of Cellular Automata Generated by Idempotents on the  Full Shift\nBADREX: In situ expansion and coreference of biomedical abbreviations  using dynamic regular expressions\nModeling Temporal Dependencies in High-Dimensional Sequences:  Application to Polyphonic Music Generation and Transcription\nThe three-state toric homogeneous Markov chain model has Markov degree  two\nIdentification of Fertile Translations in Medical Comparable Corpora: a  Morpho-Compositional Approach\nMixed integrals and related inequalities\nBinary Tree Arithmetic with Generalized Constructors\nTowards an Application of Update Propagation on Logic Programs  Representing Java Source Code\nTwo SVDs produce more focal deep learning representations\nExplaining Zipf's Law via Mental Lexicon\nOn the toric ideal of a matroid\nNonstandard techniques and nowhere differentiable functions I: A dense  family of generalized blancmange functions\nKleene Algebras and Semimodules for Energy Problems\nQuantum flights\nOnline Classification Using a Voted RDA Method\nA characterization of those automata that structurally generate finite  groups\nOn generalization of reversible second-order cellular automata\nBiquaternion formulation of relativistic tensor dynamics\nGenerating Music from Literature\nParsing using a grammar of word association vectors\nGeneral dynamic recovery for 
compensating CSP\nGeneralized version of the support vector machine for binary  classification problems: supporting hyperplane machine\nLax functors and coalgebraic weak bisimulation\nRecognizable Series on Hypergraphs\nApplication of Methods for Syntax Analysis of Context-Free Languages to  Query Evaluation of Logic Programs\nSimilarity density of the Thue-Morse word with overlap-free infinite  binary words\nDistributive Laws and Decidable Properties of SOS Specifications\nDecidability of the Clark's Completion Semantics for Monadic Programs  and Queries\nTowards the Ontology Web Search Engine\nImplementing generating functions to obtain power indices with coalition  configuration\nScattering Equations and Feynman Diagrams\nIrreducible decompositions and stationary states of quantum channels\nA novel code generation methodology for block diagram modeler and  simulators Scicos and VSS\nWords containing all permutations of a family of factors\nImplementing a Small Parsing Virtual Machine on Embedded Systems\nOn generalized Van-Benthem-type characterizations\nOn the length of fully commutative elements\nNew Improved Massive Gravity and Three Dimensional Spacetimes of  Constant Curvature and Constant Torsion\nConstruction of the fermionic vacuum and of fermionic operators of  creation and annihilation in the theory of algebraic spinors\nBounding quantification in parametric expansions of Presburger  arithmetic\nAlgebraic semantics for hybrid logics\nLabeling Topics with Images using Neural Networks\nTopological Sigma Models On Supermanifolds\nA Step-indexed Semantic Model of Types for the Call-by-Name Lambda  Calculus\nMetaprogramming Applied to Numerical Problems\nTranschromatic generalized character maps\nExplicit left orders on free groups extending the lexicographic order on  free monoids\nEfficient Generation of Correctness Certificates for the Abstract Domain  of Polyhedra\nFrom Declarative Model to Solution: Scheduling Scenario Synthesis\nCategorical Semantics 
for Functional Reactive Programming with Temporal  Recursion and Corecursion\nCLP(H): Constraint Logic Programming for Hedges\nUniform Interpolation for Coalgebraic Fixpoint Logic\nQuantum families of invertible maps and related problems\nA Formal Study on Backward Compatible Dynamic Software Updates\nOn Quantum Generalizations of Information-Theoretic Measures and their  Contribution to Distributional Semantics\nA note on the avoidability of binary patterns with variables and  reversals\nSymmetry in Cartan language for geometric theories of gravity\nA Neural Attention Model for Abstractive Sentence Summarization\nSports highlights generation based on acoustic events detection: A rugby  case study\nGenerating News Headlines with Recurrent Neural Networks\nVideo captioning with recurrent networks based on frame- and video-level  features and visual content classification\nSyntax-Semantics Interaction Parsing Strategies. Inside SYNTAGMA\nInflation and the Measurement Problem\nModel Checking : A Co-algebraic Approach\nNeural Network-Based Abstract Generation for Opinions and Arguments\nFocused Meeting Summarization via Unsupervised Relation Extraction\nPersonalized Emphasis Framing for Persuasive Message Generation\nControlling Output Length in Neural Encoder-Decoders\nGeneralization of metric classification algorithms for sequences  classification and labelling\nPresenting a New Dataset for the Timeline Generation Problem\nClassical field theories from Hamiltonian constraint: Local symmetries  and static gauge fields\nSynthesizing invariants by solving solvable loops\nPropositions in Linear Multirole Logic as Multiparty Session Types\nCutting-off Redundant Repeating Generations for Neural Abstractive  Summarization\nAutomatic Wikipedia Link Generation Based On Interlanguage Links\nUse Generalized Representations, But Do Not Forget Surface Features\nQuantization of noncompact coverings\nA Generic Online Parallel Learning Framework for Large Margin Models\nFibonacci 
words in hyperbolic Pascal triangles\nSynchronizing non-deterministic finite automata\nLater-stage Minimum Bayes-Risk Decoding for Neural Machine Translation\nScavenger 0.1: A Theorem Prover Based on Conflict Resolution\nAnalysing Data-To-Text Generation Benchmarks\npix2code: Generating Code from a Graphical User Interface Screenshot\nA General-Purpose Tagger with Convolutional Neural Networks\nToward uniform random generation in 1-safe Petri nets\nCompositions of Functions and Permutations Specified by Minimal Reaction  Systems\nA Generalized Recurrent Neural Architecture for Text Classification with  Multi-Task Learning\nThe Generalized Nagell-Ljunggren Problem: Powers with Repetitive  Representations\nExtractive Summarization using Deep Learning\nSeernet at EmoInt-2017: Tweet Emotion Intensity Estimator\nParadigm Completion for Derivational Morphology\nWOAH: Preliminaries to Zero-shot Ontology Learning for Conversational  Agents\nConfluence and Convergence in Probabilistically Terminating Reduction  Systems\nNominal C-Unification\nNovel Uses of Category Theory in Modeling OOP\nLearning to Remember Translation History with a Continuous Cache\n\"How Was Your Weekend?\" A Generative Model of Phatic Conversation\nAn Introduction to Image Synthesis with Generative Adversarial Nets\nSecure Web Access Control Algorithm\nBounded Context Switching for Valence Systems\nSimple Models for Word Formation in English Slang\nSelf-referencing cellular automata: A model of the evolution of  information control in biological systems\nAn analogue to Dixon's theorem for automaton groups\nComprehension-guided referring expressions\nAdversarial Learning for Neural Dialogue Generation\nLiveness-Driven Random Program Generation\nQuantum Analogical Modeling: A General Quantum Computing Algorithm for  Predicting Language Behavior\nConsistent perturbations in an imperfect fluid\nGeneralizing determinization from automata to coalgebras\nUnsolvability Cores in Classification Problems\nCode 
Generation for High-Level Synthesis of Multiresolution Applications  on FPGAs\nA Multi-scale Multiple Instance Video Description Network\nGraph Logics with Rational Relations\nComparing the writing style of real and artificial papers\nA Probabilistic Generative Grammar for Semantic Parsing\nRevisiting Parametricity: Inductives and Uniformity of Propositions\nGenerating and Estimating Nonverbal Alphabets for Situated and  Multimodal Communications\nApplication Software, Domain-Specific Languages, and Language Design  Assistants\nThe Open Language Archives Community and Asian Language Resources\nCompetition of Languages and their Hamming Distance\nHubs in Languages: Scale Free Networks of Synonyms\nSwapping Lemmas for Regular and Context-Free Languages\nQuotient Complexity of Closed Languages\nThe State of the Art: Ontology Web-Based Languages: XML Based\nFinitary languages\nApplying static code analysis to firewall policies for the purpose of  anomaly detection\nCross Language Text Classification via Subspace Co-Regularized  Multi-View Learning\nContextual Analysis for Middle Eastern Languages with Hidden Markov  Models\nModelling the Evolution of Programming Languages\nSeparating regular languages by piecewise testable and unambiguous  languages\nOn GitHub's Programming Languages\nContrastive Analysis with Predictive Power: Typology Driven Estimation  of Grammatical Error Distributions in ESL\nCapturing divergence in dependency trees to improve syntactic projection\nSharing Network Parameters for Crosslingual Named Entity Recognition\nKU-ISPL Language Recognition System for NIST 2015 i-Vector Machine  Learning Challenge\nKS_JU@DPIL-FIRE2016:Detecting Paraphrases in Indian Languages Using  Multinomial Logistic Regression Model\nImproving Document Clustering by Eliminating Unnatural Language\nNaturalizing a Programming Language via Interactive Learning\nA Study on Neural Network Language Modeling\nThe Galactic Dependencies Treebanks: Getting More Data by Synthesizing 
 New Languages\nTowards Language-Universal End-to-End Speech Recognition\nTracking Typological Traits of Uralic Languages in Distributed Language  Representations\nEmerging Language Spaces Learned From Massively Multilingual Corpora\nThe JHU Speech LOREHLT 2017 System: Cross-Language Transfer for  Situation-Frame Detection\nAutomatic Identification of Closely-related Indian Languages: Resources  and Experiments\nMultilayer Network of Language: a Unified Framework for Structural  Analysis of Linguistic Subsystems\nCameleon language Part 1: Processor\nIntegration of Heterogeneous Modeling Languages via Extensible and  Composable Language Components\nDeletion Operations on Deterministic Families of Automata\nA natural language interface to a graph-based bibliographic information  retrieval system\nLong Text Generation via Adversarial Training with Leaked Information\nDeformations of calibrated D-branes in flux generalized complex  manifolds\nAlgorithmic Detection of Computer Generated Text\nLearning to Start for Sequence to Sequence Architecture\nComputational Model to Generate Case-Inflected Forms of Masculine Nouns  for Word Search in Sanskrit E-Text\nTopic Aware Neural Response Generation\nBoundary-Seeking Generative Adversarial Networks\nA Unified Query-based Generative Model for Question Generation and  Question Answering\nNeural Text Generation: A Practical Guide\nEnd-to-end Adversarial Learning for Generative Conversational Agents\nUne grammaire formelle du créole martiniquais pour la génération  automatique\nGeneral three-state model with biased population replacement: Analytical  solution and application to language dynamics\nImplementing Support for Pointers to Private Data in a General-Purpose  Secure Multi-Party Compiler\nProfunctor Optics: Modular Data Accessors\nLive Multi-language Development and Runtime Environments\nGenerating efficient belief models for task-oriented dialogues\nAutomatic Generation of Constraint Propagation Algorithms for Small  
Finite Domains\nComplexity of Nested Circumscription and Nested Abnormality Theories\nGraXML - Modular Geometric Modeler\nGeneral Scheme for Perfect Quantum Network Coding with Free Classical  Communication\nOn a coordinate independent description of string worldsheet theory\nStochastic Simulation of Process Calculi for Biology\nA Generation-based Text Steganography Method using SQL Queries\nFrom Design to Implementation: an Automated, Credible Autocoding Chain  for Control Systems\nFormal Ontology Learning on Factual IS-A Corpus in English using  Description Logics\nModularity Aspects of Disjunctive Stable Models\nA Spatial Data Model for Moving Object Databases\nCoherent Multi-Sentence Video Description with Variable Level of Detail\nKevoree Modeling Framework (KMF): Efficient modeling techniques for  runtime use\nTowards a General Framework for Actual Causation Using CP-logic\nHinge-Loss Markov Random Fields and Probabilistic Soft Logic\nPractical Run-time Checking via Unobtrusive Property Caching\nA Method for Modeling Co-Occurrence Propensity of Clinical Codes with  Application to ICD-10-PCS Auto-Coding\nType-Directed Synthesis of Products\nConstructive Galois Connections: Taming the Galois Connection Framework  for Mechanized Metatheory\nA Focused Dynamic Attention Model for Visual Question Answering\nPatterns and Rewrite Rules for Systematic Code Generation (From  High-Level Functional Patterns to High-Performance OpenCL Code)\nDistance structures for generalized metric spaces\nAutomated Correction for Syntax Errors in Programming Assignments using  Recurrent Neural Networks\nDynamic Witnesses for Static Type Errors (or, Ill-Typed Programs Usually  Go Wrong)\nA new selection strategy for selective cluster ensemble based on  Diversity and Independency\nA Theory of Available-by-Design Communicating Systems\nWord and Document Embeddings based on Neural Network Approaches\nReproducing and learning new algebraic operations on word embeddings  using genetic 
programming\nVisual-textual Attention Driven Fine-grained Representation Learning\nDoctoral Advisor or Medical Condition: Towards Entity-specific Rankings  of Knowledge Base Properties [Extended Version]\nA domain-specific language for the hybridization and static condensation  of finite element methods\nDid You Really Just Have a Heart Attack? Towards Robust Detection of  Personal Health Mentions in Social Media\nGraph2Seq: Graph to Sequence Learning with Attention-based Neural  Networks\nDATR Theories and DATR Models\nTrading off Completeness for Efficiency --- The \\textsc{ParseTalk}  Performance Grammar Approach to Real-World Text Parsing\nAn Annotation Scheme for Free Word Order Languages\nA complexity measure for diachronic Chinese phonology\nMessage-Passing Protocols for Real-World Parsing -- An Object-Oriented  Model and its Preliminary Evaluation\nC++ programming language for an abstract massively parallel SIMD  architecture\nFile mapping Rule-based DBMS and Natural Language Processing\nLexical Base as a Compressed Language Model of the World (on the  material of the Ukrainian language)\nG-automata, counter languages and the Chomsky hierarchy\nSimulation for competition of languages with an ageing sexual population\nPhase transition in a sexual age-structured model of learning foreign  languages\nA universal model for languages and cities, and their lifetimes\nAnalytical approach to bit-string models of language evolution\nOn the Length of the Wadge Hierarchy of Omega Context Free Languages\nHow applicable is Python as first computer language for teaching  programming in a pre-university educational environment, from a teacher's  point of view?\nClosures in Formal Languages: Concatenation, Separation, and Algorithms\nREC language is a live on IBM1130 simulator, EL lenguaje REC esta vivo  en el simulador de la IBM 1130\nA Concurrent Language with a Uniform Treatment of Regions and Locks\nInverse Star, Borders, and Palstars\nCombinatorial Characterization of 
Formal Languages\nContents of COMP6411 Summer 2010 Final Reports on Comparative Studies of  Programming Languages\nT2Script Programming Language\nLogic Characterization of Floyd Languages\nLucretia - a type system for objects in languages with reflection\nUniversal Witnesses for State Complexity of Boolean Operations and  Concatenation Combined with Star\nUnambiguous Tree Languages Are Topologically Harder Than Deterministic  Ones\nVerbalizing Ontologies in Controlled Baltic Languages\nMinimal Nondeterministic Finite Automata and Atoms of Regular Languages\nTopological dynamics and recognition of languages\nStemmers for Tamil Language: Performance Analysis\nA State of the Art of Word Sense Induction: A Way Towards Word Sense  Disambiguation for Under-Resourced Languages\nProperties of phoneme N -grams across the world's language families\nMonoid automata for displacement context-free languages\nUsing Scripting Languages to Teach Programming\nGraph Spectral Properties of Deterministic Finite Automata\nTemporal Analysis of Language through Neural Language Models\nNeural Mechanism of Language\nA Complete Refinement Procedure for Regular Separability of Context-Free  Languages\nLanguages for Mobile Agents\nA Primer on Neural Network Models for Natural Language Processing\nFormalization of context-free language theory\nOn the construction of fully interpreted formal languages which posses  their truth predicates\nA Semisupervised Approach for Language Identification based on Ladder  Networks\nFiltrations of Formal Languages by Arithmetic Progressions\nApricot - An Object-Oriented Modeling Language for Hybrid Systems\nDesigning a Pattern Language For Surviving Earthquakes\nCRF-based Named Entity Recognition @ICON 2013\nAn implementation of Apertium based Assamese morphological analyzer\nComparative Studies of Six Programming Languages\nA Survey on Operational State Complexity\nSecurity and Privacy Policy Languages: A Survey, Categorization and Gap  
Identification\nTrans-gram, Fast Cross-lingual Word-embeddings\nProgramming Language Features for Refinement\nOne-Shot Neural Cross-Lingual Transfer for Paradigm Completion\nReversible Languages Having Finitely Many Reduced Automata\nRacial Disparity in Natural Language Processing: A Case Study of Social  Media African-American English\nCross-lingual, Character-Level Neural Morphological Tagging\nLower Bounds on Regular Expression Size\nExploring the Naturalness of Buggy Code with Recurrent Neural Networks\nA Language for Function Signature Representations\nModelling Concurrency with Comtraces and Generalized Comtraces\nOn Generating *-Sound Nets with Substitution\nLearning a Recurrent Visual Representation for Image Caption Generation\nGeneralized complex geometry of pure backgrounds in ten and eleven  dimensions\nGenerating Multi-Sentence Lingual Descriptions of Indoor Scenes\nCan Machine Generate Traditional Chinese Poetry? A Feigenbaum Test\nTwo are Better than One: An Ensemble of Retrieval- and Generation-Based  Dialog Systems\nTowards Automatic Generation of Entertaining Dialogues in Chinese  Crosstalks\nAttnGAN: Fine-Grained Text to Image Generation with Attentional  Generative Adversarial Networks\nNeural Response Generation with Dynamic Vocabularies\nLearning Deep Generative Models of Graphs\nText Analysis Tools in Spoken Language Processing\nAnusaaraka: Overcoming the Language Barrier in India\nUsing a hierarchy of Domain Specific Languages in complex software  systems design\nFormal Languages and Infinite Groups\nFormal semantics of language and the Richard-Berry paradox\nLength of the Shortest Word in the Intersection of Regular Languages\nJSC : A JavaScript Object System\nThe FC-rank of a context-free language\nNatural Language Understanding Based on Semantic Relations between  Sentences\nThe Green Language\nHyperbolic tilings and formal language theory\nSeparation Property for wB- and wS-regular Languages\nSyntax and analytic semantics of LISA\nA 
perspective on the advancement of natural language processing tasks  via topological analysis of complex networks\nOn the Complexity of L-reachability\nCommutative positive varieties of languages\nTowards a theory of word order. Comment on \"Dependency distance: a new  perspective on syntactic patterns in natural language\" by Haitao Liu et al\nLinking Types for Multi-Language Software: Have Your Cake and Eat It Too\nA Universal Semantic Space\nFrom Phonology to Syntax: Unsupervised Linguistic Typology at Different  Levels with Language Embeddings\nAutomatic Optimization of Hardware Accelerators for Image Processing\nFormal Modelling, Testing and Verification of HSA Memory Models using  Event-B\nWhy informatics and general science need a conjoint basic definition of  information\nThe Speech-Language Interface in the Spoken Language Translator\nIncorporating \"Unconscious Reanalysis\" into an Incremental, Monotonic  Parser\nA State-Transition Grammar for Data-Oriented Parsing\nMonte Carlo simulation of the rise and the fall of languages\nExploiting Syntactic Structure for Natural Language Modeling\nTwo-parameter Model of Word Length \"Language - Genre\"\nUniversal Model for Paraphrasing -- Using Transformation Based on a  Defined Criteria --\nMonads for natural language semantics\nExtending Dublin Core Metadata to Support the Description and Discovery  of Language Resources\nExploiting multilingual nomenclatures and language-independent text  features as an interlingua for cross-lingual text analysis applications\nCanonical decomposition of catenation of factorial languages\nComplex networks and human language\nThe DFAs of Finitely Different Languages\nIST is more than an algorithm to prove ZFC theorems\nHairdressing in groups: a survey of combings and formal languages\nMicroscopic and Macroscopic Simulation of Competition between Languages\nMonte Carlo simulation of survival for minority languages\nProbabilities to accept languages by quantum finite 
automata\nTranslating a first-order modal language to relational algebra\nLanguage simulation after a conquest\nDecision Problems For Convex Languages\nWeak Mso with the Unbounding Quantifier\nState complexity of orthogonal catenation\nAlgebraic properties of structured context-free languages: old  approaches and novel developments\nOn Languages Accepted by P/T Systems Composed of joins\nLXG Compiler - Design and Implementation\nExploring Language-Independent Emotional Acoustic Features via Feature  Selection\nComparative Studies of 10 Programming Languages within 10 Diverse  Criteria -- a Team 7 COMP6411-S10 Term Report\nStrategic programming on graph rewriting systems\nContext-free ordinals\nParameterized Regular Expressions and their Languages\nSelf-Adjusting Stack Machines\nLanguage understanding as a step towards human level intelligence -  automatizing the construction of the initial dictionary from example  sentences\nLanguage Acquisition in Computers\nInfinite Synchronizing Words for Probabilistic Automata (Erratum)\nUniversal Witnesses for State Complexity of Basic Operations Combined  with Reversal\nFinite Automata with Time-Delay Blocks (Extended Version)\nIn the Maze of Data Languages\nMaximal Syntactic Complexity of Regular Languages Implies Maximal  Quotient Complexities of Atoms\nEstimating Confusions in the ASR Channel for Improved Topic-based  Language Model Adaptation\nOn the Topological Complexity of omega-Languages of Non-Deterministic  Petri Nets\nDeciding the Borel complexity of regular tree languages\nMartta: A C++ Language Workbench\nSign Language Lexical Recognition With Propositional Dynamic Logic\nBreadth-first serialisation of trees and rational languages\nComplexity of checking whether two automata are synchronized by the same  language\nProceedings 14th International Conference on Automata and Formal  Languages\nHidden Markov Model Based Part of Speech Tagger for Sinhala Language\nUnknown Words Analysis in POS tagging of Sinhala 
Language\nScanning and Parsing Languages with Ambiguities and Constraints: The  Lamb and Fence Algorithms\nA language model based approach towards large scale and lightweight  language identification systems\nHierarchical Character-Word Models for Language Identification\nDemographic Dialectal Variation in Social Media: A Case Study of  African-American English\nExtension of hidden markov model for recognizing large vocabulary of  sign language\nMaximally Atomic Languages\nTranslation Of Telugu-Marathi and Vice-Versa using Rule Based Machine  Translation\nTemplet: a Markup Language for Concurrent Programming\nSeparated by an Un-common Language: Towards Judgment Language Informed  Vector Space Modeling\nThe Gremlin Graph Traversal Machine and Language\nLiberating language research from dogmas of the 20th century\nQuick Brown Fox in Formal Languages\nFrom quantum foundations via natural language meaning to a theory of  everything\nAn automata characterisation for multiple context-free languages\nInter-language Collaboration in an Object-oriented Virtual Machine\nA Chomsky-Schützenberger representation for weighted multiple  context-free languages\nPMI Matrix Approximations with Applications to Neural Language Modeling\nBuilding a robust sentiment lexicon with (almost) no resource\nIndustrial Experience Report on the Formal Specification of a Packet  Filtering Language Using the K Framework\nTowards a Theory of Complexity of Regular Languages\nMonte Carlo Action Programming\nSome connections between universal algebra and logics for trees\nCross-lingual Abstract Meaning Representation Parsing\nFound in Translation: Reconstructing Phylogenetic Language Trees from  Translations\nUncountable realtime probabilistic classes\nOn the decidability of $k$-Block determinism\nGated-Attention Architectures for Task-Oriented Language Grounding\nSyllable-aware Neural Language Models: A Failure to Beat Character-aware  Ones\nProceedings 15th International Conference on Automata and 
Formal  Languages\nBPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages\nLanguage as a matrix product state\nFunTAL: Reasonably Mixing a Functional Language with Assembly\nClassical Control, Quantum Circuits and Linear Logic in Enriched  Category Theory\nOne for All: Towards Language Independent Named Entity Linking\nUniversal Neural Machine Translation for Extremely Low Resource  Languages\nUnambiguous languages exhaust the index hierarchy\n2D Transonic Hydrodynamics in General Relativity\nIntroductory Overview of Modern Cosmology\nVelocity Curves for Stars in Disk Galaxies: A case for Nearly Newtonian  Dynamics\nPhysics of glassy systems\nOn J. Goodman's comment to \"Language Trees and Zipping\"\nOn an Application of Relative Entropy\nTwistor theory and the four-dimensional Quantum Hall effect of Zhang and  Hu\nA hybrid model for chaotic front dynamics: From semiconductors to water  tanks\nHydrodynamic Formulation of the Hubbard Model\nThe energy spectrum symmetry of Heisenberg model in Fock space\nTranslating near-synonyms: Possibilities and preferences in the  interlingua\nInducing a Semantically Annotated Lexicon via EM-Based Clustering\nNoun Phrase Recognition by System Combination\nCentroid-based summarization of multiple documents: sentence extraction,  utility-based evaluation, and user studies\nApproximation and Exactness in Finite State Optimality Theory\nSecurity Policy Consistency\nIncremental construction of minimal acyclic finite-state automata\nExploiting auxiliary distributions in stochastic unification-based  grammars\nSemantic interpretation of temporal information by abductive inference\nType Arithmetics: Computation based on the theory of types\nObject-oriented tools for advanced applications\nLogic, Individuals and Concepts\nThe Role of Conceptual Relations in Word Sense Disambiguation\nAnaphora and Discourse Structure\nStereotypical Reasoning: Logical Properties\nSchedulers for Rule-based Constraint Programming\nOpen 
Network Handles Implemented in DNS\nCatching the Drift: Probabilistic Content Models, with Applications to  Generation and Summarization\nTemporal logic with predicate abstraction\nImproved Inference for Checking Annotations\nExplaining Constraint Programming\nKnowledge Flow Analysis for Security Protocols\nOn Typechecking Top-Down XML Tranformations: Fixed Input or Output  Schemas\nTarski's influence on computer science\nInterroger un corpus par le sens\nTransformation de Fourier-Mukai sur les Surfaces Hyperkählériennes\nParticle Spectrum Created Through Bubble Nucleation\nVacuum Spacetimes with Future Trapped Surfaces\nPerturbative Analysis of Bianchi IX using Ashtekar Formalism\nOn the degenerate phase boundaries\nSpacetime G-structures I: Topological Defects\nIsocurvature Perturbations in Quintessence Cosmologies\nAn extension principle for the Einstein-Vlasov system in spherical  symmetry\nA quantum-like description of the planetary systems\nA Dodecalogue of Basic Didactics from Applications of Abstract  Differential Geometry to Quantum Gravity\nStrings, gravity and particle physics\nDill: An Algorithm and a Symbolic Software Package for Doing Classical  Supersymmetry Calculations\nA simple solution to color confinement\nSkyrme Model Language in the Theory of Nucleons and Nuclei\nSHERPA 1.alpha, a proof-of-concept version\nClassical integrable lattice models through quantum group related  formalism\nComputing the BRST Operator Used in Quantization of Gauge Theories\nFive Lectures on the Jet Methods in Field Theory\nWBase: a C package to reduce tensor products of Lie algebra  representations. 
Description and new developments\nCurrent Algebra and Bosonization in Three Dimensions\nQuantum Field Theories on Algebraic Curves\nExpanding and contracting universes in third quantized string cosmology\nSuperembeddings, Non-Linear Supersymmetry and 5-branes\nProbability distributions in statistical ensembles with conserved  charges\nCT-duality as a local property of the world-sheet\nTopological D-Branes and Commutative Algebra\nCompleteness proof of functional logic, a formalism with  variable-binding nonlogical symbols\nAffine type A crystal structure on tensor products of rectangles,  Demazure characters, and nilpotent varieties\nActive Libraries: Rethinking the roles of compilers and libraries\nUniform Versions of Infinitary Properties in Banach Spaces\nHochschild DGLAs and torsion algebras\nSets and Their Sizes\nA note on Newtonian, Lagrangian and Hamiltonian dynamical systems in  Riemannian manifolds\nSymplectic operad geometry and graph homology\nLie-Rinehart algebras, descent, and quantization\nCan we express every transfinite concept constructively?\nIs the Halting probability a Dedekind real number?\nState Complexity and the Monoid of Transformations of a Finite Set\nInvariant and evolutionary properties of the skew-symmetric differential  forms\nCovers of the multiplicative group of an algebraically closed field of  characteristic zero\nCountable and Full Exchange Rings\nShort introduction to Nonstandard Analysis\nVariations of a Coin-Removal Problem\nLimits and the system of near-numbers\nPA is instantiationally complete, but algorithmically incomplete: An  alternative interpretation of Goedelian incompleteness under Church's Thesis  that links formal logic and computability\nSmall Valdivia compact spaces\nKontsevich's formula and the WDVV equations in tropical geometry\nAn Introduction to Zoli Numbers\nSome metric properties of automorphisms of groups\nCombinatorial aspects of code loops\nQuantum and Braided Integrals\nNested quasicrystalline 
discretisations of the line\nCoupled Map Networks as Communication Schemes\nThe normal dual congruences and the dual Bianchi lattice\nMechanical interpretation of existence theorems in a nonlinear Dirichlet  problem\nAnalise Termodinamica da aceleracao de uma massa\nEstabelecimento do Conceito de Temperatura como uma grandeza derivada da  Energia e da Entropia\nThe Best Possible Unification for any Collection of Physical Theories\nHeat transmission in Relativity\nIntroduction to Quantum Cryptography\nFast Computation of Voigt Functions via Fourier Transforms\nDo reductionist cures select for holistic diseases? Adaptive chronic  infection, structured stress, and medical magic bullets\nStructured psychosocial stress and the US obesity epidemic\nCucker-Smale Flocking under Hierarchical Leadership\nQuantum Counting\nWhy Quantum Mechanics is Hard to Understand\nDesigning optimum CP maps for quantum teleportation\nOn Almost Periodicity Criteria for Morphic Sequences in Some Particular  Cases\nContinuous selections and sigma-spaces\nThe Holographic Interpretation of Hawking Radiation\nO-minimal cohomology: finiteness and invariance results\nTest Functions Space in Noncommutative Quantum Field Theory\nAn Abstract Interpolation Problem and the Extension Theory of Hermitian  Operators\nOn the quantization of conjugacy classes\nAdvanced Compact Thermal Modeling by using VHDL-AMS\nUML 2.0 - Overview and Perspectives in SoC Design\nUnified Modeling of Complex Real-Time Control Systems\nThe immune system: look who's talking\nCan a Computer Laugh ?\nApplication of Tuncay's language teacher model to business-customer  relations\nMcKay correspondence for Landau-Ginzburg models\nQ-systems as cluster algebras\nSurvey of Technologies for Web Application Development\nSoftware graphs and programmer awareness\nClassical Enhancement of Quantum Error-Correcting Codes\nFeature Unification in TAG Derivation Trees\nData-Oblivious Stream Productivity\nA Process Algebra Software Engineering 
Environment\nOn multi F-nomial coefficients and Inversion formula for F-nomial  coefficients\n$σ$-continuity and related forcings\nMean asymptotic behaviour of radix-rational sequences and dilation  equations (Extended version)\nDevelopment of simulation package 'ELSES' for extra-large-scale  electronic-structure calculation\nOn the Stability of Electrostatic Orbits\nCoZo+ - A Content Zoning Engine for textual documents\nRadiative corrections to muon decay in leading and next to leading  approximation for electron spectrum\nClassification de modules aux différences filtrés isogradués\nGeometry of splice-quotient singularities\nQuantum vacuum and accelerated expansion\nCatalan numbers and relations\nBridge Theory: Oltre la Frontiera Quantistica\nA Semantics-Aware Editing Environment for Prolog in Eclipse\nOn the lattice of sub-pseudovarieties of DA\nSome Remarks on the Toeplitz Corona problem\nA protocol for instruction stream processing\nConsiderations on Construction Ontologies\nNon-regularity of floor(alpha + log_k(n))\nSymbolic Script Programming for Java\nProceedings International Workshop on The Complexity of Simple Programs\nAdvances in the Design and Implementation of a Multi-Tier Architecture  in the GIPSY Environment\nA Noisy-Channel Model for Document Compression\nSyndeticity and independent substitutions\nDocumenting Spreadsheets with Pseudo-Code: an Exercise with Cash-Flow  and Loans\nThe congruence subgroup property for $Aut F_2$: A group-theoretic proof  of Asada's theorem\nUsing R for data analysis and graphing in an introductory physics  laboratory\nGeometric and topological aspects of Type IIB D-branes\nDiscussion on Supervisory Control by Solving Automata Equation\nIs Ramsey's theorem omega-automatic?\nCombinatorial cubic surfaces and reconstruction theorems\nExamples of non-compact quantum group actions\nPoint Processes Modeling of Time Series Exhibiting Power-Law Statistics\nUnification and Emergence in Physics: the Problem of 
Articulation\nProceedings Ninth International Workshop on Reduction Strategies in  Rewriting and Programming\nDiffusive wavelets on groups and homogeneous spaces\nAn Improved Algorithm for Generating Database Transactions from  Relational Algebra Specifications\nCategorical Models for a Semantically Linear Lambda-calculus\nOn Linear Information Systems\nCourant algebroids: Cohomology and Matched Pairs\nSelected issues on justification of holographic approach to QCD\nOn The Structure Of The Chan-Paton Factors For D-Branes In Type II  Orientifolds\nPartition theorems from creatures and idempotent ultrafilters\nMolecular Programming Pseudo-code Representation to Molecular  Electronics\nPreorientations of the derived motivic multiplicative group\nParametrizing Program Analysis by Lifting to Cardinal Power Domains\nLoomis--Sikorski Theorem and Stone Duality for Effect Algebras with  Internal State\nTuring Automata and Graph Machines\nA Framework for Constraint-Based Deployment and Autonomic Management of  Distributed Applications\nA Middleware Framework for Constraint-Based Deployment and Autonomic  Management of Distributed Applications\n$\\aleph_0$-categorical strongly minimal compact complex manifolds\nState Elimination Ordering Strategies: Some Experimental Results\nFinite-State Complexity and the Size of Transducers\nA view of canonical extension\nOn Second-Order Monadic Monoidal and Groupoidal Quantifiers\nOn the nature of financial leverage\nThe Wigner Distribution\nThe $z$-Transform and Automata-Recognizable Systems of Nonhomogeneous  Linear Recurrence Equations over Semirings\nCatalan structures and Catalan pairs\nThe semantic mapping of words and co-words in contexts\nA Categorical Outlook on Cellular Automata\nNegative bases and automata\nNominal Unification Revisited\nContracts for Abstract Processes in Service Composition\nOn Brlek-Reutenauer conjecture\nWeak mu-equality is decidable\nDescent and forms of tensor categories\nComputing Semi-algebraic 
Invariants for Polynomial Dynamical Systems\nMemory Reduction via Delayed Simulation\nAutomatic Synthesis of Switching Controllers for Linear Hybrid Automata\nReinforcement learning in signaling game\nCentral limit theorems for additive functionals of ergodic Markov  diffusions processes\nGeometric Semigroup Theory\nExtensional Higher-Order Logic Programming\nUnification of some classical and quantum ideas\nQuantification in ordinary language\nCombinatorics on words in information security: Unavoidable regularities  in the construction of multicollision attacks on iterated hash functions\nGeometric grid classes of permutations\nGalois subfields of inertially split division algebras\nRedAlert: Determinacy Inference for Prolog\nGeometrical view of quantum entanglement\nGroupoid-theoretical methods in the mapping class groups of surfaces\nEquational theories of profinite structures\nSolving the TTC 2011 Reengineering Case with GrGen.NET\nAn Entertaining Example of Using the Concepts of Context-Free Grammar  and Pushdown Automation\nA proof of Reidemeister-Singer's theorem by Cerf's methods\nThe quasi-Hopf analogue of $u_q(sl_2)$\nAn order-theoretic analysis of interpretations among propositional  deductive systems\nFunctional Logic Programming with Generalized Circular Coinduction\nCAT(0) geometry for the Thompson Group\nEnumeration of edges in some lattices of paths\nOn Shift Spaces with Algebraic Structure\nEnumeration of saturated chains in Dyck lattices\nComputing Accurate Age and Distance Factors in Cosmology\nA simplified framework for first-order languages and its formalization  in Mizar\nMatrix elements of unstable states\nSubword Complexity and k-Synchronization\nLeast periods of k-automatic sequences\nCredal nets under epistemic irrelevance\nApproximating Weak Bisimilarity of Basic Parallel Processes\nCoordination Level Modeling and Analysis of Parallel Programs using  Petri Nets\nBorel* Sets in the Generalised Baire Space\nA characterization of 
$p$-automatic sequences as columns of linear  cellular automata\nDiagrammatic confluence for Constraint Handling Rules\nArtex is AnotheR TEXt summarizer\nAn interactive programme for Steiner trees\nQuantum families of maps\nPoisson-Lie Sigma Models on Drinfel'd double\nBoA: a versatile software for bolometer data reduction\nOn optimum left-to-right strategies for active context-free games\nTowards a Theory of Glue\nPartial Orders for Efficient BMC of Concurrent Software\nOn the canonical connection for smooth envelopes\nOn the least number of palindromes contained in an infinite word\nA Semantic Matching Energy Function for Learning with Multi-relational  Data\nOn Families in Differential Geometry\nDensity Ratio Hidden Markov Models\nNonexistence of the final first integral in the Zipoy-Voorhees  space-time\nCompactness of powers of ω\nA Theoretical Framework for Context-Sensitive Temporal Probability Model  Construction with Application to Plan Projection\nKRAB Algorithm - A Revised Algorithm for Incremental Call Graph  Generation\nCollapsible Pushdown Graphs of Level 2 are Tree-Automatic\nAn efficient way to perform the assembly of finite element matrices in  Matlab and Octave\nThe use of the teleparallelism connection in continuum mechanics\nRelative dimension of morphisms and dimension for algebraic stacks\nA Babuška-Aziz type proof of the circumradius condition\nNonlinear parabolic problems in Musielak--Orlicz spaces\nIn-in formalism on tunneling background: multi-dimensional quantum  mechanics\nSyntactic Complexity of Circular Semi-Flower Automata\nLinear Dependent Types for Domain Specific Program Analysis (Extended  Abstract)\nIntegrating Datalog and Constraint Solving\nProceedings Machines, Computations and Universality 2013\nAnalysing Quality of English-Hindi Machine Translation Engine Outputs  Using Bayesian Classification\nSaturation of Concurrent Collapsible Pushdown Systems\nA reduction of proof complexity to computational complexity for  $AC^0[p]$ 
Frege systems\nFidelity susceptibility and Loschmidt echo for generic paths in a three  spin interacting transverse Ising model\nExpressiveness of Visibly Pushdown Transducers\nDeciding $k$CFA is complete for EXPTIME\nNon viability of hyperbolic quantum mechanics as a theory of Nature\nA Simple Method to Produce Algorithmic MIDI Music based on Randomness,  Simple Probabilities and Multi-Threading\nThe semantic marriage of monads and effects\nTBX goes TEI -- Implementing a TBX basic extension for the Text Encoding  Initiative guidelines\nReset thresholds of automata with two cycle lengths\nQuantum Turing automata\nType-amalgamation properties and polygroupoids in stable theories\nMemory-only selection of dictionary PINs\nClassification theory for accessible categories\nGenerating Synchronizing Automata with Large Reset Lengths\nMulti-borders classification\nA heuristic prover for real inequalities\nDo we really need to write documentation for a system? CASE tool  add-ons: generator+editor for a precise documentation\nKeeping a Crowd Safe: On the Complexity of Parameterized Verification  (Corrected version)\nPhynance\nFunctional Bandits\nTowards an Efficient Prolog System by Code Introspection\nGrammars with two-sided contexts\nBisimulation Equivalence of First-Order Grammars\nMulti-layered graph-based multi-document summarization model\nQuantitative model-checking of controlled discrete-time Markov processes\nProceedings 9th Workshop on Quantum Physics and Logic\nA Framework to Synergize Partial Order Reduction with State  Interpolation\nOptimizing Component Combination in a Multi-Indexing Paragraph Retrieval  System\nBe Careful When Assuming the Obvious: Commentary on \"The placement of  the head that minimizes online memory: a complex systems approach\"\nSubset seed automaton\nForcing a countable structure to belong to the ground model\nKazama-Suzuki Models of N=2 Superconformal Field Theory and Manin  triples\nSymbolic Solving of Extended Regular Expression 
Inequalities\nPrinciples for Verification Tools: Separation Logic\nOrbit automata as a new tool to attack the order problem in automaton  groups\nVariations on the Stochastic Shortest Path Problem\nThe complexity of satisfaction problems in reverse mathematics\nMetamorphosis of Fuzzy Regular Expressions to Fuzzy Automata using the  Follow Automata\nCertification of programs with computational effects\nPeetre-Slovák's theorem revisited\nQANUS: An Open-source Question-Answering Platform\nFoundational Extensible Corecursion\nDisaster Monitoring with Wikipedia and Online Social Networking Sites:  Structured Data and Linked Data Fragments to the Rescue?\nHamiltonian cosmology in bigravity and massive gravity\nA general framework for quantum macroscopicity in terms of coherence\nSynchronizing delay for binary uniform morphisms\nPart I: Vector Analysis of Spinors\nInitial non-repetitive complexity of infinite words\nInterface Between Market and Science\nMachine Learning for Machine Data from a CATI Network\nOmniGraph: Rich Representation and Graph Kernel Learning\nOrdered Tree-Pushdown Systems\nCharacteristic Formulae for Session Types (extended version)\nFast k-best Sentence Compression\nMinimizing Regret in Discounted-Sum Games\nProceedings 12th International Workshop on Quantum Physics and Logic\nSelective inference in regression models with groups of variables\nThe Complexity of Interaction (Long Version)\nThe realization of the wave function collapse in the linguistic  interpretation of quantum mechanics\nOn small profinite groups\nMulti-Field Structural Decomposition for Question Answering\nEilenberg--Moore Monoids and Backtracking Monad Transformers\nSupervised and Unsupervised Ensembling for Knowledge Base Population\nDerivative-Based Diagnosis of Regular Expression Ambiguity\nProbabilistic Resource Analysis by Program Transformation\nUnique Parallel Decomposition for the Pi-calculus\nSex, drugs, and violence\nHigher-Order Kullback-Leibler Aggregation of Markov 
Chains\nAligning Packed Dependency Trees: a theory of composition for  distributional semantics\nOn sequentially h-complete groups\nTopological Considerations for Tuning and Fingering Stringed Instruments\nThe Cubical Homology of Trace Monoids\nApproximated maximum likelihood estimation in multifractal random walks\nAbstracting Path Conditions\nProof nets for the Lambek-Grishin calculus\nCompactness of $ω^λ$ for $λ$ singular\nQuasi-Hamiltonian bookkeeping of WZNW defects\nTorus Invariant Curves\nState BCK-algebras and State-Morphism BCK-algebras\nOn Semantic Word Cloud Representation\nExpectation values of quantum powers < r^a > using quantum defects for  Li O Na Mg using symbolic Mathematics, and new tools such as Topbase to  produce quantum defects of these elements\nCOINs change leaders - Lessons Learned from a Distributed Course\nA Secure and Comparable Text Encryption Algorithm\nWhat are symmetries of nonlinear PDEs and what are they themselves?\nParameterization of temperature and spectral distortions in future CMB  experiments\nFramed 4-valent Graph Minor Theory I: Intoduction. 
A Planarity Criterion  and Linkless Embeddability\nPeriodic configurations of subshifts on groups\nStandard protocol complexes for the immediate snapshot read/write model\nSecond order symmetry operators\nFree amalgamation and automorphism groups\nMonad Transformers for Backtracking Search\nOn the relation between continuous functions in two different metric  spaces\nA Bengali HMM Based Speech Synthesis System\nTowards an Error Correction Memory to Enhance Technical Texts Authoring  in LELIE\nSupergravity Actions with Integral Forms\nDynamic Component Composition\nRecurrent Neural Network Regularization\nMinimal and maximal constituents of twisted Foulkes characters\nA Note on Semantics (with an Emphasis on UML)\nUsing Answer Set Programming for pattern mining\nThe role of homology in fluid vortices I: non-relativistic flow\nUnsupervised Domain Adaptation with Feature Embeddings\nTensor calculus with open-source software: the SageManifolds project\nBayesian Optimisation for Machine Translation\nDagger Geometry As Banach Algebraic Geometry\nRewriting Higher-Order Stack Trees\nString Corrected Spacetimes and SU(N)-Structure Manifolds\nOn the topology of rational T-varieties of complexity one\nBusiness Rule Mining from Spreadsheets\nDiscrete Temporal Constraint Satisfaction Problems\nDeep Recurrent Neural Networks for Acoustic Modelling\nUnsupervised Dependency Parsing: Let's Use Supervised Parsers\nHoming Vector Automata\nInferring Program Transformations from Type Transformations for  Partitioning of Ordered Sets\nSome aspects of holographic W-gravity\nParikh matrices and Parikh Rewriting Systems\nTopic Stability over Noisy Sources\nContinued fraction expansions in connection with the metric Mahler  measure\nIllinoisSL: A JAVA Library for Structured Prediction\nSentiment Uncertainty and Spam in Twitter Streams and Its Implications  for General Purpose Realtime Sentiment Analysis\nExtended Conditional Independence and Applications in Causal Inference\nA detailed 
analysis of mathematics of entanglement in Non-Hermitian  systems in real eigenvalue regime\nDipole Codes Attractively Encode Glue Functions\nOn Rings of Differential Rota-Baxter Operators\nSimple Baseline for Visual Question Answering\nTurbulent Thermal Diffusion: A Way to Concentrate Dust in Protoplanetary  Discs\nStack Exchange Tagger\nAlmost Continuous Transformations of Software and Higher-order Dataflow  Programming\nFusion of Array Operations at Runtime\nImproved Query Topic Models via Pseudo-Relevant Pólya Document Models\nEpistemological Consequences of the Incompleteness Theorems\nVariations of the Similarity Function of TextRank for Automated  Summarization\nAutomatic Generation of Formula Simplifiers based on Conditional Rewrite  Rules\nOn Equivalence and Uniformisation Problems for Finite Transducers\nApproximate Relational Hoare Logic for Continuous Random Samplings\nPrediction of Infinite Words with Automata\nMeasuring cones and other thick subsets in free groups\nPredicate Gradual Logic and Linguistics\nA Persona-Based Neural Conversation Model\nNeural Summarization by Extracting Sentences and Words\nLinear Distances between Markov Chains\nMeasuring the speed of light with electric and magnetic pendulum\nMarkov Chains and Unambiguous Büchi Automata\nDerived-term Automata for Extended Weighted Rational Expressions\nOn model architecture for a children's speech recognition interactive  dialog system\nConstruction of Non-expandable Non-overlapping Sets of Pictures\nMatrix Factorization using Window Sampling and Negative Sampling for  Improved Word Representations\nOn the Commutative Algebra of Categories\nN= 4 Supersymmetric Quantum Mechanical Model: Novel Symmetries\nFormalization of Phase Ordering\nToward Word Embedding for Personalized Information Retrieval\nEnforcing Termination of Interprocedural Analysis\nLearning for Biomedical Information Extraction: Methodological Review of  Recent Advances\nQuery Answering with Transitive and Linear-Ordered 
Data\nTurchin's Relation for Call-by-Name Computations: A Formal Approach\nExperiments with Synchronizing Automata\nNeural Machine Translation with Recurrent Attention Modeling\nOn the structure of formal balls of the balanced quasi-metric domain of  words\nIncremental Learning for Fully Unsupervised Word Segmentation Using  Penalized Likelihood and Model Selection\nNeural Machine Translation with Supervised Attention\nConstructing Orthogonal Latin Squares from Linear Cellular Automata\nNonsymbolic Text Representation\nNotes on Pure Dataflow Matrix Machines: Programming with  Self-referential Matrix Transformations\nModelling Sentence Pairs with Tree-structured Attentive Encoder\nAugmented Index and Quantum Streaming Algorithms for DYCK(2)\nIterative Refinement for Machine Translation\nHow Document Pre-processing affects Keyphrase Extraction Performance\nHigher-Order Linearisability\nA Compare-Aggregate Model for Matching Text Sequences\nAC-BLSTM: Asymmetric Convolutional Bidirectional LSTM Networks for Text  Classification\nDifferentiable Programs with Neural Libraries\nSubgroups of Quantum Groups\nSummaRuNNer: A Recurrent Neural Network based Sequence Model for  Extractive Summarization of Documents\nA Way out of the Odyssey: Analyzing and Combining Recent Insights for  LSTMs\nCoalgebraic trace semantics via forgetful logics\nStatic Analysis of Communicating Processes using Symbolic Transducers\nNeural Document Embeddings for Intensive Care Patient Mortality  Prediction\nVIBIKNet: Visual Bidirectional Kernelized Network for Visual Question  Answering\nTF.Learn: TensorFlow's High-level Module for Distributed Machine  Learning\nAutomatic Labelling of Topics with Neural Embeddings\nDomain specialization: a post-training domain adaptation for Neural  Machine Translation\nNondeterministic unitary OBDDs\nProceedings 13th International Conference on Quantum Physics and Logic\nModeling news spread as an SIR process over temporal networks\nBernstein-Zelevinsky 
derivatives: a Hecke algebra approach\nA storm is Coming: A Modern Probabilistic Model Checker\nBisimulation Metrics for Weighted Automata\nQuantum Computing with Variable Complex Plane. Light Beam Guide  Implementation\nUnsupervised Learning of Sentence Embeddings using Compositional n-Gram  Features\nReinforcement Learning for Transition-Based Mention Detection\nPermissive Supervisor Synthesis for Markov Decision Processes through  Learning\nSemi-Supervised Affective Meaning Lexicon Expansion Using Semantic and  Distributed Word Representations\nEnriched Duality in Double Categories: V-categories and V-cocategories\nEmbedded Collaborative Filtering for \"Cold Start\" Prediction\nPersian Wordnet Construction using Supervised Learning\nDoes Neural Machine Translation Benefit from Larger Context?\nBandit Structured Prediction for Neural Sequence-to-Sequence Learning\nLexical Features in Coreference Resolution: To be Used With Caution\nAlgebraically Closed Fields with a Generic Multiplicative Character\nRevisiting Recurrent Networks for Paraphrastic Sentence Embeddings\nLearning Topic-Sensitive Word Representations\nSpeech-Based Visual Question Answering\nA Versatile, Sound Tool for Simplifying Definitions\nYet Another Introduction to Dark Matter\nDynamic Compositional Neural Networks over Tree Structure\nA Neural Framework for Generalized Topic Models\nImplementing the sine transform of fermionic modes as a tensor network\nHilbert series for twisted commutative algebras\nAutoWIG: Automatic Generation of Python Bindings for C++ Libraries\nDataflow Matrix Machines as a Model of Computations with Linear Streams\nComputer aided synthesis: a game theoretic approach\nProsodic Event Recognition using Convolutional Neural Networks with  Context Information\nOrtoedres amb longitud d'arestes enteres / Cuboids with integer length  edges\ntrackr: A Framework for Enhancing Discoverability and Reproducibility of  Data Visualizations and Other Artifacts in R\nAn Overview of 
Multi-Task Learning in Deep Neural Networks\nA Mixture Model for Learning Multi-Sense Word Embeddings\nThe Moore and the Myhill Property For Strongly Irreducible Subshifts Of  Finite Type Over Group Sets\nOn the ghost issue of extended quasidilaton\nParareal Algorithm Implementation and Simulation in Julia\nDE-PACRR: Exploring Layers Inside the PACRR Model\nVertical almost-toric systems\nMultiple Range-Restricted Bidirectional Gated Recurrent Units with  Attention for Relation Classification\nContext Aware Document Embedding\nData-Driven Loop Invariant Inference with Automatic Feature Synthesis\nOn the physical interpretation of the Dirac wavefunction\nAuxiliary Objectives for Neural Error Detection Models\nMachine Translation at Booking.com: Journey and Lessons Learned\nCombining Thesaurus Knowledge and Probabilistic Topic Models\nDomain Aware Neural Dialog System\nCRF Autoencoder for Unsupervised Dependency Parsing\nTowards Neural Speaker Modeling in Multi-Party Conversation: The Task,  Dataset, and Models\nNeural Translation of Musical Style\nA Measure for Dialog Complexity and its Application in Streamlining  Service Operations\nOrbifold equivalence: structure and new examples\nFinite-state Strategies in Delay Games\nSmall-footprint Keyword Spotting Using Deep Neural Network and  Connectionist Temporal Classifier\nLearning to Explain Non-Standard English Words and Phrases\nTowards coarse graining of discrete Lorentzian quantum gravity\nEvent Identification as a Decision Process with Non-linear  Representation of Text\nGroup Sparse CNNs for Question Classification with Answer Sets\nConfidence through Attention\nQuerying Best Paths in Graph Databases\nClickbait Detection in Tweets Using Self-attentive Network\nAligning Script Events with Narrative Texts\nRETUYT in TASS 2017: Sentiment Analysis for Spanish Tweets using SVM and  CNN\nMatchPy: A Pattern Matching Library\nEmbedding-Based Speaker Adaptive Training of Deep Neural Networks\nMulti-Task Learning for 
Speaker-Role Adaptation in Neural Conversation  Models\nDeciding Confluence and Normal Form Properties of Ground Term Rewrite  Systems Efficiently\nWeakly 2-randoms and 1-generics in Scott sets\nFirst Results from Using Game Refinement Measure and Learning  Coefficient in Scrabble\nAn introduction to approximate computing\nEliminating the unit constant in the Lambek calculus with brackets\nA Uniform Framework for Timed Automata and Beyond\nRobert Sheckley\\s Answerer for two orthogonal projections\nA framework for on-line calibration of LINAC devices\nHeterogeneous continuous time random walks\nAttention networks for image-to-text\nEnhanced Characterness for Text Detection in the Wild\nAvoiding Echo-Responses in a Retrieval-Based Conversation System\nStructured Optimal Transport\nHotFlip: White-Box Adversarial Examples for NLP\nStable regularity for relational structures\nLearning Feature Representations for Keyphrase Extraction\nTrading the Twitter Sentiment with Reinforcement Learning\nLifelong Learning for Sentiment Classification\nPointlike sets for varieties determined by groups\nEvaluating neural network explanation methods using hybrid documents and  morphological prediction\nInvestigations on Knowledge Base Embedding for Relation Prediction and  Extraction\nNon-Projective Dependency Parsing via Latent Heads Representation (LHR)\nCross-topic Argument Mining from Heterogeneous Sources Using  Attention-based Neural Networks\nImplicit Argument Prediction with Event Knowledge\nSTRIPStream: Integrating Symbolic Planners and Blackbox Samplers\nEfficient Mendler-Style Lambda-Encodings in Cedille\nSemantical Equivalence of the Control Flow Graph and the Program  Dependence Graph\nBi-interpretability of a Free Monoid with the Arithmetic and  Applications\nCorpus Statistics in Text Classification of Online Data\nDefeasible Reasoning in SROEL: from Rational Entailment to Rational  Closure\nDeep Communicating Agents for Abstractive Summarization\nComputer-Assisted Text 
Analysis for Social Science: Topic Models and  Beyond\nAutomatic Generation of Chinese Short Product Titles for Mobile Display\nTraining Tips for the Transformer Model\nHigher-Order Bounded Model Checking\nEmotion Orientated Recommendation System for Hiroshima Tourist by Fuzzy  Petri Net\nA Dynamic Approach to Characterizing Termination of General Logic  Programs\nTopological sigma-models with H-flux and twisted generalized complex  manifolds\nOptimized Generation of Data-Path from C Codes for FPGAs\nOn generic properties of finitely presented monoids and semigroups\nTowards a Generic Framework to Generate Explanatory Traces of Constraint  Solving and Rule-Based Reasoning\nSpecific-to-General Learning for Temporal Events with Application to  Learning Event Definitions from Video\nBoltzmann samplers for random generation of lambda terms\nOn Periodicity and Complexity of Generalized Pseudostandard Words\nThe Influence of the Generator's License on Generated Artifacts\nThe Interaction of Memory and Attention in Novel Word Generalization: A  Computational Investigation\nGeneralized Metrics\nGeneralized Entropies and the Similarity of Texts\nSimTensor: A synthetic tensor data generator\nGenerative and Discriminative Text Classification with Recurrent Neural  Networks\nSkeleton Key: Image Captioning by Skeleton-Attribute Decomposition\nA Conditional Variational Framework for Dialog Generation\nFlexible and Creative Chinese Poetry Generation Using Neural Memory\nAdversarial Feature Matching for Text Generation\nDTATG: An Automatic Title Generator based on Dependency Trees\nRubyStar: A Non-Task-Oriented Mixture Model Dialog System\nACtuAL: Actor-Critic Under Adversarial Learning\nOn the Automatic Generation of Medical Imaging Reports\nTable-to-text Generation by Structure-aware Seq2seq Learning\nA Syntactic Approach to Domain-Specific Automatic Question Generation\nExploration on Generating Traditional Chinese Medicine Prescription from  Symptoms with an End-to-End 
method\nGeneric HKT geometries in the harmonic superspace approach\nShow, Tell and Discriminate: Image Captioning by Self-retrieval with  Partially Labeled Data\nCoT: Cooperative Training for Generative Modeling\nGRAMPAL: A Morphological Processor for Spanish implemented in Prolog\nMBT: A Memory-Based Part of Speech Tagger-Generator\nOT SIMPLE - a construction-kit approach to Optimality Theory  implementation\nCompetition-Induced Preferential Attachment\nArtificial Sequences and Complexity Measures\nAn Algebraic Programming Style for Numerical Software and its  Optimization\nRewriting Calculus: Foundations and Applications\nMaking Abstract Domains Condensing\nGeneric and Efficient Program Monitoring by trace analysis\nPractical Semantic Analysis of Web Sites and Documents\nDecidability of Type-checking in the Calculus of Algebraic Constructions  with Size Annotations\nGenerative Unbinding of Names\nA multiphysics and multiscale software environment for modeling  astrophysical systems\nProvenance Traces\nType-Safe Feature-Oriented Product Lines\nFractal Dimension for Fractal Structures\nAn extensible web interface for databases and its application to storing  biochemical data\nAmortised Resource Analysis with Separation Logic\nRegular Functions, Cost Register Automata, and Generalized Min-Cost  Problems\nPertinent Information retrieval based on Possibilistic Bayesian network  : origin and possibilistic perspective\nDisease processes as hybrid dynamical systems\nIdentification of Literary Movements Using Complex Networks to Represent  Texts\nPerfect orderings on Bratteli diagrams II: general Bratteli diagrams\nInference of Field-Sensitive Reachability and Cyclicity\nEvidence and plausibility in neighborhood structures\nSemantic Stability in Social Tagging Streams\nHeterogeneous Programming with Single Operation Multiple Data\nA Theory of Formal Synthesis via Inductive Learning\nTransforming Wikipedia into an Ontology-based Information Retrieval  Search Engine for 
Local Experts using a Third-Party Taxonomy\nKnowledge Base Population using Semantic Label Propagation\nInterprocedural Data Flow Analysis in Soot using Value Contexts\nInterface Reconciliation in Kahn Process Networks using CSP and SAT\nUsing Generic Summarization to Improve Music Information Retrieval Tasks\nDifferentiation with stratification: a principle of theoretical physics  in the tradition of the memory art\nModular Acquisition and Stimulation System for Timestamp-Driven  Neuroscience Experiments\nA Multi-layered Acoustic Tokenizing Deep Neural Network (MAT-DNN) for  Unsupervised Discovery of Linguistic Units and Generation of High Quality  Features\nA Neural Conversational Model\nBoosting Java Performance using GPGPUs\nA square root map on Sturmian words\nTowards Energy Consumption Verification via Static Analysis\nInfant directed speech is consistent with teaching\nPicture It In Your Mind: Generating High Level Visual Representations  From Textual Descriptions\nThe Applied Pi Calculus: Mobile Values, New Names, and Secure  Communication\nNeural Architecture Search with Reinforcement Learning\nLearning a Static Analyzer from Data\nOld Content and Modern Tools - Searching Named Entities in a Finnish  OCRed Historical Newspaper Collection 1771-1910\nA covariant Hamiltonian tetrad approach to numerical relativity\nWeb-based Semantic Similarity for Emotion Recognition in Web Objects\nAn Agglomeration Law for Sorting Networks and its Application in  Functional Programming\nBeam Search Strategies for Neural Machine Translation\nToward Abstraction from Multi-modal Data: Empirical Studies on Multiple  Time-scale Recurrent Models\nApplied Type System: An Approach to Practical Programming with  Theorem-Proving\nDomains for Higher-Order Games\nSNMP for Common Lisp\nRank-1 Constrained Multichannel Wiener Filter for Speech Recognition in  Noisy Environments\nVerification of Asynchronous Systems with an Unspecified Component\nLearning to Prove Safety over Parameterised 
Concurrent Systems (Full  Version)\nHow Deterministic are Good-For-Games Automata?\nUnsupervised Context-Sensitive Spelling Correction of English and Dutch  Clinical Free-Text with Word and Character N-Gram Embeddings\nStrategy Preserving Compilation for Parallel Functional Code\nOn modeling vagueness and uncertainty in data-to-text systems through  fuzzy sets\nJoint Sentiment/Topic Modeling on Text Data Using Boosted Restricted  Boltzmann Machine\nCollaborative Metric Learning Recommendation System: Application to  Theatrical Movie Releases\nField redefinitions in theories beyond Einstein gravity using the  language of differential forms\nA Characterization for Decidable Separability by Piecewise Testable  Languages\nGröbner methods for representations of combinatorial categories\nA Generic Method for Automatic Ground Truth Generation of  Camera-captured Documents\nLanguage is Physical\nDecision Problems for Recognizable Languages of Infinite Pictures\nForward and Backward Application of Symbolic Tree Transducers\nA survey of methods to ease the development of highly multilingual text  mining applications\nA Comparative Study of Programming Languages in Rosetta Code\nA Very Low Resource Language Speech Corpus for Computational Language  Documentation Experiments\nStatistical Parametric Speech Synthesis Incorporating Generative  Adversarial Networks\nJapanese Discourse and the Process of Centering\nOn the Notion of Proposition in Classical and Quantum Mechanics\nA Strongly Grounded Stable Model Semantics for Full Propositional  Language\nSecondary implementation of interactive engagement teaching techniques:  Choices and challenges in a Gulf Arab context\nUnderstanding Zipf's law of word frequencies through sample-space  collapse in sentence formation\nSemi-galois Categories I: The Classical Eilenberg Variety Theory\nGeoTextTagger: High-Precision Location Tagging of Textual Documents  using a Natural Language Processing Approach\nOn Grothendieck's construction of 
Teichmüller space\nRobust algorithms with polynomial loss for near-unanimity CSPs\nThe complexity of Boolean surjective general-valued CSPs\nMining actionable information from security forums: the case of  malicious IP addresses\nFunctional Dynamics II : Syntactic Structure\nTHE GRISHCHUK-ZELDOVICH EFFECT IN THE OPEN UNIVERSE\nSpin-Coefficient Form of the New Laws of Black-Hole Dynamics\nDifferential Forms and Wave Equations for General Relativity\nOpen String BRST Cohomology for Generalized Complex Branes\nA Model of Classical and Quantum Measurement\nJava Physics Generator and Analysis Modules\nRegularity conditions via generalized interiority notions in convex  optimization: new achievements and their relation to some classical  statements\nDeriving the Probabilistic Capacity of General Run-Length Sets Using  Generating Functions\nNumber Theories\nGeneral Relativity and Weyl Frames\nMDA-based ATL transformation to generate MVC 2 web models\nDesigning a CPU model: from a pseudo-formal document to fast code\nVariable types for meaning assembly: a logical syntax for generic noun  phrases introduced by most\nPerfXplain: Debugging MapReduce Job Performance\nA new approach of designing Multi-Agent Systems\nAutomated Word Puzzle Generation via Topic Dictionaries\nEntailment in Probability of Thresholded Generalizations\nReduce Meaningless Words for Joint Chinese Word Segmentation and  Part-of-speech Tagging\nStructured Generative Models of Natural Source Code\nSequence to Sequence -- Video to Text\nPositive Alexander Duality for Pursuit and Evasion\nAbstract Learning Frameworks for Synthesis\nA Hiking Trip Through the Orders of Magnitude: Deriving Efficient  Generators for Closed Simply-Typed Lambda Terms and Normal Forms\nA Generalized Kahn Principle for Abstract Asynchronous Networks\nOn the algebraicity of generalized power series\nWhat to talk about and how? 
Selective Generation using LSTMs with  Coarse-to-Fine Alignment\nGenerating Weather Forecast Texts with Case Based Reasoning\nOn the Uniform Random Generation of Non Deterministic Automata Up to  Isomorphism\nBeyond Caption To Narrative: Video Captioning With Multiple Sentences\nA Hierarchical Latent Variable Encoder-Decoder Model for Generating  Dialogues\nGenerative Choreography using Deep Learning\nAn Attentional Neural Conversation Model with Improved Specificity\nDeep Reinforcement Learning for Dialogue Generation\nNeural Machine Translation with External Phrase Memory\nGeometrothermodynamics for Black holes and de Sitter Space\nSequence to Backward and Forward Sequences: A Content-Introducing  Approach to Generative Short-Text Conversation\nAnchored Correlation Explanation: Topic Modeling with Minimal Domain  Knowledge\nGenerating Code Summaries Using the Power of the Crowd\nDeep Active Learning for Dialogue Generation\nA Note on One Less Known Class of Generated Residual Implications\nAbstractive Headline Generation for Spoken Content by Attentive  Recurrent Neural Networks with ASR Error Modeling\nHierarchical Recurrent Attention Network for Response Generation\nParseIT: A Question-Answer based Tool to Learn Parsing Techniques\nEmotional Chatting Machine: Emotional Conversation Generation with  Internal and External Memory\nAssigning personality/identity to a chatting machine for coherent  conversation generation\nChallenges in Data-to-Document Generation\nData Sets: Word Embeddings Learned from Tweets and General Data\nGenerating Thematic Chinese Poetry using Conditional Variational  Autoencoders with Hybrid Decoders\nGenerative Adversarial Network for Abstractive Text Summarization\nGenerating High-Quality Query Suggestion Candidates for Task-Based  Search\nLearning Hyperedge Replacement Grammars for Graph Generation\nQuery and Output: Generating Words by Querying Distributed Word  Representations for Paraphrase Generation\nImage Generation from Scene 
Graphs\nGroups with Context-Free Co-Word Problem and Embeddings into Thompson's  Group $V$\nHigh-Level Why-Not Explanations using Ontologies\nStructured Query Language for Virtual Observatory\nSpelling Correction in Agglutinative Languages\nUsing Chinese Text Processing Technique for the Processing of Sanskrit  Based Indian Languages: Maximum Resource Utilization and Maximum  Compatibility\nIntegrated speech and morphological processing in a connectionist  continuous speech understanding for Korean\nA Semantics-based Communication System for Dysphasic Subjects\nTemporal Meaning Representations in a Natural Language Front-End\nThe OLAC Metadata Set and Controlled Vocabularies\nSeven Dimensions of Portability for Language Documentation and  Description\nThe Study of the Application of a Keywords-based Chatbot System on the  Teaching of Foreign Languages\nLogic-Based Specification Languages for Intelligent Software Agents\nThe UPLNC Compiler: Design and Implementation\nMapping the Object-Role Modeling language ORM2 into Description Logic  language DLRifd\nDemographic growth and the distribution of language sizes\nAn omega-Power of a Finitary Language Which is a Borel Set of Infinite  Rank\nNon-Deterministic Communication Complexity of Regular Languages\nThe Mob core language and abstract machine (rev 0.2)\nDynamic Complexity of Formal Languages\nOntology-Based Annotation of Multimedia Language Data for the Semantic  Web\nOn Recognizable Tree Languages Beyond the Borel Hierarchy\nComplexity of countable categoricity in finite languages\nOn the Hairpin Incompletion\nThe power of linear programming for valued CSPs: a constructive  characterization\nThe fundamentals of relations language mathematics\nOn the complexity of learning a language: An improvement of Block's  algorithm\nBenchmarking Usability and Performance of Multicore Languages\nDescribing groups using first-order language\nMashup of Meta-Languages and its Implementation in the Kermeta Language  
Workbench\nCornell SPF: Cornell Semantic Parsing Framework\nTowards The Development of a Bishnupriya Manipuri Corpus\nProbabilistic Programming Concepts\nAn efficiency dependency parser using hybrid approach for tamil language\nOn Upper and Lower Bounds on the Length of Alternating Towers\nAzhary: An Arabic Lexical Ontology\nToward a new language of legal drafting\nLarger-Context Language Modelling\nSystematically Deriving Domain-Specific Transformation Languages\nThe languages of actions, formal grammars and qualitive modeling of  companies\nWhat to do about non-standard (or non-canonical) language in NLP\nThe non-abelian squares are not context-free\nOperational characterization of scattered MCFLs -- Technical Report\nMontiCore: a Framework for Compositional Development of Domain Specific  Languages\nIntegrated Definition of Abstract and Concrete Syntax for Textual  Languages\nIndex problems for game automata\nAn Automata Theoretic Approach to the Zero-One Law for Regular  Languages: Algorithmic and Logical Aspects\nNode Selection Query Languages for Trees\nA Survey of the State of the Art in Data Mining and Integration Query  Languages\nCross-Lingual Morphological Tagging for Low-Resource Languages\nLabeling of Query Words using Conditional Random Field\nJoint Online Spoken Language Understanding and Language Modeling with  Recurrent Neural Networks\nMulti-task Recurrent Model for True Multilingual Speech Recognition\nForeign-language Reviews: Help or Hindrance?\nKU-ISPL Speaker Recognition Systems under Language mismatch condition  for NIST 2016 Speaker Recognition Evaluation\nComparison of Modified Kneser-Ney and Witten-Bell Smoothing Techniques  in Statistical Language Model of Bahasa Indonesia\nDo Neural Nets Learn Statistical Laws behind Natural Language?\nLearning Language Representations for Typology Prediction\nNeural Language Modeling by Jointly Learning Syntax and Lexicon\nPhonemic and Graphemic Multilingual CTC Based Speech Recognition\nA Class of 
Automatic Sequences\nThe language (and series) of Hammersley-type processes\nUnboundedness problems for languages of vector addition systems\nMultilingual bottleneck features for subword modeling in zero-resource  languages\nAn Empirical Comparison of Probability Models for Dependency Grammar\nMultiresolution analysis of electronic structure: semicardinal and  wavelet bases\nOn Ising and dimer models in two and three dimensions\nConstraint Programming viewed as Rule-based Programming\nConstraint Exploration and Envelope of Simulation Trajectories\nThe similarity metric\nUsing Tree Automata and Regular Expressions to Manipulate Hierarchically  Structured Data\nAn Improved k-Nearest Neighbor Algorithm for Text Categorization\nSummarization from Medical Documents: A Survey\nA Knowledge-Based Approach for Selecting Information Sources\nClassdesc and Graphcode: support for scientific programming in C++\nField Theory of the Electron, Spin and Zitterbewegung\nSupersymmetry without Supersymmetry\nHeisenberg Honeycombs Solve Veneziano Puzzle\nTowards a necessary change in the mathematical principles of natural  philosophy\nNumerical Methods as an Integrated Part of Physics Education\nThe Interpretation of Quantum Mechanics: Many Worlds or Many Words?\nCounting, Fanout, and the Complexity of Quantum ACC\nUsing RDF to Model the Structure and Process of Systems\nA case study of the difficulty of quantifier elimination in constraint  databases: the alibi query in moving object databases\nThe large deviation approach to statistical mechanics\nThe Complexity of Enriched Mu-Calculi\nOn external presentations of infinite graphs\nSome Remarks on the Model Theory of Epistemic Plausibility Models\nAutomatic Music Composition using Answer Set Programming\nThe physical language of molecular codes: A rate-distortion approach to  the evolution and emergence of biological codes\nThe Conceptual Integration Modeling Framework: Abstracting from the  Multidimensional Model\nComplexity of 
Non-Monotonic Logics\nProceedings International Workshop on Strategies in Rewriting, Proving,  and Programming\nA Context-theoretic Framework for Compositionality in Distributional  Semantics\nRecovering Quantum Logic within an Extended Classical Framework\nSome works of Furtwängler and Vandiver revisited and Fermat's last  theorem\nCounting Homomorphisms and Partition Functions\nOBDD-based Universal Planning for Synchronized Agents in  Non-Deterministic Domains\nHigher Order Programming to Mine Knowledge for a Modern Medical Expert  System\nA Scalable Video Search Engine Based on Audio Content Indexing and Topic  Segmentation\nConstraint Satisfaction Tractability from Semi-lattice Operations on  Infinite Sets\nOn accuracy of mathematical languages used to deal with the Riemann zeta  function and the Dirichlet eta function\nFuzzy Time in LTL\nYou had me at hello: How phrasing affects memorability\nOCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram  Data Set\nFinding Structure in Text, Genome and Other Symbolic Sequences\nFirst steps in synthetic guarded domain theory: step-indexing in the  topos of trees\n1 Billion Pages = 1 Million Dollars? 
Mining the Web to Play \"Who Wants  to be a Millionaire?\"\nComplex networks analysis of language complexity\nGood Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts\nA practical approach to ontology-enabled control systems for  astronomical instrumentation\nA quantum linguistic characterization of the reverse relation between  confidence interval and hypothesis testing\nRemoving Dynamic Type Tests with Context-Driven Basic Block Versioning\nContinuous Speech Recognition Based on Deterministic Finite Automata  Machine using Utterance and Pitch Verification\nTelling Breaking News Stories from Wikipedia with Social Multimedia: A  Case Study of the 2014 Winter Olympics\nPagination: It's what you say, not how long it takes to say it\nLifted Variable Elimination for Probabilistic Logic Programming\nBenchmarking Named Entity Disambiguation approaches for Streaming Graphs\nTowards a Visual Turing Challenge\nEfficient reduction of Kappa models by static inspection of the rule-set\nA Canonical Form for Weighted Automata and Applications to Approximate  Minimization\nA domain-level DNA strand displacement reaction enumerator allowing  arbitrary non-pseudoknotted secondary structures\nEffect-Dependent Transformations for Concurrent Programs\nTelemedicine as a special case of Machine Translation\nFrom F to DOT: Type Soundness Proofs with Definitional Interpreters\nSymbol Grounding Association in Multimodal Sequences with Missing  Elements\nCompositional Memory for Visual Question Answering\nThe Mechanism of Additive Composition\nAchieving Open Vocabulary Neural Machine Translation with Hybrid  Word-Character Models\nModelling Student Behavior using Granular Large Scale Action Data from a  MOOC\nLiving in Parallel Realities -- Co-Existing Schema Versions with a  Bidirectional Database Evolution Language\nExpressibility in the Lambda Calculus with mu\nSelection and Influence in Cultural Dynamics\nProof Pad: A New Development Environment for ACL2\nWhy is 
combinatorial communication rare in the natural world, and why is  language an exception to this trend?\nRegular Combinators for String Transformations\nConversational Sensing\nOn The Reachability Problem for Recursive Hybrid Automata with One and  Two Players\nHuman Communication Systems Evolve by Cultural Selection\nInformation topology identifies emergent model classes\nSolving 3-Color Parity Games in $ O(n^2) $ Time\nIdentifying missing dictionary entries with frequency-conserving context  models\nBounding linear head reduction and visible interaction through skeletons\nPhotonics design tool for advanced CMOS nodes\nA Critical Review of Recurrent Neural Networks for Sequence Learning\nWordRank: Learning Word Embeddings via Robust Ranking\nLeveraging Word Embeddings for Spoken Document Summarization\nOrdering Interrogative Questions for Effective Requirements Engineering:  The W6H Pattern\nOn Practical SMT-Based Type Error Localization\nReasoning about Entailment with Neural Attention\nTopic segmentation via community detection in complex networks\nKauffman's adjacent possible in word order evolution\nAn Iterative Deep Learning Framework for Unsupervised Discovery of  Speech Features and Linguistic Units with Applications on Spoken Term  Detection\nThe Exception that Improves the Rule\nDialog state tracking, a machine reading approach using Memory Network\nA Novel Projected Two Binary Variables Formulation for Unit Commitment  Problem\nGeneralizing to Unseen Entities and Entity Pairs with Row-less Universal  Schema\nVideoMCC: a New Benchmark for Video Comprehension\nEnriching Linked Datasets with New Object Properties\nLog-Linear RNNs: Towards Recurrent Neural Networks with Flexible Prior  Knowledge\nCFGs-2-NLU: Sequence-to-Sequence Learning for Mapping Utterances to  Semantics and Pragmatics\nExtending OMNeT++ Towards a Platform for the Design of Future In-Vehicle  Network Architectures\nA Study of Factuality, Objectivity and Relevance: Three Desiderata in  
Large-Scale Information Retrieval?\nScalable Machine Translation in Memory Constrained Environments\nA Practical Approach to Interval Refinement for math.h/cmath Functions\nFencing off Go: Liveness and Safety for Channel-based Programming  (extended version)\nNeuro-Symbolic Program Synthesis\nOn Proving Confluence Modulo Equivalence for Constraint Handling Rules\nLearning to Distill: The Essence Vector Modeling Framework\nDomain Adaptation for Named Entity Recognition in Online Media with Word  Embeddings\nTracking the World State with Recurrent Entity Networks\nLearning to Hash-tag Videos with Tag2Vec\nStream Fusion, to Completeness\nVariations on Variants\nFuzzy Based Implicit Sentiment Analysis on Quantitative Sentences\nMinimal theory of quasidilaton massive gravity\ncheckmate: Fast Argument Checks for Defensive R Programming\nDisruptive Behavior Disorder (DBD) Rating Scale for Georgian Population\nLoyalty in Online Communities\nSymbol Grounding via Chaining of Morphisms\nJoint Probabilistic Linear Discriminant Analysis\nLoop Quasi-Invariant Chunk Motion by peeling with statement composition\nStream Processing using Grammars and Regular Expressions\nCommunity Identity and User Engagement in a Multi-Community Landscape\nGated Recurrent Neural Tensor Network\nAccelerating Innovation Through Analogy Mining\nSynthesis of Data Completion Scripts using Finite Tree Automata\nInformation-gain computation\nTruly Sub-cubic Algorithms for Language Edit Distance and RNA Folding  via Fast Bounded-Difference Min-Plus Product\nProgressive Joint Modeling in Unsupervised Single-channel Overlapped  Speech Recognition\nProbabilistic Graphical Models for Credibility Analysis in Evolving  Online Communities\nLearning to Blame: Localizing Novice Type Errors with Data-Driven  Diagnosis\nLinear-time Temporal Logic with Event Freezing Functions\nA Deep Reinforcement Learning Chatbot\nHorndeski extension of the minimal theory of quasidilaton massive  gravity\nRobustness Analysis of 
Visual QA Models by Basic Questions\nThe BURCHAK corpus: a Challenge Data Set for Interactive Learning of  Visually Grounded Word Meanings\nThe Semantics of Transactions and Weak Memory in x86, Power, ARMv8, and  C++\n$A^{4}NT$: Author Attribute Anonymity by Adversarial Training of Neural  Machine Translation\nInternalising Interaction Protocols as First-Class Programming Elements  in Multi Agent Systems\nPolitical Polarization in Social Media: Analysis of the \"Twitter  Political Field\" in Japan\nImproving the Accuracy of Pre-trained Word Embeddings for Sentiment  Analysis\nModelling Domain Relationships for Transfer Learning on Retrieval-based  Question Answering Systems in E-commerce\nVisual Features for Context-Aware Speech Recognition\nGrounding Referring Expressions in Images by Variational Context\nTowards a science of human stories: using sentiment analysis and  emotional arcs to understand the building blocks of complex social systems\nCognitive Database: A Step towards Endowing Relational Databases with  Artificial Intelligence Capabilities\nComparative Opinion Mining: A Review\nA Deep Reinforcement Learning Chatbot (Short Version)\nContinuous Space Reordering Models for Phrase-based MT\nMatching Long Text Documents via Graph Convolutional Networks\nRecurrent Neural Network Attention Mechanisms for Interpretable System  Log Anomaly Detection\nUsing RuleBuilder to graphically define and visualize BioNetGen-language  patterns and reaction rules\neSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing\nWhat we talk about when we talk about monads\nDetecting Malicious PowerShell Commands using Deep Neural Networks\nMost Complex Deterministic Union-Free Regular Languages\nProactive Empirical Assessment of New Language Feature Adoption via  Automated Refactoring: The Case of Java 8 Default Methods\nConstruction of a Bilingual Dictionary Intermediated by a Third Language\nRapid Development of Morphological Descriptions for Full Language  Processing 
Systems\nA fast partial parse of natural language sentences using a connectionist  method\nConstraint Logic Programming for Natural Language Processing\nMAXIMUM LIKELIHOOD AND MINIMUM ENTROPY IDENTIFICATION OF GRAMMARS\nNew Techniques for Context Modeling\nCluster Expansions and Iterative Scaling for Maximum Entropy Language  Models\nNatural language processing: she needs something old and something new  (maybe something borrowed and something blue, too)\nUsing Terminological Knowledge Representation Languages to Manage  Linguistic Resources\nHead Automata for Speech Translation\nLearning Translation Rules From A Bilingual Corpus\nIncorporating POS Tagging into Language Modeling\nMultilingual phonological analysis and speech synthesis\nWhat is word sense disambiguation good for?\nAnnotation Style Guide for the Blinker Project\nLuaJava - A Scripting Tool for Java\nLanguage Identification With Confidence Limits\nComparative Analysis of Five XML Query Languages\nNLTK: The Natural Language Toolkit\nUsing the DIFF Command for Natural Language Processing\nCorpus structure, language models, and ad hoc information retrieval\nLinguistically Grounded Models of Language Change\nSelf-similarity in the taxonomic classification of human languages\nThe Ising Model for Changes in Word Ordering Rule in Natural Language\nNon-equilibrium and Irreversible Simulation of Competition among  Languages\nOn the class of languages recognizable by 1-way quantum finite automata\nUsing conceptual metaphor and functional grammar to explore how language  used in physics affects student learning\nDo language change rates depend on population size?\nThe physics of randomness and regularities for languages (lifetimes,  family trees, and the second languages); in terms of random matrices\nECOLANG - Communications Language for Ecological Simulations Network\nGeneralised sequential crossover of words and languages\nVariable elimination for building interpreters\nParikh's Theorem: A simple and direct 
automaton construction\nComparative study of the Pros and Cons of Programming languages Java,  Scala, C++, Haskell, VB .NET, AspectJ, Perl, Ruby, PHP & Scheme - a Team 11  COMP6411-S10 Term Report\nComparative Studies of 10 Programming Languages within 10 Diverse  Criteria - a Team 10 COMP6411-S10 Term Report\nCAL: A Language for Aggregating Functional and Extrafunctional  Constraints in Streaming Networks\nQuECT: A New Quantum Programming Paradigm\nWhy is language well-designed for communication? (Commentary on  Christiansen and Chater: 'Language as shaped by the brain')\nLanguage Support for Declarative Future Commitments\nA Note on Undecidability of Observation Consistency for Non-Regular  Languages\nAlgebraic Characterization of the Class of Languages recognized by  Measure Only Quantum Automata\nDeveloping a model for a text database indexed pedagogically for  teaching the Arabic language\nIndexed realizability for bounded-time programming with references and  type fixpoints\nAdaptation of pedagogical resources description standard (LOM) with the  specificity of Arabic language\nUniversal Numeric Segment Display for Indian Scheduled Languages: an  Architectural View\nEvaluation of Computational Grammar Formalisms for Indian Languages\nRewrite Closure and CF Hedge Automata\nA connection between concurrency and language theory\nEasyTime++: A case study of incremental domain-specific language  development\nQuantum finite automata and linear context-free languages: a decidable  problem\nImproving the quality of Gujarati-Hindi Machine Translation through  part-of-speech tagging and stemmer-assisted transliteration\nAcceptance conditions for omega-languages and the Borel hierarchy\nDescription Logics based Formalization of Wh-Queries\nThe existential fragment of S1S over element and successor is the  co-Buchi languages\nEvaluation and Ranking of Machine Translated Output in Hindi Language  using Precision and Recall Oriented Metrics\nAnnotated imports\nA Structural 
Query System for Han Characters\n\"Translation can't change a name\": Using Multilingual Data for Named  Entity Recognition\nTurkish Text Retrieval Experiments Using Lemur Toolkit\nSupervised learning model for parsing Arabic language\nTopological Entropy of Formal Languages\nLanguage discrimination and clustering via a neural network approach\nOne model, two languages: training bilingual parsers with harmonized  treebanks\nLogic for Unambiguous Context-Free Languages\nTowards Reversible Computation in Erlang\nStatistical Sign Language Machine Translation: from English written text  to American Sign Language Gloss\nMachine Translation Systems in India\nInvitation to Ezhil: A Tamil Programming Language for Early  Computer-Science Education\nSyntactic Complexity of Suffix-Free Languages\nA Unified Deep Neural Network for Speaker and Language Recognition\nThe Prose Storyboard Language: A Tool for Annotating and Directing  Movies\nOn the star-height of factor counting languages and their relationship  to Rees zero-matrix semigroups\nLearning Executable Semantic Parsers for Natural Language Understanding\nNatural Language Semantics and Computability\nA Free Energy Foundation of Semantic Similarity in Automata and  Languages\nWord Representation Models for Morphologically Rich Languages in Neural  Machine Translation\nThe word entropy of natural languages\nAn Unsupervised Probability Model for Speech-to-Translation Alignment of  Low-Resource Languages\nStatistical Properties of European Languages and Voynich Manuscript  Analysis\nImproving Neural Language Models with a Continuous Cache\nIncorporating Language Level Information into Acoustic Models\nLIDE: Language Identification from Text Documents\nFixing the Infix: Unsupervised Discovery of Root-and-Pattern Morphology\nLanguages of Play: Towards semantic foundations for game interfaces\nSpecifying Graph Languages with Type Graphs\nRankPL: A Qualitative Probabilistic Programming Language\nUnderstanding Abuse: A Typology 
of Abusive Language Detection Subtasks\nIncluding Dialects and Language Varieties in Author Profiling\nA Simple Language Model based on PMI Matrix Approximations\nSyllable-level Neural Language Model for Agglutinative Language\nMTIL17: English to Indian Langauge Statistical Machine Translation\nInput-to-Output Gate to Improve RNN Language Models\nCalcuList: a Functional Language Extended with Imperative Features\nLearning the Past Tense of English Verbs: The Symbolic Pattern  Associator vs. Connectionist Models\nTermination analysis of logic programs using acceptability with general  term orders\nEasy and Hard Constraint Ranking in OT: Algorithms and Complexity\nAcceptability with general orderings\nModel-Based Software Engineering and Ada: Synergy for the Development of  Safety-Critical Systems\nAnalysis of Titles and Readers For Title Generation Centered on the  Readers\nTowards Automated Generation of Scripted Dialogue: Some Time-Honoured  Strategies\nStatistical Machine Translation by Generalized Parsing\nProbabilistic Automata for Computing with Words\nDimensionally Democratic Calculus and Principles of Polydimensional  Physics\nGauge and General Relativity\nToda Lattice Hierarchy and Generalized String Equations\nA Generalized Information Formula as the Bridge between Shannon and  Popper\nFourier-Based Spectral Analysis with Adaptive Resolution\nStacks in canonical RNA pseudoknot structures\nOn (Omega-)Regular Model Checking\nGeneral Relativistic Rotation Curves in a Post-Newtonian Light\nUnfolding Mixed-Symmetry Fields in AdS and the BMV Conjecture: I.  General Formalism\nA D.C. 
Programming Approach to the Sparse Generalized Eigenvalue Problem\nTime and symmetry in models of economic markets\nLimits of relatively hyperbolic groups and Lyndon's completions\nDynamically Generated Interfaces in XML Based Architecture\nContext-free pairs of groups I: Context-free pairs and graphs\nGraphes, moyennabilité et bas du spectre de variétés  topologiquement infinies\nSemantic annotation of requirements for automatic UML class diagram  generation\nSciduction: Combining Induction, Deduction, and Structure for  Verification and Synthesis\nGeneralized sequential tree-reweighted message passing\nSubalgebras of FA-presentable algebras\nGeneral-Purpose MCMC Inference over Relational Structures\nN=2 vacua in Generalized Geometry\nGeneral relativistic statistical mechanics\nA General Framework for the Derivation of Regular Expressions\nThe Tree Width of Separation Logic with Recursive Definitions\nA Generic Scheme and Properties of Bidirectional Transformations\nInflated Cauchy Filters - A Way to Construct the Completion of a General  Uniform Space\nTime-dependent Hierarchical Dirichlet Model for Timeline Generation\nGeneralizers: New Metaobjects for Generalized Dispatch\nExplain Images with Multimodal Recurrent Neural Networks\nA Novel Design of a Parallel Machine Learnt Generational Garbage  Collector\nSpacetime and observer space symmetries in the language of Cartan  geometry\nOn End-to-End Program Generation from User Intention by Deep Neural  Networks\nGenerative Concatenative Nets Jointly Learn to Write and Classify  Reviews\nNeural Variational Inference for Text Processing\nNeural Headline Generation with Sentence-wise Optimization\nGrowing Graphs with Hyperedge Replacement Graph Grammars\nContext Gates for Neural Machine Translation\nThe Generalized A* Architecture\nA Sufficient Condition for Hanna Neumann Property of Submonoids of a  Free Monoid\nGenerating Extractive Summaries of Scientific Paradigms\nExtended Recommendation Framework: Generating the 
Text of a User Review  as a Personalized Summary\nNeural Responding Machine for Short-Text Conversation\nThe geometry of generalized force matching in coarse-graining and  related information metrics\nAligning where to see and what to tell: image caption with region-based  attention and scene factorization\nSynchronization of Bernoulli sequences on shared letters\nA Neural Network Approach to Context-Sensitive Generation of  Conversational Responses\nA Generative Model for Multi-Dialect Representation\nNeural Generative Question Answering\nHorizon Shells and BMS-like Soldering Transformations\nArgumentation Mining in User-Generated Web Discourse\nGeneralized minimum dominating set and application in automatic text  summarization\nHow NOT To Evaluate Your Dialogue System: An Empirical Study of  Unsupervised Evaluation Metrics for Dialogue Response Generation\nWhich Learning Algorithms Can Generalize Identity-Based Rules to Novel  Inputs?\nProgramming Patterns in Dataflow Matrix Machines and Generalized  Recurrent Neural Nets\nOpen Information Extraction\nNeural Contextual Conversation Learning with Labeled Question-Answering  Pairs\nAn Actor-Critic Algorithm for Sequence Prediction\nSPICE: Semantic Propositional Image Caption Evaluation\nNeural Paraphrase Generation with Stacked Residual LSTM Networks\nNeural Machine Translation Advised by Statistical Machine Translation\nPrecondition Inference for Peephole Optimizations in LLVM\nEnvironmental Bisimulations for Delimited-Control Operators with Dynamic  Prompt Generation\nGenerating High-Quality and Informative Conversation Responses with  Sequence-to-Sequence Models\nJoin irreducible semigroups\nLearning Discourse-level Diversity for Neural Dialog Models using  Conditional Variational Autoencoders\nImproved Training of Wasserstein GANs\nGenerating Representative Executions [Extended Abstract]\nLearning Latent Representations for Speech Generation and Transformation\nDeep Keyphrase Generation\nOptimizing Memory 
Efficiency for Convolution Kernels on Kepler GPUs\nArtificial Error Generation with Machine Translation and Syntactic  Patterns\nDetecting and Explaining Causes From Text For a Time Series Event\nScene Graph Generation from Objects, Phrases and Region Captions\nNeural Rating Regression with Abstractive Tips Generation for  Recommendation\nCommunity Targeted Spam: A Middle Ground Between General Spam and Spear  Phishing\nLook-ahead Attention for Generation in Neural Machine Translation\nA Generative Model For Zero Shot Learning Using Conditional Variational  Autoencoders\nVariant-Based Decidable Satisfiability in Initial Algebras with  Predicates\nEinstein-Gauss-Bonnet theory of gravity : The Gauss-Bonnet-Katz boundary  term\nCode Attention: Translating Code to Comments by Exploiting Domain  Features\nGeneralization without systematicity: On the compositional skills of  sequence-to-sequence recurrent networks\nDynamic Graph Generation Network: Generating Relational Knowledge from  Diagrams\nGeneralizing inference systems by coaxioms\nQuery2Vec: An Evaluation of NLP Techniques for Generalized Workload  Analytics\nGenerating Wikipedia by Summarizing Long Sequences\nSemi-Amortized Variational Autoencoders\nFirst Order Generative Adversarial Networks\nGeneral Video Game AI: a Multi-Track Framework for Evaluating Agents,  Games and Content Generation Algorithms\nChern-Weil theorem, Lovelock Lagrangians in critical dimensions and  boundary terms in gravity actions\nTwo can play this Game: Visual Dialog with Discriminative Question  Generation and Answering\nSpeech waveform synthesis from MFCC sequences with generative  adversarial networks\nData2Vis: Automatic Generation of Data Visualizations Using  Sequence-to-Sequence Recurrent Neural Networks\nScalable Factorized Hierarchical Variational Autoencoder Training\nTyped Generic Traversal With Term Rewriting Strategies\nHermitian versus holomorphic complex and quaternionic generalized  supersymmetries of the M-theory. 
A classification\nFunctor is to Lens as Applicative is to Biplate: Introducing Multiplate\nGeneric Fibrational Induction\nScalar torsion and a new symmetry of general relativity\nA Case for Dynamic Reverse-code Generation to Debug Non-deterministic  Programs\nGenerating Abstractive Summaries from Meeting Transcripts\nGenerating Candidate Busy Beaver Machines (Or How to Build the Zany Zoo)\nSession Types as Generic Process Types\nThe JRC-Acquis: A multilingual aligned parallel corpus with 20+  languages\nLanguage, Emotions, and Cultures: Emotional Sapir-Whorf Hypothesis\nA practical approach to language complexity: a Wikipedia case study\nFirst-Order Quantifiers and the Syntactic Monoid of Height Fragments of  Picture Languages\nSchemas for Unordered XML on a DIME\nImproving Statistical Machine Translation for a Resource-Poor Language  Using Related Resource-Rich Languages\nThe statistical trade-off between word order and word structure -  large-scale evidence for the principle of least effort\nAmbiguity in language networks\nDefining relations on graphs: how hard is it in the presence of node  partitions?\nProbabilistic Modelling of Morphologically Rich Languages\nNews Across Languages - Cross-Lingual Document Similarity and Event  Tracking\nLanguage Models of Spoken Dutch\nA Resource-Light Method for Cross-Lingual Semantic Textual Similarity\nEssential equivalence of the GENERIC and Steepest Entropy Ascent models  of dissipation for non-equilibrium thermodynamics\nGeneralized Homogeneous Polynomials for Efficient Template-Based  Nonlinear Invariant Synthesis\nInversion Polynomials for Permutations Avoiding Consecutive Patterns\nRecurrent Topic-Transition GAN for Visual Paragraph Generation\nAre You Talking to Me? 
Reasoned Visual Dialog Generation through  Adversarial Learning\nOffline Specialisation in Prolog Using a Hand-Written Compiler Generator\nReasoning about Algebraic Data Types with Abstractions\nGeneralized Graph Pattern Matching\nComplexity of Suffix-Free Regular Languages\nThe Difficulties of Learning Logic Programs with Cut\nOn Modular Termination Proofs of General Logic Programs\nSemantic filtering by inference on domain knowledge in spoken dialogue  systems\nReverse Engineering Ontology to Conceptual Data Models\nTermination Analysis of General Logic Programs for Moded Queries: A  Dynamic Approach\nA Generic Global Constraint based on MDDs\nModelling general relativistic perfect fluids in field theoretic  language\nReconstruction of Black Hole Metric Perturbations from Weyl Curvature  II: The Regge-Wheeler gauge\nThe Information Geometry of Space and Time\nNew Tools for Fermion Masses from Extra Dimensions\nON THE EXTENDED POINCARE POLYNOMIAL\nFat Euclidean Gravity with Small Cosmological Constant\nNotes on Certain (0,2) Correlation Functions\nSupersymmetric AdS Backgrounds in String and M-theory\nNotes on certain other (0,2) correlation functions\nCombing nilpotent and polycyclic groups\nGluing theorems for complete anti-self-dual spaces\nAlgebraic orbifold quantum products\nA general intersection formula for Lagrangian cycles\nEffective JSJ Decompositions\nSaturated chains in composition posets\nAlgebraic G-functions associated to matrices over a group-ring\nMatrix Graph Grammars\nCombining generic judgments with recursive definitions\nCosmological Radar Ranging in an Expanding Universe\nLogical Reasoning for Higher-Order Functions with Local State\nTheory of Zipf's Law and of General Power Law Distributions with  Gibrat's law of Proportional Growth\nUNL-French deconversion as transfer & generation from an interlingua  with possible quality enhancement through offline human interaction\nAlgebraic mechanics as an accessible toy model demonstrating entropy  
generation from reversible microscopic dynamics\nA Generalized Carpenter's Rule Theorem for Self-Touching Linkages\nAvoiding Squares and Overlaps Over the Natural Numbers\nCoherence for rewriting 2-theories\nTermination Prediction for General Logic Programs\nBraided Categorical Quantum Mechanics I\nTime-Varying Autoregressions in Speech: Detection Theory and  Applications\nCovariant star product on symplectic and Poisson spacetime manifolds\nAutomatic Generation of Proof Tactics for Finite-Valued Logics\nComputing Critical Pairs in 2-Dimensional Rewriting Systems\nOn Testing Constraint Programs\nAdvances in Modeling of Scanning Charged-Particle-Microscopy Images\nSemihyperrings Characterized by Their Hyperideals\nThe General Vector Addition System Reachability Problem by Presburger  Inductive Invariants\nWeighted random generation of context-free languages: Analysis of  collisions in random urn occupancy models\nTermination Casts: A Flexible Approach to Termination with General  Recursion\nGeneralized Thue-Morse words and palindromic richness\nFrom automatic structures to automatic groups\nMeasuring Intelligence through Games\nComplex dynamics in learning complicated games\nOn a Generalization of Zaslavsky's Theorem for Hyperplane Arrangements\nC++ Standard Template Library by template specialized containers\nA MDA approach for defining WS-Policy semantic non-functional properties\nInitial Semantics for Strengthened Signatures\nStructured general corecursion and coinductive graphs [extended  abstract]\nAgent-time Epistemics and Coordination\nThe Complexity of Monotone Hybrid Logics over Linear Frames and the  Natural Numbers\nAutomating embedded analysis capabilities and managing software  complexity in multiphysics simulation part I: template-based generic  programming\nInteractive visualization of a thin disc around a Schwarzschild black  hole\nUtilizing Static Analysis and Code Generation to Accelerate Neural  Networks\nCompletely reducible sets\nLinear-use CPS 
translations in the Enriched Effect Calculus\nFractional Laplacian on the torus\nQuantum field theory on affine bundles\nExtending the logical update view with transaction support\nGeneralized Counting Constraint Satisfaction Problems With Determinantal  Circuits\nOne-variable word equations in linear time\nAn Empirical Study of Path Feasibility Queries\nDesign and implementation of the NaI (Tl)CsI (Na) detectors output  signal generator\nAlmost local generation of EPR entanglement in non-equilibrium\nMean-payoff Games with Partial Observation\nSailfish: a flexible multi-GPU implementation of the lattice Boltzmann  method\nDimensions in non-Archimedean geometries\nSynchronizing weighted automata\nProfinite automata\nFactor Complexity of S-adic sequences generated by the  Arnoux-Rauzy-Poincaré Algorithm\nNews-Based Group Modeling and Forecasting\nA new approach to the $2$-regularity of the $\\ell$-abelian complexity of  $2$-automatic sequences\nKahler: An Implementation of Discrete Exterior Calculus on Hermitian  Manifolds\nProgram Synthesis and Linear Operator Semantics\nOMP2HMPP: HMPP Source Code Generation from Programs with Pragma  Extensions\nParameterized Complexity of CTL: A Generalization of Courcelle's Theorem\nFreeness of automata groups vs boundary dynamics\nContract-Based General-Purpose GPU Programming\nRational growth in the Heisenberg group\nGeneralized quantum gravity condensates for homogeneous geometries and  cosmology\nYet Another Way of Building Exact Polyhedral Model for Weakly Dynamic  Affine Programs\nOn the structure of Schnyder woods on orientable surfaces\nDecomposing Nekrasov Decomposition\nTop-down Tree Long Short-Term Memory Networks\nGeneration and Comprehension of Unambiguous Object Descriptions\nBootstrapping Ternary Relation Extractors\nAspect-based Opinion Summarization with Convolutional Neural Networks\nRevisiting Summarization Evaluation for Scientific Articles\nLearning to Generate Posters of Scientific Papers\nGenerating 
Concurrency Checks Automatically\nLearning Joint Representations of Videos and Sentences with Web Image  Search\nGeometric-Algebra Adaptive Filters\nSeparable determination in Banach spaces\nUnsupervised, Efficient and Semantic Expertise Retrieval\nInducing Probabilistic Programs by Bayesian Program Merging\nVisualization and Analysis of Frames in Collections of Messages: Content  Analysis and the Measurement of Meaning\nGenerating Sequences With Recurrent Neural Networks\nA comparison of linear and non-linear calibrations for speaker  recognition\nEfficient and Generalized Decentralized Monitoring of Regular Languages\nGSOS for non-deterministic processes with quantitative aspects\nProvGen: generating synthetic PROV graphs with predictable structure\nThe tangent bundle exponential map and locally autoparallel coordinates  for general connections with application to Finslerian geometries\nA Tale of Three Runtimes\nApproximating solution structure of the Weighted Sentence Alignment  problem\nPalindromic sequences generated from marked morphisms\nOMP2MPI: Automatic MPI code generation from OpenMP programs\nCosting Generated Runtime Execution Plans for Large-Scale Machine  Learning Programs\nAnalysis of Carries in Signed Digit Expansions\nReader-Aware Multi-Document Summarization via Sparse Coding\nThe Long-Short Story of Movie Description\nScheduled Sampling for Sequence Prediction with Recurrent Neural  Networks\nRequirement Tracing using Term Extraction\nGeneration of Multimedia Artifacts: An Extractive Summarization-based  Approach\nA unifying framework for ghost-free Lorentz-invariant Lagrangian field  theories\nGCC-Plugin for Automated Accelerator Generation and Integration on  Hybrid FPGA-SoCs\nWeak Gravity Conjecture in AdS/CFT\nExtending Hybrid CSP with Probability and Stochasticity\nEfficient Compilation to Event-Driven Task Programs\nTabMCQ: A Dataset of General Knowledge Tables and Multiple-choice  Questions\nThree-dimensional Boltzmann-Hydro code for 
core-collapse in massive  stars II. The Implementation of moving-mesh for neutron star kicks\nThe subdivision of large simplicial cones in Normaliz\nGenerative Topic Embedding: a Continuous Representation of Documents  (Extended Version with Proofs)\nSmart Reply: Automated Response Suggestion for Email\nA Generic Logic for Proving Linearizability (Extended Version)\nGeneric and Effective Specification of Structural Test Objectives\nThe basic $dd^{\\mathcal{J}}$-lemma\nHyperNetworks\nSequential decision problems, dependent types and generic solutions\nGenerating the Functions with Regular Graphs under Composition\nA Surrogate-based Generic Classifier for Chinese TV Series Reviews\nVeracity Computing from Lexical Cues and Perceived Certainty Trends\nA Simple, Fast Diverse Decoding Algorithm for Neural Generation\nAutomatically generating features for learning program analysis  heuristics\nLearning to Decode for Future Success\nImage-Grounded Conversations: Multimodal Context for Natural Question  and Response Generation\nExtensional Semantics for Higher-Order Logic Programs with Negation\nCommAI: Evaluating the first steps towards a useful general AI\nIntersection Types and Counting\nAnisotropic stellar models admitting conformal motion\nSystematic Mapping Study of Template-based Code Generation\nMultirole Logic (Extended Abstract)\nA practical approach to dialogue response generation in closed domains\nA Neural Parametric Singing Synthesizer\nDeep Reinforcement Learning-based Image Captioning with Embedding Reward\nGet To The Point: Summarization with Pointer-Generator Networks\nIncremental learning of high-level concepts by imitation\nAttend to You: Personalized Image Captioning with Context Sequence  Memory Networks\nLexically Constrained Decoding for Sequence Generation Using Grid Beam  Search\nPaying Attention to Descriptions Generated by Image Captioning Models\nNeural AMR: Sequence-to-Sequence Models for Parsing and Generation\nLearning to Ask: Neural Question 
Generation for Reading Comprehension\n$\\star$-Liftings for Differential Privacy\nInferring and Executing Programs for Visual Reasoning\nUtility of general and specific word embeddings for classifying  translational stages of research\nImage Captioning with Object Detection and Localization\nA Generative Model of Group Conversation\nGenerative Encoder-Decoder Models for Task-Oriented Spoken Dialog  Systems with Chatting Capability\nA Temporal Tree Decomposition for Generating Temporal Graphs\nA Verified Certificate Checker for Floating-Point Error Bounds\nEfficient Vector Representation for Documents through Corruption\nImproving Neural Parsing by Disentangling Model Combination and  Reranking Effects\nCrowdsourcing Multiple Choice Science Questions\nEffective Inference for Generative Neural Parsing\nSenGen: Sentence Generating Neural Variational Topic Model\nMerge decompositions, two-sided Krohn-Rhodes, and aperiodic pointlikes\nNarrative Variations in a Virtual Storyteller\nAlgorithms and Architecture for Real-time Recommendations at News UK\nAn Algebraic Glimpse at Bunched Implications and Separation Logic\nDeconvolutional Latent-Variable Model for Text Sequence Matching\nIs space a word, too?\nGeometric Computing with Chain Complexes: Design and Features of a Julia  Package\nReasoning about Divergences for Relaxations of Differential Privacy\nOversampling for Imbalanced Learning Based on K-Means and SMOTE\nSymmetries, Holography and Quantum Phase Transition in Two-dimensional  Dilaton AdS Gravity\nDLVM: A modern compiler infrastructure for deep learning systems\nA General Neural Network Hardware Architecture on FPGA\nQuestion Asking as Program Generation\nData-Driven Feedback Generation for Introductory Programming Exercises\nProduction Ready Chatbots: Generate if not Retrieve\nDocument Generation with Hierarchical Latent Tree Models\nFrom CFT to Ramond super-quantum curves\nImproving Generalization Performance by Switching from Adam to SGD\nHGum: Messaging 
Framework for Hardware Accelerators\nGeneralizing Gillespie's direct method to enable network-free  simulations\nImproving Variational Encoder-Decoders in Dialogue Generation\nValidation and Topic-driven Ranking for Biomedical Hypothesis Generation  Systems\nTeaching Machines to Code: Neural Markup Generation with Visual  Attention\nNeural Voice Cloning with a Few Samples\nDeep Feed-forward Sequential Memory Networks for Speech Synthesis\nSyzygies of secant ideals of Plücker-embedded Grassmannians are  generated in bounded degree\nDetection of Surgical Site Infection Utilizing Automated Feature  Generation in Clinical Notes\nThe Density of Linear-time Properties\nMemGEN: Memory is All You Need\nSymbolic Reasoning for Automatic Signal Placement (Extended Version)\nA comparison of recent waveform generation and acoustic modeling methods  for neural-network-based speech synthesis\nLearning Topics using Semantic Locality\nGeneral Relativity and Weyl Geometry\nKnowledge Engineering for Planning-Based Hypothesis Generation\nHierarchical Latent Semantic Mapping for Automated Topic Generation\nGeneralized Topic Modeling\nArbitrarily exhaustive hypergraph generation of 4-, 6-, 8-, 16-, and  32-dimensional quantum contextual sets\nMining Android App Usages for Generating Actionable GUI-based Execution  Scenarios\nHamilton Geometry - Phase Space Geometry from Modified Dispersion  Relations\nNatural Language Interfaces to Databases - An Introduction\nAbstract Machine for Typed Feature Structures\nDesigning Statistical Language Learners: Experiments on Noun Compounds\nSpecialized Language Models using Dialogue Predictions\nTranslation Methodology in the Spoken Language Translator: An Evaluation\nEvaluating Parsing Schemes with Entropy Indicators\nContext as a Spurious Concept\nThe Open Language Archives Community: An infrastructure for distributed  archiving of language resources\nLERIL : Collaborative Effort for Creating Lexical Resources\nNLML--a Markup Language to Describe 
the Unlimited English Grammar\nA Framework for Creating Natural Language User Interfaces for  Action-Based Applications\nScaling relations for diversity of languages\nModel of World; her cities, languages and countries\nLanguage Diversity across the Consonant Inventories: A Study in the  Framework of Complex Networks\nA database approach to information retrieval: The remarkable  relationship between language models and region models\nQuotient Complexity of Bifix-, Factor-, and Subword-Free Regular  Languages\nThe complexity of conservative finite-valued CSPs\nGeneralising tractable VCSPs defined by symmetric tournament pair  multimorphisms\nComparing Selected Criteria of Programming Languages Java, PHP, C++,  Perl, Haskell, AspectJ, Ruby, COBOL, Bash Scripts and Scheme Revision 1.0 - a  Team CPLgroup COMP6411-S10 Term Report\nImproving Web Page Readability by Plain Language\nNaming Game on Adaptive Weighted Networks\nUse Pronunciation by Analogy for text to speech system in Persian  language\nDo Software Languages Engineers Evaluate their Languages?\nStar-Free Languages are Church-Rosser Congruential\nUNL Based Bangla Natural Text Conversion - Predicate Preserving Parser  Approach\nAutomatic Segmentation of Manipuri (Meiteilon) Word into Syllabic Units\nSL: a \"quick and dirty\" but working intermediate language for SVP  systems\n3rd grade English language learners making sense of sound\nGNU epsilon - an extensible programming language\nThe Buffered π-Calculus: A Model for Concurrent Languages\nSyntactic Analysis Based on Morphological Characteristic Features of the  Romanian Language\nJoint Space Neural Probabilistic Language Model for Statistical Machine  Translation\nAnalytic solution of a model of language competition with bilingualism  and interlinguistic similarity\nA study for the effect of the Emphaticness and language and dialect for  Voice Onset Time (VOT) in Modern Standard Arabic (MSA)\nA mathematical theory of truth and an application to the regress 
problem\nLog-space counter is useful for unary languages by help of a  constant-size quantum register\nUnary languages recognized by two-way one-counter automata\nComplexity measurement of natural and artificial languages\nMultilinguals and Wikipedia Editing\nSpelling Error Trends and Patterns in Sindhi\nMetamorphic Domain-Specific Languages: A Journey Into the Shapes of a  Language\nDebates with small transparent quantum verifiers\nAssamese-English Bilingual Machine Translation\nOn Detecting Noun-Adjective Agreement Errors in Bulgarian Language Using  GATE\nNetwork Motifs Analysis of Croatian Literature\nCross-language Wikipedia Editing of Okinawa, Japan\nZero-One Law for Regular Languages and Semigroups with Zero\nThere is no fast lunch: an examination of the running speed of  evolutionary algorithms in several languages\nDeriving a Simple Gradual Security Language\nOn Word and Frontier Languages of Unsafe Higher-Order Grammars\nDesign and Implementation of Probabilistic Programming Language Anglican\nA Temporal Description Logic for Reasoning about Actions and Plans\nOn Separation by Locally Testable and Locally Threshold Testable  Languages\nNatural Language Web Interface for Database (NLWIDB)\nA novel datatype architecture support for programming languages\nEvaluating Indirect Strategies for Chinese-Spanish Statistical Machine  Translation\nJava Modular Extension for Operator Overloading\nAnalyzing the Language of Food on Social Media\nPolarity detection movie reviews in hindi language\nA Unified Mathematical Language for Medicine and Science\nVariability within Modeling Language Definitions\nOn $k$-piecewise testability (preliminary report)\nRecurrent-Neural-Network for Language Detection on Twitter  Code-Switching Corpus\nOn Distributed Density in Tuple-based Coordination Languages\nOn measuring linguistic intelligence\nCharacter-Aware Neural Language Models\nDependency length minimization: Puzzles and Promises\nThe \"handedness\" of language: Directional 
symmetry breaking of sign  usage in words\nSentiment/Subjectivity Analysis Survey for Languages other than English\nCPL: A Core Language for Cloud Computing -- Technical Report\nOn the notion of \"von Neumann vicious circle\" coined by John Backus\nRegular Language Distance and Entropy\nPiecewise Testable Languages and Nondeterministic Automata\nExtension Complexity of Formal Languages\nComplexity of Prefix-Convex Regular Languages\nMorphological Constraints for Phrase Pivot Statistical Machine  Translation\nTowards a continuous modeling of natural language domains\nNeural Machine Translation with Pivot Languages\nListen and Translate: A Proof of Concept for End-to-End Speech-to-Text  Translation\nCross-Lingual Predicate Mapping Between Linked Data Ontologies\nA Lazy Language Needs a Lazy Type System: Introducing Polymorphic  Contexts\nMachine Translation Approaches and Survey for Indian Languages\nLanguage competition in a population of migrating agents\nEmergence of Grounded Compositional Language in Multi-Agent Populations\nComplexity of Infimal Observable Superlanguages\nKnowledge Rich Natural Language Queries over Structured Biological  Databases\nPast, Present, Future: A Computational Investigation of the Typology of  Tense in 1000 Languages\nJoint PoS Tagging and Stemming for Agglutinative Languages\nOn the relation between dependency distance, crossing dependencies, and  parsing. 
Comment on \"Dependency distance: a new perspective on syntactic  patterns in natural languages\" by Haitao Liu et al\nNatural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog\nMultilingual Hierarchical Attention Networks for Document Classification\nIncremental Parametric Syntax for Multi-Language Transformation\nAll that is English may be Hindi: Enhancing language identification  through automatic ranking of likeliness of word borrowing in social media\nVector Space Model as Cognitive Space for Text Classification\nImproving Language Modelling with Noise-contrastive estimation\nAdding successor: A transfer theorem for separation and covering\nPerson Re-Identification with Vision and Language\nSyntactic and Semantic Features For Code-Switching Factored Language  Models\nIndowordnets help in Indian Language Machine Translation\nObject Referring in Visual Scene with Spoken Language\nLayer by layer - Combining Monads\nAn analysis of incorporating an external language model into a  sequence-to-sequence model\nEffective Extensible Programming: Unleashing Julia on GPUs\nAnalyzing Roles of Classifiers and Code-Mixed factors for Sentiment  Identification\nContext Models for OOV Word Translation in Low-Resource Languages\nSoftware Fault Isolation for Robust Compilation\nEmotions are Universal: Learning Sentiment Based Representations of  Resource-Poor Languages using Siamese Networks\nLearning Word Association Norms Using Tree Cut Pair Models\nA Domain Specific Language for Performance Portable Molecular Dynamics  Algorithms\nInduction of Decision Trees based on Generalized Graph Queries\nTime, Tense and Aspect in Natural Language Database Interfaces\nThe Effect of Native Language on Internet Usage\nUnsupervised Language Acquisition: Theory and Practice\nParsing Transformative LR(1) Languages\nThere Exist some Omega-Powers of Any Borel Rank\nBounded Underapproximations\nPTaCL: A Language for Attribute-Based Access Control in Open Systems\nThe Biological Origin of 
Linguistic Diversity\nPunjabi Language Interface to Database: a brief review\nUsingWord Embeddings for Query Translation for Hindi to English Cross  Language Information Retrieval\nThe Machine that Builds Itself: How the Strengths of Lisp Family  Languages Facilitate Building Complex and Flexible Bioinformatic Models\nXTQ: A Declarative Functional XML Query Language\nTowards Using Machine Translation Techniques to Induce Multilingual  Lexica of Discourse Markers\nAutomating Abstract Interpretation of Abstract Machines\nComplexity of regular bifix-free languages\nInteractive Natural Language Acquisition in a Multi-modal Recurrent  Neural Architecture\nWeakly Supervised Cross-Lingual Named Entity Recognition via Effective  Annotation and Representation Projection\nTortoise: Interactive System Configuration Repair\nFirst Programming Language: Visual or Textual?\nVision-and-Language Navigation: Interpreting visually-grounded  navigation instructions in real environments\nHow Does Bug-Handling Effort Differ Among Different Programming  Languages?\nNOOP: A Domain-Theoretic Model of Nominally-Typed OOP\nMemoization in Constraint Logic Programming\nSPANISH 1992 (S92): corpus-based analysis of present-day Spanish for  medical purposes\nCompact Representations by Finite-State Transducers\nJapanese word sense disambiguation based on examples of synonyms\nParsing a Flexible Word Order Language\nAcquiring a Lexicon from Unsegmented Speech\nMapping Scrambled Korean Sentences into English Using Synchronous TAGs\nIdentifying Word Translations in Non-Parallel Texts\nCLiFF Notes: Research in the Language, Information and Computation  Laboratory of the University of Pennsylvania\nRestricted Parallelism in Object-Oriented Lexical Parsing\nSynchronous Models of Language\nAssigning Grammatical Relations with a Back-off Model\nProof Nets and the Complexity of Processing Center-Embedded  Constructions\nPrimitive Part-of-Speech Tagging using Word Length and Sentential  Structure\nQuestion 
Answering System Using Syntactic Information\nA Monitoring Language for Run Time and Post-Mortem Behavior Analysis and  Visualization\nA TCSP-like decidable constraint language generalising existing cardinal  direction relations\nf2mma: FORTRAN to Mathematica translator\nChallenging the principle of compositionality in interpreting natural  language texts\nhepawk, Version 1.6: A Language for Scanning High Energy Physics Events\nA context-free and a 1-counter geodesic language for a Baumslag-Solitar  group\nModal languages for topology: expressivity and definability\nA new model for competition between many languages\nCompetition of languages in the presence of a barrier\nThe class of languages recognizable by 1-way quantum finite automata is  not closed under union\nInternational Standard for a Linguistic Annotation Framework\nCompositional Semantics Grounded in Commonsense Metaphysics\nThe Challenges of Hardware Synthesis from C-Like Languages\nMinimum de Bruijn Sequence in a Language with Forbidden Substrings\nFinding the growth rate of a regular language in polynomial time\nInvestigation of the Zipf-plot of the extinct Meroitic language\nReasoning About a Service-oriented Programming Paradigm\nA Metamodel of Unit Testing for Object-Oriented Programming Languages\nNumerical values of the growth rates of power-free languages\nDecidability and Shortest Strings in Formal Languages\nShuffling and Unshuffling\nProceedings IFIP Working Conference on Domain-Specific Languages\nImplementing Continuation based language in GCC\nFunctional Programming and Security\nLive-Musikprogrammierung in Haskell\nHaskell_#: Coordinating Functional Processes\nCompactified Horizontal Visibility Graph for the Language Network\nPENCIL: Towards a Platform-Neutral Compute Intermediate Language for  DSLs\nLive music programming in Haskell\nPropositional Encoding of Constraints over Tree-Shaped Data\nDialogue System: A Brief Review\nInfinitary Axiomatization of the Equational Theory of 
Context-Free  Languages\nPattern Matching via Choice Existential Quantifications in Imperative  Languages\nA Short Introduction to NILE\nTowards a Robot Perception Specification Language\nA Text to Speech (TTS) System with English to Punjabi Conversion\nCognitive Systems and Question Answering\nStandards for language resources in ISO -- Looking back at 13 fruitful  years\nOCR Error Correction Using Character Correction and Feature-Based Word  Classification\nPR2: A Language Independent Unsupervised Tool for Personality  Recognition from Text\nFrom algebra to logic: there and back again -- the story of a hierarchy\nModeling Hybrid Systems in Hy-tccp\nDeclaratively solving Google Code Jam problems with Picat\nTightening the Complexity of Equivalence Problems for Commutative  Grammars\nThe omega-inequality problem for concatenation hierarchies of star-free  languages\nA Short Note on Infinite Union/Intersection of Omega Regular Languages\nSiamese convolutional networks based on phonetic features for cognate  identification\nA Supervised Authorship Attribution Framework for Bengali Language\nLanguage Classes Associated with Automata Over Matrix Groups\nIntroduction: Cognitive Issues in Natural Language Processing\nProceedings 14th International Workshop Quantitative Aspects of  Programming Languages and Systems\nRegular Languages of Words over Countable Linear Orderings\nBeyond-Regular Typestate\nIt is undecidable if two regular tree languages can be separated by a  deterministic tree-walking automaton\nA Tidy Data Model for Natural Language Processing using cleanNLP\nExtending Functional Languages with High-Level Exception Handling\nBKTreebank: Building a Vietnamese Dependency Treebank\nALL-IN-1: Short Text Classification with One Model for All Languages\nProving Parikh's theorem using Chomsky-Schutzenberger theorem\nSufiSent - Universal Sentence Representations Using Suffix Encodings\nAn elementary method for the fidel codification of texts written in  Romanian 
language (O metodă elementară pentru codificarea fidelă  a textelor scrise în limba română)\nFast, Flexible, Polyglot Instrumentation Support for Debuggers and other  Tools\nProcess Physics: From Quantum Foam to General Relativity\nA Treatise on Quantum Clifford Algebras\nThe Geometry of Consistency: Decohering Histories in Generalized Quantum  Theory\nReconstruction of Protein-Protein Interaction Pathways by Mining  Subject-Verb-Objects Intermediates\nSemantic Composition and Decomposition: From Recognition to Generation\ngMark: Schema-Driven Generation of Graphs and Queries\nOn the Generative Power of Omega-Grammars and Omega-Automata\nScalable Text Mining with Sparse Generative Models\nLearning Generative Models with Sinkhorn Divergences\nEffective potential for classical field theories subject to stochastic  noise\nMultifractal Structure of the Harmonic Measure of Diffusion Limited  Aggregates\nSYNTAX: A computer program to compress a sequence and to estimate its  information content\nHierarchical Mean-Field Theories in Quantum Statistical Mechanics\nMicroscopic activity patterns in the Naming Game\nArchitectural Considerations for Conversational Systems -- The  Verbmobil/INTARC Experience\nE-RES: A System for Reasoning about Actions, Events and Observations\nNoun-phrase co-occurrence statistics for semi-automatic semantic lexicon  construction\nInteractive Timetabling\nHigher-Order Pattern Complement and the Strict Lambda-Calculus\nPrototyping CLP(FD) Tracers: a Trace Model and an Experimental  Validation Environment\nComposing Programs in a Rewriting Logic for Declarative Programming\nOptimal Ordered Problem Solver\nCooperation between Pronoun and Reference Resolution for Unrestricted  Texts\nA General Framework For Lazy Functional Logic Programming With Algebraic  Polymorphic Types\nPropositional Computability Logic II\nOn Generalized Records and Spatial Conjunction in Role Logic\nThematic Annotation: extracting concepts out of documents\nAn Improved 
Non-Termination Criterion for Binary Constraint Logic  Programs\nTowards a diagrammatic modeling of the LinBox C++ linear algebra library\nUsing phonetic constraints in acoustic-to-articulatory inversion\nOctave-GTK: A GTK binding for GNU Octave\nTermination and Confluence of Higher-Order Rewrite Systems\nReusing processes and documenting processes: toward an integrated  framework\nAcronym-Meaning Extraction from Corpora Using Multi-Tape Weighted  Finite-State Machines\nExSched: Solving Constraint Satisfaction Problems with the Spreadsheet  Paradigm\nSome Issues on Incremental Abstraction-Carrying Code\nPolydimensional Relativity, a Classical Generalization of the  Automorphism Invariance Principle\nClassical Histories in Hamiltonian Systems\nDiscrete Quantum Causal Dynamics\nQuestioning the Equivalence Principle\nSakharov's induced gravity: a modern perspective\nQuantum mechanics without spacetime II : noncommutative geometry and the  free point particle\nSpin networks, quantum automata and link invariants\nThe Duality of Time Dilation and Velocity\nBaryon Chiral Perturbation Theory in Manifestly Lorentz Invariant Form\nElectromagnetic Form Factors and the Localization of Quark Orbital  Angular Momentum in the Proton\nIntroduction to the functional RG and applications to gauge theories\nOn T-duality for open strings in general abelian and nonabelian gauge  field backgrounds\nGeneralized Lorentzian Triangulations and the Calogero Hamiltonian\nTorsion and nonmetricity in the stringy geometry\nSuperfield description of 5D supergravity on general warped geometry\nOpen strings in Lie groups and associative products\nAxiomatic classical (prequantum) field theory. 
Jet formalism\nAutomatic structures, rational growth and geometrically finite  hyperbolic groups\nRandomness and semigenericity\nGromov compactness theorem for stable curves\nToric Prevarieties and Subtorus Actions\nOn Nichols algebras of low dimension\nGrowth of maps, distortion in groups and symplectic geometry\nFuzzy Cognitive Maps and Neutrosophic Cognitive Maps\nMultiple Saddle Connections on Flat Surfaces and Principal Boundary of  the Moduli Spaces of Quadratic Differentials\nThe Elementary Theory of the Frobenius Automorphisms\nEnumerating Segmented Patterns in Compositions and Encoding by  Restricted Permutations\nOn the counting of holomorphic discs in toric Fano manifolds\nStable bundles on hypercomplex surfaces\nOn Exact Solvability of Anharmonic Oscillators in Large Dimensions\nQuantum Harmonic Analysis and Geometric Invariants\nQuantum Circuits with Mixed States\nUse of Mathematical Logical Concepts in Quantum Mechanics: An Example\nTeleportation of an arbitrary mixture of diagonal states of multiqubits  via classical correlation and classical communication\nGeneral-Purpose Computing on a Semantic Network Substrate\nA Generic Deployment Framework for Grid Computing and Distributed  Applications\nUne sémantique observationnelle du modèle des boîtes pour la  résolution de programmes logiques (version étendue)\nFubini-Griffiths-Harris rigidity and Lie algebra cohomology\nThe Effective Field Theory of Inflation\nHarvesting graphics power for MD simulations\nSpectral properties of entanglement witnesses\nObservational semantics of the Prolog Resolution Box Model\nAutomating Renormalization of Quantum Field Theories\nHierarchy wave functions--from conformal correlators to Tao-Thouless  states\nProof mining in ${\\mathbb R}$-trees and hyperbolic spaces\nGlimpses of the Octonions and Quaternions History and Todays  Applications in Quantum Physics\nThe molecular asymmetric rigid rotor Hamiltonian as an exactly solvable  model\nSuperpolynomial speedups based on 
almost any quantum circuit\nConcept-Oriented Programming\nA Non-Termination Criterion for Binary Constraint Logic Programs\nSimulating the All-Order Strong Coupling Expansion I: Ising Model Demo\nRepresentation theory of mv-algebras\nMaximum Entropy Rate of Markov Sources for Systems With Non-regular  Constraints\nAutomated Induction for Complex Data Structures\nIs quantum field theory a generalization of quantum mechanics?\nGenealogical trees from genetic distances\nNon-Confluent NLC Graph Grammar Inference by Compressing Disjoint  Subgraphs\nBetter Quality in Synthesis through Quantitative Objectives\nThe Structure of First-Order Causality\nThe valence bond solid in quasicrystals\nFrom Requirements to code: an Architecture-centric Approach for  producing Quality Systems\nStudying Maximum Information Leakage Using Karush-Kuhn-Tucker Conditions\nOn the homology of locally compact spaces with ends\nOn the Capacity of Constrained Systems\nEffective Theories and Modifications of Gravity\nPolytool: polynomial interpretations as a basis for termination analysis  of Logic programs\nThe fundamental importance of discourse in theoretical physics\nRefinement and Verification of Real-Time Systems\nDeveloping Experimental Models for NASA Missions with ASSL\nThe Automatic Synthesis of Linear Ranking Functions: The Complete  Unabridged Version\nA General Simulation Framework for Supply Chain Modeling: State of the  Art and Case Study\nThe bounds of the set of equivalent resistances of n equal resistors  combined in series and in parallel\nElectronic Geometry Textbook: A Geometric Textbook Knowledge Management  System\nYAPA: A generic tool for computing intruder knowledge\nC Library for Simulated Evolution of Biological Networks\nSur les espaces test pour la moyennabilité\nAutomatic Probabilistic Program Verification through Random Variable  Abstraction\nTwo refreshing views of Fluctuation Theorems through Kinematics Elements  and Exponential Martingale\nVan Wijngaarden 
grammars, metamorphism and K-ary malwares\nHistories and observables in covariant field theory\nCategorical Quantum Circuits\nDevelopment in the Scattering Matrix Theory: From Spin-Orbit-Coupling  Affected Shot Noise to Quantum Pumping\nSmooth infinite words over $n$-letter alphabets having same remainder  when divided by $n$\nBisimulations for fuzzy transition systems\nA Machine Checked Model of Idempotent MGU Axioms For Lists of Equational  Constraints\nPowermonads and Tensors of Unranked Effects\nFinite-lattice form factors in free-fermion models\nPicturing classical and quantum Bayesian inference\nGeneric Programming of Reusable, High Performance Container Types using  Automatic Type Hierarchy Inference and Bidirectional Antichain Typing\nA Tool for the Certification of PLCs based on a Coq Semantics for  Sequential Function Charts\nAlmost overlap-free words and the word problem for the free Burnside  semigroup satisfying x^2=x^3\nReduction of fuzzy automata by means of fuzzy quasi-orders\nA Spatial-Epistemic Logic for Reasoning about Security Protocols\nGRASP and path-relinking for Coalition Structure Generation\nTime Fractional Schrödinger Equation; Fox's H-functions and the  Effective Potential\nComputing generalized inverses using LU factorization of matrix product\nA Coinductive Calculus for Asynchronous Side-effecting Processes\nDifferential geometric formulation of the Cauchy Navier equations\nLaminations in the language of leaves\nMadGraph 5 : Going Beyond\nGeneralizing Boolean Satisfiability I: Background and Survey of Existing  Work\nScalar Field Theory on a Causal Set in Histories Form\nExperimenting with Transitive Verbs in a DisCoCat\nCausal categories: relativistically interacting processes\nThe emergence of gauge invariance: the stay-at-home gauge versus  local-global duality\nMonoids and Maximal Codes\nGeometric Path Integrals. 
A Language for Multiscale Biology and Systems  Robustness\nMaximum Segment Sum, Monadically (distilled tutorial, with solutions)\nGauge and Integrable Theories in Loop Spaces\nMulti-level Contextual Type Theory\nGraded CTL Model Checking for Test Generation\nThe Newtonian Limit of Geometrostatics\nHyper-relativistic mechanics and superluminal particles\nSynthesising Graphical Theories\nBounded Satisfiability for PCTL\nThe phenomenological approach to modeling the dark energy\nReliable Generation of High-Performance Matrix Algebra\nFreeFem++, a tool to solve PDEs numerically\nTiming and Code Size Optimization on Achieving Full Parallelism in  Uniform Nested Loops\nA Domain-Specific Compiler for Linear Algebra Operations\nMadAnalysis 5, a user-friendly framework for collider phenomenology\nOn hybrid models of quantum finite automata\nLatent Topic Models for Hypertext\nA Generic Library for Stencil Computations\nA Simple Optimum-Time FSSP Algorithm for Multi-Dimensional Cellular  Automata\nTopology Inspired Problems for Cellular Automata, and a Counterexample  in Topology\nUsing Program Synthesis for Social Recommendations\nRanking Functions for Linear-Constraint Loops\nWhat properties of numbers are needed to model accelerated observers in  relativity?\nBisimilarity of Probabilistic Pushdown Automata\nModelling an Automatic Proof Generator for Functional Dependency Rules  Using Colored Petri Net\nEfficient Instantiation of Parameterised Boolean Equation Systems to  Parity Games\nSummarizing Reviews with Variable-length Syntactic Patterns and Topic  Models\nApplication-tailored Linear Algebra Algorithms: A search-based Approach\nRelational Foundations For Functorial Data Migration\nLearning New Facts From Knowledge Bases With Neural Tensor Networks and  Semantic Word Vectors\nApproximation of grammar-based compression via recompression\nRemarks on local symmetry invariance in perturbative algebraic quantum  field theory\nα-concave functions and a functional extension 
of mixed volumes\nPushdown Exception-Flow Analysis of Object-Oriented Programs\nEnd to End Verification and Validation with SPIN\nWork in Progress: Enabling robot device discovery through robot device  descriptions\nAutomatic Equivalence Proofs for Non-deterministic Coalgebras\nDBI Galileon in the Effective Field Theory of Inflation: Orthogonal  non-Gaussianities and constraints from the Trispectrum\nDynamic Ising Model: Reconstruction of Evolutionary Trees\nLibsharp - spherical harmonic transforms revisited\nNovel discrete symmetries in the general N = 2 supersymmetric quantum  mechanical model\nGeneralized geometry applied to 4d-supergravity\nFormal Representation of the SS-DB Benchmark and Experimental Evaluation  in EXTASCID\nRule-Based Semantic Tagging. An Application Undergoing Dictionary  Glosses\nAlgebraic Net Class Rewriting Systems, Syntax and Semantics for  Knowledge Representation and Automated Problem Solving\nPriced Timed Petri Nets\nConsistency conditions from generalized-unitarity\nEfficient algorithms for discrete Gabor transforms on a nonseparable  lattice\nComplete Decoupling Limit of Ghost-free Massive Gravity\nCost-Aware Automatic Program Repair\nA Unifying Approach to Decide Relations for Timed Automata and their  Game Characterization\nLp theory for outer measures and two themes of Lennart Carleson united\nA \"q-deformed\" generalization of the Hosszu-Gluskin theorem\nTouch-enabled Programming for the Lab of Things\nTransport Equations for Oscillating Neutrinos\nAutomaton semigroup constructions\nGroups and Semigroups Defined by Colorings of Synchronizing Automata\nThread-Based Obfuscation through Control-Flow Mangling\nRandom Generation of Nondeterministic Finite-State Tree Automata\nImplicit Sensitive Text Summarization based on Data Conveyed by  Connectives\nA Simple and Scalable Static Analysis for Bound Analysis and Amortized  Complexity Analysis\nAlgebraic Properties of Valued Constraint Satisfaction Problem\nA geometric approach to 
(semi)-groups defined by automata via dual  transducers\nScalable and Robust Construction of Topical Hierarchies\nHigh-speed detection of emergent market clustering via an unsupervised  parallel genetic algorithm\nCooperating distributed context-free hexagonal array grammar systems  with permitting contexts\nA Vernacular for Coherent Logic\nPattern Recognition in Narrative: Tracking Emotional Expression in  Context\nWhat drives the time evolution of the spacetime geometry?\nAltitude Training: Strong Bounds for Single-Layer Dropout\nA New Model of Array Grammar for generating Connected Patterns on an  Image Neighborhood\nThe RD53 Collaboration's SystemVerilog-UVM Simulation Framework and its  General Applicability to Design of Advanced Pixel Readout Chips\nImproved Undecidability Results for Reachability Games on Recursive  Timed Automata\nGPGPU Computing\nModeling Creativity: Case Studies in Python\nHyperspaces in topological Categories\nGR uniqueness and deformations\nNotes on Noise Contrastive Estimation and Negative Sampling\nSublinear-Time Approximate MCMC Transitions for Probabilistic Programs\nMembership Function Assignment for Elements of Single OWL Ontology\nProgram Logics for Homogeneous Generative Run-Time Meta-Programming\nHardware Counted Profile-Guided Optimization\nGeneralized Gross-Pitaevskii equation adapted to the $U(5)\\supset  SO(5)\\supset SO(3)$ symmetry for spin-2 condensates\nInterleaved Text/Image Deep Mining on a Large-Scale Radiology Database  for Automated Image Interpretation\nAn axiomatic approach to free amalgamation\nProving Termination of Graph Transformation Systems using Weighted Type  Graphs over Semirings\nFELIX-1.0: A finite element solver for the time dependent generator  coordinate method with the Gaussian overlap approximation\nRefinement Type Inference via Horn Constraint Optimization\nAlgebraic approach to quantum theory: a finite-dimensional guide\nStrange Work in Strange Places: Quantum Field Theory in Curved 
Space\nDistinguishing Hidden Markov Chains\nJSKETCH: Sketching for Java\nParallel Triangles Counting Using Pipelining\nMatrix Schubert varieties and Gaussian conditional independence models\nLSTM-based Deep Learning Models for Non-factoid Answer Selection\nSentence Level Recurrent Topic Model: Letting Topics Speak for  Themselves\nDecidability of multiset, set and numerically decipherable directed  figure codes\nCategorical semiotics\nRow-less Universal Schema\nA Retraction Theorem for Distributed Synthesis\nOzy: A General Orchestration Container\nCross-Domain Entity Resolution in Social Media\nDe-Conflated Semantic Representations\nOptimal steering of a linear stochastic system to a final probability  distribution, Part III\nMonetary economics from econophysics perspective\nBPS counting for knots and combinatorics on words\nQuantum counter automata\nA Computational Model for the Direct Execution of General Specifications  with Multi-way Constraints\nArrangements of Submanifolds and the Tangent Bundle Complement\nIR divergences and kinetic equation in de Sitter space. 
(Poincare patch;  Principal series)\nSharp metric obstructions for quasi-Einstein metrics\nA Framework for Automated and Certified Refinement Steps\nSingle Time-Stamped Tries for Retroactive Call Subsumption\nThe Rank and Hanna Neumann Property of Some Submonoids of a Free Monoid\nHigh-Performance Astrophysical Simulations and Analysis with Python\nOn the lattice structure of probability spaces in quantum mechanics\nKappa-Minkowski spacetime: mathematical formalism and applications in  Planck scale physics\nThe finiteness problem for automaton semigroups is undecidable\nCorpus-based Web Document Summarization using Statistical and Linguistic  Approach\nAutomatic Structuring Of Semantic Web Services An Approach\nAutomata with Generalized Rabin Pairs for Probabilistic Model Checking  and LTL Synthesis\nThe Skin In The Game Heuristic for Protection Against Tail Events\nAverage expansion rate and light propagation in a cosmological Tardis  spacetime\nRealizability of hypergraphs and Ramsey link theory\nOn the Synchronization Rate for e-machines\nAbelian networks III. 
The critical group\nThe automatic solution of partial differential equations using a global  spectral method\nConcept-Oriented Programming: References, Classes and Inheritance  Revisited\nThe Role of Emotions in Propagating Brands in Social Networks\nA practical framework for infinite-dimensional linear algebra\nSimilarity-based matching meets Malware Diversity\nBudget Imbalance Criteria for Auctions: A Formalized Theorem\nConstruction of Vietnamese SentiWordNet by using Vietnamese Dictionary\nRAND-WALK: A Latent Variable Model Approach to Word Embeddings\nMonomial right ideals and the Hilbert series of noncommutative modules\nDeep Feelings: A Massive Cross-Lingual Study on the Relation between  Emotions and Virality\nFast, Multicore-Scalable, Low-Fragmentation Memory Allocation through  Large Virtual Memory and Global Data Structures\nMicrosoft COCO Captions: Data Collection and Evaluation Server\nTRIQS: A Toolbox for Research on Interacting Quantum Systems\nDuQuad: an inexact (augmented) dual first order algorithm for quadratic  programming\nA Hierarchical Distance-dependent Bayesian Model for Event Coreference  Resolution\nFactoriality and the Pin-Reutenauer procedure\nSummarization of Films and Documentaries Based on Subtitles and Scripts\nNecessary Condition for Local Distinguishability of Maximally Entangled  States: Beyond Orthogonality Preservation\nOn the accuracy of self-normalized log-linear models\nSkip-Thought Vectors\npyMOR - Generic Algorithms and Interfaces for Model Order Reduction\nAttention-Based Models for Speech Recognition\nUnsupervised Semantic Parsing of Video Collections\nReplication and Generalization of PRECISE\nBounded Determinization of Timed Automata with Silent Transitions\nSemantics-based Automated Web Testing\nGeometric Arbitrage and Spectral Theory\nTransG : A Generative Mixture Model for Knowledge Graph Embedding\nGeneralized Euler characteristic in power-bounded T-convex valued fields\nSparse Tensor Algebra as a Parallel 
Programming Model\nAlgorithmic decidability of Engel's property for automaton groups\nRegarding the `Hole Argument' and the `Problem of Time'\nMined Semantic Analysis: A New Concept Space Model for Semantic  Representation of Textual Data\nFinite Countermodel Based Verification for Program Transformation (A  Case Study)\nOn Cube Tilings of Tori and Classification of Perfect Codes in the  Maximum Metric\nTest-Driven Development of ontologies (extended version)\nWeyl gravity and Cartan geometry\nSynthesis of models for order-sorted first-order theories using linear  algebra and constraint solving\nAnalyzing Walter Skeat's Forty-Five Parallel Extracts of William  Langland's Piers Plowman\nDataflow Graphs as Matrices and Programming with Higher-order Matrix  Elements\nResearch Project: Text Engineering Tool for Ontological Scientometry\nOptimal cosmic microwave background map-making in the presence of  cross-correlated noise\nA Dichotomy for First-Order Reducts of Unary Structures\nSupervised and Semi-Supervised Text Categorization using LSTM for Region  Embeddings\nAlgebraic Databases\nHigher-Order Recursion Abstraction: How to Make Ackermann, Knuth and  Conway Look Like a Bunch of Primitives, Figuratively Speaking\nPetrarch 2 : Petrarcher\nANTS2 package: simulation and experimental data processing for Anger  camera type detectors\nA Quantum Computational Semantics for Epistemic Logical Operators. 
Part  II: Semantics\nUltradense Word Embeddings by Orthogonal Transformation\nAutomated Clustering and Program Repair for Introductory Programming  Assignments\nNeural Discourse Relation Recognition with Semantic Memory\nThe Quench Action\nConstructive canonicity for lattice-based fixed point logics\nopenXBOW - Introducing the Passau Open-Source Crossmodal Bag-of-Words  Toolkit\nDeciding Maxmin Reachability in Half-Blind Stochastic Games\nLearning Moore Machines from Input-Output Traces\nVariational Neural Machine Translation\nBattRAE: Bidimensional Attention-Based Recursive Autoencoders for  Learning Bilingual Phrase Embeddings\nReview Networks for Caption Generation\nBuilding an Evaluation Scale using Item Response Theory\nGenerating and Exploiting Large-scale Pseudo Training Data for Zero  Pronoun Resolution\nRationalizing Neural Predictions\nWatch What You Just Said: Image Captioning with Text-Conditional  Attention\nQuery-Focused Opinion Summarization for User-Generated Content\nGoldstone origin of black hole hair from supertranslations and  criticality\nSelQA: A New Benchmark for Selection-based Question Answering\nGeneration and Pruning of Pronunciation Variants to Improve ASR Accuracy\nPRESAGE: Protecting Structured Address Generation against Soft Errors\nCoalgebraic Trace Semantics for Buechi and Parity Automata\nDomain Adaptation for Neural Networks by Parameter Augmentation\nLinear dynamical systems on graphs\nUniqueness of Normal Forms for Shallow Term Rewrite Systems\nGeneric Statistical Relational Entity Resolution in Knowledge Graphs\nLexical Based Semantic Orientation of Online Customer Reviews and Blogs\nAn Empirical Evaluation of doc2vec with Practical Insights into Document  Embedding Generation\nDataset and Neural Recurrent Sequence Labeling Model for Open-Domain  Factoid Question Answering\nOpinion Mining in Online Reviews About Distance Education Programs\nNoetherian Quasi-Polish Spaces\nHarder-Narasimhan theory for linear 
codes\nImage-to-Markup Generation with Coarse-to-Fine Attention\nSelect-Additive Learning: Improving Generalization in Multimodal  Sentiment Analysis\nDistributed Processing of Generalized Graph-Pattern Queries in SPARQL  1.1\nOne Sentence One Model for Neural Machine Translation\nText Network Exploration via Heterogeneous Web of Topics\nSummarizing Situational and Topical Information During Crises\nA General Framework for Content-enhanced Network Representation Learning\nNotions of Anonymous Existence in Martin-Löf Type Theory\nInteractive Attention for Neural Machine Translation\nDeep Amortized Inference for Probabilistic Programs\nA Theme-Rewriting Approach for Generating Algebra Word Problems\nProfessor Forcing: A New Algorithm for Training Recurrent Networks\nDetecting Context Dependent Messages in a Conversational Environment\nA Perspicuous Description of the Schwarzschild Black Hole Geodesics\nGeneric Construction of Efficient Matrix Product Operators\nLeveraging Video Descriptions to Learn Video Question Answering\nHopf images in locally compact quantum groups\nStatistical Learning for OCR Text Correction\nLearning Generic Sentence Representations Using Convolutional Neural  Networks\nSemantic Compositional Networks for Visual Captioning\nMS MARCO: A Human Generated MAchine Reading COmprehension Dataset\nErgodicity of the Liouville system implies the Chowla conjecture\nGeneralized Shared Control versus Classical Shared Control: Illustrative  Examples\nGeometrical thermodynamics and P-V criticality of the black holes with  power-law Maxwell field\nAreas of Attention for Image Captioning\nKnowing When to Look: Adaptive Attention via A Visual Sentinel for Image  Captioning\nHypernyms under Siege: Linguistically-motivated Artillery for Hypernymy  Detection\nOn Nonlinear Prices in Timed Automata\nA Context-aware Attention Network for Interactive Question Answering\nAnalogue Stochastic Gravity in Strongly-Interacting Bose-Einstein  Condensates\nSTRIPS Planning in 
Infinite Domains\nParallel Graph Rewriting with Overlapping Rules\nDiscontinuous Homomorphisms of $C(X)$ with $2^{\\aleph_0}>\\aleph_2$\nWeighted omega-Restricted One Counter Automata\n$L_{\\infty}$ Algebras and Field Theory\nContextually Customized Video Summaries via Natural Language\nFrom Formalised State Machines to Implementations of Robotic Controllers\nA Knowledge-Grounded Neural Conversation Model\nFine-Grained Entity Type Classification by Jointly Learning  Representations and Label Embeddings\nConsistent Alignment of Word Embedding Models\nTwo strings at Hamming distance 1 cannot be both quasiperiodic\nChiral Higher Spin Gravity\nSupervised Typing of Big Graphs using Semantic Embeddings\nRecurrent and Contextual Models for Visual Question Answering\nTacotron: Towards End-to-End Speech Synthesis\nDiagrammatic Semantics for Digital Circuits\nInjective Schur Modules\nSymbolic Computation and Automated Reasoning for Program Analysis\nUnfolding and Shrinking Neural Machine Translation Ensembles\nRegister automata with linear arithmetic\nRaPro: A Novel 5G Rapid Prototyping System Architecture\nSearchQA: A New Q&A Dataset Augmented with Context from a Search Engine\nExtractive Summarization: Limits, Compression, Generalized Model and  Heuristics\nSwellShark: A Generative Model for Biomedical Named Entity Recognition  without Labeled Data\nScientific Article Summarization Using Citation-Context and Article's  Discourse Structure\nAdversarial Neural Machine Translation\nDeep Text Classification Can be Fooled\nDiversity driven Attention Model for Query-based Abstractive  Summarization\nEntity Linking with people entity on Wikipedia\nChunk-Based Bi-Scale Decoder for Neural Machine Translation\nLearning Representations of Emotional Speech with Deep Convolutional  Generative Adversarial Networks\nMachine Learning with World Knowledge: The Position and Survey\nSympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic  Analysis\nBetter Text Understanding Through 
Image-To-Text Transfer\nStudies of a general flat space/boson star transition model in a box  through a language similar to holographic superconductors\nBiased Importance Sampling for Deep Neural Network Training\nA simple neural network module for relational reasoning\nA Deep Causal Inference Approach to Measuring the Effects of Forming  Group Loans in Online Non-profit Microfinance Platform\nTowards Proof Synthesis Guided by Neural Machine Translation for  Intuitionistic Propositional Logic\nNonlinear probability. A theory with incompatible stochastic variables\nMultiresolution Match Kernels for Gesture Video Classification\nTalking Drums: Generating drum grooves with neural networks\nNeural Sequence Model Training via $α$-divergence Minimization\nApplying the Polyhedral Model to Tile Time Loops in Devito\nPairwise Well-Formed Modes and Transformations\nPredicting the Quality of Short Narratives from Social Media\nKleene Algebra Modulo Theories\nDetecting Policy Preferences and Dynamics in the UN General Debate with  Neural Word Embeddings\nIntroduction of Curvilinear Coordinates into Numerical Analysis\nAtiyah-Floer Conjecture: a Formulation, a Strategy to Prove and  Generalizations\nDynamic Layer Normalization for Adaptive Neural Acoustic Modeling in  Speech Recognition\nVoiceLoop: Voice Fitting and Synthesis via a Phonological Loop\nTowards Semantic Query Segmentation\nFragile fate of driven-dissipative XY phase in two dimensions\nGraviton multi-point amplitudes for higher-derivative gravity in anti-de  Sitter space\nEnterprise to Computer: Star Trek chatbot\nGenerative Statistical Models with Self-Emergent Grammar of Chord  Sequences\nVeamy: an extensible object-oriented C++ library for the virtual element  method\nSemantic Word Clouds with Background Corpus Normalization and  t-distributed Stochastic Neighbor Embedding\nTraceDiff: Debugging Unexpected Code Behavior Using Trace Divergences\nEffect of strength of gravitational field on the rate of chemical  
reactions\nM2D: Monolog to Dialog Generation for Conversational Story Telling\nScheduling Constraint Based Abstraction Refinement for Multi-Threaded  Program Verification\nLearning Fine-Grained Knowledge about Contingent Relations between  Everyday Events\nAutomatically Generating Commit Messages from Diffs using Neural Machine  Translation\nInteractive Attention Networks for Aspect-Level Sentiment Classification\nAutomata as $p$-adic Dynamical Systems\nSentiment Polarity Detection for Software Development\nStarSpace: Embed All The Things!\nOn the Generation of Initial Contexts for Effective Deadlock Detection\nA Rule-Based Approach to Analyzing Database Schema Objects with Datalog\nSubjective Simulation as a Notion of Morphism for Composing Concurrent  Resources\nMitigating the Impact of Speech Recognition Errors on Chatbot using  Sequence-to-Sequence Model\nEfficient and Effective Single-Document Summarizations and A  Word-Embedding Measurement of Quality\nA Semantic Relevance Based Neural Network for Text Summarization and  Text Simplification\nSynchronizing Data Words for Register Automata\nDescribing Natural Images Containing Novel Objects with Knowledge Guided  Assitance\nEnhancing Inductive Entailment Proofs in Separation Logic with Lemma  Synthesis\nDeep Triphone Embedding Improves Phoneme Recognition\nTesting the limits of unsupervised learning for semantic similarity\nInterpNET: Neural Introspection for Interpretable Deep Learning\nEspresso: Brewing Java For More Non-Volatility with Non-volatile Memory\nGeneral purpose graphics-processing-unit implementation of cosmological  domain wall network evolution\nGeneralized End-to-End Loss for Speaker Verification\nEvaluation of Automatic Video Captioning Using Direct Assessment\nAdversarial Advantage Actor-Critic Model for Task-Completion Dialogue  Policy Learning\nOn polarization of vector light beams: origin of Berry phase\nAutomated Migration of Hierarchical Data to Relational Tables using  
Programming-by-Example\nFaithful to the Original: Fact Aware Neural Abstractive Summarization\nEvidence Aggregation for Answer Re-Ranking in Open-Domain Question  Answering\nCMU LiveMedQA at TREC 2017 LiveQA: A Consumer Health Question Answering  System\nGrounded Objects and Interactions for Video Captioning\nLook, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval  with Generative Models\nConvolutional Image Captioning\nTeleparallel theories of gravity as analogue of non-linear  electrodynamics\nHybrid Oracle: Making Use of Ambiguity in Transition-based Chinese  Dependency Parsing\nWill humans even write code in 2040 and what would that mean for extreme  heterogeneity in computing?\nLearning by Asking Questions\nMultimodal Storytelling via Generative Adversarial Imitation Learning\nEnd-to-End Offline Goal-Oriented Dialog Policy Learning via Policy  Gradient\nBubble-Flip---A New Generation Algorithm for Prefix Normal Words\nMixed tête-à-tête twists as monodromies associated with  holomorphic function germs\nExploring Models and Data for Remote Sensing Image Caption Generation\nMonitoring Data Minimisation\nConvergence of Pascal-Like Triangles in Parry-Bertrand Numeration  Systems\nVariational Recurrent Neural Machine Translation\nAdversarial Learning for Chinese NER from Crowd Annotations\nSize vs. 
Structure in Training Corpora for Word Embedding Models:  Araneum Russicum Maximum and Russian National Corpus\nImproving Review Representations with User Attention and Product  Attention for Sentiment Classification\nPCOT: Cache Oblivious Tiling of Polyhedral Programs\nA Renormalization Group Procedure for Fiber Bundle Models\nDiverse Beam Search for Increased Novelty in Abstractive Summarization\nEnhance word representation for out-of-vocabulary on Ubuntu dialogue  corpus\nFormal Ontology Learning from English IS-A Sentences\nPolicy Gradients for Contextual Bandits\nBlack-hole kicks from numerical-relativity surrogate models\nMDroid+: A Mutation Testing Framework for Android\nGrammar-based Compression of Unranked Trees\nImplementing distributed λ-calculus interpreter\nMultimodal Explanations: Justifying Decisions and Pointing to the  Evidence\nReducing Lambda Terms with Traversals\nUnderstanding and Improving Multi-Sense Word Embeddings via Extended  Robust Principal Component Analysis\nAn Empirical Evaluation of Generic Convolutional and Recurrent Networks  for Sequence Modeling\nA Non-Technical Survey on Deep Convolutional Neural Network  Architectures\nFrom $\\mathcal{N}{=}\\,4$ Galilean superparticle to three-dimensional  non-relativistic $\\mathcal{N}{=}\\,4$ superfields\nConcept2vec: Metrics for Evaluating Quality of Embeddings for  Ontological Concepts\nChallenges in Discriminating Profanity from Hate Speech\nControlling Decoding for More Abstractive Summaries with Copy-Based  Networks\nAdversarial Generalized Method of Moments\nWord sense induction using word embeddings and community detection in  complex networks\nContext is Everything: Finding Meaning Statistically in Semantic Spaces\nNeural-Guided Deductive Search for Real-Time Program Synthesis from  Examples\nExpressive Speech Synthesis via Modeling Expressions with Variational  Autoencoder\nA Generation Method of Immunological Memory in Clonal Selection  Algorithm by using Restricted Boltzmann 
Machines\nA GPU-based WFST Decoder with Exact Lattice Generation\nA Hierarchical Latent Structure for Variational Conversation Modeling\nGenerating Clues for Gender based Occupation De-biasing in Text\nImplementing Turing Machines in Dynamic Field Architectures\nAssociation schemes on general measure spaces and zero-dimensional  Abelian groups\nMicrotask crowdsourcing for disease mention annotation in PubMed  abstracts\nLearning Syntactic Program Transformations from Examples\nPCT and Beyond: Towards a Computational Framework for `Intelligent'  Communicative Systems\nVisual Dialog\nOn Higher Order Query Languages which on Relational Databases Collapse  to Second Order Logic\nOn the Behavior of Convolutional Nets for Feature Extraction\nEmpirical Analysis on Comparing the Performance of Alpha Miner Algorithm  in SQL Query Language and NoSQL Column-Oriented Databases Using Apache  Phoenix\nTopological Origin of Chiral Symmetry Breaking in QCD and in Gravity\nModular Labelled Sequent Calculi for Abstract Separation Logics\nLARGE SCALE PERTURBATIONS IN THE OPEN UNIVERSE\nEnumeration of Rota-Baxter Words\nGeneral Logic-Systems that Determine Significant Collections of  Consequence Operators\nHeavy ion event generator HYDJET++ (HYDrodynamics plus JETs)\nThe tractability of CSP classes defined by forbidden patterns\nOff-line test selection with test purposes for non-deterministic timed  automata\nAperiodic pseudorandom number generators based on infinite words\nSynthesis of Sequential Extended Regular Expressions for Verification\nA Dynamic Axiomatic Approach to First-Price Auctions\nComparator Circuits over Finite Bounded Posets\nAutomatic Generation of Minimal Cut Sets\nPostulation of generic lines and one double line in $\\PP^n$ in view of  generic lines and one multiple linear space\nDrawing and Recognizing Chinese Characters with Recurrent Neural Network\nCollaborative Recurrent Autoencoder: Recommend while Learning to Fill in  the Blanks\nGenerating Descriptions 
with Grounded and Co-Referenced People\nKBGAN: Adversarial Learning for Knowledge Graph Embeddings\nSenx: Sound Patch Generation for Security Vulnerabilities\nScale Up Event Extraction Learning via Automatic Training Data  Generation\nThe Generalized Matrix Chain Algorithm\nMemory-Based Lexical Acquisition and Processing\nA Learning Approach to Natural Language Understanding\nTDL--- A Type Description Language for Constraint-Based Grammars\nLexicalization and Grammar Development\nParsing Free Word-Order Languages in Polynomial Time\nComplexity of Scrambling: A New Twist to the Competence - Performance  Distinction\nGrouping Words Using Statistical Context\nA specification language for Lexical Functional Grammars\nCompositionality for Presuppositions over Tableaux\nA Pattern Matching method for finding Noun and Proper Noun Translations  from Noisy Parallel Corpora\nMeasuring semantic complexity\nUnification-Based Glossing\nA Grammar Formalism and Cross-Serial Dependencies\nClustered Language Models with Context-Equivalent States\nMorphological Cues for Lexical Semantics\nLinguistic Structure as Composition and Perturbation\nCompositional Semantics in Verbmobil\nMultilingual Text Analysis for Text-to-Speech Synthesis\nInstructions for Temporal Annotation of Scheduling Dialogs\nDomain Adaptation with Clustered Language Models\nDeveloping a hybrid NP parser\nA Comparative Study of the Application of Different Learning Techniques  to Natural Language Interfaces\nAggregate and mixed-order Markov models for statistical language  processing\nA Flexible POS tagger Using an Automatically Acquired Language Model\nTowards an Improved Performance Measure for Language Models\nGraph Interpolation Grammars: a Rule-based Approach to the Incremental  Parsing of Natural Languages\nAn Empirical Evaluation of Probabilistic Lexicalized Tree Insertion  Grammars\nLanguage as an Evolving Word Web\nPractical experiments with regular approximation of context-free  languages\nResolution of 
Verb Ellipsis in Japanese Sentence using Surface  Expressions and Examples\nRecognition Performance of a Structured Language Model\nStructured Language Modeling for Speech Recognition\nRanking suspected answers to natural language questions using predictive  annotation\nCompiling Language Models from a Linguistically Motivated Unification  Grammar\nCombining semantic and syntactic structure for language modeling\nCreating Annotation Tools with the Annotation Graph Toolkit\nGrid-Enabling Natural Language Engineering By Stealth\nBayesian Information Extraction Network\nA Logic for Reasoning about Digital Rights\nReactive Programming in Standard ML\nPerspective alignment in spatial language\nConscious Intelligent Systems - Part II - Mind, Thought, Language and  Understanding\nThe Paraldor Project\nA language theoretic analysis of combings\nStatistical Mechanical Approach to Human Language\nModelling linguistic taxonomic dynamics\nLanguage Time Series Analysis\nOn the Development of Text Input Method - Lessons Learned\nAggregation Languages for Moving Object and Places of Interest Data\nA Prolog-based Environment for Reasoning about Programming Languages  (Extended abstract)\nA resource-based Korean morphological annotation system\nMeaning and Form in a Language Computer Simulation\nA Fast Algorithm and Datalog Inexpressibility for Temporal Reasoning\nEvent Synchronization by Lightweight Message Passing\nInfluence of geography on language competition\nConstructing word similarities in Meroitic as an aid to decipherment\nMechanized semantics for the Clight subset of the C language\nThe Structure of Phonological Networks Across Multiple Languages\nRepresenting a P-complete problem by small trellis automata\nThe computational complexity of universality problems for prefixes,  suffixes, factors, and subwords of regular languages\nA Bayesian Model for Discovering Typological Implications\nStandards for Language Resources\nLanguage Models for Handwritten Short Message 
Services\nTowards the Safe Programming of Wireless Sensor Networks\nOn possible growth of Toeplitz languages\nCompiling Signal Processing Code embedded in Haskell via LLVM\nOperational State Complexity of Deterministic Unranked Tree Automata\nNondeterministic State Complexity for Suffix-Free Regular Languages\nKnowledge Recognition Algorithm enables P = NP\nProduct closure of some second-order modal logics\nPrecedence Automata and Languages\nModels of quantum computation and quantum programming languages\nRestarting Automata with Auxiliary Symbols and Small Lookahead\nBorel Hierarchy and Omega Context Free Languages\nAround Dot-depth One\nOn Understanding and Machine Understanding\nCodeco: A Grammar Notation for Controlled Natural Language in Predictive  Editors\nBounded Parikh Automata\nOn Pansiot Words Avoiding 3-Repetitions\nA decidable characterization of locally testable tree languages\nDu TAL au TIL\nA Lexical Analysis Tool with Ambiguity Support\nAn Annotation Scheme for Reichenbach's Verbal Tense Structure\nBasic completion strategies as another application of the Maude strategy  language\nArabic Language Learning Assisted by Computer, based on Automatic Speech  Recognition\nModular Type-Safety Proofs using Dependant Types\nPiecewise testable tree languages\nFormal languages analysed by quantum walks\nDetecting English Writing Styles For Non-native Speakers\nOn complexity of regular realizability problems\nAn Evaluation of Arabic Language Learning Websites\nDetermining token sequence mistakes in responses to questions with open  text answer\nPredicate Exchangeability and Language Invariance in Pure Inductive  Logic\nAstraKahn: A Coordination Language for Streaming Networks\nFormalisation of the lambda aleph Runtime\nRule Based Stemmer in Urdu\nA Domain-Specific Language for Discrete Mathematics\nA decidable class of (nominal) omega-regular languages over an infinite  alphabet\nThe (Nested) Word Problem\nNominal Regular Expressions for Languages over 
Infinite Alphabets.  Extended Abstract\nMost Complex Regular Right-Ideal Languages\nOCCA: A unified approach to multi-threading languages\nA Lemma Based Evaluator for Semitic Language Text Summarization Systems\nThe First Parallel Multilingual Corpus of Persian: Toward a Persian  BLARK\n$\\mathrm{Pal}^k$ Is Linear Recognizable Online\nComplexity of Atoms, Combinatorially\nAntescofo Intermediate Representation\nOperations on Automata with All States Final\nAn HMM Based Named Entity Recognition System for Indian Languages: The  JU System at ICON 2013\nQuality Estimation Of Machine Translation Outputs Through Stemming\nPushdown automata, lambda-graph systems and C*-algebras\nThe expressive power of quantum walks in terms of language acceptance\nContext-Free Grammars with Storage\nUniform definability of henselian valuation rings in the Macintyre  language\nUn résumeur à base de graphes, indépéndant de la langue\nOn the Effect of Human-Computer Interfaces on Language Expression\nOpportunities for a Truffle-based Golo Interpreter\nThe Mysteries of Lisp -- I: The Way to S-expression Lisp\nResponse to Liu, Xu, and Liang (2015) and Ferrer-i-Cancho and  Gómez-Rodríguez (2015) on Dependency Length Minimization\nNon-regular unary language and parallel communicating Watson-Crick  automata systems\nPrediction-Adaptation-Correction Recurrent Neural Networks for  Low-Resource Language Speech Recognition\nPolysemy in Controlled Natural Language Texts\nA new TAG Formalism for Tamil and Parser Analytics\nTalk&Learn: Improving Conversation Experience and Creating Opportunities  for Foreign Language Learning\nAn Introduction to Quantum Programming in Quipper\nA Literature Review: Stemming Algorithms for Indian Languages\nOn the state complexity of closures and interiors of regular languages  with subwords and superwords\nPolymorphic Types in ACL2\nApproaches to Interpreter Composition\nDafny: Statically Verifying Functional Correctness\nBricklayer: An Authentic Introduction to the 
Functional Programming  Language SML\nMultilingual Open Relation Extraction Using Cross-lingual Projection\nA Reference Interpreter for the Graph Programming Language GP 2\nGap Analysis of Natural Language Processing Systems with respect to  Linguistic Modality\nMeta-Packages: Painless Domain Specific Languages\nLearning language through pictures\nParsing Natural Language Sentences by Semi-supervised Methods\nOn not testing the foreign-language effect: A comment on Costa, Foucart,  Arnon, Aparici, and Apesteguia (2014)\nWord sense disambiguation: a survey\nOnline Representation Learning in Recurrent Neural Language Models\nPrograms as proofs\nOn the Complexity of Flanked Finite State Automata\nGibberish Semantics: How Good is Russian Twitter in Word Semantic  Similarity Task?\nA short proof that $O_2$ is an MCFL\nDerivatives for Enhanced Regular Expressions\nTowards Multi-Agent Communication-Based Language Learning\nA Decomposable Attention Model for Natural Language Inference\nDomain Specific Language for Modular Knitting Pattern Definitions: Purl\nDependency Language Models for Transition-based Dependency Parsing\nSyntax-based Attention Model for Natural Language Inference\nBehavioural Prototypes\nCharacterizing the Language of Online Communities and its Relation to  Community Reception\nCollaborative Learning for Language and Speaker Recognition\nAP16-OL7: A Multilingual Database for Oriental Languages and A Language  Recognition Baseline\nType checking through unification\nComparing 1D and 2D Real Time on Cellular Automata\nOrthographic Syllable as basic unit for SMT between Related Languages\nVoxML: A Visualization Modeling Language\nFrom phonemes to images: levels of representation in a recurrent neural  model of visually-grounded language learning\nFaster decoding for subword level Phrase-based SMT between related  languages\nQuantitative Entropy Study of Language Complexity\nDomain-Specific Languages of Mathematics: Presenting Mathematical  Analysis Using 
Functional Programming\nOn the computational power of affine automata\nPerformance Improvements of Probabilistic Transcript-adapted ASR with  Recurrent Neural Network and Language-specific Constraints\nRegular Separability of One Counter Automata\nDesign and Implementation of Concurrent C0\nGraph-Based Semi-Supervised Conditional Random Fields For Spoken  Language Understanding Using Unaligned Data\nUsingWord Embedding for Cross-Language Plagiarism Detection\nA case study on using speech-to-translation alignments for language  documentation\nA Visual Representation of Wittgenstein's Tractatus Logico-Philosophicus\nGlobal Entity Ranking Across Multiple Languages\nDynamic Bernoulli Embeddings for Language Evolution\nLearning Joint Multilingual Sentence Representations with Neural Machine  Translation\nImproving Context Aware Language Models\n280 Birds with One Stone: Inducing Multilingual Taxonomies from  Wikipedia using Character-level Classification\nDynamics of core of language vocabulary\nWho's to say what's funny? 
A computer using Language Models and Deep  Learning, That's Who!\nIn Search of Effectful Dependent Types\nComputational Models of Tutor Feedback in Language Acquisition\nSpace-Bounded OTMs and REG$^{\\infty}$\nNative Language Identification on Text and Speech\nTensor Fusion Network for Multimodal Sentiment Analysis\nCharacter-level Intra Attention Network for Natural Language Inference\nSelf-organized Hierarchical Softmax\nFrom Reversible Programs to Univalent Universes and Back\nLangPro: Natural Language Theorem Prover\nTowards an Arabic-English Machine-Translation Based on Semantic Web\nA Survey of Machine Learning for Big Code and Naturalness\nElementary number-theoretical statements proved by Language Theory\nUsing Deep Convolutional Networks for Gesture Recognition in American  Sign Language\nFine-tuned Language Models for Text Classification\nE2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene  Text\nReusing Weights in Subword-aware Neural Language Models\nLanguage Identification of Bengali-English Code-Mixed data using  Character & Phonetic based LSTM Models\nWord Problem Languages for Free Inverse Monoids\nAn Analysis of Neural Language Modeling at Multiple Scales\nLeveraging translations for speech transcription in low-resource  settings\nComposing DTI Visualizations with End-user Programming\nSome Bibliographical References on Intonation and Intonational Meaning\nLexical Functions and Machine Translation\nDistributional Part-of-Speech Tagging\nLexGram - a practical categorial grammar formalism -\nMemoization of Top Down Parsing\nSCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis  Using Artificial Neural Networks\nThe Logic Programming Paradigm and Prolog\nSynchronization from a Categorical Perspective\nIn memoriam Maurice Gross\nThe Parikh functions of sparse context-free languages are  quasi-polynomials\nProgramming languages with algorithmically parallelizing problem\nCREOLE: a Universal Language for Creating, 
Requesting, Updating and  Deleting Resources\nScattered context-free linear orderings\nOn religion and language evolutions seen through mathematical and agent  based models\nNotes on Electronic Lexicography\nFrom Mathematics to Abstract Machine: A formal derivation of an  executable Krivine machine\nThe state complexity of star-complement-star\nIndus script corpora, archaeo-metallurgy and Meluhha (Mleccha)\nTaxonomy and synthesis of Web services querying languages\nA Pointillism Approach for Natural Language Processing of Social Media\nOne-counter verifiers for decidable languages\nConcrete Semantics of Programs with Non-Deterministic and Random Inputs\nA Note on Kolmogorov-Uspensky Machines\nOn disjunction of equations in the semigroup language with no constants\nHard Asymptotic Sets for One-Dimensional Cellular Automata\nThe Geometry of Orbifolds via Lie Groupoids\nThe ins and outs of iteration in Mezzo\nInternal and external dynamics in language: Evidence from verb  regularity in a historical corpus of English\nA Transfer Theorem for the Separation Problem\nMultilingual Schema Matching for Wikipedia Infoboxes\nUnicode in Domain-Specific Programming Languages for Modeling &  Simulation: ScalaTion as a Case Study\nThe freeness problem over matrix semigroups and bounded languages\nSecurity Type Systems as Recursive Predicates\nIncorporating Semi-supervised Features into Discontinuous Easy-First  Constituent Parsing\nA Survey of Word Reordering in Statistical Machine Translation:  Computational Models and Language Phenomena\nLanguage, Twitter and Academic Conferences\nWhy Bother With Syntax?\nA Logical Approach to Event Handling in Imperative Languages\nParallels of human language in the behavior of bottlenose dolphins\nSimplified Boardgames\nLearning Python Code Suggestion with a Sparse Pointer Network\nSyntactic Structures of Regular Languages\nOn the Upward/Downward Closures of Petri Nets\nA Higher-Order Abstract Syntax Approach to the Verified Compilation of  
Functional Programs\nFrustratingly Short Attention Spans in Neural Language Modeling\nSecond order conservative languages with a Maltsev polymorphism also  have a majority polymorphism\nTwo Dichotomy Theorems\nLEPOR: An Augmented Machine Translation Evaluation Metric\nPredicting language diversity with complex network\nAbstracting Definitional Interpreters\nImproved bounds for testing Dyck languages\nOn h-Lexicalized Restarting Automata\nBüchi VASS recognise w-languages that are Sigma^1_1 - complete\nBoundedness in languages of infinite words\nMMCR4NLP: Multilingual Multiway Corpora Repository for Natural Language  Processing\nCross-Language Question Re-Ranking\nLanguage Modeling for Code-Switched Data: Challenges and Approaches\nLanguage Distribution Prediction based on Batch Markov Monte Carlo  Simulation with Migration\nThe process of purely event-driven programs\nContrastive Learning of Emoji-based Representations for Resource-Poor  Languages\nThe Very Idea of Dynamic Semantics\nGEMINI: A Natural Language System for Spoken-Language Understanding\nHaving Your Cake and Eating It Too: Autonomy and Interaction in a Model  of Sentence Processing\nAn Extended Clustering Algorithm for Statistical Language Models\nMultilingual Sentence Categorization according to Language\nA Note on Zipf's Law, Natural Languages, and Noncoding DNA regions\nA Dynamic Approach to Rhythm in Language: Toward a Temporal Phonology\nAttempto - From Specifications in Controlled Natural Language towards  Executable Specifications\nBuilding Probabilistic Models for Natural Language\nNatural Language Processing: Structure and Complexity\nCentering in Japanese Discourse\nMaximum Entropy Modeling Toolkit\nExploiting Context to Identify Lexical Atoms -- A Statistical View of  Linguistic Context\nNatural Language Dialogue Service for Appointment Scheduling Agents\nAdjunction As Substitution: An Algebraic Formulation of Regular,  Context-Free and Tree Adjoining Languages\nSemantics and Conversations 
for an Agent Communication Language\nUniversal Object Oriented Languages and Computer Algebra\nCross-Language Information Retrieval for Technical Documents\nApplying Machine Translation to Two-Stage Cross-Language Information  Retrieval\nExtension Language Automation of Embedded System Debugging\nAutomated Debugging In Java Using OCL And JDI\nProbabilistic top-down parsing and language modeling\nA New Approach to Formal Language Theory by Kolmogorov Complexity\nAn Algorithm for Aligning Sentences in Bilingual Corpora Using Lexical  Information\nXPath-Logic and XPathLog: A Logic-Programming Style XML Data  Manipulation Language\nDelimited continuations in natural language: quantification and polarity  sensitivity\nFoundations of Modern Language Resource Archives\nAutomatic Identification of Document Translations in Large Multilingual  Document Collections\nExtending an Information Extraction tool set to Central and Eastern  European languages\nBit-strings and other modifications of Viviane model for language  competition\nPhysics of randomness and regularities for cities, languages, and their  lifetimes and family trees\nIndo-European languages tree by Levenshtein distance\nA computer simulation of language families\nCross Comparison of Synonym Graphs in A Multi Linguistic Context\nTopology and Ambiguity in Omega Context Free Languages\nSyntax diagrams as a formalism for representation of syntactic relations  of formal languages\nHow to turn a scripting language into a domain specific language for  computer algebra\nUniversal Complex Structures in Written Language\nNP Datalog: a Logic Language for Expressing NP Search and Optimization  Problems\nMeasures of lexical distance between languages\nObject-oriented Programming Laws for Annotated Java Programs\nExtending scientific computing system with structural quantum  programming capabilities\nCross-Lingual Adaptation using Structural Correspondence Learning\nHierarchical states in the Compositional Interchange 
Format\nTree Languages Defined in First-Order Logic with One Quantifier  Alternation\nReducing the Number of Annotations in a Verification-oriented Imperative  Language\nBeating the Productivity Checker Using Embedded Languages\nLanguages of Dot-depth One over Infinite Words\nA Comparative Case Study of Code Reuse With Language Oriented  Programming\nSelective Memoization\nNeutral evolution: A null model for language dynamics\nA Concise Query Language with Search and Transform Operations for  Corpora with Multiple Levels of Annotation\nPositivity of the English language\nToward a Motor Theory of Sign Language Perception\nFault detection system for Arabic language\nPerformance of the Google Desktop, Arabic Google Desktop and Peer to  Peer Application in Arabic Language\nNew developments in parsing Mizar\nA static cost analysis for a higher-order language\nA Joint Model of Language and Perception for Grounded Attribute Learning\nOn logical hierarchies within FO^2-definable languages\nPartially-commutative context-free languages\nLanguages cool as they expand: Allometric scaling and the decreasing  need for new words\nBinary Patterns in Binary Cube-Free Words: Avoidability and Growth\nPyPLN: a Distributed Platform for Natural Language Processing\nHypergraph Automata: A Theoretical Model for Patterned Self-assembly\nA Multilingual Semantic Wiki Based on Attempto Controlled English and  Grammatical Framework\nUniformly defining valuation rings in Henselian valued fields with  finite or pseudo-finite residue fields\nHow fast can we make interpreted Python?\nAn introduction to the Europe Media Monitor family of applications\nComputing in Operations Research using Julia\nProceedings 5th Workshop on Programming Language Approaches to  Concurrency and Communication-cEntric Software\nAn Application of Answer Set Programming to the Field of Second Language  Acquisition\nStructured Approach to Web Development\nA Linear Logic Programming Language for Concurrent Programming over  
Graph Structures\nCompositional Morphology for Word Representations and Language Modelling\nSimplifying Nondeterministic Finite Cover Automata\nMore Structural Characterizations of Some Subregular Language Families  by Biautomata\nComputerization of African languages-French dictionaries\nCross-Language Personal Name Mapping\nReport on the First Workshop On the Globalization of Modeling Languages\nA Probabilistic Translation Method for Dictionary-based Cross-lingual  Information Retrieval in Agglutinative Languages\nMagic coins are useful for small-space quantum machines\nModeling Hybrid Systems in the Concurrent Constraint Paradigm\nHierarchy of Scales in Language Dynamics\nA Survey of Arabic Dialogues Understanding for Spontaneous Dialogues and  Instant Message\nExperience Report: Developing the Servo Web Browser Engine using Rust\nPrior Polarity Lexical Resources for the Italian Language\nWatson-Crick Quantum Finite Automata\nOne-Tape Turing Machine Variants and Language Recognition\nNatural Language Object Retrieval\nA declarative extension of parsing expression grammars for recognizing  most programming languages\nA Compositional Approach to Language Modeling\nTransfer Learning for Low-Resource Neural Machine Translation\nUncountable classical and quantum complexity classes\nManaging Schema Evolution in NoSQL Data Stores\nFrameworks for Reasoning about Syntax that Utilize Quotation and  Evaluation\nLifted Variable Elimination: Decoupling the Operators from the  Constraint Language\nPragmatic Neural Language Modelling in Machine Translation\nA Fundamental Scale of Descriptions for Analyzing Information Content of  Communication Systems\nA Solution to Yamakami's Problem on Advised Context-free Languages\nRegular realizability problems and context-free languages\nQuotient Complexities of Atoms in Regular Ideal Languages\nLearning Models for Following Natural Language Directions in Unknown  Environments\nPairs of Languages Closed under Shuffle 
Projection\nLeveraging Twitter for Low-Resource Conversational Speech Language  Modeling\nReasoning in complex environments with the SelectScript declarative  language\nPJAIT Systems for the IWSLT 2015 Evaluation Campaign Enhanced by  Comparable Corpora\nLearning Natural Language Inference with LSTM\nContrastive Entropy: A new evaluation metric for unnormalized language  models\nProceedings Eighth International Workshop on Programming Language  Approaches to Concurrency- and Communication-cEntric Software\nCross-Language Domain Adaptation for Classifying Crisis-Related Short  Messages\nRight Ideals of a Ring and Sublanguages of Science\nThe complexity of downward closure comparisons\nLearning Language Games through Interaction\nAristotle's square of opposition in the light of Hilbert's epsilon and  tau quantifiers\n2-tape 1-way Quantum Finite State Automata\nRepresenting Strategies\nOpen-Vocabulary Semantic Parsing with both Distributional Statistics and  Formal Knowledge\nObject Capabilities and Lightweight Affinity in Scala: Implementation,  Formalization, and Soundness\nLatent Tree Language Model\nNationalism, Immigration and the Dynamics of Language Evolution\nChallenges of Computational Processing of Code-Switching\nA Coordination Language for Databases\nMeasuring Asymmetric Opinions on Online Social Interrelationship with  Language and Network Features\nA Neural Architecture Mimicking Humans End-to-End for Natural Language  Inference\nStructural Correspondence Learning for Cross-lingual Sentiment  Classification with One-to-many Mappings\nLearn Quantum Mechanics with Haskell\nAutomata theory on sliding windows\nParent Oriented Teacher Selection Causes Language Diversity\nUtilizing Lexical Similarity between Related, Low-resource Languages for  Pivot-based SMT\nBuilding a Syllable Database to Solve the Problem of Khmer Word  Segmentation\nMachine Learning Based Source Code Classification Using Syntax Oriented  Features\nProceedings Tenth Workshop on 
Programming Language Approaches to  Concurrency- and Communication-cEntric Software\nDeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning\nCross-lingual Distillation for Text Classification\nLatent Human Traits in the Language of Social Media: An Open-Vocabulary  Approach\nOne-step and Two-step Classification for Abusive Language Detection on  Twitter\nPreliminary Exploration of Formula Embedding for Mathematical  Information Retrieval: can mathematical formulae be embedded like a natural  language?\nMassively Multilingual Neural Grapheme-to-Phoneme Conversion\nLanguage Identification Using Deep Convolutional Recurrent Neural  Networks\nCross-Lingual Dependency Parsing for Closely Related Languages -  Helsinki's Submission to VarDial 2017\nTransfer Learning across Low-Resource, Related Languages for Neural  Machine Translation\nNatural Language Inference over Interaction Space\nMinimal Dependency Translation: a Framework for Computer-Assisted  Translation for Under-Resourced Languages\nWord Translation Without Parallel Data\nCross-Language Learning for Program Classification using Bilateral  Tree-Based Convolutional Neural Networks\nBreaking the Softmax Bottleneck: A High-Rank RNN Language Model\nNatural Language Inference with External Knowledge\nSlim Embedding Layers for Recurrent Neural Language Models\nKilling Two Birds with One Stone -- Querying Property Graphs using  SPARQL via GREMLINATOR\nGeospatial distributions reflect rates of evolution of features of  language\nJU_KS@SAIL_CodeMixed-2017: Sentiment Analysis for Indian Code Mixed  Social Media Texts\nPreparing Bengali-English Code-Mixed Corpus for Sentiment Analysis of  Indian Languages\nOn the difficulty of a distributional semantics of spoken language\nSentiment Analysis of Code-Mixed Languages leveraging Resource Rich  Languages\nSO(5) as a Critical Dynamical Symmetry in the SU(4) Model of  High-Temperature Superconductivity\nA Bootstrap Approach to Automatically Generating Lexical 
Transfer Rules\nA General Framework for Automatic Termination Analysis of Logic Programs\nSymmetries and conservation laws in histories-based generalized quantum  mechanics\nSymmetries and conservation laws in histories-based theories\nPhases of massive gravity\nGenerando entrelazamiento en cadenas XY - (Generating entanglement in XY  chains)\nMethod of Squared Eigenfunction Potentials in Integrable Hierarchies of  KP Type\nGeneralized non-supersymmetric flux vacua\nSynechococcus as a \"singing\" bacterium: biology inspired by  micro-engineered acoustic streaming devices\nOn Generation of Firewall Log Status Reporter (SRr) Using Perl\nA Generic Scheme for Qualified Logic Programming\nOn the computation of non-perturbative effective potentials in the  string theory landscape -- IIB/F-theory perspective\nGeneralized Householder Transformations for the Complex Symmetric  Eigenvalue Problem\nGeneric solution of the heterogeneity-induced competing risk problem in  survival analysis\nAutomated Attack Planning\nHigh-level robot programming based on CAD: dealing with unpredictable  environments\nIntrinsic-Correlation Quantum Key Generation\nGeneralized Biwords for Bitext Compression and Translation Spotting\nConceptual Preconditions of Overcoming of Relativistic Intentions in  Modern Philosophy of Science\nA unifying description of dark energy\nA Logic Programming Playground for Lambda Terms, Combinators, Types and  Tree-based Arithmetic Computations\nA Program Logic for Verifying Secure Routing Protocols\nModel-checking Quantitative Alternating-time Temporal Logic on  One-counter Game Models\nVortex Formation and Evolution in Planet Harboring Disks under Thermal  Relaxation\nA Verified Information-Flow Architecture\nReachability Analysis of Reversal-bounded Automata on Series-Parallel  Graphs\nFitting Bayesian item response models in Stata and Stan\nTowards Spectral Geometric Methods for Euclidean Quantum Gravity\nOperational calculus on programming spaces\nA Generic 
Approach to Flow-Sensitive Polymorphic Effects (Extended  Version)\nQuestion Answering and Question Generation as Dual Tasks\nAdversarially Regularized Autoencoders\nTopology Analysis of International Networks Based on Debates in the  United Nations\nChimpCheck: Property-Based Randomized Test Generation for Interactive  Apps\nLattice Operations on Terms over Similar Signatures\nFlexible End-to-End Dialogue System for Knowledge Grounded Conversation\nAutomaton Semigroups and Groups: on the Undecidability of Problems  Related to Freeness and Finiteness\nFooling End-to-end Speaker Verification by Adversarial Examples\nInvariant Generation for Multi-Path Loops with Polynomial Assignments\nMachine Intelligence Techniques for Next-Generation Context-Aware  Wireless Networks\nAssertion-based QA with Question-Aware Open Information Extraction\nQuery Focused Abstractive Summarization: Incorporating Query Relevance,  Multi-Document Coverage, and Summary Length Constraints into seq2seq Models\nWaveform Modeling and Generation Using Hierarchical Recurrent Neural  Networks for Speech Bandwidth Extension\nInterpreting DNN output layer activations: A strategy to cope with  unseen data in speech recognition\nAutomatic Generation of Precise and Useful Commutativity Conditions  (Extended Version)\nTechnical Report about Tiramisu: a Three-Layered Abstraction for Hiding  Hardware Complexity from DSL Compilers\nInvestigating Generative Adversarial Networks based Speech  Dereverberation for Robust Speech Recognition\nConcepts for astronomical data accessibility and analysis via relational  database\nSecurity Policy Specification Using a Graphical Approach\nThe role of robust semantic analysis in spoken language dialogue systems\nThe Interpolation Theory of Radial Basis Functions\nChecking Finite State Machine Conformance when there are Distributed  Observations\nGenerating Exact- and Ranked Partially-Matched Answers to Questions in  Advertisements\nOn Periodically Iterated 
Morphisms\nMyhill-Nerode methods for hypergraphs\nPredicate Logic as a Modeling Language: Modeling and Solving some  Machine Learning and Data Mining Problems with IDP3\nCommunication complexity of promise problems and their applications to  finite automata\nUnifying Class-Based Representation Formalisms\nEngineering Benchmarks for Planning: the Domains Used in the  Deterministic Part of IPC-4\nACL2 Meets the GPU: Formalizing a CUDA-based Parallelizable All-Pairs  Shortest Path Algorithm in ACL2\nDMARF AND GIPSY High Level Architecture and Requirements Analysis\nA Big Data Analyzer for Large Trace Logs\nInferring Energy Bounds via Static Program Analysis and Evolutionary  Modeling of Basic Blocks\nHigh level implementation of geometric multigrid solvers for finite  element problems: applications in atmospheric modelling\nMachine Translation Evaluation: A Survey\nA Novel Framework to Expedite Systematic Reviews by Automatically  Building Information Extraction Training Corpora\nThe Semantic Knowledge Graph: A compact, auto-generated model for  real-time traversal and ranking of any relationship within a domain\nA Physician Advisory System for Chronic Heart Failure Management Based  on Knowledge Patterns\nHow to do lexical quality estimation of a large OCRed historical Finnish  newspaper collection with scarce resources\nImageJ2: ImageJ for the next generation of scientific image data\nInterconnected Linguistic Architecture\nSocial Media Text Processing and Semantic Analysis for Smart Cities\nQuality-Efficiency Trade-offs in Machine Learning for Text Processing\nLearning to Organize Knowledge with N-Gram Machines\nObtaining the Non-relativistic Quantum Mechanics from Quantum Field  Theory: Issues, Folklores and Facts\nConversational AI: The Science Behind the Alexa Prize\nHyperledger Fabric: A Distributed Operating System for Permissioned  Blockchains\nA Comparison of Word Embeddings for the Biomedical Natural Language  Processing\nTowards Runtime Monitoring of 
Node.js and Its Application to the  Internet of Things\nSymmetry Breaking and Adaptation: Evidence from a Toy Model of a Virus\nRestrictions on a geometrical language in gravity\nSpot-like Structures of Neutron Star Surface Magnetic Fields\nGrad-Shafranov Approach To Axisymmetric Stationary Flows In Astrophysics\nAPECS - The Atacama Pathfinder Experiment Control System\nRegular unimodal systems and factors of finite automata\nRiemannian theory of Hamiltonian chaos and Lyapunov exponents\nDiffusion Limited Aggregation and Iterated Conformal Maps\nOn the low energy properies of fermions with singular interactions\nGlobal symmetries of quantum Hall systems: lattice description\nN-species stochastic models with boundaries and quadratic algebras\nObject orientation and visualization of physics in two dimensions\nGeometrical Properties of Cumulant Expansions\nTime-dependent linear response of an inhomogeneous Bose superfluid:  Microscopic theory and connection to current-density functional theory\nSimplest random K-satisfiability problem\nSingularity Formation and Collapse in the Attractive Gross-Pitaevskii  Equation\nThermodynamic Formalism of the Harmonic Measure of Diffusion Limited  Aggregates: Phase Transition and Converged $f(α)$\nAccelerated growth of networks\nKondo effect in coupled quantum dots: a Non-crossing approximation study\nThe Functional Schrödinger Picture Approach to Many-Particle Systems\nIrreversibility time scale\nRestoring Coherence Lost to a Slow Interacting Mesoscopic Bath\nMagnon condensation into Q-ball in 3He-B\nCue Phrase Classification Using Machine Learning\nComputing Declarative Prosodic Morphology\nDeriving Abstract Semantics for Forward Analysis of Normal Logic  Programs\nA Probabilistic Approach to Lexical Semantic Knowledge Acquisition and S  tructural Disambiguation\nEmpirically Evaluating an Adaptable Spoken Dialogue System\nDRAFT : Task System and Item Architecture (TSIA)\nA Second Step Towards Complexity-Theoretic Analogs of 
Rice's Theorem\nRequirements of Text Processing Lexicons\nACLP: Integrating Abduction and Constraint Solving\nOptimal Belief Revision\nNaive Bayes and Exemplar-Based approaches to Word Sense Disambiguation  Revisited\nPolynomial-time Computation via Local Inference Relations\nAn Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam  Filtering with Personal E-mail Messages\nMeasuring efficiency in high-accuracy, broad-coverage statistical  parsing\nLearning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a  Memory-Based Approach\nOn-the-fly Query-Based Debugging with Examples\nThe Existential Theory of Equations with Rational Constraints in Free  Groups is PSPACE-Complete\nIterative Residual Rescaling: An Analysis and Generalization of LSI\nInference of termination conditions for numerical loops\nLooking Under the Hood : Tools for Diagnosing your Question Answering  Engine\nInference of termination conditions for numerical loops in Prolog\nOn termination of meta-programs\nMragyati : A System for Keyword-based Searching in Databases\nFast Context-Free Grammar Parsing Requires Fast Boolean Matrix  Multiplication\nA Backward Analysis for Constraint Logic Programs\nUsing parametric set constraints for locating errors in CLP programs\nConformal Geometry, Euclidean Space and Geometric Algebra\nMonitoring and Debugging Concurrent and Distributed Object-Oriented  Systems\nA Probabilistic Method for Analyzing Japanese Anaphora Integrating Zero  Pronoun Detection and Resolution\nThinking, Learning, and Autonomous Problem Solving\nOn Applying Or-Parallelism and Tabling to Logic Programs\nA Cross-media Retrieval System for Lecture Videos\ncTI: A constraint-based termination inference tool for ISO-Prolog\nSchedulers and Redundancy for a Class of Constraint Propagation Rules\nIncompleteness of States w.r.t. 
Traces in Model Checking\nInteractive visualization of higher dimensional data in a multiview  environment\nLearning Hybrid Algorithms for Vehicle Routing Problems\nA Public Reference Implementation of the RAP Anaphora Resolution  Algorithm\nA New Approach to Draw Detection by Move Repetition in Computer Chess  Programming\nWord Sense Disambiguation by Web Mining for Word Co-occurrence  Probabilities\nExtending Design by Contract for Aspect-Oriented Programming\nMapping Fusion and Synchronized Hyperedge Replacement into Logic  Programming\nEfficient Multiclass Implementations of L1-Regularized Maximum Entropy\nProving or Disproving likely Invariants with Constraint Reasoning\nOn Algorithms and Complexity for Sets with Cardinality Constraints\nSolving Partial Order Constraints for LPO Termination\nIncremental copying garbage collection for WAM-based Prolog systems\nFast Frequent Querying with Lazy Control Flow Compilation\nReasoning About Knowledge of Unawareness\nOn the Design of Agent-Based Systems using UML and Extensions\nModeling the Dynamics of Social Networks\n10^(10^6) Worlds and Beyond: Efficient Representation and Processing of  Incomplete Information\nLexical Adaptation of Link Grammar to the Biomedical Sublanguage: a  Comparative Evaluation of Three Approaches\nOn the confluence of lambda-calculus with conditional rewriting\nPartial Evaluation for Program Comprehension\nApplying Part-of-Seech Enhanced LSA to Automatic Essay Grading\nUn modèle générique d'organisation de corpus en ligne: application  à la FReeBank\nThe Reaction RuleML Classification of the Event / Action / State  Processing and Reasoning Space\nA Web-based Tool Combining Different Type Analyses\nPre-Requirement Specification Traceability: Bridging the Complexity Gap  through Capabilities\nDomain Directed Dialogs for Decision Processes\nSymbolic Methods to Enhance the Precision of Numerical Abstract Domains\nOn the alternative description of complex holomorphic and Lorentz  geometries in 
four dimensions\nThe Use of Computer Algebra in Maxwell's Theory\nThe Covariant Approach to LRS Perfect Fluid Spacetime Geometries\nQuantum Prediction Algorithms\nQuantum gravity effects in the CGHS model of collapse to a black hole\nThe Angular Resolution of Space-Based Gravitational Wave Detectors\nSymmetries of homogeneous cosmologies\nBlack Hole Thermodynamics without a Black Hole?\nSurface terms and the Gauss-Bonnet Hamiltonian\nAcoustic Wormholes\nKerr-Sen dilaton-axion black hole lensing in the strong deflection limit\nInferring the intensity of Poisson processes at the limit of the  detector sensitivity (with a case study on gravitational wave burst search)\nLattice DIS Structure Functions\nThe Taming of QCD by Fortran 90\nPerturbative renormalization of the first two moments of non-singlet  quark distributions with overlap fermions\nCan a pseudo-symmetry solve the cosmological constant problem?\n$1/N_c$ Expansion for Excited Baryons\nFrom kaons to neutrinos: quantum mechanics of particle oscillations\nSpin effects in tau-lepton pair production at LHC\nA simple explanation of the non-appearance of physical gluons and quarks\nDynamics of the scalar field in 5-dimensional Kaluza-Klein theory\nQuark Imaging in the Proton Via Quantum Phase-Space Distributions\nJaxoDraw: A graphical user interface for drawing Feynman diagrams\nSimulation of long-baseline neutrino oscillation experiments with GLoBES\nRunMC - an object-oriented analysis framework for Monte Carlo simulation  of high-energy particle collisions\nChern-Simons Perturbation Theory\nA path-integral approach to polynomial invariants of links\nSymmetries and String Field Theory in D=2\nOn braided tensor categories\nDispersionless Toda hierarchy and two-dimensional string theory\nOn quantum group SL_q(2)\nThe Quantum Gauge Principle\nCoarse Grainings and Irreversibility in Quantum Field Theory\nGaudin Model, KZ Equation, and Isomonodromic Problem on Torus\nEquivalence between a bosonic theory and a massive 
non-local Thirring  model at Finite Temperature\nD-branes on Nonabelian Threefold Quotient Singularities\nTowards SO(2,10)-Invariant M-Theory: Multilagrangian Fields\nD3-branes on partial resolutions of abelian quotient singularities of  Calabi-Yau threefolds\nDiscrete Torsion and Gerbes II\nQuasi-local structure of p-form theory\nNoncompact Heisenberg spin magnets from high-energy QCD: I. Baxter  Q-operator and Separation of Variables\nD-brane probes on G2 Orbifolds\nCFT Description of Identity String Field: Toward Derivation of the VSFT  Action\nOn the Relation Between Fock and Schroedinger Representations for a  Scalar Field\nA universal symmetry structure in open string theory\nSuperfield Approach to (Non-)local Symmetries for One-Form Abelian Gauge  Theory\nFrom Branes to Branes\nTwistors, special relativity, conformal symmetry and minimal coupling -  a review\nHypermultiplets and hypercomplex geometry from 6 to 3 dimensions\nCovariant One-Loop Amplitudes in D=11\nFermions in the harmonic potential and string theory\nThe Ricci Curvature of Half-flat Manifolds\nCombinatorial problems about free groups and algebras\nThe Theory of Ultralogics Part II\nQuantum Homology of fibrations over $S^2$\nOn an ambiguity in the concept of partial and total derivatives in  classical analysis\nIdempotent functional analysis: an algebraic approach\nFactorization of nonlinear heat equation posed on Riemann manifold\nA classical approach to TQFT's\nVirtual moduli cycles and Gromov-Witten invariants of noncompact  symplectic manifolds\nSubrepresentations of Kronecker representations\nA graphic generalization of arithmetic\nWeak Omega Categories I\nDiscrete Baker Transformation and Cellular Automata\nSheaves and D-modules in integral geometry\nHow can we escape Thomae's relations?\nThe planar Tree Lagrange Inversion Formula\nAnalytic cell decomposition and analytic motivic integration\nLie algebroids and Cartan's method of equivalence\nIntegration and conjugacy in knot theory\nA 
gerbe for the elliptic gamma function\nFull field algebras, operads and tensor categories\nChen-Ruan cohomology of ADE singularities\nSolving One-Variable Equations in Free Groups\nA cohomological interpretation of Brion's formula\nThe loop problem for monoids and semigroups\nReal closed fields with nonstandard and standard analytic structure\nThe physical heritage of Sir W.R. Hamilton\nThe mathematical role of (commutative and noncommutative) infinitesimal  random walks over (commutative and noncommutative) riemannian manifolds in  Quantum Physics\nChaNoXity: The Nonlinear Dynamics of Nature\n$q$-analogue of modified KP hierarchy and its quasi-classical limit\nSocial network from communities of electronic mail\nLimited-Diffraction Solutions to Maxwell and to Schroedinger Equations\nThe Network Solution For Electron Identification in a Wide Momentum  Region with a TRD\nNonlinear relaxation field in charged systems under high electric fields\nA Segunda Lei da Termodinamica na formulacao da Lei de Hooke\nApplications of geometric algebra to black holes and Hawking radiation\nThe Indefinite Logarithm, Logarithmic Units, and the Nature of Entropy\nTopology Induced Coarsening in Language Games\nInconsistencies of Neutrino and Quark Conjectures and their Negative  Environmental Implications\nFusion of q-tensor operators: quasi-Hopf-algebraic point of view\nMultibraces on the Hochschild complex\nDeformation quantization of Poisson manifolds, I\nEvolutionary ecology in-silico: Does mathematical modelling help in  understanding the \"generic\" trends?\nParametrized Stochastic Grammars for RNA Secondary Structure Prediction\nMixed-Cultures and Alcoholic Fermentations\nElectron structure, Zitterbewegung, and the new non-linear Dirac-like  equation\nOptimal Copying of One Quantum Bit\nNon-local quantum evolution of entangled ensemble states in neural nets  and its significance for brain function and a theory of consciousness\nPseudo-forces in quantum mechanics\nSpin and 
Electron Structure\nA relativistically invariant mass operator\nQuantum Search on Bounded-Error Inputs\nQuantum Computational Logics. A Survey\nStatistical Structures Underlying Quantum Mechanics and Social Science\nCompiling gate networks on an Ising quantum computer\nDistributed measurement-based quantum computation\nA Pragmatic Interpretation of Quantum Logic\nQuantum automata, braid group and link polynomials\nQuantum Predicative Programming\nAdaptive strategies for graph state growth in the presence of monitored  errors\n(1+1)-Dirac particle with position-dependent mass in complexified  Lorentz scalar interactions: effectively PT-symmetric\nOn the impossibility of extracting classical randomness using a quantum  computer\nQuantum Walks, Quantum Gates and Quantum Computers\nA Topos Foundation for Theories of Physics: IV. Categories of Systems\nAn architecture-based dependability modeling framework using AADL\nAre We Typical?\nAvoiding Rotated Bitboards with Direct Lookup\nBirationality of étale morphisms via surgery\nField theoretic description of the abelian and non-abelian Josephson  effect\nDiagnostic tools for 3D unstructured oceanographic data\nThe Common Origin of Linear and Nonlinear Chiral Multiplets in N=4  Mechanics\nt-J model then and now: A personal perspective from the pioneering times\nA Robust Linguistic Platform for Efficient and Domain specific Web  Content Analysis\nSome Observations for Mean-Field Spin Glass Models\nAn Architecture Framework for Complex Data Warehouses\nRemoving Manually-Generated Boilerplate from Electronic Texts:  Experiments with Project Gutenberg e-Books\nSpatial Aggregation: Data Model and Implementation\nVacuum driven accelerated expansion\nEfficient Divide-and-Conquer Implementations Of Symmetric FSAs\nOptimal quantum adversary lower bounds for ordered search\nOn the interaction between sharing and linearity\nNonstandard Higgs Decays with Visible and Missing Energy\nFORAY-GEN: Automatic Generation of Affine Functions 
for Memory  Optimizations\nSWI-Prolog and the Web\nOn the embedding of spacetime in five-dimensional Weyl spaces\nQuivers, Geometric Invariant Theory, and Moduli of Linear Dynamical  Systems\nAxiomatizing rational power series\nQIS-XML: A metadata specification for Quantum Information Science\nMatrix Hamiltonians with an algebraic guarantee of unbroken PT-symmetry\nCall-by-value Termination in the Untyped lambda-calculus\nBetween conjecture and memento: shaping a collective emotional  perception of the future\nSpreadsheet Assurance by \"Control Around\" is a Viable Alternative to the  Traditional Approach\nEquation of Motion for the Quantum Characteristic Functions\nPython - All a Scientist Needs\nBreaking Out of the Cell: On The Benefits of a New Spreadsheet  User-Interaction Paradigm\nOn group theory for quantum gates and quantum coherence\nAn Introduction to Topos Physics\nThe elliptic GL(n) dynamical quantum group as an h-Hopf algebroid\nIntegration I(d) of Nonstationary Time Series: Stationary and  nonstationary increments\nOptimization of Enzymatic Biochemical Logic for Noise Reduction and  Scalability: How Many Biocomputing Gates Can Be Interconnected in a Circuit?\nInformation filtering based on wiki index database\nThe Complexity of Coverage\nOn d-dimensional d-Semimetrics and Simplex-Type Inequalities for  High-Dimensional Sine Functions\nNoncommutative gravity, a `no strings attached' quantum-classical  duality, and the cosmological constant puzzle\nManaging conflicts between users in Wikipedia\nSmall overlap monoids II: automatic structures and normal forms\nThe Correspondence Analysis Platform for Uncovering Deep Structure in  Data and Information\nCPBVP: A Constraint-Programming Framework for Bounded Program  Verification\nDescribeX: A Framework for Exploring and Querying XML Web Collections\nAgent-based model of competition in a social structure\nExecutable Set Theory and Arithmetic Encodings in Prolog\nText Modeling using Unsupervised Topic Models 
and Concept Hierarchies\nA Complete Grammar for Decomposing a Family of Graphs into 3-connected  Components\nElectromagnetic Fields Produced by Moving Sources in a Curved Beam Pipe\nOn finite-index extensions of subgroups of free groups\nMorphic and Automatic Words: Maximal Blocks and Diophantine  Approximation\nAlgorithmic derivation of Dyson-Schwinger Equations\nExtending Cantor Paradox\nQuantum theta functions and Gabor frames for modulation spaces\nArithmetic gravity and Yang-Mills theory: An approach to adelic physics  via algebraic spaces\nThe ADAPT Tool: From AADL Architectural Models to Stochastic Petri Nets  through Model Transformation\nEffective Theory of Braid Excitations of Quantum Geometry in terms of  Feynman Diagrams\nTricritical O(n) models in two dimensions\nEquivariant Quantizations of Symmetric Algebras\nClifford Algebra with Mathematica\nQuantum Curves and D-Modules\nAssembling Actor-based Mind-Maps from Text Stream\nBlack Holes, AdS, and CFTs\nBlasting through lattice calculations using CUDA\nSymbolic model checking of tense logics on rational Kripke models\nCoalgebraic Automata Theory: Basic Results\nStability structures, motivic Donaldson-Thomas invariants and cluster  transformations\nQuantum Symmetries and Marginal Deformations\nA polytime proof of correctness of the Rabin-Miller algorithm from  Fermat's little theorem\nProbabilistic SVM/GMM Classifier for Speaker-Independent Vowel  Recognition in Continues Speech\nOn Poncelet's maps\nTAUOLA, TAUOLA universal interface PHOTOS and MC-TESTER: Status Report\nBeyond Language Equivalence on Visibly Pushdown Automata\nGraphical Reasoning in Compact Closed Categories for Quantum Computation\nCompatibility of (co)actions and localizations\nPolynomial Size Analysis of First-Order Shapely Functions\nTowards a human proof of Gessel's conjecture\nImageSpace: An Environment for Image Ontology Management\nSymbolic Computing with Incremental Mindmaps to Manage and Mine Data  Streams - Some 
Applications\nConvergence, Strong Law of Large Numbers, and Measurement Theory in the  Language of Fuzzy Variables\nFootprints in Local Reasoning\nTemporal Platonic Metaphysics\nBetter Termination for Prolog with Constraints\nRfuzzy framework\nFuzzy Chemical Abstract Machines\nThe Cox ring of an algebraic variety with torus action\nMathematical Model for Transformation of Sentences from Active Voice to  Passive Voice\nOn the embedding of spacetime in higher-dimensional spaces with torsion\nDiscrete concavity and the half-plane property\nInduction of High-level Behaviors from Problem-solving Traces using  Machine Learning Tools\nFermi liquid theory for SU(N) Kondo model\nOn expressive power and class invariance\nOn Kurosh problem in varieties of algebras\nA Concrete View of Rule 110 Computation\nModelTalk: A Framework for Developing Domain Specific Executable Models\nXML Data Integrity Based on Concatenated Hash Function\nReasoning About Knowledge of Unawareness Revisited\nInteracting Quantum Observables: Categorical Algebra and Diagrammatics\nInformation processing in convex operational theories\nRIOT: I/O-Efficient Numerical Computing without SQL\nUsing Ellipsoidal Domains to Analyze Control Systems Software\nAnalyse en dépendances à l'aide des grammaires d'interaction\nPrograde rotation of protoplanets by accretion of pebbles in a gaseous  environment\nComposition and Inversion of Schema Mappings\nSaturated fusion systems as idempotents in the double Burnside ring\nThermodynamics as a nonequilibrium path integral\nEuclidean Jordan Algebras, Hidden Actions, and $J$-Kepler Problems\nA Tighter Bound for the Determinization of Visibly Pushdown Automata\nDecomposable functors and the exponential principle, II\nHybrid modeling of plasmas\nUntangling Phase and Time in Monophonic Sounds\nInferring Information from Feature Diagrams to Product Line Economic  Models\nQuiver Chern-Simons Theories, D3-branes and Lorentzian Lie 3-algebras\nGesture Recognition with a Focus on 
Important Actions by Using a Path  Searching Method in Weighted Graph\nAxiomatisability problems for S-posets\nNoncommutative gauge theory using covariant star product defined between  Lie-valued differential forms\nNamed Models in Coalgebraic Hybrid Logic\nRapidMind: Portability across Architectures and its Limitations\nWeighted Logics for Nested Words and Algebraic Formal Power Series\nExtensional and Intensional Strategies\nIntegrating Interval Constraints into Logic Programming\nTowards Parameterized Regular Type Inference Using Set Constraints\nA PCP Characterization of AM\nSelf-Organized Criticality in Solar Physics and Astrophysics\nS-Program Calculus\nAsymptotic principal values and regularization methods for correlation  functions with reflective boundary conditions\nDust of Dark Energy\nRecognition of handwritten Roman Numerals using Tesseract open source  OCR engine\nMore on Dimension-4 Proton Decay Problem in F-theory -- Spectral  Surface, Discriminant Locus and Monodromy\nFinite Optimal Control for Time-Bounded Reachability in CTMDPs and  Continuous-Time Markov Games\nNotations Around the World: Census and Exploitation\nInformal Concepts in Machines\nImplementation of the Six Channel Redundancy to achieve fault tolerance  in testing of satellites\nA Performance Comparison of CUDA and OpenCL\nClark-Ocone type formula for non-semimartingales with finite quadratic  variation\nA Tree Logic with Graded Paths and Nominals\nProducts of Weighted Logic Programs\nA Framework for Constraint-Based Deployment and Autonomic Management of  Distributed Applications (Extended Abstract)\nDynamical systems theory for nonlinear evolution equations\nOptimal Time-Abstract Schedulers for CTMDPs and Markov Games\nCircuit Design Methods for Quantum Separator (QS) and Systems to Use Its  Output\nCommunicative Competencies and the Structuration of Expectations: The  creative tension between Habermas' critical theory and Luhmann's social  systems theory\nRuntime Analysis of 
Probabilistic Programs with Unbounded Recursion\nSupervisory Control Synthesis of Discrete-Event Systems using  Coordination Scheme\nSingle Parameter Combinatorial Auctions with Partially Public Valuations\nA Declarative Semantics for CLP with Qualification and Proximity\nSymmetric categorial grammar: residuation and Galois connections\nOn the Count of Trees\nA Transformation-based Implementation for CLP with Qualification and  Proximity\nMulti-Agent Only-Knowing Revisited\nSimplifying Negative Goals Using Typed Existence Properties\nDigital image exploration at Maui Community College\nHochschild-Kohomologien von Observablenalgebren in der Klassischen  Feldtheorie\nTowards Quality of Service and Resource Aware Robotic Systems through  Model-Driven Software Development\nCurved infinity-algebras and their characteristic classes\nImplementing Lego Agents Using Jason\nA probabilistic top-down parser for minimalist grammars\nLibpsht - algorithms for efficient spherical harmonic transforms\nAn Exploration of OpenCL for a Numerical Relativity Application\nA non-expert view on Turing machines, Proof Verifiers, and Mental  reasoning\nOn Selective Unboundedness of VASS\nProbabilistic regular graphs\nStabilizing knowledge through standards - A perspective for the  humanities\nOn Three Alternative Characterizations of Combined Traces\nAsymptotic safety: a simple example\nA Calculus of Consistent Component-based Software Updates\nSoftware correlators as testbeds for RFI algorithms\nLoops under Strategies ... 
Continued\nInfinite dimensional manifolds from a new point of view\nAutomata and temporal logic over arbitrary linear time\nModelling the Spatial Dynamics of Culture Spreading in the Presence of  Cultural Strongholds\nRelating Church-Style and Curry-Style Subtyping\nAn automaton over data words that captures EMSO logic\nAn Algebra of Synchronous Scheduling Interfaces\nThe BinProlog Experience: Architecture and Implementation Choices for  Continuation Passing Prolog and First-Class Logic Engines\nBüchi Automata can have Smaller Quotients\nThe YAP Prolog System\nComparative Study on DFD to UML Diagrams Transformations\nGeometric picture of quantum discord for two-qubit quantum states\nEPS Confidentiality and Integrity mechanisms Algorithmic Approach\nClass Schema Evolution for Persistent Object-Oriented Software: Model,  Empirical Study, and Automated Support\nAchievable Sets in Z^n\nU(N) Based Transformations in N-Squared Dimensions\nA Paradoxical Property of the Monkey Book\nFitting Ranked English and Spanish Letter Frequency Distribution in U.S.  
and Mexican Presidential Speeches\nAn Empirical Study of Real-World SPARQL Queries\nOn Friedmann-Lemaître-Robertson-Walker cosmologies in non-standard  gravity\nThe Critical Exponent is Computable for Automatic Sequences\nEctoplasm with an Edge\nPartial-Order Planning with Concurrent Interacting Actions\nParameterized complexity results for 1-safe Petri nets\nNonplanar Integrability: Beyond the SU(2) Sector\nFrom Causal Models To Counterfactual Structures\nThe Hirzebruch--Riemann--Roch theorem in true genus-0 quantum K-theory\nUnderstanding Code Patterns - Analysis, Interpretation & Measurement\nGreen-Schwarz Mechanism in Heterotic (2,0) Gauged Linear Sigma Models:  Torsion and NS5 Branes\nAbstraction Super-structuring Normal Forms: Towards a Theory of  Structural Induction\nImprovements for Free\nDecidable Problems for Probabilistic Automata on Infinite Words\nEdit wars in Wikipedia\nlibCreme: An optimization library for evaluating convex-roof  entanglement measures\nActual Causation in CP-logic\nComputational wave optics library for C++: CWO++ library\nInnocent strategies as presheaves and interactive equivalences for CCS\nA Process Algebra for Supervisory Coordination\nUniform Labeled Transition Systems for Nondeterministic, Probabilistic,  and Stochastic Process Calculi\nLow-energy effective field theory for finite-temperature relativistic  superfluids\nInfinite permutations vs. 
infinite words\nATP and Presentation Service for Mizar Formalizations\nConstraint-Based Deadlock Checking of High-Level Specifications\nNested Hoare Triples and Frame Rules for Higher-order Store\nInnocent strategies as presheaves and interactive equivalences for CCS  (expanded version)\nPushdown Abstractions of JavaScript\nComplexity of Model Checking Recursion Schemes for Fragments of the  Modal Mu-Calculus\nExtreme value distributions and Renormalization Group\nMemoryless computation: new results, constructions, and extensions\nMathematics : The Language of Science\nScikit-learn: Machine Learning in Python\nLTL to Büchi Automata Translation: Fast and More Deterministic\nAbstracting Runtime Heaps for Program Understanding\nPSDF: Particle Stream Data Format for N-Body Simulations\nMathematical and computational modeling for describing the basic  behavior of free radicals and antioxidants within epithelial cells\nTransporting Functions across Ornaments\nA Transformation-based Implementation for CLP with Qualification and  Proximity\nBosonic Loop Diagrams as Perturbative Solutions of the Classical Field  Equations in $φ^4$-Theory\nFirst-Order Model Checking on Generalisations of Pushdown Graphs\nGenerating a Performance Stochastic Model from UML Specifications\nA family of weakly universal cellular automata in the hyperbolic plane  with two states\nSubtyping for F-Bounded Quantifiers and Equirecursive Types (Extended  Version)\nModel-Checking the Higher-Dimensional Modal mu-Calculus\nGPGPU Processing in CUDA Architecture\nComputer-Assisted Program Reasoning Based on a Relational Semantics of  Programs\nHolographic Higgs Phases\nTraining Restricted Boltzmann Machines on Word Observations\nApplying SMT Solvers to the Test Template Framework\nExact Gap Computation for Code Coverage Metrics in ISO-C\nThe Horse Raced Past: Gardenpath Processing in Dynamical Systems\nAD in Fortran, Part 2: Implementation via Prepreprocessor\nDiscovering Algorithms with Matrix 
Code\nInflationary perturbation theory is geometrical optics in phase space\nStrong convergence for reduced free products (Remarks on a result by  Paul Skoufranis)\nSynchronizing Automata on Quasi Eulerian Digraph\nProbabilistic Similarity Logic\nExploring Text Virality in Social Networks\nAsynchronous Games over Tree Architectures\nTrace Spaces: an Efficient New Technique for State-Space Reduction\nAttribute Exploration of Gene Regulatory Processes\nThe failure of the law of brevity in two New World primates. Statistical  caveats\nSome remarks on the genesis of scalar-tensor theories\nThe Linguistic Interpretation of Quantum Mechanics\nDeterministic Automata for the (F,G)-fragment of LTL\nA New MHD Code with Adaptive Mesh Refinement and Parallelization for  Astrophysics\nAlgebraic points on meromorphic curves\nSecuring SQLJ Source Codes from Business Logic Disclosure by Data Hiding  Obfuscation\nBrane Induced Gravity: From a No-Go to a No-Ghost Theorem\nOld and New Reductions of Dispersionless Toda Hierarchy\nCyberChair: A Web-Based Groupware Application to Facilitate the Paper  Reviewing Process\nGibbs Sampling in Factorized Continuous-Time Markov Processes\nSuccinct Representations for Abstract Interpretation\nThe Machian contribution of the Universe to geodetic precession, frame  dragging and gravitational clock effect\nA generic framework for video understanding applied to group behavior  recognition\nFast computation of gradient and sentitivity in 13C metabolic flux  analysis instationary experiments using the adjoint method\nKeyphrase Based Arabic Summarizer (KPAS)\nSemantic Networks of Interests in Online NSSI Communities\nHorava-Lifshitz theory as a Fermionic Aether in Ashtekar gravity\nApplying Deep Belief Networks to Word Sense Disambiguation\nUPPAAL-SMC: Statistical Model Checking for Priced Timed Automata\nNonparametric Bayesian Logic\nPlanning in POMDPs Using Multiplicity Automata\nOn the Relationship between LTL Normal Forms and Buechi 
Automata\nBlack Holes as Critical Point of Quantum Phase Transition\nSolving Stochastic Büchi Games on Infinite Arenas with a Finite  Attractor\nAssume-Guarantee Abstraction Refinement for Probabilistic Systems\nTesting the concept of integral approach to derivatives within the  smoothed particle hydrodynamics technique in astrophysical scenarios\nBasic Data Analysis and More - A Guided Tour Using Python\nQuery Optimization Over Web Services Using A Mixed Approach\nProbabilistic Monads, Domains and Classical Information\nProof of Brlek-Reutenauer conjecture\nA new operation on partially ordered sets\nForward Analysis for WSTS, Part II: Complete WSTS\nFinite Bisimulations for Switched Linear Systems\nHardness of approximation for quantum problems\nTensor networks and the numerical renormalization group\nCodensity and the ultrafilter monad\nCognitive Bias for Universal Algorithmic Intelligence\nOptimal Weighting of Multi-View Data with Low Dimensional Hidden States\nMovie Popularity Classification based on Inherent Movie Attributes using  C4.5,PART and Correlation Coefficient\nFast Packed String Matching for Short Patterns\nProceedings 8th International Workshop on Quantum Physics and Logic\nEnhancing Twitter Data Analysis with Simple Semantic Filtering: Example  in Tracking Influenza-Like Illnesses\nAnnotation of Logic Programs for Independent AND-Parallelism by Partial  Evaluation\nElaborating Inductive Definitions\nTransition-Based Dependency Parsing With Pluggable Classifiers\nA Rewriting View of Simple Typing\nOn the Performance Potential of Connection Fault-Tolerant Commit  Processing in Mobile Environment\nDeterministic Compression with Uncertain Priors\nFormal Semantics of Heterogeneous CUDA-C: A Modular Approach with  Applications\nUniform Strategies\nQuery-focused Multi-document Summarization: Combining a Novel Topic  Model with Graph-based Semi-supervised Learning\nHigher Massey products in the cohomology of mild pro-p-groups\nWeb-Based Question Answering: 
A Decision-Making Perspective\nComplexity fits the fittest\nKilling Spinors -- Beyond Supergravity\nLeading Questions in an Extended Standard Model\nCellular automata between sofic tree shifts\nInductive Policy Selection for First-Order MDPs\nAnalysis of Influence of Internet Retail Service Quality (IRSQ) to  Consumer Online Shopping Satisfaction at www.kebanaran.com\nA Dual Number Approach for Numerical Calculation of Velocity and  Acceleration in the Spherical 4R Mechanism\nAPI Blender: A Uniform Interface to Social Platform APIs\nA Complete Calculus for Possibilistic Logic Programming with Fuzzy  Propositional Variables\nDesign Pattern-Based Extension of Class Hierarchies to Support Runtime  Invariant Checks\nInternational collaboration clusters in Africa\nTransfer Topic Modeling with Ease and Scalability\nCausal Theories: A Categorical Perspective on Bayesian Networks\nProbabilistic Latent Semantic Analysis\nQuantifying Opacity\nDynamical Local Chirality and Chiral Symmetry Breaking\nSémantique des déterminants dans un cadre richement typé\nDisplaying Asynchronous Reactions to a Document: Two Goals and a Design\nObject Recognition with Imperfect Perception and Redundant Description\nA cellular basis of the $q$-Brauer algebra related with Murphy bases of  the Hecke algebras\nFormation of photon spheres in boson stars with a nonminimally coupled  field\nEnding-based Strategies for Part-of-speech Tagging\nThe Automated Mapping of Plans for Plan Recognition\nOrder effects in dynamic semantics\nTowards an Updatable Strategy Logic\nThe Complexity of Synthesizing Uniform Strategies\nDecision problems for word-hyperbolic semigroups\nProbabilistic Topic and Syntax Modeling with Part-of-Speech LDA\nTypes and forgetfulness in categorical linguistics and quantum mechanics\nFrom propositional to first-order monitoring\nOptimal control of a bioreactor for biofuel production\nA Federated CloudNet Architecture: The PIP and the VNP Role\n#Bigbirds Never Die: Understanding 
Social Dynamics of Emergent Hashtag\nDesign for a Darwinian Brain: Part 2. Cognitive Architecture\nMATAWS: A Multimodal Approach for Automatic WS Semantic Annotation\nPySLHA: a Pythonic interface to SUSY Les Houches Accord data\nThe homology of $\\mathrm{tmf}$\nNatural Tuning: Towards A Proof of Concept\nA construction of integrated vertex operator in the pure spinor  sigma-model in AdS5xS5\nRandom close packing fractions of lognormal distributions of hard  spheres\nAutomatic Abstraction in SMT-Based Unbounded Software Model Checking\nRecurrent Convolutional Neural Networks for Discourse Compositionality\nEffective Translation of LTL to Deterministic Rabin Automata: Beyond the  (F,G)-Fragment\nDiscriminative Training: Learning to Describe Video with Sentences, from  Video Described with Sentences\nNext generation input-output data format for HEP using Google's protocol  buffers\nAdditive invariants in o-minimal valued fields\nVerifying Time Complexity of Deterministic Turing Machines\nBayesian Structured Prediction Using Gaussian Processes\nPushdown Systems for Monotone Frameworks\nSpeaker Independent Continuous Speech to Text Converter for Mobile  Application\nAlgebraic Meta-Theory of Processes with Data\nTowards a General Framework for Formal Reasoning about Java Bytecode  Transformation\nA Novel Architecture for Relevant Blog Page Identifcation\nBayesOpt: A Library for Bayesian optimization with Robotics Applications\nExploring the Boundaries of Monad Tensorability on Set\nBlazes: Coordination Analysis for Distributed Programs\nUsing Self-Organizing Maps for Sentiment Analysis\nRepresenting Code History with Development Environment Events\nAbelian complexity function of the Tribonacci word\nCoroutining Folds with Hyperfunctions\nAbstraction and Learning for Infinite-State Compositional Verification\nInteractive proofs for BQP via self-tested graph states\nDomain-Specific Sentiment Word Extraction by Seed Expansion and Pattern  Generation\nMadeup: A Mobile 
Development Environment for Programming 3-D Models\nAn analogue of Cobham's theorem for graph directed iterated function  systems\nKramers-Wannier Duality of Statistical Mechanics Applied to the Boolean  Satisfiability Problem of Computer Science\nIMSuite: A Benchmark Suite for Simulating Distributed Algorithms\nA Bayesian Network View on Acoustic Model-Based Techniques for Robust  Speech Recognition\nContext unification is in PSPACE\nA Logic-based Approach for Recognizing Textual Entailment Supported by  Ontological Background Knowledge\nDistributional semantics beyond words: Supervised learning of analogy  and paraphrase\nThe optimality of attaching unlinked labels to unlinked meanings\nMonadic second-order definable graph orderings\nMetric axioms: a structural study\nLazy Probabilistic Model Checking without Determinisation\nThe Bose-Hubbard model is QMA-complete\nAn NMF solution for the Flowgraphs case at the TTC 2013\nAn NMF solution for the Petri Nets to State Charts case study at the TTC  2013\ndS/CFT at uniform energy density and a de Sitter \"bluewall\"\nBuilding An Information System for a Distributed Testbed\nAlgorithmic Diversity for Software Security\nAutomatic continuity for isometry groups\nOn the Proxy Identity Crisis\nVery general monomial valuations of $\\mathbb{P}^2$ and a Nagata type  conjecture\nCan recursive neural tensor networks learn logical reasoning?\nWhich abelian tensor categories are geometric?\nSpeech Recognition Front End Without Information Loss\nTesting for Synchronization\nOptimization Of Cross Domain Sentiment Analysis Using Sentiwordnet\nContent Modeling Using Latent Permutations\nFirst-Order Stable Model Semantics and First-Order Loop Formulas\nReal solutions of a problem in enumerative geometry\nRibbon graphs and bialgebra of Lagrangian subspaces\nAutomatic Aggregation by Joint Modeling of Aspects and Values\nChasing diagrams in cryptography\nAdding modular predicates to first-order fragments\nConsciousness results when 
communication modifies the form of  self-estimated fitness\nService-Fingerprinting mittels Fuzzing\nSingular cohomology from supersymmetric field theories\nLearning Soft Linear Constraints with Application to Citation Field  Extraction\nPrivacy Failures in Encrypted Messaging Services: Apple iMessage and  Beyond\nSubset Synchronization and Careful Synchronization of Binary Finite  Automata\nJSAI: Designing a Sound, Configurable, and Efficient Static Analyzer for  JavaScript\nMultiagent Conflict Resolution for a Specification Network of  Discrete-Event Coordinating Agents\nBracing Heterogeneous Distributed Systems via Built-in Frameworks\nTopologies of Stochastic Markov Models: Computational Aspects\nDeepWalk: Online Learning of Social Representations\nAutomatic Segmentation of Broadcast News Audio using Self Similarity  Matrix\nEmotion Analysis Platform on Chinese Microblog\nToward Synthesis of Network Updates\nTowards Verifying Safety Properties of Real-Time Probabilistic Systems\nConditional convex orders and measurable martingale couplings\nExtending the range of real time density matrix renormalization group  simulations\nOpen Question Answering with Weakly Supervised Embedding Models\nLinking Geographic Vocabularies through WordNet\nThe square of opposition in orthomodular logic\nLipschitz Robustness of Finite-state Transducers\nUniform Hyperbolicity for Szegő Cocycles and Applications to Random  CMV Matrices and the Ising Model\nThe Coulomb Branch Formula for Quiver Moduli Spaces\nTowards Verification of Constituent Systems through Automated Proof\nA sheaf-theoretic perspective on sampling\nKaggle LSHTC4 Winning Solution\nKR$^3$: An Architecture for Knowledge Representation and Reasoning in  Robotics\nOn the Local Theory of Billiards in Polygons\nStructure and modeling of the network of two-Chinese-character compound  words in the Japanese language\nTabling, Rational Terms, and Coinduction Finally Together!\nQuery Rewriting and Optimization for Ontological 
Databases\nUnderstanding the Complexity of Lifted Inference and Asymmetric Weighted  Model Counting\nA Well-Founded Semantics for FOL-Programs\nMinimum Model Semantics for Extensional Higher-order Logic Programming  with Negation\nAnalysis and Transformation Tools for Constrained Horn Clause  Verification\nPengines: Web Logic Programming Made Easy\nHigher Dimensional Modal Logic\nThe Impact of Disjunction on Reasoning under Existential Rules: Research  Summary\nToeplitz operators defined by sesquilinear forms: Fock space case\nProbability Logic for Harsanyi Type Spaces\nConstraints on Natural Supersymmetry from Electroweak Precision Tests\nHybrid Type-Logical Grammars, First-Order Linear Logic and the  Descriptive Inadequacy of Lambda Grammars\nOn the design of an expert help system for computer algebra systems\nUsing Local Alignments for Relation Recognition\nOn structural completeness vs almost structural completeness problem: A  discriminator varieties case study\nPaPy: Parallel and Distributed Data-processing Pipelines in Python\nSliced Slices: Separating Data and Control Influences\nA Latent Space Analysis of Editor Lifecycles in Wikipedia\nGraph- versus Vector-Based Analysis of a Consensus Protocol\nBetween quantum logic and concurrency\nSimLex-999: Evaluating Semantic Models with (Genuine) Similarity  Estimation\nQuantum Mechanics: Harbinger of a Non-Commutative Probability Theory?\nProactive Quality Guidance for Model Evolution in Model Libraries\nDeterministic Timed Finite State Machines: Equivalence Checking and  Expressive Power\nThe proposal of a novel software testing framework\nChiral algebras of class S\nTeaching Parallel Programming Using Java\nT^σ_ρ(G) Theories and Their Hilbert Series\nDelocalization of boundary states in disordered topological insulators\nAn exact mapping between the Variational Renormalization Group and Deep  Learning\nMemory Networks\nConvolution, Separation and Concurrency\nType Targeted Testing\nA stronger null hypothesis for 
crossing dependencies\nAmplitudes, Form Factors and the Dilatation Operator in $\\mathcal{N}=4$  SYM Theory\nA Framework for On-Line Devanagari Handwritten Character Recognition\nSemi-Automatic Construction of a Domain Ontology for Wind Energy Using  Wikipedia Articles\nObservers and Splitting Structures in Relativistic Electrodynamics\nConditional Random Field Autoencoders for Unsupervised Structured  Prediction\nA Hybrid Recurrent Neural Network For Music Transcription\nNon-crossing dependencies: least effort, not grammar\nThe Classification of Dirac Homogeneous Spaces\nExploiting Correlations for Expensive Predicate Evaluation\nGeneralizing the Liveness Based Points-to Analysis\nElectrical properties of polar membranes\nLIVEIA: A Light-based Immersive Visualization Environment for  Imaginative Actualization\nMinimum Probabilistic Finite State Learning Problem on Finite Data Sets:  Complexity, Solution and Approximations\nA study of the interface usability issues of mobile learning  applications for smart phones from the users perspective\nLifting Term Rewriting Derivations in Constructor Systems by Using  Generators\nA Dataset for Movie Description\nOnline Handwritten Devanagari Stroke Recognition Using Extended  Directional Features\nQuantifying Prosodic Variability in Middle English Alliterative Poetry\nPrivacy by Design: On the Conformance Between Protocols and  Architectures\nSpatial Interpolants\nLearning Invariants using Decision Trees\nStatic Analysis of File-Processing Programs using File Format  Specifications\nA Note on the Uniform Kan Condition in Nominal Cubical Sets\nWise Computing: Towards Endowing System Development with True Wisdom\nExploring Models and Data for Image Question Answering\nImproved Relation Extraction with Feature-Rich Compositional Embedding  Models\nDistant Supervision for Entity Linking\nLocation Prediction of Social Images via Generative Model\nRNAiFold 2.0: A web server and software to design custom and Rfam-based  RNA 
molecules\nDescent via Tannaka duality\nLocal Integrand Representations of All Two-Loop Amplitudes in Planar SYM\nParameterized Linear Temporal Logics Meet Costs: Still not Costlier than  LTL (full version)\nModel categorical Koszul-Tate resolution for algebras over differential  operators\nEffective thermodynamics of isolated entangled squeezed and coherent  states\nUnbiased Monte Carlo for the age of tensor networks\nUnfolding-based Partial Order Reduction\nDescribing Multimedia Content using Attention-based Encoder--Decoder  Networks\nLearning to Mine Chinese Coordinate Terms Using the Web\nIdempotents in intensional type theory\nA Lower Bound on Supporting Predecessor Search in $k$ sorted Arrays\nKnapsack and subset sum problems in nilpotent, polycyclic, and  co-context-free groups\nHow to Generate a Good Word Embedding?\nCreating an Artificial World with a New Kind of Cellular Automata\nPractical Selection of SVM Supervised Parameters with Different Feature  Representations for Vowel Recognition\nRobust speech recognition using consensus function based on multi-layer  networks\nRevisiting the combinatorics of the 2D Ising model\nCRISNER: A Practically Efficient Reasoner for Qualitative Preferences\nImporting SMT and Connection proofs as expansion trees\nBatch Normalized Recurrent Neural Networks\nAutomata, reduced words, and Garside shadows in Coxeter groups\nAlgebraic and Logical Methods in Quantum Computation\nMapping Unseen Words to Task-Trained Embedding Spaces\nTowards Meaningful Maps of Polish Case Law\nLearning multi-faceted representations of individuals from heterogeneous  evidence using neural networks\nA 'Gibbs-Newton' Technique for Enhanced Inference of Multivariate Polya  Parameters and Topic Models\nAttention with Intention for a Neural Network Conversation Model\nModular Responsive Web Design using Element Queries\nTrain and Test Tightness of LP Relaxations in Structured Prediction\nIntegrating a large-scale testing campaign in the CK 
framework\nA disembodied developmental robotic agent called Samu Bátfai\nUSFD: Twitter NER with Drift Compensation and Linked Data\nStückelberg Formulation of Holography\nOptimised determinisation and completion of finite tree automata\nContext-Free Commutative Grammars with Integer Counters and Resets\nTransforming Platform-Independent to Platform-Specific Component and  Connector Software Architecture Models\nImage Question Answering using Convolutional Neural Network with Dynamic  Parameter Prediction\nCombining Neural Networks and Log-linear Models to Improve Relation  Extraction\nAsymmetrically Weighted CCA And Hierarchical Kernel Sentence Embedding  For Image & Text Retrieval\nQuantum Walks with Gremlin\nMapping Images to Sentiment Adjective Noun Pairs with Factorized Neural  Nets\nThe Automatic Statistician: A Relational Perspective\nQuantifier Alternation for Infinite Words\nMachine Learning Sentiment Prediction based on Hybrid Document  Representation\nProof Relevant Corecursive Resolution\nNeural Attention Models for Sequence Classification: Analysis and  Application to Key Term Extraction and Dialogue Act Detection\nCorruption and Wealth: Unveiling a national prosperity syndrome in  Europe\nRefactoring Delta-Oriented Product Lines to achieve Monotonicity\nStructural Multi-type Sequent Calculus for Inquisitive Logic\nSpace-Efficient Latent Contracts\nSymNet: scalable symbolic execution for modern networks\nRelative Coobservability in Decentralized Supervisory Control of  Discrete-Event Systems\nVideo Description using Bidirectional Recurrent Neural Networks\nMulti-Oriented Text Detection with Fully Convolutional Networks\nA parallel repetition theorem for all entangled games\nStalemateBreaker: A Proactive Content-Introducing Approach to Automatic  Human-Computer Conversation\nProof-relevant $π$-calculus: a constructive account of concurrency and  causality\nSentence-Level Grammatical Error Identification as Sequence-to-Sequence  Correction\nSSP: Semantic 
Space Projection for Knowledge Graph Embedding with Text  Descriptions\nIts All on the Square- The Importance of the Sum of Squares and Making  the General Linear Model Simple\nCygrid: A fast Cython-powered convolution-based gridding module for  Python\nWhy and How to Pay Different Attention to Phrase Alignments of Different  Intensities\nVisualization of Jacques Lacan's Registers of the Psychoanalytic Field,  and Discovery of Metaphor and of Metonymy. Analytical Case Study of Edgar  Allan Poe's \"The Purloined Letter\"\nTermination Analysis of Probabilistic Programs through  Positivstellensatz's\nTeaching Data Science\nEntities as topic labels: Improving topic interpretability and  evaluability combining Entity Linking and Labeled LDA\nThe Power of Arc Consistency for CSPs Defined by Partially-Ordered  Forbidden Patterns\nSpace-Efficient Error Reduction for Unitary Quantum Computations\nAnalyzing Timed Systems Using Tree Automata\nDistance Metric Learning for Aspect Phrase Grouping\nKeyphrase Extraction using Sequential Labeling\nA Physical Metaphor to Study Semantic Drift\nSolving General Arithmetic Word Problems\nScattering amplitudes over finite fields and multivariate functional  reconstruction\nMulti-task Domain Adaptation for Sequence Tagging\nCoends and the tensor product of $\\mathcal{C}$-modules\nSelf-Similarity Breeds Resilience\nPractical optimal experiment design with probabilistic programs\nSlangSD: Building and Using a Sentiment Dictionary of Slang Words for  Short-Text Sentiment Classification\nScaling Bounded Model Checking By Transforming Programs With Arrays\nSymbolic Abstract Contract Synthesis in a Rewriting Framework\nSocial Networks Analysis in Discovering the Narrative Structure of  Literary Fiction\nComprehensive online Atomic Database Management System (DBMS) with  Highly Qualified Computing Capabilities\nA statistical learning algorithm for word segmentation\nFeature-Aware Verification\nIVOA Recommendation: IVOA Registry Interfaces 
Version 1.0\nProduct Review Summarization based on Facet Identification and Sentence  Clustering\nCompressed Membership for NFA (DFA) with Compressed Labels is in NP (P)\nComplexity of random smooth functions on the high-dimensional sphere\nImproved Maximum Entropy Analysis with an Extended Search Space\nThermal right-handed neutrino production rate in the non-relativistic  regime\nDegree of arbitrariness directly from moving frames\nFacing Complexity: Prediction vs. Adaptation\nA note on $\\aleph_α$-saturated o-minimal expansions of real  closed fields\nComputing Datalog Rewritings beyond Horn Ontologies\nInterview with Warren Wiscombe on scientific programing and his  contributions to atmospheric science tool making\nImprecise Meanings as a Cause of Uncertainty in Medical Knowledge-Based  Systems\nMachine Learning, Clustering, and Polymorphy\nAlgebraic semantics for a modal logic close to S1\nAn Improving Method for Loop Unrolling\nPattern Language for Good Old Future From Japanese Culture\nFirst experiences with the Intel MIC architecture at LRZ\nTowards an Abstract Domain for Resource Analysis of Logic Programs Using  Sized Types\nCounter-Strategy Guided Refinement of GR(1) Temporal Logic  Specifications\nAcyclic, connected and tree sets\nAutomatic Labeling for Entity Extraction in Cyber Security\nApproximated Symbolic Computations over Hybrid Automata\nCrowdsourcing a Word-Emotion Association Lexicon\nDyck path triangulations and extendability\nVHDL Modeling of Intrusion Detection & Prevention System (IDPS) A Neural  Network Approach\nModelling the Lexicon in Unsupervised Part of Speech Induction\nSynthesizing Finite-state Protocols from Scenarios and Requirements\nIntegrated STEM in Elementary Grades Using Distributed Agent-based  Computation\nSynthetic Data and Artificial Neural Networks for Natural Scene Text  Recognition\nA machine-compiled macroevolutionary history of Phanerozoic life\nSession Types for Broadcasting\nOn Reachability for Unidirectional 
Channel Systems Extended with Regular  Tests\nParametric Strategy Iteration\nLate-time Structure of the Bunch-Davies De Sitter Wavefunction\nVideoSET: Video Summary Evaluation through Text\nScalable Topical Phrase Mining from Text Corpora\nA Model-Based Approach to Impact Analysis Using Model Differencing\nVerifying Procedural Programs via Constrained Rewriting Induction\nOn the Properties of Neural Machine Translation: Encoder-Decoder  Approaches\nIntroducing Molly: Distributed Memory Parallelization with LLVM\nVariability Modeling for Customizable SaaS Applications\nTowards a Family-based Analysis of Applicability Conditions in  Architectural Delta Models\nAbility of stabilizer quantum error correction to protect itself from  its own imperfection\nCollaborative Deep Learning for Recommender Systems\nLTL Parameter Synthesis of Parametric Timed Automata\nA Binary Schema and Computational Algorithms to Process Vowel-based  Euphonic Conjunctions for Word Searches\nVoting for Deceptive Opinion Spam Detection\nMatrix-Product-State Algorithm for Finite Fractional Quantum Hall  Systems\nAnalyzing Conflict Freedom For Multi-threaded Programs With Time  Annotations\nIPMACC: Open Source OpenACC to CUDA/OpenCL Translator\nSynthesizing Modular Invariants for Synchronous Code\nSpeeding up bootstrap computations: a vectorized implementation for  statistics based on sample moments\nPaREM: A Novel Approach for Parallel Regular Expression Matching\nContext-Dependent Fine-Grained Entity Type Tagging\nAlternative statistical methods for cytogenetic radiation biological  dosimetry\nScore Function Features for Discriminative Learning: Matrix and Tensor  Framework\nOn the Formal Semantics of the Cognitive Middleware AWDRAT\nElliptic multiple zeta values and one-loop superstring amplitudes\nN-gram-Based Low-Dimensional Representation for Document Classification\nScore Function Features for Discriminative Learning\nImproving zero-shot learning by mitigating the hubness problem\nMonads 
need not be endofunctors\nToward Refactoring of DMARF and GIPSY Case Studies -- a Team 10  SOEN6471-S14 Project Report\nMultivariate-from-Univariate MCMC Sampler: R Package MfUSampler\nRelation between firing statistics of spiking neuron with instantaneous  feedback and without feedback\nHolographic Thermal Relaxation in Superfluid Turbulence\nCombining k-Induction with Continuously-Refined Invariants\nImplicitization of rational hypersurfaces via linear syzygies: a  practical overview\nOrdering-sensitive and Semantic-aware Topic Modeling\nCompressed Tree Canonization\nF0 Modeling In Hmm-Based Speech Synthesis System Using Deep Belief  Network\nScalable Bayesian Optimization Using Deep Neural Networks\nEvaluating QoS Parameters for IPTV Distribution in Heterogeneous  Networks\nUnified vector space mapping for knowledge representation systems\nAuthor Name Disambiguation by Using Deep Neural Network\nThe NLP Engine: A Universal Turing Machine for NLP\nHilbert-Post completeness for the state and the exception effects\nCombining Probabilistic, Causal, and Normative Reasoning in CP-logic\nBethe Projections for Non-Local Inference\nWhat's Cookin'? 
Interpreting Cooking Videos using Text, Speech and  Vision\nMaximum a Posteriori Adaptation of Network Parameters in Deep Models\nStructured Prediction of Sequences and Trees using Infinite Contexts\nTimed pushdown automata revisited\nFractional charge and spin errors in self-consistent Green's function  theory\nVariability Abstractions: Trading Precision for Speed in Family-Based  Analyses (Extended Version)\nGeomRDF: A Geodata Converter with a Fine-Grained Structured  Representation of Geometry in the Web\nText Segmentation based on Semantic Word Embeddings\nOne-dimensional F-definable sets in F((t))\nGoldstone Inflation\nA Secure Intelligent Decision Support System for Prescribing Medication\nRelating tensor structures on representations of general linear and  symmetric groups\nTransferring Knowledge from a RNN to a DNN\nIdentifying seasonal stars in Kaurna astronomical traditions\nLearning about probabilistic inference and forecasting by playing with  multivariate normal distributions\nVerification of Generalized Inconsistency-Aware Knowledge and Action  Bases (Extended Version)\nRefining Existential Properties in Separation Logic Analyses\nThermodynamics of stochastic Turing machines\nVisualizing and Understanding Neural Models in NLP\nMinimal theory of massive gravity\nAbstractive Multi-Document Summarization via Phrase Selection and  Merging\nOMP2HMPP: Compiler Framework for Energy Performance Trade-off Analysis  of Automatically Generated Codes\nEmotion Analysis of Songs Based on Lyrical and Audio Features\nAnomalous Dimensions of Heavy Operators from Magnon Energies\nBRST symmetry for Regge-Teitelboim based minisuperspace models\nMulti-weighted Automata and MSO Logic\nBayesian linear mixed models using Stan: A tutorial for psychologists,  linguists, and cognitive scientists\nMulti-domain Dialog State Tracking using Recurrent Neural Networks\nSemantical conditions for the definability of functions and relations\nMixed Logical and Probabilistic Reasoning for 
Planning and Explanation  Generation in Robotics\nSignificance of Maximum Spectral Amplitude in Sub-bands for Spectral  Envelope Estimation and Its Application to Statistical Parametric Speech  Synthesis\nLearning to Discover Key Moments in Social Media Streams\nHardy's inequality for fractional powers of the sublaplacian on the  Heisenberg group\nMulti-Modal Bayesian Embeddings for Learning Social Knowledge Graphs\nStructured Prediction: From Gaussian Perturbations to Linear-Time  Principled Algorithms\nA Mood-based Genre Classification of Television Content\nApproximated solutions to Born-Infeld dynamics\nAn Automatic Machine Translation Evaluation Metric Based on Dependency  Parsing Model\nHyper-Fit: Fitting Linear Models to Multidimensional Data with  Multivariate Gaussian Uncertainties\nSyntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models  of Meaning\nRemoving Biases from Trainable MT Metrics by Using Self-Training\nA Game of Attribute Decomposition for Software Architecture Design\nReward Shaping with Recurrent Neural Networks for Speeding up On-Line  Policy Learning in Spoken Dialogue Systems\nMemetics and Neural Models of Conspiracy Theories\nFast, Flexible Models for Discovering Topic Correlation across  Weakly-Related Collections\nDemography-based adaptive network model reproduces the spatial  organization of human linguistic groups\nVisualizing NLP annotations for Crowdsourcing\nFormal conjugacy growth in acylindrically hyperbolic groups\nMorphisms, Symbolic sequences, and their Standard Forms\nTake and Took, Gaggle and Goose, Book and Read: Evaluating the Utility  of Vector Differences for Lexical Relation Learning\nSampled Weighted Min-Hashing for Large-Scale Topic Mining\nImproved Twitter Sentiment Prediction through Cluster-then-Predict Model\nEquivalence of two Fixed-Point Semantics for Definitional Higher-Order  Logic Programs\nRelational reasoning via probabilistic coupling\nTwitter Sentiment Analysis\nHydrodynamics, resurgence 
and trans-asymptotics\nBlack hole entropy in the Chern-Simons-like theories of gravity and  Lorentz-diffeomorphism Noether charge\nLatency Analysis of an Aerial Video Tracking System Using Fiacre and  Tina\nCounterterms in Massive Gravity Theory\nHybrid architecture for satellite data processing workflow management\nAn encoding of array verification problems into array-free Horn clauses\nBeyond Aztec Castles: Toric Cascades in the $dP_3$ Quiver\nKlasifikasi Komponen Argumen Secara Otomatis pada Dokumen Teks berbentuk  Esai Argumentatif\nExplaining NonLinear Classification Decisions with Deep Taylor  Decomposition\nA weighted pair graph representation for reconstructibility of Boolean  control networks\nNeural Self Talk: Image Understanding via Continuous Questioning and  Answering\nJoint Image-Text News Topic Detection and Tracking with And-Or Graph  Representation\nA Bayesian approach to the g-formula\nABCNN: Attention-Based Convolutional Neural Network for Modeling  Sentence Pairs\nClosed Systems of Invertible Maps\nAutomatic Inference of Specifications in the K Framework\nOnline Keyword Spotting with a Character-Level Recurrent Neural Network\nAutomaton semigroups: new construction results and examples of  non-automaton semigroups\nExactly solvable spin chain models corresponding to BDI class of  topological superconductors\nEnvironmental Noise Embeddings for Robust Speech Recognition\nGeneralizing Prototype Theory: A Formal Quantum Framework\nNmag micromagnetic simulation tool - software engineering lessons  learned\nCertified Context-Free Parsing: A formalisation of Valiant's Algorithm  in Agda\nExtracting Keyword for Disambiguating Name Based on the Overlap  Principle\nSynthesizing a Lego Forklift Controller in GR(1): A Case Study\nPhysical Version of Singularity Resolution in the Observable Universe\nSequence Classification with Neural Conditional Random Fields\nSwivel: Improving Embeddings by Noticing What's Missing\nValue Iteration Networks\nSpoofing 
detection under noisy conditions: a preliminary investigation  and an initial database\nA Convolutional Attention Network for Extreme Summarization of Source  Code\nA wild model of linear arithmetic and discretely ordered modules\nFermat's Last Theorem and Catalan's Conjecture in Weak Exponential  Arithmetics\nAttentive Pooling Networks\nStochastic orders and the frog model\nPrompt Delay\nSimulation of Effective Subshifts by Two-dimensional Subshifts of Finite  Type\nEffect of quantified irreducibility on the computability of subshift  entropy\nA fine-grained approach to scene text script identification\nCan one quantum bit separate any pair of words with zero-error?\nCharacterizing Diseases from Unstructured Text: A Vocabulary Driven  Word2vec Approach\nMGNC-CNN: A Simple Approach to Exploiting Multiple Word Embeddings for  Sentence Classification\nVerified compilation of space-efficient reversible circuits\nTowards Automatic Learning of Heuristics for Mechanical Transformations  of Procedural Code\nCalibrar: an R package for fitting complex ecological models\nSieve-based Coreference Resolution in the Biomedical Domain\nA Computationally Efficient Framework for Automatic Inertial Sensor  Calibration\nTree-to-Sequence Attentional Neural Machine Translation\nHarnessing Deep Neural Networks with Logic Rules\nRecursive Neural Conditional Random Fields for Aspect-based Sentiment  Analysis\nOn the Compression of Recurrent Neural Networks with an Application to  LVCSR acoustic modeling for Embedded Speech Recognition\nWhat a Nerd! 
Beating Students and Vector Cosine in the ESL and TOEFL  Datasets\nAdaptive Computation Time for Recurrent Neural Networks\nRecurrent Batch Normalization\nUnsupervised Measure of Word Similarity: How to Outperform Co-occurrence  and Vector Cosine in VSMs\nHoming Vector Automata\nEnhancing Sentence Relation Modeling with Auxiliary Character-level  Embedding\nHigher Order Recurrent Neural Networks\nResponse Selection with Topic Clues for Retrieval-based Chatbots\nCommon-Description Learning: A Framework for Learning Algorithms and  Generating Subproblems from Few Examples\nShift-preserving maps on $ω^*$\nSynthesizing Probabilistic Invariants via Doob's Decomposition\nUnsupervised Semantic Action Discovery from Video Collections\nMovie Description\nEpistemic Extension of Godel Logic\nWell-Posed Models of Memristive Devices\nTweet Acts: A Speech Act Classifier for Twitter\nA Multi-Smartwatch System for Assessing Speech Characteristics of People  with Dysarthria in Group Settings\nOn-line Active Reward Learning for Policy Optimisation in Spoken  Dialogue Systems\nThe Symbolic Interior Point Method\nStacking With Auxiliary Features\nDoes Multimodality Help Human and Machine for Translation and Image  Captioning?\nHierarchical Question-Image Co-Attention for Visual Question Answering\nNeural Network Translation Models for Grammatical Error Correction\nStorytelling of Photo Stream with Bidirectional Multi-thread Recurrent  Neural Network\nSCJ-Circus: a refinement-oriented formal notation for Safety-Critical  Java\nSupervised Syntax-based Alignment between English Sentences and Abstract  Meaning Representation Graphs\nModel Checking Flat Freeze LTL on One-Counter Automata\nUnderstanding User Instructions by Utilizing Open Knowledge for Service  Robots\nLinguistic Input Features Improve Neural Machine Translation\nScattering with partial information\nHuman Attention in Visual Question Answering: Do Humans and Deep  Networks Look at the Same Regions?\nA framework for 
detecting fraudulent activities in edo state tax  collection system using investigative data mining\nNeural Associative Memory for Dual-Sequence Modeling\nQSWalk: a Mathematica package for quantum stochastic walks on arbitrary  graphs\nSense Embedding Learning for Word Sense Induction\nImproving Agreement and Disagreement Identification in Online  Discussions with A Socially-Tuned Sentiment Lexicon\nFull-Time Supervision based Bidirectional RNN for Factoid Question  Answering\nOn the dependent conjunction and implication\nUncertainty in Neural Network Word Embedding: Exploration of Threshold  for Similarity\nDualNet: Domain-Invariant Network for Visual Question Answering\nComparing the hierarchy of keywords in on-line news portals\nUncalibrated 3D Room Reconstruction from Sound\nCommand injection attacks, continuations, and the Lambek calculus\nA Curriculum Learning Method for Improved Noise Robustness in Automatic  Speech Recognition\nInferring Logical Forms From Denotations\nSome comments on the reliability of NOAA's Storm Events Database\nReal-Time Synthesis is Hard!\nLeveraging Semantic Web Search and Browse Sessions for Multi-Turn Spoken  Dialog Systems\nLearning Concept Taxonomies from Multi-modal Data\nA Sequence-to-Sequence Model for User Simulation in Spoken Dialogue  Systems\nLower Bounds for Alternating Online State Complexity\nMoving Toward High Precision Dynamical Modelling in Hidden Markov Models\nRepresentation learning for very short texts using weighted word  embedding aggregation\nTemporal Topic Analysis with Endogenous and Exogenous Processes\nChains of Reasoning over Entities, Relations, and Text using Recurrent  Neural Networks\nGuided Alignment Training for Topic-Aware Neural Machine Translation\nEquivariant classification of $b^m$-symplectic surfaces and Nambu  structures\nJUDE: An Ultraviolet Imaging Telescope Pipeline\nA Maturity Model for Public Administration as Open Translation Data  Providers\nSCOR: Software-defined Constrained Optimal 
Routing Platform for SDN\nLTL-based Verification of Reconfigurable Workflows\nNon-elementary classes of representable posets\nLarge Alphabet Source Coding using Independent Component Analysis\nTopology of scrambled simplices\nA Compendium of Chameleon Constraints\nHairy Black Holes in a Box\nDevito: Towards a generic Finite Difference DSL using Symbolic Python\nDistant Supervision for Relation Extraction beyond the Sentence Boundary\nLong-Term Trends in the Public Perception of Artificial Intelligence\nA Dynamical Boundary for Anti-de Sitter Space\nNonparametric Bayesian Topic Modelling with the Hierarchical Pitman-Yor  Processes\nAnnotating Derivations: A New Evaluation Strategy and Dataset for  Algebra Word Problems\nA Framework for Algebraic Characterizations in Recursive Analysis\nOnline Segment to Segment Neural Transduction\nTopic Browsing for Research Papers with Hierarchical Latent Tree  Analysis\nLearning Sentence Representation with Guidance of Human Attention\nKnapsack problem for automaton groups\nSeismic collapse prediction of frame structures by means of genetic  algorithms\nDivide-and-Conquer based Ensemble to Spot Emotions in Speech using MFCC  and Random Forest\nNeural Structural Correspondence Learning for Domain Adaptation\nComputational linking theory\nQuantifying moral foundations from various topics on Twitter  conversations\nA Chain-Detection Algorithm for Two-Dimensional Grids\nSentiHood: Targeted Aspect Based Sentiment Analysis Dataset for Urban  Neighbourhoods\nAutomatic sequences, generalised polynomials, and nilmanifolds\nReasoning in the Bernays-Schoenfinkel-Ramsey Fragment of Separation  Logic\nStylometric Analysis of Early Modern Period English Plays\nFrom Interacting Particles to Equilibrium Statistical Ensembles\nFrom Event-B to Verified C via HLL\nStill not there? 
Comparing Traditional Sequence-to-Sequence Models to  Encoder-Decoder Neural Networks on Monotone String Translation Tasks\nTracing where IoT data are collected and aggregated\nWord Embeddings to Enhance Twitter Gang Member Profile Identification\nEx Machina: Personal Attacks Seen at Scale\nMatrix Semigroup Freeness Problems in $\\mathrm{SL}(2,\\mathbb{Z})$\nEnd-to-End Answer Chunk Extraction and Ranking for Reading Comprehension\nThe Deep Journey from Content to Collaborative Filtering\nCompositional Reasoning for Shared-variable Concurrent Programs\nHPVM: A Portable Virtual Instruction Set for Heterogeneous Parallel  Systems\nMechanically Proving Determinacy of Hierarchical Block Diagram  Translations\nWords or Characters? Fine-grained Gating for Reading Comprehension\nNeural Machine Translation with Reconstruction\nDeepCoder: Learning to Write Programs\nKeyphrase Annotation with Graph Co-Ranking\nThe Neural Noisy Channel\nPolicy-Compliant Path Diversity and Bisection Bandwidth\nSemantics of Information\nBinomial Checkpointing for Arbitrary Programs with No User Annotation\nEvolving the Incremental λ Calculus into a Model of Forward  Automatic Differentiation (AD)\nNeural Networks Models for Entity Discovery and Linking\nLinguistically Regularized LSTMs for Sentiment Classification\nKnowledge Enhanced Hybrid Neural Network for Text Matching\nSimDoc: Topic Sequence Alignment based Document Similarity Framework\nZero-Shot Visual Question Answering\nFast Non-Parametric Tests of Relative Dependency and Similarity\nUsing SyGuS to Synthesize Reactive Motion Plans\nAdaptive Feature Abstraction for Translating Video to Text\nOn Sub-Propositional Fragments of Modal Logic\nNonabelian Higgs models: paving the way for asymptotic freedom\nImproving Multi-Document Summarization via Text Classification\nDynamic Reductions for Model Checking Concurrent Software\nOn the Complexity of the Word Problem for Automaton Semigroups and  Automaton Groups\nDialogue Learning With 
Human-In-The-Loop\nNewsQA: A Machine Comprehension Dataset\nComputer Assisted Composition with Recurrent Neural Networks\nOn Coreferring Text-extracted Event Descriptions with the aid of  Ontological Reasoning\nDesign Automation and Design Space Exploration for Quantum Computers\nUsing Discourse Signals for Robust Instructor Intervention Prediction\nCER: Complementary Entity Recognition via Knowledge Expansion on Large  Unlabeled Product Reviews\nBilayer Linearized Tensor Renormalization Group Approach for Thermal  Tensor Networks\nComparison of max-plus automata and joint spectral radius of tropical  matrices\nLecture Notes on Mathematical Methods of Classical Physics\nBuilding Large Machine Reading-Comprehension Datasets using Paragraph  Vectors\nYou Are What You Eat... Listen to, Watch, and Read\nSemi-Supervised Phone Classification using Deep Neural Networks and  Stochastic Graph-Based Entropic Regularization\nA Multilinear Tongue Model Derived from Speech Related MRI Data of the  Human Vocal Tract\nComplementary mode analyses between sub- and super-diffusions\nAFLUX: The LUX materials search API for the AFLOW data repositories\nComplete reducibility of subgroups of reductive algebraic groups over  nonperfect fields 3\nNeural Multi-Source Morphological Reinflection\nTemporal Tessellation: A Unified Approach for Video Analysis\nTop-down Visual Saliency Guided by Captions\nHandwriting recognition using Cohort of LSTM and lexicon verification  with extremely large lexicon\nJointly Extracting Relations with Class Ties via Effective Deep Ranking\nAccelerated Convolutions for Efficient Multi-Scale Time to Contact  Computation in Julia\nQuantum corrections for the cubic Galileon in the covariant language\nThe Star Product in Interacting Quantum Field Theory\nSynthesis of Tongue Motion and Acoustics from Text using a Multimodal  Articulatory Database\nEinstein-Podolsky-Rosen Paradox in Quantum Diagrams\nShortcut Sequence Tagging\nA Topological Perspective on 
Interacting Algebraic Theories\nReplication issues in syntax-based aspect extraction for opinion mining\nTask-Specific Attentive Pooling of Phrase Alignments Contributes to  Sentence Matching\nGames with Costs and Delays\nTowards Decoding as Continuous Optimization in Neural Machine  Translation\nLie-Butcher series, Geometry, Algebra and Computation\nA Copy-Augmented Sequence-to-Sequence Architecture Gives Good  Performance on Task-Oriented Dialogue\nDegree of sequentiality of weighted automata\nFaster Algorithms for Weighted Recursive State Machines\nBPS Algebras, Genus Zero, and the Heterotic Monster\nA Joint Framework for Argumentative Text Analysis Incorporating Domain  Knowledge\nFrom LTL and Limit-Deterministic Büchi Automata to Deterministic  Parity Automata\nLearn&Fuzz: Machine Learning for Input Fuzzing\nDeep Reinforcement Learning: An Overview\nThe Python-based Simulations of Chemistry Framework (PySCF)\nAuto-Documenation for Software Development\nAn Intermediate Level of Abstraction for Computational Systems Chemistry\nOpinion Recommendation using Neural Memory Model\nA Hybrid Approach For Hindi-English Machine Translation\nMulti-task memory networks for category-specific aspect and opinion  terms co-extraction\nProving linearizability using forward simulations\nQuantitative aspects of linear and affine closed lambda terms\nA set-theoretical approach for ABox reasoning services (Extended  Version)\nAutomated Identification of Drug-Drug Interactions in Pediatric  Congestive Heart Failure Patients\nOn expansions of $(\\mathbf{Z},+,0)$\nsoc2seq: Social Embedding meets Conversation Model\nSynthesizing Imperative Programs from Examples Guided by Static Analysis\nTask-driven Visual Saliency and Attention-based Visual Question  Answering\nDistributed Representation of Subgraphs\nCausal Discovery Using Proxy Variables\nUnsupervised Sequence Classification using Sequential Output Statistics\nThe Hardness of Solving Simple Word Equations\nSynchronization Problems 
in Automata without Non-trivial Cycles\nScaffolding Networks: Incremental Learning and Teaching Through  Questioning\nCharged String Tensor Networks\nSound-Word2Vec: Learning Word Representations Grounded in Sounds\nQT2S: A System for Monitoring Road Traffic via Fine Grounding of Tweets\nDRAGNN: A Transition-based Framework for Dynamically Connected Neural  Networks\nA New Proof of the Nešetřil-Rödl Theorem\nWearing Many (Social) Hats: How Different are Your Different Social  Network Personae?\nGreen's Relations in Finite Transformation Semigroups\nOn the Importance of Super-Gaussian Speech Priors for Machine-Learning  Based Speech Enhancement\nA New Unbiased and Efficient Class of LSH-Based Samplers and Estimators  for Partition Function Computation in Log-Linear Models\nEnd-to-end optimization of goal-driven and visually grounded dialogue  systems\nTokTrack: A Complete Token Provenance and Change Tracking Dataset for  the English Wikipedia\nSupervisor Synthesis of POMDP based on Automata Learning\nBootstrapping a Lexicon for Emotional Arousal in Software Engineering\nIs This a Joke? 
Detecting Humor in Spanish Tweets\nLearning Similarity Functions for Pronunciation Variations\nAutomatic Argumentative-Zoning Using Word2vec\nBuilding a Neural Machine Translation System Using Only Synthetic  Parallel Data\nCombining Lexical and Syntactic Features for Detecting Content-dense  Texts in News\nOntology based Scene Creation for the Development of Automated Vehicles\nCharacter-based Joint Segmentation and POS Tagging for Chinese using  Bidirectional RNN-CRF\nWeakly Supervised Dense Video Captioning\nFostering User Engagement: Rhetorical Devices for Applause Generation  Learnt from TED Talks\nUnsupervised Event Abstraction using Pattern Abstraction and Local  Process Models\nReal-time On-Demand Crowd-powered Entity Extraction\nMobile Keyboard Input Decoding with Finite-State Transducers\nA Search for Improved Performance in Regular Expressions\nRoom for improvement in automatic image description: an error analysis\nHow Robust Are Character-Based Word Embeddings in Tagging and MT Against  Wrod Scramlbing or Randdm Nouse?\nRACE: Large-scale ReAding Comprehension Dataset From Examinations\nBenchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming  productivity, performance, and energy consumption\nAnswering Complex Questions Using Open Information Extraction\nCALF: Categorical Automata Learning Framework\nGlobal Relation Embedding for Relation Extraction\nNeural System Combination for Machine Translation\nMaking Neural Programming Architectures Generalize via Recursion\nSarcasm SIGN: Interpreting Sarcasm with Sentiment Based Monolingual  Machine Translation\nLearning Automata with Side-Effects\nImproved Algorithms for Computing the Cycle of Minimum Cost-to-Time  Ratio in Directed Graphs\nBusy Beaver Scores and Alphabet Size\nFinite-state Strategies in Delay Games (full version)\nNon-linear Associative-Commutative Many-to-One Pattern Matching with  Sequence Variables\nCoherent extension of partial automorphisms, free amalgamation, and  automorphism 
groups\nDeep Speaker: an End-to-End Neural Speaker Embedding System\nRamsey properties and extending partial automorphisms for classes of  finite structures\nLearning Distributed Representations of Texts and Entities from  Knowledge Base\nEmptiness Problems for Distributed Automata\nCHAM: action recognition using convolutional hierarchical attention  model\nDeepTingle\nGeneralized Jacobi identities and Jacobi elements of the group ring of  the symmetric group\nLearning with Noise: Enhance Distantly Supervised Relation Extraction  with Dynamic Transition Matrix\nA Deep Reinforced Model for Abstractive Summarization\nHigher-Order Constrained Horn Clauses and Refinement Types\nState Complexity of Reversals of Deterministic Finite Automata with  Output\nA Regularized Framework for Sparse and Structured Neural Attention\nOn-the-fly Operation Batching in Dynamic Computation Graphs\nDeep Voice 2: Multi-Speaker Neural Text-to-Speech\nAutomatic sequences and generalised polynomials\nBiomedical Event Trigger Identification Using Bidirectional Recurrent  Neural Network Based Models\nDetecting and Explaining Crisis\nLock-step simulation is child's play\nAn Automatic Contextual Analysis and Clustering Classifiers Ensemble  approach to Sentiment Analysis\nFinding Root Causes of Floating Point Error with Herbgrind\nNetSciEd: Network Science and Education for the Interconnected World\nDiscovering Discrete Latent Topics with Neural Variational Inference\nModeling Latent Attention Within Neural Networks\nJoint Modeling of Topics, Citations, and Topical Authority in Academic  Corpora\nJoint Text Embedding for Personalized Content-based Recommendation\nMeasuring Offensive Speech in Online Political Discourse\nMacquarie University at BioASQ 5b -- Query-based Summarisation  Techniques for Selecting the Ideal Answers\nImproving Semantic Relevance for Sequence-to-Sequence Learning of  Chinese Social Media Text Summarization\nAvoiding Discrimination through Causal Reasoning\nCollaborative 
Summarization of Topic-Related Videos\nTrimming and Improving Skip-thought Vectors\nNeural Domain Adaptation for Biomedical Question Answering\nFailure-Directed Program Trimming (Extended Version)\nEnsembling Factored Neural Machine Translation Models for Automatic  Post-Editing and Quality Estimation\nPlan, Attend, Generate: Character-level Neural Machine Translation with  Planning in the Decoder\nDetecting Large Concept Extensions for Conceptual Analysis\nSignal Machine And Cellular Automaton Time-Optimal Quasi-Solutions Of  The Firing Squad/Mob Synchronisation Problem On Connected Graphs\nImproving text classification with vectors of reduced precision\nGraph-based Neural Multi-Document Summarization\nA Study of Concurrency Bugs and Advanced Development Support for  Actor-based Programs\nPersonalization in Goal-Oriented Dialog\nMixing for suspension flows over skew-translations and time-changes of  quasi-abelian filiform nilflows\nA Steganographic Design Paradigm for General Steganographic Objectives\nAn Approach for Weakly-Supervised Deep Information Retrieval\nChecking Linearizability of Concurrent Priority Queues\nCausal Consistency of Structural Equation Models\nVisually Grounded Word Embeddings and Richer Visual Features for  Improving Multimodal Neural Machine Translation\nTensor-Train Recurrent Neural Networks for Video Classification\nSegal-type models of higher categories\nEvent Schema Induction using Tensor Factorization with Back-off\nInferSpark: Statistical Inference at Scale\nGeometrization of the Real Number System\nA Brief Survey of Text Mining: Classification, Clustering and Extraction  Techniques\nThe Intentional Unintentional Agent: Learning to Solve Many Continuous  Control Tasks Simultaneously\nLoop Representation of Wigner's Little Groups\nIs writing style predictive of scientific fraud?\nOn Repair with Probabilistic Attribute Grammars\nThe Reach-Avoid Problem for Constant-Rate Multi-Mode Systems\nLearning Features from Co-occurrences: A 
Theoretical Analysis\nParsing with Traces: An $O(n^4)$ Algorithm and a Structural  Representation\nBayesian Optimization for Probabilistic Programs\nDeveloping a concept-level knowledge base for sentiment analysis in  Singlish\nCross-genre Document Retrieval: Matching between Conversational and  Formal Writings\nMPIgnite: An MPI-Like Language and Prototype Implementation for Apache  Spark\nVisual Question Answering with Memory-Augmented Networks\ngraph2vec: Learning Distributed Representations of Graphs\nAsk Me Anything: A Conversational Interface to Augment Information  Security Workers\nSolving the social choice problem under equality constraints\nAn Error-Oriented Approach to Word Embedding Pre-Training\nA Sentiment-and-Semantics-Based Approach for Emotion Detection in  Textual Conversations\nA Sequential Model for Classifying Temporal Relations between  Intra-Sentence Events\nAnalysing Errors of Open Information Extraction Systems\nEvaluation of Semantic Web Technologies for Storing Computable  Definitions of Electronic Health Records Phenotyping Algorithms\nIntegrating Lexical and Temporal Signals in Neural Ranking Models for  Searching Social Media Streams\nA calibration method for estimating critical cavitation loads from below  in 3D nonlinear elasticity\nKnotted solutions, from electromagnetism to fluid dynamics\nProcess Description, Behavior, and Control\nAdapting Sequence Models for Sentence Correction\nReporting Score Distributions Makes a Difference: Performance Study of  LSTM-networks for Sequence Tagging\nRetrofitting Distributional Embeddings to Knowledge Graphs with  Functional Relations\nDistinct Squares in Circular Words\nConstraint metric approximations and equations in groups\nDynamic Data Selection for Neural Machine Translation\nGraph-based Features for Automatic Online Abuse Detection\nNeural Machine Translation with Word Predictions\nISS-MULT: Intelligent Sample Selection for Multi-Task Learning in  Question Answering\nTandemNet: Distilling 
Knowledge from Medical Images Using Diagnostic  Reports as Optional Semantic References\nWhat matters in a transferable neural network model for relation  classification in the biomedical domain?\nGold Standard Online Debates Summaries and First Experiments Towards  Automatic Summarization of Online Debate Data\nSequence-to-Label Script Identification for Multilingual OCR\nA Generalised Directional Laplacian Distribution: Estimation, Mixture  Models and Audio Source Separation\nIncorporating Copying Mechanism in Image Captioning for Learning Novel  Objects\nConstructing Words with High Distinct Square Densities\nClassification of Radiology Reports Using Neural Attention Models\nTowards an Automatic Turing Test: Learning to Evaluate Dialogue  Responses\nCloudScan - A configuration-free invoice analysis system using recurrent  neural networks\nSupervised Speech Separation Based on Deep Learning: An Overview\nAutomated adjoints of coupled PDE-ODE systems\nUsing Optimal Ratio Mask as Training Target for Supervised Speech  Separation\nAbstractness, specificity, and complexity in software design\nLTL to Deterministic Emerson-Lei Automata\nAppTechMiner: Mining Applications and Techniques from Scientific  Articles\nOn Uniquely Closable and Uniquely Typable Skeletons of Lambda Terms\nOn the decidability of the existence of polyhedral invariants in  transition systems\nAnalyzing Hidden Representations in End-to-End Automatic Speech  Recognition Systems\nExtending Coinductive Logic Programming with Co-Facts\nErlang Code Evolution Control\nSynthesizing Coupling Proofs of Differential Privacy\n\"How May I Help You?\": Modeling Twitter Customer Service Conversations  Using Fine-Grained Dialogue Acts\nOrder-Preserving Abstractive Summarization for Spoken Content Based on  Connectionist Temporal Classification\nParaphrasing verbal metonymy through computational methods\nMetaLDA: a Topic Model that Efficiently Incorporates Meta information\nWhy PairDiff works? 
-- A Mathematical Analysis of Bilinear Relational  Compositional Operators for Analogy Detection\nAn Improvement on LSB Matching and LSB Matching Revisited Steganography  Methods\nNeural Optimizer Search with Reinforcement Learning\nThe holographic dual of the Penrose transform\n\"Let me convince you to buy my product ... \": A Case Study of an  Automated Persuasive System for Fashion Products\nIdentifying Restaurant Features via Sentiment Analysis on Yelp Reviews\nRegion-Based Image Retrieval Revisited\nPredicting Disease-Gene Associations using Cross-Document Graph-based  Features\nKnapsack Problems for Wreath Products\nPointless Continuous Spatial Surface Reconstruction\nThread-Modular Static Analysis for Relaxed Memory Models\nTraining an adaptive dialogue policy for interactive learning of  visually grounded word meanings\nFully Automated Fact Checking Using External Sources\nCrySL: Validating Correct Usage of Cryptographic APIs\nDistributional Inclusion Vector Embedding for Unsupervised Hypernymy  Detection\nDimReader: Using auto-differentiation to explain non-linear projections\nBorel functors, interpretations, and strong conceptual completeness for  $\\mathcal L_{ω_1ω}$\nThe DIRHA-English corpus and related tasks for distant-speech  recognition in domestic environments\nClickbait detection using word embeddings\nA Language Hierarchy and Kitchens-Type Theorem for Self-Similar Groups\nProgrammable and scalable radio-frequency pulse sequence generator for  multi-qubit quantum information experiments\nDisSent: Sentence Representation Learning from Explicit Discourse  Relations\nAverage Stack Cost of Buechi Pushdown Automata\nDark Energy after GW170817 and GRB170817A\nRobust Hyperproperty Preservation for Secure Compilation (Extended  Abstract)\nContent Based Document Recommender using Deep Learning\nAn MCMC Algorithm for Estimating the Reduced RUM\nROS and Buzz: consensus-based behaviors for heterogeneous teams\nTrace norm regularization and faster inference 
for embedded speech  recognition RNNs\nMinimal Synthesis of String To String Functions From Examples\nSocialbots supporting human rights\nIterations of Multifunctions for Graph Theory: Bipartite Graphs and  Filters\nRefounding legitimacy towards Aethogenesis\nMulti-label Dataless Text Classification with Topic Modeling\nA Theory of Slicing for Probabilistic Control-Flow Graphs\nSelf-referential basis of undecidable dynamics: from The Liar Paradox  and The Halting Problem to The Edge of Chaos\nExtractive Multi-document Summarization Using Multilayer Networks\nNeural Variational Inference and Learning in Undirected Graphical Models\nAn Empirical Analysis of Multiple-Turn Reasoning Strategies in Reading  Comprehension Tasks\nDocument Context Neural Machine Translation with Memory Networks\nYEDDA: A Lightweight Collaborative Text Span Annotation Tool\nFine Grained Knowledge Transfer for Personalized Task-oriented Dialogue  Systems\nQuickEdit: Editing Text & Translations by Crossing Words Out\nDynamic Fusion Networks for Machine Reading Comprehension\nDuReader: a Chinese Machine Reading Comprehension Dataset from  Real-world Applications\nHuman and Machine Speaker Recognition Based on Short Trivial Events\nDeep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time\nCrowdsourcing Question-Answer Meaning Representations\nAn Abstractive approach to Question Answering\nSpeech Dereverberation with Context-aware Recurrent Neural Networks\nAbstract Interpretation of Binary Code with Memory Accesses using  Polyhedra\nFusionNet: Fusing via Fully-Aware Attention with Application to Machine  Comprehension\nProgram Synthesis using Conflict-Driven Learning\nRelational Symbolic Execution\nCustomized Nonlinear Bandits for Online Response Selection in Neural  Conversation Models\nSPINE: SParse Interpretable Neural Embeddings\nContinuous Semantic Topic Embedding Model Using Variational Autoencoder\nA Formal Specification Framework for Smart Grid Components\nDesigning Secure 
Ethereum Smart Contracts: A Finite State Machine Based  Approach\nTensorFlow Distributions\nEmbedding Words as Distributions with a Bayesian Skip-gram Model\nVideo Captioning via Hierarchical Reinforcement Learning\nA Novel Embedding Model for Knowledge Base Completion Based on  Convolutional Neural Network\nFlagIt: A System for Minimally Supervised Human Trafficking Indicator  Mining\nFPGA with Improved Routability and Robustness in 130nm CMOS with  Open-Source CAD Targetability\nAtiyah-Patodi-Singer index theorem for domain-wall fermion Dirac  operator\nTracing a Loose Wordhood for Chinese Input Method Engine\nNegBio: a high-performance tool for negation and uncertainty detection  in radiology reports\n\"Oh Tanenbaum, oh Tanenbaum...\": Technical Foundations of Xmas 4.0  Research\nMultilingual Topic Models\nModel-Based Clustering of Time-Evolving Networks through Temporal  Exponential-Family Random Graph Models\nSheaf-Theoretic Stratification Learning\nNeural Network Multitask Learning for Traffic Flow Forecasting\nSemi-automatic definite description annotation: a first report\nOn the Semantics of Intensionality and Intensional Recursion\nRewriting in Free Hypergraph Categories\nCNN Is All You Need\nTensor network states in time-bin quantum optics\nObject-Oriented Theorem Proving (OOTP): First Thoughts\nIdentifying emergency stages in Facebook posts of police departments  with convolutional and recurrent neural networks and support vector machines\nPotentiality, Actuality and Non-Separability in Quantum and Classical  Physics: Res Potentiae in the Macroscopic World\nA Multi-task Learning Approach for Improving Product Title Compression  with User Search Log Data\nNeural Program Synthesis with Priority Queue Training\nUsing probabilistic programs as proposals\nBuilding a Conversational Agent Overnight with Dialogue Self-Play\nCobra: A Framework for Cost Based Rewriting of Database Applications\nAn Iterative Closest Point Method for Unsupervised Word 
Translation\nIntegrating planning for task-completion dialogue policy learning\nSurvey on Emotional Body Gesture Recognition\nDiagrammatic Reasoning beyond Clifford+T Quantum Mechanics\nCataloging the Visible Universe through Bayesian Inference at Petascale\nDeceptive Games\nDual Recurrent Attention Units for Visual Question Answering\nPhonetic and Graphemic Systems for Multi-Genre Broadcast Transcription\nContent based Weighted Consensus Summarization\nTunneling Neural Perception and Logic Reasoning through Abductive  Learning\nProposal and implementation of a novel perturb and observe algorithm  using embedded software\nDeterministic Regular Expressions With Back-References\nDecoding-History-Based Adaptive Control of Attention for Neural Machine  Translation\nPartisan: Enabling Cloud-Scale Erlang Applications\nNonspecific biological effects of weak magnetic fields depend on  molecular rotations\nEvolution of the Science Fiction Writer's Capacity to Imagine the Future\nAugment and Reduce: Stochastic Inference for Large Categorical  Distributions\nAttention based Sentence Extraction from Scientific Articles using  Pseudo-Labeled data\nOn the Feasibility of Decentralized Derivatives Markets\nMulti-Task Learning for Extraction of Adverse Drug Reaction Mentions  from Tweets\nOpen Information Extraction on Scientific Text: An Evaluation\nMultinomial Adversarial Networks for Multi-Domain Text Classification\nVariational Autoencoders for Collaborative Filtering\nDisentangling Aspect and Opinion Words in Target-based Sentiment  Analysis using Lifelong Learning\nTowards a Continuous Knowledge Learning Engine for Chatbots\nCombining Textual Content and Structure to Improve Dialog Similarity\nBroyden's method for nonlinear eigenproblems\nComputing the concurrency threshold of sound free-choice workflow nets\nVizWiz Grand Challenge: Answering Visual Questions from Blind People\nInverse Doppler Effects in Pipe Instruments\nEvaluating Design Tradeoffs in Numeric Static Analysis 
for Java\nMeta Multi-Task Learning for Sequence Modeling\nDecreasing height along continued fractions\nTone Biased MMR Text Summarization\nTool Demonstration: FSolidM for Designing Secure Ethereum Smart  Contracts\nAnalyzing Uncertainty in Neural Machine Translation\nYuanfudao at SemEval-2018 Task 11: Three-way Attention and Relational  Knowledge for Commonsense Machine Comprehension\nConcatenated $p$-mean Word Embeddings as Universal Cross-Lingual  Sentence Representations\nCalculated attributes of synonym sets\nA Genetic Programming Framework for 2D Platform AI\nROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization  Tasks\nSelf-Attention with Relative Position Representations\nActive Particles Bound by Information Flows\nAn Unsupervised Model with Attention Autoencoders for Question Retrieval\nGeneralised Operations in Free Harmonic Analysis\nSpace-Efficient Bimachine Construction Based on the Equalizer  Accumulation Principle\nFEVER: a large-scale dataset for Fact Extraction and VERification\nMore Nonlocality with Less Entanglement in CHSH Experiments using  Inefficient Detectors\nExpeditious Generation of Knowledge Graph Embeddings\nOn the Invariance of Gödel's Second Theorem with regard to Numberings\nThe Rapidly Changing Landscape of Conversational Agents\nLocally Private Bayesian Inference for Count Models\nStyle Tokens: Unsupervised Style Modeling, Control and Transfer in  End-to-End Speech Synthesis\nOn 2-Group Global Symmetries and Their Anomalies\nComparative Study of Eight Formal Specifications of the Message  Authenticator Algorithm\nMachine Speech Chain with One-shot Speaker Adaptation\nJoint PLDA for Simultaneous Modeling of Two Factors\nAttention-based End-to-End Models for Small-Footprint Keyword Spotting\nHyperbolic vortices and Dirac fields in 2+1 dimensions\nAutomatically augmenting an emotion dataset improves classification  using audio\nReusing Neural Speech Representations for Auditory Emotion Recognition\nReactive 
Supervisory Control of Open Discrete-event Systems\nCompletely Unsupervised Phoneme Recognition by Adversarially Learning  Mapping Relationships from Audio Embeddings\nHigh-quality nonparallel voice conversion based on cycle-consistent  adversarial network\nIncorporating Word Embeddings into Open Directory Project based  Large-scale Classification\nCIKM AnalytiCup 2017 Lazada Product Title Quality Challenge An Ensemble  of Deep and Shallow Learning to predict the Quality of Product Titles\nThe Factorization Problem in Jackiw-Teitelboim Gravity\nAbstractive Tabular Dataset Summarization via Knowledge Base Semantic  Embeddings\nIntegrating Software Engineering Key Practices into an OOP Massive  In-Classroom Course: an Experience Report\nLearning a Text-Video Embedding from Incomplete and Heterogeneous Data\nPath-integral representation of diluted pedestrian dynamics\nFlexible and Scalable Deep Learning with MMLSpark\nLearning Abstractions for Program Synthesis\nA denotational account of C11-style memory\nSingularities of Transition Processes in Dynamical Systems: Qualitative  Theory of Critical Delays\nLoading a Bose-Einstein Condensate onto an Optical Lattice: an  Application of Optimal Control Theory to The Non Linear Schrödinger  Equation\nAxiomatic Synthesis of Computer Programs and Computability Theorems\nGeneric Global Constraints based on MDDs\nNoise and Fluctuations in Semiclassical Gravity\nTwo-Time Physics with gravitational and gauge field backgrounds\nOpen String on Symmetric Product\nIntegrability of generalized (matrix) Ernst equations in string theory\nThe Euclidean geometry deformations and capacities of their application  to microcosm space-time geometry\nFrom Classical to Quantum Mechanics: \"How to translate physical ideas  into mathematical language\"\nCulminating paths\nA metageometric enquiry concerning time, space, and quantum physics\nFactor-Group-Generated Polar Spaces and (Multi-)Qudits\nRPO, Second-order Contexts, and 
Lambda-calculus\nCalculating energy shifts in terms of phase shifts\nA measure on the set of compact Friedmann-Lemaitre-Robertson-Walker  models\nReflection and Hyper-Programming in Persistent Programming Systems\nStructure of Lanczos-Lovelock Lagrangians in Critical Dimensions\nOperator Spin Foam Models\nCorrelations in Hawking radiation and the infall problem\nA Sequence of Qubit-Qudit Pauli Groups as a Nested Structure of Doilies\nEntropy-driven cutoff phenomena\nSome Quantum-Like Features of Mass Politics in Two-Party Systems\nDust driven mass loss from carbon stars as function of stellar  parameters - II. Effects of grain size on wind properties\nEffect of a relativistic correction to the Coulomb potential on the  energy levels of hydrogen atom\nCausal graph dynamics\nASR Context-Sensitive Error Correction Based on Microsoft N-Gram Dataset\nAsymptotic probabilities of extension properties and random  $l$-colourable structures\nDetecting Events and Patterns in Large-Scale User Generated Textual  Streams with Statistical Learning Methods\nTowards Cancer Hybrid Automata\nProlongation of quasi-principal frame bundles and geometry of flag  structures on manifolds\nFiltergraph: A Flexible Web Application for Instant Data Visualization  of Astronomy Datasets\nDistributed optimization of deeply nested systems\nGeneric Strategies for Chemical Space Exploration\nInferring Robot Task Plans from Human Team Meetings: A Generative  Modeling Approach with Logic-Based Prior\nFiltergraph: An Interactive Web Application for Visualization of  Astronomy Datasets\nSymbolic Abstractions of Networked Control Systems\nInducing chaos by breaking axial symmetry in a black hole magnetosphere\nCOFFEE: an Optimizing Compiler for Finite Element Local Assembly\nComputational Analysis of Perfect-Information Position Auctions\nBroadcasting Automata and Patterns on Z^2\nSimple, Parallel, High-Performance Virtual Machines for Extreme  Computations\nA Direct Symbolic Execution of SQL Code for 
Testing of Data-Oriented  Applications\nA Symbolic Execution Algorithm for Constraint-Based Testing of Database  Programs\nDopeLearning: A Computational Approach to Rap Lyrics Generation\nLattice-Theoretic Progress Measures and Coalgebraic Model Checking (with  Appendices)\nA general framework for Noetherian well ordered polynomial reductions\nA Neural Transducer\nDeep Compositional Captioning: Describing Novel Object Categories  without Paired Training Data\nBoundary action of automaton groups without singular points and Wang  tilings\nMaps of Computer Science\nWord Network Topic Model: A Simple but General Solution for Short and  Imbalanced Texts\nDeep Captioning with Multimodal Recurrent Neural Networks (m-RNN)\nConcentration Independent Random Number Generation in Tile Self-Assembly\nGeneralizing Permissive-Upgrade in Dynamic Information Flow Analysis\nOn the evolution of word usage of classical Chinese poetry\nUnified view of quantum amplification based on quantum transformation\nJust Another Gibbs Additive Modeller: Interfacing JAGS and mgcv\nIdentifying Structures in Social Conversations in NSCLC Patients through  the Semi-Automatic extraction of Topical Taxonomies\nTensor networks, $p$-adic fields, and algebraic curves: arithmetic and  the AdS$_3$/CFT$_2$ correspondence\nThe Ryu-Takayanagi Formula from Quantum Error Correction\nMulti-document abstractive summarization using ILP based multi-sentence  compression\nRNN Approaches to Text Normalization: A Challenge\nSampled Image Tagging and Retrieval Methods on User Generated Content\nInvariant Representations for Noisy Speech Recognition\nThe SP Theory of Intelligence as a Foundation for the Development of a  General, Human-Level Thinking Machine\nDySign: Dynamic Fingerprinting for the Automatic Detection of Android  Malware\nSurface Charges for Gravity and Electromagnetism in the First Order  Formalism\nLa forma della Terra: una lezione sulla gravità Newtoniana\nClingcon: The Next Generation\nOn the 
arithmetic of graphs\nAn online sequence-to-sequence model for noisy speech recognition\nThe Intricacies of 3-Valued Extensional Semantics for Higher-Order Logic  Programs\nInspecting Maude Variants with GLINTS\nEnabling Mutation Testing for Android Apps\nFull-Network Embedding in a Multimodal Embedding Pipeline\nLearning to Attend, Copy, and Generate for Session-Based Query  Suggestion\nExploiting Semantic Contextualization for Interpretation of Human  Activity in Videos\nNeural Collaborative Filtering\nAutomated Crowdturfing Attacks and Defenses in Online Review Systems\nStudy on cluster algebras via abstract pattern and two conjectures on  d-vectors and g-vector\nVideo Captioning with Guidance of Multimodal Latent Topics\nR$^3$: Reinforced Reader-Ranker for Open-Domain Question Answering\nProgramming Not Only by Example\nWord problems in Elliott monoids\nBlack-box Generation of Adversarial Text Sequences to Evade Deep  Learning Classifiers\nEmbedRank: Unsupervised Keyphrase Extraction using Sentence Embeddings\nGeneralized Points-to Graphs: A New Abstraction of Memory in the  Presence of Pointers\nSearch Based Code Generation for Machine Learning Programs\nCan we steal your vocal identity from the Internet?: Initial  investigation of cloning Obama's voice using GAN, WaveNet and low-quality  found data\nReality-check for Econophysics: Likelihood-based fitting of  physics-inspired market models to empirical data\nDiscovering Users Topic of Interest from Tweet\nCombinatorial Register Allocation and Instruction Scheduling\nOntology Verbalization using Semantic-Refinement\nSCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis  Using Artificial Neural Networks\nIntegrating design synthesis and assembly of structured objects in a  visual design language\nCross-lingual keyword assignment\nCP-logic: A Language of Causal Probabilistic Events and Its Relation to  Logic Programming\nAutomated languages phylogeny from Levenshtein distance\nAn Implementation 
of the Language Lambda Prolog Organized around  Higher-Order Pattern Unification\nReactive Imperative Programming with Dataflow Constraints\nThe origin of Mayan languages from Formosan language group of  Austronesian\nQ#, a quantum computation package for the .NET platform\nUnambiguous Buchi is weak\nArray operators using multiple dispatch: a design methodology for array  implementations in dynamic languages\nThe statistical analysis of acoustic phonetic data: exploring  differences between spoken Romance languages\nSpoken Language Translation for Polish\nImplementation of the Programming Language Dino -- A Case Study in  Dynamic Language Performance\n$μ$Puppet: A Declarative Subset of the Puppet Configuration Language\nA Large-Scale Multilingual Disambiguation of Glosses\nHow much is said in a microblog? A multilingual inquiry based on Weibo  and Twitter\nVisual Affect Around the World: A Large-scale Multilingual Visual  Sentiment Ontology\nOn the emergence of syntactic structures: quantifying and modelling  duality of patterning\nComplex Networks of Words in Fables\nNesting Depth of Operators in Graph Database Queries: Expressiveness Vs.  
Evaluation Complexity\nNominal Automata with Name Binding\nThe Social Dynamics of Language Change in Online Networks\nThe Algebra of Open and Interconnected Systems\nUsing Natural Language Processing and Qualitative Analysis to Intervene  in Gang Violence: A Collaboration Between Social Work Researchers and Data  Scientists\nA Comparison of Word Embeddings for English and Cross-Lingual Chinese  Word Sense Disambiguation\nLearning a Natural Language Interface with Neural Programmer\nFamilies of DFAs as Acceptors of $ω$-Regular Languages\nConstrained Topological Sorting\nTowards a Question Answering System over the Semantic Web\nAn End-to-end Neural Natural Language Interface for Databases\nRationality, irrationality, and Wilf equivalence in generalized factor  order\nThe Sheaf-Theoretic Structure Of Non-Locality and Contextuality\nRandomized Distributed Decision\nAbstracting Abstract Control (Extended)\nA cognitive neural architecture able to learn and communicate through  natural language\nWhy Nominal-Typing Matters in OOP\nCoupled dynamics of node and link states in complex networks: A model  for language competition\nDamage to white matter bottlenecks contributes to language impairments  after left hemispheric stroke\nParallel ICA reveals linked patterns of structural damage and fMRI  language task activation in chronic post-stroke aphasia\nGoogle's Multilingual Neural Machine Translation System: Enabling  Zero-Shot Translation\nGleam: the GLAST Large Area Telescope Simulation Framework\nDimers on a simple-quartic net with a vacancy\nMixing patterns in networks\nA Farewell to Liouvillians\nA symmetry principle for Topological Quantum Order\nModelling Contractual Arguments\nNonmonotonic Logics and Semantics\nNonmonotonic Reasoning, Preferential Models and Cumulative Logics\nThe Sketch of a Polymorphic Symphony\nUser software for the next generation\nLearning Algorithms for Keyphrase Extraction\nCoherent Keyphrase Extraction via Web Mining\nDistributed WWW 
Programming using (Ciao-)Prolog and the PiLLoW library\nProving Correctness and Completeness of Normal Programs - a Declarative  Approach\nA Generic Framework for the Analysis and Specialization of Logic  Programs\nCombining decision procedures for the reals\nThe OverRelational Manifesto\nA tool set for the quick and efficient exploration of large document  collections\nGeometrisation of Statistical Mechanics\nThe One-Way Speed of Light on Rotating Earth and the Definition of the  Meter\nNon-commutative Unification in Brane World\nSeiberg Duality for Quiver Gauge Theories\nDe Sitter and Schwarzschild-De Sitter According to Schwarzschild and De  Sitter\nAlmost locally free groups and the genus question\nThe Theory of Ultralogics Part I\nAdjusted Viterbi training\nThe bicategories of corings\nOn umbral extensions of Stirling numbers and Dobinski-like formulas\nNew crisis in geometry?\nStability and Paradox in Algorithmic Logic\nAspects of the stochastic Burgers equation and their connection with  turbulence\nA Multi-Phase Transport Model for Relativistic Heavy Ion Collisions\nRenormalization of the ETAS branching model of triggered seismicity from  total to observable seismicity\nDescription of Quantum Entanglement with Nilpotent Polynomials\nWeb Server Benchmark Application WiiBench using Erlang/OTP R11 and  Fedora-Core Linux 5.0\nVisibly Tree Automata with Memory and Constraints\nOn the derived category of a regular toric scheme\nNLS1 galaxies and estimation of their central black hole masses from the  X-ray excess variance method\nQuantum chromodynamics at high energy and statistical physics\nDomain Structure of Black Hole Space-Times\nCharacterizations of Stable Model Semantics for Logic Programs with  Arbitrary Constraint Atoms\nProgramming Realization of Symbolic Computations for Non-linear  Commutator Superalgebras over the Heisenberg--Weyl Superalgebra: Data  Structures and Processing Methods\nA GPU based real-time software correlation system for the 
Murchison  Widefield Array prototype\nAlgorithms for Glushkov K-graphs\nNew ideas about multiplication of tensorial distributions\nQuantifying the implicit process flow abstraction in SBGN-PD diagrams  with Bio-PEPA\nPyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code  Generation\nA secured Cryptographic Hashing Algorithm\nFrom Holant To #CSP And Back: Dichotomy For Holant$^c$ Problems\nMaking Access to Astronomical Software More Efficient\nDynamical properties of profinite actions\nQuery Routing and Processing in Peer-To-Peer Data Sharing Systems\nImperfect Dark Energy from Kinetic Gravity Braiding\nEquational Characterization of Covariant-Contravariant Simulation and  Conformance Simulation Semantics\nFourier expansions of GL(2) newforms at various cusps\nComplex sequencing rules of birdsong can be explained by simple hidden  Markov processes\nSchaefer's theorem for graphs\nBifix codes and Sturmian words\nGeometry and Energy of Non-abelian Vortices\nSearching for simplicity: Approaches to the analysis of neurons and  behavior\nThe Structure of First-Order Causality (extended version)\nGeneralized Remez Inequality for $(s,p)$-Valent Functions\nImproving Image Search based on User Created Communities\nDB Category: Denotational Semantics for View-based Database Mappings\nSimulating Spiking Neural P systems without delays using GPUs\nThree form potential in (special) minimal supergravity superspace and  supermembrane supercurrent\nAnalogy perception applied to seven tests of word comprehension\nScoring Strategies for the Underdog: A general, quantitative method for  determining optimal sports strategies\nPyCOOL - a Cosmological Object-Oriented Lattice code written in Python\nParallel Spell-Checking Algorithm Based on Yahoo! 
N-Grams Dataset\nEstimating the Prevalence of Deception in Online Review Communities\nSoftware Mutational Robustness\nContext-sensitive Spelling Correction Using Google Web 1T 5-Gram  Information\nCitations, Sequence Alignments, Contagion, and Semantics: On Acyclic  Structures and their Randomness\nHigh Accuracy Gravitational Waveforms from Black Hole Binary Inspirals  Using OpenCL\nTwisted vertex algebras, bicharacter construction and boson-fermion  correspondences\nQuantum chromodynamics at high energy and noisy traveling waves\nContent-based Text Categorization using Wikitology\nGeneralized Hurst exponent and multifractal function of original and  translated texts mapped into frequency and length time series\nRuntime Verification Based on Register Automata\nThe Effective Field Theory of Dark Energy\nEffective hydrodynamics of black D3-branes\nAn improved semantic similarity measure for document clustering based on  topic maps\nAn Algorithm to Find Optimal Attack Paths in Nondeterministic Scenarios\nSurvey on Instruction Selection: An Extensive and Modern Literature  Review\nA biomechanical modeling study of the effects of the orbicularis oris  muscle and jaw posture on lip shape\nOntology Based Data Integration Over Document and Column Family Oriented  NOSQL\nLie algebroids, non-associative structures and non-geometric fluxes\nWebs and Posets\nFacebook and the Epistemic Logic of Friendship\nSystems Variability Modeling: A Textual Model Mixing Class and Feature  Concepts\npiBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under  complex evolutionary scenarios\nConcave Penalized Estimation of Sparse Gaussian Bayesian Networks\nAn EMG study of the lip muscles during covert auditory verbal  hallucinations in schizophrenia\nComplex Question Answering: Unsupervised Learning Approaches and  Experiments\nAn $\\infty$-categorical approach to $R$-line bundles, $R$-module Thom  spectra, and twisted $R$-homology\nAdaptive MCMC-Based Inference in Probabilistic 
Logic Programs\nInfinite-State Energy Games\nKMCLib: A general framework for lattice kinetic Monte Carlo (KMC)  simulations\nEntity-Linking via Graph-Distance Minimization\nAutomatic Completion of Distributed Protocols with Symmetry\nA synchronous rendering of hybrid systems for designing Plant-on-a-Chip  (PoC)\nWormholes, Emergent Gauge Fields, and the Weak Gravity Conjecture\nWhat's Decidable about Syntax-Guided Synthesis?\nStructure theory of flip graphs with applications to Weak Symmetry  Breaking\nThermodynamics of quantum feedback cooling\nTGSum: Build Tweet Guided Multi-Document Summarization Dataset\nEquivariant Structure on Smash Powers\nFlashR: R-Programmed Parallel and Scalable Machine Learning using SSDs\nOn the Use of Computer Programs as Money\nHuman-Algorithm Interaction Biases in the Big Data Cycle: A Markov Chain  Iterated Learning Framework\nA Dictionary-based Approach to Racism Detection in Dutch Social Media\nStrange Beta: An Assistance System for Indoor Rock Climbing Route  Setting Using Chaotic Variations and Machine Learning\nNon-equilibrium statistical mechanics: From a paradigmatic model to  biological transport\nSemantics and Algorithms for Parametric Monitoring\nClassSTRONG: Classical simulations of Strong Field processes\nSentiment in New York City: A High Resolution Spatial and Temporal View\nTesting Noninterference, Quickly\nA Parameterized Study of Maximum Generalized Pattern Matching Problems\nModular Description of a Comprehensive Semantics Model for the UML  (Version 2.0)\nDeep Structured Output Learning for Unconstrained Text Recognition\nOn the Effects of Low-Quality Training Data on Information Extraction  from Clinical Reports\nLocal Linearizability\nSupport for Eschenmoser's Glyoxylate Scenario\nHybrid Automata for Formal Modeling and Verification of Cyber-Physical  Systems\nCramér's theorem is atypical\nA Gibbs Sampler for Multivariate Linear Regression\nAn Operator for Entity Extraction in MapReduce\nAsymptotically 
hyperbolic connections\nDeep Reinforcement Learning in Large Discrete Action Spaces\nThe Abstract Structure of Quantum Algorithms\nImproved Spoken Document Summarization with Coverage Modeling Techniques\nExact Finite-State Machine Identification from Scenarios and Temporal  Properties\nExploiting Lists of Names for Named Entity Identification of Financial  Institutions from Unstructured Documents\nA Computational Method to Calculate the Exact Solution for Acoustic  Scattering by Liquid Spheroids\nOnline shopping behavior study based on multi-granularity opinion  mining: China vs. America\nFixed Parameter Approximations for k-Center Problems in Low Highway  Dimension Graphs\nLatent Tree Models for Hierarchical Topic Detection\nSource-LDA: Enhancing probabilistic topic models using prior knowledge  sources\nCaptioning Images with Diverse Objects\nReview Based Rating Prediction\nRepresenting Pattern Matching Algorithms by Polynomial-Size Automata\nTensiStrength: Stress and relaxation magnitude detection for social  media texts\nMaster equations and the theory of stochastic path integrals\nCreating Causal Embeddings for Question Answering with Minimal  Supervision\nA DIY Ultrasonic Signal Generator for Sound Experiments\nCellular Automata and Finite Groups\nMulti-colony Wright-Fisher with seed-bank\nSchwinger-Keldysh formalism II: Thermal equivariant cohomology\nEquidistribution, Uniform distribution: a probabilist's perspective\nComputing Integrated Information\nThe first Cheeger constant of a simplex\nA pumping lemma for non-cooperative self-assembly\nNoncommutative Cantor-Bendixson derivatives and scattered $C^*$-algebras\nModelHub: Towards Unified Data and Lifecycle Management for Deep  Learning\nGeometric deep learning on graphs and manifolds using mixture model CNNs\nGraph or Relational Databases: A Speed Comparison for Process Mining  Algorithm\nPrior matters: simple and general methods for evaluating and improving  topic quality in topic 
modeling\nPolynomial-Time Proactive Synthesis of Tree-to-String Functions from  Examples\nIndependent sets in hypergraphs and Ramsey properties of graphs and the  integers\nInvestigating the Application of Common-Sense Knowledge-Base for  Identifying Term Obfuscation in Adversarial Communication\nA Load-Buffer Semantics for Total Store Ordering\nContext-Bounded Analysis for POWER\nImproving the upper bound on the length of the shortest reset words\nPolynomial Time Efficient Construction Heuristics for Vertex Separation  Minimization Problem\nGuided Deep List: Automating the Generation of Epidemiological Line  Lists from Open Sources\nEmergent Gravity of Fractons: Mach's Principle Revisited\nSequential Monte Carlo Methods in the nimble R Package\nRapid-Rate: A Framework for Semi-supervised Real-time Sentiment Trend  Detection in Unstructured Big Data\nMultiBUGS: Massively parallel MCMC for Bayesian hierarchical models\nA GRU-Gated Attention Model for Neural Machine Translation\nItem Recommendation with Evolving User Preferences and Experience\nSimplicity condition and boundary-bulk duality\nParcels v0.9: prototyping a Lagrangian Ocean Analysis framework for the  petascale age\nSupervising Neural Attention Models for Video Captioning by Human Gaze  Data\nVideo as a By-Product of Digital Prototyping: Capturing the Dynamic  Aspect of Interaction\nLong range forces in a performance portable Molecular Dynamics framework\nIdentification of Probabilities\nProving Expected Sensitivity of Probabilistic Programs\nCross-Sentence N-ary Relation Extraction with Graph LSTMs\nEveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic  Tweets\nEfficient Online Inference for Infinite Evolutionary Cluster models with  Applications to Latent Social Event Discovery\nOne-Shot Concept Learning by Simulating Evolutionary Instinct  Development\nInterpretable Categorization of Heterogeneous Time Series Data\nLearning Invariant Riemannian Geometric Representations Using Deep 
Nets\nEdina: Building an Open Domain Socialbot with Self-dialogues\nDistributed and Managed: Research Challenges and Opportunities of the  Next Generation Cyber-Physical Systems\nA retrieval-based dialogue system utilizing utterance and context  embeddings\nScalar-tensor gravity in the Palatini approach\nAutomated Lemma Synthesis in Symbolic-Heap Separation Logic\nBifurcation of solutions to Hamiltonian boundary value problems\nSemantic Code Repair using Neuro-Symbolic Transformation Networks\nP4-compatible High-level Synthesis of Low Latency 100 Gb/s Streaming  Packet Parsers in FPGAs\nAcoustic-To-Word Model Without OOV\nParallel WaveNet: Fast High-Fidelity Speech Synthesis\nDeep Learning Scaling is Predictable, Empirically\nSMILES2Vec: An Interpretable General-Purpose Deep Neural Network for  Predicting Chemical Properties\nRate-Distributed Spatial Filtering Based Noise Reduction in Wireless  Acoustic Sensor Networks\nAsynchronous Bidirectional Decoding for Neural Machine Translation\nPilot-Streaming: A Stream Processing Framework for High-Performance  Computing\nCross-type Biomedical Named Entity Recognition with Deep Multi-Task  Learning\nMultimodal Image Captioning for Marketing Analysis\nA unifying framework for the modelling and analysis of STR DNA samples  arising in forensic casework\nBehavioral Learning of Aircraft Landing Sequencing Using a Society of  Probabilistic Finite State Machines\nClique-Based Lower Bounds for Parsing Tree-Adjoining Grammars\nImproved Landauer's principle and generalized second law of  thermodynamics with initial correlations and non-equilibrium surrounding  environments\nFinite-Temperature Scrambling of a Random Hamiltonian\nEnd-to-End Dense Video Captioning with Masked Transformer\nMulti-target Voice Conversion without Parallel Data by Adversarially  Learning Disentangled Audio Representations\nFrom Regular Expression Matching to Parsing\nHate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social  Media\nAbstract 
Machine for Typed Feature Structures\nOn Using Selectional Restriction in Language Models for Speech  Recognition\nQualitative and Quantitative Models of Speech Translation\nCorpus-based Method for Automatic Identification of Support Verbs for  Nominalizations\nSemantic Ambiguity and Perceived Ambiguity\nUbiquitous Talker: Spoken Language Interaction with Real World Objects\nHybrid Transfer in an English-French Spoken Language Translator\nRobust Processing of Natural Language\nThe Unsupervised Acquisition of a Lexicon from Continuous Speech\nFishing for Exactness\nNonuniform Markov models\nA Robust Text Processing Technique Applied to Lexical Error Recovery\nFast Statistical Parsing of Noun Phrases for Document Indexing\nRecycling Lingware in a Multilingual MT System\nAutomatic Discovery of Non-Compositional Compounds in Parallel Data\nTopology of the conceptual network of language\nThe syntactic processing of particles in Japanese spoken language\nRetrieval from Captioned Image Databases Using Natural Language  Processing\nA Bit of Progress in Language Modeling\nPart of Speech Tagging in Thai Language Using Support Vector Machine\nSyntax, Parsing and Production of Natural Language in a Framework of  Information Compression by Multiple Alignment, Unification and Search\nEffective XML Representation for Spoken Language in Organisations\nAn Application of Rational Trees in a Logic Programming Interpreter for  a Procedural Language\nCLAIRE: Combining Sets, Search And Rules To Better Express Algorithms\nOptimizing compilation of constraint handling rules in HAL\nBetter than the real thing? 
Iterative pseudo-query processing using  cluster-based language models\nA Visual Query Language for Complex-Value Databases\nContext-Sensitive Languages, Rational Graphs and Determinism\nUse of UML and Model Transformations for Workflow Process Definitions\nAutomatic annotation of multilingual text collections with a conceptual  thesaurus\nGeocoding multilingual texts: Recognition, disambiguation and  visualisation\nAlgebraic recognizability of regular tree languages\nA Quantum Computer Foundation for the Standard Model and SuperString  Theories\nRegular Expression Subtyping for XML Query and Update Languages\nMethods to integrate a language model with semantic information for a  word prediction component\nStructure and Interpretation of Computer Programs\nComplexity of Hybrid Logics over Transitive Frames\nA Bialgebraic Approach to Automata and Formal Language Theory\nCommonsense Knowledge, Ontology and Ordinary Language\nSoft Uncoupling of Markov Chains for Permeable Language Distinction: A  New Algorithm\nNew parallel programming language design: a bridge between brain models  and multi-core/many-core computers?\nHighly Undecidable Problems For Infinite Computations\nOn the Entropy of Written Spanish\nLanguages recognized by nondeterministic quantum finite automata\nHyperset Approach to Semi-structured Databases and the Experimental  Implementation of the Query Language Delta\nNondeterministic one-tape off-line Turing machines and their time  complexity\nProperties of quasi-alphabetic tree bimorphisms\nEmploying Wikipedia's Natural Intelligence For Cross Language  Information Retrieval\nObject-Oriented Intensional Programming: Intensional Classes Using Java  and Lucid\nType Safe Extensible Programming\nAdvanced Technology in Speech Disorder Therapy of Romanian Language\nUndecidability Results for Finite Interactive Systems\nType Inference for Deadlock Detection in a Multithreaded Polymorphic  Typed Assembly Language\nOn Decidable Growth-Rate Properties of 
Imperative Programs\nOn Omega Context Free Languages which are Borel Sets of Infinite Rank\nImportance of interlinguistic similarity and stable bilingualism when  two languages compete\nOffline Arabic Handwriting Recognition Using Artificial Neural Network\nRepresenting Small Ordinals by Finite Automata\nComponent Specification in the Cactus Framework: The Cactus  Configuration Language\nA comprehensive operational semantics of the SCOOP programming model\nUniversal Higher Order Grammar\nIdentification of arabic word from bilingual text using character  features\nSeparation of Test-Free Propositional Dynamic Logics over Context-Free  Languages\nA Knowledge Compilation Map\nProceedings First International Workshop on Process Algebra and  Coordination\nImplementing Explicit and Finding Implicit Sharing in Embedded DSLs\nProceedings Fifth Workshop on Formal Languages and Analysis of  Contract-Oriented Software\nQuotient Complexities of Atoms of Regular Languages\nModeling two-language competition dynamics\nGUBS, a Behavior-based Language for Open System Dedicated to Synthetic  Biology\nUpgrading EasyTime: from a textual to a visual language\nJooFlux: Hijacking Java 7 InvokeDynamic To Support Live Code  Modifications\nFrom Regexes to Parsing Expression Grammars\nSome Chances and Challenges in Applying Language Technologies to  Historical Studies in Chinese\nThe automatic creation of concept maps from documents written using  morphologically rich languages\nLarge Scale Language Modeling in Automatic Speech Recognition\nA Type System for the Automatic Distribution of Higher-order Synchronous  Dataflow Programs\nA Principled Approach to Grammars for Controlled Natural Languages and  Predictive Editors\n(Extended Version) Algebraic Characterization of the Class of Languages  recognized by Measure Only Quantum Automata\nTowards the Rapid Development of a Natural Language Understanding Module\nNLP and CALL: integration is working\nTowards Python-based Domain-specific 
Languages for Self-reconfigurable  Modular Robotics Research\nJapanese-Spanish Thesaurus Construction Using English as a Pivot\nAn Overview of Hindi Speech Recognition\nS+Net: extending functional coordination with extra-functional semantics\nPolyglot: Distributed Word Representations for Multilingual NLP\nClustering Algorithm for Gujarati Language\nText segmentation with character-level text embeddings\nDenotational Semantics of A User-Oriented, Domain-Specific Language\nIntroduction to Functional Grammars\nUsing Robust PCA to estimate regional characteristics of language use  from geo-tagged Twitter messages\nIllustrating the Mezzo programming language\nFunction Overloading Implementation in C++\nEntropy analysis of word-length series of natural language texts:  Effects of text language and genre\nWikipedia-based Semantic Interpretation for Natural Language Processing\nReasoning about Meaning in Natural Language with Compact Closed  Categories and Frobenius Algebras\nA Compilation Target for Probabilistic Programming Languages\nInducing Language Networks from Continuous Space Word Representations\nSemantic Unification A sheaf theoretic approach to natural language\nA hybrid formalism to parse Sign Languages\nTowards Active Logic Programming\nKalman filter in quantum language\nCustomisable Handling of Java References in Prolog Programs\nA Survey of Named Entity Recognition in Assamese and other Indian  Languages\nProcess-Oriented Parallel Programming with an Application to  Data-Intensive Computing\nPrinciples and Parameters: a coding theory perspective\nThe Final Solutions of Monty Hall Problem and Three Prisoners Problem\nModeling Basic Aspects of Cyber-Physical Systems, Part II\nA Language Support for Exhaustive Fault-Injection in Message-Passing  System Models\nROSS User's Guide and Reference Manual (Version 1.0)\nKitRobot: A multi-platform graphical programming IDE to program  mini-robotic agents\nA Fuzzy Logic Programming Environment for Managing Similarity 
and Truth  Degrees\nApplied Metamodelling: A Foundation for Language Driven Development  (Third Edition)\nSparse Automatic Differentiation for Large-Scale Computations Using  Abstract Elementary Algebra\nAsk Your Neurons: A Neural-based Approach to Answering Questions about  Images\nHorn Clauses as an Intermediate Representation for Program Analysis and  Transformation\nThe height of piecewise-testable languages with applications in logical  complexity\nOvercoming Language Variation in Sentiment Analysis with Social  Attention\nEnhancements in statistical spoken language translation by  de-normalization of ASR results\nMultilingual Part-of-Speech Tagging with Bidirectional Long Short-Term  Memory Models and Auxiliary Loss\nDialog-based Language Learning\nExploiting Deep Semantics and Compositionality of Natural Language for  Human-Robot-Interaction\nSyntactic complexity of bifix-free languages\nOn Implementing Real-time Specification Patterns Using Observers\nTowards cross-lingual distributed representations without parallel text  trained with adversarial autoencoders\nFast, Small and Exact: Infinite-order Language Modelling with Compressed  Suffix Trees\nUsing the Output Embedding to Improve Language Models\nVicious Circle Principle and Formation of Sets in ASP Based Languages\nContext-Oriented Programming: A Programming Paradigm for Autonomic  Systems\nA Scalable Module System\nRobust Sign Language Recognition System Using ToF Depth Cameras\nAbstracting Abstract Machines: A Systematic Approach to Higher-Order  Program Analysis\nGraph Reachability and Pebble Automata over Infinite Alphabets\nUsing Constraint Handling Rules to Provide Static Type Analysis for the  Q Functional Language\nLogical analysis of natural language semantics to solve the problem of  computer understanding\nImplementing Constraint Handling Rules as a Domain-Specific Language  Embedded in Java\nANOVA (analysis of variance) in the quantum linguistic formulation of  statistics\nLIQUi|>: A 
Software Design Architecture and Domain-Specific Language for  Quantum Computing\nLinguistic Analysis of Requirements of a Space Project and their  Conformity with the Recommendations Proposed by a Controlled Natural Language\nRuleCNL: A Controlled Natural Language for Business Rule Specifications\nA composable language for action models\nHow Easy is it to Learn a Controlled Natural Language for Building a  Knowledge Base?\nDelta-oriented Architectural Variability Using MontiCore\nConfluence for classical logic through the distinction between values  and computations\nAlternating Towers and Piecewise Testable Separators\nRediscovering the Alphabet - On the Innate Universal Grammar\nA Study of Sindhi Related and Arabic Script Adapted languages  Recognition\nIncremental Adaptation Strategies for Neural Network Language Models\nComplexity and universality in the long-range order of words\nConsequences of a Goedel's misjudgment\nLucretia - intersection type polymorphism for scripting languages\nGuided Grammar Convergence\nFine-grained Language Composition: A Case Study\nA Linear First-Order Functional Intermediate Language for Verified  Compilers\nSupporting Language Learners with the Meanings Of Closed Class Items\nA data-based classification of Slavic languages: Indices of qualitative  variation applied to grapheme frequencies\nDocument Classification by Inversion of Distributed Language  Representations\nA Nivat Theorem for Weighted Timed Automata and Weighted Relative  Distance Logic\nAutomata networks for memory loss effects in the formation of linguistic  conventions\nRelating BIP and Reo\nA large annotated corpus for learning natural language inference\nComputational Sociolinguistics: A Survey\nDescription of the Odin Event Extraction Framework and Rule Language\nMultilingual Language Processing From Bytes\nAugmenting Phrase Table by Employing Lexicons for Pivot-based SMT\nIn the Age of Web: Typed Functional-First Programming Revisited\nA Hidden Markov Model 
Based System for Entity Extraction from Social  Media English Text at FIRE 2015\nRecurrent Memory Networks for Language Modeling\nExploring the Limits of Language Modeling\nReversible Communicating Processes\nAutomated Word Prediction in Bangla Language Using Stochastic Language  Models\nUnsupervised word segmentation and lexicon discovery using acoustic word  embeddings\nZipf's law emerges asymptotically during phase transitions in  communicative systems\nInter-Paradigm Translation of Process Models using Simulation and Mining\nTowards an Automated Requirements-driven Development of Smart  Cyber-Physical Systems\nDefault Rules for Curry\nAdversarial Deep Averaging Networks for Cross-Lingual Sentiment  Classification\nGated Word-Character Recurrent Language Model\nLanguage-integrated provenance\nUser interfaces for computational science: a domain specific language  for OOMMF embedded in Python\nA Character-level Convolutional Neural Network for Distinguishing  Similar Languages and Dialects\nModeling Language Change in Historical Corpora: The Case of Portuguese\nSupervisory Control of Fuzzy Discrete Event Systems for Simulation  Equivalence\nEmergence of linguistic laws in human voice\nCompressing Neural Language Models by Sparse Word Representations\nTranslation Quality Estimation using Recurrent Neural Network\nLearning variable length units for SMT between related languages via  Byte Pair Encoding\nExperiments with POS Tagging Code-mixed Indian Social Media Text\nStructure vs. Language: Investigating the Multi-factors of Asymmetric  Opinions on Online Social Interrelationship with a Case Study\nAssessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies\nDifferentiable Functional Program Interpreters\nMinimal and Reduced Reversible Automata\nA Natural Language Query Interface for Searching Personal Information on  Smartwatches\nHow Are Programs Found? 
Speculating About Language Ergonomics With  Curry-Howard\nTowards better decoding and language model integration in sequence to  sequence models\nA Character-Word Compositional Neural Language Model for Finnish\nA POS Tagger for Code Mixed Indian Social Media Text - ICON-2016 NLP  Tools Contest Entry from Surukam\nLocal Modules in Imperative Languages\nA Higher-Order Logic for Concurrent Termination-Preserving Refinement\nAnalysing Temporal Evolution of Interlingual Wikipedia Article Pairs\nY-Calculus: A Language for Real Matrices Derived from the ZX-Calculus\nNamed Entity Evolution Recognition on the Blogosphere\nA Structural and Nominal Syntax for Diagrams\nRegular Separability of Well Structured Transition Systems\nCritical Survey of the Freely Available Arabic Corpora\nA survey on difference hierarchies of regular languages\nUsing Off-the-Shelf Exception Support Components in C++ Verification\nData Noising as Smoothing in Neural Network Language Models\nA Quasi-Linear Time Algorithm Deciding Whether Weak Büchi Automata  Reading Vectors of Reals Recognize Saturated Languages\nInvestigation of Language Understanding Impact for Reinforcement  Learning Based Dialogue Systems\nSequence-to-Sequence Models Can Directly Translate Foreign Speech\nTopic modeling of public repositories at scale using names in source  code\nModel Transfer for Tagging Low-resource Languages using a Bilingual  Dictionary\nWord and Phrase Translation with word2vec\nPhone-aware Neural Language Identification\nBuilding a Semantic Role Labelling System for Vietnamese\nA Co-contextual Type Checker for Featherweight Java (incl. 
Proofs)\nW2VLDA: Almost Unsupervised System for Aspect Based Sentiment Analysis\nA Low Dimensionality Representation for Language Variety Identification\nUsing of heterogeneous corpora for training of an ASR system\nDecoding Lua: Formal Semantics for the Developer and the Semanticist\nDataset for a Neural Natural Language Interface for Databases (NNLIDB)\nMAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity  Linking Approach\nCross-Lingual Induction and Transfer of Verb Classes Based on Word  Vector Space Specialisation\nLocalizing Moments in Video with Natural Language\nOn the Learnability of Programming Language Semantics\nN-gram and Neural Language Models for Discriminating Similar Languages\nAn Annotated Corpus of Relational Strategies in Customer Service\nNeural Networks Compression for Language Modeling\nA Neural Language Model for Dynamically Representing the Meanings of  Unknown Words and Entities in a Discourse\nAbstractions for AI-Based User Interfaces and Systems\nContext-Updates Analysis and Refinement in Chisel\nLexical Disambiguation in Natural Language Questions (NLQs)\nSpeaker Role Contextual Modeling for Language Understanding and Dialogue  Policy Learning\nEmergent Translation in Multi-Agent Communication\nUnsupervised Machine Translation Using Monolingual Corpora Only\nEvaluation of Croatian Word Embeddings\nNeural Skill Transfer from Supervised Language Tasks to Reading  Comprehension\nRecurrent Neural Networks as Weighted Language Recognizers\nOne Model for the Learning of Language\nWYS*: A Verified Language Extension for Secure Multi-party Computations\nMastering the Dungeon: Grounded Language Learning by Mechanical Turker  Descent\nG-CORE: A Core for Future Graph Query Languages\nLeveraging Native Language Speech for Accent Identification using Deep  Siamese Networks\nAutomated rating of recorded classroom presentations using speech  analysis in kazakh\nScilla: a Smart Contract Intermediate-Level LAnguage\nHierarchical Memory 
Management for Mutable State\nCutLang: A Particle Physics Analysis Description Language and Runtime  Interpreter\nTransfer Learning for Improving Speech Emotion Classification Accuracy\nDeep Reinforcement Learning for Programming Language Correction\nBayesian Models for Unit Discovery on a Very Low Resource Language\nLearning Word Vectors for 157 Languages\nLIDIOMS: A Multilingual Linked Idioms Data Set\nTowards end-to-end spoken language understanding\nThe origins of the Malagasy people, some certainties and a few mysteries\nSyntax-Aware Language Modeling with Recurrent Neural Networks\nKnowledge Aided Consistency for Weakly Supervised Phrase Grounding\nPolyglot Semantic Parsing in APIs\nNatural Language or Not (NLoN) - A Package for Software Engineering Text  Analysis Pipeline\nAllenNLP: A Deep Semantic Natural Language Processing Platform\nVideo Object Segmentation with Language Referring Expressions\nUnsupervised Separation of Transliterable and Native Words for Malayalam\nAn Experiment in Ping-Pong Protocol Verification by Nondeterministic  Pushdown Automata\nRobust Cross-lingual Hypernymy Detection using Dependency Context\nAttentive Sequence-to-Sequence Learning for Diacritic Restoration of  Yorùbá Language Text\nChinese-Portuguese Machine Translation: A Study on Building Parallel  Corpora from Comparable Texts\nPositivity and conservation of superenergy tensors\nTracing the evolution of NGC6397 through the chemical composition of its  stellar populations\nA generalized palindromization map in free monoids\nCounting and generating lambda terms\nBest of Both Worlds: Transferring Knowledge from Discriminative Learning  to a Generative Visual Dialog Model\nAn Alternative Conception of Tree-Adjoining Derivation\nAcquiring Receptive Morphology: A Connectionist Model\nStructural Tags, Annealing and Automatic Word Classification\nAn Attributive Logic of Set Descriptions and Set Operations\nAligning a Parallel English-Chinese Corpus Statistically with Lexical  
Criteria\nAn Empirical Model of Acknowledgment for Spoken-Language Systems\nTricolor DAGs for Machine Translation\nA Sequential Algorithm for Training Text Classifiers\nComparative Discourse Analysis of Parallel Texts\nParsing with Principles and Probabilities\nBuilding a Parser That can Afford to Interact with Semantics\nTreating `Free Word Order' in Machine Translation\nAligning Noisy Parallel Corpora Across Language Groups : Word Pair  Feature Matching by Dynamic Time Warping\nXTAG system - A Wide Coverage Grammar for English\nA Freely Available Syntactic Lexicon for English\nReference Resolution Using Semantic Patterns in Japanese Newspaper  Articles\nProbabilistic Tagging with Feature Structures\nFeature-Based TAG in place of multi-component adjunction: Computational  Implications\nInterlanguage Signs and Lexical Transfer Errors\nReverse Queries in DATR\nManipulating Human-oriented Dictionaries with very simple tools\nAnalysis of Japanese Compound Nouns using Collocational Information\nIntegrating \"Free\" Word Order Syntax and Information Structure\nAn Algorithm to Co-Ordinate Anaphora Resolution and PPS Disambiguation  Process\nImplementation and evaluation of a German HMM for POS disambiguation\nCreating a tagset, lexicon and guesser for a French tagger\nIncremental Interpretation: Applications, Theory, and Relationship to  Dynamic Semantics\nIncremental Interpretation of Categorial Grammar\nAutomatic processing proper names in texts\nA Uniform Treatment of Pragmatic Inferences in Simple and Complex  Utterances and Sequences of Utterances\nEncoding Lexicalized Tree Adjoining Grammars with a Nonmonotonic  Inheritance Hierarchy\nUsing Higher-Order Logic Programming for Semantic Interpretation of  Coordinate Constructs\nAutomatic Extraction of Tagset Mappings from Parallel-Annotated Corpora\nA Matching Technique in Example-Based Machine Translation\nOn Constraint-Based Lambek Calculi\nHow much is enough?: Data requirements for statistical NLP\nParseTalk about 
Textual Ellipsis\nAttempto Controlled English (ACE)\nActive Constraints for a Direct Interpretation of HPSG\nFunctional Centering\nClassification in Feature-based Default Inheritance Hierarchies\nResearch on Architectures for Integrated Speech/Language Systems in  Verbmobil\nHead Automata and Bilingual Tiling: Translation with Minimal  Representations\nUnsupervised Discovery of Phonological Categories through Supervised  Learning of Morphological Rules\nA Corpus Study of Negative Imperatives in Natural Language Instructions\nPhonological modeling for continuous speech recognition in Korean\nMultiple Discourse Relations on the Sentential Level in Japanese\nConnected Text Recognition Using Layered HMMs and Token Passing\nClassifiers in Japanese-to-English Machine Translation\nGathering Statistics to Aspectually Classify Sentences with a Genetic  Algorithm\nA Morphology-System and Part-of-Speech Tagger for German\nInformation Extraction - A User Guide\nA Maximum Entropy Approach to Identifying Sentence Boundaries\nFinite State Transducers Approximating Hidden Markov Models\nGlobal Thresholding and Multiple Pass Parsing\nLearning Methods for Combining Linguistic Indicators to Classify Verbs\nA Hybrid Environment for Syntax-Semantic Tagging\nNymble: a High-Performance Learning Name-finder\nLong-range fractal correlations in literary corpora\nExpoiting Syntactic Structure for Language Modeling\nA Structured Language Model\nLearning Transformation Rules to Find Grammatical Relations\nHMM Specialization with Selective Lexicalization\nMixed-Level Knowledge Representation and Variable-Depth Inference in  Natural Language Processing\nRefinement of a Structured Language Model\nProgramming in Alma-0, or Imperative and Declarative Programming  Reconciled\nA Compact Architecture for Dialogue Management Based on Scripts and  Meta-Outputs\nA Tableaux Calculus for Ambiguous Quantification\nA Tableau Calculus for Pronoun Resolution\nA Resolution Calculus for Dynamic 
Semantics\nVerification of Timed Automata Using Rewrite Rules and Strategies\nRicher Syntactic Dependencies for Structured Language Modeling\nAn Integrated Framework for Treebanks and Multilayer Annotations\nApplying a Hybrid Query Translation Method to Japanese/English  Cross-Language Patent Retrieval\nPRIME: A System for Multi-lingual Patent Retrieval\nLanguage Modeling for Multi-Domain Speech-Driven Text Retrieval\nSpeech-Driven Text Retrieval: Using Target IR Collections for  Statistical Language Model Adaptation in Speech Recognition\nMonadic Style Control Constructs for Inference Systems\nIssues in Communication Game\nA Grid Based Architecture for High-Performance NLP\nBuilding a Test Collection for Speech-Driven Web Retrieval\nApplication Architecture for Spoken Language Resources in Organisational  Settings\nPolarity sensitivity and evaluation order in type-logical grammar\nKnowledge And The Action Description Language A\nAspects de la Programmation d'Applications Win32 avec un Langage  Fonctionnel\nThe First-Order Theory of Sets with Cardinality Constraints is Decidable\nApplication of the Double Metaphone Algorithm to Amharic Orthography\nOverhead-Free Computation, DCFLs, and CFLs\nRobust Dialogue Understanding in HERALD\nProof rules for purely quantum programs\nA Formal Foundation for ODRL\nUtilisation de la linguistique en reconnaissance de la parole : un  état de l'art\nDealing with Metonymic Readings of Named Entities\nResidual Finite Tree Automata\nXString: XML as a String\nExperiments on predictability of word in context and information rate in  natural language\nDe l'opérateur de trace dans les jeux de Conway\nInterlinguistic similarity and language death dynamics\nQuantum Pushdown Automata\nQuantum Property Testing\nA Note on Ontology and Ordinary Language\nOn logical characterization of henselianity\nLinearly bounded infinite graphs\nBootstrapping Deep Lexical Resources: Resources for Courses\nNetwork model of human language\nUsing Description 
Logics for Recognising Textual Entailment\nAutomatic Coding Rule Conformance Checking Using Logic Programs\nOn Infinite Real Trace Rational Languages of Maximum Topological  Complexity\nPolicies of System Level Pipeline Modeling\nA Survey of Quantum Programming Languages: History, Methods, and Tools\nA classification of invasive patterns in AOP\nA language for mathematical knowledge management\nTopological Complexity of Context-Free omega-Languages: A Survey\nWhat It Feels Like To Hear Voices: Fond Memories of Julian Jaynes\nCombining Semantic Wikis and Controlled Natural Language\nThe Wadge Hierarchy of Deterministic Tree Languages\nTowards a Theory of Requirements Elicitation: Acceptability Condition  for the Relative Validity of Requirements\nCompilation of extended recursion in call-by-value functional languages\nDetecting patterns in finite regular and context-free languages\nA Decision Problem for Ultimately Periodic Sets in Non-standard  Numeration Systems\nBPDMN: A Conservative Extension of BPMN with Enhanced Data  Representation Capabilities\nBit Copying - The Ultimate Computational Simplicity\nDecision Problems For Turing Machines\nVers la reconnaissance de mini-messages manuscrits\nStandards for Language Resources\nA computational definition of the notion of vectorial space\nTypage fort et typage souple des collections topologiques et des  transformations\nPublic-key cryptography in functional programming context\nComplexity of Problems for Commutative Grammars\nPositive Supercompilation for a Higher-Order Call-By-Value Language\nPurely Functional Structured Programming\nA Non-Null Annotation Inferencer for Java Bytecode\nProceedings Ninth International Workshop on the Foundations of  Coordination Languages and Software Architectures\nThe Maximal Subword Complexity of Quasiperiodic Infinite Words\nTowards a Property Preserving Transformation from IEC 61131-3 to BIP\nConstructions définitoires des tables du Lexique-Grammaire\nRealizing evaluation 
strategies by hierarchical graph rewriting\nClosure properties of predicates recognized by deterministic and  non-deterministic asynchronous automata\nApplication of a Quantum Ensemble Model to Linguistic Analysis\nQuerying Biomedical Ontologies in Natural Language using Answer Set\nSyntax and Semantics of Babel-17\nECLiPSe - from LP to CLP\nNatural Language Processing (almost) from Scratch\nProceedings Types for Proofs and Programs, Revised Selected Papers\nArc Consistency and Friends\nA Simple Multi-Processor Computer Based on Subleq\nAlgorithmic Programming Language Identification\nFactor frequencies in languages invariant under more symmetries\nObservational equivalences for linear logic CC languages\nDomain-specific Languages in a Finite Domain Constraint Programming  System\nConjure Revisited: Towards Automated Constraint Modelling\nObject-oriented semantics of English in natural language understanding  system\nReaction Automata\nA sound and complete axiomatization for Dynamic Topological Logic\nFunction call overhead benchmarks with MATLAB, Octave, Python, Cython  and C\nTree Transducers, Machine Translation, and Cross-Language Divergences\nRe-differentiation as collective intelligence: The Ktunaxa language  online community\nNumeration Systems: a Link between Number Theory and Formal Language  Theory\nCell Decomposition for semibounded p-adic sets\nLearning to Map Sentences to Logical Form: Structured Classification  with Probabilistic Categorial Grammars\nA Note on Limited Pushdown Alphabets in Stateless Deterministic Pushdown  Automata\nProgramming Languages for Scientific Computing\nA Lightweight Stemmer for Gujarati\nA Scale-Space Theory for Text\nQuantifier Alternation in Two-Variable First-Order Logic with Successor  Is Decidable\nTranslating NP-SPEC into ASP\nLanguage ASP{f} with Arithmetic Expressions and Consistency-Restoring  Rules\nTwo-Sided Derivatives for Regular Expressions and for Hairpin  Expressions\nFormal Verification of Hardware 
Synthesis\nConnecting the Dots: Computer Systems Education using a Functional  Hardware Description Language\nKleene Algebra with Tests and Coq Tools for While Programs\nUnified Modeling Language for Describing Business Value Chain Activities\nIndian Sign Language Recognition Using Eigen Value Weighted Euclidean  Distance Based Classification Technique\nType-theoretical natural language semantics: on the system F for meaning  assembly\nA weak HOAS approach to the POPLmark Challenge\nA fast method for implementation of the property lists in programming  languages\nSofic-Dyck shifts\nTowards Tree Automata-based Success Types\nGenetic approach for arabic part of speech tagging\nExperimenting with X10 for Parallel Constraint-Based Local Search\nCharacterizing traits of coordination\nTowards Meta-Reasoning in the Concurrent Logical Framework CLF\nEmptiness and Universality Problems in Timed Automata with Positive  Frequency\nPretty-big-step-semantics-based Certified Abstract Interpretation  (Preliminary version)\nKeyboard for inputting Chinese language\nApplying quantitative semantics to higher-order quantum computing\nAuthorship Attribution Using Word Network Features\nProceedings Second International Workshop on Trends in Tree Automata and  Tree Transducers\nHigher-order semantics for quantum programming languages with classical  control\nFrom Lock Freedom to Progress Using Session Types\nSession Types Go Dynamic or How to Verify Your Python Conversations\nAxioms for Definability and Full Completeness\nUpper Bounds on Syntactic Complexity of Left and Two-Sided Ideals\nHybrid Approach to English-Hindi Name Entity Transliteration\nChallenges in Persian Electronic Text Analysis\nA Technology for BigData Analysis Task Description using Domain-Specific  Languages\nAn Account of Opinion Implicatures\nTowards a Benchmark of Natural Language Arguments\nOn state complexity of unions of binary factor-free languages\nESmodels: An Epistemic Specification Solver\nLogic Programming 
as Scripting Language for Bots in Computer Games --  Research Overview\nLanguage to Specify Syntax-Guided Synthesis Problems\nBoolean Circuit Complexity of Regular Languages\nVerified Subtyping with Traits and Mixins\nAutonomous requirements specification processing using natural language  processing\nEffective model-completeness for p-adic analytic structures\nControlled Natural Language Processing as Answer Set Programming: an  Experiment\nStar-free languages and local divisors\nUnsupervised Keyword Extraction from Polish Legal Texts\nKleene Algebras, Regular Languages and Substructural Logics\nNot All Neural Embeddings are Born Equal\nRiesz Logic\nOptimizing the For loop: Comparison of For loop and micro For loop\nA Survey on the Local Divisor Technique\nParallel Prefix Polymorphism Permits Parallelization, Presentation &  Proof\nConfusion in the Church-Turing Thesis\nA Category Theory of Communication Theory\nTalking to the crowd: What do people react to in online discussions?\nAbstract Gringo\nSequent Calculus and Equational Programming\nPolish to English Statistical Machine Translation\nThe Essence of JavaScript\nOn the Problem of Computing the Probability of Regular Sets of Trees\nHelping Domain Experts Build Speech Translation Systems\nFeedforward Sequential Memory Neural Networks without Recurrent Feedback\nBASEL (Buffering Architecture SpEcification Language)\nRandom graphs and Lindstrom quantifiers for natural graph properties\nData Language Specification via Terminal Attribution\nFull abstraction for probabilistic PCF\nInformation retrieval in folktales using natural language processing\nPaxos Made Switch-y\nAsk, and shall you receive?: Understanding Desire Fulfillment in Natural  Language Text\nSMT Solving for Functional Programming over Infinite Structures\nCompositionality and String Diagrams for Game Theory\nA Revision of the Mool Language\nThe IBM 2016 English Conversational Telephone Speech Recognition System\nWord Ordering Without Syntax\nTowards 
a native toplevel for the OCaml language\nA Type System for Unstructured Locking that Guarantees Deadlock Freedom  without Imposing a Lock Ordering\nDecorated proofs for computational effects: States\nDynamic Construction of Belief Networks\nDealing with natural language interfaces in a geolocation context\nC++11 - określanie typów\nA New Statement for Selection and Exception Handling in Imperative  Languages\nHandwritten Character Recognition In Malayalam Scripts- A Review\nA Programming Language Oriented Approach to Computability\nThe dagger lambda calculus\nFrom XML Schema to JSON Schema: Translation with CHR\nFinite Automata With Restricted Two-Way Motion\nExecutable Modeling with UML. A Vision or a Nightmare?\nCritical Systems Development Using Modeling Languages. (CSDUML-04):  Current Developments and Future Challenges (Report on the Third International  Workshop)\nSemi-supervised Classification for Natural Language Processing\nTowards a graphical language for quadrotor missions\nQPEL: Quantum Program and Effect Language\nSemantics for a Quantum Programming Language by Operator Algebras\nGeometry of Resource Interaction - A Minimalist Approach\nOn the Coverability Problem for Pushdown Vector Addition Systems in One  Dimension\nA Knowledge-poor Pronoun Resolution System for Turkish\nModel Driven Reactive Applications\nA Publicly Available Cross-Platform Lemmatizer for Bulgarian\nEgison: Non-Linear Pattern-Matching against Non-Free Data Types\njUCM: Universal Class Morphing (position paper)\nImproved Transition-Based Parsing by Modeling Characters instead of  Words with LSTMs\nJuMP: A Modeling Language for Mathematical Optimization\nPosterior calibration and exploratory analysis for natural language  processing models\nA commentary on \"The now-or-never bottleneck: a fundamental constraint  on language\", by Christiansen and Chater (2016)\nProceedings Thirteenth Workshop on Quantitative Aspects of Programming  Languages and Systems\nPredicting the top and 
bottom ranks of billboard songs using Machine  Learning\nModular implicits\nA Context-Oriented Extension of F#\nIncorporating Structural Alignment Biases into an Attentional Neural  Translation Model\nThe Essence of Inheritance\nOn Training Bi-directional Neural Network Language Model with Noise  Contrastive Estimation\nA Probabilistic Dependent Type System based on Non-Deterministic Beta  Reduction\nTrace semantics for polymorphic references\nTowards a DSL for Perception-Based Safety Systems\nModel Completeness for Henselian Fields with finite ramification valued  in a $Z$-Group\nThe Diagonal Problem for Higher-Order Recursion Schemes is Decidable\nAdobe-MIT submission to the DSTC 4 Spoken Language Understanding pilot  task\nAutomatic TM Cleaning through MT and POS Tagging: Autodesk's Submission  to the NLP4TM 2016 Shared Task\nAs Cool as a Cucumber: Towards a Corpus of Contemporary Similes in  Serbian\nPerspectives for proof unwinding by programming languages techniques\nExploiting Multi-typed Treebanks for Parsing with Deep Multi-task  Learning\nEvidences of the mismatch between industry and academy on modelling  language quality evaluation\nDeciding Equivalence of Linear Tree-to-Word Transducers in Polynomial  Time\nThe Vopěnka principle is inequivalent to but conservative over the  Vopěnka scheme\nZero-Resource Translation with Multi-Lingual Neural Machine Translation\nConstitutional Precedent of Amicus Briefs\nEvent-driven Adaptation in COP\nFrom Events to Reactions: A Progress Report\nOn the Solvability of Inductive Problems: A Study in Epistemic Topology\nSequential Convolutional Neural Networks for Slot Filling in Spoken  Language Understanding\nLearning Crosslingual Word Embeddings without Bilingual Corpora\nMapping distributional to model-theoretic semantic spaces: a baseline\nA XML Based Datagrid Description Language\nFragment Allocation Configuration in Distributed Database Systems\n2016 Google Scholar Metrics released: a matter of languages... 
and  something else\nGrounded Lexicon Acquisition - Case Studies in Spatial Language\nThe Number of Atomic Models of Uncountable Theories\nByte-based Language Identification with Deep Convolutional Networks\nPsychologically Motivated Text Mining\nSurvey on the Use of Typological Information in Natural Language  Processing\nA Language-independent and Compositional Model for Personality Trait  Recognition from Short Texts\nPersonalized Machine Translation: Preserving Original Author Traits\nThe Intelligent Voice 2016 Speaker Recognition System\n1.5 billion words Arabic Corpus\nOptimal Test Sets for Context-Free Languages\nFill it up: Exploiting partial dependency annotations in a minimum  spanning tree parser\nGeometry of Compositionality\nStateology: State-Level Interactive Charting of Language, Feelings, and  Values\nCross-Lingual Dependency Parsing with Late Decoding for Truly  Low-Resource Languages\nA Concurrent Model for Imperative Languages with Improved Atomicity\nConstraint Handling Rules - What Else?\nNon-Blocking Concurrent Imperative Programming with Session Types\nemLam -- a Hungarian Language Modeling baseline\nA Comprehensive Survey on Bengali Phoneme Recognition\nMultilingual Multi-modal Embeddings for Natural Language Processing\nComparative Study of CNN and RNN for Natural Language Processing\nFast and unsupervised methods for multilingual cognate clustering\nVerified type checker for Jolie programming language\nA Short Review of Ethical Challenges in Clinical Natural Language  Processing\nEnvironment-Independent Task Specifications via GLTL\nBeating Atari with Natural Language Guided Reinforcement Learning\nAn Analysis of Action Recognition Datasets for Language and Vision Tasks\nLearning Structured Natural Language Representations for Semantic  Parsing\nDuluth at SemEval-2017 Task 6: Language Models in Humor Detection\nExtending and Improving Wordnet via Unsupervised Word Embeddings\nEfficient Natural Language Response Suggestion for Smart 
Reply\nA Systematic Review of Hindi Prosody\nSpelling Correction as a Foreign Language\nCharacter Composition Model with Convolutional Neural Networks for  Dependency Parsing on Morphologically Rich Languages\nAre distributional representations ready for the real world? Evaluating  word vectors for grounded perceptual meaning\nEfficient Textual Representation of Structure\nLearning Pairwise Disjoint Simple Languages from Positive Examples\nComputational Thinking in Patch\nGerman in Flux: Detecting Metaphoric Change via Word Entropy\nIns-Robust Primitive Words\nAn Embedded Deep Learning based Word Prediction\nOpen Quantum Assembly Language\nProceedings 15th Workshop on Quantitative Aspects of Programming  Languages and Systems\nRotations and Interpretability of Word Embeddings: the Case of the  Russian Language\nOn the State of the Art of Evaluation in Neural Language Models\nImproving Language Modeling using Densely Connected Recurrent Neural  Networks\nA Sub-Character Architecture for Korean Language Processing\nIdentifying civilians killed by police with distantly supervised  entity-event extraction\nImproving coreference resolution with automatically predicted prosodic  information\nNatural Language Processing with Small Feed-Forward Networks\nConfluence in Probabilistic Rewriting\nAutomatic Identification of AltLexes using Monolingual Parallel Corpora\nTowards Syntactic Iberian Polarity Classification\nCD Grammar Systems with Two Propagating Scattered Context Components  Characterize the Family of Context Sensitive Languages\nCryptographically Secure Information Flow Control on Key-Value Stores\nModel Checking Regular Language Constraints\nTowards Neural Machine Translation with Latent Tree Attention\nData-Driven Dialogue Systems for Social Agents\nA Domain-specific Language for High-reliability Software used in the  JUICE SWI Instrument - The hO Language Manual\nLimitations of Cross-Lingual Learning from Image Search\nTwo-way Two-tape Automata\nA Practical 
Python API for Querying AFLOWLIB\nTransferring Semantic Roles Using Translation and Syntactic Information\nRobot-Initiated Specification Repair through Grounded Language  Interaction\nA New Technique for Reachability of States in Concatenation Automata\nTensor network language model\nA Comparison of Feature-Based and Neural Scansion of Poetry\nTowards Linguistically Generalizable NLP Systems: A Workshop and Shared  Task\nTowards operational natural language\nEvent-Clock Nested Automata\nCharacterizing the hyper-parameter space of LSTM language models for  mixed context applications\nCreating New Language and Voice Components for the Updated MaryTTS  Text-to-Speech Synthesis Platform\nRasa: Open Source Language Understanding and Dialogue Management\nword representation or word embedding in Persian text\nPWCT: Visual Language for IoT and Cloud Computing Applications and  Systems\nVnCoreNLP: A Vietnamese Natural Language Processing Toolkit\nTheory of higher order interpretations and application to Basic Feasible  Functions\nSequential Circuits from Regular Expressions Revisited\nCan One Escape Red Chains? 
Regular Path Queries Determinacy is  Undecidable\nNatural Language Inference over Interaction Space: ICLR 2018  Reproducibility Report\nLogic Programming Applications: What Are the Abstractions and  Implementations?\nFormal Semantics of the Language Cypher\nThe Importance of Being Recurrent for Modeling Hierarchical Structure\nVehicle Platooning Simulations with Functional Reactive Programming\nReal-Time Prediction of the Duration of Distribution System Outages\nA ZX-Calculus with Triangles for Toffoli-Hadamard, Clifford+T, and  Beyond\nEinstein-Yang-Mills Isolated Horizons: Phase Space, Mechanics, Hair and  Conjectures\nWebs, Lenard schemes, and the local geometry of bihamiltonian Toda and  Lax structures\nMeasuring and Synthesizing Systems in Probabilistic Environments\nRhythms of Memory and Bits on Edge: Symbol Recognition as a Physical  Phenomenon\nLearning Content Selection Rules for Generating Object Descriptions in  Dialogue\nTowards automating the generation of derivative nouns in Sanskrit by  simulating Panini\nAn Extended action for the effective field theory of dark energy: a  stability analysis and a complete guide to the mapping at the basis of  EFTCAMB\nFlow- and Context-Sensitive Points-to Analysis using Generalized  Points-to Graphs\nUsing Multiple Sources of Information for Constraint-Based Morphological  Disambiguation\nData-Oriented Language Processing. 
An Overview\nDiscovery of Linguistic Relations Using Lexical Attraction\nLanguage evolution and population Dynamics in a system of two  interacting species\nSimilarity-Based Models of Word Cooccurrence Probabilities\nSpecialization of Functional Logic Programs Based on Needed Narrowing\nA machine-independent port of the SR language run-time system to the  NetBSD operating system\nA Machine-Independent port of the MPD language run time system to NetBSD\nLineal: A linear-algebraic Lambda-calculus\nMykyta the Fox and networks of language\nQuerying XML Documents in Logic Programming\nComputational modelling of evolution: ecosystems and language\nStatistical analysis of the Indus script using $n$-grams\nAccelerating the Execution of Matrix Languages on the Cell Broadband  Engine Architecture\nFirst-order Fragments with Successor over Infinite Words\nPhylogeny and geometry of languages from normalized Levenshtein distance\nDigraph Complexity Measures and Applications in Formal Language Theory\nComplex network analysis of literary and scientific texts\nOn distributed monitoring of asynchronous systems\nA Trichotomy for Regular Simple Path Queries on Graphs\nProceedings Fifth Workshop on Programming Language Approaches to  Concurrency- and Communication-cEntric Software\nMining and Exploiting Domain-Specific Corpora in the PANACEA Platform\nJRC EuroVoc Indexer JEX - A freely available multi-label categorisation  tool\nResolving and Exploiting the $k$-CFA Paradox\nHierarchies\nQuantitative methods for Phylogenetic Inference in Historical  Linguistics: An experimental case study of South Central Dravidian\nAssessing Wikipedia-Based Cross-Language Retrieval Models\nA Modality Lexicon and its use in Automatic Tagging\nRegularity Preserving but not Reflecting Encodings\nStreaming Property Testing of Visibly Pushdown Languages\nFreshman or Fresher? 
Quantifying the Geographic Variation of Internet  Language\nResolving Language and Vision Ambiguities Together: Joint Segmentation &  Prepositional Attachment Resolution in Captioned Scenes\nWhat is India speaking: The \"Hinglish\" invasion\nFrameNet Resource Grammar Library for GF\nOn the Sizes of DPDAs, PDAs, LBAs\nOn the universal structure of human lexical semantics\nDenotational cost semantics for functional languages with inductive  types\nFully automatic multi-language translation with a catalogue of phrases -  successful employment for the Swiss avalanche bulletin\nThe scarcity of crossing dependencies: a direct outcome of a specific  constraint?\nCOGENT: Certified Compilation for a Functional Systems Language\nDomain Specific Author Attribution Based on Feedforward Neural Network  Language Models\nDimension Projection among Languages based on Pseudo-relevant Documents  for Query Translation\nMultilinear Grammar: Ranks and Interpretations\nA Survey of Voice Translation Methodologies - Acoustic Dialect Decoder\nMulti-Agent Cooperation and the Emergence of (Natural) Language\nImplementing GraphQL as a Query Language for Deductive Databases in  SWI-Prolog Using DCGs, Quasi Quotations, and Dicts\nEmail Babel: Does Language Affect Criminal Activity in Compromised  Webmail Accounts?\nRobust clustering of languages across Wikipedia growth\nA Semantics Comparison Workbench for a Concurrent, Asynchronous,  Distributed Programming Language\nInteractively Picking Real-World Objects with Unconstrained Spoken  Language Instructions\nBasic concepts and tools for the Toki Pona minimalist and constructed  language: Wordnet synsets; analysis of the vocabulary; synthesis and syntax  highlighting of texts\nThe Enemy Among Us: Detecting Hate Speech with Threats Based 'Othering'  Language Embeddings\nPutting in All the Stops: Execution Control for JavaScript\nSequence-based Multi-lingual Low Resource Speech Recognition\nResource Polymorphism\nRUSSE'2018: A Shared Task on Word 
Sense Induction for the Russian  Language\nRUSSE: The First Workshop on Russian Semantic Similarity\nP4K: A Formal Semantics of P4 and Applications\nDual-Coding Theory and Connectionist Lexical Selection\nIntentions and Information in Discourse\nGraded Unification: A Framework for Interactive Processing\nSyntactic Analysis by Local Grammars Automata: an Efficient Algorithm\nPRINCIPAR---An Efficient, Broad-coverage, Principle-based Parser\nLHIP: Extended DCGs for Configurable Robust Parsing\nEmergent Linguistic Rules from Inducing Decision Trees: Disambiguating  Discourse Clue Words\nStatistical versus symbolic parsing for captioned-information retrieval\nAn Experiment on Learning Appropriate Selectional Restrictions from a  Parsed Corpus\nDutch Cross Serial Dependencies in HPSG\nA Rule-Based Approach To Prepositional Phrase Attachment Disambiguation\nPlanning Argumentative Texts\nThe \"Whiteboard\" Architecture: a way to integrate heterogeneous  components of NLP systems\nComlex Syntax: Building a Computational Lexicon\nAn HPSG Parser Based on Description Logics\nHigher-order Linear Logic Programming of Categorial Deduction\nStochastic HPSG\nA Robust Parser Based on Syntactic Information\nLexical Acquisition via Constraint Solving\nPhonological Derivation in Optimality Theory\nConstraints, Exceptions and Representations\nUtilizing Statistical Dialogue Act Processing in Verbmobil\nTagging the Teleman Corpus\nQuantifier Scope and Constituency\nFeatures and Agreement\nEmpirical Discovery in Linguistics\nAnalysis of the Arabic Broken Plural and Diminutive\nComputing Prosodic Morphology\nProcessing Complex Sentences in the Centering Framework\nEfficient Algorithms for Parsing the DOP Model? 
A Reply to Joshua  Goodman\nNotes on LR Parser Design\nSemantic-based Transfer\nEfficient Implementation of a Semantic-based Transfer Approach\nStylistic Variation in an Information Retrieval Experiment\nA Czech Morphological Lexicon\nFeatures as Resources in R-LFG\nValence Induction with a Head-Lexicalized PCFG\nUsing WordNet for Building WordNets\nAn Empirical Investigation of Proposals in Collaborative Dialogues\nCentering in Dynamic Semantics\nSpotting Prosodic Boundaries in Continuous Speech in French\nTowards an implementable dependency grammar\nAutocatalytic Theory of Meaning\nPAL: Pertinence Action Language\nHistorical Dynamics of Lexical System as Random Walk Process\nPhonology\nEfficient Deep Processing of Japanese\nThe Rank-Frequency Analysis for the Functional Style Corpora in the  Ukrainian Language\nTabular Parsing\nOn Invariance and Convergence in Time Complexity theory\nThe One Page Model Checker\nThe SL synchronous language, revisited\nReactive concurrent programming revisited\nD2D: Digital Archive to MPEG-21 DIDL\nNumeration-automatic sequences\nAlgebraic recognizability of languages\nThe parallel implementation of the Astrée static analyzer\nSociophysics Simulations I: Language Competition\nMicroscopic Abrams-Strogatz model of language competition\nComputer simulation of language competition by physicists\nFormation of Languages; Equality, Hierarchy and Teachers\nQuantum Domain Theory - Definitions and Applications\nElementary transformation analysis for Array-OL\nSimulation of Quantum Algorithms with a Symbolic Programming Language\nSpecial relativity in complex vector algebra\nA Universal Kernel for Learning Regular Languages\nWeak index versus Borel rank\nA Compositional Query Algebra for Second-Order Logic and Uncertain  Databases\nOn NFAs Where All States are Final, Initial, or Both\nSubshifts, Languages and Logic\nAnswers to Questions Formulated in the Paper \"On States Observability in  Deterministic Finite Automata\"\nNon-Parametric 
Bayesian Areal Linguistics\njYang : A YANG parser in java\nEmpowering OLAC Extension using Anusaaraka and Effective text processing  using Double Byte coding\nRepresenting human and machine dictionaries in Markup languages\nLoopW Technical Reference v0.3\nCausality in the Semantics of Esterel: Revisited\nOperator-oriented programming: a new paradigm for implementing window  interfaces and parallel algorithms\nState complexity of union and intersection combined with star and  reversal\nModel Theory of the Inaccessibility Scheme\nEvents! (Reactivity in urbiscript)\nDependently-Typed Formalisation of Typed Term Graphs\nHigher-order Rewriting for Executable Compiler Specifications\nALPprolog --- A New Logic Programming Method for Dynamic Domains\nThe complexity of tangent words\nConstructing Premaximal Binary Cube-free Words of Any Level\nBilliard complexity in the hypercube\nGroups, Graphs, Languages, Automata, Games and Second-order Monadic  Logic\nLambda-lifting and CPS conversion in an imperative language\nCaterpillar dualities and regular languages\nQuerying Source Code with Natural Language\nMusings on Encodings and Expressiveness\nA Method for Selecting Noun Sense using Co-occurrence Relation in  English-Korean Translation\nIntroduction of the weight edition errors in the Levenshtein distance\nEquivalence of Deterministic One-Counter Automata is NL-complete\nProbing the statistical properties of unknown texts: application to the  Voynich Manuscript\nWeak morphisms of higher dimensional automata\nBounded Choice Queries for Logic Programming\nThe DeLiVerMATH project - Text analysis in mathematics\nExploiting Parallelism in Coalgebraic Logic Programming\nDoes Syntactic Knowledge help English-Hindi SMT?\nExperience in using a typed functional language for the development of a  security application\nVicious Circle Principle and Logic Programs with Aggregates\nBuilding of Networks of Natural Hierarchies of Terms Based on Analysis  of Texts Corpora\nDerivational modal 
logics with the difference modality\nFunctorial Zeta Integrals\nIntroduction to ROSS: A New Representational Scheme\nMutually Exclusive Procedures in Imperative Languages\nAn ACCL which is not a CRCL\nWhat Your Username Says About You\nRDF Knowledge Graph Visualization From a Knowledge Extraction System\nQuantifier Scope in Categorical Compositional Distributional Semantics\nWord Segmentation on Micro-blog Texts with External Lexicon and  Heterogeneous Data\nAuthorship clustering using multi-headed recurrent neural networks\nMany-Task Computing Tools for Multiscale Modeling\nPDDL 2.1: Representation vs. Computation\nAsymptotic Entropy of Random Walks on Regular Languages over a Finite  Alphabet\nSemantics, Modelling, and the Problem of Representation of Meaning -- a  Brief Survey of Recent Literature\nA Brief State of the Art for Ontology Authoring\nMultiparty Sessions based on Proof Nets\nUsing crowdsourcing system for creating site-specific statistical  machine translation engine\nCLAZY: Lazy Calling for Common Lisp\nRecognisable languages over monads\nWhat the F-measure doesn't measure: Features, Flaws, Fallacies and Fixes\nPrograms as Polypeptides\nEvaluation of the Accuracy of the BGLemmatizer\nAutomata and automata mappings of semigroups\nAutocorrelated errors explain the apparent relationship between  disapproval of the US Congress and prosocial language\nInference rules for RDF(S) and OWL in N3Logic\nCo-Occurrence Patterns in the Voynich Manuscript\nA formal language for cyclic operads\nOn the axiomatizability of $\\mathrm{C}^*$-algebras as operator systems\nMulti-Level Analysis and Annotation of Arabic Corpora for Text-to-Sign  Language MT\nData as processes: introducing measurement data into CARMA models\nUsing Recurrent Neural Network for Learning Expressive Ontologies\nConvexity and Order in Probabilistic Call-by-Name FPC\nA New Bengali Readability Score\nModeling selectional restrictions in a relational type system\nThe canonical semantic network 
supports residual language function in  chronic post-stroke aphasia\nFirst-Order Bayesian Network Specifications Capture the Complexity Class  PP\nItaly goes to Stanford: a collection of CoreNLP modules for Italian\nLindenbaum method (propositional language)\nStructural characterization of Cayley graphs\nA Hackathon for Classical Tibetan\nToward a new instances of NELL\nChinese Restaurant Process for cognate clustering: A threshold free  approach\nEarly Evolution of Bird-Type Language without Grammar: Duplication and  Mutation\nA recurrent neural network without chaos\nASHACL: Alternative Shapes Constraint Language\nDeep Learning applied to NLP\nIs this word borrowed? An automatic approach to quantify the likeliness  of borrowing in social media\nApproximation of Weighted Automata with Storage\nMedical Text Classification using Convolutional Neural Networks\nA Biomedical Information Extraction Primer for NLP Researchers\nCanonical Selection of Colimits\nOverview of the NLPCC 2017 Shared Task: Chinese News Headline  Categorization\nNon-locality of the meet levels of the Trotter-Weil Hierarchy\nScoped Extension Methods in Dynamically-Typed Languages\nA Mathematical Picture Language Program\nWhen rule-based models need to count\nClosure Properties in the Class of Multiple Context Free Groups\nInternal Language of Finitely Complete $(\\infty, 1)$-categories\nSymbol, Conversational, and Societal Grounding with a Toy Robot\nLanguage-depedent I-Vectors for LRE15\nNatural Language Inference from Multiple Premises\nHomogeneous 3-dimensional permutation structures\nCons-free Programming with Immutable Functions\nQuantifier elimination on some pseudo-algebraically closed valued fields\nMIZAN: A Large Persian-English Parallel Corpus\nUnsupervised Part-of-Speech Induction\nDistributed NLP\nOn the scaling of polynomial features for representation matching\nDEMorphy, German Language Morphological Analyzer\nOpenMath and SMT-LIB\nMachine Learning and Applied Linguistics\nThe 
Logical Essentials of Bayesian Reasoning\nScaling in the Universe\nA Multi-Threaded Fast Convolver for Dynamically Parallel Image Filtering\nStructuring eccentric-narrow planetary rings\nLopsided spiral galaxies: evidence for gas accretion\nSimulating the High Energy Gamma-ray sky seen by the GLAST Large Area  Telescope\nAbundances of Na, Mg and Al in stars with giant planets\nThe energetics, evolution, and stellar depletion of Li6 in the early  Galaxy\nThe halo-model description of marked statistics\nThe Luminosity-Weighted or `Marked' Correlation Function\nDisordered Flat Phase in a Solid on Solid Model of Fcc(110) Surfaces and  Dimer States in Quantum Spin-1/2 Chains\nPolymer Winding Numbers and Quantum Mechanics\nConductances in normal and normal-superconductor structures\nKinetical theory beyond conventional approximations and 1/f-noise\nFirst-Principles Based Matrix-Green's Function Approach to Molecular  Electronic Devices: General Formalism\nVoting and Catalytic Processes with Inhomogeneities\nQuantum transport properties of two-dimensional electron gases under  high magnetic fields\nFirst-Order Conditional Logic Revisited\nThe \"Fodor\"-FODOR fallacy bites back\nSome Remarks on the Geometry of Grammar\nSLT-Resolution for the Well-Founded Semantics\nOn Redundancy Elimination Tolerant Scheduling Rules\nMeta-Learning for Phonemic Annotation of Corpora\nA Knowledge-based Automated Debugger in Learning System\nDecomposing Non-Redundant Sharing by Complementation\nThe Limits of Horn Logic Programs\nComputing Preferred Answer Sets by Meta-Interpretation in Answer Set  Programming\nWhat does a conditional knowledge base entail?\nVisual Environment for Rapid Composition of Parameter-Sweep Applications  for Distributed Processing on Global Grids\nThe new BaBar Data Reconstruction Control System\nMinimum Model Semantics for Logic Programs with Negation-as-Failure\nThe Athena Startup Kit\nHuman-Level Performance on Word Analogy Questions by Latent Relational  
Analysis\nStabilization of Cooperative Information Agents in Unpredictable  Environment: A Logic Programming Approach\nSoftware Libraries and Their Reuse: Entropy, Kolmogorov Complexity, and  Zipf's Law\nNaming Games in Spatially-Embedded Random Networks\nStatic Analysis using Parameterised Boolean Equation Systems\nSimilarity of Semantic Relations\nAutomata with Nested Pebbles Capture First-Order Logic with Transitive  Closure\nStatistical Geometry in Quantum Mechanics\nAcoustic black holes: horizons, ergospheres, and Hawking radiation\nGravitons from a loop representation of linearised gravity\nMatter with dilaton charge in Weyl-Cartan spacetime and evolution of the  Universe\nMonodromy-data parameterization of spaces of local solutions of  integrable reductions of Einstein's field equations\nQuantum Mechanics without spacetime V - Why a quantum theory of gravity  should be non-linear-\nThe Use of Schoonschip and Form in Perturbative Lattice Calculations\nPerturbative renormalization of moments of quark momentum, helicity and  transversity distributions with overlap and Wilson fermions\nTIME EVOLUTION OF $K^0-\\bar{K^0}$ SYSTEM IN SPECTRAL FORMULATION\nDiffractive J/Psi production in high energy gamma-gamma collisions as a  probe of the QCD pomeron\nPhysical mechanisms generating spontaneous symmetry breaking and a  hierarchy of scales\nOn Nonperturbative Calculations in Quantum Electrodynamics\nGeometrical Lattice models for N=2 supersymmetric theories in two  dimensions\nCentral extensions of current groups in two dimensions\nThe Mechanism behind the Embeddings of String Theories\nIntegrable Hierarchies and Contact Terms in u-plane Integrals of  Topologically Twisted Supersymmetric Gauge Theories\nNonperturbative Formulations of Superstring Theory\nDGP Brane as a Gravity Conductor\nGauge Transformations, BRST Cohomology and Wigner's Little Group\nFayet-Iliopoulos Terms in Supergravity and Cosmology\nTransgression forms as unifying principle in field 
theory\nAsymmetric Nondegenerate Geometry\nThe uniqueness of the spectral flow on spaces of unbounded self--adjoint  Fredholm operators\nFoundations of real analysis and computability theory in  non-Aristotelian finitary logic\nKrein duality, positive 2-algebras, and dilation of comultiplications  (To the centenary of Mark G.Krein)\nLandau-Lifshitz hierarchy and infinite dimensional Grassmann variety\nBosonization of the Pairing Hamiltonian\nThe Wondrous Design and Non-random Character of \"Chance\" Events\nThe National Ignition Facility: Status and Plans for Laser Fusion and  High-Energy-Density Experimental Studies\nThe Leeson effect - Phase noise in quasilinear oscillators\nGeometry of Financial Markets -- Towards Information Theory Model of  Markets\nTranscriptional Regulation by the Numbers 1: Models\nCarbon--The First Frontier of Information Processing\nResource Limited Theories and their Extensions\nQuantum Limitations on the Storage and Transmission of Information\nEntanglement as an Observer-Dependent Concept: An Application to Quantum  Phase Transitions\nClassical randomness in quantum measurements\nStatistical Zero Knowledge and quantum one-way functions\nQuantum Hardcore Functions by Complexity-Theoretical Quantum List  Decoding\nPermutation and Its Partial Transpose\nSome Quantitative Aspects of Fractional Computability\nSupersymmetric Models with Higher Dimensional Operators\nDiscrete rearranging disordered patterns, part I: Robust statistical  tools in two or three dimensions\nSky in Google Earth: The Next Frontier in Astronomical Data Discovery  and Visualization\nThe Scientific Manuscripts left Unpublished by Ettore Majorana (with  outlines of his life and work)\nApplication of Quantum Theory to Super-parametric Density Estimation\nA Joint Search for Gravitational Wave Bursts with AURIGA and LIGO\nCausality and Association: The Statistical and Legal Approaches\nCommon Reusable Verification Environment for BCA and RTL Models\nElastic Maps and Nets 
for Approximating Principal Manifolds and Their  Application to Microarray Data Visualization\nBayesian considerations on the multiverse explanation of cosmic  fine-tuning\nLexical growth, entropy and the benefits of networking\nVoyage by Catamaran: Effecting Semantic Network \"Bricolage\" via  Infinite-Dimensional Zero-Divisor Ensembles\nTopos Mediated Gravity: Toward the Categorical Resolution of the  Cosmological Constant Problem\nCanonical polygon Queries on the plane: a New Approach\nThe Infrared Luminosity of Galaxy Clusters\nAxions and Photons In Terms of \"Particles\" and \"Anti-Particles\"\nInteractive Proofs For Quantum Computations\nOn second quantization on noncommutative spaces with twisted symmetries\nCoherent state approach to the cross collisional effects in the  population dynamics of a two-mode Bose-Einstein condensate\nTemporal Support of Regular Expressions in Sequential Pattern Mining\nThe Safe Lambda Calculus\nSpherically symmetric massive scalar fields in GR\nIntegrable structure of melting crystal model with two q-parameters\nUncovering shared common genetic risk factors for various aspects of  complex disorders captured in multiple traits\nRobust Regulatory Networks\nHeterotic (0,2) Gepner Models and Related Geometries\nSoftware Model Checking via Large-Block Encoding\nQuantum Measure Theory: A New Interpretation\nInduction of Word and Phrase Alignments for Automatic Document  Summarization\nSelf-Assembling Systems are Distributed Systems\nEquivariant cohomology over Lie groupoids and Lie-Rinehart algebras\nPrecision multi-epoch astrometry with VLT cameras FORS1/2\nReaching the boundary between stellar kinematic groups and very wide  binaries. 
The Washington Double Stars with the widest angular separations\nDisc-planet interactions in sub-keplerian discs\nFirst Passage Distributions in a Collective Model of Anomalous Diffusion  with Tunable Exponent\nConvergence Time Evaluation of Algorithms in MANETs\nProperties of the United States Code Citation Network\nLikelihood-based semi-supervised model selection with applications to  speech processing\nModelling Cell Cycle using Different Levels of Representation\nCalibration of star formation rate tracers for short- and long-lived  star formation episodes\nImproved Constructions for Non-adaptive Threshold Group Testing\nHardware Implementation of TDES Crypto System with On Chip Verification  in FPGA\nHandwritten Arabic Numeral Recognition using a Multi Layer Perceptron\nA Formal Approach to Modeling the Memory of a Living Organism\nAssume-Guarantee Synthesis for Digital Contract Signing\nFully Countering Trusting Trust through Diverse Double-Compiling\nUndecidability of linear inequalities in graph homomorphism densities\nThe cooling time of white dwarfs produced from type Ia supernovae\nQuantum optical coherence can survive photon losses: a  continuous-variable quantum erasure correcting code\nCalculation methods of the nuclear characteristics\nFundamental Relativistic Rotator. Hessian singularity and the issue of  the minimal interaction with electromagnetic field\nA Bayesian View of the Poisson-Dirichlet Process\nSwapping Evaluation: A Memory-Scalable Solution for Answer-On-Demand  Tabling\nFixpoint & Proof-theoretic Semantics for CLP with Qualification and  Proximity\nTop-Down Multilevel Simulation of Tumor Response to Treatment in the  Context of In Silico Oncology\nModeling and Analyzing Adaptive User-Centric Systems in Real-Time Maude\nThe e.m. 
field nside the unidimensional Photonic Crystals - Study at  optical frequencies applying \"Quasi-Normal-Modes\" theory\nOpen Graphs and Monoidal Theories\nProbabilistic Arithmetic Automata and their Applications\nNoncommutative Riemannian geometry on graphs\nChiral and angular momentum content of mesons\nProspects for accurate distance measurements of pulsars with the SKA:  enabling fundamental physics\nOlogs: a categorical framework for knowledge representation\nCosmology at the Crossroads of the Natural and Human Sciences: is  demarcation possible?\nFirst-order Logic: Modality and Intensionality\nHappiness is assortative in online social networks\nConfronting the models of 3:2 quasiperiodic oscillations with the rapid  spin of the microquasar GRS 1915+105\nPushing the limits for medical image reconstruction on recent standard  multicore processors\nGPU-Based Heuristic Solver for Linear Sum Assignment Problems Under  Real-time Constraints\nA continuum solvent model: the DISOLV program - algorithms,  implementation, and validation\nComputing Distances between Probabilistic Automata\nEfficient Loop Navigation for Symbolic Execution\nThree Looks at Instantons in F-theory -- New Insights from Anomaly  Inflow, String Junctions and Heterotic Duality\nOn the Zipf strategy for short-term investments in WIG20 futures\nDiagrammatics for SU(2) invariant matrix product states\nHasenohrl and the Equivalence of Mass and Energy\nSimulation-based optimal Bayesian experimental design for nonlinear  systems\nA higher Chern-Weil derivation of AKSZ sigma-models\nVariational Minimization of Orbital-dependent Density Functionals\nOptical lattice quantum simulator for QED in strong external fields:  spontaneous pair creation and the Sauter-Schwinger effect\nAdding Logical Operators to Tree Pattern Queries on Graph-Structured  Data\nNon-Archimedean Whitney stratifications\nStackable groups, tame filling invariants, and algorithmic properties of  groups\nA Massive Data Parallel 
Computational Framework for Petascale/Exascale  Hybrid Computer Systems\nProcessor Allocation for Optimistic Parallelization of Irregular  Programs\nMixed Mimetic Spectral Element Method for Stokes Flow: A Pointwise  Divergence-Free Solution\nFirst-Order Logic on Higher-Order Nested Pushdown Trees\nRetrieving the three-dimensional matter power spectrum and galaxy  biasing parameters from lensing tomography\nDirected Spontaneous Emission from $N$-atom Extended Ensemble\nNew remarks on the Cosmological Argument\nContinuous time Boolean modeling for biological signaling: application  of Gillespie algorithm\nFragIt: A Tool to Prepare Input Files for Fragment Based Quantum  Chemical Calculations\nRevisiting Waiting Times in DNA evolution\nBinary hidden Markov models and varieties\nFuzzy Knowledge Representation, Learning and Optimization with Bayesian  Analysis in Fuzzy Semantic Networks\nOptimization of Fuzzy Semantic Networks Based on Galois Lattice and  Bayesian Formalism\nBisimilarity on Basic Process Algebra is in 2-ExpTime (an explicit  proof)\nBeyond crystals: the dialectic of materials and information\nWikiSent : Weakly Supervised Sentiment Analysis Through Extractive  Summarization With Wikipedia\nFactorization from an order-theoretic view 1&2\nLearn with SAT to Minimize Büchi Automata\nThe μ-Calculus Alternation Hierarchy Collapses over Structures with  Restricted Connectivity\nLa fonte algébrique des Méthodes nouvelles de la mécanique  céleste d'Henri Poincaré\nOpinion Mining for Relating Subjective Expressions and Annual Earnings  in US Financial Statements\nProject G.N.O.S.I.S.: Geographical Network Of Synoptic Information  System\nTensor Rank, Invariants, Inequalities, and Applications\nBijective Projections on Parabolic Quotients of Affine Weyl Groups\nGlobal well-posedness of the spatially homogeneous Hubbard-Boltzmann  equation\nLearning Joint Query Interpretation and Response Ranking\nProceedings First International Workshop on Formal Techniques for  
Safety-Critical Systems\nAmplitudes for Spacetime Regions and the Quantum Zeno Effect: Pitfalls  of Standard Path Integral Constructions\nFrom 9-IM Topological Operators to Qualitative Spatial Relations using  3D Selective Nef Complexes and Logic Rules for bodies\nSource Code Analysis to Remove Security Vulnerabilities in Java Socket  Programs: A Case Study\nThe covariant description of electric and magnetic field lines of null  fields: application to Hopf-Ranada solutions\nRecommending Given Names\nWord sense disambiguation via high order of learning in complex networks\nDecidable Classes of Tree Automata Mixing Local and Global Constraints  Modulo Flat Theories\nA Semantic approach for effective document clustering using WordNet\nThe Velocity of Censorship: High-Fidelity Detection of Microblog Post  Deletions\nQuery Expansion Using Term Distribution and Term Association\nQuantum Mechanics in symmetry language\nAn efficient method for evaluating BEM singular integrals on curved  elements with application in acoustic analysis\nProbability Distinguishes Different Types of Conditional Statements\nYoung stellar clusters in the Rosette molecular cloud. 
Arguments against  triggered star formation\nCAD-based robot programming: The role of Fuzzy-PI force control in  unstructured environments\nEnumeration of the adjunctive hierarchy of hereditarily finite sets\nOn SAT representations of XOR constraints\nIndexing by Latent Dirichlet Allocation and Ensemble Model\nNaming Game on Networks: Let Everyone be Both Speaker and Hearer\nCovering Pairs in Directed Acyclic Graphs\nAbstract Acceleration of General Linear Loops\nWeak Singular Hybrid Automata\nPhysics Items and Student's Performance at Enem\nAnaDroid: Malware Analysis of Android with User-supplied Predicates\nOn the definition of a general learning system with user-defined  operators\nXQuery Streaming by Forest Transducers\nCovariant Bardeen Perturbation Formalism\nA Survey of Data Mining Techniques for Social Media Analysis\nFoundations of biology\nLearning Document-Level Semantic Properties from Free-Text Annotations\nConstructing Reference Sets from Unstructured, Ungrammatical Text\nCentrality-as-Relevance: Support Sets and Similarity as Geometric  Proximity\nReduction in mechanical systems with symmetry\nCritical points and symmetries of a free energy function for biaxial  nematic liquid crystals\nNegative index Jacobi forms and quantum modular forms\nDetecting Cohesive and 2-mode Communities in Directed and Undirected  Networks\nMicroeconomic Structure determines Macroeconomic Dynamics. 
Aoki defeats  the Representative Agent\nSERPent: Automated reduction and RFI-mitigation software for e-MERLIN\nConsequences of the Lakshmibai-Sandhya Theorem: the ubiquity of  permutation patterns in Schubert calculus and related geometry\nAtmospheric parameters and chemical properties of red giants in the  CoRoT asteroseismology fields\nA really simple approximation of smallest grammar\nHeap Abstractions for Static Analysis\nIntegrated Data Acquisition, Storage, Retrieval and Processing Using the  COMPASS DataBase (CDB)\nOn redundancy of memoryless sources over countable alphabets\nWorst-case Throughput Analysis for Parametric Rate and Parametric Actor  Execution Time Scenario-Aware Dataflow Graphs\nA Family of Descriptive Approaches To Preferred Answer Sets\nBe In The Know: Connecting News Articles to Relevant Twitter  Conversations\nHow to Ask for a Favor: A Case Study on the Success of Altruistic  Requests\nResource Usage Analysis of Logic Programs via Abstract Interpretation  Using Sized Types\nUndecidable propositional bimodal logics and one-variable first-order  linear temporal logics with counting\nDissimilarity-based Sparse Subset Selection\nDouble Counting in $2^t$-ary RSA Precomputation Reveals the Secret  Exponent\nEnhanced usage of keys obtained by physical, unconditionally secure  distributions\nVisibly Pushdown Modular Games\nPolarization Measurement of High Dimensional Social Media Messages With  Support Vector Machine Algorithm Using Mapreduce\nComprehensive and Macrospin-Based Magnetic Tunnel Junction Spin Torque  Oscillator Model - Part I: Analytical Model of the MTJ STO\nConverse Lyapunov theorems for discrete-time linear switching systems  with regular switching sequences\nPhenotypic Plasticity, the Baldwin Effect, and the Speeding up of  Evolution: the Computational Roots of an Illusion\nBulk Locality and Quantum Error Correction in AdS/CFT\nMillisecond Pulsars in Close Binaries\nRelational semantics of linear logic and higher-order 
model-checking\nNilpotent integrability, reduction of dynamical systems and a  third-order Calogero-Moser system\nRepresenting Objects, Relations, and Sequences\nRecurrent Neural Network Training with Dark Knowledge Transfer\nSynthesis through Unification\nPolynomially Low Error PCPs with polyloglog n Queries via Modular  Composition\nA model building framework for Answer Set Programming with external  computations\nProgrammatic and Direct Manipulation, Together at Last\nWhat are essential concepts about networks?\nTag-Weighted Topic Model For Large-scale Semi-Structured Documents\nEfficient Ranking of Lyndon Words and Decoding Lexicographically Minimal  de Bruijn Sequence\nRegular Symmetry Patterns (Technical Report)\nGenerating Text with Deep Reinforcement Learning\nRevisiting noninteracting string partition functions in Rindler space\nMulti-task Sequence to Sequence Learning\nRecurrent Models for Auditory Attention in Multi-Microphone Distance  Speech Recognition\nConducting sparse feature selection on arbitrarily long phrases in text  corpora with a focus on interpretability\nAdding Gradient Noise Improves Learning for Very Deep Networks\nAsk Me Anything: Free-form Visual Question Answering Based on Knowledge  from External Sources\nOptimizing Solution Quality in Synchronization Synthesis\nOn the étale homotopy type of higher stacks\nLearning with Memory Embeddings\nTowards Universal Paraphrastic Sentence Embeddings\nApproximating Optimal Bounds in Prompt-LTL Realizability in  Doubly-exponential Time\nSolving Diophantine Equations\nThe Complexity of Synchronizing Markov Decision Processes\nTextProposals: a Text-specific Selective Search Algorithm for Word  Spotting in the Wild\nEffortless Data Exploration with zenvisage: An Expressive and  Interactive Visual Analytics System\nMatch-SRNN: Modeling the Recursive Matching Structure with Spatial RNN\nStability and chaos in Kustaanheimo-Stiefel space induced by the Hopf  fibration\nMinimizing Expected Cost Under 
Hard Boolean Constraints, with  Applications to Quantitative Synthesis\nComputational Higher Type Theory I: Abstract Cubical Realizability\nUniversal Indexes for Highly Repetitive Document Collections\nWorld Knowledge as Indirect Supervision for Document Clustering\nSynchronizing Automata with Extremal Properties\nSerendipity and strategy in rapid innovation\nA General Framework for Static Profiling of Parametric Resource Usage\nSimilarity Search on Automata Processors\nMonochromatic factorisations of words and periodicity\nThe XML and Semantic Web Worlds: Technologies, Interoperability and  Integration. A Survey of the State of the Art\nCohesion and Coalition Formation in the European Parliament: Roll-Call  Votes and Twitter Activities\nClass Invariants: Concepts, Problems, Solutions\nSome consequences of a recursive number-theoretic relation that is not  the standard interpretation of any of its formal representations\nMark My Words! Linguistic Style Accommodation in Social Media\nTverberg's theorem and graph coloring\nMapping Relational Operations onto Hypergraph Model\nAn hbar-expansion of the Toda hierarchy: a recursive construction of  solutions\nLazy Pointer Analysis\nThe Non-Archimedean Theory of Discrete Systems\nThe FuturICT Education Accelerator\nLife Before Earth\nA Semantics for Approximate Program Transformations\nPermutation 2-groups I: structure and splitness\nSublinear Matching With Finite Automata Using Reverse Suffix Scanning\nTHELI -- Convenient reduction of any optical, near- and mid-infrared  imaging data\nLinking Rigid Bodies Symmetrically\nAnalysis of Watson's Strategies for Playing Jeopardy!\nBreaking a monad-comonad symmetry between computational effects\nIntelligent User Interface in Fuzzy Environment\nGraphX: Unifying Data-Parallel and Graph-Parallel Analytics\nQuantum Pushdown Automata with a Garbage Tape\nParticles, waves and trajectories: 210 years after Young's experiment\nLearning Natural Coding Conventions\nRegular path queries on 
graphs with data: A rigid approach\nImproving Collaborative Filtering based Recommenders using Topic  Modelling\nData Definitions in the ACL2 Sedan\nTyped Hilbert Epsilon Operators and the Semantics of Determiner Phrases  (Invited Lecture)\nSpiral Galaxies - classical description of spiral arms and rotational  velocity pattern - toy model\nCrowd-Sourcing Fuzzy and Faceted Classification for Concept Search\nBuilding DNN Acoustic Models for Large Vocabulary Speech Recognition\nNeural Machine Translation by Jointly Learning to Align and Translate\nBypassing Captcha By Machine A Proof For Passing The Turing Test\nGround state connectivity of local Hamiltonians\nSequence to Sequence Learning with Neural Networks\nSandboxing for Software Transactional Memory with Deferred Updates\nAn agent-driven semantical identifier using radial basis neural networks  and reinforcement learning\nTwisted chiral de Rham complex, generalized geometry, and T-duality\nCommunication complexity and the reality of the wave-function\nDistributed Protocols and Heterogeneous Trust: Technical Report\nEmbedding Entities and Relations for Learning and Inference in Knowledge  Bases\nLineCAPTCHA Mobile: A User Friendly Replacement for Unfriendly Reverse  Turing Tests for Mobile Devices (ICIAfS14)\nAlgebraic synchronization criterion and computing reset words\nApplying deep learning techniques on medical corpora from the World Wide  Web: a prototypical system and evaluation\nA $(1 + {\\varepsilon})$-Embedding of Low Highway Dimension Graphs into  Bounded Treewidth Graphs\nRule-and Dictionary-based Solution for Variations in Written Arabic  Names in Social Networks, Big Data, Accounting Systems and Large Databases\nWhen Are Tree Structures Necessary for Deep Learning of Representations?\nDecidable Horn Systems with Difference Constraints Arithmetic\nEfficiently intertwining widening and narrowing\nNonparametric Relational Topic Models through Dependent Gamma Processes\nQuantum theory of measurements as 
quantum decision theory\nOpen systems dynamics: Simulating master equations in the computer\nTimed Orchestration for Component-based Systems\nOptimal Shuffle Code with Permutation Instructions\nNodal Discontinuous Galerkin Simulations for Reverse-Time Migration on  GPU Clusters\nSpatial Symmetry Driven Pruning Strategies for Efficient Declarative  Spatial Reasoning\nComparing and evaluating extended Lambek calculi\nLCSTS: A Large Scale Chinese Short Text Summarization Dataset\nSolving two-dimensional density classification problem with two  probabilistic cellular automata\nResponse Operators for Markov Processes in a Finite State Space: Radius  of Convergence and Link to the Response Theory for Axiom A Systems\nWYSIWYE: An Algebra for Expressing Spatial and Textual Rules for Visual  Information Extraction\nLearning from Real Users: Rating Dialogue Success with Neural Networks  for Reinforcement Learning in Spoken Dialogue Systems\nSuperspace Unitary Operator in Superfield Approach to Non-Abelian Gauge  Theory with Dirac Fields\nSelecting Relevant Web Trained Concepts for Automated Event Retrieval\nNeural-based machine translation for medical text domain. 
Based on  European Medicines Agency leaflet texts\nA Restricted Visual Turing Test for Deep Scene and Event Understanding\nInhomogeneous field theory inside the arctic circle\nScalable Package Queries in Relational Database Systems\nEdge instabilities of topological superconductors\nQuantifying Public Response towards Islam on Twitter after Paris Attacks\nProbabilistic Programming with Gaussian Process Memoization\nThe Impact of Technical Domain Expertise on Search Behavior and Task  Outcome\nModeling Variations of First-Order Horn Abduction in Answer Set  Programming\nWrite a Classifier: Predicting Visual Classifiers from Unstructured Text\nEvoGrader: an online formative assessment tool for automatically  evaluating written evolutionary explanations\nTrueHappiness: Neuromorphic Emotion Recognition on TrueNorth\nBaby Steps in Quantum Ring Theory: towards a background independent  framework for Quantum Gravity\nSentiment Analysis of Twitter Data: A Survey of Techniques\nIntelligent Conversational Bot for Massive Online Open Courses (MOOCs)\nThe X-ray/radio and UV luminosity expected from symbiotic systems as the  progenitor of SNe Ia\nMulti-Object Reasoning with Constrained Goal Models\nLorentz Constraints on Massive Three-Point Amplitudes\nSpatial Concept Acquisition for a Mobile Robot that Integrates  Self-Localization and Unsupervised Word Discovery from Spoken Sentences\nShelah's eventual categoricity conjecture in universal classes. 
Part II\nEndless love: On the termination of a playground number game\nControl-Flow Integrity: Precision, Security, and Performance\nConstructive Patterns of Logical Truth\nLabel Noise Reduction in Entity Typing by Heterogeneous Partial-Label  Embedding\nAutomatic Verification of Iterated Separating Conjunctions using  Symbolic Execution\nIf-Conversion Optimization using Neuro Evolution of Augmenting  Topologies\nOn point processes in multitarget tracking\nImage Captioning and Visual Question Answering Based on Attributes and  External Knowledge\nThe Road to Popularity: The Dilution of Growing Audience on Twitter\nDiagonal Unloading Beamforming for Source Localization\nExtremes and Recurrence in Dynamical Systems\nAssessment of Effectiveness of Content Models for Approximating Twitter  Social Connection Structures\nAdvanced geometrical constructs in a Pueblo ceremonial site, c 1200 CE\nTemporal Topic Modeling to Assess Associations between News Trends and  Infectious Disease Outbreaks\nApplication-Driven Near-Data Processing for Similarity Search\nActive Discriminative Text Representation Learning\nUsing Word Embeddings in Twitter Election Classification\nIs a Picture Worth Ten Thousand Words in a Review Dataset?\nExact gradient updates in time independent of output size for the  spherical loss family\nFinite time blowup for Lagrangian modifications of the three-dimensional  Euler equation\nOptimising The Input Window Alignment in CD-DNN Based Phoneme  Recognition for Low Latency Processing\nIncremental Quantitative Analysis on Dynamic Costs\nAn Active RBSE Framework to Generate Optimal Stimulus Sequences in a BCI  for Spelling\nLa representación de la variación contextual mediante definiciones  terminológicas flexibles\nAutomated Prediction of Temporal Relations\nTopicResponse: A Marriage of Topic Modelling and Rasch Modelling for  Automatic Measurement in MOOCs\nFusion basis for lattice gauge theory and loop quantum gravity\nCRTS: A type system for representing 
clinical recommendations\nAsk the GRU: Multi-Task Learning for Deep Text Recommendations\nDetecting Singleton Review Spammers Using Semantic Similarity\nMulti-view Dimensionality Reduction for Dialect Identification of Arabic  Broadcast Speech\nThe Complexity of Flat Freeze LTL\nA Theory of Interactive Debugging of Knowledge Bases in Monotonic Logics\nOCR++: A Robust Framework For Information Extraction from Scholarly  Articles\nOrganized Complexity: is Big History a Big Computation?\nPolynomial Time Corresponds to Solutions of Polynomial Ordinary  Differential Equations of Polynomial Length (Journal version)\nMonaural Multi-Talker Speech Recognition using Factorial Speech  Processing Models\nDiverse Beam Search: Decoding Diverse Solutions from Neural Sequence  Models\nJoint Resource Bidding and Tipping Strategies in Multi-hop Cognitive  Networks\nPossible regular phenomena in EXO 2030+375\nAcoustic Reflector Localization: Novel Image Source Reversion and Direct  Localization Methods\nEnd-to-End Training Approaches for Discriminative Segmental Models\nProgramming Heterogeneous Systems from an Image Processing DSL\nFEAST: An Automated Feature Selection Framework for Compilation Tasks\nLearning to superoptimize programs\nContradiction Detection for Rumorous Claims\nFractal Analysis Based on Hierarchical Scaling in Complex Systems\nGeometric deep learning: going beyond Euclidean data\nA Computational Non-Commutative Geometry Program for Disordered  Topological Insulators\nImproved Image Captioning via Policy Gradient optimization of SPIDEr\nZipf's law, unbounded complexity and open-ended evolution\nEchoWear: Smartwatch Technology for Voice and Speech Treatments of  Patients with Parkinson's Disease\n\"What is Relevant in a Text Document?\": An Interpretable Machine  Learning Approach\nAbelian-Square-Rich Words\nsTools - a data reduction pipeline for the GREGOR Fabry-Pérot  Interferometer and the High-resolution Fast Imager at the GREGOR solar  telescope\nJust an 
Update on PMING Distance for Web-based Semantic Similarity in  Artificial Intelligence and Data Mining\nPair production processes and flavor in gauge-invariant perturbation  theory\nAttention-Based Multimodal Fusion for Video Description\nTowards Automatic Learning of Heuristics for Mechanical Transformations  of Procedural Code\nLeptoquark mechanism of neutrino masses within the grand unification  framework\nDeep Learning the Indus Script\nMining User/Movie Preferred Features Based on Reviews for Video  Recommendation System\nLatent Room-Temperature T$_c$ in Cuprate Superconductors\nSampling strategies for fast updating of Gaussian Markov random fields\nStability of Topic Modeling via Matrix Factorization\nFrom Complex Event Processing to Simple Event Processing\nA Monadic Framework for Relational Verification: Applied to Information  Security, Program Equivalence, and Optimizations\nQuery Expansion Based on Crowd Knowledge for Code Search\nCats and Captions vs. Creators and the Clock: Comparing Multimodal  Content to Context in Predicting Relative Popularity\nCustomer Lifetime Value Prediction Using Embeddings\nForaging patterns in online searches\nMetaPAD: Meta Pattern Discovery from Massive Text Corpora\nToward a Formal Model of Cognitive Synergy\nMulti-talker Speech Separation with Utterance-level Permutation  Invariant Training of Deep Recurrent Neural Networks\nSubset Synchronization in Monotonic Automata\nFlare: Native Compilation for Heterogeneous Workloads in Apache Spark\nShingle 2.0: generalising self-consistent and automated domain  discretisation for multi-scale geophysical models\nDoes Outside-In Teaching Improve the Learning of Object-Oriented  Programming?\nTowards Building Large Scale Multimodal Domain-Aware Conversation  Systems\nDualGAN: Unsupervised Dual Learning for Image-to-Image Translation\nTowards Industry 4.0: Gap Analysis between Current Automotive MES and  Industry Standards using Model-Based Requirement Engineering\nGang-GC: 
Locality-aware Parallel Data Placement Optimizations for  Key-Value Storages\nRepresentation Stability as a Regularizer for Improved Text Analytics  Transfer Learning\nThe Conway Moonshine Module is a Reflected K3 Theory\nFML-based Prediction Agent and Its Application to Game of Go\nLearning to Reason: End-to-End Module Networks for Visual Question  Answering\nRecognizability for sequences of morphisms\nSonata: Query-Driven Network Telemetry\nMultigraded Hilbert Series of noncommutative modules\nMaximizing the information learned from finite data selects a simple  model\nPixie: A heterogeneous Virtual Coarse-Grained Reconfigurable Array for  high performance image processing applications\nExploring Latent Semantic Factors to Find Useful Product Reviews\nOn the role of words in the network structure of texts: application to  authorship attribution\nA Finitary Analogue of the Downward Löwenheim-Skolem Property\nEngineering Record And Replay For Deployability: Extended Technical  Report\nMatching Logic\nLearning Semantic Relatedness From Human Feedback Using Metric Learning\nDiscontinuous Hamiltonian Monte Carlo for models with discrete  parameters and discontinuous likelihoods\nAccelerating Neural Architecture Search using Performance Prediction\nRenormalization and Coarse-graining of Loop Quantum Gravity\nAttention Is All You Need\nExploring Code Clones in Programmable Logic Controller Software\nRELink: A Research Framework and Test Collection for Entity-Relationship  Retrieval\nS-Net: From Answer Extraction to Answer Generation for Machine Reading  Comprehension\nAlgebras of linear growth and the dynamical Mordell-Lang conjecture\nJointly Learning Word Embeddings and Latent Topics\nEnd-to-end Conversation Modeling Track in DSTC6\nImage Processing in Floriculture Using a robotic Mobile Platform\nCombinatorics and Topology of Kawai-Lewellen-Tye Relations\nStealthy Deception Attacks Against SCADA Systems\nTableaux for Policy Synthesis for MDPs with PCTL* 
Constraints\nReal-time colouring and filtering with graphics shaders\nHidden-Markov-Model Based Speech Enhancement\nDarkRank: Accelerating Deep Metric Learning via Cross Sample  Similarities Transfer\nLook Who's Talking: Bipartite Networks as Representations of a Topic  Model of New Zealand Parliamentary Speeches\nThe Case for Being Average: A Mediocrity Approach to Style Masking and  Author Obfuscation\nBridging Static and Dynamic Program Analysis using Fuzzy Logic\nToward Incorporation of Relevant Documents in word2vec\nAutOMP: An Automatic OpenMP Parallelization Generator for  Variable-Oriented High-Performance Scientific Codes\nShotgunWSD: An unsupervised algorithm for global word sense  disambiguation inspired by DNA sequencing\nOpening the black box of energy modelling: Strategies and lessons  learned\nA network approach to topic models\nOn-Stack Replacement à la Carte\n\"Is there anything else I can help you with?\": Challenges in Deploying  an On-Demand Crowd-Powered Conversational Agent\nModality-specific Cross-modal Similarity Measurement with Recurrent  Attention Network\nPolarizability Extraction for Waveguide-Fed Metasurfaces\nLarge-Scale Domain Adaptation via Teacher-Student Learning\nAbstractions for Verifying Isolation Properties in Stateful Networks\nLeveraging Deep Neural Network Activation Entropy to cope with Unseen  Data in Speech Recognition\nA Semi-Supervised Approach to Detecting Stance in Tweets\nTowards Runtime Adaptation of Actor Systems\nAssessing State-of-the-Art Sentiment Models on State-of-the-Art  Sentiment Datasets\nFoundations of Complex Event Processing\nNonnegative HMM for Babble Noise Derived from Speech HMM: Application to  Speech Enhancement\nWord Vector Enrichment of Low Frequency Words in the Bag-of-Words Model  for Short Text Multi-class Classification Problems\nUnsupervised Machine Learning for Networking: Techniques, Applications  and Research Challenges\nAnalyzing users' sentiment towards popular consumer industries and  
brands on Twitter\nA Symbolic Approach to Safety LTL Synthesis\nMRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product  Embeddings\nDeepSafe: A Data-driven Approach for Checking Adversarial Robustness in  Neural Networks\nEnumeration Problems for Regular Path Queries\nInfinitesimal deformations of Poisson bi-vectors using the Kontsevich  graph calculus\nNeural Program Meta-Induction\n(3+1)-Dimensional Topologically Massive 2-form Gauge Theory: Geometrical  Superfield Approach\nPaxos Made EPR: Decidable Reasoning about Distributed Protocols\nThe Emptiness Problem for Valence Automata over Graph Monoids\nProgram Synthesis using Abstraction Refinement\nWorldlines and worldsheets for non-abelian lattice field theories:  Abelian color fluxes and Abelian color cycles\nProbabilistic Couplings for Probabilistic Reasoning\nSequence-to-Sequence ASR Optimization via Reinforcement Learning\nLearning K-way D-dimensional Discrete Code For Compact Embedding  Representations\nWeakly-supervised Relation Extraction by Pattern-enhanced Embedding  Learning\nThe Exoplanet Simple Orbit Fitting Toolbox (ExoSOFT): An Open-Source  Tool for Efficient Fitting of Astrometric and Radial Velocity Data\nSkipFlow: Incorporating Neural Coherence Features for End-to-End  Automatic Text Scoring\nDual-Path Convolutional Image-Text Embedding\nGame Characterization of Probabilistic Bisimilarity, and Applications to  Pushdown Automata\nCache-based Document-level Neural Machine Translation\nDon't Just Assume; Look and Answer: Overcoming Priors for Visual  Question Answering\nFormal Representation of SysML/KAOS Domain Model (Complete Version)\nAutomated Refactoring of Nested-IF Formulae in Spreadsheets\nVisual Text Correction\nAre the different layers of a social network conveying the same  information?\nA Survey on Compiler Autotuning using Machine Learning\nSequences, yet Functions: The Dual Nature of Data-Stream Processing\nHandwriting Trajectory Recovery using End-to-End Deep 
Encoder-Decoder  Network\nLensing of Fast Radio Bursts by Binaries to Probe Compact Dark Matter\nDKN: Deep Knowledge-Aware Network for News Recommendation\nTell-and-Answer: Towards Explainable Visual Question Answering using  Attributes and Captions\nOpen Material Property Library With Native Simulation Tool Integrations  -- MASTO\nOn-the-fly Detection of Autogenerated Tweets\nEffective Quantization Approaches for Recurrent Neural Networks\nReinforceWalk: Learning to Walk in Graph with Monte Carlo Tree Search\nTVM: End-to-End Optimization Stack for Deep Learning\nNtMalDetect: A Machine Learning Approach to Malware Detection Using  Native API System Calls\nHWNet v2: An Efficient Word Image Representation for Handwritten  Documents\nBorel Kernels and their Approximation, Categorically\nExpert Finding in Heterogeneous Bibliographic Networks with  Locally-trained Embeddings\nSpatial Graph Convolutions for Drug Discovery\nCombining Symbolic Execution and Model Checking to Verify MPI Programs\nLearning Eligibility in Cancer Clinical Trials using Deep Neural  Networks\nClipping free attacks against artificial neural networks\nPerformance evaluation and hyperparameter tuning of statistical and  machine-learning models using spatial data\nReal Time Sentiment Change Detection of Twitter Data Streams\nAn Extended Low Fat Allocator API and Applications\nReactive Control Improvisation\nTree Structures: A Variational Approach to Shannon--Wiener Information\nLow-resolution spectroscopy of main sequence stars belonging to 12  Galactic globular clusters. I. 
CH and CN band strength variations\nKepler's Orbits and Special Relativity in Introductory Classical  Mechanics\nExpressibility in the Lambda Calculus with Letrec\nThe homotopy theory of diffeological spaces\nThe Classification of Two-Dimensional Extended Topological Field  Theories\nTowards the most general scalar-tensor theories of gravity: a unified  approach in the language of differential forms\nMassive migration from the steppe is a source for Indo-European  languages in Europe\nFormation {à} distance et outils num{é}riques pour l'enseignement  sup{é}rieur et la recherche en Asie-Pacifique (Cambodge, Laos, Vietnam).  Partie 02 : recommandations et feuille de route\nThe Effective Electroweak Chiral Lagrangian: The Matter Sector\nHecke Algebras, SVD, and Other Computational Examples with {\\sc  CLIFFORD}\nStructural and Dynamical Aspects of the AdS/CFT Correspondence: a  Rigorous Approach\nEvolution of magnetic fields in galaxies and future observational tests  with the Square Kilometre Array\nOptics for X-ray telescopes: analytical treatment of the off-axis  effective area of mirrors in optical modules\nThe MeqTrees software system and its use for third-generation  calibration of radio interferometers\nResource Bounded Measure\nTaming Numbers and Durations in the Model Checking Integrated Planning  System\nObligation Blackwell Games and p-Automata\nQuantum cohomology and toric minimal model programs\nEdge state inner products and real-space entanglement spectrum of trial  quantum Hall states\nGeneralization of the Menger's Theorem to Simplicial Complexes and  Certain Invariants of the Underlying Topological Spaces\nOrthogonal non-Gaussianity in DBI Galileon: prospect for Planck  polarisation and post-Planck experiments\nA Constraint Satisfaction Framework for Executing Perceptions and  Actions in Diagrammatic Reasoning\nExact and Approximate Determinization of Discounted-Sum Automata\nThe Mathematical Abstraction Theory, The Fundamentals for Knowledge  
Representation and Self-Evolving Autonomous Problem Solving Systems\nGaps, Rings, and Non-Axisymmetric Structures in Protoplanetary Disks -  From Simulations to ALMA Observations\nEnabling High-Level Application Development for the Internet of Things\nScheme theoretic tropicalization\nTowards Scalable Synthesis of Stochastic Control Systems\nA Novel Framework for Robustness Analysis of Visual QA Models\nDataflow Matrix Machines and V-values: a Bridge between Programs and  Neural Nets\nDistributions and Euler systems for the general linear group\nOne Deep Music Representation to Rule Them All? : A comparative analysis  of different representation learning strategies\nToric varieties and minimal complexes\nSimilarity-Based Estimation of Word Cooccurrence Probabilities\nParsing of Spoken Language under Time Constraints\nKnowledge Representation for Lexical Semantics: Is Standard First Order  Logic Enough?\nA syntax-based part-of-speech analyser\nUsing a Corpus for Teaching Turkish Morphology\nPrepositional Phrase Attachment through a Backed-Off Model\nConserving Fuel in Statistical Language Learning: Predicting Data  Requirements\nSyntactic Analyses for Parallel Grammars: Auxiliaries and Genitive NPs\nNoun-Phrase Analysis in Unrestricted Text for Information Retrieval\nAn Empirical Study of Smoothing Techniques for Language Modeling\nTSNLP - Test Suites for Natural Language Processing\nMorphological Analysis as Classification: an Inductive-Learning Approach\nAlgorithms for Speech Recognition and Language Processing\nHow much has information technology contributed to linguistics?\nIntonational Boundaries, Speech Repairs and Discourse Markers: Modeling  Spoken Dialog\nText Segmentation Using Exponential Models\nStructure and Ostension in the Interpretation of Discourse Deixis\nDIA-MOLE: An Unsupervised Learning Approach to Adaptive Dialogue Models  for Spoken Dialogue Systems\nexplanation-based learning of data oriented parsing\nIntegrating a Lexical Database and a 
Training Collection for Text  Categorization\nUsing WordNet to Complement Training Information in Text Categorization\nHierarchical Non-Emitting Markov Models\nTreatment of Epsilon-Moves in Subset Construction\nWord-to-Word Models of Translational Equivalence\nGraph Interpolation Grammars as Context-Free Automata\nParallel Strands: A Preliminary Investigation into Mining the Web for  Bilingual Text\nDeriving the Predicate-Argument Structure for a Free Word Order Language\nPrefix Probabilities from Stochastic Tree Adjoining Grammars\nEvaluation of the NLP Components of the OVIS2 Spoken Dialogue System\nThe Alma Project, or How First-Order Logic Can Help Us in Imperative  Programming\nVerbal Interactions in Virtual Worlds\nSintesi di algoritmi con SKY\nAn Integrated Development Environment for Declarative Multi-Paradigm  Programming\nModels and Tools for Collaborative Annotation\nTransformations of Logic Programs with Goals as Arguments\nInformation Revolution\nCombining Independent Modules to Solve Multiple-choice Synonym and  Analogy Problems\nEmbedding Web-based Statistical Translation Models in Cross-Language  Information Retrieval\nA Tribute to Alain Colmerauer\nCHR Grammars\nLogic Column 10: Specifying Confidentiality\nVers un environnement multi personnalites pour la configuration et le  deploiement d'applications a base de composants logiciels\nToward a Human-Centered Uml for Risk Analysis\nCombining Independent Modules in Lexical Multiple-Choice Problems\nOn the Complexity of Nonrecursive XQuery and Functional Query Languages  on Complex Values\nChecking C++ Programs for Dimensional Consistency\nObject-Oriented Modeling of Programming Paradigms\nOn the tree-transformation power of XSLT\nPackrat Parsing: Simple, Powerful, Lazy, Linear Time\nAn Internet Framework to Bring Coherence between WAP and HTTP Ensuring  Better Mobile Internet Security\nDesign Strategies and Knowledge in Object-Oriented Programming: Effects  of Experience\nUsers' participation to the 
design process in an Open Source Software  online community\nLogic Meets Algebra: the Case of Regular Languages\nSupersymmetries and Gauge Natural Theories\nNotes on 2D Conformal Field Theory and String Theory\nSemiinfinite cohomology of Tate Lie algebras\nElementary equivalence versus Isomorphism\nBunches of cones in the divisor class group -- A new combinatorial  language for toric varieties\nModel theoretic reformulation of the Baum-Connes and Farrell-Jones  conjectures\nPattern avoiding permutations are context-sensitive\nUniqueness of quantization of complex contact manifolds\nOn the equivalence of two quantifier elimination tests\nBounded fitness landscapes and the evolution of the linguistic diversity\nA functional quantum programming language\nOpinion Dynamics and Sociophysics\nOn the Hopcroft's minimization algorithm\nAbstract numeration systems on bounded languages and multiplication by a  constant\nRational subsets of polycyclic monoids and valence automata\nAn approximation trichotomy for Boolean #CSP\nLet's get the student into the driver's seat\nOntology and Formal Semantics - Integration Overdue\nModular Compilation of a Synchronous Language\nThe topology of syntax relations of a formal language\nMusic, Complexity, Information\nHighly Undecidable Problems about Recognizability by Tiling Systems\nKruskal's theorem\nA Lightweight Combination of Semantics for Non-deterministic Functions\nFormally Specifying and Proving Operational Aspects of Forensic Lucid in  Isabelle\nFragments of first-order logic over infinite words\nFunction Interface Models for Hardware Compilation: Types, Signatures,  Protocols\nBounded Languages Meet Cellular Automata with Sparse Communication\nWord Sense Disambiguation Using English-Spanish Aligned Phrases over  Comparable Corpora\nOn Pebble Automata for Data Languages with Decidable Emptiness Problem\nBuilding a Vietnamese Language Query Processing Framework for ELibrary  Searching Systems\nLexical evolution rates by automated 
stability measure\nDocument Searching System based on Natural Language Query Processing for  Vietnam Open Courseware Library\nSentence Simplification Aids Protein-Protein Interaction Extraction\nDeriving Relationship Between Semantic Models - An Approach for cCSP\nPolychronous Interpretation of Synoptic, a Domain Specific Modeling  Language for Embedded Flight-Software\nCloud Process Execution Engine - Evaluation of the Core Concepts\nProceedings Eighth Workshop on Quantitative Aspects of Programming  Languages\nAutomated co-evolution of GMF editor models\nComparative Studies of Programming Languages; Course Lecture Notes\nCHR(PRISM)-based Probabilistic Logic Learning\nDecidability properties for fragments of CHR\nSimplifying Complex Software Assembly: The Component Retrieval Language  and Implementation\nA Model of Cooperative Threads\nVector bundles on elliptic curves and factors of automorphy\nFuzzy Ontology Representation using OWL 2\nSmart Bengali Cell Phone Keypad Layout\nIntegration of Agile Ontology Mapping towards NLP Search in I-SOAS\nContinuation-Passing C: compiling threads to events through  continuations\nThe growth function of S-recognizable sets\nQuantum picturalism for topological cluster-state computing\nA decompilation of the pi-calculus and its application to termination\nThe effect of linguistic constraints on the large scale organization of  language\nThe Language Features and Architecture of B-Prolog\nNested Refinements for Dynamic Languages\nA Data Mining view on Class Room Teaching Language\nThe Complexity of Mean-Payoff Automaton Expression\nConditional Elimination through Code Duplication\nAnswer Set Planning Under Action Costs\nEntropy of Telugu\nRascal: From Algebraic Specification to Meta-Programming\nProceedings Ninth Workshop on Quantitative Aspects of Programming  Languages\nEvent-Clock Automata: From Theory to Practice\nReasoning in the OWL 2 Full Ontology Language using First-Order  Automated Theorem Proving\nLinearization of CIF 
Through SOS\nSome Measurements of Nullable and Non-Nullable Parameter Declarations in  Relation to Software Malleability\nMELT - a Translated Domain Specific Language Embedded in the GCC  Compiler\nRule based Part of speech Tagger for Homoeopathy Clinical realm\nWell-typed Islands Parse Faster\nOrganizing the Aggregate: Languages for Spatial Computing\nModelling Social Structures and Hierarchies in Language Evolution\nOn Global Types and Multi-Party Session\nObservability of Turing Machines: a Refinement of the Theory of  Computation\nProceedings 6th Workshop on Logical and Semantic Frameworks with  Applications\nThe Power of Linear Programming for Valued CSPs\nMultilingual Topic Models for Unaligned Text\nDynamic Verification for File Safety of Multithreaded Programs\nC to O-O Translation: Beyond the Easy Stuff\nThe Complexity of Learning Principles and Parameters Grammars\nProceedings 10th Workshop on Quantitative Aspects of Programming  Languages and Systems\nTowards the Formal Specification and Verification of Maple Programs\nDistinct word length frequencies: distributions and symbol entropies\nThe abelian complexity of the paperfolding word\nOne-Way Reversible and Quantum Finite Automata with Advice\nEvaluation of some Information Retrieval models for Gujarati Ad hoc  Monolingual Tasks\nOstrowski Numeration and the Local Period of Sturmian Words\nDesign of English-Hindi Translation Memory for Efficient Translation\nA Rule-Based Approach For Aligning Japanese-Spanish Sentences From A  Comparable Corpora\nJooFlux : modification de code à chaud et injection d'aspects  directement dans une JVM 7\nThe Twitter of Babel: Mapping World Languages through Microblogging  Platforms\nBuilding a reordering system using tree-to-string hierarchical model\nIntegers in number systems with positive and negative quadratic Pisot  base\nA Classification of Adjectives for Polarity Lexicons Enhancement\nThe Story of Telebrain: A multi-performer telematic platform for  
performatization\nDetermination through Universals: An Application of Category Theory in  the Life Sciences\nA Case Study in Coordination Programming: Performance Evaluation of  S-Net vs Intel's Concurrent Collections\nThe Quantum Challenge in Concept Theory and Natural Language Processing\nFrom Principles to Practice with Class in the First Year\nCompetency Tracking for English as a Second or Foreign Language Learners\nArtificial Intelligence MArkup Language: A Brief Tutorial\nPhysics as a Mechanism for Including ELLs in Classroom Discourse\nSingle-tape and Multi-tape Turing machines through the lens of the  Grossone methodology\nAn Overview of Nominal-Typing versus Structural-Typing in OOP\nPreliminary Notes on Termination and Non-Termination Reasoning\nColourful Language: Measuring Word-Colour Associations\nGoogle Scholar Metrics evolution: an analysis according to languages\nIndividual Biases, Cultural Evolution, and the Statistical Nature of  Language Universals: The Case of Colour Naming Systems\nImpact of Indentation in Programming\nSemantic Types, Lexical Sorts and Classifiers\nFactorial Hidden Markov Models for Learning Representations of Natural  Language\nMultilingual Distributed Representations without Word Alignment\nThe confidence interval methods in quantum language\nLearning Multilingual Word Representations using a Bag-of-Words  Autoencoder\nImproving Performance Of English-Hindi Cross Language Information  Retrieval Using Transliteration Of Query Terms\nMultilingual Part-of-Speech Tagging: Two Unsupervised Approaches\nLanguage Heedless of Logic - Philosophy Mindful of What? 
Failures of  Distributive and Absorption Laws\nImNet: An Imperative Network Programming Language\nAn Intensional Concurrent Faithful Encoding of Turing Machines\nSemantics and Validation of Shapes Schemas for RDF\nExtracting a bilingual semantic grammar from FrameNet-annotated corpora\nAn Efficient Solution for Model Checking Abstract State Machine Using  Bogor\nOverview of Stemming Algorithms for Indian and Non-Indian Languages\nOn the Role of Canonicity in Bottom-up Knowledge Compilation\nFO(C) and Related Modelling Paradigms\nFO(C): A Knowledge Representation Language of Causality\nHaskell for OCaml programmers\nBuffered Simulation Games for Büchi Automata\nMotàMot project: conversion of a French-Khmer published dictionary for  building a multilingual lexical system\nMathematical Language Processing Project\nSymbolic Algorithms for Language Equivalence and Kleene Algebra with  Tests\nThe two envelopes paradox in non-Bayesian and Bayesian statistics\nUsing Mechanical Turk to Build Machine Translation Evaluation Sets\nDetecting Structural Irregularity in Electronic Dictionaries Using  Language Modeling\nRapid Adaptation of POS Tagging for Domain Specific Uses\nSCULPT: A Schema Language for Tabular Data on the Web\nImaginaries in bounded pseudo real closed fields\nStep-Indexed Logical Relations for Probability (long version)\nCommunication and games in the online foreign language educational  system. 
User behavior study\nConsensus Game Acceptors and Iterated Transductions\nImplementation of an Automatic Syllabic Division Algorithm from Speech  Files in Portuguese Language\nTurn Segmentation into Utterances for Arabic Spontaneous Dialogues and  Instance Messages\nRecursion in RDF Data Shape Languages\nBoosting Named Entity Recognition with Neural Character Embeddings\nEmbedding rationally independent languages into maximal ones\nA New Approach to Probabilistic Programming Inference\nDependency Recurrent Neural Language Models for Sentence Completion\nReversible Watson-Crick Automata\nBuzz: An Extensible Programming Language for Self-Organizing  Heterogeneous Robot Swarms\nThe Polylingual Labeled Topic Model\nEverything old is new again: Quoted Domain Specific Languages\nTowards a Decoupled Context-Oriented Programming Language for the  Internet of Things\nChoreographies, Computationally\nNoisy-parallel and comparable corpora filtering methodology for the  extraction of bi-lingual equivalent data at sentence level\nStatistical Parsing by Machine Learning from a Classical Arabic Treebank\nBound Your Models! How to Make OWL an ASP Modeling Language\nQuantum Alternation: Prospects and Problems\nMulti-lingual Geoparsing based on Machine Translation\nGrowing Wikipedia Across Languages via Recommendation\nVisual Storytelling\nEilenberg theorems for many-sorted formations\nClustering Comparable Corpora of Russian and Ukrainian Academic Texts:  Word Embeddings and Semantic Fingerprints\nA declarative JVM Language for Automated Validation\nThe Dichotomy for Conservative Constraint Satisfaction is Polynomially  Decidable\nNew word analogy corpus for exploring embeddings of Czech words\nDual Density Operators and Natural Language Meaning\nA Formal, Resource Consumption-Preserving Translation of Actors to  Haskell\nDevito: automated fast finite difference computation\nWhere are my followers? 
Understanding the Locality Effect in Twitter\nAutonomous Agents Coordination: Action Languages meet CLP(FD) and Linda\nIterated Hairpin Completions of Non-crossing Words\nTypes for X10 Clocks\nThe Automorphism Group of a Resplendent Model\nThe risks of mixing dependency lengths from sequences of different  length\nA Web-based Multilingual Intelligent Tutor System based on Jackson's  Learning Styles Profiler and Expert Systems\nAutomatic Construction and Natural-Language Description of Nonparametric  Regression Models\nWhen Learners Surpass their Sources: Mathematical Modeling of Learning  from an Inconsistent Source\nAn efficient algorithm for computing the edit distance of a regular  language via input-altering transducers\nSubword complexity and decomposition of the set of factors\nA survey on phrase structure learning methods for text classification\nModal Object Diagrams\nDelta Modeling for Software Architectures\nStatistical Patterns in Written Language\nDiverse Embedding Neural Network Language Models\nScaling Recurrent Neural Network Language Models\nA Falsification View of Success Typing\nAutomata and rational expressions\nVarieties\nEncoding Source Language with Convolutional Neural Network for Machine  Translation\nIMP with exceptions over decorated logic\nPhrase database Approach to structural and semantic disambiguation in  English-Korean Machine Translation\nImplementing an intelligent version of the classical sliding-puzzle game  for unix terminals using Golang's concurrency primitives\nProceedings Tenth International Workshop on Developments in  Computational Models\nA Definability Dichotomy for Finite Valued CSPs\nSymbolic Manipulation of Code Properties\nTrust, but Verify: Two-Phase Typing for Dynamic Languages\nTexts in, meaning out: neural language models in semantic similarity  task for Russian\nDo Multi-Sense Embeddings Improve Natural Language Understanding?\nAuthor Identification using Multi-headed Recurrent Neural Networks\nPragmatic Side 
Effects\nNoncommutative Valiant's Classes: Structure and Complete Problems\nAutomata networks model for alignment and least effort on vocabulary  formation\nFinding Function in Form: Compositional Character Models for Open  Vocabulary Word Representation\nSyntax Evolution: Problems and Recursion\nThe influence of Chunking on Dependency Crossing and Distance\nThe USFD Spoken Language Translation System for IWSLT 2014\nIteration Algebras for UnQL Graphs and Completeness for Bisimulation\nEfficient Algorithms for Morphisms over Omega-Regular Languages\nTowards a Direct, By-Need Evaluator for Dependently Typed Languages\nA Sound and Complete Hoare Logic for Dynamically-Typed, Object-Oriented  Programs -- Extended Version --\nPolish - English Speech Statistical Machine Translation Systems for the  IWSLT 2014\nReal-Time Statistical Speech Translation\nA Sentence Meaning Based Alignment Method for Parallel Text Corpora  Preparation\nPolish - English Speech Statistical Machine Translation Systems for the  IWSLT 2013\nProceedings ML Family/OCaml Users and Developers workshops\nSemantic Boolean Arabic Information Retrieval\nStrategies for Training Large Vocabulary Neural Language Models\nTechnical Report: a tool for measuring Prosodic Accommodation\nPart-of-Speech Tagging for Code-mixed Indian Social Media Text at ICON  2015\nElaborate lexicon extended language with a lot of conceptual information\nEmpirical Gaussian priors for cross-lingual transfer learning\nUnrestricted State Complexity of Binary Operations on Regular Languages\nFantastic 4 system for NIST 2015 Language Recognition Evaluation\nContextual Media Retrieval Using Natural Language Queries\nEilenberg Theorems for Free\nBioinformatics and Classical Literary Study\nEmbedding by Normalisation\nSegmentation from Natural Language Expressions\nDo You See What I Mean? 
Visual Resolution of Linguistic Ambiguities\nPrepositional Attachment Disambiguation Using Bilingual Parsing and  Alignments\nSemantic Spaces\nUniversal Dependencies for Learner English\nWord2Vec is a special case of Kernel Correspondence Analysis and Kernels  for Natural Language Processing\niStar 2.0 Language Guide\nSS4MCT: A Statistical Stemmer for Morphologically Complex Texts\nUsing stories to bridge the chasm between perspectives: How metaphors  and genres are used to share meaning\nSemi-Supervised Learning for Neural Machine Translation\nA Distributional Semantics Approach to Implicit Language Learning\nEvaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource\nA Note on Nested String Replacements\nSyntactic Phylogenetic Trees\nNeural Semantic Encoders\nEnriching Word Vectors with Subword Information\nIs spoken language all-or-nothing? Implications for future speech-based  human-machine interaction\nMining Arguments from Cancer Documents Using Natural Language Processing  and Ontologies\nCharacterizations and Effective Computation of Supremal Relatively  Observable Sublanguages\nA Large Scale Corpus of Gulf Arabic\nBounded-oscillation Pushdown Automata\nGrammatical Templates: Improving Text Difficulty Evaluation for Language  Learners\nTaming Context-Sensitive Languages with Principled Stateful Parsing\nEnhanced LSTM for Natural Language Inference\nPointer Sentinel Mixture Models\nemoji2vec: Learning Emoji Representations from their Description\nSyntactic Structures and Code Parameters\nRefinement Reflection (or, how to turn your favorite language into a  proof assistant using SMT)\nPath discovery by Querying the federation of Relational Database and RDF  Graph\nMultiactive objects and their applications\nCBAS: context based arabic stemmer\nTying Word Vectors and Word Classifiers: A Loss Framework for Language  Modeling\nCruciform: Solving Crosswords with Natural Language Processing\nMulti-Language Identification Using Convolutional Recurrent 
Neural  Network\nMixing Metaphors: Actors as Channels and Channels as Actors (Extended  Version)\nOn the quantized dynamics of factorial languages\nFormal Languages, Formally and Coinductively\nEnd-to-End Joint Learning of Natural Language Understanding and Dialogue  Manager\nCoupling Distributed and Symbolic Execution for Natural Language Queries\nMachine Reading with Background Knowledge\nA CRF Based POS Tagger for Code-mixed Indian Social Media Text\nLanguage Modeling with Gated Convolutional Networks\nWhat the Language You Tweet Says About Your Occupation\nSystems of natural-language-facilitated human-robot cooperation: A  review\nUsing English as Pivot to Extract Persian-Italian Parallel Sentences  from Non-Parallel Corpora\nSymbolic, Distributed and Distributional Representations for Natural  Language Processing in the Era of Deep Learning: a Survey\nAn introduction to singular SPDEs\nUniversal Semantic Parsing\nBilateral Multi-Perspective Matching for Natural Language Sentences\nHandwritten Arabic Numeral Recognition using Deep Learning Neural  Networks\nPerson Search with Natural Language Description\nEnabling Multi-Source Neural Machine Translation By Concatenating Source  Sentences In Multiple Languages\nJolie Static Type Checker: a prototype\nDynamic Word Embeddings for Evolving Semantic Discovery\nA Comment on \"Asking photons where they have been in plain language\"\nLanguage Use Matters: Analysis of the Linguistic Structure of Question  Texts Can Characterize Answerability in Quora\nMorphological Analysis for the Maltese Language: The Challenges of a  Hybrid System\nLearning Simpler Language Models with the Differential State Framework\nI CAN HAS SUPERCOMPUTER? 
A Novel Approach to Teaching Parallel and  Distributed Computing Concepts Using a Meme-Based Programming Language\nCharacter-Word LSTM Language Models\nSpatio-temporal Person Retrieval via Natural Language Queries\nMapping Objects to Persistent Predicates\nTowards Practical, Precise and Parametric Energy Analysis of IT  Controlled Systems\nTALL: Temporal Activity Localization via Language Query\nSupervised Learning of Universal Sentence Representations from Natural  Language Inference Data\nOn the Height of Towers of Subsequences and Prefixes\nFormalising opencypher Graph Queries in Relational Algebra\nSequential Dialogue Context Modeling for Spoken Language Understanding\nLogical and Algebraic Characterizations of Rational Transductions\nDecoding Sentiment from Distributed Representations of Sentences\nA Lightweight Regression Method to Infer Psycholinguistic Properties for  Brazilian Portuguese\nAsk the Right Questions: Active Question Reformulation with  Reinforcement Learning\nOn Multilingual Training of Neural Dependency Parsers\nSemantic Specialisation of Distributional Word Vector Spaces using  Monolingual and Cross-Lingual Constraints\nAcquisition of Translation Lexicons for Historically Unwritten Languages  via Bridging Loanwords\nVerb Physics: Relative Physical Knowledge of Actions and Objects\nCross-lingual Speaker Verification with Deep Feature Learning\nDocument Spanners for Extracting Incomplete Information: Expressiveness  and Complexity\nN-GrAM: New Groningen Author-profiling Model\nQuon language: surface algebras and Fourier duality\nA Web-Based Tool for Analysing Normative Documents in English\nAutomatic Speech Recognition with Very Large Conversational Finnish and  Estonian Vocabularies\nComputing LPMLN Using ASP and MLN Solvers\nExploring Neural Transducers for End-to-End Speech Recognition\nImage Pivoting for Learning Multilingual Multimodal Representations\nDual Rectified Linear Units (DReLUs): A Replacement for Tanh Activation  Functions in 
Quasi-Recurrent Neural Networks\nA mathematically derived definitional/semantical theory of truth\nRevisiting Activation Regularization for Language RNNs\nRecursive Whitening Transformation for Speaker Recognition on Language  Mismatched Condition\nRegularizing and Optimizing LSTM Language Models\nCross-lingual Entity Alignment via Joint Attribute-Preserving Embedding\nA Semiotics-inspired Domain-Specific Modeling Language for Complex Event  Processing Rules\nFuture Word Contexts in Neural Network Language Models\nPatterns versus Characters in Subword-aware Neural Language Modeling\nGetting Reliable Annotations for Sarcasm in Online Dialogues\nLanguage Modeling by Clustering with Word Embeddings for Text  Readability Assessment\nMK-fuzzy Automata and MSO Logics\nCross-lingual Word Segmentation and Morpheme Segmentation as Sequence  Labelling\nOn Decidability of the Ordered Structures of Numbers\nLanguage Modeling with Highway LSTM\nUsing objective words in the reviews to improve the colloquial arabic  sentiment analysis\nImproving a Multi-Source Neural Machine Translation Model with Corpus  Extension for Low-Resource Languages\nTowards a Minimal Stabilizer ZX-calculus\nApplication of a Hybrid Bi-LSTM-CRF model to the task of Russian Named  Entity Recognition\nMathematical foundations of matrix syntax\nAutoMode: Relational Learning With Less Black Magic\nVisual and Textual Programming Languages: A Systematic Review of the  Literature\nExperimental Biological Protocols with Formal Semantics\nLinear Haskell: practical linearity in a higher-order polymorphic  language\nExploring Asymmetric Encoder-Decoder Structure for Context-based  Sentence Representation Learning\nDistributed Representation for Traditional Chinese Medicine Herb via  Deep Learning Models\nUnbounded cache model for online language modeling with open vocabulary\nMultilingual Adaptation of RNN Based ASR Systems\nA Language for Probabilistically Oblivious Computation\nA critical analysis of string APIs: 
The case of Pharo\nCurriculum Q-Learning for Visual Vocabulary Acquisition\nARbis Pictus: A Study of Language Learning with Augmented Reality\nNeural Cross-Lingual Entity Linking\nLearning Interpretable Spatial Operations in a Rich 3D Blocks World\nA High-Level Rule-based Language for Software Defined Network  Programming based on OpenFlow\nQuaternionic Wavefunction\nUnsupervised Word Mapping Using Structural Similarities in Monolingual  Embeddings\nObject-Orientation in Graph-Based Design Grammars\nBuilding a Sentiment Corpus of Tweets in Brazilian Portuguese\nHigher physical fitness levels are associated with less language decline  in healthy ageing\nApplying Vector Space Model (VSM) Techniques in Information Retrieval  for Arabic Language\nEnhancing Translation Language Models with Word Embedding for  Information Retrieval\nCommon factors in automatic and Sturmian sequences\nWeb-Based Implementation of Travelling Salesperson Problem Using Genetic  Algorithm\nLinguistic unit discovery from multi-modal inputs in unwritten  languages: Summary of the \"Speaking Rosetta\" JSALT 2017 Workshop\nSpace Improvements and Equivalences in a Functional Core Language\nRandomized sliding window algorithms for regular languages\nImproving Sentiment Analysis in Arabic Using Word Representation\nDolphin: a task orchestration language for autonomous vehicle networks\nOn Modular Training of Neural Acoustics-to-Word Model for LVCSR\nSentiment Analysis of Code-Mixed Indian Languages: An Overview of  SAIL_Code-Mixed Shared Task @ICON-2017\nEnglish-Catalan Neural Machine Translation in the Biomedical Domain  through the cascade approach\nMultiBooked: A Corpus of Basque and Catalan Hotel Reviews Annotated for  Aspect-level Sentiment Classification\nJumps in speeds of hereditary properties in finite relational languages\nAn LP-based hyperparameter optimization model for language modeling\nVision as an Interlingua: Learning Multilingual Semantic Embeddings of  Untranscribed 
Speech\nCoverability: Realizability Lower Bounds\nSentiment Transfer using Seq2Seq Adversarial Autoencoders\nPredicting Twitter User Socioeconomic Attributes with Network and  Language Information\nLanguage Recognition using Time Delay Deep Neural Network\nExponentially more concise quantum recognition of non-RMM regular  languages\nAn overview of Ciao and its design philosophy\nAn Expressive Language and Efficient Execution System for Software  Agents\nA new keyphrases extraction method based on suffix tree data structure  for arabic documents clustering\nSemantics derived automatically from language corpora contain human-like  biases\nCognitive Science in the era of Artificial Intelligence: A roadmap for  reverse-engineering the infant language-learner\nEl Lenguaje Natural como Lenguaje Formal\nScaling Reliably: Improving the Scalability of the Erlang Distributed  Actor Platform\nLanguage Design and Renormalization\nNatural Language Parsing as Statistical Pattern Recognition\nInducing Features of Random Fields\nCombining Hand-crafted Rules and Unsupervised Learning in  Constraint-based Morphological Disambiguation\nAn Abstract Machine for Unification Grammars\nSpeech Repairs, Intonational Boundaries and Discourse Markers: Modeling  Speakers' Utterances in Spoken Dialog\nRecognizing Syntactic Errors in the Writing of Second Language Learners\nA Logic Programming Approach to Knowledge-State Planning: Semantics and  Complexity\nJapanese/English Cross-Language Information Retrieval: Exploration of  Query Translation and Transliteration\nA Chart-Parsing Algorithm for Efficient Semantic Analysis\nAnalyzing language development from a network approach\nCharacterizations of 1-Way Quantum Finite Automata\nComputational approach to the emergence and evolution of language -  evolutionary naming game model\nOn Recognizable Languages of Infinite Pictures\nDu corpus au dictionnaire\nProbabilistic Weighted Automata\nAutomated words stability and languages phylogeny\nA 
Topological derivative based image segmentation for sign language  recognition system using isotropic filter\nMinimisation of Deterministic Parity and Buchi Automata and Relative  Minimisation of Deterministic Finite Automata\nTowards Design and Implementation of a Language Technology based  Information Processor for PDM Systems\nRecovering Grammar Relationships for the Java Language Specification\nProposing LT based Search in PDM Systems for Better Information  Retrieval\nThe Design and Implementation of Typed Scheme: From Scripts to Programs\nA Regularity Measure for Context Free Grammars\nLeft Recursion in Parsing Expression Grammars\nThe challenges of statistical patterns of language: the case of  Menzerath's law in genomes\nMJ no more: Using Concurrent Wikipedia Edit Spikes with Social Network  Plausibility Checks for Breaking News Detection\nModeling the emergence of a new language: Naming Game with hybridization\nSmall Depth Proof Systems\nProceedings Combined 20th International Workshop on Expressiveness in  Concurrency and 10th Workshop on Structural Operational Semantics\nTCC, with History\nIntensional Cyberforensics\nDeterministic Logics for UL\nEvaluating the fully automatic multi-language translation of the Swiss  avalanche bulletin\nProceedings Combined 21st International Workshop on Expressiveness in  Concurrency and 11th Workshop on Structural Operational Semantics\nA complete graphical calculus for Spekkens' toy bit theory\nDescription and Optimization of Abstract Machines in a Dialect of Prolog\nA Note on Monitors and Büchi automata\nStories in the Eye: Contextual Visual Interactions for Efficient Video  to Language Translation\nA Productivity Checker for Logic Programming\nChannels as Objects in Concurrent Object-Oriented Programming\nVisualization of Object Oriented Modeling from the Perspective of Set  theory\nImproving Accessibility of Archived Raster Dictionaries of Complex  Script Languages\nText mixing shapes the anatomy of rank-frequency 
distributions: A modern  Zipfian mechanics for natural language\nExemplar Dynamics and Sound Merger in Language\nOpen System Categorical Quantum Semantics in Natural Language Processing\nPersonalizing Universal Recurrent Neural Network Language Model with  User Characteristic Features by Social Network Crowdsouring\nBlock-Level Parallelism in Parsing Block Structured Languages\nWords are not Equal: Graded Weighting Model for building Composite  Document Vectors\nTopical differences between Chinese language Twitter and Sina Weibo\nTranslingual Obfuscation\nSound and Complete Runtime Security Monitor for Application Software\nA Graph-Based Semantics Workbench for Concurrent Asynchronous Programs\nSpatial logic of modal mu-calculus and tangled closure operators\nMulti-domain machine translation enhancements by parallel data  extraction from comparable corpora\nAsk Your Neurons: A Deep Learning Approach to Visual Question Answering\nComplexity Bounds of Constant-Space Quantum Computation\nComplexity of Universality and Related Problems for Partially Ordered  NFAs\nMapping Between fMRI Responses to Movies and their Natural Language  Annotations\nUnderstanding and maintaining tactics graphically OR how we are learning  that a diagram can be worth more than 10K LoC\nType oriented parallel programming for Exascale\nRecurrent Neural Network Language Model Adaptation Derived Document  Vector\nLanguage Support for Reliable Memory Regions\nLearning to Play Guess Who? 
and Inventing a Grounded Language as a  Consequence\nWikiwhere: An interactive tool for studying the geographical provenance  of Wikipedia references\nBuilding Code with Dynamic Staging\nArabic Language Sentiment Analysis on Health Services\nToward Semantic Foundations for Program Editors\nModeling the life and death of competing languages from a physical and  mathematical perspective\nStructural Stability of Lexical Semantic Spaces: Nouns in Chinese and  French\nA Pattern Language for High-Performance Computing Resilience\nCross-language Framework for Word Recognition and Spotting of Indic  Scripts\nDetecting Cross-Lingual Plagiarism Using Simulated Word Embeddings\nShielding Google's language toxicity model against adversarial attacks\nIs the coexistence of Catalan and Spanish possible in Catalonia?\nEfficient Gradual Typing\nStability of items: an experimental test\nDeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level  Sign Language Translation\nTesting the complexity of a valued CSP language\nA Comparison of the XTAG and CLE Grammars for English\nComplexity of Self-Assembled Shapes\nThe propagation of a cultural or biological trait by neutral genetic  drift in a subdivided population\nOn the security of AlphaEta: Response to `Some attacks on quantum-based  cryptographic protocols'\nParts-of-Speech Tagger Errors Do Not Necessarily Degrade Accuracy in  Extracting Information from Biomedical Text\nFrom formulas to cirquents in computability logic\nPolytropic neutron star - black hole merger simulations with a  Paczynski-Wiita potential\nJacobi Equations and Comparison Theorems for Corank 1 sub-Riemannian  Structures with Symmetries\nOn the Sets of Real Numbers Recognized by Finite Automata in Multiple  Bases\nThe second and third parameters of the Horizontal Branch in Globular  Clusters\nOnline Verification of Control Parameter Calculations in Communication  Based Train Control System\nThe evolving slope of the stellar mass function at 0.6 <= z < 
4.5 from  deep WFC3 data\nStatic and dynamic variational principles for strongly correlated  electron systems\nPitfalls of Path Integrals: Amplitudes for Spacetime Regions and the  Quantum Zeno Effect\nThe Hilbert space of conditional clauses\nStructural connections between a forcing class and its modal logic\nEmbracing divergence: a formalism for when your semiring is simply not  complete, with applications in quantum simulation\nPyCosmic: a robust method to detect cosmics in CALIFA and other  fiber-fed integral-field spectroscopy datasets\nOn the Nature of Reality\nEfficient deconvolution methods for astronomical imaging: algorithms and  IDL-GPU codes\nUnifying Büchi Complementation Constructions\nEURETILE 2010-2012 summary: first three years of activity of the  European Reference Tiled Experiment\nHarmonic maps of finite uniton type into non-compact inner symmetric  spaces\nChemical compositions of six metal-poor stars in the ultra-faint dwarf  spheroidal galaxy Boötes I\nTheoretical Foundations of Equitability and the Maximal Information  Coefficient\nA Framework for Lattice QCD Calculations on GPUs\nNoise Robustness of the Incompatibility of Quantum Measurements\nAre You Talking to a Machine? Dataset and Methods for Multilingual Image  Question Answering\nTopological dynamics and the complexity of strong types\nWhat is Wrong with Topic Modeling? 
(and How to Fix it Using Search-based  Software Engineering)\nAn automaton approach for waiting times in DNA evolution\nJoint Video and Text Parsing for Understanding Events and Answering  Queries\nPerformance comparison between Java and JNI for optimal implementation  of computational micro-kernels\nGaia FGK Benchmark Stars: Effective temperatures and surface gravities\nHorava Gravity in the Effective Field Theory formalism: from cosmology  to observational constraints\n(Almost) C*-algebras as sheaves with self-action\nGeometric aspects of the symmetric inverse M-matrix problem\nTowards Evaluation of Cultural-scale Claims in Light of Topic Model  Sampling Effects\nRecovering Structured Probability Matrices\nUniversality of Black Hole Quantum Computing\nBelief-Invariant and Quantum Equilibria in Games of Incomplete  Information\nUnsupervised Feature Learning Based on Deep Models for Environmental  Audio Tagging\n4D Scattering Amplitudes and Asymptotic Symmetries from 2D CFT\nCausal structures and the classification of higher order quantum  computations\nMulti-level computational methods for interdisciplinary research in the  HathiTrust Digital Library\nThe Parameterized Complexity of Positional Games\nThe Covering Problem\nGo with the Flow: Compositional Abstractions for Concurrent Data  Structures (Extended Version)\nIn-RDBMS Hardware Acceleration of Advanced Analytics\nThe History Began from AlexNet: A Comprehensive Survey on Deep Learning  Approaches\nAbelian networks IV. 
Dynamics of nonhalting networks\nBigSR: an empirical study of real-time expressive RDF stream reasoning  on modern Big Data platforms\nEvent Data Definition in LHCb\nSome Problems in Automata Theory Which Depend on the Models of Set  Theory\nHOPE: A Python Just-In-Time compiler for astrophysical computations\nTime-space tradeoffs for two-way finite automata\nYin and Yang: Balancing and Answering Binary Visual Questions\nLow-dimensional Query Projection based on Divergence Minimization  Feedback Model for Ad-hoc Retrieval\nUnrestricted State Complexity of Binary Operations on Regular and Ideal  Languages\nLightRNN: Memory and Computation-Efficient Recurrent Neural Networks\nWhich NP-Hard SAT and CSP Problems Admit Exponentially Improved  Algorithms?\nPIE: A Domain-Specific Language for Interactive Software Development  Pipelines\nThe Shear TEsting Programme 2: Factors affecting high precision weak  lensing analyses\nPyrochlore Photons: The U(1) Spin Liquid in a S=1/2 Three-Dimensional  Frustrated Magnet\nAn Efficient and Flexible Engine for Computing Fixed Points\nUniversal Similarity\nDMTCP: Transparent Checkpointing for Cluster Computations and the  Desktop\nWarped Geometry of Brane Worlds\nQuantum Hyperbolic State Sum Invariants of 3-Manifolds\nHomological perturbations, equivariant cohomology, and Koszul duality\nAn XML Driven Graphical User Interface and Application Management  Toolkit\nRelativistic Quantum Dynamics: A non-traditional perspective on space,  time, particles, fields, and action-at-a-distance\nProcess, System, Causality, and Quantum Mechanics, A Psychoanalysis of  Animal Faith\nOn the Power of Random Bases in Fourier Sampling: Hidden Subgroup  Problem in the Heisenberg Group\n\"Gravitational mass\" of information?\nInfinitesimal or cocommutative dipterous bialgebras and good triples of  operads\nHydrodynamics of spacetime and vacuum viscosity\nKinematics of the old stellar population at the Galactic Center\nNumerical method for Darcy flow derived 
using Discrete Exterior Calculus\nSignatures of intrinsic Li depletion and Li-Na anti-correlation in the  metal-poor globular cluster NGC6397\nThe Second INTEGRAL AGN Catalogue\nThe dilution peak, metallicity evolution, and dating of galaxy  interactions and mergers\nSelf force on a scalar charge in Kerr spacetime: circular equatorial  orbits\nDoes resolving PvNP require a paradigm shift?\nPassively Mobile Communicating Logarithmic Space Machines\nStar formation in Cometary globule 1: the second generation\nOn the non-Abelian monopoles on the background of spaces with constant  curvature\nLeast squares deconvolution of the stellar intensity and polarization  spectra\nPhotospheric and coronal abundances in solar-type stars: the peculiar  case of Tau Bootis\nDust formation in the ejecta of the Type II-P supernova 2004dj\nGood Friends, Bad News - Affect and Virality in Twitter\nThe flare model for X-ray variability of NGC 4258\nActions des groupes topologiques sur les objets universels\nReducing Interpolation on Multi-Grid to Quantizing Grid's Data-Base as a  Recursion\nExact computation of joint spectral characteristics of linear operators\nEstimating the overlap between dependent computations for automatic  parallelization\nThe Control Theory of Motion-Based Communication: Problems in Teaching  Robots to Dance\nFour Degrees of Separation\nHarmony Explained: Progress Towards A Scientific Theory of Music\nA way to deal with the fringe-like pattern in VIMOS-IFU data\nRecompression: a simple and powerful technique for word equations\nThe Quantum Frontier\nSelf-similarity of temperature profiles in distant galaxy clusters: the  quest for a Universal law\nCDAS: A Crowdsourcing Data Analytics System\nInvariant Generation through Strategy Iteration in Succinctly  Represented Control Flow Graphs\nProbabilities on Sentences in an Expressive Logic\nInelastic electron and Raman scattering from the collective excitations  in quantum wires: Zero magnetic field\nA Rule-based Model 
of a Hypothetical Zombie Outbreak: Insights on the  role of emotional factors during behavioral adaptation of an artificial  population\nThe development of the Heliometer of the Observatorio Nacional of Rio de  Janeiro and application to the study of the Sun-Earth system\nCalibration of quasi-static aberrations in exoplanet direct-imaging  instruments with a Zernike phase-mask sensor\nNetwork Sparsification for Steiner Problems on Planar and Bounded-Genus  Graphs\nDynamics of the dust rings and satellites of Uranus and of the Saturn's  F-ring\nIntrinsic structure of liquid surface and capillary waves on the Density  Functional Theory\nCircum-stellar medium around rotating massive stars at solar metallicity\nAnalysis of chemical abundances in planetary nebulae with [WC] central  stars. II. Chemical abundances and the abundance discrepancy factor\nHigh-level programming and control for industrial robotics: using a  hand-held accelerometer-based input device for gesture and posture  recognition\nVerification of Imperative Programs by Constraint Logic Program  Transformation\nThe SPARQL2XQuery Interoperability Framework. 
Utilizing Schema Mapping,  Schema Transformation and Query Translation to Integrate XML and the Semantic  Web\nWhy does the effective field theory of inflation work?\nPsrPopPy: An open-source package for pulsar population simulations\nAn Effective End-User Development Approach Through Domain-Specific  Mashups for Research Impact Evaluation\nBayesian regression discontinuity designs: Incorporating clinical  knowledge in the causal analysis of primary care data\nChemical composition and constraints on mass loss for globular clusters  in dwarf galaxies: WLM and IKN\nConcise comparative summaries (CCS) of large text corpora with a human  experiment\nKernels in tropical geometry and a Jordan-Hölder Theorem\nGlobal disease monitoring and forecasting with Wikipedia\nConstructing small tree grammars and small circuits for formulas\nTowards Refactoring DMARF and GIPSY OSS\nPresence-absence reasoning for evolutionary phenotypes\nLarge Vocabulary Arabic Online Handwriting Recognition System\nFaster graphical model identification of tandem mass spectra using  peptide word lattices\nLong-term Recurrent Convolutional Networks for Visual Recognition and  Description\nCharacterizing the Google Books corpus: Strong limits to inferences of  socio-cultural and linguistic evolution\nSunPy - Python for Solar Physics\nConcrete resource analysis of the quantum linear system algorithm used  to compute the electromagnetic scattering cross section of a 2D target\nMulti-GPU Distributed Parallel Bayesian Differential Topic Modelling\nGeneral relativistic magnetohydrodynamical simulations of the jet in M87\nRegularity theory and extension problem for fractional nonlocal  parabolic equations and the master equation\nContext-Content Systems of Random Variables: The  Contextuality-by-Default Theory\nFast and Lean Immutable Multi-Maps on the JVM based on Heterogeneous  Hash-Array Mapped Tries\nOCR of historical printings with an application to building diachronic  corpora: A case study using the 
RIDGES herbal corpus\nQuantum mechanics in an evolving Hilbert space\nRelating Weight Constraint and Aggregate Programs: Semantics and  Representation\nThe Estimation of Subjective Probabilities via Categorical Judgments of  Uncertainty\nThe VLT-FLAMES Tarantula Survey XVI. The optical+NIR extinction laws in  30 Doradus and the photometric determination of the effective temperatures of  OB stars\nAbout Adaptive Coding on Countable Alphabets: Max-Stable Envelope  Classes\nAcyclicity Notions for Existential Rules and Their Application to Query  Answering in Ontologies\nLow delta-V near-Earth asteroids: A survey of suitable targets for space  missions\nEfficient Gluing of Numerical Continuation and a Multiple Solution  Method for Elliptic PDEs\nMusical elements in the discrete-time representation of sound\nADS: The Next Generation Search Platform\nq-randomized Robinson-Schensted-Knuth correspondences and random  polymers\nIf the Current Clique Algorithms are Optimal, so is Valiant's Parser\nTypologies of the Popular Science Web Video\nMaximum-Entropy Inference with a Programmable Annealer\nRotational Virtual Knots and Quantum Link Invariants\nTesting particle trapping in transition disks with ALMA\nExploration and Exploitation of Victorian Science in Darwin's Reading  Notebooks\nImplementation of the Tangent Sphere and Cutting Plane Methods in the  Quantitative Determination of Ligand Binding Site Burial Depths in Proteins  Using FORTRAN 77/90 Language\nTwo applications of the spectrum of numbers\nResurgent transseries $\\&$ Dyson-Schwinger equations\nDeep Learning on FPGAs: Past, Present, and Future\nMulti-frequency studies of galaxies and groups: I. Environmental effect  on galaxy stellar mass and morphology\nKitaev honeycomb tensor networks: exact unitary circuits and  applications\nDocument Retrieval on Repetitive String Collections\nEnsemble X-ray variability of Active Galactic Nuclei. II. 
Excess  Variance and updated Structure Function\nLP-branching algorithms based on biased graphs\nSemantic Information Encoding in One Dimensional Time Domain Signals\nComputational Interpretations of Markov's principle\nGaDei: On Scale-up Training As A Service For Deep Learning\nHow to measure the topological quality of protein grammars?\nEvolution of the real-space correlation function from next generation  cluster surveys\nComparing MapReduce and Pipeline Implementations for Counting Triangles\nExtracting and Analyzing Hidden Graphs from Relational Databases\nQuantum Break-Time of de Sitter\nSocial media mining for identification and exploration of health-related  information from pregnant women\nSpin-projected matrix product states (SP-MPS): a versatile tool for  strongly correlated systems\nSoundness in negotiations\nDGSAT: Dwarf Galaxy Survey with Amateur Telescopes II. A catalogue of  isolated nearby edge-on disk galaxies and the discovery of new low surface  brightness systems\nGraphical Models: An Extension to Random Graphs, Trees, and Other  Objects\nLearning to Associate Words and Images Using a Large-scale Graph\nExoplanet Biosignatures: Future Directions\nSimulated Galactic methanol maser distribution to constrain Milky Way  parameters\nSkin Temperature Measurement\nProduction of vector resonances at the LHC via WZ-scattering: a  unitarized EChL analysis\nThe Complexity Landscape of Fixed-Parameter Directed Steiner Network  Problems\nRaum, Zeit und Wechselwirkung in der Quantentheorie der Ur-Alternativen\nEfficient Algorithms for Checking Fast Termination in VASS\nModel Checking for Fragments of Halpern and Shoham's Interval Temporal  Logic Based on Track Representatives\nEnd-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics  Optimization by Fully Convolutional Neural Networks\nHow intelligent are convolutional neural networks?\nCharacterizing Diabetes, Diet, Exercise, and Obesity Comments on Twitter\nWeak Memory Models with Matching 
Axiomatic and Operational Definitions\nIndirect Supervision for Relation Extraction using Question-Answer Pairs\nSemRe-Rank: Improving Automatic Term Extraction By Incorporating  Semantic Relatedness With Personalised PageRank\nOptical-NIR dust extinction towards Galactic O stars\nPebble isolation mass --- scaling law and implications for the formation  of super-Earths and gas giants\nGalactic cold cores IX. Column density structures and radiative transfer  modelling\nDetecting Zones and Threat on 3D Body for Security in Airports using  Deep Machine Learning\nMechanical Stresses Estimation in Silicon and Glass Bonded at Elevated  Temperature\nComment on \"Linguistic Features of Noncoding DNA Sequences\"\nLessons from a Restricted Turing Test\nAn Empirically Motivated Reinterpretation of Dependency Grammar\nAdjuncts and the Processing of Lexical Rules\nTemporal Relations: Reference or Discourse Coherence?\nEfficiency, Robustness, and Accuracy in Picky Chart Parsing\nA Stochastic Finite-State Word-Segmentation Algorithm for Chinese\nCollaboration on reference to objects that are not mutually known\nPrecise n-gram Probabilities from Stochastic Context-free Grammars\nAn Extended Theory of Head-Driven Parsing\nModularity in a Connectionist Model of Morphology Acquisition\nRelating Complexity to Practical Performance in Parsing with  Wide-Coverage Unification Grammars\nStatistical Augmentation of a Chinese Machine-Readable Dictionary\nAn Automatic Method of Finding Topic Boundaries\nA Spanish Tagset for the CRATER Project\nResolution of Syntactic Ambiguity: the Case of New Subjects\nThe complexity of normal form rewrite sequences for Associativity\nAnytime Algorithms for Speech Parsing?\nMulti-Paragraph Segmentation of Expository Text\nLearning unification-based grammars using the Spoken English Corpus\nInterleaving Syntax and Semantics in an Efficient Bottom-Up Parser\nDiscourse Obligations in Dialogue Processing\nParsing as Tree Traversal\nComputing FIRST and FOLLOW 
Functions for Feature-Theoretic Grammars\nThe Correct and Efficient Implementation of Appropriateness  Specifications for Typed Feature Structures\nTyped Feature Structures as Descriptions\nTagging accurately -- Don't guess if you know\nDistributional Clustering of English Words\nExperimentally Evaluating Communicative Strategies: The Effect of the  Task\nA Formal Look at Dependency Grammars and Phrase-Structure Grammars, with  Special Consideration of Word-Order Phenomena\nRecognizing Text Genres with Simple Metrics Using Discriminant Analysis\nAutomatic Error Detection in Part of Speech Tagging\nPart-of-Speech Tagging with Neural Networks\nConcurrent Lexicalized Dependency Parsing: A Behavioral View on  ParseTalk Events\nSublanguage Terms: Dictionaries, Usage, and Automatic Classification\nExtending DRT with a Focusing Mechanism for Pronominal Anaphora and  Ellipsis Resolution\nExtraction in Dutch with Lexical Rules\nLexical Knowledge Representation in an Intelligent Dictionary Help  System\nInterlingual Lexical Organisation for Multilingual Lexical Databases in  NADIA\nBottom-Up Earley Deduction\nUtilization of a Lexicon for Spelling Correction in Modern Greek\nA Robust and Efficient Three-Layered Dialogue Component for a  Speech-to-Speech Translation System\nAmbiguity resolution in a reductionistic parser\nEllipsis and Quantification: a substitutional approach\nDeterministic Consistency Checking of LP Constraints\nA Tractable Extension of Linear Indexed Grammars\nOn Reasoning with Ambiguities\nTowards an Account of Extraposition in HPSG\nComputational dialectology in Irish Gaelic\nParseTalk about Sentence- and Text-Level Anaphora\nThe Semantics of Motion\nNon-Constituent Coordination: Theory and Practice\nDiscourse and Deliberation: Testing a Collaborative Strategy\nSATZ - An Adaptive Sentence Segmentation System\nFrom compositional to systematic semantics\nCo-occurrence Vectors from Corpora vs. 
Distance Vectors from  Dictionaries\nCues and control in Expert-Client Dialogues\nAn Implemented Formalism for Computing Linguistic Presuppositions and  Existential Commitments\nA Morphographemic Model for Error Correction in Nonconcatenative Strings\nCorpus Statistics Meet the Noun Compound: Some Empirical Results\nCompiling HPSG type constraints into definite clause programs\nTreating Coordination with Datalog Grammars\nCompilation of HPSG to TAG\nTagset Reduction Without Information Loss\nEvaluation of Semantic Clusters\nExploring the role of Punctuation in Parsing Natural Text\nCombining Multiple Knowledge Sources for Discourse Segmentation\nD-Tree Grammars\nSyllable parsing in English and French\nFilling Knowledge Gaps in a Broad-Coverage Machine Translation System\nThe Effect of Pitch Accenting on Pronoun Referent Resolution\nAn Approach to Proper Name Tagging for German\nConstraint Categorial Grammars\nA Natural Law of Succession\nThe Development and Migration of Concepts from Donor to Borrower  Disciplines: Sublanguage Term Use in Hard & Soft Sciences\nPOS Tagging Using Relaxation Labelling\nA Proposal for Word Sense Disambiguation using Conceptual Distance\nDeveloping and Evaluating a Probabilistic LR Parser of Part-of-Speech  and Punctuation Labels\nAn investigation into the correlation of cue phrases, unfilled pauses  and the structuring of spoken discourse\nUsing Information Content to Evaluate Semantic Similarity in a Taxonomy\nLimited Attention and Discourse Structure\nSimilarity between Words Computed by Spreading Activation on an English  Dictionary\nText Segmentation Based on Similarity between Words\nSituations and Computation: An Overview of Recent Research\nAssessing agreement on classification tasks: the kappa statistic\nA Constraint-based Case Frame Lexicon\nOff-line Constraint Propagation for Efficient HPSG Processing\nSemHe: A Generalised Two-Level System\nMagic for Filter Optimization in Dynamic Bottom-up Processing\nUnsupervised Learning 
of Word-Category Guessing Rules\nLearning Part-of-Speech Guessing Rules from Lexicon: Extension to  Non-Concatenative Operations\nYet Another Paper about Partial Verb Phrase Fronting in German\nResolving Anaphors in Embedded Sentences\nCounting Coordination Categorially\nA Conceptual Reasoning Approach to Textual Ellipsis\nIncremental Centering and Center Ambiguity\nParsing Algorithms and Metrics\nPart-of-Speech-Tagging using morphological information\nCoordination in Tree Adjoining Grammars: Formalization and  Implementation\nWord Sense Disambiguation using Conceptual Density\nModularizing Contexted Constraints\nRelating Turing's Formula and Zipf's Law\nStabilizing the Richardson Algorithm by Controlling Chaos\nCompilation of Weighted Finite-State Transducers from Decision Trees\nComputational Complexity of Probabilistic Disambiguation by means of  Tree-Grammars\nInducing Constraint Grammars\nIntegrating Syntactic and Prosodic Information for the Efficient  Detection of Empty Categories\nPattern-Based Context-Free Grammars for Machine Translation\nA Machine Learning Approach to the Classification of Dialogue Utterances\nAutomatic Construction of Clean Broad-Coverage Translation Lexicons\nCLEARS - An Education and Research Tool for Computational Semantics\nThe discourse functions of Italian subjects: a centering approach\nCorrections and Higher-Order Unification\nAutomatic Detection of Omissions in Translations\nComparative Experiments on Disambiguating Word Senses: An Illustration  of the Role of Bias in Machine Learning\nAutomatic Extraction of Subcategorization from Corpora\nConcept Clustering and Knowledge Integration from a Children's  Dictionary\nInsights into the Dialogue Processing of VERBMOBIL\nCentering in-the-large: Computing referential discourse segments\nSloppy Identity\nGrammatical analysis in the OVIS spoken-dialogue system\nComputing Parallelism in Discourse\nA Lexicon for Underspecified Semantic Tagging\nComparing a Linguistic and a Stochastic 
Tagger\nProbabilistic Coreference in Information Extraction\nName Searching and Information Retrieval\nEfficient Construction of Underspecified Semantics under Massive  Ambiguity\nAutomatic Detection of Text Genre\nRecognizing Referential Links: An Information Extraction Perspective\nSimilarity-Based Methods For Word Sense Disambiguation\nEncoding Frequency Information in Lexicalized Grammars\nThe Complexity of Recognition of Linguistically Adequate Dependency  Grammars\nSegmentation of Expository Texts by Hierarchical Agglomerative  Clustering\nUse of Weighted Finite State Transducers in Part of Speech Tagging\nDisambiguating with Controlled Disjunctions\nLook-Back and Look-Ahead in the Conversion of Hidden Markov Models into  Finite State Transducers\nOn the existence of certain total recursive functions in nontrivial  axiom systems, I\nParsing Inside-Out\nAutomatic summarising: factors and directions\nRationality, Cooperation and Conversational Implicature\nCan Subcategorisation Probabilities Help a Statistical Parser?\nWord Sense Disambiguation using Optimised Combinations of Knowledge  Sources\nNever Look Back: An Alternative to Centering\nEvaluating a Focus-Based Approach to Anaphora Resolution\nA Maximum-Entropy Partial Parser for Unrestricted Text\nChunk Tagger - Statistical Recognition of Noun Phrases\nA Linguistically Interpreted Corpus of German Newspaper Text\nAutomatically Creating Bilingual Lexicons for Machine Translation from  Bilingual Text\nStatistical Models for Unsupervised Prepositional Phrase Attachment\nCombining Expression and Content in Domains for Dialog Managers\nHow to define a context-free backbone for DGs: Implementing a DG in the  LFG formalism\nSeparating Surface Order and Syntactic Relations in a Dependency Grammar\nA Comparison of WordNet and Roget's Taxonomy for Measuring Semantic  Similarity\nEntropic analysis of the role of words in literary texts\nExtended Comment on Language Trees and Zipping\nSeparating Dependency from 
Constituency in a Tree Rewriting System\nImproving Tagging Performance by Using Voting Taggers\nObject Oriented and Functional Programming for Symbolic Manipulation\nMemory-Based Shallow Parsing\nFormal Modeling in a Commercial Setting: A Case Study\nSelf-Specifying Machines\nDeduction over Mixed-Level Logic Representations for Text Passage  Retrieval\nA database and lexicon of scripts for ThoughtTreasure\nDLV - A System for Declarative Problem Solving\nVariable Word Rate N-grams\nImproving Testsuites via Instrumentation\nBagging and Boosting a Treebank Parser\nComparing two trainable grammatical relations finders\nMore accurate tests for the statistical significance of result  differences\nMetonymy Interpretation Using X NO Y Examples\nBunsetsu Identification Using Category-Exclusive Rules\nTemporal Expressions in Japanese-to-English Machine Translation\nAnaphora Resolution in Japanese Sentences Using Surface Expressions and  Examples\nComputing Presuppositions by Contextual Reasoning\nUsing existing systems to supplement small amounts of annotated  grammatical relations training data\nReduction of Intermediate Alphabets in Finite-State Transducer Cascades\nUtilizing the World Wide Web as an Encyclopedia: Extracting Term  Descriptions from Semi-Structured Texts\nA Novelty-based Evaluation Method for Information Retrieval\nThe Use of Instrumentation in Grammar Engineering\nMathematical Model of Word Length on the Basis of the Cebanov-Fucks  Distribution with Uniform Parameter Distribution\nCorrection of Errors in a Modality Corpus Used for Machine Translation  by Using Machine-learning Method\nReverse Engineering from Assembler to Formal Specifications via Program  Transformations\nComponent Programming and Interoperability in Constraint Solver Design\nJoint and conditional estimation of tagging and parsing models\nComputational properties of environment-based disambiguation\nUsing the Distribution of Performance for Studying Statistical NLP  Systems and 
Corpora\nIntegrating Multiple Knowledge Sources for Robust Semantic Parsing\nLearning class-to-class selectional preferences\nUsing a Support-Vector Machine for Japanese-to-English Translation of  Tense, Aspect, and Modality\nTowards practical meta-querying\nAnnotation Graphs and Servers and Multi-Modal Resources: Infrastructure  for Interdisciplinary Education, Research and Development\nSemantic Properties for Lightweight Specification in Knowledgeable  Development Environments\nUnsupervised Discovery of Morphemes\nDefining Rough Sets by Extended Logic Programs\nExploiting Sublanguage and Domain Characteristics in a Bootstrapping  Approach to Lexicon and Ontology Creation\nAn Alternative to RDF-Based Languages for the Representation and  Processing of Ontologies in the Semantic Web\nDUCT: An Interactive Define-Use Chain Navigation Tool for Relative  Debugging\nMemory As A Monadic Control Construct In Problem-Solving\nThe Design of a COM-Oriented Module System\nSecure Prolog-Based Mobile Code\nO(1) Reversible Tree Navigation Without Cycles\nWell-Definedness and Semantic Type-Checking in the Nested Relational  Calculus and XQuery\nAnnotating Predicate-Argument Structure for a Parallel Treebank\nProofing Tools Technology at Neurosoft S.A.\nCrocoPat 2.1 Introduction and Reference Manual\nOptimal Union-Find in Constraint Handling Rules\nAn Audit Logic for Accountability\nIn the beginning was game semantics\nPlanning with Preferences using Logic Programming\nIntegration of Declarative and Constraint Programming\nStatistical Parameters of the Novel \"Perekhresni stezhky\" (\"The  Cross-Paths\") by Ivan Franko\nProlog Server Pages\nRemote-control and clustering of physical computations using the XML-RPC  protocol and the open-Mosix system\nContinuations, proofs and tests\nSemantics of Separation-Logic Typing and Higher-order Frame Rules  for<br> Algol-like Languages\nComplexity of Data Flow Analysis for Non-Separable Frameworks\nScaling Construction Grammar up to 
Production Systems: the SCIM\nFingerprinting Logic Programs\nThe pitfalls of verifying floating-point computations\nAn Abstract Monte-Carlo Method for the Analysis of Probabilistic  Programs\nA Formal Model for Programming Wireless Sensor Networks\nDependency Parsing with Dynamic Bayesian Network\nUniversal Nonperturbative Effects in Event Shapes from Soft-Collinear  Effective Theory\nExtremal Transitions in Heterotic String Theory\nOn the harmonic superspace language adapted to the Gelfand-Dickey  algebra of differential operators\nOn the Hopf Structure of W(2) Algebra and N=1 Superconformal Algebra in  the Ope Language\nTwo-Loop Superstrings in Hyperelliptic Language II: the Vanishing of the  Cosmological Constant and the Non-Renormalization Theorem\nAn application of Shoenfield's absoluteness theorem to the theory of  uniform distribution\nUsing Automata to obtain Regular Expressions for Induced Actions\nModel Companions of T_σfor stable T\nA language for multiplicative-additive linear logic\nDivisibility Theory and Complexity of Algorithms in Free Partially  Commutative Groups\nFormal languages and groups as memory\nSome thoughts upon axiomatized languages with estension tools, a focus  on probability theory and error calculus with Dirichlet forms\nLife in Silico - Simulation of Complex Systems by Enzymatic Computation\nProblem Solving and the Use of Math in Physics Courses\nAn Evolutionary Picture for Quantum Physics\nExact results for accepting probabilities of quantum automata\nLinear optics implementation of weak values in Hardy's paradox\nAn Algebra of Pure Quantum Programming\nQuantum Interpretations\nArabic Speech Recognition System using CMU-Sphinx4\nLearning Phonotactics Using ILP\nUndecidability in function fields of positive characteristic\nQuantized Detector Networks: A review of recent developments\nApproximately Independent Features of Languages\nZipf's Law and Avoidance of Excessive Synonymy\nRemarks on Jurdzinski and Lorys' proof that 
palindromes are not a  Church-Rosser language\nThe predictability of letters in written english\nEvaluation of a Grammar of French Determiners\nFreeware solutions for spectropolarimetric data reduction\nA logical analysis of entanglement and separability in quantum  higher-order functions\nHypergames and full completeness for system F (rough draft)\nGroups that do and do not have context-sensitive word problem\nConcerning Olga, the Beautiful Little Street Dancer (Adjectives as  Higher-Order Polymorphic Functions)\nPlatform-Independent Firewall Policy Representation\nA chain dictionary method for Word Sense Disambiguation and applications\nAutomata and cells in affine Weyl groups\nAceWiki: Collaborative Ontology Management in Controlled Natural  Language\nInitial Results on the F-logic to OWL Bi-directional Translation on a  Tabled Prolog Engine\nPeek Arc Consistency\nSoftware dependability modeling using an industry-standard architecture  description language\nNon procedural language for parallel programs\nA formally verified compiler back-end\nAmbiguity and Communication\nSyntactic variation of support verb constructions\nOn the Morse-Hedlund complexity gap\nA Type System for Parallel Components\nA Particular Universal Cellular Automaton\nSearch-based Structured Prediction\nContinuum multi-physics modeling with scripting languages: the Nsim  simulation compiler prototype for classical field theory\nBayesian Query-Focused Summarization\nModular Verification of Recursive Programs\nSerializing the Parallelism in Parallel Communicating Pushdown Automata  Systems\nMarking-up multiple views of a Text: Discourse and Reference\nA Note On Higher Order Grammar\nLudics and its Applications to natural Language Semantics\nDecision problems for inverse monoids presented by a single sparse  relator\nA Type System for a Stochastic CLS\nA non-interleaving process calculus for multi-party synchronisation\nGraph-Links\nInteger Reset Timed Automata: Clock Reduction and 
Determinizability\nSpeech Recognition of the letter 'zha' in Tamil Language using HMM\nMorphological study of Albanian words, and processing with NooJ\nComplete Context Calculus Design and Implementation in GIPSY\nLes Entités Nommées : usage et degrés de précision et de  désambiguïsation\nGraph Creation, Visualisation and Transformation\nThe Semantics of Graph Programs\nAlgèbre OLAP et langage graphique\nViews, Program Transformations, and the Evolutivity Problem in a  Functional Language\nQuantitative parametrization of texts written by Ivan Franko: An attempt  of the project\nQuantifier elimination and minimality conditions in algebraically closed  valued fields\nMirrored Language Structure and Innate Logic of the Human Brain as a  Computable Model of the Oracle Turing Machine\nMapping Business Process Modeling constructs to Behavior Driven  Development Ubiquitous Language\nStatic and Dynamic Quality Assurance by Aspect Oriented Techniques\nA tool stack for implementing Behaviour-Driven Development in Python  Language\nTransition Complexity of Incomplete DFAs\nTransformations Between Different Types of Unranked Bottom-Up Tree  Automata\nLearning Residual Finite-State Automata Using Observation Tables\nProceedings Seventh Workshop on Structural Operational Semantics\nExact Bivariate Polynomial Factorization in Q by Approximation of Roots\nAI 3D Cybug Gaming\nProceedings Fourth International Workshop on Testing, Analysis and  Verification of Web Software\nAn algorithmic approximation of the infimum reachability probability for  Probabilistic Finite Automata\nIntroduction to the iDian\nA Decidable Characterization of a Graphical Pi-calculus with Iterators\nA Path Algebra for Multi-Relational Graphs\nA PDTB-Styled End-to-End Discourse Parser\nAn Introduction to Software Engineering and Fault Tolerance\nEmoticonsciousness\nSICStus Prolog -- the first 25 years\nSPARQL Assist Language-Neutral Query Composer\nSevere Language Effect in University Rankings: Particularly 
Germany and  France are wronged in citation-based rankings\nMatrix Insertion-Deletion Systems\nNP has log-space verifiers with fixed-size public quantum registers\nThe Geometry of T-Varieties\nTransforming ASN.1 Specifications into CafeOBJ to assist with Property  Checking\nMixing, Ergodic, and Nonergodic Processes with Rapidly Growing  Information between Blocks\nRepresenting First-Order Causal Theories by Logic Programs\nA Universal Part-of-Speech Tagset\nHandwritten Character Recognition of South Indian Scripts: A Review\nCognitive Binary Logic - The Natural Unified Formal Theory of  Propositional Binary Logic\nClasificarea distribuita a mesajelor de e-mail\nDevelopment and performance analysis of a UPC Particle-in-Cell code\nOn the system F as a glue language for natural-language  compositional-semantics\nTutorial on Online Partial Evaluation\nModular Abstractions of Reactive Nodes using Disjunctive Invariants\nPDDL2.1 - The Art of the Possible? Commentary on Fox and Long\nSpecification of photonic circuits using Quantum Hardware Description  Language\nCompiler Optimization: A Case for the Transformation Tool Contest\nHelloWorld! 
An Instructive Case for the Transformation Tool Contest\nSaying Hello World with MOLA - A Solution to the TTC 2011 Instructive  Case\nSaying HelloWorld with QVTR-XSLT - A Solution to the TTC 2011  Instructive Case\nDynamic Logics of Imperfect Information: from Teams and Games to  Transitions\nOn Some Entertaining Applications of the Concept of Set in Computer  Science Course\nA Description Logic Primer\nSegmenting DNA sequence into `words'\nRealisation d'un systeme de reconnaissance automatique de la parole  arabe base sur CMU Sphinx\nProgramming with Algebraic Effects and Handlers\nUsing Signals to Improve Automatic Classification of Temporal Relations\nSemi-Automatically Extracting FAQs to Improve Accessibility of Software  Development Knowledge\nReasoning on Schemata of Formulae\nThe logic of quantum mechanics - Take II\nThe Design of GP 2\nLazy AC-Pattern Matching for Rewriting\nDegree two approximate Boolean #CSPs with variable weights\nCatroid: A Mobile Visual Programming System for Children\nSome Combinatorial Operators in Language Theory\nOn Formal Specification of Maple Programs\nClustering based approach extracting collocations\nFrame Interpretation and Validation in a Open Domain Dialogue System\nOckham's Razor, Probability and Quantum Physics as Logic\nNumerical methods with Sage\nDesign and Implementation A different Architectures of mixcolumn in FPGA\nForests and the W Construction\nA Myhill-Nerode theorem for automata with advice\nA call-by-value lambda-calculus with lists and control\nThe Jasper Framework: Towards a Platform Independent, Formal Treatment  of Web Programming\nAdding Sessions to BPEL\nText Classification with Compression Algorithms\nAn Experiment on the Connection between the DLs' Family DL<ForAllPiZero>  and the Real World\nTwo Algorithms for Finding $k$ Shortest Paths of a Weighted Pushdown  Automaton\nThe Grammar Hammer of 2012\nThree-Element Min-Sol and Conservative Min-Cost-Hom\nAdaptation of fictional and online conversations to 
communication media\nRealizability Categories\nCutting Recursive Autoencoder Trees\nLogarithmic Space and Permutations\nEnabling Operator Reordering in Data Flow Programs Through Static Code  Analysis\nForty hours of declarative programming: Teaching Prolog at the Junior  College Utrecht\nMerging Uncertain Knowledge Bases in a Possibilistic Logic Framework\nArguing for Decisions: A Qualitative Model of Decision Making\nVariant-Frequency Semantics for Green Futures\nModularizing and Specifying Protocols among Threads\nBisimulation and p-morphism for branching-time logics with  indistinguishability relations\nAutomatic Detection of Non-deverbal Event Nouns for Quick Lexicon  Production\nModeling Basic Aspects of Cyber-Physical Systems\nA quantum teleportation inspired algorithm produces sentence meaning  from word meaning and grammatical structure\nA representation of context-free grammars with the help of finite  digraphs\nEventual Linear Ranking Functions\nA framework for (under)specifying dependency syntax without overloading  annotators\nAccomplishable Tasks in Knowledge Representation\nHilbert's Tenth Problem over Function Fields of Positive Characteristic  Not Containing the Algebraic Closure of a Finite Field\nSuggest an Aspect-Oriented Design Approach for UML Communication Diagram\nProceedings Fourth International Symposium on Games, Automata, Logics  and Formal Verification\nAn Algorithm Enumerating All Infinite Repetitions in a D0L System\nReasoning for Moving Blocks Problem: Formal Representation and  Implementation\nConnecting Language and Knowledge Bases with Embedding Models for  Relation Extraction\nIntroducing Access Control in Webdamlog\nProceedings First Workshop on Control Operators and their Semantics\nC++11 -- idea r-wartości i przenoszenia\nEnvironmental Bisimulations for Delimited-Control Operators\nOnline partial evaluation of sheet-defined functions\nA Simple Semantics and Static Analysis for Stack Inspection\nLDC Arabic Treebanks and 
Associated Corpora: Data Divisions Manual\nThe Joseph Greenberg problem: combinatorics and comparative linguistics\nDevelopment and Transcription of Assamese Speech Corpus\nTreating clitics with minimalist grammars\nSome Remarks on Lower Bounds for Queue Machines (Preliminary Report)\nFighting network space: it is time for an SQL-type language to filter  phylogenetic networks\nWhat Mathematical Theories of Truth Should be Like (and Can be)\nProblems in Systematic Application of Software Metrics and Possible  Solution\nHEVAL: Yet Another Human Evaluation Metric\nProgramming with Permissions in Mezzo\nRemarks on Privileged Words\nHop and HipHop : Multitier Web Orchestration\nThe bitwise operations related to a fast sorting algorithm\nThe Petri-Nets to Statecharts Transformation Case\nAnalyzing Flowgraphs with ATL\nBidirectional Recursive Neural Networks for Token-Level Labeling with  Structure\nFinite automata with advice tapes\nInferring Algebraic Effects\nCoinductive Big-Step Semantics for Concurrency\nDeep Learning Embeddings for Discontinuous Linguistic Units\nSuffix Stripping Problem as an Optimization Problem\nPlurals: individuals and sets in a richly typed semantics\nUndecidable properties of self-affine sets and multi-tape automata\nOn the Potential of Twitter for Understanding the Tunisia of the  Post-Arab Spring\nBüchi Types for Infinite Traces and Liveness\nFinite difference numerical method for the superlattice Boltzmann  transport equation and case comparison of CPU(C) and GPU(CUDA)  implementations\nCategory theory, logic and formal linguistics: some connections, old and  new\nA Simple Method to Reduce Thermodynamic Derivatives by Computer\nDesign a Persian Automated Plagiarism Detector (AMZPPD)\nEncapsulating Formal Methods within Domain Specific Languages: A  Solution for Verifying Railway Scheme Plans\nEnforcing Operational Properties including Blockfreeness for  Deterministic Pushdown Automata\nLanguages of lossless seeds\nInformation Retrieval (IR) 
through Semantic Web (SW): An Overview\nSonglines and Navigation in Wardaman and other Australian Aboriginal  Cultures\nGoing higher in the First-order Quantifier Alternation Hierarchy on  Words\nOn the Relative Expressiveness of Argumentation Frameworks, Normal Logic  Programs and Abstract Dialectical Frameworks\nThree Semantics for Modular Systems\nMeasuring Communication in Parallel Communicating Finite Automata\nCooperating Distributed Grammar Systems of Finite Index Working in  Hybrid Modes\nA Logical Formalization of a Secure XML Database\nInter-Rater Agreement Study on Readability Assessment in Bengali\nSubstitute Based SCODE Word Embeddings in Supervised NLP Tasks\nUniformly defining $p$-henselian valuations\nTowards a Domain Specific Language for a Scene Graph based Robotic World  Model\nMaking FPGAs Accessible to Scientists and Engineers as Domain Expert  Software Programmers with LabVIEW\nProceedings Fifth International Symposium on Games, Automata, Logics and  Formal Verification\nTree games with regular objectives\nOn the compactness property of extensions of first-order Gödel logic\nRefinement Checking for Multirate Hybrid ZIA\nArabic Language Text Classification Using Dependency Syntax-Based  Feature Selection\nModeling Word Relatedness in Latent Dirichlet Allocation\nWeighted automata on infinite words in the context of Attacker-Defender  games\nRoot-Weighted Tree Automata and their Applications to Tree Kernels\nTree-based language complexity of Thompson's group F\nSemantics for Locking Specifications\nProving Looping and Non-Looping Non-Termination by Finite Automata\nA Formalisation of Finite Automata using Hereditarily Finite Sets\nPalindromic complexity of trees\nArabic Inquiry-Answer Dialogue Acts Annotation Schema\nTracking Causal Dependencies in Web Services Orchestrations Defined in  ORC\nOn Web-based Domain-Specific Language for Internet of Things\nAn Implementation Model for Interaction Nets\nAbstract Interpretation of Supermodular 
Games\nLogic and Branching Automata\nClassifier-Based Text Simplification for Improved Machine Translation\nWhy teach an introductory course in Mathematical Logic in the Philosophy  curriculum?\nUsing interrogative logic to teach classical logic\nThe tree machine\nDeterministic parallel communicating Watson-Crick automata systems\nP-trac Procedure: The Dispersion and Neutralization of Contrasts in  Lexicon\nType-Based Analysis for Session Inference\nOn the Number of Many-to-Many Alignments of Multiple Sequences\nIdentifying Actionable Messages on Social Media\nLearning about Spanish dialects through Twitter\nA Roadmap towards Machine Intelligence\nCross-lingual Models of Word Embeddings: An Empirical Comparison\nAn Extremal Series of Eulerian Synchronizing Automata\nUsing Sentence-Level LSTM Language Models for Script Inference\nShallow Parsing Pipeline for Hindi-English Code-Mixed Social Media Text\nSynthesizing Program Input Grammars\nWinograd Schemas and Machine Translation\nCanonical Correlation Inference for Mapping Abstract Scenes to Text\nSyntactically Informed Text Compression with Recurrent Neural Networks\nA Practical Quantum Instruction Set Architecture\nExtracting Biological Pathway Models From NLP Event Representations\nMeasuring the State of the Art of Automated Pathway Curation Using Graph  Algorithms - A Case Study of the mTOR Pathway\nJulia Implementation of the Dynamic Distributed Dimensional Data Model\nHomological combinatorics and extensions of the cd-index\nStatic Trace-Based Deadlock Analysis for Synchronous Mini-Go\nIf more than Analytical Modeling is Needed to Predict Real Agents'  Strategic Interaction\nIdeogram Based Chinese Sentiment Word Orientation Computation\nAn Implementation of Bubbling\nDependent Types for JavaScript\nQuantified Data Automata on Skinny Trees: an Abstract Domain for Lists\nMCE Reasoning in Recursive Causal Networks\nObjective Probability\nGlobal Life Patterns: A Methodology for Designing a Personal Global 
Life\nComparing the usage of global and local Wikipedias with focus on Swedish  Wikipedia\nAuthorship Analysis based on Data Compression\nIVOA Recommendation: TAPRegExt: a VOResource Schema Extension for  Describing TAP Services\nMcCammond's normal forms for free aperiodic semigroups revisited\nThe Best Templates Match Technique For Example Based Machine Translation\nCoherence for Skew-Monoidal Categories\nConstraint Handling Rules with Multiset Comprehension Patterns\nSessions as Propositions\nQuestion Answering with Subgraph Embeddings\nMapping the Economic Crisis: Some Preliminary Investigations\nA CNL for Contract-Oriented Diagrams\nSubshifts, MSO Logic, and Collapsing Hierarchies\nAn NLP Assistant for Clide\nTowards Architectural Programming of Embedded Systems\nInnocent Strategies are Sheaves over Plays---Deterministic,  Non-deterministic and Probabilistic Innocence\nStatic Enforcement of Role-Based Access Control\nUML-F: A Modeling Language for Object-Oriented Frameworks\nQuipper: Concrete Resource Estimation in Quantum Algorithms\nInteger-Programming Ensemble of Temporal-Relations Classifiers\nUnsupervised Induction of Semantic Roles within a Reconstruction-Error  Minimization Framework\nBiips: Software for Bayesian Inference with Interacting Particle Systems\nThe Expressive Power of DL-Lite\nEnsaio sobre o Auto-Aproveitamento: um relato de investidas naturais na  participação social\nReply to the commentary \"Be careful when assuming the obvious\", by P.  
Alday\nFrom Logical to Distributional Models\nTowards a Systems Engineering Essence\nUnary probabilistic and quantum automata on promise problems\nWIKI THANKS: Cultural Differences in Thanks Network of  Different-Language Wikipedias\nLearning to Search for Dependencies\nSyntagma Lexical Database\nUnsupervised POS Induction with Word Embeddings\nOpen Transactions on Shared Memory\nA Robust Approximation to a Lambert-Type Function\nUsing Syntax-Based Machine Translation to Parse English into Abstract  Meaning Representation\nHomology and closure properties of autostackable groups\nContent Translation: Computer-assisted translation tool for Wikipedia  articles\nApplicative Bisimulation and Quantum $λ$-Calculi (Long Version)\nClass Vectors: Embedding representation of Document Classes\nA versatile DAQ, monitoring and data processing system for nuclear  experiments in CAMAC and VME standards\nUnlocking Blocked Communicating Processes\nLearning Meta-Embeddings by Using Ensembles of Embedding Sets\nEchoes of Persuasion: The Effect of Euphony in Persuasive Communication\nWeight Assignment Logic\nAlignment-based compositional semantics for instruction following\nBetter Document-level Sentiment Analysis from RST Discourse Parsing\nExploiting Out-of-Domain Data Sources for Dialectal Arabic Statistical  Machine Translation\nA Parallel Corpus of Translationese\nFrequency Distribution of Error Messages\nCharacterization and Complexity Results on Jumping Finite Automata\nBuilding Memory with Concept Learning Capabilities from Large-scale  Knowledge Base\nHankel Matrices for Weighted Visibly Pushdown Automata\nMinimum Risk Training for Neural Machine Translation\nThe Improvement of Negative Sentences Translation in English-to-Korean  Machine Translation\nNatural Language Inference by Tree-Based Convolution and Heuristic  Matching\nLearning to Compose Neural Networks for Question Answering\nVisual Script and Language Identification\nOn probability and logic\nThe Utility of Hedged 
Assertions in the Emergence of Shared Categorical  Labels\nMassively Multilingual Word Embeddings\nThe \"Sprekend Nederland\" project and its application to accent location\nRelations on words\nTime Window Temporal Logic\nLearning to SMILE(S)\nRefinement types in Jolie\nNote on the construction of globular weak omega-groupoids from types,  topological spaces etc\nProving completeness of logic programs with the cut\nCVXPY: A Python-Embedded Modeling Language for Convex Optimization\nOptimized Polynomial Evaluation with Semantic Annotations\nElementary equivalences and accessible functors\nTraining with Exploration Improves a Greedy Stack-LSTM Parser\nOperations on Weakly Recognizing Morphisms\nBayesian Neural Word Embedding\nA Classical Realizability Model for a Semantical Value Restriction\nA Comparison of NOOP to Structural Domain-Theoretic Models of OOP\nUnprovability of circuit upper bounds in Cook's theory PV\nMixing Dirichlet Topic Models and Word Embeddings to Make lda2vec\nNeural Recovery Machine for Chinese Dropped Pronoun\nCalculational Design of Information Flow Monitors (extended version)\nSyntactically Guided Neural Machine Translation\nRecurrent Neural Network for Text Classification with Multi-Task  Learning\nQuery Expansion with Locally-Trained Word Embeddings\nImproving Recurrent Neural Networks For Sequence Labelling\nNeural Word Segmentation Learning for Chinese\nImproving Testability and Reuse by Transitioning to Functional  Programming\nOn the decidability of the $Σ_2$ theories of the arithmetic and  hyperarithmetic degrees as uppersemilattices\nA Dynamic Epistemic Framework for Conformant Planning\nEfficient Parallel Learning of Word2Vec\nIntrinsic Subspace Evaluation of Word Embedding Representations\nHilbert series of symmetric ideals in infinite polynomial rings via  formal languages\nCayley Automatic Groups and Numerical Characteristics of Turing  Transducers\n\"Show me the cup\": Reference with Continuous Representations\nLearning when to 
trust distant supervision: An application to  low-resource POS tagging using cross-lingual projection\nTarget-Side Context for Discriminative Models in Statistical Machine  Translation\nExtracting Formal Models from Normative Texts\nTowards Trustworthy Refactoring in Erlang\nImplicit Negative Feedback in Clinical Information Retrieval\nLearning Nominal Automata\nMonadic second-order properties of very sparse random graphs\nSentiment Classification of Food Reviews\nReasoning about Graph Programs\nSelf-Sustaining Iterated Learning\nLiveness for Verification\nminiAdapton: A Minimal Implementation of Incremental Computation in  Scheme\nAdvances in All-Neural Speech Recognition\nLearning Robust Representations of Text\nLarge-Scale Machine Translation between Arabic and Hebrew: Available  Corpora and Initial Results\nLearning to Translate for Multilingual Question Answering\nJolie Community on the Rise\nECAT: Event Capture Annotation Tool\nA tentative model for dimensionless phoneme distance from binary  distinctive features\nCausally consistent dynamic slicing\nA Semantic Analyzer for the Comprehension of the Spontaneous Arabic  Speech\nCombining Treewidth and Backdoors for CSP\nKeystroke dynamics as signal for shallow syntactic parsing\nA Comprehensive Comparative Study of Word and Sentence Similarity  Measures\nVietnamese Named Entity Recognition using Token Regular Expressions and  Bidirectional Inference\nAn Alternating Automaton for First-Order Linear Temporal Logic--Tech  Report\nImproving historical spelling normalization with bi-directional LSTMs  and multi-task learning\nParameterized Dataflow (Extended Abstract)\nAn empirical study for Vietnamese dependency parsing\nAn Automated System for Essay Scoring of Online Exams in Arabic based on  Stemming Techniques and Levenshtein Edit Operations\nWhen silver glitters more than gold: Bootstrapping an Italian  part-of-speech tagger for Twitter\nCharacter-level Convolutional Network for Text Classification Applied to  
Chinese Corpus\nToward Multilingual Neural Machine Translation with Universal Encoder  and Decoder\nData Minimisation: a Language-Based Approach (Long Version)\nLearning to Compose Words into Sentences with Reinforcement Learning\nCoALP-Ty'16\nInformation Extraction with Character-level Neural Networks and Free  Noisy Supervision\nGrammatical Constraints on Intra-sentential Code-Switching: From  Theories to Working Models\nNeural Networks Classifier for Data Selection in Statistical Machine  Translation\nOn incomplete and synchronizing finite sets\nShamela: A Large-Scale Historical Arabic Corpus\nProceedings Third International Workshop on Rewriting Techniques for  Program Transformations and Evaluation\nAn Environment for Analyzing Space Optimizations in Call-by-Need  Functional Languages\nA Modularity Bug in Java 8\nJSON: data model, query languages and schema specification\nLie groupoid, deformation of unstable curve, and construction of  equivariant Kuranishi charts\nDe-identification In practice\nParsing Universal Dependencies without training\nAn Introduction to Liquid Haskell\nA Data-Oriented Model of Literary Language\nSMARTies: Sentiment Models for Arabic Target Entities\nTowards a Decidable LogicWeb via Length-Bounded Derivations\nA Tutorial on Using Dafny to Construct Verified Software\nIrreducible compositions of degree two polynomials over finite fields  have regular structure\nGADTs and Exhaustiveness: Looking for the Impossible\nLearning to Parse and Translate Improves Neural Machine Translation\nJFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction\nBe Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers  from Vision\nSystèmes du LIA à DEFT'13\nAre Emojis Predictable?\nPractical Magick with C, PDL, and PDL::PP -- a guide to compiled add-ons  for PDL\nDetecting Sockpuppets in Deceptive Opinion Spam\nEffects of Limiting Memory Capacity on the Behaviour of Exemplar  Dynamics\nWhy we have switched from building 
full-fledged taxonomies to simply  detecting hypernymy relations\nJoint Learning of Correlated Sequence Labelling Tasks Using  Bidirectional Recurrent Neural Networks\nInScript: Narrative texts annotated with script information\nFrom visual words to a visual grammar: using language modelling for  image classification\nFoundations for a Probabilistic Event Calculus\nJapanese Sentiment Classification using a Tree-Structured Long  Short-Term Memory with Attention\nCompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection  Methods for Semantic Textual Similarity\nMRA - Proof of Concept of a Multilingual Report Annotator Web  Application\nConversation Modeling on Reddit using a Graph-Structured LSTM\nLean and Full Congruence Formats for Recursion\nDistributional Modeling on a Diet: One-shot Word Learning from Text Only\nTowards String-to-Tree Neural Machine Translation\nDeep Joint Entity Disambiguation with Local Neural Attention\nAutomated Sized-Type Inference and Complexity Analysis\nRedefining Context Windows for Word Embedding Models: An Experimental  Study\nReinforcement Learning with External Knowledge and Two-Stage Q-functions  for Predicting Popular Reddit Threads\nAttention Strategies for Multi-Source Sequence-to-Sequence Learning\nLearning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge  Graph Embeddings\nJoint POS Tagging and Dependency Parsing with Transition-based Neural  Networks\nOntology-Aware Token Embeddings for Prepositional Phrase Attachment\nProof Mining with Dependent Types\nCodeCity for (and by) JavaScript\nThe careless use of language in quantum information\nThe language of Stratified Sets is confluent and strongly normalising\nDeep Learning for Hate Speech Detection in Tweets\nMorphological Embeddings for Named Entity Recognition in Morphologically  Rich Languages\nDeep learning evaluation using deep linguistic processing\nMarmara Turkish Coreference Corpus and Coreference Resolution Baseline\nTranslating Event-B 
machines to Eiffel programs\nAn Automatic Approach for Document-level Topic Model Evaluation\nAn exploration to visualize finite element data with a DSL\nTHUMT: An Open Source Toolkit for Neural Machine Translation\nExtract with Order for Coherent Multi-Document Summarization\nA Deep Network with Visual Text Composition Behavior\nLIUM-CVC Submissions for WMT17 Multimodal Translation Task\nLIUM Machine Translation Systems for WMT17 News Translation Task\nDetecting Off-topic Responses to Visual Prompts\nEncoding Word Confusion Networks with Recurrent Neural Networks for  Dialog State Tracking\nHealth Analytics: a systematic review of approaches to detect phenotype  cohorts using electronic health records\nAnalogs of Linguistic Structure in Deep Representations\nMen Are from Mars, Women Are from Venus: Evaluation and Modelling of  Verbal Associations\nAnalysis of Italian Word Embeddings\nSkill2vec: Machine Learning Approach for Determining the Relevant Skills  from Job Description\nTowards Semantic Modeling of Contradictions and Disagreements: A Case  Study of Medical Guidelines\nA Comparison of Neural Models for Word Ordering\nRookie: A unique approach for exploring news archives\nNeural and Statistical Methods for Leveraging Meta-information in  Machine Translation\nEmotion Intensities in Tweets\nLeveraging Sparse and Dense Feature Combinations for Sentiment  Classification\nContinuous Representation of Location for Geolocation and Lexical  Dialectology using Mixture Density Networks\nEvaluating Word Embeddings for Sentence Boundary Detection in Speech  Transcripts\nThe CLaC Discourse Parser at CoNLL-2016\nClaC: Semantic Relatedness of Words and Phrases\nThe CLaC Discourse Parser at CoNLL-2015\nNeural Machine Translation with Extended Context\nVariational Inference for Logical Inference\nGrasping the Finer Point: A Supervised Similarity Network for Metaphor  Detection\nHypothesis Testing based Intrinsic Evaluation of Word Embeddings\nOptimizing for Measure of 
Performance in Max-Margin Parsing\nLeveraging Discourse Information Effectively for Authorship Attribution\nHuman Associations Help to Detect Conventionalized Multiword Expressions\nDeadlock detection of Java Bytecode\nRbox: an integrated R package for ATOM Editor\nGraph Convolutional Networks for Named Entity Recognition\nTowards Universal Semantic Tagging\nImproving Lexical Choice in Neural Machine Translation\nLearning Word Embeddings for Hyponymy with Entailment-Based  Distributional Semantics\nInteractive Learning of State Representation through Natural Language  Instruction and Explanation\nDailyDialog: A Manually Labelled Multi-turn Dialogue Dataset\nFindings of the Second Shared Task on Multimodal Machine Translation and  Multilingual Image Description\nRecognizing Explicit and Implicit Hate Speech Using a Weakly Supervised  Two-path Bootstrapping Approach\nImpact of Coreference Resolution on Slot Filling\nFine-tuning Tree-LSTM for phrase-level sentiment classification on a  Polish dependency treebank. 
Submission to PolEval task 2\nOn the incorporation of interval-valued fuzzy sets into the Bousi-Prolog  system: declarative semantics, implementation and applications\nh: A Plank for Higher-order Attribute Contraction Schemes\nFast Reading Comprehension with ConvNets\nSingular value automata and approximate minimization\nFast BTG-Forest-Based Hierarchical Sub-sentential Alignment\nEffective Strategies in Zero-Shot Neural Machine Translation\nDeclarativeness: the work done by something else\nEnabling Embodied Analogies in Intelligent Music Systems\nImproving Visually Grounded Sentence Representations with Self-Attention\nA Quantitative Study of Java Software Buildability\nContextualized Word Representations for Reading Comprehension\nSocial Media Writing Style Fingerprint\nLearning when to skim and when to read\nSubword and Crossword Units for CTC Acoustic Models\nAdvances in Pre-Training Distributed Word Representations\nLetter-Based Speech Recognition with Gated ConvNets\nDisentangled Representations for Manipulation of Sentiment in Text\nImage Captioning using Deep Neural Architectures\nOngoing Events in Wikipedia: A Cross-lingual Case Study\nA Survey of Word Embeddings Evaluation Methods\nTwists and Twistability\nZero-Cost Coercions for Program and Proof Reuse\nNetwork Features Based Co-hyponymy Detection\nQWIRE Practice: Formal Verification of Quantum Circuits in Coq\nRepresenting Verbs as Argument Concepts\nAutomatic Transferring between Ancient Chinese and Contemporary Chinese\nLinking ImageNet WordNet Synsets with Wikidata\nAutomatic Detection of Online Jihadist Hate Speech\nSentEval: An Evaluation Toolkit for Universal Sentence Representations\nAdvancing Connectionist Temporal Classification With Attention Modeling\nDear Sir or Madam, May I introduce the YAFC Corpus: Corpus, Benchmarks  and Metrics for Formality Style Transfer\nAttention on Attention: Architectures for Visual Question Answering  (VQA)\nReversibility of Extreme Relational Structures\nDomain 
Adaptation for Statistical Machine Translation\nETH-DS3Lab at SemEval-2018 Task 7: Effectively Combining Recurrent and  Convolutional Neural Networks for Relation Classification and Extraction\nA Survey on Neural Network-Based Summarization Methods\nA Processing Model for Free Word Order Languages\nWeak subsumption Constraints for Type Diagnosis: An Incremental  Algorithm\nA Constraint-based Case Frame Lexicon Architecture\nAlgebraic Approach to Interacting Quantum Systems\nComment on Reply of Benedetto et al\nResolution of Indirect Anaphora in Japanese Sentences Using Examples 'X  no Y (Y of X)'\nAn Usage Measure Based on Psychophysical Relations\nCan Prosody Aid the Automatic Classification of Dialog Acts in  Conversational Speech?\nProsody-Based Automatic Segmentation of Speech into Sentences and Topics\nRecursively Undecidable Properties of NP\nA theory of experiment\nLogic programming in the context of multiparadigm programming: the Oz  experience\nSmall Spans in Scaled Dimension\nToward the Implementation of Functions in the DLV System (Preliminary  Technical Report)\nSymmetry and interactivity in Programming\nDeductive Object Programming\nDependency Treebanks: Methods, Annotation Schemes and Tools\n$W_{1+\\infty}$ as a Discretization of Virasoro Algebra\nSelf Avoiding Walks, the Language of Science, and Fibonacci Numbers\nOn groups whose word problem is solved by a nested stack automaton\nSemiinfinite cohomology of Lie-* algebras\nThe Automorphism Groups of the Groups of Order 8p^2\nOn genus-change in algebraic curves over nonperfect fields\nModeling the Co-occurrence Principles of the Consonant Inventories: A  Complex Network Approach\nQuantum Communication Complexity\nTopological decomposition of composite quantum state spaces\nThe sum-over-histories formulation of quantum computing\nQuantum Mechanics in Phase Space\nSocial applications of two-dimensional Ising models\nHow to realize \"a sense of humour\" in computers ?\nCalculating Colimits 
Compositionally\nLe terme et le concept : fondements d'une ontoterminologie\nProgram Promises\nSome properties of the Ukrainian writing system\nGeometry of the Standard Model\nOffloading Cognition onto Cognitive Technology\nLogics for XML\nCubefree words with many squares\nThe Depth of a Hypersubstitution\nDejean's conjecture holds for n>=27\nA decidable policy language for history-based transaction monitoring\nA proof of Dejean's conjecture\nSlowly synchronizing automata with zero and incomplete sets\nMissing data in a stochastic Dollo model for cognate data, and its  application to the dating of Proto-Indo-European\nApproximating the minimum length of synchronizing words is hard\nA Framework for Specifying, Prototyping, and Reasoning about  Computational Systems\nWeak Kleene Algebra is Sound and (Possibly) Complete for Simulation\nLudique : une logique sans axiome d'identité\nStandardization of the formal representation of lexical information for  NLP\nFrequency of Occurrence and Information Entropy of American Sign  Language\nSyllable Analysis to Build a Dictation System in Telugu language\nEnumeration Order Reducibility\nDiversity, competition, extinction: the ecophysics of language change\nThe Cerny conjecture for one-cluster automata with prime length cycle\nThe Lambek-Grishin calculus is NP-complete\nHopf-Galois objects and cogroupoids\nNeutrino Mean Free Path in Neutron Star\nFree iterative and iteration K-semialgebras\nProceedings of CICLOPS-WLPE 2010\nGeometric Properties of Boundary Orbit Accumulation Points\nDecomposition Complexity\nBacteria inspired patterns grown with hyperbolic cellular automata\nProceedings 24th International Workshop on Unification\nStatus of GDL - GNU Data Language\nA variant of Hofstadter's sequence and finite automata\nOn primary and secondary repetitions in words\nSound and complete axiomatizations of coalgebraic language equivalence\nThe settlement of Madagascar: what dialects and languages can tell\nAttacker Control and 
Impact for Confidentiality and Integrity\nAbsoluteness of subword inequality is undecidable\nBiologically Inspired Process Calculi, Petri Nets and Membrane Computing\nSaying Hello World with GReTL - A Solution to the TTC 2011 Instructive  Case\nAutonomous push-down automaton built on DNA\nHarbingers of Artin's Reciprocity Law. III. Gauss's Lemma and Artin's  Transfer\nSingular and Plural Functions for Functional Logic Programming\nA Survey of Multi-Tape Automata\nCell decomposition and definable functions for weak p-adic structures\nModular session types for objects\nA Fast and Simple Algorithm for Training Neural Probabilistic Language  Models\nAn axiomatic look at a windmill\nExtensions of the Minimum Cost Homomorphism Problem\nTyped Answer Set Programming and Inverse Lambda Algorithms\ncphVB: A System for Automated Runtime Optimization and Parallelization  of Vectorized Applications\nThe Model of Semantic Concepts Lattice For Data Mining Of Microblogs\nVisual Recognition of Isolated Swedish Sign Language Signs\nSimplification and integration in computing and cognition: the SP theory  and the multiple alignment concept\nKnowledge Base Approach for 3D Objects Detection in Point Clouds Using  3D Processing and Specialists Knowledge\nTermhood-based Comparability Metrics of Comparable Corpus in Special  Domain\nThe most controversial topics in Wikipedia: A multilingual and  geographical analysis\nReachability in Higher-Order-Counters\nWas ist Unendlichkeit - und wenn ja, wie viele?\nJensen-type inequality for non-convex functions\nProceedings Second Workshop on Trends in Functional Programming In  Education\nCounting words with vector spaces\nAn exercise on streams: convergence acceleration\nSome criteria to check if a projective hypersurfaces is smooth or  singular\nYoneda lemma for complete Segal spaces\nTowards Focus on Time\nTransformation of UML Behavioral Diagrams to Support Software Model  Checking\nClingo = ASP + Control: Preliminary Report\nModel Checking 
Markov Chains Against Unambiguous Buchi Automata\nNon-termination using Regular Languages\nPy-oopsi: the python implementation of the fast-oopsi algorithm\nWeighted finite automata with output\nDistributed Representations for Compositional Semantics\nModel theory of special subvarieties and Schanuel-type conjectures\nEquation $x^iy^jx^k=u^iv^ju^k$ in words\nModular Action Language ALM\nClassifying informative and imaginative prose using complex networks\nMost Complex Regular Ideal Languages\nLearning Articulated Motion Models from Visual and Lingual Signals\nBlackOut: Speeding up Recurrent Neural Network Language Models With Very  Large Vocabularies\nLiveness-Based Garbage Collection for Lazy Languages\nClose Encounters of the Higher Kind Emulating Constructor Classes in  Standard ML\nThe Frobenius problem for the shuffle operation\nA Simulation and Modeling of Access Points with Definition Language\nDetecting Data Races on OpenCL Kernels with Symbolic Execution\nProceedings 7th Workshop on Programming Language Approaches to  Concurrency and Communication-cEntric Software\nReconciliation of RDF* and Property Graphs\nDerived $(\\infty,1)$-categories of two kinds\nLogic programming beyond Prolog\nFunctional Automata - Formal Languages for Computer Science Students\nEmbedding Word Similarity with Neural Machine Translation\nOn Subword Complexity of Morphic Sequences\nContext-free Algorithms\nYesWorkflow: A User-Oriented, Language-Independent Tool for Recovering  Workflow Information from Scripts\nKickstarting Choreographic Programming\nFully bordered words\nAnalysis of Stopping Active Learning based on Stabilizing Predictions\nOn the \"Naturalness\" of Buggy Code\nThe ultimate tactics of self-referential systems\nAvoidability index for binary patterns with reversal\nGenomic study of the Ket: a Paleo-Eskimo-related ethnic group with  significant ancient North Eurasian ancestry\nOn McKay's propagation theorem for the Foulkes conjecture\nRehearsal: A Configuration 
Verification Tool for Puppet\nCARMA: Collective Adaptive Resource-sharing Markovian Agents\nNeural Enquirer: Learning to Query Tables with Natural Language\nHenselianity in the language of rings\nPractical State Machines for Computer Software and Engineering\nThe logic of the reverse mathematics zoo\nComputational Soundness Results for Stateful Applied pi Calculus\nTowards Turkish ASR: Anatomy of a rule-based Turkish g2p\nCompleteness and the ZX-calculus\nLexical bundles in computational linguistics academic literature\nThe trace monoids in the queue monoid and in the direct product of two  free monoids\nModel Theory of Adeles I\nCAIR: Using Formal Languages to Study Routing, Leaking, and Interception  in BGP\nValued modules over skew polynomial rings 1\nGoing Deeper for Multilingual Visual Sentiment Detection\nThe Meaning of Null in Databases and Programming Languages\nShortest Trajectories and Reversibility in Boolean Automata Networks\nThe LAMBADA dataset: Word prediction requiring a broad discourse context\nHierarchical Neural Language Models for Joint Representation of  Streaming Documents and their Content\nLowering IrGL to CUDA\nNovel Word Embedding and Translation-based Language Modeling for  Extractive Speech Summarization\nThe weakest nontrivial idempotent equations\nOn the Existence of Weak One-Way Functions\nMultiplex lexical networks reveal patterns in early word acquisition in  children\nSentiment Analysis on Bangla and Romanized Bangla Text (BRBT) using Deep  Recurrent models\nFully Character-Level Neural Machine Translation without Explicit  Segmentation\nVisually pleasing knot projections in R\nExploiting Sentence and Context Representations in Deep Neural Models  for Spoken Language Understanding\nFoundations of Modern Query Languages for Graph Databases\nTopological aspects of the multi-language phases of the Naming Game on  community-based networks\nApplication of Case-Based Teaching and Learning in Compiler Design  Course\nMultilingual 
Knowledge Graph Embeddings for Cross-lingual Knowledge  Alignment\nMoore: Interval Arithmetic in Modern C++\nMultilingual Multiword Expressions\nImproving the Performance of Neural Machine Translation Involving  Morphologically Rich Languages\nFew more Comments on Benford's Law\nAmbiguity and Incomplete Information in Categorical Models of Language\nPrimitivity, Uniform Minimality and State Complexity of Boolean  Operations\nProceedings ML Family / OCaml Users and Developers workshops\nÉtude sur les portails et agrégateurs des ressources pédagogiques  universitaires francophones en accès libre\nEfficient Analytical Queries on Semantic Web Data Cubes\nAnalysis of the Ratio $D(n)/n$\nProfinite semigroups\nOn the Implementation of a Scalable Simulator for Multiscale  Hybrid-Mixed Methods\nReply to L.Vaidman's comment on \"Asking photons where they have been in  plain language\"\nPay Attention to Those Sets! Learning Quantification from Images\nAccurately and Efficiently Interpreting Human-Robot Instructions of  Varying Granularities\nLearning Convolutional Text Representations for Visual Question  Answering\nDefinable sets up to definable bijections in Presburger groups\nSix Challenges for Neural Machine Translation\nStatistical Inferences for Polarity Identification in Natural Language\nThe Complex Negotiation Dialogue Game\nA characterization of strongly dependent ordered Abelian groups\nThe Argument Reasoning Comprehension Task: Identification and  Reconstruction of Implicit Warrants\nDiSAN: Directional Self-Attention Network for RNN/CNN-Free Language  Understanding\nSKOS Concepts and Natural Language Concepts: an Analysis of Latent  Relationships in KOSs\nIdentifying Phrasemes via Interlingual Association Measures -- A  Data-driven Approach on Dependency-parsed and Word-aligned Parallel Corpora\nExtracting Ontological Knowledge from Textual Descriptions\nExpressing and verifying embedded software requirements\nA Sequential Neural Encoder with Latent Structured 
Description for  Modeling Sentences\nAicyber's System for NLPCC 2017 Shared Task 2: Voting of Baselines\nProof Complexity Meets Algebra\nLanguage Bootstrapping: Learning Word Meanings From Perception-Action  Association\nEfficient reduction of nondeterministic automata with application to  language inclusion testing\nThe Maximal MAM, a Reasonable Implementation of the Maximal Strategy\nA Practical Approach for Detecting Logical Error in Object Oriented  Environment\nCoarse homology theories and finite decomposition complexity\nDual Long Short-Term Memory Networks for Sub-Character Representation  Learning\nRooted Divergence-Preserving Branching Bisimilarity is a Congruence\nA Stitch in Time Saves Nine -- SPARQL querying of Property Graphs using  Gremlin Traversals\nOn B. Mossé's unilateral recognizability theorem\nForest Categories\nClassifying medical notes into standard disease codes using Machine  Learning\nWhen Good Components Go Bad: Formally Secure Compilation Despite Dynamic  Compromise\nQuantitative Fine-Grained Human Evaluation of Machine Translation  Systems: a Case Study on English to Croatian\nRecurrent Neural Network-Based Semantic Variational Autoencoder for  Sequence-to-Sequence Learning\nDynamic Natural Language Processing with Recurrence Quantification  Analysis\nStaQC: A Systematically Mined Question-Code Dataset from Stack Overflow\nSocioeconomic Dependencies of Linguistic Patterns in Twitter: A  Multivariate Analysis\nThe Computational Complexity of Symbolic Dynamics at the Onset of Chaos\nPearl: A Probabilistic Chart Parser\nDetermination of referential property and number of nouns in Japanese  sentences for machine translation into English\nDetecting and Correcting Speech Repairs\nExploring the Statistical Derivation of Transformational Rule Sequences  for Part-of-Speech Tagging\nDISCO---An HPSG-based NLP System and its Application for Appointment  Scheduling (Project Note)\nBuilding a Large-Scale Knowledge Base for Machine Translation\nA 
Probabilistic Model of Compound Nouns\nSituated Modeling of Epistemic Puzzles\nTowards an Automatic Dictation System for Translators: the TransTalk  Project\nA Centering Approach to Pronouns\nDilemma - An Instant Lexicographer\nA Freely Available Wide Coverage Morphological Analyzer for English\nAcquiring Knowledge from Encyclopedic Texts\nAdaptive Sentence Boundary Disambiguation\nDependency Grammar and the Parsing of Chinese Sentences\nCoupling Phonology and Phonetics in a Constraint-Based Gestural Model\nUsing default inheritance to describe LTAG\nProFIT: Prolog with Features, Inheritance and Templates\nSpecifying a shallow grammatical representation for parsing purposes\nBi-directional memory-based dialog translation: The KEMDT approach\nAn NLP Approach to a Specific Type of Texts: Car Accident Reports\nTagging French -- comparing a statistical and a constraint-based method\nLinear Logic for Meaning Assembly\nRobust Parsing Based on Discourse Information: Completing partial parses  of ill-formed sentences on the basis of discourse information\nAutomatic Evaluation and Uniform Filter Cascades for Inducing N-Best  Translation Lexicons\nA Study of the Context(s) in a Specific Type of Texts: Car Accident  Reports\nDisambiguating bilingual nominal entries against WordNet\nIncorporating Discourse Aspects in English -- Polish MT: Towards Robust  Implementation\nAutomatic Identification of Support Verbs: A Step Towards a Definition  of Semantic Weight\nDisambiguating Noun Groupings with Respect to WordNet Senses\nAutomatic Inference of DATR Theories\nReport of the Study Group on Assessment and Evaluation\nThe Measure of a Model\nTowards a Workbench for Acquisition of Domain Knowledge from Natural  Language\nTowards Understanding Spontaneous Speech: Word Accuracy vs. 
Concept  Accuracy\nDirected Replacement\nMinimizing Manual Annotation Cost In Supervised Training From Corpora\nBeyond Word N-Grams\nApplying Winnow to Context-Sensitive Spelling Correction\nA Sign-Based Phrase Structure Grammar for Turkish\nInferring Acceptance and Rejection in Dialogue by Default Rules of  Inference\nA Faster Structured-Tag Word-Classification Method\nDialogos: a Robust System for Human-Machine Spoken Dialogue on the  Telephone\nSequential Model Selection for Word Sense Disambiguation\nQuantitative Constraint Logic Programming for Weighted Grammar  Applications\nThe TreeBanker: a Tool for Supervised Training of Parsed Corpora\nExemplar-Based Word Sense Disambiguation: Some Recent Improvements\nApplying Reliability Metrics to Co-Reference Annotation\nA Linear Observed Time Statistical Parser Based on Maximum Entropy  Models\nA Model of Lexical Attraction and Repulsion\nAn Empirical Approach to Temporal Reference Resolution\nA Lexicalist Approach to the Translation of Colloquial Text\nA Word-to-Word Model of Translational Equivalence\nDiscourse Preferences in Dynamic Logic\nStressed and Unstressed Pronouns: Complementary Preferences\nExperiences with the GTU grammar development environment\nSimilarity-Based Approaches to Natural Language Processing\nProbabilistic Event Categorization\nApproximating Context-Free Grammars with a Finite-State Calculus\nProbabilistic Parsing Using Left Corner Language Models\nMulti-document Summarization by Graph Search and Matching\nIdentifying Discourse Markers in Spoken Dialog\nAutomating Coreference: The Role of Annotated Training Data\nThe Proper Treatment of Optimality in Computational Phonology\nModels of Co-occurrence\nComputing Dialogue Acts from Features with Transformation-Based Learning\nAn Investigation of Transformation-Based Learning in Discourse\nPartial Evaluation for Efficient Access to Inheritance Lexicons\nError-Driven Pruning of Treebank Grammars for Base Noun Phrase  Identification\nSome 
Ontological Principles for Designing Upper Level Lexical Resources\nWhat's in a name?\nSpoken Language Dialogue Systems and Components: Best practice in  development and evaluation (DISC 24823) - Periodic Progress Report 1: Basic  Details of the Action\nThe Boolean Hierarchy over Level 1/2 of the Straubing-Therien Hierarchy\nResources for Evaluation of Summarization Techniques\nSemi-Automatic Indexing of Multilingual Documents\nTwo-way finite automata with quantum and classical states\nAn Estimate of Referent of Noun Phrases in Japanese Sentences\ncc-Golog: Towards More Realistic Logic-Based Robot Controllers\nMultimethods and separate static typechecking in a language with  C++-like object model\nExploiting Diversity for Natural Language Parsing\nTurning Speech Into Scripts\nEntropy-based Pruning of Backoff Language Models\nJapanese Probabilistic Information Retrieval Using Location and Category  Information\nFinding consensus in speech recognition: word error minimization and  other applications of confusion networks\nEquiX---A Search and Query Language for XML\nQuantum Multi-Prover Interactive Proof Systems with Limited Prior  Entanglement\nMulti-Syllable Phonotactic Modelling\nRobust Probabilistic Predictive Syntactic Processing\nTransformation-Based Learning in the Fast Lane\nMultidimensional Transformation-Based Learning\nClasses for Fast Maximum Entropy Training\nConvergent Approximate Solving of First-Order Constraints by Approximate  Quantifiers\nPortability of Syntactic Structure for Language Modeling\nInformation Extraction Using the Structured Language Model\nProbabilistic asynchronous pi-calculus\nEquiX--A Search and Query Language for XML\nProliferation of SDDS Support for Various Platforms and Languages\nTableTrans, MultiTrans, InterTrans and TreeTrans: Diverse Tools Built on  the Annotation Graph Toolkit\nSoft Concurrent Constraint Programming\nRerendering Semantic Ontologies: Automatic Extensions to UMLS through  Corpus Analytics\nMining the Web 
for Synonyms: PMI-IR versus LSA on TOEFL\nApproximate Grammar for Information Extraction\nQuanta: a Language for Modeling and Manipulating Information Structures\nCollaborative Creation of Digital Content in Indian Languages\nCSIEC (Computer Simulator in Educational Communication): An Intelligent  Web-Based Teaching System for Foreign Language Learning\nGreedy Algorithms in Datalog\nA system for reflection in C++\nFinite-Tree Analysis for Constraint Logic-Based Languages: The Complete  Unabridged Version\nSchema-based Scheduling of Event Processors and Buffer Minimization for  Queries on Structured Data Streams\nA CHR-based Implementation of Known Arc-Consistency\nA new architecture for making highly scalable applications\nAn Introduction to the Summarization of Evolving Events: Linear and  Non-linear Evolution\nProgramming Finite-Domain Constraint Propagators in Action Rules\nATNoSFERES revisited\nOpenVanilla - A Non-Intrusive Plug-In Framework of Text Services\nSummarizing Reports on Evolving Events; Part I: Linear Evolution\nHaskell's overlooked object system\nNonmonotonic Trust Management for P2P Applications\nForward slicing of functional logic programs by partial evaluation\nCombining Relational Algebra, SQL, Constraint Modelling, and Local  Search\nConstraint Functional Logic Programming over Finite Domains\nCompositional Semantics for the Procedural Interpretation of Logic\nLanguage Support for Optional Functionality\nOn the Complexity of Limit Sets of Cellular Automata Associated with  Probability Measures\nMultilingual person name recognition and transliteration\nVerification, Validation and Integrity of Distributed and Interchanged  Rule Based Policies and Contracts in the Semantic Web\nMemory and compiler optimizations for low-power and -energy\nLanguage, logic and ontology: uncovering the structure of commonsense  knowledge\nLiveness of Heap Data for Functional Programs\nAPEmille: a parallel processor in the teraflop range\nThe Wholeness Axioms and 
V=HOD\nWikipedias: Collaborative web-based encyclopedias as complex networks\nNetwork properties of written human language\nSelf-organization of the Sound Inventories: Analysis and Synthesis of  the Occurrence and Co-occurrence Networks of Consonants\nHow Difficult is it to Develop a Perfect Spell-checker? A  Cross-linguistic Analysis through Complex Network Approach\nClassical Concepts in Quantum Programming\nThe language of Einstein spoken by optical instruments\nPhysical propositions and quantum languages\nTowards Understanding the Origin of Genetic Languages\nSeparation Logic for Small-step Cminor\nEmergence of Scale-Free Syntax Networks\nThe structure of verbal sequences analyzed with unsupervised learning  techniques\nAutomated Synthesis of Assertion Monitors using Visual Specifications\nAmélioration des Performances des Systèmes Automatiques de  Reconnaissance de la Parole pour la Parole Non Native\nVery strict selectional restrictions\nAcoustic Features and Perceptive Cues of Songs and Dialogues in Whistled  Speech: Convergences with Sung Speech\nValence extraction using EM selection and co-occurrence matrices\nBorel Ranks and Wadge Degrees of Context Free Omega Languages\nFramework and Resources for Natural Language Parser Evaluation\nAn omega-power of a context-free language which is Borel above  Delta^0_omega\nProgramming an interpreter using molecular dynamics\nA Comparison of natural (english) and artificial (esperanto) languages.  A Multifractal method based analysis\nRobustness Evaluation of Two CCG, a PCFG and a Link Grammar Parsers\nComplexity of Combinatorial Market Makers\nEfficient Algorithms for Membership in Boolean Hierarchies of Regular  Languages\nThe Abella Interactive Theorem Prover (System Description)\nTowards a stable definition of Kolmogorov-Chaitin complexity\nRespect My Authority! 
HITS Without Hyperlinks, Utilizing Cluster-Based  Language Models\nGraph Algorithms for Improving Type-Logical Proof Search\nModeling the Structure and Dynamics of the Consonant Inventories: A  Complex Network Approach\nAn overview of QML with a concrete implementation in Haskell\nA Layered Grammar Model: Using Tree-Adjoining Grammars to Build a Common  Syntactic Kernel for Related Dialects\nA sound spatio-temporal Hoare logic for the verification of structured  interactive programs with registers and voices\nThe Application of Fuzzy Logic to Collocation Extraction\nApproaching the linguistic complexity\nNew Confidence Measures for Statistical Machine Translation\nHilbert's epsilon as an Operator of Indefinite Committed Choice\nNetwork of two-Chinese-character compound words in Japanese language\nDecompositions of Grammar Constraints\nDeterministic pushdown automata and unary languages\nA Theory of Explicit Substitutions with Safe and Full Composition\nTowards Automated Deduction in Blackmail Case Analysis with Forensic  Lucid\nNLP-SIR: A Natural Language Approach for Spreadsheet Information  Retrieval\nFrom Declarative Languages to Declarative Processing in Computer Games\nA Random Matrix Approach to Language Acquisition\nThe Complexity of Infinite Computations In Models of Set Theory\nClassification with Tarskian system executions (Bakery Algorithms as an  example)\nA Graph Model for Imperative Computation\nExpressing the Behavior of Three Very Different Concurrent Systems by  Using Natural Extensions of Separation Logic\nA New Look at the Classical Entropy of Written English\nCove: A Practical Quantum Computer Programming Framework\nDistributed Quantum Programming\nLinear Recursion\nTowards a Unified Framework for Declarative Structured Communications\nProceedings Sixth Workshop on Structural Operational Semantics\nSyntactic Topic Models\nA Mathematical Approach to the Study of the United States Code\nRelating Nominal and Higher-order Abstract Syntax 
Specifications\nSpoken Language Identification Using Hybrid Feature Extraction Methods\nVerification of Object-Oriented Programs: a Transformational Approach\nAn Overview: Extensible Markup Language Technology\nInflection system of a language as a complex network\nSpace and the Synchronic A-Ram\nDon't 'have a clue'? Unsupervised co-learning of downward-entailing  operators\nFunctorial Data Migration\nTableaux for the Lambek-Grishin calculus\nPortability of Prolog programs: theory and case-studies\nA Short Decidability Proof for DPDA Language Equivalence via First-Order  Grammars\nHistory-sensitive versus future-sensitive approaches to security in  distributed systems\nExpressiveness modulo Bisimilarity of Regular Expressions with Parallel  Composition (Extended Abstract)\nLamplighter Random Walks and Entropy-Sensitivity of Languages\nMiniAgda: Integrating Sized and Dependent Types\nA Symbolic Transformation Language and its Application to a Multiscale  Method\nSynthese des Controleurs Optimaux pour les Systemes a Evenements  Discrets\nFinite state verifiers with constant randomness\nA system of relational syllogistic incorporating full Boolean reasoning\nHigher-Order Symbolic Execution via Contracts\nSelf reference in word definitions\nLanguages invariant under more symmetries: overlapping factors versus  palindromic richness\nComplexity Results for Modal Dependence Logic\nSeeking Meaning in a Space Made out of Strokes, Radicals, Characters and  Compounds\nSystematic Abstraction of Abstract Machines\nThe Magic of Logical Inference in Probabilistic Programming\nModelling and Simulation of Asynchronous Real-Time Systems using Timed  Rebeca\nUnit Testing in ASPIDE\nParsing Combinatory Categorial Grammar with Answer Set Programming:  Preliminary Report\nA Framework for Devanagari Script-based Captcha\nDevnagari document segmentation using histogram approach\nSLALOM: a Language for SLA specification and monitoring\n3D Model Retrieval Based on Semantic and Shape 
Indexes\nManaging Communication Latency-Hiding at Runtime for Parallel  Programming Languages and Libraries\nIrrelevance, Heterogeneous Equality, and Call-by-value Dependent Type  Systems\nProceedings 8th Workshop on Fixed Points in Computer Science\nLattices of Logical Fragments over Words\nQueries with Guarded Negation (full version)\nA Programmer-Centric Approach to Program Verification in ATS\nThe FO^2 alternation hierarchy is decidable\nInformation Retrieval Systems Adapted to the Biomedical Domain\nCommunication Language Specifications For Digital Ecosystems\nManagement Language Specifications For Digital Ecosystems\nAn Efficient Finite Tree Automata Library\nThe non-algorithmic side of the mind\nOTS/CafeOBJ2JML: An attempt to combine Design By Contract with  Behavioral Specifications\nFASTSUBS: An Efficient and Exact Procedure for Finding the Most Likely  Lexical Substitutes Based on an N-gram Language Model\nA Machine Learning Approach For Opinion Holder Extraction In Arabic  Language\nMultilingual Medical Documents Classification Based on MesH Domain  Ontology\nWhat is Statistics?; The Answer by Quantum Language\nFlexible Dynamic Information Flow Control in the Presence of Exceptions\nArabic CALL system based on pedagogically indexed text\nIsomorphisms of types in the presence of higher-order references  (extended version)\nPrograming Using High Level Design With Python and FORTRAN: A Study Case  in Astrophysics\nA prototype for projecting HPSG syntactic lexica towards LMF\nRecent Technological Advances in Natural Language Processing and  Artificial Intelligence\nA Procedure for Splitting Processes and its Application to Coordination\nProceedings Sixth Workshop on Formal Languages and Analysis of  Contract-Oriented Software\nHigher-Order Pushdown Systems with Data\nA decidable quantified fragment of set theory with ordered pairs and  some undecidable extensions\nLightweight compilation of (C)LP to JavaScript\nThe complexity of finite-valued 
CSPs\nRecognizing Static Signs from the Brazilian Sign Language: Comparing  Large-Margin Decision Directed Acyclic Graphs, Voting Support Vector Machines  and Artificial Neural Networks\nA Hindi Speech Actuated Computer Interface for Web Search\nParallel BioScape: A Stochastic and Parallel Language for Mobile and  Spatial Interactions\nMahotas: Open source software for scriptable computer vision\nMeasuring Time in Sporting Competitions with the Domain-Specific  Language EasyTime\nModel completeness of o-minimal fields with convex valuations\nSemantics and Security Issues in JavaScript\nDiachronic Variation in Grammatical Relations\nClassifier Fusion Method to Recognize Handwritten Kannada Numerals\nIdentifying trends in word frequency dynamics\nTyping Context-Dependent Behavioural Variation\nStatic and dynamic semantics of NoSQL languages\nAutomatic lexical semantic classification of nouns\nUsing Mathematica & Matlab for CAGD/CAD research and education\nA Dataflow Language for Decentralised Orchestration of Web Service  Workflows\nWhere the \"it from bit\" come from?\nRecognition of Indian Sign Language in Live Video\nKolmogorov Complexity of Categories\nOn a compact encoding of the swap automaton\nOn the state complexity of semi-quantum finite automata\nConversion of Braille to Text in English, Hindi and Tamil Languages\nRule Based Transliteration Scheme for English to Punjabi\nFunctional framework for representing and transforming quantum channels\nMeta SOS - A Maude Based SOS Meta-Theory Framework\nWhat is Decidable about Partially Observable Markov Decision Processes  with ω-Regular Objectives\nSimulation of Two-Way Pushdown Automata Revisited\nAlternating Turing machines for inductive languages\nOn the Semantics of ReFLect as a Basis for a Reflective Theorem Prover\nJRC-Names: A freely available, highly multilingual named entity resource\nStep-Indexed Relational Reasoning for Countable Nondeterminism\nTo parallelize or not to parallelize, bugs issue\nChecking 
Race Freedom of Clocked X10 Programs\nFlow analysis, linearity, and PTIME\nOn the Expressiveness of TPTL and MTL over ω-Data Words\nEmpowering Evolving Social Network Users with Privacy Rights\nMonotonic References for Gradual Typing\nThe Glasgow Parallel Reduction Machine: Programming Shared-memory  Many-core Systems using Parallel Task Composition\nMinimising virtual machine support for concurrency\nDynamics in atomic signaling games\nOn Verifying Resource Contracts using Code Contracts\nDesign & Development of the Graphical User Interface for Sindhi Language\nFrom Safety To Termination And Back: SMT-Based Verification For Lazy  Languages\nTimed Soft Concurrent Constraint Programs: An Interleaved and a Parallel  Approach\nOn the penetration distance in Garside monoids\nHPS: a hierarchical Persian stemming method\nVerifying Web Applications: From Business Level Specifications to  Automated Model-Based Testing\nApplication of Ontologies in Identifying Requirements Patterns in Use  Cases\nA Convolutional Neural Network for Modelling Sentences\nComputer Simulation Codes for the Quine-McCluskey Method of Logic  Minimization\nOn Backdoors To Tractable Constraint Languages\nMultilingual Models for Compositional Distributed Semantics\nMultiplicative Bidding in Online Advertising\nRepresentation of a Sentence using a Polar Fuzzy Neutrosophic Semantic  Net\nData-flow Analysis of Programs with Associative Arrays\nSimulating dynamic systems using Linear Time Calculus theories\nA Study of Entanglement in a Categorical Framework of Natural Language\nLes mathématiques de la langue : l'approche formelle de Montague\nFormal Consistency Checking over Specifications in Natural Languages\nLearning to Exploit Different Translation Resources for Cross Language  Information Retrieval\nError Reporting in Parsing Expression Grammars\nAIOCJ: A Choreographic Framework for Safe Adaptive Distributed  Applications\nMerlin: A Language for Provisioning Network Resources\nJoint Energy-based 
Detection and Classificationon of Multilingual Text  Lines\nCrowdsourcing Dialect Characterization through Twitter\nFirst-Pass Large Vocabulary Continuous Speech Recognition using  Bi-Directional Recurrent DNNs\nLanguage-based Examples in the Statistics Classroom\nHybrid approaches for automatic vowelization of Arabic texts\nAnalysis of Named Entity Recognition and Linking for Tweets\nExperiments to Improve Named Entity Recognition on Turkish Tweets\nProgram certification with computational effects\nCoarse-grained Cross-lingual Alignment of Comparable Texts with Topic  Models and Encyclopedic Knowledge\nOptimisation using Natural Language Processing: Personalized Tour  Recommendation for Museums\nProceedings XIV Jornadas sobre Programación y Lenguajes\nXQOWL: An Extension of XQuery for OWL Querying and Reasoning\nPhrase Based Language Model for Statistical Machine Translation:  Empirical Study\nThe effect of using facebook markup language (fbml) for designing an  e-learning model in higher education\nA unified framework for modeling and implementation of hybrid systems  with synchronous controllers\n!-Graphs with Trivial Overlap are Context-Free\nA Fixed-Size Encoding Method for Variable-Length Sequences with its  Application to Neural Network Language Models\nA First Class Boolean Sort in First-Order Theorem Proving and TPTP\nWeb ontology representation and reasoning via fragments of set theory\nA Frobenius Model of Information Structure in Categorical Compositional  Distributional Semantics\nSQL for SRL: Structure Learning Inside a Database System\nLDQL: A Query Language for the Web of Linked Data (Extended Version)\nProceedings Tenth International Workshop on Logical Frameworks and Meta  Languages: Theory and Practice\nResolving References to Objects in Photographs using the  Words-As-Classifiers Model\nMarkovian language model of the DNA and its information content\nA Graph Traversal Based Approach to Answer Non-Aggregation Questions  Over DBpedia\nPrevalence 
and recoverability of syntactic parameters in sparse  distributed memories\nA Formal Model for Direct-style Asynchronous Observables\nLowering the learning curve for declarative programming: a Python API  for the IDP system\nData-driven Workflows for Microservices\nDeep Reinforcement Learning with a Natural Language Action Space\nA framework for deadlock detection in core ABS\nsense2vec - A Fast and Accurate Method for Word Sense Disambiguation In  Neural Word Embeddings\nThe Journal Coverage of Web of Science and Scopus: a Comparative  Analysis\nProceedings 6th Workshop on Mathematically Structured Functional  Programming\nCharacter-Level Neural Translation for Multilingual Media Monitoring in  the SUMMA Project\nDistributed Entity Disambiguation with Per-Mention Learning\nSweLL on the rise: Swedish Learner Language corpus for European  Reference Level studies\nA Hybrid Approach to Query Answering under Expressive Datalog+/-\nQuantum Algorithms for Compositional Natural Language Processing\nPrecise Complexity Guarantees for Pointer Analysis via Datalog with  Extensions\nBi-directional Attention with Agreement for Dependency Parsing\nEncoder-decoder with Focus-mechanism for Sequence Labelling Based Spoken  Language Understanding\nAn Erlang Implementation of Multiparty Session Actors\nWikiReading: A Novel Large-scale Language Understanding Task over  Wikipedia\nMiniZinc with Strings\nRedefining part-of-speech classes with distributional semantic models\nUndecidability of the Lambek calculus with subexponential and bracket  modalities\nNeural versus Phrase-Based Machine Translation Quality: a Case Study\nUsing Distributed Representations to Disambiguate Biomedical and  Clinical Concepts\nType Inference for Static Compilation of JavaScript (Extended Version)\nUtilizing Large Scale Vision and Text Datasets for Image Segmentation  from Referring Expressions\nAmerican Sign Language fingerspelling recognition from video: Methods  for unrestricted recognition and 
signer-independence\nMultilingual lexicon design tool and database management system for MT\nExcess entropy in natural language: present state and perspectives\nApproximating Petri Net Reachability Along Context-free Traces\nOn the Limitations of Provenance for Queries With Difference\nGarbage Collection for Multicore NUMA Machines\nSemantic Vector Machines\nSNEG - Mathematica package for symbolic calculations with  second-quantization-operator expressions\nTowards cross-lingual alerting for bursty epidemic events\nIdentifying Reference Objects by Hierarchical Clustering in Java  Environment\nProceedings Third Workshop on Programming Language Approaches to  Concurrency and communication-cEntric Software\nTopological Logics with Connectedness over Euclidean Spaces\nCreating a Live, Public Short Message Service Corpus: The NUS SMS Corpus\nIsomorphisms of types in the presence of higher-order references\nA cookbook of translating English to Xapi\nInteraction Nets in Russian\nThe Rational and Computational Scope of Probabilistic Rule-Based Expert  Systems\nOn the Relation between Context-Free Grammars and Parsing Expression  Grammars\nSpaces, Trees and Colors: The Algorithmic Landscape of Document  Retrieval on Sequences\nBlind-date Conversation Joining\nLocal Type Checking for Linked Data Consumers\nBounded-Choice Statements for User Interaction in Imperative and  Object-Oriented Programming\nPhase transition and fast agreement in Naming Game with preference for  multi-word agents\nProceedings Workshop on Fixed Points in Computer Science\nNatural Language Inference for Arabic Using Extended Tree Edit Distance  with Subtrees\nEvent Structure of Transitive Verb: A MARVS perspective\nA new model for Context-Oriented Programs\nBridging the gap between Legal Practitioners and Knowledge Engineers  using semi-formal KR\nProceedings Twelfth International Workshop on Quantitative Aspects of  Programming Languages and Systems\nStochastically timed predicate-based communication 
primitives for  autonomic computing\nPOS Tagging and its Applications for Mathematics\nRelating the Time Complexity of Optimization Problems in Light of the  Exponential-Time Hypothesis\nZipf's law holds for phrases, not words\nFirst-order definable string transformations\nSemantically Configurable Consistency Analysis for Class and Object  Diagrams\nTo found or not to found: that is the question\nAn Approach to Reducing Annotation Costs for BioNLP\nA Method for Stopping Active Learning Based on Stabilizing Predictions  and the Need for User-Adjustable Stopping\nModel Driven Testing of Time Sensitive Distributed Systems\nSemantically-Informed Syntactic Machine Translation: A Tree-Grafting  Approach\nTowards a Formalization of the Unified Modeling Language\nArabic Spelling Correction using Supervised Learning\nSkip-gram Language Modeling Using Sparse Non-negative Matrix Probability  Estimation\nDifferent Similarities\nLearn Physics by Programming in Haskell\nNew results on classical and quantum counter automata\nGrammar as a Foreign Language\nA Fuzzy Based Model to Identify Printed Sinhala Characters (ICIAfS14)\nJif: Language-based Information-flow Security in Java\nAuthorship recognition via fluctuation analysis of network topology and  word intermittency\nNecessary conditions for tractability of valued CSPs\nLocally-Oriented Programming: A Simple Programming Model for  Stencil-Based Computations on Multi-Level Distributed Memory Architectures\nType Classes for Lightweight Substructural Types\nTransducer Descriptions of DNA Code Properties and Undecidability of  Antimorphic Problems\nContext-Dependent Translation Selection Using Convolutional Neural  Network\nOn Using Monolingual Corpora in Neural Machine Translation\nNon-normal modalities in variants of Linear Logic\nLong Short-Term Memory Over Tree Structures\nSign Language Fingerspelling Classification from Depth and Color Images  using a Deep Belief Network\nFactorization in Formal Languages\nEquivalence of 
Deterministic Top-Down Tree-to-String Transducers is  Decidable\nA Query Language for Multi-version Data Web Archives\nEquational reasoning with context-free families of string diagrams\nDesign Issues of JPQ: a Pattern-based Query Language for Document  Databases\nLexical Translation Model Using a Deep Neural Network Architecture\nRecurrent Neural Networks with External Memory for Language  Understanding\nDiversity in Spectral Learning for Natural Language Parsing\nTraversing Knowledge Graphs in Vector Space\nModeling Order in Neural Word Embeddings at Scale\nTeaching Machines to Read and Comprehend\nAlgebraic Characterization of Forest Logics\nGradual Certified Programming in Coq\nTowards a unified query language for provenance and versioning\nEditorial for the First Workshop on Mining Scientific Papers:  Computational Linguistics and Bibliometrics\nA model of language inflection graphs\nLanguage Understanding for Text-based Games Using Deep Reinforcement  Learning\nEquations for Hereditary Substitution in Leivant's Predicative System F:  A Case Study\nListen, Attend and Spell\nLearning Structural Kernels for Natural Language Processing\nDepth-Gated LSTM\nTowards Enabling Overture as a Platform for Formal Notation IDEs\nEnd-to-End Attention-based Large Vocabulary Speech Recognition\nEuskahaldun: Euskararen Aldeko Martxa Baten Sare Sozialetako Islaren  Bilketa eta Analisia\nCrossings as a side effect of dependency lengths\nOn Compensation Primitives as Adaptable Processes\nOn TimeML-Compliant Temporal Expression Extraction in Turkish\nSplitting Compounds by Semantic Analogy\nTelugu OCR Framework using Deep Learning\nLevel Two of the Quantifier Alternation Hierarchy over Infinite Words\nAn IMS DSL Developed at Ericsson\nDeep Multimodal Embedding: Manipulating Novel Objects with Point-clouds,  Language and Trajectories\nWhat Makes it Difficult to Understand a Scientific Literature?\nImproving Type Error Messages in OCaml\nVisibly Linear Dynamic Logic\nProceedings XV 
Jornadas sobre Programación y Lenguajes\nService Choreography, SBVR, and Time\nNodIO, a JavaScript framework for volunteer-based evolutionary  algorithms : first results\nDynamic Games and Strategies\nSemantics for probabilistic programming: higher-order functions,  continuous distributions, and soft constraints\nA Kernel Independence Test for Geographical Language Variation\nLong Short-Term Memory-Networks for Machine Reading\nWASSUP? LOL : Characterizing Out-of-Vocabulary Words in Twitter\nThe IMP game: Learnability, approximability and adversarial learning  beyond $Σ^0_1$\nSimple Search Algorithms on Semantic Networks Learned from Language Use\nComplexity of regular abstractions of one-counter languages\nPrecise subtyping for synchronous multiparty sessions\nA Language for the Declarative Composition of Concurrent Protocols\nA Simplified Stabilizer ZX-calculus\nOne-Counter Automata with Counter Observability\nPCA Method for Automated Detection of Mispronounced Words\nAdaptive Frequency Cepstral Coefficients for Word Mispronunciation  Detection\nIdentification of Parallel Passages Across a Large Hebrew/Aramaic Corpus\nCharacter-based Neural Machine Translation\nNeural Architectures for Named Entity Recognition\nPart-of-Speech Tagging for Historical English\nPersonalized Speech recognition on mobile devices\nDSCMC: Distributed Stateless Code Model Checker\nA Signaling Game Approach to Databases Querying and Interaction\nMulti-Task Cross-Lingual Sequence Tagging from Scratch\nStatic and Dynamic Feature Selection in Morphosyntactic Analyzers\nSemi-supervised Word Sense Disambiguation with Neural Models\nThe Anatomy of a Search and Mining System for Digital Archives\nModel Interpolation with Trans-dimensional Random Field Language Models  for Speech Recognition\nDifferentially Private Bayesian Programming\nBlock Shelves for Visual Programming Languages\nCompression and the origins of Zipf's law for word frequencies\nInference-based semantics in Data 
Exchange\nGrammatical Case Based IS-A Relation Extraction with Boosting for Polish\nLarge-scale Analysis of Counseling Conversations: An Application of  Natural Language Processing to Mental Health\nOvercoming the language barrier in mobile user interface design: A case  study on a mobile health app\nLearning Deep Representations of Fine-grained Visual Descriptions\nTwitter as a Lifeline: Human-annotated Twitter Corpora for NLP of  Crisis-related Messages\nDiachronic Word Embeddings Reveal Statistical Laws of Semantic Change\nOn the Complexity and Decidability of Some Problems Involving Shuffle\nA Theoretical Approach to initiate Mobile Assisted Language Learning  among school leavers and University Students of Sri Lanka\nCoordination in Categorical Compositional Distributional Semantics\nNeural Network Models for Implicit Discourse Relation Classification in  English and Chinese without Surface Features\nNatural Language Comprehension with the EpiReader\nOptimizing Spectral Learning for Parsing\nFirst Result on Arabic Neural Machine Translation\nMuFuRU: The Multi-Function Recurrent Unit\nPSDVec: a Toolbox for Incremental and Scalable Word Embedding\nDecidable Characterization of FO2(<,+1) and locality of DA\nExternal Lexical Information for Multilingual Part-of-Speech Tagging\nMatching Networks for One Shot Learning\nDiSquawk: 512 cores, 512 memories, 1 JVM\nUniversal, Unsupervised (Rule-Based), Uncovered Sentiment Analysis\nEgyptian Arabic to English Statistical Machine Translation System for  NIST OpenMT'2015\nA Nonparametric Bayesian Approach for Spoken Term detection by Example  Query\nIntroducing a Calculus of Effects and Handlers for Natural Language  Semantics\nA Data-Driven Approach for Semantic Role Labeling from Induced Grammar  Structures in Language\nNeighborhood Mixture Model for Knowledge Base Completion\nQuestion Relevance in VQA: Identifying Non-Visual And False-Premise  Questions\nNeural Morphological Tagging from Characters for Morphologically 
Rich  Languages\nEvaluation method of word embedding by roots and affixes\nTowards Self-explanatory Ontology Visualization with Contextual  Verbalization\nNeural Tree Indexers for Text Understanding\nAn Empirical Evaluation of various Deep Learning Architectures for  Bi-Sequence Classification Tasks\nReasoning about Body-Parts Relations for Sign Language Recognition\nReductionism and the Universal Calculus\nForward-Mode Automatic Differentiation in Julia\nThe DLVHEX System for Knowledge Representation: Recent Advances (System  Description)\nDistributed agent-based automated theorem proving in order-sorted  first-order logic\nINSIGHT-1 at SemEval-2016 Task 5: Deep Learning for Multilingual  Aspect-based Sentiment Analysis\nThe Microsoft 2016 Conversational Speech Recognition System\nIn-place Graph Rewriting with Interaction Nets\nMulti-Buffer Simulations for Trace Language Inclusion\nMaximal Repetition and Zero Entropy Rate\nMultiparty Session Actors\nAutomatic Quality Assessment for Speech Translation Using Joint ASR and  MT Features\nWeakly supervised spoken term discovery using cross-lingual side  information\nGov2Vec: Learning Distributed Representations of Institutions and Their  Legal Text\nAligning Coordinated Text Streams through Burst Information Network  Construction and Decipherment\nModelling Radiological Language with Bidirectional Long Short-Term  Memory Networks\nOptimizing Neural Network Hyperparameters with Gaussian Processes for  Dialog Act Classification\nL-Convex Polyominoes are Recognizable in Real Time by 2D Cellular  Automata\nGaps between equations and experiments in quantum cryptography\nTutorial on Answering Questions about Images with Deep Learning\nVisual Question Answering: Datasets, Algorithms, and Future Challenges\nClinical Text Prediction with Numerically Grounded Conditional Language  Models\nA Novel Learning Algorithm for Büchi Automata based on Family of DFAs  and Classification Trees\nA Deeper Look into Sarcastic Tweets Using 
Deep Convolutional Neural  Networks\nThe Geometry of Parallelism. Classical, Probabilistic, and Quantum  Effects\nInference Compilation and Universal Probabilistic Programming\nNeural Symbolic Machines: Learning Semantic Parsers on Freebase with  Weak Supervision\nCounterexamples and Proof Loophole for the C/C++ to POWER and ARMv7  Trailing-Sync Compiler Mappings\nLSTM-Based System-Call Language Modeling and Robust Ensemble Method for  Designing Host-Based Intrusion Detection Systems\nLatent Attention For If-Then Program Synthesis\nSums of Uncertainty: Refinements Go Gradual\nEnd-to-End Subtitle Detection and Recognition for Videos in East Asian  Languages via CNN Ensemble with Near-Human-Level Performance\nVisualizing Linguistic Shift\nLeveraging Parallel Data Processing Frameworks with Verified Lifting\nScalable Bayesian Learning of Recurrent Neural Networks for Language  Modeling\nTowards Accurate Word Segmentation for Chinese Patents\nTransaction-based Sandboxing for JavaScript\nAutomated assessment of non-native learner essays: Investigating the  role of linguistic features\nSelf-composable Programming\nOntohub: A semantic repository for heterogeneous ontologies\nRuntime enforcement of reactive systems using synchronous enforcers\nA Two-Phase Approach Towards Identifying Argument Structure in Natural  Language\nKnowledge Engineering for Hybrid Deductive Databases\nA Simulation Tool for tccp Programs\nA Practical Study of Control in Objected-Oriented--Functional--Logic  Programming with Paisley\nWorld Literature According to Wikipedia: Introduction to a DBpedia-Based  Framework\nNeural Machine Translation on Scarce-Resource Condition: A case-study on  Persian-English\nDeepDSL: A Compilation-based Domain-Specific Language for Deep Learning\nA Crevice on the Crane Beach: Finite-Degree Predicates\nA Multifaceted Evaluation of Neural versus Phrase-Based Machine  Translation for 9 Language Directions\nCross-lingual RST Discourse Parsing\nProceedings XVI Jornadas 
sobre Programación y Lenguajes\nQCRI Machine Translation Systems for IWSLT 16\nUp-To Techniques for Weighted Systems (Extended Version)\nAssessing User Expertise in Spoken Dialog System Interactions\nMinimization of Visibly Pushdown Automata Using Partial Max-SAT\nMultilingual and Cross-lingual Timeline Extraction\nWord equations in linear space\nNeural Semantic Parsing over Multiple Knowledge-bases\nZX-Calculus: Cyclotomic Supplementarity and Incompleteness for  Clifford+T quantum mechanics\nRepresentations of language in a model of visually grounded speech  signal\nUnveiling Eilenberg-type Correspondences: Birkhoff's Theorem for  (finite) Algebras + Duality\nAn efficient algorithm to decide periodicity of b-recognisable sets  using MSDF convention\nThe Parallel Meaning Bank: Towards a Multilingual Corpus of Translations  Annotated with Compositional Meaning Representations\nReinforcement Learning Based Argument Component Detection\nSpatial evolution of human dialects\nOptimal Non-blocking Decentralized Supervisory Control Using G-Control  Consistency\nNeural Machine Translation and Sequence-to-sequence Models: A Tutorial\nRefactoring Legacy JavaScript Code to Use Classes: The Good, The Bad and  The Ugly\nEstablishing Role-based Access Control in Viewpoint-oriented Variability  Management\nTurkish PoS Tagging by Reducing Sparsity with Morpheme Tags in Small  Datasets\nA Study of Metrics of Distance and Correlation Between Ranked Lists for  Compositionality Detection\nSequential Recurrent Neural Networks for Language Modeling\nVisually grounded learning of keyword prediction from untranscribed  speech\nOpinion Mining on Non-English Short Text\nRestricted Recurrent Neural Tensor Networks: Exploiting Word Frequency  and Compositionality for Increased Model Capacity and Performance With No  Computational Overhead\nRhetorical relations for information retrieval\nWhat do Neural Machine Translation Models Learn about Morphology?\nProceedings 8th Workshop on Developments 
in Implicit Computational  Complexity and 5th Workshop on Foundational and Practical Aspects of Resource  Analysis\nA Broad-Coverage Challenge Corpus for Sentence Understanding through  Inference\nStability and Fluctuations in a Simple Model of Phonetic Category Change\nTranslating Neuralese\nRussellian Propositional Logic and the BHK Interpretation\nFrom Characters to Words to in Between: Do We Capture Morphology?\nA Reasoning System for a First-Order Logic of Limited Belief\nCrowdsourcing Argumentation Structures in Chinese Hotel Reviews\nItem Recommendation with Continuous Experience Evolution of Users using  Brownian Motion\nPhonetic Temporal Neural Model for Language Identification\nLearning Semantic Correspondences in Technical Documentation\nAnnotating and Modeling Empathy in Spoken Conversations\nAgent-based model for the origins of scaling in human language\nFormalized Lambek Calculus in Higher Order Logic (HOL4)\nThe Meaning of Memory Safety\nTransformation of Python Applications into Function-as-a-Service  Deployments\nAnalysing Timelines of National Histories across Wikipedia Editions: A  Comparative Computational Approach\nASR error management for improving spoken language understanding\nHelping News Editors Write Better Headlines: A Recommender to Improve  the Keyword Contents & Shareability of News Headlines\nThe complexity of recognizing minimally tough graphs\nA Complete Axiomatisation of the ZX-Calculus for Clifford+T Quantum  Mechanics\nRecognizing Handwritten Source Code\nTeaching Machines to Describe Images via Natural Language Feedback\nLearning to Compute Word Embeddings On the Fly\nTransfer Learning for Speech Recognition on a Budget\nFunction Assistant: A Tool for NL Querying of APIs\nConcept Transfer Learning for Adaptive Language Understanding\nDynamic Integration of Background Knowledge in Neural NLU Systems\nConstraint Satisfaction Problem Dichotomy for Finite Templates: a Proof  Via Consistency Checks\nExploring the Syntactic Abilities 
of RNNs with Multi-task Learning\nAttention-based Vocabulary Selection for NMT Decoding\nTransfer Learning for Neural Semantic Parsing\nA Survey Of Cross-lingual Word Embedding Models\nNumber game\nJaTeCS an open-source JAva TExt Categorization System\nStance Detection in Turkish Tweets\nBeyond Bilingual: Multi-sense Word Embeddings using Multilingual Context\nAP17-OLR Challenge: Data, Plan, and Baseline\nCross-Lingual Sentiment Analysis Without (Good) Translation\nDevelopment and Maintenance of XML-Based Versus HTML-Based Websites: A  Case Study\nTowards Zero-Shot Frame Semantic Parsing for Domain Scaling\nRefinable Function : An Object-oriented Approach to Procedure Modularity\nProbabilistic Program Equivalence for NetKAT\nLearning to Compose Task-Specific Tree Structures\nTabula: A Language to Model Spreadsheet Tables\nSource-Target Inference Models for Spatial Instruction Understanding\nEvaluating Semantic Parsing against a Simple Web-based Question  Answering Model\nMemoisation: Purely, Left-recursively, and with (Continuation Passing)  Style\nIris: A Conversational Agent for Complex Tasks\nThe Digital Flynn Effect: Complexity of Posts on Social Media Increases  over Time\nFast and Accurate OOV Decoder on High-Level Features\nLV-ROVER: Lexicon Verified Recognizer Output Voting Error Reduction\nAn Executable Specification of Typing Rules for Extensible Records based  on Row Polymorphism\nThe RepEval 2017 Shared Task: Multi-Genre Natural Language Inference  with Sentence Representations\nShortcut-Stacked Sentence Encoders for Multi-Domain Inference\nLearning how to Active Learn: A Deep Reinforcement Learning Approach\nWhich Encoding is the Best for Text Classification in Chinese, English,  Japanese and Korean?\nRecent Trends in Deep Learning Based Natural Language Processing\nRadical-level Ideograph Encoder for RNN-based Sentiment Analysis of  Chinese and Japanese\nArgument Labeling of Explicit Discourse Relations using LSTM Neural  Networks\nExploring 
Directional Path-Consistency for Solving Constraint Networks\nNeural machine translation for low-resource languages\nCLaC @ QATS: Quality Assessment for Text Simplification\nApplying Data Augmentation to Handwritten Arabic Numeral Recognition  Using Deep Learning Neural Networks\nA Batch Noise Contrastive Estimation Approach for Training Large  Vocabulary Language Models\nPortuguese Word Embeddings: Evaluating on Word Analogies and Natural  Language Tasks\nThe Microsoft 2017 Conversational Speech Recognition System\nVerifying Quantum Programs: From Quipper to QPMC\nTrustworthy Refactoring via Decomposition and Schemes: A Complex Case  Study\nNNVLP: A Neural Network-Based Vietnamese Language Processing Toolkit\nA Computational Interpretation of Context-Free Expressions\nSPARQL as a Foreign Language\nMaking \"fetch\" happen: The influence of social and linguistic context on  nonstandard word growth and decline\nUnderstanding the Logical and Semantic Structure of Large Documents\nThe Voynich Manuscript is Written in Natural Language: The Pahlavi  Hypothesis\nMonadic Second-Order Logic with Arbitrary Monadic Predicates\nCombining Static and Dynamic Contract Checking for Curry\nStructural Resolution for Abstract Compilation of Object-Oriented  Languages\nAre you serious?: Rhetorical Questions and Sarcasm in Social Media  Dialog\nMERF: Morphology-based Entity and Relational Entity Extraction Framework  for Arabic\nSelf-Similar Algebras with connections to Run-length Encoding and  Rational Languages\nLanguage Independent Acquisition of Abbreviations\nProsodic Features from Large Corpora of Child-Directed Speech as  Predictors of the Age of Acquisition of Words\nReplicability Analysis for Natural Language Processing: Testing  Significance with Multiple Datasets\nA Deep Neural Network Approach To Parallel Sentence Extraction\nOn the Effective Use of Pretraining for Natural Language Inference\nOSU Multimodal Machine Translation System Report\nDeep Learning Paradigm with 
Transformed Monolingual Word Embeddings for  Multilingual Sentiment Analysis\nMultitask training with unlabeled data for end-to-end sign language  fingerspelling recognition\nThe Refinement Calculus of Reactive Systems\nEnd-to-end Network for Twitter Geolocation Prediction and Hashing\nEffectiveSan: Type and Memory Error Detection using Dynamically Typed  C/C++\nSystem Description: Russell - A Logical Framework for Deductive Systems\nLearning Differentially Private Recurrent Language Models\nSafe Pointers in SPARK 2014\nSpoken Language Biomarkers for Detecting Cognitive Impairment\nTransparent Replication Using Metaprogramming in Cyan\nLinking Tweets with Monolingual and Cross-Lingual News using Transformed  Word Embeddings\nLogical relations for coherence of effect subtyping\nStreaming Small-Footprint Keyword Spotting using Sequence-to-Sequence  Models\nCapturing the Future by Replaying the Past\nLearning neural trans-dimensional random field language models with  noise-contrastive estimation\nWhodunnit? 
Crime Drama as a Case for Natural Language Understanding\nJust ASK: Building an Architecture for Extensible Self-Service Spoken  Language Understanding\nCompressing Word Embeddings via Deep Compositional Code Learning\nComparison of Parallelisation Approaches, Languages, and Compilers for  Unstructured Mesh Algorithms on GPUs\nReal-time Stream-based Monitoring\nZero-Shot Style Transfer in Text Using Recurrent Neural Networks\nLattice Rescoring Strategies for Long Short Term Memory Language Models  in Speech Recognition\nAddressing Cross-Lingual Word Sense Disambiguation on Low-Density  Languages: Application to Persian\nSCTP in Go\nSelf-Supervised Vision-Based Detection of the Active Speaker as a  Prerequisite for Socially-Aware Language Acquisition\nRefinement Types for Ruby\nVietnamese Semantic Role Labelling\nEmbodied Question Answering\nSequence Mining and Pattern Analysis in Drilling Reports with Deep  Natural Language Processing\nOn a question of Krajewski's\nA User-Study on Online Adaptation of Neural Machine Translation to Human  Post-Edits\nCoDraw: Visual Dialog for Collaborative Drawing\nMorphology dictates a robot's ability to ground crowd-proposed language\nA simple script language for choreography of multiple, synchronizing  non-anthropomorphic robots\nPresburger-Definable Parameterized Typestates\nA Compare-Propagate Architecture with Alignment Factorization for  Natural Language Inference\nSocial Media Analysis based on Semanticity of Streaming and Batch Data\nA diagrammatic axiomatisation of fermionic quantum circuits\nObject Referring in Videos with Language and Human Gaze\nIndian Regional Movie Dataset for Recommender Systems\nOneNet: Joint Domain, Intent, Slot Prediction for Spoken Language  Understanding\nBuilding an Ellipsis-aware Chinese Dependency Treebank for Web Text\nEvaluating Layers of Representation in Neural Machine Translation on  Part-of-Speech and Semantic Tagging Tasks\nChoreographies for Reactive Programming\nMAttNet: Modular 
Attention Network for Referring Expression  Comprehension\nA Sheaf Model of Contradictions and Disagreements. Preliminary Report  and Discussion\nThe recovery of George Berkeley's objective science of 1710 and its  implications for traditional science\nAccelerating recurrent neural network language model based online speech  recognition system\nA State-of-the-Art of Semantic Change Computation\nPilot study for the COST Action \"Reassembling the Republic of Letters\":  language-driven network analysis of letters from the Hartlib's Papers\nDisunited Nations? A Multiplex Network Approach to Detecting Preference  Affinity Blocs using Texts and Votes\nProceedings First Workshop on Architectures, Languages and Paradigms for  IoT\nDisMo: A Morphosyntactic, Disfluency and Multi-Word Unit Annotator. An  Evaluation on a Corpus of French Spontaneous and Read Speech\nZero-Resource Neural Machine Translation with Multi-Agent Communication  Game\nUnderstanding Recurrent Neural State Using Memory Signatures\nMaking \"fetch\" happen: The influence of social and linguistic context on  nonstandard word growth and decline\nEnd-to-End Automatic Speech Translation of Audiobooks\nExtending the DEVS Formalism with Initialization Information\nDR-BiLSTM: Dependent Reading Bidirectional LSTM for Natural Language  Inference\nStructured-based Curriculum Learning for End-to-end English-Japanese  Speech Translation\nDeep Multimodal Learning for Emotion Recognition in Spoken Language\nStateful Behavioral Types for ABS\nNL2Bash: A Corpus and Semantic Parser for Natural Language Interface to  the Linux Operating System\nOne Single Deep Bidirectional LSTM Network for Word Sense Disambiguation  of Text Data\nCollective Entity Disambiguation with Structured Gradient Tree Boosting\nApache Calcite: A Foundational Framework for Optimized Query Processing  Over Heterogeneous Data Sources\nContinuity and Rational Functions\nCross-lingual and Multilingual Speech Emotion Recognition on English and  
French\nExplain Yourself: A Natural Language Interface for Scrutable Autonomous  Robots\nExtracting Action Sequences from Texts Based on Deep Reinforcement  Learning\nUmbral Calculus, a Different Mathematical Language\nDegrees of Infinite Words, Polynomials, and Atoms (Extended Version)\nDecision support with text-based emotion recognition: Deep learning for  affective computing\nLow-Resource Speech-to-Text Translation\nTen Diverse Formal Models for a CBTC Automatic Train Supervision System\nOperator algebras for higher rank analysis and their application to  factorial languages\nAutomatic Normalization of Word Variations in Code-Mixed Social Media  Text\nVisual augmentation of source code editors: A systematic review\nEmergence of Linguistic Communication from Referential Games with  Symbolic and Pixel Input\nSolving Bongard Problems with a Visual Language and Pragmatic Reasoning\nReasoning with Higher-Order Abstract Syntax in a Logical Framework\nA structured alternative to Prolog with simple compositional semantics\nProceedings 11th International Workshop on Foundations of Coordination  Languages and Self Adaptation\nAND and/or OR: Uniform Polynomial-Size Circuits\nOracle Pushdown Automata, Nondeterministic Reducibilities, and the CFL  Hierarchy over the Family of Context-Free Languages\nPredicate Logic as a Modelling Language: The IDP System\nOracle performance for visual captioning\nIs language evolution grinding to a halt? The scaling of lexical  turbulence in English fiction suggests it is not\nMaking the V in VQA Matter: Elevating the Role of Image Understanding in  Visual Question Answering\nHarmonizing Signals and Events with a Lightweight Extension to Java\nA Comprehensive Workflow for General-Purpose Neural Modeling with Highly  Configurable Neuromorphic Hardware Systems\nThe Kepler characterization of the variability among A- and F-type  stars. I. 
General overview\nA Deductive Account of Quantification in LFG\nIntensional Verbs Without Type-Raising or Lexical Ambiguity\nTracking Point of View in Narrative\nUniform Representations for Syntax-Semantics Arbitration\nA Formalism and an Algorithm for Computing Pragmatic Inferences and  Detecting Infelicities\nTAKTAG: Two-phase learning method for hybrid statistical/rule-based  part-of-speech disambiguation\nA Robust Parsing Algorithm For Link Grammars\nThe Effect of Resource Limits and Task Complexity on Collaborative  Planning in Dialogue\nExtraction of V-N-Collocations from Text Corpora: A Feasibility Study  for German\nDiscourse Coherence and Shifting Centers in Japanese Texts\nLibrary of Practical Abstractions, Release 1.2\nA lexical database tool for quantitative phonological research\nType-driven semantic interpretation and feature dependencies in R-LFG\nCorpus-Based Word Sense Disambiguation\nComplexity of Two-Dimensional Patterns\nScoping Constructs in Logic Programming: Implementation Problems and  their Solution\nA Winnow-Based Approach to Context-Sensitive Spelling Correction\nLearning to Resolve Natural Language Ambiguities: A Unified Approach\nCompositionality, Synonymy, and the Systematic Representation of Meaning\nSpecifying and Implementing Security Policies Using LaSCO, the Language  for Security Constraints on Objects\nApplying Constraint Handling Rules to HPSG\nProbabilistic Constraint Logic Programming. 
Formal Foundations of  Quantitative and Statistical Inference in Constraint-Based Natural Language  Processing\nBootstrapping Structure into Language: Alignment-Based Learning\nModeling Complex Domains of Actions and Change\nIntroducing Dynamic Behavior in Amalgamated Knowledge Bases\nThe partition semantics of questions, syntactically\nAutomated Pattern Detection--An Algorithm for Constructing Optimally  Synchronizing Multi-Regular Language Filters\nParallel Evaluation of Mathematica Programs in Remote Computers  Available in Network\nA Recipe for Symbolic Geometric Computing: Long Geometric Product,  BREEFS and Clifford Factorization\nSharp transition towards shared vocabularies in multi-agent systems\nQuantum Certificate Verification: Single versus Multiple Quantum  Certificates\nTowards a Coherent Theory of Physics and Mathematics\nEquilibrium (Zipf) and Dynamic (Grasseberg-Procaccia) method based  analyses of human texts. A comparison of natural (english) and artificial  (esperanto) languages\nMining Meaning from Wikipedia\nThe transmission sense of information\nModeling Discrete Combinatorial Systems as Alphabetic Bipartite  Networks: Theory and Applications\nProspective Study for Semantic Inter-Media Fusion in Content-Based  Medical Image Retrieval\nOntoELAN: An Ontology-based Linguistic Multimedia Annotator\nFuzzy Linguistic Logic Programming and its Applications\nANN-based Innovative Segmentation Method for Handwritten text in  Assamese\nCoding Guidelines for Prolog\nProceedings Second International Workshop on Programming Language  Approaches to Concurrency and Communication-cEntric Software\nCoinductive subtyping for abstract compilation of object-oriented  languages into Horn formulas\nThe Need to Support of Data Flow Graph Visualization of Forensic Lucid  Programs, Forensic Evidence, and their Evaluation by GIPSY\nPartition Refinement of Component Interaction Automata: Why Structure  Matters More Than Size\nA Logical Foundation for Environment 
Classifiers\nLocal Distributed Decision\nUsing Java for distributed computing in the Gaia satellite data  processing\nCorrelating Formal Semantic Models of Reo Connectors: Connector Coloring  and Constraint Automata\nMultilingual ontology matching based on Wiktionary data accessible via  SPARQL endpoint\nA Domain-Specific Language for Incremental and Modular Design of  Large-Scale Verifiably-Safe Flow Networks (Preliminary Report)\nClassification of Flames in Computer Mediated Communications\nRapid Development of Interferometric Software Using MIRIAD and Python\nOn Quotients of Formal Power Series\nDiscrimination of English to other Indian languages (Kannada and Hindi)  for OCR system\nkLog: A Language for Logical and Relational Learning with Kernels\nBroccoli: Semantic Full-Text Search at your Fingertips\nTranslation of Bengali Terms in Mobile Phones: a Simplified Approach  Based on the Prescriptions of Conventional Accent Understand Ability\nInferring SQL Queries Using Program Synthesis\nSoftware Security analysis, static and dynamic testing in java and C  environment, a comparative study\nValue production in a collaborative environment\nDiffusion of Lexical Change in Social Media\nAdaptable processes\nModeling in OWL 2 without Restrictions\nRegular Cost Functions, Part I: Logic and Algebra over Words\nExtending FO(ID) with Knowledge Producing Definitions: Preliminary  Results\nLarge Scale Distributed Acoustic Modeling With Back-off N-grams\nObject-Oriented Bayesian Networks\nEthics of using language editing services in an era of digital  communication and heavily multiauthored papers\nSoft Contract Verification\nAlgebraic Structure of Combined Traces\nLanguage change in a multiple group society\nSaying What You're Looking For: Linguistics Meets Video Search\nRecognizing Speech in a Novel Accent: The Motor Theory of Speech  Perception Reframed\nSentiment Analysis in the News\nQEMU/CPC: Static Analysis and CPS Conversion for Safe, Portable, and  Efficient 
Coroutines\nA language independent web data extraction using vision based page  segmentation algorithm\nThe Complexity of Flow Analysis in Higher-Order Languages\nLearning to Win by Reading Manuals in a Monte-Carlo Framework\nA Proof Theoretic Study of Soft Concurrent Constraint Programming\nInteractions of cultures and top people of Wikipedia from ranking of 24  language editions\nZipf's law for word frequencies: word forms versus lemmas in long texts\nNon-Standard Words as Features for Text Categorization\nOmitting types in logic of metric structures\nGalois Transformers and Modular Abstract Interpreters\nSifting Robotic from Organic Text: A Natural Language Approach for  Detecting Automation on Twitter\nLearning Better Word Embedding by Asymmetric Low-Rank Projection of  Knowledge Graph\nCombining Models of Approximation with Partial Learning\nMarkov Logic Networks for Natural Language Question Answering\nPushdown Control-Flow Analysis for Free\nBias and population structure in the actuation of sound change\nMining Local Gazetteers of Literary Chinese with CRF and Pattern based  Methods for Biographical Information in Chinese History\nPopulation size predicts lexical diversity, but so does the mean sea  level - why it is important to correctly account for the structure of  temporal data\nJoint Word Representation Learning using a Corpus and a Semantic Lexicon\nMental Lexicon Growth Modelling Reveals the Multiplexity of the English  Language\nDesiree - a Refinement Calculus for Requirements Engineering\nWhat we write about when we write about causality: Features of causal  statements across large-scale social discourse\nSemantic Code Browsing\nHierarchical Attention Model for Improved Machine Comprehension of  Spoken Content\nModel Checking of Boolean Process Models\nDomain Specific Language for Geometric Relations between Rigid Bodies  targeted to robotic applications\nMeasure Transformer Semantics for Bayesian Machine Learning\nClash of the Lambdas\nPersian 
Sentiment Analyzer: A Framework based on a Novel Feature  Selection Method\nStructural characterizations of the navigational expressiveness of  relation algebras on a tree\nAll Who Wander: On the Prevalence and Characteristics of Multi-community  Engagement\nReally Natural Linear Indexed Type Checking\nAutomated Analysis and Prediction of Job Interview Performance\nGraphVista: Interactive Exploration Of Large Graphs\nIdioms-Proverbs Lexicon for Modern Standard Arabic and Colloquial  Sentiment Analysis\nDeductive Verification of Parallel Programs Using Why3\nUnsupervised Discovery of Linguistic Structure Including Two-level  Acoustic Patterns Using Three Cascaded Stages of Iterative Optimization\nSemiring-based Specification Approaches for Quantitative Security\nSymbol Emergence in Robotics: A Survey\nDeep Speech 2: End-to-End Speech Recognition in English and Mandarin\nSculpting Quantum Speedups\nTermination of canonical context-sensitive rewriting and productivity of  rewrite systems\nTransforming Javascript Event-Loop Into a Pipeline\nThe White Matter Query Language: A Novel Approach for Describing Human  White Matter Anatomy\nA Lambda-Calculus Foundation for Universal Probabilistic Programming\nA Constraint Satisfaction Method for Configuring Non-Local Service  Interfaces\nWinning Arguments: Interaction Dynamics and Persuasion Strategies in  Good-faith Online Discussions\nObserving Trends in Automated Multilingual Media Analysis\nPolymorphic Type Inference for Machine Code\nEnabling Cognitive Intelligence Queries in Relational Databases using  Low-dimensional Word Embeddings\nValidating an Approach to Formalize Use Cases with Ontologies\nRolex: Resilience-Oriented Language Extensions for Extreme-Scale Systems\nBuild It, Break It, Fix It: Contesting Secure Development\nData-driven HR - Résumé Analysis Based on Natural Language  Processing and Machine Learning\nSTransE: a novel embedding model of entities and relationships in  knowledge bases\nPredicting and 
Understanding Law-Making with Word Vectors and an  Ensemble Model\nHoney: A dataflow programming language for the processing, featurization  and analysis of multivariate, asynchronous and non-uniformly sampled scalar  symbolic time sequences\nFactored Neural Machine Translation\nPrioritized Garbage Collection: Explicit GC Support for Software Caches\nGradual Typing in an Open World\nBenchmarking Web-testing - Selenium versus Watir and the Choice of  Programming Language and Browser\nA Fault-tolerance Linguistic Structure for Distributed Applications\nTopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency\nAutomatic recognition of child speech for robotic applications in noisy  environments\nRanking medical jargon in electronic health record notes by adapted  distant supervision\nZero-resource Machine Translation by Multimodal Encoder-decoder Network  with Multimedia Pivot\nWeb-based Argumentation\nAttentive Explanations: Justifying Decisions and Pointing to the  Evidence\nQuantitative Regular Expressions for Arrhythmia Detection Algorithms\nSource Code Verification for Embedded Systems using Prolog\nOutrageously Large Neural Networks: The Sparsely-Gated  Mixture-of-Experts Layer\nExtracting Bilingual Persian Italian Lexicon from Comparable Corpora  Using Different Types of Seed Dictionaries\nConstraint Answer Set Solver EZCSP and Why Integration Schemas Matter\nThe Universal Fragment of Presburger Arithmetic with Unary Uninterpreted  Predicates is Undecidable\nNormalisation de la langue et de lecriture arabe : enjeux culturels  regionaux et mondiaux\nLearning Cooperative Visual Dialog Agents with Deep Reinforcement  Learning\nModelling System of Systems Interface Contract Behaviour\nBag-of-Words Method Applied to Accelerometer Measurements for the  Purpose of Classification and Energy Estimation\nProsody: The Rhythms and Melodies of Speech\nNeural Machine Translation Model with a Large Vocabulary Selected by  Branching Entropy\nMonoidal computer 
III: A coalgebraic view of computability and  complexity\nAn Empirical Analysis of NMT-Derived Interlingual Embeddings and their  Use in Parallel Sentence Identification\nBrzozowski Goes Concurrent - A Kleene Theorem for Pomset Languages\nData-Driven Program Completion\nTowards a Knowledge Graph based Speech Interface\nHow Important is Syntactic Parsing Accuracy? An Empirical Evaluation on  Rule-Based Sentiment Analysis\nWhere is my forearm? Clustering of body parts from simultaneous tactile  and linguistic input using sequential mapping\nIs Natural Language a Perigraphic Process? The Theorem about Facts and  Words Revisited\nProbabilistic Model Checking of Incomplete Models\nImproving Scalability of Inductive Logic Programming via Pruning and  Best-Effort Optimisation\nUnsupervised Iterative Deep Learning of Speech Features and Acoustic  Tokens with Applications to Spoken Term Detection\nComparing Classical and Relativistic Kinematics in First-Order Logic\nLanguage modeling with Neural trans-dimensional random fields\nRelational Learning and Feature Extraction by Querying over  Heterogeneous Information Networks\nA framework for quantitative modeling and analysis of highly  (re)configurable systems\nLocation Name Extraction from Targeted Text Streams using  Gazetteer-based Statistical Language Models\nVQS: Linking Segmentations to Questions and Answers for Supervised  Attention in VQA and Question-Focused Semantic Segmentation\nRepresentations and evaluation strategies for feasibly approximable  functions\nUnsupervised Sentence Representations as Word Information Series:  Revisiting TF--IDF\nFormally Secure Compilation of Unsafe Low-Level Components (Extended  Abstract)\nA First Step in Combining Cognitive Event Features and Natural Language  Representations to Predict Emotions\nPersonalized word representations Carrying Personalized Semantics  Learned from Social Network Posts\nKeyword-based Query Comprehending via Multiple Optimized-Demand  
Augmentation\nLearning to Represent Programs with Graphs\nProgramming Bots by Synthesizing Natural Language Expressions into API  Invocations\nDepression Severity Estimation from Multiple Modalities\nSpeech recognition for medical conversations\nEffective Use of Bidirectional Language Modeling for Medical Named  Entity Recognition\nVisualisation and 'diagnostic classifiers' reveal how recurrent and  recursive neural networks process hierarchical structure\nRecurrent Neural Network Language Models for Open Vocabulary Event-Level  Cyber Anomaly Detection\nAn innovative solution for breast cancer textual big data analysis\nBuilding competitive direct acoustics-to-word models for English  conversational speech recognition\nEmo, Love, and God: Making Sense of Urban Dictionary, a Crowd-Sourced  Online Dictionary\nUnifying Theories of Reactive Design Contracts\nFirst Draft on the xInf Model for Universal Physical Computation and  Reverse Engineering of Natural Intelligence\nExploring Architectures, Data and Units For Streaming End-to-End Speech  Recognition with RNN-Transducer\nQuantitative Analysis of Smart Contracts\nSWRL2SPIN: A tool for transforming SWRL rule bases in OWL ontologies to  object-oriented SPIN rules\nReinforced Self-Attention Network: a Hybrid of Hard and Soft Attention  for Sequence Modeling\nLearning from Past Mistakes: Improving Automatic Speech Recognition  Output via Noisy-Clean Phrase Context Modeling\nTree-to-tree Neural Networks for Program Translation\nTornado: A Practical And Efficient Heterogeneous Programming Framework  For Managed Languages\nInterval-based Resource Usage Verification by Translation into Horn  Clauses and an Application to Energy Consumption\nSafe Non-blocking Synchronization in Ada 202x\nQDEE: Question Difficulty and Expertise Estimation in Community Question  Answering Sites\nA Computational Model of Syntactic Processing: Ambiguity Resolution from  Interpretation\nTopological relaxation of entangled flux lattices: Single vs 
collective  line dynamics\nThe physical Meaning of Replica Symmetry Breaking\nA Hydrodynamic Approach to Superconductivity\nLogic Programming, Functional Programming, and Inductive Definitions\nMini-indexes for literate programs\nIs Word Sense Disambiguation just one more NLP task?\nCascaded Markov Models\nCombining Inclusion Polymorphism and Parametric Polymorphism\nHuman-Computer Conversation\nA Unified Example-Based and Lexicalist Approach to Machine Translation\nMAP Lexicon is useful for segmentation and word discovery in  child-directed speech\nA statistical model for word discovery in child directed speech\nMeasures of Distributional Similarity\nAdvances in domain independent linear text segmentation\nThe Light Lexicographic path Ordering\nDo All Fragments Count?\nThe alldifferent Constraint: A Survey\nObject-oriented solutions\nCLP Approaches to 2D Angle Placements\nA procedure for unsupervised lexicon learning\nA Statistical Model for Word Discovery in Transcribed Speech\nEvaluating the Effectiveness of Ensembles of Decision Trees in  Disambiguating Senseval Lexical Samples\nAssessing System Agreement and Instance Difficulty in the Lexical Sample  Tasks of Senseval-2\nUnsupervised Learning in a Framework of Information Compression by  Multiple Alignment, Unification and Search\nEmpirical Methods for Compound Splitting\ngTybalt - a free computer algebra system\nPure Prolog Execution in 21 Rules\nWSAT(cc) - a fast local-search ASP solver\nComputing Convex Hulls with a Linear Solver\nBuilding a linguistic corpus from bee dance data\nOn Modal Logics of Partial Recursive Functions\nA Bimachine Compiler for Ranked Tagging Rules\nOn computing the fixpoint of a set of boolean equations\nThe state complexity of L^2 and L^k\nWidening Operators for Weakly-Relational Numeric Abstractions (Extended  Abstract)\nTransforming Business Rules Into Natural Language Text\nMetalinguistic Information Extraction for Terminology\nReacProc: A Tool to Process Reactions Describing 
Particle Interactions\nWord sense disambiguation criteria: a systematic study\nApproximability of Bounded Occurrence Max Ones\nA Predicative Harmonization of the Time and Provable Hierarchies\nRecurrence with affine level mappings is P-time decidable for CLP(R)\nAlgorithm of Segment-Syllabic Synthesis in Speech Recognition Problem\nUntersuchungen zur Jet-Parton-Korrelation in der tief-inelastischen  Streuung\nThree-Loop Results on the Lattice\nLattice Perturbation Theory by Computer Algebra: A Three-Loop Result for  the Topological Susceptibility\nPerturbative renormalization for overlap fermions\nCorrelations, correlation integrals and application to Bose-Einstein  interferometry\nGluon-Meson Duality in the Mean Field Approximation\nIntroductory lecture notes on the Karabali-Nair theory\nHard collisions of photons: plea for a common language\nA Feynman graph selection tool in GRACE system\nQCD and Hadron Dynamics\nA verification of the Optimal Jet Finder\nLeading rescattering effects cannot improve the description of $B \\to K  π$ data\nGiNaC - Symbolic computation with C++\nHeavy quarkonium decays and transitions in the language of effective  field theories\nDuality Principle and Braided Geometry\nIntegrability and Seiberg-Witten Theory: Curves and Periods\nCorrelation functions for the Z-invariant Ising model\nBranes and Six Dimensional Fixed Points\nString theory as a universal language\nProjection on higher Landau levels and non-commutative geometry\nNoncommutative gerbes and deformation quantization\nTwo-Loop Computation in Superstring Theory\nOperads and Quantum Gravity\nOn Concept of Parity for a Fermion\nQuantum Field Theory in the Language of Light-cone String\nOn a Classification of Irreducible Almost Commutative Geometries\nAKSZ-BV Formalism and Courant Algebroid-induced Topological Field  Theories\nMany simple cardinal invariants\nConvergence in homogeneous random graphs\nThe Knowlton-Graham partition problem\nExistence of Orbifolds IV: Examples\nA 
Note on Superamorphous Sets and Dual Dedekind-Infinity\nNew results on binary linear codes\nLectures on Special Lagrangian Submanifolds\nThree-Dimensional 2-Framed TQFTs and Surgery\nOne-Dimensional Peg Solitaire\nWeak Hopf algebra symmetries of C^*-algebra inclusions\nUniversal numerical algorithms and their software implementation\n3-Manifolds from Platonic Solids\nFunctions on groups and computational complexity\nWord Hyperbolic Semigroups\nA Universal Approach to Self-Referential Paradoxes, Incompleteness and  Fixed Points\nDer rechnende Dichter (The calculating poet)\nInternal bialgebroids, entwining structures and corings\nGrothendieck rings of \\mathbb{Z}-valued fields\nRealizations of Bialgebras (II) : Duality Theorem\nExtending the Language of Set Theory\nQuantum random walks and their convergence\nContinuity of the Mixing Operator\nModuli stacks of vector bundles on curves and the King-Schofield  rationality proof\nThe Stasheff model of a simply-connected manifold and the string bracket\nA recursive method for computing zeta functions of varieties\nThe word problem distinguishes counter languages\nA fansy divisor on M_{0,n}\nEnglish Russian Scientific Dictionary\nOn the context-freeness of the set of words containing overlaps\nThe Feynman Legacy\nPolynômes de Hua, noyau de Bergman des domaines de Cartan-Hartogs et  problème de Lu Qikeng\nDerived Algebraic Geometry III: Commutative Algebra\nJets, frames, and their Cartan geometry\nCARPS: An integrated proposal and data collection system\nBoltzmann's H-theorem and time irreversibility\nBrownian Motion for the School-Going Child\nTensor operators in R-matrix approach\nSimulation in Biology\nSearch for bottleneck effects in Penna ageing and Schulze language model\nA small 1-way quantum finite automaton\nEasy Control over Fermionic Computations\nThe Complexity of Probabilistic versus Quantum Finite Automata\nQuantum Implementation of Parrondo's Paradox\nQuantum measurement act as a \"speech act\"\nAnother Look 
at Quantum Teleportation\nAdiabatic Quantum Computing with Phase Modulated Laser Pulses\nScopes and Limits of Modality in Quantum Mechanics\nDistributions of Roots of Reduced Cubic Equations with Random  Coefficients\nThe Mathematics\nApplying the Z-transform for the static analysis of floating-point  numerical filters\nElementary gates for cartoon computation\nNagata's embedding theorem\nAssisted Problem Solving and Decompositions of Finite Automata\nProjection semantics for rigid loops\nGuerra's interpolation using Derrida-Ruelle cascades\nCD(4) has bounded width\nSimulation of ratio of old to young people in countries like Poland\nDifferential Complexes and Stratified Pro-Modules\nSoftware (Re-)Engineering with PSF\nClones and Genoids in Lambda Calculus and First Order Logic\nA Unified Approach to Local Cohomology Modules Using Serre Classes\nFinite Automata Based on Quantum Logic and Their Determinization\nSheaves on local Calabi-Yau varieties\nInfinite words containing squares at every position\nSymbolic computations in differential geometry\nEmbeddings of four-valent framed graphs into two-surfaces\nMarginal Likelihood Integrals for Mixtures of Independence Models\nRelaxed optimality conditions for mu-differentiable functions\nOpen architecture for multilingual parallel texts\nLinear algebra meets Lie algebra: the Kostant-Wallach theory\nA computer verified, monadic, functional implementation of the integral\nDistribution of complexities in the Vai script\nIntegrability and Chaos - algebraic and geometric approach\nText as Statistical Mechanics Object\nPractical language based on systems of definitions\nTwo Forms of One Useful Logic: Existential Fixed Point Logic and Liberal  Datalog\nOn the stability of the overconvergence under the direct image by a  proper smooth morphism\nA Note on Self-Dual Yang-Mills Theory\nSome peculiarities in response on filling up the Fermi sphere by quarks\nThere are k-uniform cubefree binary morphisms for all k >= 0\nA Simple, 
Linear-Time Algorithm for x86 Jump Encoding\nAn Array Algebra\nState Space Realization Theorems For Data Mining\nA Note on Symmetries in the Rauzy Graph and Factor Frequencies\nYet Another Deep Embedding of B:Extending de Bruijn Notations\nA new universal cellular automaton on the ternary heptagrid\nOn polynomial growth functions of D0L-systems\nClassical Combinatory Logic\nAxiomatizing mathematical conceptualism in third order arithmetic\nIterative Methods for Systems' Solving - a C# approach\nLexicographically least words in the orbit closure of the Rudin-Shapiro  word\nTriplet-like correlation symmetry of continuous variable entangled  states\nSome Considerations on Universality\nIntrinsically Universal Cellular Automata\nSmall Turing universal signal machines\nOn the boundaries of solvability and unsolvability in tag systems.  Theoretical and Experimental Results\nFairness as a QoS Measure for Web Services\nCross-Task Knowledge-Constrained Self Training\nIterative pushdown automata and hyperbolic contour words\nState Complexity Approximation\nProgramming with Quantum Communication\nProceedings Eleventh International Workshop on Descriptional Complexity  of Formal Systems\nPeriodicity in tilings\nThe averaging trick and the Cerny conjecture\nHow Do Interactive Virtual Operas Shift Relationships between Music,  Text and Image?\nOuter billiard outside regular polygons\nOn Event Structure in the Torn Dress\nA weakly universal cellular automaton in the hyperbolic 3D space with  three states\nNoncommutative rational functions, their difference-differential  calculus and realizations\nRecent Development of QCD Factorization for B-> M1 M2\nDeformations of algebraic subvarieties\nLocal positivity, multiplier ideals, and syzygies of abelian varieties\nProceedings Tenth International Workshop on Rule-Based Programming\nOn the palindromic decomposition of binary words\nA new measure of asymmetry of binary words\nAbout the embedding of one dimensional cellular automata into 
hyperbolic  cellular automata\nThe Socceral Force\nMorphonette: a morphological network of French\nA new weakly universal cellular automaton in the 3D hyperbolic space  with two states\nAn Effective Extension of the Wagner Hierarchy to Blind Counter Automata\nAn upper bound on the number of states for a strongly universal  hyperbolic cellular automaton on the pentagrid\nA Saturation Method for the Modal Mu-Calculus with Backwards Modalities  over Pushdown Systems\nEmpowering Collections with Swarm Behavior\nA measure of state transition of collective of stateless automata in  discrete environment\nA Weakly Intuitionistic Quantum Logic\nVariations on a theme of Beurling\nSteering Fragments of Instruction Sequences\nDiffieties and Liouvillian Systems\nStatic vs Dynamic SAGAs\nNetwork motifs in music sequences\nProceedings 12th International Workshop on Verification of  Infinite-State Systems\nTopological Modal Logics with Difference Modality\nSlopes of Tilings\nA Simple Correctness Proof for Magic Transformation\nAcross Browsers SVG Implementation\nCoordinates for a new triangular tiling of the hyperbolic plane\nVisualizing quantum mechanics in phase space\nJancar's formal system for deciding bisimulation of first-order grammars  and its non-soundness\nProceedings 5th International Workshop on Higher-Order Rewriting\nFife's Theorem Revisited\nRemarks on separating words\nHypermaps and multiply quasiplatonic Riemann surfaces\nReveR: Software Simulator of Reversible Processor with Stack\nGeometry of warped products\nEsparsidade, Estrutura, Escalamento e Estabilidade em Algebra Linear  Computacional\nInvariant number triangles, eigentriangles and Somos-4 sequences\nReversibility in Massive Concurrent Systems\nPetri Nets and Bio-Modelling - and how to benefit from their synergy\nOn the Delone property of (-β)-integers\nUne analyse basée sur la S-DRT pour la modélisation de dialogues  pathologiques\nSuper-Poincare' algebras, space-times and supergravities 
(II)\nIntroducing LoCo, a Logic for Configuration Problems\nPure spinor superfields and Born-Infeld theory\nInclusion of Unambiguous RE#s is NP-Hard\nThe Krohn-Rhodes Theorem and Local Divisors\nSolving the TTC 2011 Compiler Optimization Case with QVTR-XSLT\nSolving the TTC 2011 Compiler Optimization Case with GReTL\nSolving the TTC 2011 Reengineering Case with GReTL\nSaying Hello World with Henshin - A Solution to the TTC 2011 Instructive  Case\nSaying Hello World with GrGen.NET - A Solution to the TTC 2011  Instructive Case\nIndependent sets of words and the synchronization problem\nAdinkras for Mathematicians\nUnique decodability of bigram counts by finite automata\nExploring Twitter Hashtags\nBispecial factors in circular non-pushy D0L languages\nFormalization of semantic network of image constructions in electronic  content\nEnumerating Trees\nThe GF Mathematics Library\nOn equivalence and emptiness problems of multi-letter (measure many)  quantum finite automata\nAhlfors-Beurling operator on radial functions\nAutomatic Theorem-Proving in Combinatorics on Words\nRoget's Thesaurus as a Lexical Resource for Natural Language Processing\nImplementation of Kalman Filter with Python Language\nSynthesising Choreographies from Local Session Types (extended version)\nIsomorphisms of scattered automatic linear orders\nIndices to Quantify the Ranking of Arabic Journals and Research Output\nStructured Grammars are Effective\nOf priors and prejudice\nJuppix: a Linux Live-CD for Undergraduate Students\nVisibly pushdown automata on trees: universality and u-universality\nVisualization of features of a series of measurements with  one-dimensional cellular structure\nEmotion Detection from Text\nAn n log n Alogrithm for Deterministic Kripke Structure Minimization\nCellular automata on regular rooted trees\nBPA Bisimilarity is EXPTIME-hard\nThe hardest logic puzzle ever becomes even tougher\nThe Goedel's legacy: revisiting the Logic\nLocally compact groups and continuous 
logic\nElimination of Spurious Ambiguity in Transition-Based Dependency Parsing\nChallenges for Distributional Compositional Semantics\nInfinite ternary square-free words concatenated from permutations of a  single word\nQuantifier Elimination For Tame Fields\nComplexity of testing morphic primitivity\nOn the Existence of Universal Finite or Pushdown Automata\nLong time existence of Minimizing Movement solutions of Calabi flow\nIdentification of Probabilities of Languages\nRobopinion: Opinion Mining Framework Inspired by Autonomous Robot  Navigation\nCharacterizing Successful Formulas: the Multi-agent Case\nRewriting and narrowing for constructor systems with call-time choice  semantics\nA Linguistic Model for Terminology Extraction based Conditional Random  Fields\nFormally Checking Large Data Sets in the Railways\nBisimilarity of Pushdown Systems is Nonelementary\nProceedings 6th Workshop on Membrane Computing and Biologically Inspired  Process Calculi\nBalance properties of Arnoux-Rauzy words\nOn an algorithm for multiperiodic words\nThe Černý conjecture for small automata: experimental report\nA New Proof of P-time Completeness of Linear Lambda Calculus\nGlobal Mackey functors with operations and n-special lambda rings\nTowards Interactive Object-Oriented Programming\nWeak Concurrent Kleene Algebra with Application to Algebraic  Verification\nExploiting Uncertain and Temporal Information in Correlation\nReachability in Two-Clock Timed Automata is PSPACE-complete\nDouble cosets in free groups\nTwo Variable vs. 
Linear Temporal Logic in Model Checking and Games\nThe untyped stack calculus and Bohm's theorem\nOn p-form vortex-lines equations on extended phase space\nAn Inventory of Preposition Relations\nAlgebraic geometry over Boolean algebras in the language with constants\nTweets Miner for Stock Market Analysis\nCompact Notation for Finite Transformations\nThe User Feedback on SentiWordNet\nThe Holonomy Decomposition of Circular Semi-Flower Automata\nWords with unbounded periodicity complexity\nOn disjunction of equations in inverse semigroups\nA counterexample to a question of Hof, Knill and Simon\nOn Negotiation as Concurrency Primitive\nA Haskell Library for Term Rewriting\nNumerical computations in cobordism categories\nContextuality: Wheeler's universal regulating principle\nCompilation for QCSP\nCartesian closed 2-categories and permutation equivalence in  higher-order rewriting\nFinite State Machine Synthesis for Evolutionary Hardware\nThe homology graph of a higher dimensional automaton\nDisquisitiones 235\nChemical concrete machine\nAbstract interpretation as anti-refinement\nImplementing Computations in Automaton (Semi)groups\nEpistemic Logic for Communication Chains\nParacontrolled Distributions and the 3-dimensional Stochastic  Quantization Equation\nVariedades de álgebras topologicas\nThe decomposability problem for torsion-free abelian groups is analytic  complete\nCounting the Palstars\nFinite-type-Dyck shift spaces\nA Proof of the Barsotti-Chevalley Theorem on Algebraic Groups\nSolving the Flowgraphs Case with Eclectic\nPeriods and global invariants of automorphic representations\nThe hidden symmetry and Mr. 
Higgs!\nA Review of Verbal and Non-Verbal Human-Robot Interactive Communication\nA Microkernel Architecture for Constraint Programming\nSAP Speaks PDDL: Exploiting a Software-Engineering Model for Planning in  Business Process Management\nAlgorithmic Verification of Continuous and Hybrid Systems\nA Polynomial Time Solution to the Clique Problem\nClinical TempEval\nTowards a GPU-based implementation of interaction nets\nOpen Verlinde line operators\nIntégration des données d'un lexique syntaxique dans un analyseur  syntaxique probabiliste\nWhy must we work in the phase space?\nAnalysis of first order systems for the solution of Laplace's equation\nObject-Oriented Parallel Programming\nPiecewise Boolean algebras and their domains\nSynchronizing automata with random inputs\nA graph-based mathematical morphology reader\nLearning Bilingual Word Representations by Marginalizing Alignments\nSeparably closed fields and contractive Ore modules\nThe Correctness of Launchbury's Natural Semantics for Lazy Evaluation\nOptimality Theory as a Framework for Lexical Acquisition\nText Classification Using Association Rules, Dependency Pruning and  Hyperonymization\nOn multiply-exponential write-once Turing machines\nModeling Cassava Yield: A Response Surface Approach\nNegational Fragment of Intuitionistic Control Logic\nTransition regime of the one-dimensional two-stream instability\nSAT for pedestrians\nA Morphological Analyzer for Japanese Nouns, Verbs and Adjectives\nDependent Types for Pragmatics\nThe Visualization of Change in Word Meaning over Time using Temporal  Word Embeddings\nA note on two notions of compliance\nAn Intuitive Procedure for Converting PDA to CFG, by Construction of  Single State PDA\nYesquel: scalable SQL storage for Web applications\nApproaches for Synthesis Conjectures in an SMT Solver\nFREC 14: FRontiers of RECognizability\nOn the quantifier complexity of definable canonical henselian valuations\nA Meta-Logic of Inference Rules: Syntax\nUniform 
Definability in Propositional Dependence Logic\nStatic Analysis for Biological Systems (BioAmbients)\nTokuyama's Identity for Factorial Schur Functions\nA One-Dimensional Physically Universal Cellular Automaton\nRegroupement sémantique de définitions en espagnol\nFinitely Balanced Sequences and Plasticity of 1-Dimensional Tilings\nFibered Multiderivators and (co)homological descent\nFeatherweight PINQ\nPattern avoidance is not P-recursive\nRepresentation Theorems for Strong Predicate Exchangeability in Pure  Inductive Logic\nSimple, Fast Semantic Parsing with a Tensor Kernel\nOn a Unified Analysis in the language of preordered sets\nComplexity of Substitutive Sequences - Calculation of the Complexities  of Substitutive Sequences Over a Binary Alphabet\nFinite-Degree Predicates and Two-Variable First-Order Logic\nThe challenges of SVM optimization using Adaboost on a phoneme  recognition problem\nYARBUS : Yet Another Rule Based belief Update System\nProperty irrelevant predicates\nNormal forms for linear displacement context-free grammars\nUsing Ontology-Based Context in the Portuguese-English Translation of  Homographs in Textual Dialogues\nDEMONIC programming: a computational language for single-particle  equilibrium thermodynamics, and its formal semantics\nMultinomial Loss on Held-out Data for the Sparse Non-negative Matrix  Language Model\nComparing Writing Styles using Word Embedding and Dynamic Time Warping\nGood, Better, Best: Choosing Word Embedding Context\nRegular sequences and the joint spectral radius\nOn Basic Properties of Jumping Finite Automata\nBeyond OWL 2 QL in OBDA: Rewritings and Approximations (Extended  Version)\nHopf-Galois objects of Calabi-Yau Hopf algebras\nOnline Updating of Word Representations for Part-of-Speech Tagging\nThe Line Graph of the Universal Homogeneous Triangle-free Graph\nRefinement Types for TypeScript\nImproving sentence compression by learning to predict gaze\nFrom Incremental Meaning to Semantic Unit (phrase by 
phrase)\nRestricted trichotomy in higher dimensions\nEfficient Calculation of Bigram Frequencies in a Corpus of Short Texts\nIs 1+1=2 an empirical proposition?\nDetecting state of aggression in sentences using CNN\nDesiderata for Vector-Space Word Representations\nHolophrasm: a neural Automated Theorem Prover for higher-order logic\nOn the Regular Emptiness Problem of Subzero Automata\nTerpreT: A Probabilistic Programming Language for Program Induction\nQuicksort with median of medians is considered practical\nA special case of quasiminimality\nA Simple Guide to S3 Methods\nComputational Aspects of Asynchronous CA\nPutting Instruction Sequences into Effect\nThe Syllogistic with Unity\nCertifying and reasoning about cost annotations of functional programs\nLogical Fuzzy Optimization\nShortest Repetition-Free Words Accepted by Automata\nA polytime complexity analyser for Probabilistic Polynomial Time over  imperative stack programs\nExpressando Atributos Não-Funcionais em Workflows Científicos\nDeclarative Ajax Web Applications through SQL++ on a Unified Application  State\nA proposal for a Chinese keyboard for cellphones, smartphones, ipads and  tablets\nLinear models and linear mixed effects models in R with linguistic  applications\nHow Does Latent Semantic Analysis Work? A Visualisation Approach\nA note on groups of a family of hyperbolic tessellations\nTwo-dimensional Sentiment Analysis of text\n(k,l)-Unambiguity and Quasi-Deterministic Structures\nA Clustering Analysis of Tweet Length and its Relation to Sentiment\nDeformed one-loop amplitudes in N = 4 super-Yang-Mills theory\nQuantum finite automata: A modern introduction\nVector Clocks in Coq: An Experience Report\nThe word problem in Hanoi Towers groups\nAbelian networks II. 
Halting on all inputs\nObject-Oriented Programming, Functional Programming and R\nEvent Handling in ET++ - A Case Study in the Algebraic Specification of  Object-Oriented Application Frameworks\nCoffman deadlocks in SCOOP\nThe Emptiness Problem for Tree Automata with at Least One Disequality  Constraint is NP-hard\nA Note on a Recent Attempt to Improve the Pin-Frankl Bound\nZipf's Law and the Frequency of Characters or Words of Oracles\nNon-termination of Dalvik bytecode via compilation to CLP\nThe boundness of distance between two sets of fixed volume inside the  multidimensional ball or cube\nAn in-between \"implicit\" and \"explicit\" complexity: Automata\nExpressiveness of the modal mu-calculus on monotone neighborhood  structures\nPractical Realization of the Self-Balancing Robot Using Infrared Sensors\nDiverse Palindromic Factorization is NP-Complete\nA Finite Model Property for Intersection Types\nBuilding the distributed WPS-services execution environment\nOn the Combinatorics of Palindromes and Antipalindromes\nA definable henselian valuation with high quantifier complexity\nClassification of some Global Integrals related to groups of type $A_n$\nA Simple Parallel Implementation of Interaction Nets in Haskell\nTemporal ordering of clinical events\nControlled Query Evaluation for Datalog and OWL 2 Profile Ontologies\nLogic Blog 2014\nMedical Synonym Extraction with Concept Space Models\nSyntactic semigroup problem for the semigroup reducts of Affine  Near-semirings over Brandt Semigroups\nWords with the Maximum Number of Abelian Squares\nAn aperiodic set of 11 Wang tiles\nIncorporating Inductions and Game Semantics into Logic Programming\nLogic Programming with Macro Connectives\nEncoding TLA+ set theory into many-sorted first-order logic\nExploring Metaphorical Senses and Word Representations for Identifying  Metonyms\nFactorizations of the Fibonacci Infinite Word\nAnalysis of Communication Pattern with Scammers in Enron Corpus\nImplementing a 
teleo-reactive programming system\nTree Automata\nAbout the review in Mathematical Reviews of my paper: The two-cardinal  problem for languages of arbitrary cardinality The Journal of Symbolic Logic  75, Number 3, Sept., 2010, pp. 785-801\nRobustly Solvable Constraint Satisfaction Problems\nA token-passing net implementation of optimal reduction with embedded  read-back\nDeformations of holomorphic Poisson maps\nMulti-Source Neural Translation\nSome Landau--Ginzburg models viewed as rational maps\nHierarchical Latent Word Clustering\nProbabilistic Models for Computerized Adaptive Testing: Experiments\nRepetition-Free Derivability from a Regular Grammar is NP-Hard\nOn automatic subsets of the Gaussian integers\nEasy-First Dependency Parsing with Hierarchical Tree LSTMs\nElectromagnetic waves, gravitational waves and the prophets who  predicted them\nDerived noncommutative Zariski immersion and an equivalent reformulation  of Friedlander-Milnor conjecture\nMultichannel Variable-Size Convolution for Sentence Classification\nDifferential K-characters and D-branes\nThe Yahoo Query Treebank, V. 
1.0\nThe bitwise operations in relation to obtaining Latin squares\nPrefix frequency of lost positions\nProof nets for the Displacement calculus\nThe masterpieces of John Forbes Nash Jr\nCalculi for Intuitionistic Normal Modal Logic\nShallow Discourse Parsing Using Distributed Argument Representations and  Bayesian Optimization\nEvaluating Informal-Domain Word Representations With UrbanDictionary\nAsking photons where they have been in plain language\nState complexity of multiple catenation\nShift registers fool finite automata\nModeling, refining and analyzing Incomplete Büchi Automata\nLAYERS: Yet another Neural Network toolkit\nFairness as a Program Property\nSequence-to-sequence neural network models for transliteration\nRepresenting regular pseudocomplemented Kleene algebras by  tolerance-based rough sets\nDefects and boundary RG flows in $\\mathbb{C}/\\mathbb{Z}_d$\nMore on Compression and Ranking\nImproving Reliability of Word Similarity Evaluation by Redesigning  Annotation Task and Performance Measure\nExpertise revisited I: Interactional Expertise\nSome conjectures on codes\nGowers norms for the Thue-Morse and Rudin-Shapiro sequences\nDiscovering Conversational Dependencies between Messages in Dialogs\nOn the synchronization of planar automata\nMatrix Dirichlet processes\nArcs, hypercubes, and graphs as quotients of projective Fraïssé  limits\nFaà di Bruno's note on eponymous formula, trilingual version\nGetting Started with PATSTAT Register\nRSSL: Semi-supervised Learning in R\nStance detection in online discussions\nTowards Smart Proof Search for Isabelle\nJob Detection in Twitter\nSemantic classifier approach to document classification\nA new lower bound for reset threshold of synchronizing automata with  sink state\nUndecidability and Finite Automata\nProceedings Eighth Workshop on Intersection Types and Related Systems\nA short proof of correctness of the quasi-polynomial time algorithm for  parity games\nCohomology with values in a sheaf of crossed 
groups over a site\nIntelligent User Interfaces - A Tutorial\nOn the Comparison of Context-Free Grammars\nVariations on a Visserian Theme\nEnglish Conversational Telephone Speech Recognition by Humans and  Machines\nOn the $k$-abelian complexity of the Cantor sequence\nComplexity of Verifying Nonblockingness in Modular Supervisory Control\nNeobility at SemEval-2017 Task 1: An Attention-based Sentence Similarity  Model\nThe NLTK FrameNet API: Designing for Discoverability with a Rich  Linguistic Resource\nLabeled homology of higher-dimensional automata\nIntegral points on curves, the unit equation, and motivic periods\nThe Interplay of Semantics and Morphology in Word Embeddings\nThe Boolean SATisfiability Problem in Clifford algebra\nThe $Σ_2$ theory of $\\mathscr{D}_h(\\leq_h \\mathcal{O})$ as an  uppersemilattice with least and greatest element is decidable\nDuluth at Semeval-2017 Task 7 : Puns upon a midnight dreary, Lexical  Semantics for the weak and weary\nLearning Product Automata\nA Survey of Deep Learning Methods for Relation Extraction\nTen Conferences WORDS: Open Problems and Conjectures\nStegIbiza: Steganography in Club Music Implemented in Python\nOn the modularity of endomorphism algebras\nProperties of Normalization for a math based intermediate representation\nAlgebraic theories in monoidal categories\nChoreographies for Automatic Recovery\nword2vec Skip-Gram with Negative Sampling is a Weighted Logistic PCA\nPersistent topology for natural data analysis - A survey\nMyhill-Nerode Relation for Sequentiable Structures\nAlignment Elimination from Adams' Grammars\nTuring Completeness of Finite, Epistemic Programs\nFrame-Based Continuous Lexical Semantics through Exponential Family  Tensor Factorization and Semantic Proto-Roles\nA universal completion of the ZX-calculus\nGrammatical Error Correction with Neural Reinforcement Learning\nComplete Call-by-Value Calculi of Control Operators II: Strong  Termination\nComplete Call-by-Value Calculi of Control 
Operators, I\nExternal Evaluation of Event Extraction Classifiers for Automatic  Pathway Curation: An extended study of the mTOR pathway\nState complexity of catenation combined with boolean operations\nA Critique of a Critique of Word Similarity Datasets: Sanity Check or  Unnecessary Confusion?\nA Short Survey of Biomedical Relation Extraction Techniques\nGlobal Normalization of Convolutional Neural Networks for Joint Entity  and Relation Classification\nCan string kernels pass the test of time in Native Language  Identification?\nStar Height via Games\nImproved Abusive Comment Moderation with User Embeddings\nDerivations of Group Algebras\nConstructing an olfactory perceptual space and predicting percepts from  molecular structure\nA rule based algorithm for detecting negative words in Persian\nIVOA Recommendation: SSO - Single-Sign-On Profile: Authentication  Mechanisms Version 2.0\nArc-Standard Spinal Parsing with Stack-LSTMs\nComplexity of term representations of finitary functions\nOpenNMT: Open-source Toolkit for Neural Machine Translation\nLoIDE: a web-based IDE for Logic Programming - Preliminary Technical  Report\nHierarchical Gated Recurrent Neural Tensor Network for Answer Triggering\nNeural Networks for Text Correction and Completion in Keyboard Decoding\nNeural Machine Translation\nA new indexed approach to render the attractors of Kleinian groups\nCommunicating Finite-State Machines and Two-Variable Logic\nOn Vague Computers\nCompiling and Processing Historical and Contemporary Portuguese Corpora\nA geometer's view of the the Cramér-Rao bound on estimator variance\nWembedder: Wikidata entity embedding web service\nLagrange's Theorem for Binary Squares\nConvolutional Attention-based Seq2Seq Neural Network for End-to-End ASR\nA note on stability of Hardy inequalities\nTypesafe Abstractions for Tensor Operations\nWhen is an automatic set an additive basis?\nPermutation complexity of images of Sturmian words by marked morphisms\nA bound for the shortest 
reset words for semisimple synchronizing  automata via the packing number\nA Refutation of Guinea's \"Understanding SAT is in P\"\nAutomata in the Category of Glued Vector Spaces\nVerification of PCP-Related Computational Reductions in Coq\nPredicting readmission risk from doctors' notes\nA Taxonomy of Morphic Sequences\nNotes on bounded induction for the compositional truth predicate\nScott Ranks of Classifications of the Admissibility Equivalence Relation\nOuter billiards outside regular octagon: set of full measure and an  aperiodic point\nOn co-counter-fragments of automata\nSentiment Predictability for Stocks\nTrading Zones Revisited\nOn 'categories' of quantum field theories\nOn strong alt-induced codes\nUsing Sat solvers for synchronization issues in non-deterministic  automata\nBehavior Trees as a Representation for Medical Procedures\nMulti-optional Many-sorted Past Present Future structures and its  description\nFinitary-based Domain Theory in Coq: An Early Report\nHigher-dimensional automata modeling shared-variable systems\nSome Issues on the Theory of the Mimic-Computing-Oriented Automata\nHardy's paradox according to non-classical semantics\nCall-by-need, neededness and all that\ndiagnoseIT: Expertengestützte automatische Diagnose von  Performance-Probleme in Enterprise-Anwendungen (Abschlussbericht)\nA Scheme-Driven Approach to Learning Programs from Input/Output  Equations\nLayered structure and leveled function of a human brain\nAlternating Nonzero Automata\nProceedings Fourth International Workshop on Rewriting Techniques for  Program Transformations and Evaluation\nA Method to Translate Order-Sorted Algebras to Many-Sorted Algebras\nAutomatic supermartingales acting on sequences\nSequentializing cellular automata\nPrinciples of design and software development models of  ontological-driven computer systems\nImproved Upper Bounds on all Maximal $α$-gapped Repeats and  Palindromes\nA Study of Recent Contributions on Information 
Extraction\nUnsupervised Keyphrase Extraction with Multipartite Graphs\nChart Parsing Multimodal Grammars\nNeural models of factuality\nOn \\emptyset-definable elements in a field\nAGN host galaxies at redshift z~0.7: peculiar or not?\nPure Differential Modules and a Result of Macaulay on Unmixed Polynomial  Ideals\nThe GAPS programme with HARPS-N at TNG XI. Pr~0211 in M~44: the first  multi-planet system in an open cluster\nGoogle's Neural Machine Translation System: Bridging the Gap between  Human and Machine Translation\nFunctional Dynamics I : Articulation Process\nSherpa: a Mission-Independent Data Analysis Application\nAn Etymological Dictionary of Astronomy and Astrophysics:  English-French-Persian\nTruncated horseshoes and formal languages in chaotic scattering\nCommon Topics and Coherent Situations: Interpreting Ellipsis in the  Context of Discourse Inference\nAn Optimal Tabular Parsing Algorithm\nSemantics of Complex Sentences in Japanese\nExtracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and  Its Automatic Evaluation\nSpeech Dialogue with Facial Displays: Multimodal Human-Computer  Conversation\nTowards a Principled Representation of Discourse Plans\nWord-Sense Disambiguation Using Decomposable Models\nParsing Turkish with the Lexical Functional Grammar Formalism\nSome Advances in Transformation-Based Part of Speech Tagging\nA Psycholinguistically Motivated Parser for CCG\nVerb Semantics and Lexical Selection\nMorphology with a Null-Interface\nMulti-Tape Two-Level Morphology: A Case Study in Semitic Non-linear  Morphology\nComputational Analyses of Arabic Morphology\nA Modular and Flexible Architecture for an Integrated Corpus Query  System\nTraining and Scaling Preference Functions for Disambiguation\nOn Implementing an HPSG theory -- Aspects of the logical architecture,  the formalization, and the implementation of head-driven phrase structure  grammars\nReaping the Benefits of Interactive Syntax and Semantics\nIntegrating Knowledge 
Bases and Statistics in MT\nConceptual Association for Compound Noun Analysis\nKorean to English Translation Using Synchronous TAGs\nDisambiguation of Super Parts of Speech (or Supertags): Almost Parsing\nTowards a More User-friendly Correction\nA Comparison of Two Smoothing Methods for Word Bigram Models\nStatus of the XTAG System\nThe Linguistic Relevance of Quasi-Trees\nBootstrapping A Wide-Coverage CCG from FB-LTAG\nAutomatically Identifying Morphological Relations in = Machine-Readable  Dictionaries\nSegmenting speech without a lexicon: The roles of phonotactics and  speech source\nThe Semantics of Resource Sharing in Lexical-Functional Grammar\nCooperative Error Handling and Shallow Processing\nRedundancy in Collaborative Dialogue\nAssessing Complexity Results in Feature Theories\nMixed Initiative in Dialogue: An Investigation into Discourse  Segmentation\nEstimating Lexical Priors for Low-Frequency Syncretic Forms\nDiscourse Processing of Dialogues with Multiple Threads\nEfficient Analysis of Complex Diagrams using Constraint-Based Parsing\nRobust Parsing of Spoken Dialogue Using Contextual Knowledge and  Recognition Probabilities\nThe Compactness of Construction Grammars\nContext and ontology in understanding of dialogs\nDevelopment of a Spanish Version of the Xerox Tagger\nText Chunking using Transformation-Based Learning\nUsing Decision Trees for Coreference Resolution\nAmbiguity in the Acquisition of Lexical Information\nA Categorial Framework for Composition in Multiple Linguistic Domains\nA Computational Approach to Aspectual Composition\nA Labelled Analytic Theorem Proving Environment for Categorial Grammar\nHeuristics and Parse Ranking\nToward an MT System without Pre-Editing --- Effects of New Methods in  ALT-J/E ---\nTerm Encoding of Typed Feature Structures\nContext-Sensitive Measurement of Word Distance by Adaptive Scaling of a  Semantic Space\nAnother Facet of LIG Parsing\nThe importance of being lazy -- using lazy evaluation to process queries  
to HPSG grammars\nExtended Dependency Structures and their Formal Interpretation\nCompiling a Partition-Based Two-Level Formalism\nLearning similarity-based word sense disambiguation from sparse data\nA New Statistical Parser Based on Bigram Lexical Dependencies\nLearning Dependencies between Case Frame Slots\nClustering Words with the MDL Principle\nCombining Trigram-based and Feature-based Methods for Context-Sensitive  Spelling Correction\nA Bayesian hybrid method for context-sensitive spelling correction\nAn Efficient Inductive Unsupervised Semantic Tagger\nA Probabilistic Disambiguation Method Based on Psycholinguistic  Principles\nA Robust System for Natural Spoken Dialogue\nIntegrating Multiple Knowledge Sources to Disambiguate Word Sense: An  Exemplar-Based Approach\nFrom Submit to Submitted via Submission: On Lexical Rules in Large-Scale  Lexicon Acquisition\nA Divide-and-Conquer Strategy for Parsing\nThe Grammar of Sense: Is word-sense tagging much more than  part-of-speech tagging?\nA Lexical Semantic Database for Verbmobil\nUsing textual clues to improve metaphor processing\nControlling Functional Uncertainty\nCentering in Italian\nPunctuation in Quoted Speech\nA Word Grammar of Turkish with Morphophonemic Rules\nMorphological Productivity in the Lexicon\nAutomatic Alignment of English-Chinese Bilingual Texts of CNS News\nUsing sentence connectors for evaluating MT output\nA Geometric Approach to Mapping Bitext Correspondence\nLearning string edit distance\nSelective Sampling of Effective Example Sentence Sets for Word Sense  Disambiguation\nImprovising Linguistic Style: Social and Affective Bases for Agent  Personality\nRepresenting Constraints with Automata\nCombining Unsupervised Lexical Knowledge Methods for Word Sense  Disambiguation\nMorphological Disambiguation by Voting Constraints\nDistinguishing Word Senses in Untagged Text\nEvaluating Competing Agent Strategies for a Voice Email Agent\nAn Efficient Distribution of Labor in a Two Stage Robust 
Interpretation  Process\nReluctant Paraphrase: Textual Restructuring under an Optimisation Model\nIntrasentential Centering: A Case Study\nTowards a PURE Spoken Dialogue System for Information Access\nEpistemic NP Modifiers\nCombining Multiple Methods for the Automatic Construction of  Multilingual WordNets\nSemantic Processing of Out-Of-Vocabulary Words in a Spoken Dialogue  System\nSemantic Similarity Based on Corpus Statistics and Lexical Taxonomy\nThe effect of alternative tree representations on tree bank grammars\nDo not forget: Full memory in memory-based learning of word  pronunciation\nModularity in inductively-learned word pronunciation systems\nManual Annotation of Translational Equivalence: The Blinker Project\nLazy Transformation-Based Learning\nEliminating deceptions and mistaken belief to infer conversational  implicature\nDialogue Act Tagging with Transformation-Based Learning\nUnlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS\nBayesian Stratified Sampling to Assess Corpus Utility\nWord Clustering and Disambiguation Based on Co-occurrence Data\nA Projection Architecture for Dependency Grammar and How it Compares to  LFG\nIndexing with WordNet synsets can improve Text Retrieval\nWord Length Frequency and Distribution in English: Observations, Theory,  and Implications for the Construction of Verse Lines\nBosonization Rules for Electron-Hole Systems - II\nEuclidean and Riemannian Geometrical Approaches to Non-Extensive  Thermo-Statistical Mechanics\nCore Lexicon and Contagious Words\nUtterance Selection Model of Language Change\nFlexibly Instructable Agents\nSet-Theoretic Completeness for Epistemic and Conditional Logic\nComparing the expressive power of the Synchronous and the Asynchronous  pi-calculus\nProducing NLP-based On-line Contentware\nA Formal Framework for Linguistic Annotation\nAn ascription-based approach to speech acts\nPronoun Resolution in Japanese Sentences Using Surface Expressions and  Examples\nAn Example-Based 
Approach to Japanese-to-English Translation of Tense,  Aspect, and Modality\nMinimum Description Length and Compositionality\nWhy C++ is not very fit for GUI programming\nA Polyvariant Binding-Time Analysis for Off-line Partial Deduction\nA Denotational Semantics for First-Order Logic\nOn the Scalability of the Answer Extraction System \"ExtrAns\"\nAccuracy, Coverage, and Speed: What Do They Mean to Users?\nType Classes and Constraint Handling Rules\nSelectional Restrictions in HPSG\nCombining Linguistic and Spatial Information for Document Analysis\nContextual Inference in Computational Semantics\nOn Exponential-Time Completeness of the Circularity Problem for  Attribute Grammars\nOne Sense per Collocation and Genre/Topic Variations\nA Formal Framework for Linguistic Annotation (revised version)\nA Lambda-Calculus with letrec, case, constructors and non-determinism\nSlicing of Constraint Logic Programs\nSemantics and Termination of Simply-Moded Logic Programs with Dynamic  Scheduling\nAn Effective Fixpoint Semantics for Linear Logic Programs\nMagical Number Seven Plus or Minus Two: Syntactic Structure Recognition  in Japanese and English Sentences\nMeaning Sort - Three examples: dictionary construction, tagged corpus  construction, and information presentation system\nCRL at Ntcir2\nChain Programs for Writing Deterministic Metainterpreters\nBootstrapping Syntax and Recursion using Alignment-Based Learning\nSoftware Toolkit for Building Embedded and Distributed Knowledge-based  Systems\nTransformations of CCP programs\nCombining a self-organising map with memory-based learning\nTwo-way Quantum One-counter Automata\nOn Equivalence and Canonical Forms in the LF Type Theory\nPractical Aspects for a Working Compile Time Garbage Collection System  for Mercury\nAn Environment for the Exploration of Non Monotonic Logic Programs\nA Straightforward Approach to Morphological Analysis and Synthesis\nIncremental Construction of Compact Acyclic NFAs\nBSML: A Binding Schema 
Markup Language for Data Interchange in Problem  Solving Environments (PSEs)\nQuerying Databases of Annotated Speech\nDecision Lists for English and Basque\nA variable-free dynamic semantics\nSimple Strategies for Large Zero-Sum Games with Applications to  Complexity Theory\nUsing the Annotated Bibliography as a Resource for Indicative  Summarization\nAgent Programming with Declarative Goals\nQuestion Answering over Unstructured Data without Domain Restrictions\nA continuation semantics of interrogatives that accounts for Baker's  ambiguity\nExploitingWeb Service Semantics: Taxonomies vs. Ontologies\nAn Approach for Resource Sharing in Multilingual NLP\nOn Decidability of Expressive Description Logics with Composition of  Roles in Number Restrictions\nParametric Connectives in Disjunctive Logic Programming\nDerivation of Efficient Logic Programs by Specialization and Reduction  of Nondeterminism\nOn Structuring Proof Search for First Order Linear Logic\nWeight Constraints as Nested Expressions\nA correct, precise and efficient integration of set-sharing, freeness  and linearity for the analysis of finite and rational tree languages\nConstraint Logic Programming with Hereditary Harrop Formula\nA lambda calculus for quantum computation with classical control\nAutogenic Training With Natural Language Processing Modules: A Recent  Tool For Certain Neuro Cognitive Studies\nModel Checking of Statechart Models: Survey and Research Directions\nOn Role Logic\nAn electronic dictionary as a basis for NLP tools: The Greek case\nA Model for Fine-Grained Alignment of Multilingual Texts\nDemo or Practice: Critical Analysis of the Language/Action Perspective\nDetecting User Engagement in Everyday Conversations\nIntuitionistic computability logic\nTulaFale: A Security Tool for Web Services\nPlayful, streamlike computation\nFine-Grained Word Sense Disambiguation Based on Parallel Corpora, Word  Alignment, Word Clustering and Aligned Wordnets\nContextual equivalence for higher-order 
pi-calculus revisited\nAn Operational Foundation for Delimited Continuations in the CPS Hierarchy\nComplexity of Networks\nThe Nature of Novelty Detection\nBook review \"The Haskell Road to Logic, Maths and Programming\"\nDigital Libraries: From Process Modelling to Grid-based Service Oriented  Architecture\nAn Explicit Solution to Post's Problem over the Reals\nA compositional Semantics for CHR\nEnhanced Prolog Remote Predicate Call Protocol\nUniform Random Sampling of Traces in Very Large Models\nRepresenting Knowledge about Norms\nDeriving Escape Analysis by Abstract Interpretation: Proofs of results\nResource Usage Analysis for the Pi-Calculus\nNominal Logic Programming\nBuilding and displaying name relations using automatic unsupervised  analysis of newspaper articles\nEvent-based Information Extraction for the biomedical domain: the  Caderige project\nECA-RuleML: An Approach combining ECA Rules with temporal interval-based  KR Event/Action Logics and Transactional Update Logics\nConsistent Streaming Through Time: A Vision for Event Stream Processing\nStatistical keyword detection in literary corpora\nA decision procedure for linear \"big O\" equations\nBistable Biorders: A Sequential Domain Theory\nFIPA agent based network distributed control system\nSimplifications in Lagrangian BV quantization exemplified by the  anomalies of chiral $W_3$ gravity\nLorentz Group derivable from Polarization Optics\nBranes at Orbifolds versus Hanany Witten in Six Dimensions\nComments on Duality in MQCD\nClifford algebra as quantum language\nHagedorn transition, vortices and D0 branes: Lessons from 2+1 confining  strings\nOpen Superstring on Symmetric Product\nMoyal Formulation of Witten's Star Product in the Fermionic Ghost Sector\nTwo-Loop Superstrings in Hyperelliptic Language I: the Main Results\nTwo-Loop Superstrings in Hyperelliptic Language III: the Four-Particle  Amplitude\nFields in the Language of String: Divergences and 
Renormalization\nSome compact logics --- results in ZFC\nCan you feel the double jump?\nVery weak zero one law for random graphs with order and random binary  functions\nNumerical Calculations Using Maple: Why & How?\nNonstandard Consequence Operators\nOne-Dimensional Peg Solitaire, and Duotaire\nThe theorem of the complement for nested subpfaffian sets\nWigner's new physics frontier: Physics of two-by-two matrices, including  the Lorentz group and optical instruments\nDynamics and computation in functional shifts\nControl System for the LEDA 6.7-MeV Proton Beam Halo Experiment\nTheoretical model for the evolution of the linguistic diversity\nStatistical Dynamics of Religions and Adherents\nCultural route to the emergence of linguistic categories\nThe theory of physical superselection sectors in terms of vertex  operator algebra language\nOperational quantum logic: An overview\nQuantum machine language and quantum computation with Josephson  junctions\nSchwinger, Pegg and Barnett approaches and a relationship between  angular and Cartesian quantum descriptions II: Phase Spaces\nQuantum-mechanical motion and the stillness of experimental records\nToward a Quantum Process Algebra\nLorentz Group in Ray Optics\nA Process Algebraic Approach to Concurrent and Distributed Quantum  Computation: Operational Semantics\nCommunicating Quantum Processes\nExponential Separation of Quantum and Classical Online Space Complexity\nSome observations on two-way finite automata with quantum and classical  states\nWorst-Case Background Knowledge for Privacy-Preserving Data Publishing\nA discussion on particle number and quantum indistinguishability\nCurry-style type Isomorphisms and Game Semantics\nRecursive n-gram hashing is pairwise independent, at best\nHow to be correct, lazy and efficient ?\nProvenance as Dependency Analysis\nA proof of strong normalisation using domain theory\nBio-linguistic transition and Baldwin effect in an evolutionary  naming-game model\nImplementation, 
Compilation, Optimization of Object-Oriented Languages,  Programs and Systems - Report on the Workshop ICOOOLPS'2006 at ECOOP'06\nA quick search method for audio signals based on a piecewise linear  representation of feature trajectories\nSome Reflections on the Task of Content Determination in the Context of  Multi-Document Summarization of Evolving Events\nDeclarative Diagnosis of Floundering\nOutilex, plate-forme logicielle de traitement de textes écrits\nImplementation, Compilation, Optimization of Object-Oriented Languages,  Programs and Systems - Report on the Workshop ICOOOLPS'2007 at ECOOP'07\nConcepts and their Use for Modelling Objects and References in  Programming Languages\nCorpus spécialisé et ressource de spécialité\nQuantum entanglement analysis based on abstract interpretation\nIndependence and concurrent separation logic\nTextual Fingerprinting with Texts from Parkin, Bassewitz, and Leander\nSign Language Tutoring Tool\nA Semi-Automatic Framework to Discover Epistemic Modalities in  Scientific Articles\nAlternating Automata on Data Trees and XPath Satisfiability\nComputational Approaches to Measuring the Similarity of Short Contexts :  A Review of Applications and Methods\nAceWiki: A Natural and Expressive Semantic Wiki\nCoinductive big-step operational semantics\nProviding Virtual Execution Environments: A Twofold Illustration\nOn the Vocabulary of Grammar-Based Codes and the Logical Consistency of  Texts\nA TLA+ Proof System\nGrapham: Graphical Models with Adaptive Random Walk Metropolis  Algorithms\nLogical Algorithms meets CHR: A meta-complexity result for Constraint  Handling Rules with rule priorities\nConsensus and ordering in language dynamics\nPalindromes in infinite ternary words\nWhat's in a Message?\nSuccinctness of two-way probabilistic and quantum finite automata\nReformulating Global Grammar Constraints\nStochastic Constraint Programming: A Scenario-Based Approach\nThermodynamics of Information Retrieval\nComment on 
\"Language Trees and Zipping\" arXiv:cond-mat/0108530\nComplexity of Fractran and Productivity\nA Note on the Complexity of the Satisfiability Problem for Graded Modal  Logics\nScenario-based Stochastic Constraint Programming\nThe cost of being co-Buchi is nonlinear\nKnowledge-Based Synthesis of Distributed Systems Using Event Structures\nRelational Parametricity for Computational Effects\nThe Complexity of Approximating Bounded-Degree Boolean \\sharp CSP\nVerification of Timed Automata Using Rewrite Rules and Strategies\nWhy a splitting in the final state cannot explain the GSI-Oscillations\nTranslation from Classical Two-Way Automata to Pebble Two-Way Automata\nUsing the Deutsch-Jozsa algorithm to determine parts of an array and  apply a specified function to each independent part\nPolynomial-Space Approximation of No-Signaling Provers\nConsideration Points Detecting Cross-Site Scripting\nFastFlow: Efficient Parallel Streaming Applications on Multi-core\nA Common XML-based Framework for Syntactic Annotations\nGrouping Synonyms by Definitions\nThe meta book and size-dependent properties of written language\nEvaluation of Hindi to Punjabi Machine Translation System\nProceedings 7th International Workshop on Security Issues in Concurrency\nCorrectness Kernels of Abstract Interpretations\nAlgorithm as Defining Dynamic Systems\nA New Computational Schema for Euphonic Conjunctions in Sanskrit  Processing\nQuantum Reality and Measurement: A Quantum Logical Approach\nProceedings Fifth Workshop on Developments in Computational  Models--Computational Models From Nature\nFlare: Architecture for rapid and easy development of Internet-based  Applications\nDeciding Regularity of the Set of Instances of a Set of Terms with  Regular Constraints is EXPTIME-Complete\nIntegrable systems and holomorphic curves\nGouverner la standardisation par les changements d'arene. 
Le cas du XML\nComputable de Finetti measures\nEmotions in Pervasive Computing Environments\nCollapsing and Separating Completeness Notions under Average-Case and  Worst-Case Hypotheses\nFormalizing cCSP Synchronous Semantics in PVS\nTowards Effective Sentence Simplification for Automatic Processing of  Biomedical Text\nThe Complexity of Approximating Bounded-Degree Boolean #CSP (Extended  Abstract)\nRecognition and translation Arabic-French of Named Entities: case of the  Sport places\nSession-Based Programming for Parallel Algorithms: Expressiveness and  Performance\nThai Rhetorical Structure Analysis\nA proof Procedure for Testing Membership in Regular Expressions\nFrom Frequency to Meaning: Vector Space Models of Semantics\nFormalization and Validation of Safety-Critical Requirements\nLa représentation formelle des concepts spatiaux dans la langue\nLazy Evaluation and Delimited Control\nQMIP = MIP*\nLiberalizing Dependency\nInformation Cost Tradeoffs for Augmented Index and Streaming Language  Recognition\nPunctuation effects in English and Esperanto texts\nProceedings International Workshop on Developments in Implicit  Computational complExity\nOn the Module of Internet Banking System\nA Meta-Programming Approach to Realizing Dependently Typed Logic  Programming\nThe Problem of the Observer in Physics\nToward a language theoretic proof of the four color theorem\nObject-oriented modelling with unified modelling language 2.0 for simple  software application based on agile methodology\nThe duality of computation under focus\nOrthogonal Persistence Revisited\nOn the Implementation of the Probabilistic Logic Programming Language  ProbLog\nConstructing Active Architectures in the ArchWare ADL\nProof-theoretic Analysis of Rationality for Strategic Games with  Arbitrary Strategy Sets\nLinguistic complexity: English vs. Polish, text vs. 
corpus\nVerification of Java Bytecode using Analysis and Transformation of Logic  Programs\nSawja: Static Analysis Workshop for Java\nApplying Prolog to Develop Distributed Systems\nA simple model for the evolution of molecular codes driven by the  interplay of accuracy, diversity and cost\nAbstracting Abstract Machines\nCatching the Ouroboros: On Debugging Non-ground Answer-Set Programs\nHandling Data-Based Concurrency in Context-Aware Service Protocols\nResumptions, Weak Bisimilarity and Big-Step Semantics for While with  Interactive I/O: An Exercise in Mixed Induction-Coinduction\nModelling the Dynamics of an Aedes albopictus Population\nMinimization Strategies for Maximally Parallel Multiset Rewriting  Systems\nExtending the Real-Time Maude Semantics of Ptolemy to Hierarchical DE  Models\nCertifying cost annotations in compilers\nA Graphical Approach to Progress for Structured Communication in Web  Services\nIntroducing Business Language Driven Development\nLogical Foundations and Complexity of 4QL, a Query Language with  Unrestricted Negation\nRobust Simulations and Significant Separations\nOn Probabilistic Parallel Programs with Process Creation and  Synchronisation\nSecure Information Flow by Model Checking Pushdown System\nPhysics of the mind: Concepts, emotions, language, cognition,  consciousness, beauty, music, and symbolic culture\nOn the expressiveness of Parikh automata and related models\nA Review of Research on Devnagari Character Recognition\nGeometric representations for minimalist grammars\nAutomata and Differentiable Words\nCFA2: a Context-Free Approach to Control-Flow Analysis\nDrive for Creativity\nLinear Dependent Types and Relative Completeness\nSLDs for Visualizing Multicolor Elevation Contour Lines in Geo-Spatial  Web Applications\nPhase Transitions in Knowledge Compilation: an Experimental Study\nExpression Templates Revisited: A Performance Analysis of the Current ET  Methodology\nStreaming Tree Transducers\nTowards OWL-based Knowledge 
Representation in Petrology\nConstraint solving in non-permutative nominal abstract syntax\nGrounded Semantic Composition for Visual Scenes\nPrototyping the Semantics of a DSL using ASF+SDF: Link to Formal  Verification of DSL Models\nOptimal Divide and Query (extended version)\nA Semantic Relatedness Measure Based on Combined Encyclopedic,  Ontological and Collocational Knowledge\nA Verified Algebra for Linked Data\nDecoupled execution of synchronous coordination models via behavioural  automata\nContracts in distributed systems\nProduct Lines for Service Oriented Applications - PL for SOA\nFormal Component-Based Semantics\nA Spatial Calculus of Wrapped Compartments\nModelling of Genetic Regulatory Mechanisms with GReg\nStepping Lazy Programs\nSpecific \"scientific\" data structures, and their processing\nApproximate Policy Iteration with a Policy Language Bias: Solving  Relational Markov Decision Processes\nThe SeaLion has Landed: An IDE for Answer-Set Programming---Preliminary  Report\nActor Continuation Passing: Efficient and Extensible Request Routing for  Event-Driven Architectures\nDomain Adaptation for Statistical Classifiers\nReasoning with Very Expressive Fuzzy Description Logics\nAn Improved Implementation and Abstract Interface for Hybrid\nA fusion algorithm for joins based on collections in Odra (Object  Database for Rapid Application development)\nExtending the adverbial coverage of a NLP oriented resource for French\nSemantic Navigation on the Web of Data: Specification of Routes, Web  Fragments and Actions\nEffects for Funargs\nProceedings Second Workshop on Developments in Implicit Computational  Complexity\nInformation Hiding in CSS : A Secure Scheme Text-Steganography using  Public Key Cryptosystem\nA Well-typed Lightweight Situation Calculus\nA dependent nominal type theory\nAn Authoring System for Editing Lessons in Phonetic English in SMIL3.0\nInference and Plausible Reasoning in a Natural Language Understanding  System Based on 
Object-Oriented Semantics\nA non-local method for robustness analysis of floating point programs\nQRB-Domains and the Probabilistic Powerdomain\nScaling Laws in Human Language\nTheorem proving for prenex Gödel logic with Delta: checking validity  and unsatisfiability\nEstablishing linguistic conventions in task-oriented primeval dialogue\nManual and Fast C Code Optimization\nAn MLP based Approach for Recognition of Handwritten `Bangla' Numerals\nHandwritten Bangla Alphabet Recognition using an MLP Based Classifier\nDistributional Measures of Semantic Distance: A Survey\nA Unifying Framework to Characterize the Power of a Language to Express  Relations\nPatterns in rational base number systems\nMassively Increasing TIMEX3 Resources: A Transduction Approach\nA Data Driven Approach to Query Expansion in Question Answering\nThe magnetospheric chaos hypothesis: a new point of view of the  magnetospheric dynamics\nRoget's Thesaurus and Semantic Similarity\nNot As Easy As It Seems: Automating the Construction of Lexical Chains  Using Roget's Thesaurus\nSegmentation Similarity and Agreement\nLower Complexity Bounds for Lifted Inference\nBiographical Social Networks on Wikipedia - A cross-cultural study of  links that made history\nThe Distributed Ontology Language (DOL): Ontology Integration and  Interoperability Applied to Mathematical Formalization\nLearning Semantic String Transformations from Examples\nCologne: A Declarative Distributed Constraint Optimization Platform\nSpecification and Verification of Uplink Framework for Application of  Software Engineering using RM-ODP\nConstraint LTL Satisfiability Checking without Automata\nCharacterizing Ranked Chinese Syllable-to-Character Mapping Spectrum: A  Bridge Between the Spoken and Written Chinese Language\nA Common Evaluation Setting for Just.Ask, Open Ephyra and Aranea QA  systems\nIssues of Architectural Description Languages for Handling Dynamic  Reconfiguration\nLanguage-Constraint Reachability Learning in 
Probabilistic Graphs\nBehavioural Types for Actor Systems\nTemporal expression normalisation in natural language texts\nInvariant measures concentrated on countable structures\nLearning to Identify Regular Expressions that Describe Email Campaigns\nDetection of Configuration Vulnerabilities in Distributed (Web)  Environments\nIntellectual Management of Enterprise\nA multi-prover interactive proof for NEXP sound against entangled  provers\nAn Exploratory Study of Forces and Frictions affecting Large-Scale  Model-Driven Development\nMDM: A Mode Diagram Modeling Framework for Periodic Control Systems\nThe law of brevity in macaque vocal communication is not an artifact of  analyzing mean call durations\nA Logic Programming Framework for Possibilistic Argumentation with Vague  Knowledge\nCompleteness of algebraic CPS simulations\nEfficient Indexing and Querying over Syntactically Annotated Trees\nThe Distributed Ontology Language (DOL): Use Cases, Syntax, and  Extensibility\nGuidelines for a Dynamic Ontology - Integrating Tools of Evolution and  Versioning in Ontology\nProceedings Combined 19th International Workshop on Expressiveness in  Concurrency and 9th Workshop on Structured Operational Semantics\nProceedings 18th international workshop on Cellular Automata and  Discrete Complex Systems and 3rd international symposium Journées Automates  Cellulaires\nTwo-Way Finite Automata: Old and Recent Results\nOrdered {AND, OR}-Decomposition and Binary-Decision Diagram\nConcept driven framework for Latent Table Discovery\nParameterized Concurrent Multi-Party Session Types\nRelaxed Operational Semantics of Concurrent Programming Languages\nAuthorship Identification in Bengali Literature: a Comparative Analysis\nHistory-Register Automata\nBlackboard Rules for Coordinating Context-aware Applications in Mobile  Ad Hoc Networks\nKnots, Braids and First Order Logic\nA Note on Program Specialization. 
What Can Syntactical Properties of  Residual Programs Reveal?\nAutomatic Unbounded Verification of Alloy Specifications with Prover9\nOn the Automation of Encoding Processes in the Quantum IO Monad\nSuperdense Coding with GHZ and Quantum Key Distribution with W in the  ZX-calculus\nA Semantic Approach for Automatic Structuring and Analysis of Software  Process Patterns\nImproved Quantum Query Algorithms for Triangle Finding and Associativity  Testing\nEfficient Tabling of Structured Data with Enhanced Hash-Consing\nInference of Fine-grained Attributes of Bengali Corpus for Stylometry  Detection\nAdvanced Automata Minimization\nThe Hangulphabet: A Descriptive Alphabet\nOptimal size, freshness and time-frame for voice search vocabulary\nDating Texts without Explicit Temporal Cues\nFault Localization Using Textual Similarities\nCombining Insertion and Deletion in RNA-editing Preserves Regularity\nTracking and Quantifying Censorship on a Chinese Microblogging Site\nLetter counting: a stem cell for Cryptology, Quantitative Linguistics,  and Statistics\nUsing external sources of bilingual information for on-the-fly word  alignment\nEvolution of the most common English words and phrases over the  centuries\nCoherent Minimisation: Towards efficient tamper-proof compilation\nOperational semantics for product-form solution\nlibcppa - Designing an Actor Semantic for C++11\nDistinguishing Models by Formulas and the Number of Countable Models\nSPARC - Sorted ASP with Consistency Restoring Rules\nPlanning and Scheduling in Hybrid Domains Using Answer Set Programming\nA modal perspective on the transverse Anderson localization of light in  disordered optical lattices\nTEI and LMF crosswalks\nThe Expressive Power of Word Embeddings\nProceedings First International Workshop on Trends in Functional  Programming in Education\nFrom 3D Point Clouds To Semantic Objects An Ontology-Based Detection  Approach\nFrom Two-Way to One-Way Finite State Transducers\nReal-Time Specification 
Patterns and Tools\nA Survey on Array Storage, Query Languages, and Systems\nTag-based Semantic Website Recommendation for Turkish Language\nArabic text summarization based on latent semantic analysis to enhance  arabic documents clustering\nLearning Universally Quantified Invariants of Linear Data Structures\nAn Algebraic Semantics for Possibilistic Logic\nStochastic dynamics of lexicon learning in an uncertain and nonuniform  world\nRole of temporal inference in the recognition of textual inference\nDevelopment of Yes/No Arabic Question Answering System\nNon-simplifying Graph Rewriting Termination\nKSU KDD: Word Sense Induction by Clustering in Topic Space\nObject-oriented programming: some history, and challenges for the next  fifty years\nEfficient learning strategy of Chinese characters based on network  approach\nAutomatic Verification of Erlang-Style Concurrency\nSemantics for Probabilistic Inference\nEffective Characterizations of Simple Fragments of Temporal Logic Using  Carton--Michel Automata\nSemantic Matching of Security Policies to Support Security Experts\nThe operad of wiring diagrams: formalizing a graphical language for  databases, recursion, and plug-and-play circuits\nProgramming with Personalized PageRank: A Locally Groundable First-Order  Probabilistic Logic\nBinary Tree based Chinese Word Segmentation\nRandom crossings in dependency trees\nOn the Concept of Variable Roles and its Use in Software Analysis\nTowards an Ontology based integrated Framework for Semantic Web\nProceedings 11th International Workshop on Quantitative Aspects of  Programming Languages and Systems\nA Universal Machine for Biform Theory Graphs\nExtended to Multi-Tilde-Bar Regular Expressions and Efficient Finite  Automata Constructions\nSemantics and pragmatics in actual software applications and in web  search engines: exploring innovations\nImproving Pointwise Mutual Information (PMI) by Incorporating  Significant Co-occurrence\nIntelligent Hybrid Man-Machine 
Translation Quality Estimation\nAlias and Change Calculi, Applied to Frame Inference\nSays who? Automatic Text-Based Content Analysis of Television News\nThe operad of temporal wiring diagrams: formalizing a graphical language  for discrete-time processes\nA Novel Architecture For Question Classification Based Indexing Scheme  For Efficient Question Answering\nAnnotations for Intersection Typechecking\nTagging Scientific Publications using Wikipedia and Natural Language  Processing Tools. Comparison on the ArXiv Dataset\nCombining and Relating Control Effects and their Semantics\nStructure Learning of Probabilistic Logic Programs by Searching the  Clause Space\nA Machine-Checked Proof for a Translation of Event-B Machines to JML\nSafeJS: Hermetic Sandboxing for JavaScript\nEven the Abstract have Colour: Consensus in Word-Colour Associations\nFrom Once Upon a Time to Happily Ever After: Tracking Emotions in Novels  and Fairy Tales\nAn Inter-lingual Reference Approach For Multi-Lingual Ontology Matching\n$μ$-Limit Sets of Cellular Automata from a Computational Complexity  Perspective\nRetargeting GCC: Do We Reinvent the Wheel Every Time?\nPathwise Taylor Expansions for Random Fields on Multiple Dimensional  Paths\nSubjective and Objective Evaluation of English to Urdu Machine  Translation\nFormal verification in Coq of program properties involving the global  state effect\nIntroducing Enriched Concrete Syntax Trees\nA Scratch-like visual programming system for Microsoft Windows Phone 8\nEvolution of the Modern Phase of Written Bangla: A Statistical Study\nNamed entity recognition using conditional random fields with non-local  relational constraints\nSpeculative Staging for Interpreter Optimization\nCertified proofs in programs involving exceptions\nSpatio-temporal variation of conversational utterances on Twitter\nKeyboards for inputting Japanese language -A study based on US patents\nEhrenfeucht-Fraisse Games on Omega-Terms\nCan Twitter Predict Royal Baby's Name 
?\nDescription and Evaluation of Semantic Similarity Measures Approaches\nIdentifying Purpose Behind Electoral Tweets\nSBML for optimizing decision support's tools\nPlanning by case-based reasoning based on fuzzy logic\nOn the Structure of Bispecial Sturmian Words\nClustering and Relational Ambiguity: from Text Data to Natural Data\nFlexible Invariants Through Semantic Collaboration\nEfficient XML Keyword Search based on DAG-Compression\nApplying AOSE Concepts to Model Crosscutting Variability in Variant-Rich  Processes\nA Theory of Changes for Higher-Order Languages - Incrementalizing  λ-Calculi by Static Differentiation\nFormal Model of Web Service Composition: An Actor-Based Approach to  Unifying Orchestration and Choreography\nSession Types with Runtime Adaptation: Overview and Examples\nTowards deductive verification of MPI programs against session types\nDomain adaptation for sequence labeling using hidden Markov models\nEffective Slot Filling Based on Shallow Distant Supervision Methods\nLinear Temporal Logic for Regular Cost Functions\nDictionary-Based Concept Mining: An Application for Turkish\nONTS: \"Optima\" News Translation System\nLogical Foundations of RDF(S) with Datatypes\nThe Opposite of Smoothing: A Language Model Approach to Ranking  Query-Specific Document Clusters\nDefeasible Inclusions in Low-Complexity DLs\nFunctorial Semantics of Second-Order Algebraic Theories\nFormalization and Verification of Hierarchical Use of Interaction  Overview Diagrams Using Timing Diagrams\nIdentification of Pleonastic It Using the Web\nInferring Shallow-Transfer Machine Translation Rules from Small Parallel  Corpora\nIntegrative Semantic Dependency Parsing via Efficient Large-scale  Feature Selection\nThe Readability of Tweets and their Geographic Correlation with  Education\nControlling Complexity in Part-of-Speech Induction\nSafety verification of asynchronous pushdown systems with shaped stacks\nA Machine Learning Approach for the Identification of Bengali 
Noun-Noun  Compound Multiword Expressions\nKeyword and Keyphrase Extraction Using Centrality Measures on  Collocation Networks\nBehavior recognition and analysis in smart environments for  context-aware applications\nFree Applicative Functors\nFinding Eyewitness Tweets During Crises\nUsing Entropy Estimates for DAG-Based Ontologies\nTransaction Repair: Full Serializability Without Locks\nK-Position, Follow, Equation and K-C-Continuation Tree Automata  Constructions\nUpdating RDFS ABoxes and TBoxes in SPARQL\nAxiomatization of Finite Algebras\nProceedings 9th International Workshop on Developments in Computational  Models\nTowards Formal Interaction-Based Models of Grid Computing  Infrastructures\nMTL-Model Checking of One-Clock Parametric Timed Automata is Undecidable\nMining Idioms from Source Code\nA Note on Relative Observability in Coordination Control\nMeta-evaluation of comparability metrics using parallel corpora\nAn Empirical Comparison of Parsing Methods for Stanford Dependencies\nInference in the FO(C) Modelling Language\nThe Dafny Integrated Development Environment\nFoCaLiZe: Inside an F-IDE\nReducing Clocks in Timed Automata while Preserving Bisimulation\nA tlm-based platform to specify and verify component-based real-time  systems\nExemplar Dynamics Models of the Stability of Phonological Categories\nA Concurrent Pattern Calculus\nUsing Tabled Logic Programming to Solve the Petrobras Planning Problem\nComparison of the language networks from literature and blogs\nMéthodes pour la représentation informatisée de données  lexicales / Methoden der Speicherung lexikalischer Daten\nNew Perspectives in Sinographic Language Processing Through the Use of  Character Structure\nDecision Problems for Deterministic Pushdown Automata on Infinite Words\nMachine Translation Model based on Non-parallel Corpus and  Semi-supervised Transductive Learning\nEfficient and Reasonable Object-Oriented Concurrency\nLockdown: Dynamic Control-Flow Integrity\nThe double-slit 
quantum eraser experiments and Hardy's paradox in the  quantum linguistic interpretation\nA Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene  Text Extraction\nStrategic Port Graph Rewriting: An Interactive Modelling and Analysis  Framework\nSpecifying and Executing Optimizations for Parallel Programs\nA Dual-Engine for Early Analysis of Critical Systems\nFixed-point Characterization of Compositionality Properties of  Probabilistic Processes Combinators\nDatabase Queries that Explain their Work\nReal-Time and Robust Method for Hand Gesture Recognition System Based on  Cross-Correlation Coefficient\nDistributed Graph Automata and Verification of Distributed Algorithms\nOpinion mining of movie reviews at document level\nVerifiable UML Artifact-Centric Business Process Models (Extended  Version)\nWhen is a container a comonad?\nHourglass Automata\nEvaluating Neural Word Representations in Tensor-Based Compositional  Settings\nA Multi-World Approach to Question Answering about Real-World Scenes  based on Uncertain Input\nSupervised learning Methods for Bangla Web Document Categorization\nBilBOWA: Fast Bilingual Distributed Representations without Word  Alignments\nSpace-Efficient Manifest Contracts\nTowards Static Analysis of Functional Programs using Tree Automata  Completion\nRelational Linear Programs\nProcess-aware web programming with Jolie\nLearning Distributed Word Representations for Natural Logic Reasoning\nConvex Optimization in Julia\nOn the Relation of Interaction Semantics to Continuations and  Defunctionalization\nLearning Vague Concepts for the Semantic Web\nProving Safety with Trace Automata and Bounded Model Checking\nA type assignment for lambda-calculus complete both for FPTIME and  strong normalization\nA classical approach to smooth supermanifolds\nA random forest system combination approach for error detection in  digital dictionaries\nUsing Linguistic Features to Estimate Suicide Probability of Chinese  Microblog 
Users\nRetrofitting Word Vectors to Semantic Lexicons\nA Pooling Approach to Modelling Spatial Relations for Image Retrieval  and Annotation\nType-Driven Incremental Semantic Parsing with Polymorphism\nA Robust Class of Data Languages and an Application to Learning\nLABR: A Large Scale Arabic Sentiment Analysis Benchmark\nThe influence of infant-directed speech on 12-month-olds' intersensory  perception of fluent speech\nRoman Urdu Opinion Mining System (RUOMiS)\nCombining Language and Vision with a Multimodal Skip-gram Model\nClosing the Gap -- Formally Verifying Dynamically Typed Programs like  Statically Typed Ones Using Hoare Logic -- Extended Version --\nReal Time Collaborative Platform for Learning and Teaching Foreign  Languages\nMining Scientific Papers for Bibliometrics: a (very) Brief Survey of  Methods and Tools\nA comparative study of approaches in user-centered health information  retrieval\nCounting Branches in Trees Using Games\nFrom Non-preemptive to Preemptive Scheduling using Synchronization  Synthesis\nFlickr30k Entities: Collecting Region-to-Phrase Correspondences for  Richer Image-to-Sentence Models\nTranslation Memory Retrieval Methods\nThe IBM 2015 English Conversational Telephone Speech Recognition System\nNeeded Computations Shortcutting Needed Steps\nUnveiling the Political Agenda of the European Parliament Plenary: A  Topical Analysis\nOverview of the NLPCC 2015 Shared Task: Chinese Word Segmentation and  POS Tagging for Micro-blog Texts\nSupervised Fine Tuning for Word Embedding with Integrated Knowledge\nThe canonical measure on a reductive p-adic group is motivic\nFAQ-based Question Answering via Word Alignment\nMulti-Lingual Ontology Server (MOS) for discovering Web services\nSynthesis of Recursive ADT Transformations from Reusable Templates\nPutting Logic-Based Distributed Systems on Stable Grounds\nEvent-Driven Network Programming\nDetermination of the Internet Anonymity Influence on the Level of  Aggression and Usage of Obscene 
Lexis\nThe ICSTM+TUM+UP Approach to the 3rd CHIME Challenge: Single-Channel  LSTM Speech Enhancement with Multi-Channel Correlation Shaping  Dereverberation and LSTM Language Models\nA counterexample to the reconstruction of $ω$-categorical  structures from their endomorphism monoids\nAutomatic Taxonomy Extraction from Query Logs with no Additional Sources  of Information\nCalculating entropy at different scales among diverse communication  systems\nFormulae in noncommutative Hodge theory\nMulti-head Watson-Crick automata\nSAFE: A Declarative Trust Management System with Linked Credentials\nShape Expressions Schemas\nPart-of-Speech Tagging with Bidirectional Long Short-Term Memory  Recurrent Neural Network\nArchitectural Consistency Checking in Plugin-Based Software Systems\nA web-based IDE for IDP\nDetecting Interrogative Utterances with Recurrent Neural Networks\nColor Aesthetics and Social Networks in Complete Tang Poems:  Explorations and Discoveries\nAn Empirical Study on Sentiment Classification of Chinese Review using  Word Embedding\nProfinite Monads, Profinite Equations, and Reiterman's Theorem\nStacked Attention Networks for Image Question Answering\nLearning Linguistic Biomarkers for Predicting Mild Cognitive Impairment  using Compound Skip-grams\nProceedings First International Workshop on Focusing\nInvestigating the stylistic relevance of adjective and verb simile  markers\nFrom Images to Sentences through Scene Description Graphs using  Commonsense Reasoning and Knowledge\nLearning to Represent Words in Context with Multilingual Supervision\nA System for Extracting Sentiment from Large-Scale Arabic Social Data\nHarvesting comparable corpora and mining them for equivalent bilingual  sentences using statistical classification and analogy- based heuristics\nAlternative structures for character-level RNNs\nSkip-Thought Memory Networks\nA C-LSTM Neural Network for Text Classification\nSemi-supervised and Unsupervised Methods for Categorizing Posts in Web  
Discussion Forums\nNonparametric Spherical Topic Modeling with Word Embeddings\nAutomatic Annotation of Structured Facts in Images\nVariations on Noetherianness\nA Software Methodology for Compiling Quantum Programs\nApplying Ontological Modeling on Quranic Nature Domain\nSpeed-Constrained Tuning for Statistical Machine Translation Using  Bayesian Optimization\nUnderstanding Rating Behaviour and Predicting Ratings by Identifying  Representative Users\nM$^2$S-Net: Multi-Modal Similarity Metric Learning based Deep  Convolutional Network for Answer Selection\nAdvances in Property-Based Testing for $α$Prolog\nVisual Relationship Detection with Language Priors\nStructured prediction models for RNN based sequence labeling in clinical  text\nTwo-Buffer Simulation Games\nEfficient Algebraic Effect Handlers for Prolog\nSemantic Representations of Word Senses and Concepts\nLock-free atom garbage collection for multithreaded Prolog\nTo Swap or Not to Swap? Exploiting Dependency Word Pairs for Reordering  in Statistical Machine Translation\nLanguage free character recognition using character sketch and center of  gravity shifting\nWords, Concepts, and the Geometry of Analogy\nEntailment Relations on Distributions\nAngularJS in the Wild: A Survey with 460 Developers\nASP for Minimal Entailment in a Rational Extension of SROEL\nDetermining Health Utilities through Data Mining of Social Media\nAnalysis of Morphology in Topic Modeling\nLearning Latent Local Conversation Modes for Predicting Community  Endorsement in Online Discussions\nSlicing Concurrent Constraint Programs\nModeling Human Reading with Neural Attention\nCurryCheck: Checking Properties of Curry Programs\nLearning Word Embeddings from Intrinsic and Extrinsic Views\nAn Incremental Parser for Abstract Meaning Representation\nTowards Machine Comprehension of Spoken Content: Initial TOEFL Listening  Comprehension Test by Machine\nTracking Amendments to Legislation and Other Political Texts with a  Novel 
Minimum-Edit-Distance Algorithm: DocuToads\nUsing Semantic Similarity for Input Topic Identification in  Crawling-based Web Application Testing\nSemantic descriptions of 24 evaluational adjectives, for application in  sentiment analysis\nA Parallel Memory-efficient Epistemic Logic Program Solver: Harder,  Better, Faster\nMachine Comprehension Using Match-LSTM and Answer Pointer\nVisual Question: Predicting If a Crowd Will Agree on the Answer\nNoFAQ: Synthesizing Command Repairs from Examples\nHash2Vec, Feature Hashing for Word Embeddings\nAccelerating QDP++ using GPUs\nWorkflows for the Management of Change in Science, Technologies,  Engineering and Mathematics\nTRX: A Formally Verified Parser Interpreter\nPerception of Personality and Naturalness through Dialogues by Native  Speakers of American English and Arabic\nISICSoo: a class for the calculation of ionization cross sections from  ECPSSR and PWBA theory\nCinemaGazer: a System for Watching Video at Very High Speed\nModelling Mixed Discrete-Continuous Domains for Planning\nSymmetric Encapsulated Multi-Methods\nProceedings Sixth International Workshop on Logical Frameworks and  Meta-languages: Theory and Practice\nGrammatical Relations of Myanmar Sentences Augmented by  Transformation-Based Learning of Function Tagging\nThe Language of Two-by-two Matrices spoken by Optical Devices\nMeaning-focused and Quantum-inspired Information Retrieval\nParameterized Verification of Asynchronous Shared-Memory Systems\nObject-Oriented Translation for Programmable Relational System (DRAFT)\nProbability Distributions Over Possible Worlds\nFooPar: A Functional Object Oriented Parallel Framework in Scala\nPower of the interactive proof systems with verifiers modeled by  semi-quantum two-way finite automata\nAbstract machines for game semantics, revisited\nOn the Complexity of Verifying Regular Properties on Flat Counter  Systems\nConstant conditional entropy and related hypotheses\nA formalisation of XMAS\nFirst-Class Functions 
for First-Order Database Engines\nA Comparison of Named Entity Recognition Tools Applied to Biographical  Texts\nScience Fiction as a Worldwide Phenomenon: A Study of International  Creation, Consumption and Dissemination\nExploratory Analysis of Highly Heterogeneous Document Collections\nHidden Structure and Function in the Lexicon\nLinearizability with Ownership Transfer\nB(eo)W(u)LF: Facilitating recurrence analysis on multi-level language\nImplementation Of Back-Propagation Neural Network For Isolated Bangla  Speech Recognition\nTowards an Effective Decision Procedure for LTL formulas with  Constraints\nPACE: Pattern Accurate Computationally Efficient Bootstrapping for  Timely Discovery of Cyber-Security Concepts\nSecond-Order Algebraic Theories\nHeisenberg uncertainty principle and quantum Zeno effects in the  linguistic interpretation of quantum mechanics\nBots vs. Wikipedians, Anons vs. Logged-Ins\nTopic Segmentation and Labeling in Asynchronous Conversations\nLong Short-Term Memory Based Recurrent Neural Network Architectures for  Large Vocabulary Speech Recognition\nTowards High Performance Computing (Hpc) Through Parallel Programming  Paradigms and Their Principles\nPredicting Crowd Behavior with Big Public Data\nAn evaluation of keyword extraction from online communication for the  characterisation of social relations\nRigorous Description Of Design Components Functionality: An Approach  Based Contract\nAn evaluative baseline for geo-semantic relatedness and similarity\nword2vec Explained: deriving Mikolov et al.'s negative-sampling  word-embedding method\nExtracting Networks of Characters and Places from Written Works with  CHAPLIN\nZappa-Szép products of Garside monoids\nNested Regular Path Queries in Description Logics\nKoka: Programming with Row Polymorphic Effect Types\nFoundations of Total Functional Data-Flow Programming\nMultiparty Session Actors\nRepresenting Network Trust and Using It to Improve Anonymous  Communication\nModelling, Visualising 
and Summarising Documents with a Single  Convolutional Neural Network\nThe Frobenius anatomy of word meanings II: possessive relative pronouns\nPredicting Motivations of Actions by Leveraging Text\nSurveyMan: Programming and Automatically Debugging Surveys\nDeep Fragment Embeddings for Bidirectional Image Sentence Mapping\nCentral compact objects, superslow X-ray pulsars, gamma-ray bursts: do  they have anything to do with magnetars?\nVisual Speech Recognition\nExercises for Children with Dyslalia-Software Infrastructure\nModel Evolution and Management\nExploiting Social Network Structure for Person-to-Person Sentiment  Analysis\nA Study of Association Measures and their Combination for Arabic MWT  Extraction\nDesigning and Deploying Online Field Experiments\nAn Algorithm Based on Empirical Methods, for Text-to-Tuneful-Speech  Synthesis of Sanskrit Verse\nDecidability Problems for Actor Systems\nThe UML as a Formal Modeling Notation\nExpression-based aliasing for OO-languages\nA stencil-based implementation of Parareal in the C++ domain specific  embedded language STELLA\nRoboBrain: Large-Scale Knowledge Engine for Robots\nTiered Clustering to Improve Lexical Entailment\nHorn Clauses for Communicating Timed Systems\nHigh Performance Computing Evaluation A methodology based on Scientific  Application Requirements\nDeep Learning for Answer Sentence Selection\nPractice in Synonym Extraction at Large Scale\nDeclaratively solving tricky Google Code Jam problems with Prolog-based  ECLiPSe CLP system\nOptimization models of natural communication\nWord learning under infinite uncertainty\nA Robust Transformation-Based Learning Approach Using Ripple Down Rules  for Part-of-Speech Tagging\nTowards Live Programming in ROS with PhaROS and LRP\nReport on a User Test and Extension of a Type Debugger for Novice  Programmers\nInducing Semantic Representation from Text by Jointly Predicting and  Factorizing Relations\nLearning Longer Memory in Recurrent Neural Networks\nA key to 
the projective model of homogeneous metric spaces\nTensors, !-graphs, and non-commutative quantum structures\nOn products of elementarily indivisible structures\nText Understanding from Scratch\nStakeholders, Viewpoints and Languages of a Modelling Framework for the  Design and Development of Data-Intensive Mobile Apps\nA Linear Dynamical System Model for Text\nObservationally Cooperative Multithreading\nSherali-Adams relaxations for valued CSPs\nElements of style of BPMN language\nPantheon 1.0, a manually verified dataset of globally famous biographies\nTask-Oriented Learning of Word Embeddings for Semantic Relation  Classification\nThe lambda mechanism in lambda calculus and in other calculi\nNatural Notation for the Domestic Internet of Things\nProceedings Seventh Workshop on Intersection Types and Related Systems\nA Context-Based Semantics for SPARQL Property Paths over the Web  (Extended Version)\nA Denotational Semantics for Communicating Unstructured Code\nPrediction Using Note Text: Synthetic Feature Creation with word2vec\nYara Parser: A Fast and Accurate Dependency Parser\nNormalization of Non-Standard Words in Croatian Texts\nConditioning in Probabilistic Programming\nEvaluation Evaluation a Monte Carlo study\nMining and discovering biographical information in Difangzhi with a  language-model-based approach\nNeostability in countable homogeneous metric spaces\nBlade: A Data Center Garbage Collector\nRobobarista: Object Part based Transfer of Manipulation Trajectories  from Crowd-sourcing in 3D Pointclouds\nQuantitative Analysis of Probabilistic Models of Software Product Lines  with Statistical Model Checking\nHandshaking Protocol for Distributed Implementation of Reo\nSome New Directions for ACP Research\nTowards a relation extraction framework for cyber-security concepts\nSelf-Adaptive Hierarchical Sentence Model\nBig Data Small Data, In Domain Out-of Domain, Known Word Unknown Word:  The Impact of Word Representation on Sequence Labelling 
Tasks\nEfficient Non-parametric Estimation of Multiple Embeddings per Word in  Vector Space\nInferring Missing Entity Type Instances for Knowledge Base Completion:  New Dataset and Methods\nStochastic And-Or Grammars: A Unified Framework and Logic Perspective\nWhat value do explicit high level concepts have in vision to language  problems?\nVisualizing and Understanding Recurrent Networks\nConfounds and Consequences in Geotagged Twitter Data\nConnotation Frames: A Data-Driven Investigation\nFormalization of closure properties for context-free grammars\nListen, Attend, and Walk: Neural Mapping of Navigational Instructions to  Action Sequences\nDistilling Word Embeddings: An Encoding Approach\nLinguistic Harbingers of Betrayal: A Case Study on an Online Strategy  Game\n\"The Sum of Its Parts\": Joint Learning of Word and Phrase  Representations with Autoencoders\nWeighted Automata and Logics for Infinite Nested Words\nTopic2Vec: Learning Distributed Representations of Topics\nThe Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured  Multi-Turn Dialogue Systems\nMimicry Is Presidential: Linguistic Style Matching in Presidential  Debates and Improved Polling Numbers\nCall Graph Profiling for Multi Agent Systems\nContinuous model theories for von Neumann algebras\nInformation-theoretical analysis of the statistical dependencies among  three variables: Applications to written language\nClassifying Relations via Long Short Term Memory Networks along Shortest  Dependency Path\nFormal Reasoning Using an Iterative Approach with an Integrated Web IDE\nAndriod Based Punjabi TTS System\nBetter Summarization Evaluation with Word Embeddings for ROUGE\nRegular Hilberg Processes: An Example of Processes with a Vanishing  Entropy Rate\nA fully data-driven method to identify (correlated) changes in  diachronic corpora\nDSL-based Design Space Exploration for Temporal and Spatial Parallelism  of Custom Stream Computing\nA Strong Distillery\nAnalyse lexicale outill{é}e 
de la parole transcrite de patients  schizophr{è}nes\nIntegrate Document Ranking Information into Confidence Measure  Calculation for Spoken Term Detection\nC3: Lightweight Incrementalized MCMC for Probabilistic Programs using  Continuations and Callsite Caching\nReal-time Sign Language Fingerspelling Recognition using Convolutional  Neural Networks from Depth map\nTowards Understanding Egyptian Arabic Dialogues\nA Formal C Memory Model for Separation Logic\nA Higher-Order Abstract Syntax Approach to Verified Transformations on  Functional Programs\nInkdots as advice for finite automata\nKannada named entity recognition and classification (nerc) based on  multinomial naïve bayes (mnb) classifier\nNoise Robust IOA/CAS Speech Separation and Recognition System For The  Third 'CHIME' Challenge\nNoise-Robust ASR for the third 'CHiME' Challenge Exploiting  Time-Frequency Masking based Multi-Channel Speech Enhancement and Recurrent  Neural Network\nBilingual Distributed Word Representations from Document-Aligned  Comparable Data\nSegment-Phrase Table for Semantic Segmentation, Visual Entailment and  Paraphrasing\nSemantics, Representations and Grammars for Deep Learning\nTuned and GPU-accelerated parallel data mining from comparable corpora\nPolish -English Statistical Machine Translation of Medical Texts\nLSTM Neural Reordering Feature for Statistical Machine Translation\nMXNet: A Flexible and Efficient Machine Learning Library for  Heterogeneous Distributed Systems\nUnsupervised comparable corpora preparation and exploration for  bi-lingual translation equivalents\nUsing Functional Programming for Development of Distributed, Cloud and  Web Applications in F#\nMaking an Embedded DBMS JIT-friendly\nAnd the math will set you free\nVerifying Temporal Properties of Reactive Systems by Transformation\nControl Flow Analysis for SF Combinator Calculus\nAgreement-based Joint Training for Bidirectional Attention-based Neural  Machine Translation\nBayesDB: A probabilistic 
programming system for querying the probable  implications of data\nA Theoretically Grounded Application of Dropout in Recurrent Neural  Networks\nDynamic Graph Queries\nA Survey of Available Corpora for Building Data-Driven Dialogue Systems\nPath computation in multi-layer multi-domain networks: A language  theoretic approach\nFeedforward Sequential Memory Networks: A New Structure to Learn  Long-term Dependency\nStatistical methods for linguistic research: Foundational Ideas - Part I\nFrom Word Embeddings to Item Recommendation\nLeveraging Sentence-level Information with Encoder LSTM for Semantic  Slot Filling\nReflections on Monadic Lenses\nJikesRVM: Internal Mechanisms Study and Garbage Collection with MMTk\nAdding Real-time Capabilities to a SML Compiler\nPredicting the Effectiveness of Self-Training: Application to Sentiment  Classification\nA Taxonomy for Tools, Processes and Languages in Automotive Software  Engineering\nMultimodal Pivots for Image Caption Translation\nProbabilistic Inference of Twitter Users' Age based on What They Follow\nEfficient Quantile Computation in Markov Chains via Counting Problems  for Parikh Images\nJoint Source-Channel Decoding of Polar Codes for Language-Based Source\nCharacter-Level Incremental Speech Recognition with Recurrent Neural  Networks\nStatistical methods for linguistic research: Foundational Ideas - Part  II\nFrom $μ$-Calculus to Alternating Tree Automata using Parity Games\nResults and Analysis of SyGuS-Comp'15\nFrom Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label  Classification\nUsing session types as an effect system\nBehavioural types for non-uniform memory accesses\nSigner-independent Fingerspelling Recognition with Deep Neural Network  Adaptation\nAttention-Based Convolutional Neural Network for Machine Comprehension\nBi-directional LSTM Recurrent Neural Network for Chinese Word  Segmentation\nOverview of Annotation Creation: Processes & Tools\nAbstractive Text Summarization Using 
Sequence-to-Sequence RNNs and  Beyond\nText Matching as Image Recognition\nMultilingual Twitter Sentiment Classification: The Role of Human  Annotators\nChoreographies in Practice\nEvaluation of Deep Learning based Pose Estimation for Sign Language  Recognition\nSegmental Recurrent Neural Networks for End-to-end Speech Recognition\nEvent Search and Analytics: Detecting Events in Semantically Annotated  Corpora for Search and Analytics\nPorting Code Across Simple Mobile Robots\nFrom manuscript catalogues to a handbook of Syriac literature: Modeling  an infrastructure for Syriaca.org\nDynamic Memory Networks for Visual and Textual Question Answering\nSentiment Analysis in Scholarly Book Reviews\nSolutions of Word Equations over Partially Commutative Structures\nFuzzy alternating $\\mathrm{B\\ddot{u}chi}$ automata over distributive  lattices\nTwo-variable Logic with a Between Predicate\nBank distress in the news: Describing events through deep learning\nStack-propagation: Improved Representation Learning for Syntax\nArray Folds Logic\nSummaries for Context-Free Games\nSemantic Regularities in Document Representations\nRecursive Neural Language Architecture for Tag Prediction\nPart-of-Speech Relevance Weights for Learning Word Embeddings\nPointing the Unknown Words\nA Parallel-Hierarchical Model for Machine Comprehension on Sparse Data\nLifeJacket: Verifying precise floating-point optimizations in LLVM\nExpressivity and Complexity of MongoDB (Extended Version)\nDifferentiable Pooling for Unsupervised Acoustic Model Adaptation\nMWStat: A Modulated Web-Based Statistical System\nStance and Sentiment in Tweets\nDetecting Context Dependence in Exercise Item Candidates Selected from  Corpora\nRobust Dialog State Tracking for Large Ontologies\nTypes from data: Making structured data first-class citizens in F#\nVocabulary Manipulation for Neural Machine Translation\nTweet2Vec: Character-Based Distributed Representations for Social Media\nRelation Schema Induction using Tensor 
Factorization with Side  Information\nA Relatively Small Turing Machine Whose Behavior Is Independent of Set  Theory\nJoint Learning of Sentence Embeddings for Relevance and Entailment\nAutomatic Detection and Categorization of Election-Related Tweets\nMontre: A Tool for Monitoring Timed Regular Expressions\nProgramming with a Differentiable Forth Interpreter\nAutomatic Construction of Discourse Corpora for Dialogue Translation\nTextual Paralanguage and its Implications for Marketing Communications\nClassifying discourse in a CSCL platform to evaluate correlations with  Teacher Participation and Progress\nDesign and development a children's speech database\nCrowdSource: Automated Inference of High Level Malware Functionality  from Low-Level Symbols Using a Crowd Trained Machine Learning Model\nLogic of Local Inference for Contextuality in Quantum Physics and Beyond\nThe Mathematical Foundations for Mapping Policies to Network Devices  (Technical Report)\nPreliminary Results Towards Contract Monitorability\nStochastic Structured Prediction under Bandit Feedback\nThe Role of Translated Information Quality in a Global e-Retailing  Context\nCorrelation and Substitution in SPARQL\nCFO: Conditional Focused Neural Question Answering with Large-scale  Knowledge Bases\nA Formal Semantic for UML 2.0 Activity Diagram based on Institution  Theory\nOn the Place of Text Data in Lifelogs, and Text Analysis via Semantic  Facets\nDefExt: A Semi Supervised Definition Extraction Tool\nTowards End-to-End Learning for Dialog State Tracking and Management  using Deep Reinforcement Learning\nA Thorough Examination of the CNN/Daily Mail Reading Comprehension Task\nEdinburgh Neural Machine Translation Systems for WMT 16\nWord Sense Disambiguation using a Bidirectional LSTM\ncltorch: a Hardware-Agnostic Backend for the Torch Deep Neural Network  Library, Based on OpenCL\nSQuAD: 100,000+ Questions for Machine Comprehension of Text\nSpectral decomposition method of dialog state tracking via 
collective  matrix factorization\nMultiparty Compatibility for Concurrent Objects\nFrom Push/Enter to Eval/Apply by Program Transformation\nAdaptive Just-in-time Value Class Optimization for Lowering Memory  Consumption and Improving Execution Time Performance\nExplaining Predictions of Non-Linear Classifiers in NLP\nThe emotional arcs of stories are dominated by six basic shapes\nP2P-PL: A Pattern Language to Design Efficient and Robust Peer-to-Peer  Systems\nCorpus-level Fine-grained Entity Typing Using Contextual Information\nThis before That: Causal Precedence in the Biomedical Domain\nHUME: Human UCCA-Based Evaluation of Machine Translation\nThrowing fuel on the embers: Probability or Dichotomy, Cognitive or  Linguistic?\nBiabduction (and Related Problems) in Array Separation Logic\nHybrid Information Flow Analysis for Programs with Arrays\nTranslating Bayesian Networks into Entity Relationship Models, Extended  Version\nActionable and Political Text Classification using Word Embeddings and  LSTM\nExploring the Political Agenda of the European Parliament Using a  Dynamic Topic Modeling Approach\nRecurrent Highway Networks\nProgressive Analytics: A Computation Paradigm for Exploratory Data  Analysis\nModeling Software Development Methodologies: A Logic Based Approach\nLightDP: Towards Automating Differential Privacy Proofs\nAll Fingers are not Equal: Intensity of References in Scientific  Articles\nCitation Classification for Behavioral Analysis of a Scientific Field\nImproving Correlation with Human Judgments by Integrating Semantic  Similarity with Second--Order Vectors\nSkipping Word: A Character-Sequential Representation based Framework for  Question Answering\nLexical-Morphological Modeling for Legal Text Analysis\nUsing Natural Language Processing to Screen Patients with Active Heart  Failure: An Exploration for Hospital-wide Surveillance\nDivide and...conquer? 
On the limits of algorithmic approaches to  syntactic semantic structure\nSpheres as Frobenius objects\nEfficient softmax approximation for GPUs\nAn Iterative Transfer Learning Based Ensemble Technique for Automatic  Short Answer Grading\nStereotypes in Search Engine Results: Understanding The Role of Local  and Global Factors\nGraph-Structured Representations for Visual Question Answering\nThe MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition\nRecognizing Implicit Discourse Relations via Repeated Reading: Neural  Networks with Multi-Level Attention\nThe distribution of information content in English sentences\nLexicon-Free Fingerspelling Recognition from Video: Data, Models, and  Signer Adaptation\nEffective Combination of Language and Vision Through Model Composition  and the R-CCA Method\nLearning to Translate in Real-time with Neural Machine Translation\nFPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks\nAn Arabic-Hebrew parallel corpus of TED talks\nAre Word Embedding-based Features Useful for Sarcasm Detection?\nEmbracing data abundance: BookTest Dataset for Reading Comprehension\nToward Automatic Understanding of the Function of Affective Language in  Support Groups\nThere's No Comparison: Reference-less Evaluation Metrics in Grammatical  Error Correction\nMining the Web for Pharmacovigilance: the Case Study of Duloxetine and  Venlafaxine\nInterpreting Neural Networks to Improve Politeness Comprehension\nOpen-Ended Visual Question-Answering\nSimultaneous Learning of Trees and Representations for Extreme  Classification and Density Estimation\nReal meromorphic differentials: a language for the meron configurations  in planar nanomagnets\nAchieving Human Parity in Conversational Speech Recognition\nWeighted Positive Binary Decision Diagrams for Exact Probabilistic  Inference\nSynthesis from Assume-Guarantee Contracts using Skolemized Proofs of  Realizability\nAutomating Induction for Solving Horn Clauses\nLambdaDL: Syntax 
and Semantics (Preliminary Report)\nBridging Neural Machine Translation and Bilingual Dictionaries\nReordering rules for English-Hindi SMT\nUTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation\nEmojiNet: Building a Machine Readable Sense Inventory for Emoji\nBroad Context Language Modeling as Reading Comprehension\nOn the Expressive Power of User-Defined Effects: Effect Handlers,  Monadic Reflection, Delimited Control\nPush vs. Pull-Based Loop Fusion in Query Engines\nNeural Speech Recognizer: Acoustic-to-Word LSTM Model for Large  Vocabulary Speech Recognition\nA Performance Survey on Stack-based and Register-based Virtual Machines\nDual Attention Networks for Multimodal Reasoning and Matching\nA FOFE-based Local Detection Approach for Named Entity Recognition and  Mention Detection\nAll or nothing: toward a promise problem dichotomy for constraint  problems\nLearning Recurrent Span Representations for Extractive Question  Answering\nQuasi-Recurrent Neural Networks\nA Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks\nBoosting Image Captioning with Attributes\nDeep Biaffine Attention for Neural Dependency Parsing\nSelf-Wiring Question Answering Systems\nIncreasing the throughput of machine translation systems using clouds\nGetting Started with Neural Models for Semantic Matching in Web Search\nUTCNN: a Deep Learning Model of Stance Classificationon on Social Media  Text\nSemi-automatic Simultaneous Interpreting Quality Evaluation\nA Feature-Enriched Neural Model for Joint Chinese Word Segmentation and  Part-of-Speech Tagging\nVariable Computation in Recurrent Neural Networks\nVisualizing and Understanding Curriculum Learning for Long Short-Term  Memory Networks\nTracking Words in Chinese Poetry of Tang and Song Dynasties with the  China Biographical Database\nRecurrent Memory Addressing for describing videos\nThe intuitionistic temporal logic of dynamical systems\nDense Prediction on Sequences with Time-Dilated Convolutions for Speech 
 Recognition\nThe Bricklayer Ecosystem - Art, Math, and Code\nDeep encoding of etymological information in TEI\nUnit Dependency Graph and its Application to Arithmetic Word Problem  Solving\nStudying Academic Indicators within Virtual Learning Environment Using  Educational Data Mining\nThe Complexity of Bayesian Networks Specified by Propositional and  Relational Languages\nNeural Symbolic Machines: Learning Semantic Parsers on Freebase with  Weak Supervision (Short Version)\nPrivacy Patterns\nQuons: A 3D Language for Quantum Information\nEmbedding Words and Senses Together via Joint Knowledge-Enhanced  Training\nGrammar rules for the isiZulu complex verb\nInferring the location of authors from words in their texts\nCLEVR: A Diagnostic Dataset for Compositional Language and Elementary  Visual Reasoning\nNVST data archiving system based on fastbit nosql database\nProjectQ: An Open Source Software Framework for Quantum Computing\nIntelligent information extraction based on artificial neural network\nPAMPO: using pattern matching and pos-tagging for effective Named  Entities recognition in Portuguese\nProceedings 29th and 30th Workshops on (Constraint) Logic Programming  and 24th International Workshop on Functional and (Constraint) Logic  Programming\nAbstracting Event-Driven Systems with Lifestate Rules\nProving Non-Deterministic Computations in Agda\nNeural Probabilistic Model for Non-projective MST Parsing\nSign Language Recognition Using Temporal Classification\nMulti-level Representations for Fine-Grained Typing of Knowledge Base  Entities\nCrowdsourcing Ground Truth for Medical Relation Extraction\nDecoding with Finite-State Transducers on GPUs\nTowards a Semantics-Aware Code Transformation Toolchain for  Heterogeneous Systems\nNeural Models for Sequence Chunking\nStatic Detection of DoS Vulnerabilities in Programs that use Regular  Expressions (Extended Version)\nAn attempt to physical science basis of climate changes in early  Seventeenth century and the 
influence the Little Ice Age in south Italy\nProceedings Fourth International Workshop on Linearity\nDecidability, Complexity, and Expressiveness of First-Order Logic Over  the Subword Ordering\nLearning Word-Like Units from Joint Audio-Visual Analysis\nMeasuring the Reliability of Hate Speech Annotations: The Case of the  European Refugee Crisis\nPredicting Auction Price of Vehicle License Plate with Deep Recurrent  Neural Network\nAll-but-the-Top: Simple and Effective Postprocessing for Word  Representations\nLiving a discrete life in a continuous world: Reference with distributed  representations\nThe Effect of Different Writing Tasks on Linguistic Style: A Case Study  of the ROC Story Cloze Task\nNeural Machine Translation with Source-Side Latent Graph Parsing\nIntersections and Unions of Session Types\nModeling Semantic Expectation: Using Script Knowledge for Referent  Prediction\nLocal Lexing\nUniversal Dependencies to Logical Forms with Negation Scope\nParallel Long Short-Term Memory for Multi-stream Classification\nVector Embedding of Wikipedia Concepts and Entities\nOffline bilingual word vectors, orthogonal transformations and the  inverted softmax\nExistential length universality\nA Dependency-Based Neural Reordering Model for Statistical Machine  Translation\nLuandri: a Clean Lua Interface to the Indri Search Engine\nFiltering Tweets for Social Unrest\nUnsupervised Learning of Morphological Forests\nDiscriminating Traces with Time\nCurie: Policy-based Secure Data Exchange\nPolitical Homophily in Independence Movements: Analysing and Classifying  Social Media Users by National Identity\nA Knowledge-Based Approach to Word Sense Disambiguation by  distributional selection and semantic features\nApproches d'analyse distributionnelle pour améliorer la  désambiguïsation sémantique\nScattertext: a Browser-Based Tool for Visualizing how Corpora Differ\nEnd-to-End Task-Completion Neural Dialogue Systems\nDeterministic Temporal Logics and Interval 
Constraints\nEnd-to-End Prediction of Buffer Overruns from Raw Source Code via Neural  Memory Networks\nInformation Extraction in Illicit Domains\nExtending Automatic Discourse Segmentation for Texts in Spanish to  Catalan\nMultichannel End-to-end Speech Recognition\nA Motif-based Approach for Identifying Controversy\nIdentifying Partisan Slant in News Articles and Twitter during Political  Crises\nPaper2vec: Citation-Context Based Document Distributed Representation  for Scholar Recommendation\nProcess algebra with strategic interleaving\nAn overview of embedding models of entities and relationships for  knowledge base completion\nData-Mining Textual Responses to Uncover Misconception Patterns\nColors in Context: A Pragmatic Neural Model for Grounded Language  Understanding\nLinguistic Matrix Theory\nSimplified End-to-End MMI Training and Voting for ASR\nNeutral evolution and turnover over centuries of English word popularity\nThe pragmatics of clone detection and elimination\nAn Outline of Separation Logic\nModerately Complex Paxos Made Simple: High-Level Specification of  Distributed Algorithm\n$α$Check: A mechanized metatheory model-checker\nDeriving Probability Density Functions from Probabilistic Functional  Programs\nLinear Ensembles of Word Embedding Models\nUsing Cognitive Computing for Learning Parallel Programming: An IBM  Watson Solution\nAdposition and Case Supersenses v2: Guidelines for English\nOn the Linearity of Semantic Change: Investigating Meaning Variation via  Dynamic Graph Models\nImproving Implicit Semantic Role Labeling by Predicting Semantic Frame  Arguments\nBayesian Recurrent Neural Networks\nROSA: R Optimizations with Static Analysis\nAutomatic semantic role labeling on non-revised syntactic trees of  journalistic texts\nAn entity-driven recursive neural network model for chinese discourse  coherence modeling\nTGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering\nMUSE: Modularizing Unsupervised Sense 
Embeddings\nLearning Character-level Compositionality with Visual Features\nMining Worse and Better Opinions. Unsupervised and Agnostic Aggregation  of Online Reviews\nPredicting Role Relevance with Minimal Domain Expertise in a Financial  Domain\nAn Interpretable Knowledge Transfer Model for Knowledge Base Completion\nLearning to Skim Text\nUsing Global Constraints and Reranking to Improve Cognates Detection\nRobust Incremental Neural Semantic Graph Parsing\nRecognizing Descriptive Wikipedia Categories for Historical Figures\nA Challenge Set Approach to Evaluating Machine Translation\nTaxonomy Induction using Hypernym Subsequences\nUndecidability of the first order theories of free non-commutative Lie  algebras\nFrom Language to Programs: Bridging Reinforcement Learning and Maximum  Marginal Likelihood\nLearning a Neural Semantic Parser from User Feedback\nMapping Instructions and Visual Observations to Actions with  Reinforcement Learning\nModulo quantifiers over functional vocabularies extending addition\nDependency Parsing with Dilated Iterated Graph CNNs\nThe Promise of Premise: Harnessing Question Premises in Visual Question  Answering\nA polynomial time algorithm for the Lambek calculus with brackets of  bounded order\nA Hybrid Architecture for Multi-Party Conversational Systems\nGoing Wider: Recurrent Neural Network With Parallel Cells\nMax-Pooling Loss Training of Long Short-Term Memory Networks for  Small-Footprint Keyword Spotting\nDrug-drug Interaction Extraction via Recurrent Neural Network with  Multiple Attention Layers\nThe Pragmatics of Indirect Commands in Collaborative Discourse\nSurvey of Visual Question Answering: Datasets and Techniques\nEvaluating vector-space models of analogy\nOperational Semantics of Process Monitors\nHandwritten Urdu Character Recognition using 1-Dimensional BLSTM  Classifier\nText-based Adventures of the Golovin AI Agent\nA Novel Neural Network Model for Joint POS Tagging and Graph-based  Dependency Parsing\nInformation 
Density as a Factor for Variation in the Embedding of  Relative Clauses\nUniversal Dependencies Parsing for Colloquial Singaporean English\nIntroducing Geometric Algebra to Geometric Computing Software  Developers: A Computational Thinking Approach\nVerifying Programs via Intermediate Interpretation\nSearch Engine Guided Non-Parametric Neural Machine Translation\nMixed Membership Word Embeddings for Computational Social Science\nRecurrent Additive Networks\nSmartPaste: Learning to Adapt Source Code\nUse of Knowledge Graph in Rescoring the N-Best List in Automatic Speech  Recognition\nSecond-Order Word Embeddings from Nearest Neighbor Topological Features\nParsing with CYK over Distributed Representations: \"Classical\" Syntactic  Parsing in the Novel Era of Neural Networks\nDeriving Neural Architectures from Sequence and Graph Kernels\nMax-Cosine Matching Based Neural Models for Recognizing Textual  Entailment\nJointly Learning Sentence Embeddings and Syntax with Unsupervised  Tree-LSTMs\nSemi-Supervised Model Training for Unbounded Conversational Speech  Recognition\nMultiplex model of mental lexicon reveals explosive learning in humans\nExtending programs with debug-related features, with application to  hardware development\nFrom Temporal Models to Property-Based Testing\nNeural Embeddings of Graphs in Hyperbolic Space\nOn the \"Calligraphy\" of Books\nThe Importance of Automatic Syntactic Features in Vietnamese Named  Entity Recognition\nMachine Assisted Analysis of Vowel Length Contrasts in Wolof\nContent-Based Table Retrieval for Web Queries\nContext encoders as a simple but powerful extension of word2vec\nJoint Workshop on Bibliometric-enhanced Information Retrieval and  Natural Language Processing for Digital Libraries (BIRNDL 2017)\nAdvances in Joint CTC-Attention based End-to-End Speech Recognition with  a Deep CNN Encoder and RNN-LM\nA Full Non-Monotonic Transition System for Unrestricted Non-Projective  Parsing\nAcoustic data-driven lexicon learning 
based on a greedy pronunciation  selection framework\nAn Exploration of Neural Sequence-to-Sequence Architectures for  Automatic Post-Editing\nDeal or No Deal? End-to-End Learning for Negotiation Dialogues\nTowards Neural Phrase-based Machine Translation\npyRecLab: A Software Library for Quick Prototyping of Recommender  Systems\nParVecMF: A Paragraph Vector-based Matrix Factorization Recommender  System\nModalities in homotopy type theory\nNamed Entity Recognition with stack residual LSTM and trainable bias  decoding\nAutomated text summarisation and evidence-based medicine: A survey of  two domains\nParikh Image of Pushdown Automata\nThe Bernays-Schönfinkel-Ramsey Fragment with Bounded Difference  Constraints over the Reals is Decidable\nConstrained Type Families\nStronger Baselines for Trustable Results in Neural Machine Translation\nChurch-Rosser Systems, Codes with Bounded Synchronization Delay and  Local Rees Extensions\nThe Fall of the Empire: The Americanization of English\nDevelopment and Verification of a Flight Stack for a High-Altitude  Glider in Ada/SPARK 2014\nShakespearizing Modern Language Using Copy-Enriched Sequence-to-Sequence  Models\nSentiment Identification in Code-Mixed Social Media Text\nAlign and Copy: UZH at SIGMORPHON 2017 Shared Task for Morphological  Reinflection\nCross-linguistic differences and similarities in image descriptions\nCooperative Kernels: GPU Multitasking for Blocking Algorithms (Extended  Version)\nWhat Works Better? 
A Study of Classifying Requirements\nA non-projective greedy dependency parser with bidirectional LSTMs\nLeipzig Corpus Miner - A Text Mining Infrastructure for Qualitative Data  Analysis\nModeling the dynamics of domain specific terminology in diachronic  corpora\nGeospatial Semantics\nNegative Sampling Improves Hypernymy Extraction Based on Projection  Learning\nDesign and Optimisation of the FlyFast Front-end for Attribute-based  Coordination\nLinguistic Markers of Influence in Informal Interactions\nLyrics-Based Music Genre Classification Using a Hierarchical Attention  Network\nTop-Rank Enhanced Listwise Optimization for Statistical Machine  Translation\nA Comparative Analysis of Social Network Pages by Interests of Their  Followers\nDeep Active Learning for Named Entity Recognition\nImproving Discourse Relation Projection to Build Discourse Annotated  Corpora\nDeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning\nUnsupervised, Knowledge-Free, and Interpretable Word Sense  Disambiguation\nUltraslow diffusion in language: Dynamics of appearance of already  popular adjectives on Japanese blogs\nAttention-Based End-to-End Speech Recognition on Voice Search\nSeminar Users in the Arabic Twitter Sphere\nDeveloping a Molecular Theory of Electromechanical Responses\nExtracting Core Claims from Scientific Articles\nExploring the Effectiveness of Convolutional Neural Networks for Answer  Selection in End-to-End Question Answering\nQuestion Dependent Recurrent Entity Network for Question Answering\nDetermining Semantic Textual Similarity using Natural Deduction Proofs\nDeep Residual Learning for Weakly-Supervised Relation Extraction\nASDA : Analyseur Syntaxique du Dialecte Alg{é}rien dans un but  d'analyse s{é}mantique\nMEMEN: Multi-layer Embedding with Memory Networks for Machine  Comprehension\nCounterfactual Learning from Bandit Feedback under Deterministic  Logging: A Case Study in Statistical Machine Translation\nA Weakly Supervised Approach to 
Train Temporal Relation Classifiers and  Acquire Regular Event Pairs Simultaneously\nZero-Shot Activity Recognition with Verb Attribute Induction\nMethod and apparatus for automatic text input insertion in digital  devices with a restricted number of keys\nLearned in Translation: Contextualized Word Vectors\nImproving Part-of-Speech Tagging for NLP Pipelines\nExploiting Linguistic Resources for Neural Machine Translation Using  Multi-task Learning\nAutomatic Question-Answering Using A Deep Similarity Neural Network\nA Syllable-based Technique for Word Embeddings of Korean Words\nD4M 3.0: Extended Database and Language Capabilities\nNeural Machine Translation Leveraging Phrase-based Models in a Hybrid  Search\nSimple and Effective Dimensionality Reduction for Word Embeddings\nWASSA-2017 Shared Task on Emotion Intensity\nSentiment Analysis by Joint Learning of Word Embeddings and Classifier\nComparison of Decoding Strategies for CTC Acoustic Models\nIdentifying Harm Events in Clinical Care through Medical Narratives\nDeconvolutional Paragraph Representation Learning\nCategory Theory for Genetics\nCultural Structures of Knowledge from Wikipedia Networks of First Links\nThe Natural Stories Corpus\nMeasuring the Effect of Discourse Relations on Blog Summarization\nDescriptional Complexity of Non-Unary Self-Verifying Symmetric  Difference Automata\nHandling Homographs in Neural Machine Translation\nSoftware engineering and the SP theory of intelligence\nDivide-and-Conquer Checkpointing for Arbitrary Programs with No User  Annotation\nTransforming Coroutining Logic Programs into Equivalent CHR Programs\nFrom Concurrent Programs to Simulating Sequential Programs: Correctness  of a Transformation\nKEGGexpressionMapper allows for analysis of pathways over multiple  conditions by integrating transcriptomics and proteomics measurements\nThe Unfolding Semantics of Functional Programs\nVerification of Programs via Intermediate Interpretation\nModelling Protagonist Goals and 
Desires in First-Person Narrative\nPersonaBank: A Corpus of Personal Narratives and Their Story Intention  Graphs\nArgument Strength is in the Eye of the Beholder: Audience Effects in  Persuasion\nType Safe Redis Queries: A Case Study of Type-Level Programming in  Haskell\nTANKER: Distributed Architecture for Named Entity Recognition and  Disambiguation\nFighting with the Sparsity of Synonymy Dictionaries\nInference of Fine-Grained Event Causality from Blogs and Films\nUnsupervised Induction of Contingent Event Pairs from Film Scenes\nGlyph-aware Embedding of Chinese Characters\nDisentangling ASR and MT Errors in Speech Translation\nFormalising Type-Logical Grammars in Agda\nCompositional Approaches for Representing Relations Between Words: A  Comparative Study\nUsing $k$-way Co-occurrences for Learning Word Embeddings\nQuantum machines with classical control\nTraining RNNs as Fast as CNNs\nCLaC at SemEval-2016 Task 11: Exploring linguistic and psycho-linguistic  Features for Complex Word Identification\nSemi-Supervised Instance Population of an Ontology using Word Vector  Embeddings\nA certified reference validation mechanism for the permission model of  Android\nNeural Network Based Nonlinear Weighted Finite Automata\nA Rewriting System for Convex Optimization Problems\nTrace and Stable Failures Semantics for CSP-Agda\nAnd That's A Fact: Distinguishing Factual and Emotional Argumentation in  Online Dialogue\nAcquiring Background Knowledge to Improve Moral Value Prediction\nTowards Building a Knowledge Base of Monetary Transactions from a News  Collection\nA Fast and Accurate Vietnamese Word Segmenter\nThink Globally, Embed Locally --- Locally Linear Meta-embedding of Words\nUpdating the silent speech challenge benchmark with deep learning\nDe-identification of medical records using conditional random fields and  long short-term memory networks\nPredicting interviewee attitude and body language from speech  descriptors\nEZLearn: Exploiting Organic Supervision in 
Large-Scale Data Annotation\nIntegration of Japanese Papers Into the DBLP Data Set\nA Preliminary Study for Building an Arabic Corpus of Pair  Questions-Texts from the Web: AQA-Webcorp\nA Permission-Dependent Type System for Secure Information Flow Analysis\nJointly Trained Sequential Labeling and Classification by Sparse  Attention Neural Networks\nStructured Embedding Models for Grouped Data\nSynonym Discovery with Etymology-based Word Embeddings\nBag-of-Vector Embeddings of Dependency Graphs for Semantic Induction\nThe Dependence of Frequency Distributions on Multiple Meanings of Words,  Codes and Signs\nAnnotation and Detection of Emotion in Text-based Dialogue Systems with  CNN\nBuilding a Web-Scale Dependency-Parsed Corpus from CommonCrawl\nModel-Theoretic Characterizations of Boolean and Arithmetic Circuit  Classes of Small Depth\nCzech Text Document Corpus v 2.0\nBilingual Words and Phrase Mappings for Marathi and Hindi SMT\nOn the Challenges of Sentiment Analysis for Dynamic Events\nRybu: Imperative-style Preprocessor for Verification of Distributed  Systems in the Dedan Environment\nSmarnet: Teaching Machines to Read and Comprehend Like Human\nThe IIT Bombay English-Hindi Parallel Corpus\nGeo-referencing Place from Everyday Natural Language Descriptions\nDecision support from financial disclosures with deep neural networks  and transfer learning\nNoReC: The Norwegian Review Corpus\nPaying Attention to Multi-Word Expressions in Neural Machine Translation\nApproximate Reduction of Finite Automata for High-Speed Network  Intrusion Detection (Technical Report)\nScaling Text with the Class Affinity Model\n$Q|SI\\rangle$: A Quantum Programming Environment\nOne-shot and few-shot learning of word embeddings\nUnderstanding Hidden Memories of Recurrent Neural Networks\nNamed Entity Recognition in Twitter using Images and Text\nMachine Translation of Low-Resource Spoken Dialects: Strategies for  Normalizing Swiss German\nUnsupervised Neural Machine 
Translation\nProving Soundness of Extensional Normal-Form Bisimilarities\nImproving Neural Machine Translation through Phrase-based Forced  Decoding\nSemantic Structure and Interpretability of Word Embeddings\nText Annotation Graphs: Annotating Complex Natural Language Phenomena\nExtracting an English-Persian Parallel Corpus from Comparable Corpora\nSPARK: Static Program Analysis Reasoning and Retrieving Knowledge\nNeural Speed Reading via Skim-RNN\nImproved training for online end-to-end speech recognition systems\nStructure Regularized Bidirectional Recurrent Convolutional Neural  Network for Relation Classification\nImproving Hypernymy Extraction with Distributional Semantic Classes\nOpen-World Knowledge Graph Completion\nLearning Multi-Modal Word Representation Grounded in Visual Context\nTowards the Use of Deep Reinforcement Learning with Global Policy For  Query-based Extractive Summarisation\nWord, Subword or Character? An Empirical Study of Granularity in  Chinese-English NMT\nFrom Word Segmentation to POS Tagging for Vietnamese\nOn Extending Neural Networks with Loss Ensembles for Text Classification\nA Deep Learning Approach for Expert Identification in Question Answering  Communities\nCan clone detection support quality assessments of requirements  specifications?\nInvestigating Inner Properties of Multimodal Representation and Semantic  Compositionality with Brain-based Componential Semantics\nProgramming the Interactions of Collective Adaptive Systems by Relying  on Attribute-based Communication\nIntelligent Word Embeddings of Free-Text Radiology Reports\nIncorporating Syntactic Uncertainty in Neural Machine Translation with  Forest-to-Sequence Model\nFacets, Tiers and Gems: Ontology Patterns for Hypernormalisation\nAutomated Analysis of Topic-Actor Networks on Twitter: New approach to  the analysis of socio-semantic networks\nTowards Accurate Deceptive Opinion Spam Detection based on Word  Order-preserving CNN\nHilbert's tenth problem for complex 
meromorphic functions in several  variables\nMachine Translation Using Semantic Web Technologies: A Survey\nAprendizagem significativa através da modelagem computacional de  sistemas físicos (Meaningful learning through computational modeling of  physics systems)\nUnsupervised Discovery of Structured Acoustic Tokens with Applications  to Spoken Term Detection\nComplex Structure Leads to Overfitting: A Structure Regularization  Decoding Method for Natural Language Processing\nSpeaker-Sensitive Dual Memory Networks for Multi-Turn Slot Tagging\nPredicting and Explaining Human Semantic Search in a Cognitive Model\nOn Asynchrony and Choreographies\nSyGuS-Comp 2017: Results and Analysis\nLanguage and Proofs for Higher-Order SMT (Work in Progress)\nDeep Semantic Role Labeling with Self-Attention\nImproving the Performance of Online Neural Transducer Models\nDistance-based Self-Attention Network for Natural Language Inference\nStrassenNets: Deep learning with a multiplication budget\nInteractive graph query language for multidimensional data in  Collaboration Spotting visual analytics framework\nDifferentiable lower bound for expected BLEU score\nSockeye: A Toolkit for Neural Machine Translation\nLow Resourced Machine Translation via Morpho-syntactic Modeling: The  Case of Dialectal Arabic\nThe NarrativeQA Reading Comprehension Challenge\nEthical Questions in NLP Research: The (Mis)-Use of Forensic Linguistics\nA Compositional Coalgebraic Semantics of Strategic Games\nAre words easier to learn from infant- than adult-directed speech? 
A  quantitative corpus-based investigation\nMore on the dynamics of the symbolic square root map\nSlugbot: An Application of a Novel and Scalable Open Domain Socialbot  Framework\nTowards Understanding and Answering Multi-Sentence Recommendation  Questions on Tourism\nExplorations in an English Poetry Corpus: A Neurocognitive Poetics  Perspective\nImproved English to Russian Translation by Neural Suffix Prediction\nEARL: Joint Entity and Relation Linking for Question Answering over  Knowledge Graphs\nDid William Shakespeare and Thomas Kyd Write Edward III?\nDetecting Offensive Language in Tweets Using Deep Learning\nNELS - Never-Ending Learner of Sounds\nNatural Language Multitasking: Analyzing and Improving Syntactic  Saliency of Hidden Representations\nInternal Universes in Models of Homotopy Type Theory\nLogic characterisation of p/q-recognisable sets\nA Multilayer Convolutional Encoder-Decoder Neural Network for  Grammatical Error Correction\nObject-based reasoning in VQA\nA Corpus for Modeling Word Importance in Spoken Dialogue Transcripts\nParaphrase-Supervised Models of Compositionality\nSubmodularity-inspired Data Selection for Goal-oriented Chatbot Training  based on Sentence Embeddings\nOrder matters: Distributional properties of speech to young children  bootstraps learning of semantic representations\nPreserved Structure Across Vector Space Representations\nMulti-attention Recurrent Network for Human Communication Comprehension\nParametric Presburger Arithmetic: Complexity of Counting and Quantifier  Elimination\nPraaline: Integrating Tools for Speech Corpus Research\nTextZoo, a New Benchmark for Reconsidering Text Classification\nUsing Trusted Data to Train Deep Networks on Labels Corrupted by Severe  Noise\nDeep contextualized word representations\nDeep Learning for Lip Reading using Audio-Visual Information for Urdu  Language\nCalculating the similarity between words and sentences using a lexical  database and corpus statistics\nCompositional 
Verification of Compiler Optimisations on Relaxed Memory\nCan Network Embedding of Distributional Thesaurus be Combined with Word  Vectors for Better Representation?\nEntropy Guided Spectrum Based Bug Localization Using Statistical  Language Model\nTAP-DLND 1.0 : A Corpus for Document Level Novelty Detection\nEvaluating Scoped Meaning Representations\nA Hybrid Word-Character Model for Abstractive Summarization\nMemory-based Parameter Adaptation\nComparing Downward Fragments of the Relational Calculus with Transitive  Closure on Trees\nOn Extended Long Short-term Memory and Dependent Bidirectional Recurrent  Neural Network\nFrom SysML/KAOS Domain Models to B System Specifications\nCode Review Comments: Language Matters\nTowards the Creation of a Large Corpus of Synthetically-Identified  Clinical Notes\nIcoRating: A Deep-Learning System for Scam ICO Identification\nDeep-FSMN for Large Vocabulary Continuous Speech Recognition\nMCScript: A Novel Dataset for Assessing Machine Comprehension Using  Script Knowledge\nStructure Regularized Neural Network for Entity Relation Classification  for Chinese Literature Text\nEnriching Frame Representations with Distributionally Induced Senses\nJoint Recognition of Handwritten Text and Named Entities with a Neural  End-to-end Model\nReady, Set, Verify! Applying hs-to-coq to real-world Haskell code\nWhy not be Versatile? 
Applications of the SGNMT Decoder for Machine  Translation\nStance Detection on Tweets: An SVM-based Approach\nA Resourceful Reframing of Behavior Trees\nPay More Attention - Neural Architectures for Question-Answering\nEnglish verb regularization in books and tweets\nFast Parametric Learning with Activation Memorization\nThe Worm Calculus\nThe fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset,  task and baselines\nTowards Unsupervised Automatic Speech Recognition Trained by Unaligned  Speech and Text only\nDeep Cascade Multi-task Learning for Slot Filling in Chinese E-commerce  Shopping Guide Assistant\nFine-Grained Attention Mechanism for Neural Machine Translation\nSynthesis of Differentiable Functional Programs for Lifelong Learning\nEntrenamiento de una red neuronal para el reconocimiento de imagenes de  lengua de senas capturadas con sensores de profundidad\nThe Training of Neuromodels for Machine Comprehension of Text.  Brain2Text Algorithm\nSpecification-Driven Multi-Perspective Predictive Business Process  Monitoring (Extended Version)\nIn-depth Question classification using Convolutional Neural Networks\nClinical Concept Embeddings Learned from Massive Sources of Medical Data\nA Large-Scale Study of Language Models for Chord Prediction\nEnrichment of OntoSenseNet: Adding a sense-annotated Telugu lexicon\nAutomatic symbolic computation for discontinuous Galerkin finite element  methods\nLeveraging Intra-User and Inter-User Representation Learning for  Automated Hate Speech Detection\nEfficient Graph-based Word Sense Induction by Distributional Inclusion  Vector Embeddings\nMining Social Media for Newsgathering\nDeep Learning for Digital Text Analytics: Sentiment Analysis\nEmergent Communication through Negotiation\nThe Arabic discourse about support for ISIS on Twitter and what we can  learn from that\nWord2Vec applied to Recommendation: Hyperparameters Matter\nDebugging Program Verification Proof Scripts (Tool Paper)\nLearning 
Multilingual Embeddings for Cross-Lingual Information Retrieval  in the Presence of Topically Aligned Corpora\nStream Runtime Monitoring on UAS\nPer-Corpus Configuration of Topic Modelling for GitHub and Stack  Overflow Collections\nIncorporating Dictionaries into Deep Neural Networks for the Chinese  Clinical Named Entity Recognition\nFormalizing common sense for scalable inconsistency-robust information  integration using Direct Logic(TM) reasoning and the Actor Model\nComplexity of Networks (reprise)\nImitation learning of motor primitives and language bootstrapping in  robots\nLa réduction de termes complexes dans les langues de spécialité\nPictures of Processes: Automated Graph Rewriting for Monoidal Categories  and Applications to Quantum Computing\nRegion-based memory management for Mercury programs\nA probabilistic framework for analysing the compositionality of  conceptual combinations\nCompetitive dynamics of lexical innovations in multi-layer networks\nA model of grassroots changes in linguistic systems\nFault Detection in C Programs using Monitoring of Range Values:  Preliminary Results\nHigher-order symbolic execution for contract verification and refutation\nSheaf-Theoretic Methods in Quantum Mechanics and Quantum Information  Theory\nAn Action Language for Multi-Agent Domains: Foundations\nIn narrative texts punctuation marks obey the same statistics as words\nA Principled Approach to Bridging the Gap between Graph Data and their  Schemas\nMaximum Entropy, Word-Frequency, Chinese Characters, and Multiple  Meanings\nÉtude cognitive des processus de construction d'une requête dans un  système de gestion de connaissances médicales\nThe exp-log normal form of types\nAutomated Synthesis of Distributed Controllers\nHybrid VCSPs with crisp and conservative valued templates\nData Cleaning for XML Electronic Dictionaries via Statistical Anomaly  Detection\nInteractive Tools and Tasks for the Hebrew Bible\nPenambahan emosi menggunakan metode manipulasi prosodi 
untuk sistem text  to speech bahasa Indonesia\nKnowledge-Based Biomedical Word Sense Disambiguation with Neural Concept  Embeddings\nApplication Embedding: A Language Approach to Declarative Web  Programming\nN-gram Language Modeling using Recurrent Neural Network Estimation\nDoes Python Smell Like Java? Tool Support for Design Defect Discovery in  Python\nAutomatic Response Assessment in Regions of Language Cortex in Epilepsy  Patients Using ECoG-based Functional Mapping and Machine Learning\nActive learning in annotating micro-blogs dealing with e-reputation\nInformation Theory and the Length Distribution of all Discrete Systems\nDeadlock-Free Typestate-Oriented Programming\nComputing the Cohomological Brauer Group of a Toric Variety\nPfaffian Subschemes\nEquivariant intersection theory\nAnalysis of isoplanatic high resolution stellar fields by Starfinder  code\nCMBEASY:: an Object Oriented Code for the Cosmic Microwave Background\nRestart Strategies and Internet Congestion\nNoncommutative Martin-Lof randomness : on the concept of a random  sequence of qubits\nLandau level bosonization of a 2D electron gas\nLorentz Group in Condensed Matter Physics\nTopics in Quantum Computers\nThe 2D J_1-J_2 XY and XY-Ising Models\nUniversality Classes for Extreme Value Statistics\nThe roughening transition of interfaces in disordered media\nLattice Models for Magnetic Fluids: Correlations Between Order  Parameters\nGeometrical description of vortices in Ginzburg-Landau billiards\nFermionic field theory for directed percolation in (1+1) dimensions\nTime reparametrization group and the long time behaviour in quantum  glassy systems\nSpinor Bosonic Atoms in Optical Lattices: Symmetry Breaking and  Fractionalization\nThe noncommutative replica approach\nEntropy estimation of symbol sequences\nA form factor approach to finite temperature correlation functions in  $c=1$ CFT\nSpin Waves in Random Spin Chains\nA New face on old code\nComputing at Hasylab: Perl/PerlTk is the new 
scripting language for  Spectra\nUser office proposal handling and analysis software\nCombinatorics of Hard Particles on Planar Graphs\nQuenched Disorder From Sea-Bosons\nPeriodic diffraction patterns for 1D quasicrystals\nNuclear Magnetic Relaxation in the Haldane-Gap Antiferromagnet  Ni(C_2_H_8_N_2_)_2_NO_2_(ClO_4_)\nQuantum Hydrodynamics of Fermi Fluids\nTowards a Hydrodynamic Theory of Infinite Neutral Nonrelativistic Matter\nDictionary between scattering matrix and Keldysh formalisms for quantum  transport driven by time-periodic fields\nAbsorbing states and elastic interfaces in random media: two equivalent  descriptions of self-organized criticality\nDesigning a Theorem Prover\nWell-Founded Semantics for Extended Logic Programs with Dynamic  Preferences\nLinear Segmentation and Segment Significance\nA Freely Available Morphological Analyzer, Disambiguator and Context  Sensitive Lemmatizer for German\nOn the Evaluation and Comparison of Taggers: The Effect of Noise in  Testing Corpora\nA Natural Deduction style proof system for propositional $μ$-calculus  and its formalization in inductive type theories\nOn Dart-Zobel Algorithm for Testing Regular Type Inclusion\nA Proof Theoretic View of Constraint Programming\nA Polymorphic Groundness Analysis of Logic Programs\nChoosing the Word Most Typical in Context Using a Lexical Co-occurrence  Network\nAn Emptiness Algorithm for Regular Types with Set Operators\nAutomatic Hardware Synthesis for a Hybrid Reconfigurable CPU Featuring  Philips CPLDs\nOptimal Multi-Paragraph Text Segmentation by Dynamic Programming\nPSPACE has 2-round quantum interactive proof systems\nMixing Metaphors\nA Computational Memory and Processing Model for Processing\nInside-Outside Estimation of a Lexicalized PCFG for German\nCascaded Grammatical Relation Assignment\nMapping Multilingual Hierarchies Using Relaxation Labeling\nRobust Grammatical Analysis for Spoken Dialogue Systems\nEvents in Property Patterns\nRepresenting Text 
Chunks\nExplanation-based Learning for Machine Translation\nSelective Magic HPSG Parsing\nCorpus Annotation for Parser Evaluation\nProspects for in-depth story understanding by computer\nTSIA: A Dataflow Model\nHypothetical revision and matter-of-fact supposition\nProblem solving in ID-logic with aggregates: some experiments\nDeclarative Representation of Revision Strategies\nTnT - A Statistical Part-of-Speech Tagger\nMessage Classification in the Call Center\nA Finite State and Data-Oriented Method for Grapheme to Phoneme  Conversion\nTask Frames\nA Simple Approach to Building Ensembles of Naive Bayesian Classifiers  for Word Sense Disambiguation\nFinite-State Reduplication in One-Level Prosodic Morphology\nSemantic Parsing based on Verbal Subcategorization\nUsing compression to identify acronyms in text\nParameter-free Model of Rank Polysemantic Distribution\nMapping WordNets Using Structural Information\nApplying System Combination to Base Noun Phrase Identification\nEfficient probabilistic top-down and left-corner parsing\nCompact non-left-recursive grammars using the selective left-corner  transform and factoring\nA Learning Approach to Shallow Parsing\nEstimators for Stochastic ``Unification-Based'' Grammars\nLexicalized Stochastic Modeling of Constraint-Based Grammars using  Log-Linear Measures and EM Training\nUsing a Probabilistic Class-Based Lexicon for Lexical Ambiguity  Resolution\nAutomatic Extraction of Subcategorization Frames for Czech\nParsing with the Shortest Derivation\nAn improved parser for data-oriented lexical-functional analysis\nAn Approach to the Implementation of Overlapping Rules in Standard ML\nOn a cepstrum-based speech detector robust to white noise\nTree-gram Parsing: Lexical Dependencies and Structural Relations\nApache web server execution tracing using Third Eye\nProperties of Input-Consuming Derivations\nQuantitative Neural Network Model of the Tip-of-the-Tongue Phenomenon  Based on Synthesized 
Memory-Psycholinguistic-Metacognitive Approach\nA Decision Tree of Bigrams is an Accurate Predictor of Word Sense\nMan [and Woman] vs. Machine: A Case Study in Base Noun Phrase Learning\nRule Writing or Annotation: Cost-efficient Resource Usage for Base Noun  Phrase Chunking\nA Complete WordNet1.5 to WordNet1.6 Mapping\nSolving Composed First-Order Constraints from Discrete-Time Robust  Control\nConstraint Propagation in Presence of Arrays\nIntegrating Prosodic and Lexical Cues for Automatic Topic Segmentation\nThe Set of Equations to Evaluate Objects\nObjects and their computational framework\nLower Bounds for Zero-knowledge on the Internet\nCoupled Clustering: a Method for Detecting Structural Correspondence\nConceptual Analysis of Lexical Taxonomies: The Case of WordNet Top-Level\nEnriching WordNet concepts with topic signatures\nTesting for Mathematical Lineation in Jim Crace's \"Quarantine\" and T. S.  Eliot's \"Four Quartets\"\nVariable and Value Ordering When Solving Balanced Academic Curriculum  Problems\nWhat is the minimal set of fragments that achieves maximal parse  accuracy?\nTeaching Parallel Programming Using Both High-Level and Low-Level  Languages\nUser-friendly explanations for constraint programming\nTowards a characterization of the star-free sets of integers\nThe Deductive Database System LDL++\nThree Optimisations for Sharing\nA Framework for Datatype Transformation\nComputational Phonology\nFast Hands-free Writing by Gaze Direction\nMemory-Based Shallow Parsing\nThree-Tiered Specification of Micro-Architectures\nMachine Learning with Lexical Features: The Duluth Approach to  Senseval-2\nThumbs up? 
Sentiment Classification using Machine Learning Techniques\nCharacterization of Strongly Equivalent Logic Programs in Intermediate  Logics\nInterleaved semantic interpretation in environment-based parsing\nEvaluation of Coreference Rules on Complex Narrative Texts\nThree New Methods for Evaluating Reference Resolution\nReference Resolution Beyond Coreference: a Conceptual Frame and its  Application\nProving correctness of Timed Concurrent Constraint Programs\nProbabilistic Reversible Automata and Quantum Automata\nEfficient Solving of Quantified Inequality Constraints over the Real  Numbers\nAlgorithms using Java for Spreadsheet Dependent Cell Recomputation\nRecursive function templates as a solution of linear algebra expressions  in C++\nA Development Calculus for Specifications\nAn XML based Document Suite\nTechniques for effective vocabulary selection\nThe FRED Event Display: an Extensible HepRep Client for GLAST\nDesign, implementation and deployment of the Saclay muon reconstruction  algorithms (Muonbox/y) in the Athena software framework of the ATLAS  experiment\nInformation Compression by Multiple Alignment, Unification and Search as  a Unifying Principle in Computing and Cognition\nProposed Specification of a Distributed XML-Query Network\nA Dynamic Programming Algorithm for the Segmentation of Greek Texts\nFine-Grained Authorization for Job Execution in the Grid: Design and  Implementation\nDesigning of a Community-based Translation Center\nSoft lambda-calculus: a language for polynomial time computation\nInferring Termination Conditions for Logic Programs using Backwards  Analysis\nAn Open Ended Tree\nDiagnostic reasoning with A-Prolog\nAcquiring Lexical Paraphrases from a Single Corpus\nDesign of a Community-based Translation Center\nUnifying Computing and Cognition: The SP Theory and its Applications\nTransformation Rules for Locally Stratified Constraint Logic Programs\nA Comparative Study of Arithmetic Constraints on Integer Intervals\nA Flexible Rule 
Compiler for Speech Synthesis\nMulti-Threading And Message Communication In Qu-Prolog\nOn the Expressive Power of First-Order Boolean Functions in PCF\nA Proof Theoretic Approach to Failure in Functional Logic Programming\nSummarizing Encyclopedic Term Descriptions on the Web\nExploiting Semidefinite Relaxations in Constraint Programming\nA Hyper-Arc Consistency Algorithm for the Soft Alldifferent Constraint\nOn Global Warming (Softening Global Constraints)\nIncremental Construction of Minimal Acyclic Sequential Transducers from  Unsorted Data\nVerbal chunk extraction in French using limited resources\nFORM Matters: Fast Symbolic Computation under UNIX\nA Sentimental Education: Sentiment Analysis Using Subjectivity  Summarization Based on Minimum Cuts\nInside-Outside Estimation Meets Dynamic EM\nFormal Languages and Algorithms for Similarity based Retrieval from  Sequence Databases\nCorpus based Enrichment of GermaNet Verb Frames\nContext Related Derivation of Word Senses\nTransforming and Enriching Documents for the Semantic Web\nPushdown dimension\nA Scalable Stream-Oriented Framework for Cluster Applications\nMinimal Eulerian trail in a labeled digraph\nText Compression and Superfast Searching\nPolynomial Synthesis of Asynchronous Automata\nAutomatic extraction of paraphrastic phrases from medium size corpora\nPractical Datatype Specializations with Phantom Types and Recursion  Schemes\nLogic Column 14: Nominal Logic and Abstract Syntax\nAn elitist approach for extracting automatically well-realized speech  sounds with high confidence\nBuilding Scenarios for Environmental Management and Planning: An  IT-Based Approach\nUnification of multi-lingual scientific terminological resources using  the ISO 16642 standard. The TermSciences initiative\nApplied MVC Patterns. 
A pattern language\nSemantics and Complexity of SPARQL\nLogics for Unranked Trees: An Overview\nAn Analysis of Arithmetic Constraints on Integer Intervals\nTowards \"Propagation = Logic + Control\"\nModules over Monads and Linearity\nFOSS-Based Grid Computing\nThe role of time in considering collections\nRapport technique du projet OGRE\nNorm Based Causal Reasoning in Textual Corpus\nReuse of Specification Patterns with the B Method\nQuantifier elimination for the reals with a predicate for the powers of  two\nInteractive Problem Solving in Prolog\nDeveloping efficient parsers in Prolog: the CLF manual (v1.0)\nDemaq: A Foundation for Declarative XML Message Processing\nOn vocabulary size of grammar-based codes\nA Note on Local Ultrametricity in Text\nMenzerath-Altmann Law for Syntactic Structures in Ukrainian\nA Virtual Logo Keyboard for People with Motor Disabilities\nThe Suspension Calculus and its Relationship to Other Explicit  Treatments of Substitution in Lambda Calculi\nEfficient First-Order Temporal Logic for Infinite-State Systems\nLogic Programming with Satisfiability\nRelational Abstract Domains for the Detection of Floating-Point Run-Time  Errors\nUmbral Calculus and Cancellative Semigroup Algebras\nAnalysis in $R^{1,1}$ or the Principal Function Theory\nThe LCDROOT Analysis Package\nDIS Structure Functions in Lattice QCD\nQuadratically optimized polynomials for fermion simulations\nThree-State Complex Valued Spins Coupled to Binary Branched Polymers in  Two-Dimensional Quantum Gravity\nHigher-twist contributions to the Structure Functions coming from  4-fermion operators\nThe Berry Phase and Monopoles in Gluodynamics\nGlueball and gluelump spectrum in abelian projected QCD\nApplied lattice gauge calculations: diquark content of the nucleon\nExperiences with the multi-level algorithm\nGeometry of percolating monopole clusters\nRepresentations of classical groups on the lattice and its application  to the field theory on discrete space-time\nVector-Boson 
versus Gluon Fusion at Hadron Colliders\nRemarks on the Quark-diagram Description of Two-body Nonleptonic B-meson  Decays\nClassical Kinetics of Hard Thermal Phenomena in High Temperature QCD\nRenormalons and 1/Q^2 Corrections\nNew Physics from HERA?\nThe Confinement\nCollisional Energy Loss of Fast Charged Particles in Relativistic  Plasmas\nQuark level linear σmodel (LσM) via loop graphs\nTwin Peaks\nNonfactorizable contributions to the decay mode D^0 -> K^0 \\bar{K^0}\nA closed analytical formula for two-loop massive tadpoles with arbitrary  tensor numerators\nMonte-Carlo Simulation of Exclusive Channels in e+e- Annihilation at Low  Energy\nLeptonic Unitarity Triangles in Matter\nChiral symmetry and pentaquarks\nPHOTOS as a pocket parton shower: flexibility tests for the algorithm\nMeson Mixing in Pion Superfluid\nSome classical properties of the non-abelian Yang-Mills theories\nGluon Condensate, Wilson Loops and Gauge/String Duality\nWBase: a C package to reduce tensor products of Lie algebra  representations\nQuantum Groups on Fibre Bundles\nChiral Rings Do Not Suffice: N=(2,2) Theories with Nonzero Fundamental  Group\nHigher algebras and mesonic spectrum in two-dimensional QCD\nFree Variables and the Two Matrix Model\nSp(2)-Symmetric Lagrangian BRST Quantization\nThe Unreasonable Effectiveness of Quantum Field Theory\nA D=4 N=1 Orbifold of Type I Strings\nOn the strongly coupled heterotic string\nRemarks on T-duality for open strings\nOn the Equivalence of Affine sl(2) and N=2 Superconformal Representation  Theories\nThe Ring Division Self Duality\nMore on Mixed Boundary Conditions and D-branes Bound States\nLectures on D-branes, Gauge Theory and M(atrices)\nOn N=8 Supergravity on $AdS_5$ and N=4 Superconformal Yang-Mills theory\nLarge N limit of orbifold field theories\nMomentum Lattice for CHL String\nThe Principle of Equivalence as a Guide towards Matrix Theory  Compactifications\nBag Model for a Link in a Closed Gluonic Chain\nProjective resolutions of 
coherent sheaves and descent relations between  branes\nIrreducible Decomposition of Products of 10D Chiral Sigma Matrices\nChern-Simons terms and the Three Notions of Charge\nExplicit derivation of a Central extended Hyper-Kahler Metric\nHolographic Renormalisation and Anomalies\nFuzzy Non-Trivial Gauge Configurations\nSupergravity Solution of Intersecting Branes and AdS/CFT with Flavor\nLectures on D-branes, tachyon condensation, and string field theory\nKahler Potentials on Toric Varieties\nRenormalizability of N=1/2 Wess-Zumino model in superspace\nA quantum BRST anti-BRST approach to classical integrable systems\nUltraviolet finiteness of Chiral Perturbation Theory for two-dimensional  Quantum Electrodynamics\nQuantum Corrections to the Universal Hypermultiplet and Superspace\nThe Circular, Elliptic Three Spin String from the SU(3) Spin Chain\nHolography for fermions\nMinimal Superstrings and Loop Gas Models\nSupersymmetric AdS(4) compactifications of IIA supergravity\nMixed-symmetry massless gauge fields in AdS(5)\nADHM is Tachyon Condensation\nM-Theory Brane as Giant Graviton and the Fractional Quantum Hall Effect\nQuantization of Flag Manifolds and their Supersymmetric Extensions\nD4-branes on Complete Intersection in Toric Variety\nSome applications of the ultrapower theorem to the theory of compacta\nFlattening and subanalytic sets in rigid analytic geometry\nThe Complexity of Fuzzy Logic\nOn the Cappell-Lee-Miller glueing theorem\nResonance Relations for Solutions of the Elliptic QKZB Equations, Fusion  Rules, and Eigenvectors of Transfer Matrices of Restricted  Interaction-round-a-face Models\nUsing Rewriting Systems to Compute Kan Extensions and Induced Actions of  Categories\nA Revision Theoretic Model for NF\nCohomology of Lie Superalgebras of Hamiltonian Vector Fields: Computer  Analysis\nThe Maximality of Cartesian Categories\nThe Projective Theory of Ruled Surfaces\nBox-ball systems and Robinson-Schensted-Knuth correspondence\nFeynman Diagrams via 
Graphical Calculus\nRational homology of spaces of complex monic polynomials with multiple  roots\nNoether's variational theorem II and the BV formalism\nPresburger sets and p-minimal fields\nHypothesis of the Functional Semantic Constructions and Mathematics in  the Functional Semantic Aspect\nMotion planning and control problems for underactuated robots\nQuantum normal families: normal families of holomorphic functions and  mappings on a Banach space\nQuantization of non-unitary geometric classical r-matrices\nGerbes, Clifford modules and the index theorem\nOn braid monodromy factorizations\nPresentations of Noneffective Orbifolds\nAutomorphisms and strongly invariant relations\n"
  },
  {
    "path": "data/arxiv_dataset.py",
    "content": "import os, random, json, pickle, re\nimport numpy as np\nimport torch.utils.data\n\n\nclass ArxivDataset(torch.utils.data.Dataset):\n    \"\"\"\n    A dataset for Arxiv\n    \"\"\"\n\n    def __init__(self, texts, preprocess=lambda x: x, sort=False):\n        super().__init__()\n        self.texts = texts\n        self.preprocess = preprocess\n        self.sort=sort\n\n        # with sort=True, preprocess all samples up front and order them by the length of the first field, longest first\n        if self.sort:\n            self.data = []\n            for i in range(len(self.texts)):\n                type, title, story = self.texts[i]\n\n                title = type + ' <sep> ' + title.strip()\n                story = story.strip()\n                text_raw_dict = {'title': title, 'story': story}\n\n                text = self.preprocess(text_raw_dict)\n                self.data.append(text)\n            self.data.sort(key=lambda x: len(x[0]), reverse=True)\n\n    def __len__(self):\n        return len(self.texts)\n\n    def __getitem__(self, i):\n        if self.sort:\n            return self.data[i]\n        else:\n            type, title, story = self.texts[i]\n\n            title = type + ' <sep> ' + title.strip()\n            story = story.strip()\n            text_raw_dict = {'title': title, 'story': story}\n\n            text = self.preprocess(text_raw_dict)\n            return text\n"
  },
  {
    "path": "data/plot_dataset.py",
    "content": "import os, random, json, pickle, re\nimport numpy as np\nimport torch.utils.data\n\n\nclass PlotDataset(torch.utils.data.Dataset):\n    \"\"\"\n    A dataset for WikiPlots\n    \"\"\"\n\n    def __init__(self, texts, preprocess=lambda x: x, sort=False):\n        super().__init__()\n        self.texts = texts\n        self.preprocess = preprocess\n        self.sort=sort\n\n        # with sort=True, preprocess all samples up front and order them by the length of the first field, longest first\n        if self.sort:\n            self.data = []\n            for i in range(len(self.texts)):\n                title, story = self.texts[i]\n\n                title = title.strip()\n                story = story.strip()\n                text_raw_dict = {'title': title, 'story': story}\n\n                text = self.preprocess(text_raw_dict)\n                self.data.append(text)\n            self.data.sort(key=lambda x: len(x[0]), reverse=True)\n\n    def __len__(self):\n        return len(self.texts)\n\n    def __getitem__(self, i):\n        if self.sort:\n            return self.data[i]\n        else:\n            title, story = self.texts[i]\n\n            title = title.strip()\n            story = story.strip()\n            text_raw_dict = {'title': title, 'story': story}\n\n            text = self.preprocess(text_raw_dict)\n            return text\n"
  },
  {
    "path": "data/prompt_dataset.py",
    "content": "import os, random, json, pickle, re\nimport numpy as np\nimport torch.utils.data\n\n\nclass PromptDataset(torch.utils.data.Dataset):\n    \"\"\"\n    A dataset for Writing Prompts\n    \"\"\"\n\n    def __init__(self, source, target, preprocess=lambda x: x, sort=False):\n        super().__init__()\n        self.preprocess = preprocess\n        self.sort=sort\n\n        print('Loading writing prompts...')\n        with open(source, errors='ignore') as fs:\n            with open(target, errors='ignore') as ft:\n                self.prompts = list(zip(fs.readlines(), ft.readlines()))\n        print('Done.')\n\n        # with sort=True, preprocess all samples up front (same steps as __getitem__) and order them by the length of the first field, longest first\n        if self.sort:\n            self.data = []\n            for i in range(len(self.prompts)):\n                prompt, story = self.prompts[i]\n\n                # Remove extra annotation on prompt from WP dataset\n                #prompt = re.sub('\\[ (.*) \\]', '', prompt)\n                prompt = prompt.strip()\n                story = story.strip()\n                text_raw_dict = {'prompt': prompt, 'story': story}\n\n                text = self.preprocess(text_raw_dict)\n                self.data.append(text)\n            self.data.sort(key=lambda x: len(x[0]), reverse=True)\n\n    def __len__(self):\n        return len(self.prompts)\n\n    def __getitem__(self, i):\n        if self.sort:\n            return self.data[i]\n        else:\n            prompt, story = self.prompts[i]\n\n            # Remove extra annotation on prompt from WP dataset\n            #prompt = re.sub('\\[ (.*) \\]', '', prompt)\n            prompt = prompt.strip()\n            story = story.strip()\n            text_raw_dict = {'prompt': prompt, 'story': story}\n\n            text = self.preprocess(text_raw_dict)\n            return text\n"
  },
  {
    "path": "data/util.py",
    "content": "import random, re, os\nfrom data.prompt_dataset import *\nfrom data.plot_dataset import *\nfrom data.arxiv_dataset import *\nfrom data.yelp_dataset import *\nimport torch\nimport torch.utils.data as data\nfrom torch.utils.data.distributed import DistributedSampler\nfrom unidecode import unidecode\nimport functools\nfrom rake_nltk import Rake\nimport urllib, sys\nimport urllib.request\nimport json, re\nimport numpy as np\nfrom scipy.spatial.distance import cdist\nfrom bert_serving.client import BertClient\nfrom tqdm import trange\nfrom random import shuffle\n\n\ndef compose(*functions):\n    \"\"\" Executes a list of functions in order \"\"\"\n    return functools.reduce(lambda f, g: lambda x: g(f(x)), functions, lambda x: x)\n\n\ndef prefix_truncate(window):\n    \"\"\" truncates text to the prefix window size \"\"\"\n\n    def f(text):\n        if len(text) > window:\n            text = text[:window]\n        return text\n\n    return f\n\n\nclass Preprocessor_base():\n    def __init__(self):\n        self.fn = None\n\n    def make_fn(self):\n        raise NotImplementedError()\n\n    def __call__(self, x):\n        try:\n            if self.fn is None:\n                self.fn = self.make_fn()\n            x = self.fn(x)\n            return x\n        except Exception as e:\n            print('Error in preprocessing', repr(e))\n            raise e\n\n\ndef encode_tuple(tokenizer, t):\n    return tokenizer.encode(t[0]), tokenizer.encode(t[1]), tokenizer.encode(t[2])\n\n\ndef truncate_tuple(truncator, t):\n    return truncator(t[0]), truncator(t[1]), truncator(t[2])\n\n\nclass Preprocessor(Preprocessor_base):\n    def __init__(self, tokenizer, seq_len, data_type):\n        super().__init__()\n        self.tokenizer = tokenizer\n        self.seq_len = seq_len\n        self.data_type = data_type\n\n    def make_fn(self):\n        return compose(\n            insert_keywords(self.tokenizer, self.data_type),\n            lambda input: 
encode_tuple(self.tokenizer, input) if isinstance(input, tuple) else [encode_tuple(self.tokenizer, inp) for inp in input],\n            lambda input: truncate_tuple(prefix_truncate(self.seq_len), input) if isinstance(input, tuple) else [truncate_tuple(prefix_truncate(self.seq_len), inp) for inp in input]\n        )\n\n\n################# for WP dataset start\ndef wp_preprocess(text):\n    # Standardize some symbols\n    text = text.replace('<newline>', '\\n')\n    text = text.replace('``', '\"')\n    text = text.replace(\"''\", '\"')\n    # Detokenize\n    text = re.sub(' +', ' ', text)  # collapse runs of spaces into one\n    text = re.sub(' (\\'|\\.|\\,|\\:|\\?|\\!|;)', '\\g<1>', text)  # remove the space before punctuation or an apostrophe\n    text = re.sub('\" ([^\"]*) \"', '\"\\g<1>\"', text)  # remove the spaces just inside double quotes: \" a \" -> \"a\"\n    text = text.replace(\" n't\", \"n't\")\n    return text\n\n\ndef detect_dialog(t):\n    # a paragraph counts as dialog if it opens with an ASCII or typographic quote mark\n    return t.startswith(('\"', \"'\", '`', '“', '‘', '’', '”'))\n\n\ndef get_paragraph(story):\n    # split as paragraphs\n    # re.split(\"( <newline>){2,}\", story) also returns the captured ' <newline>' delimiter, which is filtered out here\n    p = [x.strip() for x in re.split(\"( <newline>){2,}\", story) if x != ' <newline>']\n\n    # merge dialog and short (< 114 chars) paragraphs into the preceding paragraph\n    pp = [p[0]]\n    for ii in range(1, len(p)):\n        if detect_dialog(p[ii]) or len(p[ii]) < 114:\n            pp[-1] = pp[-1] + ' <newline> ' + p[ii]\n        else:\n            pp.append(p[ii])\n    pp = [wp_preprocess(pt) for pt in pp]\n\n    return pp\n################# for WP dataset end\n\n\ndef extract_keywords(text, r):\n    r.extract_keywords_from_text(text)\n    # 2 keys below 342 chars, then one more key per 228 chars (about 2 sentences of ~114 chars), capped at 5\n    num = min(5, max(2, int(len(text) / 228.0 + 
1.5)))\n    key = [re.sub(' (\\'|\\.|\\,|\\:|\\?|\\!|;)', '\\g<1>', k.strip('\\'.,:?!;\" ')) for k in r.get_ranked_phrases()[:num]]\n    return key\n\n\n# def insert_keywords(tokenizer, data_type):\n#     def f(text_raw_dict):\n#         # 'prompt' in text_raw_dict --> wp dataset; 'title' in text_raw_dict --> wi dataset and other well preprocessed dataset\n#         summary = text_raw_dict['prompt'] if 'prompt' in text_raw_dict else text_raw_dict['title']\n#         story = text_raw_dict['story']\n#\n#         if data_type == 't0':  # x, y, y\n#             if 'prompt' in text_raw_dict:\n#                 pp = get_paragraph(story)\n#                 story = '\\n\\n'.join(pp)\n#             else:\n#                 pp = story.split('<newline><newline>')\n#                 story = '\\n\\n'.join(pp)\n#\n#             return summary, story + tokenizer.eos_token, tokenizer.eos_token + story + tokenizer.eos_token\n#         elif data_type == 't1':  # x, x + y, y\n#             if 'prompt' in text_raw_dict:\n#                 pp = get_paragraph(story)\n#                 story = '\\n\\n'.join(pp)\n#             else:\n#                 pp = story.split('<newline><newline>')\n#                 story = '\\n\\n'.join(pp)\n#\n#             return summary, summary + tokenizer.eos_token + story + tokenizer.eos_token, tokenizer.eos_token + story + tokenizer.eos_token\n#         elif data_type == 't2':  # x, x + o + y, y, append\n#             if 'title' in text_raw_dict:\n#                 pp = story.split('<newline><newline>')\n#             else:\n#                 pp = get_paragraph(story)\n#\n#             story = '\\n\\n'.join(pp)\n#\n#             # extract keywords\n#             r = Rake(min_length=1, max_length=4)\n#             keys = [extract_keywords(text, r) for text in pp]\n#             keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n#             story_appended = summary + ''.join(keys_str) + 
tokenizer.eos_token + '\\n\\n'.join(pp)\n#             return summary, story_appended + tokenizer.eos_token, tokenizer.eos_token + story + tokenizer.eos_token\n#         elif data_type == 't3':  # x, x + o + y, y, insert\n#             if 'title' in text_raw_dict:\n#                 pp = story.split('<newline><newline>')\n#             else:\n#                 pp = get_paragraph(story)\n#\n#             story = '\\n\\n'.join(pp)\n#\n#             # extract keywords\n#             r = Rake(min_length=1, max_length=4)\n#             keys = [extract_keywords(text, r) for text in pp]\n#             keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n#             keys_str[0] += tokenizer.eos_token\n#             story_inserted = summary + ''.join([k + pt for k, pt in zip(keys_str, pp)])\n#             return summary, story_inserted + tokenizer.eos_token, tokenizer.eos_token + story + tokenizer.eos_token\n#         elif data_type == 't4':  # x + o, y, y\n#             if 'title' in text_raw_dict:\n#                 pp = story.split('<newline><newline>')\n#             else:\n#                 pp = get_paragraph(story)\n#\n#             story = '\\n\\n'.join(pp)\n#\n#             # extract keywords\n#             r = Rake(min_length=1, max_length=4)\n#             keys = [extract_keywords(text, r) for text in pp]\n#             keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n#             return summary + ''.join(keys_str), story + tokenizer.eos_token, tokenizer.eos_token + story + tokenizer.eos_token\n#         elif data_type == 't5':  # x + o, x + o + y, y, append\n#             if 'title' in text_raw_dict:\n#                 pp = story.split('<newline><newline>')\n#             else:\n#                 pp = get_paragraph(story)\n#\n#             story = '\\n\\n'.join(pp)\n#\n#             # extract keywords\n#             r = Rake(min_length=1, 
max_length=4)\n#             keys = [extract_keywords(text, r) for text in pp]\n#             keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n#             story_appended = summary + ''.join(keys_str) + tokenizer.eos_token + '\\n\\n'.join(pp)\n#             return summary + ''.join(keys_str), story_appended + tokenizer.eos_token, tokenizer.eos_token + story + tokenizer.eos_token\n#         elif data_type == 't6':  # x + o, x + o + y, y, insert\n#             if 'title' in text_raw_dict:\n#                 pp = story.split('<newline><newline>')\n#             else:\n#                 pp = get_paragraph(story)\n#\n#             story = '\\n\\n'.join(pp)\n#\n#             # extract keywords\n#             r = Rake(min_length=1, max_length=4)\n#             keys = [extract_keywords(text, r) for text in pp]\n#             keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n#             keys_str[0] += tokenizer.eos_token\n#             story_inserted = summary + ''.join([k + pt for k, pt in zip(keys_str, pp)])\n#             return summary + ''.join(keys_str), story_inserted + tokenizer.eos_token, tokenizer.eos_token + story + tokenizer.eos_token\n#         elif data_type == 't7':  # x + o, x + o + y, y, append, extend\n#             if 'title' in text_raw_dict:\n#                 pp = story.split('<newline><newline>')\n#             else:\n#                 pp = get_paragraph(story)\n#\n#             story = '\\n\\n'.join(pp)\n#\n#             # extract keywords\n#             r = Rake(min_length=1, max_length=4)\n#             keys = [extract_keywords(text, r) for text in pp]\n#             keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n#             keys_str[0] += tokenizer.eos_token\n#\n#             extended_res = []\n#             for i in range(len(pp)):\n#                 k_i, p_i = keys_str[:i], 
pp[:i]\n#                 out_i = summary + ''.join(k_i)\n#                 story_appended_i = summary + ''.join(k_i) + tokenizer.eos_token + '\\n\\n'.join(p_i) + tokenizer.eos_token\n#                 story_i = tokenizer.eos_token + '\\n\\n'.join(p_i) + tokenizer.eos_token\n#                 extended_res.append((out_i, story_appended_i, story_i))\n#             return extended_res\n#         elif data_type == 't8':  # x + o, x + o + y, y, insert, extend\n#             if 'title' in text_raw_dict:\n#                 pp = story.split('<newline><newline>')\n#             else:\n#                 pp = get_paragraph(story)\n#\n#             story = '\\n\\n'.join(pp)\n#\n#             # extract keywords\n#             r = Rake(min_length=1, max_length=4)\n#             keys = [extract_keywords(text, r) for text in pp]\n#             keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n#             keys_str[0] += tokenizer.eos_token\n#\n#             extended_res = []\n#             for i in range(len(pp)):\n#                 k_i, p_i = keys_str[:i], pp[:i]\n#                 out_i = summary + ''.join(k_i)\n#                 story_inserted_i = summary + ''.join([k + pt for k, pt in zip(k_i, p_i)]) + tokenizer.eos_token\n#                 story_i = tokenizer.eos_token + '\\n\\n'.join(p_i) + tokenizer.eos_token\n#                 extended_res.append((out_i, story_inserted_i, story_i))\n#             return extended_res\n#         else:\n#             raise Exception('Data type not implemented.')\n#\n#     return f\n\n\ndef insert_keywords(tokenizer, data_type):\n    def f(text_raw_dict):\n        # 'prompt' in text_raw_dict --> wp dataset; 'title' in text_raw_dict --> wi dataset and other well preprocessed dataset\n        summary = text_raw_dict['prompt'] if 'prompt' in text_raw_dict else text_raw_dict['title']\n        story = text_raw_dict['story']\n\n        if data_type == 't0':  # x, y, y\n            if 'prompt' 
in text_raw_dict:\n                pp = get_paragraph(story)\n                story = '\\n\\n'.join(pp)\n            else:\n                pp = story.split('<newline><newline>')\n                story = '\\n\\n'.join(pp)\n\n            return summary + tokenizer.eos_token, story + tokenizer.eos_token, tokenizer.eos_token + story + tokenizer.eos_token\n        elif data_type == 't1':  # x, x + y, x + y\n            if 'prompt' in text_raw_dict:\n                pp = get_paragraph(story)\n                story = '\\n\\n'.join(pp)\n            else:\n                pp = story.split('<newline><newline>')\n                story = '\\n\\n'.join(pp)\n\n            summary_story = summary + tokenizer.eos_token + story + tokenizer.eos_token\n            return summary + tokenizer.eos_token, summary_story, tokenizer.eos_token + summary_story\n        elif data_type == 't2':  # x, x + o + y, x + o + y, append\n            if 'title' in text_raw_dict:\n                pp = story.split('<newline><newline>')\n            else:\n                pp = get_paragraph(story)\n\n            story = '\\n\\n'.join(pp)\n\n            # extract keywords\n            r = Rake(min_length=1, max_length=4)\n            keys = [extract_keywords(text, r) for text in pp]\n            keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n            story_appended = summary + ''.join(keys_str) + tokenizer.eos_token + '\\n\\n'.join(pp)\n            return summary + tokenizer.eos_token, story_appended + tokenizer.eos_token, tokenizer.eos_token + story_appended + tokenizer.eos_token\n        elif data_type == 't3':  # x, x + o + y, x + o + y, insert\n            if 'title' in text_raw_dict:\n                pp = story.split('<newline><newline>')\n            else:\n                pp = get_paragraph(story)\n\n            story = '\\n\\n'.join(pp)\n\n            # extract keywords\n            r = Rake(min_length=1, max_length=4)\n            keys = 
[extract_keywords(text, r) for text in pp]\n            keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n            keys_str[0] += tokenizer.eos_token\n            story_inserted = summary + ''.join([k + pt for k, pt in zip(keys_str, pp)])\n            return summary + tokenizer.eos_token, story_inserted + tokenizer.eos_token, tokenizer.eos_token + story_inserted + tokenizer.eos_token\n        elif data_type == 't4':  # x + o, y, x + o + y\n            if 'title' in text_raw_dict:\n                pp = story.split('<newline><newline>')\n            else:\n                pp = get_paragraph(story)\n\n            story = '\\n\\n'.join(pp)\n\n            # extract keywords\n            r = Rake(min_length=1, max_length=4)\n            keys = [extract_keywords(text, r) for text in pp]\n            keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n            summary_story = tokenizer.eos_token + summary + ''.join(keys_str) + tokenizer.eos_token + story + tokenizer.eos_token\n            return summary + ''.join(keys_str) + tokenizer.eos_token, story + tokenizer.eos_token, summary_story\n        elif data_type == 't5':  # x + o, x + o + y, x + o + y, append\n            if 'title' in text_raw_dict:\n                pp = story.split('<newline><newline>')\n            else:\n                pp = get_paragraph(story)\n\n            story = '\\n\\n'.join(pp)\n\n            # extract keywords\n            r = Rake(min_length=1, max_length=4)\n            keys = [extract_keywords(text, r) for text in pp]\n            keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n            story_appended = summary + ''.join(keys_str) + tokenizer.eos_token + '\\n\\n'.join(pp)\n            return summary + ''.join(keys_str) + tokenizer.eos_token, story_appended + tokenizer.eos_token, tokenizer.eos_token + story_appended + 
tokenizer.eos_token\n        elif data_type == 't6':  # x + o, x + o + y, x + o + y, insert\n            if 'title' in text_raw_dict:\n                pp = story.split('<newline><newline>')\n            else:\n                pp = get_paragraph(story)\n\n            story = '\\n\\n'.join(pp)\n\n            # extract keywords\n            r = Rake(min_length=1, max_length=4)\n            keys = [extract_keywords(text, r) for text in pp]\n            keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n            keys_str[0] += tokenizer.eos_token\n            story_inserted = summary + ''.join([k + pt for k, pt in zip(keys_str, pp)])\n            return summary + ''.join(keys_str) + tokenizer.eos_token, story_inserted + tokenizer.eos_token, tokenizer.eos_token + story_inserted + tokenizer.eos_token\n        elif data_type == 't7':  # x + o, x + o + y, x + o + y, append, extend\n            if 'title' in text_raw_dict:\n                pp = story.split('<newline><newline>')\n            else:\n                pp = get_paragraph(story)\n\n            story = '\\n\\n'.join(pp)\n\n            # extract keywords\n            r = Rake(min_length=1, max_length=4)\n            keys = [extract_keywords(text, r) for text in pp]\n            keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n\n            extended_res = []\n            for i in range(len(pp)):\n                k_i, p_i = keys_str[:i], pp[:i]\n                out_i = summary + ''.join(k_i) + tokenizer.eos_token\n                story_appended_i = summary + ''.join(k_i) + tokenizer.eos_token + '\\n\\n'.join(p_i) + tokenizer.eos_token\n                story_i = tokenizer.eos_token + summary + ''.join(k_i) + tokenizer.eos_token + '\\n\\n'.join(p_i) + tokenizer.eos_token\n                extended_res.append((out_i, story_appended_i, story_i))\n            return extended_res\n        elif data_type == 't8':  # 
x + o, x + o + y, x + o + y, insert, extend\n            if 'title' in text_raw_dict:\n                pp = story.split('<newline><newline>')\n            else:\n                pp = get_paragraph(story)\n\n            story = '\\n\\n'.join(pp)\n\n            # extract keywords\n            r = Rake(min_length=1, max_length=4)\n            keys = [extract_keywords(text, r) for text in pp]\n            keys_str = [tokenizer.cls_token + tokenizer.sep_token.join(key) + tokenizer.mask_token for key in keys]\n            keys_str[0] += tokenizer.eos_token\n\n            extended_res = []\n            for i in range(len(pp)):\n                k_i, p_i = keys_str[:i], pp[:i]\n                out_i = summary + ''.join(k_i).replace(tokenizer.eos_token, '') + tokenizer.eos_token\n                story_inserted_i = summary + ''.join([k + pt for k, pt in zip(k_i, p_i)]) + tokenizer.eos_token\n                story_i = tokenizer.eos_token + summary + ''.join([k + pt for k, pt in zip(k_i, p_i)]) + tokenizer.eos_token\n                extended_res.append((out_i, story_inserted_i, story_i))\n            return extended_res\n        else:\n            raise Exception('Data type not implemented.')\n\n    return f\n\n\ndef collate_fn(samples):\n    \"\"\" Creates a batch out of samples \"\"\"\n    x_max_len = max(map(lambda s: len(s[0]), samples))\n    # padding mask: 1 for real tokens, 0 for padding\n    x_mask = torch.ByteTensor([[1] * len(ss[0]) + [0] * (x_max_len - len(ss[0])) for ss in samples])\n    # pad with 50256 (<|endoftext|>); padding with the added <|startoftext|> id 50257 causes errors\n    x = torch.LongTensor([ss[0] + [50256] * (x_max_len - len(ss[0])) for ss in samples])\n\n    max_len = max(map(lambda s: len(s[1]), samples))\n    # padding mask: 1 for real tokens, 0 for padding\n    y_mask = torch.ByteTensor([[1] * len(ss[1]) + [0] * (max_len - len(ss[1])) for ss in samples])\n    # pad with 50256 (<|endoftext|>), as for x\n    y = torch.LongTensor([ss[1] + [50256] * (max_len - len(ss[1])) for ss in 
samples])\n\n    max_len = max(map(lambda s: len(s[2]), samples))\n    # Zero pad mask\n    input_mask = torch.ByteTensor([[1] * len(ip[2]) + [0] * (max_len - len(ip[2])) for ip in samples])\n    # tokenizer.convert_tokens_to_ids('<|startoftext|>') = 50257\n    input = torch.LongTensor([ip[2] + [50256] * (max_len - len(ip[2])) for ip in samples])\n\n    return x_mask, x, y_mask, y, input[:, :-1], input[:, 1:].contiguous(), input_mask[:, 1:]\n\n\ndef prepare_dataset(data_dir, dataset_name, tokenizer, train_bsz, train_seq_len, val_bsz, val_seq_len, test_bsz=1,\n                    test_seq_len=1024, data_type='t0', num_workers=1, make_train=True, make_val=True, make_test=False):\n    # data_dir, dataset_name, tokenizer, train_bsz, train_seq_len, val_bsz, val_seq_len, num_workers = args.data_dir, args.dataset, tokenizer, batch_schedule[cur_b_schedule][0], batch_schedule[cur_b_schedule][1], batch_schedule[-1][0], batch_schedule[-1][1], args.workers\n\n    loaders = []\n    if dataset_name == 'wp':\n        train_collate_fn = collate_fn\n        val_collate_fn = collate_fn\n        test_collate_fn = collate_fn\n\n        if make_train:\n            train_preproc = Preprocessor(tokenizer, train_seq_len, data_type)\n            d_train = PromptDataset(\n                os.path.join(data_dir, 'writingPrompts/train.wp_source'),\n                os.path.join(data_dir, 'writingPrompts/train.wp_target'),\n                train_preproc)\n            if data_type == 't7' or data_type == 't8':\n                d_train = [t for lt in d_train for t in lt]\n            print('Train dataset size', len(d_train))\n            loaders.append(data.DataLoader(d_train,\n                                           # sampler=DistributedSampler(d_train) if distributed else None,\n                                           batch_size=train_bsz,\n                                           pin_memory=True,\n                                           drop_last=True,\n                               
            num_workers=num_workers,\n                                           collate_fn=train_collate_fn) if d_train else None)\n        if make_val:\n            val_preproc = Preprocessor(tokenizer, val_seq_len, data_type)\n            d_val = PromptDataset(\n                os.path.join(data_dir, 'writingPrompts/valid.wp_source'),\n                os.path.join(data_dir, 'writingPrompts/valid.wp_target'),\n                val_preproc)\n            if data_type == 't7' or data_type == 't8':\n                d_val = [t for lt in d_val for t in lt]\n            print('Val dataset size', len(d_val))\n            loaders.append(data.DataLoader(d_val,\n                                           # sampler=DistributedSampler(d_val),\n                                           batch_size=val_bsz,\n                                           pin_memory=True,\n                                           drop_last=True,\n                                           num_workers=num_workers,\n                                           collate_fn=val_collate_fn) if d_val else None)\n        if make_test:\n            test_preproc = Preprocessor(tokenizer, test_seq_len, data_type)\n            d_test = PromptDataset(\n                os.path.join(data_dir, 'writingPrompts/test.wp_source'),\n                os.path.join(data_dir, 'writingPrompts/test.wp_target'),\n                test_preproc)\n            if data_type == 't7' or data_type == 't8':\n                d_test = [t for lt in d_test for t in lt]\n            print('Test dataset size', len(d_test))\n            loaders.append(data.DataLoader(d_test,\n                                           # sampler=DistributedSampler(d_val),\n                                           batch_size=test_bsz,\n                                           pin_memory=True,\n                                           drop_last=True,\n                                           num_workers=num_workers,\n                                         
  collate_fn=test_collate_fn) if d_test else None)\n    elif dataset_name == 'wi':\n        train_collate_fn = collate_fn\n        val_collate_fn = collate_fn\n        test_collate_fn = collate_fn\n\n        print('Loading wikiplot dataset...')\n        data_plots = os.path.join(data_dir, 'wikiPlots/plots_paragraph')\n        data_titles = os.path.join(data_dir, 'wikiPlots/titles')\n        with open(data_plots, errors='ignore') as fp:\n            plots = fp.readlines()\n        with open(data_titles, errors='ignore') as ft:\n            titles = ft.readlines()\n\n        texts = [(t, p) for t, p in zip(titles, plots) if t.strip() != '' and p.strip() != '']\n        print('Done.')\n        train_text = texts[:int(len(texts) * 0.9)]\n        val_text = texts[int(len(texts) * 0.9):int(len(texts) * 0.95)]\n        test_text = texts[int(len(texts) * 0.95):]\n\n        if make_train:\n            train_preproc = Preprocessor(tokenizer, train_seq_len, data_type)\n            d_train = PlotDataset(train_text, train_preproc)\n            if data_type == 't7' or data_type == 't8':\n                d_train = [t for lt in d_train for t in lt]\n            print('Train dataset size', len(d_train))\n            loaders.append(data.DataLoader(d_train,\n                                           # sampler=DistributedSampler(d_train) if distributed else None,\n                                           batch_size=train_bsz,\n                                           pin_memory=True,\n                                           drop_last=True,\n                                           num_workers=num_workers,\n                                           collate_fn=train_collate_fn) if d_train else None)\n        if make_val:\n            val_preproc = Preprocessor(tokenizer, val_seq_len, data_type)\n            d_val = PlotDataset(val_text, val_preproc)\n            if data_type == 't7' or data_type == 't8':\n                d_val = [t for lt in d_val for t in lt]\n            
print('Val dataset size', len(d_val))\n            loaders.append(data.DataLoader(d_val,\n                                           # sampler=DistributedSampler(d_val),\n                                           batch_size=val_bsz,\n                                           pin_memory=True,\n                                           drop_last=True,\n                                           num_workers=num_workers,\n                                           collate_fn=val_collate_fn) if d_val else None)\n        if make_test:\n            test_preproc = Preprocessor(tokenizer, test_seq_len, data_type)\n            d_test = PlotDataset(test_text, test_preproc)\n            if data_type == 't7' or data_type == 't8':\n                d_test = [t for lt in d_test for t in lt]\n            print('Test dataset size', len(d_test))\n            loaders.append(data.DataLoader(d_test,\n                                           # sampler=DistributedSampler(d_val),\n                                           batch_size=test_bsz,\n                                           pin_memory=True,\n                                           drop_last=True,\n                                           num_workers=num_workers,\n                                           collate_fn=test_collate_fn) if d_test else None)\n    elif dataset_name == 'ax':\n        train_collate_fn = collate_fn\n        val_collate_fn = collate_fn\n        test_collate_fn = collate_fn\n\n        print('Loading arxiv dataset...')\n        data_abs = os.path.join(data_dir, 'arxiv/artificial intelligence_10047_15000_15_abs.txt')\n        data_titles = os.path.join(data_dir, 'arxiv/artificial intelligence_10047_15000_15_title.txt')\n        with open(data_abs, errors='ignore') as fp:\n            abs = fp.readlines()\n        with open(data_titles, errors='ignore') as ft:\n            titles = ft.readlines()\n        assert len(titles) == len(abs)\n        ai_data = [('ai', t.strip(), p.strip()) for t, p in 
zip(titles, abs) if t.strip() != '' and p.strip() != '']\n\n        data_abs = os.path.join(data_dir, 'arxiv/computer vision_14582_15000_15_abs.txt')\n        data_titles = os.path.join(data_dir, 'arxiv/computer vision_14582_15000_15_title.txt')\n        with open(data_abs, errors='ignore') as fp:\n            abs = fp.readlines()\n        with open(data_titles, errors='ignore') as ft:\n            titles = ft.readlines()\n        assert len(titles) == len(abs)\n        cv_data = [('cv', t.strip(), p.strip()) for t, p in zip(titles, abs) if t.strip() != '' and p.strip() != '']\n\n        data_abs = os.path.join(data_dir, 'arxiv/language generation_14514_15000_15_abs.txt')\n        data_titles = os.path.join(data_dir, 'arxiv/language generation_14514_15000_15_title.txt')\n        with open(data_abs, errors='ignore') as fp:\n            abs = fp.readlines()\n        with open(data_titles, errors='ignore') as ft:\n            titles = ft.readlines()\n        assert len(titles) == len(abs)\n        lg_data = [('lg', t.strip(), p.strip()) for t, p in zip(titles, abs) if t.strip() != '' and p.strip() != '']\n\n        texts = ai_data + cv_data + lg_data\n        shuffle(texts)\n        print('Done.')\n        train_text = texts[:int(len(texts) * 0.9)]\n        val_text = texts[int(len(texts) * 0.9):int(len(texts) * 0.95)]\n        test_text = texts[int(len(texts) * 0.95):]\n\n        if make_train:\n            train_preproc = Preprocessor(tokenizer, train_seq_len, data_type)\n            d_train = ArxivDataset(train_text, train_preproc)\n            print('Train dataset size', len(d_train))\n            loaders.append(data.DataLoader(d_train,\n                                           # sampler=DistributedSampler(d_train) if distributed else None,\n                                           batch_size=train_bsz,\n                                           pin_memory=True,\n                                           drop_last=True,\n                                      
     num_workers=num_workers,\n                                           collate_fn=train_collate_fn) if d_train else None)\n        if make_val:\n            val_preproc = Preprocessor(tokenizer, val_seq_len, data_type)\n            d_val = ArxivDataset(val_text, val_preproc)\n            print('Val dataset size', len(d_val))\n            loaders.append(data.DataLoader(d_val,\n                                           # sampler=DistributedSampler(d_val),\n                                           batch_size=val_bsz,\n                                           pin_memory=True,\n                                           drop_last=True,\n                                           num_workers=num_workers,\n                                           collate_fn=val_collate_fn) if d_val else None)\n        if make_test:\n            test_preproc = Preprocessor(tokenizer, test_seq_len, data_type)\n            d_test = ArxivDataset(test_text, test_preproc)\n            print('Test dataset size', len(d_test))\n            loaders.append(data.DataLoader(d_test,\n                                           # sampler=DistributedSampler(d_val),\n                                           batch_size=test_bsz,\n                                           pin_memory=True,\n                                           drop_last=True,\n                                           num_workers=num_workers,\n                                           collate_fn=test_collate_fn) if d_test else None)\n    elif dataset_name == 'yp':\n        train_collate_fn = collate_fn\n        val_collate_fn = collate_fn\n        test_collate_fn = collate_fn\n\n        if make_train:\n            train_preproc = Preprocessor(tokenizer, train_seq_len, data_type)\n            d_train = YelpDataset(os.path.join(data_dir, 'yelp/yelp.train.txt'), train_preproc)\n            print('Train dataset size', len(d_train))\n            loaders.append(data.DataLoader(d_train,\n                                          
 # sampler=DistributedSampler(d_train) if distributed else None,\n                                           batch_size=train_bsz,\n                                           pin_memory=True,\n                                           drop_last=True,\n                                           num_workers=num_workers,\n                                           collate_fn=train_collate_fn) if d_train else None)\n        if make_val:\n            val_preproc = Preprocessor(tokenizer, val_seq_len, data_type)\n            d_val = YelpDataset(os.path.join(data_dir, 'yelp/yelp.valid.txt'), val_preproc)\n            print('Val dataset size', len(d_val))\n            loaders.append(data.DataLoader(d_val,\n                                           # sampler=DistributedSampler(d_val),\n                                           batch_size=val_bsz,\n                                           pin_memory=True,\n                                           drop_last=True,\n                                           num_workers=num_workers,\n                                           collate_fn=val_collate_fn) if d_val else None)\n        if make_test:\n            test_preproc = Preprocessor(tokenizer, test_seq_len, data_type)\n            d_test = YelpDataset(os.path.join(data_dir, 'yelp/yelp.test.txt'), test_preproc)\n            print('Test dataset size', len(d_test))\n            loaders.append(data.DataLoader(d_test,\n                                           # sampler=DistributedSampler(d_val),\n                                           batch_size=test_bsz,\n                                           pin_memory=True,\n                                           drop_last=True,\n                                           num_workers=num_workers,\n                                           collate_fn=test_collate_fn) if d_test else None)\n    else:\n        raise Exception('Invalid dataset')\n\n    return loaders\n"
  },
  {
    "path": "data/yelp_dataset.py",
    "content": "import os, random, json, pickle, re\nimport numpy as np\nimport torch.utils.data\n\n\nclass YelpDataset(torch.utils.data.Dataset):\n    \"\"\"\n    A dataset for Yelp\n    \"\"\"\n\n    def __init__(self, source, preprocess=lambda x: x, sort=False):\n        super().__init__()\n        self.preprocess = preprocess\n        self.sort=sort\n\n        print('Loading Yelp...')\n        with open(source, errors='ignore') as fs:\n            self.source = fs.readlines()\n        print('Done.')\n\n    def __len__(self):\n        return len(self.source)\n\n    def __getitem__(self, i):\n        raw = self.source[i]\n        title, story = raw[:1], raw[2:].strip()\n        text_raw_dict = {'title': title, 'story': story}\n\n        text = self.preprocess(text_raw_dict)\n        return text\n"
  },
  {
    "path": "dist_utils.py",
    "content": "# Copyright (c) 2017-present, Facebook, Inc.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the LICENSE file in\n# the root directory of this source tree. An additional grant of patent rights\n# can be found in the PATENTS file in the same directory.\n\n\"\"\"\nA modified version of the legacy DistributedDataParallel module that uses c10d\ncommunication primitives. This is necessary for models that have conditional\ncomputation (e.g., AdaptiveSoftmax) and which therefore do not work with the\nc10d version of DDP.\n\nThis version also supports the *accumulate_grads* feature, which allows faster\ntraining with `--update-freq`.\n\"\"\"\n\nimport copy\n\nimport torch\nfrom torch import nn\nfrom torch.autograd import Variable\nimport torch.distributed as dist\n\n\nclass SimpleDistributedDataParallel(nn.Module):\n    \"\"\"Implements distributed data parallelism at the module level.\n\n    A simplified version of :class:`torch.nn.parallel.DistributedDataParallel`.\n    This version uses a c10d process group for communication and does not\n    broadcast buffers.\n\n    Args:\n        module (~torch.nn.Module): module to be parallelized\n        world_size (int): number of parallel workers\n        process_group (optional): the c10d process group to be used for\n            distributed data all-reduction. 
If None, the default process group\n            will be used.\n        buffer_size (int, optional): number of elements to buffer before\n            performing all-reduce (default: 256M).\n    \"\"\"\n\n    def __init__(self, module, world_size, process_group=None, buffer_size=2 ** 28):\n        super().__init__()\n\n        self.module = module\n        self.world_size = world_size\n        self.process_group = dist.group.WORLD if process_group is None else process_group\n\n        # Never use a bigger buffer than the number of model params\n        self.buffer_size = min(buffer_size, sum(p.numel() for p in module.parameters()))\n        self.buffer = None\n\n        # Flag used by the NCCL backend to make sure we only reduce gradients\n        # one time in the execution engine\n        self.need_reduction = False\n\n        # We can also forcibly accumulate grads locally and only do the\n        # all-reduce at some later time\n        self.accumulate_grads = False\n\n        # For NCCL backend, since every single NCCL call is asynchronous, we\n        # therefore directly enqueue all the NCCL reduction calls to the\n        # default CUDA stream without spawning up other reduction threads.\n        # This achieves the best performance.\n        self._register_grad_hook()\n\n    def __getstate__(self):\n        attrs = copy.copy(self.__dict__)\n        return attrs\n\n    def __setstate__(self, state):\n        super().__setstate__(state)\n        self._register_grad_hook()\n\n    def forward(self, *inputs, **kwargs):\n        return self.module(*inputs, **kwargs)\n\n    def _register_grad_hook(self):\n        \"\"\"\n        This function registers the callback all-reduction function for the\n        NCCL backend. All gradients will be all reduced in one single step.\n        The NCCL reduction will directly be enqueued into the default CUDA\n        stream. 
Therefore, no synchronization is needed.\n        \"\"\"\n\n        def all_reduce(params):\n            buffer = self.buffer\n            nonzero_buffer = False\n            if len(params) > 1:\n                offset = 0\n                for p in params:\n                    sz = p.numel()\n                    if p.grad is not None:\n                        buffer[offset:offset + sz].copy_(p.grad.data.view(-1))\n                        nonzero_buffer = True\n                    else:\n                        buffer[offset:offset + sz].zero_()\n                    offset += sz\n            else:\n                # we only have a single grad to all-reduce\n                p = params[0]\n                if p.grad is not None:\n                    buffer = p.grad.data\n                    nonzero_buffer = True\n                elif p.numel() <= self.buffer.numel():\n                    buffer = buffer[:p.numel()]\n                    buffer.zero_()\n                else:\n                    buffer = torch.zeros_like(p)\n\n            if nonzero_buffer:\n                buffer.div_(self.world_size)\n\n            dist.all_reduce(buffer, group=self.process_group)\n\n            # copy all-reduced grads back into their original place\n            offset = 0\n            for p in params:\n                sz = p.numel()\n                if p.grad is not None:\n                    p.grad.data.copy_(buffer[offset:offset + sz].view_as(p))\n                else:\n                    p.grad = buffer[offset:offset + sz].view_as(p).clone()\n                offset += sz\n\n        def reduction_fn():\n            # This function only needs to be called once\n            if not self.need_reduction or self.accumulate_grads:\n                return\n            self.need_reduction = False\n\n            if self.buffer is None:\n                self.buffer = next(self.module.parameters()).new(self.buffer_size)\n\n            # All-reduce the gradients in buckets\n            offset 
= 0\n            buffered_params = []\n            for param in self.module.parameters():\n                if not param.requires_grad:\n                    continue\n                if param.grad is None:\n                    param.grad = torch.zeros_like(param)\n                if param.grad.requires_grad:\n                    raise RuntimeError(\"DistributedDataParallel only works \"\n                                       \"with gradients that don't require \"\n                                       \"grad\")\n                sz = param.numel()\n                if sz > self.buffer.numel():\n                    # all-reduce big params directly\n                    all_reduce([param])\n                else:\n                    if offset + sz > self.buffer.numel():\n                        all_reduce(buffered_params)\n                        offset = 0\n                        buffered_params.clear()\n                    buffered_params.append(param)\n                    offset += sz\n\n            if len(buffered_params) > 0:\n                all_reduce(buffered_params)\n\n        # Now register the reduction hook on the parameters\n        for p in self.module.parameters():\n\n            def allreduce_hook(*unused):\n                self.need_reduction = True\n                Variable._execution_engine.queue_callback(reduction_fn)\n\n            if p.requires_grad:\n                p.register_hook(allreduce_hook)\n"
  },
  {
    "path": "eval_ppl.py",
    "content": "import pickle\nimport os\nimport math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\nimport argparse\nfrom transformers import GPT2Tokenizer, GPT2LMHeadModel, GPT2Config\nfrom tqdm import tqdm\nfrom tqdm import trange\nimport importlib\nimport logging\nimport copy\nfrom data.util import *\nfrom util import *\n\nfrom model import *\n\n\ndef compute_loss(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask, loss_fn, beta):\n    input_tokens = input_tokens.to(device)\n    target_tokens = target_tokens.to(device)\n    mask = mask.to(device)\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n    y_mask = y_mask.to(device)\n    y_tokens = y_tokens.to(device)\n\n    outputs = model(input_ids=input_tokens, attention_mask=mask, x_mask=x_mask, x_tokens=x_tokens, y_mask=y_mask,\n                    y_tokens=y_tokens, from_prior=True)\n    logits = outputs[0]\n    kl_loss = outputs[-1]\n    num_logits = logits.size(-1)\n\n    # Perform masking\n    if mask is not None:\n        mask = mask.type(torch.bool)\n        mask = mask.to(device)\n        logits = logits.masked_select(mask.unsqueeze(-1))\n        target_tokens = target_tokens.masked_select(mask)\n\n    ce_loss = loss_fn(logits.view(-1, num_logits), target_tokens.view(-1)).mean()\n    kl_loss = kl_loss.mean()\n    loss = ce_loss + beta * kl_loss\n\n    return loss, ce_loss, kl_loss\n\n\ndef compute_loss_ae(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask, loss_fn, beta):\n    input_tokens = input_tokens.to(device)\n    target_tokens = target_tokens.to(device)\n    mask = mask.to(device)\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n\n    outputs = model(input_ids=input_tokens, attention_mask=mask, y_mask=x_mask, y_tokens=x_tokens, from_mean=True, from_prior=False)\n\n    logits = outputs[0]\n    kl_loss = outputs[-1]\n    num_logits = logits.size(-1)\n\n 
   # Perform masking\n    if mask is not None:\n        mask = mask.type(torch.bool)\n        mask = mask.to(device)\n        logits = logits.masked_select(mask.unsqueeze(-1))\n        target_tokens = target_tokens.masked_select(mask)\n\n    ce_loss = loss_fn(logits.view(-1, num_logits), target_tokens.view(-1)).mean()\n    kl_loss = kl_loss.mean()\n    loss = ce_loss\n\n    return loss, ce_loss, kl_loss\n\n\ndef run_model():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--model-path', type=str, help='pretrained model path to local checkpoint')\n\n    parser.add_argument(\"--seed\", type=int, default=0)\n    parser.add_argument(\"--batch-size\", type=int, default=1)\n    parser.add_argument('--data-dir', type=str, default='data')\n    parser.add_argument('--out-dir', type=str, default='out')\n\n    parser.add_argument('--data_type', type=str, default='t1', choices=['t' + str(i) for i in range(9)], help=\"t: type\")\n    parser.add_argument('--model_type', type=str, default='ae_vae_fusion', choices=['cvae', 'ae_vae_fusion'])\n    parser.add_argument('--dataset', type=str, default='wi', choices=['wp', 'wi'], help=\"Dataset to use for training\")\n    parser.add_argument('--workers', default=1, type=int, metavar='N', help='number of data loading workers')\n\n    # use GPU\n    parser.add_argument('--gpu', default=3, type=int)\n    parser.add_argument('--no_gpu', action=\"store_true\")\n\n    parser.add_argument('--fp16', action='store_true', help=\"Train using FP16?\")\n\n    parser.add_argument('--add_input', action=\"store_true\")\n    parser.add_argument('--add_attn', action=\"store_true\")\n    parser.add_argument('--add_softmax', action=\"store_true\")\n    parser.add_argument('--attn_proj_vary', action=\"store_true\")\n\n    parser.add_argument('--learn_prior', action=\"store_true\")\n\n    args = parser.parse_args('--model-path out/wi.2.proj_beta_half_ae/model_0150000.pt '\n                             '--add_attn --learn_prior 
--fp16'.split())\n    print(args)\n\n    if args.model_type == 'cvae':\n        args.learn_prior = True\n    else:\n        args.learn_prior = False\n\n    # GPU\n    if not torch.cuda.is_available(): args.no_gpu = True\n    gpu = not args.no_gpu\n    if gpu: torch.cuda.set_device(args.gpu)\n    device = torch.device(args.gpu if gpu else \"cpu\")\n\n    # randomness\n    np.random.seed(args.seed)\n    prng = np.random.RandomState()\n    torch.random.manual_seed(args.seed)\n    if gpu: torch.cuda.manual_seed(args.seed); torch.cuda.manual_seed_all(args.seed)\n\n    if args.batch_size == -1:\n        args.batch_size = 1\n\n    # logging\n    save_folder = args.model_path + '.eval/'\n    os.makedirs(save_folder, exist_ok=True)\n    importlib.reload(logging)\n    logging.basicConfig(filename=os.path.join(save_folder, 'eval_ppl.log'),\n                        level=logging.INFO, format='%(asctime)s--- %(message)s')\n    logging.info('\\n----------------------------------------------------------------------')\n\n    print('Loading models...')\n    cache_dir = os.path.join(args.out_dir, 'model_cache')\n    os.makedirs(cache_dir, exist_ok=True)\n    # Load pre-trained teacher tokenizer (vocabulary)\n    tokenizer = GPT2Tokenizer.from_pretrained('gpt2', cache_dir=cache_dir)\n    tokenizer.max_len = int(1e12)\n    gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2', cache_dir=cache_dir)\n    print('gpt2_params:', num_params(gpt2_model))  # gpt2: 124439808\n    config = GPT2Config()\n\n    # add special tokens\n    special_tokens_dict = {\n        'pad_token': '<|startoftext|>',\n        'cls_token': '<|startofcond|>',\n        'sep_token': '<|sepofcond|>',\n        'mask_token': '<|endofcond|>'\n    }\n    num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)\n    print('We have added', num_added_toks, 'special tokens')\n    # Notice: resize_token_embeddings expect to receive the full size of the new vocab\n    gpt2_model.resize_token_embeddings(len(tokenizer))\n 
   assert tokenizer.pad_token == '<|startoftext|>'\n\n    VAE = VAEModel(config, add_input=args.add_input, add_attn=args.add_attn, add_softmax=args.add_softmax,\n                   attn_proj_vary=args.attn_proj_vary, learn_prior=args.learn_prior)\n    init_para_frompretrained(VAE.transformer, gpt2_model.transformer, share_para=True)\n    init_para_frompretrained(VAE.encoder, gpt2_model.transformer, share_para=False)\n    if args.learn_prior:\n        init_para_frompretrained(VAE.encoder_prior, VAE.encoder, share_para=True)\n        VAE.encoder_prior.averageSelfAttention.attention_weights = VAE.encoder.averageSelfAttention.attention_weights\n    VAE.lm_head.weight = gpt2_model.lm_head.weight\n    if VAE.add_softmax:\n        VAE.lm_head_rep = Conv1D(*gpt2_model.lm_head.weight.size())\n        # VAE.lm_head_rep = LM_head_rep(*gpt2_model.lm_head.weight.size()[::-1])\n    print('VAE_params:', num_params(VAE))  # 286694400\n    args.load = args.model_path\n    if args.load:\n        print('Loading model weights...')\n        state = torch.load(os.path.join(args.load), map_location='cpu')\n        if 'module' in list(state.keys())[0]:  # model_path is data parallel model with attr 'module'\n            state_copy = copy.copy(state)\n            keys = state_copy.keys()\n            for k in keys:\n                state[k.replace('module.', '')] = state.pop(k)\n        VAE.load_state_dict(state)\n        gc.collect()\n    print('Model loaded.')\n\n    print('Setup data...')\n    seq_len = VAE.config.n_ctx\n    train_loader, val_loader, test_loader = prepare_dataset(\n        args.data_dir, args.dataset, tokenizer,\n        1, seq_len, 1, seq_len, args.batch_size, seq_len,\n        make_test=True,\n        num_workers=args.workers, data_type=args.data_type\n    )\n    print('Done.')\n\n    if args.fp16:\n        VAE = VAE.half()\n    VAE.eval()  # be careful about VAE.eval() vs VAE.train()\n    VAE.to(device)\n    loss_fn = nn.CrossEntropyLoss(reduction='none')\n\n    
logging.info('\\n----------------------------------------------------------------------')\n    logging.info(\"Testing loop. batches: %d\" % len(test_loader))\n\n    endoftext = tokenizer.convert_tokens_to_ids(\"<|endoftext|>\")\n    startofcond = tokenizer.convert_tokens_to_ids(\"<|startofcond|>\")\n    endofcond = tokenizer.convert_tokens_to_ids(\"<|endofcond|>\")\n\n    n_words_bpe = 0\n    n_words = 0\n    logp_sum = 0.0\n\n    n_words_bpe_l = []\n    n_words_l = []\n    logp_sum_l = []\n\n    stats = []\n    # test_iter = iter(test_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(test_iter)\n    with tqdm(total=len(test_loader)) as pbar:\n        for i_test, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(test_loader):\n\n            with torch.no_grad():\n                if args.model_type == 'cvae':\n                    loss, ce_loss, kl_loss = compute_loss(device, VAE, x_mask, x_tokens, y_mask, y_tokens, input_tokens,\n                                                          target_tokens, mask, loss_fn, 1.0)\n                else:\n                    loss, ce_loss, kl_loss = compute_loss_ae(device, VAE, x_mask, x_tokens, y_mask, y_tokens, input_tokens,\n                                                          target_tokens, mask, loss_fn, 1.0)\n\n            stats.append([ce_loss.item(), math.exp(min(ce_loss.item(), 100)), kl_loss.item()])\n\n            if len(target_tokens.size()) == 1:\n               target_tokens = target_tokens.unsqueeze(0)\n            n, l = target_tokens.size()\n\n            tokens = target_tokens.tolist()\n            tokens = [t[:t.index(endoftext) + 1] if endoftext in t else t for t in tokens]\n            words_bpe = sum([len(t) for t in tokens])\n            n_words_bpe += words_bpe\n            n_words_bpe_l.append(words_bpe)\n\n            story = [tokenizer.decode(target_tokens[i, :]) for i in range(n)]\n            story = 
[s[:s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\")] if \"<|endoftext|>\" in s else s for s in story]\n            words = sum([len([t for t in re.split('(\"|\\'|!|\\?|\\.|,|:| |\\n|’|“|”|;|\\(|\\)|`)', s) if t != ' ' and t != '']) for s in story])\n            n_words += words\n            n_words_l.append(words)\n\n            logp_sum += ce_loss.item() * words_bpe\n            logp_sum_l.append(ce_loss.item() * words_bpe)\n\n            #logging.info('test sample %05d finished.', i_test)\n            pbar.update(1)\n\n    print('Test complete with %05d samples.' % len(test_loader))\n    logging.info(\"Test complete with %05d samples.\", len(test_loader))\n\n    print(' loss_bpe :', logp_sum / n_words_bpe)\n    logging.info('loss_bpe: %f', logp_sum / n_words_bpe)\n\n    ppl_bpe = round(math.exp(logp_sum / n_words_bpe), 3)\n    ppl_word = round(math.exp(logp_sum / n_words), 3)\n    print(' ppl_word:', ppl_word)\n    print(' ppl_bpe :', ppl_bpe)\n    logging.info('logp_sum: %f', logp_sum)\n    logging.info('n_words_bpe: %d', n_words_bpe)\n    logging.info('n_words    : %d', n_words)\n    logging.info('    ppl_bpe : %f', ppl_bpe)\n    logging.info('    ppl_word: %f', ppl_word)\n\n    stats = np.mean(stats, axis=0)\n    print(stats)\n    logging.info('    stats: %s', str(stats))\n\n\nif __name__ == '__main__':\n    run_model()\n"
  },
  {
    "path": "eval_ppl_prefix.py",
    "content": "import pickle\nimport os\nimport math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\nimport argparse\nfrom transformers import GPT2Tokenizer, GPT2LMHeadModel, GPT2Config\nfrom tqdm import tqdm\nfrom tqdm import trange\nimport importlib\nimport logging\nimport copy\nfrom data.util import *\nfrom util import *\n\nfrom model import *\n\n\ndef compute_loss(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask, loss_fn, beta):\n    input_tokens = input_tokens.to(device)\n    target_tokens = target_tokens.to(device)\n    mask = mask.to(device)\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n    y_mask = y_mask.to(device)\n    y_tokens = y_tokens.to(device)\n\n    outputs = model(input_ids=input_tokens, attention_mask=mask, x_mask=x_mask, x_tokens=x_tokens, y_mask=y_mask,\n                    y_tokens=y_tokens, from_prior=True)\n    logits = outputs[0]\n    kl_loss = outputs[-1]\n    num_logits = logits.size(-1)\n\n    # Perform masking\n    if mask is not None:\n        mask = mask.type(torch.bool)\n        mask = mask.to(device)\n        logits = logits.masked_select(mask.unsqueeze(-1))\n        target_tokens = target_tokens.masked_select(mask)\n\n    # keep per-token losses (loss_fn uses reduction='none'): the eval loop needs one value per token\n    ce_loss = loss_fn(logits.view(-1, num_logits), target_tokens.view(-1))\n    kl_loss = kl_loss.mean()\n    loss = ce_loss + beta * kl_loss\n\n    return loss, ce_loss, kl_loss\n\n\ndef compute_loss_ae(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask, loss_fn, beta):\n    input_tokens = input_tokens.to(device)\n    target_tokens = target_tokens.to(device)\n    mask = mask.to(device)\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n\n    outputs = model(input_ids=input_tokens, attention_mask=mask, y_mask=x_mask, y_tokens=x_tokens, from_mean=True, from_prior=False)\n\n    logits = outputs[0]\n    kl_loss = outputs[-1]\n    num_logits = logits.size(-1)\n\n 
   # Perform masking\n    if mask is not None:\n        mask = mask.type(torch.bool)\n        mask = mask.to(device)\n        logits = logits.masked_select(mask.unsqueeze(-1))\n        target_tokens = target_tokens.masked_select(mask)\n\n    ce_loss = loss_fn(logits.view(-1, num_logits), target_tokens.view(-1))\n    kl_loss = kl_loss.mean()\n    loss = ce_loss\n\n    return loss, ce_loss, kl_loss\n\n\ndef run_model():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--model-path', type=str, help='pretrained model path to local checkpoint')\n\n    parser.add_argument(\"--seed\", type=int, default=0)\n    parser.add_argument(\"--batch-size\", type=int, default=1)\n    parser.add_argument('--data-dir', type=str, default='data')\n    parser.add_argument('--out-dir', type=str, default='out')\n\n    parser.add_argument('--data_type', type=str, default='t1', choices=['t' + str(i) for i in range(9)], help=\"t: type\")\n    parser.add_argument('--model_type', type=str, default='ae_vae_fusion', choices=['cvae', 'ae_vae_fusion'])\n    parser.add_argument('--dataset', type=str, default='wi', choices=['wp', 'wi'], help=\"Dataset to use for training\")\n    parser.add_argument('--workers', default=1, type=int, metavar='N', help='number of data loading workers')\n\n    # use GPU\n    parser.add_argument('--gpu', default=3, type=int)\n    parser.add_argument('--no_gpu', action=\"store_true\")\n\n    parser.add_argument('--fp16', action='store_true', help=\"Train using FP16?\")\n\n    parser.add_argument('--add_input', action=\"store_true\")\n    parser.add_argument('--add_attn', action=\"store_true\")\n    parser.add_argument('--add_softmax', action=\"store_true\")\n    parser.add_argument('--attn_proj_vary', action=\"store_true\")\n\n    parser.add_argument('--learn_prior', action=\"store_true\")\n\n    args = parser.parse_args('--model-path out/wi.12.proj_beta_half_ae/model_0000000.pt '\n                             '--add_input --add_attn --attn_proj_vary 
--learn_prior --fp16'.split())\n    print(args)\n\n    if args.model_type == 'cvae':\n        args.learn_prior = True\n    else:\n        args.learn_prior = False\n\n    # GPU\n    if not torch.cuda.is_available(): args.no_gpu = True\n    gpu = not args.no_gpu\n    if gpu: torch.cuda.set_device(args.gpu)\n    device = torch.device(args.gpu if gpu else \"cpu\")\n\n    # randomness\n    np.random.seed(args.seed)\n    prng = np.random.RandomState()\n    torch.random.manual_seed(args.seed)\n    if gpu: torch.cuda.manual_seed(args.seed); torch.cuda.manual_seed_all(args.seed)\n\n    if args.batch_size == -1:\n        args.batch_size = 1\n\n    # logging\n    save_folder = args.model_path + '.eval/'\n    os.makedirs(save_folder, exist_ok=True)\n    importlib.reload(logging)\n    logging.basicConfig(filename=os.path.join(save_folder, 'eval_ppl.log'),\n                        level=logging.INFO, format='%(asctime)s--- %(message)s')\n    logging.info('\\n----------------------------------------------------------------------')\n\n    print('Loading models...')\n    cache_dir = os.path.join(args.out_dir, 'model_cache')\n    os.makedirs(cache_dir, exist_ok=True)\n    # Load pre-trained teacher tokenizer (vocabulary)\n    tokenizer = GPT2Tokenizer.from_pretrained('gpt2', cache_dir=cache_dir)\n    tokenizer.max_len = int(1e12)\n    gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2', cache_dir=cache_dir)\n    print('gpt2_params:', num_params(gpt2_model))  # gpt2: 124439808\n    config = GPT2Config()\n\n    # add special tokens\n    special_tokens_dict = {\n        'pad_token': '<|startoftext|>',\n        'cls_token': '<|startofcond|>',\n        'sep_token': '<|sepofcond|>',\n        'mask_token': '<|endofcond|>'\n    }\n    num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)\n    print('We have added', num_added_toks, 'special tokens')\n    # Notice: resize_token_embeddings expect to receive the full size of the new vocab\n    
gpt2_model.resize_token_embeddings(len(tokenizer))\n    assert tokenizer.pad_token == '<|startoftext|>'\n\n    VAE = VAEModel(config, add_input=args.add_input, add_attn=args.add_attn, add_softmax=args.add_softmax,\n                   attn_proj_vary=args.attn_proj_vary, learn_prior=args.learn_prior)\n    init_para_frompretrained(VAE.transformer, gpt2_model.transformer, share_para=True)\n    init_para_frompretrained(VAE.encoder, gpt2_model.transformer, share_para=False)\n    if args.learn_prior:\n        init_para_frompretrained(VAE.encoder_prior, VAE.encoder, share_para=True)\n        VAE.encoder_prior.averageSelfAttention.attention_weights = VAE.encoder.averageSelfAttention.attention_weights\n    VAE.lm_head.weight = gpt2_model.lm_head.weight\n    if VAE.add_softmax:\n        VAE.lm_head_rep = Conv1D(*gpt2_model.lm_head.weight.size())\n        # VAE.lm_head_rep = LM_head_rep(*gpt2_model.lm_head.weight.size()[::-1])\n    print('VAE_params:', num_params(VAE))  # 286694400\n    args.load = args.model_path\n    if args.load:\n        print('Loading model weights...')\n        state = torch.load(os.path.join(args.load), map_location='cpu')\n        if 'module' in list(state.keys())[0]:  # model_path is data parallel model with attr 'module'\n            state_copy = copy.copy(state)\n            keys = state_copy.keys()\n            for k in keys:\n                state[k.replace('module.', '')] = state.pop(k)\n        VAE.load_state_dict(state)\n        gc.collect()\n    print('Model loaded.')\n\n    print('Setup data...')\n    seq_len = VAE.config.n_ctx\n    train_loader, val_loader, test_loader = prepare_dataset(\n        args.data_dir, args.dataset, tokenizer,\n        1, seq_len, 1, seq_len, args.batch_size, seq_len,\n        make_test=True,\n        num_workers=args.workers, data_type=args.data_type\n    )\n    print('Done.')\n\n    if args.fp16:\n        VAE = VAE.half()\n    VAE.eval()  # be careful about VAE.eval() vs VAE.train()\n    VAE.to(device)\n    
loss_fn = nn.CrossEntropyLoss(reduction='none')\n\n    logging.info('\\n----------------------------------------------------------------------')\n    logging.info(\"Testing loop. batches: %d\" % len(test_loader))\n\n    endoftext = tokenizer.convert_tokens_to_ids(\"<|endoftext|>\")\n    startofcond = tokenizer.convert_tokens_to_ids(\"<|startofcond|>\")\n    endofcond = tokenizer.convert_tokens_to_ids(\"<|endofcond|>\")\n\n    n_words_bpe = 0\n    n_words = 0\n    logp_sum = 0.0\n\n    n_words_bpe_l = []\n    n_words_l = []\n    logp_sum_l = []\n\n    # test_iter = iter(test_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(test_iter)\n    with tqdm(total=len(test_loader)) as pbar:\n        for i_test, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(test_loader):\n\n            with torch.no_grad():\n                if args.model_type == 'cvae':\n                    loss, ce_loss, kl_loss = compute_loss(device, VAE, x_mask, x_tokens, y_mask, y_tokens, input_tokens,\n                                                          target_tokens, mask, loss_fn, 1.0)\n                else:\n                    loss, ce_loss, kl_loss = compute_loss_ae(device, VAE, x_mask, x_tokens, y_mask, y_tokens, input_tokens,\n                                                          target_tokens, mask, loss_fn, 1.0)\n\n            if len(target_tokens.size()) == 1:\n               target_tokens = target_tokens.unsqueeze(0)\n            n, l = target_tokens.size()\n\n            text = target_tokens[0, :].tolist()\n            logprob = ce_loss.tolist()\n            assert len(text) == len(logprob)\n\n            # only for story\n            idx = text.index(endoftext)\n            text = text[idx + 1:]\n            logprob = logprob[idx + 1:]\n\n            if endoftext in text:\n                idx = text.index(endoftext)\n                text = text[:idx]\n                logprob = logprob[:idx]\n\n            
logp_sum += sum(logprob)\n            logp_sum_l.append(sum(logprob))\n\n            n_words_bpe += len(text)\n            n_words_bpe_l.append(len(text))\n\n            story = [tokenizer.decode(target_tokens[i, :]) for i in range(n)]\n            story = [s[s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\"):] for s in story]\n            story = [s[:s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\")] if \"<|endoftext|>\" in s else s for s in story]\n            words = sum([len([t for t in re.split('(\"|\\'|!|\\?|\\.|,|:| |\\n|’|“|”|;|\\(|\\)|`)', s) if t != ' ' and t != '']) for s in story])\n            n_words += words\n            n_words_l.append(words)\n\n            #logging.info('test sample %05d finished.', i_test)\n            pbar.update(1)\n\n    print('Test complete with %05d samples.' % len(test_loader))\n    logging.info(\"Test complete with %05d samples.\", len(test_loader))\n\n    print(' loss_bpe :', logp_sum / n_words_bpe)\n    logging.info('loss_bpe: %f', logp_sum / n_words_bpe)\n\n    ppl_bpe = round(math.exp(min(logp_sum / n_words_bpe, 100)), 3)\n    ppl_word = round(math.exp(min(logp_sum / n_words, 100)), 3)\n    print(' ppl_word:', ppl_word)\n    print(' ppl_bpe :', ppl_bpe)\n    logging.info('logp_sum: %f', logp_sum)\n    logging.info('n_words_bpe: %d', n_words_bpe)\n    logging.info('n_words    : %d', n_words)\n    logging.info('    ppl_bpe : %f', ppl_bpe)\n    logging.info('    ppl_word: %f', ppl_word)\n\n\nif __name__ == '__main__':\n    run_model()\n"
  },
  {
    "path": "generate.py",
    "content": "import pickle\nimport os\nimport math\nimport torch\nimport torch.nn.functional as F\nfrom torch.nn import DataParallel\nimport numpy as np\nimport argparse\nfrom transformers import GPT2Tokenizer, GPT2LMHeadModel, GPT2Config\nfrom tqdm import tqdm\nfrom tqdm import trange\nimport importlib\nimport logging\nimport copy\nfrom data.util import *\nfrom collections import Counter\nfrom nltk.translate.bleu_score import sentence_bleu\nfrom nltk.translate.bleu_score import SmoothingFunction\nfrom rouge import Rouge\nfrom util import *\n\nfrom model import *\n\n\ndef top_k_top_p_filtering(logits, top_k=100, top_p=0.95, filter_value=-float('Inf')):\n    \"\"\" Filter a distribution of logits using top-k and/or nucleus (top-p) filtering\n        Args:\n            logits: logits distribution shape (vocabulary size)\n            top_k > 0: keep only top k tokens with highest probability (top-k filtering).\n            top_p > 0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).\n                Nucleus filtering is described in Holtzman et al. 
(http://arxiv.org/abs/1904.09751)\n        From: https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317\n    \"\"\"\n    top_k = min(top_k, logits.size(-1))  # Safety check\n    if top_k > 0:\n        # Remove all tokens with a probability less than the last token of the top-k\n        indices_to_remove = logits < torch.topk(logits, top_k)[0][..., -1, None]\n        logits[indices_to_remove] = filter_value\n\n    if top_p > 0.0:\n        sorted_logits, sorted_indices = torch.sort(logits, descending=True)\n        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)\n\n        # Remove tokens with cumulative probability above the threshold\n        sorted_indices_to_remove = cumulative_probs > top_p\n        # Shift the indices to the right to keep also the first token above the threshold\n        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()\n        sorted_indices_to_remove[..., 0] = 0\n\n        # scatter sorted tensors to original indexing\n        indices_to_remove = sorted_indices_to_remove.scatter(dim=1, index=sorted_indices, src=sorted_indices_to_remove)\n        logits[indices_to_remove] = filter_value\n\n    return logits\n\n\ndef repeat_score(text, ngram=[3, 4, 5, 6]):\n    ngram_list = []\n    for ng in ngram:\n        # a text of len(text) tokens has len(text) - ng + 1 n-gram start positions\n        ngram_list.append([text[idx:idx + ng] for idx in range(len(text) - ng + 1)])\n\n    max_occurs = []\n    for ngrams in ngram_list:\n        count_result = Counter([' '.join(n) for n in ngrams])\n        try:\n            max_occurs.append(\n                max(count_result.values())\n            )\n        except:\n            pass\n\n    scores = [max_oc / ((len(text) / ngram[idx]) + ngram[idx]) for idx, max_oc in enumerate(max_occurs)]\n    return max(scores) if len(scores) >= 1 else 1.0\n\n\ndef sample_sequence(model, tokenizer, length, batch_size=None, x_mask=None, x_tokens=None, y_mask=None, y_tokens=None,\n                    temperature=1, top_k=100, top_p=0.95, 
device='cuda', sample=True, eos_token=None, model_type='cvae'):\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n    y_mask = y_mask.to(device)\n    y_tokens = y_tokens.to(device)\n\n    mem = None\n    prev = torch.tensor([[eos_token]] * batch_size, dtype=torch.long, device=device)\n\n    output = prev\n    probability = torch.tensor([], dtype=torch.float, device=device)\n    if_end = torch.tensor([False] * batch_size, dtype=torch.bool, device=device)\n\n    with torch.no_grad():\n        if model_type == 'cvae':\n            try:\n                prior_mean, prior_logvar = model.encoder_prior(input_ids=x_tokens, attention_mask=x_mask)[:2]\n            except:\n                prior_mean = prior_logvar = torch.zeros([batch_size, model.config.n_embd], device=device)\n            latent_mean, latent_logvar = prior_mean, prior_logvar\n            z = model.reparameterize(latent_mean, latent_logvar)\n            assert not torch.isnan(z).any(), 'training get nan z'\n        else:\n            posterior_mean, posterior_logvar = model.encoder(input_ids=x_tokens, attention_mask=x_mask)[:2]\n            latent_mean, latent_logvar = posterior_mean, posterior_logvar\n            z = latent_mean\n            assert not torch.isnan(z).any(), 'training get nan z'\n\n        for i in range(length): #trange\n            logits, mem = model.transformer(input_ids=prev, past=mem, representations=z)\n\n            logits = model.lm_head(logits)\n            if model.add_softmax:\n                logits_rep = model.lm_head_rep(z)\n                logits = logits + logits_rep.unsqueeze(dim=1)\n\n            logits = logits[:, -1, :] / temperature\n            logits = top_k_top_p_filtering(logits, top_k, top_p)\n            probs = F.softmax(logits, dim=-1)\n            if sample:\n                next_token = torch.multinomial(probs, num_samples=1)\n            else:\n                _, next_token = torch.topk(probs, k=1, dim=-1)\n\n            probability = 
torch.cat((probability, probs.gather(1, next_token)), dim=1)\n            output = torch.cat((output, next_token), dim=1)\n            prev = next_token\n\n            # early stopping if all sents have ended once\n            if_end[next_token.view(-1).eq(eos_token)] = True\n            if if_end.all(): break\n    return output, probability\n\n\ndef run_model():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--model-path', type=str, help='pretrained model path to local checkpoint')\n    parser.add_argument(\"--seed\", type=int, default=0)\n    parser.add_argument(\"--nsamples\", type=int, default=1)\n    parser.add_argument(\"--batch_size\", type=int, default=1)\n    parser.add_argument(\"--length\", type=int, default=-1)\n    parser.add_argument(\"--temperature\", type=float, default=0.95)\n    parser.add_argument('--top_p', type=float, default=0.95)\n    parser.add_argument('--top_k', type=int, default=100)\n    parser.add_argument('--data-dir', type=str, default='data')\n    parser.add_argument('--out-dir', type=str, default='out')\n\n    parser.add_argument('--data_type', type=str, default='t1', choices=['t' + str(i) for i in range(9)], help=\"t: type\")\n    parser.add_argument('--model_type', type=str, default='cvae', choices=['cvae', 'ae_vae_fusion'])\n    parser.add_argument('--dataset', type=str, default='wi', choices=['wp', 'wi'], help=\"Dataset to use for training\")\n\n    # use GPU\n    parser.add_argument('--gpu', default=2, type=int)\n    parser.add_argument('--no_gpu', action=\"store_true\")\n\n    parser.add_argument('--add_input', action=\"store_true\")\n    parser.add_argument('--add_attn', action=\"store_true\")\n    parser.add_argument('--add_softmax', action=\"store_true\")\n    parser.add_argument('--attn_proj_vary', action=\"store_true\")\n\n    parser.add_argument('--learn_prior', action=\"store_true\")\n\n    args = parser.parse_args('--model-path out/wi.1.proj_vary_cyc_cvae/model_0030000.pt '\n                            
 '--add_input --learn_prior '.split())\n    print(args)\n\n    if args.model_type == 'cvae':\n        args.learn_prior = True\n    else:\n        args.learn_prior = False\n\n    # GPU\n    if not torch.cuda.is_available(): args.no_gpu = True\n    gpu = not args.no_gpu\n    if gpu: torch.cuda.set_device(args.gpu)\n    device = torch.device(args.gpu if gpu else \"cpu\")\n\n    # randomness\n    np.random.seed(args.seed)\n    prng = np.random.RandomState()\n    torch.random.manual_seed(args.seed)\n    if gpu: torch.cuda.manual_seed(args.seed)\n\n    if args.batch_size == -1:\n        args.batch_size = 1\n    assert args.nsamples % args.batch_size == 0\n\n    # logging\n    save_folder = args.model_path + '.eval/'\n    os.makedirs(save_folder, exist_ok=True)\n    importlib.reload(logging)\n    logging.basicConfig(filename=os.path.join(save_folder, 'eval.log'),\n                        level=logging.INFO, format='%(asctime)s--- %(message)s')\n    logging.info('\\n----------------------------------------------------------------------')\n\n    print('Loading models...')\n    cache_dir = os.path.join(args.out_dir, 'model_cache')\n    os.makedirs(cache_dir, exist_ok=True)\n    # Load pre-trained teacher tokenizer (vocabulary)\n    tokenizer = GPT2Tokenizer.from_pretrained('gpt2', cache_dir=cache_dir)\n    tokenizer.max_len = int(1e12)\n    gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2', cache_dir=cache_dir)\n    print('gpt2_params:', num_params(gpt2_model))  # gpt2: 124439808\n    config = GPT2Config()\n\n    # # add special tokens\n    # special_tokens_dict = {\n    #     'pad_token': '<|startoftext|>',\n    #     'cls_token': '<|startofcond|>',\n    #     'sep_token': '<|sepofcond|>',\n    #     'mask_token': '<|endofcond|>'\n    # }\n    # num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)\n    # print('We have added', num_added_toks, 'special tokens')\n    # # Notice: resize_token_embeddings expect to receive the full size of the new vocab\n    # 
gpt2_model.resize_token_embeddings(len(tokenizer))\n    # assert tokenizer.pad_token == '<|startoftext|>'\n\n    VAE = VAEModel(config, add_input=args.add_input, add_attn=args.add_attn, add_softmax=args.add_softmax,\n                   attn_proj_vary=args.attn_proj_vary, learn_prior=args.learn_prior)\n    init_para_frompretrained(VAE.transformer, gpt2_model.transformer, share_para=True)\n    init_para_frompretrained(VAE.encoder, gpt2_model.transformer, share_para=False)\n    if args.learn_prior:\n        init_para_frompretrained(VAE.encoder_prior, VAE.encoder, share_para=True)\n        VAE.encoder_prior.averageSelfAttention.attention_weights = VAE.encoder.averageSelfAttention.attention_weights\n    VAE.lm_head.weight = gpt2_model.lm_head.weight\n    if VAE.add_softmax:\n        VAE.lm_head_rep = Conv1D(*gpt2_model.lm_head.weight.size())\n        # VAE.lm_head_rep = LM_head_rep(*gpt2_model.lm_head.weight.size()[::-1])\n    print('VAE_params:', num_params(VAE))  # 286694400\n    args.load = args.model_path\n    if args.load:\n        print('Loading model weights...')\n        state = torch.load(os.path.join(args.load), map_location='cpu')\n        if 'module' in list(state.keys())[0]:  # model_path is data parallel model with attr 'module'\n            state_copy = copy.copy(state)\n            keys = state_copy.keys()\n            for k in keys:\n                state[k.replace('module.', '')] = state.pop(k)\n        VAE.load_state_dict(state)\n        gc.collect()\n    print('Model loaded.')\n\n    print('Setup data...')\n    seq_len = VAE.config.n_ctx\n    test_loader = prepare_dataset(\n        args.data_dir, args.dataset, tokenizer,\n        1, seq_len, 1, seq_len, args.batch_size, seq_len,\n        make_train=False, make_val=False, make_test=True, data_type=args.data_type\n    )[0]\n    print('Done.')\n\n    VAE.eval()  # be careful about VAE.eval() vs VAE.train()\n    VAE.to(device)\n    loss_fn = nn.CrossEntropyLoss(reduction='none')\n\n    
logging.info('\\n----------------------------------------------------------------------')\n    logging.info(\"Testing loop. batches: %d\" % len(test_loader))\n\n    endoftext = tokenizer.convert_tokens_to_ids(\"<|endoftext|>\")\n    startofcond = tokenizer.convert_tokens_to_ids(\"<|startofcond|>\")\n    endofcond = tokenizer.convert_tokens_to_ids(\"<|endofcond|>\")\n\n    n_samples = 0\n    bleu4_sum = 0.0\n    rouge_scores_values_sum = [0.0] * 9\n\n    model_type = args.model_type\n\n    # test_iter = iter(test_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(test_iter)\n    with tqdm(total=len(test_loader)) as pbar:\n        for i_test, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(test_loader):\n\n            length = args.length\n            if length == -1:\n                length = VAE.config.n_ctx - 1\n            elif length > VAE.config.n_ctx - 1:\n                raise ValueError(\"Can't get samples longer than window size: %s\" % VAE.config.n_ctx)\n\n            eff_samples = []\n            n, l = target_tokens.size()\n            storys = [tokenizer.decode(target_tokens[i, :]) for i in range(n)]\n            storys_str = [s[:s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\")] if \"<|endoftext|>\" in s else s for s in storys]\n\n            for _ in range(args.nsamples // args.batch_size):\n                # model, batch_size, temperature, top_k, top_p, eos_token, sample = VAE, args.batch_size, args.temperature, args.top_k, args.top_p, tokenizer.encoder['<|endoftext|>'], True\n                out, _ = sample_sequence(\n                    model=VAE,\n                    tokenizer=tokenizer,\n                    length=length,\n                    batch_size=args.batch_size,\n                    x_mask=x_mask,\n                    x_tokens=x_tokens,\n                    y_mask=y_mask,\n                    y_tokens=y_tokens,\n                    temperature=args.temperature,\n     
               top_k=args.top_k,\n                    top_p=args.top_p,\n                    device = device,\n                    eos_token=tokenizer.encoder['<|endoftext|>'],\n                    model_type=model_type\n                )\n                out = out.tolist()\n\n                # extract story, check metrics\n                for i in range(len(out)):\n                    text = out[i]\n                    text = text[text.index(endoftext) + 1:]\n\n                    if endoftext in text:\n                        idx = text.index(endoftext)\n                        text = text[:idx]\n\n                    text = tokenizer.decode(text).strip()\n\n                    # score for one long text, higher than 0.075 usually means repetition\n                    # rep_score = repeat_score(text.split(), ngram=[3, 4, 5, 6, 7, 8])\n                    # if rep_score > 0.075:\n                    #     # print(rep_score)\n                    #     continue\n\n                    try:\n                        # check bleu (hypothesis must be a token list, tokenized like the reference)\n                        bleu4 = sentence_bleu([storys_str[i].split()], text.split(), smoothing_function=SmoothingFunction().method7)\n\n                        # check rouge\n                        rouge = Rouge()\n                        rouge_scores = rouge.get_scores(text, storys_str[i])\n                        rouge_scores_values = [v for k in rouge_scores[0].keys() for v in rouge_scores[0][k].values()]\n\n                        bleu4_sum += bleu4\n                        rouge_scores_values_sum = [v1 + v2 for v1, v2 in zip(rouge_scores_values_sum, rouge_scores_values)]\n                        n_samples += 1\n                    except:\n                        bleu4 = 0.0\n                        rouge_scores = [{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},\n                                         'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},\n                                         'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}]\n\n                  
  eff_samples.append((text, bleu4, rouge_scores))\n\n                # write samples to file\n                samples_file = open(save_folder + 'batch-' + '%04d' % i_test + '.txt', 'w', encoding='utf8')\n                for i in range(len(eff_samples)):\n                    samples_file.write(\"=\" * 50 + \" SAMPLE \" + str(i) + \" \" + \"=\" * 50)\n                    samples_file.write('\\n' * 2)\n\n                    samples_file.write(\"=\" * 40 + \" Outlines  \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(tokenizer.decode(x_tokens[i, :][x_mask[i, :] == 1].tolist()))\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(\"=\" * 40 + \" Story \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(storys_str[i])\n                    samples_file.write('\\n' * 2)\n\n                    samples_file.write(\"=\" * 40 + \" Generated \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(eff_samples[i][0])\n                    samples_file.write('\\n' * 4)\n                    samples_file.flush()\n\n                logging.info('batch %04d finished.', i_test)\n                pbar.update(1)\n\n    print('Test complete with %05d samples.' % n_samples)\n    logging.info(\"Test complete with %05d samples.\", n_samples)\n\n    bleu4 = round(bleu4_sum / n_samples, 3)\n    rouge_scores_values = [round(r / n_samples, 3) for r in rouge_scores_values_sum]\n    print(' bleu-4:', bleu4)\n    print(' rouge :', rouge_scores_values)\n    logging.info(' bleu-4: %f', bleu4)\n    logging.info(' rouge : %s', str(rouge_scores_values))\n\n\nif __name__ == '__main__':\n    run_model()\n"
  },
  {
    "path": "generate_prefix.py",
    "content": "import pickle\nimport os\nimport math\nimport torch\nimport torch.nn.functional as F\nfrom torch.nn import DataParallel\nimport numpy as np\nimport argparse\nfrom transformers import GPT2Tokenizer, GPT2LMHeadModel, GPT2Config\nfrom tqdm import tqdm\nfrom tqdm import trange\nimport importlib\nimport logging\nimport copy\nfrom data.util import *\nfrom collections import Counter\nfrom nltk.translate.bleu_score import sentence_bleu\nfrom nltk.translate.bleu_score import SmoothingFunction\nfrom rouge import Rouge\nfrom util import *\n\nfrom model import *\n\n\ndef top_k_top_p_filtering(logits, top_k=100, top_p=0.95, filter_value=-float('Inf')):\n    \"\"\" Filter a distribution of logits using top-k and/or nucleus (top-p) filtering\n        Args:\n            logits: logits distribution shape (vocabulary size)\n            top_k > 0: keep only top k tokens with highest probability (top-k filtering).\n            top_p > 0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).\n                Nucleus filtering is described in Holtzman et al. 
(http://arxiv.org/abs/1904.09751)\n        From: https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317\n    \"\"\"\n    top_k = min(top_k, logits.size(-1))  # Safety check\n    if top_k > 0:\n        # Remove all tokens with a probability less than the last token of the top-k\n        indices_to_remove = logits < torch.topk(logits, top_k)[0][..., -1, None]\n        logits[indices_to_remove] = filter_value\n\n    if top_p > 0.0:\n        sorted_logits, sorted_indices = torch.sort(logits, descending=True)\n        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)\n\n        # Remove tokens with cumulative probability above the threshold\n        sorted_indices_to_remove = cumulative_probs > top_p\n        # Shift the indices to the right to keep also the first token above the threshold\n        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()\n        sorted_indices_to_remove[..., 0] = 0\n\n        # scatter sorted tensors to original indexing\n        indices_to_remove = sorted_indices_to_remove.scatter(dim=1, index=sorted_indices, src=sorted_indices_to_remove)\n        logits[indices_to_remove] = filter_value\n\n    return logits\n\n\ndef repeat_score(text, ngram=[3, 4, 5, 6]):\n    ngram_list = []\n    for ng in ngram:\n        # a text of len(text) tokens has len(text) - ng + 1 n-gram start positions\n        ngram_list.append([text[idx:idx + ng] for idx in range(len(text) - ng + 1)])\n\n    max_occurs = []\n    for ngrams in ngram_list:\n        count_result = Counter([' '.join(n) for n in ngrams])\n        try:\n            max_occurs.append(\n                max(count_result.values())\n            )\n        except:\n            pass\n\n    scores = [max_oc / ((len(text) / ngram[idx]) + ngram[idx]) for idx, max_oc in enumerate(max_occurs)]\n    return max(scores) if len(scores) >= 1 else 1.0\n\n\ndef sample_sequence(model, tokenizer, length, batch_size=None, x_mask=None, x_tokens=None, y_mask=None, y_tokens=None,\n                    temperature=1, top_k=100, top_p=0.95, 
device='cuda', sample=True, eos_token=None, model_type='cvae'):\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n    y_mask = y_mask.to(device)\n    y_tokens = y_tokens.to(device)\n\n    with torch.no_grad():\n        if model_type == 'cvae':\n            try:\n                prior_mean, prior_logvar = model.encoder_prior(input_ids=x_tokens, attention_mask=x_mask)[:2]\n            except:\n                prior_mean = prior_logvar = torch.zeros([batch_size, model.config.n_embd], device=device)\n            latent_mean, latent_logvar = prior_mean, prior_logvar\n            z = model.reparameterize(latent_mean, latent_logvar)\n            assert not torch.isnan(z).any(), 'training get nan z'\n        else:\n            posterior_mean, posterior_logvar = model.encoder(input_ids=x_tokens, attention_mask=x_mask)[:2]\n            latent_mean, latent_logvar = posterior_mean, posterior_logvar\n            z = latent_mean\n            assert not torch.isnan(z).any(), 'training get nan z'\n\n        _, mem = model.transformer(input_ids=x_tokens[:, :-1], past=None, attention_mask=x_mask[:, :-1], representations=z)\n        prev = x_tokens[:, -1].view(batch_size, -1)\n\n        output = prev\n        probability = torch.tensor([], dtype=torch.float, device=device)\n        if_end = torch.tensor([False] * batch_size, dtype=torch.bool, device=device)\n\n        for i in range(length): #trange\n            logits, mem = model.transformer(input_ids=prev, past=mem, representations=z)\n\n            logits = model.lm_head(logits)\n            if model.add_softmax:\n                logits_rep = model.lm_head_rep(z)\n                logits = logits + logits_rep.unsqueeze(dim=1)\n\n            logits = logits[:, -1, :] / temperature\n            logits = top_k_top_p_filtering(logits, top_k, top_p)\n            probs = F.softmax(logits, dim=-1)\n            if sample:\n                next_token = torch.multinomial(probs, num_samples=1)\n            else:\n          
      _, next_token = torch.topk(probs, k=1, dim=-1)\n\n            probability = torch.cat((probability, probs.gather(1, next_token)), dim=1)\n            output = torch.cat((output, next_token), dim=1)\n            prev = next_token\n\n            # stop early once every sequence in the batch has emitted eos_token\n            if_end[next_token.view(-1).eq(eos_token)] = True\n            if if_end.all(): break\n    return output, probability\n\n\ndef run_model():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--model-path', type=str, help='pretrained model path to local checkpoint')\n    parser.add_argument(\"--seed\", type=int, default=0)\n    parser.add_argument(\"--nsamples\", type=int, default=1)\n    parser.add_argument(\"--batch_size\", type=int, default=1)\n    parser.add_argument(\"--length\", type=int, default=-1)\n    parser.add_argument(\"--temperature\", type=float, default=0.95)\n    parser.add_argument('--top_p', type=float, default=0.95)\n    parser.add_argument('--top_k', type=int, default=100)\n    parser.add_argument('--data-dir', type=str, default='data')\n    parser.add_argument('--out-dir', type=str, default='out')\n\n    parser.add_argument('--data_type', type=str, default='t1', choices=['t' + str(i) for i in range(9)], help=\"t: type\")\n    parser.add_argument('--model_type', type=str, default='cvae', choices=['cvae', 'ae_vae_fusion'])\n    parser.add_argument('--dataset', type=str, default='wi', choices=['wp', 'wi'], help=\"Dataset to use for training\")\n\n    # use GPU\n    parser.add_argument('--gpu', default=2, type=int)\n    parser.add_argument('--no_gpu', action=\"store_true\")\n\n    parser.add_argument('--add_input', action=\"store_true\")\n    parser.add_argument('--add_attn', action=\"store_true\")\n    parser.add_argument('--add_softmax', action=\"store_true\")\n    parser.add_argument('--attn_proj_vary', action=\"store_true\")\n\n    parser.add_argument('--learn_prior', action=\"store_true\")\n\n    args = 
parser.parse_args('--model-path out/wi.1.proj_vary_cyc_cvae/model_0030000.pt '\n                             '--add_input --learn_prior '.split())\n    print(args)\n\n    if args.model_type == 'cvae':\n        args.learn_prior = True\n    else:\n        args.learn_prior = False\n\n    # GPU\n    if not torch.cuda.is_available(): args.no_gpu = True\n    gpu = not args.no_gpu\n    if gpu: torch.cuda.set_device(args.gpu)\n    device = torch.device(args.gpu if gpu else \"cpu\")\n\n    # randomness\n    np.random.seed(args.seed)\n    prng = np.random.RandomState()\n    torch.random.manual_seed(args.seed)\n    if gpu: torch.cuda.manual_seed(args.seed)\n\n    if args.batch_size == -1:\n        args.batch_size = 1\n    assert args.nsamples % args.batch_size == 0\n\n    # logging\n    save_folder = args.model_path + '.eval/'\n    os.makedirs(save_folder, exist_ok=True)\n    importlib.reload(logging)\n    logging.basicConfig(filename=os.path.join(save_folder, 'eval.log'),\n                        level=logging.INFO, format='%(asctime)s--- %(message)s')\n    logging.info('\\n----------------------------------------------------------------------')\n\n    print('Loading models...')\n    cache_dir = os.path.join(args.out_dir, 'model_cache')\n    os.makedirs(cache_dir, exist_ok=True)\n    # Load pre-trained teacher tokenizer (vocabulary)\n    tokenizer = GPT2Tokenizer.from_pretrained('gpt2', cache_dir=cache_dir)\n    tokenizer.max_len = int(1e12)\n    gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2', cache_dir=cache_dir)\n    print('gpt2_params:', num_params(gpt2_model))  # gpt2: 124439808\n    config = GPT2Config()\n\n    # # add special tokens\n    # special_tokens_dict = {\n    #     'pad_token': '<|startoftext|>',\n    #     'cls_token': '<|startofcond|>',\n    #     'sep_token': '<|sepofcond|>',\n    #     'mask_token': '<|endofcond|>'\n    # }\n    # num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)\n    # print('We have added', num_added_toks, 'special 
tokens')\n    # # Notice: resize_token_embeddings expect to receive the full size of the new vocab\n    # gpt2_model.resize_token_embeddings(len(tokenizer))\n    # assert tokenizer.pad_token == '<|startoftext|>'\n\n    VAE = VAEModel(config, add_input=args.add_input, add_attn=args.add_attn, add_softmax=args.add_softmax,\n                   attn_proj_vary=args.attn_proj_vary, learn_prior=args.learn_prior)\n    init_para_frompretrained(VAE.transformer, gpt2_model.transformer, share_para=True)\n    init_para_frompretrained(VAE.encoder, gpt2_model.transformer, share_para=False)\n    if args.learn_prior:\n        init_para_frompretrained(VAE.encoder_prior, VAE.encoder, share_para=True)\n        VAE.encoder_prior.averageSelfAttention.attention_weights = VAE.encoder.averageSelfAttention.attention_weights\n    VAE.lm_head.weight = gpt2_model.lm_head.weight\n    if VAE.add_softmax:\n        VAE.lm_head_rep = Conv1D(*gpt2_model.lm_head.weight.size())\n        # VAE.lm_head_rep = LM_head_rep(*gpt2_model.lm_head.weight.size()[::-1])\n    print('VAE_params:', num_params(VAE))  # 286694400\n    args.load = args.model_path\n    if args.load:\n        print('Loading model weights...')\n        state = torch.load(os.path.join(args.load), map_location='cpu')\n        if 'module' in list(state.keys())[0]:  # model_path is data parallel model with attr 'module'\n            state_copy = copy.copy(state)\n            keys = state_copy.keys()\n            for k in keys:\n                state[k.replace('module.', '')] = state.pop(k)\n        VAE.load_state_dict(state)\n        gc.collect()\n    print('Model loaded.')\n\n    print('Setup data...')\n    seq_len = VAE.config.n_ctx\n    test_loader = prepare_dataset(\n        args.data_dir, args.dataset, tokenizer,\n        1, seq_len, 1, seq_len, args.batch_size, seq_len,\n        make_train=False, make_val=False, make_test=True, data_type=args.data_type\n    )[0]\n    print('Done.')\n\n    VAE.eval() # be careful about VAE.eval() vs 
VAE.train()\n    VAE.to(device)\n    loss_fn = nn.CrossEntropyLoss(reduction='none')\n\n    logging.info('\\n----------------------------------------------------------------------')\n    logging.info(\"Testing loop. batches: %d\" % len(test_loader))\n\n    endoftext = tokenizer.convert_tokens_to_ids(\"<|endoftext|>\")\n    startofcond = tokenizer.convert_tokens_to_ids(\"<|startofcond|>\")\n    endofcond = tokenizer.convert_tokens_to_ids(\"<|endofcond|>\")\n\n    n_samples = 0\n    bleu4_sum = 0.0\n    rouge_scores_values_sum = [0.0] * 9\n\n    model_type = args.model_type\n\n    # test_iter = iter(test_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(test_iter)\n    with tqdm(total=len(test_loader)) as pbar:\n        for i_test, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(test_loader):\n\n            length = args.length\n            if length == -1:\n                length = VAE.config.n_ctx - x_tokens.size(1) - 1\n            elif length > VAE.config.n_ctx - x_tokens.size(1) - 1:\n                raise ValueError(\"Can't get samples longer than window size: %s\" % VAE.config.n_ctx)\n\n            eff_samples = []\n            n, l = target_tokens.size()\n            storys = [tokenizer.decode(target_tokens[i, :]) for i in range(n)]\n            storys = [s[s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\"):] for s in storys]\n            storys_str = [s[:s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\")] if \"<|endoftext|>\" in s else s for s in storys]\n\n            for _ in range(args.nsamples // args.batch_size):\n                # model, batch_size, temperature, top_k, top_p, eos_token, sample = VAE, args.batch_size, args.temperature, args.top_k, args.top_p, tokenizer.encoder['<|endoftext|>'], True\n                out, _ = sample_sequence(\n                    model=VAE,\n                    tokenizer=tokenizer,\n                    length=length,\n                    
batch_size=args.batch_size,\n                    x_mask=x_mask,\n                    x_tokens=x_tokens,\n                    y_mask=y_mask,\n                    y_tokens=y_tokens,\n                    temperature=args.temperature,\n                    top_k=args.top_k,\n                    top_p=args.top_p,\n                    device=device,\n                    eos_token=tokenizer.encoder['<|endoftext|>'],\n                    model_type=model_type\n                )\n                out = out.tolist()\n\n                # extract story, check metrics\n                for i in range(len(out)):\n                    text = out[i]\n                    text = text[text.index(endoftext) + 1:]\n\n                    if endoftext in text:\n                        idx = text.index(endoftext)\n                        text = text[:idx]\n\n                    text = tokenizer.decode(text).strip()\n\n                    # repetition score for one long text; values above 0.075 usually indicate repetition\n                    # rep_score = repeat_score(text.split(), ngram=[3, 4, 5, 6, 7, 8])\n                    # if rep_score > 0.075:\n                    #     # print(rep_score)\n                    #     continue\n\n                    try:\n                        # check bleu (hypothesis tokenized the same way as the reference)\n                        bleu4 = sentence_bleu([storys_str[i].split()], text.split(), smoothing_function=SmoothingFunction().method7)\n\n                        # check rouge\n                        rouge = Rouge()\n                        rouge_scores = rouge.get_scores(text, storys_str[i])\n                        rouge_scores_values = [v for k in rouge_scores[0].keys() for v in rouge_scores[0][k].values()]\n\n                        bleu4_sum += bleu4\n                        rouge_scores_values_sum = [v1 + v2 for v1, v2 in zip(rouge_scores_values_sum, rouge_scores_values)]\n                        n_samples += 1\n                    except Exception:  # e.g. Rouge fails on an empty hypothesis\n                        bleu4 = 0.0\n                        rouge_scores = 
[{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},\n                                         'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},\n                                         'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}]\n\n                    eff_samples.append((text, bleu4, rouge_scores))\n\n                # write samples to file\n                samples_file = open(save_folder + 'batch-' + '%04d' % i_test + '.txt', 'w', encoding='utf8')\n                for i in range(len(eff_samples)):\n                    samples_file.write(\"=\" * 50 + \" SAMPLE \" + str(i) + \" \" + \"=\" * 50)\n                    samples_file.write('\\n' * 2)\n\n                    samples_file.write(\"=\" * 40 + \" Outlines  \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(tokenizer.decode(x_tokens[i, :][x_mask[i, :] == 1].tolist()))\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(\"=\" * 40 + \" Story \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(storys_str[i])\n                    samples_file.write('\\n' * 2)\n\n                    samples_file.write(\"=\" * 40 + \" Generated \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(eff_samples[i][0])\n                    samples_file.write('\\n' * 4)\n                    samples_file.flush()\n                samples_file.close()\n\n                logging.info('batch %04d finished.', i_test)\n                pbar.update(1)\n\n    print('Test complete with %05d samples.' 
% n_samples)\n    logging.info(\"Test complete with %05d samples.\", n_samples)\n\n    bleu4 = round(bleu4_sum / n_samples, 3)\n    rouge_scores_values = [round(r / n_samples, 3) for r in rouge_scores_values_sum]\n    print(' bleu-4:', bleu4)\n    print(' rouge :', rouge_scores_values)\n    logging.info(' bleu-4: %f', bleu4)\n    logging.info(' rouge : %s', str(rouge_scores_values))\n\n\nif __name__ == '__main__':\n    run_model()\n"
  },
  {
    "path": "model.py",
    "content": "from __future__ import absolute_import, division, print_function, unicode_literals\n\nimport collections\nimport json\nimport logging\nimport math\nimport os\n\nimport torch\nimport torch.nn as nn\nfrom torch.nn import CrossEntropyLoss\nfrom torch.nn.parameter import Parameter\nimport torch.nn.functional as F\n\nimport copy\n\nfrom transformers.modeling_utils import PreTrainedModel, Conv1D, prune_conv1d_layer, SequenceSummary\nfrom transformers.modeling_gpt2 import *\nfrom transformers.modeling_bert import gelu\nfrom transformers.configuration_gpt2 import GPT2Config\nfrom transformers.file_utils import add_start_docstrings\n\n\n####################### auxiliary attention blocks #######################\nclass Unmasked_Attention(Attention):\n    def _attn(self, q, k, v, attention_mask=None, head_mask=None):\n        w = torch.matmul(q, k)\n        if self.scale:\n            w = w / math.sqrt(v.size(-1))\n\n        if attention_mask is not None:\n            # Apply the attention mask\n            w = w + attention_mask\n\n        w = nn.Softmax(dim=-1)(w)\n        w = self.attn_dropout(w)\n\n        # Mask heads if we want to\n        if head_mask is not None:\n            w = w * head_mask\n\n        outputs = [torch.matmul(w, v)]\n        if self.output_attentions:\n            outputs.append(w)\n        return outputs\n\n\nclass Unmasked_Block(Block):\n    def __init__(self, n_ctx, config, scale=False):\n        super(Block, self).__init__()\n        nx = config.n_embd\n        self.ln_1 = nn.LayerNorm(nx, eps=config.layer_norm_epsilon)\n        self.attn = Unmasked_Attention(nx, n_ctx, config, scale)\n        self.ln_2 = nn.LayerNorm(nx, eps=config.layer_norm_epsilon)\n        self.mlp = MLP(4 * nx, config)\n\n\nclass AverageSelfAttention(nn.Module):\n    def __init__(self, attention_size):\n        super(AverageSelfAttention, self).__init__()\n        w = torch.empty(attention_size)\n        nn.init.normal_(w, std=0.02)\n        
self.attention_weights = nn.Parameter(w)\n        self.softmax = nn.Softmax(dim=-1)\n        self.non_linearity = gelu\n\n    def forward(self, inputs, attention_mask=None):\n\n        ##################################################################\n        # STEP 1 - perform dot product\n        # of the attention vector and each hidden state\n        ##################################################################\n\n        # inputs is a 3D Tensor: batch, len, hidden_size\n        # scores is a 2D Tensor: batch, len\n        scores = self.non_linearity(inputs.matmul(self.attention_weights))\n\n        ##################################################################\n        # Step 2 - Masking\n        ##################################################################\n\n        if attention_mask is not None:\n            scores = scores + attention_mask\n\n        ##################################################################\n        # Step 3 - Weighted sum of hidden states, by the attention scores\n        ##################################################################\n        scores = self.softmax(scores)\n\n        # multiply each hidden state with the attention weights\n        weighted = torch.mul(inputs, scores.unsqueeze(-1).expand_as(inputs))\n\n        # sum the hidden states\n        representations = weighted.sum(1).squeeze(1)\n\n        return representations, scores\n\n\n# Pseudo self-attention\nclass Cond_Attention(Attention):\n    def __init__(self, nx, n_ctx, config, scale=False):\n        super(Attention, self).__init__()\n        self.output_attentions = config.output_attentions\n\n        n_state = nx  # in Attention: n_state=768 (nx=n_embd)\n        # [switch nx => n_state from Block to Attention to keep identical to TF implem]\n        assert n_state % config.n_head == 0\n        self.register_buffer(\"bias\", torch.tril(torch.ones(n_ctx, n_ctx)).view(1, 1, n_ctx, n_ctx))\n        self.n_head = config.n_head\n        
self.split_size = n_state\n        self.scale = scale\n\n        self.c_attn = Conv1D(n_state * 3, nx)\n        self.c_proj = Conv1D(n_state, nx)\n        self.attn_dropout = nn.Dropout(config.attn_pdrop)\n        self.resid_dropout = nn.Dropout(config.resid_pdrop)\n        self.pruned_heads = set()\n\n        # add code here\n        self.c_z = Conv1D(n_state * 2, nx)\n\n    def _attn(self, q, k, v, attention_mask=None, head_mask=None):\n        w = torch.matmul(q, k)\n        if self.scale:\n            w = w / math.sqrt(v.size(-1))\n        nd, ns = w.size(-2), w.size(-1)\n        b = self.bias[:, :, ns - nd : ns, :ns]\n        w = w * b - 1e4 * (1 - b)\n\n        if attention_mask is not None:\n            # add code here: w size has been bsz * n_heads * L * (L+1), mask bsz * 1 * 1 * L\n            assert attention_mask.size()[-1] == w.size()[-1] - 1\n            zeros = torch.zeros(attention_mask.size()[:-1], device=attention_mask.device, dtype=attention_mask.dtype).unsqueeze(-1)\n            attention_mask = torch.cat((zeros, attention_mask), dim=-1)\n\n            # Apply the attention mask\n            w = w + attention_mask\n\n        w = nn.Softmax(dim=-1)(w)\n        w = self.attn_dropout(w)\n\n        # Mask heads if we want to\n        if head_mask is not None:\n            w = w * head_mask\n\n        outputs = [torch.matmul(w, v)]\n        if self.output_attentions:\n            outputs.append(w)\n        return outputs\n\n    def forward(self, x, z, layer_past=None, attention_mask=None, head_mask=None):\n        x = self.c_attn(x)\n        query, key, value = x.split(self.split_size, dim=2)\n        query = self.split_heads(query)\n        key = self.split_heads(key, k=True)\n        value = self.split_heads(value)\n        if layer_past is not None:\n            past_key, past_value = layer_past[0].transpose(-2, -1), layer_past[1]  # transpose back cf below\n            key = torch.cat((past_key, key), dim=-1)\n            value = 
torch.cat((past_value, value), dim=-2)\n        present = torch.stack((key.transpose(-2, -1), value))  # transpose to have same shapes for stacking\n\n        z_conv = self.c_z(z)\n        key_z, value_z = z_conv.split(self.split_size, dim=2)\n        key_z = self.split_heads(key_z, k=True)\n        value_z = self.split_heads(value_z)\n        key = torch.cat((key_z, key), dim=-1)\n        value = torch.cat((value_z, value), dim=-2)\n\n        attn_outputs = self._attn(query, key, value, attention_mask, head_mask)\n        a = attn_outputs[0]\n\n        a = self.merge_heads(a)\n        a = self.c_proj(a)\n        a = self.resid_dropout(a)\n\n        outputs = [a, present] + attn_outputs[1:]\n        return outputs  # a, present, (attentions)\n\n\nclass Cond_Block(Block):\n    def __init__(self, n_ctx, config, scale=False):\n        super(Block, self).__init__()\n        nx = config.n_embd\n        self.ln_1 = nn.LayerNorm(nx, eps=config.layer_norm_epsilon)\n        self.attn = Cond_Attention(nx, n_ctx, config, scale)\n        self.ln_2 = nn.LayerNorm(nx, eps=config.layer_norm_epsilon)\n        self.mlp = MLP(4 * nx, config)\n\n    def forward(self, x, z, layer_past=None, attention_mask=None, head_mask=None):\n        output_attn = self.attn(\n            self.ln_1(x), z, layer_past=layer_past, attention_mask=attention_mask, head_mask=head_mask\n        )\n        a = output_attn[0]  # output_attn: a, present, (attentions)\n\n        x = x + a\n        m = self.mlp(self.ln_2(x))\n        x = x + m\n\n        outputs = [x] + output_attn[1:]\n        return outputs  # x, present, (attentions)\n\n\n####################### transformer-based vae #######################\nclass Encoder(GPT2Model):\n    def __init__(self, config):\n        super(GPT2Model, self).__init__(config)\n        self.output_hidden_states = config.output_hidden_states\n        self.output_attentions = config.output_attentions\n        self.output_past = config.output_past\n\n        self.wte = 
nn.Embedding(config.vocab_size, config.n_embd)\n        self.wpe = nn.Embedding(config.n_positions, config.n_embd)\n        self.drop = nn.Dropout(config.embd_pdrop)\n\n        # manually modify number of layers in encoder to accommodate GPU memory\n        n = 6  # config.n_layer\n        self.h = nn.ModuleList([Unmasked_Block(config.n_ctx, config, scale=True) for _ in range(n)])\n\n        self.ln_f = nn.LayerNorm(config.n_embd, eps=config.layer_norm_epsilon)\n\n        self.init_weights()\n\n        # added code here\n        self.averageSelfAttention = AverageSelfAttention(config.n_embd)\n        nx = config.n_embd\n        nz = config.n_embd\n        self.mean = Conv1D(nz, nx)\n        self.logvar = Conv1D(nz, nx)\n\n    def forward(\n            self,\n            input_ids=None,\n            past=None,\n            attention_mask=None,\n            token_type_ids=None,\n            position_ids=None,\n            head_mask=None,\n            inputs_embeds=None,\n    ):\n        if input_ids is not None and inputs_embeds is not None:\n            raise ValueError(\"You cannot specify both input_ids and inputs_embeds at the same time\")\n        elif input_ids is not None:\n            input_shape = input_ids.size()\n            input_ids = input_ids.view(-1, input_shape[-1])\n        elif inputs_embeds is not None:\n            input_shape = inputs_embeds.size()[:-1]\n        else:\n            raise ValueError(\"You have to specify either input_ids or inputs_embeds\")\n\n        if token_type_ids is not None:\n            token_type_ids = token_type_ids.view(-1, input_shape[-1])\n        if position_ids is not None:\n            position_ids = position_ids.view(-1, input_shape[-1])\n\n        if past is None:\n            past_length = 0\n            past = [None] * len(self.h)\n        else:\n            past_length = past[0][0].size(-2)\n        if position_ids is None:\n            device = input_ids.device if input_ids is not None else 
inputs_embeds.device\n            position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)\n            position_ids = position_ids.unsqueeze(0).view(-1, input_shape[-1])\n\n        # Attention mask.\n        if attention_mask is not None:\n            attention_mask = attention_mask.view(-1, input_shape[-1])\n            # We create a 4D attention mask from a 2D tensor mask.\n            # Sizes are [batch_size, 1, 1, to_seq_length]\n            # So we can broadcast to [batch_size, num_heads, from_seq_length, to_seq_length]\n            # this attention mask is simpler than the triangular masking of causal attention\n            # used in OpenAI GPT, we just need to prepare the broadcast dimension here.\n            attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)\n\n            # Since attention_mask is 1.0 for positions we want to attend and 0.0 for\n            # masked positions, this operation will create a tensor which is 0.0 for\n            # positions we want to attend and -10000.0 for masked positions.\n            # Since we are adding it to the raw scores before the softmax, this is\n            # effectively the same as removing these entirely.\n            attention_mask = attention_mask.to(dtype=next(self.parameters()).dtype)  # fp16 compatibility\n            attention_mask = (1.0 - attention_mask) * -10000.0\n\n        # Prepare head mask if needed\n        # 1.0 in head_mask indicates we keep the head\n        # attention_probs has shape bsz x n_heads x N x N\n        # head_mask has shape n_layer x batch x n_heads x N x N\n        if head_mask is not None:\n            if head_mask.dim() == 1:\n                head_mask = head_mask.unsqueeze(0).unsqueeze(0).unsqueeze(-1).unsqueeze(-1)\n                head_mask = head_mask.expand(self.config.n_layer, -1, -1, -1, -1)\n            elif head_mask.dim() == 2:\n                head_mask = (\n                    
head_mask.unsqueeze(1).unsqueeze(-1).unsqueeze(-1)\n                )  # We can specify head_mask for each layer\n            head_mask = head_mask.to(\n                dtype=next(self.parameters()).dtype\n            )  # switch to float if needed + fp16 compatibility\n        else:\n            head_mask = [None] * self.config.n_layer\n\n        if inputs_embeds is None:\n            inputs_embeds = self.wte(input_ids)\n        position_embeds = self.wpe(position_ids)\n        if token_type_ids is not None:\n            token_type_embeds = self.wte(token_type_ids)\n        else:\n            token_type_embeds = 0\n        hidden_states = inputs_embeds + position_embeds + token_type_embeds\n        hidden_states = self.drop(hidden_states)\n\n        output_shape = input_shape + (hidden_states.size(-1),)\n\n        presents = ()\n        all_attentions = []\n        all_hidden_states = ()\n        for i, (block, layer_past) in enumerate(zip(self.h, past)):\n            if self.output_hidden_states:\n                all_hidden_states = all_hidden_states + (hidden_states.view(*output_shape),)\n\n            outputs = block(\n                hidden_states, layer_past=layer_past, attention_mask=attention_mask, head_mask=head_mask[i]\n            )\n\n            hidden_states, present = outputs[:2]\n            if self.output_past:\n                presents = presents + (present,)\n\n            if self.output_attentions:\n                all_attentions.append(outputs[2])\n\n        hidden_states = self.ln_f(hidden_states)\n\n        hidden_states = hidden_states.view(*output_shape)\n        # Add last hidden state\n        if self.output_hidden_states:\n            all_hidden_states = all_hidden_states + (hidden_states,)\n\n        # added code here: pool the hidden states (mask is optional) into mean/logvar\n        representations, _ = self.averageSelfAttention(hidden_states, attention_mask.squeeze(1).squeeze(1) if attention_mask is not None else None)\n        mean = self.mean(representations)\n        logvar = self.logvar(representations)\n\n        outputs = 
(mean, logvar, hidden_states,)\n        if self.output_past:\n            outputs = outputs + (presents,)\n        if self.output_hidden_states:\n            outputs = outputs + (all_hidden_states,)\n        if self.output_attentions:\n            # let the number of heads free (-1) so we can extract attention even after head pruning\n            attention_output_shape = input_shape[:-1] + (-1,) + all_attentions[0].shape[-2:]\n            all_attentions = tuple(t.view(*attention_output_shape) for t in all_attentions)\n            outputs = outputs + (all_attentions,)\n\n        return outputs  # mean, logvar, last hidden state, (presents), (all hidden_states), (attentions)\n\n\nclass Decoder(GPT2Model):\n    def __init__(self, config, add_input=False, add_attn=False, attn_proj_vary=False):\n        super(GPT2Model, self).__init__(config)\n\n        # added code here\n        self.add_input = add_input\n        self.add_attn = add_attn\n        self.attn_proj_vary = attn_proj_vary\n\n        self.output_hidden_states = config.output_hidden_states\n        self.output_attentions = config.output_attentions\n        self.output_past = config.output_past\n\n        self.wte = nn.Embedding(config.vocab_size, config.n_embd)\n        self.wpe = nn.Embedding(config.n_positions, config.n_embd)\n        self.drop = nn.Dropout(config.embd_pdrop)\n\n        if self.add_input:\n            nz = config.n_embd\n            nx = config.n_embd\n            self.input_proj = nn.Linear(nz, nx, bias=False)\n\n        if self.add_attn:\n            nz = config.n_embd\n            nx = config.n_embd\n            n = config.n_layer\n\n            if self.attn_proj_vary:\n                self.attn_proj = nn.Linear(nz, nx * n, bias=False)\n            else:\n                self.attn_proj = nn.Linear(nz, nx, bias=False)\n\n            self.h = nn.ModuleList([Cond_Block(config.n_ctx, config, scale=True) for _ in range(config.n_layer)])\n        else:\n            self.h = 
nn.ModuleList([Block(config.n_ctx, config, scale=True) for _ in range(config.n_layer)])\n        self.ln_f = nn.LayerNorm(config.n_embd, eps=config.layer_norm_epsilon)\n\n        self.init_weights()\n\n    def forward(\n            self,\n            input_ids=None,\n            past=None,\n            attention_mask=None,\n            token_type_ids=None,\n            position_ids=None,\n            head_mask=None,\n            inputs_embeds=None,\n            representations=None\n    ):\n        if input_ids is not None and inputs_embeds is not None:\n            raise ValueError(\"You cannot specify both input_ids and inputs_embeds at the same time\")\n        elif input_ids is not None:\n            input_shape = input_ids.size()\n            input_ids = input_ids.view(-1, input_shape[-1])\n        elif inputs_embeds is not None:\n            input_shape = inputs_embeds.size()[:-1]\n        else:\n            raise ValueError(\"You have to specify either input_ids or inputs_embeds\")\n\n        if token_type_ids is not None:\n            token_type_ids = token_type_ids.view(-1, input_shape[-1])\n        if position_ids is not None:\n            position_ids = position_ids.view(-1, input_shape[-1])\n\n        if past is None:\n            past_length = 0\n            past = [None] * len(self.h)\n        else:\n            past_length = past[0][0].size(-2)\n        if position_ids is None:\n            device = input_ids.device if input_ids is not None else inputs_embeds.device\n            position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)\n            position_ids = position_ids.unsqueeze(0).view(-1, input_shape[-1])\n\n        # Attention mask.\n        if attention_mask is not None:\n            attention_mask = attention_mask.view(-1, input_shape[-1])\n            # We create a 4D attention mask from a 2D tensor mask.\n            # Sizes are [batch_size, 1, 1, to_seq_length]\n            # So we can 
broadcast to [batch_size, num_heads, from_seq_length, to_seq_length]\n            # this attention mask is simpler than the triangular masking of causal attention\n            # used in OpenAI GPT, we just need to prepare the broadcast dimension here.\n            attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)\n\n            # Since attention_mask is 1.0 for positions we want to attend and 0.0 for\n            # masked positions, this operation will create a tensor which is 0.0 for\n            # positions we want to attend and -10000.0 for masked positions.\n            # Since we are adding it to the raw scores before the softmax, this is\n            # effectively the same as removing these entirely.\n            attention_mask = attention_mask.to(dtype=next(self.parameters()).dtype)  # fp16 compatibility\n            attention_mask = (1.0 - attention_mask) * -10000.0\n\n        # Prepare head mask if needed\n        # 1.0 in head_mask indicates we keep the head\n        # attention_probs has shape bsz x n_heads x N x N\n        # head_mask has shape n_layer x batch x n_heads x N x N\n        if head_mask is not None:\n            if head_mask.dim() == 1:\n                head_mask = head_mask.unsqueeze(0).unsqueeze(0).unsqueeze(-1).unsqueeze(-1)\n                head_mask = head_mask.expand(self.config.n_layer, -1, -1, -1, -1)\n            elif head_mask.dim() == 2:\n                head_mask = (\n                    head_mask.unsqueeze(1).unsqueeze(-1).unsqueeze(-1)\n                )  # We can specify head_mask for each layer\n            head_mask = head_mask.to(\n                dtype=next(self.parameters()).dtype\n            )  # switch to float if needed + fp16 compatibility\n        else:\n            head_mask = [None] * self.config.n_layer\n\n        if inputs_embeds is None:\n            inputs_embeds = self.wte(input_ids)\n        position_embeds = self.wpe(position_ids)\n        if token_type_ids is not None:\n            
token_type_embeds = self.wte(token_type_ids)\n        else:\n            token_type_embeds = 0\n        hidden_states = inputs_embeds + position_embeds + token_type_embeds\n\n        # add code here\n        if self.add_input:\n            assert (representations is not None)\n            input_proj = self.input_proj(representations).unsqueeze(1)\n            hidden_states = hidden_states + input_proj\n\n        hidden_states = self.drop(hidden_states)\n\n        output_shape = input_shape + (hidden_states.size(-1),)\n\n        # add code here\n        if self.add_attn:\n            assert (representations is not None)\n            attn_proj = self.attn_proj(representations).unsqueeze(1)\n            if self.attn_proj_vary:\n                attn_proj = attn_proj.split(hidden_states.size(-1), dim=-1)\n                assert len(attn_proj) == len(self.h)\n\n        presents = ()\n        all_attentions = []\n        all_hidden_states = ()\n        for i, (block, layer_past) in enumerate(zip(self.h, past)):\n            if self.output_hidden_states:\n                all_hidden_states = all_hidden_states + (hidden_states.view(*output_shape),)\n\n            if self.add_attn:\n                if self.attn_proj_vary:\n                    z = attn_proj[i]\n                else:\n                    z = attn_proj\n                outputs = block(\n                    hidden_states, z, layer_past=layer_past, attention_mask=attention_mask, head_mask=head_mask[i]\n                )\n            else:\n                outputs = block(\n                    hidden_states, layer_past=layer_past, attention_mask=attention_mask, head_mask=head_mask[i]\n                )\n\n            hidden_states, present = outputs[:2]\n            if self.output_past:\n                presents = presents + (present,)\n\n            if self.output_attentions:\n                all_attentions.append(outputs[2])\n\n        hidden_states = self.ln_f(hidden_states)\n\n        hidden_states = 
hidden_states.view(*output_shape)\n        # Add last hidden state\n        if self.output_hidden_states:\n            all_hidden_states = all_hidden_states + (hidden_states,)\n\n        outputs = (hidden_states,)\n        if self.output_past:\n            outputs = outputs + (presents,)\n        if self.output_hidden_states:\n            outputs = outputs + (all_hidden_states,)\n        if self.output_attentions:\n            # leave the number of heads free (-1) so we can extract attention even after head pruning\n            attention_output_shape = input_shape[:-1] + (-1,) + all_attentions[0].shape[-2:]\n            all_attentions = tuple(t.view(*attention_output_shape) for t in all_attentions)\n            outputs = outputs + (all_attentions,)\n\n        return outputs  # last hidden state, (presents), (all hidden_states), (attentions)\n\n\nclass LM_head_rep(nn.Module):\n    def __init__(self, in_dim=768, out_dim=50257):\n        super().__init__()\n\n        self.Nu_fc1 = nn.Linear(in_dim, 1024)\n        self.Nu_fc2 = nn.Linear(1024, out_dim)\n\n    def forward(self, z):\n        z = F.leaky_relu(self.Nu_fc1(z))\n        z = self.Nu_fc2(z)\n        return z\n\n\nclass VAEModel(GPT2LMHeadModel):\n    def __init__(self, config, add_input=False, add_attn=False, add_softmax=False, attn_proj_vary=False, learn_prior=False):\n        super(GPT2LMHeadModel, self).__init__(config)\n\n        # where the latent z is injected into the decoder (input, attention, softmax) and whether the prior is learned\n        self.add_input = add_input\n        self.add_attn = add_attn\n        self.add_softmax = add_softmax\n        self.attn_proj_vary = attn_proj_vary\n        self.learn_prior = learn_prior\n\n        self.transformer = Decoder(config, add_input, add_attn, attn_proj_vary)\n        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)\n\n        self.encoder = Encoder(config)\n        if self.learn_prior:\n            self.encoder_prior = Encoder(config)\n\n        if self.add_softmax:\n            nz = config.n_embd\n            
self.lm_head_rep = Conv1D(config.vocab_size, nz)\n            # self.lm_head_rep = LM_head_rep(nz, config.vocab_size)\n\n    def reparameterize(self, mean, logvar, z=None):\n        # reparameterization trick: z = mean + std * eps, with eps ~ N(0, I) unless a noise tensor z is supplied\n        std = logvar.mul(0.5).exp()\n        if z is None:\n            z = torch.randn(std.size(), device=mean.device, dtype=mean.dtype)\n        return z.mul(std) + mean\n\n    def kl_loss(self, mean1, logvar1, mean2, logvar2):\n        # KL(N(mean1, exp(logvar1)) || N(mean2, exp(logvar2))), summed over latent dims, averaged over the batch\n        exponential = logvar1 - logvar2 - torch.pow(mean1 - mean2, 2) / logvar2.exp() - torch.exp(logvar1 - logvar2) + 1\n        result = -0.5 * torch.sum(exponential, tuple(range(1, len(exponential.shape))))\n        return result.mean()\n\n    def forward(\n        self,\n        input_ids=None,\n        past=None,\n        attention_mask=None,\n        token_type_ids=None,\n        position_ids=None,\n        head_mask=None,\n        inputs_embeds=None,\n        labels=None,\n        x_mask=None,\n        x_tokens=None,\n        y_mask=None,\n        y_tokens=None,\n        from_prior=False,\n        from_mean=False\n    ):\n        # latent representation\n        posterior_mean, posterior_logvar = self.encoder(input_ids=y_tokens, attention_mask=y_mask)[:2]\n\n        if self.learn_prior:\n            prior_mean, prior_logvar = self.encoder_prior(input_ids=x_tokens, attention_mask=x_mask)[:2]\n        else:\n            # no learned prior: use the fixed standard-normal prior N(0, I)\n            prior_mean = prior_logvar = torch.zeros([input_ids.size(0), self.config.n_embd], device=input_ids.device)\n            prior_mean, prior_logvar = prior_mean.to(posterior_mean.dtype), prior_logvar.to(posterior_logvar.dtype)\n\n        if from_prior:\n            latent_mean, latent_logvar = prior_mean, prior_logvar\n        else:\n            latent_mean, latent_logvar = posterior_mean, posterior_logvar\n\n        if from_mean:\n            z = latent_mean\n        else:\n            z = self.reparameterize(latent_mean, latent_logvar)\n        assert not torch.isnan(z).any(), 'got NaN in latent z'\n\n        transformer_outputs = 
self.transformer(input_ids,\n                                               past=past,\n                                               attention_mask=attention_mask,\n                                               token_type_ids=token_type_ids,\n                                               position_ids=position_ids,\n                                               head_mask=head_mask,\n                                               inputs_embeds=inputs_embeds,\n                                               representations=z)\n        hidden_states = transformer_outputs[0]\n        lm_logits = self.lm_head(hidden_states)\n        if self.add_softmax:\n            lm_logits_rep = self.lm_head_rep(z)\n            lm_logits = lm_logits + lm_logits_rep.unsqueeze(dim=1)\n        outputs = (lm_logits,) + transformer_outputs[1:]\n\n        # kl_loss\n        kl_loss = self.kl_loss(posterior_mean, posterior_logvar, prior_mean, prior_logvar).unsqueeze(0)\n        outputs = outputs + (kl_loss,)\n\n        return outputs  # lm_logits, presents, (all hidden_states), (attentions), (kl_loss)\n"
  },
  {
    "path": "train.py",
    "content": "import os, time, gc, json, pickle, argparse, math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.utils.data as data\nfrom torch.nn import DataParallel\nimport torch.distributed as dist\nimport torch.multiprocessing as mp\nimport numpy as np\nimport transformers\nfrom transformers import GPT2Tokenizer, GPT2LMHeadModel, GPT2Config, AdamW, get_linear_schedule_with_warmup, Conv1D\nfrom tensorboardX import SummaryWriter\nfrom tqdm import tqdm\nimport importlib\nimport logging\nimport copy\n\nfrom apex.optimizers import FusedAdam\nfrom apex import amp\nfrom apex.fp16_utils import FP16_Optimizer\n\nfrom data.util import *\nfrom util import *\n\nfrom model import *\n\nfrom collections import Counter\nfrom nltk.translate.bleu_score import sentence_bleu\nfrom nltk.translate.bleu_score import SmoothingFunction\nfrom rouge import Rouge\n\nfrom sklearn.manifold import TSNE\nimport matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt\n\ndevices = '0'\nos.environ[\"CUDA_VISIBLE_DEVICES\"] = devices\n\n\ndef compute_loss(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask, loss_fn, beta):\n    input_tokens = input_tokens.to(device)\n    target_tokens = target_tokens.to(device)\n    mask = mask.to(device)\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n    y_mask = y_mask.to(device)\n    y_tokens = y_tokens.to(device)\n\n    outputs = model(input_ids=input_tokens, attention_mask=mask, x_mask=x_mask, x_tokens=x_tokens, y_mask=y_mask,\n                    y_tokens=y_tokens)\n    logits = outputs[0]\n    kl_loss = outputs[-1]\n    num_logits = logits.size(-1)\n\n    # Perform masking\n    if mask is not None:\n        mask = mask.type(torch.bool)\n        mask = mask.to(device)\n        logits = logits.masked_select(mask.unsqueeze(-1))\n        target_tokens = target_tokens.masked_select(mask)\n\n    ce_loss = loss_fn(logits.view(-1, num_logits), 
target_tokens.view(-1))\n    kl_loss = kl_loss.mean()\n    loss = ce_loss.mean() + beta * kl_loss\n\n    return loss, ce_loss, kl_loss\n\n\ndef compute_loss_ae(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask, loss_fn, beta):\n    input_tokens = input_tokens.to(device)\n    target_tokens = target_tokens.to(device)\n    mask = mask.to(device)\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n\n    # AE objective: decode from the posterior mean of x; the KL term is returned for logging only\n    outputs = model(input_ids=input_tokens, attention_mask=mask, y_mask=x_mask, y_tokens=x_tokens, from_mean=True, from_prior=False)\n\n    logits = outputs[0]\n    kl_loss = outputs[-1]\n    num_logits = logits.size(-1)\n\n    # Perform masking\n    if mask is not None:\n        mask = mask.type(torch.bool)\n        mask = mask.to(device)\n        logits = logits.masked_select(mask.unsqueeze(-1))\n        target_tokens = target_tokens.masked_select(mask)\n\n    ce_loss = loss_fn(logits.view(-1, num_logits), target_tokens.view(-1))\n    kl_loss = kl_loss.mean()\n    loss = ce_loss.mean()\n\n    return loss, ce_loss, kl_loss\n\n\ndef train_step(device, model, optimizer, x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask, loss_fn, beta, model_type):\n    output = []\n    if model_type == 'ae_vae_fusion':\n        optimizer.zero_grad()\n        loss, ce_loss, kl_loss = compute_loss_ae(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens,\n                                              target_tokens, mask, loss_fn, beta)\n        with amp.scale_loss(loss, optimizer) as scaled_loss:\n            scaled_loss.backward()\n        # clip after the context manager exits, once amp has unscaled the gradients\n        torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), 5.0)  # max_grad_norm=5.0\n        # loss.backward()\n        # torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0) # max_grad_norm=1.0\n        optimizer.step()\n        output.append((loss.item(), ce_loss.mean().item(), kl_loss.item()))\n\n    optimizer.zero_grad()\n    loss, ce_loss, kl_loss = 
compute_loss(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens,\n                                          target_tokens, mask, loss_fn, beta)\n    with amp.scale_loss(loss, optimizer) as scaled_loss:\n        scaled_loss.backward()\n    # clip after the context manager exits, once amp has unscaled the gradients\n    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), 5.0)  # max_grad_norm=5.0\n    # loss.backward()\n    # torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0) # max_grad_norm=1.0\n    optimizer.step()\n    output.append((loss.item(), ce_loss.mean().item(), kl_loss.item()))\n\n    return output\n\n\ndef top_k_top_p_filtering(logits, top_k=100, top_p=0.95, filter_value=-float('Inf')):\n    \"\"\" Filter a distribution of logits using top-k and/or nucleus (top-p) filtering\n        Args:\n            logits: logits of shape (batch_size, vocabulary size)\n            top_k > 0: keep only top k tokens with highest probability (top-k filtering).\n            top_p > 0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).\n                Nucleus filtering is described in Holtzman et al. 
(http://arxiv.org/abs/1904.09751)\n        From: https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317\n    \"\"\"\n    top_k = min(top_k, logits.size(-1))  # Safety check\n    if top_k > 0:\n        # Remove all tokens with a probability less than the last token of the top-k\n        indices_to_remove = logits < torch.topk(logits, top_k)[0][..., -1, None]\n        logits[indices_to_remove] = filter_value\n\n    if top_p > 0.0:\n        sorted_logits, sorted_indices = torch.sort(logits, descending=True)\n        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)\n\n        # Remove tokens with cumulative probability above the threshold\n        sorted_indices_to_remove = cumulative_probs > top_p\n        # Shift the indices to the right to also keep the first token above the threshold\n        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()\n        sorted_indices_to_remove[..., 0] = 0\n\n        # scatter sorted tensors to original indexing\n        indices_to_remove = sorted_indices_to_remove.scatter(dim=1, index=sorted_indices, src=sorted_indices_to_remove)\n        logits[indices_to_remove] = filter_value\n\n    return logits\n\n\ndef repeat_score(text, ngram=[3, 4, 5, 6]):\n    ngram_list = []\n    for ng in ngram:\n        # all n-grams of length ng: start indices 0 .. len(text) - ng\n        ngram_list.append([text[idx:idx + ng] for idx in range(len(text) - ng + 1)])\n\n    max_occurs = []\n    for ngrams in ngram_list:\n        count_result = Counter([' '.join(n) for n in ngrams])\n        try:\n            max_occurs.append(\n                max(count_result.values())\n            )\n        except ValueError:  # no n-grams of this length (text too short)\n            pass\n\n    scores = [max_oc / ((len(text) / ngram[idx]) + ngram[idx]) for idx, max_oc in enumerate(max_occurs)]\n    return max(scores) if len(scores) >= 1 else 1.0\n\n\ndef sample_sequence(model, tokenizer, length, batch_size=None, x_mask=None, x_tokens=None, y_mask=None, y_tokens=None,\n                    temperature=1, top_k=100, top_p=0.95, 
device='cuda', sample=True, eos_token=None, model_type='cvae'):\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n    y_mask = y_mask.to(device)\n    y_tokens = y_tokens.to(device)\n\n    with torch.no_grad():\n        if model_type == 'cvae':\n            try:\n                prior_mean, prior_logvar = model.encoder_prior(input_ids=x_tokens, attention_mask=x_mask)[:2]\n            except AttributeError:  # model built without learn_prior: use the N(0, I) prior\n                prior_mean = prior_logvar = torch.zeros([batch_size, model.config.n_embd], device=device)\n            latent_mean, latent_logvar = prior_mean, prior_logvar\n            z = model.reparameterize(latent_mean, latent_logvar)\n            assert not torch.isnan(z).any(), 'sampling got NaN in latent z'\n        else:\n            posterior_mean, posterior_logvar = model.encoder(input_ids=x_tokens, attention_mask=x_mask)[:2]\n            latent_mean, latent_logvar = posterior_mean, posterior_logvar\n            z = latent_mean\n            assert not torch.isnan(z).any(), 'sampling got NaN in latent z'\n\n        _, mem = model.transformer(input_ids=x_tokens[:, :-1], past=None, attention_mask=x_mask[:, :-1], representations=z)\n        prev = x_tokens[:, -1].view(batch_size, -1)\n\n        output = prev\n        probability = torch.tensor([], dtype=z.dtype, device=device)\n        if_end = torch.tensor([False] * batch_size, dtype=torch.bool, device=device)\n\n        for i in range(length): #trange\n            logits, mem = model.transformer(input_ids=prev, past=mem, representations=z)\n\n            logits = model.lm_head(logits)\n            if model.add_softmax:\n                logits_rep = model.lm_head_rep(z)\n                logits = logits + logits_rep.unsqueeze(dim=1)\n\n            logits = logits[:, -1, :] / temperature\n            logits = top_k_top_p_filtering(logits, top_k, top_p)\n            probs = F.softmax(logits, dim=-1)\n            if sample:\n                next_token = torch.multinomial(probs, num_samples=1)\n            else:\n              
  _, next_token = torch.topk(probs, k=1, dim=-1)\n\n            probability = torch.cat((probability, probs.gather(1, next_token)), dim=1)\n            output = torch.cat((output, next_token), dim=1)\n            prev = next_token\n\n            # stop early once every sequence in the batch has produced eos_token at least once\n            if_end[next_token.view(-1).eq(eos_token)] = True\n            if if_end.all(): break\n    return output, probability\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('experiment', type=str)\n\n    # Default parameters are set based on single GPU training\n    parser.add_argument('--lr', type=float, default=5e-5)\n    parser.add_argument(\"--seed\", type=int, default=0)\n\n    parser.add_argument('--data_type', type=str, default='t1', choices=['t' + str(i) for i in range(9)], help=\"t: type\")\n    parser.add_argument('--model_type', type=str, default='cvae', choices=['cvae', 'ae_vae_fusion'])\n    parser.add_argument('--iterations', type=int, default=101640 * 4)  # wp 850001  wi 300001 ax 300001 yp 800001\n    parser.add_argument('--dataset', type=str, default='wi', choices=['ax', 'yp', 'wp', 'wi'], help=\"Dataset to use for training\")\n    parser.add_argument('--warmup', type=int, default=10000,\n                        help=\"Number of iterations to warm up, then decay. (-1 for no warmup and decay)\")\n\n    parser.add_argument('--batch-sizes', nargs='+', type=int, default=[1],\n                        help='batch size per GPU. Lists the schedule.')\n    parser.add_argument('--seq-lens', nargs='+', type=int, default=[1024],\n                        help='seq length per sample. 
Lists the schedule.')\n    parser.add_argument('--switch-time', type=float, default=0,\n                        help=\"Fraction (0 to 1) of iterations to spend on short sequence training.\")\n    parser.add_argument('--data-dir', type=str, default='data')\n    parser.add_argument('--out-dir', type=str, default='out')\n    parser.add_argument('--load', type=str, help='path to load model from') # , default='out/test/'\n    parser.add_argument('--workers', default=1, type=int, metavar='N',\n                        help='number of data loading workers')\n    # use GPU\n    parser.add_argument('--gpu', default=0, type=int)\n    parser.add_argument('--no_gpu', action=\"store_true\")\n\n    parser.add_argument('--fp16', action='store_true', help=\"Train using FP16?\")\n    parser.add_argument('--fp16_opt_level', default='O0', type=str, required=False)\n\n    # KL cost annealing, increase beta from beta_0 to 1 in beta_warmup steps\n    parser.add_argument('--beta_0', default=1.00, type=float)\n    parser.add_argument('--beta_warmup', type=int, default=50000)\n    # cyc_vae parameters\n    parser.add_argument('--cycle', type=int, default=101640)\n\n    parser.add_argument('--add_input', action=\"store_true\")\n    parser.add_argument('--add_attn', action=\"store_true\")\n    parser.add_argument('--add_softmax', action=\"store_true\")\n    parser.add_argument('--attn_proj_vary', action=\"store_true\")\n\n    parser.add_argument('--learn_prior', action=\"store_true\")\n\n    # hard-coded arguments for debugging; use parser.parse_args() to read the real command line\n    args = parser.parse_args('test --batch-sizes 1 --seq-lens 1024 '\n                             '--add_input --learn_prior --fp16'.split()) # wi.12.proj_vary_beta_cvae\n\n    if args.model_type == 'cvae':\n        args.learn_prior = True\n    else:\n        args.learn_prior = False\n\n    # GPU\n    if not torch.cuda.is_available(): args.no_gpu = True\n    gpu = not args.no_gpu\n    if gpu:\n        print(\"There are \", torch.cuda.device_count(), \" available GPUs!\")\n        # print('Setting GPUs 
{}'.format(args.device))\n        print('Using GPU devices {}'.format(devices))\n        torch.cuda.set_device(args.gpu)\n        print('Current single GPU: {}'.format(torch.cuda.current_device()))\n    device = torch.device(args.gpu if gpu else \"cpu\")\n\n    # randomness\n    np.random.seed(args.seed)\n    prng = np.random.RandomState()\n    torch.random.manual_seed(args.seed)\n    if gpu: torch.cuda.manual_seed(args.seed); torch.cuda.manual_seed_all(args.seed)\n\n    # logging\n    save_folder = os.path.join(args.out_dir, args.experiment)\n    os.makedirs(save_folder, exist_ok=True)\n    t_writer = SummaryWriter(os.path.join(save_folder, 'train'), flush_secs=5)\n    v_writer = SummaryWriter(os.path.join(save_folder, 'val'), flush_secs=5)\n    importlib.reload(logging)\n    logging.basicConfig(filename=os.path.join(save_folder, 'train.log'),\n                        level=logging.INFO, format='%(asctime)s--- %(message)s')\n    logging.info('\\n*******************************************************************************\\n')\n    logging.info(\"the configuration:\")\n    logging.info(str(args).replace(',', '\\n'))\n\n    print('Loading models...')\n    cache_dir = os.path.join(args.out_dir, 'model_cache')\n    os.makedirs(cache_dir, exist_ok=True)\n    # Load pre-trained teacher tokenizer (vocabulary)\n    tokenizer = GPT2Tokenizer.from_pretrained('gpt2', cache_dir=cache_dir)\n    # Hack to allow tokenizing longer sequences.\n    tokenizer.max_len = int(1e12)\n    gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2', cache_dir=cache_dir)\n    print('gpt2_params:', num_params(gpt2_model))  # gpt2: 124439808\n    config = GPT2Config()\n\n    # add special tokens\n    # special_tokens_dict = {\n    #     'pad_token': '<|startoftext|>',\n    #     'cls_token': '<|startofcond|>',\n    #     'sep_token': '<|sepofcond|>',\n    #     'mask_token': '<|endofcond|>'\n    # }\n    # num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)\n    # print('We have 
added', num_added_toks, 'special tokens')\n    # # Notice: resize_token_embeddings expects to receive the full size of the new vocab\n    # gpt2_model.resize_token_embeddings(len(tokenizer))\n    # assert tokenizer.pad_token == '<|startoftext|>'\n\n    VAE = VAEModel(config, add_input=args.add_input, add_attn=args.add_attn, add_softmax=args.add_softmax,\n                   attn_proj_vary=args.attn_proj_vary, learn_prior=args.learn_prior)\n    init_para_frompretrained(VAE.transformer, gpt2_model.transformer, share_para=True)\n    init_para_frompretrained(VAE.encoder, gpt2_model.transformer, share_para=False)\n    if args.learn_prior:\n        init_para_frompretrained(VAE.encoder_prior, VAE.encoder, share_para=True)\n        VAE.encoder_prior.averageSelfAttention.attention_weights = VAE.encoder.averageSelfAttention.attention_weights\n    VAE.lm_head.weight = gpt2_model.lm_head.weight\n    if VAE.add_softmax:\n        VAE.lm_head_rep = Conv1D(*gpt2_model.lm_head.weight.size())\n        # VAE.lm_head_rep = LM_head_rep(*gpt2_model.lm_head.weight.size()[::-1])\n    print('VAE_params:', num_params(VAE))  # 286694400\n    if args.load:\n        print('Loading model weights...')\n        state = torch.load(os.path.join(args.load, 'model_latest.pt'))  # , map_location='cpu' model_latest.pt\n        if 'module' in list(state.keys())[0]:  # checkpoint saved from a DataParallel model: strip the 'module.' key prefix\n            state_copy = copy.copy(state)\n            keys = state_copy.keys()\n            for k in keys:\n                state[k.replace('module.', '')] = state.pop(k)\n        VAE.load_state_dict(state)\n        gc.collect()\n    print('Done.')\n\n    # freeze pre-trained parameters (train only the newly added modules) before certain iterations\n    tuning_all_after_iters = 40000\n    tuning_all = False\n    for name, parameter in VAE.named_parameters():\n        # print((name, parameter.requires_grad))\n        new_pars = ['c_z', 'attention_weights', 'mean', 'logvar', 'input_proj', 'attn_proj', 'Nu_fc1', 'Nu_fc2', 
'lm_head_rep']\n\n        if not any(n in name for n in new_pars):\n            parameter.requires_grad = False\n\n    print('Setup data...')\n    # Batch and sequence length schedule\n    assert len(args.batch_sizes) == len(args.seq_lens)\n    batch_schedule = list(zip(map(int, args.batch_sizes), map(int, args.seq_lens)))\n    assert len(batch_schedule) <= 2, 'At most two (batch size, seq len) schedules are supported'\n    cur_b_schedule = len(batch_schedule) - 1 if args.switch_time == 0 else 0\n    print('Batch schedule', batch_schedule)\n    train_loader, val_loader, test_loader = prepare_dataset(\n        args.data_dir, args.dataset, tokenizer,\n        batch_schedule[cur_b_schedule][0], batch_schedule[cur_b_schedule][1],\n        batch_schedule[-1][0], batch_schedule[-1][1],\n        batch_schedule[-1][0], batch_schedule[-1][1],\n        make_test=True,\n        num_workers=args.workers, data_type=args.data_type\n    )\n    print('Done.')\n\n    ###\n    val_loader = test_loader  # note: validation runs on the test split\n    ###\n\n    print('Wrapping models and optimizers...')\n    # Apply linear scaling rule to increase batch size for short sequence training.\n    lr_schedule = switch_schedule(linear_schedule(args), batch_schedule[cur_b_schedule][0] / batch_schedule[-1][0],\n                                  int(args.iterations * args.switch_time))\n    VAE = VAE.to(device)\n    VAE.train()\n\n    optimizer = AdamW(VAE.parameters(), lr=args.lr, correct_bias=True)\n    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_schedule)\n    VAE, optimizer = amp.initialize(VAE, optimizer, opt_level=args.fp16_opt_level)\n\n    loss_fn = nn.CrossEntropyLoss(reduction='none')\n    print('Done.')\n\n    print('Begin training iterations')\n    logging.info(\"Begin training iterations\")\n    max_val_batches = 20000  # max num. 
of val batches\n    logging.info(\"Total iteration: %d\" % args.iterations)\n    e = 0  # number of epoch\n    num_iters = 0\n    optimizer.zero_grad()\n    beta = args.beta_0\n    endoftext = tokenizer.convert_tokens_to_ids(\"<|endoftext|>\")\n\n    def val_step(val_loader):\n        VAE.eval()\n\n        n_words_bpe = 0\n        n_words = 0\n        logp_sum = 0.0\n        kl_loss_sum = 0.0\n\n        logging.info(\"Validation loop.         Batches: %d\" % len(val_loader))\n        logging.info(\"Validation loop. max_val_batches: %d\" % max_val_batches)\n\n        # val_iter = iter(val_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(val_iter)\n        with tqdm(total=min(len(val_loader), max_val_batches)) as pbar:\n            for i, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(val_loader):\n                with torch.no_grad():\n                    if args.model_type == 'cvae':\n                        loss, ce_loss, kl_loss = compute_loss(device, VAE, x_mask, x_tokens, y_mask, y_tokens,\n                                                              input_tokens, target_tokens, mask, loss_fn, 1.0)\n                    else:\n                        loss, ce_loss, kl_loss = compute_loss_ae(device, VAE, x_mask, x_tokens, y_mask, y_tokens,\n                                                              input_tokens, target_tokens, mask, loss_fn, 1.0)\n\n                if len(target_tokens.size()) == 1:\n                    target_tokens = target_tokens.unsqueeze(0)\n                n, l = target_tokens.size()\n\n                text = target_tokens[0, :].tolist()\n                logprob = ce_loss.tolist()\n                assert len(text) == len(logprob)\n\n                # only for story\n                idx = text.index(endoftext)\n                text = text[idx + 1:]\n                logprob = logprob[idx + 1:]\n\n                if endoftext in text:\n                    idx = 
text.index(endoftext)\n                    text = text[:idx]\n                    logprob = logprob[:idx]\n\n                logp_sum += sum(logprob)\n\n                n_words_bpe += len(text)\n\n                story = [tokenizer.decode(target_tokens[i, :]) for i in range(n)]\n                story = [s[s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\"):] for s in story]\n                story = [s[:s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\")] if \"<|endoftext|>\" in s else s for s in\n                         story]\n                words = sum([len(\n                    [t for t in re.split('(\"|\\'|!|\\?|\\.|,|:| |\\n|’|“|”|;|\\(|\\)|`)', s) if t != ' ' and t != '']) for\n                    s in story])\n                n_words += words\n\n                kl_loss_sum += kl_loss.item()\n\n                if i > max_val_batches:\n                    break\n                pbar.update(1)\n\n        loss_bpe = logp_sum / n_words_bpe\n        ppl_bpe = round(math.exp(min(logp_sum / n_words_bpe, 100)), 3)\n        ppl_word = round(math.exp(min(logp_sum / n_words, 100)), 3)\n        kl = kl_loss_sum / len(val_loader)\n\n        v_writer.add_scalar('loss', loss_bpe, num_iters)\n        v_writer.add_scalar('ppl_bpe', ppl_bpe, num_iters)\n        v_writer.add_scalar('ppl_word', ppl_word, num_iters)\n        v_writer.add_scalar('kl', kl, num_iters)\n        logging.info('val loss    : %.4f' % loss_bpe)\n        logging.info('val ppl_bpe : %.4f' % ppl_bpe)\n        logging.info('val ppl_word: %.4f' % ppl_word)\n        logging.info('val   kl    : %.4f' % kl)\n\n        VAE.train()\n\n    def test_plot(test_loader, num_iters):\n        VAE.eval()\n\n        # get embedding\n        X_emb = None\n        y = None\n\n        # test_iter = iter(test_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(test_iter)\n        with tqdm(total=len(test_loader)) as pbar:\n            for i, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, 
target_tokens, mask) in enumerate(\n                    test_loader):\n                y_mask = y_mask.to(device)\n                y_tokens = y_tokens.to(device)\n                x_mask = x_mask.to(device)\n                x_tokens = x_tokens.to(device)\n                with torch.no_grad():\n                    if args.model_type == 'cvae':\n                        latent_mean, latent_logvar = VAE.encoder_prior(input_ids=x_tokens, attention_mask=x_mask)[:2]\n                    else:\n                        latent_mean, latent_logvar = VAE.encoder(input_ids=x_tokens, attention_mask=x_mask)[:2]\n\n                if args.dataset == 'ax' or args.dataset == 'yp':\n                    label = [tokenizer.decode(l)[:2] for l in x_tokens.tolist()]\n                elif args.dataset == 'wp':\n                    label = []\n                    prompts = [tokenizer.decode(l)[:6].lower() for l in x_tokens.tolist()]\n                    for prom in prompts:\n                        if prom[0] in ['[', '('] and prom[5] in [']', ')']:\n                            label.append(prom[2:4])\n                        else:\n                            label.append(None)\n                elif args.dataset == 'wi':\n                    # 0. TV, play, miniseries, telenovela; 1.film; 2. music; 3. manga, comic, 4. book, novel, story 5. 
game\n                    label = []\n                    prompts = [tokenizer.decode(l) for l in x_tokens.tolist()]\n                    for prom in prompts:\n                        if 'TV' in prom or 'play' in prom or 'miniseries' in prom or 'telenovela' in prom:\n                            label.append(0)\n                        elif 'film' in prom:\n                            label.append(1)\n                        elif 'music' in prom:\n                            label.append(2)\n                        elif 'manga' in prom or 'comic' in prom:\n                            label.append(3)\n                        elif 'book' in prom or 'novel' in prom or 'story' in prom:\n                            label.append(4)\n                        elif 'game' in prom:\n                            label.append(5)\n                        else:\n                            label.append(None)\n                else:\n                    raise Exception\n\n                if i == 0:\n                    X_emb = latent_mean.data\n                    y = label\n                else:\n                    X_emb = torch.cat((X_emb, latent_mean.data), dim=0)\n                    y.extend(label)\n                pbar.update(1)\n        X_emb = X_emb.cpu().numpy()\n\n        try:\n            if args.dataset == 'yp':\n                y = ['0' if l in ['0', '1'] else l for l in y]\n                y = ['4' if l in ['3', '4'] else l for l in y]\n                X_emb = X_emb[[l != '2' for l in y], :]\n                y = [l for l in y if l != '2']\n\n            if args.dataset == 'wp':\n                topics = [['wp', 'sp', 'tt'], ['eu'], ['cw'], ['pm'], ['mp', 'ip'], ['pi', 'cc'], ['ot'], ['rf']]\n                match = [[True if l in t else False for t in topics] for l in y]\n                y = [m.index(True) if True in m else None for m in match]\n                X_emb = X_emb[[l is not None for l in y], :]\n                y = [l for l in y if l is not None]\n\n         
   if args.dataset == 'wi':\n                X_emb = X_emb[[l is not None for l in y], :]\n                y = [l for l in y if l is not None]\n\n            # project the latent means to 2D with t-SNE\n            # X_emb_2d = TSNE(n_components=2, init='pca', verbose=1).fit_transform(X_emb)\n            X_emb_2d = TSNE(n_components=2, verbose=1, perplexity=40).fit_transform(X_emb)\n\n            # drop points lying r or more standard deviations from the mean in any dimension\n            def remove_outliers(data, r=2.0):\n                outliers_data = abs(data - np.mean(data, axis=0)) >= r * np.std(data, axis=0)\n                outliers = np.any(outliers_data, axis=1)\n                keep = np.logical_not(outliers)\n                return outliers, keep\n\n            outliers, keep = remove_outliers(X_emb_2d)\n            X_emb_2d = X_emb_2d[keep, :]\n            y = [l for l, k in zip(y, keep.tolist()) if k]\n\n            # plot\n            fig = plt.figure(figsize=(4, 4))\n            ax = fig.add_axes([0, 0, 1, 1])\n            cc = ['r', 'b', 'g', 'y', 'k', 'c', 'm', 'tab:blue']\n            for i, l in enumerate(sorted(set(y))):\n                idx = [yl == l for yl in y]\n                plt.scatter(X_emb_2d[idx, 0], X_emb_2d[idx, 1], c=cc[i], s=10, edgecolor='none', alpha=0.5)\n            ax.axis('off')  # hide the axes\n            plt.savefig(os.path.join(save_folder, 'tSNE_' + '{:07d}'.format(num_iters) + '.png'))\n            plt.close(fig)\n        except:\n            pass\n\n        VAE.train()\n\n    def generate(test_loader, num_iters):\n        VAE.eval()\n\n        n_samples = 0\n        bleu4_sum = 0.0\n        rouge_scores_values_sum = [0.0] * 9\n\n        args.nsamples = 1\n        args.batch_size = 1\n        args.temperature = 0.95\n        args.top_k = 100\n        args.top_p = 0.95\n        model_type = args.model_type\n\n        # write samples to file\n        samples_file = open(os.path.join(save_folder, 'generate-' + '%07d' % num_iters + '.txt'), 'w', encoding='utf8')\n\n        # test_iter = iter(test_loader); x_mask, x_tokens, y_mask, 
y_tokens, input_tokens, target_tokens, mask = next(test_iter)\n        with tqdm(total=len(test_loader)) as pbar:\n            for i_test, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(\n                    test_loader):\n\n                if i_test >= 10: break\n\n                length = -1\n                if length == -1:\n                    length = VAE.config.n_ctx - x_tokens.size(1) - 1\n                elif length > VAE.config.n_ctx - x_tokens.size(1) - 1:\n                    raise ValueError(\"Can't get samples longer than window size: %s\" % VAE.config.n_ctx)\n\n                eff_samples = []\n                n, l = target_tokens.size()\n                storys = [tokenizer.decode(target_tokens[i, :]) for i in range(n)]\n                storys = [s[s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\"):] for s in storys]\n                storys_str = [s[:s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\")] if \"<|endoftext|>\" in s else s for s in\n                              storys]\n\n                for _ in range(args.nsamples // args.batch_size):\n                    # model, batch_size, temperature, top_k, top_p, eos_token, sample = VAE, args.batch_size, args.temperature, args.top_k, args.top_p, tokenizer.encoder['<|endoftext|>'], True\n                    out, _ = sample_sequence(\n                        model=VAE,\n                        tokenizer=tokenizer,\n                        length=length,\n                        batch_size=args.batch_size,\n                        x_mask=x_mask,\n                        x_tokens=x_tokens,\n                        y_mask=y_mask,\n                        y_tokens=y_tokens,\n                        temperature=args.temperature,\n                        top_k=args.top_k,\n                        top_p=args.top_p,\n                        device=device,\n                        eos_token=tokenizer.encoder['<|endoftext|>'],\n                        
model_type=model_type\n                    )\n                    out = out.tolist()\n\n                    # extract story, check metrics\n                    for i in range(len(out)):\n                        text = out[i]\n                        text = text[text.index(endoftext) + 1:]\n\n                        if endoftext in text:\n                            idx = text.index(endoftext)\n                            text = text[:idx]\n\n                        text = tokenizer.decode(text).strip()\n\n                        # score for one long text, higher than 0.075 usually means repetition\n                        # rep_score = repeat_score(text.split(), ngram=[3, 4, 5, 6, 7, 8])\n                        # if rep_score > 0.075:\n                        #     # print(rep_score)\n                        #     continue\n\n                        try:\n                            # check bleu (sentence_bleu expects token lists for both reference and hypothesis)\n                            bleu4 = sentence_bleu([storys_str[i].split()], text.split(),\n                                                  smoothing_function=SmoothingFunction().method7)\n\n                            # check rouge\n                            rouge = Rouge()\n                            rouge_scores = rouge.get_scores(text, storys_str[i])\n                            rouge_scores_values = [v for k in rouge_scores[0].keys() for v in\n                                                   rouge_scores[0][k].values()]\n\n                            bleu4_sum += bleu4\n                            rouge_scores_values_sum = [v1 + v2 for v1, v2 in\n                                                       zip(rouge_scores_values_sum, rouge_scores_values)]\n                            n_samples += 1\n                        except:\n                            bleu4 = 0.0\n                            rouge_scores = [{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},\n                                             'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},\n                             
                'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}]\n\n                        eff_samples.append((text, bleu4, rouge_scores))\n\n                    pbar.update(1)\n\n                for i in range(len(eff_samples)):\n                    samples_file.write(\"=\" * 50 + \" SAMPLE \" + str(i_test) + \" \" + \"=\" * 50)\n                    samples_file.write('\\n' * 2)\n\n                    samples_file.write(\"=\" * 40 + \" Outlines  \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(tokenizer.decode(x_tokens[i, :][x_mask[i, :] == 1].tolist()))\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(\"=\" * 40 + \" Story \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(storys_str[i])\n                    samples_file.write('\\n' * 2)\n\n                    samples_file.write(\"=\" * 40 + \" Generated \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(eff_samples[i][0])\n                    samples_file.write('\\n' * 4)\n                    samples_file.flush()\n\n        print('Test complete with %05d samples.' 
% n_samples)\n        logging.info(\"Test complete with %05d samples.\", n_samples)\n        logging.info(\"Iteration completed: %d\" % num_iters)\n\n        bleu4 = round(bleu4_sum / n_samples, 3)\n        rouge_scores_values = [round(r / n_samples, 3) for r in rouge_scores_values_sum]\n        print(' bleu-4:', bleu4)\n        print(' rouge :', rouge_scores_values)\n        logging.info(' bleu-4: %f', bleu4)\n        logging.info(' rouge : %s', str(rouge_scores_values))\n\n        VAE.train()\n\n    test_plot(test_loader, num_iters)\n    val_step(val_loader)\n    generate(test_loader, num_iters)\n    torch.save(VAE.state_dict(), os.path.join(save_folder, 'model_' + '{:07d}'.format(num_iters) + '.pt'))\n\n    while num_iters < args.iterations:\n        # Run epoch\n        st = time.time()\n\n        # Training\n        print('Training loop. Batches:', len(train_loader))\n        logging.info('\\n----------------------------------------------------------------------')\n        logging.info(\"Training loop.       Batches: %d\" % len(train_loader))\n\n        # train_iter = iter(train_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(train_iter)\n        with tqdm(total=len(train_loader)) as pbar:\n            for i, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(train_loader):\n\n                # if num_iters % args.cycle >= args.cycle - args.beta_warmup:\n                #     beta = min(1.0, beta + (1. 
- args.beta_0) / args.beta_warmup)\n\n                if not tuning_all and num_iters >= tuning_all_after_iters:\n                    for name, parameter in VAE.named_parameters():\n                        # print((name, parameter.requires_grad))\n                        parameter.requires_grad = True\n                    tuning_all = True\n\n                output = train_step(device, VAE, optimizer, x_mask, x_tokens, y_mask, y_tokens,\n                                    input_tokens, target_tokens, mask, loss_fn, beta, args.model_type)\n                loss, ce_loss, kl_loss = output[-1]\n\n                lr = scheduler.get_last_lr()[0]\n                # Log to Tensorboard\n                t_writer.add_scalar('loss', loss, num_iters)\n                t_writer.add_scalar('ppl', math.exp(min(ce_loss, 10)), num_iters)\n                t_writer.add_scalar('lr', lr, num_iters)\n                t_writer.add_scalar('iter_time', time.time() - st, num_iters)\n                t_writer.add_scalar('kl', kl_loss, num_iters)\n                t_writer.add_scalar('beta', beta, num_iters)\n\n                if args.model_type == 'ae_vae_fusion':\n                    loss, ce_loss, kl_loss = output[0]\n                    # Log to Tensorboard\n                    t_writer.add_scalar('ae_loss', loss, num_iters)\n                    t_writer.add_scalar('ae_kl', kl_loss, num_iters)\n\n                st = time.time()\n                end = num_iters >= args.iterations\n\n                if args.warmup != -1:\n                    scheduler.step()\n\n                if end: break\n                num_iters += 1\n                pbar.update(1)\n\n                if num_iters % args.cycle == 0:\n                    beta = args.beta_0\n                    logging.info('KL annealing restart')\n\n                if num_iters % 10000 == 0:\n                    test_plot(test_loader, num_iters)\n                    val_step(val_loader)\n                    generate(test_loader, 
num_iters)\n\n                if num_iters % 50000 == 0:\n                    print('Saving model...')\n                    logging.info(\"Iteration completed: %d, remained %d\" % (num_iters, args.iterations - num_iters))\n                    logging.info(\"Saving model...\")\n                    logging.info('\\n------------------------------------------------------')\n                    torch.save(VAE.state_dict(), os.path.join(save_folder, 'model_' + '{:07d}'.format(num_iters) + '.pt'))\n\n                if args.switch_time > 0 and num_iters == int(args.iterations * args.switch_time):\n                    print('Switch to long sequence training')\n                    logging.info(\"Switch to long sequence training\")\n                    cur_b_schedule += 1\n                    train_loader, val_loader, test_loader = prepare_dataset(\n                        args.data_dir, args.dataset, tokenizer,\n                        batch_schedule[cur_b_schedule][0], batch_schedule[cur_b_schedule][1],\n                        batch_schedule[-1][0], batch_schedule[-1][1],\n                        batch_schedule[-1][0], batch_schedule[-1][1],\n                        make_test=True,\n                        num_workers=args.workers, data_type=args.data_type\n                    )\n        if not end:\n            e += 1\n            logging.info(\"Training loop. The ith epoch completed: %d\" % e)\n\n    torch.save(VAE.state_dict(), os.path.join(save_folder, 'model_latest.pt'))\n    print('Training complete.')\n    logging.info(\"Training complete.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "train_dist.py",
    "content": "import os, time, gc, json, pickle, argparse, math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.utils.data as data\nfrom torch.nn import DataParallel\nimport torch.distributed as dist\nimport torch.multiprocessing as mp\nimport numpy as np\nimport transformers\nfrom transformers import GPT2Tokenizer, GPT2LMHeadModel, GPT2Config, AdamW, get_linear_schedule_with_warmup, Conv1D\nfrom tensorboardX import SummaryWriter\nfrom tqdm import tqdm\nimport importlib\nimport logging\nimport copy\n\nfrom apex.optimizers import FusedAdam\nfrom apex import amp\nfrom apex.fp16_utils import FP16_Optimizer\n\nfrom apex.parallel import DistributedDataParallel as DDP\nfrom torch.nn.parallel import DistributedDataParallel\n\nfrom data.util import *\nfrom util import *\n\nfrom model import VAEModel\n\nfrom collections import Counter\nfrom nltk.translate.bleu_score import sentence_bleu\nfrom nltk.translate.bleu_score import SmoothingFunction\nfrom rouge import Rouge\n\nfrom sklearn.manifold import TSNE\nimport matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt\n\ndevices = '2,1,0'\nos.environ[\"CUDA_VISIBLE_DEVICES\"] = devices\n\n\ndef compute_loss(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask, loss_fn, beta):\n    input_tokens = input_tokens.to(device)\n    target_tokens = target_tokens.to(device)\n    mask = mask.to(device)\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n    y_mask = y_mask.to(device)\n    y_tokens = y_tokens.to(device)\n\n    outputs = model(input_ids=input_tokens, attention_mask=mask, x_mask=x_mask, x_tokens=x_tokens, y_mask=y_mask,\n                    y_tokens=y_tokens)\n    logits = outputs[0]\n    kl_loss = outputs[-1]\n    num_logits = logits.size(-1)\n\n    # Perform masking\n    if mask is not None:\n        mask = mask.type(torch.bool)\n        mask = mask.to(device)\n        logits = logits.masked_select(mask.unsqueeze(-1))\n      
  target_tokens = target_tokens.masked_select(mask)\n\n    ce_loss = loss_fn(logits.view(-1, num_logits), target_tokens.view(-1))\n    kl_loss = kl_loss.mean()\n    loss = ce_loss.mean() + beta * kl_loss\n\n    return loss, ce_loss, kl_loss\n\n\ndef compute_loss_ae(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask, loss_fn, beta):\n    input_tokens = input_tokens.to(device)\n    target_tokens = target_tokens.to(device)\n    mask = mask.to(device)\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n\n    outputs = model(input_ids=input_tokens, attention_mask=mask, y_mask=x_mask, y_tokens=x_tokens, from_mean=True, from_prior=False)\n\n    logits = outputs[0]\n    kl_loss = outputs[-1]\n    num_logits = logits.size(-1)\n\n    # Perform masking\n    if mask is not None:\n        mask = mask.type(torch.bool)\n        mask = mask.to(device)\n        logits = logits.masked_select(mask.unsqueeze(-1))\n        target_tokens = target_tokens.masked_select(mask)\n\n    ce_loss = loss_fn(logits.view(-1, num_logits), target_tokens.view(-1))\n    kl_loss = kl_loss.mean()\n    loss = ce_loss.mean()\n\n    return loss, ce_loss, kl_loss\n\n\ndef train_step(device, model, optimizer, x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask, loss_fn, beta, model_type):\n    output = []\n    if model_type == 'ae_vae_fusion':\n        optimizer.zero_grad()\n        loss, ce_loss, kl_loss = compute_loss_ae(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens,\n                                              target_tokens, mask, loss_fn, beta)\n        with amp.scale_loss(loss, optimizer) as scaled_loss:\n            scaled_loss.backward()\n        # clip only after scale_loss exits, once amp has unscaled the gradients\n        torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), 5.0)  # max_grad_norm=5.0\n        optimizer.step()\n        output.append((loss.item(), ce_loss.mean().item(), kl_loss.item()))\n\n    optimizer.zero_grad()\n    loss, ce_loss, kl_loss = compute_loss(device, 
model, x_mask, x_tokens, y_mask, y_tokens, input_tokens,\n                                          target_tokens, mask, loss_fn, beta)\n    with amp.scale_loss(loss, optimizer) as scaled_loss:\n        scaled_loss.backward()\n    # clip only after scale_loss exits, once amp has unscaled the gradients\n    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), 5.0)  # max_grad_norm=5.0\n    optimizer.step()\n    output.append((loss.item(), ce_loss.mean().item(), kl_loss.item()))\n\n    return output\n\n\ndef top_k_top_p_filtering(logits, top_k=100, top_p=0.95, filter_value=-float('Inf')):\n    \"\"\" Filter a distribution of logits using top-k and/or nucleus (top-p) filtering\n        Args:\n            logits: logits distribution shape (batch size, vocabulary size)\n            top_k > 0: keep only top k tokens with highest probability (top-k filtering).\n            top_p > 0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).\n                Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)\n        From: https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317\n    \"\"\"\n    top_k = min(top_k, logits.size(-1))  # Safety check\n    if top_k > 0:\n        # Remove all tokens with a probability less than the last token of the top-k\n        indices_to_remove = logits < torch.topk(logits, top_k)[0][..., -1, None]\n        logits[indices_to_remove] = filter_value\n\n    if top_p > 0.0:\n        sorted_logits, sorted_indices = torch.sort(logits, descending=True)\n        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)\n\n        # Remove tokens with cumulative probability above the threshold\n        sorted_indices_to_remove = cumulative_probs > top_p\n        # Shift the indices to the right so the first token above the threshold is also kept\n        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()\n        sorted_indices_to_remove[..., 0] = 0\n\n        # scatter sorted tensors to original indexing\n        
indices_to_remove = sorted_indices_to_remove.scatter(dim=1, index=sorted_indices, src=sorted_indices_to_remove)\n        logits[indices_to_remove] = filter_value\n\n    return logits\n\n\ndef repeat_score(text, ngram=[3, 4, 5, 6]):\n    ngram_list = []\n    for ng in ngram:\n        ngram_list.append([text[idx:idx + ng] for idx in range(len(text) - ng - 1)])\n\n    max_occurs = []\n    for ngrams in ngram_list:\n        count_result = Counter([' '.join(n) for n in ngrams])\n        try:\n            max_occurs.append(\n                max(count_result.values())\n            )\n        except:\n            pass\n\n    scores = [max_oc / ((len(text) / ngram[idx]) + ngram[idx]) for idx, max_oc in enumerate(max_occurs)]\n    return max(scores) if len(scores) >= 1 else 1.0\n\n\ndef sample_sequence(model, tokenizer, length, batch_size=None, x_mask=None, x_tokens=None, y_mask=None, y_tokens=None,\n                    temperature=1, top_k=100, top_p=0.95, device='cuda', sample=True, eos_token=None, model_type='cvae'):\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n    y_mask = y_mask.to(device)\n    y_tokens = y_tokens.to(device)\n\n    with torch.no_grad():\n        if model_type == 'cvae':\n            try:\n                prior_mean, prior_logvar = model.encoder_prior(input_ids=x_tokens, attention_mask=x_mask)[:2]\n            except:\n                prior_mean = prior_logvar = torch.zeros([batch_size, model.config.n_embd], device=device)\n            latent_mean, latent_logvar = prior_mean, prior_logvar\n            z = model.reparameterize(latent_mean, latent_logvar)\n            assert not torch.isnan(z).any(), 'training get nan z'\n        else:\n            posterior_mean, posterior_logvar = model.encoder(input_ids=x_tokens, attention_mask=x_mask)[:2]\n            latent_mean, latent_logvar = posterior_mean, posterior_logvar\n            z = latent_mean\n            assert not torch.isnan(z).any(), 'training get nan z'\n\n        _, mem = 
model.transformer(input_ids=x_tokens[:, :-1], past=None, attention_mask=x_mask[:, :-1], representations=z)\n        prev = x_tokens[:, -1].view(batch_size, -1)\n\n        output = prev\n        probability = torch.tensor([], dtype=z.dtype, device=device)\n        if_end = torch.tensor([False] * batch_size, dtype=torch.bool, device=device)\n\n        for i in range(length): #trange\n            logits, mem = model.transformer(input_ids=prev, past=mem, representations=z)\n\n            logits = model.lm_head(logits)\n            if model.add_softmax:\n                logits_rep = model.lm_head_rep(z)\n                logits = logits + logits_rep.unsqueeze(dim=1)\n\n            logits = logits[:, -1, :] / temperature\n            logits = top_k_top_p_filtering(logits, top_k, top_p)\n            probs = F.softmax(logits, dim=-1)\n            if sample:\n                next_token = torch.multinomial(probs, num_samples=1)\n            else:\n                _, next_token = torch.topk(probs, k=1, dim=-1)\n\n            probability = torch.cat((probability, probs.gather(1, next_token)), dim=1)\n            output = torch.cat((output, next_token), dim=1)\n            prev = next_token\n\n            # early stopping if all sents have ended once\n            if_end[next_token.view(-1).eq(eos_token)] = True\n            if if_end.all(): break\n    return output, probability\n\n\ndef main_worker(gpu, ngpus_per_node, args):\n    if args.model_type == 'cvae':\n        args.learn_prior = True\n    else:\n        args.learn_prior = False\n\n    # GPU\n    args.gpu = gpu\n    print(\"There are \", torch.cuda.device_count(), \" available GPUs!\")\n    # print('Setting GPUs {}'.format(args.device))\n    print('Using GPU devices {}'.format(devices))\n    device = torch.device('cuda', args.gpu)\n    torch.cuda.set_device(device)\n    print('Current single GPU: {}'.format(torch.cuda.current_device()))\n\n    # randomness\n    np.random.seed(args.seed)\n    prng = 
np.random.RandomState()\n    torch.random.manual_seed(args.seed)\n    torch.cuda.manual_seed(args.seed)\n    torch.cuda.manual_seed_all(args.seed)\n\n    # For multiprocessing distributed training, rank needs to be the global rank among all the processes\n    args.rank = args.rank * ngpus_per_node + gpu\n    print('Setting rank', args.rank)\n    recon_attempt = 1\n    connected = False\n    if args.rank != 0:\n        # Stall to have rank 0 node go first\n        time.sleep(3)\n    while not connected:\n        try:\n            dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url,\n                                    world_size=args.world_size, rank=args.rank)\n            connected = True\n            print('Established connection. Rank:', args.rank)\n        except Exception as e:\n            # Sometimes the head node launches after the worker, which would cause an issue\n            print('Failed to init process group. Retrying...', recon_attempt, e)\n            recon_attempt += 1\n            time.sleep(10)\n\n    # logging\n    if args.rank == 0:\n        save_folder = os.path.join(args.out_dir, args.experiment)\n        os.makedirs(save_folder, exist_ok=True)\n        t_writer = SummaryWriter(os.path.join(save_folder, 'train'), flush_secs=5)\n        v_writer = SummaryWriter(os.path.join(save_folder, 'val'), flush_secs=5)\n        importlib.reload(logging)\n        logging.basicConfig(filename=os.path.join(save_folder, 'train.log'),\n                            level=logging.INFO, format='%(asctime)s--- %(message)s')\n        logging.info('\\n*******************************************************************************\\n')\n        logging.info(\"the configuration:\")\n        logging.info(str(args).replace(',', '\\n'))\n\n    print('Loading models...')\n    cache_dir = os.path.join(args.out_dir, 'model_cache')\n    os.makedirs(cache_dir, exist_ok=True)\n    # Load pre-trained teacher tokenizer (vocabulary)\n    tokenizer = 
GPT2Tokenizer.from_pretrained('gpt2', cache_dir=cache_dir)\n    # Hack to allow tokenizing longer sequences.\n    tokenizer.max_len = int(1e12)\n    gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2', cache_dir=cache_dir)\n    print('gpt2_params:', num_params(gpt2_model))  # gpt2: 124439808\n    config = GPT2Config()\n\n    # add special tokens\n    # special_tokens_dict = {\n    #     'pad_token': '<|startoftext|>',\n    #     'cls_token': '<|startofcond|>',\n    #     'sep_token': '<|sepofcond|>',\n    #     'mask_token': '<|endofcond|>'\n    # }\n    # num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)\n    # print('We have added', num_added_toks, 'special tokens')\n    # # Notice: resize_token_embeddings expect to receive the full size of the new vocab\n    # gpt2_model.resize_token_embeddings(len(tokenizer))\n    # assert tokenizer.pad_token == '<|startoftext|>'\n\n    VAE = VAEModel(config, add_input=args.add_input, add_attn=args.add_attn, add_softmax=args.add_softmax,\n                   attn_proj_vary=args.attn_proj_vary, learn_prior=args.learn_prior)\n    init_para_frompretrained(VAE.transformer, gpt2_model.transformer, share_para=True)\n    init_para_frompretrained(VAE.encoder, gpt2_model.transformer, share_para=False)\n    if args.learn_prior:\n        init_para_frompretrained(VAE.encoder_prior, VAE.encoder, share_para=True)\n        VAE.encoder_prior.averageSelfAttention.attention_weights = VAE.encoder.averageSelfAttention.attention_weights\n    VAE.lm_head.weight = gpt2_model.lm_head.weight\n    if VAE.add_softmax:\n        VAE.lm_head_rep = Conv1D(*gpt2_model.lm_head.weight.size())\n        # VAE.lm_head_rep = LM_head_rep(*gpt2_model.lm_head.weight.size()[::-1])\n    print('VAE_params:', num_params(VAE))  # 286694400\n    if args.load:\n        print('Loading model weights...')\n        state = torch.load(os.path.join(args.load, 'model_latest.pt'))  # , map_location='cpu'\n        if 'module' in list(state.keys())[0]:  # model_path is 
data parallel model with attr 'module'\n            state_copy = copy.copy(state)\n            keys = state_copy.keys()\n            for k in keys:\n                state[k.replace('module.', '')] = state.pop(k)\n        VAE.load_state_dict(state)\n        gc.collect()\n    print('Done.')\n\n    # fix pre-trained parameters before certain iterations\n    tuning_all_after_iters = 10000\n    tuning_all = False\n    for name, parameter in VAE.named_parameters():\n        # print((name, parameter.requires_grad))\n        new_pars = ['c_z', 'attention_weights', 'mean', 'logvar', 'input_proj', 'attn_proj', 'Nu_fc1', 'Nu_fc2', 'lm_head_rep']\n\n        if not any([True if n in name else False for n in new_pars]):\n           parameter.requires_grad = False\n\n    print('Setup data...')\n    # Batch and sequence length schedule\n    assert len(args.batch_sizes) == len(args.seq_lens)\n    batch_schedule = list(zip(map(int, args.batch_sizes), map(int, args.seq_lens)))\n    assert len(batch_schedule) <= 2, 'Currently not supporting multiple schedule'\n    cur_b_schedule = len(batch_schedule) - 1 if args.switch_time == 0 else 0\n    print('Batch schedule', batch_schedule)\n    train_loader, val_loader, test_loader = prepare_dataset(\n        args.data_dir, args.dataset, tokenizer,\n        batch_schedule[cur_b_schedule][0], batch_schedule[cur_b_schedule][1],\n        batch_schedule[-1][0], batch_schedule[-1][1],\n        batch_schedule[-1][0], batch_schedule[-1][1],\n        make_test=True,\n        num_workers=args.workers, data_type=args.data_type\n    )\n    print('Done.')\n\n    ###\n    val_loader = test_loader\n    ###\n\n    print('Wrapping models and optimizers...')\n    # Apply linear scaling rule to increase batch size for short sequence training.\n    lr_schedule = switch_schedule(linear_schedule(args), batch_schedule[cur_b_schedule][0] / batch_schedule[-1][0],\n                                  int(args.iterations * args.switch_time))\n    VAE = VAE.to(device)\n    
VAE = VAE.train()\n\n    optimizer = AdamW(VAE.parameters(), lr=args.lr, correct_bias=True)\n    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_schedule)\n    VAE, optimizer = amp.initialize(VAE, optimizer, opt_level=args.fp16_opt_level)\n    loss_model = DDP(VAE)  # , delay_allreduce=True\n\n    loss_fn = nn.CrossEntropyLoss(reduction='none')\n    print('Done.')\n\n    print('Begin training iterations')\n    logging.info(\"Begin training iterations\")\n    max_val_batches = 20000  # max num. of val batches\n    logging.info(\"Total iteration: %d\" % args.iterations)\n    e = 0  # number of epoch\n    num_iters = 0\n    optimizer.zero_grad()\n    beta = args.beta_0\n    endoftext = tokenizer.convert_tokens_to_ids(\"<|endoftext|>\")\n\n    def val_step(val_loader):\n        VAE.eval()\n\n        n_words_bpe = 0\n        n_words = 0\n        logp_sum = 0.0\n        kl_loss_sum = 0.0\n\n        logging.info(\"Validation loop.         Batches: %d\" % len(val_loader))\n        logging.info(\"Validation loop. 
max_val_batches: %d\" % max_val_batches)\n\n        # val_iter = iter(val_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(val_iter)\n        with tqdm(total=min(len(val_loader), max_val_batches)) as pbar:\n            for i, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(val_loader):\n                with torch.no_grad():\n                    if args.model_type == 'cvae':\n                        loss, ce_loss, kl_loss = compute_loss(device, VAE, x_mask, x_tokens, y_mask, y_tokens,\n                                                              input_tokens, target_tokens, mask, loss_fn, 1.0)\n                    else:\n                        loss, ce_loss, kl_loss = compute_loss_ae(device, VAE, x_mask, x_tokens, y_mask, y_tokens,\n                                                              input_tokens, target_tokens, mask, loss_fn, 1.0)\n\n                if len(target_tokens.size()) == 1:\n                    target_tokens = target_tokens.unsqueeze(0)\n                n, l = target_tokens.size()\n\n                text = target_tokens[0, :].tolist()\n                logprob = ce_loss.tolist()\n                assert len(text) == len(logprob)\n\n                # only for story\n                idx = text.index(endoftext)\n                text = text[idx + 1:]\n                logprob = logprob[idx + 1:]\n\n                if endoftext in text:\n                    idx = text.index(endoftext)\n                    text = text[:idx]\n                    logprob = logprob[:idx]\n\n                logp_sum += sum(logprob)\n\n                n_words_bpe += len(text)\n\n                story = [tokenizer.decode(target_tokens[i, :]) for i in range(n)]\n                story = [s[s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\"):] for s in story]\n                story = [s[:s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\")] if \"<|endoftext|>\" in s else s for s in\n                         
story]\n                words = sum([len(\n                    [t for t in re.split('(\"|\\'|!|\\?|\\.|,|:| |\\n|’|“|”|;|\\(|\\)|`)', s) if t != ' ' and t != '']) for\n                    s in story])\n                n_words += words\n\n                kl_loss_sum += kl_loss.item()\n\n                if i > max_val_batches:\n                    break\n                pbar.update(1)\n\n        loss_bpe = logp_sum / n_words_bpe\n        ppl_bpe = round(math.exp(min(logp_sum / n_words_bpe, 100)), 3)\n        ppl_word = round(math.exp(min(logp_sum / n_words, 100)), 3)\n        kl = kl_loss_sum / len(val_loader)\n\n        v_writer.add_scalar('loss', loss_bpe, num_iters)\n        v_writer.add_scalar('ppl_bpe', ppl_bpe, num_iters)\n        v_writer.add_scalar('ppl_word', ppl_word, num_iters)\n        v_writer.add_scalar('kl', kl, num_iters)\n        logging.info('val loss    : %.4f' % loss_bpe)\n        logging.info('val ppl_bpe : %.4f' % ppl_bpe)\n        logging.info('val ppl_word: %.4f' % ppl_word)\n        logging.info('val   kl    : %.4f' % kl)\n\n        VAE.train()\n\n    def test_plot(test_loader, num_iters):\n        VAE.eval()\n\n        # get embedding\n        X_emb = None\n        y = None\n\n        # test_iter = iter(test_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(test_iter)\n        with tqdm(total=len(test_loader)) as pbar:\n            for i, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(\n                    test_loader):\n                y_mask = y_mask.to(device)\n                y_tokens = y_tokens.to(device)\n                x_mask = x_mask.to(device)\n                x_tokens = x_tokens.to(device)\n                with torch.no_grad():\n                    if args.model_type == 'cvae':\n                        latent_mean, _ = VAE.encoder_prior(input_ids=x_tokens, attention_mask=x_mask)[:2]\n                    else:\n                        latent_mean, _ = 
VAE.encoder(input_ids=x_tokens, attention_mask=x_mask)[:2]\n\n                if args.dataset == 'ax' or args.dataset == 'yp':\n                    label = [tokenizer.decode(l)[:2] for l in x_tokens.tolist()]\n                elif args.dataset == 'wp':\n                    label = []\n                    prompts = [tokenizer.decode(l)[:6].lower() for l in x_tokens.tolist()]\n                    for prom in prompts:\n                        if prom[0] in ['[', '('] and prom[5] in [']', ')']:\n                            label.append(prom[2:4])\n                        else:\n                            label.append(None)\n                elif args.dataset == 'wi':\n                    # 0. TV, play, miniseries, telenovela; 1.film; 2. music; 3. manga, comic, 4. book, novel, story 5. game\n                    label = []\n                    prompts = [tokenizer.decode(l) for l in x_tokens.tolist()]\n                    for prom in prompts:\n                        if 'TV' in prom or 'play' in prom or 'miniseries' in prom or 'telenovela' in prom:\n                            label.append(0)\n                        elif 'film' in prom:\n                            label.append(1)\n                        elif 'music' in prom:\n                            label.append(2)\n                        elif 'manga' in prom or 'comic' in prom:\n                            label.append(3)\n                        elif 'book' in prom or 'novel' in prom or 'story' in prom:\n                            label.append(4)\n                        elif 'game' in prom:\n                            label.append(5)\n                        else:\n                            label.append(None)\n                else:\n                    raise Exception\n\n                if i == 0:\n                    X_emb = latent_mean.data\n                    y = label\n                else:\n                    X_emb = torch.cat((X_emb, latent_mean.data), dim=0)\n                    y.extend(label)\n  
              pbar.update(1)\n        X_emb = X_emb.cpu().numpy()\n\n        try:\n            if args.dataset == 'yp':\n                y = ['0' if l in ['0', '1'] else l for l in y]\n                y = ['4' if l in ['3', '4'] else l for l in y]\n                X_emb = X_emb[[l != '2' for l in y], :]\n                y = [l for l in y if l != '2']\n\n            if args.dataset == 'wp':\n                topics = [['wp', 'sp', 'tt'], ['eu'], ['cw'], ['pm'], ['mp', 'ip'], ['pi', 'cc'], ['ot'], ['rf']]\n                match = [[True if l in t else False for t in topics] for l in y]\n                y = [m.index(True) if True in m else None for m in match]\n                X_emb = X_emb[[l is not None for l in y], :]\n                y = [l for l in y if l is not None]\n\n            if args.dataset == 'wi':\n                X_emb = X_emb[[l is not None for l in y], :]\n                y = [l for l in y if l is not None]\n\n            # to 2D\n            # X_emb_2d = TSNE(n_components=2, init='pca', verbose=1).fit_transform(X_emb)\n            X_emb_2d = TSNE(n_components=2, verbose=1, perplexity=40).fit_transform(X_emb)\n\n            def remove_outliers(data, r=2.0):\n                outliers_data = abs(data - np.mean(data, axis=0)) >= r * np.std(data, axis=0)\n                outliers = np.any(outliers_data, axis=1)\n                keep = np.logical_not(outliers)\n                return outliers, keep\n\n            outliers, keep = remove_outliers(X_emb_2d)\n            X_emb_2d = X_emb_2d[keep, :]\n            y = [l for l, k in zip(y, keep.tolist()) if k]\n\n            # plot\n            fig = plt.figure(figsize=(4, 4))\n            ax = fig.add_axes([0, 0, 1, 1])\n            cc = ['r', 'b', 'g', 'y', 'k', 'c', 'm', 'tab:blue']\n            for i, l in enumerate(sorted(set(y))):\n                idx = [yl == l for yl in y]\n                plt.scatter(X_emb_2d[idx, 0], X_emb_2d[idx, 1], c=cc[i], s=10, edgecolor='none', alpha=0.5)\n            
ax.axis('off')  # adding it will get no axis\n            plt.savefig(os.path.join(save_folder, 'tSNE_' + '{:07d}'.format(num_iters) + '.png'))\n            plt.close(fig)\n        except:\n            pass\n\n        VAE.train()\n\n    def generate(test_loader, num_iters):\n        VAE.eval()\n\n        n_samples = 0\n        bleu4_sum = 0.0\n        rouge_scores_values_sum = [0.0] * 9\n\n        args.nsamples = 1\n        args.batch_size = 1\n        args.temperature = 0.95\n        args.top_k = 100\n        args.top_p = 0.95\n        model_type = args.model_type\n\n        # write samples to file\n        samples_file = open(os.path.join(save_folder, 'generate-' + '%07d' % num_iters + '.txt'), 'w', encoding='utf8')\n\n        # test_iter = iter(test_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(test_iter)\n        with tqdm(total=len(test_loader)) as pbar:\n            for i_test, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(\n                    test_loader):\n\n                if i_test >= 10: break\n\n                length = -1\n                if length == -1:\n                    length = VAE.config.n_ctx - x_tokens.size(1) - 1\n                elif length > VAE.config.n_ctx - x_tokens.size(1) - 1:\n                    raise ValueError(\"Can't get samples longer than window size: %s\" % VAE.config.n_ctx)\n\n                eff_samples = []\n                n, l = target_tokens.size()\n                storys = [tokenizer.decode(target_tokens[i, :]) for i in range(n)]\n                storys = [s[s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\"):] for s in storys]\n                storys_str = [s[:s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\")] if \"<|endoftext|>\" in s else s for s in\n                              storys]\n\n                for _ in range(args.nsamples // args.batch_size):\n                    # model, batch_size, temperature, top_k, top_p, eos_token, 
sample = VAE, args.batch_size, args.temperature, args.top_k, args.top_p, tokenizer.encoder['<|endoftext|>'], True\n                    out, _ = sample_sequence(\n                        model=VAE,\n                        tokenizer=tokenizer,\n                        length=length,\n                        batch_size=args.batch_size,\n                        x_mask=x_mask,\n                        x_tokens=x_tokens,\n                        y_mask=y_mask,\n                        y_tokens=y_tokens,\n                        temperature=args.temperature,\n                        top_k=args.top_k,\n                        top_p=args.top_p,\n                        device=device,\n                        eos_token=tokenizer.encoder['<|endoftext|>'],\n                        model_type=model_type\n                    )\n                    out = out.tolist()\n\n                    # extract story, check metrics\n                    for i in range(len(out)):\n                        text = out[i]\n                        text = text[text.index(endoftext) + 1:]\n\n                        if endoftext in text:\n                            idx = text.index(endoftext)\n                            text = text[:idx]\n\n                        text = tokenizer.decode(text).strip()\n\n                        # score for one long text, higher than 0.075 usually means repetition\n                        # rep_score = repeat_score(text.split(), ngram=[3, 4, 5, 6, 7, 8])\n                        # if rep_score > 0.075:\n                        #     # print(rep_score)\n                        #     continue\n\n                        try:\n                            # check bleu\n                            bleu4 = sentence_bleu([storys_str[i].split()], text,\n                                                  smoothing_function=SmoothingFunction().method7)\n\n                            # check rouge\n                            rouge = Rouge()\n                            
rouge_scores = rouge.get_scores(text, storys_str[i])\n                            rouge_scores_values = [v for k in rouge_scores[0].keys() for v in\n                                                   rouge_scores[0][k].values()]\n\n                            bleu4_sum += bleu4\n                            rouge_scores_values_sum = [v1 + v2 for v1, v2 in\n                                                       zip(rouge_scores_values_sum, rouge_scores_values)]\n                            n_samples += 1\n                        except:\n                            bleu4 = 0.0\n                            rouge_scores = [{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},\n                                             'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},\n                                             'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}]\n\n                        eff_samples.append((text, bleu4, rouge_scores))\n\n                    pbar.update(1)\n\n                for i in range(len(eff_samples)):\n                    samples_file.write(\"=\" * 50 + \" SAMPLE \" + str(i_test) + \" \" + \"=\" * 50)\n                    samples_file.write('\\n' * 2)\n\n                    samples_file.write(\"=\" * 40 + \" Outlines  \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(tokenizer.decode(x_tokens[i, :][x_mask[i, :] == 1].tolist()))\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(\"=\" * 40 + \" Story \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(storys_str[i])\n                    samples_file.write('\\n' * 2)\n\n                    samples_file.write(\"=\" * 40 + \" Generated \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(eff_samples[i][0])\n                    samples_file.write('\\n' * 4)\n                    samples_file.flush()\n\n        
print('Test complete with %05d samples.' % n_samples)\n        logging.info(\"Test complete with %05d samples.\", n_samples)\n        logging.info(\"Iteration completed: %d\" % num_iters)\n\n        bleu4 = round(bleu4_sum / n_samples, 3)\n        rouge_scores_values = [round(r / n_samples, 3) for r in rouge_scores_values_sum]\n        print(' bleu-4:', bleu4)\n        print(' rouge :', rouge_scores_values)\n        logging.info(' bleu-4: %f', bleu4)\n        logging.info(' rouge : %s', str(rouge_scores_values))\n\n        VAE.train()\n\n    if args.rank == 0:\n        test_plot(test_loader, num_iters)\n        val_step(val_loader)\n        generate(test_loader, num_iters)\n        torch.save(VAE.state_dict(), os.path.join(save_folder, 'model_' + '{:07d}'.format(num_iters) + '.pt'))\n\n    while num_iters < args.iterations:\n        # Run epoch\n        st = time.time()\n\n        # Training\n        print('Training loop. Batches:', len(train_loader))\n        logging.info('\\n----------------------------------------------------------------------')\n        logging.info(\"Training loop.       Batches: %d\" % len(train_loader))\n\n        # train_iter = iter(train_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(train_iter)\n        with tqdm(total=len(train_loader)) as pbar:\n            for i, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(train_loader):\n\n                if num_iters % args.cycle >= args.cycle - args.beta_warmup - 25000:\n                    beta = min(1.0, beta + (1. 
- args.beta_0) / args.beta_warmup)\n\n                if not tuning_all and num_iters >= tuning_all_after_iters:\n                    for name, parameter in VAE.named_parameters():\n                        # print((name, parameter.requires_grad))\n                        parameter.requires_grad = True\n                    tuning_all = True\n\n                output = train_step(device, loss_model, optimizer, x_mask, x_tokens, y_mask, y_tokens,\n                                    input_tokens, target_tokens, mask, loss_fn, beta, args.model_type)\n\n                if args.rank == 0:\n                    loss, ce_loss, kl_loss = output[-1]\n                    lr = scheduler.get_last_lr()[0]\n                    # Log to Tensorboard\n                    t_writer.add_scalar('loss', loss, num_iters)\n                    t_writer.add_scalar('ppl', math.exp(min(ce_loss, 10)), num_iters)\n                    t_writer.add_scalar('lr', lr, num_iters)\n                    t_writer.add_scalar('iter_time', time.time() - st, num_iters)\n                    t_writer.add_scalar('kl', kl_loss, num_iters)\n                    t_writer.add_scalar('beta', beta, num_iters)\n\n                    if args.model_type == 'ae_vae_fusion':\n                        loss, ce_loss, kl_loss = output[0]\n                        # Log to Tensorboard\n                        t_writer.add_scalar('ae_loss', loss, num_iters)\n                        t_writer.add_scalar('ae_kl', kl_loss, num_iters)\n\n                st = time.time()\n                end = num_iters >= args.iterations\n\n                if args.warmup != -1:\n                    scheduler.step()\n\n                if end: break\n                num_iters += 1\n                pbar.update(1)\n\n                if num_iters % args.cycle == 0:\n                    beta = args.beta_0\n                    logging.info('KL annealing restart')\n\n                if args.rank == 0 and num_iters % 10000 == 0:\n                    
test_plot(test_loader, num_iters)\n                    val_step(val_loader)\n                    generate(test_loader, num_iters)\n\n                if args.rank == 0 and num_iters % 10000 == 0:\n                    print('Saving model...')\n                    logging.info(\"Iteration completed: %d, remained %d\" % (num_iters, args.iterations - num_iters))\n                    logging.info(\"Saving model...\")\n                    logging.info('\\n------------------------------------------------------')\n                    torch.save(VAE.state_dict(), os.path.join(save_folder, 'model_' + '{:07d}'.format(num_iters) + '.pt'))\n\n                if args.switch_time > 0 and num_iters == int(args.iterations * args.switch_time):\n                    print('Switch to long sequence training')\n                    logging.info(\"Switch to long sequence training\")\n                    cur_b_schedule += 1\n                    train_loader, val_loader, test_loader = prepare_dataset(\n                        args.data_dir, args.dataset, tokenizer,\n                        batch_schedule[cur_b_schedule][0], batch_schedule[cur_b_schedule][1],\n                        batch_schedule[-1][0], batch_schedule[-1][1],\n                        batch_schedule[-1][0], batch_schedule[-1][1],\n                        make_test=True,\n                        num_workers=args.workers, data_type=args.data_type\n                    )\n        if not end:\n            e += 1\n            logging.info(\"Training loop. 
Epochs completed: %d\" % e)\n\n    if args.rank == 0:\n        torch.save(VAE.state_dict(), os.path.join(save_folder, 'model_latest.pt'))\n    print('Training complete.')\n    logging.info(\"Training complete.\")\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument('experiment', type=str)\n\n    # Default parameters are set based on single GPU training\n    parser.add_argument('--lr', type=float, default=5e-5)\n    parser.add_argument(\"--seed\", type=int, default=0)\n\n    parser.add_argument('--data_type', type=str, default='t1', choices=['t' + str(i) for i in range(9)], help=\"t: type\")\n    parser.add_argument('--model_type', type=str, default='cvae', choices=['cvae', 'ae_vae_fusion'])\n    parser.add_argument('--iterations', type=int, default=101640 * 2)  # wp 850001  wi 300001 ax 300001 yp 800001\n    parser.add_argument('--dataset', type=str, default='wi', choices=['ax', 'yp', 'wp', 'wi'], help=\"Dataset to use for training\")\n    parser.add_argument('--warmup', type=int, default=10000,\n                        help=\"Number of learning-rate warmup iterations, followed by decay (-1 for no warmup and decay)\")\n\n    parser.add_argument('--batch-sizes', nargs='+', type=int, default=[1],\n                        help='batch size per GPU. Lists the schedule.')\n    parser.add_argument('--seq-lens', nargs='+', type=int, default=[1024],\n                        help='seq length per sample. 
Lists the schedule.')\n    parser.add_argument('--switch-time', type=float, default=0,\n                        help=\"Fraction of total iterations (0 to 1) to spend on short sequence training.\")\n    parser.add_argument('--data-dir', type=str, default='data')\n    parser.add_argument('--out-dir', type=str, default='out')\n    parser.add_argument('--load', type=str, help='path to load model from') # , default='out/test/'\n\n    parser.add_argument('--world-size', default=1, type=int,\n                        help='number of nodes for distributed training')\n    parser.add_argument('--rank', default=0, type=int,\n                        help='node rank for distributed training')\n    parser.add_argument('--dist-url', default='tcp://127.0.0.1:9999', type=str,\n                        help='url used to set up distributed training')\n    parser.add_argument('--dist-backend', default='nccl', type=str,\n                        help='distributed backend')\n    parser.add_argument('--workers', default=1, type=int, metavar='N',\n                        help='number of data loading workers')\n\n    parser.add_argument('--fp16', action='store_true', help=\"Train using FP16?\")\n    parser.add_argument('--fp16_opt_level', default='O0', type=str, required=False)\n\n    # KL cost annealing, increase beta from beta_0 to 1 in beta_warmup steps\n    parser.add_argument('--beta_0', default=0.01, type=float)\n    parser.add_argument('--beta_warmup', type=int, default=25000)\n    # cyc_vae parameters\n    parser.add_argument('--cycle', type=int, default=101640)\n\n    parser.add_argument('--add_input', action=\"store_true\")\n    parser.add_argument('--add_attn', action=\"store_true\")\n    parser.add_argument('--add_softmax', action=\"store_true\")\n    parser.add_argument('--attn_proj_vary', action=\"store_true\")\n\n    parser.add_argument('--learn_prior', action=\"store_true\")\n\n    args = parser.parse_args('wi.1.proj_vary_cyc_cvae --batch-sizes 1 --seq-lens 1024 '\n                         
    ' --add_input --learn_prior --fp16'.split())\n\n    # Each node is expected to have same number of GPUs\n    ngpus_per_node = torch.cuda.device_count()\n    # Since we have ngpus_per_node processes per node, the total world_size\n    # needs to be adjusted accordingly\n    args.world_size = ngpus_per_node * args.world_size\n    # Use torch.multiprocessing.spawn to launch distributed processes: the\n    # main_worker process function\n    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))\n"
  },
  {
    "path": "train_dist_half.py",
    "content": "import os, time, gc, json, pickle, argparse, math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.utils.data as data\nfrom torch.nn import DataParallel\nimport torch.distributed as dist\nimport torch.multiprocessing as mp\nimport numpy as np\nimport transformers\nfrom transformers import GPT2Tokenizer, GPT2LMHeadModel, GPT2Config, AdamW, get_linear_schedule_with_warmup, Conv1D\nfrom tensorboardX import SummaryWriter\nfrom tqdm import tqdm\nimport importlib\nimport logging\nimport copy\n\nfrom apex.optimizers import FusedAdam\nfrom apex import amp\nfrom apex.fp16_utils import FP16_Optimizer\n\nfrom apex.parallel import DistributedDataParallel as DDP\nfrom torch.nn.parallel import DistributedDataParallel\n\nfrom data.util import *\nfrom util import *\nfrom dist_utils import *\n\nfrom model import VAEModel\n\nfrom collections import Counter\nfrom nltk.translate.bleu_score import sentence_bleu\nfrom nltk.translate.bleu_score import SmoothingFunction\nfrom rouge import Rouge\n\nfrom sklearn.manifold import TSNE\nimport matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt\n\ndevices = '3,2,1,0'\nos.environ[\"CUDA_VISIBLE_DEVICES\"] = devices\n\n\ndef compute_loss(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask, loss_fn, beta):\n    input_tokens = input_tokens.to(device)\n    target_tokens = target_tokens.to(device)\n    mask = mask.to(device)\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n    y_mask = y_mask.to(device)\n    y_tokens = y_tokens.to(device)\n\n    outputs = model(input_ids=input_tokens, attention_mask=mask, x_mask=x_mask, x_tokens=x_tokens, y_mask=y_mask,\n                    y_tokens=y_tokens)\n    logits = outputs[0]\n    kl_loss = outputs[-1]\n    num_logits = logits.size(-1)\n\n    # Perform masking\n    if mask is not None:\n        mask = mask.type(torch.bool)\n        mask = mask.to(device)\n        logits = 
logits.masked_select(mask.unsqueeze(-1))\n        target_tokens = target_tokens.masked_select(mask)\n\n    ce_loss = loss_fn(logits.view(-1, num_logits), target_tokens.view(-1))\n    kl_loss = kl_loss.mean()\n    loss = ce_loss.mean() + beta * kl_loss\n\n    return loss, ce_loss, kl_loss\n\n\ndef compute_loss_ae(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask, loss_fn, beta):\n    input_tokens = input_tokens.to(device)\n    target_tokens = target_tokens.to(device)\n    mask = mask.to(device)\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n\n    outputs = model(input_ids=input_tokens, attention_mask=mask, y_mask=x_mask, y_tokens=x_tokens, from_mean=True, from_prior=False)\n\n    logits = outputs[0]\n    kl_loss = outputs[-1]\n    num_logits = logits.size(-1)\n\n    # Perform masking\n    if mask is not None:\n        mask = mask.type(torch.bool)\n        mask = mask.to(device)\n        logits = logits.masked_select(mask.unsqueeze(-1))\n        target_tokens = target_tokens.masked_select(mask)\n\n    ce_loss = loss_fn(logits.view(-1, num_logits), target_tokens.view(-1))\n    kl_loss = kl_loss.mean()\n    loss = ce_loss.mean()\n\n    return loss, ce_loss, kl_loss\n\n\ndef train_step(device, model, optimizer, x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask, loss_fn, beta, model_type):\n    output = []\n    if model_type == 'ae_vae_fusion':\n        optimizer.zero_grad()\n        loss, ce_loss, kl_loss = compute_loss_ae(device, model, x_mask, x_tokens, y_mask, y_tokens, input_tokens,\n                                              target_tokens, mask, loss_fn, beta)\n        optimizer.backward(loss)\n        torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), 5.0)  # max_grad_norm=1.0\n        optimizer.step()\n        output.append((loss.item(), ce_loss.mean().item(), kl_loss.item()))\n\n    optimizer.zero_grad()\n    loss, ce_loss, kl_loss = compute_loss(device, model, x_mask, 
x_tokens, y_mask, y_tokens, input_tokens,\n                                          target_tokens, mask, loss_fn, beta)\n    optimizer.backward(loss)\n    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), 5.0)  # max_grad_norm=1.0\n    optimizer.step()\n    output.append((loss.item(), ce_loss.mean().item(), kl_loss.item()))\n\n    return output\n\n\ndef top_k_top_p_filtering(logits, top_k=100, top_p=0.95, filter_value=-float('Inf')):\n    \"\"\" Filter a distribution of logits using top-k and/or nucleus (top-p) filtering\n        Args:\n            logits: logits distribution shape (vocabulary size)\n            top_k > 0: keep only top k tokens with highest probability (top-k filtering).\n            top_p > 0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).\n                Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)\n        From: https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317\n    \"\"\"\n    top_k = min(top_k, logits.size(-1))  # Safety check\n    if top_k > 0:\n        # Remove all tokens with a probability less than the last token of the top-k\n        indices_to_remove = logits < torch.topk(logits, top_k)[0][..., -1, None]\n        logits[indices_to_remove] = filter_value\n\n    if top_p > 0.0:\n        sorted_logits, sorted_indices = torch.sort(logits, descending=True)\n        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)\n\n        # Remove tokens with cumulative probability above the threshold\n        sorted_indices_to_remove = cumulative_probs > top_p\n        # Shift the indices to the right to keep also the first token above the threshold\n        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()\n        sorted_indices_to_remove[..., 0] = 0\n\n        # scatter sorted tensors to original indexing\n        indices_to_remove = sorted_indices_to_remove.scatter(dim=1, index=sorted_indices, 
src=sorted_indices_to_remove)\n        logits[indices_to_remove] = filter_value\n\n    return logits\n\n\ndef repeat_score(text, ngram=[3, 4, 5, 6]):\n    ngram_list = []\n    for ng in ngram:\n        ngram_list.append([text[idx:idx + ng] for idx in range(len(text) - ng - 1)])\n\n    max_occurs = []\n    for ngrams in ngram_list:\n        count_result = Counter([' '.join(n) for n in ngrams])\n        try:\n            max_occurs.append(\n                max(count_result.values())\n            )\n        except:\n            pass\n\n    scores = [max_oc / ((len(text) / ngram[idx]) + ngram[idx]) for idx, max_oc in enumerate(max_occurs)]\n    return max(scores) if len(scores) >= 1 else 1.0\n\n\ndef sample_sequence(model, tokenizer, length, batch_size=None, x_mask=None, x_tokens=None, y_mask=None, y_tokens=None,\n                    temperature=1, top_k=100, top_p=0.95, device='cuda', sample=True, eos_token=None, model_type='cvae'):\n    x_mask = x_mask.to(device)\n    x_tokens = x_tokens.to(device)\n    y_mask = y_mask.to(device)\n    y_tokens = y_tokens.to(device)\n\n    with torch.no_grad():\n        if model_type == 'cvae':\n            try:\n                prior_mean, prior_logvar = model.encoder_prior(input_ids=x_tokens, attention_mask=x_mask)[:2]\n            except:\n                prior_mean = prior_logvar = torch.zeros([batch_size, model.config.n_embd], device=device)\n            latent_mean, latent_logvar = prior_mean, prior_logvar\n            z = model.reparameterize(latent_mean, latent_logvar)\n            assert not torch.isnan(z).any(), 'training get nan z'\n        else:\n            posterior_mean, posterior_logvar = model.encoder(input_ids=x_tokens, attention_mask=x_mask)[:2]\n            latent_mean, latent_logvar = posterior_mean, posterior_logvar\n            z = latent_mean\n            assert not torch.isnan(z).any(), 'training get nan z'\n\n        _, mem = model.transformer(input_ids=x_tokens[:, :-1], past=None, attention_mask=x_mask[:, 
:-1], representations=z)\n        prev = x_tokens[:, -1].view(batch_size, -1)\n\n        output = prev\n        probability = torch.tensor([], dtype=z.dtype, device=device)\n        if_end = torch.tensor([False] * batch_size, dtype=torch.bool, device=device)\n\n        for i in range(length): #trange\n            logits, mem = model.transformer(input_ids=prev, past=mem, representations=z)\n\n            logits = model.lm_head(logits)\n            if model.add_softmax:\n                logits_rep = model.lm_head_rep(z)\n                logits = logits + logits_rep.unsqueeze(dim=1)\n\n            logits = logits[:, -1, :] / temperature\n            logits = top_k_top_p_filtering(logits, top_k, top_p)\n            probs = F.softmax(logits, dim=-1)\n            if sample:\n                next_token = torch.multinomial(probs, num_samples=1)\n            else:\n                _, next_token = torch.topk(probs, k=1, dim=-1)\n\n            probability = torch.cat((probability, probs.gather(1, next_token)), dim=1)\n            output = torch.cat((output, next_token), dim=1)\n            prev = next_token\n\n            # early stopping if all sents have ended once\n            if_end[next_token.view(-1).eq(eos_token)] = True\n            if if_end.all(): break\n    return output, probability\n\n\ndef main_worker(gpu, ngpus_per_node, args):\n    if args.model_type == 'cvae':\n        args.learn_prior = True\n    else:\n        args.learn_prior = False\n\n    # GPU\n    args.gpu = gpu\n    print(\"There are \", torch.cuda.device_count(), \" available GPUs!\")\n    # print('Setting GPUs {}'.format(args.device))\n    print('Using GPU devices {}'.format(devices))\n    device = torch.device('cuda', args.gpu)\n    torch.cuda.set_device(device)\n    print('Current single GPU: {}'.format(torch.cuda.current_device()))\n\n    # randomness\n    np.random.seed(args.seed)\n    prng = np.random.RandomState()\n    torch.random.manual_seed(args.seed)\n    
torch.cuda.manual_seed(args.seed)\n    torch.cuda.manual_seed_all(args.seed)\n\n    # For multiprocessing distributed training, rank needs to be the global rank among all the processes\n    args.rank = args.rank * ngpus_per_node + gpu\n    print('Setting rank', args.rank)\n    recon_attempt = 1\n    connected = False\n    if args.rank != 0:\n        # Stall to have rank 0 node go first\n        time.sleep(3)\n    while not connected:\n        try:\n            dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url,\n                                    world_size=args.world_size, rank=args.rank)\n            connected = True\n            print('Established connection. Rank:', args.rank)\n        except Exception as e:\n            # Sometimes the head node launches after the worker, which would cause an issue\n            print('Failed to init process group. Retrying...', recon_attempt, e)\n            recon_attempt += 1\n            time.sleep(10)\n\n    # logging\n    if args.rank == 0:\n        save_folder = os.path.join(args.out_dir, args.experiment)\n        os.makedirs(save_folder, exist_ok=True)\n        t_writer = SummaryWriter(os.path.join(save_folder, 'train'), flush_secs=5)\n        v_writer = SummaryWriter(os.path.join(save_folder, 'val'), flush_secs=5)\n        importlib.reload(logging)\n        logging.basicConfig(filename=os.path.join(save_folder, 'train.log'),\n                            level=logging.INFO, format='%(asctime)s--- %(message)s')\n        logging.info('\\n*******************************************************************************\\n')\n        logging.info(\"the configuration:\")\n        logging.info(str(args).replace(',', '\\n'))\n\n    print('Loading models...')\n    cache_dir = os.path.join(args.out_dir, 'model_cache')\n    os.makedirs(cache_dir, exist_ok=True)\n    # Load pre-trained teacher tokenizer (vocabulary)\n    tokenizer = GPT2Tokenizer.from_pretrained('gpt2', cache_dir=cache_dir)\n    # Hack to 
allow tokenizing longer sequences.\n    tokenizer.max_len = int(1e12)\n    gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2', cache_dir=cache_dir)\n    print('gpt2_params:', num_params(gpt2_model))  # gpt2: 124439808\n    config = GPT2Config()\n\n    # # add special tokens\n    # special_tokens_dict = {\n    #     'pad_token': '<|startoftext|>',\n    #     'cls_token': '<|startofcond|>',\n    #     'sep_token': '<|sepofcond|>',\n    #     'mask_token': '<|endofcond|>'\n    # }\n    # num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)\n    # print('We have added', num_added_toks, 'special tokens')\n    # # Notice: resize_token_embeddings expect to receive the full size of the new vocab\n    # gpt2_model.resize_token_embeddings(len(tokenizer))\n    # assert tokenizer.pad_token == '<|startoftext|>'\n\n    VAE = VAEModel(config, add_input=args.add_input, add_attn=args.add_attn, add_softmax=args.add_softmax,\n                   attn_proj_vary=args.attn_proj_vary, learn_prior=args.learn_prior)\n    init_para_frompretrained(VAE.transformer, gpt2_model.transformer, share_para=True)\n    init_para_frompretrained(VAE.encoder, gpt2_model.transformer, share_para=False)\n    if args.learn_prior:\n        init_para_frompretrained(VAE.encoder_prior, VAE.encoder, share_para=True)\n        VAE.encoder_prior.averageSelfAttention.attention_weights = VAE.encoder.averageSelfAttention.attention_weights\n    VAE.lm_head.weight = gpt2_model.lm_head.weight\n    if VAE.add_softmax:\n        VAE.lm_head_rep = Conv1D(*gpt2_model.lm_head.weight.size())\n        # VAE.lm_head_rep = LM_head_rep(*gpt2_model.lm_head.weight.size()[::-1])\n    print('VAE_params:', num_params(VAE))  # 286694400\n    if args.load:\n        print('Loading model weights...')\n        state = torch.load(os.path.join(args.load, 'model_latest.pt'))  # , map_location='cpu'\n        if 'module' in list(state.keys())[0]:  # model_path is data parallel model with attr 'module'\n            state_copy = 
copy.copy(state)\n            keys = state_copy.keys()\n            for k in keys:\n                state[k.replace('module.', '')] = state.pop(k)\n        VAE.load_state_dict(state)\n        gc.collect()\n    print('Done.')\n\n    # fix pre-trained parameters before certain iterations\n    tuning_all_after_iters = 10000\n    tuning_all = False\n    for name, parameter in VAE.named_parameters():\n        # print((name, parameter.requires_grad))\n        new_pars = ['c_z', 'attention_weights', 'mean', 'logvar', 'input_proj', 'attn_proj', 'Nu_fc1', 'Nu_fc2', 'lm_head_rep']\n\n        if not any([True if n in name else False for n in new_pars]):\n           parameter.requires_grad = False\n\n    print('Setup data...')\n    # Batch and sequence length schedule\n    assert len(args.batch_sizes) == len(args.seq_lens)\n    batch_schedule = list(zip(map(int, args.batch_sizes), map(int, args.seq_lens)))\n    assert len(batch_schedule) <= 2, 'Currently not supporting multiple schedule'\n    cur_b_schedule = len(batch_schedule) - 1 if args.switch_time == 0 else 0\n    print('Batch schedule', batch_schedule)\n    train_loader, val_loader, test_loader = prepare_dataset(\n        args.data_dir, args.dataset, tokenizer,\n        batch_schedule[cur_b_schedule][0], batch_schedule[cur_b_schedule][1],\n        batch_schedule[-1][0], batch_schedule[-1][1],\n        batch_schedule[-1][0], batch_schedule[-1][1],\n        make_test=True,\n        num_workers=args.workers, data_type=args.data_type\n    )\n    print('Done.')\n\n    ###\n    val_loader = test_loader\n    ###\n\n    print('Wrapping models and optimizers...')\n    # Apply linear scaling rule to increase batch size for short sequence training.\n    lr_schedule = switch_schedule(linear_schedule(args), batch_schedule[cur_b_schedule][0] / batch_schedule[-1][0],\n                                  int(args.iterations * args.switch_time))\n    if args.fp16:\n        VAE = VAE.half()\n    VAE = VAE.to(device)\n    VAE = 
VAE.train()\n\n    # Note: the optimizer only receives the parameters that are trainable at this point. Parameters unfrozen\n    # later by the tuning_all switch are not in its parameter groups, so this optimizer does not update them.\n    params = [p for p in VAE.parameters() if p.requires_grad]\n    optimizer = FusedAdam(params, lr=args.lr)\n    optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True, verbose=False)\n    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer.optimizer, lr_schedule)\n    loss_model = SimpleDistributedDataParallel(VAE, args.world_size)\n\n    loss_fn = nn.CrossEntropyLoss(reduction='none')\n    print('Done.')\n\n    print('Begin training iterations')\n    logging.info(\"Begin training iterations\")\n    max_val_batches = 20000  # max num. of val batches\n    logging.info(\"Total iterations: %d\" % args.iterations)\n    e = 0  # number of completed epochs\n    num_iters = 0\n    optimizer.zero_grad()\n    beta = args.beta_0\n    endoftext = tokenizer.convert_tokens_to_ids(\"<|endoftext|>\")\n\n    def val_step(val_loader):\n        VAE.eval()\n\n        n_words_bpe = 0\n        n_words = 0\n        logp_sum = 0.0\n        kl_loss_sum = 0.0\n\n        logging.info(\"Validation loop.         Batches: %d\" % len(val_loader))\n        logging.info(\"Validation loop. 
max_val_batches: %d\" % max_val_batches)\n\n        # val_iter = iter(val_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(val_iter)\n        with tqdm(total=min(len(val_loader), max_val_batches)) as pbar:\n            for i, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(val_loader):\n                with torch.no_grad():\n                    if args.model_type == 'cvae':\n                        loss, ce_loss, kl_loss = compute_loss(device, VAE, x_mask, x_tokens, y_mask, y_tokens,\n                                                              input_tokens, target_tokens, mask, loss_fn, 1.0)\n                    else:\n                        loss, ce_loss, kl_loss = compute_loss_ae(device, VAE, x_mask, x_tokens, y_mask, y_tokens,\n                                                              input_tokens, target_tokens, mask, loss_fn, 1.0)\n\n                if len(target_tokens.size()) == 1:\n                    target_tokens = target_tokens.unsqueeze(0)\n                n, l = target_tokens.size()\n\n                text = target_tokens[0, :].tolist()\n                logprob = ce_loss.tolist()\n                assert len(text) == len(logprob)\n\n                # only for story\n                idx = text.index(endoftext)\n                text = text[idx + 1:]\n                logprob = logprob[idx + 1:]\n\n                if endoftext in text:\n                    idx = text.index(endoftext)\n                    text = text[:idx]\n                    logprob = logprob[:idx]\n\n                logp_sum += sum(logprob)\n\n                n_words_bpe += len(text)\n\n                story = [tokenizer.decode(target_tokens[i, :]) for i in range(n)]\n                story = [s[s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\"):] for s in story]\n                story = [s[:s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\")] if \"<|endoftext|>\" in s else s for s in\n                         
story]\n                words = sum([len(\n                    [t for t in re.split('(\"|\\'|!|\\?|\\.|,|:| |\\n|’|“|”|;|\\(|\\)|`)', s) if t != ' ' and t != '']) for\n                    s in story])\n                n_words += words\n\n                kl_loss_sum += kl_loss.item()\n\n                if i > max_val_batches:\n                    break\n                pbar.update(1)\n\n        loss_bpe = logp_sum / n_words_bpe\n        ppl_bpe = round(math.exp(min(logp_sum / n_words_bpe, 100)), 3)\n        ppl_word = round(math.exp(min(logp_sum / n_words, 100)), 3)\n        kl = kl_loss_sum / len(val_loader)\n\n        v_writer.add_scalar('loss', loss_bpe, num_iters)\n        v_writer.add_scalar('ppl_bpe', ppl_bpe, num_iters)\n        v_writer.add_scalar('ppl_word', ppl_word, num_iters)\n        v_writer.add_scalar('kl', kl, num_iters)\n        logging.info('val loss    : %.4f' % loss_bpe)\n        logging.info('val ppl_bpe : %.4f' % ppl_bpe)\n        logging.info('val ppl_word: %.4f' % ppl_word)\n        logging.info('val   kl    : %.4f' % kl)\n\n        VAE.train()\n\n    def test_plot(test_loader, num_iters):\n        VAE.eval()\n\n        # get embedding\n        X_emb = None\n        y = None\n\n        # test_iter = iter(test_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(test_iter)\n        with tqdm(total=len(test_loader)) as pbar:\n            for i, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(\n                    test_loader):\n                y_mask = y_mask.to(device)\n                y_tokens = y_tokens.to(device)\n                x_mask = x_mask.to(device)\n                x_tokens = x_tokens.to(device)\n                with torch.no_grad():\n                    if args.model_type == 'cvae':\n                        latent_mean, _ = VAE.encoder_prior(input_ids=x_tokens, attention_mask=x_mask)[:2]\n                    else:\n                        latent_mean, _ = 
VAE.encoder(input_ids=x_tokens, attention_mask=x_mask)[:2]\n\n                if args.dataset == 'ax' or args.dataset == 'yp':\n                    label = [tokenizer.decode(l)[:2] for l in x_tokens.tolist()]\n                elif args.dataset == 'wp':\n                    label = []\n                    prompts = [tokenizer.decode(l)[:6].lower() for l in x_tokens.tolist()]\n                    for prom in prompts:\n                        if prom[0] in ['[', '('] and prom[5] in [']', ')']:\n                            label.append(prom[2:4])\n                        else:\n                            label.append(None)\n                elif args.dataset == 'wi':\n                    # 0. TV, play, miniseries, telenovela; 1.film; 2. music; 3. manga, comic, 4. book, novel, story 5. game\n                    label = []\n                    prompts = [tokenizer.decode(l) for l in x_tokens.tolist()]\n                    for prom in prompts:\n                        if 'TV' in prom or 'play' in prom or 'miniseries' in prom or 'telenovela' in prom:\n                            label.append(0)\n                        elif 'film' in prom:\n                            label.append(1)\n                        elif 'music' in prom:\n                            label.append(2)\n                        elif 'manga' in prom or 'comic' in prom:\n                            label.append(3)\n                        elif 'book' in prom or 'novel' in prom or 'story' in prom:\n                            label.append(4)\n                        elif 'game' in prom:\n                            label.append(5)\n                        else:\n                            label.append(None)\n                else:\n                    raise Exception\n\n                if i == 0:\n                    X_emb = latent_mean.data\n                    y = label\n                else:\n                    X_emb = torch.cat((X_emb, latent_mean.data), dim=0)\n                    y.extend(label)\n  
              pbar.update(1)\n        X_emb = X_emb.cpu().numpy()\n\n        try:\n            if args.dataset == 'yp':\n                y = ['0' if l in ['0', '1'] else l for l in y]\n                y = ['4' if l in ['3', '4'] else l for l in y]\n                X_emb = X_emb[[l != '2' for l in y], :]\n                y = [l for l in y if l != '2']\n\n            if args.dataset == 'wp':\n                topics = [['wp', 'sp', 'tt'], ['eu'], ['cw'], ['pm'], ['mp', 'ip'], ['pi', 'cc'], ['ot'], ['rf']]\n                match = [[True if l in t else False for t in topics] for l in y]\n                y = [m.index(True) if True in m else None for m in match]\n                X_emb = X_emb[[l is not None for l in y], :]\n                y = [l for l in y if l is not None]\n\n            if args.dataset == 'wi':\n                X_emb = X_emb[[l is not None for l in y], :]\n                y = [l for l in y if l is not None]\n\n            # to 2D\n            # X_emb_2d = TSNE(n_components=2, init='pca', verbose=1).fit_transform(X_emb)\n            X_emb_2d = TSNE(n_components=2, verbose=1, perplexity=40).fit_transform(X_emb)\n\n            def remove_outliers(data, r=2.0):\n                outliers_data = abs(data - np.mean(data, axis=0)) >= r * np.std(data, axis=0)\n                outliers = np.any(outliers_data, axis=1)\n                keep = np.logical_not(outliers)\n                return outliers, keep\n\n            outliers, keep = remove_outliers(X_emb_2d)\n            X_emb_2d = X_emb_2d[keep, :]\n            y = [l for l, k in zip(y, keep.tolist()) if k]\n\n            # plot\n            fig = plt.figure(figsize=(4, 4))\n            ax = fig.add_axes([0, 0, 1, 1])\n            cc = ['r', 'b', 'g', 'y', 'k', 'c', 'm', 'tab:blue']\n            for i, l in enumerate(sorted(set(y))):\n                idx = [yl == l for yl in y]\n                plt.scatter(X_emb_2d[idx, 0], X_emb_2d[idx, 1], c=cc[i], s=10, edgecolor='none', alpha=0.5)\n            
ax.axis('off')  # hide the axes\n            plt.savefig(os.path.join(save_folder, 'tSNE_' + '{:07d}'.format(num_iters) + '.png'))\n            plt.close(fig)\n        except Exception as ex:\n            # plotting is best-effort; log the failure instead of silently ignoring it\n            logging.warning('t-SNE plotting failed: %s', ex)\n\n        VAE.train()\n\n    def generate(test_loader, num_iters):\n        VAE.eval()\n\n        n_samples = 0\n        bleu4_sum = 0.0\n        rouge_scores_values_sum = [0.0] * 9\n\n        args.nsamples = 1\n        args.batch_size = 1\n        args.temperature = 0.95\n        args.top_k = 100\n        args.top_p = 0.95\n        model_type = args.model_type\n\n        # write samples to file\n        samples_file = open(os.path.join(save_folder, 'generate-' + '%07d' % num_iters + '.txt'), 'w', encoding='utf8')\n\n        # test_iter = iter(test_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(test_iter)\n        with tqdm(total=len(test_loader)) as pbar:\n            for i_test, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(\n                    test_loader):\n\n                if i_test >= 10: break\n\n                length = -1\n                if length == -1:\n                    length = VAE.config.n_ctx - x_tokens.size(1) - 1\n                elif length > VAE.config.n_ctx - x_tokens.size(1) - 1:\n                    raise ValueError(\"Can't get samples longer than window size: %s\" % VAE.config.n_ctx)\n\n                eff_samples = []\n                n, l = target_tokens.size()\n                storys = [tokenizer.decode(target_tokens[i, :]) for i in range(n)]\n                storys = [s[s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\"):] for s in storys]\n                storys_str = [s[:s.find(\"<|endoftext|>\") + len(\"<|endoftext|>\")] if \"<|endoftext|>\" in s else s for s in\n                              storys]\n\n                for _ in range(args.nsamples // args.batch_size):\n                    # model, batch_size, temperature, top_k, top_p, eos_token, 
sample = VAE, args.batch_size, args.temperature, args.top_k, args.top_p, tokenizer.encoder['<|endoftext|>'], True\n                    out, _ = sample_sequence(\n                        model=VAE,\n                        tokenizer=tokenizer,\n                        length=length,\n                        batch_size=args.batch_size,\n                        x_mask=x_mask,\n                        x_tokens=x_tokens,\n                        y_mask=y_mask,\n                        y_tokens=y_tokens,\n                        temperature=args.temperature,\n                        top_k=args.top_k,\n                        top_p=args.top_p,\n                        device=device,\n                        eos_token=tokenizer.encoder['<|endoftext|>'],\n                        model_type=model_type\n                    )\n                    out = out.tolist()\n\n                    # extract story, check metrics\n                    for i in range(len(out)):\n                        text = out[i]\n                        text = text[text.index(endoftext) + 1:]\n\n                        if endoftext in text:\n                            idx = text.index(endoftext)\n                            text = text[:idx]\n\n                        text = tokenizer.decode(text).strip()\n\n                        # score for one long text, higher than 0.075 usually means repetition\n                        # rep_score = repeat_score(text.split(), ngram=[3, 4, 5, 6, 7, 8])\n                        # if rep_score > 0.075:\n                        #     # print(rep_score)\n                        #     continue\n\n                        try:\n                            # check bleu\n                            bleu4 = sentence_bleu([storys_str[i].split()], text,\n                                                  smoothing_function=SmoothingFunction().method7)\n\n                            # check rouge\n                            rouge = Rouge()\n                            
rouge_scores = rouge.get_scores(text, storys_str[i])\n                            rouge_scores_values = [v for k in rouge_scores[0].keys() for v in\n                                                   rouge_scores[0][k].values()]\n\n                            bleu4_sum += bleu4\n                            rouge_scores_values_sum = [v1 + v2 for v1, v2 in\n                                                       zip(rouge_scores_values_sum, rouge_scores_values)]\n                            n_samples += 1\n                        except:\n                            bleu4 = 0.0\n                            rouge_scores = [{'rouge-1': {'f': 0.0, 'p': 0.0, 'r': 0.0},\n                                             'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},\n                                             'rouge-l': {'f': 0.0, 'p': 0.0, 'r': 0.0}}]\n\n                        eff_samples.append((text, bleu4, rouge_scores))\n\n                    pbar.update(1)\n\n                for i in range(len(eff_samples)):\n                    samples_file.write(\"=\" * 50 + \" SAMPLE \" + str(i_test) + \" \" + \"=\" * 50)\n                    samples_file.write('\\n' * 2)\n\n                    samples_file.write(\"=\" * 40 + \" Outlines  \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(tokenizer.decode(x_tokens[i, :][x_mask[i, :] == 1].tolist()))\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(\"=\" * 40 + \" Story \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(storys_str[i])\n                    samples_file.write('\\n' * 2)\n\n                    samples_file.write(\"=\" * 40 + \" Generated \" + \"=\" * 40)\n                    samples_file.write('\\n' * 2)\n                    samples_file.write(eff_samples[i][0])\n                    samples_file.write('\\n' * 4)\n                    samples_file.flush()\n\n        
samples_file.close()\n        print('Test complete with %05d samples.' % n_samples)\n        logging.info(\"Test complete with %05d samples.\", n_samples)\n        logging.info(\"Iteration completed: %d\" % num_iters)\n\n        # Guard against division by zero when no sample could be scored.\n        denom = max(n_samples, 1)\n        bleu4 = round(bleu4_sum / denom, 3)\n        rouge_scores_values = [round(r / denom, 3) for r in rouge_scores_values_sum]\n        print(' bleu-4:', bleu4)\n        print(' rouge :', rouge_scores_values)\n        logging.info(' bleu-4: %f', bleu4)\n        logging.info(' rouge : %s', str(rouge_scores_values))\n\n        VAE.train()\n\n    if args.rank == 0:\n        test_plot(test_loader, num_iters)\n        val_step(val_loader)\n        generate(test_loader, num_iters)\n        torch.save(loss_model.state_dict(), os.path.join(save_folder, 'model_' + '{:07d}'.format(num_iters) + '.pt'))\n\n    while num_iters < args.iterations:\n        # Run epoch\n        st = time.time()\n\n        # Training\n        print('Training loop. Batches:', len(train_loader))\n        logging.info('\\n----------------------------------------------------------------------')\n        logging.info(\"Training loop.       Batches: %d\" % len(train_loader))\n\n        # train_iter = iter(train_loader); x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask = next(train_iter)\n        with tqdm(total=len(train_loader)) as pbar:\n            for i, (x_mask, x_tokens, y_mask, y_tokens, input_tokens, target_tokens, mask) in enumerate(train_loader):\n\n                # if num_iters % args.cycle >= args.cycle - args.beta_warmup:\n                #     beta = min(1.0, beta + (1. 
- args.beta_0) / args.beta_warmup)\n\n                if not tuning_all and num_iters >= tuning_all_after_iters:\n                    for name, parameter in VAE.named_parameters():\n                        # print((name, parameter.requires_grad))\n                        parameter.requires_grad = True\n                    tuning_all = True\n\n                output = train_step(device, loss_model, optimizer, x_mask, x_tokens, y_mask, y_tokens,\n                                    input_tokens, target_tokens, mask, loss_fn, beta, args.model_type)\n\n                if args.rank == 0:\n                    loss, ce_loss, kl_loss = output[-1]\n                    lr = scheduler.get_last_lr()[0]\n                    # Log to Tensorboard\n                    t_writer.add_scalar('loss', loss, num_iters)\n                    t_writer.add_scalar('ppl', math.exp(min(ce_loss, 10)), num_iters)\n                    t_writer.add_scalar('lr', lr, num_iters)\n                    t_writer.add_scalar('iter_time', time.time() - st, num_iters)\n                    t_writer.add_scalar('kl', kl_loss, num_iters)\n                    t_writer.add_scalar('beta', beta, num_iters)\n\n                    if args.model_type == 'ae_vae_fusion':\n                        loss, ce_loss, kl_loss = output[0]\n                        # Log to Tensorboard\n                        t_writer.add_scalar('ae_loss', loss, num_iters)\n                        t_writer.add_scalar('ae_kl', kl_loss, num_iters)\n\n                st = time.time()\n                end = num_iters >= args.iterations\n\n                if args.warmup != -1:\n                    scheduler.step()\n\n                if end: break\n                num_iters += 1\n                pbar.update(1)\n\n                if num_iters % args.cycle == 0:\n                    beta = args.beta_0\n                    logging.info('KL annealing restart')\n\n                if args.rank == 0 and num_iters % 10000 == 0:\n                    
test_plot(test_loader, num_iters)\n                    val_step(val_loader)\n                    generate(test_loader, num_iters)\n                    print('Saving model...')\n                    logging.info(\"Iteration completed: %d, remaining %d\" % (num_iters, args.iterations - num_iters))\n                    logging.info(\"Saving model...\")\n                    logging.info('\\n------------------------------------------------------')\n                    torch.save(loss_model.state_dict(), os.path.join(save_folder, 'model_' + '{:07d}'.format(num_iters) + '.pt'))\n\n                if args.switch_time > 0 and num_iters == int(args.iterations * args.switch_time):\n                    print('Switch to long sequence training')\n                    logging.info(\"Switch to long sequence training\")\n                    cur_b_schedule += 1\n                    train_loader, val_loader, test_loader = prepare_dataset(\n                        args.data_dir, args.dataset, tokenizer,\n                        batch_schedule[cur_b_schedule][0], batch_schedule[cur_b_schedule][1],\n                        batch_schedule[-1][0], batch_schedule[-1][1],\n                        batch_schedule[-1][0], batch_schedule[-1][1],\n                        make_test=True,\n                        num_workers=args.workers, data_type=args.data_type\n                    )\n        if not end:\n            e += 1\n            logging.info(\"Training loop. 
Epoch completed: %d\" % e)\n\n    if args.rank == 0:\n        torch.save(loss_model.state_dict(), os.path.join(save_folder, 'model_latest.pt'))\n    print('Training complete.')\n    logging.info(\"Training complete.\")\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument('experiment', type=str)\n\n    # Default parameters are set for single-GPU training\n    parser.add_argument('--lr', type=float, default=5e-5)\n    parser.add_argument(\"--seed\", type=int, default=0)\n\n    parser.add_argument('--data_type', type=str, default='t5', choices=['t' + str(i) for i in range(9)], help=\"t: type\")\n    parser.add_argument('--model_type', type=str, default='cvae', choices=['cvae', 'ae_vae_fusion'])\n    parser.add_argument('--iterations', type=int, default=101640)  # wp 850001  wi 300001 ax 300001 yp 800001\n    parser.add_argument('--dataset', type=str, default='wi', choices=['ax', 'yp', 'wp', 'wi'], help=\"Dataset to use for training\")\n    parser.add_argument('--warmup', type=int, default=10000,\n                        help=\"Number of warmup iterations, after which the learning rate decays linearly (-1 for no warmup and decay)\")\n\n    parser.add_argument('--batch-sizes', nargs='+', type=int, default=[1],\n                        help='batch size per GPU. Lists the schedule.')\n    parser.add_argument('--seq-lens', nargs='+', type=int, default=[1024],\n                        help='seq length per sample. 
Lists the schedule.')\n    parser.add_argument('--switch-time', type=float, default=0,\n                        help=\"Percentage of iterations to spend on short sequence training.\")\n    parser.add_argument('--data-dir', type=str, default='data')\n    parser.add_argument('--out-dir', type=str, default='out')\n    parser.add_argument('--load', type=str, help='path to load model from') # , default='out/test/'\n\n    parser.add_argument('--world-size', default=1, type=int,\n                        help='number of nodes for distributed training')\n    parser.add_argument('--rank', default=0, type=int,\n                        help='node rank for distributed training')\n    parser.add_argument('--dist-url', default='tcp://127.0.0.1:9999', type=str,\n                        help='url used to set up distributed training')\n    parser.add_argument('--dist-backend', default='nccl', type=str,\n                        help='distributed backend')\n    parser.add_argument('--workers', default=1, type=int, metavar='N',\n                        help='number of data loading workers')\n\n    parser.add_argument('--fp16', action='store_true', help=\"Train using FP16?\")\n    parser.add_argument('--fp16_opt_level', default='O0', type=str, required=False)\n\n    # KL cost annealing, increase beta from beta_0 to 1 in beta_warmup steps\n    parser.add_argument('--beta_0', default=1.0, type=float)\n    parser.add_argument('--beta_warmup', type=int, default=50000)\n    # cyc_vae parameters\n    parser.add_argument('--cycle', type=int, default=101640)\n\n    parser.add_argument('--add_input', action=\"store_true\")\n    parser.add_argument('--add_attn', action=\"store_true\")\n    parser.add_argument('--add_softmax', action=\"store_true\")\n    parser.add_argument('--attn_proj_vary', action=\"store_true\")\n\n    parser.add_argument('--learn_prior', action=\"store_true\")\n\n    args = parser.parse_args('wi.o2s.12.proj_vary_beta_half_cvae --batch-sizes 1 --seq-lens 1024 '\n               
              '--add_input --add_attn --attn_proj_vary --learn_prior --fp16'.split())\n\n    # Each node is expected to have the same number of GPUs\n    ngpus_per_node = torch.cuda.device_count()\n    # Since we have ngpus_per_node processes per node, the total world_size\n    # needs to be adjusted accordingly\n    args.world_size = ngpus_per_node * args.world_size\n    # Use torch.multiprocessing.spawn to launch one main_worker process per GPU\n    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))\n"
  },
  {
    "path": "tsne_plot.py",
    "content": "import os, time, gc, json, pickle, argparse, math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.utils.data as data\nfrom torch.nn import DataParallel\nimport torch.distributed as dist\nimport torch.multiprocessing as mp\nimport numpy as np\nimport transformers\nfrom transformers import GPT2Tokenizer, GPT2LMHeadModel, GPT2Config, AdamW, get_linear_schedule_with_warmup, Conv1D\nfrom tensorboardX import SummaryWriter\nfrom tqdm import tqdm\nimport importlib\nimport logging\nimport copy\n\nfrom data.util import *\nfrom util import *\nfrom model import VAEModel\n\nfrom sklearn.manifold import TSNE\nimport matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt\n\ndevices = '3'\nos.environ[\"CUDA_VISIBLE_DEVICES\"] = devices\n\n# specify for the trained VAE model\nadd_input = True\nadd_softmax = False\nadd_attn = False\n\nparser = argparse.ArgumentParser()\n\n# global parameters\nparser.add_argument('--seed', default=0, type=int)\nparser.add_argument('--batch_size', default=16, type=int)\n\n# use GPU\nparser.add_argument('--gpu', default=0, type=int)\nparser.add_argument('--no_gpu', action=\"store_true\")\n\nparser.add_argument('--model_type', type=str, default='t0', choices=['t0', 't1'], help=\"t: type\")\nparser.add_argument('--dataset', type=str, default='yp', choices=['ax', 'yp', 'wp', 'wi'],\n                    help=\"Dataset to use for training\")\nparser.add_argument('--load', type=str, default='out/yelp.2/', help='path to load model from')\n\nparser.add_argument('--data-dir', type=str, default='data')\nparser.add_argument('--out-dir', type=str, default='out')\n\nif sys.argv[1:] == ['--mode=server']:\n    args = parser.parse_args([])   # run in pycharm console\nelse:\n    args = parser.parse_args()  # run in cmd\n\n# gpu\nif not torch.cuda.is_available(): args.no_gpu = True\ngpu = not args.no_gpu\nif gpu: torch.cuda.set_device(args.gpu)\ndevice = torch.device(args.gpu if gpu else 
\"cpu\")\n\nnp.random.seed(args.seed)\nrandom.seed(args.seed)\ntorch.manual_seed(args.seed)\nif gpu: torch.cuda.manual_seed(args.seed)\n\n# logging\nsave_folder = os.path.join(args.load)\nos.makedirs(save_folder, exist_ok=True)\nlogging.basicConfig(filename=os.path.join(save_folder, 'tSNE.log'),\n                    level=logging.INFO, format='%(asctime)s--- %(message)s')\n\nprint('Loading models...')\ncache_dir = os.path.join(args.out_dir, 'model_cache')\nos.makedirs(cache_dir, exist_ok=True)\n# Load pre-trained teacher tokenizer (vocabulary)\ntokenizer = GPT2Tokenizer.from_pretrained('gpt2', cache_dir=cache_dir)\n# Hack to allow tokenizing longer sequences.\ntokenizer.max_len = int(1e12)\ngpt2_model = GPT2LMHeadModel.from_pretrained('gpt2', cache_dir=cache_dir)\nprint('gpt2_params:', num_params(gpt2_model))  # gpt2: 124439808\nconfig = GPT2Config()\n\n# add special tokens\nspecial_tokens_dict = {\n    'pad_token': '<|startoftext|>',\n    'cls_token': '<|startofcond|>',\n    'sep_token': '<|sepofcond|>',\n    'mask_token': '<|endofcond|>'\n}\nnum_added_toks = tokenizer.add_special_tokens(special_tokens_dict)\nprint('We have added', num_added_toks, 'special tokens')\n# Notice: resize_token_embeddings expect to receive the full size of the new vocab\ngpt2_model.resize_token_embeddings(len(tokenizer))\nassert tokenizer.pad_token == '<|startoftext|>'\n\nVAE = VAEModel(config, add_input=add_input, add_attn=add_attn, add_softmax=add_softmax)\ninit_para_frompretrained(VAE.transformer, gpt2_model.transformer, share_para=True)\ninit_para_frompretrained(VAE.encoder, gpt2_model.transformer, share_para=False)\nVAE.lm_head.weight = gpt2_model.lm_head.weight\nif VAE.add_softmax:\n    VAE.lm_head_rep = Conv1D(*gpt2_model.lm_head.weight.size())\nprint('VAE_params:', num_params(VAE))  # 286694400\nprint('Done.')\n\nprint('Loading model weights...')\nstate = torch.load(os.path.join(args.load, 'model_latest.pt'), map_location='cpu')\nif 'module' in list(state.keys())[0]:  # 
model_path is data parallel model with attr 'module'\n    state_copy = copy.copy(state)\n    keys = state_copy.keys()\n    for k in keys:\n        state[k.replace('module.', '')] = state.pop(k)\nVAE.load_state_dict(state)\nVAE.eval()\nVAE = VAE.to(device)\nprint('Done.')\n\nprint('Setup data...')\ntest_loader = prepare_dataset(\n    args.data_dir, args.dataset, tokenizer,\n    1, 1024, 1, 1024, args.batch_size, 1024,\n    make_train=False, make_val=False, make_test=True, model_type=args.model_type\n)[0]\nprint('Done.')\n\n# get embedding\nX_emb = None\ny = None\n\n# test_iter = iter(test_loader); c_mask, c_tokens, x_mask, x_tokens, input_tokens, target_tokens, mask = next(test_iter)\nwith torch.no_grad():\n    with tqdm(total=len(test_loader)) as pbar:\n        for i, (c_mask, c_tokens, x_mask, x_tokens, input_tokens, target_tokens, mask) in enumerate(test_loader):\n            x_mask = x_mask.to(device)\n            x_tokens = x_tokens.to(device)\n            latent_mean, _ = VAE.encoder(input_ids=x_tokens, attention_mask=x_mask)[:2]\n\n            if i == 0:\n                X_emb = latent_mean.data\n                y = [tokenizer.decode(l)[:2] for l in c_tokens.tolist()]\n            else:\n                X_emb = torch.cat((X_emb, latent_mean.data), dim=0)\n                y.extend([tokenizer.decode(l)[:2] for l in c_tokens.tolist()])\n            pbar.update(1)\nX_emb = X_emb.cpu().numpy()\n\ntry:\n    if args.dataset == 'yp':\n        y = ['0' if l in ['0', '1'] else l for l in y]\n        y = ['4' if l in ['3', '4'] else l for l in y]\n        X_emb = X_emb[[l != '2' for l in y], :]\n        y = [l for l in y if l != '2']\n\n    if args.dataset == 'wp':\n        topics = [['wp', 'sp', 'tt'], ['eu'], ['cw'], ['pm'], ['mp', 'ip'], ['pi', 'cc'], ['ot'], ['rf']]\n        match = [[True if l in t else False for t in topics] for l in y]\n        y = [m.index(True) if True in m else None for m in match]\n        X_emb = X_emb[[l is not None for l in y], :]\n        
y = [l for l in y if l is not None]\n\n    if args.dataset == 'wi':\n        X_emb = X_emb[[l is not None for l in y], :]\n        y = [l for l in y if l is not None]\n\n    # to 2D\n    # X_emb_2d = TSNE(n_components=2, init='pca', verbose=1).fit_transform(X_emb)\n    X_emb_2d = TSNE(n_components=2, verbose=1, perplexity=40).fit_transform(X_emb)\n\n\n    def remove_outliers(data, r=2.0):\n        outliers_data = abs(data - np.mean(data, axis=0)) >= r * np.std(data, axis=0)\n        outliers = np.any(outliers_data, axis=1)\n        keep = np.logical_not(outliers)\n        return outliers, keep\n\n\n    outliers, keep = remove_outliers(X_emb_2d)\n    X_emb_2d = X_emb_2d[keep, :]\n    y = [l for l, k in zip(y, keep.tolist()) if k]\n\n    # plot\n    fig = plt.figure(figsize=(4, 4))\n    ax = fig.add_axes([0, 0, 1, 1])\n    cc = ['r', 'b', 'g', 'y', 'k', 'c', 'm', 'tab:blue']\n    for i, l in enumerate(sorted(set(y))):\n        idx = [yl == l for yl in y]\n        plt.scatter(X_emb_2d[idx, 0], X_emb_2d[idx, 1], c=cc[i], s=10, edgecolor='none', alpha=0.5)\n    ax.axis('off')  # adding it will get no axis\n    plt.savefig(os.path.join(save_folder, 'tSNE.png'))\n    plt.close(fig)\nexcept:\n    pass\n"
  },
  {
    "path": "util.py",
    "content": "import os, time, gc, json, pickle, argparse, math\nimport torch\nimport torch.nn as nn\nimport torch.utils.data as data\nimport torch.distributed as dist\nimport torch.multiprocessing as mp\nimport numpy as np\nfrom data.util import *\nimport copy\n\n\ndef num_params(model):\n    return sum([np.prod(p.size()) for p in model.parameters() if p.requires_grad])\n\n\ndef init_para_frompretrained(m, pm, share_para=False):\n    m.wte.weight = pm.wte.weight\n    m.wpe.weight = pm.wpe.weight\n\n    for i in range(min(len(m.h), len(pm.h))):\n        m.h[i].ln_1.weight = pm.h[i].ln_1.weight if share_para else copy.copy(pm.h[i].ln_1.weight)\n        m.h[i].ln_1.bias = pm.h[i].ln_1.bias if share_para else copy.copy(pm.h[i].ln_1.bias)\n        m.h[i].attn.c_attn.weight = pm.h[i].attn.c_attn.weight if share_para else copy.copy(pm.h[i].attn.c_attn.weight)\n        m.h[i].attn.c_attn.bias = pm.h[i].attn.c_attn.bias if share_para else copy.copy(pm.h[i].attn.c_attn.bias)\n        m.h[i].attn.c_proj.weight = pm.h[i].attn.c_proj.weight if share_para else copy.copy(pm.h[i].attn.c_proj.weight)\n        m.h[i].attn.c_proj.bias = pm.h[i].attn.c_proj.bias if share_para else copy.copy(pm.h[i].attn.c_proj.bias)\n        m.h[i].ln_2.weight = pm.h[i].ln_2.weight if share_para else copy.copy(pm.h[i].ln_2.weight)\n        m.h[i].ln_2.bias = pm.h[i].ln_2.bias if share_para else copy.copy(pm.h[i].ln_2.bias)\n        m.h[i].mlp.c_fc.weight = pm.h[i].mlp.c_fc.weight if share_para else copy.copy(pm.h[i].mlp.c_fc.weight)\n        m.h[i].mlp.c_fc.bias = pm.h[i].mlp.c_fc.bias if share_para else copy.copy(pm.h[i].mlp.c_fc.bias)\n        m.h[i].mlp.c_proj.weight = pm.h[i].mlp.c_proj.weight if share_para else copy.copy(pm.h[i].mlp.c_proj.weight)\n        m.h[i].mlp.c_proj.bias = pm.h[i].mlp.c_proj.bias if share_para else copy.copy(pm.h[i].mlp.c_proj.bias)\n\n    m.ln_f.weight = pm.ln_f.weight if share_para else copy.copy(pm.ln_f.weight)\n    m.ln_f.bias = pm.ln_f.bias if share_para else 
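copy.copy(pm.ln_f.bias)\n\n\ndef switch_schedule(schedule, mult, switch):\n    \"\"\" Multiply the LR factor returned by `schedule` by `mult` for iterations before `switch` \"\"\"\n\n    def f(e):\n        s = schedule(e)\n        if e < switch:\n            return s * mult\n        return s\n\n    return f\n\n\ndef linear_schedule(args):\n    \"\"\" LR factor: linear warmup from 0 to 1 over args.warmup iterations (skipped when args.warmup <= 0),\n    then linear decay to 0 at args.iterations \"\"\"\n\n    def f(e):\n        if args.warmup > 0 and e <= args.warmup:\n            return e / args.warmup\n        return max((e - args.iterations) / (args.warmup - args.iterations), 0)\n\n    return f\n"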
  }
]